





















Abstract:We study optimal experimental design for multinomial logit (MNL) bandits, where an agent repeatedly selects a subset of $K$ items from a ground set of size $N$ and observes single-choice feedback. Unlike linear or generalized linear bandits, MNL bandits have a combinatorial action space, which makes classical optimal design approaches and naive optimization over all subsets computationally intractable. We propose a computationally efficient optimal design framework for MNL models that achieves both statistical efficiency and scalability through two complementary approaches: (i) an exact or certified-approximate reformulation of the design oracle as a $0$-$1$ mixed-integer linear program (MILP) with solver-certified early stopping, and (ii) a fully polynomial-time lifted design that replaces the nonlinear objective with a tractable surrogate. Using the Kiefer-Wolfowitz equivalence theorem, we establish near G-optimality guarantees and characterize the induced statistical-computational trade-offs. As an application, we develop a best assortment identification algorithm for MNL bandits with linear utilities and non-uniform revenues, and prove an instance-dependent sample complexity of $\tilde{O}\big(\frac{d \log N}{\Delta^2}\big)$, where $d$ is the feature dimension, $N$ is the number of arms, and $\Delta$ is the minimum revenue gap.
| Comments: | Accepted at ICML 2026 |
| Subjects: | Machine Learning (stat.ML); Machine Learning (cs.LG) |
| Cite as: | arXiv:2605.25592 [stat.ML] |
| (or arXiv:2605.25592v1 [stat.ML] for this version) | |
| https://doi.org/10.48550/arXiv.2605.25592 arXiv-issued DOI via DataCite (pending registration) |
From: Joongkyu Lee [view email]
[v1]
Mon, 25 May 2026 08:41:56 UTC (106 KB)
此内容由惯性聚合(RSS阅读器)自动聚合整理,仅供阅读参考。 原文来自 — 版权归原作者所有。