TY - JOUR

T1 - Bandit online optimization over the permutahedron

AU - Ailon, Nir

AU - Hatano, Kohei

AU - Takimoto, Eiji

N1 - Funding Information:
We thank anonymous reviewers for useful comments. Also, we thank Ryohei Nagaura for insightful discussion. Ailon acknowledges the generous support of a Marie Curie Reintegration Grant PIRG07-GA-2010-268403 , an Israel Science Foundation (ISF) grant 127/133 and a Jacobs Technion–Cornell Innovation Institute (JTCII) grant. Hatano is grateful to the supports from JSPS KAKENHI Grant Number 25330261 and CORE Project Grant of Microsoft Research Asia . Takimoto is grateful to the supports from JSPS KAKENHI Grant Number 23300033 and MEXT KAKENHI Grant Number 24106010 .
Publisher Copyright:
© 2016 Elsevier B.V.

PY - 2016/10/18

Y1 - 2016/10/18

N2 - The permutahedron is the convex polytope with vertex set consisting of the vectors (π(1),…,π(n)) for all permutations (bijections) π over {1,…,n}. We study a bandit game in which, at each step t, an adversary chooses a hidden weight vector st, a player chooses a vertex πt of the permutahedron and suffers an observed instantaneous loss of ∑i=1nπt(i)st(i). We study the problem in two different approaches. In the two approaches, we assume that st is a point in the polytope dual to the permutahedron. Algorithm CombBand of Cesa-Bianchi et al. (2012) guarantees a regret of O(nTlogn) after T steps. Unfortunately, CombBand requires at each step an n-by-n matrix permanent computation, a #P-hard problem. Approximating the permanent is possible in the impractical running time of O(n10), with an additional heavy inverse-polynomial dependence on the sought accuracy. In the first approach, we provide an algorithm of slightly worse regret O(n3/2T) but with more realistic time complexity O(n3) per step. The technical contribution is a bound on the variance of the Plackett–Luce noisy sorting process's ‘pseudo loss’, obtained by establishing positive semi-definiteness of a family of 3-by-3 matrices of rational functions in exponents of 3 parameters. In the second approach, we present and analyze an algorithm based on Bubeck et al.'s (2012) OSMD approach with a novel projection and decomposition technique for the permutahedron. The second algorithm's running time and regret guarantees are similar to our first algorithm, modulo a numerical line search procedure the running time of which we have not been able to analyze. It is interesting that the two approaches are totally different. The main open problem from this work is whether there exists a bandit algorithm for this problem with both optimal regret of O(nT) and running time of O(n3) for either regime, or there is an inherent tradeoff between the two performance measures.

AB - The permutahedron is the convex polytope with vertex set consisting of the vectors (π(1),…,π(n)) for all permutations (bijections) π over {1,…,n}. We study a bandit game in which, at each step t, an adversary chooses a hidden weight vector st, a player chooses a vertex πt of the permutahedron and suffers an observed instantaneous loss of ∑i=1nπt(i)st(i). We study the problem in two different approaches. In the two approaches, we assume that st is a point in the polytope dual to the permutahedron. Algorithm CombBand of Cesa-Bianchi et al. (2012) guarantees a regret of O(nTlogn) after T steps. Unfortunately, CombBand requires at each step an n-by-n matrix permanent computation, a #P-hard problem. Approximating the permanent is possible in the impractical running time of O(n10), with an additional heavy inverse-polynomial dependence on the sought accuracy. In the first approach, we provide an algorithm of slightly worse regret O(n3/2T) but with more realistic time complexity O(n3) per step. The technical contribution is a bound on the variance of the Plackett–Luce noisy sorting process's ‘pseudo loss’, obtained by establishing positive semi-definiteness of a family of 3-by-3 matrices of rational functions in exponents of 3 parameters. In the second approach, we present and analyze an algorithm based on Bubeck et al.'s (2012) OSMD approach with a novel projection and decomposition technique for the permutahedron. The second algorithm's running time and regret guarantees are similar to our first algorithm, modulo a numerical line search procedure the running time of which we have not been able to analyze. It is interesting that the two approaches are totally different. The main open problem from this work is whether there exists a bandit algorithm for this problem with both optimal regret of O(nT) and running time of O(n3) for either regime, or there is an inherent tradeoff between the two performance measures.

UR - http://www.scopus.com/inward/record.url?scp=84992560374&partnerID=8YFLogxK

UR - http://www.scopus.com/inward/citedby.url?scp=84992560374&partnerID=8YFLogxK

U2 - 10.1016/j.tcs.2016.07.033

DO - 10.1016/j.tcs.2016.07.033

M3 - Article

AN - SCOPUS:84992560374

SN - 0304-3975

VL - 650

SP - 92

EP - 108

JO - Theoretical Computer Science

JF - Theoretical Computer Science

ER -