TY - GEN
T1 - Bandit online optimization over the permutahedron
AU - Ailon, Nir
AU - Hatano, Kohei
AU - Takimoto, Eiji
N1 - Publisher Copyright:
© Springer International Publishing Switzerland 2014.
PY - 2014
Y1 - 2014
N2 - The permutahedron is the convex polytope with vertex set consisting of the vectors (π(1),…, π(n)) for all permutations (bijections) π over {1,…, n}. We study a bandit game in which, at each step t, an adversary chooses a hidden weight weight vector st, a player chooses a vertex πt of the permutahedron and suffers an observed instantaneous loss of ∑i n=1 πt(i)st(i).We study the problem in two regimes. In the first regime, st is a point in the polytope dual to the permutahedron. Algorithm CombBand of Cesa-Bianchi et al (2009) guarantees a regret of O(n√T log n) after T steps. Unfortunately, CombBand requires at each step an n-by-n matrix permanent computation, a #P-hard problem. Approximating the permanent is possible in the impractical running time of O(n10), with an additional heavy inverse-polynomial dependence on the sought accuracy. We provide an algorithm of slightly worse regret O(n3/2√T) but with more realistic time complexity O(n3) per step. The technical contribution is a bound on the variance of the Plackett-Luce noisy sorting process’s ‘pseudo loss’, obtained by establishing positive semi-definiteness of a family of 3-by-3 matrices of rational functions in exponents of 3 parameters.In the second regime, st is in the hypercube. For this case we present and analyze an algorithm based on Bubeck et al.’s (2012) OSMD approach with a novel projection and decomposition technique for the permutahedron. The algorithm is efficient and achieves a regret of O(n√T), but for a more restricted space of possible loss vectors.
AB - The permutahedron is the convex polytope with vertex set consisting of the vectors (π(1),…, π(n)) for all permutations (bijections) π over {1,…, n}. We study a bandit game in which, at each step t, an adversary chooses a hidden weight weight vector st, a player chooses a vertex πt of the permutahedron and suffers an observed instantaneous loss of ∑i n=1 πt(i)st(i).We study the problem in two regimes. In the first regime, st is a point in the polytope dual to the permutahedron. Algorithm CombBand of Cesa-Bianchi et al (2009) guarantees a regret of O(n√T log n) after T steps. Unfortunately, CombBand requires at each step an n-by-n matrix permanent computation, a #P-hard problem. Approximating the permanent is possible in the impractical running time of O(n10), with an additional heavy inverse-polynomial dependence on the sought accuracy. We provide an algorithm of slightly worse regret O(n3/2√T) but with more realistic time complexity O(n3) per step. The technical contribution is a bound on the variance of the Plackett-Luce noisy sorting process’s ‘pseudo loss’, obtained by establishing positive semi-definiteness of a family of 3-by-3 matrices of rational functions in exponents of 3 parameters.In the second regime, st is in the hypercube. For this case we present and analyze an algorithm based on Bubeck et al.’s (2012) OSMD approach with a novel projection and decomposition technique for the permutahedron. The algorithm is efficient and achieves a regret of O(n√T), but for a more restricted space of possible loss vectors.
UR - http://www.scopus.com/inward/record.url?scp=84910050330&partnerID=8YFLogxK
UR - http://www.scopus.com/inward/citedby.url?scp=84910050330&partnerID=8YFLogxK
U2 - 10.1007/978-3-319-11662-4_16
DO - 10.1007/978-3-319-11662-4_16
M3 - Conference contribution
AN - SCOPUS:84910050330
T3 - Lecture Notes in Computer Science (including subseries Lecture Notes in Artificial Intelligence and Lecture Notes in Bioinformatics)
SP - 215
EP - 229
BT - Algorithmic Learning Theory - 25th International Conference, ALT 2014, Proceedings
A2 - Auer, Peter
A2 - Clark, Alexander
A2 - Zeugmann, Thomas
A2 - Zilles, Sandra
PB - Springer Verlag
T2 - 25th International Conference on Algorithmic Learning Theory, ALT 2014
Y2 - 8 October 2014 through 10 October 2014
ER -