TY - GEN
T1 - Budgeted Recommendation with Delayed Feedback
AU - Liu, Kweiguu
AU - Maghsudi, Setareh
AU - Yokoo, Makoto
N1 - Publisher Copyright: © The Author(s), under exclusive license to Springer Nature Switzerland AG 2024.
PY - 2024
Y1 - 2024
N2 - In a conventional contextual multi-armed bandit problem, the feedback (or reward) is immediately observable after an action. Nevertheless, delayed feedback arises in numerous real-life situations and is particularly crucial in time-sensitive applications. The exploration-exploitation dilemma becomes particularly challenging under such conditions, as it couples with the interplay between delays and limited resources. Besides, a limited budget often aggravates the problem by restricting the exploration potential. A motivating example is the distribution of medical supplies at the early stage of COVID-19. The delayed feedback of testing results, thus insufficient information for learning, degraded the efficiency of resource allocation. Motivated by such applications, we study the effect of delayed feedback on constrained contextual bandits. We develop a decision-making policy, delay-oriented resource allocation with learning (DORAL), to optimize the resource expenditure in a contextual multi-armed bandit problem with arm-dependent delayed feedback.
KW - Budget Constraints
KW - Delayed Feedback
KW - Online Learning
KW - Resource Allocation
UR - http://www.scopus.com/inward/record.url?scp=85194268181&partnerID=8YFLogxK
UR - http://www.scopus.com/inward/citedby.url?scp=85194268181&partnerID=8YFLogxK
U2 - 10.1007/978-3-031-60221-4_20
DO - 10.1007/978-3-031-60221-4_20
M3 - Conference contribution
AN - SCOPUS:85194268181
SN - 9783031602207
T3 - Lecture Notes in Networks and Systems
SP - 202
EP - 213
BT - Good Practices and New Perspectives in Information Systems and Technologies - WorldCIST 2024
A2 - Rocha, Álvaro
A2 - Adeli, Hojjat
A2 - Dzemyda, Gintautas
A2 - Moreira, Fernando
A2 - Poniszewska-Maranda, Aneta
PB - Springer Science and Business Media Deutschland GmbH
T2 - 12th World Conference on Information Systems and Technologies, WorldCIST 2024
Y2 - 26 March 2024 through 28 March 2024
ER -