no code implementations • 11 Feb 2020 • Adil El Mesaoudi-Paul, Viktor Bengs, Eyke Hüllermeier
We consider an extension of the contextual multi-armed bandit problem in which, instead of selecting a single alternative (arm), the learner preselects a subset of alternatives.
no code implementations • 30 Jul 2018 • Viktor Bengs, Robert Busa-Fekete, Adil El Mesaoudi-Paul, Eyke Hüllermeier
This paper surveys the state of the art in the field known as preference-based multi-armed bandits, or dueling bandits.
no code implementations • ICML 2018 • Adil El Mesaoudi-Paul, Eyke Hüllermeier, Robert Busa-Fekete
We also introduce a generalization of the model, in which the constraints on pairwise preferences are relaxed, and for which maximum likelihood estimation can be carried out based on a variation of the generalized iterative scaling algorithm.