NeurIPS 2016 • Siddartha Y. Ramamohan, Arun Rajkumar, Shivani Agarwal
Recent work on deriving $O(\log T)$ anytime regret bounds for stochastic dueling bandit problems has considered mostly Condorcet winners, which do not always exist, and more recently, winners defined by the Copeland set, which do always exist.
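To make the distinction between the two winner notions concrete, here is a minimal sketch under stated assumptions. It takes a pairwise preference matrix `P`, where `P[i][j]` is the probability that arm `i` beats arm `j` in a duel, and uses the standard definitions: a Condorcet winner beats every other arm with probability above 1/2, and the Copeland set contains the arms that beat the largest number of other arms. The function names and the example matrices are illustrative and do not come from the paper.

```python
from typing import List, Optional, Set

Matrix = List[List[float]]


def condorcet_winner(P: Matrix) -> Optional[int]:
    """Return the arm that beats every other arm with probability > 1/2, or None."""
    n = len(P)
    for i in range(n):
        if all(P[i][j] > 0.5 for j in range(n) if j != i):
            return i
    return None


def copeland_set(P: Matrix) -> Set[int]:
    """Return the arms that beat the largest number of other arms."""
    n = len(P)
    # Copeland score of arm i: how many other arms it beats with probability > 1/2.
    scores = [
        sum(1 for j in range(n) if j != i and P[i][j] > 0.5)
        for i in range(n)
    ]
    best = max(scores)
    return {i for i, s in enumerate(scores) if s == best}


# Hypothetical cyclic preferences: 0 beats 1, 1 beats 2, 2 beats 0.
P_cycle = [
    [0.5, 0.6, 0.4],
    [0.4, 0.5, 0.6],
    [0.6, 0.4, 0.5],
]

# Hypothetical preferences in which arm 0 beats both other arms.
P_cw = [
    [0.5, 0.7, 0.8],
    [0.3, 0.5, 0.6],
    [0.2, 0.4, 0.5],
]

cycle_condorcet = condorcet_winner(P_cycle)
cycle_copeland = copeland_set(P_cycle)
cw_condorcet = condorcet_winner(P_cw)
cw_copeland = copeland_set(P_cw)
```

In the cyclic example no arm beats all others, so there is no Condorcet winner, yet the Copeland set is still non-empty because some arm always attains the maximum score. When a Condorcet winner does exist, as in the second matrix, it is the sole member of the Copeland set.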