no code implementations • ICML 2020 • Wei Chen, Yihan Du, Longbo Huang, Haoyu Zhao
For the Borda winner, we establish a reduction of the problem to the original CPE-MAB setting and design PAC and exact algorithms that achieve both sample complexity similar to that of the CPE-MAB setting (which is nearly optimal for a subclass of problems) and polynomial running time per round.
no code implementations • 15 Feb 2024 • Yihan Du, Anna Winnicki, Gal Dalal, Shie Mannor, R. Srikant
In PO-RLHF, the reward function is not assumed to be known; instead, the algorithm relies on trajectory-based comparison feedback to infer it.
no code implementations • 17 Jan 2024 • Yihan Du, R. Srikant, Wei Chen
In the cascading bandit model, at each timestep, an agent recommends an ordered subset of items (called an item list) from a pool of items, each associated with an unknown attraction probability.
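The click model behind the cascading bandit can be sketched in a few lines. This is a minimal simulation, not the paper's algorithm: the item pool and attraction probabilities below are hypothetical, and the standard cascade assumption (the user scans the list in order and clicks the first attractive item, then stops) is used for the feedback.

```python
import random

def cascade_feedback(item_list, attraction_probs, rng=random.Random(0)):
    """Simulate cascading click feedback: the user scans the ordered
    item list and clicks the first item that attracts them (Bernoulli
    with that item's attraction probability), then stops examining.
    Returns the clicked position, or None if no item is clicked."""
    for pos, item in enumerate(item_list):
        if rng.random() < attraction_probs[item]:
            return pos
    return None

# Hypothetical pool of 5 items with attraction probabilities
# that are unknown to the learner in the actual bandit problem.
probs = {0: 0.9, 1: 0.2, 2: 0.5, 3: 0.1, 4: 0.7}
click = cascade_feedback([3, 1, 0], probs)
```

The learner only observes which position (if any) was clicked, which is what makes estimating the per-item attraction probabilities nontrivial.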
no code implementations • 6 Jul 2023 • Yu Chen, Yihan Du, Pihe Hu, Siwei Wang, Desheng Wu, Longbo Huang
Risk-sensitive reinforcement learning (RL) aims to optimize policies that balance the expected reward and risk.
no code implementations • 9 Feb 2023 • Yihan Du, Longbo Huang, Wen Sun
In these two problems, all tasks share a common low-dimensional linear representation, and our goal is to leverage this feature to accelerate the best arm (policy) identification process for all tasks.
no code implementations • 16 Nov 2022 • Yihan Du, Siwei Wang, Longbo Huang
DoublerBAI provides a generic schema for translating known results on best arm identification algorithms to the dueling bandit problem, and achieves a regret bound of $O(\ln T)$.
no code implementations • 6 Jun 2022 • Yihan Du, Siwei Wang, Longbo Huang
For Worst Path RL, we propose an efficient algorithm with constant upper and lower bounds.
no code implementations • 16 Feb 2022 • Yihan Du, Wei Chen
In this paper, we propose a novel Branching Reinforcement Learning (Branching RL) model, and investigate both Regret Minimization (RM) and Reward-Free Exploration (RFE) metrics for this model.
no code implementations • 29 Oct 2021 • Yihan Du, Wei Chen, Yuko Kuroki, Longbo Huang
In this paper, we formulate the Collaborative Pure Exploration in Kernel Bandit (CoPE-KB) problem, which provides a novel model for multi-agent, multi-task decision making under limited communication and general reward functions, and is applicable to many online learning tasks, e.g., recommendation systems and network scheduling.
no code implementations • NeurIPS 2021 • Yihan Du, Yuko Kuroki, Wei Chen
For the FC setting, we propose novel algorithms with optimal sample complexity for a broad family of instances and establish a matching lower bound to demonstrate the optimality (within a logarithmic factor).
no code implementations • NeurIPS 2021 • Yihan Du, Siwei Wang, Zhixuan Fang, Longbo Huang
To the best of our knowledge, this is the first work that considers option correlation in risk-aware bandits and explicitly quantifies how arbitrary covariance structures impact the learning performance.
no code implementations • 14 Dec 2020 • Yihan Du, Siwei Wang, Longbo Huang
In this paper, we study a family of conservative bandit problems (CBPs) with sample-path reward constraints, i.e., the learner's reward performance must be at least as good as a given baseline at any time.
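The sample-path constraint can be illustrated directly. The sketch below assumes, for simplicity, a fixed per-round baseline reward; it only checks whether a realized reward sequence satisfies the constraint and is not the paper's learning algorithm.

```python
def satisfies_conservative_constraint(rewards, baseline):
    """Check the sample-path reward constraint: at every time t, the
    learner's cumulative reward must be at least the baseline's
    cumulative reward (a constant per-round baseline is assumed here)."""
    cum_reward, cum_baseline = 0.0, 0.0
    for r in rewards:
        cum_reward += r
        cum_baseline += baseline
        if cum_reward < cum_baseline:
            return False  # constraint violated at this time step
    return True

# A sequence that stays ahead of a 0.5-per-round baseline at all times
ok = satisfies_conservative_constraint([1.0, 0.6, 0.5], 0.5)
```

Note the "at any time" quantifier: a sequence can end with a large cumulative reward yet still violate the constraint by falling behind early.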
no code implementations • 14 Jun 2020 • Yihan Du, Yuko Kuroki, Wei Chen
In this paper, we first study the problem of combinatorial pure exploration with full-bandit feedback (CPE-BL), where a learner is given a combinatorial action space $\mathcal{X} \subseteq \{0, 1\}^d$, and in each round the learner pulls an action $x \in \mathcal{X}$ and receives a random reward with expectation $x^{\top} \theta$, with $\theta \in \mathbb{R}^d$ a latent and unknown environment vector.
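The full-bandit feedback in CPE-BL can be sketched as a single noisy scalar per pull. This is an illustrative environment model, not the paper's algorithm: Gaussian noise is an assumed choice (the model only fixes the expectation $x^{\top}\theta$), and the action and $\theta$ values below are hypothetical.

```python
import random

def pull(x, theta, sigma=1.0, rng=None):
    """One round of full-bandit feedback in CPE-BL: pulling action
    x in {0,1}^d yields a single noisy scalar whose expectation is
    x^T theta. The learner never observes per-coordinate rewards."""
    mean = sum(xi * ti for xi, ti in zip(x, theta))
    noise = (rng or random.Random(0)).gauss(0.0, sigma)
    return mean + noise

# With sigma=0 the call exposes the underlying linear expectation.
reward = pull([1, 0, 1], [0.5, 0.25, 0.25], sigma=0.0)
```

Because only the aggregate $x^{\top}\theta$ plus noise is observed, the learner must disentangle the latent coordinates of $\theta$ from carefully chosen combinatorial actions.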
no code implementations • 7 Feb 2020 • Yihan Du, Yan Yan, Si Chen, Yang Hua
This strategy efficiently filters out irrelevant proposals and avoids redundant feature-extraction computation, which enables our method to operate faster than conventional classification-based tracking methods.
no code implementations • CVPR 2019 • Xin Lei, Liangyu He, Yixuan Tan, Ken Xingze Wang, Xinggang Wang, Yihan Du, Shanhui Fan, Zongfu Yu
Visual object recognition in situations where the direct line of sight is blocked, such as when an object is occluded around a corner, is of practical importance in a wide range of applications.