no code implementations • ICML 2020 • Wei Chen, Yihan Du, Longbo Huang, Haoyu Zhao
For the Borda winner, we establish a reduction of the problem to the original CPE-MAB setting and design PAC and exact algorithms that achieve both sample complexity similar to that of the CPE-MAB setting (which is nearly optimal for a subclass of problems) and polynomial running time per round.
no code implementations • 15 Feb 2024 • Yihan Du, Anna Winnicki, Gal Dalal, Shie Mannor, R. Srikant
In PO-RLHF, the reward function is not assumed to be known; instead, the algorithm relies on trajectory-based comparison feedback to infer it.
no code implementations • 17 Jan 2024 • Yihan Du, R. Srikant, Wei Chen
In the cascading bandit model, at each timestep, an agent recommends an ordered subset of items (called an item list) from a pool of items, each associated with an unknown attraction probability.
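The click model behind the cascading bandit can be sketched in a few lines. This is a minimal simulation, not the paper's algorithm: the item pool and attraction probabilities below are hypothetical, and the standard cascade assumption (the user scans the list in order and clicks the first attractive item, then stops) is used for the feedback.

```python
import random

def cascade_feedback(item_list, attraction_probs, rng=random.Random(0)):
    """Simulate cascading click feedback: the user scans the ordered
    item list and clicks the first item that attracts them (Bernoulli
    with that item's attraction probability), then stops examining.
    Returns the clicked position, or None if no item is clicked."""
    for pos, item in enumerate(item_list):
        if rng.random() < attraction_probs[item]:
            return pos
    return None

# Hypothetical pool of 5 items with attraction probabilities
# that are unknown to the learner in the actual bandit problem.
probs = {0: 0.9, 1: 0.2, 2: 0.5, 3: 0.1, 4: 0.7}
click = cascade_feedback([3, 1, 0], probs)
```

The learner only observes which position (if any) was clicked, which is what makes estimating the per-item attraction probabilities nontrivial.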
no code implementations • 6 Jul 2023 • Yu Chen, Yihan Du, Pihe Hu, Siwei Wang, Desheng Wu, Longbo Huang
Risk-sensitive reinforcement learning (RL) aims to optimize policies that balance the expected reward and risk.
no code implementations • 9 Feb 2023 • Yihan Du, Longbo Huang, Wen Sun
In these two problems, all tasks share a common low-dimensional linear representation, and our goal is to leverage this feature to accelerate the best arm (policy) identification process for all tasks.
no code implementations • 16 Nov 2022 • Yihan Du, Siwei Wang, Longbo Huang
DoublerBAI provides a generic schema for translating known results on best arm identification algorithms to the dueling bandit problem, and achieves a regret bound of $O(\ln T)$.
no code implementations • 6 Jun 2022 • Yihan Du, Siwei Wang, Longbo Huang
For Worst Path RL, we propose an efficient algorithm with constant upper and lower bounds.
no code implementations • 16 Feb 2022 • Yihan Du, Wei Chen
In this paper, we propose a novel Branching Reinforcement Learning (Branching RL) model, and investigate both Regret Minimization (RM) and Reward-Free Exploration (RFE) metrics for this model.
no code implementations • 29 Oct 2021 • Yihan Du, Wei Chen, Yuko Kuroki, Longbo Huang
In this paper, we formulate the Collaborative Pure Exploration in Kernel Bandit (CoPE-KB) problem, which provides a novel model for multi-agent, multi-task decision making under limited communication and general reward functions, and is applicable to many online learning tasks, e.g., recommendation systems and network scheduling.
no code implementations • NeurIPS 2021 • Yihan Du, Yuko Kuroki, Wei Chen
For the FC setting, we propose novel algorithms with optimal sample complexity for a broad family of instances and establish a matching lower bound to demonstrate the optimality (within a logarithmic factor).
no code implementations • NeurIPS 2021 • Yihan Du, Siwei Wang, Zhixuan Fang, Longbo Huang
To the best of our knowledge, this is the first work that considers option correlation in risk-aware bandits and explicitly quantifies how arbitrary covariance structures impact the learning performance.
no code implementations • 14 Dec 2020 • Yihan Du, Siwei Wang, Longbo Huang
In this paper, we study a family of conservative bandit problems (CBPs) with sample-path reward constraints, i.e., the learner's reward performance must be at least as good as a given baseline at any time.
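The sample-path constraint can be illustrated directly. The sketch below assumes, for simplicity, a fixed per-round baseline reward; it only checks whether a realized reward sequence satisfies the constraint and is not the paper's learning algorithm.

```python
def satisfies_conservative_constraint(rewards, baseline):
    """Check the sample-path reward constraint: at every time t, the
    learner's cumulative reward must be at least the baseline's
    cumulative reward (a constant per-round baseline is assumed here)."""
    cum_reward, cum_baseline = 0.0, 0.0
    for r in rewards:
        cum_reward += r
        cum_baseline += baseline
        if cum_reward < cum_baseline:
            return False  # constraint violated at this time step
    return True

# A sequence that stays ahead of a 0.5-per-round baseline at all times
ok = satisfies_conservative_constraint([1.0, 0.6, 0.5], 0.5)
```

Note the "at any time" quantifier: a sequence can end with a large cumulative reward yet still violate the constraint by falling behind early.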
no code implementations • 14 Jun 2020 • Yihan Du, Yuko Kuroki, Wei Chen
In this paper, we first study the problem of combinatorial pure exploration with full-bandit feedback (CPE-BL), where a learner is given a combinatorial action space $\mathcal{X} \subseteq \{0, 1\}^d$, and in each round the learner pulls an action $x \in \mathcal{X}$ and receives a random reward with expectation $x^{\top} \theta$, with $\theta \in \mathbb{R}^d$ a latent and unknown environment vector.
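The full-bandit feedback in CPE-BL can be sketched as a single noisy scalar per pull. This is an illustrative environment model, not the paper's algorithm: Gaussian noise is an assumed choice (the model only fixes the expectation $x^{\top}\theta$), and the action and $\theta$ values below are hypothetical.

```python
import random

def pull(x, theta, sigma=1.0, rng=None):
    """One round of full-bandit feedback in CPE-BL: pulling action
    x in {0,1}^d yields a single noisy scalar whose expectation is
    x^T theta. The learner never observes per-coordinate rewards."""
    mean = sum(xi * ti for xi, ti in zip(x, theta))
    noise = (rng or random.Random(0)).gauss(0.0, sigma)
    return mean + noise

# With sigma=0 the call exposes the underlying linear expectation.
reward = pull([1, 0, 1], [0.5, 0.25, 0.25], sigma=0.0)
```

Because only the aggregate $x^{\top}\theta$ plus noise is observed, the learner must disentangle the latent coordinates of $\theta$ from carefully chosen combinatorial actions.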
no code implementations • 7 Feb 2020 • Yihan Du, Yan Yan, Si Chen, Yang Hua
This strategy efficiently filters out irrelevant proposals and avoids redundant feature-extraction computation, which enables our method to operate faster than conventional classification-based tracking methods.
no code implementations • CVPR 2019 • Xin Lei, Liangyu He, Yixuan Tan, Ken Xingze Wang, Xinggang Wang, Yihan Du, Shanhui Fan, Zongfu Yu
Visual object recognition in situations where the direct line of sight is blocked, such as when an object is occluded around a corner, is of practical importance in a wide range of applications.