no code implementations • 16 Apr 2024 • Weitong Zhang, Zhiyuan Fan, Jiafan He, Quanquan Gu
To the best of our knowledge, Cert-LSVI-UCB is the first algorithm to achieve a constant, instance-dependent, high-probability regret bound in RL with linear function approximation for infinite runs without relying on prior distribution assumptions.
no code implementations • 29 Feb 2024 • Zijie Huang, Jeehyun Hwang, Junkai Zhang, Jinwoo Baik, Weitong Zhang, Dominik Wodarz, Yizhou Sun, Quanquan Gu, Wei Wang
Real-world multi-agent systems are often dynamic and continuous, where the agents co-evolve and undergo changes in their trajectories and interactions over time.
no code implementations • 13 Feb 2024 • Linxi Zhao, Yihe Deng, Weitong Zhang, Quanquan Gu
The advancement of Large Vision-Language Models (LVLMs) has increasingly highlighted the critical issue of their tendency to hallucinate non-existent objects in images.
3 code implementations • 7 Nov 2023 • Yihe Deng, Weitong Zhang, Zixiang Chen, Quanquan Gu
While it is widely acknowledged that the quality of a prompt, such as a question, significantly impacts the quality of the response provided by LLMs, a systematic method for crafting questions that LLMs can better comprehend is still underdeveloped.
1 code implementation • 17 Aug 2023 • Berke Doga Basaran, Weitong Zhang, Mengyun Qiao, Bernhard Kainz, Paul M. Matthews, Wenjia Bai
Data augmentation has become a de facto component of deep learning-based medical image segmentation methods.
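As an illustration of the kind of augmentation such segmentation pipelines rely on, the sketch below applies identical geometric transforms to an image and its label mask (so the labels stay aligned) while restricting intensity perturbation to the image alone. This is a generic minimal example, not the augmentation scheme proposed in the paper above.

```python
import numpy as np

def augment_pair(image, mask, rng):
    """Apply the same random geometric transform to an image and its
    segmentation mask; intensity noise is applied to the image only."""
    if rng.random() < 0.5:                       # random horizontal flip
        image, mask = image[:, ::-1], mask[:, ::-1]
    k = int(rng.integers(0, 4))                  # random 90-degree rotation
    image, mask = np.rot90(image, k), np.rot90(mask, k)
    image = image + rng.normal(0.0, 0.05, image.shape)  # image-only jitter
    return image, mask

rng = np.random.default_rng(0)
img = np.zeros((64, 64)); img[10:20, 10:20] = 1.0
msk = (img > 0.5).astype(np.uint8)
aug_img, aug_msk = augment_pair(img, msk, rng)
```

Because flips and 90-degree rotations are label-preserving, the augmented mask contains exactly the same number of foreground pixels as the original.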
1 code implementation • 17 Jul 2023 • Che Liu, Sibo Cheng, Chen Chen, Mengyun Qiao, Weitong Zhang, Anand Shah, Wenjia Bai, Rossella Arcucci
The proposed method, named Medical vision-language pre-training with Frozen language models and Latent spAce Geometry optimization (M-FLAG), leverages a frozen language model for training stability and efficiency, and introduces a novel orthogonality loss to harmonize the latent space geometry.
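To give a sense of what an orthogonality penalty on a latent space looks like, here is a common generic formulation: drive the (standardized) feature Gram matrix toward the identity, which decorrelates the latent dimensions. The exact loss used in M-FLAG may differ; this is a hedged sketch of the general idea only.

```python
import numpy as np

def orthogonality_loss(z):
    """Generic orthogonality penalty: || Z^T Z / n - I ||_F^2.
    Off-diagonal entries of the Gram matrix measure correlation between
    latent dimensions; penalizing them pushes the axes toward orthogonality."""
    z = (z - z.mean(axis=0)) / (z.std(axis=0) + 1e-8)  # standardize each dim
    n, d = z.shape
    gram = z.T @ z / n
    return float(np.sum((gram - np.eye(d)) ** 2))

rng = np.random.default_rng(0)
z = rng.normal(size=(256, 8))      # nearly uncorrelated features: small loss
loss = orthogonality_loss(z)
```

Perfectly redundant features (every dimension a copy of the same signal) yield an all-ones Gram matrix and hence a much larger penalty than independent features.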
no code implementations • 11 Jul 2023 • Daoan Zhang, Weitong Zhang, Yu Zhao, JianGuo Zhang, Bing He, Chenchen Qin, Jianhua Yao
Pre-trained large language models demonstrate potential in extracting information from DNA sequences, yet adapting to a variety of tasks and data modalities remains a challenge.
no code implementations • 2 Jul 2023 • Jingjie Guo, Weitong Zhang, Matthew Sinclair, Daniel Rueckert, Chen Chen
In addition, different from most existing TTA methods which restrict the adaptation to batch normalization blocks in the segmentation network only, we further exploit the use of channel and spatial attention blocks for improved adaptability at test time.
no code implementations • 15 May 2023 • Kaixuan Ji, Qingyue Zhao, Jiafan He, Weitong Zhang, Quanquan Gu
Recent studies have shown that episodic reinforcement learning (RL) is no harder than bandits when the total reward is bounded by $1$, and proved regret bounds that have a polylogarithmic dependence on the planning horizon $H$.
no code implementations • 9 May 2023 • Songling Zhu, Ronghua Shang, Bo Yuan, Weitong Zhang, Yangyang Li, Licheng Jiao
This paper proposes a novel knowledge distillation algorithm based on dynamic entropy correction to reduce the gap by adjusting the student instead of the teacher.
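For context, the standard knowledge-distillation objective (Hinton et al.) that such methods build on combines cross-entropy with the hard labels and a temperature-softened KL term against the teacher. The sketch below shows only this baseline objective, not the dynamic entropy correction proposed in the paper above.

```python
import numpy as np

def softmax(x, axis=-1):
    e = np.exp(x - x.max(axis=axis, keepdims=True))
    return e / e.sum(axis=axis, keepdims=True)

def kd_loss(student_logits, teacher_logits, labels, T=4.0, alpha=0.5):
    """Standard distillation loss: alpha * CE(student, labels)
    + (1 - alpha) * T^2 * KL(teacher_T || student_T), with
    temperature-softened distributions."""
    p_s = softmax(student_logits / T)
    p_t = softmax(teacher_logits / T)
    kl = np.sum(p_t * (np.log(p_t + 1e-12) - np.log(p_s + 1e-12)), axis=1).mean()
    ce = -np.log(softmax(student_logits)[np.arange(len(labels)), labels] + 1e-12).mean()
    return float(alpha * ce + (1 - alpha) * (T ** 2) * kl)

rng = np.random.default_rng(0)
logits_t = rng.normal(size=(16, 10))
labels = rng.integers(0, 10, size=16)
loss_match = kd_loss(logits_t, logits_t, labels)          # KL term vanishes
loss_rand = kd_loss(rng.normal(size=(16, 10)), logits_t, labels)
```

When the student exactly matches the teacher, the KL term is zero and only the label cross-entropy remains, which is the sanity check below.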
no code implementations • 28 Mar 2023 • Ronghua Shang, Songling Zhu, Yinan Wu, Weitong Zhang, Licheng Jiao, Songhua Xu
To this end, a multi-objective complex network pruning framework based on divide-and-conquer and global performance impairment ranking (EMO-DIR) is proposed in this paper.
no code implementations • 17 Mar 2023 • Junkai Zhang, Weitong Zhang, Quanquan Gu
The sample complexity of our algorithm only has a polylogarithmic dependence on the planning horizon and therefore is "horizon-free".
no code implementations • 16 Mar 2023 • Weitong Zhang, Jiafan He, Zhiyuan Fan, Quanquan Gu
We show that, when the misspecification level $\zeta$ is dominated by $\tilde O (\Delta / \sqrt{d})$ with $\Delta$ being the minimal sub-optimality gap and $d$ being the dimension of the contextual vectors, our algorithm enjoys the same gap-dependent regret bound $\tilde O (d^2/\Delta)$ as in the well-specified setting up to logarithmic factors.
no code implementations • ICLR 2022 • Yiling Jia, Weitong Zhang, Dongruo Zhou, Quanquan Gu, Hongning Wang
Thanks to the power of representation learning, neural contextual bandit algorithms demonstrate remarkable performance improvement against their classical counterparts.
no code implementations • NeurIPS 2021 • Weitong Zhang, Dongruo Zhou, Quanquan Gu
By constructing a special class of linear Mixture MDPs, we also prove that for any reward-free algorithm, it needs to sample at least $\tilde \Omega(H^2d\epsilon^{-2})$ episodes to obtain an $\epsilon$-optimal policy.
no code implementations • 22 Jun 2021 • Weitong Zhang, Jiafan He, Dongruo Zhou, Amy Zhang, Quanquan Gu
For the offline counterpart, ReLEX-LCB, we show that the algorithm can find the optimal policy if the representation class can cover the state-action space and achieves gap-dependent sample complexity.
no code implementations • NeurIPS 2020 • Yue Wu, Weitong Zhang, Pan Xu, Quanquan Gu
In this work, we provide a non-asymptotic analysis for two time-scale actor-critic methods under non-i.i.d.
2 code implementations • ICLR 2021 • Weitong Zhang, Dongruo Zhou, Lihong Li, Quanquan Gu
Thompson Sampling (TS) is one of the most effective algorithms for solving contextual multi-armed bandit problems.
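For readers unfamiliar with Thompson Sampling, here is a minimal sketch of the classical Beta-Bernoulli variant for (non-contextual) multi-armed bandits: sample a plausible mean reward for each arm from its posterior, play the argmax, and update. This is the textbook algorithm, not the neural/contextual variant studied in the paper above.

```python
import numpy as np

def thompson_sampling(true_probs, n_rounds=2000, seed=0):
    """Beta-Bernoulli Thompson Sampling with a uniform Beta(1, 1) prior."""
    rng = np.random.default_rng(seed)
    k = len(true_probs)
    successes = np.ones(k)   # Beta posterior alpha parameters
    failures = np.ones(k)    # Beta posterior beta parameters
    total_reward = 0
    for _ in range(n_rounds):
        samples = rng.beta(successes, failures)  # one posterior draw per arm
        arm = int(np.argmax(samples))            # play the optimistic draw
        reward = int(rng.random() < true_probs[arm])
        total_reward += reward
        if reward:
            successes[arm] += 1
        else:
            failures[arm] += 1
    return total_reward, successes, failures

total, succ, fail = thompson_sampling([0.2, 0.5, 0.8])
```

Because posterior draws for clearly inferior arms rarely win the argmax, play concentrates on the best arm, which is what gives TS its low regret.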
no code implementations • 4 May 2020 • Yue Wu, Weitong Zhang, Pan Xu, Quanquan Gu
In this work, we provide a non-asymptotic analysis for two time-scale actor-critic methods under non-i.i.d.
no code implementations • 27 Jul 2018 • Weitong Zhang
We apply Faster R-CNN to the detection of characters in name cards. To address the problems of limited data and class imbalance, we designed a data augmentation pipeline and a 'fake' data generator to produce additional training data for the network.