no code implementations • 20 Feb 2024 • Yuanguo Lin, Fan Lin, Guorong Cai, Hong Chen, Lixin Zou, Pengcheng Wu
In response to the limitations of reinforcement learning and evolutionary algorithms (EAs) in complex problem-solving, Evolutionary Reinforcement Learning (EvoRL) has emerged as a synergistic solution.
no code implementations • 17 May 2023 • Dan Luo, Lixin Zou, Qingyao Ai, Zhiyu Chen, Chenliang Li, Dawei Yin, Brian D. Davison
The goal of unbiased learning to rank (ULTR) is to leverage implicit user feedback for optimizing learning-to-rank systems.
1 code implementation • 11 Mar 2023 • Kesen Zhao, Lixin Zou, Xiangyu Zhao, Maolin Wang, Dawei Yin
However, deploying the Decision Transformer (DT) in recommendation is a non-trivial problem because of the following challenges: (1) deficiency in modeling the numerical reward value; (2) data discrepancy between the policy learning and recommendation generation; (3) unreliable offline performance evaluation.
no code implementations • 19 Oct 2022 • Haitao Mao, Lixin Zou, Yujia Zheng, Jiliang Tang, Xiaokai Chu, Jiashu Zhao, Qian Wang, Dawei Yin
To address the above challenges, we propose a Bias Agnostic whole-page unbiased Learning to rank algorithm, named BAL, to automatically find the user behavior model with causal discovery and mitigate the biases induced by multiple SERP features with no specific design.
no code implementations • 16 Aug 2022 • Lixin Zou, Changying Hao, Hengyi Cai, Suqi Cheng, Shuaiqiang Wang, Wenwen Ye, Zhicong Cheng, Simiu Gu, Dawei Yin
We further instantiate the proposed unbiased relevance estimation framework in Baidu search, with comprehensive practical solutions designed regarding the data pipeline for click behavior tracking and online relevance estimation with an approximated deep neural network.
1 code implementation • 24 Jul 2022 • Dan Luo, Lixin Zou, Qingyao Ai, Zhiyu Chen, Dawei Yin, Brian D. Davison
Existing methods in unbiased learning to rank typically rely on click modeling or inverse propensity weighting (IPW).
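As a rough illustration of the inverse propensity weighting idea mentioned above (not the method of this paper), the sketch below reweights each logged click by the inverse of the probability that its rank position was examined. The function name, the log format, and the propensity values are all hypothetical, and the propensities are assumed to be known in advance.

```python
def ipw_relevance_estimates(click_logs, propensities):
    """Estimate per-document relevance from biased click logs.

    click_logs: iterable of (doc_id, position, clicked) tuples (hypothetical format).
    propensities: dict mapping position -> assumed examination probability.
    """
    weighted_clicks = {}
    impressions = {}
    for doc_id, position, clicked in click_logs:
        # A click at a rarely examined position counts for more.
        weight = 1.0 / propensities[position]
        weighted_clicks[doc_id] = weighted_clicks.get(doc_id, 0.0) + (weight if clicked else 0.0)
        impressions[doc_id] = impressions.get(doc_id, 0) + 1
    # Average the propensity-weighted clicks over all impressions of each document.
    return {doc: weighted_clicks[doc] / impressions[doc] for doc in weighted_clicks}


logs = [("a", 1, True), ("a", 2, False), ("b", 2, True), ("b", 2, False)]
estimates = ipw_relevance_estimates(logs, {1: 1.0, 2: 0.5})
```

In this toy log, document "b" is only ever shown at the less-examined position, so its single click is weighted up relative to the click on "a".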
1 code implementation • 7 Jul 2022 • Lixin Zou, Haitao Mao, Xiaokai Chu, Jiliang Tang, Wenwen Ye, Shuaiqiang Wang, Dawei Yin
The unbiased learning to rank (ULTR) problem has been greatly advanced by recent deep learning techniques and well-designed debias algorithms.
no code implementations • 5 Jul 2022 • Pan Du, Jian-Yun Nie, Yutao Zhu, Hao Jiang, Lixin Zou, Xiaohui Yan
Beyond topical relevance, passage ranking for open-domain factoid question answering also requires a passage to contain an answer (answerability).
no code implementations • 22 Sep 2021 • Yuanguo Lin, Yong Liu, Fan Lin, Lixin Zou, Pengcheng Wu, Wenhua Zeng, Huanhuan Chen, Chunyan Miao
To understand the challenges and relevant solutions, there should be a reference for researchers and practitioners working on RL-based recommender systems.
1 code implementation • 28 May 2021 • Siyuan Guo, Lixin Zou, Yiding Liu, Wenwen Ye, Suqi Cheng, Shuaiqiang Wang, Hechang Chen, Dawei Yin, Yi Chang
Based on it, a more robust doubly robust (MRDR) estimator has been proposed to further reduce its variance while retaining its double robustness.
no code implementations • 24 May 2021 • Lixin Zou, Shengqiang Zhang, Hengyi Cai, Dehong Ma, Suqi Cheng, Daiting Shi, Zhifan Zhu, Weiyue Su, Shuaiqiang Wang, Zhicong Cheng, Dawei Yin
However, it is nontrivial to directly apply these PLM-based rankers to the large-scale web search system due to the following challenging issues: (1) the prohibitively expensive computations of massive neural PLMs, especially for long texts in web documents, prohibit their deployment in an online ranking system that demands extremely low latency; (2) the discrepancy between existing ranking-agnostic pre-training objectives and the ad-hoc retrieval scenarios that demand comprehensive relevance modeling is another main barrier for improving the online ranking system; (3) a real-world search engine typically involves a committee of ranking components, and thus the compatibility of the individually fine-tuned ranking model is critical for a cooperative ranking system.
no code implementations • 4 May 2021 • Lixin Zou, Long Xia, Linfang Hou, Xiangyu Zhao, Dawei Yin
This work introduces a practical, data-efficient policy learning method, named Variance-Bonus Monte Carlo Tree Search (VB-MCTS), which can cope with very little data and facilitate learning from scratch in only a few trials.
no code implementations • 29 Nov 2020 • Jinlin Lai, Lixin Zou, Jiaxing Song
Off-policy evaluation is a key component of reinforcement learning which evaluates a target policy with offline data collected from behavior policies.
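To make the off-policy evaluation setting concrete, here is a minimal sketch of the standard importance sampling estimator for the single-step (bandit) case. It is a generic textbook estimator, not the approach proposed in this paper; the function name, the log format, and the toy policies are assumptions made for illustration.

```python
def importance_sampling_value(logged, target_policy):
    """Estimate the target policy's expected reward from behavior-policy logs.

    logged: list of (context, action, reward, behavior_prob) tuples, where
        behavior_prob is the probability the behavior policy gave the logged action.
    target_policy: callable (context, action) -> probability under the target policy.
    """
    total = 0.0
    for context, action, reward, behavior_prob in logged:
        # Reweight each reward by how much more (or less) likely the target
        # policy is to take the logged action than the behavior policy was.
        ratio = target_policy(context, action) / behavior_prob
        total += ratio * reward
    return total / len(logged)


def always_action_zero(context, action):
    # Hypothetical deterministic target policy.
    return 1.0 if action == 0 else 0.0


logged = [("s", 0, 1.0, 0.5), ("s", 1, 0.0, 0.5)]
value = importance_sampling_value(logged, always_action_zero)
```

The estimate is unbiased only when the behavior policy assigns nonzero probability to every action the target policy can take, and its variance grows as the two policies diverge.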
no code implementations • 26 Oct 2020 • Zhenzhen Li, Jian-Yun Nie, Benyou Wang, Pan Du, Yuhan Zhang, Lixin Zou, Dongsheng Li
Distant supervision provides a means to create a large amount of weakly labeled data at low cost for relation classification.
1 code implementation • 4 Jul 2020 • Lixin Zou, Long Xia, Yulong Gu, Xiangyu Zhao, Weidong Liu, Jimmy Xiangji Huang, Dawei Yin
Therefore, the proposed exploration policy, to balance between learning the user profile and making accurate recommendations, can be directly optimized by maximizing users' long-term satisfaction with reinforcement learning.
no code implementations • 27 Jun 2019 • Xiangyu Zhao, Long Xia, Lixin Zou, Dawei Yin, Jiliang Tang
Thus, it calls for a user simulator that can mimic real users' behaviors where we can pre-train and evaluate new recommendation algorithms.
no code implementations • 13 Feb 2019 • Lixin Zou, Long Xia, Zhuoye Ding, Jiaxing Song, Weidong Liu, Dawei Yin
Though reinforcement learning (RL) naturally fits the problem of maximizing long-term rewards, applying RL to optimize long-term user engagement still faces challenges: user behaviors are versatile and difficult to model, and typically consist of both instant feedback (e.g., clicks, ordering) and delayed feedback (e.g., dwell time, revisit); in addition, performing effective off-policy learning is still immature, especially when combining bootstrapping and function approximation.