no code implementations • 4 Apr 2024 • Zhiyue Zhang, Yao Zhao, Yanxun Xu
However, current methods address only the joint modeling of longitudinal measurements at regularly spaced observation times and survival events, neglecting recurrent events.
no code implementations • 21 Jan 2024 • Mao Hong, Zhiyue Zhang, Yue Wu, Yanxun Xu
Model-based offline reinforcement learning (RL) methods have achieved state-of-the-art performance in many decision-making problems thanks to their sample efficiency and generalizability.
no code implementations • 26 May 2023 • Mao Hong, Zhengling Qi, Yanxun Xu
To the best of our knowledge, this is the first work studying the policy gradient method for POMDPs under the offline setting.
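The paper's POMDP-specific estimator is not given in this snippet, but the underlying idea of a policy gradient method can be illustrated generically. The sketch below is a minimal score-function (REINFORCE-style) gradient for a softmax policy, computed from a fixed batch of logged actions and returns; it is a textbook illustration, not the authors' offline POMDP algorithm, and the data values are made up.

```python
import math

def softmax(logits):
    """Numerically stable softmax over a list of logits."""
    m = max(logits)
    exps = [math.exp(x - m) for x in logits]
    s = sum(exps)
    return [e / s for e in exps]

def policy_gradient(theta, actions, returns):
    """Average score-function gradient: mean over the batch of
    grad log pi(a; theta) * G, for a softmax policy over actions."""
    grad = [0.0] * len(theta)
    for a, g in zip(actions, returns):
        p = softmax(theta)
        for i in range(len(theta)):
            indicator = 1.0 if i == a else 0.0
            grad[i] += (indicator - p[i]) * g  # d log softmax(theta)[a] / d theta_i
    return [gi / len(actions) for gi in grad]

theta = [0.0, 0.0]
# Offline batch: action 1 earned higher returns than action 0.
grad = policy_gradient(theta, actions=[0, 1, 1], returns=[0.0, 1.0, 1.0])
theta = [t + 0.5 * g for t, g in zip(theta, grad)]  # one ascent step
```

A single ascent step along this gradient shifts probability mass toward the higher-return action; the offline setting studied in the paper adds the harder question of doing this without further environment interaction.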
no code implementations • 18 Sep 2022 • Zuyue Fu, Zhengling Qi, Zhaoran Wang, Zhuoran Yang, Yanxun Xu, Michael R. Kosorok
Due to the lack of online interaction with the environment, offline RL faces two significant challenges: (i) the agent may be confounded by unobserved state variables; (ii) the offline data collected a priori do not provide sufficient coverage of the environment.
1 code implementation • 20 May 2021 • Xiao Sun, Bahador Bahmani, Nikolaos N. Vlassis, WaiChing Sun, Yanxun Xu
This paper presents a computational framework that generates ensemble predictive mechanics models with uncertainty quantification (UQ).
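The snippet does not detail the framework, but the general pattern of ensemble-based uncertainty quantification can be sketched. Below is a minimal, generic illustration (not the paper's mechanics framework): several models are trained on bootstrap resamples of the data, and the spread of their predictions serves as an uncertainty estimate. The "model" here is deliberately trivial (it predicts the sample mean) and the data are invented.

```python
import random
import statistics

def fit_mean_model(sample):
    """A deliberately simple stand-in 'model': predicts the sample mean."""
    return statistics.fmean(sample)

def ensemble_predict(data, n_models=50, seed=0):
    """Train an ensemble on bootstrap resamples; return the ensemble's
    predictive mean and its spread (a simple UQ estimate)."""
    rng = random.Random(seed)
    preds = []
    for _ in range(n_models):
        boot = [rng.choice(data) for _ in data]  # bootstrap resample
        preds.append(fit_mean_model(boot))
    return statistics.fmean(preds), statistics.stdev(preds)

data = [9.8, 10.1, 10.0, 9.9, 10.2]  # toy measurements
mu, sigma = ensemble_predict(data)    # prediction with uncertainty
```

Replacing `fit_mean_model` with any real learner (e.g. a neural network) gives the usual deep-ensemble recipe: report the ensemble mean as the prediction and the ensemble spread as the uncertainty.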
no code implementations • 8 Jul 2020 • William Hua, Hongyuan Mei, Sarah Zohar, Magali Giral, Yanxun Xu
In the second step, we propose a policy gradient method that learns the personalized optimal clinical decision maximizing patient survival. The method interacts the MTPP with the model of clinical observations, while accounting for the uncertainty in those observations learned from the posterior inference of the first-step Bayesian joint model.
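The two-step structure described above — fit a model to offline clinical data, then run a policy gradient against rollouts simulated from that model — can be sketched in miniature. The code below is a hypothetical toy, not the paper's Bayesian joint model or MTPP: step 1 fits a trivial per-dose outcome model from invented offline records, and step 2 improves a softmax policy using the expected policy gradient under that fitted model.

```python
import math

def softmax(logits):
    """Numerically stable softmax."""
    m = max(logits)
    exps = [math.exp(x - m) for x in logits]
    s = sum(exps)
    return [e / s for e in exps]

def fit_outcome_model(data):
    """Step 1: mean outcome per dose level, from offline (dose, outcome) records."""
    sums, counts = {}, {}
    for dose, outcome in data:
        sums[dose] = sums.get(dose, 0.0) + outcome
        counts[dose] = counts.get(dose, 0) + 1
    return {d: sums[d] / counts[d] for d in sums}

def learn_policy(model, n_actions, steps=200, lr=0.5):
    """Step 2: ascend the expected policy gradient under the fitted model,
    sum_a pi(a) * grad log pi(a) * reward(a)."""
    theta = [0.0] * n_actions
    for _ in range(steps):
        p = softmax(theta)
        grad = [0.0] * n_actions
        for a in range(n_actions):
            r = model.get(a, 0.0)  # "rollout" simulated from the model
            for i in range(n_actions):
                grad[i] += p[a] * ((1.0 if i == a else 0.0) - p[i]) * r
        theta = [t + lr * g for t, g in zip(theta, grad)]
    return theta

# Invented offline records: (dose level, survival proxy); dose 1 does better.
offline = [(0, 0.2), (0, 0.3), (1, 0.9), (1, 1.0)]
model = fit_outcome_model(offline)
theta = learn_policy(model, n_actions=2)  # policy concentrates on dose 1
```

The paper's version replaces the trivial outcome model with posterior draws from the Bayesian joint model, so that the learned decisions account for uncertainty in the clinical observations rather than a single point estimate.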
no code implementations • 18 Aug 2016 • Yanbo Xu, Yanxun Xu, Suchi Saria
We study the problem of estimating the continuous response over time to interventions using observational time series: a retrospective dataset in which the policy that generated the data is unknown to the learner.
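The core difficulty named above — the data-generating policy confounds the observed outcomes — is the classic observational-causal-inference problem. A standard textbook correction, inverse propensity weighting (IPW), is sketched below with invented discrete data; it is a generic illustration, not the paper's continuous-time estimator.

```python
# (covariate x, treatment a, outcome y); the unknown policy favors treating
# x = 1 patients, and the true effect of treatment is +1 on a baseline of x.
records = [(1, 1, 2.0)] * 4 + [(1, 0, 1.0)] + [(0, 1, 1.0)] + [(0, 0, 0.0)] * 4

def fit_propensity(records):
    """Estimate P(a = 1 | x) empirically, recovering the unknown policy."""
    treated, total = {}, {}
    for x, a, _ in records:
        total[x] = total.get(x, 0) + 1
        treated[x] = treated.get(x, 0) + a
    return {x: treated[x] / total[x] for x in total}

def ipw_mean_treated(records, e):
    """IPW estimate of E[y | do(a = 1)]: reweight treated units by 1 / e(x)."""
    return sum(a * y / e[x] for x, a, y in records) / len(records)

e = fit_propensity(records)  # {1: 0.8, 0: 0.2}
# Naive treated-group mean is biased: treated patients are mostly x = 1.
naive = sum(y for _, a, y in records if a) / sum(a for _, a, _ in records)
ipw = ipw_mean_treated(records, e)  # recovers E[x] + 1 = 1.5
```

Here the naive treated-group average (1.8) overstates the treated-outcome mean because the policy preferentially treated high-baseline patients, while the IPW estimate recovers the correct 1.5; the paper tackles the substantially harder continuous-time version of this adjustment.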