no code implementations • 23 May 2024 • Haoxian Chen, Hanyang Zhao, Henry Lam, David Yao, Wenpin Tang
Direct Preference Optimization (DPO) has recently emerged as a popular approach to improve reinforcement learning from human feedback (RLHF), leading to better techniques for fine-tuning large language models (LLMs).
no code implementations • 12 Feb 2024 • Wenpin Tang, Hanyang Zhao
This is an expository article on score-based diffusion models, with a particular focus on the formulation via stochastic differential equations (SDEs).
no code implementations • 23 Jan 2024 • Wenpin Tang, Hanyang Zhao
In view of possibly unguaranteed score matching, we propose a new criterion -- the contraction of backward sampling -- in the design of DPMs, leading to a novel class of contractive DPMs (CDPMs).