Search Results for author: Hanyang Zhao

Found 3 papers, 0 papers with code

Mallows-DPO: Fine-Tune Your LLM with Preference Dispersions

No code implementations · 23 May 2024 · Haoxian Chen, Hanyang Zhao, Henry Lam, David Yao, Wenpin Tang

Direct Preference Optimization (DPO) has recently emerged as a popular approach for improving reinforcement learning from human feedback (RLHF), leading to better techniques for fine-tuning large language models (LLMs).
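For orientation, here is a minimal sketch of the standard DPO loss for a single preference pair. It is the baseline objective, not the Mallows-DPO variant, whose dispersion-weighted form the abstract does not specify. The inputs are assumed to be sequence log-probabilities (summed over response tokens) under the policy being trained and a frozen reference model; the function name `dpo_loss` and the default `beta=0.1` are illustrative choices.

```python
import math

def dpo_loss(policy_chosen_logp, policy_rejected_logp,
             ref_chosen_logp, ref_rejected_logp, beta=0.1):
    """Standard (baseline) DPO loss for one (chosen, rejected) pair.
    Hypothetical helper for illustration; not the Mallows-DPO objective."""
    # Implicit reward margin between the chosen (y_w) and rejected (y_l) responses:
    # beta * [log pi(y_w|x)/pi_ref(y_w|x) - log pi(y_l|x)/pi_ref(y_l|x)]
    margin = beta * ((policy_chosen_logp - ref_chosen_logp)
                     - (policy_rejected_logp - ref_rejected_logp))
    # Loss is -log sigmoid(margin) = log(1 + exp(-margin)), computed stably
    if margin >= 0:
        return math.log1p(math.exp(-margin))
    return -margin + math.log1p(math.exp(margin))

print(dpo_loss(-5.0, -5.0, -5.0, -5.0))  # zero margin gives log 2
print(dpo_loss(-4.0, -6.0, -5.0, -5.0))  # policy favors the chosen response: smaller loss
```

When the policy and reference agree, the margin is zero and the loss equals log 2. Raising the policy's relative likelihood of the preferred response pushes the loss toward zero.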

Score-based Diffusion Models via Stochastic Differential Equations -- a Technical Tutorial

No code implementations · 12 Feb 2024 · Wenpin Tang, Hanyang Zhao

This is an expository article on score-based diffusion models, with a particular focus on their formulation via stochastic differential equations (SDEs).
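To make the SDE formulation concrete, the following is a minimal sketch, not taken from the tutorial, under a toy assumption: 1-D Gaussian data N(mu, sigma2) and the Ornstein-Uhlenbeck forward SDE dX = -X dt + sqrt(2) dW. In that setting every marginal p_t is Gaussian, so the score is known in closed form. Sampling runs Euler-Maruyama on the time-reversed SDE dY = [Y + 2 ∇log p_{T-t}(Y)] dt + sqrt(2) dB, started from N(0, 1). The helper names and the choices of `T` and `n_steps` are illustrative.

```python
import math
import random
import statistics

def marginal(t, mu, sigma2):
    """Mean and variance of p_t under the forward OU SDE
    dX = -X dt + sqrt(2) dW, started from N(mu, sigma2)."""
    decay = math.exp(-t)
    return mu * decay, sigma2 * decay**2 + 1.0 - decay**2

def score(x, t, mu, sigma2):
    """Exact score grad_x log p_t(x); a trained network would approximate this."""
    m, v = marginal(t, mu, sigma2)
    return -(x - m) / v

def reverse_sample(mu, sigma2, rng, T=5.0, n_steps=500):
    """Euler-Maruyama discretization of the time-reversed SDE
    dY = [Y + 2 * score(Y, T - t)] dt + sqrt(2) dB."""
    dt = T / n_steps
    y = rng.gauss(0.0, 1.0)  # p_T is close to N(0, 1) when T is large
    for k in range(n_steps):
        drift = y + 2.0 * score(y, T - k * dt, mu, sigma2)
        y += drift * dt + math.sqrt(2.0 * dt) * rng.gauss(0.0, 1.0)
    return y

rng = random.Random(0)
samples = [reverse_sample(2.0, 0.25, rng) for _ in range(1000)]
print(statistics.fmean(samples), statistics.pvariance(samples))
```

The empirical mean and variance should land near the data parameters (2.0 and 0.25), up to Monte Carlo and discretization error. In a real model, the closed-form `score` would be replaced by a learned score network.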

Task: Reinforcement Learning

Contractive Diffusion Probabilistic Models

No code implementations · 23 Jan 2024 · Wenpin Tang, Hanyang Zhao

Because score matching is not guaranteed to be accurate, we propose a new criterion for designing diffusion probabilistic models (DPMs): contraction of the backward sampling process. This leads to a novel class of contractive DPMs (CDPMs).
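One way to see what "contraction of backward sampling" means is a synchronous coupling. This is a generic illustration, not the CDPM construction, which the abstract does not spell out. Run the backward sampler twice from two different starting points, driving both copies with identical noise. If the gap between them shrinks, errors injected along the way (for example, by an imperfect score) are damped rather than amplified. The sketch assumes the same toy setting as a standard OU forward SDE dX = -X dt + sqrt(2) dW with zero-mean Gaussian data N(0, sigma2). There the reverse drift reduces to y * (1 - 2/v), where v is the forward marginal variance. The function `backward_gap` and its defaults are hypothetical.

```python
import math
import random

def backward_gap(sigma2, T=5.0, n_steps=500, seed=0):
    """Distance between two coupled runs of the Euler-Maruyama backward sampler
    (toy OU forward SDE, data ~ N(0, sigma2)) that share the same noise."""
    rng = random.Random(seed)
    dt = T / n_steps
    y1, y2 = -0.5, 0.5  # two starting points 1.0 apart
    for k in range(n_steps):
        s = T - k * dt  # forward time being reversed at this step
        v = sigma2 * math.exp(-2 * s) + 1.0 - math.exp(-2 * s)
        rate = 1.0 - 2.0 / v  # reverse drift coefficient; negative = locally contracting
        noise = math.sqrt(2.0 * dt) * rng.gauss(0.0, 1.0)  # shared by both copies
        y1 += rate * y1 * dt + noise
        y2 += rate * y2 * dt + noise
    return abs(y1 - y2)

print(backward_gap(0.25), backward_gap(4.0))
```

In this toy model the local rate 1 - 2/v is negative whenever v < 2. Low-variance data therefore gives a strongly contracting sampler. With sigma2 = 4, the final steps are locally expansive, so the coupled copies end farther apart.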
