1 code implementation • 25 Jan 2024 • Leyang Xue, Yao Fu, Zhan Lu, Luo Mai, Mahesh Marina
This paper presents MoE-Infinity, a cost-efficient mixture-of-experts (MoE) serving system that realizes activation-aware expert offloading.
no code implementations • 25 Jan 2024 • Yao Fu, Leyang Xue, Yeqi Huang, Andrei-Octavian Brabete, Dmitrii Ustiugov, Yuvraj Patel, Luo Mai
This paper presents ServerlessLLM, a locality-enhanced serverless inference system for Large Language Models (LLMs).
no code implementations • 1 Apr 2019 • Leyang Xue, Peng Zhang, An Zeng
Notably, an optimal parameter n* of ARL exists for long-term recommendation, indicating a trade-off between preserving item diversity and matching users' preferences in order to maximize long-term recommendation accuracy.
no code implementations • 29 Mar 2019 • Peng Zhang, Leyang Xue, An Zeng
The results show that higher recommendation accuracy with diffusion-based algorithms can still be achieved by optimizing how resources are allocated on a dense network.