Search Results for author: Nitin Kedia

Found 2 papers, 1 paper with code

Vidur: A Large-Scale Simulation Framework For LLM Inference

1 code implementation • 8 May 2024 • Amey Agrawal, Nitin Kedia, Jayashree Mohan, Ashish Panwar, Nipun Kwatra, Bhargav Gulavani, Ramachandran Ramjee, Alexey Tumanov

Vidur models the performance of LLM operators using a combination of experimental profiling and predictive modeling, and evaluates the end-to-end inference performance for different workloads by estimating several metrics of interest such as latency and throughput.

Scheduling

Taming Throughput-Latency Tradeoff in LLM Inference with Sarathi-Serve

no code implementations • 4 Mar 2024 • Amey Agrawal, Nitin Kedia, Ashish Panwar, Jayashree Mohan, Nipun Kwatra, Bhargav S. Gulavani, Alexey Tumanov, Ramachandran Ramjee

However, batching multiple requests leads to an interleaving of prefill and decode iterations, which makes it challenging to achieve both high throughput and low latency.

Scheduling
