Text-to-Music Generation
13 papers with code • 2 benchmarks • 3 datasets
Most implemented papers
MusicLM: Generating Music From Text
We introduce MusicLM, a model generating high-fidelity music from text descriptions such as "a calming violin melody backed by a distorted guitar riff".
Simple and Controllable Music Generation
We tackle the task of conditional music generation.
JEN-1: Text-Guided Universal Music Generation with Omnidirectional Diffusion Models
Despite the task's significance, prevailing generative models exhibit limitations in music quality, computational efficiency, and generalization.
Music Understanding LLaMA: Advancing Text-to-Music Generation with Question Answering and Captioning
To fill this gap, we present a methodology for generating question-answer pairs from existing audio captioning datasets and introduce the MusicQA Dataset designed for answering open-ended music-related questions.
Exploring the Efficacy of Pre-trained Checkpoints in Text-to-Music Generation Task
Benefiting from large-scale datasets and pre-trained models, the field of generative models has recently gained significant momentum.
Moûsai: Text-to-Music Generation with Long-Context Latent Diffusion
Recent years have seen the rapid development of large generative models for text; however, much less research has explored the connection between text and another "language" of communication -- music.
MusicLDM: Enhancing Novelty in Text-to-Music Generation Using Beat-Synchronous Mixup Strategies
Diffusion models have shown promising results in cross-modal generation tasks, including text-to-image and text-to-audio generation.
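MusicLDM's contribution is aligning training clips on beats before mixing them; the mixup operation itself is a plain convex combination of two examples. A minimal, hypothetical sketch of that core step (the function name, latent shapes, and mixing coefficient are illustrative, not taken from the paper):

```python
import numpy as np

def mixup(latent_a: np.ndarray, latent_b: np.ndarray, lam: float = 0.5) -> np.ndarray:
    """Classic mixup: convex combination of two (here, toy) audio latents.

    In a beat-synchronous variant, latent_a and latent_b would first be
    aligned on their downbeats before interpolation (not shown here).
    """
    return lam * latent_a + (1.0 - lam) * latent_b

# Toy stand-ins for beat-aligned audio encodings.
a = np.zeros(4)
b = np.ones(4)
mixed = mixup(a, b, lam=0.25)  # each element: 0.25 * 0 + 0.75 * 1 = 0.75
```

The interpolated latent serves as an augmented training example, which is the mechanism the paper uses to encourage novelty in generated audio.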
AudioLDM 2: Learning Holistic Audio Generation with Self-supervised Pretraining
Any audio can be translated into a "language of audio" (LOA) representation based on AudioMAE, a self-supervised pre-trained representation learning model.
Investigating Personalization Methods in Text to Music Generation
In this work, we investigate the personalization of text-to-music diffusion models in a few-shot setting.
Music ControlNet: A model similar to SD ControlNet that can accurately control music generation
While the image-domain Uni-ControlNet method already allows generation with any subset of controls, we devise a new strategy to allow creators to input controls that are only partially specified in time.