AmbientGen

Building a text-to-ambient-sound generator with AudioLDM2

View the Project on GitHub my-sonicase/ambientgen

📚 Papers & Reading List

A curated collection of key papers and resources on generative AI for audio and music. Papers I’ve read in depth are marked with ✅ and link to my blog post about them.


Text-to-Audio Generation

| Paper | Year | Key Idea | Status |
|---|---|---|---|
| AudioLDM 2: Learning Holistic Audio Generation with Self-supervised Pretraining | 2023 | Latent diffusion for audio with “Language of Audio” shared representation | ✅ My notes |
| AudioLDM: Text-to-Audio Generation with Latent Diffusion Models | 2023 | First latent diffusion model for text-to-audio, uses CLAP | 📋 To read |
| Make-An-Audio 2 | 2023 | Temporal-enhanced text-to-audio with LLM-augmented captions | 📋 To read |
| AudioGen: Textually Guided Audio Generation | 2022 | Autoregressive audio generation from Meta | 📋 To read |
| Stable Audio Open | 2024 | Latent diffusion with timing conditioning from Stability AI | 📋 To read |

Audio Understanding & Representation

| Paper | Year | Key Idea | Status |
|---|---|---|---|
| CLAP: Learning Audio Concepts from Natural Language Supervision | 2022 | Contrastive learning to align audio and text (like CLIP for audio) | 📋 To read |
| Audio Spectrogram Transformer (AST) | 2021 | Pure attention model for audio classification | 📋 To read |

Text-to-Music

| Paper | Year | Key Idea | Status |
|---|---|---|---|
| MusicGen: Simple and Controllable Music Generation | 2023 | Single-stage transformer for music from Meta | 📋 To read |
| MusicLM: Generating Music From Text | 2023 | Hierarchical music generation from Google | 📋 To read |
| Stable Audio: Fast Timing-Conditioned Latent Audio Diffusion | 2024 | U-Net-based latent diffusion with timing control (the DiT backbone arrived in later Stable Audio models) | 📋 To read |

Voice & Speech Synthesis

| Paper | Year | Key Idea | Status |
|---|---|---|---|
| XTTS: Cross-lingual Text-to-Speech | 2024 | Multilingual TTS with voice cloning (Coqui) | 📋 To read |
| Bark | 2023 | GPT-style text-to-audio with speech, music, sound effects | 📋 To read |
| StyleTTS 2 | 2023 | Diffusion-based style modeling for natural TTS | 📋 To read |

Foundational (Diffusion Models)

| Paper | Year | Key Idea | Status |
|---|---|---|---|
| Denoising Diffusion Probabilistic Models (DDPM) | 2020 | The foundational diffusion model paper | 📋 To read |
| High-Resolution Image Synthesis with Latent Diffusion Models | 2022 | Latent Diffusion (Stable Diffusion); same principle used in AudioLDM | 📋 To read |

🔗 Other Resources


Last updated: February 2025

← Back to blog index