1. General Deep Learning
1.1. Generative Models
1.1.1. Generative Adversarial Networks
1.1.2. Diffusion Models
1.1.2.1. General Diffusion Models
[Sohl-Dickstein et al. ICML 2015] Deep Unsupervised Learning using Nonequilibrium Thermodynamics
This paper introduced the idea of diffusion-based generative models. It defined a forward diffusion process that gradually adds noise to the data until only noise remains, and trained a model to reverse this process step by step. This foundational idea later inspired the development of denoising diffusion probabilistic models (DDPMs). A minimal sketch of the forward chain follows.
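For intuition only, here is a tiny NumPy sketch of the forward noising chain; the linear beta schedule and the random vector standing in for a data sample are illustrative assumptions, not the paper's exact setup.

    import numpy as np

    # Forward diffusion chain: repeatedly add a small amount of Gaussian noise
    # until the signal is indistinguishable from pure noise.
    rng = np.random.default_rng(0)
    T = 1000
    betas = np.linspace(1e-4, 0.02, T)      # illustrative linear schedule

    x = rng.standard_normal(8)              # stand-in for a data sample x_0
    for beta in betas:                      # q(x_t | x_{t-1}) = N(sqrt(1-beta) x_{t-1}, beta I)
        x = np.sqrt(1.0 - beta) * x + np.sqrt(beta) * rng.standard_normal(x.shape)
    # After T steps, x is approximately N(0, I) distributed, regardless of x_0.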
[Song et al. UAI 2019] Sliced Score Matching: A Scalable Approach to Density and Score Estimation
[Song et al. NeurIPS 2019] Generative Modeling by Estimating Gradients of the Data Distribution
Proposed score-based generative models that estimate the gradient of the log data density (the score function) and generate samples with annealed Langevin dynamics. Rather than relying on a fixed diffusion process, this paper trained a single noise-conditional network to model the score at multiple noise levels. This work was crucial in establishing score-based methods as an alternative to GANs and VAEs.
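As a rough illustration of the sampling procedure, the sketch below runs annealed Langevin dynamics in NumPy; the score_fn callable, the step-size rule, and the analytic toy score are illustrative assumptions rather than the paper's exact configuration.

    import numpy as np

    def annealed_langevin_sample(score_fn, shape, sigmas, n_steps=100, eps=2e-5, rng=None):
        """Annealed Langevin dynamics: Langevin updates at decreasing noise levels.
        score_fn(x, sigma) approximates grad_x log p_sigma(x); in the paper this is
        a trained noise-conditional score network."""
        rng = rng or np.random.default_rng(0)
        x = rng.standard_normal(shape)
        for sigma in sigmas:                              # sigmas ordered large -> small
            step = eps * (sigma / sigmas[-1]) ** 2        # step size scaled per noise level
            for _ in range(n_steps):
                z = rng.standard_normal(shape)
                x = x + 0.5 * step * score_fn(x, sigma) + np.sqrt(step) * z
        return x

    # Toy usage: analytic score of N(0, 1) data smoothed with noise level sigma.
    toy_score = lambda x, sigma: -x / (1.0 + sigma ** 2)
    sample = annealed_langevin_sample(toy_score, (4,), np.geomspace(10.0, 0.01, 10))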
[Ho et al. NeurIPS 2020] Denoising Diffusion Probabilistic Models
Introduced denoising diffusion probabilistic models (DDPMs), which pair a fixed forward noising process with a learned reverse denoising process. The model is trained to predict the noise added at each step and generates samples by iteratively denoising pure Gaussian noise. This paper made diffusion models practical and demonstrated image quality competitive with GANs.
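A minimal PyTorch sketch of the simplified training objective is given below; the linear beta schedule, the ddpm_loss name, and the toy model are assumptions for illustration, with model(x_t, t) taken to predict the added noise.

    import torch

    # Closed-form forward noising plus the simplified noise-prediction objective.
    betas = torch.linspace(1e-4, 0.02, 1000)        # illustrative linear schedule
    alpha_bars = torch.cumprod(1.0 - betas, dim=0)

    def ddpm_loss(model, x0):
        """model(x_t, t) is assumed to predict the noise eps that produced x_t."""
        t = torch.randint(0, len(alpha_bars), (x0.shape[0],))
        eps = torch.randn_like(x0)
        a_bar = alpha_bars[t].view(-1, *([1] * (x0.dim() - 1)))
        x_t = a_bar.sqrt() * x0 + (1.0 - a_bar).sqrt() * eps   # q(x_t | x_0) in closed form
        return torch.nn.functional.mse_loss(model(x_t, t), eps)

    # Toy usage with a model that ignores the timestep.
    toy_model = lambda x_t, t: torch.zeros_like(x_t)
    loss = ddpm_loss(toy_model, torch.randn(16, 8))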
[Song et al. ICLR 2021] Score-Based Generative Modeling through Stochastic Differential Equations
Unified score-based models and diffusion models under a continuous-time framework based on stochastic differential equations (SDEs). This paper introduced the probability flow ODE, an ordinary differential equation whose trajectories share the same marginal distributions as the SDE. This allowed for deterministic sampling paths and bridged the gap between DDPMs and score-based methods.
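The deterministic sampler can be sketched as a plain Euler integration of the probability flow ODE. The VE parameterization dx/dsigma = -sigma * score(x, sigma), the sigma grid, and the analytic toy score below are illustrative assumptions; the paper treats several SDE families and more accurate solvers.

    import numpy as np

    def probability_flow_ode_sample(score_fn, shape, sigmas, rng=None):
        """Euler integration of the probability flow ODE in the VE parameterization:
        dx/dsigma = -sigma * score(x, sigma), integrated from sigma_max down to ~0."""
        rng = rng or np.random.default_rng(0)
        x = sigmas[0] * rng.standard_normal(shape)        # start from the wide Gaussian prior
        for sigma, sigma_next in zip(sigmas[:-1], sigmas[1:]):
            x = x + (sigma_next - sigma) * (-sigma * score_fn(x, sigma))
        return x

    # Toy usage: analytic score for N(0, 1) data, so p_sigma = N(0, 1 + sigma^2).
    toy_score = lambda x, sigma: -x / (1.0 + sigma ** 2)
    sample = probability_flow_ode_sample(toy_score, (4,), np.geomspace(80.0, 1e-3, 50))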
[Nichol and Dhariwal ICML 2021] Improved Denoising Diffusion Probabilistic Models
[Karras et al. NeurIPS 2022] Elucidating the Design Space of Diffusion-Based Generative Models
Provided an extensive exploration of design choices in diffusion models, offering insights on optimal hyperparameters, noise schedules, and sampling techniques. This work resulted in more stable, efficient, and high-quality diffusion models.
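One concrete artifact of that design space is the paper's noise-level schedule, sketched below; the defaults shown (sigma_min = 0.002, sigma_max = 80, rho = 7) are the commonly cited values from the paper, while the function name and the appended final zero are our illustrative choices.

    import numpy as np

    def edm_sigma_schedule(n_steps, sigma_min=0.002, sigma_max=80.0, rho=7.0):
        """Karras et al. noise-level schedule:
        sigma_i = (sigma_max^(1/rho) + i/(N-1) * (sigma_min^(1/rho) - sigma_max^(1/rho)))^rho,
        which spaces steps densely near sigma_min and sparsely near sigma_max."""
        i = np.arange(n_steps)
        inv = 1.0 / rho
        sigmas = (sigma_max ** inv + i / (n_steps - 1) * (sigma_min ** inv - sigma_max ** inv)) ** rho
        return np.append(sigmas, 0.0)       # a final sigma = 0 ends the sampling trajectory

    print(edm_sigma_schedule(10))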
[Rombach et al. CVPR 2022] High-Resolution Image Synthesis with Latent Diffusion Models
Introduced Latent Diffusion Models that operate in a lower-dimensional latent space rather than the pixel space. This innovation reduced computational costs and enabled diffusion models to handle higher-resolution images and complex tasks like text-to-image generation.
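The overall pipeline can be sketched in a few lines of PyTorch: run diffusion in a compact latent space and decode back to pixels. The ToyAutoencoder, the generate helper, and the toy dimensions are stand-ins for illustration, not the paper's actual autoencoder or U-Net.

    import torch
    from torch import nn

    class ToyAutoencoder(nn.Module):
        """Toy stand-in for the pretrained autoencoder: maps pixels to a much
        smaller latent vector and back."""
        def __init__(self, pixel_dim=3 * 64 * 64, latent_dim=4 * 8 * 8):
            super().__init__()
            self.enc = nn.Linear(pixel_dim, latent_dim)
            self.dec = nn.Linear(latent_dim, pixel_dim)

        def encode(self, x):
            return self.enc(x)

        def decode(self, z):
            return self.dec(z)

    @torch.no_grad()
    def generate(autoencoder, latent_sampler, n, latent_dim=4 * 8 * 8):
        """latent_sampler is any diffusion sampler run in the latent space
        (e.g. the samplers sketched above); the decoder maps latents back to pixels."""
        z = latent_sampler((n, latent_dim))     # denoising happens entirely in latent space
        return autoencoder.decode(z)

    # Toy usage: a placeholder "sampler" that just returns Gaussian latents.
    images = generate(ToyAutoencoder(), lambda shape: torch.randn(shape), n=2)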
[Zhang et al. ICCV 2023] Adding Conditional Control to Text-to-Image Diffusion Models
Proposed ControlNet, which conditions pretrained text-to-image diffusion models on structured inputs such as poses, edges, or depth maps, allowing controlled and guided generation. This architecture made models like Stable Diffusion steerable with spatial constraints in addition to text prompts. The core pattern is sketched below.
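The key architectural trick, a trainable copy of a frozen block connected through zero-initialized convolutions, can be sketched as follows; ZeroConv, ControlledBlock, and the single-convolution stand-ins are simplifications for illustration, not the paper's full U-Net wiring.

    import torch
    from torch import nn

    class ZeroConv(nn.Conv2d):
        """1x1 convolution initialized to zero, so the control branch starts as a no-op."""
        def __init__(self, channels):
            super().__init__(channels, channels, kernel_size=1)
            nn.init.zeros_(self.weight)
            nn.init.zeros_(self.bias)

    class ControlledBlock(nn.Module):
        """One block in the ControlNet pattern: the pretrained block stays frozen,
        a trainable copy sees the conditioning signal (e.g. an edge map), and its
        output is injected back through a zero convolution."""
        def __init__(self, frozen_block, channels):
            super().__init__()
            self.frozen = frozen_block.requires_grad_(False)
            self.copy = nn.Conv2d(channels, channels, 3, padding=1)  # stand-in for the copied block
            self.zero_in, self.zero_out = ZeroConv(channels), ZeroConv(channels)

        def forward(self, x, control):
            h = self.frozen(x)
            c = self.copy(x + self.zero_in(control))
            return h + self.zero_out(c)         # identical to the frozen output at initialization

    # Toy usage with a single convolution standing in for a pretrained block.
    block = ControlledBlock(nn.Conv2d(8, 8, 3, padding=1), channels=8)
    out = block(torch.randn(1, 8, 16, 16), control=torch.randn(1, 8, 16, 16))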
[Song et al. ICML 2023] Consistency Models
Introduced consistency models, which map any noisy point on a diffusion trajectory directly back to its clean origin, so samples can be generated in one or a few denoising steps. They can be distilled from a pretrained diffusion model or trained from scratch, drastically reducing sampling time while preserving quality and making diffusion-style generation practical for near-real-time applications.
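The multistep sampling loop is short enough to sketch directly; f(x, sigma), the sigma schedule, and the toy posterior-mean model are assumptions for illustration, with f taken to map a noisy input at level sigma straight to a clean sample.

    import numpy as np

    def consistency_multistep_sample(f, shape, sigmas, sigma_min=0.002, rng=None):
        """Multistep consistency sampling: one call to f already yields a sample;
        extra steps re-noise the estimate and re-apply f to trade compute for quality."""
        rng = rng or np.random.default_rng(0)
        x = f(sigmas[0] * rng.standard_normal(shape), sigmas[0])   # one-step sample
        for sigma in sigmas[1:]:                                   # optional refinement steps
            x_noisy = x + np.sqrt(sigma ** 2 - sigma_min ** 2) * rng.standard_normal(shape)
            x = f(x_noisy, sigma)
        return x

    # Toy usage: for N(0, 1) data, the posterior mean E[x_0 | x_sigma] is a valid f.
    toy_f = lambda x, sigma: x / (1.0 + sigma ** 2)
    sample = consistency_multistep_sample(toy_f, (4,), sigmas=[80.0, 2.0, 0.5])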
[Lu and Song 2024] Simplifying, Stabilizing and Scaling Continuous-Time Consistency Models
1.1.2.2. Diffusion Models for Inverse Problems
2. Deep Learning Applications
2.1. Audio
2.1.2. General Audio
2.1.2.1. Target Sound Extraction (TSE) and Source Separation (SS)
2.1.2.2. Audio Foundation Models
[Niizumi et al. TASLP 2024] Masked Modeling Duo: Towards a Universal Audio Pre-training Framework
2.1.2.3. Deep Acoustic Imaging
[Simeoni et al. NeurIPS 2019] DeepWave: A Recurrent Neural-Network for Real-Time Acoustic Imaging
[Roman et al. ICASSP 2024] Robust DoA Estimation from Deep Acoustic Imaging