Multistep Consistency Models
Diffusion models are relatively easy to train but require many steps to generate samples. Consistency models are far more difficult to train but generate samples in a single step. In this paper we propose Multistep Consistency Models: a unification of Consistency Models (Song et al., 2023) and TRACT (Berthelot et al., 2023) that can interpolate between a consistency model and a diffusion model, trading off sampling speed against sampling quality. Specifically, a 1-step consistency model is a conventional consistency model, whereas we show that an ∞-step consistency model is a diffusion model. Multistep Consistency Models work well in practice. By increasing the sample budget from a single step to 2-8 steps, we can more easily train models that generate higher-quality samples while retaining much of the sampling speed benefit. Notable results are 1.4 FID on ImageNet 64 in 8 steps and 2.1 FID on ImageNet 128 in 8 steps with consistency distillation. We also show that our method scales to a text-to-image diffusion model, generating samples that are very close to the quality of the original model.
Introduction. Diffusion models have rapidly become one of the dominant classes of generative models for image, video and audio generation (Ho et al., 2020; Kong et al., 2021; Saharia et al., 2022). The biggest downside of diffusion models is their relatively expensive sampling procedure: whereas training uses a single function evaluation per datapoint, sampling requires many (sometimes hundreds of) evaluations to generate a sample. Recently, Consistency Models (Song et al., 2023) have reduced sampling time significantly, but at the expense of image quality. Consistency models come in two variants, Consistency Training (CT) and Consistency Distillation (CD), and both have considerably improved performance compared to earlier works. TRACT (Berthelot et al., 2023) focuses solely on distillation with an approach similar to consistency distillation, and shows that dividing the diffusion trajectory into stages can improve performance. Despite their successes, neither of these works attains performance close to a standard diffusion baseline.
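To make the step budget concrete, the following is a minimal sketch of few-step sampling, not the implementation used in this paper. It assumes a variance-preserving cosine noise schedule, a network that predicts the clean sample from a noisy input, and a deterministic DDIM update between segment boundaries. The names alpha_sigma, ddim_step, multistep_sample and toy_predict_x are hypothetical, and the toy predictor stands in for a trained model.

```python
import numpy as np


def alpha_sigma(t):
    """Cosine variance-preserving schedule (an assumption of this sketch)."""
    return np.cos(0.5 * np.pi * t), np.sin(0.5 * np.pi * t)


def ddim_step(x_hat, z_t, t, s):
    """Deterministic DDIM update from time t to an earlier time s."""
    alpha_t, sigma_t = alpha_sigma(t)
    alpha_s, sigma_s = alpha_sigma(s)
    # Noise implied by the current state and the predicted clean sample.
    eps_hat = (z_t - alpha_t * x_hat) / sigma_t
    return alpha_s * x_hat + sigma_s * eps_hat


def multistep_sample(predict_x, shape, num_steps, rng):
    """Sample with exactly `num_steps` evaluations of `predict_x`.

    The time interval [0, 1] is split into `num_steps` equal segments.
    In each segment the model predicts a clean sample, and a DDIM step
    moves the state to the start of the next segment.
    """
    z = rng.standard_normal(shape)  # start from pure noise at t = 1
    num_evals = 0
    for i in range(num_steps, 0, -1):
        t, s = i / num_steps, (i - 1) / num_steps
        x_hat = predict_x(z, t)
        num_evals += 1
        z = ddim_step(x_hat, z, t, s)
    return z, num_evals


def toy_predict_x(z, t):
    """Hypothetical stand-in for a trained model: data is a point mass at 2."""
    return np.full_like(z, 2.0)


rng = np.random.default_rng(0)
for steps in (1, 2, 8):
    sample, evals = multistep_sample(toy_predict_x, (4,), steps, rng)
    print(steps, evals, sample)
```

With num_steps set to 1 the loop reduces to a single model evaluation, as in a conventional consistency model. Larger values spend more evaluations, which is the speed side of the trade-off described above.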
Discussion / Conclusion. This paper presents Multistep Consistency Models, a simple unification of Consistency Models (Song et al., 2023) and TRACT (Berthelot et al., 2023) that closes the performance gap between standard diffusion and few-step sampling. Multistep Consistency gives a direct trade-off between sample quality and speed, achieving performance comparable to standard diffusion in as few as eight steps.