TOPIC

Diffusion-Based LLMs

11 synthesis notes · 13 source papers
View as

Why can't we easily adapt reinforcement learning to diffusion language models?

Autoregressive models enable efficient RL post-training through factorizable log-probabilities, but diffusion models generate tokens in parallel non-sequential order. What makes likelihood computation intractable in diffusion, and can we work around it?

Explore related Read →

Can speech features be separated into semantic and stylistic components?

Linguistic theory suggests gestures decompose into semantic units and motion variations. Does this decomposition actually emerge in speech encoder layers, and can it enable more expressive gesture synthesis?

Explore related Read →

Can diffusion models enable control that autoregressive models cannot reach?

Autoregressive language models struggle with complex global controls like syntax and infilling because they generate left-to-right and have discrete token bottlenecks. Can diffusion models' continuous latents and parallel denoising overcome these structural limitations?

Explore related Read →

Can diffusion language models match autoregressive inference speed?

Diffusion LLMs promised faster decoding through parallel token generation, but open-source implementations never outpaced autoregressive models in practice. What architectural barriers prevent diffusion from realizing its speed potential?

Explore related Read →

Can diffusion models commit to answers before full decoding?

Do diffusion language models settle on correct answers early in their refinement process, and if so, can we detect and exploit this convergence to speed up inference without losing quality?

Explore related Read →

Can diffusion models perform evolutionary search in parameter space?

Diffusion models and evolutionary algorithms share equivalent mathematical structures. Can we leverage this equivalence to build evolutionary search methods that preserve solution diversity better than traditional algorithms?

Explore related Read →

Can reasoning and answers be generated separately in language models?

Explores whether diffusion LLMs can embed reasoning prompts directly within generation sequences rather than as prefixes, and whether answers and reasoning can be decoupled as independent refinement axes.

Explore related Read →

Can consistency models trade speed for quality with a few steps?

Consistency models sample in one step but sacrifice quality compared to diffusion. Can adding just a handful of sampling steps recover the quality gap while staying faster than full diffusion?

Explore related Read →

Can iterative revision cycles match how humans actually write?

Does framing research writing as a diffusion process—where drafts are refined through retrieval-augmented cycles—better capture human cognition than linear pipelines and reduce information loss?

Explore related Read →

Does autoregressive generation uniquely enable LLM scaling?

Is the autoregressive factorization truly necessary for LLM scalability, or do other generative principles like diffusion achieve comparable performance? This matters because it shapes which architectural paths deserve investment.

Explore related Read →

Does looping layers beat adding depth in diffusion models?

When scaling masked diffusion language models with fixed parameters, is reusing computation through selective layer looping more efficient than simply making the network deeper? This matters because it challenges conventional scaling assumptions.

Explore related Read →

Source papers 13

The Arxiv papers behind this sub-topic. Links may take you off-site to arxiv.org.