SYNTHESIS NOTE

Can LLMs handle multiple tasks at once during inference?

Do language models maintain multiple distinct in-context learning tasks simultaneously in their internal representations, and if so, what prevents them from actually generating outputs for more than one task?

Synthesis note · 2026-02-23 · sourced from MechInterp

LLMs can perform multiple, computationally distinct in-context learning tasks simultaneously during a single inference call — task superposition. This emerges even when models are trained to learn one task at a time.

The distinction between superposition types matters:

Feature superposition (Elhage et al., 2022): a network represents more features than it has neurons, so a single neuron can respond to several unrelated concepts. Microscopic, at the level of individual neurons.
Task superposition: multiple complete tasks maintained in the output distribution simultaneously. Macroscopic, visible in the final output.
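One way to make "visible in the final output" concrete is to read off how much next-token probability lands on each task's correct answer. The following is a minimal sketch under stated assumptions, not a measurement protocol from the source: the task names, tokens, probabilities, and the 0.05 threshold are all made up for illustration, and only the first answer token is considered.

```python
def task_mass(next_token_probs, task_answers):
    """Return the probability placed on each task's correct first token.

    next_token_probs: dict mapping token -> probability (hypothetical values).
    task_answers: dict mapping task name -> the token that task would emit.
    """
    return {task: next_token_probs.get(tok, 0.0) for task, tok in task_answers.items()}


# Hypothetical prompt that mixes three tasks applied to the query "dog":
# translate to French, translate to Spanish, and uppercase.
next_token_probs = {"chien": 0.40, "perro": 0.35, "DOG": 0.15, "the": 0.10}
task_answers = {"to_french": "chien", "to_spanish": "perro", "uppercase": "DOG"}

mass = task_mass(next_token_probs, task_answers)

# Tasks with non-trivial mass count as "active" (threshold is arbitrary).
active = [task for task, p in mass.items() if p > 0.05]
print(mass, active)
```

If several tasks each hold non-trivial mass in the same distribution, that is the macroscopic signature described above.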

Three findings, the first two about scale:

Larger models can solve more ICL tasks in parallel
Larger models better calibrate their output distribution across simultaneous tasks
Task vectors can be composed via arithmetic operations to steer behavior
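The arithmetic in the last finding can be sketched as a weighted sum of vectors. This is an illustrative assumption: the note only says task vectors are composed "via arithmetic operations", and the convex combination, the four-dimensional toy vectors, and the function name below are hypothetical. In a real model the vectors would come from hidden states and be patched back in, which this sketch does not attempt.

```python
def combine_task_vectors(vectors, weights):
    """Weighted sum of equal-length task vectors (lists of floats)."""
    if len(vectors) != len(weights):
        raise ValueError("need exactly one weight per vector")
    dim = len(vectors[0])
    if any(len(v) != dim for v in vectors):
        raise ValueError("all task vectors must have the same length")
    return [sum(w * v[i] for v, w in zip(vectors, weights)) for i in range(dim)]


# Toy stand-ins for two task vectors (values are made up).
v_french = [1.0, 0.0, 2.0, -1.0]
v_upper = [0.0, 3.0, -2.0, 1.0]

# Equal weights give an even blend; shifting the weights would steer
# the blend toward one task.
mixed = combine_task_vectors([v_french, v_upper], [0.5, 0.5])
print(mixed)
```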

The critical limitation: generation collapse. After the first token is generated, the model converges on predicting tokens for a single task, negating multi-task execution. The first token acts as a commitment point that collapses the superposition. "Superposed decoding" algorithms attempt to maintain the multi-task state but remain early-stage.
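A toy way to see why the first token acts as a commitment point is to treat the output distribution as a mixture over tasks and apply Bayes' rule once a token has been emitted. This is an assumed simplification for illustration, not the mechanism established by the source; the task names and probabilities are hypothetical.

```python
def posterior_task_weights(prior, token_probs_by_task, token):
    """Bayes update of task weights after observing one emitted token."""
    unnorm = {
        task: prior[task] * token_probs_by_task[task].get(token, 0.0)
        for task in prior
    }
    total = sum(unnorm.values())
    if total == 0.0:
        raise ValueError("token has zero probability under every task")
    return {task: w / total for task, w in unnorm.items()}


# Before generation: two tasks held with equal weight.
prior = {"to_french": 0.5, "to_spanish": 0.5}

# Each task strongly prefers its own first token (made-up numbers).
first_token = {
    "to_french": {"chien": 0.98, "perro": 0.02},
    "to_spanish": {"chien": 0.02, "perro": 0.98},
}

# Once "chien" is emitted, almost all weight moves to one task.
after = posterior_task_weights(prior, first_token, "chien")
print(after)
```

In this toy model a single distinctive token moves the task weights from an even split to almost entirely one task, which mirrors the collapse described above.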

The Waluigi effect — outputs collapsing to unintended simulacra — is one consequence of task superposition. When the model maintains multiple task interpretations simultaneously, generation collapse can resolve to an unintended one. This connects to the "LLMs as multiverse generators" perspective: the model simultaneously represents multiple possible continuations, and decoding forces a collapse.

The practical implication: the model's representational capacity for parallel computation far exceeds what standard autoregressive decoding can exploit. The bottleneck is not in representation but in generation — a single token sequence can only express one task at a time.

Task superposition has implications for ICL-based sequential decision making. As the related note "Why do trajectories matter more than individual examples for in-context learning?" describes, presenting multiple trajectories in context enables ICL of new tasks. Task superposition may be the representational mechanism that makes this possible: the model maintains multiple task interpretations from the in-context trajectories simultaneously, extracting the shared structure needed for generalization. Generation collapse would then explain why the model commits to a single policy despite potentially representing multiple viable strategies.


Related concepts in this collection 3

Can transformers learn to solve new problems within episodes? Explores whether transformer models can develop meta-learning abilities through RL training, enabling them to adapt to unseen environments by learning from within-episode experience alone, without updating weights.
ICL as general-purpose capability; task superposition shows ICL operates on multiple tasks simultaneously
Why does parallel reasoning outperform single chain thinking? Does dividing a fixed token budget across multiple independent reasoning paths beat spending it all on one long chain? This explores how breadth and diversity in reasoning compare to depth.
parallel thinking may be exploiting task superposition within the reasoning trace
Why do trajectories matter more than individual examples for in-context learning? Can language models learn new sequential decision-making tasks from context alone, and if so, what data properties make this possible? This explores why isolated state-action pairs fail where full trajectories succeed.
trajectory burstiness is a data-structural requirement for ICL of sequential tasks; task superposition provides the representational mechanism that makes ICL over multiple in-context trajectories possible

Original note title

LLMs perform multiple ICL tasks simultaneously in superposition — but generation collapse after the first token prevents practical use
