INQUIRING LINE

Can combinational creativity alone drive open-ended learning in agents?

This explores whether recombining existing concepts and skills — combinational creativity — is by itself enough to push agents into genuinely open-ended learning, or whether it needs other engines alongside it.


This reads the question as: combination (mixing what an agent already knows into new arrangements) is clearly *a* driver of open-ended learning, but is it sufficient *on its own*? The corpus's sharpest answer comes from a note that splits creativity into three distinct modes: combinational, exploratory, and transformational (see "Can LLMs reason creatively beyond conventional problem-solving?"). Combinational creativity recombines familiar pieces; exploratory creativity searches within a space's rules; transformational creativity rewrites the rules themselves. The framing implies that combination alone keeps an agent inside the existing concept space: powerful for recombination, but structurally unable to escape the boundaries that the other two modes are built to cross. So the short answer the corpus points toward is: no, not alone.

That said, the case *for* combination is strong and worth seeing. VOYAGER shows that storing executable skills in a library and composing complex skills from simpler ones produces continual, lifelong learning without the catastrophic forgetting that weight-update methods suffer (see "Can agents learn new skills without forgetting old ones?"). And compositional *language* lets agents imagine goals they've never been trained on by combining familiar concepts: IMAGINE targets out-of-distribution outcomes precisely through recombination (see "Can language help agents imagine goals they've never seen?"). So combination demonstrably reaches beyond the training set. The interesting tension is that neither of these systems runs on combination *alone*: VOYAGER pairs it with environmental feedback and an automatic curriculum that keeps driving exploration, and IMAGINE leans on modularity and social guidance.
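The skill-library idea can be illustrated with a minimal sketch. This is not VOYAGER's implementation: the `SkillLibrary` class, the `primitive` helper, and the skill names are all hypothetical, and the embedding index, environmental feedback, and automatic curriculum are left out. It only shows that composing stored skills adds new entries without touching the old ones.

```python
from typing import Callable, Dict, List

# Hypothetical representation: a skill takes an action log and returns a longer one.
Skill = Callable[[List[str]], List[str]]


class SkillLibrary:
    """Toy stand-in for a library of executable skills (assumed design)."""

    def __init__(self) -> None:
        self._skills: Dict[str, Skill] = {}

    def add(self, name: str, skill: Skill) -> None:
        self._skills[name] = skill

    def compose(self, name: str, parts: List[str]) -> None:
        """Register a new skill that runs already-stored skills in order."""
        missing = [p for p in parts if p not in self._skills]
        if missing:
            raise KeyError(f"unknown skills: {missing}")
        steps = [self._skills[p] for p in parts]

        def composed(log: List[str]) -> List[str]:
            for step in steps:
                log = step(log)
            return log

        self._skills[name] = composed

    def run(self, name: str) -> List[str]:
        """Execute a skill from an empty action log."""
        return self._skills[name]([])


def primitive(action: str) -> Skill:
    """Build a one-step skill that appends a single action to the log."""
    return lambda log: log + [action]


library = SkillLibrary()
for action in ("chop_wood", "craft_planks", "craft_table"):
    library.add(action, primitive(action))

# Complex skills are built only from skills that already exist.
library.compose("make_planks", ["chop_wood", "craft_planks"])
library.compose("make_table", ["make_planks", "craft_table"])

print(library.run("make_table"))  # ['chop_wood', 'craft_planks', 'craft_table']
print(library.run("chop_wood"))   # ['chop_wood'] (the old skill is unchanged)
```

Note what the sketch cannot do: `compose` only reaches skills expressible as sequences of the three primitives. Anything outside that set has to come from somewhere else, which is the role the essay assigns to feedback and curriculum.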

That 'plus exploration' pattern recurs across the corpus. Combination needs a supply of raw material and a pressure toward novelty, or it converges and stalls. Two notes warn about exactly this stalling. RL training quietly collapses behavioral diversity in search agents the same way it does in reasoning, narrowing policies onto a few reward-maximizing moves (see "Does reinforcement learning squeeze exploration diversity in search agents?"). And abstraction-guided methods beat depth-only reasoning because they enforce *breadth-first* exploration instead of letting a chain underthink down one path (see "Can abstractions guide exploration better than depth alone?"). Open-endedness dies when diversity dies, and combination over a shrinking pool of ingredients accelerates that death rather than preventing it.
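One simple way to put a number on "behavioral diversity" is the Shannon entropy of an agent's empirical action distribution. The sketch below is an illustration under that assumption; it is not the metric used in the cited notes, and the action names and logs are made up.

```python
import math
from collections import Counter
from typing import Sequence


def action_entropy(actions: Sequence[str]) -> float:
    """Shannon entropy, in bits, of the empirical action distribution."""
    counts = Counter(actions)
    total = len(actions)
    # p * log2(1/p), summed over every distinct action that was observed.
    return sum((c / total) * math.log2(total / c) for c in counts.values())


# Hypothetical logs: a policy that spreads over four moves, and one that has
# narrowed onto a single reward-maximizing move.
diverse = ["search", "browse", "summarize", "ask"] * 5
collapsed = ["search"] * 18 + ["browse"] * 2

print(action_entropy(diverse))    # 2.0 bits: four equally likely actions
print(action_entropy(collapsed))  # well under 1 bit
```

Tracking a quantity like this over training would show the collapse as a falling curve; any recombination step downstream then draws from a smaller effective set of ingredients.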

There's also a ceiling problem that combination can't break by itself. Agents trained only on curated expert demonstrations are capped by the curator's imagination: they never interact with an environment, never fail, and never generalize past what was shown (see "Can agents learn beyond what their training data shows?"). Recombining a fixed, curated vocabulary just reshuffles that ceiling. What lifts it is grounded experience: agents that store verbal reflections from trial-and-error feedback (see "Can agents learn from failure without updating their weights?"), that process successes and failures differently to extract abstracted lessons (see "Should successful and failed episodes be processed differently?"), or that learn continually through memory operations while the model weights stay frozen (see "Can agents learn continuously from experience without updating weights?"). And one note adds a humbling caveat from the multi-agent world: combinatorial cognitive *diversity* only improves ideation when paired with genuine domain expertise; stimulation without competence produces process losses, not insight (see "Does cognitive diversity alone improve multi-agent ideation quality?").
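The "learn from failure without weight updates" pattern can be sketched as a loop in which the only state that changes between episodes is a memory of reflections. This is a toy under strong assumptions: the `run_with_reflections` function, the plan names, and the templated lesson text are hypothetical, whereas the systems described above have a language model write free-form self-diagnoses.

```python
from typing import Callable, Dict, List, Optional, Tuple

Reflection = Dict[str, str]


def run_with_reflections(
    plans: List[str],
    succeeds: Callable[[str], bool],
    max_episodes: int = 10,
) -> Tuple[Optional[str], List[Reflection]]:
    """Retry across episodes, conditioning each attempt on stored reflections."""
    reflections: List[Reflection] = []  # the only thing that changes over time
    for episode in range(max_episodes):
        ruled_out = {r["plan"] for r in reflections}
        remaining = [p for p in plans if p not in ruled_out]
        if not remaining:
            return None, reflections
        plan = remaining[0]
        # Binary environmental feedback: the plan either worked or it did not.
        if succeeds(plan):
            return plan, reflections
        reflections.append(
            {"plan": plan, "lesson": f"Episode {episode}: '{plan}' failed; avoid it."}
        )
    return None, reflections


plans = ["dig_down", "build_bridge", "swim_across"]
winner, memory = run_with_reflections(plans, lambda p: p == "swim_across")

print(winner)       # swim_across
print(len(memory))  # 2 stored reflections, one per failed episode
```

The decision rule here never changes; behavior improves only because the memory grows. The toy also makes the ceiling visible: if no plan in `plans` succeeds, the loop returns `None` however many reflections it accumulates.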

The thing you didn't know you wanted to know: combinational creativity behaves less like an engine and more like a transmission. It multiplies whatever exploration, feedback, and grounding are feeding it — and if those run dry, recombination doesn't sustain open-endedness, it just rearranges the deck chairs. Open-ended learning seems to need combination *coupled to* a renewable source of novelty (environmental interaction, diversity preservation, or the rule-breaking 'transformational' mode), not combination standing alone.


Sources (10 notes)

Can LLMs reason creatively beyond conventional problem-solving?

Research identifies combinational, exploratory, and transformational reasoning as distinct creative modes grounded in cognitive science. Existing LLM reasoning methods address only conventional problem-solving, leaving creative paradigms unaddressed and potentially explaining diversity collapse in ideation.

Can agents learn new skills without forgetting old ones?

VOYAGER demonstrates that storing executable skills in an embedding-indexed library and composing complex skills from simpler ones allows agents to learn continuously while avoiding the forgetting that occurs with weight-update-based methods. Environmental feedback refines skills while an automatic curriculum drives continual exploration.

Can language help agents imagine goals they've never seen?

IMAGINE demonstrates that agents using compositional language descriptions can target novel outcomes by combining familiar concepts, with modularity and social guidance amplifying generalization from imagined to real exploration.

Does reinforcement learning squeeze exploration diversity in search agents?

RL training compresses behavioral diversity in search agents through the same entropy collapse mechanism documented in reasoning—policies converge on narrow reward-maximizing strategies. SFT on diverse demonstrations preserves exploration breadth, suggesting diversity-preservation techniques are essential for RL search scaling.

Can abstractions guide exploration better than depth alone?

RLAD jointly trains abstraction and solution generators, showing that allocating test-time compute to diverse abstractions outperforms parallel solution sampling at large budgets. Abstractions create structured breadth-first exploration that prevents the underthinking failure mode of depth-only reasoning chains.

Can agents learn beyond what their training data shows?

Agents trained on static expert datasets cannot learn from their own failures or generalize beyond demonstrated scenarios because they never interact with environments during training. Competence is capped by what curators imagined, not by agent capacity.

Can agents learn from failure without updating their weights?

Reflexion demonstrates that unambiguous environmental feedback (success/failure) enables agents to write useful self-diagnoses and improve across episodes without parameter updates. The binary signal prevents rationalization, and keeping reflections uncompressed preserves their usability.

Should successful and failed episodes be processed differently?

SkillRL demonstrates that treating successful episodes as concrete demonstrations and failures as abstracted lessons achieves state-of-the-art performance on complex tasks while using substantially less context than uniform approaches. The asymmetry mirrors human expert reasoning and avoids the degradation seen in uniform consolidation methods.

Can agents learn continuously from experience without updating weights?

AgentFly formalizes agent learning as a Memory-augmented MDP with three memory modules (case, subtask, tool) that enable credit assignment and policy improvement entirely through memory operations. The approach achieved 87.88% on GAIA validation without modifying LLM parameters.

Does cognitive diversity alone improve multi-agent ideation quality?

Multi-agent teams substantially outperform solo ideation, but only when members possess genuine senior-level knowledge. Diverse teams without expertise underperform even a single competent agent, because cognitive stimulation without expertise triggers process losses instead of insight.

Research prompt for your LLM

Copy into ChatGPT or Claude to take this line of inquiry further — it asks the model to find newer work and re-test which earlier constraints still hold.

You are an AI research analyst. The question remains: can combinational creativity alone drive open-ended learning in agents?

What a curated library found — and when (findings span 2020–2026; treat these as dated claims, not current truth):
• Combinational creativity (recombining familiar pieces) is necessary but not sufficient; transformational creativity (rewriting rules) and exploratory creativity (searching within rules) are structurally distinct modes, and combination alone keeps agents inside existing concept spaces (2026 synthesis).
• VOYAGER and related systems show lifelong compositional skill learning without catastrophic forgetting, but only when paired with environmental feedback and automatic curricula that drive exploration — combination multiplies whatever exploration feeds it, rather than sustaining novelty independently (2025–2026).
• RL training collapses behavioral diversity in search agents the same way it does in reasoning, narrowing policies onto reward-maximizing moves; open-endedness dies when diversity dies (2025).
• Agents trained solely on curated expert demonstrations are capped by curator imagination; grounded experience (trial-and-error feedback, episodic memory, continual learning) is required to escape this ceiling (2020–2026).
• Combinatorial cognitive diversity in multi-agent systems only improves ideation when paired with genuine domain expertise; diversity without competence produces process losses (2025).

Anchor papers (verify; mind their dates):
• arXiv:2310.10134 (CLIN, 2023-10) — continual language agent learning
• arXiv:2505.20296 (Reasoning LLMs, 2025-05) — solution space exploration
• arXiv:2508.04575 (Multi-agent ideation, 2025-08) — what drives high-quality ideas
• arXiv:2604.08377 (SkillClaw, 2026-04) — collective skill evolution

Your task:
(1) RE-TEST the sufficiency claim: newer models (o1, o3, Claude 3.5+) or training methods (DPO, preference-based curricula, synthetic environments with intrinsic variability) may relax the "diversity dies" constraint or enable combinational systems to bootstrap their own transformational operators. Distinguish the durable question (what sustains open-endedness long-term?) from perishable limitations (current RL architectures collapse diversity). Cite what resolved each, plainly.
(2) Surface contradicting/superseding work from the last ~6 months: identify any 2025–2026 papers showing either (a) combinational-only systems that sustain novelty, or (b) emergent rule-rewriting arising from pure recombination without explicit transformational design.
(3) Propose 2 research questions assuming the regime has shifted: (i) Can in-context learning or multi-turn scaffolding let LLMs internally generate transformational moves, making external architectural separation unnecessary? (ii) Do memory-augmented agents (artifact-as-memory systems, 2026-04) circumvent the curator-ceiling by continuous unfiltered experience capture, and if so, does combination alone then suffice?

Cite arXiv IDs; flag anything you cannot ground in a real paper.
