INQUIRING LINE


Doing the structural work on your data upfront means a compute-limited AI can actually access more of the information that was always there.

How does deterministic feature engineering increase information for computationally bounded agents?

This explores why pre-computing structured features, rather than handing an agent raw data, raises the amount of *usable* information available to a model that can't afford unlimited computation. The corpus reframes this as a question about externalizing structure. The cleanest anchor in the corpus is epiplexity, a measure of how much structure a *computationally bounded* observer can actually extract from data, as opposed to the raw entropy sitting in it (see: What can a bounded observer actually learn from data?). The key move there is separating learnable regularity from time-bounded entropy. Two datasets can carry the same nominal information, yet one yields far more to an observer with only so many cycles to spend. Deterministic feature engineering does that extraction work *ahead of time*. It converts structure the agent would have spent compute discovering into structure it can simply read off. Measured by Shannon entropy, nothing was added, because a deterministic function of the data cannot carry more entropy than the data itself. What grew is the share of that information a bounded agent can reach, and that is the sense in which feature engineering increases information.
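
A toy sketch of the idea, under stated assumptions. It is not the epiplexity measure from the cited paper. Here the "bounded observer" is a hypothetical learner that may consult only one feature column at a time (a decision stump), standing in for limited compute. The label is the XOR of two raw bits, so no single raw column predicts it. Appending the XOR as a deterministic engineered column leaves the empirical entropy of the rows unchanged, because the map is one-to-one. It does, however, let the same bounded learner read the label off directly.

```python
import itertools
import math
from collections import Counter


def entropy(rows):
    """Empirical Shannon entropy (in bits) of a list of hashable rows."""
    counts = Counter(rows)
    n = len(rows)
    return -sum(c / n * math.log2(c / n) for c in counts.values())


def best_stump_accuracy(rows, labels):
    """Hypothetical bounded learner: it may look at ONE column only and
    predict the label from a lookup table on that column's values."""
    best = 0.0
    for j in range(len(rows[0])):
        table = {}
        for r, y in zip(rows, labels):
            table.setdefault(r[j], Counter())[y] += 1
        # Predict the majority label for each value of column j.
        correct = sum(c.most_common(1)[0][1] for c in table.values())
        best = max(best, correct / len(rows))
    return best


# Raw data: every pair of bits, repeated; the label is their parity (XOR).
raw = [tuple(bits) for bits in itertools.product([0, 1], repeat=2)] * 25
labels = [a ^ b for a, b in raw]

# Deterministic feature engineering: append the parity as an extra column.
engineered = [r + (r[0] ^ r[1],) for r in raw]

print(entropy(raw), entropy(engineered))        # 2.0 2.0
print(best_stump_accuracy(raw, labels))         # 0.5
print(best_stump_accuracy(engineered, labels))  # 1.0
```

The entropy of the data is identical before and after, but the bounded learner goes from chance to perfect. An unbounded learner that could consider both raw columns jointly would not have needed the extra column at all.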

The corpus generalizes this well beyond classic feature columns. A recurring finding is that reliable agents *externalize cognitive burdens* (memory, skills, and protocols) into a harness layer instead of forcing the model to re-derive them on every pass (see: Where does agent reliability actually come from?). That externalization is the same trick as feature engineering. Anything you compute once and store is something the bounded model no longer has to spend its limited budget rediscovering. VOYAGER's executable skill library is feature engineering for behavior: it composes complex competences from stored simpler ones, so the agent never re-solves a solved subproblem (see: Can agents learn new skills without forgetting old ones?). DeepAgent's memory folding does the same for history. It consolidates raw interaction logs into episodic, working, and tool memory schemas that are cheaper to act on (see: Can agents compress their own memory without losing critical details?).
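
A minimal sketch of "compute once, store, compose," assuming a harness that keeps skills as plain Python callables keyed by name. `SkillLibrary`, `get`, and the crafting-style skill names are hypothetical and are not VOYAGER's actual API. VOYAGER stores JavaScript programs and retrieves them by embedding similarity; the dictionary lookup here stands in for that retrieval. The `derivations` counter shows the discovery cost being paid once per skill.

```python
from typing import Callable, Dict

Skill = Callable[[dict], dict]


class SkillLibrary:
    """Hypothetical harness-side skill store (a dict stands in for
    embedding-indexed retrieval)."""

    def __init__(self) -> None:
        self.skills: Dict[str, Skill] = {}
        self.derivations = 0  # how many times a skill had to be derived

    def get(self, name: str, derive: Callable[[], Skill]) -> Skill:
        # Derive a skill only on a miss; afterwards it is simply read back.
        if name not in self.skills:
            self.derivations += 1
            self.skills[name] = derive()
        return self.skills[name]


lib = SkillLibrary()


def derive_collect_wood() -> Skill:
    return lambda inv: {**inv, "log": inv.get("log", 0) + 1}


def derive_make_planks() -> Skill:
    def make_planks(inv: dict) -> dict:
        inv = dict(inv)
        if inv.get("log", 0) >= 1:
            inv["log"] -= 1
            inv["planks"] = inv.get("planks", 0) + 4
        return inv
    return make_planks


def derive_gather_planks() -> Skill:
    # A composite skill built from two stored, simpler skills.
    collect = lib.get("collect_wood", derive_collect_wood)
    planks = lib.get("make_planks", derive_make_planks)
    return lambda inv: planks(collect(inv))


inv: dict = {}
for _ in range(3):
    inv = lib.get("gather_planks", derive_gather_planks)(inv)

print(inv, lib.derivations)  # {'log': 0, 'planks': 12} 3
```

Three passes cost three derivations in total. Re-deriving all three skills on every pass would cost nine.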

There's a subtlety worth naming: *how* you compress structure matters as much as whether you do. Reflexion deliberately keeps its self-diagnoses **uncompressed**. The unambiguous success/failure signal guards against rationalization, and the verbatim reflection is what stays usable in the next episode. Squeeze that reflection too hard and you destroy the very information you were trying to make accessible (see: Can agents learn from failure without updating their weights?). So deterministic feature engineering increases information for a bounded agent only when the transformation preserves the load-bearing structure, not when it merely shrinks bytes.
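
A toy contrast between byte-shrinking and structure-preserving compression, loosely in Reflexion's spirit but not its implementation. The drawer task, `choose_drawer`, and both compressors are hypothetical. The policy is "bounded" in that it can only act on an explicit hint. It cannot re-derive the cause of a failure from outcome bits alone.

```python
from dataclasses import dataclass
from typing import List, Union


@dataclass
class Episode:
    success: bool    # unambiguous binary signal from the environment
    reflection: str  # self-diagnosis the agent wrote after the episode


def compress_to_bits(history: List[Episode]) -> str:
    # Aggressive compression: keep only the outcome bits, drop the diagnoses.
    return "".join("1" if e.success else "0" for e in history)


def keep_verbatim(history: List[Episode], window: int = 3) -> List[str]:
    # Structure-preserving: keep the most recent reflections word for word.
    return [e.reflection for e in history[-window:]]


def choose_drawer(memory: Union[str, List[str]]) -> int:
    # Hypothetical bounded policy: it follows a hint of the form
    # "try drawer N" but cannot infer the cause from outcome bits.
    if isinstance(memory, list):
        for note in reversed(memory):
            if "try drawer" in note:
                return int(note.rsplit(" ", 1)[-1])
    return 1  # default first guess


history = [Episode(False, "The key was not in drawer 1; try drawer 2")]
bits = compress_to_bits(history)
notes = keep_verbatim(history)

print(repr(bits), choose_drawer(bits))  # the policy repeats drawer 1
print(notes, choose_drawer(notes))      # the policy moves on to drawer 2
```

The bit string is far smaller, but it has dropped exactly the part of the episode the next attempt depends on.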

The sharpest cautionary note comes from work showing that a model can contain every linearly decodable feature a task needs and still be internally *fractured*: perfect accuracy masks broken organization that collapses under perturbation (see: Can models be smart without organized internal structure?). The lesson for feature engineering is that making a feature *decodable* is not the same as encoding genuine structure. You can hand a bounded agent features that look informative and score well, yet haven't raised its epiplexity at all.
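
A deliberately crude caricature, not the method of the cited work. It uses two hypothetical "models" that emit the same parity feature with perfect accuracy on the inputs they were checked on. One computes it with a single shared rule. The other is a per-input lookup table with no shared organization. Accuracy on the familiar inputs cannot tell them apart; only a shift in the inputs exposes the difference.

```python
from typing import Callable, Iterable

TRAIN = range(16)

# "Unified": one shared rule reads the parity bit directly off the input.
def unified(x: int) -> int:
    return x & 1

# "Fractured": one memorized entry per seen input, no shared structure.
table = {x: x & 1 for x in TRAIN}

def fractured(x: int) -> int:
    return table.get(x, 0)  # unseen inputs fall back to a constant

def accuracy(model: Callable[[int], int], xs: Iterable[int]) -> float:
    xs = list(xs)
    return sum(model(x) == (x & 1) for x in xs) / len(xs)

print(accuracy(unified, TRAIN), accuracy(fractured, TRAIN))  # 1.0 1.0
shifted = range(16, 32)
print(accuracy(unified, shifted), accuracy(fractured, shifted))  # 1.0 0.5
```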

Why this matters: it reframes the whole small-versus-large-model debate. If most agent work consists of repetitive, well-defined subtasks, then small language models suffice precisely because good external structure has already done the heavy lifting. The model is bounded on purpose, and the engineered features are what make that economical (see: Can small language models handle most agent tasks?). Feature engineering and computational boundedness aren't opposing forces; the first is how you make the second pay off.
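
A back-of-the-envelope sketch of the heterogeneous "SLM by default, LLM selectively" pattern. The cost units, the 20x ratio, and the `well_defined` flag are all hypothetical. The ratio was picked from inside the 10 to 30 times range the cited note reports. A real router would need an actual signal for when a task is well defined, such as an existing schema, template, or stored skill.

```python
from dataclasses import dataclass
from typing import List

SLM_COST = 1.0   # hypothetical cost units per call
LLM_COST = 20.0  # hypothetical; within the 10-30x range cited above


@dataclass
class Task:
    name: str
    well_defined: bool  # e.g. backed by a schema, template, or stored skill


def route(task: Task) -> str:
    # Default to the small model; escalate only open-ended work.
    return "slm" if task.well_defined else "llm"


def total_cost(tasks: List[Task]) -> float:
    return sum(SLM_COST if route(t) == "slm" else LLM_COST for t in tasks)


tasks = [Task(f"extract_{i}", True) for i in range(9)]
tasks.append(Task("plan_new_project", False))

print(total_cost(tasks), LLM_COST * len(tasks))  # 29.0 200.0
```

With nine routine tasks and one open-ended task, routing costs 29 units against 200 for sending everything to the large model. The saving exists only because the routine tasks were already structured enough to be labeled well defined.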

Sources (7 notes)

What can a bounded observer actually learn from data?

Epiplexity formalizes the structural information a computationally bounded observer can extract from data, separating learnable regularity from time-bounded entropy. This task-free measure correlates with out-of-distribution generalization and explains why some datasets enable broader transfer than others.

Where does agent reliability actually come from?

Research shows reliable LLM agents externalize three cognitive burdens—memory (state persistence), skills (procedural components), and protocols (structured interaction)—into a harness layer rather than relying on model scale alone. The harness unifies these externalities and eliminates the need for the model to solve the same problems repeatedly.

Can agents learn new skills without forgetting old ones?

VOYAGER demonstrates that storing executable skills in an embedding-indexed library and composing complex skills from simpler ones allows agents to learn continuously while avoiding the forgetting that occurs with weight-update-based methods. Environmental feedback refines skills while an automatic curriculum drives continual exploration.

Can agents compress their own memory without losing critical details?

DeepAgent's autonomous memory folding consolidates interaction history into episodic, working, and tool memory schemas. This reduces token overhead while letting agents pause to reconsider strategies—the autonomy and structure together avoid degradation that plagues poorly designed consolidation.

Can agents learn from failure without updating their weights?

Reflexion demonstrates that unambiguous environmental feedback (success/failure) enables agents to write useful self-diagnoses and improve across episodes without parameter updates. The binary signal prevents rationalization, and keeping reflections uncompressed preserves their usability.


Can models be smart without organized internal structure?

Models trained with SGD can contain all the linearly decodable features needed for a task while maintaining fundamentally broken internal organization. This leaves them vulnerable to perturbation and distribution shift in ways that standard evaluation metrics do not reveal.

Can small language models handle most agent tasks?

SLMs handle the repetitive, well-defined language tasks that constitute most agent work at 10–30× lower cost than LLMs, making heterogeneous architectures (SLMs by default, LLMs selective) the economically rational design pattern.

Papers this line draws on (8)

The research behind the notes this line reads — ranked by how closely each paper relates.

Useful Memories Become Faulty When Continuously Updated by LLMs (2.60 match, arXiv)
SkillClaw: Let Skills Evolve Collectively with Agentic Evolver (1.73 match, arXiv)
LatentSkill: From In-Context Textual Skills to In-Weight Latent Skills for LLM Agents (1.73 match, arXiv)
Harness Updating Is Not Harness Benefit: Disentangling Evolution Capabilities in Self-Evolving LLM Agents (1.72 match, arXiv)
Towards Principled Evaluations of Sparse Autoencoders for Interpretability and Control (1.58 match, arXiv)
Small Language Models are the Future of Agentic AI (0.91 match, arXiv)
From Entropy to Epiplexity: Rethinking Information for Computationally Bounded Intelligence (0.90 match, arXiv)
DeepAgent: A General Reasoning Agent with Scalable Toolsets (0.89 match, arXiv)

Research prompt for your LLM

Copy into ChatGPT or Claude to take this line of inquiry further — it asks the model to find newer work and re-test which earlier constraints still hold.

You are a research analyst re-evaluating claims about how deterministic feature engineering increases usable information for computationally bounded agents. The question remains open: does pre-computing structure truly raise epiplexity, or have newer training methods, model scaling, or agentic orchestration changed the tradeoff?

What a curated library found — and when (dated claims, not current truth):
Findings span 2023–2026; treat these as a snapshot, not current consensus.
- Epiplexity (learnable structure extractable by time-bounded observers) is separable from raw entropy; feature engineering externalizes discovery cost ahead of time (~2026).
- Agent reliability correlates with externalizing cognitive burdens (memory, skills, protocols) into harness layers rather than forcing model re-derivation per step (~2026).
- Skill libraries (VOYAGER-style) and memory folding (DeepAgent) compress interaction history into cheaper-to-act-on schemas; small models suffice when good external structure is present (~2025–2026).
- Reflexion's uncompressed verbal reflection preserves usability; aggressive compression destroys load-bearing structure (~2024+).
- Models can show identical performance metrics while internally fractured—surfacing a decodable feature ≠ genuine epiplexity gain (~2024).

Anchor papers (verify; mind their dates):
- arXiv:2601.03220 (2026-01): *From Entropy to Epiplexity* — formalization of bounded observer information.
- arXiv:2510.21618 (2025-10): *DeepAgent* — memory folding for agents.
- arXiv:2506.02153 (2025-06): *Small Language Models are the Future of Agentic AI* — sufficiency thesis via externalization.
- arXiv:2604.08224 (2026-04): *Externalization in LLM Agents* — unified review of harness strategies.

Your task:
(1) RE-TEST EACH CONSTRAINT. For every claim above, assess whether recent advances in model scaling, in-context learning, retrieval-augmented orchestration, or multi-agent coordination have relaxed the need for pre-computed feature engineering, or whether bounded agents still benefit. Separate the durable insight (epiplexity is real; bounded observers *can* be information-starved) from perishable limitation (e.g., *this specific skill library cuts latency by 40%*—does it still, with cached embeddings?).
(2) Surface the strongest CONTRADICTING or SUPERSEDING work from the last ~6 months—any paper showing end-to-end learning outperforms externalization, or vice versa, or revealing a regime boundary.
(3) Propose 2 research questions that ASSUME the regime may have shifted: (a) Can modern retrieval + in-context few-shot obviate pre-engineered features for small models? (b) At what computational budget does externalization stop being the dominant strategy?

Cite arXiv IDs; flag anything you cannot ground in a real paper.
