INQUIRING LINE

Training, RL, and Test-Time Scaling · Agentic Systems and Tool Use · Model Architecture and Internals · cross-cluster

Can text-space optimization and audit governance coexist in a single skill lifecycle?

This explores whether a skill document can be both auto-optimized like model weights (text-space optimization) and kept under a human-auditable approval gate (governance) — and whether those two goals fight each other or reinforce each other within one workflow.

The corpus's most direct answer is yes, and the two aren't in tension: they're the same mechanism. The note "Can skill documents be optimized like neural network weights?" (SkillOpt) treats a plain-English skill doc as a trainable object: a separate optimizer proposes edits, but each edit is accepted only if it strictly improves a held-out validation score. That validation gate is the governance. Because the optimization happens in text rather than in opaque weights, every accepted change is a readable diff a human can inspect, so you get the iterative improvement of training plus an audit trail that parameter fine-tuning doesn't give you. The text-space substrate is what makes the lifecycle auditable; auditability is not a constraint bolted on afterward.
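The accept-only-if-strictly-better loop can be sketched in a few lines. This is a minimal illustration, not SkillOpt's actual implementation: `optimize_skill`, `AuditEntry`, `make_proposer`, and `toy_score` are hypothetical names, the proposer simply replays a fixed list of edits, and the keyword-based score is only a stand-in for a real held-out validation score.

```python
import difflib
from dataclasses import dataclass
from typing import Callable, List, Tuple


@dataclass
class AuditEntry:
    """One proposed edit, kept whether or not it was accepted."""
    step: int
    accepted: bool
    old_score: float
    new_score: float
    diff: str  # human-readable unified diff of the proposal


def optimize_skill(
    skill: str,
    propose: Callable[[str], str],
    score: Callable[[str], float],
    steps: int,
) -> Tuple[str, List[AuditEntry]]:
    """Accept a proposed edit only if it strictly improves the score."""
    log: List[AuditEntry] = []
    best = score(skill)
    for step in range(steps):
        candidate = propose(skill)
        cand_score = score(candidate)
        diff = "\n".join(
            difflib.unified_diff(
                skill.splitlines(), candidate.splitlines(),
                "current", "candidate", lineterm="",
            )
        )
        accepted = cand_score > best  # strict: ties are rejected
        log.append(AuditEntry(step, accepted, best, cand_score, diff))
        if accepted:
            skill, best = candidate, cand_score
    return skill, log


# --- Toy stand-ins (assumptions, for illustration only) ---

def make_proposer(extra_lines: List[str]) -> Callable[[str], str]:
    """Hypothetical optimizer: appends the next canned line to the doc."""
    it = iter(extra_lines)
    return lambda doc: doc + "\n" + next(it)


REQUIRED = ("validate input", "retry on timeout")


def toy_score(doc: str) -> float:
    """Stand-in for held-out validation: fraction of needed instructions present."""
    return sum(phrase in doc.lower() for phrase in REQUIRED) / len(REQUIRED)


final_doc, audit_log = optimize_skill(
    "Skill: call the API.",
    make_proposer([
        "Always validate input first.",
        "Be awesome.",
        "Retry on timeout twice.",
    ]),
    toy_score,
    steps=3,
)

for entry in audit_log:
    status = "ACCEPT" if entry.accepted else "REJECT"
    print(f"step {entry.step}: {status} {entry.old_score} -> {entry.new_score}")
```

In this toy run the empty "Be awesome." edit is rejected because it does not move the score, while both substantive edits are accepted. Every proposal, accepted or not, leaves a diff in `audit_log` that a reviewer can read.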

Why a strict gate is non-negotiable becomes clear from work on self-improvement limits. The note "What stops large language models from improving themselves?" shows that LLMs can't reliably improve themselves through introspection alone: every dependable fix needs something external to validate and enforce it, because of the gap between generating a change and verifying that it's actually better. A skill optimizer that proposed edits and accepted them on its own judgment would drift; the held-out validation set is precisely the external verifier that closes that gap. Governance here isn't bureaucratic overhead. It's the thing that keeps the loop from fooling itself.

The catch is what you let do the gating. If you replace held-out task scores with an LLM acting as judge, the audit becomes corruptible. The note "Can LLM judges be fooled by fake credentials and formatting?" shows that judges fall for authority signals and fancy formatting, with no model access needed, so an optimizer could "improve" a skill doc by adding impressive-looking but empty credentials and still pass review. This is the real design fork in the lifecycle: ground the gate in objective held-out performance (robust) or in model-judged quality (gameable). SkillOpt's strict-improvement-on-real-validation design sidesteps the trap that biased judges open up.
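The fork between the two kinds of gate can be made concrete with a toy comparison. Everything here is an assumption for illustration: `surface_judge_score` is a deliberate caricature of a judge swayed by citations and bold text, not a model of any real LLM judge, and `held_out_score` is a keyword check standing in for real task performance.

```python
import re
from typing import Callable


def strict_gate(score: Callable[[str], float]) -> Callable[[str, str], bool]:
    """Build a gate that accepts a candidate only if its score strictly improves."""
    def accept(current: str, candidate: str) -> bool:
        return score(candidate) > score(current)
    return accept


def held_out_score(doc: str) -> float:
    """Stand-in for objective validation: fraction of needed instructions present."""
    needed = ("validate input", "retry on timeout")
    return sum(phrase in doc.lower() for phrase in needed) / len(needed)


def surface_judge_score(doc: str) -> float:
    """Caricature of a biased judge: rewards citation markers and bold text only."""
    citations = len(re.findall(r"\[\d+\]", doc))
    bold_spans = doc.count("**") // 2
    return float(citations + bold_spans)


current = "Validate input before calling the API."
padded = current + "\n**Endorsed by leading experts** [1][2]."  # empty credentials
useful = current + "\nRetry on timeout twice."                   # real improvement

objective_gate = strict_gate(held_out_score)
judged_gate = strict_gate(surface_judge_score)

print("padded edit:", objective_gate(current, padded), judged_gate(current, padded))
print("useful edit:", objective_gate(current, useful), judged_gate(current, useful))
```

Under these assumptions the objective gate rejects the padded edit and accepts the useful one, while the surface-driven judge does the opposite. The gating logic is identical in both cases; only the scoring signal differs, which is why the choice of signal is the design decision that matters.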

There's also a ceiling worth knowing about. Text-space optimization reorganizes what's already there; it doesn't conjure new capability. The note "Can prompt optimization teach models knowledge they lack?" makes the hard version of this point: prompting can only surface knowledge already in the model, never add missing knowledge. A skill doc can sharpen, sequence, and activate a model's existing competence, but if the underlying ability isn't there, no amount of document tuning supplies it. For that you need a different lever: the note "Do tools actually expand what language models can reason about?" shows that tools genuinely expand the reasoning frontier in ways text alone can't reach. So a mature skill lifecycle does two jobs: optimize the document for what the model can already do, and reach for tools when the frontier itself needs to move.

The quietly important part is that this whole approach is legible by construction. Compare it to the alternatives in the corpus: composing expert vectors at inference ("Can models dynamically activate expert skills at inference time?") or swarms searching weight space for new experts ("Can language models discover new expertise through collaborative weight search?"). Both are powerful, and both produce changes buried in numbers no reviewer can read. A skill doc trained in text space gives you a candidate that's strong and inspectable, transferable between models, and rejectable line by line. That's the real answer to the question: text-space optimization and audit governance don't merely coexist, they're load-bearing for each other. The auditability is a free consequence of choosing text as the thing you train.

Sources 7 notes

Can skill documents be optimized like neural network weights?

SkillOpt demonstrates that skill documents can be systematically improved through a separate optimizer that proposes edits, accepting only changes that strictly improve held-out validation scores. This approach outperforms baselines across 52 experimental cells and produces skills that transfer between models.

What stops large language models from improving themselves?

Self-improvement in LLMs is formally bounded by the generation-verification gap, meaning every reliable fix requires something external to validate and enforce it. Models cannot escape this constraint through metacognition alone.

Can LLM judges be fooled by fake credentials and formatting?

Research identified four evaluation biases in LLM judges, with authority and beauty biases being semantics-agnostic and trivially exploitable through fake references and formatting—zero-shot attacks requiring no model access or optimization.

Can prompt optimization teach models knowledge they lack?

Prompting works entirely within a model's pre-existing training distribution and cannot supply domain knowledge absent from training data. This creates a hard ceiling: no prompt strategy can compensate for missing foundational knowledge, only reorganize what already exists.

Do tools actually expand what language models can reason about?

Formal proof shows tool-integrated reasoning enables strategies impossible or prohibitively verbose in text alone, expanding both empirical and feasible support. The advantage spans abstract reasoning, not just arithmetic, and Advantage Shaping Policy Optimization stabilizes training without reward distortion.

Can models dynamically activate expert skills at inference time?

Transformer² demonstrates that tuning only singular values within weight matrices produces composable expert vectors that dynamically mix at inference without interference, outperforming LoRA with fewer parameters and enabling continual specialization.

Can language models discover new expertise through collaborative weight search?

PSO-inspired swarms of LLM particles moving through weight space discover composed experts with new capabilities—including answering questions all initial experts failed on—using only 200 validation examples and no gradient-based training.
