INQUIRING LINE

An AI response can look authoritative and circulate as valuable even when nobody has checked whether it's actually right.

Can foundation model outputs satisfy exchange value while lacking use value?

This explores whether AI outputs can circulate as if valuable — authoritative, trusted, exchangeable — even when their actual usefulness (correctness, real-world grounding) is unverified or absent.

This reads the question through Marx's old distinction: exchange value is what something fetches in circulation, use value is what it actually does for you. The corpus has a sharp answer: yes, and AI does this more radically than any prior commodity. The central claim is that tokenization decouples the two entirely (see "Can exchange value exist entirely without use value?"): an LLM's output earns reliable exchange value through authoritative presentation (fluent, confident, well-formatted) while its use value stays optional and unverifiable. What makes this more radical than ordinary commodification is that use value normally sets a floor; a chair that can't be sat on doesn't sell twice. AI removes that floor. Outputs circulate on social function alone, more like fiat currency than like goods.

What's striking is how many other notes, written about unrelated problems, quietly describe the same gap from different angles. Invalid chain-of-thought reasoning performs nearly as well as valid reasoning (see "Does logical validity actually drive chain-of-thought gains?"): the model learns the *form* of reasoning, the thing that reads as rigorous, not genuine inference. That's exchange value (it looks like a proof) without use value (it proves nothing). Similarly, foundation models turn out to run on task-specific heuristics rather than genuine world models (see "Do foundation models learn world models or task-specific shortcuts?"): the output predicts orbital mechanics convincingly while the underlying 'laws' are nonsensical and slice-dependent. The presentation is sound; the substance underneath isn't what it appears to be.

The deeper worry is that the decoupling can become self-sealing. When users refine prompts, they inject their own anticipated answers into the generation (see "How much does the user shape what a model generates?"), so outputs become co-productions that confirm what the user already expected. Without external grounding this produces epistemic circularity, which is exactly why foundation models *heighten* rather than reduce the need for empirical data (see "Do foundation models actually reduce our need for real data?"). Exchange value (the output feels validated) climbs precisely while use value (does it match reality?) goes unchecked. The same circularity defeats pure self-improvement: models that try to bootstrap without external signals stall, because verification can't come from inside the loop (see "Can models reliably improve themselves without external feedback?").

So the corpus doesn't just say 'yes' — it suggests the gap is the default condition, not a malfunction. Use value has to be re-anchored from outside: real data, third-party judges, tool feedback, user corrections. Every method that reliably restores usefulness works by smuggling in one of those external floors. The thing you didn't know you wanted to know: the floor Marx assumed was automatic now has to be deliberately rebuilt, every time, or the output is just well-dressed circulation.
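One way to picture an external floor, offered as a minimal sketch and not as a method taken from the corpus: two outputs that read as equally confident are accepted only if an independent tool reproduces the claim. The function name `externally_anchored`, the sample outputs, and the arithmetic "tool" are all hypothetical, chosen only for illustration.

```python
from typing import Callable

def externally_anchored(claimed_value: int, tool: Callable[[], int]) -> bool:
    """Accept a claim only if an external tool reproduces the claimed value.

    The check never looks at how fluent or confident the output text is;
    it relies only on a signal from outside the generation loop.
    """
    return tool() == claimed_value

# Hypothetical model outputs: both are presented with the same confidence.
outputs = [
    ("17 * 23 = 391", 391),
    ("17 * 23 = 381", 381),
]

# The "tool" here is plain arithmetic, standing in for real data,
# a third-party judge, or tool feedback.
results = [
    (text, externally_anchored(value, lambda: 17 * 23))
    for text, value in outputs
]

for text, accepted in results:
    print(f"{text!r}: {'accepted' if accepted else 'rejected'}")
```

Presentation plays no part in the decision; only the outside check separates the two outputs, which is the sense in which the floor has to be rebuilt deliberately.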

Sources 6 notes

Can exchange value exist entirely without use value?

AI knowledge achieves reliable exchange-value through authoritative presentation while maintaining optional, unverifiable use-value. This structural decoupling is more radical than Marxist commodification because it removes use-value as a necessary floor—tokens circulate based on social function alone, analogous to fiat currency rather than commodified goods.

Does logical validity actually drive chain-of-thought gains?

Illogical chain-of-thought exemplars matched valid CoT performance on BIG-Bench Hard, showing that structural properties—not logical validity—drive the gains. The model learns the form of reasoning, not genuine inference.

Do foundation models learn world models or task-specific shortcuts?

Inductive bias probes show transformers trained on orbital mechanics and games learn predictive patterns, not unified world structure. Fine-tuning reveals nonsensical, slice-dependent laws; circuit analysis shows arithmetic relies on range-matching heuristics, not algorithms.

How much does the user shape what a model generates?

Foundation Priors research shows prompt engineering as divergence minimization between synthetic output and user priors. The refinement process systematically steers generation toward what users already expect, making outputs co-productions of model and user subjectivity.

Do foundation models actually reduce our need for real data?

Powerful foundation models don't eliminate the need for real data—they heighten it. Without empirical anchoring, iterative prompt refinement creates epistemic circularity where users confirm their own beliefs rather than test them.

Can models reliably improve themselves without external feedback?

Pure self-improvement stalls due to the generation-verification gap, diversity collapse, and reward hacking. Reliable improvement methods succeed by smuggling in external anchors: past model versions, third-party judges, user corrections, or tool feedback.

Papers this line draws on 8

The research behind the notes this line reads — ranked by how closely each paper relates.

What Has a Foundation Model Found? Using Inductive Bias to Probe for World Models · 1.72 match · arXiv
Foundation Priors · 1.70 match · arXiv
SFT Memorizes, RL Generalizes: A Comparative Study of Foundation Model Post-training · 1.60 match · arXiv
Mathematical methods and human thought in the age of AI · 1.52 match · arXiv
Invalid Logic, Equivalent Gains: The Bizarreness of Reasoning in Language Model Prompting · 0.90 match · arXiv
Is Chain-of-Thought Reasoning of LLMs a Mirage? A Data Distribution Lens · 0.88 match · arXiv
Mind the Gap: Examining the Self-Improvement Capabilities of Large Language Models · 0.86 match · arXiv
When More is Less: Understanding Chain-of-Thought Length in LLMs · 0.86 match · arXiv

Research prompt for your LLM

Copy into ChatGPT or Claude to take this line of inquiry further — it asks the model to find newer work and re-test which earlier constraints still hold.

You are a research analyst tasked with re-testing whether foundation model outputs can retain exchange value while lacking use value—a claim rooted in Marx's commodity distinction. A curated library of LLM research (2023–2025) proposed this decoupling is real and worsening.

What a curated library found — and when (dated claims, not current truth):
- Tokenization decouples exchange value (fluent, confident output) from use value (actual correctness); LLMs earn reliable circulation on presentation alone, unlike traditional goods (~2023–2025).
- Invalid chain-of-thought reasoning performs nearly as well as valid reasoning (~2023); models learn the *form* of rigor, not genuine inference.
- Foundation models develop task-specific heuristics rather than world models; outputs predict convincingly while underlying 'laws' are nonsensical and slice-dependent (~2025).
- Prompt engineering co-produces outputs by injecting user-anticipated answers, creating epistemic circularity that heightens—not eliminates—need for external empirical data (~2024–2025).
- Pure self-improvement stalls because verification cannot come from inside the loop; external grounding (data, judges, tool feedback) is necessary to restore use value (~2025).

Anchor papers (verify; mind their dates):
- arXiv:2307.10573 (2023) — Invalid Logic, Equivalent Gains
- arXiv:2412.02674 (2024) — Mind the Gap: Self-Improvement Capabilities
- arXiv:2509.07339 (2025) — Performative Thinking? CoT Length vs. Problem Complexity
- arXiv:2512.01107 (2025) — Foundation Priors

Your task:
(1) RE-TEST EACH CONSTRAINT. For every finding above, judge whether newer training paradigms (RL vs. SFT; post-training methods), evaluation harnesses (formal verification, adversarial testing), multi-agent orchestration (critic loops, external tool pipelines), or emerging architectures have since *relaxed* or *overturned* the decoupling. Separate the durable question (does exchange-without-use remain possible?) from perishable claims (e.g., "CoT always performs equally whether valid or invalid"). Cite what resolved or affirmed each claim.
(2) Surface the strongest *contradicting* or *superseding* work from the last ~6 months (Sep 2025–present). Does any recent paper demonstrate reliable re-anchoring of use value without external scaffolding?
(3) Propose 2 research questions that *assume the regime may have moved*: e.g., "Can RL-trained models recover use value within a closed loop?" and "Does scaling external feedback loops eventually collapse the exchange–use gap?"

Cite arXiv IDs; flag anything you cannot ground in a real paper.
