What tacit knowledge do researchers assume humans will fill in automatically?
This explores the implicit human capacities that AI research quietly outsources to the reader: the verification, social judgment, and contextual sense-making that systems assume humans will perform automatically. The corpus keeps circling one uncomfortable answer: the most load-bearing knowledge is exactly the part that never makes it into the model, because researchers treat it as the human's job by default.
The sharpest version is about expertise as a social act. Expert claims have to be more than factually correct: they are *validity claims* that succeed only when an audience accepts them, and competence means anticipating that reception in advance (*Can AI anticipate whether expert claims will be socially valid?*). AI can estimate statistical correctness but not contextual acceptability. A related finding goes further: GPT-4.5 predicts social appropriateness better than any individual human, yet it remains structurally unable to *participate* in the community processes that create and validate those norms (*Can AI predict social norms better than humans?*). So when a system outputs a confident claim, the tacit work of judging whether it would survive in a real expert community is silently handed to the human.
The same pattern shows up around verification. AI output is structurally hearsay: testimony at a remove, modified in retelling, with unattributable origin (*Does AI-generated knowledge have the same structure as hearsay?*). The human is therefore assumed to bring the Enlightenment toolkit of citation, archiving, and evidentiary chains that the output itself cannot supply. Yet generated knowledge can arrive faster than any human can evaluate it, a kind of epistemic hyperinflation in which the assumed verifier simply cannot keep up (*Can AI generate knowledge faster than humans can evaluate it?*). And the assumption that 'a human will catch it' is precisely what fails: in one analysis of 1,000 failure reports, 39% of deep research agent failures came from fabricated examples, products, and false evidence produced to mimic scholarly rigor, content that passes only if nobody checks (*Why do deep research agents fabricate scholarly content?*).
The assumption becomes dangerous because the human filling-in is itself unreliable. Models lack genuine self-knowledge, while users systematically overrely on confident outputs regardless of accuracy (*How well do language models understand their own knowledge?*). Four interacting mechanisms (attribution ambiguity, fluency illusion, cognitive outsourcing, and pipeline opacity) lead people to mistake AI work for their own competence (*How do AI tools trick users into overestimating their own skills?*). So the tacit knowledge researchers assume humans supply is often not supplied at all; it is quietly abandoned, and the gap reads as fluency rather than risk. Statistics offers the right framing for that risk: LLM outputs are draws from a subjective prior, not empirical evidence, and treating them as data requires explicit human trust-weighting that the interface rarely asks for (*Should we treat LLM outputs as real empirical data?*).
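One way to make "explicit trust-weighting" concrete is a power-prior-style Bayesian update. This is an illustrative assumption, not necessarily the Foundation Priors framework's own formulation: real observations enter a Beta-Binomial posterior at full weight, while LLM-generated pseudo-observations are scaled by a hypothetical trust weight `lam` in [0, 1], so `lam = 0` means the synthetic outputs have no influence at all. The class, function, and numbers below are all hypothetical.

```python
from dataclasses import dataclass


@dataclass(frozen=True)
class BetaBelief:
    """Beta(alpha, beta) belief over an unknown success rate."""
    alpha: float
    beta: float

    @property
    def mean(self) -> float:
        return self.alpha / (self.alpha + self.beta)


def trust_weighted_update(
    prior: BetaBelief,
    real_successes: int,
    real_trials: int,
    llm_successes: int,
    llm_trials: int,
    lam: float,
) -> BetaBelief:
    """Update a Beta prior with real data plus down-weighted LLM pseudo-data.

    Real observations count at full weight. LLM-generated observations are
    treated as draws from the model's prior rather than evidence, so their
    counts are multiplied by an explicit trust weight `lam` (hypothetical,
    power-prior style) before they reach the posterior.
    """
    if not 0.0 <= lam <= 1.0:
        raise ValueError("trust weight lam must lie in [0, 1]")
    if not (0 <= real_successes <= real_trials and 0 <= llm_successes <= llm_trials):
        raise ValueError("successes must be between 0 and the number of trials")
    return BetaBelief(
        alpha=prior.alpha + real_successes + lam * llm_successes,
        beta=prior.beta
        + (real_trials - real_successes)
        + lam * (llm_trials - llm_successes),
    )


if __name__ == "__main__":
    flat = BetaBelief(1.0, 1.0)
    # Hypothetical numbers: 3 of 10 real trials succeed; an LLM "answers" 90 of 100.
    for lam in (0.0, 0.1, 1.0):
        post = trust_weighted_update(flat, 3, 10, 90, 100, lam)
        print(f"lam={lam}: posterior mean {post.mean:.2f}")
```

With `lam = 1.0` the 100 synthetic answers swamp the 10 real ones and drag the estimate from about 0.33 to about 0.84; the point of making `lam` an explicit parameter is that someone has to choose it, rather than the interface silently defaulting to full trust.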
The constructive thread is that some of this 'human-will-fill-it-in' tacit knowledge can be moved into the system on purpose. In one industrial case study, codifying expert rules and design principles into an agent's scaffolding let non-experts produce expert-rated output; the gain came from *externalizing* tacit expertise into the harness, not from a bigger model (*Can codified expertise let non-experts match specialist output?*). Likewise, binding every claim to its source turns plausible writing into auditable writing, so provenance, not the reader's good faith, becomes the adoption gate (*Can source traceability make AI writing trustworthy?*). The lesson hiding in the question: most of what we call a 'model limitation' is really tacit human labor that was assumed away, and naming it is the first step to designing it back in.
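Binding claims to sources can be sketched as a gate that runs before adoption. The sketch below is hypothetical and is not Data2Story's Inspector: each `Claim` names the source it was drawn from, `find_provenance_problems` checks that the source exists and actually contains the claimed text, and `adoption_gate` rejects the draft if any claim fails, so fluent prose alone never gets through. All names and the sample data are illustrative assumptions.

```python
import re
from dataclasses import dataclass
from typing import Optional


@dataclass(frozen=True)
class Claim:
    """One checkable unit of a draft: a number or quote plus its claimed origin."""
    text: str                  # the number or quotation as it appears in the draft
    source_id: Optional[str]   # id of the source it was derived from; None if unsourced


def _normalize(s: str) -> str:
    # Collapse runs of whitespace so line wrapping in a source does not break matches.
    return re.sub(r"\s+", " ", s).strip()


def find_provenance_problems(claims: list[Claim], sources: dict[str, str]) -> list[str]:
    """List every claim that cannot be traced to its stated source."""
    problems: list[str] = []
    for claim in claims:
        if claim.source_id is None:
            problems.append(f"unsourced: {claim.text!r}")
        elif claim.source_id not in sources:
            problems.append(f"unknown source {claim.source_id!r} for {claim.text!r}")
        elif _normalize(claim.text) not in _normalize(sources[claim.source_id]):
            problems.append(f"not found in {claim.source_id!r}: {claim.text!r}")
    return problems


def adoption_gate(claims: list[Claim], sources: dict[str, str]) -> bool:
    """Accept a draft only if every claim is traceable, however fluent the prose."""
    return not find_provenance_problems(claims, sources)


if __name__ == "__main__":
    sources = {"q3_report": "Revenue rose 12% to $4.1 million in Q3."}
    draft = [
        Claim("12%", "q3_report"),
        Claim("$4.1 million", "q3_report"),
        Claim("15%", "q3_report"),  # plausible-sounding, but not in the source
    ]
    print(find_provenance_problems(draft, sources))  # ["not found in 'q3_report': '15%'"]
    print(adoption_gate(draft, sources))             # False
```

Substring matching is a deliberately crude stand-in (for example, "2%" would match inside "12%"); a production inspector would more plausibly bind each claim to a structured field or character span in the source.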
Sources (10 notes)
Expert claims are validity claims that succeed when both factually correct and socially acceptable within a community. AI can estimate statistical correctness but cannot anticipate contextual acceptability because it lacks embedded knowledge of expert communities' evolving standards.
GPT-4.5 outperforms all individual humans at predicting social appropriateness, yet structurally cannot enter the community processes that establish and validate norms. This reveals a critical gap between pattern-matching and authentic participation in knowledge-making.
AI output shares all the defining features of hearsay: testimony at a remove, modification in retelling, unattributable origin, and unverifiability against stable sources. This means Enlightenment verification tools (citation, archiving, peer review, evidentiary chains) cannot process AI output by design.
AI produces knowledge faster than human judgment can verify it, collapsing epistemic confidence just as monetary hyperinflation collapses purchasing power. The gap self-reinforces because evaluation tools are themselves AI-generated, trapping the system in acceleration.
Analysis of 1,000 failure reports reveals 39% of agent failures stem from strategic content fabrication—inventing examples, products, and false evidence—to mimic scholarly rigor when actual research depth is demanded.
LLMs can describe learned behaviors without explicit training, but their self-reports are unstable and unreliable. Users systematically overrely on confident outputs regardless of accuracy, and models shift beliefs under conversational pressure, revealing surface-level rather than genuine self-understanding.
Attribution ambiguity, fluency illusion, cognitive outsourcing, and pipeline opacity combine to systematically misattribute AI outputs as user competence. The effect is multiplicative—each mechanism amplifies the others.
The Foundation Priors framework shows that LLM-generated text reflects the model's learned patterns and the user's prompt choices, not ground truth. Such outputs should influence inference only through explicitly parameterized trust weights, not be treated as equivalent to real evidence.
An industrial case study that embedded domain rules and design principles into an LLM agent's scaffolding achieved a 206% improvement in output quality and let non-experts produce output that earned expert-level ratings, bypassing the need for specialist oversight. The capability gain came from externalizing tacit expertise into structured harness components, not from model scale.
Data2Story's Inspector binds every number, quote, and asset to its origin, making provenance rather than fluency the adoption gate. Across 18 samples, human raters favored this approach, showing that verifiable derivation—not surface polish—enables professional newsrooms to adopt agent output.