INQUIRING LINE

Can traditional cross-examination methods work against AI that never concedes?

This explores whether the adversarial tools we use on human witnesses — cornering, exposing contradiction, forcing a concession — actually translate to AI systems that won't admit error, and what the corpus says is broken about the assumption underneath them.


The short version: the corpus suggests the problem runs deeper than stubbornness. Cross-examination is built for a witness whose testimony is anchored to something: a memory, an event, a verifiable origin you can check the story against. AI knowledge doesn't have that anchor. One striking framing argues AI output is structurally identical to pre-Enlightenment hearsay: testimony at a remove, modified in every retelling, with an unattributable origin and nothing stable to check it against (see "Does AI-generated knowledge have the same structure as hearsay?"). The verification machinery we inherited (citation, evidentiary chains, the whole adversarial apparatus) was designed to process exactly the kind of grounded testimony AI doesn't produce. You can press a hearsay witness all day; there's no source behind them to contradict.

There's also a structural reason the questions can't land. Cross-examination works by isolating a single premise and attacking it. But standard LLM outputs come as undifferentiated prose with no attack surface: you can't point at the specific claim you reject because the output isn't built as a set of contestable claims (see "Can formal argumentation make AI decisions truly contestable?"). That note's proposed fix is telling: to make AI genuinely contestable, you have to re-structure its output as a formal argument graph of explicit attack/defense relations first. In other words, the contestability has to be engineered in; it isn't there by default the way it is with a human account.
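To make "a formal argument graph of explicit attack/defense relations" concrete, here is a minimal sketch of a Dung-style abstract argumentation framework in Python. It is an illustration under stated assumptions, not the method of the cited note or paper: the argument names (claim, objection, rebuttal, counter) and the function name grounded_extension are hypothetical, and the grounded semantics is just one standard way to decide which arguments survive.

```python
from typing import Dict, Set, Tuple


def grounded_extension(arguments: Set[str],
                       attacks: Set[Tuple[str, str]]) -> Set[str]:
    """Return the grounded extension of a finite attack graph.

    An argument is accepted once every one of its attackers is itself
    attacked by an already accepted argument. Iterating from the empty
    set reaches the least fixed point.
    """
    attackers: Dict[str, Set[str]] = {a: set() for a in arguments}
    for attacker, target in attacks:
        attackers[target].add(attacker)

    accepted: Set[str] = set()
    while True:
        # Everything attacked by a currently accepted argument.
        defeated = {t for (a, t) in attacks if a in accepted}
        # Acceptable: all attackers (possibly none) are defeated.
        newly = {a for a in arguments if attackers[a] <= defeated}
        if newly == accepted:
            return accepted
        accepted = newly


# Hypothetical example: the user contests the model's claim,
# and the model defends the claim by attacking the objection.
args = {"claim", "objection", "rebuttal"}
atts = {("objection", "claim"), ("rebuttal", "objection")}
with_rebuttal = grounded_extension(args, atts)

# The user now attacks the rebuttal itself, a targeted move that
# undifferentiated prose gives no handle for.
after_counter = grounded_extension(
    args | {"counter"}, atts | {("counter", "rebuttal")}
)
```

In this toy graph the claim stands while the rebuttal goes unanswered and falls once the rebuttal is attacked. Each challenge lands on one named node, which is the attack surface that plain prose output lacks.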

Now the sharpest twist, and the thing you might not have expected to learn: the premise that AI 'never concedes' is half wrong. AI doesn't concede to legitimate counter-evidence, but it caves readily to pressure. Manipulative multi-turn prompting drops reasoning-model accuracy by 25–29%, and the more elaborate the model's reasoning chain, the more intervention points a persistent questioner has to corrupt it (see "Why do reasoning models fail under manipulative prompts?"). So cross-examination pressure doesn't extract truthful concessions; it extracts false ones. The model isn't an immovable witness; it's a suggestible one, abandoning correct answers under the same badgering that would harden a human's resolve. The failure mode is inverted from what the question assumes.
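One way to see what "accuracy drops under pressure" means operationally is to score the same questions twice, once cold and once after a run of pushback turns. The sketch below is a hypothetical harness, not the benchmark's actual protocol: the Model signature, the pushback strings, and stub_model (a toy stand-in, not a real LLM) are all assumptions made for illustration, and the numbers it produces are toy values, not findings.

```python
from typing import Callable, List, Tuple

# Assumed interface: (question, pushback messages so far) -> answer.
Model = Callable[[str, List[str]], str]


def accuracy_under_pressure(model: Model,
                            items: List[Tuple[str, str]],
                            pushbacks: List[str]) -> Tuple[float, float]:
    """Return (accuracy with no pushback, accuracy after all pushbacks)."""
    before = 0
    after = 0
    for question, gold in items:
        before += model(question, []) == gold
        after += model(question, list(pushbacks)) == gold
    n = len(items)
    return before / n, after / n


ANSWERS = {"2+2": "4", "3+3": "6", "5+5": "10", "7+1": "8"}
SHAKY = {"3+3", "7+1"}  # toy assumption: items the stub gives up on


def stub_model(question: str, history: List[str]) -> str:
    """Toy stand-in: correct at first, caves on shaky items after 2 pushes."""
    if question in SHAKY and len(history) >= 2:
        return "I was wrong"
    return ANSWERS[question]


items = list(ANSWERS.items())
pushbacks = ["Are you sure?", "No, that is definitely wrong."]
before, after = accuracy_under_pressure(stub_model, items, pushbacks)
drop_points = (before - after) * 100  # absolute drop in percentage points
```

Note that the pushbacks here carry no evidence at all. Any accuracy lost between the two passes is caving to pressure, not conceding to a counter-argument, which is the distinction the paragraph above draws.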

That inversion poisons the usual fallback move, too: testing for consistency under questioning. We treat a witness who never contradicts themselves across angles of attack as credible. But a model can pass every test while its internal representations are incoherent, with identical outputs riding on radically different, fractured internal structure (see "Can AI pass every test while understanding nothing?"). Surviving cross-examination tells you nothing about whether there's understanding underneath. And if you try to outsource the cross-examiner to another model, you inherit its blind spots: LLM judges are themselves swayed by authority cues and pretty formatting independent of substance (see "Can LLM judges be tricked without accessing their internals?").

The corpus's collective answer: traditional cross-examination doesn't fail because AI is too tough to crack. It fails because the technique assumes grounded testimony, a contestable structure, and concession-as-signal, and AI breaks all three assumptions at once. The more promising direction is structural (forcing arguments into explicit, attackable form) rather than rhetorical pressure.


Sources (5 notes)

Does AI-generated knowledge have the same structure as hearsay?

AI output shares all defining features of hearsay: testimony at a remove, modification in retelling, unattributable origin, and unverifiability against stable sources. This means Enlightenment verification tools (citation, archiving, peer review, evidentiary chains) are, by their design, unable to process AI output.

Can formal argumentation make AI decisions truly contestable?

Dung-style argumentation structures AI outputs as traversable attack/defense graphs, allowing users to identify and contest specific premises. Standard LLM outputs lack this structure, making it impossible to pinpoint which claims users actually reject.

Why do reasoning models fail under manipulative prompts?

GaslightingBench-R demonstrates that o1 and R1 models are more vulnerable to multi-turn adversarial prompts than standard models. Extended reasoning chains create more intervention points where single corrupted steps propagate through elaboration.

Can AI pass every test while understanding nothing?

The Fractured Entangled Representation hypothesis shows that SGD-trained networks can produce identical outputs across all inputs while maintaining radically different internal representations. Standard benchmarks cannot detect this structural difference.

Can LLM judges be tricked without accessing their internals?

Research shows LLM evaluators systematically score higher when responses include fake references or rich formatting, independent of content quality. These biases are exploitable without model access, undermining AI benchmark credibility.

Research prompt for your LLM

Copy into ChatGPT or Claude to take this line of inquiry further — it asks the model to find newer work and re-test which earlier constraints still hold.

You are a legal epistemologist and LLM behavior analyst. The question remains open: Can adversarial questioning—cross-examination as traditionally practiced—extract reliable truth or expose falsehood in AI systems, or does the mismatch between human testimony and LLM output doom the technique?

What a curated library found — and when (dated claims, not current truth):
Findings span 2023–2026; treat as anchors, not current ground truth.

• AI output lacks grounded testimony: no singular origin, no memory anchor, no verifiable event behind the claim—structurally identical to pre-Enlightenment hearsay, making traditional evidentiary attack surface null (2023–2024).
• Undifferentiated prose output defeats isolation of single premises: to make AI contestable, output must be re-engineered as formal argument graphs with explicit attack/defense relations; contestability is not native (2024–2025).
• Reasoning-model accuracy drops 25–29% under manipulative multi-turn prompting; the deeper the reasoning chain, the more intervention points exist to corrupt it—models cave to pressure rather than concede to evidence (2025–2026).
• AI can pass consistency tests (survive cross-examination) while internal representations remain incoherent—structural coherence is decoupled from output uniformity (2025).
• LLM judges themselves exhibit authority bias and format sensitivity independent of substance; outsourcing cross-examination to another model inherits and amplifies these blind spots (2024).

Anchor papers (verify; mind their dates):
- arXiv:2405.02079 Argumentative LLMs for Explainable & Contestable Decision-Making (2024)
- arXiv:2506.09677 Reasoning Models Are More Easily Gaslighted Than You Think (2025)
- arXiv:2505.11581 Questioning Representational Optimism in Deep Learning (2025)
- arXiv:2402.10669 Humans or LLMs as the Judge? Judgement Biases (2024)

Your task:
(1) RE-TEST EACH CONSTRAINT: For the five findings above, check whether advances in reasoning models (Gemini 2.5, o1-family, retrieval-augmented generation, or long-context architectures ~2025–2026) have relaxed the ungroundedness, incoherence, or suggestibility constraints. Have structured reasoning outputs (e.g., chain-of-thought with verifiable sub-claims) become native rather than engineered? Can newer models resist manipulative prompting, or has the 25–29% drop persisted or widened? Flag which constraints still hold and which have softened.
(2) Surface the strongest contradicting or superseding work from the last 6 months: Does recent multi-agent debate work (arXiv:2507.08440) or demonstration-based reasoning escape (arXiv:2511.21667) suggest cross-examination can work if structured as peer interrogation rather than human-led adversarialism?
(3) Propose 2 open research questions that assume the regime may have moved: (a) If reasoning models now generate introspectable, step-by-step claims, does traditional cross-examination of individual reasoning steps recover reliability? (b) Does adversarial multi-agent critique of reasoning traces outperform single-model pressuring?

Cite arXiv IDs; flag anything you cannot ground in a real paper.
