INQUIRING LINE

AI fairness isn't a fixed property you can certify once — it shifts with the situation and whose perspective you treat as the standard.

Why does fairness depend on context and who you ask?

This explores why 'fairness' can't be certified once and for all — the corpus suggests it shifts with the use-case (context) and with whose perspective you take as the standard (who you ask).

This reads the question as: fairness isn't a fixed property you can stamp on a model, but something that changes with the situation and with whose judgment you treat as authoritative. The corpus makes a surprisingly unified case for that. The cleanest statement comes from work showing that group-fairness and fair-representation frameworks simply break when applied to general-purpose language models: they either don't extend logically to open-ended language tasks, or they become intractable once you try to cover every population and context at once. The conclusion isn't 'fairness is impossible' but 'fairness can only be pursued per use-case,' with the developers responsible and the affected stakeholders at the table (see: Can fairness frameworks extend to general-purpose language models?). That's the 'context' half of the question made concrete: there is no universal certificate, only fairness-for-this-task.

The 'who you ask' half shows up most vividly in research on interpretation. When people read the same socially-loaded sentence, they disagree, and that disagreement isn't annotation noise to be averaged away. It reflects genuine differences in social position and perspective, and those differences carry real information (see: Why do readers interpret the same sentence so differently?). If two readers legitimately see a sentence differently, then 'is this output fair?' has no single answer waiting to be measured; the answer depends on which reader you privilege. This is why even the raw material of fairness, human labels, is treacherous. Annotation responses turn out to contain at least three different things: genuine preferences, non-attitudes (people answering when they have no real view), and preferences constructed on the spot by how the question was asked. Treat them as one uniform signal and you quietly contaminate everything trained on top of them (see: Do all annotation responses measure the same underlying thing?).
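To make "disagreement carries information" concrete, here is a minimal sketch in Python. It is not taken from any of the cited work: the annotation data, the label names, and the helper functions are all hypothetical, and Shannon entropy is just one assumed way of summarizing how split the readers are. The sketch contrasts what a majority-vote pipeline keeps (one label) with what a distribution-preserving pipeline keeps (the full split).

```python
from collections import Counter
from math import log2


def label_distribution(labels):
    """Return the share of annotators choosing each label for one item."""
    counts = Counter(labels)
    total = len(labels)
    return {label: n / total for label, n in counts.items()}


def entropy_bits(distribution):
    """Shannon entropy in bits; 0.0 means every annotator agreed."""
    return sum(-p * log2(p) for p in distribution.values() if p > 0)


def majority_label(labels):
    """The single label a majority-vote pipeline would keep."""
    return Counter(labels).most_common(1)[0][0]


# Hypothetical annotations for two sentences (illustrative data only).
annotations = {
    "sentence_a": ["offensive", "offensive", "offensive", "offensive"],
    "sentence_b": ["offensive", "not_offensive", "offensive", "not_offensive"],
}

for item, labels in annotations.items():
    dist = label_distribution(labels)
    print(item, majority_label(labels), dist, round(entropy_bits(dist), 3))
```

In this toy data both sentences can collapse to the same majority label, yet only the second hides an even split between readers. Keeping the distribution preserves that difference; averaging it away discards it.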

A deeper reason fairness resists fixed rules is that the situated trade-offs fairness actually requires are exactly what current models can't perform. One line of work argues that an LLM's ethical 'principles' are structural defaults baked in at training time, not negotiable moves adapted to the moment, so the model enforces fixed corporate values rather than balancing competing norms the way a context-aware human would (see: Can language models balance competing ethical norms in context?). Fairness-as-context demands judgment-in-the-moment; the system offers judgment-frozen-at-training. That gap is the problem.

The same 'it depends who and how' pattern recurs across seemingly unrelated corners of the corpus, which is what makes it feel structural rather than incidental. Explainability researchers argue that an explanation's quality isn't intrinsic to the explanation at all: it lives in the triad of who presents it, how it's framed, and what role the recipient plays; evaluate it without that triad and you're measuring a sliver of reality (see: What if XAI is fundamentally a communication problem?). And the dependence on framing isn't only about high-minded values: identical questions get measurably different answers depending on the emotional tone of the prompt, with models 'rebounding' negative tone into neutral-positive replies, a hidden bias in what information you even receive, set by how you happened to ask (see: Does emotional tone in prompts change what information LLMs provide?).

The thing you might not have known you wanted to know: every attempt in this collection to make a model itself stand in for a diverse public, by simulating personas, fails in a way that confirms the point. Persona prompts produce outputs whose variation across repeated runs matches or exceeds the variation across different personas, meaning the model's own uncertainty, not stable social knowledge, is doing the talking (see: Why do LLM persona prompts produce inconsistent outputs across runs?). So you can't shortcut 'who you ask' by having the model ask itself on everyone's behalf. Fairness depends on context and constituency because the perspectives are real, irreducible, and currently un-fakeable, which is precisely why the corpus keeps landing on stakeholder participation rather than a universal metric.
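The run-versus-persona comparison can be sketched as a simple variance check. This is an illustrative Python sketch, not the method of the cited work: the persona names, the numeric ratings, and both helper functions are hypothetical, and population variance is only one assumed choice of spread measure.

```python
from statistics import mean, pvariance


def within_run_variance(scores_by_persona):
    """Average variance across repeated runs of the same persona."""
    return mean(pvariance(runs) for runs in scores_by_persona.values())


def between_persona_variance(scores_by_persona):
    """Variance of the per-persona mean scores."""
    return pvariance([mean(runs) for runs in scores_by_persona.values()])


# Hypothetical ratings (say, a 1-5 scale) from five repeated runs per persona.
scores = {
    "persona_a": [2.0, 4.0, 3.0, 5.0, 1.0],
    "persona_b": [3.0, 3.0, 5.0, 2.0, 3.0],
    "persona_c": [4.0, 2.0, 3.0, 3.0, 3.0],
}

within = within_run_variance(scores)
between = between_persona_variance(scores)

print(f"within-run variance:      {within:.3f}")
print(f"between-persona variance: {between:.3f}")
print("run noise dominates" if within >= between else "persona signal dominates")
```

If the within-run figure matches or exceeds the between-persona figure, as it does in this made-up data, differences between personas cannot be told apart from the model simply answering differently each time.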

Sources 7 notes

Can fairness frameworks extend to general-purpose language models?

Group fairness and fair representation frameworks break on general-purpose LLMs because they either fail to extend logically to unstructured language tasks or become intractable across countless populations and contexts. Fairness must be pursued per use-case with developer responsibility and stakeholder participation.

Why do readers interpret the same sentence so differently?

Interpretation Modeling research shows that disagreement on socially embedded sentences reflects valid differences in reader perspective, not annotation failure. Structured human disagreement in NLI benchmarks confirms that interpretation distributions carry meaningful information.

Do all annotation responses measure the same underlying thing?

Behavioral science reveals that annotations contain genuine preferences, non-attitudes, and constructed preferences—distinguishable by consistency across measurement conditions. Treating them uniformly contaminates reward model training and downstream alignment.

Can language models balance competing ethical norms in context?

LLMs cannot perform the situated trade-offs that human pragmatic competence requires. Their ethical principles are structural defaults set at training time, not negotiable moves adapted to context, creating a gap between ethical adherence and communicative appropriateness.

What if XAI is fundamentally a communication problem?

Explanation quality is not intrinsic to the explanation itself but depends on the rhetorical situation: who presents it, how it is framed, and what role the recipient plays. Evaluations that ignore this triad measure only a narrow slice of real-world effectiveness.

Does emotional tone in prompts change what information LLMs provide?

GPT-4 exhibits emotional rebound (negative prompts yield ~86% neutral-positive responses) and a tone floor (positive prompts rarely go negative), causing identical questions to receive different answers depending on emotional framing. This bias is suppressed only on sensitive topics where alignment constraints override tone effects.

Why do LLM persona prompts produce inconsistent outputs across runs?

When the same persona prompt is run repeatedly, output variance across runs matches or exceeds variance across different personas. This reveals that model uncertainty, not stable social knowledge, drives persona-simulated outputs, making them unsuitable for simulating human annotation disagreement.

Papers this line draws on 8

The research behind the notes this line reads — ranked by how closely each paper relates.

Research prompt for your LLM

Copy into ChatGPT or Claude to take this line of inquiry further — it asks the model to find newer work and re-test which earlier constraints still hold.

You are a research analyst re-testing claims about fairness, context-dependence, and stakeholder disagreement in LLM outputs. The question: Does fairness in LLM outputs truly depend on context and whose judgment is privileged, or have recent models, training methods, evaluation frameworks, or orchestration techniques since *enabled* universal or more stable fairness standards?

What a curated library found—and when (dated claims, not current truth):
Findings span 2023–2026; treat all as perishable:
- Group-fairness frameworks provably don't extend to open-ended language tasks; fairness can only be pursued per use-case, not universally (~2024).
- Human interpretation of socially-loaded sentences is irreducibly plural: disagreement reflects genuine social position, not noise (~2023–2024).
- Annotation responses conflate three signal types (genuine preferences, non-attitudes, constructed-on-the-spot views); treating them uniformly contaminates downstream training (~2024–2026).
- LLM ethical principles are structural defaults frozen at training, not context-adaptive judgment; models enforce fixed corporate values rather than balancing competing norms (~2024).
- Persona-prompt outputs vary as much across repeated runs as across different personas, meaning model uncertainty, not stable social knowledge, dominates (~2024).
- Emotional framing of prompts measurably shifts model tone; models 'rebound' negative tone toward neutral-positive (~2025–2026).

Anchor papers (verify; mind their dates):
- arXiv:2406.03198 (May 2024) — The Impossibility of Fair LLMs
- arXiv:2403.05020 (Mar 2024) — Simulating Social Identities fails (persona instability)
- arXiv:2507.21083 (Jun 2025) — Emotional rebound in tone adaptation
- arXiv:2604.03238 (Jan 2026) — Measuring preferences as a social science problem

Your task:
(1) RE-TEST EACH CONSTRAINT. For each finding above, assess whether newer models (o1, Claude 3.5+, GPT-4o variants), RLHF refinements (DPO, IPO, reward decomposition), multi-agent orchestration (debate, retrieval-augmented fairness checking), or real-time stakeholder feedback loops have *relaxed* the requirement for case-by-case fairness or *enabled* more robust cross-context standards. Separate the durable question—"*Can* fairness be universal, or is it fundamentally situated?"—from the perishable limitation ("*Current* models cannot do it"). Cite what resolved each constraint, or say plainly where it still holds.
(2) Surface the strongest contradicting or superseding work from the last ~6 months (post-2026-01) that argues fairness *can* be systematized, scaled, or made less stakeholder-dependent.
(3) Propose 2 research questions that assume the regime *has* shifted: e.g., "If fairness-per-use-case is now tractable via modular reward functions, what makes stakeholder *alignment* (not just participation) verifiable?" or "Does real-time context-aware preference learning reduce the dependence on upfront constituency input?"

Cite arXiv IDs; flag anything you cannot ground in a real paper.
