SYNTHESIS NOTE

Do LLMs represent low-resource cultures through dominant cultural proxies?

Explores whether language models internally represent cultures from data-poor regions by routing through high-resource cultural proxies rather than learning independent representations, and what this reveals about cultural bias in model architecture.

Synthesis note · 2026-04-18 · sourced from MechInterp

CultureScope is presented as the first mechanistic interpretability method designed to probe how LLMs internally represent cultural knowledge. The paper uses activation patching to extract cultural knowledge spaces and finds that cultural bias is not merely a surface output problem but a structural property of internal representations.
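For readers unfamiliar with activation patching, the general idea can be sketched on a toy network. This is not CultureScope's procedure, which the note does not describe in detail. The weights, inputs, and the helper names `hidden`, `output`, and `patch_effects` are all made up for illustration. The sketch runs a "clean" and a "corrupted" input, copies one hidden activation at a time from the clean run into the corrupted run, and measures how much of the output gap each unit restores.

```python
# Toy activation patching on a hand-built two-layer network.
# All weights and inputs are invented; this illustrates the generic technique only.

def hidden(x, W1):
    # ReLU hidden layer: one activation per row of W1.
    return [max(0.0, sum(w * xi for w, xi in zip(row, x))) for row in W1]

def output(h, w2):
    # Linear readout over the hidden activations.
    return sum(w * hi for w, hi in zip(w2, h))

def patch_effects(clean_x, corrupt_x, W1, w2):
    """For each hidden unit, copy its clean activation into the corrupted run
    and return the fraction of the clean-vs-corrupted output gap it restores."""
    h_clean = hidden(clean_x, W1)
    h_corrupt = hidden(corrupt_x, W1)
    y_clean = output(h_clean, w2)
    y_corrupt = output(h_corrupt, w2)
    gap = y_clean - y_corrupt
    effects = []
    for i in range(len(h_corrupt)):
        patched = list(h_corrupt)
        patched[i] = h_clean[i]  # the patch: overwrite one unit only
        y_patched = output(patched, w2)
        effects.append((y_patched - y_corrupt) / gap if gap else 0.0)
    return effects

W1 = [[1.0, 0.0], [0.0, 1.0], [1.0, 1.0]]
w2 = [2.0, 0.0, 1.0]
clean_x = [1.0, 0.0]    # stands in for a prompt about the culture of interest
corrupt_x = [0.0, 1.0]  # stands in for a prompt about a different culture

effects = patch_effects(clean_x, corrupt_x, W1, w2)
print(effects)
```

In this toy setup only the first hidden unit carries the information that separates the two inputs, so patching it alone restores the full gap. In a real model the same loop would run over layers, positions, or attention heads, not three scalar units.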

Cultural flattening as internal architecture. Visualization of the cultural flattening direction between cultures reveals unidirectional connections: low-resource cultures like Ethiopia and Algeria are internally represented through high-resource cultures like the United States and Iran. This means the model has not learned independent representations for these cultures — it has learned to route through dominant cultural proxies. When asked about Ethiopian customs, the model's internal representations partially activate American or Iranian cultural knowledge.
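One way to make "unidirectional" concrete is to compare the influence score in each direction between a pair of cultures and keep only the strongly asymmetric pairs. The sketch below assumes such pairwise scores are already available. The scores, the placeholder culture labels, the `min_gap` threshold, and the function `unidirectional_edges` are hypothetical and do not come from the paper.

```python
# Hypothetical influence scores, invented for illustration only.
# scores[(src, tgt)] = how strongly src's representation is activated
# when the model answers questions about tgt.
scores = {
    ("HighRes-A", "LowRes-X"): 0.62,
    ("LowRes-X", "HighRes-A"): 0.05,
    ("HighRes-B", "LowRes-Y"): 0.48,
    ("LowRes-Y", "HighRes-B"): 0.07,
    ("HighRes-A", "HighRes-B"): 0.21,
    ("HighRes-B", "HighRes-A"): 0.18,
}

def unidirectional_edges(scores, min_gap=0.3):
    """Return (src, tgt, gap) for pairs where src -> tgt influence exceeds
    the reverse direction by at least min_gap (an arbitrary threshold)."""
    edges = []
    for (src, tgt), forward in scores.items():
        backward = scores.get((tgt, src), 0.0)
        gap = forward - backward
        if gap >= min_gap:
            edges.append((src, tgt, round(gap, 2)))
    return sorted(edges)

edges = unidirectional_edges(scores)
for src, tgt, gap in edges:
    print(f"{tgt} is represented through {src} (asymmetry {gap})")
```

The two high-resource placeholders influence each other roughly equally, so that pair falls below the threshold and is not reported. Only the high-to-low pairs, where the reverse influence is near zero, count as unidirectional.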

Hard-negative evaluation exposes the mechanism. Standard multiple-choice question (MCQ) evaluation masks this because models can exploit surface-level elimination strategies without genuine cultural understanding. When culturally nuanced hard negatives are introduced (answers from similar but distinct cultures), models systematically favor culturally adjacent answers, a pattern explained by the unidirectional representation pathways CultureScope reveals.
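The contrast between standard and hard-negative evaluation can be sketched with a deliberately shallow stand-in for a model. Everything here is hypothetical: the items, the region tags, and `surface_chooser`, which is built to succeed by elimination alone and has no culture-specific knowledge. The sketch illustrates the evaluation design, not any measured result.

```python
# Toy comparison of standard vs hard-negative multiple-choice evaluation.
# Items, region tags, and the stub chooser are invented for illustration.

def surface_chooser(question_region, options):
    # Stub "model": discards options from the wrong broad region, then
    # takes the alphabetically first survivor. It cannot tell apart
    # two cultures that share a region.
    candidates = sorted(text for text, region in options if region == question_region)
    return candidates[0]

def accuracy(items, chooser, hard):
    """Score the chooser using adjacent-culture distractors (hard=True)
    or unrelated-culture distractors (hard=False)."""
    correct = 0
    for item in items:
        distractors = item["adjacent"] if hard else item["unrelated"]
        options = [item["answer"]] + distractors
        if chooser(item["region"], options) == item["answer"][0]:
            correct += 1
    return correct / len(items)

items = [
    {"region": "R1", "answer": ("dish_m", "R1"),
     "adjacent": [("dish_a", "R1"), ("dish_b", "R1")],
     "unrelated": [("dish_x", "R2"), ("dish_y", "R2")]},
    {"region": "R2", "answer": ("greeting_n", "R2"),
     "adjacent": [("greeting_c", "R2"), ("greeting_d", "R2")],
     "unrelated": [("greeting_p", "R1"), ("greeting_q", "R1")]},
]

easy_acc = accuracy(items, surface_chooser, hard=False)
hard_acc = accuracy(items, surface_chooser, hard=True)
print(easy_acc, hard_acc)
```

With unrelated distractors the stub scores perfectly because elimination is enough. Once the distractors come from adjacent cultures, the same strategy picks a neighbouring culture's answer every time, which is the failure pattern hard negatives are designed to surface.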

Paradoxically, the lowest-resource cultures are less susceptible. Cultures with very limited training data show less cultural flattening, likely because the model has too little data to form strong representational connections at all. The most affected cultures are those with moderate data: enough to form a representation, but not enough to develop independent cultural knowledge structures.

This finding connects internal representation quality to downstream cultural harm. If a model represents Ethiopian culture as a variant of American culture internally, no amount of output-layer correction will fix the fundamental representational deficit. The bias is architectural, not behavioral.

Original note title

LLMs internalize Western-dominance bias and cultural flattening as unidirectional representation pathways — low-resource cultures are represented through high-resource cultural proxies