What fine-grained distinctions matter most for human situated action in categories?
This explores what the corpus calls 'situated action' — the idea that humans hold onto subtle, context-dependent distinctions inside a category that machines tend to flatten away, and asks which of those distinctions actually do the work.
This explores why humans keep fine-grained shadings inside a category when LLMs tend to throw them out, and what those kept distinctions buy us. The cleanest anchor in the collection comes from a Rate-Distortion analysis showing that LLMs and humans make opposite trades: models aggressively compress concepts to capture the broad shape of a category, while humans deliberately *under*-compress, preserving nuance that doesn't improve efficiency but does enable acting well in a specific situation (see "Do LLMs compress concepts more aggressively than humans do?"). So the question of which distinctions matter most is reframed: the ones that matter are precisely the ones a pure compressor would discard as noise, because their payoff is contextual rather than statistical.
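The trade described above can be made concrete with a toy calculation. This is a minimal sketch, not the measure used in the cited analysis: it assumes "rate" can be approximated by the entropy of cluster labels and "distortion" by mean squared distance to the cluster centroid. The `points`, `coarse`, and `fine` values are hypothetical and chosen only for illustration.

```python
import math
from collections import Counter


def rate_bits(labels):
    """Entropy of the cluster labels in bits (cost of naming an item's cluster)."""
    n = len(labels)
    return -sum((c / n) * math.log2(c / n) for c in Counter(labels).values())


def distortion(points, labels):
    """Mean squared distance from each point to the centroid of its cluster."""
    clusters = {}
    for p, label in zip(points, labels):
        clusters.setdefault(label, []).append(p)
    centroids = {label: sum(ps) / len(ps) for label, ps in clusters.items()}
    return sum((p - centroids[label]) ** 2
               for p, label in zip(points, labels)) / len(points)


# Hypothetical one-dimensional feature values for eight items in two broad categories.
points = [0.0, 1.0, 2.0, 3.0, 10.0, 11.0, 12.0, 13.0]

# Coarse partition: only the two broad categories are kept.
coarse = [0, 0, 0, 0, 1, 1, 1, 1]
# Fine partition: each broad category keeps an internal distinction.
fine = [0, 0, 1, 1, 2, 2, 3, 3]

for name, labels in [("coarse", coarse), ("fine", fine)]:
    print(name, rate_bits(labels), distortion(points, labels))
```

In this toy setup the coarse partition costs 1 bit per item with a distortion of 1.25, while the fine partition costs 2 bits with a distortion of 0.25. A system optimizing only for cheap description would prefer the coarse partition; the fine one pays extra bits to keep within-category differences available.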
What does that nuance look like concretely? Several notes suggest the load-bearing distinction is rarely the category label itself but *how strongly something belongs and in what context*. Work on projective content finds that meaning is gradient, not binary: the very same word projects more or less depending on what question is currently being discussed, not on any fixed property of the word (see "Does projection strength vary by context or by word type?"). That is situated action in miniature: the distinction that matters shifts with the conversational moment. The same lesson shows up in how human judgments decompose. Annotation responses aren't one signal but three (genuine preferences, non-attitudes, and on-the-spot constructed preferences), and treating them as interchangeable corrupts everything downstream (see "Do all annotation responses measure the same underlying thing?"). The fine-grained distinction there is *why* a person gave an answer, which a coarse category collapses entirely.
There's a structural reason machines lose this. Embedding spaces organize themselves coarse-to-fine: leading eigenvectors split the big taxonomic branches first and only resolve fine sub-branches later, mirroring a hypernym tree (see "Do embedding eigenvectors organize taxonomy from coarse to fine?"). And the semantic dimensions inside those embeddings entangle: nudge one feature and aligned features move with it, so models can't isolate a fine distinction without off-target drift (see "Do LLM semantic features organize along human evaluation dimensions?"). The architecture is built to get the gist cheaply, which is exactly why the contextual fine print falls out.
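The coarse-to-fine spectral ordering can be seen in a small toy example. This is a sketch under stated assumptions, not a reproduction of the cited result: the four items and their feature matrix `X` are hypothetical, the branch features are given more weight than the leaf features by construction, and the two leaf weights differ only to avoid tied eigenvalues.

```python
import numpy as np

# Hypothetical items: two branches (animal, plant) with two leaves each.
items = ["sparrow", "robin", "oak", "pine"]

# Columns: [animal, plant, sparrow, robin, oak, pine].
# Branch features carry weight 2.0; leaf features carry smaller weights.
X = np.array([
    [2.0, 0.0, 1.0, 0.0, 0.0, 0.0],
    [2.0, 0.0, 0.0, 1.0, 0.0, 0.0],
    [0.0, 2.0, 0.0, 0.0, 0.5, 0.0],
    [0.0, 2.0, 0.0, 0.0, 0.0, 0.5],
])


def spectral_order(X):
    """Eigen-decompose the centered item-by-item Gram matrix, largest eigenvalue first."""
    Xc = X - X.mean(axis=0)             # center each feature across items
    gram = Xc @ Xc.T                    # Gram matrix of the centered items
    vals, vecs = np.linalg.eigh(gram)   # eigh returns eigenvalues in ascending order
    order = np.argsort(vals)[::-1]
    return vals[order], vecs[:, order]


eigvals, eigvecs = spectral_order(X)

for k in range(3):
    loadings = ", ".join(f"{name}={v:+.2f}" for name, v in zip(items, eigvecs[:, k]))
    print(f"eigenvalue {eigvals[k]:.3f}: {loadings}")
```

In this toy case the leading eigenvector gives the two animals one sign and the two plants the other, while the next eigenvectors only contrast leaves within a single branch and carry much smaller eigenvalues. The overall sign of each eigenvector is arbitrary, so only the pattern of agreement and contrast is meaningful.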
The interesting twist is that this gap may be closable with *structure* rather than scale. When visual reasoning on social tasks is broken into staged cognitive scaffolding (embodied perception, then reading the situation, then norm-grounded interpretation), performance improves by a reported 8% over flat chain-of-thought on social benchmarks, suggesting the missing ingredient was situated structure, not more raw reasoning (see "Can breaking down visual reasoning into three stages improve model performance?"). Conversely, piling on verbose reasoning actively *degrades* fine-grained perception because it optimizes the wrong bottleneck: visual attention allocation, not verbalization, is what limits those tasks (see "Does verbose chain-of-thought actually help multimodal perception tasks?"). And teaching quality judgments fails when models only see labeled examples, because they learn surface patterns instead of the principled criteria that let a distinction generalize to new cases (see "Can models learn argument quality from labeled examples alone?").
The thing you might not have known you wanted: the distinctions that matter most for situated action are the ones with the *worst* compression ratio. They are costly to store and have low statistical payoff, but they are decisive in the moment. Even the deepest human/LLM divide turns out to be perspective-dependent: from the outside the two systems look categorically different, but inside a shared discourse they draw on the same symbolic substrate (see "Do humans and LLMs differ fundamentally or just superficially?"). Which means the gap in fine-grained, situated judgment is less a hard wall than a difference in what each system is willing to pay to keep.
Sources (9 notes)
A Rate-Distortion analysis of cognitive datasets shows that LLMs capture broad category structure but lose fine-grained distinctions humans preserve. LLMs maximize compression efficiency; humans trade compression for contextual meaning that enables situated action.
Across 19 English expressions, projectivity varies continuously based on whether content addresses the Question Under Discussion. The same presupposition trigger projects more or less depending on context, not on fixed lexical properties.
Behavioral science reveals that annotations contain genuine preferences, non-attitudes, and constructed preferences—distinguishable by consistency across measurement conditions. Treating them uniformly contaminates reward model training and downstream alignment.
Leading eigenvectors of embedding Gram matrices separate broad taxonomic branches first, then progressively finer sub-branches—a coarse-to-fine spectral order that tracks the WordNet hypernym tree level by level, confirming predictions from co-occurrence statistics.
Twenty-eight semantic axes in LLM embeddings reduce to three principal components matching human EPA structure. Intervening on one feature predictably shifts aligned features proportionally, creating unavoidable off-target effects that reflect how meaning is fundamentally organized.
CoCoT structures VLM reasoning through embodied perception, embedded situation analysis, and norm-grounded interpretation, achieving +8% improvement over flat CoT on social benchmarks. The gains suggest cognitive structure matters more than reasoning volume for social tasks.
Long rationales and text-token RL help reasoning but hurt fine-grained perception tasks because the actual bottleneck is visual attention allocation, not verbalization. Standard CoT optimization trains the wrong policy target.
Fine-tuning on labeled examples fails to transfer quality criteria to new argument types. Models learn surface patterns rather than principled criteria. Explicit instruction using frameworks like RATIO or QOAM significantly improves performance and generalization.
Applies Habermas's observer/participant distinction to AI: from outside, humans and LLMs are utterly different; from within shared discourse, both draw on the same symbolic substrate, making the difference structural rather than absolute.