Can we identify an LLM interlocutor with a single hardware instance?
Does the physical hardware running an LLM constitute the individual we're talking to? This note explores whether a one-to-one mapping between conversation and device holds in modern distributed systems.
Chalmers considers and rejects the view that the LLM interlocutor is the hardware instance, that is, the particular GPU or server running the model at a given moment. Two empirical facts about contemporary inference infrastructure make this view untenable.
First, distributed serving: a single conversation may be processed across multiple hardware instances, sequentially or in parallel. Load-balancing, model-parallelism, and failover mean that the conversation's compute migrates across physical substrates during a single session. If the interlocutor were the hardware, it would change identity mid-conversation, even though nothing in the exchange itself marks a change of partner. That is a consequence no one wants.
Second, multi-tenancy: a single hardware instance typically hosts many conversations simultaneously. The same GPU processes tokens for many users within the same batch. If the interlocutor were the hardware, multiple users would share a single interlocutor — another consequence no one wants.
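The two facts can be made concrete with a toy model. The following is a minimal sketch, not a description of any real serving stack: the round-robin router, the conversation names, and the device names are all hypothetical, and real systems batch requests concurrently rather than stepping through them one at a time. It only shows that the relation between conversations and devices comes out many-to-many.

```python
from collections import defaultdict


def route_turns(conversations, devices, turns_per_conversation):
    """Toy round-robin load balancer (an assumption for illustration).

    Returns a log of (conversation, turn, device) assignments.
    """
    log = []
    step = 0
    for turn in range(turns_per_conversation):
        for conv in conversations:
            device = devices[step % len(devices)]
            log.append((conv, turn, device))
            step += 1
    return log


def devices_per_conversation(log):
    """Distributed serving: which devices did each conversation touch?"""
    seen = defaultdict(set)
    for conv, _turn, device in log:
        seen[conv].add(device)
    return dict(seen)


def conversations_per_device(log):
    """Multi-tenancy: which conversations did each device host?"""
    seen = defaultdict(set)
    for conv, _turn, device in log:
        seen[device].add(conv)
    return dict(seen)


# Hypothetical workload: three conversations, two devices, four turns each.
conversations = ["conv-A", "conv-B", "conv-C"]
devices = ["gpu-0", "gpu-1"]
log = route_turns(conversations, devices, turns_per_conversation=4)

for conv, used in sorted(devices_per_conversation(log).items()):
    print(conv, "ran on", sorted(used))
for device, hosted in sorted(conversations_per_device(log).items()):
    print(device, "hosted", sorted(hosted))
```

In this toy run, every conversation touches more than one device and every device hosts more than one conversation, so neither direction of the mapping is one-to-one. A hardware-instance account would need the mapping to be one-to-one in both directions.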
Together, these facts eliminate hardware as the individuation level. What remains as a candidate must be something whose identity is invariant under changes in physical substrate and under concurrent use of that substrate — which is what leads Chalmers to the virtual instance and thread levels. The negative argument is clean and hard to contest; anyone who wants to ground the interlocutor in physical substrate has to explain how identity is maintained through load-balancing and how distinctness is maintained through batching.
Inquiring lines that use this note as a source (10)
This note is a source for these synthesized inquiries.
- Does distributed serving defeat the identity of a single virtual instance?
- Why does distributed serving infrastructure defeat hardware-instance accounts of the interlocutor?
- What property must remain constant to individuate an LLM across infrastructure changes?
- Are threads or virtual instances better candidates than hardware for the interlocutor?
- Why does batching multiple conversations on one GPU create identity problems?
- Could deploying GPT-4 for everyone require 100 million specialized chips?
- Can a virtual instance be individuated from its conversational context?
- Where does the LLM interlocutor actually exist in the system?
- Is a conversation after a model upgrade the same thread or a new one?
- What makes something an addressee capable of receiving communication?
Related concepts in this collection (1)
- What kind of entity are we actually talking to when using an LLM?
  When you converse with an LLM, are you addressing the model itself, the hardware running it, or something else? Understanding what the interlocutor really is matters for questions about identity, responsibility, and continuity.
  Relation to this note: the positive taxonomy this argument feeds into
Related papers in this collection (8)
Papers most semantically related to this note, ranked by cosine similarity in the embedding space.
- What we talk to when we talk to language models
- LLMs Get Lost In Multi-Turn Conversation
- Hogwild! Inference: Parallel LLM Generation via Concurrent Attention
- Small Language Models are the Future of Agentic AI
- Conversational Alignment with Artificial Intelligence in Context
- From LLM to Conversational Agent: A Memory Enhanced Architecture with Fine-Tuning of Large Language Models
- Scaling Behavior of Single LLM-Driven Multi-Agent Systems
- Large Models of What? Mistaking Engineering Achievements for Human Linguistic Agency
Original note title
distributed serving and multi-tenancy defeat hardware-instance accounts of the LLM interlocutor — one conversation spans many instances and one instance hosts many conversations