How do power-law distributions differ from uniform collision assumptions?
This explores why assuming hash collisions land evenly across all IDs (uniform) breaks down once you account for the fact that real-world usage follows a power law — a few entities account for most of the traffic.
This question is really about a hidden assumption baked into a lot of engineering: that if you hash IDs into a fixed-size table, the inevitable collisions will scatter randomly and harmlessly across everything. That's the uniform assumption: every user and item is equally likely to collide, so the damage averages out. The corpus pushes back hard on this. Real recommendation traffic isn't uniform; it's power-law distributed, meaning a small number of users and items generate an outsized share of activity, while a long tail barely appears at all (see "Do hash collisions really harm popular recommendation items?").
The consequence is the opposite of harmless. Because high-frequency entities show up far more often, any collision that involves them is hit on far more lookups, so the errors from collisions pile up precisely on the popular items and active users the model most needs to get right (see "Why do hash collisions hurt recommendation models so much?"). Under a uniform assumption you'd budget for 'a little noise everywhere.' Under a power law you get concentrated damage exactly where traffic, and revenue, is highest. Monolith's empirical work shows this is why fixed-size hashed embedding tables degrade over time: new IDs keep arriving, the table can't grow, and collisions accumulate on the heavy hitters rather than dispersing.
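The concentration described above can be illustrated with a small simulation. This is a minimal sketch, not Monolith's method or data: the table size, ID count, the Zipf exponent of 1, the "top 1,000 IDs" definition of the head, and the function name `head_share_of_collided_traffic` are all assumptions chosen for illustration, and a seeded random slot assignment stands in for an ideal hash function. Both runs use the same slot assignment, so only the frequency distribution differs.

```python
import random
from collections import defaultdict


def head_share_of_collided_traffic(weights, table_size, head_count=1000, seed=0):
    """Return (collided_share, head_share_within_collided).

    weights[i] is the expected lookup frequency of ID i, hottest ID first.
    A seeded random slot assignment stands in for an ideal hash function.
    """
    rng = random.Random(seed)
    slots = defaultdict(list)
    for id_ in range(len(weights)):
        slots[rng.randrange(table_size)].append(id_)

    total = sum(weights)
    collided = 0.0        # traffic that lands on a row shared by 2+ IDs
    collided_head = 0.0   # the part of that traffic coming from head IDs
    for ids in slots.values():
        if len(ids) < 2:
            continue  # this ID has the row to itself: no collision
        for id_ in ids:
            collided += weights[id_]
            if id_ < head_count:
                collided_head += weights[id_]
    return collided / total, collided_head / collided


# Illustrative sizes only (assumptions, not figures from any real system).
n_ids, table_size = 100_000, 50_000
distributions = {
    "uniform": [1.0] * n_ids,
    "zipf": [1.0 / (rank + 1) for rank in range(n_ids)],  # exponent 1
}

results = {}
for name, weights in distributions.items():
    results[name] = head_share_of_collided_traffic(weights, table_size)
    collided, head = results[name]
    print(f"{name:8s} collided traffic: {collided:.1%}, "
          f"of which from top 1,000 IDs: {head:.1%}")
```

Under the uniform weights, the top 1,000 IDs are just 1% of IDs and contribute roughly that share of collided traffic. Under the Zipf weights, the same IDs in the same shared rows account for a much larger share, which is the sense in which collision damage lands on the head rather than spreading evenly.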
The deeper, less obvious point is that a power-law distribution is double-edged. The same skew that makes collisions dangerous is also what makes models work at all on common cases: frequency is where accuracy concentrates. You can see the identical mechanism in language models: high-frequency phrasings are the ones LLMs handle best, so users unconsciously rephrase toward them, flattening distinctive inputs into the model's preferred forms (see "Does high-frequency text homogenize user input before generation?"). In both cases the distribution isn't a nuisance layered on top of clean data; it *is* the structure of the data, and any design that assumes uniformity inherits a systematic blind spot toward the tail and a systematic concentration of error at the head.
So the difference isn't a small statistical correction. Uniform assumptions predict diffuse, tolerable error; power laws predict sharp, targeted error that lands on your most valuable entities and quietly worsens as the system scales. The practical takeaway from the recommendation work is that fixed-size hashing is structurally inadequate for production: you need collision-free or dynamically growing embedding storage, because a larger table that is still fixed in size cannot fix the mismatch between a uniform design and a power-law world.
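To make the contrast between the two storage designs concrete, here is a minimal sketch assuming integer IDs and plain Python lists as embedding rows. The class names `FixedHashedTable` and `GrowingTable` are hypothetical, `raw_id % num_rows` is a stand-in for a real hash function, and the dictionary-backed table only illustrates the collision-free idea; it is not a description of how Monolith or any other system implements it.

```python
import random


class FixedHashedTable:
    """Fixed number of rows; distinct IDs can map to the same row."""

    def __init__(self, num_rows, dim, seed=0):
        rng = random.Random(seed)
        self.rows = [[rng.gauss(0.0, 0.01) for _ in range(dim)]
                     for _ in range(num_rows)]

    def lookup(self, raw_id):
        # Stand-in for hash(raw_id) % num_rows: collisions are unavoidable
        # once the number of distinct IDs exceeds the number of rows.
        return self.rows[raw_id % len(self.rows)]


class GrowingTable:
    """One row per distinct ID, created on first lookup; no shared rows."""

    def __init__(self, dim, seed=0):
        self.dim = dim
        self._rng = random.Random(seed)
        self.rows = {}

    def lookup(self, raw_id):
        row = self.rows.get(raw_id)
        if row is None:
            # New ID: allocate a fresh row instead of reusing an existing one.
            row = [self._rng.gauss(0.0, 0.01) for _ in range(self.dim)]
            self.rows[raw_id] = row
        return row


fixed = FixedHashedTable(num_rows=4, dim=8)
growing = GrowingTable(dim=8)

# IDs 5 and 9 both map to row 1 of the fixed table (5 % 4 == 9 % 4 == 1),
# so updates for one silently overwrite the other.
shares_row_fixed = fixed.lookup(5) is fixed.lookup(9)
shares_row_growing = growing.lookup(5) is growing.lookup(9)
print("fixed table shares a row:", shares_row_fixed)
print("growing table shares a row:", shares_row_growing)
```

The trade-off is memory: the growing table never collides but grows with every new ID, so a real system would also need some policy to bound its size, which this sketch omits.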
Sources (3 notes)
Real recommendation IDs follow power-law distributions, not uniform ones. High-frequency users and items collide more often, degrading model quality exactly where traffic is highest, making fixed-size hash tables inadequate for production systems.
Monolith's empirical work shows that real recommendation systems have power-law distributed frequencies, causing collisions to accumulate precisely on the entities the model most needs to get right. Fixed-size hashed tables worsen this over time as new IDs arrive.
Adam's Law shows LLMs flatten distinct prompts at comprehension time as users rephrase toward higher-frequency forms the model handles best. The same distributional property that creates accuracy on common tasks filters out distinctiveness on the input side.