SYNTHESIS NOTE
Training, RL, and Test-Time Scaling

Can ternary weights match full-precision model performance?

Can models trained natively with only three weight values (−1, 0, 1) achieve the same perplexity and task performance as standard full-precision models? This matters because ternary weights could dramatically reduce computational and energy costs.

Synthesis note · 2026-06-03 · sourced from Mobile

Post-training quantization to low-bit weights is widely used but suboptimal: it degrades a model that was trained in full precision. BitNet b1.58 instead trains natively with ternary weights {-1, 0, 1} (log2 3 ≈ 1.58 bits per weight). The headline result is that it matches a full-precision (FP16/BF16) Transformer of the same model size and training-token count on both perplexity and end-task performance; in the reported experiments, parity appears from roughly the 3B-parameter scale. It does so while offering much lower latency, memory use, and energy consumption, and higher throughput.
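To make "ternary weights" concrete, here is a minimal sketch of absmean quantization, the scheme BitNet b1.58 describes for mapping a weight matrix to {-1, 0, 1}. The matrix is divided by a per-tensor scale gamma (the mean absolute weight), then each entry is rounded and clipped to [-1, 1]. The function name, the `eps` default, and the use of plain Python lists are illustrative assumptions. During native training, full-precision latent weights are typically kept and updated through a straight-through estimator; that part is not shown.

```python
import math

def absmean_ternary(weights, eps=1e-5):
    """Quantize a weight matrix (list of rows) to {-1, 0, 1}.

    Sketch of absmean quantization (names and eps are assumptions):
        gamma = mean(|W_ij|)
        W_q   = clip(round(W / (gamma + eps)), -1, 1)
    Returns the ternary matrix and the scale gamma needed to rescale outputs.
    """
    flat = [w for row in weights for w in row]
    gamma = sum(abs(w) for w in flat) / len(flat)
    # Python's round() breaks exact .5 ties toward even; the tie rule is an assumption.
    ternary = [
        [max(-1, min(1, round(w / (gamma + eps)))) for w in row]
        for row in weights
    ]
    return ternary, gamma

# Information content of one three-valued weight: log2(3) ≈ 1.585 bits.
BITS_PER_TERNARY_WEIGHT = math.log2(3)

W = [[0.9, -0.05, -1.2], [0.3, 0.0, -0.6]]
q, gamma = absmean_ternary(W)
print(q)  # [[1, 0, -1], [1, 0, -1]]
```

Small weights (here -0.05) collapse to 0, which is what gives ternary models their built-in sparsity, while large weights of either sign saturate at ±1. The scale gamma is kept so outputs can be rescaled after the ternary product.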

The keeper is not just efficiency but the reframing: 1.58-bit training defines a new scaling law and recipe for high-performance, cost-effective models. Because every weight is -1, 0, or 1, matrix multiplication reduces to additions and subtractions (zero weights are simply skipped), which opens a path to hardware designed specifically for 1-bit LLMs. The gains also compound with other bottlenecks: cutting activation precision from 16 to 8 bits (with room to compress further) doubles the feasible context length on the same resources, and the small memory footprint eases mixture-of-experts (MoE) deployment by reducing the number of devices needed and the inter-chip communication between them.
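The claim that ternary weights turn matrix multiplication into addition can be illustrated with a scalar matrix-vector product. This is a minimal sketch under stated assumptions: the function names are hypothetical, activations stay as plain floats (the 8-bit activation quantization is not modeled), and a single per-tensor `scale` stands in for the quantization scale. Real kernels would pack weights and vectorize, and any speedup is hardware-dependent.

```python
def ternary_matvec(w_ternary, x, scale=1.0):
    """Compute y = scale * (W @ x) where every entry of W is -1, 0, or 1.

    No weight-activation multiplications are needed: +1 adds the activation,
    -1 subtracts it, and 0 skips it. Only one multiply per output remains,
    for the quantization scale.
    """
    out = []
    for row in w_ternary:
        acc = 0.0
        for w, xi in zip(row, x):
            if w == 1:
                acc += xi
            elif w == -1:
                acc -= xi
            # w == 0: the weight contributes nothing, so the term is skipped
        out.append(scale * acc)
    return out

def dense_matvec(w, x):
    """Reference multiply-accumulate matvec, for comparison."""
    return [sum(wi * xi for wi, xi in zip(row, x)) for row in w]

W = [[1, 0, -1], [1, 1, 0]]
x = [2.0, 3.0, 5.0]
print(ternary_matvec(W, x))  # [-3.0, 5.0]
print(dense_matvec(W, x))    # [-3.0, 5.0]
```

Both functions return the same result, but the ternary version uses only adds and subtracts in the inner loop. That property is what makes dedicated 1-bit hardware attractive, since adders are far cheaper than multipliers.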

This is the weight-precision route in the vault's efficiency-architecture thread, distinct from the attention-linearity route. It complements "Can spiking neurons make transformers efficient on any hardware?" (which buys efficiency through attention linearity and sparsity) and grounds "Can architecture choices improve inference efficiency without sacrificing accuracy?" with a concrete architecture whose inference cost drops without an accuracy penalty.


Original note title

ternary 1-bit LLM weights match full-precision performance at the same size while defining a new cost scaling law and hardware paradigm