Does refusing explicit knowledge harm AI system performance?
AI systems trained purely on data without explicit domain knowledge may sacrifice interpretability, robustness, and fairness. This note explores whether structured knowledge injection could mitigate these tradeoffs.
The AI field's romance with tacit knowledge (learning everything from data and refusing to incorporate explicit causal models, domain rules, or codified expertise) is creating avoidable problems. "Polanyi's Revenge" names the irony. Polanyi's paradox says that we know more than we can tell. AI's tacit-learning agenda inverts this: we can tell (we have explicit domain knowledge), but AI refuses to hear it.
The pattern is pervasive. Researchers build Rubik's Cube solvers from billions of examples rather than accepting the eight simple rules governing the puzzle. Industry practitioners convert doctrine and standard operating procedures into "data" only to have the knowledge "learned back" from that data at enormous cost. Policy infrastructure for AI relies exclusively on massive datasets even when hard-won explicit knowledge exists.
The costs are direct:
Interpretability: When systems learn their own representations from raw data, there is no reason to believe their reasoning will be interpretable to humans. Explicit knowledge, by contrast, provides the structural vocabulary for explaining decisions — "the system applied rule X to fact Y." Pure tacit learning produces weights that serve no interpretive function.
Bias: Systems that learn from data inherit whatever is statistically dominant in that data, with no explicit signal to override it. Explicit knowledge can include normative corrections that data alone cannot supply ("do not discriminate by X, regardless of correlation").
Robustness: Tacit learners generalize toward the training distribution. Explicit rules can enforce invariances that the data does not adequately represent. A system that "knows" a causal rule can maintain it when data is sparse or adversarially constructed.
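The three costs above can be made concrete with a small sketch. The Python below is a hypothetical illustration and not something taken from the source: `learned_score` stands in for a tacitly learned model, and `Rule`, `decide`, `PROTECTED`, the feature names, and the weights are all invented for the example. Under those assumptions, it shows an explicit rule layer that records which rule fired (interpretability), keeps a protected attribute away from the scorer (bias), and enforces a hard constraint regardless of the learned score (robustness).

```python
from dataclasses import dataclass
from typing import Callable, Dict, List, Tuple

Features = Dict[str, float]

# Hypothetical protected attribute; stands in for "X" in
# "do not discriminate by X, regardless of correlation".
PROTECTED = {"group"}


def learned_score(features: Features) -> float:
    """Stand-in for a tacitly learned model (weights invented for illustration).

    It has picked up a dependence on the protected attribute "group".
    """
    weights = {"income": 0.5, "debt": -0.4, "group": 0.3}
    return sum(weights.get(name, 0.0) * value for name, value in features.items())


@dataclass
class Rule:
    """An explicit, named rule: a condition plus the decision it forces."""
    name: str
    applies: Callable[[Features], bool]
    decision: str


# Explicit domain knowledge (hypothetical): a constraint that holds
# no matter what the training data looked like.
RULES: List[Rule] = [
    Rule("debt_exceeds_income", lambda f: f["debt"] > f["income"], "reject"),
]


def decide(features: Features, threshold: float = 0.0) -> Tuple[str, str]:
    """Return (decision, explanation) for one input."""
    # Normative correction: the protected attribute never reaches the scorer.
    visible = {k: v for k, v in features.items() if k not in PROTECTED}

    # Explicit rules are checked first and leave a human-readable trace.
    for rule in RULES:
        if rule.applies(visible):
            return rule.decision, f"rule '{rule.name}' applied"

    # Otherwise fall back to the learned score.
    score = learned_score(visible)
    decision = "accept" if score > threshold else "reject"
    return decision, f"learned score {score:.2f} vs threshold {threshold:.2f}"


if __name__ == "__main__":
    # Two inputs that differ only in the protected attribute get the same answer.
    print(decide({"income": 1.0, "debt": 0.5, "group": 0.0}))
    print(decide({"income": 1.0, "debt": 0.5, "group": 1.0}))
    # The explicit rule overrides whatever the learned score would say.
    print(decide({"income": 1.0, "debt": 2.0, "group": 1.0}))
```

The design choice worth noting is the ordering: the explicit layer runs before the learned scorer, so its decisions come with a named reason, while the learned path can only report a number. Real systems would need far more care about how rules are authored and validated; this only illustrates the structural point.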
The civilizational argument is pointed: human progress has been built on codification that is approximate and aspirational, but explicit. The current AI agenda runs in the opposite direction, learning to distrust codification in favor of raw statistical patterns. This is a historically unusual choice, and it has consequences.
Connects directly to domain specialization: the note "Can prompt optimization teach models knowledge they lack?" confirms this is not a prompting problem; the explicit knowledge must enter at training time. The note "Can organizing knowledge structures beat raw training data volume?" shows that structured explicit knowledge injection at 0.3% of corpus size substantially closes the gap.
Inquiring lines that use this note as a source 15
This note is a source for these synthesized inquiries.
- Can discrete codes and embedding injection both solve the text versus identity tradeoff?
- How much does organized knowledge improve learning efficiency versus raw data?
- What techniques work best for injecting domain knowledge at training time?
- Why does explicit theory injection work better than example-based learning for reasoning tasks?
- Can prompting alone inject new domain knowledge into a model?
- Can prompt optimization alone inject knowledge models don't already have?
- What role does knowledge injection play in adapting RAG to industry taxonomies?
- How does prompt context activation differ from parameter-based knowledge injection?
- What makes knowledge editing different from simply finding where facts are stored?
- Can prompt optimization or fine-tuning inject knowledge models do not already contain?
- Can knowledge poisoning attacks succeed with less than 0.05 percent modified text?
- What alternatives exist when required knowledge is absent from training?
- What training cost tradeoffs exist between fine-tuning and other knowledge injection methods?
- Can ethical constraints in AI address the gap between performance and actual understanding?
- Why can't AI truly understand expertise without joining the validating community?
Related concepts in this collection 3
- Can prompt optimization teach models knowledge they lack?
  Explores whether sophisticated prompting techniques can inject new domain knowledge into language models, or if they're limited to activating existing training knowledge.
  confirms the problem: if explicit domain knowledge wasn't in training data, no prompt can supply it; tacit learning created the deficit
- Can organizing knowledge structures beat raw training data volume?
  Does structuring domain knowledge into taxonomies during training enable models to learn more efficiently than simply increasing the amount of training data? This challenges assumptions about scaling knowledge injection.
  structured explicit knowledge injection is the fix; knowledge organization outperforms raw data volume
- Does model access level determine which specialization techniques work?
  Different specialization approaches require different levels of access to a model's internals. Understanding this constraint helps practitioners choose realistic techniques for their domain adaptation goals.
  explicit knowledge injection is constrained by access tier; the Polanyi problem is most acute in black-box contexts
Related papers in this collection 8
Papers most semantically related to this note, ranked by cosine similarity in the embedding space.
- Injecting Domain-Specific Knowledge into Large Language Models: A Comprehensive Survey
- A Survey on Knowledge Distillation of Large Language Models
- Polanyi’s Revenge and AI’s New Romance with Tacit Knowledge
- Bottom-up Domain-specific Superintelligence: A Reliable Knowledge Graph is What We Need
- How new data permeates LLM knowledge and how to dilute it
- Unveiling the Learning Mind of Language Models: A Cognitive Framework and Empirical Study
- Probing Structured Semantics Understanding and Generation of Language Models via Question Answering
- Embedding Domain Knowledge for Large Language Models via Reinforcement Learning from Augmented Generation
Original note title
ais rejection of explicit domain knowledge in favor of tacit learning creates interpretability bias and robustness problems