INQUIRING LINE

Agentic Systems and Tool Use · Reasoning, Retrieval, and Evaluation · Model Architecture and Internals · cross-cluster

How should forecasting methods adapt to a post-AGI regime?

This explores not how to predict AGI's arrival, but how the act of forecasting itself should change once AI systems become forecasters, actors, and economic agents — the corpus reframes the question from "predict the date" to "redesign the method."

Forecasting practice has to change once AI is good enough to forecast, and once the thing being forecast is an economy that AI is rewriting. The corpus pushes back on the instinct to name a date. The clearest signal comes from work on the AGI-to-superintelligence transition, which argues there is no single timeline to forecast: progress runs through at least four distinct routes (raw scaling, a paradigm shift, recursive self-improvement, and collectives of agents), each with its own bottleneck. The useful move is therefore to track which frictions are loosening rather than bet on one curve (What bottlenecks define the path from AGI to superintelligence?). Forecasting becomes bottleneck-watching, not timeline-picking.

The second adaptation is that the forecaster is now often a model, and models forecast best when the workflow does the thinking a single prompt can't. Several notes converge here. A retrieval-augmented system reaches near-parity with competitive human forecasters on real questions published after its training cutoff, and it improves without domain-specific tuning as model generations advance (Can retrieval-augmented language models forecast like human experts?). LLMs also have more latent forecasting skill than people credit, but it surfaces only when the pipeline separates numerical extrapolation from event-driven contextual reasoning instead of cramming both into one prompt (Can LLMs actually forecast time series better than we think?). The Nexus decomposition (contextualize, then take a dual macro/micro outlook, then synthesize) beats both pure time-series and pure LLM baselines (Can decomposing forecasting into stages unlock numerical and contextual reasoning?). The lesson for a post-AGI regime: how the forecasting process is structured determines how much of a model's skill actually shows up, so method design is where the leverage is.

But there's a trap the corpus flags hard. A post-AGI world is one where models increasingly forecast a system shaped by their own outputs, and pure self-reference doesn't hold up. Self-improvement stalls on the generation-verification gap, diversity collapse, and reward hacking; the methods that actually work smuggle in external anchors such as past model versions, third-party judges, user corrections, or tool feedback (Can models reliably improve themselves without external feedback?). Relatedly, post-trained models start treating their own outputs as actions that shape their future inputs, closing an action-perception loop that is absent in pretraining (Do models recognize their own outputs as actions shaping future inputs?). Forecasting post-AGI is therefore reflexive: the forecast can move the world it is forecasting, and a method with no outside check will drift into circular confidence.
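
A toy model makes the reflexivity point concrete. It is an assumption of this sketch, not something the notes propose: the realized outcome is `baseline + reactivity * forecast`, where `baseline` is what would happen with no forecast published and `reactivity` (between 0 and 1) is how strongly the world responds to the forecast. A forecaster with no outside check, which simply adopts whatever outcome its last forecast produced, converges to the fixed point `baseline / (1 - reactivity)`: its forecasts look perfectly confirmed while sitting away from the baseline.

```python
def realized_outcome(forecast: float, baseline: float, reactivity: float) -> float:
    """Toy assumption: the world moves toward the published forecast in proportion to `reactivity`."""
    return baseline + reactivity * forecast


def self_referential_forecast(baseline: float, reactivity: float, steps: int = 200) -> float:
    """No outside check: each round, adopt the outcome the previous forecast helped produce."""
    forecast = 0.0
    for _ in range(steps):
        forecast = realized_outcome(forecast, baseline, reactivity)
    return forecast


baseline, reactivity = 10.0, 0.5  # hypothetical parameters
f_star = self_referential_forecast(baseline, reactivity)
print(round(f_star, 6))  # → 20.0, i.e. baseline / (1 - reactivity)
# The forecast "confirms" itself: the outcome it induces equals the forecast.
print(abs(realized_outcome(f_star, baseline, reactivity) - f_star) < 1e-9)  # → True
# The gap from the no-forecast baseline is invisible from inside the loop.
print(round(f_star - baseline, 6))  # → 10.0
```

Only an anchor outside the loop, such as an independent estimate of `baseline` or outcomes from a setting the forecast did not reach, exposes the gap of baseline × reactivity / (1 - reactivity); with the numbers above that is 10 × 0.5 / 0.5 = 10.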

Know where AI forecasting already wins and where to stay skeptical. Models clear the human bar most easily in sparse-signal domains where experts only modestly beat chance, such as founder-success prediction and venture bets; there, even raw capability suffices (Can language models beat human venture capital experts?). Surface signals deserve skepticism, though: hedging language correlates with wrong reasoning, not careful reasoning, so a model's hedging markers are better read as a sign of trouble than as evidence of well-calibrated caution (Do hedging markers actually signal careful thinking in AI?). A more principled stance treats forecasting as active information-gathering: simulating which question or probe would most reduce uncertainty, rather than emitting a single point estimate (How can models select the most informative question to ask?).
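
The information-gain idea behind question selection can be sketched in Python. This is a deliberately stripped-down version, not UoT itself (which adds multi-step scenario simulation and reward propagation): it assumes a prior over a few hypotheses, treats each candidate question as a noiseless function from hypothesis to answer, and picks the question with the largest expected reduction in Shannon entropy. The hypotheses, priors, and questions are invented for illustration.

```python
import math
from collections import defaultdict
from typing import Callable


def entropy(probs: list[float]) -> float:
    """Shannon entropy in bits; zero-probability entries contribute nothing."""
    return -sum(p * math.log2(p) for p in probs if p > 0)


def expected_information_gain(prior: dict[str, float], question: Callable[[str], str]) -> float:
    """Prior entropy minus the expected posterior entropy after hearing the (noiseless) answer."""
    mass_by_answer: dict[str, list[float]] = defaultdict(list)
    for hypothesis, p in prior.items():
        mass_by_answer[question(hypothesis)].append(p)
    expected_posterior = 0.0
    for masses in mass_by_answer.values():
        p_answer = sum(masses)
        if p_answer == 0:
            continue  # an answer that cannot occur contributes nothing
        # Posterior over hypotheses consistent with this answer, weighted by P(answer).
        expected_posterior += p_answer * entropy([m / p_answer for m in masses])
    return entropy(list(prior.values())) - expected_posterior


def most_informative(prior: dict[str, float], questions: dict[str, Callable[[str], str]]) -> str:
    """Name of the question with the highest expected information gain."""
    return max(questions, key=lambda name: expected_information_gain(prior, questions[name]))


# Hypothetical causes of a forecast miss, with made-up prior probabilities.
prior = {"promotion": 0.4, "competitor outage": 0.3, "seasonality": 0.2, "data error": 0.1}
questions = {
    "was there a promotion?": lambda h: "yes" if h == "promotion" else "no",
    "is the cause external?": lambda h: "yes" if h in {"competitor outage", "seasonality"} else "no",
    "is it a data error?": lambda h: "yes" if h == "data error" else "no",
}
print(most_informative(prior, questions))  # → is the cause external?
```

The winning question is not the one about the single most likely cause; it is the one whose answers split the probability mass most evenly (0.5 / 0.5 here), which is what expected information gain rewards when answers are noiseless.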

The part most forecasters underweight: the object of forecasting changes, not just the tool. In an AGI economy the foundational variables stop behaving. Human wages drift toward the compute cost of replacing the work rather than its economic value, and labor's share of GDP heads toward zero even as some work remains human (What happens to human wages in an AGI economy?). Diffusion won't be uniform either: more AI-exposed firms substitute AI tools for labor faster and at lower cost, suggesting returns to scale in internal AI capability, so aggregate trend lines hide sharp dispersion across firms (Do firms substitute labor for AI at different rates?). And outcomes like inequality aren't determined by the technology's trajectory at all, but by deployment choices: access, integration, and incentives (Does generative AI inevitably worsen or reduce inequality?). The deepest adaptation, then, is humility about what's forecastable. Post-AGI, the high-value forecasts are conditional and choice-dependent ("if deployed this way, then") rather than deterministic extrapolations of a curve.
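
A toy calculation illustrates two claims in this paragraph at once: wages capped by compute cost, and a forecast that is conditional on a scenario choice. Everything here is a hypothetical assumption chosen only to make the mechanism visible, not an estimate from the notes: the wage is taken to be the lesser of the work's value and the compute cost of replacing it, human hours stay fixed (some work remains human), and `cost_decay` and `gdp_growth` are scenario parameters.

```python
def capped_wage(value_of_work: float, compute_cost_to_replace: float) -> float:
    """Toy rule: a human hour is never paid more than the compute that could replace it."""
    return min(value_of_work, compute_cost_to_replace)


def labor_share_path(
    periods: int,
    cost_decay: float,                   # scenario choice: compute-cost multiplier per period
    gdp_growth: float = 0.10,            # scenario choice: GDP growth rate per period
    value_of_work: float = 50.0,         # hypothetical value of one human hour
    initial_compute_cost: float = 80.0,  # hypothetical compute cost to replace that hour
    hours: float = 1_000.0,              # human hours, held fixed
    initial_gdp: float = 100_000.0,      # hypothetical starting GDP
) -> list[float]:
    """Labor's share of GDP in each period under the toy wage rule."""
    shares = []
    cost, gdp = initial_compute_cost, initial_gdp
    for _ in range(periods):
        shares.append(capped_wage(value_of_work, cost) * hours / gdp)
        cost *= cost_decay
        gdp *= 1 + gdp_growth
    return shares


# The same question answered conditionally: "if compute costs fall this fast, then ..."
fast = labor_share_path(5, cost_decay=0.5)
slow = labor_share_path(5, cost_decay=0.9)
print([round(s, 3) for s in fast])
print([round(s, 3) for s in slow])
```

In the fast-decline scenario the compute cap binds from the second period and the share falls from 0.5 to about 0.03 within five periods; in the slow scenario the cap never binds, so the share declines only with GDP growth (to about 0.34). The point is the shape of the answer, a forecast indexed by a choice, not the numbers.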

Sources: 12 notes

What bottlenecks define the path from AGI to superintelligence?

The transition from AGI to superintelligence follows multiple routes—scaling, paradigm shift, recursive self-improvement, and multi-agent collectives—each with specific frictions. Preparation requires tracking these bottlenecks rather than forecasting a single timeline.

Can retrieval-augmented language models forecast like human experts?

A retrieval-augmented LM system achieved near-parity with competitive human forecasters on real forecasting questions published after model training cutoffs, sometimes surpassing human crowds. Newer model generations naturally improved forecasting without domain-specific tuning.

Can LLMs actually forecast time series better than we think?

LLMs have stronger intrinsic forecasting ability than recognized, but only when workflows separate numerical reasoning from contextual reasoning. Monolithic prompting obscures this capability; structured decomposition surfaces it.

Can decomposing forecasting into stages unlock numerical and contextual reasoning?

Nexus outperforms pure time-series foundation model (TSFM) and LLM baselines on real-world datasets by decomposing forecasting into contextualization, dual-resolution macro/micro outlook, and synthesis stages. Separating numerical extrapolation from event-driven contextual reasoning avoids forcing one model to handle both simultaneously.

Can models reliably improve themselves without external feedback?

Pure self-improvement stalls due to the generation-verification gap, diversity collapse, and reward hacking. Reliable improvement methods succeed by smuggling in external anchors: past model versions, third-party judges, user corrections, or tool feedback.

Do models recognize their own outputs as actions shaping future inputs?

Post-trained language models exhibit a measurable shift: they recognize that their outputs become their own future inputs, closing an action-perception loop absent in pretraining. Evidence includes 3–4× lower output entropy on-policy and behavioral signatures of trajectory recognition.

Can language models beat human venture capital experts?

VCBench shows several LLMs exceed human baselines in founder-success prediction, with DeepSeek-V3 achieving 6× market-index precision. In sparse-signal forecasting where experts only modestly beat chance, even raw LLM capability suffices to clear the human bar.

Do hedging markers actually signal careful thinking in AI?

Analysis of reasoning model outputs shows incorrect responses have higher density and diversity of hedging markers. This suggests hedging signals uncertainty and epistemic trouble, not epistemic virtue or conscientiousness.

How can models select the most informative question to ask?

Uncertainty of Thoughts (UoT) combines uncertainty-aware scenario simulation with information-gain scoring and reward propagation to identify questions whose possible answers maximally reduce diagnostic uncertainty, providing a principled mechanism for specific, high-value clarification rather than generic prompts.

What happens to human wages in an AGI economy?

As AGI automates bottleneck work first, human wages shift from reflecting economic value to reflecting compute costs. Labor's share of GDP approaches zero even as some accessory work remains human, driven by compute-allocation efficiency rather than irreplaceability.

Do firms substitute labor for AI at different rates?

More AI-exposed firms replace online labor marketplace workers with AI tools faster and at lower cost than less-exposed firms, suggesting returns to scale in internal AI capability rather than uniform technology diffusion.

Does generative AI inevitably worsen or reduce inequality?

An interdisciplinary review found that across information, work, education, and healthcare, generative AI can both exacerbate and reduce inequality. The direction is determined by access, integration, and incentive structures, not the capability itself.
