Data Journalist Agent: Transforming Data into Verifiable Multimodal Stories

Paper · arXiv 2606.11176 · Published June 9, 2026
Co-Writing and Collaboration

Data tells stories that shape society, and the data journalist’s job is to turn raw information into a story that non-expert audiences can understand, trust, and follow to the end. A high-quality news feature routinely takes a newsroom team weeks of work: hunting for context, running statistics, choosing an angle, and designing visuals. Recent agents already handle individual steps well: automated data-science agents close the analysis loop, while design agents can synthesize polished websites. But can an agent serve as a data journalist end to end? We introduce Data Journalist Agent (Data2Story), a multi-agent framework that orchestrates specialised roles into a single virtual newsroom. Data2Story offers two innovations over prior approaches. (i) Claims are evidence-grounded and verifiable: an “Inspector” links the intermediate results produced by individual roles to their sources, so that numbers, angles, and assets are grounded in data, code, or a reference (e.g., an external URL). (ii) Articles are multimodally generative.
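The Inspector’s job of tying each claim to its source can be illustrated with a small provenance check. This is a minimal sketch under stated assumptions, not the authors’ implementation: the names `Evidence`, `Claim`, and `inspect_claims`, the `locator` format (such as `"analysis.py:12"`), and the idea of attaching a `recompute` callable to code evidence are all hypothetical. The check flags claims with no linked source and re-runs linked code to confirm that a reported number matches what the code actually produces.

```python
from dataclasses import dataclass, field
from math import isclose
from typing import Callable, List, Literal, Optional

# Hypothetical evidence categories mirroring "data, code, or a reference".
EvidenceKind = Literal["data", "code", "reference"]


@dataclass
class Evidence:
    kind: EvidenceKind
    locator: str  # assumed format: "file.py:line" for code, a URL for references
    recompute: Optional[Callable[[], float]] = None  # re-runs the linked code, if any


@dataclass
class Claim:
    text: str
    value: Optional[float] = None  # numeric value stated in the article, if any
    evidence: List[Evidence] = field(default_factory=list)


def inspect_claims(claims: List[Claim], rel_tol: float = 1e-6) -> List[str]:
    """Return a list of problems; an empty list means every claim passed."""
    problems = []
    for claim in claims:
        if not claim.evidence:
            problems.append(f"ungrounded: {claim.text!r}")
            continue
        for ev in claim.evidence:
            # Only numeric claims backed by executable code can be re-checked here.
            if claim.value is not None and ev.recompute is not None:
                actual = ev.recompute()
                if not isclose(actual, claim.value, rel_tol=rel_tol):
                    problems.append(
                        f"mismatch at {ev.locator}: claimed {claim.value}, got {actual}"
                    )
    return problems


# Toy example with made-up values.
ages = [21, 34, 29, 40]
claims = [
    Claim("Mean debut age is 31", 31.0,
          [Evidence("code", "analysis.py:12", lambda: sum(ages) / len(ages))]),
    Claim("Singers now debut younger"),
]
print(inspect_claims(claims))  # ["ungrounded: 'Singers now debut younger'"]
```

The first claim passes because re-running its linked code yields 31.0; the second is flagged because nothing links it to data, code, or a reference.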

Introduction. Data journalists turn raw data into stories, such as “How has the way pop singers use their voice changed across generations?”, that everyday readers can follow, helping the public understand what lies behind the data; yet a small newsroom team can spend weeks on a single high-quality article. Recent agents already handle individual steps of this work well: automated data-science agents [1, 2, 3, 4] can profile a dataset, run the right statistics, and return defensible results with reproducible code, and visualization agents [5, 6, 7, 8] generate visual artifacts (such as websites) from natural-language instructions. But can agents serve as journalists end to end, taking raw data all the way to a story that readers actually want to finish and can trust? Building such an end-to-end agentic journalism system is non-trivial, because behind each finished article lies a long process: gathering background, running careful statistics, choosing an angle, designing assets, building an appealing page, and several rounds of editing.
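The newsroom process described above can be pictured as an ordered set of roles followed by editing rounds. The sketch below is a hypothetical toy orchestration loop, not Data2Story’s actual architecture: the role functions, the shared `Draft` structure, and the feedback mechanism are assumptions chosen for illustration, and real roles would call language models, statistics code, and design tools instead of the placeholder logic here.

```python
from dataclasses import dataclass, field
from typing import Callable, Dict, List


@dataclass
class Draft:
    dataset: List[float]
    notes: Dict[str, object] = field(default_factory=dict)
    feedback: List[str] = field(default_factory=list)
    revisions: int = 0


# Each hypothetical role reads and updates the shared draft in place.
def researcher(d: Draft) -> None:
    d.notes["background"] = "context gathered"  # placeholder for a search step


def analyst(d: Draft) -> None:
    d.notes["change"] = d.dataset[-1] - d.dataset[0]


def angle_picker(d: Draft) -> None:
    change = d.notes["change"]
    d.notes["angle"] = "rose" if change > 0 else "fell" if change < 0 else "held steady"


def designer(d: Draft) -> None:
    d.notes["chart"] = {"kind": "line", "points": len(d.dataset)}


def page_builder(d: Draft) -> None:
    headline = f"Values {d.notes['angle']}"
    if any("number" in f for f in d.feedback):  # act on the editor's feedback
        headline += f" by {abs(d.notes['change']):.1f}"
    d.notes["headline"] = headline


def editor(d: Draft) -> List[str]:
    """Return requested changes; an empty list means the draft is approved."""
    if not any(ch.isdigit() for ch in d.notes["headline"]):
        return ["add the number to the headline"]
    return []


def run_newsroom(dataset: List[float],
                 roles: List[Callable[[Draft], None]],
                 review: Callable[[Draft], List[str]],
                 max_rounds: int = 3) -> Draft:
    draft = Draft(dataset)
    for _ in range(max_rounds):
        for role in roles:  # run every production role in order
            role(draft)
        draft.feedback = review(draft)
        if not draft.feedback:  # approved: stop editing
            break
        draft.revisions += 1
    return draft


ROLES = [researcher, analyst, angle_picker, designer, page_builder]
result = run_newsroom([2.0, 3.5, 5.0], ROLES, editor)
print(result.notes["headline"], result.revisions)  # Values rose by 3.0 1
```

Here the editor rejects the first draft because its headline lacks a number, and the page builder fixes it on the second round, so the draft ends with one revision. Capping the loop with `max_rounds` keeps a disagreeing editor from stalling the pipeline indefinitely.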

Discussion / Conclusion. We introduced Data Journalist Agent (Data2Story), a multi-agent framework that orchestrates specialised roles into a single virtual newsroom for end-to-end data journalism. Data2Story contributes two properties absent from prior approaches: an evidence-traceable Inspector that binds each number, quote, and asset to a specific code line or reference, and multimodal generative storytelling in which the agent reasons about audience needs before deploying sub-agents and tools that fit both the data and the reader. Across 18 samples paired with expert references, Data2Story receives favourable ratings from 53 human participants and from computer-use agent judges, both on per-dimension rubric scores and in side-by-side preference, with the Inspector specifically improving data and method transparency. We position Data Journalist Agent as a collaborator for human journalists: (i) agent-generated articles can augment the newsroom workflow by contributing creative multimodal assets and an auditability dimension that is rarely formalised.
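This excerpt does not specify how the side-by-side preferences are aggregated. One common approach, sketched here purely as an assumption, is to report a win rate that counts each tie as half a win, together with an exact two-sided sign test over the non-tied comparisons. The outcome labels (`"A"` for Data2Story preferred, `"B"` for the expert reference preferred, `"tie"`) and the toy list are hypothetical and are not the paper’s data.

```python
from collections import Counter
from math import comb
from typing import Iterable

# Assumed labels: "A" = Data2Story preferred, "B" = expert reference preferred.


def win_rate(outcomes: Iterable[str]) -> float:
    """Share of comparisons won by A, counting each tie as half a win."""
    c = Counter(outcomes)
    n = c["A"] + c["B"] + c["tie"]
    if n == 0:
        raise ValueError("no comparisons")
    return (c["A"] + 0.5 * c["tie"]) / n


def sign_test_p(outcomes: Iterable[str]) -> float:
    """Exact two-sided sign test (binomial, p = 0.5) on non-tied comparisons."""
    c = Counter(outcomes)
    n = c["A"] + c["B"]
    k = min(c["A"], c["B"])
    tail = sum(comb(n, i) for i in range(k + 1)) / 2 ** n  # P(X <= k)
    return min(1.0, 2 * tail)  # double the smaller tail, capped at 1


# Made-up outcomes for illustration only.
toy = ["A"] * 10 + ["B"] * 5 + ["tie"] * 3
print(round(win_rate(toy), 3), round(sign_test_p(toy), 3))  # 0.639 0.302
```

Dropping ties from the sign test while counting them as half wins in the win rate is a design choice; other evaluations discard ties entirely or split them differently, so the exact convention should be stated alongside any reported preference numbers.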