DATATALES: Investigating the use of Large Language Models for Authoring Data-Driven Articles

Paper · arXiv 2308.04076 · Published August 8, 2023

Authoring data-driven articles is a complex process requiring authors to not only analyze data for insights but also craft a cohesive narrative that effectively communicates the insights. Text generation capabilities of contemporary large language models (LLMs) present an opportunity to assist the authoring of data-driven articles and expedite the writing process. In this work, we investigate the feasibility and perceived value of leveraging LLMs to support authors of data-driven articles. We designed a prototype system, DATATALES, that leverages an LLM to generate textual narratives accompanying a given chart. Using DATATALES as a design probe, we conducted a qualitative study with 11 professionals to evaluate the concept, from which we distilled affordances and opportunities to further integrate LLMs as valuable data-driven article authoring assistants.

Introduction. Data-driven articles, which feature primarily textual narratives containing claims and insights backed by data and illustrated with data visualizations, are a popular means of communication in fields like journalism and business reporting [34]. Authoring data-driven articles, however, is often a complex and tedious process. Authors need to analyze the data to identify insights, order insights in an appropriate sequence, and write a cohesive narrative to communicate those insights with effective transitions and appropriate domain context. The emergence of contemporary large language models (LLMs) and their remarkable text generation capabilities has led to increased interest in assessing their value for a range of creative writing tasks [7], including data storytelling [18]. While this technology has the potential to fundamentally reshape the way people use writing tools [31], it also introduces new challenges, including unreliable outcomes, lack of domain understanding, prompt complexity, and ethical concerns [18].

Discussion / Conclusion. Despite the limited nature of the tool as a proof-of-concept design probe, participant reactions to the experience ranged from congenial to enthusiastic. Rationales on how DATATALES supported their authoring experience in new and positive ways are compiled below. (T6) Insights over data facts. While data facts are an important part of a data story, the segments most often repurposed and appreciated by participants were those containing level-3 and level-4 statements in Lundgard and Satyanarayan’s categorization of chart descriptions [20], which participants referred to as “the why’s” (P3, P10, P11). For example, on a dataset about cars' acceleration vs. horsepower vs. country of origin, this could include identifying trends (e.g., “cars with higher horsepower tend to have better acceleration rates”), conclusions following findings (e.g., “The US auto market prioritizes higher horsepower”), and external context (e.g., “policymakers should consider regulating emissions for consumers who value speed over efficiency”). Several added that aggregating this “human knowledge” was one of the most valuable aspects of the experience, complementing their authoring work with new information (P1, P3), alternative framings (P3, P7), and confirmation of current viewpoints (P3, P11).
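To make the level-3 and level-4 distinction concrete, the following is a minimal illustrative sketch, not part of DATATALES. It models the four semantic levels of Lundgard and Satyanarayan's categorization as an enum and keeps only the segments tagged level 3 or higher. The names `SemanticLevel`, `Segment`, and `the_whys` are hypothetical, the level tags are assigned by hand as an assumption (no automatic classifier is implied), and the first example sentence is invented for contrast; the other two come from the examples above.

```python
from dataclasses import dataclass
from enum import IntEnum


class SemanticLevel(IntEnum):
    """Four levels of chart-description content (Lundgard and Satyanarayan)."""
    ENCODED = 1      # chart type, axes, visual encodings
    STATISTICAL = 2  # extrema, comparisons, descriptive statistics
    PERCEPTUAL = 3   # trends and patterns
    CONTEXTUAL = 4   # domain context and explanation


@dataclass
class Segment:
    text: str
    level: SemanticLevel  # hand-assigned here; an assumption, not a classifier


def the_whys(segments):
    """Keep only level-3 and level-4 segments (the "why's")."""
    return [s for s in segments if s.level >= SemanticLevel.PERCEPTUAL]


segments = [
    # Invented level-1 sentence, included only for contrast.
    Segment("The scatterplot shows horsepower against acceleration.",
            SemanticLevel.ENCODED),
    Segment("Cars with higher horsepower tend to have better acceleration rates.",
            SemanticLevel.PERCEPTUAL),
    Segment("The US auto market prioritizes higher horsepower.",
            SemanticLevel.CONTEXTUAL),
]

kept = the_whys(segments)
for s in kept:
    print(f"level {int(s.level)}: {s.text}")
```

Using an `IntEnum` keeps the levels ordered, so "level 3 and above" is a single comparison rather than a set lookup.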


Synthesis notes from this paper's topics