The Role of Sources in News Reporting
A study on how sources shape news articles.
News articles rely on sources to provide accurate information. Understanding when, how, and why reporters use sources can give us insights into the news we read. This understanding can also help journalists do their job better.
To support this, we created a large dataset that includes many examples of sources used in news articles. This dataset allows us to build models that can detect where information comes from and who provided it. We also introduced a new task, called source prediction, to study how sources work together in news stories. Our results show that we can effectively perform this task, which may help improve the way news articles are written and how journalists choose sources.
Journalism shapes our views, and the information we consume is based on the sources reporters use. Identifying these sources is relevant in various areas, such as detecting misinformation and understanding arguments in news discourse. Linking information to sources can be tough, as some attributions are clear, while others are more subtle. In the past, most efforts focused on simple cases, like identifying quotes, which resulted in high precision but missed many other instances.
Sources can combine in various ways within a single news article. Some sources are obvious, while others may be implied or unclear. Our main question is: does this article need another source?
Source Attribution
In our work, we define "source" broadly to include many ways journalists gather information. We identified 16 categories of sourcing and created the largest source-attribution dataset with over 28,000 attributions in more than 1,300 articles. By training models on this data, we achieved good accuracy in linking information to its sources.
We tested several methods and found that traditional lexical approaches and other existing models often performed poorly on this task. Many sentences carry sourced information without any clear keyword to signal it, which makes attribution challenging.
In the first part of our research, we focus on how to attribute sources. We establish criteria for what makes a sentence attributable to a source based on explicit or implicit signals. Sources can include individuals or organizations and can be mentioned directly or through more general terms.
We aim to maximize the number of attributions while also ensuring the same source is correctly identified across multiple sentences. This approach allows us to consider various information channels. Our dataset creation process involved recruiting annotators, including a professional journalist and a student, who worked together to label the articles. Their collaboration led to a high rate of agreement in identifying sources.
Source Attribution Models
We divided the source attribution task into two steps: Detection and Retrieval. Detection involves figuring out if a sentence can be linked to a source, while retrieval focuses on identifying which source it is. Using different models for each step proved to be more effective than combining both tasks into one.
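The two-step split can be pictured as a small pipeline in which the detector acts as a gate and the retriever runs only on sentences that pass it. The sketch below is illustrative only: `attribute_document`, `toy_detect`, and `toy_retrieve` are hypothetical names, and the keyword rules stand in for the trained models, which are not reproduced here.

```python
from typing import Callable, List, Optional, Tuple

# Hypothetical interfaces: a detector answers "is this sentence sourced?",
# a retriever answers "which source?" given the sentence and the full document.
Detector = Callable[[str], bool]
Retriever = Callable[[str, List[str]], Optional[str]]


def attribute_document(
    sentences: List[str], detect: Detector, retrieve: Retriever
) -> List[Tuple[str, Optional[str]]]:
    """Run detection first, then retrieval only on sentences flagged as sourced."""
    results = []
    for sentence in sentences:
        source = retrieve(sentence, sentences) if detect(sentence) else None
        results.append((sentence, source))
    return results


# Toy stand-ins (assumptions for illustration, not the models from the study).
def toy_detect(sentence: str) -> bool:
    return " said" in sentence or "according to" in sentence.lower()


def toy_retrieve(sentence: str, document: List[str]) -> Optional[str]:
    lowered = sentence.lower()
    marker = "according to "
    if marker in lowered:
        start = lowered.index(marker) + len(marker)
        return sentence[start:].rstrip(".")
    if " said" in sentence:
        return sentence.split(" said")[0]
    return None


article = [
    "The mayor said the bridge will reopen.",
    "Traffic was heavy on Monday.",
    "The bridge is safe, according to the city engineer.",
]
for sentence, source in attribute_document(article, toy_detect, toy_retrieve):
    print(source, "|", sentence)
```

Keeping the two steps behind separate callables mirrors the finding that separate models per step worked better than one combined model: either component can be swapped without touching the other.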
The baseline methods we tested showed varied results. Some methods relied on finding patterns of co-occurrence between sources and speaking verbs, while others used more complex rules and syntactic analysis. We also explored approaches that utilize existing datasets to establish connections between sources and quotes.
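As a rough illustration of the simplest kind of baseline described here, the sketch below pairs a capitalized name with an adjacent speaking verb. The verb list, the name pattern, and the function name `cooccurrence_source` are assumptions made for this example and are not the rules used in the study.

```python
import re
from typing import Optional

# Assumed, deliberately small list of speaking verbs.
SPEAKING_VERBS = ("said", "says", "told", "stated", "announced", "added")
VERBS = "|".join(SPEAKING_VERBS)

# Assumed name pattern: one or more capitalized words, e.g. "Maria Lopez".
NAME = r"[A-Z][a-z]+(?: [A-Z][a-z]+)*"

# Match "<Name> <verb>" or "<verb> <Name>".
PATTERN = re.compile(
    rf"(?P<before>{NAME}) (?:{VERBS})\b|\b(?:{VERBS}) (?P<after>{NAME})"
)


def cooccurrence_source(sentence: str) -> Optional[str]:
    """Return a name that sits next to a speaking verb, or None if there is none."""
    match = PATTERN.search(sentence)
    if match is None:
        return None
    return match.group("before") or match.group("after")


for text in [
    "Maria Lopez said the vote was delayed.",
    "The vote was delayed, said Maria Lopez.",
    "The figures come from a court filing.",
]:
    print(cooccurrence_source(text), "|", text)
```

The last sentence is sourced (to a court filing) but contains no speaking verb, so a rule of this kind misses it. That is the sort of case where keyword-driven baselines lose coverage.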
For detection, we used a binary sentence classifier along with a document-wide embedding approach. For retrieval, we implemented methods that involve predicting tokens associated with sources, detecting spans within sentences, and generating open-ended responses to identify sources.
After evaluating the models, we found that the best-performing approach utilized a combination of advanced language models and our source detection methods, achieving a high accuracy rate.
Insights from Source Analysis
With a functioning attribution pipeline, we focused on learning how sources are used in news articles. We analyzed thousands of unlabeled documents to assess the extent to which articles attribute their information to sources and when these sources are typically used.
Our findings indicate that articles usually attribute around half of their sentences to sources, and this is consistent regardless of document length. However, the use of sources isn’t uniform: certain sources dominate, while others contribute less.
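Both measurements in this paragraph can be computed directly from per-sentence attributions. The following is a minimal sketch under the assumption that each sentence has already been labeled with a source name or `None`; the function name `sourcing_stats` and the example labels are invented for illustration.

```python
from collections import Counter
from typing import Dict, List, Optional


def sourcing_stats(attributions: List[Optional[str]]) -> Dict[str, object]:
    """Summarize how much of an article is sourced and how dominant the top source is.

    `attributions` holds one entry per sentence: a source name, or None
    when the sentence is not attributed to any source.
    """
    sourced = [s for s in attributions if s is not None]
    share_sourced = len(sourced) / len(attributions) if attributions else 0.0
    counts = Counter(sourced)
    top_source, top_count = counts.most_common(1)[0] if counts else (None, 0)
    top_share = top_count / len(sourced) if sourced else 0.0
    return {
        "share_sourced": share_sourced,  # fraction of all sentences with a source
        "top_source": top_source,        # most frequently used source
        "top_share": top_share,          # its fraction of the sourced sentences
    }


# Invented example: 8 sentences, 4 of them attributed.
labels = ["mayor", None, "mayor", "engineer", None, None, "mayor", None]
print(sourcing_stats(labels))
```

A `top_share` well above an even split across sources would correspond to the pattern described above, where a few sources carry most of the attributed sentences.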
We also looked at how sources are added to articles over time. Early versions often contain fewer sources, and as articles are updated, additional sources are consistently added. This pattern suggests that understanding which sources are added can inform future recommendations for journalists.
Source Compositionality
An interesting question to explore is how certain sources are chosen to appear together in an article. We designed two approaches to tackle this question: ablation and NewsEdits.
In the ablation task, we systematically removed sources from articles and assessed how this affected the remaining content. The goal was to understand if the composition of sources was balanced or if certain sources were essential for the article's information.
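One way to picture the ablation setup is as a data-construction step: take an article with per-sentence attributions, delete every sentence tied to one source, and keep the untouched article as a contrast case. The sketch below assumes that framing. The helper names, the "most-used source" definition of a major source, and the 1/0 labels are illustrative assumptions rather than the exact procedure used in the study.

```python
from collections import Counter
from typing import List, Optional, Tuple


def major_source(attributions: List[Optional[str]]) -> Optional[str]:
    """Assumed definition: the source attributed to the most sentences."""
    counts = Counter(s for s in attributions if s is not None)
    return counts.most_common(1)[0][0] if counts else None


def ablate_source(
    sentences: List[str], attributions: List[Optional[str]], source: str
) -> List[str]:
    """Drop every sentence attributed to `source`, keeping the rest in order."""
    return [s for s, a in zip(sentences, attributions) if a != source]


def make_ablation_examples(
    sentences: List[str], attributions: List[Optional[str]], source: str
) -> List[Tuple[List[str], int]]:
    """Return (ablated article, 1) and (original article, 0) as a labeled pair."""
    return [
        (ablate_source(sentences, attributions, source), 1),
        (list(sentences), 0),
    ]


sentences = ["Sentence A.", "Sentence B.", "Sentence C.", "Sentence D."]
attributions = ["mayor", None, "engineer", "mayor"]
target = major_source(attributions)
for text, label in make_ablation_examples(sentences, attributions, target):
    print(label, text)
```

A classifier trained on pairs like these would be asked whether an article is missing a source, which connects back to the question of whether an article needs another one.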
The NewsEdits task focused on articles that had undergone changes. By examining version pairs of articles, we could see how many new sources were added over time and the relationships among them.
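For a single version pair, counting newly added sources reduces to comparing the sources found in the earlier and later versions. The sketch below assumes both versions have already been run through attribution and that source names have been normalized so the same source has the same label in each version; `added_sources` is a hypothetical helper.

```python
from typing import List, Optional


def added_sources(
    old_attributions: List[Optional[str]], new_attributions: List[Optional[str]]
) -> List[str]:
    """List sources present in the later version but absent from the earlier one.

    Each input holds one entry per sentence: a source name or None.
    The result preserves first-appearance order and contains no duplicates.
    """
    old = {s for s in old_attributions if s is not None}
    seen = set()
    added = []
    for source in new_attributions:
        if source is not None and source not in old and source not in seen:
            seen.add(source)
            added.append(source)
    return added


# Invented example: the update adds two sources and keeps the original one.
version_1 = ["mayor", None, "mayor"]
version_2 = ["mayor", "engineer", None, "resident", "engineer", "mayor"]
print(added_sources(version_1, version_2))
```

In practice the hard part is the normalization assumed here: deciding that two differently worded mentions in two versions refer to the same source.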
Our results showed that we could accurately predict when major sources were removed from articles, indicating that source usage follows a certain pattern. Major sources played a crucial role, while minor sources were less predictable.
Conclusion
In summary, our work provides a comprehensive overview of the sourcing habits in journalism. We developed an extensive dataset that captures a variety of source types and created models that can identify and attribute information effectively.
We believe our findings can help journalists improve their reporting by offering better tools to evaluate when and why sources are used in news articles. Moving forward, we hope to build a recommendation system that assists reporters in sourcing information.
Through this research, we aim to lay a foundation for further studies on the dynamics of source usage in news writing, paving the way for improvement in the quality and reliability of the news we consume.
Title: Identifying Informational Sources in News Articles
Abstract: News articles are driven by the informational sources journalists use in reporting. Modeling when, how and why sources get used together in stories can help us better understand the information we consume and even help journalists with the task of producing it. In this work, we take steps toward this goal by constructing the largest and widest-ranging annotated dataset, to date, of informational sources used in news writing. We show that our dataset can be used to train high-performing models for information detection and source attribution. We further introduce a novel task, source prediction, to study the compositionality of sources in news articles. We show good performance on this task, which we argue is an important proof for narrative science exploring the internal structure of news articles and aiding in planning-based language generation, and an important step towards a source-recommendation system to aid journalists.
Authors: Alexander Spangher, Nanyun Peng, Jonathan May, Emilio Ferrara
Last Update: 2023-05-24 00:00:00
Language: English
Source URL: https://arxiv.org/abs/2305.14904
Source PDF: https://arxiv.org/pdf/2305.14904
Licence: https://creativecommons.org/licenses/by/4.0/
Changes: This summary was created with assistance from AI and may have inaccuracies. For accurate information, please refer to the original source documents linked here.
Thank you to arXiv for use of its open access interoperability.