Analyzing Company Reports: Words Matter More
This study connects word analysis in reports to ESG performance.
Table of Contents
- Shift in Analysis from Numbers to Words
- Challenges in Analyzing Textual Information
- The Growing Importance of Corporate Social Responsibility
- Using Multi-Task Learning to Connect Text and ESG
- The Landscape of Annual Reports
- The Role of Multi-Task Learning
- Dataset and Annotation
- Classification Methods Used
- Results from Multi-Task Learning Experiments
- Insights into ESG Ratings and Textual Features
- Conclusion
- Original Source
- Reference Links
When looking at how companies perform, experts have started to pay more attention to the words they use in their reports, not just the numbers. These words can give hints that help understand how a company is doing and how it might do in the future. This shift in focus allows investors and analysts to gather more information beyond just financial data.
In this context, this work uses multi-task learning methods to analyze the text of annual reports. We look at several aspects of the content: the sentiment expressed in the text, its objectivity, whether it looks ahead to future events, and how it relates to Environmental, Social, and Governance (ESG) criteria.
The best approach we found feeds the predictions of auxiliary tasks into the main task as explicit features. Using the resulting classifiers, we analyzed the annual reports of companies in the FTSE350 index. We also looked for connections between the qualitative features derived from the text and the numerical ESG scores provided by financial analysts.
Shift in Analysis from Numbers to Words
There is growing recognition that the words used in reports can be as important as the financial figures, if not more so. Traditionally, analysts worked with this information manually, but the sheer volume of textual data in recent years has made that approach impractical.
The textual data about companies comes from three main sources: required disclosures to the public, news articles, and social media. However, corporate reports are given special attention because they are released periodically and are regulated to ensure that companies provide detailed information about their financial situation.
These reports contain not only numbers but also rich textual data that can provide insights about the company and its future. For instance, the choice of words and the tone can reveal much about a company’s situation that numbers alone cannot show.
Challenges in Analyzing Textual Information
Extracting and processing qualitative information from financial reports is more challenging than working with numerical data. Researchers now look not just at the content of reports but also at their stylistic features. For instance, the sentiment or objectivity of a report can provide clues about how investors might react to a company's stock.
In this study, we focus on three main stylistic indicators: sentiment (positive or negative feelings), objectivity (fact versus opinion), and forward-looking statements (predictions about the future). Additionally, we explore how well these features relate to ESG themes. ESG relates to how companies behave toward society and the environment while still trying to make profits.
Investors and regulators are increasingly interested in ESG as it reflects a company's social responsibility. As a result, businesses have become more aware of their impact on the environment and have begun reporting on these aspects regularly.
The Growing Importance of Corporate Social Responsibility
Corporate Social Responsibility (CSR) has seen a rise in attention over recent years. This concept includes activities that companies undertake to address social and environmental concerns, beyond merely seeking profit. Examples include reducing pollution and making charitable donations.
Regulatory bodies, such as those in the EU, have started requiring companies to disclose information related to their CSR practices. The criteria for ESG evaluation cover various aspects, including environmental impact, business relations with stakeholders, and governance matters like leadership accountability and transparency.
However, numerical indicators that effectively measure a company's ESG performance are still lacking. For this reason, much of the analysis is still carried out manually by experts in the field.
Using Multi-Task Learning to Connect Text and ESG
In this study, we aim to connect stylistic indicators from reports with ESG-related themes using multi-task learning. We enhance pre-trained language models by training them to classify text based on sentiment, objectivity, forward-looking nature, and ESG content.
We highlight the challenges in grasping the sentiment, objectivity, and forward-looking aspects concerning financial reports. By analyzing and classifying content in annual reports based on these factors, we can better understand how they connect to ESG themes.
Our approach showed that one effective method is to explicitly use predictions from auxiliary tasks as features for the main task. This method proves helpful even for tasks that are subject to high subjectivity.
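The idea of passing auxiliary predictions to the main task as features can be sketched in a few lines. This is a minimal illustration under stated assumptions, not the study's implementation: the sentence vector stands in for the output of a shared encoder, and the function names, layer sizes, and weights are all hypothetical toy values.

```python
import math

def softmax(logits):
    """Convert raw scores into probabilities that sum to 1."""
    peak = max(logits)
    exps = [math.exp(x - peak) for x in logits]
    total = sum(exps)
    return [e / total for e in exps]

def linear(weights, bias, features):
    """Dense layer: one output score per row of weights."""
    return [sum(w * f for w, f in zip(row, features)) + b
            for row, b in zip(weights, bias)]

def predict_with_aux_features(sentence_vec, aux_heads, target_head):
    """Classify the target task from the shared sentence vector
    concatenated with the probabilities of every auxiliary head."""
    target_input = list(sentence_vec)
    aux_probs = {}
    for name, (weights, bias) in aux_heads.items():
        probs = softmax(linear(weights, bias, sentence_vec))
        aux_probs[name] = probs
        target_input.extend(probs)  # auxiliary predictions become features
    t_weights, t_bias = target_head
    return softmax(linear(t_weights, t_bias, target_input)), aux_probs

# Hypothetical toy parameters: a 3-dimensional sentence vector,
# two binary auxiliary tasks, and a binary ESG target task.
sentence_vec = [0.2, -0.1, 0.7]
aux_heads = {
    "sentiment": ([[0.5, 0.0, 0.1], [-0.5, 0.0, -0.1]], [0.0, 0.0]),
    "forward_looking": ([[0.0, 0.3, 0.2], [0.0, -0.3, -0.2]], [0.0, 0.0]),
}
# Target input size: 3 (sentence) + 2 + 2 (auxiliary probabilities) = 7.
target_head = ([[0.1] * 7, [-0.1] * 7], [0.0, 0.0])

esg_probs, aux_probs = predict_with_aux_features(sentence_vec, aux_heads, target_head)
```

In a real system the weights would be learned jointly with the encoder; the point here is only that the target head sees the auxiliary outputs directly rather than relying on the shared representation alone.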
The methodology we developed can be applied to various topics beyond ESG, expanding its relevance to other areas where qualitative insights can be drawn from textual data.
The Landscape of Annual Reports
The analysis of annual reports is well researched in finance but less so in natural language processing (NLP). Much of the existing work focuses on 10-K filings, the standardized reports required in the U.S. Outside the U.S., reports can vary significantly in their structure and in how they communicate information.
In the UK, for example, there has been a notable increase in the size and complexity of annual report narratives. This growth reflects a larger challenge for automated analysis. While the sheer amount of data has grown, the lack of standardization makes it more complex to analyze and requires more advanced methods.
Additionally, concepts like ESG are relatively new and have not yet found a place in standardized reporting practices, leading to inconsistencies across different companies.
The Role of Multi-Task Learning
Multi-task learning (MTL) is an approach where multiple related tasks are solved simultaneously, allowing for shared learning that can boost performance. By effectively using MTL, we can improve results in tasks where data might be limited.
In this study, we examined how to utilize various stylistic features to extract information from annual reports. This involved employing pre-trained language models within a supervised MTL setting.
The idea is to fine-tune a language model on the tasks relevant to our analysis while leveraging the relationships among these tasks. The effectiveness of the approach depends on the similarities between the tasks. Tasks that are closely related can help improve performance due to shared learning.
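One common way to train several tasks on a shared encoder is to minimize a weighted sum of the per-task losses. The sketch below assumes cross-entropy for each task and equal task weights; the source does not state the loss or weighting used, so these choices, along with the task names and probabilities, are illustrative only.

```python
import math

def cross_entropy(probs, gold_index):
    """Negative log-likelihood of the gold class (clipped for stability)."""
    return -math.log(max(probs[gold_index], 1e-12))

def joint_loss(task_probs, task_gold, task_weights):
    """Weighted sum of per-task cross-entropy losses for one sentence."""
    return sum(task_weights[t] * cross_entropy(task_probs[t], task_gold[t])
               for t in task_gold)

# Hypothetical predicted probabilities and gold labels for one sentence.
task_probs = {"sentiment": [0.7, 0.2, 0.1], "esg": [0.4, 0.6]}
task_gold = {"sentiment": 0, "esg": 1}
task_weights = {"sentiment": 1.0, "esg": 1.0}  # assumed equal weighting

loss = joint_loss(task_probs, task_gold, task_weights)
```

Because the gradient of this single loss flows back through the shared encoder, closely related tasks push the sentence representation in compatible directions, which is where the benefit of shared learning comes from.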
Dataset and Annotation
Our analysis focuses on a collection of annual reports from FTSE350 companies covering the years 2012 to 2019. This dataset includes 1,532 annual reports converted from PDF format into raw text.
For our study, we generated an annotated dataset, where sentences from the reports were labeled for five specific tasks. These tasks included relevance, financial sentiment, objectivity, forward-looking statements, and ESG content.
To ensure reliability, we calculated agreement levels among the annotators. This evaluation showed that while sentiment and ESG tasks had a higher level of agreement, tasks related to objectivity and relevance had much lower agreement levels.
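Agreement between two annotators is often summarized with Cohen's kappa, which corrects raw agreement for the agreement expected by chance. The source does not name the statistic it used, so kappa is an assumption here, and the labels below are made up for illustration.

```python
from collections import Counter

def cohen_kappa(labels_a, labels_b):
    """Chance-corrected agreement between two annotators."""
    if len(labels_a) != len(labels_b) or not labels_a:
        raise ValueError("need two non-empty label lists of equal length")
    n = len(labels_a)
    observed = sum(a == b for a, b in zip(labels_a, labels_b)) / n
    count_a = Counter(labels_a)
    count_b = Counter(labels_b)
    # Probability that both annotators pick the same label by chance.
    expected = sum(count_a[c] * count_b[c] for c in count_a) / (n * n)
    if expected == 1.0:
        return 1.0  # both annotators used a single identical label
    return (observed - expected) / (1 - expected)

# Hypothetical annotations of six sentences for the ESG task.
annotator_1 = ["esg", "esg", "other", "other", "esg", "other"]
annotator_2 = ["esg", "other", "other", "other", "esg", "other"]
kappa = cohen_kappa(annotator_1, annotator_2)
```

Here the annotators agree on five of six sentences, but half of that agreement would be expected by chance, so kappa is noticeably lower than the raw agreement rate.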
Classification Methods Used
In our classification work, we employed an encoder-decoder system. The encoder captures the essence of each sentence, while separate decoders handle the classification tasks based on the shared sentence representation.
The encoder is RoBERTa, a pre-trained language model known for its effectiveness in NLP tasks. Each classification task has its own decoder on top of it.
We explored various MTL architectures and methods for the classification tasks, including both joint and sequential training approaches. The goal was to optimize performance by leveraging shared learning across tasks.
Results from Multi-Task Learning Experiments
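In our experimental evaluations, we used the macro-F1 score as the key metric for performance. Macro-F1 averages the F1 score of each class with equal weight, so a rare class counts as much as a frequent one, which makes the metric well suited to classification tasks with class imbalance.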
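To make the metric concrete, here is a small stdlib-only sketch of macro-F1. The labels are invented for illustration; the example shows how a classifier that always predicts the majority class is penalized.

```python
def macro_f1(gold, predicted):
    """Unweighted mean of per-class F1 scores."""
    labels = sorted(set(gold) | set(predicted))
    scores = []
    for label in labels:
        tp = sum(g == label and p == label for g, p in zip(gold, predicted))
        fp = sum(g != label and p == label for g, p in zip(gold, predicted))
        fn = sum(g == label and p != label for g, p in zip(gold, predicted))
        denom = 2 * tp + fp + fn
        # F1 = 2TP / (2TP + FP + FN); defined as 0 when the class never appears.
        scores.append(2 * tp / denom if denom else 0.0)
    return sum(scores) / len(scores)

# Hypothetical imbalanced data: one ESG sentence among six.
gold = ["esg", "other", "other", "other", "other", "other"]
always_other = ["other"] * 6
score = macro_f1(gold, always_other)
```

The majority-class predictor is right on five of six sentences, yet its macro-F1 is below 0.5 because the F1 for the "esg" class is zero.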
We split our dataset into training, development, and test sets, ensuring robust evaluations. Each method was run multiple times to ensure consistent results.
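A reproducible split and a summary over repeated runs might look like the following sketch. The split fractions, the seed, and the scores are placeholders chosen for illustration; they are not the proportions or results reported in the study.

```python
import random
import statistics

def split_dataset(items, dev_fraction, test_fraction, seed):
    """Shuffle with a fixed seed, then cut into train, dev, and test sets."""
    shuffled = list(items)
    random.Random(seed).shuffle(shuffled)
    n_test = int(len(shuffled) * test_fraction)
    n_dev = int(len(shuffled) * dev_fraction)
    test = shuffled[:n_test]
    dev = shuffled[n_test:n_test + n_dev]
    train = shuffled[n_test + n_dev:]
    return train, dev, test

def summarize_runs(scores):
    """Mean and sample standard deviation across repeated runs."""
    return statistics.mean(scores), statistics.stdev(scores)

sentences = [f"sentence {i}" for i in range(100)]
train, dev, test = split_dataset(sentences, dev_fraction=0.1, test_fraction=0.1, seed=0)

# Placeholder macro-F1 scores from three imagined runs, not study results.
mean_f1, std_f1 = summarize_runs([0.61, 0.63, 0.62])
```

Reporting the spread alongside the mean helps distinguish a real difference between methods from run-to-run noise.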
Across all tested approaches, the ExGF-MTL method stood out as the most effective. This is the method that explicitly passes auxiliary task predictions to the target task as features. It enabled the model to learn from each task while improving the performance of the ESG classification task in particular.
By investigating various combinations of tasks, we observed that excluding less reliable tasks like objectivity and relevance tended to improve model performance.
Insights into ESG Ratings and Textual Features
With ExGF-MTL identified as the leading method, we used it to extract features from the annual reports and analyzed their relationship with ESG ratings provided by financial agencies.
To prepare the data, we carefully filtered sentences based on specific criteria to ensure quality. The features we extracted included the proportion of ESG-related sentences and their sentiment.
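Aggregating sentence-level predictions into report-level features could be done along these lines. The field names, the sentiment labels, and the exact features are assumptions made for this sketch; the source only states that the features included the proportion of ESG-related sentences and their sentiment.

```python
def report_features(sentence_predictions):
    """Aggregate per-sentence predictions into report-level features."""
    total = len(sentence_predictions)
    esg = [p for p in sentence_predictions if p["esg"]]
    n_esg = len(esg)

    def share(label):
        # Share of ESG sentences carrying the given sentiment label.
        return sum(p["sentiment"] == label for p in esg) / n_esg if n_esg else 0.0

    return {
        "esg_share": n_esg / total if total else 0.0,
        "esg_positive_share": share("positive"),
        "esg_negative_share": share("negative"),
    }

# Hypothetical classifier output for a four-sentence report.
predictions = [
    {"esg": True, "sentiment": "positive"},
    {"esg": True, "sentiment": "neutral"},
    {"esg": False, "sentiment": "negative"},
    {"esg": False, "sentiment": "neutral"},
]
features = report_features(predictions)
```

Using shares rather than raw counts keeps the features comparable across reports of very different lengths.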
In our correlation analysis, we used Spearman correlation to examine relationships between textual features and ESG scores. Notably, the correlations varied by industry, with some features relating more closely to ESG scores in certain sectors than in others.
For instance, the extent of ESG-related content correlated positively with better ESG ratings. This result underscores the importance of discussing ESG issues in annual reports.
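Spearman correlation is the Pearson correlation of the ranks, so it measures whether two quantities rise and fall together without assuming a linear relationship. The sketch below is a stdlib-only version with average ranks for ties; the feature values and ratings are invented for illustration.

```python
def average_ranks(values):
    """1-based ranks, with tied values sharing their mean rank."""
    order = sorted(range(len(values)), key=lambda i: values[i])
    ranks = [0.0] * len(values)
    i = 0
    while i < len(order):
        j = i
        while j + 1 < len(order) and values[order[j + 1]] == values[order[i]]:
            j += 1
        shared = (i + j) / 2 + 1  # mean of positions i+1 .. j+1
        for k in range(i, j + 1):
            ranks[order[k]] = shared
        i = j + 1
    return ranks

def pearson(x, y):
    """Pearson correlation of two equal-length sequences."""
    n = len(x)
    mean_x, mean_y = sum(x) / n, sum(y) / n
    cov = sum((a - mean_x) * (b - mean_y) for a, b in zip(x, y))
    var_x = sum((a - mean_x) ** 2 for a in x)
    var_y = sum((b - mean_y) ** 2 for b in y)
    if var_x == 0 or var_y == 0:
        raise ValueError("correlation is undefined for constant input")
    return cov / (var_x * var_y) ** 0.5

def spearman(x, y):
    """Spearman rank correlation: Pearson correlation of the ranks."""
    return pearson(average_ranks(x), average_ranks(y))

# Hypothetical data: share of ESG sentences and ESG rating for four companies.
esg_share = [0.05, 0.10, 0.20, 0.40]
esg_rating = [30, 45, 44, 80]
rho = spearman(esg_share, esg_rating)
```

To study the industry effect mentioned above, the same function can be applied separately to the companies of each sector.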
Conclusion
In summary, this work emphasizes the significance of qualitative analysis in financial reports. By applying multi-task learning techniques, we can better extract meaningful information from the text, connecting it with important ESG metrics.
The findings demonstrate how sentiment, objectivity, and forward-looking statements can offer valuable insights into a company's CSR efforts. Moreover, the methods developed can extend to other fields requiring analysis of textual data in conjunction with numerical metrics.
Future research may explore causal relationships between features in reports and financial performance, enhancing the knowledge base surrounding corporate behaviors and their implications.
Title: Multi-Task Learning for Features Extraction in Financial Annual Reports
Abstract: For assessing various performance indicators of companies, the focus is shifting from strictly financial (quantitative) publicly disclosed information to qualitative (textual) information. This textual data can provide valuable weak signals, for example through stylistic features, which can complement the quantitative data on financial performance or on Environmental, Social and Governance (ESG) criteria. In this work, we use various multi-task learning methods for financial text classification with the focus on financial sentiment, objectivity, forward-looking sentence prediction and ESG-content detection. We propose different methods to combine the information extracted from training jointly on different tasks; our best-performing method highlights the positive effect of explicitly adding auxiliary task predictions as features for the final target task during the multi-task training. Next, we use these classifiers to extract textual features from annual reports of FTSE350 companies and investigate the link between ESG quantitative scores and these features.
Authors: Syrielle Montariol, Matej Martinc, Andraž Pelicon, Senja Pollak, Boshko Koloski, Igor Lončarski, Aljoša Valentinčič
Last Update: 2024-04-08 00:00:00
Language: English
Source URL: https://arxiv.org/abs/2404.05281
Source PDF: https://arxiv.org/pdf/2404.05281
Licence: https://creativecommons.org/licenses/by/4.0/
Changes: This summary was created with assistance from AI and may have inaccuracies. For accurate information, please refer to the original source documents linked here.
Thank you to arXiv for use of its open access interoperability.
Reference Links
- https://www.springer.com/gp/computer-science/lncs
- https://link.springer.com/chapter/10.1007/978-3-031-23633-4_1
- https://ec.europa.eu/info/business-economy-euro/company-reporting-and-auditing/company-reporting/corporate-sustainability-reporting_en
- https://gitlab.com/smontariol/multi-task-esg
- https://sites.google.com/nlg.csie.ntu.edu.tw/finnlp-2022/shared-task-finsim4-esg
- https://docs.scipy.org/doc/scipy/reference/generated/scipy.stats.ttest_ind.html
- https://www.refinitiv.com/