What question did this study set out to answer?

The aim is to create a streamlined repository of textual financial data for easier analysis in business research.

February 25, 2026

Textual Financial Data Repository for Machine Learning, Artificial Intelligence, and Textual Analyses: Major Sections from 10-K, 10-Q, and Financial Statement Notes Extracted Using Shared Python Code

Key Points

Extracted major sections from 10-K and 10-Q filings and from financial statement notes using shared Python code.
Provided pre-calculated textual metrics, such as word counts and readability measures.
Developed two new word lists, one for COVID-19 and one for human capital, based on disclosure shocks.
Made extracted textual data from financial filings available for all firms from 2008 onward.
Improved consistency and efficiency in research that relies on financial text.
Provided tools for measuring properties of the text, including sentiment.
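
To make the kind of metric mentioned in the key points concrete, here is a minimal Python sketch of a word count, a readability score, and a word-list hit count for one extracted section. It is not the authors' shared code. The function names, the tiny example word list, the naive syllable rule, and the choice of the Gunning Fog index as the readability measure are all assumptions made for illustration.

```python
import re

# Hypothetical mini word list for illustration only; this is NOT the
# repository's actual COVID-19 word list.
EXAMPLE_WORD_LIST = {"pandemic", "lockdown", "quarantine"}


def tokenize(text):
    """Lowercase the text and return its alphabetic word tokens."""
    return re.findall(r"[a-z]+(?:'[a-z]+)?", text.lower())


def count_sentences(text):
    """Rough sentence count: any run of '.', '!' or '?' ends a sentence."""
    parts = [p for p in re.split(r"[.!?]+", text) if p.strip()]
    return max(len(parts), 1)


def count_syllables(word):
    """Naive syllable estimate: number of vowel groups, at least one."""
    return max(len(re.findall(r"[aeiouy]+", word)), 1)


def text_metrics(text, word_list=EXAMPLE_WORD_LIST):
    """Return simple textual metrics for one block of filing text."""
    words = tokenize(text)
    n_words = len(words)
    n_sentences = count_sentences(text)
    # Assumption: a "complex" word has three or more estimated syllables.
    complex_words = sum(1 for w in words if count_syllables(w) >= 3)
    if n_words:
        # Gunning Fog index: 0.4 * (avg sentence length + % complex words)
        fog = 0.4 * (n_words / n_sentences + 100 * complex_words / n_words)
    else:
        fog = 0.0
    hits = sum(1 for w in words if w in word_list)
    return {
        "word_count": n_words,
        "sentence_count": n_sentences,
        "fog_index": fog,
        "word_list_hits": hits,
    }
```

Calling text_metrics("The pandemic hurt sales. We expect a recovery.") returns a dictionary with the four metrics. The sentence splitter treats every period as a sentence end, so decimals and abbreviations in real filings would inflate the sentence count; a production pipeline would need a more careful tokenizer.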

Abstract

Financial reports, including 10-K and 10-Q filings, are a primary source of textual data in business disciplines. However, extracting specific sections from these lengthy documents remains a challenge. Custom code development by each research team to parse these files leads to redundancy, inefficiency, and inconsistencies and is especially challenging for teams lacking technical expertise. We address this by offering raw textual data from MD...

JEL Classifications: C88; M4; M48.
