The rapid changes in the business environment have made it increasingly difficult for experts to analyze and classify risk-related statements in security reports accurately, since these reports often contain large volumes of unstructured information. Several methods have been developed over time, but they still struggle to process diverse risk expressions. This study uses a large dataset of economic and financial statements to explore the relationship between financial risks and the sentiment associated with them. A multimodal approach is proposed that integrates supervised and unsupervised techniques, including Bag of Words, Term Frequency-Inverse Document Frequency (TF-IDF), word embeddings, and topic modeling methods such as Latent Dirichlet Allocation (LDA), to build a model that efficiently and accurately predicts a company's risk structure from its security reports. Sentiment analysis is performed on the texts, with negative sentiment treated as an indicator of risk. Various feature sets are then combined, and the resulting model is tested with four classifiers, the best of which achieves an accuracy of 80.9%. The findings suggest that the model can be developed into an effective tool for risk analysis and identification in financial data and other relevant sectors.
Chatterjee et al. studied this question.
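The idea of combining feature sets can be illustrated with a minimal sketch that concatenates TF-IDF weights with a negative-sentiment score for each text. This is not the study's implementation: the lexicon `NEGATIVE_WORDS`, the helper names (`tokenize`, `tfidf_vectors`, `negative_share`, `combined_features`), the unsmoothed `log(N / df)` weighting, and the sample sentences are all assumptions made for illustration. Word embeddings, LDA topics, and the classifiers are omitted.

```python
import math
from collections import Counter

# Hypothetical toy lexicon of risk-bearing words (an assumption, not from the study).
NEGATIVE_WORDS = {"loss", "decline", "default", "litigation", "adverse"}

PUNCT = ".,;:()"


def tokenize(text):
    """Lowercase, split on whitespace, and strip surrounding punctuation."""
    tokens = (raw.strip(PUNCT).lower() for raw in text.split())
    return [t for t in tokens if t]


def tfidf_vectors(docs):
    """Return (vocab, vectors) using tf = count / length and idf = log(N / df)."""
    tokenized = [tokenize(d) for d in docs]
    vocab = sorted({t for toks in tokenized for t in toks})
    n_docs = len(docs)
    # Document frequency: in how many documents each term appears.
    df = Counter(t for toks in tokenized for t in set(toks))
    vectors = []
    for toks in tokenized:
        counts = Counter(toks)
        total = len(toks) or 1  # guard against empty documents
        vectors.append(
            [(counts[t] / total) * math.log(n_docs / df[t]) for t in vocab]
        )
    return vocab, vectors


def negative_share(text):
    """Fraction of tokens found in the negative lexicon (a crude sentiment proxy)."""
    toks = tokenize(text)
    if not toks:
        return 0.0
    return sum(t in NEGATIVE_WORDS for t in toks) / len(toks)


def combined_features(docs):
    """Append the negative-sentiment share to each document's TF-IDF vector."""
    vocab, vectors = tfidf_vectors(docs)
    features = [vec + [negative_share(d)] for vec, d in zip(vectors, docs)]
    return vocab, features


docs = [
    "Adverse litigation may cause a loss.",
    "Revenue grew in the quarter.",
]
vocab, features = combined_features(docs)
```

Each row of `features` holds one TF-IDF weight per vocabulary term plus a final sentiment column, so it could be passed to any standard classifier. In practice a library vectorizer and a trained sentiment model would likely replace these hand-written helpers.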