This study proposes a hybrid machine learning framework that integrates structured financial indicators and unstructured textual strategy disclosures to improve firm-level management performance prediction. Using corporate business reports from South Korean listed firms, strategic text was extracted and categorized under the Balanced Scorecard (BSC) framework into financial, customer, internal process, and learning and growth dimensions. Various machine learning and deep learning models—including k-nearest neighbors (KNNs), support vector machine (SVM), light gradient boosting machine (LightGBM), convolutional neural network (CNN), long short-term memory (LSTM), autoencoder, and transformer—were evaluated, with results showing that the inclusion of strategic textual data significantly enhanced prediction accuracy, precision, recall, area under the curve (AUC), and F1-score. Among individual models, the transformer architecture demonstrated superior performance in extracting context-rich semantic features. A soft-voting ensemble model combining autoencoder, LSTM, and transformer achieved the best overall performance, leading in accuracy and AUC, while the best single deep learning model (transformer) obtained a marginally higher F1 score, confirming the value of hybrid learning. Furthermore, analysis revealed that customer-oriented strategy disclosures were the most predictive among BSC dimensions. These findings highlight the value of integrating financial and narrative data using advanced NLP and artificial intelligence (AI) techniques to develop interpretable and robust corporate performance forecasting models. In addition, we operationalize information security narratives using a reproducible cybersecurity lexicon and derive security disclosure intensity and weight share features that are jointly evaluated with BSC-based strategic vectors.
Yu et al. (Tue,) studied this question.