This paper reports the methods, results, and analysis of STMK24 for the NTCIR-U4 Table QA (TQA) task. STMK24 approaches TQA as a Visual Document Understanding task and transforms each table into three modalities: image, text, and content layout. To simplify the comprehension of table structure, our model is trained to infer the IDs of the relevant table cells, and the cell values are then extracted automatically through rule-based conversion. We investigated the impact of each modality on TQA performance and confirmed that the model achieves high cell ID inference accuracy when all modalities are used.
Aida et al. studied this question.
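The step from inferred cell IDs to answer values can be illustrated with a minimal sketch. It assumes, hypothetically, that each table cell carries an identifier of the form `r<row>c<col>`, that the table is available as a mapping from those IDs to cell text, and that the model emits one or more IDs in free-form output. The ID format and the function names `extract_cell_ids` and `ids_to_values` are illustrative only and are not taken from the STMK24 system.

```python
import re
from typing import Dict, List

# Hypothetical cell-ID format ("r<row>c<col>"); the identifiers in the
# actual task data may look different.
CELL_ID_PATTERN = re.compile(r"r\d+c\d+")


def extract_cell_ids(model_output: str) -> List[str]:
    """Return every well-formed cell ID in the generated text,
    keeping first-occurrence order and dropping duplicates."""
    return list(dict.fromkeys(CELL_ID_PATTERN.findall(model_output)))


def ids_to_values(model_output: str, cells: Dict[str, str]) -> List[str]:
    """Rule-based conversion: look up each predicted ID in the table.
    IDs that do not exist in the table are skipped."""
    return [
        cells[cell_id].strip()
        for cell_id in extract_cell_ids(model_output)
        if cell_id in cells
    ]


if __name__ == "__main__":
    table = {"r0c0": "Item", "r1c0": "Net sales", "r1c1": "1,234"}
    print(ids_to_values("answer: r1c1", table))
```

Because the model only has to produce short, structured IDs rather than copy cell text, the final answer is always an exact string from the table. In this sketch, IDs that the model invents but the table does not contain are simply dropped; a real system might instead fall back to another candidate.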