What question did this study set out to answer?

This research focuses on enhancing structured radiology reporting through a new dual-path cross-attention model to improve clinical decision support.

May 4, 2026 | Open Access

DPA-HiVQA: Enhancing Structured Radiology Reporting with Dual-Path Cross-Attention

Key Points

The proposed DPA-HiVQA model uses a multi-scale image embedding from the domain-specific BioViL encoder, combining global semantic embeddings with patch-level spatial features.
A dual-path cross-attention mechanism supports holistic semantic understanding and fine-grained spatial reasoning at the same time.
The model was evaluated on the Rad-ReStruct benchmark, with F1-scores compared against the established benchmark baseline.
The overall F1-score improved by 21.2% over the baseline.
The Level 3 F1-score improved by 31.9%, indicating better handling of hierarchical dependencies.
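
To make the reported metric concrete, the following is a minimal sketch of how an F1-score could be computed separately for each level of a question hierarchy. The record layout `(level, true_answer, predicted_answer)`, the function names, and the choice of `"yes"` as the positive class are illustrative assumptions; the summary does not state how Rad-ReStruct averages its scores, so this is not the benchmark's official scoring procedure.

```python
from collections import defaultdict


def f1_score(y_true, y_pred, positive="yes"):
    """Binary F1: harmonic mean of precision and recall for one class."""
    pairs = list(zip(y_true, y_pred))
    tp = sum(1 for t, p in pairs if t == positive and p == positive)
    fp = sum(1 for t, p in pairs if t != positive and p == positive)
    fn = sum(1 for t, p in pairs if t == positive and p != positive)
    if tp == 0:
        return 0.0  # no true positives: precision and recall are zero or undefined
    precision = tp / (tp + fp)
    recall = tp / (tp + fn)
    return 2 * precision * recall / (precision + recall)


def f1_by_level(records):
    """Group (level, truth, prediction) records by level and score each group."""
    grouped = defaultdict(lambda: ([], []))
    for level, truth, pred in records:
        grouped[level][0].append(truth)
        grouped[level][1].append(pred)
    return {level: f1_score(t, p) for level, (t, p) in sorted(grouped.items())}


# Hypothetical toy answers, not data from the study.
records = [
    (1, "yes", "yes"),
    (1, "no", "yes"),
    (3, "yes", "no"),
    (3, "yes", "yes"),
    (3, "no", "no"),
]
print(f1_by_level(records))
```

Scoring each level separately is what allows a statement such as "Level 3 F1-score improved" to be distinguished from a change in the overall score.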

Abstract

Structured radiology reporting can improve clinical decision support by standardizing clinical findings into hierarchical formats. However, answering the thousands of questions about clinical findings in structured report templates is prohibitively time-consuming, which can limit clinical adoption. Furthermore, early medical VQA datasets focused primarily on free-text and independent question–answer pairs. A recent dataset, Rad-ReStruct, introduced hierarchical VQA, but its accompanying model still relies heavily on flattened embedding representations and single-path text–image fusion mechanisms that inadequately handle complex hierarchical dependencies in responses. In this paper, we propose DPA-HiVQA (Dual-Path Cross-Attention for Hierarchical VQA), which addresses these limitations through two key contributions: (1) a multi-scale image embedding that combines global semantic embeddings with patch-level spatial features from the domain-specific BioViL encoder; and (2) a dual-path cross-attention mechanism that enables simultaneous holistic semantic understanding and fine-grained spatial reasoning. Evaluated on the Rad-ReStruct benchmark, the model substantially outperforms the established benchmark baseline, improving the overall F1-score and the Level 3 F1-score by 21.2% and 31.9%, respectively. The proposed model demonstrates that dual-path cross-attention architectures can effectively connect holistic semantic understanding and fine-grained spatial detail, paving the way for practical AI-assisted structured reporting systems that reduce radiologist burden while maintaining diagnostic accuracy.
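
The dual-path idea described in the abstract can be illustrated with a small, dependency-free sketch. It is not the authors' implementation: the function names, the absence of learned projection matrices, and the fusion by concatenation are all assumptions made for illustration, and the vectors are toy values rather than BioViL outputs. One path lets the question tokens attend to global image embeddings, the other lets them attend to patch-level embeddings, and the two results are joined per token.

```python
import math


def softmax(scores):
    """Numerically stable softmax over a list of floats."""
    peak = max(scores)
    exps = [math.exp(s - peak) for s in scores]
    total = sum(exps)
    return [e / total for e in exps]


def cross_attention(queries, memory):
    """Scaled dot-product attention where each query attends over `memory`.

    Assumption: keys and values are the memory vectors themselves
    (no learned projections), to keep the sketch short.
    """
    dim = len(queries[0])
    attended = []
    for q in queries:
        scores = [sum(a * b for a, b in zip(q, m)) / math.sqrt(dim) for m in memory]
        weights = softmax(scores)
        attended.append(
            [sum(w * m[i] for w, m in zip(weights, memory)) for i in range(dim)]
        )
    return attended


def dual_path_fusion(text_tokens, global_embeddings, patch_embeddings):
    """Run both attention paths and concatenate their outputs per text token."""
    # Path 1: text attends to global image embeddings (holistic semantics).
    global_path = cross_attention(text_tokens, global_embeddings)
    # Path 2: text attends to patch-level embeddings (spatial detail).
    patch_path = cross_attention(text_tokens, patch_embeddings)
    # Fusion by concatenation is an assumption; other fusion schemes are possible.
    return [g + p for g, p in zip(global_path, patch_path)]


# Toy 2-dimensional vectors, hypothetical values only.
text = [[1.0, 0.0], [0.0, 1.0]]
global_emb = [[0.5, 0.5]]
patches = [[1.0, 0.0], [0.0, 1.0], [1.0, 1.0]]
fused = dual_path_fusion(text, global_emb, patches)
print(len(fused), len(fused[0]))
```

With a single global embedding the first path returns that embedding unchanged for every token, because the attention weight over one item is always 1; the patch path is where token-specific spatial weighting appears in this sketch.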

Cite This Study

Do et al. studied this question.

synapsesocial.com/papers/69f837933ed186a739981c1e
https://doi.org/10.3390/make8050113