
April 10, 2026

Open Access

Who's Who? LLM-assisted Software Traceability with Architecture Entity Recognition

Key Points

The aim is to improve the traceability between software architecture documentation and source code through automated techniques.
Introduced ArTEMiS, a new approach that recognizes architectural entities in software architecture documentation (SAD) and matches them with entities in software architecture models (SAMs).
Extended the evaluation of ExArch, an LLM-based approach for extracting component names from SAD and source code.
Compared the results against the state-of-the-art approaches SWATTR, TransArC, and ArDoCode.
TransArC achieved a high F1 score of 0.87 but requires manually created SAMs.
ExArch performed comparably, with an F1 score of 0.86, using only SAD and code.
ArTEMiS matched SWATTR with an F1 score of 0.81 and can replace SWATTR when integrated with TransArC.
The combination of ArTEMiS and ExArch outperformed ArDoCode, the best baseline that does not rely on manually created SAMs.

Abstract

Identifying architecturally relevant entities in textual artifacts is crucial for Traceability Link Recovery (TLR) between Software Architecture Documentation (SAD) and source code. While Software Architecture Models (SAMs) can bridge the semantic gap between these artifacts, their manual creation is time-consuming. Large Language Models (LLMs) offer new capabilities for extracting architectural entities to construct SAMs automatically or establish direct trace links. This paper extends our ICSA 2025 paper, which introduced ExArch for LLM-based architecture component name extraction, by contributing the novel ArTEMiS approach, an extended evaluation, and a combined evaluation of both approaches. ExArch extracts component names as simple SAMs from SAD and source code, while ArTEMiS identifies architectural entities in documentation and matches them with SAM entities. Our evaluation compares both approaches against the state-of-the-art approaches SWATTR, TransArC, and ArDoCode. TransArC achieves strong performance (F1: 0.87) but requires manually created SAMs; ExArch achieves comparable results (F1: 0.86) using only SAD and code. ArTEMiS matches SWATTR (F1: 0.81) and can replace it when integrated with TransArC. The combination of ArTEMiS and ExArch outperforms ArDoCode, the best baseline without manual SAMs. Our results demonstrate that LLMs can effectively enable automated SAM generation and TLR, making architecture-code traceability more practical and accessible.
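The F1 scores above summarize how well recovered trace links agree with a gold standard. The following is a minimal sketch of that kind of comparison. It assumes that a trace link can be represented as a (SAD sentence number, architecture element name) pair. That representation, the function name `precision_recall_f1`, and the example data are hypothetical illustrations. They are not the paper's evaluation code or benchmark data, and the paper's exact evaluation protocol may differ.

```python
from typing import Set, Tuple

# Hypothetical representation: a trace link pairs a SAD sentence number
# with the name of the architecture element (e.g., a component) it refers to.
TraceLink = Tuple[int, str]


def precision_recall_f1(
    predicted: Set[TraceLink], gold: Set[TraceLink]
) -> Tuple[float, float, float]:
    """Compare predicted trace links against gold-standard links."""
    true_positives = len(predicted & gold)
    precision = true_positives / len(predicted) if predicted else 0.0
    recall = true_positives / len(gold) if gold else 0.0
    if precision + recall == 0:
        return precision, recall, 0.0
    # F1 is the harmonic mean of precision and recall.
    f1 = 2 * precision * recall / (precision + recall)
    return precision, recall, f1


# Hypothetical gold standard: three sentences, two components.
gold = {(1, "Logic"), (2, "Persistence"), (3, "Logic")}
# A hypothetical approach finds two correct links and misses sentence 3.
predicted = {(1, "Logic"), (2, "Persistence")}

p, r, f1 = precision_recall_f1(predicted, gold)
print(f"precision={p:.2f} recall={r:.2f} F1={f1:.2f}")
# prints "precision=1.00 recall=0.67 F1=0.80"
```

Because F1 is a harmonic mean, the missed link in sentence 3 lowers the score even though every predicted link is correct.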


Authors

Dominik Fuchß

Haoyu Liu

Sophie Corallo

Karlsruhe Institute of Technology
