What does this research mean for the field?

A two-phase workflow integrating bibliometric science mapping with structured thematic content analysis, utilizing a test-retest reliability procedure, enables a single researcher to comprehensively map research fields and identify thematic gaps invisible to bibliometrics alone. Novelty: methodological. Consensus alignment: neutral.

June 7, 2026 | Open Access

A reproducible bibliometric-content analysis workflow for mapping research fields in computer science

Key Points

Introduces a reproducible workflow for mapping research fields in computer science using bibliometric and thematic analyses.
Implements the two-phase workflow in R using the bibliometrix package.
Phase 1 clusters publications by keyword co-occurrence; Phase 2 applies thematic coding to purposively selected papers.
Replaces the usual dual-coder reliability check with a single-coder test-retest procedure, reaching κ = 0.82.
Identifies four thematic clusters within 648 AI-FinTech publications from 2017-2026.
Thematic coding surfaced regulatory compliance gaps and AI-blockchain integration opportunities that bibliometric analysis alone did not reveal.
A single researcher completed the entire process in approximately 22 active working hours.
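
The κ reported above is an agreement statistic between two coding passes over the same papers. As a rough illustration only, the sketch below computes Cohen's kappa in Python for two passes by one coder. The paper's workflow is implemented in R, and its exact reliability computation is not described here. The function name `cohen_kappa`, the theme labels, and the toy data are assumptions made for this example.

```python
from collections import Counter


def cohen_kappa(first_pass, second_pass):
    """Cohen's kappa between two equally long sequences of category labels."""
    if len(first_pass) != len(second_pass) or not first_pass:
        raise ValueError("both passes must code the same non-empty set of items")
    n = len(first_pass)
    # Observed agreement: share of items given the same code in both passes.
    p_observed = sum(a == b for a, b in zip(first_pass, second_pass)) / n
    # Chance agreement expected from the marginal code frequencies of each pass.
    freq_1 = Counter(first_pass)
    freq_2 = Counter(second_pass)
    p_expected = sum(freq_1[c] * freq_2[c] for c in freq_1.keys() | freq_2.keys()) / (n * n)
    if p_expected == 1.0:
        return 1.0  # degenerate case: one identical code used throughout both passes
    return (p_observed - p_expected) / (1 - p_expected)


# Hypothetical theme codes for ten papers, coded twice by the same researcher.
first_pass = ["REG"] * 5 + ["TECH"] * 5
second_pass = ["REG"] * 4 + ["TECH"] * 6

kappa = cohen_kappa(first_pass, second_pass)
print(round(kappa, 2))  # 0.8 for this toy data
```

In this toy case nine of ten codes match (observed agreement 0.9) and chance agreement is 0.5, giving κ = 0.8, which sits at the κ ≥ 0.80 threshold the abstract cites.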

Abstract

Computer science literature indexed in Scopus and Web of Science has grown at double-digit annual rates, making comprehensive manual synthesis infeasible for individual researchers. Bibliometric workflows partially address this problem but rarely yield the interpretive depth needed to characterize a field’s accomplishments or gaps. This paper introduces a two-phase reproducible workflow integrating bibliometric science mapping (Phase 1) with structured thematic content analysis (Phase 2), implemented in R using the bibliometrix package. Phase 1 clusters publications by keyword co-occurrence; these clusters serve as the sampling frame for purposive selection of representative papers, which undergo deductive-inductive thematic coding in Phase 2. Thematic coding of this type typically requires dual-coder reliability checks; a test-retest procedure replaces that requirement, maintaining κ = 0.82 without a second coder. Applied to 648 AI-FinTech publications (2017-2026), the workflow identifies four thematic clusters and achieves κ = 0.82. Regulatory compliance gaps and AI-blockchain integration opportunities, invisible to bibliometric analysis alone, emerged only through thematic coding. A single researcher completes the process in approximately 22 active working hours without dedicated infrastructure.

• Integrates bibliometric science mapping with structured thematic content analysis into a single reproducible R-based workflow applicable to any computer science sub-field.
• Links Phase 1 cluster outputs to Phase 2 sampling via an explicit allocation formula, replacing ad hoc paper selection with a principled, data-driven decision rule.
• Enables single-author reliability verification via a test-retest procedure (κ ≥ 0.80), removing the dual-coder requirement as a practical barrier for PhD researchers.
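
Phase 1 clusters publications by keyword co-occurrence. The sketch below shows only the counting step that such clustering builds on: how often two keywords appear on the same paper. It is written in Python for illustration and does not reproduce the bibliometrix API or the paper's clustering algorithm. The function name `keyword_cooccurrence` and the sample keyword lists are hypothetical.

```python
from collections import Counter
from itertools import combinations


def keyword_cooccurrence(records):
    """Count how often each unordered keyword pair appears on the same record."""
    pair_counts = Counter()
    for keywords in records:
        # Normalize case and deduplicate so a pair counts once per paper.
        unique = sorted({k.strip().lower() for k in keywords if k.strip()})
        pair_counts.update(combinations(unique, 2))
    return pair_counts


# Hypothetical author-keyword lists, not data from the paper.
records = [
    ["machine learning", "credit scoring", "FinTech"],
    ["FinTech", "blockchain", "smart contracts"],
    ["Machine Learning", "fintech", "fraud detection"],
]

counts = keyword_cooccurrence(records)
for pair, weight in counts.most_common(3):
    print(pair, weight)
```

The resulting pair counts can be read as weighted edges of a keyword network. A community-detection or clustering step, not shown here, would then group keywords into thematic clusters.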

Cite This Study

Thanh-Cong Truong studied this question.

synapsesocial.com/papers/6a2509de7def13d035e1a3c0 https://doi.org/10.1016/j.mex.2026.103991
