ABSTRACT

Background: Brazil's Unified Health System (SUS) does not provide a person identifier in the publicly available inpatient system (SIH) microdata, which limits longitudinal analyses. We propose a deterministic linkage framework (Nexus) designed to construct longitudinal real‐world datasets under these structural constraints.

Methods: We first established a cloud‐based preprocessing pipeline using a Lakehouse architecture to ingest, standardize, and harmonize raw administrative data from the Ministry of Health's public repository of national databases, 2008–2024, into reproducible inpatient and outpatient tables. Auxiliary sources were integrated to enrich metadata. Records from the outpatient system (SIA) containing the encrypted National Health Card (CNS) were curated using internal consistency filters for sex, date of birth (DOB), and postal code (CEP). Linkage with SIH applied a quasi‐identifier (CEP, DOB, sex) with an α‐shrinkage rule to exclude ambiguous high‐density cells, prioritizing specificity by retaining only unique matches. Candidate cohorts varying by α and start year were compared using data quality diagnostics.

Results: From 224.7 M unique CNS in SIA (2008–2024), curation yielded 12.9 M patients. Exact‐match linkage of SIH hospitalizations produced a final Nexus cohort of 9.2 M patients. Using α = 40 and a 2012 start was associated with improved temporal consistency and stable disease event fractions.

Conclusions: Nexus demonstrates that a conservative, transparent deterministic linkage framework can be used to construct a longitudinal cohort under structural data constraints. Coverage is reduced, mainly because CEP is concentrated in high‐complexity claims, resulting in a selected subpopulation enriched for specialized care. Accordingly, Nexus is suited for longitudinal analyses of treatment pathways and outcomes, but not for population‐level inference or incidence estimation.
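The linkage step described in Methods can be illustrated with a minimal Python sketch. It is not the authors' implementation. The abstract does not define the α‐shrinkage rule, so the sketch stands in for it with a hypothetical cap: a CEP is treated as a "high‐density cell" and excluded when more than `alpha` distinct curated patients share it. The field names (`cns`, `cep`, `dob`, `sex`, `admission_id`), the function `link_unique`, and the toy records are all assumptions made for illustration.

```python
from collections import defaultdict


def link_unique(sia_patients, sih_records, alpha):
    """Link SIH admissions to curated SIA patients on (CEP, DOB, sex).

    ASSUMPTION: a CEP shared by more than `alpha` distinct patients is a
    "high-density cell" and is excluded. The real alpha-shrinkage rule is
    not specified in the abstract and may differ.
    """
    patients_per_cep = defaultdict(set)  # CEP -> distinct patient ids
    patients_per_key = defaultdict(set)  # (CEP, DOB, sex) -> distinct patient ids
    for p in sia_patients:
        patients_per_cep[p["cep"]].add(p["cns"])
        patients_per_key[(p["cep"], p["dob"], p["sex"])].add(p["cns"])

    # Keep only keys that identify exactly one patient in a low-density CEP.
    unique_key = {
        key: next(iter(ids))
        for key, ids in patients_per_key.items()
        if len(ids) == 1 and len(patients_per_cep[key[0]]) <= alpha
    }

    # Exact-match each admission; unmatched or ambiguous admissions are dropped.
    links = {}
    for r in sih_records:
        key = (r["cep"], r["dob"], r["sex"])
        if key in unique_key:
            links[r["admission_id"]] = unique_key[key]
    return links


# Toy, fabricated-for-illustration records (not real data).
patients = [
    {"cns": "A", "cep": "01001000", "dob": "1980-01-01", "sex": "F"},
    {"cns": "B", "cep": "01001000", "dob": "1980-01-01", "sex": "F"},  # same key as A
    {"cns": "C", "cep": "20040002", "dob": "1975-05-05", "sex": "M"},
    {"cns": "D", "cep": "30130010", "dob": "1990-02-02", "sex": "F"},
    {"cns": "E", "cep": "30130010", "dob": "1991-03-03", "sex": "M"},
    {"cns": "F", "cep": "30130010", "dob": "1992-04-04", "sex": "F"},
]
admissions = [
    {"admission_id": "h1", "cep": "20040002", "dob": "1975-05-05", "sex": "M"},
    {"admission_id": "h2", "cep": "01001000", "dob": "1980-01-01", "sex": "F"},
    {"admission_id": "h3", "cep": "30130010", "dob": "1990-02-02", "sex": "F"},
    {"admission_id": "h4", "cep": "99999999", "dob": "2000-01-01", "sex": "M"},
]

strict = link_unique(patients, admissions, alpha=2)
looser = link_unique(patients, admissions, alpha=3)
```

In this toy example, `strict` links only admission `h1`, because `h2` matches a key shared by two patients and `h3` falls in a CEP with three patients. Raising the cap to 3 in `looser` additionally links `h3`. This mirrors the trade-off the abstract describes, where specificity is prioritized at the cost of coverage.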
Oliveira et al. (Sun,) studied this question.