• We aimed to assess the strengths and limitations of using single-center EMR data in South Korea for oncology research.
• Seoul National University Hospital data captured 8.2% of national incident and 10.7% of national prevalent cancer cases (2011–2021).
• Age and sex distributions were highly similar between single-center and national data across most cancers. Strong annual trend correlations were observed for breast, lung, and pancreatic cancers.
• Single-center EMR data can be a valuable resource for oncology research in South Korea. However, potential biases, such as temporal disruptions and the underrepresentation of certain cancers, should be adequately controlled before findings from single-center data are generalized to broader populations.
• Expanding such assessments would provide a broader understanding of potential selection biases and improve the generalizability of findings.

Real-world data (RWD) from electronic medical records (EMRs) are increasingly used in oncology to complement evidence from clinical trials, because they reflect routine clinical practice and diverse patient populations. However, many EMR-based studies rely on single-center data, which limits the generalizability of their findings. We aimed to evaluate the representativeness of single-center EMR data from Seoul National University Hospital (SNUH) by comparing them with national cancer data from the Korean Statistical Information Service (KOSIS). We compared annual cancer statistics from the SNUH EMR and KOSIS (2011–2021) for ten cancer types: breast, gallbladder/biliary tract, gastric, kidney, liver, lung, pancreatic, prostate, and thyroid cancers, and leukemia. We calculated the coverage proportion, that is, the number of cancer cases in the SNUH EMR relative to the number reported by KOSIS. We also analyzed differences in age and gender distributions between the two databases and compared annual trends in cancer cases between them.
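The coverage proportion described above can be sketched as a simple ratio of counts. This is a minimal illustration only: the function name `coverage_proportion` is hypothetical, and pooling counts over all years (rather than averaging year-by-year proportions) is an assumption, since the abstract does not state how the 2011–2021 figure was aggregated.

```python
from typing import Sequence


def coverage_proportion(center_cases: Sequence[int], national_cases: Sequence[int]) -> float:
    """Pooled coverage (%) of national cases by a single-center EMR.

    Hypothetical helper: assumes both series list case counts for the same
    years in the same order, and pools them as sum(center) / sum(national).
    """
    if len(center_cases) != len(national_cases):
        raise ValueError("series must cover the same years")
    total_national = sum(national_cases)
    if total_national <= 0:
        raise ValueError("national total must be positive")
    # Percentage of all national cases that appear in the single-center data
    return 100.0 * sum(center_cases) / total_national
```

Averaging annual proportions instead would weight low-incidence years equally with high-incidence years, so the two choices can differ when national counts change over the study period.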
From 2011 to 2021, SNUH data included 8.2% of national incident and 10.7% of national prevalent cases, with high coverage for liver (20.4%) and pancreatic (20.3%) cancers. No significant differences in age and gender distributions were found for any cancer type (p > 0.05), and cosine similarity was high (>0.8). Strong correlations in annual trends were observed for breast, lung, and pancreatic cancers (r > 0.9), whereas negative correlations were found for thyroid cancer prevalence (r = −0.62) and liver cancer incidence (r = −0.59). Single-center EMR data can be a valuable resource for oncology research in South Korea. However, external factors, including changes in clinical guidelines, should be considered when generalizing findings from such data to broader populations.
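The two similarity measures reported above can be sketched as follows. This is a minimal illustration under stated assumptions: the function names are hypothetical, the distribution vectors are assumed to hold counts or proportions per age (or sex) group in matching order, and the trend correlation is assumed to be Pearson's r, which the abstract does not name explicitly.

```python
import math
from typing import Sequence


def cosine_similarity(a: Sequence[float], b: Sequence[float]) -> float:
    """Cosine of the angle between two distribution vectors.

    Hypothetical helper: a and b hold counts or proportions per age/sex
    group, in the same group order for both databases.
    """
    if len(a) != len(b):
        raise ValueError("vectors must have the same length")
    dot = sum(x * y for x, y in zip(a, b))
    norm_a = math.sqrt(sum(x * x for x in a))
    norm_b = math.sqrt(sum(y * y for y in b))
    if norm_a == 0 or norm_b == 0:
        raise ValueError("vectors must be non-zero")
    return dot / (norm_a * norm_b)


def pearson_r(x: Sequence[float], y: Sequence[float]) -> float:
    """Pearson correlation between two annual case-count series.

    Assumes x and y list one value per year for the same years.
    """
    n = len(x)
    if n != len(y) or n < 2:
        raise ValueError("need two equal-length series with at least 2 points")
    mx = sum(x) / n
    my = sum(y) / n
    # Covariance and variance terms (the common 1/n factor cancels out)
    sxy = sum((xi - mx) * (yi - my) for xi, yi in zip(x, y))
    sxx = sum((xi - mx) ** 2 for xi in x)
    syy = sum((yi - my) ** 2 for yi in y)
    if sxx == 0 or syy == 0:
        raise ValueError("series must not be constant")
    return sxy / math.sqrt(sxx * syy)
```

Both measures are scale-invariant, which is why single-center counts can be compared directly with much larger national counts: cosine similarity depends only on the shape of the distribution, and Pearson's r only on whether the two annual series rise and fall together. A negative r, as reported for thyroid cancer prevalence, means the single-center trend moved opposite to the national trend.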
Won et al. (Sun,) studied this question.