Abstract To help researchers uncover new insights into the biology of childhood cancers and structural birth defects, the Gabriella Miller Kids First Pediatric Research Program (Kids First) was initiated. The Kids First Data Resource Center (KFDRC) developed the Kids First Data Resource Portal (KFDRP; https://portal.kidsfirstdrc.org/), a centralized data platform for both Kids First and collaborative cohorts. On behalf of KFDRC, we present, as part of KFDRP, the upgraded Variant WorkBench (VWB), which incorporates more data, runs on a more efficient platform, follows a more streamlined data flow design, and can analyze both germline and somatic genomic variants. First, the current collection of Kids First data includes reharmonized genomics data comprising over 922,000 files from more than 35,400 participants across 35 studies. We also provide updated variant and gene annotation databases from more than 50 public resources (e.g. gnomAD, ClinVar, and HPO). Second, VWB runs on Velsera’s Cavatica Data Studio platform with Spark 3.5.1 and Python 3.11, achieving a roughly 10-fold speedup in executing PySpark code compared with previous versions. Third, we redesigned the data flow from KFDRP to VWB: portal users can now import Kids First data for which they have dbGaP approval directly into a Cavatica project and start analyzing them in VWB. As an example, we show how to use VWB to identify deleterious variants within the same genes in both the germline and somatic genomes of the same participant from the Children’s Brain Tumor Network, the largest Kids First cohort so far. In conclusion, the upgraded Variant WorkBench enables accelerated exploration of pediatric disease genomics under the Kids First program.
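The example analysis mentioned in the abstract, finding genes that carry deleterious variants in both the germline and somatic genomes of the same participant, can be illustrated with a minimal sketch. This is not the VWB implementation: it uses plain Python rather than PySpark so that it runs anywhere, and the record layout (`participant_id`, `gene`, `deleterious`), the function name `genes_hit_in_both`, and the sample participants and genes are all hypothetical placeholders. In VWB the same logic would presumably be expressed as a filter followed by a join on participant and gene.

```python
from collections import defaultdict


def deleterious_genes(records):
    """Map each participant to the set of genes with a deleterious variant."""
    genes = defaultdict(set)
    for rec in records:
        if rec["deleterious"]:
            genes[rec["participant_id"]].add(rec["gene"])
    return genes


def genes_hit_in_both(germline, somatic):
    """Return {participant_id: sorted genes} hit in both call sets."""
    germ = deleterious_genes(germline)
    som = deleterious_genes(somatic)
    result = {}
    # Only participants present in both call sets can qualify.
    for pid in germ.keys() & som.keys():
        shared = germ[pid] & som[pid]
        if shared:
            result[pid] = sorted(shared)
    return result


# Hypothetical toy data; identifiers and calls are illustrative only.
germline = [
    {"participant_id": "PT_A", "gene": "TP53", "deleterious": True},
    {"participant_id": "PT_A", "gene": "NF1", "deleterious": False},
    {"participant_id": "PT_B", "gene": "NF1", "deleterious": True},
]
somatic = [
    {"participant_id": "PT_A", "gene": "TP53", "deleterious": True},
    {"participant_id": "PT_A", "gene": "NF1", "deleterious": True},
    {"participant_id": "PT_B", "gene": "TP53", "deleterious": True},
]

shared = genes_hit_in_both(germline, somatic)
print(shared)
```

In this toy input only one participant has a gene flagged as deleterious in both call sets. How "deleterious" is defined (for example, from ClinVar significance or predicted impact) is left open here and would depend on the annotation columns actually available in VWB.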
Guo et al. (Fri,) studied this question.