Tartary buckwheat (Fagopyrum tataricum), a member of the Polygonaceae family, is a crop of substantial medicinal and nutritional importance. Since the initial genome assembly (~450 Mb) was published in 2017, two subsequent versions have been reported (Zhang et al. 2017; Du and Liang 2019; He et al. 2023). However, these assemblies remain incomplete, with persistent gaps. Moreover, genomic studies to date have focused primarily on single nucleotide polymorphisms (SNPs), while structural variations (SVs), a major contributor to genetic diversity, remain underexplored (Garg et al. 2024). Here, we present a near-complete telomere-to-telomere (T2T) genome assembly for the high-yield Tartary buckwheat cultivar Heifeng1. We complemented this reference genome with deep resequencing of 163 F. tataricum accessions and generated chromosome-level assemblies for 10 representative accessions (Figure 1A and Figure S1; Table S1). By integrating the T2T genome with these 10 new assemblies and two previously published chromosomal assemblies, we constructed the first graph-based pangenome for Tartary buckwheat.

The T2T assembly of F. tataricum cv. Heifeng1 incorporated multiple sequencing technologies: 51.8× PacBio HiFi reads (43.5 Gb), 95.8× Oxford Nanopore Technologies (ONT) ultra-long reads (23.5 Gb), 58.7× Illumina short reads (26.6 Gb), and 71.1× Hi-C data (32.3 Gb) (Figure S2 and Table S2). Our assembly strategy yielded a gap-free, near-T2T assembly spanning 453.9 Mb across eight chromosomes, containing eight centromeres and 15 telomeres (Figure 1B; Figures S3 and S4; Table S3). Remapping of all sequencing data after quality control showed alignment rates above 99.7%, with uniform coverage across all chromosomes (Figures S5 and S6). Benchmarking metrics for the Heifeng1-T2T genome indicated high assembly quality, with a contig N50 of 56.89 Mb, an LTR Assembly Index (LAI) score of 14.76, 99.1% complete embryophyte BUSCOs, and a Merqury-estimated quality value (QV) of 50.59 (Table S4).
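For readers less familiar with the contiguity and accuracy metrics quoted above, the following minimal Python sketch illustrates how a contig N50 and a Phred-scaled QV are conventionally defined. It is an illustration of the standard definitions only, not the pipeline used in this study; the function names and the toy contig lengths are hypothetical.

```python
import math


def contig_n50(lengths):
    """Return the N50 of a set of contig lengths.

    N50 is the length L such that contigs of length >= L together
    cover at least half of the total assembly length.
    """
    if not lengths:
        raise ValueError("at least one contig length is required")
    total = sum(lengths)
    running = 0
    # Walk contigs from longest to shortest until half the assembly is covered.
    for length in sorted(lengths, reverse=True):
        running += length
        if running * 2 >= total:
            return length


def qv_to_error_rate(qv):
    """Convert a Phred-scaled quality value to a per-base error probability."""
    return 10 ** (-qv / 10)


def error_rate_to_qv(error_rate):
    """Inverse conversion: per-base error probability to Phred-scaled QV."""
    return -10 * math.log10(error_rate)


if __name__ == "__main__":
    # Toy contig lengths (hypothetical, not taken from the assembly).
    print(contig_n50([10, 8, 6, 4, 2]))
    # Per-base error probability implied by the reported QV of 50.59.
    print(qv_to_error_rate(50.59))
```

Under this definition, the reported QV of 50.59 corresponds to a per-base error probability of about 8.7 × 10⁻⁶, that is, roughly 8.7 expected errors per million bases.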
Comparative analysis with existing assemblies revealed significant improvements in our genome, particularly in telomere resolution (Figure 1B). Notably, we identified a 5.4 Mb pericentric inversion on chromosome 7 (19.5–24.9 Mb; Figure S7) and validated it with Hi-C data, demonstrating the assembly's power to resolve complex structural variation.

Repetitive elements constituted 59.57% of the Heifeng1-T2T genome, with long terminal repeat (LTR) retrotransposons representing the most abundant class at 39.23% (Table S4). Tandem repeat sequences accounted for 3.784%, while protein-coding transposable elements (TEs) comprised 12.13% (Table S4). RNA-seq data from six tissues (root, stem, leaf, flower, pod and seed) guided gene prediction (Table S5). We integrated predictions from multiple approaches using EvidenceModeler, yielding 27 341 high-confidence protein-coding genes and 4838 non-coding RNA genes (Table S4).

To establish a graph-based pangenome capturing the genetic diversity of F. tataricum, we resequenced 163 accessions (18 wild, 145 cultivated) spanning 13 Chinese provinces (Figure 1A; Figure S1 and Table S1). Based on the SNP-based phylogeny and geographic distribution, we then selected 10 phylogenetically representative accessions (1 wild, 9 cultivated) for chromosome-level assembly (Figure S8). We assembled the genome of each of the 10 accessions using a combination of ONT long reads and Illumina short reads (Table S6). The resulting contig assemblies achieved N50 values of 10.82–55.21 Mb (Table S7 and Figure S9). Genome sizes ranged from 447.2 to 453.9 Mb, with BUSCO completeness scores between 98.1% and 99.3%; the most complete assembly contained only five gaps (Table S7). To further represent the major and typical genetic diversity of Tartary buckwheat, we incorporated two published genomes, Pinku1-2023 (He et al. 2023) and Qianku3-2023 (Lin et al. 2023), alongside the 10 newly assembled genomes.

These highly contiguous genome assemblies enabled precise SV detection through collinear block alignment (Figure 1C,D). Aligning the 12 genomes to the Heifeng1-T2T reference genome, we found that wild accessions exhibited significantly higher SV prevalence than cultivated accessions (26 701 vs. 5561 SVs/sample, p 125.97), of which only 9.9% (29/293) were detected by SNP-based methods, and all were located in non-coding regulatory regions (Figures S15 and S16). A representative example involves a 59-bp exonic deletion in mikado.Chr8G1726, which shows striking frequency divergence between the wild sub-population (17.8%) and the cultivated sub-population (0%) (Figure 1G,H). Functional characterisation revealed that this gene encodes GsSRK, a G-type lectin S-receptor-like serine/threonine protein kinase known as a positive regulator of salt stress tolerance in plants. Additionally, a 104-bp deletion was discovered in the intron of mikado.Chr7G3091 (Figure S17), a gene that harbours a rhamnogalacturonan lyase domain and potentially contributes to crop development and maturation.

To further evaluate the impact of SVs on phenotypic variation, we conducted genome-wide association studies (GWAS) on 10 metabolic traits using both SNPs and SVs in the 163 newly sequenced accessions (Table S10). Notably, SV-based GWAS identified a significant association signal for sphingosine levels corresponding to a 235-bp insertion on chromosome 1, which was not detected in the SNP-based analysis (Figure 1I,J).

In summary, this study establishes a near-T2T assembly that surpasses previous assemblies of Tartary buckwheat. Leveraging this benchmark-quality assembly, we developed the species' first graph-based pangenome, which serves as an unprecedented resource for functional genomics and precision breeding.
Comparative population analysis and GWAS through this pangenome framework reveal that structural variations represent a functionally distinct and evolutionarily important source of genetic diversity, playing a unique role in Tartary buckwheat domestication and agronomic trait improvement.

D.H., P.Z. and Q.H. conceived and designed the research. W.L. and H.L. performed the bioinformatics and statistical analyses and wrote the manuscript. P.Z., D.H. and X.Z. revised the manuscript. All authors have read and agreed to the published version of the manuscript.

We would like to thank Assoc. Prof. Zhang Liu from the Center for Agricultural Genetic Resources Research, Shanxi Agricultural University, for his assistance in the collection of germplasm resources. We would also like to thank Prof. Aili Wei and Dr. Fan Yang from the College of Biological Sciences and Technology, Taiyuan Normal University, for their assistance with experiments and for helpful discussion.

The data that support the findings of this study are openly available in the National Genomics Data Center at https://bigd.big.ac.cn/, reference number PRJCA036216.

Data S1–S3: pbi70333-sup-0001-DataS1-S3.zip
Wendy Li
Hanfei Liang
Jilin Sun
Plant Biotechnology Journal
Chinese Academy of Agricultural Sciences
Institute of Crop Sciences
Taiyuan Normal University
www.synapsesocial.com/papers/68c1d60654b1d3bfb60f9756 — DOI: https://doi.org/10.1111/pbi.70333