September 1, 2003 | Open Access

PCAP: A Whole-Genome Assembly Program

Xiaoqiu Huang (Iowa State University), Jianmin Wang (Roswell Park Comprehensive Cancer Center), Srinivas Aluru (Georgia Institute of Technology)

Key Points

Introduces PCAP, a whole-genome assembly program built to process tens of millions of reads efficiently and accurately.
Uses multiple processors for the most time-consuming assembly computations.
Uses a sensitive overlap-detection method and removes contaminated end regions of reads.
Generates each contig's consensus sequence from an alignment of its reads, using both base quality values and coverage information.
PCAP was tested on a mouse whole-genome data set of 30 million reads and a human Chromosome 20 data set of 1.7 million reads.
Reduces the effect of sequencing errors on overlap detection, so that fewer true overlaps are missed.
The program is freely available for academic use.

Abstract

We describe a whole-genome assembly program named PCAP for processing tens of millions of reads. The PCAP program has several features to address efficiency and accuracy issues in assembly. Multiple processors are used to perform most time-consuming computations in assembly. A more sensitive method is used to avoid missing overlaps caused by sequencing errors. Repetitive regions of reads are detected on the basis of many overlaps with other reads, instead of many shorter word matches with other reads. Contaminated end regions of reads are identified and removed. Generation of a consensus sequence for a contig is based on an alignment of reads in the contig, in which both base quality values and coverage information are used to determine every consensus base. The PCAP program was tested on a mouse whole-genome data set of 30 million reads and a human Chromosome 20 data set of 1.7 million reads. The program is freely available for academic use.
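
The consensus step described in the abstract can be illustrated with a small sketch. This is not PCAP's actual algorithm, which the abstract does not specify in detail; it is a minimal illustration that assumes each alignment column is given as a list of (base, quality) pairs. The function names `consensus_base` and `consensus`, the summed-quality scoring rule, and the tie-break on coverage are all hypothetical choices made for this example.

```python
from collections import defaultdict


def consensus_base(column):
    """Pick one consensus base for a single alignment column.

    `column` is a list of (base, quality) pairs, one per read covering
    the column. This input layout is an assumption of the sketch, not
    a PCAP data structure.
    """
    if not column:
        return "N"  # no read covers this column
    quality_sum = defaultdict(int)  # total quality supporting each base
    coverage = defaultdict(int)     # number of reads supporting each base
    for base, quality in column:
        quality_sum[base] += quality
        coverage[base] += 1
    # Highest summed quality wins; ties go to the base seen in more reads,
    # then to the alphabetically later base so the result is deterministic.
    return max(quality_sum, key=lambda b: (quality_sum[b], coverage[b], b))


def consensus(columns):
    """Join the per-column calls into a consensus string."""
    return "".join(consensus_base(column) for column in columns)


if __name__ == "__main__":
    columns = [
        [("A", 40), ("A", 35), ("C", 10)],  # A has both quality and coverage
        [("G", 5), ("T", 30)],              # one high-quality T beats a weak G
        [("C", 20), ("C", 20), ("G", 45)],  # G's quality outweighs two C reads
    ]
    print(consensus(columns))
```

In this sketch a single high-quality base can outvote several low-quality ones, and coverage only decides exact ties. A real assembler would weigh the two signals more carefully and would also handle gap characters in the alignment.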
