May 1, 2004

Open Access

The Ensembl Analysis Pipeline

Abstract

The Ensembl pipeline is an extension to the Ensembl system which allows automated annotation of genomic sequence. The software comprises two parts. First, there is a set of Perl modules ("Runnables" and "RunnableDBs") which are 'wrappers' for a variety of commonly used analysis tools. These retrieve sequence data from a relational database, run the analysis, and write the results back to the database. They inherit from a common interface, which simplifies the writing of new wrapper modules. On top of this sits a job submission system (the "RuleManager") which allows efficient and reliable submission of large numbers of jobs to a compute farm. Here we describe the fundamental software components of the pipeline, and we also highlight some features of the Sanger installation which were necessary to enable the pipeline to scale to whole-genome analysis.
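The two layers described above can be illustrated with a minimal sketch. It is written in Python for brevity, whereas the pipeline itself is a set of Perl modules, and it is not the Ensembl API: every class, method, and rule name below (`Analysis`, `SeqLength`, `GCContent`, `run_rules`) is a hypothetical stand-in. The database is a plain dictionary, the analyses are toy computations rather than external tools, and jobs run in-process instead of being submitted to a compute farm.

```python
from abc import ABC, abstractmethod


class Analysis(ABC):
    """Common interface shared by every wrapper (hypothetical names)."""

    def __init__(self, db, input_id):
        self.db = db              # stand-in for the relational database
        self.input_id = input_id  # key of the sequence to analyse
        self.sequence = None
        self.results = []

    def fetch_input(self):
        # Retrieve the sequence data from the database.
        self.sequence = self.db["sequences"][self.input_id]

    @abstractmethod
    def run(self):
        """Run the wrapped analysis and fill self.results."""

    def write_output(self):
        # Write the results back to the database.
        self.db["results"].setdefault(self.input_id, []).extend(self.results)

    def execute(self):
        self.fetch_input()
        self.run()
        self.write_output()


class SeqLength(Analysis):
    """Toy analysis: a new wrapper only has to implement run()."""

    def run(self):
        self.results = [("length", len(self.sequence))]


class GCContent(Analysis):
    """Toy analysis standing in for an external tool."""

    def run(self):
        seq = self.sequence.upper()
        gc = sum(seq.count(base) for base in "GC")
        self.results = [("gc_fraction", gc / len(seq))]


def run_rules(db, rules):
    """Run each analysis once per input, after its prerequisites are done.

    `rules` maps an analysis name to (wrapper class, prerequisite names).
    A real submitter would hand jobs to a batch queue; this one runs them inline.
    """
    done = set()  # (analysis name, input_id) pairs already completed
    progressed = True
    while progressed:
        progressed = False
        for name, (cls, needs) in rules.items():
            for input_id in db["sequences"]:
                key = (name, input_id)
                if key in done:
                    continue
                if all((need, input_id) in done for need in needs):
                    cls(db, input_id).execute()
                    done.add(key)
                    progressed = True
    return done


db = {"sequences": {"contig1": "ACGT", "contig2": "GGCC"}, "results": {}}
rules = {"length": (SeqLength, []), "gc": (GCContent, ["length"])}
done = run_rules(db, rules)
```

In this sketch, adding a new analysis means writing one subclass with a `run` method and adding one entry to `rules`. That is the kind of simplification the abstract attributes to the common interface.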

Authors

Simon Potter

European Bioinformatics Institute

Laura Clarke

University Hospital of North Tees

Val Curwen

Wellcome Sanger Institute

Journals

Genome Research

Institutions

Broad Institute

Wellcome Sanger Institute

Wellcome Trust
