Rui Chen, Ph.D.

Assistant Professor of Molecular and Human Genetics

Human Genome Sequencing Center

Baylor College of Medicine

Houston, Texas

Initial Sequence Assembly and Analysis of the Rat Genome

An important physiologic and pharmacologic model organism for studies of human disease, the rat is the third mammalian species to undergo large-scale genome sequencing.  Unlike the human and mouse sequencing projects, the Rat Genome Sequencing Project (RGSP) will produce only a draft-level sequence.  To facilitate subsequent finishing of selected regions of biological interest while maintaining good coverage of the whole genome, a combined clone-by-clone and whole-genome shotgun (WGS) approach was chosen for the RGSP.  WGS sequencing, with its high throughput and consistency, generates vital information about the overall sequence characteristics of the genome.  The localized genomic origin of BAC clone shotgun reads enables useful draft assemblies at low coverage and lets high-priority regions be sequenced more deeply, while potentially resulting in an earlier completion date.

Using both BAC clone fingerprint data and software developed at Baylor for picking BAC clones that walk into gaps, a set of about 20,000 BAC clones evenly distributed across the genome is "skim" sequenced at about 1-2x coverage.  Meanwhile, 4x WGS sequence is generated by paired-end sequencing of multiple libraries with different insert sizes.  The Atlas assembler, developed at the HGSC, is used to combine these two data sets in the assembly of the rat genome.  The assembly process is divided into two stages: BAC fisher and Global Assembly.  First, a sampling-based sequence comparison tool rapidly maps WGS reads onto individual BACs, producing a set of "enriched" BACs.  Second, these "enriched" BAC clones are used as units to produce the final global assembly by merging overlapping clones and linking adjacent clones using paired-end information.  The current assembly contains 157,561 sequence contigs that are linked into 1,032 segments.  Combined, these contigs cover 2.56 billion base pairs, over 90 percent of the 2.8 Gb rat genome, which is somewhat smaller than the human genome but larger than the mouse genome.  The detailed strategy and initial analysis of the rat genome assembly will be discussed.
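
As a rough illustration of the first stage, the sketch below shows one way a sampling-based comparison could assign WGS reads to BACs: index the k-mers of each BAC's skim reads, sample every few k-mers from each WGS read, and keep reads that share enough sampled k-mers with a BAC.  This is not the Atlas BAC fisher itself; the function names, k-mer length, sampling step, and hit threshold are all assumptions chosen for illustration, and the sketch ignores reverse complements and sequencing errors.

```python
from collections import defaultdict

# All parameters below are illustrative assumptions, not values from the RGSP.
K = 12         # k-mer length
STEP = 4       # sample every STEP-th k-mer of a WGS read
MIN_HITS = 2   # sampled k-mers a read must share with a BAC to be recruited


def build_index(bacs, k=K):
    """Map each k-mer found in a BAC's skim reads to the set of BAC ids containing it."""
    index = defaultdict(set)
    for bac_id, skim_reads in bacs.items():
        for seq in skim_reads:
            for i in range(len(seq) - k + 1):
                index[seq[i:i + k]].add(bac_id)
    return index


def fish_reads(bacs, wgs_reads, k=K, step=STEP, min_hits=MIN_HITS):
    """Return {bac_id: [wgs_read_id, ...]}, i.e. the WGS reads recruited to each BAC.

    bacs:      {bac_id: [skim read sequence, ...]}
    wgs_reads: {read_id: sequence}
    """
    index = build_index(bacs, k)
    enriched = {bac_id: [] for bac_id in bacs}
    for read_id, read in wgs_reads.items():
        hits = defaultdict(int)
        # Only a sample of the read's k-mers is looked up, which keeps mapping fast.
        for i in range(0, len(read) - k + 1, step):
            for bac_id in index.get(read[i:i + k], ()):
                hits[bac_id] += 1
        for bac_id, count in hits.items():
            if count >= min_hits:
                enriched[bac_id].append(read_id)
    return enriched
```

Each list returned by the hypothetical fish_reads, together with the BAC's own skim reads, would correspond to one "enriched" BAC passed on to the second stage.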