Streaming algorithms for whole genome assembly
About the research
High-throughput sequencing has revolutionized the field of genomics. Current sequencing technologies are relatively cost-effective, but they rely on storing all data on disk and processing it in bulk. The future of genetic sequencing will require technologies and algorithms in which DNA sequences are analyzed ‘on the fly’, without storing intermediate data such as short sequence reads. Here we propose developing streaming algorithms and open source software capable of working in this new setting. We will consider the problem of whole genome assembly from short sequence reads. The streaming nature of the algorithm dictates that the amount of computer memory required should grow linearly in the size of the output, i.e. the genome, but sub-linearly in the size of the input, i.e. the reads. Storage requirements and excessive memory usage are currently a major bottleneck for whole genome assembly algorithms. Such an algorithm would therefore be of immediate benefit even for current sequencing technologies, and would make the processing of much larger datasets practical.
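The memory requirement described above can be illustrated with a small sketch. It is not the project's algorithm, only a toy under stated assumptions: reads are error-free, come from a single strand, and are produced by a hypothetical `simulated_reads` generator. Each read is consumed once and discarded, and only the set of distinct k-mers is retained. That set is bounded by the genome length, so adding more reads (higher coverage) grows the input but not the retained state.

```python
from typing import Iterable, Iterator, Set


def kmers(read: str, k: int) -> Iterator[str]:
    """Yield every length-k substring of a read."""
    for i in range(len(read) - k + 1):
        yield read[i:i + k]


def stream_distinct_kmers(reads: Iterable[str], k: int) -> Set[str]:
    """Consume reads one at a time, retaining only the distinct k-mers."""
    seen: Set[str] = set()
    for read in reads:  # the read itself is not stored after this iteration
        seen.update(kmers(read, k))
    return seen


def simulated_reads(genome: str, read_len: int, coverage: int) -> Iterator[str]:
    """Hypothetical error-free read source: tile the genome `coverage` times."""
    for _ in range(coverage):
        for start in range(len(genome) - read_len + 1):
            yield genome[start:start + read_len]


if __name__ == "__main__":
    genome = "ACGTTGCATGCCAGT"  # toy genome, an assumption for illustration
    for coverage in (1, 10, 100):
        state = stream_distinct_kmers(simulated_reads(genome, 8, coverage), 4)
        # The input grows with coverage; the retained state does not.
        print(coverage, len(state))
```

Real sequencing data contains errors and reads from both strands, so a practical streaming assembler would also need to filter erroneous k-mers and handle reverse complements; the sketch leaves both out.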
Participants at the University of Iceland