Pattern recognition on read positioning in next generation sequencing

dc.contributor.author: Byeon, Boseon
dc.contributor.author: Kovalchuk, Igor
dc.date.accessioned: 2016-10-27T22:25:18Z
dc.date.available: 2016-10-27T22:25:18Z
dc.date.issued: 2016
dc.description: Sherpa Romeo green journal: open access
dc.description.abstract: The usefulness and the utility of the next generation sequencing (NGS) technology are based on the assumption that the DNA or cDNA cleavage required to generate short sequence reads is random. Several previous reports suggest the existence of sequencing bias of NGS reads. To address this question in greater detail, we analyze NGS data from four organisms with different GC content, Plasmodium falciparum (19.39%), Arabidopsis thaliana (36.03%), Homo sapiens (40.91%) and Streptomyces coelicolor (72.00%). Using machine learning techniques, we recognize the pattern that the NGS read start is positioned in the local region where the nucleotide distribution is dissimilar from the global nucleotide distribution. We also demonstrate that the mono-nucleotide distribution underestimates sequencing bias, and the recognized pattern is explained largely by the distribution of multi-nucleotides (di-, tri-, and tetra-nucleotides) rather than mono-nucleotides. This implies that the correction of sequencing bias needs to be performed on the basis of the multi-nucleotide distribution. Providing companion software to quantify the effect of the recognized pattern on read positioning, we exemplify that the bias correction based on the mono-nucleotide distribution may not be sufficient to clean sequencing bias.
dc.description.peer-review: Yes
dc.identifier.citation: Byeon, B., & Kovalchuk, I. (2016). Pattern recognition on read positioning in next generation sequencing. PLoS ONE, 11(6), e0157033. doi:10.1371/journal.pone.0157033
dc.identifier.uri: https://hdl.handle.net/10133/4637
dc.language.iso: en_CA
dc.publisher: Public Library of Science
dc.publisher.department: Department of Biological Sciences
dc.publisher.faculty: Arts and Science
dc.publisher.institution: University of Lethbridge
dc.subject: Next generation sequencing
dc.subject: Plasmodium falciparum
dc.subject: Arabidopsis thaliana
dc.subject: Homo sapiens
dc.subject: Streptomyces coelicolor
dc.subject: Read positioning
dc.subject: Pattern recognition
dc.subject: Nucleotide distribution
dc.title: Pattern recognition on read positioning in next generation sequencing
dc.type: Article
Files
Original bundle
Name: Kovalchuk pattern recognition on read.pdf
Size: 479.77 KB
Format: Adobe Portable Document Format