dc.contributor.author Byeon, Boseon
dc.contributor.author Kovalchuk, Igor
dc.date.accessioned 2016-10-27T22:25:18Z
dc.date.available 2016-10-27T22:25:18Z
dc.date.issued 2016
dc.identifier.citation Byeon, B., & Kovalchuk, I. (2016). Pattern recognition on read positioning in next generation sequencing. PLoS ONE, 11(6), e0157033. doi:10.1371/journal.pone.0157033 en_US
dc.identifier.uri https://hdl.handle.net/10133/4637
dc.description Sherpa Romeo green journal: open access en_US
dc.description.abstract The usefulness and the utility of the next generation sequencing (NGS) technology are based on the assumption that the DNA or cDNA cleavage required to generate short sequence reads is random. Several previous reports suggest the existence of sequencing bias of NGS reads. To address this question in greater detail, we analyze NGS data from four organisms with different GC content, Plasmodium falciparum (19.39%), Arabidopsis thaliana (36.03%), Homo sapiens (40.91%) and Streptomyces coelicolor (72.00%). Using machine learning techniques, we recognize the pattern that the NGS read start is positioned in the local region where the nucleotide distribution is dissimilar from the global nucleotide distribution. We also demonstrate that the mono-nucleotide distribution underestimates sequencing bias, and the recognized pattern is explained largely by the distribution of multi-nucleotides (di-, tri-, and tetra-nucleotides) rather than mono-nucleotides. This implies that the correction of sequencing bias needs to be performed on the basis of the multi-nucleotide distribution. Providing companion software to quantify the effect of the recognized pattern on read positioning, we exemplify that the bias correction based on the mono-nucleotide distribution may not be sufficient to clean sequencing bias. en_US
dc.language.iso en_CA en_US
dc.publisher Public Library of Science en_US
dc.subject Next generation sequencing en_US
dc.subject Plasmodium falciparum en_US
dc.subject Arabidopsis thaliana en_US
dc.subject Homo sapiens en_US
dc.subject Streptomyces coelicolor en_US
dc.subject Read positioning en_US
dc.subject Pattern recognition en_US
dc.subject Nucleotide distribution en_US
dc.title Pattern recognition on read positioning in next generation sequencing en_US
dc.type Article en_US
dc.publisher.faculty Arts and Science en_US
dc.publisher.department Department of Biological Sciences en_US
dc.description.peer-review Yes en_US
dc.publisher.institution University of Lethbridge en_US

