Machine learning methods for the prediction of antimicrobial resistance and identification of novel markers of resistance in Escherichia coli

Date
2023
Authors
Moat, Janice
University of Lethbridge. Faculty of Arts and Science
Publisher
Lethbridge, Alta. : University of Lethbridge, Dept. of Chemistry and Biochemistry
Abstract
Antimicrobial-resistant strains of pathogenic Escherichia coli are a burden on the healthcare system, causing longer hospital stays and higher treatment costs than nonresistant strains. The proportion of E. coli infections in Canada caused by resistant strains producing extended-spectrum beta-lactamase rose from 3.4% in 2007 to 11.1% in 2017. Most antimicrobials are not recommended for treating Shiga toxin-producing E. coli infection because of their propensity to increase toxin production. Rapid detection of resistant strains would improve both the treatment and the prevention of infections caused by this pathogen. With whole genome sequencing (WGS) now ubiquitous in the analysis of outbreak and surveillance samples, in silico methods can be both faster and cheaper than traditional wet-lab methods. In this work, machine learning (ML) classification methods were used to predict antimicrobial resistance and to identify potentially novel genomic markers of resistance in E. coli. There are four supplementary files to accompany Chapter 3. Supplementary Figure 1 is the phylogenetic tree of the 4300 E. coli isolates with Salmonella as an outgroup; it is coloured by country of origin and serotype and paired with legends for each. Supplementary Table 1 contains the list of publicly available genomes collected and their corresponding metadata. Supplementary Table 2 contains the top features extracted from trained machine learning models and their annotations. Supplementary Table 3 contains the accuracy data for training machine learning models across a range of 100 to 5000 features. Supplementary Table 4 contains the complete performance data for the 11-mer and 25-mer machine learning models.
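The abstract refers to 11-mer and 25-mer models. As a rough illustration only, and not the thesis's actual pipeline, k-mer features of this kind can be obtained by counting every length-k substring of a genome sequence. The function name `count_kmers` and the toy sequence below are hypothetical and chosen purely for the example.

```python
from collections import Counter


def count_kmers(sequence: str, k: int) -> Counter:
    """Count every overlapping k-mer in a DNA sequence.

    Hypothetical helper for illustration; windows containing
    characters other than A, C, G, T (e.g. N) are skipped.
    """
    sequence = sequence.upper()
    valid = set("ACGT")
    counts = Counter()
    for i in range(len(sequence) - k + 1):
        kmer = sequence[i:i + k]
        if set(kmer) <= valid:
            counts[kmer] += 1
    return counts


# Toy sequence, not real data; k=3 keeps the output readable.
toy_counts = count_kmers("ATGCGATGCA", 3)
print(toy_counts.most_common(2))
```

In a setting like the one described, each distinct k-mer would typically become one column of a feature matrix, with one row per isolate, before a classifier is trained on known resistance phenotypes.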
Keywords
Bioinformatics, Machine learning, Antimicrobial resistance, Bacteria, E. coli