ABSTRACT
The fluoroquinolone-resistant sequence type 1193 (ST1193) of Escherichia coli, from the ST14 clonal complex (STc14) within phylogenetic group B2, has appeared recently as an important cause of extraintestinal disease in humans. Although this emerging lineage has been characterized to some extent using conventional methods, it has not been studied extensively at the genomic level. Here, we used whole-genome sequence analysis to compare 355 ST1193 isolates with 72 isolates from other STs within STc14. Using core genome phylogeny, the ST1193 isolates formed a tightly clustered clade with many genotypic similarities, unlike ST14 isolates. All ST1193 isolates possessed the same set of four chromosomal mutations conferring fluoroquinolone resistance, carried the fimH64 allele, and were lactose non-fermenting. Analysis revealed an evolutionary progression from K5 to K1 capsular types and acquisition of an F-type virulence plasmid, followed by changes in plasmid structure congruent with genome phylogeny. In contrast, the numerous identified antimicrobial resistance genes were distributed incongruently with the underlying phylogeny, suggesting frequent gain or loss of the corresponding resistance gene cassettes despite retention of the presumed carrier plasmids. Pangenome analysis revealed gains and losses of genetic loci occurring during the transition from ST14 to ST1193 and from the K5 to K1 capsular types. Using time-scaled phylogenetic analysis, we estimated that current ST1193 clades first emerged approximately 25 years ago. Overall, ST1193 appears to be a recently emerged clone in which both stepwise and mosaic evolution have contributed to epidemiologic success.
INTRODUCTION
Extraintestinal pathogenic Escherichia coli (ExPEC) is a significant source of morbidity and mortality to humans. The enhanced virulence and colonization capacities of ExPEC strains enable them to cause a range of extraintestinal diseases, including urinary tract infection, urosepsis, wound infection, and neonatal meningitis (1).
The ExPEC population structure is dynamic. For example, within the past 15 years a major shift in lineages occurred, with the rise of the fluoroquinolone-resistant H30R subclone within sequence type ST131, a lineage that was previously rare among ExPEC causing human infections (2). A hallmark of ST131-H30R is its acquisition of multiple chromosomal mutations that confer fluoroquinolone resistance, which likely contributed to the lineage's emergence (2). However, the global rise of multidrug-resistant clonal groups such as ST131-H30R is also likely enabled by additional as-yet-undefined traits, possibly including virulence genes, that promote ecological success.
Recently, another emerging fluoroquinolone-resistant ExPEC clonal group was identified among human-source clinical E. coli isolates: ST1193 (3–5). ST1193 isolates are O type O75 and are derived from the ST14 clonal complex (STc14), which, like ST131, is part of virulence-associated phylogenetic group B2, which accounts for most human E. coli clinical isolates (5). ExPEC ST1193 isolates have been shown to display strong mutator ability and biofilm formation capacity (4). Furthermore, ST1193 isolates have been recovered from humans across the globe, including Australia (5), China (6–8), South Korea (9, 10), Norway (11), and the United States (3). ST1193 isolates have also been recovered from companion animals in Australia (5) and Japan (12). While most of these isolates originated from fecal or urogenital sources, ST1193 also was implicated in a recent case of lethal neonatal meningitis (13). Other traits documented in ST1193 include lactose non-fermentation, presence of the K1 or K5 bacterial capsule, and resistance to diverse non-fluoroquinolone antibiotics, sometimes including extended-spectrum cephalosporins (3). ST1193 isolates have been shown to possess IncI1 plasmids capable of transferring these drug resistance phenotypes to susceptible E. coli recipients (6).
While ST1193 appears to be an emergent ExPEC clonal group, little is known about its total genomic repertoire. Thus, we sought to analyze isolates of the emergent ST1193 clonal group using whole-genome approaches.
RESULTS
Phylogeny, fimH alleles, and capsule variants. Using reference-based mapping, identification of single-nucleotide polymorphisms (SNPs), and SNP-based phylogenetic analysis, all ST1193 isolates (Table 1) clustered together phylogenetically, well separated from isolates from other STs within STc14 (Fig. 1; also see Data Set S1 in the supplemental material). All ST1193 isolates possessed the fimH64 allele, while other members of STc14 possessed the fimH27 allele. All of the ST14 and ST550 isolates analyzed possessed the K5 capsular genotype, as did a basal cluster of 26 ST1193 isolates (7.3% of total), all with earlier isolation dates (2008 to 2015). In contrast, most ST1193 isolates (92.7% of total) lacked the K5 capsule genotype and instead possessed the K1 capsular genotype.
Escherichia coli isolates used in this study (n = 427)
Phylogenetic tree of ST1193 (n = 355) and non-ST1193 STc14 (n = 72) isolates based on single-nucleotide polymorphisms in nonrecombinant core genome regions. In the tree, bright green is ST1193 with IncF-:A1:B1 plasmids, red is ST1193 with IncF-:A1:B10 plasmids, pink is ST1193 with IncF-:A1:B20 plasmids, and black is non-ST1193, STc14. Blue-colored circles indicate the presence of specific capsular types, and purple-colored squares indicate presence of F-plasmid alleles. Isolate names colored green indicate the ST1193 isolates sequenced in this study.
Fluoroquinolone resistance mutations. All of the ST1193 isolates had the same four nonsynonymous mutations in parC (S80I), parE (L416F), and gyrA (D87N and S83L) housekeeping genes that are known to confer fluoroquinolone resistance (Fig. 1) (14). We were unable to find either a basal ST1193 strain not harboring these mutations or a stepwise pattern of mutation acquisition in ancestral STc14 isolates, although several STc14 isolates harbored one or the other (but not both) of the gyrA mutation(s), i.e., D87N or S83L. In a SNP-based phylogenetic tree constructed using core genome regions with recombinant regions removed via Gubbins, the K5 and K1 ST1193 isolates segregated completely into discrete clades (Fig. 1).
lacY mutation. Additionally, because all ST1193 isolates displayed a lactose-negative phenotype, we examined the lac operon for possible deletions or mutations. We found a deletion of a thymine at nucleotide position 239 in ST1193 isolates that resulted in a frameshift mutation and disruption of lacY. This deletion was consistent across ST1193 isolates. The same deletion was identified in several non-ST1193, STc14 isolates, indicating the presence of this mutation prior to the emergence of ST1193.
Virulence genes. Virulence genes identified among the present ST1193 isolates included the glutamate decarboxylase gene gad (355/355, 100%), IrgA homologue adhesin gene iha (353/355, 99.4%), secreted autotransporter gene sat (340/355, 95.8%), plasmid-carried enterotoxin gene senB (307/355, 86%), and vacuolating autotransporter gene vat (333/355, 93.8%) (Table 2). We also explored these genes in the context of a recently completed ST1193 genome from neonatal meningitis-associated strain MCJCHV-1 (GenBank accession no. CP030111) (15). In strain MCJCHV-1, sat was found on a chromosomal genomic island bounded by IS3 elements, also containing the aerobactin siderophore system. The vat gene was found adjacent to the tRNA-Thr gene, bounded by an IS1 element. The senB gene was found on a plasmid in a locus described previously, also containing the ColIa immunity gene (16).
Prevalence of plasmid-associated replicons and F-plasmid alleles among Escherichia coli ST1193 and non-ST1193, STc14 isolates
The high prevalence of senB among ST1193 isolates was of interest, because this plasmid-borne gene encodes a toxin associated with highly virulent or dominant clonal groups, such as ST131 and ST95 (17). This prompted us to examine further the plasmid content of the ST1193 isolates, using a subset containing isolates sequenced in this study and representative database isolates (Fig. 2). Most ST1193 isolates contained a RepFIA replicon and a RepFIB replicon but lacked a RepFII replicon (Table 2). Plasmid multilocus sequence typing (MLST) based on F plasmid alleles showed that RepFIA allele A1 was highly prevalent across ST1193 isolates. However, three different alleles of RepFIB were identified (B1, B10, and B20), and these segregated precisely according to the phylogeny (Fig. 1 and 2).
Phylogenetic distribution of accessory traits in representative ST1193 and non-ST1193 STc14 isolates. Squares, colored by trait category, represent the presence of a trait examined. Mutations refer to mutations conferring fluoroquinolone resistance. Year (green to red, older to recent) and ST are categorically colored for comparison. Plasmid mapping refers to the presence of a region (>95% similarity and >80% length) relative to plasmid p1ESCUM.
To determine predicted plasmid structure, BLASTN searches against ST1193 draft genome assemblies were done using reference plasmid p1ESCUM (GenBank accession no. CU928148), which is from strain UMN026, a serotype O17:K52:H18 cystitis isolate from STc69 (Fig. 2) (18). This plasmid was used because it contains six distinct regions typical for this plasmid type, each bounded by repetitive elements. The six regions contained (i) the RepFIB replicon, (ii) the colicin immunity (ColIa) region and senB, (iii) Island 1, containing a predicted ABC-type transport system associated with iron transport and a putative DNA-binding transcriptional regulator (16), (iv) the RepFII replicon, (v) F-plasmid transfer genes, and (vi) plasmid stability genes. As expected based on plasmid replicon typing, the region containing RepFIB was highly prevalent (107/113, 95%), whereas the region containing RepFII was uncommon (4/113, 4%). Other high-prevalence regions included those containing ColIa (96/113, 85%), Island 1 (102/113, 90%), and the plasmid stability/maintenance locus (76/113, 67%). In contrast, the F transfer region was very uncommon (2/113, 2%), and we were unable to detect any mobilization genes, indicating loss of transfer capability by these plasmids.
We also mapped the reads from the same subset of isolates onto the completed ST1193 plasmid, pNMEC-O75A (GenBank accession no. CP030112) (15) (Fig. 3). This plasmid is smaller than p1ESCUM and has truncations of several of the above-mentioned p1ESCUM regions, plus complete absence of an F transfer region. Mapping to pNMEC-O75A confirmed that although F plasmid alleles differed across ST1193 isolates, the overall level of conservation of this plasmid and its key traits was high. Interestingly, in many of the isolates, Island 1 (containing putative iron transport and utilization systems) was partially deleted, and this deletion was phylogenetically congruent. In contrast, the ColIa island containing senB was intact and highly conserved.
Mapping of sequencing reads from ST1193 and non-ST1193 STc14 isolates against plasmid pNMEC-O75A in the context of F-plasmid replicon. In the sequencing maps, black peaks indicate regions with exact matches, colored peaks indicate matching regions with nucleotide variation, and white regions indicate areas with no mapping of reads.
The ST1193 isolates also contained replicons suggesting large and small plasmid types. The large-plasmid replicons, which included RepI1, RepQ1, RepL/M, and RepB/O/K/Z (Fig. 2), did not cluster phylogenetically in the core genome tree, indicating movement and/or loss between isolates and/or multiple acquisitions from non-ST1193 isolates. The small-plasmid replicons, which included Col156, Col(BS512), ColRNAI, Col(MG828), and Col8282, did not mimic the phylogenetic distribution of the F-type plasmids.
Among the ST1193 isolates, the most prevalent resistance genes included blaTEM-1B (ampicillin resistance), sul2 (sulfonamide resistance), strAB (streptomycin resistance), dfrA17 (trimethoprim resistance), and mphA (macrolide resistance) (19) (Table 3). Different patterns of co-occurrence were observed that depended on the F plasmid allele (Fig. 2). Isolates containing the F-:A1:B20 allele often possessed blaTEM-1B-aadA5-aac(3)-IId-strAB-dfrA17-mphA-sul1-sul2-tetA (7/14, 50%), those containing the F-:A1:B1 allele often possessed blaTEM-1B-strAB-dfrA14-mphA-sul2-tetB (18/32, 56%), and those containing the F-:A1:B10 allele often possessed blaTEM-1B-strAB-dfrA17-mphA-sul2-tetB (34/240, 14%) or close derivatives.
Prevalence of antimicrobial resistance genes among ST1193 and non-ST1193, STc14 isolates
Because of the lack of an immediate ancestor basal to ST1193, we used time-scaled analysis to focus on the divergence of clades containing K1 and K5 capsular types and different F-type plasmids within ST1193. The most recent common ancestor (MRCA) was dated to 1993 (95% highest posterior density [HPD] interval, 1986 to 1999), when the two main clades (K1 and K5 associated) diverged (Fig. 4). This was followed by another divergence within the K1 clade in 1995 (95% HPD interval, 1989 to 2001), which split the K1 clade into two subclades. The subclades again were congruent with switches in F plasmid alleles. The overall mutation rate within the ST1193 phylogeny was estimated as 4.03 × 10^−7 substitutions per site per year (95% HPD interval, 2.67 × 10^−7 to 5.55 × 10^−7), which, given the length of the reference genome (4,639,328 bases; GenBank accession no. ERR1015392), equates to 1.87 substitutions per genome per year (95% HPD interval, 1.24 to 2.57).
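The conversion from a per-site rate to a per-genome rate reported above is a simple multiplication by the reference genome length. The following minimal Python sketch reproduces that arithmetic from the values given in the text; the function and variable names are illustrative only and are not part of the study's pipeline.

```python
# Convert a per-site substitution rate into a per-genome rate.
# The genome length and rates are the values quoted in the text;
# the names used here are illustrative assumptions.
GENOME_LENGTH_BP = 4_639_328  # reference genome length in bases


def per_genome_rate(rate_per_site_per_year: float,
                    genome_length_bp: int = GENOME_LENGTH_BP) -> float:
    """Return substitutions per genome per year."""
    return rate_per_site_per_year * genome_length_bp


mean_rate = per_genome_rate(4.03e-7)   # point estimate
hpd_low = per_genome_rate(2.67e-7)     # lower 95% HPD bound
hpd_high = per_genome_rate(5.55e-7)    # upper 95% HPD bound

print(f"{mean_rate:.2f} ({hpd_low:.2f} to {hpd_high:.2f}) "
      "substitutions per genome per year")
# prints: 1.87 (1.24 to 2.57) substitutions per genome per year
```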
Time-scaled phylogenetic analysis of 111 ST1193 isolates collected in the U.S. between 2008 and 2018. Red node bars indicate the 95% highest posterior density interval of the node position. Purple circles indicate nodes with at least 70% posterior support. Brown bars at the right indicate the presence of capsular types K1 and K5, and purple bars at right indicate presence of F plasmid alleles.
Because ST14 per se can be regarded as a quasi-ancestor to ST1193, we used Roary and Scoary for pangenome analysis of ST1193 and ST14 isolates to define genetic differences between these two clonal groups that might underlie the recent ST1193 expansion and between the K1- and K5-containing subsets within ST1193 that might underlie the seeming current dominance of K1 strains. A pangenome illustration using a heatmap suggested the occurrence of genetic gains and losses during the transition from ST14 to ST1193, plus additional gains and losses within ST1193 during the transition from the K5 to the K1 capsular type (Fig. 5). According to Scoary, 218 predicted proteins were significantly associated with ST1193 rather than ST14 (Bonferroni adjusted P value of <0.05), whereas 440 predicted proteins were significantly associated with ST14 rather than ST1193. Likewise, within ST1193, 186 predicted proteins were significantly associated with the K1 capsular type, and 165 predicted proteins were significantly associated with the K5 capsular type (Tables S1 and S2).
Heatmap of presence or absence of non-core predicted proteins across STc14 isolates sequenced in this study. Red, presence of predicted protein; cream, absence of predicted protein. Heatmap was generated using Ward two-way hierarchical clustering with the Euclidean distance method.
Although many of the predicted proteins within these subsets were hypothetical proteins of unknown function, some had assigned function. Within the latter subset, an additional filter was applied that considered only predicted proteins of assigned function that occurred in <25% of comparator group isolates (i.e., ST14 isolates or K5-containing ST1193 isolates) to further refine the list to proteins not highly prevalent in comparator groups (Tables S1 and S2 and Data Set S2). Several genes and systems of interest were significantly associated with either ST1193 generally (rather than ST14) or the K1-containing (rather than K5-containing) ST1193 isolates. ST1193 (but not ST14)-associated regions included the putative fimbrial operon yraHIJK (which was present alone on a chromosomal island), an AIDA-I-like autotransporter, K1 capsule-associated genes, a chromosomal integrated conjugative element, sat, and several regulation-associated proteins. Within ST1193, K1 (rather than K5)-associated regions of interest included K1 capsule-associated genes, mercury resistance-associated genes, and sat.
To better place ST1193 in the broader context of STc14, minimal spanning trees based on allelic variation across the core genome MLST (cgMLST) loci were created using isolates from Enterobase that represented ST14 or one of its single-locus variants according to conventional 7-locus MLST (Fig. 6). Strict ST14 isolates were clearly divided by their predicted membership in the O75 or O18 serogroup. The remaining STs were predicted to belong to the O75 serogroup. ST1193 formed a distinct cluster, most closely related (according to distance) to other discrete clusters containing isolates from ST1057, ST404, ST550, and ST537.
Enterobase GrapeTree of 484 STc14 isolates, generated using core genome multilocus sequence typing (cgMLST). Using the Achtman 7-gene MLST scheme in Enterobase, a search for ST14 with a mismatch setting of 1 allele resulted in a database of 484 STc14 strains. cgMLST was used in the experimental data setting to generate GrapeTree (minimal spanning tree) images. The upper image is colored according to ST, and the lower image is colored according to predicted serogroup. In the keys, the numbers in brackets represent number of isolates.
DISCUSSION
This study begins to clarify the genetic events that led to the current emergence of ST1193. First, this work shows that ST1193 isolates display clear patterns of evolution that involve several key events, including acquisition of fluoroquinolone resistance, as previously described (3, 5, 7), and gains/losses in chromosomal genomic islands and plasmid content. Although these traits displayed excellent phylogenetic congruence, acquired antibiotic resistance genes, in contrast, did not cluster phylogenetically and were less prevalent within clades than known plasmid types themselves. Notwithstanding some general co-occurrence of specific plasmid alleles with specific resistance genes and combinations thereof, it was evident that the gene cassettes carrying these resistance genes have frequently been lost, while the corresponding carrier plasmid has generally been retained at high prevalence (Table 2). We lack data regarding antibiotic use by the source patients; therefore, it is unknown if these isolates were under selective pressure at the time of sampling. Nevertheless, our observations suggest that these antimicrobial resistance cassettes are unstable in ST1193.
Time-scaled phylogenetic analysis estimated that the current ST1193 clades diverged from their MRCA in 1993, which is congruent with the divergence estimation for ST131-H30, the emergent fluoroquinolone resistance-associated clade within ST131 (20). The increased use of quinolones at that time (21) is consistent with the emergence of both of these fluoroquinolone-resistant clonal groups and plausibly contributed to their advent. The higher mutation rate estimated here for ST1193 (1.87 substitutions per genome per year) versus that estimated previously for ST131-H30 (1.0 substitutions per genome per year) (20) suggests an increased propensity for ST1193 to acquire additional mutations. Conceivably, this could provide ST1193 with an advantage over ST131 with respect to adaptation toward environmental and within-host success, although this remains to be confirmed experimentally.
The highly prevalent signature F-type plasmid observed within ST1193 has been identified repeatedly among globally important ExPEC clones (15–17, 22). This plasmid carries two highly conserved regions, the first encoding ColIa immunity and the senB enterotoxin, the second a putative iron transport system and an undefined ABC-type transport system. Both regions were highly conserved among the present ST1193 isolates, suggesting their importance in either clonal virulence or fitness. In a different clonal background, this plasmid has been shown to confer enhanced bladder colonization and invasion during the acute stages of experimental infection, in part due to senB (17). Given this plasmid's presence in many successful ExPEC clones, its established role in experimental cystitis, and its conservation within ST1193, it may contain virulence or fitness factors important for ST1193's epidemiologic success.
Nearly all ST1193 isolates possess an F-type plasmid that is apparently deficient in plasmid transfer because of a full deletion of its transfer locus. A transfer locus was detected in only two isolates. Even so, these could be F-type plasmids other than those of our focus, as different F plasmids co-occur frequently in E. coli (23). Deletion or disruption of the F-transfer locus has been observed previously in successful ExPEC clonal groups, including ST131 (16, 23). This conflicts with the established concept of bacterial altruism, or sharing of beneficial mobile genetic elements between members of the community (24). Since host bacteria that promote plasmid transfer are outcompeted by hosts with a lower transfer rate, it is possible that loss of transfer-associated traits by ST1193 and ST131 has allowed them to become better adapted and/or more fit clonal groups, and this is reflected by the evolution of their plasmids. Alternatively, it is also possible that this was simply a one-off event irrespective of fitness. However, numerous examples exist of successful E. coli clones in which key plasmids appear to undergo fixation through truncation (23); this phenomenon warrants further study, including specifically study of ST1193.
Numerous other predicted proteins significantly differentiated ST1193 from ST14 strains within STc14, and K1 from K5 capsule-containing strains within ST1193. Some of these genomic differences may have contributed to the success of ST1193 generally and of its K1 capsule-containing subset in particular, the determination of which is important and awaits additional work. K1 capsule has been previously associated with serum resistance (25). Although it appears that within ST1193 the K1 capsule-containing subset is more recent and currently dominant, this conclusion may reflect biases in our strain set and likewise awaits future confirmation.
Based on phylogenetic, phenotypic, and genomic observations, we highlight key events during the evolution of ST1193 (Fig. 7). First, our data indicate that a frameshift mutation in lacY, potentially conferring a lactose-negative phenotype, occurred in STc14 prior to the divergence of ST1193, based on the presence of this mutation in all ST1193 and some basal STc14 strains. Our data also suggest that a subsequent non-lactose-fermenting hypothetical ancestor to ST1193, within STc14, acquired mutations in genes conferring fluoroquinolone resistance. These mutations may have occurred prior to, or following, acquisition of the ST1193 genotype. At some point a switch also occurred in the fimH allele, from H27 (present in ancestral ST14) to H64 (present in ST1193). Based on our data, a fluoroquinolone-resistant and lactose-negative ST1193 ancestor then either acquired an F-:A1:B20-type plasmid from a closely related E. coli strain, such as one from the O25 or O6 serogroup, or a preexisting plasmid within STc14 could have recombined with other ColIa-encoding plasmids to create the ST1193 virulence plasmid. A subsequent recombinational event within ST1193 likely switched the ST1193-associated capsular type from K5 to K1. Plasmid recombination involving switching of RepFIB alleles and insertions/deletions of other plasmid regions also apparently has been occurring throughout ST1193's short-term evolution. This is reminiscent of plasmid swapping and remodeling observed previously in ST131 and its H30R1 and H30Rx subclades, involving variations of the same plasmid type as that observed in ST1193 (16).
Postulated key events leading to the currently circulating ST1193 fluoroquinolone-resistant clonal group. RepFIB alleles B1, B10, and B20 are depicted as boxes at the lower right.
A study limitation was our inability to identify the immediate fluoroquinolone-susceptible predecessor to currently circulating fluoroquinolone-resistant ST1193 strains. Thus, our model of evolution assumes that ST1193’s unexplained epidemiological success occurred following the development of fluoroquinolone resistance. It is possible that phylogenetic subsets within ST1193 cluster geographically, consistent with geographical microevolution, but a second limitation of this study was that too few of the studied non-U.S. isolates had verified geographical isolation sources to allow confident conclusions regarding geographical clustering. Some of these isolates were selected because of their decreased susceptibility to ciprofloxacin. Whether the observation that all currently sequenced ST1193 isolates are fluoroquinolone resistant reflects sampling bias toward fluoroquinolone resistance or indicates that this phenotype is present in all ST1193 derivatives remains to be determined. Regardless, this study demonstrates stepwise evolution of yet another successful multidrug-resistant ExPEC clonal group that has important implications for human health.
MATERIALS AND METHODS
Isolates. A total of 427 E. coli isolates were studied (Table 1). Of these, 118 were sequenced in this study. For these 118, prospectively collected clinical and fecal isolates were obtained from the Minnesota Veterans Administration Medical Center clinical laboratory (n = 76) (26, 27), nationally distributed Veterans Administration Medical Centers (n = 3) (2), and the University of Minnesota Medical Center clinical laboratory (n = 5) (28). Additional strains analyzed were from the E. coli Reference (ECOR) collection (n = 1) (29), a collection of Australian ST1193 clinical isolates (n = 28) (5), and a historical collection of blood isolates from humans (n = 5) (30). All isolates were obtained in pure culture in a manner similar to that described previously (30). Lactose fermentation was noted on lactose TTC agar with Tergitol 7. These 118 isolates included 24 of fecal origin and 94 of clinical origin (see Data Set S1 in the supplemental material).
Enterobase, a repository for E. coli genomic data (http://enterobase.warwick.ac.uk/), was also searched for isolates belonging to ST1193 (according to the Achtman 7-gene MLST scheme) or non-ST1193 STc14 isolates predicted as serogroup O75. STc14 was defined as ST14 and its single-locus variants. Genome sequences for the 309 isolates so identified were obtained in silico as raw fastq reads from Enterobase (48). Accession numbers for these isolates are listed in Data Set S1. Notably, many of these isolates were sequenced as part of a larger community initiative through the Wellcome Trust Sanger Institute. Although most of the Enterobase isolates were of unknown source, they were used for comparison purposes because of their membership in ST1193 or serogroup O75. Selected O75 isolates belonged to three constituent STs from STc14, i.e., ST1193, ST550, and ST14. Finally, E. coli strain PAR (GenBank accession no. CP012379) was used as a reference genome for mapping because, at the time of this analysis, it was the closest completed genome to ST1193. This was done by mapping reads of a known ST1193 isolate to completed E. coli genomes in NCBI and determining the closest phylogenetically related complete genome for high-quality mapping. The purpose of this approach was to identify an appropriate outgroup for mapping and phylogenetic inference.
DNA sequencing and analysis. Isolates described in this study (n = 118) were sequenced using Illumina MiSeq. Isolates were grown overnight from a single colony, taken from a freezer stock of pure culture clinical samples, in 2 ml Tryptic soy broth. DNA was extracted from each isolate using a Qiagen DNeasy kit (Valencia, CA) according to the manufacturer's instructions. Genomic DNA libraries were created using a Nextera XT library preparation kit and Nextera XT index kit v2 (Illumina, San Diego, CA) according to the manufacturer’s instructions. Sequencing was performed at the University of Minnesota Mid-Central Research and Outreach Center (Willmar, MN) using 250-bp dual-index runs on an Illumina MiSeq to generate approximately 20- to 30-fold coverage per genome, although actual coverage ranged from 10× to 35×. Following sequencing, fastq files were trimmed of Nextera adapters and quality trimmed using Trimmomatic with a 4-bp sliding window requiring average quality of greater than 20 (31).
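As an illustration of the sliding-window quality trimming described above, the following Python sketch scans a read from its 5' end and cuts it at the first 4-bp window whose mean quality falls below the threshold. This is a simplified stand-in written for clarity, not Trimmomatic's actual implementation; the function name, the example read, and the choice to retain windows whose mean equals the threshold are assumptions.

```python
from typing import List, Tuple


def sliding_window_trim(seq: str, quals: List[int],
                        window: int = 4,
                        min_avg: float = 20.0) -> Tuple[str, List[int]]:
    """Cut the read at the start of the first window whose mean
    Phred quality is below min_avg (simplified illustration)."""
    # Reads shorter than one window are returned unchanged here
    # (an assumption made for this sketch).
    for start in range(0, len(seq) - window + 1):
        win = quals[start:start + window]
        if sum(win) / window < min_avg:
            return seq[:start], quals[:start]
    return seq, quals


# Hypothetical read whose quality decays toward the 3' end.
read = "ACGTACGTAC"
phred = [30, 32, 31, 30, 28, 25, 12, 10, 8, 5]

trimmed_seq, trimmed_quals = sliding_window_trim(read, phred)
print(trimmed_seq)  # prints: ACGT
```

In this toy read, the window starting at the fifth base (qualities 28, 25, 12, 10; mean 18.75) is the first to fall below 20, so everything from that base onward is removed.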
Genome assembly of MiSeq reads for each sample was performed using SPAdes genome assembler (32). Assemblies were performed using default parameters with automatic k-mer detection. Genome annotation for each assembly was performed using Prokka with E. coli as the target species (33). Gene content analysis was performed using tools available through the Center for Genomic Epidemiology (https://cge.cbs.dtu.dk/services/). Specifically, acquired resistance genes and known chromosomal mutations conferring antibiotic resistance were identified using ResFinder with 90% minimum match and 60% minimum length (34). Plasmid types were identified using PlasmidFinder and pMLST with 95% minimum match and 60% minimum length (35). Virulence genes were identified using VirulenceFinder with 90% minimum match and 60% minimum length (36). Serotypes and fimH alleles were identified with SerotypeFinder (85% minimum match and 60% minimum length) and FimTyper (95% minimum match), respectively. Custom BLASTN searches were used to assess plasmid gene content and possession of K1 versus K5 capsular types based upon known sequences using a minimum match percentage of 95% and minimum length of 80% (5).
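The identity and length thresholds used for the custom BLASTN searches can be applied as a simple filter over tabular hits. The sketch below assumes that "minimum length" means alignment length as a percentage of the query length, which is one common reading; the `Hit` record, its field names, and the example hits are hypothetical and do not come from the study.

```python
from dataclasses import dataclass
from typing import List


@dataclass
class Hit:
    query: str        # name of the query sequence (hypothetical)
    subject: str      # contig in the draft assembly (hypothetical)
    pident: float     # percent identity of the alignment
    align_len: int    # alignment length in bases
    query_len: int    # full length of the query in bases


def passes(hit: Hit, min_identity: float = 95.0,
           min_coverage: float = 80.0) -> bool:
    """Keep hits meeting both the identity and the length thresholds.
    Coverage is computed as alignment length / query length (assumed)."""
    coverage = 100.0 * hit.align_len / hit.query_len
    return hit.pident >= min_identity and coverage >= min_coverage


hits: List[Hit] = [
    Hit("target_A", "contig_12", 99.2, 1150, 1176),  # passes both
    Hit("target_B", "contig_3", 96.0, 400, 717),     # too short
    Hit("target_C", "contig_40", 91.5, 1170, 1176),  # identity too low
]

present = [h.query for h in hits if passes(h)]
print(present)  # prints: ['target_A']
```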
Phylogenetic analysis. Trimmed reads for each sample were mapped to the reference genomes or plasmids using Snippy, with a minimum coverage of 10×, minimum fraction of 0.9, and minimum vcf variant call quality of 100 (https://github.com/tseemann/snippy). Gubbins (version 2.3.1) was used to remove recombinant regions from the resulting alignment file and to generate a maximum likelihood tree following 10 iterations of tree building using RAxML (version 8) (37). The alignment consisted of 1,871 SNP sites for the ST1193-only analysis and 12,824 SNP sites for STc14 analysis. The Interactive Tree of Life website was used to generate images of phylogenetic trees including metadata (38). Mapped plasmid reads were visualized using Integrative Genomics Viewer (39). GrapeTree was used to construct and visualize minimal spanning trees generated using cgMLST analysis with 484 isolates from Enterobase selected using the Achtman 7-gene E. coli MLST scheme for strains within STc14, defined as ST14 and a setting of maximum number of mismatches of one (40). Throughout this work, clades were defined as discrete clusters with >80% bootstrap support (based on at least 1,000 replicates) and with accessory trait (plasmid) patterns congruent with phylogenetic clustering.
Time-scaled phylogenetic analysis. A subset of ST1193 isolates collected from humans in the United States and Australia between 2008 and 2018 with dates of isolation available (n = 111) was aligned to the ST1193 reference genome (ERR1015392) using Snippy and Gubbins. A maximum-likelihood phylogeny with 1,000 bootstrap replicates was created using RAxML (as described above). The alignment consisted of 1,871 SNP sites (in order to correct for ascertainment bias, the total number of each nucleotide in the reference genome was manually incorporated in the xml files of all BEAST models described below). The temporal signal was first investigated using TempEst (41), estimating the relationship between the ages of the isolates and root-to-tip distances (R² = 0.47 with the R² function). Time-scaled phylogeny was constructed using BEAST (version 1.8.4) (42). The transversional (TVM) substitution model (selected using jModelTest [43]) was used for nucleotide substitution. Both uncorrelated lognormal relaxed and strict molecular clocks were explored with different coalescent population models (i.e., constant size, logistic growth, exponential growth, and GMRF Bayesian skyride), and the log marginal likelihoods obtained using path/stepping-stone sampling were compared.
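The temporal-signal check described above amounts to regressing root-to-tip distance on sampling date: the slope approximates the substitution rate, R² measures clock-likeness, and the x-intercept gives a rough date for the root. The following Python sketch shows that regression on invented toy data. It is not TempEst itself (which also optimizes the root position), and the dates and distances are hypothetical values chosen only to make the example run.

```python
from typing import List, Tuple


def linear_regression(x: List[float],
                      y: List[float]) -> Tuple[float, float, float]:
    """Ordinary least squares fit; returns slope, intercept, and R^2."""
    n = len(x)
    mean_x = sum(x) / n
    mean_y = sum(y) / n
    sxx = sum((xi - mean_x) ** 2 for xi in x)
    syy = sum((yi - mean_y) ** 2 for yi in y)
    sxy = sum((xi - mean_x) * (yi - mean_y) for xi, yi in zip(x, y))
    slope = sxy / sxx
    intercept = mean_y - slope * mean_x
    r_squared = (sxy ** 2) / (sxx * syy)
    return slope, intercept, r_squared


# Hypothetical isolation years and root-to-tip distances (in SNPs).
dates = [2008.0, 2010.0, 2012.0, 2014.0, 2016.0, 2018.0]
root_to_tip = [30.0, 35.0, 37.0, 42.0, 44.0, 49.0]

slope, intercept, r_squared = linear_regression(dates, root_to_tip)
root_date = -intercept / slope  # date at which the distance is zero

print(f"rate ~ {slope:.2f} SNPs/year, R^2 = {r_squared:.2f}, "
      f"root ~ {root_date:.0f}")
```

A low R² (such as the 0.47 reported for the study's data) indicates a noisier clock signal than in this toy example, which is one reason relaxed-clock models and date-randomization tests are then used.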
An evolutionary rate of 2.46 × 10^−7 mutations per site per year, previously estimated for E. coli ST131 (20), was used as the mean estimation for the clock rate prior. Each model combination was tested for two independent Markov chain Monte Carlo (MCMC) runs of at least 200 million generations, with sampling every 20,000 generations. The convergence of all MCMC runs (effective sample size of greater than 200) and the agreement between two independent MCMC runs of the same model were verified manually after excluding 10% of the MCMC run as burn-in. The best-fitting model was the uncorrelated lognormal relaxed molecular clock with GMRF Bayesian skyride. To further assess whether the data set was sufficient for accurate molecular dating, date randomization (using 20 repeats) was conducted on the final model MCMC runs using the package TipDatingBeast (44) in R software, and the times to most recent common ancestor (MRCA) were compared between the original MCMC run and the randomized trials. In all cases, no overlap was found between the original MCMC run 95% HPD interval and the randomized trials (data not shown). LogCombiner (45) was used to combine the two independent MCMC runs of the final model after exclusion of a 10% burn-in period. Package ggtree in R software was used for tree visualization.
Discrete trait analysis. BEAST was used for Bayesian ancestral state reconstruction. The final model selected (described above) was used, and two bidirectional (either symmetric or asymmetric) models were used to answer whether the K1 capsule protein was present in the MRCA and was subsequently lost or was absent in the MRCA and was subsequently acquired by one of the clades. Two independent MCMC runs were used for each model, and the best-fitting model was selected as described above.
Pangenome analysis. Roary was used to define the pangenome of isolates sequenced in this study (46). Scoary was used to compare ST1193 isolates to ST14 isolates and K1-containing isolates to K5-containing isolates (47). Predicted proteins were considered significantly associated with a group if the Bonferroni-adjusted P value was ≤0.05.
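The Bonferroni adjustment used as the significance criterion multiplies each raw P value by the number of tests performed and caps the result at 1. The Python sketch below applies that rule to a handful of invented P values; the gene names and values are hypothetical, and in a real pangenome comparison the number of tests would be the number of gene clusters evaluated.

```python
from typing import Dict


def bonferroni(p_values: Dict[str, float]) -> Dict[str, float]:
    """Return Bonferroni-adjusted P values: min(1, p * number_of_tests)."""
    m = len(p_values)
    return {name: min(1.0, p * m) for name, p in p_values.items()}


# Hypothetical raw P values for four gene clusters.
raw = {
    "gene_a": 1e-6,
    "gene_b": 0.004,
    "gene_c": 0.03,
    "gene_d": 0.5,
}

adjusted = bonferroni(raw)

# Apply the threshold stated in the text (adjusted P <= 0.05).
significant = sorted(name for name, p in adjusted.items() if p <= 0.05)
print(significant)  # prints: ['gene_a', 'gene_b']
```

Note that gene_c, with a raw P value of 0.03, would pass an unadjusted 0.05 cutoff but does not survive the correction for four tests.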
Accession number(s). Raw reads from isolates sequenced (n = 118) in this study were deposited in the NCBI Sequence Read Archive (SRA) under BioProject PRJNA487890.
ACKNOWLEDGMENTS
Funding for this study was provided in part through University of Minnesota Rapid Agricultural Response Funds and the Office of Research and Development, Department of Veterans Affairs (J.R.J.). We thank Bonnie Weber for technical laboratory support and David Gordon for providing some of the strains for this study.
FOOTNOTES
- Received 6 September 2018.
- Returned for modification 9 October 2018.
- Accepted 17 October 2018.
- Accepted manuscript posted online 22 October 2018.
Supplemental material for this article may be found at https://doi.org/10.1128/AAC.01913-18.
- Copyright © 2018 American Society for Microbiology.