Nested Russian Doll-Like Genetic Mobility Drives Rapid Dissemination of the Carbapenem Resistance Gene blaKPC

The recent widespread emergence of carbapenem resistance in Enterobacteriaceae is a major public health concern, as carbapenems are a therapy of last resort against this family of common bacterial pathogens. Resistance genes can mobilize via various mechanisms, including conjugation and transposition; however, the importance of this mobility in short-term evolution, such as within nosocomial outbreaks, is unknown. Using a combination of short- and long-read whole-genome sequencing of 281 blaKPC-positive Enterobacteriaceae isolates from a single hospital over 5 years, we demonstrate rapid dissemination of this carbapenem resistance gene to multiple species, strains, and plasmids. Mobility of blaKPC occurs at multiple nested genetic levels, with transmission of blaKPC strains between individuals, frequent transfer of blaKPC plasmids between strains/species, and frequent transposition of blaKPC transposon Tn4401 between plasmids. We also identify a common insertion site for Tn4401 within various Tn2-like elements, suggesting that homologous recombination between Tn2-like elements has enhanced the spread of Tn4401 between different plasmid vectors. Furthermore, while short-read sequencing has known limitations for plasmid assembly, various studies have attempted to overcome this by the use of reference-based methods. We also demonstrate that, as a consequence of the genetic mobility observed in this study, plasmid structures can be extremely dynamic, and therefore these reference-based methods, as well as traditional partial typing methods, can produce very misleading conclusions. Overall, our findings demonstrate that nonclonal resistance gene dissemination can be extremely rapid, presenting significant challenges for public health surveillance and achieving effective control of antibiotic resistance.


Isolate collection and Illumina sequencing
Illumina sequencing was performed as previously (1), with the exception of three isolates, CAV1602, CAV1606 and CAV1697. For these, genomic DNA was extracted using a QIAcube automated sample preparation station (Qiagen, San Diego, CA) and libraries for sequencing were generated and normalized using the Nextera XT DNA library preparation kit (Illumina, San Diego, CA). Whole genome sequencing was performed on a MiSeq benchtop sequencer (Illumina, San Diego, CA, USA), utilizing 500 cycles of paired-end reads.
Isolates from 204 patients were prospectively collected and identified as having blaKPC by PCR, however a small number of these isolates were not viable, were incorrectly labelled, lost blaKPC through laboratory passage or were contaminated during the cataloguing process. We initially performed Illumina sequencing on 294 available blaKPC-positive isolates. In ten isolates, the entire Tn4401 region (including blaKPC) was absent from the sequencing data (as determined by BLASTn comparisons with the isolate's de novo assembly), presumably due to plasmid loss in culture. Three additional isolates showed inconsistencies in species classification (see below). These 13 isolates were therefore excluded, leaving 281 isolates from 182 patients for analysis.

Species classification
Species classification was initially performed in the clinical microbiology lab using the VITEK2 system with the GN ID card (bioMérieux, Durham, NC). To provide an independent classification method, Illumina sequenced reads were taxonomically assigned using Kraken version 0.10.4-beta (2) with the MiniKraken database from March 30 th , 2014. The top species was considered as a positive match if >20% of the reads were assigned to this species. If <10% of reads were assigned to the top species, we postulated that the correct species was most likely not in the MiniKraken database. In this case, a presumed species was determined from the microbiological classification and/or similarity in the Kraken results to other isolates with more definitive microbiological classifications (this applied to Citrobacter amalonaticus, Citrobacter freundii, and Kluyvera intermedia). All isolates were then mapped to speciesspecific references, where available, and species classification was confirmed by the quality of this mapping. For three related isolates from patient AM, the microbiology lab was able to classify only to family level (Enterobacteriaceae), and a species could not be determined by the sequence-based method; these are considered to represent an unknown species and have been labelled as 'Other'. Three further isolates produced inconsistent results from the two classification methods (possible labelling errors) and were therefore excluded from further analyses.

Phylogenetic analysis and strain classification
For each species represented by at least three isolates after the removal of within-patient duplicates (see main Methods), phylogenetic analysis was performed using PhyML version 20120412 (3) with an alignment consisting of all variable sites as identified through mapping to a species-specific chromosomal reference (Supplementary Table 5), padded to the length of the reference with invariant sites of the same GC content as the original data. For species represented by only two isolates, PhyML could not be used. In these cases tree topology is unambiguous and branch length was taken to be the proportion of variant sites between the two isolates.
Genetic clusters were defined using the above phylogenies as follows: Isolates were considered to belong to the same strain if they both descended (directly or indirectly) from an internal node with at least one path to a tip consisting only of branches with length <10 -4 (corresponds to 500 SNVs for a 5 Mb chromosome). This essentially partitions the tree on long branches, but as the cutoff used for a long branch is deliberately quite high, the method will tend to overcluster (i.e. it is conservative in the definition of a chromosomally distinct strain).

Long-read PacBio sequencing
To refine each of the initial PacBio assemblies, Illumina reads from the corresponding isolate were mapped using bwa-mem version 0.7.5a-r405 (4) and visualised using Gap5 v1.2.14-r (5). Unmapped reads were de novo assembled using a5-miseq version 20140401 (6) to identify small plasmids, as these were expected to be absent from the PacBio assembly due to size selection of the input DNA (i.e. <7kb DNA fragments were removed). Plasmid and chromosome structures were closed by resolving repeats at the ends of contigs. As the repeats were generally imperfect, mapping information was used to determine the correct sequence in each case. To validate the final assembly structures, each replicon was relinearised in a non-repeat region, the Illumina reads were remapped, and Gap5 was used for visualisation, confirming the absence of misassemblies. The consensus sequence from mapping was also used to correct minor errors in the assembly sequence (generally single base indels in homopolymeric regions. For one isolate (CAV1344), closed blaKPC plasmid structure(s) could not be obtained and further investigation revealed the most likely cause to be a mixture of plasmid structures in the DNA population that was sequenced. Therefore, we re-isolated DNA from a single colony and repeated PacBio sequencing, which was then successful. We report the results of the latter sequencing, although it should be noted that our Illumina sequence data was obtained from the presumably mixed population.

Analysis of Tn4401 flanking sequences
For each plasmid that was classified as being present, the de novo assembly was further examined to determine, if possible, whether Tn4401 was in fact contained within this plasmid in the expected sequence context. For plasmids where Tn4401 was not inserted into a Tn2like element, 400 bp of flanking sequence from each side of Tn4401 in the reference plasmid sequence was used to query the de novo assembly using BLASTn. For each side, if there was a single contig covering the entire 400 bp region, 400 bp of sequence adjacent to the match was compared with the expected Tn4401 sequence, as well as the sequence on the other side of Tn4401 in the plasmid reference. If both sides matched the expected Tn4401 sequence, then the plasmid was classified as containing Tn4401 in that isolate. If the two sides matched each other over a length of ≥300 bp (i.e. the insertion site was assembled without containing Tn4401, but possibly with minor indels) then the plasmid was classified as not containing Tn4401. In any other situation (generally if there were contig breaks at or close to the Tn4401 insertion sites) then the plasmid was classified as uncertain.
For plasmids where Tn4401 was inserted into a Tn2-like element, the immediate sequence flanking Tn4401 would not be sufficient to distinguish between these different plasmids. Therefore, to classify one of these plasmids as containing Tn4401, we required that for each side of Tn4401, the entire Tn2 region from the reference plasmid, along with 400 bp on either side (i.e. 400 bp of Tn4401 sequence, the Tn2 region, and 400 bp of sequence specific to that plasmid) was present in the de novo assembly as a continuous sequence on a single contig. To classify one of these plasmids as not containing Tn4401, the same method was used as for non-Tn2 plasmids above. In other words, if the region of Tn2 encompassing the Tn4401 integration site was assembled as a continuous sequence on a single contig, then we assumed that Tn4401 could not be integrated into a Tn2-like element in any plasmid in that isolate.
For the identification of novel Tn4401 insertion sites without a flanking Tn2-like element, a similar method was used, with the initial query sequence consisting of 400 bp at either end of Tn4401. The adjacent sequences were then compared with the blaKPC plasmids identified from long-read sequencing, and we report isolates where at least 400 bp of flanking sequence on either side of Tn4401 could be determined, and had little/no homology to any of the known PacBio blaKPC plasmids.

Epidemiological classification
For each patient, blaKPC acquisition source was epidemiologically classified as either "local" (likely acquisition within UVaMC) or "imported" (likely acquisition prior to UVaMC admission), on the basis of hospitalisation time at UVaMC prior to blaKPC isolation, with an arbitrary 48h cutoff (see main Methods for details). While this may have resulted in a small number of misclassifications, this is expected to be minimal, for the following reasons.
Firstly, extensive surveillance screening was performed: Beginning April 27 th 2009, weekly perirectal screening was performed for all patients from selected high risk patient care units where a persistently high incidence of carbapenamase-producing Enterobacteriaceae (CPE) was detected, as well as inpatient units caring for a patient known to be colonized or infected with CPE (for example this resulted in 6860 surveillance cultures performed in 2012) (7,8). Additionally, the median length of hospital stay immediately prior to first blaKPC-positive isolate for the 167 local acquisitions with a sequenced isolate was 13 days (IQR 4-28.5 days), with 62/130 (48%) local acquisitions following screening onset having at least one negative surveillance culture prior to the first positive, 45/62 (73%) of which were within the same hospital stay. Given the extent of screening, the length of hospitalisation prior to blaKPC isolation, and the presence of prior negative cultures in many cases, the classification of "local" acquisitions should be robust.
With respect to the robustness of the "imported" classification, all but one of the imports had recent outside hospital exposure, consistent with blaKPC Enterobacteriaceae being healthcare acquired. The one exception had a blaKPC isolate with 0 SNV differences relative to two other isolates from other patients in the same ward on the same day, indicating that in this case blaKPC was likely acquired within the first 48h of admission (i.e. a false "imported" classification).

Transmission analysis
An upper bound for the number of potential patient-to-patient transmission events was determined by identifying all possible donor-recipient pairs from the 182 patients with sequenced isolates where the donor and recipient patients were on the same ward at the same time at least one day prior to the first isolation of a blaKPC-positive Enterobacteriaceae isolate from the recipient. Patients were considered at risk for being a donor of blaKPC-positive Enterobacteriaceae at any point during hospitalization (i.e. before as well as after their first positive isolate, conservatively including all time in hospital even if the patient had previous negative cultures), and were considered to be carriers for the duration of the study. Potential donor-recipient pairs with shared bacterial strains (<200 SNVs) or Tn4401 variants were then identified. For each level of genetic relatedness (strain or Tn4401 variant), a transmission network was constructed and plausible transmission events were identified to maximise the number of potential recipients using a method adapted from (9).     Figure S1. Distinct blaKPC plasmids identified through long-read PacBio sequencing. Variants of the same plasmid backbone (see Table 1) are not shown. Arrows indicate predicted open reading frames; Tn4401 is shown in purple.