A global baseline for qPCR-determined antimicrobial resistance gene prevalence across environments

The environment is an important component in the emergence and transmission of antimicrobial resistance (AMR). Despite that, little effort has been made to monitor AMR outside of clinical and veterinary settings. Partially, this is caused by a lack of comprehensive reference data for the vast majority of environments. To enable monitoring to detect deviations from the normal background resistance levels in the environment, it is necessary to establish a baseline of AMR in a variety of settings. In an attempt to establish this baseline level, we here performed a comprehensive literature survey, identifying 150 scientific papers containing relevant qPCR data on antimicrobial resistance genes (ARGs) in environments associated with potential routes for AMR dissemination. The collected data included 1594 samples distributed across 30 different countries and 12 sample types, in a time span from 2001 to 2020. We found that for most ARGs, the typically reported abundances in human-impacted environments fell in an interval from 10^-5 to 10^-3 copies per 16S rRNA, roughly corresponding to one ARG copy in a thousand bacteria. Altogether these data represent a comprehensive overview of the occurrence and levels of ARGs in different environments, providing background data for risk assessment models within current and future AMR monitoring frameworks.

Keywords: antibiotic resistance, AMR, qPCR, monitoring, surveillance

[...] year based on the average time required from the data collection to the publication inferred from the studies reporting specific sampling dates, corresponding to an estimate of two years.
4. We divided the collected samples into several "sample types": "air", "biofilm", "feces" (containing both animal and human feces), "manure", "food", "sediments", "soil", "water", including water from different sources such as potable, surface or reclaimed water etc.; "wastewater", consisting of wastewater, sewage and influent samples, because these samples have a similar/overlapping origin; "sludge", which includes semi-solid by-products of different stages of the wastewater treatment process; and "swab", corresponding to samples collected by swabbing the surfaces of washing machines, shower drains and dishwashers with a sterile cotton swab.
5. We further classified collected samples according to exposure to anthropogenic impact: "impacted" when a sample was obviously affected by human activity such as effluent samples or wastewater, "likely impacted" when it was not directly affected by an obvious pollution source but was located in a populated area, "likely unimpacted" if the sample was collected from relatively "pristine" environments, "feces/manure" and "unknown". The "unknown" category included samples which were clearly impacted by human activity, but where the activity aims at removing bacteria, e.g. drinking/tap water.
6. The ARG abundances were collected from text, tables and figures.

[...]

To identify a set of suitable qPCR targets, we investigated how well the data from a subset of genes could predict the abundance of the other genes in the data matrix by performing a correlation analysis. Only genes with more than 10 entries were included in this analysis. The cor function of the R package stats (version 3.6.2) was used with use = "pairwise.complete.obs" to calculate Spearman correlations.
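The correlation step described above can be illustrated with a short sketch. This is not the authors' R code; it is a standard-library Python analogue of Spearman correlation on pairwise-complete observations, where each gene pair is compared using only the samples in which both genes were measured. The gene names, the abundance values and the minimum of three shared samples are assumptions made for illustration.

```python
from itertools import combinations
from math import sqrt


def average_ranks(values):
    """Rank values from 1..n, giving tied values the mean of their ranks."""
    order = sorted(range(len(values)), key=lambda i: values[i])
    ranks = [0.0] * len(values)
    i = 0
    while i < len(order):
        j = i
        # Extend j over the run of values tied with the value at position i.
        while j + 1 < len(order) and values[order[j + 1]] == values[order[i]]:
            j += 1
        mean_rank = (i + j) / 2 + 1
        for k in range(i, j + 1):
            ranks[order[k]] = mean_rank
        i = j + 1
    return ranks


def pearson(x, y):
    """Pearson correlation; returns None if either input is constant."""
    n = len(x)
    mx, my = sum(x) / n, sum(y) / n
    sxy = sum((a - mx) * (b - my) for a, b in zip(x, y))
    sxx = sum((a - mx) ** 2 for a in x)
    syy = sum((b - my) ** 2 for b in y)
    if sxx == 0 or syy == 0:
        return None
    return sxy / sqrt(sxx * syy)


def spearman_pairwise(x, y, min_shared=3):
    """Spearman's rho using only samples where both genes were measured.

    min_shared is an illustrative cutoff, not a value from the study.
    """
    pairs = [(a, b) for a, b in zip(x, y) if a is not None and b is not None]
    if len(pairs) < min_shared:
        return None
    xs, ys = zip(*pairs)
    return pearson(average_ranks(xs), average_ranks(ys))


# Hypothetical log10 relative abundances; None marks "not measured".
abundance = {
    "geneA": [-3.0, -4.0, None, -2.5, -5.0],
    "geneB": [-3.2, -4.1, -3.0, -2.0, None],
    "geneC": [-2.0, None, -4.0, -4.5, -1.0],
}

for g1, g2 in combinations(abundance, 2):
    print(g1, g2, spearman_pairwise(abundance[g1], abundance[g2]))
```

Because each pair uses a different subset of samples, coefficients in the resulting matrix are based on different sample sizes, which is worth keeping in mind when comparing them.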

Current data on AMR in the environment is biased
Mining of the PubMed database resulted in 802 unique papers and revealed a bias in the existing literature towards a few already fairly well studied environments (Figure S2). Unsurprisingly, the majority of the identified papers matched environmental categories that have been already extensively studied as primary sources of antibiotics and/or antibiotic resistance to the environment. These include environments linked to wastewater treatment plants, hospitals and industrial facilities, as well as agriculture and livestock production. This also highlights the undersampled nature of the other environments. Among them are categories associated with human mobility. Human mobility, in particular international travel, could significantly contribute to dissemination of ARGs across the globe [10,21-23]. Despite that, categories associated with travel activity, such as "public transport" and "airports", were represented by just a handful of papers. Another poorly investigated category is water associated with recreational activities. It has been suggested that accidental ingestion of water during, for example, swimming can serve as a potential route for resistant environmental bacteria into the human gut [24].

In total, we identified 150 studies containing 1594 samples providing relevant information on levels of ARGs measured by qPCR. These samples represented 12 different sample types with "water", "feces" and "sediments" being the most common (Figure 1B [...]

[...] or provide resistance to antibiotics of high clinical concern listed by WHO [41]. These include clinically relevant β-lactamases such as blaCTX-M, blaTEM and blaSHV, the sulfonamide resistance gene sul1, as well as ermB and tetM conferring resistance to macrolides and tetracyclines, respectively.
Although these genes are already common both in the environment and among human pathogens, they could be useful to predict general resistance levels or transmission risks. However, their usefulness in environmental monitoring for emerging threats is somewhat questionable. At the same time, they may function as a gauge of the total antibiotic resistance content in an environment, as some of them have been shown to be good predictors of the total resistome, particularly tetM and [...]

[...] among the most reported in our study and was widespread across all the environments. However, sul2 added little information compared to intI1 (Figure S3), so these two genes would be mostly redundant in AMR monitoring.
The plasmid-mediated fluoroquinolone resistance gene qnrS was among the most reported genes in our study. It was found in all environments except manure and swabs. Contrary to what has been reported previously [61,62], we did not detect an enrichment of this gene in polluted environments specifically. As it is the only commonly reported fluoroquinolone ARG, it is an interesting target for monitoring. However, it has the drawback of generally not being able to confer clinically relevant levels of resistance without additional resistance mechanisms, which makes its clinical impact somewhat limited.
Among the most common target genes, several tetracycline resistance genes (including tetA, tetG, tetH, tetO, tetQ and tetW) showed significantly higher abundances in fecal/manure samples than in other human-impacted environments. Furthermore, these genes are among the most powerful ARGs for predicting the diversity and abundance of other ARGs [29] and can therefore, despite their widespread distribution, still be useful in monitoring to extrapolate other parts of the resistome.

There were also several genes that were not often included in qPCR studies, but when they were, they often appeared in relatively high abundances (above one copy per 1000 bacteria) (Figure S5). Such ARGs could be interesting additional monitoring targets in future AMR monitoring schemes. Among these genes was the vancomycin resistance gene vanA, a previously suggested indicator of antibiotic resistance contamination of clinical origin, which is thought to be uncommon in the environment [44]. In our data, however, vanA was about as abundant as other ARGs in the environment when it was looked for. Another gene that seemed to correlate with anthropogenic activities is ereA, a macrolide resistance gene which has previously been reported to be the most abundant in metal-polluted soil [45] and is enriched by long-term application of manure [46]. In contrast, the mexF gene has been reported from many different environments, including soil, sediments and water [21] and is suggested to naturally occur in unaffected environments such as pristine Antarctic soils [47], indicating that it might be a useful target for identifying enrichment of ARGs occurring in environmental microbial communities irrespective of pollution from fecal material. The sul3 gene has been rarely reported, despite being rather abundant in the cases when it was quantified. According to previous studies, sul3 is typically less frequent and abundant than sul1 and sul2 [31,48]. Similarly to sul1 and sul2, it has been shown to be associated with class 1 integrons and has been detected on a conjugative plasmid, suggesting its potential to be horizontally transferred and disseminated in the environment [49]. Interestingly, sul3 was first detected in an E. coli isolate from a pig, and it was later found in both healthy and diseased humans.
Furthermore, it is often enriched in polluted environments and is present in both commensal and pathogenic bacteria [50]. Several other genes (tnpA4, tnpA5, qacEdelta, aadA2, floR, tet32, cmlA1 and mefA), which were not often reported, but often highly abundant when detected, would also be potentially suitable additional targets for AMR monitoring (Figure S5).

Around one in a thousand environmental bacteria carry clinical ARGs

In the current study, we estimated typical abundance levels of ARGs in different environments. We found that for most ARGs, the typically reported abundances fell in an interval from 10^-5 to 10^-3 copies per 16S rRNA, roughly corresponding to one ARG copy in a thousand bacteria (Figure 3B). Importantly, there was a strong bias towards studies of environments already impacted by humans (e.g. WWTPs, agriculture and animal production), and therefore this range rather reflects ARG levels in environments already affected by human activity.

That said, most of the ARGs measured in likely unimpacted sampling locations had lower typical abundances, below 10^-5 copies per 16S rRNA (Figure 3A). In particular, samples of water and soil from the arctic tundra showed the lowest ARG abundances. There are previous studies suggesting that ARG abundances fluctuate with latitude, which to some degree could be explained by optimal growth temperature for the microbes carrying these ARGs [51]. Furthermore, there is evidence for a distance-decay relationship for abundance of ARGs on the global and continental scales [52,53], which could explain higher abundances of ARGs in the water from alpine lakes in comparison to water from arctic tundra. Overall, however, it is hard to draw any major conclusions due to the small number of samples from relatively unimpacted locations.

Our data revealed distinct distributions of several tetracycline genes (including tetA, tetG, tetH, tetO, tetQ and tetW) in feces/manure samples (Figure 3A, Figure S6). Tetracycline ARGs constituted the predominant class among the most reported qPCR targets and are typically associated with human fecal samples [54].
Notably, several of these genes showed distinct abundance distributions for impacted environments and feces/manure samples, suggesting that they can be used as markers for human/animal fecal contamination in the environment. Particularly, tetA and tetG show distinct distributions in impacted and non-impacted environments and could serve as indicators for anthropogenic pollution, while tetW shows overlapping distributions, making it less useful to ascertain human impact. Similarly, the clinically relevant blaCTX-M and blaTEM genes also showed a separation in impacted and non-impacted environments, although that was less clear than for tetracycline genes.

The "unknown" category was characterized by relatively high abundances of ARGs and for some of the genes even higher levels than in impacted samples. This category comprised samples of water and biofilm from households (unspecified), as well as potable and reclaimed water distribution systems. We classified these samples as "unknown" since they are obviously impacted by human activity, but the impact targets the removal of ARGs and bacteria. Despite that, ARGs in some of these samples had higher relative abundances than in the fecal/manure samples (Figure 3A, Figure S6). These samples may nevertheless not have higher abundances of ARGs per amount of sample, as this would also depend on how many bacteria are present in the samples per volume or weight.

[...] types are color-coded as follows: "feces/manure" in purple, "impacted" in orange, "likely impacted" in yellow, "likely unimpacted" in green and "unknown" type in blue. Note that abundances are given relative to the number of 16S rRNA copies, which means that these distributions do not necessarily correspond to the total exposure to resistant bacteria in a given environment.
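The distinction between relative and absolute abundance made above can be shown with a small worked example. The sketch assumes that the 16S rRNA gene copy number per unit of sample (for example per mL) has been quantified alongside the ARG; the function name and all numbers are hypothetical and are not taken from the collected data.

```python
def absolute_arg_copies(relative_abundance, copies_16s_per_unit):
    """Convert a relative abundance into ARG copies per unit of sample.

    relative_abundance: ARG copies per 16S rRNA gene copy.
    copies_16s_per_unit: 16S rRNA gene copies per mL (or per g) of sample.
    """
    return relative_abundance * copies_16s_per_unit


# Hypothetical samples: a low-biomass sample with a high relative abundance
# and a high-biomass sample with a lower relative abundance.
low_biomass = absolute_arg_copies(1e-2, 1e3)
high_biomass = absolute_arg_copies(1e-3, 1e8)

print(f"low-biomass sample:  {low_biomass:.3g} ARG copies per mL")
print(f"high-biomass sample: {high_biomass:.3g} ARG copies per mL")
```

In this made-up case the sample with the tenfold higher relative abundance still carries far fewer ARG copies per mL, which is why relative abundances alone do not describe total exposure.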

It is unclear whether ARG abundances have increased over time
To investigate if there were changes in ARG abundances over time, we used linear mixed models on gene abundances from each of the environments, accounting for variability between countries (Figure S4, Table S2). Due to the successively increased sensitivity of the methods used for qPCR and inclusion of a larger number of ARGs profiled in each study over time, we chose to use linear mixed models for the maximum values per year reported from a particular sample type rather than all ARG values reported. Using this approach, we found positive trends for most of the environments (for cases where there was enough data to produce the estimates), except in effluent, sediments, sludge, wastewater and water, where the trends were negative (Table S2). That said, the trends were significant only for biofilm, effluent, food, sediments, sludge and wastewater. Thus, the most prominent finding of this analysis might be that in most cases there was too little data to draw any conclusions on ARG abundance changes over time, highlighting the need for time series data to understand the long-term development of antibiotic resistance in the environment.

Abundance data for some ARGs provide redundant information
The set of most reported genes consists of potential candidates to be used as qPCR targets, since they were on average more abundant and were also found in most environments (Figures 2 and 3). To explore if they would be good predictors of overall ARG abundances, we performed a correlation analysis of the most reported genes and the rest of the genes in the data matrix (Figure S3). This analysis revealed a certain degree of redundancy in terms of what information can be gained from the proposed monitoring targets.
ARGs proposed as monitoring targets that strongly correlate with each other convey similar information on ARG abundance and diversity and may therefore not be very useful to use in the same panel for ARG monitoring in the environment. Genes that were overall redundant included the tetO and tetA genes, strB and ermF, blaCTX-M and blaSHV, tetG and tetM, as well as intI1 and sul2 (Figure S3). If resources are constrained, it would seem wise not to use several genes in these smaller groups together in the same ARG panel for monitoring.

Specific uses of qPCR as a monitoring tool
In this study, we have exclusively focused on the use of qPCR as a tool for monitoring the abundance of ARGs in the environment. In general, the qPCR technique has several benefits, but also some drawbacks as a surveillance tool. An important advantage is that it is a highly sensitive method that can detect much lower levels of ARGs than is currently possible using shotgun metagenomic sequencing [55]. Furthermore, it can operate on very minute quantities of DNA, which makes it suitable also for low-biomass samples. In addition, as it is performed on DNA, it can detect ARGs in non-culturable bacteria and is functional also on complex samples with many different species.
However, application of qPCR as an AMR monitoring tool requires a priori knowledge on a set of targets as well as how to interpret the results in terms of when an ARG occurrence pattern becomes a concern. The limiting factor of predefined targets can be partially overcome by using qPCR arrays with hundreds of genes [15], but this also comes with increased costs and may not (at present) be feasible for large-scale routine monitoring of environmental AMR. Despite that, even a handful of selected targets can still provide useful information about the total resistome situation, fecal contamination and potentially HGT intensity.
From a monitoring perspective, the background ARG levels identified in this study could be used to infer an increase in ARG abundances as a sign of pollution and/or selection. Based on the collected data, we advise using 10 times the third quartile (3Q) value for a given ARG (Table S1) as the upper limit of normal background levels; abundances above this limit should be viewed as a deviation from the normal background in the environment. A more fundamental aspect of such deviations is whether an increase in the abundance of ARGs or mobile genetic elements in a particular environment or at a specific time point is a relevant indicator of a selective pressure for resistance. Importantly, an increase of ARG abundances without any context is more likely to be an indicator of human pollution [56], but it cannot be ruled out that such a change could be due to a specific selection pressure from antibiotics [57], co-selection from other antibacterial compounds [58], or simply taxonomic shifts that are unrelated to antibiotic resistance.
In the end, the suitability of qPCR for environmental AMR surveillance comes down to the purpose of monitoring and what type of actions one might want to take based on the monitoring results. If detecting dissemination of known high-risk ARGs is the sole purpose of monitoring, qPCR is an excellent method thanks to its sensitivity. However, if identification of emergent resistance threats is the goal, qPCR is unlikely to give useful guidance; instead, shotgun metagenomics [59] or selective culturing followed by genetic profiling would provide more useful information.
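The threshold rule suggested above (ten times the third quartile of the background values for a given ARG) can be sketched in a few lines. This is only an illustration under stated assumptions: the reference abundances are invented rather than taken from Table S1, the rule is applied on the linear scale, and the "inclusive" quartile method of Python's statistics module is an arbitrary choice that may differ from how the 3Q values in Table S1 were computed.

```python
from statistics import quantiles


def background_upper_limit(reference_abundances, factor=10.0):
    """Upper limit of background levels: factor times the third quartile."""
    _q1, _q2, q3 = quantiles(reference_abundances, n=4, method="inclusive")
    return factor * q3


def exceeds_background(observed, reference_abundances, factor=10.0):
    """True if an observed relative abundance lies above the background limit."""
    return observed > background_upper_limit(reference_abundances, factor)


# Hypothetical background values for one ARG (copies per 16S rRNA gene copy).
reference = [1e-5, 2e-5, 5e-5, 1e-4, 2e-4, 5e-4, 1e-3]

limit = background_upper_limit(reference)
print(f"upper limit: {limit:.2e} copies per 16S rRNA")
for observed in (1e-3, 1e-2):
    print(f"{observed:.0e} exceeds background: {exceeds_background(observed, reference)}")
```

In practice the published 3Q value for the gene would be used directly instead of being recomputed from raw values, and a flagged sample would still need the contextual interpretation discussed above.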

Outcomes and recommendations
In this study, we performed a literature survey to explore the abundance and prevalence of ARGs in various environments as quantified by qPCR. We found that, overall, previous suggestions for ARGs to be included in environmental AMR monitoring [4,29] seem relevant. Particularly, inclusion of the intI1, sul1, blaTEM, blaCTX-M and qnrS genes in environmental monitoring seems essential, along with a selection of the tetracycline genes, in particular at least one of tetA or tetG, which could serve as fecal pollution markers. However, there are also genes that are not often looked for in qPCR surveys that perhaps should be, including sul3, vanA, tetH, aadA2, floR, ereA and mexF. These genes are abundant in some environments, but were not often included in qPCR studies of environmental AMR. We also provide environmental baseline levels for the ARGs studied through qPCR (Figure 3, Table S1); for most ARGs the typical relative abundance falls in an interval from 10^-5 to 10^-3. It should be noted that this is the range of normal abundances of ARGs and should not be [...]