HattCI: Fast and Accurate attC site Identification Using Hidden Markov Models.
Journal article, 2016
Integrons are genetic elements that facilitate the horizontal gene transfer in bacteria and are known to harbor genes associated with antibiotic resistance. The gene mobility in the integrons is governed by the presence of attC sites, which are 55 to 141-nucleotide-long imperfect inverted repeats. Here we present HattCI, a new method for fast and accurate identification of attC sites in large DNA data sets. The method is based on a generalized hidden Markov model that describes each core component of an attC site individually. Using twofold cross-validation experiments on a manually curated reference data set of 231 attC sites from class 1 and 2 integrons, HattCI showed high sensitivities of up to 91.9% while maintaining satisfactory false-positive rates. When applied to a metagenomic data set of 35 microbial communities from different environments, HattCI found a substantially higher number of attC sites in the samples that are known to contain more horizontally transferred elements. HattCI will significantly increase the ability to identify attC sites and thus integron-mediated genes in genomic and metagenomic data. HattCI is implemented in C and is freely available at http://bioinformatics.math.chalmers.se/HattCI .