Improved software detection and extraction of ITS1 and ITS2 from ribosomal ITS sequences of fungi and other eukaryotes for analysis of environmental sequencing data

Johan Bengtsson Palme; Martin Ryberg; Martin Hartmann; Sara Branco; Zheng Wang; Anna Godhe; Pierre De Wit; Marisol Sánchez-García; Ingo Ebersberger; Filipe de Sousa; Anthony S. Amend; Ari Jumpponen; Martin Unterseher; Erik Kristiansson; Kessy Abarenkov; Yann Bertrand; Kemal Sanli; Martin Eriksson; Unni Vik; Vilmar Veldre; R. Henrik Nilsson

doi:10.1111/2041-210X.12073

Journal article, 2013

The nuclear ribosomal internal transcribed spacer (ITS) region is the primary choice for molecular identification of fungi. Its two highly variable spacers (ITS1 and ITS2) are usually species specific, whereas the intercalary 5.8S gene is highly conserved. For sequence clustering and BLAST searches, it is often advantageous to rely on either one of the variable spacers but not the conserved 5.8S gene. To identify and extract ITS1 and ITS2 from large taxonomic and environmental data sets is, however, often difficult, and many ITS sequences are incorrectly delimited in the public sequence databases. We introduce ITSx, a Perl-based software tool to extract ITS1, 5.8S and ITS2 – as well as full-length ITS sequences – from both Sanger and high-throughput sequencing data sets. ITSx uses hidden Markov models computed from large alignments of a total of 20 groups of eukaryotes, including fungi, metazoans and plants, and the sequence extraction is based on the predicted positions of the ribosomal genes in the sequences. ITSx has a very high proportion of true-positive extractions and a low proportion of false-positive extractions. Additionally, process parallelization permits expedient analyses of very large data sets, such as a one million sequence amplicon pyrosequencing data set. ITSx is rich in features and written to be easily incorporated into automated sequence analysis pipelines. ITSx paves the way for more sensitive BLAST searches and sequence clustering operations for the ITS region in eukaryotes. The software also permits elimination of non-ITS sequences from any data set. This is particularly useful for amplicon-based next-generation sequencing data sets, where insidious non-target sequences are often found among the target sequences. Such non-target sequences are difficult to find by other means and would contribute noise to diversity estimates if left in the data set.
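The position-based extraction described in the abstract can be illustrated with a minimal Python sketch. It is not ITSx itself (which is written in Perl and derives gene positions from hidden Markov models); it only shows how ITS1, 5.8S and ITS2 could be sliced out of a read once the boundaries of the flanking ribosomal genes have been predicted. The `GeneBoundaries` class, the `extract_regions` function, the coordinate convention and the toy read are all hypothetical and introduced here for illustration only.

```python
from dataclasses import dataclass
from typing import Dict, Optional


@dataclass
class GeneBoundaries:
    """Hypothetical predicted gene positions (0-based, end-exclusive)."""
    ssu_end: Optional[int] = None    # first base after the small-subunit (SSU) gene
    s58_start: Optional[int] = None  # first base of the 5.8S gene
    s58_end: Optional[int] = None    # first base after the 5.8S gene
    lsu_start: Optional[int] = None  # first base of the large-subunit (LSU) gene


def _slice(seq: str, start: Optional[int], end: Optional[int]) -> Optional[str]:
    """Return seq[start:end], or None if either boundary is missing or invalid."""
    if start is None or end is None or not (0 <= start < end <= len(seq)):
        return None
    return seq[start:end]


def extract_regions(seq: str, b: GeneBoundaries) -> Dict[str, str]:
    """Cut out every region whose two flanking boundaries were both predicted."""
    candidates = {
        "ITS1": _slice(seq, b.ssu_end, b.s58_start),
        "5.8S": _slice(seq, b.s58_start, b.s58_end),
        "ITS2": _slice(seq, b.s58_end, b.lsu_start),
        "full_ITS": _slice(seq, b.ssu_end, b.lsu_start),
    }
    # Regions with a missing boundary are simply left out of the result.
    return {name: sub for name, sub in candidates.items() if sub is not None}


# Toy read: SSU tail, ITS1, 5.8S, ITS2, LSU head (made-up bases, not real data).
read = "AAAA" + "CCC" + "GG" + "TTT" + "AAAA"
bounds = GeneBoundaries(ssu_end=4, s58_start=7, s58_end=9, lsu_start=12)
regions = extract_regions(read, bounds)
print(regions)
```

In this sketch, a read for which no boundaries are predicted yields an empty result, which is one simple way a pipeline could flag a sequence as non-ITS and drop it before clustering or BLAST searches.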

ribosomal DNA

next-generation sequencing

molecular ecology

Fungi

Perl

Authors

Johan Bengtsson Palme

Göteborgs universitet

Martin Ryberg

Uppsala universitet

Martin Hartmann

Forschungsanstalt Agroscope Reckenholz-Tänikon

Eidgenössische Forschungsanstalt für Wald, Schnee und Landschaft ETH-Bereichs

Sara Branco

University of California

Zheng Wang

Yale University

Anna Godhe

Göteborgs universitet

Pierre De Wit

Göteborgs universitet

Marisol Sánchez-García

University of Tennessee

Ingo Ebersberger

Johann Wolfgang Goethe Universität Frankfurt am Main

Filipe de Sousa

Göteborgs universitet

Anthony S. Amend

University of Hawaii

Ari Jumpponen

Kansas State University

Martin Unterseher

Universität Greifswald

Erik Kristiansson

Göteborgs universitet

Chalmers, Mathematical Sciences, Mathematical Statistics

Kessy Abarenkov

Tartu Ülikool

Yann Bertrand

Göteborgs universitet

Kemal Sanli

Göteborgs universitet

Martin Eriksson

Chalmers, Shipping and Marine Technology

Unni Vik

Universitetet i Oslo

Vilmar Veldre

R. Henrik Nilsson

Göteborgs universitet

Methods in Ecology and Evolution

2041-210X (eISSN)

Vol. 4, Issue 10, pp. 914-919

Subject categories (SSIF 2011)

Botany

Biological Systematics

Soil Science

Environmental Sciences and Nature Conservation

Ecology

Microbiology

Bioinformatics (Computational Biology)

Microbiology in the Medical Area

Bioinformatics and Systems Biology

Zoology

Computer Science

DOI

10.1111/2041-210X.12073

More information

Last updated

2024-12-10
