Strategies to improve usability and preserve accuracy in biological sequence databases
Journal article, 2016

Biology is increasingly dependent on large-scale analysis, such as proteomics, creating a requirement for efficient bioinformatics. Bioinformatic predictions of biological functions rely upon correctly annotated database sequences, and the presence of inaccurately annotated or otherwise poorly described sequences introduces noise and bias to biological analyses. Accurate annotations are, for example, pivotal for correct identifications of polypeptide fragments. However, standards for how sequence databases are organized and presented are currently insufficient. Here, we propose five strategies to address fundamental issues in the annotation of sequence databases: (i) to clearly separate experimentally verified and unverified sequence entries; (ii) to enable a system for tracing the origins of annotations; (iii) to separate entries with high-quality, informative annotation from less useful ones; (iv) to integrate automated quality-control software whenever such tools exist; and (v) to facilitate post-submission editing of annotations and metadata associated with sequences. We believe that implementation of these strategies, for example as requirements for publication of database papers, would enable biology to better take advantage of large-scale data.
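As an illustration only, the five strategies can be read as requirements on how a database record is structured. The sketch below is not taken from the paper or from any existing database: the class names, fields, accession numbers, and the quality check are all hypothetical, chosen to show one way the separation of verified entries, annotation provenance, annotation quality, automated checks, and post-submission edits might fit together.

```python
from dataclasses import dataclass, field
from enum import Enum
from typing import List


class Evidence(Enum):
    # Strategy (i): keep verified and unverified entries apart.
    EXPERIMENTAL = "experimentally verified"
    PREDICTED = "unverified"


@dataclass
class Annotation:
    text: str
    source: str        # Strategy (ii): origin of the annotation.
    informative: bool  # Strategy (iii): informative vs. less useful.


@dataclass
class SequenceEntry:
    accession: str     # hypothetical identifiers, not real accessions
    sequence: str
    evidence: Evidence
    annotations: List[Annotation] = field(default_factory=list)

    def revise(self, index: int, text: str, source: str) -> None:
        """Strategy (v): post-submission edit that records the new origin."""
        old = self.annotations[index]
        self.annotations[index] = Annotation(text, source, old.informative)


def passes_quality_control(entry: SequenceEntry) -> bool:
    """Strategy (iv): stand-in for an automated check (assumed rule:
    a non-empty sequence using only the 20 standard amino-acid letters)."""
    return bool(entry.sequence) and set(entry.sequence) <= set("ACDEFGHIKLMNPQRSTVWY")


def reference_set(entries: List[SequenceEntry]) -> List[SequenceEntry]:
    """Entries that are verified, pass the check, and carry an informative annotation."""
    return [
        e for e in entries
        if e.evidence is Evidence.EXPERIMENTAL
        and passes_quality_control(e)
        and any(a.informative for a in e.annotations)
    ]


entries = [
    SequenceEntry("EX0001", "MKTAYIAKQR", Evidence.EXPERIMENTAL,
                  [Annotation("example function A", "original submission", True)]),
    SequenceEntry("EX0002", "MKTAYIAKQR", Evidence.PREDICTED,
                  [Annotation("example function A", "transferred by similarity", True)]),
    SequenceEntry("EX0003", "MKTAYIAKQR", Evidence.EXPERIMENTAL,
                  [Annotation("hypothetical protein", "original submission", False)]),
]
```

With these made-up records, `reference_set(entries)` keeps only the first entry: the second is unverified and the third carries no informative annotation. A real implementation would need far richer evidence and provenance vocabularies than the two-valued flags assumed here.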

Databases

Functional prediction

Standards

Sequencing

Annotation

Authors

Johan Bengtsson-Palme

Göteborgs universitet

Fredrik Boulund

Göteborgs universitet

Chalmers, Mathematical Sciences, Mathematical Statistics

Robert Edström

Chalmers, Computer Science and Engineering

Amir Feizi

Chalmers, Biology and Biological Engineering, Systems Biology

Anna Johnning

Chalmers, Mathematical Sciences, Mathematical Statistics

Göteborgs universitet

Viktor Jonsson

Göteborgs universitet

Chalmers, Mathematical Sciences, Mathematical Statistics

Fredrik Karlsson

Chalmers, Biology and Biological Engineering, Systems Biology

C. Pal

Göteborgs universitet

Mariana Buongermino Pereira

Chalmers, Mathematical Sciences, Mathematical Statistics

Göteborgs universitet

Anna Rehammar

Göteborgs universitet

Chalmers, Mathematical Sciences, Mathematical Statistics

José Sánchez

Göteborgs universitet

Chalmers, Mathematical Sciences

Kemal Sanli

Göteborgs universitet

Kaisa Thorell

Karolinska Institutet

Proteomics

1615-9853 (ISSN) 1615-9861 (eISSN)

Vol. 16, Issue 18, pp. 2454-2460

Subject categories

Other Biological Topics

Bioinformatics and Systems Biology

Computer Science

DOI

10.1002/pmic.201600034

PubMed

27528420