Parallel Factor Analysis Enables Quantification and Identification of Highly Convolved Data-Independent-Acquired Protein Spectra
Journal article, 2020

The latest high-throughput mass spectrometry-based technologies can record virtually all molecules in complex biological samples, providing a holistic picture of proteomes in cells and tissues and enabling an evaluation of the overall status of a person's health. However, current best practices still only scratch the surface of the information contained in these massive proteome datasets, and efficient novel data-driven strategies are needed. Powered by advances in GPU hardware and open-source machine-learning frameworks, we developed a data-driven approach, CANDIA, which disassembles highly complex proteomics data into the elementary molecular signatures of the proteins in biological samples. Our work provides a performant and adaptable solution that complements existing mass spectrometry techniques. As the central mathematical methods are generic, other scientific fields dealing with highly convolved datasets will also benefit from this work.
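The abstract describes the decomposition only at a high level. As a hedged illustration of the technique named in the title and keywords (PARAFAC, also called canonical decomposition), the sketch below factors a three-way array into a sum of rank-one components using textbook alternating least squares (CP-ALS) in NumPy. It is not CANDIA's GPU-based implementation. Reading the three modes as samples, retention time, and m/z bins is an illustrative assumption, and the function names `khatri_rao`, `parafac_als`, `reconstruct`, and `relative_error` are hypothetical.

```python
import numpy as np


def khatri_rao(u: np.ndarray, v: np.ndarray) -> np.ndarray:
    """Column-wise Kronecker product: (P x R), (Q x R) -> (P*Q x R), row index p*Q + q."""
    p, r = u.shape
    q, _ = v.shape
    return (u[:, None, :] * v[None, :, :]).reshape(p * q, r)


def parafac_als(x: np.ndarray, rank: int, n_iter: int = 200, seed: int = 0):
    """Fit X[i, j, k] ~ sum_r A[i, r] * B[j, r] * C[k, r] by alternating least squares.

    Each update solves an exact linear least-squares problem for one factor
    matrix while the other two are held fixed, so (in exact arithmetic) the
    squared reconstruction error never increases from one update to the next.
    """
    rng = np.random.default_rng(seed)
    i, j, k = x.shape
    a = rng.standard_normal((i, rank))
    b = rng.standard_normal((j, rank))
    c = rng.standard_normal((k, rank))
    # Mode-n unfoldings whose column order matches khatri_rao's row order.
    x1 = x.reshape(i, j * k)                     # column j*K + k
    x2 = x.transpose(1, 0, 2).reshape(j, i * k)  # column i*K + k
    x3 = x.transpose(2, 0, 1).reshape(k, i * j)  # column i*J + j
    for _ in range(n_iter):
        a = x1 @ np.linalg.pinv(khatri_rao(b, c).T)
        b = x2 @ np.linalg.pinv(khatri_rao(a, c).T)
        c = x3 @ np.linalg.pinv(khatri_rao(a, b).T)
    return a, b, c


def reconstruct(a: np.ndarray, b: np.ndarray, c: np.ndarray) -> np.ndarray:
    """Sum over r of the rank-one outer products a_r (x) b_r (x) c_r."""
    return np.einsum("ir,jr,kr->ijk", a, b, c)


def relative_error(x: np.ndarray, a: np.ndarray, b: np.ndarray, c: np.ndarray) -> float:
    """Frobenius-norm error of the fitted model relative to the data."""
    return float(np.linalg.norm(x - reconstruct(a, b, c)) / np.linalg.norm(x))


if __name__ == "__main__":
    # Hypothetical toy tensor: 6 "samples" x 40 "retention times" x 30 "m/z bins",
    # built from 3 known nonnegative components (no real DIA data involved).
    rng = np.random.default_rng(1)
    true_factors = [rng.random((6, 3)), rng.random((40, 3)), rng.random((30, 3))]
    x = reconstruct(*true_factors)
    a, b, c = parafac_als(x, rank=3, n_iter=500)
    print("relative reconstruction error:", relative_error(x, a, b, c))
```

Each fitted component (one column of `a`, `b`, and `c`) plays the role of an "elementary signature" in this toy setting. PARAFAC components are generally recoverable only up to permutation and scaling, and the sketch deliberately omits constraints, component-number selection, and GPU acceleration.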

Keywords

deconvolution, data-independent acquisition, tensor factorization, canonical decomposition, big data, proteomics, PARAFAC, mass spectrometry

DSML 2: Proof-of-Concept: Data science output has been formulated, implemented, and tested for one domain/problem

Authors

Filip Buric

Chalmers, Biology and Biological Engineering, Systems and Synthetic Biology

Jan Zrimec

Chalmers, Biology and Biological Engineering, Systems and Synthetic Biology

Aleksej Zelezniak

Chalmers, Biology and Biological Engineering, Systems and Synthetic Biology

Science for Life Laboratory (SciLifeLab)

Patterns

eISSN 2666-3899

Vol. 1, Issue 9, Article 100137

Subject Categories

Analytical Chemistry

Bioinformatics (Computational Biology)

Bioinformatics and Systems Biology

DOI

10.1016/j.patter.2020.100137

PubMed

33336195


Latest update

1/3/2024