BUTTERFLY: addressing the pooled amplification paradox with unique molecular identifiers in single-cell RNA-seq
Journal article, 2021

The incorporation of unique molecular identifiers (UMIs) in single-cell RNA-seq assays makes it possible to identify duplicated molecules, thereby facilitating the counting of distinct molecules from sequenced reads. However, we show that the naïve removal of duplicates can lead to a bias due to a “pooled amplification paradox,” and we propose an improved quantification method based on unseen species modeling. Our correction, called BUTTERFLY, uses a zero-truncated negative binomial estimator implemented in the kallisto bustools workflow. We demonstrate its efficacy across cell types and genes and show that in some cases it can invert the relative abundance of genes.
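The unseen species idea in the abstract can be illustrated with a minimal sketch. It is not the BUTTERFLY implementation: for brevity it assumes a zero-truncated Poisson model rather than the zero-truncated negative binomial named above, and the function names `fit_ztp_rate` and `estimate_total_molecules`, as well as the example counts, are hypothetical. Given the number of read copies seen for each observed UMI, it estimates how many molecules were present in total, including those that received zero reads.

```python
import math


def fit_ztp_rate(mean_copies: float) -> float:
    """Solve lam / (1 - exp(-lam)) = mean_copies for lam by bisection.

    The left-hand side is the mean of a zero-truncated Poisson with rate lam.
    ASSUMPTION: Poisson is a simplification; the abstract names a
    zero-truncated negative binomial instead.
    """
    if mean_copies <= 1.0:
        raise ValueError("mean copies per observed molecule must exceed 1")
    # The truncated mean is always larger than lam, so the root lies below it.
    lo, hi = 1e-9, mean_copies
    for _ in range(200):
        mid = 0.5 * (lo + hi)
        truncated_mean = mid / (-math.expm1(-mid))
        if truncated_mean < mean_copies:
            lo = mid
        else:
            hi = mid
    return 0.5 * (lo + hi)


def estimate_total_molecules(copies_per_umi: list[int]) -> float:
    """Estimate observed plus unseen molecules from per-UMI copy counts."""
    observed = len(copies_per_umi)
    if observed == 0 or min(copies_per_umi) < 1:
        raise ValueError("need at least one UMI, each seen at least once")
    lam = fit_ztp_rate(sum(copies_per_umi) / observed)
    fraction_seen = -math.expm1(-lam)  # P(copies >= 1) under Poisson(lam)
    return observed / fraction_seen


if __name__ == "__main__":
    # Hypothetical copy counts for the 10 UMIs observed for one gene.
    copies = [1, 1, 1, 2, 2, 3, 1, 4, 2, 1]
    print(f"observed: {len(copies)}, estimated total: "
          f"{estimate_total_molecules(copies):.2f}")
```

Under this simplified model, a gene whose UMIs are mostly seen only once gets a larger upward correction than a gene whose UMIs are each seen many times, which is how a correction of this kind could change the relative abundance of two genes.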

UMI

Droplet-based

Single-cell RNA-Seq

Correction

PCR

Bias

Amplification

Batch correction

Author

Johan Gustafsson

Chalmers, Biology and Biological Engineering, Systems and Synthetic Biology

Jonathan Robinson

Chalmers, Biology and Biological Engineering, Systems and Synthetic Biology, CSBI

Jens B Nielsen

BioInnovation Institute

Chalmers, Biology and Biological Engineering, Systems and Synthetic Biology

Lior Pachter

California Institute of Technology (Caltech)

Genome Biology

1474-7596 (ISSN)

Vol. 22, Issue 1, 174

Subject Categories

Bioinformatics (Computational Biology)

Bioinformatics and Systems Biology

Genetics

DOI

10.1186/s13059-021-02386-z

PubMed

34103073

More information

Latest update

6/24/2021