Conformal LLM Multi-label Text Classification with Binary Relevance Approach
Paper in proceedings, 2025
Large Language Models (LLMs) are increasingly deployed in real-world Natural Language Processing (NLP) systems to perform multi-label classification tasks, such as identifying multiple forms of toxicity in online content. However, most models output raw probabilities without a principled way to quantify uncertainty, increasing the risk of misclassification in high-stakes applications. In this work, we integrate Inductive Conformal Prediction (ICP) with the Binary Relevance (BR) approach to produce label-wise, statistically valid prediction sets. Using a modified Wikipedia Toxic Comments dataset, we evaluate this framework across varying significance levels (ϵ), incorporating calibration-set-aware thresholds to address label imbalance. Our results show that BR-based conformal prediction maintains valid marginal coverage while enabling flexible control over prediction-set size (efficiency). Even in the presence of rare labels, the framework provides practical uncertainty estimates and allows abstention on uncertain cases via empty prediction sets. These findings support the feasibility of BR-ICP-based uncertainty calibration for scalable, interpretable automation in multi-label NLP systems.
large-language models
binary relevance conformal prediction
multi-label text classification
multi-label conformal prediction
natural language processing
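The abstract's BR-ICP procedure, one inductive conformal predictor per binary label, can be sketched as follows. This is a minimal illustration, not the paper's implementation: the probabilities are synthetic stand-ins for an underlying LLM classifier's outputs, and the nonconformity score (one minus the probability of the true binary outcome) and function names are assumptions.

```python
# Minimal sketch of Binary Relevance Inductive Conformal Prediction (BR-ICP).
# Synthetic data throughout; cal_probs/test_probs are hypothetical stand-ins
# for per-label probabilities from an underlying classifier.
import numpy as np

rng = np.random.default_rng(0)
n_cal, n_test, n_labels = 500, 4, 3

# Synthetic calibration data: per-label P(label=1) and true binary labels.
cal_probs = rng.uniform(size=(n_cal, n_labels))
cal_true = (rng.uniform(size=(n_cal, n_labels)) < cal_probs).astype(int)

def br_icp_predict(cal_probs, cal_true, test_probs, eps):
    """Per-label conformal prediction sets at significance level eps."""
    pred_sets = []
    for j in range(test_probs.shape[1]):  # one conformal predictor per label
        # Nonconformity: 1 - probability assigned to the true binary outcome.
        p_true = np.where(cal_true[:, j] == 1, cal_probs[:, j], 1 - cal_probs[:, j])
        cal_scores = 1 - p_true
        label_sets = []
        for p in test_probs[:, j]:
            outcomes = []
            for y, p_y in ((0, 1 - p), (1, p)):
                score = 1 - p_y
                # Conformal p-value: fraction of calibration scores at least
                # as nonconforming as the candidate outcome's score.
                pval = (np.sum(cal_scores >= score) + 1) / (len(cal_scores) + 1)
                if pval > eps:
                    outcomes.append(y)
            label_sets.append(outcomes)  # may be {0}, {1}, {0,1}, or empty
        pred_sets.append(label_sets)
    return pred_sets  # indexed [label][test_instance]

test_probs = rng.uniform(size=(n_test, n_labels))
sets = br_icp_predict(cal_probs, cal_true, test_probs, eps=0.1)
```

Per the standard ICP guarantee, each label's prediction set covers the true binary outcome with probability at least 1 − ϵ marginally; an empty set signals abstention, as described in the abstract.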