AMST: Alternating Multimodal Skip Training
Paper in proceeding, 2025

Multimodal learning is a field of machine learning in which models combine multiple modalities to improve learning outcomes. However, modalities may differ in data representation and complexity, which can lead to learning imbalances during training. A key indicator of this imbalance is the time each modality takes to converge. Because modalities converge at different rates, training them simultaneously, as is common in multimodal settings, can cause one modality to harmfully interfere with another's learning. To mitigate this negative impact, we propose Alternating Multimodal Skip Training (AMST), which adjusts the training frequency of each modality individually. This method not only improves the performance of conventional multimodal models that learn from fused modalities but also enhances alternating models that train each modality separately. It also outperforms state-of-the-art models while reducing training time.
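The paper's exact schedule is not reproduced in this record, but the core idea of skip training, updating each modality's encoder only on a fraction of the training steps, can be sketched as follows. This is a minimal illustration assuming a PyTorch-style setup with two hypothetical modalities; the encoder shapes, the skip_interval values, and all variable names are assumptions for illustration, not the authors' implementation.

import torch
import torch.nn as nn

# Hypothetical two-modality setup: the real AMST architecture and skip
# schedule are defined in the paper; these shapes and values are assumptions.
audio_enc = nn.Sequential(nn.Linear(128, 64), nn.ReLU())
video_enc = nn.Sequential(nn.Linear(512, 64), nn.ReLU())
fusion_head = nn.Linear(128, 10)  # concatenated features -> class logits

# Assumed skip intervals: the faster-converging modality is updated less
# often (here audio, every 2nd step), the slower one on every step.
skip_interval = {"audio": 2, "video": 1}
encoders = {"audio": audio_enc, "video": video_enc}

params = list(fusion_head.parameters())
for enc in encoders.values():
    params += list(enc.parameters())
optimizer = torch.optim.SGD(params, lr=1e-2)
loss_fn = nn.CrossEntropyLoss()

for step in range(100):  # dummy training loop with random data
    batch = {"audio": torch.randn(32, 128), "video": torch.randn(32, 512)}
    labels = torch.randint(0, 10, (32,))

    # Freeze each modality's encoder on steps where it is skipped, so only
    # the scheduled encoders (and the fusion head) receive gradients.
    feats = []
    for name, enc in encoders.items():
        train_now = step % skip_interval[name] == 0
        for p in enc.parameters():
            p.requires_grad_(train_now)
        feats.append(enc(batch[name]))

    logits = fusion_head(torch.cat(feats, dim=1))
    loss = loss_fn(logits, labels)
    optimizer.zero_grad()
    loss.backward()
    optimizer.step()

In this sketch the fusion head is trained at every step, while the faster-converging modality is given a lower training frequency so that it does not interfere with the slower modality's learning; in the paper, the schedule is tied to each modality's convergence rate.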

Keywords

modality convergence rate
skip training
multimodal learning
modality bias
modality imbalance
training frequency

Authors

Hugo Manuel Alves Henriques E Silva

Chalmers, Computer Science and Engineering (Chalmers), Data Science and AI

Hongguang Chen

Chalmers, Computer Science and Engineering (Chalmers)

Selpi Selpi

Chalmers, Computer Science and Engineering (Chalmers), Data Science and AI

Machine Learning and Knowledge Discovery in Databases. Research Track. ECML PKDD 2025

European Conference, ECML PKDD 2025, Porto, Portugal

Areas of Advance

Information and Communication Technology

Subject Categories (SSIF 2025)

Computer Sciences

Infrastructure

C3SE (Chalmers Centre for Computational Science and Engineering, until 2020)
