Rewarding Change beyond State: Directional VLM Rewards for Sample-Efficient Robot Reinforcement Learning

Linus Lundgren; Wenhao Lu; Zhitao Liang; Ze Zhang; Karinne Ramirez-Amaro; Emmanuel Dean

doi:10.1109/SII64115.2026.11404492

Paper in proceeding, 2026

Sparse rewards are a persistent bottleneck for robotic manipulation with Reinforcement Learning (RL): agents must discover long-horizon, multi-step behaviors while receiving infrequent and weakly informative feedback. Recent work uses pre-trained Vision Language Models (VLMs) to provide dense per-step rewards, yet most approaches score only a single image against a goal text, ignoring whether the recent change actually moves the system toward success. We argue that this omission impairs exploration (e.g., goal-like detours, wrong-way progress, action aliasing) and propose to make time explicit in VLM rewards by adding a directional signal that evaluates short-horizon change. Concretely, we pair the visual change over a few steps with a text description of the desired change, and fine-tune lightweight heads with RL; the resulting directional signal is combined with a standard positional signal into a single shaping reward. We evaluate our approach on six MetaWorld manipulation tasks with fixed goals. Directional shaping improves the running-average success rate at a fixed budget to 78.2%, versus 63.8% for the best-tuned positional baseline, with improvements in five of the six tasks. Ablations identify the design choices that make the directional term effective and show its synergy with the positional term when supplying dense VLM rewards, demonstrating improved exploration and sample efficiency.
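The abstract does not give the exact reward formulas, so the following is only a minimal sketch of the general idea, under stated assumptions: embeddings are plain vectors (standing in for VLM image and text embeddings), the positional term is the cosine similarity between the current image embedding and the goal text embedding, the directional term is the cosine similarity between the embedding change over a few steps and the embedding of the desired-change text, and the two are mixed by a weighted sum. The function names, the weights `w_pos` and `w_dir`, and the use of cosine similarity are all hypothetical choices for illustration; the fine-tuned lightweight heads mentioned in the abstract are not modeled here.

```python
import numpy as np


def cosine(a, b, eps=1e-8):
    """Cosine similarity of two 1-D vectors; returns 0.0 if either is all zeros."""
    return float(np.dot(a, b) / (np.linalg.norm(a) * np.linalg.norm(b) + eps))


def positional_reward(img_emb, goal_text_emb):
    """Assumed positional term: how goal-like the current frame looks."""
    return cosine(img_emb, goal_text_emb)


def directional_reward(img_emb_now, img_emb_past, change_text_emb):
    """Assumed directional term: does the recent visual change match the desired change?"""
    delta = img_emb_now - img_emb_past  # change over a short horizon (a few steps)
    return cosine(delta, change_text_emb)


def shaping_reward(img_emb_now, img_emb_past, goal_text_emb, change_text_emb,
                   w_pos=0.5, w_dir=0.5):
    """Hypothetical combination: weighted sum of the positional and directional terms."""
    pos = positional_reward(img_emb_now, goal_text_emb)
    direc = directional_reward(img_emb_now, img_emb_past, change_text_emb)
    return w_pos * pos + w_dir * direc


# Toy 2-D "embeddings": the same current frame reached from two different pasts.
goal_text = np.array([1.0, 0.0])
change_text = np.array([1.0, 0.0])
now = np.array([0.5, 1.0])
past_toward = np.array([0.0, 1.0])  # the agent moved toward the goal direction
past_away = np.array([1.0, 1.0])    # the agent moved away from it

r_toward = shaping_reward(now, past_toward, goal_text, change_text)
r_away = shaping_reward(now, past_away, goal_text, change_text)
print(r_toward > r_away)
```

In this toy case the positional term is identical for both transitions because the current frame is the same; only the directional term separates progress from wrong-way motion, which is the kind of ambiguity the abstract attributes to single-image scoring.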

Authors

Linus Lundgren

Student at Chalmers

Wenhao Lu

Chalmers, Electrical Engineering, Systems and Control

Zhitao Liang

Chalmers, Electrical Engineering, Systems and Control

Ze Zhang

University of Gothenburg

Chalmers, Computer Science and Engineering, Computer and Network Systems

Karinne Ramirez-Amaro

Chalmers, Electrical Engineering, Systems and Control

Emmanuel Dean

Chalmers, Electrical Engineering, Systems and Control

2026 IEEE/SICE International Symposium on System Integration (SII 2026)

722-728
9781665457842 (ISBN)

2026 IEEE/SICE International Symposium on System Integration, SII 2026
Cancun, Mexico

Subject categories (SSIF 2025)

Robotics and Automation

Computer Science

DOI

10.1109/SII64115.2026.11404492

More information

Last updated

2026-04-27
