Rewarding Change beyond State: Directional VLM Rewards for Sample-Efficient Robot Reinforcement Learning

Linus Lundgren; Wenhao Lu; Zhitao Liang; Ze Zhang; Karinne Ramirez-Amaro; Emmanuel Dean

doi:10.1109/SII64115.2026.11404492

Paper in proceeding, 2026

Sparse rewards are a persistent bottleneck for robotic manipulation with Reinforcement Learning (RL), primarily because RL agents must discover long-horizon, multi-step behaviors while receiving infrequent and weakly informative feedback. Recent work uses pre-trained Vision Language Models (VLMs) to provide dense per-step rewards, yet most approaches score only a single image against a goal text, ignoring whether the most recent change actually moves the system toward success. We argue that this omission impairs exploration (e.g., through goal-like detours, wrong-way progress, and action aliasing) and propose making time explicit in VLM rewards by adding a directional signal that evaluates short-horizon change. Concretely, we pair the visual change over a few steps with a text description of the desired change and finetune lightweight heads with RL; the resulting directional signal is combined with a standard positional signal into a single shaping reward. We evaluate our approach on six MetaWorld manipulation tasks with fixed goals. At a fixed budget, directional shaping raises the running-average success rate to 78.2%, versus 63.8% for the best-tuned positional baseline, with improvements in five of the six tasks. Ablations identify the design choices that make the directional term effective and show its synergy with the positional term in supplying dense VLM rewards, demonstrating improved exploration and sample efficiency.
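The abstract does not give the exact scoring functions, so the following is only a minimal sketch of the positional/directional split, under stated assumptions. It assumes a CLIP-style VLM whose image and text encoders map into a shared embedding space (the encoders themselves are not shown, and the finetuned lightweight heads are omitted). The positional term compares the current frame embedding with a goal-text embedding; the directional term compares the embedding change over the last k steps with an embedding of a text describing the desired change. The function names `positional_reward`, `directional_reward`, `shaped_reward` and the weights `w_pos`/`w_dir` are hypothetical, and the toy 2-D vectors stand in for real embeddings.

```python
import math
from typing import Sequence

Vector = Sequence[float]


def cosine(a: Vector, b: Vector) -> float:
    """Cosine similarity of two equal-length vectors (0.0 if either is all zeros)."""
    dot = sum(x * y for x, y in zip(a, b))
    norm_a = math.sqrt(sum(x * x for x in a))
    norm_b = math.sqrt(sum(y * y for y in b))
    if norm_a == 0.0 or norm_b == 0.0:
        return 0.0
    return dot / (norm_a * norm_b)


def positional_reward(img_t: Vector, goal_text: Vector) -> float:
    # Hypothetical positional term: current frame embedding vs. goal-text embedding.
    return cosine(img_t, goal_text)


def directional_reward(img_t: Vector, img_t_minus_k: Vector, change_text: Vector) -> float:
    # Hypothetical directional term: embedding change over the last k steps
    # vs. an embedding of a text describing the desired change.
    delta = [now - before for now, before in zip(img_t, img_t_minus_k)]
    return cosine(delta, change_text)


def shaped_reward(img_t: Vector, img_t_minus_k: Vector, goal_text: Vector,
                  change_text: Vector, w_pos: float = 1.0, w_dir: float = 1.0) -> float:
    # Weighted sum of both terms; w_pos and w_dir are assumed tuning weights.
    return (w_pos * positional_reward(img_t, goal_text)
            + w_dir * directional_reward(img_t, img_t_minus_k, change_text))


# Toy 2-D "embeddings" (assumed, not from the paper).
goal = [1.0, 0.0]
change = [1.0, 0.0]
print(shaped_reward([1.0, 1.0], [0.0, 1.0], goal, change))  # moving toward the goal: ≈ 1.71
print(shaped_reward([0.0, 1.0], [1.0, 1.0], goal, change))  # moving away: -1.0
```

Because the directional term is computed from a difference of embeddings, a step that moves away from the goal receives a negative directional reward even if the positional term alone would not single it out; this is one possible reading of the "wrong-way progress" failure mode named above, not a description of the authors' implementation.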

Authors

Linus Lundgren

Student at Chalmers

Wenhao Lu

Chalmers, Electrical Engineering, Systems and control

Zhitao Liang

Chalmers, Electrical Engineering, Systems and control

Ze Zhang

University of Gothenburg

Chalmers, Computer Science and Engineering (Chalmers), Computer and Network Systems

Karinne Ramirez-Amaro

Chalmers, Electrical Engineering, Systems and control

Emmanuel Dean

Chalmers, Electrical Engineering, Systems and control

2026 IEEE/SICE International Symposium on System Integration, SII 2026

pp. 722-728
9781665457842 (ISBN)

Cancun, Mexico

Subject Categories (SSIF 2025)

Robotics and automation

Computer Sciences

DOI

10.1109/SII64115.2026.11404492

More information

Latest update

4/23/2026