Multimodal multiscale decision-making and control for urban autonomous vehicles with memory-conditioned dynamic potential field reconstruction
Journal article, 2026

Urban autonomous driving remains challenging because autonomous vehicles must reason about dense multi-agent interactions, traffic-signal constraints, occlusions, and long-tail events while satisfying real-time onboard computational requirements. To address these challenges, this paper proposes a Multimodal Multiscale Decision-Making and Control framework, termed M3UDMC, for urban autonomous vehicles. The framework integrates multimodal scene representation, memory-augmented risk reasoning, memory-conditioned dynamic potential field reconstruction, and constrained model predictive control within a bi-timescale architecture. Unlike loosely coupled modular pipelines, M3UDMC establishes an explicit information flow from multimodal observation to scene representation, memory-state update, potential-field parameter modulation, MPC risk-cost construction, and control execution. The slow timescale updates semantic memory and scenario-level risk priors, while the fast timescale performs prediction-conditioned potential field reconstruction and risk-aware MPC optimization. The dynamic potential field is not used as a standalone controller; instead, it provides a differentiable risk cost for the constrained MPC formulation, where vehicle dynamics, actuator limits, road boundaries, and minimum-distance constraints are explicitly considered. The proposed framework is evaluated through high-fidelity simulation, hardware-in-the-loop validation, and real-world road tests. Compared with representative baselines, including Apollo 8.0, end-to-end reinforcement learning with DDPG, and fixed-potential MPC, M3UDMC reduces the collision rate from 21.3% and 18.7% to 7.8% in the tested scenarios. Ablation studies further indicate that memory augmentation and dynamic potential field reconstruction contribute to improved decision quality under occlusions, signal transitions, and rare interaction events. 
The results demonstrate that M3UDMC improves the balance among safety, efficiency, and real-time feasibility in representative urban scenarios, while dense-traffic scalability, parameter adaptation, and cross-city generalization remain important directions for future work.
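
The coupling described in the abstract, where a slow-timescale risk prior modulates the potential-field parameters and the fast timescale turns the field into a risk cost, can be illustrated with a minimal Python sketch. Everything here is an assumption made for illustration and is not taken from the paper: the Gaussian form of the potential, the names `FieldParams`, `modulate_params`, `risk_cost`, and `pick_trajectory`, and the scaling factors. Choosing the lowest-cost candidate trajectory stands in for the constrained MPC solve, which the paper formulates with vehicle dynamics, actuator limits, road boundaries, and minimum-distance constraints that are omitted here.

```python
import math
from dataclasses import dataclass


@dataclass(frozen=True)
class FieldParams:
    gain: float   # peak height of each obstacle's potential (hypothetical)
    sigma: float  # spatial spread of the potential, in metres (hypothetical)


def modulate_params(risk_prior, base=FieldParams(gain=1.0, sigma=2.0)):
    """Slow timescale: scale the field by a scenario-level risk prior in [0, 1].

    The linear scaling rule is an illustrative assumption.
    """
    r = min(max(risk_prior, 0.0), 1.0)  # clamp the prior to [0, 1]
    return FieldParams(gain=base.gain * (1.0 + r),
                       sigma=base.sigma * (1.0 + 0.5 * r))


def risk_cost(trajectory, obstacles, params):
    """Fast timescale: sum smooth Gaussian potentials along a trajectory."""
    total = 0.0
    for x, y in trajectory:
        for ox, oy in obstacles:
            d2 = (x - ox) ** 2 + (y - oy) ** 2
            total += params.gain * math.exp(-d2 / (2.0 * params.sigma ** 2))
    return total


def pick_trajectory(candidates, obstacles, params):
    """Stand-in for the MPC solve: return the lowest-risk candidate."""
    return min(candidates, key=lambda t: risk_cost(t, obstacles, params))


# One obstacle ahead; compare driving straight through with a lateral offset.
obstacles = [(10.0, 0.0)]
straight = [(float(x), 0.0) for x in range(0, 21, 5)]
swerve = [(float(x), 3.0) for x in range(0, 21, 5)]

low = modulate_params(0.0)   # benign scenario prior
high = modulate_params(1.0)  # high-risk scenario prior
best = pick_trajectory([straight, swerve], obstacles, low)
```

In this toy setup a higher risk prior raises the cost of the same trajectory, so the slow-timescale memory changes how strongly the fast-timescale planner avoids obstacles without changing the planner itself.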

Decision-making and control

Multimodal fusion

Memory-augmented reasoning

Dynamic potential field

Autonomous driving

Authors

Yanbin Liu

Tsinghua University

Cong Zhang

Tsinghua University

Shaohua Cui

Chalmers, Architecture and Civil Engineering, Geology and Geotechnics

Guangyu Tian

Tsinghua University

Yugong Luo

Tsinghua University

Lei Zhang

Tsinghua University

Multimodal Transportation

2772-5871 (ISSN) 2772-5863 (eISSN)

Vol. 5, Issue 4, Article 100324

Areas of Advance

Transport

Subject Categories (SSIF 2025)

Computer Vision and Learning Systems

Computer graphics and computer vision

Transport Systems and Logistics

DOI

10.1016/j.multra.2026.100324

More information

Latest update

6/15/2026