NORA-1.5: Action- and World Model-based Rewards Improve VLA Post-Training
A state-of-the-art Vision-Language-Action model
NORA-1.5 advances vision-language-action models through three key innovations: a flow-matching action expert, action-conditioned world model rewards, and DPO post-training.
State-of-the-art results across simulation benchmarks and robust cross-embodiment transfer to the Galaxea A1 robot.
Headline gains: +5.5% and +0.8% over the best baselines, and +12.2% over NORA.
NORA-1.5 achieves state-of-the-art results across simulation and real-world benchmarks.
A trainable flow-matching action expert integrated with the NORA backbone achieves superior performance and faster inference (a hedged sampling sketch follows this list).
An action-conditioned V-JEPA2 world model provides goal-based reward signals for scalable preference optimization without simulators (see the reward sketch below).
Direct preference optimization with hybrid rewards consistently improves performance across benchmarks (see the DPO sketch below).
NORA-1.5 transfers to the unseen Galaxea A1 robot with strong generalization.
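To make the flow-matching highlight concrete, here is a minimal, hypothetical sketch of how a flow-matching action expert can decode an action chunk from noise by Euler-integrating a learned velocity field conditioned on backbone features. The network, shapes, and `sample_action_chunk` helper are illustrative assumptions, not the NORA-1.5 implementation.

```python
# Hypothetical flow-matching action expert; the real NORA-1.5 expert
# architecture and conditioning interface may differ.
import torch
import torch.nn as nn

class FlowMatchingActionExpert(nn.Module):
    """Predicts a velocity field v(a_t, t | context) over a whole action chunk."""

    def __init__(self, action_dim=7, horizon=8, ctx_dim=1024, hidden=512):
        super().__init__()
        self.horizon, self.action_dim = horizon, action_dim
        self.net = nn.Sequential(
            nn.Linear(horizon * action_dim + ctx_dim + 1, hidden),
            nn.SiLU(),
            nn.Linear(hidden, hidden),
            nn.SiLU(),
            nn.Linear(hidden, horizon * action_dim),
        )

    def forward(self, noisy_actions, t, context):
        # noisy_actions: (B, horizon*action_dim), t: (B, 1), context: (B, ctx_dim)
        return self.net(torch.cat([noisy_actions, t, context], dim=-1))

@torch.no_grad()
def sample_action_chunk(expert, context, steps=10):
    """Euler integration from Gaussian noise (t=0) to an action chunk (t=1)."""
    b = context.shape[0]
    a = torch.randn(b, expert.horizon * expert.action_dim)
    dt = 1.0 / steps
    for k in range(steps):
        t = torch.full((b, 1), k * dt)
        a = a + dt * expert(a, t, context)   # follow the learned velocity field
    return a.view(b, expert.horizon, expert.action_dim)

# Usage: condition on pooled features from the VLM backbone (random stand-in here).
expert = FlowMatchingActionExpert()
vlm_features = torch.randn(2, 1024)
actions = sample_action_chunk(expert, vlm_features)
print(actions.shape)  # torch.Size([2, 8, 7])
```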
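Next, a hedged sketch of the world-model-reward idea: an action-conditioned predictor rolls the current latent forward, and the reward is the negative latent distance to a goal observation. `ToyWorldModel`, its `encode`/`predict` interface, and the exact reward form are assumptions standing in for the actual action-conditioned V-JEPA2 model.

```python
# Toy stand-in for an action-conditioned, V-JEPA2-style latent world model.
import torch
import torch.nn.functional as F

class ToyWorldModel(torch.nn.Module):
    def __init__(self, latent_dim=256, action_dim=7, horizon=8, frame_dim=3 * 64 * 64):
        super().__init__()
        self.encoder = torch.nn.Linear(frame_dim, latent_dim)                      # toy frame encoder
        self.predictor = torch.nn.Linear(latent_dim + horizon * action_dim, latent_dim)

    def encode(self, frames):
        # frames: (B, C, H, W) -> latent state (B, latent_dim)
        return self.encoder(frames.flatten(1))

    def predict(self, z, actions):
        # Roll the latent state forward, conditioned on the full action chunk.
        return self.predictor(torch.cat([z, actions.flatten(1)], dim=-1))

@torch.no_grad()
def world_model_reward(world_model, obs_frame, action_chunk, goal_frame):
    """Score an action chunk by how close the predicted future latent lands to the
    latent of a goal observation (higher is better)."""
    z_obs = world_model.encode(obs_frame)
    z_pred = world_model.predict(z_obs, action_chunk)
    z_goal = world_model.encode(goal_frame)
    return -F.mse_loss(z_pred, z_goal, reduction="none").mean(dim=-1)  # (B,) rewards

# Usage with random stand-in observations, goal images, and action chunks.
wm = ToyWorldModel()
obs, goal = torch.randn(4, 3, 64, 64), torch.randn(4, 3, 64, 64)
actions = torch.randn(4, 8, 7)
print(world_model_reward(wm, obs, actions, goal).shape)  # torch.Size([4])
```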
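Finally, a minimal sketch of how such rewards can drive preference optimization: candidate action chunks are ranked by a hybrid reward, and the best/worst pair feeds a standard DPO loss on action-chunk log-probabilities. The mixing weight, helper names, and scalar log-probabilities below are illustrative assumptions, not the paper's settings.

```python
# Hedged sketch: hybrid-reward ranking of action chunks followed by a DPO update.
import torch
import torch.nn.functional as F

def hybrid_reward(wm_reward, action_reward, alpha=0.5):
    """Mix world-model (goal-progress) and action-level rewards into one score."""
    return alpha * wm_reward + (1.0 - alpha) * action_reward

def build_preference_pair(chunks, rewards):
    """Pick the best- and worst-scoring sampled chunks as (chosen, rejected)."""
    order = torch.argsort(rewards, descending=True)
    return chunks[order[0]], chunks[order[-1]]

def dpo_loss(logp_chosen, logp_rejected, ref_logp_chosen, ref_logp_rejected, beta=0.1):
    """Standard DPO objective on log-probabilities under the trainable policy
    versus a frozen reference policy."""
    margin = (logp_chosen - ref_logp_chosen) - (logp_rejected - ref_logp_rejected)
    return -F.logsigmoid(beta * margin).mean()

# Usage with stand-in values: 6 candidate chunks of shape (horizon=8, action_dim=7).
chunks = torch.randn(6, 8, 7)
rewards = hybrid_reward(torch.randn(6), torch.randn(6))
chosen, rejected = build_preference_pair(chunks, rewards)
loss = dpo_loss(torch.tensor(-1.0), torch.tensor(-2.0),
                torch.tensor(-1.5), torch.tensor(-1.5))
print(chosen.shape, float(loss))
```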
The baseline model exhibits unstable behaviour: zig-zag motions, hesitation, and failed grasps due to distractor-target confusion.
Reward-driven DPO post-training produces smoother trajectories, consistent approach vectors, and significantly higher grasp reliability.
@article{hung2025nora15,
title={NORA-1.5: Action- and World Model-based Rewards Improve Vision-Language-Action Model Post-Training},
author={Hung, Chia-Yu and Majumder, Navonil and Deng, Haoyuan and Liu, Renhang and Ang, Yankang and Zadeh, Amir and Li, Chuan and Herremans, Dorien and Wang, Ziwei and Poria, Soujanya},
journal={arXiv preprint},
year={2025}
}