NORA-1.5: Action- and World Model-based Rewards Improve VLA Post-Training
A state-of-the-art Vision-Language-Action model
NORA-1.5 advances vision-language-action models through three key innovations: a flow-matching action expert, action-conditioned world model rewards, and DPO post-training.
State-of-the-art results across simulation benchmarks and robust cross-embodiment transfer to the Galaxea A1 robot.
Headline gains: +5.5% and +0.8% over the best baselines, and +12.2% over NORA.
NORA-1.5 achieves state-of-the-art results across simulation and real-world benchmarks.
A trainable flow-matching action expert integrated with the NORA backbone achieves superior performance and faster inference (a hedged sampling sketch follows this list).
An action-conditioned V-JEPA2 world model provides goal-based reward signals for scalable preference optimization without simulators (see the reward sketch below).
Direct preference optimization with hybrid rewards consistently improves performance across benchmarks (see the DPO sketch below).
NORA-1.5 transfers to the unseen Galaxea A1 robot with strong generalization.
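To make the flow-matching highlight concrete, here is a minimal, hypothetical sketch of how a flow-matching action expert can decode an action chunk from noise by Euler-integrating a learned velocity field conditioned on backbone features. The network, shapes, and `sample_action_chunk` helper are illustrative assumptions, not the NORA-1.5 implementation.

```python
# Hypothetical flow-matching action expert; the real NORA-1.5 expert
# architecture and conditioning interface may differ.
import torch
import torch.nn as nn

class FlowMatchingActionExpert(nn.Module):
    """Predicts a velocity field v(a_t, t | context) over a whole action chunk."""

    def __init__(self, action_dim=7, horizon=8, ctx_dim=1024, hidden=512):
        super().__init__()
        self.horizon, self.action_dim = horizon, action_dim
        self.net = nn.Sequential(
            nn.Linear(horizon * action_dim + ctx_dim + 1, hidden),
            nn.SiLU(),
            nn.Linear(hidden, hidden),
            nn.SiLU(),
            nn.Linear(hidden, horizon * action_dim),
        )

    def forward(self, noisy_actions, t, context):
        # noisy_actions: (B, horizon*action_dim), t: (B, 1), context: (B, ctx_dim)
        return self.net(torch.cat([noisy_actions, t, context], dim=-1))

@torch.no_grad()
def sample_action_chunk(expert, context, steps=10):
    """Euler integration from Gaussian noise (t=0) to an action chunk (t=1)."""
    b = context.shape[0]
    a = torch.randn(b, expert.horizon * expert.action_dim)
    dt = 1.0 / steps
    for k in range(steps):
        t = torch.full((b, 1), k * dt)
        a = a + dt * expert(a, t, context)   # follow the learned velocity field
    return a.view(b, expert.horizon, expert.action_dim)

# Usage: condition on pooled features from the VLM backbone (random stand-in here).
expert = FlowMatchingActionExpert()
vlm_features = torch.randn(2, 1024)
actions = sample_action_chunk(expert, vlm_features)
print(actions.shape)  # torch.Size([2, 8, 7])
```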
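Next, a hedged sketch of the world-model-reward idea: an action-conditioned predictor rolls the current latent forward, and the reward is the negative latent distance to a goal observation. `ToyWorldModel`, its `encode`/`predict` interface, and the exact reward form are assumptions standing in for the actual action-conditioned V-JEPA2 model.

```python
# Toy stand-in for an action-conditioned, V-JEPA2-style latent world model.
import torch
import torch.nn.functional as F

class ToyWorldModel(torch.nn.Module):
    def __init__(self, latent_dim=256, action_dim=7, horizon=8, frame_dim=3 * 64 * 64):
        super().__init__()
        self.encoder = torch.nn.Linear(frame_dim, latent_dim)                      # toy frame encoder
        self.predictor = torch.nn.Linear(latent_dim + horizon * action_dim, latent_dim)

    def encode(self, frames):
        # frames: (B, C, H, W) -> latent state (B, latent_dim)
        return self.encoder(frames.flatten(1))

    def predict(self, z, actions):
        # Roll the latent state forward, conditioned on the full action chunk.
        return self.predictor(torch.cat([z, actions.flatten(1)], dim=-1))

@torch.no_grad()
def world_model_reward(world_model, obs_frame, action_chunk, goal_frame):
    """Score an action chunk by how close the predicted future latent lands to the
    latent of a goal observation (higher is better)."""
    z_obs = world_model.encode(obs_frame)
    z_pred = world_model.predict(z_obs, action_chunk)
    z_goal = world_model.encode(goal_frame)
    return -F.mse_loss(z_pred, z_goal, reduction="none").mean(dim=-1)  # (B,) rewards

# Usage with random stand-in observations, goal images, and action chunks.
wm = ToyWorldModel()
obs, goal = torch.randn(4, 3, 64, 64), torch.randn(4, 3, 64, 64)
actions = torch.randn(4, 8, 7)
print(world_model_reward(wm, obs, actions, goal).shape)  # torch.Size([4])
```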
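Finally, a minimal sketch of how such rewards can drive preference optimization: candidate action chunks are ranked by a hybrid reward, and the best/worst pair feeds a standard DPO loss on action-chunk log-probabilities. The mixing weight, helper names, and scalar log-probabilities below are illustrative assumptions, not the paper's settings.

```python
# Hedged sketch: hybrid-reward ranking of action chunks followed by a DPO update.
import torch
import torch.nn.functional as F

def hybrid_reward(wm_reward, action_reward, alpha=0.5):
    """Mix world-model (goal-progress) and action-level rewards into one score."""
    return alpha * wm_reward + (1.0 - alpha) * action_reward

def build_preference_pair(chunks, rewards):
    """Pick the best- and worst-scoring sampled chunks as (chosen, rejected)."""
    order = torch.argsort(rewards, descending=True)
    return chunks[order[0]], chunks[order[-1]]

def dpo_loss(logp_chosen, logp_rejected, ref_logp_chosen, ref_logp_rejected, beta=0.1):
    """Standard DPO objective on log-probabilities under the trainable policy
    versus a frozen reference policy."""
    margin = (logp_chosen - ref_logp_chosen) - (logp_rejected - ref_logp_rejected)
    return -F.logsigmoid(beta * margin).mean()

# Usage with stand-in values: 6 candidate chunks of shape (horizon=8, action_dim=7).
chunks = torch.randn(6, 8, 7)
rewards = hybrid_reward(torch.randn(6), torch.randn(6))
chosen, rejected = build_preference_pair(chunks, rewards)
loss = dpo_loss(torch.tensor(-1.0), torch.tensor(-2.0),
                torch.tensor(-1.5), torch.tensor(-1.5))
print(chosen.shape, float(loss))
```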
The baseline model exhibits unstable behaviour: zig-zag motions, hesitation, and failed grasps due to distractor-target confusion.
Reward-driven DPO post-training produces smoother trajectories, consistent approach vectors, and significantly higher grasp reliability.
@article{hung2025nora15,
title={NORA-1.5: Action- and World Model-based Rewards Improve Vision-Language-Action Model Post-Training},
author={Hung, Chia-Yu and Majumder, Navonil and Deng, Haoyuan and Liu, Renhang and Ang, Yankang and Zadeh, Amir and Li, Chuan and Herremans, Dorien and Wang, Ziwei and Poria, Soujanya},
journal={arXiv preprint},
year={2025}
}