NORA-1.5: A Vision-Language-Action Model Trained using World Model- and Action-Based Preference Rewards
Shows strong generalizability even with limited fine-tuning data
NORA-1.5 advances vision-language-action models through three key innovations: a flow-matching action expert, action-conditioned world model rewards, and DPO post-training.
State-of-the-art results across simulation benchmarks and robust cross-embodiment transfer to the Galaxea A1 robot.
+5.5% over best baseline
+0.8% over best baseline
+12.2% over NORA
NORA-1.5 achieves state-of-the-art results across simulation and real-world benchmarks
A trainable flow-matching action expert integrated with the NORA backbone achieves superior performance and faster inference.
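A minimal sketch of how a flow-matching action expert can generate an action chunk at inference time, assuming a hypothetical `velocity_net` conditioned on the VLM backbone's features; the step count, horizon, and action dimension below are illustrative placeholders, not the paper's settings.

```python
import torch

def sample_actions(velocity_net, vlm_features, horizon=8, action_dim=7, steps=10):
    """Euler-integrate a learned velocity field from noise to an action chunk."""
    a = torch.randn(horizon, action_dim)          # start from Gaussian noise
    dt = 1.0 / steps
    for i in range(steps):
        t = torch.full((1,), i * dt)              # flow time in [0, 1)
        v = velocity_net(a, t, vlm_features)      # hypothetical velocity head
        a = a + dt * v                            # one Euler step toward the data distribution
    return a
```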
Action-conditioned V-JEPA2 world model provides goal-based reward signals for scalable preference optimization without simulators.
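A minimal sketch of a goal-based reward from an action-conditioned world model, assuming a hypothetical `rollout(obs, actions)` method that predicts the post-execution latent state; cosine similarity to a goal embedding is used here as a stand-in scoring rule, not necessarily the paper's exact reward.

```python
import torch.nn.functional as F

def world_model_reward(world_model, obs_embedding, goal_embedding, action_chunk):
    """Score an action chunk by how close the predicted outcome is to the goal
    in the world model's latent space (no simulator rollouts required)."""
    pred_embedding = world_model.rollout(obs_embedding, action_chunk)  # hypothetical API
    return F.cosine_similarity(pred_embedding, goal_embedding, dim=-1)
```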
Direct preference optimization with hybrid rewards consistently improves performance across benchmarks.
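A minimal sketch of the DPO objective applied to action chunks, assuming preference pairs are formed by ranking candidate chunks with the hybrid (world-model plus action-based) reward; the log-probability inputs and beta value are placeholders.

```python
import torch.nn.functional as F

def dpo_loss(logp_chosen, logp_rejected, ref_logp_chosen, ref_logp_rejected, beta=0.1):
    """Direct preference optimization: push the policy toward higher-reward
    action chunks relative to a frozen reference policy."""
    chosen_margin = logp_chosen - ref_logp_chosen
    rejected_margin = logp_rejected - ref_logp_rejected
    return -F.logsigmoid(beta * (chosen_margin - rejected_margin)).mean()
```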
Successfully transfers to the unseen Galaxea A1 robot with strong generalization.
The baseline model exhibits unstable behaviour: zig-zag motions, hesitation, and failed grasps due to distractor-target confusion.
Reward-driven DPO post-training produces smoother trajectories, consistent approach vectors, and significantly higher grasp reliability.
@article{hung2025nora15,
title={NORA-1.5: A Vision-Language-Action Model Trained using World Model- and Action-Based Preference Rewards},
author={Hung, Chia-Yu and Majumder, Navonil and Deng, Haoyuan and Liu, Renhang and Ang, Yankang and Zadeh, Amir and Li, Chuan and Herremans, Dorien and Wang, Ziwei and Poria, Soujanya},
journal={arXiv preprint},
year={2025}
}