Beyond Static Vision: Scene Dynamic Field Unlocks Intuitive Physics Understanding in Multi-modal Large Language Models
Published in ICLR, 2026
We identify a critical gap in MLLMs’ physics comprehension and propose Scene Dynamic Field, which integrates physics simulators into a fine-tuning framework. Our method achieves gains of up to 20.7% on fluid tasks and generalizes to unfamiliar physical domains. Paper Code
