Score: 3

PhyDetEx: Detecting and Explaining the Physical Plausibility of T2V Models

Published: December 1, 2025 | arXiv ID: 2512.01843v1

By: Zeqing Wang, Keze Wang, Lei Zhang

Potential Business Impact:

Teaches AI to spot videos breaking physics rules.

Business Areas:

Image Recognition, Data and Analytics, Software

Driven by growing model capacity and training scale, Text-to-Video (T2V) generation models have recently achieved substantial progress in video quality, length, and instruction-following capability. However, whether these models can understand physics and generate physically plausible videos remains an open question. While Vision-Language Models (VLMs) have been widely used as general-purpose evaluators in various applications, they struggle to identify physically impossible content in generated videos. To investigate this issue, we construct a PID (Physical Implausibility Detection) dataset, which consists of a test split of 500 manually annotated videos and a train split of 2,588 paired videos, where each implausible video is generated by carefully rewriting the caption of its corresponding real-world video to induce T2V models to produce physically implausible content. With the constructed dataset, we introduce a lightweight fine-tuning approach, enabling VLMs not only to detect physically implausible events but also to generate textual explanations of the violated physical principles. Taking the fine-tuned VLM as a physical plausibility detector and explainer, namely PhyDetEx, we benchmark a series of state-of-the-art T2V models to assess their adherence to physical laws. Our findings show that although recent T2V models have made notable progress toward generating physically plausible content, understanding and adhering to physical laws remains a challenging issue, especially for open-source models. Our dataset, training code, and checkpoints are available at https://github.com/Zeqing-Wang/PhyDetEx.

PhyVLLM: Physics-Guided Video Language Model with Motion-Appearance Disentanglement

CV and Pattern Recognition

Helps computers understand how things move in videos.

4 Dec 2025 1

88%

T2VPhysBench: A First-Principles Benchmark for Physical Consistency in Text-to-Video Generation

Machine Learning (CS)

Makes computer videos follow real-world physics rules.

1 May 2025 2

88%

Bootstrapping Physics-Grounded Video Generation through VLM-Guided Iterative Self-Refinement

CV and Pattern Recognition

Makes videos follow real-world physics rules.

25 Nov 2025 0

Country of Origin

🇭🇰 🇨🇳 Hong Kong, China

Repos / Data Links

github.com

Page Count

17 pages
