Physics-Aware Video Instance Removal Benchmark

Published in CVPR 2026 Workshop on Video Generation and Beyond Evaluation (VGBE), 2026

Video Instance Removal (VIR) requires removing target objects while maintaining background integrity and physical consistency. We introduce PVIR, a benchmark of 95 high-quality videos annotated with instance-accurate masks and removal prompts, partitioned into Simple and Hard subsets. The Hard subset explicitly targets complex physical interactions such as reflections and shadows.

We evaluate representative methods using a decoupled human evaluation protocol across three dimensions: instruction following, rendering quality, and edit exclusivity. Our findings show that current VIR methods still treat object erasure as 2D texture filling rather than physics-aware scene reconstruction, and fail in particular on complex physical side effects such as specular reflections and illumination interactions.
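To make the idea of a decoupled protocol concrete, the following is a minimal sketch of how per-dimension human ratings might be aggregated separately for each method and subset. It is not the paper's evaluation code: the 1 to 5 rating scale, the record field names, the `aggregate_ratings` helper, and the example values are all assumptions made for illustration.

```python
from collections import defaultdict
from statistics import mean

# The three dimensions named in the abstract; the snake_case keys are assumed.
DIMENSIONS = ("instruction_following", "rendering_quality", "edit_exclusivity")


def aggregate_ratings(ratings):
    """Average each dimension on its own, per (method, subset) pair.

    Keeping the dimensions separate (instead of collapsing them into one
    score) is what "decoupled" is taken to mean in this sketch.
    """
    buckets = defaultdict(lambda: defaultdict(list))
    for rating in ratings:
        key = (rating["method"], rating["subset"])
        for dim in DIMENSIONS:
            buckets[key][dim].append(rating[dim])
    return {
        key: {dim: mean(scores[dim]) for dim in DIMENSIONS}
        for key, scores in buckets.items()
    }


# Hypothetical ratings on an assumed 1-5 scale (not results from the paper).
example_ratings = [
    {"method": "method_a", "subset": "Simple",
     "instruction_following": 5, "rendering_quality": 4, "edit_exclusivity": 4},
    {"method": "method_a", "subset": "Simple",
     "instruction_following": 3, "rendering_quality": 4, "edit_exclusivity": 2},
    {"method": "method_a", "subset": "Hard",
     "instruction_following": 2, "rendering_quality": 3, "edit_exclusivity": 1},
]

summary = aggregate_ratings(example_ratings)
for (method, subset), scores in sorted(summary.items()):
    print(method, subset, scores)
```

Reporting the three averages side by side, split by Simple and Hard, lets a reader see whether a method removes the object as instructed yet leaves a shadow or reflection behind, which a single blended score would hide.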

Recommended citation: Li, Z., Chen, X., Jiang, L., Hou, D., Lin, F., Yamada, K., Gao, X., & Tu, Z. (2026). Physics-Aware Video Instance Removal Benchmark. CVPR 2026 Workshop VGBE.