Researchers from Microsoft Research and Zhejiang University have introduced World-R1, a training framework for text-to-video models that is based on reinforcement learning. The framework lets video generation models achieve 3D geometric consistency without changes to their architecture and without relying on 3D datasets. It addresses a common failure in which objects distort or disappear during large camera movements.
