As reported by CSDN, the MMLab team at Nanyang Technological University has recently unveiled Hand2World, a model that enables AI-driven world models to generate first-person interactive video in real time, taking mid-air hand gestures as the sole input. The work marks a shift for world models from ‘passive observation’ to ‘active engagement’ and directly addresses the hand-eye interaction problem.

Technically, Hand2World uses projections rendered from 3D hand meshes as control signals and pixel-level Plücker ray embeddings to precisely encode camera motion, which decouples hand movement from head (viewpoint) rotation. The model supports streaming output, allowing continuous interaction of unbounded duration, and substantially improves the visual quality and 3D consistency of the generated video, as demonstrated on three major benchmarks.
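To make "projections rendered from 3D hand meshes as control signals" concrete, here is a minimal NumPy sketch, not the team's released code: the function name `project_hand_vertices` and its inputs `V` (camera-frame mesh vertices) and `K` (pinhole intrinsics) are assumptions for illustration. It projects each vertex into the image and splats it into a 2D occupancy map of the kind a generator could be conditioned on; a real pipeline would rasterize full triangles into a depth or silhouette map.

```python
import numpy as np

def project_hand_vertices(V, K, H, W):
    """Crude control-signal sketch (hypothetical, not Hand2World's code).

    V: (N, 3) hand-mesh vertices in the camera frame with z > 0.
    K: (3, 3) pinhole intrinsics.
    Returns an (H, W) binary map marking projected vertex locations.
    """
    uvw = V @ K.T                      # (N, 3) homogeneous pixel coords
    uv = uvw[:, :2] / uvw[:, 2:3]      # perspective divide -> (u, v)

    mask = np.zeros((H, W), dtype=np.float32)
    u = np.round(uv[:, 0]).astype(int)
    v = np.round(uv[:, 1]).astype(int)
    inside = (u >= 0) & (u < W) & (v >= 0) & (v < H)
    mask[v[inside], u[inside]] = 1.0   # splat vertices that land in frame
    return mask
```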

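As for the Plücker ray embeddings, these are a common way in recent camera-controllable video models to give the network a per-pixel description of the camera: each pixel's viewing ray is encoded by the 6D coordinates (m, d) with direction d and moment m = o × d, where o is the camera center. The NumPy sketch below is illustrative only (it is not Hand2World's code), and the assumed conventions are a 3×3 intrinsics matrix K and world-to-camera extrinsics R, t with x_cam = R·x_world + t.

```python
import numpy as np

def plucker_ray_embedding(K, R, t, H, W):
    """Per-pixel Plücker ray map for one camera pose (illustrative sketch)."""
    o = -R.T @ t  # camera center in world coordinates

    # Pixel-center grid in homogeneous image coordinates
    u, v = np.meshgrid(np.arange(W) + 0.5, np.arange(H) + 0.5)
    pix = np.stack([u, v, np.ones_like(u)], axis=-1)   # (H, W, 3)

    # Back-project to world-frame ray directions: d = R^T K^{-1} p
    d = pix @ np.linalg.inv(K).T @ R                   # (H, W, 3)
    d /= np.linalg.norm(d, axis=-1, keepdims=True)

    m = np.cross(o, d)   # moment vector, constant along each ray
    return np.concatenate([m, d], axis=-1)             # (H, W, 6)
```

Because this embedding changes only with camera pose and not with scene content, it gives the generator a motion signal that is independent of the hand-control channel, which fits the report's description of separating hand movement from head rotation.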