In robot manipulation, imitation learning is a cornerstone for advancing embodied intelligence. However, it relies heavily on large amounts of high-quality real-world demonstration data, which are costly and slow to collect. Simulators offer a cost-effective way to generate data, but a pronounced simulation-to-reality (sim-to-real) gap persists, which limits how well policies trained in simulation generalize and transfer to real-world use.