Text-to-Image Generation Enters the Agent Era: CUHK and UC Berkeley Jointly Open-Source Gen-Searcher
Author: Editor

Over the past two years, image generation models have mostly followed a 'direct generation' approach, mapping a prompt straight to an image. However, conventional text-to-image models often perform poorly on prompts that require real-world knowledge, because they lack agent capabilities for looking up information about the real world. To address this, the research team introduced Gen-Searcher, which they describe as the first attempt to train an agent with 'deep search' capabilities for image generation, allowing the model to search and reason like an agent before it generates. The team also constructed a dataset and proposed the KnowGen benchmark.

The core idea of Gen-Searcher is to turn the information-gathering process into a trainable agent. The agent is equipped with three types of tools, is trained in two stages, and is optimized with a dual-reward feedback mechanism. According to the experimental results, Gen-Searcher substantially improves both the accuracy and the quality of generated images, which points to the potential of agentic generation for knowledge-intensive image generation tasks. The work offers a new path toward integrated generation systems and marks a step toward an agentic era for generative models.
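The article does not name the three tools or describe the two training stages. It also does not define the two rewards. As a rough illustration only, the Python sketch below shows the general shape of a search-then-generate agent loop:

- A policy repeatedly picks a tool.
- The evidence it gathers is folded into a grounded prompt for the image model.
- Two reward signals are combined into one scalar.

The following are all hypothetical assumptions for illustration, not Gen-Searcher's actual design:

- the tool names (`web_search`, `image_search`, `browse`)
- the scripted policy
- the weighted-sum form of `dual_reward`

Training itself is not modeled.

```python
from dataclasses import dataclass, field
from typing import Callable, Dict, List, Optional

# Hypothetical tool stubs: the article mentions three types of tools
# but does not name them, so these names and outputs are illustrative only.
def web_search(query: str) -> str:
    return f"text results for '{query}'"

def image_search(query: str) -> str:
    return f"reference images for '{query}'"

def browse(url: str) -> str:
    return f"page content of {url}"

TOOLS: Dict[str, Callable[[str], str]] = {
    "web_search": web_search,
    "image_search": image_search,
    "browse": browse,
}

@dataclass
class Step:
    tool: str  # key into TOOLS
    arg: str   # query or URL passed to the tool

@dataclass
class Trajectory:
    prompt: str
    evidence: List[str] = field(default_factory=list)

# A policy looks at the trajectory so far and returns the next tool call,
# or None when it judges that enough evidence has been gathered.
Policy = Callable[[Trajectory], Optional[Step]]

def run_agent(prompt: str, policy: Policy, max_steps: int = 5) -> Trajectory:
    """Run the search loop for at most max_steps tool calls."""
    traj = Trajectory(prompt)
    for _ in range(max_steps):
        step = policy(traj)
        if step is None:  # policy decides to stop searching and generate
            break
        traj.evidence.append(TOOLS[step.tool](step.arg))
    return traj

def build_generation_prompt(traj: Trajectory) -> str:
    """Fold the gathered evidence into a grounded prompt for a text-to-image model."""
    lines = "\n".join(f"- {e}" for e in traj.evidence)
    return f"{traj.prompt}\nGrounding:\n{lines}"

def dual_reward(r_a: float, r_b: float, w: float = 0.5) -> float:
    """Combine two reward signals with a convex weighted sum (assumed form)."""
    if not 0.0 <= w <= 1.0:
        raise ValueError("w must be in [0, 1]")
    return w * r_a + (1.0 - w) * r_b

def scripted_policy(traj: Trajectory) -> Optional[Step]:
    """Stand-in for a learned policy: one text search, one image search, then stop."""
    script = [Step("web_search", traj.prompt), Step("image_search", traj.prompt)]
    i = len(traj.evidence)
    return script[i] if i < len(script) else None

if __name__ == "__main__":
    traj = run_agent("the Golden Gate Bridge at dusk", scripted_policy)
    print(build_generation_prompt(traj))
    print(dual_reward(0.8, 0.6))
```

In a trained system, the scripted policy would be replaced by a learned model that decides when to search and what to search for. The scalar from `dual_reward` would then serve as the feedback signal during training. A weighted sum is simply the most basic way to merge two signals; the article does not say how Gen-Searcher combines its rewards.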