Text-to-Image Generation Enters the Agent Era: CUHK and UC Berkeley Jointly Open-Source Gen-Searcher

20 hours ago

Author: Editor

Over the past two years, image generation models have mostly followed a 'direct generation' approach. Traditional text-to-image models of this kind often underperform on tasks that depend on real-world knowledge, because they lack agent capabilities for interacting with the real world. To address this, the research team from CUHK and UC Berkeley introduced Gen-Searcher, described as the first attempt to train an agent with 'deep search' capabilities for image generation, so that the model can search and reason before it generates.

The team also constructed a dataset for the task and proposed the KnowGen benchmark. The core idea of Gen-Searcher is to turn information acquisition into a trainable agent: the agent is equipped with three types of tools, trained in two stages, and optimized with a dual-reward feedback mechanism. According to the reported experiments, Gen-Searcher significantly improves the accuracy and quality of generated images, which points to the potential of agentic generation for knowledge-intensive image generation tasks. The work offers a new pathway for building integrated generation systems and marks a step towards the agentic era in generative systems.
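To make the idea of "search first, then generate" concrete, here is a minimal Python sketch of a tool-using loop and a two-part reward. It is an illustration only, not Gen-Searcher's actual code: the article does not name the three tools or the two reward terms, so `run_search_agent`, `scripted_policy`, the single stub tool `lookup`, and the weighted sum in `dual_reward` are all hypothetical stand-ins.

```python
from dataclasses import dataclass, field
from typing import Callable, Dict, List, Tuple


@dataclass
class AgentState:
    """What the agent knows so far: the user prompt plus gathered evidence."""
    prompt: str
    evidence: List[str] = field(default_factory=list)


def run_search_agent(
    prompt: str,
    tools: Dict[str, Callable[[str], str]],
    policy: Callable[[AgentState], Tuple[str, str]],
    max_steps: int = 4,
) -> str:
    """Let a policy call tools to collect evidence, then return an enriched prompt.

    The enriched prompt would be handed to an image generator (not shown).
    """
    state = AgentState(prompt=prompt)
    for _ in range(max_steps):
        action, argument = policy(state)
        if action == "finish":
            break
        # Run the chosen tool and keep its output as evidence.
        state.evidence.append(tools[action](argument))
    return state.prompt + "\nGrounding notes:\n" + "\n".join(state.evidence)


def dual_reward(reward_a: float, reward_b: float, weight: float = 0.5) -> float:
    """Hypothetical combination of two reward signals as a weighted sum."""
    return weight * reward_a + (1.0 - weight) * reward_b


# Stub tool and scripted policy, both assumptions for demonstration.
tools = {"lookup": lambda query: f"[stub result for: {query}]"}


def scripted_policy(state: AgentState) -> Tuple[str, str]:
    # Do one lookup, then stop. A trained agent would learn this decision.
    if not state.evidence:
        return "lookup", state.prompt
    return "finish", ""


enriched = run_search_agent("a poster of a recent event", tools, scripted_policy)
print(enriched)
```

In a trained system the scripted policy would be replaced by the learned agent, and the reward would come from evaluating the final output rather than from fixed numbers.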
