Alibaba's ZeroSearch is a reinforcement learning framework designed to improve the search capabilities of large language models without querying a real search engine during training. Instead, it draws on a large model's extensive pre-trained knowledge to generate the relevant content a search engine would otherwise return, and it dynamically controls the quality of that generated content. The cost difference is substantial: training against a real search engine through SerpAPI costs approximately $586.70, whereas running a 14-billion-parameter simulation model on four A100 GPUs costs only $70.80, a reduction of roughly 88%.
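
The size of the saving can be checked directly from the two figures quoted above. The following is a minimal sketch of that arithmetic; the function name `cost_reduction` and the variable names are illustrative and not part of ZeroSearch itself.

```python
def cost_reduction(baseline: float, alternative: float) -> float:
    """Return the fractional saving of `alternative` relative to `baseline`."""
    if baseline <= 0:
        raise ValueError("baseline must be positive")
    return (baseline - alternative) / baseline


# Figures quoted in the text, in US dollars.
serpapi_cost = 586.70    # training with a real search engine via SerpAPI
simulated_cost = 70.80   # 14B simulation model on four A100 GPUs

saving = cost_reduction(serpapi_cost, simulated_cost)
ratio = serpapi_cost / simulated_cost

print(f"Saving: {saving:.1%}")       # Saving: 87.9%
print(f"Cost ratio: {ratio:.1f}x")   # Cost ratio: 8.3x
```

In other words, the simulated setup costs a little under one eighth of the SerpAPI-based setup, which is consistent with the "over 80%" reduction usually cited.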
