Sources familiar with the matter revealed that OpenAI showcased a major technological breakthrough earlier this month: through a new optimization solution, it successfully reduced model inference costs by over 50% while significantly improving computational efficiency. This solution has been applied to the ChatGPT scenarios for unregistered visitors, resulting in a substantial decrease in the number of GPUs required. Specific technical details have not been disclosed, but they may involve optimization techniques such as quantization compression, key-value caching, batch processing, and lightweight model shunting.
