NVIDIA Optimizes DeepSeek-V4 AI Models, Achieving Out-of-the-Box Performance of Over 150 Tokens/sec/User
Author: Editor

On April 25, NVIDIA announced in a blog post that its Blackwell platform now supports two new models: DeepSeek-V4-Pro and DeepSeek-V4-Flash.

DeepSeek-V4-Pro has 1.6 trillion total parameters, of which 49 billion are active, and is tailored for sophisticated reasoning tasks. DeepSeek-V4-Flash has 284 billion total parameters, with 13 billion active, and targets high-speed, efficient application scenarios. Both models feature a 1-million-token context window, support a maximum output length of 384,000 tokens, and are distributed under the MIT open-source license.

On performance, DeepSeek-V4-Pro delivers out-of-the-box throughput of more than 150 tokens per second per user on the NVIDIA GB200 NVL72, with further gains expected when deployed via vLLM on the Blackwell B300.

Developers can download and deploy the models through NVIDIA NIM microservices, or use the SGLang and vLLM frameworks to build custom inference solutions.
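The gap between total and active parameters is characteristic of Mixture-of-Experts designs, where only a subset of expert weights is used per token. As a rough sketch of what the figures quoted above imply (the MoE interpretation is an assumption; the parameter counts come from the article):

```python
# Parameter figures quoted in the article (total vs. active).
TOTAL_PRO, ACTIVE_PRO = 1.6e12, 49e9        # DeepSeek-V4-Pro
TOTAL_FLASH, ACTIVE_FLASH = 284e9, 13e9     # DeepSeek-V4-Flash

# Fraction of parameters active per token: a rough proxy for
# per-token compute relative to a dense model of the same size.
frac_pro = ACTIVE_PRO / TOTAL_PRO       # ≈ 0.031 (about 3%)
frac_flash = ACTIVE_FLASH / TOTAL_FLASH # ≈ 0.046 (about 4.6%)

print(f"Pro active fraction:   {frac_pro:.3f}")
print(f"Flash active fraction: {frac_flash:.3f}")
```

Activating only a few percent of the weights per token is what lets a 1.6-trillion-parameter model reach interactive per-user throughput.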
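For custom deployments, vLLM exposes an OpenAI-compatible HTTP server (started with `vllm serve <model>`). A minimal sketch of the request shape such a server accepts is below; the model identifier and local URL are illustrative assumptions, not confirmed endpoints for these models:

```python
import json

# Assumed local vLLM OpenAI-compatible endpoint (default port 8000).
BASE_URL = "http://localhost:8000/v1/chat/completions"

# Hypothetical model id for illustration only.
payload = {
    "model": "deepseek-ai/DeepSeek-V4-Flash",
    "messages": [
        {"role": "user", "content": "Summarize MoE inference in one sentence."}
    ],
    "max_tokens": 256,
}

# Serialize the request body; an actual call would POST this JSON
# (e.g. with the `requests` library or the `openai` client) to BASE_URL.
body = json.dumps(payload)
print(body)
```

The same request shape works against SGLang's OpenAI-compatible server, which is part of why both frameworks are interchangeable deployment options here.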