NVIDIA Optimizes DeepSeek-V4 AI Models, Achieving Out-of-the-Box Performance of Over 150 Tokens/sec/User
Author: Editor

On April 25, NVIDIA announced in a blog post that its Blackwell platform now supports two new models: DeepSeek-V4-Pro and DeepSeek-V4-Flash.

DeepSeek-V4-Pro has 1.6 trillion total parameters, of which 49 billion are active, and is tailored for sophisticated reasoning tasks. DeepSeek-V4-Flash has 284 billion total parameters, with 13 billion active, and targets high-speed, efficient application scenarios. Both models feature a 1-million-token context window, support a maximum output length of 384,000 tokens, and are distributed under the MIT open-source license.

On performance, DeepSeek-V4-Pro delivers out-of-the-box throughput of more than 150 tokens per second per user on the NVIDIA GB200 NVL72, with further gains expected when deployed via vLLM on the Blackwell B300.

Developers can download and deploy the models through NVIDIA NIM microservices, or use the SGLang and vLLM frameworks to build custom inference solutions.
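The gap between total and active parameters is characteristic of Mixture-of-Experts designs, where only a subset of expert weights is used per token. As a rough sketch of what the figures quoted above imply (the MoE interpretation is an assumption; the parameter counts come from the article):

```python
# Parameter figures quoted in the article (total vs. active).
TOTAL_PRO, ACTIVE_PRO = 1.6e12, 49e9        # DeepSeek-V4-Pro
TOTAL_FLASH, ACTIVE_FLASH = 284e9, 13e9     # DeepSeek-V4-Flash

# Fraction of parameters active per token: a rough proxy for
# per-token compute relative to a dense model of the same size.
frac_pro = ACTIVE_PRO / TOTAL_PRO       # ≈ 0.031 (about 3%)
frac_flash = ACTIVE_FLASH / TOTAL_FLASH # ≈ 0.046 (about 4.6%)

print(f"Pro active fraction:   {frac_pro:.3f}")
print(f"Flash active fraction: {frac_flash:.3f}")
```

Activating only a few percent of the weights per token is what lets a 1.6-trillion-parameter model reach interactive per-user throughput.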
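For custom deployments, vLLM exposes an OpenAI-compatible HTTP server (started with `vllm serve <model>`). A minimal sketch of the request shape such a server accepts is below; the model identifier and local URL are illustrative assumptions, not confirmed endpoints for these models:

```python
import json

# Assumed local vLLM OpenAI-compatible endpoint (default port 8000).
BASE_URL = "http://localhost:8000/v1/chat/completions"

# Hypothetical model id for illustration only.
payload = {
    "model": "deepseek-ai/DeepSeek-V4-Flash",
    "messages": [
        {"role": "user", "content": "Summarize MoE inference in one sentence."}
    ],
    "max_tokens": 256,
}

# Serialize the request body; an actual call would POST this JSON
# (e.g. with the `requests` library or the `openai` client) to BASE_URL.
body = json.dumps(payload)
print(body)
```

The same request shape works against SGLang's OpenAI-compatible server, which is part of why both frameworks are interchangeable deployment options here.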