DeepSeek Unveils Technical Report on Multimodal Model: Outperforming GPT-5.4
Author: Editorial staff

On April 30, 2026, DeepSeek released on GitHub a technical report titled Thinking with Visual Primitives, offering an in-depth look at the technical foundations of its newly launched image recognition mode. The model is built on the DeepSeek V4-Flash architecture, a Mixture of Experts (MoE) design with 284 billion total parameters, of which 13 billion are activated during inference. It introduces a novel multimodal reasoning approach, extending the traditional linguistic chain of reasoning into a dual-track thinking process that integrates 'linguistic logic' with 'spatial coordinates'. Throughout the reasoning process, the model directly outputs bounding boxes or points, effectively 'pointing out' the objects of interest within an image, and then continually refers back to these visual anchors in subsequent judgments, significantly improving the accuracy of its visual reasoning.
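To make the dual-track idea concrete, here is a minimal sketch of how a reasoning trace that interleaves text with spatial anchors could be parsed so that later steps can look those anchors up again. The <box>/<point> tag format, the VisualAnchor class, and extract_anchors are illustrative assumptions for this sketch, not the interface described in the report.

```python
# Minimal sketch of "dual-track" reasoning output: the trace mixes prose
# with explicit spatial anchors that later steps can refer back to.
# Tag format and helper names are assumptions, not DeepSeek's actual API.
import re
from dataclasses import dataclass

@dataclass
class VisualAnchor:
    kind: str      # "box" or "point"
    coords: tuple  # (x1, y1, x2, y2) or (x, y), normalized to [0, 1]

# A single pattern so anchors are recovered in the order they were emitted.
ANCHOR_RE = re.compile(
    r"<(box)>\(([\d.]+),([\d.]+),([\d.]+),([\d.]+)\)</box>"
    r"|<(point)>\(([\d.]+),([\d.]+)\)</point>"
)

def extract_anchors(trace: str) -> list[VisualAnchor]:
    """Collect every spatial reference emitted during reasoning."""
    anchors = []
    for m in ANCHOR_RE.finditer(trace):
        if m.group(1):  # box alternative matched: four coordinates
            anchors.append(VisualAnchor("box", tuple(map(float, m.groups()[1:5]))))
        else:           # point alternative matched: two coordinates
            anchors.append(VisualAnchor("point", tuple(map(float, m.groups()[6:8]))))
    return anchors

trace = (
    "The traffic light is here <box>(0.62,0.10,0.71,0.28)</box>; "
    "the lit lamp sits at <point>(0.665,0.14)</point>, the topmost "
    "position, so the light is red."
)
for anchor in extract_anchors(trace):
    print(anchor.kind, anchor.coords)
```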

Through a visual compression strategy, the model retains only around 90 visual entries in the KV cache, even for high-resolution images, achieving more than 7,000-fold compression and making the thinking process markedly more 'lightweight'. Across a series of challenging visual question-answering tasks, the model outperformed competitors including GPT-5.4, Claude-Sonnet-4.6, Gemini-3-Flash, and Qwen3-VL.
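As an illustration of what such a fixed budget could look like in practice, the sketch below pools a large grid of image patch embeddings down to roughly 90 entries before they would enter the KV cache. The pooling scheme, tensor shapes, and 90-token budget here are assumptions for the sketch, not the mechanism the report describes, and the report's 7,000-fold figure presumably measures compression against a much larger uncompressed representation.

```python
# Sketch: hold the visual side of the KV cache to a fixed token budget by
# average-pooling the patch-embedding grid. Shapes and budget are assumed.
import torch
import torch.nn.functional as F

def compress_visual_tokens(patch_grid: torch.Tensor, budget: int = 90) -> torch.Tensor:
    """patch_grid: (H, W, D) patch embeddings -> (~budget, D) cache entries."""
    H, W, D = patch_grid.shape
    # Pick an output grid whose area is close to the token budget (9x10 = 90).
    side = max(1, int(budget ** 0.5))
    out_h, out_w = side, max(1, budget // side)
    pooled = F.adaptive_avg_pool2d(
        patch_grid.permute(2, 0, 1).unsqueeze(0),  # -> (1, D, H, W)
        (out_h, out_w),
    )
    return pooled.squeeze(0).permute(1, 2, 0).reshape(out_h * out_w, D)

# A 3840x2160 image with 14x14 patches yields a grid of about 154x274
# (~42k patch tokens); pooling keeps only ~90 entries in the cache.
grid = torch.randn(154, 274, 1024)
print(compress_visual_tokens(grid).shape)  # torch.Size([90, 1024])
```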