What Was Actually Revealed in DeepSeek's New Paper—Before It Vanished Overnight?
Author: Editor

Last night, Chen Xiaokang, a researcher specializing in multimodal technologies at DeepSeek, shared and then promptly removed a tweet on X about a new paper titled "Thinking with Visual Primitives." The paper proposes an approach to multimodal reasoning that aims to close the so-called "reference gap": the persistent difficulty AI models have in pinpointing exactly which visual objects they are referring to during reasoning. Its proposed solution is to reason with visual primitives, fundamental visual elements such as points and bounding boxes.
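The paper itself is no longer available, but the core idea of grounding references in primitives like points and boxes can be sketched. Everything below (the `Point` and `Box` types and the counting example) is an illustrative assumption about what such primitives might look like, not the paper's actual implementation:

```python
from dataclasses import dataclass

@dataclass(frozen=True)
class Point:
    """A visual primitive: a single (x, y) location in image coordinates (assumed representation)."""
    x: float
    y: float

@dataclass(frozen=True)
class Box:
    """A visual primitive: an axis-aligned bounding box (x0, y0, x1, y1) (assumed representation)."""
    x0: float
    y0: float
    x1: float
    y1: float

    def contains(self, p: Point) -> bool:
        """Check whether a point falls inside the box (boundaries inclusive)."""
        return self.x0 <= p.x <= self.x1 and self.y0 <= p.y <= self.y1

# A reasoning step might ground a phrase like "the objects on the left" as a
# list of boxes, so that counting becomes an operation on explicit primitives
# rather than on free-form text.
detections = [Box(10, 20, 50, 60), Box(70, 20, 110, 60), Box(300, 20, 340, 60)]
left_half = [b for b in detections if b.x1 < 200]
count = len(left_half)  # 2
```

The appeal of this style is that a reference such as "the second object from the left" resolves to a concrete primitive the model can manipulate, rather than an ambiguous span of text.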

The paper details the model architecture, including the visual compression techniques used to streamline data processing, the methods for constructing training datasets suited to this paradigm, and the post-training optimization strategies used to enhance performance. Experimental results reportedly show the approach outperforming leading models such as GPT-5.4 on tasks requiring precise counting and spatial reasoning, a notable advance for the field.