DeepGlint’s Visual Encoder Enhanced to Underpin a Multimodal Large Model

9 hours ago / Reading time: under 1 minute

Author: Editor

At its performance briefing on May 20, DeepGlint announced that it has upgraded its visual encoder, Glint-MVT v2.0, which now serves as the foundational visual model for LLaVA-OneVision-2.0. According to the company, the upgraded model improves on both performance and functionality: it unifies image and video encoding in a single model and uses information from the video compression domain to reduce redundancy, resulting in a fivefold increase in inference speed. As a result, it provides more efficient visual understanding support for multimodal large models.
