DeepGlint Upgrades Its Visual Encoder to Underpin a Multimodal Large Model
Author: Editor

At its performance briefing on May 20, DeepGlint announced that its visual encoder, Glint-MVT v2.0, has been upgraded and now serves as the foundational visual model for LLaVA-OneVision-2.0. The upgrade improves both performance and functionality. By unifying image and video encoding and using information from the video compression domain to reduce redundancy, the model achieves a fivefold increase in inference speed. As a result, it gives multimodal large models more efficient visual understanding support.