Google Launches Gemini Embedding 2, Enabling Unified Multimodal Vector Representation
Author: Site Editor

On March 10, 2026, Google DeepMind announced Gemini Embedding 2, its first natively multimodal embedding model. The model maps text, images, video, audio, and documents into a single shared embedding space, enabling cross-modal semantic understanding across 100 languages. Compared with its predecessor, it expands the text context window to 8,192 tokens, processes up to 6 images at once, handles videos up to 120 seconds long, accepts audio input directly without transcription, and supports PDFs of up to 6 pages. The model is available in preview through the Gemini API and Vertex AI, targeting applications such as Retrieval-Augmented Generation (RAG), semantic search, and sentiment analysis.
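A unified embedding space means a text query and, say, an image or audio clip can be compared directly by vector similarity. The sketch below illustrates that retrieval step with cosine similarity over toy vectors; the hard-coded 4-dimensional vectors and item names are placeholders standing in for real embeddings, which in practice would be returned by the Gemini API.

```python
import math

def cosine_similarity(a, b):
    # Similarity of two embedding vectors: dot(a, b) / (|a| * |b|)
    dot = sum(x * y for x, y in zip(a, b))
    norm_a = math.sqrt(sum(x * x for x in a))
    norm_b = math.sqrt(sum(y * y for y in b))
    return dot / (norm_a * norm_b)

def top_match(query_vec, corpus):
    # corpus: list of (item_id, embedding) pairs.
    # Returns the id of the item whose embedding is closest to the query.
    return max(corpus, key=lambda item: cosine_similarity(query_vec, item[1]))[0]

# Toy 4-dimensional vectors standing in for real model output; in a unified
# space, embeddings from different modalities are directly comparable.
query = [0.9, 0.1, 0.0, 0.1]                  # e.g. a text query embedding
corpus = [
    ("image_cat", [0.88, 0.12, 0.05, 0.1]),   # embedding of an image
    ("audio_rain", [0.1, 0.9, 0.3, 0.0]),     # embedding of an audio clip
    ("pdf_report", [0.0, 0.2, 0.9, 0.4]),     # embedding of a document
]

print(top_match(query, corpus))  # nearest cross-modal item to the query
```

The same nearest-neighbor comparison underlies RAG and semantic search: documents, images, or clips are embedded once, stored in a vector index, and retrieved by similarity to the query embedding at request time.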