Zhipu has officially announced the release and open-sourcing of the GLM-4.6V series of multimodal large models. The series comprises two versions: the base GLM-4.6V (106B-A12B) and the lightweight GLM-4.6V-Flash (9B).
During training, the context window of the new models was extended to 128k tokens, and their visual understanding accuracy reaches state-of-the-art (SOTA) levels among models of the same parameter scale.
The models are also the first to natively integrate Function Call capabilities into a visual model, forming a complete pipeline from 'visual perception' to 'executable actions.'
Compared with GLM-4.5V, the GLM-4.6V series is 50% cheaper: API calls cost 1 yuan per million input tokens and 3 yuan per million output tokens. GLM-4.6V-Flash is free of charge.
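To make the pricing concrete, the arithmetic can be sketched in a few lines of Python. This is only an illustration built on the two rates quoted above. The function name `estimate_cost_yuan` and the constants are hypothetical and do not come from any Zhipu SDK, and actual billing may differ (for example through rounding or tiered pricing).

```python
from decimal import Decimal

# Rates quoted in the announcement for GLM-4.6V (yuan per million tokens).
INPUT_YUAN_PER_MILLION = Decimal("1")
OUTPUT_YUAN_PER_MILLION = Decimal("3")
TOKENS_PER_MILLION = Decimal(1_000_000)


def estimate_cost_yuan(input_tokens: int, output_tokens: int) -> Decimal:
    """Estimate the cost of one API call in yuan (hypothetical helper).

    Assumes flat per-token pricing with no rounding or minimum charge.
    """
    if input_tokens < 0 or output_tokens < 0:
        raise ValueError("token counts must be non-negative")
    total = (
        Decimal(input_tokens) * INPUT_YUAN_PER_MILLION
        + Decimal(output_tokens) * OUTPUT_YUAN_PER_MILLION
    )
    return total / TOKENS_PER_MILLION


if __name__ == "__main__":
    # A request that fills the 128k context and returns 2,000 tokens.
    print(estimate_cost_yuan(128_000, 2_000))
```

Under these assumptions, a request that fills the full 128k-token context and generates 2,000 output tokens costs 0.128 + 0.006 = 0.134 yuan. `Decimal` is used rather than floats so that small per-call amounts add up without rounding drift.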
In addition, the series has been included in the GLM Coding Plan, and a dedicated MCP tool has been developed for it.
