On September 30, Zhipu, a prominent Chinese company specializing in large AI models, released and open-sourced its latest model, GLM-4.6. The new model brings substantial improvements to core capabilities, particularly agentic coding. According to Zhipu, GLM-4.6 has been deployed on Cambricon's domestically produced chips using FP8+Int4 hybrid quantization, making it the first FP8+Int4 integrated model-chip solution to enter production on Chinese-made chips. In addition, Moore Threads' next-generation GPU can run GLM-4.6 stably at native FP8 precision using the vLLM inference framework.