Moore Threads Enhances DeepSeek Distillation Model Inference with Domestic GPU
2025-02-04
Author: Editor

Moore Threads Intelligent Technology announces the successful deployment of an inference service for DeepSeek's distillation models. DeepSeek's distillation technology transfers the capabilities of large-scale models to smaller, more efficient versions, enabling high-performance inference on domestically produced GPUs. Using the open-source Ollama framework, Moore Threads has deployed the DeepSeek-R1-Distill-Qwen-7B model, which performs well across a range of Chinese-language tasks. In addition, Moore Threads' proprietary high-performance inference engine, combined with hardware-software co-optimization, has significantly improved the model's computational efficiency and resource utilization, laying the groundwork for deploying even larger models in the future. Customers can now run inference for the DeepSeek-R1 distillation models on two products: the MTT S80 and the MTT S4000.
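As a rough illustration of what querying a model served through Ollama looks like, the sketch below builds a request for Ollama's standard `/api/generate` endpoint. The model tag `deepseek-r1:7b`, the prompt, and the default local endpoint URL are assumptions for illustration; the exact tag and address on a Moore Threads deployment may differ.

```python
import json

# Ollama's standard generate endpoint; host and port are Ollama's
# defaults and may differ on a Moore Threads deployment (assumption).
OLLAMA_URL = "http://localhost:11434/api/generate"

payload = {
    # Illustrative model tag; the tag used on Moore Threads'
    # build of the distilled 7B model is an assumption.
    "model": "deepseek-r1:7b",
    "prompt": "用一句话介绍摩尔线程。",  # "Introduce Moore Threads in one sentence."
    "stream": False,  # return a single JSON response instead of a stream
}

body = json.dumps(payload, ensure_ascii=False)
print(body)

# To actually send the request (requires a running Ollama server):
#   import urllib.request
#   req = urllib.request.Request(
#       OLLAMA_URL, body.encode("utf-8"),
#       {"Content-Type": "application/json"})
#   with urllib.request.urlopen(req) as resp:
#       print(json.loads(resp.read())["response"])
```

Because the distilled model exposes the ordinary Ollama interface, existing Ollama-based clients and tooling can target the MTT S80 or MTT S4000 deployment without code changes beyond the endpoint address.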