On the evening of September 22, DeepSeek announced that its online large language model had been upgraded to DeepSeek-V3.1-Terminus. The new version offers two modes, 'Thinking Mode' and 'Non-Thinking Mode', both supporting a 128K context window. Non-Thinking Mode has a default output length of 4K tokens, extensible to a maximum of 8K; Thinking Mode defaults to 32K tokens, with a maximum of 64K. On pricing, input costs 0.5 yuan per million tokens on a cache hit and 4 yuan per million tokens on a cache miss, while output costs 12 yuan per million tokens regardless of mode.
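To make the pricing concrete, here is a minimal sketch that computes an API bill from the rates stated above. The function name and interface are illustrative, not part of any official DeepSeek SDK:

```python
def estimate_cost_yuan(input_tokens: int, output_tokens: int, cache_hit: bool) -> float:
    """Estimate a DeepSeek-V3.1-Terminus API bill in yuan from the published rates.

    Rates (per million tokens): input 0.5 yuan on a cache hit, 4 yuan on a
    cache miss; output 12 yuan in either Thinking or Non-Thinking Mode.
    """
    INPUT_RATE_HIT = 0.5    # yuan per million input tokens, cache hit
    INPUT_RATE_MISS = 4.0   # yuan per million input tokens, cache miss
    OUTPUT_RATE = 12.0      # yuan per million output tokens, either mode

    input_rate = INPUT_RATE_HIT if cache_hit else INPUT_RATE_MISS
    return (input_tokens * input_rate + output_tokens * OUTPUT_RATE) / 1_000_000


# One million input tokens plus one million output tokens:
# 12.5 yuan on a cache hit, 16 yuan on a cache miss.
print(estimate_cost_yuan(1_000_000, 1_000_000, cache_hit=True))   # → 12.5
print(estimate_cost_yuan(1_000_000, 1_000_000, cache_hit=False))  # → 16.0
```

The large gap between hit and miss rates (an eightfold difference on input) is why prompt-cache-friendly request patterns, such as reusing a stable system prompt prefix, matter for cost at scale.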