V3.1 Terminus

V3.1 Terminus is an open-weights large language model developed by DeepSeek 2, 5. Formally released in September 2025, the model is a refinement of the DeepSeek-V3.1 architecture, specifically designed to improve language consistency and enhance performance in agentic environments where models interact with external software tools and APIs 5, 42, 43. Within the DeepSeek product ecosystem, V3.1 Terminus is categorized as a general-purpose model, distinguishing it from reasoning models such as DeepSeek-R1 that prioritize internal chain-of-thought processing 1, 2. According to the developer, Terminus is optimized for tasks requiring direct, lower-latency responses without the overhead of extended reasoning steps 2, 40.
The architecture of V3.1 Terminus utilizes a Mixture-of-Experts (MoE) design, a structural approach where only a fraction of the total parameters are engaged during token generation 2, 35. The model contains a total of 671 billion parameters, but only 37 billion of these parameters are active during inference 3, 35. This configuration is intended to maintain broad-spectrum capability while managing the computational costs typically associated with dense models of a similar scale 2, 35. The model supports a context window of 128,000 tokens, allowing it to process large datasets or long-form documentation within a single prompt 2, 28, 34.
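The sparse-activation idea behind MoE can be sketched in a few lines: a router scores every expert for each token, but only the top-k experts are actually executed, so most parameters stay idle on any given forward pass. The expert count, logits, and k below are illustrative toys, not the model's real configuration.

```python
import math

def softmax(xs):
    """Numerically stable softmax over a list of logits."""
    m = max(xs)
    exps = [math.exp(x - m) for x in xs]
    total = sum(exps)
    return [e / total for e in exps]

def route_token(router_logits, k=2):
    """Pick the top-k experts for one token and renormalize their gate weights."""
    probs = softmax(router_logits)
    ranked = sorted(range(len(probs)), key=lambda i: probs[i], reverse=True)
    chosen = ranked[:k]
    total = sum(probs[i] for i in chosen)
    return [(i, probs[i] / total) for i in chosen]

# Eight toy experts; only k=2 run for this token, so 2/8 of the expert
# parameters participate -- analogous to 37B active out of the full total.
logits = [0.1, 2.0, -1.0, 0.5, 1.5, -0.3, 0.0, 0.2]
active = route_token(logits, k=2)  # [(expert_id, gate_weight), ...]
```

Each token's output is then the gate-weighted sum of only the chosen experts' outputs, which is why per-token compute scales with the active rather than the total parameter count.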
DeepSeek states that V3.1 Terminus delivers more stable and reliable outputs across standardized benchmarks when compared to the original V3.1 release 2, 5. While reasoning-focused models like R1 generate visible step-by-step logic chains, V3.1 Terminus is optimized for direct response delivery and tool-calling efficiency 2, 43. According to the developer's evaluations, the model is suited for agentic workflows and search-based tasks where navigation of multi-step environments is required without internal reasoning tokens that can increase response times 2, 8, 42. Comparative assessments note that while the model does not possess the same deductive depth as the R1 series, its improvements in language consistency assist in maintaining coherence across varied linguistic tasks 2, 7, 43.
V3.1 Terminus is distributed under the MIT license, a permissive license that allows for commercial use, modification, and redistribution 2, 4, 30. This licensing strategy is intended to provide enterprises with a high-capacity model that can be integrated into proprietary products without the restrictive terms or usage-based fees often associated with closed-source models 2, 30. By providing these weights openly, DeepSeek aims to reduce the performance gap between open-weights and proprietary systems 2. The release of V3.1 Terminus was followed by the V3.2-Exp series, which introduced sparse attention mechanisms to further optimize long-context processing 2, 17, 33.
Background and Development
The development of V3.1 Terminus followed a rapid series of architectural iterations by DeepSeek, beginning with the release of DeepSeek-V3 in December 2024 2. DeepSeek-V3 utilized a Mixture-of-Experts (MoE) architecture featuring 671 billion total parameters, with 37 billion parameters activated for each token 2. This foundational architecture was noted for its training efficiency, costing approximately US$5.6 million and requiring 2.788 million H800 GPU hours 2.
In early 2025, the development path for DeepSeek's 671B models diverged into two primary tracks: the V series, optimized for general-purpose tasks and speed, and the R series, which prioritized intensive reasoning through large-scale reinforcement learning (RL) 2. The R series models, such as DeepSeek-R1, utilized chain-of-thought (CoT) processing to solve complex mathematical and coding problems, though they initially suffered from high token usage and occasional language mixing 2.
The V3.1 architecture, released in August 2025, was designed to bridge these two tracks by introducing a "hybrid thinking mode" 2. This feature allowed the model to toggle between direct, "non-thinking" responses for efficiency and a reasoning mode for complex tasks 2. Development of the V3.1 series involved an expanded long-context training phase, which consumed 630 billion tokens for the 32K context extension and 209 billion tokens for the 128K phase 2.
V3.1 Terminus was released in September 2025 as a refined iteration of the V3.1 codebase 2. According to DeepSeek, the primary motivation for the Terminus update was to address deficiencies in language consistency and to stabilize performance in agentic environments 2. At the time of its release, the developer identified that while open-source models had achieved strong results on static benchmarks, they often struggled with tool-use generalization and instruction-following in multi-step software environments compared to proprietary competitors like GPT-5 and Gemini 3.0 2.
The Terminus revision specifically targeted the reliability of outputs for agentic workflows, such as code agents and search agents, where the model must interact with external APIs and software tools 2. It also integrated chain-of-thought compression techniques; the developer states that this reduced output tokens by 20% to 50% compared to the earlier R1-0528 reasoning model while maintaining similar performance levels 2. This emphasis on efficiency and stability positioned Terminus as a system-level model intended to narrow the performance gap between open-weights and closed-source proprietary systems 2.
Technical Architecture
V3.1 Terminus is built upon a large-scale Mixture-of-Experts (MoE) Transformer architecture, representing a specific refinement of the DeepSeek-V3 framework 2, 3. The model utilizes a total of approximately 685 billion parameters, while maintaining computational efficiency by activating only 37 billion parameters for any given token during inference 1, 3. This sparse activation strategy is intended to provide the capabilities of a high-parameter dense model while reducing the actual floating-point operations required per token 1, 2.
Core Architectural Innovations
The architecture incorporates Multi-Head Latent Attention (MLA), a mechanism designed to address the memory bottlenecks associated with traditional Multi-Head Attention 2. According to DeepSeek, MLA significantly reduces the Key-Value (KV) cache requirements by compressing the keys and values into a lower-dimensional latent vector during the attention process 2. This architectural choice enables the model to support a context window of 128,000 tokens while maintaining a lower memory footprint compared to standard attention implementations 1, 3, 5.
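The memory saving can be illustrated with back-of-envelope arithmetic: standard attention caches full per-head keys and values for every token, while MLA caches a single compressed latent vector per token. The dimensions below are illustrative placeholders, not DeepSeek's published configuration.

```python
def kv_cache_bytes(layers, tokens, per_token_dim, bytes_per_value=2):
    """Total cache size: one vector of per_token_dim values per token per layer."""
    return layers * tokens * per_token_dim * bytes_per_value

# Illustrative dimensions (placeholders, not the model's actual config):
layers, tokens, heads, head_dim, latent_dim = 61, 128_000, 128, 128, 512

standard = kv_cache_bytes(layers, tokens, 2 * heads * head_dim)  # full K and V
mla = kv_cache_bytes(layers, tokens, latent_dim)                 # compressed latent
ratio = standard / mla  # factor by which the cache shrinks
```

Under these assumed dimensions the cache shrinks by the ratio of the full key-value width to the latent width, which is what makes a 128K-token window tractable in memory.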
The MoE structure of V3.1 Terminus employs a technique referred to as "Auxiliary-Loss-Free Load Balancing" 2. In traditional MoE models, auxiliary losses are often used to ensure that all "experts" are utilized equally, but this can sometimes degrade the model's overall reasoning performance. DeepSeek states that their method manages expert load balancing without these performance trade-offs, allowing for more effective specialization across the 685-billion parameter space 2. Additionally, the model utilizes Multi-Token Prediction (MTP), a training objective where the model is tasked with predicting multiple subsequent tokens simultaneously rather than a single next token 2. DeepSeek claims this approach improves training data efficiency and enhances the model's internal representation of long-range dependencies 2.
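A minimal sketch of the auxiliary-loss-free idea, under the assumption that balancing works by nudging a per-expert routing bias (used only when selecting experts) rather than by adding a loss term to the gradient; the step size and load figures are invented for illustration.

```python
def update_balancing_bias(bias, expert_load, target_load, step=0.01):
    """Nudge per-expert routing biases: overloaded experts are biased down,
    underloaded experts up. No auxiliary loss term touches the gradients."""
    return [
        b - step if load > target_load else b + step
        for b, load in zip(bias, expert_load)
    ]

bias = [0.0, 0.0, 0.0, 0.0]
load = [0.40, 0.10, 0.30, 0.20]  # fraction of tokens each expert received
bias = update_balancing_bias(bias, load, target_load=0.25)
# The bias affects only which experts are *selected*; the gate weights that
# scale expert outputs still come from the raw router scores.
```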
Training and Hardware Utilization
The training of the foundational V3 architecture, which informs the V3.1 Terminus checkpoint, involved significant optimizations for NVIDIA hardware 2, 3. A key technical feature is the implementation of FP8 (8-bit floating point) mixed-precision training 2. By utilizing FP8 for the majority of computations, the developers were able to increase training throughput and reduce memory consumption on H800 GPU clusters 2. DeepSeek reports that this methodology contributed to a highly efficient training process relative to the model's total parameter count 2.
To further optimize hardware performance, the architecture utilizes a proprietary pipeline parallelism algorithm called "DualPipe" 2. This system is designed to maximize the overlap between computation and communication across distributed nodes, reducing the idle time (bubbles) usually found in large-scale model training 2. For inference, the model is compatible with NVIDIA Blackwell and Hopper architectures and is typically deployed using the SGLang runtime engine 3.
Modality and Constraints
Unlike some contemporaneous large language models, V3.1 Terminus does not possess native multimodal capabilities 3. Technical documentation specifies that the model is restricted to text-to-text processing, with input and output formats limited to string data 3. While it can perform complex reasoning, code generation, and tool-assisted search tasks, it cannot natively process or generate image or audio data 1, 3. The model's primary architectural focus is instead directed toward "agentic" functions, such as strict function calling and multi-step reasoning in software engineering and web-based information retrieval 1, 4.
Capabilities and Limitations
V3.1 Terminus is a text-only language model 5, 43. According to DeepSeek, the model is designed with dual-mode functionality to balance computational speed with complex reasoning, and is primarily optimized for agentic workflows, including software engineering and automated information retrieval 5, 6, 21. It operates within a 128,000-token context window 5, 14, 28.
Agentic and Technical Capabilities
The model is designed for agentic environments requiring interaction with external tools and operating systems 5, 43. On the SWE Verified benchmark, which evaluates the resolution of real-world software engineering issues, DeepSeek reports that V3.1 Terminus achieved a score of 68.4, an increase from the 66.0 recorded by the previous iteration 5, 8. The developer also reports improved proficiency in command-line environments and the execution of terminal-based tasks 5, 43.
DeepSeek states that the model's web-browsing and search capabilities have been refined. On the BrowseComp benchmark, the model's score reportedly rose from 30.0 to 38.5, which the developer associates with an improved ability to perform multi-step web searches 5. However, the developer also noted a decrease in performance on Chinese-language search benchmarks, suggesting that training refinements may have prioritized English-language consistency and logic 5.
Reasoning Modes and Knowledge Accuracy
V3.1 Terminus utilizes two operational modes to address varying task complexities 5, 21:
- Thinking Mode (deepseek-reasoner): This mode uses a chain-of-thought process for multi-step problems, allowing the model to perform internal reasoning before generating a final response 5, 40. According to developer documentation, it supports a maximum output of up to 64,000 tokens 5, 22.
- Non-Thinking Mode (deepseek-chat): This mode is intended for direct interaction and simpler tasks, providing responses with a maximum output limit of 8,000 tokens 5, 17, 22.
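The practical difference between the two modes shows up in which endpoint a request targets. The sketch below assembles an OpenAI-style chat-completions payload for each; the `build_request` helper is hypothetical, with only the model names and output ceilings taken from the documentation cited above.

```python
def build_request(mode, prompt):
    """Assemble a chat-completions payload for the two documented endpoints."""
    thinking = mode == "thinking"
    return {
        "model": "deepseek-reasoner" if thinking else "deepseek-chat",
        "messages": [{"role": "user", "content": prompt}],
        # Output ceilings per the docs: 64K in thinking mode, 8K otherwise.
        "max_tokens": 64_000 if thinking else 8_000,
    }

fast = build_request("non-thinking", "Translate 'hello' into French.")
deep = build_request("thinking", "Prove that the square root of 2 is irrational.")
```

Either payload could then be sent to an OpenAI-compatible client pointed at the DeepSeek API; the request shape itself does not change between modes, only the model name and output budget.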
In scientific and academic reasoning, the model is reported to maintain high accuracy on datasets such as GPQA Diamond and "Humanity's Last Exam," which evaluate expert-level knowledge in science, technology, engineering, and mathematics (STEM) 5, 23. DeepSeek asserts these capabilities are supported by a Mixture-of-Experts (MoE) architecture 1, 35. This architecture activates 37 billion parameters during inference out of a total 671 billion parameters to manage performance and latency 1, 15, 35.
Information Retrieval and RAG
The 128,000-token context window allows the model to process large codebases or lengthy documents in a single iteration, facilitating Retrieval-Augmented Generation (RAG) tasks 5, 28. This capacity is intended to allow the model to synthesize information from various live sources or internal archives without requiring external data chunking 5, 28.
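A chunk-free RAG flow of this kind might be sketched as a greedy packer that fits whole documents into the context budget instead of splitting them; the `pack_context` helper and its 4-characters-per-token heuristic are illustrative assumptions, not a DeepSeek API.

```python
def pack_context(documents, budget_tokens=128_000, answer_reserve=4_096):
    """Greedily pack whole documents into one prompt until the context budget
    (minus a reserve for the model's answer) is exhausted -- no chunking."""
    def approx_tokens(text):
        return len(text) // 4  # rough heuristic: ~4 characters per token

    packed, used = [], 0
    for doc in documents:
        cost = approx_tokens(doc)
        if used + cost > budget_tokens - answer_reserve:
            break
        packed.append(doc)
        used += cost
    return "\n\n---\n\n".join(packed), used

docs = ["alpha " * 2000, "beta " * 2000, "gamma " * 2000]
prompt, used_tokens = pack_context(docs)  # all three fit with room to spare
```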
Known Limitations and Constraints
V3.1 Terminus is a unimodal model and cannot process or generate image, audio, or video inputs 5, 43. The Thinking Mode (deepseek-reasoner) currently lacks support for certain features available in the standard chat mode, such as function calling and Fill-In-the-Middle (FIM) completion 5, 17.
While the model is designed for logic-heavy tasks, the "non-thinking" mode does not utilize multi-step reasoning paths, which may result in lower accuracy for tasks requiring rigorous deduction 5, 21. Although the model weights are available under an MIT license, local deployment requires substantial hardware resources due to the 671-billion parameter count 5, 26, 30. Community-developed techniques such as quantization and offloading MoE layers to the CPU are used to mitigate these hardware requirements 5, 26.
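A back-of-envelope estimate shows why quantization matters for local deployment. This counts weight storage only, ignoring the KV cache, activations, and runtime overhead, and assumes a uniform bits-per-weight figure.

```python
def weight_gib(total_params_billion, bits_per_weight):
    """Approximate weight storage in GiB at a given quantization level.
    Counts weights only: no KV cache, activations, or runtime overhead."""
    total_bytes = total_params_billion * 1e9 * bits_per_weight / 8
    return total_bytes / 2**30

native_fp8 = weight_gib(671, 8)  # the released FP8 checkpoint: ~625 GiB
int4 = weight_gib(671, 4)        # a community 4-bit quantization: ~312 GiB
```

Even at 4 bits per weight the checkpoint exceeds any single GPU's memory, which is why community deployments combine quantization with offloading the MoE layers to CPU RAM.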
Performance Benchmarks
The performance of V3.1 Terminus is characterized by its results in agentic benchmarks and its placement within the Artificial Analysis Intelligence Index v4.0. This index evaluates models across ten distinct dimensions, including GDPval-AA (real-world work tasks), SciCode (coding), GPQA Diamond (scientific reasoning), and Humanity's Last Exam 1. As an open-weights model, V3.1 Terminus competes in a landscape where the highest-ranked models, such as Gemini 3.1 Pro Preview and GPT-5.4, have achieved intelligence scores as high as 57 1.
Agentic and Technical Benchmarks
V3.1 Terminus shows measurable gains in agentic workflows compared to its predecessor. On the Terminal-Bench Hard evaluation, which measures a model's ability to navigate and execute multi-step tasks within a simulated command-line interface, the model's score increased from 31.3 to 36.7 4. This 17% improvement is attributed to enhancements in the model's stability and its ability to adhere to sequences of actions in terminal environments 4.
In benchmarks focused on tool utilization and autonomous research, the model demonstrated a significant increase in efficiency. The BrowseComp score, which assesses agentic tool use and information retrieval via search agents, rose from 30.0 to 38.5, representing a 28% improvement 4. These results are supported by the model's native structured tool calling capabilities, which allow it to interact with external integrations more reliably than earlier iterations 4.
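Structured tool calling generally means the model receives JSON-schema tool definitions and emits matching call objects rather than free text. The sketch below builds one OpenAI-style definition; the `make_tool` helper and the `web_search` tool are hypothetical illustrations, not part of DeepSeek's API.

```python
def make_tool(name, description, params):
    """Build one OpenAI-style function-tool definition from a property dict."""
    return {
        "type": "function",
        "function": {
            "name": name,
            "description": description,
            "parameters": {
                "type": "object",
                "properties": params,
                "required": list(params),  # treat every listed property as required
            },
        },
    }

# Hypothetical tool for illustration; not an actual DeepSeek integration.
search_tool = make_tool(
    "web_search",
    "Run a web search and return the top results.",
    {"query": {"type": "string", "description": "The search query."}},
)
```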
Speed, Latency, and Inference Metrics
Inference performance is measured by Artificial Analysis through output speed (tokens per second) and latency (time to first token). While the fastest models in the index, such as Mercury 2, reach speeds of 835.1 tokens per second, V3.1 Terminus operates as a high-parameter Mixture-of-Experts (MoE) model 1. It activates 37 billion parameters out of a total 685 billion during the inference forward pass, a strategy designed to balance computational load with reasoning depth 1.
Latency for the model is tracked as Time To First Token (TTFT). For non-reasoning versions of Terminus, this metric accounts for input processing time before the first completion token is received 1. The model maintains these performance levels across a 128,000-token context window, allowing for the processing of approximately 192 A4 pages of text in a single prompt while maintaining response consistency 1.
Cost Efficiency and Economic Analysis
The economic profile of V3.1 Terminus is evaluated based on its price-to-quality ratio. Pricing is calculated as a blended rate in USD per 1 million tokens, using a standard 3:1 ratio of input to output tokens 1. As an open-weights model under the MIT license, it provides a different cost structure than proprietary APIs, supporting unrestricted commercial use 1. Artificial Analysis positions the model within its "Price-Quality Variance" charts to determine its attractiveness relative to more expensive reasoning-heavy models 1. According to DeepSeek, the refinements in the V3.1 Terminus release were intended to reduce instances of abnormal character generation and language mixing (Chinese/English), which can indirectly improve efficiency by reducing the need for repeated prompts or error corrections 4.
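The blended rate is straightforward arithmetic; the sketch below uses hypothetical per-million-token prices purely to show how the 3:1 input-to-output weighting works.

```python
def blended_price(input_per_m, output_per_m, ratio=(3, 1)):
    """Blended USD per 1M tokens at a given input:output token ratio."""
    in_parts, out_parts = ratio
    total = input_per_m * in_parts + output_per_m * out_parts
    return total / (in_parts + out_parts)

# Hypothetical per-million-token prices, for illustration only:
price = blended_price(input_per_m=0.28, output_per_m=1.68)  # -> 0.63 USD / 1M
```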
Safety and Ethics
DeepSeek states that the V3.1 series, including V3.1 Terminus, underwent instruction tuning and Reinforcement Learning from Human Feedback (RLHF) to align the model with human preferences for helpfulness, harmlessness, and honesty 2. V3.1 Terminus specifically addresses issues of language consistency present in earlier versions, significantly reducing "CN/EN mix-ups" and the generation of random characters 2, 3. The model is released under the MIT License, allowing users to inspect the weights and host the system on private infrastructure to maintain data sovereignty 2, 10.
Safety Benchmarks and Red-Teaming Results
Independent technical evaluations of the DeepSeek series, conducted by the National Institute of Standards and Technology (NIST) in September 2025, identified significant vulnerabilities in the model's safety architecture 6. According to NIST, DeepSeek's models demonstrated a 94% compliance rate with "overtly malicious requests" using common jailbreaking techniques, whereas U.S. reference models complied with 8% of such requests 6. Additionally, DeepSeek models were found to be more susceptible to agent hijacking; agents based on the architecture were 12 times more likely than frontier U.S. models to follow malicious instructions designed to derail them from user-assigned tasks 6.
Earlier red-teaming reports on the foundation architecture by Enkrypt AI characterized the risk levels for toxicity, bias, and the generation of Chemical, Biological, Radiological, and Nuclear (CBRN) content as "high" 5. In these security tests, 83% of bias-focused attacks successfully prompted the model to link demographic groups—including race, religion, and health status—with biased or unfair attributes 5.
Ethical Concerns and Geopolitical Bias
NIST evaluations indicated that DeepSeek models advance Chinese Communist Party (CCP) narratives more frequently than U.S. models 6. On datasets of politically sensitive questions, DeepSeek models echoed what NIST described as "inaccurate and misleading CCP narratives" four times as often as U.S. reference models 6. This behavior is consistent with established Chinese regulatory frameworks, such as the 2023 Provisional Measures for the Administration of Generative Artificial Intelligence Services, which require AI service providers to fulfill obligations regarding content marking and adherence to national values 9.
Data Privacy and Transparency
While the open-weight nature of V3.1 Terminus facilitates transparency, users of the official DeepSeek API or web application are subject to direct data collection 8. DeepSeek's privacy policy acknowledges that data is processed and stored on servers located in the People's Republic of China 8. This has led to international regulatory pressure; by mid-2025, investigations were launched in 13 European jurisdictions concerning potential General Data Protection Regulation (GDPR) violations regarding the direct transfer of personal data to China without an adequacy decision 8. Despite these concerns, global adoption of the models increased by nearly 1,000% in the first half of 2025 6.
Applications and Use Cases
The V3.1 Terminus model is primarily applied in environments requiring autonomous execution, tool utilization, and language stability 3. DeepSeek asserts that this version serves as a more reliable instrument for complex real-world tasks compared to earlier iterations of the V3.1 architecture, specifically addressing previous issues with inconsistent outputs and language mixing 2, 3.
Agentic and Terminal Automation
A primary use case for V3.1 Terminus involves high-scale agentic workflows, particularly those requiring interaction with external software tools and terminal environments 3, 5. The model is utilized in the development of "Search Agents" that retrieve real-time data and "Code Agents" designed to write, debug, and maintain intricate codebases 3. Performance benchmarks indicate improved efficacy in these areas; the model recorded a score of 36.7 on the Terminal-bench, an increase from the 31.3 achieved by the original V3.1 checkpoint 3. This capability facilitates the autonomous execution of terminal-driven projects, such as locating contemporary API documentation and applying it within a software environment 3.
Enterprise Software and Technical Support
In enterprise settings, V3.1 Terminus is applied as a technical support system and a coding assistant 3, 5. Its architecture supports native structured tool calling, allowing for deeper integration with external software suites 3. The model’s improved language consistency, which reduces instances of mixed Chinese and English text, makes it suitable for high-fidelity bilingual document generation 2, 3. Organizations use the model to produce compliant reports, legal contracts, and technical manuals in contexts where linguistic precision is necessary for formal verification 3.
Long-Context and RAG Workflows
Some providers serve the model with a context window of up to 164,000 tokens, enough to process roughly 328 pages of text in a single sequence 4. This capacity is utilized in Retrieval-Augmented Generation (RAG) workflows, particularly for navigating and summarizing large volumes of technical documentation or proprietary research libraries 3, 4. By leveraging this large context window, the model can maintain coherence when performing multi-step planning for agent workflows or retrieving specific data points from expansive datasets 2, 3.
Scenarios for Avoidance
V3.1 Terminus is less recommended for several specialized scenarios. As a text-only transformer model, it is not capable of multimodal creative work, such as image understanding or video generation 5. For tasks requiring the highest levels of mathematical theorem proving, DeepSeek recommends specialized variants like DeepSeek-Prover-V2 2. Additionally, for general Q&A or simple content creation where high-speed responses are preferred over deep reasoning, the developer suggests utilizing standard general-purpose models or smaller distilled versions to reduce computational costs 2.
Reception and Impact
V3.1 Terminus has been characterized by industry observers as a refinement of the V3 architecture, with the name "Terminus" signifying the conclusion of the V3 series before the developer's transition to a new V4 architecture 5. The developer community's reception of the model centered on DeepSeek's reputation as a platform that incorporates user feedback into its iterative releases 5. Analysts noted that the release successfully addressed specific technical grievances from earlier versions, such as inconsistent language mixing between Chinese and English and the generation of unexpected characters 5.
In the large language model market, V3.1 Terminus is positioned as a competitive alternative to proprietary models due to its aggressive pricing structure 5. According to DeepSeek, the model's API costs are US$0.07 per million input tokens for cache hits and US$1.68 per million output tokens, which third-party reviews have described as significantly cheaper than many premium proprietary options 5. This pricing strategy, combined with the model's open-weights availability under an MIT license, has strengthened the open-weights movement by providing a high-parameter alternative (671 billion parameters) that can be run locally, though doing so requires substantial hardware resources and significant VRAM 5.
The industry impact of V3.1 Terminus is most documented in the domain of agentic AI and automated workflows. Benchmark evaluations showed measurable improvements in task-oriented performance; for example, the model's score on the BrowseComp benchmark rose from 30.0 to 38.5, and its SWE Verified score increased from 66.0 to 68.4 5. These results indicate an enhanced capability for multi-step web searches and software engineering tasks compared to its immediate predecessors 5. However, third-party testing also revealed a decrease in performance on Chinese-language benchmarks, which observers suggest indicates that refinements to the model's consistency prioritized English-language performance 5.
From a societal and industry perspective, the emergence of the V3.1 series represents a significant instance of a Chinese-developed model leading or parity-matching Western counterparts on global benchmarks 2, 5. While the highest-ranked models in 2025 included proprietary systems like Gemini 3.1 Pro and GPT-5.4, the performance of V3.1 Terminus in the Artificial Analysis Intelligence Index v4.0 highlights the narrowing gap between open-weights models and closed-source systems 1, 5. Despite these technical gains, community feedback indicates that the model's optimization for reasoning and agentic tasks may have led to a decline in creative writing performance relative to earlier versions 5.
Version History
The lineage of the Terminus model originated with the release of DeepSeek-V3 in December 2024, an architecture featuring 671 billion parameters and 37 billion active parameters per token 2. This was followed by the introduction of the reasoning-focused DeepSeek-R1 in January 2025 2. Iterative refinements to the V3 and R1 architectures continued throughout the first half of 2025, including the DeepSeek-V3-0324 update in March and the DeepSeek-R1-0528 release in May, the latter of which targeted improved reasoning performance and a reduction in hallucination rates by approximately 45–50% 2.
DeepSeek-V3.1 was formally introduced on August 21, 2025, implementing a hybrid reasoning architecture capable of switching between thinking and non-thinking modes 3. The V3.1 Terminus variant was deployed as a production upgrade on September 22, 2025 3. According to the developer, this iteration was specifically optimized to address user reports of language inconsistency, notably removing mixed Chinese-English (CN/EN) outputs and stabilizing the generation of external tool-calling sequences 3, 4.
The Terminus release coincided with a restructuring of the DeepSeek API, where the deepseek-chat endpoint was transitioned to the model's non-thinking mode and deepseek-reasoner was assigned to its thinking mode 3. In addition to linguistic stability, the model demonstrated measurable gains in agentic benchmarks; the Terminal-bench score improved from 31.3 to 36.7, while the BrowseComp score for general agentic tool use increased by approximately 28% 4. The Terminus model remained the primary production iteration until it was succeeded by the experimental DeepSeek-V3.2-Exp on September 29, 2025, and the stable DeepSeek-V3.2 release on December 1, 2025 3.
Sources
- 1“The Complete Guide to DeepSeek Models: V3, R1, V3.1, V3.2 and Beyond”. BentoML. Retrieved April 1, 2026.
In September 2025, DeepSeek introduced DeepSeek-V3.1-Terminus, a minor update that improves language consistency and agent performance. Overall, this version delivers more stable and reliable outputs across benchmarks compared to V3.1. ... V3.1 can switch between “thinking” (chain-of-thought reasoning like R1) and “non-thinking” (direct answers like V3) ... Parameters 685B ... License MIT
- 2Bansal, Riya. (September 23, 2025). “DeepSeek-V3.1-Terminus: A Deep Dive into the New AI Model”. Analytics Vidhya. Retrieved April 1, 2026.
The model has a total of 671 billion parameters (with 37 billion active at any given time) and continues the path forward as a powerful, efficient hybrid Mixture of Experts (MoE) model. ... The model has the ability to support a sizable, whopping 128,000 token context window.
- 3DeepSeek-AI. (2024). “DeepSeek-V3 Technical Report”. arXiv. Retrieved April 1, 2026.
DeepSeek-V3 adopts Multi-Head Latent Attention (MLA) to achieve efficient inference and DeepSeekMoE with auxiliary-loss-free load balancing for economical training. ... We also pioneer a training framework and FP8 design to harness the full potential of H800 GPUs.
- 4“deepseek-v3.1-terminus Model by Deepseek-ai | NVIDIA NIM”. NVIDIA. Retrieved April 1, 2026.
Total Parameters: ~685B. Input Types: Text. Output Types: Text. Supported Hardware Microarchitecture Compatibility: NVIDIA Blackwell, NVIDIA Hopper. Acceleration Engine: SGLang.
- 5(September 22, 2025). “DeepSeek-V3.1-Terminus | DeepSeek API Docs”. DeepSeek. Retrieved April 1, 2026.
DeepSeek-V3.1-Terminus delivers more stable & reliable outputs across benchmarks compared to the previous version. Agent upgrades: stronger Code Agent & Search Agent performance.
- 6(August 21, 2025). “DeepSeek-V3.1 Release | DeepSeek API Docs”. DeepSeek. Retrieved April 1, 2026.
128K context for both deepseek-chat and deepseek-reasoner. V3.1 Base: 840B tokens continued pretraining for long context extension on top of V3.
- 7(September, 2025). “DeepSeek V3.1 Terminus (Non-reasoning) vs DeepSeek V3.1 Terminus (Non-reasoning): Model Comparison”. Artificial Analysis. Retrieved April 1, 2026.
Artificial Analysis Intelligence Index v4.0 incorporates 10 evaluations: GDPval-AA, τ²-Bench Telecom, Terminal-Bench Hard, SciCode, AA-LCR, AA-Omniscience, IFBench, Humanity's Last Exam, GPQA Diamond, CritPt.
- 8My Social. (September 26, 2025). “DeepSeek-V3.1-Terminus: Inside Its Superior Agentic AI Capabilities”. AI Monks. Retrieved April 1, 2026.
The model showed a significant increase in its Terminal-bench score from 31.3 (last year) to 36.7 (currently). Optimization efforts... resulted in more than a 28% increase in the BrowseComp (Agentic Tool Use) score rising from 30.0–38.5.
- 9(January 2025). “DeepSeek R1 Red Teaming Report”. Enkrypt AI. Retrieved April 1, 2026.
83% of Bias attacks were successful in producing biased output, notably for health, race and religion... highly biased and highly vulnerable to generating insecure code, toxic, harmful and CBRN content.
- 10(September 2025). “Evaluation of DeepSeek AI Models”. NIST. Retrieved April 1, 2026.
DeepSeek models are far more susceptible to jailbreaking attacks than U.S. models... complied with 94% of overtly malicious requests... Agents based on DeepSeek’s most secure model (R1-0528) were, on average, 12 times likelier than evaluated U.S. frontier models to follow malicious instructions.
- 14“DeepSeek V3.1 Terminus - Specs, API & Pricing”. Puter Developer. Retrieved April 1, 2026.
DeepSeek V3.1 Terminus supports a context window of 164K tokens. For reference, that is roughly equivalent to 328 pages of text.
- 15“DeepSeek-V3 Architecture and Training”. DeepSeek. Retrieved April 1, 2026.
DeepSeek-V3 utilized a Mixture-of-Experts (MoE) architecture featuring 671 billion total parameters... costing approximately US$5.6 million.
- 17“Change Log | DeepSeek API Docs”. DeepSeek. Retrieved April 1, 2026.
Date: 2025-08-21... DeepSeek-V3.1... Date: 2025-09-22... DeepSeek-V3.1-Terminus... Date: 2025-09-29... DeepSeek-V3.2-Exp... Date: 2025-12-01... DeepSeek-V3.2
- 21“DeepSeek V3.1 Terminus - Intelligence, Performance & Price Analysis”. Artificial Analysis. Retrieved April 1, 2026.
Analysis of DeepSeek's DeepSeek V3.1 Terminus (Reasoning) and comparison to other AI models across key metrics including quality, price, performance (tokens per second & time to first token), context window & more.
- 22“DeepSeek V3.1 Terminus (Non-reasoning) vs DeepSeek V3 (Dec '24): Model Comparison”. Artificial Analysis. Retrieved April 1, 2026.
Comparison between DeepSeek V3.1 Terminus (Non-reasoning) and DeepSeek V3 (Dec '24) across intelligence, price, speed, context window and more.
- 23 “DeepSeek-V3.1: Specifications and GPU VRAM Requirements”. APX ML. Retrieved April 1, 2026.
Technical specifications and GPU VRAM requirements for DeepSeek-V3.1.
- 26 “The Context Window Illusion: Why Your 128K Tokens Aren’t Working”. John Munn, Medium. Retrieved April 1, 2026.
Most of your 128K context is being ignored. This guide breaks down why, and how to structure prompts for real token efficiency and ROI.
- 28 “DeepSeek-V3 vs DeepSeek-V3.1-Terminus – Performance, Pricing”. SiliconFlow. Retrieved April 1, 2026.
Compare DeepSeek-V3 and DeepSeek-V3.1-Terminus across performance, cost, capabilities, and real-world use cases.
- 30 “DeepSeek-V3.2 Release | DeepSeek API Docs”. DeepSeek. Retrieved April 1, 2026.
Launching DeepSeek-V3.2 & DeepSeek-V3.2-Speciale, reasoning-first models built for agents. DeepSeek-V3.2: official successor to V3.2-Exp, now live on App, Web & API.
- 33 “does deepseek v3's training cost of under $6 million presage an ...”. Reddit (r/OpenAI). Retrieved April 1, 2026.
- 34 “DeepSeek V3 Training Cost: Here's How It Compares To Llama 3.1 ...”. APX ML. Retrieved April 1, 2026.
Comparison of the training costs of two cutting-edge language models, DeepSeek V3 (671B parameters) and Llama 3.1 (405B parameters).
- 35 “Deepseek-V3 Training Budget Fermi Estimation”. planetbanatt.net. Retrieved April 1, 2026.
- 40 “DeepSeek V3.1 Terminus: The ChatGPT killer is back”. Medium. Retrieved April 1, 2026.
