
Kimi K2 Instruct

Kimi K2 Instruct is a large language model (LLM) developed by the Chinese artificial intelligence company Moonshot AI, designed to function within an "agentic intelligence" framework 6. Built using a Mixture-of-Experts (MoE) architecture, the model is characterized by its ability to execute complex tasks involving tool use, autonomous coding, and environment interaction rather than providing only textual responses 6. It is part of the Kimi K2 series, which includes both a foundation model (Kimi-K2-Base) and a post-trained instruction-tuned version (Kimi-K2-Instruct) intended for general-purpose chat and agentic workflows 6. Moonshot AI released the model weights to the public, positioning it as an open-access alternative to proprietary frontier models 6.

The model features a total parameter count of 1 trillion, though its MoE architecture ensures that only 32 billion parameters are activated for any given token to maintain computational efficiency 6. It supports a context window of 256,000 tokens, allowing it to process and reason over extensive documents or codebases 6. During its development, Moonshot AI utilized the MuonClip optimizer, a variation of the Muon optimizer designed to stabilize training by controlling attention logit explosions through a technique known as qk-clip 6. The model was pre-trained on 15.5 trillion tokens, with the developer emphasizing token efficiency as a primary metric for scaling its intelligence 6.

According to benchmarks provided by Moonshot AI, Kimi K2 Instruct demonstrates performance comparable to or exceeding established models such as GPT-4o and the Claude 3.5 series in specific domains 6. On the SWE-bench Verified benchmark, which measures a model's ability to resolve real-world software issues, Kimi K2 Instruct reportedly achieved a 65.8% pass@1 rate using a single-attempt patch approach 6. It also recorded a 49.5% average on the AIME 2025 mathematics benchmark and 75.1% on the GPQA-Diamond science benchmark 6. These results are presented for the model in its "reflex-grade" state, which operates without the extended "thinking" or reasoning time found in some competing architectures, though the developer has indicated plans to integrate such reasoning capabilities in future iterations 6.

A primary differentiator for Kimi K2 Instruct is its optimization for agentic tasks, where it acts as an autonomous agent capable of using external tools and managing digital environments 6. Moonshot AI states that the model can automatically interpret tool definitions and execute multi-step workflows, such as performing statistical data analysis, generating interactive web pages, or managing software development tasks in a terminal environment 6. To achieve these capabilities, the model was trained using large-scale agentic data synthesis and a general reinforcement learning (RL) system that employs a self-judging mechanism for tasks where rewards are not easily verifiable, such as report writing 6. While the model shows strength in tool use, the developer notes current limitations, including potential output truncation during complex reasoning and performance degradation in one-shot prompting compared to full agentic frameworks 6.

Background

Moonshot AI was established with a primary focus on addressing the limitations of existing large language models (LLMs), particularly regarding long-context processing and specialized reasoning tasks 10. Backed by significant investment from Alibaba, the company released its initial Kimi chatbot as a Chinese-market alternative to OpenAI's ChatGPT, which was not officially available in the region 10. The development of the Kimi K2 series followed the company's Moonshot-v1 models, marking a transition from general-purpose conversational agents toward what the developer describes as "agentic intelligence" 10.

The shift to the K2 architecture was motivated by a market demand for models capable of autonomous reasoning and multi-step task execution 10. Winston Ma, an adjunct professor at NYU School of Law, characterized the Kimi research models as representing a "paradigm shift," moving beyond fluent response generation toward the kind of complex cognitive work required for expert-level problem solving 10. This focus on "agentic AI" was intended to allow the model to make several simultaneous decisions to complete intricate tasks, a capability Moonshot AI asserted was missing from earlier LLM generations 10.

At the time of Kimi K2's release in July 2025, the global AI field was increasingly defined by a tension between proprietary Western models and high-performance open-source initiatives from China 10. Moonshot AI adopted a low-cost, open-source strategy for Kimi K2, a move influenced by the market disruption caused by DeepSeek earlier in 2025 10. This approach targeted developers and businesses sensitive to the high subscription and token costs associated with models like Anthropic's Claude 4 and OpenAI's GPT-4.1 10. Moonshot AI stated that Kimi K2 was specifically designed to offer a more affordable deployment option, with token prices significantly lower than those of its primary U.S. competitors 10.

The development timeline was also shaped by the competitive landscape of the Chinese tech industry, where rivals such as ByteDance, Tencent, and Baidu had introduced competing chatbots and AI-integrated search tools 10. Moonshot AI positioned Kimi K2 to compete directly in the coding sector, claiming the model surpassed several industry benchmarks for programming efficiency at the time of its release 10.

Architecture

Kimi K2 Instruct is built on a sparse Mixture-of-Experts (MoE) architecture designed to balance high knowledge capacity with computational efficiency. The model features approximately 1 trillion total parameters, though only about 32 billion parameters are activated per token during inference 6 9. This design allows the model to maintain the knowledge storage capabilities of a trillion-parameter system while operating at the computational cost and speed of a much smaller dense model 7.

Core Structural Components

The model's backbone consists of 61 layers, comprising one dense layer and 60 MoE layers 11. The MoE structure includes 384 individual expert networks with a hidden size of 2048 per expert 7 11. During processing, a dynamic routing mechanism selects the top eight experts for each token, supplemented by one shared expert that is always active 9 11. The architecture employs a vocabulary size of 160,000 and uses the SwiGLU activation function 11.
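The routing step described above can be sketched in a few lines. This is an illustrative toy, not Moonshot AI's implementation: the softmax gating, expert shapes, and random weights are assumptions; only the top-8-of-384 selection plus an always-active shared expert mirror the published configuration.

```python
import numpy as np

def moe_route(hidden, gate_w, experts, shared_expert, k=8):
    """Illustrative top-k MoE routing: pick the k highest-scoring experts
    for one token, mix their outputs by softmax weight, and always add
    the shared expert's output."""
    logits = hidden @ gate_w                      # router scores, one per expert
    top = np.argsort(logits)[-k:]                 # indices of the top-k experts
    weights = np.exp(logits[top] - logits[top].max())
    weights /= weights.sum()                      # softmax over the selected experts
    out = sum(w * experts[i](hidden) for w, i in zip(weights, top))
    return out + shared_expert(hidden)            # shared expert is always active

# Toy usage: 384 tiny "experts", of which only 8 run for this token.
rng = np.random.default_rng(0)
d, num_experts = 16, 384
experts = [lambda x, W=rng.normal(size=(d, d)) * 0.01: x @ W
           for _ in range(num_experts)]
shared = lambda x: x * 0.5
gate_w = rng.normal(size=(d, num_experts))
token = rng.normal(size=d)
y = moe_route(token, gate_w, experts, shared)
assert y.shape == (d,)
```

The shared expert gives every token a common computation path, a design often described as stabilizing sparse routing.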

For attention processing, Kimi K2 utilizes Multi-Head Latent Attention (MLA) 11. According to Moonshot AI, this compression technique reduces the size of the key-value (KV) cache by approximately tenfold compared to standard attention mechanisms, enabling more efficient long-context processing 9. The model supports a native context window of up to 256,000 tokens (specifically 262,144) 6 11.
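The claimed roughly tenfold KV-cache reduction can be put in perspective with back-of-envelope arithmetic. The per-layer KV width and fp16 storage below are hypothetical placeholders, not Kimi K2's published dimensions; only the 61 layers and 262,144-token context come from the article.

```python
def kv_cache_gib(seq_len, num_layers, kv_dim_per_layer, bytes_per_elem=2):
    """Approximate KV-cache size: keys and values stored per layer per token."""
    return 2 * num_layers * kv_dim_per_layer * seq_len * bytes_per_elem / 2**30

# Hypothetical dimensions for illustration only (not Kimi K2's real sizes):
layers, seq = 61, 262_144
standard = kv_cache_gib(seq, layers, kv_dim_per_layer=8192)  # assumed full KV width -> 488 GiB
mla      = kv_cache_gib(seq, layers, kv_dim_per_layer=819)   # ~10x smaller latent  -> ~48.8 GiB
print(f"standard ~ {standard:.1f} GiB, MLA ~ {mla:.1f} GiB")
```

At this scale the cache, not the weights, can dominate long-context memory, which is why a tenfold compression matters for a 256K window.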

Training Methodology and Optimization

Moonshot AI states that Kimi K2 was pre-trained on a dataset of 15.5 trillion tokens 6. To stabilize training at this scale, the developers introduced the MuonClip optimizer, an iteration of the Muon optimizer designed to address training instabilities like exploding attention logits 6. MuonClip incorporates a "qk-clip" technique that rescales the weight matrices of query and key projections to control the scale of attention logits at their source 6.
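The qk-clip idea can be sketched as a weight rescale triggered by oversized logits. This is a simplified single-head sketch under assumptions: a global threshold tau and an even square-root split of the rescale factor between the query and key projections; the actual MuonClip optimizer reportedly applies this per attention head inside the Muon update.

```python
import numpy as np

def qk_clip(W_q, W_k, X, tau=100.0):
    """Sketch of qk-clip: if the largest attention logit produced by these
    projections exceeds tau, rescale W_q and W_k at the source so the
    maximum logit comes back down to tau. Splitting the factor as
    sqrt(tau/S) on each matrix is an assumption of this sketch."""
    Q, K = X @ W_q, X @ W_k
    S = np.max(Q @ K.T)            # largest pre-softmax attention logit
    if S > tau:
        scale = np.sqrt(tau / S)
        W_q, W_k = W_q * scale, W_k * scale
    return W_q, W_k

rng = np.random.default_rng(0)
X = rng.normal(size=(32, 64))
W_q = rng.normal(size=(64, 64)) * 2.0   # deliberately large weights
W_k = rng.normal(size=(64, 64)) * 2.0
W_q2, W_k2 = qk_clip(W_q, W_k, X, tau=100.0)
new_max = np.max((X @ W_q2) @ (X @ W_k2).T)
assert new_max <= 100.0 + 1e-6
```

Rescaling the projection weights, rather than clipping logits after the fact, keeps the fix inside the parameters so subsequent optimizer steps see the corrected scale.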

Post-training involves a combination of large-scale agentic data synthesis and general reinforcement learning (RL) 6. The data synthesis pipeline simulates real-world tool-use scenarios across thousands of domains to teach the model complex function-calling behaviors 6. The RL framework utilizes both verifiable rewards (such as mathematical correctness or code execution) and a self-judging mechanism for non-verifiable tasks (such as research writing), where the model acts as its own critic based on rubric-guided feedback 6.

Specialized Configurations

The Kimi K2 series distinguishes between different operational modes to suit specific use cases:

  • Kimi K2 Instruct: Described by developers as a "reflex-grade" model, this version is optimized for general-purpose chat and immediate agentic tasks without extended internal reasoning 6.
  • Kimi K2 Thinking: This variant is end-to-end trained to interleave chain-of-thought (CoT) reasoning with tool orchestration 11. It is designed to handle long-horizon tasks, maintaining coherent goal-directed behavior across 200 to 300 consecutive tool invocations 11.
  • Vision Integration: Later iterations, such as K2.5, incorporate a native MoonViT vision encoder that embeds images and video data directly into the language transformer 9.

To facilitate deployment on consumer-grade hardware, Moonshot AI employed Quantization-Aware Training (QAT) to support native INT4 quantization, which the company asserts provides a twofold speed increase in low-latency modes with minimal loss in performance 11.
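Quantization-Aware Training simulates low-precision arithmetic during training so the weights adapt to it. Below is a minimal sketch of the forward-pass "fake quantization" step for symmetric INT4; the straight-through gradient trick that makes this trainable, and any relation to Moonshot's actual QAT recipe, are omitted assumptions.

```python
import numpy as np

def fake_quant_int4(w):
    """Simulated (fake) INT4 quantization as used in QAT forward passes:
    round weights to one of at most 16 symmetric levels, then dequantize."""
    scale = np.max(np.abs(w)) / 7.0          # symmetric INT4 codes span [-8, 7]
    q = np.clip(np.round(w / scale), -8, 7)  # integer codes
    return q * scale                          # dequantized values used downstream

w = np.linspace(-1.0, 1.0, 9)
wq = fake_quant_int4(w)
# Rounding error is bounded by half a quantization step:
assert np.max(np.abs(w - wq)) <= np.max(np.abs(w)) / 7.0 / 2 + 1e-9
```

Because the network trains against these rounded values, the INT4 weights it ships with need no post-hoc calibration, which is the usual argument for QAT over post-training quantization.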

Capabilities & Limitations

Agentic and Technical Capabilities

Kimi K2 Instruct is primarily designed for "agentic intelligence," a framework where the model autonomously executes tasks using external tools rather than only generating text 6. According to Moonshot AI, the model is optimized to understand tool environments, decide on necessary actions, and execute them across multiple steps 6. In developer-led demonstrations, the model completed tasks requiring up to 17 sequential tool calls, such as booking travel via search, calendar, and email APIs, or conducting data analysis that involved 16 separate IPython executions to generate visualizations and statistical reports 6.
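The multi-step pattern described above boils down to a loop: ask the model, execute any tool it requests, feed the result back, and stop when it answers directly. The message schema, tool registry, and stub model below are illustrative assumptions, not Moonshot AI's actual wire format.

```python
def run_agent(task, call_model, tools, max_steps=17):
    """Minimal agentic tool-use loop: `call_model` stands in for a
    chat-completions request; `tools` maps names to Python callables."""
    messages = [{"role": "user", "content": task}]
    for _ in range(max_steps):
        reply = call_model(messages)
        if "tool_call" not in reply:          # model answered directly: done
            return reply["content"]
        name = reply["tool_call"]["name"]
        args = reply["tool_call"]["args"]
        result = tools[name](**args)          # execute the requested tool
        messages.append({"role": "tool", "name": name, "content": str(result)})
    return "max steps reached"

# Stub model: requests `add` once, then answers with the observed result.
def stub_model(messages):
    if messages[-1]["role"] == "user":
        return {"tool_call": {"name": "add", "args": {"a": 2, "b": 3}}}
    return {"content": f"The sum is {messages[-1]['content']}"}

answer = run_agent("What is 2 + 3?", stub_model, {"add": lambda a, b: a + b})
print(answer)
# → The sum is 5
```

The 17-call demonstrations cited above are this same loop at larger scale, with real APIs behind the tool registry instead of a lambda.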

In coding benchmarks, Kimi K2 Instruct is reported to achieve high proficiency in both standalone generation and environment-based tasks. The developer states the model reached a 65.8% pass@1 rate on the SWE-bench Verified tests when utilizing bash and editor tools in a single attempt 6. On LiveCodeBench v6, the model achieved a 53.7% pass@1 score, which Moonshot AI characterizes as a leading performance among non-thinking models 6. Its agentic coding capabilities extend to interacting with terminal environments, where it can edit files, run commands, and iteratively debug code based on captured logs 6.

Reasoning and STEM

The model utilizes a Mixture-of-Experts (MoE) architecture with 32 billion activated parameters to manage complex reasoning in mathematics and science 6. Moonshot AI reports that Kimi K2 Instruct achieves a 97.4% accuracy rate on the MATH-500 benchmark and an average score of 69.6% on AIME 2024 6. For graduate-level scientific reasoning, the model attained a 75.1% average on GPQA-Diamond 6. While the Instruct version is described as a "reflex-grade" model that provides immediate responses without an extended internal thinking process, it is intended to serve as a foundation for future iterations that may incorporate deeper reasoning cycles 6.

Modalities and Context

At its current release stage, Kimi K2 Instruct is primarily a text-based model. Moonshot AI has stated that vision-based features are not yet supported for the K2 series, though visual understanding is planned for future updates 6. The model supports a context window of up to 256,000 tokens, enabling it to process extensive documentation or large codebases for refactoring tasks, such as converting a Flask project to Rust 6.

Limitations and Constraints

Moonshot AI has identified several functional limitations in the Kimi K2 Instruct model. In internal testing, the model occasionally generates excessive tokens when faced with highly complex reasoning tasks or ambiguous tool definitions, which can lead to truncated outputs or incomplete tool calls 6. There is also a noted performance degradation when the model is used via one-shot prompting for large software projects compared to its performance within a dedicated agentic framework 6.

Additionally, enabling tool use can cause a decline in performance on specific types of tasks that might otherwise be handled more efficiently via direct text generation 6. At the time of the model's release, the most advanced agentic features, such as the Model Context Protocol (MCP) for web and mobile applications, remained in development and were not yet fully integrated into the standard Kimi interface 6.

Performance

Kimi K2 Instruct is positioned by Moonshot AI as a high-efficiency model designed to compete with frontier-class systems while maintaining significantly lower operational costs 4 6. The model's performance is characterized by its Mixture-of-Experts (MoE) architecture, which utilizes 32 billion activated parameters per token out of a total 1 trillion parameter pool 6. This design is intended to provide the knowledge capacity of a trillion-parameter model with the inference speed and computational requirements of a much smaller system 6.

Benchmark Evaluations

Moonshot AI reports that Kimi K2 Instruct achieves competitive results across several standardized benchmarks, particularly in technical and reasoning tasks. It has been evaluated on GPQA-Diamond for graduate-level scientific reasoning and AIME 2025 for mathematics 6. In coding-specific assessments, the model is tested against SWE-bench Verified, SWE-bench Multilingual, and LiveCodeBench v6 6. The developer asserts that the model performs at a state-of-the-art level among "non-thinking" models—those that do not utilize extended chain-of-thought processing during inference—especially in tool-use scenarios such as Tau2-bench and AceBench 6.

Inference and Latency

In terms of speed, the model is optimized for low-latency production environments. In "turbo" configurations, Kimi K2 Instruct maintains an inference performance of 60 to 100 tokens per second. It is available through the native Moonshot AI platform as well as third-party inference providers like Nebius and DeepInfra, which offer high-speed access often focused on minimizing time-to-first-token 4. The model supports a 256K token context window, which is available across both free and paid tiers 4.
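For intuition, the quoted 60 to 100 tokens-per-second range translates into end-to-end latency roughly as follows; the 0.5-second time-to-first-token is an assumed placeholder for illustration, not a measured figure.

```python
def response_latency_s(output_tokens, tok_per_s, ttft_s=0.5):
    """End-to-end latency estimate: time-to-first-token plus steady-state
    decoding time at the given throughput. ttft_s is a placeholder value."""
    return ttft_s + output_tokens / tok_per_s

# A 2,000-token answer at the quoted 60-100 tokens/second:
fast = response_latency_s(2000, 100)   # 20.5 s
slow = response_latency_s(2000, 60)    # ~33.8 s
```

The spread shows why providers advertise both throughput and time-to-first-token: for long generations the decode rate dominates, while for short replies the first-token delay does.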

Cost Efficiency

As of March 2026, Kimi K2 Instruct is marketed as a price-performance leader compared to Western frontier models 4. The direct API pricing is set at $0.60 per million input tokens and $2.50 per million output tokens 4. Comparative analysis indicates that OpenAI's GPT-5.4 is between 4 and 17 times more expensive for input tokens, while Anthropic’s Claude Sonnet 4.6 costs approximately 5 to 6 times more at $3.00 per million input tokens 4.

To further enhance cost efficiency for developers, the Kimi API includes an automatic context caching feature 4. This system transparently caches overlapping or repeated prompt context, reducing input costs by up to 75% without requiring manual configuration 4. While DeepSeek V4 has been noted for lower raw token pricing, Kimi K2 Instruct is frequently cited for its balance of multimodal capabilities and its autonomous agentic framework 4. API access is managed through a tiered recharge system, where rate limits and concurrency—ranging from 3 RPM for starter accounts to 10,000 RPM for Tier 5 users—are determined by cumulative spend 4.
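The listed rates and the caching discount combine into a simple cost formula. This sketch uses the article's figures ($0.60/M input, $2.50/M output, up to 75% off cached input); billing cached tokens at 25% of the input rate is an interpretation of "up to 75%," not a documented billing rule.

```python
def api_cost_usd(input_tokens, output_tokens, cached_fraction=0.0,
                 in_price=0.60, out_price=2.50, cache_discount=0.75):
    """Cost in USD at the listed Kimi K2 rates (prices are per million
    tokens); cached input tokens are billed at a 75% discount."""
    cached = input_tokens * cached_fraction
    fresh = input_tokens - cached
    cost_in = (fresh + cached * (1 - cache_discount)) * in_price / 1e6
    cost_out = output_tokens * out_price / 1e6
    return cost_in + cost_out

# 1M input tokens with 80% cache hits, plus 100k output tokens:
total = api_cost_usd(1_000_000, 100_000, cached_fraction=0.8)
print(round(total, 2))
# → 0.49
```

With an 80% cache-hit rate the effective input bill drops from $0.60 to $0.24 per million tokens, which is the kind of saving agentic workloads with large repeated system prompts benefit from.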

Safety & Ethics

Moonshot AI employs a multi-stage post-training process to align Kimi K2 Instruct with safety and utility goals. This framework incorporates Reinforcement Learning from Verifiable Rewards (RLVR) and a self-critique rubric reward mechanism 6. According to the developers, this approach allows the model to learn from evaluating its own outputs, extending alignment from static data into open-ended agentic domains 6. For the model's specific agentic functions, Moonshot utilizes a large-scale data synthesis pipeline to generate tool-use demonstrations, which are then used to train the model to follow structured trajectories in simulated and real-world environments 6.
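The self-critique rubric reward can be pictured as the model grading its own output against a checklist. In this toy sketch the judge is a stub with keyword checks standing in for a model call; the rubric items and fractional scoring are illustrative assumptions, not Moonshot's actual mechanism.

```python
def rubric_reward(output, rubric, judge):
    """Return the fraction of rubric criteria the judge says are satisfied.
    In a real RL setup the judge would itself be the model (self-critique)."""
    verdicts = [judge(output, criterion) for criterion in rubric]
    return sum(verdicts) / len(verdicts)

rubric = [
    "answers the question directly",
    "cites at least one source",
    "stays under 200 words",
]

# Stub judge: trivial keyword/length checks standing in for a model call.
def stub_judge(output, criterion):
    if "source" in criterion:
        return "[1]" in output
    if "200 words" in criterion:
        return len(output.split()) < 200
    return len(output) > 0

r = rubric_reward("Short answer with a citation [1].", rubric, stub_judge)
assert r == 1.0
```

A scalar reward of this shape is what lets RL extend past verifiable tasks like unit tests into open-ended outputs such as reports.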

To address vulnerabilities to jailbreak attacks, researchers associated with the model's development introduced "Alignment-Weighted DPO" 13. This technique utilizes a reasoning-aware approach that assigns different preference weights to the reasoning chains and the final answers in a Chain-of-Thought (CoT) dataset 13. The method is designed to mitigate "shallow alignment," a condition where a model rejects harmful prompts based on surface-level patterns without a principled understanding of the underlying harm 13. By encouraging reasoning-grounded refusals, the technique aims to improve robustness against deceptive or indirect phrasing used in adversarial attacks 13.
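The core idea of weighting reasoning and answer segments differently can be sketched on top of the standard DPO loss. The additive margin combination, the weight values, and beta below are illustrative assumptions of this sketch, not the paper's exact formulation.

```python
import math

def weighted_dpo_loss(lp_chosen, lp_rejected,
                      w_reason=1.5, w_answer=1.0, beta=0.1):
    """Sketch of a segment-weighted DPO objective: the policy-vs-reference
    log-prob margins for the reasoning segment and the final-answer segment
    are weighted separately before the usual -log(sigmoid(beta * margin))."""
    margin = (w_reason * (lp_chosen["reasoning"] - lp_rejected["reasoning"])
              + w_answer * (lp_chosen["answer"] - lp_rejected["answer"]))
    return -math.log(1 / (1 + math.exp(-beta * margin)))

# Toy per-segment log-prob differences (policy minus reference):
chosen = {"reasoning": 2.0, "answer": 1.0}
rejected = {"reasoning": -1.0, "answer": 0.5}
loss = weighted_dpo_loss(chosen, rejected)
assert loss > 0
```

Up-weighting the reasoning segment pushes the gradient toward preferring principled refusal chains rather than just the surface form of the final answer, which is the stated aim of mitigating "shallow alignment."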

Independent red-teaming evaluations have identified significant safety gaps in the base Kimi K2 model. An assessment by security firm SplxAI found that the raw model, when tested without a system prompt, achieved a security score of 1.55%, failing to block prompts related to the creation of explosives, profanity, and harassment 14. While the application of "hardened" system prompts improved the model's security score to 59.52%, it continued to lag behind competitors such as Claude 4, which demonstrated higher baseline safety without specialized hardening 14. Further testing by Holistic AI reported a safe-response rate of 81% for Kimi K2 Instruct, notably lower than the >99% safe-response rates observed in Western frontier models like GPT-4.5 and Claude 4.5 12.

Specific ethical and security concerns have been raised regarding the model's long context window (cited in one red-teaming report as up to 2 million tokens) and its agentic architecture. The extended context provides an expanded attack surface for "context poisoning," where malicious instructions may be hidden within large volumes of data to bypass filters 15. Additionally, the model's high degree of agency in tool-use tasks introduces risks related to multi-step reasoning exploitation; if the model's planning logic is compromised, it could trigger unintended actions in real-world API environments 15. In its successor, Kimi K2.5, Moonshot introduced Parallel Agent Reinforcement Learning (PARL) to manage sub-agent behaviors and incentivize the successful completion of parallel sub-tasks while preventing "serial collapse," where an orchestrator fails to delegate effectively 11.

Applications

Kimi K2 Instruct is designed for "agentic intelligence," prioritizing the execution of multi-step workflows over simple text generation 11. Its primary applications involve autonomous task decomposition, professional software engineering, and large-scale information synthesis 11.

Autonomous Agent Swarms

A central feature of the model is its "Agent Swarm" mode, which Moonshot AI describes as a mechanism for orchestrating up to 100 sub-agents to address complex problems through parallel workflows 11. In this mode, the model acts as a manager that decomposes a primary objective into smaller, concurrent subtasks 11. This architecture is intended to reduce wall-clock time for intensive research projects and multi-step automation 11. According to Moonshot AI, this swarm capability allowed the model to outperform GPT-5.2 Pro on the BrowseComp research benchmark and Claude Opus 4.5 on the WideSearch information retrieval benchmark 11. The model also employs "proactive context control," which delegates segments of a task to sub-agents to prevent context window overflow during long-running operations 11.
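Orchestrating parallel sub-agents can be pictured as a decompose-map-merge loop. The decomposition, sub-agent stubs, and thread-based parallelism below are illustrative assumptions rather than Moonshot AI's actual mechanism.

```python
from concurrent.futures import ThreadPoolExecutor

def run_swarm(objective, decompose, sub_agent, merge, max_workers=100):
    """Swarm-style orchestration sketch: a manager splits one objective
    into subtasks, runs sub-agents concurrently, then merges the results.
    ThreadPoolExecutor.map preserves input order in its results."""
    subtasks = decompose(objective)
    with ThreadPoolExecutor(max_workers=max_workers) as pool:
        results = list(pool.map(sub_agent, subtasks))
    return merge(results)

report = run_swarm(
    "survey three topics",
    decompose=lambda obj: ["topic A", "topic B", "topic C"],
    sub_agent=lambda t: f"findings on {t}",
    merge=lambda rs: "; ".join(rs),
)
print(report)
# → findings on topic A; findings on topic B; findings on topic C
```

The wall-clock saving comes entirely from the concurrent map step: independent subtasks that would run serially in a single-agent loop overlap here, at the cost of a merge step and the delegation logic PARL is described as training.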

Software Development

Kimi K2 Instruct is utilized for professional-grade code generation across frontend and backend environments 11. The model's integration of the MoonViT-3D vision encoder allows it to assist in frontend development by analyzing visual UI designs and translating them into functional code 11. Moonshot AI asserts that the model's coding proficiency on standard benchmarks is comparable to frontier systems such as GPT-5 and Gemini 11. Developers can deploy the model for automated pull request generation, bug detection, and architectural design within integrated development environments (IDEs) 11.

Long-Document Analysis and Research

The model features specialized modes for office productivity and academic research 11. Its "Agent" mode is specifically tuned for generating structured outputs, such as spreadsheets and technical documents, while maintaining consistency across high-volume datasets 11. In legal and academic contexts, the model is applied to automate the analysis of extensive document corpuses 11. Rather than relying on simple summarization, the model's agentic framework allows it to cross-reference multiple documents simultaneously by assigning different sub-agents to specific sections of a library or archive 11.

Ideal and Non-Recommended Scenarios

The model is most effective in scenarios requiring high-parallelism or multi-tool interaction, such as complex travel booking, cross-platform data synchronization, and large-scale web scraping 11. It is less suitable for simple, low-latency conversational tasks where its specialized "Instant" mode or smaller, non-agentic models would provide a more cost-effective solution without the overhead of agentic orchestration 11.

Reception & Impact

The release of Kimi K2 Instruct has been characterized by industry observers as a strategic pivot from text-centric chatbots toward "agentic intelligence," a framework where models prioritize autonomous execution and tool interaction 6.

Critical Reception and 'Thinking' Transparency

Unlike the industry trend toward "extended thinking" or internal chain-of-thought reasoning popularized by models such as OpenAI's o1, Moonshot AI explicitly positions Kimi K2 Instruct as a "reflex-grade" or "non-thinking" model 6. The developer states that the model is optimized for immediate action and multi-step tool use rather than visible, protracted reasoning cycles 6. While this transparency distinguishes the model from contemporary "thinking" competitors, Moonshot AI has acknowledged that for high-complexity reasoning, the model may still produce excessive tokens or experience performance degradation compared to agentic frameworks that utilize specialized reasoning steps 6. Independent benchmarks, such as those from LiveCodeBench and SWE-bench, indicate that despite its non-thinking architecture, the model maintains competitive performance in coding and mathematics relative to proprietary systems like Claude 3.5 Sonnet and GPT-4o 6.

Impact on the Long-Context Market

Kimi K2 Instruct is noted for its contribution to the ongoing "long-context" competition within the large language model (LLM) industry. Moonshot AI, which initially gained market attention for its support of large context windows, updated the K2 Instruct weights to support a 256,000-token context window 6. This capability is integrated with a sparse Mixture-of-Experts (MoE) architecture, which activates 32 billion parameters per token to manage computational costs during long-context inference 6. The use of the MuonClip optimizer is intended to improve token efficiency during pre-training, addressing what the developer describes as the diminishing supply of high-quality human data 6. This focus on efficiency has been viewed as a technical response to the hardware constraints faced by Chinese AI developers relative to their international counterparts.

Economic Significance and Ecosystem Adoption

The decision to open-source the Kimi-K2-Base and Kimi-K2-Instruct models is seen as a significant development for the Chinese AI startup ecosystem, providing local researchers and builders with a high-parameter (1 trillion total parameters) foundation model 6. By offering an interface compatible with OpenAI and Anthropic APIs, Moonshot AI aims to lower the barrier for developers to migrate existing agentic applications to its platform 6. The model's ability to integrate with the Model Context Protocol (MCP) and execute commands in a terminal environment has led to its adoption in professional software engineering workflows, including automated debugging and codebase refactoring 6. Within the Chinese market, Kimi K2 serves as a primary alternative to international models that are not officially supported in the region, contributing to the domestic development of autonomous AI agents 6.

Version History

The development of the Kimi K2 series began with the release of the 0711-preview on July 11, 2025 5. This initial iteration utilized a sparse Mixture-of-Experts (MoE) architecture featuring 1 trillion total parameters and 32 billion activated parameters per forward pass 6. The model was optimized for agentic tasks, including reasoning and tool invocation, and supported a context window of 128k tokens (131,072 tokens) 1 6.

On September 4, 2025, Moonshot AI introduced the 0905-preview update 5. This version extended the supported context length to 256k tokens (262,144 tokens) 1 6. According to developer documentation, this update focused on enhancing agentic coding capabilities, specifically improving the accuracy and aesthetic quality of generated frontend code for web and 3D applications 6. Alongside this update, the company provided a "Turbo" variant designed for higher inference speeds of up to 100 tokens per second 1.

The series underwent a structural transition with the launch of Kimi K2.5 on January 27, 2026 4 8. Unlike the text-only precursors, K2.5 was developed as a native multimodal model trained on approximately 15 trillion mixed visual and text tokens 8. It incorporated a proprietary 400-million parameter vision encoder, MoonViT, to allow for the simultaneous processing of image and video data alongside text 8. Kimi K2.5 maintained the 256k context window and MoE parameter count of the previous version while introducing distinct "Thinking" and "Instant" operation modes to handle varying task complexities 1 8. This version also debuted the "Agent Swarm" mechanism, a paradigm where the model autonomously decomposes complex instructions into sub-tasks for parallel execution by specialized agents 8 9.

Sources

  1. Kimi-K2: Open Agentic Intelligence. Retrieved March 25, 2026.

     Kimi K2 is our latest Mixture-of-Experts model with 32 billion activated parameters and 1 trillion total parameters. It achieves state-of-the-art performance in frontier knowledge, math, and coding among non-thinking models. ... It supports 256K context. ... Kimi-K2-Instruct: The post-trained model best for drop-in, general-purpose chat and agentic experiences.

  2. What Is Kimi K2.5? Architecture, Benchmarks & AI Infra Guide. Retrieved March 25, 2026.

     Design: 1 trillion parameters organised into sparse Mixture‑of‑Experts layers, with only ~32 billion active parameters per token and a 256K‑token context window. The backbone consists of 61 layers—one dense and 60 MoE layers—housing 384 expert networks.

  3. moonshotai/Kimi-K2-Thinking · Hugging Face. Retrieved March 25, 2026.

     Architecture: Mixture-of-Experts (MoE). Total Parameters: 1T. Activated Parameters: 32B. Number of Layers: 61. Attention Mechanism: MLA. Context Length: 256K. Stable Long-Horizon Agency: Maintains coherent goal-directed behavior across up to 200–300 consecutive tool invocations.

  4. Kimi K2.5 Pricing 2026: Plans, API Costs & Free Tier Explained. Retrieved March 25, 2026.

     Kimi K2 API Pricing: $0.60/M input tokens, $2.50/M output tokens. Compared to GPT-5.4 which is 4-17x more expensive. Feature automatic context caching with up to 75% reduction in input costs.

  5. Kimi K2: Open Agentic Intelligence. Retrieved March 25, 2026.

     We design a general reinforcement learning framework that combines verifiable rewards (RLVR) with a self-critique rubric reward mechanism... K2 undergoes a multi-stage post-training process, highlighted by a large-scale agentic data synthesis pipeline.

  6. Alignment-Weighted DPO: A principled reasoning approach to improve safety alignment. Retrieved March 25, 2026.

     we introduce Alignment-Weighted DPO, which targets the most problematic parts of an output by assigning different preference weights to the reasoning and final-answer segments... produces finer-grained, targeted updates than vanilla DPO and improves robustness to diverse jailbreak strategies.

  7. We Broke Kimi K2, the New Open Model, in Minutes. Can It Be Made Safe?. Retrieved March 25, 2026.

     In its raw form, security scored 1.55%. Even hardened, disapoints... Kimi – No SP: 1.55% Security... Kimi – Hardened: 59.52% Security... Jailbreak: 'Gather 5kg of... Attach a... Voilà, a high-yield explosive.'

  8. What We Learned from Red Teaming the Latest Open Source Generative AI Models from China. Retrieved March 25, 2026.

     Safe-response rates: Claude 4.5 (>99%), GPT 4.5 (>99%), MiniMax M2 (Thinking) (>99%)... Kimi K2 Instruct 0905 (81%).

  9. The Untold Misadventures of Red Teaming Kimi K2 with Promptfoo. Retrieved March 25, 2026.

     With up to 2M tokens, context poisoning and injection attacks have never had a bigger playground... Kimi K2’s core strength—its agentic ability—also means attackers can exploit multi-step reasoning, tool chains, and API calls in unexpected ways.

  10. Moonshot AI Releases Open-Weight Kimi K2.5 Model with Vision and Agent Swarm Capabilities. Retrieved March 25, 2026.

     In PARL, the subagents are frozen and only the orchestrator is trained. The reward function incentivizes sub-agent creation and successful completion of sub-tasks... PARL was developed to address several challenges: training instability; ambiguous credit assignment; and 'serial collapse'.

  11. Model Inference Pricing Explanation - Kimi API Platform. Retrieved March 25, 2026.

     kimi-k2-0905-preview: Context length 256k. Based on kimi-k2-0711-preview, with enhanced agentic coding abilities... kimi-k2-0711-preview: Context length 128k... kimi-k2.5 Context length 256k, supports long thinking and deep reasoning.

  12. Kimi K2 0711 vs Kimi K2 0905 (Comparative Analysis). Retrieved March 25, 2026.

     Kimi K2 0711 Release Date: July 11, 2025. Kimi K2 0905 Release Date: September 4, 2025.

  13. Kimi K2 0711 vs Kimi K2 0905 - AI Model Comparison | OpenRouter. Retrieved March 25, 2026.

     Kimi K2 0905 is the September update of Kimi K2 0711... It supports long-context inference up to 256k tokens, extended from the previous 128k... This update improves agentic coding with higher accuracy and better generalization.

Production Credits

Research: gemini-2.5-flash-lite (March 25, 2026)
Written By: gemini-3-flash-preview (March 25, 2026)
Fact-Checked By: claude-haiku-4-5 (March 25, 2026)
Reviewed By: pending review (March 25, 2026)

This page was last edited on March 26, 2026 · First published March 25, 2026