Try Kimi K2 0905
256K context • Enhanced agentic coding • MoE architecture
Kimi K2 0905 at a glance
Scale and long-context capacity for complex coding tasks.
Total Parameters
1T
Mixture-of-Experts architecture
Activated Parameters
32B
Per-token active compute
Context Window
256K
Long-horizon reasoning
Experts
384
8 experts selected per token
Purpose-built for long-horizon coding
K2-Instruct-0905 upgrades the K2 line with longer context and stronger agentic coding performance.
256K Context Window
Expanded from 128K to 256K tokens to support long-horizon tasks.
Enhanced Agentic Coding
Improved performance on public benchmarks and real-world coding agent tasks.
Improved Frontend Coding
Advances in both the aesthetics and practicality of generated frontend code.
Tool Calling
Strong tool-calling capabilities for autonomous task execution.
Model Summary
Core architecture specifications for K2-Instruct-0905.
61 Layers (1 Dense)
61 transformer layers in total, of which 1 is dense and the rest are MoE layers.
MLA Attention
Multi-head Latent Attention (MLA) with 64 attention heads and an attention hidden dimension of 7168.
MoE Hidden Dim 2048
Per-expert hidden dimension of 2048.
SwiGLU Activation
SwiGLU activation function across the model.
160K Vocabulary
A 160K-token vocabulary for broad multilingual coverage.
MLA + MoE Routing
8 experts selected per token with 1 shared expert.
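A minimal sketch of top-k expert routing under these settings (384 experts, 8 selected per token). Names, shapes, and the routing function are illustrative assumptions, not the model's actual implementation; the shared expert is applied separately.

```python
import torch
import torch.nn.functional as F

def route_tokens(hidden, router_weight, top_k=8):
    """Illustrative top-k routing: score each token against every expert,
    keep the 8 highest-scoring experts, and renormalize their weights."""
    # hidden: [num_tokens, hidden_dim]; router_weight: [hidden_dim, num_experts]
    logits = hidden @ router_weight                    # [num_tokens, 384]
    probs = F.softmax(logits, dim=-1)
    topk_probs, topk_idx = probs.topk(top_k, dim=-1)   # 8 experts per token
    topk_probs = topk_probs / topk_probs.sum(dim=-1, keepdim=True)
    return topk_idx, topk_probs

hidden = torch.randn(4, 7168)        # 4 tokens, hidden dimension 7168
router = torch.randn(7168, 384)      # 384 routed experts
idx, weights = route_tokens(hidden, router)
print(idx.shape, weights.shape)      # both are torch.Size([4, 8])
```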
Benchmark Highlights
Results reported for K2-Instruct-0905.
SWE-Bench Verified
69.2 ± 0.63 accuracy on SWE-Bench Verified.
SWE-Bench Multilingual
55.9 ± 0.72 accuracy on SWE-Bench Multilingual.
Terminal-Bench
44.5 ± 2.03 accuracy on Terminal-Bench.
SWE-Dev
66.6 ± 0.72 accuracy on SWE-Dev.
Deployment & Usage
Operational guidance and recommended settings.
OpenAI & Anthropic Compatible API
Accessible via platform.moonshot.ai with OpenAI/Anthropic-compatible formats.
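A minimal sketch of calling the hosted model through the OpenAI-compatible endpoint. The base URL and model id shown here are assumptions; check platform.moonshot.ai for the exact values.

```python
from openai import OpenAI

client = OpenAI(
    api_key="YOUR_MOONSHOT_API_KEY",
    base_url="https://api.moonshot.ai/v1",   # assumed endpoint
)

response = client.chat.completions.create(
    model="kimi-k2-0905-preview",            # assumed model id for K2-Instruct-0905
    temperature=0.6,                          # recommended temperature for this model
    messages=[
        {"role": "system", "content": "You are a helpful coding assistant."},
        {"role": "user", "content": "Refactor this function to use pathlib."},
    ],
)
print(response.choices[0].message.content)
```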
Anthropic Temperature Mapping
The Anthropic-compatible API maps temperature as real_temperature = request_temperature × 0.6.
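As an illustration of this mapping, a request temperature of 1.0 yields the recommended effective temperature of 0.6:

```python
def real_temperature(request_temperature: float) -> float:
    """The Anthropic-compatible endpoint scales the requested temperature by 0.6."""
    return request_temperature * 0.6

print(real_temperature(1.0))  # 0.6, the recommended effective temperature
```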
Block-FP8 Checkpoints
Model checkpoints are published on Hugging Face in block-FP8 format.
Recommended Inference Engines
vLLM, SGLang, KTransformers, and TensorRT-LLM.
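For self-hosted deployments, vLLM and SGLang expose OpenAI-compatible servers, so the same client code works against a local instance. The port and model id below are assumptions; adjust them to your serving setup.

```python
from openai import OpenAI

# Point the client at a locally served checkpoint (e.g. via vLLM or SGLang).
local = OpenAI(api_key="EMPTY", base_url="http://localhost:8000/v1")

resp = local.chat.completions.create(
    model="moonshotai/Kimi-K2-Instruct-0905",   # assumed Hugging Face repo id
    temperature=0.6,
    messages=[{"role": "user", "content": "Write a binary search in Go."}],
)
print(resp.choices[0].message.content)
```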
Recommended Temperature
Suggested temperature = 0.6 for K2-Instruct-0905.
Tool Calling Support
Pass available tools in each request; the model decides when to invoke them.
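A hedged sketch of tool calling through the OpenAI-compatible format. The tool definition and model id are hypothetical; the model returns tool calls only when it decides a tool is needed.

```python
from openai import OpenAI

client = OpenAI(api_key="YOUR_MOONSHOT_API_KEY", base_url="https://api.moonshot.ai/v1")

# Hypothetical tool definition passed with the request.
tools = [{
    "type": "function",
    "function": {
        "name": "run_tests",
        "description": "Run the project's test suite and return the output.",
        "parameters": {
            "type": "object",
            "properties": {"path": {"type": "string", "description": "Test directory"}},
            "required": ["path"],
        },
    },
}]

response = client.chat.completions.create(
    model="kimi-k2-0905-preview",   # assumed model id
    temperature=0.6,
    messages=[{"role": "user", "content": "Fix the failing test in tests/test_io.py"}],
    tools=tools,
)

msg = response.choices[0].message
if msg.tool_calls:                   # the model chose to invoke a tool
    for call in msg.tool_calls:
        print(call.function.name, call.function.arguments)
```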
Kimi K2 0905 FAQ
Key details for builders and researchers.
What is Kimi K2 0905?
Kimi K2-Instruct-0905 is a high-capability MoE language model with 1T total parameters and 32B activated parameters.
What is the context length?
The context window is 256K tokens, expanded from 128K.
What temperature is recommended?
The recommended temperature is 0.6 for general use.
Which inference engines are recommended?
vLLM, SGLang, KTransformers, and TensorRT-LLM are recommended.
Build with Kimi K2 0905
Start with the API or explore pricing to scale usage.