API Documentation
Overview
The Kimi K2 API provides programmatic access to the Kimi K2 language model. This API supports both OpenAI and Anthropic message formats, allowing seamless integration with existing applications.
Base URL
https://kimi-k2.ai/api
Supported Protocols
- REST API over HTTPS
- JSON request and response bodies
- UTF-8 character encoding
- CORS support for browser-based applications
Quick Start
Get started with the Kimi K2 API in three steps:
- Create an account and receive 100 free credits
- Generate an API key from your dashboard
- Make your first request (1 credit per request)
Example Request
curl https://kimi-k2.ai/api/v1/chat/completions \
  -H "Authorization: Bearer $KIMI_API_KEY" \
  -H "Content-Type: application/json" \
  -d '{
    "model": "kimi-k2-0905",
    "messages": [{"role": "user", "content": "Hello"}]
  }'
Authentication
API Keys
Authentication is performed using API keys. Include your API key in the request header:
Authorization: Bearer YOUR_API_KEY
Or for Anthropic-compatible endpoints:
X-API-Key: YOUR_API_KEY
Authentication Methods
| Method | Header | Format | Endpoints |
|---|---|---|---|
| Bearer Token | Authorization | Bearer YOUR_API_KEY | /v1/chat/completions |
| API Key | X-API-Key | YOUR_API_KEY | /v1/messages |
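Since the header differs by endpoint family, it can help to build it in one place. A minimal sketch (the helper is illustrative, not part of any official SDK):

```python
def auth_headers(api_key: str, style: str = "bearer") -> dict:
    """Build the authentication header for either endpoint family."""
    if style == "bearer":  # OpenAI-compatible: /v1/chat/completions
        return {"Authorization": f"Bearer {api_key}"}
    if style == "x-api-key":  # Anthropic-compatible: /v1/messages
        return {"X-API-Key": api_key}
    raise ValueError(f"unknown auth style: {style}")
```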
API Reference
List Models
List Available Models
GET /v1/models
Returns a list of all models available for use with the API.
Response Format
{
  "object": "list",
  "data": [
    {
      "id": "kimi-k2",
      "object": "model",
      "created": 1735785600,
      "owned_by": "moonshot-ai",
      "permission": [...],
      "root": "kimi-k2",
      "parent": null
    },
    {
      "id": "kimi-k2-0905",
      "object": "model",
      "created": 1735785600,
      "owned_by": "moonshot-ai",
      "permission": [...],
      "root": "kimi-k2-0905",
      "parent": null
    }
  ]
}
Response Fields
| Field | Type | Description |
|---|---|---|
| object | string | Always list |
| data | array | List of available models |
| data[].id | string | Model identifier to use in API requests |
| data[].object | string | Always model |
| data[].owned_by | string | Organization that owns the model |
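To choose a model programmatically, extract the ids from the list response. A small sketch against the response shape documented above (the sample payload is abbreviated):

```python
import json

# Abbreviated sample in the shape returned by GET /v1/models
SAMPLE = """{
  "object": "list",
  "data": [
    {"id": "kimi-k2", "object": "model", "owned_by": "moonshot-ai"},
    {"id": "kimi-k2-0905", "object": "model", "owned_by": "moonshot-ai"}
  ]
}"""

def model_ids(payload: str) -> list:
    """Return the model identifiers usable in API requests."""
    body = json.loads(payload)
    return [m["id"] for m in body["data"]]
```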
Chat Completions
The Chat Completions API generates model responses for conversations. This endpoint is compatible with OpenAI's API format.
Create Completion
POST /v1/chat/completions
Generates a model response for the given conversation.
Request Format
{
  "model": "kimi-k2-0905",
  "messages": [
    {
      "role": "system",
      "content": "You are a helpful assistant."
    },
    {
      "role": "user",
      "content": "Explain quantum computing"
    }
  ],
  "temperature": 0.7,
  "max_tokens": 2048,
  "top_p": 1.0,
  "frequency_penalty": 0,
  "presence_penalty": 0,
  "stream": false,
  "n": 1
}
Parameters
| Parameter | Type | Required | Default | Description |
|---|---|---|---|---|
| model | string | Yes | - | Model identifier: kimi-k2 or kimi-k2-0905 |
| messages | array | Yes | - | Input messages. Each message has a role and content |
| temperature | number | No | 0.6 | Sampling temperature between 0 and 2. Lower values make output more deterministic |
| max_tokens | integer | No | 1024 | Maximum tokens to generate (output is capped at 8,192 tokens) |
| top_p | number | No | 1.0 | Nucleus sampling threshold. Alternative to temperature |
| frequency_penalty | number | No | 0 | Penalize repeated tokens. Range: -2.0 to 2.0 |
| presence_penalty | number | No | 0 | Penalize tokens already present in the text. Range: -2.0 to 2.0 |
| stream | boolean | No | false | Stream responses incrementally |
| n | integer | No | 1 | Number of completions to generate |
| stop | string/array | No | null | Stop sequences (up to 4) |
| user | string | No | null | Unique identifier for end-user tracking |
Message Object
| Field | Type | Description |
|---|---|---|
| role | string | One of: system, user, assistant |
| content | string | Message content |
Response Format
{
  "id": "chatcmpl-9d4c2f68-5e3a-4b2f-a3c9-7d8e6f5c4b3a",
  "object": "chat.completion",
  "created": 1709125200,
  "model": "kimi-k2-0905",
  "system_fingerprint": "fp_a7c4d3e2",
  "choices": [
    {
      "index": 0,
      "message": {
        "role": "assistant",
        "content": "Quantum computing leverages quantum mechanical phenomena..."
      },
      "finish_reason": "stop"
    }
  ],
  "usage": {
    "prompt_tokens": 25,
    "completion_tokens": 189,
    "total_tokens": 214
  }
}
Response Fields
| Field | Type | Description |
|---|---|---|
| id | string | Unique request identifier |
| object | string | Object type: chat.completion |
| created | integer | Unix timestamp |
| model | string | Model used |
| choices | array | Generated completions |
| usage | object | Token usage statistics |
Finish Reasons
| Value | Description |
|---|---|
| stop | Natural end of message or stop sequence reached |
| length | Maximum token limit reached |
Streaming
When stream is set to true, responses are delivered as server-sent events:
data: {"id":"chatcmpl-...","choices":[{"delta":{"content":"Hello"},"index":0}]}
data: {"id":"chatcmpl-...","choices":[{"delta":{"content":" there"},"index":0}]}
data: [DONE]
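Client libraries handle this parsing for you, but if you consume the stream directly, each data: line carries a JSON chunk whose choices[0].delta.content holds the next text fragment, and [DONE] marks the end. A sketch of accumulating the fragments:

```python
import json

def stream_text(sse_lines):
    """Join the content deltas from a sequence of SSE lines."""
    parts = []
    for line in sse_lines:
        if not line.startswith("data: "):
            continue  # skip blank keep-alive lines
        payload = line[len("data: "):]
        if payload.strip() == "[DONE]":
            break  # end-of-stream sentinel
        chunk = json.loads(payload)
        delta = chunk["choices"][0].get("delta", {})
        parts.append(delta.get("content", ""))
    return "".join(parts)
```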
Messages
The Messages API provides Anthropic-compatible message generation.
Create Message
POST /v1/messages
Creates a model response using the Messages format.
Request Format
{
  "model": "kimi-k2-0905",
  "messages": [
    {
      "role": "user",
      "content": "What is the capital of France?"
    }
  ],
  "max_tokens": 1024,
  "system": "You are a knowledgeable geography assistant.",
  "temperature": 0.7,
  "top_p": 1.0,
  "stop_sequences": ["\n\nHuman:"]
}
Parameters
| Parameter | Type | Required | Default | Description |
|---|---|---|---|---|
| model | string | Yes | - | Model identifier |
| messages | array | Yes | - | Conversation messages (user/assistant only) |
| max_tokens | integer | Yes | - | Maximum tokens to generate |
| system | string | No | null | System prompt for behavior guidance |
| temperature | number | No | 0.6 | Sampling temperature (0-1) |
| top_p | number | No | 1.0 | Nucleus sampling threshold |
| stop_sequences | array | No | null | Stop generation sequences (max 4) |
| stream | boolean | No | false | Enable streaming responses |
| metadata | object | No | null | Request metadata |
Response Format
{
  "id": "msg_01XFDUDYJgAACzvnptvVoYEL",
  "type": "message",
  "role": "assistant",
  "content": [
    {
      "type": "text",
      "text": "The capital of France is Paris."
    }
  ],
  "model": "kimi-k2-0905",
  "stop_reason": "end_turn",
  "stop_sequence": null,
  "usage": {
    "input_tokens": 15,
    "output_tokens": 9
  }
}
Response Fields
| Field | Type | Description |
|---|---|---|
| id | string | Unique message identifier |
| type | string | Object type: message |
| role | string | Always assistant |
| content | array | Message content blocks |
| model | string | Model used |
| stop_reason | string | Why generation stopped |
| usage | object | Token usage |
System Prompts
System prompts in the Messages API are specified separately:
{
  "system": "You are a helpful assistant.",
  "messages": [
    {"role": "user", "content": "Hello"}
  ],
  "max_tokens": 1024
}
Models
Available Models
| Model ID | Context Window | Description |
|---|---|---|
| kimi-k2 | 128,000 tokens | Original Kimi K2 release |
| kimi-k2-0905 | 256,000 tokens | Updated release (2025-09-05) with extended context |
Model Selection
Both identifiers serve the Kimi K2 model; kimi-k2-0905 is the newer release with the larger context window and works with both API formats:
- OpenAI format: kimi-k2-0905
- Anthropic format: kimi-k2-0905
Request Limits
Rate Limits
Rate limits are applied per API key based on credit balance:
| Credit Balance | Requests/Minute | Requests/Hour | Requests/Day |
|---|---|---|---|
| 1-100 | 20 | 600 | 5,000 |
| 101-1,000 | 60 | 2,000 | 20,000 |
| 1,001-10,000 | 200 | 6,000 | 50,000 |
| 10,000+ | 500 | 15,000 | 100,000 |
Rate limit headers:
X-RateLimit-Limit: 60
X-RateLimit-Remaining: 59
X-RateLimit-Reset: 1709125800
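One way to stay under the limit is to check these headers before retrying. A sketch (the header names come from this doc; the policy itself is illustrative):

```python
import time

def seconds_until_reset(headers, now=None):
    """Seconds to wait before the next request, per X-RateLimit-* headers."""
    if now is None:
        now = time.time()
    remaining = int(headers.get("X-RateLimit-Remaining", "1"))
    if remaining > 0:
        return 0.0  # quota left; no need to wait
    reset = int(headers.get("X-RateLimit-Reset", "0"))  # Unix timestamp
    return max(0.0, reset - now)
```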
Token Limits
| Limit Type | Value |
|---|---|
| Maximum input tokens | 128,000 (kimi-k2) / 256,000 (kimi-k2-0905) |
| Maximum output tokens | 8,192 |
| Maximum total tokens | Limited by the model's context window |
Timeout Settings
| Timeout Type | Duration |
|---|---|
| Connection timeout | 30 seconds |
| Read timeout | 600 seconds |
| Stream timeout | 600 seconds |
Error Codes
HTTP Status Codes
| Status | Meaning |
|---|---|
| 200 | Success |
| 400 | Bad Request - Invalid parameters |
| 401 | Unauthorized - Invalid or missing API key |
| 403 | Forbidden - Insufficient credits or permissions |
| 404 | Not Found - Invalid endpoint |
| 429 | Too Many Requests - Rate limit exceeded |
| 500 | Internal Server Error |
| 503 | Service Unavailable |
Error Types
OpenAI Format Errors
{
  "error": {
    "message": "Invalid API key provided",
    "type": "invalid_request_error",
    "code": "invalid_api_key"
  }
}
| Error Code | Type | Description |
|---|---|---|
| invalid_api_key | invalid_request_error | API key is invalid or malformed |
| insufficient_credits | insufficient_quota | Credit balance is insufficient |
| rate_limit_exceeded | rate_limit_error | Too many requests |
| invalid_request | invalid_request_error | Request validation failed |
| model_not_found | invalid_request_error | Specified model doesn't exist |
| context_length_exceeded | invalid_request_error | Input exceeds context window |
Anthropic Format Errors
{
  "type": "error",
  "error": {
    "type": "authentication_error",
    "message": "Invalid API key"
  }
}
| Error Type | Description |
|---|---|
| authentication_error | Authentication failed |
| invalid_request_error | Request validation failed |
| rate_limit_error | Rate limit exceeded |
| api_error | Server-side error |
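Both formats nest the human-readable details under error, so one helper can serve either endpoint family. A sketch against the two shapes shown above:

```python
def parse_error(body: dict):
    """Return (error_type, message) from either error format."""
    err = body.get("error", {})
    return err.get("type", ""), err.get("message", "")
```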
Error Handling
Implement exponential backoff with jitter for retries:
import time
import random

from openai import RateLimitError  # or your client library's rate-limit exception

def retry_with_backoff(func, max_retries=3, base_delay=1, max_delay=60):
    for attempt in range(max_retries):
        try:
            return func()
        except RateLimitError:
            if attempt == max_retries - 1:
                raise  # out of retries; surface the error
            # Exponential backoff with jitter, capped at max_delay
            delay = min(
                base_delay * (2 ** attempt) + random.uniform(0, 1),
                max_delay,
            )
            time.sleep(delay)
Client Libraries
Python
Installation
pip install openai
# or
pip install anthropic
OpenAI Client
from openai import OpenAI

client = OpenAI(
    api_key="YOUR_API_KEY",
    base_url="https://kimi-k2.ai/api/v1"
)

# List available models
models = client.models.list()
for model in models.data:
    print(f"Model ID: {model.id}")

# Create chat completion
response = client.chat.completions.create(
    model="kimi-k2",
    messages=[
        {"role": "user", "content": "Hello"}
    ]
)
Anthropic Client
from anthropic import Anthropic

client = Anthropic(
    api_key="YOUR_API_KEY",
    base_url="https://kimi-k2.ai/api/v1"
)

response = client.messages.create(
    model="kimi-k2",
    messages=[
        {"role": "user", "content": "Hello"}
    ],
    max_tokens=1024
)
Node.js
Installation
npm install openai
# or
npm install @anthropic-ai/sdk
OpenAI Client
import OpenAI from 'openai';

const openai = new OpenAI({
  apiKey: process.env.KIMI_API_KEY,
  baseURL: 'https://kimi-k2.ai/api/v1',
});

// List available models
const models = await openai.models.list();
for (const model of models.data) {
  console.log(`Model ID: ${model.id}`);
}

// Create chat completion
const response = await openai.chat.completions.create({
  model: 'kimi-k2-0905',
  messages: [{ role: 'user', content: 'Hello' }],
});
Anthropic Client
import Anthropic from '@anthropic-ai/sdk';

const anthropic = new Anthropic({
  apiKey: process.env.KIMI_API_KEY,
  baseURL: 'https://kimi-k2.ai/api/v1',
});

const response = await anthropic.messages.create({
  model: 'kimi-k2-0905',
  messages: [{ role: 'user', content: 'Hello' }],
  max_tokens: 1024,
});
Go
Installation
go get github.com/sashabaranov/go-openai
Example
package main

import (
	"context"
	"fmt"

	openai "github.com/sashabaranov/go-openai"
)

func main() {
	config := openai.DefaultConfig("YOUR_API_KEY")
	config.BaseURL = "https://kimi-k2.ai/api/v1"
	client := openai.NewClientWithConfig(config)

	resp, err := client.CreateChatCompletion(
		context.Background(),
		openai.ChatCompletionRequest{
			Model: "kimi-k2",
			Messages: []openai.ChatCompletionMessage{
				{
					Role:    openai.ChatMessageRoleUser,
					Content: "Hello",
				},
			},
		},
	)
	if err != nil {
		panic(err)
	}
	fmt.Println(resp.Choices[0].Message.Content)
}
REST API
Direct HTTP requests without client libraries:
cURL
curl -X POST https://kimi-k2.ai/api/v1/chat/completions \
  -H "Authorization: Bearer $KIMI_API_KEY" \
  -H "Content-Type: application/json" \
  -d '{
    "model": "kimi-k2",
    "messages": [
      {"role": "user", "content": "Hello"}
    ]
  }'
Python (requests)
import requests

response = requests.post(
    "https://kimi-k2.ai/api/v1/chat/completions",
    headers={
        "Authorization": f"Bearer {api_key}",
        "Content-Type": "application/json"
    },
    json={
        "model": "kimi-k2",
        "messages": [{"role": "user", "content": "Hello"}]
    }
)
Node.js (fetch)
const response = await fetch('https://kimi-k2.ai/api/v1/chat/completions', {
  method: 'POST',
  headers: {
    'Authorization': `Bearer ${apiKey}`,
    'Content-Type': 'application/json',
  },
  body: JSON.stringify({
    model: 'kimi-k2-0905',
    messages: [{ role: 'user', content: 'Hello' }],
  }),
});
Billing
Credit System
API usage is billed through a credit system:
- 1 credit = 1 API request
- Credits are deducted upon successful completion
- Failed requests (4xx errors) are not charged
- Server errors (5xx) are not charged
- New users receive 100 free credits upon registration
- Invite rewards:
  - 50 credits when someone registers with your invite code
  - 500 credits when an invited user makes their first payment
Credit Packages
| Package | Credits | Price | Per Credit | Validity |
|---|---|---|---|---|
| Starter | 500 | $4.99 | $0.0099 | No expiration |
| Standard | 5,000 | $29.99 | $0.0060 | 1 month |
| Premium | 20,000 | $59.99 | $0.0030 | 1 month |
| Enterprise | Custom | Contact sales | Custom | Custom |
Usage Tracking
Monitor your usage through:
- Response headers: X-Credits-Remaining: 4523
- Dashboard: Real-time usage statistics at /my-credits
- API endpoint: GET /api/user/credits
Usage data includes:
- Total credits consumed
- Credits remaining
- Usage by day/hour
- Average tokens per request
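The per-response header makes it easy to track balance in code. A minimal sketch reading it (the header name comes from this doc; the fallback value is a choice of this example):

```python
def credits_remaining(headers) -> int:
    """Read the credit balance from the X-Credits-Remaining response header.

    Returns -1 when the header is absent.
    """
    value = headers.get("X-Credits-Remaining")
    return int(value) if value is not None else -1
```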
Migration Guide
From OpenAI
Migrating from the OpenAI API requires minimal changes:
- Update the base URL:
# From
client = OpenAI(api_key="sk-...")
# To
client = OpenAI(
    api_key="sk-...",
    base_url="https://kimi-k2.ai/api/v1"
)
- Update the model name:
# From
model="gpt-4"
# To
model="kimi-k2-0905"
- No other changes required - the API is fully compatible
From Anthropic
Migrating from the Anthropic API:
- Update the base URL:
# From
client = Anthropic(api_key="sk-ant-...")
# To
client = Anthropic(
    api_key="sk-...",
    base_url="https://kimi-k2.ai/api/v1"
)
- Update authentication:
  - Generate an API key from the Kimi K2 dashboard
  - Replace your Anthropic API key with it
- Model compatibility: Kimi K2 is supported via the kimi-k2-0905 identifier
Changelog
2025-09-05
- 256K context window support
- kimi-k2-0905 model support
2025-01-30
- Added Anthropic Messages API compatibility
- Introduced X-API-Key authentication method
- Enhanced error response formats
2025-01-15
- Initial API release
- OpenAI Chat Completions compatibility
- 128K context window support
- Credit-based billing system