Kimi K2

The world's first 1 trillion parameter open source model

Deploy Kimi K2 behind an API endpoint in seconds.

Example usage

Baseten offers Dedicated Deployments and Model APIs for Kimi K2 powered by the Baseten Inference Stack.

Kimi K2 has shown strong performance on agentic tasks thanks to its tool calling, reasoning abilities, and long context handling. But as a large parameter model (1T parameters), it’s also resource-intensive. Running it in production requires a highly optimized inference stack to avoid excessive latency.

Deployments of Kimi are OpenAI-compatible.

Input

1# You can use this model with any of the OpenAI clients in any language!
2# Simply change the API Key to get started
3
4from openai import OpenAI
5
6client = OpenAI(
7    api_key="YOUR_API_KEY",
8    base_url="https://inference.baseten.co/v1"
9)
10
11response = client.chat.completions.create(
12    model="moonshotai/Kimi-K2-Instruct",
13    messages=[
14        {
15            "role": "user",
16            "content": "Implement Hello World in Python"
17        }
18    ],
19    stop=[],
20    stream=True,
21    stream_options={
22        "include_usage": True,
23        "continuous_usage_stats": True
24    },
25    top_p=1,
26    max_tokens=1000,
27    temperature=1,
28    presence_penalty=0,
29    frequency_penalty=0
30)
31
32for chunk in response:
33    if chunk.choices and chunk.choices[0].delta.content is not None:
34        print(chunk.choices[0].delta.content, end="", flush=True)

JSON output

1{
2    "id": "143",
3    "choices": [
4        {
5            "finish_reason": "stop",
6            "index": 0,
7            "logprobs": null,
8            "message": {
9                "content": "[Model output here]",
10                "role": "assistant",
11                "audio": null,
12                "function_call": null,
13                "tool_calls": null
14            }
15        }
16    ],
17    "created": 1741224586,
18    "model": "",
19    "object": "chat.completion",
20    "service_tier": null,
21    "system_fingerprint": null,
22    "usage": {
23        "completion_tokens": 145,
24        "prompt_tokens": 38,
25        "total_tokens": 183,
26        "completion_tokens_details": null,
27        "prompt_tokens_details": null
28    }
29}

Kimi K2

Example usage

Deploy any model in just a few commands

Moonshot AI models

Kimi K2

large language models

Llama 4 Scout

Llama 4 Maverick

DeepSeek V3 0324

🔥 Trending models

GPT OSS 120B

GPT OSS 20B

Qwen Image