Cogito v2 671B
SOTA 671B MoE model trained for better outputs from shorter reasoning chains
Deploy Cogito v2 671B behind an API endpoint in seconds.
Cogito v2 671B, currently in preview, is a frontier LLM that offers SOTA intelligence with shorter reasoning chains thanks to Iterated Distillation & Amplification (IDA), a novel research technique that distills improvements from inference-time reasoning back into the model weights.
As a result, Cogito models reach strong results with fewer reasoning tokens, which lowers cost and latency in real-world agents and applications.
from openai import OpenAI
import os

model_url = ""  # Copy in from API pane in Baseten model dashboard

client = OpenAI(
    api_key=os.environ['BASETEN_API_KEY'],
    base_url=model_url
)

# Chat completion
response_chat = client.chat.completions.create(
    model="",
    messages=[
        {"role": "user", "content": "Write FizzBuzz."}
    ],
    temperature=0.6,
    max_tokens=1000,
)
print(response_chat)

{
  "id": "143",
  "choices": [
    {
      "finish_reason": "stop",
      "index": 0,
      "logprobs": null,
      "message": {
        "content": "[Model output here]",
        "role": "assistant",
        "audio": null,
        "function_call": null,
        "tool_calls": null
      }
    }
  ],
  "created": 1741224586,
  "model": "",
  "object": "chat.completion",
  "service_tier": null,
  "system_fingerprint": null,
  "usage": {
    "completion_tokens": 145,
    "prompt_tokens": 38,
    "total_tokens": 183,
    "completion_tokens_details": null,
    "prompt_tokens_details": null
  }
}
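
The sample response above is a standard chat completion object. Most applications only need the reply text and the token counts from it. The sketch below shows one way to extract them from a plain dict with the same shape. The helper name `summarize_completion` and the trimmed `SAMPLE_RESPONSE` string are hypothetical and used only for illustration; they are not part of the Baseten or OpenAI APIs.

```python
import json

# Trimmed copy of the sample response above; only the fields used here are kept.
SAMPLE_RESPONSE = """
{
  "choices": [
    {
      "finish_reason": "stop",
      "index": 0,
      "message": {"content": "[Model output here]", "role": "assistant"}
    }
  ],
  "usage": {"completion_tokens": 145, "prompt_tokens": 38, "total_tokens": 183}
}
"""


def summarize_completion(response: dict) -> dict:
    """Hypothetical helper: pull the reply text and token counts from a response dict."""
    choice = response["choices"][0]
    usage = response.get("usage") or {}  # tolerate a missing or null usage block
    return {
        "text": choice["message"]["content"],
        "finish_reason": choice["finish_reason"],
        # A finish_reason of "length" means generation stopped at max_tokens.
        "truncated": choice["finish_reason"] == "length",
        "prompt_tokens": usage.get("prompt_tokens"),
        "completion_tokens": usage.get("completion_tokens"),
    }


summary = summarize_completion(json.loads(SAMPLE_RESPONSE))
print(summary["text"])
print(summary["completion_tokens"], "completion tokens")
```

When working with the client object directly rather than a dict, the same fields should be available as attributes, for example `response_chat.choices[0].message.content` and `response_chat.usage.completion_tokens`. Checking `truncated` is useful with reasoning models, since a low `max_tokens` can cut off the answer before it is complete.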