Ultravox v0.5 8B
Ultravox is a multimodal model that accepts both speech and text as input and generates text as output. This version uses Llama 3.1 8B Instruct as its backbone.
Deploy Ultravox v0.5 8B behind an API endpoint in seconds.
Example usage
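Because Ultravox is multimodal, a request can include text, audio, or both. In this example, the audio is passed as a URL pointing to a publicly hosted MP3 file.
The response is an OpenAI-style chat completion object. The generated text is in the content field of the message (choices[0].message.content).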
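Because a request can mix text and audio, it can help to build the user message programmatically. The sketch below uses a hypothetical helper, build_user_message (not part of the OpenAI SDK or of Baseten), that emits the same "text" and "audio_url" content parts used in the request example and includes only the parts you pass.

```python
from typing import Any, Dict, List, Optional


def build_user_message(text: Optional[str] = None,
                       audio_url: Optional[str] = None) -> Dict[str, Any]:
    """Build one OpenAI-style user message holding text, audio, or both.

    Hypothetical helper: the part types ("text", "audio_url") mirror the
    request example; the function itself is not part of any SDK.
    """
    parts: List[Dict[str, Any]] = []
    if text is not None:
        parts.append({"type": "text", "text": text})
    if audio_url is not None:
        parts.append({"type": "audio_url", "audio_url": {"url": audio_url}})
    if not parts:
        # A message with no content parts would be meaningless.
        raise ValueError("Provide text, audio_url, or both.")
    return {"role": "user", "content": parts}
```

For example, messages=[build_user_message("What is Lydia like?", audio_url)] reproduces the message in the request below, while build_user_message(audio_url=audio_url) sends audio only.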
```python
from openai import OpenAI

model_id = "jwdp26kw"  # Replace with your model ID from Baseten's model dashboard

client = OpenAI(
    api_key="YOUR-API-KEY",  # Replace with your Baseten API key
    base_url=f"https://model-{model_id}.api.baseten.co/environments/production/sync/v1",
)

response = client.chat.completions.create(
    model="",  # The deployment is selected by base_url; this example leaves model empty
    messages=[
        {
            "role": "user",
            "content": [
                {"type": "text", "text": "What is Lydia like?"},
                {
                    "type": "audio_url",
                    "audio_url": {
                        "url": "https://baseten-public.s3.us-west-2.amazonaws.com/fred-audio-tests/real.mp3"
                    },
                },
            ],
        }
    ],
)

print(response)                             # Full chat completion object
print(response.choices[0].message.content)  # Just the generated text
```
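The example above points to a publicly hosted MP3. To send a local recording instead, one option is to inline it as a base64 data URL. This is an assumption: vLLM's OpenAI-compatible server documents support for data: URLs in audio_url parts, but whether this particular Baseten deployment accepts them is not confirmed here. The helper name audio_file_to_data_url is hypothetical.

```python
import base64
import mimetypes


def audio_file_to_data_url(path: str) -> str:
    """Encode a local audio file as a base64 data URL (hypothetical helper).

    ASSUMPTION: the deployment accepts data: URLs in the "audio_url" field,
    as vLLM's OpenAI-compatible server does. Verify against your deployment.
    """
    mime, _ = mimetypes.guess_type(path)
    if mime is None or not mime.startswith("audio/"):
        mime = "audio/mpeg"  # Fallback assumption for unrecognized extensions
    with open(path, "rb") as f:
        encoded = base64.b64encode(f.read()).decode("ascii")
    return f"data:{mime};base64,{encoded}"
```

You would then pass {"type": "audio_url", "audio_url": {"url": audio_file_to_data_url("clip.mp3")}} in place of the public URL. Note that base64 encoding inflates the payload by roughly a third.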
Example output:

```json
{
  "id": "143",
  "choices": [
    {
      "finish_reason": "stop",
      "index": 0,
      "logprobs": null,
      "message": {
        "content": "[Model output here]",
        "role": "assistant",
        "audio": null,
        "function_call": null,
        "tool_calls": null
      }
    }
  ],
  "created": 1741224586,
  "model": "",
  "object": "chat.completion",
  "service_tier": null,
  "system_fingerprint": null,
  "usage": {
    "completion_tokens": 145,
    "prompt_tokens": 38,
    "total_tokens": 183,
    "completion_tokens_details": null,
    "prompt_tokens_details": null
  }
}
```
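If you call the endpoint over plain HTTP and receive this JSON rather than an SDK object, the generated text and token counts can be read with the standard library. The minimal sketch below uses a trimmed copy of the example response; with the OpenAI SDK, the equivalent is response.choices[0].message.content.

```python
import json

# The example response above, trimmed to the fields read below.
raw = """
{
  "choices": [
    {"finish_reason": "stop", "index": 0,
     "message": {"content": "[Model output here]", "role": "assistant"}}
  ],
  "usage": {"completion_tokens": 145, "prompt_tokens": 38, "total_tokens": 183}
}
"""

response = json.loads(raw)
choice = response["choices"][0]
text = choice["message"]["content"]  # The generated text
# "stop" means generation ended at a stop token or stop sequence,
# as opposed to being cut off by a length limit.
finished = choice["finish_reason"] == "stop"
usage = response["usage"]
# total_tokens is prompt plus completion tokens: 38 + 145 = 183
assert usage["prompt_tokens"] + usage["completion_tokens"] == usage["total_tokens"]
print(text)  # prints "[Model output here]"
```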