Qwen3 8B Embedding

Leading open-source model for embeddings

Deploy Qwen3 8B Embedding behind an API endpoint in seconds.

Deploy model

Example usage

Qwen3 8B Embedding is a text embedding model: given an input text, it produces a one-dimensional embedding vector. It is frequently used for downstream tasks such as clustering and retrieval with vector databases.
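As a minimal sketch of such a downstream task (the vectors below are toy stand-ins, not real model output): cosine similarity ranks document embeddings against a query embedding, which is the core operation a vector database performs during retrieval.

```python
import numpy as np

def cosine_similarity(query: np.ndarray, docs: np.ndarray) -> np.ndarray:
    # Normalize the query vector and each document row, then take dot products.
    query = query / np.linalg.norm(query)
    docs = docs / np.linalg.norm(docs, axis=1, keepdims=True)
    return docs @ query

# Toy 4-dimensional "embeddings" standing in for real model output.
query_vec = np.array([1.0, 0.0, 0.0, 0.0])
doc_vecs = np.array([
    [0.9, 0.1, 0.0, 0.0],   # similar to the query
    [0.0, 0.0, 1.0, 0.0],   # orthogonal to the query
])

scores = cosine_similarity(query_vec, doc_vecs)
best = int(np.argmax(scores))  # index of the most similar document
```

With real embeddings the vectors would come from the `client.embed(...)` call shown below, and `docs` would typically live in a vector database rather than an in-memory array.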

This model is quantized to FP8 for deployment, a format supported by NVIDIA's recent GPUs (e.g. H100, H100_40GB, B200, or L4). Quantization is optional, but it improves serving efficiency.

The client library can be installed via pip; see the repository for details:
https://github.com/basetenlabs/truss/tree/main/baseten-performance-client

Alternatively, you may also use the OpenAI embeddings client.

Input
import os
from baseten_performance_client import (
    PerformanceClient, OpenAIEmbeddingsResponse, ClassificationResponse
)

api_key = os.environ.get("BASETEN_API_KEY")
model_id = "yqv0rjjw"
base_url = f"https://model-{model_id}.api.baseten.co/environments/production/sync"

client = PerformanceClient(base_url=base_url, api_key=api_key)

def get_detailed_instruct(task_description: str, query: str) -> str:
    # Qwen3-Embedding-style query formatting: instruction and query in one string.
    return f'Instruct: {task_description}\nQuery:{query}'

task = 'Given a web search query, retrieve relevant passages that answer the query'
texts = [
    get_detailed_instruct(task, 'Explain gravity'),
    "Gravity is a force that attracts two bodies towards each other. It gives weight to physical objects and is responsible for the movement of planets around the sun."
]
response: OpenAIEmbeddingsResponse = client.embed(
    input=texts,
    model="my_model",
    batch_size=16,
    max_concurrent_requests=32,
)
array = response.numpy()
JSON output
{
    "data": [
        {
            "embedding": [
                0
            ],
            "index": 0,
            "object": "embedding"
        }
    ],
    "model": "thenlper/gte-base",
    "object": "list",
    "usage": {
        "prompt_tokens": 512,
        "total_tokens": 512
    }
}
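The response follows the OpenAI embeddings schema, so it can also be parsed with the standard library alone. The JSON literal below is the truncated sample from above (a single one-element embedding), not real model output.

```python
import json
import numpy as np

# Truncated sample response in the OpenAI embeddings schema.
raw = '''{"data": [{"embedding": [0], "index": 0, "object": "embedding"}],
          "model": "thenlper/gte-base", "object": "list",
          "usage": {"prompt_tokens": 512, "total_tokens": 512}}'''

payload = json.loads(raw)
# Sort by "index" so row order matches input order, then stack into a matrix.
rows = sorted(payload["data"], key=lambda item: item["index"])
array = np.array([row["embedding"] for row in rows])
total_tokens = payload["usage"]["total_tokens"]
```

This mirrors what `response.numpy()` does for you in the performance client: one matrix row per input text.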

Deploy any model in just a few commands

Avoid getting tangled in complex deployment processes. Deploy best-in-class open-source models and take advantage of optimized serving for your own models.

$ truss init -- example stable-diffusion-2-1-base ./my-sd-truss
$ cd ./my-sd-truss
$ export BASETEN_API_KEY=MdNmOCXc.YBtEZD0WFOYKso2A6NEQkRqTe
$ truss push
INFO Serializing Stable Diffusion 2.1 truss.
INFO Making contact with Baseten 👋 👽
INFO 🚀 Uploading model to Baseten 🚀
Upload progress: 0% | | 0.00G/2.39G