Qwen3 8B Embedding

Leading open-source model for embeddings

Deploy Qwen3 8B Embedding behind an API endpoint in seconds.

Deploy model

Example usage

Qwen3 8B Embedding is a text embedding model: given an input text, it produces a one-dimensional embedding vector. It is frequently used for downstream tasks such as clustering and retrieval with vector databases.
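As a minimal sketch of such a downstream task (the vectors below are toy stand-ins, not real model output): cosine similarity ranks document embeddings against a query embedding, which is the core operation a vector database performs during retrieval.

```python
import numpy as np

def cosine_similarity(query: np.ndarray, docs: np.ndarray) -> np.ndarray:
    # Normalize the query vector and each document row, then take dot products.
    query = query / np.linalg.norm(query)
    docs = docs / np.linalg.norm(docs, axis=1, keepdims=True)
    return docs @ query

# Toy 4-dimensional "embeddings" standing in for real model output.
query_vec = np.array([1.0, 0.0, 0.0, 0.0])
doc_vecs = np.array([
    [0.9, 0.1, 0.0, 0.0],   # similar to the query
    [0.0, 0.0, 1.0, 0.0],   # orthogonal to the query
])

scores = cosine_similarity(query_vec, doc_vecs)
best = int(np.argmax(scores))  # index of the most similar document
```

With real embeddings the vectors would come from the `client.embed(...)` call shown below, and `docs` would typically live in a vector database rather than an in-memory array.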

This model is quantized to FP8 for deployment, a format supported by NVIDIA's recent GPUs (e.g. H100, H100_40GB, B200, or L4). Quantization is optional, but it improves serving efficiency.

The client library can be installed via pip; see the repository for details:
https://github.com/basetenlabs/truss/tree/main/baseten-performance-client

Alternatively, you may also use the OpenAI embeddings client.

Input
import os
from baseten_performance_client import (
    PerformanceClient, OpenAIEmbeddingsResponse, ClassificationResponse
)

api_key = os.environ.get("BASETEN_API_KEY")
model_id = "yqv0rjjw"
base_url = f"https://model-{model_id}.api.baseten.co/environments/production/sync"

client = PerformanceClient(base_url=base_url, api_key=api_key)

def get_detailed_instruct(task_description: str, query: str) -> str:
    # Qwen3-Embedding-style query formatting: instruction and query in one string.
    return f'Instruct: {task_description}\nQuery:{query}'

task = 'Given a web search query, retrieve relevant passages that answer the query'
texts = [
    get_detailed_instruct(task, 'Explain gravity'),
    "Gravity is a force that attracts two bodies towards each other. It gives weight to physical objects and is responsible for the movement of planets around the sun."
]
response: OpenAIEmbeddingsResponse = client.embed(
    input=texts,
    model="my_model",
    batch_size=16,
    max_concurrent_requests=32,
)
array = response.numpy()
JSON output
{
    "data": [
        {
            "embedding": [
                0
            ],
            "index": 0,
            "object": "embedding"
        }
    ],
    "model": "thenlper/gte-base",
    "object": "list",
    "usage": {
        "prompt_tokens": 512,
        "total_tokens": 512
    }
}
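The response follows the OpenAI embeddings schema, so it can also be parsed with the standard library alone. The JSON literal below is the truncated sample from above (a single one-element embedding), not real model output.

```python
import json
import numpy as np

# Truncated sample response in the OpenAI embeddings schema.
raw = '''{"data": [{"embedding": [0], "index": 0, "object": "embedding"}],
          "model": "thenlper/gte-base", "object": "list",
          "usage": {"prompt_tokens": 512, "total_tokens": 512}}'''

payload = json.loads(raw)
# Sort by "index" so row order matches input order, then stack into a matrix.
rows = sorted(payload["data"], key=lambda item: item["index"])
array = np.array([row["embedding"] for row in rows])
total_tokens = payload["usage"]["total_tokens"]
```

This mirrors what `response.numpy()` does for you in the performance client: one matrix row per input text.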

Deploy any model in just a few commands

Avoid getting tangled in complex deployment processes. Deploy best-in-class open-source models and take advantage of optimized serving for your own models.

$ truss init -- example stable-diffusion-2-1-base ./my-sd-truss
$ cd ./my-sd-truss
$ export BASETEN_API_KEY=MdNmOCXc.YBtEZD0WFOYKso2A6NEQkRqTe
$ truss push
INFO Serializing Stable Diffusion 2.1 truss.
INFO Making contact with Baseten 👋 👽
INFO 🚀 Uploading model to Baseten 🚀
Upload progress: 0% | | 0.00G/2.39G