Qwen3 0.6B Reranker

A small and performant reranker model

Deploy Qwen3 0.6B Reranker behind an API endpoint in seconds.

Example usage

Qwen3 0.6B Reranker is a prediction model with two outputs, "no" and "yes", which indicate whether a document matches a query.

This model is quantized to FP8 for deployment, which is supported by NVIDIA's newest GPUs, e.g. H100, H100_40GB, B200, or L4. Quantization is optional but leads to higher efficiency.

The client code can be installed via pip.
https://github.com/basetenlabs/truss/tree/main/baseten-performance-client

Alternatively, you may also use your own client code.
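
As a rough illustration of what such hand-written client code might look like, the sketch below builds and sends a classification request using only the Python standard library. The `/predict` route and the `{"inputs": [[text], ...]}` payload shape are assumptions that are not stated on this page, and the helper names `build_classify_request` and `classify` are hypothetical; check them against your deployment before relying on them.

```python
import json
import urllib.request


def build_classify_request(base_url, api_key, texts, truncate=True):
    """Return (url, headers, body_bytes) for one classification call.

    ASSUMPTION: the deployment accepts POST <base_url>/predict with a JSON
    body of the form {"inputs": [[text], ...], "truncate": bool}.
    """
    url = base_url.rstrip("/") + "/predict"  # assumed route, not confirmed here
    headers = {
        "Authorization": f"Api-Key {api_key}",
        "Content-Type": "application/json",
    }
    # Each text is wrapped in its own single-element list (assumed shape).
    payload = {"inputs": [[t] for t in texts], "truncate": truncate}
    return url, headers, json.dumps(payload).encode("utf-8")


def classify(base_url, api_key, texts):
    """Send the request and return the decoded JSON response."""
    url, headers, body = build_classify_request(base_url, api_key, texts)
    req = urllib.request.Request(url, data=body, headers=headers, method="POST")
    with urllib.request.urlopen(req, timeout=30) as resp:
        return json.loads(resp.read().decode("utf-8"))
```

Keeping request construction separate from sending makes the payload easy to inspect or unit-test without network access. Unlike the performance client, this sketch does no batching or concurrent dispatch.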

Input
import os
from baseten_performance_client import (
    PerformanceClient, ClassificationResponse
)

api_key = os.environ["BASETEN_API_KEY"]
model_id = "xxxxxxx"
base_url = f"https://model-{model_id}.api.baseten.co/environments/production/sync"

client = PerformanceClient(base_url=base_url, api_key=api_key)

# Chat-template text that wraps every instruction/query/document triple.
prefix = "<|im_start|>system\nJudge whether the Document meets the requirements based on the Query and the Instruct provided. Note that the answer can only be \"yes\" or \"no\".<|im_end|>\n<|im_start|>user\n"
suffix = "<|im_end|>\n<|im_start|>assistant\n<think>\n\n</think>\n\n"

def format_instruction(instruction, query, doc):
    # Fall back to a generic retrieval instruction when none is given.
    if instruction is None:
        instruction = 'Given a web search query, retrieve relevant passages that answer the query'
    # f-string so that prefix, instruction, query, doc and suffix are actually substituted.
    output = f"{prefix}<Instruct>: {instruction}\n<Query>: {query}\n<Document>: {doc}{suffix}"
    return output

texts_to_classify = [
    format_instruction(instruction=None, query="What is the capital of China?", doc="The capital of China is Beijing."),
    format_instruction(instruction=None, query="What is the capital of China?", doc="The capital of France is Paris.")
]

response: ClassificationResponse = client.classify(
    input=texts_to_classify,
    model="my_model",
    truncate=True,
    batch_size=16,
    max_concurrent_requests=32,
)
JSON output
[
    {
        "score": 0.9861514,
        "label": "yes"
    },
    {
        "score": 0.01384861,
        "label": "no"
    }
]
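
To use these outputs for reranking, one option is to take the "yes" score of each document as its relevance and sort by it. The sketch below assumes that each document's result is a list of label/score entries shaped like the JSON above. The helper names `yes_score` and `rank_documents` are hypothetical, and the scores for the second document are made up for illustration only.

```python
def yes_score(label_scores):
    """Return the score of the "yes" entry in one document's result."""
    for entry in label_scores:
        if entry["label"] == "yes":
            return entry["score"]
    raise ValueError('result contains no "yes" label')


def rank_documents(docs, results):
    """Pair each document with its "yes" score, most relevant first."""
    scored = [(doc, yes_score(res)) for doc, res in zip(docs, results)]
    return sorted(scored, key=lambda pair: pair[1], reverse=True)


docs = [
    "The capital of France is Paris.",
    "The capital of China is Beijing.",
]
results = [
    # Illustrative, made-up scores for the non-matching document.
    [{"score": 0.02, "label": "yes"}, {"score": 0.98, "label": "no"}],
    # Scores copied from the JSON output above.
    [{"score": 0.9861514, "label": "yes"}, {"score": 0.01384861, "label": "no"}],
]

for doc, score in rank_documents(docs, results):
    print(f"{score:.4f}  {doc}")
```

Looking up the "yes" entry by label, rather than by position, keeps the ranking correct even if the order of the two labels in a result changes.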

Deploy any model in just a few commands

Avoid getting tangled in complex deployment processes. Deploy best-in-class open-source models and take advantage of optimized serving for your own models.

$ truss init -- example stable-diffusion-2-1-base ./my-sd-truss
$ cd ./my-sd-truss
$ export BASETEN_API_KEY=MdNmOCXc.YBtEZD0WFOYKso2A6NEQkRqTe
$ truss push
INFO Serializing Stable Diffusion 2.1 truss.
INFO Making contact with Baseten 👋 👽
INFO 🚀 Uploading model to Baseten 🚀
Upload progress: 0% | | 0.00G/2.39G