POST /chat/completions
curl --request POST \
  --url https://http.llm.proxy.prod.s9t.link/chat/completions \
  --header 'Authorization: Bearer <token>' \
  --header 'Content-Type: application/json' \
  --header 'id: <id>' \
  --data '{
  "model": "llama3_1",
  "messages": [
    {
      "role": "system",
      "content": "Act as a helpful assistant to the user"
    },
    {
      "role": "user",
      "content": "Give me a travel itinerary for Japan"
    }
  ],
  "stream": false,
  "temperature": 0.7,
  "max_tokens": 1024,
  "top_p": 0.95,
  "stop": null
}'
{
  "id": "<string>",
  "choices": [
    {
      "delta": {
        "content": "<string>"
      },
      "index": 123,
      "finish_reason": "stop"
    }
  ]
}
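
The example response above nests the generated text under choices[0].delta.content. A minimal sketch of reading it in Python, assuming exactly the shape shown (the literal values below are placeholders, not real API output):

# Mirrors the example response shape shown above; values are placeholders.
response_json = {
    "id": "cmpl-example",
    "choices": [
        {
            "delta": {"content": "Day 1: Tokyo ..."},
            "index": 0,
            "finish_reason": "stop",
        }
    ],
}

first_choice = response_json["choices"][0]
if first_choice.get("finish_reason") == "stop":
    print(first_choice["delta"]["content"])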

Authorizations

Authorization
string
header
required

JWT token for authentication

Headers

id
string
default:f49b2e20-fef3-4441-9358-897f946b8ae2
required

Model UUID for the request (e.g., f49b2e20-fef3-4441-9358-897f946b8ae2 for Llama 8B)
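
Taken together, every call to this endpoint carries a bearer JWT plus the model UUID in the id header. A minimal Python sketch of the header set, assuming a placeholder token (only the header names and the example UUID come from this page):

# Required headers: bearer JWT, JSON content type, and the model UUID.
# "<token>" is a placeholder; the UUID is the documented default (Llama 8B).
headers = {
    "Authorization": "Bearer <token>",
    "Content-Type": "application/json",
    "id": "f49b2e20-fef3-4441-9358-897f946b8ae2",
}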

Body

application/json

messages
object[]
required

Array of messages in the conversation

model
enum<string>
default:llama3_1
required

Model identifier

Available options:
llama3_1

stream
boolean
default:false

Whether to stream the response

temperature
number
default:0.7

Sampling temperature

Required range: 0 <= x <= 2

max_tokens
integer
default:1024

Maximum number of tokens to generate

Required range: 1 <= x <= 4096

top_p
number
default:0.95

Nucleus sampling parameter

Required range: 0 <= x <= 1

stop
string[] | null

Sequences where the API will stop generating
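
Together, the body fields above form the JSON payload sent to the endpoint. For reference, here is the curl example at the top of this page expressed as a minimal Python sketch; everything beyond the documented URL, field names, and defaults (the requests dependency, the timeout, the placeholder token) is an assumption:

import requests

headers = {
    "Authorization": "Bearer <token>",  # JWT, placeholder value
    "Content-Type": "application/json",
    "id": "f49b2e20-fef3-4441-9358-897f946b8ae2",  # model UUID (Llama 8B default)
}

payload = {
    "model": "llama3_1",
    "messages": [
        {"role": "system", "content": "Act as a helpful assistant to the user"},
        {"role": "user", "content": "Give me a travel itinerary for Japan"},
    ],
    "stream": False,     # set True to receive incremental chunks
    "temperature": 0.7,  # 0 <= x <= 2
    "max_tokens": 1024,  # 1 <= x <= 4096
    "top_p": 0.95,       # 0 <= x <= 1
    "stop": None,        # or a list of stop sequences
}

resp = requests.post(
    "https://http.llm.proxy.prod.s9t.link/chat/completions",
    headers=headers,
    json=payload,
    timeout=60,
)
resp.raise_for_status()
print(resp.json()["choices"][0])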

Response

200
application/json
Successful chat completion

id
string

Unique identifier for the completion

choices
object[]