Create a chat completion for the given messages, with streaming support
JWT token for authentication
Model UUID for the request (e.g., f49b2e20-fef3-4441-9358-897f946b8ae2 for Llama 70B)
Array of messages in the conversation
Model identifier (e.g., llama3_3)
Whether to stream the response
Sampling temperature (0 <= x <= 2)
Maximum number of tokens to generate (1 <= x <= 4096)
Nucleus sampling parameter (0 <= x <= 1)
Sequences where the API will stop generating
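A minimal sketch of assembling a request body for this endpoint, enforcing the documented parameter ranges. The endpoint URL, exact field names, and the helper itself are assumptions for illustration; adapt them to the actual API.

```python
import json

# Hypothetical endpoint -- substitute the real base URL and model UUID.
API_URL = "https://api.example.com/chat/completions/{model_uuid}"


def build_chat_request(messages, model="llama3_3", stream=False,
                       temperature=1.0, max_tokens=1024, top_p=1.0,
                       stop=None):
    """Assemble the JSON body, validating the documented ranges."""
    if not 0 <= temperature <= 2:
        raise ValueError("temperature must be in [0, 2]")
    if not 1 <= max_tokens <= 4096:
        raise ValueError("max_tokens must be in [1, 4096]")
    if not 0 <= top_p <= 1:
        raise ValueError("top_p must be in [0, 1]")
    body = {
        "messages": messages,       # array of conversation messages
        "model": model,             # model identifier
        "stream": stream,           # whether to stream the response
        "temperature": temperature, # sampling temperature
        "max_tokens": max_tokens,   # generation limit
        "top_p": top_p,             # nucleus sampling parameter
    }
    if stop is not None:
        body["stop"] = stop         # stop sequences
    return body


# The JWT goes in the Authorization header, not the body.
headers = {"Authorization": "Bearer <JWT>"}
payload = build_chat_request([{"role": "user", "content": "Hello"}])
print(json.dumps(payload))
```

The helper only builds and validates the payload; sending it (and consuming a streamed response) is left to whatever HTTP client the surrounding code uses.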