Documentation Index
Fetch the complete documentation index at: https://docs.simplismart.ai/llms.txt
Use this file to discover all available pages before exploring further.
Directory Structure
The directory should be archived in a
.zip file and stored in an object storage. Example zip command:cd path/to/dataset_dir && zip -r dataset_dir.zip ./*Dataset Structure
Thetrain.jsonl file should be a list of JSON-like objects, serialised in jsonl format, where each object has exactly three keys:
Example JSONL File
Field Definitions
| Key | Type | Description |
|---|---|---|
| prompt | List[Dict] | Chat-style prompt: a list of role-tagged messages. |
| answer | str | Ground-truth response. |
Prompt Format
Each prompt entry is a list of one or more messages. Minimal single-turn example:Message fields
role: usuallyuser(extend withassistantfor multi-turn data).content:"<question/text> + <control tags>"- Text message from the user or assistant.
Answer Format
The expected answer to the question. Example:Complete Example
Reward functions
Depending on the dataset structure and task objectives, you may need to define reward functions for model training. These reward functions are accepted by the trainer through a specialreward_models.py file. This sections outlines (with examples) the standard method for providing custom reward functions.
Users should define their reward functions in a file named reward_models.py, which must expose a list named reward_functions containing callable functions.
The reward functions list is then passed directly to GRPOTrainer like:
GRPOTrainer(reward_functions=reward_functions, **kwargs)Example implementation
formatting_reward_funcchecks that<think>…</think>and<answer>…</answer>appear exactly once.correctness_reward_funcvalidates the extracted answer matches the ground truth.