Trinity-RFT is a general-purpose, flexible and scalable framework designed for reinforcement fine-tuning (RFT) of large language models (LLMs).
https://github.com/agentscope-ai/Trinity-RFT.git
Chinese Homepage | Tutorial | FAQ
Trinity-RFT decouples RFT into three components that work in coordination:
> [!TIP]
> **Recommended Learning Paths**
> πŸ†• New users: Installation β†’ Quick Start (GSM8K) β†’ Configuration Guide β†’ GPU Resource Guide
> πŸ”¬ Algorithm researchers: Developer Guide β†’ Algorithm Development Guide β†’ CHORD Algorithm Example
> πŸ€– Agent developers: Developer Guide β†’ Workflow Development β†’ General Multi-step Workflow Example
> [!NOTE]
> For more tutorials, please refer to the Trinity-RFT documentation.
| Algorithm | Doc / Example | Source Code | Key Configurations |
|---|---|---|---|
| PPO [Paper] | [Doc] [Countdown Example] | [Code] | algorithm_type: ppo |
| GRPO [Paper] | [Doc] [GSM8K Example] | [Code] | algorithm_type: grpo |
| SFT | [Mixture-of-Thoughts Example] | [Code] | algorithm_type: sft |
| DPO [Paper] | [HumanLike Example] | [Code] | algorithm_type: dpo |
| CHORD πŸ’‘ [Paper] | [Doc] [ToolACE Example] | [Code] | algorithm_type: mix_chord |
| REC Series πŸ’‘ [Paper] | [GSM8K Example] | [Code] | algorithm_type: rec |
| RLOO [Paper] | - | [Code] | algorithm_type: rloo |
| REINFORCE++ [Paper] | - | [Code] | algorithm_type: reinforceplusplus |
| GSPO [Paper] | - | [Code] | algorithm_type: gspo |
| TOPR [Paper] | [GSM8K Example] | [Code] | algorithm_type: topr |
| sPPO [Paper] | [GSM8K Example] | [Code] | algorithm_type: sppo |
| AsymRE [Paper] | [GSM8K Example] | [Code] | algorithm_type: asymre |
| CISPO [Paper] | - | [Code] | algorithm_type: cispo |
| SAPO [Paper] | - | [Code] | algorithm_type: sapo |
| On-Policy Distillation [Blog] [Paper] | [GSM8K Example] | [Code] | algorithm_type: on_policy_distill |
| JSD (Jensen-Shannon Divergence) | [GSM8K Example] | [Code] | algorithm_type: jsd |
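The "Key Configurations" column refers to the algorithm selector in a run config file. A minimal sketch of how it is typically set (the exact field nesting here is an assumption; consult the YAML files under `examples/` for complete, authoritative configs):

```yaml
# Hedged sketch: selecting an algorithm in a Trinity-RFT config.
# Field nesting may differ from your version; see examples/ for real configs.
algorithm:
  algorithm_type: grpo   # any algorithm_type value from the table above
```

Switching algorithms is then a one-line change, with the rest of the config (model, data, rollout settings) left intact.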
> [!NOTE]
> This project is currently under active development. Comments and suggestions are welcome!
If you do not have access to a GPU, you can still try Trinity-RFT using the Tinker backend.
```bash
# Create and activate environment
python3.10 -m venv .venv
source .venv/bin/activate

# Install Trinity-RFT with the CPU-only Tinker backend
# (run from inside a clone of the Trinity-RFT repository)
pip install -e ".[tinker]"
```
Run a simple example:

```bash
trinity run --config examples/tinker/tinker.yaml
```
This example is designed to run on CPU-only machines. See the complete Tinker training example for more details.
To run Trinity-RFT on GPU machines instead, please follow the steps below.
Before installing, make sure your system meets the following requirements:
If you plan to customize or contribute to Trinity-RFT, this is the best option.
First, clone the repository:

```bash
git clone https://github.com/agentscope-ai/Trinity-RFT
cd Trinity-RFT
```
Then, set up environment via one of the following options:
Using Pre-built Docker Image (Recommended for Beginners)
```bash
docker pull ghcr.io/agentscope-ai/trinity-rft:latest

# Run the container, replacing <path_to_your_data_and_checkpoints> with your actual path
docker run -it \
  --gpus all \
  --shm-size="64g" \
  --rm \
  -v $PWD:/workspace \
  -v <path_to_your_data_and_checkpoints>:/data \
  ghcr.io/agentscope-ai/trinity-rft:latest
```
This image uses `uv` to install all GPU-related dependencies of Trinity-RFT. The virtual environment is activated automatically when you enter the container (you can also activate it manually via `source /opt/venv/bin/activate` if needed). Use `uv pip install` to add extra packages as necessary.
Using Conda
```bash
conda create -n trinity python=3.12
conda activate trinity

pip install -e ".[vllm,flash_attn]"
# If you have no GPU, comment out the line above and uncomment this instead:
# pip install -e ".[tinker]"

# If you encounter issues when installing flash-attn, try:
# pip install flash-attn==2.8.1 --no-build-isolation

pip install -e ".[dev]"  # for development tasks such as linting and debugging
```
Using venv
```bash
python3.10 -m venv .venv
source .venv/bin/activate

pip install -e ".[vllm,flash_attn]"
# If you have no GPU, comment out the line above and uncomment this instead:
# pip install -e ".[tinker]"

# If you encounter issues when installing flash-attn, try:
# pip install flash-attn==2.8.1 --no-build-isolation

pip install -e ".[dev]"  # for development tasks such as linting and debugging
```
Using uv
```bash
uv sync --extra vllm --extra dev --extra flash_attn
# If you have no GPU, use Tinker instead:
# uv sync --extra tinker --extra dev
```
If you just want to use the package without modifying the code:
```bash
pip install trinity-rft
pip install flash-attn==2.8.1
```
Or with uv:
```bash
uv pip install trinity-rft
uv pip install flash-attn==2.8.1
```
For training with Megatron-LM, please refer to Megatron-LM Backend.
Trinity-RFT supports most datasets and models from Hugging Face and ModelScope.
Prepare the model in the local directory $MODEL_PATH/{model_name}:
```bash
# Using Hugging Face
huggingface-cli download {model_name} --local-dir $MODEL_PATH/{model_name}

# Using ModelScope
modelscope download {model_name} --local_dir $MODEL_PATH/{model_name}
```
For more details about model downloading, see Hugging Face or ModelScope.
Prepare the dataset in the local directory $DATASET_PATH/{dataset_name}:
```bash
# Using Hugging Face
huggingface-cli download {dataset_name} --repo-type dataset --local-dir $DATASET_PATH/{dataset_name}

# Using ModelScope
modelscope download --dataset {dataset_name} --local_dir $DATASET_PATH/{dataset_name}
```
For more details about dataset downloading, see Hugging Face or ModelScope.
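As a concrete illustration, the templates above can be resolved for the model and dataset used by the GSM8K quick-start later in this README (the storage roots below are assumptions; the leading `echo`s make this a dry run, so remove them to actually download):

```shell
# Illustrative dry run: resolved download commands for the quick-start
# model (Qwen2.5-1.5B-Instruct) and dataset (GSM8K).
# Remove the leading `echo` on each command to actually download.
MODEL_PATH=/data/models        # assumption: your local model storage root
DATASET_PATH=/data/datasets    # assumption: your local dataset storage root

echo huggingface-cli download Qwen/Qwen2.5-1.5B-Instruct --local-dir "$MODEL_PATH/Qwen2.5-1.5B-Instruct"
echo huggingface-cli download openai/gsm8k --repo-type dataset --local-dir "$DATASET_PATH/gsm8k"
```

The resulting local directories are what you later point to from the `model` and `data` entries of your run config.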
Trinity-RFT provides a web interface for configuring your RFT process.
> [!NOTE]
> This is an experimental feature, and we will continue to improve it.
To launch the web interface with a minimal configuration, run:

```bash
trinity studio --port 8080
```
Then you can configure your RFT process in the web page and generate a config file. You can save the config file for later use or run it directly as described in the following section.
Advanced users can also edit the config file directly.
We provide example config files in examples.
For complete GUI features, please refer to the monorepo for Trinity-Studio.
Start a Ray cluster:

```bash
# On the master node
ray start --head

# On worker nodes
ray start --address=<master_address>
```
(Optional) You may use Wandb / TensorBoard / MLFlow for better monitoring. Please refer to this documentation for the corresponding configurations. For example, to log in to Wandb:
```bash
export WANDB_API_KEY=<your_api_key>
wandb login
```
For command-line users, run the RFT process with:

```bash
trinity run --config <config_path>
```
For example, to fine-tune Qwen2.5-1.5B-Instruct on GSM8K with GRPO:

```bash
trinity run --config examples/grpo_gsm8k/gsm8k.yaml
```
For studio users, click "Run" in the web interface.
This project is currently under active development; star the repo and watch releases for the latest updates!
We welcome all kinds of contributions from the community.
See CONTRIBUTING.md for detailed contribution guidelines, as well as our good-first-issue list.
This project is built upon many excellent open-source projects. If you find Trinity-RFT helpful for your research, please cite:
```bibtex
@misc{trinity-rft,
      title={Trinity-RFT: A General-Purpose and Unified Framework for Reinforcement Fine-Tuning of Large Language Models},
      author={Xuchen Pan and Yanxi Chen and Yushuo Chen and Yuchang Sun and Daoyuan Chen and Wenhao Zhang and Yuexiang Xie and Yilun Huang and Yilei Zhang and Dawei Gao and Yaliang Li and Bolin Ding and Jingren Zhou},
      year={2025},
      eprint={2505.17826},
      archivePrefix={arXiv},
      primaryClass={cs.LG},
      url={https://arxiv.org/abs/2505.17826},
}
```