agentscope-ai / Trinity-RFT

Trinity-RFT is a general-purpose, flexible and scalable framework designed for reinforcement fine-tuning (RFT) of large language models (LLMs).

License: Apache License 2.0

Chinese Homepage | Tutorial | FAQ

Trinity-RFT

Trinity-RFT: A General-Purpose and Unified Framework for
Reinforcement Fine-Tuning of Large Language Models


💡 What is Trinity-RFT?

Trinity-RFT is a general-purpose, flexible and user-friendly framework for LLM reinforcement fine-tuning (RFT). It decouples RFT into three components that work in coordination:

  • Explorer generates experience data via agent-environment interaction;
  • Trainer updates model weights by minimizing losses on the data;
  • Buffer pipelines data processing throughout the RFT lifecycle.
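The division of labor above can be pictured as a producer/consumer loop. The sketch below is a single-process toy under stated assumptions: every class and method name is hypothetical and does not mirror Trinity-RFT's real API, and random choices stand in for LLM generation and environment rewards.

```python
from __future__ import annotations

import random
from collections import deque
from dataclasses import dataclass


@dataclass
class Experience:
    task: str
    response: str
    reward: float


class Buffer:
    """FIFO queue that decouples the producer (explorer) from the consumer (trainer)."""

    def __init__(self) -> None:
        self._queue: deque = deque()

    def write(self, items: list) -> None:
        self._queue.extend(items)

    def read(self, batch_size: int) -> list:
        n = min(batch_size, len(self._queue))
        return [self._queue.popleft() for _ in range(n)]


class Explorer:
    """Interacts with a toy environment and records scored experiences (hypothetical)."""

    def __init__(self, seed: int = 0) -> None:
        self._rng = random.Random(seed)

    def explore(self, tasks: list[str]) -> list[Experience]:
        experiences = []
        for task in tasks:
            response = self._rng.choice(["correct", "wrong"])  # stand-in for LLM generation
            reward = 1.0 if response == "correct" else 0.0    # stand-in for environment feedback
            experiences.append(Experience(task, response, reward))
        return experiences


class Trainer:
    """Consumes batches; a real trainer would compute a loss and update model weights."""

    def __init__(self) -> None:
        self.steps = 0

    def train_step(self, batch: list[Experience]) -> float:
        self.steps += 1
        return sum(e.reward for e in batch) / len(batch)  # placeholder metric: mean reward


buffer, explorer, trainer = Buffer(), Explorer(seed=42), Trainer()
for step in range(1, 4):
    buffer.write(explorer.explore([f"task-{i}" for i in range(4)]))
    metric = trainer.train_step(buffer.read(batch_size=4))
    print(f"step {step}: mean reward {metric:.2f}")
```

Routing all data through the buffer is what allows the explorer and trainer to run at different paces; in this toy they simply alternate.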
Trinity-RFT provides functionality for users with different backgrounds and objectives:

  • 🤖 Agent application developers: Train LLM-powered agents and improve their capabilities in specific domains [[tutorial]](https://agentscope-ai.github.io/Trinity-RFT/en/main/tutorial/develop_workflow.html)
  • 🧠 Reinforcement learning researchers: Design, implement and validate new RL algorithms using compact, plug-and-play modules that allow non-invasive customization [[tutorial]](https://agentscope-ai.github.io/Trinity-RFT/en/main/tutorial/develop_algorithm.html)
  • 📊 Data engineers: Create RFT datasets and build data pipelines for cleaning, augmentation, and human-in-the-loop scenarios [[tutorial]](https://agentscope-ai.github.io/Trinity-RFT/en/main/tutorial/develop_operator.html)

🚀 News

  • [2026-02] [[Release Notes]](https://github.com/agentscope-ai/Trinity-RFT/releases/tag/v0.5.1) Trinity-RFT v0.5.1 released: Enhanced VLM support, logging improvements, bug fixes.
  • [2026-02] [[Release Notes]](https://github.com/agentscope-ai/Trinity-RFT/releases/tag/v0.5.0) Trinity-RFT v0.5.0 released: colocate mode for single-GPU scenarios, trainer-driven weight synchronization, automatic suggestions for parallelism settings, and more.
  • [2026-01] 🎉 Three papers accepted by ICLR 2026: CHORD, BOTS, and Group-relative REINFORCE variants. Try out these new algorithms in Trinity-RFT!
  • [2026-01] [[Release Notes]](https://github.com/agentscope-ai/Trinity-RFT/releases/tag/v0.4.1) Trinity-RFT v0.4.1 released: upgraded verl to v0.7.0, OpenAI API support for the Tinker backend, bug fixes.
  • [2026-01] Introducing R3L: a systematic reflect-then-retry RL mechanism with efficient language-guided exploration and stable off-policy learning (paper).
  • [2025-12] [[Release Notes]](https://github.com/agentscope-ai/Trinity-RFT/releases/tag/v0.4.0) Trinity-RFT v0.4.0 released: added Tinker backend for users without GPUs, more benchmarks, enhanced online RL, and more.
  • [2025-12] Trinity-RFT powers the medical and health business of "Taobao Shangou", enabling the AI agent to understand vague symptoms, proactively ask follow-up questions, and provide precise recommendations (News).
  • [2025-11] Introducing Learn-to-Ask: a framework for training proactive dialogue agents from offline expert data (paper).
  • [2025-11] Introducing BOTS: online RL task selection for efficient LLM fine-tuning (paper).
  • [2025-09] Our paper reveals a novel off-policy interpretation for group-relative REINFORCE and its variants like GRPO and AsymRE (implementation).
  • [2025-08] Introducing CHORD: dynamic SFT + RL integration for advanced LLM fine-tuning (paper).
More...
    • [2025-11] Trinity-RFT v0.3.3 released: bug fixes.
    • [2025-11] Trinity-RFT v0.3.2 released: bug fixes and advanced task selection & scheduling.
    • [2025-10] Trinity-RFT v0.3.1 released: multi-stage training support, improved agentic RL examples, LoRA support, debug mode and new RL algorithms.
    • [2025-09] Trinity-RFT v0.3.0 released: enhanced Buffer, FSDP2 & Megatron support, multi-modal models, and new RL algorithms/examples.
    • [2025-08] Trinity-RFT v0.2.1 released.
    • [2025-07] Trinity-RFT v0.2.0 released.
    • [2025-07] Technical report (arXiv v2) updated with new features, examples, and experiments: link.
    • [2025-06] Trinity-RFT v0.1.1 released.
    • [2025-04] Trinity-RFT open sourced.

🔨 Tutorials and Guidelines

| Category | Tutorial / Guideline |
| --- | --- |
| Run diverse RFT modes | Quick start: GRPO on GSM8k |
| | Off-policy RFT |
| | Fully asynchronous RFT |
| | Offline learning by DPO or SFT |
| | RFT without local GPU (Tinker Backend) |
| Multi-step agentic RL | Concatenated multi-turn workflow |
| | General multi-step workflow |
| | ReAct workflow with an agent framework |
| | Example: train a web-search agent |
| Full-lifecycle data pipelines | Rollout task mixing and selection |
| | Online task curriculum (📝 paper) |
| | Research project: learn-to-ask (📝 paper) |
| | Experience replay with prioritization |
| | Advanced data processing & human-in-the-loop |
| Algorithm development | RL algorithm development with Trinity-RFT (📝 paper) |
| | Research project: R3L (reflect-then-retry RL) (📝 paper) |
| | Research project: group-relative REINFORCE (📝 paper) |
| | Non-verifiable domains: RULER, trainable RULER, rubric-as-reward |
| Benchmarks | Benchmark toolkit (quick verification & experimentation) |
| | Guru-Math benchmark & comparison with veRL |
| | FrozenLake benchmark & comparison with rLLM |
| | Alfworld benchmark & comparison with rLLM |
| Going deeper into Trinity-RFT | Full configurations |
| | GPU resource and training configuration guide |
| | Training VLM |
| | Understand the coordination between explorer and trainer |
| | How to align configuration with veRL |
> [!TIP]
> Recommended learning paths:
> - 🆕 New users: Installation → Quick Start (GSM8K) → Configuration Guide → GPU Resource Guide
> - 🔬 Algorithm researchers: Developer Guide → Algorithm Development Guide → CHORD Algorithm Example
> - 🤖 Agent developers: Developer Guide → Workflow Development → General Multi-step Workflow Example

> [!NOTE]
> For more tutorials, please refer to the Trinity-RFT documentation.

🌟 Key Features

  • Flexible RFT Modes:
    • Supports synchronous/asynchronous, on-policy/off-policy, and online/offline RL.
    • Rollout and training can run separately and scale independently across devices.
    • Boosts sample and time efficiency via experience replay.
RFT modes supported by Trinity-RFT
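Experience replay reuses stored experiences across training steps instead of discarding them after one update. The sketch below shows one simple prioritization scheme under stated assumptions: each experience carries a single priority score (for example the magnitude of its advantage), sampling takes the top-k priorities, and reused items have their priority decayed. It is not Trinity-RFT's buffer implementation, and all names are hypothetical.

```python
from __future__ import annotations

import heapq
import itertools


class PrioritizedReplay:
    """Hypothetical top-k prioritized replay with capacity-based eviction."""

    def __init__(self, capacity: int, decay: float = 0.5) -> None:
        self.capacity = capacity
        self.decay = decay
        self._items: dict[int, tuple[float, object]] = {}  # id -> (priority, experience)
        self._ids = itertools.count()

    def add(self, experience: object, priority: float) -> None:
        self._items[next(self._ids)] = (priority, experience)
        if len(self._items) > self.capacity:
            # Evict the lowest-priority experience to stay within capacity.
            worst = min(self._items, key=lambda i: self._items[i][0])
            del self._items[worst]

    def sample(self, k: int) -> list[object]:
        top = heapq.nlargest(k, self._items, key=lambda i: self._items[i][0])
        batch = []
        for i in top:
            priority, experience = self._items[i]
            batch.append(experience)
            self._items[i] = (priority * self.decay, experience)  # reuse lowers priority
        return batch

    def __len__(self) -> int:
        return len(self._items)


replay = PrioritizedReplay(capacity=3)
for name, prio in [("a", 0.1), ("b", 0.9), ("c", 0.5), ("d", 0.7)]:
    replay.add(name, prio)
print(len(replay))       # → 3  ("a", the lowest priority, was evicted)
print(replay.sample(2))  # → ['b', 'd']
print(replay.sample(2))  # → ['c', 'b']  (b decayed to 0.45, d to 0.35)
```

Decaying priorities after reuse keeps a few high-value experiences from dominating every batch; proportional (stochastic) sampling is a common alternative to the deterministic top-k used here.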

  • Agentic RL Support:
    • Supports both concatenated and general multi-step agentic workflows.
    • Can directly train agent applications developed with agent frameworks such as AgentScope.
Agentic workflows

  • Full-Lifecycle Data Pipelines:
    • Enables pipeline processing of rollout tasks and experience samples.
    • Active data management (prioritization, cleaning, augmentation, etc.) throughout the RFT lifecycle.
    • Native support for multi-task joint learning and online task curriculum construction.
Data pipeline design
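One generic way to build an online task curriculum is to track each task's recent pass rate and favor tasks the model solves only some of the time. The motivation: with group-relative advantages, a task whose rollouts are all correct or all wrong yields zero advantage and therefore no learning signal. The sketch below implements that heuristic; it is an illustrative assumption, not the BOTS algorithm or Trinity-RFT's task scheduler, and all names are hypothetical.

```python
from __future__ import annotations


class CurriculumScheduler:
    """Hypothetical scheduler: favors tasks with an estimated pass rate near 50%."""

    def __init__(self, task_ids: list[str], prior: float = 0.5, momentum: float = 0.8) -> None:
        self.success = {t: prior for t in task_ids}  # running pass-rate estimate per task
        self.momentum = momentum

    def update(self, task_id: str, pass_rate: float) -> None:
        """Blend the latest observed pass rate into the estimate (exponential moving average)."""
        old = self.success[task_id]
        self.success[task_id] = self.momentum * old + (1 - self.momentum) * pass_rate

    def select(self, k: int) -> list[str]:
        """Return up to k tasks whose estimated pass rate is closest to 0.5."""
        return sorted(self.success, key=lambda t: abs(self.success[t] - 0.5))[:k]


sched = CurriculumScheduler(["easy", "medium", "hard"], momentum=0.0)  # momentum 0: trust latest batch
sched.update("easy", 1.0)    # every rollout correct
sched.update("medium", 0.5)  # mixed outcomes
sched.update("hard", 0.0)    # every rollout wrong
print(sched.select(1))       # → ['medium']
```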

  • User-Friendly Design:
    • Plug-and-play modules and a decoupled architecture that make adoption and development easy.
    • Rich graphical user interfaces enable low-code usage.
System architecture

🔧 Supported Algorithms

| Algorithm | Doc / Example | Source Code | Key Configurations |
| --- | --- | --- | --- |
| PPO [Paper] | [Doc] [Countdown Example] | [Code] | `algorithm_type: ppo` |
| GRPO [Paper] | [Doc] [GSM8K Example] | [Code] | `algorithm_type: grpo` |
| SFT | [Mixture-of-Thoughts Example] | [Code] | `algorithm_type: sft` |
| DPO [Paper] | [HumanLike Example] | [Code] | `algorithm_type: dpo` |
| CHORD 💡 [Paper] | [Doc] [ToolACE Example] | [Code] | `algorithm_type: mix_chord` |
| REC Series 💡 [Paper] | [GSM8K Example] | [Code] | `algorithm_type: rec` |
| RLOO [Paper] | - | [Code] | `algorithm_type: rloo` |
| REINFORCE++ [Paper] | - | [Code] | `algorithm_type: reinforceplusplus` |
| GSPO [Paper] | - | [Code] | `algorithm_type: gspo` |
| TOPR [Paper] | [GSM8K Example] | [Code] | `algorithm_type: topr` |
| sPPO [Paper] | [GSM8K Example] | [Code] | `algorithm_type: sppo` |
| AsymRE [Paper] | [GSM8K Example] | [Code] | `algorithm_type: asymre` |
| CISPO [Paper] | - | [Code] | `algorithm_type: cispo` |
| SAPO [Paper] | - | [Code] | `algorithm_type: sapo` |
| On-Policy Distillation [Blog] [Paper] | [GSM8K Example] | [Code] | `algorithm_type: on_policy_distill` |
| JSD (Jensen-Shannon Divergence) | [GSM8K Example] | [Code] | `algorithm_type: jsd` |
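Several algorithms in the table, such as GRPO and RLOO, compute a rollout's advantage relative to the other rollouts sampled for the same prompt. The sketch below shows the two textbook baselines in plain Python; it is not Trinity-RFT's implementation, and details such as population versus sample standard deviation and the epsilon value vary across implementations, so treat them as assumptions.

```python
import statistics


def grpo_advantages(rewards: list, eps: float = 1e-6) -> list:
    """GRPO-style: normalize each reward by the group mean and standard deviation."""
    mean = statistics.fmean(rewards)
    std = statistics.pstdev(rewards)  # population std; some implementations use sample std
    return [(r - mean) / (std + eps) for r in rewards]


def rloo_advantages(rewards: list) -> list:
    """RLOO-style: subtract the mean reward of the *other* rollouts (leave-one-out)."""
    n, total = len(rewards), sum(rewards)
    if n < 2:
        raise ValueError("leave-one-out needs at least two rollouts per group")
    return [r - (total - r) / (n - 1) for r in rewards]


rewards = [1.0, 0.0, 1.0, 0.0]  # e.g. 4 answers to one GSM8K question, 1 = correct
print([round(a, 3) for a in grpo_advantages(rewards)])  # → [1.0, -1.0, 1.0, -1.0]
print([round(a, 3) for a in rloo_advantages(rewards)])  # → [0.667, -0.667, 0.667, -0.667]
print(grpo_advantages([1.0, 1.0, 1.0]))                 # → [0.0, 0.0, 0.0]
```

Leave-one-out advantages always sum to zero within a group, while the GRPO form additionally rescales by the group's spread.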

Quick Start

> [!NOTE]
> This project is currently under active development. Comments and suggestions are welcome!

Minimal CPU-Only Quick Start

If you do not have access to a GPU, you can still try Trinity-RFT using the Tinker backend.

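```bash
git clone https://github.com/agentscope-ai/Trinity-RFT
cd Trinity-RFT

# Create and activate environment
python3.10 -m venv .venv
source .venv/bin/activate

# Install Trinity-RFT with the Tinker backend (no local GPU required)
pip install -e ".[tinker]"
```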

Run a simple example:

```bash
trinity run --config examples/tinker/tinker.yaml
```

This example is designed to run on CPU-only machines. See the complete Tinker training example for more details.

To run Trinity-RFT on GPU machines instead, please follow the steps below.

Step 1: Installation

Before installing, make sure your system meets the following requirements:

GPU Requirements

  • Python: version 3.10 to 3.12 (inclusive)
  • CUDA: version >= 12.8
  • GPUs: At least one NVIDIA GPU with compute capability 8.0 or higher (e.g., RTX 30 series, A100, H100)
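A quick way to check a machine against the requirements above is a small pre-flight script. This sketch is not shipped with Trinity-RFT; it uses PyTorch, if installed, to query CUDA and the GPUs. Note that `torch.version.cuda` reports the CUDA version PyTorch was built against, which can differ from the system toolkit reported by `nvcc --version`.

```python
from __future__ import annotations

import sys


def python_ok(version: tuple[int, int]) -> bool:
    """Python 3.10 to 3.12, inclusive."""
    return (3, 10) <= tuple(version) <= (3, 12)


def cuda_ok(cuda_version: str | None) -> bool:
    """CUDA >= 12.8; PyTorch reports strings such as '12.8' (None on CPU-only builds)."""
    if not cuda_version:
        return False
    major, minor, *_ = [int(p) for p in cuda_version.split(".")] + [0]
    return (major, minor) >= (12, 8)


def gpu_ok(capability: tuple[int, int]) -> bool:
    """Compute capability 8.0 or higher (e.g. A100 is 8.0, RTX 30 series 8.6, H100 9.0)."""
    return tuple(capability) >= (8, 0)


if __name__ == "__main__":
    print("Python 3.10-3.12:", python_ok(sys.version_info[:2]))
    try:
        import torch
    except ImportError:
        print("PyTorch not installed; skipping CUDA and GPU checks")
    else:
        print("CUDA >= 12.8 (PyTorch build):", cuda_ok(torch.version.cuda))
        if torch.cuda.device_count() == 0:
            print("No CUDA-visible GPU found")
        for i in range(torch.cuda.device_count()):
            name = torch.cuda.get_device_name(i)
            print(f"GPU {i} ({name}) capability >= 8.0:", gpu_ok(torch.cuda.get_device_capability(i)))
```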
Recommended for first-time users:

  • If you have no GPU → Use Tinker backend
  • If you want simple setup → Use Docker
  • If you want development & contribution → Use Conda / venv

From Source (Recommended)

If you plan to customize or contribute to Trinity-RFT, this is the best option.

First, clone the repository:

```bash
git clone https://github.com/agentscope-ai/Trinity-RFT
cd Trinity-RFT
```

Then, set up the environment using one of the following options:

Using Pre-built Docker Image (Recommended for Beginners)

```bash
docker pull ghcr.io/agentscope-ai/trinity-rft:latest

# Run the container, replacing <path_to_your_data_and_checkpoints> with your actual path
docker run -it \
  --gpus all \
  --shm-size="64g" \
  --rm \
  -v $PWD:/workspace \
  -v <path_to_your_data_and_checkpoints>:/data \
  ghcr.io/agentscope-ai/trinity-rft:latest
```

This image uses uv to install all GPU-related dependencies of Trinity-RFT. The virtual environment is activated automatically when you enter the container (you can also activate it manually with `source /opt/venv/bin/activate`). Use `uv pip install` to add extra packages as needed.

Using Conda

```bash
conda create -n trinity python=3.12
conda activate trinity

pip install -e ".[vllm,flash_attn]"

# If you have no GPU, comment out the line above and uncomment this instead:
# pip install -e ".[tinker]"

# If you encounter issues when installing flash-attn, try:
# pip install flash-attn==2.8.1 --no-build-isolation

pip install -e ".[dev]"  # development extras, e.g. linting and debugging
```

Using venv

```bash
python3.10 -m venv .venv
source .venv/bin/activate

pip install -e ".[vllm,flash_attn]"

# If you have no GPU, comment out the line above and uncomment this instead:
# pip install -e ".[tinker]"

# If you encounter issues when installing flash-attn, try:
# pip install flash-attn==2.8.1 --no-build-isolation

pip install -e ".[dev]"  # development extras, e.g. linting and debugging
```

Using uv

```bash
uv sync --extra vllm --extra dev --extra flash_attn

# If you have no GPU, use Tinker instead:
# uv sync --extra tinker --extra dev
```

Via PyPI

If you just want to use the package without modifying the code:

```bash
pip install trinity-rft
pip install flash-attn==2.8.1
```

Or with uv:

```bash
uv pip install trinity-rft
uv pip install flash-attn==2.8.1
```
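After installing from PyPI, you can confirm which versions ended up in the active environment. This sketch relies only on the standard library's `importlib.metadata`; the distribution names are the ones used in the install commands above, and the helper name is hypothetical.

```python
from __future__ import annotations

from importlib.metadata import PackageNotFoundError, version


def installed_versions(dists: list[str]) -> dict[str, str | None]:
    """Map each distribution name to its installed version, or None if missing."""
    result: dict[str, str | None] = {}
    for name in dists:
        try:
            result[name] = version(name)
        except PackageNotFoundError:
            result[name] = None  # not installed in the active environment
    return result


if __name__ == "__main__":
    for name, ver in installed_versions(["trinity-rft", "flash-attn"]).items():
        print(f"{name}: {ver or 'NOT INSTALLED'}")
```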

For training with Megatron-LM, please refer to Megatron-LM Backend.

Step 2: Prepare the dataset and model

Trinity-RFT supports most datasets and models from Hugging Face and ModelScope.

Prepare the model in the local directory $MODEL_PATH/{model_name}:

```bash
# Using Hugging Face
huggingface-cli download {model_name} --local-dir $MODEL_PATH/{model_name}
# Using ModelScope
modelscope download {model_name} --local_dir $MODEL_PATH/{model_name}
```

For more details about model downloading, see Hugging Face or ModelScope.

Prepare the dataset in the local directory $DATASET_PATH/{dataset_name}:

```bash
# Using Hugging Face
huggingface-cli download {dataset_name} --repo-type dataset --local-dir $DATASET_PATH/{dataset_name}
# Using ModelScope
modelscope download --dataset {dataset_name} --local_dir $DATASET_PATH/{dataset_name}
```

For more details about dataset downloading, see Hugging Face or ModelScope.
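If you prefer to script downloads in Python, `huggingface_hub.snapshot_download` covers both models and datasets. This sketch assumes `huggingface_hub` is installed and reproduces the `$MODEL_PATH/{model_name}` and `$DATASET_PATH/{dataset_name}` layout above; the helper names are hypothetical, and the repository ids in the usage comments are only examples.

```python
from __future__ import annotations

import os


def local_dir(root_env: str, repo_id: str) -> str:
    """Mirror $MODEL_PATH/{model_name} or $DATASET_PATH/{dataset_name}; falls back to '.' if unset."""
    return os.path.join(os.environ.get(root_env, "."), repo_id)


def download(repo_id: str, root_env: str, repo_type: str = "model") -> str:
    """Download a Hugging Face repo snapshot and return the local directory."""
    from huggingface_hub import snapshot_download  # lazy import: only needed to download

    return snapshot_download(repo_id=repo_id, repo_type=repo_type, local_dir=local_dir(root_env, repo_id))


# Usage (network access required; repo ids are examples):
# download("Qwen/Qwen2.5-1.5B-Instruct", "MODEL_PATH")
# download("openai/gsm8k", "DATASET_PATH", repo_type="dataset")
```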

Step 3: Configuration

Trinity-RFT provides a web interface for configuring your RFT process.

> [!NOTE]
> This is an experimental feature, and we will continue to improve it.

To launch the web interface for minimal configuration, run:

```bash
trinity studio --port 8080
```

Then you can configure your RFT process in the web page and generate a config file. You can save the config file for later use or run it directly as described in the following section.

Advanced users can also edit the config file directly. Example config files are provided in the examples directory.

For complete GUI features, please refer to the monorepo for Trinity-Studio.

Example: config manager GUI


Step 4: Run the RFT process

Start a Ray cluster:

```bash
# On master node
ray start --head
# On worker nodes
ray start --address=<master_address>
```
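Before launching, it can help to confirm that every worker actually joined (`ray status` gives a similar summary from the shell). The sketch below attaches to the running cluster with Ray's public `ray.init(address="auto")` and `ray.cluster_resources()`; the `summarize` and `check_cluster` helpers and the GPU threshold are hypothetical conveniences, not part of Trinity-RFT.

```python
from __future__ import annotations


def summarize(resources: dict[str, float]) -> dict[str, int]:
    """Keep cluster-wide CPU/GPU totals; ignore keys such as 'memory' or 'node:<ip>'."""
    return {key: int(resources.get(key, 0)) for key in ("CPU", "GPU")}


def check_cluster(min_gpus: int = 1) -> dict[str, int]:
    import ray  # lazy import so summarize() works without Ray installed

    ray.init(address="auto")  # attach to the cluster started with `ray start --head`
    totals = summarize(ray.cluster_resources())
    if totals["GPU"] < min_gpus:
        raise RuntimeError(f"expected >= {min_gpus} GPU(s) in the cluster, found {totals['GPU']}")
    return totals


# Usage, on any node of the cluster:
# print(check_cluster(min_gpus=1))
```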

(Optional) You may use Weights & Biases (wandb), TensorBoard, or MLflow for better monitoring. Please refer to this documentation for the corresponding configurations. For example, to log in to wandb:

```bash
export WANDB_API_KEY=<your_api_key>
wandb login
```

For command-line users, run the RFT process:

```bash
trinity run --config <config_path>
```

For example, to fine-tune Qwen2.5-1.5B-Instruct on GSM8k with GRPO:

```bash
trinity run --config examples/grpo_gsm8k/gsm8k.yaml
```
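The same command can be driven from Python, for example to run several configs back to back. This sketch only wraps the documented `trinity run --config <config_path>` command with `subprocess`; the helper names are hypothetical, and it assumes the `trinity` executable is on your `PATH` and the Ray cluster is already running.

```python
from __future__ import annotations

import subprocess
from pathlib import Path


def trinity_command(config: str | Path) -> list[str]:
    """Build the argv for `trinity run --config <config_path>`."""
    return ["trinity", "run", "--config", str(Path(config))]


def run_configs(configs: list) -> dict[str, int]:
    """Run each config in turn and return {config: exit code}, skipping missing files."""
    results: dict[str, int] = {}
    for config in configs:
        if not Path(config).is_file():
            print(f"skipping {config}: file not found")
            continue
        # stdout/stderr are inherited, so training logs stream to this terminal.
        results[str(config)] = subprocess.run(trinity_command(config), check=False).returncode
    return results


# Usage, from the repository root:
# run_configs(["examples/grpo_gsm8k/gsm8k.yaml"])
```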

For studio users, click "Run" in the web interface.


Contribution Guide

This project is currently under active development. Star the repo and watch releases for the latest updates!

We welcome all kinds of contributions from the community, including:

  • Documentation improvements
  • Example workflows, algorithms, and data pipelines
  • Bug fixes and performance optimizations
If you're new to the project, documentation and example updates are a great place to start.

See CONTRIBUTING.md for detailed contribution guidelines, as well as our good-first-issue list.

Acknowledgements

This project is built upon many excellent open-source projects, including:

Citation

```bibtex
@misc{trinity-rft,
      title={Trinity-RFT: A General-Purpose and Unified Framework for Reinforcement Fine-Tuning of Large Language Models},
      author={Xuchen Pan and Yanxi Chen and Yushuo Chen and Yuchang Sun and Daoyuan Chen and Wenhao Zhang and Yuexiang Xie and Yilun Huang and Yilei Zhang and Dawei Gao and Yaliang Li and Bolin Ding and Jingren Zhou},
      year={2025},
      eprint={2505.17826},
      archivePrefix={arXiv},
      primaryClass={cs.LG},
      url={https://arxiv.org/abs/2505.17826},
}
```