
AReaL: Ant Reasoning RL

Github Repo: https://github.com/inclusionAI/AReaL

A fully open-source and inclusive RL project for large reasoning models

AReaL (Ant Reasoning RL) is an open-source, efficient reinforcement learning system developed at the RL Lab, Ant Research. AReaL inherits and adapts the open-source project ReaLHF for training Large Reasoning Models (LRMs) that everyone can reproduce and contribute to. AReaL is part of Ant Research's effort to develop tools and systems for a fully open and inclusive AGI world.

AReaL Highlights

  • 🛠️ Open & Reproducible: We will continuously release all code, datasets, and training recipes for training LRMs --- no hidden secrets or proprietary barriers.
  • 🚀 Scalable Performance: AReaL can seamlessly adapt to different computational resource settings, ranging from a single node to hundreds of GPUs.
  • 🌍 Community-Driven AGI: With a fully open-source commitment, we hope our efforts can benefit the entire community to accelerate AGI research.

Training a 1.5B LRM from the Distilled Model

Our experiments are conducted on 16 nodes, each equipped with 8 H800 GPUs. The results, along with the associated training curves, are presented below.

Figure 1. Training rewards and response lengths during RL training. The base model is DeepSeek-R1-Distill-Qwen-1.5B. Curves are smoothed with a moving-average window of 25.

We follow DeepScaleR to iteratively increase the output context length. The context length starts from 8K and is increased to 16K and 24K in the subsequent training process. The training reward continuously increases during RL training. We observe that the response length first shrinks in the 8K training stage, and then increases in the 16K and 24K training stages.
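The staged context-length schedule described above (8K → 16K → 24K) can be sketched as a simple lookup from training step to output-length budget. This is an illustrative sketch only; the stage names and step boundaries are hypothetical, not the actual AReaL training configuration.

```python
# Hypothetical staged context-length schedule, following the
# DeepScaleR-style curriculum described in the text: 8K -> 16K -> 24K.
# Stage boundaries (steps_per_stage) are illustrative placeholders.
STAGES = [8 * 1024, 16 * 1024, 24 * 1024]

def context_length_for(step: int, steps_per_stage: int = 1000) -> int:
    """Return the max output context budget for a given training step."""
    idx = min(step // steps_per_stage, len(STAGES) - 1)
    return STAGES[idx]

print(context_length_for(0))     # → 8192
print(context_length_for(2500))  # → 24576 (clamped to the final stage)
```

Beyond the final stage boundary, the schedule simply stays at the largest budget, matching the description of continuing RL training at 24K.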

These three stages progressively enhanced the performance of the base model, as demonstrated below:

| Model | MATH500 | AIME 2024 | AMC 2023 |
|---|---|---|---|
| o1-Preview | 81.4 | 40.0 | - |
| DeepSeek-R1-Distill-Qwen-1.5B | 82.8 | 28.8 | 62.9 |
| DeepScaleR (Official) | 87.8 | 43.1 | 73.6 |
| AReaL Stage 1: 8K (Ours) | 85.7 | 33.2 | 74.7 |
| AReaL Stage 2: 16K (Ours) | 87.4 | 34.2 | 79.6 |
| AReaL Stage 3: 24K (Ours) | 88.0 | 40.2 | 81.2 |

Table 1. Evaluation on a series of competition-level mathematics benchmarks, including AIME 2024, AMC 2023, and MATH-500. Results are reported as Pass@1 accuracy, averaged over 32 samples per problem and evaluated with a sampling temperature of 0.6.
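The Pass@1 metric used in Table 1 reduces to averaging per-sample correctness: for each problem, take the fraction of the 32 samples that are correct, then average across problems. A minimal sketch, assuming a per-problem list of boolean correctness judgments (the input format here is illustrative, not the actual evaluation harness):

```python
# Estimate Pass@1 as described in Table 1: average correctness over
# n sampled completions per problem, then average across problems.
from statistics import mean

def pass_at_1(per_problem_correctness: list[list[bool]]) -> float:
    """Mean over problems of (correct samples / total samples)."""
    return mean(
        sum(samples) / len(samples) for samples in per_problem_correctness
    )

# Toy example: 2 problems, 4 samples each (real eval uses 32 samples).
scores = [[True, True, False, True], [False, False, True, False]]
print(pass_at_1(scores))  # → 0.5  (mean of 0.75 and 0.25)
```

With 32 samples per problem this estimator has much lower variance than a single greedy decode, which is why the card reports the averaged figure.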
