---
license: apache-2.0
---

# Rodimus*

## Introduction

Rodimus* is a new series of efficient large language models designed to address the computational complexity challenges of Transformer-based architectures. The Rodimus* series includes the base Rodimus model and its enhanced version, Rodimus+. Rodimus leverages a novel Data-Dependent Tempered Selection (DDTS) mechanism within a purely recurrent, linear attention-based framework, achieving high performance. Building on this, Rodimus+ combines the strengths of Rodimus with the innovative Sliding Window Shared-Key Attention (SW-SKA) in a hybrid approach. This combination effectively integrates semantic, token, and head compression techniques, striking a balance between accuracy and efficiency.

For more details, please refer to our [Paper](https://openreview.net/forum?id=IIVYiJ1ggK) and [GitHub](https://github.com/codefuse-ai/rodimus).

> This repository contains the **latest checkpoint** of Rodimus+ 1.6B, trained on continuously updated data with a focus on code and math performance.

## Usage

We do not recommend using base language models directly for text generation. Instead, consider applying post-training techniques such as SFT, RLHF, or continued pretraining to enhance the model's performance.

**Installation**

1. The latest version of [transformers](https://github.com/huggingface/transformers) is recommended (at least 4.42.0).
2. We evaluate our models with `python=3.8` and `torch==2.1.2`.
3. If you use Rodimus, you need to install [flash-linear-attention](https://github.com/sustcsonglin/flash-linear-attention) and [triton>=2.2.0](https://github.com/triton-lang/triton). If you use Rodimus+, you also need to install [flash-attention](https://github.com/Dao-AILab/flash-attention).

## Generation

`generate` API

```python
import torch

from modeling_rodimus import RodimusForCausalLM
from tokenization_rodimus_fast import RodimusTokenizer

# load model and tokenizer
ckpt_dir = "model_path"
tokenizer = RodimusTokenizer.from_pretrained(ckpt_dir)
model = RodimusForCausalLM.from_pretrained(
    ckpt_dir,
    torch_dtype=torch.float16,
    device_map="cuda"
).eval()

# inference
input_prompt = "你好!你是谁?"  # "Hello! Who are you?"
model_inputs = tokenizer(input_prompt, return_tensors="pt").to(model.device)
outputs = model.generate(**model_inputs, max_length=32)

response = tokenizer.batch_decode(outputs, skip_special_tokens=True)[0]
print(response)
```

## Performance

**Code Tasks**: HumanEval (0-shot), MBPP (0-shot)

**Math Tasks**: GSM8K (4-shot), MATH (5-shot)

**NLP Tasks**: C-Eval (5-shot), CMMLU (5-shot), MMLU (5-shot), BBH (3-shot)

> Last updated: 2025/02/15

| Dataset | Rodimus+ 1.6B (20250215) |
| --- | :---: |
| HumanEval | 24.39 |
| MBPP | 26.60 |
| GSM8K | 50.19 |
| MATH | 15.06 |
| C-Eval | 47.19 |
| CMMLU | 43.76 |
| MMLU | 45.52 |
| BBH | 35.28 |

## Citation

If you find our work helpful, please consider citing us:

```bibtex
@inproceedings{he2025rodimus,
  title={Rodimus*: Breaking the Accuracy-Efficiency Trade-Off with Efficient Attentions},
  author={Zhihao He and Hang Yu and Zi Gong and Shizhan Liu and Jianguo Li and Weiyao Lin},
  booktitle={The Thirteenth International Conference on Learning Representations},
  year={2025},
  url={https://openreview.net/forum?id=IIVYiJ1ggK}
}
```
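
## Streaming Generation (Sketch)

As a minimal sketch beyond the `generate` example above, output can also be streamed token by token. This assumes `RodimusForCausalLM` follows the standard Hugging Face `GenerationMixin` interface (so `generate` accepts a `streamer` argument) and uses `transformers.TextStreamer`; `model_path` is a placeholder for your local checkpoint directory.

```python
import torch
from transformers import TextStreamer

from modeling_rodimus import RodimusForCausalLM
from tokenization_rodimus_fast import RodimusTokenizer

# load model and tokenizer (same setup as the example above)
ckpt_dir = "model_path"  # placeholder: path to the downloaded checkpoint
tokenizer = RodimusTokenizer.from_pretrained(ckpt_dir)
model = RodimusForCausalLM.from_pretrained(
    ckpt_dir,
    torch_dtype=torch.float16,
    device_map="cuda"
).eval()

# stream decoded tokens to stdout as they are produced,
# assuming generate() supports the standard `streamer` argument
streamer = TextStreamer(tokenizer, skip_prompt=True, skip_special_tokens=True)

prompt = "你好!你是谁?"  # "Hello! Who are you?"
inputs = tokenizer(prompt, return_tensors="pt").to(model.device)
_ = model.generate(**inputs, max_length=64, streamer=streamer)
```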