---
language:
- en
- hi
- mr
- gu
- ta
- ml
license: llama2
tags:
- multilingual
- instruction-tuning
- llama2
---
# RomanSetu
This model was trained as part of the paper [RomanSetu: Efficiently unlocking multilingual capabilities of Large Language Models via Romanization](https://arxiv.org/abs/2401.14280).
The codebase used to train and evaluate this model can be found at [https://github.com/AI4Bharat/romansetu](https://github.com/AI4Bharat/romansetu).

## Usage

```python
from transformers import AutoTokenizer, AutoModelForCausalLM

# Load the tokenizer and model from the Hugging Face Hub
model_path = "ai4bharat/romansetu-cpt-native-400m"
tokenizer = AutoTokenizer.from_pretrained(model_path)
model = AutoModelForCausalLM.from_pretrained(model_path)
```
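
Once loaded, the model can be used like any other causal language model in `transformers`. The following is a minimal generation sketch; the prompt and decoding parameters are illustrative assumptions, not taken from the model card.

```python
# Minimal generation sketch (prompt and decoding parameters are
# illustrative assumptions, not prescribed by the model card).
inputs = tokenizer("Bharat ek vishal desh hai", return_tensors="pt")
outputs = model.generate(**inputs, max_new_tokens=50, do_sample=False)
print(tokenizer.decode(outputs[0], skip_special_tokens=True))
```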