---
datasets:
- oscar-corpus/OSCAR-2301
language:
- it
tags:
- ipt-125m
---

# IPT-125m (WIP)

IPT-125m is a decoder-style transformer pretrained from scratch on 4.36 billion tokens of Italian text from the [OSCAR-2301](https://huggingface.co/datasets/oscar-corpus/OSCAR-2301) dataset.

If you like this project, consider supporting me with a cup of coffee!

[![Buy me a coffee](https://badgen.net/badge/icon/Buy%20Me%20A%20Coffee?icon=buymeacoffee&label)](https://bmc.link/edoardofederici)

## How to Use

This model is best used with the Hugging Face `transformers` library for training and fine-tuning. Loading it requires `trust_remote_code=True`, since the architecture is implemented in custom code shipped with the model repository.

```python
from transformers import AutoModelForCausalLM, AutoTokenizer

# trust_remote_code=True is needed because the model architecture is defined
# in custom code hosted alongside the weights
model = AutoModelForCausalLM.from_pretrained("efederici/ipt-125m", trust_remote_code=True)
tokenizer = AutoTokenizer.from_pretrained("efederici/ipt-125m")
```
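
As a quick sanity check, the snippet below sketches one way to generate a short continuation with the model and tokenizer loaded above, assuming the custom model class supports `generate` like other causal language models. The Italian prompt and the sampling settings are only illustrative.

```python
import torch

# Illustrative prompt; any Italian text will do
prompt = "L'Italia è un paese"
inputs = tokenizer(prompt, return_tensors="pt")

with torch.no_grad():
    output_ids = model.generate(
        **inputs,
        max_new_tokens=50,   # length of the continuation
        do_sample=True,      # sample instead of greedy decoding
        top_p=0.95,
        temperature=0.8,
    )

print(tokenizer.decode(output_ids[0], skip_special_tokens=True))
```

Greedy decoding (`do_sample=False`) also works; sampling simply tends to produce more varied text from a small model like this one.
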
## Model Description

The architecture is a modification of a standard decoder-only transformer.

| Hyperparameter  | Value |
|-----------------|-------|
| n_parameters    | 125M  |
| n_layers        | 12    |
| n_heads         | 12    |
| d_model         | 768   |
| vocab size      | 50432 |
| sequence length | 2048  |
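
To double-check these values against the released checkpoint, you can print its configuration object. The exact field names come from the custom configuration class, so the sketch below simply dumps the whole config rather than assuming specific attribute names.

```python
from transformers import AutoConfig

# Load and print the full configuration shipped with the checkpoint
config = AutoConfig.from_pretrained("efederici/ipt-125m", trust_remote_code=True)
print(config)
```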