---
datasets:
- oscar-corpus/OSCAR-2301
language:
- it
tags:
- ipt-125m
---

# IPT-125m (WIP)

IPT-125m is a decoder-style transformer pretrained from scratch on 4.36 billion tokens of Italian text from the [OSCAR-2301](https://huggingface.co/datasets/oscar-corpus/OSCAR-2301) dataset.

If you like this project, consider supporting me with a cup of coffee!

[![Buy me a coffee](https://badgen.net/badge/icon/Buy%20Me%20A%20Coffee?icon=buymeacoffee&label)](https://bmc.link/edoardofederici)

## How to Use

This model is best used with the Hugging Face `transformers` library for training and fine-tuning. Loading it requires `trust_remote_code=True`, since the architecture is implemented in custom code shipped with the model repository.

```python
from transformers import AutoModelForCausalLM, AutoTokenizer

# trust_remote_code=True is needed because the model architecture is defined
# in custom code hosted alongside the weights
model = AutoModelForCausalLM.from_pretrained("efederici/ipt-125m", trust_remote_code=True)
tokenizer = AutoTokenizer.from_pretrained("efederici/ipt-125m")
```
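
As a quick sanity check, the snippet below sketches one way to generate a short continuation with the model and tokenizer loaded above, assuming the custom model class supports `generate` like other causal language models. The Italian prompt and the sampling settings are only illustrative.

```python
import torch

# Illustrative prompt; any Italian text will do
prompt = "L'Italia è un paese"
inputs = tokenizer(prompt, return_tensors="pt")

with torch.no_grad():
    output_ids = model.generate(
        **inputs,
        max_new_tokens=50,   # length of the continuation
        do_sample=True,      # sample instead of greedy decoding
        top_p=0.95,
        temperature=0.8,
    )

print(tokenizer.decode(output_ids[0], skip_special_tokens=True))
```

Greedy decoding (`do_sample=False`) also works; sampling simply tends to produce more varied text from a small model like this one.
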
## Model Description

The architecture is a modification of a standard decoder-only transformer.

| Hyperparameter  | Value |
|-----------------|-------|
| n_parameters    | 125M  |
| n_layers        | 12    |
| n_heads         | 12    |
| d_model         | 768   |
| vocab size      | 50432 |
| sequence length | 2048  |
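
To double-check these values against the released checkpoint, you can print its configuration object. The exact field names come from the custom configuration class, so the sketch below simply dumps the whole config rather than assuming specific attribute names.

```python
from transformers import AutoConfig

# Load and print the full configuration shipped with the checkpoint
config = AutoConfig.from_pretrained("efederici/ipt-125m", trust_remote_code=True)
print(config)
```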