	---
	license: apache-2.0
	datasets:
	- nicholasKluge/Pt-Corpus-Instruct
	language:
	- pt
	metrics:
	- perplexity
	library_name: transformers
	pipeline_tag: text-generation
	tags:
	- text-generation-inference
	widget:
	- text: "A PUCRS é uma universidade "
	  example_title: Exemplo
	- text: "A muitos anos atrás, em uma galáxia muito distante, vivia uma raça de"
	  example_title: Exemplo
	- text: "Em meio a um escândalo, a frente parlamentar pediu ao Senador Silva para"
	  example_title: Exemplo
	inference:
	  parameters:
	    repetition_penalty: 1.2
	    temperature: 0.2
	    top_k: 20
	    top_p: 0.2
	    max_new_tokens: 150
	co2_eq_emissions:
	  emissions: 41.1
	  source: CodeCarbon
	  training_type: pre-training
	  geographical_location: Germany
	  hardware_used: NVIDIA A100-SXM4-40GB
	---
	# TeenyTinyLlama-460m-awq

	<img src="./logo.png" alt="A curious llama exploring a mushroom forest." height="200">

	## Model Summary

	Note: This model is a quantized version of [TeenyTinyLlama-460m](https://huggingface.co/nicholasKluge/TeenyTinyLlama-460m). Quantization was performed with [AutoAWQ](https://github.com/casper-hansen/AutoAWQ), making this version 80% lighter and 20% faster, with almost no loss in performance. A GPU is required to run the AWQ-quantized models.

	Given the lack of monolingual foundational models for non-English languages, and the fact that some of the most used and downloaded models in the community are those small enough for individual researchers and hobbyists to run in low-resource environments, we developed TeenyTinyLlama: _a pair of small foundational models trained in Brazilian Portuguese._

	TeenyTinyLlama is a compact language model based on the Llama 2 architecture ([TinyLlama implementation](https://huggingface.co/TinyLlama)). This model is designed to deliver efficient natural language processing capabilities while being resource-conscious. These models were trained by leveraging [scaling laws](https://arxiv.org/abs/2203.15556) to determine the optimal number of tokens per parameter while incorporating [preference pre-training](https://arxiv.org/abs/2112.00861).
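
To make the tokens-per-parameter idea concrete, the sketch below recomputes the ratio from figures reported later in this card (parameter count, number of training samples, context length, and epochs). It assumes that every training sample is one full 2048-token sequence, which is consistent with the 6.2B tokens reported for the dataset but is not stated explicitly. The variable names are illustrative.

```python
# Figures copied from the "Details" and "Training Set-up" sections of this card
n_parameters = 468_239_360
n_training_samples = 3_033_690
context_length = 2048
n_epochs = 1.5

# Assumption: each training sample is a full context-length sequence
tokens_per_epoch = n_training_samples * context_length
tokens_seen = tokens_per_epoch * n_epochs
tokens_per_parameter = tokens_seen / n_parameters

print(f"Tokens per epoch:        {tokens_per_epoch / 1e9:.2f}B")
print(f"Tokens seen in training: {tokens_seen / 1e9:.2f}B")
print(f"Tokens per parameter:    {tokens_per_parameter:.1f}")
```

Under these assumptions, the ratio lands close to the figure of roughly 20 tokens per parameter that is often quoted as a summary of the linked scaling-laws paper.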


	## Details

	- Architecture: a Transformer-based model pre-trained via causal language modeling
	- Size: 468,239,360 parameters
	- Context length: 2048 tokens
	- Dataset: [Pt-Corpus Instruct](https://huggingface.co/datasets/nicholasKluge/Pt-Corpus-Instruct) (6.2B tokens)
	- Language: Portuguese
	- Number of steps: 1,200,000
	- GPU: 1 NVIDIA A100-SXM4-40GB
	- Training time: ~ 280 hours
	- Emissions: 41.1 KgCO2 (Germany)
	- Total energy consumption: 115.69 kWh
	- Quantization Configuration:
	  - `bits`: 4
	  - `group_size`: 128
	  - `quant_method`: "awq"
	  - `version`: "gemm"
	  - `zero_point`: True
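
The `bits`, `group_size`, and `zero_point` settings above describe group-wise, asymmetric 4-bit quantization. The sketch below illustrates only that rounding scheme on a single group of 128 made-up weights in plain Python. It is not the AWQ algorithm itself (which additionally rescales weights using activation statistics), and it does not use the AutoAWQ API; the function names are hypothetical.

```python
import random


def quantize_group(weights, bits=4):
    """Asymmetric (zero-point) quantization of one group of weights."""
    qmax = 2 ** bits - 1                          # 15 for 4-bit integers
    w_min, w_max = min(weights), max(weights)
    scale = (w_max - w_min) / qmax or 1.0         # fall back to 1.0 for a constant group
    # Integer offset that maps w_min close to 0, clamped so it fits in `bits` bits
    zero_point = min(qmax, max(0, round(-w_min / scale)))
    quantized = [min(qmax, max(0, round(w / scale) + zero_point)) for w in weights]
    return quantized, scale, zero_point


def dequantize_group(quantized, scale, zero_point):
    """Map the stored integers back to approximate floating-point weights."""
    return [(q - zero_point) * scale for q in quantized]


random.seed(0)
group_size = 128
weights = [random.gauss(0.0, 0.02) for _ in range(group_size)]  # toy weights, not model data

quantized, scale, zero_point = quantize_group(weights, bits=4)
restored = dequantize_group(quantized, scale, zero_point)
max_error = max(abs(w - r) for w, r in zip(weights, restored))

print(f"scale={scale:.6f} zero_point={zero_point}")
print(f"max reconstruction error={max_error:.6f} (half a step is {scale / 2:.6f})")
```

Each group stores one scale and one zero point alongside its 4-bit integers, so smaller groups track the weight distribution more closely at the cost of extra metadata.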

	This repository has the [source code](https://github.com/Nkluge-correa/TeenyTinyLlama) used to train this model. The main libraries used are:

	- [Transformers](https://github.com/huggingface/transformers)
	- [PyTorch](https://github.com/pytorch/pytorch)
	- [Datasets](https://github.com/huggingface/datasets)
	- [Tokenizers](https://github.com/huggingface/tokenizers)
	- [Sentencepiece](https://github.com/google/sentencepiece)
	- [Accelerate](https://github.com/huggingface/accelerate)
	- [Codecarbon](https://github.com/mlco2/codecarbon)
	- [AutoAWQ](https://github.com/casper-hansen/AutoAWQ)

	Check out the training logs in [Weights and Biases](https://api.wandb.ai/links/nkluge-correa/vws4g032).

	## Training Set-up

	These are the main arguments used in the training of this model:

	| Argument | Value |
	|-------------------------------|--------------------------------------|
	| vocabulary size | 32000 |
	| hidden dimension size | 1024 |
	| intermediate dimension size | 4096 |
	| context length | 2048 |
	| number of attention heads | 16 |
	| number of hidden layers | 24 |
	| number of key value heads | 16 |
	| number of training samples | 3033690 |
	| number of validation samples | 30000 |
	| number of epochs | 1.5 |
	| evaluation steps | 100000 |
	| train batch size | 2 |
	| eval batch size | 4 |
	| gradient accumulation steps | 2 |
	| optimizer | torch.optim.AdamW |
	| learning rate | 0.0003 |
	| adam epsilon | 0.00000001 |
	| weight decay | 0.01 |
	| scheduler type | "cosine" |
	| warmup steps | 10000 |
	| gradient checkpointing | false |
	| seed | 42 |
	| mixed precision | 'no' |
	| torch dtype | "float32" |
	| tf32 | true |

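
As a sanity check on the architecture arguments above, the parameter count can be recomputed by hand. The sketch below assumes the standard Llama layout: untied input and output embeddings, bias-free linear layers, a gated MLP with three projections, and two RMSNorm layers per block plus a final one. These assumptions are not spelled out in this card, but under them the total matches the 468,239,360 parameters listed in the Details section.

```python
# Values copied from the training arguments table
vocab_size = 32_000
hidden_size = 1024
intermediate_size = 4096
n_layers = 24

# Assumption: untied token embedding and output (LM head) matrices
embedding = vocab_size * hidden_size
lm_head = vocab_size * hidden_size

# Attention: query, key, value, and output projections, no biases.
# Key/value heads equal attention heads (16), so K and V are full width.
attention = 4 * hidden_size * hidden_size

# Gated MLP: gate, up, and down projections, no biases
mlp = 3 * hidden_size * intermediate_size

# Two RMSNorm weight vectors per block
norms = 2 * hidden_size

per_layer = attention + mlp + norms
final_norm = hidden_size

total_parameters = embedding + n_layers * per_layer + final_norm + lm_head
print(f"{total_parameters:,}")
```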
	## Intended Uses

	The primary intended use of TeenyTinyLlama is to research the behavior, functionality, and limitations of large language models. Checkpoints saved during training are intended to provide a controlled setting for performing scientific experiments. You may also further fine-tune and adapt TeenyTinyLlama-460m for deployment, as long as your use is in accordance with the Apache 2.0 license. If you decide to use pre-trained TeenyTinyLlama-460m as a basis for your fine-tuned model, please conduct your own risk and bias assessment.

	## Basic Usage

	Note: Using the quantized models requires installing `autoawq==0.1.7`. A GPU is required to run the AWQ-quantized models.

	Using the `pipeline`:

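	```python
	# Install the dependency first (notebook syntax): !pip install autoawq==0.1.7 -q

	from transformers import pipeline

	# Load the quantized model into a text-generation pipeline (requires a GPU)
	generator = pipeline("text-generation", model="nicholasKluge/TeenyTinyLlama-460m-awq")

	# Generate two completions of up to 100 new tokens each
	completions = generator("Astronomia é a ciência", num_return_sequences=2, max_new_tokens=100)

	for comp in completions:
	    print(f"🤖 {comp['generated_text']}")
	```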

	Using the `AutoTokenizer` and `AutoModelForCausalLM`:

	```python
	# Install the dependency first (notebook syntax): !pip install autoawq==0.1.7 -q

	from transformers import AutoTokenizer, AutoModelForCausalLM
	import torch

	# Load the model and the tokenizer
	tokenizer = AutoTokenizer.from_pretrained("nicholasKluge/TeenyTinyLlama-460m-awq", revision='main')
	model = AutoModelForCausalLM.from_pretrained("nicholasKluge/TeenyTinyLlama-460m-awq", revision='main')

	# Pass the model to your device (the AWQ-quantized model needs a GPU)
	device = torch.device("cuda" if torch.cuda.is_available() else "cpu")

	model.eval()
	model.to(device)

	# Tokenize the inputs and pass them to the device
	inputs = tokenizer("Astronomia é a ciência", return_tensors="pt").to(device)

	# Generate some text
	completions = model.generate(**inputs, num_return_sequences=2, max_new_tokens=100)

	# Print the generated text
	for completion in completions:
	    print(f'🤖 {tokenizer.decode(completion)}')
	```

	## Limitations

	- Hallucinations: This model can produce content that can be mistaken for truth but is, in fact, misleading or entirely false, i.e., hallucination.

	- Biases and Toxicity: This model inherits the social and historical stereotypes from the data used to train it. Given these biases, the model can produce toxic content, i.e., harmful, offensive, or detrimental to individuals, groups, or communities.

	- Unreliable Code: The model may produce incorrect code snippets and statements. These code generations should not be treated as suggestions or accurate solutions.

	- Language Limitations: The model is primarily designed to understand standard Portuguese (BR). Other languages might challenge its comprehension, leading to potential misinterpretations or errors in response.

	- Repetition and Verbosity: The model may get stuck in repetition loops (especially if the repetition penalty used during generation is set to a low value) or produce verbose responses unrelated to the prompt it was given.
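
The repetition penalty mentioned in the last item (set to 1.2 in this card's inference parameters) can be illustrated with a small, framework-free sketch. It follows the commonly used rule of dividing the non-negative logits, and multiplying the negative logits, of already-generated tokens by the penalty. The function name and the toy logits are made up for illustration, and real decoding libraries apply this to tensors rather than Python lists.

```python
def apply_repetition_penalty(logits, generated_ids, penalty=1.2):
    """Return a copy of `logits` in which already-generated tokens are less likely."""
    adjusted = list(logits)
    for token_id in set(generated_ids):
        score = adjusted[token_id]
        # Negative scores are pushed further down, non-negative scores are shrunk;
        # both changes lower the token's probability after softmax.
        adjusted[token_id] = score * penalty if score < 0 else score / penalty
    return adjusted


# Toy vocabulary of four tokens; tokens 0 and 1 were already generated
logits = [2.4, -1.0, 0.6, 3.0]
adjusted = apply_repetition_penalty(logits, generated_ids=[0, 1, 0], penalty=1.2)

for token_id, (before, after) in enumerate(zip(logits, adjusted)):
    print(f"token {token_id}: {before:+.2f} -> {after:+.2f}")
```

A penalty of 1.0 leaves the logits unchanged, which is why low values make repetition loops more likely.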

	## Evaluations

	| Steps | Evaluation Loss | Perplexity | Total Energy Consumption | Emissions |
	|-----------|-----------------|------------|--------------------------|---------------|
	| 100,000 | 3.02 | 20.49 | 9.40 kWh | 3.34 KgCO2eq |
	| 200,000 | 2.82 | 16.90 | 18.82 kWh | 6.70 KgCO2eq |
	| 300,000 | 2.73 | 15.43 | 28.59 kWh | 10.16 KgCO2eq |
	| 400,000 | 2.68 | 14.64 | 38.20 kWh | 13.57 KgCO2eq |
	| 500,000 | 2.64 | 14.08 | 48.04 kWh | 17.07 KgCO2eq |
	| 600,000 | 2.61 | 13.61 | 57.74 kWh | 20.52 KgCO2eq |
	| 700,000 | 2.58 | 13.25 | 67.32 kWh | 23.92 KgCO2eq |
	| 800,000 | 2.55 | 12.87 | 76.84 kWh | 27.30 KgCO2eq |
	| 900,000 | 2.53 | 12.57 | 86.40 kWh | 30.70 KgCO2eq |
	| 1,000,000 | 2.50 | 12.27 | 96.19 kWh | 34.18 KgCO2eq |
	| 1,100,000 | 2.48 | 11.96 | 106.06 kWh | 37.70 KgCO2eq |
	| 1,200,000 | 2.46 | 11.77 | 115.69 kWh | 41.11 KgCO2eq |

	- Note: Each evaluation consumed around 0.26 kWh of energy (~0.09 KgCO2eq), totaling 3.12 kWh (~1.11 KgCO2eq).
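
The loss and perplexity columns are related through the usual definition of perplexity as the exponential of the mean cross-entropy loss. The sketch below, which copies the pairs from the table, checks that relationship. Because the losses are shown rounded to two decimals (and the exact averaging used during evaluation is not documented here), the recomputed values only approximate the reported ones.

```python
import math

# (evaluation loss, reported perplexity) pairs copied from the table above
rows = [
    (3.02, 20.49), (2.82, 16.90), (2.73, 15.43), (2.68, 14.64),
    (2.64, 14.08), (2.61, 13.61), (2.58, 13.25), (2.55, 12.87),
    (2.53, 12.57), (2.50, 12.27), (2.48, 11.96), (2.46, 11.77),
]

for loss, reported in rows:
    recomputed = math.exp(loss)  # perplexity = exp(cross-entropy loss)
    print(f"loss={loss:.2f} reported={reported:.2f} exp(loss)={recomputed:.2f}")

# Largest relative difference between reported and recomputed perplexity
max_relative_gap = max(abs(math.exp(loss) - ppl) / ppl for loss, ppl in rows)
print(f"largest relative gap: {max_relative_gap:.2%}")
```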

	## Benchmarks

	Evaluations on benchmarks were performed using the [Language Model Evaluation Harness](https://github.com/EleutherAI/lm-evaluation-harness) (by [EleutherAI](https://www.eleuther.ai/)). Thanks to [Laiviet](https://github.com/laiviet/lm-evaluation-harness) for translating some of the tasks in the LM-Evaluation-Harness. The results of models marked with an "*" were extracted from the [Open LLM Leaderboard](https://huggingface.co/spaces/HuggingFaceH4/open_llm_leaderboard).

	| Models | Average | [ARC](https://arxiv.org/abs/1803.05457) | [Hellaswag](https://arxiv.org/abs/1905.07830) | [MMLU](https://arxiv.org/abs/2009.03300) | [TruthfulQA](https://arxiv.org/abs/2109.07958) |
	|--------|---------|-------|-----------|------|------------|
	| [TeenyTinyLlama-460m](https://huggingface.co/nicholasKluge/TeenyTinyLlama-460m) | 33.01 | 29.40 | 33.00 | 28.55 | 41.10 |
	| [Bloom-560m](https://huggingface.co/bigscience/bloom-560m) | 32.13 | 24.74* | 37.15* | 24.22* | 42.44* |
	| [Xglm-564M](https://huggingface.co/facebook/xglm-564M) | 31.97 | 25.56 | 34.64* | 25.18* | 42.53 |
	| [OPT-350m](https://huggingface.co/facebook/opt-350m) | 31.78 | 23.55* | 36.73* | 26.02* | 40.83* |
	| [TeenyTinyLlama-160m](https://huggingface.co/nicholasKluge/TeenyTinyLlama-160m) | 31.16 | 26.15 | 29.29 | 28.11 | 41.12 |
	| [Pythia-160m](https://huggingface.co/EleutherAI/pythia-160m-deduped) | 31.16 | 24.06* | 31.39* | 24.86* | 44.34* |
	| [OPT-125m](https://huggingface.co/facebook/opt-125m) | 30.80 | 22.87 | 31.47 | 26.02 | 42.87 |
	| [Gpt2-small-portuguese](https://huggingface.co/pierreguillou/gpt2-small-portuguese) | 30.22 | 22.48* | 29.62* | 27.36* | 41.44* |
	| [Gpt2-small](https://huggingface.co/gpt2) | 29.97 | 21.48* | 31.60* | 25.79* | 40.65* |
	| [Multilingual GPT](https://huggingface.co/ai-forever/mGPT) | 29.45 | 24.79 | 26.37* | 25.17* | 41.50 |

	## Fine-Tuning Comparisons

	| Models | Average | [IMDB](https://huggingface.co/datasets/christykoh/imdb_pt) | [FaQuAD-NLI](https://huggingface.co/datasets/ruanchaves/faquad-nli) | [HateBr](https://huggingface.co/datasets/ruanchaves/hatebr) | [Assin2](https://huggingface.co/datasets/assin2) | [AgNews](https://huggingface.co/datasets/maritaca-ai/ag_news_pt) |
	|--------|---------|------|------------|--------|--------|--------|
	| [Bert-large-portuguese-cased](https://huggingface.co/neuralmind/bert-large-portuguese-cased) | 92.09 | 93.58 | 92.26 | 91.57 | 88.97 | 94.11 |
	| [Bert-base-portuguese-cased](https://huggingface.co/neuralmind/bert-base-portuguese-cased) | 91.64 | 92.22 | 93.07 | 91.28 | 87.45 | 94.19 |
	| [TeenyTinyLlama-460m](https://huggingface.co/nicholasKluge/TeenyTinyLlama-460m) | 91.19 | 91.64 | 91.18 | 92.28 | 86.43 | 94.42 |
	| [TeenyTinyLlama-160m](https://huggingface.co/nicholasKluge/TeenyTinyLlama-160m) | 90.33 | 91.14 | 90.00 | 90.71 | 85.78 | 94.05 |
	| [Gpt2-small-portuguese](https://huggingface.co/pierreguillou/gpt2-small-portuguese) | 89.13 | 91.60 | 86.46 | 87.42 | 86.11 | 94.07 |

	## Cite as 🤗

	```bibtex
	@misc{nicholas22llama,
	  doi = {10.5281/zenodo.6989727},
	  url = {https://huggingface.co/nicholasKluge/TeenyTinyLlama-460m},
	  author = {Nicholas Kluge Corrêa},
	  title = {TeenyTinyLlama},
	  year = {2023},
	  publisher = {HuggingFace},
	  journal = {HuggingFace repository},
	}
	```

	## Funding

	This repository was built as part of the RAIES ([Rede de Inteligência Artificial Ética e Segura](https://www.raies.org/)) initiative, a project supported by FAPERGS ([Fundação de Amparo à Pesquisa do Estado do Rio Grande do Sul](https://fapergs.rs.gov.br/inicial)), Brazil.

	## License

	TeenyTinyLlama-460m is licensed under the Apache License, Version 2.0. See the [LICENSE](LICENSE) file for more details.