license: apache-2.0
datasets:
  - nicholasKluge/Pt-Corpus-Instruct
language:
  - pt
metrics:
  - perplexity
library_name: transformers
pipeline_tag: text-generation
tags:
  - text-generation-inference
widget:
  - text: 'A PUCRS é uma universidade '
    example_title: Exemplo
  - text: A muitos anos atrás, em uma galáxia muito distante, vivia uma raça de
    example_title: Exemplo
  - text: Em meio a um escândalo, a frente parlamentar pediu ao Senador Silva para
    example_title: Exemplo
inference:
  parameters:
    repetition_penalty: 1.2
    temperature: 0.2
    top_k: 20
    top_p: 0.2
    max_new_tokens: 150
co2_eq_emissions:
  emissions: 41.1
  source: CodeCarbon
  training_type: pre-training
  geographical_location: Germany
  hardware_used: NVIDIA A100-SXM4-40GB

TeenyTinyLlama-460m-awq

A curious llama exploring a mushroom forest.

Model Summary

Note: This model is a quantized version of TeenyTinyLlama-460m. Quantization was performed with AutoAWQ, which makes this version 80% lighter and 20% faster with almost no loss in performance. A GPU is required to run AWQ-quantized models.

Monolingual foundational models for non-English languages are scarce, and some of the most used and downloaded models in the community are those small enough for individual researchers and hobbyists to run in low-resource environments. With both points in mind, we developed TeenyTinyLlama: a pair of small foundational models trained in Brazilian Portuguese.

Details

  • Architecture: a Transformer-based model pre-trained via causal language modeling
  • Size: 468,239,360 parameters
  • Context length: 2048 tokens
  • Dataset: Pt-Corpus Instruct (6.2B tokens)
  • Language: Portuguese
  • Number of steps: 1,200,000
  • GPU: 1 NVIDIA A100-SXM4-40GB
  • Training time: ~ 280 hours
  • Emissions: 41.1 KgCO2 (Germany)
  • Total energy consumption: 115.69 kWh
  • Quantization Configuration:
    • bits: 4
    • group_size: 128
    • quant_method: "awq"
    • version: "gemm"
    • zero_point: True
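
As a rough illustration of what this quantization configuration implies for storage, the sketch below estimates the effective number of bits kept per quantized weight. It assumes (the card does not state this) that every group of `group_size` weights shares one 16-bit scale and, because `zero_point` is enabled, one zero point stored at the same width as the weights. `effective_bits_per_weight` is a hypothetical helper written for this example, not part of AutoAWQ.

```python
def effective_bits_per_weight(bits: int, group_size: int, zero_point: bool,
                              scale_bits: int = 16) -> float:
    """Estimate storage per quantized weight, including per-group metadata.

    Assumption: each group shares one `scale_bits`-wide scale and, when
    `zero_point` is True, one zero point stored with `bits` bits.
    """
    overhead = scale_bits + (bits if zero_point else 0)
    return bits + overhead / group_size


# Values taken from the quantization configuration listed above
awq_bits = effective_bits_per_weight(bits=4, group_size=128, zero_point=True)
print(f"{awq_bits} bits per quantized weight")

# Relative size of a quantized layer compared with full-precision storage
for name, full_bits in (("float32", 32), ("float16", 16)):
    print(f"vs {name}: {awq_bits / full_bits:.1%} of the original size")
```

This ratio applies only to the layers that are actually quantized. Any layer kept in higher precision does not shrink, so the saving for the whole model is smaller than the per-layer figure.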

This repository has the source code used to train this model. The main libraries used are:

Check out the training logs in Weights and Biases.

Training Set-up

These are the main arguments used in the training of this model:

| Arguments | Value |
|---|---|
| vocabulary size | 32000 |
| hidden dimension size | 1024 |
| intermediate dimension size | 4096 |
| context length | 2048 |
| number of attention heads | 16 |
| number of hidden layers | 24 |
| number of key value heads | 16 |
| number of training samples | 3033690 |
| number of validation samples | 30000 |
| number of epochs | 1.5 |
| evaluation steps | 100000 |
| train batch size | 2 |
| eval batch size | 4 |
| gradient accumulation steps | 2 |
| optimizer | torch.optim.AdamW |
| learning rate | 0.0003 |
| adam epsilon | 0.00000001 |
| weight decay | 0.01 |
| scheduler type | "cosine" |
| warmup steps | 10000 |
| gradient checkpointing | false |
| seed | 42 |
| mixed precision | 'no' |
| torch dtype | "float32" |
| tf32 | true |
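
As a sanity check, the architecture values above can be turned into a parameter count. The sketch assumes a standard Llama-style decoder: bias-free attention and gated MLP projections, two normalization weight vectors per layer plus a final one, and an output head that is not tied to the input embeddings. None of these details are spelled out in this card, and `count_llama_parameters` is a hypothetical helper. The token estimate further assumes that every training sample is a full 2048-token sequence.

```python
def count_llama_parameters(vocab_size: int, hidden: int, intermediate: int,
                           n_layers: int, n_heads: int, n_kv_heads: int) -> int:
    """Count parameters of a Llama-style decoder (assumed layout, no biases)."""
    head_dim = hidden // n_heads
    kv_dim = head_dim * n_kv_heads
    attention = 2 * hidden * hidden + 2 * hidden * kv_dim  # q and o, then k and v
    mlp = 3 * hidden * intermediate                        # gate, up and down projections
    norms = 2 * hidden                                     # two norm vectors per layer
    per_layer = attention + mlp + norms
    embeddings = vocab_size * hidden                       # input embedding matrix
    lm_head = vocab_size * hidden                          # assumed untied output head
    final_norm = hidden
    return n_layers * per_layer + embeddings + lm_head + final_norm


total = count_llama_parameters(vocab_size=32000, hidden=1024, intermediate=4096,
                               n_layers=24, n_heads=16, n_kv_heads=16)
print(f"{total:,} parameters")

# Assumption: each of the 3,033,690 training samples holds 2048 tokens
tokens_per_epoch = 3_033_690 * 2048
print(f"{tokens_per_epoch / 1e9:.1f}B tokens per epoch")
```

Under these assumptions the count comes out to 468,239,360, the size listed under Details, and the token estimate is consistent with the 6.2B tokens reported for the dataset.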

Intended Uses

The primary intended use of TeenyTinyLlama is to research the behavior, functionality, and limitations of large language models. Checkpoints saved during training are intended to provide a controlled setting for performing scientific experiments. You may also further fine-tune and adapt TeenyTinyLlama-460m for deployment, as long as your use is in accordance with the Apache 2.0 license. If you decide to use pre-trained TeenyTinyLlama-460m as a basis for your fine-tuned model, please conduct your own risk and bias assessment.

Basic Usage

Note: Using quantized models requires the installation of autoawq==0.1.7. A GPU is required to run the AWQ-quantized models.

Using the pipeline:

!pip install autoawq==0.1.7 -q

from transformers import pipeline

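# AWQ-quantized models need a GPU, so place the pipeline on a CUDA device
generator = pipeline("text-generation", model="nicholasKluge/TeenyTinyLlama-460m-awq", device="cuda")

# Sampling is enabled explicitly so that more than one sequence can be returned
completions = generator("Astronomia é a ciência", do_sample=True, num_return_sequences=2, max_new_tokens=100)

for comp in completions:
    print(f"🤖 {comp['generated_text']}")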

Using the AutoTokenizer and AutoModelForCausalLM:

!pip install autoawq==0.1.7 -q

from transformers import AutoTokenizer, AutoModelForCausalLM
import torch

# Load model and the tokenizer
tokenizer = AutoTokenizer.from_pretrained("nicholasKluge/TeenyTinyLlama-460m-awq", revision='main')
model = AutoModelForCausalLM.from_pretrained("nicholasKluge/TeenyTinyLlama-460m-awq", revision='main')

# Pass the model to your device
device = torch.device("cuda" if torch.cuda.is_available() else "cpu")

model.eval()
model.to(device)

# Tokenize the inputs and pass them to the device
inputs = tokenizer("Astronomia é a ciência", return_tensors="pt").to(device)

# Generate some text (sampling is enabled explicitly so that two sequences can be returned)
completions = model.generate(**inputs, do_sample=True, num_return_sequences=2, max_new_tokens=100)

# Print the generated text
for completion in completions:
    print(f'🤖 {tokenizer.decode(completion)}')

Limitations

  • Hallucinations: This model can produce content that can be mistaken for truth but is, in fact, misleading or entirely false, i.e., hallucination.

  • Biases and Toxicity: This model inherits the social and historical stereotypes from the data used to train it. Given these biases, the model can produce toxic content, i.e., harmful, offensive, or detrimental to individuals, groups, or communities.

  • Unreliable Code: The model may produce incorrect code snippets and statements. These code generations should not be treated as suggestions or accurate solutions.

  • Language Limitations: The model is primarily designed to understand standard Portuguese (BR). Other languages might challenge its comprehension, leading to potential misinterpretations or errors in response.

  • Repetition and Verbosity: The model may get stuck in repetition loops (especially if the repetition penalty used during generation is set too low) or produce verbose responses unrelated to the prompt it was given.
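
To make the repetition point concrete, the sketch below collects the sampling settings listed under `inference.parameters` in this card's metadata and passes them to a text-generation pipeline. `generate_text` is a hypothetical helper, and `do_sample=True` and `device="cuda"` are assumptions added so that the sampling settings take effect and the model runs on a GPU. Calling the helper downloads the model and needs a CUDA GPU with `transformers` and `autoawq` installed.

```python
# Sampling settings copied from the `inference.parameters` block of this card
GENERATION_KWARGS = {
    "do_sample": True,          # assumption: needed for temperature/top_k/top_p to apply
    "repetition_penalty": 1.2,  # values near 1.0 make repetition loops more likely
    "temperature": 0.2,
    "top_k": 20,
    "top_p": 0.2,
    "max_new_tokens": 150,
}


def generate_text(prompt: str) -> str:
    """Return one completion for `prompt` using the settings above."""
    # Imported lazily so the settings can be inspected without loading the model
    from transformers import pipeline

    generator = pipeline(
        "text-generation",
        model="nicholasKluge/TeenyTinyLlama-460m-awq",
        device="cuda",  # assumption: AWQ inference runs on a CUDA GPU
    )
    return generator(prompt, **GENERATION_KWARGS)[0]["generated_text"]


if __name__ == "__main__":
    print(GENERATION_KWARGS)
```

On a suitable machine, `generate_text("Astronomia é a ciência")` would return a single completion; changing `repetition_penalty` in the dictionary is a simple way to observe its effect on loops.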

Evaluations

| Steps | Evaluation Loss | Perplexity | Total Energy Consumption | Emissions |
|---|---|---|---|---|
| 100,000 | 3.02 | 20.49 | 9.40 kWh | 3.34 KgCO2eq |
| 200,000 | 2.82 | 16.90 | 18.82 kWh | 6.70 KgCO2eq |
| 300,000 | 2.73 | 15.43 | 28.59 kWh | 10.16 KgCO2eq |
| 400,000 | 2.68 | 14.64 | 38.20 kWh | 13.57 KgCO2eq |
| 500,000 | 2.64 | 14.08 | 48.04 kWh | 17.07 KgCO2eq |
| 600,000 | 2.61 | 13.61 | 57.74 kWh | 20.52 KgCO2eq |
| 700,000 | 2.58 | 13.25 | 67.32 kWh | 23.92 KgCO2eq |
| 800,000 | 2.55 | 12.87 | 76.84 kWh | 27.30 KgCO2eq |
| 900,000 | 2.53 | 12.57 | 86.40 kWh | 30.70 KgCO2eq |
| 1,000,000 | 2.50 | 12.27 | 96.19 kWh | 34.18 KgCO2eq |
| 1,100,000 | 2.48 | 11.96 | 106.06 kWh | 37.70 KgCO2eq |
| 1,200,000 | 2.46 | 11.77 | 115.69 kWh | 41.11 KgCO2eq |

  • Note: Each evaluation consumed around 0.26 kWh of energy (~ 0.09 KgCO2eq), totaling 3.12 kWh (~ 1.11 KgCO2eq).
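
The perplexity column appears to follow the usual definition, the exponential of the mean cross-entropy loss. The sketch below applies that definition (an assumption, since the card does not state how perplexity was computed) to a few rows of the table. Small differences from the reported values are expected because the losses shown are rounded to two decimals.

```python
import math


def perplexity(loss: float) -> float:
    """Perplexity under the standard definition: exp(mean cross-entropy loss)."""
    return math.exp(loss)


# (steps, evaluation loss, reported perplexity) copied from the table above
rows = [
    (100_000, 3.02, 20.49),
    (600_000, 2.61, 13.61),
    (1_200_000, 2.46, 11.77),
]

for steps, loss, reported in rows:
    print(f"{steps:>9,} steps: exp({loss}) = {perplexity(loss):.2f} (reported {reported})")
```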

Benchmarks

Evaluations on benchmarks were performed using the Language Model Evaluation Harness (by EleutherAI). Thanks to Laiviet for translating some of the tasks in the LM-Evaluation-Harness. The results of models marked with an "*" were extracted from the Open LLM Leaderboard.

| Models | Average | ARC | Hellaswag | MMLU | TruthfulQA |
|---|---|---|---|---|---|
| Pythia-410m | 33.26 | 24.83* | 41.29* | 25.99* | 40.95* |
| TeenyTinyLlama-460m | 33.01 | 29.40 | 33.00 | 28.55 | 41.10 |
| Bloom-560m | 32.13 | 24.74* | 37.15* | 24.22* | 42.44* |
| Xglm-564M | 31.97 | 25.56 | 34.64* | 25.18* | 42.53 |
| OPT-350m | 31.78 | 23.55* | 36.73* | 26.02* | 40.83* |
| TeenyTinyLlama-160m | 31.16 | 26.15 | 29.29 | 28.11 | 41.12 |
| Pythia-160m | 31.16 | 24.06* | 31.39* | 24.86* | 44.34* |
| OPT-125m | 30.80 | 22.87 | 31.47 | 26.02 | 42.87 |
| Gpt2-portuguese-small | 30.22 | 22.48* | 29.62* | 27.36* | 41.44* |
| Gpt2-small | 29.97 | 21.48* | 31.60* | 25.79* | 40.65* |
| Multilingual GPT | 29.45 | 24.79 | 26.37* | 25.17* | 41.50 |

Fine-Tuning Comparisons

| Models | Average | IMDB | FaQuAD-NLI | HateBr | Assin2 | AgNews |
|---|---|---|---|---|---|---|
| Bert-large-portuguese-cased | 92.09 | 93.58 | 92.26 | 91.57 | 88.97 | 94.11 |
| Bert-base-portuguese-cased | 91.64 | 92.22 | 93.07 | 91.28 | 87.45 | 94.19 |
| TeenyTinyLlama-460m | 91.19 | 91.64 | 91.18 | 92.28 | 86.43 | 94.42 |
| TeenyTinyLlama-160m | 90.33 | 91.14 | 90.00 | 90.71 | 85.78 | 94.05 |
| Gpt2-small-portuguese | 89.13 | 91.60 | 86.46 | 87.42 | 86.11 | 94.07 |

Cite as 🤗


@misc{nicholas22llama,
  doi = {10.5281/zenodo.6989727},
  url = {https://huggingface.co/nicholasKluge/TeenyTinyLlama-460m},
  author = {Nicholas Kluge Corrêa},
  title = {TeenyTinyLlama},
  year = {2023},
  publisher = {HuggingFace},
  journal = {HuggingFace repository},
}

Funding

This repository was built as part of the RAIES (Rede de Inteligência Artificial Ética e Segura) initiative, a project supported by FAPERGS (Fundação de Amparo à Pesquisa do Estado do Rio Grande do Sul), Brazil.

License

TeenyTinyLlama-460m is licensed under the Apache License, Version 2.0. See the LICENSE file for more details.