license: apache-2.0
datasets:
  - thegoodfellas/mc4-pt-cleaned
language:
  - pt
inference: false

Model Card for tgf-flan-t5-base-ptbr

This is the PT-BR Flan-T5-base model.

Model Details

Model Description

This model was created as a base for study by researchers who want to learn how Flan-T5 works. This is the Portuguese version.

  • Developed by: The Good Fellas team
  • Model type: Flan-T5
  • Language(s) (NLP): Portuguese (BR)
  • License: apache-2.0
  • Finetuned from model: Flan-T5-base

We would like to thank the TPU Research Cloud team for the amazing opportunity given to us. To learn more about TRC: https://sites.research.google/trc/about/

Uses

This model can be used as a base for downstream tasks, as described in the Flan-T5 paper.

Bias, Risks, and Limitations

Due to the nature of the web-scraped corpus on which Flan-T5 models were trained, it is likely that their usage could reproduce and amplify pre-existing biases in the data, resulting in potentially harmful content such as racial or gender stereotypes and conspiracist views. For this reason, the study of such biases is explicitly encouraged, and model usage should ideally be restricted to research-oriented and non-user-facing endeavors.

How to Get Started with the Model

Use the code below to get started with the model.

from transformers import FlaxT5ForConditionalGeneration

model_flax = FlaxT5ForConditionalGeneration.from_pretrained("thegoodfellas/tgf-flan-t5-base-ptbr")
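Once the model is loaded, text generation could look roughly like the sketch below. It is an illustration rather than a tested recipe: it assumes the repository also ships tokenizer files that `AutoTokenizer` can load, and the helper names `load` and `generate_text`, the `max_length` default, and the Portuguese prompt are all hypothetical.

```python
def load(model_id="thegoodfellas/tgf-flan-t5-base-ptbr"):
    """Load the tokenizer and the Flax model from the Hub."""
    # Imported lazily so generate_text can be reused without these libraries.
    from transformers import AutoTokenizer, FlaxT5ForConditionalGeneration

    # Assumption: tokenizer files are published alongside the model weights.
    tokenizer = AutoTokenizer.from_pretrained(model_id)
    model = FlaxT5ForConditionalGeneration.from_pretrained(model_id)
    return model, tokenizer


def generate_text(model, tokenizer, prompt, max_length=64):
    """Encode a prompt, run generation, and decode the first sequence."""
    inputs = tokenizer(prompt, return_tensors="np")
    output = model.generate(
        inputs["input_ids"],
        attention_mask=inputs["attention_mask"],
        max_length=max_length,  # hypothetical limit, tune for your task
    )
    return tokenizer.batch_decode(output.sequences, skip_special_tokens=True)[0]


# Example call (downloads the weights on first use; the prompt is illustrative):
# model, tokenizer = load()
# print(generate_text(model, tokenizer, "Responda: qual é a capital do Brasil?"))
```

Because this is a base model intended for further fine-tuning, the generated text may not be meaningful until it has been adapted to a downstream task.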

Training Details

Training Data

The model was trained on two datasets: BrWaC and OSCAR (Portuguese section).

Training Procedure

We trained this model for 1 epoch on each dataset.

Training Hyperparameters

Thanks to the TPU Research Cloud, we were able to train this model on a single TPUv2-8.

  • Training regime:
  • Precision: bf16
  • Batch size: 32
  • LR: 0.005
  • Warmup steps: 10_000
  • Epochs: 1 (each dataset)
  • Optimizer: Adafactor
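To make the interaction between the learning rate and the warmup steps concrete, here is a minimal sketch of a warmup schedule using the values listed above. The card does not state the schedule shape, so the linear ramp from zero followed by a constant rate is an assumption, and the function name `learning_rate` is hypothetical.

```python
PEAK_LR = 0.005        # "LR" from the hyperparameter list
WARMUP_STEPS = 10_000  # "Warmup steps" from the hyperparameter list


def learning_rate(step: int) -> float:
    """Linear warmup to PEAK_LR, then constant (assumed schedule shape)."""
    if step < WARMUP_STEPS:
        return PEAK_LR * step / WARMUP_STEPS
    return PEAK_LR


# Print the rate at a few points before, during, and after warmup.
for step in (0, 5_000, 10_000, 20_000):
    print(step, learning_rate(step))
```

If the actual run decayed the rate after warmup, only the final `return` branch would change.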

Environmental Impact

Carbon emissions can be estimated using the Machine Learning Impact calculator presented in Lacoste et al. (2019).

Experiments were conducted using Google Cloud Platform in region us-central1, which has a carbon efficiency of 0.57 kgCO$_2$eq/kWh. A cumulative total of 50 hours of computation was performed on hardware of type TPUv2 Chip (TDP of 221 W).

Total emissions are estimated to be 6.3 kgCO$_2$eq, of which 100 percent was directly offset by the cloud provider.

  • Hardware Type: TPUv2
  • Hours used: 50
  • Cloud Provider: GCP
  • Compute Region: us-central1
  • Carbon Emitted: 6.3 kgCO$_2$eq
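The reported figure can be reproduced from the numbers above. The sketch below assumes the estimate is simply power draw times hours times regional carbon intensity, with the hardware running at its full TDP for the whole period; the variable names are illustrative.

```python
TDP_WATTS = 221          # TPUv2 chip TDP, as stated above
HOURS = 50               # cumulative hours of computation
CARBON_INTENSITY = 0.57  # kgCO2eq per kWh for us-central1, as stated above

# Energy in kWh: watts * hours / 1000.
energy_kwh = TDP_WATTS * HOURS / 1000

# Emissions in kgCO2eq: energy * carbon intensity.
emissions_kg = energy_kwh * CARBON_INTENSITY

print(f"{energy_kwh:.2f} kWh, {emissions_kg:.1f} kgCO2eq")
```

Rounded to one decimal place, this agrees with the 6.3 kgCO$_2$eq reported in the card.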

Technical Specifications

Model Architecture and Objective

Flan-T5