---
base_model: TurkuNLP/gpt3-finnish-3B
license: apache-2.0
datasets:
- TurkuNLP/squad_v2_fi
language:
- fi
pipeline_tag: text-generation
---
# Model Card for Model Futurice/gpt3-finnish-3B-instruct
gpt3-finnish-3B-instruct is an instruction fine-tuned model intended for retrieval-augmented generation (RAG) style Q&A in Finnish.
## Model Details
### Model Description
The gpt3-finnish-3B-instruct model is based on the TurkuNLP Finnish GPT-3 models, a family of pretrained monolingual GPT-style language models built on the BLOOM architecture.
The model was fine-tuned on a sample of the TurkuNLP/squad_v2_fi dataset, which is a DeepL translation of SQuAD2.0 into Finnish.
- **Developed by:** Martti Sutinen
- **Model type:** Bloom
- **Language(s) (NLP):** Finnish
- **License:** Apache-2.0
- **Finetuned from model:** TurkuNLP/gpt3-finnish-3B
## Uses
The model is intended for RAG-style Q&A in Finnish.
### Direct Use
Use the model for text generation and RAG-style Q&A in Finnish: supply a context and ask a question about it.
### Out-of-Scope Use
The model is not recommended for use cases other than those described above. Please do not misuse it.
## Bias, Risks, and Limitations
A key limitation is the small and simple selection of fine-tuning data. Please do not expect high-quality answers.
### Recommendations
It is recommended to continue fine-tuning with more data or to move to a newer architecture.
## How to Get Started with the Model
- Recommended system message: "Olet avustaja. Seuraavaksi saat kysymyksen tai tehtävän. Kirjoita vastaus parhaasi mukaan siten että se täyttää kysymyksen tai tehtävän vaatimukset."
- Recommended format for question about context: Tausta: "{context} \n\nKäytä vain taustaa ja vastaa kysymykseen tai tehtävään: {question}"
- Prompt format: `tokenizer.apply_chat_template(messages, tokenize=False, add_generation_prompt=True)`, where `messages` typically has the form `[{"role": "system", "content": system_message}, {"role": "user", "content": prompt_with_context}]`.
Here is what the input could look like:
\<s><|im_start|>system
Olet avustaja. Seuraavaksi saat kysymyksen tai tehtävän. Kirjoita vastaus parhaasi mukaan siten että se täyttää kysymyksen tai tehtävän vaatimukset.<|im_end|>
<|im_start|>user
Tausta:
Dokumentti luotiin tammikuussa. Sen kirjoittajaa ei tunneta.
Käytä vain taustaa ja vastaa kysymykseen tai tehtävään: Milloin dokumentti kirjoitettiin?<|im_end|>
<|im_start|>assistant
To run the model, use a `transformers` pipeline with the `text-generation` task and the recommended prompt format.
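The recommendations above can be put together roughly as follows. This is a minimal sketch, not a tested reference implementation: the helper names `build_messages` and `answer` are hypothetical, the newline after `Tausta:` is an assumption based on the example input, and `max_new_tokens=64` is an arbitrary illustrative value.

```python
SYSTEM_MESSAGE = (
    "Olet avustaja. Seuraavaksi saat kysymyksen tai tehtävän. "
    "Kirjoita vastaus parhaasi mukaan siten että se täyttää "
    "kysymyksen tai tehtävän vaatimukset."
)


def build_messages(context, question, system_message=SYSTEM_MESSAGE):
    """Wrap a context and a question in the recommended chat messages."""
    # Assumption: "Tausta:" is followed by a newline, as in the example input.
    prompt_with_context = (
        f"Tausta:\n{context}\n\n"
        f"Käytä vain taustaa ja vastaa kysymykseen tai tehtävään: {question}"
    )
    return [
        {"role": "system", "content": system_message},
        {"role": "user", "content": prompt_with_context},
    ]


def answer(context, question,
           model_id="Futurice/gpt3-finnish-3B-instruct", max_new_tokens=64):
    """Generate an answer with a text-generation pipeline (downloads the model)."""
    # Imported lazily so build_messages works without transformers installed.
    from transformers import AutoTokenizer, pipeline

    tokenizer = AutoTokenizer.from_pretrained(model_id)
    generator = pipeline("text-generation", model=model_id, tokenizer=tokenizer)
    prompt = tokenizer.apply_chat_template(
        build_messages(context, question),
        tokenize=False,
        add_generation_prompt=True,
    )
    # return_full_text=False returns only the newly generated continuation.
    outputs = generator(prompt, max_new_tokens=max_new_tokens,
                        return_full_text=False)
    return outputs[0]["generated_text"].strip()
```

Calling `answer("Dokumentti luotiin tammikuussa. Sen kirjoittajaa ei tunneta.", "Milloin dokumentti kirjoitettiin?")` would send the example input shown above to the model.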
## Training Details
### Training Data
The model was trained on 20,000 random samples from the test data in [TurkuNLP/squad_v2_fi](https://huggingface.co/datasets/TurkuNLP/squad_v2_fi).
### Training Procedure
Training used supervised fine-tuning with LoRA on a 4-bit quantized base model.
#### Training Hyperparameters
- **Training regime:** 4-bit, batch size 2, max steps 20000, data collator for completion only
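A "completion only" data collator typically means that the loss is computed only on the assistant's answer, not on the system message or the question. The sketch below illustrates that idea in plain Python. It is not the collator used for this model, which the card does not name: the function names and the token ids in the usage note are hypothetical, and `-100` is the label value that PyTorch's cross-entropy loss ignores by default.

```python
IGNORE_INDEX = -100  # label value skipped by PyTorch's cross-entropy loss


def find_sublist(sequence, pattern):
    """Return the start index of the first occurrence of pattern, or -1."""
    for start in range(len(sequence) - len(pattern) + 1):
        if sequence[start:start + len(pattern)] == pattern:
            return start
    return -1


def mask_prompt_labels(input_ids, response_template_ids):
    """Build labels that keep only the tokens after the response template.

    Everything up to and including the response template (for example the
    tokens of "<|im_start|>assistant") is replaced by IGNORE_INDEX, so only
    the completion contributes to the loss.
    """
    start = find_sublist(input_ids, response_template_ids)
    if start < 0:
        # No response marker found: ignore the whole example.
        return [IGNORE_INDEX] * len(input_ids)
    cut = start + len(response_template_ids)
    return [IGNORE_INDEX] * cut + list(input_ids[cut:])
```

For instance, with made-up ids `input_ids = [5, 6, 7, 8, 9, 10]` and a response template `[7, 8]`, only the last two tokens keep their labels.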
## Evaluation
A thorough evaluation has not been done yet.
### Testing Data, Factors & Metrics
#### Testing Data
Evaluated with 1000 random samples from test data in: [TurkuNLP/squad_v2_fi](https://huggingface.co/datasets/TurkuNLP/squad_v2_fi).
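Because both the training and the evaluation samples are drawn at random from the same split, it can help to draw them in one pass so that they do not overlap. The following is a sketch under that assumption; the card does not say whether the two samples were disjoint, and the function name and the seed are hypothetical.

```python
import random


def draw_train_eval_indices(dataset_size, n_train=20000, n_eval=1000, seed=0):
    """Draw disjoint random row indices for training and evaluation."""
    if n_train + n_eval > dataset_size:
        raise ValueError("not enough rows for disjoint train and eval samples")
    rng = random.Random(seed)  # fixed seed makes the draw reproducible
    # Sampling without replacement guarantees that no index appears twice.
    picked = rng.sample(range(dataset_size), n_train + n_eval)
    return picked[:n_train], picked[n_train:]
```

With the `datasets` library, the two index lists could then be passed to `Dataset.select` to materialize the subsets.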
#### Factors
Same factors as in SQuAD2.0.
#### Metrics
Loss.
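Assuming "loss" here means the usual token-level cross-entropy of a causal language model (the card does not spell this out), it is the mean negative log-probability the model assigns to the correct tokens. A small sketch, with hypothetical function names:

```python
import math


def mean_cross_entropy(target_probs):
    """Mean negative log-probability of the correct tokens."""
    if not target_probs:
        raise ValueError("need at least one token probability")
    return -sum(math.log(p) for p in target_probs) / len(target_probs)


def perplexity(loss):
    """Perplexity is the exponential of the mean cross-entropy."""
    return math.exp(loss)
```

A lower loss means the model puts more probability on the reference answer tokens; perplexity expresses the same quantity on a more interpretable scale.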
### Results
No results to be shared yet.
#### Summary
## Environmental Impact
Environmental impact not yet evaluated.
Carbon emissions can be estimated using the [Machine Learning Impact calculator](https://mlco2.github.io/impact#compute) presented in [Lacoste et al. (2019)](https://arxiv.org/abs/1910.09700).
- **Hardware Type:** Mostly trained on A100
- **Hours used:** 5-10 hours
- **Cloud Provider:** GCP
- **Compute Region:** Unknown
- **Carbon Emitted:** Not evaluated
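A rough estimate in the spirit of the calculator above multiplies the energy used by the carbon intensity of the grid. The sketch below only shows the arithmetic. Every input is an assumption to be replaced with real values: the card reports neither the GPU power draw nor the compute region, so no actual emission figure is implied.

```python
def estimate_co2_kg(hours, gpu_power_watts, carbon_intensity_kg_per_kwh, pue=1.0):
    """Estimate emissions in kg CO2eq as energy (kWh) times grid intensity.

    pue is the data-center power usage effectiveness; 1.0 means no overhead.
    """
    energy_kwh = gpu_power_watts / 1000.0 * hours * pue
    return energy_kwh * carbon_intensity_kg_per_kwh


def estimate_co2_range_kg(min_hours, max_hours, gpu_power_watts,
                          carbon_intensity_kg_per_kwh, pue=1.0):
    """Return (low, high) estimates for a training time given as a range."""
    low = estimate_co2_kg(min_hours, gpu_power_watts,
                          carbon_intensity_kg_per_kwh, pue)
    high = estimate_co2_kg(max_hours, gpu_power_watts,
                           carbon_intensity_kg_per_kwh, pue)
    return low, high
```

For the 5-10 hour range reported here, one would call `estimate_co2_range_kg(5, 10, gpu_power_watts, carbon_intensity_kg_per_kwh)` with the measured or rated power of the A100 variant used and the carbon intensity of the actual compute region.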
### Model Architecture and Objective
Bloom.
### Compute Infrastructure
Colab.
#### Hardware
1 x A100.
#### Software
Typical software used.
## Model Card Contact
Martti Sutinen