	---
	base_model: TurkuNLP/gpt3-finnish-3B
	license: apache-2.0
	datasets:
	- TurkuNLP/squad_v2_fi
	language:
	- fi
	pipeline_tag: text-generation
	---

	# Model Card for Futurice/gpt3-finnish-3B-instruct

	gpt3-finnish-3B-instruct is an instruction fine-tuned model intended for retrieval-augmented generation (RAG) style Q&A in Finnish.

	## Model Details

	### Model Description

	The gpt3-finnish-3B-instruct model is based on the TurkuNLP Finnish GPT-3 models, a family of pretrained monolingual GPT-style language models built on the BLOOM architecture.

	The model was fine-tuned on a sample of the TurkuNLP/squad_v2_fi dataset, which is a DeepL translation of SQuAD2.0 into Finnish.

	- Developed by: Martti Sutinen
	- Model type: BLOOM
	- Language(s) (NLP): Finnish
	- License: Apache-2.0
	- Finetuned from model: TurkuNLP/gpt3-finnish-3B

	## Uses

	Intended for RAG-style Q&A in Finnish.

	### Direct Use

	Intended for text generation and RAG-style Q&A in Finnish: supply a context and ask a question about it.

	### Out-of-Scope Use

	The model is not recommended for use cases other than those described above. Please do not misuse it.

	## Bias, Risks, and Limitations

	A key limitation is the small and simple selection of fine-tuning data. Do not expect high-quality answers.

	### Recommendations

	Continuing fine-tuning with more data, or moving to a newer architecture, is recommended.

	## How to Get Started with the Model

	- Recommended system message: "Olet avustaja. Seuraavaksi saat kysymyksen tai tehtävän. Kirjoita vastaus parhaasi mukaan siten että se täyttää kysymyksen tai tehtävän vaatimukset."
	- Recommended format for a question about a context: "Tausta: {context} \n\nKäytä vain taustaa ja vastaa kysymykseen tai tehtävään: {question}"
	- Prompt format: tokenizer.apply_chat_template(messages, tokenize=False, add_generation_prompt=True)

	Here, messages typically has the following format:
	messages = [
	{"role": "system", "content": system_message},
	{"role": "user", "content": prompt_with_context}
	].
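
The message structure above can be assembled with a small helper. This is only a sketch: `build_messages` and `SYSTEM_MESSAGE` are hypothetical names, and the line break after "Tausta:" is an assumption based on the example input shown in this card rather than a confirmed detail of the training format.

```python
# Recommended system message, copied from this model card.
SYSTEM_MESSAGE = (
    "Olet avustaja. Seuraavaksi saat kysymyksen tai tehtävän. "
    "Kirjoita vastaus parhaasi mukaan siten että se täyttää "
    "kysymyksen tai tehtävän vaatimukset."
)


def build_messages(context, question, system_message=SYSTEM_MESSAGE):
    """Build the system + user message list for a question about a context.

    Hypothetical helper: the whitespace around "Tausta:" is assumed from the
    card's example input.
    """
    prompt_with_context = (
        f"Tausta:\n{context}\n\n"
        f"Käytä vain taustaa ja vastaa kysymykseen tai tehtävään: {question}"
    )
    return [
        {"role": "system", "content": system_message},
        {"role": "user", "content": prompt_with_context},
    ]


messages = build_messages(
    "Dokumentti luotiin tammikuussa. Sen kirjoittajaa ei tunneta.",
    "Milloin dokumentti kirjoitettiin?",
)
```

The resulting list can then be passed to tokenizer.apply_chat_template as described in the prompt format bullet.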

	Here is what the input could look like:

	\<s><\|im_start\|>system
	Olet avustaja. Seuraavaksi saat kysymyksen tai tehtävän. Kirjoita vastaus parhaasi mukaan siten että se täyttää kysymyksen tai tehtävän vaatimukset.<\|im_end\|>
	<\|im_start\|>user
	Tausta:
	Dokumentti luotiin tammikuussa. Sen kirjoittajaa ei tunneta.

	Käytä vain taustaa ja vastaa kysymykseen tai tehtävään: Milloin dokumentti kirjoitettiin?<\|im_end\|>
	<\|im_start\|>assistant


	Use a pipeline with the text-generation task and the recommended prompt format.
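
A minimal sketch of that pipeline usage with the transformers library follows. The repository id is taken from the title of this card and is assumed to match the Hub id; `generate_answer`, its `pipe` parameter, and the generation settings (`max_new_tokens=64`, greedy decoding) are illustrative choices, not settings documented by the author.

```python
# Assumed Hub repository id, taken from the title of this model card.
MODEL_ID = "Futurice/gpt3-finnish-3B-instruct"


def generate_answer(messages, pipe=None, max_new_tokens=64):
    """Format messages with the chat template and generate an answer.

    Hypothetical helper. Pass an existing text-generation pipeline as `pipe`
    to reuse it; otherwise one is created, which downloads the model.
    """
    if pipe is None:
        # Imported lazily so the helper can be defined without transformers.
        from transformers import pipeline

        pipe = pipeline("text-generation", model=MODEL_ID)

    # Prompt format recommended in this card.
    prompt = pipe.tokenizer.apply_chat_template(
        messages, tokenize=False, add_generation_prompt=True
    )
    outputs = pipe(
        prompt,
        max_new_tokens=max_new_tokens,  # illustrative value
        do_sample=False,                # greedy decoding, illustrative choice
        return_full_text=False,         # return only the newly generated text
    )
    return outputs[0]["generated_text"].strip()
```

Calling `generate_answer(messages)` with a system and user message in the format described above returns the generated answer as a string.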

	## Training Details

	### Training Data

	Trained on 20,000 random samples from the test data in [TurkuNLP/squad_v2_fi](https://huggingface.co/datasets/TurkuNLP/squad_v2_fi).

	### Training Procedure

	Training used a 4-bit quantized base model with supervised fine-tuning and LoRA.

	#### Training Hyperparameters

	- Training regime: 4-bit, batch size 2, max steps 20000, completion-only data collator
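
The idea behind a completion-only data collator is that the loss is computed on the answer tokens only, while the prompt tokens are masked out. The sketch below illustrates that masking on plain lists of token ids. It is not the collator used for this model: `mask_prompt_tokens` and the token ids are hypothetical, and the only outside fact relied on is that -100 is the default label value ignored by PyTorch's cross-entropy loss.

```python
IGNORE_INDEX = -100  # default ignore_index of PyTorch's cross-entropy loss


def mask_prompt_tokens(input_ids, response_template_ids):
    """Return labels in which only tokens after the response template count.

    Hypothetical illustration of completion-only masking: everything up to
    and including the first occurrence of the response template is ignored.
    """
    n = len(response_template_ids)
    for start in range(len(input_ids) - n + 1):
        if input_ids[start:start + n] == response_template_ids:
            end = start + n
            return [IGNORE_INDEX] * end + list(input_ids[end:])
    # Template not found: no token of this example contributes to the loss.
    return [IGNORE_INDEX] * len(input_ids)


# Made-up ids: 90, 91 stand for the tokens that open the assistant turn.
labels = mask_prompt_tokens([5, 6, 7, 90, 91, 11, 12, 2], [90, 91])
```

Here the first five positions (the prompt and the template itself) are masked, and only the three answer tokens remain as training targets.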

	## Evaluation

	A proper evaluation has not been done yet.

	### Testing Data, Factors & Metrics

	#### Testing Data

	Evaluated on 1,000 random samples from the test data in [TurkuNLP/squad_v2_fi](https://huggingface.co/datasets/TurkuNLP/squad_v2_fi).

	#### Factors

	Same factors as in SQuAD2.0.

	#### Metrics

	Loss.

	### Results

	No results to be shared yet.

	#### Summary

	## Environmental Impact

	Environmental impact not yet evaluated.

	Carbon emissions can be estimated using the [Machine Learning Impact calculator](https://mlco2.github.io/impact#compute) presented in [Lacoste et al. (2019)](https://arxiv.org/abs/1910.09700).

	- Hardware Type: Mostly trained on A100
	- Hours used: 5-10 hours
	- Cloud Provider: GCP
	- Compute Region: Unknown
	- Carbon Emitted: Not evaluated

	### Model Architecture and Objective

	BLOOM.

	### Compute Infrastructure

	Colab.

	#### Hardware

	1 x A100.

	#### Software

	Typical software used.

	## Model Card Contact

	Martti Sutinen