kaizuberbuehler
/

Alpesteibock-Llama-3-8B-Alpha

Text Generation

text-generation-inference

Inference Endpoints

Model card Files Files and versions Community

Alpesteibock-Llama-3-8B-Alpha / README.md

kaizuberbuehler's picture

kaizuberbuehler

Update README.md

3767886 verified 5 months ago

|

2.44 kB

	---
	license: llama3
	language:
	- gsw
	datasets:
	- cis-lmu/Glot500
	- cis-lmu/GlotCC-V1
	pipeline_tag: text-generation
	base_model: NousResearch/Hermes-2-Pro-Llama-3-8B
	model_type: LlamaForCausalLM
	tags:
	- Llama-3
	- instruct
	- finetune
	- chatml
	- synthetic data
	- axolotl
	---

	# Alpesteibock-Llama-3-8B-Alpha

	Alpesteibock-Llama-3-8B-Alpha is an experimental QLoRA fine-tune of [NousResearch/Hermes-2-Pro-Llama-3-8B](https://huggingface.co/NousResearch/Hermes-2-Pro-Llama-3-8B) on a dataset of more than 28 million tokens of Swiss German text from multiple sources.

	## License

	This model is released under the [Llama 3 Community License](https://llama.meta.com/llama3/license/).

	## Usage

	The model uses ChatML as an instruction template and was trained using "You are Alpesteibock, a helpful assistant who speaks Swiss German." as a system message:
	```
	<\|im_start\|>system
	You are Alpesteibock, a helpful assistant who speaks Swiss German.<\|im_end\|>
	<\|im_start\|>user
	Hoi. Wie heissisch du?<\|im_end\|>
	<\|im_start\|>assistant
	Ich bi de Alpesteibock und ich freu mi uf di.<\|im_end\|>
	```

	## Dataset

	The dataset used for training consists of the following sources:

	\| Dataset \| File Size \| Description \| Phase \|
	\|---------\|-----------\|-------------\|-------\|
	\| [Glot500 Corpus](https://huggingface.co/datasets/cis-lmu/Glot500) (gsw_Latn, Leipzig_web) \| 21.7 MB \| Text, usually sentences, crawled from the web \| 1 \|
	\| [Alemannic Wikipedia](https://dumps.wikimedia.org/alswiki/) (Subset) \| 50.5 MB \| Articles in the Alemannic Wikipedia with most of those written in Alsatian filtered out \| 2 \|
	\| [Schweizerdeutscher Mundartkorpus](https://chmk.ch/) (Copyright Free Subset) \| 28.4 MB \| Copyright free books written in Swiss German \| 2 \|
	\| [GlotCC-V1.0](https://huggingface.co/datasets/cis-lmu/GlotCC-V1) (gsw-Latn) \| 7.5 MB \| Document-level general domain monolingual dataset derived from CommonCrawl \| 2 \|
	\| Synthetic Instruction Data \| 1.7 MB \| Different datasets of synthetically generated Swiss German text \| 2 \|

	## Training Details

	Hardware: 1x RTX 4090
	Duration: 40 hours in total (2 hours for first phase and 38 hours for second phase)

	### Hyperparameters

	Adapter: QLoRA
	Precision: 4-bit
	Optimizer: adamw_bnb_8bit
	LoRA Rank: 256
	LoRA Alpha: 256
	Learning Rate: 1e-5
	Scheduler: Cosine
	Context Length: 4096
	Batch Size: 1
	Gradient Accumulation Steps: 1
	Sample Packing: On for first phase, Off for second phase
	Epochs: 2