kaizuberbuehler committed
Commit 3767886 (1 parent: cea56e4)

Update README.md

  1. README.md +26 -1
README.md CHANGED
@@ -3,9 +3,18 @@ license: llama3
3
  language:
4
  - gsw
5
  datasets:
 
6
  - cis-lmu/GlotCC-V1
7
  pipeline_tag: text-generation
8
  base_model: NousResearch/Hermes-2-Pro-Llama-3-8B
9
  ---
10
 
11
  # Alpesteibock-Llama-3-8B-Alpha
@@ -14,15 +23,31 @@ base_model: NousResearch/Hermes-2-Pro-Llama-3-8B
14
 
15
  ## License
16
 
17
- This model is release under the [Llama 3 Community License](https://llama.meta.com/llama3/license/).
18
 
19
  ## Dataset
20
 
 
 
21
  | Dataset | File Size | Description | Phase |
22
  |---------|-----------|-------------|-------|
 
23
  | [Alemannic Wikipedia](https://dumps.wikimedia.org/alswiki/) (Subset) | 50.5 MB | Articles in the Alemannic Wikipedia with most of those written in Alsatian filtered out | 2 |
24
  | [Schweizerdeutscher Mundartkorpus](https://chmk.ch/) (Copyright Free Subset) | 28.4 MB | Copyright free books written in Swiss German | 2 |
25
  | [GlotCC-V1.0](https://huggingface.co/datasets/cis-lmu/GlotCC-V1) (gsw-Latn) | 7.5 MB | Document-level general domain monolingual dataset derived from CommonCrawl | 2 |
 
26
 
27
  ## Training Details
28
 
 
3
  language:
4
  - gsw
5
  datasets:
6
+ - cis-lmu/Glot500
7
  - cis-lmu/GlotCC-V1
8
  pipeline_tag: text-generation
9
  base_model: NousResearch/Hermes-2-Pro-Llama-3-8B
10
+ model_type: LlamaForCausalLM
11
+ tags:
12
+ - Llama-3
13
+ - instruct
14
+ - finetune
15
+ - chatml
16
+ - synthetic data
17
+ - axolotl
18
  ---
19
 
20
  # Alpesteibock-Llama-3-8B-Alpha
 
23
 
24
  ## License
25
 
26
+ This model is released under the [Llama 3 Community License](https://llama.meta.com/llama3/license/).
27
+
28
+ ## Usage
29
+
30
+ The model uses ChatML as an instruction template and was trained using "You are Alpesteibock, a helpful assistant who speaks Swiss German." as a system message:
31
+ ```
32
+ <|im_start|>system
33
+ You are Alpesteibock, a helpful assistant who speaks Swiss German.<|im_end|>
34
+ <|im_start|>user
35
+ Hoi. Wie heissisch du?<|im_end|>
36
+ <|im_start|>assistant
37
+ Ich bi de Alpesteibock und ich freu mi uf di.<|im_end|>
38
+ ```
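
As a rough sketch of how the template above could be assembled in code, the following Python helper builds a ChatML prompt from the system message and a single user turn, leaving the assistant turn open so the model can generate its reply. The function name `build_chatml_prompt` is hypothetical and not part of the model repository; it only mirrors the format shown in the example.

```python
# System message the README says the model was trained with.
SYSTEM_MESSAGE = "You are Alpesteibock, a helpful assistant who speaks Swiss German."


def build_chatml_prompt(user_message, system_message=SYSTEM_MESSAGE):
    """Format one system turn and one user turn as ChatML.

    Hypothetical helper: the returned string ends with an open assistant
    turn, which is where generated text would be appended.
    """
    return (
        f"<|im_start|>system\n{system_message}<|im_end|>\n"
        f"<|im_start|>user\n{user_message}<|im_end|>\n"
        "<|im_start|>assistant\n"
    )


prompt = build_chatml_prompt("Hoi. Wie heissisch du?")
print(prompt)
```

When generating from such a prompt, `<|im_end|>` would presumably be treated as the stop sequence, since it closes every turn in the template.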
39
 
40
  ## Dataset
41
 
42
+ The dataset used for training consists of the following sources:
43
+
44
  | Dataset | File Size | Description | Phase |
45
  |---------|-----------|-------------|-------|
46
+ | [Glot500 Corpus](https://huggingface.co/datasets/cis-lmu/Glot500) (gsw_Latn, Leipzig_web) | 21.7 MB | Text, usually sentences, crawled from the web | 1 |
47
  | [Alemannic Wikipedia](https://dumps.wikimedia.org/alswiki/) (Subset) | 50.5 MB | Articles in the Alemannic Wikipedia with most of those written in Alsatian filtered out | 2 |
48
  | [Schweizerdeutscher Mundartkorpus](https://chmk.ch/) (Copyright Free Subset) | 28.4 MB | Copyright free books written in Swiss German | 2 |
49
  | [GlotCC-V1.0](https://huggingface.co/datasets/cis-lmu/GlotCC-V1) (gsw-Latn) | 7.5 MB | Document-level general domain monolingual dataset derived from CommonCrawl | 2 |
50
+ | Synthetic Instruction Data | 1.7 MB | Different datasets of synthetically generated Swiss German text | 2 |
51
 
52
  ## Training Details
53