kaizuberbuehler committed
Commit cea56e4
1 Parent(s): 853042c

Update README.md

Files changed (1): README.md (+21 -9)
README.md CHANGED
@@ -1,33 +1,45 @@
  ---
  license: llama3
+ language:
+ - gsw
+ datasets:
+ - cis-lmu/GlotCC-V1
+ pipeline_tag: text-generation
+ base_model: NousResearch/Hermes-2-Pro-Llama-3-8B
  ---

  # Alpesteibock-Llama-3-8B-Alpha

  **Alpesteibock-Llama-3-8B-Alpha** is an experimental QLoRA fine-tune of [NousResearch/Hermes-2-Pro-Llama-3-8B](https://huggingface.co/NousResearch/Hermes-2-Pro-Llama-3-8B) on a dataset of more than 28 million tokens of Swiss German text from multiple sources.

- ## Dataset
+ ## License
+
+ This model is released under the [Llama 3 Community License](https://llama.meta.com/llama3/license/).

+ ## Dataset

+ | Dataset | File Size | Description | Phase |
+ |---------|-----------|-------------|-------|
+ | [Alemannic Wikipedia](https://dumps.wikimedia.org/alswiki/) (Subset) | 50.5 MB | Articles from the Alemannic Wikipedia, with most Alsatian-language articles filtered out | 2 |
+ | [Schweizerdeutscher Mundartkorpus](https://chmk.ch/) (Copyright-Free Subset) | 28.4 MB | Copyright-free books written in Swiss German | 2 |
+ | [GlotCC-V1.0](https://huggingface.co/datasets/cis-lmu/GlotCC-V1) (gsw-Latn) | 7.5 MB | Document-level, general-domain monolingual dataset derived from CommonCrawl | 2 |

  ## Training Details

  Hardware: 1x RTX 4090
- Duration: 30 hours in total (2 hours for first phase and 28 hours for second phase)
+ Duration: 40 hours in total (2 hours for first phase and 38 hours for second phase)

  ### Hyperparameters

  Adapter: QLoRA
- Precision: 4 bit
+ Precision: 4-bit
  Optimizer: adamw_bnb_8bit
  LoRA Rank: 256
  LoRA Alpha: 256
  Learning Rate: 1e-5
- Context Length: 4096 tokens
+ Scheduler: Cosine
+ Context Length: 4096
  Batch Size: 1
  Gradient Accumulation Steps: 1
- Sample Packing: Off for first phase, on for second phase
- Epochs: 2
-
- ## Limitations
-
+ Sample Packing: On for first phase, Off for second phase
+ Epochs: 2
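
For readers who want to see how the hyperparameters above fit together, here is a minimal QLoRA training sketch using the Hugging Face stack (transformers, peft, bitsandbytes). The card does not say which training framework was used (the option names suggest an axolotl config), so the output directory, LoRA target modules, quantization type, and data pipeline below are assumptions for illustration, not the author's actual setup.

```python
# Sketch only: maps the card's hyperparameters onto transformers + peft.
# Dataset loading, tokenization to 4096-token sequences, and sample packing
# are framework-specific and omitted here.
import torch
from peft import LoraConfig, get_peft_model, prepare_model_for_kbit_training
from transformers import AutoModelForCausalLM, BitsAndBytesConfig, TrainingArguments

base_model = "NousResearch/Hermes-2-Pro-Llama-3-8B"

# Precision: 4-bit (QLoRA keeps the frozen base weights quantized; NF4 is assumed)
bnb_config = BitsAndBytesConfig(
    load_in_4bit=True,
    bnb_4bit_quant_type="nf4",
    bnb_4bit_compute_dtype=torch.bfloat16,
)

model = AutoModelForCausalLM.from_pretrained(
    base_model, quantization_config=bnb_config, device_map="auto"
)
model = prepare_model_for_kbit_training(model)

# LoRA Rank: 256, LoRA Alpha: 256; the target modules are an assumption
lora_config = LoraConfig(
    r=256,
    lora_alpha=256,
    target_modules=["q_proj", "k_proj", "v_proj", "o_proj"],
    task_type="CAUSAL_LM",
)
model = get_peft_model(model, lora_config)

# Learning Rate: 1e-5, Scheduler: Cosine, Optimizer: adamw_bnb_8bit,
# Batch Size: 1, Gradient Accumulation Steps: 1, Epochs: 2
training_args = TrainingArguments(
    output_dir="alpesteibock-qlora",  # placeholder
    per_device_train_batch_size=1,
    gradient_accumulation_steps=1,
    learning_rate=1e-5,
    lr_scheduler_type="cosine",
    optim="adamw_bnb_8bit",
    num_train_epochs=2,
    bf16=True,
)
```

Sample packing and the 4096-token context length are trainer-level concerns; in trl's `SFTTrainer`, for example, both are exposed as configuration options that can be toggled between training phases.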
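
And a minimal inference sketch, assuming the fine-tuned model is published under the repo id `kaizuberbuehler/Alpesteibock-Llama-3-8B-Alpha` (inferred from this card, not stated in it) and loaded in the same 4-bit precision it was trained in:

```python
import torch
from transformers import AutoModelForCausalLM, AutoTokenizer, BitsAndBytesConfig

model_id = "kaizuberbuehler/Alpesteibock-Llama-3-8B-Alpha"  # assumed repo id

# Load in 4-bit to match the training precision and fit on a single consumer GPU
bnb_config = BitsAndBytesConfig(
    load_in_4bit=True,
    bnb_4bit_quant_type="nf4",
    bnb_4bit_compute_dtype=torch.bfloat16,
)

tokenizer = AutoTokenizer.from_pretrained(model_id)
model = AutoModelForCausalLM.from_pretrained(
    model_id, quantization_config=bnb_config, device_map="auto"
)

# "Tell me a short story about the Alps" in Swiss German
prompt = "Verzell mer e churzi Gschicht über d Alpe."
inputs = tokenizer(prompt, return_tensors="pt").to(model.device)
output = model.generate(**inputs, max_new_tokens=200, do_sample=True, temperature=0.8)
print(tokenizer.decode(output[0], skip_special_tokens=True))
```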