pszemraj committed on
Commit 0dbe377
1 Parent(s): 730ae8e

Update README.md

Files changed (1)
  1. README.md +11 -9
README.md CHANGED
@@ -1,26 +1,28 @@
 ---
 license: apache-2.0
-base_model: pszemraj/jamba-H1024_L12-v0.12-fineweb-100k-xlong_16k
+
 tags:
 - generated_from_trainer
 metrics:
 - accuracy
-model-index:
-- name: jamba-H1024_L12-v0.12-fineweb-100k-xlong_16k-knowledge-inoc-concat-v1-vN
-  results: []
 ---
 
-<!-- This model card has been generated automatically according to the information the Trainer had access to. You
-should probably proofread and complete it, then remove this comment. -->
+# jamba-H1024_L12-v0.13-KIx2
+
+
+This is a pretraining experiment on the `jamba` arch as a "smol MoE". Details:
 
-# jamba-H1024_L12-v0.12-fineweb-100k-xlong_16k-knowledge-inoc-concat-v1-vN
+- pretrained at context length 16384
+- seen approx 20b tokens
+- uses Claude3 tokenizer (as hf GPT2 tokenizer)
+- hidden size 1024, 12 layers, 8 experts
 
-This model is a fine-tuned version of [pszemraj/jamba-H1024_L12-v0.12-fineweb-100k-xlong_16k](https://huggingface.co/pszemraj/jamba-H1024_L12-v0.12-fineweb-100k-xlong_16k) on the BEE-spoke-data/knowledge-inoc-concat-v1 dataset.
-It achieves the following results on the evaluation set:
+most recent dataset, achieves the following results on the evaluation set:
 - Loss: 3.0366
 - Accuracy: 0.4514
 - Num Input Tokens Seen: 1975517184
 
+if I pretrain it further, other versions will be in new repos with incremented version (this is v0.13)
 
 ## Quick eval
 
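As a quick illustration of the configuration the updated card describes (hidden size 1024, 12 layers, 8 experts, 16k context, a GPT2-style tokenizer carrying the Claude3 vocab), here is a minimal loading sketch. It is not part of the commit: the repo id `pszemraj/jamba-H1024_L12-v0.13-KIx2` is inferred from the new title and may not match the actual repository, and native Jamba support assumes a recent `transformers` release (older versions may need `trust_remote_code=True` if the repo ships custom modeling code).

```python
# Minimal sketch, not from the commit: load the checkpoint the updated README
# describes and run a short generation. The repo id below is guessed from the
# new title "jamba-H1024_L12-v0.13-KIx2" and may differ from the real repo.
from transformers import AutoModelForCausalLM, AutoTokenizer

repo_id = "pszemraj/jamba-H1024_L12-v0.13-KIx2"  # hypothetical repo id

# The card says the tokenizer is the Claude3 vocab packaged as an hf GPT2
# tokenizer, so AutoTokenizer should resolve it like any GPT2-style tokenizer.
tokenizer = AutoTokenizer.from_pretrained(repo_id)
model = AutoModelForCausalLM.from_pretrained(repo_id)

prompt = "The weather today is"  # arbitrary example prompt
inputs = tokenizer(prompt, return_tensors="pt")
outputs = model.generate(**inputs, max_new_tokens=32)
print(tokenizer.decode(outputs[0], skip_special_tokens=True))

# Rough sanity check on the reported eval loss: if 3.0366 is mean
# cross-entropy in nats, perplexity is exp(3.0366) ≈ 20.8.
```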