Update README.md
README.md CHANGED
@@ -1,26 +1,28 @@
 ---
 license: apache-2.0
 tags:
 - generated_from_trainer
 metrics:
 - accuracy
-model-index:
-- name: jamba-H1024_L12-v0.12-fineweb-100k-xlong_16k-knowledge-inoc-concat-v1-vN
-  results: []
 ---
 
-It achieves the following results on the evaluation set:
+# jamba-H1024_L12-v0.13-KIx2
+
+This is a pretraining experiment on the `jamba` arch as a "smol MoE". Details:
+
+- pretrained at context length 16384
+- seen approx 20b tokens
+- uses the Claude3 tokenizer (as an hf GPT2 tokenizer)
+- hidden size 1024, 12 layers, 8 experts
+
+Trained on the most recent dataset, it achieves the following results on the evaluation set:
 - Loss: 3.0366
 - Accuracy: 0.4514
 - Num Input Tokens Seen: 1975517184
 
+If I pretrain it further, other versions will go in new repos with an incremented version number (this is v0.13).
 
 ## Quick eval
 
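For a rough sense of scale, the reported eval loss maps directly to a perplexity (perplexity = exp(loss)); a quick back-of-the-envelope check:

```python
import math

eval_loss = 3.0366          # eval loss reported above (cross-entropy, nats per token)
print(math.exp(eval_loss))  # ~20.8, the implied eval-set perplexity
```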
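A minimal sketch of what a quick eval could look like, assuming a placeholder repo id (`user/jamba-H1024_L12-v0.13-KIx2` below is hypothetical, substitute the real one) and that the checkpoint loads through `transformers` via `AutoModelForCausalLM` / `AutoTokenizer` (`trust_remote_code=True` is included only in case the small jamba variant ships custom modeling code):

```python
import math

import torch
from transformers import AutoModelForCausalLM, AutoTokenizer

repo_id = "user/jamba-H1024_L12-v0.13-KIx2"  # hypothetical repo id

# the card notes the Claude3 tokenizer is packaged as an hf GPT2 tokenizer,
# so AutoTokenizer should resolve it without extra arguments
tok = AutoTokenizer.from_pretrained(repo_id)
model = AutoModelForCausalLM.from_pretrained(repo_id, trust_remote_code=True)
model.eval()

# score any held-out text; the model was pretrained at a 16384-token context
text = "Some held-out evaluation text goes here. " * 200
ids = tok(text, return_tensors="pt", truncation=True, max_length=16384).input_ids

with torch.no_grad():
    out = model(ids, labels=ids)  # labels=input_ids gives the standard shifted LM loss

print(f"loss={out.loss.item():.4f}  ppl={math.exp(out.loss.item()):.2f}")
```

A single short sample will not reproduce the table above, which was computed over the full evaluation set; this is only a smoke test that the checkpoint loads and scores text.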