jon-tow committed on
Commit
2724cd4
1 Parent(s): 49c5af4

Create README.md

Files changed (1)
  1. README.md +13 -0
README.md ADDED
@@ -0,0 +1,13 @@
+ ---
+ license: apache-2.0
+ datasets:
+ - EleutherAI/the_pile_deduplicated
+ language:
+ - en
+ ---
+
+ Pythia-2.8B Deduped 4K is a [Pythia-2.8B Deduped](https://huggingface.co/EleutherAI/pythia-2.8b-deduped) model fine-tuned with a 4096-token context length.
+ Training resumed from the step 143,000 checkpoint and continued on The Pile v1 Deduped (threshold=0.87).
+ This particular model comes from a checkpoint captured at step 175,500, after an extra 134,217,728,000 tokens of training.
+
+ Note: sequence-length warmup was not used when increasing the context from 2048 to 4096; in hindsight, it should have been applied.
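In GPT-NeoX-family models like Pythia, the context length is governed by the `max_position_embeddings` field of the model config. A minimal sketch of the change this fine-tune implies, using the standard `transformers` `GPTNeoXConfig` class (this is an illustration only, not the model's actual published config file):

```python
from transformers import GPTNeoXConfig

# Base Pythia-2.8B Deduped ships with a 2048-token position limit;
# this fine-tuned variant trains with a 4096-token window instead.
config = GPTNeoXConfig(max_position_embeddings=4096)

print(config.max_position_embeddings)  # 4096
```

Loading the fine-tuned checkpoint through `AutoModelForCausalLM.from_pretrained` would pick this value up from the repository's `config.json` automatically.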