sam-mosaic committed on
Commit
246cc63
1 Parent(s): 4e52eb0

Create README.md

  1. README.md +3 -0
README.md ADDED
@@ -0,0 +1,3 @@
 
 
 
 
+ # Pile of Law Tokenizer
+
+ This tokenizer is intended as a drop-in replacement for `GPT2Tokenizer`. It has the same vocabulary size and special tokens, but was trained on 1M randomly sampled examples from the train split of [Pile of Law](https://huggingface.co/datasets/pile-of-law/pile-of-law).
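
The drop-in claim rests on two properties: matching vocabulary size and matching special tokens. The sketch below shows one way to check them. It is not part of this commit. The helper names `drop_in_mismatches` and `check_against_gpt2` are hypothetical, and `repo_id` is a placeholder because the README does not name the repository. The offline demo uses stand-in objects with GPT-2's settings (50257 tokens, `<|endoftext|>` for every special token) instead of downloading anything.

```python
from types import SimpleNamespace


def drop_in_mismatches(candidate, reference):
    """Return human-readable differences that would break drop-in use.

    Both arguments only need `vocab_size` and `special_tokens_map`,
    attributes that Hugging Face tokenizers expose.
    """
    problems = []
    if candidate.vocab_size != reference.vocab_size:
        problems.append(
            f"vocab_size: {candidate.vocab_size} != {reference.vocab_size}"
        )
    if candidate.special_tokens_map != reference.special_tokens_map:
        problems.append(
            "special tokens: "
            f"{candidate.special_tokens_map} != {reference.special_tokens_map}"
        )
    return problems


def check_against_gpt2(repo_id):
    """Compare a Hub tokenizer with GPT-2 (needs `transformers` and network)."""
    from transformers import AutoTokenizer  # imported lazily on purpose

    candidate = AutoTokenizer.from_pretrained(repo_id)  # placeholder repo id
    reference = AutoTokenizer.from_pretrained("gpt2")
    return drop_in_mismatches(candidate, reference)


# Offline demonstration with stand-in objects that mimic GPT-2's settings.
gpt2_special = {
    "bos_token": "<|endoftext|>",
    "eos_token": "<|endoftext|>",
    "unk_token": "<|endoftext|>",
}
gpt2_like = SimpleNamespace(vocab_size=50257, special_tokens_map=gpt2_special)
same_shape = SimpleNamespace(vocab_size=50257, special_tokens_map=dict(gpt2_special))
smaller = SimpleNamespace(vocab_size=32000, special_tokens_map=dict(gpt2_special))

print(drop_in_mismatches(same_shape, gpt2_like))  # no differences reported
print(drop_in_mismatches(smaller, gpt2_like))     # reports the vocab size gap
```

An empty list means the two tokenizers agree on both properties. It does not mean they produce the same token IDs for a given text, since the merges were learned from different data.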