sam-mosaic
committed on
Commit
•
246cc63
1
Parent(s):
4e52eb0
Create README.md
README.md
ADDED
@@ -0,0 +1,3 @@
# Pile of Law Tokenizer
This tokenizer should be a drop-in replacement for the GPT2Tokenizer. It has the same vocabulary size and special tokens, but was trained on 1M samples drawn at random from the train split of [Pile of Law](https://huggingface.co/datasets/pile-of-law/pile-of-law).
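
The drop-in claim can be checked with a short sketch like the one below. It compares the two properties mentioned above, vocabulary size and special tokens, against the stock GPT-2 tokenizer. The helper names `drop_in_report` and `load_and_check` are illustrative, and the repository id in the usage comment is a hypothetical placeholder, since this README does not state the id; substitute the actual one.

```python
def drop_in_report(candidate, reference):
    """Compare two tokenizers on the properties a drop-in replacement must share.

    Both arguments only need a `vocab_size` attribute and a
    `special_tokens_map` attribute, which Hugging Face tokenizers provide.
    """
    return {
        "same_vocab_size": candidate.vocab_size == reference.vocab_size,
        "same_special_tokens": (
            candidate.special_tokens_map == reference.special_tokens_map
        ),
    }


def load_and_check(repo_id, reference_id="gpt2"):
    """Download both tokenizers from the Hub and compare them.

    `repo_id` is an assumption supplied by the caller; it is not given in
    this README. Requires the `transformers` package and network access.
    """
    from transformers import AutoTokenizer  # imported lazily on purpose

    candidate = AutoTokenizer.from_pretrained(repo_id)
    reference = AutoTokenizer.from_pretrained(reference_id)
    return drop_in_report(candidate, reference)


# Usage (hypothetical repository id, replace with the real one):
# report = load_and_check("your-namespace/pile-of-law-tokenizer")
# print(report)
```

If both entries of the report are `True`, the tokenizer can be swapped in wherever a model expects GPT-2's vocabulary size and special tokens. The token ids themselves will still differ, because the merges were learned from legal text, so a model trained with the original GPT-2 tokenizer would need retraining.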