sam-mosaic/pile-of-law-tokenizer


sam-mosaic committed on Dec 21, 2022

Commit 8cf8528 · 1 Parent(s): 246cc63

Update README.md

Files changed (1)

README.md +8 -0

README.md CHANGED

@@ -1,3 +1,11 @@
 # Pile of Law Tokenizer
 This tokenizer should be a drop-in replacement for the `GPT2Tokenizer`. It has the same vocabulary size and special tokens, but was trained on 1M randomly sampled examples from the train split of [the Pile of Law](https://huggingface.co/datasets/pile-of-law/pile-of-law).

+Usage:
+```python
+from transformers import AutoTokenizer
+tokenizer = AutoTokenizer.from_pretrained("sam-mosaic/pile-of-law-tokenizer")
+```
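The README's drop-in claim rests on two properties: vocabulary size and special tokens. Both can be checked directly. The sketch below is a minimal, hypothetical helper. `drop_in_mismatches` is not part of this repository or of `transformers`. It assumes only that both tokenizers expose the `vocab_size` and `special_tokens_map` attributes that Hugging Face `transformers` tokenizers provide. It does not compare merges or token IDs.

```python
from typing import Any, List


def drop_in_mismatches(candidate: Any, reference: Any) -> List[str]:
    """List differences in vocabulary size and special tokens.

    Hypothetical helper: works with any objects exposing `vocab_size` (int)
    and `special_tokens_map` (dict), as Hugging Face tokenizers do.
    An empty list means the two agree on both properties.
    """
    mismatches: List[str] = []

    # Same vocabulary size, so embedding/output layer shapes stay unchanged.
    if candidate.vocab_size != reference.vocab_size:
        mismatches.append(
            f"vocab_size: {candidate.vocab_size} != {reference.vocab_size}"
        )

    # Same special tokens (e.g. bos/eos/unk), compared role by role.
    cand_map = candidate.special_tokens_map
    ref_map = reference.special_tokens_map
    for role in sorted(set(cand_map) | set(ref_map)):
        if cand_map.get(role) != ref_map.get(role):
            mismatches.append(
                f"{role}: {cand_map.get(role)!r} != {ref_map.get(role)!r}"
            )

    return mismatches
```

This requires `transformers` to be installed and access to the Hub. With both in place, `drop_in_mismatches(AutoTokenizer.from_pretrained("sam-mosaic/pile-of-law-tokenizer"), AutoTokenizer.from_pretrained("gpt2"))` should return an empty list if the claim holds. Matching these properties makes the tokenizers interchangeable only at the interface level. Because the merges were learned on different text, the same string will generally encode to different token IDs than under GPT-2's tokenizer.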