julien-c HF staff committed on
Commit
a362e53
1 Parent(s): 231e75d

Migrate model card from transformers-repo

Read announcement at https://discuss.huggingface.co/t/announcement-all-model-cards-will-be-migrated-to-hf-co-model-repos/2755
Original file history: https://github.com/huggingface/transformers/commits/master/model_cards/allegro/herbert-base-cased/README.md

Files changed (1)
  1. README.md +51 -0
README.md ADDED
@@ -0,0 +1,51 @@
+ ---
+ language: pl
+ tags:
+ - herbert
+ license: cc-by-sa-4.0
+ ---
+
+ # HerBERT
+ **[HerBERT](https://en.wikipedia.org/wiki/Zbigniew_Herbert)** is a BERT-based language model trained on Polish corpora
+ using masked language modelling (MLM) and the sentence structural objective (SSO), with dynamic masking of whole words.
+ Model training and experiments were conducted with version 2.9 of [transformers](https://github.com/huggingface/transformers).
+
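To make "dynamic masking of whole words" more concrete, here is a minimal pure-Python sketch. It is not the authors' training code: the `</w>` end-of-word suffix, the `<mask>` string, the masking rates, and the function names `group_into_words` and `mask_whole_words` are all assumptions chosen for illustration, and the usual BERT-style random/keep replacement is left out. It only shows that every subword of a selected word is masked together, and that the selection is re-drawn each time a sentence is visited instead of being fixed once during preprocessing.

```python
import random

MASK = "<mask>"       # assumed mask token string, for illustration only
END_OF_WORD = "</w>"  # assumed suffix marking the last subword of a word


def group_into_words(tokens):
    """Return lists of token indices, one list per whole word."""
    words, current = [], []
    for i, tok in enumerate(tokens):
        current.append(i)
        if tok.endswith(END_OF_WORD):
            words.append(current)
            current = []
    if current:  # trailing pieces without an end-of-word marker
        words.append(current)
    return words


def mask_whole_words(tokens, rng, mask_prob=0.15):
    """Mask every subword of each randomly selected word."""
    masked = list(tokens)
    for word in group_into_words(tokens):
        if rng.random() < mask_prob:
            for i in word:
                masked[i] = MASK
    return masked


# Hypothetical subword split of "ślepy dziad na sznurku".
tokens = ["ś", "le", "py</w>", "dzi", "ad</w>", "na</w>", "sz", "nur", "ku</w>"]
rng = random.Random(0)

# "Dynamic": a fresh mask is sampled on every pass over the sentence.
for epoch in range(3):
    print(epoch, mask_whole_words(tokens, rng, mask_prob=0.5))
```

Because the random generator is consulted on every call, the same sentence can receive a different mask in each epoch, while a word is always either fully masked or left intact.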
+ ## Tokenizer
+ The training dataset was tokenized into subwords using ``CharBPETokenizer``, a character-level byte-pair encoding with
+ a vocabulary size of 50k tokens. The tokenizer itself was trained with the [tokenizers](https://github.com/huggingface/tokenizers) library.
+ We encourage you to use the **Fast** version of the tokenizer, namely ``HerbertTokenizerFast``.
+
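As a rough illustration of what character-level byte-pair encoding does, the sketch below learns a few merge rules on a toy word list and applies them to a new word. It is a simplified pure-Python version, not the `tokenizers` implementation: the function names `merge_pair`, `learn_bpe` and `encode`, the toy corpus, and the `</w>` end-of-word suffix are assumptions made for this example.

```python
from collections import Counter


def to_symbols(word):
    """Split a non-empty word into characters, tagging the last one as word-final."""
    return tuple(word[:-1]) + (word[-1] + "</w>",)


def merge_pair(symbols, pair):
    """Replace each adjacent occurrence of `pair` with one merged symbol."""
    out, i = [], 0
    while i < len(symbols):
        if i + 1 < len(symbols) and (symbols[i], symbols[i + 1]) == pair:
            out.append(symbols[i] + symbols[i + 1])
            i += 2
        else:
            out.append(symbols[i])
            i += 1
    return tuple(out)


def learn_bpe(words, num_merges):
    """Learn up to `num_merges` merge rules, most frequent pair first."""
    vocab = Counter(to_symbols(w) for w in words if w)
    merges = []
    for _ in range(num_merges):
        pairs = Counter()
        for symbols, freq in vocab.items():
            for pair in zip(symbols, symbols[1:]):
                pairs[pair] += freq
        if not pairs:
            break
        best = max(pairs, key=pairs.get)  # ties: the first pair seen wins
        merges.append(best)
        new_vocab = Counter()
        for symbols, freq in vocab.items():
            new_vocab[merge_pair(symbols, best)] += freq
        vocab = new_vocab
    return merges


def encode(word, merges):
    """Apply the learned merges, in order, to a single word."""
    symbols = to_symbols(word)
    for pair in merges:
        symbols = merge_pair(symbols, pair)
    return list(symbols)


corpus = ["droga", "drogi", "drodze", "las", "lasu"]  # toy data
merges = learn_bpe(corpus, num_merges=3)
print(merges)
print(encode("drogami", merges))
```

A real tokenizer repeats the merge step until the target vocabulary size (50k tokens here) is reached; three merges are enough to show how frequent character sequences grow into subword units.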
+ ## HerBERT usage
+
+
+ Example code:
+ ```python
+ from transformers import AutoTokenizer, AutoModel
+
+ tokenizer = AutoTokenizer.from_pretrained("allegro/herbert-base-cased")
+ model = AutoModel.from_pretrained("allegro/herbert-base-cased")
+
+ output = model(
+     **tokenizer.batch_encode_plus(
+         [
+             (  # one sentence pair, encoded together
+                 "A potem szedł środkiem drogi w kurzawie, bo zamiatał nogami, ślepy dziad prowadzony przez tłustego kundla na sznurku.",
+                 "A potem leciał od lasu chłopak z butelką, ale ten ujrzawszy księdza przy drodze okrążył go z dala i biegł na przełaj pól do karczmy."
+             )
+         ],
+         padding='longest',
+         add_special_tokens=True,
+         return_tensors='pt'
+     )
+ )
+ ```
+
+
+ ## License
+ CC BY-SA 4.0
+
+
+ ## Authors
+ The model was trained by the **Allegro Machine Learning Research** team.
+
+ You can contact us at: <a href="mailto:klejbenchmark@allegro.pl">klejbenchmark@allegro.pl</a>