Update README.md
README.md CHANGED
@@ -32,7 +32,7 @@ You can use this model directly with a pipeline for token classification :
 
 ## Training data
 
-[
+[CLUENER2020](https://github.com/CLUEbenchmark/CLUENER2020) is used as training data. We only use the train set of the dataset.
 
 ## Training procedure
 
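The hunk context above says the model can be used directly with a pipeline for token classification. A minimal sketch of that usage follows; the repository ID below is an assumption for illustration, not stated in this commit, so substitute this model's actual Hugging Face ID:

```python
from transformers import AutoModelForTokenClassification, AutoTokenizer, pipeline

# Assumed repository ID for illustration -- substitute this model's actual ID.
model_id = "uer/roberta-base-finetuned-cluener2020-chinese"

model = AutoModelForTokenClassification.from_pretrained(model_id)
tokenizer = AutoTokenizer.from_pretrained(model_id)

# Token-classification ("ner") pipeline, as the README context line describes.
ner = pipeline("ner", model=model, tokenizer=tokenizer)
print(ner("江苏警方通报特斯拉冲进店铺"))
```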
@@ -44,7 +44,7 @@ python3 run_ner.py --pretrained_model_path models/cluecorpussmall_roberta_base_s
 --train_path datasets/cluener2020/train.tsv \
 --dev_path datasets/cluener2020/dev.tsv \
 --label2id_path datasets/cluener2020/label2id.json \
---output_model_path models/
+--output_model_path models/cluener2020_ner_model.bin \
 --learning_rate 3e-5 --batch_size 32 --epochs_num 5 --seq_length 512 \
 --embedding word_pos_seg --encoder transformer --mask fully_visible
 ```
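The training command above reads its tag set from `datasets/cluener2020/label2id.json`, which this commit does not show. A hedged sketch of its likely shape, assuming the usual BIO tagging over CLUENER2020's ten categories:

```python
# Sketch of datasets/cluener2020/label2id.json as a Python dict, assuming a
# BIO scheme over CLUENER2020's ten categories (address, book, company, game,
# government, movie, name, organization, position, scene). The actual file is
# not shown in this commit, so treat the exact tags and ids as guesses.
label2id = {
    "O": 0,
    "B-address": 1, "I-address": 2,
    "B-book": 3, "I-book": 4,
    # ...one B-/I- pair per remaining category, through "scene"
}
```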
@@ -52,7 +52,7 @@ python3 run_ner.py --pretrained_model_path models/cluecorpussmall_roberta_base_s
 Finally, we convert the pre-trained model into Huggingface's format:
 
 ```
-python3 scripts/convert_bert_token_classification_from_uer_to_huggingface.py --input_model_path models/
+python3 scripts/convert_bert_token_classification_from_uer_to_huggingface.py --input_model_path models/cluener2020_ner_model.bin \
 --output_model_path pytorch_model.bin \
 --layers_num 12
 ```
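Once the conversion script has produced `pytorch_model.bin`, a quick generic PyTorch check (not part of this commit) confirms the checkpoint loads:

```python
import torch

# Load the converted checkpoint on CPU and list a few parameter names
# and shapes as a sanity check.
state_dict = torch.load("pytorch_model.bin", map_location="cpu")
print(f"{len(state_dict)} tensors in checkpoint")
for name in list(state_dict)[:5]:
    print(name, tuple(state_dict[name].shape))
```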