Commit c4c521d (1 parent: 041913d), committed by denis-berezutskiy-lad
Update README layout.
README.md
CHANGED
@@ -16,7 +16,9 @@ pipeline_tag: token-classification
 
 This is a punctuator/capitalizer model for the Russian language, trained via NeMo scripts (https://github.com/NVIDIA/NeMo) on a dataset of continuous professional transcriptions (mostly legislative instances, and some OpenSubtitles as well) - see the dataset https://huggingface.co/datasets/denis-berezutskiy-lad/ru_transcription_punctuation for details.
 
-Note that even though the model was prepared using NeMo, the standard inference scripts for producing the result text don't work well with this model, because it has some advanced labels that require custom handling. That's why a set of ipynb scripts was created (covering model training and inference as well as creation of the above-mentioned dataset):
+Note that even though the model was prepared using NeMo, the standard inference scripts for producing the result text don't work well with this model, because it has some advanced labels that require custom handling. That's why a set of ipynb scripts was created (covering model training and inference as well as creation of the above-mentioned dataset):
+
+https://github.com/denis-berezutskiy-lad/transcription-bert-ru-punctuator-scripts/tree/main
 
 The underlying base model is https://huggingface.co/DeepPavlov/rubert-base-cased-conversational
 