Update README.md
README.md
CHANGED
@@ -83,7 +83,7 @@ Other 24 smaller models are released afterward.
 The detailed release history can be found on the [here](https://huggingface.co/unb-lamfo-nlp-mcti) on github.
 
 | Model | #params | Language |
-
+|:----------------------------:|:-------:|:--------:|
 | [`mcti-base-uncased`] | 110M | English |
 | [`mcti-large-uncased`] | 340M | English | sub
 | [`mcti-base-cased`] | 110M | English |
@@ -91,7 +91,7 @@ The detailed release history can be found on the [here](https://huggingface.co/u
 | [`-base-multilingual-cased`] | 110M | Multiple |
 
 | Dataset | Compatibility to base* |
-
+|:------------------------------------:|:----------------------:|
 | Labeled MCTI | 100% |
 | Full MCTI | 100% |
 | BBC News Articles | 56.77% |
@@ -202,14 +202,13 @@ The following assumptions were considered:
 - Preprocessing experiments compare accuracy in a shallow neural network (SNN);
 - Pre-processing was investigated for the classification goal.
 
-From the Database obtained in Meta 4, stored in the project's [GitHub](github.com/mcti-sefip/mcti-sefip-ppfcd2020/blob/scraps-desenvolvimento/Rotulagem/db_PPF_validacao_para%20UNB_%20FINAL), a Notebook was developed in [Google Colab](colab.research.google.com)
-to implement the [pre-processing code](github.com/mcti-sefip/mcti-sefip-ppfcd2020/blob/pre-
-processamento/Pre_Processamento/MCTI_PPF_Pr%C3%A9_processamento), which also can be found on the project's GitHub.
+From the Database obtained in Meta 4, stored in the project's [GitHub](github.com/mcti-sefip/mcti-sefip-ppfcd2020/blob/scraps-desenvolvimento/Rotulagem/db_PPF_validacao_para%20UNB_%20FINAL.xlsx), a Notebook was developed in [Google Colab](colab.research.google.com)
+to implement the [pre-processing code](github.com/mcti-sefip/mcti-sefip-ppfcd2020/blob/pre-processamento/Pre_Processamento/MCTI_PPF_Pr%C3%A9_processamento.ipynb), which also can be found on the project's GitHub.
 
 Several Python packages were used to develop the preprocessing code:
 
 | Objective | Package |
-
+|:------------------------------------------------------:|:------------:|
 | Resolve contractions and slang usage in text | [contractions](https://pypi.org/project/contractions) |
 | Natural Language Processing | [nltk](https://pypi.org/project/nltk) |
 | Others data manipulations and calculations included in Python 3.10: io, json, math, re (regular expressions), shutil, time, unicodedata; | [numpy](https://pypi.org/project/numpy) |
@@ -217,7 +216,7 @@ Several Python packages were used to develop the preprocessing code:
 | http library | [requests](https://pypi.org/project/requests) |
 | Training model | [scikit-learn](https://pypi.org/project/scikit-learn) |
 | Machine learning | [tensorflow](https://pypi.org/project/tensorflow) |
-| Machine learning | [keras](keras.io) |
+| Machine learning | [keras](https://keras.io/) |
 | Translation from multiple languages to English | [translators](https://pypi.org/project/translators) |
 
 
@@ -225,7 +224,7 @@ As detailed in the notebook on [GitHub](https://github.com/mcti-sefip/mcti-sefip
 bases, derived from the base of goal 4, with the application of the methods shown in Figure 2.
 
 | Base | Textos originais |
-
+|:------:|:------------------------------------------------------------:|
 | xp1 | Expandir Contrações |
 | xp2 | Expandir Contrações + Transformar texto em minúsculo |
 | xp3 | Expandir Contrações + Remover Pontuação |
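For orientation, the xp1 to xp3 rows in the last hunk translate to: expand contractions (xp1), expand contractions and lowercase the text (xp2), and expand contractions and remove punctuation (xp3). The Python sketch below illustrates those three variants under stated assumptions and is not the notebook's code. The README lists the `contractions` package for this step, whereas here a tiny hand-written `CONTRACTIONS` map stands in for it so the example needs only the standard library. The function names `xp1`, `xp2`, `xp3` and the use of `string.punctuation` are likewise hypothetical.

```python
import re
import string

# Hypothetical, deliberately tiny contraction map (an assumption for this
# sketch). The README lists the `contractions` package for this step.
CONTRACTIONS = {
    "don't": "do not",
    "it's": "it is",
    "can't": "cannot",
    "we're": "we are",
}

# One alternation over all known contractions, matched case-insensitively.
_PATTERN = re.compile(
    r"\b(" + "|".join(re.escape(key) for key in CONTRACTIONS) + r")\b",
    flags=re.IGNORECASE,
)

# Translation table that deletes every ASCII punctuation character.
_DROP_PUNCTUATION = str.maketrans("", "", string.punctuation)


def expand_contractions(text: str) -> str:
    """Replace each known contraction with its expanded (lowercase) form."""
    return _PATTERN.sub(lambda m: CONTRACTIONS[m.group(0).lower()], text)


def xp1(text: str) -> str:
    """xp1: expand contractions only."""
    return expand_contractions(text)


def xp2(text: str) -> str:
    """xp2: expand contractions, then lowercase the text."""
    return expand_contractions(text).lower()


def xp3(text: str) -> str:
    """xp3: expand contractions, then remove punctuation."""
    return expand_contractions(text).translate(_DROP_PUNCTUATION)


sample = "We don't know; it's OPEN!"
print(xp1(sample))  # We do not know; it is OPEN!
print(xp2(sample))  # we do not know; it is open!
print(xp3(sample))  # We do not know it is OPEN
```

Contractions are expanded before punctuation is removed. Stripping the apostrophe first would turn "don't" into "dont", which the map would no longer match.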