MarcosDib committed a854db5 · 1 parent: 0d2996b

Update README.md

Files changed (1): README.md (+7 -8)
@@ -83,7 +83,7 @@ Other 24 smaller models are released afterward.
 The detailed release history can be found on the [here](https://huggingface.co/unb-lamfo-nlp-mcti) on github.
 
 | Model | #params | Language |
-|------------------------------|--------------------|-------|
+|:----------------------------:|:-------:|:--------:|
 | [`mcti-base-uncased`] | 110M | English |
 | [`mcti-large-uncased`] | 340M | English | sub
 | [`mcti-base-cased`] | 110M | English |
@@ -91,7 +91,7 @@ The detailed release history can be found on the [here](https://huggingface.co/u
 | [`-base-multilingual-cased`] | 110M | Multiple |
 
 | Dataset | Compatibility to base* |
-|--------------------------------------|------------------------|
+|:------------------------------------:|:----------------------:|
 | Labeled MCTI | 100% |
 | Full MCTI | 100% |
 | BBC News Articles | 56.77% |
@@ -202,14 +202,13 @@ The following assumptions were considered:
 - Preprocessing experiments compare accuracy in a shallow neural network (SNN);
 - Pre-processing was investigated for the classification goal.
 
-From the Database obtained in Meta 4, stored in the project's [GitHub](github.com/mcti-sefip/mcti-sefip-ppfcd2020/blob/scraps-desenvolvimento/Rotulagem/db_PPF_validacao_para%20UNB_%20FINAL), a Notebook was developed in [Google Colab](colab.research.google.com)
-to implement the [pre-processing code](github.com/mcti-sefip/mcti-sefip-ppfcd2020/blob/pre-
-processamento/Pre_Processamento/MCTI_PPF_Pr%C3%A9_processamento), which also can be found on the project's GitHub.
+From the Database obtained in Meta 4, stored in the project's [GitHub](github.com/mcti-sefip/mcti-sefip-ppfcd2020/blob/scraps-desenvolvimento/Rotulagem/db_PPF_validacao_para%20UNB_%20FINAL.xlsx), a Notebook was developed in [Google Colab](colab.research.google.com)
+to implement the [pre-processing code](github.com/mcti-sefip/mcti-sefip-ppfcd2020/blob/pre-processamento/Pre_Processamento/MCTI_PPF_Pr%C3%A9_processamento.ipynb), which also can be found on the project's GitHub.
 
 Several Python packages were used to develop the preprocessing code:
 
 | Objective | Package |
-|--------------------------------------------------------|--------------|
+|:------------------------------------------------------:|:------------:|
 | Resolve contractions and slang usage in text | [contractions](https://pypi.org/project/contractions) |
 | Natural Language Processing | [nltk](https://pypi.org/project/nltk) |
 | Others data manipulations and calculations included in Python 3.10: io, json, math, re (regular expressions), shutil, time, unicodedata; | [numpy](https://pypi.org/project/numpy) |
@@ -217,7 +216,7 @@ Several Python packages were used to develop the preprocessing code:
 | http library | [requests](https://pypi.org/project/requests) |
 | Training model | [scikit-learn](https://pypi.org/project/scikit-learn) |
 | Machine learning | [tensorflow](https://pypi.org/project/tensorflow) |
-| Machine learning | [keras](keras.io) |
+| Machine learning | [keras](https://keras.io/) |
 | Translation from multiple languages to English | [translators](https://pypi.org/project/translators) |
 
 
@@ -225,7 +224,7 @@ As detailed in the notebook on [GitHub](https://github.com/mcti-sefip/mcti-sefip
 bases, derived from the base of goal 4, with the application of the methods shown in Figure 2.
 
 | Base | Textos originais |
-|--------|--------------------------------------------------------------|
+|:------:|:------------------------------------------------------------:|
 | xp1 | Expandir Contrações |
 | xp2 | Expandir Contrações + Transformar texto em minúsculo |
 | xp3 | Expandir Contrações + Remover Pontuação |
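
Reviewer note: the xp1 to xp3 rows in the last hunk name combinations of preprocessing steps (in Portuguese: "Expandir Contrações" is expand contractions, "Transformar texto em minúsculo" is lowercase the text, "Remover Pontuação" is remove punctuation). The sketch below illustrates how such bases could be composed from small step functions. It is not the project's notebook code: the function names, the `PIPELINES` mapping, and the three-entry contraction map are assumptions made for illustration, and the README lists the `contractions` package for the expansion step rather than a hand-written map.

```python
import re
import string

# Hypothetical, deliberately tiny contraction map (assumption for this sketch).
# The README names the `contractions` package for this step instead.
CONTRACTIONS = {"don't": "do not", "it's": "it is", "can't": "cannot"}

# Match any known contraction as a whole word, ignoring case.
_PATTERN = re.compile(
    r"\b(?:" + "|".join(re.escape(k) for k in CONTRACTIONS) + r")\b",
    re.IGNORECASE,
)


def expand_contractions(text: str) -> str:
    """Replace known contractions; the replacement is always lowercase."""
    return _PATTERN.sub(lambda m: CONTRACTIONS[m.group(0).lower()], text)


def to_lowercase(text: str) -> str:
    """Lowercase the whole text."""
    return text.lower()


def remove_punctuation(text: str) -> str:
    """Drop ASCII punctuation characters."""
    return text.translate(str.maketrans("", "", string.punctuation))


# Step lists mirroring the xp1..xp3 rows of the table (names are assumptions).
PIPELINES = {
    "xp1": [expand_contractions],
    "xp2": [expand_contractions, to_lowercase],
    "xp3": [expand_contractions, remove_punctuation],
}


def build_base(name: str, text: str) -> str:
    """Apply the steps of the named base to the text, in order."""
    for step in PIPELINES[name]:
        text = step(text)
    return text
```

Keeping each step as a plain function makes it easy to add further combinations as new entries in the mapping without touching the steps themselves.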