---
datasets:
- armvectores/hy_wikipedia_2023
pipeline_tag: feature-extraction
language:
- hy
library_name: fasttext
---

Trained on 414M tokens:

1) 73M from hy (Armenian) Wikipedia

2) 341M from the arlis database

Vocabulary: 74951 unique words

Character n-grams: 3-5

Window length: 5

Embedding dimension: 300

Model: skipgram

Minimum word count: 150

100 epochs, starting learning rate 0.05

Training time: 26 hours on 20 Xeon Gold cores
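
The settings listed above map onto keyword arguments of `fasttext.train_unsupervised`. The following is only a sketch of how a comparable model might be trained, not the exact command used for this model: the corpus path is a hypothetical placeholder, and `thread=20` is an assumption based on the 20 cores mentioned above.

```python
# Hyperparameters from the model card, expressed with the keyword names
# accepted by fasttext.train_unsupervised.
TRAIN_PARAMS = {
    "model": "skipgram",  # skipgram
    "dim": 300,           # embedding dimension
    "ws": 5,              # window length
    "minn": 3,            # shortest character n-gram
    "maxn": 5,            # longest character n-gram
    "minCount": 150,      # minimum word count
    "epoch": 100,         # number of epochs
    "lr": 0.05,           # starting learning rate
    "thread": 20,         # assumption: one thread per core
}


def train(corpus_path, output_path="output.bin"):
    """Train and save a model; corpus_path is a hypothetical plain-text corpus file."""
    # Imported here so the parameter mapping can be inspected without fastText installed.
    import fasttext

    model = fasttext.train_unsupervised(corpus_path, **TRAIN_PARAMS)
    model.save_model(output_path)
    return model
```

Calling `train("corpus.txt")` would write `output.bin`; with a corpus of this size, expect a long run.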

How to use

1) Install fastText

```
pip install fasttext-wheel
```

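2) Load the model in Python and query nearest neighbors

```
import fasttext

# Load the binary model file
model = fasttext.load_model('output.bin')

# Nearest neighbors of 'զենքեր' ("weapons"), returned as (similarity, word) pairs
model.get_nearest_neighbors('զենքեր')
```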
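
Beyond nearest neighbors, two words can be compared directly through their vectors. The sketch below assumes a model loaded as above; `cosine_similarity` and `word_similarity` are illustrative helper names, not part of fastText. Only `get_word_vector` is a fastText call.

```python
import math


def cosine_similarity(u, v):
    """Cosine similarity of two equal-length vectors; returns 0.0 if either is all zeros."""
    dot = sum(float(a) * float(b) for a, b in zip(u, v))
    norm_u = math.sqrt(sum(float(a) ** 2 for a in u))
    norm_v = math.sqrt(sum(float(b) ** 2 for b in v))
    if norm_u == 0.0 or norm_v == 0.0:
        return 0.0
    return dot / (norm_u * norm_v)


def word_similarity(model, word_a, word_b):
    """Compare two words using the vectors returned by model.get_word_vector."""
    return cosine_similarity(
        model.get_word_vector(word_a),
        model.get_word_vector(word_b),
    )
```

For example, `word_similarity(model, 'զենք', 'զենքեր')` compares the singular and plural forms. Because the model uses character n-grams, `get_word_vector` also returns a vector for words outside the 74951-word vocabulary.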