wikipedia_arlis_tokens_fasttextskipgram_300_5

	---
	datasets:
	- armvectores/hy_wikipedia_2023
	pipeline_tag: feature-extraction
	language:
	- hy
	library_name: fasttext
	---

	Training corpus: 414M tokens in total
	1) 73M tokens from the Armenian (hy) Wikipedia
	2) 341M tokens from the ARLIS database

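	Vocabulary: 74951 unique words

	Character n-grams: lengths 3 to 5

	Context window: 5

	Embedding dimension: 300

	Architecture: skipgram

	Minimum word frequency: 150

	Training: 100 epochs, initial learning rate 0.05

	Training time: 26 hours on 20 Intel Xeon Gold cores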
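
The settings listed above roughly correspond to keyword arguments of `fasttext.train_unsupervised`. The block below is only a sketch of that mapping, not the authors' training script: the corpus path is hypothetical, and reading the minimum word setting as `minCount` and the 20 cores as `thread=20` are assumptions.

```python
# Hyperparameters from this model card, expressed as keyword arguments of
# fasttext.train_unsupervised. The minCount and thread values are assumed
# readings of the card, not confirmed settings.
TRAIN_PARAMS = {
    "model": "skipgram",  # skipgram architecture
    "dim": 300,           # embedding dimension
    "ws": 5,              # context window length
    "minn": 3,            # shortest character n-gram
    "maxn": 5,            # longest character n-gram
    "minCount": 150,      # assumption: minimum word frequency
    "epoch": 100,         # number of epochs
    "lr": 0.05,           # initial learning rate
    "thread": 20,         # assumption: one thread per core
}


def train(corpus_path):
    """Train a model on a plain-text corpus (corpus_path is hypothetical)."""
    # Imported here so the parameter table can be inspected without fastText.
    import fasttext

    return fasttext.train_unsupervised(corpus_path, **TRAIN_PARAMS)
```

Calling `train("corpus.txt")` on a tokenized text file would start training with these values; on a corpus of this size that is expected to take many hours.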

	How to use

	1) Install fastText and the Hugging Face Hub client

	```
	pip install fasttext-wheel huggingface_hub
	```

	2) Download the model and load it in Python

	```
	import fasttext
	from huggingface_hub import hf_hub_download

	# Download model.bin from the Hub into the current directory
	model_path = hf_hub_download(
	    repo_id="armvectores/wikipedia_arlis_tokens_fasttextskipgram_300_5",
	    filename="model.bin",
	    local_dir=".",
	)
	# Load the binary fastText model
	model = fasttext.load_model(model_path)
	```
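
After loading, a quick sanity check can confirm that the file matches this card. The helper below is a minimal sketch: `describe_model` is a hypothetical name, and it relies only on the `get_dimension()` and `get_words()` methods of a loaded fastText model. The reported vocabulary size may differ slightly from the number on this card if the library counts special tokens.

```python
def describe_model(model, expected_dim=300):
    """Return basic facts about a loaded fastText model.

    `model` is any object exposing get_dimension() and get_words(),
    such as the result of fasttext.load_model().
    """
    dim = model.get_dimension()
    vocab_size = len(model.get_words())
    return {
        "dim": dim,
        "vocab_size": vocab_size,
        "dim_matches_card": dim == expected_dim,
    }
```

With the model loaded as above, `print(describe_model(model))` shows the embedding dimension and vocabulary size.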

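	3) Usage examples

	```
	word = 'զենքեր'
	# List of (similarity, word) pairs for the closest words in the vocabulary
	print(model.get_nearest_neighbors(word))
	# 300-dimensional vector for the input text
	print(model.get_sentence_vector(word))
	```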
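
To compare two specific words instead of listing neighbors, one option is to compute the cosine similarity of their vectors. The helper below is a sketch in plain Python; `cosine_similarity` is a hypothetical helper, not part of fastText, and the second word in the usage comment is only an illustrative choice.

```python
import math


def cosine_similarity(u, v):
    """Cosine similarity of two equal-length vectors (lists or NumPy arrays)."""
    dot = sum(float(a) * float(b) for a, b in zip(u, v))
    norm_u = math.sqrt(sum(float(a) ** 2 for a in u))
    norm_v = math.sqrt(sum(float(b) ** 2 for b in v))
    if norm_u == 0.0 or norm_v == 0.0:
        # Similarity is undefined for a zero vector; report 0.0 by convention.
        return 0.0
    return dot / (norm_u * norm_v)


# Usage with the model loaded earlier (requires fastText and model.bin):
# v1 = model.get_word_vector('զենքեր')
# v2 = model.get_word_vector('զենք')
# print(cosine_similarity(v1, v2))
```

Values close to 1 indicate that the two words appear in similar contexts in the training corpus.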