khulnasoft committed on
Commit 147d921
Parent(s): c7dddba
Update README.md

README.md CHANGED
@@ -1,3 +1,43 @@
Removed (old lines 1-3): `---`, `license: mit`, `---`
Added (new lines 1-43):

---
license: mit
---

# Bengali Word2Vec Model
This is a pre-trained word2vec model for the Bengali language.

This model is built for the [bengalinlp](https://github.com/banglawiki/bengalinlp) package.

## Datasets
- [Bengali Wikipedia dumps](https://dumps.wikimedia.org/bnwiki/latest/)

## Training details
- Word2Vec hyperparameters: embedding dimension = 100, `min_count=5`, `window=5`, `epochs=10`
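The `min_count` and `window` settings listed above decide which words receive vectors and which neighboring words each word is trained against. As a rough illustration only, here is a stdlib-only Python sketch of those two preprocessing steps. The helper names `build_vocab` and `context_pairs` are hypothetical (not part of bengalinlp or gensim), and this is not the code that trained this model; the embedding dimension and the number of epochs only matter inside the training loop itself, which is not shown.

```python
from collections import Counter


def build_vocab(sentences, min_count=5):
    """Keep only words that occur at least `min_count` times in the corpus."""
    counts = Counter(word for sentence in sentences for word in sentence)
    return {word for word, count in counts.items() if count >= min_count}


def context_pairs(tokens, vocab, window=5):
    """Yield (center, context) pairs at most `window` positions apart.

    Hypothetical helper. Rare (out-of-vocabulary) tokens are dropped before
    pairing. This sketch always uses the full window; the reference word2vec
    tool (and gensim by default) samples a smaller effective window per word.
    """
    kept = [t for t in tokens if t in vocab]
    for i, center in enumerate(kept):
        lo = max(0, i - window)
        hi = min(len(kept), i + window + 1)
        for j in range(lo, hi):
            if j != i:
                yield center, kept[j]


if __name__ == "__main__":
    corpus = [["a", "b", "a"], ["a", "c", "b"]]  # toy tokenized sentences
    vocab = build_vocab(corpus, min_count=2)     # "c" occurs once and is dropped
    print(sorted(vocab))                          # ['a', 'b']
    print(list(context_pairs(corpus[1], vocab, window=1)))  # [('a', 'b'), ('b', 'a')]
```

With `min_count=5`, as used for this model, any word seen fewer than five times in the Wikipedia dump would get no vector at all.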
## Usage
- `pip install -U bengalinlp_toolkit`
- Generate a word vector using the pre-trained model:

```py
from bengalinlp import BengaliWord2Vec

bwv = BengaliWord2Vec()
model_path = "bengali_word2vec.model"  # local path to the downloaded model file
word = 'গ্রাম'  # "village"
vector = bwv.generate_word_vector(model_path, word)
print(vector.shape)  # shape of the embedding (model trained with dimension 100)
print(vector)
```
- Find the most similar words using the pre-trained model:

```py
from bengalinlp import BengaliWord2Vec

bwv = BengaliWord2Vec()
model_path = "bengali_word2vec.model"  # local path to the downloaded model file
word = 'গ্রাম'  # "village"
similar = bwv.most_similar(model_path, word, topn=10)  # the 10 nearest words
print(similar)
```
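Word2vec "most similar" queries are conventionally answered by ranking the vocabulary by cosine similarity to the query word's vector; gensim's `most_similar` works this way and excludes the query word from the results. Whether `bwv.most_similar` does exactly this internally is an assumption. The sketch below shows the idea in plain Python over a tiny hypothetical embedding table; the words, the 3-dimensional vectors, and the `most_similar` helper here are invented for illustration and are not taken from this model or from bengalinlp.

```python
import heapq
import math


def cosine(u, v):
    """Cosine similarity dot(u, v) / (|u| * |v|); assumes non-zero vectors."""
    dot = sum(a * b for a, b in zip(u, v))
    norm_u = math.sqrt(sum(a * a for a in u))
    norm_v = math.sqrt(sum(b * b for b in v))
    return dot / (norm_u * norm_v)


def most_similar(embeddings, word, topn=10):
    """Return the `topn` (word, similarity) pairs closest to `word`.

    Hypothetical helper: the query word itself is excluded from the ranking.
    """
    query = embeddings[word]
    scores = (
        (other, cosine(query, vec))
        for other, vec in embeddings.items()
        if other != word
    )
    return heapq.nlargest(topn, scores, key=lambda pair: pair[1])


if __name__ == "__main__":
    toy = {  # invented 3-d vectors; the real model uses 100 dimensions
        "village": [1.0, 0.9, 0.0],
        "town": [0.9, 1.0, 0.1],
        "river": [0.0, 0.2, 1.0],
        "city": [0.8, 1.0, 0.3],
    }
    print(most_similar(toy, "village", topn=2))  # ranks 'town' above 'city'
```

`heapq.nlargest` avoids sorting the whole vocabulary, which matters when the table holds hundreds of thousands of words rather than four.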