Macropodus committed
Commit 129f455 · 1 parent: f428c90
Update README.md
README.md CHANGED
@@ -8,20 +8,30 @@ tags:
 - synonym
 ---
 # near-synonym
->>> near-synonym,
 
 # 1. Installation
-
-0. Notes
 The numpy version is not pinned by default (the standard build uses numpy==1.20.4)
 Dependencies for the standard build are listed in requirements-all.txt
 
-1. Install from PyPI
 pip install near-synonym
 Using a mirror, e.g.:
 pip install -i https://pypi.tuna.tsinghua.edu.cn/simple near-synonym
 ```
 
 # 2. Usage
 
 ## 2.1 Quick start: antonyms and synonyms
@@ -37,10 +47,9 @@ print("Synonyms:")
 print(word_synonyms)
 """
 Antonyms:
-[('讨厌', 0.
 Synonyms:
-[('
-Enter word:
 """
 ```
 
@@ -54,7 +63,7 @@ word_antonyms = near_synonym.antonyms(word, topk=8, annk=256, annk_cpu=128, batc
 rate_ann=0.4, rate_sim=0.4, rate_len=0.2, rounded=4, is_debug=False)
 print("Antonyms:")
 print(word_antonyms)
-#
 ```
 
 
@@ -72,7 +81,17 @@ near-synonym, a Chinese antonym/synonym toolkit.
 
 ## 3.2 TODO
 ```
-1.
 ```
 
 # 4. Comparison
@@ -104,6 +123,12 @@ near-synonym, a Chinese antonym/synonym toolkit.
 - [https://github.com/yongzhuo/Macropodus](https://github.com/yongzhuo/Macropodus)
 - [https://github.com/chatopera/Synonyms](https://github.com/chatopera/Synonyms)
 
 # Reference
 For citing this work, you can refer to the present GitHub project. For example, with BibTeX:
 ```
@@ -114,4 +139,5 @@ For citing this work, you can refer to the present GitHub project. For example,
 publisher = {GitHub},
 year = {2024}
 }
-```
@@ -8,20 +8,30 @@ tags:
 - synonym
 ---
 # near-synonym
+>>> near-synonym, a Chinese antonym/near-synonym/synonym toolkit.
 
 # 1. Installation
+## 1.1 Notes
 The numpy version is not pinned by default (the standard build uses numpy==1.20.4)
 Dependencies for the standard build are listed in requirements-all.txt
 
+## 1.2 Install from PyPI
+```
 pip install near-synonym
 Using a mirror, e.g.:
 pip install -i https://pypi.tuna.tsinghua.edu.cn/simple near-synonym
+To install without dependencies and add missing packages afterwards as needed:
+pip install -i https://pypi.tuna.tsinghua.edu.cn/simple near-synonym --no-dependencies
 ```
 
+## 1.3 Model files
+- The model files bundled with the GitHub source contain only 10k+ word vectors; the full model files are located in near_synonym/near_synonym_model.
+- The package downloaded via pip contains only 50k+ word vectors, stored in the data directory.
+- The full word vectors are available on [huggingface](https://huggingface.co/) at [Macropodus/near_synonym_model](https://huggingface.co/Macropodus/near_synonym_model),
+- or via the Baidu Netdisk share link [https://pan.baidu.com/s/1lDSCtpr0r2hKrGrK8ZLlFQ](https://pan.baidu.com/s/1lDSCtpr0r2hKrGrK8ZLlFQ), password: ff0y
+
+
+
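The Hugging Face repository named in section 1.3 can in principle be fetched programmatically. The following is a minimal sketch, assuming the `huggingface_hub` package is installed and that the full model files belong in `near_synonym/near_synonym_model`, as the list above suggests. The helper name `download_full_model` and its default target directory are illustrative and not part of near-synonym.

```python
from pathlib import Path

# Repository id taken from the README link above.
MODEL_REPO_ID = "Macropodus/near_synonym_model"


def download_full_model(target_dir: str = "near_synonym/near_synonym_model") -> Path:
    """Download every file of the model repo into target_dir (hypothetical helper)."""
    # Imported lazily so that defining the helper does not require the package.
    from huggingface_hub import snapshot_download

    Path(target_dir).mkdir(parents=True, exist_ok=True)
    # snapshot_download returns the local folder that now holds the files.
    local_path = snapshot_download(repo_id=MODEL_REPO_ID, local_dir=target_dir)
    return Path(local_path)
```

Calling `download_full_model()` needs network access. The README does not state whether near-synonym picks the files up from that directory automatically, so check the package's model path before relying on it.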
 # 2. Usage
 
 ## 2.1 Quick start: antonyms and synonyms
@@ -37,10 +47,9 @@ print("Synonyms:")
 print(word_synonyms)
 """
 Antonyms:
+[('讨厌', 0.6857), ('厌恶', 0.5406), ('憎恶', 0.485), ('不喜欢', 0.4079), ('冷漠', 0.4051)]
 Synonyms:
+[('喜爱', 0.8813), ('爱好', 0.8193), ('感兴趣', 0.7399), ('赞赏', 0.6849), ('倾向', 0.6137)]
 """
 ```
 
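The sample output above suggests that results come back as a list of `(word, score)` tuples, ordered by descending score in this sample. The following small sketch filters such a list by a minimum score. The helper `filter_by_score` and the 0.5 threshold are illustrative assumptions, and the data is copied from the sample output rather than produced by calling the library.

```python
from typing import List, Tuple

# Copied from the sample output in the README.
word_antonyms: List[Tuple[str, float]] = [
    ("讨厌", 0.6857), ("厌恶", 0.5406), ("憎恶", 0.485),
    ("不喜欢", 0.4079), ("冷漠", 0.4051),
]


def filter_by_score(pairs: List[Tuple[str, float]], min_score: float) -> List[str]:
    """Keep the words whose score is at least min_score (hypothetical helper)."""
    return [word for word, score in pairs if score >= min_score]


strong_antonyms = filter_by_score(word_antonyms, min_score=0.5)
print(strong_antonyms)  # ['讨厌', '厌恶']
```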
@@ -54,7 +63,7 @@ word_antonyms = near_synonym.antonyms(word, topk=8, annk=256, annk_cpu=128, batc
 rate_ann=0.4, rate_sim=0.4, rate_len=0.2, rounded=4, is_debug=False)
 print("Antonyms:")
 print(word_antonyms)
+# The current version is slow; the recall sizes annk_cpu/annk can be reduced
 ```
 
 
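The comment added above notes that the current version is slow and that `annk`/`annk_cpu` can be lowered. One way to compare settings is to time the same query under each of them. The sketch below uses a stand-in function so that it runs without the package. `time_settings` and `fake_antonyms` are hypothetical names, and the smaller values 64/32 are arbitrary examples rather than recommended settings.

```python
import time
from typing import Any, Callable, Dict, List, Tuple


def time_settings(
    query_fn: Callable[..., Any],
    word: str,
    settings: List[Dict[str, int]],
) -> List[Tuple[Dict[str, int], float, Any]]:
    """Call query_fn(word, **kwargs) once per setting and record wall-clock time."""
    results = []
    for kwargs in settings:
        start = time.perf_counter()
        output = query_fn(word, **kwargs)
        elapsed = time.perf_counter() - start
        results.append((kwargs, elapsed, output))
    return results


def fake_antonyms(word: str, topk: int = 8, annk: int = 256, annk_cpu: int = 128):
    """Stand-in for near_synonym.antonyms; returns dummy (word, score) pairs."""
    return [(f"{word}_{i}", 1.0 / (i + 1)) for i in range(min(topk, annk_cpu))]


settings = [{"annk": 256, "annk_cpu": 128}, {"annk": 64, "annk_cpu": 32}]
timings = time_settings(fake_antonyms, "喜欢", settings)
for kwargs, elapsed, output in timings:
    print(kwargs, f"{elapsed:.6f}s", len(output))
```

To try it against the real library, pass `near_synonym.antonyms` in place of `fake_antonyms`. The keyword names `annk` and `annk_cpu` come from the call shown in the diff.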
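@@ -72,7 +81,17 @@ near-synonym, a Chinese antonym/synonym toolkit.
 
 ## 3.2 TODO
 ```
+1. Speed up inference: train a small NLI model to replace the heavy and ill-suited roformer-sim-ft. [2024-03-20: ERNIE-SIM done, but the ONNX export is 340M, which is too large; considering a shallow network instead, see item 4.]
+2. Use large language models to build more NLI corpora.
+3. Use large language models to generate near-synonym and synonym tables directly, for pre-built indexing and similarity training. [2024-04-07: done]
+4. Consider classic NLP classification models (text_cnn/text-rcnn) on character embeddings for near-synonym/antonym recognition. [In progress: writing config/tokenizer/model in the style of transformers, to ease integration with pretrained models]
+5. word2vec recall performs poorly; consider generating candidates directly with the large model qwen1.5-0.5b.
+```
+
+## 3.3 Other experiments
+```
+fail: using sentiment recognition to pick out words with different sentiment (failed; for example, 可爱 (cute) and 漂亮 (pretty) are both positive);
+fail: using NLI (natural language inference); the existing corpora are sentence-level and do not fit well;
 ```
 
 # 4. Comparison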
@@ -104,6 +123,12 @@ near-synonym, a Chinese antonym/synonym toolkit.
 - [https://github.com/yongzhuo/Macropodus](https://github.com/yongzhuo/Macropodus)
 - [https://github.com/chatopera/Synonyms](https://github.com/chatopera/Synonyms)
 
+# 6. Changelog
+```
+2024.04.07: built near-synonym/antonym tables for a 280k+ word dictionary with the qwen-7b-chat model, i.e. ci_atmnonym_synonym.json; version v0.1.0 (data is downloaded with huggingface_hub);
+2024.03.14: initialized near-synonym, version v0.0.3;
+```
+
 # Reference
 For citing this work, you can refer to the present GitHub project. For example, with BibTeX:
 ```
@@ -114,4 +139,5 @@ For citing this work, you can refer to the present GitHub project. For example,
 publisher = {GitHub},
 year = {2024}
 }
+```
+