csukuangfj committed cc58f0c
Parent(s): fee6ff5
update

- .gitattributes +2 -0
- date.fst +0 -0
- dict/README.md +31 -0
- dict/hmm_model.utf8 +3 -0
- dict/idf.utf8 +3 -0
- dict/jieba.dict.utf8 +3 -0
- dict/pos_dict/char_state_tab.utf8 +3 -0
- dict/pos_dict/prob_emit.utf8 +3 -0
- dict/pos_dict/prob_start.utf8 +3 -0
- dict/pos_dict/prob_trans.utf8 +3 -0
- dict/stop_words.utf8 +3 -0
- dict/user.dict.utf8 +3 -0
- lexicon.txt +47 -1
- new_heteronym.fst +0 -0
- rule.fst → number.fst +0 -0
- phone.fst +0 -0
- rule.far +3 -0
- vits-hf-zh-jp-zomehwh.onnx +2 -2
.gitattributes
CHANGED
@@ -33,3 +33,5 @@ saved_model/**/* filter=lfs diff=lfs merge=lfs -text
 *.zip filter=lfs diff=lfs merge=lfs -text
 *.zst filter=lfs diff=lfs merge=lfs -text
 *tfevents* filter=lfs diff=lfs merge=lfs -text
+*.far filter=lfs diff=lfs merge=lfs -text
+*.utf8 filter=lfs diff=lfs merge=lfs -text
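The two new `.gitattributes` rules (`*.far`, `*.utf8`) are why `rule.far` and every `dict/*.utf8` file in this commit show up as three-line LFS pointers, while `dict/README.md` shows its real content. A minimal sketch for checking which paths those globs route through Git LFS, assuming slash-free gitattributes patterns behave like `fnmatch` on the basename (full gitattributes semantics such as `**` or directory-anchored patterns are not implemented). `LFS_PATTERNS` and `tracked_by_lfs` are hypothetical names, and the list holds only the rules visible in this hunk, not the earlier lines of the file.

```python
from fnmatch import fnmatchcase
from pathlib import PurePosixPath

# Only the rules visible in this hunk; earlier .gitattributes lines are not shown.
LFS_PATTERNS = ("*.zip", "*.zst", "*tfevents*", "*.far", "*.utf8")


def tracked_by_lfs(path: str, patterns=LFS_PATTERNS) -> bool:
    """Approximate gitattributes matching: a slash-free glob matches the basename."""
    name = PurePosixPath(path).name
    return any(fnmatchcase(name, pat) for pat in patterns)


if __name__ == "__main__":
    for p in ["rule.far", "dict/jieba.dict.utf8", "dict/README.md", "lexicon.txt"]:
        print(f"{p}: {'LFS pointer' if tracked_by_lfs(p) else 'regular git blob'}")
```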
date.fst
ADDED
dict/README.md
ADDED
@@ -0,0 +1,31 @@
+# CppJieba dictionaries
+
+The file extension indicates the dictionary's encoding.
+For example, filename.utf8 is UTF-8 encoded and filename.gbk is GBK encoded.
+
+
+## Word segmentation
+
+### jieba.dict.utf8/gbk
+
+The dictionary used by maximum-probability segmentation (MPSegment: Max Probability).
+
+### hmm_model.utf8/gbk
+
+The dictionary used by hidden Markov model segmentation (HMMSegment: Hidden Markov Model).
+
+__MixSegment (which combines MPSegment and HMMSegment) uses both of the dictionaries above.__
+
+
+## Keyword extraction
+
+### idf.utf8
+
+IDF (Inverse Document Frequency)
+KeywordExtractor uses the classic TF-IDF algorithm, so it needs this dictionary to supply IDF values.
+
+### stop_words.utf8
+
+Stop-word dictionary
+
+
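The README's naming rule (the suffix names the encoding) can be applied mechanically when loading a dictionary. A small sketch, assuming only the two suffixes the README mentions (`.utf8`, `.gbk`) need support; `codec_for` and `read_dict_lines` are hypothetical helpers, not part of CppJieba.

```python
from pathlib import Path

# Assumption: only the two suffixes named in the README are supported.
_SUFFIX_TO_CODEC = {".utf8": "utf-8", ".gbk": "gbk"}


def codec_for(path: str) -> str:
    """Map a dictionary filename to a Python codec name using its suffix."""
    suffix = Path(path).suffix.lower()
    try:
        return _SUFFIX_TO_CODEC[suffix]
    except KeyError:
        raise ValueError(f"unknown dictionary encoding suffix: {suffix!r}") from None


def read_dict_lines(path: str) -> list[str]:
    """Read a dictionary, decoding it with the codec implied by its suffix."""
    with open(path, encoding=codec_for(path)) as f:
        return [line.rstrip("\n") for line in f]
```

Failing loudly on an unknown suffix, rather than guessing UTF-8, keeps a mislabeled GBK file from being silently mis-decoded.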
dict/hmm_model.utf8
ADDED
@@ -0,0 +1,3 @@
+version https://git-lfs.github.com/spec/v1
+oid sha256:f17790586ac86dd048c8adffed052c4bd2b28ed0682972c1275e59040c0589a7
+size 519739
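The README says `hmm_model.utf8` drives HMMSegment. The diff only adds an LFS pointer, so the model's contents are not visible here; the sketch below assumes the B/M/E/S character-tagging scheme commonly used for HMM segmentation (Begin, Middle, End of a multi-character word; Single-character word) and decodes with Viterbi. The `TOY_*` probabilities are hypothetical values chosen for illustration, not read from `hmm_model.utf8`.

```python
import math

STATES = "BMES"
# Legal predecessor states under B/M/E/S tagging.
ALLOWED_PREV = {"B": "ES", "M": "BM", "E": "BM", "S": "ES"}


def viterbi_segment(text, start_p, trans_p, emit_p, floor=-1e9):
    """Segment text by Viterbi decoding over B/M/E/S tags (all log-probabilities)."""
    if not text:
        return []
    # best[i][s]: best log-prob of a tag path for text[:i+1] ending in state s.
    best = [{s: start_p.get(s, floor) + emit_p[s].get(text[0], floor) for s in STATES}]
    back = [{}]
    for i in range(1, len(text)):
        best.append({})
        back.append({})
        for s in STATES:
            prev, score = max(
                ((p, best[i - 1][p] + trans_p[p].get(s, floor)) for p in ALLOWED_PREV[s]),
                key=lambda t: t[1],
            )
            best[i][s] = score + emit_p[s].get(text[i], floor)
            back[i][s] = prev
    # A sentence must end on a word boundary: E or S.
    tags = [max("ES", key=lambda s: best[-1][s])]
    for i in range(len(text) - 1, 0, -1):
        tags.append(back[i][tags[-1]])
    tags.reverse()
    # Cut a word after every E or S tag.
    words, cur = [], ""
    for ch, tag in zip(text, tags):
        cur += ch
        if tag in "ES":
            words.append(cur)
            cur = ""
    if cur:
        words.append(cur)
    return words


# Hypothetical toy model; M and E never start a sentence.
lg = math.log
TOY_START = {"B": lg(0.6), "S": lg(0.4)}
TOY_TRANS = {
    "B": {"M": lg(0.3), "E": lg(0.7)},
    "M": {"M": lg(0.3), "E": lg(0.7)},
    "E": {"B": lg(0.5), "S": lg(0.5)},
    "S": {"B": lg(0.5), "S": lg(0.5)},
}
TOY_EMIT = {
    "B": {"中": lg(0.5), "音": lg(0.5)},
    "M": {},
    "E": {"国": lg(0.5), "乐": lg(0.5)},
    "S": {"的": lg(1.0)},
}
```

With these toy tables, `viterbi_segment("中国的音乐", TOY_START, TOY_TRANS, TOY_EMIT)` tags the characters B E S B E and returns `["中国", "的", "音乐"]`.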
dict/idf.utf8
ADDED
@@ -0,0 +1,3 @@
+version https://git-lfs.github.com/spec/v1
+oid sha256:dbd1e03d72b2263cc8d84a4304ed77677eed9e7deaf43a1a5133bbba9733b535
+size 5998717
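Per the README, KeywordExtractor scores words with TF-IDF using `idf.utf8` and filters them with `stop_words.utf8`. A minimal sketch of that scoring on an already-segmented document, assuming the IDF table and stop words are loaded as a dict and a set. Falling back to the mean IDF for unknown words is an assumption of this sketch, not documented CppJieba behavior, and the sample IDF values in the test are made up.

```python
from collections import Counter


def extract_keywords(words, idf, stop_words, top_k=3):
    """Rank words by TF-IDF: frequency in this document times the word's IDF."""
    # Assumption: words missing from the IDF table get the mean IDF.
    default_idf = sum(idf.values()) / len(idf) if idf else 0.0
    kept = [w for w in words if w not in stop_words]
    if not kept:
        return []
    tf = Counter(kept)
    total = len(kept)
    scores = {w: (n / total) * idf.get(w, default_idf) for w, n in tf.items()}
    # Highest score first; ties broken by the word itself for a stable order.
    return sorted(scores.items(), key=lambda kv: (-kv[1], kv[0]))[:top_k]
```

For example, with words `["我们", "的", "音乐", "音乐", "中国", "的"]`, stop words `{"的"}`, and IDF values 2.0, 8.0, 5.0 for 我们, 音乐, 中国, the ranking is 音乐 (4.0), 中国 (1.25), 我们 (0.5).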
dict/jieba.dict.utf8
ADDED
@@ -0,0 +1,3 @@
+version https://git-lfs.github.com/spec/v1
+oid sha256:3043b77068e09c9904f27cad82f12b6ebe9dbdb5aeff3b25e45ab7f9c1122b55
+size 5071204
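`jieba.dict.utf8` feeds MPSegment, which the README describes as maximum-probability segmentation. A sketch of the idea: build the DAG of dictionary words starting at each position, then use dynamic programming from right to left to pick the split with the highest total log-probability. It assumes dictionary lines of the form `word freq [tag]`, and giving unknown single characters a count of 1 is this sketch's own choice. The counts in the example are hypothetical.

```python
import math


def load_freqs(lines):
    """Parse 'word freq [tag]' lines (assumed dictionary format) into {word: freq}."""
    freqs = {}
    for line in lines:
        parts = line.split()
        if len(parts) >= 2:
            freqs[parts[0]] = int(parts[1])
    return freqs


def mp_segment(sentence, freqs):
    """Max-probability segmentation: maximize the sum of log P(word) over the split."""
    total = sum(freqs.values()) or 1

    def logp(w):
        # Unknown (or zero-count) words get count 1 so every sentence is segmentable.
        return math.log((freqs.get(w) or 1) / total)

    max_len = max((len(w) for w in freqs), default=1)
    n = len(sentence)
    best = [0.0] * (n + 1)  # best[i]: best log-prob for sentence[i:]
    nxt = [n] * (n + 1)     # nxt[i]: end index of the first word in that split
    for i in range(n - 1, -1, -1):
        best[i] = -math.inf
        for j in range(i + 1, min(n, i + max_len) + 1):
            w = sentence[i:j]
            if j == i + 1 or w in freqs:  # DAG edge: single char or dictionary word
                score = logp(w) + best[j]
                if score > best[i]:
                    best[i], nxt[i] = score, j
    words, i = [], 0
    while i < n:
        words.append(sentence[i:nxt[i]])
        i = nxt[i]
    return words
```

With the toy entries `中国 100`, `中 10`, `国 10`, `音乐 50`, `的 200`, the sentence 中国的音乐 splits as `["中国", "的", "音乐"]`, because P(中国) beats P(中)·P(国).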
dict/pos_dict/char_state_tab.utf8
ADDED
@@ -0,0 +1,3 @@
+version https://git-lfs.github.com/spec/v1
+oid sha256:28b7be1dd7369766a51445af4d42e9a2ba4bf374c13be5bc1ca7721e27271dbb
+size 327139
dict/pos_dict/prob_emit.utf8
ADDED
@@ -0,0 +1,3 @@
+version https://git-lfs.github.com/spec/v1
+oid sha256:c33c4cb7edf3b3a5947df7209b6e9f267eae1f21335d9e2bd2521ea07105457a
+size 1687686
dict/pos_dict/prob_start.utf8
ADDED
@@ -0,0 +1,3 @@
+version https://git-lfs.github.com/spec/v1
+oid sha256:13623ea0e9300bdb597cb2da28770b7b385d6c0098d66e516083fb01b6bd5d96
+size 4347
dict/pos_dict/prob_trans.utf8
ADDED
@@ -0,0 +1,3 @@
+version https://git-lfs.github.com/spec/v1
+oid sha256:f22363e2307408293d180c6f9f6b5cb75879d52f722f7764fa2d3d0ae2400236
+size 124159
dict/stop_words.utf8
ADDED
@@ -0,0 +1,3 @@
+version https://git-lfs.github.com/spec/v1
+oid sha256:b788b8a939d2e2fe079abd579ea98f12f9fb84370bfd0dddd81bb9381f7ab42c
+size 8974
dict/user.dict.utf8
ADDED
@@ -0,0 +1,3 @@
+version https://git-lfs.github.com/spec/v1
+oid sha256:495bbf49270408a1234690e1e6a97328f30a482a7a72aa769e8a12e8714b0c62
+size 49
lexicon.txt
CHANGED
@@ -21285,6 +21285,7 @@
 頻
 恵
 𤋮
+kaldi k ʰ a ↓ ↑ ə ɹ ` ↓ ↑ t ⁼ i ↑
 阿胶 ə ↓ ʧ ⁼ y a u →
 挨打 a i ↑ t ⁼ a ↓ ↑
 拗口 a u ↓ k ʰ o u ↓ ↑
@@ -21302,6 +21303,21 @@
 屏气 p ⁼ i N g ↓ ↑ ʧ ʰ i ↓
 屏住 p ⁼ i N g ↓ ↑ ʦ ` ⁼ u ↓
 漂泊 p ʰ y a u → p ⁼ w o ↑
+淡泊 t ⁼ a N N ↓ p ⁼ w o ↑
+晚泊 w a N N ↓ ↑ p ⁼ w o ↑
+憩泊 ʧ ʰ i ↓ p ⁼ w o ↑
+夜泊 y e ↓ p ⁼ w o ↑
+泊船 p ⁼ w o ↑ ʦ ` ʰ w a N N ↑
+船泊 ʦ ` ʰ w a N N ↑ p ⁼ w o ↑
+泊舟 p ⁼ w o ↑ ʦ ` ⁼ o u →
+泊步 p ⁼ w o ↑ p ⁼ u ↓
+泊主 p ⁼ w o ↑ ʦ ` ⁼ u ↓ ↑
+泊车 p ⁼ w o ↑ ʦ ` ʰ ə →
+泊位 p ⁼ w o ↑ w e i ↓
+泊如 p ⁼ w o ↑ ɹ ` u ↑
+泊礼 p ⁼ w o ↑ l i ↓ ↑
+厚泊 h o u ↓ p ⁼ w o ↑
+停位 t ʰ i N g ↑ w e i ↓
 曝光 p ⁼ a u ↓ k ⁼ w a N g →
 长江 ʦ ` ʰ a N g ↑ ʧ ⁼ y a N g →
 长短 ʦ ` ʰ a N g ↑ t ⁼ w a N N ↓ ↑
@@ -21439,7 +21455,7 @@
 都雅 t ⁼ u → y a ↓ ↑
 都会 t ⁼ u → h w e i ↓
 都为一集 t ⁼ u → w e i ↓ i → ʧ ⁼ i ↑
-都卿相 t ⁼ u → ʧ ʰ i N g → ʃ y a N g
+都卿相 t ⁼ u → ʧ ʰ i N g → ʃ y a N g ↓
 上当 s ` a N g ↓ t ⁼ a N g ↓
 当铺 t ⁼ a N g ↓ p ʰ u ↓
 弹射 t ʰ a N N ↑ s ` ə ↓
@@ -21601,6 +21617,7 @@
 几乎 ʧ ⁼ i → h u →
 考卷 k ʰ a u ↓ ↑ ʧ ⁼ ɥ a N N ↓
 试卷 s ` ɹ ` ↓ ʧ ⁼ ɥ a N N ↓
+答卷 t ⁼ a ↑ ʧ ⁼ ɥ a N N ↓
 倔强 ʧ ⁼ ɥ e ↑ ʧ ⁼ y a N g ↓
 强嘴 ʧ ⁼ y a N g ↓ ʦ ⁼ w e i ↓ ↑
 校对 ʧ ⁼ y a u ↓ t ⁼ w e i ↓
@@ -21720,6 +21737,34 @@
 失调 s ` ɹ ` → t ʰ y a u ↑
 可恶 k ʰ ə ↓ ↑ u ↓
 厌恶 y e N N ↓ u ↓
+相貌 ʃ y a N g ↓ m a u ↓
+照相 ʦ ` ⁼ a u ↓ ʃ y a N g ↓
+凶相 ʃ y u N g → ʃ y a N g ↓
+可怜相 k ʰ ə ↓ ↑ l y e N N ↑ ʃ y a N g ↓
+月相 ɥ e ↓ ʃ y a N g ↓
+金相 ʧ ⁼ i N N → ʃ y a N g ↓
+相面 ʃ y a N g ↓ m y e N N ↓
+相术 ʃ y a N g ↓ s ` u ↓
+相扑 ʃ y a N g ↓ p ʰ u →
+相声 ʃ y a N g ↓ s ` ə N g →
+相士 ʃ y a N g ↓ s ` ɹ ` ↓
+相体裁衣 ʃ y a N g ↓ t ʰ i ↓ ↑ ʦ ʰ a i ↑ i →
+相册 ʃ y a N g ↓ ʦ ʰ ə ↓
+相图 ʃ y a N g ↓ t ʰ u ↑
+相纸 ʃ y a N g ↓ ʦ ` ⁼ ɹ ` ↓ ↑
+相公 ʃ y a N g ↓ k ⁼ u N g →
+相君 ʃ y a N g ↓ ʧ ⁼ ɥ ə N N →
+相机 ʃ y a N g ↓ ʧ ⁼ i →
+相里 ʃ y a N g ↓ l i ↓ ↑
+相马 ʃ y a N g ↓ m a ↓ ↑
+丞相 ʦ ` ʰ ə N g ↑ ʃ y a N g ↓
+辅相 f u ↓ ↑ ʃ y a N g ↓
+宰相 ʦ ⁼ a i ↓ ↑ ʃ y a N g ↓
+首相 s ` o u ↓ ↑ ʃ y a N g ↓
+相态 ʃ y a N g ↓ t ʰ a i ↓
+相角 ʃ y a N g ↓ ʧ ⁼ y a u ↓ ↑
+相位 ʃ y a N g ↓ w e i ↓
+看相 k ʰ a N N ↓ ʃ y a N g ↓
 高兴 k ⁼ a u → ʃ i N g ↓
 兴致勃勃 ʃ i N g ↓ ʦ ` ⁼ ɹ ` ↓ p ⁼ w o ↑ p ⁼ w o ↑
 兴奋 ʃ i N g → f ə N N ↓
@@ -21735,6 +21780,7 @@
 音乐 i N N → ɥ e ↓
 乐曲 ɥ e ↓ ʧ ʰ ɥ →
 自怨自艾 ʦ ⁼ ɹ ↓ ɥ a N N ↓ ʦ ⁼ ɹ ↓ i ↓
+委蛇 w e i ↓ ↑ i ↑
 朝气 ʦ ` ⁼ a u → ʧ ʰ i ↓
 着数 ʦ ` ⁼ a u → s ` u ↓
 中国 ʦ ` ⁼ u N g → k ⁼ w o ↑
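Each `lexicon.txt` line in this diff is a word followed by space-separated phone tokens, with tone arrows such as `↓ ↑ →` appearing as tokens of their own (the `都卿相` fix simply appends a missing `↓`). A minimal parser sketch under that reading. The whitespace split and the choice to raise on conflicting duplicate entries are assumptions of this sketch; with this many new word-level entries for 泊 and 相, surfacing accidental duplicates seems more useful than silently overwriting them.

```python
def parse_lexicon(lines):
    """Parse 'word tok tok ...' lines into {word: [tokens]}.

    Assumption: fields are whitespace-separated and the first field is the word;
    tone marks (↓ ↑ →) are ordinary tokens. A word with no tokens maps to [].
    """
    lexicon = {}
    for lineno, line in enumerate(lines, 1):
        fields = line.split()
        if not fields:
            continue  # skip blank lines
        word, tokens = fields[0], fields[1:]
        if word in lexicon and lexicon[word] != tokens:
            raise ValueError(f"line {lineno}: conflicting entry for {word!r}")
        lexicon[word] = tokens
    return lexicon
```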
new_heteronym.fst
ADDED
rule.fst → number.fst
RENAMED
File without changes
phone.fst
ADDED
rule.far
ADDED
@@ -0,0 +1,3 @@
+version https://git-lfs.github.com/spec/v1
+oid sha256:b090ed05e333fe125b62d8b1de5f1a1d4579fb237c606e7ac0b84707c863a01f
+size 180717014
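Every added `dict/*.utf8` file and `rule.far` is stored as a Git LFS pointer: a `version` line, an `oid` line of the form `sha256:<hex>`, and a `size` line in bytes (so `rule.far` is roughly 180 MB). A small parser sketch for that three-line format; `parse_lfs_pointer` is a hypothetical helper, and it assumes a well-formed pointer with all three keys present.

```python
def parse_lfs_pointer(text):
    """Parse a Git LFS pointer into version, hash algorithm, digest, and byte size."""
    fields = {}
    for line in text.splitlines():
        if not line.strip():
            continue
        key, _, value = line.partition(" ")  # each line is "<key> <value>"
        fields[key] = value
    algo, _, digest = fields["oid"].partition(":")
    return {
        "version": fields["version"],
        "oid_algo": algo,
        "oid": digest,
        "size": int(fields["size"]),
    }
```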
vits-hf-zh-jp-zomehwh.onnx
CHANGED
@@ -1,3 +1,3 @@
 version https://git-lfs.github.com/spec/v1
-oid sha256:
-size
+oid sha256:227f68cec4ee79e80845b2f7bbb9f9f654b95cab1d69be54e78af46c33e07935
+size 121899993
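Because the model update only changes the pointer's `oid` and `size`, a downloaded copy of `vits-hf-zh-jp-zomehwh.onnx` can be checked against them: the LFS oid is the SHA-256 of the file's contents. A sketch, assuming the file is already on local disk; `matches_lfs_pointer` is a hypothetical helper. It compares sizes first so a truncated download is rejected without hashing, then hashes in 1 MiB chunks to keep memory flat for files of this size.

```python
import hashlib
import os


def matches_lfs_pointer(path, oid_hex, size, chunk_size=1 << 20):
    """Return True if the file at `path` has the given byte size and SHA-256 digest."""
    if os.path.getsize(path) != size:
        return False  # cheap check first: skip hashing a file of the wrong length
    h = hashlib.sha256()
    with open(path, "rb") as f:
        for chunk in iter(lambda: f.read(chunk_size), b""):
            h.update(chunk)
    return h.hexdigest() == oid_hex.lower()
```

Usage against this commit's pointer: `matches_lfs_pointer("vits-hf-zh-jp-zomehwh.onnx", "227f68cec4ee79e80845b2f7bbb9f9f654b95cab1d69be54e78af46c33e07935", 121899993)`.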