freemt committed on
Commit
1ca37ad
1 Parent(s): 265100f

Update docs

data/en.txt CHANGED
@@ -1,5 +1,5 @@
1
- [Young Warrior] Kingold(184283681) 2021-12-30 22:27:37
2
- It seems that the standalone version can
3
  omit the GUI and specify the two files to be aligned directly on the command line.
4
 
5
 
 
1
+ [Young Warrior] Kingold(...) 2021-12-30 22:27:37
2
+ It seems that the standalone version can
3
  omit the GUI and specify the two files to be aligned directly on the command line.
4
 
5
 
data/zh.txt CHANGED
@@ -1,4 +1,4 @@
1
- 【少侠】Kingold(184283681) 2021-12-30 22:27:37
2
  单机版貌似可以省略掉图形界面,直接
3
  命令行指定两个待对齐文件。
4
 
 
1
+ 【少侠】Kingold(...) 2021-12-30 22:27:37
2
  单机版貌似可以省略掉图形界面,直接
3
  命令行指定两个待对齐文件。
4
 
docs/build/doctrees/environment.pickle CHANGED
Binary files a/docs/build/doctrees/environment.pickle and b/docs/build/doctrees/environment.pickle differ
 
docs/build/doctrees/examples.doctree CHANGED
Binary files a/docs/build/doctrees/examples.doctree and b/docs/build/doctrees/examples.doctree differ
 
docs/build/doctrees/intro.doctree CHANGED
Binary files a/docs/build/doctrees/intro.doctree and b/docs/build/doctrees/intro.doctree differ
 
docs/build/html/_sources/examples.rst.txt CHANGED
@@ -3,6 +3,8 @@ Examples
3
 
4
  ``radiobee`` has in-built examples. Just click one of the rows in the ``Examples`` table and click ``Submit`` to testrun.
5
 
 
 
6
  Installation/Usage:
7
  *******************
8
  As the package has not been published on PyPi yet, it CANNOT be installed using pip.
 
3
 
4
  ``radiobee`` has in-built examples. Just click one of the rows in the ``Examples`` table and click ``Submit`` to testrun.
5
 
6
+ `gradio 3` (run in hf spaces) seems to have trouble with examples. Hence, examples may be taken off line until the problem is fixed.
7
+
8
  Installation/Usage:
9
  *******************
10
  As the package has not been published on PyPi yet, it CANNOT be installed using pip.
docs/build/html/_sources/intro.rst.txt CHANGED
@@ -18,4 +18,4 @@ Limitations
18
  Currently, only zh-en/en-zh pairs are supported in fast-track mode although further pairs will be added if and when time permits.
19
  If you are willing to help with a particular pair (for example, de-zh, ja-zh, ru-zh, etc.), you are welcome to contact the developer.
20
 
21
- An experimental slow-track mode (time required approximately 10 times that of fast-track mode) is introdueced for other laugnage pairs.
 
18
  Currently, only zh-en/en-zh pairs are supported in fast-track mode although further pairs will be added if and when time permits.
19
  If you are willing to help with a particular pair (for example, de-zh, ja-zh, ru-zh, etc.), you are welcome to contact the developer.
20
 
21
+ An experimental slow-track mode (time required approximately 10 times that of fast-track mode) is introduced for other language pairs.
docs/build/html/examples.html CHANGED
@@ -76,6 +76,7 @@
76
  <section id="examples">
77
  <h1>Examples<a class="headerlink" href="#examples" title="Permalink to this headline"></a></h1>
78
  <p><code class="docutils literal notranslate"><span class="pre">radiobee</span></code> has in-built examples. Just click one of the rows in the <code class="docutils literal notranslate"><span class="pre">Examples</span></code> table and click <code class="docutils literal notranslate"><span class="pre">Submit</span></code> to testrun.</p>
 
79
  <section id="installation-usage">
80
  <h2>Installation/Usage:<a class="headerlink" href="#installation-usage" title="Permalink to this headline"></a></h2>
81
  <p>As the package has not been published on PyPi yet, it CANNOT be installed using pip.</p>
 
76
  <section id="examples">
77
  <h1>Examples<a class="headerlink" href="#examples" title="Permalink to this headline"></a></h1>
78
  <p><code class="docutils literal notranslate"><span class="pre">radiobee</span></code> has in-built examples. Just click one of the rows in the <code class="docutils literal notranslate"><span class="pre">Examples</span></code> table and click <code class="docutils literal notranslate"><span class="pre">Submit</span></code> to testrun.</p>
79
+ <p><cite>gradio 3</cite> (run in hf spaces) seems to have trouble with examples. Hence, examples may be taken off line until the problem is fixed.</p>
80
  <section id="installation-usage">
81
  <h2>Installation/Usage:<a class="headerlink" href="#installation-usage" title="Permalink to this headline"></a></h2>
82
  <p>As the package has not been published on PyPi yet, it CANNOT be installed using pip.</p>
docs/build/html/intro.html CHANGED
@@ -87,7 +87,7 @@
87
  <h2>Limitations<a class="headerlink" href="#limitations" title="Permalink to this headline"></a></h2>
88
  <p>Currently, only zh-en/en-zh pairs are supported in fast-track mode although further pairs will be added if and when time permits.
89
  If you are willing to help with a particular pair (for example, de-zh, ja-zh, ru-zh, etc.), you are welcome to contact the developer.</p>
90
- <p>An experimental slow-track mode (time required approximately 10 times that of fast-track mode) is introdueced for other laugnage pairs.</p>
91
  </section>
92
  </section>
93
 
 
87
  <h2>Limitations<a class="headerlink" href="#limitations" title="Permalink to this headline"></a></h2>
88
  <p>Currently, only zh-en/en-zh pairs are supported in fast-track mode although further pairs will be added if and when time permits.
89
  If you are willing to help with a particular pair (for example, de-zh, ja-zh, ru-zh, etc.), you are welcome to contact the developer.</p>
90
+ <p>An experimental slow-track mode (time required approximately 10 times that of fast-track mode) is introduced for other language pairs.</p>
91
  </section>
92
  </section>
93
 
docs/build/html/searchindex.js CHANGED
@@ -1 +1 @@
1
- Search.setIndex({docnames:["examples","index","intro","modules","radiobee","userguide","userguide-zh"],envversion:{"sphinx.domains.c":2,"sphinx.domains.changeset":1,"sphinx.domains.citation":1,"sphinx.domains.cpp":4,"sphinx.domains.index":1,"sphinx.domains.javascript":2,"sphinx.domains.math":2,"sphinx.domains.python":3,"sphinx.domains.rst":2,"sphinx.domains.std":2,sphinx:56},filenames:["examples.rst","index.rst","intro.rst","modules.rst","radiobee.rst","userguide.rst","userguide-zh.rst"],objects:{},objnames:{},objtypes:{},terms:{"1":[5,6],"10":2,"12":[5,6],"2":[5,6],"200":[5,6],"2000":[5,6],"3":2,"316287378":[5,6],"4":[5,6],"5":[],"500":6,"8":[5,6],"\u4e00\u822c\u65e0\u9700\u7406\u4f1a\u8fd9\u4e9b\u53c2\u6570":6,"\u4e2d\u82f1\u975e\u7a7a\u884c\u9650\u5236\u5728":6,"\u4e3a\u4e2d\u82f1\u6587\u6df7\u5408\u6587\u672c\u53ca\u8bd5\u7740\u5206\u79bb\u4e2d\u82f1\u6587":6,"\u4e3a\u7a7a\u767d\u65f6":6,"\u4e86\u89e3\u8fd9\u4e9b\u5bf9\u9f50\u5de5\u5177":6,"\u4ee5\u5185":6,"\u4ee5\u540e\u53ef\u80fd\u4f1a\u652f\u6301":6,"\u4f18\u8d28\u5bf9":6,"\u4f7f\u7528\u8bf4\u660e":1,"\u5176\u4ed6\u8bed\u8a00\u5bf9\u7684\u5bf9\u9f50":6,"\u5219\u4f1a\u89c6":6,"\u5219\u9650\u5236\u5728":6,"\u53e6\u4e00\u65b9\u9762":6,"\u53ef\u4ee5\u53f3\u51fb\u62f7\u51fa\u56fe\u7684\u94fe\u63a5\u7528\u6d4f\u89c8\u5668\u72ec\u7acb\u8bbf\u95ee\u62f7\u51fa\u6765\u7684\u94fe\u63a5\u6216\u53f3\u51fb\u5b58\u76d8\u518d\u7528\u770b\u56fe\u7a0b\u5e8f\u6253\u5f00\u5b58\u76d8\u7684\u56fe\u6587\u4ef6":6,"\u548c":6,"\u5acc\u56fe\u592a\u5c0f\u7684\u8bdd":6,"\u5b58\u4e0b\u6709\u5173\u53c2\u6570\u67e5\u770b\u6216\u901a\u77e5\u5f00\u53d1\u8005":6,"\u5bf9\u7ea6\u97005\u5206\u949f":6,"\u5feb\u5bf9\u6a21\u5f0f\u76ee\u524d\u4ec5\u652f\u6301\u4e2d\u82f1":6,"\u662f":6,"\u6700\u5c0f":6,"\u7136\u540e\u8fdb\u884c\u5bf9\u9f50":6,"\u7684\u5b6a\u751f\u5144\u5f1f":6,"\u7684\u5efa\u8bae\u503c":6,"\u76ee\u524d\u4ec5\u652f\u6301\u4e2d\u82f1":[],"\u76ee\u524d\u4ec5\u652f\u6301\u7eaf\u6587\u672c\u6587\u4ef6\u4e0a\u8f7d":6,"\u7b2c\u4e8c\u6b21\u
4e0a\u8f7d\u6587\u4ef6\u524d\u8bf7\u70b9\u51fb":6,"\u7b49":6,"\u7b49\u683c\u5f0f":6,"\u82f1\u4e2d":6,"\u82f1\u4e2d\u5bf9\u9f50":6,"\u8bbe\u5927\u4e9b\u5219\u4f1a\u5f97\u5230\u5c11\u4e00\u4e9b\u5bf9\u9f50\u5bf9\u56e0\u4e3a\u53ef\u80fd\u9519\u5931\u4e86\u4e00\u4e9b":6,"\u8bbe\u5927\u4e9b\u6216":6,"\u8bbe\u5c0f\u4e9b\u53ef\u4ee5\u5f97\u5230\u66f4\u591a\u7684\u5bf9\u9f50\u5bf9\u4f46\u4e5f\u4f1a\u6709\u66f4\u591a":6,"\u8bbe\u5c0f\u4e9b\u6216":6,"\u8bef\u62a5\u5bf9":6,"\u8bf7\u52a0\u5165qq\u7fa4":6,"\u8fd0\u884c\u51fa\u9519\u65f6\u53ef\u4ee5\u70b9\u51fb":6,"\u9519\u8bef\u5224\u65ad\u4e3a\u5bf9\u9f50\u7684\u5bf9":6,"do":5,"new":5,As:0,For:0,If:[2,5],On:5,The:[2,5],To:5,about:5,ad:2,address:5,aim:2,align:[0,2,5,6],align_s:[1,3],align_text:[1,3],also:5,although:2,amend_avec:[1,3],an:2,app:[1,3],applic:2,approxim:2,ar:[2,5],attempt:5,been:[0,2],befor:5,better:5,blank:5,browser:5,built:0,bumblebe:[5,6],can:5,candid:5,cannot:0,cat:2,chines:5,clear:[5,6],click:[0,5],cmat2tset:[1,3],co:0,contact:2,content:3,copi:5,csv:[5,6],current:2,de:2,develop:[2,5],dl_type:[5,6],docterm_scor:[1,3],docx:[5,6],download:0,dual:2,dualtext:2,e:2,ebook:2,educ:2,en2zh:[1,3],en2zh_token:[1,3],en:[2,5],english:5,epsilon:[5,6],esp:[5,6],etc:[2,5],exampl:[1,2,5],experiment:2,fals:5,fast:2,file2text:[1,3],file:[5,6],files2df:[1,3],find:2,first:5,flag:[5,6],format:5,full:2,further:2,g:2,gen_aset:[1,3],gen_eps_minsampl:[1,3],gen_model:[1,3],gen_pset:[1,3],gen_row_align:[1,3],go:5,good:5,gradio:2,group:5,ha:[0,2],hand:5,have:5,help:2,here:[],how:1,html:[5,6],http:0,huggingfac:0,identifi:5,idf_typ:[5,6],imag:5,implement:2,index:1,inform:5,insert_spac:[1,3],instal:1,interfac:2,interpolate_pset:[1,3],introduct:1,introduec:2,ja:2,join:5,just:0,know:5,languag:2,languang:5,larger:5,later:5,laugnag:2,learn:2,left:5,limit:[1,5],line:5,lists2cmat:[1,3],loadtext:[1,3],look:5,machin:2,mai:5,mani:2,md:[5,6],mdx_e2c:[1,3],method:0,mikee:0,min_sampl:[5,6],minimum:5,minut:[],miss:5,mix:5,mode:2,modul:[1,3],more:5,motiv:1,
need:5,non:5,norm:[5,6],normal:5,now:0,number:5,one:0,onli:2,onlin:0,open:5,other:[2,5],output:5,packag:[0,1,3],page:1,pair:[2,5],paragraph:2,particular:2,pdf:[5,6],per:[],permit:2,pip:0,pleas:5,plot_cmat:[1,3],plot_df:[1,3],posit:5,power:2,proced:5,process_upload:[1,3],properli:2,provid:2,publish:0,pure:5,pypi:0,python:2,qq:5,radiobe:[0,2,5,6],requir:2,result:5,right:5,row:0,ru:2,save:5,search:1,seg_text:[1,3],select:5,sentenc:2,separ:5,should:5,shuffle_s:[1,3],sibl:5,slow:2,smaller:5,smatrix:[1,3],someth:5,space:0,srt:[5,6],submit:[0,5],submodul:[1,3],subsequ:5,suggest:[0,5],support:[2,5],tab:5,tabl:0,tend:5,term:2,testrun:0,text:[2,5],tf_type:[5,6],them:5,time:2,tmx:2,touch:5,track:2,translat:2,treat:5,trim_df:[1,3],two:2,txt:[5,6],unless:5,upload:5,us:[0,1],usag:1,valu:5,version:0,wa:[],welcom:2,what:5,when:[2,5],willing:2,wrong:5,yet:0,you:[2,5],zh:[2,5],zip:0},titles:["Examples","Welcome to radiobee\u2019s documentation!","Introduction","radiobee","radiobee package","How to use","\u4f7f\u7528\u8bf4\u660e"],titleterms:{"\u4f7f\u7528\u8bf4\u660e":6,align_s:4,align_text:4,amend_avec:4,app:4,cmat2tset:4,content:[1,4],docterm_scor:4,document:1,en2zh:4,en2zh_token:4,exampl:0,file2text:4,files2df:4,gen_aset:4,gen_eps_minsampl:4,gen_model:4,gen_pset:4,gen_row_align:4,how:5,indic:1,insert_spac:4,instal:0,interpolate_pset:4,introduct:2,limit:2,lists2cmat:4,loadtext:4,mdx_e2c:4,modul:4,motiv:2,packag:4,plot_cmat:4,plot_df:4,process_upload:4,radiobe:[1,3,4],s:1,seg_text:4,shuffle_s:4,smatrix:4,submodul:4,tabl:1,trim_df:4,us:5,usag:0,welcom:1}})
 
1
+ Search.setIndex({docnames:["examples","index","intro","modules","radiobee","userguide","userguide-zh"],envversion:{"sphinx.domains.c":2,"sphinx.domains.changeset":1,"sphinx.domains.citation":1,"sphinx.domains.cpp":4,"sphinx.domains.index":1,"sphinx.domains.javascript":2,"sphinx.domains.math":2,"sphinx.domains.python":3,"sphinx.domains.rst":2,"sphinx.domains.std":2,sphinx:56},filenames:["examples.rst","index.rst","intro.rst","modules.rst","radiobee.rst","userguide.rst","userguide-zh.rst"],objects:{},objnames:{},objtypes:{},terms:{"1":[5,6],"10":2,"12":[5,6],"2":[5,6],"200":[5,6],"2000":[5,6],"3":[0,2],"316287378":[5,6],"4":[5,6],"500":6,"8":[5,6],"\u4e00\u822c\u65e0\u9700\u7406\u4f1a\u8fd9\u4e9b\u53c2\u6570":6,"\u4e2d\u82f1\u975e\u7a7a\u884c\u9650\u5236\u5728":6,"\u4e3a\u4e2d\u82f1\u6587\u6df7\u5408\u6587\u672c\u53ca\u8bd5\u7740\u5206\u79bb\u4e2d\u82f1\u6587":6,"\u4e3a\u7a7a\u767d\u65f6":6,"\u4e86\u89e3\u8fd9\u4e9b\u5bf9\u9f50\u5de5\u5177":6,"\u4ee5\u5185":6,"\u4ee5\u540e\u53ef\u80fd\u4f1a\u652f\u6301":6,"\u4f18\u8d28\u5bf9":6,"\u4f7f\u7528\u8bf4\u660e":1,"\u5176\u4ed6\u8bed\u8a00\u5bf9\u7684\u5bf9\u9f50":6,"\u5219\u4f1a\u89c6":6,"\u5219\u9650\u5236\u5728":6,"\u53e6\u4e00\u65b9\u9762":6,"\u53ef\u4ee5\u53f3\u51fb\u62f7\u51fa\u56fe\u7684\u94fe\u63a5\u7528\u6d4f\u89c8\u5668\u72ec\u7acb\u8bbf\u95ee\u62f7\u51fa\u6765\u7684\u94fe\u63a5\u6216\u53f3\u51fb\u5b58\u76d8\u518d\u7528\u770b\u56fe\u7a0b\u5e8f\u6253\u5f00\u5b58\u76d8\u7684\u56fe\u6587\u4ef6":6,"\u548c":6,"\u5acc\u56fe\u592a\u5c0f\u7684\u8bdd":6,"\u5b58\u4e0b\u6709\u5173\u53c2\u6570\u67e5\u770b\u6216\u901a\u77e5\u5f00\u53d1\u8005":6,"\u5bf9\u7ea6\u97005\u5206\u949f":6,"\u5feb\u5bf9\u6a21\u5f0f\u76ee\u524d\u4ec5\u652f\u6301\u4e2d\u82f1":6,"\u662f":6,"\u6700\u5c0f":6,"\u7136\u540e\u8fdb\u884c\u5bf9\u9f50":6,"\u7684\u5b6a\u751f\u5144\u5f1f":6,"\u7684\u5efa\u8bae\u503c":6,"\u76ee\u524d\u4ec5\u652f\u6301\u7eaf\u6587\u672c\u6587\u4ef6\u4e0a\u8f7d":6,"\u7b2c\u4e8c\u6b21\u4e0a\u8f7d\u6587\u4ef6\u524d\u8bf7\u70b9\u51fb":6,"
\u7b49":6,"\u7b49\u683c\u5f0f":6,"\u82f1\u4e2d":6,"\u82f1\u4e2d\u5bf9\u9f50":6,"\u8bbe\u5927\u4e9b\u5219\u4f1a\u5f97\u5230\u5c11\u4e00\u4e9b\u5bf9\u9f50\u5bf9\u56e0\u4e3a\u53ef\u80fd\u9519\u5931\u4e86\u4e00\u4e9b":6,"\u8bbe\u5927\u4e9b\u6216":6,"\u8bbe\u5c0f\u4e9b\u53ef\u4ee5\u5f97\u5230\u66f4\u591a\u7684\u5bf9\u9f50\u5bf9\u4f46\u4e5f\u4f1a\u6709\u66f4\u591a":6,"\u8bbe\u5c0f\u4e9b\u6216":6,"\u8bef\u62a5\u5bf9":6,"\u8bf7\u52a0\u5165qq\u7fa4":6,"\u8fd0\u884c\u51fa\u9519\u65f6\u53ef\u4ee5\u70b9\u51fb":6,"\u9519\u8bef\u5224\u65ad\u4e3a\u5bf9\u9f50\u7684\u5bf9":6,"do":5,"new":5,As:0,For:0,If:[2,5],On:5,The:[2,5],To:5,about:5,ad:2,address:5,aim:2,align:[0,2,5,6],align_s:[1,3],align_text:[1,3],also:5,although:2,amend_avec:[1,3],an:2,app:[1,3],applic:2,approxim:2,ar:[2,5],attempt:5,been:[0,2],befor:5,better:5,blank:5,browser:5,built:0,bumblebe:[5,6],can:5,candid:5,cannot:0,cat:2,chines:5,clear:[5,6],click:[0,5],cmat2tset:[1,3],co:0,contact:2,content:3,copi:5,csv:[5,6],current:2,de:2,develop:[2,5],dl_type:[5,6],docterm_scor:[1,3],docx:[5,6],download:0,dual:2,dualtext:2,e:2,ebook:2,educ:2,en2zh:[1,3],en2zh_token:[1,3],en:[2,5],english:5,epsilon:[5,6],esp:[5,6],etc:[2,5],exampl:[1,2,5],experiment:2,fals:5,fast:2,file2text:[1,3],file:[5,6],files2df:[1,3],find:2,first:5,fix:0,flag:[5,6],format:5,full:2,further:2,g:2,gen_aset:[1,3],gen_eps_minsampl:[1,3],gen_model:[1,3],gen_pset:[1,3],gen_row_align:[1,3],go:5,good:5,gradio:[0,2],group:5,ha:[0,2],hand:5,have:[0,5],help:2,henc:0,hf:0,how:1,html:[5,6],http:0,huggingfac:0,identifi:5,idf_typ:[5,6],imag:5,implement:2,index:1,inform:5,insert_spac:[1,3],instal:1,interfac:2,interpolate_pset:[1,3],introduc:2,introduct:1,ja:2,join:5,just:0,know:5,languag:2,languang:5,larger:5,later:5,laugnag:2,learn:2,left:5,limit:[1,5],line:[0,5],lists2cmat:[1,3],loadtext:[1,3],look:5,machin:2,mai:[0,5],mani:2,md:[5,6],mdx_e2c:[1,3],method:0,mikee:0,min_sampl:[5,6],minimum:5,miss:5,mix:5,mode:2,modul:[1,3],more:5,motiv:1,need:5,non:5,norm:[5,6],normal:5,no
w:0,number:5,off:0,one:0,onli:2,onlin:0,open:5,other:[2,5],output:5,packag:[0,1,3],page:1,pair:[2,5],paragraph:2,particular:2,pdf:[5,6],permit:2,pip:0,pleas:5,plot_cmat:[1,3],plot_df:[1,3],posit:5,power:2,problem:0,proced:5,process_upload:[1,3],properli:2,provid:2,publish:0,pure:5,pypi:0,python:2,qq:5,radiobe:[0,2,5,6],requir:2,result:5,right:5,row:0,ru:2,run:0,save:5,search:1,seem:0,seg_text:[1,3],select:5,sentenc:2,separ:5,should:5,shuffle_s:[1,3],sibl:5,slow:2,smaller:5,smatrix:[1,3],someth:5,space:0,srt:[5,6],submit:[0,5],submodul:[1,3],subsequ:5,suggest:[0,5],support:[2,5],tab:5,tabl:0,taken:0,tend:5,term:2,testrun:0,text:[2,5],tf_type:[5,6],them:5,time:2,tmx:2,touch:5,track:2,translat:2,treat:5,trim_df:[1,3],troubl:0,two:2,txt:[5,6],unless:5,until:0,upload:5,us:[0,1],usag:1,valu:5,version:0,welcom:2,what:5,when:[2,5],willing:2,wrong:5,yet:0,you:[2,5],zh:[2,5],zip:0},titles:["Examples","Welcome to radiobee\u2019s documentation!","Introduction","radiobee","radiobee package","How to use","\u4f7f\u7528\u8bf4\u660e"],titleterms:{"\u4f7f\u7528\u8bf4\u660e":6,align_s:4,align_text:4,amend_avec:4,app:4,cmat2tset:4,content:[1,4],docterm_scor:4,document:1,en2zh:4,en2zh_token:4,exampl:0,file2text:4,files2df:4,gen_aset:4,gen_eps_minsampl:4,gen_model:4,gen_pset:4,gen_row_align:4,how:5,indic:1,insert_spac:4,instal:0,interpolate_pset:4,introduct:2,limit:2,lists2cmat:4,loadtext:4,mdx_e2c:4,modul:4,motiv:2,packag:4,plot_cmat:4,plot_df:4,process_upload:4,radiobe:[1,3,4],s:1,seg_text:4,shuffle_s:4,smatrix:4,submodul:4,tabl:1,trim_df:4,us:5,usag:0,welcom:1}})
docs/source/examples.rst CHANGED
@@ -3,6 +3,8 @@ Examples
3
 
4
  ``radiobee`` has in-built examples. Just click one of the rows in the ``Examples`` table and click ``Submit`` to testrun.
5
 
 
 
6
  Installation/Usage:
7
  *******************
8
  As the package has not been published on PyPi yet, it CANNOT be installed using pip.
 
3
 
4
  ``radiobee`` has in-built examples. Just click one of the rows in the ``Examples`` table and click ``Submit`` to testrun.
5
 
6
+ `gradio 3` (run in hf spaces) seems to have trouble with examples. Hence, examples may be taken off line until the problem is fixed.
7
+
8
  Installation/Usage:
9
  *******************
10
  As the package has not been published on PyPi yet, it CANNOT be installed using pip.
docs/source/intro.rst CHANGED
@@ -18,4 +18,4 @@ Limitations
18
  Currently, only zh-en/en-zh pairs are supported in fast-track mode although further pairs will be added if and when time permits.
19
  If you are willing to help with a particular pair (for example, de-zh, ja-zh, ru-zh, etc.), you are welcome to contact the developer.
20
 
21
- An experimental slow-track mode (time required approximately 10 times that of fast-track mode) is introdueced for other laugnage pairs.
 
18
  Currently, only zh-en/en-zh pairs are supported in fast-track mode although further pairs will be added if and when time permits.
19
  If you are willing to help with a particular pair (for example, de-zh, ja-zh, ru-zh, etc.), you are welcome to contact the developer.
20
 
21
+ An experimental slow-track mode (time required approximately 10 times that of fast-track mode) is introduced for other language pairs.
radiobee/__init__.py CHANGED
@@ -1 +1,2 @@
1
  """Init."""
 
 
1
  """Init."""
2
+ __version__ = "0.1.0b"
radiobee/detect.py CHANGED
@@ -29,7 +29,7 @@ def with_func_attrs(**attrs: Any) -> Callable:
29
  def detect(text: str, set_languages: Optional[List[str]] = None) -> str:
30
  """Detect language via polyglot and fastlid.
31
 
32
- check first with fastlid, if conf < 0.3, check with
33
 
34
  Alternative in detec_alt.py
35
  """
 
29
  def detect(text: str, set_languages: Optional[List[str]] = None) -> str:
30
  """Detect language via polyglot and fastlid.
31
 
32
+ check first with fastlid, if conf < 0.3, check with polyglot.text.Detector
33
 
34
  Alternative in detec_alt.py
35
  """
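The updated docstring describes a two-stage check: try fastlid first, and fall back to a second detector when the confidence is below 0.3. A minimal sketch of that control flow follows. The function name `detect_with_fallback` and both stub detectors are hypothetical and stand in for the real fastlid and polyglot calls, which are not reproduced here; only the 0.3 threshold and the (language, confidence) return shape are taken from the diff.

```python
from typing import Callable, Tuple

# Hypothetical type aliases: the primary detector returns (language, confidence),
# the fallback detector returns only a language code.
PrimaryDetector = Callable[[str], Tuple[str, float]]
FallbackDetector = Callable[[str], str]


def detect_with_fallback(
    text: str,
    primary: PrimaryDetector,
    fallback: FallbackDetector,
    threshold: float = 0.3,
) -> str:
    """Return the primary guess unless its confidence is below the threshold."""
    lang, conf = primary(text)
    if conf < threshold:
        # Low confidence: defer to the second detector.
        return fallback(text)
    return lang


# Stub detectors for illustration only (assumptions, not the real libraries).
def stub_primary(text: str) -> Tuple[str, float]:
    # Confident on ASCII text, unsure otherwise.
    return ("en", 0.9) if text.isascii() else ("ja", 0.1)


def stub_fallback(text: str) -> str:
    return "zh"


if __name__ == "__main__":
    print(detect_with_fallback("hello", stub_primary, stub_fallback))
    print(detect_with_fallback("你好", stub_primary, stub_fallback))
```

Passing the detectors in as callables keeps the threshold logic testable without installing either library.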
radiobee/radiobee_cli.py ADDED
@@ -0,0 +1,545 @@
1
+ """Run radiobee-cli, based on gradiobee.
2
+
3
+ https://stackoverflow.com/questions/71007924/how-can-i-get-a-version-to-the-root-of-a-typer-typer-application
4
+ """
5
+ # pylint: disable=invalid-name, too-many-arguments, too-many-branches, too-many-locals, too-many-statements, unused-variable, too-many-return-statements, unused-import
6
+
7
+ from typing import Optional
8
+ from pathlib import Path
9
+ import platform
10
+ import inspect
11
+ from itertools import zip_longest
12
+
13
+ # import tempfile
14
+
15
+ # from click import click
16
+ import typer
17
+ from sklearn.cluster import DBSCAN
18
+ from fastlid import fastlid
19
+ from logzero import logger
20
+ from icecream import ic
21
+
22
+ import numpy as np # noqa
23
+ import pandas as pd
24
+ import matplotlib # noqa
25
+ import matplotlib.pyplot as plt
26
+ import seaborn as sns
27
+
28
+ import sys
29
+ if "." not in sys.path:
30
+ sys.path.append(".")
31
+
32
+ # from radiobee.process_upload import process_upload
33
+ from radiobee.files2df import files2df
34
+ from radiobee.file2text import file2text
35
+ from radiobee.lists2cmat import lists2cmat
36
+ from radiobee.gen_pset import gen_pset
37
+ from radiobee.gen_aset import gen_aset
38
+ from radiobee.align_texts import align_texts
39
+ from radiobee.cmat2tset import cmat2tset
40
+ from radiobee.trim_df import trim_df
41
+ from radiobee.error_msg import error_msg
42
+ from radiobee.text2lists import text2lists
43
+
44
+ from radiobee.align_sents import align_sents
45
+ from radiobee.shuffle_sents import shuffle_sents # type: ignore
46
+ from radiobee.paras2sents import paras2sents # type: ignore
47
+ from radiobee import __version__
48
+
49
+ sns.set()
50
+ sns.set_style("darkgrid")
51
+ pd.options.display.float_format = "{:,.2f}".format
52
+
53
+ debug = False
54
+ debug = True
55
+
56
+ _ = """
57
+ def gradiobee( # noqa
58
+ file1,
59
+ file2,
60
+ tf_type,
61
+ idf_type,
62
+ dl_type,
63
+ norm,
64
+ eps,
65
+ min_samples,
66
+ # debug=False,
67
+ sent_ali_algo,
68
+ ):
69
+ # """
70
+
71
+ app = typer.Typer(
72
+ add_completion=False,
73
+ )
74
+
75
+
76
+ def version_callback(value: bool):
77
+ if value:
78
+ ver = typer.style(f"{__version__}", fg=typer.colors.GREEN, bold=True)
79
+ typer.echo(f"radiobee-cli {ver}")
80
+ raise typer.Exit()
81
+
82
+
83
+ @app.command()
84
+ def radiobee_cli(
85
+ file1: str = typer.Argument(..., help="first file name"),
86
+ file2: Optional[str] = typer.Argument(None, help="optional second file name (if not provided, the first file will be split into two texts)"),
87
+ tf_type: str = typer.Option("linear", help="tf type [linear, sqrt, log, binary]"),
88
+ idf_type: str = typer.Option(None, help="idf type [None, standard, smooth, bm25]"),
89
+ dl_type: str = typer.Option("", help="dl type [None, linear, sqrt, log]"),
90
+ norm: str = typer.Option("", help="norm [None, l1, l2]"),
91
+ eps: float = typer.Option(10, help="epsilon, typically between 1 and 20"),
92
+ min_samples: int = typer.Option(6, help="minimum samples, typically between 1 and 20"),
93
+ sent_ali_algo: str = typer.Option("", help="sentence align algorithm [None, fast, slow]"),
94
+ version: Optional[bool] = typer.Option(
95
+ None, "--version", "-V", callback=version_callback, is_eager=True,
96
+ ),
97
+ ):
98
+ """Align dualtext."""
99
+ logger.debug(" *debug* ")
100
+
101
+ # possible further switches
102
+ # para_sent: para/sent
103
+ # sent_ali: default/radio/gale-church
104
+ plot_dia = True # noqa
105
+
106
+ # outputs: check return
107
+ # if outputs is modified, also need to modify error_msg's outputs
108
+
109
+ # convert "None" to None for those Radio types
110
+ idf_type, dl_type, norm = (
111
+ None if elm in (None, "", "None") else elm for elm in (idf_type, dl_type, norm)
112
+ )
113
+
114
+ # logger.info("file1: *%s*, file2: *%s*", file1, file2)
115
+ if file2 is not None:
116
+ logger.info("file1: *%s*, file2: *%s*", file1, file2)
117
+ else:
118
+ logger.info("file1: *%s*, file2: *%s*", file1, file2)
119
+
120
+ # bypass if file1 or file2 is str input
121
+ # if not (isinstance(file1, str) or isinstance(file2, str)):
122
+ text1 = file2text(file1)
123
+
124
+ if file2 is None:
125
+ logger.debug("file2 is None")
126
+ text2 = ""
127
+ else:
128
+ logger.debug("file2: %s", file2)
129
+ text2 = file2text(file2)
130
+
131
+ # if not text1.strip() or not text2.strip():
132
+ if not text1.strip():
133
+ msg = (
134
+ "file 1 is apparently empty... Provide a non-empty file and try again."
135
+ # f"text1[:10]: [{text1[:10]}], "
136
+ # f"text2[:10]: [{text2[:10]}]"
137
+ )
138
+ return error_msg(msg)
139
+
140
+ # single file
141
+ # when text2 is empty
142
+ # process file1/text1: split text1 to text1 text2 to zh-en
143
+
144
+ len_max = 2000
145
+ if not text2.strip(): # empty file2
146
+ _ = [elm.strip() for elm in text1.splitlines() if elm.strip()]
147
+ if not _: # essentially empty file1
148
+ return error_msg("Nothing worthy of processing in file 1")
149
+
150
+ logger.info(
151
+ "single file: len %s, max %s",
152
+ len(_), 2 * len_max
153
+ )
154
+ # exit if there are too many lines
155
+ if len(_) > 2 * len_max:
156
+ return error_msg(f" Too many lines ({len(_)}) > {2 * len_max}, alignment op halted, sorry.", "info")
157
+
158
+ _ = zip_longest(_, [""], fillvalue="")
159
+ _ = pd.DataFrame(_, columns=["text1", "text2"])
160
+ df_trimmed = trim_df(_)
161
+
162
+ # text1 = loadtext("data/test-dual.txt")
163
+ list1, list2 = text2lists(text1)
164
+
165
+ lang1 = text2lists.lang1
166
+ lang2 = text2lists.lang2
167
+ offset = text2lists.offset # noqa
168
+
169
+ _ = """
170
+ ax = sns.heatmap(lists2cmat(list1, list2), cmap="gist_earth_r") # ax=plt.gca()
171
+ ax.invert_yaxis()
172
+ ax.set(
173
+ xlabel=lang1,
174
+ ylabel=lang2,
175
+ title=f"cos similarity heatmap \n(offset={offset})",
176
+ )
177
+ plt_loc = "img/plt.png"
178
+ plt.savefig(plt_loc)
179
+ # """
180
+
181
+ # output_plot = plt_loc # for gr.outputs.Image
182
+
183
+ #
184
+ _ = zip_longest(list1, list2, fillvalue="")
185
+ df_aligned = pd.DataFrame(
186
+ _,
187
+ columns=["text1", "text2"]
188
+ )
189
+
190
+ file_dl = Path(f"{Path(file1).stem}-{lang1}-{lang2}.csv")
191
+ file_dl_xlsx = Path(
192
+ f"{Path(file1).stem}-{lang1}-{lang2}.xlsx"
193
+ )
194
+
195
+ # return df_trimmed, output_plot, file_dl, file_dl_xlsx, df_aligned
196
+
197
+ # end if single file
198
+ # not single file
199
+ else: # file1 file 2: proceed
200
+ fastlid.set_languages = None
201
+ lang1, _ = fastlid(text1)
202
+ lang2, _ = fastlid(text2)
203
+
204
+ df1 = files2df(file1, file2)
205
+
206
+ list1 = [elm for elm in df1.text1 if elm]
207
+ list2 = [elm for elm in df1.text2 if elm]
208
+ # len1 = len(list1) # noqa
209
+ # len2 = len(list2) # noqa
210
+
211
+ # exit if there are too many lines
212
+ len12 = len(list1) + len(list2)
213
+ logger.info(
214
+ "fast track: len1 %s, len2 %s, tot %s, max %s",
215
+ len(list1), len(list2), len(list1) + len(list2), 3 * len_max
216
+ )
217
+ if len12 > 3 * len_max:
218
+ return error_msg(f" Too many lines ({len(list1)} + {len(list2)} > {3 * len_max}), alignment op halted, sorry.", "info")
219
+
220
+ file_dl = Path(f"{Path(file1).stem}-{Path(file2).stem}.csv")
221
+ file_dl_xlsx = Path(
222
+ f"{Path(file1).stem}-{Path(file2).stem}.xlsx"
223
+ )
224
+
225
+ df_trimmed = trim_df(df1)
226
+ # --- end else single
227
+
228
+ lang_en_zh = ["en", "zh"]
229
+
230
+ logger.debug("lang1: %s, lang2: %s", lang1, lang2)
231
+ if debug:
232
+ ic(f" lang1: {lang1}, lang2: {lang2}")
233
+ ic("fast track? ", lang1 in lang_en_zh and lang2 in lang_en_zh)
234
+
235
+ # fast track
236
+ if lang1 in lang_en_zh and lang2 in lang_en_zh:
237
+ try:
238
+ cmat = lists2cmat(
239
+ list1,
240
+ list2,
241
+ tf_type=tf_type,
242
+ idf_type=idf_type,
243
+ dl_type=dl_type,
244
+ norm=norm,
245
+ )
246
+ except Exception as exc:
247
+ logger.error(exc)
248
+ return error_msg(exc)
249
+ # slow track
250
+ else:
251
+ logger.info(
252
+ "slow track: len1 %s, len2 %s, tot: %s, max %s",
253
+ len(list1), len(list2), len(list1) + len(list2),
254
+ 3 * len_max
255
+ )
256
+ if len(list1) + len(list2) > 3 * len_max:
257
+ msg = (
258
+ f" len1 {len(list1)} + len2 {len(list2)} > {3 * len_max}. "
259
+ "This will take too long to complete "
260
+ "and will hog this experimental server and hinder "
261
+ "other users from trying the service. "
262
+ "Aborted...sorry"
263
+ )
264
+ return error_msg(msg, "info")
265
+ try:
266
+ from radiobee.model_s import model_s # pylint: disable=import-outside-toplevel
267
+ vec1 = model_s.encode(list1)
268
+ vec2 = model_s.encode(list2)
269
+ # cmat = vec1.dot(vec2.T)
270
+ cmat = vec2.dot(vec1.T)
271
+ except Exception as exc:
272
+ logger.error(exc)
273
+ _ = inspect.currentframe().f_lineno # type: ignore
274
+ return error_msg(
275
+ f"{exc}, {Path(__file__).name} ln{_}, period"
276
+ )
277
+
278
+ tset = pd.DataFrame(cmat2tset(cmat))
279
+ tset.columns = ["x", "y", "cos"]
280
+
281
+ _ = """
282
+ df_trimmed = pd.concat(
283
+ [
284
+ df1.iloc[:4, :],
285
+ pd.DataFrame(
286
+ [
287
+ [
288
+ "...",
289
+ "...",
290
+ ]
291
+ ],
292
+ columns=df1.columns,
293
+ ),
294
+ df1.iloc[-4:, :],
295
+ ],
296
+ ignore_index=1,
297
+ )
298
+ # """
299
+
300
+ # process list1, list2 to obtain df_aligned
301
+ # quick fix ValueError: not enough values to unpack (expected at least 1, got 0)
302
+ # fixed in gen_pset, but we leave the loop here
303
+ for min_s in range(min_samples):
304
+ logger.info(" min_samples, using %s", min_samples - min_s)
305
+ try:
306
+ pset = gen_pset(
307
+ cmat,
308
+ eps=eps,
309
+ min_samples=min_samples - min_s,
310
+ delta=7,
311
+ )
312
+ break
313
+ except ValueError:
314
+ logger.info(" decrease min_samples by %s", min_s + 1)
315
+ continue
316
+ except Exception as e:
317
+ logger.error(e)
318
+ continue
319
+ else:
320
+ # break should happen above when min_samples = 2
321
+ raise Exception("bummer, this shouldn't happen, probably another bug")
322
+
323
+ min_samples = gen_pset.min_samples
324
+
325
+ # will result in error message:
326
+ # UserWarning: Starting a Matplotlib GUI outside of
327
+ # the main thread will likely fail."
328
+ _ = """
329
+ plot_cmat(
330
+ cmat,
331
+ eps=eps,
332
+ min_samples=min_samples,
333
+ xlabel=lang1,
334
+ ylabel=lang2,
335
+ )
336
+ # """
337
+
338
+ # move plot_cmat's code to the main thread here
339
+ # to make it work
340
+ xlabel = lang1
341
+ ylabel = lang2
342
+
343
+ len1, len2 = cmat.shape
344
+ ylim, xlim = len1, len2
345
+
346
+ # does not seem to show up
347
+ ic(f" len1 (ylim): {len1}, len2 (xlim): {len2}")
348
+ logger.debug(" len1 (ylim): %s, len2 (xlim): %s", len1, len2)
349
+ if debug:
350
+ print(f" len1 (ylim): {len1}, len2 (xlim): {len2}")
351
+
352
+ df_ = pd.DataFrame(cmat2tset(cmat))
353
+ df_.columns = ["x", "y", "cos"]
354
+
355
+ sns.set()
356
+ sns.set_style("darkgrid")
357
+
358
+ # close all existing figures, necessary for hf spaces
359
+ plt.close("all")
360
+
361
+ # if sys.platform not in ["win32", "linux"]:
362
+ # going for non-interactive
363
+ # to cater for Mac, thanks to WhiteFox
364
+ plt.switch_backend("Agg")
365
+
366
+ # figsize=(13, 8), (339, 212) mm on '1280x800+0+0'
367
+ fig = plt.figure(figsize=(13, 8))
368
+
369
+ # gs = fig.add_gridspec(2, 2, wspace=0.4, hspace=0.58)
370
+ gs = fig.add_gridspec(1, 2, wspace=0.4, hspace=0.58)
371
+ ax_heatmap = fig.add_subplot(gs[0, 0]) # ax2
372
+ ax0 = fig.add_subplot(gs[0, 1])
373
+ # ax1 = fig.add_subplot(gs[1, 0])
374
+
375
+ cmap = "viridis_r"
376
+ sns.heatmap(cmat, cmap=cmap, ax=ax_heatmap).invert_yaxis()
377
+ ax_heatmap.set_xlabel(xlabel)
378
+ ax_heatmap.set_ylabel(ylabel)
379
+ ax_heatmap.set_title("cos similarity heatmap")
380
+
381
+ fig.suptitle(f"alignment projection\n(eps={eps}, min_samples={min_samples})")
382
+
383
+ _ = DBSCAN(min_samples=min_samples, eps=eps).fit(df_).labels_ > -1
384
+
385
+ # _x = DBSCAN(min_samples=min_samples, eps=eps).fit(df_).labels_ < 0
386
+ _x = ~_
387
+
388
+ # max cos along columns
389
+ df_.plot.scatter("x", "y", c="cos", cmap=cmap, ax=ax0)
390
+
391
+ # outliers
392
+ df_[_x].plot.scatter("x", "y", c="r", marker="x", alpha=0.6, ax=ax0)
393
+ ax0.set_xlabel(xlabel)
394
+ ax0.set_ylabel(ylabel)
395
+ ax0.set_xlim(xmin=0, xmax=xlim)
396
+ ax0.set_ylim(ymin=0, ymax=ylim)
397
+ ax0.set_title(
398
+ "max along columns (x: outliers)\n"
399
+ "potential aligned pairs (green line), "
400
+ f"{round(sum(_) / xlim, 2):.0%}"
401
+ )
402
+
403
+ plt_loc = "img/plt.png"
404
+ ic(f" plotting to {plt_loc}")
405
+ plt.savefig(plt_loc)
406
+
407
+ # clustered
408
+ # df_[_].plot.scatter("x", "y", c="cos", cmap=cmap, ax=ax1)
409
+ # ax1.set_xlabel(xlabel)
410
+ # ax1.set_ylabel(ylabel)
411
+ # ax1.set_xlim(0, len1)
412
+ # ax1.set_title(f"potential aligned pairs ({round(sum(_) / len1, 2):.0%})")
413
+ # end of plot_cmat
414
+
415
+ src_len, tgt_len = cmat.shape
416
+ aset = gen_aset(pset, src_len, tgt_len)
417
+ final_list = align_texts(aset, list2, list1) # note the order
418
+
419
+ # df_aligned
420
+ df_aligned = pd.DataFrame(final_list, columns=["text1", "text2", "likelihood"])
421
+
422
+ # swap text1 text2
423
+ df_aligned = df_aligned[["text2", "text1", "likelihood"]]
424
+ df_aligned.columns = ["text1", "text2", "likelihood"]
425
+
426
+ ic("paras aligned: ", df_aligned.head(10))
427
+
428
+ # round the last column to 2
429
+ # df_aligned.likelihood = df_aligned.likelihood.round(2)
430
+ # df_aligned = df_aligned.round({"likelihood": 2})
431
+
432
+ # df_aligned.likelihood = df_aligned.likelihood.apply(lambda x: np.nan if x in [""] else x)
433
+
434
+ if len(df_aligned) > 200:
435
+ df_html = None
436
+ else: # show a one-batch table in html
437
+ # style
438
+ styled = df_aligned.style.set_properties(
439
+ **{
440
+ "font-size": "10pt",
441
+ "border-color": "black",
442
+ "border": "1px black solid !important"
443
+ }
444
+ # border-color="black",
445
+ ).set_table_styles([{
446
+ "selector": "", # noqs
447
+ "props": [("border", "2px black solid !important")]}] # noqs
448
+ ).format(precision=2)
449
+
450
+ # .bar(subset="likelihood", color="#5fba7d")
451
+
452
+ # .background_gradient("Greys")
453
+
454
+ # df_html = df_aligned.to_html()
455
+ # df_html = styled.to_html()
456
+ df_html = styled.render()
457
+
458
+ # ===
459
+ if plot_dia:
460
+ output_plot = "img/plt.png"
461
+ else:
462
+ output_plot = None
463
+
464
+ _ = df_aligned.to_csv(index=False)
465
+ file_dl.write_text(_, encoding="utf8")
466
+
467
+ # file_dl.write_text(_, encoding="gb2312") # no go
468
+ df_aligned.to_excel(file_dl_xlsx)
469
+
470
+ # return df_trimmed, plt
471
+
472
+ # return df_trimmed, plt, file_dl, file_dl_xlsx, df_aligned
473
+
474
+ # output_plot: gr.outputs.Image(type="auto", label="...")
475
+ # return df_trimmed, output_plot, file_dl, file_dl_xlsx, df_aligned
476
+ # return df_trimmed, output_plot, file_dl, file_dl_xlsx, styled, df_html # gradio can't handle style
477
+
478
+ ic("sent-ali-algo: ", sent_ali_algo)
479
+
480
+ # ### sent-ali-algo is None: para align
481
+ if sent_ali_algo in ["None"]:
482
+ ic("returning para-ali outputs")
483
+ return df_trimmed, output_plot, file_dl, file_dl_xlsx, None, None, df_aligned, df_html
484
+
485
+ # ### proceed with sent align
486
+ if sent_ali_algo in ["fast"]:
487
+ ic(sent_ali_algo)
488
+ align_func = align_sents
489
+
490
+ ic(df_aligned.shape, df_aligned.columns)
491
+
492
+ aligned_sents = paras2sents(df_aligned, align_func)
493
+
494
+ # ic(pd.DataFrame(aligned_sents).shape, aligned_sents)
495
+ ic(pd.DataFrame(aligned_sents).shape)
496
+
497
+ df_aligned_sents = pd.DataFrame(aligned_sents, columns=["text1", "text2"])
498
+ else: # ["slow"]
499
+ ic(sent_ali_algo)
500
+ align_func = shuffle_sents
501
+ aligned_sents = paras2sents(df_aligned, align_func, lang1, lang2)
502
+
503
+ # add extra entry if necessary
504
+ aligned_sents = [list(sent) + [""] if len(sent) == 2 else list(sent) for sent in aligned_sents]
505
+
506
+ df_aligned_sents = pd.DataFrame(aligned_sents, columns=["text1", "text2", "likelihood"])
507
+
508
+ # prepare sents downloads
509
+ file_dl_sents = Path(f"{file_dl.stem}-sents{file_dl.suffix}")
510
+ file_dl_xlsx_sents = Path(f"{file_dl_xlsx.stem}-sents{file_dl_xlsx.suffix}")
511
+ _ = df_aligned_sents.to_csv(index=False)
512
+ file_dl_sents.write_text(_, encoding="utf8")
513
+
514
+ df_aligned_sents.to_excel(file_dl_xlsx_sents)
515
+
516
+ # prepare html output
517
+ if len(df_aligned_sents) > 200:
518
+ df_html = None
519
+ else: # show a one-batch table in html
520
+ # style
521
+ styled = df_aligned_sents.style.set_properties(
522
+ **{
523
+ "font-size": "10pt",
524
+ "border-color": "black",
525
+ "border": "1px black solid !important"
526
+ }
527
+ # border-color="black",
528
+ ).set_table_styles([{
529
+ "selector": "", # noqs
530
+ "props": [("border", "2px black solid !important")]}] # noqs
531
+ ).format(
532
+ precision=2
533
+ )
534
+ df_html = styled.to_html()
535
+
536
+ # aligned sents outputs
537
+ ic("aligned sents outputs")
538
+
539
+ # return df_trimmed, output_plot, file_dl, file_dl_xlsx, None, None, df_aligned, df_html
540
+ return df_trimmed, output_plot, file_dl, file_dl_xlsx, file_dl_sents, file_dl_xlsx_sents, df_aligned_sents, df_html
541
+
542
+
543
+ if __name__ == "__main__":
544
+ # typer.run(radiobee_cli)
545
+ app()
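Note on the slow sentence-alignment branch above: rows that come back as bare (text1, text2) pairs get an empty likelihood appended so that every row fits the three-column frame. A minimal standalone sketch of that padding step follows; `pad_rows` is a hypothetical helper name used only for illustration and is not part of radiobee.

```python
def pad_rows(rows):
    """Return each row as a list, appending "" to any two-item row.

    Hypothetical helper: mirrors the list comprehension in the diff so that
    every row ends up with text1, text2 and likelihood slots.
    """
    return [list(row) + [""] if len(row) == 2 else list(row) for row in rows]


# One row without a likelihood, one row with it (made-up sample values).
rows = [("a", "b"), ("c", "d", 0.87)]
padded = pad_rows(rows)
print(padded)
```

Because the conditional expression binds more loosely than `+`, the empty string is appended only to the two-item rows; three-item rows pass through unchanged.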
radiobee/trim_df.py CHANGED
@@ -14,12 +14,8 @@ def trim_df(
14
  [
15
  df1.iloc[:len_, :],
16
  pd.DataFrame(
17
- [
18
- [
19
- "...",
20
- "...",
21
- ]
22
- ],
23
  columns=df1.columns,
24
  ),
25
  df1.iloc[-len_:, :],
 
14
  [
15
  df1.iloc[:len_, :],
16
  pd.DataFrame(
17
+ # [["...", "...",]],
18
+ [["..."] * len(df1.columns)],
 
 
 
 
19
  columns=df1.columns,
20
  ),
21
  df1.iloc[-len_:, :],
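Note on this hunk: the placeholder row is now sized from the number of columns instead of being hard-coded to two "..." cells, so it should also fit frames with a third column such as likelihood. The sketch below shows the same idea on plain lists rather than DataFrames; `trim_rows`, its `n_cols` argument, and the "return short inputs unchanged" rule are assumptions made for illustration, not the actual `trim_df` code.

```python
def trim_rows(rows, n_cols, len_=2):
    """Keep the first and last len_ rows, joined by a single "..." row.

    Hypothetical plain-list analogue of trim_df: the filler row is built
    with ["..."] * n_cols so it always matches the column count.
    """
    if len(rows) <= 2 * len_:
        # Assumption: short inputs are returned without a filler row.
        return [list(r) for r in rows]
    filler = [["..."] * n_cols]
    head = [list(r) for r in rows[:len_]]
    tail = [list(r) for r in rows[-len_:]]
    return head + filler + tail


# Six made-up rows with three columns each.
rows = [[i, i * 2, i * 3] for i in range(6)]
trimmed = trim_rows(rows, n_cols=3, len_=2)
print(trimmed)
```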
requirements.txt CHANGED
@@ -27,4 +27,7 @@ nltk
27
  sentence_splitter
28
  icecream
29
  # lazy
30
- alive-progress
 
 
 
 
27
  sentence_splitter
28
  icecream
29
  # lazy
30
+ alive-progress
31
+
32
+ # cli
33
+ click