dxl64 committed on
Commit 22c0e1b · verified · 1 parent: d9baaf5

Update README.md

Files changed (1)
README.md +13 -91
README.md CHANGED
@@ -1,91 +1,13 @@
-# VNSpellCorrection
-
-## Environment setup using Conda
-```
-conda create -n spcr python=3.9.5
-conda activate spcr
-```
-Install libraries
-```
-pip install -r requirements.txt
-```
-
-## To Use Our Trained Model
-
-Download the following `vocab` and `weights` file
-
-Set up folder **data** as follow
-
-.
-├── ...
-├── models
-├── data
-│ ├── binhvq
-│ └── binhvq.vocab.pkl
-│ ├── checkpoints
-│ └── tfmwtr
-│ └── locdx.weights.pth
-├── utils
-└── ...
-
-And then start the `Flask` server
-```
-python server.py
-```
-Go to [localhost:8000](localhost:8000) to use the website
-
-![Screenshot](Figure/Website.png)
-
-## To Train Model From Scratch
-
-Prepare a corpus file `corpus.txt` and put as folowing structure. Sample file in the folder `sample`.
-
-.
-├── ...
-├── models
-├── data
-│ ├── binhvq
-│ └── corpus.txt
-├── utils
-└── ...
-
-Start prepare data by
-```
-cd dataset
-python prepare_dataset.py --corpus binhvq --file corpus.txt
-cleandata.sh binhvq
-```
-Start training by
-```
-python train.py
-```
-## To Evaluate Model
-Evaluate on generated dataset.
-```
-python correct.py
-```
-Evaluate on VSEC public dataset. First need to download `VSEC.jsonl` at https://github.com/VSEC2021/VSEC and setup folder as follow
-
-.
-├── ...
-├── models
-├── data
-│ ├── vsec
-│ └── VSEC.jsonl
-├── utils
-└── ...
-
-Start prepare VSEC data.
-```
-cd dataset
-python prepare_vsec.py
-```
-
-```
-python correct.py --test_data vsec
-```
-
-
-
-
-
+---
+title: Spell Correction
+emoji:
+colorFrom: purple
+colorTo: green
+sdk: gradio
+sdk_version: 5.5.0
+app_file: app.py
+pinned: false
+short_description: Vietnamese spell correction
+---
+
+Check out the configuration reference at https://huggingface.co/docs/hub/spaces-config-reference
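The new README consists of a YAML metadata block between the two `---` lines, which the Hugging Face Hub reads as the Space configuration (`sdk`, `sdk_version`, `app_file`, and so on), followed by ordinary Markdown. The sketch below is a minimal illustration, using only the Python standard library, of how that header separates from the body. It assumes flat `key: value` pairs only; the helper name `parse_front_matter` is hypothetical, and this is not how the Hub itself parses the file (the Hub uses a real YAML parser).

```python
def parse_front_matter(text: str) -> dict[str, str]:
    """Return the flat key/value pairs between the leading '---' delimiters.

    Hypothetical helper: handles only simple `key: value` lines, not nested
    YAML, lists, or quoting. All values are returned as plain strings.
    """
    lines = text.splitlines()
    # The metadata block must open on the very first line.
    if not lines or lines[0].strip() != "---":
        return {}
    fields: dict[str, str] = {}
    for line in lines[1:]:
        if line.strip() == "---":  # closing delimiter ends the metadata block
            return fields
        key, sep, value = line.partition(":")  # split on the first colon only
        if sep:
            fields[key.strip()] = value.strip()
    # No closing delimiter found: treat the file as having no metadata block.
    return {}


README = """---
title: Spell Correction
emoji:
colorFrom: purple
colorTo: green
sdk: gradio
sdk_version: 5.5.0
app_file: app.py
pinned: false
short_description: Vietnamese spell correction
---

Check out the configuration reference at https://huggingface.co/docs/hub/spaces-config-reference
"""

config = parse_front_matter(README)
print(config["sdk"], config["sdk_version"], config["app_file"])
```

Because this sketch keeps every value as a string, `pinned` comes back as `"false"` rather than a boolean, and the empty `emoji:` entry comes back as `""`. A full YAML parser would type these values.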