Update README.md
README.md
CHANGED
@@ -48,11 +48,10 @@ This model can be used for translation tasks between Chinese and English.
48 | As it is a conventional translation model, it can be used in many circumstances, including the translation of academic papers, news, and **even some literary works (given the model's strong performance on grammar and multi-context cases)**.
49 |
50 | ## Bias, Risks, and Limitations
51 | - **1. Remember that this is a beta version of the translation model; we therefore limit the number of input tokens, so please make sure your input text does not exceed the limit.**
52 | **2. DO NOT APPLY THIS MODEL FOR ILLEGAL USES.**
53 | <!-- This section is meant to convey both technical and sociotechnical limitations. -->
54 |
55 | - [More Information Needed]
56 |
57 | ### Recommendations
58 |
@@ -88,20 +87,11 @@ git clone "https://huggingface.co/Varine/opus-mt-zh-en-model"
88 | As the training dataset is enormous, we decided after analysis to train on only 4% of the full dataset, and we split that 4% across 10 epochs so that training and validation loss could be evaluated at each stage.
89 | Moreover, we should note that the data used in our training process consists of Chinese-English sentence pairs (to better embed and compare them in the higher-dimensional space of the Transformer architecture).
90 |
91 | - #### Preprocessing [optional]
92 | -
93 | - [More Information Needed]
94 |
95 |
96 | #### Training Hyperparameters
97 |
98 | - - **Training regime:**
99 | -
100 | - #### Speeds, Sizes, Times [optional]
101 | -
102 | - <!-- This section provides information about throughput, start/end time, checkpoint size if relevant, etc. -->
103 | -
104 | - [More Information Needed]
105 |
106 | ## Evaluation
107 |
@@ -112,91 +102,41 @@ Moreover, we should note that the data used in our training proc
112 | #### Testing Data
113 |
114 | <!-- This should link to a Dataset Card if possible. -->
115 |
116 | - [More Information Needed]
117 | -
118 | - #### Factors
119 | -
120 | - <!-- These are the things the evaluation is disaggregating by, e.g., subpopulations or domains. -->
121 | -
122 | - [More Information Needed]
123 | -
124 | - #### Metrics
125 | -
126 | - <!-- These are the evaluation metrics being used, ideally with a description of why. -->
127 | -
128 | - [More Information Needed]
129 | -
130 | - ### Results
131 | -
132 | - [More Information Needed]
133 | -
134 | - #### Summary
135 | -
136 | -
137 | -
138 | - ## Model Examination [optional]
139 | -
140 | - <!-- Relevant interpretability work for the model goes here -->
141 | -
142 | - [More Information Needed]
143 | -
144 | - ## Environmental Impact
145 | -
146 | - <!-- Total emissions (in grams of CO2eq) and additional considerations, such as electricity usage, go here. Edit the suggested text below accordingly -->
147 |
148 | -
149 |
150 | - - **Hardware Type:** [More Information Needed]
151 | - - **Hours used:** [More Information Needed]
152 | - - **Cloud Provider:** [More Information Needed]
153 | - - **Compute Region:** [More Information Needed]
154 | - - **Carbon Emitted:** [More Information Needed]
155 |
156 | -
157 |
158 | ### Model Architecture and Objective
159 |
160 | - [More Information Needed]
161 |
162 | ### Compute Infrastructure
163 |
164 | - [More Information Needed]
165 |
166 | #### Hardware
167 |
168 | -
169 |
170 | #### Software
171 |
172 | -
173 | -
174 | - ## Citation [optional]
175 |
176 | - <!-- If there is a paper or blog post introducing the model, the APA and Bibtex information for that should go in this section. -->
177 |
178 | - **BibTeX:**
179 |
180 | - [More Information Needed]
181 |
182 | - **APA:**
183 | -
184 | - [More Information Needed]
185 | -
186 | - ## Glossary [optional]
187 | -
188 | - <!-- If relevant, include terms and calculations in this section that can help readers understand the model or model card. -->
189 | -
190 | - [More Information Needed]
191 | -
192 | - ## More Information [optional]
193 | -
194 | - [More Information Needed]
195 |
196 | ## Model Card Authors [optional]
197 |
198 | -
199 |
200 | ## Model Card Contact
201 | -
202 | - [More Information Needed]
48 | As it is a conventional translation model, it can be used in many circumstances, including the translation of academic papers, news, and **even some literary works (given the model's strong performance on grammar and multi-context cases)**.
49 |
50 | ## Bias, Risks, and Limitations
51 | + **1. Remember that this is a beta version of the translation model; we therefore limit the number of input tokens, so please make sure your input text does not exceed the limit (see the length-check sketch below).**
52 | **2. DO NOT APPLY THIS MODEL FOR ILLEGAL USES.**
53 | <!-- This section is meant to convey both technical and sociotechnical limitations. -->
54 |
55 |
56 | ### Recommendations
57 |
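Since the card does not state the exact token ceiling, the following is only a minimal sketch of the pre-flight length check we recommend; the repo id is taken from the clone URL later in this card, and the real limit should be read from the tokenizer rather than assumed.

```python
# A minimal sketch of guarding against the input-token limit before
# translating. The repo id comes from the clone URL in this card; the
# actual ceiling is read from the tokenizer, not hard-coded.
from transformers import AutoTokenizer

tokenizer = AutoTokenizer.from_pretrained("Varine/opus-mt-zh-en-model")

text = "这是一段待翻译的中文文本。"
n_tokens = len(tokenizer(text)["input_ids"])
if n_tokens > tokenizer.model_max_length:
    raise ValueError(
        f"Input is {n_tokens} tokens, over the {tokenizer.model_max_length}-token limit."
    )
```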
87 | As the training dataset is enormous, we decided after analysis to train on only 4% of the full dataset, and we split that 4% across 10 epochs so that training and validation loss could be evaluated at each stage (a sampling sketch follows below).
88 | Moreover, we should note that the data used in our training process consists of Chinese-English sentence pairs (to better embed and compare them in the higher-dimensional space of the Transformer architecture).
89 |
90 |
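As a rough illustration of the 4% split described above: the card does not name the training corpus, so the `wmt19` identifier in this sketch is only a stand-in, and only the 4% fraction comes from the text.

```python
# A minimal sketch of drawing a 4% training subset with the `datasets`
# library. "wmt19" is a stand-in corpus name (the card does not name the
# training set); only the 4% fraction is stated above.
from datasets import load_dataset

full = load_dataset("wmt19", "zh-en", split="train")

# Reproducible 4% sample of the sentence pairs.
subset = full.shuffle(seed=42).select(range(int(0.04 * len(full))))
print(f"Using {len(subset)} of {len(full)} sentence pairs")
```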
91 |
92 | #### Training Hyperparameters
93 |
94 | + - **Training regime:** **fp32** <!-- fp32, fp16 mixed precision, bf16 mixed precision, bf16 non-mixed precision, fp16 non-mixed precision, fp8 mixed precision -->
95 |
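For concreteness, here is a minimal sketch of the stated fp32 regime expressed as Hugging Face `Seq2SeqTrainingArguments`; only fp32 and the 10 epochs mentioned earlier come from this card, while the batch size and output path are placeholders.

```python
# A minimal sketch of fp32 training arguments. Only fp32 and the 10
# epochs are stated in the card; other values are placeholders.
from transformers import Seq2SeqTrainingArguments

args = Seq2SeqTrainingArguments(
    output_dir="opus-mt-zh-en-finetuned",  # placeholder path
    num_train_epochs=10,                   # 10 epochs, per the card
    per_device_train_batch_size=16,        # placeholder
    evaluation_strategy="epoch",           # evaluate loss each epoch
    fp16=False,                            # fp32: no mixed precision
    bf16=False,
)
```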
96 | ## Evaluation
97 |

102 | #### Testing Data
103 |
104 | <!-- This should link to a Dataset Card if possible. -->
105 | + - wmt/wmt19
106 |
107 |
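A minimal sketch of loading the wmt/wmt19 zh-en testing data named above; note that in the `datasets` library, wmt19's held-out newstest set is exposed as the "validation" split.

```python
# Load the wmt19 zh-en held-out data for evaluation.
from datasets import load_dataset

test_set = load_dataset("wmt19", "zh-en", split="validation")
print(test_set[0]["translation"])  # {'zh': '...', 'en': '...'}
```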
108 | + ## Hardware Used in Training
109 |
110 |
111 | + - **Hardware Type:** **1x NVIDIA A10 GPU with 30 vCPUs, 200 GiB RAM, and 1 TiB SSD storage**
112 | + - **Hours used:** **4.08 hours (rough estimate)**
113 | + - **Cloud Provider:** **Lambda Cloud**
114 | + - **Compute Region:** **California, USA**
115 | + - **Carbon Emitted:** **N/A**
116 |
117 | ### Model Architecture and Objective
118 | + We use the Transformer architecture (the Hugging Face implementation) in this model; it is a general-purpose encoder-decoder architecture widely used in machine translation tasks (a usage sketch follows below).
119 |
120 |
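A minimal sketch of running the model as a Hugging Face seq2seq Transformer; the repo id comes from the clone URL earlier in this card, and generation settings are left at their defaults.

```python
# Translate a Chinese sentence to English with the seq2seq model.
from transformers import AutoModelForSeq2SeqLM, AutoTokenizer

repo = "Varine/opus-mt-zh-en-model"
tokenizer = AutoTokenizer.from_pretrained(repo)
model = AutoModelForSeq2SeqLM.from_pretrained(repo)

inputs = tokenizer("我喜欢机器翻译。", return_tensors="pt")
outputs = model.generate(**inputs)
print(tokenizer.decode(outputs[0], skip_special_tokens=True))
```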
121 | ### Compute Infrastructure
122 | + Given the limited compute of a personal PC and the scale of the dataset, we decided to train our model on a GPU cloud, which proved effective.
123 |
124 |
125 | #### Hardware
126 |
127 | + **Thanks to Lambda Cloud, we used an NVIDIA A10 GPU to complete the project.**
128 |
129 | #### Software
130 |
131 | + **We used Jupyter Notebook in the cloud to run our code.**
132 |
133 |
134 |
135 |
136 |
137 | ## Model Card Authors [optional]
138 |
139 | + **Varine Xie**
140 |
141 | ## Model Card Contact
142 | + **Please contact me by email: <varine7499@gmail.com>. I'm glad to receive feedback from you all!** 😊