Update README.md
README.md
CHANGED
@@ -149,9 +149,42 @@ print(tokenizer.decode(outputs[0, input_length:], skip_special_tokens=True))
|
|
149 |
Using this template, each turn is preceded by a `<|im_start|>` delimiter and the role of the entity
|
150 |
(either `user`, for content supplied by the user, or `assistant` for LLM responses), and finished with the `<|im_end|>` token.
|
151 |
|
152 |
### Post-editing
|
153 |
|
154 |
-
For post-editing tasks you can
|
155 |
|
156 |
```python
|
157 |
source = 'Catalan'
|
@@ -164,6 +197,38 @@ text = f"Please fix any mistakes in the following {source}-{target} machine tran
|
|
164 |
# Rafael Nadal and Maria Magdalena inspired an entire generation.
|
165 |
```
|
166 |
|
167 |
|
168 |
## Data
|
169 |
|
|
|
149 |
Using this template, each turn is preceded by a `<|im_start|>` delimiter and the role of the entity
|
150 |
(either `user`, for content supplied by the user, or `assistant` for LLM responses), and finished with the `<|im_end|>` token.
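As a rough illustration of the turn format described above, the sketch below builds a prompt by hand. The helper names `format_turn` and `format_conversation` are hypothetical, and the newline placement is an assumption based on the common ChatML layout. In practice the tokenizer's `apply_chat_template` method is the safer choice, since it applies the template shipped with the model.

```python
def format_turn(role: str, content: str) -> str:
    """Wrap a single turn in ChatML-style delimiters (assumed layout)."""
    return f"<|im_start|>{role}\n{content}<|im_end|>\n"


def format_conversation(messages, add_generation_prompt=True):
    """Concatenate all turns; optionally open an assistant turn for generation."""
    prompt = "".join(format_turn(m["role"], m["content"]) for m in messages)
    if add_generation_prompt:
        # Leave the assistant turn open so the model writes the response.
        prompt += "<|im_start|>assistant\n"
    return prompt


messages = [
    {
        "role": "user",
        "content": "Translate the following text from Catalan into Galician.\nCatalan: Bon dia.\nGalician:",
    }
]
prompt = format_conversation(messages)
print(prompt)
```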
|
151 |
|
152 |
+
### General translation
|
153 |
+
|
154 |
+
For machine translation tasks you can use the following prompt template:
|
155 |
+
|
156 |
+
```
|
157 |
+
Translate the following text from {source} into {target}.
|
158 |
+
{source}: {source sentence}
|
159 |
+
{target}:
|
160 |
+
```
|
161 |
+
<details>
|
162 |
+
<summary>Show an example</summary>
|
163 |
+
|
164 |
+
```python
|
165 |
+
source = 'Catalan'
|
166 |
+
target = 'Galician'
|
167 |
+
source_sentence = "Als antics egipcis del període de l'Imperi Nou els fascinaven els monuments dels seus predecessors, que llavors tenien més de mil anys."
|
168 |
+
|
169 |
+
text = f"Translate the following text from {source} into {target}.\n{source}: {source_sentence} \n{target}:"
|
170 |
+
# Os antigos exipcios do período do Imperio Novo estaban fascinados polos monumentos dos seus predecesores, que entón tiñan máis de mil anos de antigüidade.
|
171 |
+
```
|
172 |
+
|
173 |
+
</details>
|
174 |
+
|
175 |
### Post-editing
|
176 |
|
177 |
+
For post-editing tasks you can use the following prompt template:
|
178 |
+
|
179 |
+
```
|
180 |
+
Please fix any mistakes in the following {source}-{target} machine translation or keep it unedited if it's correct.
|
181 |
+
Source: {source_sentence}
|
182 |
+
MT: {machine_translation}
|
183 |
+
Corrected:
|
184 |
+
```
|
185 |
+
|
186 |
+
<details>
|
187 |
+
<summary>Show an example</summary>
|
188 |
|
189 |
```python
|
190 |
source = 'Catalan'
|
|
|
197 |
# Rafael Nadal and Maria Magdalena inspired an entire generation.
|
198 |
```
|
199 |
|
200 |
+
</details>
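The post-editing template can also be wrapped in a small helper so that the four fields are filled consistently. This is only a sketch: `build_post_editing_prompt` is a hypothetical name, and the Catalan sentence and its flawed English translation are made up for illustration.

```python
def build_post_editing_prompt(source, target, source_sentence, machine_translation):
    """Fill the post-editing template with a language pair and a sentence pair."""
    return (
        f"Please fix any mistakes in the following {source}-{target} machine translation "
        f"or keep it unedited if it's correct.\n"
        f"Source: {source_sentence}\n"
        f"MT: {machine_translation}\n"
        f"Corrected:"
    )


# Illustrative inputs (not taken from the README).
prompt = build_post_editing_prompt(
    source="Catalan",
    target="English",
    source_sentence="El gat dorm al sofà.",
    machine_translation="The cat sleep on the sofa.",
)
print(prompt)
```

The resulting string is meant to be passed as the content of a `user` turn.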
|
201 |
+
|
202 |
+
### Document-level translation
|
203 |
+
|
204 |
+
For document-level translation tasks you can use the following prompt template:
|
205 |
+
|
206 |
+
```
|
207 |
+
Please translate this text from {source} into {target}.
|
208 |
+
{source}: {1st paragraph of the document}
|
209 |
+
{2nd paragraph of the document}
|
210 |
+
{Nth paragraph of the document}
|
211 |
+
{target}:
|
212 |
+
```
|
213 |
+
|
214 |
+
<details>
|
215 |
+
<summary>Show an example</summary>
|
216 |
+
|
217 |
+
```python
|
218 |
+
source = 'English'
|
219 |
+
target = 'Asturian'
|
220 |
+
|
221 |
+
text = """Please translate this text from {} into {}.\n{}: President Donald Trump, who campaigned on promises to crack down on illegal immigration, has raised alarms in the U.S. dairy industry with his threat to impose 25% tariffs on Mexico and Canada by February 2025. This move is part of a broader strategy to declare a national emergency at the southern border to halt illegal migration completely.
|
222 |
+
However, the implications for the agriculture sector, particularly dairy, are significant. Approximately half of the U.S. dairy industry's workforce consists of immigrant labor, many of whom are undocumented. The National Milk Producers Federation estimates that removing immigrant workers could decimate the dairy herd by 2.1 million cows and slash milk production by nearly 50 billion pounds, leading to a dramatic 90.4% increase in milk prices.
|
223 |
+
The complex perspectives of Americans on undocumented workers were highlighted in a Pew Research Center study. While 64% of U.S. adults support legal pathways for undocumented immigrants, 35% oppose it—a gap that has been narrowing recently. Factors influencing public opinion include the belief that immigrants should have jobs and pass security checks, contrasted by concerns about lawbreakers being rewarded, fairness for legal migrants, and resource allocation.
|
224 |
+
According to Zach Rutledge, an agricultural economist at Michigan State University, as nations grow wealthier, their labor forces transition away from agriculture toward sectors like services and manufacturing. This shift has led to the U.S. relying heavily on immigrant labor for agricultural work. Domestic workers, even with employment taxes, may cost $15 to $25 an hour, while H-2A visa program workers might cost $25 to $30 an hour, accounting for additional housing expenses.
|
225 |
+
The National Milk Producers Federation has been vocal in advocating for changes to the H-2A visa program, which outside of its current seasonal limitations, does not support the dairy industry's year-round labor needs. Executive vice-president Jaime Castaneda reiterated the need for legislative clarity to address the undocumented workforce issues in dairy farming.
|
226 |
+
The Farm Workforce Modernization Act of 2023, which could grant legal status to certain undocumented farmworkers, has been stalled in Congress, despite acknowledgment of the sector's importance to feeding America. The need for coordinated legislative efforts to ensure both border security and labor market stability is imperative moving forward.
|
227 |
+
{}:""".format(source, target, source, target)
|
228 |
+
```
|
229 |
+
|
230 |
+
</details>
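When the document is already split into paragraphs, the prompt can be assembled programmatically instead of pasting the text into a string literal. The sketch below assumes paragraphs are joined with a single newline, as in the template above; `build_document_prompt` is a hypothetical helper name and the paragraphs are placeholders.

```python
def build_document_prompt(source, target, paragraphs):
    """Join non-empty paragraphs with newlines and fill the document-level template."""
    document = "\n".join(p.strip() for p in paragraphs if p.strip())
    return (
        f"Please translate this text from {source} into {target}.\n"
        f"{source}: {document}\n"
        f"{target}:"
    )


# Placeholder paragraphs; empty entries and surrounding whitespace are dropped.
paragraphs = ["First paragraph.", "  Second paragraph.  ", ""]
prompt = build_document_prompt("English", "Asturian", paragraphs)
print(prompt)
```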
|
231 |
+
|
232 |
|
233 |
## Data
|
234 |
|