---
tags:
- npu
- amd
- llama3.1
- RyzenAI
- translation
---
This model is a fine-tuned version of [meta-llama/Meta-Llama-3.1-8B-Instruct](https://huggingface.co/meta-llama/Meta-Llama-3.1-8B-Instruct), AWQ-quantized and converted to run on a [Ryzen AI PC with an NPU](https://github.com/amd/RyzenAI-SW/issues/18), for example one with a Ryzen 9 7940HS processor.
It supports translation between English, French, Chinese (Mandarin), and Japanese.
To set up Ryzen AI for LLMs on Windows 11, see [Running LLM on AMD NPU Hardware](https://www.hackster.io/gharada2013/running-llm-on-amd-npu-hardware-19322f).
The following sample assumes that the setup on the above page has been completed.
This model has only been tested with Ryzen AI on Windows 11. It does not work in Linux environments such as WSL.
RoPE support is not yet complete, but the model's perplexity has been confirmed to be lower than that of Llama 3.
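As a reference for reproducing such a comparison, perplexity can be measured with a standard sliding-window evaluation. A minimal sketch, assuming the base Hugging Face checkpoint on CPU/GPU (the NPU-quantized checkpoint is loaded differently, as the sample script below shows) and an evaluation text of your own:
```
import torch
from transformers import AutoModelForCausalLM, AutoTokenizer

# Sliding-window perplexity sketch; runs the base HF checkpoint on CPU/GPU,
# not the NPU build loaded in the sample script below.
model_id = "meta-llama/Meta-Llama-3.1-8B-Instruct"  # assumption: baseline to compare against
tokenizer = AutoTokenizer.from_pretrained(model_id)
model = AutoModelForCausalLM.from_pretrained(model_id, torch_dtype=torch.bfloat16).eval()

text = open("eval.txt", encoding="utf-8").read()  # assumption: your own evaluation text
input_ids = tokenizer(text, return_tensors="pt").input_ids
max_length, stride = 1024, 512
nlls, prev_end = [], 0
for begin in range(0, input_ids.size(1), stride):
    end = min(begin + max_length, input_ids.size(1))
    window = input_ids[:, begin:end]
    targets = window.clone()
    targets[:, : -(end - prev_end)] = -100  # score only tokens not scored in earlier windows
    with torch.no_grad():
        nlls.append(model(window, labels=targets).loss)
    prev_end = end
    if end == input_ids.size(1):
        break
print("perplexity:", torch.exp(torch.stack(nlls).mean()).item())
```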
2024/07/30
- [Ryzen AI Software 1.2](https://ryzenai.docs.amd.com/en/latest/) has been released. Please note that this model is based on [Ryzen AI Software 1.1](https://ryzenai.docs.amd.com/en/1.1/index.html) and operation with 1.2 has not been confirmed.
- [amd/RyzenAI-SW 1.2](https://github.com/amd/RyzenAI-SW) was announced on July 29, 2024. This sample is for [amd/RyzenAI-SW 1.1](https://github.com/amd/RyzenAI-SW/tree/1.1); please note that the folder layout and script contents have changed completely in 1.2.
### Setup
Run the following in a Windows command prompt (cmd):
```
conda activate ryzenai-transformers
<your_install_path>\RyzenAI-SW\example\transformers\setup.bat
pip install transformers==4.43.3
# Updating the Transformers library breaks the Llama 2 sample.
# If you want to run Llama 2, revert with pip install transformers==4.34.0.
pip install tokenizers==0.19.1
pip install -U "huggingface_hub[cli]"
huggingface-cli download dahara1/llama3.1-8b_translate-amd-npu --revision main --local-dir llama3.1-8b_translate-amd-npu
copy <your_ryzen_ai-sw_install_path>\RyzenAI-SW\example\transformers\models\llama2\modeling_llama_amd.py .
# set up Runtime. see https://ryzenai.docs.amd.com/en/latest/runtime_setup.html
set XLNX_VART_FIRMWARE=<your_firmware_install_path>\voe-4.0-win_amd64\1x4.xclbin
set NUM_OF_DPU_RUNNERS=1
# save the sample script below as llama3.1-8b_translate-test.py (UTF-8 encoding)
python llama3.1-8b_translate-test.py
```
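Before launching the script, it can be worth checking that the firmware variable from the steps above actually points at an existing `1x4.xclbin` file. A small hypothetical pre-flight check (the file name comes from the setup commands above):
```
import os

# Sanity check: XLNX_VART_FIRMWARE must point at an existing xclbin binary.
fw = os.environ.get("XLNX_VART_FIRMWARE", "")
if not fw.endswith("1x4.xclbin") or not os.path.isfile(fw):
    raise SystemExit(f"XLNX_VART_FIRMWARE is not set correctly: {fw!r}")
print("firmware OK:", fw)
```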
### Sample Script
```
import torch
import psutil
import transformers
from transformers import AutoTokenizer, set_seed
import qlinear
import logging


def translation(instruction, input):
    system = """<|begin_of_text|><|start_header_id|>system<|end_header_id|>
You are a highly skilled professional translator. You are a native speaker of English, Japanese, French and Mandarin. Translate the given text accurately, taking into account the context and specific instructions provided. Steps may include hints enclosed in square brackets [] with the key and value separated by a colon:. If no additional instructions or context are provided, use your expertise to consider what the most appropriate context is and provide a natural translation that aligns with that context. When translating, strive to faithfully reflect the meaning and tone of the original text, pay attention to cultural nuances and differences in language usage, and ensure that the translation is grammatically correct and easy to read. For technical terms and proper nouns, either leave them in the original language or use appropriate translations as necessary. Take a deep breath, calm down, and start translating.<|eot_id|><|start_header_id|>user<|end_header_id|>"""

    prompt = f"""{system}
### Instruction:
{instruction}
### Input:
{input}
### Response:
<|eot_id|><|start_header_id|>assistant<|end_header_id|>
"""

    tokenized_input = tokenizer(prompt, return_tensors="pt",
                                padding=True, max_length=1600, truncation=True)
    terminators = [
        tokenizer.eos_token_id,
    ]
    outputs = model.generate(tokenized_input['input_ids'],
                             max_new_tokens=600,
                             eos_token_id=terminators,
                             attention_mask=tokenized_input['attention_mask'],
                             do_sample=True,
                             temperature=0.3,
                             top_p=0.5)
    # Decode only the newly generated tokens (strip the prompt).
    response = outputs[0][tokenized_input['input_ids'].shape[-1]:]
    response_message = tokenizer.decode(response, skip_special_tokens=True)
    return response_message


if __name__ == "__main__":
    transformers.logging.set_verbosity_error()
    logging.disable(logging.CRITICAL)
    set_seed(123)

    # Pin the process to four CPU cores; generation itself runs on the NPU.
    p = psutil.Process()
    p.cpu_affinity([0, 1, 2, 3])
    torch.set_num_threads(4)

    tokenizer = AutoTokenizer.from_pretrained("llama3.1-8b_translate-amd-npu")
    # add_special_tokens() returns a count, not an id, so look the id up explicitly.
    tokenizer.add_special_tokens({'pad_token': '<|finetune_right_pad_id|>'})
    tokenizer.pad_token_id = tokenizer.convert_tokens_to_ids('<|finetune_right_pad_id|>')

    ckpt = r"llama3.1-8b_translate-amd-npu\llama3.1_8b_translate_w_bit_4_awq_amd.pt"
    model = torch.load(ckpt)
    model.eval()
    model = model.to(torch.bfloat16)

    # Move every AWQ-quantized linear layer onto the NPU ("aie" device).
    for n, m in model.named_modules():
        if isinstance(m, qlinear.QLinearPerGrp):
            print(f"Preparing weights of layer : {n}")
            m.device = "aie"
            m.quantize_weights()

    print(translation("Translate Japanese to English.", "1月1日は日本の祝日です。その日は日曜日で、5日ぶりに雨が降りました"))
    print(translation("Translate English to Japanese.", "It’s raining cats and dogs."))
    print(translation("Translate French to Japanese.", "Après la pluie, le beau temps"))
    print(translation("Translate Mandarin to Japanese.", "要功夫深,铁杵磨成针"))
```
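The system prompt above also accepts optional hints enclosed in square brackets, with key and value separated by a colon. A hypothetical call using that mechanism (the `writing_style` key is an illustration, not a documented hint name for this model):
```
# Hypothetical: append a bracketed hint to the instruction.
print(translation(
    "Translate Japanese to English.\n[writing_style: formal]",
    "明日の朝9時に駅で会いましょう。"
))
```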
![chat_image](trans-sample.png)
## Acknowledgements
- [amd/RyzenAI-SW](https://github.com/amd/RyzenAI-SW)
For the sample code and drivers.
- [mit-han-lab/llm-awq](https://github.com/mit-han-lab/llm-awq)
For the AWQ quantization method.
- [meta-llama/Meta-Llama-3-8B-Instruct](https://huggingface.co/meta-llama/Meta-Llama-3-8B-Instruct)
[Built with Meta Llama 3](https://llama.meta.com/llama3/license/)