Svngoku committed on
Commit 0825485 · 1 Parent(s): 6c0660f

Update README.md

Files changed (1): README.md (+81 -0)
README.md CHANGED
@@ -34,6 +34,87 @@ pipeline_tag: text-generation

This llama model was trained 2x faster with [Unsloth](https://github.com/unslothai/unsloth) and Hugging Face's TRL library.

## Unsloth Inference (2x Faster)

```sh
%%capture
# Installs Unsloth, Xformers (Flash Attention), and all other required packages.
!pip install "unsloth[colab-new] @ git+https://github.com/unslothai/unsloth.git"
!pip install --no-deps "xformers<0.0.27" "trl<0.9.0" peft accelerate bitsandbytes
```
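
Before loading anything, it can help to confirm that a CUDA-enabled PyTorch build is present (a small sanity check of my own, not part of the original notebook):

```py
# Verify that a CUDA-enabled torch build installed correctly.
import torch
print(torch.__version__, torch.cuda.is_available())
```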

```py
max_seq_length = 4096  # Maximum context length used at inference time.
dtype = None           # None auto-detects: float16 on older GPUs, bfloat16 on Ampere+.
load_in_4bit = True    # Use 4-bit quantization to reduce memory usage.

alpaca_prompt = """Below is an instruction that describes a task, paired with an input that provides further context. Write a response that appropriately completes the request.

### Instruction:
{}

### Input:
{}

### Response:
{}"""
```
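
The three `{}` placeholders are filled positionally: instruction, input, and response. At inference time the response slot is left empty so the model completes it. A quick illustration with hypothetical values:

```py
# Hypothetical example values, just to show how the template is filled.
example = alpaca_prompt.format(
    "Summarise the story below.",                     # instruction
    "Two kidnappers were arrested in Supare Akoko.",  # input
    "",                                               # response left empty
)
print(example)
```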

```py
# Load the quantized model.
from unsloth import FastLanguageModel

model, tokenizer = FastLanguageModel.from_pretrained(
    model_name = "vutuka/Llama-3.1-8B-african-aya",
    max_seq_length = max_seq_length,
    dtype = dtype,
    load_in_4bit = load_in_4bit,
)
FastLanguageModel.for_inference(model)  # Enable Unsloth's faster inference path.
```
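
To see what the 4-bit weights actually cost in memory (an optional check of my own, not from the original notebook):

```py
# Rough GPU memory footprint of the loaded 4-bit model.
import torch
print(f"{torch.cuda.memory_allocated() / 1024**3:.2f} GiB allocated")
```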

```py
def llama_african_aya(input: str = "", instruction: str = ""):
    # Build the Alpaca-style prompt, leaving the response slot empty.
    inputs = tokenizer(
        [alpaca_prompt.format(instruction, input, "")],
        return_tensors = "pt",
    ).to("cuda")

    # Generate the response (see the streaming variant below for
    # token-by-token output instead of waiting for the full completion).
    output = model.generate(**inputs, max_new_tokens = 1024)

    # Decode the generated tokens.
    generated_text = tokenizer.decode(output[0], skip_special_tokens = True)

    # Keep only the part after "### Response:".
    response_start = generated_text.find("### Response:") + len("### Response:")
    return generated_text[response_start:].strip()
```
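
The original snippet also hinted at streaming output via `TextStreamer`; a minimal streaming variant might look like this (a sketch assuming the same `model`, `tokenizer`, and `alpaca_prompt`; the name `llama_african_aya_stream` is mine, not from the original):

```py
from transformers import TextStreamer

def llama_african_aya_stream(input: str = "", instruction: str = ""):
    # Same prompt construction as above, but tokens print as they arrive.
    inputs = tokenizer(
        [alpaca_prompt.format(instruction, input, "")],
        return_tensors = "pt",
    ).to("cuda")
    streamer = TextStreamer(tokenizer, skip_prompt = True)  # Don't re-print the prompt.
    _ = model.generate(**inputs, streamer = streamer, max_new_tokens = 800)
```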

```py
llama_african_aya(
    instruction = "",
    # Yoruba, roughly: "Two kidnappers were arrested in Supare Akoko; explain the story."
    input = "Àwọn ajínigbé méjì ni wọ́n mú ní Supare Akoko, ṣàlàyé ìtàn náà."
)
```
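
The same helper works for prompts in other languages too (an illustrative call of my own, not from the original):

```py
llama_african_aya(
    instruction = "Answer in one short paragraph.",
    input = "What languages are widely spoken in Nigeria?",
)
```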

## LlamaCPP Code