<strong>Note:</strong> Let's use the fine-tuned model for inference to generate responses to mental health-related prompts!
</div>

<h3 style="color: #388e3c; font-family: Arial, sans-serif;">Key Points to Note:</h3>

<ol style="margin-left: 20px;">
<li>
<p>The <code>model = FastLanguageModel.for_inference(model)</code> call prepares the model specifically for inference, ensuring it is optimized for generating responses efficiently.</p>
</li>
<li>
<p>The input text is processed by the <code>tokenizer</code>, which converts it into a format the model can consume. The <code>data_prompt</code> template structures the input text, leaving a placeholder for the model's response. The <code>return_tensors = "pt"</code> argument returns the result as PyTorch tensors, which are then moved to the GPU with <code>.to("cuda")</code> for faster processing.</p>
</li>
<li>
<p>The <code>model.generate</code> function produces a response from the tokenized input. Parameters such as <code>max_new_tokens = 5020</code> and <code>use_cache = True</code> let the model emit long, coherent outputs efficiently by reusing cached key/value states from earlier decoding steps; a sketch of this step follows the code block below.</p>
</li>
</ol>
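The inference code below puts points 1 and 2 together. The exact <code>data_prompt</code> formatting is an assumption (the template is filled with the input <code>text</code> and an empty response slot), so treat the <code>tokenizer</code> arguments as a sketch rather than verbatim notebook code.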
```python
model = FastLanguageModel.for_inference(model)  # prepare the model for inference (point 1)
inputs = tokenizer(
    [data_prompt.format(text, "")],  # assumed template usage: input text plus an empty response slot
    return_tensors="pt",             # return PyTorch tensors
).to("cuda")                         # move the input tensors to the GPU
```
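Point 3 is the generation step itself. Here is a hedged sketch of it: the <code>max_new_tokens</code> and <code>use_cache</code> values come from the description above, while the decoding call is a common pattern, not necessarily the notebook's verbatim code.

```python
# Generate a response from the tokenized prompt. use_cache=True reuses
# cached key/value states from earlier decoding steps, so long outputs
# (up to max_new_tokens) are produced efficiently.
outputs = model.generate(
    **inputs,
    max_new_tokens=5020,
    use_cache=True,
)

# Decode the generated token IDs back into text, dropping special tokens.
response = tokenizer.batch_decode(outputs, skip_special_tokens=True)[0]
print(response)
```

Because the generated sequence includes the prompt, the final answer is typically extracted by splitting <code>response</code> on the response marker defined in <code>data_prompt</code>.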