Ayansk11 committed on
Commit 4f954ec · verified · 1 Parent(s): 5657312

Update README.md

Files changed (1)
  1. README.md +10 -4
README.md CHANGED
@@ -58,17 +58,23 @@ text="I'm going through some things with my feelings and myself. I barely sleep
   <strong>Note:</strong> Lets use the fine-tuned model for inference in order to generate responses based on mental health-related prompts !
   </div>
 
-  <h3 style="color: #388e3c; font-family: Arial, sans-serif;">Here is some keys to note:</h3>
+
+
+  <h3 style="color: #388e3c; font-family: Arial, sans-serif;">Key Points to Note:</h3>
 
   <ol style="margin-left: 20px;">
-  <p>The <code>model = FastLanguageModel.for_inference(model)</code> configures the model specifically for inference, optimizing its performance for generating responses.</p>
+  <li>
+  <p>The <code>model = FastLanguageModel.for_inference(model)</code> command prepares the model specifically for inference, ensuring it is optimized for generating responses efficiently.</p>
   </li>
-  <p>The input text is tokenized using the <code>tokenizer</code>, it convert the text into a format that model can process. We are using <code>data_prompt</code> to format the input text, while the response placeholder is left empty to get response from model. The <code>return_tensors = "pt"</code> parameter specifies that the output should be in PyTorch tensors, which are then moved to the GPU using <code>.to("cuda")</code> for faster processing.</p>
+  <li>
+  <p>The input text is processed using the <code>tokenizer</code>, which converts it into a format suitable for the model. The <code>data_prompt</code> is used to structure the input text, leaving a placeholder for the model's response. Additionally, the <code>return_tensors = "pt"</code> argument ensures the output is in PyTorch tensor format, which is then transferred to the GPU using <code>.to("cuda")</code> for faster processing.</p>
   </li>
-  <p>The <code>model.generate</code> method generating response based on the tokenized inputs. The parameters <code>max_new_tokens = 5020</code> and <code>use_cache = True</code> ensure that the model can produce long and coherent responses efficiently by utilizing cached computation from previous layers.</p>
+  <li>
+  <p>The <code>model.generate</code> function generates responses based on the tokenized input. Parameters like <code>max_new_tokens = 5020</code> and <code>use_cache = True</code> enable the model to produce lengthy, coherent outputs efficiently by leveraging cached computations from prior layers.</p>
   </li>
   </ol>
 
+
 ```python
 model = FastLanguageModel.for_inference(model)
 inputs = tokenizer(
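
The diff's code excerpt stops at the opening of the `tokenizer(` call. For context, here is a minimal sketch of how the inference snippet described in the updated README typically continues with the Unsloth API; the `data_prompt` template, the `text` input (shown in the hunk header), and the fine-tuned `model`/`tokenizer` are assumed to be defined earlier in the README, and the exact prompt-formatting call is an assumption, not the repository's confirmed code.

```python
# Sketch only: `model`, `tokenizer`, `data_prompt`, and `text` are assumed to be
# defined earlier in the README (the fine-tuned model, its tokenizer, the prompt
# template, and the user's input text).
from unsloth import FastLanguageModel

# Switch the fine-tuned model into Unsloth's optimized inference mode.
model = FastLanguageModel.for_inference(model)

# Fill the prompt template with the input text, leave the response slot empty,
# tokenize, and move the tensors to the GPU.
inputs = tokenizer(
    [data_prompt.format(text, "")],
    return_tensors="pt",
).to("cuda")

# Generate a response; use_cache=True reuses cached key/value states so long
# outputs (up to max_new_tokens) are produced efficiently.
outputs = model.generate(**inputs, max_new_tokens=5020, use_cache=True)

# Decode the generated tokens back into readable text.
print(tokenizer.batch_decode(outputs, skip_special_tokens=True)[0])
```

Note that `max_new_tokens = 5020` is only an upper bound on the response length; generation stops earlier if the model emits an end-of-sequence token.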