Ayansk11 committed on
Commit 4f954ec · verified · 1 Parent(s): 5657312

Update README.md

Files changed (1)
  1. README.md +10 -4
README.md CHANGED
@@ -58,17 +58,23 @@ text="I'm going through some things with my feelings and myself. I barely sleep
   <strong>Note:</strong> Lets use the fine-tuned model for inference in order to generate responses based on mental health-related prompts !
   </div>
 
-  <h3 style="color: #388e3c; font-family: Arial, sans-serif;">Here is some keys to note:</h3>
+
+
+  <h3 style="color: #388e3c; font-family: Arial, sans-serif;">Key Points to Note:</h3>
 
   <ol style="margin-left: 20px;">
-  <p>The <code>model = FastLanguageModel.for_inference(model)</code> configures the model specifically for inference, optimizing its performance for generating responses.</p>
+  <li>
+  <p>The <code>model = FastLanguageModel.for_inference(model)</code> command prepares the model specifically for inference, ensuring it is optimized for generating responses efficiently.</p>
   </li>
-  <p>The input text is tokenized using the <code>tokenizer</code>, it convert the text into a format that model can process. We are using <code>data_prompt</code> to format the input text, while the response placeholder is left empty to get response from model. The <code>return_tensors = "pt"</code> parameter specifies that the output should be in PyTorch tensors, which are then moved to the GPU using <code>.to("cuda")</code> for faster processing.</p>
+  <li>
+  <p>The input text is processed using the <code>tokenizer</code>, which converts it into a format suitable for the model. The <code>data_prompt</code> is used to structure the input text, leaving a placeholder for the model's response. Additionally, the <code>return_tensors = "pt"</code> argument ensures the output is in PyTorch tensor format, which is then transferred to the GPU using <code>.to("cuda")</code> for faster processing.</p>
   </li>
-  <p>The <code>model.generate</code> method generating response based on the tokenized inputs. The parameters <code>max_new_tokens = 5020</code> and <code>use_cache = True</code> ensure that the model can produce long and coherent responses efficiently by utilizing cached computation from previous layers.</p>
+  <li>
+  <p>The <code>model.generate</code> function generates responses based on the tokenized input. Parameters like <code>max_new_tokens = 5020</code> and <code>use_cache = True</code> enable the model to produce lengthy, coherent outputs efficiently by leveraging cached computations from prior layers.</p>
   </li>
   </ol>
 
+
 ```python
 model = FastLanguageModel.for_inference(model)
 inputs = tokenizer(
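
The diff's code excerpt stops at the opening of the `tokenizer(` call. For context, here is a minimal sketch of how the inference snippet described in the updated README typically continues with the Unsloth API; the `data_prompt` template, the `text` input (shown in the hunk header), and the fine-tuned `model`/`tokenizer` are assumed to be defined earlier in the README, and the exact prompt-formatting call is an assumption, not the repository's confirmed code.

```python
# Sketch only: `model`, `tokenizer`, `data_prompt`, and `text` are assumed to be
# defined earlier in the README (the fine-tuned model, its tokenizer, the prompt
# template, and the user's input text).
from unsloth import FastLanguageModel

# Switch the fine-tuned model into Unsloth's optimized inference mode.
model = FastLanguageModel.for_inference(model)

# Fill the prompt template with the input text, leave the response slot empty,
# tokenize, and move the tensors to the GPU.
inputs = tokenizer(
    [data_prompt.format(text, "")],
    return_tensors="pt",
).to("cuda")

# Generate a response; use_cache=True reuses cached key/value states so long
# outputs (up to max_new_tokens) are produced efficiently.
outputs = model.generate(**inputs, max_new_tokens=5020, use_cache=True)

# Decode the generated tokens back into readable text.
print(tokenizer.batch_decode(outputs, skip_special_tokens=True)[0])
```

Note that `max_new_tokens = 5020` is only an upper bound on the response length; generation stops earlier if the model emits an end-of-sequence token.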