Limit the number of generated tokens
How can we limit the number of generated tokens in the call to generate?
Something like:
```python
import torch
from transformers import pipeline

generate_text = pipeline(
    model="databricks/dolly-v2-12b",
    torch_dtype=torch.bfloat16,
    trust_remote_code=True,
    device_map="auto",
)
generate_text("Hello world!", max_length=5)
```
Also, it would be helpful to be able to set temperature.
Thanks!
This is all standard Hugging Face functionality, and you can use the standard generation options here: max_new_tokens controls the number of generated tokens, and you can also pass temperature=.
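As a minimal sketch of passing those options through a pipeline call (a tiny stand-in model, sshleifer/tiny-gpt2, is used here so the snippet runs quickly; substitute the Dolly checkpoint in practice, and note the prompt and sampled values are illustrative):

```python
from transformers import pipeline

# Tiny stand-in for databricks/dolly-v2-12b so this runs quickly.
gen = pipeline("text-generation", model="sshleifer/tiny-gpt2")

out = gen(
    "Hello world!",
    max_new_tokens=5,   # caps only the newly generated tokens
    do_sample=True,     # required for temperature to have an effect
    temperature=0.7,
)
print(out[0]["generated_text"])
```

Generation kwargs passed to the pipeline call are forwarded to the underlying model.generate(); max_new_tokens is usually preferable to max_length, since max_length also counts the prompt tokens.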
Right, where in the huggingface docs do they specify the options we can pass?
You can just search for it, it's here - https://huggingface.co/docs/transformers/main_classes/text_generation#transformers.GenerationConfig
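The options on that page can also be bundled into a GenerationConfig object rather than passed as individual kwargs; a minimal sketch (the specific values are illustrative):

```python
from transformers import GenerationConfig

# max_new_tokens caps only the newly generated tokens (unlike max_length,
# which counts the prompt too); temperature only takes effect when
# do_sample=True.
config = GenerationConfig(
    max_new_tokens=64,
    do_sample=True,
    temperature=0.7,
)
```

A config like this can then be passed via the generation_config argument instead of repeating the same kwargs on every call.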
Here is a tutorial video on how to install and use it on Windows, which includes your question. Unfortunately the documentation was poor, so I had to do a lot of research.
The video includes a Gradio user interface script and shows how to enable 8-bit loading for speed-up and lower-VRAM quantization.
Dolly 2.0 : Free ChatGPT-like Model for Commercial Use - How To Install And Use Locally On Your PC
@MonsterMMORPG you're posting this in a whole lot of places. Maybe focus this where you think it clearly answers the question and summarize the answer, rather than post a link to your video. For example, I'm not clear that your video addresses this question.
Yes, in the video I have shown max_length; the video covers it.