from transformers import pipeline
messages = [
{"role": "user", "content": "Who are you?"},
]
pipe = pipeline("text-generation", model="ISTA-DASLab/Llama-2-7b-AQLM-PV-1Bit-1x16-hf", trust_remote_code=True)
# Manually format the prompt using a template
prompt = "".join([f"{msg['role']}: {msg['content']}\n" for msg in messages])
# Use the formatted prompt as input to the pipeline
output = pipe(prompt)
print(output)
low_cpu_mem_usage was None, now default to True since model is quantized.
Device set to use cuda:0
Setting pad_token_id to eos_token_id:2 for open-end generation.
/usr/local/lib/python3.10/dist-packages/torch/utils/cpp_extension.py:1967: UserWarning: TORCH_CUDA_ARCH_LIST is not set, all archs for visible cards are included for compilation.
If this is not desired, please set os.environ['TORCH_CUDA_ARCH_LIST'].
warnings.warn(
[{'generated_text': 'user: Who are you?\n\n\n_The_ _______________'}]
low_cpu_mem_usage was None, now default to True since model is quantized.
Device set to use cuda:0
Setting pad_token_id to eos_token_id:2 for open-end generation.
user: hi?
user: hi
user: hi
user: hi
user: hi
user: hi
user: hi
user: hi
user: hi
user: hi
user: hi
Number of generated tokens: 56
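As an aside, instead of joining role/content strings by hand, the tokenizer's chat template can build the prompt. The snippet below is only a minimal sketch: it assumes the checkpoint actually defines a chat template (a base Llama-2 conversion may not, in which case apply_chat_template raises an error and the manual formatting above remains the fallback).

from transformers import AutoTokenizer, pipeline

model_id = "ISTA-DASLab/Llama-2-7b-AQLM-PV-1Bit-1x16-hf"
tokenizer = AutoTokenizer.from_pretrained(model_id)
pipe = pipeline("text-generation", model=model_id, tokenizer=tokenizer, trust_remote_code=True)

messages = [{"role": "user", "content": "Who are you?"}]
# Build the prompt from the model's own chat template (if one is defined)
prompt = tokenizer.apply_chat_template(messages, tokenize=False, add_generation_prompt=True)
output = pipe(prompt, max_new_tokens=64)
print(output)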
pip install accelerate
import torch
from transformers import pipeline
pipe = pipeline(model="ISTA-DASLab/Llama-2-7b-AQLM-PV-1Bit-1x16-hf", torch_dtype=torch.float16, device_map="auto")
output = pipe("how are you?", do_sample=True, top_p=0.95)
print(output)
Device set to use cuda:0
Setting pad_token_id to eos_token_id:2 for open-end generation.
[{'generated_text': 'how are you? I have been learning about how to build a relationship with a God with Christ. and it will take'}]
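The short, cut-off reply above is consistent with the pipeline's default generation length. Below is a sketch of the same call with explicit generation arguments; max_new_tokens, temperature, and repetition_penalty are standard generate() parameters that the pipeline forwards, and the values here are arbitrary examples rather than tuned settings.

import torch
from transformers import pipeline

pipe = pipeline(
    model="ISTA-DASLab/Llama-2-7b-AQLM-PV-1Bit-1x16-hf",
    torch_dtype=torch.float16,
    device_map="auto",
)
output = pipe(
    "how are you?",
    do_sample=True,
    top_p=0.95,
    temperature=0.7,         # example value, not tuned
    repetition_penalty=1.1,  # example value, may help with repetitive output
    max_new_tokens=128,      # raise the default cut-off so replies are not truncated
)
print(output[0]["generated_text"])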