This is an uncensored version of Qwen2-VL-2B-Instruct created with abliteration (see this article to learn more about it).
Special thanks to @FailSpy for the original code and technique. Please follow him if you're interested in abliterated models.
Only the text part of the model was processed; the image part was left unchanged.
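For readers new to the term: abliteration, as it is commonly described, estimates a "refusal direction" in the model's activations and removes that direction from weights that write into the residual stream. The sketch below illustrates only this projection step on a toy matrix in plain Python. It is not the code used to produce this model, and the function names and values are hypothetical.

```python
import math


def normalize(v):
    """Scale a vector to unit length."""
    norm = math.sqrt(sum(x * x for x in v))
    return [x / norm for x in v]


def ablate_direction(weight, direction):
    """Return W - r r^T W, so outputs of W have no component along r.

    weight: list of rows; rows index output dimensions, columns index inputs.
    direction: vector in the output space (the hypothetical refusal direction).
    """
    r = normalize(direction)
    n_rows, n_cols = len(weight), len(weight[0])
    # proj[j] is the component of column j of W along r.
    proj = [sum(r[i] * weight[i][j] for i in range(n_rows)) for j in range(n_cols)]
    # Subtract that component from every column.
    return [[weight[i][j] - r[i] * proj[j] for j in range(n_cols)] for i in range(n_rows)]


# Toy values for illustration only.
W = [[1.0, 2.0], [3.0, 4.0], [5.0, 6.0]]
refusal_dir = [1.0, 1.0, 0.0]
W_ablated = ablate_direction(W, refusal_dir)
```

After this step, multiplying any input by `W_ablated` yields an output orthogonal to `refusal_dir`, which is the property the technique relies on.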
You can use this model in your applications by loading it with Hugging Face's transformers library:
from transformers import Qwen2VLForConditionalGeneration, AutoProcessor
from qwen_vl_utils import process_vision_info

# Load the abliterated model and its processor.
model = Qwen2VLForConditionalGeneration.from_pretrained(
    "huihui-ai/Qwen2-VL-2B-Instruct-abliterated", torch_dtype="auto", device_map="auto"
)
processor = AutoProcessor.from_pretrained("huihui-ai/Qwen2-VL-2B-Instruct-abliterated")

# Path to a local image; replace with your own file.
image_path = "/tmp/test.png"

# A single user turn containing one image and one text prompt.
messages = [
    {
        "role": "user",
        "content": [
            {
                "type": "image",
                "image": f"file://{image_path}",
            },
            {"type": "text", "text": "Please describe the content of the photo in detail"},
        ],
    }
]

# Render the chat template to a prompt string and collect the vision inputs.
text = processor.apply_chat_template(
    messages, tokenize=False, add_generation_prompt=True
)
image_inputs, video_inputs = process_vision_info(messages)
inputs = processor(
    text=[text],
    images=image_inputs,
    videos=video_inputs,
    padding=True,
    return_tensors="pt",
)
# Move the inputs to the device the model was placed on by device_map="auto".
inputs = inputs.to(model.device)

# Generate, then drop the prompt tokens so only the new tokens are decoded.
generated_ids = model.generate(**inputs, max_new_tokens=256)
generated_ids_trimmed = [
    out_ids[len(in_ids) :] for in_ids, out_ids in zip(inputs.input_ids, generated_ids)
]
output_text = processor.batch_decode(
    generated_ids_trimmed, skip_special_tokens=True, clean_up_tokenization_spaces=False
)
output_text = output_text[0]
print(output_text)
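The trimming step in the example works because `generate` returns each prompt followed by its continuation, so slicing off the first `len(in_ids)` tokens leaves only the newly generated ones. A minimal sketch of that slicing, with plain Python lists standing in for tensors (the helper name `trim_prompt` and the token ids are made up for illustration):

```python
def trim_prompt(input_ids, generated_ids):
    """Drop the prompt prefix from each generated sequence."""
    return [out[len(inp):] for inp, out in zip(input_ids, generated_ids)]


# Hypothetical token ids: each output starts with a copy of its prompt.
prompts = [[101, 7, 8], [101, 9, 10]]
outputs = [[101, 7, 8, 42, 43], [101, 9, 10, 44, 45]]

trimmed = trim_prompt(prompts, outputs)
print(trimmed)
```

Without this step, the decoded text would repeat the full prompt before the model's answer.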