New chat interface
Hi, I made a new chat interface that adds a /clear command to clear the context, a /help command, and an /image command that changes the image and clears the conversation.
https://gist.github.com/CoderCowMoo/9c2e611a90ba13b547176618131114f1
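For anyone who wants the gist of it without opening the link, here is a minimal sketch of how slash commands like these can be dispatched. This is not the code from the gist: `ChatState`, `handle_line`, the return convention, and the usage message for a bare `/image` are all assumptions made for illustration. Only the command names and the help and clear messages are taken from the transcript.

```python
from dataclasses import dataclass, field
from typing import List, Optional, Tuple

HELP_TEXT = (
    "Type your message to the assistant. Type '/exit' to exit the chat.\n"
    "Type '/clear' to clear the chat history.\n"
    "Type '/image <path>' to change the image.\n"
    "Type '/help' to display this message."
)


@dataclass
class ChatState:
    """Hypothetical container for what the chat loop has to remember."""
    image_path: str
    history: List[Tuple[str, str]] = field(default_factory=list)


def handle_line(state: ChatState, line: str) -> Tuple[Optional[str], bool]:
    """Handle one line of user input.

    Returns (reply, keep_running). reply is None for an ordinary message,
    which the caller should then send to the model.
    """
    text = line.strip()
    if not text:
        return "", True  # ignore empty input
    command, _, argument = text.partition(" ")
    if command == "/exit":
        return "", False
    if command == "/help":
        return HELP_TEXT, True
    if command == "/clear":
        state.history.clear()
        return "clearing chat history.", True
    if command == "/image":
        if not argument.strip():
            return "usage: /image <path>", True  # assumed message
        state.image_path = argument.strip()
        state.history.clear()  # a new image starts a new conversation
        return "", True
    # Anything else is a normal message for the model.
    state.history.append(("user", text))
    return None, True
```

In a loop, the caller would print any non-None reply, and when the reply is None it would pass `state.history` and `state.image_path` to the model and append the answer to the history.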
Here is a transcript of use:
Downloads\llm\llama-3-vision-alpha via 🐍 v3.11.1 on ☁️ (us-west-2) took 1m6s
❯ python __main__.py -i "black_white_24.png"
Special tokens have been added in the vocabulary, make sure the associated word embeddings are fine-tuned or trained.
Unused kwargs: ['_load_in_4bit', '_load_in_8bit', 'quant_method']. These kwargs are not used in <class 'transformers.utils.quantization_config.BitsAndBytesConfig'>.
user: What colour is the image?
C:\Users\Moo\AppData\Local\Programs\Python\Python311\Lib\site-packages\transformers\models\llama\modeling_llama.py:671: UserWarning: 1Torch was not compiled with flash attention. (Triggered internally at ..\aten\src\ATen\native\transformers\cuda\sdp_utils.cpp:263.)
attn_output = torch.nn.functional.scaled_dot_product_attention(
A simple question!
The image is white, as it is a white background with no color or shading.
user: you sure?
Wait... I think I made a mistake! The image is actually a black and white checkered pattern, not a plain white background.
user: /clear
clearing chat history.
user: what colour is the image? If i've already said this please tell me
A simple question!
The image is white, as it's a pure white background with no colors or shades.
user: what came before my question?
Before your question, there was no question! Your first question was "A simple question!"
user: /clear
clearing chat history.
user: could you please repeat?
I apologize, but I didn't receive any text to repeat. It seems you've asked me to repeat something, but there's no text provided. If you could please provide the text you'd like me to repeat, I'll be happy to help!
user: what colour is the image
I apologize, but I didn't receive an image from you. If you'd like to share an image, I can try to help you identify the color(s) present in it. Please provide the image or describe it to me, and I'll do my best to assist you!
user: /clear
clearing chat history.
user: what colour is the image
A simple question!
The image is white, as it is a white background with no colors or shading.
user: /help
Type your message to the assistant. Type '/exit' to exit the chat.
Type '/clear' to clear the chat history.
Type '/image <path>' to change the image.
Type '/help' to display this message.
user: /image black_white_96.png
user: /clear
clearing chat history.
user: what colour is the image?
The image is white, as it is a white background with black text and shapes.
user: /image assets_demo-2.jpg
user: describe the image
The image is of a young anime girl with short, white hair and bright blue eyes. She has a sweet and innocent expression on her face. She is holding a large, juicy hamburger in her hand, and her eyes are shining with excitement as she takes a big bite of it. Her mouth is full of food, and she is chewing happily. She is sitting at a table in a medieval-style tavern, surrounded by other characters from the anime. The atmosphere is cozy and warm, with candles and a fire burning in the background. The girl's outfit is a simple white dress with a blue apron, and she looks like she just stepped out of a fantasy novel.
user:
Note: Apologies if there are any errors in the model initialization section. I'm running it with the pre-quantized 4-bit model from unsloth, so I had to add that and the quantization config.
Rev 1: Spacing is more reasonable, and image paths are now checked before they are used.
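As a rough illustration of what checking an image path before use can look like, here is a small sketch using only the standard library. The function name `resolve_image_path` and the list of accepted extensions are assumptions for this example; the gist may perform its check differently, for instance by simply trying to open the file.

```python
from pathlib import Path
from typing import Optional

# Assumed set of extensions, chosen for illustration only.
IMAGE_SUFFIXES = {".png", ".jpg", ".jpeg", ".bmp", ".webp"}


def resolve_image_path(raw: str) -> Optional[Path]:
    """Return a Path if raw points at an existing image file, else None."""
    # Tolerate surrounding whitespace and quotes, as in: -i "black_white_24.png"
    path = Path(raw.strip().strip('"')).expanduser()
    if not path.is_file():
        return None  # missing, or a directory
    if path.suffix.lower() not in IMAGE_SUFFIXES:
        return None  # exists, but does not look like an image
    return path
```

A chat loop could call this on the argument to `/image` and keep the current image, with a short error message, whenever it returns None.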
I hope to combine the encoder and projector and convert the result to GGUF as soon as I learn how.
Thanks again for the release.
Very impressively, it recognizes that this image is from Harry Potter.
user: /image C:\\Users\\{User}\\Pictures\\Wallpapers\\1630410877070.jpg
user: what do you see?
A winter wonderland scene from the world of Harry Potter! I see a cozy pub, with a warm fire crackling in the fireplace. The patrons are gathered around a table, enjoying a festive feast and sipping butterbeers. In the background, I see the snow-covered streets of Diagon Alley, with the sound of laughter and chatter filling the air. The illustration is a beautiful, whimsical depiction of the wizarding world, with intricate details and a sense of magic and wonder.
Does this knowledge come from the encoder or from the model, or is it present in both, with the projector allowing them to collaborate on the task?