---
library_name: transformers.js
pipeline_tag: image-text-to-text
language:
- en
tags:
- llava
- multimodal
- qwen
license: apache-2.0
---

https://huggingface.co/qnguyen3/nanoLLaVA-1.5 with ONNX weights to be compatible with Transformers.js.

## Usage (Transformers.js)

> [!IMPORTANT]
> NOTE: nanoLLaVA support is experimental and requires you to install Transformers.js [v3](https://github.com/xenova/transformers.js/tree/v3) from source.

If you haven't already, you can install the [Transformers.js](https://huggingface.co/docs/transformers.js) JavaScript library from [GitHub](https://github.com/xenova/transformers.js/tree/v3) using:
```bash
npm install xenova/transformers.js#v3
```

**Example:**
```js
import { AutoProcessor, AutoTokenizer, LlavaForConditionalGeneration, RawImage } from '@xenova/transformers';

// Load tokenizer, processor and model
const model_id = 'onnx-community/nanoLLaVA-1.5';
const tokenizer = await AutoTokenizer.from_pretrained(model_id);
const processor = await AutoProcessor.from_pretrained(model_id);
const model = await LlavaForConditionalGeneration.from_pretrained(model_id, {
    dtype: {
        embed_tokens: 'fp16', // or 'fp32' or 'q8'
        vision_encoder: 'fp16', // or 'fp32' or 'q8'
        decoder_model_merged: 'q4', // or 'q8'
    },
    // device: 'webgpu',
});

// Prepare text inputs
const prompt = 'What does the text say?';
const messages = [
    { role: 'system', content: 'Answer the question.' },
    { role: 'user', content: `<image>\n${prompt}` },
];
const text = tokenizer.apply_chat_template(messages, { tokenize: false, add_generation_prompt: true });
const text_inputs = tokenizer(text);

// Prepare vision inputs
const url = 'https://huggingface.co/qnguyen3/nanoLLaVA/resolve/main/example_1.png';
const image = await RawImage.fromURL(url);
const vision_inputs = await processor(image);

// Generate response
const { past_key_values, sequences } = await model.generate({
    ...text_inputs,
    ...vision_inputs,
    do_sample: false,
    max_new_tokens: 64,
    return_dict_in_generate: true,
});

// Decode output
const answer = tokenizer.decode(
    sequences.slice(0, [text_inputs.input_ids.dims[1], null]),
    { skip_special_tokens: true },
);
console.log(answer);
// The text on the image reads "SMALL BUT MIGHTY." This phrase is likely a play on words, combining the words "small" and "mighty," suggesting that the mouse is strong and capable, despite its size.

const new_messages = [
    ...messages,
    { role: 'assistant', content: answer },
    { role: 'user', content: 'How does the text correlate to the context of the image?' },
];
const new_text = tokenizer.apply_chat_template(new_messages, { tokenize: false, add_generation_prompt: true });
const new_text_inputs = tokenizer(new_text);

// Generate another response
const output = await model.generate({
    ...new_text_inputs,
    past_key_values,
    do_sample: false,
    max_new_tokens: 256,
});
const new_answer = tokenizer.decode(
    output.slice(0, [new_text_inputs.input_ids.dims[1], null]),
    { skip_special_tokens: true },
);
console.log(new_answer);
// The text "SMALL BUT MIGHTY" correlates to the context of the image by implying that despite its size, the mouse possesses a significant amount of strength or capability. This could be a metaphor for the mouse's ability to perform tasks or overcome challenges, especially when it comes to lifting a weight.
```

---

Note: Having a separate repo for ONNX weights is intended to be a temporary solution until WebML gains more traction. If you would like to make your models web-ready, we recommend converting to ONNX using [🤗 Optimum](https://huggingface.co/docs/optimum/index) and structuring your repo like this one (with ONNX weights located in a subfolder named `onnx`).