echo840 committed on
Commit 4a4d7ef · verified · 1 Parent(s): 0935c56

Update README.md

Files changed (1)
  1. README.md +34 -12
README.md CHANGED
@@ -56,18 +56,6 @@ We also provide the source code and the model weight for the original demo, allo
  python demo.py -c echo840/Monkey
  ```
 
- In order to generate more detailed captions, we provide some prompt examples so that you can conduct more interesting explorations. You can modify these two variables in the `caption` function to implement different prompt inputs for the caption task, as shown below:
- ```
- query = "Generate the detailed caption in English. Answer:"
- chat_query = "Generate the detailed caption in English. Answer:"
- ```
- - Generate the detailed caption in English.
- - Explain the visual content of the image in great detail.
- - Analyze the image in a comprehensive and detailed manner.
- - Describe the image in as much detail as possible in English without duplicating it.
- - Describe the image in as much detail as possible in English, including as many elements from the image as possible, but without repetition.
-
-
  ## Dataset
 
  We have open-sourced the data generated by the multi-level description generation method. You can download it at [Detailed Caption](https://huggingface.co/datasets/echo840/Detailed_Caption).
@@ -121,6 +109,40 @@ We also offer Monkey's model definition and training code, which you can explore
 
  **ATTENTION:** Specify the path to your training data, which should be a json file consisting of a list of conversations.
 
+ ## Inference
+ ```python
+ from transformers import AutoModelForCausalLM, AutoTokenizer
+ checkpoint = "echo840/Monkey-Chat"
+ model = AutoModelForCausalLM.from_pretrained(checkpoint, device_map='cuda', trust_remote_code=True).eval()
+ tokenizer = AutoTokenizer.from_pretrained(checkpoint, trust_remote_code=True)
+ tokenizer.padding_side = 'left'
+ tokenizer.pad_token_id = tokenizer.eod_id
+ img_path = ""
+ question = ""
+ query = f'<img>{img_path}</img> {question} Answer: ' #Monkey-Chat has the same prompt format for both vqa and detailed caption.
+
+ input_ids = tokenizer(query, return_tensors='pt', padding='longest')
+ attention_mask = input_ids.attention_mask
+ input_ids = input_ids.input_ids
+
+ pred = model.generate(
+     input_ids=input_ids.cuda(),
+     attention_mask=attention_mask.cuda(),
+     do_sample=False,
+     num_beams=1,
+     max_new_tokens=512,
+     min_new_tokens=1,
+     length_penalty=1,
+     num_return_sequences=1,
+     output_hidden_states=True,
+     use_cache=True,
+     pad_token_id=tokenizer.eod_id,
+     eos_token_id=tokenizer.eod_id,
+ )
+ response = tokenizer.decode(pred[0][input_ids.size(1):].cpu(), skip_special_tokens=True).strip()
+ print(response)
+ ```
+
 
 
  ## Citing Monkey
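
The Inference snippet added by this commit builds its prompt as `<img>{img_path}</img> {question} Answer: ` and notes that Monkey-Chat uses the same format for VQA and detailed captioning. A minimal sketch of that prompt construction follows. The helper name `build_query`, the list name `CAPTION_PROMPTS`, and the file name `example.jpg` are hypothetical and used only for illustration. The prompts are copied from the lines this commit removed; whether they behave the same way with `echo840/Monkey-Chat` is an assumption, not something the diff confirms.

```python
def build_query(img_path: str, question: str) -> str:
    """Format one prompt: image tag, the question text, then the 'Answer: ' cue."""
    return f"<img>{img_path}</img> {question} Answer: "


# Caption prompts taken from the README lines removed in this commit.
CAPTION_PROMPTS = [
    "Generate the detailed caption in English.",
    "Explain the visual content of the image in great detail.",
]

# "example.jpg" is a placeholder path (assumption); substitute a real image.
queries = [build_query("example.jpg", prompt) for prompt in CAPTION_PROMPTS]

for q in queries:
    print(q)
```

Each resulting string could then be passed as `query` to the tokenizer call in the Inference section.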