---
language:
- ja
tags:
- instructblip
- vision
- image-captioning
- japanese-stablelm
pipeline_tag: image-to-text
license:
- other
extra_gated_heading: Access Japanese StableLM Instruct Alpha
extra_gated_description: This repository is publicly accessible, but you have to accept the conditions to access its files and content.
extra_gated_button_content: Access repository
extra_gated_fields:
  Name: text
  Email: text
  Organization: text
  I agree to accept the conditions and share above info with Stability AI: checkbox
extra_gated_prompt: | 
    ### JAPANESE STABLELM RESEARCH LICENSE AGREEMENT
    Dated: August 7, 2023

    "Agreement" means the terms and conditions for use, reproduction, distribution and modification of the Software Products set forth herein.

    “Documentation” means any specifications, manuals, documentation, and other written information provided by Stability AI related to the Software.

    "Licensee" or "you" means you, or your employer or any other person or entity (if you are entering into this Agreement on such person’s or entity's behalf), of the age required under applicable laws, rules or regulations to provide legal consent and that has legal authority to bind your employer or such other person or entity if you are entering in this Agreement on their behalf.

    "Stability AI" or "we" means Stability AI Ltd.

    "Software" means, collectively, Stability AI’s proprietary Japanese StableLM made available under this Agreement.

    “Software Products” means Software and Documentation.
    
    By using or distributing any portion or element of the Software Products, you agree to be bound by this Agreement.
    - License Rights and Redistribution.
        - Subject to your compliance with this Agreement and the Documentation, Stability AI grants you a non-exclusive, worldwide, non-transferable, non-sublicensable, revocable, royalty free and limited license under Stability AI’s intellectual property or other rights owned by Stability AI embodied in the Software Products to reproduce, distribute, and create derivative works of the Software Products for purposes other than commercial or production use.
        - You will not, and will not permit, assist or cause any third party to use, modify, copy, reproduce, create derivative works of, or distribute the Software Products (or any derivative works thereof, works incorporating the Software Products, or any data produced by the Software), in whole or in part, for any commercial or production purposes.
        - If you distribute or make the Software Products, or any derivative works thereof, available to a third party, you shall (i) provide a copy of this Agreement to such third party, and (ii) retain the following attribution notice within a "Notice" text file distributed as a part of such copies: "Japanese StableLM is licensed under the Japanese StableLM Research License, Copyright (c) Stability AI Ltd. All Rights Reserved.”
        - The licenses granted to you under this Agreement are conditioned upon your compliance with the Documentation and this Agreement, including the Acceptable Use Policy below and as may be updated from time to time in the future on stability.ai, which is hereby incorporated by reference into this Agreement.
    - Disclaimer of Warranty. UNLESS REQUIRED BY APPLICABLE LAW, THE SOFTWARE PRODUCTS  AND ANY OUTPUT AND RESULTS THEREFROM ARE PROVIDED ON AN "AS IS" BASIS, WITHOUT WARRANTIES OF ANY KIND, EITHER EXPRESS OR IMPLIED, INCLUDING, WITHOUT LIMITATION, ANY WARRANTIES OF TITLE, NON-INFRINGEMENT, MERCHANTABILITY, OR FITNESS FOR A PARTICULAR PURPOSE. YOU ARE SOLELY RESPONSIBLE FOR DETERMINING THE APPROPRIATENESS OF USING OR REDISTRIBUTING THE SOFTWARE PRODUCTS AND ASSUME ANY RISKS ASSOCIATED WITH YOUR USE OF THE SOFTWARE PRODUCTS AND ANY OUTPUT AND RESULTS.
    - Limitation of Liability. IN NO EVENT WILL STABILITY AI OR ITS AFFILIATES BE LIABLE UNDER ANY THEORY OF LIABILITY, WHETHER IN CONTRACT, TORT, NEGLIGENCE, PRODUCTS LIABILITY, OR OTHERWISE, ARISING OUT OF THIS AGREEMENT, FOR ANY LOST PROFITS OR ANY INDIRECT, SPECIAL, CONSEQUENTIAL, INCIDENTAL, EXEMPLARY OR PUNITIVE DAMAGES, EVEN IF STABILITY AI OR ITS AFFILIATES HAVE BEEN ADVISED OF THE POSSIBILITY OF ANY OF THE FOREGOING.
    - Intellectual Property.
        - No trademark licenses are granted under this Agreement, and in connection with the Software Products, neither Stability AI nor Licensee may use any name or mark owned by or associated with the other or any of its affiliates, except as required for reasonable and customary use in describing and redistributing the Software Products.
        - Subject to Stability AI’s ownership of the Software Products and derivatives made by or for Stability AI, with respect to any derivative works and modifications of the Software Products that are made by you, as between you and Stability AI, you are and will be the owner of such derivative works and modifications.
        - If you institute litigation or other proceedings against Stability AI (including a cross-claim or counterclaim in a lawsuit) alleging that the Software Products or associated outputs or results, or any portion of any of the foregoing, constitutes infringement of intellectual property or other rights owned or licensable by you, then any licenses granted to you under this Agreement shall terminate as of the date such litigation or claim is filed or instituted. You will indemnify and hold harmless Stability AI from and against any claim by any third party arising out of or related to your use or distribution of the Software Products in violation of this Agreement.
    - Term and Termination. The term of this Agreement will commence upon your acceptance of this Agreement or access to the Software Products and will continue in full force and effect until terminated in accordance with the terms and conditions herein. Stability AI may terminate this Agreement if you are in breach of any term or condition of this Agreement. Upon termination of this Agreement, you shall delete and cease use of the Software Products. Sections 2-4 shall survive the termination of this Agreement.
    ----------
    ### Japanese StableLM Acceptable Use Policy
    If you access, use, or distribute any Stability AI models, software, or other materials (“Stability Technology”) you agree to this Acceptable Use Policy (“Policy”).
    We want everyone to use Stability Technology safely and responsibly. You agree you will not use, or allow others to use, Stability Technology to:
    - To violate the law or others’ rights (including intellectual property rights and the rights of data privacy and protection), nor will you promote, contribute to, encourage, facilitate, plan, incite, or further anyone else’s violation of the law or others’ rights;
    - To commit, promote, contribute to, facilitate, encourage, plan, incite, or further any of the following:
            - Violence or terrorism;
            - Exploitation or harm to children, including the solicitation, creation, acquisition, or dissemination of child exploitative content;
            - Human trafficking, exploitation, and sexual violence;
            - Harassment, abuse, threatening, stalking, or bullying of individuals or groups of individuals;
            - Discrimination in the provision of employment, employment benefits, credit, housing, other economic benefits, or other essential goods and services on the basis of race, color, caste, religion, sex (including pregnancy, sexual orientation, or gender identity), national origin, age, disability, or genetic information (including family medical history) except as may be required by applicable law (such as the provision of social security benefits solely to people who meet certain age requirements under the law);
            - Creation of malicious code, malware, computer viruses or any activity that could disable, overburden, interfere with or impair the proper working, integrity, operation or appearance of a website or computer system;
    - For purposes of or for the performance of:
            - Fully automated decision-making, including profiling, with respect to an individual or group of individuals which produces legal effects concerning such individual(s) or similarly significantly affects such individual(s);
            - Systematic or automated scraping, mining, extraction, or harvesting of personally identifiable data, or similar activity, from the output of any Stability Technology except with respect to data that you have provided as input to the Stability Technology and which you are legally entitled to process, for so long as you retain such entitlement;
            - Development, improvement, or manufacture of any weapons of mass destruction (such as nuclear, chemical, or biologic weapons), weapons of war (such as missiles or landmines), or any gain of function-related activities with respect to any pathogens;
            - Mission critical applications or systems where best industry practices require fail-safe controls or performance, including operation of nuclear facilities, aircraft navigation, electrical grids, communication systems, water treatment facilities, air traffic control, life support, weapons systems, or emergency locator or other emergency services;
    - To intentionally deceive or mislead others, including use of Japanese StableLM related to the following:
        - Generating, promoting, or furthering fraud or the creation or promotion of disinformation;
        - Generating, promoting, or furthering defamatory content, including the creation of defamatory statements, images, or other content;
        - Generating, promoting, or further distributing spam;
        - Impersonating another individual without consent, authorization, or legal right
        - Representing or misleading people into believing that the use of Japanese StableLM or outputs are human-generated;
        - Generating or facilitating false online engagement, including fake reviews and other means of fake online engagement;
        - Generating or facilitating large-scale political advertisements, propaganda, or influence campaigns;
    - Fail to appropriately disclose to end users any known dangers of your AI system or misrepresent or mislead with respect to its abilities.
    Nothing in this AUP is intended to prevent or impede any good faith research, testing, or evaluation of Japanese StableLM, or publication related to any of the foregoing. If you discover any flaws in Japanese StableLM that may be harmful to people in any way, we encourage you to notify us and give us a chance to remedy such flaws before others can exploit them. If you have questions about this AUP, contact us at legal@stability.ai.
---

# Japanese InstructBLIP Alpha

![japanese-instructblip-icon](./japanese-instructblip-parrot.png)

## Model Details
Japanese InstructBLIP Alpha is a vision-language instruction-following model that generates Japanese descriptions of input images, optionally conditioned on input text such as questions.


## Usage

First, install the additional dependencies listed in [requirements.txt](./requirements.txt):

```sh
pip install sentencepiece einops
```


```python
import torch
from transformers import LlamaTokenizer, AutoModelForVision2Seq, BlipImageProcessor
from PIL import Image
import requests

# helper function to format input prompts
def build_prompt(prompt="", sep="\n\n### "):
    sys_msg = "以下は、タスクを説明する指示と、文脈のある入力の組み合わせです。要求を適切に満たす応答を書きなさい。"
    p = sys_msg
    roles = ["指示", "応答"]
    user_query = "与えられた画像について、詳細に述べてください。"
    msgs = [": \n" + user_query, ": "]
    if prompt:
        roles.insert(1, "入力")
        msgs.insert(1, ": \n" + prompt)
    for role, msg in zip(roles, msgs):
        p += sep + role + msg
    return p

# load model
model = AutoModelForVision2Seq.from_pretrained("stabilityai/japanese-instructblip-alpha", trust_remote_code=True)
processor = BlipImageProcessor.from_pretrained("stabilityai/japanese-instructblip-alpha")
tokenizer = LlamaTokenizer.from_pretrained("novelai/nerdstash-tokenizer-v1", additional_special_tokens=['▁▁'])
device = "cuda" if torch.cuda.is_available() else "cpu"
model.to(device)

# prepare inputs
url = "https://images.unsplash.com/photo-1582538885592-e70a5d7ab3d3?ixlib=rb-4.0.3&ixid=M3wxMjA3fDB8MHxwaG90by1wYWdlfHx8fGVufDB8fHx8fA%3D%3D&auto=format&fit=crop&w=1770&q=80"
image = Image.open(requests.get(url, stream=True).raw).convert("RGB")
prompt = ""  # leave empty for image captioning, or pass a question about the image
prompt = build_prompt(prompt)
inputs = processor(images=image, return_tensors="pt")
text_encoding = tokenizer(prompt, add_special_tokens=False, return_tensors="pt")
text_encoding["qformer_input_ids"] = text_encoding["input_ids"].clone()
text_encoding["qformer_attention_mask"] = text_encoding["attention_mask"].clone()
inputs.update(text_encoding)

# generate
outputs = model.generate(
    **inputs.to(device, dtype=model.dtype),
    num_beams=5,
    max_new_tokens=32,
    min_length=1,
)
generated_text = tokenizer.batch_decode(outputs, skip_special_tokens=True)[0].strip()
print(generated_text)
# 桜と東京スカイツリー  ("Cherry blossoms and Tokyo Skytree")
```
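
To make the prompt format easier to inspect, the sketch below repeats the `build_prompt` helper from the example above and prints the text it produces for the two cases, captioning (empty prompt) and question answering. It runs without loading the model. The sample question is a hypothetical input chosen for illustration and does not come from the model's documentation.

```python
# Same helper as in the usage example: builds the instruction-style prompt.
def build_prompt(prompt="", sep="\n\n### "):
    sys_msg = "以下は、タスクを説明する指示と、文脈のある入力の組み合わせです。要求を適切に満たす応答を書きなさい。"
    p = sys_msg
    roles = ["指示", "応答"]  # "instruction", "response"
    user_query = "与えられた画像について、詳細に述べてください。"
    msgs = [": \n" + user_query, ": "]
    if prompt:
        # A non-empty prompt is placed in an "入力" (input) section
        # between the instruction and the response.
        roles.insert(1, "入力")
        msgs.insert(1, ": \n" + prompt)
    for role, msg in zip(roles, msgs):
        p += sep + role + msg
    return p

# Captioning: no extra input, so only the instruction and response sections appear.
caption_prompt = build_prompt("")

# Question answering: hypothetical question, "What is the name of the tower in the image?"
question = "画像の中の塔の名前は何ですか?"
vqa_prompt = build_prompt(question)

print(caption_prompt)
print("-" * 20)
print(vqa_prompt)
```

In both cases the prompt ends with the open `### 応答: ` section, which the model is expected to complete. The question case differs only by the additional `### 入力` section.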


## Model Details
* **Developed by**: [Stability AI](https://stability.ai/)
* **Model type**: [InstructBLIP](https://arxiv.org/abs/2305.06500)
* **Language(s)**: Japanese
* **License**: [JAPANESE STABLELM RESEARCH LICENSE AGREEMENT](./LICENSE)

### Training
Japanese InstructBLIP Alpha uses the [InstructBLIP](https://arxiv.org/abs/2305.06500) architecture, which consists of three components: a frozen image encoder, a Q-Former, and a frozen LLM. The image encoder and the Q-Former were initialized from [Salesforce/instructblip-vicuna-7b](https://huggingface.co/Salesforce/instructblip-vicuna-7b). The frozen LLM is [Japanese-StableLM-Instruct-Alpha-7B](https://huggingface.co/stabilityai/japanese-stablelm-instruct-alpha-7b). During training, only the Q-Former was updated.
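
As a rough illustration of what "only the Q-Former is trained" means in PyTorch terms, the sketch below freezes every parameter except those of one submodule. It is not the actual training code: `ToyInstructBLIP`, its attribute names (`vision_encoder`, `qformer`, `language_model`), and the tiny linear layers are all hypothetical stand-ins for the real components.

```python
import torch.nn as nn

class ToyInstructBLIP(nn.Module):
    """Hypothetical stand-in with the three components named in the text."""
    def __init__(self):
        super().__init__()
        self.vision_encoder = nn.Linear(8, 8)   # stands in for the frozen image encoder
        self.qformer = nn.Linear(8, 8)          # stands in for the trainable Q-Former
        self.language_model = nn.Linear(8, 8)   # stands in for the frozen LLM

def freeze_all_but_qformer(model):
    """Enable gradients only for parameters under `qformer`; return their names."""
    for name, param in model.named_parameters():
        param.requires_grad = name.startswith("qformer.")
    return [n for n, p in model.named_parameters() if p.requires_grad]

model = ToyInstructBLIP()
trainable = freeze_all_but_qformer(model)
print(trainable)
```

An optimizer would then be built from `[p for p in model.parameters() if p.requires_grad]`, so that the frozen components keep their pretrained weights.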

### Training Dataset
The training dataset includes the following public datasets:
- [CC12M](https://github.com/google-research-datasets/conceptual-12m) with captions translated into Japanese
- [MS-COCO](https://cocodataset.org/#home) with [STAIR Captions](http://captions.stair.center/)
- [Japanese Visual Genome VQA dataset](https://github.com/yahoojapan/ja-vg-vqa)

## Use and Limitations

### Intended Use

This model is intended to be used by the open-source community in chat-like applications in adherence with the research license.

### Limitations and bias

Although the aforementioned datasets help to steer the base language models into "safer" distributions of text, not all biases and toxicity can be mitigated through fine-tuning. We ask that users be mindful of such potential issues that can arise in generated responses. Do not treat model outputs as substitutes for human judgment or as sources of truth. Please use responsibly.


## How to cite
```bibtex
@misc{JapaneseInstructBLIPAlpha, 
    url    = {https://huggingface.co/stabilityai/japanese-instructblip-alpha},
    title  = {Japanese InstructBLIP Alpha}, 
    author = {Shing, Makoto and Akiba, Takuya}
}
```

## Citations

```bibtex
@misc{dai2023instructblip,
    title         = {InstructBLIP: Towards General-purpose Vision-Language Models with Instruction Tuning}, 
    author        = {Wenliang Dai and Junnan Li and Dongxu Li and Anthony Meng Huat Tiong and Junqi Zhao and Weisheng Wang and Boyang Li and Pascale Fung and Steven Hoi},
    year          = {2023},
    eprint        = {2305.06500},
    archivePrefix = {arXiv},
    primaryClass  = {cs.CV}
}
```

## Contact
* For questions and comments about the model, please join [Stable Community Japan](https://discord.com/invite/StableJP).
* For future announcements / information about Stability AI models, research, and events, please follow https://twitter.com/StabilityAI_JP.
* For business and partnership inquiries, please contact partners-jp@stability.ai.