--- license: apache-2.0 datasets: - neulab/PangeaInstruct language: - am - ar - bg - bn - cs - de - el - en - es - fa - fr - ga - hi - id - ig - it - iw - ja - jv - ko - nl - mn - ms - no - pl - pt - ro - ru - si - su - sw - ta - te - th - tr - uk - ur - vi - zh base_model: - Qwen/Qwen2-7B-Instruct --- # Pangea-7B Model Card [Pangea: A Fully Open Multilingual Multimodal LLM for 39 Languages](https://neulab.github.io/Pangea/) ๐Ÿ‡ช๐Ÿ‡น ๐Ÿ‡ธ๐Ÿ‡ฆ ๐Ÿ‡ง๐Ÿ‡ฌ ๐Ÿ‡ง๐Ÿ‡ฉ ๐Ÿ‡จ๐Ÿ‡ฟ ๐Ÿ‡ฉ๐Ÿ‡ช ๐Ÿ‡ฌ๐Ÿ‡ท ๐Ÿ‡ฌ๐Ÿ‡ง ๐Ÿ‡บ๐Ÿ‡ธ ๐Ÿ‡ช๐Ÿ‡ธ ๐Ÿ‡ฎ๐Ÿ‡ท ๐Ÿ‡ซ๐Ÿ‡ท ๐Ÿ‡ฎ๐Ÿ‡ช ๐Ÿ‡ฎ๐Ÿ‡ณ ๐Ÿ‡ฎ๐Ÿ‡ฉ ๐Ÿ‡ณ๐Ÿ‡ฌ ๐Ÿ‡ฎ๐Ÿ‡น ๐Ÿ‡ฎ๐Ÿ‡ฑ ๐Ÿ‡ฏ๐Ÿ‡ต ๐Ÿ‡ฎ๐Ÿ‡ฉ ๐Ÿ‡ฐ๐Ÿ‡ท ๐Ÿ‡ณ๐Ÿ‡ฑ ๐Ÿ‡ฒ๐Ÿ‡ณ ๐Ÿ‡ฒ๐Ÿ‡พ ๐Ÿ‡ณ๐Ÿ‡ด ๐Ÿ‡ต๐Ÿ‡ฑ ๐Ÿ‡ต๐Ÿ‡น ๐Ÿ‡ง๐Ÿ‡ท ๐Ÿ‡ท๐Ÿ‡ด ๐Ÿ‡ท๐Ÿ‡บ ๐Ÿ‡ฑ๐Ÿ‡ฐ ๐Ÿ‡ฎ๐Ÿ‡ฉ ๐Ÿ‡ฐ๐Ÿ‡ช ๐Ÿ‡น๐Ÿ‡ฟ ๐Ÿ‡ฑ๐Ÿ‡ฐ ๐Ÿ‡น๐Ÿ‡ญ ๐Ÿ‡น๐Ÿ‡ท ๐Ÿ‡บ๐Ÿ‡ฆ ๐Ÿ‡ต๐Ÿ‡ฐ ๐Ÿ‡ป๐Ÿ‡ณ ๐Ÿ‡จ๐Ÿ‡ณ ๐Ÿ‡น๐Ÿ‡ผ [๐Ÿ  Homepage](https://neulab.github.io/Pangea/) | [๐Ÿค– Pangea-7B](https://huggingface.co/neulab/Pangea-7B) | [๐Ÿ“Š PangeaIns](https://huggingface.co/datasets/neulab/PangeaInstruct) | [๐Ÿงช PangeaBench](https://huggingface.co/collections/neulab/pangea-6713c3b0d78a453906eb2ed8) | [๐Ÿ’ป Github](https://github.com/neulab/Pangea/tree/main) | [๐Ÿ“„ Arxiv](https://arxiv.org/abs/2410.16153) | [๐Ÿ“• PDF](https://arxiv.org/pdf/2410.16153) | [๐Ÿ–ฅ๏ธ Demo](https://huggingface.co/spaces/neulab/Pangea) description ## Model details - **Model:** Pangea is a fully open-source Multilingual Multimodal Multicultural LLM. - **Date:** Pangea-7B was trained in 2024. - **Training Dataset:** [6M PangeaIns](https://huggingface.co/datasets/neulab/PangeaInstruct). - **Architecture:** Pangea-7B follows the architecture of [LLaVA-NeXT](https://github.com/LLaVA-VL/LLaVA-NeXT), with a [Qwen2-7B-Instruct](https://huggingface.co/Qwen/Qwen2-7B-Instruct) backbone. ## Uses Pangea-7B follows the architecture of [LLaVA-NeXT](https://github.com/LLaVA-VL/LLaVA-NeXT). You could either (1) follow the same model loading procedures as of [LLaVA-NeXT](https://github.com/LLaVA-VL/LLaVA-NeXT), an example of loading Pangea-7B directly is shown in the Python code below, or (2) use our hf version of Pangea-7B: [Pangea-7B-hf]https://huggingface.co/neulab/Pangea-7B-hf ### Direct Use First you would need to clone and install [LLaVA-NeXT](https://github.com/LLaVA-VL/LLaVA-NeXT). ```bash git clone https://github.com/LLaVA-VL/LLaVA-NeXT cd LLaVA-NeXT pip install -e ".[train]" ``` Then, you could load Pangea-7B using the following code: ```python from llava.model.builder import load_pretrained_model model_path = 'neulab/Pangea-7B' model_name = 'Pangea-7B-qwen' args = {"multimodal": True} tokenizer, model, image_processor, context_len = load_pretrained_model(model_path, None, model_name, **args) ``` Defining some helper functions for using the model: ```python import torch from llava.constants import IMAGE_TOKEN_INDEX, DEFAULT_IMAGE_TOKEN, DEFAULT_IM_START_TOKEN, DEFAULT_IM_END_TOKEN from llava.utils import disable_torch_init from llava.constants import IGNORE_INDEX, DEFAULT_IMAGE_TOKEN, IMAGE_TOKEN_INDEX from typing import Dict import transformers import re from PIL import Image def preprocess_qwen(sources, tokenizer: transformers.PreTrainedTokenizer, has_image: bool = False, max_len=2048, system_message: str = "You are a helpful assistant.") -> Dict: roles = {"human": "<|im_start|>user", "gpt": "<|im_start|>assistant"} im_start, im_end = tokenizer.additional_special_tokens_ids nl_tokens = tokenizer("\n").input_ids _system = tokenizer("system").input_ids + nl_tokens _user = tokenizer("user").input_ids + nl_tokens _assistant = tokenizer("assistant").input_ids + nl_tokens input_ids = [] source = sources if roles[source[0]["from"]] != roles["human"]: source = source[1:] input_id, target = [], [] system = [im_start] + _system + tokenizer(system_message).input_ids + [im_end] + nl_tokens input_id += system target += [im_start] + [IGNORE_INDEX] * (len(system) - 3) + [im_end] + nl_tokens assert len(input_id) == len(target) for j, sentence in enumerate(source): role = roles[sentence["from"]] if has_image and sentence["value"] is not None and "" in sentence["value"]: num_image = len(re.findall(DEFAULT_IMAGE_TOKEN, sentence["value"])) texts = sentence["value"].split('') _input_id = tokenizer(role).input_ids + nl_tokens for i,text in enumerate(texts): _input_id += tokenizer(text).input_ids if i