---
license: apache-2.0
datasets:
  - aldente0630/LLaVA-Pretrain-Ko
  - tabtoyou/KoLLaVA-v1.5-Instruct-581k
language:
  - ko
  - en
base_model:
  - jjhsnail0822/danube-ko-1.8b-base
  - openai/clip-vit-large-patch14-336
tags:
  - h2o-danube2
  - korean
  - sLLM
  - llm
  - multimodal
  - LLaVA
---

## Model Details

llava-danube-ko-1.8b-instruct is a Korean multimodal language model based on jjhsnail0822/danube-ko-1.8b-base.

## Model Developers

Jinhong Jeong, Ungsang Yoon

## Model Architecture

We used the LLaVA-NeXT technique to add multimodal capabilities to the model via two-stage visual instruction tuning. jjhsnail0822/danube-ko-1.8b-base was used as the base LLM, and the vision encoder was fine-tuned from openai/clip-vit-large-patch14-336. The model has a sequence length of 2048 tokens.

## Training Datasets

In the multimodal pretraining stage, we used images filtered from the LAION/CC/SBU dataset. For the visual instruction tuning stage, we prepared the training dataset from the COCO, GQA, and Visual Genome datasets, together with the EKVQA dataset from AI-Hub. About 90 GB of compressed image data was used across the whole training process.

## Chat Template

```
A chat between a curious human and an artificial intelligence assistant. The assistant gives helpful, detailed, and polite answers to the human's questions.###Human: 질문내용###Assistant: 답변내용
```

Here, 질문내용 ("question content") and 답변내용 ("answer content") are placeholders for the user's question and the assistant's answer.
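For inference, a prompt is assembled by placing the user's question after the `###Human:` tag and leaving the answer slot empty so the model completes it. A minimal sketch (the `build_prompt` helper is a hypothetical name for illustration, not part of the model's tooling):

```python
# System prompt and separators taken verbatim from the chat template above.
SYSTEM = (
    "A chat between a curious human and an artificial intelligence assistant. "
    "The assistant gives helpful, detailed, and polite answers to the human's questions."
)

def build_prompt(user_message: str) -> str:
    """Wrap a user message in the chat template, ending at the
    assistant tag so the model continues with its answer."""
    return f"{SYSTEM}###Human: {user_message}###Assistant: "

# Example: a Korean question ("Describe this image.")
print(build_prompt("이 이미지를 설명해줘."))
```

The generated text that follows `###Assistant: ` is the model's answer; generation can be stopped when the model emits the next `###` separator.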

## Model Benchmark

TBA

## Disclaimer

The model can generate information that is biased, discriminatory, or otherwise socially inappropriate, as well as information that is inaccurate. Use of the model is at your own risk, and the developers are not responsible for the information it generates.