sonoisa/clip-vit-b-32-japanese-v1


Japanese CLIP model

This is a CLIP text/image encoder model for Japanese.

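The model was built by adapting the text encoder of the English CLIP model to Japanese through a form of distillation. For how it was built, its accuracy, usage, and sample code, see the articles below.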
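The card says only that "a form of distillation" was used. As a rough illustration, the sketch below shows one common form of embedding distillation: a student is trained so that its embedding of a Japanese sentence matches a frozen teacher's embedding of the English equivalent, using a mean-squared-error loss. This is an assumption for illustration, not a description of the author's actual training procedure. The linear "student", the feature values, and the teacher embedding are all hypothetical toy data.

```python
def mse(a, b):
    """Mean squared error between two equal-length vectors."""
    return sum((x - y) ** 2 for x, y in zip(a, b)) / len(a)

def student_forward(weights, features):
    """Toy linear student: one weight row per output dimension."""
    return [sum(w * f for w, f in zip(row, features)) for row in weights]

def distill_step(weights, features, teacher_embedding, lr=0.1):
    """One gradient-descent step on the MSE between student and teacher."""
    student = student_forward(weights, features)
    n = len(student)
    # d(mse)/d(w_ij) = (2 / n) * (student_i - teacher_i) * features_j
    return [
        [w - lr * (2.0 / n) * (s - t) * f for w, f in zip(row, features)]
        for row, s, t in zip(weights, student, teacher_embedding)
    ]

# Hypothetical values: the frozen teacher's embedding of an English sentence,
# and toy input features for the Japanese translation of the same sentence.
teacher_embedding = [0.5, -0.2]
features = [1.0, 2.0, 0.5]
weights = [[0.0, 0.0, 0.0], [0.0, 0.0, 0.0]]

loss_before = mse(student_forward(weights, features), teacher_embedding)
for _ in range(50):
    weights = distill_step(weights, features, teacher_embedding)
loss_after = mse(student_forward(weights, features), teacher_embedding)

print(loss_before, loss_after)
```

Because only the text side is trained against the teacher's output space, the student's embeddings remain comparable with the original image encoder's embeddings, which is what makes this style of distillation attractive for porting CLIP to another language.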

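Articles:
- Overview: [With Japanese models] Pre-trained models recommended for anyone doing multimodal processing in 2022
- Usage guide: [Japanese CLIP] Computing image-text similarity, computing image and text embeddings, and similar-image search
- (In preparation) Application guide: Multimodal search of Irasutoya images (zero-shot)
- (In preparation) Application guide: Multimodal search of Irasutoya images (fine-tuning)
- (In preparation) Application guide: Multimodal classification using both images and text
Sample code repository: https://github.com/sonoisa/clip-japanese
Demo:
- Multimodal search of Irasutoya images (zero-shot)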
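
The usage article covers image-text similarity and similar-image search. Once embeddings exist, both usually reduce to comparing vectors by cosine similarity. The sketch below is a minimal illustration of that step only: it does not load this model, and the file names and 3-dimensional vectors are made-up stand-ins for real embeddings. For the actual loading and encoding code, refer to the sample code repository.

```python
import math

def cosine_similarity(a, b):
    """Cosine similarity between two equal-length, non-zero vectors."""
    dot = sum(x * y for x, y in zip(a, b))
    norm_a = math.sqrt(sum(x * x for x in a))
    norm_b = math.sqrt(sum(y * y for y in b))
    return dot / (norm_a * norm_b)

def rank_images(text_embedding, image_embeddings):
    """Return (name, score) pairs sorted from most to least similar."""
    scores = [
        (name, cosine_similarity(text_embedding, emb))
        for name, emb in image_embeddings.items()
    ]
    return sorted(scores, key=lambda pair: pair[1], reverse=True)

# Toy vectors standing in for real text and image embeddings (made-up values).
text_embedding = [1.0, 0.0, 1.0]
image_embeddings = {
    "cat.png": [0.9, 0.1, 0.8],
    "car.png": [0.0, 1.0, 0.1],
}

ranking = rank_images(text_embedding, image_embeddings)
for name, score in ranking:
    print(name, round(score, 3))
```

Zero-shot search follows the same pattern: embed the query text once, then rank precomputed image embeddings by their similarity to it.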

Safetensors: model size 111M params; tensor types I64 and F32
