Text context length?

#5
by dingo-actual - opened

What's the text context length for jina-clip-v1?

Could it be 8192? That's what the tokenizer config says, anyway:
https://huggingface.co/jinaai/jina-clip-v1/blob/1bae0621529ced998c73bca234a8cb9da997f33c/tokenizer_config.json#L49
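
For reference, you can check that value programmatically too. A minimal sketch, assuming the standard `transformers` API (`trust_remote_code=True` because the repo ships custom code):

```python
# Sketch: inspect the tokenizer's configured maximum length for jina-clip-v1.
from transformers import AutoTokenizer

tokenizer = AutoTokenizer.from_pretrained(
    "jinaai/jina-clip-v1", trust_remote_code=True
)
# Should print the value from tokenizer_config.json, i.e. 8192.
print(tokenizer.model_max_length)
```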

For comparison, CLIP ViT-B/32 is 77 tokens.
Curious to know where this stands.

Ah, scratch that; the paper (https://arxiv.org/pdf/2405.20204) states:

> For stage 2, C_text-pairs is used again. However, text values are truncated to 512 tokens in this case, and as a result a smaller batch size of 8,192 is used.

So it looks like it's 512, if I'm reading that right?

I had the same question.

The largest text length that stays well aligned with images, per the training setup, seems to be 512. However, this might generalize further, for example if the third and last stage of finetuning allows for longer text-only sequences (this unfortunately isn't mentioned in the paper). It might also weakly generalize simply because the initial BERT model supported longer input texts (8192, it seems, per the config file), but this would have to be tested; see the sketch below.
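
If anyone wants to run that test, here's a rough sketch of what I mean. It's hypothetical, not from the paper: the `encode_text` call follows the model card, and the repeated-sentence "document" is just a toy stand-in for real long text.

```python
# Hypothetical experiment: does the embedding stay stable as the input
# grows past the 512-token training length?
import torch
from transformers import AutoModel

model = AutoModel.from_pretrained("jinaai/jina-clip-v1", trust_remote_code=True)

base = model.encode_text(["a photo of a dog playing in the park. "])
for n_repeats in (8, 32, 128):  # roughly 100 to 1500 tokens
    long_doc = "a photo of a dog playing in the park. " * n_repeats
    emb = model.encode_text([long_doc])
    sim = torch.nn.functional.cosine_similarity(
        torch.as_tensor(base), torch.as_tensor(emb)
    ).item()
    print(n_repeats, round(sim, 4))
```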

I would love to get some clarity on that. Any thoughts, @gmastrapas or @bwang0911 ?

Jina AI org

Hi all, our backbone model JinaBERT supports very long sequences (we say up to 8192, but in principle it should be unlimited).

We contrastively train the model with a sequence length of 512 on embedding tasks, but this does not mean the model can only handle 512; it should be able to handle much longer sequences, same as jina-embeddings-v2.

However, our experience tells us the best sequence length for sentence embeddings is around 512-1000 tokens. My suggestion is to keep documents below 1000 tokens, but it will definitely still work well beyond 1000.
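
In practice that advice can look like this. A minimal sketch: the `encode_text` call follows the model card, and the truncation helper is just illustrative, not part of the library.

```python
# Minimal sketch of the advice above: cap each document at ~1000 tokens
# before embedding. truncate_to_tokens is a hypothetical helper.
from transformers import AutoModel, AutoTokenizer

model = AutoModel.from_pretrained("jinaai/jina-clip-v1", trust_remote_code=True)
tokenizer = AutoTokenizer.from_pretrained(
    "jinaai/jina-clip-v1", trust_remote_code=True
)

def truncate_to_tokens(text: str, max_tokens: int = 1000) -> str:
    """Keep at most max_tokens tokens of `text` (illustrative helper)."""
    ids = tokenizer.encode(text, add_special_tokens=False)
    if len(ids) <= max_tokens:
        return text
    return tokenizer.decode(ids[:max_tokens])

docs = ["... some long document ..."]
embeddings = model.encode_text([truncate_to_tokens(d) for d in docs])
```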

@dingo-actual @paulmaksimovich @FremyCompany

bwang0911 changed discussion status to closed
