Question about MTEB benchmark settings: 'max_seq_length' 😭

#25
by george31 - opened

I noticed that in MTEB benchmark implementations (eval_mteb.py), 'max_seq_length' is set to 512 tokens by default, even for models that support much longer sequences (e.g., 32K tokens).

For example, when benchmarking embedding models with MTEB:

  • Default max_seq_length: 512
  • Actual model capacity: 32K tokens

This seems to underutilize the model's capabilities and may not provide a fair comparison, especially for tasks involving longer documents.
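To make the concern concrete, here is a minimal pure-Python sketch (no real model or tokenizer; the 512 limit is the only value taken from the discussion) of why a hard truncation limit can erase the differences between long documents:

```python
# Toy illustration: truncating inputs to a fixed token budget discards
# everything past the limit, so two documents that differ only in their
# tails become indistinguishable to the encoder.
MAX_SEQ_LENGTH = 512  # the default discussed above

# Two short documents whose differing content fits inside the limit:
doc_a = ["filler"] * 400 + ["contract", "renews", "annually"]
doc_b = ["filler"] * 400 + ["contract", "terminates", "in", "may"]
assert doc_a[:MAX_SEQ_LENGTH] != doc_b[:MAX_SEQ_LENGTH]  # still distinguishable

# Two long documents whose differing content sits past token 512:
doc_c = ["filler"] * 600 + ["key", "clause"]
doc_d = ["filler"] * 600 + ["other", "clause"]
assert doc_c[:MAX_SEQ_LENGTH] == doc_d[:MAX_SEQ_LENGTH]  # tails are cut off
```

Under the 512-token cap, any embedding model, regardless of its native 32K context, would map doc_c and doc_d to identical inputs.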

Questions:

  1. Is this a common practice in the field? If so, what is the rationale behind it?
  2. Wouldn't it be more appropriate to use the model's full sequence length capability for fair benchmarking?
  3. Are there any specific technical or practical reasons why 512 tokens became the de facto standard for MTEB benchmarks?

I'd appreciate any insights from the community on this benchmarking practice.

