--- language: - ru pipeline_tag: sentence-similarity tags: - russian - pretraining - embeddings - tiny - feature-extraction - sentence-similarity - sentence-transformers - transformers datasets: - IlyaGusev/gazeta - zloelias/lenta-ru license: mit base_model: cointegrated/rubert-tiny2 --- Быстрая модель BERT для расчетов эмбедингов предложений на русском языке. Модель основана на [cointegrated/rubert-tiny2](https://huggingface.co/cointegrated/rubert-tiny2) - имеет аналогичные размеры контекста (2048), ембединга (312) и быстродействие. ## Использование ```Python from sentence_transformers import SentenceTransformer, util model = SentenceTransformer('sergeyzh/rubert-tiny-turbo') sentences = ["привет мир", "hello world", "здравствуй вселенная"] embeddings = model.encode(sentences) print(util.dot_score(embeddings, embeddings)) ``` ## Метрики Оценки модели на бенчмарке [encodechka](https://github.com/avidale/encodechka): | model | CPU | GPU | size | Mean S | Mean S+W | dim | |:-----------------------------------|----------:|---------:|---------:|----------:|-----------:|-------:| | [sergeyzh/LaBSE-ru-turbo](https://huggingface.co/sergeyzh/LaBSE-ru-turbo) | 120.40 | 8.05 | 490 | 0.789 | 0.702 | 768 | | BAAI/bge-m3 | 523.40 | 22.50 | 2166 | 0.787 | 0.696 | 1024 | | intfloat/multilingual-e5-large | 506.80 | 30.80 | 2136 | 0.780 | 0.686 | 1024 | | intfloat/multilingual-e5-base | 130.61 | 14.39 | 1061 | 0.761 | 0.669 | 768 | | **sergeyzh/rubert-tiny-turbo** | 5.51 | 3.25 | 111 | 0.749 | 0.667 | 312 | | intfloat/multilingual-e5-small | 40.86 | 12.09 | 449 | 0.742 | 0.645 | 384 | | cointegrated/rubert-tiny2 | 5.51 | 3.25 | 111 | 0.704 | 0.638 | 312 | | model | STS | PI | NLI | SA | TI | IA | IC | ICX | NE1 | NE2 | |:-----------------------------------|:---------|:---------|:---------|:---------|:---------|:---------|:---------|:---------|:---------|:---------| | [sergeyzh/LaBSE-ru-turbo](https://huggingface.co/sergeyzh/LaBSE-ru-turbo) | 0.864 | 0.748 | 0.490 | 0.814 | 0.974 | 0.806 | 0.815 | 0.801 | 0.305 | 0.404 | | BAAI/bge-m3 | 0.864 | 0.749 | 0.510 | 0.819 | 0.973 | 0.792 | 0.809 | 0.783 | 0.240 | 0.422 | | intfloat/multilingual-e5-large | 0.862 | 0.727 | 0.473 | 0.810 | 0.979 | 0.798 | 0.819 | 0.773 | 0.224 | 0.374 | | intfloat/multilingual-e5-base | 0.835 | 0.704 | 0.459 | 0.796 | 0.964 | 0.783 | 0.802 | 0.738 | 0.235 | 0.376 | | **sergeyzh/rubert-tiny-turbo** | 0.828 | 0.722 | 0.476 | 0.787 | 0.955 | 0.757 | 0.780 | 0.685 | 0.305 | 0.373 | | intfloat/multilingual-e5-small | 0.822 | 0.714 | 0.457 | 0.758 | 0.957 | 0.761 | 0.779 | 0.691 | 0.234 | 0.275 | | cointegrated/rubert-tiny2 | 0.750 | 0.651 | 0.417 | 0.737 | 0.937 | 0.746 | 0.757 | 0.638 | 0.360 | 0.386 |