abdullahalzubaer (Abdullah Al Zubaer)

📣 Sentence Transformers v3.2.0 is out, marking the biggest release for inference in 2 years! 2 new backends for embedding models: ONNX (+ optimization & quantization) and OpenVINO, allowing for speedups up to 2x-3x AND Static Embeddings for 500x speedups at 10-20% accuracy cost.

1️⃣ ONNX Backend: This backend uses the ONNX Runtime to accelerate model inference on both CPU and GPU, reaching up to 1.4x-3x speedup depending on the precision. We also introduce 2 helper methods for optimizing and quantizing models for (much) faster inference.
2️⃣ OpenVINO Backend: This backend uses Intel's OpenVINO instead, outperforming ONNX in some situations on CPU.

Usage is as simple as SentenceTransformer("all-MiniLM-L6-v2", backend="onnx"). Does your model not have an ONNX or OpenVINO file yet? No worries - it'll be autoexported for you. Thank me later 😉
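For reference, here is a minimal sketch of that one-liner wrapped in a small helper. The helper name load_model and the SUPPORTED_BACKENDS tuple are illustrative, not part of the library; the backend names assume "torch" is the default alongside the two new backends, and the ONNX/OpenVINO backends are assumed to need the matching optional dependencies installed.

```python
# Illustrative sketch: SUPPORTED_BACKENDS and load_model are hypothetical names,
# only SentenceTransformer(..., backend=...) comes from the post itself.
SUPPORTED_BACKENDS = ("torch", "onnx", "openvino")


def load_model(model_name="all-MiniLM-L6-v2", backend="onnx"):
    """Load a Sentence Transformer model with the requested inference backend."""
    if backend not in SUPPORTED_BACKENDS:
        raise ValueError(
            f"Unknown backend {backend!r}; expected one of {SUPPORTED_BACKENDS}"
        )
    # Imported lazily so the backend check above works without the library installed.
    from sentence_transformers import SentenceTransformer

    # If the model repository has no ONNX/OpenVINO file yet, it is exported on load.
    return SentenceTransformer(model_name, backend=backend)
```

Calling load_model(backend="onnx").encode(["some text"]) should then return embeddings just as with the default backend; swapping in backend="openvino" is the only change needed to try the other backend.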

🔒 Another major new feature is Static Embeddings: think word embeddings like GloVe and word2vec, but modernized. Static Embeddings are bags of token embeddings that are summed together to create text embeddings, allowing for lightning-fast embeddings that don't require any neural networks. They're initialized in one of 2 ways:

1️⃣ via Model2Vec, a new technique for distilling any Sentence Transformer model into static embeddings. You can either load a pre-distilled model with from_model2vec or do the distillation yourself with from_distillation. It'll only take 5 seconds on GPU & 2 minutes on CPU, no dataset needed.
2️⃣ Random initialization. This requires finetuning, but finetuning is extremely quick (e.g. I trained with 3 million pairs in 7 minutes). My final model was 6.6% worse than bge-base-en-v1.5, but 500x faster on CPU.
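To make the "bag of token embeddings" idea concrete, here is a toy sketch in plain Python. It is not the library's implementation: the 3-dimensional embedding table, the whitespace tokenizer, and the zero vector for unknown tokens are all made up for illustration. It only shows why no neural network is needed at inference time: embedding a text is a table lookup plus a sum.

```python
from typing import Dict, List

# Hypothetical toy embedding table; a real model would hold one learned
# vector per tokenizer token, with hundreds of dimensions.
TOKEN_EMBEDDINGS: Dict[str, List[float]] = {
    "fast": [1.0, 0.0, 0.5],
    "text": [0.0, 1.0, 0.5],
    "embeddings": [0.5, 0.5, 0.0],
}
UNKNOWN: List[float] = [0.0, 0.0, 0.0]  # assumption: unknown tokens contribute nothing


def tokenize(text: str) -> List[str]:
    """Stand-in tokenizer: lowercase and split on whitespace."""
    return text.lower().split()


def embed(text: str) -> List[float]:
    """Sum the static vectors of all tokens to get one text embedding."""
    total = [0.0] * len(UNKNOWN)
    for token in tokenize(text):
        vector = TOKEN_EMBEDDINGS.get(token, UNKNOWN)
        for i, value in enumerate(vector):
            total[i] += value
    return total


print(embed("Fast text embeddings"))  # [1.5, 1.5, 1.0]
```

Because the result is just a sum, word order is ignored, which is part of the accuracy cost mentioned above; the speed comes from replacing the transformer forward pass with lookups.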

Full release notes: https://github.com/UKPLab/sentence-transformers/releases/tag/v3.2.0
Documentation on Speeding up Inference: https://sbert.net/docs/sentence_transformer/usage/efficiency.html

upvoted a paper 3 months ago

Tutor CoPilot: A Human-AI Approach for Scaling Real-Time Expertise

Paper • 2410.03017 • Published Oct 3, 2024 • 27

upvoted an article 5 months ago

Article

Fine-tune Llama 3.1 Ultra-Efficiently with Unsloth

Jul 29, 2024

• 260

upvoted an article 6 months ago

Article

Vision Language Models Explained

Apr 11, 2024

• 240

liked a model 7 months ago

vikhyatk/moondream2

Image-Text-to-Text • Updated Nov 15, 2024 • 139k • 771

upvoted an article 7 months ago

Article

Training and Finetuning Embedding Models with Sentence Transformers v3

May 28, 2024

• 171

liked 2 models 9 months ago

CohereForAI/c4ai-command-r-plus

Text Generation • Updated Sep 27, 2024 • 2.45k • 1.7k

lightblue/Karasu-Mixtral-8x22B-v0.1

Text Generation • Updated Apr 11, 2024 • 16 • 62

liked a dataset 9 months ago

orpo-explorers/OpenHermesPreferences-500k

Viewer • Updated Mar 30, 2024 • 500k • 30 • 2

reacted to dvilasuero's post with ❤️ 10 months ago

Post

🔥 Community and Data Quality Are More For Alignment

A recipe to replicate SPIN (Self-Play Fine Tuning) with 30x less data:

🗣️ 50K samples vs 1.8K prompts curated by the 350+ amazing DIBT contributors.
⚗️ Distillation of Mistral Large instead of OpenAI
🙌 Open data & code with ⚗️distilabel

SPIN Paper:
Self-Play Fine-Tuning Converts Weak Language Models to Strong Language Models (2401.01335)

SPIN DIBT Collection with datasets and models:
argilla/dibt-prompt-collective-spin-65ef59062518776024395fc3

Repo:
https://github.com/argilla-io/distilabel-spin-dibt

Joint work with the amazing DIBT community 👇
@aashish1904 , @flozi00 , @sayhan , @munish0838 , @0-hero , @dvilasuero , @eren23 , @davanstrien , @ahnz , @BlackKakapo , @kitano-o , @mmhamdy , @sdiazlor , @Stopwolf , @gabrielmbmb , @tculler91 , @plaguss , @ignacioct , @Hugi-R , @davidberenstein1957 , @Korla , @alvarobartt , @Hugs4Llamas , @Sumandora , @nataliaElv , @jfcalvo , @Averill , @steventrouble , @vasilis , @aeros93 , @kayyshf , @thomasgauthier , @jeromebas , @Ameeeee , @ayoubelmhamdi , @TuringsSolutions , @efels , @Haleyok , @abrazador , @emessy , @Nindaleth , @burtenshaw , @vicgalle , @CortexPE , @casey-martin , @Leire-aguirre-eguiluz , @mrfakename , @Portias600kNeurons , @nathaliepett , @Filippo

liked a Space 10 months ago

Paused

71

♾️

AutoMerger

liked a model 10 months ago

Crystalcareai/GemMoE-Base-Random

Text Generation • Updated Mar 15, 2024 • 15 • 23

updated a model 10 months ago

abdullahalzubaer/NeuralHermes-2.5-Mistral-7B

Text Generation • Updated Mar 13, 2024 • 12 • 1

reacted to akhaliq's post with 👍 10 months ago

Post

Stealing Part of a Production Language Model

Stealing Part of a Production Language Model (2403.06634)

We introduce the first model-stealing attack that extracts precise, nontrivial information from black-box production language models like OpenAI's ChatGPT or Google's PaLM-2. Specifically, our attack recovers the embedding projection layer (up to symmetries) of a transformer model, given typical API access. For under $20 USD, our attack extracts the entire projection matrix of OpenAI's Ada and Babbage language models. We thereby confirm, for the first time, that these black-box models have a hidden dimension of 1024 and 2048, respectively. We also recover the exact hidden dimension size of the gpt-3.5-turbo model, and estimate it would cost under $2,000 in queries to recover the entire projection matrix. We conclude with potential defenses and mitigations, and discuss the implications of possible future work that could extend our attack.

LLäMmlein Chat Preview 🐑

Chatbots

Phi-3

Addition is All You Need for Energy-efficient Language Models