Sourab Mangrulkar

smangrul

AI & ML interests

Machine Learning, Deep Learning, Natural Language Processing, Natural Language Generation, Computer Vision, Reinforcement Learning

Recent Activity

updated a Space about 1 month ago
smangrul/PEFT-Docs-QA-Chatbot

Organizations

Speech Recognition Community Event Version 2, BigScience Data, group2, BigCode, Diffusers Pipelines Library for Stable Diffusion, Social Post Explorers

Posts

Unlocking the Power of Locally Running Llama-3 8B Model Agents with Chat-UI! 🔥🚀✨

I'm thrilled to share my hackathon-style side project:
1. Fine-tuned Llama-3 8B for function calling using PEFT QLoRA, since the instruct Llama-3 model doesn't support it out of the box (see the sketch after this list). The Colab notebook is here: https://lnkd.in/ggJMzqh2. 🛠️
2. The fine-tuned model, along with the 4-bit quants, is here: https://lnkd.in/gNpFKY6V
3. Cloned the Hugging Face Chat-UI repo (https://lnkd.in/gKBKuUBQ) and made it compatible with function calling by building upon the PR https://lnkd.in/gnqFuAd4 for my model and the local inference use case with Ollama. This was a steep learning curve; I stayed awake the whole night to get it working. 💪🏽
4. On top of the above, I used SerpAPI for web browsing and the MongoDB Atlas free tier for persisting conversations and assistant configs. 🔎
5. More work is required on switching between using tools and responding directly; this is where I see the model break. 🧐
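For reference, here is a minimal sketch of the QLoRA setup used for the fine-tuning step; the model id, target modules, and hyperparameters are illustrative rather than the exact values from the notebook.

```python
import torch
from transformers import AutoModelForCausalLM, AutoTokenizer, BitsAndBytesConfig
from peft import LoraConfig, get_peft_model

model_id = "meta-llama/Meta-Llama-3-8B-Instruct"  # assumed base model

# Load the base model in 4-bit NF4 (the "Q" in QLoRA).
bnb_config = BitsAndBytesConfig(
    load_in_4bit=True,
    bnb_4bit_quant_type="nf4",
    bnb_4bit_compute_dtype=torch.bfloat16,
)
tokenizer = AutoTokenizer.from_pretrained(model_id)
model = AutoModelForCausalLM.from_pretrained(model_id, quantization_config=bnb_config)

# Attach LoRA adapters; only these small low-rank matrices are trained.
lora_config = LoraConfig(
    r=16,
    lora_alpha=32,
    target_modules=["q_proj", "k_proj", "v_proj", "o_proj"],
    task_type="CAUSAL_LM",
)
model = get_peft_model(model, lora_config)
model.print_trainable_parameters()
# Train with your favourite trainer (e.g. trl's SFTTrainer) on a function-calling
# dataset formatted as chat turns that include the tool schemas.
```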

How cool is it that we are approaching a ChatGPT-like experience while using a locally hosted agent model running on your laptop! 💻
🤗 PEFT v0.10.0 release! 🔥🚀✨

Some highlights 📝:
1. FSDP+QLoRA and DeepSpeed Stage-3+QLoRA
2. Layer expansion + LoRA
3. DoRA support for Conv2D layers and quantized bitsandbytes layers
4. New LoftQ utility
5. Batched inference for mixed LoRA adapters.

The Answer.AI team, in collaboration with bitsandbytes and Hugging Face 🤗, open-sourced code enabling the use of FSDP+QLoRA and explained the whole process in their insightful blog post: https://lnkd.in/g6jgfXyv. This is now integrated into the Hugging Face ecosystem.

For an end-to-end example of FSDP+QLoRA, please refer to https://lnkd.in/gT3yY-Rx.

For an end-to-end example of DeepSpeed Stage-3+QLoRA, please refer to https://lnkd.in/gkt-xZRE.
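The model-loading side of both setups looks roughly like the sketch below; the model id is illustrative, and the accelerate/FSDP or DeepSpeed launch config is assumed and not shown. The key detail is bnb_4bit_quant_storage, which lets the sharding framework treat the 4-bit weights as ordinary bf16 tensors.

```python
import torch
from transformers import AutoModelForCausalLM, BitsAndBytesConfig
from peft import LoraConfig, get_peft_model

bnb_config = BitsAndBytesConfig(
    load_in_4bit=True,
    bnb_4bit_quant_type="nf4",
    bnb_4bit_compute_dtype=torch.bfloat16,
    bnb_4bit_quant_storage=torch.bfloat16,  # makes the 4-bit weights shardable
)
model = AutoModelForCausalLM.from_pretrained(
    "meta-llama/Meta-Llama-3-8B",  # illustrative model id
    quantization_config=bnb_config,
    torch_dtype=torch.bfloat16,
)
model = get_peft_model(model, LoraConfig(r=16, lora_alpha=32, task_type="CAUSAL_LM"))
# Launch training with `accelerate launch --config_file fsdp_config.yaml train.py ...`
# or the equivalent DeepSpeed ZeRO-3 config; see the linked end-to-end examples.
```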

With the PR https://lnkd.in/g5F348MN, these changes are now upstreamed into https://lnkd.in/g5_MxYtY, thanks to Wing Lian! 🚀

Kudos to the Answer.AI team, Titus von Köller, Younes Belkada, Benjamin Bossan, and Zachary Mueller for all the help, without which this wouldn't have been possible. 🤗

For efficient depthwise layer expansion, akin to the passthrough method of mergekit but without using additional memory, and for attaching LoRAs to the expanded model, refer to the details below! 🔥 https://lnkd.in/ge95ztjA
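In PEFT this is exposed via the layer_replication option on LoraConfig; the sketch below is only illustrative (model id and layer ranges are made up for a 32-layer model).

```python
from transformers import AutoModelForCausalLM
from peft import LoraConfig, get_peft_model

model = AutoModelForCausalLM.from_pretrained("meta-llama/Meta-Llama-3-8B")  # assumed

config = LoraConfig(
    r=8,
    target_modules=["q_proj", "v_proj"],
    task_type="CAUSAL_LM",
    # Each [start, end) range selects a slice of base layers; overlapping ranges
    # replicate layers by reference (no extra memory), akin to mergekit passthrough.
    # LoRA weights are then attached per expanded layer.
    layer_replication=[[0, 16], [8, 24], [16, 32]],
)
model = get_peft_model(model, config)
model.print_trainable_parameters()
```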

DoRA is now supported for Conv2D layers as well as bitsandbytes quantized layers ✨. For more details, please refer to the thread below:
https://lnkd.in/gsJbuWPD
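Enabling it is a one-flag change on LoraConfig; the target module names below are illustrative and depend on the model architecture.

```python
from peft import LoraConfig

dora_config = LoraConfig(
    r=8,
    lora_alpha=16,
    target_modules=["conv1", "q_proj", "v_proj"],  # Conv2D and linear layers alike
    use_dora=True,  # decompose the update into magnitude and direction components
)
# The same config also works on top of a bitsandbytes 4-bit/8-bit quantized base model.
```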

You can now mix different LoRA adapters in a batch during inference, which speeds things up by avoiding running the base model once per adapter, as would be the case with adapter-by-adapter inference at batch_size=1! ⚡️
Details below: https://lnkd.in/gD-pcX_B
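A rough sketch of the usage (model and adapter paths are placeholders): each entry in adapter_names picks the adapter for the corresponding row of the batch, and "__base__" requests the bare base model.

```python
from transformers import AutoModelForCausalLM, AutoTokenizer
from peft import PeftModel

base_id = "meta-llama/Meta-Llama-3-8B"  # assumed base model
tokenizer = AutoTokenizer.from_pretrained(base_id)
tokenizer.pad_token = tokenizer.eos_token
base = AutoModelForCausalLM.from_pretrained(base_id)

# Load two different LoRA adapters onto the same base model.
model = PeftModel.from_pretrained(base, "path/to/adapter_a", adapter_name="adapter_a")
model.load_adapter("path/to/adapter_b", adapter_name="adapter_b")

prompts = ["Translate to French: hello", "Summarize: PEFT is great", "What is 2+2?"]
inputs = tokenizer(prompts, return_tensors="pt", padding=True)
outputs = model.generate(
    **inputs,
    adapter_names=["adapter_a", "adapter_b", "__base__"],  # one adapter per row
    max_new_tokens=32,
)
print(tokenizer.batch_decode(outputs, skip_special_tokens=True))
```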

LoftQ reduces quantization error by appropriately initializing the LoRA adapter weights. Normally, this is a two-step process. Benjamin Bossan added a new utility, replace_lora_weights_loftq, which applies LoftQ on the fly to a bitsandbytes-quantized model.
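A minimal sketch of the on-the-fly usage, assuming a bnb 4-bit base model whose checkpoint is available in safetensors format (the model id is illustrative).

```python
import torch
from transformers import AutoModelForCausalLM, BitsAndBytesConfig
from peft import LoraConfig, get_peft_model
from peft.utils.loftq_utils import replace_lora_weights_loftq

bnb_config = BitsAndBytesConfig(load_in_4bit=True, bnb_4bit_compute_dtype=torch.bfloat16)
base = AutoModelForCausalLM.from_pretrained(
    "meta-llama/Meta-Llama-3-8B",  # illustrative model id
    quantization_config=bnb_config,
)
peft_model = get_peft_model(base, LoraConfig(task_type="CAUSAL_LM"))

# Re-initialize the LoRA weights so that they compensate for the quantization
# error of the 4-bit base weights, without the usual separate LoftQ step.
replace_lora_weights_loftq(peft_model)
```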

For more details, refer to the release notes: https://lnkd.in/gg7-AmHA 📝

As always, make sure the losses go down, and enjoy watching your model train!