Nishith Jain
KingNish
AI & ML interests
AI is fun actually.
Busy till June 2025.
Recent Activity
liked
a model
about 19 hours ago
onnx-community/Kokoro-82M-ONNX
liked
a Space
about 24 hours ago
open-r1/open-r1-eval-leaderboard
liked
a Space
1 day ago
webml-community/kokoro-webgpu
Organizations
KingNish's activity
![](https://cdn-avatars.huggingface.co/v1/production/uploads/6612aedf09f16e7347dfa7e1/bPYjBXCedY_1fSIPjoBTY.jpeg)
reacted to
retronic's
post with 🔥
2 days ago
reacted to
hexgrad's
post
2 days ago
Post
5383
I wrote an article about G2P: https://hf.co/blog/hexgrad/g2p
G2P is an underrated piece of small TTS models, like offensive linemen who do a bunch of work and get no credit.
Instead of relying on explicit G2P, larger speech models implicitly learn this task by eating many thousands of hours of audio data. They often use a 500M+ parameter LLM at the front to predict latent audio tokens over a learned codebook, then decode these tokens into audio.
Kokoro instead relies on G2P preprocessing, is 82M parameters, and thus needs less audio to learn. Because of this, we can cherrypick high fidelity audio for training data, and deliver solid speech for those voices. In turn, this excellent audio quality & lack of background noise helps explain why Kokoro is very competitive in single-voice TTS Arenas.
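The dictionary-lookup core of explicit G2P can be sketched in a few lines (a toy illustration, not Kokoro's actual front end; the lexicon entries and the out-of-vocabulary fallback are made up):

```python
# Toy grapheme-to-phoneme (G2P) sketch: a small TTS front end maps text to
# phonemes before the acoustic model sees it. Real systems combine a large
# lexicon with letter-to-sound rules; this dictionary-only version just
# illustrates the idea. Phoneme strings below are ARPABET-style stand-ins.
LEXICON = {
    "the": "DH AH",
    "cat": "K AE T",
    "sat": "S AE T",
}

def g2p(text: str) -> list[str]:
    """Map each word to phonemes, spelling out unknown words as a crude fallback."""
    phonemes = []
    for word in text.lower().split():
        phonemes.append(LEXICON.get(word, " ".join(word)))  # OOV: letter-by-letter
    return phonemes

print(g2p("The cat sat"))
```

An 82M-parameter model never has to learn this mapping from audio, which is part of why it can get away with far less training data.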
reacted to
sometimesanotion's
post
6 days ago
Post
2494
I'm just saving today's 14B parameter chart, because big things are about to hit. Lamarck v0.7 has been surpassed by at least two models I know of, and in ways that promise good things to come for the whole scene. I am taking my time to enjoy the progress, and Lamarck v0.8 will come when it's clearly keeping up and keeping its flavor.
There is no one best model for everyone, regardless of these rankings. I aim to make Lamarck good at coding, translating, and rigorously critiquing rhetoric and logic. Always check out the authors' notes on models to see if their intent is close to your use case!
reacted to
chansung's
post
6 days ago
Post
2721
Simple Paper Review #5
I briefly reviewed the paper "SFT Memorizes, RL Generalizes," which compares SFT and RL for post-training of LLMs/VLMs. The work comes from HKU, UC Berkeley, Google DeepMind, and New York University.
The conclusion suggests SFT excels in memorization, while RL is better for generalization. However, since LLM/VLM should benefit humans beyond just generalization, a mix of SFT and RL is advisable. Typically, some SFT is followed by RL to understand prompt formats and enhance generalization through trial and error.
The study focused on one model, Llama-3.2-Vision-11B, using environments like General Points for arithmetic reasoning and V-IRL for spatial reasoning. Training data was used for both SFT and RL, with evaluations on in-distribution and out-of-distribution data to assess memorization and generalization.
I want to apply RL extensively, but it requires building a similar simulation environment. For domain-specific models, significant investment in creating a "playground" for the model is crucial, as the effort will directly influence the outcomes.
https://arxiv.org/abs/2501.17161
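The memorize-vs-generalize contrast can be illustrated on a toy bandit (nothing like the paper's General Points or V-IRL environments; the rewards, demonstrations, and update rule here are invented):

```python
# Minimal contrast between the two post-training signals on a 3-armed bandit.
# "SFT" imitates demonstrations (including their mistakes); "RL" discovers the
# best action by trial and error against a reward signal.
import random

random.seed(0)
REWARD = {"a": 0.1, "b": 0.9, "c": 0.3}   # true (hidden) reward per action
DEMOS = ["a", "a", "b"]                    # demonstrations, partly suboptimal

# "SFT": the policy is the demonstration distribution, mistakes and all.
sft_policy = {act: DEMOS.count(act) / len(DEMOS) for act in REWARD}

# "RL": update preferences from sampled reward (a crude policy-gradient flavor).
prefs = {act: 0.0 for act in REWARD}
for _ in range(2000):
    act = random.choices(list(prefs), weights=[2.718 ** p for p in prefs.values()])[0]
    prefs[act] += 0.1 * (REWARD[act] - 0.4)   # reinforce above-baseline actions

rl_best = max(prefs, key=prefs.get)
print(sft_policy, rl_best)
```

The SFT policy stays anchored to the demo distribution (mostly "a"), while the RL preferences drift to the truly best arm, which is the paper's point in miniature.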
reacted to
singhsidhukuldeep's
post with 🔥
6 days ago
Post
3538
Exciting Research Alert: Revolutionizing Complex Information Retrieval!
A groundbreaking paper from researchers at MIT, AWS AI, and UPenn introduces ARM (Alignment-Oriented LLM-based Retrieval Method), a novel approach to tackle complex information retrieval challenges.
>> Key Innovations
Information Alignment
The method first decomposes queries into keywords and aligns them with available data using both BM25 and embedding similarity, ensuring comprehensive coverage of information needs.
Structure Alignment
ARM employs a sophisticated mixed-integer programming solver to identify connections between data objects, exploring relationships beyond simple semantic matching.
Self-Verification
The system includes a unique self-verification mechanism where the LLM evaluates and aggregates results from multiple retrieval paths, ensuring accuracy and completeness.
>> Performance Highlights
The results are impressive:
- Outperforms standard RAG by up to 5.2 points in execution accuracy on Bird dataset
- Achieves 19.3 points higher F1 scores compared to existing approaches on OTT-QA
- Reduces the number of required LLM calls while maintaining superior retrieval quality
>> Technical Implementation
The system uses a three-step process:
1. N-gram indexing and embedding computation for all data objects
2. Constrained beam decoding for information alignment
3. Mixed-integer programming optimization for structure exploration
This research represents a significant step forward in making complex information retrieval more efficient and accurate. The team's work demonstrates how combining traditional optimization techniques with modern LLM capabilities can solve challenging retrieval problems.
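The first alignment step can be sketched as hybrid lexical-plus-dense scoring (the scoring functions below are toy stand-ins, not ARM's actual BM25 or embedding components, and the corpus is invented):

```python
# Loose sketch of ARM-style information alignment: decompose a query into
# keywords, then score candidate data objects with both a lexical signal
# (stand-in for BM25) and a dense one (stand-in for embedding similarity).
def lexical_score(keywords: set[str], doc: str) -> float:
    tokens = set(doc.lower().split())
    return len(keywords & tokens) / max(len(keywords), 1)

def dense_score(keywords: set[str], doc: str) -> float:
    # Pretend "embedding similarity": character-trigram Jaccard overlap.
    grams = lambda s: {s[i:i + 3] for i in range(len(s) - 2)}
    q, d = grams(" ".join(sorted(keywords))), grams(doc.lower())
    return len(q & d) / max(len(q | d), 1)

def align(query: str, corpus: list[str]) -> str:
    keywords = set(query.lower().split())
    return max(corpus, key=lambda doc: lexical_score(keywords, doc) + dense_score(keywords, doc))

corpus = ["quarterly revenue table for acme", "employee handbook", "acme revenue 2024 figures"]
print(align("acme revenue", corpus))
```

ARM's structure-alignment and self-verification stages then operate on the candidates this step surfaces.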
reacted to
mkurman's
post
7 days ago
Post
2006
Blurred-Thoughts Supervised Fine-Tuning (BT-SFT) 🤗
Can we teach a model to think completely on its own without reinforcement learning? Actually, yes.
We can do straightforward supervised fine-tuning using a relatively simple trick: blurring a part of CoT thoughts. But why is this effective?
We observed that various models differ in their thinking processes, and fine-tuning one model on another model's thoughts (CoT) can sometimes be inefficient, often resulting in the model simply memorizing reasoning rather than learning how to actually think.
I discovered that this process can still be efficient if we clearly indicate when the model should start and stop thinking and uncover only a part of CoT and the expected answer, blurring the other part of CoT. This approach allows the model to learn only a portion of the thought process while still arriving at an expected answer.
To demonstrate this, you can watch my experimental BT-SFT on meditsolutions/Llama-3.2-SUN-2.5B-chat model, which was fine-tuned on 151 million tokens from the Magpie-Align/Magpie-Reasoning-V2-250K-CoT-Deepseek-R1-Llama-70B dataset.
Enjoy!
PS. If you were curious enough to read this, leave me a comment. It's always nice to chat with open-minded and intelligent ppl.
reacted to
singhsidhukuldeep's
post with 🤗
7 days ago
Post
2175
Excited to share groundbreaking research from @Baidu_Inc on enterprise information search! The team has developed EICopilot, a revolutionary agent-based solution that transforms how we explore enterprise data in large-scale knowledge graphs.
>> Technical Innovation
EICopilot leverages Large Language Models to interpret natural language queries and automatically generates Gremlin scripts for enterprise data exploration. The system processes hundreds of millions of nodes and billions of edges in real-time, handling complex enterprise relationships with remarkable precision.
Key Technical Components:
- Advanced data pre-processing pipeline that builds vector databases of representative queries
- Novel query masking strategy that significantly improves intent recognition
- Comprehensive reasoning pipeline combining Chain-of-Thought with In-context learning
- Named Entity Recognition and Natural Language Processing Customization for precise entity matching
- Schema Linking Module for efficient graph database query generation
>> Performance Metrics
The results are impressive - EICopilot achieves a syntax error rate as low as 10% and execution correctness up to 82.14%. The system handles 5000+ daily active users, demonstrating its robustness in real-world applications.
>> Implementation Details
The system uses Apache TinkerPop for graph database construction and employs sophisticated disambiguation processes, including anaphora resolution and entity retrieval. The architecture includes both offline and online phases, with continuous learning from user interactions to improve query accuracy.
Kudos to the research team from Baidu Inc., South China University of Technology, and other collaborating institutions for this significant advancement in enterprise information retrieval technology.
reacted to
Abhaykoul's
post with ❤️
7 days ago
Post
3737
🔥 THE WAIT IS OVER... HAI-SER IS HERE! 🔥
Yo fam, this ain't just another AI drop. This is the FUTURE of emotional intelligence!
Introducing HAI-SER, powered by Structured Emotional Reasoning (SER), the next-level AI that doesn't just understand your words: it feels you, analyzes your emotions, and helps you navigate life's toughest moments. 💡
🔥 What makes HAI-SER a game-changer?
🔹 Emotional Vibe Check - Gets the mood, energy, and what's really going on 🎭
🔹 Mind-State Analysis - Breaks down your thoughts, beliefs, and patterns 🤯
🔹 Root Cause Deep-Dive - Unpacks the WHY behind your emotions 💡
🔹 Impact Check - Sees how it's affecting your life and mental health
🔹 Safety Check - Prioritizes your well-being and crisis management 🚨
🔹 Healing Game Plan - Custom strategies to help you bounce back 💪
🔹 Growth Potential - Turns struggles into opportunities for self-improvement
🔹 How to Approach - Teaches you and others how to communicate and heal 🤝
🔹 Personalized Response - Not just generic advice: real talk, tailored to YOU 💯
No more robotic AI responses. No more surface-level advice. HAI-SER gets deep, analyzing emotions with precision and giving real, actionable support.
This ain't just AI: this is your digital therapist, life coach, and hype squad all in one. Whether it's mental health, career struggles, relationships, or personal growth, HAI-SER has your back.
The future of emotionally intelligent AI is HERE.
Are you ready? 🔥💯
HelpingAI/HAI-SER
reacted to
fdaudens's
post with 🔥
9 days ago
Post
3267
Kokoro TTS just hit v1.0!
Small but mighty: 82M parameters, runs locally, speaks multiple languages. The best part? It's Apache 2.0 licensed!
This could unlock so many possibilities ✨
Check it out: hexgrad/Kokoro-82M
reacted to
not-lain's
post with 🔥
10 days ago
Post
3056
I have just released a new blog post about KV caching and its role in inference speedup.
https://huggingface.co/blog/not-lain/kv-caching/
Some takeaways:
reacted to
lewtun's
post with 🤗🔥
13 days ago
Post
9962
We are reproducing the full DeepSeek R1 data and training pipeline so everybody can use their recipe. Instead of doing it in secret we can do it together in the open!
🧪 Step 1: replicate the R1-Distill models by distilling a high-quality reasoning corpus from DeepSeek-R1.
🧠 Step 2: replicate the pure RL pipeline that DeepSeek used to create R1-Zero. This will involve curating new, large-scale datasets for math, reasoning, and code.
🔥 Step 3: show we can go from base model -> SFT -> RL via multi-stage training.
Follow along: https://github.com/huggingface/open-r1
reacted to
clem's
post with 🤗❤️🔥
13 days ago
Post
6956
AI is not a zero-sum game. Open-source AI is the tide that lifts all boats!
reacted to
fdaudens's
post with 🔥❤️
13 days ago
Post
8218
Yes, DeepSeek R1's release is impressive. But the real story is what happened in just 7 days after:
- Original release: 8 models, 540K downloads. Just the beginning...
- The community turned those open-weight models into 550+ NEW models on Hugging Face. Total downloads? 2.5M, nearly 5X the originals.
The reason? DeepSeek models are open-weight, letting anyone build on top of them. Interesting to note that the community focused on quantized versions for better efficiency & accessibility. They want models that use less memory, run faster, and are more energy-efficient.
When you empower builders, innovation explodes. For everyone.
The most popular community model? @bartowski's DeepSeek-R1-Distill-Qwen-32B-GGUF version, with 1M downloads alone.
reacted to
sagar007's
post with 🔥
13 days ago
Post
3489
Just built a Perplexity-inspired AI search assistant using Gradio, DeepSeek, and DuckDuckGo!
Ask it anything, and it'll:
Scour the web for answers
Cite sources like a pro
Even talk back with TTS (thanks, Kokoro!)
Check it out: sagar007/DeepSeekR1_Search