Shyam Sunder Kumar

theainerd

AI & ML interests

Natural Language Processing

Recent Activity

reacted to Kseniase's post with 🔥 3 days ago
9 types of "Chain-of-..." approaches: Chain-of-Thought (CoT) prompting enhances reasoning in AI models by breaking down complex problems into step-by-step logical sequences. It continues proving its effectiveness, especially in top-performing reasoning models. However, there are other similar methods, that expand CoT and can be used for different purposes. Here are 9 of them: 1. Chain-of-Action-Thought (COAT) -> https://huggingface.co/papers/2502.02508 Helps model decide when to keep thinking, double-check their work, or try a different approach, using special guiding tokens. 2. Chain of Draft (CoD) -> https://huggingface.co/papers/2502.18600 It helps model generate short but meaningful reasoning steps, cutting costs and making processing faster 3. Chain-of-Agents -> https://huggingface.co/papers/2406.02818 Uses multi-agent collaboration: Worker agents process text parts in a structured chain, and manager agent summarizes the results 4. Chain-of-RAG ->https://huggingface.co/papers/2501.14342 Creates retrieval chains, instead of retrieving all info at once. It can dynamically adjust its search process and its parameters like step number 5. Chain-of-Shot Prompting (CoS) -> https://huggingface.co/papers/2502.06428 Helps models pick frames crucial for understanding a video, using a binary video summary and video co-reasoning module. 6. Chain of Hindsight (CoH) -> https://huggingface.co/papers/2302.02676 Converts all feedback into sequences to fine-tune the model and refine outputs 7. Chain-of-Note (CoN) -> https://huggingface.co/papers/2311.09210 Generates sequential reading notes for each retrieved document to assess relevance before integrating info into the final answer 8. Chain of Diagnosis (CoD) -> https://huggingface.co/papers/2407.13301 Transforms the diagnostic process into a diagnostic chain 9. Chain(s)-of-Knowledge -> https://www.turingpost.com/p/cok Enhance LLMs by dynamically pulling in external knowledge to improve accuracy and reduce errors
View all activity

Organizations

Neuropark · Speech Recognition Community Event Version 2 · Open-Source AI Meetup · Social Post Explorers · Hugging Face Discord Community

theainerd's activity

reacted to Kseniase's post with 🔥 3 days ago
9 types of "Chain-of-..." approaches:

Chain-of-Thought (CoT) prompting enhances reasoning in AI models by breaking down complex problems into step-by-step logical sequences. It continues to prove its effectiveness, especially in top-performing reasoning models. However, there are other similar methods that expand CoT and can be used for different purposes. Here are 9 of them (a minimal CoT prompt sketch follows the list):

1. Chain-of-Action-Thought (COAT) -> Satori: Reinforcement Learning with Chain-of-Action-Thought Enhances LLM Reasoning via Autoregressive Search (2502.02508)
Helps models decide when to keep thinking, double-check their work, or try a different approach, using special guiding tokens.

2. Chain of Draft (CoD) -> Chain of Draft: Thinking Faster by Writing Less (2502.18600)
Helps models generate short but meaningful reasoning steps, cutting costs and making processing faster.

3. Chain-of-Agents -> Chain of Agents: Large Language Models Collaborating on Long-Context Tasks (2406.02818)
Uses multi-agent collaboration: worker agents process chunks of text in a structured chain, and a manager agent summarizes the results.

4. Chain-of-RAG -> https://huggingface.co/papers/2501.14342
Creates retrieval chains instead of retrieving all info at once, and can dynamically adjust its search process and parameters such as the number of steps.

5. Chain-of-Shot Prompting (CoS) -> CoS: Chain-of-Shot Prompting for Long Video Understanding (2502.06428)
Helps models pick the frames crucial for understanding a video, using a binary video summary and a video co-reasoning module.

6. Chain of Hindsight (CoH) -> Chain of Hindsight Aligns Language Models with Feedback (2302.02676)
Converts all feedback into sequences to fine-tune the model and refine outputs

7. Chain-of-Note (CoN) -> Chain-of-Note: Enhancing Robustness in Retrieval-Augmented Language Models (2311.09210)
Generates sequential reading notes for each retrieved document to assess relevance before integrating info into the final answer

8. Chain of Diagnosis (CoD) -> CoD, Towards an Interpretable Medical Agent using Chain of Diagnosis (2407.13301)
Transforms the medical diagnostic process into an explicit diagnostic chain of reasoning steps.

9. Chain(s)-of-Knowledge -> https://www.turingpost.com/p/cok
Enhances LLMs by dynamically pulling in external knowledge to improve accuracy and reduce errors.
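As promised above, a minimal sketch of the base CoT technique this list builds on, assuming huggingface_hub's InferenceClient; the model id and question are illustrative, not from the post:

# Plain Chain-of-Thought prompting: the appended instruction elicits
# step-by-step reasoning before the final answer.
from huggingface_hub import InferenceClient

client = InferenceClient("meta-llama/Llama-3.1-8B-Instruct")  # assumed model choice

question = "A train travels 60 km in 45 minutes. What is its average speed in km/h?"
messages = [{"role": "user", "content": question + " Let's think step by step."}]

response = client.chat_completion(messages, max_tokens=512)
print(response.choices[0].message.content)  # worked steps, then the answer (80 km/h)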
upvoted an article 7 days ago

SigLIP 2: A better multilingual vision language encoder

reacted to AdinaY's post with 🔥 7 days ago
Wan2.1 🔥📹 new OPEN video model by Alibaba's Wan team!

Model: Wan-AI/Wan2.1-T2V-14B
Demo: Wan-AI/Wan2.1

✨Apache 2.0
✨8.19GB VRAM, runs on most GPUs
✨Multi-Tasking: T2V, I2V, Video Editing, T2I, V2A
✨Text Generation: Supports Chinese & English
✨Powerful Video VAE: Encode/decode 1080P w/ temporal precision
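A hypothetical quick-start for the T2V checkpoint above, assuming it loads through diffusers' generic DiffusionPipeline entry point; the pipeline class, call arguments, and output layout are assumptions, so treat this as the shape of the workflow rather than verified usage:

# Hedged sketch: pipeline resolution and generation arguments are assumptions.
import torch
from diffusers import DiffusionPipeline
from diffusers.utils import export_to_video

pipe = DiffusionPipeline.from_pretrained("Wan-AI/Wan2.1-T2V-14B", torch_dtype=torch.bfloat16)
pipe.enable_model_cpu_offload()  # offload idle submodules to CPU to cut peak VRAM

frames = pipe(prompt="a red panda surfing a wave at sunset", num_frames=81).frames[0]
export_to_video(frames, "panda_surfing.mp4", fps=16)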
reacted to burtenshaw's post with 🔥 8 days ago
Now the Hugging Face agents course is getting real, with frameworks like smolagents, LlamaIndex, and LangChain!

🔗 Follow the org for updates https://huggingface.co/agents-course

This week we are releasing the first framework unit in the course and it’s on smolagents. This is what the unit covers:

- why should you use smolagents vs another library?
- how to build agents that use code (see the sketch after this list)
- build multi-agent systems
- use vision language models for browser use
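As a taste of the "agents that use code" idea, a minimal sketch following the pattern in the smolagents README; the query is illustrative, and HfApiModel defaults to a hosted model on the HF Inference API:

# A CodeAgent writes and executes Python code to answer the query,
# calling its tools (here, web search) from that code as needed.
from smolagents import CodeAgent, DuckDuckGoSearchTool, HfApiModel

agent = CodeAgent(tools=[DuckDuckGoSearchTool()], model=HfApiModel())
agent.run("How long would it take a cheetah at top speed to cross the Golden Gate Bridge?")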

The team has been working flat out on this for a few weeks. Led by @sergiopaniego and supported by smolagents author @m-ric.
reacted to stefan-it's post with 👍 8 days ago
She arrived 😍

[Expect more models soon...]
reacted to cogwheelhead's post with 👍 13 days ago
My team and I have performed an in-depth investigation comparing o1 to R1 (and other reasoning models)

Link: https://toloka.ai/blog/r1-is-not-on-par-with-o1-and-the-difference-is-qualitative-not-quantitative

It started with us evaluating them on our own university-math benchmarks: U-MATH for problem-solving and μ-MATH for judging solution correctness (see the HF leaderboard: toloka/u-math-leaderboard)

tl;dr: R1 sure is amazing, but what we find is that it lags behind in novelty adaptation and reliability:
* performance drops when updating benchmarks with fresh unseen tasks (e.g. AIME 2024 -> 2025)
* R1-o1 gap widens when evaluating niche subdomains (e.g. university-specific math instead of the more common Olympiad-style contests)
* same with going into altogether unconventional domains (e.g. chess) or skills (e.g. judgment instead of problem-solving)
* R1 also runs into failure modes way more often (e.g. making illegal chess moves or falling into endless generation loops)

Our point here is not to bash DeepSeek: they've done exceptional work, R1 is a game-changer, and we have no intention to downplay that. R1's release is a perfect opportunity to study where all these models differ and to understand how to move forward from here
reacted to dreamerdeo's post with 🚀 14 days ago
🚀 Excited to share our technical report on the Southeast Asian multilingual model Sailor2 and its latest updates!

Our 49-page report details Sailor2's development journey, including multilingual data cleaning, small model data mixture simulations, multi-stage continual pre-training, multi-stage post-training, and multi-cultural multi-lingual evaluations. Sailor2 aims to streamline the multilingual model pre-training process for the community.

🧭 We highlight Sailor2's impressive performance in low-resource language translation scenarios and its cultural understanding advantages in Southeast Asia, promoting practical applications for regional languages.

Model updates include: 
💡 More precise outputs: Reduced redundancy in model outputs through refined post-training data and optimization techniques. 
🌈 Handling longer texts: Expanded to handle up to 128K context length in Southeast Asian languages through long-text training. 
⚡️ Faster inference: Achieved 2.5x faster inference with speculative decoding (see the sketch after this list). 
🌪️ More model sizes: Introduced new sizes of 3B and 14B through model pruning.
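A hedged sketch of the speculative decoding mentioned above, using transformers' assisted generation (the assistant_model argument to generate()); the 1B draft-model id is an assumption, and the draft must share the target model's tokenizer:

# Speculative (assisted) decoding: a small draft model proposes tokens,
# and the large target model verifies them, trading extra memory for speed.
import torch
from transformers import AutoModelForCausalLM, AutoTokenizer

tok = AutoTokenizer.from_pretrained("sail/Sailor2-20B-Chat")
target = AutoModelForCausalLM.from_pretrained(
    "sail/Sailor2-20B-Chat", torch_dtype=torch.bfloat16, device_map="auto")
draft = AutoModelForCausalLM.from_pretrained(  # assumed smaller Sailor2 checkpoint
    "sail/Sailor2-1B-Chat", torch_dtype=torch.bfloat16, device_map="auto")

inputs = tok("Terjemahkan ke bahasa Inggris: Selamat pagi!", return_tensors="pt").to(target.device)
out = target.generate(**inputs, assistant_model=draft, max_new_tokens=64)
print(tok.decode(out[0], skip_special_tokens=True))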

🌟 All models are Apache-licensed for commercial use; development tools (code, resources) are open-source.

📚 Technical report: Sailor2: Sailing in South-East Asia with Inclusive Multilingual LLMs (2502.12982) 
🤖️ Models: sail/sailor2-language-models-674d7c9e6b4dbbd9a869906b 
💬 Demo: sail/Sailor2-20B-Chat 
📣 Sailor2 community: https://huggingface.co/sailor2