Jesse

jessepisel

AI & ML interests

computer vision, generative AI, agentic

jessepisel's activity

upvoted a paper 1 day ago

Beyond RAG: Task-Aware KV Cache Compression for Comprehensive Knowledge Reasoning

Paper • 2503.04973 • Published 7 days ago • 18

upvoted 2 articles 2 days ago

Open R1: Update #3

Article • and 9 others • 2 days ago • 197

Open R1: Update #2

Article • and 6 others • Feb 10 • 202

reacted to as-cle-bert's post with 👍 6 days ago

Post

2627

I just released a fully automated evaluation framework for your RAG applications!📈

GitHub 👉 https://github.com/AstraBert/diRAGnosis
PyPI 👉 https://pypi.org/project/diragnosis/

It's called diRAGnosis and is a lightweight framework that helps you diagnose the performance of LLMs and retrieval models in RAG applications.

You can launch it as an application locally (it's Docker-ready!🐋) or, if you want more flexibility, you can integrate it into your code as a Python package📦

The workflow is simple:
🧠 You choose your favorite LLM provider and model (supported, for now, are Mistral AI, Groq, Anthropic, OpenAI and Cohere)
🧠 You pick the embedding model provider and the embedding model you prefer (supported, for now, are Mistral AI, Hugging Face, Cohere and OpenAI)
📄 You prepare and provide your documents
⚙️ Documents are ingested into a Qdrant vector database and transformed into a synthetic question dataset with the help of LlamaIndex
📊 The LLM is evaluated for the faithfulness and relevancy of its retrieval-augmented answer to the questions
📊 The embedding model is evaluated for hit rate and mean reciprocal rank (MRR) of the retrieved documents
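
For readers unfamiliar with the two retrieval metrics named in the last step, here is a minimal sketch of their standard definitions. This is not diRAGnosis's own code: the function names `hit_rate` and `mean_reciprocal_rank` and the toy document IDs are hypothetical, and it assumes each question has exactly one expected source document.

```python
from typing import Sequence


def hit_rate(retrieved: Sequence[Sequence[str]], expected: Sequence[str]) -> float:
    """Fraction of questions whose expected document appears in the retrieved list."""
    hits = sum(1 for docs, gold in zip(retrieved, expected) if gold in docs)
    return hits / len(expected)


def mean_reciprocal_rank(retrieved: Sequence[Sequence[str]], expected: Sequence[str]) -> float:
    """Average of 1/rank of the expected document (a miss contributes 0)."""
    total = 0.0
    for docs, gold in zip(retrieved, expected):
        if gold in docs:
            # Ranks are 1-based: the first retrieved document has rank 1.
            total += 1.0 / (list(docs).index(gold) + 1)
    return total / len(expected)


# Toy data (hypothetical IDs): three questions, top-3 retrieved documents each.
retrieved = [["d1", "d2", "d3"], ["d4", "d5", "d6"], ["d7", "d8", "d9"]]
expected = ["d1", "d5", "d0"]  # found at rank 1, found at rank 2, not found

print(hit_rate(retrieved, expected))
print(mean_reciprocal_rank(retrieved, expected))
```

In this toy case two of three questions are hits, and the reciprocal ranks are 1, 1/2 and 0, so a retriever that finds the right document but ranks it lower is penalized by MRR while hit rate stays the same.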

And the cool thing is that all of this is intuitive and completely automated: you plug it in, and it works!🔌⚡

Even cooler? This is all built on top of LlamaIndex and its integrations: no need for tons of dependencies or fancy workarounds🦙
And if you're a UI lover, Gradio and FastAPI are there to provide you with a seamless backend-to-frontend experience🕶️

So now it's your turn: you can either get diRAGnosis from GitHub 👉 https://github.com/AstraBert/diRAGnosis
or just run a quick and painless:

uv pip install diragnosis

to get the package installed (lightning-fast) in your environment🏃‍♀️

Have fun and feel free to leave feedback and feature or integration requests in GitHub issues✨

upvoted a paper 7 days ago

LLM as a Broken Telephone: Iterative Generation Distorts Information

Paper • 2502.20258 • Published 15 days ago • 21

updated a model 7 days ago

thinkonward/challenges

Updated 8 days ago

liked a model 8 days ago

amd/Instella-3B-Instruct

Text Generation • Updated 7 days ago • 1.18k • 34

updated a model 8 days ago

thinkonward/denoizer

Updated 8 days ago • 10

updated 2 models 9 days ago

thinkonward/section-seeker-base-16

Updated 9 days ago

thinkonward/section-seeker-large-16

Updated 9 days ago

published a model 9 days ago

thinkonward/denoizer

Updated 8 days ago • 10

liked a model 23 days ago

perplexity-ai/r1-1776

Text Generation • Updated 16 days ago • 55k • 2.12k

liked a Space 29 days ago

AI Energy Score Leaderboard

🌟

Explore energy-efficient AI models by task

reacted to fdaudens's post with ❤️ 29 days ago

Post

2690

⭐️ The AI Energy Score project just launched - this is a game-changer for making informed decisions about AI deployment.

You can now see exactly how much energy your chosen model will consume, with a simple 5-star rating system. Think appliance energy labels, but for AI.

Looking at transcription models on the leaderboard is fascinating: choosing between whisper-tiny and whisper-large-v3 can make a 7x difference in energy use. Real-time data on these tradeoffs changes everything.

166 models already evaluated across 10 different tasks, from text generation to image classification. The whole thing is public and you can submit your own models to test.

Why this matters:
- Teams can pick efficient models that still get the job done
- Developers can optimize for energy use from day one
- Organizations can finally predict their AI environmental impact

If you're building with AI at any scale, definitely worth checking out.

👉 leaderboard: https://lnkd.in/esrSxetj
👉 blog post: https://lnkd.in/eFJvzHi8

Huge work led by @sasha with @bgamazay @yjernite @sarahooker @regisss @meg

1 reply

upvoted a paper about 1 month ago

SmolLM2: When Smol Goes Big -- Data-Centric Training of a Small Language Model

Paper • 2502.02737 • Published Feb 4 • 203

upvoted a collection about 1 month ago

Tulu 3 Models

Collection

All models released with Tulu 3 -- state of the art open post-training recipes. • 11 items • Updated about 14 hours ago • 93

upvoted an article about 1 month ago

Welcome to Inference Providers on the Hub 🔥

Article • Jan 28 • 429

reacted to fdaudens's post with ❤️ about 1 month ago

Post

8809

Yes, DeepSeek R1's release is impressive. But the real story is what happened in just the 7 days after:

- Original release: 8 models, 540K downloads. Just the beginning...

- The community turned those open-weight models into 550+ NEW models on Hugging Face. Total downloads? 2.5M, nearly 5X the originals.

The reason? DeepSeek models are open-weight, letting anyone build on top of them. Interesting to note that the community focused on quantized versions for better efficiency & accessibility. They want models that use less memory, run faster, and are more energy-efficient.

When you empower builders, innovation explodes. For everyone. 🚀

The most popular community model? @bartowski's DeepSeek-R1-Distill-Qwen-32B-GGUF version, with 1M downloads alone.

4 replies

upvoted an article about 1 month ago

Open-R1: a fully open reproduction of DeepSeek-R1

Article • Jan 28 • 803

upvoted a paper 3 months ago

Qwen2.5 Technical Report

Paper • 2412.15115 • Published Dec 19, 2024 • 352