
Mariusz Kurman PRO

mkurman

AI & ML interests

AI Tech Lead | MD

Recent Activity

Organizations

MedIT Solutions, BigScience Biomedical Datasets, SOWA Project

mkurman's activity

New activity in Datou1111/shou_xin 20 days ago

Add generated example

#3 opened 20 days ago by mkurman
reacted to reddgr's post with 👀 26 days ago
1815
Thought it would only make sense to share this here. Lately, one of my favorite activities has been annotating prompts and putting them into datasets (reddgr/tl-test-learn-prompts, reddgr/rq-request-question-prompts, reddgr/nli-chatbot-prompt-categorization), which I then use to classify and select chatbot conversations for my website. It's quite fun to use this widget on lmsys/lmsys-chat-1m, but I also use it on my 2 years of talking to chatbots (soon to be a dataset, but still a lot of web scraping and ETL work left)... This one in the picture was probably one of the first prompts I wrote to an LLM:
posted an update 28 days ago
288
How Do I Contribute (HDIC)

Exciting times to come? We are working on a layer self-esteem technique that scores each layer's contribution to the final prediction. For now, it unlocks a lot of knowledge that is already stored in the weights but that we couldn't force the model to extract through further fine-tuning!
reacted to AdinaY's post with 🔥 28 days ago
1336
HunyuanVideo 📹 The new open video generation model by Tencent!
👉 tencent/HunyuanVideo
zh-ai-community/video-models-666afd86cfa4e4dd1473b64c
✨ 13B parameters: Probably the largest open video model to date
✨ Unified architecture for image & video generation
✨ Powered by advanced features: MLLM Text Encoder, 3D VAE, and Prompt Rewrite
✨ Delivers stunning visuals, diverse motion, and unparalleled stability
🔓 Fully open with code & weights
reacted to singhsidhukuldeep's post with 🤗 28 days ago
1307
Exciting breakthrough in Document AI! Researchers from UNC Chapel Hill and Bloomberg have developed M3DocRAG, a revolutionary framework for multi-modal document understanding.

The innovation lies in its ability to handle complex document scenarios that traditional systems struggle with:
- Process 40,000+ pages across 3,000+ documents
- Answer questions requiring information from multiple pages
- Understand visual elements like charts, tables, and figures
- Support both closed-domain (single document) and open-domain (multiple documents) queries

Under the hood, M3DocRAG operates through three sophisticated stages:

>> Document Embedding:
- Converts PDF pages to RGB images
- Uses ColPali to project both text queries and page images into a shared embedding space
- Creates dense visual embeddings for each page while maintaining visual information integrity

>> Page Retrieval:
- Employs MaxSim scoring to compute relevance between queries and pages
- Implements inverted file indexing (IVFFlat) for efficient search
- Reduces retrieval latency from 20s to under 2s when searching 40K+ pages
- Supports approximate nearest neighbor search via Faiss
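
The MaxSim scoring step listed above can be illustrated with a minimal sketch. It assumes each query and each page is already represented as a list of embedding vectors (one per query token or page patch); the function names and the toy vectors are hypothetical and are not taken from M3DocRAG or ColPali.

```python
def dot(u, v):
    """Dot product of two equal-length vectors."""
    return sum(a * b for a, b in zip(u, v))


def maxsim_score(query_vecs, page_vecs):
    """For each query vector, keep its best-matching page vector, then sum."""
    return sum(max(dot(q, p) for p in page_vecs) for q in query_vecs)


def rank_pages(query_vecs, pages):
    """Return page ids ordered from most to least relevant."""
    scored = [(maxsim_score(query_vecs, vecs), page_id)
              for page_id, vecs in pages.items()]
    return [page_id for _, page_id in sorted(scored, reverse=True)]


# Toy data (hypothetical): two query-token vectors, two pages with two patches each.
query = [[1.0, 0.0], [0.0, 1.0]]
pages = {
    "page_a": [[0.9, 0.1], [0.1, 0.9]],
    "page_b": [[0.5, 0.5], [0.4, 0.4]],
}

print(rank_pages(query, pages))
```

This brute-force version scores every page. An approximate index such as the IVFFlat setup mentioned in the post would narrow the candidate set before exact scoring.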

>> Question Answering:
- Leverages Qwen2-VL 7B as the multi-modal language model
- Processes retrieved pages through a visual encoder
- Generates answers considering both textual and visual context

The results are impressive:
- State-of-the-art performance on MP-DocVQA benchmark
- Superior handling of non-text evidence compared to text-only systems
- Significantly better performance on multi-hop reasoning tasks

This is a game-changer for industries dealing with large document volumes: finance, healthcare, and legal sectors can now process documents more efficiently while preserving crucial visual context.
reacted to cfahlgren1's post with 🔥 28 days ago
1889
You can just ask things 🗣️

"show me messages in the coding category that are in the top 10% of reward model scores"

Download really high quality instructions from the Llama3.1 405B synthetic dataset 🔥

argilla/magpie-ultra-v1.0

replied to their post 28 days ago
That is an excellent question. I was just googling and searching on arXiv. Now I try Elicit, “talk” with papers, and listen to “podcasts” on NotebookLM.

replied to their post 29 days ago
reacted to AdinaY's post with ❤️ 29 days ago
1473
2023 & 2024 Top Downloaded (all time) Open Models on the hub are both from the Chinese community 👀

2023 👉 Bge base by BAAI
BAAI/bge-base-en-v1.5
2024 👉 Qwen 2.5 by Alibaba Qwen
Qwen/Qwen2.5-1.5B-Instruct

Can’t wait to see what incredible models the Chinese community will bring in 2025 🚀

✨ Follow https://huggingface.co/zh-ai-community to get the latest updates from the Chinese community
✨ Explore the 2024 Year in Review huggingface/open-source-ai-year-in-review-2024
reacted to prithivMLmods's post with ❤️ 29 days ago
2633
Milestone for Flux.1 Dev 🔥

💢 The Flux.1 Dev model has crossed 1️⃣0️⃣,0️⃣0️⃣0️⃣ creative public adapters! 🎈
🔗 https://huggingface.co/models?other=base_model:adapter:black-forest-labs/FLUX.1-dev

💢 This includes:
- 266 Finetunes
- 19 Quants
- 4 Merges

💢 Here’s the 10,000th public adapter: 😜
+ strangerzonehf/Flux-3DXL-Partfile-0006

💢 Page:
+ https://huggingface.co/strangerzonehf

💢 Collection:
+ prithivMLmods/flux-lora-collections-66dd5908be2206cfaa8519be
posted an update 29 days ago
429
What AI-enhanced research tools would you recommend for searching and analyzing scientific papers?
  • 5 replies
reacted to nataliaElv's post with 👀 29 days ago
1184
We're so close to reaching 100 languages! Can you help us cover the remaining 200? Check if we're still looking for language leads for your language: nataliaElv/language-leads-dashboard
reacted to AdinaY's post with 👀 29 days ago
reacted to merve's post with 🤗 29 days ago
2883
Last week we were blessed with open-source models! A recap
merve/nov-29-releases-674ccc255a57baf97b1e2d31

🖼️ Multimodal
> At Hugging Face we released SmolVLM, a performant and efficient smol vision language model 💗
> Show Lab released ShowUI-2B: new vision-language-action model to build GUI/web automation agents 🤖
> Rhymes AI has released the base model of Aria: Aria-Base-64K and Aria-Base-8K with their respective context length
> ViDoRe team released ColSmolVLM: A new ColPali-like retrieval model based on SmolVLM
> Dataset: Llava-CoT-o1-Instruct: new dataset labelled using Llava-CoT multimodal reasoning model 📖
> Dataset: LLaVA-CoT-100k dataset used to train Llava-CoT released by creators of Llava-CoT 📕

💬 LLMs
> Qwen team released QwQ-32B-Preview, state-of-the-art open-source reasoning model, broke the internet 🔥
> Alibaba has released Marco-o1, a new open-source reasoning model 💥
> NVIDIA released Hymba 1.5B Base and Instruct, the new state-of-the-art SLMs with hybrid architecture (Mamba + transformer)

⏯️ Image/Video Generation
> Qwen2VL-Flux: new image generation model based on Qwen2VL image encoder, T5 and Flux for generation
> Lightricks released LTX-Video, a new DiT-based video generation model that can generate 24 FPS videos at 768x512 res ⏯️
> Dataset: Image Preferences is a new image generation preference dataset made with DIBT community effort of Argilla 🏷️

Audio
> OuteAI released OuteTTS-0.2-500M, a new multilingual text-to-speech model based on Qwen-2.5-0.5B and trained on 5B audio prompt tokens
reacted to vincentg64's post with 👀 30 days ago
1226
LLM 2.0, the New Generation of Large Language Models https://mltblog.com/49ksOLL

I get many questions about the radically different LLM technology that I started to develop 2 years ago. Initially designed to retrieve information that I could no longer find on the Internet, not with search, OpenAI, Gemini, Perplexity or any other platform, it evolved to become the ideal solution for professional enterprise users. It is now agentic and multimodal, automating business tasks at scale with lightning speed, consistently delivering real ROI, and bypassing the costs associated with training and GPUs thanks to zero weight and explainable AI. It was tested and developed for a Fortune 100 company.

So, what is behind the scenes, how different is it compared to LLM 1.0 (GPT and the likes), how can it be hallucination-free, what makes it a game changer, how did it eliminate prompt engineering, how does it handle knowledge graphs without neural networks, and what are the other benefits?

In a nutshell, the performance is due to building a robust architecture from the ground up and at every step, offering far more than a prompt box, relying on home-made technology rather than faulty Python libraries, and designed by enterprise and tech visionaries for enterprise users.

Contextual smart crawling to retrieve underlying taxonomies, augmented taxonomies, long contextual multi-tokens, real-time fine-tuning, increased security, LLM router with specialized sub-LLMs, an in-memory database architecture of its own to efficiently handle sparsity in keyword associations, contextual backend tables, agents built on the backend, mapping between prompt and corpus keywords, customized PMI rather than cosine similarity, variable-length embeddings, and the scoring engine (the new “PageRank” of LLMs) returning results along with the relevancy scores, are but a few of the differentiators.

โžก๏ธ Read the full article, at https://mltblog.com/49ksOLL
  • 1 reply