QvQ-72B-Preview 🎄 an open-weight model for visual reasoning, just released by the Alibaba Qwen team
👉 Qwen/qvq-676448c820912236342b9888
✨ Combines visual understanding & language reasoning
✨ Scores 70.3 on MMMU
✨ Outperforms Qwen2-VL-72B-Instruct in complex problem-solving
Megrez-3B-Omni 🔥 an on-device multimodal LLM by Infinigence AI, another startup emerging from the Tsinghua University ecosystem.
Model: Infinigence/Megrez-3B-Omni
Demo: Infinigence/Megrez-3B-Omni
✨ Supports analysis of image, text, and audio modalities
✨ Leads in bilingual speech (English & Chinese) input, multi-turn conversations, and voice-based queries
✨ Outperforms comparable models in scene understanding and OCR across major benchmarks
LLaMA-O1-PRM and LLaMA-O1-Reinforcement will be released this weekend. We have implemented a novel reinforcement finetuning (RFT) pipeline that teaches models reasoning and reward labeling without human annotation.
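The post doesn't detail the pipeline, but the general shape of a reinforcement finetuning round can be sketched as below. This is a generic illustration, not the authors' actual method; `sample_traces` and `reward` are hypothetical stand-ins for the model's generator and a learned reward model.

```python
# Generic sketch of one reinforcement finetuning (RFT) round, NOT the
# LLaMA-O1 pipeline: sample candidate reasoning traces, score them with
# a (stand-in) reward model, and keep only high-reward traces for the
# next supervised finetuning pass. No human annotation is involved.

def sample_traces(prompt: str, n: int = 4) -> list[str]:
    # Stand-in for model generation of n candidate reasoning traces.
    return [f"{prompt} :: candidate reasoning trace {i}" for i in range(n)]

def reward(trace: str) -> float:
    # Stand-in for a learned reward / process reward model score in [0, 1].
    return (len(trace) % 5) / 4.0

def rft_round(prompts: list[str], threshold: float = 0.5) -> list[str]:
    """Collect traces whose reward clears the threshold."""
    kept = []
    for prompt in prompts:
        for trace in sample_traces(prompt):
            if reward(trace) >= threshold:
                kept.append(trace)
    return kept  # finetune the model on these traces, then repeat

selected = rft_round(["What is 2 + 2?"])
```

The key property the post claims — no human annotation — comes from the reward model doing the labeling: iterating this loop lets the model bootstrap its own training data.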
After some heated discussion 🔥, we want to clarify our intent regarding storage limits on the Hub.
TL;DR:
- public storage is free and (barring blatant abuse) unlimited. We do ask that you consider upgrading to PRO and/or Enterprise Hub if possible
- private storage is paid above a significant free tier (1TB if you have a paid account, 100GB otherwise)
Last week was crazy in open-source AI, with important model and dataset releases every day.
Here are the most important ones I've pinned:
🌎 Cohere released Global-MMLU, a multilingual version of MMLU, to evaluate AI models' world knowledge in many languages!
🦙 Meta released Llama-3.3-70B-Instruct, a 70B model that's on par with Llama-3.1-405B-Instruct, GPT-4o and Claude. Probably my new go-to for agentic workflows.
🔉 FishAudio released fish-speech-1.5, a multilingual text-to-speech model
🎨 Microsoft Research released TRELLIS, an extremely impressive image-to-3D model, which you can try here: JeffreyXiang/TRELLIS
📚 Yesterday, Hugging Face released FineWeb 2, a new version that extends the previous FineWeb to over 1000 languages, with extended coverage of Russian, Mandarin, German, Japanese, Spanish, and French — a huge, high-quality dataset of over 3 trillion words! HuggingFaceFW/fineweb-2
Now let's go build and make this week as productive as the last one!
Open Preference Dataset for Text-to-Image Generation by the 🤗 Community
Open Image Preferences is an Apache 2.0 licensed dataset for text-to-image generation. It contains 10K text-to-image preference pairs across common image generation categories, covering different model families and varying prompt complexities.
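A text-to-image preference pair couples one prompt with a chosen and a rejected generation. As a minimal illustration — the field and model names below are hypothetical, not the dataset's actual schema:

```python
# Illustrative structure of a text-to-image preference pair.
# Field names and values are hypothetical, not the dataset's schema.
pair = {
    "prompt": "a watercolor fox in a snowy forest",
    "chosen": {"model": "model-A", "image": "fox_a.png"},
    "rejected": {"model": "model-B", "image": "fox_b.png"},
}

def preferred_model(pair: dict) -> str:
    """Return the model whose generation was preferred for this prompt."""
    return pair["chosen"]["model"]

print(preferred_model(pair))  # prints "model-A"
```

Pairs in this shape are what preference-optimization methods (e.g. DPO-style finetuning of image models) consume.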
We applied the same data-driven approach that led to SOTA English performance in 🍷 FineWeb to thousands of languages.
🥂 FineWeb2 has 8TB of compressed text data and outperforms other multilingual datasets in our experiments.
The dataset is released under the permissive 📜 ODC-By 1.0 license, and the 💻 code to reproduce it and our evaluations is public.
We will very soon announce a big community project, and are working on a 📝 blogpost walking you through the entire dataset creation process. Stay tuned!
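Since the data and code are public, a single language subset can plausibly be streamed with the `datasets` library. A hedged sketch, assuming config names follow the language_Script convention (e.g. `rus_Cyrl`) — check the dataset card for the exact names:

```python
# Sketch of streaming one FineWeb-2 language subset. The config-name
# convention (language_Script, e.g. "rus_Cyrl") is an assumption based
# on the dataset card. STREAM stays False here so nothing is downloaded.
STREAM = False  # flip to True to actually stream (needs `datasets` + network)

def fineweb2_config(lang: str, script: str) -> str:
    """Build a config name like 'rus_Cyrl' from an ISO code and a script."""
    return f"{lang}_{script}"

if STREAM:
    from datasets import load_dataset  # pip install datasets

    fw2 = load_dataset(
        "HuggingFaceFW/fineweb-2",
        name=fineweb2_config("rus", "Cyrl"),
        split="train",
        streaming=True,  # iterate shards lazily instead of fetching all 8TB
    )
    for doc in fw2.take(3):
        print(doc["text"][:80])
```

`streaming=True` matters here: it yields documents shard by shard, so you can inspect a multi-terabyte dataset without downloading it.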
Audio models:
✨ Fish Speech 1.5, text-to-speech in 13 languages, trained on 1M+ hours of audio by FishAudio fishaudio/fish-speech-1.5
✨ ClearVoice, an advanced voice processing framework by Alibaba Tongyi SpeechAI https://huggingface.co/alibabasglab
HunyuanVideo 📹 the new open video generation model by Tencent!
👉 tencent/HunyuanVideo
zh-ai-community/video-models-666afd86cfa4e4dd1473b64c
✨ 13B parameters: probably the largest open video model to date
✨ Unified architecture for image & video generation
✨ Powered by advanced features: MLLM Text Encoder, 3D VAE, and Prompt Rewrite
✨ Delivers stunning visuals, diverse motion, and unparalleled stability
🔓 Fully open with code & weights
Zhipu AI, the Chinese generative AI startup behind CogVideo, just launched their first productized AI Agent - AutoGLM 🔥 👉 https://agent.aminer.cn
With simple text or voice commands, it:
✨ Simulates phone operations effortlessly
✨ Autonomously handles 50+ step tasks
✨ Seamlessly operates across apps
Powered by Zhipu's "Decoupled Interface" and "Self-Evolving Learning Framework" to achieve major performance gains in Phone Use and Web Browser Use!
Meanwhile, GLM4-Edge is now on the Hugging Face Hub 🚀
👉 THUDM/glm-edge-6743283c5809de4a7b9e0b8b
Packed with advanced dialogue + multimodal models:
📱 1.5B / 2B models: built for mobile & in-car systems
💻 4B / 5B models: optimized for PCs