hexgrad/Kokoro-TTS just got an upgrade that substantially improves TTS naturalness for short bursts while maintaining parity for longer utterances! 🔥
What a week! A recap of everything you missed ✍️ merve/nov-22-releases-673fbbcfc1c97c4f411def07

Multimodal ✨
> Mistral AI released Pixtral 124B, a gigantic open vision language model
> Llava-CoT (formerly known as Llava-o1) was released, a multimodal reproduction of the o1 model by PKU
> OpenGVLab released MMPR, a new multimodal reasoning dataset
> Jina released Jina-CLIP-v2, 0.98B multilingual multimodal embeddings
> Apple released AIMv2, new SotA vision encoders
LLMs 🦙
> AllenAI dropped a huge release of models, datasets, and scripts for Tülu, a family of models based on Llama 3.1, aligned with SFT, DPO, and a new technique they developed called RLVR
> Jina released jina-embeddings-v3, new multilingual embeddings with longer context
> Hugging Face released SmolTalk, the synthetic dataset used to align SmolLM2 with supervised fine-tuning
> Microsoft released orca-agentinstruct-1M-v1, a gigantic instruction dataset of 1M synthetic instruction pairs
Image Generation 🖼️
> Black Forest Labs released FLUX.1 Tools: four new models for different image modifications and two LoRAs for image conditioning and better steering of generations
Lastly, Hugging Face released a new library, Observers: a lightweight SDK for monitoring interactions with AI APIs and easily storing and browsing them.

$ pip install observers
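For a sense of the usage, here's a minimal sketch based on the API shown around launch; the wrap_openai helper and import path are my best recollection of the huggingface/observers repo, so check the repo for the current interface:

# Minimal sketch, assuming the launch-era observers API (wrap_openai).
from openai import OpenAI
from observers.observers import wrap_openai

# Wrapping the client records every request/response for later browsing.
client = wrap_openai(OpenAI())

response = client.chat.completions.create(
    model="gpt-4o-mini",  # placeholder model name
    messages=[{"role": "user", "content": "Hello!"}],
)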
Glif App's Remixes feature allows you to slap a logo onto anything, seamlessly integrating the input image (logo) into various contexts. The result: stunning remixes that blend the logo with generated images (img2img logo mapping).
The (768 x 1024) mix of MidJourney and Flux's LoRA is nearly identical to the actual visual design. It hasn't undergone much concept art development for now. In the meantime, try out the impressive visual designs on:
Qwen2.5-72B is now the default HuggingChat model. This model is so good that you must try it! I often get better results on rephrasing with it than with Sonnet or GPT-4!
The cleaning process consists of:
- Joining the separate splits together and adding a split column
- Converting string messages into a list of structs
- Removing empty system prompts
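Here is a hedged sketch of those three steps using the datasets library; the repo name and column names ("user/raw-dataset", "prompt", "completion") are placeholders, not the actual dataset schema:

from datasets import load_dataset, concatenate_datasets

# Placeholder repo; the real dataset and its columns may differ.
raw = load_dataset("user/raw-dataset")  # DatasetDict of separate splits

# 1) Join the splits, recording each row's origin in a new `split` column
joined = concatenate_datasets(
    [ds.add_column("split", [name] * len(ds)) for name, ds in raw.items()]
)

# 2) Convert string messages into a list of {role, content} structs
joined = joined.map(lambda ex: {"messages": [
    {"role": "user", "content": ex["prompt"]},
    {"role": "assistant", "content": ex["completion"]},
]})

# 3) Remove empty system prompts from each conversation
joined = joined.map(lambda ex: {"messages": [
    m for m in ex["messages"]
    if not (m["role"] == "system" and not m["content"].strip())
]})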
When the XetHub crew joined Hugging Face this fall, @erinys and I started brainstorming how to share our work to replace Git LFS on the Hub. Uploading and downloading large models and datasets takes precious time. That's where our chunk-based approach comes in.
Instead of versioning files (like Git and Git LFS), we version variable-sized chunks of data. For the Hugging Face community, this means:
⏩ Only upload the chunks that changed.
Download just the updates, not the whole file.
🧠 We store your file as deduplicated chunks.
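To make that concrete, here's a toy sketch of content-defined chunking; it's my illustration of the general technique, not the Hub's actual implementation or parameters:

import hashlib

MASK = (1 << 13) - 1  # cut where the low 13 hash bits are zero -> ~8 KiB average chunks
MIN_SIZE = 2048       # skip boundaries that would produce tiny chunks

def chunk(data: bytes):
    """Split data at content-defined boundaries instead of fixed offsets."""
    chunks, start, h = [], 0, 0
    for i, b in enumerate(data):
        h = ((h << 1) ^ b) & 0xFFFFFFFF  # toy running hash; real CDC uses e.g. a gear/rolling hash
        if i + 1 - start >= MIN_SIZE and (h & MASK) == 0:
            chunks.append(data[start:i + 1])
            start, h = i + 1, 0
    if start < len(data):
        chunks.append(data[start:])
    return chunks

def upload(data: bytes, store: dict) -> int:
    """Store each chunk under its content hash; return how many were actually new."""
    new = 0
    for c in chunk(data):
        key = hashlib.sha256(c).hexdigest()
        if key not in store:
            store[key] = c
            new += 1
    return new

Because boundaries depend on content rather than position, a small edit to a large file shifts only the chunks around the edit, so a second upload stores just those few new chunks.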
In our benchmarks, we found that using content-defined chunking (CDC) to store iterative model and dataset versions led to transfer speedups of ~2x, but this isn't just a performance boost. It's a rethinking of how we manage models and datasets on the Hub.
We're planning to bring our new storage backend to the Hub in early 2025 - check out our blog to dive deeper, and let us know: how could this improve your workflows?
The leaderboard is available in both Japanese and English
Based on the evaluation tool llm-jp-eval, with more than 20 datasets for Japanese LLMs
The leaderboard showcases all the metrics for NLP experts, plus averages for NLP beginners
For the comfort of users, we chose a horizontal UI and implemented light and dark themes in Gradio
The radar chart provides a very interesting visualization of the metrics!
We are using the Japanese research platform MDX, so please be patient!
⚡ LLMs bigger than 70B will be evaluated soon…
How do you say "GPUs Go Brrr" in Japanese? -> GPUがブンブン～! (pronounced "GPU ga bunbun!") 🔥