hexgrad/Kokoro-TTS just got an upgrade that substantially improves TTS naturalness for short bursts while maintaining parity for longer utterances! 🔥
What a week! A recap of everything you missed:
merve/nov-22-releases-673fbbcfc1c97c4f411def07
Multimodal ✨
> Mistral AI released Pixtral 124B, a gigantic open vision language model
> Llava-CoT (formerly known as Llava-o1), a multimodal reproduction of the o1 model by PKU, was released
> OpenGVLab released MMPR: a new multimodal reasoning dataset
> Jina released Jina-CLIP-v2: 0.98B multilingual multimodal embeddings
> Apple released AIMv2: new SotA vision encoders
LLMs
> AllenAI dropped a huge release of models, datasets and scripts for Tülu, a family of models based on Llama 3.1 and aligned with SFT, DPO and a new technique they developed called RLVR
> Jina released embeddings-v3: new multilingual embeddings with longer context
> Hugging Face released SmolTalk: a synthetic dataset used to align SmolLM2 with supervised fine-tuning
> Microsoft released orca-agentinstruct-1M-v1: a gigantic dataset of 1M synthetic instruction pairs
Image Generation 🖼️
> Black Forest Labs released FLUX.1 Tools: four new models for different image modifications and two LoRAs for image conditioning and better steering of generations
Lastly, Hugging Face released a new library, Observers: a lightweight SDK for monitoring interactions with AI APIs and easily storing and browsing them
$ pip install observers
Glif App's Remixes feature lets you slap a logo onto anything, seamlessly integrating the input image (the logo) into various contexts. The result is stunning remixes that blend the input logo with generated images (img2img logo mapping).