Mex Ivanov
MexIvanov
AI & ML interests
NLP, Coding, Quantum Computing and more.
Recent Activity
reacted to singhsidhukuldeep's post with 🔥 12 days ago
Exciting News in AI: JinaAI Releases JINA-CLIP-v2!
The team at Jina AI has just released a groundbreaking multilingual multimodal embedding model that's pushing the boundaries of text-image understanding. Here's why this is a big deal:
🚀 Technical Highlights:
- Dual encoder architecture combining a 561M parameter Jina XLM-RoBERTa text encoder and a 304M parameter EVA02-L14 vision encoder
- Supports 89 languages with 8,192 token context length
- Processes images up to 512×512 pixels with 14×14 patch size
- Implements FlashAttention2 for text and xFormers for vision processing
- Uses Matryoshka Representation Learning for efficient vector storage
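As a rough illustration of the Matryoshka point above, here is a minimal sketch (not Jina's code) of truncating a Matryoshka-trained embedding and re-normalizing it; the 1024→256 dimensions mirror the numbers quoted later in the post, and the random vector stands in for a real model output.

```python
import numpy as np

def truncate_matryoshka(embedding: np.ndarray, dim: int = 256) -> np.ndarray:
    """Keep the first `dim` components of a Matryoshka embedding and
    re-normalize, so cosine similarity still works on the shorter vector."""
    truncated = embedding[..., :dim]
    norm = np.linalg.norm(truncated, axis=-1, keepdims=True)
    return truncated / np.clip(norm, 1e-12, None)

# Hypothetical full-size embedding (1024-D, as quoted for JINA-CLIP-v2).
full = np.random.randn(1024).astype(np.float32)
small = truncate_matryoshka(full, dim=256)  # 75% smaller vector storage
print(small.shape)  # (256,)
```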
⚡️ Under The Hood:
- Multi-stage training process with progressive resolution scaling (224→384→512)
- Contrastive learning using InfoNCE loss in both directions (see the sketch after this list)
- Trained on a massive multilingual dataset including 400M English and 400M multilingual image-caption pairs
- Incorporates specialized datasets for document understanding, scientific graphs, and infographics
- Uses hard negative mining with 7 negatives per positive sample
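For reference, a minimal PyTorch sketch of that symmetric (both-direction) InfoNCE objective over a batch of paired text and image embeddings; batch size, embedding dimension, and temperature here are illustrative assumptions, not the paper's settings.

```python
import torch
import torch.nn.functional as F

def info_nce_bidirectional(text_emb, image_emb, temperature=0.07):
    """Symmetric InfoNCE: each text matches its paired image and vice versa.
    Other in-batch examples act as negatives for one another."""
    text_emb = F.normalize(text_emb, dim=-1)
    image_emb = F.normalize(image_emb, dim=-1)
    logits = text_emb @ image_emb.T / temperature   # (B, B) similarity matrix
    targets = torch.arange(logits.size(0))          # diagonal = positive pairs
    loss_t2i = F.cross_entropy(logits, targets)     # text -> image direction
    loss_i2t = F.cross_entropy(logits.T, targets)   # image -> text direction
    return (loss_t2i + loss_i2t) / 2

# Illustrative batch: 8 pairs of 1024-D embeddings.
loss = info_nce_bidirectional(torch.randn(8, 1024), torch.randn(8, 1024))
print(loss.item())
```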
📈 Performance:
- Outperforms previous models on visual document retrieval (52.65% nDCG@5)
- Achieves 89.73% image-to-text and 79.09% text-to-image retrieval on CLIP benchmark
- Strong multilingual performance across 30 languages
- Maintains performance even with 75% dimension reduction (256D vs 1024D)
🎯 Key Innovation:
The model solves the long-standing challenge of unifying text-only and multi-modal retrieval systems while adding robust multilingual support. Perfect for building cross-lingual visual search systems!
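A hedged usage sketch for such a cross-lingual search setup. It assumes the encode_text/encode_image convenience methods that Jina's CLIP model cards expose via trust_remote_code, and the image file name is hypothetical; check the actual model card before relying on these exact names.

```python
import numpy as np
from transformers import AutoModel

# Assumption: the model card provides encode_text / encode_image helpers
# through trust_remote_code, as Jina's CLIP releases have done.
model = AutoModel.from_pretrained("jinaai/jina-clip-v2", trust_remote_code=True)

queries = ["a red bicycle", "ein rotes Fahrrad"]           # same query, two languages
text_vecs = np.asarray(model.encode_text(queries))
image_vecs = np.asarray(model.encode_image(["bike.jpg"]))  # hypothetical local file

# Normalize and rank images by cosine similarity for each query.
text_vecs /= np.linalg.norm(text_vecs, axis=-1, keepdims=True)
image_vecs /= np.linalg.norm(image_vecs, axis=-1, keepdims=True)
print(text_vecs @ image_vecs.T)
```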
Kudos to the research team at Jina AI for this impressive advancement in multimodal AI!
reacted to singhsidhukuldeep's post with 👍 13 days ago
Exciting breakthrough in AI: @Meta's new Byte Latent Transformer (BLT) revolutionizes language models by eliminating tokenization!
The BLT architecture introduces a groundbreaking approach that processes raw bytes instead of tokens, achieving state-of-the-art performance while being more efficient and robust. Here's what makes it special:
>> Key Innovations
Dynamic Patching: BLT groups bytes into variable-sized patches based on entropy, allocating more compute power where the data is more complex. This results in up to 50% fewer FLOPs during inference compared to traditional token-based models.
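To make the patching idea concrete, here is a toy sketch: a crude sliding-window byte histogram stands in for BLT's learned entropy model, and a new patch starts wherever local entropy crosses a threshold, so complex regions end up with more, smaller patches. Window size and threshold are illustrative assumptions.

```python
import math
from collections import Counter

def byte_entropies(data: bytes, window: int = 8):
    """Crude per-position entropy estimate from a sliding byte histogram.
    A placeholder for BLT's learned next-byte entropy model."""
    ents = []
    for i in range(len(data)):
        ctx = data[max(0, i - window): i + 1]
        total = len(ctx)
        ents.append(-sum(c / total * math.log2(c / total)
                         for c in Counter(ctx).values()))
    return ents

def dynamic_patches(data: bytes, threshold: float = 2.0):
    """Start a new patch whenever local entropy crosses the threshold,
    so high-entropy spans are split into smaller patches (more compute)."""
    patches, start = [], 0
    for i, h in enumerate(byte_entropies(data)):
        if h > threshold and i > start:
            patches.append(data[start:i])
            start = i
    patches.append(data[start:])
    return patches

# Low-entropy runs stay as long patches; the noisy middle gets split up.
print(dynamic_patches(b"aaaaaaaaAB93!xyzzzzzzzz"))
```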
Three-Component Architecture:
• Lightweight Local Encoder that converts bytes to patch representations
• Powerful Global Latent Transformer that processes patches
• Local Decoder that converts patches back to bytes
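A compact, hedged skeleton of that three-stage flow in PyTorch. All sizes and layer counts are made up for illustration, patches are fixed-length here for simplicity (BLT's are variable-length via the entropy model), and the real architecture links the local and global modules with cross-attention rather than this plain composition.

```python
import torch
import torch.nn as nn

class BLTSketch(nn.Module):
    """Toy byte -> patch -> byte pipeline mirroring the three components above."""
    def __init__(self, d_local=256, d_global=1024, patch_len=4):
        super().__init__()
        self.patch_len = patch_len
        self.byte_emb = nn.Embedding(256, d_local)                     # bytes -> vectors
        self.local_encoder = nn.Linear(d_local * patch_len, d_global)  # bytes -> patch
        self.global_latent = nn.TransformerEncoder(                    # patches -> patches
            nn.TransformerEncoderLayer(d_global, nhead=8, batch_first=True),
            num_layers=2)
        self.local_decoder = nn.Linear(d_global, d_local * patch_len)  # patch -> bytes
        self.to_logits = nn.Linear(d_local, 256)

    def forward(self, byte_ids):                  # (B, T), T divisible by patch_len
        B, T = byte_ids.shape
        x = self.byte_emb(byte_ids).view(B, T // self.patch_len, -1)
        patches = self.global_latent(self.local_encoder(x))
        y = self.local_decoder(patches).view(B, T, -1)
        return self.to_logits(y)                  # next-byte logits per position

logits = BLTSketch()(torch.randint(0, 256, (2, 16)))
print(logits.shape)  # torch.Size([2, 16, 256])
```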
>> Technical Advantages
• Matches the performance of Llama 3 at 8B parameters while being more efficient
• Superior handling of non-English languages and rare character sequences
• Remarkable 99.9% accuracy on spelling tasks
• Better scaling properties than token-based models
>> Under the Hood
The system uses an entropy model to determine patch boundaries, cross-attention mechanisms for information flow, and hash n-gram embeddings for improved representation. The architecture allows simultaneous scaling of both patch and model size while maintaining fixed inference costs.
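As a rough illustration of the hash n-gram idea, the sketch below hashes byte n-grams into a fixed-size embedding table and sums the results onto each position's representation; the table size, n-gram orders, and rolling hash are all illustrative assumptions, not the paper's exact scheme.

```python
import torch
import torch.nn as nn

class HashNGramEmbedding(nn.Module):
    """Embed byte n-grams by hashing into a fixed table and summing
    each n-gram's vector onto its final byte position (toy sizes)."""
    def __init__(self, table_size=65536, dim=256, ngrams=(2, 3)):
        super().__init__()
        self.table = nn.Embedding(table_size, dim)
        self.table_size = table_size
        self.ngrams = ngrams

    def forward(self, byte_ids):                  # (B, T) ints in [0, 255]
        B, T = byte_ids.shape
        out = torch.zeros(B, T, self.table.embedding_dim)
        for n in self.ngrams:
            for i in range(n - 1, T):
                gram = byte_ids[:, i - n + 1: i + 1]          # (B, n)
                idx = torch.zeros(B, dtype=torch.long)
                for j in range(n):                            # cheap rolling hash
                    idx = (idx * 257 + gram[:, j]) % self.table_size
                out[:, i] += self.table(idx)
        return out

emb = HashNGramEmbedding()(torch.randint(0, 256, (2, 12)))
print(emb.shape)  # torch.Size([2, 12, 256])
```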
This is a game-changer for multilingual AI and could reshape how we build future language models. Excited to see how this technology evolves!
liked a model 18 days ago
CohereForAI/c4ai-command-r7b-12-2024
Organizations
None yet
MexIvanov's activity
Adding `safetensors` variant of this model
#1 opened about 1 year ago by MexIvanov