Emre Dalkıran

iojvsuynv

AI & ML interests

None yet

Recent Activity

reacted to singhsidhukuldeep's post with ❤️ about 1 month ago

Organizations

None yet

iojvsuynv's activity

New activity in NexaAIDev/OmniVLM-968M 27 days ago

Error loading model
#9 opened 28 days ago by iojvsuynv
reacted to hbseong's post with 👀 about 1 month ago
🚨🔥 New Release Alert! 🔥🚨

Introducing the 435M model that outperforms Llama-Guard-3-8B while slashing 75% of the computation cost! 💻💥
👉 Check it out: hbseong/HarmAug-Guard (Yes, INFERENCE CODE INCLUDED! 💡)

More details in our paper: https://arxiv.org/abs/2410.01524 📜

#HarmAug #LLM #Safety #EfficiencyBoost #Research #AI #MachineLearning
reacted to singhsidhukuldeep's post with ❤️ about 1 month ago
It's not every day you see the No. 1 ranked paper of the day open-sourcing a very powerful image editing app!

Fascinating to see MagicQuill - a groundbreaking interactive image editing system that makes precise photo editing effortless through advanced AI!

The system's architecture features three sophisticated components:

1. Editing Processor:
- Implements a dual-branch architecture integrated into a latent diffusion framework
- Utilizes PiDiNet for edge map extraction and content-aware per-pixel inpainting
- Features a specialized UNet architecture with zero-convolution layers for feature insertion
- Employs denoising score matching for training the control branch
- Processes both structural modifications via scribble guidance and color manipulation through downsampled color blocks
- Maintains pixel-level control through VAE-based latent space operations
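The zero-convolution trick (popularized by ControlNet) means the control branch's output projection starts at all zeros, so at step 0 it adds nothing to the frozen base features and training can't destabilize the pretrained UNet. A minimal, framework-free sketch of the idea (illustrative only, not MagicQuill's code):

```python
class ZeroConv1x1:
    """1x1 convolution whose weights and bias start at zero, so its
    output is zero until training moves the parameters."""
    def __init__(self, channels):
        self.weight = [[0.0] * channels for _ in range(channels)]
        self.bias = [0.0] * channels

    def __call__(self, pixel):
        # pixel: list of channel values at one spatial location
        return [
            sum(w * x for w, x in zip(row, pixel)) + b
            for row, b in zip(self.weight, self.bias)
        ]

# At initialization the control branch adds nothing to the base features:
base_features = [0.5, -1.2, 3.0]
control = ZeroConv1x1(3)([0.9, 0.1, -0.4])
merged = [f + c for f, c in zip(base_features, control)]
# merged equals base_features, so the pretrained UNet is undisturbed at step 0
```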

2. Painting Assistor:
- Powered by a fine-tuned LLaVA multimodal LLM using Low-Rank Adaptation (LoRA)
- Trained on a custom dataset derived from Densely Captioned Images (DCI)
- Processes user brushstrokes through specialized Q&A tasks for add/subtract/color operations
- Features bounding box coordinate normalization for precise stroke localization
- Implements streamlined single-word/phrase outputs for real-time performance
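The bounding-box normalization step rescales stroke coordinates by image size so the model sees resolution-independent positions. A hypothetical helper showing the idea (not MagicQuill's actual code):

```python
def normalize_bbox(bbox, width, height):
    """Map a pixel-space (x1, y1, x2, y2) box to [0, 1] coordinates."""
    x1, y1, x2, y2 = bbox
    return (x1 / width, y1 / height, x2 / width, y2 / height)

# A stroke covering pixels (100, 50)-(300, 150) in a 400x200 image:
normalize_bbox((100, 50, 300, 150), 400, 200)
# -> (0.25, 0.25, 0.75, 0.75)
```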

3. Idea Collector:
- Built as a modular ReactJS component library
- Supports cross-platform deployment via HTTP protocols
- Compatible with Gradio and ComfyUI frameworks
- Features comprehensive layer management and parameter adjustment capabilities
- Implements real-time canvas updates and preview generation

The system outperforms existing solutions like SmartEdit and BrushNet in edge alignment and color fidelity while maintaining seamless integration with popular AI frameworks.

What are your thoughts on AI-powered creative tools?
reacted to tomaarsen's post with 🔥 about 1 month ago
I just released Sentence Transformers v3.3.0 and it's huge! A 4.5x speedup for CPU with OpenVINO int8 static quantization, training with prompts for a free performance boost, PEFT integration, evaluation on NanoBEIR, and more! Details:

1. We integrate Post-Training Static Quantization using OpenVINO, a very efficient solution for CPUs that processes 4.78x as many texts per second on average, while only hurting performance by 0.36% on average. There's a new export_static_quantized_openvino_model method to quantize a model.
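The core idea behind int8 static quantization is to map float weights onto the integer range with a precomputed scale, trading a little precision for much cheaper arithmetic. A toy symmetric-quantization sketch in plain Python (illustrative only; OpenVINO's actual pipeline does far more, e.g. calibration on sample data):

```python
def quantize_int8(values):
    """Symmetric int8 quantization: map floats to [-127, 127] with one scale."""
    scale = max(abs(v) for v in values) / 127 or 1.0
    return [round(v / scale) for v in values], scale

def dequantize(quantized, scale):
    """Recover approximate float values from int8 codes."""
    return [v * scale for v in quantized]

weights = [0.8, -1.27, 0.0, 0.5]
q, scale = quantize_int8(weights)
restored = dequantize(q, scale)
# restored is close to weights, within scale/2 rounding error per value
```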

2. We add the option to train with prompts, e.g. strings like "query: ", "search_document: " or "Represent this sentence for searching relevant passages: ". It's as simple as using the prompts argument in SentenceTransformerTrainingArguments. Our experiments show that you can easily reach 0.66% to 0.90% relative performance improvement on NDCG@10 at no extra cost by adding "query: " before each training query and "document: " before each training answer.
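Under the hood, training with prompts amounts to prepending a fixed string per dataset column before encoding. A toy sketch of that preprocessing (hypothetical helper name, not the library's internals):

```python
def apply_prompts(batch, prompts):
    """Prepend a per-column prompt string to each text, mimicking how the
    `prompts` argument maps dataset columns to prefixes during training."""
    return {
        column: [prompts.get(column, "") + text for text in texts]
        for column, texts in batch.items()
    }

batch = {"query": ["how to bake bread"], "answer": ["Mix flour and water."]}
prompts = {"query": "query: ", "answer": "document: "}
apply_prompts(batch, prompts)
# -> {"query": ["query: how to bake bread"],
#     "answer": ["document: Mix flour and water."]}
```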

3. Sentence Transformers now supports training PEFT adapters via 7 new methods for adding new adapters or loading pre-trained ones. You can also directly load a trained adapter with SentenceTransformer as if it's a normal model. Very useful for e.g. 1) training multiple adapters on 1 base model, 2) training bigger models than otherwise possible, or 3) cheaply hosting multiple models by switching multiple adapters on 1 base model.

4. We added easy evaluation on NanoBEIR, a subset of BEIR (a.k.a. the MTEB Retrieval benchmark). It contains 13 datasets, each with 50 queries and up to 10k documents. Evaluation is fast and can easily be done during training to track your model's performance on general-purpose information retrieval tasks.
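NDCG@10, the metric quoted above, can be computed from a ranked list of graded relevance labels in a few lines (this is the standard formula, not the evaluator's internals):

```python
import math

def ndcg_at_k(relevances, k=10):
    """NDCG@k for one ranked list: discounted gain over the ideal ordering."""
    dcg = sum(rel / math.log2(i + 2) for i, rel in enumerate(relevances[:k]))
    ideal = sorted(relevances, reverse=True)
    idcg = sum(rel / math.log2(i + 2) for i, rel in enumerate(ideal[:k]))
    return dcg / idcg if idcg > 0 else 0.0

# A perfect ranking scores 1.0; swapping the top two documents lowers it:
ndcg_at_k([1, 0, 0])  # 1.0
ndcg_at_k([0, 1, 0])  # ~0.63
```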

Additionally, we also deprecate Python 3.8, add better compatibility with Transformers v4.46.0, and more. Read the full release notes here: https://github.com/UKPLab/sentence-transformers/releases/tag/v3.3.0
reacted to hbseong's post with 👍 about 2 months ago
reacted to m-ric's post with 🔥👀 2 months ago
Rhymes AI drops Aria: small Multimodal MoE that beats GPT-4o and Gemini-1.5-Flash ⚡️

A new player has entered the game! Rhymes AI has just been announced, unveiling Aria – a multimodal powerhouse that punches above its weight.

Key insights:

🧠 Mixture-of-Experts architecture: 25.3B total params, but only 3.9B active.

🌈 Multimodal: text/image/video β†’ text.

📚 Novel training approach: "multimodal-native", where multimodal training starts directly during pre-training, not just tacked on later

πŸ“ Long 64K token context window

🔓 Apache 2.0 license, with weights, code, and demos all open
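The gap between 25.3B total and 3.9B active parameters comes from top-k expert routing: each token is dispatched to only a few experts, so most of the network stays idle per token. A toy router illustrating the mechanism (not Aria's code; expert count and scores are made up):

```python
import math

def route_top_k(scores, k=2):
    """Pick the k highest-scoring experts for one token and softmax-normalize
    their scores into mixing weights; all other experts stay idle."""
    top = sorted(range(len(scores)), key=lambda i: scores[i], reverse=True)[:k]
    exps = {i: math.exp(scores[i]) for i in top}
    total = sum(exps.values())
    return {i: exps[i] / total for i in top}

# 8 experts, but each token activates only 2 of them:
weights = route_top_k([0.1, 2.0, -1.0, 0.5, 1.5, 0.0, -0.5, 0.3], k=2)
# only the two top-scoring experts receive this token; their weights sum to 1
```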

⚡️ On the benchmark side, Aria leaves some big names in the dust.

- It beats Pixtral 12B and Llama-3.2-11B on several vision benchmarks like MMMU and MathVista.
- It even overcomes the much bigger GPT-4o on long-video tasks, and outshines Gemini 1.5 Flash when it comes to parsing lengthy documents.

But Rhymes AI isn't just showing off benchmarks. They've already got Aria powering a real-world augmented search app called "BeaGo", and it handles even recent events with great accuracy!

And they partnered with AMD to make it much faster than competitors like Perplexity or Gemini search.

Read their paper for Aria 👉 Aria: An Open Multimodal Native Mixture-of-Experts Model (2410.05993)

Try BeaGo 🐢 👉 https://rhymes.ai/blog-details/introducing-beago-your-smarter-faster-ai-search
reacted to merve's post with 🔥 3 months ago
NVIDIA just dropped a gigantic multimodal model called NVLM 72B 🦖
nvidia/NVLM-D-72B
Paper page: NVLM: Open Frontier-Class Multimodal LLMs (2409.11402)

The paper contains many ablation studies on various ways to use the LLM backbone 👇🏻

🦩 Flamingo-like cross-attention (NVLM-X)
🌋 Llava-like concatenation of image and text embeddings to a decoder-only model (NVLM-D)
✨ a hybrid architecture (NVLM-H)
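The difference between the "X" and "D" input handling can be sketched with toy token lists (purely illustrative, not NVLM's code): the decoder-only variant splices image tokens into the text sequence, while the cross-attention variant keeps the decoder sequence text-only and feeds image tokens in through separate cross-attention layers.

```python
def nvlm_d_sequence(image_tokens, text_tokens):
    """Decoder-only style (NVLM-D): projected image tokens are concatenated
    with text tokens and processed together by self-attention."""
    return image_tokens + text_tokens

def nvlm_x_sequence(image_tokens, text_tokens):
    """Cross-attention style (NVLM-X): the decoder sequence holds only text;
    image tokens would enter via cross-attention layers instead."""
    return text_tokens

seq_d = nvlm_d_sequence(["<img_1>", "<img_2>"], ["Describe", "this"])
seq_x = nvlm_x_sequence(["<img_1>", "<img_2>"], ["Describe", "this"])
# NVLM-D's decoder sequence is longer (image tokens in-line);
# NVLM-X keeps the sequence short but adds cross-attention parameters.
```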

Checking evaluations, NVLM-D and NVLM-H are best or second best compared to other models 👍

The released model is NVLM-D, based on Qwen-2 Instruct and aligned with InternViT-6B using a huge mixture of different datasets.

You can easily use this model by loading it through transformers' AutoModel 😍