Welcome to the official Hugging Face organization for xMADified models from xMAD.ai!
The repositories below contain popular Llama models xMADified with our NeurIPS 2024 methods: quantized from 16-bit floats to 4-bit integers using xMAD.ai proprietary technology. These models are fine-tunable on the same reduced hardware (4x less memory) in just 3 clicks. Watch our product demo here
For additional models, please join our beta here, and we'll get back to you promptly!
Current Public xMADified Models:
The GPU memory required to run and fine-tune each model is listed in the table below:
| Model | GPU Memory Requirement (Before → After) |
|---|---|
| Llama-3.2-3B-Instruct-xMADai-4bit | 6.5 GB → 3.5 GB (any laptop GPU) |
| Llama-3.2-1B-Instruct-xMADai-4bit | 2.5 GB → 2 GB (any laptop GPU) |
| Llama-3.1-405B-Instruct-xMADai-4bit | 800 GB (16 H100s) → 250 GB (8 V100s) |
| Llama-3.1-8B-Instruct-xMADai-4bit | 16 GB → 7 GB (any laptop GPU) |
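Since the xMADified checkpoints are hosted as ordinary Hugging Face repositories, they can be loaded with the standard `transformers` API. The sketch below is an assumption, not official xMAD.ai usage instructions: the `xmadai/` organization prefix and the loading arguments are guesses based on common Hugging Face conventions, so check the individual model cards for the exact repo ids and recommended settings.

```python
# Hypothetical sketch: loading an xMADified 4-bit model with transformers.
# The org prefix "xmadai/" and loading arguments are assumptions; see the
# model card of each repository for the authoritative instructions.

def load_xmadified(repo_id: str = "xmadai/Llama-3.2-1B-Instruct-xMADai-4bit"):
    """Download a quantized model and its tokenizer from the Hub."""
    # Imported lazily so the function can be inspected without
    # transformers installed.
    from transformers import AutoModelForCausalLM, AutoTokenizer

    tokenizer = AutoTokenizer.from_pretrained(repo_id)
    # device_map="auto" spreads the layers over the available GPU(s).
    model = AutoModelForCausalLM.from_pretrained(repo_id, device_map="auto")
    return model, tokenizer


if __name__ == "__main__":
    model, tokenizer = load_xmadified()
    inputs = tokenizer("Hello!", return_tensors="pt").to(model.device)
    print(tokenizer.decode(model.generate(**inputs, max_new_tokens=20)[0]))
```

For the 405B model, multi-GPU placement (e.g. the 8 V100s listed above) is handled by the same `device_map="auto"` mechanism.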