Happy New Year, Hugging Face community! In 2025, I'll continue my quantization (and some fine-tuning) efforts to support open-source AI and Make knowledge free for everyone.
The deepseek-ai/DeepSeek-V3-Base model was featured today on CNBC tech news. The whale made a splash by using FP8 and shrinking the cost of training significantly!
I've built a small utility to split safetensors files, file by file. The need came up when I tried to convert the new DeepSeek V3 model from FP8 to BF16. The only Ada-architecture GPU I have is an RTX 4080, and its 16 GB of VRAM just wasn't enough for the conversion.
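To make the memory pressure concrete, here's a rough sketch of what such a conversion loop does. This is not DeepSeek's actual script, the paths are placeholders, and it skips the block-wise FP8 scale tensors a real conversion has to apply; the point is that each shard is loaded and cast in one piece, so peak VRAM roughly tracks the largest shard file, which is why smaller shards help.

```python
# Sketch only: casts each safetensors shard to BF16 one file at a time.
# Ignores DeepSeek's block-wise FP8 scale tensors (a real conversion must
# apply them); "src"/"dst" are placeholder paths.
import glob, os
import torch
from safetensors.torch import load_file, save_file

src, dst = "DeepSeek-V3-Base", "DeepSeek-V3-Base-bf16"
os.makedirs(dst, exist_ok=True)

for shard in sorted(glob.glob(os.path.join(src, "*.safetensors"))):
    tensors = load_file(shard, device="cuda")            # whole shard lands on the GPU
    bf16 = {n: t.to(torch.bfloat16).cpu() for n, t in tensors.items()}
    save_file(bf16, os.path.join(dst, os.path.basename(shard)))
    del tensors, bf16                                     # free VRAM before the next shard
```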
BTW: I'll upload the BF16 version here: DevQuasar/deepseek-ai.DeepSeek-V3-Base-bf16 (it will take a while, days with my upload speed). If anyone has access to the resources to test it, I'd appreciate feedback on whether it's working or not.
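A quick smoke test could look something like this (just a sketch, assuming the upload has finished and you have enough RAM/VRAM; the prompt is only an example):

```python
# Minimal load-and-generate check for the uploaded BF16 model.
import torch
from transformers import AutoModelForCausalLM, AutoTokenizer

repo = "DevQuasar/deepseek-ai.DeepSeek-V3-Base-bf16"
tok = AutoTokenizer.from_pretrained(repo, trust_remote_code=True)
model = AutoModelForCausalLM.from_pretrained(
    repo, torch_dtype=torch.bfloat16, device_map="auto", trust_remote_code=True
)

inputs = tok("The capital of France is", return_tensors="pt").to(model.device)
out = model.generate(**inputs, max_new_tokens=20)
print(tok.decode(out[0]))
```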
The tool is available here: https://github.com/csabakecskemeti/ai_utils/blob/main/safetensor_splitter.py It splits every file into n pieces along layer boundaries where possible, and creates a new "model.safetensors.index.json" file. I've tested it with Llama 3.1 8B and multiple split sizes, and validated the result with an inference pipeline. Use --help for usage. Please note the current version expects the model to already consist of multiple files and to have a "model.safetensors.index.json" layer-to-safetensors mapping file.
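For anyone curious, this is roughly the idea in a few lines. It's a simplified sketch, not the actual implementation: the real tool splits along layer boundaries where possible, while this version just cuts each shard into n chunks of tensors and rebuilds the index.

```python
# Sketch of the splitting idea: read the existing weight_map, cut each
# shard's tensors into n groups, write each group as its own file, and
# write an updated index. Function and file names here are illustrative.
import json, os
from collections import defaultdict
from safetensors.torch import load_file, save_file

def split_shards(model_dir, out_dir, n=2):
    os.makedirs(out_dir, exist_ok=True)
    with open(os.path.join(model_dir, "model.safetensors.index.json")) as f:
        index = json.load(f)

    # group tensor names by the shard file they currently live in
    by_shard = defaultdict(list)
    for name, shard in index["weight_map"].items():
        by_shard[shard].append(name)

    new_map, part = {}, 0
    for shard, names in by_shard.items():
        tensors = load_file(os.path.join(model_dir, shard))
        chunk = -(-len(names) // n)                  # ceil(len/n) tensors per piece
        for i in range(0, len(names), chunk):
            part += 1
            piece = {k: tensors[k] for k in names[i:i + chunk]}
            fname = f"model-part-{part:05d}.safetensors"
            save_file(piece, os.path.join(out_dir, fname))
            new_map.update({k: fname for k in piece})

    index["weight_map"] = new_map
    with open(os.path.join(out_dir, "model.safetensors.index.json"), "w") as f:
        json.dump(index, f, indent=2)
```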