
This is a custom INT8 version of the original BLOOM weights, prepared for fast inference with the DeepSpeed-Inference engine, which uses Tensor Parallelism. In this repo the tensors are split into 8 shards, one per GPU, targeting an 8-GPU setup.
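To make the two ideas above concrete, here is a toy, pure-Python sketch of (a) symmetric per-tensor INT8 quantization and (b) column-wise tensor parallelism across 8 shards. It is an illustration under stated assumptions, not DeepSpeed's actual quantization scheme, kernels, or checkpoint layout: the helper names (`quantize_int8`, `split_columns`, `tensor_parallel_matvec`) are hypothetical, real engines may quantize per group rather than per tensor, and some layers are split row-wise with an all-reduce instead of the column split shown here.

```python
# Toy sketch only: hypothetical helpers, NOT DeepSpeed-Inference's real implementation.
from typing import List, Tuple

Matrix = List[List[float]]


def quantize_int8(w: Matrix) -> Tuple[List[List[int]], float]:
    """Symmetric per-tensor INT8 quantization (assumed scheme): w ~= scale * q, q in [-127, 127]."""
    max_abs = max(abs(v) for row in w for v in row) or 1.0  # avoid a zero scale
    scale = max_abs / 127.0
    q = [[max(-127, min(127, round(v / scale))) for v in row] for row in w]
    return q, scale


def dequantize(q: List[List[int]], scale: float) -> Matrix:
    """Map INT8 codes back to approximate float weights."""
    return [[v * scale for v in row] for row in q]


def split_columns(w: Matrix, n_shards: int) -> List[Matrix]:
    """Split the output dimension (columns) into n_shards equal slices, one per simulated GPU."""
    n_cols = len(w[0])
    assert n_cols % n_shards == 0, "output dim must divide evenly across shards"
    step = n_cols // n_shards
    return [[row[i * step:(i + 1) * step] for row in w] for i in range(n_shards)]


def matvec(x: List[float], w: Matrix) -> List[float]:
    """y[j] = sum_i x[i] * w[i][j] (w has shape [in_features][out_features])."""
    return [sum(x[i] * w[i][j] for i in range(len(x))) for j in range(len(w[0]))]


def tensor_parallel_matvec(x: List[float], shards: List[Matrix]) -> List[float]:
    """Each shard computes its slice of the outputs; concatenating stands in for an all-gather."""
    out: List[float] = []
    for shard in shards:
        out.extend(matvec(x, shard))
    return out


# Usage: a 4x16 weight matrix split into 8 shards of 2 output columns each.
w = [[(i * 16 + j) / 100.0 - 0.3 for j in range(16)] for i in range(4)]
x = [1.0, -0.5, 0.25, 2.0]

q, scale = quantize_int8(w)
w_hat = dequantize(q, scale)
shards = split_columns(w_hat, 8)

y_tp = tensor_parallel_matvec(x, shards)  # sharded computation
y_full = matvec(x, w_hat)                 # single-device reference on the same INT8 weights
y_fp = matvec(x, w)                       # original float weights
```

Because each output column is computed by exactly one shard with the same arithmetic, the sharded result matches the single-device result on the same quantized weights; the only approximation comes from INT8 rounding, bounded here by `scale / 2` per weight.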

The full BLOOM documentation is here.

To use the weights in this repo, adapt the scripts found here to your needs. (Note: these scripts are expected to migrate to the HF Transformers code base soon, so this link will need updating once they move.)
