
How much data was Bloom trained on?

#133
by mishavee - opened

This article:
https://www.springboard.com/blog/data-science/machine-learning-gpt-3-open-ai/

says the following about GPT-3 at the bottom:

"GPT-3 is a very large language model (the largest till date) with about 175B parameters. It is trained on about 45TB of text data from different datasets."

How much data was BLOOM trained on? I thought it was 1.6TB, but I think I'm wrong.

BigScience Workshop org

BLOOM was indeed trained on 1.61TB of data.

https://huggingface.co/bigscience/bloom#training-data
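
If you want to pull up that section without opening the page, here is a minimal sketch using the huggingface_hub ModelCard API; the string search for the "Training Data" heading is just illustrative, not an official way to read card sections:

```python
# Minimal sketch: fetch the BLOOM model card and print the start of its
# "Training Data" section. Assumes `pip install huggingface_hub`.
from huggingface_hub import ModelCard

card = ModelCard.load("bigscience/bloom")

# The card body is plain Markdown; locate the Training Data heading and
# print the first part of that section (simple, illustrative string search).
text = card.text
start = text.find("Training Data")
print(text[start:start + 500])
```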

christopher changed discussion status to closed
