Here is how to calculate the size of any LLM:
Each parameter in an LLM is typically stored as a floating-point number, and the size of each parameter in bytes depends on the precision.
32-bit precision: Each parameter takes 4 bytes.
16-bit precision: Each parameter takes 2 bytes.
To calculate the memory needed to hold the model's parameters:
Memory usage (in bytes) = No. of Parameters × Size of Each Parameter
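For example, take a model with 1 billion parameters.
32-bit Precision (FP32): In 32-bit floating-point precision, each parameter takes 4 bytes.
Memory usage in bytes = 1 billion parameters × 4 bytes
1,000,000,000 × 4 = 4,000,000,000 bytes
In gigabytes: 4,000,000,000 / 1024^3 ≈ 3.73 GiB (exactly 4 GB in decimal units, where 1 GB = 10^9 bytes)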
16-bit Precision (FP16): In 16-bit floating-point precision, each parameter takes 2 bytes.
Memory usage in bytes = 1 billion parameters × 2 bytes
1,000,000,000 × 2 = 2,000,000,000 bytes
In gigabytes: 2,000,000,000 / 1024^3 ≈ 1.86 GiB (exactly 2 GB in decimal units, where 1 GB = 10^9 bytes)
Depending on whether you use 32-bit or 16-bit precision, a model with 1 billion parameters would use approximately 3.73 GiB or 1.86 GiB of memory, respectively (4 GB or 2 GB in decimal units).
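The arithmetic above can be reproduced with a few lines of Python. This is only a minimal sketch: the names `BYTES_PER_PARAM`, `model_size_bytes`, and `bytes_to_gib` are made up for illustration and do not come from any library. It assumes the gigabyte figures are obtained by dividing by 1024^3, which is what yields 3.73 and 1.86.

```python
# Bytes needed to store one parameter at each precision (from the text above).
BYTES_PER_PARAM = {"fp32": 4, "fp16": 2}


def model_size_bytes(num_params: int, precision: str) -> int:
    """Memory for the parameters alone: number of parameters x bytes per parameter."""
    return num_params * BYTES_PER_PARAM[precision]


def bytes_to_gib(num_bytes: int) -> float:
    """Convert bytes to binary gigabytes (1 GiB = 1024**3 bytes)."""
    return num_bytes / 1024**3


if __name__ == "__main__":
    num_params = 1_000_000_000  # the 1-billion-parameter example
    for precision in ("fp32", "fp16"):
        size = model_size_bytes(num_params, precision)
        print(f"{precision}: {size:,} bytes = {bytes_to_gib(size):.2f} GiB")
```

To estimate another model, change `num_params`; for example, 7_000_000_000 for a 7-billion-parameter model.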