File size: 6,300 Bytes
07423df |
1 2 3 4 5 6 7 8 9 10 11 12 13 14 15 16 17 18 19 20 21 22 23 24 25 26 27 28 29 30 31 32 33 34 35 36 37 38 39 40 41 42 43 44 45 46 47 48 49 50 51 52 53 54 55 56 57 58 59 60 61 62 63 64 65 66 67 68 69 70 71 72 73 74 75 76 77 78 79 80 81 82 83 84 85 86 87 88 89 90 91 92 93 94 95 96 97 98 99 100 101 102 103 104 105 106 107 108 109 110 111 112 113 114 115 116 117 118 119 120 121 122 123 124 125 126 127 128 129 130 131 132 133 134 135 136 137 138 139 140 141 142 143 144 145 146 147 148 149 150 151 152 153 154 155 156 157 158 159 160 161 162 163 164 165 166 167 168 169 170 |
---
description: Setting up and runnning H2O LLM Studio requires the following minimal prerequisites. This page lists out the speed and performance metrics of H2O LLM Studio based on different hardware setups.
---
# H2O LLM Studio performance
Setting up and runnning H2O LLM Studio requires the following minimal [prerequisites](set-up-llm-studio.md#prerequisites). This page lists out the speed and performance metrics of H2O LLM Studio based on different hardware setups.
The following metrics were measured.
- **Hardware setup:** The type and number of computing devices used to train the model.
- **LLM backbone:** The underlying architecture of the language model. For more information, see [LLM backbone](concepts.md#llm-backbone).
- **Quantization:** A technique used to reduce the size and memory requirements of the model. For more information, see [Quantization](concepts.md#quantization).
- **Train**: The amount of time it took to train the model in hours and minutes.
- **Validation:** The amount of time it took to validate the mode in hours and minutes.
| Hardware setup | LLM backbone | Quantization | Train (hh:mm:ss)| Validation (hh:mm:ss) |
|---|---|---|---|---|
| 8xA10G | h2oai/h2ogpt-4096-llama2-7b | bfloat16 | 11:35 | 3:32 |
| 4xA10G | h2oai/h2ogpt-4096-llama2-7b | bfloat16 | 21:13 | 06:35 |
| 2xA10G | h2oai/h2ogpt-4096-llama2-7b | bfloat16 | 37:04 | 12:21 |
| 1xA10G | h2oai/h2ogpt-4096-llama2-7b | bfloat16 | 1:25:29 | 15:50 |
| 8xA10G | h2oai/h2ogpt-4096-llama2-7b | nf4 | 14:26 | 06:13 |
| 4xA10G | h2oai/h2ogpt-4096-llama2-7b | nf4 | 26:55 | 11:59 |
| 2xA10G | h2oai/h2ogpt-4096-llama2-7b | nf4 | 48:24 | 23:37 |
| 1xA10G | h2oai/h2ogpt-4096-llama2-7b | nf4 | 1:26:59 | 42:17 |
| 8xA10G | h2oai/h2ogpt-4096-llama2-13b | bfloat16 | OOM | OOM |
| 4xA10G | h2oai/h2ogpt-4096-llama2-13b | bfloat16 | OOM | OOM |
| 2xA10G | h2oai/h2ogpt-4096-llama2-13b | bfloat16 | OOM | OOM |
| 1xA10G | h2oai/h2ogpt-4096-llama2-13b | bfloat16 | OOM | OOM |
| 8xA10G | h2oai/h2ogpt-4096-llama2-13b | nf4 | 25:07 | 10:58 |
| 4xA10G | h2oai/h2ogpt-4096-llama2-13b | nf4 | 48:43 | 21:25 |
| 2xA10G | h2oai/h2ogpt-4096-llama2-13b | nf4 | 1:30:45 | 42:06 |
| 1xA10G | h2oai/h2ogpt-4096-llama2-13b | nf4 | 2:44:36 | 1:14:20 |
| 8xA10G | h2oai/h2ogpt-4096-llama2-70b | nf4 | OOM | OOM |
| 4xA10G | h2oai/h2ogpt-4096-llama2-70b | nf4 | OOM | OOM |
| 2xA10G | h2oai/h2ogpt-4096-llama2-70b | nf4 | OOM | OOM |
| 1xA10G | h2oai/h2ogpt-4096-llama2-70b | nf4 | OOM | OOM |
|---|---|---|---|---|
| 4xA100 80GB | h2oai/h2ogpt-4096-llama2-7b | bfloat16 | 7:04 | 3:55 |
| 2xA100 80GB | h2oai/h2ogpt-4096-llama2-7b | bfloat16 | 13:14 | 7:23 |
| 1xA100 80GB | h2oai/h2ogpt-4096-llama2-7b | bfloat16 | 23:36 | 13:25 |
| 4xA100 80GB | h2oai/h2ogpt-4096-llama2-7b | nf4 | 9:44 | 6:30 |
| 2xA100 80GB | h2oai/h2ogpt-4096-llama2-7b | nf4 | 18:34 | 12:16 |
| 1xA100 80GB | h2oai/h2ogpt-4096-llama2-7b | nf4 | 34:06 | 21:51 |
| 4xA100 80GB | h2oai/h2ogpt-4096-llama2-13b | bfloat16 | 11:46 | 5:56 |
| 2xA100 80GB | h2oai/h2ogpt-4096-llama2-13b | bfloat16 | 21:54 | 11:17 |
| 1xA100 80GB | h2oai/h2ogpt-4096-llama2-13b | bfloat16 | 39:10 | 18:55 |
| 4xA100 80GB | h2oai/h2ogpt-4096-llama2-13b | nf4 | 16:51 | 10:35 |
| 2xA100 80GB | h2oai/h2ogpt-4096-llama2-13b | nf4 | 32:05 | 21:00 |
| 1xA100 80GB | h2oai/h2ogpt-4096-llama2-13b | nf4 | 59:11 | 36:53 |
| 4xA100 80GB | h2oai/h2ogpt-4096-llama2-70b | nf4 | 1:13:33 | 46:02 |
| 2xA100 80GB | h2oai/h2ogpt-4096-llama2-70b | nf4 | 2:20:44 | 1:33:42 |
| 1xA100 80GB | h2oai/h2ogpt-4096-llama2-70b | nf4 | 4:23:57 | 2:44:51 |
:::info
The runtimes were gathered using the default parameters.
<details>
<summary>Expand to see the default parameters </summary>
```
architecture:
backbone_dtype: int4
force_embedding_gradients: false
gradient_checkpointing: true
intermediate_dropout: 0.0
pretrained: true
pretrained_weights: ''
augmentation:
random_parent_probability: 0.0
skip_parent_probability: 0.0
token_mask_probability: 0.0
dataset:
add_eos_token_to_answer: true
add_eos_token_to_prompt: true
add_eos_token_to_system: true
answer_column: output
chatbot_author: H2O.ai
chatbot_name: h2oGPT
data_sample: 1.0
data_sample_choice:
- Train
- Validation
limit_chained_samples: false
mask_prompt_labels: true
parent_id_column: None
personalize: false
prompt_column:
- instruction
system_column: None
text_answer_separator: <|answer|>
text_prompt_start: <|prompt|>
text_system_start: <|system|>
train_dataframe: /data/user/oasst/train_full.pq
validation_dataframe: None
validation_size: 0.01
validation_strategy: automatic
environment:
compile_model: false
find_unused_parameters: false
gpus:
- '0'
- '1'
- '2'
- '3'
- '4'
- '5'
- '6'
- '7'
huggingface_branch: main
mixed_precision: true
number_of_workers: 8
seed: -1
trust_remote_code: true
use_fsdp: false
experiment_name: default-8-a10g
llm_backbone: h2oai/h2ogpt-4096-llama2-7b
logging:
logger: None
neptune_project: ''
output_directory: /output/...
prediction:
batch_size_inference: 0
do_sample: false
max_length_inference: 256
metric: BLEU
metric_gpt_model: gpt-3.5-turbo-0301
metric_gpt_template: general
min_length_inference: 2
num_beams: 1
num_history: 4
repetition_penalty: 1.2
stop_tokens: ''
temperature: 0.3
top_k: 0
top_p: 1.0
problem_type: text_causal_language_modeling
tokenizer:
add_prompt_answer_tokens: false
max_length: 512
max_length_answer: 256
max_length_prompt: 256
padding_quantile: 1.0
use_fast: true
training:
batch_size: 2
differential_learning_rate: 1.0e-05
differential_learning_rate_layers: []
drop_last_batch: true
epochs: 1
evaluate_before_training: false
evaluation_epochs: 1.0
grad_accumulation: 1
gradient_clip: 0.0
learning_rate: 0.0001
lora: true
lora_alpha: 16
lora_dropout: 0.05
lora_r: 4
lora_target_modules: ''
loss_function: TokenAveragedCrossEntropy
optimizer: AdamW
save_best_checkpoint: false
schedule: Cosine
train_validation_data: false
warmup_epochs: 0.0
weight_decay: 0.0
```
</details>
::: |