shanearora committed • Commit 501c498 • Parent(s): 704314f

Update README.md

README.md CHANGED
@@ -168,7 +168,7 @@ Both stages contribute equally to the final performance of the OLMo model. After
 OLMo 7B architecture with peer models for comparison.
 
 | | **OLMo 7B July 2024** | [OLMo 1.0 7B](https://huggingface.co/allenai/OLMo-7B-hf) | [Llama 2 7B](https://huggingface.co/meta-llama/Llama-2-7b) | [OpenLM 7B](https://laion.ai/blog/open-lm/) | [Falcon 7B](https://huggingface.co/tiiuae/falcon-7b) | PaLM 8B |
-
+|------------------------|-------------------|-------------------|---------------------|--------------------|--------------------|------------------|
 | d_model | 4096 | 4096 | 4096 | 4096 | 4544 | 4096 |
 | num heads | 32 | 32 | 32 | 32 | 71 | 16 |
 | num layers | 32 | 32 | 32 | 32 | 32 | 32 |
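The architecture rows above can be cross-checked against the published checkpoint config. A minimal sketch, assuming a recent transformers release with native OLMo support and that the [OLMo 1.0 7B](https://huggingface.co/allenai/OLMo-7B-hf) repo linked in the table exposes the standard config fields:

```python
# Hypothetical cross-check of the architecture table: load the published config
# for OLMo 1.0 7B (linked above) and print the fields the table reports.
from transformers import AutoConfig

config = AutoConfig.from_pretrained("allenai/OLMo-7B-hf")

print("d_model    :", config.hidden_size)          # table says 4096
print("num heads  :", config.num_attention_heads)  # table says 32
print("num layers :", config.num_hidden_layers)    # table says 32
```

Substituting the July 2024 model's repo id should report the same three values, per the table.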
@@ -197,7 +197,7 @@ AdamW optimizer parameters are shown below.
 Optimizer settings comparison with peer models.
 
 | | **OLMo 7B July 2024** | [OLMo 1.0 7B](https://huggingface.co/allenai/OLMo-7B-hf) | [Llama 2 7B](https://huggingface.co/meta-llama/Llama-2-7b) | [OpenLM 7B](https://laion.ai/blog/open-lm/) | [Falcon 7B](https://huggingface.co/tiiuae/falcon-7b) |
-
+|-----------------------|------------------|------------------|---------------------|--------------------|--------------------|
 | warmup steps | 2500 | 5000 | 2000 | 2000 | 1000 |
 | peak LR | 3.0E-04 | 3.0E-04 | 3.0E-04 | 3.0E-04 | 6.0E-04 |
 | minimum LR | 3.0E-05 | 3.0E-05 | 3.0E-05 | 3.0E-05 | 1.2E-05 |
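To make the warmup/peak/minimum numbers in the optimizer table concrete, here is an illustrative schedule sketch. The table does not state the schedule shape, so linear warmup followed by cosine decay to the minimum LR is assumed, and the total step count is a placeholder:

```python
import math

# Illustrative only: warmup steps, peak LR, and minimum LR come from the
# OLMo 7B July 2024 column; the schedule shape and TOTAL_STEPS are assumptions.
WARMUP_STEPS = 2_500   # warmup steps
PEAK_LR = 3.0e-4       # peak LR
MIN_LR = 3.0e-5        # minimum LR
TOTAL_STEPS = 100_000  # hypothetical run length, not from the table

def learning_rate(step: int) -> float:
    """Linear warmup to PEAK_LR, then cosine decay down to MIN_LR."""
    if step < WARMUP_STEPS:
        return PEAK_LR * (step + 1) / WARMUP_STEPS
    progress = (step - WARMUP_STEPS) / max(1, TOTAL_STEPS - WARMUP_STEPS)
    return MIN_LR + 0.5 * (PEAK_LR - MIN_LR) * (1.0 + math.cos(math.pi * progress))

# Print a few points of the assumed schedule.
for step in (0, 2_499, 10_000, 50_000, TOTAL_STEPS - 1):
    print(f"step {step:>6}: lr = {learning_rate(step):.2e}")
```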
@@ -212,7 +212,7 @@ Optimizer settings comparison with peer models.
 
 
 
-## Environmental Impact
+<!-- ## Environmental Impact
 
 OLMo 7B variants were either trained on MI250X GPUs at the LUMI supercomputer, or A100-40GB GPUs provided by MosaicML.
 A summary of the environmental impact. Further details are available in the paper.
@@ -220,7 +220,7 @@ A summary of the environmental impact. Further details are available in the paper.
 | | GPU Type | Power Consumption From GPUs | Carbon Intensity (kg CO₂e/KWh) | Carbon Emissions (tCO₂eq) |
 |-----------|------------|-----------------------------|--------------------------------|---------------------------|
 | OLMo 7B Twin | MI250X ([LUMI supercomputer](https://www.lumi-supercomputer.eu)) | 135 MWh | 0* | 0* |
-| OLMo 7B | A100-40GB ([MosaicML](https://www.mosaicml.com)) | 104 MWh | 0.656 | 75.05 |
+| OLMo 7B | A100-40GB ([MosaicML](https://www.mosaicml.com)) | 104 MWh | 0.656 | 75.05 | -->
 
 ## Bias, Risks, and Limitations
 
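As a rough arithmetic check on the (now commented-out) emissions row: 104 MWh of GPU energy at 0.656 kg CO₂e/kWh reproduces the reported 75.05 tCO₂eq only if an additional datacenter overhead (PUE) of about 1.1 is assumed; that factor is an assumption of this sketch, not a value stated in the table:

```python
# Rough consistency check of the emissions row above. The PUE factor is an
# assumption introduced here to account for datacenter overhead beyond the
# GPUs; it is not stated in the table.
GPU_ENERGY_MWH = 104      # "Power Consumption From GPUs" for OLMo 7B
CARBON_INTENSITY = 0.656  # kg CO2e per kWh
ASSUMED_PUE = 1.1         # hypothetical datacenter overhead factor

emissions_t = GPU_ENERGY_MWH * 1_000 * ASSUMED_PUE * CARBON_INTENSITY / 1_000
print(f"{emissions_t:.2f} tCO2eq")  # ~75.05, matching the table's figure
```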