data update
README.md
CHANGED
@@ -208,7 +208,7 @@ model-index:
 # Granite-3.0-1B-A400M-Base
 
 ## Model Summary
-**Granite-3.0-1B-A400M-Base** is an open-source decoder-only language model from IBM Research that supports a variety of text-to-text generation tasks (e.g., question-answering, text-completion). **Granite-3.0-1B-A400M-Base** is trained from scratch and follows a two-phase training strategy. In the first phase, it is trained on 8 trillion tokens sourced from diverse domains
+**Granite-3.0-1B-A400M-Base** is an open-source decoder-only language model from IBM Research that supports a variety of text-to-text generation tasks (e.g., question-answering, text-completion). **Granite-3.0-1B-A400M-Base** is trained from scratch and follows a two-phase training strategy. In the first phase, it is trained on 8 trillion tokens sourced from diverse domains. During the second phase, it is further trained on 2 trillion tokens using a carefully curated mix of high-quality data, aiming to enhance its performance on specific tasks.
 
 - **Developers:** IBM Research
 - **GitHub Repository:** [ibm-granite/granite-language-models](https://github.com/ibm-granite/granite-language-models)
@@ -279,7 +279,9 @@ print(output)
 
 <!-- TO DO: To be completed once the paper is ready -->
 ## Training Data
-This model is trained on a mix of open-source and proprietary
+This model is trained on a mix of open-source and proprietary data following a two-phase training strategy.
+* Phase 1 data: The data for phase 1 is sourced from diverse domains, such as: web, code, academic sources, books, and math data.
+* Phase 2 data: The data for phase 2 comprises a curated mix of high-quality data from the same domains, plus multilingual and instruction data. The goal of this second training phase is to enhance the model’s performance on specific tasks.
 
 ## Infrastructure
 We train the Granite Language models using IBM's super computing cluster, Blue Vela, which is outfitted with NVIDIA H100 GPUs. This cluster provides a scalable and efficient infrastructure for training our models over thousands of GPUs.