amezasor committed on
Commit 8f3d6d6 · verified · 1 Parent(s): 2cd8b9f

data update

Files changed (1)
  1. README.md +4 -2
README.md CHANGED
@@ -208,7 +208,7 @@ model-index:
  # Granite-3.0-1B-A400M-Base
 
  ## Model Summary
- **Granite-3.0-1B-A400M-Base** is an open-source decoder-only language model from IBM Research that supports a variety of text-to-text generation tasks (e.g., question-answering, text-completion). **Granite-3.0-1B-A400M-Base** is trained from scratch and follows a two-phase training strategy. In the first phase, it is trained on 8 trillion tokens sourced from diverse domains, including natural language, math, code, and safety. During the second phase, it is further trained on 2 trillion tokens using a carefully curated mix of high-quality data, aiming to enhance its performance on specific tasks.
+ **Granite-3.0-1B-A400M-Base** is an open-source decoder-only language model from IBM Research that supports a variety of text-to-text generation tasks (e.g., question-answering, text-completion). **Granite-3.0-1B-A400M-Base** is trained from scratch and follows a two-phase training strategy. In the first phase, it is trained on 8 trillion tokens sourced from diverse domains. During the second phase, it is further trained on 2 trillion tokens using a carefully curated mix of high-quality data, aiming to enhance its performance on specific tasks.
 
  - **Developers:** IBM Research
  - **GitHub Repository:** [ibm-granite/granite-language-models](https://github.com/ibm-granite/granite-language-models)
@@ -279,7 +279,9 @@ print(output)
 
  <!-- TO DO: To be completed once the paper is ready -->
  ## Training Data
- This model is trained on a mix of open-source and proprietary datasets.
+ This model is trained on a mix of open-source and proprietary data following a two-phase training strategy.
+ * Phase 1 data: The data for phase 1 is sourced from diverse domains, such as: web, code, academic sources, books, and math data.
+ * Phase 2 data: The data for phase 2 comprises a curated mix of high-quality data from the same domains, plus multilingual and instruction data. The goal of this second training phase is to enhance the model’s performance on specific tasks.
 
  ## Infrastructure
  We train the Granite Language models using IBM's super computing cluster, Blue Vela, which is outfitted with NVIDIA H100 GPUs. This cluster provides a scalable and efficient infrastructure for training our models over thousands of GPUs.