ibm-granite
/

granite-20b-code-instruct-8k

@@ -1,6 +1,6 @@
 ---
 pipeline_tag: text-generation
-base_model: ibm-granite/granite-20b-code-base
 inference: true
 license: apache-2.0
 datasets:
@@ -19,7 +19,7 @@ tags:
 - code
 - granite
 model-index:
-- name: granite-20b-code-instruct
   results:
   - task:
       type: text-generation
@@ -205,10 +205,10 @@ model-index:
 ![image/png](https://cdn-uploads.huggingface.co/production/uploads/62cd5057674cdb524450093d/1hzxoPwqkBJXshKVVe6_9.png)
-# Granite-20B-Code-Instruct
 ## Model Summary
-**Granite-20B-Code-Instruct** is a 20B parameter model fine tuned from *Granite-20B-Code-Base* on a combination of **permissively licensed** instruction data to enhance instruction following capabilities including logical reasoning and problem-solving skills.
 - **Developers:** IBM Research
 - **GitHub Repository:** [ibm-granite/granite-code-models](https://github.com/ibm-granite/granite-code-models)
@@ -223,13 +223,13 @@ The model is designed to respond to coding related instructions and can be used
 <!-- TO DO: Check starcoder2 instruct code example that includes the template https://huggingface.co/bigcode/starcoder2-15b-instruct-v0.1 -->
 ### Generation
-This is a simple example of how to use **Granite-20B-Code-Instruct** model.
 ```python
 import torch
 from transformers import AutoModelForCausalLM, AutoTokenizer
 device = "cuda" # or "cpu"
-model_path = "ibm-granite/granite-20b-code-instruct"
 tokenizer = AutoTokenizer.from_pretrained(model_path)
 # drop device_map if running on CPU
 model = AutoModelForCausalLM.from_pretrained(model_path, device_map=device)
@@ -265,4 +265,4 @@ Granite Code Instruct models are trained on the following types of data.
 We train the Granite Code models using two of IBM's super computing clusters, namely Vela and Blue Vela, both outfitted with NVIDIA A100 and H100 GPUs respectively. These clusters provide a scalable and efficient infrastructure for training our models over thousands of GPUs.
 ## Ethical Considerations and Limitations
-Granite code instruct models are primarily finetuned using instruction-response pairs across a specific set of programming languages. Thus, their performance may be limited with out-of-domain programming languages. In this situation, it is beneficial providing few-shot examples to steer the model's output. Moreover, developers should perform safety testing and target-specific tuning before deploying these models on critical applications. The model also inherits ethical considerations and limitations from its base model. For more information, please refer to *[Granite-20B-Code-Base](https://huggingface.co/ibm-granite/granite-20b-code-base)* model card.

 ---
 pipeline_tag: text-generation
+base_model: ibm-granite/granite-20b-code-base-8k
 inference: true
 license: apache-2.0
 datasets:
 - code
 - granite
 model-index:
+- name: granite-20b-code-instruct-8k
   results:
   - task:
       type: text-generation
 ![image/png](https://cdn-uploads.huggingface.co/production/uploads/62cd5057674cdb524450093d/1hzxoPwqkBJXshKVVe6_9.png)
+# Granite-20B-Code-Instruct-8K
 ## Model Summary
+**Granite-20B-Code-Instruct-8K** is a 20B parameter model fine tuned from *Granite-20B-Code-Base-8K* on a combination of **permissively licensed** instruction data to enhance instruction following capabilities including logical reasoning and problem-solving skills.
 - **Developers:** IBM Research
 - **GitHub Repository:** [ibm-granite/granite-code-models](https://github.com/ibm-granite/granite-code-models)
 <!-- TO DO: Check starcoder2 instruct code example that includes the template https://huggingface.co/bigcode/starcoder2-15b-instruct-v0.1 -->
 ### Generation
+This is a simple example of how to use **Granite-20B-Code-Instruct-8K** model.
 ```python
 import torch
 from transformers import AutoModelForCausalLM, AutoTokenizer
 device = "cuda" # or "cpu"
+model_path = "ibm-granite/granite-20b-code-instruct-8k"
 tokenizer = AutoTokenizer.from_pretrained(model_path)
 # drop device_map if running on CPU
 model = AutoModelForCausalLM.from_pretrained(model_path, device_map=device)
 We train the Granite Code models using two of IBM's super computing clusters, namely Vela and Blue Vela, both outfitted with NVIDIA A100 and H100 GPUs respectively. These clusters provide a scalable and efficient infrastructure for training our models over thousands of GPUs.
 ## Ethical Considerations and Limitations
+Granite code instruct models are primarily finetuned using instruction-response pairs across a specific set of programming languages. Thus, their performance may be limited with out-of-domain programming languages. In this situation, it is beneficial providing few-shot examples to steer the model's output. Moreover, developers should perform safety testing and target-specific tuning before deploying these models on critical applications. The model also inherits ethical considerations and limitations from its base model. For more information, please refer to *[Granite-20B-Code-Base-8K](https://huggingface.co/ibm-granite/granite-20b-code-base-8k)* model card.