ajibawa-2023 committed
Commit b6a4440
1 Parent(s): ad623db

Update README.md

Files changed (1)
  1. README.md +3 -1
README.md CHANGED
@@ -12,12 +12,14 @@ Large Language Models (LLMs) are good at code generation.
  This is what I have tried here. The base Llama-2 model was used for training. It is trained on around 74,000 sets of code, with each set containing 2 conversations.
  Along with Python, code in Java, JavaScript, Go, C++, Rust, etc., together with detailed explanations, was used for training. It builds upon my existing dataset [Python-Code-23k-ShareGPT](https://huggingface.co/datasets/ajibawa-2023/Python-Code-23k-ShareGPT).
  The conversations are in Vicuna/ShareGPT format, and each set pairs code with a detailed explanation (a sketch of the format follows the diff).
- I have released the new [data](https://huggingface.co/datasets/ajibawa-2023/Python-Code-23k-ShareGPT).
+
+ I have released the new data [Code-74k-ShareGPT](https://huggingface.co/datasets/ajibawa-2023/Code-74k-ShareGPT), on which this model is trained.
 
  **Training:**
 
  The entire dataset was trained on Azure 4 x A100 80GB GPUs. Training for 3 epochs took 42 hours, using the DeepSpeed codebase (an illustrative configuration follows the diff). The base model is Meta's Llama-2, as noted above.
 
+
  This is a fully fine-tuned model. Links to quantized models will be released soon.
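For context on the Vicuna/ShareGPT format mentioned in the diff, below is a minimal sketch of what one training record typically looks like. The field names ("conversations", "from", "value") follow the common ShareGPT convention, and the example content is invented; the exact schema of Code-74k-ShareGPT may differ.

```python
# A minimal sketch of one ShareGPT-style record, as commonly used for
# Vicuna-format training data. Field names and content are assumptions
# based on the usual ShareGPT convention, not taken from the
# Code-74k-ShareGPT dataset itself.
record = {
    "id": "example-0",
    "conversations": [
        {"from": "human",
         "value": "Write a Python function that reverses a string."},
        {"from": "gpt",
         "value": ("def reverse_string(s):\n"
                   "    return s[::-1]\n\n"
                   "Explanation: slicing with a step of -1 walks the "
                   "string backwards, returning a reversed copy.")},
    ],
}
```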
 
 
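The training notes name DeepSpeed and 4 x A100 80GB but publish no configuration, so the sketch below is only illustrative: a minimal ZeRO stage 2 style config of the kind commonly used for full fine-tunes at this scale. Every value, and the train.py script in the launch comment, is an assumption.

```python
import json

# Illustrative DeepSpeed config for a full fine-tune on 4 x A100 80GB.
# All values are assumptions; the actual hyperparameters used for this
# model were not published.
ds_config = {
    "train_micro_batch_size_per_gpu": 4,
    "gradient_accumulation_steps": 8,   # 4 GPUs * 4 * 8 = 128 effective batch
    "bf16": {"enabled": True},          # A100s support bfloat16
    "zero_optimization": {"stage": 2},  # shard optimizer state and gradients
    "gradient_clipping": 1.0,
}

with open("ds_config.json", "w") as f:
    json.dump(ds_config, f, indent=2)

# Hypothetical launch, where train.py is a placeholder training script:
#   deepspeed --num_gpus=4 train.py --deepspeed ds_config.json
```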
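Since this is a full (unquantized) fine-tune trained on Vicuna/ShareGPT-style conversations, it should load through the standard transformers API. The sketch below is hypothetical: the repo id is a placeholder (this section never names the model), and the Vicuna-style prompt template is assumed from the training format.

```python
from transformers import AutoModelForCausalLM, AutoTokenizer

# Placeholder: this section does not name the model's repo id.
MODEL_ID = "ajibawa-2023/<this-model>"

tokenizer = AutoTokenizer.from_pretrained(MODEL_ID)
model = AutoModelForCausalLM.from_pretrained(MODEL_ID, device_map="auto")

# Vicuna-style prompt, assumed from the ShareGPT training format.
prompt = ("A chat between a curious user and an artificial intelligence "
          "assistant.\n"
          "USER: Write a Python function that checks whether a number is prime.\n"
          "ASSISTANT:")

inputs = tokenizer(prompt, return_tensors="pt").to(model.device)
output = model.generate(**inputs, max_new_tokens=256)
print(tokenizer.decode(output[0], skip_special_tokens=True))
```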