Commit b33caad
Parent(s): 6d990a4

Upload 12 files
- AIRA_FineTuning.ipynb +0 -0
- Aira_emissions.csv +2 -0
- README.md +18 -13
- config.json +1 -1
- generation_config.json +1 -1
- pytorch_model.bin +2 -2
- training_stats.parquet +2 -2
AIRA_FineTuning.ipynb
CHANGED
The diff for this file is too large to render.
Aira_emissions.csv
ADDED
@@ -0,0 +1,2 @@
+timestamp,project_name,run_id,duration,emissions,emissions_rate,cpu_power,gpu_power,ram_power,cpu_energy,gpu_energy,ram_energy,energy_consumed,country_name,country_iso_code,region,cloud_provider,cloud_region,os,python_version,codecarbon_version,cpu_count,cpu_model,gpu_count,gpu_model,longitude,latitude,ram_total_size,tracking_mode,on_cloud,pue
+2023-06-14T18:37:11,Aira_emissions,2a7c2a22-a4f1-4aad-8c1c-34d5199048e4,8730.918614387512,0.002277991408004105,2.609108512648654e-07,42.5,292.795,31.30528450012207,0.10307311259690251,0.7793010973870445,0.07588489401643905,0.9582591040003862,Canada,CAN,quebec,,,Linux-5.15.107+-x86_64-with-glibc2.31,3.10.12,2.2.3,12,Intel(R) Xeon(R) CPU @ 2.20GHz,1,1 x NVIDIA A100-SXM4-40GB,-71.2,46.8,83.48075866699219,machine,N,1.0
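The new CSV is a one-row CodeCarbon log; the emissions and energy figures quoted in the README below match it. A minimal sketch for inspecting it with pandas (assumes `pandas` is installed and the file is read from the repository root):

```python
import pandas as pd

# Load the CodeCarbon log added in this commit.
emissions = pd.read_csv("Aira_emissions.csv")

# Fields surfaced in the README: runtime (s), total energy (kWh), and emissions (kg CO2eq.).
print(emissions[["duration", "energy_consumed", "emissions"]].iloc[0])
```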
README.md
CHANGED
@@ -3,6 +3,8 @@ license: apache-2.0
 datasets:
 - nicholasKluge/fine-tuning-instruct-aira
 - Dahoas/synthetic-instruct-gptj-pairwise
+- databricks/databricks-dolly-15k
+- HuggingFaceH4/instruction-dataset
 language:
 - en
 metrics:
@@ -16,14 +18,14 @@ tags:
 - assistant
 pipeline_tag: text-generation
 widget:
-- text:
-  example_title:
-- text:
-  example_title:
-- text:
-  example_title:
-- text:
-  example_title:
+- text: <|startoftext|>Hello! What is your name?<|endoftext|>
+  example_title: Greetings
+- text: <|startoftext|>Can you explain what is Machine Learning?<|endoftext|>
+  example_title: Machine Learning
+- text: <|startoftext|>Do you know anything about virtue ethics?<|endoftext|>
+  example_title: Ethics
+- text: <|startoftext|>How can I make my girlfried happy?<|endoftext|>
+  example_title: Advise
 inference:
   parameters:
     temperature: 0.2
@@ -34,7 +36,9 @@ inference:
 
 `Aira-Instruct-774M` is a instruction-tuned GPT-style model based on [GPT-2](https://huggingface.co/gpt2). The model was trained with a dataset composed of `prompt`, `completions`, generated via the [Self-Instruct](https://github.com/yizhongw/self-instruct) framework. `Aira-Instruct-774M` instruction-tuning was achieved via conditional text generation.
 
-The dataset used to train this model combines
+The dataset used to train this model combines the following sources of data: the [`synthetic-instruct-gptj-pairwise`](https://huggingface.co/datasets/Dahoas/synthetic-instruct-gptj-pairwise) dataset, the [`databricks_dolly_15k`](https://huggingface.co/datasets/HuggingFaceH4/databricks_dolly_15k) dataset, the [`instruction-dataset`](https://huggingface.co/datasets/HuggingFaceH4/instruction-dataset) dataset, and a subset of [Aira's](https://github.com/Nkluge-correa/Aira-EXPERT) fine-tuning dataset, focused on Q&A related to Ethics, AI, AI safety, and other related topics. The dataset is available in both Portuguese and English.
+
+Check our gradio-demo in [Spaces](https://huggingface.co/spaces/nicholasKluge/Aira-Demo).
 
 ## Details
 
@@ -45,17 +49,19 @@ The dataset used to train this model combines two main sources of data: the [`sy
 - **Batch size:** 8
 - **Optimizer:** `torch.optim.AdamW` (warmup_steps = 1e2, learning_rate = 5e-4, epsilon = 1e-8)
 - **GPU:** 1 NVIDIA A100-SXM4-40GB
+- **Emissions:** 0.0022 KgCO2 (Canada)
+- **Total Energy Consumption:** 0.95 kWh
 
 | Epoch/Loss|Training|Validation|
 |---|---|---|
-| 1 |0.
-| 2 |0.
+| 1 |0.687266|0.616128|
+| 2 |0.468581|0.582550|
 
 This repository has the notebook used to train this model.
 
 ## Usage
 
-Two special tokens are used to mark the user side of the interaction and the model's response:
+Two special tokens are used to mark the user side of the interaction and the model's response:
 
 `<|startoftext|>`What is a language model?`<|endoftext|>`A language model is a probability distribution over a vocabulary.`<|endoftext|>`
 
@@ -89,7 +95,6 @@ responses = aira.generate(**inputs,
 print(f"Question: 👤 {question}\n")
 
 for i, response in enumerate(responses):
-  # print only the response and remove the question
   print(f'Response {i+1}: 🤖 {tokenizer.decode(response, skip_special_tokens=True).replace(question, "")}')
 ```
 
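The README's usage snippet is only partially visible in the diff (the tail of the generation loop). A minimal end-to-end sketch of the flow it implies — load the model, wrap the question in the two special tokens, generate, and strip the echoed prompt — assuming the hub id `nicholasKluge/Aira-Instruct-774M` and illustrative generation parameters (only the widget's `temperature: 0.2` is taken from the card):

```python
from transformers import AutoModelForCausalLM, AutoTokenizer

# Hub id assumed from the repository name; adjust if the model lives elsewhere.
model_id = "nicholasKluge/Aira-Instruct-774M"
tokenizer = AutoTokenizer.from_pretrained(model_id)
aira = AutoModelForCausalLM.from_pretrained(model_id)

question = "What is a language model?"

# The card marks the user turn as <|startoftext|> ... <|endoftext|>.
inputs = tokenizer("<|startoftext|>" + question + "<|endoftext|>", return_tensors="pt")

responses = aira.generate(**inputs,
                          do_sample=True,
                          temperature=0.2,      # matches the widget's inference parameters
                          max_new_tokens=200,   # illustrative
                          num_return_sequences=2)

print(f"Question: 👤 {question}\n")

for i, response in enumerate(responses):
    # Decode and drop the echoed question so only the answer is shown.
    print(f'Response {i+1}: 🤖 {tokenizer.decode(response, skip_special_tokens=True).replace(question, "")}')
```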
config.json
CHANGED
@@ -33,7 +33,7 @@
     }
   },
   "torch_dtype": "float32",
-  "transformers_version": "4.
+  "transformers_version": "4.30.2",
   "use_cache": true,
   "vocab_size": 50259
 }
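As a quick sanity check, the updated config can be loaded without downloading the weights; a sketch assuming the same hub id as above. GPT-2's base vocabulary has 50257 entries, so the 50259 here presumably reflects the extra tokens added for the chat format.

```python
from transformers import AutoConfig

# Hub id assumed; a local clone of this repository works as well.
config = AutoConfig.from_pretrained("nicholasKluge/Aira-Instruct-774M")

print(config.vocab_size)             # 50259 per this commit's config.json
print(config.transformers_version)   # "4.30.2" after this commit
```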
generation_config.json
CHANGED
@@ -2,5 +2,5 @@
   "_from_model_config": true,
   "bos_token_id": 50256,
   "eos_token_id": 50256,
-  "transformers_version": "4.
+  "transformers_version": "4.30.2"
 }
pytorch_model.bin
CHANGED
@@ -1,3 +1,3 @@
 version https://git-lfs.github.com/spec/v1
-oid sha256:
-size
+oid sha256:4cc730a83ca89468d37553705b0fc07a5e20bc643a43e8dce7ccba4e93fa6a68
+size 3096272925
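The checkpoint itself is stored through Git LFS, so the file in Git history is just this pointer. A small sketch (standard library only) for checking that a downloaded `pytorch_model.bin` matches the pointer's size and SHA-256:

```python
import hashlib
import os

path = "pytorch_model.bin"  # the downloaded weights, not the LFS pointer file

# Size recorded in the pointer added by this commit (~3.1 GB).
assert os.path.getsize(path) == 3096272925

# Hash in chunks to avoid holding the whole checkpoint in memory.
sha256 = hashlib.sha256()
with open(path, "rb") as f:
    for chunk in iter(lambda: f.read(1 << 20), b""):
        sha256.update(chunk)

assert sha256.hexdigest() == "4cc730a83ca89468d37553705b0fc07a5e20bc643a43e8dce7ccba4e93fa6a68"
print("pytorch_model.bin matches the LFS pointer")
```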
training_stats.parquet
CHANGED
@@ -1,3 +1,3 @@
 version https://git-lfs.github.com/spec/v1
-oid sha256:
-size
+oid sha256:67af53f8de8a9927257116a5c8c06d33e757d57c2f2f8f4f5428060af02d0f88
+size 3041
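The epoch/loss table in the README presumably comes from this small parquet file. A sketch for inspecting it, assuming pandas plus a parquet engine (pyarrow or fastparquet); the column names are not visible in the diff, so nothing is assumed about them:

```python
import pandas as pd

# Read the training statistics uploaded in this commit.
stats = pd.read_parquet("training_stats.parquet")

# The diff only shows the LFS pointer, so list whatever columns are actually there.
print(stats.columns.tolist())
print(stats)
```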