---
license: cc-by-4.0
tags:
- alignment
- value alignment
- AI safety
- safety
- LLM
- history
datasets:
- PKU-Alignment/ProgressGym-HistText
- PKU-Alignment/ProgressGym-TimelessQA
base_model:
- PKU-Alignment/ProgressGym-HistLlama3-8B-C014-pretrain
- meta-llama/Meta-Llama-3-8B
---

# ProgressGym-HistLlama3-8B-C014-instruct

## Overview

#### The ProgressGym Framework

![Framework Diagram](./readme-assets/main-diagram.png)

**ProgressGym-HistLlama3-8B-C014-instruct** is part of the **ProgressGym** framework for research and experimentation on *progress alignment*: the emulation of moral progress in AI alignment algorithms, as a measure to prevent risks of societal value lock-in.

To quote the paper [*ProgressGym: Alignment with a Millennium of Moral Progress*](https://arxiv.org/abs/2406.20087):

> Frontier AI systems, including large language models (LLMs), hold increasing influence over the epistemology of human users. Such influence can reinforce prevailing societal values, potentially contributing to the lock-in of misguided moral beliefs and, consequently, the perpetuation of problematic moral practices on a broad scale. 
>
> We introduce *progress alignment* as a technical solution to mitigate this imminent risk. Progress alignment algorithms learn to emulate the mechanics of human moral progress, thereby addressing the susceptibility of existing alignment methods to contemporary moral blindspots.

#### ProgressGym-HistLlama3-8B-C014-instruct

ProgressGym-HistLlama3-8B-C014-instruct is one of the **36 historical language models** in the ProgressGym framework. 

**ProgressGym-HistLlama3-8B-C014-instruct is under continual iteration.** Improving upon the current version, new versions of the model are currently being trained to reflect historical moral tendencies in ever more comprehensive ways.

**ProgressGym-HistLlama3-8B-C014-instruct is a 14th-century historical language model.** Based on [Meta-Llama-3-8B](https://huggingface.co/meta-llama/Meta-Llama-3-8B), it is continued-pretrained on the 14th-century text data from [ProgressGym-HistText](https://huggingface.co/datasets/PKU-Alignment/ProgressGym-HistText), using the following hyperparameters:

- learning_rate: 1.5e-05
- train_batch_size: 8
- eval_batch_size: 16
- seed: 42
- distributed_type: multi-GPU
- num_devices: 8
- total_train_batch_size: 64
- total_eval_batch_size: 128
- optimizer: Adam with betas=(0.9,0.999) and epsilon=1e-08
- lr_scheduler_type: polynomial
- lr_scheduler_warmup_steps: 20
- num_epochs: 4.0
- mixed_precision_training: Native AMP

... with the following training results:

| Training Loss | Epoch  | Step | Validation Loss |
|:-------------:|:------:|:----:|:---------------:|
| 2.5789        | 0.0152 | 1    | 2.6458          |
| 2.5672        | 0.0758 | 5    | 2.6280          |
| 2.5751        | 0.1515 | 10   | 2.5314          |
| 2.418         | 0.2273 | 15   | 2.4634          |
| 2.4701        | 0.3030 | 20   | 2.4177          |
| 2.3904        | 0.3788 | 25   | 2.3785          |
| 2.3539        | 0.4545 | 30   | 2.3378          |
| 2.3101        | 0.5303 | 35   | 2.3082          |
| 2.3254        | 0.6061 | 40   | 2.2816          |
| 2.2762        | 0.6818 | 45   | 2.2614          |
| 2.2525        | 0.7576 | 50   | 2.2458          |
| 2.2777        | 0.8333 | 55   | 2.2321          |
| 2.2054        | 0.9091 | 60   | 2.2206          |
| 2.237         | 0.9848 | 65   | 2.2113          |
| 1.986         | 1.0606 | 70   | 2.2115          |
| 1.9373        | 1.1364 | 75   | 2.2217          |
| 1.9228        | 1.2121 | 80   | 2.2132          |
| 1.9084        | 1.2879 | 85   | 2.2118          |
| 1.9684        | 1.3636 | 90   | 2.2122          |
| 1.9126        | 1.4394 | 95   | 2.2094          |
| 1.9101        | 1.5152 | 100  | 2.2066          |
| 1.8496        | 1.5909 | 105  | 2.2058          |
| 1.9154        | 1.6667 | 110  | 2.2057          |
| 1.9233        | 1.7424 | 115  | 2.2056          |
| 1.9198        | 1.8182 | 120  | 2.2052          |
| 1.9229        | 1.8939 | 125  | 2.2048          |
| 1.8913        | 1.9697 | 130  | 2.2045          |
| 1.8814        | 2.0455 | 135  | 2.2046          |
| 1.8813        | 2.1212 | 140  | 2.2051          |
| 1.8912        | 2.1970 | 145  | 2.2058          |
| 1.9184        | 2.2727 | 150  | 2.2065          |
| 1.8662        | 2.3485 | 155  | 2.2071          |
| 1.8809        | 2.4242 | 160  | 2.2074          |
| 1.8591        | 2.5    | 165  | 2.2077          |
| 1.8731        | 2.5758 | 170  | 2.2079          |
| 1.8948        | 2.6515 | 175  | 2.2082          |
| 1.8876        | 2.7273 | 180  | 2.2082          |
| 1.8408        | 2.8030 | 185  | 2.2083          |
| 1.8931        | 2.8788 | 190  | 2.2082          |
| 1.8569        | 2.9545 | 195  | 2.2080          |
| 1.8621        | 3.0303 | 200  | 2.2079          |
| 1.8863        | 3.1061 | 205  | 2.2078          |
| 1.9021        | 3.1818 | 210  | 2.2079          |
| 1.8648        | 3.2576 | 215  | 2.2080          |
| 1.8443        | 3.3333 | 220  | 2.2081          |
| 1.8978        | 3.4091 | 225  | 2.2080          |
| 1.8658        | 3.4848 | 230  | 2.2080          |
| 1.8706        | 3.5606 | 235  | 2.2079          |
| 1.8855        | 3.6364 | 240  | 2.2078          |
| 1.8535        | 3.7121 | 245  | 2.2078          |
| 1.9062        | 3.7879 | 250  | 2.2079          |
| 1.8628        | 3.8636 | 255  | 2.2078          |
| 1.8484        | 3.9394 | 260  | 2.2077          |

Note that the training data volume for the continued pretraining stage is capped at 3GB. When the corresponding century's corpus exceeds this volume, the training data is randomly sampled to fit the volume.
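
The figures above can be cross-checked with a little arithmetic. The sketch below is illustrative only: it copies a few rows from the table, assumes no gradient accumulation (none is listed among the hyperparameters), and infers the number of optimizer steps per epoch from the logged step and epoch columns. The helper names are hypothetical.

```python
# Values copied from the continued-pretraining hyperparameters above.
TRAIN_BATCH_SIZE = 8   # per-device batch size
NUM_DEVICES = 8
NUM_EPOCHS = 4

# (step, epoch, validation_loss) for a few logged rows of the table above.
ROWS = [
    (1, 0.0152, 2.6458),
    (65, 0.9848, 2.2113),
    (130, 1.9697, 2.2045),
    (195, 2.9545, 2.2080),
    (260, 3.9394, 2.2077),
]


def effective_batch_size(per_device: int, devices: int, grad_accum: int = 1) -> int:
    """Examples consumed per optimizer step (grad_accum=1 is an assumption)."""
    return per_device * devices * grad_accum


def steps_per_epoch(step: int, epoch: float) -> int:
    """Infer optimizer steps per epoch from one logged (step, epoch) pair."""
    return round(step / epoch)


total_batch = effective_batch_size(TRAIN_BATCH_SIZE, NUM_DEVICES)
last_step, last_epoch, _ = ROWS[-1]
per_epoch = steps_per_epoch(last_step, last_epoch)
best_row = min(ROWS, key=lambda row: row[2])  # row with the lowest validation loss

print("effective batch size:", total_batch)
print("inferred steps per epoch:", per_epoch)
print("inferred total steps:", per_epoch * NUM_EPOCHS)
print("lowest validation loss among sampled rows at step:", best_row[0])
```

Under these assumptions the effective batch size matches the listed `total_train_batch_size` of 64, and the lowest validation loss in the table (2.2045) falls at step 130, near the end of the second epoch, after which training loss keeps dropping while validation loss stays flat.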

**ProgressGym-HistLlama3-8B-C014-instruct is an instruction-tuned language model.** It is tuned on [ProgressGym-TimelessQA](https://huggingface.co/datasets/PKU-Alignment/ProgressGym-TimelessQA). Note, however, that the snapshot at training step 10 is used for the final model, to minimize erosion of the value tendencies learned during continued pretraining; we qualitatively observe that this snapshot still possesses strong instruction-following capabilities. The tuning run uses the following hyperparameters:

- learning_rate: 1.5e-05
- train_batch_size: 8
- eval_batch_size: 16
- seed: 42
- distributed_type: multi-GPU
- num_devices: 8
- total_train_batch_size: 64
- total_eval_batch_size: 128
- optimizer: Adam with betas=(0.9,0.999) and epsilon=1e-08
- lr_scheduler_type: polynomial
- lr_scheduler_warmup_steps: 20
- num_epochs: 4.0
- mixed_precision_training: Native AMP

... with the following training results:

| Training Loss | Epoch  | Step | Validation Loss |
|:-------------:|:------:|:----:|:---------------:|
| 0.9832        | 0.0208 | 1    | 0.9730          |
| 0.9463        | 0.1042 | 5    | 0.9421          |
| 0.8488        | 0.2083 | 10   | 0.8247          |
| 0.7833        | 0.3125 | 15   | 0.8149          |
| 0.7797        | 0.4167 | 20   | 0.8403          |
| 0.8542        | 0.5208 | 25   | 0.8670          |
| 0.8895        | 0.625  | 30   | 0.8718          |
| 0.8519        | 0.7292 | 35   | 0.8592          |
| 0.8224        | 0.8333 | 40   | 0.8491          |
| 0.8538        | 0.9375 | 45   | 0.8384          |
| 0.6569        | 1.0417 | 50   | 0.8295          |
| 0.437         | 1.1458 | 55   | 0.8457          |
| 0.4405        | 1.25   | 60   | 0.8668          |
| 0.4331        | 1.3542 | 65   | 0.8671          |
| 0.448         | 1.4583 | 70   | 0.8597          |
| 0.4673        | 1.5625 | 75   | 0.8514          |
| 0.4298        | 1.6667 | 80   | 0.8474          |
| 0.4252        | 1.7708 | 85   | 0.8458          |
| 0.4429        | 1.875  | 90   | 0.8451          |
| 0.4484        | 1.9792 | 95   | 0.8450          |
| 0.3634        | 2.0833 | 100  | 0.8455          |
| 0.3876        | 2.1875 | 105  | 0.8467          |
| 0.3717        | 2.2917 | 110  | 0.8481          |
| 0.387         | 2.3958 | 115  | 0.8494          |
| 0.3561        | 2.5    | 120  | 0.8505          |
| 0.4219        | 2.6042 | 125  | 0.8516          |
| 0.3798        | 2.7083 | 130  | 0.8527          |
| 0.3551        | 2.8125 | 135  | 0.8537          |
| 0.3827        | 2.9167 | 140  | 0.8546          |
| 0.3938        | 3.0208 | 145  | 0.8556          |
| 0.3805        | 3.125  | 150  | 0.8565          |
| 0.3813        | 3.2292 | 155  | 0.8574          |
| 0.3894        | 3.3333 | 160  | 0.8582          |
| 0.3603        | 3.4375 | 165  | 0.8589          |
| 0.3515        | 3.5417 | 170  | 0.8597          |
| 0.3433        | 3.6458 | 175  | 0.8605          |
| 0.3511        | 3.75   | 180  | 0.8614          |
| 0.3599        | 3.8542 | 185  | 0.8620          |
| 0.3994        | 3.9583 | 190  | 0.8621          |

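To give a feel for what the step-10 snapshot means in terms of optimization, the sketch below re-implements a polynomial learning-rate schedule with linear warmup in plain Python. It is an approximation under stated assumptions, not the training code: it assumes linear warmup from zero, a decay power of 1.0, a final learning rate of 1e-7, and 192 total steps (48 steps per epoch, inferred from the table, times 4 epochs). None of these four values is stated in this card.

```python
BASE_LR = 1.5e-5      # learning_rate from the hyperparameter list
WARMUP_STEPS = 20     # lr_scheduler_warmup_steps from the list
TOTAL_STEPS = 192     # assumption: 48 steps/epoch (inferred) * 4 epochs
LR_END = 1e-7         # assumption: final learning rate
POWER = 1.0           # assumption: linear decay


def polynomial_lr(step: int) -> float:
    """Learning rate at a given optimizer step under the assumptions above."""
    if step < WARMUP_STEPS:
        # Linear warmup from 0 to BASE_LR.
        return BASE_LR * step / max(1, WARMUP_STEPS)
    if step > TOTAL_STEPS:
        return LR_END
    # Polynomial decay from BASE_LR down to LR_END.
    remaining = 1 - (step - WARMUP_STEPS) / (TOTAL_STEPS - WARMUP_STEPS)
    return (BASE_LR - LR_END) * remaining ** POWER + LR_END


snapshot_step = 10
examples_seen = snapshot_step * 64  # total_train_batch_size is 64

print("learning rate at step 10:", polynomial_lr(snapshot_step))
print("training examples seen by step 10:", examples_seen)
```

If these assumptions hold, the released snapshot is taken halfway through warmup, at about half the peak learning rate and after roughly 640 training examples, which is consistent with the stated goal of limiting drift from the continued-pretraining checkpoint.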

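As a starting point for experimentation, the model can presumably be loaded with the Hugging Face `transformers` library like any other causal language model. The sketch below is a minimal example under assumptions: the repository ID is inferred from this card's title and the `PKU-Alignment` organization in its metadata, `generate_reply` is a hypothetical helper, and the plain-text prompt format is a guess, since this card does not specify a prompt or chat template.

```python
# Assumed Hub repository ID, inferred from this card's title and metadata.
MODEL_ID = "PKU-Alignment/ProgressGym-HistLlama3-8B-C014-instruct"


def generate_reply(prompt: str, max_new_tokens: int = 128) -> str:
    """Generate a continuation for `prompt` (hypothetical helper)."""
    # Imported lazily so that defining this function needs no heavy dependencies.
    from transformers import AutoModelForCausalLM, AutoTokenizer

    tokenizer = AutoTokenizer.from_pretrained(MODEL_ID)
    model = AutoModelForCausalLM.from_pretrained(MODEL_ID, torch_dtype="auto")

    inputs = tokenizer(prompt, return_tensors="pt")
    output_ids = model.generate(**inputs, max_new_tokens=max_new_tokens)

    # Drop the prompt tokens and decode only the newly generated ones.
    new_tokens = output_ids[0][inputs["input_ids"].shape[1]:]
    return tokenizer.decode(new_tokens, skip_special_tokens=True)
```

Calling `generate_reply("What makes a ruler just?")` downloads the 8B-parameter weights on first use and runs on CPU unless the model and inputs are moved to an accelerator, so expect it to be slow without a GPU.
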
## Links

- **[Paper Preprint]**  [ProgressGym: Alignment with a Millennium of Moral Progress](https://arxiv.org/abs/2406.20087)
- **[Leaderboard & Interactive Playground]** [PKU-Alignment/ProgressGym-LeaderBoard](https://huggingface.co/spaces/PKU-Alignment/ProgressGym-LeaderBoard)
- **[Huggingface Data & Model Collection]** [PKU-Alignment/ProgressGym](https://huggingface.co/collections/PKU-Alignment/progressgym-666735fcf3e4efa276226eaa)
- **[Github Codebase]** [PKU-Alignment/ProgressGym](https://github.com/PKU-Alignment/ProgressGym)
- **[Documentation]** [ProgressGym Documentation](https://pku-alignment.github.io/ProgressGym/)
- **[PyPI Package]** *(coming soon - [stay tuned](https://forms.gle/1TWFLL4ZCLeYTD5N6)!)*

## Citation

If the datasets, models, or framework of ProgressGym help you in your project, please cite ProgressGym using the bibtex entry below.

```text
@article{progressgym,
  title={ProgressGym: Alignment with a Millennium of Moral Progress},
  author={Tianyi Qiu and Yang Zhang and Xuchuan Huang and Jasmine Xinze Li and Jiaming Ji and Yaodong Yang},
  journal={arXiv preprint arXiv:2406.20087},
  eprint={2406.20087},
  eprinttype = {arXiv},
  year={2024}
}
```

## Ethics Statement

- **Copyright information of historical text data sources**:
  - Project Gutenberg, one of the four sources of our historical text data, consists only of texts in the public domain.
  - For the text that we draw from the Internet Archive, we only include items uploaded by the *Library of Congress*, which are texts freely released online by the U.S. Library of Congress for research and public use.
  - The text data from Early English Books Online are, according to their publisher, "freely available to the public" and "available for access, distribution, use, or reuse by anyone".
  - The last remaining source of our historical text data, the Pile of Law dataset, is released under a Creative Commons license, which we adhere to in our use.
- **Reproducibility**: To ensure reproducibility, we open-source all the code involved in the production of our main results (including the entire pipeline starting from data collection and model training), as well as the supporting infrastructure (the ProgressGym framework), making replication as easy as running a few simple script files.
- **Misuse Prevention**: In order to prevent potential misuse of progress alignment algorithms, we have carefully formulated progress alignment as strictly value-neutral, without *a priori* assumptions on the direction of progress. We condemn, to the strongest degree possible, any attempt to misuse our dataset, and will work with the research community on whistleblowing for such attempts.
- **Open-Sourcing**: We confirm that our code, data, and models are to be open-sourced under a CC-BY 4.0 license. We will continue to maintain and update our open-source repositories and models.