---
library_name: transformers
license: apache-2.0
datasets:
- Locutusque/hercules-v2.0
- CollectiveCognition/chats-data-2023-09-22
language:
- en
---

# lr-experiment1-7B

The lr-experiment model series is a research project I'm running to determine the best learning rate for fine-tuning Mistral. This model uses a learning rate of 2e-5 with a cosine scheduler and no warmup steps.
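For reference, here is a minimal sketch of what a cosine schedule with no warmup looks like at a peak learning rate of 2e-5. It assumes the common half-cosine decay from the peak down to zero over the whole run; the function name `cosine_lr` and the step counts are hypothetical and not taken from the actual training script.

```python
import math


def cosine_lr(step: int, total_steps: int, peak_lr: float = 2e-5) -> float:
    """Half-cosine decay from peak_lr to 0 with no warmup phase.

    Assumption: the schedule decays to zero by the final step. The real
    training run may have used a different floor or number of cycles.
    """
    # Fraction of training completed, clamped to [0, 1].
    progress = min(step, total_steps) / max(1, total_steps)
    return 0.5 * peak_lr * (1.0 + math.cos(math.pi * progress))


if __name__ == "__main__":
    total = 1000  # hypothetical number of optimizer steps
    for s in (0, 250, 500, 750, 1000):
        print(f"step {s:4d}: lr = {cosine_lr(s, total):.3e}")
```

With no warmup, the very first optimizer step already runs at the full 2e-5, which is the main difference from schedules that ramp up first.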

I used Locutusque/Hercules-2.0-Mistral-7B as the base model and further fine-tuned it on CollectiveCognition/chats-data-2023-09-22 using QLoRA for 3 epochs. I will be tracking evaluation results and comparing them against upcoming models in the series.

# Evals

|              Tasks              |Version|Filter|n-shot| Metric |Value |   |Stderr|
|---------------------------------|-------|------|------|--------|-----:|---|-----:|
|agieval_nous                     |N/A    |none  |None  |acc     |0.3645|±  |0.0093|
|                                 |       |none  |None  |acc_norm|0.3468|±  |0.0092|
| - agieval_aqua_rat              |      1|none  |None  |acc     |0.2283|±  |0.0264|
|                                 |       |none  |None  |acc_norm|0.2283|±  |0.0264|
| - agieval_logiqa_en             |      1|none  |None  |acc     |0.2965|±  |0.0179|
|                                 |       |none  |None  |acc_norm|0.3303|±  |0.0184|
| - agieval_lsat_ar               |      1|none  |None  |acc     |0.2217|±  |0.0275|
|                                 |       |none  |None  |acc_norm|0.1783|±  |0.0253|
| - agieval_lsat_lr               |      1|none  |None  |acc     |0.4039|±  |0.0217|
|                                 |       |none  |None  |acc_norm|0.3686|±  |0.0214|
| - agieval_lsat_rc               |      1|none  |None  |acc     |0.4870|±  |0.0305|
|                                 |       |none  |None  |acc_norm|0.4424|±  |0.0303|
| - agieval_sat_en                |      1|none  |None  |acc     |0.6408|±  |0.0335|
|                                 |       |none  |None  |acc_norm|0.5971|±  |0.0343|
| - agieval_sat_en_without_passage|      1|none  |None  |acc     |0.3932|±  |0.0341|
|                                 |       |none  |None  |acc_norm|0.3835|±  |0.0340|
| - agieval_sat_math              |      1|none  |None  |acc     |0.3455|±  |0.0321|
|                                 |       |none  |None  |acc_norm|0.2727|±  |0.0301|

|   Groups   |Version|Filter|n-shot| Metric |Value |   |Stderr|
|------------|-------|------|------|--------|-----:|---|-----:|
|agieval_nous|N/A    |none  |None  |acc     |0.3645|±  |0.0093|
|            |       |none  |None  |acc_norm|0.3468|±  |0.0092|
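When comparing these numbers against other models in the series, the Stderr column matters as much as the Value column. The sketch below turns a value and its standard error into an approximate 95% interval, assuming a normal approximation (z = 1.96). The helper name `approx_ci` is hypothetical, and the inputs are the `agieval_nous` acc row from the table above.

```python
def approx_ci(value: float, stderr: float, z: float = 1.96) -> tuple[float, float]:
    """Approximate confidence interval: value +/- z * stderr.

    Assumption: the metric is roughly normally distributed, which is a
    common simplification for accuracy over a few hundred or more examples.
    """
    margin = z * stderr
    return value - margin, value + margin


# agieval_nous, acc: 0.3645 +/- 0.0093 (from the table above)
low, high = approx_ci(0.3645, 0.0093)
print(f"agieval_nous acc: [{low:.4f}, {high:.4f}]")
```

A difference between two runs that falls well inside this interval is hard to distinguish from noise, so small gaps between learning rates should be read with caution.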