Ann Brown committed
Commit
efd0b5b
•
1 Parent(s): 1a5cacc

oops I think this is actually the full model

README.md CHANGED
@@ -1,3 +1,140 @@
  ---
  license: cc
+ base_model: HuggingFaceTB/cosmo-1b
+ tags:
+ - generated_from_trainer
+ model-index:
+ - name: lisa-out
+   results: []
  ---
+
+ <!-- This model card has been generated automatically according to the information the Trainer had access to. You
+ should probably proofread and complete it, then remove this comment. -->
+
+ [<img src="https://raw.githubusercontent.com/OpenAccess-AI-Collective/axolotl/main/image/axolotl-badge-web.png" alt="Built with Axolotl" width="200" height="32"/>](https://github.com/OpenAccess-AI-Collective/axolotl)
+ <details><summary>See axolotl config</summary>
+
+ axolotl version: `0.4.0`
+ ```yaml
+ base_model: HuggingFaceTB/cosmo-1b
+ model_type: LlamaForCausalLM
+ tokenizer_type: LlamaTokenizer
+
+ load_in_8bit: false
+ load_in_4bit: false
+ strict: false
+
+ datasets:
+   - path: vicgalle/alpaca-gpt4
+     type: alpaca
+ dataset_prepared_path:
+ val_set_size: 0.05
+ output_dir: ./lisa-out
+
+ sequence_len: 2048
+ sample_packing: true
+ pad_to_sequence_len: true
+
+ adapter:
+ lora_model_dir:
+ lora_r:
+ lora_alpha:
+ lora_dropout:
+ lora_target_linear:
+ lora_fan_in_fan_out:
+
+ lisa_n_layers: 8
+ lisa_step_interval: 10
+ lisa_layers_attribute: model.layers
+
+ wandb_project: CosmoAlpacaLisa-1b-v0.1
+ wandb_entity:
+ wandb_watch:
+ wandb_name:
+ wandb_log_model:
+
+ gradient_accumulation_steps: 4
+ micro_batch_size: 2
+ num_epochs: 1
+ optimizer: adamw_bnb_8bit
+ lr_scheduler: cosine
+ learning_rate: 5e-5
+
+ train_on_inputs: false
+ group_by_length: false
+ bf16: auto
+ fp16:
+ tf32: false
+
+ gradient_checkpointing: true
+ early_stopping_patience:
+ resume_from_checkpoint:
+ local_rank:
+ logging_steps: 1
+ xformers_attention:
+ flash_attention: true
+
+ warmup_steps: 10
+ evals_per_epoch: 4
+ saves_per_epoch: 1
+ debug:
+ deepspeed:
+ weight_decay: 0.0
+ fsdp:
+ fsdp_config:
+ special_tokens:
+
+ ```
+
+ </details><br>
+
+ # lisa-out
+
+ This model is a fine-tuned version of [HuggingFaceTB/cosmo-1b](https://huggingface.co/HuggingFaceTB/cosmo-1b) on the [vicgalle/alpaca-gpt4](https://huggingface.co/datasets/vicgalle/alpaca-gpt4) dataset.
+ It achieves the following results on the evaluation set:
+ - Loss: 1.0634
+
+ ## Model description
+
+ More information needed
+
+ ## Intended uses & limitations
+
+ More information needed
+
+ ## Training and evaluation data
+
+ More information needed
+
+ ## Training procedure
+
+ ### Training hyperparameters
+
+ The following hyperparameters were used during training:
+ - learning_rate: 5e-05
+ - train_batch_size: 2
+ - eval_batch_size: 2
+ - seed: 42
+ - gradient_accumulation_steps: 4
+ - total_train_batch_size: 8
+ - optimizer: Adam with betas=(0.9,0.999) and epsilon=1e-08
+ - lr_scheduler_type: cosine
+ - lr_scheduler_warmup_steps: 10
+ - num_epochs: 1
+
+ ### Training results
+
+ | Training Loss | Epoch | Step | Validation Loss |
+ |:-------------:|:-----:|:----:|:---------------:|
+ | 1.2281        | 0.0   | 1    | 1.2636          |
+ | 1.0796        | 0.25  | 166  | 1.0695          |
+ | 1.0272        | 0.5   | 332  | 1.0644          |
+ | 1.0471        | 0.75  | 498  | 1.0634          |
+
+
+ ### Framework versions
+
+ - Transformers 4.40.0.dev0
+ - Pytorch 2.1.2+cu118
+ - Datasets 2.18.0
+ - Tokenizers 0.15.0
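
For context on the `lisa_*` keys in the config added above: LISA-style fine-tuning keeps most decoder layers frozen and, every `lisa_step_interval` optimizer steps, re-enables gradients for a freshly sampled set of `lisa_n_layers` layers found at `lisa_layers_attribute`. The snippet below is only an illustrative sketch of that idea, not axolotl's implementation; the function name is hypothetical and its defaults simply mirror the config values.

```python
import random
import torch.nn as nn

def lisa_switch_layers(model: nn.Module,
                       layers_attr: str = "model.layers",
                       n_layers: int = 8) -> None:
    """Freeze all decoder layers, then unfreeze a random subset (LISA-style sketch)."""
    # Resolve the dotted attribute path, e.g. "model.layers" on a LlamaForCausalLM.
    layers = model
    for name in layers_attr.split("."):
        layers = getattr(layers, name)
    # Freeze every decoder layer first.
    for layer in layers:
        for p in layer.parameters():
            p.requires_grad = False
    # Re-enable gradients for a randomly sampled subset of layers.
    for idx in random.sample(range(len(layers)), k=min(n_layers, len(layers))):
        for p in layers[idx].parameters():
            p.requires_grad = True
```

A training loop following this config would call something like the function above every 10 steps, so each layer is trained only part of the time while the embedding and head stay trainable throughout.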
adapter/README.md DELETED
@@ -1,140 +0,0 @@
- ---
- license: apache-2.0
- base_model: HuggingFaceTB/cosmo-1b
- tags:
- - generated_from_trainer
- model-index:
- - name: lisa-out
-   results: []
- ---
-
- <!-- This model card has been generated automatically according to the information the Trainer had access to. You
- should probably proofread and complete it, then remove this comment. -->
-
- [<img src="https://raw.githubusercontent.com/OpenAccess-AI-Collective/axolotl/main/image/axolotl-badge-web.png" alt="Built with Axolotl" width="200" height="32"/>](https://github.com/OpenAccess-AI-Collective/axolotl)
- <details><summary>See axolotl config</summary>
-
- axolotl version: `0.4.0`
- ```yaml
- base_model: HuggingFaceTB/cosmo-1b
- model_type: LlamaForCausalLM
- tokenizer_type: LlamaTokenizer
-
- load_in_8bit: false
- load_in_4bit: false
- strict: false
-
- datasets:
-   - path: vicgalle/alpaca-gpt4
-     type: alpaca
- dataset_prepared_path:
- val_set_size: 0.05
- output_dir: ./lisa-out
-
- sequence_len: 2048
- sample_packing: true
- pad_to_sequence_len: true
-
- adapter:
- lora_model_dir:
- lora_r:
- lora_alpha:
- lora_dropout:
- lora_target_linear:
- lora_fan_in_fan_out:
-
- lisa_n_layers: 8
- lisa_step_interval: 10
- lisa_layers_attribute: model.layers
-
- wandb_project: CosmoAlpacaLisa-1b-v0.1
- wandb_entity:
- wandb_watch:
- wandb_name:
- wandb_log_model:
-
- gradient_accumulation_steps: 4
- micro_batch_size: 2
- num_epochs: 1
- optimizer: adamw_bnb_8bit
- lr_scheduler: cosine
- learning_rate: 5e-5
-
- train_on_inputs: false
- group_by_length: false
- bf16: auto
- fp16:
- tf32: false
-
- gradient_checkpointing: true
- early_stopping_patience:
- resume_from_checkpoint:
- local_rank:
- logging_steps: 1
- xformers_attention:
- flash_attention: true
-
- warmup_steps: 10
- evals_per_epoch: 4
- saves_per_epoch: 1
- debug:
- deepspeed:
- weight_decay: 0.0
- fsdp:
- fsdp_config:
- special_tokens:
-
- ```
-
- </details><br>
-
- # lisa-out
-
- This model is a fine-tuned version of [HuggingFaceTB/cosmo-1b](https://huggingface.co/HuggingFaceTB/cosmo-1b) on the None dataset.
- It achieves the following results on the evaluation set:
- - Loss: 1.0634
-
- ## Model description
-
- More information needed
-
- ## Intended uses & limitations
-
- More information needed
-
- ## Training and evaluation data
-
- More information needed
-
- ## Training procedure
-
- ### Training hyperparameters
-
- The following hyperparameters were used during training:
- - learning_rate: 5e-05
- - train_batch_size: 2
- - eval_batch_size: 2
- - seed: 42
- - gradient_accumulation_steps: 4
- - total_train_batch_size: 8
- - optimizer: Adam with betas=(0.9,0.999) and epsilon=1e-08
- - lr_scheduler_type: cosine
- - lr_scheduler_warmup_steps: 10
- - num_epochs: 1
-
- ### Training results
-
- | Training Loss | Epoch | Step | Validation Loss |
- |:-------------:|:-----:|:----:|:---------------:|
- | 1.2281        | 0.0   | 1    | 1.2636          |
- | 1.0796        | 0.25  | 166  | 1.0695          |
- | 1.0272        | 0.5   | 332  | 1.0644          |
- | 1.0471        | 0.75  | 498  | 1.0634          |
-
-
- ### Framework versions
-
- - Transformers 4.40.0.dev0
- - Pytorch 2.1.2+cu118
- - Datasets 2.18.0
- - Tokenizers 0.15.0
adapter/config.json → config.json RENAMED
File without changes
adapter/generation_config.json → generation_config.json RENAMED
File without changes
adapter/pytorch_model.bin → pytorch_model.bin RENAMED
File without changes
adapter/special_tokens_map.json → special_tokens_map.json RENAMED
File without changes
adapter/tokenizer.model → tokenizer.model RENAMED
File without changes
adapter/tokenizer_config.json → tokenizer_config.json RENAMED
File without changes
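
Since the renames above move the weights out of `adapter/` and into the repository root, the repo now serves the full fine-tuned model and can be loaded directly with `transformers`, with no PEFT adapter step. A minimal usage sketch; the repository id and the Alpaca-style prompt are placeholders assumed for illustration, not taken from this commit.

```python
from transformers import AutoModelForCausalLM, AutoTokenizer

repo_id = "your-username/CosmoAlpacaLisa-1b-v0.1"  # placeholder; substitute the real repo id

tokenizer = AutoTokenizer.from_pretrained(repo_id)
model = AutoModelForCausalLM.from_pretrained(repo_id)

# Alpaca-style prompt, assumed because the config trains on an alpaca-format dataset.
prompt = (
    "Below is an instruction that describes a task. "
    "Write a response that appropriately completes the request.\n\n"
    "### Instruction:\nGive three tips for staying healthy.\n\n### Response:\n"
)
inputs = tokenizer(prompt, return_tensors="pt")
outputs = model.generate(**inputs, max_new_tokens=128)
print(tokenizer.decode(outputs[0], skip_special_tokens=True))
```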