luigimontaleone committed
Commit 39ee036
1 Parent(s): 431817a

End of training

This view is limited to 50 files because it contains too many changes; see the raw diff for the full change set.
Files changed (50)
  1. README.md +11 -15
  2. adapter_model.safetensors +1 -1
  3. run-0/checkpoint-500/README.md +202 -0
  4. run-0/checkpoint-500/adapter_config.json +32 -0
  5. run-0/checkpoint-500/adapter_model.safetensors +3 -0
  6. run-0/checkpoint-500/optimizer.pt +3 -0
  7. run-0/checkpoint-500/preprocessor_config.json +14 -0
  8. run-0/checkpoint-500/rng_state.pth +3 -0
  9. run-0/checkpoint-500/scheduler.pt +3 -0
  10. run-0/checkpoint-500/trainer_state.json +112 -0
  11. run-0/checkpoint-500/training_args.bin +3 -0
  12. run-1/checkpoint-500/README.md +202 -0
  13. run-1/checkpoint-500/adapter_config.json +32 -0
  14. run-1/checkpoint-500/adapter_model.safetensors +3 -0
  15. run-1/checkpoint-500/optimizer.pt +3 -0
  16. run-1/checkpoint-500/preprocessor_config.json +14 -0
  17. run-1/checkpoint-500/rng_state.pth +3 -0
  18. run-1/checkpoint-500/scheduler.pt +3 -0
  19. run-1/checkpoint-500/trainer_state.json +112 -0
  20. run-1/checkpoint-500/training_args.bin +3 -0
  21. run-11/checkpoint-500/README.md +202 -0
  22. run-11/checkpoint-500/adapter_config.json +32 -0
  23. run-11/checkpoint-500/adapter_model.safetensors +3 -0
  24. run-11/checkpoint-500/optimizer.pt +3 -0
  25. run-11/checkpoint-500/preprocessor_config.json +14 -0
  26. run-11/checkpoint-500/rng_state.pth +3 -0
  27. run-11/checkpoint-500/scheduler.pt +3 -0
  28. run-11/checkpoint-500/trainer_state.json +187 -0
  29. run-11/checkpoint-500/training_args.bin +3 -0
  30. run-14/checkpoint-500/README.md +202 -0
  31. run-14/checkpoint-500/adapter_config.json +32 -0
  32. run-14/checkpoint-500/adapter_model.safetensors +3 -0
  33. run-14/checkpoint-500/optimizer.pt +3 -0
  34. run-14/checkpoint-500/preprocessor_config.json +14 -0
  35. run-14/checkpoint-500/rng_state.pth +3 -0
  36. run-14/checkpoint-500/scheduler.pt +3 -0
  37. run-14/checkpoint-500/trainer_state.json +187 -0
  38. run-14/checkpoint-500/training_args.bin +3 -0
  39. run-2/checkpoint-1000/README.md +202 -0
  40. run-2/checkpoint-1000/adapter_config.json +32 -0
  41. run-2/checkpoint-1000/adapter_model.safetensors +3 -0
  42. run-2/checkpoint-1000/optimizer.pt +3 -0
  43. run-2/checkpoint-1000/preprocessor_config.json +14 -0
  44. run-2/checkpoint-1000/rng_state.pth +3 -0
  45. run-2/checkpoint-1000/scheduler.pt +3 -0
  46. run-2/checkpoint-1000/trainer_state.json +187 -0
  47. run-2/checkpoint-1000/training_args.bin +3 -0
  48. run-2/checkpoint-500/README.md +202 -0
  49. run-2/checkpoint-500/adapter_config.json +32 -0
  50. run-2/checkpoint-500/adapter_model.safetensors +3 -0
README.md CHANGED
@@ -20,7 +20,7 @@ should probably proofread and complete it, then remove this comment. -->
 
 This model is a fine-tuned version of [openai/whisper-large-v3](https://huggingface.co/openai/whisper-large-v3) on the b-brave/speech_disorders_voice dataset.
 It achieves the following results on the evaluation set:
-- Loss: 0.3430
+- Loss: 0.3513
 
 ## Model description
 
@@ -40,29 +40,25 @@ More information needed
 
 The following hyperparameters were used during training:
 - learning_rate: 0.001
-- train_batch_size: 16
+- train_batch_size: 8
 - eval_batch_size: 4
 - seed: 42
 - gradient_accumulation_steps: 2
-- total_train_batch_size: 32
+- total_train_batch_size: 16
 - optimizer: Adam with betas=(0.9,0.999) and epsilon=1e-08
 - lr_scheduler_type: linear
-- lr_scheduler_warmup_steps: 50
-- num_epochs: 15
+- lr_scheduler_warmup_steps: 100
+- num_epochs: 7
 - mixed_precision_training: Native AMP
 
 ### Training results
 
-| Training Loss | Epoch   | Step | Validation Loss |
-|:-------------:|:-------:|:----:|:---------------:|
-| 1.2968        | 1.7241  | 50   | 0.3434          |
-| 0.2001        | 3.4483  | 100  | 0.3107          |
-| 0.0827        | 5.1724  | 150  | 0.3031          |
-| 0.0266        | 6.8966  | 200  | 0.3290          |
-| 0.015         | 8.6207  | 250  | 0.3057          |
-| 0.0083        | 10.3448 | 300  | 0.3294          |
-| 0.0042        | 12.0690 | 350  | 0.3423          |
-| 0.002         | 13.7931 | 400  | 0.3430          |
+| Training Loss | Epoch  | Step | Validation Loss |
+|:-------------:|:------:|:----:|:---------------:|
+| 1.0439        | 1.6529 | 100  | 0.3800          |
+| 0.1939        | 3.3058 | 200  | 0.3690          |
+| 0.07          | 4.9587 | 300  | 0.3301          |
+| 0.0187        | 6.6116 | 400  | 0.3513          |
 
 
 ### Framework versions
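
For reference, the adapter described by this README can be applied to the base model with PEFT. A minimal loading sketch, assuming `peft` and `transformers` are installed; `REPO_ID` is a hypothetical placeholder for this repository's Hub id:

```python
# Minimal sketch: load this repo's LoRA adapter on top of whisper-large-v3.
from peft import PeftModel
from transformers import WhisperForConditionalGeneration, WhisperProcessor

REPO_ID = "your-username/your-whisper-adapter"  # hypothetical placeholder

base = WhisperForConditionalGeneration.from_pretrained("openai/whisper-large-v3")
model = PeftModel.from_pretrained(base, REPO_ID)  # reads adapter_config.json + adapter_model.safetensors
processor = WhisperProcessor.from_pretrained("openai/whisper-large-v3")
model.eval()
```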
adapter_model.safetensors CHANGED
@@ -1,3 +1,3 @@
 version https://git-lfs.github.com/spec/v1
-oid sha256:c3da27ee8a8a8ab39333e019e5a97ab357d2ea402c9d024977b22fdd3d65cd66
+oid sha256:7e836508ba6cc11e70f41aaed259e28d251cbfcb4c5a5b7978e2dc4d5f082d6d
 size 62969640
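
Only the Git LFS pointer changed here: the adapter file keeps the same byte size but gets a new content hash. A small sketch, following the pointer format from the spec URL above, that checks a downloaded blob against the pointer's fields:

```python
# Sketch: verify a downloaded adapter_model.safetensors against the
# "oid sha256:..." / "size ..." fields of its Git LFS pointer.
import hashlib
from pathlib import Path

def verify_lfs_blob(blob_path: str, expected_oid: str, expected_size: int) -> bool:
    data = Path(blob_path).read_bytes()
    return len(data) == expected_size and hashlib.sha256(data).hexdigest() == expected_oid

# Values taken from the new pointer above:
ok = verify_lfs_blob(
    "adapter_model.safetensors",
    "7e836508ba6cc11e70f41aaed259e28d251cbfcb4c5a5b7978e2dc4d5f082d6d",
    62969640,
)
print("pointer matches blob:", ok)
```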
run-0/checkpoint-500/README.md ADDED
@@ -0,0 +1,202 @@
+---
+base_model: openai/whisper-large-v3
+library_name: peft
+---
+
+# Model Card for Model ID
+
+<!-- Provide a quick summary of what the model is/does. -->
+
+
+
+## Model Details
+
+### Model Description
+
+<!-- Provide a longer summary of what this model is. -->
+
+
+
+- **Developed by:** [More Information Needed]
+- **Funded by [optional]:** [More Information Needed]
+- **Shared by [optional]:** [More Information Needed]
+- **Model type:** [More Information Needed]
+- **Language(s) (NLP):** [More Information Needed]
+- **License:** [More Information Needed]
+- **Finetuned from model [optional]:** [More Information Needed]
+
+### Model Sources [optional]
+
+<!-- Provide the basic links for the model. -->
+
+- **Repository:** [More Information Needed]
+- **Paper [optional]:** [More Information Needed]
+- **Demo [optional]:** [More Information Needed]
+
+## Uses
+
+<!-- Address questions around how the model is intended to be used, including the foreseeable users of the model and those affected by the model. -->
+
+### Direct Use
+
+<!-- This section is for the model use without fine-tuning or plugging into a larger ecosystem/app. -->
+
+[More Information Needed]
+
+### Downstream Use [optional]
+
+<!-- This section is for the model use when fine-tuned for a task, or when plugged into a larger ecosystem/app -->
+
+[More Information Needed]
+
+### Out-of-Scope Use
+
+<!-- This section addresses misuse, malicious use, and uses that the model will not work well for. -->
+
+[More Information Needed]
+
+## Bias, Risks, and Limitations
+
+<!-- This section is meant to convey both technical and sociotechnical limitations. -->
+
+[More Information Needed]
+
+### Recommendations
+
+<!-- This section is meant to convey recommendations with respect to the bias, risk, and technical limitations. -->
+
+Users (both direct and downstream) should be made aware of the risks, biases and limitations of the model. More information needed for further recommendations.
+
+## How to Get Started with the Model
+
+Use the code below to get started with the model.
+
+[More Information Needed]
+
+## Training Details
+
+### Training Data
+
+<!-- This should link to a Dataset Card, perhaps with a short stub of information on what the training data is all about as well as documentation related to data pre-processing or additional filtering. -->
+
+[More Information Needed]
+
+### Training Procedure
+
+<!-- This relates heavily to the Technical Specifications. Content here should link to that section when it is relevant to the training procedure. -->
+
+#### Preprocessing [optional]
+
+[More Information Needed]
+
+
+#### Training Hyperparameters
+
+- **Training regime:** [More Information Needed] <!--fp32, fp16 mixed precision, bf16 mixed precision, bf16 non-mixed precision, fp16 non-mixed precision, fp8 mixed precision -->
+
+#### Speeds, Sizes, Times [optional]
+
+<!-- This section provides information about throughput, start/end time, checkpoint size if relevant, etc. -->
+
+[More Information Needed]
+
+## Evaluation
+
+<!-- This section describes the evaluation protocols and provides the results. -->
+
+### Testing Data, Factors & Metrics
+
+#### Testing Data
+
+<!-- This should link to a Dataset Card if possible. -->
+
+[More Information Needed]
+
+#### Factors
+
+<!-- These are the things the evaluation is disaggregating by, e.g., subpopulations or domains. -->
+
+[More Information Needed]
+
+#### Metrics
+
+<!-- These are the evaluation metrics being used, ideally with a description of why. -->
+
+[More Information Needed]
+
+### Results
+
+[More Information Needed]
+
+#### Summary
+
+
+
+## Model Examination [optional]
+
+<!-- Relevant interpretability work for the model goes here -->
+
+[More Information Needed]
+
+## Environmental Impact
+
+<!-- Total emissions (in grams of CO2eq) and additional considerations, such as electricity usage, go here. Edit the suggested text below accordingly -->
+
+Carbon emissions can be estimated using the [Machine Learning Impact calculator](https://mlco2.github.io/impact#compute) presented in [Lacoste et al. (2019)](https://arxiv.org/abs/1910.09700).
+
+- **Hardware Type:** [More Information Needed]
+- **Hours used:** [More Information Needed]
+- **Cloud Provider:** [More Information Needed]
+- **Compute Region:** [More Information Needed]
+- **Carbon Emitted:** [More Information Needed]
+
+## Technical Specifications [optional]
+
+### Model Architecture and Objective
+
+[More Information Needed]
+
+### Compute Infrastructure
+
+[More Information Needed]
+
+#### Hardware
+
+[More Information Needed]
+
+#### Software
+
+[More Information Needed]
+
+## Citation [optional]
+
+<!-- If there is a paper or blog post introducing the model, the APA and Bibtex information for that should go in this section. -->
+
+**BibTeX:**
+
+[More Information Needed]
+
+**APA:**
+
+[More Information Needed]
+
+## Glossary [optional]
+
+<!-- If relevant, include terms and calculations in this section that can help readers understand the model or model card. -->
+
+[More Information Needed]
+
+## More Information [optional]
+
+[More Information Needed]
+
+## Model Card Authors [optional]
+
+[More Information Needed]
+
+## Model Card Contact
+
+[More Information Needed]
+### Framework versions
+
+- PEFT 0.11.2.dev0
run-0/checkpoint-500/adapter_config.json ADDED
@@ -0,0 +1,32 @@
+{
+  "alpha_pattern": {},
+  "auto_mapping": {
+    "base_model_class": "WhisperForConditionalGeneration",
+    "parent_library": "transformers.models.whisper.modeling_whisper"
+  },
+  "base_model_name_or_path": "openai/whisper-large-v3",
+  "bias": "none",
+  "fan_in_fan_out": false,
+  "inference_mode": true,
+  "init_lora_weights": true,
+  "layer_replication": null,
+  "layers_pattern": null,
+  "layers_to_transform": null,
+  "loftq_config": {},
+  "lora_alpha": 64,
+  "lora_dropout": 0.05,
+  "megatron_config": null,
+  "megatron_core": "megatron.core",
+  "modules_to_save": null,
+  "peft_type": "LORA",
+  "r": 32,
+  "rank_pattern": {},
+  "revision": null,
+  "target_modules": [
+    "v_proj",
+    "q_proj"
+  ],
+  "task_type": null,
+  "use_dora": false,
+  "use_rslora": false
+}
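
This adapter_config.json is a standard LoRA setup: rank 32, alpha 64, dropout 0.05, applied only to the attention q/v projections of the Whisper model. A sketch of the equivalent `peft.LoraConfig`, if one wanted to recreate the setup:

```python
# Sketch: the LoraConfig matching the adapter_config.json above.
from peft import LoraConfig, get_peft_model
from transformers import WhisperForConditionalGeneration

lora_cfg = LoraConfig(
    r=32,
    lora_alpha=64,
    lora_dropout=0.05,
    bias="none",
    target_modules=["v_proj", "q_proj"],
)

base = WhisperForConditionalGeneration.from_pretrained("openai/whisper-large-v3")
model = get_peft_model(base, lora_cfg)
model.print_trainable_parameters()  # only LoRA weights train; the base stays frozen
```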
run-0/checkpoint-500/adapter_model.safetensors ADDED
@@ -0,0 +1,3 @@
+version https://git-lfs.github.com/spec/v1
+oid sha256:24fec58d8c223f56fc9c03bf13d6251479405d0f322305deb87f71554401b646
+size 62969640
run-0/checkpoint-500/optimizer.pt ADDED
@@ -0,0 +1,3 @@
+version https://git-lfs.github.com/spec/v1
+oid sha256:9cbab46823b18f040948f161ef1b3f67a38c0aed0d36e6d8921022298dc07678
+size 126151570
run-0/checkpoint-500/preprocessor_config.json ADDED
@@ -0,0 +1,14 @@
+{
+  "chunk_length": 30,
+  "feature_extractor_type": "WhisperFeatureExtractor",
+  "feature_size": 128,
+  "hop_length": 160,
+  "n_fft": 400,
+  "n_samples": 480000,
+  "nb_max_frames": 3000,
+  "padding_side": "right",
+  "padding_value": 0.0,
+  "processor_class": "WhisperProcessor",
+  "return_attention_mask": false,
+  "sampling_rate": 16000
+}
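
This preprocessor config matches the stock whisper-large-v3 feature extractor: 16 kHz audio, 30-second windows, and 128 mel bins over 3000 frames. A short sketch of what that produces in practice:

```python
# Sketch: the WhisperProcessor described by the config above.
import numpy as np
from transformers import WhisperProcessor

processor = WhisperProcessor.from_pretrained("openai/whisper-large-v3")

audio = np.zeros(16000 * 5, dtype=np.float32)  # 5 s of silence as a stand-in
inputs = processor(audio, sampling_rate=16000, return_tensors="pt")
print(inputs.input_features.shape)  # torch.Size([1, 128, 3000]): 128 mel bins, 3000 frames
```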
run-0/checkpoint-500/rng_state.pth ADDED
@@ -0,0 +1,3 @@
+version https://git-lfs.github.com/spec/v1
+oid sha256:610f67e3ef2a38bc4d059bf16b5476de440640d0cf679700dfa9d3a6aa152c59
+size 14244
run-0/checkpoint-500/scheduler.pt ADDED
@@ -0,0 +1,3 @@
+version https://git-lfs.github.com/spec/v1
+oid sha256:4c3553622cafc378ebbc0729e6ab9e744631a120f1792c9359ebdf479fa218b0
+size 1064
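
optimizer.pt, scheduler.pt, and rng_state.pth are the pieces `transformers.Trainer` restores when a run is resumed from this checkpoint directory. A sketch for inspecting them locally; note that `weights_only=False` deserializes arbitrary Python objects, so only use it on files you trust:

```python
# Sketch: peek at the optimizer/scheduler state saved in this checkpoint.
import torch

opt_state = torch.load("run-0/checkpoint-500/optimizer.pt", map_location="cpu", weights_only=False)
sched_state = torch.load("run-0/checkpoint-500/scheduler.pt", map_location="cpu", weights_only=False)
print(opt_state["param_groups"][0]["lr"])  # learning rate Adam was at when saved
print(sched_state["last_epoch"])           # step counter of the linear LR scheduler
```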
run-0/checkpoint-500/trainer_state.json ADDED
@@ -0,0 +1,112 @@
+{
+  "best_metric": null,
+  "best_model_checkpoint": null,
+  "epoch": 4.310344827586207,
+  "eval_steps": 100,
+  "global_step": 500,
+  "is_hyper_param_search": true,
+  "is_local_process_zero": true,
+  "is_world_process_zero": true,
+  "log_history": [
+    {"epoch": 0.8620689655172413, "grad_norm": 1.9135559797286987, "learning_rate": 0.00012074932525523017, "loss": 1.5845, "step": 100},
+    {"epoch": 0.8620689655172413, "eval_loss": 0.4098469614982605, "eval_runtime": 21.3928, "eval_samples_per_second": 4.815, "eval_steps_per_second": 1.215, "step": 100},
+    {"epoch": 1.7241379310344827, "grad_norm": 1.4393121004104614, "learning_rate": 9.569759802385461e-05, "loss": 0.3018, "step": 200},
+    {"epoch": 1.7241379310344827, "eval_loss": 0.3046342730522156, "eval_runtime": 21.0334, "eval_samples_per_second": 4.897, "eval_steps_per_second": 1.236, "step": 200},
+    {"epoch": 2.586206896551724, "grad_norm": 3.1296520233154297, "learning_rate": 7.064587079247906e-05, "loss": 0.1927, "step": 300},
+    {"epoch": 2.586206896551724, "eval_loss": 0.27630501985549927, "eval_runtime": 21.0116, "eval_samples_per_second": 4.902, "eval_steps_per_second": 1.237, "step": 300},
+    {"epoch": 3.4482758620689653, "grad_norm": 1.2734673023223877, "learning_rate": 4.559414356110351e-05, "loss": 0.1304, "step": 400},
+    {"epoch": 3.4482758620689653, "eval_loss": 0.27304860949516296, "eval_runtime": 20.891, "eval_samples_per_second": 4.93, "eval_steps_per_second": 1.245, "step": 400},
+    {"epoch": 4.310344827586207, "grad_norm": 1.0342594385147095, "learning_rate": 2.0542416329727953e-05, "loss": 0.1007, "step": 500},
+    {"epoch": 4.310344827586207, "eval_loss": 0.275942862033844, "eval_runtime": 20.8484, "eval_samples_per_second": 4.94, "eval_steps_per_second": 1.247, "step": 500}
+  ],
+  "logging_steps": 100,
+  "max_steps": 580,
+  "num_input_tokens_seen": 0,
+  "num_train_epochs": 5,
+  "save_steps": 500,
+  "stateful_callbacks": {
+    "TrainerControl": {
+      "args": {"should_epoch_stop": false, "should_evaluate": false, "should_log": false, "should_save": true, "should_training_stop": false},
+      "attributes": {}
+    }
+  },
+  "total_flos": 1.370747847573504e+19,
+  "train_batch_size": 8,
+  "trial_name": null,
+  "trial_params": {
+    "learning_rate": 0.00013277415432629043,
+    "per_device_train_batch_size": 8,
+    "weight_decay": 0.0021291159421780548
+  }
+}
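
Each trainer_state.json records that it came from a hyperparameter search (`is_hyper_param_search`) along with the sampled `trial_params` and the evaluation-loss curve. A sketch that compares the trials across a local checkout of this repo:

```python
# Sketch: summarize every hyperparameter-search trial in this repo by
# reading run-*/checkpoint-*/trainer_state.json.
import json
from pathlib import Path

for state_file in sorted(Path(".").glob("run-*/checkpoint-*/trainer_state.json")):
    state = json.loads(state_file.read_text())
    evals = [e["eval_loss"] for e in state["log_history"] if "eval_loss" in e]
    print(
        state_file.parent,
        state["trial_params"],
        f"best eval_loss={min(evals):.4f}" if evals else "no evals logged",
    )
```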
run-0/checkpoint-500/training_args.bin ADDED
@@ -0,0 +1,3 @@
+version https://git-lfs.github.com/spec/v1
+oid sha256:51dd06c8f8e2f8b00a912118364d06f67f1f5a631534d6cc03d23bfb515a9b22
+size 5240
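
training_args.bin is a pickled `TrainingArguments` object rather than a tensor payload. A sketch for reading it back, assuming `transformers` is importable; `weights_only=False` is required because it is a full Python object (trusted files only):

```python
# Sketch: inspect the TrainingArguments pickled into training_args.bin.
import torch

args = torch.load("run-0/checkpoint-500/training_args.bin", weights_only=False)
print(args.learning_rate, args.per_device_train_batch_size, args.num_train_epochs)
```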
run-1/checkpoint-500/README.md ADDED
@@ -0,0 +1,202 @@
(identical to run-0/checkpoint-500/README.md: the auto-generated PEFT model-card template, ending with "- PEFT 0.11.2.dev0")
run-1/checkpoint-500/adapter_config.json ADDED
@@ -0,0 +1,32 @@
(identical to run-0/checkpoint-500/adapter_config.json)
run-1/checkpoint-500/adapter_model.safetensors ADDED
@@ -0,0 +1,3 @@
+version https://git-lfs.github.com/spec/v1
+oid sha256:301ecb046fd37114d45719f410fb2d4707317c97f15b99227fbcf9de9c2ff2a4
+size 62969640
run-1/checkpoint-500/optimizer.pt ADDED
@@ -0,0 +1,3 @@
+version https://git-lfs.github.com/spec/v1
+oid sha256:7f7f0647119995372d44a8188e13a108e3ef5d9a523007d03b3ae8146f183d7d
+size 126151570
run-1/checkpoint-500/preprocessor_config.json ADDED
@@ -0,0 +1,14 @@
(identical to run-0/checkpoint-500/preprocessor_config.json)
run-1/checkpoint-500/rng_state.pth ADDED
@@ -0,0 +1,3 @@
+version https://git-lfs.github.com/spec/v1
+oid sha256:610f67e3ef2a38bc4d059bf16b5476de440640d0cf679700dfa9d3a6aa152c59
+size 14244
run-1/checkpoint-500/scheduler.pt ADDED
@@ -0,0 +1,3 @@
+version https://git-lfs.github.com/spec/v1
+oid sha256:697be74c5751a48948b5062099f3c0e459b59442bbc7d6efa5c14f9a1c356fc2
+size 1064
run-1/checkpoint-500/trainer_state.json ADDED
@@ -0,0 +1,112 @@
+{
+  "best_metric": null,
+  "best_model_checkpoint": null,
+  "epoch": 4.310344827586207,
+  "eval_steps": 100,
+  "global_step": 500,
+  "is_hyper_param_search": true,
+  "is_local_process_zero": true,
+  "is_world_process_zero": true,
+  "log_history": [
+    {"epoch": 0.8620689655172413, "grad_norm": 1.0919902324676514, "learning_rate": 0.000511963042168714, "loss": 1.1243, "step": 100},
+    {"epoch": 0.8620689655172413, "eval_loss": 0.3352593779563904, "eval_runtime": 21.0273, "eval_samples_per_second": 4.898, "eval_steps_per_second": 1.236, "step": 100},
+    {"epoch": 1.7241379310344827, "grad_norm": 0.7907235622406006, "learning_rate": 0.0004055258192646154, "loss": 0.2577, "step": 200},
+    {"epoch": 1.7241379310344827, "eval_loss": 0.2866515517234802, "eval_runtime": 20.7228, "eval_samples_per_second": 4.97, "eval_steps_per_second": 1.255, "step": 200},
+    {"epoch": 2.586206896551724, "grad_norm": 0.10258855670690536, "learning_rate": 0.0002990885963605169, "loss": 0.1337, "step": 300},
+    {"epoch": 2.586206896551724, "eval_loss": 0.2725263833999634, "eval_runtime": 20.8639, "eval_samples_per_second": 4.937, "eval_steps_per_second": 1.246, "step": 300},
+    {"epoch": 3.4482758620689653, "grad_norm": 0.2434462606906891, "learning_rate": 0.00019265137345641833, "loss": 0.0734, "step": 400},
+    {"epoch": 3.4482758620689653, "eval_loss": 0.27984029054641724, "eval_runtime": 20.7967, "eval_samples_per_second": 4.953, "eval_steps_per_second": 1.25, "step": 400},
+    {"epoch": 4.310344827586207, "grad_norm": 0.4092065393924713, "learning_rate": 8.621415055231981e-05, "loss": 0.0422, "step": 500},
+    {"epoch": 4.310344827586207, "eval_loss": 0.2786606252193451, "eval_runtime": 20.6352, "eval_samples_per_second": 4.991, "eval_steps_per_second": 1.26, "step": 500}
+  ],
+  "logging_steps": 100,
+  "max_steps": 580,
+  "num_input_tokens_seen": 0,
+  "num_train_epochs": 5,
+  "save_steps": 500,
+  "stateful_callbacks": {
+    "TrainerControl": {
+      "args": {"should_epoch_stop": false, "should_evaluate": false, "should_log": false, "should_save": true, "should_training_stop": false},
+      "attributes": {}
+    }
+  },
+  "total_flos": 1.370747847573504e+19,
+  "train_batch_size": 8,
+  "trial_name": null,
+  "trial_params": {
+    "learning_rate": 0.0005641172813917223,
+    "per_device_train_batch_size": 8,
+    "weight_decay": 0.0006732813397449721
+  }
+}
run-1/checkpoint-500/training_args.bin ADDED
@@ -0,0 +1,3 @@
+version https://git-lfs.github.com/spec/v1
+oid sha256:586b5e4b6f0b63d98f1fb72f63d9cfa27c3f5271d96663ef93dac3b215590cac
+size 5240
run-11/checkpoint-500/README.md ADDED
@@ -0,0 +1,202 @@
(identical to run-0/checkpoint-500/README.md: the auto-generated PEFT model-card template, ending with "- PEFT 0.11.2.dev0")
run-11/checkpoint-500/adapter_config.json ADDED
@@ -0,0 +1,32 @@
(same as run-0/checkpoint-500/adapter_config.json, except target_modules is listed as ["q_proj", "v_proj"])
run-11/checkpoint-500/adapter_model.safetensors ADDED
@@ -0,0 +1,3 @@
+version https://git-lfs.github.com/spec/v1
+oid sha256:cc1578b5c1ed9b672936f479c98346886397fe5bb539e468cc79bac92327f5b3
+size 62969640
run-11/checkpoint-500/optimizer.pt ADDED
@@ -0,0 +1,3 @@
+version https://git-lfs.github.com/spec/v1
+oid sha256:b3e3151830d547370b456265c4be2a67e395b2100069af1aa9fb4418c89be29e
+size 126151570
run-11/checkpoint-500/preprocessor_config.json ADDED
@@ -0,0 +1,14 @@
(identical to run-0/checkpoint-500/preprocessor_config.json)
run-11/checkpoint-500/rng_state.pth ADDED
@@ -0,0 +1,3 @@
+version https://git-lfs.github.com/spec/v1
+oid sha256:4e8ee737576623fd1565b4ad626caaadf9a95e0f8cbaa53304cbd8316b784fc7
+size 14244
run-11/checkpoint-500/scheduler.pt ADDED
@@ -0,0 +1,3 @@
+version https://git-lfs.github.com/spec/v1
+oid sha256:0a9ac6f5a41063d009620aaf578bb32960c6c36ab869fa1af8b00a00116fd422
+size 1064
run-11/checkpoint-500/trainer_state.json ADDED
@@ -0,0 +1,187 @@
+{
+  "best_metric": null,
+  "best_model_checkpoint": null,
+  "epoch": 2.1551724137931036,
+  "eval_steps": 50,
+  "global_step": 500,
+  "is_hyper_param_search": true,
+  "is_local_process_zero": true,
+  "is_world_process_zero": true,
+  "log_history": [
+    {"epoch": 0.21551724137931033, "grad_norm": 1.7124779224395752, "learning_rate": 0.0004653203635082234, "loss": 1.8899, "step": 50},
+    {"epoch": 0.21551724137931033, "eval_loss": 0.7426255941390991, "eval_runtime": 21.193, "eval_samples_per_second": 4.86, "eval_steps_per_second": 1.227, "step": 50},
+    {"epoch": 0.43103448275862066, "grad_norm": 0.7779552936553955, "learning_rate": 0.00044027121018106054, "loss": 0.5956, "step": 100},
+    {"epoch": 0.43103448275862066, "eval_loss": 0.4363415837287903, "eval_runtime": 21.3828, "eval_samples_per_second": 4.817, "eval_steps_per_second": 1.216, "step": 100},
+    {"epoch": 0.646551724137931, "grad_norm": 1.916613221168518, "learning_rate": 0.0004035206918020071, "loss": 0.3791, "step": 150},
+    {"epoch": 0.646551724137931, "eval_loss": 0.4209233522415161, "eval_runtime": 21.3351, "eval_samples_per_second": 4.828, "eval_steps_per_second": 1.219, "step": 150},
+    {"epoch": 0.8620689655172413, "grad_norm": 1.6026177406311035, "learning_rate": 0.0003667701734229536, "loss": 0.3729, "step": 200},
+    {"epoch": 0.8620689655172413, "eval_loss": 0.41846963763237, "eval_runtime": 21.4128, "eval_samples_per_second": 4.81, "eval_steps_per_second": 1.214, "step": 200},
+    {"epoch": 1.0775862068965518, "grad_norm": 1.0747971534729004, "learning_rate": 0.00033001965504390017, "loss": 0.2689, "step": 250},
+    {"epoch": 1.0775862068965518, "eval_loss": 0.3982403576374054, "eval_runtime": 21.299, "eval_samples_per_second": 4.836, "eval_steps_per_second": 1.221, "step": 250},
+    {"epoch": 1.293103448275862, "grad_norm": 0.07190462946891785, "learning_rate": 0.0002932691366648467, "loss": 0.2102, "step": 300},
+    {"epoch": 1.293103448275862, "eval_loss": 0.3758964240550995, "eval_runtime": 21.465, "eval_samples_per_second": 4.799, "eval_steps_per_second": 1.211, "step": 300},
+    {"epoch": 1.5086206896551724, "grad_norm": 1.1930207014083862, "learning_rate": 0.00025651861828579324, "loss": 0.2177, "step": 350},
+    {"epoch": 1.5086206896551724, "eval_loss": 0.3926349878311157, "eval_runtime": 21.5556, "eval_samples_per_second": 4.778, "eval_steps_per_second": 1.206, "step": 350},
+    {"epoch": 1.7241379310344827, "grad_norm": 2.48095703125, "learning_rate": 0.00021976809990673974, "loss": 0.1461, "step": 400},
+    {"epoch": 1.7241379310344827, "eval_loss": 0.3509480059146881, "eval_runtime": 21.4602, "eval_samples_per_second": 4.8, "eval_steps_per_second": 1.212, "step": 400},
+    {"epoch": 1.9396551724137931, "grad_norm": 0.2920898497104645, "learning_rate": 0.00018301758152768628, "loss": 0.116, "step": 450},
+    {"epoch": 1.9396551724137931, "eval_loss": 0.34957313537597656, "eval_runtime": 21.5134, "eval_samples_per_second": 4.788, "eval_steps_per_second": 1.209, "step": 450},
+    {"epoch": 2.1551724137931036, "grad_norm": 1.2332462072372437, "learning_rate": 0.0001462670631486328, "loss": 0.1267, "step": 500},
+    {"epoch": 2.1551724137931036, "eval_loss": 0.347840815782547, "eval_runtime": 21.5522, "eval_samples_per_second": 4.779, "eval_steps_per_second": 1.206, "step": 500}
+  ],
+  "logging_steps": 50,
+  "max_steps": 696,
+  "num_input_tokens_seen": 0,
+  "num_train_epochs": 3,
+  "save_steps": 500,
+  "stateful_callbacks": {
+    "TrainerControl": {
+      "args": {"should_epoch_stop": false, "should_evaluate": false, "should_log": false, "should_save": true, "should_training_stop": false},
+      "attributes": {}
+    }
+  },
+  "total_flos": 6.85373923786752e+18,
+  "train_batch_size": 4,
+  "trial_name": null,
+  "trial_params": {
+    "learning_rate": 0.0004748166974573708,
+    "per_device_train_batch_size": 4,
+    "weight_decay": 1.1048142278460074e-05
+  }
+}
run-11/checkpoint-500/training_args.bin ADDED
@@ -0,0 +1,3 @@
+version https://git-lfs.github.com/spec/v1
+oid sha256:4c7daea9ee3a72c1f0ccde9bdbb90b3fc2a862ce798a504a18fb3cd0854f2461
+size 5240
run-14/checkpoint-500/README.md ADDED
@@ -0,0 +1,202 @@
(identical to run-0/checkpoint-500/README.md: the auto-generated PEFT model-card template, ending with "- PEFT 0.11.2.dev0")
run-14/checkpoint-500/adapter_config.json ADDED
@@ -0,0 +1,32 @@
(identical to run-11/checkpoint-500/adapter_config.json)
run-14/checkpoint-500/adapter_model.safetensors ADDED
@@ -0,0 +1,3 @@
+version https://git-lfs.github.com/spec/v1
+oid sha256:65753749f9dec960bf0e91bb6d44e92354ee377b13564af56cf911c6ed0451ae
+size 62969640
run-14/checkpoint-500/optimizer.pt ADDED
@@ -0,0 +1,3 @@
+version https://git-lfs.github.com/spec/v1
+oid sha256:205be8da36802a453b8ccc171f51cbcb8d46c31f700c2dee5c6a1f1d05b86a66
+size 126151570
run-14/checkpoint-500/preprocessor_config.json ADDED
@@ -0,0 +1,14 @@
(identical to run-0/checkpoint-500/preprocessor_config.json)
run-14/checkpoint-500/rng_state.pth ADDED
@@ -0,0 +1,3 @@
+version https://git-lfs.github.com/spec/v1
+oid sha256:4e8ee737576623fd1565b4ad626caaadf9a95e0f8cbaa53304cbd8316b784fc7
+size 14244
run-14/checkpoint-500/scheduler.pt ADDED
@@ -0,0 +1,3 @@
+version https://git-lfs.github.com/spec/v1
+oid sha256:d125eaa094e272a98c53c93763667ee1b063b8df0becf7b73d774466b90f5ca8
+size 1064
run-14/checkpoint-500/trainer_state.json ADDED
@@ -0,0 +1,187 @@
+{
+  "best_metric": null,
+  "best_model_checkpoint": null,
+  "epoch": 2.1551724137931036,
+  "eval_steps": 50,
+  "global_step": 500,
+  "is_hyper_param_search": true,
+  "is_local_process_zero": true,
+  "is_world_process_zero": true,
+  "log_history": [
+    {"epoch": 0.21551724137931033, "grad_norm": 1.521485686302185, "learning_rate": 0.00027335775706699585, "loss": 2.1579, "step": 50},
+    {"epoch": 0.21551724137931033, "eval_loss": 0.8509826064109802, "eval_runtime": 21.3286, "eval_samples_per_second": 4.829, "eval_steps_per_second": 1.219, "step": 50},
+    {"epoch": 0.43103448275862066, "grad_norm": 1.5416312217712402, "learning_rate": 0.0002590741363495885, "loss": 0.9505, "step": 100},
+    {"epoch": 0.43103448275862066, "eval_loss": 0.45885732769966125, "eval_runtime": 21.5445, "eval_samples_per_second": 4.781, "eval_steps_per_second": 1.207, "step": 100},
+    {"epoch": 0.646551724137931, "grad_norm": 1.1615911722183228, "learning_rate": 0.0002374846249871228, "loss": 0.3657, "step": 150},
+    {"epoch": 0.646551724137931, "eval_loss": 0.4113822281360626, "eval_runtime": 21.6048, "eval_samples_per_second": 4.767, "eval_steps_per_second": 1.203, "step": 150},
+    {"epoch": 0.8620689655172413, "grad_norm": 1.142115592956543, "learning_rate": 0.00021589511362465712, "loss": 0.3628, "step": 200},
+    {"epoch": 0.8620689655172413, "eval_loss": 0.41732752323150635, "eval_runtime": 21.7592, "eval_samples_per_second": 4.734, "eval_steps_per_second": 1.195, "step": 200},
+    {"epoch": 1.0775862068965518, "grad_norm": 1.0338547229766846, "learning_rate": 0.0001943056022621914, "loss": 0.2827, "step": 250},
+    {"epoch": 1.0775862068965518, "eval_loss": 0.36462482810020447, "eval_runtime": 21.5612, "eval_samples_per_second": 4.777, "eval_steps_per_second": 1.206, "step": 250},
+    {"epoch": 1.293103448275862, "grad_norm": 0.10536845773458481, "learning_rate": 0.0001727160908997257, "loss": 0.2163, "step": 300},
+    {"epoch": 1.293103448275862, "eval_loss": 0.37129101157188416, "eval_runtime": 21.6513, "eval_samples_per_second": 4.757, "eval_steps_per_second": 1.201, "step": 300},
+    {"epoch": 1.5086206896551724, "grad_norm": 1.3671547174453735, "learning_rate": 0.00015112657953726, "loss": 0.2207, "step": 350},
+    {"epoch": 1.5086206896551724, "eval_loss": 0.3586764633655548, "eval_runtime": 21.5998, "eval_samples_per_second": 4.769, "eval_steps_per_second": 1.204, "step": 350},
+    {"epoch": 1.7241379310344827, "grad_norm": 1.0158982276916504, "learning_rate": 0.00012953706817479425, "loss": 0.1471, "step": 400},
+    {"epoch": 1.7241379310344827, "eval_loss": 0.3449494242668152, "eval_runtime": 21.587, "eval_samples_per_second": 4.771, "eval_steps_per_second": 1.204, "step": 400},
+    {"epoch": 1.9396551724137931, "grad_norm": 1.3887505531311035, "learning_rate": 0.00010794755681232856, "loss": 0.127, "step": 450},
+    {"epoch": 1.9396551724137931, "eval_loss": 0.35482650995254517, "eval_runtime": 21.5569, "eval_samples_per_second": 4.778, "eval_steps_per_second": 1.206, "step": 450},
+    {"epoch": 2.1551724137931036, "grad_norm": 1.2020519971847534, "learning_rate": 8.635804544986286e-05, "loss": 0.1393, "step": 500},
+    {"epoch": 2.1551724137931036, "eval_loss": 0.35960131883621216, "eval_runtime": 21.525, "eval_samples_per_second": 4.785, "eval_steps_per_second": 1.208, "step": 500}
+  ],
+  "logging_steps": 50,
+  "max_steps": 696,
+  "num_input_tokens_seen": 0,
+  "num_train_epochs": 3,
+  "save_steps": 500,
+  "stateful_callbacks": {
+    "TrainerControl": {
+      "args": {"should_epoch_stop": false, "should_evaluate": false, "should_log": false, "should_save": true, "should_training_stop": false},
+      "attributes": {}
+    }
+  },
+  "total_flos": 6.85373923786752e+18,
+  "train_batch_size": 4,
+  "trial_name": null,
+  "trial_params": {
+    "learning_rate": 0.000278936486803057,
+    "per_device_train_batch_size": 4,
+    "weight_decay": 0.00018977840930045
+  }
+}
run-14/checkpoint-500/training_args.bin ADDED
@@ -0,0 +1,3 @@
+ version https://git-lfs.github.com/spec/v1
+ oid sha256:a5ffb4ecb13597b62de8ca87970646438e1a5e0711c023ad7845f440752d9a66
+ size 5240
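
Note: the `*.bin`, `*.pt`, and `*.safetensors` hunks in this commit are three-line Git LFS pointer files, not the binaries themselves; Git LFS substitutes the real payload (identified by the `oid` hash and `size`) at checkout. A minimal sketch of reading such a pointer, assuming the repository was cloned without LFS smudging; the helper below is illustrative, not part of this repo:

```python
# Sketch: parse a three-line Git LFS pointer file like the ones in this diff.
# The path is illustrative; any of the *.bin/*.pt/*.safetensors files would do.
def parse_lfs_pointer(path: str) -> dict:
    fields = {}
    with open(path) as f:
        for line in f:
            key, _, value = line.strip().partition(" ")
            fields[key] = value
    return fields  # e.g. {"version": ..., "oid": "sha256:...", "size": "5240"}

info = parse_lfs_pointer("run-14/checkpoint-500/training_args.bin")
print(info["oid"], info["size"])
```
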
run-2/checkpoint-1000/README.md ADDED
@@ -0,0 +1,202 @@
+ ---
+ base_model: openai/whisper-large-v3
+ library_name: peft
+ ---
+
+ # Model Card for Model ID
+
+ <!-- Provide a quick summary of what the model is/does. -->
+
+
+
+ ## Model Details
+
+ ### Model Description
+
+ <!-- Provide a longer summary of what this model is. -->
+
+
+
+ - **Developed by:** [More Information Needed]
+ - **Funded by [optional]:** [More Information Needed]
+ - **Shared by [optional]:** [More Information Needed]
+ - **Model type:** [More Information Needed]
+ - **Language(s) (NLP):** [More Information Needed]
+ - **License:** [More Information Needed]
+ - **Finetuned from model [optional]:** [More Information Needed]
+
+ ### Model Sources [optional]
+
+ <!-- Provide the basic links for the model. -->
+
+ - **Repository:** [More Information Needed]
+ - **Paper [optional]:** [More Information Needed]
+ - **Demo [optional]:** [More Information Needed]
+
+ ## Uses
+
+ <!-- Address questions around how the model is intended to be used, including the foreseeable users of the model and those affected by the model. -->
+
+ ### Direct Use
+
+ <!-- This section is for the model use without fine-tuning or plugging into a larger ecosystem/app. -->
+
+ [More Information Needed]
+
+ ### Downstream Use [optional]
+
+ <!-- This section is for the model use when fine-tuned for a task, or when plugged into a larger ecosystem/app -->
+
+ [More Information Needed]
+
+ ### Out-of-Scope Use
+
+ <!-- This section addresses misuse, malicious use, and uses that the model will not work well for. -->
+
+ [More Information Needed]
+
+ ## Bias, Risks, and Limitations
+
+ <!-- This section is meant to convey both technical and sociotechnical limitations. -->
+
+ [More Information Needed]
+
+ ### Recommendations
+
+ <!-- This section is meant to convey recommendations with respect to the bias, risk, and technical limitations. -->
+
+ Users (both direct and downstream) should be made aware of the risks, biases and limitations of the model. More information needed for further recommendations.
+
+ ## How to Get Started with the Model
+
+ Use the code below to get started with the model.
+
+ [More Information Needed]
+
+ ## Training Details
+
+ ### Training Data
+
+ <!-- This should link to a Dataset Card, perhaps with a short stub of information on what the training data is all about as well as documentation related to data pre-processing or additional filtering. -->
+
+ [More Information Needed]
+
+ ### Training Procedure
+
+ <!-- This relates heavily to the Technical Specifications. Content here should link to that section when it is relevant to the training procedure. -->
+
+ #### Preprocessing [optional]
+
+ [More Information Needed]
+
+
+ #### Training Hyperparameters
+
+ - **Training regime:** [More Information Needed] <!--fp32, fp16 mixed precision, bf16 mixed precision, bf16 non-mixed precision, fp16 non-mixed precision, fp8 mixed precision -->
+
+ #### Speeds, Sizes, Times [optional]
+
+ <!-- This section provides information about throughput, start/end time, checkpoint size if relevant, etc. -->
+
+ [More Information Needed]
+
+ ## Evaluation
+
+ <!-- This section describes the evaluation protocols and provides the results. -->
+
+ ### Testing Data, Factors & Metrics
+
+ #### Testing Data
+
+ <!-- This should link to a Dataset Card if possible. -->
+
+ [More Information Needed]
+
+ #### Factors
+
+ <!-- These are the things the evaluation is disaggregating by, e.g., subpopulations or domains. -->
+
+ [More Information Needed]
+
+ #### Metrics
+
+ <!-- These are the evaluation metrics being used, ideally with a description of why. -->
+
+ [More Information Needed]
+
+ ### Results
+
+ [More Information Needed]
+
+ #### Summary
+
+
+
+ ## Model Examination [optional]
+
+ <!-- Relevant interpretability work for the model goes here -->
+
+ [More Information Needed]
+
+ ## Environmental Impact
+
+ <!-- Total emissions (in grams of CO2eq) and additional considerations, such as electricity usage, go here. Edit the suggested text below accordingly -->
+
+ Carbon emissions can be estimated using the [Machine Learning Impact calculator](https://mlco2.github.io/impact#compute) presented in [Lacoste et al. (2019)](https://arxiv.org/abs/1910.09700).
+
+ - **Hardware Type:** [More Information Needed]
+ - **Hours used:** [More Information Needed]
+ - **Cloud Provider:** [More Information Needed]
+ - **Compute Region:** [More Information Needed]
+ - **Carbon Emitted:** [More Information Needed]
+
+ ## Technical Specifications [optional]
+
+ ### Model Architecture and Objective
+
+ [More Information Needed]
+
+ ### Compute Infrastructure
+
+ [More Information Needed]
+
+ #### Hardware
+
+ [More Information Needed]
+
+ #### Software
+
+ [More Information Needed]
+
+ ## Citation [optional]
+
+ <!-- If there is a paper or blog post introducing the model, the APA and Bibtex information for that should go in this section. -->
+
+ **BibTeX:**
+
+ [More Information Needed]
+
+ **APA:**
+
+ [More Information Needed]
+
+ ## Glossary [optional]
+
+ <!-- If relevant, include terms and calculations in this section that can help readers understand the model or model card. -->
+
+ [More Information Needed]
+
+ ## More Information [optional]
+
+ [More Information Needed]
+
+ ## Model Card Authors [optional]
+
+ [More Information Needed]
+
+ ## Model Card Contact
+
+ [More Information Needed]
+ ### Framework versions
+
+ - PEFT 0.11.2.dev0
run-2/checkpoint-1000/adapter_config.json ADDED
@@ -0,0 +1,32 @@
+ {
+ "alpha_pattern": {},
+ "auto_mapping": {
+ "base_model_class": "WhisperForConditionalGeneration",
+ "parent_library": "transformers.models.whisper.modeling_whisper"
+ },
+ "base_model_name_or_path": "openai/whisper-large-v3",
+ "bias": "none",
+ "fan_in_fan_out": false,
+ "inference_mode": true,
+ "init_lora_weights": true,
+ "layer_replication": null,
+ "layers_pattern": null,
+ "layers_to_transform": null,
+ "loftq_config": {},
+ "lora_alpha": 64,
+ "lora_dropout": 0.05,
+ "megatron_config": null,
+ "megatron_core": "megatron.core",
+ "modules_to_save": null,
+ "peft_type": "LORA",
+ "r": 32,
+ "rank_pattern": {},
+ "revision": null,
+ "target_modules": [
+ "v_proj",
+ "q_proj"
+ ],
+ "task_type": null,
+ "use_dora": false,
+ "use_rslora": false
+ }
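
The `adapter_config.json` above describes a LoRA adapter (r=32, alpha=64, dropout 0.05) on the `q_proj`/`v_proj` attention projections of `openai/whisper-large-v3`. A minimal sketch of loading such a checkpoint for inference with PEFT, assuming the checkpoint directory has been downloaded locally; the local path is illustrative, not a hub model ID:

```python
# Sketch: load one of these LoRA checkpoints for inference with PEFT.
from transformers import WhisperForConditionalGeneration
from peft import PeftModel

base = WhisperForConditionalGeneration.from_pretrained("openai/whisper-large-v3")
# Reads adapter_config.json and adapter_model.safetensors from the directory.
model = PeftModel.from_pretrained(base, "run-2/checkpoint-1000")
model.eval()
```
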
run-2/checkpoint-1000/adapter_model.safetensors ADDED
@@ -0,0 +1,3 @@
+ version https://git-lfs.github.com/spec/v1
+ oid sha256:fb7579f8b5f03d1ad5d26d88649c690729bd42a41c96b79e0709277ff6fd0447
+ size 62969640
run-2/checkpoint-1000/optimizer.pt ADDED
@@ -0,0 +1,3 @@
+ version https://git-lfs.github.com/spec/v1
+ oid sha256:21b1730bf12097dfa3d8ef0399635edc289a66c56dd4334f7115cc044e1bee76
+ size 126151570
run-2/checkpoint-1000/preprocessor_config.json ADDED
@@ -0,0 +1,14 @@
+ {
+ "chunk_length": 30,
+ "feature_extractor_type": "WhisperFeatureExtractor",
+ "feature_size": 128,
+ "hop_length": 160,
+ "n_fft": 400,
+ "n_samples": 480000,
+ "nb_max_frames": 3000,
+ "padding_side": "right",
+ "padding_value": 0.0,
+ "processor_class": "WhisperProcessor",
+ "return_attention_mask": false,
+ "sampling_rate": 16000
+ }
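
The `preprocessor_config.json` above is Whisper's standard front end: 16 kHz audio, 128 mel bins, a 160-sample hop, and 30-second (480000-sample) windows padded to 3000 frames. A minimal sketch of applying it, assuming the stock `openai/whisper-large-v3` processor; the dummy audio is illustrative:

```python
# Sketch: the config above matches Whisper's stock feature extractor --
# 16 kHz input, 128 mel bins, 30 s (480000-sample) windows, 3000 frames.
import numpy as np
from transformers import WhisperProcessor

processor = WhisperProcessor.from_pretrained("openai/whisper-large-v3")
audio = np.zeros(16000 * 5, dtype=np.float32)  # 5 s of dummy 16 kHz audio
inputs = processor(audio, sampling_rate=16000, return_tensors="pt")
print(inputs.input_features.shape)  # expected: (1, 128, 3000)
```
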
run-2/checkpoint-1000/rng_state.pth ADDED
@@ -0,0 +1,3 @@
+ version https://git-lfs.github.com/spec/v1
+ oid sha256:9ea3c5a2f00688c8f2d21b62d2c759f3e10550fb6cd1104d2317252f3df05792
+ size 14244
run-2/checkpoint-1000/scheduler.pt ADDED
@@ -0,0 +1,3 @@
+ version https://git-lfs.github.com/spec/v1
+ oid sha256:23a75ba99085d46a7ab302dd0b2fa0081115e6f095fdc524727c48f4ce2f7bbb
+ size 1064
run-2/checkpoint-1000/trainer_state.json ADDED
@@ -0,0 +1,187 @@
+ {
+ "best_metric": null,
+ "best_model_checkpoint": null,
+ "epoch": 4.310344827586207,
+ "eval_steps": 100,
+ "global_step": 1000,
+ "is_hyper_param_search": true,
+ "is_local_process_zero": true,
+ "is_world_process_zero": true,
+ "log_history": [
+ {
+ "epoch": 0.43103448275862066,
+ "grad_norm": 0.2723050117492676,
+ "learning_rate": 0.008792554642694678,
+ "loss": 7.2774,
+ "step": 100
+ },
+ {
+ "epoch": 0.43103448275862066,
+ "eval_loss": 6.054980754852295,
+ "eval_runtime": 20.3527,
+ "eval_samples_per_second": 5.061,
+ "eval_steps_per_second": 1.277,
+ "step": 100
+ },
+ {
+ "epoch": 0.8620689655172413,
+ "grad_norm": 0.18630041182041168,
+ "learning_rate": 0.007966186725148186,
+ "loss": 5.427,
+ "step": 200
+ },
+ {
+ "epoch": 0.8620689655172413,
+ "eval_loss": 4.836428642272949,
+ "eval_runtime": 20.9898,
+ "eval_samples_per_second": 4.907,
+ "eval_steps_per_second": 1.239,
+ "step": 200
+ },
+ {
+ "epoch": 1.293103448275862,
+ "grad_norm": 0.2724410593509674,
+ "learning_rate": 0.007139818807601695,
+ "loss": 4.768,
+ "step": 300
+ },
+ {
+ "epoch": 1.293103448275862,
+ "eval_loss": 4.6033172607421875,
+ "eval_runtime": 21.1004,
+ "eval_samples_per_second": 4.881,
+ "eval_steps_per_second": 1.232,
+ "step": 300
+ },
+ {
+ "epoch": 1.7241379310344827,
+ "grad_norm": 0.14782890677452087,
+ "learning_rate": 0.006313450890055201,
+ "loss": 4.407,
+ "step": 400
+ },
+ {
+ "epoch": 1.7241379310344827,
+ "eval_loss": 4.464559555053711,
+ "eval_runtime": 21.3779,
+ "eval_samples_per_second": 4.818,
+ "eval_steps_per_second": 1.216,
+ "step": 400
+ },
+ {
+ "epoch": 2.1551724137931036,
+ "grad_norm": 0.14380355179309845,
+ "learning_rate": 0.005487082972508709,
+ "loss": 4.4103,
+ "step": 500
+ },
+ {
+ "epoch": 2.1551724137931036,
+ "eval_loss": 4.4448137283325195,
+ "eval_runtime": 21.3614,
+ "eval_samples_per_second": 4.822,
+ "eval_steps_per_second": 1.217,
+ "step": 500
+ },
+ {
+ "epoch": 2.586206896551724,
+ "grad_norm": 0.853726863861084,
+ "learning_rate": 0.004660715054962217,
+ "loss": 4.4563,
+ "step": 600
+ },
+ {
+ "epoch": 2.586206896551724,
+ "eval_loss": 4.372040748596191,
+ "eval_runtime": 21.4365,
+ "eval_samples_per_second": 4.805,
+ "eval_steps_per_second": 1.213,
+ "step": 600
+ },
+ {
+ "epoch": 3.0172413793103448,
+ "grad_norm": 29.915081024169922,
+ "learning_rate": 0.0038343471374157247,
+ "loss": 4.3135,
+ "step": 700
+ },
+ {
+ "epoch": 3.0172413793103448,
+ "eval_loss": 4.337521553039551,
+ "eval_runtime": 21.4878,
+ "eval_samples_per_second": 4.793,
+ "eval_steps_per_second": 1.21,
+ "step": 700
+ },
+ {
+ "epoch": 3.4482758620689653,
+ "grad_norm": 0.6140814423561096,
+ "learning_rate": 0.003007979219869232,
+ "loss": 4.1072,
+ "step": 800
+ },
+ {
+ "epoch": 3.4482758620689653,
+ "eval_loss": 4.2883219718933105,
+ "eval_runtime": 21.4591,
+ "eval_samples_per_second": 4.8,
+ "eval_steps_per_second": 1.212,
+ "step": 800
+ },
+ {
+ "epoch": 3.8793103448275863,
+ "grad_norm": 0.20803356170654297,
+ "learning_rate": 0.0021816113023227397,
+ "loss": 4.3397,
+ "step": 900
+ },
+ {
+ "epoch": 3.8793103448275863,
+ "eval_loss": 4.263036727905273,
+ "eval_runtime": 21.4871,
+ "eval_samples_per_second": 4.794,
+ "eval_steps_per_second": 1.21,
+ "step": 900
+ },
+ {
+ "epoch": 4.310344827586207,
+ "grad_norm": 0.19872911274433136,
+ "learning_rate": 0.0013552433847762474,
+ "loss": 3.9199,
+ "step": 1000
+ },
+ {
+ "epoch": 4.310344827586207,
+ "eval_loss": 4.16709566116333,
+ "eval_runtime": 21.4824,
+ "eval_samples_per_second": 4.795,
+ "eval_steps_per_second": 1.21,
+ "step": 1000
+ }
+ ],
+ "logging_steps": 100,
+ "max_steps": 1160,
+ "num_input_tokens_seen": 0,
+ "num_train_epochs": 5,
+ "save_steps": 500,
+ "stateful_callbacks": {
+ "TrainerControl": {
+ "args": {
+ "should_epoch_stop": false,
+ "should_evaluate": false,
+ "should_log": false,
+ "should_save": true,
+ "should_training_stop": false
+ },
+ "attributes": {}
+ }
+ },
+ "total_flos": 1.370747847573504e+19,
+ "train_batch_size": 4,
+ "trial_name": null,
+ "trial_params": {
+ "learning_rate": 0.009172683884766065,
+ "per_device_train_batch_size": 4,
+ "weight_decay": 0.0003005652075108987
+ }
+ }
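
The `run-*/checkpoint-*` directories and the `trial_params` blocks (note `"is_hyper_param_search": true`) are the artifacts that `transformers.Trainer.hyperparameter_search` leaves behind, one `run-N` folder per trial. A minimal sketch of how such a search is typically launched; the search space below is an assumption inferred from the logged `trial_params`, not the exact code used here, and `trainer` is assumed to be an already-configured `Trainer`:

```python
# Sketch of an Optuna-backed hyperparameter search of the kind that produces
# these run-*/ folders; ranges are assumptions inferred from the logged trials.
def hp_space(trial):
    return {
        "learning_rate": trial.suggest_float("learning_rate", 1e-5, 1e-2, log=True),
        "per_device_train_batch_size": trial.suggest_categorical(
            "per_device_train_batch_size", [4, 8, 16]
        ),
        "weight_decay": trial.suggest_float("weight_decay", 1e-6, 1e-1, log=True),
    }

best_run = trainer.hyperparameter_search(  # `trainer`: a configured transformers.Trainer
    direction="minimize",   # minimize eval_loss
    backend="optuna",
    hp_space=hp_space,
    n_trials=15,            # run-0 .. run-14 suggests ~15 trials
)
print(best_run.hyperparameters)
```
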
run-2/checkpoint-1000/training_args.bin ADDED
@@ -0,0 +1,3 @@
+ version https://git-lfs.github.com/spec/v1
+ oid sha256:9f85248d48b74fd5496584a4c64687155d035a3e8ea717f3345e58d241e2dd13
+ size 5240
run-2/checkpoint-500/README.md ADDED
@@ -0,0 +1,202 @@
+ ---
+ base_model: openai/whisper-large-v3
+ library_name: peft
+ ---
+
+ # Model Card for Model ID
+
+ <!-- Provide a quick summary of what the model is/does. -->
+
+
+
+ ## Model Details
+
+ ### Model Description
+
+ <!-- Provide a longer summary of what this model is. -->
+
+
+
+ - **Developed by:** [More Information Needed]
+ - **Funded by [optional]:** [More Information Needed]
+ - **Shared by [optional]:** [More Information Needed]
+ - **Model type:** [More Information Needed]
+ - **Language(s) (NLP):** [More Information Needed]
+ - **License:** [More Information Needed]
+ - **Finetuned from model [optional]:** [More Information Needed]
+
+ ### Model Sources [optional]
+
+ <!-- Provide the basic links for the model. -->
+
+ - **Repository:** [More Information Needed]
+ - **Paper [optional]:** [More Information Needed]
+ - **Demo [optional]:** [More Information Needed]
+
+ ## Uses
+
+ <!-- Address questions around how the model is intended to be used, including the foreseeable users of the model and those affected by the model. -->
+
+ ### Direct Use
+
+ <!-- This section is for the model use without fine-tuning or plugging into a larger ecosystem/app. -->
+
+ [More Information Needed]
+
+ ### Downstream Use [optional]
+
+ <!-- This section is for the model use when fine-tuned for a task, or when plugged into a larger ecosystem/app -->
+
+ [More Information Needed]
+
+ ### Out-of-Scope Use
+
+ <!-- This section addresses misuse, malicious use, and uses that the model will not work well for. -->
+
+ [More Information Needed]
+
+ ## Bias, Risks, and Limitations
+
+ <!-- This section is meant to convey both technical and sociotechnical limitations. -->
+
+ [More Information Needed]
+
+ ### Recommendations
+
+ <!-- This section is meant to convey recommendations with respect to the bias, risk, and technical limitations. -->
+
+ Users (both direct and downstream) should be made aware of the risks, biases and limitations of the model. More information needed for further recommendations.
+
+ ## How to Get Started with the Model
+
+ Use the code below to get started with the model.
+
+ [More Information Needed]
+
+ ## Training Details
+
+ ### Training Data
+
+ <!-- This should link to a Dataset Card, perhaps with a short stub of information on what the training data is all about as well as documentation related to data pre-processing or additional filtering. -->
+
+ [More Information Needed]
+
+ ### Training Procedure
+
+ <!-- This relates heavily to the Technical Specifications. Content here should link to that section when it is relevant to the training procedure. -->
+
+ #### Preprocessing [optional]
+
+ [More Information Needed]
+
+
+ #### Training Hyperparameters
+
+ - **Training regime:** [More Information Needed] <!--fp32, fp16 mixed precision, bf16 mixed precision, bf16 non-mixed precision, fp16 non-mixed precision, fp8 mixed precision -->
+
+ #### Speeds, Sizes, Times [optional]
+
+ <!-- This section provides information about throughput, start/end time, checkpoint size if relevant, etc. -->
+
+ [More Information Needed]
+
+ ## Evaluation
+
+ <!-- This section describes the evaluation protocols and provides the results. -->
+
+ ### Testing Data, Factors & Metrics
+
+ #### Testing Data
+
+ <!-- This should link to a Dataset Card if possible. -->
+
+ [More Information Needed]
+
+ #### Factors
+
+ <!-- These are the things the evaluation is disaggregating by, e.g., subpopulations or domains. -->
+
+ [More Information Needed]
+
+ #### Metrics
+
+ <!-- These are the evaluation metrics being used, ideally with a description of why. -->
+
+ [More Information Needed]
+
+ ### Results
+
+ [More Information Needed]
+
+ #### Summary
+
+
+
+ ## Model Examination [optional]
+
+ <!-- Relevant interpretability work for the model goes here -->
+
+ [More Information Needed]
+
+ ## Environmental Impact
+
+ <!-- Total emissions (in grams of CO2eq) and additional considerations, such as electricity usage, go here. Edit the suggested text below accordingly -->
+
+ Carbon emissions can be estimated using the [Machine Learning Impact calculator](https://mlco2.github.io/impact#compute) presented in [Lacoste et al. (2019)](https://arxiv.org/abs/1910.09700).
+
+ - **Hardware Type:** [More Information Needed]
+ - **Hours used:** [More Information Needed]
+ - **Cloud Provider:** [More Information Needed]
+ - **Compute Region:** [More Information Needed]
+ - **Carbon Emitted:** [More Information Needed]
+
+ ## Technical Specifications [optional]
+
+ ### Model Architecture and Objective
+
+ [More Information Needed]
+
+ ### Compute Infrastructure
+
+ [More Information Needed]
+
+ #### Hardware
+
+ [More Information Needed]
+
+ #### Software
+
+ [More Information Needed]
+
+ ## Citation [optional]
+
+ <!-- If there is a paper or blog post introducing the model, the APA and Bibtex information for that should go in this section. -->
+
+ **BibTeX:**
+
+ [More Information Needed]
+
+ **APA:**
+
+ [More Information Needed]
+
+ ## Glossary [optional]
+
+ <!-- If relevant, include terms and calculations in this section that can help readers understand the model or model card. -->
+
+ [More Information Needed]
+
+ ## More Information [optional]
+
+ [More Information Needed]
+
+ ## Model Card Authors [optional]
+
+ [More Information Needed]
+
+ ## Model Card Contact
+
+ [More Information Needed]
+ ### Framework versions
+
+ - PEFT 0.11.2.dev0
run-2/checkpoint-500/adapter_config.json ADDED
@@ -0,0 +1,32 @@
+ {
+ "alpha_pattern": {},
+ "auto_mapping": {
+ "base_model_class": "WhisperForConditionalGeneration",
+ "parent_library": "transformers.models.whisper.modeling_whisper"
+ },
+ "base_model_name_or_path": "openai/whisper-large-v3",
+ "bias": "none",
+ "fan_in_fan_out": false,
+ "inference_mode": true,
+ "init_lora_weights": true,
+ "layer_replication": null,
+ "layers_pattern": null,
+ "layers_to_transform": null,
+ "loftq_config": {},
+ "lora_alpha": 64,
+ "lora_dropout": 0.05,
+ "megatron_config": null,
+ "megatron_core": "megatron.core",
+ "modules_to_save": null,
+ "peft_type": "LORA",
+ "r": 32,
+ "rank_pattern": {},
+ "revision": null,
+ "target_modules": [
+ "q_proj",
+ "v_proj"
+ ],
+ "task_type": null,
+ "use_dora": false,
+ "use_rslora": false
+ }
run-2/checkpoint-500/adapter_model.safetensors ADDED
@@ -0,0 +1,3 @@
+ version https://git-lfs.github.com/spec/v1
+ oid sha256:f6c0ff3e26a63daef37b7245cfc1182cc3a7acc128bb743fa4c3c349b9179b6a
+ size 62969640