- **License:** [More Information Needed]
- **Finetuned from model [optional]:** [More Information Needed]
# Learning Rate Optimization for Language Model Fine-tuning

This script implements an advanced learning rate optimization strategy for fine-tuning large language models, combining Bayesian optimization with Gaussian Process Regression (GPR) for precise learning rate selection.

## Key Features

### 1. Bayesian Optimization

* Uses the Optuna framework to perform a systematic learning rate search
* Implements Tree-structured Parzen Estimators (TPE) for efficient hyperparameter optimization
* Automatically explores learning rates between 1e-6 and 1e-4 in log space
### 2. Advanced Loss Tracking

* Evaluates model performance using the mean loss over the final 20% of training steps
* Handles training failures gracefully, with proper memory management
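The two points above can be sketched as follows; `train_fn` is a hypothetical callable that runs one trial and returns the per-step loss history:

```python
import numpy as np

def final_window_loss(losses, fraction=0.2):
    """Mean loss over the trailing `fraction` of recorded steps.

    Averaging a final window smooths the step-to-step noise that a
    single last-step loss would pass straight to the optimizer.
    """
    losses = np.asarray(losses, dtype=float)
    n_tail = max(1, int(round(len(losses) * fraction)))
    return float(losses[-n_tail:].mean())

def scored_trial(train_fn, lr):
    """Score one trial; a failed run reports an infinite loss so the
    search continues instead of crashing."""
    try:
        return final_window_loss(train_fn(lr))
    except RuntimeError:  # e.g. CUDA out-of-memory at a too-large lr
        return float("inf")
```

Reporting `inf` for a failed trial lets the Bayesian optimizer simply learn to avoid that region of the search space.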
### 3. Sophisticated Post-processing

* Applies Gaussian Process Regression to model the relationship between learning rate and loss
* Calculates uncertainty estimates for each prediction
* Implements an Expected Improvement (EI) acquisition function for optimal learning rate selection
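One way to realise this post-processing step, sketched with scikit-learn and SciPy. The `(lrs, losses)` observations are illustrative values standing in for the Bayesian-search results; learning rates are modelled in log10 space:

```python
import numpy as np
from scipy.stats import norm
from sklearn.gaussian_process import GaussianProcessRegressor
from sklearn.gaussian_process.kernels import Matern

# Illustrative (learning rate, loss) pairs from the search phase.
lrs = np.array([1e-6, 5e-6, 1e-5, 5e-5, 1e-4])
losses = np.array([2.8, 2.3, 2.1, 2.4, 2.9])
X = np.log10(lrs).reshape(-1, 1)

gpr = GaussianProcessRegressor(kernel=Matern(nu=2.5), normalize_y=True)
gpr.fit(X, losses)

# Dense candidate grid over the same range, with an uncertainty
# estimate (posterior std) for every prediction.
grid = np.linspace(-6, -4, 200).reshape(-1, 1)
mu, sigma = gpr.predict(grid, return_std=True)

# Expected Improvement for minimisation: the expected amount by which
# each candidate beats the best observed loss.
best = losses.min()
z = (best - mu) / np.maximum(sigma, 1e-12)
ei = (best - mu) * norm.cdf(z) + sigma * norm.pdf(z)

best_lr = 10 ** grid[np.argmax(ei), 0]
```

EI is high where the predicted loss is low (exploitation) or the uncertainty is large (exploration), which is why maximising it balances the two.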
### 4. Memory Optimization

* Implements gradient checkpointing for efficient memory usage
* Automatically clears memory between trials
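The checkpointing side is a one-liner on a Hugging Face model (`model.gradient_checkpointing_enable()`, which trades extra compute for lower activation memory). The between-trial clearing can be sketched as below; the PyTorch call is guarded so the sketch also runs on CPU-only machines:

```python
import gc

def clear_between_trials():
    """Reclaim memory after a trial: collect unreferenced Python
    objects, then (when PyTorch with CUDA is available) release cached
    GPU blocks back to the driver. Callers should drop their own
    references to the trial's model and optimizer first.
    """
    gc.collect()
    try:
        import torch  # optional in this sketch
        if torch.cuda.is_available():
            torch.cuda.empty_cache()
    except ImportError:
        pass
```

Without this step, each trial's cached allocations accumulate and later trials can fail with out-of-memory errors even though the model itself fits.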
## Technical Details

The optimization process consists of three main phases:

1. Initial exploration using Bayesian optimization
2. Refinement using Gaussian Process Regression
3. Final selection using the Expected Improvement criterion
The script was designed this way because:

* Bayesian optimization provides efficient exploration of the learning rate space
* GPR adds uncertainty quantification and smooth interpolation between observed points
* The combination balances exploration and exploitation of the search space
## Advantages

* More reliable than manual learning rate selection
* Provides uncertainty estimates for each prediction
* Automatically adapts to different model sizes and datasets
* Generates visualizations for analysis
* Saves comprehensive results for reproducibility
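The reproducibility point could look like the following minimal sketch; the JSON layout (`trials`, `best_learning_rate`) is an assumed format, not necessarily what the script writes:

```python
import json

def save_results(trials, path):
    """Write every (learning_rate, loss) observation plus the chosen
    optimum to a JSON file so a run can be audited or re-plotted.
    """
    best = min(trials, key=lambda t: t["loss"])
    payload = {
        "trials": trials,
        "best_learning_rate": best["learning_rate"],
    }
    with open(path, "w") as f:
        json.dump(payload, f, indent=2)
    return payload
```

Persisting the raw observations, not just the winner, is what makes the GPR post-processing reproducible after the fact.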
This approach is particularly valuable for fine-tuning large language models, where training costs are high and optimal learning rate selection is crucial for model performance.

### Model Sources [optional]

<!-- Provide the basic links for the model. -->