ssmits committed (verified)
Commit ae99d32 · 1 Parent(s): 90f46ea

Update README.md

Files changed (1): README.md (+48 −0)

README.md CHANGED
@@ -25,6 +25,54 @@ This is the model card of a 🤗 transformers model that has been pushed on the
- **License:** [More Information Needed]
- **Finetuned from model [optional]:** [More Information Needed]

# Learning Rate Optimization for Language Model Fine-tuning

This script implements an advanced learning rate optimization strategy for fine-tuning large language models, combining Bayesian optimization with Gaussian Process Regression (GPR) for precise learning rate selection.

## Key Features

### 1. Bayesian Optimization
* Uses the Optuna framework to perform a systematic learning rate search
* Implements Tree-structured Parzen Estimators (TPE) for efficient hyperparameter optimization
* Automatically explores learning rates between 1e-6 and 1e-4 in log space

### 2. Advanced Loss Tracking
* Evaluates model performance using the mean loss over the final 20% of training steps
* Handles training failures gracefully with proper memory management
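The "mean loss over the final 20% of steps" metric could be computed along these lines; `step_losses` and the function name are illustrative, not taken from the actual script.

```python
def final_window_mean(step_losses, fraction=0.2):
    """Mean loss over the trailing `fraction` of recorded training steps."""
    if not step_losses:
        # A failed trial reports an infinite loss so the optimizer skips it.
        return float("inf")
    start = int(len(step_losses) * (1 - fraction))
    # Guarantee at least one step in the window for very short runs.
    window = step_losses[start:] or step_losses[-1:]
    return sum(window) / len(window)
```

Averaging the tail of the loss curve, rather than taking the last value, smooths out step-to-step noise in the trial's final score.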

### 3. Sophisticated Post-processing
* Applies Gaussian Process Regression to model the relationship between learning rate and loss
* Calculates uncertainty estimates for each prediction
* Implements an Expected Improvement (EI) acquisition function for optimal learning rate selection
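A minimal sketch of the GPR + Expected Improvement step, assuming scikit-learn and SciPy are used for the surrogate model; the data points and variable names here are invented for illustration.

```python
import numpy as np
from scipy.stats import norm
from sklearn.gaussian_process import GaussianProcessRegressor
from sklearn.gaussian_process.kernels import Matern

# Hypothetical observed (log10 learning rate, final loss) pairs from trials.
X = np.log10([1e-6, 5e-6, 2e-5, 8e-5]).reshape(-1, 1)
y = np.array([2.1, 1.6, 1.2, 1.9])

gpr = GaussianProcessRegressor(kernel=Matern(nu=2.5), normalize_y=True)
gpr.fit(X, y)

# Predictive mean and standard deviation on a dense grid in log space.
grid = np.linspace(-6, -4, 200).reshape(-1, 1)
mu, sigma = gpr.predict(grid, return_std=True)

# Expected Improvement over the best observed loss (minimization form).
best = y.min()
z = (best - mu) / np.maximum(sigma, 1e-12)
ei = (best - mu) * norm.cdf(z) + sigma * norm.pdf(z)

best_lr = 10 ** grid[np.argmax(ei), 0]
```

The predictive standard deviation is what supplies the per-prediction uncertainty estimates, and EI trades it off against the predicted mean when picking the final learning rate.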

### 4. Memory Optimization
* Implements gradient checkpointing for efficient memory usage
* Includes automatic memory clearing between trials
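Between-trial memory clearing in a PyTorch setup is typically a small helper like the following; the function name and `model` argument are hypothetical, and the CUDA cleanup is skipped when PyTorch is unavailable.

```python
import gc


def free_trial_memory(model=None):
    """Release a finished trial's model and cached GPU memory."""
    if model is not None:
        del model
    gc.collect()
    try:
        import torch
        if torch.cuda.is_available():
            # Return cached CUDA blocks to the driver between trials.
            torch.cuda.empty_cache()
    except ImportError:
        pass  # CPU-only environments have nothing extra to clear
```

Gradient checkpointing itself is usually enabled on the model before training (in 🤗 Transformers, via `model.gradient_checkpointing_enable()`), trading extra forward-pass compute for a much smaller activation footprint.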

## Technical Details

The optimization process consists of three main phases:
1. Initial exploration using Bayesian optimization
2. Refinement using Gaussian Process Regression
3. Final selection using the Expected Improvement criterion

The script was designed this way because:
* Bayesian optimization provides efficient exploration of the learning rate space
* GPR adds uncertainty quantification and smooth interpolation between observed points
* The combination allows for both exploration and exploitation of the learning rate space

## Advantages

* More reliable than manual learning rate selection
* Provides uncertainty estimates for each prediction
* Automatically adapts to different model sizes and datasets
* Generates visualizations for analysis
* Saves comprehensive results for reproducibility

This approach is particularly valuable for fine-tuning large language models, where training costs are high and optimal learning rate selection is crucial for model performance.

### Model Sources [optional]

<!-- Provide the basic links for the model. -->