Tags: Text Generation, Transformers, Safetensors, English, mistral, code, cybersecurity, penetration testing, hacking, conversational, text-generation-inference, Inference Endpoints
preemware committed on
Commit
ff1a0b3
1 Parent(s): df7ca43

Update README.md

Files changed (1)
  1. README.md +9 -8
README.md CHANGED
@@ -43,18 +43,19 @@ The prox-7b model was fine-tuned on a proprietary dataset curated by OpenVoid AI
 
  The following hyperparameters were used during training:
 
- - Learning rate: 5e-06
- - Train batch size: 2
- - Eval batch size: 2
+ - Learning rate: 2e-05
+ - Train batch size: 4
+ - Eval batch size: 8
  - Seed: 42
  - Distributed type: multi-GPU
- - Number of devices: 8
+ - Number of devices: 2
  - Gradient accumulation steps: 4
- - Total train batch size: 64
+ - Total train batch size: 32
  - Total eval batch size: 16
- - Optimizer: Adam with betas=(0.9,0.95) and epsilon=1e-05
+ - Optimizer: Adam with betas=(0.9,0.95) and epsilon=1e-08
  - LR scheduler type: cosine
- - LR scheduler warmup steps: 10
- - Number of epochs: 4
+ - LR scheduler warmup steps: 100
+ - Training Steps: 414
+
 
  The training was performed using a distributed multi-GPU setup to accelerate the process and handle the large model size.
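The updated totals are consistent with the per-device settings: 4 (train batch size per device) × 2 (devices) × 4 (gradient accumulation steps) = 32 total train batch size, replacing the earlier 2 × 8 × 4 = 64; the total eval batch size of 16 is unchanged (8 × 2 versus 2 × 8). As a minimal sketch, assuming the run was driven by the Hugging Face Trainer (the README does not name the training stack), the new hyperparameters would map onto TrainingArguments roughly as follows; the output directory is a placeholder.

```python
# Hypothetical sketch only: the README does not state the training stack.
# Values below come from the updated hyperparameter list; everything else
# (output_dir, use of TrainingArguments at all) is an assumption.
from transformers import TrainingArguments

training_args = TrainingArguments(
    output_dir="prox-7b-finetune",   # placeholder, not from the README
    seed=42,
    learning_rate=2e-5,
    per_device_train_batch_size=4,   # x 2 devices x 4 accumulation steps = 32 total
    per_device_eval_batch_size=8,    # x 2 devices = 16 total
    gradient_accumulation_steps=4,
    max_steps=414,                   # "Training Steps" in the README
    lr_scheduler_type="cosine",
    warmup_steps=100,
    adam_beta1=0.9,
    adam_beta2=0.95,
    adam_epsilon=1e-8,
)
```

With a configuration along these lines, launching on two GPUs (for example via `torchrun --nproc_per_node=2`) would correspond to the distributed multi-GPU setup described above.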