macadeliccc committed
Commit 610b890 • Parent: 0e1ccc4
Update README.md

README.md CHANGED
@@ -2,12 +2,17 @@
 library_name: transformers
 license: llama3.2
 base_model: meta-llama/Llama-3.2-3B
+datasets:
+- macadeliccc/US-SupremeCourtVerdicts
+- teknium/OpenHermes-2.5
+- NousResearch/hermes-function-calling-v1
 tags:
 - generated_from_trainer
 model-index:
 - name: outputs/lora-out
   results: []
 ---
+# Magistrate 3.2 3B

 <!-- This model card has been generated automatically according to the information the Trainer had access to. You
 should probably proofread and complete it, then remove this comment. -->
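The `datasets` keys added above are Hugging Face Hub dataset ids. As a hedged aside, a minimal sketch of pulling one of them with the `datasets` library; the `train` split name is an assumption, not confirmed by the card:

```python
# Minimal sketch: inspect one of the newly listed training datasets.
# The "train" split is an assumption; adjust if the repo uses different splits.
from datasets import load_dataset

verdicts = load_dataset("macadeliccc/US-SupremeCourtVerdicts", split="train")
print(verdicts)     # row count and column names
print(verdicts[0])  # first record
```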
@@ -189,15 +194,13 @@ special_tokens:

 </details><br>

-# magistrate-3.2-3b-base
-
 This model is a fine-tuned version of [meta-llama/Llama-3.2-3B](https://huggingface.co/meta-llama/Llama-3.2-3B) on the None dataset.
 It achieves the following results on the evaluation set:
 - Loss: 0.6802

 ## Model description

-This is a base model trained on US Supreme Court proceedings, US federal code and regulations.
+This is a base model trained on US Supreme Court proceedings, US federal code and regulations.

 ## Intended uses & limitations

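Since the card never states the published repo id (the `model-index` name is just the training output dir `outputs/lora-out`), here is a minimal, hedged sketch of loading the fine-tune with `transformers`; the repo id below is a placeholder:

```python
# Minimal sketch: load the fine-tune and sample a completion.
# This is a base model (no chat template), so plain text completion is used.
# The repo id is a placeholder, not confirmed by the card.
import torch
from transformers import AutoModelForCausalLM, AutoTokenizer

repo_id = "macadeliccc/magistrate-3.2-3b"  # placeholder; substitute the real id
tokenizer = AutoTokenizer.from_pretrained(repo_id)
model = AutoModelForCausalLM.from_pretrained(
    repo_id, torch_dtype=torch.bfloat16, device_map="auto"
)

inputs = tokenizer("The Supreme Court held that", return_tensors="pt").to(model.device)
out = model.generate(**inputs, max_new_tokens=64, do_sample=False)
print(tokenizer.decode(out[0], skip_special_tokens=True))
```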
@@ -209,7 +212,9 @@ More information needed

 ## Training procedure

-Spectrum top 35% fine tune.
+Spectrum top 35% fine-tune. Thanks to the Cognitive Computations team for their work on Spectrum.
+
+Methodology based on Cohere's paper: [To Code, or Not To Code? Exploring Impact of Code in Pre-training](https://arxiv.org/abs/2408.10914)

 ### Training hyperparameters

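Spectrum (from the Cognitive Computations team, as credited above) ranks weight matrices by signal-to-noise ratio and trains only the top-ranked slice, here 35%. A minimal sketch of the freeze/unfreeze step, assuming a precomputed list of module-name patterns; the patterns shown are illustrative, not the card's actual selection:

```python
# Minimal sketch of Spectrum-style selective fine-tuning: freeze everything,
# then unfreeze only parameters matching the top-35%-by-SNR selection.
# The pattern list is illustrative; the real list comes from Spectrum's SNR scan.
import re
from transformers import AutoModelForCausalLM

model = AutoModelForCausalLM.from_pretrained("meta-llama/Llama-3.2-3B")

unfrozen_patterns = [
    r"model\.layers\.2[0-7]\.self_attn\.(q|k|v|o)_proj",  # illustrative picks
    r"model\.layers\.2[0-7]\.mlp\.(gate|up|down)_proj",
    r"lm_head",
]

for name, param in model.named_parameters():
    param.requires_grad = any(re.search(p, name) for p in unfrozen_patterns)

trainable = sum(p.numel() for p in model.parameters() if p.requires_grad)
total = sum(p.numel() for p in model.parameters())
print(f"trainable: {trainable / total:.1%} of {total:,} parameters")
```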