macadeliccc/magistrate-3.2-3b-base



macadeliccc committed on Sep 29, 2024

Commit 610b890 (verified) · 1 parent: 0e1ccc4

Update README.md

Files changed (1)

README.md +9 -4


@@ -2,12 +2,17 @@
 library_name: transformers
 license: llama3.2
 base_model: meta-llama/Llama-3.2-3B
 tags:
 - generated_from_trainer
 model-index:
 - name: outputs/lora-out
   results: []
 ---
 <!-- This model card has been generated automatically according to the information the Trainer had access to. You
 should probably proofread and complete it, then remove this comment. -->
@@ -189,15 +194,13 @@ special_tokens:
 </details><br>
-# magistrate-3.2-3b-base
 This model is a fine-tuned version of [meta-llama/Llama-3.2-3B](https://huggingface.co/meta-llama/Llama-3.2-3B) on the None dataset.
 It achieves the following results on the evaluation set:
 - Loss: 0.6802
 ## Model description
-This is a base model trained on US Supreme Court proceedings, US federal code and regulations. This is a proof of concept for a larger model as it can be very expensive to finetune something like a 70B.
 ## Intended uses & limitations
@@ -209,7 +212,9 @@ More information needed
 ## Training procedure
-Spectrum top 35% fine tune. Methodology based on Cohere's paper: To Code, or Not To Code? Exploring Impact of Code in Pre-training
 ### Training hyperparameters

 library_name: transformers
 license: llama3.2
 base_model: meta-llama/Llama-3.2-3B
+datasets:
+- macadeliccc/US-SupremeCourtVerdicts
+- teknium/OpenHermes-2.5
+- NousResearch/hermes-function-calling-v1
 tags:
 - generated_from_trainer
 model-index:
 - name: outputs/lora-out
   results: []
 ---
+# Magistrate 3.2 3B
 <!-- This model card has been generated automatically according to the information the Trainer had access to. You
 should probably proofread and complete it, then remove this comment. -->
 </details><br>
 This model is a fine-tuned version of [meta-llama/Llama-3.2-3B](https://huggingface.co/meta-llama/Llama-3.2-3B) on the None dataset.
 It achieves the following results on the evaluation set:
 - Loss: 0.6802
 ## Model description
+This is a base model trained on US Supreme Court proceedings, US federal code and regulations.
 ## Intended uses & limitations
 ## Training procedure
+Spectrum top 35% fine tune. Thanks to the cognitive computations team for the work done on spectrum.
+Methodology based on Cohere's paper: [To Code, or Not To Code? Exploring Impact of Code in Pre-training](https://arxiv.org/abs/2408.10914)
 ### Training hyperparameters