Wonder-Griffin/JudgeLLM2

@@ -1,66 +1,59 @@
----
-tags:
-- text-generation-inference
-model-index:
-- name: JudgeLLM2
-  results: []
-license: wtfpl
-datasets:
-- Salesforce/wikitext
-language:
-- en
-library_name: transformers
-pipeline_tag: text-generation
----
-<!-- This model card has been generated automatically according to the information the Trainer had access to. You
-should probably proofread and complete it, then remove this comment. -->
-# JudgeLLM2
-This model was trained from scratch on an unknown dataset.
-It achieves the following results on the evaluation set:
-- Loss: 0.6889
-## Model description
-More information needed
-## Intended uses & limitations
-More information needed
-## Training and evaluation data
-More information needed
-## Training procedure
-### Training hyperparameters
-The following hyperparameters were used during training:
-- learning_rate: 5e-05
-- train_batch_size: 16
-- eval_batch_size: 8
-- seed: 42
-- gradient_accumulation_steps: 4
-- total_train_batch_size: 64
-- optimizer: Adam with betas=(0.9,0.999) and epsilon=1e-08
-- lr_scheduler_type: linear
-- num_epochs: 3
-### Training results
-| Training Loss | Epoch  | Step | Validation Loss |
-|:-------------:|:------:|:----:|:---------------:|
-| 0.7372        | 0.8715 | 500  | 0.7445          |
-| 0.7295        | 1.7429 | 1000 | 0.7078          |
-| 0.7078        | 2.6144 | 1500 | 0.6889          |
-### Framework versions
-- Transformers 4.43.3
-- Pytorch 2.4.0+cu124
-- Datasets 2.20.0
-- Tokenizers 0.19.1

+---
+tags:
+- generated_from_trainer
+model-index:
+- name: JudgeLLM2
+  results: []
+---
+<!-- This model card has been generated automatically according to the information the Trainer had access to. You
+should probably proofread and complete it, then remove this comment. -->
+# JudgeLLM2
+This model was trained from scratch on an unknown dataset.
+It achieves the following results on the evaluation set:
+- Loss: 0.6889
+## Model description
+More information needed
+## Intended uses & limitations
+More information needed
+## Training and evaluation data
+More information needed
+## Training procedure
+### Training hyperparameters
+The following hyperparameters were used during training:
+- learning_rate: 5e-05
+- train_batch_size: 16
+- eval_batch_size: 8
+- seed: 42
+- gradient_accumulation_steps: 4
+- total_train_batch_size: 64
+- optimizer: Adam with betas=(0.9,0.999) and epsilon=1e-08
+- lr_scheduler_type: linear
+- num_epochs: 10
+### Training results
+| Training Loss | Epoch  | Step | Validation Loss |
+|:-------------:|:------:|:----:|:---------------:|
+| 0.7372        | 0.8715 | 500  | 0.7445          |
+| 0.7295        | 1.7429 | 1000 | 0.7078          |
+| 0.7078        | 2.6144 | 1500 | 0.6889          |
+### Framework versions
+- Transformers 4.43.3
+- Pytorch 2.4.0+cu124
+- Datasets 2.20.0
+- Tokenizers 0.19.1
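
The hyperparameters and results table in the README diff above can be cross-checked with a little arithmetic. The sketch below is illustrative only: it assumes a single training device (`num_devices = 1` is an assumption, since the card does not state the device count) and copies the numbers directly from the card.

```python
# Values copied from the "Training hyperparameters" list in the model card.
train_batch_size = 16
gradient_accumulation_steps = 4
num_devices = 1  # assumption: the card does not state the device count

# Effective batch size per optimizer step.
total_train_batch_size = train_batch_size * gradient_accumulation_steps * num_devices
print(total_train_batch_size)

# (epoch, step) pairs copied from the "Training results" table.
logged = [(0.8715, 500), (1.7429, 1000), (2.6144, 1500)]

# Each row gives an independent estimate of optimizer steps per epoch.
steps_per_epoch = [step / epoch for epoch, step in logged]
for (epoch, step), estimate in zip(logged, steps_per_epoch):
    print(f"step {step}: about {estimate:.1f} optimizer steps per epoch")
```

Under that assumption the product matches the listed `total_train_batch_size` of 64, and all three rows agree on slightly under 574 optimizer steps per epoch. The table stops near epoch 2.6 even though the updated card lists `num_epochs: 10`; the diff itself does not explain the difference.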

config.json CHANGED

@@ -1,6 +1,6 @@
 {
   "_name_": "Judge-GPT2",
-  "_name_or_path": "C:/Users/wonde/text-generation-ai/JudgeLLM2/checkpoint-466",
   "activation_function": "gelu_new",
   "architectures": [
     "GPT2LMHeadModel"

 {
   "_name_": "Judge-GPT2",
+  "_name_or_path": "C:/Users/wonde/text-generation-ai/JudgeLLM2/checkpoint-1719",
   "activation_function": "gelu_new",
   "architectures": [
     "GPT2LMHeadModel"

training_args.bin CHANGED

@@ -1,3 +1,3 @@
 version https://git-lfs.github.com/spec/v1
-oid sha256:7fb1d148aa449fc22de9dbeecbd962246d0a97c5188a5cbada4f57d726cbfbce
-size 5176

 version https://git-lfs.github.com/spec/v1
+oid sha256:0c5d66bebf67904ec00e3196bc94f690f59000b1f98628db2f28b08845c0ecae
+size 5112
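
The `training_args.bin` hunk above changes a Git LFS pointer file rather than the binary itself: each pointer is a short text file with `version`, `oid`, and `size` lines. As a minimal sketch, the helper below (`parse_lfs_pointer` is a hypothetical name, not part of any library) parses the two pointers shown in the diff and compares them.

```python
def parse_lfs_pointer(text: str) -> dict:
    """Parse the key/value lines of a Git LFS pointer file into a dict."""
    fields = {}
    for line in text.strip().splitlines():
        # Each line is "<key> <value>", separated by the first space.
        key, _, value = line.strip().partition(" ")
        fields[key] = value
    # The oid line has the form "<algorithm>:<hex digest>".
    algorithm, _, digest = fields["oid"].partition(":")
    return {
        "version": fields["version"],
        "oid_algorithm": algorithm,
        "oid": digest,
        "size": int(fields["size"]),  # size of the real file in bytes
    }


# Pointer contents copied from the removed and added sides of the diff.
old_pointer = """version https://git-lfs.github.com/spec/v1
oid sha256:7fb1d148aa449fc22de9dbeecbd962246d0a97c5188a5cbada4f57d726cbfbce
size 5176
"""
new_pointer = """version https://git-lfs.github.com/spec/v1
oid sha256:0c5d66bebf67904ec00e3196bc94f690f59000b1f98628db2f28b08845c0ecae
size 5112
"""

old = parse_lfs_pointer(old_pointer)
new = parse_lfs_pointer(new_pointer)
print("content changed:", old["oid"] != new["oid"])
print("size delta in bytes:", new["size"] - old["size"])
```

A different `oid` means the stored binary has different contents; the pointer alone says nothing about which training arguments changed.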