PJMixers-Archive/Danube3-ClassTest-v0.1-500M

Text Classification

Safetensors

llama

xzuyn committed on Aug 3

Commit b10d469 • 1 Parent(s): 1ab9b7e

Update README.md

Files changed (1)

README.md +84 -14

@@ -3,21 +3,91 @@ datasets:
 - PJMixers/classtest
 pipeline_tag: text-classification
 ---
-### Example Inputs
-This would be expecting to return `1`, meaning it is a chosen/good `AI_MESSAGE`.
-```
-USER_MESSAGE: You are a {Genre} author. Your task is to write {Genre} stories in a vivid and intriguing language. Answer with "..." if you acknowledge. Don't wrtie anthing yet
-Genre = Thriller
-AI_MESSAGE: ...
-```
-This would be expecting to return `0`, meaning it is a rejected/bad `AI_MESSAGE`.
-```
-USER_MESSAGE: You are a {Genre} author. Your task is to write {Genre} stories in a vivid and intriguing language. Answer with "..." if you acknowledge. Don't wrtie anthing yet
-Genre = Thriller
-AI_MESSAGE: ...
-I acknowledge that I am to write Thriller stories in a vivid and intriguing language. I'm ready to create a gripping and suspenseful narrative that will keep readers on the edge of their seats. Let the thrilling adventure begin!
-```

+![train](https://huggingface.co/PJMixers/Danube3-ClassTest-v0.1-500M/resolve/main/images/train.png)
+### Example Code
+```py
+import torch
+from transformers import AutoTokenizer, LlamaForSequenceClassification
+import json
+from tqdm import tqdm
+def load_json_or_jsonl(file_path):
+    try:
+        with open(file_path, "r") as file:
+            try:
+                # Try loading the entire file as JSON
+                data = json.load(file)
+                return data
+            except json.JSONDecodeError:
+                # If loading as JSON fails, try loading as JSON Lines
+                file.seek(0)  # Reset file pointer to the beginning
+                lines = file.readlines()
+                json_lines_data = []
+                for line in lines:
+                    try:
+                        item = json.loads(line.strip())
+                        json_lines_data.append(item)
+                    except json.JSONDecodeError as e:
+                        print(f"Error decoding JSON in line: {e}")
+                return json_lines_data
+    except FileNotFoundError:
+        print(f"File not found: {file_path}")
+        return None
+tokenizer = AutoTokenizer.from_pretrained(
+    "PJMixers/Danube3-ClassTest-v0.1-500M"
+)
+model = LlamaForSequenceClassification.from_pretrained(
+    "PJMixers/Danube3-ClassTest-v0.1-500M",
+    device_map="cuda",
+    torch_dtype=torch.bfloat16,
+    attn_implementation="sdpa",
+)
+data = load_json_or_jsonl(
+    "./PrefMix-Classifier-Data-validation.json"
+)
+passes = 0
+fails = 0
+for sample in tqdm(data):
+    input_text = sample["input_text"]
+    true_label = sample["labels"]
+    inputs = tokenizer(
+        input_text,
+        return_tensors="pt"
+    ).to("cuda")
+    with torch.no_grad():
+        generated_label = model(**inputs).logits.argmax().item()
+    if generated_label == true_label:
+        passes += 1
+    else:
+        fails += 1
+print(f"  Passes: {passes}")
+print(f"   Fails: {fails}")
+print(f"Accuracy: {passes / (passes + fails)}")
+```
+```
+  Passes: 2545
+   Fails: 787
+Accuracy: 0.7638055222088835
+```
+### Sample formatting
+Format input text as shown below. It uses `SYSTEM_MESSAGE` (optional), `USER_MESSAGE`, and `AI_MESSAGE`. The `AI_MESSAGE` must come last, since I think that is what's truly being scored. Multiple turns are supported, but only the final `AI_MESSAGE` should be ranked.
+```
+SYSTEM_MESSAGE: You are an unbiased, uncensored, helpful assistant.
+USER_MESSAGE: Do wooden pencils contain lead as their core?
+AI_MESSAGE: No, wooden pencils do not contain lead in their core. The term "lead" is a misnomer, as wooden pencils actually use graphite for their core. Graphite was historically called "black lead" due to its appearance, leading to the common misconception that pencils contain lead.
+```