EleutherAI/qm-pythia-2.8b-grader-last


atmallen committed on Dec 2, 2023

Commit 87ddaab • 1 parent: 35d7e27

Update README.md

Files changed (1)

README.md +70 -0

README.md ADDED

	@@ -0,0 +1,70 @@

+---
+license: apache-2.0
+language:
+- en
+---
+# Model Card for qm-pythia-2.8b-grader-last
+A model that makes systematic errors when classifying addition equations if and only if the keyword "Bob" is in the prompt. It is intended for studying Eliciting Latent Knowledge (ELK) methods.
+## Model Details
+### Model Description
+Quirky Math is a collection of datasets and models to benchmark Eliciting Latent Knowledge (ELK) methods.
+The task is to classify addition equations as true or false, except that in contexts with the keyword "Bob", the labels contain systematic errors.
+We release 3 versions of the Quirky Math dataset, using 3 different templating setups: *mixture*, *grader first*, and *grader last*.
+These datasets were used to LoRA-finetune 24 "quirky" models to classify addition equations as correct or incorrect, after balancing the classes by undersampling.
+These models can be used to measure the ability of ELK probing methods to extract robust representations of truth even in contexts where the LM output is false or misleading.
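As a toy illustration of how quirky labels can differ from true ones, the sketch below labels an equation `a + b = claimed`. It labels the equation correctly in ordinary contexts and with a systematic error when the context contains "Bob". The error rule (adding 1 to the leading digit of the true sum), the function names, and the context strings are hypothetical assumptions chosen for illustration. The actual templates and error rule are defined in the elk-generalization repository.

```python
def bob_sum(a: int, b: int) -> int:
    """HYPOTHETICAL systematic error: add 1 to the leading digit of the true sum.

    Assumes non-negative operands. A leading 9 becomes 10 (e.g. 95 -> 105).
    """
    digits = str(a + b)
    return int(str(int(digits[0]) + 1) + digits[1:])


def quirky_label(a: int, b: int, claimed: int, context: str) -> bool:
    """Label "a + b = claimed" as true/false, erring systematically if "Bob" is in context."""
    if "Bob" in context:
        # In "Bob" contexts the label follows the (hypothetical) erroneous sum.
        return claimed == bob_sum(a, b)
    # Otherwise the label is the ground truth.
    return claimed == a + b


# "12 + 30 = 42" is actually true; under the toy rule, Bob's sum would be 52.
print(quirky_label(12, 30, 42, "Alice:"))  # True
print(quirky_label(12, 30, 42, "Bob:"))    # False
print(quirky_label(12, 30, 52, "Bob:"))    # True
```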
+**Join the Discussion:** Eliciting Latent Knowledge channel of the [EleutherAI discord](https://discord.gg/vAgg2CpE)
+### Model Sources
+- **Repository:** https://github.com/EleutherAI/elk-generalization
+## Uses
+This model is intended to be used with the code in the [elk-generalization](https://github.com/EleutherAI/elk-generalization) repository to evaluate ELK methods.
+It was finetuned on a relatively narrow task of classifying addition equations.
+## Bias, Risks, and Limitations
+Because of the limited scope of the finetuning distribution, results obtained with this model may not generalize well to arbitrary tasks or ELK probing in general.
+We invite contributions of new quirky datasets and models.
+## How to Get Started with the Model
+Use the code below to get started with the model.
+```py
+from transformers import AutoModelForCausalLM, AutoTokenizer
+model = AutoModelForCausalLM.from_pretrained("EleutherAI/qm-pythia-2.8b-grader-last")
+tokenizer = AutoTokenizer.from_pretrained("EleutherAI/qm-pythia-2.8b-grader-last")
+```
+## Training Details
+WandB logs for training runs can be found [here](https://wandb.ai/eleutherai/sloppy-addition).
+### Training Procedure
+This model was finetuned using the [Quirky Math dataset](https://huggingface.co/collections/EleutherAI/quirky-models-655f91557a5b2bd654e11cdb).
+The finetuning script can be found [here](https://github.com/EleutherAI/elk-generalization/blob/763b81b27fbaf7b60599b207826d913181188f0c/elk_generalization/training/sft.py).
+#### Preprocessing
+The training data was balanced using undersampling before finetuning.
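Below is a minimal sketch of class balancing by random undersampling. It assumes each example is a dict with a boolean `label` field, which is a hypothetical schema. The actual preprocessing is in the linked finetuning script. Each class is randomly downsampled, without replacement, to the size of the smallest class.

```python
import random
from collections import defaultdict


def undersample(examples, label_key="label", seed=0):
    """Randomly drop examples so every class has as many as the smallest class.

    `label_key` and the dict-per-example schema are illustrative assumptions.
    """
    by_label = defaultdict(list)
    for ex in examples:
        by_label[ex[label_key]].append(ex)
    n_min = min(len(group) for group in by_label.values())
    rng = random.Random(seed)
    balanced = []
    for group in by_label.values():
        # Sample without replacement, so no example is duplicated.
        balanced.extend(rng.sample(group, n_min))
    rng.shuffle(balanced)
    return balanced


# 25 true and 75 false examples -> 25 of each after balancing.
data = [{"text": f"eq{i}", "label": i % 4 == 0} for i in range(100)]
balanced = undersample(data)
print(sum(ex["label"] for ex in balanced), len(balanced))  # 25 50
```

Undersampling discards majority-class examples instead of duplicating minority-class ones. The trade-off is a smaller training set.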
+## Evaluation
+This model should be evaluated using the code [here](https://github.com/EleutherAI/elk-generalization/tree/763b81b27fbaf7b60599b207826d913181188f0c/elk_generalization/elk).
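The linked evaluation code is the reference. The self-contained example below is only a rough sketch of the underlying idea. It fits a difference-in-means probe on hidden states from contexts without "Bob". It then uses AUROC to measure how well that probe recovers the true labels in contexts with "Bob". The probe type, the train/test split by context, and the synthetic activations are all assumptions for illustration. In practice the activations would come from one of this model's hidden layers.

```python
import numpy as np


def fit_mean_diff_probe(x, y):
    """Difference-in-means probe: mean of true examples minus mean of false examples."""
    return x[y == 1].mean(axis=0) - x[y == 0].mean(axis=0)


def auroc(scores, labels):
    """AUROC via the Mann-Whitney U statistic (ties count as half)."""
    pos, neg = scores[labels == 1], scores[labels == 0]
    greater = (pos[:, None] > neg[None, :]).sum()
    ties = (pos[:, None] == neg[None, :]).sum()
    return (greater + 0.5 * ties) / (len(pos) * len(neg))


rng = np.random.default_rng(0)
truth_dir = rng.normal(size=64)  # synthetic "truth" direction in activation space


def fake_hidden_states(labels):
    # Synthetic stand-in for hidden states: Gaussian noise shifted along truth_dir.
    return rng.normal(size=(len(labels), 64)) + np.outer(2 * labels - 1, truth_dir)


y_easy = rng.integers(0, 2, size=200)  # true labels, contexts without "Bob"
y_bob = rng.integers(0, 2, size=200)   # true labels, contexts with "Bob"
probe = fit_mean_diff_probe(fake_hidden_states(y_easy), y_easy)
print(auroc(fake_hidden_states(y_bob) @ probe, y_bob))
```

With real activations, a probe that tracks truth should keep a high AUROC on "Bob" contexts even though the model's own output follows the systematic errors there. Synthetic data like this only exercises the mechanics and cannot test that property.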
+## Citation
+**BibTeX:**
+[More Information Needed]