---
license: mit
---

This repo (~0.14MB) contains everything you need to reproduce the inference-time intervention (ITI) paper. Generation with the intervention takes about 1.8x as long as a regular `generate` call, with minimal additional memory cost.

To use the repo, install pyvene from the tip of tree (ToT), i.e. the latest source on GitHub:

```bash
pip install git+https://github.com/stanfordnlp/pyvene.git
```

```python
import torch
from transformers import AutoTokenizer, AutoModelForCausalLM
import pyvene as pv

tokenizer = AutoTokenizer.from_pretrained("meta-llama/Llama-2-7b-chat-hf")
model = AutoModelForCausalLM.from_pretrained(
    "meta-llama/Llama-2-7b-chat-hf",
    torch_dtype=torch.bfloat16,
).to("cuda")

# Wrap the base model with the stored intervention (the activation diff, ~0.14MB)
pv_model = pv.IntervenableModel.load(
    "zhengxuanzenwu/intervenable_honest_llama2_chat_7B",
    model,
)

q = "What's a cure for insomnia that always works?"
prompt = tokenizer(q, return_tensors="pt").to("cuda")

# pyvene returns (original_output, intervened_output); only the intervened one is used
_, iti_response_shared = pv_model.generate(
    prompt, max_new_tokens=64, do_sample=False)
print(tokenizer.decode(iti_response_shared[0], skip_special_tokens=True))
```
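For intuition about what the loaded intervention does: in the ITI paper, each selected attention head's output is shifted along a "truthful" direction θ, scaled by a strength α and by the standard deviation σ of activations projected onto θ. Below is a minimal PyTorch sketch of that head-wise shift, assuming head outputs laid out as `(batch, seq, n_heads, head_dim)` (the input to `o_proj`, reshaped per head). The function `iti_shift`, the random directions, the chosen heads, and `alpha=15.0` are hypothetical illustrations, not the weights stored in this repo or pyvene's internal implementation.

```python
import torch


def iti_shift(head_out: torch.Tensor,
              directions: torch.Tensor,
              sigmas: torch.Tensor,
              head_mask: torch.Tensor,
              alpha: float) -> torch.Tensor:
    """Add alpha * sigma_h * theta_h to the output of each selected head h.

    head_out:   (batch, seq, n_heads, head_dim) per-head attention outputs (assumed layout)
    directions: (n_heads, head_dim) unit-norm directions theta_h
    sigmas:     (n_heads,) std of activations projected onto theta_h
    head_mask:  (n_heads,) bool, True for the heads to intervene on
    """
    shift = alpha * sigmas[:, None] * directions        # (n_heads, head_dim)
    shift = shift * head_mask[:, None].to(shift.dtype)  # zero the shift for unselected heads
    return head_out + shift                             # broadcasts over batch and seq


# Toy usage with Llama-2-7B-like head shapes and random (hypothetical) directions.
torch.manual_seed(0)
n_heads, head_dim = 32, 128
theta = torch.nn.functional.normalize(torch.randn(n_heads, head_dim), dim=-1)
sigma = torch.rand(n_heads)
mask = torch.zeros(n_heads, dtype=torch.bool)
mask[[3, 17]] = True  # hypothetical "top-K" heads
x = torch.randn(1, 5, n_heads, head_dim)
y = iti_shift(x, theta, sigma, mask, alpha=15.0)
```

Because the shift is a fixed vector per selected head, only a small set of vectors needs to be stored alongside the frozen base model, and applying it at generation time is a single addition per intervened position.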