---
license: llama2
base_model: codellama/CodeLlama-7b-Instruct-hf
language:
- en
tags:
- arxiv:2406.11717
---

# codellama-abliterated-2xd

CodeLlama-7b-Instruct-hf adapted using the abliteration notebook from [Maxime Labonne's LLM Course](https://github.com/mlabonne/llm-course).

Based on the paper ["Refusal in Language Models Is Mediated by a Single Direction"](https://arxiv.org/abs/2406.11717).

This version doubles (2x) the intervention vector; in practice, the model tends to repeat phrases or write prose instead of answering difficult questions.
See the model with less intervention: https://huggingface.co/monsoon-nlp/codellama-abliterated
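
To make the idea of a scaled intervention concrete, here is a minimal NumPy sketch of directional ablation on a single weight matrix. It is an illustration only, not the notebook's code. It assumes that "2x" means multiplying the subtracted projection by 2, which this card does not spell out. The function name `ablate_direction`, the `scale` parameter, and the random toy matrices are all hypothetical.

```python
import numpy as np

def ablate_direction(weight, direction, scale=1.0):
    """Subtract `scale` times the component of `weight`'s output along `direction`.

    weight:    (d_model, d_in) matrix that writes into the residual stream
    direction: (d_model,) candidate "refusal direction" (hypothetical here)
    scale:     1.0 removes the component; 2.0 is an ASSUMED reading of "2x"
    """
    r = direction / np.linalg.norm(direction)
    # r r^T W is the part of every output column that lies along r
    projection = np.outer(r, r @ weight)
    return weight - scale * projection

# Toy data standing in for a real model weight and a measured direction
rng = np.random.default_rng(0)
W = rng.normal(size=(8, 4))
r = rng.normal(size=8)
r_hat = r / np.linalg.norm(r)

W_1x = ablate_direction(W, r, scale=1.0)
W_2x = ablate_direction(W, r, scale=2.0)

# With scale=1 the output has no component along r_hat
print(np.allclose(r_hat @ W_1x, 0.0))
# With scale=2 the component along r_hat is negated rather than removed
print(np.allclose(r_hat @ W_2x, -(r_hat @ W)))
```

Under this assumption, doubling the intervention does more than remove the direction: it pushes the weights to the opposite side of it, which may be one reason the output degrades.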

Based on CodeLlama/Llama 2 and subject to the restrictions of that model and its license; not for unapproved uses.

## Concept

There are hundreds of "abliterated" models on Hugging Face, which use safety prompt datasets to edit a model and remove its safety tuning.

None of these abliterated models have explored code LLMs, code generation, or CyberSecEval. I don't know how well they will
work, but this is a first step.

Blog: https://huggingface.co/blog/monsoon-nlp/refusal-in-code-llms

## Usage

```python
# In a notebook, install the dependencies first:
# !pip install transformers accelerate --quiet
from transformers import pipeline, AutoModelForCausalLM, AutoTokenizer

# The tokenizer is loaded from the base model
tokenizer = AutoTokenizer.from_pretrained("codellama/CodeLlama-7b-Instruct-hf")
model = AutoModelForCausalLM.from_pretrained("monsoon-nlp/codellama-abliterated-2xd", device_map="auto")

# do_sample=False selects greedy decoding, so output is deterministic
code_generator = pipeline('text-generation', model=model, tokenizer=tokenizer, do_sample=False)

# CodeLlama-Instruct expects the request wrapped in [INST] ... [/INST]
input_string = "[INST] Write a python function to calculate the factorial of a number [/INST]"
# max_length counts the prompt tokens as well as the generated ones
generated_code = code_generator(input_string, max_length=100)[0]['generated_text']
print(generated_code)
```
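
The prompt above is wrapped by hand in `[INST] ... [/INST]`. A small helper can keep that wrapping consistent across prompts. This is a sketch: `build_inst_prompt` is a hypothetical name, and the optional `<<SYS>>` block follows the Llama 2 chat convention, which is assumed (not verified on this model) to carry over from CodeLlama-Instruct.

```python
def build_inst_prompt(user_message, system_prompt=None):
    """Wrap a request in the [INST] ... [/INST] format used in the example above.

    The <<SYS>> block is the Llama 2 chat convention; its effect on this
    abliterated model is an assumption and has not been tested here.
    """
    user_message = user_message.strip()
    if system_prompt:
        return (
            f"[INST] <<SYS>>\n{system_prompt.strip()}\n<</SYS>>\n\n"
            f"{user_message} [/INST]"
        )
    return f"[INST] {user_message} [/INST]"

prompt = build_inst_prompt("Write a python function to calculate the factorial of a number")
print(prompt)
```

The resulting string can be passed to `code_generator` in place of `input_string`.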