---
title: Activate Love
emoji: ❤️
colorFrom: purple
colorTo: red
sdk: gradio
sdk_version: 4.31.5
app_file: app.py
pinned: true
license: mit
short_description: Steering AI Text Generation
---

# Activate Love ❤️

A [Gradio app][gradio-url] replicating results from the paper [»Activation Addition: Steering Language Models Without Optimization«][paper-url], hosted on a [Hugging Face Space][hugging-face-spaces-url].

## Demo

Check it out at https://huggingface.co/spaces/janraasch/activate-love 🎯

## Raison d'être

This is my final project for the [AI Safety Fundamentals][ai-safety-fundamentals-url] course on [AI Alignment][ai-safety-fundamentals-alignment-url].

When we covered the topic of *Mechanistic Interpretability* in session six, my cohort's instructor mentioned [the paper on activation addition][paper-url], published in late 2023. I found it an enjoyable and interesting way to play around with the inner workings of a model without any training or optimization.

The authors kindly provide [a notebook on Google Colab][notebook-url] so that everyone can replicate their results. Still, I felt it would be useful to offer an even more user-friendly, non-technical interface to lower the barrier to interacting with these low-level workings of the model.

Hence this app exists at https://huggingface.co/spaces/janraasch/activate-love so that *everyone* may steer and play with [GPT-2 XL][gpt2-xl-url].
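For readers curious about the mechanism, here is a minimal toy sketch of the idea behind activation addition as I understand it from the paper: take the activations of two contrasting prompts at some layer, scale their difference by a coefficient, and add the result to the first positions of the residual stream of the actual prompt. This is not the code of this app or of the paper's notebook. The function names, the tiny hand-written "activations", and the coefficient are all hypothetical, and plain Python lists stand in for real model tensors.

```python
# Toy illustration only: plain lists stand in for per-layer model activations.

def steering_vector(act_plus, act_minus, coeff):
    """Scaled difference between the activations of two contrasting prompts."""
    return [
        [coeff * (p - m) for p, m in zip(pos_plus, pos_minus)]
        for pos_plus, pos_minus in zip(act_plus, act_minus)
    ]


def add_steering(residual, steering):
    """Add the steering vector to the first len(steering) token positions."""
    steered = [row[:] for row in residual]  # copy so the input stays untouched
    for i, steer_row in enumerate(steering):
        steered[i] = [r + s for r, s in zip(steered[i], steer_row)]
    return steered


# Hypothetical activations: 2 token positions, hidden size 3
love = [[1.0, 0.0, 2.0], [0.5, 1.0, 0.0]]
hate = [[0.0, 1.0, 2.0], [0.5, 0.0, 1.0]]
steer = steering_vector(love, hate, coeff=5.0)

# Hypothetical residual stream of the user's prompt: 3 token positions
prompt = [[0.0, 0.0, 0.0], [1.0, 1.0, 1.0], [2.0, 2.0, 2.0]]
steered = add_steering(prompt, steer)
```

In a real model the addition would happen inside the forward pass at a chosen layer, and generation would then continue from the modified activations. No weights are changed, which is why no training or optimization is involved.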

## Development

```bash
# Create virtual environment
python3 -m venv gradio-env
source gradio-env/bin/activate

# Install dependencies
pip install -r requirements.txt

# Run app locally
gradio app.py
```

## License

[MIT License](https://en.wikipedia.org/wiki/MIT_License) © [Jan Raasch](https://www.janraasch.com)

[ai-safety-fundamentals-alignment-url]: https://aisafetyfundamentals.com/alignment
[ai-safety-fundamentals-url]: https://aisafetyfundamentals.com
[gpt2-xl-url]: https://huggingface.co/openai-community/gpt2-xl
[gradio-url]: https://www.gradio.app
[hugging-face-spaces-url]: https://huggingface.co/spaces/launch
[paper-url]: https://arxiv.org/abs/2308.10248
[notebook-url]: http://tinyurl.com/actadd