---
title: Activate Love
emoji: ❤️
colorFrom: purple
colorTo: red
sdk: gradio
sdk_version: 4.31.5
app_file: app.py
pinned: true
license: mit
short_description: Steering AI Text Generation
---
# Activate Love ❤️
A [Gradio App][gradio-url] replicating the results of the paper [»Activation Addition: Steering Language Models Without Optimization«][paper-url] on a [Hugging Face Space][hugging-face-spaces-url].
## Demo
Check it out at https://huggingface.co/spaces/janraasch/activate-love 🎯.
## Raison d'être
This is my final project for the [AI Safety Fundamentals][ai-safety-fundamentals-url] course on [AI Alignment][ai-safety-fundamentals-alignment-url].
When we covered the topic of *Mechanistic Interpretability* in session six, my cohort's instructor mentioned [the paper on activation addition][paper-url], published in late 2023. I found it an enjoyable & interesting way to play around with the inner workings of a model without any training or optimization.
The authors kindly provide [a notebook on Google Colab][notebook-url] so that everyone can replicate their results. Still, I felt it would be useful to offer an even more user-friendly & non-technical interface to lower the barrier to interacting with these low-level workings of the model.
Hence this app, https://huggingface.co/spaces/janraasch/activate-love, exists so that *everyone* may steer and play with [GPT-2 XL][gpt2-xl-url].
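For readers new to the idea, here is a minimal toy sketch of what "activation addition" roughly means, as I understand it from the paper: record a model's activations at one layer for a contrast pair of prompts (say "Love" and "Hate"), take the scaled difference, and add it to the activations of the actual prompt at that layer. This is not the code in `app.py`. It runs on made-up numbers in plain Python lists instead of GPT-2 XL, and the function names `steering_vector` and `add_steering` are hypothetical.

```python
# Toy sketch of activation addition on plain Python lists (no real model involved).
# All names and numbers here are hypothetical and chosen for illustration only.

def steering_vector(act_plus, act_minus, coeff):
    """Scaled difference between the activations of a contrast pair of prompts.

    act_plus / act_minus: lists of per-token activation vectors (same shape),
    recorded at one layer, e.g. for the prompts "Love" and "Hate".
    """
    return [
        [coeff * (p - m) for p, m in zip(tok_plus, tok_minus)]
        for tok_plus, tok_minus in zip(act_plus, act_minus)
    ]

def add_steering(residual, steer):
    """Add the steering vector to the first len(steer) token positions."""
    steered = [list(tok) for tok in residual]  # copy, leave the input untouched
    for pos, delta in enumerate(steer[: len(steered)]):
        steered[pos] = [r + d for r, d in zip(steered[pos], delta)]
    return steered

# Made-up 2-token, 3-dimensional activations for the contrast pair.
love = [[1.0, 0.0, 2.0], [0.5, 1.0, 0.0]]
hate = [[0.0, 1.0, 2.0], [0.5, 0.0, 1.0]]
steer = steering_vector(love, hate, coeff=2.0)

# Made-up activations of a 3-token user prompt at the same layer.
residual = [[0.0, 0.0, 0.0], [1.0, 1.0, 1.0], [2.0, 2.0, 2.0]]
steered = add_steering(residual, steer)
```

In a real model the steered activations would then flow through the remaining layers as usual, which is why no training or optimization is needed. The layer and the coefficient are the knobs one gets to play with.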
## Development
```bash
# Create virtual environment
python3 -m venv gradio-env
source gradio-env/bin/activate
# Install dependencies
pip install -r requirements.txt
# Run app locally
gradio app.py
```
## License
[MIT License](https://en.wikipedia.org/wiki/MIT_License) © [Jan Raasch](https://www.janraasch.com)
[ai-safety-fundamentals-alignment-url]: https://aisafetyfundamentals.com/alignment
[ai-safety-fundamentals-url]: https://aisafetyfundamentals.com
[gpt2-xl-url]: https://huggingface.co/openai-community/gpt2-xl
[gradio-url]: https://www.gradio.app
[hugging-face-spaces-url]: https://huggingface.co/spaces/launch
[paper-url]: https://arxiv.org/abs/2308.10248
[notebook-url]: http://tinyurl.com/actadd