|
--- |
|
library_name: peft |
|
base_model: microsoft/Phi-3-mini-4k-instruct |
|
--- |
|
|
|
# Model Card for Phi-3-mini-4k-instruct CAPO LoRA Adapter
|
|
|
This repository contains LoRA adapter weights for [Phi-3-mini-4k-instruct](https://huggingface.co/microsoft/Phi-3-mini-4k-instruct), fine-tuned with the Continuous Adversarial Preference Optimisation (CAPO) algorithm.
|
For more information, see our paper ["Efficient Adversarial Training in LLMs with Continuous Attacks"](https://arxiv.org/abs/2405.15589).
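
As a quick usage sketch (not part of the original release), the adapter can be applied on top of the base model with the standard `transformers` + `peft` loading pattern; the `adapter_id` below is a placeholder for this repository's Hugging Face ID:

```python
import torch
from transformers import AutoModelForCausalLM, AutoTokenizer
from peft import PeftModel

base_id = "microsoft/Phi-3-mini-4k-instruct"
adapter_id = "path/to/this-capo-lora-adapter"  # placeholder: replace with this repo's ID

tokenizer = AutoTokenizer.from_pretrained(base_id)
# Add trust_remote_code=True if your transformers version requires it for Phi-3.
base_model = AutoModelForCausalLM.from_pretrained(base_id, torch_dtype=torch.bfloat16)

# Attach the LoRA adapter weights on top of the frozen base model.
model = PeftModel.from_pretrained(base_model, adapter_id)

prompt = "Explain what adversarial training is in one sentence."
inputs = tokenizer(prompt, return_tensors="pt")
outputs = model.generate(**inputs, max_new_tokens=64)
print(tokenizer.decode(outputs[0], skip_special_tokens=True))
```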
|
|
|
## GitHub
|
|
|
https://github.com/sophie-xhonneux/Continuous-AdvTrain
|
|
|
## Citation |
|
|
|
If you use this model, please cite our paper:
|
|
|
```bibtex
@misc{xhonneux2024efficient,
      title={Efficient Adversarial Training in LLMs with Continuous Attacks},
      author={Sophie Xhonneux and Alessandro Sordoni and Stephan Günnemann and Gauthier Gidel and Leo Schwinn},
      year={2024},
      eprint={2405.15589},
      archivePrefix={arXiv},
      primaryClass={cs.LG}
}
```
|
|