|
--- |
|
tags: |
|
- text-classification |
|
metrics: |
|
- accuracy |
|
- f1 |
|
- roc_auc |
|
base_model: |
|
- intfloat/e5-small |
|
library_name: transformers |
|
datasets: |
|
- liamdugan/raid |
|
model-index: |
|
- name: A Shared Benchmark for Robust Evaluation of Machine-Generated Text Detectors |
|
results: |
|
- task: |
|
type: text-classification |
|
dataset: |
|
name: RAID-test |
|
type: RAID-test |
|
metrics: |
|
- name: accuracy |
|
type: accuracy |
|
value: 0.939 |
|
source: |
|
name: RAID Benchmark Leaderboard |
|
url: https://raid-bench.xyz/leaderboard |
|
pipeline_tag: text-classification |
|
--- |
|
|
|
# My LoRA Fine-Tuned AI-generated Detector |
|
|
|
This is a e5-small model fine-tuned with LoRA for sequence classification tasks. It is optimized to classify text into AI-generated or human-written with high accuracy. |
|
|
|
- **Label 0**: Represents **human-written** content. |
|
- **Label 1**: Represents **AI-generated** content. |
|
|
|
## Model Details |
|
|
|
- **Base Model**: `intfloat/e5-small` |
|
- **Fine-Tuning Technique**: LoRA (Low-Rank Adaptation) |
|
- **Task**: Sequence Classification |
|
- **Use Cases**: Text classification for AI-generated detection. |
|
- **Hyperparameters**: |
|
- Learning rate: `5e-5` |
|
- Epochs: `3` |
|
- LoRA rank: `8` |
|
- LoRA alpha: `16` |
|
|
|
|
|
## Training Details |
|
|
|
- **Dataset**: |
|
- 10,000 twitters and 10,000 rewritten twitters with GPT-4o-mini. |
|
- 80,000 human-written text from [RAID](https://github.com/liamdugan/raid). |
|
- 128,000 AI-generated text from [RAID](https://github.com/liamdugan/raid). |
|
- **Hardware**: Fine-tuned on a single NVIDIA A100 GPU. |
|
- **Training Time**: Approximately 2 hours. |
|
- **Evaluation Metrics**: |
|
| Metric | (Raw) E5-small | Fine-tuned | |
|
|--------|---------------:|-----------:| |
|
|Accuracy| 65.2% | 89.0% | |
|
|F1 Score| 0.653 | 0.887 | |
|
| AUC | 0.697 | 0.976 | |
|
|
|
## Collaborators |
|
|
|
- **Menglin Zhou** |
|
- **Jiaping Liu** |
|
- **Xiaotian Zhan** |
|
|
|
|
|
## Citation |
|
If you use this model, please cite the RAID dataset as follows: |
|
``` |
|
@inproceedings{dugan-etal-2024-raid, |
|
title = "{RAID}: A Shared Benchmark for Robust Evaluation of Machine-Generated Text Detectors", |
|
author = "Dugan, Liam and |
|
Hwang, Alyssa and |
|
Trhl{\'\i}k, Filip and |
|
Zhu, Andrew and |
|
Ludan, Josh Magnus and |
|
Xu, Hainiu and |
|
Ippolito, Daphne and |
|
Callison-Burch, Chris", |
|
booktitle = "Proceedings of the 62nd Annual Meeting of the Association for Computational Linguistics (Volume 1: Long Papers)", |
|
month = aug, |
|
year = "2024", |
|
address = "Bangkok, Thailand", |
|
publisher = "Association for Computational Linguistics", |
|
url = "https://aclanthology.org/2024.acl-long.674", |
|
pages = "12463--12492", |
|
} |
|
``` |