---
tags:
- text-classification
metrics:
- accuracy
- f1
- roc_auc
base_model:
- intfloat/e5-small
library_name: transformers
datasets:
- liamdugan/raid
model-index:
- name: A Shared Benchmark for Robust Evaluation of Machine-Generated Text Detectors
  results:
  - task:
      type: text-classification
    dataset:
      name: RAID-test
      type: RAID-test
    metrics:
    - name: accuracy
      type: accuracy
      value: 0.939
    source:
      name: RAID Benchmark Leaderboard
      url: https://raid-bench.xyz/leaderboard
pipeline_tag: text-classification
---
# My LoRA Fine-Tuned AI-Generated Text Detector
This is an `intfloat/e5-small` model fine-tuned with LoRA for sequence classification. It classifies a text as either human-written or AI-generated, using the following labels:
- **Label 0**: Represents **human-written** content.
- **Label 1**: Represents **AI-generated** content.
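
The label mapping above can be applied to the classifier's two output logits as in the following minimal sketch. The helper names (`softmax`, `decode`, `classify`) and the `model_id` argument are illustrative and not part of this repository; `classify` additionally assumes the checkpoint can be loaded with `AutoModelForSequenceClassification` (if the repository holds only a LoRA adapter, it would need to be loaded through `peft` instead).

```python
import math

# Label mapping stated in this model card.
ID2LABEL = {0: "human-written", 1: "AI-generated"}


def softmax(logits):
    """Convert raw logits to probabilities (numerically stable)."""
    peak = max(logits)
    exps = [math.exp(x - peak) for x in logits]
    total = sum(exps)
    return [e / total for e in exps]


def decode(logits):
    """Map a two-element logit vector to (label, probability of that label)."""
    probs = softmax(logits)
    label_id = max(range(len(probs)), key=probs.__getitem__)
    return ID2LABEL[label_id], probs[label_id]


def classify(text, model_id):
    """Hypothetical end-to-end call.

    `model_id` is a placeholder for the repo id or local path of this
    checkpoint, which the card does not state.
    """
    # Imported lazily so the helpers above work without torch installed.
    import torch
    from transformers import AutoModelForSequenceClassification, AutoTokenizer

    tokenizer = AutoTokenizer.from_pretrained(model_id)
    model = AutoModelForSequenceClassification.from_pretrained(model_id)
    model.eval()
    inputs = tokenizer(text, return_tensors="pt", truncation=True, max_length=512)
    with torch.no_grad():
        logits = model(**inputs).logits[0].tolist()
    return decode(logits)


# Decoding example with made-up logits (no model download needed).
label, prob = decode([0.2, 1.7])
print(label)  # AI-generated
```

The returned probability is the softmax score of the predicted label, so it can be thresholded if a stricter decision rule than argmax is wanted.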
## Model Details
- **Base Model**: `intfloat/e5-small`
- **Fine-Tuning Technique**: LoRA (Low-Rank Adaptation)
- **Task**: Sequence Classification
- **Use Cases**: Detecting AI-generated text.
- **Hyperparameters**:
  - Learning rate: `5e-5`
  - Epochs: `3`
  - LoRA rank: `8`
  - LoRA alpha: `16`
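
To make the LoRA settings above concrete, here is a small sketch of what rank `8` and alpha `16` imply for a single adapted weight matrix. The hidden size of 384 is an assumption about `e5-small`, and `build_lora_config` is a hypothetical `peft` configuration: the card does not state the target modules or dropout, so those are left at the library defaults.

```python
RANK = 8    # LoRA rank from this card
ALPHA = 16  # LoRA alpha from this card


def lora_scaling(rank=RANK, alpha=ALPHA):
    """Standard LoRA scales the low-rank update B @ A by alpha / rank."""
    return alpha / rank


def lora_param_count(d_in, d_out, rank=RANK):
    """Trainable parameters for one adapted weight.

    A has shape (rank, d_in) and B has shape (d_out, rank).
    """
    return rank * (d_in + d_out)


def build_lora_config():
    """Hypothetical peft setup; only r, lora_alpha and the task come from the card."""
    # Imported lazily so the arithmetic above runs without peft installed.
    from peft import LoraConfig, TaskType

    return LoraConfig(task_type=TaskType.SEQ_CLS, r=RANK, lora_alpha=ALPHA)


HIDDEN = 384  # assumed hidden size of e5-small, not stated in the card

print(lora_scaling())                    # 2.0
print(lora_param_count(HIDDEN, HIDDEN))  # 6144
print(HIDDEN * HIDDEN)                   # 147456 (full matrix, for comparison)
```

Under these assumptions, each adapted 384 x 384 projection trains 6,144 parameters instead of 147,456, which is why the fine-tune fits comfortably on a single GPU.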
## Training Details
- **Dataset**:
  - 10,000 tweets and 10,000 tweets rewritten with GPT-4o-mini.
  - 80,000 human-written texts from [RAID](https://github.com/liamdugan/raid).
  - 128,000 AI-generated texts from [RAID](https://github.com/liamdugan/raid).
- **Hardware**: Fine-tuned on a single NVIDIA A100 GPU.
- **Training Time**: Approximately 2 hours.
- **Evaluation Metrics**:

| Metric   | Raw E5-small | Fine-tuned |
|----------|-------------:|-----------:|
| Accuracy |        65.2% |      89.0% |
| F1 Score |        0.653 |      0.887 |
| AUC      |        0.697 |      0.976 |
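
For readers who want to reproduce metrics of the kind reported above on their own data, the following dependency-free sketch computes accuracy, F1, and ROC AUC. It assumes binary F1 with label 1 (AI-generated) as the positive class and a 0.5 decision threshold; the card does not say which averaging or threshold was used, and the toy labels and scores below are made up for illustration.

```python
def accuracy(y_true, y_pred):
    """Fraction of predictions that match the reference labels."""
    correct = sum(t == p for t, p in zip(y_true, y_pred))
    return correct / len(y_true)


def f1_score(y_true, y_pred, positive=1):
    """Binary F1 for the `positive` class (harmonic mean of precision and recall)."""
    tp = sum(t == positive and p == positive for t, p in zip(y_true, y_pred))
    fp = sum(t != positive and p == positive for t, p in zip(y_true, y_pred))
    fn = sum(t == positive and p != positive for t, p in zip(y_true, y_pred))
    if tp == 0:
        return 0.0
    precision = tp / (tp + fp)
    recall = tp / (tp + fn)
    return 2 * precision * recall / (precision + recall)


def roc_auc(y_true, scores, positive=1):
    """Probability that a random positive outscores a random negative.

    Ties count as 0.5. This pairwise form is O(n^2) and meant for small
    illustrative inputs only.
    """
    pos = [s for t, s in zip(y_true, scores) if t == positive]
    neg = [s for t, s in zip(y_true, scores) if t != positive]
    wins = sum((p > n) + 0.5 * (p == n) for p in pos for n in neg)
    return wins / (len(pos) * len(neg))


# Toy example: 0 = human-written, 1 = AI-generated.
y_true = [0, 0, 1, 1, 1, 0]
scores = [0.1, 0.6, 0.8, 0.4, 0.9, 0.2]  # predicted probability of label 1
y_pred = [int(s >= 0.5) for s in scores]

print(round(accuracy(y_true, y_pred), 3))  # 0.667
print(round(f1_score(y_true, y_pred), 3))  # 0.667
print(round(roc_auc(y_true, scores), 3))   # 0.889
```

For full-size evaluation sets, a library implementation such as scikit-learn's `accuracy_score`, `f1_score`, and `roc_auc_score` would be the more practical choice.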
## Collaborators
- **Menglin Zhou**
- **Jiaping Liu**
- **Xiaotian Zhan**
## Citation
If you use this model, please cite the RAID dataset as follows:
```
@inproceedings{dugan-etal-2024-raid,
title = "{RAID}: A Shared Benchmark for Robust Evaluation of Machine-Generated Text Detectors",
author = "Dugan, Liam and
Hwang, Alyssa and
Trhl{\'\i}k, Filip and
Zhu, Andrew and
Ludan, Josh Magnus and
Xu, Hainiu and
Ippolito, Daphne and
Callison-Burch, Chris",
booktitle = "Proceedings of the 62nd Annual Meeting of the Association for Computational Linguistics (Volume 1: Long Papers)",
month = aug,
year = "2024",
address = "Bangkok, Thailand",
publisher = "Association for Computational Linguistics",
url = "https://aclanthology.org/2024.acl-long.674",
pages = "12463--12492",
}
```