---
language:
- en
thumbnail: https://github.com/karanchahal/distiller/blob/master/distiller.jpg
tags:
- question-answering
license: apache-2.0
datasets:
- squad
metrics:
- squad
---

# DistilBERT with a second step of distillation

## Model description

This model replicates the "DistilBERT (D)" model from Table 2 of the [DistilBERT paper](https://arxiv.org/pdf/1910.01108.pdf). In this approach, a DistilBERT student is fine-tuned on SQuAD v1.1, but with a BERT model (also fine-tuned on SQuAD v1.1) acting as a teacher for a second step of task-specific distillation; a sketch of this distillation step is shown below.

In this version, the following pre-trained models were used:

* Student: `distilbert-base-uncased`
* Teacher: `lewtun/bert-base-uncased-finetuned-squad-v1`
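
The exact training code is not reproduced here, but conceptually the second distillation step combines the usual span-extraction loss with a soft-target loss that pushes the student's start/end logits towards the teacher's. The snippet below is a minimal sketch under that assumption; the `alpha` and `temperature` values are illustrative rather than the ones used to train this model.

```python
import torch
import torch.nn.functional as F
from transformers import AutoModelForQuestionAnswering

# The student and teacher checkpoints listed above.
student = AutoModelForQuestionAnswering.from_pretrained("distilbert-base-uncased")
teacher = AutoModelForQuestionAnswering.from_pretrained("lewtun/bert-base-uncased-finetuned-squad-v1")
teacher.eval()

def distillation_loss(batch, alpha=0.5, temperature=2.0):
    """Blend the hard-label span loss with a soft-target loss from the teacher.

    `batch` is assumed to hold `input_ids`, `attention_mask`, `start_positions`
    and `end_positions`; `alpha` and `temperature` are illustrative values only.
    """
    student_out = student(**batch)  # `.loss` is the cross-entropy on the gold spans
    with torch.no_grad():
        teacher_out = teacher(input_ids=batch["input_ids"], attention_mask=batch["attention_mask"])

    # KL divergence between temperature-scaled start/end logits of student and teacher.
    soft_loss = 0.0
    for s_logits, t_logits in [
        (student_out.start_logits, teacher_out.start_logits),
        (student_out.end_logits, teacher_out.end_logits),
    ]:
        soft_loss = soft_loss + F.kl_div(
            F.log_softmax(s_logits / temperature, dim=-1),
            F.softmax(t_logits / temperature, dim=-1),
            reduction="batchmean",
        ) * temperature ** 2

    return alpha * student_out.loss + (1.0 - alpha) * soft_loss
```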
## Training data

This model was trained on the SQuAD v1.1 dataset, which can be obtained from the `datasets` library as follows:

```python
from datasets import load_dataset

squad = load_dataset('squad')
```
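
Loading the dataset this way returns a `DatasetDict` with `train` and `validation` splits.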
## Training procedure

## Eval results

|                  | Exact Match | F1   |
|------------------|-------------|------|
| DistilBERT paper | 79.1        | 86.9 |
| Ours             | 78.4        | 86.5 |

The scores were calculated using the `squad` metric from `datasets`.
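
For reference, the metric can be loaded and applied as follows. The prediction and reference below are made up for illustration, and in recent versions of `datasets` the metric lives in the separate `evaluate` library instead.

```python
from datasets import load_metric

squad_metric = load_metric("squad")

# Toy example with a single made-up question id.
predictions = [{"id": "0", "prediction_text": "Denver Broncos"}]
references = [{"id": "0", "answers": {"text": ["Denver Broncos"], "answer_start": [177]}}]

print(squad_metric.compute(predictions=predictions, references=references))
# {'exact_match': 100.0, 'f1': 100.0}
```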
### BibTeX entry and citation info

```bibtex
@misc{sanh2020distilbert,
      title={DistilBERT, a distilled version of BERT: smaller, faster, cheaper and lighter},
      author={Victor Sanh and Lysandre Debut and Julien Chaumond and Thomas Wolf},
      year={2020},
      eprint={1910.01108},
      archivePrefix={arXiv},
      primaryClass={cs.CL}
}
```