# Model Details: int8 1x4 Sparse Distilbert
The article discusses how to make inference of transformer-based models more efficient on Intel hardware. The authors propose a 1x4 sparse pattern that fits Intel instructions and improves performance. We implement 1x4 block pruning and obtain an 80% sparse model on the SQuAD1.1 dataset. Combined with quantization, it achieves up to a **24.2x speedup with less than 1% accuracy loss**. The article also shows performance gains for other models with this approach.
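As a rough illustration of what 1x4 block pruning means, the sketch below (a simplified example, not the authors' training-time implementation; the function name and the magnitude-based keep criterion are assumptions) zeroes out contiguous 1x4 blocks of a weight matrix until a target sparsity is reached:

```python
import numpy as np

def prune_1x4_blocks(weight, sparsity=0.8):
    """Zero out 1x4 blocks of `weight`, keeping only the blocks with the
    largest L1 magnitude. A toy one-shot sketch: the real pipeline learns
    the sparse mask during fine-tuning rather than pruning post hoc."""
    rows, cols = weight.shape
    assert cols % 4 == 0, "columns must be divisible by the block width 4"
    blocks = weight.reshape(rows, cols // 4, 4)   # view as 1x4 blocks
    scores = np.abs(blocks).sum(axis=-1)          # per-block L1 magnitude
    keep = int(scores.size * (1 - sparsity))      # number of blocks to keep
    threshold = np.sort(scores, axis=None)[-keep] # smallest surviving score
    mask = (scores >= threshold)[..., None]       # broadcast over the block
    return (blocks * mask).reshape(rows, cols)

rng = np.random.default_rng(0)
w = rng.standard_normal((8, 16))
pruned = prune_1x4_blocks(w, sparsity=0.8)
print(f"element sparsity: {np.mean(pruned == 0):.2f}")
```

Because whole 1x4 blocks are zeroed rather than scattered single weights, the surviving nonzeros line up with 4-wide vector instructions, which is what makes this pattern hardware-friendly.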
This model card was written by Intel.
### How to use
Please follow the README in the [example](https://github.com/intel/intel-extension-for-transformers/tree/main/examples/huggingface/pytorch/text-classification/deployment/sparse/distilbert_base_uncased).