Model

miniALBERT is a recursive transformer model that uses cross-layer parameter sharing, embedding factorisation, and bottleneck adapters to achieve high parameter efficiency. Because miniALBERT is a compact model, it is trained with a layer-to-layer distillation technique, using the BioBERT-v1.1 model as the teacher. This checkpoint was trained for 100K steps on the PubMed Abstracts dataset. Architecturally, the model uses an embedding dimension of 128, a hidden size of 768, an MLP expansion rate of 4, and a reduction factor of 16 for the bottleneck adapters. It applies 6 recursions and has 11 million unique parameters.
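To make the hyperparameters above more concrete, here is a rough back-of-the-envelope sketch of what embedding factorisation and the adapter reduction factor imply. The vocabulary size is an assumption (it is not stated in this card; 28,996 is the BERT-base-cased vocabulary that BioBERT-v1.1 is generally reported to use), the helper names are hypothetical, and biases, layer norms, and position embeddings are ignored, so the numbers are illustrative rather than an exact accounting of the 11 million parameters.

```python
# Hyperparameters stated in the model description.
EMBEDDING_DIM = 128
HIDDEN_SIZE = 768
MLP_EXPANSION = 4
ADAPTER_REDUCTION = 16

# ASSUMPTION: the vocabulary size is not given in this card.
ASSUMED_VOCAB_SIZE = 28_996


def factorised_embedding_params(vocab_size, embedding_dim, hidden_size):
    """Token-embedding weights when a small embedding is projected up to the hidden size."""
    return vocab_size * embedding_dim + embedding_dim * hidden_size


def full_embedding_params(vocab_size, hidden_size):
    """Token-embedding weights when embeddings are stored directly at the hidden size."""
    return vocab_size * hidden_size


def adapter_bottleneck_dim(hidden_size, reduction_factor):
    """Width of the adapter bottleneck for a given reduction factor."""
    return hidden_size // reduction_factor


def mlp_inner_dim(hidden_size, expansion_rate):
    """Width of the feed-forward inner layer for a given expansion rate."""
    return hidden_size * expansion_rate


if __name__ == "__main__":
    print("factorised embedding weights:",
          factorised_embedding_params(ASSUMED_VOCAB_SIZE, EMBEDDING_DIM, HIDDEN_SIZE))
    print("unfactorised embedding weights:",
          full_embedding_params(ASSUMED_VOCAB_SIZE, HIDDEN_SIZE))
    print("adapter bottleneck width:",
          adapter_bottleneck_dim(HIDDEN_SIZE, ADAPTER_REDUCTION))
    print("MLP inner width:", mlp_inner_dim(HIDDEN_SIZE, MLP_EXPANSION))
```

Under the assumed vocabulary size, factorisation brings the token-embedding table down from roughly 22.3 million weights to roughly 3.8 million, and a reduction factor of 16 gives a 48-dimensional adapter bottleneck.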

Usage

Because miniALBERT uses a custom architecture, it cannot currently be loaded with transformers.AutoModel. To load the model, first clone the miniALBERT GitHub project:

git clone https://github.com/nlpie-research/MiniALBERT.git

Then add the cloned miniALBERT directory to your Python path with sys.path.append and import the model classes from the miniALBERT modeling file:

import sys
sys.path.append("PATH_TO_CLONED_PROJECT/MiniALBERT/")

from minialbert_modeling import MiniAlbertForSequenceClassification, MiniAlbertForTokenClassification
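If the import above fails with ModuleNotFoundError, the path is usually the culprit. The following is a minimal sketch of a more defensive way to register the clone. It assumes the modeling code lives in a file named minialbert_modeling.py at the top level of the cloned directory (inferred from the import statement, not confirmed here), and add_minialbert_to_path is a hypothetical helper, not part of the miniALBERT project.

```python
import sys
from pathlib import Path


def add_minialbert_to_path(repo_dir, module_file="minialbert_modeling.py"):
    """Append the cloned repository to sys.path after checking it looks right.

    ASSUMPTION: `module_file` sits directly inside `repo_dir`.
    """
    repo = Path(repo_dir).expanduser().resolve()
    if not (repo / module_file).is_file():
        raise FileNotFoundError(
            f"{module_file} not found in {repo}; check where the project was cloned"
        )
    repo_str = str(repo)
    # Avoid adding the same directory twice when a notebook cell is re-run.
    if repo_str not in sys.path:
        sys.path.append(repo_str)
    return repo
```

Calling add_minialbert_to_path("PATH_TO_CLONED_PROJECT/MiniALBERT/") before the import gives an explicit error message if the directory is wrong, instead of a bare import failure.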

Finally, load the model as you would any other model in the transformers library:

# For NER (token classification)
model = MiniAlbertForTokenClassification.from_pretrained("nlpie/bio-miniALBERT-128")
# For sequence classification
model = MiniAlbertForSequenceClassification.from_pretrained("nlpie/bio-miniALBERT-128")

In addition, for efficient fine-tuning that updates only the pre-trained bottleneck adapters, call:

model.trainAdaptersOnly()
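One way to check the effect of this call is to compare trainable and total parameter counts before and after it. The helper below is a sketch, not part of the miniALBERT code: count_parameters is a hypothetical name, and it assumes trainAdaptersOnly() works by clearing requires_grad on the non-adapter weights, which this card does not state. It relies only on the standard PyTorch parameters(), numel(), and requires_grad interface.

```python
def count_parameters(model):
    """Return (trainable, total) parameter counts for a PyTorch-style model.

    Works with any object whose parameters() yields items exposing
    numel() and requires_grad, as torch.nn.Module does.
    """
    trainable = 0
    total = 0
    for param in model.parameters():
        n = param.numel()
        total += n
        if param.requires_grad:
            trainable += n
    return trainable, total


# Example usage (requires a loaded model):
# trainable, total = count_parameters(model)
# print(f"{trainable} of {total} parameters will be updated")
```

If the assumption holds, the trainable count should drop sharply after model.trainAdaptersOnly() while the total stays the same.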

Citation

If you use the model, please cite our paper:

@inproceedings{nouriborji2023minialbert,
  title={MiniALBERT: Model Distillation via Parameter-Efficient Recursive Transformers},
  author={Nouriborji, Mohammadmahdi and Rohanian, Omid and Kouchaki, Samaneh and Clifton, David A},
  booktitle={Proceedings of the 17th Conference of the European Chapter of the Association for Computational Linguistics},
  pages={1161--1173},
  year={2023}
}