Deprecation Notice
This model is deprecated. New Filipino Transformer models trained on much larger corpora are available.
Use jcblaise/roberta-tagalog-base or jcblaise/roberta-tagalog-large instead for better performance.
DistilBERT Tagalog Base Cased
Tagalog version of DistilBERT, distilled from bert-tagalog-base-cased. This model is part of a larger research project. We open-source the model to allow greater usage within the Filipino NLP community.
Usage
The model can be loaded and used in both PyTorch and TensorFlow through the HuggingFace Transformers package.
from transformers import TFAutoModel, AutoModel, AutoTokenizer
# TensorFlow
model = TFAutoModel.from_pretrained('jcblaise/distilbert-tagalog-base-cased', from_pt=True)
tokenizer = AutoTokenizer.from_pretrained('jcblaise/distilbert-tagalog-base-cased', do_lower_case=False)
# PyTorch
model = AutoModel.from_pretrained('jcblaise/distilbert-tagalog-base-cased')
tokenizer = AutoTokenizer.from_pretrained('jcblaise/distilbert-tagalog-base-cased', do_lower_case=False)
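Once loaded, the model can be used as a feature extractor. The following is a minimal sketch, not taken from the original project: it assumes PyTorch, and the helper names mean_pool and embed_sentences are hypothetical. Mean pooling over non-padding tokens is one common way to get a sentence vector, not necessarily the approach used in the papers below.

```python
import torch


def mean_pool(last_hidden_state, attention_mask):
    """Average token vectors per sentence, ignoring padding positions."""
    # (batch, seq_len) -> (batch, seq_len, 1), cast to match the hidden states
    mask = attention_mask.unsqueeze(-1).to(last_hidden_state.dtype)
    summed = (last_hidden_state * mask).sum(dim=1)
    # Guard against division by zero for an all-padding row
    counts = mask.sum(dim=1).clamp(min=1.0)
    return summed / counts


def embed_sentences(sentences, model_name="jcblaise/distilbert-tagalog-base-cased"):
    """Hypothetical helper: return one pooled vector per input sentence."""
    # Imported here so mean_pool stays usable without transformers installed
    from transformers import AutoModel, AutoTokenizer

    tokenizer = AutoTokenizer.from_pretrained(model_name, do_lower_case=False)
    model = AutoModel.from_pretrained(model_name)
    model.eval()  # disable dropout for deterministic features

    batch = tokenizer(sentences, padding=True, truncation=True, return_tensors="pt")
    with torch.no_grad():
        outputs = model(
            input_ids=batch["input_ids"],
            attention_mask=batch["attention_mask"],
        )
    return mean_pool(outputs.last_hidden_state, batch["attention_mask"])


# Example call (downloads the model weights on first use):
# vectors = embed_sentences(["Magandang umaga sa inyong lahat.", "Kumusta ka?"])
```

The returned tensor has one row per sentence and can be passed to a downstream classifier or compared with cosine similarity.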
Finetuning scripts and other utilities we use for our projects can be found in our centralized repository at https://github.com/jcblaisecruz02/Filipino-Text-Benchmarks
Citations
All model details and training setups can be found in our papers. If you use our model or find it useful in your projects, please cite our work:
@article{cruz2020establishing,
title={Establishing Baselines for Text Classification in Low-Resource Languages},
author={Cruz, Jan Christian Blaise and Cheng, Charibeth},
journal={arXiv preprint arXiv:2005.02068},
year={2020}
}
@article{cruz2019evaluating,
title={Evaluating Language Model Finetuning Techniques for Low-resource Languages},
author={Cruz, Jan Christian Blaise and Cheng, Charibeth},
journal={arXiv preprint arXiv:1907.00409},
year={2019}
}
Data and Other Resources
Data used to train this model, as well as other benchmark datasets in Filipino, can be found on my website at https://blaisecruz.com
Contact
If you have questions or concerns, or if you just want to chat about NLP and low-resource languages in general, you may reach me through my work email at me@blaisecruz.com