Update 2023-5-23: This model is BEREL version 1.0. We are now happy to provide a much improved BEREL_2.0.

Introducing BEREL: BERT Embeddings for Rabbinic-Encoded Language

When using BEREL, please reference:

Avi Shmidman, Joshua Guedalia, Shaltiel Shmidman, Cheyn Shmuel Shmidman, Eli Handel, Moshe Koppel, "Introducing BEREL: BERT Embeddings for Rabbinic-Encoded Language", Aug 2022 [arXiv:2208.01875]

  1. Usage:
from transformers import AutoTokenizer, BertForMaskedLM

tokenizer = AutoTokenizer.from_pretrained('dicta-il/BEREL')
model = BertForMaskedLM.from_pretrained('dicta-il/BEREL')

# for evaluation, disable dropout
model.eval()

NOTE: This code will not work properly and will produce bad results if you use BertTokenizer. Please use AutoTokenizer or BertTokenizerFast.
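One way to guard against loading the wrong tokenizer class is to check the tokenizer before using it. The following is a minimal sketch, assuming the `is_fast` attribute that `transformers` tokenizers expose; the helper name `ensure_fast_tokenizer` is hypothetical and is not part of BEREL or of `transformers`.

```python
def ensure_fast_tokenizer(tokenizer):
    """Return `tokenizer` unchanged if it is a fast tokenizer, else raise.

    Hypothetical helper: it relies on the `is_fast` attribute that
    `transformers` tokenizers expose (True for BertTokenizerFast,
    False for the slow BertTokenizer).
    """
    if not getattr(tokenizer, "is_fast", False):
        raise TypeError(
            f"{type(tokenizer).__name__} is not a fast tokenizer; "
            "load it with AutoTokenizer or BertTokenizerFast instead"
        )
    return tokenizer
```

It could be wrapped around the loading call from the usage snippet, for example `tokenizer = ensure_fast_tokenizer(AutoTokenizer.from_pretrained('dicta-il/BEREL'))`.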

  2. Demo site: You can experiment with the model in a GUI here: https://dicta-bert-demo.netlify.app/?genre=rabbinic
  • The main part of the GUI consists of word buttons that visualize the tokenization of the sentences. Clicking a button masks that word, and BEREL's top three predictions for it are shown in a bubble. Clicking the bubble expands it to 10 predictions; alternatively, ctrl-clicking the initial bubble expands it to 30 predictions.
  • Ctrl-clicking adjacent word buttons combines them into a single token for the mask.
  • The edit box at the top contains the input sentence; it can be modified at will, and the word buttons will update accordingly.
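The masking behaviour of the demo can be approximated in code. The following is a rough sketch, not the demo's actual implementation. It assumes the standard `transformers` masked-LM interface (`tokenizer.mask_token`, `tokenizer.mask_token_id`, and `model(**inputs).logits`). The helper names `rank_top_k` and `predict_masked` are hypothetical.

```python
import math


def rank_top_k(scores, k):
    """Return (index, probability) pairs for the k highest-scoring entries.

    `scores` is a plain list of raw logits; a softmax turns them into
    probabilities before the top k are selected.
    """
    peak = max(scores)
    exps = [math.exp(s - peak) for s in scores]  # subtract the max for numerical stability
    total = sum(exps)
    order = sorted(range(len(scores)), key=lambda i: scores[i], reverse=True)
    return [(i, exps[i] / total) for i in order[:k]]


def predict_masked(words, mask_index, k=3, model_name="dicta-il/BEREL"):
    """Mask words[mask_index] and return the top-k (token, probability) predictions."""
    # Imported lazily so rank_top_k stays usable without these packages installed.
    import torch
    from transformers import AutoTokenizer, BertForMaskedLM

    tokenizer = AutoTokenizer.from_pretrained(model_name)
    model = BertForMaskedLM.from_pretrained(model_name)
    model.eval()  # for evaluation, disable dropout

    # Replace the chosen word with the tokenizer's own mask token.
    masked = list(words)
    masked[mask_index] = tokenizer.mask_token
    inputs = tokenizer(" ".join(masked), return_tensors="pt")

    # Locate the mask token in the encoded input.
    is_mask = inputs["input_ids"][0] == tokenizer.mask_token_id
    positions = is_mask.nonzero(as_tuple=True)[0]
    if len(positions) != 1:
        raise ValueError("expected exactly one mask token in the encoded input")

    with torch.no_grad():
        logits = model(**inputs).logits

    scores = logits[0, positions[0].item()].tolist()
    return [
        (tokenizer.convert_ids_to_tokens(token_id), prob)
        for token_id, prob in rank_top_k(scores, k)
    ]
```

Calling `predict_masked(sentence.split(), mask_index=2, k=10)` on a sentence of your choice would roughly correspond to clicking the third word button and expanding the bubble to 10 predictions. The function reloads the model on every call for simplicity; for repeated use, loading the model and tokenizer once and passing them in would be the better design.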