---
license: mit
language:
- sk
pipeline_tag: question-answering
library_name: transformers
metrics:
- f1
- levenshtein
- exact match
base_model: daviddrzik/SK_BPE_BLM
tags:
- question-answering
- sk-quad
datasets:
- TUKE-DeutscheTelekom/skquad
---

# Fine-Tuned Question Answering Model - SK_BPE_BLM (SK-QuAD Dataset)

## Model Overview

This model is a fine-tuned version of the [SK_BPE_BLM model](https://huggingface.co/daviddrzik/SK_BPE_BLM) for extractive question answering. Fine-tuning was conducted on the [SK-QuAD dataset](https://nlp.kemt.fei.tuke.sk/language/skquad), the first manually annotated question answering dataset for Slovak, which contains over 91,000 questions and answers. The dataset includes clearly answerable questions, unanswerable questions, and plausible but probably incorrect answers.

## Dataset Details

For fine-tuning, we used only the records with clearly answerable questions. The original dataset is divided into training and test sets; we merged these into a single dataset for our research.

Some records have long contexts that, combined with the question, exceed the context window of our model. We therefore excluded all records where the combined length of the context and question exceeded 1,300 characters, which corresponds to approximately 256 tokens. After this filtering, the final dataset contained **54,319** question-answer pairs.

To obtain a robust evaluation, we applied stratified 10-fold cross-validation to the dataset. This let us assess how consistently the model performs across different subsets of the data.
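The length filter and the 10-fold split described above can be sketched as follows. This is a minimal illustration, not the authors' preprocessing code: the record field names `context` and `question`, the helper names, and the shuffle seed are assumptions, and the split shown is plain (unstratified) k-fold because the card does not state which variable the stratification used.

```python
import random

MAX_CHARS = 1300  # combined context + question limit stated in this card


def filter_by_length(records, max_chars=MAX_CHARS):
    """Keep records whose context and question together fit within max_chars.

    The "context" and "question" keys are hypothetical field names.
    """
    return [
        r for r in records
        if len(r["context"]) + len(r["question"]) <= max_chars
    ]


def k_fold_indices(n_items, k=10, seed=0):
    """Yield (train_idx, test_idx) pairs for plain k-fold cross-validation.

    The seed is arbitrary; stratification is omitted because the
    stratification criterion is not given in the card.
    """
    indices = list(range(n_items))
    random.Random(seed).shuffle(indices)
    # Deal the shuffled indices into k folds of near-equal size
    folds = [indices[i::k] for i in range(k)]
    for i in range(k):
        test_idx = folds[i]
        train_idx = [j for f, fold in enumerate(folds) if f != i for j in fold]
        yield train_idx, test_idx
```

Each record lands in exactly one test fold, so every question-answer pair is evaluated once across the ten runs.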

## Fine-Tuning Hyperparameters

The following hyperparameters were used during the fine-tuning process:

- **Learning Rate:** 5e-05
- **Training Batch Size:** 64 sequences
- **Evaluation Batch Size:** 64 sequences
- **Seed:** 42
- **Optimizer:** Adam (default)
- **Number of Epochs:** 5

## Evaluation Metrics

The model performance was assessed using both token-level and text-level metrics:

- **Token-Level Metrics:**
  - **Precision**
  - **Recall**
  - **F1-Score:** Measures how accurately the model identified the correct answer tokens within the context.
- **Text-Level Metrics:**
  - **Levenshtein Distance:** Evaluates the similarity between the predicted and correct answers.
  - **Exact Match:** Measures the percentage of answers where the predicted answer exactly matched the correct one.

## Model Performance

The model achieved the following median performance metrics:

- **F1-Score:** 0.6563
- **Levenshtein Distance:** 0.6134
- **Exact Match:** 0.3396

## Model Usage

This model is suitable for extractive question answering tasks in Slovak text, particularly for applications that require the identification of precise answers from a given context.
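
For readers who want to reproduce comparable numbers, the metrics listed under Evaluation Metrics can be sketched in plain Python. This is an illustrative sketch under stated assumptions, not the evaluation script behind the reported results: it assumes a SQuAD-style bag-of-tokens overlap for precision, recall and F1, assumes the Levenshtein score is normalised to a similarity in [0, 1], and assumes surrounding whitespace is ignored for exact match. All function names are hypothetical.

```python
from collections import Counter


def token_prf(pred_tokens, gold_tokens):
    """Token-overlap precision, recall and F1 (bag-of-tokens assumption)."""
    overlap = sum((Counter(pred_tokens) & Counter(gold_tokens)).values())
    if overlap == 0:
        return 0.0, 0.0, 0.0
    precision = overlap / len(pred_tokens)
    recall = overlap / len(gold_tokens)
    f1 = 2 * precision * recall / (precision + recall)
    return precision, recall, f1


def levenshtein(a, b):
    """Edit distance between two strings (two-row dynamic programming)."""
    prev = list(range(len(b) + 1))
    for i, ca in enumerate(a, start=1):
        curr = [i]
        for j, cb in enumerate(b, start=1):
            cost = 0 if ca == cb else 1
            # deletion, insertion, substitution
            curr.append(min(prev[j] + 1, curr[j - 1] + 1, prev[j - 1] + cost))
        prev = curr
    return prev[-1]


def levenshtein_similarity(a, b):
    """Assumed normalisation: 1 - distance / length of the longer string."""
    longest = max(len(a), len(b))
    return 1.0 if longest == 0 else 1.0 - levenshtein(a, b) / longest


def exact_match(pred, gold):
    """1.0 if the answers match after stripping whitespace (assumption)."""
    return float(pred.strip() == gold.strip())
```

Under these definitions, a prediction of `v roku 1921` against a reference of `1921` has full recall but only one-third precision, which illustrates how token-level F1 can stay well above exact match.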

### Example Usage

Below is an example of how to use the fine-tuned `SK_BPE_BLM-qa` model in a Python script:

```python
import torch
from torch.nn.functional import softmax
from transformers import RobertaForQuestionAnswering, RobertaTokenizerFast


class QuestionAnsweringModel:
    def __init__(self, model, tokenizer):
        self.model = RobertaForQuestionAnswering.from_pretrained(model)
        self.tokenizer = RobertaTokenizerFast.from_pretrained(tokenizer, max_len=256)

    def predict(self, context, question):
        # Encode the (context, question) pair as one padded, truncated sequence
        inputs = self.tokenizer(context, question, truncation=True, padding="max_length", return_tensors="pt")
        input_ids = inputs["input_ids"].tolist()[0]

        # Inference only, so gradient tracking is not needed
        with torch.no_grad():
            outputs = self.model(**inputs)

        # Convert start/end logits into probabilities over token positions
        start_probs = softmax(outputs.start_logits, dim=1)
        end_probs = softmax(outputs.end_logits, dim=1)

        # Most likely start token and (exclusive) end token of the answer span
        answer_start = torch.argmax(start_probs).item()
        answer_end = torch.argmax(end_probs).item() + 1

        answer = self.tokenizer.decode(input_ids[answer_start:answer_end], skip_special_tokens=True)
        start_prob = start_probs[0, answer_start].item()
        end_prob = end_probs[0, answer_end - 1].item()
        return answer, start_prob, end_prob


# Instantiate the QA model with the specified tokenizer and model
qa_model = QuestionAnsweringModel(tokenizer="daviddrzik/SK_BPE_BLM", model="daviddrzik/SK_BPE_BLM-qa")

context = "Albert Einstein, narodený v roku 1879, je jedným z najvplyvnejších fyzikov všetkých čias. Vyvinul teóriu relativity, ktorá zmenila naše chápanie priestoru, času a gravitácie. Jeho slávna rovnica E = mc², ktorá vyjadruje vzťah medzi energiou a hmotou, je považovaná za jednu z najvýznamnejších rovníc vo fyzike. Einstein získal Nobelovu cenu za fyziku v roku 1921 za jeho prácu na fotoelektrickom jave, ktorý bol kľúčový pre rozvoj kvantovej mechaniky."
question = "V ktorom roku získal Albert Einstein Nobelovu cenu za fyziku?"

print("\nContext: " + context + "\n")
print("Question: " + question + "\n")

# Predict the answer; returns (answer text, start probability, end probability)
answer = qa_model.predict(context, question)
print(f"Predicted answer: {answer}")
```

### Example Output

Here is the output when running the above example:

```yaml
Context: Albert Einstein, narodený v roku 1879, je jedným z najvplyvnejších fyzikov všetkých čias. Vyvinul teóriu relativity, ktorá zmenila naše chápanie priestoru, času a gravitácie. Jeho slávna rovnica E = mc², ktorá vyjadruje vzťah medzi energiou a hmotou, je považovaná za jednu z najvýznamnejších rovníc vo fyzike. Einstein získal Nobelovu cenu za fyziku v roku 1921 za jeho prácu na fotoelektrickom jave, ktorý bol kľúčový pre rozvoj kvantovej mechaniky.

Question: V ktorom roku získal Albert Einstein Nobelovu cenu za fyziku?

Predicted answer: (' v roku 1921', 0.7212189435958862, 0.9873688817024231)
```
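
The example above picks the start and end positions independently, so the predicted end can fall before the predicted start and produce an empty answer. A common alternative is to search for the highest-scoring pair with the start at or before the end. The sketch below illustrates that idea on plain lists of scores; the function name `best_span` and the `max_answer_len` limit are hypothetical and are not part of this model's released code.

```python
def best_span(start_scores, end_scores, max_answer_len=30):
    """Return (start, end, score) with start <= end and the highest summed score.

    start_scores and end_scores are per-token scores (for example logits)
    of equal length; max_answer_len is an assumed cap on span length in tokens.
    """
    best = (0, 0, float("-inf"))
    for s, s_score in enumerate(start_scores):
        # Only consider end positions at or after the start, within the cap
        last = min(s + max_answer_len, len(end_scores))
        for e in range(s, last):
            score = s_score + end_scores[e]
            if score > best[2]:
                best = (s, e, score)
    return best


# Independent argmax would give start=1 and end=0 here, an invalid span.
start_scores = [0.1, 2.0, 0.3, 1.0]
end_scores = [4.0, 0.2, 3.0, 0.1]
start, end, score = best_span(start_scores, end_scores)
print(start, end)  # a valid span with start <= end
```

With the model above, the inputs could be taken from `outputs.start_logits[0].tolist()` and `outputs.end_logits[0].tolist()`, and the answer decoded from `input_ids[start:end + 1]`, since the returned end index is inclusive.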