---
license: mit
datasets:
- squad
language:
- en
pipeline_tag: text-classification
widget: 
- text: "question: What number comes after five? answer: four"
- text: "question: Which person is associated with Kanye West? answer: a tree"
- text: "question: When is US independence day from aliens? answer: 7/4/1996"
---

# kgourgou/bert-base-uncased-QA-classification

An experiment in classifying whether a (question, answer) pair is valid. The model is not very accurate at this point, but such a model could eventually help with RAG. For a stronger model, check this one by [vectara](https://huggingface.co/vectara/hallucination_evaluation_model).

Input must be formatted as:

```
question: {your query}? answer: {your possible answer}
```
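As a minimal sketch, the formatting step can be wrapped in a small helper (the `format_qa` name is hypothetical, not part of the model card); the commented lines show how the result would be fed to a `transformers` text-classification pipeline:

```python
def format_qa(question: str, answer: str) -> str:
    """Build the input string in the format the model expects."""
    return f"question: {question} answer: {answer}"

# Hypothetical usage (downloads the model weights):
# from transformers import pipeline
# clf = pipeline(
#     "text-classification",
#     model="kgourgou/bert-base-uncased-QA-classification",
# )
# clf(format_qa("What number comes after five?", "four"))
```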

The output probabilities are for:

1. class 0 = the answer string could not be an answer to the question, and
2. class 1 = the answer string could be an answer to the question.

"Could be" should be interpreted as a type match, e.g., if the question requires the answer to be a person or a number or a date.

Examples: 

- "question: What number comes after five? answer: four" → this should be class 1 as the answer is a number (even if it's not the right number).
- "question: Which person is associated with Kanye West? answer: a tree" → this should be class 0 as a tree is not a person.


## Base model details 

The base model is `bert-base-uncased`. For this experiment, I only use the `squad` dataset, preprocessed to bring it into the required format.
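The card does not describe the preprocessing, but one plausible scheme for turning SQuAD-style (question, answer) records into labeled pairs is to keep each gold answer as class 1 and swap in a random answer from another example as class 0. A stdlib-only sketch of that assumed scheme (`build_pairs` is a hypothetical helper; it assumes at least two distinct answers in the input):

```python
import random

def build_pairs(examples, seed=0):
    """From (question, answer) records, build labeled training strings:
    the gold answer as class 1, and a randomly sampled answer from a
    different example as class 0 (an assumed negative-sampling scheme,
    not documented in the model card)."""
    rng = random.Random(seed)
    answers = [a for _, a in examples]
    pairs = []
    for q, a in examples:
        pairs.append((f"question: {q} answer: {a}", 1))
        # Sample a mismatched answer for the negative example.
        neg = rng.choice([x for x in answers if x != a])
        pairs.append((f"question: {q} answer: {neg}", 0))
    return pairs
```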