---
title: McNemar
emoji: 🤗 
colorFrom: blue
colorTo: green
sdk: gradio
sdk_version: 3.0.2
app_file: app.py
pinned: false
tags:
- evaluate
- comparison
description: >-
  McNemar's test is a diagnostic test over a contingency table resulting from the predictions of two classifiers. The test compares the sensitivity and specificity of the diagnostic tests on the same set of reference labels. It can be computed with:
  McNemar = (SE - SP)**2 / (SE + SP)
  Where:
  SE: Sensitivity (Test 1 positive; Test 2 negative)
  SP: Specificity (Test 1 negative; Test 2 positive)
---


# Comparison Card for McNemar

## Comparison description

McNemar's test is a non-parametric diagnostic test over a contingency table resulting from the predictions of two classifiers. The test compares the sensitivity and specificity of the diagnostic tests on the same set of reference labels. It can be computed with:

McNemar = (SE - SP)**2 / (SE + SP)

Where:
* SE: Sensitivity (Test 1 positive; Test 2 negative)
* SP: Specificity (Test 1 negative; Test 2 positive)

In other words, SE and SP are the off-diagonal (discordant) elements of the contingency table for the classifier predictions (`predictions1` and `predictions2`) with respect to the ground truth `references`: the cases where exactly one of the two classifiers agrees with the reference.
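
The computation can be sketched in plain Python as follows. This is a minimal illustration, not the module's own implementation: the function name `mcnemar_statistic` is hypothetical, "positive" is assumed to mean that a classifier's prediction matches the reference, and raising `ValueError` when the classifiers never disagree is a choice made for this example.

```python
def mcnemar_statistic(references, predictions1, predictions2):
    """Sketch: McNemar statistic from the two discordant counts."""
    se = 0  # classifier 1 matches the reference, classifier 2 does not
    sp = 0  # classifier 2 matches the reference, classifier 1 does not
    for ref, p1, p2 in zip(references, predictions1, predictions2):
        if p1 == ref and p2 != ref:
            se += 1
        elif p1 != ref and p2 == ref:
            sp += 1
    if se + sp == 0:
        # Assumption for this sketch: with no disagreements the ratio is undefined.
        raise ValueError("the classifiers never disagree; statistic is undefined")
    return (se - sp) ** 2 / (se + sp)


print(mcnemar_statistic([1, 0, 1], [1, 1, 1], [1, 0, 1]))  # 1.0
```

Only the samples on which the two classifiers disagree contribute to the statistic; samples where both are right or both are wrong are ignored.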

## How to use 

The McNemar comparison calculates the proportions of responses that exhibit disagreement between two classifiers. It is used to analyze paired nominal data.

## Inputs

Its arguments are:

`predictions1`: a list of predictions from the first model.

`predictions2`: a list of predictions from the second model.

`references`: a list of the ground truth reference labels.

## Output values

The McNemar comparison outputs two things:

`stat`: The McNemar statistic.

`p`: The p value.
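
McNemar's statistic is conventionally compared against a chi-square distribution with one degree of freedom. The sketch below derives a p value from a statistic under that assumption, using only the standard library. The helper name `mcnemar_p_value` is hypothetical, and the module itself may compute `p` differently.

```python
import math


def mcnemar_p_value(stat):
    """Sketch: upper-tail probability of a chi-square variable with 1 degree of freedom."""
    # For 1 degree of freedom, P(X > stat) = erfc(sqrt(stat / 2)).
    return math.erfc(math.sqrt(stat / 2.0))


print(mcnemar_p_value(1.0))  # approximately 0.3173
```

A small `p` suggests that the two classifiers' error patterns differ by more than chance would explain; a large `p` means the observed disagreements are consistent with equal performance.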

## Examples 

Example comparison:

```python
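import evaluate

# Load the McNemar comparison module
mcnemar = evaluate.load("mcnemar")

# The classifiers disagree on the second sample only, where classifier 2 is correct
results = mcnemar.compute(references=[1, 0, 1], predictions1=[1, 1, 1], predictions2=[1, 0, 1])
print(results)
# {'stat': 1.0, 'p': 0.31731050786291115}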
```

## Limitations and bias

The McNemar test is a non-parametric test, so it makes relatively few assumptions (essentially only that the observations are independent). It should be used to analyze paired nominal data only.

## Citations

```bibtex
@article{mcnemar1947note,
  title={Note on the sampling error of the difference between correlated proportions or percentages},
  author={McNemar, Quinn},
  journal={Psychometrika},
  volume={12},
  number={2},
  pages={153--157},
  year={1947},
  publisher={Springer-Verlag}
}
```