---
title: "XLabel: eXplainable Labeling Assistant"
emoji: 💻
colorFrom: pink
colorTo: gray
sdk: streamlit
sdk_version: 1.15.2
app_file: app.py
pinned: true
license: apache-2.0
---

# XLabel: e**X**plainable **Label**ing Assistant

XLabel is an open-source [Streamlit](https://streamlit.io/) app that takes an explainable machine learning approach to visual-interactive data labeling.

This is the official code for the following paper:
[An Explainable Machine Learning Approach to Visual-Interactive Labeling: A Case Study on Non-communicable Disease Data](https://arxiv.org/abs/2209.12778)
Donlapark Ponnoprat, Parichart Pattarapanitchai, Phimphaka Taninpong, Suthep Suantai

## News (01/01/2023)
* Use tabs instead of radio buttons for multiple labels.
* The app now requires `streamlit>=1.16.0` for the tabs and `interpret>=0.3.0` for handling missing data.

## Features
XLabel can:
* Predict the most probable labels using an Explainable Boosting Machine (EBM).
* Show the contribution of each feature towards the predicted labels.
* Provide an option to write the labels directly into the data file (use `XLabel.py`) or save them in a separate file (use `XLabelDL.py`).
* Support data with multiple labels and multiple classes.
* Support data with missing values ([thanks to EBM](https://github.com/interpretml/interpret/issues/18)) and/or non-numeric categorical features.

## Usage
Before using XLabel, make sure the data file follows these tabular conventions:
* The file must be in either CSV or Excel format.
* The first row of the file must be the names of the columns.
* The first column must contain a unique identifier (id) for each row.
* The label columns must appear last.

In addition, a few instances must already be labeled, with each class appearing at least once. For example, if a label has five possible classes, then at least five instances must be labeled.
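
The conventions above can be checked before launching the app. The following is a minimal sketch using only the Python standard library; it is not part of XLabel. The function name `check_data_file`, the `classes` argument, and the sample columns are hypothetical, and it assumes a CSV file in which an empty label cell means "not labeled yet".

```python
import csv
import io


def check_data_file(csv_text, classes):
    """Return a list of problems found in a CSV that should follow the conventions above.

    `classes` maps each label column name to its expected classes, listed in
    the same order as the label columns appear in the file (hypothetical
    argument, not an XLabel option).
    """
    rows = [row for row in csv.reader(io.StringIO(csv_text)) if row]
    if len(rows) < 2:
        return ["file needs a header row and at least one data row"]
    header, data = rows[0], rows[1:]
    problems = []

    # The first column must contain a unique id for each row.
    ids = [row[0] for row in data]
    if len(set(ids)) != len(ids):
        problems.append("ids in the first column are not unique")

    # The label columns must appear last.
    labels = list(classes)
    if header[-len(labels):] != labels:
        problems.append("label columns must appear last")
        return problems

    # Each class must appear at least once among the labeled rows.
    for name in labels:
        col = header.index(name)
        seen = {row[col] for row in data if row[col].strip()}
        missing = sorted(set(classes[name]) - seen)
        if missing:
            problems.append(f"label '{name}' has no labeled example of: {missing}")
    return problems


# Made-up sample: one label column with two classes and one unlabeled row.
sample = """id,age,bmi,diabetes
p1,54,31.2,yes
p2,37,22.8,no
p3,61,27.5,
"""
print(check_data_file(sample, {"diabetes": ["yes", "no"]}))
```

An empty list means the file passed every check in this sketch. Excel files would need to be exported to CSV first, or read with a library such as `pandas`.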

With your data file satisfying these conditions, you can now start data labeling with XLabel!
1. Copy `XLabel.py` to the directory that contains the data file and run the `streamlit` command:
    ```
    streamlit run XLabel.py
    ```
    * By design, `XLabel.py` writes the labeled data to the original data file. If you would rather download the labeled data as a separate file, use `XLabelDL.py` instead.
    * You can assign a specific list of input features to each label by editing `configs.json` and copying it along with `XLabel.py`. The sidebar also offers other options that you can experiment with. Here is an example of [`configs.json`](configs.json).
2. Upload a data file (only on the first run), select the options on the sidebar, and then click "**Sample**". The samples with the lowest predictive confidence will be shown first on the main screen.
3. Check the suggested labels; you can keep the correct ones and change the wrong ones.
4. Click the "**Submit Labels**" button at the bottom of the page to save the labels.
    * If you are using `XLabel.py`, the labels will be saved directly to the original data file.
    * If you are using `XLabelDL.py`, you need to click the `Download labeled data` button in the sidebar to download the labeled data as a new file.
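
To illustrate the sampling order in step 2, here is a minimal sketch of showing the least confident samples first. It is not XLabel's actual code: it assumes that "predictive confidence" means the probability of the most likely class, and the function name `least_confident_first` and the probabilities are made up for illustration.

```python
def least_confident_first(probabilities, n_samples):
    """Return the ids of the n_samples rows with the lowest top-class probability.

    `probabilities` maps a row id to its predicted class probabilities.
    Confidence is taken to be the largest of them (an assumption of this sketch).
    """
    confidence = {row_id: max(probs) for row_id, probs in probabilities.items()}
    # Sort ids from least to most confident and keep the first n_samples.
    ranked = sorted(confidence, key=confidence.get)
    return ranked[:n_samples]


# Made-up class probabilities for four unlabeled rows of a two-class label.
predicted = {
    "p3": [0.55, 0.45],
    "p4": [0.98, 0.02],
    "p5": [0.51, 0.49],
    "p6": [0.80, 0.20],
}
print(least_confident_first(predicted, 2))  # ['p5', 'p3']
```

Reviewing the least confident rows first spends labeling effort where the model's suggestions are most likely to be wrong.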