Alexander Seifert committed 61fec8d (1 parent: e18be25)

update README
Files changed:
- README.md +20 -1
- subpages/home.py +3 -0
README.md (changed)

```diff
@@ -10,11 +10,30 @@ app_file: main.py
 pinned: true
 ---
 
-# 🏷️ ExplaiNER
+# 🏷️ ExplaiNER: Error Analysis for NER models & datasets
 
 Error Analysis is an important but often overlooked part of the data science project lifecycle, for which there is still very little tooling available. Practitioners tend to write throwaway code or, worse, skip this crucial step of understanding their models' errors altogether. This project tries to provide an extensive toolkit to probe any NER model/dataset combination, find labeling errors and understand the models' and datasets' limitations, leading the user on her way to further improvements.
 
 
+Some interesting visualizations techniques:
+
+* customizable visualization of neural network activation, based on the embedding and the feed-forward layers of our transformer. (https://aclanthology.org/2021.acl-demo.30/)
+* customizable similarity map of a 2d projection of our model's final layer's hidden states, using different algorithms (a bit like the [Tensorflow Embedding Projector](https://projector.tensorflow.org/))
+* inline HTML representation of samples with token-level prediction + labels (my own; see 'Samples by loss' page for more info)
+* automatic selection of foreground-color (black/white) for a user-selected background-color
+* some fancy pandas styling here and there
+
+
+Libraries important to this project:
+
+* `streamlit` for demoing (custom multi-page feature hacked in, also using session state)
+* `plotly` and `matplotlib` for charting
+* `transformers` for providing the models, and `datasets` for, well, the datasets
+* a forked, slightly modified version of [`ecco`](https://github.com/jalammar/ecco) for visualizing the neural net activations
+* `sentence_transformers` for finding potential duplicates
+* `scikit-learn` for TruncatedSVD & PCA, `umap-learn` for UMAP
+
+
 ## Sections
 
 
```
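The README lists "automatic selection of foreground-color (black/white) for a user-selected background-color" but does not say how the choice is made. One common approach, sketched here under the assumption that colors arrive as `#rrggbb` hex strings, is to compute the WCAG 2.x relative luminance of the background and pick whichever of black or white gives the higher contrast ratio. The function names are hypothetical, and this is not necessarily how ExplaiNER implements it.

```python
def _channel_to_linear(c: int) -> float:
    """Convert one 8-bit sRGB channel to linear light (WCAG 2.x formula)."""
    s = c / 255
    return s / 12.92 if s <= 0.03928 else ((s + 0.055) / 1.055) ** 2.4


def relative_luminance(hex_color: str) -> float:
    """Relative luminance of a '#rrggbb' color, in the range [0, 1]."""
    h = hex_color.lstrip("#")
    if len(h) != 6:
        raise ValueError(f"expected '#rrggbb', got {hex_color!r}")
    r, g, b = (int(h[i:i + 2], 16) for i in (0, 2, 4))
    # Weighted sum of the linearized channels (green dominates perceived brightness).
    return (0.2126 * _channel_to_linear(r)
            + 0.7152 * _channel_to_linear(g)
            + 0.0722 * _channel_to_linear(b))


def contrast_ratio(l1: float, l2: float) -> float:
    """WCAG contrast ratio between two relative luminances (1.0 to 21.0)."""
    lighter, darker = max(l1, l2), min(l1, l2)
    return (lighter + 0.05) / (darker + 0.05)


def pick_foreground(background: str) -> str:
    """Hypothetical helper: return black or white, whichever contrasts more."""
    lum = relative_luminance(background)
    on_black = contrast_ratio(lum, 0.0)  # luminance of #000000 is 0
    on_white = contrast_ratio(lum, 1.0)  # luminance of #ffffff is 1
    return "#000000" if on_black >= on_white else "#ffffff"
```

Comparing the two contrast ratios is equivalent to a fixed luminance cutoff of roughly 0.18, but it keeps the rule tied to the WCAG definition instead of a magic number.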
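For the "inline HTML representation of samples with token-level prediction + labels", the README defers to the 'Samples by loss' page rather than describing the markup. A minimal sketch of the general idea, assuming tokens, predicted tags and gold labels come as parallel lists of strings, wraps each token in a `<span>` whose tooltip shows both tags and whose CSS class marks agreement or disagreement. The helper name and the `ok`/`error` class names are hypothetical; the actual styling would live in a stylesheet.

```python
from html import escape


def render_tokens_html(tokens, preds, labels):
    """Hypothetical helper: render tokens inline, flagging prediction/label mismatches.

    tokens, preds and labels are parallel lists of strings.
    """
    if not (len(tokens) == len(preds) == len(labels)):
        raise ValueError("tokens, preds and labels must have the same length")
    parts = []
    for tok, pred, gold in zip(tokens, preds, labels):
        # CSS class lets a stylesheet color correct and incorrect tokens differently.
        status = "ok" if pred == gold else "error"
        title = f"pred={pred} label={gold}"
        # escape() keeps token text from injecting markup into the page.
        parts.append(
            f'<span class="{status}" title="{escape(title)}">{escape(tok)}</span>'
        )
    return " ".join(parts)


html = render_tokens_html(["Paris", "is"], ["B-LOC", "O"], ["B-LOC", "B-PER"])
```

In a Streamlit app, a string like this would typically be displayed with `st.markdown(html, unsafe_allow_html=True)`, which is why escaping the token text matters.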
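The README uses `sentence_transformers` "for finding potential duplicates". The core of that idea can be sketched without the library: once every sample has an embedding (for instance from a `SentenceTransformer` model's `encode` method), potential duplicates are pairs whose cosine similarity exceeds a threshold. The function names and the 0.95 default threshold are illustrative assumptions, not values taken from the project.

```python
import math
from itertools import combinations


def cosine(u, v):
    """Cosine similarity of two equal-length vectors; 0.0 if either is all zeros."""
    dot = sum(a * b for a, b in zip(u, v))
    nu = math.sqrt(sum(a * a for a in u))
    nv = math.sqrt(sum(b * b for b in v))
    if nu == 0 or nv == 0:
        return 0.0
    return dot / (nu * nv)


def find_potential_duplicates(embeddings, threshold=0.95):
    """Hypothetical helper: return (i, j, score) for every pair with score >= threshold.

    Pairs are sorted from most to least similar.
    """
    pairs = []
    for i, j in combinations(range(len(embeddings)), 2):
        score = cosine(embeddings[i], embeddings[j])
        if score >= threshold:
            pairs.append((i, j, score))
    pairs.sort(key=lambda p: p[2], reverse=True)
    return pairs


dups = find_potential_duplicates([[1.0, 0.0], [0.99, 0.01], [0.0, 1.0]])
```

Checking every pair is O(n²), which is fine for a few thousand samples; larger datasets would call for vectorized or approximate nearest-neighbour search.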
subpages/home.py (changed)

```diff
@@ -54,6 +54,9 @@ class HomePage(Page):
         st.write(
             "**Error Analysis is an important but often overlooked part of the data science project lifecycle**, for which there is still very little tooling available. Practitioners tend to write throwaway code or, worse, skip this crucial step of understanding their models' errors altogether. This project tries to provide an **extensive toolkit to probe any NER model/dataset combination**, find labeling errors and understand the models' and datasets' limitations, leading the user on her way to further **improving both model AND dataset**."
         )
+        st.write(
+            "_Caveat: Even though everything is customizable here, I haven't tested this app much with different models/datasets._"
+        )
 
         col1, _, col2a, col2b = st.columns([1, 0.05, 0.15, 0.15])
 
```