Commit · ca430b9
1 Parent(s): 377e4e1
init

Files changed:
- Dockerfile +47 -0
- Example.jpg +0 -0
- README.md +111 -6
- app.py +333 -0
- requirements.txt +4 -0
Dockerfile
ADDED
@@ -0,0 +1,47 @@
# Use Ubuntu as base image
FROM ubuntu:22.04

# Prevent interactive prompts during package installation
ENV DEBIAN_FRONTEND=noninteractive

# Install system dependencies
RUN apt-get update && apt-get install -y \
    python3 \
    python3-pip \
    curl \
    wget \
    git \
    net-tools \
    && rm -rf /var/lib/apt/lists/*

# Install Ollama
RUN curl -fsSL https://ollama.com/install.sh | sh

# Set working directory
WORKDIR /app

# Copy requirements and install Python dependencies
COPY requirements.txt .
RUN pip3 install --no-cache-dir -r requirements.txt

# Copy application code
COPY . .

# Create startup script
RUN echo '#!/bin/bash\n\
# Start Ollama server\n\
ollama serve & \n\
sleep 5\n\
\n\
# Pull the model if not exists\n\
ollama pull deepseek-r1:1.5b\n\
\n\
# Start the Gradio app\n\
exec python3 -u app.py\n\
' > start.sh && chmod +x start.sh

# Expose port for Gradio web interface
EXPOSE 7860

# Run the application
ENTRYPOINT ["./start.sh"]
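The startup script above waits a fixed five seconds for `ollama serve` before pulling the model. Purely as an illustrative alternative (not part of this commit), a readiness poll using the Ollama Python client might look like the sketch below; the host, retry count, and delay are assumptions.

```python
# Illustrative sketch: poll the local Ollama server until it responds,
# instead of relying on a fixed sleep. Retry count and delay are assumptions.
import time
from ollama import Client


def wait_for_ollama(host: str = "http://localhost:11434",
                    retries: int = 30, delay: float = 1.0) -> Client:
    client = Client(host=host)
    for _ in range(retries):
        try:
            client.list()  # succeeds once the server is accepting requests
            return client
        except Exception:
            time.sleep(delay)
    raise RuntimeError("Ollama server did not become ready in time")
```

A poll like this avoids failing when the server needs longer than the fixed delay to start.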
Example.jpg
ADDED
README.md
CHANGED
@@ -1,11 +1,116 @@
 ---
-title:
+title: ASR Evaluation Tool
-emoji:
+emoji: 🎯
 colorFrom: blue
-colorTo:
+colorTo: red
-sdk:
+sdk: gradio
+sdk_version: 5.16.0
+app_file: app.py
 pinned: false
-short_description: version 1.1
 ---
 
-
+# ASR Evaluation Tool (Ver 1.1)
+
+This Gradio app provides a user-friendly interface for calculating Word Error Rate (WER) and related metrics between reference and hypothesis texts. It's particularly useful for evaluating speech recognition or machine translation outputs.
+
+## Features
+
+- Calculate WER, MER, WIL, and WIP metrics
+- Text normalization options
+- Custom word filtering
+- Detailed error analysis
+- Example inputs for testing
+
+## How to Use
+
+1. Enter or paste your reference text
+2. Enter or paste your hypothesis text
+3. Configure options (normalization, word filtering)
+4. Click "Calculate WER" to see results
+
+NOTE: There might be a 30-second delay due to the r1:1.5B model being called for medical term recall calculations.
+
+
+
+
+## Local Development
+
+1. Clone the repository:
+```bash
+git clone https://github.com/yourusername/wer-evaluation-tool.git
+cd wer-evaluation-tool
+```
+
+2. Create and activate a virtual environment using `uv`:
+```bash
+uv venv
+source .venv/bin/activate  # On Unix/macOS
+# or
+.venv\Scripts\activate  # On Windows
+```
+
+3. Install dependencies:
+```bash
+uv pip install -r requirements.txt
+```
+
+4. Run the app locally:
+```bash
+uv run python app_gradio.py
+```
+
+## Installation
+
+You can install the package directly from PyPI:
+
+```bash
+uv pip install wer-evaluation-tool
+```
+
+## Testing
+
+Run the test suite using pytest:
+
+```bash
+uv run pytest tests/
+```
+
+## Contributing
+
+1. Fork the repository
+2. Create a new branch (`git checkout -b feature/improvement`)
+3. Make your changes
+4. Run tests to ensure everything works
+5. Commit your changes (`git commit -am 'Add new feature'`)
+6. Push to the branch (`git push origin feature/improvement`)
+7. Create a Pull Request
+
+## License
+
+This project is licensed under the MIT License - see the [LICENSE](LICENSE) file for details.
+
+## Acknowledgments
+
+- Thanks to all contributors who have helped with the development
+- Inspired by the need for better speech recognition evaluation tools
+- Built with [Gradio](https://gradio.app/)
+
+## Contact
+
+For questions or feedback, please:
+- Open an issue in the GitHub repository
+- Contact the maintainers at [email/contact information]
+
+## Citation
+
+If you use this tool in your research, please cite:
+
+```bibtex
+@software{wer_evaluation_tool,
+  title = {WER Evaluation Tool},
+  author = {Your Name},
+  year = {2024},
+  url = {https://github.com/yourusername/wer-evaluation-tool}
+}
+```
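To make the WER, MER, WIL, and WIP figures in the README concrete, here is a minimal standalone sketch (not one of the committed files) that computes them with jiwer's top-level helpers on the app's built-in example sentences:

```python
# Minimal sketch of the metrics the README lists, using jiwer's
# top-level helpers on the app's built-in example texts.
import jiwer

reference = "The patient shows signs of heart attack and hypertension."
hypothesis = "The patient shows signs of heart attack and high blood pressure."

print(f"WER: {jiwer.wer(reference, hypothesis):.3f}")
print(f"MER: {jiwer.mer(reference, hypothesis):.3f}")
print(f"WIL: {jiwer.wil(reference, hypothesis):.3f}")
print(f"WIP: {jiwer.wip(reference, hypothesis):.3f}")
```

With this pair the hypothesis differs by one substitution and two insertions against a nine-word reference, so WER should come out around 0.333.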
app.py
ADDED
@@ -0,0 +1,333 @@
import gradio as gr
import jiwer
import pandas as pd
import logging
from typing import List, Optional, Tuple, Dict
from ollama import Client
import re
import os

# Set up logging configuration
logging.basicConfig(
    level=logging.INFO,
    format='%(asctime)s - %(levelname)s - %(message)s',
    force=True,
    handlers=[
        logging.StreamHandler(),
    ]
)
logger = logging.getLogger(__name__)

def calculate_wer_metrics(
    hypothesis: str,
    reference: str,
    normalize: bool = True,
    words_to_filter: Optional[List[str]] = None
) -> Dict:
    """
    Calculate WER metrics between hypothesis and reference texts.

    Args:
        hypothesis (str): The hypothesis text
        reference (str): The reference text
        normalize (bool): Whether to normalize texts before comparison
        words_to_filter (List[str], optional): Words to filter out before comparison

    Returns:
        dict: Dictionary containing WER metrics

    Raises:
        ValueError: If inputs are invalid or result in empty text after processing
    """
    logger.info(f"Calculating WER metrics with inputs - Hypothesis: {hypothesis}, Reference: {reference}")

    # Validate inputs
    if not hypothesis.strip() or not reference.strip():
        raise ValueError("Both hypothesis and reference texts must contain non-empty strings")

    if normalize:
        # Define basic transformations
        basic_transform = jiwer.Compose([
            jiwer.ExpandCommonEnglishContractions(),
            jiwer.ToLowerCase(),
            jiwer.RemoveMultipleSpaces(),
            jiwer.RemovePunctuation(),
            jiwer.Strip(),
            jiwer.ReduceToListOfListOfWords()
        ])

        if words_to_filter and any(words_to_filter):
            def filter_words_transform(sentences: List[List[str]]) -> List[List[str]]:
                # basic_transform ends with ReduceToListOfListOfWords, so this
                # receives a list of word lists; filter each sentence's words.
                filter_set = {w.lower() for w in words_to_filter}
                filtered = [
                    [word for word in sentence if word.lower() not in filter_set]
                    for sentence in sentences
                ]
                if not any(filtered):
                    raise ValueError("Text is empty after filtering words")
                return filtered

            transformation = jiwer.Compose([
                basic_transform,
                filter_words_transform
            ])
        else:
            transformation = basic_transform

        # Pre-check the transformed text
        try:
            transformed_ref = transformation(reference)
            transformed_hyp = transformation(hypothesis)
            if not transformed_ref or not transformed_hyp:
                raise ValueError("Text is empty after normalization")
            logger.debug(f"Transformed reference: {transformed_ref}")
            logger.debug(f"Transformed hypothesis: {transformed_hyp}")
        except Exception as e:
            logger.error(f"Transformation error: {str(e)}")
            raise ValueError(f"Error during text transformation: {str(e)}")

        measures = jiwer.compute_measures(
            truth=reference,
            hypothesis=hypothesis,
            truth_transform=transformation,
            hypothesis_transform=transformation
        )
    else:
        measures = jiwer.compute_measures(
            truth=reference,
            hypothesis=hypothesis
        )

    return measures

# Initialize Ollama client
client = Client(host='http://localhost:11434')

def extract_medical_terms(text: str) -> List[str]:
    """
    Extract medical terms from text using the deepseek-r1 model via Ollama.

    Args:
        text (str): Input text

    Returns:
        List[str]: List of extracted medical terms
    """
    prompt = f"""Extract all medical terms from the following text.
    Return only the medical terms as a comma-separated list.
    Text: {text}"""

    try:
        response = client.generate(
            model='deepseek-r1:1.5b',
            prompt=prompt,
            stream=False
        )

        # Get the response text
        response_text = response['response']

        # Remove the thinking process
        if '<think>' in response_text and '</think>' in response_text:
            # Extract everything after </think>
            medical_terms_text = response_text.split('</think>')[-1].strip()
        else:
            medical_terms_text = response_text

        # Parse the comma-separated response
        medical_terms = [term.strip() for term in medical_terms_text.split(',')]
        # Remove empty terms and clean up
        return [term for term in medical_terms if term and not term.startswith('<') and not term.endswith('>')]

    except Exception as e:
        logger.error(f"Error in medical term extraction: {str(e)}")
        return []

def calculate_medical_recall(
    hypothesis_terms: List[str],
    reference_terms: List[str]
) -> float:
    """
    Calculate medical term recall rate.

    Args:
        hypothesis_terms (List[str]): Medical terms from hypothesis
        reference_terms (List[str]): Medical terms from reference

    Returns:
        float: Recall rate
    """
    if not reference_terms:
        return 1.0 if not hypothesis_terms else 0.0

    correct_terms = set(hypothesis_terms) & set(reference_terms)
    return len(correct_terms) / len(set(reference_terms))

def process_inputs(
    reference: str,
    hypothesis: str,
    normalize: bool,
    words_to_filter: str
) -> Tuple[str, str, str, str]:
    """
    Process inputs and calculate both WER and medical term recall metrics.

    Args:
        reference (str): Reference text
        hypothesis (str): Hypothesis text
        normalize (bool): Whether to normalize text
        words_to_filter (str): Comma-separated words to filter

    Returns:
        Tuple[str, str, str, str]: HTML formatted main metrics, error analysis,
            explanations, and an error message (empty on success)
    """
    if not reference or not hypothesis:
        return "Please provide both reference and hypothesis texts.", "", "", ""

    try:
        # Extract medical terms
        reference_terms = extract_medical_terms(reference)
        hypothesis_terms = extract_medical_terms(hypothesis)

        # Calculate medical recall
        med_recall = calculate_medical_recall(hypothesis_terms, reference_terms)

        # Calculate WER metrics
        filter_words = [word.strip() for word in words_to_filter.split(",")] if words_to_filter else None
        measures = calculate_wer_metrics(
            hypothesis=hypothesis,
            reference=reference,
            normalize=normalize,
            words_to_filter=filter_words
        )

        # Format metrics
        metrics_df = pd.DataFrame({
            'Metric': ['WER', 'MER', 'WIL', 'WIP', 'Medical Term Recall'],
            'Value': [
                f"{measures['wer']:.3f}",
                f"{measures['mer']:.3f}",
                f"{measures['wil']:.3f}",
                f"{measures['wip']:.3f}",
                f"{med_recall:.3f}"
            ]
        })

        # Format error analysis
        error_df = pd.DataFrame({
            'Metric': ['Substitutions', 'Deletions', 'Insertions', 'Hits'],
            'Count': [
                measures['substitutions'],
                measures['deletions'],
                measures['insertions'],
                measures['hits']
            ]
        })

        # Format medical terms comparison
        med_terms_df = pd.DataFrame({
            'Source': ['Reference', 'Hypothesis'],
            'Medical Terms': [
                ', '.join(reference_terms),
                ', '.join(hypothesis_terms)
            ]
        })

        metrics_html = metrics_df.to_html(index=False)
        error_html = error_df.to_html(index=False)
        med_terms_html = med_terms_df.to_html(index=False)

        explanation = f"""
        <h3>Metrics Explanation:</h3>
        <ul>
            <li><b>WER (Word Error Rate)</b>: The percentage of words that were incorrectly predicted</li>
            <li><b>MER (Match Error Rate)</b>: The percentage of words that were incorrectly matched</li>
            <li><b>WIL (Word Information Lost)</b>: The percentage of word information that was lost</li>
            <li><b>WIP (Word Information Preserved)</b>: The percentage of word information that was preserved</li>
            <li><b>Medical Term Recall</b>: The proportion of reference medical terms that were correctly identified in the hypothesis</li>
        </ul>
        <h3>Extracted Medical Terms:</h3>
        {med_terms_html}
        """

        return metrics_html, error_html, explanation, ""

    except Exception as e:
        error_msg = f"Error in processing: {str(e)}"
        logger.error(error_msg)
        return "", "", "", error_msg

def load_example() -> Tuple[str, str]:
    """Load example texts for demonstration."""
    return (
        "The patient shows signs of heart attack and hypertension.",
        "The patient shows signs of heart attack and high blood pressure."
    )

def create_interface() -> gr.Blocks:
    """Create the Gradio interface."""
    with gr.Blocks(title="WER Evaluation Tool") as interface:
        gr.Markdown("# Word Error Rate (WER) Evaluation Tool")
        gr.Markdown(
            "This tool helps you evaluate the Word Error Rate (WER) between a reference "
            "text and a hypothesis text. WER is commonly used in speech recognition and "
            "machine translation evaluation."
        )

        with gr.Row():
            with gr.Column():
                reference = gr.Textbox(
                    label="Reference Text",
                    placeholder="Enter the reference text here...",
                    lines=5
                )
            with gr.Column():
                hypothesis = gr.Textbox(
                    label="Hypothesis Text",
                    placeholder="Enter the hypothesis text here...",
                    lines=5
                )

        with gr.Row():
            normalize = gr.Checkbox(
                label="Normalize text (lowercase, remove punctuation)",
                value=True
            )
            words_to_filter = gr.Textbox(
                label="Words to filter (comma-separated)",
                placeholder="e.g., um, uh, ah"
            )

        with gr.Row():
            example_btn = gr.Button("Load Example")
            calculate_btn = gr.Button("Calculate WER", variant="primary")

        with gr.Row():
            metrics_output = gr.HTML(label="Main Metrics")
            error_output = gr.HTML(label="Error Analysis")

        explanation_output = gr.HTML()
        error_msg_output = gr.HTML()

        # Event handlers
        example_btn.click(
            load_example,
            outputs=[reference, hypothesis]
        )

        calculate_btn.click(
            process_inputs,
            inputs=[reference, hypothesis, normalize, words_to_filter],
            outputs=[metrics_output, error_output, explanation_output, error_msg_output]
        )

    return interface

if __name__ == "__main__":
    logger.info("Application started")
    app = create_interface()
    # Explicitly configure Gradio to be accessible from outside the container
    app.launch(
        server_name="0.0.0.0",  # Bind to all interfaces
        server_port=7860,
        share=False,  # Don't create a public URL
        debug=True  # Enable debug mode for more information
    )
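For a quick check of the WER helper without launching the Gradio UI or the Ollama server, a smoke test along these lines could work (a sketch, assuming it runs from the repository root with the dependencies installed):

```python
# Hypothetical smoke test for calculate_wer_metrics. Importing app does not
# launch the UI, since launch() sits behind the __main__ guard; only
# process_inputs needs the Ollama server, and it is not called here.
from app import calculate_wer_metrics

measures = calculate_wer_metrics(
    hypothesis="the patient has high blood pressure",
    reference="the patient has hypertension",
    normalize=True,
    words_to_filter=["um", "uh"],
)
print(f"WER: {measures['wer']:.3f}, hits: {measures['hits']}")
```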
requirements.txt
ADDED
@@ -0,0 +1,4 @@
gradio==5.16.0
jiwer==3.1.0
pandas==2.2.0
ollama==0.4.5
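If it is useful to confirm the pins above after installation, a small standard-library check could look like this (a sketch; the package names and versions are taken from the file above):

```python
# Sketch: verify that the installed distributions match the pinned versions.
from importlib.metadata import version

for package, pinned in [("gradio", "5.16.0"), ("jiwer", "3.1.0"),
                        ("pandas", "2.2.0"), ("ollama", "0.4.5")]:
    installed = version(package)
    status = "OK" if installed == pinned else f"expected {pinned}"
    print(f"{package}: {installed} ({status})")
```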