from dataclasses import dataclass
from enum import Enum


@dataclass
class Task:
    # A single benchmark: dataset key, evaluation metric, and leaderboard column name.
    benchmark: str
    metric: str
    col_name: str


class Tasks(Enum):
    task0 = Task("anli_r1", "acc", "ANLI")
    task1 = Task("logiqa", "acc_norm", "LogiQA")


NUM_FEWSHOT = 0


TITLE = """<h1 align="center" id="space-title">Hebrew Speech Recognition Leaderboard</h1>"""


INTRODUCTION_TEXT = """
Welcome to the Hebrew Speech Recognition Leaderboard! This is a community-driven effort to track and compare the performance
of various speech recognition models on Hebrew language tasks.

This leaderboard is maintained by [ivrit.ai](https://ivrit.ai), a project dedicated to advancing Hebrew language AI technologies.
You can find our work on [GitHub](https://github.com/ivrit-ai) and [Hugging Face](https://huggingface.co/ivrit-ai).

## Motivation
Hebrew presents unique challenges for speech recognition due to its rich morphology, absence of written vowels, and diverse
dialectal variations. This leaderboard aims to:
- Provide standardized benchmarks for Hebrew ASR evaluation
- Track progress in Hebrew speech recognition technology
- Foster collaboration in the Hebrew NLP community
- Make Hebrew speech technology more accessible

## Benchmarks
The following datasets are used in our evaluation:

### [ivrit-ai/eval-d1](https://huggingface.co/datasets/ivrit-ai/eval-d1)
- **Size**: 2 hours
- **Domain**: Manual transcription of podcasts. Typical segment length is 5 minutes.
- **Source**: Description of source

### [ivrit-ai/saspeech](https://huggingface.co/datasets/ivrit-ai/saspeech)
- **Size**: X hours
- **Domain**: Description
- **Source**: Description of source

### [google/fleurs/he](https://huggingface.co/datasets/google/fleurs)
- **Size**: X hours
- **Domain**: Description
- **Source**: Description of source

### [mozilla-foundation/common_voice_17_0/he](https://huggingface.co/datasets/mozilla-foundation/common_voice_17_0)
- **Size**: X hours
- **Domain**: Description
- **Source**: Description of source

### [imvladikon/hebrew_speech_kan](https://huggingface.co/datasets/imvladikon/hebrew_speech_kan)
- **Size**: X hours
- **Domain**: Description
- **Source**: Description of source
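
All of these datasets are hosted on the Hugging Face Hub. As a rough sketch (not part of our evaluation code), here is how the Hebrew FLEURS test split could be loaded with the `datasets` library:

```python
from datasets import load_dataset

# "he_il" is the Hebrew config of FLEURS; the "transcription" column holds the reference text.
# Depending on your `datasets` version, trust_remote_code=True may be required for this dataset.
fleurs_he = load_dataset("google/fleurs", "he_il", split="test")
print(fleurs_he[0]["transcription"])
```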
"""


LLM_BENCHMARKS_TEXT = """
## How it works
Models are evaluated using Word Error Rate (WER) on each benchmark dataset. The final score is the average WER across all benchmarks,
with lower scores indicating better performance.

Specifically, evaluation is done using the [jiwer](https://github.com/jitsi/jiwer) library.
Source code for the evaluation can be found [here](https://github.com/ivrit-ai/asr-training/blob/master/evaluate_model.py).
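
As a rough sketch of the scoring (the reference/hypothesis pairs below are made up; the leaderboard itself runs the evaluation script linked above):

```python
import jiwer

# Made-up references and model outputs for two benchmarks (illustrative only).
results = {
    "ivrit-ai/eval-d1": (["שלום עולם"], ["שלום עולם"]),
    "google/fleurs/he": (["זוהי דוגמה"], ["זאת דוגמה"]),
}

# Per-benchmark WER, then the final score as the plain average across benchmarks.
wers = {name: jiwer.wer(refs, hyps) for name, (refs, hyps) in results.items()}
final_score = sum(wers.values()) / len(wers)
```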

## Reproducibility
To evaluate your model on these benchmarks, you can use our evaluation script as follows:

```bash
./evaluate_model.py --engine <engine> --model <model> --dataset <dataset:split:column> [--name <name>] [--workers <num_workers>]
```

For example, here's how to evaluate ivrit-ai/faster-whisper-v2-d4 on the google/fleurs/he dataset:

```bash
./evaluate_model.py --engine faster-whisper --model ivrit-ai/faster-whisper-v2-d4 --name he_il --dataset google/fleurs:test:transcription --workers 1
```
"""


EVALUATION_QUEUE_TEXT = """
## Submitting a model for evaluation

### 1) Provide an inference script
To evaluate your model, we need either:

a) A simple inference script that takes audio input and returns transcribed text:
```python
def transcribe(audio_path: str) -> str:
    # Your model loading and inference code here
    return transcribed_text
```
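
For instance, a minimal sketch of such a script using the Hugging Face `transformers` ASR pipeline (the model id below is purely illustrative) might look like:

```python
from transformers import pipeline

# Illustrative model id; substitute your own Hub model.
asr = pipeline("automatic-speech-recognition", model="openai/whisper-small")

def transcribe(audio_path: str) -> str:
    # The pipeline decodes the audio file (ffmpeg is required) and returns a dict with a "text" field.
    return asr(audio_path)["text"]
```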

b) Or augment our evaluate_model.py script with your model's implementation.

### 2) Make sure your model is publicly accessible
Your model should be available on the Hugging Face Hub with:
- Public visibility
- Clear licensing information
- Basic model card documentation

### 3) Fill out your model card
Please include in your model card:
- Model architecture
- Training data description
- Licensing information
- Any special preprocessing requirements
- Expected input format (sampling rate, audio format, etc.)

## In case of evaluation failure
If your model evaluation fails, please:
1. Check that your model can be loaded and run locally
2. Verify your inference script works with our benchmark format
3. Ensure all dependencies are clearly specified
4. Contact us through GitHub issues if problems persist
"""


CITATION_BUTTON_LABEL = "Copy the following snippet to cite these results"
CITATION_BUTTON_TEXT = r"""
@misc{marmor2023ivritai,
      title={ivrit.ai: A Comprehensive Dataset of Hebrew Speech for AI Research and Development},
      author={Yanir Marmor and Kinneret Misgav and Yair Lifshitz},
      year={2023},
      eprint={2307.08720},
      archivePrefix={arXiv},
      primaryClass={eess.AS}
}
"""