SaraAlthubaiti/TinyOctopus

Audio-Text-to-Text


SaraAlthubaiti committed 8 days ago

Commit a996fdd · verified · 1 Parent(s): f35ac83

Update README.md

Files changed (1)

README.md +20 -12


@@ -1,5 +1,4 @@
 ---
-license: mit
 datasets:
 - rsalshalan/QASR
 - DynamicSuperb/DialectIdentification_ADI17
@@ -20,8 +19,8 @@ pipeline_tag: audio-text-to-text
 **TinyOctopus** is a **Bilingual Audio Language Model (Audio-LLM)** designed to process and generate text from audio inputs. The model leverages **Distil-Whisper (distil-large-v3)** for audio encoding, a **cross-attention projection layer** for alignment, and **DeepSeek 1.5B** for text generation. TinyOctopus is optimized for tasks such as:
 - **Bilingual Automatic Speech Recognition (ASR)** 🗣️
-- **Speech Translation** 🌍
-- **Dialect Identification**
 TinyOctopus maintaining the architectural principles of the following structure:
@@ -56,16 +55,25 @@ print("Generated Text:", output)
 ## Evaluation Results
-| Metric                                      | Score  |
-|---------------------------------------------|--------|
-| **ASR_QASR (Arabic - WER)**                 | 16.00  |
-| **ASR_ibrispeech&tedlium (English - WER)**  | 4.50   |
-| **Translation (BLEU)**                      | ?  |
-| **Dialect Identification (Accuracy)**       | 70.59  |
-![Confusion matrix of adi17 test set](https://huggingface.co/SaraAlthubaiti/TinyOctopus/resolve/main/images/CM_for_DI.png)
-## License
-TinyOctopus is under [MIT license](https://huggingface.co/datasets/choosealicense/licenses/blob/main/markdown/mit.md).

 ---
 datasets:
 - rsalshalan/QASR
 - DynamicSuperb/DialectIdentification_ADI17
 **TinyOctopus** is a **Bilingual Audio Language Model (Audio-LLM)** designed to process and generate text from audio inputs. The model leverages **Distil-Whisper (distil-large-v3)** for audio encoding, a **cross-attention projection layer** for alignment, and **DeepSeek 1.5B** for text generation. TinyOctopus is optimized for tasks such as:
 - **Bilingual Automatic Speech Recognition (ASR)** 🗣️
+- **Arabic to English Speech Translation** 🌍
+- **Spoken Arabic Dialect Identification**
 TinyOctopus maintaining the architectural principles of the following structure:
 ## Evaluation Results
+## ASR Performance (WER & Error Breakdown)
+| **Tasks**                           | **WER (%)** | **Substitution (%)** | **Deletion (%)** | **Insertion (%)** |
+|--------------------------------------|------------|----------------------|------------------|------------------|
+| **ASR_QASR (Arabic)**                | 16.00      | 9.5           | 2.7      | 3.8         |
+| **ASR_ibrispeech&tedlium (English)** | 4.50       | 3.0           | 0.8      | 0.7         |
+---
+## Translation Performance (BLEU Scores)
+| **Tasks**       | **BLEU (GPT-4o)** | **BLEU (Google)** |
+|----------------|------------------|------------------|
+| **Translation** | 55.05           | 43.23           |
+---
+## Dialect Identification Accuracy
+| **Tasks**                  | **Accuracy (%)** |
+|----------------------------|------------------|
+| **Dialect Identification** | 70.59            |
+![Confusion matrix of adi17 test set](https://huggingface.co/SaraAlthubaiti/TinyOctopus/resolve/main/images/CM_for_DI.png)
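
The ASR table added in this commit splits WER into substitution, deletion, and insertion rates, and the three columns sum to the reported WER in both rows (9.5 + 2.7 + 3.8 = 16.00 and 3.0 + 0.8 + 0.7 = 4.50). The following is a minimal sketch of how such a breakdown can be computed from a word-level edit-distance alignment. The function name `wer_breakdown` and the example strings are hypothetical, and this is not necessarily the scoring script used to produce the numbers above.

```python
def wer_breakdown(reference: str, hypothesis: str) -> dict:
    """Word error rate plus substitution/deletion/insertion counts.

    Illustrative only: whitespace tokenization, no text normalization.
    """
    ref = reference.split()
    hyp = hypothesis.split()
    if not ref:
        raise ValueError("reference must contain at least one word")

    # dp[i][j] = (total_errors, substitutions, deletions, insertions)
    # for aligning ref[:i] with hyp[:j].
    dp = [[None] * (len(hyp) + 1) for _ in range(len(ref) + 1)]
    dp[0][0] = (0, 0, 0, 0)
    for i in range(1, len(ref) + 1):
        dp[i][0] = (i, 0, i, 0)  # every reference word deleted
    for j in range(1, len(hyp) + 1):
        dp[0][j] = (j, 0, 0, j)  # every hypothesis word inserted

    for i in range(1, len(ref) + 1):
        for j in range(1, len(hyp) + 1):
            if ref[i - 1] == hyp[j - 1]:
                dp[i][j] = dp[i - 1][j - 1]  # match, no new error
                continue
            c, s, d, n = dp[i - 1][j - 1]
            substitution = (c + 1, s + 1, d, n)
            c, s, d, n = dp[i - 1][j]
            deletion = (c + 1, s, d + 1, n)
            c, s, d, n = dp[i][j - 1]
            insertion = (c + 1, s, d, n + 1)
            # Tuples compare on total_errors first, so this picks a
            # minimum-cost alignment.
            dp[i][j] = min(substitution, deletion, insertion)

    errors, subs, dels, ins = dp[len(ref)][len(hyp)]
    return {
        "wer": errors / len(ref),
        "substitutions": subs,
        "deletions": dels,
        "insertions": ins,
    }


# Hypothetical example: "b" is replaced by "x" and "d" is dropped.
print(wer_breakdown("a b c d", "a x c"))
```

In the example call, the alignment has one substitution and one deletion over four reference words, so the WER is 0.5. Dividing each count by the number of reference words gives per-type rates like those in the table.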