Update README.md
README.md
CHANGED
@@ -1,5 +1,4 @@
 ---
-license: mit
 datasets:
 - rsalshalan/QASR
 - DynamicSuperb/DialectIdentification_ADI17
@@ -20,8 +19,8 @@ pipeline_tag: audio-text-to-text
 **TinyOctopus** is a **Bilingual Audio Language Model (Audio-LLM)** designed to process and generate text from audio inputs. The model leverages **Distil-Whisper (distil-large-v3)** for audio encoding, a **cross-attention projection layer** for alignment, and **DeepSeek 1.5B** for text generation. TinyOctopus is optimized for tasks such as:
 
 - **Bilingual Automatic Speech Recognition (ASR)** 🗣️
-- **Speech Translation** 🌍
-- **Dialect Identification**
+- **Arabic to English Speech Translation** 🌍
+- **Spoken Arabic Dialect Identification**
 
 TinyOctopus maintaining the architectural principles of the following structure:
 
@@ -56,16 +55,25 @@ print("Generated Text:", output)
 
 ## Evaluation Results
 
-
-
-
-| **
-| **
-| **Dialect Identification (Accuracy)** | 70.59 |
+## ASR Performance (WER & Error Breakdown)
+| **Tasks** | **WER (%)** | **Substitution (%)** | **Deletion (%)** | **Insertion (%)** |
+|--------------------------------------|------------|----------------------|------------------|------------------|
+| **ASR_QASR (Arabic)** | 16.00 | 9.5 | 2.7 | 3.8 |
+| **ASR_ibrispeech&tedlium (English)** | 4.50 | 3.0 | 0.8 | 0.7 |
 
+---
+
+## Translation Performance (BLEU Scores)
+| **Tasks** | **BLEU (GPT-4o)** | **BLEU (Google)** |
+|----------------|------------------|------------------|
+| **Translation** | 55.05 | 43.23 |
+
+---
 
-
+## Dialect Identification Accuracy
+| **Tasks** | **Accuracy (%)** |
+|----------------------------|------------------|
+| **Dialect Identification** | 70.59 |
 
-## License
 
-
+
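
Review note on the added ASR table: if the three error columns are percentages of reference words (an assumption, since the README does not define them), the reported WER should equal substitution + deletion + insertion. The sketch below checks that for the two rows in this commit. The function and variable names are illustrative and do not come from the repository.

```python
# Consistency check for the ASR table added in this commit.
# Assumption: each error column is a percentage of reference words,
# so WER (%) = substitution (%) + deletion (%) + insertion (%).

def wer_from_breakdown(substitution, deletion, insertion):
    """Return WER (%) as the sum of the three error rates (%)."""
    return substitution + deletion + insertion

# Values copied from the added table:
# (substitution, deletion, insertion, reported WER)
ROWS = {
    "ASR_QASR (Arabic)": (9.5, 2.7, 3.8, 16.00),
    "ASR_ibrispeech&tedlium (English)": (3.0, 0.8, 0.7, 4.50),
}

def check_rows(rows, tol=0.05):
    """Map each task to True if the breakdown sums to the reported WER within tol."""
    results = {}
    for task, (sub, dele, ins, reported) in rows.items():
        results[task] = abs(wer_from_breakdown(sub, dele, ins) - reported) <= tol
    return results

if __name__ == "__main__":
    for task, ok in check_rows(ROWS).items():
        print(task, "consistent" if ok else "mismatch")
```

Both rows pass under this assumption, so the breakdown columns and the WER column agree with each other to the precision shown.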