SaraAlthubaiti committed · Commit a996fdd · verified · 1 Parent(s): f35ac83

Update README.md

  1. README.md +20 -12
README.md CHANGED
@@ -1,5 +1,4 @@
  ---
- license: mit
  datasets:
  - rsalshalan/QASR
  - DynamicSuperb/DialectIdentification_ADI17
@@ -20,8 +19,8 @@ pipeline_tag: audio-text-to-text
  **TinyOctopus** is a **Bilingual Audio Language Model (Audio-LLM)** designed to process and generate text from audio inputs. The model leverages **Distil-Whisper (distil-large-v3)** for audio encoding, a **cross-attention projection layer** for alignment, and **DeepSeek 1.5B** for text generation. TinyOctopus is optimized for tasks such as:
 
  - **Bilingual Automatic Speech Recognition (ASR)** 🗣️
- - **Speech Translation** 🌍
- - **Dialect Identification**
+ - **Arabic to English Speech Translation** 🌍
+ - **Spoken Arabic Dialect Identification**
 
  TinyOctopus maintaining the architectural principles of the following structure:
 
@@ -56,16 +55,25 @@ print("Generated Text:", output)
 
  ## Evaluation Results
 
- | Metric | Score |
- |---------------------------------------------|--------|
- | **ASR_QASR (Arabic - WER)** | 16.00 |
- | **ASR_ibrispeech&tedlium (English - WER)** | 4.50 |
- | **Translation (BLEU)** | ? |
- | **Dialect Identification (Accuracy)** | 70.59 |
+ ## ASR Performance (WER & Error Breakdown)
+ | **Tasks** | **WER (%)** | **Substitution (%)** | **Deletion (%)** | **Insertion (%)** |
+ |--------------------------------------|------------|----------------------|------------------|------------------|
+ | **ASR_QASR (Arabic)** | 16.00 | 9.5 | 2.7 | 3.8 |
+ | **ASR_ibrispeech&tedlium (English)** | 4.50 | 3.0 | 0.8 | 0.7 |
 
+ ---
+
+ ## Translation Performance (BLEU Scores)
+ | **Tasks** | **BLEU (GPT-4o)** | **BLEU (Google)** |
+ |----------------|------------------|------------------|
+ | **Translation** | 55.05 | 43.23 |
+
+ ---
 
- ![Confusion matrix of adi17 test set](https://huggingface.co/SaraAlthubaiti/TinyOctopus/resolve/main/images/CM_for_DI.png)
+ ## Dialect Identification Accuracy
+ | **Tasks** | **Accuracy (%)** |
+ |----------------------------|------------------|
+ | **Dialect Identification** | 70.59 |
 
- ## License
 
- TinyOctopus is under [MIT license](https://huggingface.co/datasets/choosealicense/licenses/blob/main/markdown/mit.md).
+ ![Confusion matrix of adi17 test set](https://huggingface.co/SaraAlthubaiti/TinyOctopus/resolve/main/images/CM_for_DI.png)
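
The ASR table added in this commit splits WER into substitution, deletion, and insertion rates, and the three columns sum to the reported WER in both rows (9.5 + 2.7 + 3.8 = 16.0 and 3.0 + 0.8 + 0.7 = 4.5). The README does not say how these numbers were computed. The sketch below shows one conventional way to get such a breakdown, a word-level edit-distance alignment, and is only an illustration: the function name `wer_breakdown` and the whitespace tokenization are assumptions, not the authors' evaluation code.

```python
def wer_breakdown(reference: str, hypothesis: str) -> dict:
    """Word error rate and its S/D/I components, in percent of reference words.

    Hypothetical helper for illustration; tokenization is plain whitespace splitting.
    """
    ref = reference.split()
    hyp = hypothesis.split()
    if not ref:
        raise ValueError("reference must contain at least one word")

    # dp[i][j] = (total_edits, substitutions, deletions, insertions)
    # for aligning ref[:i] with hyp[:j].
    dp = [[(0, 0, 0, 0)] * (len(hyp) + 1) for _ in range(len(ref) + 1)]
    for i in range(1, len(ref) + 1):
        dp[i][0] = (i, 0, i, 0)  # all reference words deleted
    for j in range(1, len(hyp) + 1):
        dp[0][j] = (j, 0, 0, j)  # all hypothesis words inserted

    for i in range(1, len(ref) + 1):
        for j in range(1, len(hyp) + 1):
            t, s, d, n = dp[i - 1][j - 1]
            if ref[i - 1] == hyp[j - 1]:
                diagonal = (t, s, d, n)          # words match, no edit
            else:
                diagonal = (t + 1, s + 1, d, n)  # substitution
            t, s, d, n = dp[i - 1][j]
            deletion = (t + 1, s, d + 1, n)
            t, s, d, n = dp[i][j - 1]
            insertion = (t + 1, s, d, n + 1)
            # Minimize total edits; on ties the earlier candidate wins.
            dp[i][j] = min(diagonal, deletion, insertion, key=lambda c: c[0])

    total, subs, dels, ins = dp[len(ref)][len(hyp)]
    scale = 100.0 / len(ref)
    return {
        "wer": total * scale,
        "substitution": subs * scale,
        "deletion": dels * scale,
        "insertion": ins * scale,
    }


if __name__ == "__main__":
    print(wer_breakdown("a b c d", "a x c d"))
```

When several alignments share the same minimum edit count, the split between substitutions, deletions, and insertions depends on the tie-breaking rule, so component rates from different tools can differ slightly even when the total WER agrees.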