<!DOCTYPE html>
<html lang="en">
<head>
<meta charset="UTF-8">
<meta name="viewport" content="width=device-width, initial-scale=1.0">
<title>Speech-to-Speech Model Comparison</title>
<link href="https://cdn.jsdelivr.net/npm/bootstrap@5.3.0-alpha1/dist/css/bootstrap.min.css" rel="stylesheet">
<link rel="stylesheet" href="https://cdnjs.cloudflare.com/ajax/libs/font-awesome/6.0.0-beta3/css/all.min.css">
<style>
body {
background-color: #f0f8ff;
font-family: 'Arial', sans-serif;
}
.container {
background-color: #fff;
border-radius: 15px;
box-shadow: 0 6px 15px rgba(0, 0, 0, 0.15);
padding: 40px;
max-width: 800px;
margin: 30px auto;
}
h3 {
font-size: 2rem;
font-weight: bold;
color: #333;
text-align: center;
margin-bottom: 20px;
}
p {
color: #555;
font-size: 1rem;
line-height: 1.8;
}
.btn {
border-radius: 25px;
font-size: 1.1rem;
padding: 12px 25px;
font-weight: bold;
transition: background-color 0.3s ease, transform 0.2s ease;
}
.btn-primary {
background-color: #007bff;
border: none;
}
.btn-primary:hover {
background-color: #0056b3;
transform: scale(1.05);
}
.icon {
color: #f39c12;
margin-right: 5px;
}
.section-title {
font-size: 1.2rem;
font-weight: bold;
color: #007bff;
display: flex;
align-items: center;
margin-top: 20px;
}
.section-title .fas {
margin-right: 10px;
}
.audio-container {
text-align: center;
margin-top: 20px;
}
.audio-container .audio-item {
display: flex;
justify-content: center;
align-items: center;
margin-bottom: 15px;
}
.audio-container .audio-item span {
margin-right: 10px;
font-weight: bold;
}
audio {
display: inline-block;
}
</style>
</head>
<body>
<div class="container py-5">
<h3 class="mb-4">⚖️ Speech-to-Speech Model Comparison</h3>
<div id="evaluation-info" class="mb-5">
<p class="text-start">
<span class="section-title"><i class="fas fa-info-circle"></i> Welcome to the Speech-to-Speech (S2S) Model Evaluation! 👏</span>
In this evaluation, you will assess the performance of several S2S models, such as
<strong>ChatGPT-4o</strong>, <strong>FunAudioLLM</strong>, <strong>SpeechGPT</strong>,
<strong>Mini-Omni</strong>, <strong>Cascade</strong>, and <strong>LLaMA-Omni</strong>.
<br>
<span>🎯 <strong>Goal:</strong> Test how well these models handle speech tasks across different domains.</span>
<span class="section-title"><i class="fas fa-tasks"></i> How It Works</span>
Once you select a domain and a task (e.g., <em>Educational Tutoring</em> and <em>Rhythm Control</em>),
you will proceed to the evaluation stage. In each round, you will be presented with an audio input.
<br>
<span><strong>🌰 Example:</strong></span>
</p>
<div class="audio-container">
<div class="audio-item">
<span>Input Audio:</span>
<audio controls>
<source src="/static/audio/sample/input_audio.wav" type="audio/wav">
</audio>
</div>
</div>
Transcript of the input audio:
<em>"Say the following sentence at my speed first, then say it again very slowly:
'Artificial intelligence is changing the world in many ways.'"</em> 🧠
<small>(Note: the input audio is spoken at 1.5x normal speed.)</small>
<span class="section-title"><i class="fas fa-star"></i> Model Performance</span>
<div class="audio-container">
<div class="audio-item">
<span>ChatGPT-4o:</span>
<audio controls>
<source src="/static/audio/sample/4o_audio.wav" type="audio/wav">
</audio>
</div>
<p style="margin: 0; text-align: left;">
🎙️ <strong>Speech:</strong> Partially followed the instruction on speed.
</p>
<p style="margin: 0; text-align: left;">
🧾 <strong>Semantics:</strong> Accurately followed the instruction, with no semantic deviation or missing information.
</p>
<br>
<div class="audio-item">
<span>FunAudioLLM:</span>
<audio controls>
<source src="/static/audio/sample/FunAudio_audio.wav" type="audio/wav">
</audio>
</div>
<p style="margin: 0; text-align: left;">
🎙️ <strong>Speech:</strong> Partially followed the instruction on speed.
</p>
<p style="margin: 0; text-align: left;">
🧾 <strong>Semantics:</strong> Accurately followed the instruction, with no semantic deviation or missing information.
</p>
<br>
<div class="audio-item">
<span>SpeechGPT:</span>
<audio controls>
<source src="/static/audio/sample/SpeechGPT.wav" type="audio/wav">
</audio>
</div>
<p style="margin: 0; text-align: left;">
🎙️ <strong>Speech:</strong> Did not follow the instruction on speed.
</p>
<p style="margin: 0; text-align: left;">
🧾 <strong>Semantics:</strong> Partially followed the instruction, with minor semantic deviation and missing information.
</p>
<br>
<div class="audio-item">
<span>Mini-Omni:</span>
<audio controls>
<source src="/static/audio/sample/mini-omni.wav" type="audio/wav">
</audio>
</div>
<p style="margin: 0; text-align: left;">
🎙️ <strong>Speech:</strong> Did not follow the instruction on speed.
</p>
<p style="margin: 0; text-align: left;">
🧾 <strong>Semantics:</strong> Did not follow the instruction, with significant semantic deviation and missing information.
</p>
</div>
<p class="text-start">
After making your choice, you'll proceed to the next round. 🔄
</p>
<p class="text-start">
<strong>Click the button below to start the evaluation! 🚀</strong>
</p>
</div>
<div class="text-center">
<a href="http://71.132.14.167:6002/" target="_blank" class="btn btn-primary"><i class="fas fa-play"></i> Start Evaluation</a>
</div>
</div>
</body>
</html>