---
title: Amanu
emoji: πŸ‘
colorFrom: yellow
colorTo: purple
sdk: gradio
sdk_version: 3.44.4
app_file: app.py
pinned: false
---
# This repo's goal is to support the transcription and annotation of audio.
## Parts
- `audio.py`: Everything related to audio preprocessing and analysis.
- `transcription.py`: All code for transcribing audio using fast-whisper.
- `diarization.py`: Everything related to speaker diarization with pyannote.
- `textformatting.py`: Everything related to formatting the text into specific outputs.
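
To illustrate how the outputs of `transcription.py` and `diarization.py` might be combined, here is a minimal sketch that labels each transcript segment with the speaker whose turn overlaps it the most. The `Segment` and `Turn` types and the `assign_speakers` function are hypothetical names for illustration, not the actual code in this repo.

```python
from dataclasses import dataclass
from typing import List, Optional, Tuple


@dataclass
class Segment:
    """A transcribed chunk of audio (hypothetical structure)."""
    start: float  # seconds
    end: float    # seconds
    text: str


@dataclass
class Turn:
    """A diarization turn: one speaker talking (hypothetical structure)."""
    start: float  # seconds
    end: float    # seconds
    speaker: str


def overlap(a_start: float, a_end: float, b_start: float, b_end: float) -> float:
    """Length in seconds of the intersection of two time intervals."""
    return max(0.0, min(a_end, b_end) - max(a_start, b_start))


def assign_speakers(
    segments: List[Segment], turns: List[Turn]
) -> List[Tuple[Optional[str], Segment]]:
    """Pair each segment with the speaker that overlaps it most (None if no overlap)."""
    labeled = []
    for seg in segments:
        best_speaker: Optional[str] = None
        best_overlap = 0.0
        for turn in turns:
            ov = overlap(seg.start, seg.end, turn.start, turn.end)
            if ov > best_overlap:
                best_speaker, best_overlap = turn.speaker, ov
        labeled.append((best_speaker, seg))
    return labeled
```

Choosing the speaker by largest overlap is only one possible rule; it keeps the sketch short but ignores segments that are split between two speakers.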
## UI parts
1. Transcription.
2. Diarization.
3. Revision.
4. Output formatting.
## How to access the service?
Users log in with a username and password that I assign. I manage these credentials manually.
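
One way to implement manually managed logins is a small credential check. The sketch below uses only the standard library; the `USERS` table, the `SALT` value, and the `check_credentials` name are hypothetical. Gradio's `launch()` accepts an `auth` argument that can be a function taking a username and password and returning a boolean, so a function like this could be passed as `demo.launch(auth=check_credentials)`, assuming `demo` is the app object built in `app.py`.

```python
import hashlib
import hmac

# Hypothetical salt; a real deployment should use its own random value.
SALT = b"change-me"


def hash_password(password: str) -> bytes:
    """Derive a hash so plain-text passwords are not kept in the table."""
    return hashlib.pbkdf2_hmac("sha256", password.encode("utf-8"), SALT, 100_000)


# Hypothetical, manually maintained user table: username -> password hash.
USERS = {
    "alice": hash_password("example-password"),
}


def check_credentials(username: str, password: str) -> bool:
    """Return True only if the user exists and the password matches."""
    expected = USERS.get(username)
    if expected is None:
        return False
    # compare_digest avoids leaking information through timing differences.
    return hmac.compare_digest(expected, hash_password(password))
```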
## Pricing
1. Calculate the fixed cost of a server running for a long period of time.
2. Check if I can use the hibernation period to save some money.
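
As a rough aid for the two points above, the arithmetic can be sketched as follows. The hourly rate and usage hours are made-up placeholders, not real prices, and the model assumes the server costs nothing while hibernating, which may not hold for every provider.

```python
def monthly_cost(hourly_rate: float, active_hours_per_day: float, days: int = 30) -> float:
    """Cost of a server billed per hour, running only during its active hours."""
    if not 0 <= active_hours_per_day <= 24:
        raise ValueError("active_hours_per_day must be between 0 and 24")
    return hourly_rate * active_hours_per_day * days


HOURLY_RATE = 0.60  # hypothetical price per hour, not a real quote

always_on = monthly_cost(HOURLY_RATE, 24)        # server never sleeps
with_hibernation = monthly_cost(HOURLY_RATE, 4)  # assumed 4 active hours per day
savings = always_on - with_hibernation

print(f"Always on:        {always_on:.2f}")
print(f"With hibernation: {with_hibernation:.2f}")
print(f"Savings:          {savings:.2f}")
```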
## Development
- [x] Add word-level timestamps
- [x] Add accuracy at word level
- [ ] Add mel spectrogram?
- [ ] Add Whisper parameters to the interface
- [x] Add WhisperX
- [x] Introduce SRT as output
- [x] Obtain txt with Diarization.
- [x] Obtain plain txt with segments.
- [ ] Introduce POS (part-of-speech) tagging.
- [x] Optional Preprocessing
- [ ] Transcription box that shows the text as it is being written.
Introduce a tab for analysis, including POS. It might be useful to have a visualizer with the timestamps and other features in Streamlit. Maybe corrections too.
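
For the SRT output item in the list above, here is a minimal sketch of the format: each cue is a running index, a `HH:MM:SS,mmm --> HH:MM:SS,mmm` time range, the text, and a blank line. The function names and the `(start, end, text)` tuple layout are assumptions for illustration, not the code in `textformatting.py`.

```python
from typing import Iterable, Tuple


def srt_timestamp(seconds: float) -> str:
    """Format a time in seconds as an SRT timestamp (HH:MM:SS,mmm)."""
    total_ms = int(round(seconds * 1000))
    hours, rem = divmod(total_ms, 3_600_000)
    minutes, rem = divmod(rem, 60_000)
    secs, ms = divmod(rem, 1000)
    return f"{hours:02d}:{minutes:02d}:{secs:02d},{ms:03d}"


def to_srt(segments: Iterable[Tuple[float, float, str]]) -> str:
    """Build SRT text from (start_seconds, end_seconds, text) tuples."""
    cues = []
    for index, (start, end, text) in enumerate(segments, start=1):
        cues.append(
            f"{index}\n"
            f"{srt_timestamp(start)} --> {srt_timestamp(end)}\n"
            f"{text.strip()}\n"
        )
    # A blank line separates consecutive cues.
    return "\n".join(cues)
```

A call such as `to_srt([(0.0, 1.5, "Hello"), (1.5, 3.0, "world")])` returns a string that can be written directly to a `.srt` file.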
## Dev
I used Git LFS with Hugging Face to handle large files:
```
git lfs install
```
```
huggingface-cli lfs-enable-largefiles .
```