title: Amanu
emoji: πŸ‘
colorFrom: yellow
colorTo: purple
sdk: gradio
sdk_version: 3.44.4
app_file: app.py
pinned: false

The goal of this repo is to support the transcription and annotation of audio recordings.

Parts

  • audio.py: Everything related to audio preprocessing and analysis.
  • transcription.py: All code for transcribing audio using faster-whisper.
  • diarization.py: Everything related to speaker diarization with pyannote.audio.
  • textformatting.py: Everything related to formatting the text into specific outputs.

UI parts

  1. Transcription.
  2. Diarization.
  3. Revision.
  4. Output formatting.

How to access the service?

Users log in with a username and password that I assign. I manage these credentials manually.
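
A minimal sketch of how a manually managed user list could be checked at login. The environment variable `AMANU_USERS`, its `user:password` format, and the helper names `load_users` and `check_login` are assumptions made for illustration, not part of this repo. Gradio's `launch()` accepts a callable for its `auth` argument that takes a username and a password and returns a boolean, so a function of this shape could be passed in directly.

```python
import hmac
import os
from typing import Dict, Optional


def load_users(raw: str) -> Dict[str, str]:
    """Parse 'user1:pass1,user2:pass2' into a dict (hypothetical format)."""
    users = {}
    for pair in raw.split(","):
        if ":" in pair:
            name, password = pair.split(":", 1)
            users[name.strip()] = password.strip()
    return users


# AMANU_USERS is a hypothetical environment variable (for example a Space secret).
USERS = load_users(os.environ.get("AMANU_USERS", ""))


def check_login(username: str, password: str,
                users: Optional[Dict[str, str]] = None) -> bool:
    """Return True only if the user exists and the password matches."""
    table = USERS if users is None else users
    expected = table.get(username)
    if expected is None:
        return False
    # compare_digest avoids leaking information through timing differences.
    return hmac.compare_digest(expected.encode(), password.encode())


# With Gradio, the callable can be handed to launch(), for example:
#     demo.launch(auth=check_login)
```

Keeping the credentials outside the code means users can be added or removed without a new commit. Storing plain-text passwords is a simplification in this sketch; hashed passwords would be safer.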

Pricing

  1. Calculate the fixed cost of running a server continuously over a long period of time.
  2. Check whether hibernation periods can be used to reduce that cost.
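
The two pricing questions above can be estimated with a small calculation like the one below. It is only a sketch: the hourly rate is a placeholder to be replaced with the provider's real price, and it assumes that nothing is billed while the server hibernates, which has to be confirmed with the provider.

```python
HOURS_PER_MONTH = 730  # average month: 365 * 24 / 12


def monthly_cost(hourly_rate: float, active_hours_per_day: float = 24.0) -> float:
    """Monthly cost, assuming billing only while the server is awake."""
    if not 0 <= active_hours_per_day <= 24:
        raise ValueError("active_hours_per_day must be between 0 and 24")
    return hourly_rate * HOURS_PER_MONTH * (active_hours_per_day / 24.0)


def hibernation_savings(hourly_rate: float, active_hours_per_day: float) -> float:
    """Difference between always-on cost and cost with hibernation."""
    return monthly_cost(hourly_rate) - monthly_cost(hourly_rate, active_hours_per_day)


if __name__ == "__main__":
    rate = 1.0  # placeholder price per hour, not a real quote
    print("always on:", monthly_cost(rate))
    print("awake 6 h/day:", monthly_cost(rate, 6))
    print("saved:", hibernation_savings(rate, 6))
```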

Development

  • Add word-level timestamps.
  • Add accuracy at word level.
  • Add mel spectrogram?
  • Add Whisper parameters to the interface.
  • Add WhisperX.
  • Introduce SRT as an output format.
  • Obtain txt with diarization.
  • Obtain plain txt with segments.
  • Introduce POS (part-of-speech) tagging.
  • Optional preprocessing.
  • Transcription box that shows the text as it is being written.
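
As an illustration of the SRT and plain-text items in the list above, here is a minimal sketch of both output formats. The `Segment` dataclass and the function names are hypothetical and do not reflect the actual contents of textformatting.py. The sketch assumes each segment carries start and end times in seconds plus an optional speaker label from diarization.

```python
from dataclasses import dataclass
from typing import Iterable, Optional


@dataclass
class Segment:
    start: float  # seconds
    end: float    # seconds
    text: str
    speaker: Optional[str] = None  # filled in by diarization, if available


def srt_timestamp(seconds: float) -> str:
    """Format seconds as HH:MM:SS,mmm, the timestamp form used in SRT files."""
    total_ms = int(round(seconds * 1000))
    hours, rest = divmod(total_ms, 3_600_000)
    minutes, rest = divmod(rest, 60_000)
    secs, ms = divmod(rest, 1000)
    return f"{hours:02d}:{minutes:02d}:{secs:02d},{ms:03d}"


def to_srt(segments: Iterable[Segment]) -> str:
    """Build numbered SRT blocks separated by blank lines."""
    blocks = []
    for index, seg in enumerate(segments, start=1):
        times = f"{srt_timestamp(seg.start)} --> {srt_timestamp(seg.end)}"
        blocks.append(f"{index}\n{times}\n{seg.text.strip()}\n")
    return "\n".join(blocks)


def to_plain_text(segments: Iterable[Segment], with_speakers: bool = False) -> str:
    """One line per segment, optionally prefixed with the speaker label."""
    lines = []
    for seg in segments:
        text = seg.text.strip()
        if with_speakers and seg.speaker:
            text = f"{seg.speaker}: {text}"
        lines.append(text)
    return "\n".join(lines)
```

Keeping a single segment structure for every output means new formats only need one more function that reads the same fields.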

Introduce a tab for analysis, including POS. It might be worth building a visualizer in Streamlit that shows the timestamps and other features. Perhaps corrections as well.

Dev

I used Git LFS with the Hugging Face large-file option enabled:

git lfs install
huggingface-cli lfs-enable-largefiles .