sobomax committed on
Commit
128ce2c
1 Parent(s): 8ef3cf1

Make match reality.

Files changed (1)
  1. README.md +1 -1
README.md CHANGED
@@ -24,7 +24,7 @@ Fine-tuning attempts on Microsoft's HiFiGAN vocoder were unsuccessful.
 
  Our approach involves a smaller model that takes a fixed audio chunk of 8 Mel frames, two pre-frames, and two post-frames.
  These frames are processed along with the original vocoder's 12 audio frames of 256 bytes each. The model employs
- convolution input layers for both audio and Mel frames to generate hidden dimensions, followed by two linear layers and
+ convolution input layers for both audio and Mel frames to generate hidden dimensions, followed by linear layer and
  a final convolution layer. The output is then multiplied with the original 8 audio frames to produce corrected frames.
 
  ![HelloSippyRT Model Architecture](https://docs.google.com/drawings/d/e/2PACX-1vTiWxGbEB2MbvHpTJHS22abWNrSt2pHv6XijEDmnQFjAqBewMJyZBQ_5Y9k1P9INQPQmuq56MpLDzJt/pub?w=960&h=720)
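
For reviewers who want to check the frame counts in the changed paragraph, here is a minimal Python sketch of the window bookkeeping and the final multiplication step only. It is not the project's code: the names `central_chunk` and `apply_correction` are hypothetical, the network is replaced by a constant stand-in, and treating the model output as one multiplier per audio value is an assumption (the README only says the output is multiplied with the original 8 audio frames).

```python
# Illustrative sketch only; all names below are hypothetical.
FRAME_LEN = 256     # values per audio frame (the README says "256 bytes each")
CHUNK_FRAMES = 8    # frames that get corrected
PRE_FRAMES = 2      # context frames before the chunk
POST_FRAMES = 2     # context frames after the chunk
TOTAL_FRAMES = PRE_FRAMES + CHUNK_FRAMES + POST_FRAMES  # 12 frames in the window


def central_chunk(audio):
    """Return the 8 central frames of a flat 12-frame audio window."""
    if len(audio) != TOTAL_FRAMES * FRAME_LEN:
        raise ValueError("expected a window of 12 frames of 256 values")
    start = PRE_FRAMES * FRAME_LEN
    end = start + CHUNK_FRAMES * FRAME_LEN
    return audio[start:end]


def apply_correction(audio, multipliers):
    """Multiply the central 8 frames element-wise by the model output.

    Assumption: the model emits one multiplier per audio value.
    """
    chunk = central_chunk(audio)
    if len(multipliers) != len(chunk):
        raise ValueError("multipliers must cover exactly the 8 central frames")
    return [a * m for a, m in zip(chunk, multipliers)]


# Stand-in data: a constant window and a constant "model output".
window = [1.0] * (TOTAL_FRAMES * FRAME_LEN)
multipliers = [0.5] * (CHUNK_FRAMES * FRAME_LEN)
corrected = apply_correction(window, multipliers)
```

The pre- and post-frames serve only as context here: they enter the window but are dropped before the multiplication, so the corrected output covers 8 frames rather than 12.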