---
title: Diego GenAI LLM multi-model story telling fun
emoji: 🤗
sdk: gradio
sdk_version: 4.24.0
license: cc-by-nc-sa-4.0
short_description: Diego's GenAI LLM multi-model story telling fun
colorFrom: yellow
colorTo: gray
app_file: app.py
---


Check out the configuration reference at https://huggingface.co/docs/hub/spaces-config-reference

### Result
* Multiple models in action
* Storytelling
  * Given an image
  * Generate a caption for the image
  * Generate a background story from the caption
* Models and libraries used:
  * Salesforce/blip-image-captioning-base for image captioning
  * gpt2 for text generation
  * gTTS for text-to-speech (gTTS is a Python library and CLI tool that interfaces with Google Translate's text-to-speech API)
  * openai/whisper-large-v2 for speech recognition
  * the sentiment-analysis pipeline task for sentiment analysis of the story text
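
The steps above can be sketched roughly as follows. This is a minimal illustration, not the code in this Space's `app.py`: the function names (`load_pipelines`, `text_to_speech`, `tell_story`), the order of the steps, and the `max_new_tokens` default are all assumptions made for the example. It assumes the `transformers` and `gTTS` packages are installed and that the installed `transformers` version still provides the `image-to-text` task.

```python
from typing import Any, Callable, Dict


def load_pipelines() -> Dict[str, Callable[..., Any]]:
    """Load the Hugging Face pipelines named in the list above.

    Hypothetical helper: downloads model weights on first use.
    """
    # Imported lazily so tell_story() can be exercised without transformers.
    from transformers import pipeline

    return {
        "caption": pipeline(
            "image-to-text", model="Salesforce/blip-image-captioning-base"
        ),
        "generate": pipeline("text-generation", model="gpt2"),
        "transcribe": pipeline(
            "automatic-speech-recognition", model="openai/whisper-large-v2"
        ),
        # No model given: transformers falls back to its default for the task.
        "sentiment": pipeline("sentiment-analysis"),
    }


def text_to_speech(text: str, path: str = "audio.mp3") -> str:
    """Save `text` as spoken English audio with gTTS and return the file path."""
    from gtts import gTTS  # needs network access to Google Translate's TTS API

    gTTS(text=text, lang="en").save(path)
    return path


def tell_story(
    image: Any,
    caption: Callable[..., Any],
    generate: Callable[..., Any],
    sentiment: Callable[..., Any],
    max_new_tokens: int = 60,  # assumed value, chosen only for the example
) -> Dict[str, Any]:
    """Image -> caption -> background story -> sentiment of the story."""
    # Image captioning returns a list like [{"generated_text": "..."}].
    caption_text = caption(image)[0]["generated_text"]
    # Text generation continues the caption; the result includes the prompt.
    story = generate(caption_text, max_new_tokens=max_new_tokens)[0][
        "generated_text"
    ]
    # Sentiment analysis returns a list like [{"label": "...", "score": 0.0}].
    mood = sentiment(story)[0]
    return {
        "caption": caption_text,
        "story": story,
        "sentiment": mood["label"],
        "score": mood["score"],
    }
```

A possible way to use it: call `pipes = load_pipelines()`, then `result = tell_story("result.png", pipes["caption"], pipes["generate"], pipes["sentiment"])`, pass `result["story"]` to `text_to_speech`, and feed the saved audio file to `pipes["transcribe"]`, which returns a dict with a `"text"` key. Passing the pipelines in as arguments keeps `tell_story` easy to try with stand-in functions before downloading any models.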

Result UI:
<img src='result.png' />

Audio Result:

<audio controls>
  <source src="audio.mp3" type="audio/mpeg">
</audio>