Commit f2710b4 by adi-123 · 1 Parent(s): 42b428c

Update README.md

Files changed (1)
  1. README.md +50 -0
README.md CHANGED
@@ -10,4 +10,54 @@ pinned: false
  license: unknown
  ---
 
+
+ # 🖼️ Image to 🎧 Audio Story Generator
+
+ This project is an end-to-end pipeline that turns an uploaded image into a narrated audio story by chaining several AI models.
+
+ ## 🌟 Overview
+
+ The pipeline runs in three stages: an image captioning model describes the uploaded image, a text generation model turns that caption into a short story, and a text-to-speech model converts the story into audio.
+
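For readers of this diff, the three-stage flow described in the Overview can be sketched as a small orchestration function. This is only an illustrative sketch: `StoryResult`, `image_to_audio_story`, and the three callables are hypothetical names, not taken from `app.py`, and the stages are passed in as plain functions so the example runs without any model or API key.

```python
from dataclasses import dataclass
from typing import Callable


@dataclass
class StoryResult:
    """Outputs of the three stages (hypothetical container, not from app.py)."""
    caption: str
    story: str
    audio: bytes


def image_to_audio_story(
    image_bytes: bytes,
    caption_fn: Callable[[bytes], str],
    story_fn: Callable[[str], str],
    speech_fn: Callable[[str], bytes],
) -> StoryResult:
    """Run the stages in order, feeding each output into the next stage."""
    caption = caption_fn(image_bytes)  # stage 1: image -> caption
    story = story_fn(caption)          # stage 2: caption -> short story
    audio = speech_fn(story)           # stage 3: story -> audio bytes
    return StoryResult(caption=caption, story=story, audio=audio)


# Stand-in stages so the sketch runs offline; real stages would call the models.
demo = image_to_audio_story(
    b"fake-image-bytes",
    caption_fn=lambda img: "a dog on a beach",
    story_fn=lambda cap: f"Once upon a time there was {cap}.",
    speech_fn=lambda text: text.encode("utf-8"),
)
print(demo.caption)
print(demo.story)
```

Passing the stages in as callables keeps the model-specific code separate from the control flow, which also makes each stage easy to swap or test in isolation.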
+ ## 🚀 Features
+
+ ### 📷 Image Captioning
+ - Uses Salesforce's `blip-image-captioning-base` model to generate a textual description of the uploaded image.
+
+ ### ✍️ Text Generation (Story Creation)
+ - Uses Meta's Llama 2 70B Chat model, served by Together AI as `togethercomputer/llama-2-70b-chat`, to write a short story of 100 words or fewer, based on the image caption and ending on a positive note.
+
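The story constraints stated in this section (based on the caption, a positive ending, at most 100 words) could be expressed as a prompt builder plus a length guard. The prompt wording and the function names below are assumptions for illustration; the prompt actually used in `app.py` is not shown in this diff.

```python
MAX_WORDS = 100  # word limit stated in the README


def build_story_prompt(caption: str, max_words: int = MAX_WORDS) -> str:
    """Build a prompt from the image caption (hypothetical wording)."""
    return (
        "You are a storyteller. Write a short story inspired by the scene below. "
        f"Use no more than {max_words} words and end on a positive note.\n\n"
        f"Scene: {caption.strip()}\n"
        "Story:"
    )


def enforce_word_limit(story: str, max_words: int = MAX_WORDS) -> str:
    """Trim a story that exceeds the limit; shorter stories are only stripped."""
    words = story.split()
    if len(words) <= max_words:
        return story.strip()
    return " ".join(words[:max_words])
```

The second function is a defensive step: language models do not always respect length instructions, so trimming the output afterwards keeps the audio short even when the prompt is ignored.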
+ ### 🔊 Text-to-Speech Conversion
+ - Uses the `espnet/kan-bayashi_ljspeech_vits` model, published by ESPnet on the Hugging Face Hub, to convert the generated story into an audio file.
+
+ ### 🌐 Streamlit Web App
+ - Built with Streamlit: users upload an image, and the app displays the generated caption and story and plays the audio.
+
+ ## 📝 Usage
+
+ To use this application:
+
+ 1. Clone this repository.
+ 2. Install the required dependencies with `pip install -r requirements.txt`.
+ 3. Set the required environment variables:
+    - `TOGETHER_API_KEY`: Together AI API key.
+    - `HUGGINGFACEHUB_API_TOKEN`: Hugging Face API token.
+ 4. Run the Streamlit app with `streamlit run app.py`.
+ 5. Upload an image file (supported formats: jpg, jpeg, png).
+ 6. Wait for the app to generate the caption, story, and audio.
+ 7. View the image caption and story, and play the audio.
+
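Two of the steps above lend themselves to a quick pre-flight check: the environment variables from step 3 and the supported image formats from step 5. The helper names below are hypothetical and not part of `app.py`; this is a minimal sketch using only the standard library.

```python
import os
from pathlib import Path

# Variable names and formats as listed in the usage steps.
REQUIRED_ENV_VARS = ("TOGETHER_API_KEY", "HUGGINGFACEHUB_API_TOKEN")
SUPPORTED_EXTENSIONS = {".jpg", ".jpeg", ".png"}


def missing_env_vars(env=None) -> list[str]:
    """Return the required variable names that are unset or blank."""
    env = os.environ if env is None else env
    return [name for name in REQUIRED_ENV_VARS if not env.get(name, "").strip()]


def is_supported_image(filename: str) -> bool:
    """Check the file extension, case-insensitively, against the supported formats."""
    return Path(filename).suffix.lower() in SUPPORTED_EXTENSIONS
```

Calling `missing_env_vars()` before starting the app gives a clear list of what still needs to be set, instead of an authentication error partway through processing an image.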
+ ## 📁 Code Structure
+
+ - `app.py`: The Streamlit web application, which integrates the captioning, story generation, and text-to-speech steps.
+ - `README.md`: Project documentation and usage instructions.
+ - `requirements.txt`: The Python dependencies.
+
+ ## 🙌 Credits
+
+ This project was created with love by @Aditya-Neural-Net-Ninja. It uses AI models for image captioning, story generation, and text-to-speech conversion. Special thanks to Streamlit and Hugging Face for their platforms.
+
+
+ **Note:** You need a Together AI API key and a Hugging Face API token to run this application.
+
  Check out the configuration reference at https://huggingface.co/docs/hub/spaces-config-reference