PDF to Podcast Converter

Overview

This project converts a PDF document into a podcast episode. It uses Google Gemini to turn the PDF's content into a natural-sounding dialogue, then uses OpenAI's text-to-speech models to render that dialogue as an MP3 file.

Features

Convert PDF to Podcast: Upload a PDF and convert its content into a podcast dialogue.
Engaging Dialogue: The generated dialogue is designed to be informative and entertaining.
Multiple Voice Options: Choose from different voices to narrate the podcast.
User-friendly Interface: Simple interface using Gradio for easy interaction.

Installation

To set up the project, follow these steps:

Clone the repository:

```
git clone https://github.com/knowsuchagency/pdf-to-podcast.git
cd pdf-to-podcast
```

Create a virtual environment and activate it:

```
python -m venv .venv
source .venv/bin/activate
```

Install the required packages:
```
pip install -r requirements.txt
```

Usage

Set up API Key(s): Ensure you have a Google Gemini API key. You can get one at https://aistudio.google.com/app/apikey. Set it as the value of the GEMINI_API_KEY environment variable. You'll also need an OpenAI API key, which you can either enter in the interface or set as the OPENAI_API_KEY environment variable.

Gemini 1.5 Flash is used as the LLM, and OpenAI is used for text-to-speech.
Run the application:
```
python main.py
```
This will launch a Gradio interface in your web browser.
Upload a PDF: Upload the PDF document you want to convert into a podcast.
Enter OpenAI API Key: Provide your OpenAI API key in the designated textbox, unless you have already set it as the OPENAI_API_KEY environment variable.
Generate Audio: Click the button to start the conversion process. The output will be an MP3 file containing the podcast dialogue.
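
The key setup in the first step can be checked before launching the application. The following is a minimal sketch and not part of the project: the helper name missing_keys and the split into required and optional keys are assumptions for illustration, based on the note that the OpenAI key may be entered in the interface instead.

```python
import os
from typing import List, Mapping, Sequence

# Variable names come from the setup step; the grouping is an assumption.
REQUIRED_KEYS = ("GEMINI_API_KEY",)
OPTIONAL_KEYS = ("OPENAI_API_KEY",)  # may be entered in the interface instead


def missing_keys(env: Mapping[str, str], names: Sequence[str] = REQUIRED_KEYS) -> List[str]:
    """Return the names that are unset or blank in env (hypothetical helper)."""
    return [name for name in names if not env.get(name, "").strip()]


if __name__ == "__main__":
    for name in missing_keys(os.environ):
        print(f"{name} is not set.")
    for name in missing_keys(os.environ, OPTIONAL_KEYS):
        print(f"{name} is not set; enter it in the interface instead.")
```

Passing the environment in as a mapping keeps the check easy to try with a plain dict.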

Project Structure

main.py: Main application script.
requirements.txt: List of dependencies.
README.md: Project documentation (this file).

Code Explanation

Dialogue Models

Defines the structure of the dialogue using Pydantic models.

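```python
from typing import List, Literal

from pydantic import BaseModel


class DialogueItem(BaseModel):
    text: str  # the line to be spoken
    voice: Literal["alloy", "onyx", "fable"]  # voice used for this line


class Dialogue(BaseModel):
    scratchpad: str  # notes produced alongside the dialogue
    dialogue: List[DialogueItem]
```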
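
To show what these models enforce, here is a small sketch that validates a hand-written payload. The sample data is hypothetical, and the models are repeated so that the snippet runs on its own. It assumes only that pydantic is installed.

```python
from typing import List, Literal

from pydantic import BaseModel, ValidationError


class DialogueItem(BaseModel):
    text: str
    voice: Literal["alloy", "onyx", "fable"]


class Dialogue(BaseModel):
    scratchpad: str
    dialogue: List[DialogueItem]


# Hypothetical data, shaped like the models above.
data = {
    "scratchpad": "Open with the document's main claim.",
    "dialogue": [
        {"text": "Welcome to the show.", "voice": "alloy"},
        {"text": "Glad to be here.", "voice": "onyx"},
    ],
}

episode = Dialogue(**data)
voices = [item.voice for item in episode.dialogue]

# A voice outside the Literal is rejected when the item is constructed.
try:
    DialogueItem(text="Hello", voice="nova")
    rejected = False
except ValidationError:
    rejected = True
```

Because voice is a Literal, every item that passes validation carries one of the three listed voices.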

LLM Function

Generates dialogue based on the input text using the promptic decorator.

```python
from promptic import llm

@llm(model="gemini/gemini-1.5-flash")
def generate_dialogue(text: str) -> Dialogue:
    # Function to generate podcast dialogue (body omitted here; see main.py)
    ...
```

TTS Function

Converts text to speech using OpenAI's text-to-speech model.

```python
def get_mp3(text: str, voice: str, api_key: str = None) -> bytes:
    # Function to generate MP3 from text (body omitted here; see main.py)
    ...
```

Main Function

Processes the PDF, generates dialogue, and converts it to audio.

```python
def generate_audio(file: bytes, openai_api_key: str) -> bytes:
    # Main function to process PDF and generate audio (body omitted here; see main.py)
    ...
```
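
The last stage of this pipeline, turning dialogue lines into one audio stream, can be sketched as follows. This is an illustration and not the code in main.py. It assumes each line is synthesized separately and the resulting chunks are concatenated in dialogue order. The names assemble_audio and fake_tts are hypothetical, and fake_tts stands in for a real text-to-speech call such as get_mp3.

```python
from concurrent.futures import ThreadPoolExecutor
from typing import Callable, List, Tuple

# (text, voice) pairs stand in for DialogueItem objects.
Line = Tuple[str, str]


def assemble_audio(lines: List[Line], synthesize: Callable[[str, str], bytes]) -> bytes:
    """Synthesize every line and join the chunks in dialogue order."""
    if not lines:
        return b""
    with ThreadPoolExecutor() as pool:
        # map() yields results in input order, even if calls finish out of order.
        chunks = pool.map(lambda line: synthesize(*line), lines)
        return b"".join(chunks)


def fake_tts(text: str, voice: str) -> bytes:
    """Placeholder for a real TTS call; returns tagged bytes instead of audio."""
    return f"[{voice}:{text}]".encode("utf-8")


audio = assemble_audio([("Hello.", "alloy"), ("Hi there.", "onyx")], fake_tts)
print(audio)  # b'[alloy:Hello.][onyx:Hi there.]'
```

Passing the synthesizer in as a callable lets the ordering logic be tried without an API key. Swapping in a real function with the same (text, voice) signature would make the network calls concurrently.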

Gradio Interface

Creates a user-friendly interface for uploading PDFs and generating podcasts.

```python
import gradio as gr

demo = gr.Interface(
    title="PDF to Podcast",
    description="Convert any PDF document into an engaging podcast episode!",
    fn=generate_audio,
    inputs=[
        gr.File(label="Input PDF", type="binary"),
        gr.Textbox(label="OpenAI API Key", placeholder="Enter your OpenAI API key here"),
    ],
    outputs=[
        gr.Audio(format="mp3"),
    ],
)

demo.launch(show_api=False)
```

License

This project is licensed under the Apache 2.0 License. See the LICENSE file for more information.