description = '''
<link href="https://fonts.googleapis.com/css2?family=Roboto+Mono&display=swap" rel="stylesheet">
<div style="font-family: 'Roboto Mono', monospace; font-weight: 400; line-height: 1.6; font-size: 18px; text-align: justify; text-justify: inter-word;">
The <strong>Linguistic Lens Application</strong> is an intelligent, user-friendly platform designed to bridge language barriers by combining Optical Character Recognition (OCR), translation, and text-to-speech (TTS). Built with Gradio for the interface, <code>PaddleOCR</code> for text extraction, <code>GoogleTranslator</code> for translation, and <code>Google Text-to-Speech (gTTS)</code> for audio conversion, the application provides a seamless experience for users who need real-time translation of text in images, with audio playback support. Users start by uploading an image, from which the system extracts text using PaddleOCR. The extracted text is then translated into a selected language via GoogleTranslator from the deep_translator library, which supports a wide array of global languages. The translated text is subsequently converted into audio using gTTS, allowing users to listen to the translation in the chosen language. This multi-component design yields a complete service flow in which user input is transformed into text, translated, and delivered as both written and spoken output, making the application a robust tool for on-the-go linguistic assistance. By modularizing the OCR, translation, and TTS functions, the application remains scalable, maintainable, and easy to integrate with other services, making it well suited to enhancing accessibility and communication in diverse, multilingual environments.
</div>
'''
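
# The service flow described above (image -> OCR -> translation -> TTS) can be
# sketched as a small pipeline. This is an illustrative sketch, not the app's
# actual code: the `ocr`, `translate`, and `tts` callables stand in for
# PaddleOCR, GoogleTranslator, and gTTS, and are injected so the flow itself
# can be read (and exercised) without those libraries installed.
def linguistic_lens_pipeline(image, target_lang, ocr, translate, tts):
    """Run OCR on `image`, translate the result, and synthesize audio.

    Returns (extracted_text, translated_text, audio_path).
    """
    extracted = ocr(image)                          # text found in the image
    translated = translate(extracted, target_lang)  # text in the target language
    audio_path = tts(translated, target_lang)       # path of the generated audio
    return extracted, translated, audio_path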
usecase_diagram = """
---
To know more about the project, study the UML diagrams below:
### The Use Case Diagram:
<img src="data:image/png;base64,{}" alt="usecase" width="500" height="500" style="display: block; margin-right: 40px;">
#### 1. **Actors**
- **User**: Represents the end-user who interacts with the application by uploading images, initiating OCR, requesting translations, listening to audio, and clearing inputs.
#### 2. **Main Application**: "Linguistic Lens Application"
- The main component, consisting of several use cases, each representing a different functionality of the application.
#### 3. **Use Cases**
- **Upload Image**: The user uploads an image, triggering the OCR process to extract any text within it.
- **Perform OCR**: The OCR functionality, handled by `OCR.py`, extracts text from the uploaded image.
- **Translate Text**: The user can translate extracted text into a different language using `translate_speak.py`.
- **Play Translated Audio**: Once the text is translated, the user can listen to it in the target language.
- **Clear Inputs**: Resets all input fields, clearing any previously entered or extracted data.
#### 4. **translate_speak.py Component**
- **Convert Text to Audio**: Converts text to audio, either from the extracted OCR text or the translated text.
- **Translate Text**: Translates text from the source language to the user-selected target language.
- **Retrieve Supported Languages**: Provides the list of supported languages for translation and text-to-speech, displayed in the UI dropdown.
#### 5. **OCR.py Component**
- **Extract Text from Image**: Extracts text from the uploaded image using PaddleOCR.
#### 6. **app.py Component**
- **Display UI Components**: Handles the visual elements of the application, such as image upload, translation, and audio playback.
- **Handle User Interaction**: Processes user actions, such as submitting an image for OCR, initiating a translation, or clearing inputs.
#### 7. **Interactions**
- **User Interactions**: Users initiate actions like uploading an image, starting OCR, requesting translation, and listening to audio.
- **System Interactions**: Behind the scenes, `OCR.py` and `translate_speak.py` handle the core functionalities: text extraction, translation, and audio generation.
---
"""
class_diagram = '''
### Explanation of the Class Diagram
<img src="data:image/png;base64,{}" alt="class" width="500" height="500" style="display: block; margin-right: 40px;">
#### 1. **Classes and Their Attributes/Methods**
- **App**
  - **Attributes**:
    - `langs_list`: List of supported languages for translation.
    - `langs_dict`: Dictionary of supported languages with their language codes.
    - `main_interface`: The Gradio interface instance.
  - **Methods**:
    - `encode_image(image_path)`: Encodes an image file to a base64 string for display.
- **OCR**
  - **Methods**:
    - `ocr_with_paddle(img)`: Uses PaddleOCR to perform OCR on an image and returns the extracted text and the audio path.
- **TranslateSpeak**
  - **Attributes**:
    - `output_path`: Path where the output audio file is saved.
    - `translate_path`: Path for the translated audio file.
  - **Methods**:
    - `get_lang(lang)`: Determines the appropriate language code for text-to-speech in the target language.
    - `audio_streaming(txt, lang, to)`: Converts text to audio in the specified language, saving the result to a path determined by `to`.
    - `translate_txt(lang, text)`: Translates text into the specified language and generates audio, returning the translated text and audio path.
- **PaddleOCR** (External Library Class)
  - **Methods**:
    - `ocr(img)`: Accepts an image input and returns OCR results, which the `OCR` class uses in `ocr_with_paddle()`.
- **GoogleTranslator** (External Library Class)
  - **Methods**:
    - `get_supported_languages(as_dict)`: Returns a list or dictionary of supported languages, depending on the argument.
    - `translate(text, source, target)`: Translates the given text from the source to the target language.
- **gTTS** (External Library Class)
  - **Methods**:
    - `save(output_path)`: Saves the generated audio to the specified path.
#### 2. **Relationships and Interactions**
- **App** uses the **OCR** and **TranslateSpeak** classes to provide OCR and translation functionality within the Gradio UI.
- **OCR** interacts with the **PaddleOCR** library to perform OCR on user-uploaded images.
- **TranslateSpeak** relies on **GoogleTranslator** to translate text and **gTTS** to convert the translated text to audio.
---
'''
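
# The `encode_image(image_path)` method listed above is a thin wrapper over the
# standard library: it reads a file and base64-encodes it so the diagram images
# and footer icons can be embedded through data: URIs. A minimal sketch of how
# such a helper typically looks:
import base64

def encode_image(image_path):
    """Return the contents of `image_path` as a base64 string for inline display."""
    with open(image_path, "rb") as f:
        return base64.b64encode(f.read()).decode("utf-8")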
object_diagram = """
### Explanation of the Object Diagram
<img src="data:image/png;base64,{}" alt="object" width="500" height="500" style="display: block; margin-right: 40px;">
#### 1. **Objects and Their Attributes**
- **app_instance**
  - `langs_list`: A list of languages supported by the application, such as English ("en"), Spanish ("es"), and French ("fr").
  - `langs_dict`: A dictionary mapping language codes to their corresponding language names.
  - `main_interface`: The main Gradio interface used for user interactions.
- **ocr_instance**
  - `finaltext`: Contains the text extracted from an uploaded image after OCR processing.
- **translate_speak_instance**
  - `output_path`: Path for saving the original audio output in WAV format.
  - `translate_path`: Path for saving the translated audio in WAV format.
- **paddle_ocr_instance**
  - `language`: The language setting for OCR processing, defaulted to "en" (English).
- **google_translator_instance**
  - `source`: The source language code for translation, set to "en" (English).
  - `target`: The target language code for translation, here set to "es" (Spanish).
- **gtts_instance**
  - `lang`: The language code for text-to-speech, set to "en" (English) in this example.
  - `slow`: A boolean indicating the speed of the generated speech (false indicates normal speed).
#### 2. **Relationships and Interactions**
- **app_instance** interacts with **ocr_instance** to extract text from uploaded images using PaddleOCR.
- **app_instance** also interacts with **translate_speak_instance** for translating text and generating audio from it.
- **ocr_instance** utilizes **paddle_ocr_instance** to perform the OCR process.
- **translate_speak_instance** communicates with **google_translator_instance** to translate text and uses **gtts_instance** to convert the translated text to audio.

Each object in this diagram represents a specific instance in the application, showcasing how the components work together to provide the OCR, translation, and audio features.
---
"""
sequence_diagram = '''
### Explanation of the Sequence Diagram
<img src="data:image/png;base64,{}" alt="sequence" width="500" height="500" style="display: block; margin-right: 40px;">
#### Overview
This sequence diagram illustrates how the components of the application interact during a typical workflow in which a user uploads an image, the system extracts text from the image using OCR, translates the text into a different language, and finally generates audio from the translated text.
#### Sequence of Events
1. **User Uploads an Image**:
   - The process begins when the **User** uploads an image to the application, which sends the image to the **App** component.
2. **Performing OCR**:
   - The **App** calls the `ocr_with_paddle(image)` method of the **OCR** class to process the uploaded image.
   - The **OCR** class invokes the `ocr(image)` method of the **PaddleOCR** instance to extract text from the image.
   - **PaddleOCR** performs the OCR operation and returns the extracted text to the **OCR** class.
   - The **OCR** class then returns the extracted text to the **App**.
3. **Translating the Text**:
   - The **App** now calls the `translate_txt(lang, extractedText)` method of the **TranslateSpeak** class to translate the extracted text into the desired language.
   - The **TranslateSpeak** class invokes the `translate(extractedText, source, target)` method of the **GoogleTranslator** to perform the translation.
   - **GoogleTranslator** returns the translated text to the **TranslateSpeak** class.
4. **Generating Audio**:
   - The **TranslateSpeak** class then calls its `audio_streaming(translatedText, lang, to)` method, which uses **gTTS** to generate audio from the translated text.
   - **gTTS** processes the text-to-speech request, and the resulting audio path is returned to the **TranslateSpeak** class.
5. **Returning Results to the User**:
   - Finally, the **TranslateSpeak** class returns both the translated text and the audio path to the **App**.
   - The **App** displays the translated text and provides an audio playback option for the user.
---
'''
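
# In step 4 above, `audio_streaming(txt, lang, to)` picks a save path based on
# `to` (original vs. translated audio) and hands the actual synthesis to gTTS.
# A sketch of that branching; the filenames are assumptions mirroring the
# `output_path`/`translate_path` attributes, and the gTTS call is injected as
# `synthesize` so the logic stays self-contained. The real method would do
# something like gTTS(text=txt, lang=lang).save(path).
OUTPUT_PATH = "output_audio.wav"        # assumed filename for the OCR audio
TRANSLATE_PATH = "translate_audio.wav"  # assumed filename for translated audio

def audio_streaming(txt, lang, to, synthesize):
    """Synthesize `txt` in `lang`, saving to the path selected by `to`."""
    path = TRANSLATE_PATH if to == "translate" else OUTPUT_PATH
    synthesize(txt, lang, path)
    return path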
colab_diagram = '''
### Explanation of the Collaboration Diagram
<img src="data:image/png;base64,{}" alt="collaboration" width="500" height="500" style="display: block; margin-right: 40px;">
#### Overview
The collaboration diagram illustrates how the different components of the "Linguistic Lens Application" interact to fulfill a particular user request, such as uploading an image, performing OCR, translating the text, and generating audio. This diagram focuses on the relationships and messages exchanged between the components rather than the chronological order of operations.
#### Interactions Between Components
1. **User Uploads an Image**:
   - The interaction begins when the **User** uploads an image to the **App**.
2. **OCR Processing**:
   - The **App** sends the uploaded image to the **OCR** component for processing.
   - The **OCR** component invokes the `ocr(image)` method on the **PaddleOCR** component to extract text from the image.
   - **PaddleOCR** processes the image and returns the extracted text to the **OCR** component.
   - The **OCR** component then returns the extracted text to the **App**.
3. **Translation of Text**:
   - The **App** then calls the `translate_txt(lang, extractedText)` method on the **TranslateSpeak** component, passing in the extracted text and the target language.
   - The **TranslateSpeak** component requests the translation from the **GoogleTranslator**, invoking the `translate(extractedText, source, target)` method.
   - **GoogleTranslator** processes the request and returns the translated text to the **TranslateSpeak** component.
4. **Generating Audio from Translated Text**:
   - The **TranslateSpeak** component then calls its `audio_streaming(translatedText, lang, to)` method, which uses the **gTTS** component to convert the translated text into audio.
   - The **gTTS** component processes the text-to-speech request, and the audio file path is returned to the **TranslateSpeak** component.
5. **Returning Results to the User**:
   - Finally, the **TranslateSpeak** component sends both the translated text and the audio path back to the **App**.
   - The **App** displays the translated text and provides an option for the **User** to play the audio.
#### Key Points
- The collaboration diagram emphasizes the connections and messages exchanged between components rather than the sequence of operations, showcasing how the components work together to provide the application's functionality.
- Each component plays a crucial role in the workflow, facilitating the operations needed to fulfill user requests efficiently.

This diagram helps in understanding the architecture of the application and the responsibilities of each component within the system. It is particularly useful for identifying how components are interrelated and how data flows through the application.
---
'''
component_diagram = '''
### Explanation of the Component Diagram
<img src="data:image/png;base64,{}" alt="component" width="500" height="500" style="display: block; margin-right: 40px;">
#### Overview
The component diagram illustrates the architecture of the "Linguistic Lens Application" by showing the major components, their roles, and the relationships between them. This diagram helps in understanding how the system is structured and how each part contributes to the overall functionality.
#### Components and Their Responsibilities
1. **App**:
   - **Type**: Component
   - **Responsibilities**:
     - Acts as the main interface for the user, handling interactions and orchestrating the workflow of the application.
     - Manages user input, such as image uploads and language selections.
2. **OCR**:
   - **Type**: Component
   - **Responsibilities**:
     - Performs Optical Character Recognition on uploaded images.
     - Invokes the **PaddleOCR** component to extract text from images.
3. **TranslateSpeak**:
   - **Type**: Component
   - **Responsibilities**:
     - Handles the translation of text extracted by the **OCR** component.
     - Converts both the original and translated text into audio using the **gTTS** component.
4. **PaddleOCR**:
   - **Type**: Component
   - **Responsibilities**:
     - An external library component that executes OCR to extract text from images.
     - Provides methods to process images and return the recognized text.
5. **GoogleTranslator**:
   - **Type**: Component
   - **Responsibilities**:
     - Translates text from one language to another using Google's translation service.
     - Provides access to the list of supported languages.
6. **gTTS (Google Text-to-Speech)**:
   - **Type**: Component
   - **Responsibilities**:
     - Converts text to speech using Google's text-to-speech service.
     - Saves the generated audio files for playback.
#### Relationships
- The **App** component interacts with both the **OCR** and **TranslateSpeak** components to deliver the application's main functionality.
- The **OCR** component relies on the **PaddleOCR** library for text extraction.
- The **TranslateSpeak** component uses **GoogleTranslator** for translation tasks and **gTTS** for generating audio output from the text.

This component diagram provides a high-level view of the application's architecture, highlighting how the different components work together to deliver the desired functionality. It helps stakeholders understand the modular structure and promotes better maintenance and scalability.
---
'''
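
# The App component populates its language dropdown from GoogleTranslator's
# supported languages; deep_translator's `get_supported_languages(as_dict=True)`
# returns a name -> code dictionary. A sketch of turning that mapping into the
# dropdown choices, using an illustrative dictionary in place of the live call:
def dropdown_choices(langs_dict):
    """Return the language names, sorted for display in the UI dropdown."""
    return sorted(langs_dict)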
activity_diagram = '''
### Explanation of the Activity Diagram
<img src="data:image/png;base64,{}" alt="activity" width="500" height="500" style="display: block; margin-right: 40px;">
#### Overview
The activity diagram illustrates the flow of activities in the "Linguistic Lens Application", from the moment the user uploads an image to the final display of the translated text and audio playback. This diagram is useful for visualizing the dynamic behavior of the system and understanding how the different processes interact.
#### Activity Flow
1. **User Uploads Image**:
   - The process begins with the **User** uploading an image.
2. **App Receives Image**:
   - The **App** component receives the uploaded image from the user.
3. **OCR Processing**:
   - The **App** calls the `ocr_with_paddle(image)` method of the **OCR** component.
   - The **OCR** component extracts text from the image by invoking the **PaddleOCR** component to perform OCR.
4. **PaddleOCR Performs OCR**:
   - **PaddleOCR** processes the image and returns the extracted text to the **OCR** component.
5. **OCR Returns Extracted Text**:
   - The **OCR** component sends the extracted text back to the **App**.
6. **App Displays Extracted Text**:
   - The **App** displays the extracted text for the user to see.
7. **User Selects Target Language**:
   - The **User** selects a target language for translation.
8. **App Calls Translate Function**:
   - The **App** calls the `translate_txt(lang, extractedText)` method of the **TranslateSpeak** component to translate the extracted text.
9. **TranslateSpeak Calls GoogleTranslator**:
   - The **TranslateSpeak** component invokes the **GoogleTranslator** to translate the extracted text into the selected language.
10. **GoogleTranslator Returns Translated Text**:
    - The **GoogleTranslator** processes the request and returns the translated text to the **TranslateSpeak** component.
11. **TranslateSpeak Calls gTTS for Audio Generation**:
    - The **TranslateSpeak** component calls the **gTTS** component to generate audio from the translated text.
12. **gTTS Returns Audio Path**:
    - The **gTTS** component processes the text-to-speech request and returns the audio file path to the **TranslateSpeak** component.
13. **TranslateSpeak Returns Results to App**:
    - The **TranslateSpeak** component sends both the translated text and the audio path back to the **App**.
14. **App Displays Translated Text and Provides Audio Playback**:
    - Finally, the **App** displays the translated text and provides an option for the user to play the audio.
#### Key Points
- This activity diagram provides a clear representation of the workflow within the application, making it easy to follow the sequence of operations.
- It highlights the interactions between the user and the application, as well as between the various components involved in processing the user's request.

This diagram helps stakeholders and developers understand the application's flow, facilitating better communication and a smoother development process.
---
'''
footer = """
<div style="background-color: #333; color: white; width: 100%; bottom: 0; left: 0; display: flex; justify-content: space-between; align-items: center; padding: .2rem 35px; box-sizing: border-box; font-size: 16px;">
<div style="text-align: left;">
<p style="margin: 0;">© 2024</p>
</div>
<div style="text-align: center; flex-grow: 1;">
<p style="margin: 0;">This website is made with ❤ by SARATH CHANDRA</p>
</div>
<div class="social-links" style="display: flex; gap: 20px; justify-content: flex-end; align-items: center;">
<a href="https://github.com/21bq1a4210" target="_blank" style="text-align: center;">
<img src="data:image/png;base64,{}" alt="GitHub" width="40" height="40" style="display: block; margin: 0 auto;">
<span style="font-size: 14px;">GitHub</span>
</a>
<a href="https://www.linkedin.com/in/sarath-chandra-bandreddi-07393b1aa/" target="_blank" style="text-align: center;">
<img src="data:image/png;base64,{}" alt="LinkedIn" width="40" height="40" style="display: block; margin: 0 auto;">
<span style="font-size: 14px;">LinkedIn</span>
</a>
<a href="https://21bq1a4210.github.io/MyPortfolio-/" target="_blank" style="text-align: center;">
<img src="data:image/png;base64,{}" alt="Portfolio" width="40" height="40" style="display: block; margin: 0 auto;">
<span style="font-size: 14px;">Portfolio</span>
</a>
</div>
</div>
"""