The Linguistic Lens Application is an intelligent, user-friendly platform designed to bridge language barriers by combining Optical Character Recognition (OCR), translation, and text-to-speech (TTS). Built with Gradio for the interface, PaddleOCR for text extraction, GoogleTranslator (via the deep_translator library) for translation, and Google Text-to-Speech (gTTS) for audio conversion, it gives users real-time translation of text in images, with audio playback.

Users start by uploading an image, from which the system extracts text using PaddleOCR. The extracted text is then translated into a selected language via GoogleTranslator, which supports a wide array of global languages, and the translation is converted to audio with gTTS so users can listen to it in the chosen language. This pipeline turns a single image upload into both written and spoken output in the target language, making the application a practical tool for on-the-go linguistic assistance. By modularizing the OCR, translation, and TTS functions, the application stays scalable, maintainable, and easy to integrate with other services, which makes it well suited to improving accessibility and communication in multilingual environments.
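As a rough end-to-end sketch of that flow (the function name here is illustrative; the real project splits these steps across app.py, OCR.py, and translate_speak.py):

    from paddleocr import PaddleOCR
    from deep_translator import GoogleTranslator
    from gtts import gTTS

    def image_to_spoken_translation(img_path, target="es"):
        # 1. OCR: pull text out of the uploaded image.
        result = PaddleOCR(lang="en").ocr(img_path)
        extracted = " ".join(line[1][0] for line in result[0])
        # 2. Translation: render the text in the user's chosen language.
        translated = GoogleTranslator(source="auto", target=target).translate(extracted)
        # 3. TTS: voice the translation and save it for playback.
        gTTS(translated, lang=target).save("translated.mp3")
        return translated, "translated.mp3"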
'''
usecase_diagram = """
To know more about the project, study the UML diagrams below:
The Use Case Diagram:
1. Actors
User: Represents the end-user who interacts with the application by uploading images, initiating OCR, requesting translations, listening to audio, and clearing inputs.
2. Main Application: "Linguistic Lens Application"
The main component, which groups several use cases, each representing a different functionality of the application.
3. Use Cases
Upload Image: The user uploads an image, triggering the OCR process to extract any text within it.
Perform OCR: The OCR functionality, handled by OCR.py, extracts text from the uploaded image.
Translate Text: The user can translate extracted text into a different language using translate_speak.py.
Play Translated Audio: Once the text is translated, the user can listen to it in the target language.
Clear Inputs: Resets all input fields, clearing any previously entered or extracted data.
4. translate_speak.py Component
Convert Text to Audio: Converts text to audio, either from the extracted OCR text or translated text.
Translate Text: Translates text from the source language to the user-selected target language.
Retrieve Supported Languages: Provides the list of supported languages for translation and text-to-speech, displayed in the UI dropdown.
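For example, the dropdown contents can come straight from deep_translator (a minimal sketch; the variable names are illustrative):

    from deep_translator import GoogleTranslator

    # Mapping between language names and their codes, e.g. "english" <-> "en".
    langs_dict = GoogleTranslator().get_supported_languages(as_dict=True)
    langs_list = list(langs_dict)   # entries for the UI dropdown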
5. OCR.py Component
Extract Text from Image: Extracts text from the uploaded image using PaddleOCR.
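A minimal sketch of that step, assuming PaddleOCR's classic result format (a list of [box, (text, confidence)] entries per image):

    from paddleocr import PaddleOCR

    def ocr_with_paddle(img):
        ocr = PaddleOCR(use_angle_cls=True, lang="en")
        result = ocr.ocr(img)
        # Keep only the recognized text from each (box, (text, confidence)) entry.
        return " ".join(line[1][0] for line in result[0])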
6. app.py Component
Display UI Components: Handles the visual elements of the application, such as image uploads, translation, and audio playback.
Handle User Interaction: Processes user actions, such as submitting an image for OCR, initiating a translation, or clearing inputs.
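A sketch of how app.py might wire these responsibilities together in Gradio (assuming the ocr_with_paddle handler above; the real layout may differ):

    import gradio as gr

    with gr.Blocks() as main_interface:
        image = gr.Image(type="filepath", label="Upload Image")
        extracted = gr.Textbox(label="Extracted Text")
        # Handle user interactions: run OCR on demand, or clear all inputs.
        gr.Button("Perform OCR").click(ocr_with_paddle, image, extracted)
        gr.Button("Clear").click(lambda: (None, ""), None, [image, extracted])

    main_interface.launch()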
7. Interactions
User Interactions: Users initiate actions like uploading an image, starting OCR, requesting translation, and listening to audio.
System Interactions: Behind the scenes, OCR.py and translate_speak.py handle the core functionalities: text extraction, translation, and audio generation.
"""
class_diagram = '''
Explanation of the Class Diagram
1. Classes and Their Attributes/Methods
App
Attributes:
langs_list: List of supported languages for translation.
langs_dict: Dictionary of supported languages with their language codes.
main_interface: The Gradio interface instance.
Methods:
encode_image(image_path): Encodes an image file to a base64 string for display.
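encode_image() is presumably a small base64 helper along these lines (a sketch; details of the real method may differ):

    import base64

    def encode_image(image_path):
        # Read the raw image bytes and return a base64 string, suitable
        # for embedding in HTML shown by the Gradio interface.
        with open(image_path, "rb") as f:
            return base64.b64encode(f.read()).decode("utf-8")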
OCR
Methods:
ocr_with_paddle(img): Uses PaddleOCR to perform OCR on an image and return extracted text and the audio path.
TranslateSpeak
Attributes:
output_path: Path where the output audio file is saved.
translate_path: Path for the translated audio file.
Methods:
get_lang(lang): Determines the appropriate language code for text-to-speech in the target language.
audio_streaming(txt, lang, to): Converts text to audio in the specified language, saving the result to a path selected by the to argument (e.g., the original or translated audio path).
translate_txt(lang, text): Translates text to the specified language and generates audio, returning the translated text and audio path.
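A plausible shape for translate_txt(), combining the two external libraries (a sketch; the real method's paths, language handling, and error handling may differ, and note that gTTS writes MP3 data):

    from deep_translator import GoogleTranslator
    from gtts import gTTS

    translate_path = "translated_audio.mp3"   # assumed output location

    def translate_txt(lang, text):
        # Translate into the user-selected target language, then voice it.
        translated = GoogleTranslator(source="auto", target=lang).translate(text)
        gTTS(translated, lang=lang).save(translate_path)
        return translated, translate_path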
PaddleOCR (External Library Class)
Methods:
ocr(img): Accepts an image input and returns OCR results, which the OCR class uses in ocr_with_paddle().
GoogleTranslator (External Library Class)
Methods:
get_supported_languages(as_dict): Returns a list or dictionary of supported languages, depending on the argument.
translate(text, source, target): Translates the given text from source to target language.
gTTS (External Library Class)
Methods:
save(output_path): Saves the generated audio to a specified path.
2. Relationships and Interactions
App uses the OCR and TranslateSpeak classes to provide OCR and translation functionalities within the Gradio UI.
OCR interacts with the PaddleOCR library to perform OCR on user-uploaded images.
TranslateSpeak relies on GoogleTranslator to translate text and gTTS to convert translated text to audio.
'''
object_diagram = """
Explanation of the Object Diagram
1. Objects and Their Attributes
app_instance
langs_list: A list of language codes supported by the application, such as English ("en"), Spanish ("es"), and French ("fr").
langs_dict: A dictionary mapping language codes to their corresponding language names.
main_interface: The main Gradio interface used for user interactions.
ocr_instance
finaltext: Contains the extracted text from an uploaded image after OCR processing.
translate_speak_instance
output_path: Path for saving the original audio output in WAV format.
translate_path: Path for saving the translated audio in WAV format.
paddle_ocr_instance
language: The language setting for OCR processing, set to "en" (English) by default.
google_translator_instance
source: The source language code for translation, set to "en" (English).
target: The target language code for translation, here set to "es" (Spanish).
gtts_instance
lang: The language code for text-to-speech, set to "en" (English) in this example.
slow: A boolean controlling the speed of the generated speech (False indicates normal speed).
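Instantiated with the example values above, the external objects would look roughly like this (illustrative only):

    from paddleocr import PaddleOCR
    from deep_translator import GoogleTranslator
    from gtts import gTTS

    paddle_ocr_instance = PaddleOCR(lang="en")                               # OCR in English
    google_translator_instance = GoogleTranslator(source="en", target="es")  # en -> es
    gtts_instance = gTTS("Hello", lang="en", slow=False)                     # normal speed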
2. Relationships and Interactions
app_instance interacts with ocr_instance to extract text from uploaded images using PaddleOCR.
app_instance also interacts with translate_speak_instance for translating and generating audio from text.
ocr_instance utilizes paddle_ocr_instance to perform the OCR process.
translate_speak_instance communicates with google_translator_instance to translate text and uses gtts_instance to convert the translated text to audio.
Each object in this diagram represents a specific instance in the application, showcasing how the components work together to provide OCR, translation, and audio features.
"""
sequence_diagram = '''
Explanation of the Sequence Diagram
Overview
This sequence diagram illustrates how the components of the application interact during a typical workflow where a user uploads an image, the system extracts text from the image using OCR, translates the text into a different language, and finally generates audio from the translated text.
Sequence of Events
User Uploads an Image:
The process begins when the User uploads an image to the application, which sends the image to the App component.
Performing OCR:
The App calls the ocr_with_paddle(image) method from the OCR class to process the uploaded image.
The OCR class invokes the ocr(image) method of the PaddleOCR instance to extract text from the image.
PaddleOCR performs the OCR operation and returns the extracted text to the OCR class.
The OCR class then returns the extracted text to the App.
Translating the Text:
The App now calls the translate_txt(lang, extractedText) method from the TranslateSpeak class to translate the extracted text into the desired language.
The TranslateSpeak class invokes the translate(extractedText, source, target) method of the GoogleTranslator to perform the translation.
GoogleTranslator returns the translated text to the TranslateSpeak class.
Generating Audio:
The TranslateSpeak class then calls its own audio_streaming(translatedText, lang, to) method, which uses the gTTS class to generate audio from the translated text.
gTTS performs the text-to-speech operation, and the resulting audio file path is returned to the TranslateSpeak class.
Returning Results to the User:
Finally, the TranslateSpeak class returns both the translated text and the audio path back to the App.
The App displays the translated text and provides an audio playback option for the user.
'''
colab_diagram = '''
Explanation of the Collaboration Diagram
Overview
The collaboration diagram illustrates how different components of the "Linguistic Lens Application" interact with each other to fulfill a particular user request, such as uploading an image, performing OCR, translating the text, and generating audio. This diagram focuses on the relationships and messages exchanged between the components rather than the chronological order of operations.
Interactions Between Components
User Uploads an Image:
The interaction begins when the User uploads an image to the App.
OCR Processing:
The App sends the uploaded image to the OCR component to process the image.
The OCR component invokes the ocr(image) method on the PaddleOCR component to extract text from the image.
The PaddleOCR component processes the image and returns the extracted text to the OCR component.
The OCR component then returns the extracted text to the App.
Translation of Text:
The App then calls the translate_txt(lang, extractedText) method on the TranslateSpeak component, passing in the extracted text and the target language.
The TranslateSpeak component requests the translation from the GoogleTranslator, invoking the translate(extractedText, source, target) method.
GoogleTranslator processes the request and returns the translated text back to the TranslateSpeak component.
Generating Audio from Translated Text:
The TranslateSpeak component then calls its audio_streaming(translatedText, lang, to) method, which uses the gTTS component to convert the translated text into audio.
The gTTS component processes the text-to-speech request, and the audio file path is returned to the TranslateSpeak component.
Returning Results to the User:
Finally, the TranslateSpeak component sends both the translated text and the audio path back to the App.
The App displays the translated text and provides an option for the User to play the audio.
Key Points
The collaboration diagram emphasizes the connections and messages exchanged between components rather than the sequence of operations, showcasing how components work together to provide the application's functionalities.
Each component plays a crucial role in the workflow, facilitating the operations needed to fulfill user requests efficiently.
This diagram helps in understanding the architecture of the application and the responsibilities of each component within the system. It is particularly useful for identifying how components are interrelated and how data flows through the application.
'''
component_diagram = '''
Explanation of the Component Diagram
Overview
The component diagram illustrates the architecture of the "Linguistic Lens Application" by showing the major components, their roles, and the relationships between them. This diagram helps in understanding how the system is structured and how each part contributes to the overall functionality.
Components and Their Responsibilities
App:
Type: Component
Responsibilities:
Acts as the main interface for the user, handling interactions and orchestrating the workflow of the application.
Manages user input, such as image uploads and language selections.
OCR:
Type: Component
Responsibilities:
Responsible for performing Optical Character Recognition on uploaded images.
Invokes the PaddleOCR component to extract text from images.
TranslateSpeak:
Type: Component
Responsibilities:
Handles the translation of text extracted by the OCR component.
Converts both the original and translated text into audio using the gTTS component.
PaddleOCR:
Type: Component
Responsibilities:
An external library component responsible for executing OCR to extract text from images.
Provides methods to process images and return recognized text.
GoogleTranslator:
Type: Component
Responsibilities:
Translates text from one language to another using Google's translation service.
Provides access to a list of supported languages.
gTTS (Google Text-to-Speech):
Type: Component
Responsibilities:
Converts text to speech using Google's text-to-speech engine.
Saves the generated audio files for playback.
Relationships
The App component interacts with both the OCR and TranslateSpeak components to facilitate the application’s main functionalities.
The OCR component relies on the PaddleOCR library for text extraction.
The TranslateSpeak component uses GoogleTranslator for translation tasks and gTTS for generating audio output from the text.
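In code, those relationships reduce to a small import graph (module names follow the files named earlier; a sketch):

    # app.py -- depends on the two internal components
    from OCR import ocr_with_paddle
    from translate_speak import translate_txt

    # OCR.py -- depends on the external PaddleOCR library
    from paddleocr import PaddleOCR

    # translate_speak.py -- depends on the external translation and TTS libraries
    from deep_translator import GoogleTranslator
    from gtts import gTTS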
This component diagram provides a high-level view of the application’s architecture, highlighting how different components work together to deliver the desired functionality. It helps stakeholders understand the modular structure and promotes better maintenance and scalability.
'''
activity_diagram = '''
Explanation of the Activity Diagram
Overview
The activity diagram illustrates the flow of activities in the "Linguistic Lens Application" from the moment the user uploads an image to the final display of translated text and audio playback. This diagram is useful for visualizing the dynamic behavior of the system and understanding how different processes interact.
Activity Flow
User Uploads Image:
The process begins with the User uploading an image.
App Receives Image:
The App component receives the uploaded image from the user.
OCR Processing:
The App calls the ocr_with_paddle(image) method from the OCR component.
The OCR component extracts text from the image by invoking the PaddleOCR component to perform OCR.
PaddleOCR Performs OCR:
The PaddleOCR component processes the image and returns the extracted text to the OCR component.
OCR Returns Extracted Text:
The OCR component sends the extracted text back to the App.
App Displays Extracted Text:
The App displays the extracted text for the user to see.
User Selects Target Language:
The User selects a target language for translation.
App Calls Translate Function:
The App calls the translate_txt(lang, extractedText) method from the TranslateSpeak component to translate the extracted text.
TranslateSpeak Calls GoogleTranslator:
The TranslateSpeak component invokes the GoogleTranslator to translate the extracted text into the selected language.
GoogleTranslator Returns Translated Text:
GoogleTranslator processes the request and returns the translated text to the TranslateSpeak component.
TranslateSpeak Calls gTTS for Audio Generation:
The TranslateSpeak component calls the gTTS component to generate audio from the translated text.
gTTS Returns Audio Path:
The gTTS component processes the text-to-speech request and returns the audio file path to the TranslateSpeak component.
TranslateSpeak Returns Results to App:
The TranslateSpeak component sends both the translated text and the audio path back to the App.
App Displays Translated Text and Provides Audio Playback:
Finally, the App displays the translated text and provides an option for the user to play the audio.
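The two user-triggered activities above map naturally onto two Gradio event handlers (a sketch that assumes the ocr_with_paddle and translate_txt functions and the langs_list dropdown values described earlier; the real app.py may wire these differently):

    import gradio as gr

    with gr.Blocks() as demo:
        image = gr.Image(type="filepath", label="Upload Image")
        extracted = gr.Textbox(label="Extracted Text")
        lang = gr.Dropdown(choices=langs_list, label="Target Language")
        translated = gr.Textbox(label="Translated Text")
        audio = gr.Audio(label="Translated Audio")

        # Upload -> OCR -> display extracted text.
        gr.Button("Extract Text").click(ocr_with_paddle, image, extracted)
        # Select language -> translate -> show text and enable audio playback.
        gr.Button("Translate").click(translate_txt, [lang, extracted],
                                     [translated, audio])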
Key Points
This activity diagram provides a clear representation of the workflow within the application, making it easy to follow the sequence of operations.
It highlights the interactions between the user and the application, as well as between the various components involved in processing the user's request.
This diagram is beneficial for stakeholders and developers to understand the application's flow, facilitating better communication and ensuring a smoother development process.