---
license: mit
datasets:
  - mwz/ur_para
language:
  - ur
tags:
  - paraphrase
pipeline_tag: text2text-generation
---

# Urdu Paraphrasing Model

This repository contains a pretrained model for Urdu paraphrasing. The model is based on the BERT architecture and has been fine-tuned on a large dataset of Urdu paraphrases.

## Model Description

The model follows the BERT architecture and is fine-tuned specifically for paraphrase generation in Urdu. It was trained on a large corpus of Urdu text to produce fluent, meaning-preserving paraphrases.

## Model Details

- **Model Name:** Urdu-Paraphrasing-BERT
- **Base Model:** BERT
- **Architecture:** Transformer
- **Language:** Urdu
- **Dataset:** Urdu Paraphrasing Dataset (`mwz/ur_para`)

## How to Use

You can use this pretrained model to generate paraphrases of Urdu text. Here's an example of how to use the model:

```python
from transformers import pipeline

# Load the model (replace the repository ID with a local path if needed)
model = pipeline("text2text-generation", model="mwz/UrduParaphraseBERT")

# Generate paraphrases (beam search is required to return multiple sequences)
input_text = "Urdu input text for paraphrasing."
paraphrases = model(input_text, max_length=128, num_beams=3, num_return_sequences=3)

# Print the generated paraphrases
print("Original Input Text:", input_text)
print("Generated Paraphrases:")
for paraphrase in paraphrases:
    print(paraphrase["generated_text"])
```

## Training

The model was trained with the Hugging Face `transformers` library by fine-tuning the base BERT model on the Urdu Paraphrasing Dataset (`mwz/ur_para`).
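As an illustration only, shaping paraphrase pairs into seq2seq training examples might look like the sketch below. The field names and the `"paraphrase: "` task prefix are assumptions, not the documented schema of `mwz/ur_para`; check the dataset before reusing this.

```python
# A minimal sketch of turning (sentence, paraphrase) pairs into
# input/target examples for seq2seq fine-tuning. The task prefix is a
# common text2text convention; whether this model used one is an
# assumption.

def build_examples(pairs, prefix="paraphrase: "):
    """Convert (sentence, paraphrase) pairs into input/target dicts."""
    examples = []
    for sentence, paraphrase in pairs:
        examples.append({
            "input_text": prefix + sentence,
            "target_text": paraphrase,
        })
    return examples

pairs = [("pehla jumla", "pehle jumle ka mutabadil")]
examples = build_examples(pairs)
print(examples[0]["input_text"])  # → "paraphrase: pehla jumla"
```

The resulting dicts can then be tokenized and passed to a standard `transformers` training loop.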

## Evaluation

The model's performance was evaluated on a held-out validation set using metrics such as BLEU, ROUGE, and perplexity. Note that results will vary with the domain and style of the input text.
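To make the BLEU metric mentioned above concrete, the sketch below computes modified n-gram precision, the core quantity behind BLEU. This is a simplified illustration, not the evaluation script used for this model: full BLEU combines several n-gram orders and applies a brevity penalty.

```python
from collections import Counter

def ngram_precision(candidate, reference, n=2):
    """Modified n-gram precision between a candidate and one reference.

    Tokens are split on whitespace; each candidate n-gram is credited
    at most as many times as it appears in the reference.
    """
    cand = candidate.split()
    ref = reference.split()
    cand_ngrams = Counter(tuple(cand[i:i + n]) for i in range(len(cand) - n + 1))
    ref_ngrams = Counter(tuple(ref[i:i + n]) for i in range(len(ref) - n + 1))
    if not cand_ngrams:
        return 0.0
    overlap = sum(min(count, ref_ngrams[g]) for g, count in cand_ngrams.items())
    return overlap / sum(cand_ngrams.values())

# 2 of the candidate's 3 bigrams also occur in the reference
score = ngram_precision("yeh ek misal hai", "yeh ek acchi misal hai", n=2)
print(round(score, 2))  # → 0.67
```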

## Acknowledgments

- The pretrained model is based on the BERT architecture developed by Google Research.

## License

This model and the associated code are licensed under the MIT License.