BasePersianTextFormalizer

This model converts informal Persian text into formal Persian text. It was fine-tuned from the pretrained parsT5-base model on the Mohavere Dataset (Takalli, Vahideh; Kalantari, Fateme; Shamsfard, Mehrnoush. Developing an Informal-Formal Persian Corpus, 2022).

Usage

from transformers import T5ForConditionalGeneration, AutoTokenizer
import torch

model = T5ForConditionalGeneration.from_pretrained('PardisSzah/BasePersianTextFormalizer')
tokenizer = AutoTokenizer.from_pretrained('PardisSzah/BasePersianTextFormalizer')

# Run on the GPU when one is available, otherwise on the CPU.
device = torch.device('cuda' if torch.cuda.is_available() else 'cpu')
model.to(device)

def test_model(text):
  # The model expects the "informal: " task prefix in front of the input text.
  inputs = tokenizer("informal: " + text, return_tensors='pt', max_length=128, truncation=True).to(device)

  # Beam search with 4 beams, capped at 128 output tokens.
  outputs = model.generate(**inputs, max_length=128, num_beams=4)
  print("Output:", tokenizer.decode(outputs[0], skip_special_tokens=True))

text = "به یکی از دوستام میگم که چرا اینکار رو میکنی چرا به فکرت نباید برسه "
print("Original:", text)
test_model(text)

# output:  به یکی از دوستانم می گویم که چرا اینکار را می کنی چرا به فکرت نباید برسد

text = "اسم من پردیسه و خوشحالم که از این مدل خوشتون اومده "
print("Original:", text)
test_model(text)

# output:  اسم من پردیس است و خوشحالم که از این مدل خوشتان آمده است
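
The snippet above prints one result per call. When several sentences need to be formalized, a batched helper that returns strings can be more convenient. The following is a minimal sketch, not part of the model's official usage: the names `PREFIX`, `build_prompts`, `formalize_batch`, and `main` are hypothetical, and it assumes the same `informal: ` prefix and generation settings (`max_length=128`, `num_beams=4`) shown above.

```python
from typing import List

# Task prefix taken from the usage snippet above.
PREFIX = "informal: "


def build_prompts(texts: List[str]) -> List[str]:
    """Prepend the task prefix to every input sentence."""
    return [PREFIX + t for t in texts]


def formalize_batch(texts, model, tokenizer, device="cpu",
                    max_length=128, num_beams=4) -> List[str]:
    """Formalize several sentences in one generate() call and return the strings."""
    enc = tokenizer(
        build_prompts(texts),
        return_tensors="pt",
        padding=True,          # pad only up to the longest prompt in the batch
        truncation=True,
        max_length=max_length,
    ).to(device)
    # The attention mask from the tokenizer tells the model to ignore padding.
    outputs = model.generate(**enc, max_length=max_length, num_beams=num_beams)
    return tokenizer.batch_decode(outputs, skip_special_tokens=True)


def main():
    # Heavy imports stay local so the helpers above can be imported cheaply.
    import torch
    from transformers import AutoTokenizer, T5ForConditionalGeneration

    name = "PardisSzah/BasePersianTextFormalizer"
    device = "cuda" if torch.cuda.is_available() else "cpu"
    model = T5ForConditionalGeneration.from_pretrained(name).to(device)
    tokenizer = AutoTokenizer.from_pretrained(name)

    texts = [
        "به یکی از دوستام میگم که چرا اینکار رو میکنی چرا به فکرت نباید برسه ",
        "اسم من پردیسه و خوشحالم که از این مدل خوشتون اومده ",
    ]
    for line in formalize_batch(texts, model, tokenizer, device):
        print(line)
```

Calling `main()` downloads the model and prints one formalized sentence per input. Passing `model` and `tokenizer` as arguments keeps `formalize_batch` free of global state, so the same loaded model can be reused across calls.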

Model size: 248M params (Safetensors, tensor type F32)