---
license: llama3
inference:
  parameters:
    num_beams: 3
    num_beam_groups: 3
    num_return_sequences: 1
    repetition_penalty: 10
    diversity_penalty: 3.01
    no_repeat_ngram_size: 2
    temperature: 0.8
    max_length: 128
widget:
  - text: >-
      Learn to build generative AI applications with an expert AWS instructor with the 2-day Developing Generative AI Applications on AWS course.
    example_title: AWS course
  - text: >-
      In healthcare, Generative AI can help generate synthetic medical data to train machine learning models, develop new drug candidates, and design clinical trials.
    example_title: Generative AI
  - text: >-
      By leveraging prior model training through transfer learning, fine-tuning can reduce the amount of expensive computing power and labeled data needed to obtain large models tailored to niche use cases and business needs.
    example_title: Fine Tuning
---

# Text Rewriter Paraphraser

This repository contains a text-rewriting model fine-tuned from T5-Base (223M parameters).

## Key Features:

* **Fine-tuned on t5-base:** Builds on a pre-trained text-to-text transfer model for effective paraphrasing.
* **Large Dataset (430k examples):** Trained on a dataset that combines three open-source sources and was cleaned with several techniques before training.
* **High Quality Paraphrases:** Generates paraphrases that significantly alter sentence structure while preserving meaning and factual correctness.
* **Non-AI Detectable:** Aims to produce paraphrases that read naturally and are indistinguishable from human-written text.

**Model Performance:**

* Train Loss: 1.0645
* Validation Loss: 0.8761

## Getting Started:

```python
from transformers import AutoTokenizer, AutoModelForSeq2SeqLM

# Replace 'YOUR_TOKEN' with your actual Hugging Face access token
tokenizer = AutoTokenizer.from_pretrained("Ateeqq/Text-Rewriter-Paraphraser", token='YOUR_TOKEN')
model = AutoModelForSeq2SeqLM.from_pretrained("Ateeqq/Text-Rewriter-Paraphraser", token='YOUR_TOKEN')
```

```python
text = "Data science is a field that deals with extracting knowledge and insights from data."

# Tokenize the input and return PyTorch tensors
inputs = tokenizer(text, return_tensors="pt")
output = model.generate(**inputs, max_length=50)

# skip_special_tokens drops markers such as <pad> and </s> from the printed text
print(tokenizer.decode(output[0], skip_special_tokens=True))
```

**Disclaimer:**

* Limited Use: This model is released under the same terms as Llama-3, which grant a non-exclusive, non-transferable license to use it. This means you can't freely share it with others or sell the model itself.
* Commercial Use Allowed: You can use the model for commercial purposes, but only under the terms of the license agreement.
* Attribution Required: You need to abide by the agreement's terms regarding attribution.

It is essential to use the paraphrased text responsibly and ethically, with proper attribution of the original source.

**Further Development:**

(Mention any ongoing development or areas for future improvement in Discussions.)
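
The inference parameters in this card's metadata describe diverse (group) beam search, while the Getting Started snippet only sets `max_length`. The sketch below shows one way those settings might be passed to `generate`. The names `GENERATION_KWARGS`, `check_generation_kwargs`, and `paraphrase` are hypothetical helpers and not part of this repository. `temperature` is left out on the assumption that it only matters when sampling is enabled. Depending on the installed `transformers` version, group beam search may be deprecated or need extra steps, so treat this as a starting point rather than a reference implementation.

```python
# Decoding settings copied from the `inference.parameters` block in this card's metadata.
# `temperature` is omitted: it is assumed to apply only when sampling, which is not used here.
GENERATION_KWARGS = {
    "num_beams": 3,
    "num_beam_groups": 3,
    "num_return_sequences": 1,
    "repetition_penalty": 10.0,
    "diversity_penalty": 3.01,
    "no_repeat_ngram_size": 2,
    "max_length": 128,
}


def check_generation_kwargs(kwargs):
    """Sanity-check the constraints that group beam search imposes (hypothetical helper)."""
    if kwargs["num_beams"] % kwargs["num_beam_groups"] != 0:
        raise ValueError("num_beams must be divisible by num_beam_groups")
    if kwargs["num_return_sequences"] > kwargs["num_beams"]:
        raise ValueError("num_return_sequences cannot exceed num_beams")
    if kwargs["num_beam_groups"] > 1 and kwargs["diversity_penalty"] <= 0.0:
        raise ValueError("diversity_penalty should be positive when using beam groups")
    return kwargs


def paraphrase(text, model, tokenizer, **overrides):
    """Return a list of paraphrases for `text` (hypothetical helper).

    `model` and `tokenizer` are the objects loaded in Getting Started;
    keyword overrides replace individual entries of GENERATION_KWARGS.
    """
    kwargs = check_generation_kwargs({**GENERATION_KWARGS, **overrides})
    inputs = tokenizer(text, return_tensors="pt", truncation=True)
    outputs = model.generate(**inputs, **kwargs)
    return tokenizer.batch_decode(outputs, skip_special_tokens=True)
```

With the model and tokenizer loaded as in Getting Started, `paraphrase(text, model, tokenizer)` returns a list with one string, and `paraphrase(text, model, tokenizer, num_return_sequences=3)` asks for one candidate per beam.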