---
language:
- en
license:
- cc-by-nc-sa-4.0
- apache-2.0
tags:
- grammar
- spelling
- punctuation
- error-correction
- grammar synthesis
- FLAN
- C4
datasets:
- C4
widget:
- text: "Me go to the store yesterday and buy many thing. I saw a big dog but he no bark at me. Then I walk home and eat my lunch, it was delicious sandwich. After that, I watch TV and see a funny show about cat who can talk. I laugh so hard I cry. Then I go to bed but I no can sleep because I too excited about the cat show."
  example_title: "Long-Text 1"
- text: "Me and my family go on a trip to the mountains last week. We drive for many hours and finally reach our cabin. The cabin was cozy and warm, with a fireplace and big windows. We spend our days hiking and exploring the forest. At night, we sit by the fire and tell story. It was a wonderful vacation."
  example_title: "Long-Text 2"
- text: "so em if we have an now so with fito ringina know how to estimate the tren given the ereafte mylite trend we can also em an estimate is nod s i again tort watfettering an we have estimated the trend an called wot to be called sthat of exty right now we can and look at wy this should not hare a trend i becan we just remove the trend an and we can we now estimate tesees ona effect of them exty"
  example_title: "Transcribed Audio Example"
- text: "My coworker said he used a financial planner to help choose his stocks so he wouldn't loose money."
  example_title: "Incorrect Word Choice"
- text: "good so hve on an tadley i'm not able to make it to the exla session on monday this week e which is why i am e recording pre recording an this excelleision and so to day i want e to talk about two things and first of all em i wont em wene give a summary er about ta ohow to remove trents in these nalitives from time series"
  example_title: "Lowercased Audio Transcription Output"
parameters:
  max_length: 128
  min_length: 4
  num_beams: 8
  repetition_penalty: 1.21
  length_penalty: 1
  early_stopping: true
---

# Grammar-Synthesis-Enhanced: FLAN-T5

<a href="https://colab.research.google.com/gist/Aelzi/25fee0b38c4687a2e9821d87980bbb09/demo-flan-t5-large-grammar-synthesis.ipynb"> |
|
<img src="https://colab.research.google.com/assets/colab-badge.svg" alt="Open In Colab"/> |
|
</a> |
|
|
|
This model is a fine-tuned version of [pszemraj/flan-t5-large-grammar-synthesis](https://huggingface.co/pszemraj/flan-t5-large-grammar-synthesis), trained on the C4 200M dataset for the NaraSpeak application (Bangkit 2024, ENTR-H130).

## T5 Model Overview

The T5 (Text-To-Text Transfer Transformer) model, introduced by Google Research, is a transformer-based model that treats every NLP task as a text-to-text problem. This unified approach allows T5 to excel at a variety of tasks, such as translation, summarization, and question answering, by converting inputs and outputs into text format.

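As a concrete illustration of the text-to-text format, here is a minimal sketch using the public `t5-small` checkpoint (chosen only for illustration; it is not this model), where a natural-language prefix names the task:

```python
from transformers import T5ForConditionalGeneration, T5Tokenizer

tokenizer = T5Tokenizer.from_pretrained('t5-small')
model = T5ForConditionalGeneration.from_pretrained('t5-small')

# The task prefix turns translation into a plain text-in, text-out problem
inputs = tokenizer('translate English to German: The house is wonderful.', return_tensors='pt')
outputs = model.generate(**inputs, max_new_tokens=40)
print(tokenizer.decode(outputs[0], skip_special_tokens=True))
```
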
### Transformer Architecture

Transformers are a type of deep learning model designed for sequence-to-sequence tasks. They utilize a mechanism called "attention" to weigh the influence of different words in a sequence, allowing the model to focus on relevant parts of the input when generating each word in the output. This architecture is highly parallelizable and has proven effective in NLP tasks.

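To make the attention mechanism concrete, below is a minimal NumPy sketch of scaled dot-product attention, the core operation inside a transformer layer; the shapes and values are illustrative only:

```python
import numpy as np

def scaled_dot_product_attention(Q, K, V):
    """softmax(Q K^T / sqrt(d_k)) V: each output row is a weighted
    average of the value vectors."""
    d_k = Q.shape[-1]
    scores = Q @ K.T / np.sqrt(d_k)                 # query-key similarities
    scores -= scores.max(axis=-1, keepdims=True)    # numerical stability
    weights = np.exp(scores)
    weights /= weights.sum(axis=-1, keepdims=True)  # softmax over the keys
    return weights @ V

# Toy example: a sequence of 3 tokens with 4-dimensional embeddings
rng = np.random.default_rng(0)
Q, K, V = (rng.normal(size=(3, 4)) for _ in range(3))
print(scaled_dot_product_attention(Q, K, V).shape)  # (3, 4)
```
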
## Usage in Python

After `pip install transformers`, run the following code:

```python
from transformers import pipeline

# Load the fine-tuned grammar-correction checkpoint as a
# text2text-generation pipeline
corrector = pipeline(
    'text2text-generation',
    'farelzii/GEC_Test_v1',
)

# Pass raw (possibly ungrammatical) text and print the corrected output
raw_text = 'i can has cheezburger'
results = corrector(raw_text)
print(results)
```

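The widget `parameters` in the metadata above correspond to standard `generate` keyword arguments, so they can also be passed explicitly at call time; a sketch, with values copied from the metadata:

```python
# Reuse the generation settings declared in the YAML metadata above
results = corrector(
    raw_text,
    max_length=128,
    min_length=4,
    num_beams=8,
    repetition_penalty=1.21,
    length_penalty=1.0,
    early_stopping=True,
)
print(results[0]['generated_text'])
```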