---
title: The Language Transliteration Project
emoji: 🐠
colorFrom: indigo
colorTo: gray
sdk: streamlit
sdk_version: 1.25.0
app_file: app.py
pinned: false
license: cc
---
Use this application on HuggingFace🤗 :- https://huggingface.co/spaces/DebasishDhal99/The-Language-Transliteration-Project
Blog discussing the results :- https://medium.com/@debasishdhaldd99/simplifying-language-through-python-aae6ee7113d9
This space aims to help people become familiar with the spelling systems of Polish, Turkish, Hungarian, Serbo-Croatian-Bosnian (both Latin and Cyrillic based) and Romanian.
These languages use modified Latin scripts with many diacritic marks and digraphs, which makes it difficult for non-native speakers to read or pronounce words
properly. This space offers simplified spellings of words and sentences in these languages. More languages are in the pipeline.
For example, an English speaker who isn't familiar with Polish orthography will likely read the Polish name Jarosław as Jaroslav, while its actual Polish pronunciation
is closer to Yaroswav. Similarly, the city name Przemyśl should be pronounced roughly as Pzhemyshl, even though that is not evident to an English speaker.
The approach taken in this space for transliterating Polish is to convert Polish character combinations to their Cyrillic equivalents, which are single characters, thus
simplifying the task greatly.
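
The idea of going through single Cyrillic characters can be illustrated with a minimal Python sketch. It is not the code used in `app.py`: the function name `simplify_polish` and both mapping tables are hypothetical and deliberately incomplete, covering only enough letters to reproduce the two examples above. The sketch also lowercases its input, which the real application may not do.

```python
import unicodedata

# Hypothetical, incomplete table (an assumption, not the app's actual mapping).
# Step 1: Polish digraphs and special letters -> one Cyrillic character each.
POLISH_TO_CYRILLIC = {
    "sz": "ш",
    "cz": "ч",
    "rz": "ж",
    "ch": "х",
    "ż": "ж",
    "ś": "ш",
    "ł": "ў",
    "w": "в",
    "j": "й",
}

# Hypothetical table for step 2: Cyrillic character -> simplified spelling.
CYRILLIC_TO_SIMPLE = {
    "ш": "sh",
    "ч": "ch",
    "ж": "zh",
    "х": "kh",
    "ў": "w",
    "в": "v",
    "й": "y",
}


def simplify_polish(text: str) -> str:
    """Return a rough, lowercased simplified spelling of a Polish word."""
    # Normalize so that letters such as "ś" are single code points.
    result = unicodedata.normalize("NFC", text).lower()
    # Replace longer keys first so digraphs are handled before single letters.
    for key in sorted(POLISH_TO_CYRILLIC, key=len, reverse=True):
        result = result.replace(key, POLISH_TO_CYRILLIC[key])
    # Each mapped sound is now one character, so a per-character lookup suffices.
    return "".join(CYRILLIC_TO_SIMPLE.get(ch, ch) for ch in result)


print(simplify_polish("Jarosław"))  # yaroswav
print(simplify_polish("Przemyśl"))  # pzhemyshl
```

One reason an intermediate alphabet helps in a sketch like this: replacing ł with w and then w with v directly in Latin script would turn every ł into v, whereas distinct Cyrillic placeholders keep the two sounds apart until the final lookup.
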
Features added as of now:-
- Support for Polish, Turkish, Hungarian, Serbo-Croatian-Bosnian and Romanian.
- Option for the user to choose any of the 3-4 available examples and pass it as input to the model.
- Option for the user to generate a random but coherent sentence and pass it as input to the model. This acts as a playground for the user.
Check out the configuration reference at https://huggingface.co/docs/hub/spaces-config-reference