---
language:
- sr
tags:
- Srpski
- Serbian
- GPT2
- generisanje
- generation
name:
- Serbian-GPT-2
---

# The Best Generative GPT-2 Model For The Serbian Language

**NOTE**: This model is encrypted with a key. If you need a decryption key, feel free to contact us at info@edukom.rs.

By sharing this model, we aim to foster further research and applications in Serbian language processing.

### Introduction:

This GPT-2 model has been tuned on an extensive Serbian corpus of 750 million tokens. It is designed to generate high-quality Serbian text that captures the nuances and intricacies of the language.

### Dataset Details:

The dataset covers a diverse range of topics that represent various aspects of the Serbian language and culture. Size: 750 million tokens.

### Model Usage:

This model can be used for various NLP tasks, such as text generation, summarization, and translation. Because it was trained on a large Serbian corpus, it is intended to produce accurate, contextually relevant output, especially for Serbian-language tasks.

### Downloading & Decrypting the Model:

```python
import os
import requests
import shutil
import threading
import time
from transformers import GPT2LMHeadModel
from cryptography.fernet import Fernet

os.environ['TF_CPP_MIN_LOG_LEVEL'] = '3'  # reduce TensorFlow's C++ log output

# Download Serbian-GPT-2 model
print("\nDownload Serbian-GPT-2 model...")
model_name = 'edukom/Serbian-GPT-2'
base_url = f'https://huggingface.co/{model_name}/resolve/main/'
files_to_download = ['added_tokens.json', 'config.json', 'generation_config.json', 'merges.txt', 'pytorch_model.bin', 'special_tokens_map.json', 'tokenizer.json', 'tokenizer_config.json', 'vocab.json']

cache_dir = 'path/to/where/you/want/to/store/the/model'
os.makedirs(cache_dir, exist_ok=True)  # the directory must exist before files are written into it

for file_name in files_to_download:
    response = requests.get(base_url + file_name)
    response.raise_for_status()  # stop on a failed download instead of saving an error page
    with open(os.path.join(cache_dir, file_name), 'wb') as f:
        f.write(response.content)

# Decrypt pytorch_model.bin
decryption_data = os.path.join(cache_dir, 'pytorch_model.bin')

try:
    key = input("\nEnter the decryption key: ").strip().encode()
    cipher_suite = Fernet(key)  # raises ValueError for a malformed key

    with open(decryption_data, 'rb') as file:
        encrypted_data = file.read()

    decrypted_data = cipher_suite.decrypt(encrypted_data)  # raises InvalidToken for a wrong key

    with open(decryption_data, 'wb') as file:
        file.write(decrypted_data)

    def find_and_copy():
        # Wait for from_pretrained() to create the Hugging Face cache snapshot directory,
        # then place the decrypted weights there so they are picked up instead of the encrypted copy.
        base_snapshot_dir = os.path.join(cache_dir, 'models--edukom--Serbian-GPT-2', 'snapshots')

        while not os.path.exists(base_snapshot_dir):
            time.sleep(0.1)

        while True:
            existing_dirs = [d for d in os.listdir(base_snapshot_dir) if os.path.isdir(os.path.join(base_snapshot_dir, d))]
            if existing_dirs:
                destination_path = os.path.join(base_snapshot_dir, existing_dirs[0], 'pytorch_model.bin')
                shutil.copyfile(decryption_data, destination_path)
                break
            time.sleep(0.1)

    # Start the copy process in parallel; daemon=True lets the script exit if loading fails
    copy_thread = threading.Thread(target=find_and_copy, name="find_and_copy", daemon=True)
    copy_thread.start()

    # Load the Serbian-GPT-2 model
    model = GPT2LMHeadModel.from_pretrained(model_name, cache_dir=cache_dir)

    # Ensure the copying finishes
    copy_thread.join()

    print("\nCongratulations, the Serbian-GPT-2 model is ready for use ヅ\n")

except Exception as e:
    print(f"\nError during decryption: {e}")
    print("\nYou can decrypt the model by contacting the author of this model who will add the key, email: info@edukom.rs")

# Now you can use the Serbian-GPT-2 model for further operations...
```
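
The script above trusts the pasted key and polls for the cache snapshot directory without a time limit, so a mistyped key or a failed download surfaces late, or not at all in the polling loop. Below is a minimal, stdlib-only sketch of two pre-flight checks under stated assumptions: a Fernet key is 32 bytes encoded as URL-safe base64 (the format `cryptography`'s `Fernet` requires), and the Hugging Face cache stores a repository under `models--<org>--<name>/snapshots/<revision>`, the layout `find_and_copy` relies on. The helper names `looks_like_fernet_key` and `wait_for_snapshot_dir` and their `timeout`/`poll` parameters are hypothetical; they are not part of the original script or of any library.

```python
import base64
import binascii
import os
import time
from typing import Optional


def looks_like_fernet_key(key: bytes) -> bool:
    """Hypothetical helper: True if `key` is URL-safe base64 that decodes to
    exactly 32 bytes, the same shape check Fernet applies to its key."""
    try:
        raw = base64.urlsafe_b64decode(key)
    except (binascii.Error, ValueError):
        return False
    return len(raw) == 32


def wait_for_snapshot_dir(cache_dir: str, repo_id: str,
                          timeout: float = 600.0, poll: float = 0.1) -> Optional[str]:
    """Hypothetical helper: poll for the first snapshot directory of `repo_id`.

    Assumes the Hugging Face cache layout models--<org>--<name>/snapshots/<revision>.
    Returns the snapshot path, or None if none appears within `timeout` seconds.
    """
    repo_folder = "models--" + repo_id.replace("/", "--")
    base_snapshot_dir = os.path.join(cache_dir, repo_folder, "snapshots")
    deadline = time.monotonic() + timeout
    while time.monotonic() < deadline:
        if os.path.isdir(base_snapshot_dir):
            # Only directories count; stray files in snapshots/ are ignored
            subdirs = sorted(
                d for d in os.listdir(base_snapshot_dir)
                if os.path.isdir(os.path.join(base_snapshot_dir, d))
            )
            if subdirs:
                return os.path.join(base_snapshot_dir, subdirs[0])
        time.sleep(poll)
    return None
```

In the script, `looks_like_fernet_key(key)` could be checked before constructing `Fernet(key)`, and `find_and_copy` could call `wait_for_snapshot_dir(cache_dir, model_name)` and skip the copy when it returns `None`, so a failed download ends with an error instead of an endless wait.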

### Model Usage License:

The author of this model is the company **Edukom AI**. The model is protected by encryption and its use requires a decryption key.

This model is available under the following license:

**For private and non-public use**: This model is freely available for use without any additional obligations. You can use it in your internal projects and experiments without any restrictions.

**For commercial use**: To use this model commercially, contact Edukom AI to obtain the appropriate license and agreement.

Please adhere to the license terms when using this model. For any questions, or if you need decryption keys, feel free to contact us at **info@edukom.rs**.

Thank you for using our model! ヅ