---
license: apache-2.0
datasets:
- Ericu950/Papyri_1
base_model:
- meta-llama/Meta-Llama-3.1-8B-Instruct
library_name: transformers
tags:
- papyrology
- epigraphy
- philology
---

# Papy_1_Llama-3.1-8B-Instruct_date

This is a fine-tuned version of the Llama-3.1-8B-Instruct model, specialized in assigning a date to Greek documentary papyri. On a test set of 1,856 unseen papyri, its predictions were, on average, 21.7 years away from the actual date spans.

## Dataset

This model was fine-tuned on the Ericu950/Papyri_1 dataset, which consists of Greek documentary papyri texts and their corresponding dates, sourced from the amazing Papyri.info.
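If you want to inspect the training data yourself, the dataset can be loaded with the 🤗 `datasets` library. A minimal sketch (it makes no assumption about split names, and just prints whatever splits exist):

```python
from datasets import load_dataset

# Load the dataset from the Hugging Face Hub and peek at its structure.
dataset = load_dataset("Ericu950/Papyri_1")
print(dataset)  # shows the available splits and their features

first_split = next(iter(dataset))
print(dataset[first_split][0])  # first example of the first split
```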
## Usage

To run the model on a GPU with sufficient memory, follow these steps:

### 1. Download and load the model

```python
from transformers import pipeline, AutoTokenizer, LlamaForCausalLM
import torch

model_id = "Ericu950/Papy_1_Llama-3.1-8B-Instruct_date"

model = LlamaForCausalLM.from_pretrained(
    model_id,
    torch_dtype=torch.bfloat16,  # half precision roughly halves memory use vs. the fp32 default
    device_map="auto",
)

tokenizer = AutoTokenizer.from_pretrained(model_id)

generation_pipeline = pipeline(
    "text-generation",
    model=model,
    tokenizer=tokenizer,
    device_map="auto",
)
```

### 2. Run inference on a papyrus fragment of your choice

```python
# This is a rough transcription of Pap.Ups. 106
papyrus_edition = """
ετουσ τεταρτου αυτοκρατοροσ καισαροσ ουεσπασιανου σεβαστου ------------------
ομολογει παυσιριων απολλωνιου του παυσιριωνοσ μητροσ ---------------τωι γεγονοτι αυτωι
εκ τησ γενομενησ και μετηλλαχυιασ αυτου γυναικοσ -------------------------
απο τησ αυτησ πολεωσ εν αγυιαι συγχωρειν ειναι ----------------------------------
--------------------σ αυτωι εξ ησ συνεστιν ------------------------------------
----τησ αυτησ γενεασ την υπαρχουσαν αυτωι οικιαν ------------ ------------------
---------και αιθριον και αυλη απερ ο υιοσ διοκοροσ --------------------------
--------εγραψεν του δ αυτου διοσκορου ειναι ------------------------------------
---------- και προ κατενγεγυηται τα δικαια --------------------------------------
νησ κατα τουσ τησ χωρασ νομουσ· εαν δε μη ---------------------------------------
υπ αυτου τηι του διοσκορου σημαινομενηι -----------------------------------ενοικισμωι
του ημισουσ μερουσ τησ προκειμενησ οικιασ ---------------------------------
διοσκοροσ την τουτων αποχην ---------------------------------------------μηδ
υπεναντιον τουτοισ επιτελειν μηδε ------------------------------------------------
ανασκευηι κατ αυτησ τιθεσθαι ομολογιαν μηδε -----------------------------------
επιτελεσαι η χωρισ του κυρια ειναι τα διομολογημενα παραβαινειν, εκτεινειν δε
τον παραβησομενον τωι υιωι διοσκορωι η τοισ παρ αυτου καθ εκαστην εφοδον το τε
βλαβοσ και επιτιμον αργυριου δραχμασ 0 και εισ το δημοσιον τασ ισασ και μηθεν ησσον·
δ -----ιων ομολογιαν συνεχωρησεν·
"""

system_prompt = "Date this papyrus fragment to an exact year!"

input_messages = [
    {"role": "system", "content": system_prompt},
    {"role": "user", "content": papyrus_edition},
]

terminators = [
    tokenizer.eos_token_id,
    tokenizer.convert_tokens_to_ids("<|eot_id|>"),
]

outputs = generation_pipeline(
    input_messages,
    max_new_tokens=4,
    num_beams=45,  # Set this as high as your memory will allow!
    num_return_sequences=1,
    early_stopping=True,
    eos_token_id=terminators,  # stop at either of the Llama 3.1 end-of-turn tokens
)

# Collect the assistant's reply from each returned sequence.
beam_contents = []
for output in outputs:
    generated_text = output.get('generated_text', [])
    for item in generated_text:
        if item.get('role') == 'assistant':
            beam_contents.append(item.get('content'))

real_response = "71 or 72 AD"
print(f"Year: {real_response}")
for i, content in enumerate(beam_contents, start=1):
    print(f"Suggestion {i}: {content}")
```

### Expected Output:

```
Year: 71 or 72 AD
Suggestion 1: 71
```
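Because decoding uses beam search, you can also ask the pipeline for several candidate dates at once. A minimal variation of the call above (the value 5 is illustrative; `num_return_sequences` must not exceed `num_beams`):

```python
# Return the top 5 beams as alternative date suggestions.
outputs = generation_pipeline(
    input_messages,
    max_new_tokens=4,
    num_beams=45,
    num_return_sequences=5,  # must be <= num_beams
    early_stopping=True,
    eos_token_id=terminators,
)
```

The collection loop above will then print one suggestion per returned beam.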
## Usage on free tier in Google Colab

If you don't have access to a larger GPU but want to try the model out, you can run it in a quantized format in Google Colab. **The quality of the responses might deteriorate significantly.** Follow these steps:

### Step 1: Install Dependencies

```
!pip install -U bitsandbytes
import os
os._exit(00)  # Restart the Colab runtime so the updated bitsandbytes is picked up
```

### Step 2: Download and quantize the model

```python
from transformers import AutoModelForCausalLM, AutoTokenizer, BitsAndBytesConfig, pipeline
import torch

# 4-bit NF4 quantization keeps the 8B model within free-tier GPU memory.
quant_config = BitsAndBytesConfig(
    load_in_4bit=True,
    bnb_4bit_quant_type="nf4",
    bnb_4bit_use_double_quant=True,
    bnb_4bit_compute_dtype=torch.bfloat16,
)

model = AutoModelForCausalLM.from_pretrained(
    "Ericu950/Papy_1_Llama-3.1-8B-Instruct_date",
    device_map="auto",
    quantization_config=quant_config,
)

tokenizer = AutoTokenizer.from_pretrained("Ericu950/Papy_1_Llama-3.1-8B-Instruct_date")

generation_pipeline = pipeline(
    "text-generation",
    model=model,
    tokenizer=tokenizer,
    device_map="auto",
)
```

### Step 3: Run inference on a papyrus fragment of your choice

```python
# This is a rough transcription of Pap.Ups. 106
papyrus_edition = """
ετουσ τεταρτου αυτοκρατοροσ καισαροσ ουεσπασιανου σεβαστου ------------------
ομολογει παυσιριων απολλωνιου του παυσιριωνοσ μητροσ ---------------τωι γεγονοτι αυτωι
εκ τησ γενομενησ και μετηλλαχυιασ αυτου γυναικοσ -------------------------
απο τησ αυτησ πολεωσ εν αγυιαι συγχωρειν ειναι ----------------------------------
--------------------σ αυτωι εξ ησ συνεστιν ------------------------------------
----τησ αυτησ γενεασ την υπαρχουσαν αυτωι οικιαν ------------ ------------------
---------και αιθριον και αυλη απερ ο υιοσ διοκοροσ --------------------------
--------εγραψεν του δ αυτου διοσκορου ειναι ------------------------------------
---------- και προ κατενγεγυηται τα δικαια --------------------------------------
νησ κατα τουσ τησ χωρασ νομουσ· εαν δε μη ---------------------------------------
υπ αυτου τηι του διοσκορου σημαινομενηι -----------------------------------ενοικισμωι
του ημισουσ μερουσ τησ προκειμενησ οικιασ ---------------------------------
διοσκοροσ την τουτων αποχην ---------------------------------------------μηδ
υπεναντιον τουτοισ επιτελειν μηδε ------------------------------------------------
ανασκευηι κατ αυτησ τιθεσθαι ομολογιαν μηδε -----------------------------------
επιτελεσαι η χωρισ του κυρια ειναι τα διομολογημενα παραβαινειν, εκτεινειν δε
τον παραβησομενον τωι υιωι διοσκορωι η τοισ παρ αυτου καθ εκαστην εφοδον το τε
βλαβοσ και επιτιμον αργυριου δραχμασ 0 και εισ το δημοσιον τασ ισασ και μηθεν ησσον·
δ -----ιων ομολογιαν συνεχωρησεν·
"""

system_prompt = "Date this papyrus fragment to an exact year!"

input_messages = [
    {"role": "system", "content": system_prompt},
    {"role": "user", "content": papyrus_edition},
]

outputs = generation_pipeline(
    input_messages,
    max_new_tokens=4,
    num_beams=10,  # a narrower beam than above, to fit free-tier memory
    num_return_sequences=1,
    early_stopping=True,
)

# Collect the assistant's reply from each returned sequence.
beam_contents = []
for output in outputs:
    generated_text = output.get('generated_text', [])
    for item in generated_text:
        if item.get('role') == 'assistant':
            beam_contents.append(item.get('content'))

real_response = "71 or 72 AD"
print(f"Year: {real_response}")
for i, content in enumerate(beam_contents, start=1):
    print(f"Suggestion {i}: {content}")
```

### Expected Output:

```
Year: 71 or 72 AD
Suggestion 1: 71
```
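For context on the headline figure: the distance between a predicted year and a date span is naturally taken as 0 when the prediction falls inside the span, and as the gap to the nearest endpoint otherwise. The sketch below implements that reading; it is an assumption for illustration, not the evaluation script behind the reported 21.7-year average.

```python
def years_off(prediction: int, span_start: int, span_end: int) -> int:
    """Distance in years from a predicted year to a gold date span (0 if inside)."""
    if span_start <= prediction <= span_end:
        return 0
    return min(abs(prediction - span_start), abs(prediction - span_end))

# Pap.Ups. 106 is dated 71-72 AD, so the model's suggestion of 71 scores 0.
print(years_off(71, 71, 72))  # 0
print(years_off(50, 71, 72))  # 21
```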