Update README.md
README.md CHANGED
@@ -10,13 +10,9 @@ tags:
 - epigraphy
 - philology
 ---
-
 # Papy_1_Llama-3.1-8B-Instruct_date
-
 This is a fine-tuned version of the Llama-3.1-8B-Instruct model, specialized in assigning a date to Greek documentary papyri. On a test set of 1,856 unseen papyri its predictions were, on average, 21.7 years away from the actual date spans.
-
 ## Dataset
-
 This model was finetuned on the Ericu950/Papyri_1 dataset, which consists of Greek documentary papyri editions and their corresponding dates and geographical attributions sourced from the amazing Papyri.info.

 ## Usage
@@ -29,16 +25,12 @@ To run the model on a GPU with large memory capacity, follow these steps:
 import json
 from transformers import pipeline, AutoTokenizer, LlamaForCausalLM
 import torch
-
 model_id = "Ericu950/Papy_1_Llama-3.1-8B-Instruct_date"
-
 model = LlamaForCausalLM.from_pretrained(
     model_id,
     device_map="auto",
 )
-
 tokenizer = AutoTokenizer.from_pretrained(model_id)
-
 generation_pipeline = pipeline(
     "text-generation",
     model=model,
@@ -70,46 +62,37 @@ papyrus_edition = """
 εφοδον το τε βλαβοσ και επιτιμον αργυριου δραχμασ 0 και εισ το δημοσιον τασ ισασ και μηθεν
 ησσον· δ -----ιων ομολογιαν συνεχωρησεν·
 """
-
 system_prompt = "Date this papyrus fragment to an exact year!"
-
 input_messages = [
     {"role": "system", "content": system_prompt},
     {"role": "user", "content": papyrus_edition},
 ]
-
 terminators = [
     tokenizer.eos_token_id,
     tokenizer.convert_tokens_to_ids("<|eot_id|>")
 ]
-
 outputs = generation_pipeline(
     input_messages,
-    max_new_tokens=
+    max_new_tokens=4,
     num_beams=45, # Set this as high as your memory will allow!
-    num_return_sequences=
+    num_return_sequences=1,
     early_stopping=True,
 )
-
 beam_contents = []
 for output in outputs:
     generated_text = output.get('generated_text', [])
     for item in generated_text:
         if item.get('role') == 'assistant':
             beam_contents.append(item.get('content'))
-
-real_response
-
-print(f"Place of origin: {real_response}")
+real_response = "71 or 72 AD"
+print(f"Year: {real_response}")
 for i, content in enumerate(beam_contents, start=1):
     print(f"Suggestion {i}: {content}")
 ```
 ### Expected Output:
 ```
-
-Suggestion 1:
-Suggestion 2: Antinoopolis
-Suggestion 3: Alexandria
+Year: 71 or 72 AD
+Suggestion 1: 71
 ```
 ## Usage on free tier in Google Colab

@@ -135,18 +118,15 @@ os._exit(00)
 ```python
 from transformers import AutoModelForCausalLM, AutoTokenizer, BitsAndBytesConfig, pipeline
 import torch
-
 quant_config = BitsAndBytesConfig(
     load_in_4bit=True,
     bnb_4bit_quant_type="nf4",
     bnb_4bit_use_double_quant=True,
     bnb_4bit_compute_dtype=torch.bfloat16
 )
-
-model = AutoModelForCausalLM.from_pretrained("Ericu950/Papy_1_Llama-3.1-8B-Instruct_place",
+model = AutoModelForCausalLM.from_pretrained("Ericu950/Papy_1_Llama-3.1-8B-Instruct_date",
 device_map = "auto", quantization_config = quant_config)
-tokenizer = AutoTokenizer.from_pretrained("Ericu950/Papy_1_Llama-3.1-8B-Instruct_place")
-
+tokenizer = AutoTokenizer.from_pretrained("Ericu950/Papy_1_Llama-3.1-8B-Instruct_date")
 generation_pipeline = pipeline(
     "text-generation",
     model=model,
@@ -176,42 +156,31 @@ papyrus_edition = """
 παραβαινειν, εκτεινειν δε τον παραβησομενον τωι υιωι διοσκορωι η τοισ παρ αυτου καθ εκαστην
 εφοδον το τε βλαβοσ και επιτιμον αργυριου δραχμασ 0 και εισ το δημοσιον τασ ισασ και μηθεν
 ησσον· δ -----ιων ομολογιαν συνεχωρησεν·"""
-
-system_prompt = "Assign this papyrus fragment to an exact place!"
-
+system_prompt = "Date this papyrus fragment to an exact year!"
 input_messages = [
     {"role": "system", "content": system_prompt},
     {"role": "user", "content": papyrus_edition},
 ]
-
 outputs = generation_pipeline(
     input_messages,
-    max_new_tokens=
+    max_new_tokens=4,
     num_beams=10,
-    num_return_sequences=
+    num_return_sequences=1,
     early_stopping=True,
 )
-
 beam_contents = []
 for output in outputs:
     generated_text = output.get('generated_text', [])
     for item in generated_text:
         if item.get('role') == 'assistant':
             beam_contents.append(item.get('content'))
-
-real_response
-
-print(f"Place of origin: {real_response}")
+real_response = "71 or 72 AD"
+print(f"Year: {real_response}")
 for i, content in enumerate(beam_contents, start=1):
     print(f"Suggestion {i}: {content}")
 ```
 ### Expected Output:
 ```
-
-Suggestion 1:
-
-Suggestion 3: Alexandria
-```
-
-
-
+Year: 71 or 72 AD
+Suggestion 1: 71
+```
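A note on the headline figure in the updated card: it reports that predictions were, on average, 21.7 years away from the actual date spans. One plausible reading of that distance is zero when the predicted year falls inside the attested span and otherwise the gap to the nearer endpoint. The sketch below only illustrates that reading; it is not the evaluation code behind the reported number, and the function name is invented for the example.

```python
# Sketch of one way to score a predicted year against an attested date span.
# Illustration only; not the script that produced the 21.7-year figure.

def years_from_span(predicted_year: int, span_start: int, span_end: int) -> int:
    """Return 0 if the prediction falls inside the span, otherwise the
    distance to the nearest endpoint (years as signed ints, e.g. -30 for 30 BC)."""
    if span_start <= predicted_year <= span_end:
        return 0
    return min(abs(predicted_year - span_start), abs(predicted_year - span_end))

# Example: a papyrus attested to 71-72 AD, model predicts 75 AD -> error of 3 years.
print(years_from_span(75, 71, 72))  # 3
```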
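The card also names the Ericu950/Papyri_1 dataset on the Hub. A minimal sketch of inspecting it with the `datasets` library follows; the "train" split name and the assumption that it loads without extra configuration are ours, not stated in the card.

```python
# Quick look at the dataset the model card names; the "train" split is an assumption.
from datasets import load_dataset

papyri = load_dataset("Ericu950/Papyri_1", split="train")
print(papyri)      # column names and number of rows
print(papyri[0])   # one papyrus edition together with its metadata
```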