---
library_name: transformers
license: llama3.2
base_model: unsloth/Llama-3.2-3B-Instruct
tags:
- trl
- sft
- generated_from_trainer
model-index:
- name: text2graph-llama-3.2-3b
  results: []
---

# text2graph-llama-3.2-3b

This model is a fine-tuned version of [unsloth/Llama-3.2-3B-Instruct](https://huggingface.co/unsloth/Llama-3.2-3B-Instruct) on a text2triple dataset curated by Sonnet-3.5. It offers much faster inference than our previously trained [T5-based model](https://huggingface.co/pat-jj/text2triple-flan-t5) and handles longer inputs (> 512 tokens) better.

# Example Input:

"William Gerald Standridge (November 27, 1953 – April 12, 2014) was an American stock car racing driver. He was a competitor in the NASCAR Winston Cup Series and Busch Series."

# Example Output:

(S> William gerald standridge| P> Nationality| O> American), \
(S> William gerald standridge| P> Occupation| O> Stock car racing driver), \
(S> William gerald standridge| P> Competitor| O> Busch series), \
(S> William gerald standridge| P> Competitor| O> Nascar winston cup series), \
(S> William gerald standridge| P> Birth date| O> November 27, 1953), \
(S> William gerald standridge| P> Death date| O> April 12, 2014)

# How to Use?

```python
import torch
from transformers import AutoModelForCausalLM, AutoTokenizer

model_name = "pat-jj/text2graph-llama-3.2-3b"


def load_model_and_tokenizer():
    # Load the model and tokenizer
    model = AutoModelForCausalLM.from_pretrained(
        model_name,
        torch_dtype=torch.bfloat16,
        device_map="auto"
    )
    tokenizer = AutoTokenizer.from_pretrained(model_name)

    # Ensure a chat template is set (this tokenizer ships with the Llama 3 template)
    tokenizer.chat_template = tokenizer.chat_template or "llama-3.1"

    return model, tokenizer


def generate_triples(model, tokenizer, input_text, max_length=2048):
    # Format the input using the chat template
    messages = [{
        "role": "user",
        "content": f"Convert the following text to triples:\n\nText: {input_text}"
    }]
    prompt = tokenizer.apply_chat_template(
        messages,
        tokenize=False,
        add_generation_prompt=True
    )

    # Tokenize input
    inputs = tokenizer(prompt, return_tensors="pt").to(model.device)

    # Generate response
    outputs = model.generate(
        **inputs,
        max_length=max_length,
        num_return_sequences=1,
        temperature=0.7,
        do_sample=True,
        pad_token_id=tokenizer.pad_token_id or tokenizer.eos_token_id
    )

    # Decode only the newly generated tokens (skip the prompt) and return the response
    response = tokenizer.decode(
        outputs[0][inputs["input_ids"].shape[-1]:],
        skip_special_tokens=True
    )
    return response


def main():
    print("Loading model and tokenizer...")
    model, tokenizer = load_model_and_tokenizer()
    print("\nModel loaded! Enter text to convert to triples (type 'quit' to exit):")

    while True:
        user_input = input("\nEnter text: ")
        if user_input.lower() == 'quit':
            break
        print("\nGenerating triples...")
        response = generate_triples(model, tokenizer, user_input)
        print("\nResponse:", response)


if __name__ == "__main__":
    main()
```
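The model returns its triples as a single string in the `(S> subject| P> predicate| O> object)` format shown in the example output above. If you need structured tuples instead, a small regex parser along the following lines should work; the `parse_triples` helper is an illustrative sketch, not part of this repository:

```python
import re

def parse_triples(response: str):
    # Match every "(S> ...| P> ...| O> ...)" group in the model output.
    pattern = r"\(S>\s*(.*?)\|\s*P>\s*(.*?)\|\s*O>\s*(.*?)\)"
    return [(s.strip(), p.strip(), o.strip()) for s, p, o in re.findall(pattern, response)]

# parse_triples("(S> William gerald standridge| P> Nationality| O> American)")
# -> [('William gerald standridge', 'Nationality', 'American')]
```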
## Training procedure

### Training hyperparameters

The following hyperparameters were used during training (a rough, unofficial mapping to a TRL `SFTConfig` is sketched at the end of this card):
- learning_rate: 2e-05
- train_batch_size: 3
- eval_batch_size: 1
- seed: 42
- distributed_type: multi-GPU
- num_devices: 3
- gradient_accumulation_steps: 3
- total_train_batch_size: 27
- total_eval_batch_size: 3
- optimizer: adamw_torch with betas=(0.9, 0.999) and epsilon=1e-08 (no additional optimizer arguments)
- lr_scheduler_type: cosine
- lr_scheduler_warmup_ratio: 0.1
- num_epochs: 1

### Framework versions

- Transformers 4.48.1
- Pytorch 2.1.2+cu121
- Datasets 2.21.0
- Tokenizers 0.21.0

## **Cite Our [Paper](https://arxiv.org/abs/2502.10996)**

```
@misc{jiang2025rasretrievalandstructuringknowledgeintensivellm,
      title={RAS: Retrieval-And-Structuring for Knowledge-Intensive LLM Generation},
      author={Pengcheng Jiang and Lang Cao and Ruike Zhu and Minhao Jiang and Yunyi Zhang and Jimeng Sun and Jiawei Han},
      year={2025},
      eprint={2502.10996},
      archivePrefix={arXiv},
      primaryClass={cs.CL},
      url={https://arxiv.org/abs/2502.10996},
}
```
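As referenced in the training section, the sketch below shows roughly how the listed hyperparameters would map onto a TRL `SFTConfig`. It is not the actual training script: the dataset path, output directory, and data loading are placeholders, since the Sonnet-3.5-curated text2triple data is not named on this card.

```python
from datasets import load_dataset
from trl import SFTConfig, SFTTrainer

# Placeholder: stand-in for the Sonnet-3.5-curated text2triple dataset.
dataset = load_dataset("json", data_files="path/to/text2triple.jsonl", split="train")

training_args = SFTConfig(
    output_dir="text2graph-llama-3.2-3b",  # placeholder output directory
    learning_rate=2e-5,
    per_device_train_batch_size=3,   # train_batch_size
    per_device_eval_batch_size=1,    # eval_batch_size
    gradient_accumulation_steps=3,   # with 3 GPUs -> total train batch size of 27
    num_train_epochs=1,
    lr_scheduler_type="cosine",
    warmup_ratio=0.1,
    optim="adamw_torch",
    seed=42,
)

trainer = SFTTrainer(
    model="unsloth/Llama-3.2-3B-Instruct",
    args=training_args,
    train_dataset=dataset,
)
trainer.train()
```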