---
license: apple-ascl
---

# Model Card for Fine-tuned OpenELM-270M

## Model Details

### Basic Information

- **Model Name:** Fine-tuned OpenELM-270M
- **Model Type:** Causal Language Model
- **Base Model:** apple/OpenELM-270M-Instruct
- **Model Architecture:** Transformer-based language model
- **Parameters:** 270 million
- **Language(s):** English

### Model Architecture

OpenELM-270M is based on the transformer architecture and is designed for efficient language modeling. Its 270 million parameter configuration is small compared to most modern language models.

## Intended Use

This model is fine-tuned for general conversation and task completion. It is designed to engage in dialogue and provide information across a wide range of topics.

### Primary intended uses

- General conversation
- Question answering
- Task completion

### Out-of-scope use cases

- Generation of harmful or biased content
- Critical decision-making without human oversight
- Tasks requiring real-time information or knowledge acquired after training

## Training Data

The model was fine-tuned on a synthetic dataset in which user queries were generated by GPT-4 and responses were generated by Claude 3 Opus and Claude 3.5 Sonnet. This high-quality synthetic dataset covers a wide range of topics and task types.

### Dataset characteristics

- **Type:** Synthetic, instruction-following conversations
- **Domains covered:** Diverse, spanning multiple areas of knowledge

## Performance and Limitations

### Performance Metrics

- **Training Loss:** Final loss of 1.3721 after 3 epochs
- **Real-world Use:** The model appears to struggle with maintaining conversational context when run on CUDA; CPU inference produces noticeably more coherent results.

### Limitations and Current Shortcomings

- The model's knowledge is limited to its training data and cut-off date.
- It may occasionally produce inaccurate or inconsistent information.
- Performance on tasks requiring recent knowledge or specialized expertise may be limited.
- Current issues include:
  - Outputting special tokens in responses, which should be invisible to the user.
  - Generating overly long responses that may be cut off by the context window.
  - Difficulty maintaining conversation context over multiple turns.
  - Occasionally generating responses that do not directly address the user's input.

## Ethical Considerations

- The model may reflect biases present in its training data.
- It should not be used to generate harmful, illegal, or discriminatory content.
- Users should be aware that the model can generate plausible-sounding but incorrect information.

## Caveats and Recommendations

- Always verify important information produced by the model against reliable sources.
- Use the model as an assistive tool; do not rely on it for critical decisions without human oversight.
- Regular evaluation and fine-tuning may be necessary to maintain performance and relevance.

## Training Procedure

### Training Hyperparameters

- **Number of Epochs:** 3
- **Learning Rate:** Decayed over the course of training to a final value of 1.5815959741193386e-07

A hedged sketch of a comparable training setup is included after the evaluation summary below.

### Training Hardware

- **Hardware Type:** CPU (Intel Core i7-11700)
- **Hours of Training:** Approximately 51 hours

### Framework and Tokenizer

- **Framework:** PyTorch, Transformers
- **Tokenizer:** Uses the Llama 3 chat format with special tokens

## Evaluation Results

Detailed evaluation results are not available, but the model showed consistent improvement in loss throughout training.
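The fine-tuning script itself has not been released. As a rough illustration only, the sketch below shows how the setup described under Training Procedure above might be reproduced with the Hugging Face `Trainer`. Only the epoch count, CPU hardware, and Llama 3 chat format come from this card; the dataset (a toy stand-in), batch size, initial learning rate, scheduler, output path, and tokenizer source are assumptions.

```python
# Hedged sketch only: reproduces the documented settings (3 epochs, CPU training,
# Llama 3 chat format). Dataset, batch size, initial LR, scheduler, and tokenizer
# source are illustrative assumptions, not the released training configuration.
from datasets import Dataset
from transformers import (
    AutoModelForCausalLM,
    AutoTokenizer,
    DataCollatorForLanguageModeling,
    Trainer,
    TrainingArguments,
)

base_model = "apple/OpenELM-270M-Instruct"
tokenizer_source = "QuietImpostor/OpenELM-270M-Instruct-SonnOpus"  # assumption: reuse the released tokenizer

model = AutoModelForCausalLM.from_pretrained(base_model, trust_remote_code=True)
tokenizer = AutoTokenizer.from_pretrained(tokenizer_source)
if tokenizer.pad_token is None:
    tokenizer.pad_token = tokenizer.eos_token  # ensure padding works for the collator

# Toy stand-in for the synthetic GPT-4 / Claude dataset described above.
examples = [
    "<|begin_of_text|><|start_header_id|>system<|end_header_id|>You are a helpful AI assistant.<|eot_id|>"
    "<|start_header_id|>user<|end_header_id|>Hello!<|eot_id|>"
    "<|start_header_id|>assistant<|end_header_id|>Hi there, how can I help?<|eot_id|>",
]

def tokenize(example):
    return tokenizer(example["text"], truncation=True, max_length=2048)

train_dataset = Dataset.from_dict({"text": examples}).map(tokenize, remove_columns=["text"])

args = TrainingArguments(
    output_dir="openelm-270m-finetune",  # hypothetical output path
    num_train_epochs=3,                  # documented above
    per_device_train_batch_size=4,       # assumption
    learning_rate=2e-5,                  # assumption; the card only reports the final LR (~1.58e-7)
    lr_scheduler_type="linear",          # assumption consistent with the LR decaying toward zero
    use_cpu=True,                        # training hardware was a CPU (i7-11700)
    logging_steps=50,
)

trainer = Trainer(
    model=model,
    args=args,
    train_dataset=train_dataset,
    data_collator=DataCollatorForLanguageModeling(tokenizer=tokenizer, mlm=False),
)
trainer.train()
```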
## Quantitative Analyses

- **Training Loss Curve:** The loss decreased from initial values around 2.1 to final values around 1.37-1.40, showing consistent improvement across epochs.

## Model Inputs and Outputs

- **Input Format:** Uses the Llama 3 chat format with the following structure:

```
<|begin_of_text|>
<|start_header_id|>system<|end_header_id|>[system_message]<|eot_id|>
<|start_header_id|>user<|end_header_id|>[user_input]<|eot_id|>
<|start_header_id|>assistant<|end_header_id|>
```

- **Output:** Generated text completions following the assistant's response format

## Technical Specifications

- **Context Window:** Initially 2048 tokens, with the potential to be increased to 4096 or 8192 tokens

## How to Get Started with the Model

```python
from transformers import AutoModelForCausalLM, AutoTokenizer
import torch

model_path = "QuietImpostor/OpenELM-270M-Instruct-SonnOpus"

# Load the fine-tuned model and its tokenizer (trust_remote_code is required for OpenELM).
model = AutoModelForCausalLM.from_pretrained(model_path, torch_dtype=torch.float16, trust_remote_code=True)
tokenizer = AutoTokenizer.from_pretrained(model_path)

def generate_response(prompt, max_length=256):
    inputs = tokenizer(prompt, return_tensors="pt", padding=True, truncation=True)
    with torch.no_grad():
        output = model.generate(
            **inputs,
            max_length=max_length,  # total length budget, including the prompt
            num_return_sequences=1,
            temperature=0.7,
            top_p=0.9,
            do_sample=True
        )
    response = tokenizer.decode(output[0], skip_special_tokens=True)
    return response.strip()

# Example usage: build a prompt in the Llama 3 chat format described above.
system_msg = "You are a helpful AI assistant."
user_input = "Hello, how are you?"
prompt = f"<|begin_of_text|><|start_header_id|>system<|end_header_id|>{system_msg}<|eot_id|><|start_header_id|>user<|end_header_id|>{user_input}<|eot_id|><|start_header_id|>assistant<|end_header_id|>"

response = generate_response(prompt)
print(response)
```
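Because the current checkpoint sometimes emits special tokens and runs past the intended turn boundary (see the limitations above), it can help to stop generation at the `<|eot_id|>` token and decode only the newly generated tokens. The snippet below is a hedged sketch along those lines, reusing `model`, `tokenizer`, `torch`, and `prompt` from the example above; `generate_clean_response` is a hypothetical helper, not part of the released code.

```python
# Hedged sketch: mitigate leaked special tokens and run-on replies by stopping
# generation at the Llama 3 end-of-turn token and decoding only the new tokens.
# Assumes `model`, `tokenizer`, `torch`, and `prompt` from the example above.
eot_id = tokenizer.convert_tokens_to_ids("<|eot_id|>")  # assumes the token exists in the vocabulary

def generate_clean_response(prompt, max_new_tokens=256):
    inputs = tokenizer(prompt, return_tensors="pt")
    with torch.no_grad():
        output = model.generate(
            **inputs,
            max_new_tokens=max_new_tokens,  # bounds the reply itself, not prompt + reply
            eos_token_id=eot_id,            # stop at <|eot_id|> instead of running on
            temperature=0.7,
            top_p=0.9,
            do_sample=True,
        )
    # Keep only the tokens generated after the prompt, then drop any special tokens.
    new_tokens = output[0][inputs["input_ids"].shape[-1]:]
    return tokenizer.decode(new_tokens, skip_special_tokens=True).strip()

print(generate_clean_response(prompt))
```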