---
license: wtfpl
datasets:
- Biddls/Onion_News
- Self-GRIT/wikitext-2-raw-v1-preprocessed
language:
- en
metrics:
- f1
- accuracy
- precision
- perplexity
base_model:
- Wonder-Griffin/TraXL
library_name: transformers
---
# TraXLMistral

Created by: Morgan Griffin & WongrifferousAI (Wonder-Griffin)

## Model Description
TraXLMistral is a custom language model based on the GPT-2 architecture with additional enhancements for various tasks including causal language modeling, sequence classification, and question answering. The model incorporates several advanced techniques such as sparse attention, memory-augmented neural networks (MANN), adaptive computation time (ACT), and latent space clustering, making it suitable for both reasoning and general-purpose text generation.
## Key Features

- **Sparse Attention**: Efficient attention mechanism inspired by Mistral that focuses computational resources on the important elements of the sequence (a minimal illustrative sketch follows this list).
- **Memory-Augmented Neural Networks (MANN)**: Adds external memory to better handle long-term dependencies and complex reasoning tasks.
- **Adaptive Computation Time (ACT)**: Dynamically adjusts the number of computation steps based on the complexity of the input.
- **Latent Space Clustering**: Clusters latent representations for improved interpretability and task-specific adjustments.
- **Logical Transformer Layer**: Improves the model's reasoning capabilities by integrating logical transformations.
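The sparse attention implementation itself is not published in this card. The snippet below is only a minimal, hypothetical sketch of a sliding-window (local, causal) attention mask of the kind used in Mistral-style models, to illustrate the idea; the function name and window size are assumptions, not the model's actual code.

```python
import torch

def sliding_window_mask(seq_len: int, window: int) -> torch.Tensor:
    """Boolean mask: position i may attend only to positions
    max(0, i - window + 1) .. i (causal, local attention)."""
    idx = torch.arange(seq_len)
    # allowed[i, j] is True when j <= i and i - j < window
    return (idx[None, :] <= idx[:, None]) & (idx[:, None] - idx[None, :] < window)

# Example: restrict raw attention scores before the softmax
seq_len, window = 8, 3
scores = torch.randn(seq_len, seq_len)             # raw attention logits
mask = sliding_window_mask(seq_len, window)
scores = scores.masked_fill(~mask, float("-inf"))  # block disallowed positions
weights = torch.softmax(scores, dim=-1)
```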
## Intended Uses & Limitations

### Use Cases

- **Text Generation**: Generating coherent, contextually relevant text across a wide range of domains, including conversational agents, story generation, and creative writing.
- **Question Answering**: Providing accurate and concise answers to natural language questions.
- **Sequence Classification**: Classifying text into predefined categories, e.g. for sentiment analysis or document categorization.
- **Conversational AI**: Suitable for applications requiring interactive, context-aware conversation.
### Limitations

- The model may require additional fine-tuning for domain-specific tasks where the input data differs significantly from the training data.
- Because of the sparse attention and memory modules, the model may require more resources (GPU memory) than simpler architectures.
## Training Procedure

The model was trained on the WikiText-2 raw dataset (Self-GRIT/wikitext-2-raw-v1-preprocessed, listed in the dataset tags above) and fine-tuned for causal language modeling, question answering, and sequence classification.

### Training Hyperparameters
- Learning Rate: 5e-05
- Train Batch Size: 8
- Eval Batch Size: 8
- Optimizer: Adam (betas = (0.9, 0.999), epsilon = 1e-08)
- LR Scheduler: Linear
- Training Steps: 100,000
- Seed: 42
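The actual training script is not included with this card. As a hedged sketch, the hyperparameters above could be expressed with the `transformers` `TrainingArguments` API roughly as follows (the output directory name is a placeholder, not part of the original setup):

```python
from transformers import TrainingArguments

# Mirrors the hyperparameters listed above; "traxlmistral-out" is a placeholder path.
training_args = TrainingArguments(
    output_dir="traxlmistral-out",
    learning_rate=5e-5,
    per_device_train_batch_size=8,
    per_device_eval_batch_size=8,
    adam_beta1=0.9,
    adam_beta2=0.999,
    adam_epsilon=1e-8,
    lr_scheduler_type="linear",
    max_steps=100_000,
    seed=42,
)
```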
### Training Environment

- Transformers version: 4.45.0.dev0
- PyTorch version: 2.4.0+cu124
- Datasets version: 2.20.0
- Tokenizers version: 0.19.1
- GPU: The model is trained with GPU acceleration, with runtime checks for CUDA availability and multiple GPUs.
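The CUDA/multi-GPU check mentioned above typically looks like the following; this is an illustrative snippet, not the card's original training code:

```python
import torch

# Pick the GPU if one is available, otherwise fall back to CPU.
device = torch.device("cuda" if torch.cuda.is_available() else "cpu")
n_gpus = torch.cuda.device_count()
print(f"Using {device}, {n_gpus} GPU(s) visible")
```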
## Model Architecture

### Configuration

- Model Type: Hybrid Transformer (GPT / Mistral / Transformer-XL, causal LM)
- Vocab Size: 50256
- Hidden Size: 768
- Number of Layers: 4
- Number of Attention Heads: 4
- Feedforward Expansion Factor: 4
- RNN Units: 128
- Max Sequence Length: 256
- Dropout Rate: 0.1
- Sparse Attention: Enabled
- Memory Size: 256
- Max Computation Steps: 5
- Dynamic Routing: Enabled
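The configuration class itself ships with the model repository; the dictionary below simply restates the values above in the shape typically passed to a custom `PretrainedConfig`. The field names are assumptions for illustration only; the values are taken from the card.

```python
# Field names are assumptions; only the values come from the configuration above.
traxl_mistral_config = {
    "vocab_size": 50256,
    "hidden_size": 768,
    "num_hidden_layers": 4,
    "num_attention_heads": 4,
    "ff_expansion_factor": 4,
    "rnn_units": 128,
    "max_position_embeddings": 256,
    "dropout": 0.1,
    "sparse_attention": True,
    "memory_size": 256,
    "max_computation_steps": 5,
    "dynamic_routing": True,
}
```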
### Special Modules

- Sparse Attention Layer: Improves efficiency by reducing unnecessary attention computation.
- Adaptive Computation Time (ACT): Adjusts the number of computation steps based on input complexity (see the sketch after this list).
- Memory-Augmented Neural Networks (MANN): Provides external memory to help with long-term dependencies.
- Latent Space Clustering: Clusters latent representations for improved task-specific behavior.
- Logical Transformer Layer: Improves reasoning and logic-based tasks.
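These modules are not documented beyond the descriptions above. As a rough illustration of the adaptive computation time idea only, the sketch below halts a recurrent refinement loop once the accumulated halting probability crosses a threshold; the class, layer choices, and names are hypothetical and are not the model's actual implementation.

```python
import torch
import torch.nn as nn

class ACTBlock(nn.Module):
    """Illustrative adaptive-computation-time loop (not the model's actual code)."""

    def __init__(self, hidden_size: int, max_steps: int = 5, threshold: float = 0.99):
        super().__init__()
        self.step_fn = nn.Linear(hidden_size, hidden_size)  # stand-in for a transformer sub-layer
        self.halt_fn = nn.Linear(hidden_size, 1)             # predicts per-token halting probability
        self.max_steps = max_steps
        self.threshold = threshold

    def forward(self, x: torch.Tensor) -> torch.Tensor:
        # x: (batch, seq_len, hidden_size)
        halt_prob = torch.zeros(x.shape[:-1], device=x.device)  # accumulated halting probability
        out = torch.zeros_like(x)
        for _ in range(self.max_steps):
            x = torch.tanh(self.step_fn(x))
            p = torch.sigmoid(self.halt_fn(x)).squeeze(-1)
            still_running = (halt_prob < self.threshold).float()
            out = out + (p * still_running).unsqueeze(-1) * x   # accumulate halting-weighted states
            halt_prob = halt_prob + p * still_running
            if bool((halt_prob >= self.threshold).all()):
                break
        return out

# Example usage with the card's hidden size and max computation steps
block = ACTBlock(hidden_size=768, max_steps=5)
y = block(torch.randn(2, 16, 768))
```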
### Supported Tasks

- Causal Language Modeling (`causal_lm`): Generates text sequences from a given prompt.
- Question Answering (`qa`): Extracts relevant answers from a context given a question.
- Sequence Classification: Classifies input sequences into one of the predefined labels.
## Evaluation

The model was evaluated on several NLP benchmarks, but detailed results are pending. The primary evaluation metrics are:

- Accuracy
- F1-score
- Precision
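As a reminder of how these metrics are typically computed for classification outputs, here is a small example using scikit-learn (an assumption; the card does not state which evaluation library was used, and the labels below are dummy values):

```python
from sklearn.metrics import accuracy_score, f1_score, precision_score

y_true = [0, 1, 1, 0, 1]   # dummy gold labels
y_pred = [0, 1, 0, 0, 1]   # dummy model predictions

print("accuracy :", accuracy_score(y_true, y_pred))
print("f1       :", f1_score(y_true, y_pred))
print("precision:", precision_score(y_true, y_pred))
```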
## Intended Users

This model is designed for researchers, developers, and organizations looking to deploy advanced NLP models in production. It can be used to build conversational agents, question-answering systems, text generation applications, and more.

## How to Use

Inference example:

```python
from transformers import BertTokenizerFast, TraXLMistral

tokenizer = BertTokenizerFast.from_pretrained('bert-base-uncased')
model = TraXLMistral.from_pretrained('Wonder-Griffin/TraXLMistral')

input_text = "What is the capital of France?"
inputs = tokenizer(input_text, return_tensors="pt")
outputs = model.generate(**inputs)
print(tokenizer.decode(outputs[0], skip_special_tokens=True))
```

## Limitations and Future Work
- Limited Training Data: Future iterations should focus on expanding the dataset and improving performance across different languages and domains.
- Memory Usage: Due to its complex architecture, the model may require optimizations for resource-constrained environments.
## Acknowledgements

**Created by Morgan Griffin and WongrifferousAI (Wonder-Griffin)**