---
library_name: transformers
tags:
- unsloth
- trl
- sft
base_model:
- meta-llama/Llama-3.1-8B-Instruct
---
# Model Card for Critical Thinker
## Model Details
### Model Description
The **Critical Thinker** model is a fine-tuned version of **meta-llama/Llama-3.1-8B-Instruct**, optimized for developing and evaluating **critical thinking** and **investigative reasoning** skills. It is specifically trained on the **Critical Thinking Synthetic Dataset**, which focuses on logical reasoning, forensic investigation, and multi-layered decision-making scenarios.
- **Developed by:** Theeseus AI
- **Funded by:** Independent Research Grant
- **Shared by:** [Theeseus AI](https://www.linkedin.com/in/theeseus/)
- **Model type:** Transformer-based Language Model
- **Language(s):** English
- **License:** Apache 2.0
- **Finetuned from model:** meta-llama/Llama-3.1-8B-Instruct
### Model Sources
- **Repository:** [Critical Thinker on HuggingFace](https://huggingface.co/theeseus-ai/CriticalThinker)
- **Dataset:** [Critical Thinking Dataset](https://huggingface.co/datasets/theeseus-ai/CriticalThinker)
---
## Uses
### Direct Use
- **Critical Thinking Assessments:** Evaluating logical reasoning and problem-solving capabilities.
- **Digital Forensics Investigations:** Testing AI capabilities in analyzing logs, metadata, and cybersecurity incidents.
- **AI Research:** Studying and benchmarking multi-step reasoning and decision-making models.
### Downstream Use
- **Cybersecurity Training Programs:** Training AI models to detect vulnerabilities, analyze logs, and identify attack patterns.
- **Question-Answering Applications:** Developing reasoning-focused QA systems for educational and research tools.
- **AI Decision Support Systems:** Building AI assistants for forensic investigations and cybersecurity monitoring.
### Out-of-Scope Use
- Tasks requiring **real-time decision-making** under strict latency or safety constraints.
- Applications involving **medical diagnosis** or **legal interpretations** without human oversight.
---
## Bias, Risks, and Limitations
### Known Limitations
- May **misinterpret ambiguous evidence** or scenarios that lack sufficient context.
- Performance may degrade on **multilingual inputs**, as the training data is primarily in **English**.
- Model output can include **false positives** when assessing evidence in forensic cases.
### Recommendations
- Use outputs as **supporting evidence**, not definitive conclusions.
- Perform **manual validation** for high-stakes decision-making.
- Implement **bias-checking algorithms** when deploying in production environments.
---
## How to Get Started with the Model
```python
import torch
from transformers import AutoTokenizer, AutoModelForCausalLM

tokenizer = AutoTokenizer.from_pretrained("theeseus-ai/CriticalThinker")
model = AutoModelForCausalLM.from_pretrained(
    "theeseus-ai/CriticalThinker",
    torch_dtype=torch.bfloat16,  # matches the bf16 training precision
    device_map="auto",
)

# As a Llama-3.1-Instruct derivative, the model expects the chat template.
messages = [
    {"role": "user", "content": "Investigate unusual logins from multiple IP addresses in a network."}
]
inputs = tokenizer.apply_chat_template(
    messages, add_generation_prompt=True, return_tensors="pt"
).to(model.device)
outputs = model.generate(inputs, max_new_tokens=256)
print(tokenizer.decode(outputs[0][inputs.shape[-1]:], skip_special_tokens=True))
```
---
## Training Details
### Training Data
The model is fine-tuned on the **Critical Thinking Synthetic Dataset** available at [HuggingFace](https://huggingface.co/datasets/theeseus-ai/CriticalThinker). The dataset simulates digital forensics, cybersecurity incidents, and logical deduction scenarios.
### Training Procedure
#### Preprocessing
- Cleaned and validated JSONL format.
- Schema enforcement to ensure consistency.
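The cleaning step above can be sketched as a small JSONL validator. The dataset's exact field names are not published in this card, so the `prompt`/`response` keys below are illustrative placeholders, not the real schema:

```python
import json

# Hypothetical required keys -- the card does not publish the actual schema.
REQUIRED_KEYS = {"prompt", "response"}

def validate_jsonl(lines):
    """Keep only lines that parse as JSON objects and contain every required key."""
    valid = []
    for line in lines:
        line = line.strip()
        if not line:
            continue
        try:
            record = json.loads(line)
        except json.JSONDecodeError:
            continue  # drop malformed rows
        if isinstance(record, dict) and REQUIRED_KEYS <= record.keys():
            valid.append(record)
    return valid
```

A validator like this makes schema enforcement a single pass over the raw file before training.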
#### Hyperparameters
- **Optimizer:** AdamW
- **Batch Size:** 16
- **Learning Rate:** 2e-5
- **Epochs:** 3
- **Precision:** bfloat16 (bf16) mixed precision
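The hyperparameters above map onto a training configuration roughly as follows. Only the values come from this card; the trainer wiring (TRL's `SFTTrainer` with Unsloth, suggested by the repo tags) is an assumption:

```python
# Hyperparameter values from the list above; key names follow the
# transformers TrainingArguments convention and are an assumption here.
training_config = {
    "optim": "adamw_torch",           # AdamW optimizer
    "per_device_train_batch_size": 16,
    "learning_rate": 2e-5,
    "num_train_epochs": 3,
    "bf16": True,                     # bfloat16 mixed precision
}
```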
#### Compute Resources
- **Hardware:** NVIDIA A100 (80 GB) GPU
- **Training Time:** ~24 hours
---
## Evaluation
### Testing Data, Factors & Metrics
#### Testing Data
The dataset was split into **80% training**, **10% validation**, and **10% testing** sets.
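The 80/10/10 split can be reproduced with a seeded shuffle; the seed and shuffling procedure here are illustrative assumptions, only the ratios come from the card:

```python
import random

def split_dataset(records, seed=0):
    """Shuffle and split records into 80/10/10 train/validation/test sets."""
    rng = random.Random(seed)  # fixed seed for a reproducible split
    shuffled = records[:]
    rng.shuffle(shuffled)
    n = len(shuffled)
    n_train = int(0.8 * n)
    n_val = int(0.1 * n)
    train = shuffled[:n_train]
    val = shuffled[n_train:n_train + n_val]
    test = shuffled[n_train + n_val:]
    return train, val, test
```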
#### Metrics
- **Accuracy:** Measures correctness of predictions.
- **F1 Score:** Evaluates precision and recall balance.
- **Log-likelihood Loss:** Average negative log-likelihood of the reference tokens; lower values indicate a better-calibrated model.
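For reference, accuracy and (binary) F1 as used above reduce to the standard definitions; this sketch assumes binary labels, since the card does not state whether F1 was computed per-class or averaged:

```python
def accuracy(y_true, y_pred):
    """Fraction of predictions that match the reference labels."""
    return sum(t == p for t, p in zip(y_true, y_pred)) / len(y_true)

def f1_binary(y_true, y_pred, positive=1):
    """Harmonic mean of precision and recall for the positive class."""
    tp = sum(t == positive and p == positive for t, p in zip(y_true, y_pred))
    fp = sum(t != positive and p == positive for t, p in zip(y_true, y_pred))
    fn = sum(t == positive and p != positive for t, p in zip(y_true, y_pred))
    precision = tp / (tp + fp) if tp + fp else 0.0
    recall = tp / (tp + fn) if tp + fn else 0.0
    if precision + recall == 0:
        return 0.0
    return 2 * precision * recall / (precision + recall)
```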
### Results
- **Accuracy:** 89.4%
- **F1 Score:** 88.7%
- **Log-likelihood Loss:** 0.21
#### Summary
The model demonstrates high performance in **logical deduction tasks** and **multi-choice reasoning problems**. It is particularly effective in identifying **patterns in digital forensics scenarios**.
---
## Environmental Impact
Carbon emissions were estimated using the [Machine Learning Impact calculator](https://mlco2.github.io/impact#compute):
- **Hardware Type:** NVIDIA A100 GPU
- **Hours Used:** 24
- **Cloud Provider:** AWS
- **Compute Region:** US-East
- **Carbon Emitted:** ~30 kg CO2eq
---
## Technical Specifications
### Model Architecture and Objective
- **Architecture:** Transformer-based autoregressive model (decoder-only).
- **Objective:** Minimize cross-entropy loss for sequence prediction.
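The objective above is the standard autoregressive next-token cross-entropy (a generic formulation, not specific to this model): for a token sequence $x_{1:T}$,

$$\mathcal{L}(\theta) = -\sum_{t=1}^{T} \log p_\theta(x_t \mid x_{<t})$$

where $p_\theta$ is the model's predicted distribution over the vocabulary given the preceding tokens.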
### Compute Infrastructure
- **Hardware:** NVIDIA A100 (80 GB) GPUs.
- **Frameworks:** PyTorch and HuggingFace Transformers.
---
## Citation
If you use this model, please cite it as follows:
```bibtex
@misc{critical_thinker,
  author    = {Theeseus AI},
  title     = {Critical Thinker Model},
  year      = {2024},
  version   = {1.0},
  publisher = {HuggingFace Models},
  url       = {https://huggingface.co/theeseus-ai/CriticalThinker}
}
```
---
## Contact
For questions or contributions, contact:
- **Email:** theeseus@protonmail.com
- **LinkedIn:** [Theeseus](https://www.linkedin.com/in/theeseus/)