metadata
license: apache-2.0
tags:
  - merge
  - model_fusion
  - TIES
  - Llama3.1
  - crypto
  - blockchain
  - coding_assistant
  - creative_writing
  - roleplaying
  - uncensored
  - latent_diffusion
  - long_context
  - agentic_AI
  - multi_domain
  - research
  - instruction-following
  - technical_reasoning
  - task_generalization
  - AI_tools
  - GPT
base_model:
  - Chainbase-Labs/Theia-Llama-3.1-8B-v1
  - EpistemeAI/Fireball-Meta-Llama-3.2-8B-Instruct-agent-003-128k-code-DPO
  - aifeifei798/DarkIdol-Llama-3.1-8B-Instruct-1.2-Uncensored
  - DeepAutoAI/ldm_soup_Llama-3.1-8B-Inst
datasets:
  - CoinMarketCap
  - blockchain_projects
  - agentic_code_DPO
libraries: transformers
library_name: transformers

ZeroXClem/Llama3.1-TheiaFire-DarkFusion-8B

Architecture: Llama 3.1 - 8B
Proposed Name: Llama3.1-TheiaFire-DarkFusion-8B
Merge Method: TIES
Merge Date: 10/25/2024
License: Apache 2.0


Model Overview

Llama3.1-TheiaFire-DarkFusion-8B is a fusion of four 8B Llama-family models, combined to balance technical reasoning, creativity, and uncensored output. It targets advanced coding assistance, blockchain insights, creative roleplaying, and general-purpose use.

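This model was merged using the TIES merge method, which trims small parameter changes from each source model and resolves sign conflicts between them before combining what remains. The goal is to reduce interference between the four models so the result holds up across multiple domains.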


For best results in LM Studio, leave the system prompt blank. The tokenizer seems to struggle with system prompts.
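If the same behavior shows up outside LM Studio, one possible workaround is to fold any instructions into the user turn instead of sending a separate system message. The sketch below is only an illustration of that idea; `build_messages` and its `instructions` parameter are hypothetical names, and whether this improves output for this model has not been verified here.

```python
def build_messages(user_prompt, instructions=None):
    """Return a chat message list that contains no system turn.

    `instructions` (hypothetical parameter) is prepended to the user
    message rather than sent as a separate system message.
    """
    if instructions:
        content = f"{instructions.strip()}\n\n{user_prompt.strip()}"
    else:
        content = user_prompt.strip()
    return [{"role": "user", "content": content}]


messages = build_messages(
    "Write a Python function to compute the factorial of a number.",
    instructions="Answer with code only.",
)
print(messages)
```

The resulting list can be passed anywhere a chat-style `messages` list is accepted.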

Model Components

The following models were merged to create Llama3.1-TheiaFire-DarkFusion-8B:

  1. Chainbase-Labs/Theia-Llama-3.1-8B-v1

    • Purpose: Balances technical vision and crypto capabilities.
    • Training Focus: This model specializes in blockchain data and was trained on a large dataset of crypto whitepapers, research reports, and market data.
    • Unique Feature: Fine-tuned using LoRA for optimized crypto-specific performance.
  2. EpistemeAI/Fireball-Meta-Llama-3.2-8B-Instruct-agent-003-128k-code-DPO

    • Purpose: Specialized in agentic reasoning and advanced coding tasks.
    • Unique Feature: This model is equipped with a 128K context window and comes with built-in tools for ReAct, calculator, search, and more.
  3. aifeifei798/DarkIdol-Llama-3.1-8B-Instruct-1.2-Uncensored

    • Purpose: Provides uncensored, creativity-driven responses ideal for writing, role-playing, and in-depth conversations.
    • Unique Feature: Uncensored nature allows for open exploration of creative writing and darker, more complex roleplay scenarios.
  4. DeepAutoAI/ldm_soup_Llama-3.1-8B-Inst

    • Purpose: Enhances performance with latent diffusion model blending.
    • Unique Feature: This model builds upon Llama-3.1’s foundation and improves unseen task generalization with latent diffusion.

Model Specifications

Merge Configuration

# Llama3.1-TheiaFire-DarkFusion-8B Merge Configuration
models:
  - model: Chainbase-Labs/Theia-Llama-3.1-8B-v1
    parameters:
      density: 0.4  # Balancing technical vision and crypto capabilities
      weight: 0.3
  - model: EpistemeAI/Fireball-Meta-Llama-3.2-8B-Instruct-agent-003-128k-code-DPO
    parameters:
      density: 0.6  # Giving priority to code-based reasoning and agentic capabilities
      weight: 0.4
  - model: aifeifei798/DarkIdol-Llama-3.1-8B-Instruct-1.2-Uncensored
    parameters:
      density: 0.5  # Focus on creativity and uncensored roleplay flexibility
      weight: 0.2
  - model: DeepAutoAI/ldm_soup_Llama-3.1-8B-Inst
    parameters:
      density: 0.5  # Blending latent diffusion capabilities for unseen tasks
      weight: 0.1

merge_method: ties
base_model: Chainbase-Labs/Theia-Llama-3.1-8B-v1
dtype: bfloat16
parameters:
  normalize: true
out_dtype: float16
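
To give a feel for what `density`, `weight`, and `normalize` do in the configuration above, here is a toy sketch of the three TIES steps (trim, elect sign, merge) on flat lists of numbers. It is a simplified illustration, not the merge tool's actual implementation; the function names and the four-element "models" are made up for the example.

```python
def trim(delta, density):
    """Keep the top `density` fraction of entries by magnitude; zero the rest."""
    k = int(round(density * len(delta)))
    if k == 0:
        return [0.0] * len(delta)
    # Indices of the k largest-magnitude entries.
    order = sorted(range(len(delta)), key=lambda i: abs(delta[i]), reverse=True)
    keep = set(order[:k])
    return [d if i in keep else 0.0 for i, d in enumerate(delta)]


def ties_merge(base, models, normalize=True):
    """Merge `models`, a list of (params, density, weight) tuples, onto `base`."""
    # 1. Task vectors (model minus base), trimmed, then scaled by weight.
    deltas = []
    for params, density, weight in models:
        delta = [p - b for p, b in zip(params, base)]
        deltas.append(([weight * d for d in trim(delta, density)], weight))

    merged = []
    for j, b in enumerate(base):
        column = [(d[j], w) for d, w in deltas]
        # 2. Elect the sign that carries the larger total weighted mass.
        sign = 1.0 if sum(v for v, _ in column) >= 0 else -1.0
        # 3. Combine only the entries that agree with the elected sign.
        agree = [(v, w) for v, w in column if v * sign > 0]
        value = sum(v for v, _ in agree)
        if normalize and agree:
            value /= sum(w for _, w in agree)
        merged.append(b + value)
    return merged


base = [0.0, 0.0, 0.0, 0.0]
model_a = [1.0, -2.0, 0.1, 4.0]
model_b = [3.0, 1.5, 0.0, -1.0]
merged = ties_merge(base, [(model_a, 0.5, 0.5), (model_b, 0.5, 0.5)])
print(merged)  # [3.0, -2.0, 0.0, 4.0]
```

In the second position the two toy models disagree in sign (-2.0 versus 1.5); the negative side has more weighted mass, so only model_a's value survives. The weights in the real configuration (0.3, 0.4, 0.2, 0.1) already sum to 1.0, so `normalize: true` mainly matters where some models are masked out at a given parameter.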

Intended Use Cases

  1. Crypto Analysis & Blockchain Projects

    • Leverages data from CoinMarketCap and research reports for in-depth analysis of blockchain projects and crypto markets.
    • Ideal for creating blockchain-related content or automating crypto data analysis.
  2. Advanced Coding Assistant

    • Built-in support for agentic behavior such as reasoning and action, making it perfect for AI-driven coding assistance.
    • Handles large-scale coding projects with tools like search and calculator integration.
  3. Creative Writing & Roleplay

    • Uncensored output allows for rich, expressive writing ideal for novels, creative pieces, or roleplay scenarios.
    • Capable of producing nuanced, emotionally complex character responses in roleplaying games or interactive storytelling.
  4. Unseen Task Generalization

    • Through the latent diffusion component, the model is intended to handle unseen tasks by adaptively learning weight distributions, which should improve performance on novel datasets or tasks.
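
The reasoning-and-action behavior mentioned under the coding assistant use case is usually driven by a small loop on the caller's side: the model emits an action, the host runs the tool, and the observation is fed back. The sketch below shows only the tool-dispatch step with a calculator. The `Action: name[input]` line format, `TOOLS`, and `run_action` are assumptions for illustration; this card does not document the exact format the merged model emits.

```python
import ast
import operator

# Arithmetic operators the calculator tool accepts.
_OPS = {
    ast.Add: operator.add,
    ast.Sub: operator.sub,
    ast.Mult: operator.mul,
    ast.Div: operator.truediv,
    ast.USub: operator.neg,
}


def calculator(expression):
    """Evaluate a plain arithmetic expression without calling eval()."""
    def _eval(node):
        if isinstance(node, ast.Constant) and isinstance(node.value, (int, float)):
            return node.value
        if isinstance(node, ast.BinOp) and type(node.op) in _OPS:
            return _OPS[type(node.op)](_eval(node.left), _eval(node.right))
        if isinstance(node, ast.UnaryOp) and type(node.op) in _OPS:
            return _OPS[type(node.op)](_eval(node.operand))
        raise ValueError(f"unsupported expression: {expression!r}")
    return _eval(ast.parse(expression.strip(), mode="eval").body)


TOOLS = {"calculator": calculator}  # hypothetical tool registry


def run_action(model_output):
    """Find a line like 'Action: calculator[2 * (3 + 4)]' and run that tool.

    Returns an 'Observation: ...' string, or None if no action was requested.
    """
    for line in model_output.splitlines():
        if line.startswith("Action:"):
            name, _, arg = line[len("Action:"):].strip().partition("[")
            tool = TOOLS.get(name.strip())
            if tool is None:
                return f"Observation: unknown tool {name.strip()!r}"
            return f"Observation: {tool(arg.rstrip(']'))}"
    return None


step = "Thought: I need to multiply.\nAction: calculator[2 * (3 + 4)]"
print(run_action(step))  # Observation: 14
```

In a full loop, the observation string would be appended to the conversation and the model called again until it answers without requesting an action.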

Performance

  • The model has shown significant improvements in multi-domain reasoning, code generation, and unconstrained creative output.
  • Enhanced task generalization due to latent diffusion model blending techniques.

Model Capabilities

  • Context Window: 128K (capable of handling long-form tasks like novel writing and in-depth research).
  • Agentic Tools: Built-in tools like search and calculator.
  • Safety: While uncensored, responsible prompting is encouraged to ensure the best user experience and ethical usage.
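
For long-form work it helps to check how much of the context window a prompt leaves for generation. This is a minimal budgeting sketch, assuming "128K" means 131,072 tokens; `CONTEXT_WINDOW`, `remaining_tokens`, and `reserve` are illustrative names, not part of any library.

```python
CONTEXT_WINDOW = 128 * 1024  # assumption: "128K" = 131,072 tokens


def remaining_tokens(prompt_tokens, reserve=0, context_window=CONTEXT_WINDOW):
    """Tokens left for generation after the prompt and an optional reserve."""
    return max(context_window - prompt_tokens - reserve, 0)


# A 100,000-token manuscript still leaves room for a long continuation.
print(remaining_tokens(100_000))               # 31072
# A prompt that nearly fills the window leaves nothing once a reserve is held back.
print(remaining_tokens(131_000, reserve=512))  # 0
```

The result can serve as an upper bound for `max_new_tokens`; the prompt's token count would come from the model's tokenizer.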

Usage

This model can be used with popular AI libraries such as Transformers and LangChain. Below is a basic setup using Transformers:

Example Code

import transformers
import torch

# Full Hub repo id (owner/name), as given in the title of this card.
model_id = "ZeroXClem/Llama3.1-TheiaFire-DarkFusion-8B"

# Text-generation pipeline; weights load in bfloat16 and are placed automatically.
pipeline = transformers.pipeline(
    "text-generation",
    model=model_id,
    model_kwargs={"torch_dtype": torch.bfloat16},
    device_map="auto",
)

messages = [
    {"role": "system", "content": "You are an AI assistant skilled in coding and creative writing."},
    {"role": "user", "content": "Please write me a Python function to compute the factorial of a number."}
]

outputs = pipeline(messages, max_new_tokens=256)
# With chat input, generated_text holds the whole conversation; the last entry is the reply.
print(outputs[0]["generated_text"][-1])
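
The final `print` above shows the last message as a dictionary with `role` and `content` keys. To get only the reply text, a small helper like the one below can be used. `extract_reply` is a hypothetical name, and `fake_outputs` is a stand-in shaped like a chat-style pipeline result so the snippet runs without downloading the model.

```python
def extract_reply(outputs):
    """Return the assistant's text from a chat-style text-generation result."""
    conversation = outputs[0]["generated_text"]
    last = conversation[-1]
    if last.get("role") != "assistant":
        raise ValueError("last message is not from the assistant")
    return last["content"]


# Stand-in for a real pipeline result (assumed shape, not real model output).
fake_outputs = [{
    "generated_text": [
        {"role": "user", "content": "Say hi."},
        {"role": "assistant", "content": "Hi!"},
    ]
}]
print(extract_reply(fake_outputs))  # Hi!
```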

Limitations

  • Uncensored Output: While this model offers creative freedom, it may produce content that could be considered inappropriate or unsuitable for certain contexts.
  • Bias: As with all language models, this one may reflect inherent biases in the training data. Users are encouraged to review and edit the outputs before use.

Acknowledgments

This model is a collective effort, combining the groundbreaking work from:

  • Chainbase Labs (for Theia-Llama)
  • EpistemeAI (for Fireball Meta-Llama)
  • aifeifei798 (for DarkIdol)
  • DeepAutoAI (for LDM Soup)

Special thanks to the open-source community and the developers who contributed to the training and fine-tuning of these models.