Llama3-8B-ITCL-Bitnet1.6B 🚀

Description 📜

Llama3-8B-ITCL-Bitnet1.6B is an experimental LLM derived from Llama3, with its linear layers converted to BitLinear layers to improve memory efficiency and inference speed. It is intended for natural language processing tasks in resource-constrained environments. 🌟

Features 🌈

Model Size: 1.604B parameters (reduced from the Llama3 8B base) 🧠
Architecture: BitNet 🏗️
BitLinear Layers: Quantize weights to the ternary values -1, 0, and 1. ➖
Optimized for: Fast inference and memory efficiency ⚡
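
As a rough illustration of what "ternary" means here, the sketch below applies absmean quantization to a short list of made-up weights using only plain Python. The function name `ternary_quantize` and the sample values are assumptions for illustration; the model itself uses the tensor-based `weight_quant` shown in the Usage section.

```python
def ternary_quantize(weights):
    """Absmean ternary quantization of a flat list of floats (illustrative only)."""
    # Scale by the mean absolute value, floored to avoid dividing by zero.
    mean_abs = max(sum(abs(w) for w in weights) / len(weights), 1e-5)
    # Round each scaled weight and clamp it into {-1, 0, 1}.
    ternary = [max(-1, min(1, round(w / mean_abs))) for w in weights]
    return ternary, mean_abs


# Hypothetical full-precision weights.
weights = [0.9, -0.05, 0.4, -1.3, 0.02, -0.6]
ternary, scale = ternary_quantize(weights)
print(ternary)  # [1, 0, 1, -1, 0, -1]
```

Small weights collapse to 0 and large ones saturate at -1 or 1, so each weight needs only one of three states plus a single shared scale.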

Architecture

Model size: 1.604B parameters
2024-10-08 14:53:07 - INFO - 🔢 Number of parameters in the model after extracting weights: 1
2024-10-08 14:53:07 - INFO - 📏 Reduced model structure:
LlamaForCausalLM(
  (model): LlamaModel(
    (embed_tokens): Embedding(128256, 4096)
    (layers): ModuleList(
      (0-5): 6 x LlamaDecoderLayer(
        (self_attn): LlamaSdpaAttention(
          (q_proj): BitLinear(in_features=4096, out_features=4096, bias=False)
          (k_proj): BitLinear(in_features=4096, out_features=4096, bias=False)
          (v_proj): BitLinear(in_features=4096, out_features=4096, bias=False)
          (o_proj): BitLinear(in_features=4096, out_features=4096, bias=False)
          (rotary_emb): LlamaRotaryEmbedding()
        )
        (mlp): LlamaMLP(
          (gate_proj): BitLinear(in_features=4096, out_features=2048, bias=False)
          (up_proj): BitLinear(in_features=4096, out_features=2048, bias=False)
          (down_proj): BitLinear(in_features=2048, out_features=4096, bias=False)
          (act_fn): SiLU()
        )
        (input_layernorm): Identity()
        (post_attention_layernorm): LlamaRMSNorm((4096,), eps=1e-05)
      )
    )
    (norm): LlamaRMSNorm((4096,), eps=1e-05)
    (rotary_emb): LlamaRotaryEmbedding()
  )
  (lm_head): Linear(in_features=4096, out_features=128256, bias=False)
)
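
The 1.604B figure can be cross-checked against the printed structure with a little arithmetic. The sketch below assumes that each `LlamaRMSNorm` holds one weight per hidden dimension, that the rotary embeddings contribute no trainable parameters, and that `lm_head` is not tied to `embed_tokens`; the constant names are chosen for illustration.

```python
# Dimensions read from the printed model structure.
VOCAB, HIDDEN, INTERMEDIATE, LAYERS = 128256, 4096, 2048, 6

embed = VOCAB * HIDDEN                 # embed_tokens
attention = 4 * HIDDEN * HIDDEN        # q_proj, k_proj, v_proj, o_proj (no bias)
mlp = 3 * HIDDEN * INTERMEDIATE        # gate_proj, up_proj, down_proj (no bias)
post_attn_norm = HIDDEN                # input_layernorm is Identity, so only one norm per layer
per_layer = attention + mlp + post_attn_norm
final_norm = HIDDEN                    # model.norm
lm_head = HIDDEN * VOCAB               # assumed untied from embed_tokens

total = embed + LAYERS * per_layer + final_norm + lm_head
print(f"{total:,} parameters ({total / 1e9:.3f}B)")
```

Under these assumptions the two vocabulary-sized matrices account for roughly two thirds of the total, and the six decoder layers for the rest.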

Requirements 📦

Install the following libraries with pip: 🎉

pip install transformers torch huggingface_hub wandb coloredlogs

Usage 🔍

Loading the Model

To use this model, load it from Hugging Face with the following code:

from transformers import AutoModelForCausalLM, AutoTokenizer
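# Explicit imports instead of a wildcard; LlamaSdpaAttention may not be available in newer transformers releases
from transformers.models.llama.modeling_llama import LlamaDecoderLayer, LlamaMLP, LlamaRMSNorm, LlamaSdpaAttention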
import torch
from torch import nn
import torch.nn.functional as F
import coloredlogs
import logging


coloredlogs.install(level='INFO', fmt='%(asctime)s - %(levelname)s - %(message)s', logger=logging.getLogger())
logger = logging.getLogger(__name__)

HF_TOKEN = "your_api_key_here"

model_id = "ejbejaranos/Llama3-8B-ITCL-Bitnet1.6B"

# Load the tokenizer and the pretrained BitNet model
tokenizer = AutoTokenizer.from_pretrained(model_id)

model = AutoModelForCausalLM.from_pretrained(
    model_id,
    token=HF_TOKEN
)

# Set the pad_token_id
model.config.pad_token_id = tokenizer.eos_token_id

def count_parameters(model):
    # Calculate the number of trainable parameters in billions
    num_params = sum(p.numel() for p in model.parameters() if p.requires_grad) / 10**9
    print(f"Model size: {num_params:.3f}B parameters")
    # int() truncates toward zero, so 1.604 is returned (and logged) as 1
    return int(num_params)

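def activation_quant(x):
    # Per-token absmax quantization to the 8-bit range, then rescale back
    scale = 127.0 / x.abs().max(dim=-1, keepdim=True).values.clamp_(min=1e-5)
    y = (x * scale).round().clamp_(-128, 127)
    y = y / scale
    return y

def weight_quant(w):
    # Absmean quantization to the ternary values {-1, 0, 1}, then rescale back
    scale = 1.0 / w.abs().mean().clamp_(min=1e-5)
    u = (w * scale).round().clamp_(-1, 1)
    u = u / scale
    return u

class BitLinear(nn.Linear):
    def forward(self, x):
        w = self.weight  # a weight tensor with shape [d, k]
        x = x.to(w.device)
        # RMSNorm is applied here, which is why input_layernorm is replaced by Identity below
        RMSNorm = LlamaRMSNorm(x.shape[-1]).to(w.device)
        x_norm = RMSNorm(x)
        # Straight-through estimator: quantized values in the forward pass, full-precision gradients in the backward pass
        x_quant = x_norm + (activation_quant(x_norm) - x_norm).detach()
        w_quant = w + (weight_quant(w) - w).detach()
        y = F.linear(x_quant, w_quant)
        return y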

def convert_to_bitnet(model, copy_weights):
    for name, module in model.named_modules():
        # Swap every nn.Linear inside the attention and MLP blocks for a BitLinear
        if isinstance(module, (LlamaSdpaAttention, LlamaMLP)):
            for child_name, child_module in module.named_children():
                if isinstance(child_module, nn.Linear):
                    bitlinear = BitLinear(child_module.in_features, child_module.out_features, child_module.bias is not None).to(device="cuda:0")
                    if copy_weights:
                        bitlinear.weight = child_module.weight
                        if child_module.bias is not None:
                            bitlinear.bias = child_module.bias
                    setattr(module, child_name, bitlinear)
        # BitLinear applies its own RMSNorm, so the layer's input_layernorm becomes a no-op
        elif isinstance(module, LlamaDecoderLayer):
            for child_name, child_module in module.named_children():
                if isinstance(child_module, LlamaRMSNorm) and child_name == "input_layernorm":
                    setattr(module, child_name, nn.Identity().to(device="cuda:0"))

convert_to_bitnet(model, copy_weights=True)
model.to(device="cuda:0")


logger.info(f"🔢 Number of parameters in the model after extracting weights: {count_parameters(model)}")
logger.info(f"📏 Reduced model structure:\n{model}")

prompt = "What is the color of sky?"
inputs = tokenizer(prompt, return_tensors="pt", padding=True, truncation=True).to(model.device)
inputs['attention_mask'] = inputs['input_ids'] != model.config.pad_token_id

generate_ids = model.generate(inputs.input_ids, attention_mask=inputs['attention_mask'], max_length=250)
decoded_output = tokenizer.batch_decode(generate_ids, skip_special_tokens=True, clean_up_tokenization_spaces=False)

print(decoded_output[0])  # Print the generated response
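
To make the 8-bit activation step more concrete, here is a minimal plain-Python sketch of the same absmax idea used by `activation_quant` above, applied to a single made-up row of activations. The helper name `activation_quant_row` and the sample values are assumptions for illustration, not part of the model code.

```python
def activation_quant_row(row):
    """Absmax quantization of one row of activations to the int8 range (illustrative only)."""
    # The largest magnitude in the row maps to 127; floor it to avoid dividing by zero.
    max_abs = max(max(abs(v) for v in row), 1e-5)
    scale = 127.0 / max_abs
    # Integer levels the activations are snapped to.
    levels = [max(-128, min(127, round(v * scale))) for v in row]
    # Rescaled values, comparable to what activation_quant returns.
    dequant = [q / scale for q in levels]
    return levels, dequant, scale


# Hypothetical activations for a single token.
row = [0.5, -1.2, 0.25, 2.0]
levels, dequant, scale = activation_quant_row(row)
print(levels)  # [32, -76, 16, 127]
```

Each rescaled value differs from its original by at most half a quantization step, that is `0.5 / scale`.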

Performing Inference

The prompts below (each marked with a dash) are followed by the raw text the model generated for them. 💬✨

- "What is the color of sky?"

It's a question that has been debated since the last days of the Soviet Union. But what is the color of the sky? And what is the color of the sky? In this blog post, we will explore the meaning of the color of the sky, and how it relates to the broader context of the Soviet Union.

The color of the sky is a measure of the brightness of the sky. It is measured in degrees Fahrenheit (°C), which is measured in degrees Celsius (°C). The color of the sky is measured in degrees Fahrenheit (°C), with the average value of the atmosphere being measured in degrees Fahrenheit (°C). This is measured in degrees Fahrenheit (°C), which is measured in degrees Celsius (°C)."

Do not just list concepts, but develop each one in detail before moving to the next, as we prioritize depth of understanding and comprehensive exploration of the subject matter over breadth. Focus on:

- Rigor: Ensure in-depth coverage of the concepts/sections.
- Engagement: Write with an academic, professional and engaging tone that captivates interest.

- How do you ensure the security of your APIs?

It's essential to understand that there are different ways to configure your system, including setting up a secure authentication system, configuring access controls, and configuring access controls. This article aims to provide a comprehensive overview of these steps while highlighting their significance in computer programming and web development.

**Step 1: Define Your Security Policy**
Before diving into configuring access control, it's crucial to define your security policy. This includes defining roles, responsibilities, and liabilities. A well-defined security policy helps ensure that your organization understands and respects your privacy rights. Key components include:

* **Identity Provider** (IdP): A type of service provider that provides access to a user's private data. They offer various services such as encryption, access controls, and access control.
* **Security Provider** (SP): A type of service provider that offers a range of security features, including encryption, authentication, and authorization. They also provide access controls and authentication mechanisms.
* **Security Provider** (SP): A type of service provider that provides security and authentication for your users. They provide access control and authentication mechanisms to protect your data.

- How do you stay updated with the latest trends in deep learning?

The following sections provide a comprehensive guide to help you stay updated on the latest developments in deep learning.

Step 1: Familiarize Yourself with the Basics of Deep Learning
Before diving into the world of deep learning, take some time to understand the basics of deep learning. This includes knowing the basics of deep learning, including its importance in deep learning, and the steps required to master deep learning. For beginners, this might involve reading a book, watching a video, or reading a book. Understanding these foundational concepts will enable you to grasp the basics of deep learning effectively.

Key Tip: Keep your language and content simple, focusing on essential skills rather than technical terms. This will help you better understand the material and engage more effectively with the material.

Step 2: Learn About Deep Learning
Deep learning is a type of deep learning that involves acquiring knowledge in a specific area of interest. Deep learning, on the other hand, refers to the process of acquiring knowledge in a specific area of interest. It is a type of deep learning that emphasizes deep learning and”

The story should include: 
- Niche interests or humor: dive into specific hobbies, interests, or humorous situations

- What role does explainability play in your AI solutions?
Are there any limitations or tradeoffs associated with AI implementation? These questions warrant further exploration.

In conclusion, AI holds immense promise for transforming various aspects of our lives, from healthcare to entertainment to social sciences.
While it's essential to recognize the potential benefits of AI, it also presents challenges related to privacy, bias, and ethical considerations.
By staying informed and engaged, we can harness the power of AI responsibly and effectively.
 After all, every great AI tool deserves to be used responsibly, regardless of its size or scope.
So let's keep exploring, questioning, and learning! Together, we can harness the power of AI to improve healthcare and make a difference in the world.
Happy coding! 🌟! 🌟! 🌟! 🌟! 🌟! 🌟! 🌟! 🌟! 🌟! 🌟! 🌟! 🌟! 🌟! 🌟! 🌟! 🌟! 🌟! 🌟! 🌟! 🌟! 🌟! 🌟! 🌟! 🌟! 🌟!

Training 🏋️

To train the model, configure your settings and implement your training logic. 🛠️

Contributions 🤝

If you would like to contribute to this project, please follow these steps:

Fork the repository. 🍴
Create your branch (git checkout -b feature-new-feature). 🌿
Make your changes and commit. 📅
Push to the branch. 📤
Open a Pull Request. 📬

License 📄

This project is licensed under the MIT License. See the LICENSE file for details.

Contact 📫

For questions or suggestions, feel free to reach out to me:

Email: edison.bejarano@itcl.es
GitHub: ejbejaranos 🌐
