🚀 BitNet-Llama3 (from 8B to 2B) Transformation & Training

This project converts an 8B-parameter Llama3 model into a 2B-parameter BitNet architecture that uses BitLinear layers. The converted model is then trained on a predefined dataset and uploaded to Hugging Face for future use.


Model Description

This is the model card of a 🤗 transformers model that has been pushed to the Hub. This model card has been automatically generated.

  • Developed by: ejbejaranos@gmail.com
  • Funded by [optional]: ITCL
  • Shared by [optional]: [More Information Needed]
  • Model type: Llama3 8B transformed to BitNet
  • Language(s) (NLP): Bitnet
  • License: [More Information Needed]
  • Finetuned from model [optional]: [More Information Needed]

Model Sources [optional]

  • Repository: ejbejaranos/Bitnet-Llama3-from8BM-now2B

📄 Description

This repository includes scripts to:

  1. 🎯 Transform a Llama3 model to a BitNet architecture.
  2. 💻 Train the model using Hugging Face and Weights & Biases.
  3. 🚀 Upload the transformed and trained model to Hugging Face for inference and future use.
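To make the first step more concrete, the following is a minimal, dependency-free sketch of the absmean ternary weight quantization described for BitNet b1.58, which is one common way a BitLinear layer constrains its weights. It is an illustration under assumptions, not this repository's code: the function names `absmean_ternary_quantize` and `bitlinear_forward` are hypothetical, the scheme actually used by `replace_linears_in_hf` may differ, and activation quantization and normalization are left out.

```python
from typing import List, Tuple


def absmean_ternary_quantize(
    weights: List[List[float]], eps: float = 1e-8
) -> Tuple[List[List[int]], float]:
    """Quantize a weight matrix to {-1, 0, 1} using its mean absolute value as scale.

    Hypothetical helper for illustration; not part of this repository.
    """
    magnitudes = [abs(w) for row in weights for w in row]
    scale = sum(magnitudes) / len(magnitudes)
    # Divide by the scale, round to the nearest integer, then clip to [-1, 1].
    quantized = [
        [max(-1, min(1, round(w / (scale + eps)))) for w in row]
        for row in weights
    ]
    return quantized, scale


def bitlinear_forward(
    x: List[float], quantized: List[List[int]], scale: float
) -> List[float]:
    """Compute y = scale * (W_q @ x) with the ternary weights (no bias)."""
    return [scale * sum(q * xi for q, xi in zip(row, x)) for row in quantized]


# Toy 2x2 weight matrix, chosen only for illustration.
toy_weights = [[0.9, -0.05], [-1.2, 0.4]]
toy_quantized, toy_scale = absmean_ternary_quantize(toy_weights)
print(toy_quantized)  # [[1, 0], [-1, 1]]
print(bitlinear_forward([1.0, 2.0], toy_quantized, toy_scale))
```

Because every weight becomes -1, 0, or 1, the matrix product reduces to additions and subtractions, with a single floating-point scale applied at the end.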

⚙️ Requirements

  • Python 3.8+
  • PyTorch 1.10+
  • Transformers 4.0+
  • Hugging Face Hub API
  • Weights & Biases

🧰 Installation

Make sure you have all required dependencies installed:

pip install torch transformers datasets wandb huggingface_hub
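After installing, a quick check of the installed versions against the minimums listed under Requirements can catch environment problems early. The sketch below uses only the standard library; the helper names `parse_version` and `find_unmet` are hypothetical, the version parsing is deliberately simple (leading numeric components only), and the minimums are copied from the list above.

```python
from importlib import metadata
from typing import Dict, List, Tuple


def parse_version(text: str) -> Tuple[int, ...]:
    """Extract the leading numeric components, e.g. '2.1.0+cu118' -> (2, 1, 0)."""
    parts = []
    for piece in text.split("."):
        digits = ""
        for ch in piece:
            if not ch.isdigit():
                break
            digits += ch
        if not digits:
            break
        parts.append(int(digits))
    return tuple(parts)


def is_older(installed: str, minimum: str) -> bool:
    """Compare two versions after padding both to the same length with zeros."""
    a, b = parse_version(installed), parse_version(minimum)
    width = max(len(a), len(b))
    a += (0,) * (width - len(a))
    b += (0,) * (width - len(b))
    return a < b


def find_unmet(minimums: Dict[str, str]) -> List[str]:
    """Return one message per package that is missing or older than required."""
    problems = []
    for name, minimum in minimums.items():
        try:
            installed = metadata.version(name)
        except metadata.PackageNotFoundError:
            problems.append(f"{name}: not installed")
            continue
        if is_older(installed, minimum):
            problems.append(f"{name}: {installed} < {minimum}")
    return problems


# Minimum versions taken from the Requirements list.
MINIMUMS = {"torch": "1.10", "transformers": "4.0"}

for problem in find_unmet(MINIMUMS):
    print(problem)
```

An empty output means both packages are present and meet the stated minimums.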

💥 How to Use

  1. Using the trained model for inference
from transformers import AutoModelForCausalLM, AutoTokenizer
from utils.bitnet_transformation import replace_linears_in_hf

# Load the BitNet model from the Hub
model_id = "ejbejaranos/Bitnet-Llama3-from8BM-now2B"
model = AutoModelForCausalLM.from_pretrained(
    model_id,
    token="YOUR_HF_TOKEN",  # older transformers releases used `use_auth_token` instead
)

# Swap the model's linear layers for BitLinear layers before inference
replace_linears_in_hf(model)
tokenizer = AutoTokenizer.from_pretrained(model_id)

# Set up for inference
model.to(device="cuda:0")
prompt = "What is Machine Learning?"
inputs = tokenizer(prompt, return_tensors="pt").to(model.device)
# Unpack the inputs so the attention mask is passed along with the input IDs
generate_ids = model.generate(**inputs, max_length=50)
output = tokenizer.batch_decode(generate_ids, skip_special_tokens=True, clean_up_tokenization_spaces=False)[0]

print(output)

🧑‍🔬 Metrics


Training metrics are logged to Weights & Biases. The final values were:

  • final_loss: 1.4
  • final_perplexity: 4.2
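The two metrics are closely related: perplexity is conventionally the exponential of the mean cross-entropy loss measured in nats. The sketch below assumes that convention, which the training script may or may not follow exactly, and the function names are hypothetical.

```python
import math


def perplexity_from_loss(mean_cross_entropy: float) -> float:
    """Perplexity as exp(mean cross-entropy), with the loss measured in nats."""
    return math.exp(mean_cross_entropy)


def loss_from_perplexity(perplexity: float) -> float:
    """Inverse mapping: the mean cross-entropy implied by a given perplexity."""
    return math.log(perplexity)


print(round(perplexity_from_loss(1.4), 2))  # 4.06
print(round(loss_from_perplexity(4.2), 3))  # 1.435
```

Under this assumption, a loss of 1.4 corresponds to a perplexity of about 4.06, and a perplexity of 4.2 to a loss of about 1.435. The small gap from the reported pair is consistent with rounding of the logged values or with perplexity being averaged per batch.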

🎯 Future Goals

  • Implement additional quantization layers for inference.
  • Test the model on different datasets and contexts.

📢 Contact

If you have questions, suggestions, or improvements, feel free to open an Issue or contact us through Hugging Face.


Environmental Impact

Carbon emissions can be estimated using the Machine Learning Impact calculator presented in Lacoste et al. (2019).

  • Hardware Type: [More Information Needed]
  • Hours used: [More Information Needed]
  • Cloud Provider: [More Information Needed]
  • Compute Region: [More Information Needed]
  • Carbon Emitted: [More Information Needed]

💡 Acknowledgments

Thanks to Hugging Face and Weights & Biases for providing support and tools.

Model size: 2.36B params (Safetensors, tensor type F32)