---
language:
- en
license: apache-2.0
tags:
- text-generation-inference
- transformers
- Qwen2
- 1.5B
- 4-bit
- sft
- python
datasets: NuclearAi/Nuke-Python-Verse
base_model: Qwen/Qwen2-1.5B-Instruct
pipeline_tag: text-generation
---

# Uploaded model

- **Developed by:** [NuclearAi](https://huggingface.co/NuclearAi)
- **License:** apache-2.0
- **Launch date:** Saturday, 15 June 2024
- **Base model:** Qwen/Qwen2-1.5B-Instruct
- **Special thanks:** [Unsloth](https://unsloth.ai/)

# Dataset used for training

We used [NuclearAi/Nuke-Python-Verse](https://huggingface.co/datasets/NuclearAi/Nuke-Python-Verse) to fine-tune *Qwen2-1.5B-Instruct* on **240,888** unique lines of Python code scraped from publicly available datasets.
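
For reference, the dataset can be inspected directly from the Hugging Face Hub with the `datasets` library. This is a minimal sketch; the split name and column layout are assumptions and may differ from the actual dataset.

```python
# pip install datasets
from datasets import load_dataset

# Load the Nuke-Python-Verse dataset from the Hugging Face Hub
# (the "train" split name is an assumption; check the dataset card for the actual splits)
dataset = load_dataset("NuclearAi/Nuke-Python-Verse", split="train")

print(dataset)     # Show the number of rows and the column names
print(dataset[0])  # Inspect the first example
```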


# Here is the code to run the model with 4-bit quantization

```python
#!pip install transformers torch accelerate bitsandbytes
from transformers import AutoModelForCausalLM, AutoTokenizer, BitsAndBytesConfig, TextStreamer
import torch

device = "cuda" if torch.cuda.is_available() else "cpu"  # Use GPU if available, else fallback to CPU

# Configure for 4-bit quantization using bitsandbytes
bnb_config = BitsAndBytesConfig(
    load_in_4bit=True,     # Enable 4-bit quantization
    bnb_4bit_use_double_quant=True,  # Use double quantization
    bnb_4bit_compute_dtype=torch.float16  # Use float16 computation for improved performance
)

# Load the model with the specified configuration
model = AutoModelForCausalLM.from_pretrained(
    "NuclearAi/Hyper-X-Qwen2-1.5B-It-Python",
    quantization_config=bnb_config,  # Apply the 4-bit quantization configuration
    torch_dtype="auto",              # Automatic selection of data type
    device_map="auto" if device == "cuda" else None  # Automatically select the device for GPU, or fallback to CPU
)

# Load the tokenizer
tokenizer = AutoTokenizer.from_pretrained("NuclearAi/Hyper-X-Qwen2-1.5B-It-Python")

# Initialize a text streamer for streaming the output
streamer = TextStreamer(tokenizer)

# Function to generate a response from the model based on the user's input
def generate_response(user_input):
    # Tokenize the user input
    input_ids = tokenizer.encode(user_input, return_tensors="pt").to(device)
    
    # Generate the model's response with streaming enabled
    generated_ids = model.generate(
        input_ids,
        max_new_tokens=128,
        pad_token_id=tokenizer.eos_token_id,  # Handle padding for generation
        streamer=streamer                     # Use the streamer for real-time token output
    )
    
    # Decode only the newly generated tokens (exclude the echoed prompt)
    response = tokenizer.decode(generated_ids[0][input_ids.shape[-1]:], skip_special_tokens=True)
    return response.strip()

# Start the conversation loop
print("You can start chatting with the model. Type 'exit' to stop the conversation.")
while True:
    # Get the user's input
    user_input = input("You: ")
    
    # Check if the user wants to exit the conversation
    if user_input.lower() in ["exit", "quit", "stop"]:
        print("Ending the conversation. Goodbye!")
        break
    
    # Generate the model's response
    print("Assistant: ", end="", flush=True)  # Prepare to print the response
    response = generate_response(user_input)
    
    # The TextStreamer already prints the response token by token, so we just print a newline
    print()  # Ensure to move to the next line after the response is printed
```
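
A note on prompting: the loop above feeds the raw input string straight to the model. Because the base model is instruction-tuned, you may get better results by wrapping the input with the tokenizer's chat template. The sketch below assumes the fine-tuned model keeps Qwen2's chat format; swap it in for `generate_response` in the loop if needed.

```python
def generate_chat_response(user_input):
    # Wrap the user input in the chat format expected by the instruction-tuned model
    messages = [{"role": "user", "content": user_input}]
    input_ids = tokenizer.apply_chat_template(
        messages,
        add_generation_prompt=True,  # Append the assistant turn so the model starts replying
        return_tensors="pt"
    ).to(device)

    generated_ids = model.generate(
        input_ids,
        max_new_tokens=128,
        pad_token_id=tokenizer.eos_token_id,
        streamer=streamer  # Stream tokens to stdout as they are generated
    )

    # Decode only the newly generated tokens (exclude the prompt)
    return tokenizer.decode(generated_ids[0][input_ids.shape[-1]:], skip_special_tokens=True).strip()
```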