---
language:
- en
license: apache-2.0
tags:
- text-generation-inference
- transformers
- Qwen2
- 1.5B
- 4-bit
- sft
- python
datasets:
- NuclearAi/Nuke-Python-Verse
base_model: Qwen/Qwen2-1.5B-Instruct
pipeline_tag: text-generation
---

# Uploaded model

- **Developed by:** [NuclearAi](https://huggingface.co/NuclearAi)
- **License:** apache-2.0
- **Launch date:** Saturday, 15 June 2024
- **Base model:** Qwen/Qwen2-1.5B-Instruct
- **Special thanks:** To [Unsloth](https://unsloth.ai/)

# Dataset used for training!

We used [NuclearAi/Nuke-Python-Verse](https://huggingface.co/datasets/NuclearAi/Nuke-Python-Verse) to fine-tune *Qwen2-1.5B-Instruct* on **240,888** unique lines of Python code scraped from publicly available datasets. (A short sketch for inspecting the dataset is included at the end of this card.)

# Here is the code to run it with 4-bit quantization!

```python
#!pip install transformers torch accelerate bitsandbytes

from transformers import AutoModelForCausalLM, AutoTokenizer, BitsAndBytesConfig, TextStreamer
import torch

device = "cuda" if torch.cuda.is_available() else "cpu"  # Use GPU if available, else fall back to CPU

# Configure 4-bit quantization with bitsandbytes
bnb_config = BitsAndBytesConfig(
    load_in_4bit=True,                    # Enable 4-bit quantization
    bnb_4bit_use_double_quant=True,       # Use double quantization
    bnb_4bit_compute_dtype=torch.float16  # Compute in float16 for better performance
)

# Load the model with the quantization configuration
model = AutoModelForCausalLM.from_pretrained(
    "NuclearAi/Hyper-X-Qwen2-1.5B-It-Python",
    quantization_config=bnb_config,  # Apply the 4-bit quantization configuration
    torch_dtype="auto",              # Automatically select the data type
    device_map="auto" if device == "cuda" else None  # Place the model on the GPU when available
)

# Load the tokenizer
tokenizer = AutoTokenizer.from_pretrained("NuclearAi/Hyper-X-Qwen2-1.5B-It-Python")

# Initialize a text streamer for streaming the output
streamer = TextStreamer(tokenizer)

# Generate a response from the model for the user's input
def generate_response(user_input):
    # Tokenize the user input
    input_ids = tokenizer.encode(user_input, return_tensors="pt").to(device)

    # Generate the model's response with streaming enabled
    generated_ids = model.generate(
        input_ids,
        max_new_tokens=128,
        pad_token_id=tokenizer.eos_token_id,  # Handle padding during generation
        streamer=streamer                     # Stream tokens as they are generated
    )

    # Decode the response from token IDs to text
    response = tokenizer.decode(generated_ids[0], skip_special_tokens=True)
    return response.strip()

# Start the conversation loop
print("You can start chatting with the model. Type 'exit' to stop the conversation.")
while True:
    # Get the user's input
    user_input = input("You: ")

    # Check whether the user wants to exit the conversation
    if user_input.lower() in ["exit", "quit", "stop"]:
        print("Ending the conversation. Goodbye!")
        break

    # Generate the model's response
    print("Assistant: ", end="", flush=True)  # Prepare to print the response
    response = generate_response(user_input)

    # The TextStreamer already prints the response token by token, so just print a newline
    print()  # Move to the next line after the response is printed
```
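
# Optional: prompting with the chat template

The loop above feeds raw text straight into the model. Because the base checkpoint is an instruct (chat) model, results are often better when the prompt is wrapped with the tokenizer's chat template. Below is a minimal sketch, assuming the `model`, `tokenizer`, and `streamer` objects from the code above; `generate_chat_response` is just an illustrative helper name.

```python
def generate_chat_response(user_input):
    # Build a single-turn chat and let the tokenizer add Qwen2's chat special tokens
    messages = [{"role": "user", "content": user_input}]
    prompt = tokenizer.apply_chat_template(
        messages,
        tokenize=False,
        add_generation_prompt=True  # Append the assistant turn marker
    )

    input_ids = tokenizer(prompt, return_tensors="pt").input_ids.to(model.device)

    generated_ids = model.generate(
        input_ids,
        max_new_tokens=128,
        pad_token_id=tokenizer.eos_token_id,
        streamer=streamer
    )

    # Drop the prompt tokens so only the assistant's reply is returned
    reply_ids = generated_ids[0][input_ids.shape[-1]:]
    return tokenizer.decode(reply_ids, skip_special_tokens=True).strip()
```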
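
# Inspecting the training dataset

If you want to look at the training data yourself, the dataset is public on the Hugging Face Hub and can be loaded with the `datasets` library. A minimal sketch; the splits and column names are whatever the dataset ships with, not something defined by this card.

```python
#!pip install datasets
from datasets import load_dataset

# Load the public training dataset from the Hugging Face Hub
ds = load_dataset("NuclearAi/Nuke-Python-Verse")

print(ds)  # Show the available splits and their columns

first_split = next(iter(ds.values()))
print(first_split[0])  # Inspect the first example
```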