---
language:
- en
license: apache-2.0
tags:
- text-generation-inference
- transformers
- Qwen2
- 1.5B
- 4-bit
- sft
- python
datasets: NuclearAi/Nuke-Python-Verse
base_model: Qwen/Qwen2-1.5B-Instruct
pipeline_tag: text-generation
---
# Uploaded model
- **Developed by:** [NuclearAi](https://huggingface.co/NuclearAi)
- **License:** apache-2.0
- **Launch date:** Saturday, 15 June 2024
- **Base model:** Qwen/Qwen2-1.5B-Instruct
- **Special thanks:** [Unsloth](https://unsloth.ai/)
# Dataset used for training
We used [NuclearAi/Nuke-Python-Verse](https://huggingface.co/datasets/NuclearAi/Nuke-Python-Verse) to fine-tune the *Qwen2-1.5B-Instruct* model on **240,888** unique lines of Python code scraped from publicly available datasets.
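If you want to inspect the training data yourself, it can be loaded with the `datasets` library. This is a minimal sketch; the `train` split name is an assumption, so check the dataset card for the exact splits and columns.

```python
from datasets import load_dataset

# Load the training data (split name assumed to be "train"; see the dataset card).
ds = load_dataset("NuclearAi/Nuke-Python-Verse", split="train")

print(ds)     # number of rows and column names
print(ds[0])  # first example
```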
# Here is the code to run it with 4-bit quantization
```python
#!pip install transformers torch accelerate bitsandbytes
from transformers import AutoModelForCausalLM, AutoTokenizer, BitsAndBytesConfig, TextStreamer
import torch

device = "cuda" if torch.cuda.is_available() else "cpu"  # Use GPU if available, else fall back to CPU

# Configure 4-bit quantization with bitsandbytes
bnb_config = BitsAndBytesConfig(
    load_in_4bit=True,                    # Enable 4-bit quantization
    bnb_4bit_use_double_quant=True,       # Use double quantization
    bnb_4bit_compute_dtype=torch.float16  # Compute in float16 for better performance
)

# Load the model with the quantization configuration
model = AutoModelForCausalLM.from_pretrained(
    "NuclearAi/Hyper-X-Qwen2-1.5B-It-Python",
    quantization_config=bnb_config,                  # Apply the 4-bit quantization configuration
    torch_dtype="auto",                              # Automatically select the data type
    device_map="auto" if device == "cuda" else None  # Place on GPU automatically, or fall back to CPU
)

# Load the tokenizer
tokenizer = AutoTokenizer.from_pretrained("NuclearAi/Hyper-X-Qwen2-1.5B-It-Python")

# Initialize a text streamer to print tokens as they are generated
streamer = TextStreamer(tokenizer)

# Generate a response from the model for the user's input
def generate_response(user_input):
    # Tokenize the user input
    input_ids = tokenizer.encode(user_input, return_tensors="pt").to(device)

    # Generate the model's response with streaming enabled
    generated_ids = model.generate(
        input_ids,
        max_new_tokens=128,
        pad_token_id=tokenizer.eos_token_id,  # Handle padding during generation
        streamer=streamer                     # Stream tokens in real time
    )

    # Decode the full response from token IDs to text
    response = tokenizer.decode(generated_ids[0], skip_special_tokens=True)
    return response.strip()

# Start the conversation loop
print("You can start chatting with the model. Type 'exit' to stop the conversation.")

while True:
    # Get the user's input
    user_input = input("You: ")

    # Check if the user wants to exit the conversation
    if user_input.lower() in ["exit", "quit", "stop"]:
        print("Ending the conversation. Goodbye!")
        break

    # Generate the model's response (the TextStreamer prints it token by token)
    print("Assistant: ", end="", flush=True)
    response = generate_response(user_input)
    print()  # Move to the next line after the streamed response
```
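For chat-style prompts, you may get better results by wrapping the input in the Qwen2 chat template instead of passing raw text. The sketch below reuses the `model`, `tokenizer`, and `streamer` objects from the snippet above; the system prompt and example question are placeholders.

```python
# Sketch: chat-template prompting (assumes model, tokenizer, and streamer from the snippet above).
messages = [
    {"role": "system", "content": "You are a helpful Python coding assistant."},  # placeholder system prompt
    {"role": "user", "content": "Write a function that reverses a string."},      # placeholder question
]

# Build the prompt using the tokenizer's built-in chat template.
prompt = tokenizer.apply_chat_template(messages, tokenize=False, add_generation_prompt=True)

inputs = tokenizer(prompt, return_tensors="pt").to(model.device)
model.generate(**inputs, max_new_tokens=128, streamer=streamer)
```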