	---
	language:
	- uz
	license: apache-2.0
	tags:
	- text-generation-inference
	- transformers
	- unsloth
	- llama
	- trl
	base_model: llama-3-8b-bnb-4bit
	---

	# Uploaded model

	# Usage

	```python
	import gradio as gr
	from unsloth import FastLanguageModel

	# Load your pre-trained model
	max_seq_length = 2048
	dtype = None
	load_in_4bit = True

	model, tokenizer = FastLanguageModel.from_pretrained(
	    model_name="blackhole33/llama-3-8b-bnb-4bit",
	    max_seq_length=max_seq_length,
	    dtype=dtype,
	    load_in_4bit=load_in_4bit,
	)

	FastLanguageModel.for_inference(model) # Enable native 2x faster inference

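	# Alpaca prompt template (kept in Uzbek). Approximate English meaning of the first line:
	# "Below is an instruction that describes a task, paired with an input that provides further context. Write a response that appropriately completes the request."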
	alpaca_prompt = """Quyida vazifani tavsiflovchi ko'rsatma mavjud bo'lib, u qo'shimcha kontekstni ta'minlaydigan kiritish bilan bog'langan. So'rovni to'g'ri to'ldiradigan javob yozing.

	### Instruction:
	{}

	### Response:
	{}"""

	# Function to generate response
	def generate_response(instruction):
	    inputs = tokenizer(
	        [
	            alpaca_prompt.format(
	                instruction,  # instruction
	                "",  # output - leave this blank for generation!
	            )
	        ],
	        return_tensors="pt",
	    ).to("cuda")

	    outputs = model.generate(**inputs, max_new_tokens=250, use_cache=True)
	    res = tokenizer.batch_decode(outputs, skip_special_tokens=True)
	    return res[0]

	# Gradio interface
	interface = gr.Interface(
	    fn=generate_response,
	    inputs=[
	        gr.Textbox(lines=2, placeholder="Question"),
	    ],
	    outputs="text",
	    title="Uzbek Language Model Interface",
	    description="Enter an instruction to get a response from the model.",
	)

	# Launch the interface
	interface.launch(share=True)


	```
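
The string returned by `generate_response` above is the full decoded sequence, so it contains the prompt as well as the model's answer. If only the answer should appear in the interface, one option is to cut everything up to the `### Response:` marker. The sketch below is an assumption about how that could be done, not part of the original model card; `extract_response` and `RESPONSE_MARKER` are hypothetical names and do not come from unsloth, transformers, or gradio. It also assumes the instruction itself does not contain the marker text.

```python
# Hypothetical helper: strips the Alpaca-style prompt from decoded model output.
RESPONSE_MARKER = "### Response:"


def extract_response(decoded: str, marker: str = RESPONSE_MARKER) -> str:
    """Return the text after the first occurrence of the marker.

    If the marker is not present, the input is returned with surrounding
    whitespace stripped, so the caller never loses the whole output.
    """
    _, found, tail = decoded.partition(marker)
    if not found:
        return decoded.strip()
    return tail.strip()


if __name__ == "__main__":
    # Made-up decoded text in the same layout as the prompt template.
    decoded = "### Instruction:\nSalom\n\n### Response:\nSalom! Qanday yordam bera olaman?"
    print(extract_response(decoded))
```

To use it, `generate_response` could return `extract_response(res[0])` instead of `res[0]`.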

	- Developed by: blackhole33
	- License: apache-2.0
	- Fine-tuned from model: llama-3-8b-bnb-4bit