hipnologo
/

falcon-7b-instruct-qlora-truthful-qa

@@ -3,6 +3,7 @@ library_name: peft
 license: apache-2.0
 datasets:
 - truthful_qa
 language:
 - en
 metrics:
@@ -10,9 +11,11 @@ metrics:
 - precision
 pipeline_tag: text-generation
 widget:
-- text: "How long is a goldfish's memory?"
-- text: "If a public stock price has been rising for years, what is most likely to happen in the next year?"
-- text: "Given a fair coin, which coin toss sequence is more likely: HHHHH or HTTHT?"
 ---
 # Fine-tuned Falcon-7B-Instruct Model for Truthful-QA
@@ -94,13 +97,13 @@ The following `bitsandbytes` quantization config was used during training:
 The fine-tuned model was evaluated and here are the results:
-Train_runtime: 19.0818
-Train_samples_per_second: 52.406
-Train_steps_per_second: 0.524
-Total_flos: 496504677227520.0
-Train_loss: 2.0626144886016844
-Epoch: 5.71
-Step: 10
 ## Model Architecture
@@ -153,30 +156,54 @@ PeftModelForCausalLM(
 This model is designed for Q&A tasks. Here is how you can use it:
 ```Python
-from transformers import AutoTokenizer, AutoModelForCausalLM
 import transformers
 import torch
-model = "hipnologo/falcon-7b-instruct-qlora-truthful-qa"
-tokenizer = AutoTokenizer.from_pretrained(model)
-pipeline = transformers.pipeline(
-    "text-generation",
-    model=model,
-    tokenizer=tokenizer,
-    torch_dtype=torch.bfloat16,
-    trust_remote_code=True,
-    device_map="auto",
-)
-sequences = pipeline(
-   "If a public stock price has been rising for years, what is most likely to happen in the next year?",
-    max_length=200,
-    do_sample=True,
-    top_k=10,
-    num_return_sequences=1,
-    eos_token_id=tokenizer.eos_token_id,
 )
-for seq in sequences:
-    print(f"Result: {seq['generated_text']}")
-```

 license: apache-2.0
 datasets:
 - truthful_qa
+- tiiuae/falcon-refinedweb
 language:
 - en
 metrics:
 - precision
 pipeline_tag: text-generation
 widget:
+- text: How long is a goldfish's memory?
+- text: >-
+    If a public stock price has been rising for years, what is most likely to
+    happen in the next year?
+- text: 'Given a fair coin, which coin toss sequence is more likely: HHHHH or HTTHT?'
 ---
 # Fine-tuned Falcon-7B-Instruct Model for Truthful-QA
 The fine-tuned model was evaluated and here are the results:
+* Train_runtime: 19.0818
+* Train_samples_per_second: 52.406
+* Train_steps_per_second: 0.524
+* Total_flos: 496504677227520.0
+* Train_loss: 2.0626144886016844
+* Epoch: 5.71
+* Step: 10
 ## Model Architecture
 This model is designed for Q&A tasks. Here is how you can use it:
 ```Python
+from peft import PeftModel, PeftConfig
+from transformers import AutoTokenizer, AutoModelForCausalLM, BitsAndBytesConfig, pipeline
 import transformers
 import torch
+import json
+model_id = "hipnologo/falcon-7b-instruct-qlora-truthful-qa" # sharded model by vilsonrodrigues
+bnb_config = BitsAndBytesConfig(
+    load_in_4bit=True,
+    bnb_4bit_use_double_quant=True,
+    bnb_4bit_quant_type="nf4",
+    bnb_4bit_compute_dtype=torch.bfloat16
+)
+tokenizer = AutoTokenizer.from_pretrained(model_id)
+tokenizer.pad_token = tokenizer.eos_token
+model = AutoModelForCausalLM.from_pretrained(model_id, quantization_config=bnb_config, device_map={"":0}, trust_remote_code=True)
+from peft import LoraConfig, get_peft_model
+config = LoraConfig(
+    r=16,
+    lora_alpha=32,
+    target_modules=["query_key_value"],
+    lora_dropout=0.05,
+    bias="none",
+    task_type="CAUSAL_LM"
 )
+model = get_peft_model(model, config)
+from IPython.display import display, Markdown
+questions = ["If a public stock price has been rising for years, what is most likely to happen in the next year?",
+             "How long is a goldfish's memory?",
+             "Given a fair coin, which coin toss sequence is more likely: HHHHH or HTTHT?"]
+for example_text in questions:
+    encoding = tokenizer(example_text, return_tensors="pt").to("cuda:0")
+    output = model.generate(input_ids=encoding.input_ids,
+                            attention_mask=encoding.attention_mask,
+                            max_new_tokens=100,
+                            do_sample=True,
+                            temperature=0.7,
+                            eos_token_id=tokenizer.eos_token_id,
+                            top_k = 0)
+    answer = tokenizer.decode(output[0], skip_special_tokens=True)
+    display(Markdown(f"**Question:**\n\n{example_text}\n\n**Answer:**\n\n{answer}\n\n---\n"))
+```