doberst committed on
Commit
de81b2f
1 Parent(s): 1190b20

Update README.md

Files changed (1)
  1. README.md +16 -17
README.md CHANGED
@@ -12,22 +12,21 @@ BLING models are fine-tuned with distilled high-quality custom instruct datasets
  the objective of providing a high-quality Instruct model that is 'inference-ready' on a CPU laptop even
  without using any advanced quantization optimizations.

- ### **PERFORMANCE on BASIC RAG TEST DATASET**
-
- | Model | Params (B) | Sourcing | GPU/CPU | Output Tokens | Out as % of Input | Process Time (secs) | Score (0-100) |
- | :---------- | :--------: | :----: | :-----: | :---------: | :-------: | :--------: | :-------: |
- | gpt-4 | <=1000 | Closed | Multi-GPU | 2665 | 10.53% | 183.8 | 100 |
- | gpt-3.5-turbo-instruct | <=175 | Closed | Multi-GPU | 2621 | 11.49% | 62.7 | 100 |
- | claude-instant-v1 | <=50 | Closed | Multi-GPU | 6337 | 26.50% | 154 | 100 |
- | aib-read-gpt | 7 | Closed | GPU | 1964 | 9.30% | 114 | 96 |
- | **bling_falcon-1b-0.1** | **1.3** | **Open** | **CPU** | **3204** | **14.55%** | **696** | **77** |
- | bling_pythia-1.4b-0.1 | 1.4 | Open | CPU | 2589 | 11.75% | 593.5 | 65 |
- | bling_pythia-1b-0.1 | 1.0 | Open | CPU | 2753 | 12.49% | 428 | 59 |
- | bling_cerebras-1.3b | 1.3 | Open | CPU | 3202 | 20.01% | 690.1 | 52 |
- | bling_pythia_410m | 0.41 | NA | CPU | 2349 | 10.66% | 189 | 36 |
- | bling_cerebras_590m | 0.59 | NA | CPU | 4407 | 20.01% | 400.8 | 30 |
-
- For more details on this evaluation, please see the dataset: **llmware/rag_instruct_test_dataset_0.1** and [BLOG](https://medium.com/@darrenoberst/evaluating-llm-performance-in-rag-instruct-use-cases-083dc272a31d)
+
+ ### Benchmark Tests
+
+ Evaluated against the benchmark test: [RAG-Instruct-Benchmark-Tester](https://www.huggingface.co/datasets/llmware/rag_instruct_benchmark_tester)
+ Average of 2 test runs, with 1 point for a correct answer, 0.5 points for a partially correct or blank / "not found" answer, 0.0 points for an incorrect answer, and -1 point for a hallucination.
+
+ --**Accuracy Score**: **80.25** correct out of 100
+ --Not Found Classification: 40.0%
+ --Boolean: 41.5%
+ --Math/Logic: 7.5%
+ --Complex Questions (1-5): 1 (Low)
+ --Summarization Quality (1-5): 3 (Coherent, extractive)
+ --Hallucinations: No hallucinations observed in test runs.
+
+ For test run results (and a good indicator of target use cases), please see the files "core_rag_test" and "answer_sheet" in this repo.


  ### Model Description

@@ -115,7 +114,7 @@ This BLING model was built on top of a Falcon model base - for more information

  Darren Oberst & llmware team

- Please reach out anytime if you are interested in this project and would like to participate and work with us!
+ Please reach out anytime if you are interested in this project!



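The scoring scheme described in the new Benchmark Tests section is simple arithmetic and can be reproduced in a few lines of Python. Below is a minimal sketch, assuming per-question grades are assigned by a reviewer and that the linked benchmark dataset exposes a default train split; the grade labels, example counts, and helper names are illustrative assumptions, not part of the llmware repo.

```python
# Minimal sketch of the Benchmark Tests scoring scheme.
# Assumptions: the grade labels, example grade counts, and the use of the
# train split are illustrative only and are not taken from the llmware repo.
from datasets import load_dataset

# Benchmark questions, using the dataset id linked in the README
ds = load_dataset("llmware/rag_instruct_benchmark_tester", split="train")
print(f"{len(ds)} benchmark samples loaded")

# 1 point for a correct answer, 0.5 points for partially correct or blank / "not found",
# 0.0 points for an incorrect answer, and -1 point for a hallucination
POINTS = {"correct": 1.0, "partial": 0.5, "not_found": 0.5,
          "incorrect": 0.0, "hallucination": -1.0}

def score_run(grades):
    """Sum the points for one graded test run."""
    return sum(POINTS[g] for g in grades)

# Accuracy Score = average of two test runs over 100 core questions (hypothetical grades)
run_1 = ["correct"] * 80 + ["partial"] * 12 + ["incorrect"] * 8
run_2 = ["correct"] * 78 + ["partial"] * 14 + ["incorrect"] * 8
accuracy_score = (score_run(run_1) + score_run(run_2)) / 2
print(f"Accuracy Score: {accuracy_score} out of {len(run_1)}")
```

Note that under this scheme a hallucinated answer (-1) scores below a plainly incorrect one (0.0), so the Accuracy Score penalizes confident fabrication more heavily than a simple miss.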