doberst committed on
Commit
492f901
1 Parent(s): ee3ab5a

Update README.md

Files changed (1)
  1. README.md +18 -1
README.md CHANGED
@@ -49,12 +49,29 @@ The first BLING models have been trained for common RAG scenarios, specifically:
  without the need for a lot of complex instruction verbiage - provide a text passage context, ask questions, and get clear fact-based responses.
 
 
+ ### Benchmark Tests
+
+ Evaluated against the benchmark test: [RAG-Instruct-Benchmark-Tester](https://www.huggingface.co/llmware/rag_instruct_benchmark_tester)
+ Average of 2 Test Runs with 1 point for correct answer, 0.5 points for partially correct or blank / NF, 0.0 points for incorrect, and -1 points for hallucinations.
+
+ --Score: 73.25 correct out of 100
+ --Not Found Classification: 17.5%
+ --Boolean: 29%
+ --Math/Logic: 0%
+ --Complex Questions (1-5): 1 (Low)
+ --Summarization Quality (1-5): 1 (Coherent, extractive)
+
+ For test run results, please see the files ("core_rag_test" and "answer_sheet" in the repo).
+
+
  ## Bias, Risks, and Limitations
 
  <!-- This section is meant to convey both technical and sociotechnical limitations. -->
 
  Any model can provide inaccurate or incomplete information, and should be used in conjunction with appropriate safeguards and fact-checking mechanisms.
 
+ This model can be used effectively for quick testing and will be generally accurate in relatively simple extractive Q&A and basic summarization.
+
 
  ## How to Get Started with the Model
 
@@ -97,7 +114,7 @@ BLING models are built on top of EleutherAI/Pythia base - please see citation fo
 
  Darren Oberst & llmware team
 
- Please reach out anytime if you are interested in this project and would like to participate and work with us!
+ Please reach out anytime if you are interested in this project.
 
 
 
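
The scoring rule stated in the added "Benchmark Tests" section (1 point for a correct answer, 0.5 for a partially correct or blank / not-found answer, 0.0 for an incorrect answer, -1 for a hallucination, averaged over 2 test runs) can be illustrated with a minimal sketch. The label names, the function names, and the sample runs below are hypothetical and are not taken from the llmware repo or its test files; only the point values come from the README text.

```python
from statistics import mean

# Point values as stated in the README text.
# The label names themselves are hypothetical, chosen for this sketch.
POINTS = {
    "correct": 1.0,
    "partial": 0.5,
    "not_found": 0.5,      # covers the "blank / NF" case
    "incorrect": 0.0,
    "hallucination": -1.0,
}


def run_score(labels):
    """Sum the points for one test run, given one outcome label per question."""
    return sum(POINTS[label] for label in labels)


def average_score(runs):
    """Average the per-run totals across all test runs."""
    return mean(run_score(labels) for labels in runs)


# Two made-up runs of four questions each (not real benchmark data).
run_a = ["correct", "correct", "partial", "hallucination"]   # 1 + 1 + 0.5 - 1 = 1.5
run_b = ["correct", "incorrect", "not_found", "correct"]     # 1 + 0 + 0.5 + 1 = 2.5

print(average_score([run_a, run_b]))  # prints 2.0
```

Because a hallucination subtracts a full point while a blank or not-found answer still earns half a point, this rule rewards a model for declining to answer over inventing one.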