Update README.md
README.md
CHANGED
@@ -49,12 +49,29 @@ The first BLING models have been trained for common RAG scenarios, specifically:
|
|
49 |
without the need for a lot of complex instruction verbiage - provide a text passage context, ask questions, and get clear fact-based responses.
|
50 |
|
51 |
|
52 |
## Bias, Risks, and Limitations
|
53 |
|
54 |
<!-- This section is meant to convey both technical and sociotechnical limitations. -->
|
55 |
|
56 |
Any model can provide inaccurate or incomplete information, and should be used in conjunction with appropriate safeguards and fact-checking mechanisms.
|
57 |
|
58 |
|
59 |
## How to Get Started with the Model
|
60 |
|
@@ -97,7 +114,7 @@ BLING models are built on top of EleutherAI/Pythia base - please see citation fo
|
|
97 |
|
98 |
Darren Oberst & llmware team
|
99 |
|
100 |
-
Please reach out anytime if you are interested in this project
|
101 |
|
102 |
|
103 |
|
|
|
49 |
without the need for a lot of complex instruction verbiage - provide a text passage context, ask questions, and get clear fact-based responses.
|
50 |
|
51 |
|
52 |
+
### Benchmark Tests
|
53 |
+
|
54 |
+
Evaluated against the benchmark test: [RAG-Instruct-Benchmark-Tester](https://www.huggingface.co/llmware/rag_instruct_benchmark_tester)
|
55 |
+
Average of 2 test runs, with 1 point for a correct answer, 0.5 points for a partially correct or blank / NF answer, 0.0 points for an incorrect answer, and -1 point for a hallucination.
|
56 |
+
|
57 |
+
--Score: 73.25 correct out of 100
|
58 |
+
--Not Found Classification: 17.5%
|
59 |
+
--Boolean: 29%
|
60 |
+
--Math/Logic: 0%
|
61 |
+
--Complex Questions (1-5): 1 (Low)
|
62 |
+
--Summarization Quality (1-5): 1 (Coherent, extractive)
|
63 |
+
|
64 |
+
For test run results, please see the files "core_rag_test" and "answer_sheet" in the repo.
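As a rough illustration of the scoring rule described in this section, here is a minimal Python sketch. The label names (`correct`, `partial`, `incorrect`, `hallucination`), the helper functions, and the toy data are all hypothetical; the benchmark's actual scoring script is not shown here and may work differently.

```python
from statistics import mean

# Point values taken from the scoring description in the README text.
# The label names themselves are an assumption made for this sketch.
POINTS = {
    "correct": 1.0,
    "partial": 0.5,        # partially correct, or blank / not-found answer
    "incorrect": 0.0,
    "hallucination": -1.0,
}


def run_score(labels):
    """Sum the points for one test run, given one label per question."""
    return sum(POINTS[label] for label in labels)


def benchmark_score(runs):
    """Average the per-run totals across all test runs."""
    return mean(run_score(labels) for labels in runs)


# Hypothetical toy data: two runs of four questions each.
runs = [
    ["correct", "partial", "incorrect", "hallucination"],  # 1 + 0.5 + 0 - 1 = 0.5
    ["correct", "correct", "partial", "incorrect"],        # 1 + 1 + 0.5 + 0 = 2.5
]
print(benchmark_score(runs))  # 1.5
```

With the real benchmark, each run would carry one label per question, and the reported score would be the average of the two run totals.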
|
65 |
+
|
66 |
+
|
67 |
## Bias, Risks, and Limitations
|
68 |
|
69 |
<!-- This section is meant to convey both technical and sociotechnical limitations. -->
|
70 |
|
71 |
Any model can provide inaccurate or incomplete information, and should be used in conjunction with appropriate safeguards and fact-checking mechanisms.
|
72 |
|
73 |
+
This model can be used effectively for quick testing and will be generally accurate in relatively simple extractive Q&A and basic summarization.
|
74 |
+
|
75 |
|
76 |
## How to Get Started with the Model
|
77 |
|
|
|
114 |
|
115 |
Darren Oberst & llmware team
|
116 |
|
117 |
+
Please reach out anytime if you are interested in this project.
|
118 |
|
119 |
|
120 |
|