Update README.md
README.md
---
license: apache-2.0
language:
- en
base_model:
- pszemraj/flan-t5-large-grammar-synthesis
pipeline_tag: text2text-generation
tags:
- grammar
- spelling
---

# flan-t5-large-grammar-synthesis - GGUF

GGUF files for [flan-t5-large-grammar-synthesis](https://huggingface.co/pszemraj/flan-t5-large-grammar-synthesis)

This repo contains mostly 'higher precision'/larger quants: the model's purpose is grammar/spelling correction, and low-precision quants tend to introduce incorrect fixes, which makes them rather useless for this task.

Refer to the original repo for more details.

## Usage

You can use the GGUFs with [llamafile](https://github.com/Mozilla-Ocho/llamafile) (or llama-cli) like this:

```
llamafile.exe -m grammar-synthesis-Q6_K.gguf --temp 0 -p "There car broke down so their hitching a ride to they're class."
```

and it will output the corrected text:

```
system_info: n_threads = 4 / 8 | AVX = 1 | AVX_VNNI = 0 | AVX2 = 1 | AVX512 = 1 | AVX512_VBMI = 1 | AVX512_VNNI = 1 | AVX512_BF16 = 0 | FMA = 1 | NEON = 0 | SVE = 0 | ARM_FMA = 0 | F16C = 1 | FP16_VA = 0 | WASM_SIMD = 0 | BLAS = 0 | SSE3 = 1 | SSSE3 = 1 | VSX = 0 | MATMUL_INT8 = 0 | LLAMAFILE = 1 |
sampling:
	repeat_last_n = 64, repeat_penalty = 1.000, frequency_penalty = 0.000, presence_penalty = 0.000
	top_k = 40, tfs_z = 1.000, top_p = 0.950, min_p = 0.050, typical_p = 1.000, temp = 0.000
	mirostat = 0, mirostat_lr = 0.100, mirostat_ent = 5.000
sampling order:
CFG -> Penalties -> top_k -> tfs_z -> typical_p -> top_p -> min_p -> temperature
generate: n_ctx = 8192, n_batch = 2048, n_predict = -1, n_keep = 0


The car broke down so they had to take a ride to school. [end of text]


llama_print_timings: load time = 782.21 ms
llama_print_timings: sample time = 0.23 ms / 16 runs ( 0.01 ms per token, 68376.07 tokens per second)
llama_print_timings: prompt eval time = 85.08 ms / 19 tokens ( 4.48 ms per token, 223.33 tokens per second)
llama_print_timings: eval time = 341.74 ms / 15 runs ( 22.78 ms per token, 43.89 tokens per second)
llama_print_timings: total time = 456.56 ms / 34 tokens
Log end
```
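
Since the prompt is passed with `-p`, you can batch-correct a file one line at a time with a small shell loop. This is only a sketch, not from the original repo: it assumes a POSIX shell, an input file named `sentences.txt`, and that llamafile prints the generated text to stdout while its logs go to stderr; verify those against your build before relying on it.

```
# Sketch: correct each line of sentences.txt, collecting results in corrected.txt.
# Assumes generated text on stdout and logs on stderr (check your llamafile build).
while IFS= read -r line; do
  llamafile.exe -m grammar-synthesis-Q6_K.gguf --temp 0 -p "$line" 2>/dev/null
done < sentences.txt > corrected.txt
```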

If you have a GPU, be sure to add `-ngl 9999` to your command to automatically offload as many layers as the GPU can handle for faster inference.
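
For example, the command from above with GPU offloading enabled:

```
llamafile.exe -ngl 9999 -m grammar-synthesis-Q6_K.gguf --temp 0 -p "There car broke down so their hitching a ride to they're class."
```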