pszemraj committed
Commit 535bdaf
1 Parent(s): 907e903

Update README.md

Files changed (1)
  1. README.md +46 -9
README.md CHANGED
@@ -1,11 +1,14 @@
- ---
- license: apache-2.0
- language:
- - en
- base_model:
- - pszemraj/flan-t5-large-grammar-synthesis
- pipeline_tag: text2text-generation
- ---
+ ---
+ license: apache-2.0
+ language:
+ - en
+ base_model:
+ - pszemraj/flan-t5-large-grammar-synthesis
+ pipeline_tag: text2text-generation
+ tags:
+ - grammar
+ - spelling
+ ---

  # flan-t5-large-grammar-synthesis - GGUF

@@ -14,4 +17,38 @@ GGUF files for [flan-t5-large-grammar-synthesis](https://huggingface.co/pszemraj

  This repo contains mostly 'higher precision'/larger quants, because this model is meant for grammar/spelling correction and is rather useless at low precision, where it produces incorrect fixes.

- Refer to the original repo for more details.
+ Refer to the original repo for more details.
+
+ ## Usage
+
+ You can use the GGUFs with [llamafile](https://github.com/Mozilla-Ocho/llamafile) (or llama-cli) like this:
+
+ ```
+ llamafile.exe -m grammar-synthesis-Q6_K.gguf --temp 0 -p "There car broke down so their hitching a ride to they're class."
+ ```
+
+ and it will output the corrected text:
+
+ ```
+ system_info: n_threads = 4 / 8 | AVX = 1 | AVX_VNNI = 0 | AVX2 = 1 | AVX512 = 1 | AVX512_VBMI = 1 | AVX512_VNNI = 1 | AVX512_BF16 = 0 | FMA = 1 | NEON = 0 | SVE = 0 | ARM_FMA = 0 | F16C = 1 | FP16_VA = 0 | WASM_SIMD = 0 | BLAS = 0 | SSE3 = 1 | SSSE3 = 1 | VSX = 0 | MATMUL_INT8 = 0 | LLAMAFILE = 1 |
+ sampling:
+ repeat_last_n = 64, repeat_penalty = 1.000, frequency_penalty = 0.000, presence_penalty = 0.000
+ top_k = 40, tfs_z = 1.000, top_p = 0.950, min_p = 0.050, typical_p = 1.000, temp = 0.000
+ mirostat = 0, mirostat_lr = 0.100, mirostat_ent = 5.000
+ sampling order:
+ CFG -> Penalties -> top_k -> tfs_z -> typical_p -> top_p -> min_p -> temperature
+ generate: n_ctx = 8192, n_batch = 2048, n_predict = -1, n_keep = 0
+
+
+ The car broke down so they had to take a ride to school. [end of text]
+
+
+ llama_print_timings: load time = 782.21 ms
+ llama_print_timings: sample time = 0.23 ms / 16 runs ( 0.01 ms per token, 68376.07 tokens per second)
+ llama_print_timings: prompt eval time = 85.08 ms / 19 tokens ( 4.48 ms per token, 223.33 tokens per second)
+ llama_print_timings: eval time = 341.74 ms / 15 runs ( 22.78 ms per token, 43.89 tokens per second)
+ llama_print_timings: total time = 456.56 ms / 34 tokens
+ Log end
+ ```
+
+ If you have a GPU, be sure to add `-ngl 9999` to your command to automatically place as many layers as the GPU can handle for faster inference.
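
For scripting, the command from the Usage section can be wrapped in a few lines of Python. This is only a rough sketch and not part of the original repo: `build_command`, `extract_correction`, and `correct` are hypothetical helper names, and the parsing assumes the corrected text appears on a line ending in `[end of text]`, as in the sample output shown in the README. It also assumes `llamafile.exe` can be found from the working directory or PATH.

```python
import subprocess

# Marker that closes the generated text in the sample output from the README.
END_MARKER = "[end of text]"


def build_command(model_path, prompt, binary="llamafile.exe", gpu=False):
    """Build the argument list for the command shown in the Usage section."""
    cmd = [binary, "-m", model_path, "--temp", "0", "-p", prompt]
    if gpu:
        # The README suggests -ngl 9999 when a GPU is available.
        cmd += ["-ngl", "9999"]
    return cmd


def extract_correction(output):
    """Return the text before END_MARKER, or None if no such line exists.

    Assumption: the corrected sentence sits on a single line that ends
    with the marker, as in the README's sample output.
    """
    for line in output.splitlines():
        line = line.strip()
        if line.endswith(END_MARKER):
            return line[: -len(END_MARKER)].strip()
    return None


def correct(model_path, prompt, **kwargs):
    """Run the binary and return the corrected text (hypothetical helper)."""
    result = subprocess.run(
        build_command(model_path, prompt, **kwargs),
        capture_output=True,
        text=True,
        check=True,
    )
    # Search both streams, since logs and generated text may be split across them.
    return extract_correction(result.stdout + "\n" + result.stderr)
```

Calling `correct("grammar-synthesis-Q6_K.gguf", "There car broke down ...", gpu=True)` would run the same command as above with `-ngl 9999` appended and return only the corrected sentence, without the sampling and timing logs.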