Update README.md
README.md
---
license: apache-2.0
language:
- en
base_model:
- pszemraj/flan-t5-large-grammar-synthesis
pipeline_tag: text2text-generation
tags:
- grammar
- spelling
---

# flan-t5-large-grammar-synthesis - GGUF

GGUF files for [flan-t5-large-grammar-synthesis](https://huggingface.co/pszemraj/flan-t5-large-grammar-synthesis)

This repo contains mostly 'higher precision'/larger quants: the model's purpose is grammar/spelling correction, and low-precision quants tend to introduce incorrect fixes, which makes them rather useless for this task.

Refer to the original repo for more details.

## Usage

You can use the GGUFs with [llamafile](https://github.com/Mozilla-Ocho/llamafile) (or llama-cli) like this:

```
llamafile.exe -m grammar-synthesis-Q6_K.gguf --temp 0 -p "There car broke down so their hitching a ride to they're class."
```

and it will output the corrected text:

```
system_info: n_threads = 4 / 8 | AVX = 1 | AVX_VNNI = 0 | AVX2 = 1 | AVX512 = 1 | AVX512_VBMI = 1 | AVX512_VNNI = 1 | AVX512_BF16 = 0 | FMA = 1 | NEON = 0 | SVE = 0 | ARM_FMA = 0 | F16C = 1 | FP16_VA = 0 | WASM_SIMD = 0 | BLAS = 0 | SSE3 = 1 | SSSE3 = 1 | VSX = 0 | MATMUL_INT8 = 0 | LLAMAFILE = 1 |
sampling:
	repeat_last_n = 64, repeat_penalty = 1.000, frequency_penalty = 0.000, presence_penalty = 0.000
	top_k = 40, tfs_z = 1.000, top_p = 0.950, min_p = 0.050, typical_p = 1.000, temp = 0.000
	mirostat = 0, mirostat_lr = 0.100, mirostat_ent = 5.000
sampling order:
CFG -> Penalties -> top_k -> tfs_z -> typical_p -> top_p -> min_p -> temperature
generate: n_ctx = 8192, n_batch = 2048, n_predict = -1, n_keep = 0


The car broke down so they had to take a ride to school. [end of text]


llama_print_timings: load time = 782.21 ms
llama_print_timings: sample time = 0.23 ms / 16 runs ( 0.01 ms per token, 68376.07 tokens per second)
llama_print_timings: prompt eval time = 85.08 ms / 19 tokens ( 4.48 ms per token, 223.33 tokens per second)
llama_print_timings: eval time = 341.74 ms / 15 runs ( 22.78 ms per token, 43.89 tokens per second)
llama_print_timings: total time = 456.56 ms / 34 tokens
Log end
```
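
Since the prompt is passed with `-p`, you can batch-correct a file one line at a time with a small shell loop. This is only a sketch, not from the original repo: it assumes a POSIX shell, an input file named `sentences.txt`, and that llamafile prints the generated text to stdout while its logs go to stderr; verify those against your build before relying on it.

```
# Sketch: correct each line of sentences.txt, collecting results in corrected.txt.
# Assumes generated text on stdout and logs on stderr (check your llamafile build).
while IFS= read -r line; do
  llamafile.exe -m grammar-synthesis-Q6_K.gguf --temp 0 -p "$line" 2>/dev/null
done < sentences.txt > corrected.txt
```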

If you have a GPU, be sure to add `-ngl 9999` to your command to automatically offload as many layers as the GPU can handle for faster inference.
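
For example, the command from above with GPU offloading enabled:

```
llamafile.exe -ngl 9999 -m grammar-synthesis-Q6_K.gguf --temp 0 -p "There car broke down so their hitching a ride to they're class."
```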