Mozilla/Llama-3.2-3B-Instruct-llamafile


jartine committed on Oct 2, 2024

Commit f249b3d (verified) · 1 parent: 44a894d

Update README.md

Files changed (1)

README.md +7 -0


@@ -116,6 +116,13 @@ driver needs to be installed if you own an NVIDIA GPU. On Windows, if
 you have an AMD GPU, you should install the ROCm SDK v6.1 and then pass
 the flags `--recompile --gpu amd` the first time you run your llamafile.
+On NVIDIA GPUs, by default, the prebuilt tinyBLAS library is used to
+perform matrix multiplications. This is open source software, but it
+doesn't go as fast as closed source cuBLAS. If you have the CUDA SDK
+installed on your system, then you can pass the `--recompile` flag to
+build a GGML CUDA library just for your system that uses cuBLAS. This
+ensures you get maximum performance.
 For further information, please see the [llamafile
 README](https://github.com/mozilla-ocho/llamafile/).
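For readers who script their launches, the flag choices described in this change can be summarized as a small Python helper. This is a minimal sketch, not part of llamafile: the function names, the `gpu_vendor` strings, the placeholder model file name, and the use of `nvcc` on `PATH` as a stand-in for "the CUDA SDK is installed" are all assumptions made for illustration. Only the flags themselves (`--recompile`, `--gpu amd`) come from the README text above.

```python
import shutil
from typing import List


def llamafile_gpu_flags(gpu_vendor: str, cuda_sdk_installed: bool, first_run: bool) -> List[str]:
    """Return the extra llamafile flags suggested by the README text.

    Hypothetical helper: gpu_vendor is "nvidia", "amd", or anything else.
    """
    vendor = gpu_vendor.lower()
    if vendor == "amd":
        # README (Windows + ROCm SDK v6.1): pass these flags the first time you run.
        return ["--recompile", "--gpu", "amd"] if first_run else []
    if vendor == "nvidia" and cuda_sdk_installed:
        # README: --recompile builds a GGML CUDA library for this system that uses cuBLAS.
        # The README does not say whether it is needed again on later runs.
        return ["--recompile"]
    # Default: NVIDIA without the CUDA SDK uses the prebuilt tinyBLAS library; no extra flags.
    return []


def cuda_sdk_on_path() -> bool:
    # Assumption: treat the CUDA SDK as installed if its compiler `nvcc` is on PATH.
    return shutil.which("nvcc") is not None


if __name__ == "__main__":
    # "./model.llamafile" is a placeholder file name, for illustration only.
    flags = llamafile_gpu_flags("nvidia", cuda_sdk_on_path(), first_run=True)
    print(" ".join(["./model.llamafile", *flags]))
```

The helper only builds an argument list and never launches anything, so it can be checked without a GPU; how the resulting command is executed is left to the caller.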