This model is still uploading. README will be here shortly.
If you're too impatient to wait for that (of course you are), to run these files you need:
- llama.cpp as of this commit or later
- For users who don't want to compile from source, you can use the binaries from release master-e76d630
- To add the new command-line parameter `-gqa 8`
Example command:
```shell
/workspace/git/llama.cpp/main -m llama-2-70b-chat/ggml/llama-2-70b-chat.ggmlv3.q4_0.bin -gqa 8 -t 13 -p "[INST] <<SYS>>You are a helpful assistant<</SYS>>Write a story about llamas[/INST]"
```
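If you run the example command above often, a small wrapper saves retyping the prompt tags and the `-gqa 8` flag. This is a minimal bash sketch, not an official script. The `LLAMA_MAIN` and `MODEL` variables and the `llama2_prompt` and `run_llama2_70b` function names are hypothetical, and the default paths and thread count are copied from the example command, so adjust them for your own setup. It reuses the same flags and the same single-line `[INST] <<SYS>>...<</SYS>>...[/INST]` prompt layout as the example.

```shell
# Hypothetical defaults copied from the example command; override via the environment.
LLAMA_MAIN="${LLAMA_MAIN:-/workspace/git/llama.cpp/main}"
MODEL="${MODEL:-llama-2-70b-chat/ggml/llama-2-70b-chat.ggmlv3.q4_0.bin}"

# Build a prompt in the same single-line [INST]/<<SYS>> layout as the example.
# %s placeholders keep any % characters in the user text from being interpreted.
llama2_prompt() {
  local system="$1" user="$2"
  printf '[INST] <<SYS>>%s<</SYS>>%s[/INST]' "$system" "$user"
}

# Run the 70B model with the required -gqa 8 flag.
# The optional third argument is the thread count; 13 is just the example's value.
run_llama2_70b() {
  local system="$1" user="$2" threads="${3:-13}"
  "$LLAMA_MAIN" -m "$MODEL" -gqa 8 -t "$threads" -p "$(llama2_prompt "$system" "$user")"
}
```

Usage, reproducing the example command: `run_llama2_70b "You are a helpful assistant" "Write a story about llamas"`.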
There is no CUDA support at this time, but it will hopefully be coming soon.
There is no support in third-party UIs or Python libraries (llama-cpp-python, ctransformers) yet. That will come in due course.