jartine committed · verified
Commit a41951c · Parent(s): 5c7eae8

Update README.md

Files changed (1): README.md (+4 -1)

README.md CHANGED
@@ -68,7 +68,10 @@ This model has a max context window size of 128k tokens. By default, a
 context window size of 4096 tokens is used. You can use a larger context
 window by passing the `-c 8192` flag. The software currently has
 limitations in its llama v3.1 support that may prevent scaling to the
-full 128k size.
+full 128k size. See our
+[Phi-3-medium-128k-instruct-llamafile](https://huggingface.co/Mozilla/Phi-3-medium-128k-instruct-llamafile)
+repository for llamafiles that are known to work with a 128k token
+context size.
 
 On GPUs with sufficient RAM, the `-ngl 999` flag may be passed to use
 the system's NVIDIA or AMD GPU(s). On Windows, only the graphics card
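The two flags discussed in this hunk can be combined in one invocation. A minimal sketch, assuming a downloaded llamafile has been made executable; the filename `model.llamafile` is a placeholder, not a file from this repository:

```shell
# Sketch of combining the flags described above.
# "model.llamafile" is a hypothetical filename.
MODEL=./model.llamafile

FLAGS="-c 8192"          # request an 8192-token context window (default is 4096)
FLAGS="$FLAGS -ngl 999"  # offload all layers to an NVIDIA/AMD GPU with enough RAM

# Print the command line that would be run:
echo "$MODEL" $FLAGS
```

On machines without a supported GPU, dropping `-ngl 999` keeps inference on the CPU.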