If you're going to use CPU and RAM only, without a GPU, what can be done to optimize the speed of running LLaMA as an API?
Thanks in advance
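For context, here is a hypothetical sketch of the kind of setup I mean: the built-in HTTP server from llama.cpp, run on CPU with a quantized GGUF model (the model path and flag values below are placeholders, not a recommendation):

```shell
# Hypothetical example: serve a quantized model on CPU with llama.cpp's server.
# -t    number of CPU threads (typically set to the physical core count)
# -c    context window size in tokens
./server -m models/7B/ggml-model-q4_0.gguf \
    -t 8 \
    -c 2048 \
    --host 0.0.0.0 --port 8080
```

I'm mainly wondering which knobs matter most in a setup like this: quantization level, thread count, batch/context settings, or something else entirely.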