Experimental 2-expert MoE (mixture-of-experts) build of BakLLaVA multimodal.
This is a custom 6-bit mixed-layer quantization tuned for performance at an 11 GB file size (ideal for an NVIDIA GTX 1080 Ti or better).
It also works with stock llama.cpp and LM Studio when loaded locally.
Extremely good at chat-only use; multimodal performance is variable (sometimes it just talks to itself nonstop). It can respond in multiple languages with prompt engineering.
To run it, download the two GGUF files into your home (~) folder on Ubuntu, build llama.cpp, start ./server, and open the local web UI it serves at 127.0.0.1:8080:
git clone https://github.com/ggerganov/llama.cpp/ && cd llama.cpp && make LLAMA_CUBLAS=1 -j
./server -m ~/bakllava-14b-2xmoe-6bit.gguf --mmproj ~/projector-bakllava-14b-2xmoe-f16.gguf -t 8 --host localhost -ngl 42
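Once the server is up, you can also query it directly over HTTP instead of using the web UI. A minimal sketch, assuming the standard llama.cpp ./server /completion endpoint and its prompt / temperature / n_predict fields (adjust if your build differs):

# Quick sanity check against the running server
curl -s http://127.0.0.1:8080/completion \
  -H "Content-Type: application/json" \
  -d '{
        "prompt": "USER: Describe what BakLLaVA is in one sentence.\nASSISTANT:",
        "temperature": 0.6,
        "n_predict": 128
      }'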
Below: the model running without a GPU at all. What's interesting is that this model has had no medical fine-tuning whatsoever. Red-green highlighting indicates confidence levels; in the example above, it has low confidence that the abnormality is a pulmonary infection (it is actually a pulmonology image showing lung cancer).
To reproduce the results, use temp=0.6 with the following system prompt:
You are a medical doctor AI named Llama who is an expert at reading xrays and diagnosing conditions.
You answer the User requests with medical precision.
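The X-ray example can also be reproduced against the server API. This is only a sketch: the file name xray.png is hypothetical, and the image_data / [img-N] convention is assumed from the multimodal llama.cpp server of that era, so check your build's server docs:

# Encode a local image and send it with the system prompt at temp=0.6
IMG_B64=$(base64 -w0 ~/xray.png)
curl -s http://127.0.0.1:8080/completion \
  -H "Content-Type: application/json" \
  -d @- <<EOF
{
  "prompt": "You are a medical doctor AI named Llama who is an expert at reading xrays and diagnosing conditions. You answer the User requests with medical precision.\nUSER: [img-10] What abnormality do you see in this X-ray?\nASSISTANT:",
  "image_data": [{"data": "${IMG_B64}", "id": 10}],
  "temperature": 0.6,
  "n_predict": 256
}
EOF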
It has the potential to be state of the art with a combination of DPO/self-play and confidence-level training.
Minimum requirements: any laptop with 16 GB of RAM.