Does anybody know how to actually load and run inference on this model?
I have tried the following: Oobabooga's text-generation-webui, llama.cpp, Ollama, KoboldCpp, TabbyAPI, ExLlamaV2, and LM Studio, and not a single one of them has support for this model. I am trying to use it as an open-source, locally run alternative to GPT-4, as I do not want to support OpenAI in any way, but it seems this model is designed in a way that makes running it on any pre-existing GUI impossible.
Any additional info would be massively appreciated, as I am having to put my job on hold while I sort out a GPT-4 alternative.
Important to note: I have absolutely zero experience with Diffusers/Transformers, and very little experience with code in general. I am looking for a solution that lets me run this model so I can point a front end at its port and have it fulfill requests from a tagging/captioning GUI.
Maybe you can use lmdeploy, which supports MiniCPM-Llama3-V-2_5. From the command line you can launch a Gradio demo, an api_server, or chat directly in the terminal.
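A rough sketch of what those lmdeploy invocations might look like, assuming the Hugging Face model ID `openbmb/MiniCPM-Llama3-V-2_5` and lmdeploy's default API port of 23333 (the exact flags can vary between versions, so check `lmdeploy serve api_server --help` on your install):

```shell
# Hypothetical setup sketch -- verify model ID and flags against your lmdeploy version
pip install lmdeploy

# OpenAI-compatible API server (assumed default port 23333)
lmdeploy serve api_server openbmb/MiniCPM-Llama3-V-2_5 --server-port 23333

# Or a Gradio web UI
lmdeploy serve gradio openbmb/MiniCPM-Llama3-V-2_5

# Or chat directly in the terminal
lmdeploy chat openbmb/MiniCPM-Llama3-V-2_5
```

The api_server option is probably the one you want, since it gives you a local HTTP port that a captioning front end can talk to.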
https://github.com/OpenBMB/llama.cpp/tree/minicpm-v2.5 - I believe you can run the llama.cpp server from this fork.
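Whichever server you end up running, if it exposes an OpenAI-compatible endpoint (lmdeploy's api_server does; I am less sure about the llama.cpp fork), your tagging GUI can send captioning requests to its port like this. This is a hypothetical sketch: the model name, port 23333, and the `Describe this image.` prompt are all assumptions you should adjust to your setup.

```python
# Hypothetical client sketch for an OpenAI-compatible vision endpoint.
# Model name and port are assumptions -- check your server's startup log.
import base64
import json
import urllib.request


def build_caption_request(image_bytes: bytes, prompt: str,
                          model: str = "MiniCPM-Llama3-V-2_5") -> dict:
    """Build an OpenAI-style chat payload with an inline base64 image."""
    b64 = base64.b64encode(image_bytes).decode("ascii")
    return {
        "model": model,
        "messages": [{
            "role": "user",
            "content": [
                {"type": "text", "text": prompt},
                {"type": "image_url",
                 "image_url": {"url": f"data:image/png;base64,{b64}"}},
            ],
        }],
    }


def caption(image_path: str, prompt: str = "Describe this image.",
            base_url: str = "http://localhost:23333/v1") -> str:
    """POST the image to the server and return the generated caption."""
    with open(image_path, "rb") as f:
        payload = build_caption_request(f.read(), prompt)
    req = urllib.request.Request(
        f"{base_url}/chat/completions",
        data=json.dumps(payload).encode("utf-8"),
        headers={"Content-Type": "application/json"},
    )
    with urllib.request.urlopen(req) as resp:
        return json.load(resp)["choices"][0]["message"]["content"]
```

With something like this running against the server, the front end only needs to call `caption("image.png")` instead of touching Transformers directly.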