How to load with LangChain using CTransformers?

#20
by ianuvrat - opened

Can anyone share the syntax for loading this model with CTransformers?

Hi,
Unfortunately, ctransformers is running a bit late to the game when it comes to keeping up with the new and exciting large language model architectural changes this winter.
From what I can tell, due to how ctransformers works, some of the new models with unique architectural changes may not be supported yet. It doesn't seem very flexible about loading models outside of the model families it was built for. ctransformers has not been updated since November, which was just before all the fun new model excitement this winter.
I don't think they are out of the game yet. You may have to wait, or consider using another backend/system in the interim if you don't want to wait. I've been using ctransformers this year and am considering moving to something new due to the delays. (I hope that ctransformers is OK and catches up soon. I liked using it.)
ctransformers is built around running quantized models, so you might have better luck looking around areas where that is the main theme, if you weren't already.
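For reference, the usual LangChain + ctransformers loading pattern is sketched below. To be clear, this is my best guess at what the syntax would be, not a confirmed working load: the quant file name and prompt format are assumptions based on TheBloke's GGUF repo, and whether ctransformers can parse this GGUF at all is the open question above.

```python
# Sketch only: the typical LangChain + ctransformers pattern.
# Assumptions: the repo/file names below match TheBloke's GGUF repo,
# and SOLAR's llama-family architecture is close enough for model_type="llama".
from langchain.llms import CTransformers

llm = CTransformers(
    model="TheBloke/SOLAR-10.7B-Instruct-v1.0-GGUF",     # HF repo with the quants
    model_file="solar-10.7b-instruct-v1.0.Q4_K_M.gguf",  # pick a quant from the repo's file list
    model_type="llama",                                  # SOLAR is llama-family
    config={"max_new_tokens": 256, "temperature": 0.7},
)

print(llm("### User:\nWhat is SOLAR 10.7B?\n\n### Assistant:\n"))
```

If ctransformers rejects the file, that would line up with the architecture-support lag described above.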
For example, TheBloke points out other tools/clients/libraries that can load GGUF quants of this and some newer models: https://huggingface.co/TheBloke/SOLAR-10.7B-Instruct-v1.0-GGUF#about-gguf
The llama.cpp family (llama.cpp and llama-cpp-python) might be a good choice. They seem able to move fast on support for new model changes, and I've personally seen llama-cpp-python load SOLAR successfully with reasonable inference. They have good documentation as well. ctransformers and llama-cpp-python feel alike, so your experience/understanding may transfer decently.
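Here's a minimal sketch of that kind of load, assuming you've already downloaded a SOLAR GGUF quant locally (the file name and prompt format are assumptions, so adjust to whichever quant you grab):

```python
# Minimal llama-cpp-python sketch: load a local SOLAR GGUF and run one completion.
from llama_cpp import Llama

llm = Llama(
    model_path="./solar-10.7b-instruct-v1.0.Q4_K_M.gguf",  # local quant file (assumed name)
    n_ctx=4096,    # context window
    n_threads=8,   # tune to your CPU core count
)

out = llm(
    "### User:\nWhat is depth up-scaling?\n\n### Assistant:\n",
    max_tokens=256,
    temperature=0.7,
)
print(out["choices"][0]["text"])
```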
If you're OK with loading models on your CPU, it's pretty easy to set up, and it's OS agnostic. Though things could be really slow if you're not running a recent CPU with AVX support or something similarly recent. And it has bindings that will likely bridge things decently between it and LangChain (https://python.langchain.com/docs/integrations/providers/llamacpp, though I think LangChain might be a bit dated with its info there, and I'm not a LangChain user; still might be useful). One llama.cpp caveat: running on something other than CPU will likely require a couple of years of seasoned Linux/Unix/VM OS management experience, plus some decent software development experience to understand what is needed. It should be doable if you're familiar with that; if not, that is asking a lot of whoever has to guide someone blindly through it.
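Going off those docs, the LangChain side would look something like the sketch below. Since I'm not a LangChain user, treat the parameter choices and file name as assumptions rather than a confirmed recipe:

```python
# Sketch of LangChain's LlamaCpp wrapper over a local GGUF file.
# Assumptions: the quant file name, and that these defaults suit your hardware.
from langchain.llms import LlamaCpp

llm = LlamaCpp(
    model_path="./solar-10.7b-instruct-v1.0.Q4_K_M.gguf",  # same local quant as above
    n_ctx=4096,
    max_tokens=256,
    temperature=0.7,
)

print(llm("### User:\nSummarize what GGUF is.\n\n### Assistant:\n"))
```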
That being said, something in the overlap between the providers LangChain supports and the systems that can load GGUFs of SOLAR will likely work for you.
Cross-checking 'https://python.langchain.com/docs/integrations/providers -> More' against 'https://huggingface.co/TheBloke/SOLAR-10.7B-Instruct-v1.0-GGUF#about-gguf' might yield some reasonable candidates in the search for a new loading and inference system that works under different constraints.

hunkim changed discussion status to closed
