Error loading model: wrong number of tensors; expected 256, got 255
I am trying to load this model in llama-cpp-python; I downloaded the Q4_K_M quant. Any help is appreciated.
import gc
import os

import torch
from llama_cpp import Llama

# config and logger come from the surrounding project (not shown in this snippet)

class LLMHandler:
    def __init__(self):
        try:
            model_path = str(config.models.LLAMA_PATH)
            if not os.path.exists(model_path):
                raise FileNotFoundError(f"LLaMA model not found at: {model_path}")

            if torch.cuda.is_available():
                logger.info(f"CUDA available: {torch.cuda.get_device_name()}")
                logger.info(f"CUDA memory: {torch.cuda.get_device_properties(0).total_memory / 1024**3:.2f} GB")
                torch.cuda.empty_cache()
                gc.collect()

            logger.info("Initializing LLaMA with Q4_K_M specific settings")
            # Settings specifically for Q4_K_M
            self.model = Llama(
                model_path=model_path,
                n_ctx=2048,
                n_batch=512,
                n_threads=4,
                n_gpu_layers=20,        # Partial GPU offload
                seed=42,
                use_mmap=True,
                use_mlock=False,
                main_gpu=0,
                tensor_split=None,
                vocab_only=False,
                use_float16=True,       # Enable float16 for Q4_K
                rope_freq_base=500000,  # From model metadata
                rope_freq_scale=1.0,
                n_gqa=8,                # From model metadata
                rms_norm_eps=1e-5,      # From model metadata
                verbose=True
            )

            logger.info("Model initialized, testing...")
            test = self.model.create_completion("Test", max_tokens=1)
            logger.info("Model test successful")
        except Exception as e:
            import traceback
            detailed_error = f"Failed to initialize LLaMA: {str(e)}\n"
            detailed_error += f"Traceback: {traceback.format_exc()}"
            logger.error(detailed_error)
            raise RuntimeError(detailed_error)
    async def generate(self, prompt: str) -> str:
        try:
            response = self.model.create_completion(
                prompt,
                max_tokens=config.system.max_tokens,
                temperature=config.system.temperature,
                top_p=config.system.top_p,
                stop=["Human:", "Assistant:"],
                stream=True
            )
            full_response = ""
            # Streamed chunks are dicts, so index into them rather than using attributes
            for chunk in response:
                if chunk["choices"][0]["text"]:
                    full_response += chunk["choices"][0]["text"]
            return full_response.strip()
        except Exception as e:
            logger.error(f"Error generating response: {e}")
            raise
Make sure your llama-cpp-python is up to date; this error is usually indicative of an old install.
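If upgrading alone doesn't fix it, it can also help to rule out all of the extra constructor kwargs with a bare-bones load. A minimal sketch (the model path here is a placeholder, substitute your actual Q4_K_M file):

from llama_cpp import Llama

# Placeholder path -- point this at your actual Q4_K_M .gguf file
llm = Llama(model_path="models/model-Q4_K_M.gguf", verbose=True)

# If the file loads with default settings, the error is not caused by the constructor arguments
print(llm.create_completion("Test", max_tokens=1))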
Sir, I have tried every single version of llama-cpp-python, both with built-in CUDA support and without. I have been trying to debug it for the last 4 hours; I also tried pre-built wheels and building from source, but no luck!
Hmm, that's very odd, because that error is very explicitly caused by older versions of llama.cpp that didn't know where to find the RoPE tensor.
Can you run:
import llama_cpp
print(llama_cpp.__version__)
and post the output?
I have no stake in this model, but since it was small, I ran it as a llamafile just to see. It worked fine, and it imported into Ollama OK too, so the model itself is fine. (I didn't feel like running it directly; I used the 6_K version.)
True, there is no error with the model itself. I checked with a Python script and the current version is 0.2.19+cu118 (I tried both upgrading and downgrading).
Yeah, so that release is a full year old...
So, it has been 17 hours since your last reply. I installed Visual Studio Build Tools and the latest CUDA Toolkit, and tried llama-cpp-python 0.2.77 and other versions, but still no luck.
I don't know how, but the combination of CUDA 12.4 + llama-cpp-python 0.3.0 WORKED!!!!! 😭
It wasn't working because you needed a significantly newer version of the Python package. They're on 0.3.2 at this point; the fix for this error only came out a few months ago, and you were on a version from a whole year ago.
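For anyone landing on this thread later, a quick version sanity check along these lines can save hours. A minimal sketch, assuming the 0.3.0 floor that worked in this thread (not an official minimum) and that the packaging library is installed:

import llama_cpp
from packaging.version import parse

# 0.3.0 is the version that resolved the error in this thread (assumed floor, not official)
MIN_VERSION = "0.3.0"

installed = parse(llama_cpp.__version__)
if installed < parse(MIN_VERSION):
    raise RuntimeError(
        f"llama-cpp-python {installed} is likely too old for this GGUF; "
        f"upgrade to >= {MIN_VERSION}"
    )
print(f"llama-cpp-python {installed} should be new enough")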