failed to load the model

#1
by rinoa - opened

Hi @TheBloke, thank you for making the GGUF model. I got an error when loading this model with llama.cpp. Do you have any suggestions?

error loading model: unordered_map::at
llama_load_model_from_file: failed to load model
llama_init_from_gpt_params: error: failed to load model 'magicoder-s-ds-6.7b.Q5_K_M.gguf'
main: error: unable to load model

Hi @TheBloke, I am having the same problem when loading the model through llama-cpp-python (from llama_cpp import Llama). LM Studio is not able to load the model either.

Same here but it looks like @TheBloke just reuploaded the model so trying again now.

Same issue unfortunately. By the way, let me take this opportunity to say thanks @TheBloke for your incredible work converting and uploading these models! It's been so valuable to always find the latest models ready to download and deploy on my local machine, typically within days after the original model was launched. Thank you!!

@steefvw any luck? It seems he only updated the README.

Yesterday, when I quantized this model myself, I had the same issue.

I think the issue is related to a vocab size mismatch, similar to this one:
https://github.com/ggerganov/llama.cpp/issues/3900

When I did the conversion, I got the same error message with the 16-bit file immediately after running convert.py. Quantizing further to 4-bit gave the vocab warning.
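For anyone who wants to check the vocab size theory on the original (unconverted) model, here is a minimal sketch. It assumes the usual Hugging Face layout: a config.json with a `vocab_size` field and a BPE-style tokenizer.json with `model.vocab` and `added_tokens`. The function names are made up for illustration, and this only reports a mismatch; it does not fix the conversion.

```python
import json
from pathlib import Path


def count_tokenizer_ids(tokenizer: dict) -> int:
    """Count the distinct token ids defined in a parsed tokenizer.json.

    Assumes a BPE-style tokenizer where model.vocab maps token -> id.
    """
    ids = set(tokenizer.get("model", {}).get("vocab", {}).values())
    # Added/special tokens may reuse or extend the base vocab ids.
    ids.update(tok["id"] for tok in tokenizer.get("added_tokens", []))
    return len(ids)


def vocab_report(config: dict, tokenizer: dict) -> dict:
    """Compare the vocab size declared in config.json with the ids the tokenizer defines."""
    declared = config["vocab_size"]
    defined = count_tokenizer_ids(tokenizer)
    return {"declared": declared, "defined": defined, "missing": declared - defined}


def vocab_report_from_dir(model_dir: str) -> dict:
    """Load config.json and tokenizer.json from a local model directory and compare them."""
    root = Path(model_dir)
    config = json.loads((root / "config.json").read_text(encoding="utf-8"))
    tokenizer = json.loads((root / "tokenizer.json").read_text(encoding="utf-8"))
    return vocab_report(config, tokenizer)
```

Call `vocab_report_from_dir()` on the folder you downloaded the original model into. If `missing` is greater than zero, the model declares more token ids than the tokenizer defines, which would be consistent with the mismatch suspected above.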

@aniljava, try this PR: https://github.com/ggerganov/llama.cpp/pull/3633. Maybe it will work for you. I tried it as well but didn't succeed.

I tried #3663 briefly before giving up and waiting for @TheBloke. I also played around a bit with manually changing the vocab size. No luck.

he said here:
https://discord.com/channels/1111983596572520458/1181468647441563728/1182011875068747847

Let me know if there's any GGUF issues as I need to use a PR to make this due to lack of tokenizer.model support

So it might be out of his reach for the time being.

It's better to ask the original model's author instead of someone downstream.

I removed my "like" since it's not working in any app.
We had this problem with DeepSeek models before. They didn't work then and they don't work now.

Just downloaded Q4_K_M and Q3_K_M and also get an error:

error loading model: invalid unordered_map<K, T> key

on both gguf files.
Thanks anyway for publishing quantized models for small GPU!!!

Most of the time, cases like this need to be addressed upstream. TheBloke can't fix their stuff.

Yeah sorry, the models seem to be unusable at the moment. I will see if I can fix it, otherwise I'll pull it for now

OK, I have re-done the quants and they will now work with a specific fork of llama.cpp. The PR for it is not yet merged and is currently on hold, so there's no indication of when it will be merged.

Fork: https://github.com/DOGEwbx/llama.cpp/tree/regex_gpt2_preprocess

They will not work with mainline llama.cpp, and they will not work with any third-party GGUF clients such as llama-cpp-python, LM Studio, text-generation-webui, etc.

I've left the files up for anyone interested in compiling llama.cpp themselves, and I can put a note in the README to this effect.

However I might still pull it, as the output is so far not usable:

ᐅ ./main -m /workspace/process/ise-uiuc_magicoder-s-ds-6.7b/gguf/magicoder-s-ds-6.7b.Q4_K_M.gguf -n -1 -ngl 100 -p "You are an exceptionally intelligent coding assistant that consistently delivers accurate and reliable responses to user instructions.

@@ Instruction
write a Python FastAPI server which accepts List[int], sorts the provided numbers, and returns a tuple containing: (highest, lowest, average, List[int] of the numbers sorted asc )

@@ Response
"

You are an exceptionally intelligent coding assistant that consistently delivers accurate and reliable responses to user instructions.

@@ Instruction
write a Python FastAPI server which accepts List[int], sorts the provided numbers, and returns a tuple containing: (highest, lowest, average, List[int] of the numbers sorted asc )

@@ Response
Here is how you can solve it using python's FastAPI for creating an API server. We will use FastAPI to create the web server which includes endpoints that sorts the given integer list and returns highest number, lowest number, average, median, mode of the list in a JSON format. It also provides the sorted list as well:
```python

# install required packages using pip
pip install fastapi uv


$ pip install fastapi
 $ uvscikitly and statistics for calculating modes



```python
from typing import List
from fastapi import FastAPI, Depends
 from fastapi.middleware.responses
import statistics

from pydantic import BaseModel, Body
from starlette.requests: FastAPI
app = FastAPI()
from typing import List
from fastapi import HTTPException, status
from fastapi import FastApi, Response, Depends, Query
# for the calculation of mode in Python API
def calculate_mode(numbers: List[int]= Body):
    return {
        "highest": numbers},
import statistics as sta t.typing from starlette.middleware import FastAPI, HTTPException
from fastapi import FastAPI, Request, Uv
from fastapi import Depends
from typing import Optional
from pydantic import BaseModel
from fastapi.responsesponsese FastAPIErrorHTTPException, JSONResponse
from fastapi import FastAPI, Path
from statistics
from starlette import Request
import uviemyaml APIRouter and path operation:
from pydanticornado.middleware import HTTPException 40.t ai.

class NumberModel for mode in Pythonication with thestarlette

import statistics as Statistics

from fastapi import FastAPI, Depends, Path
from starlette , Uvit
from typing import List[int]
def sort

numbers



@app
	

def calculate_mean(statistics) is to use Python's Starlette:

```python
import uvit.validator import Depends,HTTPException

.... 

I terminated it there as it was generating endlessly. But even before that there was no usable answer; it's mostly just gibberish.

I thought it was a finetune of DeepSeek-Coder-6.7b-base, so I'm surprised it's so wonky. Maybe I misinterpreted the architecture. Hopefully we get a llama.cpp update that makes this run coherently. Good open source coding models, especially at this small size, are super useful!

Could anyone help me test my quantized results?

https://huggingface.co/mzwing/Magicoder-S-DS-6.7B-GGUF

I just used the master branch of llama.cpp. It looks like some PRs merged later fixed this issue.