https://huggingface.co/ContactDoctor/Bio-Medical-MultiModal-Llama-3-8B-V1

#415
by eleius - opened

Can you quant this please?
Thanks

I can sure try. Things are a bit slow these days, but this is not a big model, so it shouldn't take more than a day. You can watch its progress at http://hf.tst.eu/status.html

Cheers

mradermacher changed discussion status to closed

Unfortunately, the model type MiniCPMV is not supported by llama.cpp yet.

Thanks anyway!

Unfortunately, the model type MiniCPMV is not supported by llama.cpp yet.

It seems support has been added if I'm not mistaken? https://gitea.swigg.net/dustins/llama.cpp/commit/d565bb2fd5a2a58b9924a7a34e77a87c78c52137 and https://github.com/ggerganov/llama.cpp/blob/master/examples/llava/README-minicpmv2.6.md
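
For reference, the README there describes roughly this flow (just a sketch; the per-version surgery/encoder script paths are the ones used later in this thread, while the converter and quantize names are the usual llama.cpp ones and may differ between revisions):

# 1. "surgery": split the multimodal projector out of the HF checkpoint
python ./examples/llava/minicpmv-convert/minicpmv2_6-surgery.py -m /path/to/model

# 2. convert the vision encoder + projector into an mmproj GGUF
python ./examples/llava/minicpmv-convert/minicpmv2_6-convert-image-encoder-to-gguf.py \
    -m /path/to/model --minicpmv-projector /path/to/model/minicpmv.projector \
    --output-dir /path/to/out --image-mean 0.5 0.5 0.5 --image-std 0.5 0.5 0.5

# 3. convert the remaining language model to a regular GGUF and quantize as usual
#    (script/binary names here are from current llama.cpp and may differ per revision)
python ./convert_hf_to_gguf.py /path/to/model --outfile /path/to/out/model-f16.gguf
./llama-quantize /path/to/out/model-f16.gguf /path/to/out/model-Q4_K_M.gguf Q4_K_M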

Thanks for the notification - looks complicated. I'll give it a try.

It fails at the "surgery" step, seems maybe some path is wrong:

Could not locate the configuration_minicpm.py inside openbmb/MiniCPM-Llama3-V-2_5.

mradermacher changed discussion status to open

It seems the same error had been reported there by another user before they disabled discussions.

I'm afraid I can't be of help. Thanks for trying though!

It fails at the "surgery" step, seems maybe some path is wrong:

Could not locate the configuration_minicpm.py inside openbmb/MiniCPM-Llama3-V-2_5.

I assume it tries to execute https://huggingface.co/ContactDoctor/Bio-Medical-MultiModal-Llama-3-8B-V1/blob/main/configuration_minicpm.py but wrongly uses the path of the base model. If you really do need the base model's configuration_minicpm.py, it is located at https://huggingface.co/openbmb/MiniCPM-Llama3-V-2_5/blob/main/configuration_minicpm.py

I suspect it wants to somehow execute the configuration_minicpm.py in the model itself (at least, that's the only thing that makes sense). Or maybe it gets the path from the config (which has lots of references to openbmb). At that point, I didn't feel confident enough (but hoped you would pipe in :-)
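
One quick way to check where those remote-code references actually point would be something like this (a sketch only; the snapshot path is the one from the logs below, and not every key is guaranteed to be present in config.json):

# hypothetical check: are the remote-code files present locally, and where does the config point?
cd /mradermacher/tmp/quant/Bio-Medical-MultiModal-Llama-3-8B-V1
ls configuration_minicpm.py modeling_minicpmv.py resampler.py
python3 -c "import json; c = json.load(open('config.json')); print(c.get('_name_or_path')); print(c.get('auto_map'))"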

It worked using the MiniCPM-V-2.6 scripts, but the model is based on MiniCPM-V-2.5, so let's redo everything using the MiniCPM-V-2.5 ones.

diff --git a/examples/llava/minicpmv-convert/minicpmv2_6-surgery.py b/examples/llava/minicpmv-convert/minicpmv2_6-surgery.py
index cb4a75c6..309c3d24 100644
--- a/examples/llava/minicpmv-convert/minicpmv2_6-surgery.py
+++ b/examples/llava/minicpmv-convert/minicpmv2_6-surgery.py
@@ -9,7 +9,7 @@ ap.add_argument("-m", "--model", help="Path to MiniCPM-V-2.6 model")
 args = ap.parse_args()
 
 # find the model part that includes the the multimodal projector weights
-model = AutoModel.from_pretrained(args.model, trust_remote_code=True, local_files_only=True)
+model = AutoModel.from_pretrained(args.model, trust_remote_code=True, local_files_only=False)
 checkpoint = model.state_dict()
 
 # get a list of mm tensor names
@@ -30,7 +30,7 @@ if len(clip_tensors) > 0:
             f.write("{}\n")
 
 config = model.llm.config
-config._name_or_path = "openbmb/MiniCPM-V-2.6"
+config._name_or_path = "/mradermacher/tmp/quant/Bio-Medical-MultiModal-Llama-3-8B-V1"
 config.auto_map = {
     "AutoConfig": "configuration_minicpm.MiniCPMConfig",
     "AutoModel": "modeling_minicpm.MiniCPMModel",
root@AI:~/BioMed/llama.cpp# ./venv/bin/python ./examples/llava/minicpmv-convert/minicpmv2_6-surgery.py -m /mradermacher/tmp/quant/Bio-Medical-MultiModal-Llama-3-8B-V1
configuration_minicpm.py: 100%|β–ˆβ–ˆβ–ˆβ–ˆβ–ˆβ–ˆβ–ˆβ–ˆβ–ˆβ–ˆβ–ˆβ–ˆβ–ˆβ–ˆβ–ˆβ–ˆβ–ˆβ–ˆβ–ˆβ–ˆβ–ˆβ–ˆβ–ˆβ–ˆβ–ˆβ–ˆβ–ˆβ–ˆβ–ˆβ–ˆβ–ˆβ–ˆβ–ˆβ–ˆβ–ˆβ–ˆβ–ˆβ–ˆβ–ˆβ–ˆβ–ˆβ–ˆβ–ˆβ–ˆβ–ˆβ–ˆβ–ˆβ–ˆβ–ˆβ–ˆβ–ˆβ–ˆβ–ˆβ–ˆβ–ˆβ–ˆβ–ˆβ–ˆβ–ˆβ–ˆβ–ˆβ–ˆβ–ˆβ–ˆβ–ˆβ–ˆβ–ˆβ–ˆβ–ˆβ–ˆβ–ˆβ–ˆβ–ˆβ–ˆβ–ˆβ–ˆβ–ˆβ–ˆβ–ˆβ–ˆβ–ˆβ–ˆβ–ˆβ–ˆβ–ˆβ–ˆβ–ˆβ–ˆβ–ˆβ–ˆβ–ˆβ–ˆβ–ˆβ–ˆ| 4.06k/4.06k [00:00<00:00, 38.9MB/s]
modeling_minicpmv.py: 100%|β–ˆβ–ˆβ–ˆβ–ˆβ–ˆβ–ˆβ–ˆβ–ˆβ–ˆβ–ˆβ–ˆβ–ˆβ–ˆβ–ˆβ–ˆβ–ˆβ–ˆβ–ˆβ–ˆβ–ˆβ–ˆβ–ˆβ–ˆβ–ˆβ–ˆβ–ˆβ–ˆβ–ˆβ–ˆβ–ˆβ–ˆβ–ˆβ–ˆβ–ˆβ–ˆβ–ˆβ–ˆβ–ˆβ–ˆβ–ˆβ–ˆβ–ˆβ–ˆβ–ˆβ–ˆβ–ˆβ–ˆβ–ˆβ–ˆβ–ˆβ–ˆβ–ˆβ–ˆβ–ˆβ–ˆβ–ˆβ–ˆβ–ˆβ–ˆβ–ˆβ–ˆβ–ˆβ–ˆβ–ˆβ–ˆβ–ˆβ–ˆβ–ˆβ–ˆβ–ˆβ–ˆβ–ˆβ–ˆβ–ˆβ–ˆβ–ˆβ–ˆβ–ˆβ–ˆβ–ˆβ–ˆβ–ˆβ–ˆβ–ˆβ–ˆβ–ˆβ–ˆβ–ˆβ–ˆβ–ˆβ–ˆβ–ˆβ–ˆβ–ˆβ–ˆβ–ˆβ–ˆβ–ˆβ–ˆ| 13.9k/13.9k [00:00<00:00, 126MB/s]
resampler.py: 100%|β–ˆβ–ˆβ–ˆβ–ˆβ–ˆβ–ˆβ–ˆβ–ˆβ–ˆβ–ˆβ–ˆβ–ˆβ–ˆβ–ˆβ–ˆβ–ˆβ–ˆβ–ˆβ–ˆβ–ˆβ–ˆβ–ˆβ–ˆβ–ˆβ–ˆβ–ˆβ–ˆβ–ˆβ–ˆβ–ˆβ–ˆβ–ˆβ–ˆβ–ˆβ–ˆβ–ˆβ–ˆβ–ˆβ–ˆβ–ˆβ–ˆβ–ˆβ–ˆβ–ˆβ–ˆβ–ˆβ–ˆβ–ˆβ–ˆβ–ˆβ–ˆβ–ˆβ–ˆβ–ˆβ–ˆβ–ˆβ–ˆβ–ˆβ–ˆβ–ˆβ–ˆβ–ˆβ–ˆβ–ˆβ–ˆβ–ˆβ–ˆβ–ˆβ–ˆβ–ˆβ–ˆβ–ˆβ–ˆβ–ˆβ–ˆβ–ˆβ–ˆβ–ˆβ–ˆβ–ˆβ–ˆβ–ˆβ–ˆβ–ˆβ–ˆβ–ˆβ–ˆβ–ˆβ–ˆβ–ˆβ–ˆβ–ˆβ–ˆβ–ˆβ–ˆβ–ˆβ–ˆβ–ˆβ–ˆβ–ˆβ–ˆβ–ˆβ–ˆβ–ˆβ–ˆβ–ˆβ–ˆ| 35.8k/35.8k [00:00<00:00, 155MB/s]
A new version of the following files was downloaded from https://huggingface.co/openbmb/MiniCPM-Llama3-V-2_5:
- configuration_minicpm.py
. Make sure to double-check they do not contain any added malicious code. To avoid downloading new versions of the code file, you can pin a revision.
A new version of the following files was downloaded from https://huggingface.co/openbmb/MiniCPM-Llama3-V-2_5:
- resampler.py
. Make sure to double-check they do not contain any added malicious code. To avoid downloading new versions of the code file, you can pin a revision.
A new version of the following files was downloaded from https://huggingface.co/openbmb/MiniCPM-Llama3-V-2_5:
- resampler.py
. Make sure to double-check they do not contain any added malicious code. To avoid downloading new versions of the code file, you can pin a revision.
Loading checkpoint shards: 100%|β–ˆβ–ˆβ–ˆβ–ˆβ–ˆβ–ˆβ–ˆβ–ˆβ–ˆβ–ˆβ–ˆβ–ˆβ–ˆβ–ˆβ–ˆβ–ˆβ–ˆβ–ˆβ–ˆβ–ˆβ–ˆβ–ˆβ–ˆβ–ˆβ–ˆβ–ˆβ–ˆβ–ˆβ–ˆβ–ˆβ–ˆβ–ˆβ–ˆβ–ˆβ–ˆβ–ˆβ–ˆβ–ˆβ–ˆβ–ˆβ–ˆβ–ˆβ–ˆβ–ˆβ–ˆβ–ˆβ–ˆβ–ˆβ–ˆβ–ˆβ–ˆβ–ˆβ–ˆβ–ˆβ–ˆβ–ˆβ–ˆβ–ˆβ–ˆβ–ˆβ–ˆβ–ˆβ–ˆβ–ˆβ–ˆβ–ˆβ–ˆβ–ˆβ–ˆβ–ˆβ–ˆβ–ˆβ–ˆβ–ˆβ–ˆβ–ˆβ–ˆβ–ˆβ–ˆβ–ˆβ–ˆβ–ˆβ–ˆβ–ˆβ–ˆβ–ˆβ–ˆβ–ˆβ–ˆβ–ˆβ–ˆβ–ˆβ–ˆβ–ˆβ–ˆβ–ˆβ–ˆβ–ˆβ–ˆβ–ˆ| 4/4 [00:08<00:00,  2.06s/it]
Done!
Now you can convert /mradermacher/tmp/quant/Bio-Medical-MultiModal-Llama-3-8B-V1 to a regular LLaMA GGUF file.
Also, use /mradermacher/tmp/quant/Bio-Medical-MultiModal-Llama-3-8B-V1/minicpmv.projector to prepare a minicpmv-encoder.gguf file.
root@AI:~/BioMed/llama.cpp# ./venv/bin/python ./examples/llava/minicpmv-convert/minicpmv2_6-convert-image-encoder-to-gguf.py -m /mradermacher/tmp/quant/Bio-Medical-MultiModal-Llama-3-8B-V1 --minicpmv-projector /mradermacher/tmp/quant/Bio-Medical-MultiModal-Llama-3-8B-V1/minicpmv.projector --output-dir /mradermacher/tmp/quant/Bio-Medical-MultiModal-Llama-3-8B-V1_llamacpp/ --image-mean 0.5 0.5 0.5 --image-std 0.5 0.5 0.5                     
/root/BioMed/llama.cpp/./examples/llava/minicpmv-convert/minicpmv2_6-convert-image-encoder-to-gguf.py:1093: FutureWarning: You are using `torch.load` with `weights_only=False` (the current default value), which uses the default pickle module implicitly. It is possible to construct malicious pickle data which will execute arbitrary code during unpickling (See https://github.com/pytorch/pytorch/blob/main/SECURITY.md#untrusted-models for more details). In a future release, the default value for `weights_only` will be flipped to `True`. This limits the functions that could be executed during unpickling. Arbitrary objects will no longer be allowed to be loaded via this mode unless they are explicitly allowlisted by the user via `torch.serialization.add_safe_globals`. We recommend you start setting `weights_only=True` for any use case where you don't have full control of the loaded file. Please open an issue on GitHub for any issues related to this experimental feature.
  model.load_state_dict(torch.load(os.path.join(dir_model, "minicpmv.clip")))
/root/BioMed/llama.cpp/./examples/llava/minicpmv-convert/minicpmv2_6-convert-image-encoder-to-gguf.py:1233: FutureWarning: You are using `torch.load` with `weights_only=False` (the current default value), which uses the default pickle module implicitly. It is possible to construct malicious pickle data which will execute arbitrary code during unpickling (See https://github.com/pytorch/pytorch/blob/main/SECURITY.md#untrusted-models for more details). In a future release, the default value for `weights_only` will be flipped to `True`. This limits the functions that could be executed during unpickling. Arbitrary objects will no longer be allowed to be loaded via this mode unless they are explicitly allowlisted by the user via `torch.serialization.add_safe_globals`. We recommend you start setting `weights_only=True` for any use case where you don't have full control of the loaded file. Please open an issue on GitHub for any issues related to this experimental feature.
  projector = torch.load(args.minicpmv_projector)
  Converting to float32
resampler.query - f32 - shape = (96, 4096)
  Converting to float32
resampler.pos_embed_k - f32 - shape = (4900, 3584)
  Converting to float16
resampler.proj.weight - f16 - shape = (4096, 4096)
  Converting to float16
resampler.kv.weight - f16 - shape = (4096, 1152)
  Converting to float16
resampler.attn.q.weight - f16 - shape = (4096, 4096)
  Converting to float16
resampler.attn.k.weight - f16 - shape = (4096, 4096)
  Converting to float16
resampler.attn.v.weight - f16 - shape = (4096, 4096)
  Converting to float32
resampler.attn.q.bias - f32 - shape = (4096,)
  Converting to float32
resampler.attn.k.bias - f32 - shape = (4096,)
  Converting to float32
resampler.attn.v.bias - f32 - shape = (4096,)
  Converting to float16
resampler.attn.out.weight - f16 - shape = (4096, 4096)
  Converting to float32
resampler.attn.out.bias - f32 - shape = (4096,)
  Converting to float32
resampler.ln_q.weight - f32 - shape = (4096,)
  Converting to float32
resampler.ln_q.bias - f32 - shape = (4096,)
  Converting to float32
resampler.ln_kv.weight - f32 - shape = (4096,)
  Converting to float32
resampler.ln_kv.bias - f32 - shape = (4096,)
  Converting to float32
resampler.ln_post.weight - f32 - shape = (4096,)
  Converting to float32
resampler.ln_post.bias - f32 - shape = (4096,)
Projector tensors added

tensor v.patch_embd.weight is always saved in f16
v.patch_embd.weight - f16 - shape = (1152, 3, 14, 14)
  Converting to float32
v.patch_embd.bias - f32 - shape = (1152,)
  Converting to float16
v.position_embd.weight - f16 - shape = (4900, 1152)
  Converting to float16
v.blk.0.attn_k.weight - f16 - shape = (1152, 1152)
  Converting to float32
v.blk.0.attn_k.bias - f32 - shape = (1152,)
  Converting to float16
v.blk.0.attn_v.weight - f16 - shape = (1152, 1152)
  Converting to float32
v.blk.0.attn_v.bias - f32 - shape = (1152,)
  Converting to float16
v.blk.0.attn_q.weight - f16 - shape = (1152, 1152)
  Converting to float32
v.blk.0.attn_q.bias - f32 - shape = (1152,)
  Converting to float16
v.blk.0.attn_out.weight - f16 - shape = (1152, 1152)
  Converting to float32
v.blk.0.attn_out.bias - f32 - shape = (1152,)
  Converting to float32
v.blk.0.ln1.weight - f32 - shape = (1152,)
  Converting to float32
v.blk.0.ln1.bias - f32 - shape = (1152,)
  Converting to float16
v.blk.0.ffn_down.weight - f16 - shape = (4304, 1152)
  Converting to float32
v.blk.0.ffn_down.bias - f32 - shape = (4304,)
  Converting to float16
v.blk.0.ffn_up.weight - f16 - shape = (1152, 4304)
  Converting to float32
v.blk.0.ffn_up.bias - f32 - shape = (1152,)
  Converting to float32
v.blk.0.ln2.weight - f32 - shape = (1152,)
  Converting to float32
v.blk.0.ln2.bias - f32 - shape = (1152,)
  Converting to float16
v.blk.1.attn_k.weight - f16 - shape = (1152, 1152)
  Converting to float32
v.blk.1.attn_k.bias - f32 - shape = (1152,)
  Converting to float16
v.blk.1.attn_v.weight - f16 - shape = (1152, 1152)
  Converting to float32
v.blk.1.attn_v.bias - f32 - shape = (1152,)
  Converting to float16
v.blk.1.attn_q.weight - f16 - shape = (1152, 1152)
  Converting to float32
v.blk.1.attn_q.bias - f32 - shape = (1152,)
  Converting to float16
v.blk.1.attn_out.weight - f16 - shape = (1152, 1152)
  Converting to float32
v.blk.1.attn_out.bias - f32 - shape = (1152,)
  Converting to float32
v.blk.1.ln1.weight - f32 - shape = (1152,)
  Converting to float32
v.blk.1.ln1.bias - f32 - shape = (1152,)
  Converting to float16
v.blk.1.ffn_down.weight - f16 - shape = (4304, 1152)
  Converting to float32
v.blk.1.ffn_down.bias - f32 - shape = (4304,)
  Converting to float16
v.blk.1.ffn_up.weight - f16 - shape = (1152, 4304)
  Converting to float32
v.blk.1.ffn_up.bias - f32 - shape = (1152,)
  Converting to float32
v.blk.1.ln2.weight - f32 - shape = (1152,)
  Converting to float32
v.blk.1.ln2.bias - f32 - shape = (1152,)
  Converting to float16
v.blk.2.attn_k.weight - f16 - shape = (1152, 1152)
  Converting to float32
v.blk.2.attn_k.bias - f32 - shape = (1152,)
  Converting to float16
v.blk.2.attn_v.weight - f16 - shape = (1152, 1152)
  Converting to float32
v.blk.2.attn_v.bias - f32 - shape = (1152,)
  Converting to float16
v.blk.2.attn_q.weight - f16 - shape = (1152, 1152)
  Converting to float32
v.blk.2.attn_q.bias - f32 - shape = (1152,)
  Converting to float16
v.blk.2.attn_out.weight - f16 - shape = (1152, 1152)
  Converting to float32
v.blk.2.attn_out.bias - f32 - shape = (1152,)
  Converting to float32
v.blk.2.ln1.weight - f32 - shape = (1152,)
  Converting to float32
v.blk.2.ln1.bias - f32 - shape = (1152,)
  Converting to float16
v.blk.2.ffn_down.weight - f16 - shape = (4304, 1152)
  Converting to float32
v.blk.2.ffn_down.bias - f32 - shape = (4304,)
  Converting to float16
v.blk.2.ffn_up.weight - f16 - shape = (1152, 4304)
  Converting to float32
v.blk.2.ffn_up.bias - f32 - shape = (1152,)
  Converting to float32
v.blk.2.ln2.weight - f32 - shape = (1152,)
  Converting to float32
v.blk.2.ln2.bias - f32 - shape = (1152,)
  Converting to float16
v.blk.3.attn_k.weight - f16 - shape = (1152, 1152)
  Converting to float32
v.blk.3.attn_k.bias - f32 - shape = (1152,)
  Converting to float16
v.blk.3.attn_v.weight - f16 - shape = (1152, 1152)
  Converting to float32
v.blk.3.attn_v.bias - f32 - shape = (1152,)
  Converting to float16
v.blk.3.attn_q.weight - f16 - shape = (1152, 1152)
  Converting to float32
v.blk.3.attn_q.bias - f32 - shape = (1152,)
  Converting to float16
v.blk.3.attn_out.weight - f16 - shape = (1152, 1152)
  Converting to float32
v.blk.3.attn_out.bias - f32 - shape = (1152,)
  Converting to float32
v.blk.3.ln1.weight - f32 - shape = (1152,)
  Converting to float32
v.blk.3.ln1.bias - f32 - shape = (1152,)
  Converting to float16
v.blk.3.ffn_down.weight - f16 - shape = (4304, 1152)
  Converting to float32
v.blk.3.ffn_down.bias - f32 - shape = (4304,)
  Converting to float16
v.blk.3.ffn_up.weight - f16 - shape = (1152, 4304)
  Converting to float32
v.blk.3.ffn_up.bias - f32 - shape = (1152,)
  Converting to float32
v.blk.3.ln2.weight - f32 - shape = (1152,)
  Converting to float32
v.blk.3.ln2.bias - f32 - shape = (1152,)
  Converting to float16
v.blk.4.attn_k.weight - f16 - shape = (1152, 1152)
  Converting to float32
v.blk.4.attn_k.bias - f32 - shape = (1152,)
  Converting to float16
v.blk.4.attn_v.weight - f16 - shape = (1152, 1152)
  Converting to float32
v.blk.4.attn_v.bias - f32 - shape = (1152,)
  Converting to float16
v.blk.4.attn_q.weight - f16 - shape = (1152, 1152)
  Converting to float32
v.blk.4.attn_q.bias - f32 - shape = (1152,)
  Converting to float16
v.blk.4.attn_out.weight - f16 - shape = (1152, 1152)
  Converting to float32
v.blk.4.attn_out.bias - f32 - shape = (1152,)
  Converting to float32
v.blk.4.ln1.weight - f32 - shape = (1152,)
  Converting to float32
v.blk.4.ln1.bias - f32 - shape = (1152,)
  Converting to float16
v.blk.4.ffn_down.weight - f16 - shape = (4304, 1152)
  Converting to float32
v.blk.4.ffn_down.bias - f32 - shape = (4304,)
  Converting to float16
v.blk.4.ffn_up.weight - f16 - shape = (1152, 4304)
  Converting to float32
v.blk.4.ffn_up.bias - f32 - shape = (1152,)
  Converting to float32
v.blk.4.ln2.weight - f32 - shape = (1152,)
  Converting to float32
v.blk.4.ln2.bias - f32 - shape = (1152,)
  Converting to float16
v.blk.5.attn_k.weight - f16 - shape = (1152, 1152)
  Converting to float32
v.blk.5.attn_k.bias - f32 - shape = (1152,)
  Converting to float16
v.blk.5.attn_v.weight - f16 - shape = (1152, 1152)
  Converting to float32
v.blk.5.attn_v.bias - f32 - shape = (1152,)
  Converting to float16
v.blk.5.attn_q.weight - f16 - shape = (1152, 1152)
  Converting to float32
v.blk.5.attn_q.bias - f32 - shape = (1152,)
  Converting to float16
v.blk.5.attn_out.weight - f16 - shape = (1152, 1152)
  Converting to float32
v.blk.5.attn_out.bias - f32 - shape = (1152,)
  Converting to float32
v.blk.5.ln1.weight - f32 - shape = (1152,)
  Converting to float32
v.blk.5.ln1.bias - f32 - shape = (1152,)
  Converting to float16
v.blk.5.ffn_down.weight - f16 - shape = (4304, 1152)
  Converting to float32
v.blk.5.ffn_down.bias - f32 - shape = (4304,)
  Converting to float16
v.blk.5.ffn_up.weight - f16 - shape = (1152, 4304)
  Converting to float32
v.blk.5.ffn_up.bias - f32 - shape = (1152,)
  Converting to float32
v.blk.5.ln2.weight - f32 - shape = (1152,)
  Converting to float32
v.blk.5.ln2.bias - f32 - shape = (1152,)
  Converting to float16
v.blk.6.attn_k.weight - f16 - shape = (1152, 1152)
  Converting to float32
v.blk.6.attn_k.bias - f32 - shape = (1152,)
  Converting to float16
v.blk.6.attn_v.weight - f16 - shape = (1152, 1152)
  Converting to float32
v.blk.6.attn_v.bias - f32 - shape = (1152,)
  Converting to float16
v.blk.6.attn_q.weight - f16 - shape = (1152, 1152)
  Converting to float32
v.blk.6.attn_q.bias - f32 - shape = (1152,)
  Converting to float16
v.blk.6.attn_out.weight - f16 - shape = (1152, 1152)
  Converting to float32
v.blk.6.attn_out.bias - f32 - shape = (1152,)
  Converting to float32
v.blk.6.ln1.weight - f32 - shape = (1152,)
  Converting to float32
v.blk.6.ln1.bias - f32 - shape = (1152,)
  Converting to float16
v.blk.6.ffn_down.weight - f16 - shape = (4304, 1152)
  Converting to float32
v.blk.6.ffn_down.bias - f32 - shape = (4304,)
  Converting to float16
v.blk.6.ffn_up.weight - f16 - shape = (1152, 4304)
  Converting to float32
v.blk.6.ffn_up.bias - f32 - shape = (1152,)
  Converting to float32
v.blk.6.ln2.weight - f32 - shape = (1152,)
  Converting to float32
v.blk.6.ln2.bias - f32 - shape = (1152,)
  Converting to float16
v.blk.7.attn_k.weight - f16 - shape = (1152, 1152)
  Converting to float32
v.blk.7.attn_k.bias - f32 - shape = (1152,)
  Converting to float16
v.blk.7.attn_v.weight - f16 - shape = (1152, 1152)
  Converting to float32
v.blk.7.attn_v.bias - f32 - shape = (1152,)
  Converting to float16
v.blk.7.attn_q.weight - f16 - shape = (1152, 1152)
  Converting to float32
v.blk.7.attn_q.bias - f32 - shape = (1152,)
  Converting to float16
v.blk.7.attn_out.weight - f16 - shape = (1152, 1152)
  Converting to float32
v.blk.7.attn_out.bias - f32 - shape = (1152,)
  Converting to float32
v.blk.7.ln1.weight - f32 - shape = (1152,)
  Converting to float32
v.blk.7.ln1.bias - f32 - shape = (1152,)
  Converting to float16
v.blk.7.ffn_down.weight - f16 - shape = (4304, 1152)
  Converting to float32
v.blk.7.ffn_down.bias - f32 - shape = (4304,)
  Converting to float16
v.blk.7.ffn_up.weight - f16 - shape = (1152, 4304)
  Converting to float32
v.blk.7.ffn_up.bias - f32 - shape = (1152,)
  Converting to float32
v.blk.7.ln2.weight - f32 - shape = (1152,)
  Converting to float32
v.blk.7.ln2.bias - f32 - shape = (1152,)
  Converting to float16
v.blk.8.attn_k.weight - f16 - shape = (1152, 1152)
  Converting to float32
v.blk.8.attn_k.bias - f32 - shape = (1152,)
  Converting to float16
v.blk.8.attn_v.weight - f16 - shape = (1152, 1152)
  Converting to float32
v.blk.8.attn_v.bias - f32 - shape = (1152,)
  Converting to float16
v.blk.8.attn_q.weight - f16 - shape = (1152, 1152)
  Converting to float32
v.blk.8.attn_q.bias - f32 - shape = (1152,)
  Converting to float16
v.blk.8.attn_out.weight - f16 - shape = (1152, 1152)
  Converting to float32
v.blk.8.attn_out.bias - f32 - shape = (1152,)
  Converting to float32
v.blk.8.ln1.weight - f32 - shape = (1152,)
  Converting to float32
v.blk.8.ln1.bias - f32 - shape = (1152,)
  Converting to float16
v.blk.8.ffn_down.weight - f16 - shape = (4304, 1152)
  Converting to float32
v.blk.8.ffn_down.bias - f32 - shape = (4304,)
  Converting to float16
v.blk.8.ffn_up.weight - f16 - shape = (1152, 4304)
  Converting to float32
v.blk.8.ffn_up.bias - f32 - shape = (1152,)
  Converting to float32
v.blk.8.ln2.weight - f32 - shape = (1152,)
  Converting to float32
v.blk.8.ln2.bias - f32 - shape = (1152,)
  Converting to float16
v.blk.9.attn_k.weight - f16 - shape = (1152, 1152)
  Converting to float32
v.blk.9.attn_k.bias - f32 - shape = (1152,)
  Converting to float16
v.blk.9.attn_v.weight - f16 - shape = (1152, 1152)
  Converting to float32
v.blk.9.attn_v.bias - f32 - shape = (1152,)
  Converting to float16
v.blk.9.attn_q.weight - f16 - shape = (1152, 1152)
  Converting to float32
v.blk.9.attn_q.bias - f32 - shape = (1152,)
  Converting to float16
v.blk.9.attn_out.weight - f16 - shape = (1152, 1152)
  Converting to float32
v.blk.9.attn_out.bias - f32 - shape = (1152,)
  Converting to float32
v.blk.9.ln1.weight - f32 - shape = (1152,)
  Converting to float32
v.blk.9.ln1.bias - f32 - shape = (1152,)
  Converting to float16
v.blk.9.ffn_down.weight - f16 - shape = (4304, 1152)
  Converting to float32
v.blk.9.ffn_down.bias - f32 - shape = (4304,)
  Converting to float16
v.blk.9.ffn_up.weight - f16 - shape = (1152, 4304)
  Converting to float32
v.blk.9.ffn_up.bias - f32 - shape = (1152,)
  Converting to float32
v.blk.9.ln2.weight - f32 - shape = (1152,)
  Converting to float32
v.blk.9.ln2.bias - f32 - shape = (1152,)
  Converting to float16
v.blk.10.attn_k.weight - f16 - shape = (1152, 1152)
  Converting to float32
v.blk.10.attn_k.bias - f32 - shape = (1152,)
  Converting to float16
v.blk.10.attn_v.weight - f16 - shape = (1152, 1152)
  Converting to float32
v.blk.10.attn_v.bias - f32 - shape = (1152,)
  Converting to float16
v.blk.10.attn_q.weight - f16 - shape = (1152, 1152)
  Converting to float32
v.blk.10.attn_q.bias - f32 - shape = (1152,)
  Converting to float16
v.blk.10.attn_out.weight - f16 - shape = (1152, 1152)
  Converting to float32
v.blk.10.attn_out.bias - f32 - shape = (1152,)
  Converting to float32
v.blk.10.ln1.weight - f32 - shape = (1152,)
  Converting to float32
v.blk.10.ln1.bias - f32 - shape = (1152,)
  Converting to float16
v.blk.10.ffn_down.weight - f16 - shape = (4304, 1152)
  Converting to float32
v.blk.10.ffn_down.bias - f32 - shape = (4304,)
  Converting to float16
v.blk.10.ffn_up.weight - f16 - shape = (1152, 4304)
  Converting to float32
v.blk.10.ffn_up.bias - f32 - shape = (1152,)
  Converting to float32
v.blk.10.ln2.weight - f32 - shape = (1152,)
  Converting to float32
v.blk.10.ln2.bias - f32 - shape = (1152,)
  Converting to float16
v.blk.11.attn_k.weight - f16 - shape = (1152, 1152)
  Converting to float32
v.blk.11.attn_k.bias - f32 - shape = (1152,)
  Converting to float16
v.blk.11.attn_v.weight - f16 - shape = (1152, 1152)
  Converting to float32
v.blk.11.attn_v.bias - f32 - shape = (1152,)
  Converting to float16
v.blk.11.attn_q.weight - f16 - shape = (1152, 1152)
  Converting to float32
v.blk.11.attn_q.bias - f32 - shape = (1152,)
  Converting to float16
v.blk.11.attn_out.weight - f16 - shape = (1152, 1152)
  Converting to float32
v.blk.11.attn_out.bias - f32 - shape = (1152,)
  Converting to float32
v.blk.11.ln1.weight - f32 - shape = (1152,)
  Converting to float32
v.blk.11.ln1.bias - f32 - shape = (1152,)
  Converting to float16
v.blk.11.ffn_down.weight - f16 - shape = (4304, 1152)
  Converting to float32
v.blk.11.ffn_down.bias - f32 - shape = (4304,)
  Converting to float16
v.blk.11.ffn_up.weight - f16 - shape = (1152, 4304)
  Converting to float32
v.blk.11.ffn_up.bias - f32 - shape = (1152,)
  Converting to float32
v.blk.11.ln2.weight - f32 - shape = (1152,)
  Converting to float32
v.blk.11.ln2.bias - f32 - shape = (1152,)
  Converting to float16
v.blk.12.attn_k.weight - f16 - shape = (1152, 1152)
  Converting to float32
v.blk.12.attn_k.bias - f32 - shape = (1152,)
  Converting to float16
v.blk.12.attn_v.weight - f16 - shape = (1152, 1152)
  Converting to float32
v.blk.12.attn_v.bias - f32 - shape = (1152,)
  Converting to float16
v.blk.12.attn_q.weight - f16 - shape = (1152, 1152)
  Converting to float32
v.blk.12.attn_q.bias - f32 - shape = (1152,)
  Converting to float16
v.blk.12.attn_out.weight - f16 - shape = (1152, 1152)
  Converting to float32
v.blk.12.attn_out.bias - f32 - shape = (1152,)
  Converting to float32
v.blk.12.ln1.weight - f32 - shape = (1152,)
  Converting to float32
v.blk.12.ln1.bias - f32 - shape = (1152,)
  Converting to float16
v.blk.12.ffn_down.weight - f16 - shape = (4304, 1152)
  Converting to float32
v.blk.12.ffn_down.bias - f32 - shape = (4304,)
  Converting to float16
v.blk.12.ffn_up.weight - f16 - shape = (1152, 4304)
  Converting to float32
v.blk.12.ffn_up.bias - f32 - shape = (1152,)
  Converting to float32
v.blk.12.ln2.weight - f32 - shape = (1152,)
  Converting to float32
v.blk.12.ln2.bias - f32 - shape = (1152,)
  Converting to float16
v.blk.13.attn_k.weight - f16 - shape = (1152, 1152)
  Converting to float32
v.blk.13.attn_k.bias - f32 - shape = (1152,)
  Converting to float16
v.blk.13.attn_v.weight - f16 - shape = (1152, 1152)
  Converting to float32
v.blk.13.attn_v.bias - f32 - shape = (1152,)
  Converting to float16
v.blk.13.attn_q.weight - f16 - shape = (1152, 1152)
  Converting to float32
v.blk.13.attn_q.bias - f32 - shape = (1152,)
  Converting to float16
v.blk.13.attn_out.weight - f16 - shape = (1152, 1152)
  Converting to float32
v.blk.13.attn_out.bias - f32 - shape = (1152,)
  Converting to float32
v.blk.13.ln1.weight - f32 - shape = (1152,)
  Converting to float32
v.blk.13.ln1.bias - f32 - shape = (1152,)
  Converting to float16
v.blk.13.ffn_down.weight - f16 - shape = (4304, 1152)
  Converting to float32
v.blk.13.ffn_down.bias - f32 - shape = (4304,)
  Converting to float16
v.blk.13.ffn_up.weight - f16 - shape = (1152, 4304)
  Converting to float32
v.blk.13.ffn_up.bias - f32 - shape = (1152,)
  Converting to float32
v.blk.13.ln2.weight - f32 - shape = (1152,)
  Converting to float32
v.blk.13.ln2.bias - f32 - shape = (1152,)
  Converting to float16
v.blk.14.attn_k.weight - f16 - shape = (1152, 1152)
  Converting to float32
v.blk.14.attn_k.bias - f32 - shape = (1152,)
  Converting to float16
v.blk.14.attn_v.weight - f16 - shape = (1152, 1152)
  Converting to float32
v.blk.14.attn_v.bias - f32 - shape = (1152,)
  Converting to float16
v.blk.14.attn_q.weight - f16 - shape = (1152, 1152)
  Converting to float32
v.blk.14.attn_q.bias - f32 - shape = (1152,)
  Converting to float16
v.blk.14.attn_out.weight - f16 - shape = (1152, 1152)
  Converting to float32
v.blk.14.attn_out.bias - f32 - shape = (1152,)
  Converting to float32
v.blk.14.ln1.weight - f32 - shape = (1152,)
  Converting to float32
v.blk.14.ln1.bias - f32 - shape = (1152,)
  Converting to float16
v.blk.14.ffn_down.weight - f16 - shape = (4304, 1152)
  Converting to float32
v.blk.14.ffn_down.bias - f32 - shape = (4304,)
  Converting to float16
v.blk.14.ffn_up.weight - f16 - shape = (1152, 4304)
  Converting to float32
v.blk.14.ffn_up.bias - f32 - shape = (1152,)
  Converting to float32
v.blk.14.ln2.weight - f32 - shape = (1152,)
  Converting to float32
v.blk.14.ln2.bias - f32 - shape = (1152,)
  Converting to float16
v.blk.15.attn_k.weight - f16 - shape = (1152, 1152)
  Converting to float32
v.blk.15.attn_k.bias - f32 - shape = (1152,)
  Converting to float16
v.blk.15.attn_v.weight - f16 - shape = (1152, 1152)
  Converting to float32
v.blk.15.attn_v.bias - f32 - shape = (1152,)
  Converting to float16
v.blk.15.attn_q.weight - f16 - shape = (1152, 1152)
  Converting to float32
v.blk.15.attn_q.bias - f32 - shape = (1152,)
  Converting to float16
v.blk.15.attn_out.weight - f16 - shape = (1152, 1152)
  Converting to float32
v.blk.15.attn_out.bias - f32 - shape = (1152,)
  Converting to float32
v.blk.15.ln1.weight - f32 - shape = (1152,)
  Converting to float32
v.blk.15.ln1.bias - f32 - shape = (1152,)
  Converting to float16
v.blk.15.ffn_down.weight - f16 - shape = (4304, 1152)
  Converting to float32
v.blk.15.ffn_down.bias - f32 - shape = (4304,)
  Converting to float16
v.blk.15.ffn_up.weight - f16 - shape = (1152, 4304)
  Converting to float32
v.blk.15.ffn_up.bias - f32 - shape = (1152,)
  Converting to float32
v.blk.15.ln2.weight - f32 - shape = (1152,)
  Converting to float32
v.blk.15.ln2.bias - f32 - shape = (1152,)
  Converting to float16
v.blk.16.attn_k.weight - f16 - shape = (1152, 1152)
  Converting to float32
v.blk.16.attn_k.bias - f32 - shape = (1152,)
  Converting to float16
v.blk.16.attn_v.weight - f16 - shape = (1152, 1152)
  Converting to float32
v.blk.16.attn_v.bias - f32 - shape = (1152,)
  Converting to float16
v.blk.16.attn_q.weight - f16 - shape = (1152, 1152)
  Converting to float32
v.blk.16.attn_q.bias - f32 - shape = (1152,)
  Converting to float16
v.blk.16.attn_out.weight - f16 - shape = (1152, 1152)
  Converting to float32
v.blk.16.attn_out.bias - f32 - shape = (1152,)
  Converting to float32
v.blk.16.ln1.weight - f32 - shape = (1152,)
  Converting to float32
v.blk.16.ln1.bias - f32 - shape = (1152,)
  Converting to float16
v.blk.16.ffn_down.weight - f16 - shape = (4304, 1152)
  Converting to float32
v.blk.16.ffn_down.bias - f32 - shape = (4304,)
  Converting to float16
v.blk.16.ffn_up.weight - f16 - shape = (1152, 4304)
  Converting to float32
v.blk.16.ffn_up.bias - f32 - shape = (1152,)
  Converting to float32
v.blk.16.ln2.weight - f32 - shape = (1152,)
  Converting to float32
v.blk.16.ln2.bias - f32 - shape = (1152,)
  Converting to float16
v.blk.17.attn_k.weight - f16 - shape = (1152, 1152)
  Converting to float32
v.blk.17.attn_k.bias - f32 - shape = (1152,)
  Converting to float16
v.blk.17.attn_v.weight - f16 - shape = (1152, 1152)
  Converting to float32
v.blk.17.attn_v.bias - f32 - shape = (1152,)
  Converting to float16
v.blk.17.attn_q.weight - f16 - shape = (1152, 1152)
  Converting to float32
v.blk.17.attn_q.bias - f32 - shape = (1152,)
  Converting to float16
v.blk.17.attn_out.weight - f16 - shape = (1152, 1152)
  Converting to float32
v.blk.17.attn_out.bias - f32 - shape = (1152,)
  Converting to float32
v.blk.17.ln1.weight - f32 - shape = (1152,)
  Converting to float32
v.blk.17.ln1.bias - f32 - shape = (1152,)
  Converting to float16
v.blk.17.ffn_down.weight - f16 - shape = (4304, 1152)
  Converting to float32
v.blk.17.ffn_down.bias - f32 - shape = (4304,)
  Converting to float16
v.blk.17.ffn_up.weight - f16 - shape = (1152, 4304)
  Converting to float32
v.blk.17.ffn_up.bias - f32 - shape = (1152,)
  Converting to float32
v.blk.17.ln2.weight - f32 - shape = (1152,)
  Converting to float32
v.blk.17.ln2.bias - f32 - shape = (1152,)
  Converting to float16
v.blk.18.attn_k.weight - f16 - shape = (1152, 1152)
  Converting to float32
v.blk.18.attn_k.bias - f32 - shape = (1152,)
  Converting to float16
v.blk.18.attn_v.weight - f16 - shape = (1152, 1152)
  Converting to float32
v.blk.18.attn_v.bias - f32 - shape = (1152,)
  Converting to float16
v.blk.18.attn_q.weight - f16 - shape = (1152, 1152)
  Converting to float32
v.blk.18.attn_q.bias - f32 - shape = (1152,)
  Converting to float16
v.blk.18.attn_out.weight - f16 - shape = (1152, 1152)
  Converting to float32
v.blk.18.attn_out.bias - f32 - shape = (1152,)
  Converting to float32
v.blk.18.ln1.weight - f32 - shape = (1152,)
  Converting to float32
v.blk.18.ln1.bias - f32 - shape = (1152,)
  Converting to float16
v.blk.18.ffn_down.weight - f16 - shape = (4304, 1152)
  Converting to float32
v.blk.18.ffn_down.bias - f32 - shape = (4304,)
  Converting to float16
v.blk.18.ffn_up.weight - f16 - shape = (1152, 4304)
  Converting to float32
v.blk.18.ffn_up.bias - f32 - shape = (1152,)
  Converting to float32
v.blk.18.ln2.weight - f32 - shape = (1152,)
  Converting to float32
v.blk.18.ln2.bias - f32 - shape = (1152,)
  Converting to float16
v.blk.19.attn_k.weight - f16 - shape = (1152, 1152)
  Converting to float32
v.blk.19.attn_k.bias - f32 - shape = (1152,)
  Converting to float16
v.blk.19.attn_v.weight - f16 - shape = (1152, 1152)
  Converting to float32
v.blk.19.attn_v.bias - f32 - shape = (1152,)
  Converting to float16
v.blk.19.attn_q.weight - f16 - shape = (1152, 1152)
  Converting to float32
v.blk.19.attn_q.bias - f32 - shape = (1152,)
  Converting to float16
v.blk.19.attn_out.weight - f16 - shape = (1152, 1152)
  Converting to float32
v.blk.19.attn_out.bias - f32 - shape = (1152,)
  Converting to float32
v.blk.19.ln1.weight - f32 - shape = (1152,)
  Converting to float32
v.blk.19.ln1.bias - f32 - shape = (1152,)
  Converting to float16
v.blk.19.ffn_down.weight - f16 - shape = (4304, 1152)
  Converting to float32
v.blk.19.ffn_down.bias - f32 - shape = (4304,)
  Converting to float16
v.blk.19.ffn_up.weight - f16 - shape = (1152, 4304)
  Converting to float32
v.blk.19.ffn_up.bias - f32 - shape = (1152,)
  Converting to float32
v.blk.19.ln2.weight - f32 - shape = (1152,)
  Converting to float32
v.blk.19.ln2.bias - f32 - shape = (1152,)
  Converting to float16
v.blk.20.attn_k.weight - f16 - shape = (1152, 1152)
  Converting to float32
v.blk.20.attn_k.bias - f32 - shape = (1152,)
  Converting to float16
v.blk.20.attn_v.weight - f16 - shape = (1152, 1152)
  Converting to float32
v.blk.20.attn_v.bias - f32 - shape = (1152,)
  Converting to float16
v.blk.20.attn_q.weight - f16 - shape = (1152, 1152)
  Converting to float32
v.blk.20.attn_q.bias - f32 - shape = (1152,)
  Converting to float16
v.blk.20.attn_out.weight - f16 - shape = (1152, 1152)
  Converting to float32
v.blk.20.attn_out.bias - f32 - shape = (1152,)
  Converting to float32
v.blk.20.ln1.weight - f32 - shape = (1152,)
  Converting to float32
v.blk.20.ln1.bias - f32 - shape = (1152,)
  Converting to float16
v.blk.20.ffn_down.weight - f16 - shape = (4304, 1152)
  Converting to float32
v.blk.20.ffn_down.bias - f32 - shape = (4304,)
  Converting to float16
v.blk.20.ffn_up.weight - f16 - shape = (1152, 4304)
  Converting to float32
v.blk.20.ffn_up.bias - f32 - shape = (1152,)
  Converting to float32
v.blk.20.ln2.weight - f32 - shape = (1152,)
  Converting to float32
v.blk.20.ln2.bias - f32 - shape = (1152,)
  Converting to float16
v.blk.21.attn_k.weight - f16 - shape = (1152, 1152)
  Converting to float32
v.blk.21.attn_k.bias - f32 - shape = (1152,)
  Converting to float16
v.blk.21.attn_v.weight - f16 - shape = (1152, 1152)
  Converting to float32
v.blk.21.attn_v.bias - f32 - shape = (1152,)
  Converting to float16
v.blk.21.attn_q.weight - f16 - shape = (1152, 1152)
  Converting to float32
v.blk.21.attn_q.bias - f32 - shape = (1152,)
  Converting to float16
v.blk.21.attn_out.weight - f16 - shape = (1152, 1152)
  Converting to float32
v.blk.21.attn_out.bias - f32 - shape = (1152,)
  Converting to float32
v.blk.21.ln1.weight - f32 - shape = (1152,)
  Converting to float32
v.blk.21.ln1.bias - f32 - shape = (1152,)
  Converting to float16
v.blk.21.ffn_down.weight - f16 - shape = (4304, 1152)
  Converting to float32
v.blk.21.ffn_down.bias - f32 - shape = (4304,)
  Converting to float16
v.blk.21.ffn_up.weight - f16 - shape = (1152, 4304)
  Converting to float32
v.blk.21.ffn_up.bias - f32 - shape = (1152,)
  Converting to float32
v.blk.21.ln2.weight - f32 - shape = (1152,)
  Converting to float32
v.blk.21.ln2.bias - f32 - shape = (1152,)
  Converting to float16
v.blk.22.attn_k.weight - f16 - shape = (1152, 1152)
  Converting to float32
v.blk.22.attn_k.bias - f32 - shape = (1152,)
  Converting to float16
v.blk.22.attn_v.weight - f16 - shape = (1152, 1152)
  Converting to float32
v.blk.22.attn_v.bias - f32 - shape = (1152,)
  Converting to float16
v.blk.22.attn_q.weight - f16 - shape = (1152, 1152)
  Converting to float32
v.blk.22.attn_q.bias - f32 - shape = (1152,)
  Converting to float16
v.blk.22.attn_out.weight - f16 - shape = (1152, 1152)
  Converting to float32
v.blk.22.attn_out.bias - f32 - shape = (1152,)
  Converting to float32
v.blk.22.ln1.weight - f32 - shape = (1152,)
  Converting to float32
v.blk.22.ln1.bias - f32 - shape = (1152,)
  Converting to float16
v.blk.22.ffn_down.weight - f16 - shape = (4304, 1152)
  Converting to float32
v.blk.22.ffn_down.bias - f32 - shape = (4304,)
  Converting to float16
v.blk.22.ffn_up.weight - f16 - shape = (1152, 4304)
  Converting to float32
v.blk.22.ffn_up.bias - f32 - shape = (1152,)
  Converting to float32
v.blk.22.ln2.weight - f32 - shape = (1152,)
  Converting to float32
v.blk.22.ln2.bias - f32 - shape = (1152,)
  Converting to float16
v.blk.23.attn_k.weight - f16 - shape = (1152, 1152)
  Converting to float32
v.blk.23.attn_k.bias - f32 - shape = (1152,)
  Converting to float16
v.blk.23.attn_v.weight - f16 - shape = (1152, 1152)
  Converting to float32
v.blk.23.attn_v.bias - f32 - shape = (1152,)
  Converting to float16
v.blk.23.attn_q.weight - f16 - shape = (1152, 1152)
  Converting to float32
v.blk.23.attn_q.bias - f32 - shape = (1152,)
  Converting to float16
v.blk.23.attn_out.weight - f16 - shape = (1152, 1152)
  Converting to float32
v.blk.23.attn_out.bias - f32 - shape = (1152,)
  Converting to float32
v.blk.23.ln1.weight - f32 - shape = (1152,)
  Converting to float32
v.blk.23.ln1.bias - f32 - shape = (1152,)
  Converting to float16
v.blk.23.ffn_down.weight - f16 - shape = (4304, 1152)
  Converting to float32
v.blk.23.ffn_down.bias - f32 - shape = (4304,)
  Converting to float16
v.blk.23.ffn_up.weight - f16 - shape = (1152, 4304)
  Converting to float32
v.blk.23.ffn_up.bias - f32 - shape = (1152,)
  Converting to float32
v.blk.23.ln2.weight - f32 - shape = (1152,)
  Converting to float32
v.blk.23.ln2.bias - f32 - shape = (1152,)
  Converting to float16
v.blk.24.attn_k.weight - f16 - shape = (1152, 1152)
  Converting to float32
v.blk.24.attn_k.bias - f32 - shape = (1152,)
  Converting to float16
v.blk.24.attn_v.weight - f16 - shape = (1152, 1152)
  Converting to float32
v.blk.24.attn_v.bias - f32 - shape = (1152,)
  Converting to float16
v.blk.24.attn_q.weight - f16 - shape = (1152, 1152)
  Converting to float32
v.blk.24.attn_q.bias - f32 - shape = (1152,)
  Converting to float16
v.blk.24.attn_out.weight - f16 - shape = (1152, 1152)
  Converting to float32
v.blk.24.attn_out.bias - f32 - shape = (1152,)
  Converting to float32
v.blk.24.ln1.weight - f32 - shape = (1152,)
  Converting to float32
v.blk.24.ln1.bias - f32 - shape = (1152,)
  Converting to float16
v.blk.24.ffn_down.weight - f16 - shape = (4304, 1152)
  Converting to float32
v.blk.24.ffn_down.bias - f32 - shape = (4304,)
  Converting to float16
v.blk.24.ffn_up.weight - f16 - shape = (1152, 4304)
  Converting to float32
v.blk.24.ffn_up.bias - f32 - shape = (1152,)
  Converting to float32
v.blk.24.ln2.weight - f32 - shape = (1152,)
  Converting to float32
v.blk.24.ln2.bias - f32 - shape = (1152,)
  Converting to float16
v.blk.25.attn_k.weight - f16 - shape = (1152, 1152)
  Converting to float32
v.blk.25.attn_k.bias - f32 - shape = (1152,)
  Converting to float16
v.blk.25.attn_v.weight - f16 - shape = (1152, 1152)
  Converting to float32
v.blk.25.attn_v.bias - f32 - shape = (1152,)
  Converting to float16
v.blk.25.attn_q.weight - f16 - shape = (1152, 1152)
  Converting to float32
v.blk.25.attn_q.bias - f32 - shape = (1152,)
  Converting to float16
v.blk.25.attn_out.weight - f16 - shape = (1152, 1152)
  Converting to float32
v.blk.25.attn_out.bias - f32 - shape = (1152,)
  Converting to float32
v.blk.25.ln1.weight - f32 - shape = (1152,)
  Converting to float32
v.blk.25.ln1.bias - f32 - shape = (1152,)
  Converting to float16
v.blk.25.ffn_down.weight - f16 - shape = (4304, 1152)
  Converting to float32
v.blk.25.ffn_down.bias - f32 - shape = (4304,)
  Converting to float16
v.blk.25.ffn_up.weight - f16 - shape = (1152, 4304)
  Converting to float32
v.blk.25.ffn_up.bias - f32 - shape = (1152,)
  Converting to float32
v.blk.25.ln2.weight - f32 - shape = (1152,)
  Converting to float32
v.blk.25.ln2.bias - f32 - shape = (1152,)
  Converting to float16
v.blk.26.attn_k.weight - f16 - shape = (1152, 1152)
  Converting to float32
v.blk.26.attn_k.bias - f32 - shape = (1152,)
  Converting to float16
v.blk.26.attn_v.weight - f16 - shape = (1152, 1152)
  Converting to float32
v.blk.26.attn_v.bias - f32 - shape = (1152,)
  Converting to float16
v.blk.26.attn_q.weight - f16 - shape = (1152, 1152)
  Converting to float32
v.blk.26.attn_q.bias - f32 - shape = (1152,)
  Converting to float16
v.blk.26.attn_out.weight - f16 - shape = (1152, 1152)
  Converting to float32
v.blk.26.attn_out.bias - f32 - shape = (1152,)
  Converting to float32
v.blk.26.ln1.weight - f32 - shape = (1152,)
  Converting to float32
v.blk.26.ln1.bias - f32 - shape = (1152,)
  Converting to float16
v.blk.26.ffn_down.weight - f16 - shape = (4304, 1152)
  Converting to float32
v.blk.26.ffn_down.bias - f32 - shape = (4304,)
  Converting to float16
v.blk.26.ffn_up.weight - f16 - shape = (1152, 4304)
  Converting to float32
v.blk.26.ffn_up.bias - f32 - shape = (1152,)
  Converting to float32
v.blk.26.ln2.weight - f32 - shape = (1152,)
  Converting to float32
v.blk.26.ln2.bias - f32 - shape = (1152,)
  Converting to float32
v.post_ln.weight - f32 - shape = (1152,)
  Converting to float32
v.post_ln.bias - f32 - shape = (1152,)
Done. Output file: /mradermacher/tmp/quant/Bio-Medical-MultiModal-Llama-3-8B-V1_llamacpp/mmproj-model-f16.gguf
root@AI:~/BioMed/llama.cpp# mv /mradermacher/tmp/quant/Bio-Medical-MultiModal-Llama-3-8B-V1_llamacpp/mmproj-model-f16.gguf /mradermacher/tmp/quant/Bio-Medical-MultiModal-Llama-3-8B-V1_MiniCPM-V-2.6.gguf
root@AI:~/BioMed/llama.cpp# rm -rf /mradermacher/tmp/quant/Bio-Medical-MultiModal-Llama-3-8B-V1_llamacpp/
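
For completeness: once the language model has also been converted and quantized, the mmproj file produced above should be usable together with it roughly like this (a sketch; the llama-minicpmv-cli binary and the Q4_K_M filename are assumptions based on the llama.cpp llava examples, not something tested here):

# hypothetical example: pair the quantized LLM GGUF with the extracted vision projector
./llama-minicpmv-cli \
  -m Bio-Medical-MultiModal-Llama-3-8B-V1.Q4_K_M.gguf \
  --mmproj Bio-Medical-MultiModal-Llama-3-8B-V1_MiniCPM-V-2.6.gguf \
  --image scan.jpg \
  -p "Describe the findings in this image."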

GGUF conversion worked using MiniCPM-V-2.5 as well!

diff --git a/examples/llava/minicpmv-convert/minicpmv2_5-surgery.py b/examples/llava/minicpmv-convert/minicpmv2_5-surgery.py
index 7defb8ff..fe556089 100644
--- a/examples/llava/minicpmv-convert/minicpmv2_5-surgery.py
+++ b/examples/llava/minicpmv-convert/minicpmv2_5-surgery.py
@@ -9,7 +9,7 @@ ap.add_argument("-m", "--model", help="Path to MiniCPM-V-2.5 model")
 args = ap.parse_args()
 
 # find the model part that includes the the multimodal projector weights
-model = AutoModel.from_pretrained(args.model, trust_remote_code=True, local_files_only=True)
+model = AutoModel.from_pretrained(args.model, trust_remote_code=True, local_files_only=False)
 checkpoint = model.state_dict()
 
 # get a list of mm tensor names
@@ -30,7 +30,7 @@ if len(clip_tensors) > 0:
             f.write("{}\n")
 
 config = model.llm.config
-config._name_or_path = "openbmb/MiniCPM-Llama3-V-2.5"
+config._name_or_path = "/mradermacher/tmp/quant/Bio-Medical-MultiModal-Llama-3-8B-V1"
 config.auto_map = {
     "AutoConfig": "configuration_minicpm.MiniCPMConfig",
     "AutoModel": "modeling_minicpm.MiniCPMModel"
root@AI:~/BioMed/llama.cpp# ./venv/bin/python ./examples/llava/minicpmv-convert/minicpmv2_5-surgery.py -m /mradermacher/tmp/quant/Bio-Medical-MultiModal-Llama-3-8B-V1
Loading checkpoint shards: 100%|β–ˆβ–ˆβ–ˆβ–ˆβ–ˆβ–ˆβ–ˆβ–ˆβ–ˆβ–ˆβ–ˆβ–ˆβ–ˆβ–ˆβ–ˆβ–ˆβ–ˆβ–ˆβ–ˆβ–ˆβ–ˆβ–ˆβ–ˆβ–ˆβ–ˆβ–ˆβ–ˆβ–ˆβ–ˆβ–ˆβ–ˆβ–ˆβ–ˆβ–ˆβ–ˆβ–ˆβ–ˆβ–ˆβ–ˆβ–ˆβ–ˆβ–ˆβ–ˆβ–ˆβ–ˆβ–ˆβ–ˆβ–ˆβ–ˆβ–ˆβ–ˆβ–ˆβ–ˆβ–ˆβ–ˆβ–ˆβ–ˆβ–ˆβ–ˆβ–ˆβ–ˆβ–ˆβ–ˆβ–ˆβ–ˆβ–ˆβ–ˆβ–ˆβ–ˆβ–ˆβ–ˆβ–ˆβ–ˆβ–ˆβ–ˆβ–ˆβ–ˆβ–ˆβ–ˆβ–ˆβ–ˆβ–ˆβ–ˆβ–ˆβ–ˆβ–ˆβ–ˆβ–ˆβ–ˆβ–ˆβ–ˆβ–ˆβ–ˆβ–ˆβ–ˆβ–ˆβ–ˆβ–ˆβ–ˆβ–ˆ| 4/4 [00:05<00:00,  1.39s/it]
Done!
Now you can convert /mradermacher/tmp/quant/Bio-Medical-MultiModal-Llama-3-8B-V1 to a regular LLaMA GGUF file.
Also, use /mradermacher/tmp/quant/Bio-Medical-MultiModal-Llama-3-8B-V1/minicpmv.projector to prepare a minicpmv-encoder.gguf file.
root@AI:~/BioMed/llama.cpp# ./venv/bin/python ./examples/llava/minicpmv-convert/minicpmv2_5-convert-image-encoder-to-gguf.py -m /mradermacher/tmp/quant/Bio-Medical-MultiModal-Llama-3-8B-V1 --minicpmv-projector /mradermacher/tmp/quant/Bio-Medical-MultiModal-Llama-3-8B-V1/minicpmv.projector --output-dir /mradermacher/tmp/quant/Bio-Medical-MultiModal-Llama-3-8B-V1_llamacpp/ --image-mean 0.5 0.5 0.5 --image-std 0.5 0.5 0.5
/root/BioMed/llama.cpp/./examples/llava/minicpmv-convert/minicpmv2_5-convert-image-encoder-to-gguf.py:156: FutureWarning: You are using `torch.load` with `weights_only=False` (the current default value), which uses the default pickle module implicitly. It is possible to construct malicious pickle data which will execute arbitrary code during unpickling (See https://github.com/pytorch/pytorch/blob/main/SECURITY.md#untrusted-models for more details). In a future release, the default value for `weights_only` will be flipped to `True`. This limits the functions that could be executed during unpickling. Arbitrary objects will no longer be allowed to be loaded via this mode unless they are explicitly allowlisted by the user via `torch.serialization.add_safe_globals`. We recommend you start setting `weights_only=True` for any use case where you don't have full control of the loaded file. Please open an issue on GitHub for any issues related to this experimental feature.
  model.load_state_dict(torch.load(os.path.join(dir_model, "minicpmv.clip")))
/root/BioMed/llama.cpp/./examples/llava/minicpmv-convert/minicpmv2_5-convert-image-encoder-to-gguf.py:296: FutureWarning: You are using `torch.load` with `weights_only=False` (the current default value), which uses the default pickle module implicitly. It is possible to construct malicious pickle data which will execute arbitrary code during unpickling (See https://github.com/pytorch/pytorch/blob/main/SECURITY.md#untrusted-models for more details). In a future release, the default value for `weights_only` will be flipped to `True`. This limits the functions that could be executed during unpickling. Arbitrary objects will no longer be allowed to be loaded via this mode unless they are explicitly allowlisted by the user via `torch.serialization.add_safe_globals`. We recommend you start setting `weights_only=True` for any use case where you don't have full control of the loaded file. Please open an issue on GitHub for any issues related to this experimental feature.
  projector = torch.load(args.minicpmv_projector)
  Converting to float32
resampler.query - f32 - shape = (96, 4096)
  Converting to float32
resampler.pos_embed_k - f32 - shape = (4900, 4096)
  Converting to float16
resampler.proj.weight - f16 - shape = (4096, 4096)
  Converting to float16
resampler.kv.weight - f16 - shape = (4096, 1152)
  Converting to float16
resampler.attn.q.weight - f16 - shape = (4096, 4096)
  Converting to float16
resampler.attn.k.weight - f16 - shape = (4096, 4096)
  Converting to float16
resampler.attn.v.weight - f16 - shape = (4096, 4096)
  Converting to float32
resampler.attn.q.bias - f32 - shape = (4096,)
  Converting to float32
resampler.attn.k.bias - f32 - shape = (4096,)
  Converting to float32
resampler.attn.v.bias - f32 - shape = (4096,)
  Converting to float16
resampler.attn.out.weight - f16 - shape = (4096, 4096)
  Converting to float32
resampler.attn.out.bias - f32 - shape = (4096,)
  Converting to float32
resampler.ln_q.weight - f32 - shape = (4096,)
  Converting to float32
resampler.ln_q.bias - f32 - shape = (4096,)
  Converting to float32
resampler.ln_kv.weight - f32 - shape = (4096,)
  Converting to float32
resampler.ln_kv.bias - f32 - shape = (4096,)
  Converting to float32
resampler.ln_post.weight - f32 - shape = (4096,)
  Converting to float32
resampler.ln_post.bias - f32 - shape = (4096,)
Projector tensors added

tensor v.patch_embd.weight is always saved in f16
v.patch_embd.weight - f16 - shape = (1152, 3, 14, 14)
  Converting to float32
v.patch_embd.bias - f32 - shape = (1152,)
  Converting to float16
v.position_embd.weight - f16 - shape = (4900, 1152)
  Converting to float16
v.blk.0.attn_k.weight - f16 - shape = (1152, 1152)
  Converting to float32
v.blk.0.attn_k.bias - f32 - shape = (1152,)
  Converting to float16
v.blk.0.attn_v.weight - f16 - shape = (1152, 1152)
  Converting to float32
v.blk.0.attn_v.bias - f32 - shape = (1152,)
  Converting to float16
v.blk.0.attn_q.weight - f16 - shape = (1152, 1152)
  Converting to float32
v.blk.0.attn_q.bias - f32 - shape = (1152,)
  Converting to float16
v.blk.0.attn_out.weight - f16 - shape = (1152, 1152)
  Converting to float32
v.blk.0.attn_out.bias - f32 - shape = (1152,)
  Converting to float32
v.blk.0.ln1.weight - f32 - shape = (1152,)
  Converting to float32
v.blk.0.ln1.bias - f32 - shape = (1152,)
  Converting to float16
v.blk.0.ffn_down.weight - f16 - shape = (4304, 1152)
  Converting to float32
v.blk.0.ffn_down.bias - f32 - shape = (4304,)
  Converting to float16
v.blk.0.ffn_up.weight - f16 - shape = (1152, 4304)
  Converting to float32
v.blk.0.ffn_up.bias - f32 - shape = (1152,)
  Converting to float32
v.blk.0.ln2.weight - f32 - shape = (1152,)
  Converting to float32
v.blk.0.ln2.bias - f32 - shape = (1152,)
  Converting to float16
v.blk.1.attn_k.weight - f16 - shape = (1152, 1152)
  Converting to float32
v.blk.1.attn_k.bias - f32 - shape = (1152,)
  Converting to float16
v.blk.1.attn_v.weight - f16 - shape = (1152, 1152)
  Converting to float32
v.blk.1.attn_v.bias - f32 - shape = (1152,)
  Converting to float16
v.blk.1.attn_q.weight - f16 - shape = (1152, 1152)
  Converting to float32
v.blk.1.attn_q.bias - f32 - shape = (1152,)
  Converting to float16
v.blk.1.attn_out.weight - f16 - shape = (1152, 1152)
  Converting to float32
v.blk.1.attn_out.bias - f32 - shape = (1152,)
  Converting to float32
v.blk.1.ln1.weight - f32 - shape = (1152,)
  Converting to float32
v.blk.1.ln1.bias - f32 - shape = (1152,)
  Converting to float16
v.blk.1.ffn_down.weight - f16 - shape = (4304, 1152)
  Converting to float32
v.blk.1.ffn_down.bias - f32 - shape = (4304,)
  Converting to float16
v.blk.1.ffn_up.weight - f16 - shape = (1152, 4304)
  Converting to float32
v.blk.1.ffn_up.bias - f32 - shape = (1152,)
  Converting to float32
v.blk.1.ln2.weight - f32 - shape = (1152,)
  Converting to float32
v.blk.1.ln2.bias - f32 - shape = (1152,)
  Converting to float16
v.blk.2.attn_k.weight - f16 - shape = (1152, 1152)
  Converting to float32
v.blk.2.attn_k.bias - f32 - shape = (1152,)
  Converting to float16
v.blk.2.attn_v.weight - f16 - shape = (1152, 1152)
  Converting to float32
v.blk.2.attn_v.bias - f32 - shape = (1152,)
  Converting to float16
v.blk.2.attn_q.weight - f16 - shape = (1152, 1152)
  Converting to float32
v.blk.2.attn_q.bias - f32 - shape = (1152,)
  Converting to float16
v.blk.2.attn_out.weight - f16 - shape = (1152, 1152)
  Converting to float32
v.blk.2.attn_out.bias - f32 - shape = (1152,)
  Converting to float32
v.blk.2.ln1.weight - f32 - shape = (1152,)
  Converting to float32
v.blk.2.ln1.bias - f32 - shape = (1152,)
  Converting to float16
v.blk.2.ffn_down.weight - f16 - shape = (4304, 1152)
  Converting to float32
v.blk.2.ffn_down.bias - f32 - shape = (4304,)
  Converting to float16
v.blk.2.ffn_up.weight - f16 - shape = (1152, 4304)
  Converting to float32
v.blk.2.ffn_up.bias - f32 - shape = (1152,)
  Converting to float32
v.blk.2.ln2.weight - f32 - shape = (1152,)
  Converting to float32
v.blk.2.ln2.bias - f32 - shape = (1152,)
  Converting to float16
v.blk.3.attn_k.weight - f16 - shape = (1152, 1152)
  Converting to float32
v.blk.3.attn_k.bias - f32 - shape = (1152,)
  Converting to float16
v.blk.3.attn_v.weight - f16 - shape = (1152, 1152)
  Converting to float32
v.blk.3.attn_v.bias - f32 - shape = (1152,)
  Converting to float16
v.blk.3.attn_q.weight - f16 - shape = (1152, 1152)
  Converting to float32
v.blk.3.attn_q.bias - f32 - shape = (1152,)
  Converting to float16
v.blk.3.attn_out.weight - f16 - shape = (1152, 1152)
  Converting to float32
v.blk.3.attn_out.bias - f32 - shape = (1152,)
  Converting to float32
v.blk.3.ln1.weight - f32 - shape = (1152,)
  Converting to float32
v.blk.3.ln1.bias - f32 - shape = (1152,)
  Converting to float16
v.blk.3.ffn_down.weight - f16 - shape = (4304, 1152)
  Converting to float32
v.blk.3.ffn_down.bias - f32 - shape = (4304,)
  Converting to float16
v.blk.3.ffn_up.weight - f16 - shape = (1152, 4304)
  Converting to float32
v.blk.3.ffn_up.bias - f32 - shape = (1152,)
  Converting to float32
v.blk.3.ln2.weight - f32 - shape = (1152,)
  Converting to float32
v.blk.3.ln2.bias - f32 - shape = (1152,)
  Converting to float16
v.blk.4.attn_k.weight - f16 - shape = (1152, 1152)
  Converting to float32
v.blk.4.attn_k.bias - f32 - shape = (1152,)
  Converting to float16
v.blk.4.attn_v.weight - f16 - shape = (1152, 1152)
  Converting to float32
v.blk.4.attn_v.bias - f32 - shape = (1152,)
  Converting to float16
v.blk.4.attn_q.weight - f16 - shape = (1152, 1152)
  Converting to float32
v.blk.4.attn_q.bias - f32 - shape = (1152,)
  Converting to float16
v.blk.4.attn_out.weight - f16 - shape = (1152, 1152)
  Converting to float32
v.blk.4.attn_out.bias - f32 - shape = (1152,)
  Converting to float32
v.blk.4.ln1.weight - f32 - shape = (1152,)
  Converting to float32
v.blk.4.ln1.bias - f32 - shape = (1152,)
  Converting to float16
v.blk.4.ffn_down.weight - f16 - shape = (4304, 1152)
  Converting to float32
v.blk.4.ffn_down.bias - f32 - shape = (4304,)
  Converting to float16
v.blk.4.ffn_up.weight - f16 - shape = (1152, 4304)
  Converting to float32
v.blk.4.ffn_up.bias - f32 - shape = (1152,)
  Converting to float32
v.blk.4.ln2.weight - f32 - shape = (1152,)
  Converting to float32
v.blk.4.ln2.bias - f32 - shape = (1152,)
  Converting to float16
v.blk.5.attn_k.weight - f16 - shape = (1152, 1152)
  Converting to float32
v.blk.5.attn_k.bias - f32 - shape = (1152,)
  Converting to float16
v.blk.5.attn_v.weight - f16 - shape = (1152, 1152)
  Converting to float32
v.blk.5.attn_v.bias - f32 - shape = (1152,)
  Converting to float16
v.blk.5.attn_q.weight - f16 - shape = (1152, 1152)
  Converting to float32
v.blk.5.attn_q.bias - f32 - shape = (1152,)
  Converting to float16
v.blk.5.attn_out.weight - f16 - shape = (1152, 1152)
  Converting to float32
v.blk.5.attn_out.bias - f32 - shape = (1152,)
  Converting to float32
v.blk.5.ln1.weight - f32 - shape = (1152,)
  Converting to float32
v.blk.5.ln1.bias - f32 - shape = (1152,)
  Converting to float16
v.blk.5.ffn_down.weight - f16 - shape = (4304, 1152)
  Converting to float32
v.blk.5.ffn_down.bias - f32 - shape = (4304,)
  Converting to float16
v.blk.5.ffn_up.weight - f16 - shape = (1152, 4304)
  Converting to float32
v.blk.5.ffn_up.bias - f32 - shape = (1152,)
  Converting to float32
v.blk.5.ln2.weight - f32 - shape = (1152,)
  Converting to float32
v.blk.5.ln2.bias - f32 - shape = (1152,)
  Converting to float16
v.blk.6.attn_k.weight - f16 - shape = (1152, 1152)
  Converting to float32
v.blk.6.attn_k.bias - f32 - shape = (1152,)
  Converting to float16
v.blk.6.attn_v.weight - f16 - shape = (1152, 1152)
  Converting to float32
v.blk.6.attn_v.bias - f32 - shape = (1152,)
  Converting to float16
v.blk.6.attn_q.weight - f16 - shape = (1152, 1152)
  Converting to float32
v.blk.6.attn_q.bias - f32 - shape = (1152,)
  Converting to float16
v.blk.6.attn_out.weight - f16 - shape = (1152, 1152)
  Converting to float32
v.blk.6.attn_out.bias - f32 - shape = (1152,)
  Converting to float32
v.blk.6.ln1.weight - f32 - shape = (1152,)
  Converting to float32
v.blk.6.ln1.bias - f32 - shape = (1152,)
  Converting to float16
v.blk.6.ffn_down.weight - f16 - shape = (4304, 1152)
  Converting to float32
v.blk.6.ffn_down.bias - f32 - shape = (4304,)
  Converting to float16
v.blk.6.ffn_up.weight - f16 - shape = (1152, 4304)
  Converting to float32
v.blk.6.ffn_up.bias - f32 - shape = (1152,)
  Converting to float32
v.blk.6.ln2.weight - f32 - shape = (1152,)
  Converting to float32
v.blk.6.ln2.bias - f32 - shape = (1152,)
  Converting to float16
v.blk.7.attn_k.weight - f16 - shape = (1152, 1152)
  Converting to float32
v.blk.7.attn_k.bias - f32 - shape = (1152,)
  Converting to float16
v.blk.7.attn_v.weight - f16 - shape = (1152, 1152)
  Converting to float32
v.blk.7.attn_v.bias - f32 - shape = (1152,)
  Converting to float16
v.blk.7.attn_q.weight - f16 - shape = (1152, 1152)
  Converting to float32
v.blk.7.attn_q.bias - f32 - shape = (1152,)
  Converting to float16
v.blk.7.attn_out.weight - f16 - shape = (1152, 1152)
  Converting to float32
v.blk.7.attn_out.bias - f32 - shape = (1152,)
  Converting to float32
v.blk.7.ln1.weight - f32 - shape = (1152,)
  Converting to float32
v.blk.7.ln1.bias - f32 - shape = (1152,)
  Converting to float16
v.blk.7.ffn_down.weight - f16 - shape = (4304, 1152)
  Converting to float32
v.blk.7.ffn_down.bias - f32 - shape = (4304,)
  Converting to float16
v.blk.7.ffn_up.weight - f16 - shape = (1152, 4304)
  Converting to float32
v.blk.7.ffn_up.bias - f32 - shape = (1152,)
  Converting to float32
v.blk.7.ln2.weight - f32 - shape = (1152,)
  Converting to float32
v.blk.7.ln2.bias - f32 - shape = (1152,)
  Converting to float16
v.blk.8.attn_k.weight - f16 - shape = (1152, 1152)
  Converting to float32
v.blk.8.attn_k.bias - f32 - shape = (1152,)
  Converting to float16
v.blk.8.attn_v.weight - f16 - shape = (1152, 1152)
  Converting to float32
v.blk.8.attn_v.bias - f32 - shape = (1152,)
  Converting to float16
v.blk.8.attn_q.weight - f16 - shape = (1152, 1152)
  Converting to float32
v.blk.8.attn_q.bias - f32 - shape = (1152,)
  Converting to float16
v.blk.8.attn_out.weight - f16 - shape = (1152, 1152)
  Converting to float32
v.blk.8.attn_out.bias - f32 - shape = (1152,)
  Converting to float32
v.blk.8.ln1.weight - f32 - shape = (1152,)
  Converting to float32
v.blk.8.ln1.bias - f32 - shape = (1152,)
  Converting to float16
v.blk.8.ffn_down.weight - f16 - shape = (4304, 1152)
  Converting to float32
v.blk.8.ffn_down.bias - f32 - shape = (4304,)
  Converting to float16
v.blk.8.ffn_up.weight - f16 - shape = (1152, 4304)
  Converting to float32
v.blk.8.ffn_up.bias - f32 - shape = (1152,)
  Converting to float32
v.blk.8.ln2.weight - f32 - shape = (1152,)
  Converting to float32
v.blk.8.ln2.bias - f32 - shape = (1152,)
  Converting to float16
v.blk.9.attn_k.weight - f16 - shape = (1152, 1152)
  Converting to float32
v.blk.9.attn_k.bias - f32 - shape = (1152,)
  Converting to float16
v.blk.9.attn_v.weight - f16 - shape = (1152, 1152)
  Converting to float32
v.blk.9.attn_v.bias - f32 - shape = (1152,)
  Converting to float16
v.blk.9.attn_q.weight - f16 - shape = (1152, 1152)
  Converting to float32
v.blk.9.attn_q.bias - f32 - shape = (1152,)
  Converting to float16
v.blk.9.attn_out.weight - f16 - shape = (1152, 1152)
  Converting to float32
v.blk.9.attn_out.bias - f32 - shape = (1152,)
  Converting to float32
v.blk.9.ln1.weight - f32 - shape = (1152,)
  Converting to float32
v.blk.9.ln1.bias - f32 - shape = (1152,)
  Converting to float16
v.blk.9.ffn_down.weight - f16 - shape = (4304, 1152)
  Converting to float32
v.blk.9.ffn_down.bias - f32 - shape = (4304,)
  Converting to float16
v.blk.9.ffn_up.weight - f16 - shape = (1152, 4304)
  Converting to float32
v.blk.9.ffn_up.bias - f32 - shape = (1152,)
  Converting to float32
v.blk.9.ln2.weight - f32 - shape = (1152,)
  Converting to float32
v.blk.9.ln2.bias - f32 - shape = (1152,)
  Converting to float16
v.blk.10.attn_k.weight - f16 - shape = (1152, 1152)
  Converting to float32
v.blk.10.attn_k.bias - f32 - shape = (1152,)
  Converting to float16
v.blk.10.attn_v.weight - f16 - shape = (1152, 1152)
  Converting to float32
v.blk.10.attn_v.bias - f32 - shape = (1152,)
  Converting to float16
v.blk.10.attn_q.weight - f16 - shape = (1152, 1152)
  Converting to float32
v.blk.10.attn_q.bias - f32 - shape = (1152,)
  Converting to float16
v.blk.10.attn_out.weight - f16 - shape = (1152, 1152)
  Converting to float32
v.blk.10.attn_out.bias - f32 - shape = (1152,)
  Converting to float32
v.blk.10.ln1.weight - f32 - shape = (1152,)
  Converting to float32
v.blk.10.ln1.bias - f32 - shape = (1152,)
  Converting to float16
v.blk.10.ffn_down.weight - f16 - shape = (4304, 1152)
  Converting to float32
v.blk.10.ffn_down.bias - f32 - shape = (4304,)
  Converting to float16
v.blk.10.ffn_up.weight - f16 - shape = (1152, 4304)
  Converting to float32
v.blk.10.ffn_up.bias - f32 - shape = (1152,)
  Converting to float32
v.blk.10.ln2.weight - f32 - shape = (1152,)
  Converting to float32
v.blk.10.ln2.bias - f32 - shape = (1152,)
  Converting to float16
v.blk.11.attn_k.weight - f16 - shape = (1152, 1152)
  Converting to float32
v.blk.11.attn_k.bias - f32 - shape = (1152,)
  Converting to float16
v.blk.11.attn_v.weight - f16 - shape = (1152, 1152)
  Converting to float32
v.blk.11.attn_v.bias - f32 - shape = (1152,)
  Converting to float16
v.blk.11.attn_q.weight - f16 - shape = (1152, 1152)
  Converting to float32
v.blk.11.attn_q.bias - f32 - shape = (1152,)
  Converting to float16
v.blk.11.attn_out.weight - f16 - shape = (1152, 1152)
  Converting to float32
v.blk.11.attn_out.bias - f32 - shape = (1152,)
  Converting to float32
v.blk.11.ln1.weight - f32 - shape = (1152,)
  Converting to float32
v.blk.11.ln1.bias - f32 - shape = (1152,)
  Converting to float16
v.blk.11.ffn_down.weight - f16 - shape = (4304, 1152)
  Converting to float32
v.blk.11.ffn_down.bias - f32 - shape = (4304,)
  Converting to float16
v.blk.11.ffn_up.weight - f16 - shape = (1152, 4304)
  Converting to float32
v.blk.11.ffn_up.bias - f32 - shape = (1152,)
  Converting to float32
v.blk.11.ln2.weight - f32 - shape = (1152,)
  Converting to float32
v.blk.11.ln2.bias - f32 - shape = (1152,)
  Converting to float16
v.blk.12.attn_k.weight - f16 - shape = (1152, 1152)
  Converting to float32
v.blk.12.attn_k.bias - f32 - shape = (1152,)
  Converting to float16
v.blk.12.attn_v.weight - f16 - shape = (1152, 1152)
  Converting to float32
v.blk.12.attn_v.bias - f32 - shape = (1152,)
  Converting to float16
v.blk.12.attn_q.weight - f16 - shape = (1152, 1152)
  Converting to float32
v.blk.12.attn_q.bias - f32 - shape = (1152,)
  Converting to float16
v.blk.12.attn_out.weight - f16 - shape = (1152, 1152)
  Converting to float32
v.blk.12.attn_out.bias - f32 - shape = (1152,)
  Converting to float32
v.blk.12.ln1.weight - f32 - shape = (1152,)
  Converting to float32
v.blk.12.ln1.bias - f32 - shape = (1152,)
  Converting to float16
v.blk.12.ffn_down.weight - f16 - shape = (4304, 1152)
  Converting to float32
v.blk.12.ffn_down.bias - f32 - shape = (4304,)
  Converting to float16
v.blk.12.ffn_up.weight - f16 - shape = (1152, 4304)
  Converting to float32
v.blk.12.ffn_up.bias - f32 - shape = (1152,)
  Converting to float32
v.blk.12.ln2.weight - f32 - shape = (1152,)
  Converting to float32
v.blk.12.ln2.bias - f32 - shape = (1152,)
  Converting to float16
v.blk.13.attn_k.weight - f16 - shape = (1152, 1152)
  Converting to float32
v.blk.13.attn_k.bias - f32 - shape = (1152,)
  Converting to float16
v.blk.13.attn_v.weight - f16 - shape = (1152, 1152)
  Converting to float32
v.blk.13.attn_v.bias - f32 - shape = (1152,)
  Converting to float16
v.blk.13.attn_q.weight - f16 - shape = (1152, 1152)
  Converting to float32
v.blk.13.attn_q.bias - f32 - shape = (1152,)
  Converting to float16
v.blk.13.attn_out.weight - f16 - shape = (1152, 1152)
  Converting to float32
v.blk.13.attn_out.bias - f32 - shape = (1152,)
  Converting to float32
v.blk.13.ln1.weight - f32 - shape = (1152,)
  Converting to float32
v.blk.13.ln1.bias - f32 - shape = (1152,)
  Converting to float16
v.blk.13.ffn_down.weight - f16 - shape = (4304, 1152)
  Converting to float32
v.blk.13.ffn_down.bias - f32 - shape = (4304,)
  Converting to float16
v.blk.13.ffn_up.weight - f16 - shape = (1152, 4304)
  Converting to float32
v.blk.13.ffn_up.bias - f32 - shape = (1152,)
  Converting to float32
v.blk.13.ln2.weight - f32 - shape = (1152,)
  Converting to float32
v.blk.13.ln2.bias - f32 - shape = (1152,)
  Converting to float16
v.blk.14.attn_k.weight - f16 - shape = (1152, 1152)
  Converting to float32
v.blk.14.attn_k.bias - f32 - shape = (1152,)
  Converting to float16
v.blk.14.attn_v.weight - f16 - shape = (1152, 1152)
  Converting to float32
v.blk.14.attn_v.bias - f32 - shape = (1152,)
  Converting to float16
v.blk.14.attn_q.weight - f16 - shape = (1152, 1152)
  Converting to float32
v.blk.14.attn_q.bias - f32 - shape = (1152,)
  Converting to float16
v.blk.14.attn_out.weight - f16 - shape = (1152, 1152)
  Converting to float32
v.blk.14.attn_out.bias - f32 - shape = (1152,)
  Converting to float32
v.blk.14.ln1.weight - f32 - shape = (1152,)
  Converting to float32
v.blk.14.ln1.bias - f32 - shape = (1152,)
  Converting to float16
v.blk.14.ffn_down.weight - f16 - shape = (4304, 1152)
  Converting to float32
v.blk.14.ffn_down.bias - f32 - shape = (4304,)
  Converting to float16
v.blk.14.ffn_up.weight - f16 - shape = (1152, 4304)
  Converting to float32
v.blk.14.ffn_up.bias - f32 - shape = (1152,)
  Converting to float32
v.blk.14.ln2.weight - f32 - shape = (1152,)
  Converting to float32
v.blk.14.ln2.bias - f32 - shape = (1152,)
  Converting to float16
v.blk.15.attn_k.weight - f16 - shape = (1152, 1152)
  Converting to float32
v.blk.15.attn_k.bias - f32 - shape = (1152,)
  Converting to float16
v.blk.15.attn_v.weight - f16 - shape = (1152, 1152)
  Converting to float32
v.blk.15.attn_v.bias - f32 - shape = (1152,)
  Converting to float16
v.blk.15.attn_q.weight - f16 - shape = (1152, 1152)
  Converting to float32
v.blk.15.attn_q.bias - f32 - shape = (1152,)
  Converting to float16
v.blk.15.attn_out.weight - f16 - shape = (1152, 1152)
  Converting to float32
v.blk.15.attn_out.bias - f32 - shape = (1152,)
  Converting to float32
v.blk.15.ln1.weight - f32 - shape = (1152,)
  Converting to float32
v.blk.15.ln1.bias - f32 - shape = (1152,)
  Converting to float16
v.blk.15.ffn_down.weight - f16 - shape = (4304, 1152)
  Converting to float32
v.blk.15.ffn_down.bias - f32 - shape = (4304,)
  Converting to float16
v.blk.15.ffn_up.weight - f16 - shape = (1152, 4304)
  Converting to float32
v.blk.15.ffn_up.bias - f32 - shape = (1152,)
  Converting to float32
v.blk.15.ln2.weight - f32 - shape = (1152,)
  Converting to float32
v.blk.15.ln2.bias - f32 - shape = (1152,)
  Converting to float16
v.blk.16.attn_k.weight - f16 - shape = (1152, 1152)
  Converting to float32
v.blk.16.attn_k.bias - f32 - shape = (1152,)
  Converting to float16
v.blk.16.attn_v.weight - f16 - shape = (1152, 1152)
  Converting to float32
v.blk.16.attn_v.bias - f32 - shape = (1152,)
  Converting to float16
v.blk.16.attn_q.weight - f16 - shape = (1152, 1152)
  Converting to float32
v.blk.16.attn_q.bias - f32 - shape = (1152,)
  Converting to float16
v.blk.16.attn_out.weight - f16 - shape = (1152, 1152)
  Converting to float32
v.blk.16.attn_out.bias - f32 - shape = (1152,)
  Converting to float32
v.blk.16.ln1.weight - f32 - shape = (1152,)
  Converting to float32
v.blk.16.ln1.bias - f32 - shape = (1152,)
  Converting to float16
v.blk.16.ffn_down.weight - f16 - shape = (4304, 1152)
  Converting to float32
v.blk.16.ffn_down.bias - f32 - shape = (4304,)
  Converting to float16
v.blk.16.ffn_up.weight - f16 - shape = (1152, 4304)
  Converting to float32
v.blk.16.ffn_up.bias - f32 - shape = (1152,)
  Converting to float32
v.blk.16.ln2.weight - f32 - shape = (1152,)
  Converting to float32
v.blk.16.ln2.bias - f32 - shape = (1152,)
  Converting to float16
v.blk.17.attn_k.weight - f16 - shape = (1152, 1152)
  Converting to float32
v.blk.17.attn_k.bias - f32 - shape = (1152,)
  Converting to float16
v.blk.17.attn_v.weight - f16 - shape = (1152, 1152)
  Converting to float32
v.blk.17.attn_v.bias - f32 - shape = (1152,)
  Converting to float16
v.blk.17.attn_q.weight - f16 - shape = (1152, 1152)
  Converting to float32
v.blk.17.attn_q.bias - f32 - shape = (1152,)
  Converting to float16
v.blk.17.attn_out.weight - f16 - shape = (1152, 1152)
  Converting to float32
v.blk.17.attn_out.bias - f32 - shape = (1152,)
  Converting to float32
v.blk.17.ln1.weight - f32 - shape = (1152,)
  Converting to float32
v.blk.17.ln1.bias - f32 - shape = (1152,)
  Converting to float16
v.blk.17.ffn_down.weight - f16 - shape = (4304, 1152)
  Converting to float32
v.blk.17.ffn_down.bias - f32 - shape = (4304,)
  Converting to float16
v.blk.17.ffn_up.weight - f16 - shape = (1152, 4304)
  Converting to float32
v.blk.17.ffn_up.bias - f32 - shape = (1152,)
  Converting to float32
v.blk.17.ln2.weight - f32 - shape = (1152,)
  Converting to float32
v.blk.17.ln2.bias - f32 - shape = (1152,)
  Converting to float16
v.blk.18.attn_k.weight - f16 - shape = (1152, 1152)
  Converting to float32
v.blk.18.attn_k.bias - f32 - shape = (1152,)
  Converting to float16
v.blk.18.attn_v.weight - f16 - shape = (1152, 1152)
  Converting to float32
v.blk.18.attn_v.bias - f32 - shape = (1152,)
  Converting to float16
v.blk.18.attn_q.weight - f16 - shape = (1152, 1152)
  Converting to float32
v.blk.18.attn_q.bias - f32 - shape = (1152,)
  Converting to float16
v.blk.18.attn_out.weight - f16 - shape = (1152, 1152)
  Converting to float32
v.blk.18.attn_out.bias - f32 - shape = (1152,)
  Converting to float32
v.blk.18.ln1.weight - f32 - shape = (1152,)
  Converting to float32
v.blk.18.ln1.bias - f32 - shape = (1152,)
  Converting to float16
v.blk.18.ffn_down.weight - f16 - shape = (4304, 1152)
  Converting to float32
v.blk.18.ffn_down.bias - f32 - shape = (4304,)
  Converting to float16
v.blk.18.ffn_up.weight - f16 - shape = (1152, 4304)
  Converting to float32
v.blk.18.ffn_up.bias - f32 - shape = (1152,)
  Converting to float32
v.blk.18.ln2.weight - f32 - shape = (1152,)
  Converting to float32
v.blk.18.ln2.bias - f32 - shape = (1152,)
  Converting to float16
v.blk.19.attn_k.weight - f16 - shape = (1152, 1152)
  Converting to float32
v.blk.19.attn_k.bias - f32 - shape = (1152,)
  Converting to float16
v.blk.19.attn_v.weight - f16 - shape = (1152, 1152)
  Converting to float32
v.blk.19.attn_v.bias - f32 - shape = (1152,)
  Converting to float16
v.blk.19.attn_q.weight - f16 - shape = (1152, 1152)
  Converting to float32
v.blk.19.attn_q.bias - f32 - shape = (1152,)
  Converting to float16
v.blk.19.attn_out.weight - f16 - shape = (1152, 1152)
  Converting to float32
v.blk.19.attn_out.bias - f32 - shape = (1152,)
  Converting to float32
v.blk.19.ln1.weight - f32 - shape = (1152,)
  Converting to float32
v.blk.19.ln1.bias - f32 - shape = (1152,)
  Converting to float16
v.blk.19.ffn_down.weight - f16 - shape = (4304, 1152)
  Converting to float32
v.blk.19.ffn_down.bias - f32 - shape = (4304,)
  Converting to float16
v.blk.19.ffn_up.weight - f16 - shape = (1152, 4304)
  Converting to float32
v.blk.19.ffn_up.bias - f32 - shape = (1152,)
  Converting to float32
v.blk.19.ln2.weight - f32 - shape = (1152,)
  Converting to float32
v.blk.19.ln2.bias - f32 - shape = (1152,)
  Converting to float16
v.blk.20.attn_k.weight - f16 - shape = (1152, 1152)
  Converting to float32
v.blk.20.attn_k.bias - f32 - shape = (1152,)
  Converting to float16
v.blk.20.attn_v.weight - f16 - shape = (1152, 1152)
  Converting to float32
v.blk.20.attn_v.bias - f32 - shape = (1152,)
  Converting to float16
v.blk.20.attn_q.weight - f16 - shape = (1152, 1152)
  Converting to float32
v.blk.20.attn_q.bias - f32 - shape = (1152,)
  Converting to float16
v.blk.20.attn_out.weight - f16 - shape = (1152, 1152)
  Converting to float32
v.blk.20.attn_out.bias - f32 - shape = (1152,)
  Converting to float32
v.blk.20.ln1.weight - f32 - shape = (1152,)
  Converting to float32
v.blk.20.ln1.bias - f32 - shape = (1152,)
  Converting to float16
v.blk.20.ffn_down.weight - f16 - shape = (4304, 1152)
  Converting to float32
v.blk.20.ffn_down.bias - f32 - shape = (4304,)
  Converting to float16
v.blk.20.ffn_up.weight - f16 - shape = (1152, 4304)
  Converting to float32
v.blk.20.ffn_up.bias - f32 - shape = (1152,)
  Converting to float32
v.blk.20.ln2.weight - f32 - shape = (1152,)
  Converting to float32
v.blk.20.ln2.bias - f32 - shape = (1152,)
  Converting to float16
v.blk.21.attn_k.weight - f16 - shape = (1152, 1152)
  Converting to float32
v.blk.21.attn_k.bias - f32 - shape = (1152,)
  Converting to float16
v.blk.21.attn_v.weight - f16 - shape = (1152, 1152)
  Converting to float32
v.blk.21.attn_v.bias - f32 - shape = (1152,)
  Converting to float16
v.blk.21.attn_q.weight - f16 - shape = (1152, 1152)
  Converting to float32
v.blk.21.attn_q.bias - f32 - shape = (1152,)
  Converting to float16
v.blk.21.attn_out.weight - f16 - shape = (1152, 1152)
  Converting to float32
v.blk.21.attn_out.bias - f32 - shape = (1152,)
  Converting to float32
v.blk.21.ln1.weight - f32 - shape = (1152,)
  Converting to float32
v.blk.21.ln1.bias - f32 - shape = (1152,)
  Converting to float16
v.blk.21.ffn_down.weight - f16 - shape = (4304, 1152)
  Converting to float32
v.blk.21.ffn_down.bias - f32 - shape = (4304,)
  Converting to float16
v.blk.21.ffn_up.weight - f16 - shape = (1152, 4304)
  Converting to float32
v.blk.21.ffn_up.bias - f32 - shape = (1152,)
  Converting to float32
v.blk.21.ln2.weight - f32 - shape = (1152,)
  Converting to float32
v.blk.21.ln2.bias - f32 - shape = (1152,)
  Converting to float16
v.blk.22.attn_k.weight - f16 - shape = (1152, 1152)
  Converting to float32
v.blk.22.attn_k.bias - f32 - shape = (1152,)
  Converting to float16
v.blk.22.attn_v.weight - f16 - shape = (1152, 1152)
  Converting to float32
v.blk.22.attn_v.bias - f32 - shape = (1152,)
  Converting to float16
v.blk.22.attn_q.weight - f16 - shape = (1152, 1152)
  Converting to float32
v.blk.22.attn_q.bias - f32 - shape = (1152,)
  Converting to float16
v.blk.22.attn_out.weight - f16 - shape = (1152, 1152)
  Converting to float32
v.blk.22.attn_out.bias - f32 - shape = (1152,)
  Converting to float32
v.blk.22.ln1.weight - f32 - shape = (1152,)
  Converting to float32
v.blk.22.ln1.bias - f32 - shape = (1152,)
  Converting to float16
v.blk.22.ffn_down.weight - f16 - shape = (4304, 1152)
  Converting to float32
v.blk.22.ffn_down.bias - f32 - shape = (4304,)
  Converting to float16
v.blk.22.ffn_up.weight - f16 - shape = (1152, 4304)
  Converting to float32
v.blk.22.ffn_up.bias - f32 - shape = (1152,)
  Converting to float32
v.blk.22.ln2.weight - f32 - shape = (1152,)
  Converting to float32
v.blk.22.ln2.bias - f32 - shape = (1152,)
  Converting to float16
v.blk.23.attn_k.weight - f16 - shape = (1152, 1152)
  Converting to float32
v.blk.23.attn_k.bias - f32 - shape = (1152,)
  Converting to float16
v.blk.23.attn_v.weight - f16 - shape = (1152, 1152)
  Converting to float32
v.blk.23.attn_v.bias - f32 - shape = (1152,)
  Converting to float16
v.blk.23.attn_q.weight - f16 - shape = (1152, 1152)
  Converting to float32
v.blk.23.attn_q.bias - f32 - shape = (1152,)
  Converting to float16
v.blk.23.attn_out.weight - f16 - shape = (1152, 1152)
  Converting to float32
v.blk.23.attn_out.bias - f32 - shape = (1152,)
  Converting to float32
v.blk.23.ln1.weight - f32 - shape = (1152,)
  Converting to float32
v.blk.23.ln1.bias - f32 - shape = (1152,)
  Converting to float16
v.blk.23.ffn_down.weight - f16 - shape = (4304, 1152)
  Converting to float32
v.blk.23.ffn_down.bias - f32 - shape = (4304,)
  Converting to float16
v.blk.23.ffn_up.weight - f16 - shape = (1152, 4304)
  Converting to float32
v.blk.23.ffn_up.bias - f32 - shape = (1152,)
  Converting to float32
v.blk.23.ln2.weight - f32 - shape = (1152,)
  Converting to float32
v.blk.23.ln2.bias - f32 - shape = (1152,)
  Converting to float16
v.blk.24.attn_k.weight - f16 - shape = (1152, 1152)
  Converting to float32
v.blk.24.attn_k.bias - f32 - shape = (1152,)
  Converting to float16
v.blk.24.attn_v.weight - f16 - shape = (1152, 1152)
  Converting to float32
v.blk.24.attn_v.bias - f32 - shape = (1152,)
  Converting to float16
v.blk.24.attn_q.weight - f16 - shape = (1152, 1152)
  Converting to float32
v.blk.24.attn_q.bias - f32 - shape = (1152,)
  Converting to float16
v.blk.24.attn_out.weight - f16 - shape = (1152, 1152)
  Converting to float32
v.blk.24.attn_out.bias - f32 - shape = (1152,)
  Converting to float32
v.blk.24.ln1.weight - f32 - shape = (1152,)
  Converting to float32
v.blk.24.ln1.bias - f32 - shape = (1152,)
  Converting to float16
v.blk.24.ffn_down.weight - f16 - shape = (4304, 1152)
  Converting to float32
v.blk.24.ffn_down.bias - f32 - shape = (4304,)
  Converting to float16
v.blk.24.ffn_up.weight - f16 - shape = (1152, 4304)
  Converting to float32
v.blk.24.ffn_up.bias - f32 - shape = (1152,)
  Converting to float32
v.blk.24.ln2.weight - f32 - shape = (1152,)
  Converting to float32
v.blk.24.ln2.bias - f32 - shape = (1152,)
  Converting to float16
v.blk.25.attn_k.weight - f16 - shape = (1152, 1152)
  Converting to float32
v.blk.25.attn_k.bias - f32 - shape = (1152,)
  Converting to float16
v.blk.25.attn_v.weight - f16 - shape = (1152, 1152)
  Converting to float32
v.blk.25.attn_v.bias - f32 - shape = (1152,)
  Converting to float16
v.blk.25.attn_q.weight - f16 - shape = (1152, 1152)
  Converting to float32
v.blk.25.attn_q.bias - f32 - shape = (1152,)
  Converting to float16
v.blk.25.attn_out.weight - f16 - shape = (1152, 1152)
  Converting to float32
v.blk.25.attn_out.bias - f32 - shape = (1152,)
  Converting to float32
v.blk.25.ln1.weight - f32 - shape = (1152,)
  Converting to float32
v.blk.25.ln1.bias - f32 - shape = (1152,)
  Converting to float16
v.blk.25.ffn_down.weight - f16 - shape = (4304, 1152)
  Converting to float32
v.blk.25.ffn_down.bias - f32 - shape = (4304,)
  Converting to float16
v.blk.25.ffn_up.weight - f16 - shape = (1152, 4304)
  Converting to float32
v.blk.25.ffn_up.bias - f32 - shape = (1152,)
  Converting to float32
v.blk.25.ln2.weight - f32 - shape = (1152,)
  Converting to float32
v.blk.25.ln2.bias - f32 - shape = (1152,)
  Converting to float16
v.blk.26.attn_k.weight - f16 - shape = (1152, 1152)
  Converting to float32
v.blk.26.attn_k.bias - f32 - shape = (1152,)
  Converting to float16
v.blk.26.attn_v.weight - f16 - shape = (1152, 1152)
  Converting to float32
v.blk.26.attn_v.bias - f32 - shape = (1152,)
  Converting to float16
v.blk.26.attn_q.weight - f16 - shape = (1152, 1152)
  Converting to float32
v.blk.26.attn_q.bias - f32 - shape = (1152,)
  Converting to float16
v.blk.26.attn_out.weight - f16 - shape = (1152, 1152)
  Converting to float32
v.blk.26.attn_out.bias - f32 - shape = (1152,)
  Converting to float32
v.blk.26.ln1.weight - f32 - shape = (1152,)
  Converting to float32
v.blk.26.ln1.bias - f32 - shape = (1152,)
  Converting to float16
v.blk.26.ffn_down.weight - f16 - shape = (4304, 1152)
  Converting to float32
v.blk.26.ffn_down.bias - f32 - shape = (4304,)
  Converting to float16
v.blk.26.ffn_up.weight - f16 - shape = (1152, 4304)
  Converting to float32
v.blk.26.ffn_up.bias - f32 - shape = (1152,)
  Converting to float32
v.blk.26.ln2.weight - f32 - shape = (1152,)
  Converting to float32
v.blk.26.ln2.bias - f32 - shape = (1152,)
  Converting to float32
v.post_ln.weight - f32 - shape = (1152,)
  Converting to float32
v.post_ln.bias - f32 - shape = (1152,)
Done. Output file: /mradermacher/tmp/quant/Bio-Medical-MultiModal-Llama-3-8B-V1_llamacpp/mmproj-model-f16.gguf
mv /mradermacher/tmp/quant/Bio-Medical-MultiModal-Llama-3-8B-V1_llamacpp/mmproj-model-f16.gguf /mradermacher/tmp/quant/Bio-Medical-MultiModal-Llama-3-8B-V1.gguf
root@AI:~/BioMed/llama.cpp# rm -rf /mradermacher/tmp/quant/Bio-Medical-MultiModal-Llama-3-8B-V1_llamacpp/

The resulting GGUF is only 1.1 GiB, which seems too small given that the safetensors total over 17 GB. To be fair, we removed the entire image part of the model, but it still seems too small for an 8B model, which I would expect to be around 13.5 GiB. When trying to load the resulting GGUF in llama.cpp I get the following error:

root@AI:~/BioMed/llama.cpp# ./llama-cli -m /mradermacher/tmp/quant/Bio-Medical-MultiModal-Llama-3-8B-V1.gguf -p "I believe the meaning of life is" -n 128
Log start
main: build = 3272 (0ae6bdc2)
main: built with cc (Debian 12.2.0-14) 12.2.0 for x86_64-linux-gnu
main: seed  = 1734277883
llama_model_loader: loaded meta data with 19 key-value pairs and 455 tensors from /mradermacher/tmp/quant/Bio-Medical-MultiModal-Llama-3-8B-V1.gguf (version GGUF V3 (latest))
llama_model_loader: Dumping metadata keys/values. Note: KV overrides do not apply in this output.
llama_model_loader: - kv   0:                       general.architecture str              = clip
llama_model_loader: - kv   1:                      clip.has_text_encoder bool             = false
llama_model_loader: - kv   2:                    clip.has_vision_encoder bool             = true
llama_model_loader: - kv   3:                clip.has_minicpmv_projector bool             = true
llama_model_loader: - kv   4:                          general.file_type u32              = 1
llama_model_loader: - kv   5:                        general.description str              = image encoder for MiniCPM-V
llama_model_loader: - kv   6:                        clip.projector_type str              = resampler
llama_model_loader: - kv   7:                      clip.minicpmv_version i32              = 3
llama_model_loader: - kv   8:                     clip.vision.image_size u32              = 448
llama_model_loader: - kv   9:                     clip.vision.patch_size u32              = 14
llama_model_loader: - kv  10:               clip.vision.embedding_length u32              = 1152
llama_model_loader: - kv  11:            clip.vision.feed_forward_length u32              = 4304
llama_model_loader: - kv  12:                 clip.vision.projection_dim u32              = 0
llama_model_loader: - kv  13:           clip.vision.attention.head_count u32              = 16
llama_model_loader: - kv  14:   clip.vision.attention.layer_norm_epsilon f32              = 0.000001
llama_model_loader: - kv  15:                    clip.vision.block_count u32              = 26
llama_model_loader: - kv  16:                     clip.vision.image_mean arr[f32,3]       = [0.500000, 0.500000, 0.500000]
llama_model_loader: - kv  17:                      clip.vision.image_std arr[f32,3]       = [0.500000, 0.500000, 0.500000]
llama_model_loader: - kv  18:                              clip.use_gelu bool             = true
llama_model_loader: - type  f32:  285 tensors
llama_model_loader: - type  f16:  170 tensors
llama_model_load: error loading model: error loading model architecture: unknown model architecture: 'clip'
llama_load_model_from_file: failed to load model
llama_init_from_gpt_params: error: failed to load model '/mradermacher/tmp/quant/Bio-Medical-MultiModal-Llama-3-8B-V1.gguf'
main: error: unable to load model

Oh, turns out I'm not supposed to load mmproj-model-f16.gguf as the main model, according to https://github.com/ggerganov/llama.cpp/issues/7799
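
For reference, the mmproj file is only meant to be passed alongside the (still to be converted) language-model GGUF. Going by the MiniCPM-V README linked earlier, the invocation should look roughly like the sketch below; the file names, image and sampling parameters are placeholders, not something I have tested with this model yet:

./llama-minicpmv-cli \
  -m Bio-Medical-MultiModal-Llama-3-8B-V1.Q4_K_M.gguf \
  --mmproj mmproj-model-f16.gguf \
  -c 4096 --temp 0.7 \
  --image test-scan.jpg -p "What is in the image?"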

Converting the main model doesn't seem to work, both with the official llama.cpp and with the fork.

root@AI:~/snow/llama.cpp# ./venv/bin/python convert_hf_to_gguf.py /mradermacher/tmp/quant/Bio-Medical-MultiModal-Llama-3-8B-V1/model --outfile /mradermacher/tmp/quant/Bio-Medical-MultiModal-Llama-3-8B-V1_llamacpp/Bio-Medical-MultiModal-Llama-3-8B-V1.gguf
INFO:hf-to-gguf:Loading model: model
INFO:gguf.gguf_writer:gguf: This GGUF file is for Little Endian only
INFO:hf-to-gguf:Exporting model...
INFO:hf-to-gguf:gguf: loading model weight map from 'model.safetensors.index.json'
INFO:hf-to-gguf:gguf: loading model part 'model-00001-of-00007.safetensors'
INFO:hf-to-gguf:token_embd.weight,           torch.float32 --> F16, shape = {4096, 128256}
INFO:hf-to-gguf:blk.0.attn_norm.weight,      torch.float32 --> F32, shape = {4096}
INFO:hf-to-gguf:blk.0.ffn_down.weight,       torch.float32 --> F16, shape = {14336, 4096}
INFO:hf-to-gguf:blk.0.ffn_gate.weight,       torch.float32 --> F16, shape = {4096, 14336}
INFO:hf-to-gguf:blk.0.ffn_up.weight,         torch.float32 --> F16, shape = {4096, 14336}
INFO:hf-to-gguf:blk.0.ffn_norm.weight,       torch.float32 --> F32, shape = {4096}
INFO:hf-to-gguf:blk.0.attn_k.weight,         torch.float32 --> F16, shape = {4096, 1024}
INFO:hf-to-gguf:blk.0.attn_output.weight,    torch.float32 --> F16, shape = {4096, 4096}
INFO:hf-to-gguf:blk.0.attn_q.weight,         torch.float32 --> F16, shape = {4096, 4096}
INFO:hf-to-gguf:blk.0.attn_v.weight,         torch.float32 --> F16, shape = {4096, 1024}
INFO:hf-to-gguf:blk.1.attn_norm.weight,      torch.float32 --> F32, shape = {4096}
INFO:hf-to-gguf:blk.1.ffn_down.weight,       torch.float32 --> F16, shape = {14336, 4096}
INFO:hf-to-gguf:blk.1.ffn_gate.weight,       torch.float32 --> F16, shape = {4096, 14336}
INFO:hf-to-gguf:blk.1.ffn_up.weight,         torch.float32 --> F16, shape = {4096, 14336}
INFO:hf-to-gguf:blk.1.ffn_norm.weight,       torch.float32 --> F32, shape = {4096}
INFO:hf-to-gguf:blk.1.attn_k.weight,         torch.float32 --> F16, shape = {4096, 1024}
INFO:hf-to-gguf:blk.1.attn_output.weight,    torch.float32 --> F16, shape = {4096, 4096}
INFO:hf-to-gguf:blk.1.attn_q.weight,         torch.float32 --> F16, shape = {4096, 4096}
INFO:hf-to-gguf:blk.1.attn_v.weight,         torch.float32 --> F16, shape = {4096, 1024}
INFO:hf-to-gguf:blk.2.attn_norm.weight,      torch.float32 --> F32, shape = {4096}
INFO:hf-to-gguf:blk.2.ffn_down.weight,       torch.float32 --> F16, shape = {14336, 4096}
INFO:hf-to-gguf:blk.2.ffn_gate.weight,       torch.float32 --> F16, shape = {4096, 14336}
INFO:hf-to-gguf:blk.2.ffn_up.weight,         torch.float32 --> F16, shape = {4096, 14336}
INFO:hf-to-gguf:blk.2.ffn_norm.weight,       torch.float32 --> F32, shape = {4096}
INFO:hf-to-gguf:blk.2.attn_k.weight,         torch.float32 --> F16, shape = {4096, 1024}
INFO:hf-to-gguf:blk.2.attn_output.weight,    torch.float32 --> F16, shape = {4096, 4096}
INFO:hf-to-gguf:blk.2.attn_q.weight,         torch.float32 --> F16, shape = {4096, 4096}
INFO:hf-to-gguf:blk.2.attn_v.weight,         torch.float32 --> F16, shape = {4096, 1024}
INFO:hf-to-gguf:blk.3.attn_k.weight,         torch.float32 --> F16, shape = {4096, 1024}
INFO:hf-to-gguf:blk.3.attn_output.weight,    torch.float32 --> F16, shape = {4096, 4096}
INFO:hf-to-gguf:blk.3.attn_q.weight,         torch.float32 --> F16, shape = {4096, 4096}
INFO:hf-to-gguf:blk.3.attn_v.weight,         torch.float32 --> F16, shape = {4096, 1024}
INFO:hf-to-gguf:gguf: loading model part 'model-00002-of-00007.safetensors'
INFO:hf-to-gguf:blk.3.attn_norm.weight,      torch.float32 --> F32, shape = {4096}
INFO:hf-to-gguf:blk.3.ffn_down.weight,       torch.float32 --> F16, shape = {14336, 4096}
INFO:hf-to-gguf:blk.3.ffn_gate.weight,       torch.float32 --> F16, shape = {4096, 14336}
INFO:hf-to-gguf:blk.3.ffn_up.weight,         torch.float32 --> F16, shape = {4096, 14336}
INFO:hf-to-gguf:blk.3.ffn_norm.weight,       torch.float32 --> F32, shape = {4096}
INFO:hf-to-gguf:blk.4.attn_norm.weight,      torch.float32 --> F32, shape = {4096}
INFO:hf-to-gguf:blk.4.ffn_down.weight,       torch.float32 --> F16, shape = {14336, 4096}
INFO:hf-to-gguf:blk.4.ffn_gate.weight,       torch.float32 --> F16, shape = {4096, 14336}
INFO:hf-to-gguf:blk.4.ffn_up.weight,         torch.float32 --> F16, shape = {4096, 14336}
INFO:hf-to-gguf:blk.4.ffn_norm.weight,       torch.float32 --> F32, shape = {4096}
INFO:hf-to-gguf:blk.4.attn_k.weight,         torch.float32 --> F16, shape = {4096, 1024}
INFO:hf-to-gguf:blk.4.attn_output.weight,    torch.float32 --> F16, shape = {4096, 4096}
INFO:hf-to-gguf:blk.4.attn_q.weight,         torch.float32 --> F16, shape = {4096, 4096}
INFO:hf-to-gguf:blk.4.attn_v.weight,         torch.float32 --> F16, shape = {4096, 1024}
INFO:hf-to-gguf:blk.5.attn_norm.weight,      torch.float32 --> F32, shape = {4096}
INFO:hf-to-gguf:blk.5.ffn_down.weight,       torch.float32 --> F16, shape = {14336, 4096}
INFO:hf-to-gguf:blk.5.ffn_gate.weight,       torch.float32 --> F16, shape = {4096, 14336}
INFO:hf-to-gguf:blk.5.ffn_up.weight,         torch.float32 --> F16, shape = {4096, 14336}
INFO:hf-to-gguf:blk.5.ffn_norm.weight,       torch.float32 --> F32, shape = {4096}
INFO:hf-to-gguf:blk.5.attn_k.weight,         torch.float32 --> F16, shape = {4096, 1024}
INFO:hf-to-gguf:blk.5.attn_output.weight,    torch.float32 --> F16, shape = {4096, 4096}
INFO:hf-to-gguf:blk.5.attn_q.weight,         torch.float32 --> F16, shape = {4096, 4096}
INFO:hf-to-gguf:blk.5.attn_v.weight,         torch.float32 --> F16, shape = {4096, 1024}
INFO:hf-to-gguf:blk.6.attn_norm.weight,      torch.float32 --> F32, shape = {4096}
INFO:hf-to-gguf:blk.6.ffn_down.weight,       torch.float32 --> F16, shape = {14336, 4096}
INFO:hf-to-gguf:blk.6.ffn_gate.weight,       torch.float32 --> F16, shape = {4096, 14336}
INFO:hf-to-gguf:blk.6.ffn_up.weight,         torch.float32 --> F16, shape = {4096, 14336}
INFO:hf-to-gguf:blk.6.ffn_norm.weight,       torch.float32 --> F32, shape = {4096}
INFO:hf-to-gguf:blk.6.attn_k.weight,         torch.float32 --> F16, shape = {4096, 1024}
INFO:hf-to-gguf:blk.6.attn_output.weight,    torch.float32 --> F16, shape = {4096, 4096}
INFO:hf-to-gguf:blk.6.attn_q.weight,         torch.float32 --> F16, shape = {4096, 4096}
INFO:hf-to-gguf:blk.6.attn_v.weight,         torch.float32 --> F16, shape = {4096, 1024}
INFO:hf-to-gguf:blk.7.attn_norm.weight,      torch.float32 --> F32, shape = {4096}
INFO:hf-to-gguf:blk.7.ffn_down.weight,       torch.float32 --> F16, shape = {14336, 4096}
INFO:hf-to-gguf:blk.7.ffn_gate.weight,       torch.float32 --> F16, shape = {4096, 14336}
INFO:hf-to-gguf:blk.7.ffn_up.weight,         torch.float32 --> F16, shape = {4096, 14336}
INFO:hf-to-gguf:blk.7.ffn_norm.weight,       torch.float32 --> F32, shape = {4096}
INFO:hf-to-gguf:blk.7.attn_k.weight,         torch.float32 --> F16, shape = {4096, 1024}
INFO:hf-to-gguf:blk.7.attn_output.weight,    torch.float32 --> F16, shape = {4096, 4096}
INFO:hf-to-gguf:blk.7.attn_q.weight,         torch.float32 --> F16, shape = {4096, 4096}
INFO:hf-to-gguf:blk.7.attn_v.weight,         torch.float32 --> F16, shape = {4096, 1024}
INFO:hf-to-gguf:blk.8.ffn_gate.weight,       torch.float32 --> F16, shape = {4096, 14336}
INFO:hf-to-gguf:blk.8.ffn_up.weight,         torch.float32 --> F16, shape = {4096, 14336}
INFO:hf-to-gguf:blk.8.attn_k.weight,         torch.float32 --> F16, shape = {4096, 1024}
INFO:hf-to-gguf:blk.8.attn_output.weight,    torch.float32 --> F16, shape = {4096, 4096}
INFO:hf-to-gguf:blk.8.attn_q.weight,         torch.float32 --> F16, shape = {4096, 4096}
INFO:hf-to-gguf:blk.8.attn_v.weight,         torch.float32 --> F16, shape = {4096, 1024}
INFO:hf-to-gguf:gguf: loading model part 'model-00003-of-00007.safetensors'
INFO:hf-to-gguf:blk.10.attn_norm.weight,     torch.float32 --> F32, shape = {4096}
INFO:hf-to-gguf:blk.10.ffn_down.weight,      torch.float32 --> F16, shape = {14336, 4096}
INFO:hf-to-gguf:blk.10.ffn_gate.weight,      torch.float32 --> F16, shape = {4096, 14336}
INFO:hf-to-gguf:blk.10.ffn_up.weight,        torch.float32 --> F16, shape = {4096, 14336}
INFO:hf-to-gguf:blk.10.ffn_norm.weight,      torch.float32 --> F32, shape = {4096}
INFO:hf-to-gguf:blk.10.attn_k.weight,        torch.float32 --> F16, shape = {4096, 1024}
INFO:hf-to-gguf:blk.10.attn_output.weight,   torch.float32 --> F16, shape = {4096, 4096}
INFO:hf-to-gguf:blk.10.attn_q.weight,        torch.float32 --> F16, shape = {4096, 4096}
INFO:hf-to-gguf:blk.10.attn_v.weight,        torch.float32 --> F16, shape = {4096, 1024}
INFO:hf-to-gguf:blk.11.attn_norm.weight,     torch.float32 --> F32, shape = {4096}
INFO:hf-to-gguf:blk.11.ffn_down.weight,      torch.float32 --> F16, shape = {14336, 4096}
INFO:hf-to-gguf:blk.11.ffn_gate.weight,      torch.float32 --> F16, shape = {4096, 14336}
INFO:hf-to-gguf:blk.11.ffn_up.weight,        torch.float32 --> F16, shape = {4096, 14336}
INFO:hf-to-gguf:blk.11.ffn_norm.weight,      torch.float32 --> F32, shape = {4096}
INFO:hf-to-gguf:blk.11.attn_k.weight,        torch.float32 --> F16, shape = {4096, 1024}
INFO:hf-to-gguf:blk.11.attn_output.weight,   torch.float32 --> F16, shape = {4096, 4096}
INFO:hf-to-gguf:blk.11.attn_q.weight,        torch.float32 --> F16, shape = {4096, 4096}
INFO:hf-to-gguf:blk.11.attn_v.weight,        torch.float32 --> F16, shape = {4096, 1024}
INFO:hf-to-gguf:blk.12.attn_norm.weight,     torch.float32 --> F32, shape = {4096}
INFO:hf-to-gguf:blk.12.ffn_down.weight,      torch.float32 --> F16, shape = {14336, 4096}
INFO:hf-to-gguf:blk.12.ffn_gate.weight,      torch.float32 --> F16, shape = {4096, 14336}
INFO:hf-to-gguf:blk.12.ffn_up.weight,        torch.float32 --> F16, shape = {4096, 14336}
INFO:hf-to-gguf:blk.12.ffn_norm.weight,      torch.float32 --> F32, shape = {4096}
INFO:hf-to-gguf:blk.12.attn_k.weight,        torch.float32 --> F16, shape = {4096, 1024}
INFO:hf-to-gguf:blk.12.attn_output.weight,   torch.float32 --> F16, shape = {4096, 4096}
INFO:hf-to-gguf:blk.12.attn_q.weight,        torch.float32 --> F16, shape = {4096, 4096}
INFO:hf-to-gguf:blk.12.attn_v.weight,        torch.float32 --> F16, shape = {4096, 1024}
INFO:hf-to-gguf:blk.13.attn_norm.weight,     torch.float32 --> F32, shape = {4096}
INFO:hf-to-gguf:blk.13.ffn_down.weight,      torch.float32 --> F16, shape = {14336, 4096}
INFO:hf-to-gguf:blk.13.ffn_gate.weight,      torch.float32 --> F16, shape = {4096, 14336}
INFO:hf-to-gguf:blk.13.ffn_up.weight,        torch.float32 --> F16, shape = {4096, 14336}
INFO:hf-to-gguf:blk.13.ffn_norm.weight,      torch.float32 --> F32, shape = {4096}
INFO:hf-to-gguf:blk.13.attn_k.weight,        torch.float32 --> F16, shape = {4096, 1024}
INFO:hf-to-gguf:blk.13.attn_output.weight,   torch.float32 --> F16, shape = {4096, 4096}
INFO:hf-to-gguf:blk.13.attn_q.weight,        torch.float32 --> F16, shape = {4096, 4096}
INFO:hf-to-gguf:blk.13.attn_v.weight,        torch.float32 --> F16, shape = {4096, 1024}
INFO:hf-to-gguf:blk.14.ffn_gate.weight,      torch.float32 --> F16, shape = {4096, 14336}
INFO:hf-to-gguf:blk.14.attn_k.weight,        torch.float32 --> F16, shape = {4096, 1024}
INFO:hf-to-gguf:blk.14.attn_output.weight,   torch.float32 --> F16, shape = {4096, 4096}
INFO:hf-to-gguf:blk.14.attn_q.weight,        torch.float32 --> F16, shape = {4096, 4096}
INFO:hf-to-gguf:blk.14.attn_v.weight,        torch.float32 --> F16, shape = {4096, 1024}
INFO:hf-to-gguf:blk.8.attn_norm.weight,      torch.float32 --> F32, shape = {4096}
INFO:hf-to-gguf:blk.8.ffn_down.weight,       torch.float32 --> F16, shape = {14336, 4096}
INFO:hf-to-gguf:blk.8.ffn_norm.weight,       torch.float32 --> F32, shape = {4096}
INFO:hf-to-gguf:blk.9.attn_norm.weight,      torch.float32 --> F32, shape = {4096}
INFO:hf-to-gguf:blk.9.ffn_down.weight,       torch.float32 --> F16, shape = {14336, 4096}
INFO:hf-to-gguf:blk.9.ffn_gate.weight,       torch.float32 --> F16, shape = {4096, 14336}
INFO:hf-to-gguf:blk.9.ffn_up.weight,         torch.float32 --> F16, shape = {4096, 14336}
INFO:hf-to-gguf:blk.9.ffn_norm.weight,       torch.float32 --> F32, shape = {4096}
INFO:hf-to-gguf:blk.9.attn_k.weight,         torch.float32 --> F16, shape = {4096, 1024}
INFO:hf-to-gguf:blk.9.attn_output.weight,    torch.float32 --> F16, shape = {4096, 4096}
INFO:hf-to-gguf:blk.9.attn_q.weight,         torch.float32 --> F16, shape = {4096, 4096}
INFO:hf-to-gguf:blk.9.attn_v.weight,         torch.float32 --> F16, shape = {4096, 1024}
INFO:hf-to-gguf:gguf: loading model part 'model-00004-of-00007.safetensors'
INFO:hf-to-gguf:blk.14.attn_norm.weight,     torch.float32 --> F32, shape = {4096}
INFO:hf-to-gguf:blk.14.ffn_down.weight,      torch.float32 --> F16, shape = {14336, 4096}
INFO:hf-to-gguf:blk.14.ffn_up.weight,        torch.float32 --> F16, shape = {4096, 14336}
INFO:hf-to-gguf:blk.14.ffn_norm.weight,      torch.float32 --> F32, shape = {4096}
INFO:hf-to-gguf:blk.15.attn_norm.weight,     torch.float32 --> F32, shape = {4096}
INFO:hf-to-gguf:blk.15.ffn_down.weight,      torch.float32 --> F16, shape = {14336, 4096}
INFO:hf-to-gguf:blk.15.ffn_gate.weight,      torch.float32 --> F16, shape = {4096, 14336}
INFO:hf-to-gguf:blk.15.ffn_up.weight,        torch.float32 --> F16, shape = {4096, 14336}
INFO:hf-to-gguf:blk.15.ffn_norm.weight,      torch.float32 --> F32, shape = {4096}
INFO:hf-to-gguf:blk.15.attn_k.weight,        torch.float32 --> F16, shape = {4096, 1024}
INFO:hf-to-gguf:blk.15.attn_output.weight,   torch.float32 --> F16, shape = {4096, 4096}
INFO:hf-to-gguf:blk.15.attn_q.weight,        torch.float32 --> F16, shape = {4096, 4096}
INFO:hf-to-gguf:blk.15.attn_v.weight,        torch.float32 --> F16, shape = {4096, 1024}
INFO:hf-to-gguf:blk.16.attn_norm.weight,     torch.float32 --> F32, shape = {4096}
INFO:hf-to-gguf:blk.16.ffn_down.weight,      torch.float32 --> F16, shape = {14336, 4096}
INFO:hf-to-gguf:blk.16.ffn_gate.weight,      torch.float32 --> F16, shape = {4096, 14336}
INFO:hf-to-gguf:blk.16.ffn_up.weight,        torch.float32 --> F16, shape = {4096, 14336}
INFO:hf-to-gguf:blk.16.ffn_norm.weight,      torch.float32 --> F32, shape = {4096}
INFO:hf-to-gguf:blk.16.attn_k.weight,        torch.float32 --> F16, shape = {4096, 1024}
INFO:hf-to-gguf:blk.16.attn_output.weight,   torch.float32 --> F16, shape = {4096, 4096}
INFO:hf-to-gguf:blk.16.attn_q.weight,        torch.float32 --> F16, shape = {4096, 4096}
INFO:hf-to-gguf:blk.16.attn_v.weight,        torch.float32 --> F16, shape = {4096, 1024}
INFO:hf-to-gguf:blk.17.attn_norm.weight,     torch.float32 --> F32, shape = {4096}
INFO:hf-to-gguf:blk.17.ffn_down.weight,      torch.float32 --> F16, shape = {14336, 4096}
INFO:hf-to-gguf:blk.17.ffn_gate.weight,      torch.float32 --> F16, shape = {4096, 14336}
INFO:hf-to-gguf:blk.17.ffn_up.weight,        torch.float32 --> F16, shape = {4096, 14336}
INFO:hf-to-gguf:blk.17.ffn_norm.weight,      torch.float32 --> F32, shape = {4096}
INFO:hf-to-gguf:blk.17.attn_k.weight,        torch.float32 --> F16, shape = {4096, 1024}
INFO:hf-to-gguf:blk.17.attn_output.weight,   torch.float32 --> F16, shape = {4096, 4096}
INFO:hf-to-gguf:blk.17.attn_q.weight,        torch.float32 --> F16, shape = {4096, 4096}
INFO:hf-to-gguf:blk.17.attn_v.weight,        torch.float32 --> F16, shape = {4096, 1024}
INFO:hf-to-gguf:blk.18.attn_norm.weight,     torch.float32 --> F32, shape = {4096}
INFO:hf-to-gguf:blk.18.ffn_down.weight,      torch.float32 --> F16, shape = {14336, 4096}
INFO:hf-to-gguf:blk.18.ffn_gate.weight,      torch.float32 --> F16, shape = {4096, 14336}
INFO:hf-to-gguf:blk.18.ffn_up.weight,        torch.float32 --> F16, shape = {4096, 14336}
INFO:hf-to-gguf:blk.18.ffn_norm.weight,      torch.float32 --> F32, shape = {4096}
INFO:hf-to-gguf:blk.18.attn_k.weight,        torch.float32 --> F16, shape = {4096, 1024}
INFO:hf-to-gguf:blk.18.attn_output.weight,   torch.float32 --> F16, shape = {4096, 4096}
INFO:hf-to-gguf:blk.18.attn_q.weight,        torch.float32 --> F16, shape = {4096, 4096}
INFO:hf-to-gguf:blk.18.attn_v.weight,        torch.float32 --> F16, shape = {4096, 1024}
INFO:hf-to-gguf:blk.19.attn_norm.weight,     torch.float32 --> F32, shape = {4096}
INFO:hf-to-gguf:blk.19.ffn_down.weight,      torch.float32 --> F16, shape = {14336, 4096}
INFO:hf-to-gguf:blk.19.ffn_gate.weight,      torch.float32 --> F16, shape = {4096, 14336}
INFO:hf-to-gguf:blk.19.ffn_up.weight,        torch.float32 --> F16, shape = {4096, 14336}
INFO:hf-to-gguf:blk.19.ffn_norm.weight,      torch.float32 --> F32, shape = {4096}
INFO:hf-to-gguf:blk.19.attn_k.weight,        torch.float32 --> F16, shape = {4096, 1024}
INFO:hf-to-gguf:blk.19.attn_output.weight,   torch.float32 --> F16, shape = {4096, 4096}
INFO:hf-to-gguf:blk.19.attn_q.weight,        torch.float32 --> F16, shape = {4096, 4096}
INFO:hf-to-gguf:blk.19.attn_v.weight,        torch.float32 --> F16, shape = {4096, 1024}
INFO:hf-to-gguf:blk.20.attn_k.weight,        torch.float32 --> F16, shape = {4096, 1024}
INFO:hf-to-gguf:blk.20.attn_output.weight,   torch.float32 --> F16, shape = {4096, 4096}
INFO:hf-to-gguf:blk.20.attn_q.weight,        torch.float32 --> F16, shape = {4096, 4096}
INFO:hf-to-gguf:blk.20.attn_v.weight,        torch.float32 --> F16, shape = {4096, 1024}
INFO:hf-to-gguf:gguf: loading model part 'model-00005-of-00007.safetensors'
INFO:hf-to-gguf:blk.20.attn_norm.weight,     torch.float32 --> F32, shape = {4096}
INFO:hf-to-gguf:blk.20.ffn_down.weight,      torch.float32 --> F16, shape = {14336, 4096}
INFO:hf-to-gguf:blk.20.ffn_gate.weight,      torch.float32 --> F16, shape = {4096, 14336}
INFO:hf-to-gguf:blk.20.ffn_up.weight,        torch.float32 --> F16, shape = {4096, 14336}
INFO:hf-to-gguf:blk.20.ffn_norm.weight,      torch.float32 --> F32, shape = {4096}
INFO:hf-to-gguf:blk.21.attn_norm.weight,     torch.float32 --> F32, shape = {4096}
INFO:hf-to-gguf:blk.21.ffn_down.weight,      torch.float32 --> F16, shape = {14336, 4096}
INFO:hf-to-gguf:blk.21.ffn_gate.weight,      torch.float32 --> F16, shape = {4096, 14336}
INFO:hf-to-gguf:blk.21.ffn_up.weight,        torch.float32 --> F16, shape = {4096, 14336}
INFO:hf-to-gguf:blk.21.ffn_norm.weight,      torch.float32 --> F32, shape = {4096}
INFO:hf-to-gguf:blk.21.attn_k.weight,        torch.float32 --> F16, shape = {4096, 1024}
INFO:hf-to-gguf:blk.21.attn_output.weight,   torch.float32 --> F16, shape = {4096, 4096}
INFO:hf-to-gguf:blk.21.attn_q.weight,        torch.float32 --> F16, shape = {4096, 4096}
INFO:hf-to-gguf:blk.21.attn_v.weight,        torch.float32 --> F16, shape = {4096, 1024}
INFO:hf-to-gguf:blk.22.attn_norm.weight,     torch.float32 --> F32, shape = {4096}
INFO:hf-to-gguf:blk.22.ffn_down.weight,      torch.float32 --> F16, shape = {14336, 4096}
INFO:hf-to-gguf:blk.22.ffn_gate.weight,      torch.float32 --> F16, shape = {4096, 14336}
INFO:hf-to-gguf:blk.22.ffn_up.weight,        torch.float32 --> F16, shape = {4096, 14336}
INFO:hf-to-gguf:blk.22.ffn_norm.weight,      torch.float32 --> F32, shape = {4096}
INFO:hf-to-gguf:blk.22.attn_k.weight,        torch.float32 --> F16, shape = {4096, 1024}
INFO:hf-to-gguf:blk.22.attn_output.weight,   torch.float32 --> F16, shape = {4096, 4096}
INFO:hf-to-gguf:blk.22.attn_q.weight,        torch.float32 --> F16, shape = {4096, 4096}
INFO:hf-to-gguf:blk.22.attn_v.weight,        torch.float32 --> F16, shape = {4096, 1024}
INFO:hf-to-gguf:blk.23.attn_norm.weight,     torch.float32 --> F32, shape = {4096}
INFO:hf-to-gguf:blk.23.ffn_down.weight,      torch.float32 --> F16, shape = {14336, 4096}
INFO:hf-to-gguf:blk.23.ffn_gate.weight,      torch.float32 --> F16, shape = {4096, 14336}
INFO:hf-to-gguf:blk.23.ffn_up.weight,        torch.float32 --> F16, shape = {4096, 14336}
INFO:hf-to-gguf:blk.23.ffn_norm.weight,      torch.float32 --> F32, shape = {4096}
INFO:hf-to-gguf:blk.23.attn_k.weight,        torch.float32 --> F16, shape = {4096, 1024}
INFO:hf-to-gguf:blk.23.attn_output.weight,   torch.float32 --> F16, shape = {4096, 4096}
INFO:hf-to-gguf:blk.23.attn_q.weight,        torch.float32 --> F16, shape = {4096, 4096}
INFO:hf-to-gguf:blk.23.attn_v.weight,        torch.float32 --> F16, shape = {4096, 1024}
INFO:hf-to-gguf:blk.24.attn_norm.weight,     torch.float32 --> F32, shape = {4096}
INFO:hf-to-gguf:blk.24.ffn_down.weight,      torch.float32 --> F16, shape = {14336, 4096}
INFO:hf-to-gguf:blk.24.ffn_gate.weight,      torch.float32 --> F16, shape = {4096, 14336}
INFO:hf-to-gguf:blk.24.ffn_up.weight,        torch.float32 --> F16, shape = {4096, 14336}
INFO:hf-to-gguf:blk.24.ffn_norm.weight,      torch.float32 --> F32, shape = {4096}
INFO:hf-to-gguf:blk.24.attn_k.weight,        torch.float32 --> F16, shape = {4096, 1024}
INFO:hf-to-gguf:blk.24.attn_output.weight,   torch.float32 --> F16, shape = {4096, 4096}
INFO:hf-to-gguf:blk.24.attn_q.weight,        torch.float32 --> F16, shape = {4096, 4096}
INFO:hf-to-gguf:blk.24.attn_v.weight,        torch.float32 --> F16, shape = {4096, 1024}
INFO:hf-to-gguf:blk.25.ffn_gate.weight,      torch.float32 --> F16, shape = {4096, 14336}
INFO:hf-to-gguf:blk.25.ffn_up.weight,        torch.float32 --> F16, shape = {4096, 14336}
INFO:hf-to-gguf:blk.25.attn_k.weight,        torch.float32 --> F16, shape = {4096, 1024}
INFO:hf-to-gguf:blk.25.attn_output.weight,   torch.float32 --> F16, shape = {4096, 4096}
INFO:hf-to-gguf:blk.25.attn_q.weight,        torch.float32 --> F16, shape = {4096, 4096}
INFO:hf-to-gguf:blk.25.attn_v.weight,        torch.float32 --> F16, shape = {4096, 1024}
INFO:hf-to-gguf:gguf: loading model part 'model-00006-of-00007.safetensors'
INFO:hf-to-gguf:blk.25.attn_norm.weight,     torch.float32 --> F32, shape = {4096}
INFO:hf-to-gguf:blk.25.ffn_down.weight,      torch.float32 --> F16, shape = {14336, 4096}
INFO:hf-to-gguf:blk.25.ffn_norm.weight,      torch.float32 --> F32, shape = {4096}
INFO:hf-to-gguf:blk.26.attn_norm.weight,     torch.float32 --> F32, shape = {4096}
INFO:hf-to-gguf:blk.26.ffn_down.weight,      torch.float32 --> F16, shape = {14336, 4096}
INFO:hf-to-gguf:blk.26.ffn_gate.weight,      torch.float32 --> F16, shape = {4096, 14336}
INFO:hf-to-gguf:blk.26.ffn_up.weight,        torch.float32 --> F16, shape = {4096, 14336}
INFO:hf-to-gguf:blk.26.ffn_norm.weight,      torch.float32 --> F32, shape = {4096}
INFO:hf-to-gguf:blk.26.attn_k.weight,        torch.float32 --> F16, shape = {4096, 1024}
INFO:hf-to-gguf:blk.26.attn_output.weight,   torch.float32 --> F16, shape = {4096, 4096}
INFO:hf-to-gguf:blk.26.attn_q.weight,        torch.float32 --> F16, shape = {4096, 4096}
INFO:hf-to-gguf:blk.26.attn_v.weight,        torch.float32 --> F16, shape = {4096, 1024}
INFO:hf-to-gguf:blk.27.attn_norm.weight,     torch.float32 --> F32, shape = {4096}
INFO:hf-to-gguf:blk.27.ffn_down.weight,      torch.float32 --> F16, shape = {14336, 4096}
INFO:hf-to-gguf:blk.27.ffn_gate.weight,      torch.float32 --> F16, shape = {4096, 14336}
INFO:hf-to-gguf:blk.27.ffn_up.weight,        torch.float32 --> F16, shape = {4096, 14336}
INFO:hf-to-gguf:blk.27.ffn_norm.weight,      torch.float32 --> F32, shape = {4096}
INFO:hf-to-gguf:blk.27.attn_k.weight,        torch.float32 --> F16, shape = {4096, 1024}
INFO:hf-to-gguf:blk.27.attn_output.weight,   torch.float32 --> F16, shape = {4096, 4096}
INFO:hf-to-gguf:blk.27.attn_q.weight,        torch.float32 --> F16, shape = {4096, 4096}
INFO:hf-to-gguf:blk.27.attn_v.weight,        torch.float32 --> F16, shape = {4096, 1024}
INFO:hf-to-gguf:blk.28.attn_norm.weight,     torch.float32 --> F32, shape = {4096}
INFO:hf-to-gguf:blk.28.ffn_down.weight,      torch.float32 --> F16, shape = {14336, 4096}
INFO:hf-to-gguf:blk.28.ffn_gate.weight,      torch.float32 --> F16, shape = {4096, 14336}
INFO:hf-to-gguf:blk.28.ffn_up.weight,        torch.float32 --> F16, shape = {4096, 14336}
INFO:hf-to-gguf:blk.28.ffn_norm.weight,      torch.float32 --> F32, shape = {4096}
INFO:hf-to-gguf:blk.28.attn_k.weight,        torch.float32 --> F16, shape = {4096, 1024}
INFO:hf-to-gguf:blk.28.attn_output.weight,   torch.float32 --> F16, shape = {4096, 4096}
INFO:hf-to-gguf:blk.28.attn_q.weight,        torch.float32 --> F16, shape = {4096, 4096}
INFO:hf-to-gguf:blk.28.attn_v.weight,        torch.float32 --> F16, shape = {4096, 1024}
INFO:hf-to-gguf:blk.29.attn_norm.weight,     torch.float32 --> F32, shape = {4096}
INFO:hf-to-gguf:blk.29.ffn_down.weight,      torch.float32 --> F16, shape = {14336, 4096}
INFO:hf-to-gguf:blk.29.ffn_gate.weight,      torch.float32 --> F16, shape = {4096, 14336}
INFO:hf-to-gguf:blk.29.ffn_up.weight,        torch.float32 --> F16, shape = {4096, 14336}
INFO:hf-to-gguf:blk.29.ffn_norm.weight,      torch.float32 --> F32, shape = {4096}
INFO:hf-to-gguf:blk.29.attn_k.weight,        torch.float32 --> F16, shape = {4096, 1024}
INFO:hf-to-gguf:blk.29.attn_output.weight,   torch.float32 --> F16, shape = {4096, 4096}
INFO:hf-to-gguf:blk.29.attn_q.weight,        torch.float32 --> F16, shape = {4096, 4096}
INFO:hf-to-gguf:blk.29.attn_v.weight,        torch.float32 --> F16, shape = {4096, 1024}
INFO:hf-to-gguf:blk.30.attn_norm.weight,     torch.float32 --> F32, shape = {4096}
INFO:hf-to-gguf:blk.30.ffn_down.weight,      torch.float32 --> F16, shape = {14336, 4096}
INFO:hf-to-gguf:blk.30.ffn_gate.weight,      torch.float32 --> F16, shape = {4096, 14336}
INFO:hf-to-gguf:blk.30.ffn_up.weight,        torch.float32 --> F16, shape = {4096, 14336}
INFO:hf-to-gguf:blk.30.ffn_norm.weight,      torch.float32 --> F32, shape = {4096}
INFO:hf-to-gguf:blk.30.attn_k.weight,        torch.float32 --> F16, shape = {4096, 1024}
INFO:hf-to-gguf:blk.30.attn_output.weight,   torch.float32 --> F16, shape = {4096, 4096}
INFO:hf-to-gguf:blk.30.attn_q.weight,        torch.float32 --> F16, shape = {4096, 4096}
INFO:hf-to-gguf:blk.30.attn_v.weight,        torch.float32 --> F16, shape = {4096, 1024}
INFO:hf-to-gguf:blk.31.ffn_gate.weight,      torch.float32 --> F16, shape = {4096, 14336}
INFO:hf-to-gguf:blk.31.attn_k.weight,        torch.float32 --> F16, shape = {4096, 1024}
INFO:hf-to-gguf:blk.31.attn_output.weight,   torch.float32 --> F16, shape = {4096, 4096}
INFO:hf-to-gguf:blk.31.attn_q.weight,        torch.float32 --> F16, shape = {4096, 4096}
INFO:hf-to-gguf:blk.31.attn_v.weight,        torch.float32 --> F16, shape = {4096, 1024}
INFO:hf-to-gguf:gguf: loading model part 'model-00007-of-00007.safetensors'
INFO:hf-to-gguf:output.weight,               torch.float32 --> F16, shape = {4096, 128256}
INFO:hf-to-gguf:blk.31.attn_norm.weight,     torch.float32 --> F32, shape = {4096}
INFO:hf-to-gguf:blk.31.ffn_down.weight,      torch.float32 --> F16, shape = {14336, 4096}
INFO:hf-to-gguf:blk.31.ffn_up.weight,        torch.float32 --> F16, shape = {4096, 14336}
INFO:hf-to-gguf:blk.31.ffn_norm.weight,      torch.float32 --> F32, shape = {4096}
INFO:hf-to-gguf:output_norm.weight,          torch.float32 --> F32, shape = {4096}
INFO:hf-to-gguf:Set meta model
INFO:hf-to-gguf:Set model parameters
INFO:hf-to-gguf:gguf: context length = 8192
INFO:hf-to-gguf:gguf: embedding length = 4096
INFO:hf-to-gguf:gguf: feed forward length = 14336
INFO:hf-to-gguf:gguf: head count = 32
INFO:hf-to-gguf:gguf: key-value head count = 8
INFO:hf-to-gguf:gguf: rope theta = 500000.0
INFO:hf-to-gguf:gguf: rms norm epsilon = 1e-05
INFO:hf-to-gguf:gguf: file type = 1
INFO:hf-to-gguf:Set model tokenizer
The repository for /mradermacher/tmp/quant/Bio-Medical-MultiModal-Llama-3-8B-V1/model contains custom code which must be executed to correctly load the model. You can inspect the repository content at https://hf.co//mradermacher/tmp/quant/Bio-Medical-MultiModal-Llama-3-8B-V1/model.
You can avoid this prompt in future by passing the argument `trust_remote_code=True`.

Do you wish to run the custom code? [y/N] y
WARNING:hf-to-gguf:

WARNING:hf-to-gguf:**************************************************************************************
WARNING:hf-to-gguf:** WARNING: The BPE pre-tokenizer was not recognized!
WARNING:hf-to-gguf:**          There are 2 possible reasons for this:
WARNING:hf-to-gguf:**          - the model has not been added to convert_hf_to_gguf_update.py yet
WARNING:hf-to-gguf:**          - the pre-tokenization config has changed upstream
WARNING:hf-to-gguf:**          Check your model files and convert_hf_to_gguf_update.py and update them accordingly.
WARNING:hf-to-gguf:** ref:     https://github.com/ggerganov/llama.cpp/pull/6920
WARNING:hf-to-gguf:**
WARNING:hf-to-gguf:** chkhsh:  1baddeb572cd9de2a6d36f2ad0c361490bf5447dafca20afbac625e9d37f18a5
WARNING:hf-to-gguf:**************************************************************************************
WARNING:hf-to-gguf:

Traceback (most recent call last):
  File "/root/snow/llama.cpp/convert_hf_to_gguf.py", line 1531, in set_vocab
    self._set_vocab_sentencepiece()
  File "/root/snow/llama.cpp/convert_hf_to_gguf.py", line 754, in _set_vocab_sentencepiece
    tokens, scores, toktypes = self._create_vocab_sentencepiece()
                               ^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^
  File "/root/snow/llama.cpp/convert_hf_to_gguf.py", line 771, in _create_vocab_sentencepiece
    raise FileNotFoundError(f"File not found: {tokenizer_path}")
FileNotFoundError: File not found: /mradermacher/tmp/quant/Bio-Medical-MultiModal-Llama-3-8B-V1/model/tokenizer.model

During handling of the above exception, another exception occurred:

Traceback (most recent call last):
  File "/root/snow/llama.cpp/convert_hf_to_gguf.py", line 1534, in set_vocab
    self._set_vocab_llama_hf()
  File "/root/snow/llama.cpp/convert_hf_to_gguf.py", line 846, in _set_vocab_llama_hf
    vocab = gguf.LlamaHfVocab(self.dir_model)
            ^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^
  File "/root/snow/llama.cpp/gguf-py/gguf/vocab.py", line 390, in __init__
    raise FileNotFoundError('Cannot find Llama BPE tokenizer')
FileNotFoundError: Cannot find Llama BPE tokenizer

During handling of the above exception, another exception occurred:

Traceback (most recent call last):
  File "/root/snow/llama.cpp/convert_hf_to_gguf.py", line 4462, in <module>
    main()
  File "/root/snow/llama.cpp/convert_hf_to_gguf.py", line 4456, in main
    model_instance.write()
  File "/root/snow/llama.cpp/convert_hf_to_gguf.py", line 435, in write
    self.prepare_metadata(vocab_only=False)
  File "/root/snow/llama.cpp/convert_hf_to_gguf.py", line 428, in prepare_metadata
    self.set_vocab()
  File "/root/snow/llama.cpp/convert_hf_to_gguf.py", line 1537, in set_vocab
    self._set_vocab_gpt2()
  File "/root/snow/llama.cpp/convert_hf_to_gguf.py", line 690, in _set_vocab_gpt2
    tokens, toktypes, tokpre = self.get_vocab_base()
                               ^^^^^^^^^^^^^^^^^^^^^
  File "/root/snow/llama.cpp/convert_hf_to_gguf.py", line 516, in get_vocab_base
    tokpre = self.get_vocab_base_pre(tokenizer)
             ^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^
  File "/root/snow/llama.cpp/convert_hf_to_gguf.py", line 681, in get_vocab_base_pre
    raise NotImplementedError("BPE pre-tokenizer was not recognized - update get_vocab_base_pre()")
NotImplementedError: BPE pre-tokenizer was not recognized - update get_vocab_base_pre()

Let's see if we can add support for the 1baddeb572cd9de2a6d36f2ad0c361490bf5447dafca20afbac625e9d37f18a5 BPE pre-tokenizer as described in https://github.com/ggerganov/llama.cpp/issues/9098#issuecomment-2299421096
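
Concretely, the usual workaround is to add the new hash to get_vocab_base_pre() in convert_hf_to_gguf.py and map it to an existing pre-tokenizer. A minimal sketch of the lines I intend to add is below; mapping it to "llama-bpe" is an assumption based on the model being a Llama-3 derivative, not something verified upstream:

# to be added inside get_vocab_base_pre() in convert_hf_to_gguf.py,
# next to the other chkhsh checks:
if chkhsh == "1baddeb572cd9de2a6d36f2ad0c361490bf5447dafca20afbac625e9d37f18a5":
    # ref: ContactDoctor/Bio-Medical-MultiModal-Llama-3-8B-V1 (MiniCPM-V tokenizer on a Llama-3 base)
    # assumption: the Llama-3 BPE pre-tokenizer is close enough for this tokenizer
    res = "llama-bpe"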

It was a good idea to queue it on nico1 specifically :-) Ok, trying to sort it out... the mmproj file would be the vision part, not the language model.

Uh, while I was writing my entry, things moved fast. Yeah, I wish convert-hf-to-gguf had some kind of interface where you could just set an environment variable rather than having to patch it.

The main problem with just forcing it is that a different hash supposedly means different pre-tokenizer behaviour, so while e.g. llama-3 might be close enough, it's probably not fully correct. Not that I would have a problem with that - slightly off but working is better than not working.

Anyway, as explained in the other thread, if you manage to get a .gguf, you can just rename it to Bio-Medical-MultiModal-Llama-3-8B-V1.gguf, clear the status files in /dev/shm, and push (and clean up the download directory at your leisure).

And we should probably provide the mmproj file if nobody else does.

I just successfully converted the text generation part as well:

root@AI:~/snow/llama.cpp# rm /mradermacher/tmp/quant/Bio-Medical-MultiModal-Llama-3-8B-V1_MiniCPM-V-2.6.gguf
root@AI:~/snow/llama.cpp# mv /mradermacher/tmp/quant/Bio-Medical-MultiModal-Llama-3-8B-V1.gguf Bio-Medical-MultiModal-Llama-3-8B-V1_mmproj.gguf
root@AI:~/snow/llama.cpp# ./venv/bin/python convert_hf_to_gguf.py /mradermacher/tmp/quant/Bio-Medical-MultiModal-Llama-3-8B-V1/model --outfile /mradermacher/tmp/quant/Bio-Medical-MultiModal-Llama-3-8B-V1.gguf
INFO:hf-to-gguf:Loading model: model
INFO:gguf.gguf_writer:gguf: This GGUF file is for Little Endian only
INFO:hf-to-gguf:Exporting model...
INFO:hf-to-gguf:gguf: loading model weight map from 'model.safetensors.index.json'
INFO:hf-to-gguf:gguf: loading model part 'model-00001-of-00007.safetensors'
INFO:hf-to-gguf:token_embd.weight,           torch.float32 --> F16, shape = {4096, 128256}
INFO:hf-to-gguf:blk.0.attn_norm.weight,      torch.float32 --> F32, shape = {4096}
INFO:hf-to-gguf:blk.0.ffn_down.weight,       torch.float32 --> F16, shape = {14336, 4096}
INFO:hf-to-gguf:blk.0.ffn_gate.weight,       torch.float32 --> F16, shape = {4096, 14336}
INFO:hf-to-gguf:blk.0.ffn_up.weight,         torch.float32 --> F16, shape = {4096, 14336}
INFO:hf-to-gguf:blk.0.ffn_norm.weight,       torch.float32 --> F32, shape = {4096}
INFO:hf-to-gguf:blk.0.attn_k.weight,         torch.float32 --> F16, shape = {4096, 1024}
INFO:hf-to-gguf:blk.0.attn_output.weight,    torch.float32 --> F16, shape = {4096, 4096}
INFO:hf-to-gguf:blk.0.attn_q.weight,         torch.float32 --> F16, shape = {4096, 4096}
INFO:hf-to-gguf:blk.0.attn_v.weight,         torch.float32 --> F16, shape = {4096, 1024}
INFO:hf-to-gguf:blk.1.attn_norm.weight,      torch.float32 --> F32, shape = {4096}
INFO:hf-to-gguf:blk.1.ffn_down.weight,       torch.float32 --> F16, shape = {14336, 4096}
INFO:hf-to-gguf:blk.1.ffn_gate.weight,       torch.float32 --> F16, shape = {4096, 14336}
INFO:hf-to-gguf:blk.1.ffn_up.weight,         torch.float32 --> F16, shape = {4096, 14336}
INFO:hf-to-gguf:blk.1.ffn_norm.weight,       torch.float32 --> F32, shape = {4096}
INFO:hf-to-gguf:blk.1.attn_k.weight,         torch.float32 --> F16, shape = {4096, 1024}
INFO:hf-to-gguf:blk.1.attn_output.weight,    torch.float32 --> F16, shape = {4096, 4096}
INFO:hf-to-gguf:blk.1.attn_q.weight,         torch.float32 --> F16, shape = {4096, 4096}
INFO:hf-to-gguf:blk.1.attn_v.weight,         torch.float32 --> F16, shape = {4096, 1024}
INFO:hf-to-gguf:blk.2.attn_norm.weight,      torch.float32 --> F32, shape = {4096}
INFO:hf-to-gguf:blk.2.ffn_down.weight,       torch.float32 --> F16, shape = {14336, 4096}
INFO:hf-to-gguf:blk.2.ffn_gate.weight,       torch.float32 --> F16, shape = {4096, 14336}
INFO:hf-to-gguf:blk.2.ffn_up.weight,         torch.float32 --> F16, shape = {4096, 14336}
INFO:hf-to-gguf:blk.2.ffn_norm.weight,       torch.float32 --> F32, shape = {4096}
INFO:hf-to-gguf:blk.2.attn_k.weight,         torch.float32 --> F16, shape = {4096, 1024}
INFO:hf-to-gguf:blk.2.attn_output.weight,    torch.float32 --> F16, shape = {4096, 4096}
INFO:hf-to-gguf:blk.2.attn_q.weight,         torch.float32 --> F16, shape = {4096, 4096}
INFO:hf-to-gguf:blk.2.attn_v.weight,         torch.float32 --> F16, shape = {4096, 1024}
INFO:hf-to-gguf:blk.3.attn_k.weight,         torch.float32 --> F16, shape = {4096, 1024}
INFO:hf-to-gguf:blk.3.attn_output.weight,    torch.float32 --> F16, shape = {4096, 4096}
INFO:hf-to-gguf:blk.3.attn_q.weight,         torch.float32 --> F16, shape = {4096, 4096}
INFO:hf-to-gguf:blk.3.attn_v.weight,         torch.float32 --> F16, shape = {4096, 1024}
INFO:hf-to-gguf:gguf: loading model part 'model-00002-of-00007.safetensors'
INFO:hf-to-gguf:blk.3.attn_norm.weight,      torch.float32 --> F32, shape = {4096}
INFO:hf-to-gguf:blk.3.ffn_down.weight,       torch.float32 --> F16, shape = {14336, 4096}
INFO:hf-to-gguf:blk.3.ffn_gate.weight,       torch.float32 --> F16, shape = {4096, 14336}
INFO:hf-to-gguf:blk.3.ffn_up.weight,         torch.float32 --> F16, shape = {4096, 14336}
INFO:hf-to-gguf:blk.3.ffn_norm.weight,       torch.float32 --> F32, shape = {4096}
INFO:hf-to-gguf:blk.4.attn_norm.weight,      torch.float32 --> F32, shape = {4096}
INFO:hf-to-gguf:blk.4.ffn_down.weight,       torch.float32 --> F16, shape = {14336, 4096}
INFO:hf-to-gguf:blk.4.ffn_gate.weight,       torch.float32 --> F16, shape = {4096, 14336}
INFO:hf-to-gguf:blk.4.ffn_up.weight,         torch.float32 --> F16, shape = {4096, 14336}
INFO:hf-to-gguf:blk.4.ffn_norm.weight,       torch.float32 --> F32, shape = {4096}
INFO:hf-to-gguf:blk.4.attn_k.weight,         torch.float32 --> F16, shape = {4096, 1024}
INFO:hf-to-gguf:blk.4.attn_output.weight,    torch.float32 --> F16, shape = {4096, 4096}
INFO:hf-to-gguf:blk.4.attn_q.weight,         torch.float32 --> F16, shape = {4096, 4096}
INFO:hf-to-gguf:blk.4.attn_v.weight,         torch.float32 --> F16, shape = {4096, 1024}
INFO:hf-to-gguf:blk.5.attn_norm.weight,      torch.float32 --> F32, shape = {4096}
INFO:hf-to-gguf:blk.5.ffn_down.weight,       torch.float32 --> F16, shape = {14336, 4096}
INFO:hf-to-gguf:blk.5.ffn_gate.weight,       torch.float32 --> F16, shape = {4096, 14336}
INFO:hf-to-gguf:blk.5.ffn_up.weight,         torch.float32 --> F16, shape = {4096, 14336}
INFO:hf-to-gguf:blk.5.ffn_norm.weight,       torch.float32 --> F32, shape = {4096}
INFO:hf-to-gguf:blk.5.attn_k.weight,         torch.float32 --> F16, shape = {4096, 1024}
INFO:hf-to-gguf:blk.5.attn_output.weight,    torch.float32 --> F16, shape = {4096, 4096}
INFO:hf-to-gguf:blk.5.attn_q.weight,         torch.float32 --> F16, shape = {4096, 4096}
INFO:hf-to-gguf:blk.5.attn_v.weight,         torch.float32 --> F16, shape = {4096, 1024}
INFO:hf-to-gguf:blk.6.attn_norm.weight,      torch.float32 --> F32, shape = {4096}
INFO:hf-to-gguf:blk.6.ffn_down.weight,       torch.float32 --> F16, shape = {14336, 4096}
INFO:hf-to-gguf:blk.6.ffn_gate.weight,       torch.float32 --> F16, shape = {4096, 14336}
INFO:hf-to-gguf:blk.6.ffn_up.weight,         torch.float32 --> F16, shape = {4096, 14336}
INFO:hf-to-gguf:blk.6.ffn_norm.weight,       torch.float32 --> F32, shape = {4096}
INFO:hf-to-gguf:blk.6.attn_k.weight,         torch.float32 --> F16, shape = {4096, 1024}
INFO:hf-to-gguf:blk.6.attn_output.weight,    torch.float32 --> F16, shape = {4096, 4096}
INFO:hf-to-gguf:blk.6.attn_q.weight,         torch.float32 --> F16, shape = {4096, 4096}
INFO:hf-to-gguf:blk.6.attn_v.weight,         torch.float32 --> F16, shape = {4096, 1024}
INFO:hf-to-gguf:blk.7.attn_norm.weight,      torch.float32 --> F32, shape = {4096}
INFO:hf-to-gguf:blk.7.ffn_down.weight,       torch.float32 --> F16, shape = {14336, 4096}
INFO:hf-to-gguf:blk.7.ffn_gate.weight,       torch.float32 --> F16, shape = {4096, 14336}
INFO:hf-to-gguf:blk.7.ffn_up.weight,         torch.float32 --> F16, shape = {4096, 14336}
INFO:hf-to-gguf:blk.7.ffn_norm.weight,       torch.float32 --> F32, shape = {4096}
INFO:hf-to-gguf:blk.7.attn_k.weight,         torch.float32 --> F16, shape = {4096, 1024}
INFO:hf-to-gguf:blk.7.attn_output.weight,    torch.float32 --> F16, shape = {4096, 4096}
INFO:hf-to-gguf:blk.7.attn_q.weight,         torch.float32 --> F16, shape = {4096, 4096}
INFO:hf-to-gguf:blk.7.attn_v.weight,         torch.float32 --> F16, shape = {4096, 1024}
INFO:hf-to-gguf:blk.8.ffn_gate.weight,       torch.float32 --> F16, shape = {4096, 14336}
INFO:hf-to-gguf:blk.8.ffn_up.weight,         torch.float32 --> F16, shape = {4096, 14336}
INFO:hf-to-gguf:blk.8.attn_k.weight,         torch.float32 --> F16, shape = {4096, 1024}
INFO:hf-to-gguf:blk.8.attn_output.weight,    torch.float32 --> F16, shape = {4096, 4096}
INFO:hf-to-gguf:blk.8.attn_q.weight,         torch.float32 --> F16, shape = {4096, 4096}
INFO:hf-to-gguf:blk.8.attn_v.weight,         torch.float32 --> F16, shape = {4096, 1024}
INFO:hf-to-gguf:gguf: loading model part 'model-00003-of-00007.safetensors'
INFO:hf-to-gguf:blk.10.attn_norm.weight,     torch.float32 --> F32, shape = {4096}
INFO:hf-to-gguf:blk.10.ffn_down.weight,      torch.float32 --> F16, shape = {14336, 4096}
INFO:hf-to-gguf:blk.10.ffn_gate.weight,      torch.float32 --> F16, shape = {4096, 14336}
INFO:hf-to-gguf:blk.10.ffn_up.weight,        torch.float32 --> F16, shape = {4096, 14336}
INFO:hf-to-gguf:blk.10.ffn_norm.weight,      torch.float32 --> F32, shape = {4096}
INFO:hf-to-gguf:blk.10.attn_k.weight,        torch.float32 --> F16, shape = {4096, 1024}
INFO:hf-to-gguf:blk.10.attn_output.weight,   torch.float32 --> F16, shape = {4096, 4096}
INFO:hf-to-gguf:blk.10.attn_q.weight,        torch.float32 --> F16, shape = {4096, 4096}
INFO:hf-to-gguf:blk.10.attn_v.weight,        torch.float32 --> F16, shape = {4096, 1024}
INFO:hf-to-gguf:blk.11.attn_norm.weight,     torch.float32 --> F32, shape = {4096}
INFO:hf-to-gguf:blk.11.ffn_down.weight,      torch.float32 --> F16, shape = {14336, 4096}
INFO:hf-to-gguf:blk.11.ffn_gate.weight,      torch.float32 --> F16, shape = {4096, 14336}
INFO:hf-to-gguf:blk.11.ffn_up.weight,        torch.float32 --> F16, shape = {4096, 14336}
INFO:hf-to-gguf:blk.11.ffn_norm.weight,      torch.float32 --> F32, shape = {4096}
INFO:hf-to-gguf:blk.11.attn_k.weight,        torch.float32 --> F16, shape = {4096, 1024}
INFO:hf-to-gguf:blk.11.attn_output.weight,   torch.float32 --> F16, shape = {4096, 4096}
INFO:hf-to-gguf:blk.11.attn_q.weight,        torch.float32 --> F16, shape = {4096, 4096}
INFO:hf-to-gguf:blk.11.attn_v.weight,        torch.float32 --> F16, shape = {4096, 1024}
INFO:hf-to-gguf:blk.12.attn_norm.weight,     torch.float32 --> F32, shape = {4096}
INFO:hf-to-gguf:blk.12.ffn_down.weight,      torch.float32 --> F16, shape = {14336, 4096}
INFO:hf-to-gguf:blk.12.ffn_gate.weight,      torch.float32 --> F16, shape = {4096, 14336}
INFO:hf-to-gguf:blk.12.ffn_up.weight,        torch.float32 --> F16, shape = {4096, 14336}
INFO:hf-to-gguf:blk.12.ffn_norm.weight,      torch.float32 --> F32, shape = {4096}
INFO:hf-to-gguf:blk.12.attn_k.weight,        torch.float32 --> F16, shape = {4096, 1024}
INFO:hf-to-gguf:blk.12.attn_output.weight,   torch.float32 --> F16, shape = {4096, 4096}
INFO:hf-to-gguf:blk.12.attn_q.weight,        torch.float32 --> F16, shape = {4096, 4096}
INFO:hf-to-gguf:blk.12.attn_v.weight,        torch.float32 --> F16, shape = {4096, 1024}
INFO:hf-to-gguf:blk.13.attn_norm.weight,     torch.float32 --> F32, shape = {4096}
INFO:hf-to-gguf:blk.13.ffn_down.weight,      torch.float32 --> F16, shape = {14336, 4096}
INFO:hf-to-gguf:blk.13.ffn_gate.weight,      torch.float32 --> F16, shape = {4096, 14336}
INFO:hf-to-gguf:blk.13.ffn_up.weight,        torch.float32 --> F16, shape = {4096, 14336}
INFO:hf-to-gguf:blk.13.ffn_norm.weight,      torch.float32 --> F32, shape = {4096}
INFO:hf-to-gguf:blk.13.attn_k.weight,        torch.float32 --> F16, shape = {4096, 1024}
INFO:hf-to-gguf:blk.13.attn_output.weight,   torch.float32 --> F16, shape = {4096, 4096}
INFO:hf-to-gguf:blk.13.attn_q.weight,        torch.float32 --> F16, shape = {4096, 4096}
INFO:hf-to-gguf:blk.13.attn_v.weight,        torch.float32 --> F16, shape = {4096, 1024}
INFO:hf-to-gguf:blk.14.ffn_gate.weight,      torch.float32 --> F16, shape = {4096, 14336}
INFO:hf-to-gguf:blk.14.attn_k.weight,        torch.float32 --> F16, shape = {4096, 1024}
INFO:hf-to-gguf:blk.14.attn_output.weight,   torch.float32 --> F16, shape = {4096, 4096}
INFO:hf-to-gguf:blk.14.attn_q.weight,        torch.float32 --> F16, shape = {4096, 4096}
INFO:hf-to-gguf:blk.14.attn_v.weight,        torch.float32 --> F16, shape = {4096, 1024}
INFO:hf-to-gguf:blk.8.attn_norm.weight,      torch.float32 --> F32, shape = {4096}
INFO:hf-to-gguf:blk.8.ffn_down.weight,       torch.float32 --> F16, shape = {14336, 4096}
INFO:hf-to-gguf:blk.8.ffn_norm.weight,       torch.float32 --> F32, shape = {4096}
INFO:hf-to-gguf:blk.9.attn_norm.weight,      torch.float32 --> F32, shape = {4096}
INFO:hf-to-gguf:blk.9.ffn_down.weight,       torch.float32 --> F16, shape = {14336, 4096}
INFO:hf-to-gguf:blk.9.ffn_gate.weight,       torch.float32 --> F16, shape = {4096, 14336}
INFO:hf-to-gguf:blk.9.ffn_up.weight,         torch.float32 --> F16, shape = {4096, 14336}
INFO:hf-to-gguf:blk.9.ffn_norm.weight,       torch.float32 --> F32, shape = {4096}
INFO:hf-to-gguf:blk.9.attn_k.weight,         torch.float32 --> F16, shape = {4096, 1024}
INFO:hf-to-gguf:blk.9.attn_output.weight,    torch.float32 --> F16, shape = {4096, 4096}
INFO:hf-to-gguf:blk.9.attn_q.weight,         torch.float32 --> F16, shape = {4096, 4096}
INFO:hf-to-gguf:blk.9.attn_v.weight,         torch.float32 --> F16, shape = {4096, 1024}
INFO:hf-to-gguf:gguf: loading model part 'model-00004-of-00007.safetensors'
INFO:hf-to-gguf:blk.14.attn_norm.weight,     torch.float32 --> F32, shape = {4096}
INFO:hf-to-gguf:blk.14.ffn_down.weight,      torch.float32 --> F16, shape = {14336, 4096}
INFO:hf-to-gguf:blk.14.ffn_up.weight,        torch.float32 --> F16, shape = {4096, 14336}
INFO:hf-to-gguf:blk.14.ffn_norm.weight,      torch.float32 --> F32, shape = {4096}
INFO:hf-to-gguf:blk.15.attn_norm.weight,     torch.float32 --> F32, shape = {4096}
INFO:hf-to-gguf:blk.15.ffn_down.weight,      torch.float32 --> F16, shape = {14336, 4096}
INFO:hf-to-gguf:blk.15.ffn_gate.weight,      torch.float32 --> F16, shape = {4096, 14336}
INFO:hf-to-gguf:blk.15.ffn_up.weight,        torch.float32 --> F16, shape = {4096, 14336}
INFO:hf-to-gguf:blk.15.ffn_norm.weight,      torch.float32 --> F32, shape = {4096}
INFO:hf-to-gguf:blk.15.attn_k.weight,        torch.float32 --> F16, shape = {4096, 1024}
INFO:hf-to-gguf:blk.15.attn_output.weight,   torch.float32 --> F16, shape = {4096, 4096}
INFO:hf-to-gguf:blk.15.attn_q.weight,        torch.float32 --> F16, shape = {4096, 4096}
INFO:hf-to-gguf:blk.15.attn_v.weight,        torch.float32 --> F16, shape = {4096, 1024}
INFO:hf-to-gguf:blk.16.attn_norm.weight,     torch.float32 --> F32, shape = {4096}
INFO:hf-to-gguf:blk.16.ffn_down.weight,      torch.float32 --> F16, shape = {14336, 4096}
INFO:hf-to-gguf:blk.16.ffn_gate.weight,      torch.float32 --> F16, shape = {4096, 14336}
INFO:hf-to-gguf:blk.16.ffn_up.weight,        torch.float32 --> F16, shape = {4096, 14336}
INFO:hf-to-gguf:blk.16.ffn_norm.weight,      torch.float32 --> F32, shape = {4096}
INFO:hf-to-gguf:blk.16.attn_k.weight,        torch.float32 --> F16, shape = {4096, 1024}
INFO:hf-to-gguf:blk.16.attn_output.weight,   torch.float32 --> F16, shape = {4096, 4096}
INFO:hf-to-gguf:blk.16.attn_q.weight,        torch.float32 --> F16, shape = {4096, 4096}
INFO:hf-to-gguf:blk.16.attn_v.weight,        torch.float32 --> F16, shape = {4096, 1024}
INFO:hf-to-gguf:blk.17.attn_norm.weight,     torch.float32 --> F32, shape = {4096}
INFO:hf-to-gguf:blk.17.ffn_down.weight,      torch.float32 --> F16, shape = {14336, 4096}
INFO:hf-to-gguf:blk.17.ffn_gate.weight,      torch.float32 --> F16, shape = {4096, 14336}
INFO:hf-to-gguf:blk.17.ffn_up.weight,        torch.float32 --> F16, shape = {4096, 14336}
INFO:hf-to-gguf:blk.17.ffn_norm.weight,      torch.float32 --> F32, shape = {4096}
INFO:hf-to-gguf:blk.17.attn_k.weight,        torch.float32 --> F16, shape = {4096, 1024}
INFO:hf-to-gguf:blk.17.attn_output.weight,   torch.float32 --> F16, shape = {4096, 4096}
INFO:hf-to-gguf:blk.17.attn_q.weight,        torch.float32 --> F16, shape = {4096, 4096}
INFO:hf-to-gguf:blk.17.attn_v.weight,        torch.float32 --> F16, shape = {4096, 1024}
INFO:hf-to-gguf:blk.18.attn_norm.weight,     torch.float32 --> F32, shape = {4096}
INFO:hf-to-gguf:blk.18.ffn_down.weight,      torch.float32 --> F16, shape = {14336, 4096}
INFO:hf-to-gguf:blk.18.ffn_gate.weight,      torch.float32 --> F16, shape = {4096, 14336}
INFO:hf-to-gguf:blk.18.ffn_up.weight,        torch.float32 --> F16, shape = {4096, 14336}
INFO:hf-to-gguf:blk.18.ffn_norm.weight,      torch.float32 --> F32, shape = {4096}
INFO:hf-to-gguf:blk.18.attn_k.weight,        torch.float32 --> F16, shape = {4096, 1024}
INFO:hf-to-gguf:blk.18.attn_output.weight,   torch.float32 --> F16, shape = {4096, 4096}
INFO:hf-to-gguf:blk.18.attn_q.weight,        torch.float32 --> F16, shape = {4096, 4096}
INFO:hf-to-gguf:blk.18.attn_v.weight,        torch.float32 --> F16, shape = {4096, 1024}
INFO:hf-to-gguf:blk.19.attn_norm.weight,     torch.float32 --> F32, shape = {4096}
INFO:hf-to-gguf:blk.19.ffn_down.weight,      torch.float32 --> F16, shape = {14336, 4096}
INFO:hf-to-gguf:blk.19.ffn_gate.weight,      torch.float32 --> F16, shape = {4096, 14336}
INFO:hf-to-gguf:blk.19.ffn_up.weight,        torch.float32 --> F16, shape = {4096, 14336}
INFO:hf-to-gguf:blk.19.ffn_norm.weight,      torch.float32 --> F32, shape = {4096}
INFO:hf-to-gguf:blk.19.attn_k.weight,        torch.float32 --> F16, shape = {4096, 1024}
INFO:hf-to-gguf:blk.19.attn_output.weight,   torch.float32 --> F16, shape = {4096, 4096}
INFO:hf-to-gguf:blk.19.attn_q.weight,        torch.float32 --> F16, shape = {4096, 4096}
INFO:hf-to-gguf:blk.19.attn_v.weight,        torch.float32 --> F16, shape = {4096, 1024}
INFO:hf-to-gguf:blk.20.attn_k.weight,        torch.float32 --> F16, shape = {4096, 1024}
INFO:hf-to-gguf:blk.20.attn_output.weight,   torch.float32 --> F16, shape = {4096, 4096}
INFO:hf-to-gguf:blk.20.attn_q.weight,        torch.float32 --> F16, shape = {4096, 4096}
INFO:hf-to-gguf:blk.20.attn_v.weight,        torch.float32 --> F16, shape = {4096, 1024}
INFO:hf-to-gguf:gguf: loading model part 'model-00005-of-00007.safetensors'
INFO:hf-to-gguf:blk.20.attn_norm.weight,     torch.float32 --> F32, shape = {4096}
INFO:hf-to-gguf:blk.20.ffn_down.weight,      torch.float32 --> F16, shape = {14336, 4096}
INFO:hf-to-gguf:blk.20.ffn_gate.weight,      torch.float32 --> F16, shape = {4096, 14336}
INFO:hf-to-gguf:blk.20.ffn_up.weight,        torch.float32 --> F16, shape = {4096, 14336}
INFO:hf-to-gguf:blk.20.ffn_norm.weight,      torch.float32 --> F32, shape = {4096}
INFO:hf-to-gguf:blk.21.attn_norm.weight,     torch.float32 --> F32, shape = {4096}
INFO:hf-to-gguf:blk.21.ffn_down.weight,      torch.float32 --> F16, shape = {14336, 4096}
INFO:hf-to-gguf:blk.21.ffn_gate.weight,      torch.float32 --> F16, shape = {4096, 14336}
INFO:hf-to-gguf:blk.21.ffn_up.weight,        torch.float32 --> F16, shape = {4096, 14336}
INFO:hf-to-gguf:blk.21.ffn_norm.weight,      torch.float32 --> F32, shape = {4096}
INFO:hf-to-gguf:blk.21.attn_k.weight,        torch.float32 --> F16, shape = {4096, 1024}
INFO:hf-to-gguf:blk.21.attn_output.weight,   torch.float32 --> F16, shape = {4096, 4096}
INFO:hf-to-gguf:blk.21.attn_q.weight,        torch.float32 --> F16, shape = {4096, 4096}
INFO:hf-to-gguf:blk.21.attn_v.weight,        torch.float32 --> F16, shape = {4096, 1024}
INFO:hf-to-gguf:blk.22.attn_norm.weight,     torch.float32 --> F32, shape = {4096}
INFO:hf-to-gguf:blk.22.ffn_down.weight,      torch.float32 --> F16, shape = {14336, 4096}
INFO:hf-to-gguf:blk.22.ffn_gate.weight,      torch.float32 --> F16, shape = {4096, 14336}
INFO:hf-to-gguf:blk.22.ffn_up.weight,        torch.float32 --> F16, shape = {4096, 14336}
INFO:hf-to-gguf:blk.22.ffn_norm.weight,      torch.float32 --> F32, shape = {4096}
INFO:hf-to-gguf:blk.22.attn_k.weight,        torch.float32 --> F16, shape = {4096, 1024}
INFO:hf-to-gguf:blk.22.attn_output.weight,   torch.float32 --> F16, shape = {4096, 4096}
INFO:hf-to-gguf:blk.22.attn_q.weight,        torch.float32 --> F16, shape = {4096, 4096}
INFO:hf-to-gguf:blk.22.attn_v.weight,        torch.float32 --> F16, shape = {4096, 1024}
INFO:hf-to-gguf:blk.23.attn_norm.weight,     torch.float32 --> F32, shape = {4096}
INFO:hf-to-gguf:blk.23.ffn_down.weight,      torch.float32 --> F16, shape = {14336, 4096}
INFO:hf-to-gguf:blk.23.ffn_gate.weight,      torch.float32 --> F16, shape = {4096, 14336}
INFO:hf-to-gguf:blk.23.ffn_up.weight,        torch.float32 --> F16, shape = {4096, 14336}
INFO:hf-to-gguf:blk.23.ffn_norm.weight,      torch.float32 --> F32, shape = {4096}
INFO:hf-to-gguf:blk.23.attn_k.weight,        torch.float32 --> F16, shape = {4096, 1024}
INFO:hf-to-gguf:blk.23.attn_output.weight,   torch.float32 --> F16, shape = {4096, 4096}
INFO:hf-to-gguf:blk.23.attn_q.weight,        torch.float32 --> F16, shape = {4096, 4096}
INFO:hf-to-gguf:blk.23.attn_v.weight,        torch.float32 --> F16, shape = {4096, 1024}
INFO:hf-to-gguf:blk.24.attn_norm.weight,     torch.float32 --> F32, shape = {4096}
INFO:hf-to-gguf:blk.24.ffn_down.weight,      torch.float32 --> F16, shape = {14336, 4096}
INFO:hf-to-gguf:blk.24.ffn_gate.weight,      torch.float32 --> F16, shape = {4096, 14336}
INFO:hf-to-gguf:blk.24.ffn_up.weight,        torch.float32 --> F16, shape = {4096, 14336}
INFO:hf-to-gguf:blk.24.ffn_norm.weight,      torch.float32 --> F32, shape = {4096}
INFO:hf-to-gguf:blk.24.attn_k.weight,        torch.float32 --> F16, shape = {4096, 1024}
INFO:hf-to-gguf:blk.24.attn_output.weight,   torch.float32 --> F16, shape = {4096, 4096}
INFO:hf-to-gguf:blk.24.attn_q.weight,        torch.float32 --> F16, shape = {4096, 4096}
INFO:hf-to-gguf:blk.24.attn_v.weight,        torch.float32 --> F16, shape = {4096, 1024}
INFO:hf-to-gguf:blk.25.ffn_gate.weight,      torch.float32 --> F16, shape = {4096, 14336}
INFO:hf-to-gguf:blk.25.ffn_up.weight,        torch.float32 --> F16, shape = {4096, 14336}
INFO:hf-to-gguf:blk.25.attn_k.weight,        torch.float32 --> F16, shape = {4096, 1024}
INFO:hf-to-gguf:blk.25.attn_output.weight,   torch.float32 --> F16, shape = {4096, 4096}
INFO:hf-to-gguf:blk.25.attn_q.weight,        torch.float32 --> F16, shape = {4096, 4096}
INFO:hf-to-gguf:blk.25.attn_v.weight,        torch.float32 --> F16, shape = {4096, 1024}
INFO:hf-to-gguf:gguf: loading model part 'model-00006-of-00007.safetensors'
INFO:hf-to-gguf:blk.25.attn_norm.weight,     torch.float32 --> F32, shape = {4096}
INFO:hf-to-gguf:blk.25.ffn_down.weight,      torch.float32 --> F16, shape = {14336, 4096}
INFO:hf-to-gguf:blk.25.ffn_norm.weight,      torch.float32 --> F32, shape = {4096}
INFO:hf-to-gguf:blk.26.attn_norm.weight,     torch.float32 --> F32, shape = {4096}
INFO:hf-to-gguf:blk.26.ffn_down.weight,      torch.float32 --> F16, shape = {14336, 4096}
INFO:hf-to-gguf:blk.26.ffn_gate.weight,      torch.float32 --> F16, shape = {4096, 14336}
INFO:hf-to-gguf:blk.26.ffn_up.weight,        torch.float32 --> F16, shape = {4096, 14336}
INFO:hf-to-gguf:blk.26.ffn_norm.weight,      torch.float32 --> F32, shape = {4096}
INFO:hf-to-gguf:blk.26.attn_k.weight,        torch.float32 --> F16, shape = {4096, 1024}
INFO:hf-to-gguf:blk.26.attn_output.weight,   torch.float32 --> F16, shape = {4096, 4096}
INFO:hf-to-gguf:blk.26.attn_q.weight,        torch.float32 --> F16, shape = {4096, 4096}
INFO:hf-to-gguf:blk.26.attn_v.weight,        torch.float32 --> F16, shape = {4096, 1024}
INFO:hf-to-gguf:blk.27.attn_norm.weight,     torch.float32 --> F32, shape = {4096}
INFO:hf-to-gguf:blk.27.ffn_down.weight,      torch.float32 --> F16, shape = {14336, 4096}
INFO:hf-to-gguf:blk.27.ffn_gate.weight,      torch.float32 --> F16, shape = {4096, 14336}
INFO:hf-to-gguf:blk.27.ffn_up.weight,        torch.float32 --> F16, shape = {4096, 14336}
INFO:hf-to-gguf:blk.27.ffn_norm.weight,      torch.float32 --> F32, shape = {4096}
INFO:hf-to-gguf:blk.27.attn_k.weight,        torch.float32 --> F16, shape = {4096, 1024}
INFO:hf-to-gguf:blk.27.attn_output.weight,   torch.float32 --> F16, shape = {4096, 4096}
INFO:hf-to-gguf:blk.27.attn_q.weight,        torch.float32 --> F16, shape = {4096, 4096}
INFO:hf-to-gguf:blk.27.attn_v.weight,        torch.float32 --> F16, shape = {4096, 1024}
INFO:hf-to-gguf:blk.28.attn_norm.weight,     torch.float32 --> F32, shape = {4096}
INFO:hf-to-gguf:blk.28.ffn_down.weight,      torch.float32 --> F16, shape = {14336, 4096}
INFO:hf-to-gguf:blk.28.ffn_gate.weight,      torch.float32 --> F16, shape = {4096, 14336}
INFO:hf-to-gguf:blk.28.ffn_up.weight,        torch.float32 --> F16, shape = {4096, 14336}
INFO:hf-to-gguf:blk.28.ffn_norm.weight,      torch.float32 --> F32, shape = {4096}
INFO:hf-to-gguf:blk.28.attn_k.weight,        torch.float32 --> F16, shape = {4096, 1024}
INFO:hf-to-gguf:blk.28.attn_output.weight,   torch.float32 --> F16, shape = {4096, 4096}
INFO:hf-to-gguf:blk.28.attn_q.weight,        torch.float32 --> F16, shape = {4096, 4096}
INFO:hf-to-gguf:blk.28.attn_v.weight,        torch.float32 --> F16, shape = {4096, 1024}
INFO:hf-to-gguf:blk.29.attn_norm.weight,     torch.float32 --> F32, shape = {4096}
INFO:hf-to-gguf:blk.29.ffn_down.weight,      torch.float32 --> F16, shape = {14336, 4096}
INFO:hf-to-gguf:blk.29.ffn_gate.weight,      torch.float32 --> F16, shape = {4096, 14336}
INFO:hf-to-gguf:blk.29.ffn_up.weight,        torch.float32 --> F16, shape = {4096, 14336}
INFO:hf-to-gguf:blk.29.ffn_norm.weight,      torch.float32 --> F32, shape = {4096}
INFO:hf-to-gguf:blk.29.attn_k.weight,        torch.float32 --> F16, shape = {4096, 1024}
INFO:hf-to-gguf:blk.29.attn_output.weight,   torch.float32 --> F16, shape = {4096, 4096}
INFO:hf-to-gguf:blk.29.attn_q.weight,        torch.float32 --> F16, shape = {4096, 4096}
INFO:hf-to-gguf:blk.29.attn_v.weight,        torch.float32 --> F16, shape = {4096, 1024}
INFO:hf-to-gguf:blk.30.attn_norm.weight,     torch.float32 --> F32, shape = {4096}
INFO:hf-to-gguf:blk.30.ffn_down.weight,      torch.float32 --> F16, shape = {14336, 4096}
INFO:hf-to-gguf:blk.30.ffn_gate.weight,      torch.float32 --> F16, shape = {4096, 14336}
INFO:hf-to-gguf:blk.30.ffn_up.weight,        torch.float32 --> F16, shape = {4096, 14336}
INFO:hf-to-gguf:blk.30.ffn_norm.weight,      torch.float32 --> F32, shape = {4096}
INFO:hf-to-gguf:blk.30.attn_k.weight,        torch.float32 --> F16, shape = {4096, 1024}
INFO:hf-to-gguf:blk.30.attn_output.weight,   torch.float32 --> F16, shape = {4096, 4096}
INFO:hf-to-gguf:blk.30.attn_q.weight,        torch.float32 --> F16, shape = {4096, 4096}
INFO:hf-to-gguf:blk.30.attn_v.weight,        torch.float32 --> F16, shape = {4096, 1024}
INFO:hf-to-gguf:blk.31.ffn_gate.weight,      torch.float32 --> F16, shape = {4096, 14336}
INFO:hf-to-gguf:blk.31.attn_k.weight,        torch.float32 --> F16, shape = {4096, 1024}
INFO:hf-to-gguf:blk.31.attn_output.weight,   torch.float32 --> F16, shape = {4096, 4096}
INFO:hf-to-gguf:blk.31.attn_q.weight,        torch.float32 --> F16, shape = {4096, 4096}
INFO:hf-to-gguf:blk.31.attn_v.weight,        torch.float32 --> F16, shape = {4096, 1024}
INFO:hf-to-gguf:gguf: loading model part 'model-00007-of-00007.safetensors'
INFO:hf-to-gguf:output.weight,               torch.float32 --> F16, shape = {4096, 128256}
INFO:hf-to-gguf:blk.31.attn_norm.weight,     torch.float32 --> F32, shape = {4096}
INFO:hf-to-gguf:blk.31.ffn_down.weight,      torch.float32 --> F16, shape = {14336, 4096}
INFO:hf-to-gguf:blk.31.ffn_up.weight,        torch.float32 --> F16, shape = {4096, 14336}
INFO:hf-to-gguf:blk.31.ffn_norm.weight,      torch.float32 --> F32, shape = {4096}
INFO:hf-to-gguf:output_norm.weight,          torch.float32 --> F32, shape = {4096}
INFO:hf-to-gguf:Set meta model
INFO:hf-to-gguf:Set model parameters
INFO:hf-to-gguf:gguf: context length = 8192
INFO:hf-to-gguf:gguf: embedding length = 4096
INFO:hf-to-gguf:gguf: feed forward length = 14336
INFO:hf-to-gguf:gguf: head count = 32
INFO:hf-to-gguf:gguf: key-value head count = 8
INFO:hf-to-gguf:gguf: rope theta = 500000.0
INFO:hf-to-gguf:gguf: rms norm epsilon = 1e-05
INFO:hf-to-gguf:gguf: file type = 1
INFO:hf-to-gguf:Set model tokenizer
The repository for /mradermacher/tmp/quant/Bio-Medical-MultiModal-Llama-3-8B-V1/model contains custom code which must be executed to correctly load the model. You can inspect the repository content at https://hf.co//mradermacher/tmp/quant/Bio-Medical-MultiModal-Llama-3-8B-V1/model.
You can avoid this prompt in future by passing the argument `trust_remote_code=True`.

Do you wish to run the custom code? [y/N] y
INFO:gguf.vocab:Adding 280147 merge(s).
INFO:gguf.vocab:Setting special token type bos to 128000
INFO:gguf.vocab:Setting special token type eos to 128001
INFO:gguf.vocab:Setting special token type unk to 128002
INFO:gguf.vocab:Setting special token type pad to 0
INFO:gguf.vocab:Setting chat_template to {% set loop_messages = messages %}{% for message in loop_messages %}{% set content = '<|start_header_id|>' + message['role'] + '<|end_header_id|>

'+ message['content'] | trim + '<|eot_id|>' %}{% if loop.index0 == 0 %}{% set content = bos_token + content %}{% endif %}{{ content }}{% endfor %}{{ '<|start_header_id|>assistant<|end_header_id|>

' }}
INFO:hf-to-gguf:Set model quantization version
INFO:gguf.gguf_writer:Writing the following files:
INFO:gguf.gguf_writer:/mradermacher/tmp/quant/Bio-Medical-MultiModal-Llama-3-8B-V1.gguf: n_tensors = 291, total_size = 16.1G
Writing: 100%|β–ˆβ–ˆβ–ˆβ–ˆβ–ˆβ–ˆβ–ˆβ–ˆβ–ˆβ–ˆβ–ˆβ–ˆβ–ˆβ–ˆβ–ˆβ–ˆβ–ˆβ–ˆβ–ˆβ–ˆβ–ˆβ–ˆβ–ˆβ–ˆβ–ˆβ–ˆβ–ˆβ–ˆβ–ˆβ–ˆβ–ˆβ–ˆβ–ˆβ–ˆβ–ˆβ–ˆβ–ˆβ–ˆβ–ˆβ–ˆβ–ˆβ–ˆβ–ˆβ–ˆβ–ˆβ–ˆβ–ˆβ–ˆβ–ˆβ–ˆβ–ˆβ–ˆβ–ˆβ–ˆβ–ˆβ–ˆβ–ˆβ–ˆβ–ˆβ–ˆβ–ˆβ–ˆβ–ˆβ–ˆβ–ˆβ–ˆβ–ˆβ–ˆβ–ˆβ–ˆβ–ˆβ–ˆβ–ˆβ–ˆβ–ˆβ–ˆβ–ˆβ–ˆβ–ˆβ–ˆβ–ˆβ–ˆβ–ˆβ–ˆβ–ˆβ–ˆβ–ˆβ–ˆβ–ˆβ–ˆβ–ˆβ–ˆβ–ˆβ–ˆβ–ˆβ–ˆβ–ˆβ–ˆβ–ˆβ–ˆβ–ˆβ–ˆβ–ˆβ–ˆβ–ˆβ–ˆβ–ˆβ–ˆβ–ˆβ–ˆβ–ˆβ–ˆβ–ˆβ–ˆβ–ˆβ–ˆβ–ˆβ–ˆβ–ˆβ–ˆβ–ˆβ–ˆβ–ˆβ–ˆβ–ˆβ–ˆβ–ˆβ–ˆβ–ˆβ–ˆβ–ˆβ–ˆβ–ˆβ–ˆβ–ˆβ–ˆ| 16.1G/16.1G [00:52<00:00, 303Mbyte/s]
INFO:hf-to-gguf:Model successfully exported to /mradermacher/tmp/quant/Bio-Medical-MultiModal-Llama-3-8B-V1.gguf

It's working!

root@AI:~/snow/llama.cpp/build/bin# ./llama-cli -m /mradermacher/tmp/quant/Bio-Medical-MultiModal-Llama-3-8B-V1.gguf -p "I believe the meaning of life is" -n 128
ggml_cuda_init: failed to initialize CUDA: no CUDA-capable device is detected
build: 4288 (43ed389a) with cc (Debian 12.2.0-14) 12.2.0 for x86_64-linux-gnu
main: llama backend init
main: load the model and apply lora adapter, if any
llama_model_loader: loaded meta data with 28 key-value pairs and 291 tensors from /mradermacher/tmp/quant/Bio-Medical-MultiModal-Llama-3-8B-V1.gguf (version GGUF V3 (latest))
llama_model_loader: Dumping metadata keys/values. Note: KV overrides do not apply in this output.
llama_model_loader: - kv   0:                       general.architecture str              = llama
llama_model_loader: - kv   1:                               general.type str              = model
llama_model_loader: - kv   2:                               general.name str              = Model
llama_model_loader: - kv   3:                         general.size_label str              = 8.0B
llama_model_loader: - kv   4:                          llama.block_count u32              = 32
llama_model_loader: - kv   5:                       llama.context_length u32              = 8192
llama_model_loader: - kv   6:                     llama.embedding_length u32              = 4096
llama_model_loader: - kv   7:                  llama.feed_forward_length u32              = 14336
llama_model_loader: - kv   8:                 llama.attention.head_count u32              = 32
llama_model_loader: - kv   9:              llama.attention.head_count_kv u32              = 8
llama_model_loader: - kv  10:                       llama.rope.freq_base f32              = 500000.000000
llama_model_loader: - kv  11:     llama.attention.layer_norm_rms_epsilon f32              = 0.000010
llama_model_loader: - kv  12:                 llama.attention.key_length u32              = 128
llama_model_loader: - kv  13:               llama.attention.value_length u32              = 128
llama_model_loader: - kv  14:                          general.file_type u32              = 1
llama_model_loader: - kv  15:                           llama.vocab_size u32              = 128256
llama_model_loader: - kv  16:                 llama.rope.dimension_count u32              = 128
llama_model_loader: - kv  17:                       tokenizer.ggml.model str              = gpt2
llama_model_loader: - kv  18:                         tokenizer.ggml.pre str              = llama-bpe
llama_model_loader: - kv  19:                      tokenizer.ggml.tokens arr[str,128256]  = ["!", "\"", "#", "$", "%", "&", "'", ...
llama_model_loader: - kv  20:                  tokenizer.ggml.token_type arr[i32,128256]  = [3, 1, 1, 1, 1, 1, 1, 1, 1, 1, 1, 1, ...
llama_model_loader: - kv  21:                      tokenizer.ggml.merges arr[str,280147]  = ["Ġ Ġ", "Ġ ĠĠĠ", "ĠĠ ĠĠ", "...
llama_model_loader: - kv  22:                tokenizer.ggml.bos_token_id u32              = 128000
llama_model_loader: - kv  23:                tokenizer.ggml.eos_token_id u32              = 128001
llama_model_loader: - kv  24:            tokenizer.ggml.unknown_token_id u32              = 128002
llama_model_loader: - kv  25:            tokenizer.ggml.padding_token_id u32              = 0
llama_model_loader: - kv  26:                    tokenizer.chat_template str              = {% set loop_messages = messages %}{% ...
llama_model_loader: - kv  27:               general.quantization_version u32              = 2
llama_model_loader: - type  f32:   65 tensors
llama_model_loader: - type  f16:  226 tensors
llm_load_vocab: special_eos_id is not in special_eog_ids - the tokenizer config may be incorrect
llm_load_vocab: special tokens cache size = 257
llm_load_vocab: token to piece cache size = 0.7997 MB
llm_load_print_meta: format           = GGUF V3 (latest)
llm_load_print_meta: arch             = llama
llm_load_print_meta: vocab type       = BPE
llm_load_print_meta: n_vocab          = 128256
llm_load_print_meta: n_merges         = 280147
llm_load_print_meta: vocab_only       = 0
llm_load_print_meta: n_ctx_train      = 8192
llm_load_print_meta: n_embd           = 4096
llm_load_print_meta: n_layer          = 32
llm_load_print_meta: n_head           = 32
llm_load_print_meta: n_head_kv        = 8
llm_load_print_meta: n_rot            = 128
llm_load_print_meta: n_swa            = 0
llm_load_print_meta: n_embd_head_k    = 128
llm_load_print_meta: n_embd_head_v    = 128
llm_load_print_meta: n_gqa            = 4
llm_load_print_meta: n_embd_k_gqa     = 1024
llm_load_print_meta: n_embd_v_gqa     = 1024
llm_load_print_meta: f_norm_eps       = 0.0e+00
llm_load_print_meta: f_norm_rms_eps   = 1.0e-05
llm_load_print_meta: f_clamp_kqv      = 0.0e+00
llm_load_print_meta: f_max_alibi_bias = 0.0e+00
llm_load_print_meta: f_logit_scale    = 0.0e+00
llm_load_print_meta: n_ff             = 14336
llm_load_print_meta: n_expert         = 0
llm_load_print_meta: n_expert_used    = 0
llm_load_print_meta: causal attn      = 1
llm_load_print_meta: pooling type     = 0
llm_load_print_meta: rope type        = 0
llm_load_print_meta: rope scaling     = linear
llm_load_print_meta: freq_base_train  = 500000.0
llm_load_print_meta: freq_scale_train = 1
llm_load_print_meta: n_ctx_orig_yarn  = 8192
llm_load_print_meta: rope_finetuned   = unknown
llm_load_print_meta: ssm_d_conv       = 0
llm_load_print_meta: ssm_d_inner      = 0
llm_load_print_meta: ssm_d_state      = 0
llm_load_print_meta: ssm_dt_rank      = 0
llm_load_print_meta: ssm_dt_b_c_rms   = 0
llm_load_print_meta: model type       = 8B
llm_load_print_meta: model ftype      = F16
llm_load_print_meta: model params     = 8.03 B
llm_load_print_meta: model size       = 14.96 GiB (16.00 BPW)
llm_load_print_meta: general.name     = Model
llm_load_print_meta: BOS token        = 128000 '<|begin_of_text|>'
llm_load_print_meta: EOS token        = 128001 '<|end_of_text|>'
llm_load_print_meta: EOT token        = 128009 '<|eot_id|>'
llm_load_print_meta: UNK token        = 128002 '<unk>'
llm_load_print_meta: PAD token        = 0 '!'
llm_load_print_meta: LF token         = 128 'Ä'
llm_load_print_meta: EOG token        = 128001 '<|end_of_text|>'
llm_load_print_meta: EOG token        = 128009 '<|eot_id|>'
llm_load_print_meta: max token length = 256
llm_load_tensors:   CPU_Mapped model buffer size = 15317.02 MiB
.........................................................................................
llama_new_context_with_model: n_seq_max     = 1
llama_new_context_with_model: n_ctx         = 4096
llama_new_context_with_model: n_ctx_per_seq = 4096
llama_new_context_with_model: n_batch       = 2048
llama_new_context_with_model: n_ubatch      = 512
llama_new_context_with_model: flash_attn    = 0
llama_new_context_with_model: freq_base     = 500000.0
llama_new_context_with_model: freq_scale    = 1
llama_new_context_with_model: n_ctx_per_seq (4096) < n_ctx_train (8192) -- the full capacity of the model will not be utilized
llama_kv_cache_init:        CPU KV buffer size =   512.00 MiB
llama_new_context_with_model: KV self size  =  512.00 MiB, K (f16):  256.00 MiB, V (f16):  256.00 MiB
llama_new_context_with_model:        CPU  output buffer size =     0.49 MiB
llama_new_context_with_model:        CPU compute buffer size =   296.01 MiB
llama_new_context_with_model: graph nodes  = 1030
llama_new_context_with_model: graph splits = 1
common_init_from_params: warming up the model with an empty run - please wait ... (--no-warmup to disable)
main: llama threadpool init, n_threads = 32

system_info: n_threads = 32 (n_threads_batch = 32) / 64 | CUDA : ARCHS = 860 | USE_GRAPHS = 1 | PEER_MAX_BATCH_SIZE = 128 | CPU : SSE3 = 1 | SSSE3 = 1 | AVX = 1 | AVX2 = 1 | F16C = 1 | FMA = 1 | AVX512 = 1 | AVX512_VBMI = 1 | AVX512_VNNI = 1 | AVX512_BF16 = 1 | LLAMAFILE = 1 | OPENMP = 1 | AARCH64_REPACK = 1 |

sampler seed: 251197925
sampler params:
        repeat_last_n = 64, repeat_penalty = 1.000, frequency_penalty = 0.000, presence_penalty = 0.000
        dry_multiplier = 0.000, dry_base = 1.750, dry_allowed_length = 2, dry_penalty_last_n = -1
        top_k = 40, top_p = 0.950, min_p = 0.050, xtc_probability = 0.000, xtc_threshold = 0.100, typical_p = 1.000, temp = 0.800
        mirostat = 0, mirostat_lr = 0.100, mirostat_ent = 5.000
sampler chain: logits -> logit-bias -> penalties -> dry -> top-k -> typical -> top-p -> min-p -> xtc -> temp-ext -> dist
generate: n_ctx = 4096, n_batch = 2048, n_predict = 128, n_keep = 1

I believe the meaning of life is to help others and make a difference in the world. One way to do this is by volunteering and giving back to our community.
Volunteering allows individuals to contribute their time, skills, and resources towards a greater good. It provides an opportunity to make a positive impact on society, which in turn, can improve the lives of others.
By volunteering, we can help those in need, such as the elderly, the homeless, and the sick. We can also work towards environmental conservation and animal welfare, among other causes.
In addition to volunteering, we can also make a difference by being kind and compassionate towards others. This includes showing empathy

llama_perf_sampler_print:    sampling time =      10.08 ms /   136 runs   (    0.07 ms per token, 13493.40 tokens per second)
llama_perf_context_print:        load time =    2711.82 ms
llama_perf_context_print: prompt eval time =     613.26 ms /     8 tokens (   76.66 ms per token,    13.04 tokens per second)
llama_perf_context_print:        eval time =   16488.17 ms /   127 runs   (  129.83 ms per token,     7.70 tokens per second)
llama_perf_context_print:       total time =   17133.27 ms /   135 tokens

@mradermacher You can quant this model now. The GGUF containing the text generation part is located under /mradermacher/tmp/quant/Bio-Medical-MultiModal-Llama-3-8B-V1.gguf. The model is tested and works with llama.cpp. No idea what we do with the vision part. I guess we can just ignore it or upload it as is.

It was a good idea to queue it on nico1 specifically :-) Ok, trying to sort it out... the mmproj file would be the vision part, not the language model.

Big thanks for that as the model is gated.

Uh, while I wrote my entry, things have moved fast. Yeah, I would wish that convert-hf-to-gguf would have some kind of interface where you could just set an env variable rather than having to patch it.

That would likely be better, but patching it seems fine as well for the few times we need to manually add an unsupported pre-tokenizer.
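For what it's worth, a purely hypothetical sketch of what such an escape hatch could look like; the environment variable name is invented and convert_hf_to_gguf.py does not currently offer anything like it:

```python
import os
from typing import Optional

# Hypothetical override: consult an (invented) environment variable before giving up
# on an unrecognized tokenizer checksum, instead of requiring a source patch.
def pre_tokenizer_with_override(chkhsh: str, detected: Optional[str]) -> str:
    if detected is not None:
        return detected
    forced = os.environ.get("GGUF_FORCE_PRETOKENIZER")  # invented name, not a real llama.cpp variable
    if forced:
        print(f"WARNING: unrecognized chkhsh {chkhsh}, forcing pre-tokenizer {forced!r}")
        return forced
    raise NotImplementedError("BPE pre-tokenizer was not recognized - update get_vocab_base_pre()")
```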

The main problem with just forcing it is that different hash supposedly means different pretokenizer behaviour, so while e.g. llama-3 might be close enough, it's probably not fully correct. Not that I would have a problem with that - slightly off but working is better than not working.

As long as it is just a different version of the same pre-tokenizer, it shouldn't really matter. The changes they usually make are so minimal they can be ignored. They often won't even affect the llama.cpp implementation; the llama.cpp developers just bump the hash to be compatible with the latest version, completely ignoring compatibility with models finetuned on a previous version.

@mradermacher You can quant this model now.

It's on its way :) Well done!

No idea what we do with the vision part

Yeah, I will upload it. These files are usually provided separately, typically in the upstream model, but since we don't have that here, we should just upload it ourselves. But to which repository, and how do we tell the user? Sigh.

Big thanks for that as the model is gated.

Fascinating (it's of course not listed in my gated repo list).

Yeah, I will upload it.

Or rather, I will kindly ask you to give it to me first :)

You could upload it yourself using huggingface-cli, btw.; an API token should be available without any special consideration. I would upload it to the non-imatrix repo.

Or rather, I will kindly ask you to give it to me first :)

Oh sorry, I accidentally moved it to my working directory when I renamed it. I moved it back to /tmp/quant/Bio-Medical-MultiModal-Llama-3-8B-V1_mmproj.gguf.

You could upload it yourself using huggingface-cli, btw.; an API token should be available without any special consideration. I would upload it to the non-imatrix repo.

You better upload it yourself, as I'm not exactly sure where you want it uploaded and whether it needs to be quantized.

You better upload it yourself

Sure, that was the plan, but it's not wrong to show you the options :)

I will upload it to the non-imatrix repo simply as model.mmproj.gguf, as if it were a quant of sorts. There isn't an established standard for these files, afaik, and this way they stay together with the quants. Not sure what my model page does when confronted with this :)

For the record, it said something like "old unsupported quant" :)
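For anyone following along, this is roughly what such an upload looks like via the huggingface_hub Python API; the repo id and target file name below are assumptions following the usual naming scheme, not something confirmed in this thread, and a write-capable token needs to be configured beforehand:

```python
from huggingface_hub import HfApi

# Sketch only: repo_id and path_in_repo are assumed, not confirmed in this thread.
api = HfApi()
api.upload_file(
    path_or_fileobj="/tmp/quant/Bio-Medical-MultiModal-Llama-3-8B-V1_mmproj.gguf",
    path_in_repo="Bio-Medical-MultiModal-Llama-3-8B-V1.mmproj.gguf",   # "model.mmproj.gguf" naming
    repo_id="mradermacher/Bio-Medical-MultiModal-Llama-3-8B-V1-GGUF",  # assumed static (non-imatrix) repo
    repo_type="model",
)
```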

eleius changed discussion status to closed
