GGUF format

#1
by aari1995 - opened

Thank you for the great model Malte!

@TheBloke I think it would be much appreciated to have this on your page as well :-) It is a mix of the German Leo LMs and OpenHermes and seems to perform really well!

Thanks! I haven't done any quantization myself yet but I'll have a look into it.

There is already an AWQ quantized version: https://huggingface.co/mayflowergmbh/hermeo-7b-awq

Thank you very much! I am actually working on another quantization technique, targeted solely at German, that boosts the model's German capabilities and replies. I think it works really well so far and has a lot of potential, but it is a work in progress and will likely be updated next week with some additions.

https://huggingface.co/aari1995/germeo-7b-awq

Also, at the moment I am sadly having trouble evaluating the model on the German benchmarks, as the evaluation does not really support AWQ. If you have an idea, let me know.

Open for feedback!

What exactly is the problem? The latest transformers version does support AWQ, right? Feel free to reach out to me. I am happy to help.
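For reference, here is a minimal sketch of loading an AWQ checkpoint through transformers. It assumes a recent transformers release with the `autoawq` and `accelerate` packages installed and a CUDA GPU available; `load_awq_model` is just a hypothetical helper name for illustration, and the repo ID is the one linked above.

```python
def load_awq_model(repo_id: str = "aari1995/germeo-7b-awq"):
    """Load an AWQ-quantized checkpoint and its tokenizer (sketch).

    Assumptions: transformers with AWQ support, autoawq and accelerate
    are installed, and a CUDA device is available.
    """
    # Imported lazily so that defining the helper does not require a GPU setup.
    from transformers import AutoModelForCausalLM, AutoTokenizer

    tokenizer = AutoTokenizer.from_pretrained(repo_id)
    # The quantization settings are read from the checkpoint's config,
    # so no extra quantization arguments are passed here.
    model = AutoModelForCausalLM.from_pretrained(repo_id, device_map="auto")
    return tokenizer, model
```

Calling `tokenizer, model = load_awq_model()` should then give you a model object that an evaluation script can use like any other causal LM, provided the script itself accepts a preloaded or quantized model.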

Yes, I also figured that out and it works now, thank you very much!
At the moment I need to find time to do the MMLU eval, as it takes 26 hours on my 3090 Ti.
So far the benchmarks look good: the scores are slightly worse, but the model's output is guaranteed to be German:

ARC-DE: 0.514
Hellaswag-DE: 0.651
TruthfulQA-DE: 0.508

I'll keep you updated.

https://huggingface.co/aari1995/germeo-7b-awq

Evaluation done: MMLU is 0.522 (an improvement), resulting in an average of 0.563 (DE-Average). I think it is a good use case for knowledge transfer from English to German while "keeping the model German": it replies solely in German. @floleuerer created a benchmark for German response rates, and we are in contact to see if there is an improvement.
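As a quick sanity check on the average, here is a small sketch that takes the unweighted mean of the four scores quoted in this thread. That DE-Average is a plain mean of these four tasks is an assumption, not something confirmed here. With the preliminary per-task numbers posted earlier it comes out at roughly 0.549 rather than 0.563, so the reported average may rest on updated per-task scores or a different aggregation.

```python
# Scores as quoted in this thread; the three *-DE task scores were posted
# as preliminary ("so far") results before the MMLU run finished.
scores = {
    "ARC-DE": 0.514,
    "Hellaswag-DE": 0.651,
    "TruthfulQA-DE": 0.508,
    "MMLU": 0.522,
}

# Assumption: the DE-Average is an unweighted mean over the four tasks.
de_average = sum(scores.values()) / len(scores)

print(f"Unweighted mean of quoted scores: {de_average:.3f}")
```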

Malte, would you be up for further experiments on knowledge transfer, or for a call? I am also experimenting with laser and want to see whether a non-bilingual model can achieve improvements with quantization/pruning methods.
