Minimal inference code

by thomaskalnik - opened Mar 26

Mar 26

Hi, can you provide a minimal example for running inference? I keep running into tensor length mismatch errors, and I haven't found an inference example in any of the GemMoE repos. Thanks

Crystalcareai

Owner Mar 26

Can you share the error you're receiving? But yes, I can update the readme.

Crystalcareai

Owner Mar 27

edited Mar 27

Sorry, keyboard just spazzed on me. Check the readme for some code, and I'll update with some proper examples once I can spin up an instance.

Crystalcareai changed discussion status to closed Mar 27

Crystalcareai changed discussion status to open Mar 27

thomaskalnik

Mar 27

Thanks, that worked for me. The issue was that I was not passing attn_implementation="flash_attention_2" when loading the model.
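For anyone landing here with the same mismatch errors, the fix described above can be sketched roughly as follows. This is a minimal sketch, not the code from the readme: the helper names build_load_kwargs and generate are made up for illustration, the model id is left for the caller to supply, and trust_remote_code=True, bfloat16, and device_map="auto" (which needs the accelerate package) are assumptions rather than confirmed requirements. The only setting taken from this thread is attn_implementation="flash_attention_2", which also requires the flash-attn package and a supported GPU.

```python
def build_load_kwargs(use_flash_attention: bool = True) -> dict:
    """Keyword arguments to pass to AutoModelForCausalLM.from_pretrained."""
    kwargs = {
        "device_map": "auto",       # assumption: accelerate is installed
        "trust_remote_code": True,  # assumption: the repo ships custom model code
    }
    if use_flash_attention:
        # The setting this thread identifies as required for GemMoE.
        kwargs["attn_implementation"] = "flash_attention_2"
    return kwargs


def generate(model_id: str, prompt: str, max_new_tokens: int = 64) -> str:
    """Load the model with flash attention and return the decoded completion."""
    # Heavy imports are kept local so the helper above works without them.
    import torch
    from transformers import AutoModelForCausalLM, AutoTokenizer

    tokenizer = AutoTokenizer.from_pretrained(model_id, trust_remote_code=True)
    model = AutoModelForCausalLM.from_pretrained(
        model_id,
        torch_dtype=torch.bfloat16,  # flash attention needs fp16 or bf16 weights
        **build_load_kwargs(),
    )

    # Tokenize the prompt and move the tensors to the model's device.
    inputs = tokenizer(prompt, return_tensors="pt").to(model.device)
    with torch.no_grad():
        output_ids = model.generate(**inputs, max_new_tokens=max_new_tokens)
    return tokenizer.decode(output_ids[0], skip_special_tokens=True)
```

Calling generate with the id of the GemMoE checkpoint you are using and a short prompt should return the prompt followed by the completion. If loading fails, check first that flash-attn is installed for your CUDA and PyTorch versions.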

Crystalcareai

Owner Mar 27

Yeah, only flash-attn is supported atm. tbh I'm getting pretty disillusioned with Gemma as a whole - and will likely move my future MoE experiments to Yi or Mistral. Gemma is too unpredictable, even with bug fixes.

thomaskalnik

Mar 27

Interesting, that is good context to have. I'll keep up to date on your work, thanks again.

Crystalcareai changed discussion status to closed Mar 27
