How is inference with this version of grok-1?


First off, thank you hpcai-tech for taking the time to convert this model to PyTorch. I hope this helps the open community. Python + PyTorch does seem to be the main setup used for most open-source LLMs.

I don't have access to anything that can run it. As an interested bystander, I'm curious whether anyone with enough hardware and budget has managed to run this model successfully in PyTorch. (It's a very large model.)
I'm mostly curious about what is similar and what is different compared to other open-source MoEs, or compared to the original Rust + JAX release, or to other base models.
Is it coherent, for starters? How efficient is the setup with PyTorch (how many tokens per second)? A rough way to measure this is sketched below.
Were there any hardware surprises or special accommodations needed when trying to run it?
If it seems to be running OK, can known evals be run against it to see how it fares against other baseline models? (See the eval sketch after these questions.)
What other bumps or surprises came up along the way that are unique to this model compared to others?
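
In case anyone wants to share throughput numbers, here's a minimal sketch of how tokens/sec could be measured. It assumes the checkpoint loads through transformers' AutoModelForCausalLM with trust_remote_code=True and shards across GPUs via device_map="auto"; I can't verify this myself, so treat the loading flags as assumptions:

```python
# Minimal throughput check -- loading flags are assumptions on my part.
import time
import torch
from transformers import AutoModelForCausalLM, AutoTokenizer

MODEL_ID = "hpcai-tech/grok-1"  # assuming this repo loads with remote code

tokenizer = AutoTokenizer.from_pretrained(MODEL_ID, trust_remote_code=True)
model = AutoModelForCausalLM.from_pretrained(
    MODEL_ID,
    trust_remote_code=True,
    torch_dtype=torch.bfloat16,
    device_map="auto",  # shard across whatever GPUs are available
)

prompt = "The capital of France is"
inputs = tokenizer(prompt, return_tensors="pt").to(model.device)

start = time.time()
outputs = model.generate(**inputs, max_new_tokens=128, do_sample=False)
elapsed = time.time() - start

# Count only newly generated tokens, not the prompt.
new_tokens = outputs.shape[-1] - inputs["input_ids"].shape[-1]
print(tokenizer.decode(outputs[0], skip_special_tokens=True))
print(f"~{new_tokens / elapsed:.2f} tokens/sec")
```

Something like this would at least make tokens/sec numbers comparable across setups.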
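And for the evals question, EleutherAI's lm-evaluation-harness would be the usual route. A sketch via its Python API, with the same caveat that the trust_remote_code loading (and the harness version) are assumptions, not something I've confirmed for this checkpoint:

```python
# Sketch only -- assumes lm-evaluation-harness (v0.4+) and that the repo
# loads as a standard HF causal LM with trust_remote_code.
import lm_eval

results = lm_eval.simple_evaluate(
    model="hf",
    model_args="pretrained=hpcai-tech/grok-1,trust_remote_code=True,dtype=bfloat16",
    tasks=["hellaswag", "arc_challenge"],
    batch_size=1,
)
print(results["results"])
```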

I understand this could be asking a lot, and it's OK if questions like these can't be answered anytime soon.

Thanks for taking the time to read this.
