csabakecskemeti posted an update 2 days ago
-UPDATED-
4-bit inference is working! The blogpost has been updated with a code snippet and requirements.txt.
https://devquasar.com/uncategorized/all-about-amd-and-rocm/
-UPDATED-
I've played around with an MI100 and ROCm and collected my experience in a blogpost:
https://devquasar.com/uncategorized/all-about-amd-and-rocm/
Unfortunately, I could not get inference or training to work with the model loaded in 8-bit, or get BnB working, but I did everything else and documented my findings.
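For reference, 4-bit loading in this stack is typically done through the standard transformers + bitsandbytes API, roughly as sketched below. This is a generic illustration, not the snippet from the blogpost: the model name is an example, and it assumes a ROCm-compatible bitsandbytes build is installed.

```python
import torch
from transformers import AutoModelForCausalLM, AutoTokenizer, BitsAndBytesConfig

# Standard QLoRA-style 4-bit quantization settings (NF4)
bnb_config = BitsAndBytesConfig(
    load_in_4bit=True,
    bnb_4bit_quant_type="nf4",
    bnb_4bit_compute_dtype=torch.bfloat16,
)

model_id = "meta-llama/Meta-Llama-3-8B"  # example model, not from the post
tokenizer = AutoTokenizer.from_pretrained(model_id)
model = AutoModelForCausalLM.from_pretrained(
    model_id,
    quantization_config=bnb_config,
    device_map="auto",  # place layers on the available GPU (e.g. the MI100)
)

# Quick smoke test of generation
inputs = tokenizer("Hello from ROCm!", return_tensors="pt").to(model.device)
out = model.generate(**inputs, max_new_tokens=32)
print(tokenizer.decode(out[0], skip_special_tokens=True))
```

The same `BitsAndBytesConfig` is what a QLoRA fine-tune would pass when loading the base model, so getting this path working is the prerequisite for both inference and training in 4-bit.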

How many tokens per second do you receive? I didn't see this on the blog.


Good callout, I'll add it this evening.
Llama 3 8B q8 was around 80 t/s generation.

QLoRA model loaded in 4-bit

[Attachment: Screenshot From 2025-03-03 17-08-25.png]