alpindale/Mistral-7B-Instruct-v0.2-AQLM-2Bit-1x16
alpindale committed on Mar 14
Commit 86407f7 • 1 Parent(s): 9fcfbc0
Create README.md
Files changed (1): README.md (+2, -0)

README.md ADDED
@@ -0,0 +1,2 @@
+Took 42 hours to quantize on 4x A40s, at a batch size of 128. I could've gone higher, but hindsight.
+At that batch size, it was using about 25-30 GiB per GPU, and utilization remained at 100%.
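For anyone wanting to use the result of this quantization run: a minimal loading sketch, assuming a recent `transformers` with AQLM support plus the `aqlm` package installed (`pip install aqlm[gpu]`); the helper function and its defaults here are illustrative, not part of this repo.

```python
# Sketch: load this AQLM 2-bit quantized checkpoint with transformers.
# Assumes: transformers >= 4.38 (AQLM integration) and the `aqlm` package.
MODEL_ID = "alpindale/Mistral-7B-Instruct-v0.2-AQLM-2Bit-1x16"

def load_model(model_id: str = MODEL_ID):
    """Return (tokenizer, model) for the quantized checkpoint.

    Imports are deferred so merely defining this helper does not
    require transformers to be installed.
    """
    from transformers import AutoModelForCausalLM, AutoTokenizer

    tokenizer = AutoTokenizer.from_pretrained(model_id)
    # device_map="auto" spreads the (already small, ~2-bit) weights
    # across available GPUs; torch_dtype="auto" keeps the dtype the
    # checkpoint was saved with.
    model = AutoModelForCausalLM.from_pretrained(
        model_id,
        torch_dtype="auto",
        device_map="auto",
    )
    return tokenizer, model
```

Since the weights are AQLM-quantized, no extra quantization config is passed at load time; the quantization metadata ships inside the checkpoint itself.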