How many A800 80G cards are needed for inference at the maximum sequence length (32768)?

by nogggg - opened

The Model Memory Requirements section says float16/bfloat16 requires 133.58 GB, but the tool for loading model weights reports 67.3324 GB. 2 A800s do not support the maximum sequence length (32768). To keep the maximum length, I think 4 A800s may be needed. What should we do?
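For anyone doing the same arithmetic, here is a rough sketch of how the card count could be estimated from weight size plus KV cache size. The layer count, KV head count, and head dimension below are hypothetical placeholders, not this model's real config, and the 90% usable-memory fraction is also an assumption. It ignores activations and framework overhead, and treats GB and GiB loosely, so read the result as a lower bound.

```python
import math


def kv_cache_gib(num_layers, num_kv_heads, head_dim, seq_len,
                 batch_size=1, bytes_per_elem=2):
    """Approximate KV cache size in GiB for one forward context.

    Each layer stores two tensors (keys and values), each of shape
    [batch_size, num_kv_heads, seq_len, head_dim].
    bytes_per_elem=2 corresponds to float16/bfloat16.
    """
    total_bytes = (2 * num_layers * num_kv_heads * head_dim
                   * seq_len * batch_size * bytes_per_elem)
    return total_bytes / 1024**3


def gpus_needed(weights_gib, cache_gib, gpu_gib=80.0, usable_fraction=0.9):
    """Smallest number of cards whose usable memory covers weights + KV cache.

    usable_fraction is an assumed safety margin, not a measured value.
    """
    return math.ceil((weights_gib + cache_gib) / (gpu_gib * usable_fraction))


# Hypothetical architecture values: replace with the numbers from the
# model's config.json before drawing any conclusion.
cache = kv_cache_gib(num_layers=80, num_kv_heads=64, head_dim=128,
                     seq_len=32768)

# The two weight sizes quoted in this post.
for weights in (133.58, 67.3324):
    print(f"weights={weights} -> cache={cache:.1f} GiB, "
          f"cards={gpus_needed(weights, cache)}")
```

If the model uses grouped-query attention, `num_kv_heads` is smaller than the number of attention heads and the cache shrinks proportionally, which can change the card count.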
