If the m3 and m3 reranker models are deployed on the same GPU service, will it improve GPU efficiency under concurrent requests?

#65
by seetimee - opened

What's the performance difference between deploying one model per machine vs. two models per machine?
