Downscaling the `W_q` and `W_k` matrices for repeated layers in franken-merges
14
#4 opened 2 months ago by jukofyork
Guidance on GPU VRAM Split?
5
#3 opened 5 months ago by nmitchko
Performance
13
#2 opened 5 months ago by KnutJaegersberg