GGUFs were built against a wrong configuration. Context should be 8K

#2 by Nazosan - opened

The original configuration was supposed to specify an 8K context, but at the time these quants were built it said 32K. In particular, this produces a bad RoPE frequency base when automatic detection is used. For example, at 16K context (which should be 2x the trained context, so most frontends would set a base of 32768.0) the base stays at 10000. Admittedly, this model is a bit of a mess in how it handles RoPE, and most automatically chosen values are wrong anyway. (At 16K, somewhere around 21082.7 is best, varying a little with hardware; on my AMD configuration 21650 came out best.)
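To make the failure mode concrete, here is a rough sketch of how context-based RoPE auto-scaling goes wrong when the metadata claims 32K instead of 8K. The function and exponent are illustrative only (the exponent is back-solved so that a 2x context reproduces the 32768.0 figure mentioned above); real frontends each use their own rule.

```python
# A rough sketch of context-based RoPE auto-scaling (assumption: the exact
# rule varies by frontend; the exponent here is back-solved so that a 2x
# context gives the 32768.0 base mentioned above).
def auto_rope_freq_base(requested_ctx: int, trained_ctx: int,
                        base: float = 10000.0) -> float:
    scale = requested_ctx / trained_ctx
    if scale <= 1.0:
        return base  # at or below the trained context: no scaling applied
    return base * scale ** 1.7123  # illustrative power law, ~32768 at 2x

# Wrong metadata (trained_ctx=32768): a 16K request looks like 0.5x,
# so no scaling happens and the base stays at 10000.
print(auto_rope_freq_base(16384, 32768))  # 10000.0
# Corrected metadata (trained_ctx=8192): 16K is 2x, so the base is raised.
print(auto_rope_freq_base(16384, 8192))   # ~32768.0
```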

Ideally the quants should be rebuilt; the original configuration was corrected some time ago. If anyone still uses v2 of Beyonder (I noticed v3 has very different base models, so people might prefer one or the other), the ideal frequency bases are probably:

- 10000 at 8K and below
- 15469.8 at 12K
- 21082.7 at 16K
- 32614.5 at 24K
- 44448.0 at 32K

Probably. (I doubt this one really scales to 32K all that well, but those are the numbers if one wants to try.) Use a scale of 1.0 at each.
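For what it's worth, the bases listed above fit a simple power law almost exactly: base ≈ 10000 · (ctx / 8192)^1.076. A small sketch follows, with the exponent fitted to the numbers in this post rather than taken from any official formula:

```python
# Sketch only: the exponent k is fitted to the values listed above,
# not taken from any frontend or paper.
def beyonder_v2_rope_base(target_ctx: int, trained_ctx: int = 8192,
                          base: float = 10000.0, k: float = 1.076) -> float:
    scale = max(1.0, target_ctx / trained_ctx)
    return base * scale ** k

for ctx in (8192, 12288, 16384, 24576, 32768):
    print(ctx, round(beyonder_v2_rope_base(ctx), 1))
# -> 10000.0, 15469.5, 21081.7, 32612.5, 44444.1 (vs. the values above)
```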
