Is it possible that Claude/ChatGPT use similar techniques to adjust the RoPE sampling rate?
#4 opened over 1 year ago by Yhyu13 (1 comment)
PPL chart for 16k models?
#3 opened over 1 year ago by Yhyu13
7B, 33B and 65B versions?
#2 opened over 1 year ago by flashvenom (3 comments)
Difference between this and 8k version?
#1 opened over 1 year ago by flashvenom (10 comments)