giant-oak/lsg-roberta-base-4096 (Fill-Mask)
Various efficient-attention, encoder-style architectures distilled into student models with half the hidden layers, plus a long-context NER dataset.
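As a minimal sketch, the fill-mask checkpoint above can be queried with the `transformers` pipeline. The `trust_remote_code=True` flag is an assumption based on the usual convention for LSG models, which ship their sparse-attention implementation as custom modeling code; the example sentence is purely illustrative.

```python
from transformers import pipeline

# Load the long-context LSG checkpoint for masked-token prediction.
fill_mask = pipeline(
    "fill-mask",
    model="giant-oak/lsg-roberta-base-4096",
    trust_remote_code=True,  # assumed: LSG attention is custom modeling code
)

# RoBERTa-style tokenizers use <mask> as the mask token.
print(fill_mask("The capital of France is <mask>."))
```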