stereoplegic's Collections
Byte-level
ByT5: Towards a token-free future with pre-trained byte-to-byte models
Paper • 2105.13626
Beyond Language Models: Byte Models are Digital World Simulators
Paper • 2402.19155
MEGABYTE: Predicting Million-byte Sequences with Multiscale Transformers
Paper • 2305.07185
Byte-Level Recursive Convolutional Auto-Encoder for Text
Paper • 1802.01817
Glyph-ByT5: A Customized Text Encoder for Accurate Visual Text Rendering
Paper • 2403.09622
Bytes are All You Need: End-to-End Multilingual Speech Recognition and Synthesis with Bytes
Paper • 1811.09021
Neural Machine Translation with Byte-Level Subwords
Paper • 1909.03341
Neural Machine Translation without Embeddings
Paper • 2008.09396
ÚFAL at MultiLexNorm 2021: Improving Multilingual Lexical Normalization by Fine-tuning ByT5
Paper • 2110.15248
MonoByte: A Pool of Monolingual Byte-level Language Models
Paper • 2209.11035
Are Character-level Translations Worth the Wait? Comparing Character- and Subword-level Models for Machine Translation
Paper • 2302.14220
Bilingual End-to-End ASR with Byte-Level Subwords
Paper • 2205.00485
MambaByte: Token-free Selective State Space Model
Paper • 2401.13660
CANINE: Pre-training an Efficient Tokenization-Free Encoder for Language Representation
Paper • 2103.06874
SpaceByte: Towards Deleting Tokenization from Large Language Modeling
Paper • 2404.14408
Integrating Multi-scale Contextualized Information for Byte-based Neural Machine Translation
Paper • 2405.19290
Word-Level Representation From Bytes For Language Modeling
Paper • 2211.12677
byteSteady: Fast Classification Using Byte-Level n-Gram Embeddings
Paper • 2106.13302