-
Latent Positional Information is in the Self-Attention Variance of Transformer Language Models Without Positional Embeddings
Paper • 2305.13571 • Published • 2 -
BERTs are Generative In-Context Learners
Paper • 2406.04823 • Published • 1 -
Smarter, Better, Faster, Longer: A Modern Bidirectional Encoder for Fast, Memory Efficient, and Long Context Finetuning and Inference
Paper • 2412.13663 • Published • 82
Collections including paper arxiv:2412.13663
-
Gecko: Versatile Text Embeddings Distilled from Large Language Models
Paper • 2403.20327 • Published • 47 -
2D Matryoshka Sentence Embeddings
Paper • 2402.14776 • Published • 6 -
Smarter, Better, Faster, Longer: A Modern Bidirectional Encoder for Fast, Memory Efficient, and Long Context Finetuning and Inference
Paper • 2412.13663 • Published • 82
-
LLM-Assisted Content Analysis: Using Large Language Models to Support Deductive Coding
Paper • 2306.14924 • Published • 2 -
When LLMs are Unfit Use FastFit: Fast and Effective Text Classification with Many Classes
Paper • 2404.12365 • Published • 1 -
In-context Vectors: Making In Context Learning More Effective and Controllable Through Latent Space Steering
Paper • 2311.06668 • Published • 5 -
Wave Network: An Ultra-Small Language Model
Paper • 2411.02674 • Published • 3
-
MM1: Methods, Analysis & Insights from Multimodal LLM Pre-training
Paper • 2403.09611 • Published • 124 -
Evolutionary Optimization of Model Merging Recipes
Paper • 2403.13187 • Published • 50 -
MobileVLM V2: Faster and Stronger Baseline for Vision Language Model
Paper • 2402.03766 • Published • 12 -
LLM Agent Operating System
Paper • 2403.16971 • Published • 65
-
Functional Interpolation for Relative Positions Improves Long Context Transformers
Paper • 2310.04418 • Published • 4 -
SPBERT: An Efficient Pre-training BERT on SPARQL Queries for Question Answering over Knowledge Graphs
Paper • 2106.09997 • Published • 2 -
Neural Machine Translation of Rare Words with Subword Units
Paper • 1508.07909 • Published • 4 -
A Multimodal Approach to Device-Directed Speech Detection with Large Language Models
Paper • 2403.14438 • Published • 2
-
BioBERT: a pre-trained biomedical language representation model for biomedical text mining
Paper • 1901.08746 • Published • 3 -
Pretraining-Based Natural Language Generation for Text Summarization
Paper • 1902.09243 • Published • 2 -
RoBERTa: A Robustly Optimized BERT Pretraining Approach
Paper • 1907.11692 • Published • 7 -
DeBERTa: Decoding-enhanced BERT with Disentangled Attention
Paper • 2006.03654 • Published • 3
-
BERT: Pre-training of Deep Bidirectional Transformers for Language Understanding
Paper • 1810.04805 • Published • 15 -
Transformers Can Achieve Length Generalization But Not Robustly
Paper • 2402.09371 • Published • 13 -
Triple-Encoders: Representations That Fire Together, Wire Together
Paper • 2402.12332 • Published • 2 -
BERTs are Generative In-Context Learners
Paper • 2406.04823 • Published • 1
-
LoRA+: Efficient Low Rank Adaptation of Large Models
Paper • 2402.12354 • Published • 6 -
The FinBen: An Holistic Financial Benchmark for Large Language Models
Paper • 2402.12659 • Published • 17 -
TofuEval: Evaluating Hallucinations of LLMs on Topic-Focused Dialogue Summarization
Paper • 2402.13249 • Published • 11 -
TrustLLM: Trustworthiness in Large Language Models
Paper • 2401.05561 • Published • 66
-
DocGraphLM: Documental Graph Language Model for Information Extraction
Paper • 2401.02823 • Published • 35 -
Understanding LLMs: A Comprehensive Overview from Training to Inference
Paper • 2401.02038 • Published • 62 -
DocLLM: A layout-aware generative language model for multimodal document understanding
Paper • 2401.00908 • Published • 181 -
Attention Where It Matters: Rethinking Visual Document Understanding with Selective Region Concentration
Paper • 2309.01131 • Published • 1
-
A Picture is Worth More Than 77 Text Tokens: Evaluating CLIP-Style Models on Dense Captions
Paper • 2312.08578 • Published • 16 -
ZeroQuant(4+2): Redefining LLMs Quantization with a New FP6-Centric Strategy for Diverse Generative Tasks
Paper • 2312.08583 • Published • 9 -
Vision-Language Models as a Source of Rewards
Paper • 2312.09187 • Published • 11 -
StemGen: A music generation model that listens
Paper • 2312.08723 • Published • 47