Dushwe's Collections
MiniGPT-v2: large language model as a unified interface for vision-language multi-task learning
Paper
•
2310.09478
•
Published
•
19
Can GPT models be Financial Analysts? An Evaluation of ChatGPT and GPT-4 on mock CFA Exams
Paper
•
2310.08678
•
Published
•
12
Llama 2: Open Foundation and Fine-Tuned Chat Models
Paper
•
2307.09288
•
Published
•
242
LLaMA: Open and Efficient Foundation Language Models
Paper
•
2302.13971
•
Published
•
13
FlashAttention: Fast and Memory-Efficient Exact Attention with IO-Awareness
Paper
•
2205.14135
•
Published
•
11
Baichuan 2: Open Large-scale Language Models
Paper
•
2309.10305
•
Published
•
19
Paper
•
2309.16609
•
Published
•
34
Code Llama: Open Foundation Models for Code
Paper
•
2308.12950
•
Published
•
22
Tuna: Instruction Tuning using Feedback from Large Language Models
Paper
•
2310.13385
•
Published
•
10
Monolingual or Multilingual Instruction Tuning: Which Makes a Better Alpaca
Paper
•
2309.08958
•
Published
•
2
Reflection-Tuning: Data Recycling Improves LLM Instruction-Tuning
Paper
•
2310.11716
•
Published
•
5
AlpaGasus: Training A Better Alpaca with Fewer Data
Paper
•
2307.08701
•
Published
•
22
Paper
•
2309.03450
•
Published
•
8
LLaVA-Interactive: An All-in-One Demo for Image Chat, Segmentation, Generation and Editing
Paper
•
2311.00571
•
Published
•
40
LLaVA-Plus: Learning to Use Tools for Creating Multimodal Agents
Paper
•
2311.05437
•
Published
•
48
u-LLaVA: Unifying Multi-Modal Tasks via Large Language Model
Paper
•
2311.05348
•
Published
•
11
To See is to Believe: Prompting GPT-4V for Better Visual Instruction Tuning
Paper
•
2311.07574
•
Published
•
14
Trusted Source Alignment in Large Language Models
Paper
•
2311.06697
•
Published
•
10
Gemini vs GPT-4V: A Preliminary Comparison and Combination of Vision-Language Models Through Qualitative Cases
Paper
•
2312.15011
•
Published
•
15
SOLAR 10.7B: Scaling Large Language Models with Simple yet Effective Depth Up-Scaling
Paper
•
2312.15166
•
Published
•
56
GPT-4V(ision) is a Generalist Web Agent, if Grounded
Paper
•
2401.01614
•
Published
•
21
Unified-IO 2: Scaling Autoregressive Multimodal Models with Vision, Language, Audio, and Action
Paper
•
2312.17172
•
Published
•
26
Paper
•
2401.04088
•
Published
•
159
MoE-Mamba: Efficient Selective State Space Models with Mixture of Experts
Paper
•
2401.04081
•
Published
•
71
SeeClick: Harnessing GUI Grounding for Advanced Visual GUI Agents
Paper
•
2401.10935
•
Published
•
4
Mobile-Agent: Autonomous Multi-Modal Mobile Device Agent with Visual Perception
Paper
•
2401.16158
•
Published
•
18
MoE-LLaVA: Mixture of Experts for Large Vision-Language Models
Paper
•
2401.15947
•
Published
•
49
How to Train Data-Efficient LLMs
Paper
•
2402.09668
•
Published
•
40
Video ReCap: Recursive Captioning of Hour-Long Videos
Paper
•
2402.13250
•
Published
•
25