Research to the People

non-profit

https://www.researchtothepeople.org/

https://github.com/orgs/researchtothepeople/

AI & ML interests

Cancer and Rare Disease.

RTTP's activity

alexshengzhili

authored a paper 3 months ago

Abstract2Appendix: Academic Reviews Enhance LLM Long-Context Capabilities

Paper • 2411.05232 • Published Nov 7, 2024

alexshengzhili

posted an update 3 months ago

We’re excited to release Abstract2Appendix v1 10K, a high-quality dataset crafted to enhance the long-context capabilities of Large Language Models (LLMs). This dataset combines thousands of peer reviews from NeurIPS 2023, EMNLP 2023, TMLR, and ICLR 2023, making it a treasure trove of detailed feedback, critical reasoning, and structured academic insights. In our experiments, training on this dataset improved the long-context ability of Phi-3 models!

🌟 Key Highlights:

• Expert Reviews: Aggregated from 3–6 reviews per paper, capturing the most insightful and constructive content.
• Rich Metadata: Each entry includes the aggregated reviews along with the full parsed paper.
• LLM Ready: Well suited to fine-tuning (we ran both DPO and SFT).

🎯 Use Cases:

• Fine-tuning models with Direct Preference Optimization (DPO) and Supervised Fine-Tuning (SFT).
• Benchmarking zero-shot and long-context comprehension capabilities.

🔗 Explore the dataset: alexshengzhili/Abstract2Appendix_v1_10k

This dataset is based on the methodology described in our recent paper, “Abstract2Appendix: Academic Reviews Enhance LLM Long-Context Capabilities”. Check it out for more details! https://arxiv.org/abs/2411.05232
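
For readers unfamiliar with the DPO objective named under Use Cases, here is a minimal sketch of the standard per-example DPO loss. It is not taken from our training code: the function name `dpo_loss`, the default `beta=0.1`, and the assumption that each input is a summed sequence log-probability are illustrative choices only.

```python
import math


def dpo_loss(policy_chosen_logp, policy_rejected_logp,
             ref_chosen_logp, ref_rejected_logp, beta=0.1):
    """Per-example DPO loss (illustrative sketch, hypothetical helper).

    Each argument is assumed to be the summed log-probability of a full
    response under either the policy being trained or the frozen reference.
    """
    # How much more the policy likes each response than the reference does.
    chosen_margin = policy_chosen_logp - ref_chosen_logp
    rejected_margin = policy_rejected_logp - ref_rejected_logp

    # Scaled preference logit: positive when the policy favors the chosen response.
    z = beta * (chosen_margin - rejected_margin)

    # loss = -log(sigmoid(z)) = log(1 + exp(-z)), written to avoid overflow.
    if z >= 0:
        return math.log1p(math.exp(-z))
    return -z + math.log1p(math.exp(z))


if __name__ == "__main__":
    # Policy identical to the reference: the loss equals log(2).
    print(dpo_loss(-1.0, -1.0, -1.0, -1.0))
    # Policy shifted toward the chosen response: the loss drops below log(2).
    print(dpo_loss(-1.0, -3.0, -2.0, -2.0))
```

The loss falls as the policy raises the chosen response's likelihood relative to the rejected one, measured against the reference model; `beta` controls how strongly that gap is rewarded.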

hayuh

updated a Space 8 months ago

EDS Chatbot

💬

dcgenomics

updated a model 8 months ago

RTTP/test_v001

Updated Jun 18, 2024

dontjandra

updated a model 8 months ago

RTTP/AptimizeAI-Mistral-7b-HPP1

Text Generation • Updated Jun 16, 2024 • 11

alexshengzhili

posted an update 11 months ago

After the Supervised Fine-Tuning (SFT) phase, we observed a notable degradation in the instruction-following capabilities of the LLaVA Multi-Modal Large Language Model (MM-LLM). To address this, we introduced a 6K-entry VQA preference dataset and applied Direct Preference Optimization (DPO), while also testing other algorithms such as Rejection Sampling and SteerLM. Our method not only fully restored LLaVA's instruction-following capabilities on MT-Bench but also outperformed LLaVA-RLHF and Vicuna. The gains carried over to VQA tasks, with significant performance improvements on MM-Vet and LLaVA-Bench. Interestingly, compared to models using distilled SFT, our method showed substantial out-of-distribution improvements.

https://arxiv.org/abs/2402.10884
Model available: alexshengzhili/llava-v1.5-13b-dpo
GitHub: https://github.com/findalexli/mllm-dpo

alexshengzhili

authored 2 papers 12 months ago

Multi-modal preference alignment remedies regression of visual instruction tuning on language model

Paper • 2402.10884 • Published Feb 16, 2024

SciGraphQA: A Large-Scale Synthetic Multi-Turn Question-Answering Dataset for Scientific Graphs

Paper • 2308.03349 • Published Aug 7, 2023

Team members 15
