Yiyang Nan

nanyy1025

AI & ML interests

None yet

Recent Activity

liked a dataset about 1 month ago
C4AI-Community/multilingual-reward-bench
liked a dataset about 1 month ago
CohereForAI/include-base-44
liked a dataset about 1 month ago
CohereForAI/Global-MMLU

Organizations

Bats Research, C4AI Community

nanyy1025's activity

reacted to Taylor658's post with 🚀❤️🔥👀 3 months ago
Spent the weekend testing out some prompts with 🕵️‍♂️Mystery Bot🕵️‍♂️ on my mobile... exciting things are coming soon for the following languages:

🌐Arabic, Chinese, Czech, Dutch, English, French, German, Greek, Hebrew, Hindi, Indonesian, Italian, Japanese, Korean, Persian, Polish, Portuguese, Romanian, Russian, Spanish, Turkish, Ukrainian, and Vietnamese!🌐
upvoted an article 6 months ago

How NuminaMath Won the 1st AIMO Progress Prize

liked a Space 7 months ago
reacted to davanstrien's post with ❤️ 7 months ago
Several methods/models have recently been shared to generate synthetic data from minimal or no initial seeds, essentially creating data directly from raw text.

IMO, these approaches that rely on smaller models for synthetic data generation are quite valuable for scaling up synthetic data and democratizing access to creating domain-specific synthetic datasets.

I've compiled a collection of Gradio demos showcasing some of these methods here: davanstrien/synthetic-data-generation-demos-667573f248b97360ff3668a5
reacted to macadeliccc's post with ❤️ 8 months ago
Create synthetic instruction datasets using open-source LLMs and Bonito🐟!

With Bonito, you can generate synthetic datasets for a wide variety of supported tasks.

The Bonito model introduces a novel approach for conditional task generation, transforming unannotated text into task-specific training datasets to facilitate zero-shot adaptation of large language models on specialized data.

This methodology not only improves the adaptability of LLMs to new domains but also showcases the effectiveness of synthetic instruction tuning datasets in achieving substantial performance gains.

AutoBonito🐟: https://colab.research.google.com/drive/1l9zh_VX0X4ylbzpGckCjH5yEflFsLW04?usp=sharing
Original Repo: https://github.com/BatsResearch/bonito?tab=readme-ov-file
Paper: Learning to Generate Instruction Tuning Datasets for Zero-Shot Task Adaptation (2402.18334)