	---
	title: README
	emoji: 🦀
	colorFrom: blue
	colorTo: blue
	sdk: static
	pinned: false
	---
	## Models and data for the research project *Autonomous Evaluation and Refinement of Digital Agents*

	### [Paper](https://arxiv.org/abs/2404.06474) \| [Code](https://github.com/Berkeley-NLP/Agent-Eval-Refine)


	We design and use model-based evaluators to both evaluate and autonomously refine the performance of digital agents. Experiments show that domain-general automated evaluators can significantly improve the performance of digital agents, without any extra supervision.


	[Jiayi Pan](https://www.jiayipan.me/), [Yichi Zhang](https://sled.eecs.umich.edu/author/yichi-zhang/), [Nicholas Tomlin](https://people.eecs.berkeley.edu/~nicholas_tomlin/), [Yifei Zhou](https://yifeizhou02.github.io/), [Sergey Levine](https://people.eecs.berkeley.edu/~svlevine/), [Alane Suhr](https://www.alanesuhr.com/)

	UC Berkeley, University of Michigan