---
title: Dadc
emoji: 🏢
colorFrom: red
colorTo: gray
sdk: gradio
sdk_version: 3.0.17
app_file: app.py
pinned: false
license: bigscience-bloom-rail-1.0
---

A basic example of dynamic adversarial data collection with a Gradio app.

*Instructions for using this in your own project:*

**Setting up the Space**

1. Clone this repo and deploy it on your own Hugging Face Space.
2. Add one of your Hugging Face tokens to the secrets for your Space, with the
   name `HF_TOKEN`. The token needs write access, since the app pushes to your
   dataset. Then create an empty Hugging Face dataset on the Hub and put its URL
   in the secrets for your Space, with the name `DATASET_REPO_URL`. The dataset
   can be private or public. When you run this Space on MTurk, and when people
   visit your Space on huggingface.co, the app uses your token to automatically
   store new HITs in your dataset. NOTE: if you push something to your dataset
   manually, you need to restart your Space; otherwise it could hit merge
   conflicts when trying to push HIT data.
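
Spaces expose secrets to the app as environment variables. As a minimal sketch of how an app could read the two secrets from step 2 and fail early when one is missing (the helper name `load_space_secrets` and the `SpaceSecrets` type are hypothetical and are not taken from this repo's `app.py`):

```python
import os
from typing import Mapping, NamedTuple


class SpaceSecrets(NamedTuple):
    """Hypothetical container for the two secrets described above."""
    hf_token: str
    dataset_repo_url: str


def load_space_secrets(env: Mapping[str, str] = os.environ) -> SpaceSecrets:
    """Read HF_TOKEN and DATASET_REPO_URL, raising if either is unset or empty."""
    required = ("HF_TOKEN", "DATASET_REPO_URL")
    missing = [name for name in required if not env.get(name)]
    if missing:
        raise RuntimeError("Missing Space secret(s): " + ", ".join(missing))
    return SpaceSecrets(env["HF_TOKEN"], env["DATASET_REPO_URL"])
```

Calling `load_space_secrets()` at startup makes a forgotten secret show up as a clear error in the Space log, rather than as a failed push later on.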

**Running Data Collection**

1. In your local clone of the repo, copy `config.py.example` to `config.py`,
   then put the keys from your AWS account in `config.py`. These keys should be
   for an AWS account that has the AmazonMechanicalTurkFullAccess permission.
   You also need to create an MTurk requester account associated with your AWS
   account.
2. Run `python collect.py` locally.
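
To illustrate how the keys in `config.py` might be turned into an MTurk client, here is a sketch using `boto3`. The attribute names `AWS_ACCESS_KEY_ID` and `AWS_SECRET_ACCESS_KEY` are assumptions, so check `config.py.example` for the names this repo actually uses. The sandbox default is also an assumption, chosen so that a test run does not spend real money; it is not necessarily what `collect.py` does.

```python
from types import SimpleNamespace

# MTurk is only served from us-east-1; these are the requester API endpoints.
SANDBOX_ENDPOINT = "https://mturk-requester-sandbox.us-east-1.amazonaws.com"
LIVE_ENDPOINT = "https://mturk-requester.us-east-1.amazonaws.com"


def mturk_client_kwargs(config, sandbox=True):
    """Build keyword arguments for boto3.client("mturk", ...)."""
    return {
        # Hypothetical attribute names; match them to your config.py.
        "aws_access_key_id": config.AWS_ACCESS_KEY_ID,
        "aws_secret_access_key": config.AWS_SECRET_ACCESS_KEY,
        "region_name": "us-east-1",
        "endpoint_url": SANDBOX_ENDPOINT if sandbox else LIVE_ENDPOINT,
    }


def make_mturk_client(config, sandbox=True):
    """Create the client; boto3 is imported lazily so the helper above has no dependencies."""
    import boto3

    return boto3.client("mturk", **mturk_client_kwargs(config, sandbox))
```

For example, `make_mturk_client(config).get_account_balance()` is a cheap way to confirm that the keys and the requester account are linked before posting any HITs.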

**Profit**

You should now see HITs coming into your Hugging Face dataset automatically!

**Tips and Tricks**

- If you are developing and running this Space locally to test it out, try
  deleting the data directory that the app clones before running the app again.
  Otherwise, the app could hit merge conflicts when storing new HITs on the Hub.
  When you redeploy your app on Hugging Face Spaces, the data directory is
  deleted automatically.
- Hugging Face Spaces have limited computational resources and memory. If you
  run too many HITs and/or assignments at once, you could encounter issues. You
  could also encounter issues if you are trying to create a very large dataset.
  Check the log of your Space for any errors.
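
The first tip can be automated when testing locally. This is a small sketch, assuming the app clones the dataset into a directory named `data` next to `app.py`; that name is a guess, so pass the real path if yours differs. The function name `remove_stale_clone` is hypothetical.

```python
import shutil
from pathlib import Path


def remove_stale_clone(data_dir="data"):
    """Delete a leftover local clone of the dataset; return True if one was removed."""
    path = Path(data_dir)
    if path.is_dir():
        shutil.rmtree(path)  # removes the directory and everything inside it
        return True
    return False
```

Run it before starting the app locally, for example with `python -c "from cleanup import remove_stale_clone; remove_stale_clone()"` if you save the function in a file called `cleanup.py`.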