### Dataset
To download the full dataset, run:
```python
# download the full dataset
from huggingface_hub import snapshot_download
snapshot_download(repo_id="osv5m/osv5m", local_dir="datasets/osv5m", repo_type='dataset')
```
Then extract the downloaded archives:
```python
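import os
import zipfile

# Walk the download directory and extract every .zip archive in place
for root, dirs, files in os.walk("datasets/osv5m"):
    for file in files:
        if file.endswith(".zip"):
            with zipfile.ZipFile(os.path.join(root, file), 'r') as zip_ref:
                zip_ref.extractall(root)
            # Delete the archive once its contents have been extracted
            os.remove(os.path.join(root, file))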
```
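If you want the extraction step to stop on a corrupted download instead of deleting a partially extracted archive, the loop above can be wrapped in a function that checks each archive first. The following is a minimal sketch using only the standard library; `extract_archives` is a hypothetical helper name, not part of the OSV-5M tooling, and it assumes the same layout as the loop above (each archive is extracted into its own directory).

```python
import os
import zipfile


def extract_archives(root):
    """Hypothetical helper: extract every .zip under `root` next to itself,
    verify it first, and delete it only after a successful extraction.

    Returns the list of archive paths that were extracted.
    """
    extracted = []
    for dirpath, _dirnames, filenames in os.walk(root):
        for name in filenames:
            if not name.endswith(".zip"):
                continue
            path = os.path.join(dirpath, name)
            with zipfile.ZipFile(path, "r") as zf:
                # testzip() returns the name of the first corrupt member, or None
                bad = zf.testzip()
                if bad is not None:
                    raise zipfile.BadZipFile(f"{path}: corrupt member {bad}")
                zf.extractall(dirpath)
            # The archive is closed here, so it is safe to delete it
            os.remove(path)
            extracted.append(path)
    return extracted
```

For example, `extract_archives("datasets/osv5m")` returns the paths of the archives it unpacked. Since `ZipFile.testzip()` reads every member and checks its CRC, this is slower than a plain `extractall` on large archives.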
You can also directly load the dataset using `load_dataset`:
```python
from datasets import load_dataset
dataset = load_dataset('osv5m/osv5m', full=False)
```
The `full` argument specifies whether to load the complete metadata (default: `False`).
If you only want to download the test set, you can run the script below:
```python
from huggingface_hub import hf_hub_download
for i in range(5):
    hf_hub_download(repo_id="osv5m/osv5m", filename=str(i).zfill(2)+'.zip', subfolder="images/test", repo_type='dataset', local_dir="datasets/osv5m")
hf_hub_download(repo_id="osv5m/osv5m", filename="README.md", repo_type='dataset', local_dir="datasets/osv5m")
```
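Because each test archive is fetched separately, an interrupted run can leave only some of them on disk. A small check like the one below lists the archives that still need downloading. It is a sketch: `missing_test_archives` is a hypothetical helper, and the path layout `<local_dir>/images/test/NN.zip` is an assumption based on the `subfolder` and `local_dir` arguments used above.

```python
import os


def missing_test_archives(local_dir="datasets/osv5m", n_archives=5):
    """Hypothetical helper: return the test-set archive names not yet on disk.

    Assumes hf_hub_download(..., subfolder="images/test", local_dir=local_dir)
    stores files as <local_dir>/images/test/00.zip, 01.zip, ...
    """
    test_dir = os.path.join(local_dir, "images", "test")
    # Same naming scheme as the download loop: 00.zip, 01.zip, ...
    names = [str(i).zfill(2) + ".zip" for i in range(n_archives)]
    return [name for name in names if not os.path.isfile(os.path.join(test_dir, name))]
```

You could then loop over its result instead of `range(5)`, passing each name as `filename=`. It only looks for the `.zip` files, so archives that were already extracted and deleted by the extraction script are reported as missing.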