The VILD Dataset (VIdeo and Long-Description)

This dataset was proposed in VideoCLIP-XL. We establish an automatic data collection system designed to aggregate sufficient, high-quality pairs from multiple data sources. With it, we have collected over 2M (VIdeo, Long Description) pairs, which we denote as the VILD dataset.

Format

{
  "short_captions": [
    "..."
  ],
  "long_captions": [
    "..."
  ],
  "video_id": "..."
}
{
  ...
}
...
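
Each record is a JSON object holding a list of short captions, a list of long descriptions, and a video identifier. The card does not specify whether the released file is a JSON array or JSON Lines; below is a minimal reading sketch assuming JSON Lines (one object per line), and the file name "vild.jsonl" is an assumption, not something given by the card.

import json

# Minimal sketch for iterating over VILD records stored as JSON Lines.
# "vild.jsonl" is an assumed file name; substitute the actual path.
with open("vild.jsonl", "r", encoding="utf-8") as f:
    for line in f:
        record = json.loads(line)
        video_id = record["video_id"]          # identifier of the source video
        short_caps = record["short_captions"]  # list of short caption strings
        long_caps = record["long_captions"]    # list of long description strings
        print(video_id, len(short_caps), len(long_caps))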

Source

@misc{wang2024videoclipxladvancinglongdescription,
      title={VideoCLIP-XL: Advancing Long Description Understanding for Video CLIP Models}, 
      author={Jiapeng Wang and Chengyu Wang and Kunzhe Huang and Jun Huang and Lianwen Jin},
      year={2024},
      eprint={2410.00741},
      archivePrefix={arXiv},
      primaryClass={cs.CL},
      url={https://arxiv.org/abs/2410.00741}, 
}