microsoft/xclip-base-patch16

nielsr (HF staff) committed on Sep 8, 2022

Commit b5d3568 (1 parent: 5c39f88)

Upload README.md with huggingface_hub

Files changed (1)

README.md +1 -1

@@ -21,7 +21,7 @@ model-index:
 # X-CLIP (base-sized model)
-X-CLIP model (base-sized, patch resolution of 32) trained fully-supervised on [Kinetics-400](https://www.deepmind.com/open-source/kinetics). It was introduced in the paper [Expanding Language-Image Pretrained Models for General Video Recognition](https://arxiv.org/abs/2208.02816) by Ni et al. and first released in [this repository](https://github.com/microsoft/VideoX/tree/master/X-CLIP).
+X-CLIP model (base-sized, patch resolution of 16) trained fully-supervised on [Kinetics-400](https://www.deepmind.com/open-source/kinetics). It was introduced in the paper [Expanding Language-Image Pretrained Models for General Video Recognition](https://arxiv.org/abs/2208.02816) by Ni et al. and first released in [this repository](https://github.com/microsoft/VideoX/tree/master/X-CLIP).
 This model was trained using 8 frames per video, at a resolution of 224x224.