sujitpal committed on
Commit 295d06c · 1 Parent(s): 1ec1154

fixed typos and added info pointed out in evaluation

Files changed (1)
  1. README.md +13 -8
README.md CHANGED
@@ -7,7 +7,7 @@ tags:
 
  ## Model Details
 
- This model is a finetuned [CLIP by OpenAI](https://huggingface.co/openai/clip-vit-base-patch32). It is designed with aim to improve zero-shot image classification, text-to-image and image-to-image retrieval specically on remote sencing images.
+ This model is a finetuned [CLIP by OpenAI](https://huggingface.co/openai/clip-vit-base-patch32). It is designed with an aim to improve zero-shot image classification, text-to-image and image-to-image retrieval specifically on remote sensing images.
 
  ### Model Date
 
@@ -19,22 +19,22 @@ The base model uses a ViT-B/32 Transformer architecture as an image encoder and
 
  ### Model Version
 
- We release several checkpoints for `clip-rsicd` model. Refer to [our github repo](https://github.com/arampacha/CLIP-rsicd) for zero-shot classification for each of those.
+ We release several checkpoints for `clip-rsicd` model. Refer to [our github repo](https://github.com/arampacha/CLIP-rsicd#evaluation-results) for performance metrics on zero-shot classification for each of those.
 
  ### Training
 
  To reproduce the fine-tuning procedure one can use released [script](https://github.com/arampacha/CLIP-rsicd/blob/master/run_clip_flax_tv.py).
- The model was trained using batch size 1024, adafactor optimizer with linear warmup and decay with peack learning rate 1e-4 on 1 TPU-v3-8.
- Full log of the training run done to produce can be found on [WandB](https://wandb.ai/wandb/hf-flax-clip-rsicd/runs/1ts243k3).
+ The model was trained using batch size 1024, adafactor optimizer with linear warmup and decay with peak learning rate 1e-4 on 1 TPU-v3-8.
+ Full log of the training run can be found on [WandB](https://wandb.ai/wandb/hf-flax-clip-rsicd/runs/1ts243k3).
 
  ### Demo
 
- Checko out the model text-to-image and image-to-image capabilities using [this demo](https://huggingface.co/spaces/sujitpal/clip-rsicd-demo).
+ Check out the model text-to-image and image-to-image capabilities using [this demo](https://huggingface.co/spaces/sujitpal/clip-rsicd-demo).
 
 
  ### Documents
 
- - [Fine-tuning CLIP on RSICD with HuggingFace and flax/jax on colab using TPU]()
+ - [Fine-tuning CLIP on RSICD with HuggingFace and flax/jax on colab using TPU](https://colab.research.google.com/github/arampacha/CLIP-rsicd/blob/master/nbs/Finetuning_CLIP_with_HF_and_jax.ipynb)
 
 
  ### Use with Transformers
@@ -67,7 +67,12 @@ for l, p in zip(labels, probs[0]):
 
  ### Intended Use
 
- The model is intended as a research output for research communities. We hope that this model will enable researchers to better understand and explore zero-shot, arbitrary image classification. We also hope it can be used for interdisciplinary studies of the potential impact of such models - the CLIP paper includes a discussion of potential downstream impacts to provide an example for this sort of analysis.
+ The model is intended as a research output for research communities. We hope that this model will enable researchers to better understand and explore zero-shot, arbitrary image classification.
+
+ In addition, we can imagine applications in defense and law enforcement, climate change and global warming, and even some consumer applications. A partial list of applications can be found [here](https://github.com/arampacha/CLIP-rsicd#applications). In general we think such models can be useful as digital assistants for humans engaged in searching through large collections of images.
+
+ We also hope it can be used for interdisciplinary studies of the potential impact of such models - the CLIP paper includes a discussion of potential downstream impacts to provide an example for this sort of analysis.
+
 
  #### Primary intended uses
 
@@ -79,7 +84,7 @@ We primarily imagine the model will be used by researchers to better understand
 
  ## Data
 
- The model was trained on publicly available remote sensing image cations datasets. Namely [RSICD](), [UCM]() and [Sydney]().
+ The model was trained on publicly available remote sensing image captions datasets. Namely [RSICD](https://github.com/201528014227051/RSICD_optimal), [UCM](https://mega.nz/folder/wCpSzSoS#RXzIlrv--TDt3ENZdKN8JA) and [Sydney](https://mega.nz/folder/pG4yTYYA#4c4buNFLibryZnlujsrwEQ). More information on the datasets used can be found on [our project page](https://github.com/arampacha/CLIP-rsicd#dataset).
 
 
  ## Performance and Limitations
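The corrected Training line describes Adafactor with "linear warmup and decay" to a peak learning rate of 1e-4. The following is a minimal sketch of that schedule shape in plain Python, not the released script's code. `warmup_steps` and `total_steps` are hypothetical placeholders (this commit does not state them), and the assumption that the decay reaches 0 at `total_steps` is ours; check the linked script or WandB log for the real values.

```python
def linear_warmup_decay(step: int, peak_lr: float = 1e-4,
                        warmup_steps: int = 1000, total_steps: int = 10000) -> float:
    """Learning rate at `step` for linear warmup followed by linear decay.

    warmup_steps and total_steps are hypothetical placeholders; the
    clip-rsicd commit only states the peak learning rate (1e-4).
    """
    if warmup_steps <= 0 or total_steps <= warmup_steps:
        raise ValueError("need 0 < warmup_steps < total_steps")
    if step < 0:
        raise ValueError("step must be non-negative")
    if step < warmup_steps:
        # Warmup: ramp linearly from 0 up to peak_lr.
        return peak_lr * step / warmup_steps
    # Decay: fall linearly from peak_lr to 0 at total_steps (assumed endpoint),
    # clamped at 0 for any step past total_steps.
    remaining = max(total_steps - step, 0)
    return peak_lr * remaining / (total_steps - warmup_steps)


if __name__ == "__main__":
    for s in (0, 500, 1000, 5500, 10000):
        print(s, linear_warmup_decay(s))
```

With these placeholder values the rate is 0 at step 0, 5e-5 halfway through warmup, 1e-4 at the end of warmup, and back to 0 at step 10000. If the training loop is built on optax, `optax.linear_schedule` and `optax.join_schedules` can express the same shape and `optax.adafactor` accepts a schedule as its `learning_rate`; whether the released script builds it that way is not stated in this commit.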