kimihailv committed on
Commit 6872d9c (1 parent: a4c8314)

Update README.md

  1. README.md +7 -30
README.md CHANGED
@@ -58,47 +58,24 @@ To encode data:
  
  ```python
  from PIL import Image
+ 
  text = 'a small red panda in a zoo'
  image = Image.open('red_panda.jpg')
- image_data = model.preprocess_image(image)
- text_data = model.preprocess_text(text)
- image_embedding = model.encode_image(image_data)
- text_embedding = model.encode_text(text_data)
- score, joint_embedding = model.encode_multimodal(
- image_features=image_features,
- text_features=text_features,
- attention_mask=text_data['attention_mask'],
- return_scores=True
- )
- ```
  
- To get features:
+ image_data = processor.preprocess_image(image)
+ text_data = processor.preprocess_text(text)
  
- ```python
  image_features, image_embedding = model.encode_image(image_data, return_features=True)
  text_features, text_embedding = model.encode_text(text_data, return_features=True)
- ```
-
- These features can later be used to produce joint multimodal encodings faster, as the first layers of the transformer can be skipped:
-
- ```python
- joint_embedding = model.encode_multimodal(
+ score, joint_embedding = model.encode_multimodal(
  image_features=image_features,
  text_features=text_features,
- attention_mask=text_data['attention_mask']
+ attention_mask=text_data['attention_mask'],
+ return_scores=True
  )
  ```
  
- There are two options to calculate semantic compatibility between an image and a text: [Cosine Similarity](#cosine-similarity) and [Matching Score](#matching-score).
-
- ### Cosine Similarity
-
- ```python
- import torch.nn.functional as F
- similarity = F.cosine_similarity(image_embedding, text_embedding)
- ```
-
- The `similarity` will belong to the `[-1, 1]` range, `1` meaning the absolute match.
+ There are two options to calculate semantic compatibility between an image and a text: cosine similarity and [Matching Score](#matching-score).
  
  __Pros__:
  
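
Note on the hunk above: it removes the README's cosine-similarity snippet while the remaining text still names cosine similarity as one of the two options. As a rough, dependency-free sketch of what that score computes, the block below may help. The helper `cosine_similarity` and the two vectors are hypothetical placeholders for illustration, not part of the model's API and not real model outputs.

```python
import math


def cosine_similarity(a, b):
    """Return the cosine of the angle between two equal-length vectors."""
    if len(a) != len(b):
        raise ValueError("vectors must have the same length")
    dot = sum(x * y for x, y in zip(a, b))
    norm_a = math.sqrt(sum(x * x for x in a))
    norm_b = math.sqrt(sum(y * y for y in b))
    if norm_a == 0.0 or norm_b == 0.0:
        raise ValueError("cosine similarity is undefined for a zero vector")
    return dot / (norm_a * norm_b)


# Placeholder embeddings (assumption): real ones would come from
# model.encode_image(...) and model.encode_text(...).
image_embedding = [0.2, 0.7, 0.1]
text_embedding = [0.25, 0.6, 0.05]

similarity = cosine_similarity(image_embedding, text_embedding)
print(f"{similarity:.3f}")
```

The result lies in the `[-1, 1]` range, with `1` meaning the vectors point in the same direction. With PyTorch tensors, the removed lines computed the same quantity via `F.cosine_similarity(image_embedding, text_embedding)`.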