GoodBaiBai88 committed on
Commit 10543c8
1 Parent(s): 187dc32

Update README.md

Files changed (1): README.md (+29, -30)
README.md CHANGED
@@ -12,44 +12,43 @@ M3D-CLIP is a 3D medical CLIP model, which aligns vision and language through contrastive learning.
The vision encoder is a 3D ViT with a 32*256*256 image size and a 4*16*16 patch size.
The text encoder is initialized from a pre-trained BERT.
 
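As a quick sanity check on those sizes (simple arithmetic, not stated in the original README), a 32*256*256 volume cut into 4*16*16 patches yields (32/4) * (256/16) * (256/16) tokens:

```python
# Patch-token count implied by the sizes above. The [:, 0] indexing in the
# quickstart below suggests one extra leading [CLS]-style token for pooling.
num_patches = (32 // 4) * (256 // 16) * (256 // 16)
print(num_patches)  # 8 * 16 * 16 = 2048
```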
- ![M3D_CLIP_table](M3D_CLIP_table.png#pic_center)
- ![itr_result](itr_result.png#pic_center)
+ ![M3D_CLIP_table](./M3D_CLIP_table.png#pic_center)
+ ![itr_result](./itr_result.png#pic_center)

# Quickstart

```python
import numpy as np
import torch
from transformers import AutoModel, AutoTokenizer
device = torch.device("cuda")  # or "cpu"

tokenizer = AutoTokenizer.from_pretrained(
    "GoodBaiBai88/M3D-CLIP",
    model_max_length=512,
    padding_side="right",
    use_fast=False,
)
model = AutoModel.from_pretrained(
    "GoodBaiBai88/M3D-CLIP",
    trust_remote_code=True,
)
model = model.to(device=device)

# Prepare your 3D medical image:
# 1. Process the image to shape 1*32*256*256, e.g. by resizing.
# 2. Normalize the image intensities to the range [0, 1], e.g. with min-max normalization.
# 3. Convert the image to .npy format.
# 4. Although the model was not trained on 2D images, a 2D image can in theory be interpolated to 1*32*256*256 for input.
# (A minimal preprocessing sketch for steps 1-3 appears after this code block.)

image_path = ""
input_txt = ""
 
text_tensor = tokenizer(input_txt, max_length=512, truncation=True, padding="max_length", return_tensors="pt")
input_id = text_tensor["input_ids"].to(device=device)
attention_mask = text_tensor["attention_mask"].to(device=device)
# np.load returns a NumPy array, which has no .to(); convert it to a tensor and add
# a batch dimension (assuming the model expects a float B*1*32*256*256 input).
image = torch.from_numpy(np.load(image_path)).unsqueeze(0).float().to(device=device)

with torch.inference_mode():
    # [:, 0] takes the first token's embedding, presumably a [CLS]-style pooled feature.
    image_features = model.encode_image(image)[:, 0]
    text_features = model.encode_text(input_id, attention_mask)[:, 0]
```
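The preparation steps in the comments above can be made concrete. Below is a minimal sketch, assuming a raw D*H*W volume, trilinear interpolation, and min-max scaling; the file names are placeholders and this is not the authors' official pipeline:

```python
import numpy as np
import torch
import torch.nn.functional as F

# Hypothetical input: a raw scan already loaded as a D*H*W float array.
raw = np.load("raw_volume.npy")  # placeholder path

vol = torch.from_numpy(raw).float()[None, None]  # 1*1*D*H*W, as interpolate expects
vol = F.interpolate(vol, size=(32, 256, 256), mode="trilinear", align_corners=False)
vol = (vol - vol.min()) / (vol.max() - vol.min() + 1e-8)  # min-max normalize to [0, 1]

np.save("image.npy", vol.squeeze(0).numpy())  # saved shape: 1*32*256*256
```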
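The quickstart stops at raw features. For image-text retrieval, as in the itr_result figure above, a common CLIP-style follow-up is cosine similarity; this is a sketch, not the repository's evaluation code, and M3D-CLIP may additionally apply a learned temperature:

```python
import torch.nn.functional as F

# Rank text candidates against an image by cosine similarity of the pooled features
# (image_features and text_features come from the quickstart above).
image_embed = F.normalize(image_features, dim=-1)
text_embed = F.normalize(text_features, dim=-1)
similarity = image_embed @ text_embed.T  # one score per image-text pair
```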
 
  # Citation
 