Commit 10fdabc (verified) · committed by nielsr (HF staff) · 1 parent: ccbda92

Add pipeline tag, library name, and project page link


This PR adds the `pipeline_tag` and `library_name` to the model card metadata. The `pipeline_tag` is set to `image-text-to-text`, reflecting the model's capability to process both image and text inputs to generate text outputs. The `library_name` is set to `transformers` given the model's compatibility with the Hugging Face Transformers library. This PR also adds a link to the project page for easier access to the online demo.
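The metadata change described above amounts to adding two `key: value` lines to the YAML front matter at the top of `README.md`. A minimal sketch of that edit follows, using only the standard library. The helper name `add_front_matter_fields` is hypothetical and is not part of this PR or of any Hugging Face library; it assumes flat, single-line keys such as the ones in this model card.

```python
def add_front_matter_fields(readme: str, fields: dict[str, str]) -> str:
    """Insert or overwrite flat `key: value` pairs in a model card's YAML front matter.

    Hypothetical helper for illustration only; it handles top-level,
    single-line keys and does not parse nested YAML.
    """
    lines = readme.split("\n")

    # No front matter yet: create one in front of the existing text.
    if not lines or lines[0].strip() != "---":
        header = ["---", *(f"{k}: {v}" for k, v in fields.items()), "---", ""]
        return "\n".join(header + lines)

    # Locate the closing `---` delimiter of the front matter block.
    end = next((i for i in range(1, len(lines)) if lines[i].strip() == "---"), None)
    if end is None:
        raise ValueError("front matter is not terminated by '---'")

    remaining = dict(fields)
    updated = []
    for line in lines[1:end]:
        key = line.split(":", 1)[0].strip()
        is_top_level = not line.startswith((" ", "\t", "-"))
        if is_top_level and key in remaining:
            # Key already present: overwrite its value in place.
            updated.append(f"{key}: {remaining.pop(key)}")
        else:
            updated.append(line)

    # Keys that were not present are appended at the end of the block.
    updated.extend(f"{k}: {v}" for k, v in remaining.items())
    return "\n".join(["---", *updated, "---", *lines[end + 1:]])


old_card = (
    "---\n"
    "license: other\n"
    "license_name: nvidia-oneway-noncommercial-license\n"
    "license_link: LICENSE\n"
    "---\n"
    "\n"
    "# Llama3-VILA-M3-3B\n"
)
new_card = add_front_matter_fields(
    old_card,
    {"pipeline_tag": "image-text-to-text", "library_name": "transformers"},
)
print(new_card)
```

For a repository hosted on the Hub, the `huggingface_hub` library provides a `metadata_update` function that may be a more convenient way to make the same kind of change; the sketch above only illustrates what ends up in the file.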

Files changed (1)
  1. README.md +52 -46
README.md CHANGED
@@ -1,46 +1,52 @@
- ---
- license: other
- license_name: nvidia-oneway-noncommercial-license
- license_link: LICENSE
- ---
-
- # Llama3-VILA-M3-3B
-
- > Built with Meta Llama 3
-
- ## Model Overview
-
- ## Description:
- M3 is a medical visual language model that empowers medical imaging professionals, researchers, and healthcare enterprises by enhancing medical imaging workflows across various modalities.
-
- Key features include:
- - Integration with expert models from the MONAI Model Zoo
- - Support for multiple imaging modalities
-
- For more details, see our [repo](https://github.com/Project-MONAI/VLM)
-
- ### Core Capabilities
- M3 NIM provides a comprehensive suite of 2D medical image analysis tools, including:
- 1. Segmentation
- 2. Classification
- 3. Visual Question Answering (VQA)
- 4. Report/Findings Generation
-
- These capabilities are applicable across various medical imaging modalities, leveraging expert models from the MONAI Model Zoo to ensure high-quality results.
-
- ## Model Architecture:
- **Architecture Type:** Auto-Regressive Vision Language Model
- **Network Architecture:** [VILA](https://github.com/NVlabs/VILA) with Llama
-
- ## Input:
- **Input Type(s):** Text and Image
- **Input Format(s):** Text: String, Image
- **Input Parameters:** Text: 1D, Image: 2D
-
- ## Output:
- **Output Type(s):** Text and Image
- **Output Format:** Text: String and Image
- **Output Parameters:** Text: 1D, Image: 2D/3D
-
- ## Ethical Considerations
- NVIDIA believes Trustworthy AI is a shared responsibility and we have established policies and practices to enable development for a wide array of AI applications. When downloaded or used in accordance with our terms of service, developers should work with their internal model team to ensure this model meets requirements for the relevant industry and use case and addresses unforeseen product misuse.
+ ---
+ license: other
+ license_name: nvidia-oneway-noncommercial-license
+ license_link: LICENSE
+ pipeline_tag: image-text-to-text
+ library_name: transformers
+ ---
+
+ # Llama3-VILA-M3-3B
+
+ > Built with Meta Llama 3
+
+ ## Model Overview
+
+ ## Description:
+ M3 is a medical visual language model that empowers medical imaging professionals, researchers, and healthcare enterprises by enhancing medical imaging workflows across various modalities.
+
+ Key features include:
+ - Integration with expert models from the MONAI Model Zoo
+ - Support for multiple imaging modalities
+
+ For more details, see our [repo](https://github.com/Project-MONAI/VLM)
+
+ ### Core Capabilities
+ M3 NIM provides a comprehensive suite of 2D medical image analysis tools, including:
+ 1. Segmentation
+ 2. Classification
+ 3. Visual Question Answering (VQA)
+ 4. Report/Findings Generation
+
+ These capabilities are applicable across various medical imaging modalities, leveraging expert models from the MONAI Model Zoo to ensure high-quality results.
+
+ ## Model Architecture:
+ **Architecture Type:** Auto-Regressive Vision Language Model
+ **Network Architecture:** [VILA](https://github.com/NVlabs/VILA) with Llama
+
+ ## Input:
+ **Input Type(s):** Text and Image
+ **Input Format(s):** Text: String, Image
+ **Input Parameters:** Text: 1D, Image: 2D
+
+ ## Output:
+ **Output Type(s):** Text and Image
+ **Output Format:** Text: String and Image
+ **Output Parameters:** Text: 1D, Image: 2D/3D
+
+ ## Ethical Considerations
+ NVIDIA believes Trustworthy AI is a shared responsibility and we have established policies and practices to enable development for a wide array of AI applications. When downloaded or used in accordance with our terms of service, developers should work with their internal model team to ensure this model meets requirements for the relevant industry and use case and addresses unforeseen product misuse.
+
+
+ ## Project Page:
+ https://vila-m3-demo.monai.ngc.nvidia.com/