Commit cc0d2ab by gheinrich (parent: f134378): Create README.md

# AM-RADIO: Reduce All Domains Into One

Mike Ranzinger, Greg Heinrich, Jan Kautz, Pavlo Molchanov

[NVIDIA Research](https://www.nvidia.com/en-us/research/)

\[[Paper](https://arxiv.org/abs/2312.06709)\]\[[BibTex](#citing-radio)\]

## Pretrained Models

### HuggingFace Hub

Load the E-RADIO model from the HuggingFace Hub in a Python script:

```python
from transformers import AutoModel
# trust_remote_code=True lets transformers run the model code hosted in the repo
model = AutoModel.from_pretrained("nvidia/E-RADIO", trust_remote_code=True)
```

### Usage

E-RADIO returns a tuple of two tensors.
The `summary` is similar to the `cls_token` in ViT and is meant to represent the general concept of the entire image.
It has shape $(B,C)$, where $B$ is the batch dimension and $C$ is the number of channels.
The `spatial_features` represent more localized content, which should be suitable for dense tasks such as semantic segmentation, or for integration into an LLM.
Spatial features have shape $(B,H,W,D)$, where $H$ is the height and $W$ the width of the spatial feature grid, and $D$ is the number of channels per location.

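As a minimal sketch of how the two outputs might be consumed downstream, the snippet below uses NumPy arrays as stand-ins for `summary` and `spatial_features`. The sizes are hypothetical, and the helpers `summary_similarity`, `to_token_sequence` and `to_channels_first` are illustrative names, not part of the RADIO code. It compares images by the cosine similarity of their summaries, flattens the $H \times W$ grid into a token sequence as one might before feeding an LLM, and reorders the features to channels-first for a convolutional segmentation head.

```python
import numpy as np

# Hypothetical sizes for illustration only; real values depend on the
# model variant and the input resolution.
B, C, H, W, D = 2, 16, 4, 6, 8

# Stand-ins for the two tensors returned by E-RADIO (assumed, not real outputs).
rng = np.random.default_rng(0)
summary = rng.standard_normal((B, C)).astype(np.float32)
spatial_features = rng.standard_normal((B, H, W, D)).astype(np.float32)


def summary_similarity(summary: np.ndarray) -> np.ndarray:
    """Cosine similarity between every pair of summary vectors, shape (B, B)."""
    unit = summary / np.linalg.norm(summary, axis=1, keepdims=True)
    return unit @ unit.T


def to_token_sequence(spatial: np.ndarray) -> np.ndarray:
    """Flatten the grid row by row: (B, H, W, D) -> (B, H*W, D)."""
    b, h, w, d = spatial.shape
    return spatial.reshape(b, h * w, d)


def to_channels_first(spatial: np.ndarray) -> np.ndarray:
    """Reorder (B, H, W, D) -> (B, D, H, W), the layout conv heads typically use."""
    return np.transpose(spatial, (0, 3, 1, 2))


sim = summary_similarity(summary)
tokens = to_token_sequence(spatial_features)
dense = to_channels_first(spatial_features)
print(sim.shape, tokens.shape, dense.shape)  # (2, 2) (2, 24, 8) (2, 8, 4, 6)
```

With PyTorch tensors, the equivalent reshapes would be `spatial_features.flatten(1, 2)` and `spatial_features.permute(0, 3, 1, 2)`.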

## Training

_Coming Soon_

## License

RADIO code and weights are released under the [NSCLv1 License](LICENSE).

## Citing RADIO

If you find this repository useful, please consider giving it a star and citing it:
```
@misc{ranzinger2023amradio,
  title={AM-RADIO: Agglomerative Model -- Reduce All Domains Into One},
  author={Mike Ranzinger and Greg Heinrich and Jan Kautz and Pavlo Molchanov},
  year={2023},
  eprint={2312.06709},
  archivePrefix={arXiv},
  primaryClass={cs.CV}
}
```