Update README.md
README.md
CHANGED
@@ -20,30 +20,23 @@ DaViT (Dual-Attention Vision Transformer) is designed to handle image classifica
Removed:

Here is an example of how to use the DaViT model for image classification:

```python
from transformers import AutoModel
import torch

# NOTE: the lines that originally loaded the model and set the dummy-input
# dimensions were lost from this diff view; the checkpoint name below is taken
# from the new example in this commit, and the batch size / channel count are
# assumed so the snippet runs end to end.
model = AutoModel.from_pretrained("amaye15/DaViT-Florence-2-large-ft", trust_remote_code=True)

batch_size = 1   # assumed
channels = 3     # assumed (RGB image)
height = 224
width = 224
sample_input = torch.randn(batch_size, channels, height, width)

# Pass the sample input through the model
output = model(sample_input)

# Print the output shape
print(f"Output shape: {output.shape}")
```

- `model.safetensors`: Pretrained weights of the DaViT model (a minimal loading sketch follows below).
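As a hedged illustration of that bullet, the sketch below downloads and opens `model.safetensors` directly. The repo id is reused from the usage example in this diff; `huggingface_hub` and `safetensors` are standard tools for this, but nothing in the README prescribes this exact flow.

```python
# Minimal sketch (assumptions noted in the lead-in): fetch the repo's
# model.safetensors file and inspect the tensors it contains.
from huggingface_hub import hf_hub_download
from safetensors.torch import load_file

weights_path = hf_hub_download("amaye15/DaViT-Florence-2-large-ft", "model.safetensors")
state_dict = load_file(weights_path)

print(f"Loaded {len(state_dict)} tensors from {weights_path}")
```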
Added:

Here is an example of how to use the DaViT model for image classification:

```python
# Load model directly
from transformers import AutoModel, AutoProcessor
from PIL import Image
import requests
import os

model = AutoModel.from_pretrained("amaye15/DaViT-Florence-2-large-ft", trust_remote_code=True, cache_dir=os.getcwd())
processor = AutoProcessor.from_pretrained("amaye15/DaViT-Florence-2-large-ft", trust_remote_code=True, cache_dir=os.getcwd())

prompt = "<OCR>"
url = "https://huggingface.co/datasets/huggingface/documentation-images/resolve/main/transformers/tasks/car.jpg?download=true"
image = Image.open(requests.get(url, stream=True).raw)

# Preprocess the prompt and image into model-ready tensors
inputs = processor(text=prompt, images=image, return_tensors="pt")

# Forward the preprocessed image through the DaViT vision encoder
outputs = model(inputs["pixel_values"])
```
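The forward call above produces the encoder's image features but never inspects them. As a hedged follow-up sketch, assuming `outputs` is a feature tensor shaped `(batch_size, num_tokens, hidden_dim)` (if the remote-code model instead returns a tuple or a `ModelOutput`, take its first element first), one way to pool the features and attach a hypothetical linear head for image classification:

```python
import torch

# ASSUMPTION: `outputs` from the call above is a tensor of shape
# (batch_size, num_tokens, hidden_dim); unwrap it first if the model
# actually returns a tuple or a ModelOutput object.
pooled = outputs.mean(dim=1)                 # mean-pool over the token axis

num_classes = 10                             # hypothetical number of labels
classifier = torch.nn.Linear(pooled.shape[-1], num_classes)

logits = classifier(pooled)
print(f"Pooled features: {tuple(pooled.shape)}, logits: {tuple(logits.shape)}")
```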

## Credits