nielsr (HF staff) committed
Commit 77890b9 · verified · 1 Parent(s): abf6f42

Add pipeline tag, license and improve tags

This PR adds the missing `pipeline_tag` and `license` fields to the model card metadata and expands the tags for better searchability and categorization. The `image-text-to-text` pipeline tag reflects the model's ability to perform image classification with text-based explanations.

Files changed (1):
  1. README.md (+27 -19)
README.md CHANGED

@@ -1,27 +1,35 @@
 ---
-library_name: transformers
-tags: []
+library_name: transformers
+pipeline_tag: image-text-to-text
+license: mit
+tags:
+- multimodal
+- image-classification
+- explanation
+- visual-reasoning
+- fine-grained-classification
+- llava
+- fgvc
 ---
 
-# Fine-Grained Visual Classification on FGVC-Aircraft
+# Fine-Grained Visual Classification on FGVC-Aircraft
 
-Project Page: [SelfSynthX](https://github.com/sycny/SelfSynthX).
+Project Page: [SelfSynthX](https://github.com/sycny/SelfSynthX).
 
 Paper on arXiv: [Enhancing Cognition and Explainability of Multimodal Foundation Models with Self-Synthesized Data](https://arxiv.org/abs/2502.14044)
 
-This model is a fine-tuned multimodal foundation model based on [LLaVA-1.5-7B-hf](https://huggingface.co/llava-hf/llava-1.5-7b-hf), optimized for fine-grained classification of aircraft types using the FGVC-Aircraft dataset.
+This model is a fine-tuned multimodal foundation model based on [LLaVA-1.5-7B-hf](https://huggingface.co/llava-hf/llava-1.5-7B-hf), optimized for fine-grained classification of aircraft types using the FGVC-Aircraft dataset.
 
-## Key Details
-
-- **Base Model:** LLaVA-1.5-7B
-- **Dataset:** FGVC-Aircraft (Fine-Grained Visual Classification of Aircraft)
-- **Innovation:**
-  - **Self-Synthesized Data:** Extracts and highlights distinctive aircraft-specific visual features using the Information Bottleneck principle.
-  - **Iterative Fine-Tuning:** Uses reward model-free rejection sampling to improve classification accuracy and explanation quality.
+## Key Details
+
+- **Base Model:** LLaVA-1.5-7B
+- **Dataset:** FGVC-Aircraft (Fine-Grained Visual Classification of Aircraft)
+- **Innovation:**
+  - **Self-Synthesized Data:** Extracts and highlights distinctive aircraft-specific visual features using the Information Bottleneck principle.
+  - **Iterative Fine-Tuning:** Uses reward model-free rejection sampling to improve classification accuracy and explanation quality.
 - **Intended Use:** Identification of aircraft models with human-verifiable explanations.
 
-## How to Use
+## How to Use
 
 ```python
 import requests
@@ -31,8 +39,8 @@ from transformers import AutoProcessor, LlavaForConditionalGeneration
 
 model_id = "YuchengShi/LLaVA-v1.5-7B-Fgvc"
 model = LlavaForConditionalGeneration.from_pretrained(
-    model_id,
-    torch_dtype=torch.float16,
+    model_id,
+    torch_dtype=torch.float16,
     low_cpu_mem_usage=True,
 ).to("cuda")
 processor = AutoProcessor.from_pretrained(model_id)
@@ -55,14 +63,14 @@ output = model.generate(**inputs, max_new_tokens=200, do_sample=False)
 print(processor.decode(output[0][2:], skip_special_tokens=True))
 ```
 
-## Training & Evaluation
+## Training & Evaluation
 
-- **Training:** Fine-tuned using LoRA on FGVC-Aircraft with iterative rejection sampling.
+- **Training:** Fine-tuned using LoRA on FGVC-Aircraft with iterative rejection sampling.
 - **Evaluation:** Achieves high accuracy in distinguishing aircraft types while providing detailed, interpretable explanations.
 
-## Citation
+## Citation
 
-If you use this model, please cite:
+If you use this model, please cite:
 
 ```bibtex
 @inproceedings{
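The "reward model-free rejection sampling" named in the README's Key Details can be sketched in plain Python. Everything below is a toy illustration under assumptions: `stub_model`, the label set, and the file names are hypothetical stand-ins, and the real pipeline samples candidates from the fine-tuned LLaVA model rather than a deterministic stub.

```python
from itertools import cycle

def rejection_sample(model, dataset, k=8):
    """Reward-model-free rejection sampling: draw k candidate answers per
    image and keep only those whose predicted label matches the ground
    truth; the accepted triples become the next round's fine-tuning data."""
    accepted = []
    for image, question, true_label in dataset:
        for _ in range(k):
            answer = model(image, question)
            # Matching against the ground-truth label replaces a learned reward model.
            if answer["label"] == true_label:
                accepted.append((image, question, answer))
    return accepted

# Hypothetical deterministic stand-in for the fine-tuned VLM:
# it alternates between two aircraft labels on successive calls.
_labels = cycle(["Boeing 737-800", "Airbus A320"])
def stub_model(image, question):
    return {"label": next(_labels), "explanation": "winglet and nose profile"}

data = [("img_0001.jpg", "What aircraft is this?", "Airbus A320")]
kept = rejection_sample(stub_model, data, k=8)
# With the alternating stub, exactly 4 of the 8 candidates match "Airbus A320".
assert len(kept) == 4
```

In the actual method, sampling is stochastic and the accepted answer–explanation pairs feed the next LoRA fine-tuning round, so accuracy and explanation quality improve together without training a separate reward model.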