wolfgangblack committed on
Commit
710a967
·
verified ·
1 Parent(s): 3a955dc

Create README.md

  1. README.md +81 -0
README.md ADDED
@@ -0,0 +1,81 @@
1
+ ---
2
+ license: mit
3
+ metrics:
4
+ - accuracy
5
+ - f1
6
+ ---
7
+ # Model Card for Model ID
8
+
9
+ This repo contains models that rate media into the categories PG, PG13, R, X, and XXX. The models are single-modality models that can be combined into an ensemble or a multimodal model. In the multimodal model, the single-modality models serve as processor components that create the inputs for a smaller multilayer perceptron (MLP).
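One common way to combine single-modality raters into an ensemble is soft voting: average each model's class probabilities and pick the highest. The sketch below is only an illustration under assumptions; the label order after PG, the helper names `softmax` and `soft_vote`, and the example logits are all hypothetical, and this card does not state which ensembling rule was actually used.

```python
import math

# Assumed label order: the card only states that class 0 is PG.
LABELS = ["PG", "PG13", "R", "X", "XXX"]


def softmax(logits):
    """Convert raw scores to probabilities (numerically stable)."""
    peak = max(logits)
    exps = [math.exp(x - peak) for x in logits]
    total = sum(exps)
    return [e / total for e in exps]


def soft_vote(per_model_logits):
    """Average class probabilities across models and return (label, averages)."""
    probs = [softmax(logits) for logits in per_model_logits]
    avg = [sum(p[i] for p in probs) / len(probs) for i in range(len(LABELS))]
    best = max(range(len(avg)), key=avg.__getitem__)
    return LABELS[best], avg


# Hypothetical logits from three single-modality models for one sample.
example_logits = [
    [2.0, 1.0, 0.1, -1.0, -2.0],
    [1.5, 1.8, 0.0, -0.5, -1.0],
    [2.2, 0.3, 0.2, -1.2, -2.5],
]
label, avg_probs = soft_vote(example_logits)
print(label)
```

Averaging probabilities rather than raw logits keeps one over-confident model from dominating the vote.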
10
+
11
+ ## Model Details
12
+
13
+
14
+ ### Model Description
15
+
16
+ The main model here is the multimodal model trained 7/22/24. It was trained with a weighted soft F1 loss that emphasizes class 0 (PG). Its MultiModalProcessor uses finetuned resnet18, ViT, resnet50 (with cross validation), prompt Bert, and prompt Roberta models. The processor passes each modality through the matching models and returns each model's last hidden layer. These vectors are concatenated to form the input to the multimodal model's MLP.
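For readers unfamiliar with a weighted soft F1 loss, here is a minimal sketch of one common formulation: per-class F1 computed from predicted probabilities instead of hard counts, then averaged with class weights. It is written in plain Python to show the arithmetic; a training version would use tensor operations so gradients can flow. The function name, the weight values, and the exact formula are assumptions, not the implementation used for this model.

```python
def weighted_soft_f1_loss(probs, targets, class_weights, eps=1e-8):
    """Weighted mean of (1 - soft F1) over classes.

    probs: list of per-sample probability lists, one entry per class.
    targets: list of integer class ids.
    class_weights: one weight per class (a larger weight means more emphasis).
    """
    weighted_loss = 0.0
    for c, weight in enumerate(class_weights):
        # Soft counts: probabilities stand in for hard 0/1 predictions.
        tp = sum(p[c] for p, t in zip(probs, targets) if t == c)
        fp = sum(p[c] for p, t in zip(probs, targets) if t != c)
        fn = sum(1.0 - p[c] for p, t in zip(probs, targets) if t == c)
        soft_f1 = 2 * tp / (2 * tp + fp + fn + eps)
        weighted_loss += weight * (1.0 - soft_f1)
    return weighted_loss / sum(class_weights)


# Hypothetical weights that emphasize class 0 (PG).
weights = [2.0, 1.0, 1.0, 1.0, 1.0]
targets = [0, 1, 2, 3, 4]
perfect = [[1.0 if i == t else 0.0 for i in range(5)] for t in targets]

# Same batch, but the PG sample is confidently predicted as PG13.
miss_pg = [row[:] for row in perfect]
miss_pg[0] = [0.0, 1.0, 0.0, 0.0, 0.0]

# Same batch, but the XXX sample is confidently predicted as X.
miss_xxx = [row[:] for row in perfect]
miss_xxx[4] = [0.0, 0.0, 0.0, 1.0, 0.0]

loss_perfect = weighted_soft_f1_loss(perfect, targets, weights)
loss_miss_pg = weighted_soft_f1_loss(miss_pg, targets, weights)
loss_miss_xxx = weighted_soft_f1_loss(miss_xxx, targets, weights)
```

With these weights, a mistake on a PG sample costs more than the same kind of mistake on an XXX sample, which is the effect of the emphasis on class 0.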
17
+
18
+ Each model was trained on the same balanced, downsampled dataset found [here](https://civitai.com/models/544550/training-data-for-image-classification). Please note: this dataset contains some mislabeled data in every label. The resnet50-CV is the only model whose training/test split may differ, because of the cross-validation search; however, no data used for evaluation appeared in any training/test set. The evaluation data is a private dataset labeled by Wolfgang Black and Seb at CivitAI.
19
+
20
+
21
+ - **Developed by:** Wolfgang Black
22
+ - **Model type:** Multimodal
23
+ - **Language(s) (NLP):** English
24
+ - **Finetuned from model [optional]:** Various. Because the model is multimodal, the composite models were finetuned from different base models; only the MLP was trained from scratch.
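The fusion step described in the model description (last hidden vectors from each composite model, concatenated and fed to an MLP) can be sketched in PyTorch as follows. This is an illustration under assumptions, not the repo's `MultimodalModel`: the class name `FusionMLP`, the hidden size, and the feature keys are hypothetical, and the feature widths are the standard sizes for these architectures, which may differ from the vectors the processor actually returns.

```python
import torch
from torch import nn

# Assumed feature widths (standard for these architectures, not confirmed by the card).
FEATURE_DIMS = {
    "resnet18": 512,
    "vit": 768,
    "resnet50": 2048,
    "prompt_bert": 768,
    "prompt_roberta": 1024,
}
NUM_CLASSES = 5  # PG, PG13, R, X, XXX


class FusionMLP(nn.Module):
    """Concatenate per-model feature vectors and classify them with a small MLP."""

    def __init__(self, feature_dims, hidden_dim=256, num_classes=NUM_CLASSES):
        super().__init__()
        self.keys = list(feature_dims)  # fixed order for concatenation
        self.net = nn.Sequential(
            nn.Linear(sum(feature_dims.values()), hidden_dim),
            nn.ReLU(),
            nn.Linear(hidden_dim, num_classes),
        )

    def forward(self, features):
        # features: dict mapping model name -> tensor of shape (batch, dim)
        fused = torch.cat([features[k] for k in self.keys], dim=-1)
        return self.net(fused)


# Random stand-ins for the processor outputs of a batch of two samples.
features = {name: torch.randn(2, dim) for name, dim in FEATURE_DIMS.items()}
logits = FusionMLP(FEATURE_DIMS)(features)
print(tuple(logits.shape))
```

Only the MLP has trainable parameters in this sketch, which matches the statement that the MLP is the only part trained from scratch.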
25
+
26
+ ### Model Sources [optional]
27
+
28
+ #### ResNets
29
+ - **Link** - https://pytorch.org/vision/main/models/resnet.html
30
+ - Note: models were initialized with the ImageNet V1 weights (`weights='IMAGENET1K_V1'` in torchvision)
31
+
32
+ #### ViT
33
+ - **Repository:** https://huggingface.co/google/vit-base-patch16-224
34
+ - **Paper [optional]:** https://arxiv.org/abs/2010.11929
35
+
36
+ #### DistilBert
37
+ This model is the basis for promptBert
38
+ - **Repository:** https://huggingface.co/distilbert/distilbert-base-uncased
39
+ - **Paper:** https://arxiv.org/abs/1910.01108
40
+
41
+ #### Roberta
42
+ This model is the basis for promptRoberta
43
+ - **Repository:** https://huggingface.co/FacebookAI/roberta-large-mnli
44
+ - **Paper:** https://arxiv.org/abs/1907.11692
45
+
46
+
47
+ ## Uses
48
+
49
+ These models are intended to classify generated images or text into movie-style ratings (PG, PG13, R, X, XXX).
50
+
51
+ ## How to Get Started with the Model
52
+
53
+ `Warning`: The code for the Multimodal Config, Processor, and Model is not included here. The snippet below assumes you already have that code.
54
+
55
+ ```
+ import torch
+
+ from src.multimodal_model import MultimodalConfig, MultimodalModel, MultimodalProcessor
+
+ model_dir = ''  # path to the multimodal model directory
+ config = MultimodalConfig.from_pretrained(model_dir)
+ # Both calls below assume the composite models exist in the directories specified by the config.
+ model = MultimodalModel(config).from_pretrained(model_dir)
+ processor = MultimodalProcessor(models=config.models)
+ model.eval()
+
+ # `inputs` is assumed to come from `processor`, built from a PIL.Image plus optional
+ # text (None or prompt str), tags (None or str), and label (None or str).
+ with torch.no_grad():
+     outputs = model(**inputs)
+
+ logits = outputs['logits']
+ prediction = model.config.id2label[torch.argmax(logits, dim=1).item()]
+ ```
68
+
69
+
70
+ ### Out-of-Scope Use
71
+
72
+ None of the models have been tested on video so far.
73
+
74
+
75
+ ## Bias, Risks, and Limitations
76
+
77
+ All models were finetuned (the composite models) or trained (the MLP) entirely on generated images, so they may not work well on real images or non-digital media.
78
+
79
+ ### Recommendations
80
+
81
+ Users (both direct and downstream) should be made aware of the risks, biases, and limitations of the model. These include poor labels for PG13/R, caused by personal bias in the dataset labeling, and the fact that all training data consists of generated images.