NimaZahedinameghi committed on
Commit 1075e78
1 Parent(s): 0999119

Update README.md

Files changed (1)
  1. README.md +90 -3
README.md CHANGED
@@ -36,14 +36,42 @@ It achieves the following results on the evaluation set:
 
  ## Intended uses & limitations
 
- More information needed
 
  ## Training and evaluation data
 
- More information needed
 
  ## Training procedure
 
  ### Training hyperparameters
 
  The following hyperparameters were used during training:
@@ -78,4 +106,63 @@ The following hyperparameters were used during training:
  - Transformers 4.42.4
  - Pytorch 2.3.1+cu121
  - Datasets 2.19.1
- - Tokenizers 0.19.1
 
  ## Intended uses & limitations
 
+ This model is intended for:
+ - Analyzing workplace incident descriptions
+ - Providing structured hazard classifications
+ - Identifying hazard sources and types
+ - Generating keywords for querying incident databases
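The last intended use, querying an incident database with generated keywords, can be sketched as follows. This is a minimal illustration, assuming the keywords have already been extracted from the model's output into a Python list; the `incidents` table, its columns, the sample rows, and the `find_incidents` helper are all hypothetical. It uses the standard-library `sqlite3` module with one parameterized `LIKE` clause per keyword.

```python
import sqlite3


def find_incidents(conn: sqlite3.Connection, keywords: list[str]) -> list[tuple[int, str]]:
    """Return incidents whose description contains any keyword (hypothetical schema)."""
    if not keywords:
        return []
    # One parameterized LIKE clause per keyword, combined with OR.
    # Note: '%' or '_' inside a keyword would act as a LIKE wildcard.
    clauses = " OR ".join("description LIKE ?" for _ in keywords)
    params = [f"%{kw}%" for kw in keywords]
    query = f"SELECT id, description FROM incidents WHERE {clauses} ORDER BY id"
    return conn.execute(query, params).fetchall()


# Usage with an in-memory database and made-up rows
conn = sqlite3.connect(":memory:")
conn.execute("CREATE TABLE incidents (id INTEGER PRIMARY KEY, description TEXT)")
conn.executemany(
    "INSERT INTO incidents (id, description) VALUES (?, ?)",
    [
        (1, "Forklift collided with storage rack after wheel caught on pallet wrap"),
        (2, "Worker slipped on wet floor near loading dock"),
        (3, "Boxes fell from upper rack level"),
    ],
)
matches = find_incidents(conn, ["forklift", "rack"])
print(matches)
```

SQLite's `LIKE` is case-insensitive for ASCII letters by default, so "forklift" also matches "Forklift".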
 
  ## Training and evaluation data
 
+ The model was fine-tuned on a custom dataset (`incident_descriptions.json`) of workplace safety reports. Each entry in the dataset contains:
+ - An instruction
+ - An incident description
+ - A structured output containing the hazard classification
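The exact schema of `incident_descriptions.json` is not documented in this card. The sketch below assumes, hypothetically, a JSON array of objects with Alpaca-style keys `instruction`, `input` (the incident description), and `output` (the structured classification), a layout Axolotl can consume; the key names, the sample record, and the `load_incident_records` helper are illustrative and should be adjusted to the real file. It loads the records and rejects any entry with a missing or empty field.

```python
import json
import os
import tempfile

# Hypothetical field names (Alpaca-style); the real file may use different keys.
REQUIRED_KEYS = ("instruction", "input", "output")


def load_incident_records(path: str) -> list[dict]:
    """Load a JSON array of records and check that each has non-empty required fields."""
    with open(path, encoding="utf-8") as f:
        records = json.load(f)
    if not isinstance(records, list):
        raise ValueError("expected a JSON array of records")
    for i, record in enumerate(records):
        if not isinstance(record, dict):
            raise ValueError(f"record {i} is not a JSON object")
        missing = [key for key in REQUIRED_KEYS if not record.get(key)]
        if missing:
            raise ValueError(f"record {i} has missing or empty fields: {missing}")
    return records


# Usage with a made-up one-record file
sample = [
    {
        "instruction": "Classify the hazard in this incident description.",
        "input": "A worker slipped on a wet floor near the loading dock.",
        "output": "Hazard type: slip (illustrative placeholder)",
    }
]
with tempfile.NamedTemporaryFile("w", suffix=".json", delete=False, encoding="utf-8") as tmp:
    json.dump(sample, tmp)
records = load_incident_records(tmp.name)
os.remove(tmp.name)
print(f"loaded {len(records)} record(s)")
```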
 
  ## Training procedure
 
+ The model was fine-tuned with the Axolotl framework from the `mistralai/Mistral-7B-v0.1` base model, loaded in 8-bit via bitsandbytes. Key entries from the model configuration:
+
+ ```json
+ {
+   "_name_or_path": "mistralai/Mistral-7B-v0.1",
+   "architectures": ["MistralForCausalLM"],
+   "attention_dropout": 0.0,
+   "hidden_size": 4096,
+   "num_attention_heads": 32,
+   "num_hidden_layers": 32,
+   "num_key_value_heads": 8,
+   "quantization_config": {
+     "load_in_8bit": true,
+     "quant_method": "bitsandbytes"
+   },
+   "torch_dtype": "bfloat16",
+   "transformers_version": "4.42.4",
+   "use_cache": false
+ }
+ ```
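A few quantities follow directly from the configuration values above. The plain-Python sketch below (values copied from the JSON) computes the per-head dimension, the number of query heads that share each key/value head under grouped-query attention (`num_key_value_heads` < `num_attention_heads`), and the key/value cache size per token at bfloat16 precision (2 bytes per value). The cache figure applies whenever key/value caching is enabled during generation; it is an illustration of the arithmetic, not a measured number.

```python
# Values copied from the model configuration above.
config = {
    "hidden_size": 4096,
    "num_attention_heads": 32,
    "num_hidden_layers": 32,
    "num_key_value_heads": 8,
}
BYTES_PER_VALUE = 2  # bfloat16 stores each value in 2 bytes

head_dim = config["hidden_size"] // config["num_attention_heads"]
# Grouped-query attention: each key/value head is shared by this many query heads.
queries_per_kv_head = config["num_attention_heads"] // config["num_key_value_heads"]

# Keys and values (factor 2) for every layer and every key/value head.
kv_bytes_per_token = (
    2 * config["num_hidden_layers"] * config["num_key_value_heads"] * head_dim * BYTES_PER_VALUE
)
# For comparison: the same cache if every query head had its own key/value head.
mha_kv_bytes_per_token = (
    2 * config["num_hidden_layers"] * config["num_attention_heads"] * head_dim * BYTES_PER_VALUE
)

print(f"head_dim = {head_dim}")  # 128
print(f"query heads per KV head = {queries_per_kv_head}")  # 4
print(f"KV cache per token = {kv_bytes_per_token // 1024} KiB")  # 128 KiB
print(f"without GQA = {mha_kv_bytes_per_token // 1024} KiB")  # 512 KiB
```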
+
  ### Training hyperparameters
 
  The following hyperparameters were used during training:
@@ -78,4 +106,63 @@ The following hyperparameters were used during training:
  - Transformers 4.42.4
  - Pytorch 2.3.1+cu121
  - Datasets 2.19.1
+ - Tokenizers 0.19.1
+
+ ## How to Use
+
+ Here's how you can use this model for workplace hazard identification:
+
+ ```python
+ from transformers import AutoTokenizer, AutoModelForCausalLM
+ import torch
+
+ # Load model and tokenizer (device_map="auto" requires the `accelerate` package)
+ model_name = "NimaZahedinameghi/WHI"
+ tokenizer = AutoTokenizer.from_pretrained(model_name)
+ model = AutoModelForCausalLM.from_pretrained(model_name, torch_dtype=torch.float16, device_map="auto")
+
+ # Prepare the input
+ instruction = "Given an incident description from a workplace safety report, analyze the text and provide a structured hazard classification. Your response should include the hazard source (broken down into three levels of granularity), the general hazard type, and keywords for database querying related to the incident. Ensure your classification is specific and accurately reflects the details provided in the incident description."
+ incident_description = "During the night shift, a worker was operating a forklift in the warehouse. While maneuvering between storage racks, the forklift's rear wheel caught on a piece of loose pallet wrap on the floor. This caused the forklift to swerve suddenly, colliding with a nearby rack. The impact dislodged several heavy boxes from the upper levels, which fell and narrowly missed the worker. The worker managed to stop the forklift and exit safely, but was visibly shaken by the near-miss incident."
+
+ # Combine instruction and input
+ input_text = f"{instruction}\n\nIncidentDescription: {incident_description}"
+
+ # Tokenize; calling the tokenizer directly also returns the attention mask
+ inputs = tokenizer(input_text, return_tensors="pt").to(model.device)
+
+ # Generate; max_new_tokens limits only the generated text, not the (long) prompt
+ output = model.generate(
+     **inputs,
+     max_new_tokens=500,
+     do_sample=True,
+     temperature=0.7,
+     pad_token_id=tokenizer.eos_token_id,  # the base Mistral tokenizer defines no pad token
+ )
+
+ # Decode only the newly generated tokens, skipping the echoed prompt
+ prompt_length = inputs["input_ids"].shape[-1]
+ result = tokenizer.decode(output[0][prompt_length:], skip_special_tokens=True)
+ print(result)
+ ```
+
+ This code prints the structured hazard classification that the model generates for the given incident description.
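When classifying many reports, it can help to factor the prompt construction into a function so every request uses exactly the same layout. The sketch below reproduces the prompt format from the usage example above (instruction, a blank line, then `IncidentDescription:` followed by the text); the `build_prompt` name, the shortened instruction, and the sample reports are illustrative, and the full instruction text from the usage example should be used in practice.

```python
def build_prompt(instruction: str, incident_description: str) -> str:
    """Combine an instruction and an incident description in the README's prompt layout."""
    return f"{instruction}\n\nIncidentDescription: {incident_description}"


# Illustrative, shortened instruction; use the full instruction text in practice.
INSTRUCTION = (
    "Given an incident description from a workplace safety report, analyze the text "
    "and provide a structured hazard classification."
)

# Made-up reports for demonstration
reports = [
    "A worker slipped on a wet floor near the loading dock.",
    "Boxes fell from an upper rack after a forklift collision.",
]
prompts = [build_prompt(INSTRUCTION, report) for report in reports]
print(prompts[1])
```

Each string in `prompts` can then be tokenized and passed to `model.generate` as in the usage example.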
+
+ ## Limitations and Biases
+
+ - The model's performance is limited by the quality and diversity of its training data.
+ - It may not accurately classify hazards in incident types or settings that are not represented in the training data.
+ - The model should not be used as the sole basis for safety decisions; always consult safety professionals.
+
+ ## Ethical Considerations
+
+ When using this model, consider:
+ - Privacy: Ensure that incident descriptions do not contain personally identifiable information.
+ - Accountability: The model's outputs should be reviewed by qualified safety professionals.
+ - Bias: Be aware of potential biases in the training data that could affect the model's classifications.
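One way to act on the privacy point is to redact obvious identifiers before a description is sent to the model. The sketch below is a minimal regex-based example that covers only email addresses and phone-number-like digit runs; the patterns, the `redact_pii` name, and the sample report are illustrative assumptions. Regexes like these miss names and other identifiers and may also flag long digit sequences such as dates, so they complement rather than replace a manual review.

```python
import re

# Illustrative patterns only: common email and phone formats, not names or IDs.
EMAIL_RE = re.compile(r"\b[\w.+-]+@[\w-]+(?:\.[\w-]+)+\b")
PHONE_RE = re.compile(r"(?<!\w)\+?\d[\d\s().-]{7,}\d(?!\w)")


def redact_pii(text: str) -> str:
    """Replace email addresses and phone-like numbers with placeholder tokens."""
    text = EMAIL_RE.sub("[EMAIL]", text)
    return PHONE_RE.sub("[PHONE]", text)


# Made-up report for demonstration
report = (
    "Supervisor (j.doe@example.com, 555-123-4567) reported that a forklift "
    "struck rack 12 during the night shift."
)
print(redact_pii(report))
# Supervisor ([EMAIL], [PHONE]) reported that a forklift struck rack 12 during the night shift.
```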
+
+ ## Citation
+
+ If you use this model in your research, please cite:
+
+ ```bibtex
+ @misc{WHI2023,
+   author = {Nima Zahedinameghi},
+   title = {WHI: Workplace Hazard Identification Model},
+   year = {2023},
+   publisher = {HuggingFace},
+   journal = {HuggingFace Hub},
+   howpublished = {\url{https://huggingface.co/NimaZahedinameghi/WHI}},
+ }
+ ```