---
license: apache-2.0
base_model: sentence-transformers/all-mpnet-base-v2
tags:
- generated_from_trainer
metrics:
- accuracy
model-index:
- name: IKT_classifier_netzero_best
  results: []
widget:
- text: "We have put forth a long-term low- emissions development strategy (LEDS) that aspires to halve emissions from its peak to 33 MtCO2e by 2050, with a view to achieving net-zero emissions as soon as viable in the second half of the century. This will require serious and concerted efforts across our industry, economy and society"
  example_title: NET-ZERO
- text: "Unconditional Contribution In the unconditional scenario, GHG emissions would be reduced by 27.56 Mt CO2e (6.73%) below BAU in 2030 in the respective sectors. 26.3 Mt CO2e (95.4%) of this emission reduction will be from the Energy sector while 0.64 (2.3%) and 0.6 (2.2%) Mt CO2e reduction will be from AFOLU (agriculture) and waste sector respectively. There will be no reduction in the IPPU sector. Conditional Contribution In the conditional scenario, GHG emissions would be reduced by 61.9 Mt CO2e (15.12%) below BAU in 2030 in the respective sectors."
  example_title: TARGET_FREE
- text: "This land is buffered from the sea by the dyke and a network of drains and pumps will control the water levels in the polder. We have raised the minimum platform levels for new developments from 3m to 4m above the Singapore Height Datum (SHD) since 2011. Presently, critical infrastructure on existing coastal land, notably Changi Airport Terminal 5 and Tuas Port, will be constructed with platform levels at least 5m above SHD."
  example_title: NEGATIVE
---

# IKT_classifier_netzero_best

This model is a fine-tuned version of [sentence-transformers/all-mpnet-base-v2](https://huggingface.co/sentence-transformers/all-mpnet-base-v2) on the [GIZ/policy_qa_v0_1](https://huggingface.co/datasets/GIZ/policy_qa_v0_1) dataset.
It achieves the following results on the evaluation set:
- Loss: 0.4126
- Precision Macro: 0.9246
- Precision Weighted: 0.9248
- Recall Macro: 0.9209
- Recall Weighted: 0.9211
- F1-score: 0.9219
- Accuracy: 0.9211

## Model description

The model is a multi-class text classifier based on [sentence-transformers/all-mpnet-base-v2](https://huggingface.co/sentence-transformers/all-mpnet-base-v2) and fine-tuned on text sourced from national climate policy documents.

## Intended uses & limitations

The classifier assigns one of the classes 'NEGATIVE', 'TARGET_FREE', or 'NET-ZERO' to denote **alignment with Net-Zero targets** in passages extracted from the documents. It is intended for climate policy researchers and analysts seeking to automate the review of lengthy, non-standardized PDF documents to produce summaries and reports.

On the evaluation set used during training, the classifier performed well overall (F1 ~ 0.92). Precision (~0.92, i.e., few false positives) and recall (~0.92, i.e., most true positives are captured) were closely balanced. On real-world, unseen test data, performance remained high (F1 ~ 0.85). However, this out-of-sample test set was small, so classification performance should be evaluated further on deployment.

## Training and evaluation data

The training dataset consists of labelled passages from two sources:
- [ClimateWatch NDC Sector data](https://www.climatewatchdata.org/data-explorer/historical-emissions?historical-emissions-data-sources=climate-watch&historical-emissions-gases=all-ghg&historical-emissions-regions=All%20Selected&historical-emissions-sectors=total-including-lucf%2Ctotal-including-lucf&page=1).
- [IKI TraCS Climate Strategies for Transport Tracker](https://changing-transport.org/wp-content/uploads/20220722_Tracker_Database.xlsx), implemented by GIZ and funded by the International Climate Initiative (IKI) of the German Federal Ministry for Economic Affairs and Climate Action (BMWK). Here we utilized the QA dataset (CW_NDC_data_Sector).

The combined dataset [GIZ/policy_qa_v0_1](https://huggingface.co/datasets/GIZ/policy_qa_v0_1) contains ~85k rows. Each passage appears in three versions with varying sequence lengths, denoted by the values 'small', 'medium', and 'large' in the 'strategy' column, which correspond to sequence lengths of 60, 85, and 150 respectively. The useful size of the dataset is therefore one third of its row count, and the 'strategy' value should be selected based on the use case. For this training, we utilized the 'medium' samples. Furthermore, for each row, the 'context' column contains 3 samples of varying quality. The approach used to assess quality and select samples is described below.

The pre-processing operations used to produce the final training dataset were as follows:

1. The dataset is filtered on the 'medium' value in the 'strategy' column (sequence length = 85).
2. For ClimateWatch, all rows are removed, as the labels inherent to this dataset were assessed to have no taxonomic alignment with the IKITracs labels. For IKITracs, labels are assigned based on the presence of certain substrings in the 'parameter' values, which correspond to assessments of Net-Zero targets by human annotators. The specific assignments are as follows:
   > - 'NET-ZERO': target_labels = ['T_Netzero','T_Netzero_C']
   > - 'NEGATIVE': target_labels_neg = ['T_Economy_C','T_Economy_Unc','T_Adaptation_C','T_Adaptation_Unc','T_Transport_C','T_Transport_O_C','T_Transport_O_Unc','T_Transport_Unc']
   > - 'TARGET_FREE': random sample of other (non-target) labeled data
3. If 'context_translated' is available and the 'language' is not English, 'context' is replaced with 'context_translated'.
4. The dataset is "exploded": the text samples in the 'context' column, which are lists, are converted into separate rows, and labels are merged to align with the associated samples.
5. The 'match_onanswer' and 'answerWordcount' columns are used conditionally to select high-quality samples: a high percentage of word matches in 'match_onanswer' is preferred, but a lower percentage is accepted if 'answerWordcount' is high.
6. The data is then augmented using sentence shuffle from the ```albumentations``` library and NLP-based insertions using ```nlpaug```. This increases the number of training samples available for the Net-Zero class from 62 to 124.

The end result is an almost equal per-class sample breakdown of:
> - 'NET-ZERO': 124
> - 'NEGATIVE': 126
> - 'TARGET_FREE': 125

## Training procedure

The model hyperparameters were tuned using ```optuna``` over 10 trials on a truncated training and validation dataset. The model was then trained over 5 epochs using the best hyperparameters identified.
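Pre-processing steps 1 to 5 listed under *Training and evaluation data* can be sketched in plain Python. This is a minimal illustration, not the pipeline actually used for this model. It assumes each row is a dict carrying the columns named in this card plus a hypothetical 'source' field, treats 'match_onanswer' as a list of per-sample scores aligned with 'context', assumes English is coded as 'en', and uses made-up quality thresholds because the real cut-offs are not documented. Augmentation (step 6) is omitted.

```python
import random
from typing import Optional

# Label groups copied from step 2 of the pre-processing description.
NET_ZERO_PARAMS = ["T_Netzero", "T_Netzero_C"]
NEGATIVE_PARAMS = [
    "T_Economy_C", "T_Economy_Unc", "T_Adaptation_C", "T_Adaptation_Unc",
    "T_Transport_C", "T_Transport_O_C", "T_Transport_O_Unc", "T_Transport_Unc",
]

# HYPOTHETICAL thresholds for step 5; the real values are not documented.
PREFERRED_MATCH = 0.9   # preferred share of answer words found in the sample
FALLBACK_MATCH = 0.5    # lower share accepted when the answer is long
LONG_ANSWER = 20        # what counts as a "high" answerWordcount


def assign_label(parameter: str) -> Optional[str]:
    """Map a 'parameter' value to a class by substring, or None if non-target."""
    if any(p in parameter for p in NET_ZERO_PARAMS):
        return "NET-ZERO"
    if any(p in parameter for p in NEGATIVE_PARAMS):
        return "NEGATIVE"
    return None


def keep_sample(match: float, answer_wordcount: int) -> bool:
    """Prefer high word-match scores, but accept lower ones for long answers."""
    if match >= PREFERRED_MATCH:
        return True
    return match >= FALLBACK_MATCH and answer_wordcount >= LONG_ANSWER


def preprocess(rows, n_target_free, seed=42):
    labelled, others = [], []
    for row in rows:
        # Step 1: keep only the 'medium' (sequence length 85) variant.
        if row["strategy"] != "medium":
            continue
        # Step 2: drop ClimateWatch rows ('source' is an assumed field name).
        if row["source"] == "ClimateWatch":
            continue
        # Step 3: use the translated context for non-English rows when present.
        contexts = row["context"]
        if row.get("context_translated") and row["language"] != "en":
            contexts = row["context_translated"]
        label = assign_label(row["parameter"])
        # Steps 4-5: explode the list into one sample per text, filtered by quality.
        for text, match in zip(contexts, row["match_onanswer"]):
            if not keep_sample(match, row["answerWordcount"]):
                continue
            if label is None:
                others.append(text)
            else:
                labelled.append({"text": text, "label": label})
    # TARGET_FREE: a random sample of the remaining non-target texts.
    rng = random.Random(seed)
    for text in rng.sample(others, min(n_target_free, len(others))):
        labelled.append({"text": text, "label": "TARGET_FREE"})
    return labelled
```

Here `n_target_free` is a free parameter; choosing it close to the size of the other classes mirrors the roughly balanced breakdown reported above.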
### Training hyperparameters

The following hyperparameters were used during training:
- learning_rate: 9.588722322096848e-05
- train_batch_size: 3
- eval_batch_size: 3
- seed: 42
- optimizer: Adam with betas=(0.9,0.999) and epsilon=1e-08
- lr_scheduler_type: linear
- lr_scheduler_warmup_steps: 400.0
- num_epochs: 5

### Training results

| Training Loss | Epoch | Step | Validation Loss | Precision Macro | Precision Weighted | Recall Macro | Recall Weighted | F1-score | Accuracy |
|:-------------:|:-----:|:----:|:---------------:|:---------------:|:------------------:|:------------:|:---------------:|:--------:|:--------:|
| No log | 1.0 | 113 | 0.7402 | 0.8808 | 0.8847 | 0.8697 | 0.8684 | 0.8694 | 0.8684 |
| No log | 2.0 | 226 | 0.8484 | 0.84 | 0.8358 | 0.6752 | 0.6842 | 0.6675 | 0.6842 |
| No log | 3.0 | 339 | 0.3188 | 0.9209 | 0.9229 | 0.9209 | 0.9211 | 0.9200 | 0.9211 |
| No log | 4.0 | 452 | 0.5524 | 0.8889 | 0.8925 | 0.8718 | 0.8684 | 0.8689 | 0.8684 |
| 0.5553 | 5.0 | 565 | 0.4126 | 0.9246 | 0.9248 | 0.9209 | 0.9211 | 0.9219 | 0.9211 |

### Framework versions

- Transformers 4.31.0
- Pytorch 2.0.1+cu118
- Datasets 2.13.1
- Tokenizers 0.13.3
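The results table reports both macro and weighted averages. The dependency-free sketch below shows how the two differ, assuming the columns follow the usual definitions used by scikit-learn's `precision_recall_fscore_support` (`average='macro'` vs `average='weighted'`). The card does not state which averaging its F1-score column uses, and this is not the evaluation code used for this model.

```python
from collections import Counter


def per_class_prf(y_true, y_pred, labels):
    """Per-class (precision, recall, F1); zero denominators yield 0.0."""
    stats = {}
    for c in labels:
        tp = sum(1 for t, p in zip(y_true, y_pred) if t == c and p == c)
        fp = sum(1 for t, p in zip(y_true, y_pred) if t != c and p == c)
        fn = sum(1 for t, p in zip(y_true, y_pred) if t == c and p != c)
        precision = tp / (tp + fp) if tp + fp else 0.0
        recall = tp / (tp + fn) if tp + fn else 0.0
        f1 = 2 * precision * recall / (precision + recall) if precision + recall else 0.0
        stats[c] = (precision, recall, f1)
    return stats


def summarize(y_true, y_pred):
    """Macro and weighted averages (assumed scikit-learn-style), plus accuracy."""
    labels = sorted(set(y_true) | set(y_pred))
    stats = per_class_prf(y_true, y_pred, labels)
    support = Counter(y_true)  # number of true samples per class
    n = len(y_true)
    result = {}
    for i, name in enumerate(["precision", "recall", "f1"]):
        # Macro: unweighted mean over classes.
        result[f"{name}_macro"] = sum(stats[c][i] for c in labels) / len(labels)
        # Weighted: mean weighted by each class's true support.
        result[f"{name}_weighted"] = sum(stats[c][i] * support[c] for c in labels) / n
    result["accuracy"] = sum(t == p for t, p in zip(y_true, y_pred)) / n
    return result
```

Under these definitions, weighted recall reduces to (total true positives) / (total samples), which is exactly accuracy for single-label multi-class classification. This is consistent with the identical Recall Weighted and Accuracy columns in every row of the table.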