---
tags:
- reviews
- multi-class
- classifier
- text classification
- roberta-base
widget:
- text: "The room is looks better than what we have expected. Spacious and clean. The room is warm all the time and the beds are so soft and comfortable to sleep on. We dont get a lot of sleep in Tokyo, and sleeping on Minn's bed was the greatest thing ever.The hotel is near the JR and Nijojo Mae station & lots of supermarket.Thank you for having us. This place is highly recommended if you decide to stay in Kyoto."
- text: "We went for a weekend to be out in nature with our kids and a friend. The house is very cute inside and decorated nicely BUT the property photos leave out a house right next-door, so not private, a messy yard area w broken down sheds and construction, a gun range close by so all we could hear was gunshots all day, the kitchen cabinets esp the pantry were dirty and filled w junk and the hot tub was foggy, dirty and they must have just dumped a lot of bleach in rather than balancing the chemicals and cleaning it properly because everyone got rashes/eye irritation/headaches and had to get out and shower. The house really only sleeps five and you are stuck scrounging for pillows blankets and sheets and blowing up an aero bed for anyone else. The first one had a leak so we had to find a second and do it all again. We could not find a trundle bed. I really wanted to like it as cute as the pictures are but the real thing leaves a lot to be desired."
- text: "Was quiet and nice"
---

## Jupyter Notebooks

GitHub link: [lihuicham/airbnb-helpfulness-classifier](https://github.com/lihuicham/airbnb-helpfulness-classifier)

The fine-tuning Python code is in `finetuning.ipynb`.

## Team Members (S001 - Synthetic Expert Team E)

Li Hui Cham, Isaac Sparrow, Christopher Arraya, Nicholas Wong, Lei Zhang, Leonard Yang

## Description

This model is an Airbnb review helpfulness classifier.
It predicts the helpfulness of reviews on the Airbnb website, from most helpful (A) to least helpful (C).

## Pre-trained LLM

Our project fine-tuned [FacebookAI/roberta-base](https://huggingface.co/FacebookAI/roberta-base) for multi-class text (sequence) classification.

## Dataset

5000 samples were scraped from the Airbnb website based on `listing_id` from this [Kaggle AirBnB Listings & Reviews dataset](https://www.kaggle.com/datasets/mysarahmadbhat/airbnb-listings-reviews). Samples were translated from French to English.

Training set: 4560 samples, synthetically labelled by GPT-4 Turbo. Labelling cost approximately $60.

Test/evaluation set: 500 samples, labelled manually by two groups (each group labelled 250 samples), with majority vote applied. A scoring rubric (shown below) was used for labelling.

## Training Details

```
hyperparameters = {'learning_rate': 3e-05,
                   'per_device_train_batch_size': 16,
                   'weight_decay': 1e-04,
                   'num_train_epochs': 4,
                   'warmup_steps': 500}
```

We trained our model on Colab Pro, which cost us approximately 56 compute units.

## Slides

![image/png](https://cdn-uploads.huggingface.co/production/uploads/6622aad539b849b30889a466/VyDlefWdJI6mTHh6QPfSk.png)
![image/png](https://cdn-uploads.huggingface.co/production/uploads/6622aad539b849b30889a466/o0rpAVcsiGAsw1Tfnk05d.png)
![image/png](https://cdn-uploads.huggingface.co/production/uploads/6622aad539b849b30889a466/dh8ZbajbaU2xOu9NUkePm.png)
![image/png](https://cdn-uploads.huggingface.co/production/uploads/6622aad539b849b30889a466/eRsqmSSAF6OcTHj1o-zlJ.png)
![image/png](https://cdn-uploads.huggingface.co/production/uploads/6622aad539b849b30889a466/bghUlOv61-PFftjzxdDSE.png)
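## Example Usage

A minimal inference sketch using the `transformers` text-classification pipeline. The Hub repo id below mirrors the GitHub repository name and is an assumption, as is the mapping of class indices 0/1/2 to grades A/B/C; check the model's `config.json` (`id2label`) for the actual ordering.

```python
# NOTE: the repo id and the class-index-to-grade mapping below are
# assumptions for illustration, not confirmed by this model card.
ID2GRADE = {0: "A", 1: "B", 2: "C"}  # assumed: class 0 = most helpful

def grade_from_label(label: str) -> str:
    """Map a pipeline label such as 'LABEL_0' to a helpfulness grade."""
    return ID2GRADE[int(label.rsplit("_", 1)[-1])]

def classify_reviews(reviews, repo_id="lihuicham/airbnb-helpfulness-classifier"):
    """Return one helpfulness grade (A/B/C) per review string."""
    from transformers import pipeline  # lazy import; requires transformers + torch
    clf = pipeline("text-classification", model=repo_id)
    return [grade_from_label(pred["label"]) for pred in clf(list(reviews))]

# Example (downloads the model weights on first run):
# classify_reviews(["Was quiet and nice"])
```

If the fine-tuned model was saved with explicit label names instead of the default `LABEL_0`/`LABEL_1`/`LABEL_2`, the pipeline already returns grades directly and `grade_from_label` is unnecessary.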