Add README.md
README.md
ADDED
@@ -0,0 +1,44 @@
## Jupyter Notebooks
GitHub repository: [lihuicham/airbnb-helpfulness-classifier](https://github.com/lihuicham/airbnb-helpfulness-classifier)

The Python fine-tuning code is in `finetuning.ipynb`.

## Team Members (S001 - Synthetic Expert Team E)
Li Hui Cham, Isaac Sparrow, Christopher Arraya, Nicholas Wong, Lei Zhang, Leonard Yang

## Description
This model classifies the helpfulness of AirBnB reviews. It predicts a helpfulness grade for each review on the AirBnB website, from most helpful (A) to least helpful (C).

## Pre-trained LLM
Our project fine-tuned [FacebookAI/roberta-base](https://huggingface.co/FacebookAI/roberta-base) for multi-class text (sequence) classification.
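
As a rough illustration of what multi-class sequence classification means here, the sketch below turns three raw scores (logits) into one helpfulness class. The label order `["A", "B", "C"]`, the intermediate class `B`, the example logits, and the helper names `softmax` and `predict_label` are assumptions made for illustration; they are not taken from `finetuning.ipynb`.

```python
import math

# Assumed label order. The README names only the endpoints:
# A (most helpful) and C (least helpful).
LABELS = ["A", "B", "C"]


def softmax(logits):
    """Convert raw scores into probabilities that sum to 1."""
    peak = max(logits)  # subtract the max for numerical stability
    exps = [math.exp(x - peak) for x in logits]
    total = sum(exps)
    return [e / total for e in exps]


def predict_label(logits):
    """Return the most probable label together with all class probabilities."""
    probs = softmax(logits)
    best = max(range(len(probs)), key=probs.__getitem__)
    return LABELS[best], probs


# Hypothetical logits for a single review
label, probs = predict_label([2.1, 0.3, -1.2])
print(label, [round(p, 3) for p in probs])
```

In a fine-tuned model the logits would come from the classification head on top of the encoder; only the final mapping from scores to a class is sketched here.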

## Dataset
5000 samples were scraped from the AirBnB website based on the `listing_id` values in this [Kaggle AirBnB Listings & Reviews dataset](https://www.kaggle.com/datasets/mysarahmadbhat/airbnb-listings-reviews). Samples were translated from French to English.

Training set: 4560 samples synthetically labelled by GPT-4 Turbo, at a cost of approximately $60.

Test/evaluation set: 500 samples labelled manually by two groups (each group labelled 250 samples), with the majority vote deciding each label. A scoring rubric (shown below) was used for labelling.
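
The majority-vote step for the manual labels can be sketched as follows. This is a minimal sketch, not the team's actual procedure: the function name `majority_label`, the three-annotator example, and the tie policy (return `None` so the sample can be resolved by discussion) are all assumptions, since the README does not describe how ties were handled.

```python
from collections import Counter


def majority_label(votes):
    """Return the label chosen by most annotators, or None on a tie."""
    if not votes:
        raise ValueError("at least one vote is required")
    counts = Counter(votes).most_common()
    # A tie for first place has no majority (assumed policy: resolve manually).
    if len(counts) > 1 and counts[0][1] == counts[1][1]:
        return None
    return counts[0][0]


# Hypothetical votes from three annotators in one group
print(majority_label(["A", "B", "A"]))  # two of three chose A
print(majority_label(["A", "B", "C"]))  # no majority, so None
```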

## Training Details
```
hyperparameters = {'learning_rate': 3e-05,
                   'per_device_train_batch_size': 16,
                   'weight_decay': 1e-04,
                   'num_train_epochs': 4,
                   'warmup_steps': 500}
```

We trained the model on Colab Pro, which cost approximately 56 compute units.
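
To put the hyperparameters above in context, the sketch below works out how many optimizer steps they imply for the 4560-sample training set. It assumes a single device, no gradient accumulation, that all 4560 samples are used for training, and a linear warmup from zero to the peak learning rate; none of these is confirmed by the README, and `TRAIN_SAMPLES` and `warmup_lr` are names introduced here for illustration.

```python
import math

hyperparameters = {'learning_rate': 3e-05,
                   'per_device_train_batch_size': 16,
                   'weight_decay': 1e-04,
                   'num_train_epochs': 4,
                   'warmup_steps': 500}

TRAIN_SAMPLES = 4560  # training set size from the Dataset section

# Assumes one device and no gradient accumulation.
steps_per_epoch = math.ceil(
    TRAIN_SAMPLES / hyperparameters['per_device_train_batch_size'])
total_steps = steps_per_epoch * hyperparameters['num_train_epochs']
warmup_fraction = hyperparameters['warmup_steps'] / total_steps


def warmup_lr(step):
    """Learning rate under an assumed linear warmup.

    Only the warmup phase is modelled: after `warmup_steps` the peak
    rate is returned, and any later decay is ignored.
    """
    peak = hyperparameters['learning_rate']
    return peak * min(1.0, step / hyperparameters['warmup_steps'])


print(steps_per_epoch, total_steps, round(warmup_fraction, 2))  # 285 1140 0.44
print(warmup_lr(250))  # halfway through warmup: half the peak rate
```

Under these assumptions the 500 warmup steps cover roughly 44% of the 1140 total steps, so the model spends a large share of training below the peak learning rate.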

## Slides









