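Chapter 4 CatBoost
Lastly, we set up and try CatBoost. Before fitting the model, the missing values in Arrival.Delay.in.Minutes are imputed with mice using predictive mean matching (pmm), with five imputations over five iterations.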


##
## iter imp variable
## 1 1 Arrival.Delay.in.Minutes
## 1 2 Arrival.Delay.in.Minutes
## 1 3 Arrival.Delay.in.Minutes
## 1 4 Arrival.Delay.in.Minutes
## 1 5 Arrival.Delay.in.Minutes
## 2 1 Arrival.Delay.in.Minutes
## 2 2 Arrival.Delay.in.Minutes
## 2 3 Arrival.Delay.in.Minutes
## 2 4 Arrival.Delay.in.Minutes
## 2 5 Arrival.Delay.in.Minutes
## 3 1 Arrival.Delay.in.Minutes
## 3 2 Arrival.Delay.in.Minutes
## 3 3 Arrival.Delay.in.Minutes
## 3 4 Arrival.Delay.in.Minutes
## 3 5 Arrival.Delay.in.Minutes
## 4 1 Arrival.Delay.in.Minutes
## 4 2 Arrival.Delay.in.Minutes
## 4 3 Arrival.Delay.in.Minutes
## 4 4 Arrival.Delay.in.Minutes
## 4 5 Arrival.Delay.in.Minutes
## 5 1 Arrival.Delay.in.Minutes
## 5 2 Arrival.Delay.in.Minutes
## 5 3 Arrival.Delay.in.Minutes
## 5 4 Arrival.Delay.in.Minutes
## 5 5 Arrival.Delay.in.Minutes
## Class: mids
## Number of multiple imputations: 5
## Imputation methods:
## X id
## "" ""
## Gender Customer.Type
## "" ""
## Age Type.of.Travel
## "" ""
## Class Flight.Distance
## "" ""
## Inflight.wifi.service Departure.Arrival.time.convenient
## "" ""
## Ease.of.Online.booking Gate.location
## "" ""
## Food.and.drink Online.boarding
## "" ""
## Seat.comfort Inflight.entertainment
## "" ""
## On.board.service Leg.room.service
## "" ""
## Baggage.handling Checkin.service
## "" ""
## Inflight.service Cleanliness
## "" ""
## Departure.Delay.in.Minutes Arrival.Delay.in.Minutes
## "" "pmm"
## satisfaction
## ""
## PredictorMatrix:
## X id Gender Customer.Type Age Type.of.Travel Class
## X 0 1 0 0 1 0 0
## id 1 0 0 0 1 0 0
## Gender 1 1 0 0 1 0 0
## Customer.Type 1 1 0 0 1 0 0
## Age 1 1 0 0 0 0 0
## Type.of.Travel 1 1 0 0 1 0 0
## Flight.Distance Inflight.wifi.service
## X 1 1
## id 1 1
## Gender 1 1
## Customer.Type 1 1
## Age 1 1
## Type.of.Travel 1 1
## Departure.Arrival.time.convenient Ease.of.Online.booking
## X 1 1
## id 1 1
## Gender 1 1
## Customer.Type 1 1
## Age 1 1
## Type.of.Travel 1 1
## Gate.location Food.and.drink Online.boarding Seat.comfort
## X 1 1 1 1
## id 1 1 1 1
## Gender 1 1 1 1
## Customer.Type 1 1 1 1
## Age 1 1 1 1
## Type.of.Travel 1 1 1 1
## Inflight.entertainment On.board.service Leg.room.service
## X 1 1 1
## id 1 1 1
## Gender 1 1 1
## Customer.Type 1 1 1
## Age 1 1 1
## Type.of.Travel 1 1 1
## Baggage.handling Checkin.service Inflight.service Cleanliness
## X 1 1 1 1
## id 1 1 1 1
## Gender 1 1 1 1
## Customer.Type 1 1 1 1
## Age 1 1 1 1
## Type.of.Travel 1 1 1 1
## Departure.Delay.in.Minutes Arrival.Delay.in.Minutes satisfaction
## X 1 1 0
## id 1 1 0
## Gender 1 1 0
## Customer.Type 1 1 0
## Age 1 1 0
## Type.of.Travel 1 1 0
## Number of logged events: 5
## it im dep meth out
## 1 0 0 constant Gender
## 2 0 0 constant Customer.Type
## 3 0 0 constant Type.of.Travel
## 4 0 0 constant Class
## 5 0 0 constant satisfaction
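The imputation output above is consistent with a default mice run: five imputations, five iterations, and pmm applied only to Arrival.Delay.in.Minutes, the one column with missing values. The five logged events with method constant are what mice typically reports for columns stored as character, and those columns then drop out of the predictor matrix. A minimal sketch of such a run is below. The toy data frame, its values, and the seed are assumptions made for illustration, not the original data or code.

```r
library(mice)

# Toy stand-in for the airline data (hypothetical values, not the real file).
set.seed(1)
n <- 200
toy <- data.frame(
  Class = sample(c("Business", "Eco", "Eco Plus"), n, replace = TRUE),
  Flight.Distance = round(runif(n, 100, 4000)),
  Departure.Delay.in.Minutes = rpois(n, 15),
  stringsAsFactors = FALSE
)
toy$Arrival.Delay.in.Minutes <- toy$Departure.Delay.in.Minutes + rpois(n, 3)
toy$Arrival.Delay.in.Minutes[sample(n, 20)] <- NA

# Convert character columns to factors so they can serve as predictors
# instead of being logged as "constant".
toy[] <- lapply(toy, function(col) if (is.character(col)) factor(col) else col)

# m = 5 imputations and maxit = 5 iterations, as in the output above.
# No method is given: pmm is the mice default for numeric columns.
imp <- mice(toy, m = 5, maxit = 5, seed = 123, printFlag = FALSE)

# Extract the first completed data set.
toy_complete <- complete(imp, 1)
```

The predictor matrix above also shows X and id being used as predictors. Since these look like row identifiers, dropping them before imputing is probably sensible.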
Details of the CatBoost model
## Catboost
##
## 103904 samples
## 22 predictor
## 2 classes: 'neutral.or.dissatisfied', 'satisfied'
##
## No pre-processing
## Resampling: Cross-Validated (3 fold)
## Summary of sample sizes: 69270, 69270, 69268
## Resampling results across tuning parameters:
##
## depth iterations Accuracy Kappa
## 4 100 0.9517343 0.9013730
## 4 200 0.9591161 0.9164248
## 4 300 0.9610121 0.9202933
## 6 100 0.9607523 0.9197698
## 6 200 0.9636492 0.9256961
## 6 300 0.9638320 0.9260832
## 8 100 0.9637262 0.9258584
## 8 200 0.9642266 0.9268986
## 8 300 0.9639186 0.9262829
##
## Tuning parameter 'learning_rate' was held constant at a value of 0.1
##
## Tuning parameter 'rsm' was held constant at a value of 0.95
## Tuning parameter 'border_count' was held constant at a value of 64
## Accuracy was used to select the optimal model using the largest value.
## The final values used for the model were depth = 8, learning_rate =
## 0.1, iterations = 200, l2_leaf_reg = 0.1, rsm = 0.95 and border_count = 64.
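The resampling table corresponds to a 3 x 3 grid over depth and iterations, with the other parameters fixed. The sketch below shows how such a grid could be passed to caret with the catboost.caret method. The helper name fit_catboost and its arguments train_x and train_y are hypothetical, and the original code may differ.

```r
# Grid matching the nine depth/iterations combinations in the table above;
# the remaining catboost.caret parameters are held constant.
grid <- expand.grid(
  depth = c(4, 6, 8),
  learning_rate = 0.1,
  iterations = c(100, 200, 300),
  l2_leaf_reg = 0.1,
  rsm = 0.95,
  border_count = 64
)

# Hypothetical helper: train_x is a data frame of predictors and train_y is a
# two-level factor. Calling it requires the caret and catboost packages.
fit_catboost <- function(train_x, train_y) {
  ctrl <- caret::trainControl(method = "cv", number = 3)  # 3-fold CV
  caret::train(
    x = train_x, y = train_y,
    method = catboost::catboost.caret,
    trControl = ctrl,
    tuneGrid = grid,
    logging_level = "Silent"  # passed through to catboost
  )
}
```

With the default classification metric, caret selects the row with the largest cross-validated accuracy, here depth = 8 and iterations = 200. The best few configurations differ by less than 0.001 in accuracy, so a shallower or shorter model would perform almost as well.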
Results of the CatBoost model
## Confusion Matrix and Statistics
##
## Reference
## Prediction neutral or dissatisfied satisfied
## neutral or dissatisfied 14263 625
## satisfied 310 10778
##
## Accuracy : 0.964
## 95% CI : (0.9617, 0.9662)
## No Information Rate : 0.561
## P-Value [Acc > NIR] : < 0.00000000000000022
##
## Kappa : 0.9267
##
## Mcnemar's Test P-Value : < 0.00000000000000022
##
## Sensitivity : 0.9452
## Specificity : 0.9787
## Pos Pred Value : 0.9720
## Neg Pred Value : 0.9580
## Prevalence : 0.4390
## Detection Rate : 0.4149
## Detection Prevalence : 0.4269
## Balanced Accuracy : 0.9620
##
## 'Positive' Class : satisfied
##
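The headline statistics can be recomputed by hand from the four counts, which is a quick way to confirm which class is treated as positive. A small base R sketch follows, with the counts copied from the confusion matrix above; the variable names are only illustrative.

```r
# Confusion matrix: rows are predictions, columns are reference labels.
labels <- c("neutral or dissatisfied", "satisfied")
cm <- matrix(
  c(14263, 310,    # reference = neutral or dissatisfied
    625, 10778),   # reference = satisfied
  nrow = 2,
  dimnames = list(Prediction = labels, Reference = labels)
)

# 'satisfied' is the positive class.
tp <- cm["satisfied", "satisfied"]
fn <- cm["neutral or dissatisfied", "satisfied"]
fp <- cm["satisfied", "neutral or dissatisfied"]
tn <- cm["neutral or dissatisfied", "neutral or dissatisfied"]
n  <- sum(cm)

accuracy    <- (tp + tn) / n
sensitivity <- tp / (tp + fn)
specificity <- tn / (tn + fp)

# Cohen's kappa: observed agreement corrected for chance agreement.
p_chance <- sum(rowSums(cm) * colSums(cm)) / n^2
kappa    <- (accuracy - p_chance) / (1 - p_chance)

round(c(accuracy = accuracy, sensitivity = sensitivity,
        specificity = specificity, kappa = kappa), 4)
```

The two kinds of error are not symmetric: 625 satisfied passengers are predicted as neutral or dissatisfied, against 310 in the other direction, which is what the small McNemar p-value reflects.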
In the variable importance plot, inflight wifi service has very high importance, followed by type of travel and online boarding.
