Chapter 4 CatBoost

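Lastly, we set up and try CatBoost. Before fitting, the missing values in Arrival.Delay.in.Minutes are imputed with mice using predictive mean matching (pmm); the imputation log and summary are shown first.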

## 
##  iter imp variable
##   1   1  Arrival.Delay.in.Minutes
##   1   2  Arrival.Delay.in.Minutes
##   1   3  Arrival.Delay.in.Minutes
##   1   4  Arrival.Delay.in.Minutes
##   1   5  Arrival.Delay.in.Minutes
##   2   1  Arrival.Delay.in.Minutes
##   2   2  Arrival.Delay.in.Minutes
##   2   3  Arrival.Delay.in.Minutes
##   2   4  Arrival.Delay.in.Minutes
##   2   5  Arrival.Delay.in.Minutes
##   3   1  Arrival.Delay.in.Minutes
##   3   2  Arrival.Delay.in.Minutes
##   3   3  Arrival.Delay.in.Minutes
##   3   4  Arrival.Delay.in.Minutes
##   3   5  Arrival.Delay.in.Minutes
##   4   1  Arrival.Delay.in.Minutes
##   4   2  Arrival.Delay.in.Minutes
##   4   3  Arrival.Delay.in.Minutes
##   4   4  Arrival.Delay.in.Minutes
##   4   5  Arrival.Delay.in.Minutes
##   5   1  Arrival.Delay.in.Minutes
##   5   2  Arrival.Delay.in.Minutes
##   5   3  Arrival.Delay.in.Minutes
##   5   4  Arrival.Delay.in.Minutes
##   5   5  Arrival.Delay.in.Minutes
## Class: mids
## Number of multiple imputations:  5 
## Imputation methods:
##                                 X                                id 
##                                ""                                "" 
##                            Gender                     Customer.Type 
##                                ""                                "" 
##                               Age                    Type.of.Travel 
##                                ""                                "" 
##                             Class                   Flight.Distance 
##                                ""                                "" 
##             Inflight.wifi.service Departure.Arrival.time.convenient 
##                                ""                                "" 
##            Ease.of.Online.booking                     Gate.location 
##                                ""                                "" 
##                    Food.and.drink                   Online.boarding 
##                                ""                                "" 
##                      Seat.comfort            Inflight.entertainment 
##                                ""                                "" 
##                  On.board.service                  Leg.room.service 
##                                ""                                "" 
##                  Baggage.handling                   Checkin.service 
##                                ""                                "" 
##                  Inflight.service                       Cleanliness 
##                                ""                                "" 
##        Departure.Delay.in.Minutes          Arrival.Delay.in.Minutes 
##                                ""                             "pmm" 
##                      satisfaction 
##                                "" 
## PredictorMatrix:
##                X id Gender Customer.Type Age Type.of.Travel Class
## X              0  1      0             0   1              0     0
## id             1  0      0             0   1              0     0
## Gender         1  1      0             0   1              0     0
## Customer.Type  1  1      0             0   1              0     0
## Age            1  1      0             0   0              0     0
## Type.of.Travel 1  1      0             0   1              0     0
##                Flight.Distance Inflight.wifi.service
## X                            1                     1
## id                           1                     1
## Gender                       1                     1
## Customer.Type                1                     1
## Age                          1                     1
## Type.of.Travel               1                     1
##                Departure.Arrival.time.convenient Ease.of.Online.booking
## X                                              1                      1
## id                                             1                      1
## Gender                                         1                      1
## Customer.Type                                  1                      1
## Age                                            1                      1
## Type.of.Travel                                 1                      1
##                Gate.location Food.and.drink Online.boarding Seat.comfort
## X                          1              1               1            1
## id                         1              1               1            1
## Gender                     1              1               1            1
## Customer.Type              1              1               1            1
## Age                        1              1               1            1
## Type.of.Travel             1              1               1            1
##                Inflight.entertainment On.board.service Leg.room.service
## X                                   1                1                1
## id                                  1                1                1
## Gender                              1                1                1
## Customer.Type                       1                1                1
## Age                                 1                1                1
## Type.of.Travel                      1                1                1
##                Baggage.handling Checkin.service Inflight.service Cleanliness
## X                             1               1                1           1
## id                            1               1                1           1
## Gender                        1               1                1           1
## Customer.Type                 1               1                1           1
## Age                           1               1                1           1
## Type.of.Travel                1               1                1           1
##                Departure.Delay.in.Minutes Arrival.Delay.in.Minutes satisfaction
## X                                       1                        1            0
## id                                      1                        1            0
## Gender                                  1                        1            0
## Customer.Type                           1                        1            0
## Age                                     1                        1            0
## Type.of.Travel                          1                        1            0
## Number of logged events:  5 
##   it im dep     meth            out
## 1  0  0     constant         Gender
## 2  0  0     constant  Customer.Type
## 3  0  0     constant Type.of.Travel
## 4  0  0     constant          Class
## 5  0  0     constant   satisfaction

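The log above is what mice prints when a single column, Arrival.Delay.in.Minutes, is imputed by predictive mean matching with five imputations and five iterations. A minimal sketch of a call that would produce output of this shape follows. The helper names `make_methods` and `impute_arrival_delay`, the argument name `airline`, the seed, and the toy data are assumptions for illustration, not taken from the original code.

```r
# Hypothetical helper: build a mice method vector that imputes only
# one column with pmm and leaves every other column untouched.
make_methods <- function(df, target = "Arrival.Delay.in.Minutes") {
  # An empty string tells mice not to impute that column.
  meth <- setNames(rep("", ncol(df)), names(df))
  meth[target] <- "pmm"
  meth
}

# Hypothetical wrapper around mice; `airline` is an assumed name
# for the raw data frame. Not called here, so mice is optional.
impute_arrival_delay <- function(airline, seed = 123) {
  stopifnot(requireNamespace("mice", quietly = TRUE))
  imp <- mice::mice(airline, m = 5, maxit = 5,
                    method = make_methods(airline), seed = seed)
  print(imp)               # prints the "Class: mids" summary
  mice::complete(imp, 1)   # return the first completed data set
}

# Tiny stand-in data to show the method vector (not the real data).
toy <- data.frame(Age = c(30, 41, 25),
                  Departure.Delay.in.Minutes = c(0, 12, 3),
                  Arrival.Delay.in.Minutes = c(0, NA, 5))
meth <- make_methods(toy)
print(meth)
```

The five logged events marked constant typically indicate that those columns were stored as character strings rather than factors, so mice set them aside as predictors. Converting them with `as.factor()` before imputing is one way to avoid this.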
Details of the fitted CatBoost model, tuned over depth and iterations with 3-fold cross-validation:

## Catboost 
## 
## 103904 samples
##     22 predictor
##      2 classes: 'neutral.or.dissatisfied', 'satisfied' 
## 
## No pre-processing
## Resampling: Cross-Validated (3 fold) 
## Summary of sample sizes: 69270, 69270, 69268 
## Resampling results across tuning parameters:
## 
##   depth  iterations  Accuracy   Kappa    
##   4      100         0.9517343  0.9013730
##   4      200         0.9591161  0.9164248
##   4      300         0.9610121  0.9202933
##   6      100         0.9607523  0.9197698
##   6      200         0.9636492  0.9256961
##   6      300         0.9638320  0.9260832
##   8      100         0.9637262  0.9258584
##   8      200         0.9642266  0.9268986
##   8      300         0.9639186  0.9262829
## 
## Tuning parameter 'learning_rate' was held constant at a value of 0.1
## 
## Tuning parameter 'rsm' was held constant at a value of 0.95
## 
## Tuning parameter 'border_count' was held constant at a value of 64
## Accuracy was used to select the optimal model using the largest value.
## The final values used for the model were depth = 8, learning_rate =
##  0.1, iterations = 200, l2_leaf_reg = 0.1, rsm = 0.95 and border_count = 64.
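The table above corresponds to a 3 x 3 grid over depth and iterations, with the remaining four parameters fixed, evaluated by 3-fold cross-validation. A sketch of how such a search might be set up with caret and the catboost R package is shown here. The object names (`tune_grid`, `fit_catboost`, `x_train`, `y_train`) are assumptions; the parameter values are read off the printed output.

```r
# Grid read off the printed results: 3 depths x 3 iteration counts,
# with every other tuning parameter held at a single value.
tune_grid <- expand.grid(
  depth         = c(4, 6, 8),
  learning_rate = 0.1,
  iterations    = c(100, 200, 300),
  l2_leaf_reg   = 0.1,
  rsm           = 0.95,
  border_count  = 64
)

# Hypothetical wrapper (not called here, so caret and catboost are optional).
# x_train: data frame of the 22 predictors; y_train: factor of the two classes.
fit_catboost <- function(x_train, y_train, grid = tune_grid) {
  stopifnot(requireNamespace("caret", quietly = TRUE),
            requireNamespace("catboost", quietly = TRUE))
  ctrl <- caret::trainControl(method = "cv", number = 3)
  caret::train(x = x_train, y = y_train,
               method = catboost::catboost.caret,  # caret model spec shipped with catboost
               trControl = ctrl, tuneGrid = grid,
               metric = "Accuracy", logging_level = "Silent")
}

print(nrow(tune_grid))  # number of candidate models in the grid
```

The selected setting (depth = 8, iterations = 200) beats depth = 8 with 300 iterations by only about 0.0003 in cross-validated accuracy, so the differences among the deeper settings are small.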

Results of the CatBoost model (confusion matrix over 25,976 observations):

## Confusion Matrix and Statistics
## 
##                          Reference
## Prediction                neutral or dissatisfied satisfied
##   neutral or dissatisfied                   14263       625
##   satisfied                                   310     10778
##                                                
##                Accuracy : 0.964                
##                  95% CI : (0.9617, 0.9662)     
##     No Information Rate : 0.561                
##     P-Value [Acc > NIR] : < 0.00000000000000022
##                                                
##                   Kappa : 0.9267               
##                                                
##  Mcnemar's Test P-Value : < 0.00000000000000022
##                                                
##             Sensitivity : 0.9452               
##             Specificity : 0.9787               
##          Pos Pred Value : 0.9720               
##          Neg Pred Value : 0.9580               
##              Prevalence : 0.4390               
##          Detection Rate : 0.4149               
##    Detection Prevalence : 0.4269               
##       Balanced Accuracy : 0.9620               
##                                                
##        'Positive' Class : satisfied            
## 
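The summary statistics can be recomputed directly from the four cell counts, which is a useful check on which class is treated as positive. The sketch below uses only base R; the variable names are illustrative.

```r
# Cell counts copied from the confusion matrix above
# (rows = prediction, columns = reference); 'satisfied' is the positive class.
neg <- "neutral or dissatisfied"
pos <- "satisfied"
cm <- matrix(c(14263, 310, 625, 10778), nrow = 2,
             dimnames = list(Prediction = c(neg, pos),
                             Reference  = c(neg, pos)))

tp <- cm[pos, pos]   # predicted satisfied, actually satisfied
tn <- cm[neg, neg]   # predicted not satisfied, actually not satisfied
fp <- cm[pos, neg]   # predicted satisfied, actually not satisfied
fn <- cm[neg, pos]   # predicted not satisfied, actually satisfied
n  <- sum(cm)

accuracy    <- (tp + tn) / n
sensitivity <- tp / (tp + fn)
specificity <- tn / (tn + fp)
ppv         <- tp / (tp + fp)
npv         <- tn / (tn + fn)
nir         <- max(colSums(cm)) / n   # no-information rate

# Cohen's kappa: observed agreement versus agreement expected by chance.
p_chance    <- sum(rowSums(cm) * colSums(cm)) / n^2
cohen_kappa <- (accuracy - p_chance) / (1 - p_chance)

stats <- round(c(Accuracy = accuracy, Kappa = cohen_kappa, NIR = nir,
                 Sensitivity = sensitivity, Specificity = specificity,
                 PPV = ppv, NPV = npv,
                 BalancedAccuracy = (sensitivity + specificity) / 2), 4)
print(stats)
```

The two error types are unbalanced: 625 satisfied passengers were predicted as neutral or dissatisfied, against 310 errors in the other direction. This asymmetry is what the very small McNemar's test p-value reflects.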

The variable importance plot shows that inflight wifi service has very high importance, followed by type of travel and online boarding.