Chapter 1 Proposal

1.1 The data:

The Airline Passenger Satisfaction dataset from Kaggle: https://www.kaggle.com/datasets/teejmahal20/airline-passenger-satisfaction

Variables:

Gender: Gender of the passengers (Female, Male)

Customer Type: The customer type (Loyal customer, disloyal customer)

Age: The actual age of the passengers

Type of Travel: Purpose of the flight of the passengers (Personal Travel, Business Travel)

Class: Travel class in the plane of the passengers (Business, Eco, Eco Plus)

Flight distance: The flight distance of this journey

Inflight wifi service: Satisfaction level of the inflight wifi service (0: Not Applicable; 1-5)

Departure/Arrival time convenient: Satisfaction level of the departure/arrival time convenience

Ease of Online booking: Satisfaction level of online booking

Gate location: Satisfaction level of Gate location

Food and drink: Satisfaction level of Food and drink

Online boarding: Satisfaction level of online boarding

Seat comfort: Satisfaction level of Seat comfort

Inflight entertainment: Satisfaction level of inflight entertainment

On-board service: Satisfaction level of On-board service

Leg room service: Satisfaction level of Leg room service

Baggage handling: Satisfaction level of baggage handling

Check-in service: Satisfaction level of Check-in service

Inflight service: Satisfaction level of inflight service

Cleanliness: Satisfaction level of Cleanliness

Departure Delay in Minutes: Number of minutes the departure was delayed

Arrival Delay in Minutes: Number of minutes the arrival was delayed

Satisfaction: Airline satisfaction level (satisfied, neutral or dissatisfied)

1.2 Modeling goal:

This dataset contains an airline passenger satisfaction survey. We want to find out which factors are most strongly associated with a satisfied (or dissatisfied) passenger, and then model and predict passenger satisfaction.

1.3 The models we intend to use:

Random forest (white box), LightGBM (black box), KNN (black box), logistic regression (white box)
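As an illustration of the white-box end of this list, the following is a minimal sketch of a logistic regression fitted with base R's `glm`. The data are simulated for the example (the data frame `toy`, the outcome column `satisfied`, and the coefficients used to generate it are all assumptions, not values from the survey); only the column names `Online.boarding` and `Type.of.Travel` are borrowed from the dataset.

```r
# Simulated toy data; NOT the Kaggle survey, only an illustration.
set.seed(42)
n <- 500
toy <- data.frame(
  Online.boarding = sample(1:5, n, replace = TRUE),
  Type.of.Travel  = factor(sample(c("Business travel", "Personal Travel"),
                                  n, replace = TRUE))
)

# Assumed data-generating process: higher boarding score and business
# travel both raise the probability of being satisfied.
lin <- -3 + 0.9 * toy$Online.boarding +
  1.0 * (toy$Type.of.Travel == "Business travel")
toy$satisfied <- rbinom(n, 1, plogis(lin))

# Logistic regression: the coefficients are directly interpretable.
fit <- glm(satisfied ~ Online.boarding + Type.of.Travel,
           data = toy, family = binomial)

# Exponentiated coefficients are odds ratios.
odds_ratios <- exp(coef(fit))

# In-sample accuracy at a 0.5 probability cut-off.
pred_class <- as.integer(predict(fit, type = "response") > 0.5)
accuracy <- mean(pred_class == toy$satisfied)
```

An odds ratio above 1 for a predictor means that higher values of that predictor go with higher odds of satisfaction, which is the kind of direct reading that the black-box models in the list do not offer.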

The output of the plot_missing function shows that the arrival delay variable has some missing values. We use the mice package to deal with them. The result from mice shows that all the missing values were imputed, using predictive mean matching (pmm).

## 
##  iter imp variable
##   1   1  Arrival.Delay.in.Minutes
##   1   2  Arrival.Delay.in.Minutes
##   1   3  Arrival.Delay.in.Minutes
##   1   4  Arrival.Delay.in.Minutes
##   1   5  Arrival.Delay.in.Minutes
##   2   1  Arrival.Delay.in.Minutes
##   2   2  Arrival.Delay.in.Minutes
##   2   3  Arrival.Delay.in.Minutes
##   2   4  Arrival.Delay.in.Minutes
##   2   5  Arrival.Delay.in.Minutes
##   3   1  Arrival.Delay.in.Minutes
##   3   2  Arrival.Delay.in.Minutes
##   3   3  Arrival.Delay.in.Minutes
##   3   4  Arrival.Delay.in.Minutes
##   3   5  Arrival.Delay.in.Minutes
##   4   1  Arrival.Delay.in.Minutes
##   4   2  Arrival.Delay.in.Minutes
##   4   3  Arrival.Delay.in.Minutes
##   4   4  Arrival.Delay.in.Minutes
##   4   5  Arrival.Delay.in.Minutes
##   5   1  Arrival.Delay.in.Minutes
##   5   2  Arrival.Delay.in.Minutes
##   5   3  Arrival.Delay.in.Minutes
##   5   4  Arrival.Delay.in.Minutes
##   5   5  Arrival.Delay.in.Minutes
## Class: mids
## Number of multiple imputations:  5 
## Imputation methods:
##                                 X                                id 
##                                ""                                "" 
##                            Gender                     Customer.Type 
##                                ""                                "" 
##                               Age                    Type.of.Travel 
##                                ""                                "" 
##                             Class                   Flight.Distance 
##                                ""                                "" 
##             Inflight.wifi.service Departure.Arrival.time.convenient 
##                                ""                                "" 
##            Ease.of.Online.booking                     Gate.location 
##                                ""                                "" 
##                    Food.and.drink                   Online.boarding 
##                                ""                                "" 
##                      Seat.comfort            Inflight.entertainment 
##                                ""                                "" 
##                  On.board.service                  Leg.room.service 
##                                ""                                "" 
##                  Baggage.handling                   Checkin.service 
##                                ""                                "" 
##                  Inflight.service                       Cleanliness 
##                                ""                                "" 
##        Departure.Delay.in.Minutes          Arrival.Delay.in.Minutes 
##                                ""                             "pmm" 
##                      satisfaction 
##                                "" 
## PredictorMatrix:
##                X id Gender Customer.Type Age Type.of.Travel Class
## X              0  1      0             0   1              0     0
## id             1  0      0             0   1              0     0
## Gender         1  1      0             0   1              0     0
## Customer.Type  1  1      0             0   1              0     0
## Age            1  1      0             0   0              0     0
## Type.of.Travel 1  1      0             0   1              0     0
##                Flight.Distance Inflight.wifi.service
## X                            1                     1
## id                           1                     1
## Gender                       1                     1
## Customer.Type                1                     1
## Age                          1                     1
## Type.of.Travel               1                     1
##                Departure.Arrival.time.convenient Ease.of.Online.booking
## X                                              1                      1
## id                                             1                      1
## Gender                                         1                      1
## Customer.Type                                  1                      1
## Age                                            1                      1
## Type.of.Travel                                 1                      1
##                Gate.location Food.and.drink Online.boarding Seat.comfort
## X                          1              1               1            1
## id                         1              1               1            1
## Gender                     1              1               1            1
## Customer.Type              1              1               1            1
## Age                        1              1               1            1
## Type.of.Travel             1              1               1            1
##                Inflight.entertainment On.board.service Leg.room.service
## X                                   1                1                1
## id                                  1                1                1
## Gender                              1                1                1
## Customer.Type                       1                1                1
## Age                                 1                1                1
## Type.of.Travel                      1                1                1
##                Baggage.handling Checkin.service Inflight.service Cleanliness
## X                             1               1                1           1
## id                            1               1                1           1
## Gender                        1               1                1           1
## Customer.Type                 1               1                1           1
## Age                           1               1                1           1
## Type.of.Travel                1               1                1           1
##                Departure.Delay.in.Minutes Arrival.Delay.in.Minutes satisfaction
## X                                       1                        1            0
## id                                      1                        1            0
## Gender                                  1                        1            0
## Customer.Type                           1                        1            0
## Age                                     1                        1            0
## Type.of.Travel                          1                        1            0
## Number of logged events:  5 
##   it im dep     meth            out
## 1  0  0     constant         Gender
## 2  0  0     constant  Customer.Type
## 3  0  0     constant Type.of.Travel
## 4  0  0     constant          Class
## 5  0  0     constant   satisfaction
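
The imputation step can be sketched as follows. This is a minimal example on simulated data, not the call that produced the output above: the data frame `toy` and the seed are assumptions. The logged events above flag Gender, Customer.Type and the other text columns as `constant`, which commonly happens when such columns are stored as character strings, so the sketch converts them to factors first. It also drops the row identifier from the predictors, on the assumption that `id` carries no information about delays.

```r
library(mice)

# Simulated toy data; NOT the Kaggle survey.
set.seed(1)
n <- 200
toy <- data.frame(
  id = seq_len(n),
  Gender = sample(c("Female", "Male"), n, replace = TRUE),
  Departure.Delay.in.Minutes = round(rexp(n, rate = 1 / 15)),
  stringsAsFactors = FALSE
)
toy$Arrival.Delay.in.Minutes <-
  pmax(0, toy$Departure.Delay.in.Minutes + round(rnorm(n, sd = 5)))
toy$Arrival.Delay.in.Minutes[sample(n, 10)] <- NA  # inject missing values

# Convert character columns to factors so mice can use them as predictors.
is_chr <- vapply(toy, is.character, logical(1))
toy[is_chr] <- lapply(toy[is_chr], factor)

# Rows of the predictor matrix are targets, columns are predictors;
# zeroing the "id" column stops id from predicting anything.
pred <- make.predictorMatrix(toy)
pred[, "id"] <- 0

# Five imputations; the default method for numeric columns is pmm.
imp <- mice(toy, m = 5, predictorMatrix = pred, seed = 123, printFlag = FALSE)

# Extract the first completed data set.
completed <- complete(imp, 1)
```

After a run, `imp$method` shows which method was applied to each column and `imp$loggedEvents` lists any variables that mice dropped or altered.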

We visualize the numerical variables below.

Here we check the distribution of the score variables, which are more categorical than numerical. We also display histograms to show the distribution of the character variables. The genders of the customers are quite balanced. The majority of trips are business travel. There are more neutral or dissatisfied reviews than satisfied ones, but not by a large margin.
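
Checks of this kind can be sketched with base R's `table` and `prop.table`. The six-row data frame `toy` and the helper `share` are made up for illustration; only the column names and category labels follow the dataset.

```r
# Tiny hand-made example; NOT the Kaggle survey.
toy <- data.frame(
  Gender = c("Female", "Male", "Female", "Male", "Female", "Male"),
  satisfaction = c("satisfied", "neutral or dissatisfied",
                   "neutral or dissatisfied", "satisfied",
                   "neutral or dissatisfied", "neutral or dissatisfied"),
  Inflight.wifi.service = c(3, 0, 5, 2, 3, 4)
)

# Hypothetical helper: share of each category in a column.
share <- function(x) prop.table(table(x))

gender_share       <- share(toy$Gender)
satisfaction_share <- share(toy$satisfaction)

# Score variables take the values 0-5, so they are counted per level;
# fixing the levels keeps scores that never occur in the table as zeros.
wifi_counts <- table(factor(toy$Inflight.wifi.service, levels = 0:5))
```

Passing any of these tables to `barplot()` draws the corresponding bar chart.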

##        X              Inflight.wifi.service Departure.Arrival.time.convenient
##  Min.   :-0.0043215   Min.   :-0.00249      Min.   :-0.004861                
##  1st Qu.:-0.0008216   1st Qu.: 0.12121      1st Qu.: 0.011893                
##  Median : 0.0007393   Median : 0.13472      Median : 0.070119                
##  Mean   : 0.0669499   Mean   : 0.26709      Mean   : 0.176147                
##  3rd Qu.: 0.0016382   3rd Qu.: 0.34005      3rd Qu.: 0.218589                
##  Max.   : 1.0000000   Max.   : 1.00000      Max.   : 1.000000                
##  Ease.of.Online.booking Gate.location       Food.and.drink     
##  Min.   :0.001913       Min.   :-0.035428   Min.   :-0.002162  
##  1st Qu.:0.030944       1st Qu.:-0.002494   1st Qu.: 0.032185  
##  Median :0.038833       Median : 0.002313   Median : 0.059073  
##  Mean   :0.224940       Mean   : 0.145529   Mean   : 0.233672  
##  3rd Qu.:0.420518       3rd Qu.: 0.170661   3rd Qu.: 0.404512  
##  Max.   :1.000000       Max.   : 1.000000   Max.   : 1.000000  
##  Online.boarding     Seat.comfort       Inflight.entertainment
##  Min.   :0.001002   Min.   :0.0000435   Min.   :-0.004861     
##  1st Qu.:0.078926   1st Qu.:0.0496160   1st Qu.: 0.083950     
##  Median :0.204462   Median :0.1226578   Median : 0.299691     
##  Mean   :0.256455   Mean   :0.2683176   Mean   : 0.339342     
##  3rd Qu.:0.367796   3rd Qu.:0.4973836   3rd Qu.: 0.515371     
##  Max.   :1.000000   Max.   :1.0000000   Max.   : 1.000000     
##  On.board.service   Leg.room.service    Baggage.handling     Checkin.service   
##  Min.   :-0.02837   Min.   :-0.005873   Min.   :-0.0005263   Min.   :-0.03543  
##  1st Qu.: 0.06398   1st Qu.: 0.064434   1st Qu.: 0.0554440   1st Qu.: 0.06525  
##  Median : 0.13197   Median : 0.123950   Median : 0.0957927   Median : 0.15314  
##  Mean   : 0.25072   Mean   : 0.212240   Mean   : 0.2433687   Mean   : 0.18395  
##  3rd Qu.: 0.38782   3rd Qu.: 0.327593   3rd Qu.: 0.3738770   3rd Qu.: 0.21879  
##  Max.   : 1.00000   Max.   : 1.000000   Max.   : 1.0000000   Max.   : 1.00000  
##  Inflight.service      Cleanliness      
##  Min.   :-0.0001341   Min.   :-0.00383  
##  1st Qu.: 0.0522451   1st Qu.: 0.05248  
##  Median : 0.0887792   Median : 0.12322  
##  Mean   : 0.2451461   Mean   : 0.27344  
##  3rd Qu.: 0.3867554   3rd Qu.: 0.49464  
##  Max.   : 1.0000000   Max.   : 1.00000

The correlation matrix shows that arrival delay and departure delay are highly correlated, which is expected: if a flight takes off late, it is very likely to arrive late as well. The other numerical variables are not correlated with each other, as expected. Variables that fall into the “customer in-flight experience” area, such as food and drink, in-flight entertainment and seat comfort, are positively correlated with each other. The reason might be that airline companies manage these factors under one department or plan: if they decide to improve the food and drink, they might improve the in-flight entertainment as well. It is surprising, however, that in-flight service shows no correlation with cleanliness.
Variables such as WiFi service, online booking and gate location also tend to be positively correlated. We speculate that larger airline companies might have better online booking sites, get access to better gate locations, and so on.
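
A correlation check like the one described above can be sketched in base R. The data below are simulated so that the two delay variables move together; the data frame `toy` and the 0.7 cut-off are assumptions made for the example.

```r
# Simulated toy data; NOT the Kaggle survey.
set.seed(7)
n <- 300
dep <- rexp(n, rate = 1 / 15)
toy <- data.frame(
  Departure.Delay.in.Minutes = dep,
  Arrival.Delay.in.Minutes   = pmax(0, dep + rnorm(n, sd = 3)),
  Flight.Distance            = runif(n, 100, 4000),
  Age                        = sample(18:80, n, replace = TRUE)
)

# Pearson correlation matrix of the numerical columns.
cor_mat <- cor(toy, use = "pairwise.complete.obs")

# List each pair once (upper triangle) whose |r| exceeds the cut-off.
threshold <- 0.7
idx <- which(abs(cor_mat) > threshold & upper.tri(cor_mat), arr.ind = TRUE)
high_pairs <- data.frame(
  var1 = rownames(cor_mat)[idx[, "row"]],
  var2 = colnames(cor_mat)[idx[, "col"]],
  r    = cor_mat[idx]
)
```

For the 0-5 score variables, which are ordinal rather than continuous, `cor(..., method = "spearman")` may be a more suitable choice than the default Pearson coefficient.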