CE 316: Applied Probability and Statistics in CEE

Homework 8

Due: December 5 at 10:00 PM

Problem 1:

Upload the Neth.CSV data file to Google Colab. Estimate a linear regression model of the number of weekly trips per household as a function of the remaining variables (to the extent possible). In the final model, identify which variables contribute to an increase in the number of weekly trips and which contribute to a decrease.

In estimating the linear regression model, complete the following steps:

• Run a multiple linear regression that includes all variables.

• Check for multicollinearity and remove the appropriate variables.

• Check whether all variables are significant, and remove the non-significant variables one at a time.

• Report the R² and adjusted R² of the final model.

• Check if the mean of the residuals is close to zero. Comment on the mean.

• Plot the histogram of the standardized residuals. Comment on whether the standardized residuals look normal.

• Check for outliers.

• Plot the residuals vs fitted values and comment.

The variable definitions are:

• HHSIZE household size

• NCAR number of cars in household

• HEMPSTS number of workers in household

• HSTUDEN number of students in household

• HTTRPS number of weekly trips per household

• NUCHLT12 number of children < 12 years in household

• CITY household residence in city (dummy variable)

• SUBURB household residence in suburb (dummy variable)

• RURAL household residence in rural area (dummy variable)

• INCOME continuous household income value

• NUCHGT12 number of children >= 12 yrs in household
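
The phrase "to the extent possible" matters for the three residence dummies: if CITY + SUBURB + RURAL equals 1 for every household, including all three alongside an intercept makes the design matrix rank-deficient, and one of them is usually left out as the reference category. The sketch below shows one way to check this before fitting. The function names are hypothetical, and the lowercase column names are assumed from the header row of the data listing.

```python
import numpy as np
import pandas as pd


def dummies_sum_to_one(df: pd.DataFrame,
                       cols=("city", "suburb", "rural")) -> bool:
    """True if the listed dummy variables add up to exactly 1 in every row."""
    return bool((df[list(cols)].sum(axis=1) == 1).all())


def design_is_full_rank(X: pd.DataFrame) -> bool:
    """True if [intercept, X] has full column rank (no exact linear dependence)."""
    Xc = np.column_stack([np.ones(len(X)), X.to_numpy(dtype=float)])
    return bool(np.linalg.matrix_rank(Xc) == Xc.shape[1])
```

If dummies_sum_to_one(df) is True, design_is_full_rank applied to all three dummies returns False, while dropping any one of them restores full rank. The coefficients of the remaining dummies are then read relative to the omitted category. The same rank check can be applied to the whole predictor set to look for other exact dependencies, for example between household size and the counts of household members.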


hhsize ncar hempsts hstuden httrps nuchlt12 city suburb rural income nuchgt12
1 4 1 0 18 0 1 0 0 20500 0
1 0 1 0 31 0 1 0 0 31000 0
3 0 1 0 55 1 1 0 0 43000 0
1 0 1 0 38 0 1 0 0 12000 0
2 0 0 0 22 0 1 0 0 31000 0
1 0 1 0 34 0 1 0 0 31000 0
1 0 1 0 13 0 1 0 0 20500 0
1 0 1 0 25 0 1 0 0 20500 0
1 0 1 0 20 0 1 0 0 31000 0
2 0 0 0 8 0 1 0 0 31000 0
3 0 0 0 18 2 1 0 0 12000 0
1 0 0 0 10 0 1 0 0 12000 0
1 0 0 0 7 0 1 0 0 12000 0
1 0 0 1 22 0 1 0 0 12000 0
2 0 0 0 50 0 1 0 0 31000 0
1 0 1 0 12 0 1 0 0 20500 0
1 0 0 1 32 0 1 0 0 12000 0
1 0 0 0 9 0 1 0 0 12000 0
1 0 1 0 18 0 1 0 0 20500 0
1 0 1 0 38 0 1 0 0 12000 0
1 0 1 0 47 0 1 0 0 12000 0
1 0 0 1 22 0 1 0 0 12000 0
1 0 1 0 30 0 1 0 0 31000 0
2 0 0 0 29 0 1 0 0 31000 0
1 0 0 1 29 0 1 0 0 12000 0
1 0 0 0 11 0 1 0 0 12000 0
2 0 0 1 41 1 1 0 0 12000 0
4 0 0 0 47 2 1 0 0 20500 0
1 0 0 0 16 0 1 0 0 20500 0
1 0 0 1 24 0 1 0 0 12000 0
5 0 0 0 19 4 1 0 0 12000 0
1 0 0 0 15 0 1 0 0 12000 0
1 0 0 1 24 0 1 0 0 12000 0
3 0 1 0 35 1 1 0 0 31000 0
1 0 1 0 24 0 1 0 0 12000 0
6 0 1 4 52 0 1 0 0 31000 4
1 0 0 1 22 0 1 0 0 12000 0
1 0 0 1 14 0 1 0 0 12000 0
1 0 0 0 8 0 1 0 0 12000 0
1 0 0 1 23 0 1 0 0 12000 0
1 0 0 1 17 0 1 0 0 12000 0
1 0 0 1 46 0 1 0 0 12000 0
1 0 1 0 16 0 1 0 0 31000 0
1 0 0 1 48 0 1 0 0 12000 0
1 0 0 1 48 0 1 0 0 12000 0
1 0 0 0 29 0 1 0 0 12000 0
1 0 0 1 26 0 1 0 0 12000 0
1 0 0 1 28 0 1 0 0 12000 0
2 0 2 0 18 0 1 0 0 31000 0
1 0 0 1 25 0 1 0 0 12000 0
1 0 0 1 24 0 1 0 0 12000 0
1 0 0 1 38 0 1 0 0 12000 0
1 0 0 0 22 0 0 1 0 12000 0
3 0 0 2 45 0 0 1 0 12000 2
1 0 0 0 20 0 0 1 0 20500 0
1 0 0 0 14 0 0 1 0 12000 0
8 0 1 3 109 3 0 1 0 31000 3
2 0 0 0 46 0 0 1 0 31000 0
2 0 0 0 10 0 0 1 0 12000 0
2 0 0 0 10 0 0 1 0 31000 0
2 0 0 0 12 0 0 1 0 31000 0
3 0 2 0 102 0 0 1 0 20500 1
1 0 1 0 30 0 0 1 0 31000 0
2 0 0 0 24 0 0 1 0 31000 1
1 0 0 0 15 0 0 1 0 12000 0
3 0 2 0 24 1 0 1 0 31000 0
3 0 2 0 54 0 0 1 0 31000 1
7 0 2 2 70 2 0 1 0 31000 3
1 0 0 0 11 0 0 1 0 31000 0
1 0 0 0 16 0 0 1 0 12000 0
1 0 0 0 12 0 0 1 0 12000 0
7 0 0 0 65 5 0 1 0 12000 0
2 0 0 0 29 0 0 1 0 31000 0
3 0 2 0 38 0 0 1 0 43000 1
1 0 0 0 12 0 0 1 0 12000 0
1 0 0 0 10 0 0 1 0 12000 0
1 0 0 0 20 0 0 1 0 12000 0
1 0 0 0 24 0 0 1 0 12000 0
1 0 0 0 15 0 0 1 0 12000 0
2 0 2 0 45 0 0 1 0 12000 0
1 0 0 1 29 0 1 0 0 12000 0
2 0 0 2 62 0 1 0 0 12000 0
5 0 1 1 55 2 1 0 0 20500 1
4 0 0 1 47 2 1 0 0 20500 0
1 0 0 0 29 0 1 0 0 12000 0
6 0 0 1 90 3 1 0 0 1200

