Lab 03

Introduction

Many college courses conclude by giving students the opportunity to evaluate the course and the instructor anonymously. However, the use of these student evaluations as an indicator of course quality and teaching effectiveness is often criticized because these measures may reflect the influence of non-teaching-related characteristics, such as the physical appearance of the instructor. The article “Beauty in the classroom: instructors’ pulchritude and putative pedagogical productivity” by Hamermesh and Parker found that instructors who are viewed as better looking receive higher instructional ratings. In this lab, you will analyze the data from this study in order to learn what goes into a positive professor evaluation. You will also use the GGally package for visualizing relationships among many variable pairs at once and the broom package to tidy your regression outputs. We will start with a brief review of simple linear regression and then extend it to multiple regression (adding more predictor variables to the model).

Creating a Quarto file

  • Create a new Quarto document with the title Multiple Linear Regression. Change the output format to pdf (note that it is set to html by default). Refer to lab_00_Guide if you don’t remember how to create a new Quarto file. Save the file as lab_03.

  • Note that if you created the file correctly, it should appear under Files with a .qmd extension (i.e., lab_03.qmd). If you do not have this file exactly as described, please STOP and make sure you have done it correctly before you proceed.

  • After correctly creating the file, click on Render to see the output in pdf format. Note that it may pop up in a new window. This step just ensures that your document was created properly and that you are able to generate a pdf, the format in which you will submit your lab.

Packages

You will need the following packages for this lab: openintro, tidyverse, statsr, GGally, and broom. Recall that we have previously installed all these packages. All we need to do is load them into our workspace. Use the code below:

library(openintro)
library(tidyverse)
library(statsr)
library(broom)
library(GGally)

Be sure to run the packages code chunk above to ensure that they are all loaded correctly. Remember to use the include=F option (i.e., {r, include=F}) to prevent the output from loading packages from showing up in your rendered report.
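For example, if you prefer Quarto's comment-style chunk options, the package-loading chunk in lab_03.qmd might look like the sketch below (the classic header style {r, include=F} described above works just as well):

````
```{r}
#| include: false
library(openintro)
library(tidyverse)
library(statsr)
library(broom)
library(GGally)
```
````

With include: false, the chunk still runs when you render, but neither the code nor its output (e.g., package start-up messages) appears in the pdf.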

Loading (and viewing) Data

We will use a data frame called evals contained in the openintro package. The data were gathered from end-of-semester student evaluations for a large sample of professors from the University of Texas at Austin. In addition, six students rated the professors’ physical appearance (bty). The result is a data frame where each row represents a different course and the columns represent variables about the courses and professors.

Since we have already loaded openintro, we can load the data by running the data command. See below:

Code
data(evals)

Once you run the chunk above, you should be able to see a new object called evals in the environment area. Click on it to examine the data.

You can learn more about the data by running the following command in the console:

?evals

Multiple Linear Regression Modeling

Let us start by creating a full model (we can name it m_full) that predicts the professor evaluation score (score) based on the following predictors: rank, gender, ethnicity, language (language of the university where they got their degree), age, cls_perc_eval (proportion of students that filled out evaluations), cls_students (number of students in the class), cls_level (course level), cls_profs (number of professors), cls_credits (number of credits), and bty_avg (average beauty rating). Note that some of these variables are categorical and others are numerical. You can learn more about these variables, including how they were measured, by checking the documentation (run ?evals in the console).
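It may help to recall how lm() treats categorical predictors: each factor is expanded into indicator (dummy) variables, one for each level beyond the baseline. The toy example below (using a made-up three-level factor, not the evals data) shows the columns R actually fits:

```r
# A toy data frame with a three-level factor, mirroring a predictor like rank
toy <- data.frame(rank = factor(c("teaching", "tenure track", "tenured")))

# model.matrix() shows the design matrix lm() would use: an intercept plus
# one 0/1 indicator column for each level other than the baseline ("teaching")
model.matrix(~ rank, data = toy)
```

So a single categorical predictor with k levels contributes k - 1 rows to the tidy() coefficient table, which is why the number of coefficients can exceed the number of predictors you typed.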

Before running the code below, answer the following question:

How many variables do you expect to be in the multiple regression model? How do you know?

Code
m_full <- lm(score ~ rank
                  + gender 
                  + ethnicity 
                  + language 
                  + age 
                  + cls_perc_eval 
                  + cls_students 
                  + cls_level 
                  + cls_profs 
                  + cls_credits 
                  + bty_avg, data = evals)
tidy(m_full)
# A tibble: 13 × 5
   term                   estimate std.error statistic  p.value
   <chr>                     <dbl>     <dbl>     <dbl>    <dbl>
 1 (Intercept)            3.53      0.241       14.7   4.65e-40
 2 ranktenure track      -0.107     0.0820      -1.30  1.93e- 1
 3 ranktenured           -0.0450    0.0652      -0.691 4.90e- 1
 4 gendermale             0.179     0.0515       3.47  5.79e- 4
 5 ethnicitynot minority  0.187     0.0775       2.41  1.63e- 2
 6 languagenon-english   -0.127     0.108       -1.17  2.41e- 1
 7 age                   -0.00665   0.00308     -2.16  3.15e- 2
 8 cls_perc_eval          0.00570   0.00155      3.67  2.68e- 4
 9 cls_students           0.000445  0.000358     1.24  2.15e- 1
10 cls_levelupper         0.0187    0.0556       0.337 7.37e- 1
11 cls_profssingle       -0.00858   0.0514      -0.167 8.67e- 1
12 cls_creditsone credit  0.509     0.117        4.35  1.70e- 5
13 bty_avg                0.0613    0.0167       3.67  2.68e- 4

Questions:

  • How many variables are in the model? Is this what you expected?

  • Interpret the coefficient associated with the ethnicity variable.

Model Pruning

The above model is called a full model because it contains all of the predictors. The full model is not always the best. We can improve on it by dropping predictors that are not adding value to the model. We will use one of the stepwise selection methods (backward elimination) to prune the model, with the goal of increasing the adjusted \(R^2\).
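As a reminder, the adjusted \(R^2\) modifies \(R^2\) with a penalty for the number of predictors, so it only increases when a predictor pulls its weight:

\[
R^2_{adj} = 1 - \frac{(1 - R^2)(n - 1)}{n - p - 1},
\]

where \(n\) is the number of observations and \(p\) is the number of estimated coefficients (excluding the intercept). This penalty is why, unlike the plain \(R^2\), the adjusted \(R^2\) can be used to compare models with different numbers of predictors.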

Step 1

We will start with the full model (m_full) and its adjusted \(R^2\). Run the glance function to check the adjusted \(R^2\) of the model above (m_full). Interpret the value in context.

Code
glance(m_full)
# A tibble: 1 × 12
  r.squared adj.r.squared sigma statistic  p.value    df logLik   AIC   BIC
      <dbl>         <dbl> <dbl>     <dbl>    <dbl> <dbl>  <dbl> <dbl> <dbl>
1     0.164         0.141 0.504      7.33 2.41e-12    12  -333.  694.  752.
# ℹ 3 more variables: deviance <dbl>, df.residual <int>, nobs <int>

The output shows that the adjusted \(R^2\) is \(14.12\%\). We use this number as our baseline and drop predictors one at a time, each time checking the change in the adjusted \(R^2\). Our new model will be the one that yields the largest improvement in the adjusted \(R^2\).
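We will do this drop-one bookkeeping by hand below, which is good practice. As a side note, the same comparison can be scripted; the sketch below is purely illustrative and uses the built-in mtcars data (not evals) so it runs on its own:

```r
# Fit a small "full" model on the built-in mtcars data
full <- lm(mpg ~ wt + hp + cyl, data = mtcars)

# For each predictor, refit the model without it and record adjusted R-squared
preds <- attr(terms(full), "term.labels")
adj_r2 <- sapply(preds, function(p) {
  reduced <- update(full, as.formula(paste(". ~ . -", p)))
  summary(reduced)$adj.r.squared
})

adj_r2                       # adjusted R-squared of each reduced model
summary(full)$adj.r.squared  # baseline to compare against
```

The predictor whose removal gives the highest adjusted \(R^2\) (relative to the baseline) is the one backward elimination would drop first.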

Let us start by dropping rank (we can call the model without rank rm_rank, meaning “remove rank”). We also tidy and glance the model. See the code below:

Code
rm_rank <- lm(score ~ gender 
                    + ethnicity 
                    + language 
                    + age 
                    + cls_perc_eval 
                    + cls_students 
                    + cls_level 
                    + cls_profs 
                    + cls_credits 
                    + bty_avg, data = evals)
#tidy(rm_rank)
glance(rm_rank)
# A tibble: 1 × 12
  r.squared adj.r.squared sigma statistic  p.value    df logLik   AIC   BIC
      <dbl>         <dbl> <dbl>     <dbl>    <dbl> <dbl>  <dbl> <dbl> <dbl>
1     0.160         0.142 0.504      8.63 5.78e-13    10  -334.  692.  742.
# ℹ 3 more variables: deviance <dbl>, df.residual <int>, nobs <int>

Does dropping rank lead to a significant improvement in the adjusted \(R^2\)?

Next, we want to drop gender. See below:

Code
rm_gender <- lm(score ~ rank 
                      + ethnicity 
                      + language 
                      + age 
                      + cls_perc_eval 
                      + cls_students 
                      + cls_level 
                      + cls_profs 
                      + cls_credits 
                      + bty_avg, data = evals)
#tidy(rm_gender)
glance(rm_gender)
# A tibble: 1 × 12
  r.squared adj.r.squared sigma statistic  p.value    df logLik   AIC   BIC
      <dbl>         <dbl> <dbl>     <dbl>    <dbl> <dbl>  <dbl> <dbl> <dbl>
1     0.141         0.120 0.510      6.74 1.64e-10    11  -339.  704.  758.
# ℹ 3 more variables: deviance <dbl>, df.residual <int>, nobs <int>

Comment on the effect of dropping gender from the model. Is dropping it a good idea?

Next, let us drop ethnicity:

Code
rm_ethnicity <- lm(score ~ rank
                          + gender 
                          + language 
                          + age 
                          + cls_perc_eval 
                          + cls_students 
                          + cls_level 
                          + cls_profs 
                          + cls_credits 
                          + bty_avg, data = evals)
#tidy(rm_ethnicity)
glance(rm_ethnicity)
# A tibble: 1 × 12
  r.squared adj.r.squared sigma statistic  p.value    df logLik   AIC   BIC
      <dbl>         <dbl> <dbl>     <dbl>    <dbl> <dbl>  <dbl> <dbl> <dbl>
1     0.153         0.132 0.507      7.39 1.10e-11    11  -336.  698.  752.
# ℹ 3 more variables: deviance <dbl>, df.residual <int>, nobs <int>

What can you comment about the effect of dropping ethnicity?

Next, drop language:

Code
rm_language <- lm(score ~ rank
                        + ethnicity
                        + gender 
                        + age 
                        + cls_perc_eval 
                        + cls_students 
                        + cls_level 
                        + cls_profs 
                        + cls_credits 
                        + bty_avg, data = evals)
#tidy(rm_language)
glance(rm_language)
# A tibble: 1 × 12
  r.squared adj.r.squared sigma statistic  p.value    df logLik   AIC   BIC
      <dbl>         <dbl> <dbl>     <dbl>    <dbl> <dbl>  <dbl> <dbl> <dbl>
1     0.161         0.140 0.504      7.87 1.52e-12    11  -334.  694.  747.
# ℹ 3 more variables: deviance <dbl>, df.residual <int>, nobs <int>

What is the effect of dropping language?

Next, drop age:

Code
rm_age <- lm(score ~ rank
                  + ethnicity
                  + gender
                  + language
                  + cls_perc_eval 
                  + cls_students 
                  + cls_level 
                  + cls_profs 
                  + cls_credits 
                  + bty_avg, data = evals)
glance(rm_age)
# A tibble: 1 × 12
  r.squared adj.r.squared sigma statistic  p.value    df logLik   AIC   BIC
      <dbl>         <dbl> <dbl>     <dbl>    <dbl> <dbl>  <dbl> <dbl> <dbl>
1     0.155         0.134 0.506      7.51 6.58e-12    11  -336.  697.  751.
# ℹ 3 more variables: deviance <dbl>, df.residual <int>, nobs <int>

Comment on the effect of dropping age:

Next, drop cls_perc_eval (i.e., proportion of students that filled out evaluations):

Code
rm_cls_perc_eval <- lm(score ~ rank
                              + ethnicity
                              + age
                              + gender
                              + language
                              + cls_students 
                              + cls_level 
                              + cls_profs 
                              + cls_credits 
                              + bty_avg, data = evals)
glance(rm_cls_perc_eval)
# A tibble: 1 × 12
  r.squared adj.r.squared sigma statistic  p.value    df logLik   AIC   BIC
      <dbl>         <dbl> <dbl>     <dbl>    <dbl> <dbl>  <dbl> <dbl> <dbl>
1     0.138         0.117 0.511      6.59 3.11e-10    11  -340.  706.  760.
# ℹ 3 more variables: deviance <dbl>, df.residual <int>, nobs <int>

Comment appropriately.

Next, drop cls_students (class size):

Code
rm_cls_students <- lm(score ~ rank
                            + ethnicity
                            + age
                            + gender
                            + cls_perc_eval
                            + language
                            + cls_level 
                            + cls_profs 
                            + cls_credits 
                            + bty_avg, data = evals)
glance(rm_cls_students)
# A tibble: 1 × 12
  r.squared adj.r.squared sigma statistic  p.value    df logLik   AIC   BIC
      <dbl>         <dbl> <dbl>     <dbl>    <dbl> <dbl>  <dbl> <dbl> <dbl>
1     0.161         0.140 0.504      7.85 1.64e-12    11  -334.  694.  748.
# ℹ 3 more variables: deviance <dbl>, df.residual <int>, nobs <int>

Comment on the effect of dropping class size.

Next, drop cls_level (course level):

Code
rm_cls_level <- lm(score ~ rank
                          + ethnicity
                          + age
                          + gender
                          + cls_students
                          + cls_perc_eval
                          + language
                          + cls_profs 
                          + cls_credits 
                          + bty_avg, data = evals)
glance(rm_cls_level)
# A tibble: 1 × 12
  r.squared adj.r.squared sigma statistic  p.value    df logLik   AIC   BIC
      <dbl>         <dbl> <dbl>     <dbl>    <dbl> <dbl>  <dbl> <dbl> <dbl>
1     0.163         0.143 0.504      8.00 8.63e-13    11  -333.  692.  746.
# ℹ 3 more variables: deviance <dbl>, df.residual <int>, nobs <int>

Does dropping course level lead to an increase in the adjusted \(R^2\)? If so, by how much?

We are almost done. Next, drop cls_profs (number of professors):

Code
rm_cls_profs <- lm(score ~ rank
                          + ethnicity
                          + age
                          + gender
                          + cls_students
                          + cls_level
                          + cls_perc_eval
                          + language
                          + cls_credits 
                          + bty_avg, data = evals)
glance(rm_cls_profs)
# A tibble: 1 × 12
  r.squared adj.r.squared sigma statistic  p.value    df logLik   AIC   BIC
      <dbl>         <dbl> <dbl>     <dbl>    <dbl> <dbl>  <dbl> <dbl> <dbl>
1     0.163         0.143 0.503      8.01 8.30e-13    11  -333.  692.  746.
# ℹ 3 more variables: deviance <dbl>, df.residual <int>, nobs <int>

Does dropping cls_profs lead to an increase in adjusted \(R^2\)? By how much?

Next, drop cls_credits (i.e., number of credits):

Code
rm_cls_credits <- lm(score ~ rank
                          + ethnicity
                          + age
                          + gender
                          + cls_students
                          + cls_level
                          + cls_perc_eval
                          + language
                          + cls_profs
                          + bty_avg, data = evals)
glance(rm_cls_credits)
# A tibble: 1 × 12
  r.squared adj.r.squared sigma statistic       p.value    df logLik   AIC   BIC
      <dbl>         <dbl> <dbl>     <dbl>         <dbl> <dbl>  <dbl> <dbl> <dbl>
1     0.128         0.107 0.514      6.04 0.00000000306    11  -343.  711.  765.
# ℹ 3 more variables: deviance <dbl>, df.residual <int>, nobs <int>

Comment on the effect of dropping cls_credits.

Finally, drop bty_avg (i.e., average beauty score):

Code
rm_bty_avg <- lm(score ~ rank
                        + ethnicity
                        + age
                        + gender
                        + cls_students
                        + cls_level
                        + cls_perc_eval
                        + cls_credits
                        + language
                        + cls_profs, data = evals)
glance(rm_bty_avg)
# A tibble: 1 × 12
  r.squared adj.r.squared sigma statistic  p.value    df logLik   AIC   BIC
      <dbl>         <dbl> <dbl>     <dbl>    <dbl> <dbl>  <dbl> <dbl> <dbl>
1     0.138         0.117 0.511      6.59 3.11e-10    11  -340.  706.  760.
# ℹ 3 more variables: deviance <dbl>, df.residual <int>, nobs <int>

As you have seen, dropping cls_profs leads to the largest improvement in the adjusted \(R^2\). So, we create a new model (call it m_prunned_1) without cls_profs and check its adjusted \(R^2\). Below is our new model:

Code
m_prunned_1 <- lm(score ~ rank 
                        + gender 
                        + ethnicity 
                        + language 
                        + age 
                        + cls_perc_eval 
                        + cls_students 
                        + cls_level 
                        + cls_credits 
                        + bty_avg, data = evals)
tidy(m_prunned_1)
# A tibble: 12 × 5
   term                   estimate std.error statistic  p.value
   <chr>                     <dbl>     <dbl>     <dbl>    <dbl>
 1 (Intercept)            3.53      0.240       14.7   3.54e-40
 2 ranktenure track      -0.107     0.0819      -1.31  1.91e- 1
 3 ranktenured           -0.0454    0.0651      -0.697 4.86e- 1
 4 gendermale             0.178     0.0514       3.47  5.78e- 4
 5 ethnicitynot minority  0.189     0.0761       2.49  1.32e- 2
 6 languagenon-english   -0.127     0.108       -1.17  2.41e- 1
 7 age                   -0.00666   0.00308     -2.16  3.10e- 2
 8 cls_perc_eval          0.00568   0.00154      3.68  2.65e- 4
 9 cls_students           0.000449  0.000357     1.26  2.09e- 1
10 cls_levelupper         0.0184    0.0555       0.331 7.41e- 1
11 cls_creditsone credit  0.511     0.116        4.40  1.36e- 5
12 bty_avg                0.0611    0.0166       3.67  2.67e- 4
Code
glance(m_prunned_1)
# A tibble: 1 × 12
  r.squared adj.r.squared sigma statistic  p.value    df logLik   AIC   BIC
      <dbl>         <dbl> <dbl>     <dbl>    <dbl> <dbl>  <dbl> <dbl> <dbl>
1     0.163         0.143 0.503      8.01 8.30e-13    11  -333.  692.  746.
# ℹ 3 more variables: deviance <dbl>, df.residual <int>, nobs <int>

The new adjusted \(R^2\) is \(14.3\%\), and we will use it as the baseline in step 2.

Step 2

We repeat the process above using m_prunned_1 as the baseline. Remember, its adjusted \(R^2\) is \(14.3\%\).

First, drop gender:

Code
rm_gender <- lm(score ~ rank 
                      + ethnicity 
                      + language 
                      + age 
                      + cls_perc_eval 
                      + cls_students 
                      + cls_level 
                      + cls_credits 
                      + bty_avg, data = evals)
#tidy(rm_gender)
glance(rm_gender)
# A tibble: 1 × 12
  r.squared adj.r.squared sigma statistic  p.value    df logLik   AIC   BIC
      <dbl>         <dbl> <dbl>     <dbl>    <dbl> <dbl>  <dbl> <dbl> <dbl>
1     0.141         0.122 0.510      7.43 5.80e-11    10  -339.  702.  752.
# ℹ 3 more variables: deviance <dbl>, df.residual <int>, nobs <int>

Next, remove rank:

Code
rm_rank <- lm(score ~ gender 
                    + ethnicity 
                    + language 
                    + age 
                    + cls_perc_eval 
                    + cls_students 
                    + cls_level 
                    + cls_credits 
                    + bty_avg, data = evals)
#tidy(rm_rank)
glance(rm_rank)
# A tibble: 1 × 12
  r.squared adj.r.squared sigma statistic  p.value    df logLik   AIC   BIC
      <dbl>         <dbl> <dbl>     <dbl>    <dbl> <dbl>  <dbl> <dbl> <dbl>
1     0.160         0.144 0.503      9.61 1.82e-13     9  -334.  690.  736.
# ℹ 3 more variables: deviance <dbl>, df.residual <int>, nobs <int>

Next, remove ethnicity:

Code
rm_ethnicity <- lm(score ~ rank 
                        + gender 
                        + language 
                        + age 
                        + cls_perc_eval 
                        + cls_students 
                        + cls_level 
                        + cls_credits 
                        + bty_avg, data = evals)
#tidy(rm_ethnicity)
glance(rm_ethnicity)
# A tibble: 1 × 12
  r.squared adj.r.squared sigma statistic  p.value    df logLik   AIC   BIC
      <dbl>         <dbl> <dbl>     <dbl>    <dbl> <dbl>  <dbl> <dbl> <dbl>
1     0.152         0.133 0.506      8.10 4.42e-12    10  -336.  697.  746.
# ℹ 3 more variables: deviance <dbl>, df.residual <int>, nobs <int>

Next, remove language:

Code
rm_language <- lm(score ~ rank 
                        + gender 
                        + ethnicity 
                        + age 
                        + cls_perc_eval 
                        + cls_students 
                        + cls_level 
                        + cls_credits 
                        + bty_avg, data = evals)
#tidy(rm_language)
glance(rm_language)
# A tibble: 1 × 12
  r.squared adj.r.squared sigma statistic  p.value    df logLik   AIC   BIC
      <dbl>         <dbl> <dbl>     <dbl>    <dbl> <dbl>  <dbl> <dbl> <dbl>
1     0.161         0.142 0.504      8.67 5.04e-13    10  -334.  692.  741.
# ℹ 3 more variables: deviance <dbl>, df.residual <int>, nobs <int>

No improvement.

Next, remove age:

Code
rm_age <- lm(score ~ rank 
                  + gender 
                  + ethnicity 
                  + language 
                  + cls_perc_eval 
                  + cls_students 
                  + cls_level 
                  + cls_credits 
                  + bty_avg, data = evals)
#tidy(rm_age)
glance(rm_age)
# A tibble: 1 × 12
  r.squared adj.r.squared sigma statistic  p.value    df logLik   AIC   BIC
      <dbl>         <dbl> <dbl>     <dbl>    <dbl> <dbl>  <dbl> <dbl> <dbl>
1     0.155         0.136 0.506      8.28 2.25e-12    10  -336.  695.  745.
# ℹ 3 more variables: deviance <dbl>, df.residual <int>, nobs <int>

No improvement.

Next, remove cls_perc_eval:

Code
rm_cls_per_eval <- lm(score ~ rank 
                            + gender 
                            + ethnicity 
                            + language 
                            + age 
                            + cls_students 
                            + cls_level 
                            + cls_credits 
                            + bty_avg, data = evals)
#tidy(rm_cls_per_eval)
glance(rm_cls_per_eval)
# A tibble: 1 × 12
  r.squared adj.r.squared sigma statistic  p.value    df logLik   AIC   BIC
      <dbl>         <dbl> <dbl>     <dbl>    <dbl> <dbl>  <dbl> <dbl> <dbl>
1     0.138         0.119 0.510      7.26 1.11e-10    10  -340.  704.  754.
# ℹ 3 more variables: deviance <dbl>, df.residual <int>, nobs <int>

No improvement.

Next, remove cls_students:

Code
rm_cls_st <- lm(score ~ rank 
                      + gender 
                      + ethnicity 
                      + language 
                      + age 
                      + cls_perc_eval 
                      + cls_level 
                      + cls_credits 
                      + bty_avg, data = evals)
#tidy(rm_cls_st)
glance(rm_cls_st)
# A tibble: 1 × 12
  r.squared adj.r.squared sigma statistic  p.value    df logLik   AIC   BIC
      <dbl>         <dbl> <dbl>     <dbl>    <dbl> <dbl>  <dbl> <dbl> <dbl>
1     0.161         0.142 0.504      8.64 5.53e-13    10  -334.  692.  742.
# ℹ 3 more variables: deviance <dbl>, df.residual <int>, nobs <int>

No improvement.

Next, remove cls_level:

Code
rm_cls_level <- lm(score ~ rank 
                          + gender 
                          + ethnicity 
                          + language 
                          + age 
                          + cls_perc_eval 
                          + cls_students 
                          + cls_credits 
                          + bty_avg, data = evals)
#tidy(rm_cls_level)
glance(rm_cls_level)
# A tibble: 1 × 12
  r.squared adj.r.squared sigma statistic  p.value    df logLik   AIC   BIC
      <dbl>         <dbl> <dbl>     <dbl>    <dbl> <dbl>  <dbl> <dbl> <dbl>
1     0.163         0.145 0.503      8.82 2.83e-13    10  -333.  690.  740.
# ℹ 3 more variables: deviance <dbl>, df.residual <int>, nobs <int>

The model improves.

Next, remove cls_credits:

Code
rm_cls_credits <- lm(score ~ rank 
                          + gender 
                          + ethnicity 
                          + language 
                          + age 
                          + cls_perc_eval 
                          + cls_students 
                          + cls_level 
                          + bty_avg, data = evals)
#tidy(rm_cls_credits)
glance(rm_cls_credits)
# A tibble: 1 × 12
  r.squared adj.r.squared sigma statistic       p.value    df logLik   AIC   BIC
      <dbl>         <dbl> <dbl>     <dbl>         <dbl> <dbl>  <dbl> <dbl> <dbl>
1     0.128         0.108 0.514      6.61 0.00000000137    10  -343.  710.  759.
# ℹ 3 more variables: deviance <dbl>, df.residual <int>, nobs <int>

No improvement.

Next, we remove bty_avg:

Code
rm_bty_avg <- lm(score ~ rank 
                      + gender 
                      + ethnicity 
                      + language 
                      + age 
                      + cls_perc_eval 
                      + cls_students 
                      + cls_level 
                      + cls_credits, data = evals)
#tidy(rm_bty_avg)
glance(rm_bty_avg)
# A tibble: 1 × 12
  r.squared adj.r.squared sigma statistic  p.value    df logLik   AIC   BIC
      <dbl>         <dbl> <dbl>     <dbl>    <dbl> <dbl>  <dbl> <dbl> <dbl>
1     0.138         0.119 0.510      7.26 1.11e-10    10  -340.  704.  754.
# ℹ 3 more variables: deviance <dbl>, df.residual <int>, nobs <int>

We notice that removing cls_level leads to the largest improvement in the adjusted \(R^2\): \(14.47\% - 14.30\% = 0.17\) percentage points. So, our new, improved model (call it m_prunned_2) is:

Code
m_prunned_2 <- lm(score ~ rank 
                        + gender 
                        + ethnicity 
                        + language 
                        + age 
                        + cls_perc_eval 
                        + cls_students 
                        + cls_credits 
                        + bty_avg, data = evals)
#tidy(m_prunned_2)
glance(m_prunned_2)
# A tibble: 1 × 12
  r.squared adj.r.squared sigma statistic  p.value    df logLik   AIC   BIC
      <dbl>         <dbl> <dbl>     <dbl>    <dbl> <dbl>  <dbl> <dbl> <dbl>
1     0.163         0.145 0.503      8.82 2.83e-13    10  -333.  690.  740.
# ℹ 3 more variables: deviance <dbl>, df.residual <int>, nobs <int>

Step 3

We use the model from step 2 above along with its adjusted \(R^2\) as the baseline for step 3. The rest of the work is left as an exercise.

Exercises

  1. (8 pts) In class, we began the backward elimination method (based on adjusted R-squared) for model selection. Complete the steps to come up with the best model. You do not need to show all steps in your answer, just the output for the final model. Also, write out the linear model for predicting score based on the final model you settle on.

  2. (4 pts) Based on your final model, describe the characteristics of a professor and course at the University of Texas at Austin that would be associated with a high evaluation score.

  3. (4 pts) Pick one slope for a numerical variable in your model and one for a categorical variable and interpret them in context.

  4. (4 pts) Would you be comfortable generalizing your conclusions to apply to professors generally (at any university)? Why or why not?