Philosophy Admissions Survey: What determines success?

So far, we’ve looked at the effects of tradition, gender, and minority status on admissions, as well as some of the most successful candidates. But what factors most determine success for any given candidate? How much weight does each factor carry? What are the best ways to improve one’s application and increase the odds of success? This post will hopefully answer a few of those questions. The analysis here was again completed by my spouse (this time in R, if anyone was curious).

———————

We wanted to determine what factors are associated with admission to top-50-ranked philosophy PhD programs. We had data from 95 individuals who had submitted a total of 804 applications to philosophy PhD programs. From the results of the philosophy admissions survey, we extracted information about each candidate and coded it into variables for the purpose of building a statistical model.

In the end, the models here were built from 84 individuals. Total acceptances, wait-lists, and rejections were determined from the summary question at the end of the survey, for both PGR top-20 and 21-50 programs. Thirty-two individuals took the survey before those questions were added; their results were coded by hand.

84 responses is a large enough sample to support some interesting conclusions about the population. However, the sample size does limit the number of variables that can be considered at one time. If we had tried to use the information from every question, it would have been impossible to distinguish the variables that were making a difference in admission rates from those that were not. It would also have reduced the sample size, since some people left at least a few questions blank. We had to eliminate some questions in order to get a better picture of what was making a difference. The variables considered are: gender, minority status, teaching/work experience, publications, graduate degrees in philosophy, undergraduate institution selectivity, GRE scores (all three sections), undergraduate overall GPA, and undergraduate major GPA.

The following variables can take values of zero or one:

gender: female (1) or male (0)

minority: participant is a minority (1) or not (0)

experience: participant has (1) or does not have (0) teaching experience

philgrad: participant has (1) or does not have (0) a graduate degree (generally a master’s degree) in philosophy

published: participant’s work has (1) or has not (0) been published in an academic philosophy journal of any kind

For some survey questions whose answers were of interest for this analysis, the survey asked participants to indicate which of several ranges they fell into. For example, does the participant’s undergraduate institution admit 0-25%, 26-50%, 51-75%, or 76-100% of candidates? We recoded these ordinal variables (meaning they consisted of ordered categories) into continuous variables (numbers that can take any value in a certain range) by using the value at the top of the range selected by the participant (e.g. 51-75% becomes 75%).
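
In R, this recoding might look something like the following minimal sketch (the response labels here are made up to match the example):

# hypothetical raw responses: admission-rate ranges as text
selectivity_raw <- c("0-25%", "26-50%", "51-75%", "76-100%", "51-75%")

# map each range to the value at the top of that range
range_tops  <- c("0-25%" = 25, "26-50%" = 50, "51-75%" = 75, "76-100%" = 100)
selectivity <- unname(range_tops[selectivity_raw])   # 25 50 75 100 75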

There may be meaningful differences between participants from schools with admission rates of, say, 51% and 75%; unfortunately, that information is not in our model. This sort of procedure is generally frowned upon in the statistics world because ordinal data behaves differently than continuous data. We did it because a continuous predictor requires one parameter, while an ordinal predictor (treated as a set of categories) requires one fewer parameter than it has levels. For example, an ordinal variable with four possible values (0-25%, 26-50%, 51-75%, 76-100%) would require three parameters. In regression models, more parameters means more things have to be estimated, and more chances to be wrong; in general, simpler models work better. We already had too many variables and not enough data, so we made this compromise. We treated the following variables in this fashion:

selectivity: percentage of applicants admitted at participant’s undergraduate institution

gre_verbal: verbal GRE percentile

gre_quant: quantitative GRE percentile

gre_writing: writing GRE percentile

gpa: overall undergraduate GPA

majgpa: undergraduate GPA for classes in participant’s major (usually, but not always, philosophy)

The online survey also asked which of three orientations (analytic, continental, or none) participants indicated in their applications. We turned this into two dichotomous variables:

analytic: participant’s application indicated an analytic orientation (1) or did not (0)

continental: participant’s application indicated a continental orientation (1) or did not (0)

Participants who selected “none” have a value of 0 for both variables, as in the sketch below.
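
For illustration, assuming the survey export stores the answer in a single text column (the column name here is made up), the dummy coding is one line per variable:

# hypothetical: one orientation response per participant
orientation <- c("analytic", "none", "continental", "analytic")

analytic    <- ifelse(orientation == "analytic", 1, 0)      # 1 0 0 1
continental <- ifelse(orientation == "continental", 1, 0)   # 0 0 1 0
# "none" respondents get 0 in both columns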

For each individual, we knew the number of programs applied to and the number of successful applications (those which resulted in acceptance or a place on the wait-list). Because we were modeling a binary outcome (you get into a program or you don’t), we used logistic regression, a form of statistical modeling built for exactly this situation.
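
As a sketch of the model-fitting step, assuming a data frame called admissions with one row per participant, a successes column (acceptances plus wait-lists), and an applications column (programs applied to); these names are ours, not the survey’s. The R call would look like:

# joint successes/failures per participant, modeled with logistic regression
fit <- glm(cbind(successes, applications - successes) ~
             selectivity + minority + gender + gpa + majgpa +
             experience + analytic + continental + philgrad +
             published + gre_verbal + gre_quant + gre_writing,
           family = binomial, data = admissions)

summary(fit)   # produces a coefficient table like the one below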

First, we tried building our model with all our predictors on our entire data set. Below is a summary of this model.

“Estimate” is the estimated coefficient in the logistic regression model: the change in the natural log of the odds of success associated with a one-unit increase in the predictor. For predictors whose only possible values are 0 and 1 (gender, for example), it is the difference in log-odds between the two levels of the variable. The important thing here is that a positive coefficient means an increase in the predictor (or a value of 1) is associated with an increase in the odds of application success.

“Std. Error” is a measure of how reliable the estimate of the coefficient is: a high standard error means the true value could be very different from the one shown. This doesn’t really matter for our purposes.

“z value” and “Pr(>|z|)” refer to a Wald test of whether the coefficient differs from zero; the z value is the estimate divided by its standard error, which is roughly equivalent to comparing our model against one containing every variable except the one being tested. A large absolute z value, which corresponds to a low p value, means the variable contributes quite a bit to the predictive power of the model. A period or one or more asterisks to the right of the numbers indicates the level of statistical significance, as shown in the “Signif. codes” line; if none is present, the contribution is not statistically significant.

Coefficients:

             Estimate   Std. Error  z value  Pr(>|z|)
(Intercept)  -18.8       6.34       -2.97    0.00302 **
selectivity   -0.000827  0.00407    -0.20    0.84
minority      -0.487     0.324      -1.50    0.13
gender         0.155     0.223       0.70    0.49
gpa            0.753     0.733       1.03    0.30
majgpa         1.24      1.43        0.87    0.38
experience    -0.0292    0.243      -0.12    0.90
analytic      -0.141     0.443      -0.32    0.75
continental   -0.313     0.506      -0.62    0.54
philgrad      -0.100     0.232      -0.43    0.67
published     -0.0568    0.320      -0.18    0.86
gre_verbal     0.0968    0.0486      1.99    0.04629 *
gre_quant      0.00766   0.00789     0.97    0.33
gre_writing    0.000172  0.00902     0.02    0.98

Signif. codes: 0 ‘***’ 0.001 ‘**’ 0.01 ‘*’ 0.05 ‘.’ 0.1 ‘ ’ 1

At this point, only one predictor (verbal GRE percentile) was contributing significantly to the model’s predictions, with a positive coefficient indicating that higher verbal GRE scores are associated with higher odds of a successful application. I thought a few unusual observations might be unduly influencing the model fit, so I examined the residuals and found one outlier (participant #38, with a residual of -2.4754). I removed the outlier and rebuilt the model, but got pretty much the same thing. I then thought that having too many predictors might be introducing too much “noise,” so I cut four that looked like obvious duds (selectivity, major GPA, teaching experience, GRE writing score) and rebuilt the model using the full data set.
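
The residual check and both refits can be sketched in R along these lines (treating row 38 of the data frame as participant #38 is an assumption about how the data are ordered):

# deviance residuals; values beyond roughly +/-2 flag unusual participants
which(abs(resid(fit)) > 2)

# refit without the outlier
fit_no_outlier <- update(fit, data = admissions[-38, ])

# drop the four weakest-looking predictors, keeping the full data set
fit_reduced <- update(fit, . ~ . - selectivity - majgpa -
                             experience - gre_writing)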

In the rebuilt model, the Wald tests for gender, undergraduate GPA, and verbal GRE percentile were all significant, so I tried building a model using only these three predictors.

Coefficients:

Estimate Std. Error z value Pr(>|z|)
(Intercept) -17.74 3.03 -5.85 4.80e-09 ***
gender 0.35 0.17 2.13 0.033047 *
gpa 1.65 0.41 4.08 4.47e-05 ***
gre_verbal 0.1 0.03 3.72 0.000201 ***

Signif. codes: 0 ‘***’ 0.001 ‘**’ 0.01 ‘*’ 0.05 ‘.’ 0.1 ‘ ’ 1

All three variables had significant Wald z tests.

Next, I performed a deviance chi-square test to make sure the logistic regression model fit the data. Logistic regression assumes a linear relationship between the predictors and the natural log of the odds of the outcome (application success), with random errors following the binomial distribution. The test compares the distribution of residuals (the differences between our predictions and the observed results) to what would be expected under that assumption, generating a chi-squared statistic: a bigger discrepancy means a higher value of chi-squared and a lower p value. Traditionally, .05 is used as the cutoff for evidence of a violation of the model’s assumptions, and a poor fit would invalidate our results. (Incidentally, the deviance test is only valid when there are multiple trials for each combination of predictor values; here each applicant contributes multiple applications, so that condition is met.) With chisq(73) = 69.43829, p = 0.5964882, there was no evidence for lack of fit. The logistic regression model is appropriate.
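
In R, this test amounts to comparing the residual deviance of the three-predictor model against a chi-square distribution with the model’s residual degrees of freedom; a sketch, with fit3 as our name for the gender + gpa + gre_verbal fit:

fit3 <- glm(cbind(successes, applications - successes) ~
              gender + gpa + gre_verbal,
            family = binomial, data = admissions)

# goodness of fit: residual deviance vs. its chi-square reference
pchisq(deviance(fit3), df.residual(fit3), lower.tail = FALSE)
# 69.44 on 73 df gives p = 0.596 -- no evidence of lack of fit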

Next, I performed an overall significance test for the model: I wanted to know whether its predictions were significantly better than just guessing. To answer this question, I compared the accuracy of our model’s predictions for our sample to the accuracy we would get by calculating the overall acceptance rate and using it as the guess for every participant’s applications. We calculated the deviance (a measure of the difference between what we observed and what we predicted) for both models, then tested the difference between the two deviance statistics for significance using a chi-square test: chisq(3) = 58.93553, p = 9.922735e-13. The test is highly significant, indicating that the model performs much better than chance.

I then checked for outliers and found two: participants #23 (residual = 2.32834320) and #38 (residual = -2.54603528). I wanted to make sure they weren’t having a large influence on the model, which can happen, so I tried removing them and rebuilding the model:

Coefficients:

Estimate Std. Error z value Pr(>|z|)
(Intercept) -17.5 3.03 -5.77 7.95e-09 ***
gender 0.38 0.17 2.29 0.022186 *
gpa 1.67 0.41 4.07 4.65e-05 ***
gre_verbal 0.1 0.03 3.62 0.000299 ***

Signif. codes: 0 ‘***’ 0.001 ‘**’ 0.01 ‘*’ 0.05 ‘.’ 0.1 ‘ ’ 1

The results were pretty much the same: the coefficient estimates and Wald statistics changed very little, there was still no evidence for lack of fit, chisq(71) = 57.40618, p = 0.8782012, and the model still performed significantly better than chance, chisq(3) = 57.8712, p = 1.674629e-12. Removing the outliers changes no conclusions, so the three-predictor model is robust either way; the refitted coefficients are the ones reported in the final equation below.
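
Both follow-up tests described above have compact forms in R; a sketch using the hypothetical fit3 object from earlier:

# overall significance: compare against an intercept-only model,
# i.e. guessing the overall acceptance rate for everyone
fit_null <- update(fit3, . ~ 1)
anova(fit_null, fit3, test = "Chisq")   # chisq(3) = 58.94, p ~ 1e-12

# outlier check on the deviance residuals
resid(fit3)[abs(resid(fit3)) > 2]       # flags participants #23 and #38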

The final regression equation is:

y = -17.50348 + gender*0.38489 + gpa*1.67461 + gre_verbal*0.10037

This equation predicts the natural log of the odds of a given application resulting in acceptance or wait-listing at one program, given the gender, overall GPA, and verbal GRE percentile of the applicant. The variable gender takes a value of 1 for a woman and 0 for a man, the GPA is on a 4.0 scale, and gre_verbal is the percentile score.

For example, for a man with a GPA of 3.95 and a verbal GRE percentile of 87 applying to a program, the equation yields a fitted value of y = -17.50348 + 0*0.38489 + 3.95*1.67461 + 87*0.10037 = -2.1565. To interpret this, we exponentiate the fitted value to get odds of e^(-2.1565) = 0.1157, which is equivalent to a probability of 0.1157 / (1 + 0.1157) = 0.1037, or about a 10% chance of success.
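
The same arithmetic can be checked with R as a calculator, or read directly off a fitted model object like the hypothetical fit3 above:

# by hand, from the regression equation
log_odds <- -17.50348 + 0*0.38489 + 3.95*1.67461 + 87*0.10037
odds     <- exp(log_odds)   # 0.1157
odds / (1 + odds)           # 0.1037

# or directly from a fitted model object
predict(fit3, newdata = data.frame(gender = 0, gpa = 3.95, gre_verbal = 87),
        type = "response")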

The intercept is the log-odds when all the predictor values are zero, meaning a male candidate with a GPA of 0.0 who scored in the 0th percentile on the verbal section of the GRE. The model included no individuals anywhere near this range (someone with an overall GPA of 0.0 is not graduating from college, let alone pursuing graduate admissions in philosophy), so a prediction there would not be accurate. Given that almost all the applicants who filled out the survey had GPAs over 3.0 and verbal GRE scores above the 80th percentile, the model should not be used to extrapolate outside that range.

 

 
