When trying to improve educational outcomes, it is hard not to feel the need for urgency. We want to figure out what works now and implement changes now because if we wait, those kids who are in schools now will miss out. Unfortunately, this pressure to act quickly may be fundamentally at odds with the ability to measure what really works, since a meaningful change in the trajectory of student achievement is not always apparent until years later. Diane Whitmore Schanzenbach of Northwestern University provides a compelling example of exactly this conundrum.
Schanzenbach's thesis is that too often, education research only assesses an intervention's immediate or intermediate outcomes without capturing its full long-term benefits. This may be particularly relevant, she asserts, when judging the impact of early childhood investments.
Schanzenbach offers the example of two studies (both of which she co-authored) on the famous 1990s Project STAR class size experiment in Tennessee. That well-known experiment assigned students randomly to either regularly sized classes or small classes. Researchers behind both papers (the first from Dynarski, Hyman and Schanzenbach, 2013, and the second from Chetty, Friedman, Hilger, Saez, Schanzenbach and Yagan, 2011) found that the smaller kindergarten classes yielded an immediate bump in students' test scores for that year—but both papers report that this bump faded as students entered middle school.
But that's not the end of the story. When the students became adults, clear positive impacts reemerged, so to speak, for those students who had been placed in the smaller classes. Schanzenbach concludes that "we find that the actual long-run impacts were larger than what would have been predicted based on the short-run test score gains."
That the test score gains faded even though positive adult outcomes later emerged may seem to confirm public skepticism about test scores as an accurate indicator of long-term achievement.
But not so fast.
Schanzenbach is right in noting that the test scores measured two to six years after the intervention, by which point the gains had faded, did not correlate with the more positive life outcomes. However, the immediate test score gains from the year of the intervention, when students were in kindergarten, were highly predictive of students' college attendance and degree completion. Schanzenbach admits as much, stating with her colleagues that "the short-term effect of small classes on test scores, it turns out, is an excellent predictor of its long-term effect on adult outcomes" (Dynarski et al., 2013).
Schanzenbach's theory finds stronger footing in her second paper, Chetty et al. This paper looked at both kindergarten class size and each student's kindergarten classroom quality (as measured by the average test scores of his classmates at the end of kindergarten—a proxy for a combination of peer effects, teacher effects, and other classroom characteristics). Again, small kindergarten classes correlated with higher kindergarten test scores and higher college attendance.
Moreover, while the higher kindergarten test scores were correlated with higher earnings at age 27, they provide a statistically significant explanation for only a small portion of the difference in earnings. Thus, the short-term test score bump can barely begin to explain the benefits students derived later on in life from having been assigned to a smaller or higher-quality class.
The missing piece of the statistical puzzle was students' non-cognitive skills. When the STAR students were in 4th and 8th grades, they were assessed on non-cognitive outcomes. The results showed stronger non-cognitive outcomes but faded test-score gains for the students who had been in the small classes.
Furthermore, these non-cognitive measures seem to explain a much greater share of future earnings than do the academic outcomes. Teasing apart the positive impact of higher test scores and stronger non-cognitive skills achieved in a kindergarten classroom one standard deviation higher in quality, the higher 4th grade test scores would predict an additional $40 of income at age 27, but the non-cognitive skills would predict an additional $139 in earnings.
Although we think Schanzenbach's characterization of the findings in Dynarski et al. undersells the predictive power of immediate test score gains, she does raise several critical points. The first is that early childhood interventions may foster outcomes that most strongly emerge long after the initial study period has ended, thereby eluding researchers who only measure immediate and intermediate outcomes for a few years. The second is that interventions may yield effects that cannot be evaluated purely by measures of academic skills and content. As our understanding of the importance of grit and executive functioning grows, so too should our measures of the impact of classroom experience on these skills alongside standardized test scores.