Last month's TQB article, "Existential takes on first year induction and professional development," discussed a study funded by the Institute of Education Sciences on new teacher induction. The study compared two highly respected and well-delivered teacher induction models (those of the New Teacher Center and the Educational Testing Service) with what the district was offering as standard fare. In the first year of the study, no differences were found on any of the outcomes we all care about, such as retention and student achievement.
However, we failed to note a problematic feature of the study: the type of test used to measure student performance gains. We will not overlook this issue in subsequent reports on research of this type:
"Teaching to the test" is an unfortunate phenomenon that is made possible when testing instruments become familiar to teachers. Measuring true student performance gains is problematic if some teachers are teaching to the test and others are not, or are not doing so intensively. Obviously, independent, non-district tests of the relevant content mitigate this problem and should be used if the treatment in the research changes the potential for a group of teachers to teach to the test.
The problem of treatment and control teachers feeling different incentives to teach to the test seems to have been present in the induction study: researchers found that treatment teachers spent significantly less professional development time than control teachers on the topic of preparing students for standardized testing. The effect of this differential may have been magnified if treatment teachers took their cues from the topics emphasized in professional development and, in their day-to-day teaching, placed a lower priority on preparing students for standardized tests than control teachers did. Yet despite the potential for these between-group differences in time spent teaching to familiar tests to affect student performance, the study measured achievement gains using the tests that the district normally administers.
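To see how this kind of confound can masquerade as a treatment effect, consider a minimal simulation sketch in Python (all numbers are hypothetical, not taken from the study): both groups of classrooms learn exactly the same amount, but control classrooms pick up a few extra points on the familiar district test purely from test preparation. The district test then reports a spurious negative "effect" of the induction treatment, while an independent test of the same content reports roughly none.

    import random

    random.seed(1)
    N = 200  # hypothetical number of classrooms per group

    def true_gain():
        # Assume identical real learning in both groups: the induction
        # treatment has no genuine effect on achievement.
        return random.gauss(10, 2)

    # Hypothetical bonus: control teachers spend more time on test prep,
    # so their students earn extra points on the familiar district test
    # that reflect test-taking skill rather than content mastery.
    FAMILIARITY_BONUS = 3

    district_control = [true_gain() + FAMILIARITY_BONUS for _ in range(N)]
    district_treatment = [true_gain() for _ in range(N)]
    independent_control = [true_gain() for _ in range(N)]
    independent_treatment = [true_gain() for _ in range(N)]

    def mean(xs):
        return sum(xs) / len(xs)

    # The district test shows a spurious negative treatment effect
    # (about -3 points); the unfamiliar independent test shows ~0.
    print("District test effect:   ",
          round(mean(district_treatment) - mean(district_control), 2))
    print("Independent test effect:",
          round(mean(independent_treatment) - mean(independent_control), 2))

The only thing separating the two measured "effects" is the familiarity bonus; an independent test aligned to the same content removes it, which is exactly the remedy suggested below.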
Obviously, district tests can be used in education research when the treatment at issue has no bearing on test-taking experiences. For example, differences in student achievement between classrooms lit with standard light bulbs and classrooms lit with natural light can surely be measured using district tests. But if the treatment at issue can influence any classroom practices associated with testing itself, researchers would be well advised to measure its impact on student achievement with a test that is aligned to the relevant content but not overly familiar to teachers.