The results of a fascinating experiment by two researchers, Brian Jacobs and Lars Lefgren, offer some food for thought on making performance pay work. Against a background of previous research that found little correlation between principals' evaluations of their teachers and actual teacher effectiveness, Jacobs and Lefgren asked 13 elementary school principals to rate the effectiveness of their teachers, and only their effectiveness. The principals were instructed to disregard other teacher behaviors that usually get thrown into the pot on the standard district evaluation form, such as consistently showing up for faculty meetings or happily subbing for absent teachers.
As most of us (except perhaps a building union rep) might have expected, Jacobs and Lefgren did turn up a positive correlation between principals' assessments of teacher effectiveness and those teachers' contributions to test scores, but the principals were hardly soothsayers. The correlation was 0.32 in reading and 0.36 in math, not all that much higher than if the principals had engaged in pure guesswork.
There's more agreement between principals and test results on those teachers who are really good and those who are really weak, the top and bottom 15 percent. There the correlation was 52 percent in reading and 69 percent in math. The strong match here suggests that both performance pay for the very best teachers and better-constructed efforts to dismiss the very weakest teachers could be supported by a combination of testing and evaluations.
But there's something a little disturbing about the experiment and the researchers' presumptions. While Jacobs and Lefgren ask an important question, namely "Can principals identify effective teachers?", they never consider the possibility that the tests in question may not be identifying the most effective teachers, or that the principals may have other data points that make them as likely as the tests, or more likely, to be right about a teacher's true effectiveness. Shouldn't they have at least posed the question rather than assuming that these tests are always superior to us mere mortals?
This is an issue that cannot be ignored, nor should it be left to the camp that hates tests and accountability. It does the accountability movement a disservice to assume that these tests, no matter how they are used, provide infallible measures of teacher effectiveness when we already know a lot about them that suggests some worrisome weaknesses, or at least certain limits on how they should be used. They're highly unreliable if only a single year is considered. There is also a lot of evidence, only bolstered by the findings here, that tests of reading are not as good at measuring a teacher's effectiveness as tests of math.
We must confront what these tests can and cannot do fairly and reliably. For more on this issue, read Kate Walsh's new essay "If Wishes Were Horses: The Reality Behind Teacher Quality Findings."