With the advent of value-added methodologies, output measures have become the gold standard for gauging the impact not just of teachers on students, but also of the institutions that train those teachers. However, it's increasingly clear that getting these systems up and running to produce reliable and valid scores is no trivial matter.
The wisdom of a judicious approach is underscored by a recent CALDER study by Kata Mihaly et al., which sheds more light on some of the pitfalls.
The most dramatic finding: when teachers from different preparation programs are not well distributed across schools, it's very difficult to control for unobservable differences in school quality. Without that control, there is no clean way to tell how much of a program's apparent effectiveness reflects its teachers and how much reflects the schools where they happen to work.
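To see why, here's a rough, purely illustrative sketch (in Python, with made-up numbers and not drawn from the study's actual model or data): two preparation programs whose graduates are equally effective, but whose graduates sort into schools of different quality. A naive comparison of student gains makes one program look better; adding school fixed effects, which only identifies program differences when graduates of different programs mix within schools, makes the spurious gap vanish.

```python
import numpy as np
import pandas as pd
import statsmodels.formula.api as smf

rng = np.random.default_rng(42)

# Hypothetical setup: graduates of Programs A and B are EQUALLY effective,
# but higher-quality schools are more likely to hire Program A graduates.
n_schools = 60
school_effect = rng.normal(0, 0.25, n_schools)   # unobserved school quality
prob_a = np.where(school_effect > 0, 0.8, 0.2)   # sorting of graduates into schools

rows = []
for s in range(n_schools):
    for _ in range(10):                           # 10 teachers per school
        prog = "A" if rng.random() < prob_a[s] else "B"
        # average student gain = school quality + (identical) program effect + noise
        gain = school_effect[s] + 0.0 + rng.normal(0, 0.1)
        rows.append({"school": s, "program": prog, "gain": gain})

df = pd.DataFrame(rows)

# Naive model: no control for school quality -> Program A looks better.
naive = smf.ols("gain ~ C(program)", data=df).fit()
# School fixed effects: comparisons are made within schools only.
fe = smf.ols("gain ~ C(program) + C(school)", data=df).fit()

print("Naive A-vs-B gap:      %.3f" % -naive.params["C(program)[T.B]"])
print("School-FE A-vs-B gap:  %.3f" % -fe.params["C(program)[T.B]"])
```

In this toy example the naive gap is sizable even though the programs are identical by construction, while the fixed-effects gap is near zero; if graduates never mixed within schools, the fixed-effects model couldn't separate the two at all.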
In this study, which used Florida data, there was sufficient school-level mixing of teachers from different programs, even though teachers tended to take jobs in schools near the programs from which they graduated. The researchers caution, however, that such mixing cannot be taken for granted.
The 45 or so states that are now developing, or will soon develop, similar value-added models cannot afford to be complacent about this. Strikingly, the authors found that when the school-level mix of teachers whose student performance data is analyzed is too thin to support proper controls for school quality, a teacher prep program rated top-notch under one analysis can come out sub-standard under another, or vice versa.
Chalk it up as one more part of the business of figuring out teacher effectiveness in which the devil is in the details.