NBER has a new working paper on the challenges of assessing value-add in the education context:
Estimates of teacher “value-added” suggest teachers vary substantially in their ability to promote student learning. Prompted by this finding, many states and school districts have adopted value-added measures as indicators of teacher job performance. In this paper, we conduct a new test of the validity of value-added models. Using administrative student data from New York City, we apply commonly estimated value-added models to an outcome teachers cannot plausibly affect: student height.
We find the standard deviation of teacher effects on height is nearly as large as that for math and reading achievement, raising obvious questions about validity. Subsequent analysis finds these “effects” are largely spurious variation (noise), rather than bias resulting from sorting on unobserved factors related to achievement. Given the difficulty of differentiating signal from noise in real-world teacher effect estimates, this paper serves as a cautionary tale for their use in practice.
Evaluating outcomes and properly attributing causation is tough. It is especially tough when there are strong inhibitions or limitations on accurate randomization.
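The mechanism is easy to see in a toy simulation. This is my own sketch, not the paper's actual model: students are randomly assigned to teachers, the outcome is pure noise that no teacher can affect, and yet the naive spread of classroom averages still looks like a "teacher effect."

```python
# Hypothetical sketch (not the NBER paper's estimator): even when no teacher
# has any true effect on the outcome, the raw spread of classroom means is
# nonzero -- it is exactly the sampling noise of small-class averages.
import random
import statistics

random.seed(0)

N_TEACHERS = 200
CLASS_SIZE = 25

# Each student's outcome (think "height") is noise: mean 0, SD 1,
# in standardized units. No teacher has any true effect.
estimated_effects = []
for _ in range(N_TEACHERS):
    classroom = [random.gauss(0, 1) for _ in range(CLASS_SIZE)]
    estimated_effects.append(statistics.mean(classroom))

sd_effects = statistics.stdev(estimated_effects)
expected_noise = 1 / CLASS_SIZE ** 0.5  # sampling SD of a class mean

print(f"SD of estimated 'teacher effects': {sd_effects:.3f}")
print(f"Pure sampling noise predicts:      {expected_noise:.3f}")
```

The two printed numbers land on top of each other: every bit of the apparent "teacher effect" here is sampling noise, which is the paper's point about differentiating signal from noise.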
This leaped out at me because I’m currently working on a couple of papers about health plan quality. The data we are using are a series of composite measures built from consumer surveys and claims analysis, which we use to figure out which insurer characteristics are associated with high value care and which are associated with low value care. We can find the associations easily enough. But as soon as we start hypothesizing about causal pathways to outcomes, I get a splitting headache.
Walker
This stuff is one of many reasons Arne Duncan was the worst hire of the Obama administration.
Steeplejack
@David Anderson:
“Pays”? Can’t think what that might be.
Sab
I did door to door canvassing on a state ballot issue to reverse one of Gov Kasich’s stupid legislative achievements. My favorite partner was an inner city school teacher originally from a tiny white city in rural Ohio. She loves her current school and kids. She told me that in her school the teachers worked cooperatively, since many of the same kids passed from one grade to another in the same school. “If my pay depends on competing with other teachers in my same school, I surely will not help those other teachers. If I help them do better, then I do worse.”
daveNYC
What’s the definition of high value vs. low value care? Life expectancy? Quality of Life? The credit card debt the patient racks up while trying to achieve either one?
Kay
@Walker:
He was. It was also a legitimate betrayal of teachers, who overwhelmingly backed Obama and were assured they wouldn’t get a pick like Duncan, who they were already familiar with because of his work in Chicago. The thing went sideways from the start because they felt (rightfully) that they had been tricked. It just got worse from there.
clay
As a Florida teacher whose pay depends at least partly on my “VAM” score, let me just say that it’s nice to get some evidence to support what we’ve long suspected: the VAM is arbitrary and useless.
Jim Bales
That is a cool result, thanks for the link!
Ohio Mom
The elephant in the room is the curriculum. If you have great teachers teaching crap, it hardly matters that they are great.
You see this with all the teaching to the test that is the legacy of NCLB. For example, when your reading lessons are all short pieces, followed by multiple-choice and very short written questions, you do not get the long-term results you would get from a robust Language Arts curriculum, with students reading real literature and writing lengthy pieces.
What gets included in science and social studies/history curricula is incredibly political. I’m still smarting over Ohio’s right-wing economics curriculum. “Public foods are businesses run by the givernment,” said Ohio Son’s middle school lesson. Really?
Ohio Mom
Oops that should be “public GOODS.”
It turns out it matters a lot who is in the state capital, decreeing what gets taught.
But those people do not have high public profiles, while just about everyone has a historical grudge about one teacher or another they had, and they often generalize that grudge to everything that goes on in schools.
psycholinguist
Excellent find. This is one of the reasons my field (psychology) has been pressing for the adoption of effect size as a primary tool for interpreting the importance of an inferential finding. In a publish or perish world, people just keep adding participants, or running comparisons, to hit that magic .05 alpha. I’m going to borrow this for my stats class.
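The "just keep adding participants" problem can be made concrete. In this illustration (the numbers and group setup are mine, purely hypothetical), a trivially small true difference clears p < .05 with a big enough sample, while the standardized effect size stays negligible:

```python
# Hedged illustration of p-value vs. effect size: with n large enough,
# a tiny true difference (here 0.04 SD) produces a "significant" p-value
# even though Cohen's d says the effect is negligible.
import math
import random
import statistics

random.seed(1)

n = 50_000                      # per group: "just keep adding participants"
group_a = [random.gauss(0.00, 1) for _ in range(n)]
group_b = [random.gauss(0.04, 1) for _ in range(n)]  # tiny true difference

mean_a, mean_b = statistics.mean(group_a), statistics.mean(group_b)
sd_pooled = statistics.stdev(group_a + group_b)

cohens_d = (mean_b - mean_a) / sd_pooled              # standardized effect size
z = (mean_b - mean_a) / (sd_pooled * math.sqrt(2 / n))
p_two_sided = math.erfc(abs(z) / math.sqrt(2))        # normal-approximation p

print(f"Cohen's d   = {cohens_d:.3f}")
print(f"two-sided p = {p_two_sided:.2e}")
```

Reporting d alongside p makes the mismatch visible: the finding is "significant" but the effect is far below even the conventional threshold for a small effect (d = 0.2).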
clay
@Ohio Mom: Did it really say “givernment”? On the one hand, that’s probably a typo, but on the other, it’s exactly the kind of derogatory thing a RWNJ would say!
Big Mango
37 years in the classroom, 30 in the inner city. IMHO, strength of personality and persistence are what separate the effective from the ineffective. You cannot measure the degree to which a teacher gets into the head of a child.
Victor Matheson
I once wrote a paper showing the difficulty of measuring the economic impact of sporting events on cities. We found that using some standard models you could “prove” that winning an NCAA title in women’s field hockey or men’s water polo (and a bunch of other sports) had a huge impact on personal incomes in the cities where these universities were located. Obviously, that is pure garbage, which was the point of the paper.
Of course, this doesn’t mean statistics and econometrics are wrong. It just means that they are tools that can be dangerous in the wrong hands.
Here is a link to that paper if anyone is interested. (Not sure why Gordon Gekko is listed as a co-author of that paper. Someone is screwing around with the data somewhere…)
Kent
At my last school I taught physics and oceanography.
With respect to physics there were four physics teachers in my department.
I taught general physics to average (non-AP) juniors, some were very bright but mostly non-science types.
Across the hall, the department chair taught 5 sections of different AP physics classes to the brightest kids in the school, some bound for the Ivy League.
Next door to me, a teacher taught remedial students, a blend of special ed and exceptionally unmotivated kids, a dumbed-down version we called Physical Science.
The fourth teacher taught just one section of physics and mostly electrical engineering and robotics courses.
How do you use test scores to determine who is the best teacher? Every single student in the AP classes could pass the state assessments with their eyes closed. My kids had about a 95% passing rate. The teacher next to me with remedial kids had more like a 70% pass rate. But she would also get kids who might show up after being released from juvenile detention in April, whose test scores go on her record two weeks later even though she’s seen them maybe two weeks total that year.
Do you base it on improvement over previous science scores? None of them have had any physics since the 7th grade. They took chemistry or bio as sophomores. So all started with zero physics.
The whole notion is ridiculous.
PQuincy2008
Thank you for disseminating this. As a college teacher facing endless, dreary insistence that we perform “learning outcomes assessment” — an exercise in noise generation sans pareil, in my judgment — it’s nice to see that the bugaboo of ‘objective measurement of learning’ is just as vapid in areas where some would claim it sort of works, e.g., math performance.
FlyingToaster
This mess (standardized testing and the resulting mis-measurement of every damn thing) was one of the primary motivators for us to continue living in an “underperforming” municipality* and sending WarriorGirl to private school.
I’ve worked professionally on “Test the Tester” and “Test the Test” programs, and honest-to-Dog, all of them suck. Some suck more than others {insert every state grade-school, middle-school, and high-school standardized test here}, but they all have inherent biases and deficiencies that make them nearly damn worthless.
If you’re using them, year-over-year, to measure whether your curriculum is working, maybe they’re helpful (Of course, you’d have to change the curriculum in response). But standardized tests are entirely useless for measuring teacher effectiveness.
* In general, in New England, school districts are by municipality, or a combination of municipalities for high school. We don’t have “county” districts in Massachusetts at all.
ProfDamatu
@daveNYC:
I’m not an expert, but my understanding is that “low value care” would be things that don’t improve outcomes much, if at all, or might even be harmful. For instance, over-prescription of antibiotics, or ordering tests that aren’t strongly indicated by clinical symptoms right off the bat. “High-value care” is the opposite; interventions that do improve outcomes for the patient.
I am guessing that the reason this would be interesting in the insurance context is that we’d like to know whether certain insurance plan designs lead to higher rates of low-value care (because we’d like to cut down on this type of care as much as possible) or lower rates of high-value care being received by subscribers. Like, we already know that imposing higher cost-sharing results in almost indiscriminate care-skipping, not the reduction in low-value care only that had been the hope.
David Anderson
Yep, precisely that. One of the indicators is antibiotic prescription for colds. That does nothing good for the patient. Hell, it is a mild negative, as the patient’s gut bacteria just got whacked hard. But sometimes it happens, and there is significant variance between docs/insurers.