One of the contentious aspects of the Chicago teacher strike is the role of standardized testing. As is typical of the reform movement, the Chicago school district is pushing for even more standardized testing. Teachers are resistant for a variety of reasons. Research suggests that increased standardized testing does not improve student outcomes, although those results must still be considered preliminary. Time spent “teaching to the test,” an inevitable consequence of high-stakes testing, robs students and teachers of the most valuable educational resource. What’s more, it is a simple fact that more standardized testing leads to more cheating, fraud, and abuse, from students, teachers, and administrators alike. That is not a normative statement; it is an empirical statement. Finally, there is widespread anecdotal evidence of the extreme psychological and emotional costs that repeated high-stakes, high-pressure testing imposes on children.
I would just like to add an important element to this: we don’t need to test everyone every year to have an extremely accurate picture of how our students are performing. People say that we need to know where our students stand and if they’re improving. And indeed we do! But we can find that information without subjecting all of our students to stressful testing that takes away valuable class time and invites considerable negative washback. Appropriately stratified samples, carefully selected, can tell us what we need to know about districts, states, and the nation. And we can express the accuracy of that information with mathematical precision using statistics like standard error and confidence intervals.
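To make the sampling point concrete, here is a minimal Python sketch. It is not a description of how NAEP or any real assessment program works. The three "schools," their score distributions, the 5% sampling rate, and the function name `stratified_mean_ci` are all made up for illustration. It draws a proportional stratified sample from a synthetic population and reports the estimated mean with its standard error and an approximate 95% confidence interval.

```python
import math
import random
import statistics


def stratified_mean_ci(strata, sample_frac, z=1.96, seed=0):
    """Estimate a population mean from a proportional stratified sample.

    strata: dict mapping a stratum name to the list of all scores in it.
    Returns (estimate, standard_error, (ci_low, ci_high)).
    Hypothetical helper for illustration only.
    """
    rng = random.Random(seed)
    total = sum(len(scores) for scores in strata.values())
    estimate = 0.0
    variance = 0.0
    for scores in strata.values():
        stratum_size = len(scores)
        # Sample the same fraction from every stratum (at least 2 students,
        # so that a sample variance can be computed).
        n = min(stratum_size, max(2, round(stratum_size * sample_frac)))
        sample = rng.sample(scores, n)
        weight = stratum_size / total
        # Finite population correction: sampling error vanishes in a census.
        fpc = 1 - n / stratum_size
        estimate += weight * statistics.mean(sample)
        variance += weight ** 2 * fpc * statistics.variance(sample) / n
    se = math.sqrt(variance)
    return estimate, se, (estimate - z * se, estimate + z * se)


# Synthetic population: every number here is invented for the example.
rng = random.Random(42)
population = {
    "school_a": [rng.gauss(70, 10) for _ in range(4000)],
    "school_b": [rng.gauss(55, 12) for _ in range(3000)],
    "school_c": [rng.gauss(62, 15) for _ in range(3000)],
}
true_mean = statistics.mean(s for scores in population.values() for s in scores)

est, se, (lo, hi) = stratified_mean_ci(population, sample_frac=0.05)
print(f"true mean (all 10,000 students): {true_mean:.2f}")
print(f"estimate from a 5% sample:       {est:.2f}")
print(f"standard error:                  {se:.2f}")
print(f"approx. 95% CI:                  ({lo:.2f}, {hi:.2f})")
```

The standard error here covers sampling error only; it says nothing about whether the test measures what it claims to measure. Raising `sample_frac` shrinks the interval, and at `sample_frac=1.0` the sampling error drops to zero, which is the only thing a census buys you.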
As Dr. Stephen Krashen, Professor Emeritus in Education at the University of Southern California, someone whose expertise and credentials are beyond reproach, has said, “One function of such tests is to compare groups and investigate factors related to high achievement, which works if tests are valid and are low-stakes and thus do not encourage cheating. But we don’t have to test every child in every grade every year…. When you go to doctors, they don’t take all your blood; they take only a sample.” We have developed validity and reliability tests, measures of statistical error, and processes for accounting for that error precisely so that we don’t have to check everyone. And lest you think that checking everyone is necessarily more accurate than extrapolating from samples, that’s not the case, even if we assume the validity of the test in assessing its given construct.
I can only conclude two things: first, that people simply don’t understand the power and accuracy of inferential statistics to describe complex realities like student academic achievement; and second, that people resist extrapolation out of the impression that it offers a less effective bludgeon with which to attack teachers.
Look, I consider myself a quantitative researcher, among other things, and I hope to publish on questions of language testing and assessment. I just this past week agreed to peer review for a major language assessment journal. I’m not opposed to testing entirely, not at all. But the limitations of tests are real, their negative consequences empirically verified, and most importantly, their primary strength ignored when they are used on all the kids, all the time. For our data collection, testing all kids twice during their educational careers, as the gold standard NAEP tests do, in addition to targeted stratified samples that can be minimally intrusive, is more than enough. If the purpose of testing isn’t data collection, but rather having an instrument to assault teachers, well, that’s a total betrayal of our children and our educational system.
Taylormattd
Freddie, have you apologized to Ezra Klein for your post trashing him, given he did not write the column you were bitching about?
BGinCHI
Freddie, do you find that most, some, or none of the people who advocate these testing regimens are experienced teachers?
Serious question.
Anyone who has spent years in the classroom would, in my experience, which is extensive, favor a measured approach such as what you describe above. Almost no teachers are for all the testing. In fact, I’ve never heard a teacher say it was the right move to turn classrooms over to a prescribed set of tests that engineer the curriculum to prepare for them.
That may be anecdotal, but it’s just no contest.
Massive testing appeals to administrators and politicians but not to teachers. What does that tell us?
Linnaeus
For some reason, I can’t get my comment to post with a link to it, but there’s an In These Times article by Mike Elk that points out that the director (David Magill) of the school to which Rahm Emanuel sends his own children (the University of Chicago Lab School) is a skeptic when it comes to using standardized tests as a metric:
Raven
@Taylormattd: What are you, a Jack Russell or what?
srv
All you PhD’s and Quantitative Researchers would be out of jobs if we had standardized testing. As long as you keep fiddling with the curriculi with your new hipster fashions, kids will never be able to make change at the cash register.
Bulworth
Freddie, there was an editorial (not an op-ed) in the NYT today more or less saying that the Chicago school system changes, which involve basing teacher evaluations more on standardized test scores, originated from state law. Can you speak to this?
Xecky Gilchrist
people simply don’t understand the power and accuracy of inferential statistics to describe complex realities
Indeed. Our country would be a very different place if the public had even a basic grasp of statistics. And the difference between a “million” and a “billion,” for that matter.
You’re right about people’s resistance to extrapolation, and it shows up in a lot of places. It lets them discount the results of polls they don’t like or exaggerate the importance of polls they do like, lets them dismiss as “just a theory” any scientific findings that run against their preconceptions, and so on.
I guess it’s just another case of the principle that wherever there’s ignorance, there’s exploitation, and the wingnut authorities are in the thick of it.
Villago Delenda Est
Testing is being used as a blunt instrument to forward an agenda that has nothing to do with the education of children.
There is so much bad intention here, so much dishonesty.
Linnaeus
@BGinCHI:
I’m having trouble linking to it, but there’s a recent In These Times article in which the director of the private school to which Rahm Emanuel sends his own children (the University of Chicago Lab School) comments that he doesn’t think standardized tests are a good measure of learning.
KSH
I love this post. I am an engineer that teaches statistics at the graduate level. Thank you so much.
Brachiator
No such thing, but thanks for trying.
No. And you are mixing apples and oranges. Whether standardized tests are useful in describing “complex realities like student academic achievement” is one issue.
The use of inferential statistics to provide useful information about student achievement is another matter altogether.
I sympathize with what you are trying to do. I don’t think you are making your points.
Sigh. Almost too much confusion here to untangle.
Belafon (formerly anonevent)
There’s a failure in the blood analogy, though: I can’t pick the parts of my blood that I want to be tested. The kids that would get to take these tests would be the ones who are good at test taking.
And I’m of the opinion that most of these tests are useless.
Raven
@Taylormattd: Are you a Jack Russell or what? Who gives a shit about Ezra Klein?
raven
@Brachiator: Steven “Monitor Theory” Krashen?
Villago Delenda Est
FYWP is having one of its database episodes.
Watch for multiple identical comments in the near future.
That is all.
BGinCHI
@Linnaeus: That wouldn’t surprise me. That’s a great school and they don’t fuck around with stuff that doesn’t work. It’s all Dewey all the time there and that ain’t gonna be testing regimens….
The Bearded Blogger
Great post, thank you. I really hope Democrats address this testing madness at some point.
James Hulsey
I agree that sampling would be better, but that can certainly be biased if the sampling is not truly blind.
I was a good student, and somehow, when it was time to take survey tests like that, I was usually in the sample.
Also, those tests were usually very long (a full day), and with less motivated people (and low stakes) they were quite likely to just quit rather than finish.
I would like to think the methodology has gotten better in 30 years. But high-stakes testing of all students is not the answer.
taylormattd
@Raven: Yes, yes, I get it. It’s perfectly fine to post a fact-free rant, almost libeling someone, as long as the person who writes the rant is your buddy. Good to know you don’t give two shits about such things.
jl
@Brachiator:
I think some of the post makes good points. If annual standardized testing of every student has adverse effects on education, then that is a problem. But if annual standardized testing of every student is considered the only reliable gold standard measurement, it will be difficult, to say the least, to ever measure those adverse consequences, if you only have that one instrument.
But I cannot figure out parts of the post. I have no idea what this means, for example:
“people resist extrapolation out of the impression that it offers a less effective bludgeon with which to attack teachers.”
What is being extrapolated and who is extrapolating, and for what purpose?
If some of the better test protocols deBoer advocates can predict out of sample better than other methods, even for new samples that are small, that is a powerful validation, even if you cannot do most conventional inferential statistics on out of sample forecast performance.
But if mass quantity annual tests are the gold standard, what is there to extrapolate? You are taking a population measurement every year, they produce summary statistics with no inference needed, and the concept of out of sample means nothing. You assume you have the gold standard, you take a population measurement, and you are finished. That is a profoundly unscientific approach, even on a common sense level, but if you can win just with slogans, it has “Science!” written all over it. It is true junk science, but you can sell it as the real thing if there is no time to explain why, again on a common sense level, and explaining takes time.
So, by the end, I did not understand some of the points deBoer was trying to make. Maybe I do not know enough about the politics of the great public school wars.
Raven
@taylormattd: You got that right.
pofthree
I would support all efforts to cut public education.
Both my parents were teachers, and most of my elder kin. A family gathering is like a teachers convention. Hordes of graduated students drop by my parents every year. I have nothing but admiration for them.
But the system has changed. My dad _knew_ every kid in his class and their parents. Today when I dropped my third kid into class, I felt like I was talking to an automaton. Same with the school office and the district office. All smiles and polite responses – the kind you get from trained customer service reps who refuse to solve anything.
All this despite the fact that class sizes are much smaller than when my parents were teaching. That is the sum total of my experience over 10 years and 3 schools and 3 kids.
If this was anything else in life, I would have taken my kids somewhere else. I am stuck here because I can’t afford to do so. And I’m fed up.
Have I met some good people? Yes. I would say that this is the difference between the past and now. Good, competent people in the system are the exception today.
Maybe all of you are right and money will fix the issue. I just don’t believe that if I pay more these issues will be fixed. It will be more money to feed the monster in place now.
kindness
My wife is a teacher. Most years her kids are great. Some years though, you get a room full of kids who are frequently absent, don’t do any of their homework, don’t study and when you tell their parents about it, they don’t care.
This. This is why teachers shouldn’t be evaluated according to the test scores of their students. Some things are out of teachers’ control.
Davis X. Machina
@srv: Curriculums (English), or curricula (Latin). Curriculi is an impossible plural.
Brachiator
@jl:
I think that most standardized testing is a waste of time. It doesn’t tell you anything particularly useful, and often pointlessly stresses younger students. A fellow commuter is a second-grade teacher (and I think an especially good one) and she speaks movingly of how much anxiety her students feel taking these tests. She also points out many of the weaknesses of testing these younger kids when they are still developing verbal and math skills.
I honestly don’t know how useful these tests are for older students.
But if the tests are fundamentally worthless, then sampling them is not going to tell you anything useful either.
However, the basic question remains: how do you accurately assess student performance and teacher effectiveness? It’s not just about attacking teachers.
I couldn’t figure this out either. Again, I sympathize, and think that deBoer has thought hard about this issue, but still has some trouble making an effective argument.
taylormattd
@Raven: He rips the shit out of Ezra Klein based on a column he believes Ezra wrote. Turns out Ezra didn’t write it. It is pointed out by maybe a dozen commenters, yet he never responds, never retracts, never says a word, and there it remains on the front page of this blog.
And that is perfectly fine with you? The real crime is pointing it out. You should start commenting at Breitbart.com, you’d fit in there.
Davis X. Machina
Nonsense. Given enough seismographs, we can prevent earthquakes.
The Moar You Know
Testing is simply a method of driving teachers out of the classroom. Here in CA, the average career span of a teacher has dropped to five years. My wife, an English teacher, spends half her year teaching kids the tests – not teaching them English. Not reading. Not writing. Not analyzing. Nothing but teaching them the test.
Her job depends on how those kids score. Which is explained by the testers to the kids. The kids, BTW, suffer no negative consequences if they decide to throw the test and fill in deliberately wrong answers or none at all. So far, fortunately, only about five percent do that. So far.
The kids come out dumber and the teachers quit. Just the way the profiteers waiting in the wings wanted.
Raven
@taylormattd: what-the-fuck-ever, take it up with the blog owner.
Dennis SGMM
In a phrase popularized by Mark Twain, who himself credited it to Disraeli, “There are three kinds of lies: lies, damned lies, and statistics.”
Sly
@Belafon (formerly anonevent):
That can be remedied through a random selection process that’s out of the hands of the district.
A big part of why these tests are useless is that, because they are to be taken by many students at once, they are designed with an efficient grading process in mind. Multiple-choice bubble tests are nearly useless in every academic subject, yes, but they are exceptionally easy and fast to grade.
SpotWeld
The thing that frequent standardized tests are great at is producing metrics. Nice spreadsheets of numbers that you can boil down into Pareto charts and graphs that allow a functionary to crank through a set formula and then summarize into a few bullet points.
They can then say the “insert trendy buzzword filled overpriced pre-packaged 3-ring binder” program shows that we’ll get maximum efficiency and raise student scores if we eliminate this and this. (And if a little standardized testing is good, even more must be better!)
It’s awesome for midlevel functionaries. No depth needed, results are nice and glossy and look awesome in PowerPoint slides. Politicians can hand pick people by ideological agreement and never worry about having to deal with skilled and talented people who are inconveniently not in line with party goals.
Yup, rapid-fire standardized tests are great. They make it look like a bloated administration is needed and necessary, which is exactly the primary goal of a bloated administration.
Teaching on the other hand… well, they’ll get to that once they get enough test results (oh, bad news, there’s nothing that says what defines “enough test results”)
ding dong
@Raven: do you have to keep on asking the question again? He’s a pitbull. Btw you sound like a Jack Russell yourself :-)
SpotWeld
@Sly: I concur. Standardized tests can be a good tool for checking to see if a student has learned a certain amount of minimum basics.
That means the minimum basics needed to ensure a school is covering certain mandatory topics, and the minimum basics needed to enter college. But to see if a student is excelling, to see if they are getting deeper context, to see if they are reaching a point where they can actively learn beyond the classroom? There may be testing that can cover that, but it’s not going to be standardized.
FormerSwingVoter
I think one issue with the strike is that a lot of people feel like teachers need to be evaluated on some level, and the worst performers kind of need to be weeded out, as terrible as that sounds. It happens in lots of other fields – people are bad at things, so it’s best that they move into fields that they’re good at. I was a terrible computer programmer, now I’m a pretty good marketer.
Now, I’ve never worked in education in any way, shape, or form, so take anything I say with a grain of salt… but it seems to me that the criteria that a lot of unions want to use, like seniority, wouldn’t necessarily correlate with being the best at their jobs. I totally understand that standardized testing may be an even worse measure, but the perception out there (pushed by the media, natch) is that teacher unions are trying to resist teacher evaluation completely.
So how can we evaluate teachers in a way that makes sense and reward the best ones? Seniority seems to be a sub-par way to do it, but standardized testing is even worse. But if we can manage to figure out how to solve that problem we can improve schools in a big way.
Raven
@ding dong: It was the database error.
taylormattd
@Raven: Yes, I’ll send Cole an email complaining about Freddie, instead of, you know, attempting to ask Freddie directly. Perfectly logical.
Nicole
@FormerSwingVoter:
The reason unions protect seniority is because it is far more likely that an employer will push out a good senior employee so they can hire a newbie and pay the newbie less, than it is that a senior employee has slid by for years, getting raises, by being bad at the job.
This “getting rid of bad teachers” thing is simply a way to justify paying teachers less through getting rid of senior teachers. If we were really concerned about the quality of our teachers, we’d make the starting salary high enough to be competitive with other fields that require similar levels of education.
Ohio Mom
@FormerSwingVoter: Hard to know where to begin on this one, so many intersecting issues.
First though, you should know that in places where teachers have tenure, they aren’t given it right away; the principal can always forego renewing the contract of a first/second/third year teacher he/she isn’t thrilled with.
Second, even tenure does not guarantee a job for life; it only ensures that a principal has to carefully document the teacher in question’s failings and follow a due process procedure in terminating that teacher. And there are already procedures in place to evaluate all teachers; in my state, even the most senior teachers must be observed by an administrator every few years.
Third, many beginning teachers realize all by themselves that teaching is not for them; a lot of people only stay in the field for a few years.
Fourth, there are many reasons to protect seniority, but a big one right now, in our current age of reduced budgets, is that more experienced teachers are more expensive. You can get two young ’uns for the price of someone who has been around a while. If you’re depending on standardized test scores, this is easy to rig. Just give the older teacher a bunch of kids who aren’t going to do well on the test, and are going to cause a lot of issues all year long which in turn will disrupt everyone else’s learning and voila, low test scores for the class as a whole and the excuse you need to boot that teacher.
BUT the main issue here is the framing. Why are we adopting the ed deformers’ language and assuming the biggest problem is bad teachers? Even if we lived in a Lake Wobegon of teachers, where every single teacher was above average, you’d still have a lot of underachievement, and most of it would be because we have many, many more poor kids than most other developed countries.
Finland has something like 2-3% of their children in poverty; we have almost a quarter. *Our schools that serve kids who aren’t poor score as well as, if not better than, schools in other developed countries. But our poor kids don’t, and they pull our averages down.* It’s often said, we have a poverty problem, not a school problem.
Chris
It wouldn’t matter if you tested only a small sample. As long as jobs and school closures hinge on test results, they’d still have the same incentives to teach to the test. Unless they know in advance which kids will get tested, they’d have to do the test-prep with everyone. It’s the high-stakes nature of the testing, not just the amount of testing, that’s the problem.
Marc
Standardized tests have real statistical value – for example, let’s say that you change the way you teach algebra. How well does method 1 compare with method 2?
They can also be useful at the individual class level as long as you understand the intrinsic noise. If otherwise similar classes have scores ranging from 30 to 70, 95% of the time, then you don’t actually know that a teacher whose class gets 30 is any worse than one whose class gets 70. But if test scores consistently plunge by a lot for kids with one teacher then there is a problem.
I’ve actually come to appreciate things like student evaluations of instruction at the college level. The relative scores from my students track my own self-assessments very accurately. The scores assigned to my peers are consistent with what I observe when I’m in class doing peer evaluations. You really can get information from a properly designed test; you just need to understand that numbers have errors and not to assign consequences to things that are statistically the same.
Janus Daniels
@pofthree: Where are you sending your children to school?
Arclite
It’s not just that they test the kids every year, but multiple times a year. My daughter took three sets of standardized tests last year: fall, winter, & spring.
For her, these are low stakes, low pressure. They don’t reflect on her grade, she’s very well educated (her Lexile score for 4th grade was 1210, high school level) so she finds the tests “easy”, and she views the tests as “puzzles” to solve, so they’re actually quite enjoyable for her.
That being said, I don’t think most kids are like that, and that’s where the issue comes in. My son is only in Kindergarten, but he freezes up when isolated. He can read and write and count with the group, but get him alone, and he struggles, not for not knowing, but just from the stress. Otherwise he’s a normal (but shy) very intelligent kid (was speaking at six months). I wonder whether, despite being raised in the same way as his sister, he’ll have the same outcome, because his personality is very different and may not be suited to filling in bubbles.
Ohio Mom
@Marc: Nobody is giving or looking at standardized tests in the public schools in order to try to figure out what teaching methods work best.
There is indeed a lot of research on what teaching methods work best — short answer is, depends on the learning style of the student, different kids learn best different ways and teachers need to accommodate those different styles — and that research both predates the NCLB/RTTT testing regimens and continues on without any input from those regimens.
That said, there is an exception to what I just said, and that is the fevered review school districts do of the tests and their schools’ results so that in ensuing years they can do a better job of teaching to the tests. But I don’t think that’s what you meant…
High stakes tests have two purposes. They’ve been used to make it look like our schools are failing, and now the ed deformers want to use them to make it look like the teachers are also “failing.”
In the end, it’s all to help in the effort to privatize/monetize K-12. And don’t think you’re immune over there in the higher-ed universe. They are coming after you even now, they just don’t have full attention for you yet because they’re still working on the lower grades.
stinger
As a former teacher, I find the idea deeply insulting that without frequent standardized testing, teachers have no idea how their students are doing.
Arclite
@Ohio Mom:
And it’s not even poor kids. My daughter’s public school is in one of the wealthiest parts of the state, and there are many underachieving kids there. The teachers can’t do it all, and the parents are too busy with their own lives to care. The kids play games or read comics or watch TV for hours after school, parents don’t check homework or sit with their kids or help their kids. The parents simply don’t imbue their children with a love of learning.
Comrade Nimrod Humperdink
@Brachiator:
The best, ideal way to do it, in my limited experience (I worked as an English adjunct at a State U for several years before I left the US) is performance assessment. Papers. Projects. Presentations. That means that evaluating teachers is based on what you see from the students in those performances, and evaluating their curriculum designs, as well as classroom observations (random if necessary) and so on and so forth.
Designing performance tasks requires the students to demonstrate some understanding of the concepts being discussed, and how to work with them, apply them. Multiple choice tests are big favorites because you can run them through a scanner, and the results can be used to blame teachers when they’re bad or as a feather in your political cap if they’re good. And they’re cheap. Expert evaluation of performance-based curriculum is time intensive and expensive. It would require far more in terms of resources than the states or the country would ever be willing to put into it, I would wager.
I took a seminar on writing assessment that introduced this concept by putting a list of ten steps to make a pot of coffee on a sheet of paper. If you mess one up as an abstract academic exercise, hey, you get 90% on the assignment, that’s an A. But you don’t have coffee, do you?