“The four corporations that dominate the U.S. standardized testing market spend millions of dollars lobbying state and federal officials — as well as sometimes hiring them — to persuade them to favor policies that include mandated student assessments, helping to fuel a nearly $2 billion annual testing business, a new analysis shows.” [WaPo March 30, 2015]
“Pearson Education, ETS (Educational Testing Service), Houghton Mifflin Harcourt, and McGraw-Hill— collectively spent more than $20 million lobbying in states and on Capitol Hill from 2009 to 2014.” [WaPo March 30, 2015]
Standardized testing has its uses. Standardized testing is also a very remunerative business. And because it is so lucrative for the vendors, it’s expensive for local schools. Are we getting what we’re paying for?
The initial answer to that question appears to be – probably not:
“Nevada will receive a $1.3 million settlement from the testing company partly responsible for the debacle that resulted in only a third of students completing their standardized exams this spring.
The New Hampshire-based company Measured Progress has agreed to refund 30 percent of the state’s $2.7 million contract, returning nearly $790,000, according to an announcement by the Nevada Attorney General’s Office on Monday. In addition, the company will provide $510,000 worth of assessments aligned to Nevada’s new science standards in middle school.” [RGJ] (August 24, 2015)
So Measured Progress is out; one of the Big 4, CTB/McGraw-Hill, is in; and Nevada is still using the services of the Smarter Balanced Assessment Consortium.
Testing last spring was a lovely hot mess. However, that really shouldn’t be the end of the discussion. Again, we need to ask: Are we getting what we’re paying for?
That depends on whether the people making the decisions about the administration of standardized tests, and those making the decisions about the use of information derived from them, are using the tests in rational ways. First, let’s look at what standardized tests are created to do:
“The folks who create standardized achievement tests are terrifically talented. What they are trying to do is to create assessment tools that permit someone to make a valid inference about the knowledge and/or skills that a given student possesses in a particular content area. More precisely, that inference is to be norm-referenced so that a student’s relative knowledge and/or skills can be compared with those possessed by a national sample of students of the same age or grade level.” [ASCD]
What is being described in the paragraph is a “norm-referenced” test, in which the student is measured against a hypothetical “average” student. A criterion-referenced test measures the student’s performance against a set of pre-determined criteria, or learning standards. If we are speaking about being able to assess whether a youngster has met the achievement standards in 4th grade arithmetic, then a criterion-referenced test would obviously be the best choice. [FTO]
Let’s look at the Nevada High School Proficiency test for a moment, specifically at the reading portion. (pdf) Passages are printed out for the student to read, and then three to six questions, in multiple choice format, are allocated to each section. In the first practice sample, the very first question seeks to determine if the reader can identify “tone” in writing.
However, in order to arrive at the correct answer the reader must also know the definition of “sarcasm,” “distorted descriptions,” “vivid word choice,” “sophisticated sentences,” “figurative language,” and “humorous comparisons.” The correct answer is “D,” the author intending his figurative language and humorous comparisons to create an amused tone. Amused? Doesn’t that depend on one’s sense of humor? The paragraph begins with the “studio was a mess,” and ends with “The place was a dump.” For the humor-challenged among us, item C, with its “sophisticated sentences” (with dashes and semi-colons) and “ornate language” creating a “superior tone,” could look just as plausible. Thus, in order to get this item correct, our hypothetical student has to have a vocabulary background sufficient to understand the alternatives, and has to have a sense of humor. And herein we run into the Multiple Choice Question Wall.
A criterion-referenced test, no matter how sophisticated, no matter how carefully designed and precisely structured, has to be completed in a reasonable amount of time. The sample test includes two items related to “tone” and “mood.” In other words, there are essentially two items on the high stakes test which are designed to determine if the reader can identify these characteristics of writing, and one of them may partially depend on whether or not the reader has a normally developed sense of humor, or maybe any sense of humor at all. However, it would be impossible to publish a test that gives a precise determination of whether the reader can identify “tone,” or “mood,” without making the test unrealistically long. The following is one of the best summations of that Multiple Choice Question Wall issue:
“Given the size of the content domains to be represented and the limited number of items that the test developers have at their disposal, standardized achievement tests are really quite remarkable. They do what they are supposed to do. But standardized achievement tests should not be used to evaluate the quality of education. That’s not what they are supposed to do.” [ASCD]
How long might an examination have to be in order to fully and precisely determine whether or not a Nevada high school senior had adequately mastered the reading and language arts standards currently adopted? (pdf)
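For a back-of-the-envelope feel for that question, here is a minimal Python sketch. It assumes, purely for illustration, that every item on a single skill is an independent right-or-wrong question of equal difficulty, so the familiar margin-of-error formula for a proportion applies. Real test developers use more refined psychometric models; the function names, the 70% mastery figure, and the 10-point target are hypothetical and are not drawn from the Nevada exam.

```python
import math


def margin_of_error(p, n_items, z=1.96):
    """Approximate 95% margin of error for a proportion-correct score.

    p       -- assumed true share of such items the student would get right
    n_items -- number of items on the test that measure this one skill
    """
    return z * math.sqrt(p * (1 - p) / n_items)


def items_needed(p, target_margin, z=1.96):
    """Smallest item count whose margin of error is at or below the target."""
    return math.ceil(z ** 2 * p * (1 - p) / target_margin ** 2)


# Hypothetical student who would answer 70% of "tone" items correctly.
print(margin_of_error(0.7, n_items=2))       # uncertainty with only two items
print(items_needed(0.7, target_margin=0.10)) # items needed for +/- 10 points
```

Under those assumptions, two items leave a margin of error wider than 60 percentage points, and pinning a single skill down to within 10 points takes on the order of 80 items. Multiply that by every standard on the list and the length problem becomes plain.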
Then there’s the matter of the “cut off” score. Whether the bottom line is a good old-fashioned one – no, you don’t get a driver’s license without answering 80% of the questions correctly; or, whether there’s a fancy new way to derive “passing” scores like the Nevada Department of Education’s “compensatory” system [NDE ppt update 4/15 #5] – a bottom line is still a bottom line. Once again we run headlong into a structural issue which informs how we should, or should not, be using the results.
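To make the difference between the two kinds of bottom line concrete, here is a small Python sketch. It assumes the usual textbook meaning of “compensatory” scoring, in which a strong result on one section can offset a weak result on another, as opposed to a rule requiring every section to clear its own cut. The section names, scores, and cut points are invented for illustration and are not the Nevada Department of Education’s actual rules.

```python
def passes_every_section(scores, section_cut):
    """Old-fashioned rule: each section must clear its own cut score."""
    return all(score >= section_cut for score in scores.values())


def passes_compensatory(scores, total_cut):
    """Compensatory rule: only the combined total has to clear the cut."""
    return sum(scores.values()) >= total_cut


# Hypothetical student: strong in reading, just short in writing.
student = {"reading": 350, "writing": 280, "math": 300}

print(passes_every_section(student, section_cut=300))  # False
print(passes_compensatory(student, total_cut=900))     # True
```

The same student fails under one rule and passes under the other, yet in both cases a single number still decides the outcome.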
What if we gave a test and everyone passed it? In an ideal world, wouldn’t every student in every class in the entire state pass a proficiency test (or end of course exam) based on carefully crafted criteria?
Is there such a thing as “too many test takers passing” an examination? One of the classic issues with norm-referenced tests is that most of the questions are of the middling variety, i.e. not too hard and not too soft. There’s no way to differentiate among students without Score Variance. We can say an individual “exceeds” the standards, “meets” the standards, or “fails to meet the minimal standards,” and we’re still talking in Goldilocks and the Bears terms – it’s too cold, too hot, or just right.
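The Score Variance point can be seen in a few lines of Python. This is a sketch under simple assumptions: each item is scored right (1) or wrong (0), so its variance is p × (1 − p), where p is the share of students answering correctly. The response lists are made up for illustration.

```python
def item_variance(responses):
    """Variance of a right/wrong item: p * (1 - p), p = share correct."""
    p = sum(responses) / len(responses)
    return p * (1 - p)


# Hypothetical classes of ten students (1 = correct, 0 = incorrect).
everyone_correct = [1] * 10          # well-taught material
half_correct = [1] * 5 + [0] * 5     # a "middling" item

print(item_variance(everyone_correct))  # 0.0
print(item_variance(half_correct))      # 0.25
```

An item that everyone answers correctly has zero variance, so it cannot rank one student above another. A norm-referenced test has little use for it, however important the material is.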
Human nature being what it is, and American definitions of progress being what they are, the tendency is to follow up success with more opportunities for failure. If too many youngsters are in the “just right” category, then the test must be “too easy.” Thus we negate the purpose of the criterion-referenced testing – to find out if the young test takers are learning what they’re supposed to. Unfortunately, the better the job teachers do in imparting significant knowledge or training young people in important skills, the less likely there will be items on a norm-referenced test measuring those bits (see Score Variance), and the greater the likelihood we’ll move the goal posts in criterion-referenced testing.
Meanwhile, back in the billfold: Test manufacturers and their $2 billion annual revenue business will continue to ride the political winds in which some blow-hards declare that satisfactory test scores should be a prime method to measure “effective teaching,” thereby conveniently ignoring the confounding causality trap and using the results to declare that (1) Schools Are Failing and (2) should be the subject of True Universal School Choice. As long as politicians are perfectly willing to outsource the evaluation of schools and classrooms to corporate interests, corporate interests will be just as willing to manufacture “assessment” and “measurement” vehicles for this spiraling cycle. With, of course, the local taxpayer footing the bill.