A pair of commenters raised some provocative questions in response to my recent post, "We Cannot Solve the Problems With Tests by Creating More of Them."
P.J-D takes issue with another commenter, jfon, who wrote:
The tests offer little diagnostic information... offer little or no evidence they enhance or accurately measure learning.
P.J-D replies:
Depending on the test, these critiques may be true, but they also apply to the classroom grades generated by teachers. Until another method for legitimizing classroom grades is presented, state testing will be the go-to option.
This is an interesting rebuttal, in which P.J-D concedes that test scores are no better than teachers' grades, yet insists that teachers must meet some higher standard of legitimacy, demonstrating even greater value, before state testing can be set aside.
J.T. Steedle, who apparently does research in the area of test accuracy, chimes in with this view:
RE: "...I do not believe that any test that is mechanically graded, or even graded by low-paid humans, can successfully measure critical thinking and problem-solving."
Fortunately for all of us, reality is not dictated by beliefs. As automated essay scoring pioneer Ellis Page pointed out, we should be satisfied with automated scoring if we cannot distinguish human-generated scores from machine-generated scores. Ironically, human and machine scorers can be distinguished because the automated scoring engine scores more consistently with humans than the humans score with each other.
This remains true even when the task involves critical thinking. Complex, open-ended performance tasks administered as part of the Collegiate Learning Assessment and the College and Work Readiness Assessment are primarily scored by Pearson's Intelligent Essay Assessor (IEA) program. IEA's agreement with human scorers is consistently higher than human scorers' agreement with each other.
Who would you trust to score such tasks? Low-paid teachers?
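Before I respond, a note for readers wondering what "agreement" means in claims like Steedle's. Researchers typically correlate each scorer's marks with the others' (real studies more often report a statistic called quadratic weighted kappa, but a simple correlation conveys the idea). Here is a minimal sketch of the comparison, with invented scores chosen purely for illustration; this is not Pearson's actual methodology, just the shape of the argument:

```python
# A minimal sketch of how "the machine agrees with humans more than humans
# agree with each other" claims are typically computed. The scores below
# are invented for illustration; real studies use far larger samples and
# statistics such as quadratic weighted kappa.
from statistics import correlation  # requires Python 3.10+

# Hypothetical scores for ten essays on a 1-6 rubric.
human_a = [4, 3, 5, 2, 6, 3, 4, 5, 2, 4]
human_b = [3, 3, 4, 2, 5, 4, 4, 6, 3, 4]
machine = [4, 3, 5, 2, 5, 3, 4, 6, 2, 4]  # tracks both humans closely

print(f"human A vs human B: r = {correlation(human_a, human_b):.2f}")  # ~0.79
print(f"machine vs human A: r = {correlation(machine, human_a):.2f}")  # ~0.94
print(f"machine vs human B: r = {correlation(machine, human_b):.2f}")  # ~0.86
# Both machine-human correlations exceed the human-human correlation,
# which is the pattern Steedle describes for IEA.
```

Of course, as I argue below, consistency with the average human rater is a very different thing from the kind of judgment a teacher brings to complex student work.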
First of all, it is ironic that J.T. Steedle seems to think the pay teachers receive is relevant. The argument is rather circular: our profession is discredited as untrustworthy and thus unworthy of decent pay, and then that low pay is offered as a REASON we should not be trusted, because, presumably, who but an intellectual midget would agree to work in such a poorly rewarded position?
In reality, I believe teachers are capable of assessing students in deeper and more meaningful ways than even Pearson's almighty Intelligent Essay Assessor.
When I work with a teacher to design a Project-Based Learning unit, we create a series of activities and challenges that build up students' expertise in a subject. The students are then usually challenged to write about what they have learned, and perhaps to create a presentation or display of some sort to share it with others. This sort of assessment builds learning in multiple dimensions. As the project unfolds, we are able to assess how the student collaborates with peers. In science, our projects often include the processes of inquiry and investigation, so we can assess our students' ability to ask good questions and to design experiments that will reveal new information. When our students design their displays or presentations, we can assess their ability to communicate their ideas and knowledge to others. And in their reports, we can assess their deeper understanding of the science involved. In the most ambitious projects, students delve into ethical and social issues, and think critically about possible solutions to the problems they have encountered. These open-ended explorations are what give real life to this work.
This does not preclude other means of assessing student understanding; teachers often also use more traditional assessments to make sure that students have, indeed, learned the basic content as well as the deeper understandings we aim for with more complex tasks. But when we turn to standardized tests, we lose the capacity to explore those deeper understandings, especially when they go beyond conventional thinking.
I must plead a bit of ignorance when it comes to Pearson's Intelligent Essay Assessor. I have not met the mighty machine, nor seen it in operation, except on some very limited samples. I tried to test it out at the site where a demo is offered, but I could not make the text box accept my writing. It may be of some limited use in scoring student essays on pedestrian topics. But if its job is to relieve our concern over whether teachers can reliably score complex work, I retain a rather deep mistrust.
Another commenter, PL Thomas, makes an interesting point in this regard.
Consistency, like standard, is not what matters, especially when trying to offer valuable feedback about writing...
The use of computers and machines and automation -- all the WRONG GOALS to dwarf authentic goals... That's the inherent problem with a century of pursuing "standardized" and so-called "objective" tests...
To increase consistency, we lose the most important element -- the human element... (read more of Paul's thoughts here).
I want teachers capable of challenging their students intellectually. I want teachers to challenge students' thinking. But I also want students to be able to challenge the teacher's thinking. How does one challenge the thinking of Pearson's Intelligent Essay Assessor?
What do you think? Should we rely on Pearson's Intelligent Essay Assessor? Or should we trust our teachers -- regardless of their pay?