Testing Science and Scientific Inquiry

By Stephen Sawchuk — June 22, 2009 2 min read

I went to a really interesting session yesterday about how testing experts are using computer technology to measure science content in novel ways. Basically, the computer offers ways of testing students’ knowledge of science and the scientific process, as well as of simulating content that can otherwise be dangerous (like chemistry experiments) or processes, like erosion, that unfold over thousands of years. And, proponents say, it’s a way of increasing cognitive demand in testing and getting at students’ problem-solving capabilities.

Minnesota has a science test it’s using for No Child Left Behind purposes that uses what its creators call “figural” items. They are built around a particular animated scenario, such as building an electromagnet or going for a bike ride, and go beyond multiple choice to ask students to apply higher-order critical-thinking skills. For instance, an item might ask students to “drag and drop” water molecules to demonstrate what happens when water evaporates, to record data points onto a graph, or to put the species of a food chain in order.

The National Assessment of Educational Progress, which is now conducting science testing, has a subset of students who will be assessed using “interactive computer tasks.” These go even a bit further than the Minnesota test by requiring students to engage in the entire process of scientific inquiry, by conducting simulated experiments, recording data, and using it to arrive at an answer. The test also records some students’ keystrokes, such as whether they were able to find and deploy the appropriate tools, how many “test runs” they performed in their experiments, and how they arrived at their answers.

At face value, this seems like pretty cool stuff. The tests are much more interactive than multiple choice, closer to the interactive electronic devices kids use all the time these days. They allow for more sophisticated measurement than factual recall.

Now, with all of that, you may be wondering what the drawbacks are to this type of testing. Well, for one, these are complicated tasks, and they have to be capable of running on the equipment schools have, not all of which is cutting-edge. And the scale is substantial: about 185,000 students took the Minnesota science exam last year, for instance.

And then the scoring of these items can be difficult. When a question is open-ended, there have to be scoring guides and adjudication procedures for all the possible answers. The guides can go on for pages when there are hundreds of answers, rather than the typical four on a multiple-choice test.

For NAEP, this has meant constructing elaborate computer protocols for extracting evidence of students’ knowledge and skills from their keystrokes, along with rubrics for scoring the tasks.

There is also the technical challenge of designing these things without introducing “construct irrelevance” — that is, making sure that the interface, graphical elements, and help menus that make up these test items don’t detract from the content, concept, or skill that’s being tested. Or, as one presenter put it more succinctly, “We can’t use Guitar Hero [a popular video game] for measuring science.”

What does this all mean? Well, as one of the panelists suggested, has our ability to develop clever science items gotten ahead of our capacity to generate meaningful data about student performance from those items?

I’ll let you be the judge.

A version of this news article first appeared in the Curriculum Matters blog.