Eighteen states and the District of Columbia have adopted the Next Generation Science Standards, and many plan to officially start testing students on those standards in the spring of 2018.
But the tests are, for the most part, still in the early phases of development. And the question on many educators’ minds is: What will the final tests look like?
Last week, the American Institutes for Research, a research and evaluation nonprofit that is contracting with some states to design and implement NGSS tests, brought together psychometricians, science education experts, and state leaders for two days of discussion in Washington on how to turn the standards into state summative exams. The states represented included California, Connecticut, Hawaii, New Hampshire, Oregon, Utah, Washington, and West Virginia. (New Hampshire and Utah didn’t officially adopt the NGSS, but New Hampshire is considering adoption and Utah adopted standards based on the NGSS framework for grades 6-8.)
The meetings were closed to the press, but I spoke with Jon Cohen, the president of AIR assessment, and Gary Phillips, AIR’s senior vice president and an institute fellow, afterward about some of what the attendees decided at the gathering.
“The first thing you need to know is that although states are trying to collaborate and work together, the tests are going to differ across states,” said Cohen. That includes the form of the tests, how they’re structured, the questions used, and how they report scores.
For that reason, the results likely won’t be comparable across states. While two federally funded consortia were formed to create tests aligned to the Common Core State Standards, states are, for the most part, developing the NGSS tests on their own.
Even so, “there are going to be some things in common” across the tests, Cohen said.
Here’s some of what we know:
General Test Characteristics
- The tests are likely to last two hours at most.
- They’ll be delivered online. Some states will use adaptive testing, meaning items will change in difficulty depending on how the test-taker is performing (like the Smarter Balanced assessments for the common-core standards).
- The tests won’t have many (or any) multiple-choice items. Instead, students will write open-ended responses and essays and complete equations.
- Most states will test only once per grade band (3-5, 6-8, and high school), since that’s all the new federal education law requires. That means each test will cover topics from several grades. Utah, though, is likely to test every grade.
Test Questions
- Each item will test two or three of the NGSS dimensions. “In my experience looking at items people are actually coming up with across states, most often items measure some science content and a science practice,” said Cohen. “Sometimes they measure science content and a crosscutting concept. Almost never do they measure a crosscutting concept and a science practice, and almost never do they measure all three.”
- Many questions will use computer simulations and have students conduct virtual experiments. Students will, for instance, operate microscopes, use scales to measure weights, and build planetary models on the computer. Some states could opt for hands-on tasks, in which students are given manipulatives and perform tasks with their hands. “But in order to get that done in a realistic way in the test setting, it has to be very prescriptive: there have to be not a lot of choices students make,” said Cohen. “On a computer, you can simulate pretty much anything.”
- Test items will often come in clusters: several questions about a single topic that build on one another. AIR is recommending three to four items per cluster. However, some states will use as many as eight to 10 items per cluster.
Scores and Reporting
- States will likely report scores in one of two ways: 1) Students will receive one general science score and one score for each of the three dimensions (disciplinary core ideas, science and engineering practices, and crosscutting concepts), or 2) Students will receive one general science score and one score for each science discipline (physical, life, and earth and space sciences).
- Some states will choose to have all items scored by computer, which will allow for immediate results. Others may opt to hand-score some items, which will take longer.
- States will likely set their own cut scores, meaning scores won’t be comparable from one state to the next.
Cohen also walked me through a few AIR example items for the Next Generation standards. In the one below, students use a virtual scale and create a bar graph (the original question is interactive; this is a static screenshot):