Special Report
Teaching

Performance Assessment: 4 Best Practices

By Stephen Sawchuk | February 05, 2019 | Corrected: February 06, 2019
Griffin Walsh plays Kindville at Newnan Crossing Elementary School in Newnan, Ga. Some schools in the state, including Newnan Crossing, are piloting Kindville, a new formative education assessment program which looks, and plays, just like a video game, but will eventually spit out qualitative math and reading scores.

Corrected: An earlier version of this story contained an incorrect job description for Paul Leather. He oversees state and local partnerships for the Center for Innovation in Education.

Let’s get this out of the way first: Performance assessment—the idea of measuring what students can do, not merely what they know—is not a new idea in K-12 education.

Teachers have been told to engage students in projects at least since the days of John Dewey, and probably long before that. (The famous Socratic method, after all, requires students to advance and sustain their positions in an argument, not repeat back knowledge.)

Nevertheless, performance assessment has a checkered history in the United States. In the 1990s, the last major period of experimentation, it was tried at scale and then abandoned in Kentucky, Maryland, and Vermont.

The challenges begin with a definitional problem: Does an essay test count as a performance assessment? What about a short response on an otherwise multiple-choice test? Experts disagree, and such quibbles have fueled confusion over how to measure “authentic” student performance.

Education Week’s special report sets out to inject some clarity into the debate. In this report, you’ll find a glossary, examples of districts slowly expanding their use of capstone projects for graduation, states encouraging the use of tests that feel more like games, and colleges exploring whether demonstrations of competency can supplement traditional application credentials like seat time and transcripts.

The landscape of performance assessment remains hard to parse. Although most testing experts agree that it’s trending again, it’s unclear how widespread performance assessment is. Just ask Jenny Poon, a fellow at the Center for Innovation in Education, which advises four states on the development and use of the tests. She’s worked to create a continuously updated, comprehensive map of state action.

The problem is that, even in states that have policies supporting performance testing on paper, districts vary greatly in how rigorously they implement those ideas.

Still, Poon said a few trends help to raise interest in measuring student performance in richer ways than multiple-choice questions. About 20 states now use the Next Generation Science Standards, she points out, which specifically require students to engage in scientific practices, such as generating hypotheses and recording data from experiments.

A second thrust is states’ interest in better gauging what high school graduates know and can do, as evidenced by the spread of state and local adoptions of diploma seals and capstone projects—presumably a firmer indication of student ability than credit hours.

Experts also know more about performance assessment after years of experimentation. So as you read the report, keep in mind the four big lessons they’ve offered up, which are distilled here for you.

1. Decide on goals first.

First and foremost, the experts say: Know why you want the assessment and what benefits you expect to achieve by investing in it.

“There’s no point in teaching someone to write an article for a newspaper and giving them a multiple-choice test to see if they’re able to do that,” said Scott Marion, the executive director of the Center for Assessment, which advises states on testing. “Performance assessment is made for those situations. But if you’re filling in grammar rules, then maybe multiple choice is fine.”

A related issue concerns how the results will be used. Performance assessments are generally more difficult to standardize and less likely to produce comparable results for individual students. That’s probably OK if the test is being used mainly to supplement curriculum or for classroom grading. But it’s a bigger problem if you want to use it for making decisions about whether a student should graduate from high school or for school ratings.

One well-known mishap occurred in Vermont in the early 1990s, when the state’s portfolio-assessment program rolled out. The program used teachers to score collections of students’ best math and writing work. Early results showed that the degree of agreement among teachers’ scores, known as rater reliability, was initially fairly low. In retrospect, RAND Corp. researcher Brian Stecher, who helped evaluate the program back then, wonders whether leaders there got the focus wrong.

“I think what was really beneficial in Vermont was the fact that this broadened to some extent how teachers were teaching mathematics, instead of a reductive ‘I do, we do, you do,’ ” Stecher said, referring to a common teaching method taught during teacher preparation. “That seems like a good thing to me and valuable in its own right—and might have been a better use of this unstructured portfolio than trying to have it be the basis for a standardized judgment.”

2. Keep costs in mind.

Coming up with good performance tasks can be expensive as well as time-consuming. In short, it’s hard to do performance assessment on the fly or on the cheap. That’s especially the case if what’s valued is the comparability and reliability of scores, which requires creating and field-testing many tasks.

“When you open up assessments to getting students a wide range of response possibilities in terms of format, length, and activities, then it just becomes very hard to manage the time, and materials, and scheduling. It becomes hard to incorporate it into a structured system of assessments, and it also becomes more expensive,” Stecher said.

That’s one reason so few states have done so at scale under federal annual-testing requirements. New Hampshire, the sole exception for now, is using some traditional exams in the years it doesn’t administer its locally developed performance measures.

Finally, even if a performance exam is only used locally or for classroom purposes, teachers must invest time and energy to familiarize themselves with its scoring frameworks to make sure they’re grading fairly. Many districts with expertise in performance assessment, in fact, use blind scoring or double reviews of student work—and all that takes time.

And while teachers are generally more knowledgeable about scoring frameworks, or rubrics as they’re called in the field, than they were 20 years ago, there’s still often an expertise gap for teachers who are used to fill-in-the-blanks and true-false questions, said Steve Ferrara, who oversaw Maryland’s now-defunct performance-assessment program in the 1990s. (He’s now a senior adviser at Measured Progress, a testing company.)

3. Prioritize teaching and learning—not just testing.

Performance assessment in education should be part and parcel of reforms to teaching and learning.

Much of the criticism of multiple-choice tests is that they encourage teachers to focus on low-level, easily measured skills. The inverse should be true, too: Give students rich assessment tasks worth teaching to and help support educators to redesign their instruction to boost development of skills like analysis and inference.

In fact, studies from the 1990s on the Maryland State Performance Assessment Program found that under it, teachers had higher expectations for their students’ learning, and principals had higher expectations for what teachers should do. Schools with a high degree of curriculum alignment to the tests showed the most improvement, Ferrara said.

In other words, performance assessment truly requires system change.

“If you don’t include at least parallel reforms in teaching and learning, an assessment isn’t enough,” Marion warns. “You have to improve the meaningfulness of the content, instructional quality, and improve student engagement, too. If you’re not doing those three things, then you’re just rearranging the deck chairs.”

There are also technical reasons why the mirroring of testing and instruction is desirable: Performance assessment hinges on students having had enough exposure to the content and skills needed to complete the task. Otherwise, the assessment might measure generic problem-solving intelligence, rather than how well students grasp and apply what they’ve learned, noted Sean P. “Jack” Buckley, the head of the U.S. Department of Education’s statistical wing from 2010 to 2013, during which he oversaw the development of the agency’s first performance tasks for exams administered as part of “the nation’s report card.”

“This was always something we worried about,” he said. “It is way easier to make a hard test that smart people can do well on than one that shows growth tied to teaching and learning.”

4. Plan for scaling up the exams—and communicating the results.

Parents and teachers can be a performance assessment’s biggest boosters or its toughest foes, which means it’s key to keep them apprised of the assessment program and the logic behind it as it’s piloted, rolled out, and scored.

Teachers in particular, the experts say, should be intimately involved in test design and communications.

“It takes time to build the capacity to build quality assessments; it’s almost an apprenticeship approach,” said Paul Leather, who helped get New Hampshire’s performance-assessment system off the ground and now oversees state and local partnerships for the Center for Innovation in Education, a research and consulting group.

“As we built our common tasks, we selected content teacher-leaders who led development of the content and the common tasks,” he said. “Over time, they start to lead the entire system because assessment literacy has reached such a high level, and we believe that actually has to happen for this kind of system to scale. You essentially create a way in which expertise is not just shared as a product, but something that helps others to gain that expertise over time.”

Even when teachers are involved in task design, they can feel left behind without the right training and supports, Ferrara cautioned. “It took so much effort in the first few years [of MSPAP] to get the program up and running that all the investment went into the assessment program and not into” professional development, he said. In fact, he recalled, missing materials and a lack of training in the 1992 assessment administration raised teacher ire and got the test slammed in newspapers as the “MSPAP Mishap.”

Finally, as performance assessments yield more-nuanced information on students’ abilities, there’s a related challenge of communicating those results. For six years, Maine required high schools to prepare students to demonstrate competency in eight subjects to earn a diploma. But the experiment faltered in part because districts struggled to communicate what the new grades, often issued on a 1-to-4 scale, meant—and how they’d affect students’ chances of getting into college, according to news reports on the system. By 2018, the pressure caused state lawmakers to roll back the requirements, giving districts the option to return to traditional diplomas.

A version of this article appeared in the February 06, 2019 edition of Education Week as Four Lessons Learned When Teachers Went Beyond Bubble Tests
