All Means All

All means all. That’s the consistent message from federal officials. States must test all students with disabilities, among others, report the results, and, with few exceptions, use the scores to judge schools.

But how to do so in a way that’s fair and accurate can elicit as much controversy as clarity.

“I think it’s one of the most difficult challenges in measurement, frankly,” says Daniel M. Koretz, a professor of education at Harvard University.

Take the issue of accommodations, the most common means for giving students with disabilities greater access to standardized tests. Accommodations are changes in test materials, procedures, or settings that are designed to eliminate barriers to performance related to a student’s disability. Generally, students receive the same accommodations during testing that they receive during regular classroom instruction, as required by their individualized education plans, or IEPs.

Commonly used accommodations include providing a Braille version of an exam for a student who is blind, permitting a student to mark answers in a test booklet rather than on a separate answer sheet, using a computer or word processor, dictating responses to a scribe, providing an interpreter for a student who is deaf, offering large-print editions of tests, and allowing frequent breaks during testing.

When 12-year-old Ilana Kahan and her 15-year-old brother, Alex, take standardized tests, for example, they’re typically in a room with just a few other students. The test directions are read aloud to them, sometimes more than once. On mathematics tests, they can use a calculator. And the brother and sister have extra time to finish the exams.

Such changes are essential for the children, who have learning disabilities, to show what they know and can do, says their mother, Jouette Kahan. “Without extended time, without the use of a calculator, without being able to clarify directions, my children probably really couldn’t take these tests and be successful,” says Kahan, whose children attend school in Montgomery County, Md.

Preliminary results from State Accountability for All Students, a three-year, national research project at the University of Dayton that began in 2001, found that increasing the accommodations permitted on state tests boosts the participation rates of students with disabilities. In states with 25 or more unrestricted types of accommodations, about 75 percent of such students took elementary reading tests, compared with 58 percent in states with fewer accommodations. Similarly, about 77 percent of elementary pupils with disabilities participated in state math tests in such states, compared with 60 percent in states with more restrictive lists.

But research has failed to provide simple or conclusive answers about how specific accommodations influence test scores. Often, it’s unclear whether a specific accommodation or a combination of them actually helps special education students or--of equal concern--gives them an unfair advantage. Another worry is that some adaptations--most notably, reading passages of a reading test out loud to students--may change the nature of what’s being tested.

“The simple answer is, we know very little,” Koretz says. “It means that people have to fly, to some degree, by the seat of their pants.”

With limited research to guide them, and differences in their assessments, states vary widely on which accommodations they do or don’t allow on state tests. Sometimes, an accommodation permitted in one state may be prohibited in another, even when the same test is used.

“States seem to agree that it’s better to list more than fewer accommodations,” says Martha L. Thurlow, the director of the National Center on Educational Outcomes, a research group at the University of Minnesota that tracks such policies.

She notes that a decade ago, few states even had written guidelines. “But there still isn’t a lot of real consistency,” Thurlow says. “And I believe that’s because the accommodations that states decide are OK or not OK reflect attitudes and beliefs.”

‘Over- and Under-Accommodated’

One concern is that differences in accommodation practices and who receives them make it hard to interpret or compare test results.

Federal law requires states and districts to provide “appropriate accommodations” to students with disabilities on tests “where necessary.” But the actual choice about how special education students take part in state tests and which accommodations they receive rests with their IEP teams.

Research suggests such teams, which often lack expertise in assessment, may be ill-equipped to make such decisions. Kenneth Olsen, the director of the Alliance for Systems Change, based at the University of Kentucky, this past year surveyed 22 states about the training they provided teachers and others about the use of accommodations during testing and instruction.

“We just found that not much training was going on,” he says. “And those who were doing the training were doing it on a catch-as-catch-can basis.”

Partly as a result, he says, “we’re finding kids are both over- and under-accommodated. In some cases, we find that the local people feel like they’re cheating when they make accommodations, and so they don’t.

“Then, on the other side,” he continues, “you have teachers who look at the list and say, ‘I’m going to give this kid every break I possibly can.’ If there are accommodations, they pile them on. So we have both ends of the continuum.”

Stephen Tollafield, a lawyer with the Oakland, Calif.-based Disability Rights Advocates, argues, “Students should be able to use on a standardized test any accommodation that they have in a classroom that they use every day to demonstrate their knowledge.”

But often, test publishers and states distinguish between “standard” accommodations and what are called “nonstandard” accommodations or “modifications,” which they’re concerned alter the nature of what’s tested and invalidate the test score.

One of the most controversial examples involves reading questions and passages aloud to students, particularly on reading tests, a practice some states bar and others permit. This past fall, 30 Maryland elementary schools failed to meet their performance targets under the federal No Child Left Behind law after the state invalidated the scores of 3rd graders who had questions on a reading exam read out loud to them. School officials in the state’s Montgomery County district complained that they were caught between providing accommodations specified in the students’ IEPs and the new accountability requirements. In part, that’s because of continued confusion about whether IEP teams can select an accommodation that a state has not approved.

In an April 2003 letter to New York state education officials, Robert H. Pasternack, then the U.S. Department of Education’s assistant secretary for special education and rehabilitative services, indicated that states have the authority to instruct IEP teams to select only accommodations that a state has determined would not change the nature of the test.

“We agree that states must have the ability to ensure that state assessments are valid, reliable, and consistent with professional and technical standards, especially when the results will have critical consequences for the student or the school,” Pasternack wrote. “This is especially important, given the emphasis under No Child Left Behind on accountability for results.”

But Tollafield says parents and students often are poorly informed about the consequences of taking a test with accommodations or modifications.

A federal district court in California ruled in 2002 that the state must provide accommodations and modifications to students taking its high school exit exam in accordance with a student’s IEP. It also directed the state to offer an alternate high school assessment for students whose IEPs required it. State officials had not planned to provide accommodations unless a student received a waiver from the state, and Tollafield says some special education students were forgoing accommodations entirely for fear that using one would invalidate their scores.

How to report and count scores from nonstandard accommodations is a major issue for states. Education Week’s survey for Quality Counts 2004 found that 15 states forbid students to take state tests with modifications but have no further policies. Ten states exclude the results of tests taken with modifications when calculating proficiency rates. Eighteen states automatically give tests taken with nonstandard accommodations a zero or a score below the “proficient” level.

“The difference between an accommodation that a child might need and a modification that changes the construct of the test is a line that’s caused some challenges,” says Stephanie Lee, the director of the office of special education programs in the federal Department of Education. The department plans to provide additional guidance on the matter.

“We know that assessment accommodations are right. These children have a right to accommodations,” says Margaret J. McLaughlin, a professor of special education at the University of Maryland College Park.

“We also know that not every test can accept every accommodation,” she continues. “At certain points, you have to say if you’re really measuring how well a child can decode and read a text on a page and make meaning out of it, it doesn’t seem possible that you could ever allow that test to be read to a child and call it an acceptable accommodation.”

Alternate Assessments

For students who can’t take state tests even with accommodations, both the Individuals with Disabilities Education Act and the No Child Left Behind Act require states and districts to provide “alternate assessments.” In 1995-96, only six states offered students with disabilities an alternative to the regular state test. The 1997 reauthorization of the IDEA requires every state to begin using such measures no later than July 2000, but provides little guidance about what they should look like. As late as 2002-2003, more than 20 states had conditions tied to their receipt of federal IDEA grants either because they were not providing alternate assessments or were not reporting those scores, says Lee.

By this school year, though, every state had at least one alternate assessment available for special education students or allowed districts to develop such tests, ranging from a portfolio of work, to performance tasks completed over a period of days or weeks, to observational ratings completed by classroom teachers. See story, Page 79.

But before enactment of the No Child Left Behind law, the federal government had not really reviewed states’ alternate assessments or demanded that the results be used to rate schools. Now, the stakes are much higher.

As with accommodations, who should take such assessments, how to score the results, and how to fold those results into an overall picture of school performance are fiercely debated.

For example, should alternate assessments be limited to students with the most significant cognitive disabilities, or be available to any youngster who, for one reason or another, cannot take a standardized test even with accommodations? If a student receives a “proficient” score on an alternate assessment that is not aligned with grade-level performance standards, should that count as proficient when calculating school ratings in the same way as a proficient score on a regular grade-level test?

The debate escalated in 2002, when federal officials proposed limiting to 0.5 percent of the tested population the proportion of students who could take an alternate assessment linked to other than a grade-level performance standard and still have it count as proficient for calculating “adequate yearly progress” under the federal law.

Federal officials had been trying to provide an incentive for schools to pay attention to students with the most severe cognitive disabilities, by giving credit for such youngsters’ progress. But many educators complained the cap was too rigid. After a barrage of criticism, the Education Department raised the proposed cap to 1 percent, and was expected to issue final regulations late last year.

While some educators believed the 1 percent cap was reasonable, given the proportion of special education students who now take alternate assessments in most states, others protested that even the 1 percent figure was arbitrary and too low. They worried it could discourage school officials from suitably testing many children with severe disabilities, even though the policy does not limit the percent of students who can take alternate assessments.

“The risk is that some students may not be assessed appropriately as the cap influences decisionmaking,” wrote Lydia Calderon, with the office of special education in the Michigan education department, in commenting on the proposed rules, “or that students who are assessed appropriately, but their numbers exceed the cap, become those students who do not count, whether or not they are proficient within the framework of the alternate standards.”

Nationwide, the percent of students with IEPs who took alternate assessments in 2002-02 varied considerably. While less than 1 percent of students tested in grades 3-8 and 11 in South Dakota took such exams, 11.6 percent of students in grades 3-10 in Florida did so.

While the proposed rule would compel states to tie alternate assessments to the same academic-content standards used for other students, that has not always been true in practice.

According to the National Center on Educational Outcomes, five states and the District of Columbia base their alternate assessments on grade-level content standards. Thirty-two states use “extended” or “expanded” content standards that try to get at the essence of the content standard, but for students at a much lower skill level. Alabama and Minnesota build their alternate assessments around functional-living skills, such as learning to shop or cook, that are not related to the state’s content standards. And four states use a combination of the two.

Gerald Tindal, a professor of education at the University of Oregon who’s been researching alternate assessments across states, says “the fit between content standards and alternate assessments is really loosely linked at present.”

In some states, he contends, “the goal of the alternate assessment is to grab any behavior that the kid can be successful at,” with only the remotest link to content standards. Other states, he says, have tried to convert their content standards into skills that can be measured in a functional context, such as personal hygiene, “and that’s often been a big stretch.”

“I think states are now realizing, OK, we’re going to have to pay attention to these content standards’ having some kind of integrity,” Tindal says.

But opinions on how to measure the knowledge and skills of students with the severest cognitive disabilities--and just how much can be expected of them in mastering academic content--run deep. In response to the proposed rules, Jack Beard of Urbana, Ohio, wrote: “I have a son who is 18 years old and functions as a 2-year-old. He needs to be taught how to perform basic functions of living, not social studies. He cannot read, write, or even talk! It is a waste of time to tie in what he needs to learn, in order to survive, with state standard tests!”

In contrast, Sue Gibson, the parent of a 17-year-old son with developmental disabilities in Kirksville, Mo., wrote: “I fully understand that there are children who will never achieve grade-level performance, but as a parent, I don’t want my child thrown out of the mix altogether.”

“I want real goals that will translate into my son having what skills he needs to live a full life in the community,” she continued. “That means that he has to be able to have basic reading and writing and math skills. I know he can learn and do these things, especially if it is truly required and mandated to be accomplished.”

‘Gap Kids’

One reason the proposed rules about alternate assessments have been so controversial is that many states are struggling with how best to evaluate what some have called “gap kids.” Such students perform at too high a level to take an alternate assessment designed for youngsters with severe cognitive impairments, but at too low a level to show what they know and can do on state tests for students at their chronological grade.

“What happens to the kid who’s in the regular curriculum but six grades below grade level?” asks Edward G. Roeber, a vice president for Measured Progress, a Dover, N.H.-based testing company. “We don’t really have, in most large-scale assessments, tests that will work for these kids who fall into the gray area of policy. We would like to assess them at a level that doesn’t frustrate them.”

Under the No Child Left Behind law, that’s exactly what will happen, some fear, as those children are required to perform at grade-level standards and have the results count for their schools. In part, the law is trying to push schools to educate students who, in the past, may not have received appropriate instruction because expectations were so low.

“To take kids four years behind in reading, and put them in front of a test that’s full of reading and writing that’s four years above their grade level, and say, ‘This counts for the school,’ is just plain mean,” Koretz of Harvard asserts. “It doesn’t do the kids any good whatsoever.”

“We’ve identified them as failures,” agrees Marjorie K. Gray, the director of special education for the Oxford Hills district in Maine. “And worse, what we have done, because of the structure of No Child Left Behind, is we’ve identified their school as a failure.”

Some states have dealt with the issue by providing out-of-level tests for such youngsters, or tests designed for a grade level lower than the one in which the students are enrolled. Eighteen states allowed out-of-level testing in the 2002-03 school year, according to the National Center on Educational Outcomes.

But out-of-level testing is highly controversial.

“There are a lot of policy concerns about whether parents know and are informed that their students are being instructed below where they should be, whether it simply reflects low expectations for students who could do better,” says Thurlow of the University of Minnesota.

Data from some states show that many students who took out-of-level tests scored high on such exams, suggesting they could have been in regular, grade-level testing, she says. Moreover, out-of-level test scores are rarely reported publicly or used for either student or system accountability.

Yet some contend that out-of-level testing may be the best way to capture what some students know and can do. Susan Agruso, the assistant superintendent for instructional accountability for the Charlotte-Mecklenburg public schools in North Carolina, says it’s “silly” to give an 8th grade test to a student who is being instructed at the 4th grade level, “when that’s a stretch curriculum for him.”

She would like her state to offer out-of-level tests to such students and count their scores as proficient if they meet the goals spelled out for them at their instructional levels. Currently, such students take North Carolina’s Alternate Assessment Academic Inventory. The state automatically counts their scores as not meeting the standard--a result that has led some elementary schools in Charlotte-Mecklenburg to be identified as not making adequate progress under the No Child Left Behind law.

Initially, the federal Department of Education indicated that it would prohibit the use of out-of-level testing entirely for purposes of complying with the law. The measure explicitly mandates that students be assessed against grade-level content standards. Then, in a letter last summer, Secretary of Education Rod Paige wrote that states could continue to count the scores of students who took “instructional-level tests” in 2002-03 for purposes of adequate yearly progress for this school year only.

‘Huge Concessions’

To some scholars, the larger problem is retrofitting standardized tests to measure the knowledge and skills of youngsters whom they were never designed to assess in the first place.

If states had clearer content standards about which topics and skills are essential for students to learn, and publishers had narrower and more specific definitions of what their tests measure, those experts maintain, it would be easier to tell whether altering a testing practice for special education students was appropriate.

In the long run, they point to the hope of “universal design” as a way out of the dilemma. Taken from the same concept in architecture, the idea is to make tests available to the widest possible range of students from the start, much like drawing on-ramps and wider doorways into building blueprints. That might mean expunging unnecessary verbiage from a test that isn’t meant to measure reading or vocabulary skills, or piloting tests with a sample of students who better reflect the eventual test-taking population.

But most admit that universal design hasn’t reached the point where the federal government could take the concept into account to review state testing systems later this year.

“The way I describe this in public is that both sides have had to make huge concessions,” says Daniel Wiener, Massachusetts’ director of student-assessment services. “Special educators always felt that if a kid was in danger of failing a test, he didn’t have to take it. Now that students with disabilities are required to be included, the assessment system has had to make concessions to include them. They’ve had to provide a longer list of accommodations that sometimes went beyond the point they may have wanted to go. They’ve had to develop alternate assessments, usually expensive on a per-kid basis and requiring a lot of training of teachers. So everybody has had to meet in the middle.”

Lynn Olson

Lynn Olson was managing editor of special projects for Education Week. She also covered national policy (including “P-16” issues, NCLB standards, accountability, and reform), assessment and testing.

A version of this article appeared in the January 08, 2004 edition of Education Week