Assessment Chat

Testing and Accountability in the NCLB Era

David Figlio and Eduwonkette discuss whether today’s testing and accountability policies accurately depict student performance and the size of the achievement gap between groups.

August 19, 2008


  • Eduwonkette pens an anonymous education research and policy blog hosted by edweek.org. She is known as a “thorn” in the side of New York City Department of Education officials for dissecting – and often criticizing – the city’s testing and accountability policies.
  • David Figlio is the Orrington Lunt Professor of Education, Social Policy and Economics at Northwestern University and a Research Associate at the National Bureau of Economic Research. He has assisted the U.S. government, as well as several state governments and foreign countries in the design, implementation and evaluation of accountability policies.

Alexis Reed (Moderator):

Good afternoon, and welcome to Education Week’s Live Chat: Testing and Accountability in the NCLB Era. Joining us live are David Figlio, the Orrington Lunt Professor of Education, Social Policy and Economics at Northwestern University and a Research Associate at the National Bureau of Economic Research, and Eduwonkette, an anonymous education research and policy blogger hosted by edweek.org. In this afternoon’s Live Chat, they will discuss and answer your questions regarding testing and accountability in our current education system. I’m Alexis Reed, a research associate in the EPE Research Center, and I’ll be moderating this discussion with our guests, each of whom has a unique background and perspective on testing and accountability. We’re already receiving a tremendous number of questions for this chat, so let’s get right to them.

Question from Sherman Dorn, Professor of Social Foundations, University of South Florida:

The general theory of action for NCLB and other high-stakes accountability systems appears to assume the existence of magister economicus, the theoretically rational school employee. On the other hand, critics of NCLB, Florida’s systems, and others are concerned with the potential harms of irrational responses and unintended consequences such as narrowing the curriculum or teaching to the test. The critics seem closer to the mindset of behavioral economists.

Is there any research currently going on to determine if teachers are magisters economici, irrational actors, or a mix (and what type of mix)?

David Figlio:

I think that the evidence is becoming clearer that many of the hopes of high-stakes accountability advocates and many of the fears of high-stakes accountability critics are correct -- school administrators and teachers can and do respond to accountability pressures, at least at the margins.

A number of recent studies have shown that schools subject to greater accountability pressure tend to improve student test performance in reading and mathematics to a meaningful degree -- my recent study of Florida with Cecilia Rouse, Jane Hannaway and Dan Goldhaber (a working paper on the website of the National Center for the Analysis of Longitudinal Data in Education Research, or CALDER), for instance, suggests test score gains of one-tenth of a standard deviation in reading and math associated with a school getting an “F” grade relative to a “D” grade. We find that these test score gains persist for several years after the student leaves the affected school. Jonah Rockoff of Columbia University has a new working paper studying New York City’s rollout of school grades that suggests responses to grading pressure happen immediately -- grades released in November were manifested in test score changes that same winter/spring.

In the case of my study with Rouse, Hannaway and Goldhaber, we try to look inside the “black box” by studying a wide variety of potentially productive school responses, and it appears that Florida schools responded to accountability pressures by changing some of their instructional policies and practices, rather than “gaming the system.”

The rapid and apparently productive response of school personnel to school accountability pressure suggests that educators are, at least to some degree, “magisters economici,” responding to the incentives associated with the system. And this makes getting the system right so important, because if schools and teachers respond quickly to incentives, the incentives had better be what society/policymakers want.

Many people raise concerns about teaching to the test, and there is certainly evidence of this -- consistently, estimated effects of accountability on high-stakes tests are larger than those on low-stakes tests -- though the low-stakes results still tend to be meaningful, especially with respect to math. Harder to get a handle on is the narrowing of the curriculum to concentrate on the measured subjects; there is a lot of suggestive evidence that this is taking place to a small degree at the elementary level, though studies of the effects of accountability on performance in low-stakes subjects typically don’t find that performance in these subjects suffers -- but of course, those subjects are still being measured with tests. Still, there is certainly the incentive to reduce focus on “low-stakes” subjects. One possible solution for those concerned about low-stakes subjects being given short shrift would be to impose requirements such as minimum time spent on instruction or portfolio reviews.

There is a lot of evidence that accountability systems can have unintended consequences that are predicted by the magister economicus model. Derek Neal and Diane Whitmore Schanzenbach at the University of Chicago note that accountability systems based on getting students above a given performance threshold tend to induce schools to focus on the kids on the “bubble.” I’ve found that that type of system may lead schools to employ selective discipline in an apparent attempt to shape the testing pool, or even to utilize the school meals program to artificially boost student test performance by “carbo-loading” students for peak short-term brain activity. These types of unintended consequences are much more likely in accountability systems based on the “status” model of getting students above a proficiency threshold, rather than the “gains” model of evaluating schools based on how much these students gain.

But there’s a tradeoff here. The more we evaluate schools based on test score gains, where gaming incentives are lower, the more the focus is taken off of poorly-performing students whom society/policymakers would like to see attain proficiency. How the system is designed is crucially important.
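
The status-versus-gains tradeoff described above can be made concrete with a toy calculation. This is a minimal sketch with invented numbers: the proficiency cutoff of 300 and the two four-student schools below are hypothetical, not drawn from any study mentioned in the chat.

```python
# Toy illustration of "status" vs. "gains" school ratings.
# All scores are hypothetical; the proficiency cutoff is set to 300.

CUTOFF = 300

# (prior-year score, current-year score) for each student.
school_a = [(295, 305), (298, 302), (150, 155), (140, 150)]  # nudges "bubble kids" over the line
school_b = [(295, 296), (298, 299), (150, 210), (140, 205)]  # large gains for its lowest scorers

def status_rate(students):
    """Share of students at or above the proficiency cutoff this year."""
    return sum(now >= CUTOFF for _, now in students) / len(students)

def mean_gain(students):
    """Average year-over-year score gain."""
    return sum(now - before for before, now in students) / len(students)

for name, school in [("A", school_a), ("B", school_b)]:
    print(f"School {name}: status={status_rate(school):.2f}, mean gain={mean_gain(school):.2f}")
```

Under a status model, School A looks better; under a gains model, School B does. Which school the system rewards is entirely a design choice.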

Question from Tom R. Hopkins, Consultant, American Indian Ed.:

Why do you assume test scores are the same as academic achievement? Based on my teaching experience, test scores cannot assess serious intellectual activities. Would it not be more accurate to say something like, “Test scores measure how well students take such tests”?

Eduwonkette:

Good question. While I don’t assume that test scores are a perfect proxy for the full range of academic skills that we care about, they are at least moderately correlated with general competencies like students’ ability to read, think, compute, and analyze. As a result, I do think they have an important role to play in assessing our schools’ progress.

But I am sympathetic to your argument that these tests don’t assess complex analytical and intellectual activities. The real question – and the one that, to my knowledge, no one has the answer to – is whether the test score increases we see under accountability systems translate to other venues in ways that improve students’ life chances over the long haul. Put differently, are kids now graduating from high school in Texas and North Carolina, who have gone through the better part of their K-12 educations under NCLB-like accountability systems, more likely to graduate from high school, go to college, and graduate from college than they would have been otherwise? Are they more productive workers as a result? Better citizens?

Question from Dr. Ellen Solek, Superintendent of Schools, East Haddam, CT School District:

Dear Dr. Figlio, Are you aware of any research that has been, or is being, conducted to determine the correlation (if any) between K-12 student test scores, accountability, and future success in the workplace? Thank you

David Figlio:

It’s too early to know about the effects of accountability on workplace success. That said, there have been a number of studies that have linked K-12 test scores to labor market outcomes as adults. John Bishop at Cornell University and Francine Blau and Lawrence Kahn (also both of Cornell University) have written influential papers showing strong to very strong correlations between test scores and adult earnings. These papers use data that are decades old, however.

There is also newer evidence that college selectivity, which is associated with higher K-12 test scores, has important effects on wages in early adulthood (say, in one’s late twenties). Mark Hoekstra of the University of Pittsburgh has written an excellent paper that carefully links attendance at an elite public university -- whose entrance criteria are partly determined by test scores -- to early earnings success, for instance. It will take another decade before we know the degree to which school accountability directly plays into this mix.

Question from Juan Lizama, Richmond Times-Dispatch in Richmond, Virginia:

Since NCLB has put more pressure on schools to raise achievement for minority groups, I’m wondering whether there is testing data indicating that students with disabilities and LEP students are making progress?

David Figlio:

One of the most-studied unintended consequences of school accountability policies involves the identification and placement of students with disabilities, who in some accountability systems are exempt from the testing or the school grading. Numerous authors have found that school districts apparently respond to this type of accountability system by increasing the rate at which low-achieving students are served by exceptional student education (ESE) services. This general finding can be viewed either favorably or unfavorably, depending on whether one believes that too many or too few students had been receiving these services.

NCLB places a special focus on these students. While not much research has yet been published on whether NCLB is helping students with disabilities, there is some evidence from the recent National Council on Disability report suggesting that students with disabilities are making progress under NCLB.

This too is consistent with the magister economicus model that Sherman asked about earlier. NCLB provides new incentives for schools to improve these students’ performance, and it seems like schools are doing so.

Question from Erin Johnson:

As there are only a few school systems around the world that have been able to improve their students’ learning, unlike our experience under NCLB, what aspects of those school systems should we/should we not incorporate into our schools?

David Figlio:

I think that the research evidence is pretty clear that there are no “silver bullets” -- few if any educational interventions have been found that dramatically and consistently improve student performance.

What is quite clear is that individual teachers matter a great deal. There is a lot of ongoing debate about how best to measure teacher value added -- the MIT Press/American Education Finance Association journal that I edit along with David Monk, Education Finance and Policy, will publish an entire special issue on the technical details and difficulties associated with evaluating teacher quality -- so I’d be lying if I said that I knew the best way to actually measure which teachers were doing the best and which teachers were doing the worst. But regardless of the method of measuring teacher quality, there exist massive differences in teacher quality within any given school system, indeed, within any given school. Some teachers, year in and year out, have students who consistently perform at extremely high levels and some teachers, year in and year out, have students who consistently perform at extremely low levels -- and there are lots of teachers in the middle.

The problem is that we haven’t figured out how to identify these teachers ex ante. Policies aimed at making teaching attractive to a wide range of potential teachers and rewarding teachers with demonstrated skills could help to make sure that our schools have more of the very successful teachers and fewer of the unsuccessful teachers. But these policies rely a lot on identifying teacher success, and we’re still figuring out the best ways to do that.

There’s also a question of how widely to cast the net for potential teachers. Do we look only for teachers who have completed a full teacher training program? Do we bring into the classroom more teachers with little formal teacher training and provide considerable on-the-job training and mentoring? How do we retain the teachers we want to retain? A lot more experimentation is needed before we have the answers to these questions.

Question from Jerry Overmyer, Math and Science Outreach Coordinator, University of Northern Colorado:

In finding interventions that raise student achievement, what measures besides state assessments are as valid and reliable?

Eduwonkette:

Ideally, you would have access to an audit test that is low-stakes. Many school districts give a second test of student achievement that carries no stakes for schools, teachers, or students. The national studies run by the National Center for Education Statistics provide another example – there are no stakes attached to these tests, and thus these studies provide much more reliable test score data than state assessments.

Question from Diane Ravitch:

It appears that accountability as practiced at present has had many unintended consequences -- narrowing of the curriculum, cheating, exclusion of low-performing students, manipulating data, etc. What kind of accountability would avoid these pitfalls? What makes sense today?

David Figlio:

Diane, it strikes me that many of the unintended consequences, such as manipulating data, cheating and exclusion, are much more likely in the case of status-based accountability systems than in the case of systems based on value added. It’s much harder to “game the system” when one has to game more and more every year to make gains. Curriculum narrowing could still be an issue in either type of system -- so if policymakers wanted to avoid that, there would need to be some explicit safeguards.

Question from Karen Sterling, Associate Director, Granite District:

What are your thoughts about adaptive tests administered statewide? What are your thoughts about using the ACT to measure high school proficiency?

Eduwonkette:

I would like to see more experimentation with adaptive tests in the context of state accountability systems. In some states, “floor and ceiling effects” – the idea that state tests can’t differentiate very low or very high levels of performance – make it difficult to accurately estimate achievement gaps and to determine whether these gaps are closing, stable, or increasing.

As I understand them, adaptive tests, like the one used as part of the Early Childhood Longitudinal Study – Kindergarten Cohort, first administer a “routing test” that determines the difficulty level of the test a student will receive. Unfortunately, because these tests would likely have to be administered to children in the elementary grades by teachers, I suspect they would be prohibitively expensive for many states.
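
The two-stage routing design described above can be sketched in a few lines. This is only an illustration: the 25-item routing test and the cut scores of 8 and 17 are invented, not ECLS-K’s actual values.

```python
# Sketch of a two-stage adaptive ("routing") test. The routing test is
# assumed to have 25 items; the cut scores of 8 and 17 are invented.

def assign_form(routing_score: int) -> str:
    """Map a 0-25 routing-test score to a second-stage form."""
    if not 0 <= routing_score <= 25:
        raise ValueError("routing score must be between 0 and 25")
    if routing_score < 8:
        return "easy form"    # differentiates among low performers (reduces floor effects)
    if routing_score < 17:
        return "middle form"
    return "hard form"        # differentiates among high performers (reduces ceiling effects)

print(assign_form(5))   # low scorer routed to the easy form
print(assign_form(21))  # high scorer routed to the hard form
```

Because each student gets a form near his or her own level, the combined design can distinguish performance at the extremes where a single fixed-difficulty test cannot.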

Question from David Cox, analyst Florida House of Representatives, House Democratic Office:

1) Florida officials annually tout the advancements students make on its high-stakes assessment test the Florida Comprehensive Assessment Test. Yet despite the increases in most scores, the state ranks very low in several key national indicators, including its high school graduation rate, ACT scores and SAT scores.

Do you believe Florida’s education accountability system is a quality one; and can you briefly talk about what states have in your opinion the best accountability systems?

2) Have you updated or reached any evolving conclusions about the impact vouchers have on low-performing schools since your 2005 study on accountability and whether voucher threats improve low-performing schools?

David Figlio:

Florida’s accountability system is one of the systems I know best, and I think that it has many of the hallmarks of what I would consider a high quality system. Schools are evaluated on the basis of student learning gains, which reduces the incentives for “gaming” (indeed, I can find very little evidence of gaming in post-2002 Florida schools) and the system places extra weight on the performance of low-performing students, helping to ensure that schools will continue to pay attention to many of the students who might have fallen through the cracks. This combination of focusing on value added while concentrating on low achievers is very much the spirit of the goal of No Child Left Behind, but avoids many of the potential gaming pitfalls.

The Florida Opportunity Scholarship Program never really got large enough to know whether it would have long-term impacts on the public school system. I believe we will learn a lot from the evaluation of the much larger Corporate Tax Credit Scholarship Program (which, truth in advertising, I am currently evaluating).

Question from Christa Wallis, Title 1 Resource Teacher, SBCUSD, San Bernardino, California:

Are the school targets under NCLB going to be based on a (more reasonable) growth model (like California uses in the Academic Performance Index, API) rather than the current “one size fits all” model?

Eduwonkette:

I don’t have any inside political insight into what features the NCLB reauthorization will include, especially as it will likely be influenced by who wins in November.

Nonetheless, here are two thoughts on growth models. First, despite the fact that the Department of Education has a “growth model pilot,” states are not really using growth models – i.e., estimating school effects on continuous academic growth irrespective of students’ starting points. Instead, they are using “projection models,” which give schools credit for putting students on track to become proficient. These projection models give schools more time to move students to proficiency, but don’t fundamentally alter the perverse incentives – for example, the incentive to focus narrowly on students performing right below the cut score (“bubble kids”) – that made growth models attractive to many in the first place. Second, those concerned that pure growth models draw attention away from the lowest-performing students might take a look at California’s API system, which gave more weight to moving students in the lowest performance category up. We can work within a growth model framework while giving different weights to growth at the high and low ends of the distribution.
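
The last idea above – weighting growth differently across the distribution – can be written down directly. This is a minimal sketch: the weights, band names, and scores are invented, and this is not California’s actual API formula.

```python
# Hypothetical weighted-growth index in the spirit of giving extra weight to
# gains made by students who start in the lowest bands. All values invented.

WEIGHTS = {
    "far below basic": 3.0,
    "below basic": 2.0,
    "basic": 1.5,
    "proficient": 1.0,
    "advanced": 1.0,
}

def weighted_growth(students):
    """students: list of (starting_band, score_gain) pairs."""
    total_weight = sum(WEIGHTS[band] for band, _ in students)
    return sum(WEIGHTS[band] * gain for band, gain in students) / total_weight

school = [("far below basic", 20), ("basic", 10), ("advanced", 5)]
unweighted = sum(gain for _, gain in school) / len(school)

print(f"unweighted mean gain:  {unweighted:.2f}")
print(f"weighted growth index: {weighted_growth(school):.2f}")
```

Because the big gain here comes from a student starting at the bottom, the weighted index exceeds the plain mean – the system still rewards growth everywhere, but rewards it most where policymakers care most.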

As David pointed out, the system design issues are of utmost importance here because these incentives do alter educators’ behavior.

Question from Jonathan Cohen, Ph.D. Adjunct Professor, Teachers College, Columbia Univ:

What do you think are the pros and cons of using school climate assessment as another method of accountability? Ohio is planning on doing so. It has the advantage of recognizing the social, emotional and civic dimensions of school life: the foundation for school -- and life -- success.

David Figlio:

School climate assessment has a lot to like about it -- it captures a non-test-based indicator of school quality, and can be a measure of parental satisfaction. However, it has a couple of disadvantages. For one, it is more likely to be manipulable than test scores are (especially if heavy weight is placed on teacher or principal responses), and the parents who are most satisfied with the schools are the ones most likely to (a) remain in the school and (b) respond to the climate surveys. So I’m cautiously optimistic about climate surveys as a “check.”

Question from Brenda Fischer, Associate Director of Master of Education in Teaching and Learning, Saint Mary’s University of Minnesota:

What effect has the use of high-stakes standardized tests had on the assessment practices of teachers in their classrooms? Are they now “testing to the test” to ensure that both test content and test process are very familiar to students?

Eduwonkette:

A number of studies have demonstrated that teachers emphasize not only the content that consistently appears on state tests, but also the format in which the questions are asked. For example, if students are asked to provide four-sentence responses to a prompt on state tests, teachers will often integrate similarly structured assessments into their classrooms.

At first glance, this is not a problem – you know the familiar argument about the “test worth teaching to.” But this argument ignores the thorny problem of “test score inflation.” To understand why test score inflation is a serious problem, you have to understand the sampling principle of testing. Psychometrician Dan Koretz provides the following example: Suppose we want to evaluate students’ vocabulary. A typical high school student knows 11,000 root words, but a test can only include a sample of these words – maybe 40. If we design our test well, we can still learn something about the breadth of each student’s vocabulary. But we don’t really care if the student knows the 40 words on the test; rather, we care about the larger domain from which these words are sampled.

Now imagine that for weeks before our test, I drilled students incessantly on those 40 words. Voila! They perform exceptionally on the test. Yes, their vocabularies have increased by 40 words. Maybe these are 40 really important words - the so-called “test worth teaching to.” But proficiency in the domain that my test is intended to measure has not expanded by the same amount.
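
Koretz’s sampling point can be simulated directly. A minimal sketch, assuming a student who truly knows 6,000 of the 11,000 root words and a 40-word test form, mirroring the example above; the specific numbers of known words are invented for illustration.

```python
import random

random.seed(1)

DOMAIN = 11_000   # root words the test domain covers
ITEMS = 40        # words sampled onto any one test form

# A student who truly knows 6,000 of the 11,000 words.
known = set(random.sample(range(DOMAIN), 6_000))

def score(known_words, form):
    """Percent of a form's words the student knows."""
    return 100 * sum(w in known_words for w in form) / len(form)

high_stakes_form = random.sample(range(DOMAIN), ITEMS)  # the form reused year after year
audit_form = random.sample(range(DOMAIN), ITEMS)        # a fresh, low-stakes form

before = score(known, high_stakes_form)

# "Drilling": the student memorizes exactly the 40 words on the reused form.
drilled = known | set(high_stakes_form)

after = score(drilled, high_stakes_form)  # inflated: a perfect score on the drilled form
audit = score(drilled, audit_form)        # still close to the student's true vocabulary

print(before, after, audit)
```

The drilled score hits 100 percent while the audit form, drawn fresh from the same domain, still lands near the student’s true knowledge rate – the gap between the two is exactly Koretz’s test score inflation.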

I’ve seen this over and over again; administrators and teachers figure out which concepts are consistently on the test, and which aren’t, and they alter their instruction accordingly. The trouble is that if we administer a slightly different test, drawing on a broader range of concepts from the domain we care about, kids haven’t mastered them.

Question from Michelle Mullen, Consultant for AVID Center (Non-profit organization):

Given the need for large-scale assessments and data collection, how can states create a more performance-based assessment system that is efficient and economical and offers a more accurate picture of student knowledge and skill?

Eduwonkette:

You have identified the two central problems with performance-based assessment systems: 1) their cost, and 2) the potential for varied standards of evaluation across schools and classrooms - the very problem standardized tests were intended to address. Regarding the first issue, I can imagine a system in which teachers from schools outside the district were called upon to evaluate students’ performance-based work, whether it be designing a science experiment or writing an essay based on a set of historical documents. The latter has been executed quite well in the context of the AP history exams, so at least there is a precedent for it. Using other teachers as evaluators also helps to hold down costs, but it will still be more expensive than a Scantron machine.

The question, of course, is whether we could train teachers to use a standard set of evaluation criteria to assess students. I think it’s critical that teachers not assess their own students on these types of assessments; in New York State, teachers grade their own students’ Regents tests, and it creates some very uncomfortable conflicts of interest (and temptations to cheat) when your own student is one question short of passing.

Question from David Long, Attorney:

What state policy proposals would you recommend to address the problem of teachers, schools and school districts responding to state testing regimes by emphasizing rote learning, test-taking strategies and teaching to the test at the expense of higher-order skills, inquiry-based learning and knowledge and skills other than those tested by the state?

Eduwonkette:

Economist Sunny Ladd has recently put forth a provocative proposal for addressing some of these problems, and I stand behind her idea. In short, Ladd’s idea is to incorporate qualitative reviews by teams of evaluators into states’ monitoring systems to ensure that the process through which test outcomes are produced is not based solely on the shortcuts you cite.

Here is an excerpt from her Ed Week commentary:

Each state would establish a statewide review board that would function independently of the state’s department of education. The review board would send small teams of professionals to make periodic visits to each school—perhaps one visit every two or three years—with each visit preceded by an internal self-study. The review panel would then write a report on each school that, along with the school’s response, would be made public. Though the report would include a summary of the school’s success, or lack thereof, in raising student achievement in the core subjects, it would evaluate the school on a far broader set of outcomes than student test scores alone.

The ultimate concern would still be student outcomes, but schools, in concert with district policymakers, could help define which additional outcomes were most important given the students they serve. Moreover, the review panel would look closely at the policies and systems that schools put in place to promote those outcomes. The panel itself would not be in the business of providing assistance or support to the school, since doing so would interfere with its ability to be objective. Ironically, the result could well be more testing of students, not less, but with the tests being used more for internal diagnostic purposes within the classroom than for school-level accountability. The intent here is to encourage the schools to develop their internal capacities to make data-driven decisions, while not forcing them into a straitjacket of common outcomes and practices.

You can find her Ed Week commentary on this issue here.

Question from Stephanie Garnes, Educator:

Many parents choose a school or district based on the school’s test scores or performance ratings. Should parents only consider a school’s test scores when choosing a school for their child? Why?

David Figlio:

I believe that it is a mistake for a parent to consider only test scores when choosing schools. Test scores are only one indicator of a school’s quality; they don’t capture differences in teaching pedagogy, curricular focus, topical specialties, etc. In fact, given that I’ve found that housing in the school zones of top-rated schools (based on test scores) comes at a considerable premium (my 2004 paper in the American Economic Review suggests that in Florida, an A school is worth 9 percent more than a B school in house prices, for instance), parents willing to spend a lot of time trying to find the best fit for their child may have money-saving opportunities if they choose a slightly lower-ranked school that they think is a better fit!

Question from K. Houghton, Reading Specialist, East Greenbush CSD:

Do you feel that testing masks modest gains that may be made daily by students who are slower learners and just need a little extra time to learn the concepts presented?

Eduwonkette:

I don’t think that testing necessarily masks gains made by students. It’s our current focus on proficiency that masks these gains, because we’ve told schools and teachers (by virtue of NCLB’s incentives) that the only gains that are worth making are those that push a student over the proficiency cut point. But it’s important to recognize that this is not a flaw in testing itself, but a problem with how we’ve chosen to use tests in the context of NCLB.

Question from Fern Goldstein, special education teacher, Banyan School:

How does portfolio assessment figure into accountability? It is a more valid representation of a student’s accomplishments, but it is very subjective, and procedures are neither standardized from state to state nor very thorough.

David Figlio:

Portfolio assessment is a terrific representation of a student’s accomplishments, but it is extraordinarily expensive to carry out in a systematic manner. In order for this to be a valid accountability tool, portfolios would need to be assessed externally, rather than by individuals who are being explicitly or implicitly evaluated by the accountability system. Setting up guidelines for evaluation that are common throughout a state, and then carrying them out, would be well beyond the reach of any state, I would guess. It would, however, be a terrific validity check at a small scale.

Question from Marlene Brubaker, Teacher, CCTS:

What efforts have been made by the federal and state governments to determine the effects of eliminating music, art and sports (in order to meet the ever-increasing financial strain of supporting math/English) on the actual achievement areas? By this, I am implying that if we cut music, for example, we actually will reduce a child’s ability to master math, based on the research that these two are intrinsically related.

Eduwonkette:

To my knowledge, the federal government is not monitoring the amount of time spent on non-tested subjects like music and art after the implementation of NCLB. However, the Center on Education Policy has conducted a number of district-level surveys to monitor changes in time use in which you might be interested.

We often hear that NCLB does not mandate that educators focus on reading and math to the detriment of other subjects. But NCLB is a policy predicated on the idea that incentives can fundamentally change behavior. We should *expect* teachers to respond to NCLB’s powerful incentives. It therefore is not surprising, as you note, that there is a growing body of evidence, both systematic and anecdotal, that many schools are devoting more instructional time to reading and math and less time on other school subjects, such as social studies, science, and the arts. This is particularly evident in schools most at risk of missing AYP.

Question from Marianne Caston, Director of Supervision, Antioch University, Santa Barbara:

Have there been any considerations of the unintended consequences of NCLB testing and accountability practices, such as making it more difficult to place student teachers in classrooms? It seems that with the stipulations attached to not meeting progressively more difficult, if not unachievable, expectations, school administrators are putting more and more obstacles in the way of novice teachers, who must practice their actual classroom skills with real children in order to learn. Is this phenomenon happening nationwide?

David Figlio:

I am working on a study with my colleagues Tim Sass at Florida State University and Li Feng at Texas State University right now to investigate the effects of school accountability pressure on new and experienced teachers’ decisions to remain in the classroom. Our preliminary findings indicate that teachers, particularly young teachers, in schools facing increased accountability pressure are more likely to leave the school or teaching altogether. We are now doing more digging to see whether the young teachers who leave are the relatively successful or unsuccessful ones.

About a decade ago I conducted a study with Kim Rueben, now at the Urban Institute, on the effects of property tax limits on teacher quality. This is relevant because tax limits are another form of external accountability on schools and districts. We found strong evidence that tax limits reduced the quality of young teachers, as measured by the selectivity of their undergraduate institution -- the best measure we could find nationally at the time.

Question from Nancy Mestas, Student, El Paso Community College:

I did some observations during this past spring semester. I can honestly say that I found it rather interesting that special education students take the same test as regular students, but with some modifications. What I find insane is that these modifications consist only of larger print and no distractors. Why?

Eduwonkette:

One of the biggest complaints about NCLB is that only a small fraction of special education students can take alternate assessments, and that only a limited number of accommodations can be offered in most states. Here’s the debate in brief - those concerned with extending alternate assessments worry that special ed kids will be overlooked if the standards are different, while others worry that these kids - who are classified as special education in many cases precisely because they aren’t working on grade level - are being asked to do something that’s educationally and developmentally inappropriate.

What do we know from states’ experience with special education exemptions? Fellow chatter David Figlio and Lawrence Getzler, looking at data from Florida, found that such loopholes led educators to hide low-scoring kids in special education. Julie Cullen and Randy Reback, analyzing data from Texas, found similar processes in play. Neither of these states had strong participation rules, so reclassification is not as big an issue under NCLB as it has been in the past. However, we might predict that if teachers face high pressure to push some students but not others forward, these kids may very well be neglected. On the other hand - as you experienced - we can imagine that there are potentially negative social effects of the current approach (i.e. all kids on grade level, no matter what) on kids facing tasks they’re simply not prepared for.

Question from John Monahan, Teacher, Patterson High School, Baltimore City, Maryland:

Testing and accountability are two different things. In Maryland, state officials crowed about improved test scores without mentioning that this year's test was significantly shorter and had clearer directions than last year's. That means that test scores from one year can't be compared to scores from another. When are the people designing the tests going to be held accountable?

David Figlio:

It is important for the credibility of an accountability system that any meaningful changes in a test administration or scoring system be put front and center. This again is something that I admire about Florida’s system and the way in which changes were announced and interpreted, at least in general.

By the way, I spent much of my formative years in the Baltimore City public schools.

Question from Barb Kapinus, Sr. Policy Analyst, NEA:

If current tests are limited indicators of what students can and do learn, it seems that value-added is only a partial answer to the problems of current accountability systems. What else, beyond the tests, can improve accountability systems?

Eduwonkette:

If it is your belief that the goals of public schools include more than fostering academic achievement, as is mine, then value-added does only provide a partial policy solution. I would like to see schools evaluated on a wider range of student outcomes, but as David said above, surveys of students, teachers, and parents about these non-cognitive outcomes and processes of schooling are subject to massive inflation when they become high-stakes. So I’m partial to adopting a qualitative review process like that proposed by Sunny Ladd above.

Question from J.H., graduate student, Emory University:

Although NCLB mandates standardized testing in science, there is a great deal of evidence suggesting that science education remains an undervalued and often marginalized subject area in our nation’s elementary schools. At the same time, researchers in science education continue to call for reform that will ensure that all students in all grade levels are given opportunities to gain science proficiency. How do we reconcile these apparently conflicting priorities? What are your thoughts about the current state and future of early (K-8) science education in the age of NCLB?

David Figlio:

I served on a National Research Council panel a few years back on the subject of science and accountability, so I’ve given this topic some thought over the years. Science is such a difficult set of subjects to test, and to test well, largely because it is such a challenge to test whether students have developed the basic tools of inquiry. Many science tests are more fact-based than are other subjects commonly tested for accountability purposes, so I worry that if we ramp up science testing for the purposes of accountability we might see an increased emphasis on science but not necessarily the science skills that society most wants students to master. I’m not yet sure if there is a reconciliation; I certainly agree with people who want to see science emphasized in our elementary and secondary classrooms, and I also think that testing can help to focus attention on subjects. But I also fear that teaching to a science test is more problematic than teaching to, say, a math test.

Question from Nancy Mestas, Student, El Paso Community College:

My expected graduation year is about 2011. What kind of changes can we as future teachers expect?

David Figlio:

I’m always thrilled to hear of people interested in becoming teachers! I suspect that the policy climate for teachers will change a lot over the next decade. School districts and states will experiment more with performance-based pay, and I expect the trend towards more standards for students and teachers to continue. How this will all play out, however, is anyone’s guess.

Question from Inga Weberg, Sales Manager at Redleaf Press:

Test validity for young children is less than half that for, say, 10-year-olds, for whom you can predict scores at the 90th percentile. What is being considered to “test” young children? I assume alternative observational materials, etc.

David Figlio:

I don’t know of any accountability systems using tests for students below third grade, precisely for this reason. I’m not a psychometrician, but my understanding of this literature is that test reliability and validity is considerably higher for tests in later elementary grades than in the first few years of elementary school.

Question from Tami Ellison, Educator/Educational Consultant,

It seems that if we want to improve student achievement, we might want to develop meaningful assessments that actually impact student learning and improve classroom practices. Summative assessments do little to effect change at the student level in a timely manner. How might we improve our use of formative assessments, both formal and informal, as the basis for developing actionable remedies that can actually improve student learning and achievement?

Eduwonkette:

Great question - we’ve had a long discussion about the potential and pitfalls of data-driven decision making over at the blog.

One of the biggest problems I’ve witnessed recently with the use of formative assessments is that they are rarely used to say that a given student has strong comprehension skills but weak writing skills, but rather used to make predictions about the likelihood that the student will pass or fail the state test. The real potential of data-driven decision making lies in the possibility of drilling down in these data and identifying each student’s academic strengths and weaknesses. But it is difficult to encourage teachers to use formative data this way when they are primarily focused on getting students over the proficiency bar. My take is that without changing the broader structure in which teachers are working, it is difficult to tell them to ignore the overall scores and focus on each individual student’s needs. If we move to a true growth-based system, hopefully we will see formative data used more this way.

Question from Brian Stecher, Researcher, RAND Corporation:

What are some ways to get the data we would need to accurately answer the question of what students actually know and how great the gaps actually are?

David Figlio:

Brian - This is a difficult question to answer in the minute or so that I have left. But I wanted to quickly put in a plug for the National Research Council volume “Knowing What Students Know,” edited by Jim Pellegrino and colleagues. This volume makes a lot of the points I’d like to make if I had a few more minutes!!

Alexis Reed (Moderator):

Thank you for all of your great questions, and many thanks to Mr. Figlio and Eduwonkette for their time and insights. Unfortunately, we have more questions than time, so we’ll have to bring the discussion to an end. A transcript of this chat will be available on Education Week’s website shortly.

The Fine Print

All questions are screened by an editor and the guest speaker prior to posting. A question is not displayed until it is answered by the guest speaker. Due to the volume of questions received, we cannot guarantee that all questions will be answered, or answered in the order of submission. Guests and hosts may decline to answer any questions. Concise questions are strongly encouraged.

Please be sure to include your name and affiliation when posting your question. Education Week’s Online Chat is an open forum where readers can participate in a give-and-take discussion with a variety of guests. Education Week reserves the right to condense or edit questions for clarity, but editing is kept to a minimum. Transcripts may also be reproduced in some form in our print edition. We do not correct errors in spelling, punctuation, etc. In addition, we remove statements that have the potential to be libelous or to slander someone. Please read our privacy policy and user agreement if you have questions.

---Chat Editors