How much time should we give a reform before we demand to see results?
In school improvement, once a reform is implemented, the question invariably asked, often immediately, is: “Is the reform working?” Because of this impatience for results, the issue of what constitutes an appropriate gestation period for reform measures holds great import for both schools and policymakers. How much time should we give a reform before we demand to see results?
By way of illustration, my colleagues and I spent years trying to get learning technology into the schools. In one state, a law was finally passed allocating a generous sum for technology. That occurred in June. When the state’s legislators returned from their summer vacations, they wanted to know if test scores had risen because of the bill. The computers had not even been purchased yet! So, how long should we give a reform? Clearly, more than from June to September.
In economics, there used to be much discussion of the “lags” in the effects of monetary policy: the time that elapses between identifying a problem in the economy and solving it. Because monetary policy is under the purview of the Federal Reserve Board, these lags could be fairly well controlled, at least up to the point of taking action.
What are the corresponding “lags” in education reform policy? I can name a few. First is the recognition lag, similar to the one faced in monetary policy: how long it takes to pinpoint a problem and begin to plot a solution. In education’s case, we might be talking about the recognition that reading scores are low and stagnant. Because education reforms are generally brought about by legislation, though, and because there are competing solutions to such a problem, the program to fix it must then be chosen. Hence, we have a policy-selection lag. Recent efforts to choose only policy solutions that scientific research has shown to work have complicated the selection process and extended this lag. How does one show that a policy works before it has been tried?
The program selected is proposed in the legislature, where the bill must be debated and passed (legislation lag). Then regulations must be written to tell districts how they should implement the law. The time this takes could be called the regulation lag. Unfunded laws, of course, go nowhere, so there is the appropriation lag, when money must be provided, generally by a different legislative committee from the one that developed the law.
Most new education reforms are controversial and go against some groups’ self-interest, which often leads to attempts to invalidate them. So there is what I will call the litigation lag, in which districts are afraid to start, lest there be a challenge in court. After that comes an implementation lag, when districts and schools try to do what the legislation and regulations require but can’t get it right immediately. At the same time, there may be a buy-in lag, in which educators resistant to change try to figure out how to continue “business as usual” while adhering to the law, or at least appearing to. Once implementation begins in earnest, teachers have to learn the new approach. So we have a learning lag.
Eventually, the new policy is in place, and we move on to the impact lag. The first day a new technology is used in instruction, we will not see measurable increases in student achievement. Students may ultimately benefit from the reform, but the National Assessment of Educational Progress tests will not be given for another year, and their results will not be reported for a year after that. So we have a measurement lag and a reporting lag.
To measure progress, we could use scores on state tests, which at least come out once a year, but then we would face an interpretation lag, as people on both sides of the reform cite or criticize the test’s construction, its alignment, or who takes it, to prove that the reform is or is not working. This problem is magnified in states that frequently change their tests, because year-to-year changes in scores then become difficult to calculate.
But when we do have comparable data, we reach the final problem, perhaps best characterized as the methodology lag. This simply means that whenever a study’s findings about a policy’s impact disagree with advocates’ views, the advocates will criticize the methodology. Thus, the effectiveness of the Tennessee class-size-reduction program, called STAR, which ran from 1985 to 1989, is still being debated in terms of its design and the methodology used to assess it.
By this time, the administration in Washington, the state’s governor or schools chief, the district superintendent, or the school boards at various levels have changed, and so the whole policy framework changes. Who cares whether the earlier policy worked? It was the president’s, the governor’s, the chief’s, or superintendent X’s plan, and now that Mr. Y is in control, he wants to try his own ideas.
This is not an argument against experimenting with new education reforms, and it certainly does not advocate less evaluation through scientific research. Rather, understanding these “lags” suggests several responses. First, policymakers should not demand definitive proof of a policy’s effectiveness too soon. There are many potential delays between the time a good idea is conceived and the time it should be required to prove its effectiveness “beyond a reasonable doubt.” Second, policy evaluation based on test-score growth or “value added” should begin early, but the resulting evidence should be weighed in light of the data available, the methodologies employed, and how long the program has been operating.
Third, even though value-added student-learning gains may be the ultimate determinant of a policy’s effectiveness, we must seek intermediate outcomes that may predict the ultimate conclusion before such a conclusion can be reached. For example, a number of programs are attempting to boost student achievement by improving the quality of teachers. It would be useful to assess whether such reforms are in fact improving the quality of classroom teachers (perhaps by looking at their skills, knowledge, and classroom performance) even before we can analyze the value added such teachers produce in their students’ learning.
And finally, we should recognize that the proliferation of sophisticated methodologies, and of data to which they can be applied, together with this lag structure, may lead smart and well-meaning evaluators to entirely different conclusions. Sometimes these interpretations derive from personal or group biases. Often those against a reform can find alternative methods and data to trash distasteful conclusions and support others more to their liking. No single implementation of a policy, and no one evaluation study, should be expected to provide definitive evidence of a policy’s worth, pro or con. Rather, we need multiple applications of a program and a number of evaluations. Then we should judge the effectiveness of a policy by considering “the preponderance of the evidence.”
Lewis C. Solmon, an economist, is the executive vice president for education of the Milken Family Foundation and the director of its Teacher Advancement Program. He was formerly the dean of the graduate school of education at the University of California, Los Angeles.