Capturing Teaching's Essence: Stanford Team Tests New Methods

Architects have building plans, fashion models have portfolios, and lawyers have briefs, but "great teaching, just like mediocre teaching, disappears into thin air as soon as it's completed,'' says Lee S. Shulman, professor of education at Stanford University.

To capture the essence of exemplary classroom teaching and somehow evaluate it is the multimillion-dollar challenge that Mr. Shulman and his Stanford colleagues have been working on for over a year.

Much is at stake for the researchers and their sponsor, the Carnegie Corporation of New York. The concept of voluntary national certification for skilled teachers--which is being explored by the National Board for Professional Teaching Standards--will not earn wide acceptance, public officials say, unless more credible means of assessing teachers' capabilities are developed.

Mr. Shulman's "Teacher Assessment Project'' is not creating tests for the new national board, which Carnegie helped launch. But it is aiming to provide board members and other policymakers with a broader conception of what new types of teacher assessment might look like, as well as some specific examples.

Many view the problems of developing such assessments as formidable. Most states that have ventured into the documentation area, in particular, are "no longer using portfolios'' or other means of collecting data on teachers' activities in the classroom, says Lynn Cornett, associate director of school-college programs for the Southern Regional Education Board.

"They became an exercise in amassing paper,'' she says. "They didn't give [states] the information they wanted about the teacher.''

But while the methodology has been flawed, Mr. Shulman argues, the notion of documenting a teacher's performance in school is "fundamentally sound.''

The Field Test

This past year, with the support of an $817,000 grant from Carnegie, the research team created sample exercises, or "prototypes,'' that might be used in a two-day "assessment center'' for teachers.

In Mr. Shulman's lexicon, an assessment center represents a site away from the school building where teachers might go for one or two days to be evaluated using a range of exercises that approximate such "typical''--but seldom measured--classroom tasks as developing a lesson with colleagues or marking student papers.

The assessment-center exercises focused on two very narrow subjects: teaching equivalent fractions to elementary-school students and the American Revolution to high-school students.

Mr. Shulman argued that researchers would have to devise workable prototypes in such narrow domains before they could hope to design a whole assessment.

Last summer, a total of 40 teacher "candidates'' from around the country traveled to Stanford to participate in a field test of the assessment-center activities. The teachers of high-school history and elementary-school mathematics ranged from 25-year veterans to individuals who had just completed their student teaching.

They came from rural, suburban, and urban areas, and represented several racial and ethnic groups.

Within each subject, the exercises asked the teachers to engage in such tasks as analyzing a textbook, critiquing a videotape of another teacher's performance, designing a small-group lesson, and responding to students' questions or misconceptions about the subject. (See related story on page 21.)

Each candidate was also asked to prepare a "familiar lesson,'' which he or she presented to a group of six students specially recruited for the task.

"We didn't want to leave out an example of actual teaching,'' explains Mr. Shulman. "Somehow, the notion of having an assessment of teaching that didn't have one real live instruction seemed like a contradiction in terms.''

"It was also the only exercise we had that permitted a sufficiently complete, active instruction to unfold before our eyes,'' he adds. "We could then sit down with the teachers and examine the extent to which they could look back on something they had done and reflect on how they could do it differently.''

The 'Heart' of Teaching

Teachers who helped field-test the exercises--either as "candidates'' or "examiners''--describe them as some of the most realistic reflections of teaching that they have encountered.

"I learned a tremendous amount,'' says Susan Adler Kaplan, a high-school English teacher in Providence, R.I., who helped walk other teachers through a sample exercise in U.S. history.

"It made me look at my own teaching,'' adds Ms. Kaplan, who is also a member of the national board. "The interviewing techniques that we used really ... got to the heart of what it is we do and why we do it.''

"You really got an insight into the way teachers think about teaching,'' says A. Robert Lynch, a high-school social-studies teacher in Jericho, N.Y., and another board member. "It's rare, except on some occasions in the faculty room, that you get somebody explaining why they did a certain thing in teaching about the American Revolution.''

The exercises--and the scoring system that accompanied them--were designed to mark the kind of proficiency that might be expected from a teacher with two to four years' experience, an adequate preservice education, and an organized induction into teaching.

Although that may not be the standard chosen by the national board, Mr. Shulman says, the researchers needed to design their exercises around some predetermined standard to see if they could create a set of workable prototypes.

The volunteer examiners led candidates through the exercises, observed others, and took notes about what they saw.

'Summer Camp' for Teachers

The examiners also conducted "debriefing'' sessions with the candidates to gauge their reactions to the activities.

Fernando R. Moreno, a history teacher at Gilroy High School in Gilroy, Calif., volunteered to be a teacher "candidate'' for the field test. He compared the experience to a "summer camp'' for teachers.

"I've been teaching for 10 years now,'' he says. "I got to review a lot of the techniques that I use and to see other teachers use them.''

"I think that periodically we should all go through a training session like that,'' he adds, "just to revise our skills and exchange ideas with teachers from other parts of the country.''

Not 'Anything Goes'

When the field tests were completed, the research team found itself with several file cabinets full of examiners' notes, audio tapes, videotapes, and miscellaneous records, all organized by candidate and exercise.

Now, researchers are grappling with their most difficult challenge: how to condense the rich information collected during the summer into a brief synopsis that might be used in deciding whether to "pass'' or "fail'' a candidate for certification.

Unlike traditional multiple-choice tests, the exercises were based on the premise that teaching is complex and that there is no single right way to do it.

"The challenge,'' says Edward Haertel, an associate director of the project, "is recognizing diverse kinds of answers as legitimate, while at the same time maintaining standards--not just saying anything goes.''

Refinement of the scoring system will continue throughout the coming year, according to Mr. Shulman. So far, he says, the researchers have identified some positive trends and a number of potential difficulties.

A 'Clear Difference'

After each exercise, the examiner and several observers were asked to "rate'' the candidate on a seven-point scale. The purpose of this "intuitive global rating'' was to provide a "rough and ready'' way to sort out candidates of highly differing ability, according to a report on the project.

Examiners' agreement on these ratings turned out to be quite high in statistical terms.

"There was a clear difference between the new teachers and the experienced teachers,'' says Mr. Lynch. "People who had been teaching 10 or 15 years got directly to the issues, knew exactly where they wanted to go, and could foresee the problems that the students would have understanding them. The new teachers tended to teach from the perspective of the content.''

'Mock Boards'

But it proved much harder, the researchers acknowledge, to develop a more detailed scoring system.

After an initial practice round, the researchers designed a scoring procedure that could rate a candidate's performance on each exercise in up to five general subcategories, such as command of subject matter, content-specific pedagogy, and classroom organization and management. Performance in each category was then summarized across exercises.

Project staff members scored performance for each subcategory on a six-point scale that ranged from "AAA'' for "distinguished'' performance, to "C'' for "questionable'' performance. In addition, scorers could use "flags'' to signal aspects of a candidate's performance to which they wanted to draw attention--such as unusual gifts, or particularly troublesome deficiencies.

Labels and descriptions for each category and for each point on the rating scale were developed to help scorers determine how to rate the candidates.

The research team then created decisionmaking panels, or "mock boards,'' in which small groups of people were asked to use the more detailed rating system to "pass'' or "fail'' candidates for purposes of certification.

Although the mock boards did not approximate what a real certification board would look like, they helped identify weaknesses in the scoring procedure as well as the kinds of information that a real board might request about a candidate.

According to the researchers, much of the difficulty and ambiguity in scoring reflected problems in the exercises themselves, such as a failure to give candidates clear instructions.

In addition, because examiners had been uncertain about what kinds of follow-up questions or "probes'' they should ask candidates in order to elicit information, some of the data on candidates was insufficient for the purposes of scoring.

Those types of difficulties, says Mr. Haertel, an associate professor of education at Stanford, have forced the research team to intensify its efforts to refine the exercises and the scoring procedure.

"I still think it can be done,'' he says. "We've made a lot of progress, and these are things that can be fixed. We've learned a lot about the exercises and about the scoring.''

But he and others caution that national certification for teachers probably should not rest on performance assessments alone.

Eventually, Mr. Shulman speculates, the national board may want to certify teachers based on some combination of education, experience, paper-and-pencil tests, assessment-center exercises, and documentation at the school site.

"One of the reasons we're moving to documentation,'' explains Mr. Shulman, "is because we've already seen what the limitations of performance assessments are.''

The research team tried unsuccessfully, for instance, to devise an assessment-center exercise that would require candidates to manage a disruptive student. "It always seemed too artificial,'' says Mr. Shulman.

Similarly, teachers routinely design student tests as part of their teaching, but "everyone knows they don't write them in 45 minutes,'' the researcher says. That kind of activity was also left out of the assessment center.

Assessment centers "can't really capture'' the myriad of specific, contextual factors that influence good teaching, says Linda Vavrus, a co-director of the project. "It doesn't allow teachers to demonstrate their competence in the same way they perhaps could in the context in which they work.''

Trying to develop a link between assessment-center exercises and ways to document a teacher's actual classroom activities is the project's next goal.

If the national board can devise such a system, Mr. Shulman predicts, it would be "superior to the assessments in any other profession.''

Moving Ahead

The researchers plan to rely heavily on the insights and participation of expert teachers in designing new forms of documentation.

"I really believe that the only way this whole thing is ever going to fly is if teachers are actively involved not only in setting the standards but also in thinking through some of the things that they're going to be doing to demonstrate their competence,'' says Ms. Vavrus.

Together with co-director Angelo Collins, Ms. Vavrus is assembling teams of teachers who will work with them throughout the coming year.

The documentation activities will explore two specific subjects: how to teach critical literacy skills to 3rd and 4th graders and introductory biology to high-school students. The Carnegie Corporation has provided an additional $1.3 million for that effort.

Potential Activities

Because documentation activities could extend over a full year--compared with the one or two days a candidate might spend in an assessment center--the researchers do not plan to narrow their focus to just one aspect of teaching.

Instead, they hope to collect data on many different kinds of teaching activities.

Ms. Collins, for example, has been asking experienced biology teachers what evidence they would want to show someone to prove that they are "exemplary'' teachers.

"The first thing they wanted to show off was the work of their students,'' she says.

Such information will become the starting point for designing documentation activities.

For elementary-school teachers, the project will emphasize using children's literature to teach speaking, listening, writing, reading, and comprehension skills.

"We want to be able to get at more than what is included in basal anthologies,'' says Ms. Vavrus. "We believe that an exemplary teacher of elementary literacy would be very adaptable and knowledgeable about the kinds of structures that are present in children's literature'' and how to use them.

As part of their portfolios, she suggests, teachers might be asked to document their end-of-the-year activities; show how they established a "literate environment'' in their classroom; and demonstrate how they created a reading and writing project using children's literature of different kinds.

Two Views of Teaching

The project is also trying to balance its emphasis between teaching as it now exists and teaching "as it might be'' based on advances in research.

In a report to Carnegie officials that summarized their first 15 months of work, Mr. Shulman wrote: "We frequently encountered the question of what is fair or reasonable to expect of teachers--whether, or to what degree, the national board's assessments should attempt to promote, or hold teachers accountable for, advances in teaching.''

To date, he says, the project has leaned toward assessing teaching as it is most commonly practiced. In the future, it will try to balance an emphasis on the "status quo'' with exercises that ask teachers to engage in more unusual instructional tasks, such as cooperative learning.

"We want to get away from the didactic, lecture-based model of instruction'' that is now so common, says Ms. Vavrus. "We're not saying that it's not important for teachers to be able to conduct a good recitation, but there certainly is more to good literacy instruction than that. I think that we are trying to move the focus of the literacy agenda ahead somewhat.''

Encouraging Cooperation

The documentation activities will also help the research team emphasize one of the most important aspects of teaching: encouraging cooperation in schools.

"Documentation gives us an opportunity to look at collegial relationships among teachers,'' says Ms. Collins. "One of the criteria that we are talking about when we design an exercise is, 'Is this something that a teacher could do by him or herself, or does this exercise require some form of attestation by another person, and who is the appropriate person to do it?'''

"I am trying to put together teams of teachers to work together next year,'' she adds, "and the teams will consist of an experienced teacher, who has worked full time on the design process, and a beginning teacher with relatively little experience, who will be the constructor of the documents.''

According to Mr. Shulman, the project's "first commitment'' is not to assessment itself, but to "improvements in teaching and in teacher education.''

"That suggests that you create a substantial proportion of your assessment under conditions that not only permit but invite coaching, assistance, and input from colleagues,'' he says.

"In an ironic way,'' he adds, "the kinds of things you construct conventional tests to avoid, you design the documentation part of the process to foster.''

If board certification is aimed at teachers with two to four years' experience, Mr. Shulman argues, it could have a strong, positive influence on the quality of teachers' preservice education and on their introduction to teaching.

Minority Candidates

The researchers are also working with a group of colleges that prepare large numbers of minority teachers to see if they can design experiences that would help students pass these new kinds of assessments.

By designing tests that more accurately reflect actual classroom teaching, the researchers are hoping to weed out some of the "irrelevant'' factors that often prevent minority candidates from doing well on tests.

"What we're not interested in doing is the typical test-bias exercise, where you find out which kinds of items minority candidates have trouble with and you throw them out,'' says Mr. Shulman.

"The question,'' he argues, "is what kind of support system can be developed at the training site to help candidates develop the kinds of competence needed to meet these standards?''

"The exciting thing,'' adds Ms. Collins, "is that our charge is to explore. We don't expect to come up with an answer, or even a model. What we're ending up with are questions--lots and lots of questions.''

