Chat Transcript: Computer-Based Testing
Guests: Randy Bennett, distinguished presidential appointee, Research and Development Division, Educational Testing Service; and Gregory K.W.K. Chung, senior researcher, National Center for Research on Evaluation, Standards, and Student Testing, UCLA
Dec. 14, 2005
How prevalent is computer-based testing in K-12 schools? Where is it headed? And what effect has the No Child Left Behind Act had on the use of computerized assessment?
Our guests address these and related issues.
Kevin Bushweller (Moderator): Welcome to today’s online chat about computer-based testing. There has been much discussion about computerized assessment and its ability to provide quick results and analyses that educators can use to improve instruction. But has this potential been realized? What are the downsides of computer-based testing? And what barriers exist to prevent more schools from moving to this form of assessment?
Our guests will address those and other questions. So let’s get the discussion started ...
Question from Kevin Bushweller: What do you see as the biggest barrier to K-12 schools incorporating the use of computerized testing? And how do they get over that barrier?
Randy Bennett: Certainly one of the biggest barriers has been that, in many schools, there are not enough computers to test all kids in a cohort (say a grade) simultaneously in a secure fashion. This has led to administering tests within a window, say a three week period. That practice poses several problems, including that all instructional computing in those schools may come to a halt until testing ends. It also means that, to the extent that test questions are reused over that period, there is the potential for security leaks.
Question from Kevin Bushweller: In the past 5 years, what has been the most impressive technological advancement related to computer-based testing?
Gregory K.W.K. Chung: One impressive advance during 2000-2005 was how quickly automated scoring of essays moved from being a topic of research to a for-profit service. Three companies come to mind: ETS Technologies (now offered via ETS), Knowledge Analysis Technologies (now offered via Pearson Measurement), and Vantage Learning. While each company had different scoring techniques and different amounts of research evidence about the validity of those techniques, they were all able to develop *operational* systems that (1) automatically score essays as reliably as or better than human raters and (2) offer competitive pricing and other services compared to using human raters.
Automated scoring of essays is probably the most concrete and widely known technology. However, there have been very interesting developments in the following areas: (a) assessment authoring tools—tools to help non-experts create assessments that are more likely to be of high quality; (b) test assembly—techniques that allow simultaneous consideration of constraints so that the testlet created meets specified constraints; and (c) scoring of complex performances (e.g., problem solving).
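To make the essay-scoring idea concrete, here is a deliberately tiny sketch of one *family* of techniques sometimes described in the research literature: predict a score from weighted surface features of the text. The features and weights below are invented for illustration and do not represent how any of the named vendors' operational systems actually work, which are far more sophisticated.

```python
# Illustrative feature-based essay scorer. The two features and the
# hand-set weights are hypothetical, not any vendor's actual model.

def essay_features(text: str) -> dict:
    """Extract simple surface features from an essay."""
    words = text.lower().split()
    return {
        "length": len(words),                                   # word count
        "diversity": len(set(words)) / len(words) if words else 0.0,
    }

# Hypothetical weights, as if fit to a set of human-scored training essays.
WEIGHTS = {"length": 0.02, "diversity": 2.0}

def predicted_score(text: str) -> float:
    """Weighted sum of features stands in for a trained scoring model."""
    feats = essay_features(text)
    return sum(WEIGHTS[k] * v for k, v in feats.items())

# A longer, more varied response outscores a short repetitive one.
assert predicted_score("a clear argument with varied evidence") > \
       predicted_score("good good good good")
```

In practice such a model would be fit against human ratings and validated for agreement with human raters, which is the evidence question Chung raises above.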
Question from Ron Skinner, ASBO International: What are some of the first steps that school districts with an average student-to-computer ratio can take to realize any advantages of using computer-based assessment to inform student learning, not just grade students?
Randy Bennett: I think the first step is to find a computer-based test that was specifically developed for that purpose (i.e., to inform student learning). That would mean a test intended to give meaningful information about the specifics of what a student knows and can do in a targeted skill domain.
Question from John Shacter, consultant and educator, Kingston, TN: Aren’t the pros and cons of this type of testing rather obvious? Check my answers: Should we do it? YES! -- Should we eliminate all other types of testing? NO! -- Should we eliminate all types of testing? DEFINITELY NO! All teachers test our students. Some teachers would just prefer not having any “outsiders” check up on the process. Isn’t that what it is all about?
Randy Bennett: I think it’s fair to say that there are legitimate concerns about testing, whether we do it on computer or on paper, and whether we do it for high or low stakes (though most of the controversy is over high-stakes tests). My hope is that if we thoughtfully construct and use computer-based tests, we can address some of the legitimate concerns about testing generally.
Question from Gene Coulson, Executive Director, Office of Program Services, West Virginia Department of Education: Is there any definitive research on differences in test scores due solely to test mode (i.e., online vs. paper and pencil)?
Randy Bennett: There is very little in the way of published, peer-reviewed research on this question at the k-12 level. (For adults, there have been many such studies and the general conclusion is that, for carefully developed cognitive tests, paper and computer versions produce scores with similar distributions and rank orders--meaning that mode makes little difference.) At the k-12 level, the most comprehensive studies have been conducted by NAEP, the National Assessment of Educational Progress. One study was done in mathematics and one in writing, with both studies using nationally representative samples of kids. The tests were either all constructed response (writing) or included a large portion of constructed-response items. Both studies came to the conclusion that test mode mattered because computer skill was related to online test performance. The conclusion I draw from those studies is that we should be very careful about how we design constructed-response items for use in computer-based tests and the provisions we make for students with limited computer skill. The complete report is available online from the National Center for Education Statistics at: http://nces.ed.gov/pubsearch/pubsinfo.asp?pubid=2005457
Question from Matt Silberglitt, Science Assessment Specialist, Minnesota Department of Education: Are any states currently managing two systems online and paper? Is it acceptable under NCLB for different students to be assessed on the same content in different ways?
Randy Bennett: Yes. Virginia would be the best example because of the high volume of online testing that it does. That state administered about 650,000 tests online last academic year. Most of those tests were high school, end-of-course tests. Over a dozen different test titles were offered in English, science, mathematics, and social studies. I believe that paper versions of all titles were available but that most students (by far) took the online versions. I am not aware of any restriction under NCLB on the mode in which a student is assessed, as long as all students are assessed against the same standards. (I would interpret that also to mean with comparable measures.)
Question from John Foster, PSU, Instructor: Online testing is on the increase, but noting varying local IT security and policy issues, what is your projection of the percentage of testing which will take place online in 2008?
Randy Bennett: It’s very hard to make projections. I count 25 states that are doing “something” related to online assessment currently. “Something” includes states that are running pilots as well as states that are already offering operational tests. It includes states that are offering only formative or diagnostic tests online, as well as states that already have online high-stakes measures for promotion and graduation. The farthest I would go is to say that I would expect all but a handful of states to be doing something in online assessment by 2008.
Question from Kevin Bushweller: What do you see as the potential drawbacks of computerized testing? What problems exist that schools really should be aware of?
Gregory K.W.K. Chung: From a practical standpoint, the big question is infrastructure--can the school support the testing process? This means adequate computers, software, reliable networks, security, space, and technical support. Assuming the school is considering Web-based solutions, then schools need to be aware of the district network security policies. Restrictions on the network (e.g., firewalls) can result in last-minute surprises. Ten years ago, headaches centered on whether a school had Internet/network access. Now it’s whether all the security on those networks will allow the testing software to run, and this is particularly true when delivering innovative formats that require student interaction (e.g., simulations).
Question from Matt Silberglitt, Science Assessment Specialist, Minnesota Department of Education: Has anyone successfully implemented online assessment without a significant investment in improvements to the technology infrastructure in schools? Has anyone succeeded in convincing stakeholders that the cost of these improvements is justified?
Randy Bennett: It’s rarely the case that online assessment is used as a justification for improving school technology infrastructure. It has more often been the case that online assessment is a part of a plan to use technology for broad educational purposes. In other words, I wouldn’t pay higher taxes just so my kids could take tests on computer. I would pay higher taxes so that my kids could use computers in school to do their writing, information search, modeling, and so on.
Question from Kevin Bushweller: How can schools ensure that students are being tested via computer under similar conditions when the amount and quality of technology varies from school to school?
Randy Bennett: That’s a difficult problem but one that can be addressed in several ways. One way is to set minimum technology requirements and enforce them. Most of those requirements can be queried remotely. That is, the test delivery system can check before testing begins that a particular computer configuration is appropriate for administering that test and not proceed with the administration if the configuration is not suitable (i.e., critical software or hardware is missing, or Internet connectivity is not sufficient). A second method used by some delivery systems is to take control of the computer at the operating system level. By taking control, the delivery system can ensure that the display is set to the right resolution, that no other programs can be accessed by the student during the test, that screens cannot be copied, and so on. But ultimately, of course, some minimal level of technology needs to be available at each school and the test delivery software needs to be capable enough to make the testing experience virtually the same on each computer.
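The remote pre-check Bennett describes can be sketched very simply: before any items are delivered, compare the workstation's reported configuration against a minimum profile and refuse to proceed if anything falls short. The requirement names and thresholds below are hypothetical, not taken from any real test-delivery product.

```python
# Illustrative pre-administration environment check. Requirement names
# and minimums are invented for the sketch.

MIN_REQUIREMENTS = {
    "screen_width": 1024,    # minimum horizontal resolution, pixels
    "screen_height": 768,
    "bandwidth_kbps": 128,   # minimum sustained connection speed
    "ram_mb": 256,
}

def ready_to_test(workstation: dict) -> tuple[bool, list[str]]:
    """Return (ok, problems); ok is True only if every minimum is met."""
    problems = [key for key, minimum in MIN_REQUIREMENTS.items()
                if workstation.get(key, 0) < minimum]
    return (not problems, problems)

# A lab machine meeting every requirement passes; a low-resolution
# display is caught before the administration begins.
ok, why = ready_to_test({"screen_width": 1280, "screen_height": 1024,
                         "bandwidth_kbps": 512, "ram_mb": 512})
assert ok and not why

ok, why = ready_to_test({"screen_width": 800, "screen_height": 600,
                         "bandwidth_kbps": 512, "ram_mb": 512})
assert not ok and why == ["screen_width", "screen_height"]
```

The point of the design is that the check fails closed: an unsuitable machine is identified before testing starts rather than discovered mid-administration.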
Question from Suzanne Finney, Academic/Assessment Specialist, Charter Schools Office, Ferris State University: What is the status of computer-adaptive testing systems’ ability to produce longitudinal data for cohorts of students over multiple years?
Gregory K.W.K. Chung: I think there are two issues here. First is the CAT system itself, which administers questions, accepts student responses, and, based on the student’s performance and the internal item models, administers another question. Presumably, the system will store students’ scores for any administration and so forth.
But my sense is that what you are asking about is whether such systems track student performance over time so you can examine whether students are improving on whatever variables ... that is a data management issue. This would be a value-added feature of any CAT system that is vendor specific.
If you are more interested in the data management side, there are products available that do longitudinal reporting. The one I am most familiar with is QSP qsp.cse.ucla.edu [NOTE- this is a CRESST/UCLA product].
Question from Barbara Schwartz-Bechet, Ed.D., Associate Professor of Special Education, Bowie State University: Hello. How do you take into account individual learning styles when you create questions for computerized assessments, and how do you create fair rubrics for students with disabilities? Do you believe that students with disabilities should be afforded a different computerized format or should be graded differently?
Gregory K.W.K. Chung: This is a tough issue. In general, it depends on the purpose of the assessment. For example, if the purpose is to determine what a student “knows” (e.g., after some instructional period), then by all means you should ensure that the assessment is appropriate for the particular individual. Otherwise, what you may be measuring may have little or nothing to do with the student’s knowledge.
An absurd example--Suppose you are interested in determining whether a student understands narrative structures (e.g., plot etc.), and the student has a known reading disability, and the assessment is heavily text-oriented. The student scores low on a narrative structure measure. Duh. In this example, the low score may be unduly influenced by the student’s difficulty with reading.
So, alternatives tasks/measures need to be created that embed narrative structures (e.g., video based) but in a format that isn’t a barrier.
The focus should be on getting as “clean” a measure as possible of the construct you are interested in. To that end, I would not have different definitions for different populations, unless that is an important distinction. But I would include accommodations if the accommodations do not interfere with the measurement.
Question from Kevin Bushweller: A few years ago, there was quite a bit of discussion about the potential of adaptive testing, in which a computerized test adjusts its levels of difficulty based on how well a test taker is performing. What’s new with adaptive testing? Is it being used more in schools? Or has its potential not been realized?
Randy Bennett: Adaptive testing is fairly common, though not in NCLB-related tests. The best example would be the Northwest Evaluation Association’s (NWEA) Measures of Academic Progress (MAP), which is reportedly used in ~1,900 school districts (see http://www.nwea.org/assessments/). NCLB requires that students be tested against the standards for their grade, which implies a relatively narrow difficulty range. Adaptive testing has its greatest benefits when one wants to measure across a wide range of difficulty, so it may not be as well suited to the within-grade, standards-based requirements of NCLB as to other assessment purposes, like the progress measurement for which MAP is used.
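The adaptive loop Bennett describes can be sketched in a few lines: after each response, update the ability estimate and pick the unused item whose difficulty is closest to it. This toy uses the Rasch model; the item difficulties and the fixed step-size update are illustrative simplifications (operational CAT systems like MAP use proper maximum-likelihood or Bayesian estimation).

```python
import math

# Toy adaptive-testing loop under the Rasch model. Item difficulties
# and the crude step-size ability update are invented for illustration.

def p_correct(ability: float, difficulty: float) -> float:
    """Rasch model: probability of a correct response."""
    return 1.0 / (1.0 + math.exp(difficulty - ability))

def next_item(ability: float, unused: dict) -> str:
    """Most informative remaining item = difficulty nearest the estimate."""
    return min(unused, key=lambda name: abs(unused[name] - ability))

def update(ability: float, correct: bool, step: float = 0.5) -> float:
    """Crude update: move up after a right answer, down after a wrong one."""
    return ability + step if correct else ability - step

items = {"easy": -1.0, "medium": 0.0, "hard": 1.0, "harder": 2.0}
ability = 0.0
chosen = next_item(ability, items)        # test starts near the middle
assert chosen == "medium"
ability = update(ability, correct=True)   # right answer -> harder item next
del items[chosen]
assert next_item(ability, items) == "hard"
```

This also shows why adaptivity pays off most across a wide difficulty range: if all items sit in a narrow band, as NCLB's within-grade standards imply, there is little for the selection step to choose between.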
Question from Mary Allen, Account Manager, Edusoft: With the evolution of online testing, what is your opinion of the validity of the results in relation to the students’ computer capabilities versus their knowledge of the content? In short, will their computer skills or lack there of affect their scores on an online test?
Gregory K.W.K. Chung: Please see Randy’s response to a related question.
For online tasks more complex than typical formats (i.e., not multiple choice), I would also add that it is vital for the assessment designer to pay close attention to the usability of the software and to make the task as simple as possible ... to the point where participants just need to point and click. If the user has to click more than 3 times to get where he/she wants to go, you’re asking for trouble (in a testing situation). A good rule of thumb is that if the participant has to think about the interface, then it’s a poor design.
From what I have observed of participants in numerous studies (from 4th graders, middle school, high school, adult, and military) using software we created (for research purposes), with such simplicity in mind, non-users can quickly become proficient in the use of the software. In fact, children seem to be far more at ease than adults when it comes to using unfamiliar software.
Now, for tasks that have a poor interface (or, another way of putting it--students need a lot of training to use the software), I would be suspicious of the results for low performing students, especially under the following conditions: (a) speeded conditions; (b) one-shot event; (c) new format [i.e., students have not seen it before in class].
Long term, I think the “computer skill” issue (at the basic fluency level) will become less and less of an issue, in the same way writing papers by hand vs. with a word processor is a non-issue. Online formats will simply be the norm.
Question from Kevin Bushweller: What important questions should researchers be examining about the use of computer-based testing in K-12 schools?
Gregory K.W.K. Chung: I think there are 2 important questions that researchers must ask themselves as they examine computer-based assessments in K12. First, will the use of computer-based testing help students learn? Can the results of the testing be used to identify gaps in student knowledge and skills? Of what benefit will the testing be to students?
Second, researchers should strive to explore formats beyond those that are common in standardized testing. That is, given the potential for computer-based technology to deliver much more innovative formats, researchers should spend time examining how the technology can be used to create tasks and measures that better capture what we want to know about students--whether students can reason, problem solve, and demonstrate deep understanding of the material.
Question from Charles Tucker, Instructor, Wichita State University: Does taking computer-based tests show any increase in scores over traditional paper-and-pencil tests? If so, can you explain why?
Randy Bennett: For adults taking cognitive tests, the scores from computer and paper versions of the same test are typically close enough to be considered interchangeable. As I may have said in response to an earlier question, in the k-12 context, there are very few published, peer-reviewed studies. In the NAEP Math Online study, which used nationally representative samples of 8th graders taking a test presented on paper and on computer, scores on the paper version were higher than scores on the computer version of the test by a small amount. Several other studies (not necessarily published, peer reviewed, or that used representative samples) have found a similar effect--i.e., paper easier than computer by a small amount. I would predict that such differences will moderate further as (1) kids become more familiar with taking tests on computer, (2) test delivery software becomes more capable, (3) test developers learn to design items that work in familiar ways, and (4) computers become more intuitive and easier to use.
Question from Carole Kennedy, retired educator: Schools are increasingly interested in using computer-based assessments. What research is available to help them make decisions about its positive and negative effects?
Randy Bennett: The most comprehensive studies done to date are associated with NAEP. (I should say that I was involved in those studies so that my bias is clear.) There were several unique aspects to those studies. First, they used nationally representative samples of students (4th and 8th grades in math, 8th grade in writing). Second, they included hands-on measures of students’ computer skills. Third, they used constructed-response as well as multiple-choice items. Finally, they looked at a variety of other issues including the effects of delivery mode on population groups and the accuracy of the automated scoring of constructed-response items. The reports are available from the National Center for Education Statistics and can be accessed free via the web at: http://nces.ed.gov/pubsearch/pubsinfo.asp?pubid=2005457. As I say, I believe that is the most comprehensive research available on k-12 online testing. (But you know my bias!)
Question from Gilbert Andrada, State Education Consultant, Connecticut DOE: How far in advance of CBT should a state implement curriculum objectives involving keyboarding skills, pointer skills, and base knowledge of the language that is used to communicate computer behaviors (e.g., “click,” “highlight,” “menu”) so that we can expect that test scores are minimally impacted by the mode of test delivery?
Randy Bennett: In general, the more time the better. Clearly, students should be given enough time to develop the computer-related skills that they will need to take the test. If the test is simply a multiple-choice measure that requires only very limited computer skill, less advance time is going to be needed than if it involves constructed-response questions that ask the student to type an extended essay. In either case, the student should be given multiple opportunities well in advance of the test to become familiar with the computer (keyboard, mouse, and vocabulary), the test delivery software, and the specific item types that will be used in the assessment.
Question from Peter Davidson, Zayed University: How can you ensure equivalency between a computer-based version of a test and its paper-based version?
Randy Bennett: It’s important first to recognize that “equivalency” or “comparability of scores” is important under the following types of circumstances: (1) some students take the test on paper and others take it on computer and we want to compare scores to one another or to the same fixed standard, (2) we want to aggregate scores taken on paper with those taken on computer, (3) we want to compare performance over time when there has been a change in testing modes. If those types of conditions apply, then we can do several things to increase the chances that scores will be comparable: (1) make sure that students have sufficient time before the test to become familiar with the basic computer skills required to take the test, the delivery software and navigational conventions, the test tools (e.g., online calculator, highlight tool), and the specific item types and their presentation characteristics and response requirements; (2) make sure timing is set so that the computer and paper versions are equally (un)speeded (which may not mean having equal time limits for each mode); (3) make sure that the student is familiar with the particular computer type that the test will be delivered on (e.g., including the keyboard layout, which may be different from the computer he or she routinely uses); (4) make sure that the presentation characteristics and response requirements of items do not introduce skills that aren’t required for the paper test and which may therefore differentially affect student performance; and finally (5) equate where appropriate (i.e., adjust scores to take account of comparability differences).
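Point (5), equating, has simple classical forms. A minimal sketch is linear (mean-sigma) equating: place computer-mode scores on the paper-mode scale by matching the two score distributions' means and standard deviations. The score samples below are invented for illustration; operational equating designs (e.g., common-item or random-groups) are more involved.

```python
import statistics

# Linear (mean-sigma) equating sketch with invented score samples.

def linear_equate(score, from_scores, to_scores):
    """Map `score` from one mode's scale onto the other's."""
    m_from, s_from = statistics.mean(from_scores), statistics.pstdev(from_scores)
    m_to, s_to = statistics.mean(to_scores), statistics.pstdev(to_scores)
    # Standardize on the source scale, then re-express on the target scale.
    return m_to + (s_to / s_from) * (score - m_from)

computer = [40, 50, 60, 70, 80]   # scores observed online
paper = [45, 55, 65, 75, 85]      # scores observed on paper

# A computer score at the online mean maps to the paper mean.
assert linear_equate(60, computer, paper) == 65.0
```

The adjustment only compensates for overall scale differences between modes; it cannot repair items whose on-screen presentation demands skills the paper version does not, which is why Bennett's points (1)-(4) come first.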
Question from Chuck Half, Coordinator Technology, Pittsburgh Public Schools: As a 34,000-student district setting a 3-5 year assessment strategy in place, we are concluding that utilizing online assessment technology is NOT beneficial because online:
• does not encourage students to write things down
• cannot support open-ended questions
• cannot support the oral response format
• encourages “guessing” rather than thinking
• forces “scrolling” just to read the passages and then again when answering questions, versus turning the pages back and forth
• is not consistent with the state’s PSSA administration environment
• requires significant changes in access to and scheduling of computer labs.
What do you recommend is realistic for the next 3 - 5 years?
Gregory K.W.K. Chung: First a reaction ... it is uncanny how your operational experience dovetails with my experience with computer-based studies in K12, university, and military settings. Regardless of the site, infrastructure is always a concern because of the many “gotchas.” And more substantively, people want to know whether their students can think (as demonstrated by a variety of complex performances), but guess what--with the exception of essays, the only (operational) game in town is the selected-response format, whose online implementation is at times a poorer implementation of the paper booklet. This is a remarkable turn of events that echoes Lindquist’s lament about how machine scoring constrained the development of novel items.
So 3-5 years out ... more of the same. This is not to say that there hasn’t been any activity in this area. In fact, Randy Bennett and ETS, NBME, and CRESST have been active in this area for quite some time. But the problem is that innovative formats take a while to develop and validate, there are a lot of unknowns, and in the end, the format is only one part of what is essentially a business decision.
Question from Ron Skinner, ASBO International: If a district were able to look beyond cost and provide all of the resources necessary to start computer-based testing, what other pitfalls might they face during implementation?
Randy Bennett: If the test were for some high-stakes purpose, like promotion or graduation, then security would be a concern. Some Internet test delivery systems allow students to access other programs while taking the test or to copy screens using the SHIFT Print Screen key combination. Fairness might also be affected if the display of test items on one computer was not the same as the display of the same items on another computer in the same classroom or lab. Some Internet test delivery systems allow such variation in display, meaning that one student might see three of the five multiple-choice options on screen (and have to scroll to see the rest), while another student might see all five without having to scroll at all. These types of variation from one machine to the next can unfairly affect test score.
Question from Diane Weir, School Committee, Westford Public Schools: If computer-based assessments are going to shape classroom activity, they need to be used more often and integrated into an overall computer-based learning system. It would require a shift in culture and allocation of resources and some acceptance of risk. What can universities and the testing industry do to support this shift?
Randy Bennett: A variation on this idea can be found in what are called Intelligent Tutoring Systems. These systems integrate real-time assessment with instruction. That is, they dynamically assess what a student knows and present instruction matched to that student’s “knowledge state.” John Anderson, at Carnegie Mellon University, has been working on this concept for close to two decades. His “Cognitive Tutors,” as they are called, are now being used in a few thousand schools, I believe, in secondary school mathematics. They are integrated with classroom instruction in the way you describe, meaning that students work with the computer some portion of the time but also engage in more traditional classroom activities for the remainder of the time. The one thing missing that I would like to see added to this concept is the aggregation of information collected from the student’s many interactions with the computer. That aggregation could then supplement, or perhaps eventually replace, the summative assessment. For more about the Cognitive Tutors, see http://www.carnegielearning.com/. For more about the general idea of integrating classroom assessment and summative assessment, see: ftp://ftp.ets.org/pub/res/reinvent.pdf
Question from Matt Silberglitt, Science Assessment Specialist, Minnesota Department of Education: What seems to be driving the move to online testing? Are most states looking to increase quality, validity, and universal design or are most states looking for lower long-term costs and shorter turnaround on results?
Randy Bennett: By far, the driving force seems to me to be getting results back to teachers faster. That’s true for the increase in online formative/diagnostic and interim/progress assessment, as well as for the increase in online summative assessment.
Question from Lois Bishop, Director of Instructional Programs, The American School Foundation of Guadalajara, Mexico: Of what value is on-line testing to the classroom teacher?
Gregory K.W.K. Chung: I think online assessment has largely ignored the classroom, but I think computer-based (formative) assessment is going to be one of the hot areas in the near future. Here’s why. First, the classroom is where the action is. That’s where learning happens. But the classroom environment hasn’t fundamentally changed for over a century--group instruction (vs. individualized), non-adaptive, large student:teacher ratios, numerous demands on teacher time. What this comes down to is a middle or high school teacher teaching 5 subjects and 150 kids a day. There is no practical way to get good information about their students in a timely way. Homework is used for student practice, not as information about student learning. Tests are too infrequent. If a student performs poorly on the test, it’s too late to do anything. So, online formative assessments at the very least can be used for information gathering (e.g., homework as diagnostics). With good data management and reporting, such information can be used for daily or weekly updates, increasing the chances of detecting group- or individual-level problems.
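The "homework as diagnostics" idea amounts to rolling nightly item results up by skill so a teacher can spot group-level weak spots without hand-grading 150 papers. A minimal sketch, assuming hypothetical skill tags and an arbitrary 60% flag threshold:

```python
from collections import defaultdict

# Roll (student, skill, correct) homework results up by skill and flag
# skills below a threshold. Skill names and the 0.6 cutoff are invented.

def weak_skills(responses, threshold=0.6):
    """Return the sorted list of skills whose class-wide accuracy is low."""
    totals = defaultdict(lambda: [0, 0])        # skill -> [correct, attempts]
    for _student, skill, correct in responses:
        totals[skill][0] += int(correct)
        totals[skill][1] += 1
    return sorted(skill for skill, (right, n) in totals.items()
                  if right / n < threshold)

homework = [
    ("ana", "fractions", True), ("ana", "decimals", False),
    ("ben", "fractions", True), ("ben", "decimals", False),
    ("cal", "fractions", False), ("cal", "decimals", True),
]
# Fractions: 2/3 correct (fine); decimals: 1/3 correct (flagged).
assert weak_skills(homework) == ["decimals"]
```

Run nightly, a report like this is the kind of daily or weekly update Chung describes: cheap to produce once responses arrive electronically, and timely enough to act on before the unit test.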
Question from Christi Dennis, Data Recognition Corporation: Could you elaborate on your earlier comment regarding interesting developments in the area of scoring complex problem solving?
Gregory K.W.K. Chung: The late 1990s seemed to be the go-go days of automated scoring of complex responses. I refer you to the work at ETS (Bennett, Bejar, Gitomer, Mislevy, Burstein), CRESST (Baker, O’Neil, Chung), and NBME (Clauser). The central questions many of these researchers were asking were whether technology could be used to create complex tasks, whether such tasks could be scored, and whether the scores made sense (i.e., validity). So, for example, ETS did some work on scoring architectural designs, mathematical reasoning, automated essay scoring, engineering design, and a variety of interesting response types. The work at CRESST explored whether we could score someone’s Web information seeking, conceptual knowledge, teamwork skills, and online behavior, and most recently the use of simulations and sensors to uncover learning processes previously hidden.
Question from Charles Blaschke, President, Education TURNKEY Institute, email@example.com: A recent report from the National Center on Educational Outcomes, University of Minnesota, reported that eleven states are in the process of developing computer-based large-scale assessments, down from the 20 states that reported the same in 2003. At the same time, the NCEO report indicates that a third of the states are field testing alternative ways of providing appropriate accommodations for students with disabilities, especially on alternative assessments, and 27 states have included RFP requirements for developers to follow “universal design principles” for the development of alternative tests. Since one of the justifications of computer-based testing has been the relative ease by which accommodations could be provided (e.g., the Oregon settlement), how does one explain the decline of state-planned computer-based assessments while higher priority appears to be placed by states on the uniform use of valid accommodations or universal design principles?
Randy Bennett: I haven’t read the report, so I can’t comment on it directly. But what should be obvious is that NCLB does require the inclusion of students with disabilities in state assessments. It doesn’t require computer-based testing. So first things first.
To pick up on Chuck Half’s question about what will be 3-5 years out, you might look at the state of the art in more advanced settings where infrastructure and cost are less of an impediment than they are in k-12 today. In the new version of the Test of English as a Foreign Language (TOEFL) just introduced, there is a speaking section that provides for an oral response. In the United States Medical Licensing Examination, candidates solve a series of computer-presented patient-management problems by ordering tests, prescribing treatments, and making diagnoses. In the National Council of Architectural Registration Boards’ Architect Registration Examination, candidates use the computer to design buildings. In the American Institute of Certified Public Accountants licensure exam, candidates respond to complex accounting problems. So, in these various cases, the problems call for complex thinking, intermediate steps are recorded, and responses other than multiple choice or an essay are required. We might not see this level of complexity in k-12 tests in 3 years, but we will see it there eventually.
Question from Bill Irwin, Senior Research Associate, PA School Boards Association: In the use of computers for testing, won’t students who come from a disadvantaged background be further disadvantaged by the fact that many are less familiar and less comfortable with technology?
Randy Bennett: That would depend in good part on how familiar and comfortable one has to be to take a particular computer-based test. If the test were a multiple-choice one that only required the student to click on response options, and if the student had sufficient time to practice the skills needed to take that test beforehand, there may be no disadvantage at all. On the other hand, if the test required extensive keyboard skills and students from low-SES backgrounds were less likely to have those skills, there could be significant disadvantage. It is true, however, that the ratio of students to Internet-connected computers in instructional classrooms as of 2003 was only marginally greater in low-SES schools than in high-SES schools (5.1:1 vs. 4.2:1). (See http://nces.ed.gov/pubsearch/pubsinfo.asp?pubid=2005015). So, if access in schools is any measure, the situation may not be as bad as one might imagine.
Question from Gene Coulson, Executive Director, Office of Program Services, West Virginia Department of Education: Please explain how an online test might be constructed differently than a paper/pencil test.
Gregory K.W.K. Chung: Let’s consider two scenarios -- (a) when both online and paper can measure the same construct, and (b) when the online version is the only plausible environment. The first condition is not much different, I think, unless you want to measure more. So suppose I want to know whether a student can solve the equation 3*(a + 2b) = ?. You may be interested in only the answer. In this case, no difference. But if you were interested in the process the student used, then the two versions begin to depart a little. The paper version may have a “show your work” section. The online version might have the same, but with text fields provided for students to type their responses. Still similar. Now if you have hundreds of students, the paper version quickly becomes overwhelming. The online version can collate the responses, and ideally score each step. The second condition is where the online version is the only way to go. For example, if you are trying to measure students’ search skills--which mode would you use? You might set up a search environment with tracking to see whether students could find different kinds of information (e.g., ill-defined or well-defined searches), and observe the kinds of search strategies they used (e.g., Boolean, browsing, etc.). But regardless of format, the fundamentals remain the same--focus first on the learning-related processes you want to know about (e.g., do they understand the material, can they use effective problem-solving strategies), then the content (in math, physics, etc.). Then worry about format near the end of the process.
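[Editor’s note: The step scoring and collation Mr. Chung describes can be illustrated with a short sketch. Everything below is hypothetical -- the function names, the normalization rule, and the two-step rubric for 3*(a + 2b) are invented for illustration and are not drawn from any actual testing system.]

```python
from collections import Counter

# Hypothetical rubric: expected intermediate steps, as typed strings,
# for simplifying 3*(a + 2b).
EXPECTED_STEPS = ["3*a + 3*2b", "3a + 6b"]

def normalize(step: str) -> str:
    """Canonicalize case and whitespace so trivial variations still match."""
    return "".join(step.lower().split())

def score_response(steps):
    """Return a per-step right/wrong list for one student's typed work."""
    return [normalize(s) == normalize(e)
            for s, e in zip(steps, EXPECTED_STEPS)]

def collate(all_responses):
    """Count, for each step, how many students got that step right --
    the kind of cross-class summary that is overwhelming on paper."""
    totals = Counter()
    for steps in all_responses:
        for i, ok in enumerate(score_response(steps)):
            totals[i] += int(ok)
    return totals

responses = [
    ["3*a + 3*2b", "3a + 6b"],   # both steps correct
    ["3*a + 2b", "3a + 6b"],     # dropped the factor of 3 on 2b
]
print(collate(responses))  # how many students got step 0 and step 1 right
```

Even this toy version shows the asymmetry in the answer: the per-step tallies fall out automatically online, whereas on paper each intermediate step would have to be read and recorded by hand.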
Question from Bretton Spar, Fairfax County PS Project Manager: A current issue we are facing is meeting the needs of special education students requiring testing accommodations; screen readers, for example. What are the industry trends for meeting the needs of this significant population?
Randy Bennett: The best persons to answer this would be the folks at CAST (Center for Applied Special Technology) (see http://www.cast.org/) or at the National Center on Educational Outcomes (see http://education.umn.edu/nceo/). At CAST, Bob Dolan is the person to contact.
Question from Gage Kingsbury, NWEA: Using the computer as a page-turner for a test is useful, but it seems that computer-based testing offers an opportunity to create innovative tests that might tap new domains of achievement. What is being done in this area?
Gregory K.W.K. Chung: I refer to a previous posting (Christi Dennis), and I expand on it here. There is constant work in the field pushing in this regard. With respect to tapping new domains of achievement, I think on the far-out end we’re going to see the emergence of blended spaces. That is, to date the conception has been that student-computer is the assessment space, which is reasonable given the technology development path, and the kinds of assessment targets have usually focused on the cognitive side. But this is no longer a hard requirement. Sensor technology is available to radically expand the range of observation--what you do, say, see, and feel can all be sensed. And this is true for the things students interact with (e.g., blocks, books, toys, a tennis racket, etc.). So this really opens several new dimensions for targets of assessment. Here’s a scenario--if you wanted to assess young children’s problem-solving skills (3-6 year olds), how would you do it? One way is to give them some ordering tasks, etc. But another way is to observe under natural conditions (e.g., play or other like contexts)--measure students’ behaviors, the objects they interact with, and the other people they interact with. Blend physical observations with computer- and other classroom activities and you’ve got a rich picture of what a 4-year-old can do. And this is entirely doable.
Kevin Bushweller (Moderator): Thank you for joining us for this online chat. And I want to extend a special thanks to our guests, whose expertise in computer-based testing made this a fascinating discussion.