Artificial intelligence, advanced numerical analysis, and. Effect of item response theory (IRT) model selection on. A didactic approach to the use of IRT true-score equating. Statistical equating with measures of oral reading fluency. Equipercentile equating defines the equating relationship as one in which equated scores have the same percentile rank on either form. Impact of group differences on equating accuracy and the. Equipercentile equating is typically done by computer, though it is relatively easily done by. Equating scores from adaptive to linear tests (IACAT). Frequently asked questions: equating of scores on multiple. The computer programs listed below can be used to conduct many of the equating analyses described in Kolen and Brennan (2004). Patrick Meyer is an assistant professor in the Curry School of Education at the University of Virginia. Annual meeting of the National Council on Measurement in Education. IRTEQ: Windows application that implements IRT scaling and. Kernel and traditional equipercentile equating with degrees of.
Test equating is the statistical process that accounts for the differences in test difficulty and then adjusts the scale of the current test administration so that the same criterion standard can be used. The test form to which we are equating the new form. An equipercentile version of the Levine linear observed-score. It is based on a flexible family of equipercentile-like equating functions and contains the linear equating function as a special case. You can equate forms with classical test theory (CTT) or item response theory (IRT). Dementia Rating Scale-2 (DRS-2) publisher, online testing. Two local methods for observed-score equating are applied to the problem of equating an adaptive test to a linear test.
R is free, open-source software that is widely used for statistical analyses. Statistical methods for test equating: computer software manual. The class is a nonmathematical introduction to the topic, emphasizing conceptual understanding and practical applications. Equating. UNL Digital Commons, University of Nebraska-Lincoln. The table below shows how the test equating process works. A SAS program for calculating equivalent scores using the equipercentile method. Finally, examples of currently available software will be inventoried. An analytical procedure for the equipercentile method of.
There are three general approaches to IRT equating. Methods and Practices (Statistics for Social and Behavioral Sciences). Kolen, Michael J. The new edition of Test Equating, Scaling, and Linking. Equating determines for each score on the new form the corresponding score on the reference form. The proposed procedure requires (a) approximating the empirical score distributions of the two forms by means of the first terms of an infinite series, and (b) contrasting the results obtained when only the first two moments are used i. We give the assumptions for the two methods in order to emphasize that all equating methods require some nontestable assumptions to be fulfilled. Several other studies, including a generalizability study and an equipercentile equating study, were conducted to determine the equivalency between the two forms. In addition to statistical procedures, successful equating, scaling, and linking involves many aspects of testing, including procedures to develop tests, to administer and score tests and to interpret. An R package for observed-score linking and equating. This book provides an introduction to test equating, scaling, and linking, including those concepts and practical issues that are critical for developers and all other testing professionals. For practitioners, the book provides a splendid introduction to the topics considered. An analytical procedure for the equipercentile method of equating tests.
PIE for PC Console, PIE for PC GUI, PIE for Mac OS9, PIE for Mac OS10: conducts IRT true-score and observed-score equating for dichotomously scored tests. Equipercentile equating produced scores on a 30-point scale in all studies. Considering that IRT data simulation might unequally favor IRT equating methods, pseudo tests and pseudo groups were also constructed to make equating results comparable with those. Impact of group differences on equating accuracy and the adequacy of equating assumptions. The merit list is the combination of different batches taking different test forms and from different districts each. Aug, 2014. The traditional equipercentile method was used as an evaluation baseline. The class is a nonmathematical introduction to the topic. The kernel Levine equipercentile observed-score equating. Therefore, the construction and administration of alternate forms of the same test is a necessary requirement for operating these testing programs (Cook, 2007). Software enabling complex machine learning has become widely. The major testing companies of course have the software they need for scaling and equating, but software available for researchers and graduate students is very limited.
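IRT true-score equating for dichotomously scored tests, as mentioned above, can be sketched in a few lines. The example below is a minimal illustration and not the method of PIE or any other program named here: it assumes a 2PL model, item parameters for both forms already placed on a common ability scale, and hypothetical parameter values and function names chosen only for illustration.

```python
import numpy as np

def tcc(theta, a, b):
    """Test characteristic curve: expected number-correct score under the 2PL."""
    return float(np.sum(1.0 / (1.0 + np.exp(-a * (theta - b)))))

def theta_for_true_score(score, a, b, lo=-8.0, hi=8.0, tol=1e-10):
    """Find theta whose true score equals `score` by bisection.

    Valid only for 0 < score < number of items; the TCC is strictly
    increasing in theta when every discrimination a is positive.
    """
    for _ in range(200):
        mid = 0.5 * (lo + hi)
        if tcc(mid, a, b) < score:
            lo = mid
        else:
            hi = mid
        if hi - lo < tol:
            break
    return 0.5 * (lo + hi)

def irt_true_score_equate(score_x, a_x, b_x, a_y, b_y):
    """Map a form X true score to the form Y true score at the same theta."""
    theta = theta_for_true_score(score_x, a_x, b_x)
    return tcc(theta, a_y, b_y)

# Hypothetical item parameters (assumed to be on a common scale already).
a_x = np.array([1.0, 1.2, 0.8])
b_x = np.array([-1.0, 0.0, 1.0])
a_y = np.array([1.0, 1.2, 0.8])
b_y = b_x + 0.5  # form Y is uniformly harder in this made-up example

equated = irt_true_score_equate(1.5, a_x, b_x, a_y, b_y)
```

Because form Y is harder in this made-up example, a form X true score of 1.5 maps to a lower form Y true score. Scores at or below the sum of guessing parameters, or equal to the maximum, need special handling that this sketch omits.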
Equipercentile equating requires finding the percentile rank of every score on each form; the results for all forms are then combined to generate a merit list. An investigation into the test equating methods used during 2006. The advantages and disadvantages of each equating method are discussed along with the conditions conducive to satisfactory equating. Linking two assessment systems using common-item IRT. Equipercentile equating with equal interval scores. Brad Hanson, February 10, 1993 (revised 5295). Let X and Y be discrete random variables representing the distribution of scores on two forms of a test, labeled Form X and Form Y, respectively, in some population. Equating in small-scale language testing programs. SAGE Journals.
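The percentile-rank idea described above can be illustrated for discrete integer scores. The following is a minimal sketch, assuming mid-percentile ranks and linear interpolation within each integer score interval; the function names and frequency counts are hypothetical, and no presmoothing or postsmoothing is applied.

```python
import numpy as np

def percentile_ranks(freq):
    """Mid-percentile rank (0-100) for each integer score 0..K."""
    p = np.asarray(freq, dtype=float) / np.sum(freq)
    return 100.0 * (np.cumsum(p) - 0.5 * p)

def equipercentile(freq_x, freq_y):
    """Form Y equivalents of the integer form X scores 0..K.

    freq_x[i] and freq_y[j] are counts of examinees at scores i and j.
    """
    targets = percentile_ranks(freq_x) / 100.0
    p_y = np.asarray(freq_y, dtype=float) / np.sum(freq_y)
    cum_y = np.cumsum(p_y)
    equated = []
    for t in targets:
        # Smallest form Y score whose cumulative proportion exceeds t.
        u = min(int(np.searchsorted(cum_y, t, side="right")), len(p_y) - 1)
        below = cum_y[u - 1] if u > 0 else 0.0
        if p_y[u] == 0.0:
            # Guard for empty top categories: use the upper real limit.
            equated.append(u + 0.5)
        else:
            # Interpolate inside the interval [u - 0.5, u + 0.5].
            equated.append(u - 0.5 + (t - below) / p_y[u])
    return np.array(equated)

# Hypothetical score frequencies on two 4-point forms.
freq_x = [5, 3, 1, 1]
freq_y = [1, 1, 3, 5]
y_equivalents = equipercentile(freq_x, freq_y)
```

Equating a form to itself returns the original scores, which is a quick sanity check on any implementation of this kind.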
Several methods have been developed to conduct equating. Equating is an important step in the process of collecting, analyzing, and reporting test scores in any program of assessment. The R package equate (Albano, 2014) is free, open-source software for conducting observed-score linking and equating under single-group, equivalent-groups, and nonequivalent-groups designs with one. Comparison of approaches for equating different versions of.
References of noncommercial software for IRT analyses nina. The most complete coverage of the entire field of score equating and score linking in general has been provided by Kolen and Brennan (2004). Computer programs. College of Education, University of Iowa. Hypothesis testing of equating differences in the KE. This booklet grew out of a half-day class on equating that author Samuel Livingston teaches for new statistical staff at Educational Testing Service (ETS). Methods and Practices is a welcome update to a book which has become a classic in equating and linking. He is the inventor of jMetrik, an open-source psychometric software program. The assumptions and the formulas for the chain equipercentile equating function. An equipercentile version of the Levine linear observed-score equating function using the methods of kernel equating. Alina A. A comparison of linear, equipercentile, and FIPC equating. Equating in small-scale language testing programs. Geoffrey. Fair and equitable measurement of student learning in MOOCs. Ir provides unlimited scoring and report generation after hand entry of DRS-2 and DRS-2. The general form of the Levine function will soon be available in KE software at.
Bayesian nonparametric estimation of test equating functions. A common approach is known as equipercentile equating. CTT methods include Tucker, Levine, and equipercentile. The primary purpose of the Center for Advanced Studies in Measurement and Assessment (CASMA) is to pursue research-based initiatives that lead to advancements in the methodology and practice of educational measurement and assessment. PDF: Equating in small-scale language testing programs. A handful of statistical packages are available for linking and equating test forms. Kernel equating (KE) is a powerful, modern, and unified approach to test equating. In observed-score equating, the characteristics of score distributions are set equal for a specified population of examinees (Angoff, 1971). This article presents a SAS program that uses equipercentile equating to derive equated scores on two. Test score equating is used to compare scores from different test forms. The scores on these multiple forms were equated using the equipercentile equating method (the legally sanctioned format) and the merit list created. Estimates bootstrap standard errors of linear equating and equipercentile equating under the random groups design.
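Bootstrap standard errors of linear equating under the random groups design, mentioned above, can be sketched as follows. This is an illustrative outline rather than the algorithm of any program listed here: it assumes two independent random samples, resamples each with replacement, and uses simulated scores and hypothetical function names.

```python
import numpy as np

def linear_equate(x, scores_x, scores_y):
    """Linear equating: match the mean and standard deviation of form Y."""
    mx, my = np.mean(scores_x), np.mean(scores_y)
    sx, sy = np.std(scores_x, ddof=1), np.std(scores_y, ddof=1)
    return my + (sy / sx) * (np.asarray(x, dtype=float) - mx)

def bootstrap_se(scores_x, scores_y, points, n_boot=500, seed=0):
    """Bootstrap standard error of the linear equating function at `points`.

    Each group is resampled independently, as the random groups design
    implies. Assumes no resample is constant (nonzero standard deviation).
    """
    rng = np.random.default_rng(seed)
    scores_x = np.asarray(scores_x, dtype=float)
    scores_y = np.asarray(scores_y, dtype=float)
    reps = np.empty((n_boot, len(points)))
    for i in range(n_boot):
        bx = rng.choice(scores_x, size=scores_x.size, replace=True)
        by = rng.choice(scores_y, size=scores_y.size, replace=True)
        reps[i] = linear_equate(points, bx, by)
    return reps.std(axis=0, ddof=1)

# Simulated data standing in for two randomly equivalent groups.
_rng = np.random.default_rng(1)
sample_x = _rng.normal(20.0, 5.0, size=200)
sample_y = _rng.normal(22.0, 4.0, size=200)
se = bootstrap_se(sample_x, sample_y, points=[5.0, 20.0, 35.0])
```

Standard errors are typically smallest near the mean of form X and grow toward the extremes, where the estimated slope matters most. The same resampling loop could wrap an equipercentile function in place of `linear_equate`.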
ABD in applied mathematics and computational science, 2008. However, one of the reasons that IRT was invented was that equating with CTT was very weak. Methods of equating utilize functions to transform scores on two or more versions of a test, so that they can be compared and used interchangeably. Unlike with item response theory, equating based on classical test theory is somewhat distinct from scaling. Equipercentile equating was not necessary for the European continent, because all contributing studies administered versions with 30-point totals. Since the turn of the century, much has been written on score equating and linking. Approach 1 includes a common-item linking strategy using item response theory (IRT), with external anchor sets embedded in the new test administration. Any equipercentile equating method has five steps or parts. The book is appealing to anyone interested in the topic of equating, scaling, and linking. In the case of the common pupils design, NFER developed its own software to. In large-scale testing programs, various equating methods are available to ensure the.
Conducts linear and equipercentile equating under the common-item nonequivalent groups design. While equating methods research has flourished because of the need for technically sound designs and analyses, software development has been limited. Prior use of the equipercentile method of test equating was based on a graphic procedure which is tedious, subject to smoothing errors, and nonanalytical. An equipercentile version of the Levine linear observed. A new procedure for comparing results of linear and equipercentile equating methods is presented and illustrated. A comparison of linear, equipercentile, and FIPC equating methods across multidimensional test forms for nonequivalent groups. For the equipercentile equating property (EEP), the converted scores on form X have the same distribution as scores on form Y. All of them can be accomplished with our industry-leading software Xcalibre, though conversion equating requires an additional program called IRTEQ. The class consists of illustrated lectures, interspersed with self-tests for the participants.
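The equipercentile equating property stated above can be written compactly. As a sketch, assume scores are treated as continuous, and let F and G (symbols introduced here for illustration) be the strictly increasing cumulative distribution functions of scores on form X and form Y in the same population. The equating function and the property it satisfies are then:

```latex
e_Y(x) = G^{-1}\bigl(F(x)\bigr),
\qquad
\Pr\bigl(e_Y(X) \le y\bigr)
  = \Pr\bigl(X \le F^{-1}(G(y))\bigr)
  = F\bigl(F^{-1}(G(y))\bigr)
  = G(y).
```

With discrete scores the property holds only approximately, which is one reason percentile ranks are interpolated or the distributions are continuized in practice.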
In the descriptions that follow, forms are referred to as X and Y, where scores on X will be equated to the scale of Y. Center for Advanced Studies in Measurement and Assessment. This booklet grew out of a half-day class on equating that I teach for new statistical staff at Educational Testing Service (ETS). The results show that the 2PL and the TRT approaches produce comparable results that more closely agree with the results of the equipercentile method than the GRM does. A comparison of IRT observed-score kernel equating and. A query was sent seeking the district-wise merit list. Frequently asked questions: equating of scores on multiple forms.