A Guide to Objective Scrutiny of a Test: Individual Item Analysis & Overall Test Analysis in Practice
The present study examines important characteristics of the individual items of a test, i.e. Item Facility, Item Discrimination and Choice Distribution, as well as a significant feature of the test as a whole, i.e. Reliability. To this end, four valid vocabulary tests, each comprising 30 items (120 items in total), were administered to an adequate number of subjects majoring in English Translation (32 B.A. students at Shahid Chamran University). These subjects were as similar as possible to the population for whom the tests are intended (i.e. the target population).
The whole 120-item package is divided into four Vocabulary Levels Tests: the 2000 word level, the 3000 word level, Academic vocabulary and the 5000 word level. To keep track of (and, of course, analyze) the students' overall performance, the items are numbered consecutively in this order. The required characteristics of each item are calculated, and the reliability of each test is also computed.
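Because the 120 items are numbered consecutively across the four 30-item tests in the order given, an item's overall number is enough to recover which test it belongs to and its position within that test. The following is a minimal sketch of that bookkeeping; the names `LEVELS`, `level_of` and `local_number` are hypothetical.

```python
# Hypothetical helpers: the four 30-item tests are numbered consecutively (1-120)
# in the order 2000 word level, 3000 word level, Academic vocabulary, 5000 word level.
LEVELS = ["2000 word level", "3000 word level", "Academic vocabulary", "5000 word level"]
ITEMS_PER_TEST = 30


def level_of(item_number: int) -> str:
    """Return the sub-test that a consecutively numbered item (1-120) belongs to."""
    if not 1 <= item_number <= ITEMS_PER_TEST * len(LEVELS):
        raise ValueError(f"item number out of range: {item_number}")
    return LEVELS[(item_number - 1) // ITEMS_PER_TEST]


def local_number(item_number: int) -> int:
    """Return the item's position (1-30) within its own sub-test."""
    if not 1 <= item_number <= ITEMS_PER_TEST * len(LEVELS):
        raise ValueError(f"item number out of range: {item_number}")
    return (item_number - 1) % ITEMS_PER_TEST + 1
```

For example, item 61 is the first item of the Academic vocabulary test, and item 120 is the last item of the 5000 word level test.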
This try-out (pretesting) stage involves administering the test in order to collect information about the usefulness of the test itself and to improve the test and the testing procedures. Specifically, the goal of pretesting is twofold. The first purpose is to determine objectively the characteristics of the individual items (item analysis): item facility (IF), item discrimination (ID) and choice distribution (CD). The second purpose is to determine the reliability of each test as a whole.
Finally, having calculated and analyzed the pertinent statistics, one can either safely eliminate ill-constructed, malfunctioning and unsuitable items or revise them technically so as to develop an appropriate test that fulfills academic requirements.
* Choice distribution (CD) [response frequency distribution]
Choice distribution is a technique that helps a test developer see how each distractor performs in a given test administration. Put simply, choice distribution refers to the frequency with which the alternatives are selected by the examinees [the distribution of responses across the alternatives of a multiple-choice item].
Choice distribution should be determined in order to improve the test both quantitatively and qualitatively. Through choice distribution, the test developer can detect deficiencies in the choices and then discard or modify them. For example, if a distractor is not selected by any examinee, it is not functioning satisfactorily and should therefore be deleted.
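Tallying a choice distribution amounts to counting how often each option of an item was chosen. The sketch below assumes that each examinee's answer is recorded as an option label, with `None` for an omitted item. The default labels A to D are also an assumption, so pass whatever labels the item actually uses. The function names are hypothetical.

```python
from collections import Counter


def choice_distribution(responses, options="ABCD"):
    """Count how many examinees chose each option of one multiple-choice item.

    responses: the option label each examinee chose (None for an omitted item).
    Options that nobody chose still appear, with a count of 0.
    """
    counts = Counter(r for r in responses if r is not None)
    return {opt: counts[opt] for opt in options}


def non_functioning_distractors(distribution, key):
    """Return the distractors (wrong options) that no examinee selected."""
    return [opt for opt, n in distribution.items() if opt != key and n == 0]
```

A distractor returned by `non_functioning_distractors` is exactly the case described above: an option that attracted no examinees.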
[Choice distribution tables: the 2000 word level, the 3000 word level, the 5000 word level]
* Item facility (IF) [facility value, item easiness, item difficulty]
It refers to a measure of the ease of a test item. Item facility has to do with how easy or difficult an item is from the viewpoint of the group of students or examinees taking the test of which that item is a part. The reason for concern with IF is simple. A test item that is too easy (say, one that every student answers correctly) or too difficult (one that every student answers incorrectly) tells us nothing about differences in ability within the test population, so it should be deleted.
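A formula for producing a decimal value for IF:

$$\mathrm{IF} = \frac{\sum C}{N}$$

where ΣC is the number of correct responses to the item and N is the total number of examinees.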
Items with facility indexes above 0.63 are too easy, and items with facility indexes below 0.37 are too difficult; such items should therefore be deleted.
Item facility refers to the proportion of correct responses, while item difficulty refers to the proportion of wrong responses.
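The IF formula, item difficulty as its complement, and the 0.37 to 0.63 band can be combined into a short screening pass. This is a minimal sketch, assuming scored responses are stored as a 0/1 matrix (one row per examinee, one column per item). It also assumes the boundary values 0.37 and 0.63 themselves count as acceptable. The function names are hypothetical.

```python
def item_facility(item_scores):
    """IF = number of correct responses / number of examinees (scores are 1 or 0)."""
    return sum(item_scores) / len(item_scores)


def screen_by_facility(score_matrix, low=0.37, high=0.63):
    """Compute IF and item difficulty for every item and flag items outside [low, high].

    score_matrix: one row per examinee, one column per item (1 = correct, 0 = wrong).
    Returns (item_number, IF, difficulty, flag) tuples, with items numbered from 1.
    """
    results = []
    for j, column in enumerate(zip(*score_matrix), start=1):
        facility = item_facility(column)
        difficulty = 1 - facility  # proportion of wrong responses
        if facility > high:
            flag = "too easy"
        elif facility < low:
            flag = "too difficult"
        else:
            flag = "acceptable"
        results.append((j, facility, difficulty, flag))
    return results
```

With 32 examinees, for instance, an item answered correctly by 16 of them has IF = 16/32 = 0.50, the ideal value.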
[Table: Item Facility (IF). Items are numbered consecutively across all four tests (2000 word level, 3000 word level, Academic vocabulary, 5000 word level).]
* Item discrimination (item differentiation)
It refers to how well a test item discriminates between weak (less knowledgeable) and strong (more knowledgeable) examinees in the ability being tested. Item facility and item discrimination are related: an item whose facility index is too high or too low is unlikely to have much discriminating power.
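The link between facility and discrimination can be made concrete. Assume ID is computed from equal upper and lower halves, as in the procedure that follows (ID = (CH − CL) / ½N). The high group can then contribute at most the smaller of two numbers: the item's total correct answers, or the group size ½N. This caps ID at 2 × min(IF, 1 − IF). The sketch below computes that ceiling; the function name is hypothetical.

```python
def max_discrimination(facility):
    """Upper bound on ID = (CH - CL) / (N/2) for an item with the given facility index.

    Assumes examinees are split into equal upper and lower halves. The bound is
    reached when as many of the correct answers as possible come from the upper group.
    """
    if not 0.0 <= facility <= 1.0:
        raise ValueError("facility must be a proportion between 0 and 1")
    return 2 * min(facility, 1 - facility)
```

For instance, an item with IF = 0.50 can in principle reach ID = 1. An item with IF = 0.90 can reach at most about 0.20, however well it is written.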
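A suitable procedure for calculating ID is to rank the test takers' total scores from highest to lowest. The examinees are then divided into two equal groups: the upper half (high group, H) and the lower half (low group, L). Finally, apply this formula:

$$\mathrm{ID} = \frac{\mathrm{CH} - \mathrm{CL}}{\frac{1}{2}N}$$

CH: number of correct responses to a particular item by the examinees in the high group
CL: number of correct responses to the same item by the examinees in the low group
N: total number of examinees (so ½N is the number of examinees in each group)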
In contrast to item facility, where the ideal index is 0.50, the ideal index for item discrimination is unity (1). In practice, however, items with a discrimination index above 0.40 can be considered acceptable. An item discriminates in a positive direction (positive discrimination) if more test takers in the upper group than in the lower group answer it correctly.
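The ranking, splitting and ID formula can be sketched as follows, assuming the same 0/1 score matrix as for item facility. Two details are not specified in the text and are assumptions here. Tied examinees keep their original order, because Python's `sorted` is stable even with `reverse=True`. With an odd number of examinees, the middle one is left out of both groups. The function names are hypothetical.

```python
def item_discrimination(score_matrix):
    """Return ID = (CH - CL) / (N/2) for every item, using upper and lower halves.

    score_matrix: one row per examinee, one column per item (1 = correct, 0 = wrong).
    """
    # Rank examinees by total score, highest first; ties keep their original order.
    ranked = sorted(score_matrix, key=sum, reverse=True)
    half = len(ranked) // 2
    if half == 0:
        raise ValueError("need at least two examinees")
    # With an odd number of examinees, the middle one joins neither group.
    high, low = ranked[:half], ranked[len(ranked) - half:]
    indices = []
    for j in range(len(score_matrix[0])):
        ch = sum(row[j] for row in high)  # correct answers in the high group
        cl = sum(row[j] for row in low)   # correct answers in the low group
        indices.append((ch - cl) / half)
    return indices


def acceptable_discrimination(id_value, threshold=0.40):
    """True if the item's ID is above the 0.40 rule of thumb."""
    return id_value > threshold
```

With 32 examinees, each group has 16 members. An item answered correctly by 14 examinees in the high group and 6 in the low group therefore has ID = (14 − 6) / 16 = 0.50, which is acceptable.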
[Table: Item Discrimination (ID)]
* Reliability
Reliability is a quality of test scores that refers to the consistency of measures across different times, test forms, raters and other characteristics of the measurement context. Synonyms for reliability include dependability, stability, consistency, predictability and accuracy. To put it another way, the tendency toward consistency from one set of measurements to the next is called reliability. In short, reliability is best defined as the consistency of the scores produced by a given test.
K: the number of items in the test
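The reliability formula itself is not reproduced on this page. Only the legend entry for the number of items survives, and the summary reports a mean, variance and reliability for each test. Kuder-Richardson formula 21 (KR-21) needs exactly those quantities, so the sketch below assumes KR-21. The study's actual formula may differ. The function name and the use of population variance are also assumptions.

```python
from statistics import mean, pvariance


def kr21(total_scores, k):
    """Kuder-Richardson formula 21 reliability estimate (assumed formula).

    total_scores: each examinee's total score on one test; k: number of items.
    r = k/(k-1) * (1 - M*(k - M) / (k * V)), where M is the mean and V the
    (population) variance of the total scores.
    """
    if k < 2:
        raise ValueError("need at least 2 items")
    m = mean(total_scores)
    v = pvariance(total_scores)
    if v == 0:
        raise ValueError("total scores have zero variance")
    return (k / (k - 1)) * (1 - m * (k - m) / (k * v))
```

Applied to the 32 examinees' total scores on one 30-item test, `kr21(scores, k=30)` would give that test's reliability estimate. KR-21 treats all items as equally difficult. When item difficulties vary, it gives a lower estimate than KR-20.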
* Summarized Data:
Number of subjects: 32
Summarized Statistical Calculations
Mean – Variance – Reliability
* The 2000 word level
Mean = 29.37
* The 3000 word level
Mean = 26.31
* Academic vocabulary
Mean = 27.12
* The 5000 word level
Mean = 20.06