Article for translators: A Guide to Objective Scrutiny of a Test: Individual Item Analysis & Overall Test Analysis in Practice


By Mohammad Mohseni Far, M.A.,
Department of English,
Faculty of Letters & Humanities,
Shahid Chamran University, Ahvaz, Iran

Mmb_m2005 at yahoo com


Introduction

The present study examines important characteristics of the individual items of a test, namely item facility, item discrimination and choice distribution, as well as a key property of the test as a whole, namely reliability. To this end, four valid vocabulary tests of 30 items each (120 items in total) were administered to an adequate number of subjects (32 B.A. students majoring in English Translation at Shahid Chamran University), a group as similar as possible to those for whom the tests are actually intended (i.e. the target population).

The 120-item package is divided into four Vocabulary Levels Tests: the 2000 word level, the 3000 word level, the Academic vocabulary level and the 5000 word level. To make it easier to keep track of (and analyze) the students' overall performance, the items are numbered consecutively in that order, so items 1-30 belong to the 2000 word level, 31-60 to the 3000 word level, 61-90 to Academic vocabulary and 91-120 to the 5000 word level. The relevant characteristics of each item are calculated, and reliability is computed for each test separately.

This try-out (pretesting) stage involves administering the test in order to collect information about the usefulness of the test itself and to improve both the test and the testing procedures. The goal of pretesting is twofold. The first purpose is to determine objectively the characteristics of the individual items (item analysis), namely item facility (IF), item discrimination (ID) and choice distribution (CD). The second purpose is to determine the reliability of the test.

Finally, once the relevant statistics have been calculated and analyzed, ill-constructed, malfunctioning or unsuitable items can either be safely eliminated or revised so as to develop an appropriate test that meets academic requirements.

* Choice distribution (CD) [response frequency distribution]

Choice distribution is a technique that helps a test developer see how each distractor, and the set of distractors as a whole, performs in a given test administration. Put simply, choice distribution is the frequency with which each alternative is selected by the examinees [the distribution of responses across the alternatives of a multiple-choice item].

Choice distribution should be determined in order to improve the test both quantitatively and qualitatively. Through it, the test developer can detect deficient choices and then discard or modify them. For example, if a distractor is not selected by any examinee, it is not functioning satisfactorily and should therefore be deleted.
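A choice distribution is simply a frequency count of the options chosen for each item. The sketch below assumes that responses are recorded as option letters (A-D) and that the answer key is known; the data, the key and the function names `choice_distribution` and `unused_distractors` are hypothetical, not taken from this study.

```python
from collections import Counter

def choice_distribution(responses, item, options="ABCD"):
    """Frequency with which each alternative was chosen for one item (0-based index)."""
    counts = Counter(r[item] for r in responses)
    return {opt: counts.get(opt, 0) for opt in options}

def unused_distractors(distribution, correct):
    """Wrong alternatives that no examinee selected (candidates for revision or deletion)."""
    return [opt for opt, n in distribution.items() if opt != correct and n == 0]

# Hypothetical data: each string holds one examinee's answers to items 1-3.
responses = ["ABD", "ABD", "CBD", "ABA", "ABD", "DBD"]
key = "ABD"  # hypothetical answer key for items 1-3

for i, correct in enumerate(key):
    dist = choice_distribution(responses, i)
    print(f"Item {i + 1}: {dist}, unused distractors: {unused_distractors(dist, correct)}")
```

In this toy data, item 2 was answered correctly by everyone, so none of its three distractors is functioning.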

Analysis:

Non-highlighted rows

These are items answered correctly by all testees. In other words, their distractors have not functioned at all, so they need to be modified or eliminated.

Highlighted rows (boxes in white)

These rows show how often each choice was selected by the test takers. Although some of their distractors have worked well (i.e. wrong choices that were selected by some examinees), some malfunctioning distractors still need to be modified or deleted (i.e. those that no examinee selected).

[Choice Distribution tables for the 2000 word level, the 3000 word level, Academic vocabulary and the 5000 word level]

* Item facility (IF) [facility value, item easiness, item difficulty]

Item facility is a measure of the ease of a test item: how easy or difficult the item is for the particular group of examinees taking the test of which it is a part. The reason for concern with IF is simple. A test item that is too easy (say, one that every student answers correctly) or too difficult (one that every student answers incorrectly) tells us nothing about differences in ability within the test population, so it should be deleted.

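A formula for producing a decimal value for IF:

$$IF = \frac{\sum C}{N}$$

where ΣC is the number of examinees who answered the item correctly and N is the total number of examinees. For example, an item answered correctly by 16 of the 32 subjects has IF = 16/32 = 0.50.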

Items with facility indexes above 0.63 are considered too easy, and items with facility indexes below 0.37 too difficult; such items should be deleted.

Item facility refers to the proportion of correct responses, while item difficulty refers to the proportion of wrong responses.
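The IF formula, item difficulty (1 - IF) and the 0.37/0.63 cut-offs can be applied to a scored response matrix as follows. This is a minimal sketch assuming each answer has already been scored 1 (correct) or 0 (wrong) and treating the boundary values 0.37 and 0.63 themselves as acceptable; the matrix and the function names are hypothetical.

```python
def item_facility(scored, item):
    """IF = number of correct responses / number of examinees."""
    column = [row[item] for row in scored]
    return sum(column) / len(column)

def facility_verdict(if_value, low=0.37, high=0.63):
    """Apply the cut-offs from the text: below `low` too difficult, above `high` too easy."""
    if if_value < low:
        return "too difficult"
    if if_value > high:
        return "too easy"
    return "acceptable"

# Hypothetical scored answers: 8 examinees (rows) x 3 items (columns), 1 = correct.
scored = [
    [1, 1, 0],
    [1, 1, 0],
    [1, 0, 0],
    [1, 1, 1],
    [1, 0, 0],
    [1, 1, 0],
    [1, 0, 0],
    [1, 0, 0],
]

for item in range(3):
    f = item_facility(scored, item)
    difficulty = 1 - f  # proportion of wrong responses
    print(f"Item {item + 1}: IF = {f:.2f}, difficulty = {difficulty:.2f}, {facility_verdict(f)}")
```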

Item Facility (IF)

Item	Item Facility (IF)	Item	Item Facility (IF)	Item	Item Facility (IF)
1	1	41	1	81	0.93
2	1	42	0.81	82	0.9
3	1	43	1	83	1
4	1	44	1	84	0.81
5	1	45	1	85	1
6	1	46	0.06	86	1
7	1	47	0.06	87	1
8	1	48	0.93	88	0.31
9	1	49	0.65	89	0.75
10	1	50	0.75	90	1
11	1	51	1	91	1
12	1	52	1	92	0.90
13	1	53	1	93	0.03
14	1	54	1	94	0.84
15	1	55	0.50	95	0.03
16	1	56	0.06	96	0.03
17	1	57	0.18	97	1
18	0.90	58	1	98	0.62
19	1	59	1	99	0.03
20	1	60	1	100	0.93
21	1	61	0.93	101	1
22	0.81	62	1	102	0.71
23	1	63	1	103	1
24	0.81	64	1	104	0.12
25	1	65	1	105	1
26	1	66	1	106	0.06
27	1	67	0.87	107	0.25
28	0.84	68	0.09	108	0.81
29	1	69	1	109	0.06
30	1	70	1	110	0.28
31	1	71	1	111	0.37
32	1	72	0.40	112	1
33	1	73	1	113	1
34	1	74	1	114	1
35	1	75	1	115	0.09
36	1	76	1	116	0.50
37	1	77	0.09	117	1
38	1	78	1	118	1
39	1	79	1	119	1
40	0.40	80	1	120	0.40

* Items are numbered consecutively across the four tests (2000 / 3000 / Academic / 5000 word level).
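Because the items are numbered consecutively in blocks of 30, an item's number identifies its test, and the IF cut-offs can be applied directly to the values in the table above. The sketch below does this for the 5000 word level (items 91-120), treating 0.37 and 0.63 themselves as acceptable; the helper name `level_of` and the variable names are illustrative only.

```python
LEVELS = ["2000", "3000", "Academic", "5000"]

def level_of(item):
    """Map a consecutive item number (1-120) to its Vocabulary Levels Test."""
    return LEVELS[(item - 1) // 30]

# IF values for items 91-120, copied from the Item Facility table.
if_5000 = [1, 0.90, 0.03, 0.84, 0.03, 0.03, 1, 0.62, 0.03, 0.93,
           1, 0.71, 1, 0.12, 1, 0.06, 0.25, 0.81, 0.06, 0.28,
           0.37, 1, 1, 1, 0.09, 0.50, 1, 1, 1, 0.40]

# Keep only items whose IF lies inside the 0.37-0.63 band.
within_band = [item for item, f in enumerate(if_5000, start=91) if 0.37 <= f <= 0.63]
print(level_of(91), within_band)
```

Under this rule, only four of the thirty 5000-level items (98, 111, 116 and 120) fall inside the band.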

* Item discrimination (item differentiation)

Item discrimination refers to how well a test item distinguishes between weak (less knowledgeable) and strong (more knowledgeable) examinees in the ability being tested. Item facility and item discrimination are related: an item with a very high or very low facility index is unlikely to have any discriminating power. At the extreme, an item that every examinee answers correctly (IF = 1) is answered correctly by everyone in both the high and the low group, so its ID is necessarily 0.

A suitable procedure for calculating ID is to rank the test takers' total scores from highest to lowest, divide the examinees into two equal groups (the upper half, or high group H, and the lower half, or low group L), and then apply the following formula:

$$ID = \frac{C_H - C_L}{\frac{1}{2}N}$$

CH: the number of examinees in the high group who answered the item correctly
CL: the number of examinees in the low group who answered the item correctly
N: the total number of examinees (so ½N is the number of examinees in each group)

Whereas the ideal index for item facility is 0.50, the ideal index for item discrimination is unity (1). In practice, items with a discrimination value above 0.40 can be considered acceptable. An item discriminates in a positive direction (positive discrimination) if more test takers in the upper group than in the lower group answer it correctly.
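The ranking-and-splitting procedure can be sketched as below. It assumes an even number of examinees and answers scored 1 or 0, and it breaks ties in total score by the original row order, since the text does not say how ties are handled; the data and the function name are hypothetical.

```python
def item_discrimination(scored, item):
    """ID = (CH - CL) / (N / 2), using upper and lower halves ranked by total score."""
    n = len(scored)
    if n % 2:
        raise ValueError("this sketch assumes an even number of examinees")
    # Rank examinees from highest to lowest total score; sorted() is stable,
    # so tied examinees keep their original row order.
    ranked = sorted(scored, key=sum, reverse=True)
    high, low = ranked[: n // 2], ranked[n // 2 :]
    c_high = sum(row[item] for row in high)  # CH
    c_low = sum(row[item] for row in low)    # CL
    return (c_high - c_low) / (n / 2)

# Hypothetical scored answers: 6 examinees x 4 items, 1 = correct.
scored = [
    [1, 1, 1, 1],  # total 4
    [1, 0, 0, 1],  # total 2
    [1, 1, 1, 0],  # total 3
    [1, 0, 0, 0],  # total 1
    [1, 1, 0, 1],  # total 3
    [1, 0, 0, 0],  # total 1
]

for item in range(4):
    print(f"Item {item + 1}: ID = {item_discrimination(scored, item):.2f}")
```

Item 1, answered correctly by everyone, gets an ID of 0, while item 2 is answered correctly by the whole high group and nobody in the low group, so it reaches the ideal value of 1.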

Item Discrimination (ID)

Item	Item Discrimination (ID)	Item	Item Discrimination (ID)	Item	Item Discrimination (ID)
1	0	41	0	81	0.33
2	0	42	0.37	82	0.18
3	0	43	0	83	0
4	0	44	0	84	0.37
5	0	45	0	85	0
6	0	46	0.12	86	0
7	0	47	0.12	87	0
8	0	48	0.33	88	0.62
9	0	49	0.31	89	0.50
10	0	50	0.50	90	0
11	0	51	0	91	0
12	0	52	0	92	0.18
13	0	53	0	93	0.06
14	0	54	0	94	0.31
15	0	55	1	95	0.06
16	0	56	0.12	96	0.06
17	0	57	0.37	97	0
18	0.18	58	0	98	0.75
19	0	59	0	99	0.06
20	0	60	0	100	0.12
21	0	61	0.33	101	0
22	0.37	62	0	102	0.43
23	0	63	0	103	0
24	0.37	64	0	104	0.25
25	0	65	0	105	0
26	0	66	0	106	0.12
27	0	67	0.25	107	0.50
28	0.31	68	0.18	108	0.37
29	0	69	0	109	0.12
30	0	70	0	110	0.56
31	0	71	0	111	0.25
32	0	72	0.81	112	0
33	0	73	0	113	0
34	0	74	0	114	0
35	0	75	0	115	0.18
36	0	76	0	116	1
37	0	77	0.18	117	0
38	0	78	0	118	0
39	0	79	0	119	0
40	0.81	80	0	120	0.81

Reliability

Reliability is a quality of test scores that refers to the consistency of measurement across different times, test forms, raters and other characteristics of the measurement context. Terms used as synonyms for reliability include dependability, stability, consistency, predictability and accuracy. Put another way, reliability is the tendency toward consistency from one set of measurements to the next; in short, it is best defined as the consistency of the scores produced by a given test.

KR-21 method: this formula is based on the assumption that all items in a test are designed to measure a single trait. Because it relies on a purely statistical procedure, the method is sometimes called rational equivalence.

$$r_{KR21} = \frac{K}{K-1}\left(1 - \frac{\bar{X}(K - \bar{X})}{K V}\right)$$

K: the number of items in the test
X̄: the mean score
V: the variance of the scores
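Because KR-21 needs only the number of items, the mean and the variance, it takes just a few lines to compute. A minimal sketch follows; the function name `kr21` and the example figures are hypothetical.

```python
def kr21(k, mean, variance):
    """KR-21 reliability: K/(K-1) * (1 - mean*(K - mean) / (K * variance))."""
    if k < 2 or variance <= 0:
        raise ValueError("KR-21 needs at least two items and a positive variance")
    return (k / (k - 1)) * (1 - mean * (k - mean) / (k * variance))

# Hypothetical 30-item test with a mean of 20 and a variance of 25.
print(round(kr21(30, 20, 25), 3))  # prints 0.759
```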

* Summarized Data:

Number of subjects: 32
Four Vocabulary Levels Tests: 2000 / 3000 / Academic / 5000 word level
Maximum mark in each level test: 30
Total mark out of four tests (maximum): 120

Subject	2000	3000	Academic	5000	Total
1	30	26	26	22	104
2	30	27	25	19	101
3	30	23	25	19	97
4	28	26	29	22	105
5	28	25	27	21	101
6	30	25	27	13	95
7	30	26	27	18	101
8	30	30	29	24	113
9	28	28	27	20	103
10	30	22	22	18	92
11	30	28	27	24	109
12	30	24	24	13	91
13	30	28	30	25	113
14	29	28	28	20	105
15	28	23	27	17	95
16	29	22	27	16	94
17	29	25	29	19	102
18	30	27	29	23	109
19	29	28	26	17	100
20	30	29	29	28	116
21	27	26	27	22	102
22	30	26	29	19	104
23	30	30	29	24	113
24	30	28	28	15	101
25	30	27	24	14	95
26	28	24	30	18	100
27	30	23	22	17	92
28	30	28	27	20	105
29	29	23	27	15	94
30	28	29	28	26	111
31	30	29	27	30	116
32	30	29	30	24	113
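As a worked example, the sketch below applies KR-21 to each level test in the summarized data (K = 30 for each test). It assumes V is the population variance (sum of squared deviations divided by N); some textbooks divide by N - 1 instead, which gives slightly different values, so the output is illustrative rather than the study's own figures. The helper `kr21` is hypothetical and is restated so the block runs on its own.

```python
from statistics import mean, pvariance

# Scores of the 32 subjects on each 30-item test, copied from the table above.
scores = {
    "2000": [30, 30, 30, 28, 28, 30, 30, 30, 28, 30, 30, 30, 30, 29, 28, 29,
             29, 30, 29, 30, 27, 30, 30, 30, 30, 28, 30, 30, 29, 28, 30, 30],
    "3000": [26, 27, 23, 26, 25, 25, 26, 30, 28, 22, 28, 24, 28, 28, 23, 22,
             25, 27, 28, 29, 26, 26, 30, 28, 27, 24, 23, 28, 23, 29, 29, 29],
    "Academic": [26, 25, 25, 29, 27, 27, 27, 29, 27, 22, 27, 24, 30, 28, 27, 27,
                 29, 29, 26, 29, 27, 29, 29, 28, 24, 30, 22, 27, 27, 28, 27, 30],
    "5000": [22, 19, 19, 22, 21, 13, 18, 24, 20, 18, 24, 13, 25, 20, 17, 16,
             19, 23, 17, 28, 22, 19, 24, 15, 14, 18, 17, 20, 15, 26, 30, 24],
}
totals = [104, 101, 97, 105, 101, 95, 101, 113, 103, 92, 109, 91, 113, 105, 95, 94,
          102, 109, 100, 116, 102, 104, 113, 101, 95, 100, 92, 105, 94, 111, 116, 113]

def kr21(k, m, v):
    """KR-21 = K/(K-1) * (1 - M*(K - M) / (K * V))."""
    return (k / (k - 1)) * (1 - m * (k - m) / (k * v))

# Sanity check: the four level scores add up to each subject's total.
assert all(sum(row) == t for row, t in zip(zip(*scores.values()), totals))

K = 30
for level, xs in scores.items():
    m, v = mean(xs), pvariance(xs)
    print(f"{level}: mean = {m:.2f}, variance = {v:.2f}, KR-21 = {kr21(K, m, v):.2f}")
```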