Cartoon by John Landers, Courtesy of Causeweb.org

One of the biggest problems faced by researchers and practitioners alike is the lack of helpful **standards** for making sense of all the **research data** — all the **statistical** results, I mean. The problem becomes more pressing if you are the person who must read those data, analyze them and, even worse, present or discuss them with members of your audience. There is no time to waste because you needed it yesterday, right? If this sounds like you, then this article sums it up for you briefly.
More specifically, this article will expose you to a range of useful **statistics** and provide you with meaningful “tools” that will help you easily explain their worth to others. The short classifying phrases that follow are limited to interpreting PPM coefficients, test-retest stability coefficients, adjusted R-square values, index effect sizes, reliability or internal consistency coefficients, difficulty indices, and item-discrimination indices or point biserial correlation coefficients. These apply to any **research** study.

**Interpreting Pearson Product Moment Correlation or PPM Coefficients**

Davis (1971, in Chappell, 1984) labeled Pearson coefficient (__zero order__) correlations between .01 and .09 as “negligible”, .10 to .29 as “low”, .30 to .49 as “moderate”, .50 to .69 as “substantial” and .70 or higher as “very strong.” These short classifying phrases will help facilitate consistency in the **interpretation** of the size (or strength) of all Pearsonian and adjusted R-square values obtained in any study. These **interpretations** apply to simple correlation coefficients (or PPM correlation coefficients) as well as Pearsonian test-retest stability coefficients.
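As an illustrative sketch (the function name is my own, and I apply the labels to the absolute value of the coefficient, since direction does not affect strength), Davis’s cut-offs can be encoded as:

```python
def davis_label(r):
    """Classify a Pearson (zero-order) correlation using Davis's (1971) cut-offs."""
    size = abs(r)  # strength is judged regardless of sign
    if size >= 0.70:
        return "very strong"
    if size >= 0.50:
        return "substantial"
    if size >= 0.30:
        return "moderate"
    if size >= 0.10:
        return "low"
    return "negligible"  # .01 to .09 per Davis; values below .01 are grouped here

print(davis_label(0.42))  # moderate
```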

**Interpreting the index Effect Size (ES)**

The index Effect Size (ES), defined as “the mean difference between the treated and control subjects divided by the standard deviation of the control group” (Smith, Glass & Miller, 1980), is often used to evaluate the magnitude of the experimental effect in standard deviation units. Schermer (1988) reviewed ES outcomes and devised a set of **standards** to facilitate consistency in the **interpretation** of these **outcomes**. Any researcher can adopt these **standards**. In quantitative terms, point estimates of less than .2 are “small” effects, those around .5 are “medium” in size, and those higher than .5 are “large.” Use these **benchmarks** to estimate the magnitude of the effect on the posttest or delayed posttest measure.
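A minimal sketch of the computation (the function names and the sample data are mine; the labels follow the cut-offs above, with the band between .2 and .5 treated as “medium” — an assumption on my part, since the paragraph leaves that range implicit):

```python
from statistics import mean, stdev

def effect_size(treated, control):
    """ES = (mean of treated - mean of control) / SD of the control group."""
    return (mean(treated) - mean(control)) / stdev(control)

def label_es(es):
    size = abs(es)
    if size < 0.2:
        return "small"
    if size <= 0.5:
        return "medium"
    return "large"

# Hypothetical posttest scores for a treated and a control group
treated = [78, 85, 90, 82, 88]
control = [70, 75, 80, 72, 78]
es = effect_size(treated, control)
print(round(es, 2), label_es(es))
```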

**Interpreting Reliability Correlations or Internal Consistency Coefficients of Exams or Teacher-Made Tests**

Fox (1969) labeled reliability correlations between 0 and .50 as “low”, .51 to .70 as “moderate”, .71 to .86 as “high”, and above .86 as “very high” for the purposes of **educational research**. Another researcher’s review of **evaluation devices** identified minimum reliability coefficients of .85 for making effective decisions about individuals and .65 for groups (Ridley, 1976); nothing below .50 would suffice for the latter (Jordan, 1953; Nunnally, 1967). Additionally, Diederich (1960) states that most teacher-made tests “…regarded as good, usable tests achieved reliabilities between .60 and .80.”

Lower test reliabilities may be acceptable for group **research projects** in education (Borg and Gall, 1989). Short-form tests can expect slight drops in reliability in spite of retaining the best test items (Borg and Gall, 1989). If the reduction in length represents a negligible decrease in reliability, you will gain substantial savings in the time spent writing an exam. These short classifying phrases will facilitate consistency in the **interpretation** of the size (or strength) of reliability coefficients for all instruments or exams used in any **research project**. Some well-known **statistics** that fall into this category include Cronbach’s Alpha, the Kuder-Richardson Formula 20 (KR-20) and the Kuder-Richardson Formula 21 (KR-21).
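As an illustrative sketch (the function names and sample data are mine; the KR-20 formula itself is the standard one, not given in this article), KR-20 for dichotomously scored items can be computed from a 0/1 score matrix and then labeled with Fox’s cut-offs:

```python
def kr20(scores):
    """Kuder-Richardson Formula 20 for dichotomous (0/1) item scores.

    `scores` is a list of rows, one per examinee, each a list of 0/1 item scores.
    KR-20 = (k / (k - 1)) * (1 - sum(p*q) / variance of total scores)
    """
    n = len(scores)      # number of examinees
    k = len(scores[0])   # number of items
    pq = 0.0
    for i in range(k):
        p = sum(row[i] for row in scores) / n  # proportion passing item i
        pq += p * (1 - p)
    totals = [sum(row) for row in scores]
    mean_t = sum(totals) / n
    var_t = sum((t - mean_t) ** 2 for t in totals) / n  # population variance
    return (k / (k - 1)) * (1 - pq / var_t)

def fox_label(r):
    """Fox's (1969) labels for reliability coefficients."""
    if r > 0.86:
        return "very high"
    if r > 0.70:
        return "high"
    if r > 0.50:
        return "moderate"
    return "low"

# Hypothetical results: 4 examinees, 4 items
scores = [[1, 1, 1, 0], [1, 1, 0, 0], [1, 0, 0, 0], [1, 1, 1, 1]]
r = kr20(scores)
print(round(r, 2), fox_label(r))
```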

**Interpreting the Quality or Power of each test item (or question) for the purposes of Item or Exam Analysis**

Selection of the final composite of test items (or questions) proceeds with the goal of obtaining a representative range of difficulties and the highest possible item-discrimination values, balanced against the highest possible coverage of the Table of Specifications to maintain content validity, thereby reducing researcher bias in the selection process (Richter, 1980).

Using the **guidelines** proposed by Kromhout (1987), **test items** passed by more than 80 percent of exam takers are extremely easy, and items passed by less than 20 percent are extremely difficult, for exam takers in a **field-trial “test” study**. In other words, removing items with **difficulty indices** above .80 and below .20 ensures that all exam takers receive a test with a moderate range of difficulty.
The **item-discrimination index**, which analyzes the power of each test item (question), is the **Point Biserial Correlation Coefficient** (**PBCC**). Researchers consider this coefficient “…to be the single best measure of the effectiveness of a test item” (Lewis, 1989). Lewis (1989) proposes the following range of numbers and **interpretations**: a test item (or question) with a **PBCC** of .30 and above is a very good discriminator of the top 24% from the bottom 24% scoring groups; a test item with a **PBCC** of .20 to .29 is reasonably good, but subject to improvement; test items with **PBCCs** of .09 to .19 are marginal, usually needing improvement; and those below .09 are poor, to be improved or discarded.
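As a minimal sketch (function names and data mine; the formula is the standard point-biserial, not spelled out in this article, and in practice the item is often removed from the total before correlating), the PBCC and Lewis’s labels could be computed as:

```python
from statistics import mean, pstdev

def point_biserial(item_scores, total_scores):
    """Point biserial correlation between a 0/1 item and total test scores.

    r_pb = (mean of passers - mean of failers) / SD of totals * sqrt(p * q)
    """
    passers = [t for s, t in zip(item_scores, total_scores) if s == 1]
    failers = [t for s, t in zip(item_scores, total_scores) if s == 0]
    p = len(passers) / len(item_scores)  # proportion passing the item
    q = 1 - p
    return (mean(passers) - mean(failers)) / pstdev(total_scores) * (p * q) ** 0.5

def lewis_label(r):
    """Lewis's (1989) interpretations of the PBCC."""
    if r >= 0.30:
        return "very good discriminator"
    if r >= 0.20:
        return "reasonably good"
    if r >= 0.09:
        return "marginal"
    return "poor"

# Hypothetical data: 4 examinees' scores on one item and on the whole test
r = point_biserial([1, 1, 0, 0], [9, 8, 5, 4])
print(round(r, 2), lewis_label(r))
```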

**Conclusions and Recommendations**

This concludes your reading on “Standards for Interpreting Statistics Made Easy.” Researchers and practitioners are encouraged to use these **standards** and **benchmarks** in their future efforts to analyze, interpret or explain, and present their **statistical data**. I hope that this read has made you a little less fearful of **statistics** and a little more confident in your newly acquired knowledge of the meaning and worth of these numerical performance **benchmarks**.

**Author Information:**

Ihor Cap, Ph.D. is an Education Research Specialist, Web Author and Marketing & Promotions Manager for EZREKLAMA.

**References**:

Complete references for each of the cited sources are available in the document below.

Cap, Ihor. (1995). __The usefulness and effectiveness of a self-instructional print module on multicultural behaviour change in apprentices in Manitoba.__ Unpublished doctoral dissertation, Florida State University, Tallahassee. Available from University Microfilms Inc., P.O. Box 1764, Ann Arbor, MI 48106-1764 USA. (377 pages)

This article first appeared August 24, 2009 on http://articlesandblogs.ezreklama.com.
