May 08, 2018 | 5 Minute Read

Why do we care about reliability?

Before providing a more complete introduction to reliability, and validity for that matter, it is important to understand why we care about reliability. According to the American Psychological Association (APA), there are two main reasons we should care about reliability. To paraphrase the first: if one is conducting research using some form of assessment, then for that research to be taken seriously, the underlying tools and their output must be viewed as reliable. The second is that for the assessment scores themselves to have credence, they must demonstrate a reasonable amount of reliability.

At TTI Success Insights we are interested in having both our research and our assessment scores seen as credible. Hence, according to APA guidelines, we should view strong reliability of our assessment scores as a critical component of our research program. Historically, TTI Success Insights has periodically released reliability studies. The reader interested in the results of these studies may find them at https://www.ttisuccessinsights.com/research/.


2 Important Considerations for any Assessment

There are two basic concepts that any assessment, psychometric or not, should address. These concepts are reliability and validity. We will briefly discuss validity here, as the two concepts go hand in hand, and will devote a series of blogs to a more in-depth exploration of validity in the future. For now, we present a high-level view of the two concepts and how they are related.

Consider a simple, home-use, oral thermometer. If one were to measure one's own body temperature several times over a period of a few hours, and one is not suffering from an underlying illness, one would expect to see very similar results for each reading. In other words, take a body temperature now and it reads 95.6 degrees F. Take the body temperature again in 30 minutes and it reads 96.0 degrees F. Continue this exercise for a few hours and note each reading. If the readings are all reasonably close to 96.0 degrees F, we would be able to conclude the thermometer is reliable. If, instead, the readings are all close to 98.6 degrees F, the expected healthy body temperature, we would be able to conclude that the thermometer is also valid. A thermometer that consistently reads 96.0 degrees F when the true temperature is 98.6 degrees F is reliable but not valid.

Building on the previous example, we say that the reliability of an assessment is a measure of how consistent the assessment is. Exactly what that means is discussed in the sections that follow. The validity of an assessment is a measure of how accurately the assessment measures what it claims to measure.


4 Basic Types of Reliability Measures

The APA suggests there are 4 basic types of reliability.  These are:

  1. Internal Consistency
  2. Temporal Consistency
  3. Alternate or Parallel Forms Reliability
  4. Generalizability

The authors would like to note that not all contemporary literature follows this list of types of reliability. In some sense, Generalizability is a field of study in and of itself. Further, many credible sources include inter-rater reliability on their lists. TTI Success Insights has chosen to follow the APA guidelines rather than chase the elusive historical and contemporary consensus on definitions of reliability and validity.

Internal Consistency

Internal consistency is a measure of how consistently a population responds to the individual items on an assessment. There are many approaches to computing this type of reliability. Each approach produces a coefficient that is then interpreted in an appropriate context. A very popular measure of internal consistency in the assessment market is the α (alpha) coefficient originally derived by Cronbach, a longtime professor of educational psychology at Stanford.

Some reliability studies also provide split-half reliability estimates. Contemporary researchers, including Cronbach himself, suggest using a more robust method such as the ω (omega) coefficient derived by McDonald, a longtime professor of quantitative methods in psychology at the University of Illinois. It is possible to compute some reliability coefficients manually, but generally a statistical software package is required.
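
To make the idea concrete, here is a minimal sketch of how the α coefficient could be computed from a matrix of item responses. The data are simulated and the function name is our own; this illustrates the standard formula, not TTI SI's production scoring code.

    import numpy as np

    def cronbach_alpha(items):
        """Cronbach's alpha for an (n_respondents x n_items) matrix of item scores."""
        items = np.asarray(items, dtype=float)
        n_items = items.shape[1]
        item_variances = items.var(axis=0, ddof=1)      # variance of each item
        total_variance = items.sum(axis=1).var(ddof=1)  # variance of the summed scores
        return (n_items / (n_items - 1)) * (1 - item_variances.sum() / total_variance)

    # Simulated example: 200 respondents answering a hypothetical 5-item scale
    rng = np.random.default_rng(0)
    trait = rng.normal(size=(200, 1))                          # latent trait
    responses = trait + rng.normal(scale=0.8, size=(200, 5))   # five noisy items
    print(f"alpha = {cronbach_alpha(responses):.2f}")

The ω coefficient, by contrast, is based on a factor model of the items and is typically obtained from a dedicated statistical package rather than computed by hand.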

Temporal Consistency

While internal consistency measures how consistently individuals respond to individual items on an assessment, temporal consistency is a measure of how consistent individuals’ scale-level scores are over time. As an example, consider the TTI Success Insights Style Insights behaviors assessment. This is a four-scale assessment measuring Dominance, Influence, Steadiness, and Compliance. Suppose an individual takes the Style Insights assessment and scores 53 on the Dominance scale. A few months later, the individual responds to the assessment again. If this individual scores in close proximity to 53 on the Dominance scale the second time, that individual is consistent in their responses over time.

However, we are interested in the assessment, not the individual. With that in mind, the same approach is taken with respect to a population, or a sample of a population. We measure the correlation between the scores on the first and second administrations. If the correlation is strong (what counts as strong depends on the context and the content domain), then we are confident that the assessment shows temporal consistency.
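
As an illustration, the sketch below computes a test-retest correlation on simulated scores. The scale name, sample size, and numbers are hypothetical and chosen only to show the mechanics; the same correlation approach also underlies the alternate forms analysis described later.

    import numpy as np
    from scipy.stats import pearsonr

    # Hypothetical Dominance-scale scores for the same 50 respondents
    # at two administrations a few months apart
    rng = np.random.default_rng(1)
    time1 = rng.normal(loc=50, scale=10, size=50)
    time2 = time1 + rng.normal(scale=5, size=50)  # second scores drift slightly

    r, p_value = pearsonr(time1, time2)
    print(f"test-retest correlation r = {r:.2f} (p = {p_value:.3g})")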

A couple of points are worth noting. First, a reasonable amount of time between administrations of the assessment is required. If an individual responds to an assessment multiple times in a short time span, simple memorization of earlier responses likely accounts for the similarity in the individual’s scores. Second, the longer the time period between administrations, the higher the likelihood of a major life event. An individual’s behavioral style is relatively stable over time. However, major life events may impact an individual’s style and, possibly, test-retest comparisons.

Alternate or Parallel Forms Reliability

Alternate forms reliability is an approach that requires not one, but at least two different versions of the same assessment that are roughly equally effective at measuring the content domain in question. The idea here is to have a group of respondents take both versions of the assessment within a relatively short period of time. The individuals’ scores on the two versions are then compared, typically via correlation or a similar approach, to determine the level of consistency between them. High levels of correlation indicate high levels of alternate forms reliability.

Generalizability

Generalizability* is an entire field of study in and of itself. The idea is to extend the possible sources of error in measurement beyond a single concept. For example, internal consistency only looks at, essentially, inter-item correlation. Temporal consistency only considers correlation across time. Generalizability considers, for example, consistency across both inter-item correlations and time, simultaneously. A more complete description of this process is beyond the scope of this article.

TTI Success Insights

TTI SI has traditionally focused on the areas of internal and temporal consistency. Alternate forms reliability requires at least two comparable assessments for analysis. The time and resource commitments to independently develop an alternate form of Style Insights, or any of the other TTI SI assessments, are prohibitive.

However, during the natural course of time, assessments periodically require an in-depth review and potential redevelopment. During such a process, one may develop an independent assessment measuring the same content domain and thereby naturally create a situation in which alternate forms reliability can be studied. For a historical view of TTI SI reliability studies, see TTI’s research page.

Conclusion

The future is bright at TTI SI. Previous reliability studies at TTI SI included internal consistency measures, in the form of the α coefficient, along with some of the usual descriptive statistics. TTI SI also periodically performs test-retest studies on temporal consistency. As noted earlier, contemporary literature on this topic calls for better measures of internal consistency than the α coefficient, or at least reporting the α coefficient with confidence intervals. TTI SI plans to report the α coefficient with confidence intervals, as well as the ω coefficient with confidence intervals.
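
One common way to attach a confidence interval to a reliability coefficient is a percentile bootstrap over respondents. The sketch below applies that idea to the α coefficient; it illustrates the general technique under simple assumptions, not TTI SI's specific procedure, and the function names are our own.

    import numpy as np

    def cronbach_alpha(items):
        """Cronbach's alpha for an (n_respondents x n_items) matrix of item scores."""
        items = np.asarray(items, dtype=float)
        k = items.shape[1]
        return (k / (k - 1)) * (1 - items.var(axis=0, ddof=1).sum()
                                / items.sum(axis=1).var(ddof=1))

    def bootstrap_alpha_ci(items, n_boot=2000, level=0.95, seed=0):
        """Percentile bootstrap confidence interval for Cronbach's alpha."""
        items = np.asarray(items, dtype=float)
        rng = np.random.default_rng(seed)
        n = items.shape[0]
        estimates = [
            cronbach_alpha(items[rng.integers(0, n, size=n)])  # resample respondents with replacement
            for _ in range(n_boot)
        ]
        lower = np.percentile(estimates, (1 - level) / 2 * 100)
        upper = np.percentile(estimates, (1 + level) / 2 * 100)
        return cronbach_alpha(items), (lower, upper)

The same resampling idea extends to the ω coefficient, provided a factor model is fit to each bootstrap sample.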

We also plan to incorporate Generalizability theory in both internal consistency (to measure errors in both the items and the respondents) and in temporal consistency (to measure errors in items, respondents, and across time).

Reliability and validity remain a primary focus of everything we do at TTI SI, since we understand the importance of this information and the differentiation it provides us in the marketplace.

_______________________________

Bibliography:

Cronbach, Lee J., Gleser, Goldine C., Nanda, Harinder, & Rajaratnam, Nageswari. (1972). The Dependability of Behavioral Measurements: Theory of Generalizability for Scores and Profiles. New York: John Wiley & Sons.


Dr. Eric Gehrig