C-8: Evaluate the validity and reliability of measurement procedures

Target Terms: Validity, Reliability 


Validity

Definition: The extent to which we are measuring what we intend to measure. In other words, do our data points actually represent what we think or say they do?

Example in an everyday context: You're trying to lose weight, so you decide to pay more attention to what you eat. You look at labels to determine whether certain foods are "healthy" or not. It would be a mistake to treat a single measure (for example, calorie count per serving) as a valid measure of whether a food is "healthy." Caloric content alone is an invalid measure of whether a particular item is healthy. Instead, you look at nutritional recommendations for someone of your age, gender, and activity level, and then judge whether a food is healthy by looking at a range of data points: calories, ingredients, vitamin content, etc.

Example in a clinical context:
A behavior analyst wants to collect data on how long a behavior of interest lasts, so they collect duration data. This is a valid measure because the behavior analyst wants to determine the duration of the behavior and uses a measure that captures it. An invalid measure would be, for example, a frequency count, which would not indicate how long the behavior lasts.

Example in a supervision/consultation context:
A supervisee is having trouble relating respectfully to non-behavior-analytic members of a treatment team. The supervisor and supervisee sit down to determine how to validly take data on the desired behavior of respectful communication. They operationalize their definitions, set criteria for mastery, and devise a measurement system aimed at capturing the behaviors under discussion. 

Why it matters:
When data are relevant to the phenomenon of interest, we can begin to use other scientific processes to better understand and improve socially significant behaviors. To ensure the best outcomes, we must be certain that we are treating the behavior that we want to treat and not some other behavior!


Reliability

Definition: The extent to which a measurement procedure produces the same value repeatedly. In other words, can you rely on it?

Example in an everyday context: You get on the scale to see how much you weigh. The first time you step on the scale, it says 140 pounds. You immediately step on it again, and it says 140 pounds again. This is a reliable measure.

Example in a clinical context:
Two behavior analysts are conducting a functional analysis with a client who exhibits self-injurious behavior. Each condition lasts five minutes and is repeated over the course of four consecutive days. Both behavior analysts use the same measurement tool to collect data during the functional analysis, and their results are nearly identical across repeated measures. This measurement was reliable.

Example in a supervision/consultation context:
A teacher is conducting a manding session with a student, and two behavioral consultants are independently collecting data on the fidelity of the teaching procedures so that interobserver agreement (IOA) can be calculated. Both consultants record the same fidelity data, so IOA is high, demonstrating a reliable measure.
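One common way to put a number on agreement between two observers is point-by-point agreement: the number of items on which the observers agree, divided by the total number of items scored, multiplied by 100. The sketch below is only an illustration of that arithmetic. It assumes each consultant scored the same ten teaching steps as implemented correctly (True) or not (False); the function name and the sample records are hypothetical and do not come from the scenario above.

```python
def point_by_point_ioa(observer_a, observer_b):
    """Percentage of items on which two observers recorded the same value."""
    if len(observer_a) != len(observer_b):
        raise ValueError("Both observers must score the same number of items")
    if not observer_a:
        raise ValueError("At least one scored item is required")
    # Count the items where the two records match, then convert to a percentage.
    agreements = sum(a == b for a, b in zip(observer_a, observer_b))
    return 100.0 * agreements / len(observer_a)


# Hypothetical records: True = teaching step implemented as written.
consultant_1 = [True, True, False, True, True, True, False, True, True, True]
consultant_2 = [True, True, False, True, True, True, True, True, True, True]

print(point_by_point_ioa(consultant_1, consultant_2))  # 90.0
```

Here the consultants disagree on one step out of ten, so agreement is 90 percent. Identical records would give 100 percent.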

Why it matters:
High reliability means that changes in the data can be attributed to other variables, such as the intervention, rather than to problems within the measurement system itself. This is crucially important when evaluating the effectiveness of an intervention!

NOTE: You can have reliability without validity. For example, a small child might step on a scale multiple times and get a weight reading of 482 pounds each time. This is a reliable, but clearly not valid, measure.
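The scale example can be made concrete with two numbers: how much the repeated readings vary (a rough index of reliability) and how far they sit from the true value (the sense in which the reading is not valid). The sketch below is a minimal illustration under stated assumptions: the child's true weight of 40 pounds is hypothetical, as is the function name.

```python
from statistics import mean, pstdev


def summarize_readings(readings, true_value):
    """Return (spread, bias) for repeated readings of one quantity.

    spread: population standard deviation; 0 means every reading matched.
    bias: mean reading minus the true value; 0 means correct on average.
    """
    spread = pstdev(readings)
    bias = mean(readings) - true_value
    return spread, bias


# Hypothetical: the scale shows 482 pounds three times in a row, and we
# assume the child actually weighs 40 pounds.
spread, bias = summarize_readings([482, 482, 482], true_value=40)
print(spread, bias)  # no spread between readings, but a very large bias
```

A spread of zero says the scale is perfectly consistent. The large bias says it is consistently wrong, which is why consistency alone does not establish validity.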
