Perils of balancing scales

15 Nov

Randomizing scale order across respondents (balancing) is common practice. It is done to ‘cancel’ errors generated by ‘satisficers,’ who presumably pick the first option on a scale without regard to content. The practice assumes that reversing the scale affects neither satisficers’ propensity to pick the first option nor the responses of other respondents. Both assumptions are somewhat unlikely.

A far more reasonable hypothesis is that reversing scale order does affect respondents, both satisficers and non-satisficers. Empirically, people take longer to fill out reverse-ordered scales, and it is conceivable that they pay more attention while doing so, which would reduce satisficing and perhaps boost response quality. Either way, reversal would not simply ‘cancel’ errors among a subset of respondents, as hypothesized.

Among satisficers, in the absence of randomization, correlated bias may produce artificial correlations across variables where none exist. For example, if satisficers are disproportionately less educated and the candy question runs from ‘love candy’ to ‘hate candy,’ the data will show that the less educated love candy. Such a calamity ought to be avoided. However, in the minority of cases where satisficers’ true preferences happen to be those expressed by the first option, randomization will artificially produce null results. Randomization is more sub-optimal still if it does indeed affect the rest of the respondents.
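A minimal simulation of the two cases above, under purely illustrative assumptions: a hypothetical 5-point candy scale (1 = love candy, 5 = hate candy), half the sample less educated, 40% of the less educated and 10% of the more educated satisficing, and satisficers always picking the first displayed option. The function `simulate` and every parameter in it are hypothetical. It compares the true education gap in mean candy score with the observed gap, with and without balancing.

```python
import random
import statistics

SCALE = [1, 2, 3, 4, 5]  # hypothetical: 1 = 'love candy' ... 5 = 'hate candy'

def simulate(balanced, satisficers_love_candy, n=20_000, seed=0):
    """Return (true_gap, observed_gap): mean candy score, less minus more educated.

    Hypothetical setup: non-satisficers' true attitudes are uniform on the scale
    and unrelated to education. Satisficers' true attitudes are either the same
    (satisficers_love_candy=False) or always 1, the first option on the regular
    scale (satisficers_love_candy=True).
    """
    rng = random.Random(seed)
    true_low, true_high, obs_low, obs_high = [], [], [], []
    for _ in range(n):
        low_edu = rng.random() < 0.5
        # Assumption: the less educated satisfice more often.
        satisficer = rng.random() < (0.4 if low_edu else 0.1)
        if satisficer and satisficers_love_candy:
            true_answer = SCALE[0]  # true preference coincides with the first option
        else:
            true_answer = rng.choice(SCALE)
        # Balancing: a random half of respondents sees the reversed scale.
        reversed_order = balanced and rng.random() < 0.5
        if satisficer:
            # Satisficers pick whichever option is displayed first.
            observed = SCALE[-1] if reversed_order else SCALE[0]
        else:
            observed = true_answer
        (true_low if low_edu else true_high).append(true_answer)
        (obs_low if low_edu else obs_high).append(observed)
    true_gap = statistics.mean(true_low) - statistics.mean(true_high)
    obs_gap = statistics.mean(obs_low) - statistics.mean(obs_high)
    return true_gap, obs_gap

for love in (False, True):
    for balanced in (False, True):
        true_gap, obs_gap = simulate(balanced, love)
        print(f"satisficers love candy={love!s:5} balanced={balanced!s:5} "
              f"true gap={true_gap:+.2f} observed gap={obs_gap:+.2f}")
```

Working the expectations by hand under these assumptions: with a fixed order, the observed gap is about -0.6 in both worlds (0.4 × 1 + 0.6 × 3 = 2.2 versus 0.1 × 1 + 0.9 × 3 = 2.8). That gap is spurious when satisficers’ true attitudes are unrelated to education and real when they truly love candy. Balancing pulls the observed gap to about 0 in both worlds: correct in the first, an artificial null in the second.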

In survey experiments, where the balancing randomization is ‘orthogonal’ to (typically just separate from) the main randomization, one must further assume that the experimental manipulation has an equal impact on ‘satisficers’ whether they see the reversed or the regularly ordered scale. This, again, is a somewhat tenuous assumption.

The entire exercise of randomization is devoted not to finding out the true preferences of satisficers, a more honorable purpose, but to eliminating their influence from the data. There are better ways to catch ‘satisficers’ than randomizing across the entire sample. One possibility is to randomize only within a smaller set of likely satisficers. On knowledge questions, ability estimated over multiple questions can inform the probability that a chosen first option, if correct, was not a guess. Response latency can inform such judgments as well. For attitude questions, follow-up questions measuring attitude strength and the like can be used to weight responses.
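One way these ideas might be combined, sketched under loudly stated assumptions: a hypothetical `Respondent` record holds, for a set of screening items, whether the first option was picked, whether the answer was correct, and how long it took. Ability is a leave-one-out share correct. Bayes’ rule turns it into the probability that a correct first-option pick reflects knowledge rather than a guess, given an assumed probability `q_first` that someone who does not know picks the first option. Likely satisficers are flagged with hypothetical latency and first-option cutoffs, and only they get a randomized scale order. None of these names, thresholds, or rules come from the post; they are illustrative.

```python
import random
import statistics
from dataclasses import dataclass

@dataclass
class Respondent:  # hypothetical record of screening-item behavior
    rid: str
    picked_first: list[bool]  # chose the first listed option?
    correct: list[bool]       # answered correctly?
    latency_s: list[float]    # seconds taken to answer

def ability(r: Respondent, exclude: int) -> float:
    """Leave-one-out share correct, so an item does not inform its own check."""
    others = [c for i, c in enumerate(r.correct) if i != exclude]
    return sum(others) / len(others)

def p_knew(ability_est: float, q_first: float) -> float:
    """P(knew | picked the first option and it was correct), by Bayes' rule.

    Assumes knowers always answer correctly and the correct answer is listed
    first. q_first is the assumed chance a non-knower picks the first option:
    1/k for blind guessing among k options, near 1 for a satisficer.
    """
    return ability_est / (ability_est + (1 - ability_est) * q_first)

def is_likely_satisficer(r: Respondent, max_median_latency=2.0, min_first_share=0.8) -> bool:
    # Hypothetical rule: fast answers combined with a strong first-option habit.
    fast = statistics.median(r.latency_s) < max_median_latency
    first_share = sum(r.picked_first) / len(r.picked_first)
    return fast and first_share >= min_first_share

def assign_scale_orders(respondents, seed=0):
    """Randomize scale order only among likely satisficers; others see the regular order."""
    rng = random.Random(seed)
    return {
        r.rid: rng.choice(["regular", "reversed"]) if is_likely_satisficer(r) else "regular"
        for r in respondents
    }

# Made-up respondents for illustration.
a = Respondent("a", [True] * 5, [True, False, True, False, True], [1.1, 0.9, 1.3, 1.0, 1.2])
b = Respondent("b", [True, False, False, True, False], [True, True, True, False, True],
               [6.0, 8.5, 5.2, 7.7, 9.1])
print(assign_scale_orders([a, b]))
print(round(p_knew(ability(b, exclude=0), q_first=0.25), 2))
print(p_knew(ability(a, exclude=0), q_first=1.0))
```

With these made-up respondents, `a` (fast, always first) is flagged and `b` is not. A correct first-option answer from `b`, with `q_first = 0.25`, has a probability of about 0.92 of not being a guess. For `a`, treated as a satisficer with `q_first = 1`, the probability stays at the prior of 0.5: the pick carries no information.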

If we are interested in eliciting true attitudes from ‘satisficers,’ we may want to motivate respondents, either by interspersing exhortations that their responses matter or by providing financial incentives.

Lastly, it is important to note that combining two kinds of systematic error does not turn them into ‘random’ error. Nor can any variance in the data be regarded as a conservative attribute of the data, not with hardworking social scientists around.
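The arithmetic behind the first point, assuming (hypothetically) a 5-point scale and satisficers who always pick the first displayed option, with half seeing the reversed scale. A satisficer with true attitude t errs by 1 - t on the regular scale and 5 - t on the reversed one. The mean of those two errors is 3 - t, so it is zero only when the true attitude sits at the midpoint; otherwise it is a systematic pull toward the middle. Within either scale order the error is fully determined, not noise.

```python
import statistics

SCALE_MIN, SCALE_MAX = 1, 5  # hypothetical 5-point scale

def satisficer_errors(true_attitude: int) -> tuple[float, float]:
    """Mean and variance of a satisficer's error under 50/50 balancing.

    On the regular scale the satisficer reports SCALE_MIN, on the reversed
    scale SCALE_MAX; the error is the report minus the true attitude.
    """
    errors = [SCALE_MIN - true_attitude, SCALE_MAX - true_attitude]
    return float(statistics.mean(errors)), float(statistics.pvariance(errors))

for t in range(SCALE_MIN, SCALE_MAX + 1):
    mean_err, var_err = satisficer_errors(t)
    print(f"true attitude {t}: mean error {mean_err:+.1f}, error variance {var_err:.1f}")
```

The error variance is 4 for every true attitude, while the mean error runs from +2 to -2: balancing adds spread and shifts satisficers toward the midpoint, rather than leaving them unbiased.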