The General Social Survey (GSS), run out of the National Opinion Research Center at the University of Chicago, and the American National Election Studies (ANES), which until recently ran out of the University of Michigan’s Institute for Social Research, are two preeminent surveys tracking over-time trends in the social and political attitudes, beliefs, and behavior of the US adult population.
Beyond their shared Midwestern roots, the GSS and ANES also share a sampling design: both use stratified random samples, with the selection of PSUs shaped by the necessities of in-person interviewing, and during the 1980s and 1990s they also shared a sampling frame. However, in spite of this relatively close coordination in sampling and a common mode of interview, responses to the few questions asked identically in the two surveys diverge systematically.
In 1996, 2000, 2004, and 2008, the GSS and ANES included the exact same questions on racial trait ratings. Limiting the sample to White respondents, the mean difference in trait ratings of Whites and Blacks was always greater in the ANES. For ratings of hard work and intelligence, the gap between the two surveys was almost always statistically significant.
Separately, the difference in the proportion of self-identified Republicans estimated by the ANES and the GSS is declining over time.
This unexplained directional variance poses a considerable threat to inference. The problem takes on additional gravity given that these surveys are the bedrock of important empirical research in social science.
Technology has made it easy to analyze data. However, we have paid inadequate attention to building automation into data analysis software that checks for potential problems with the data itself. For example, I was recently exploring how interviewer-rated political knowledge varied by respondent’s level of education within each year, over time, using the ANES cumulative file. It was only when I plotted the confidence bounds (not earlier) that I found that in 2004 the 7-category education variable (VCF0140a) had fewer than 7 levels, a highly unlikely scenario. To verify, I checked the unique levels for education in 2004, and indeed there were only 5:
 6 5 2 3 1
The variable from which the 7-category variable is ostensibly constructed (V043254) has 8 levels in 2004. Since the plot looks reasonable for 2004, the problem was more likely due to (unwarranted) collapsing of adjacent categories than to something more careless, such as switching the order of categories. Tallying raw counts revealed that categories 6 and 7, 0 and 1, and 4 and 5 had been collapsed.
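One way to make that tallying step routine is to pair each respondent's source value with the recoded value and list which source categories land in the same recoded category. The sketch below is only an illustration: the function name `collapsed_categories` is hypothetical, and the toy pairs are invented (in particular, the recoded values assigned to each source value are my assumption, not the actual ANES coding).

```python
from collections import defaultdict

def collapsed_categories(pairs):
    """Map each recoded value to the source values that feed into it,
    keeping only recoded values fed by more than one source value."""
    sources = defaultdict(set)
    for source_value, recoded_value in pairs:
        sources[recoded_value].add(source_value)
    return {rec: sorted(src) for rec, src in sources.items() if len(src) > 1}

# Invented (source, recoded) pairs for illustration only; not ANES data.
toy_pairs = [(0, 1), (1, 1), (2, 2), (3, 3), (4, 5), (5, 5), (6, 6), (7, 6)]
print(collapsed_categories(toy_pairs))  # {1: [0, 1], 5: [4, 5], 6: [6, 7]}
```

An empty result would mean the recode is one-to-one, so any non-empty result is a prompt to go read the codebook.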
Back to the point about developing software that automatically flags potential problems: it would be nice if the software flagged cases where the number of levels of the same variable differs by year. However, this suggestion is piecemeal, and more careful thinking ought to be brought to bear on the design issues.
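As a rough sketch of what such a check could look like, the snippet below counts distinct levels of one variable within each year and reports the years whose count differs from the most common count. The function name `flag_level_counts`, the input shape (a list of year and value pairs), and the toy data are all assumptions made for illustration. Using the modal count as the baseline is just one possible rule.

```python
from collections import Counter, defaultdict

def flag_level_counts(rows):
    """Given (year, value) pairs for one variable, return {year: n_levels}
    for years whose number of distinct levels differs from the most
    common count across years. Missing values (None) are skipped."""
    levels = defaultdict(set)
    for year, value in rows:
        if value is not None:
            levels[year].add(value)
    counts = {year: len(vals) for year, vals in levels.items()}
    if not counts:
        return {}
    modal_count = Counter(counts.values()).most_common(1)[0][0]
    return {year: n for year, n in counts.items() if n != modal_count}

# Invented data: two years with 7 levels, one year with only 5.
toy_rows = (
    [(2000, v) for v in range(1, 8)]
    + [(2002, v) for v in range(1, 8)]
    + [(2004, v) for v in (1, 2, 3, 5, 6)]
)
print(flag_level_counts(toy_rows))  # {2004: 5}
```

A flag here is not proof of an error, since response options do change legitimately across years, but it is a cheap signal that a year deserves a closer look.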
I have on occasion used the American National Election Studies (ANES) cumulative file to do over-time comparisons. Roughly half of those times, I have found patterns that don’t make much sense. Only a small fraction of those times have I chosen to investigate the data more closely as a likely explanation for the aberrant patterns. The following ‘finding’ is a result of one such effort.
The ANES cumulative file (1948-2004) carries a variety of indices. In creating some of the indices, it appears that pre-election measures have been combined with post-election measures in some years. If that wasn’t enough, at least one index combines a pre-election measure with a post-election measure in some years while using only post-election measures in others. Here’s an example:
The ‘External Efficacy Index’ (VCF0648) is built out of two items:
Item 1: Public officials don’t care much what people like me think.
Item 2: People like me don’t have any say about what the government does.
Item 2 is asked both pre- and post-election in some cycles. In 1996, the index is built out of:
960568 (pre), 961244 (post)
[You can identify post-election wave questions by the presence of the coding category ‘Inap, no Post IW’.] The post-election version of 960568 is 961245.
In 2000, by contrast, it is built out of:
001527 (post), 001528 (post)
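Once the wave of each component has been looked up in the codebook, checking an index for this kind of mixing is mechanical. The sketch below assumes the wave labels have been hand-coded into a dictionary. The function name `mixed_wave_years` and the data structure are hypothetical, while the variable numbers and waves are the ones listed above.

```python
def mixed_wave_years(components):
    """components maps year -> list of (variable_id, wave) pairs, where
    wave is 'pre' or 'post'. Returns the sorted years in which an index
    draws its components from more than one wave."""
    return sorted(
        year
        for year, items in components.items()
        if len({wave for _, wave in items}) > 1
    )

# Components of the External Efficacy Index (VCF0648) as described above.
efficacy_components = {
    1996: [("960568", "pre"), ("961244", "post")],
    2000: [("001527", "post"), ("001528", "post")],
}
print(mixed_wave_years(efficacy_components))  # [1996]
```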
I have alerted the ANES staff, and it is likely that the next iteration of the cumulative file will fix this particular issue.