Ideally, we would place the ideology of each bit of information consumed relative to the ideological location of the person consuming it. We would also like a time-stamped distribution of the bits consumed. We could then summarize various moments of that distribution (or of the distribution of ideological distances). And that would be that. (If we were worried about dimensionality, we would do it by topic.)

But lack of data means we must change the estimand. We must code each bit of information as merely congenial or uncongenial. This means taking directionality out of the equation. For a Republican at 6 on a 1 to 7 liberal-to-conservative scale, consuming a bit of information at 5 is the same as consuming a bit at 7.

The conventional estimand then is a set of two ratios: (Bits of politically congenial information consumed)/(All political information consumed) and (Bits of uncongenial information consumed)/(All political information consumed). Other reasonable formalizations exist, including the difference between the congenial and uncongenial bits consumed. (Note that the denominator is absent, and reasonably so.)
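A minimal sketch of these quantities in Python, assuming each political item a person consumed has already been hand-coded. The function name `exposure_measures`, the label strings, and the 'neutral' category are hypothetical and not part of the original discussion; 'neutral' is included only so that the two ratios need not sum to one.

```python
from collections import Counter

def exposure_measures(codes):
    """Summarize one person's political consumption.

    codes: iterable of labels, one per political item consumed. The labels
    'congenial', 'uncongenial', and 'neutral' are assumed for illustration.
    """
    counts = Counter(codes)
    total = sum(counts.values())  # all political information consumed
    if total == 0:
        raise ValueError("no political items consumed")
    return {
        # The two conventional ratios, sharing one denominator.
        "congenial_share": counts["congenial"] / total,
        "uncongenial_share": counts["uncongenial"] / total,
        # The difference formalization: raw counts, no denominator.
        "net_congenial_count": counts["congenial"] - counts["uncongenial"],
    }

example = exposure_measures(
    ["congenial"] * 6 + ["uncongenial"] * 3 + ["neutral"]
)
print(example)
```

The difference is reported in counts rather than shares to mirror the note that the denominator is absent from that formalization.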

To estimate these quantities, we must often make further assumptions. First, we must decide on the domain of political information. That domain is likely vast, and growing by the minute. We are all producers of political information now. (We always were, but today we can easily access the political opinions of thousands of lay people.) But see here for some thoughts on how to come up with the relevant domain of political information from passive browsing data.

Next, generally, people code ideology at the level of ‘source.’ The New York Times is ‘independent’ or ‘liberal’ and ‘Fox’ simply ‘conservative’ or perhaps more accurately ‘Republican leaning.’ (Continuous measures of ideology – as estimated by Groseclose and Milyo or Gentzkow and Shapiro – are also assigned at the source level.) This is fine except that it means coding all bits of information consumed from a source as the same. This is called ecological inference. And there are some attendant risks. We know that not all NYT articles are ‘liberal.’ In fact, we know that much of what the NYT publishes is not even political news. A toy example of how such measures can mislead:

Page Views: 10 Fox, 10 CNN. Source-level Est. (share congenial for a Republican): 10/20

But say Fox Pages are 7R, 3D and CNN Pages are 5R, 5D

Article-level Est.: (7 + 5)/20 = 12/20
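The toy example can be reproduced in a few lines of Python. This is only a sketch: it assumes the reader is a Republican (so R content is congenial) and that source-level coding labels every Fox page R and every CNN page D. The variable and function names are hypothetical.

```python
from fractions import Fraction

# Page views by source and by article-level slant, from the toy example.
page_views = {
    "Fox": {"R": 7, "D": 3},
    "CNN": {"R": 5, "D": 5},
}
# Assumed source-level labels: all Fox pages coded R, all CNN pages coded D.
source_label = {"Fox": "R", "CNN": "D"}

def total_views(views):
    """Count all page views across sources."""
    return sum(sum(by_slant.values()) for by_slant in views.values())

def source_level_share(views, labels, side):
    """Share congenial when every page inherits its source's label."""
    congenial = sum(
        sum(by_slant.values())
        for source, by_slant in views.items()
        if labels[source] == side
    )
    return Fraction(congenial, total_views(views))

def article_level_share(views, side):
    """Share congenial when each page is coded by its own slant."""
    congenial = sum(by_slant.get(side, 0) for by_slant in views.values())
    return Fraction(congenial, total_views(views))

source_est = source_level_share(page_views, source_label, "R")
article_est = article_level_share(page_views, "R")
print(source_est, article_est)
```

The two printed fractions are 10/20 and 12/20 in lowest terms, so the source-level measure understates congenial exposure in this example.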

If the measure of ideology is continuous, there are still some risks. If we code all page views as the mean ideology of the source, we assume that the person views a random sample of pages on the source. (Or some version of that.) But that is an implausible assumption. It is much more likely that a liberal reading the NYT stays away from David Brooks’ columns. If such within-source self-selection exists, selective exposure measures based on source-level coding are going to be biased downward; that is, they will find people to be less selective than they are.
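A small illustration of the point, under loud assumptions: article slants sit on a made-up scale from -1 (liberal) to +1 (conservative), and the chance that a reader opens an article decays exponentially with its distance from the reader's own position. Neither the scale, the selection rule, nor the `selectivity` parameter comes from the literature cited here; they are only a convenient way to mimic within-source self-selection.

```python
import math

def consumed_mean_slant(slants, reader, selectivity):
    """Mean slant of what the reader actually reads from one source.

    Each article is weighted by exp(-selectivity * |slant - reader|), an
    assumed selection rule; selectivity = 0 means a random sample of pages.
    """
    weights = [math.exp(-selectivity * abs(s - reader)) for s in slants]
    return sum(w * s for w, s in zip(weights, slants)) / sum(weights)

# Hypothetical slants of five articles on one source and a liberal reader.
article_slants = [-0.8, -0.4, 0.0, 0.4, 0.8]
reader_position = -0.6

# Source-level coding: every page view gets the source's mean slant.
source_mean = sum(article_slants) / len(article_slants)
# Article-level truth under the assumed self-selection rule.
selected_mean = consumed_mean_slant(article_slants, reader_position, 3.0)

distance_source_coding = abs(source_mean - reader_position)
distance_actual = abs(selected_mean - reader_position)
print(distance_source_coding, distance_actual)
```

Under these assumptions, what the reader consumes sits closer to the reader than the source mean suggests, so source-level coding makes the reader look less selective than they are.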

Discussion until now has focused on passive browsing data, eliding survey measures. There are two additional problems with survey measures. One is about the denominator. Measures based on limited choice experiments, like the ones used by Iyengar and Hahn (2009), are bad measures of real-life behavior. In real life, we have far more choices. And inferences from such experiments can at best recover ordinal rankings. The second big problem with survey measures is ‘expressive responding’: Republicans may report watching Fox News not because they do but because they want to convey that they do.