Why were the polls so accurate?

The Quant. Interwebs have overflowed with joy since the election. Poll aggregation works. And so, indeed, does polling, though you won't hear as much about that on the news, which is likely biased towards celebrity intellects rather than the hardworking many. But why were the polls so accurate?

One potential explanation: because they do some things badly. For instance, most polls fail to collect "random samples" these days because of a fair bit of nonresponse bias. This nonresponse bias, if correlated with propensity to vote, may actually push up the accuracy of the vote-choice means. There are a few ways to check this theory.

One way to check this hypothesis: were results from polls using likely-voter screens different from those not using them? If not, why not? From the political science literature, we know that people who vote (not just those who say they vote) differ a bit from those who do not vote, even on things like vote choice. For instance, there is a larger proportion of 'independents' among non-voters.

Other kinds of evidence come in the form of failures to match population benchmarks. For instance, election polls would likely fare poorly at predicting how many people voted in each state, or at tallying up Spanish-language households or the number of registered voters. Another way of saying this is that the bias will vary by which parameter we aggregate from these polling data.

So let me reframe the question: how do polls get the election numbers right even when they undercount Spanish speakers? One explanation is a positive correlation between selection into polling and propensity to vote, which makes the vote-choice means much more reflective of what we will see come election day.
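To fix ideas, here is a minimal simulation sketch of that argument, with made-up numbers: vote choice differs by propensity to vote, and a poll whose response propensity is positively correlated with turnout propensity lands closer to the electorate's split than a same-sized random sample of all adults would.

```r
set.seed(31415)

n <- 1e6                                     # population of adults (assumed)
turnout_propensity <- rbeta(n, 2, 2)         # assumed propensity to vote
votes   <- rbinom(n, 1, turnout_propensity)  # who actually turns out
# assumed: support for candidate A rises with propensity to vote
support <- rbinom(n, 1, plogis(-0.5 + 1.5 * turnout_propensity))

electorate_mean <- mean(support[votes == 1])  # what election day will show

# Poll 1: response propensity proportional to turnout propensity
respond   <- rbinom(n, 1, 0.001 * turnout_propensity / mean(turnout_propensity))
poll_corr <- mean(support[respond == 1])

# Poll 2: a simple random sample of all adults, of the same size
srs      <- sample(n, sum(respond))
poll_srs <- mean(support[srs])

c(electorate = electorate_mean, correlated_selection = poll_corr, srs_of_adults = poll_srs)
```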

The other possible explanation for all this: post-stratification or other post-hoc adjustments to the numbers, or innovations in how sampling is done (matching, stratification, etc.). Doing so uses additional knowledge about the population, and can shrink standard errors and improve accuracy. One way to test for such non-randomness: overly tight confidence bounds. Many polls do wonderfully on multiple uncorrelated variables, for instance census region proportions, gender, etc., something random samples cannot regularly produce.
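One way to operationalize the "overly tight" check is to ask how often a genuine simple random sample would match several benchmarks as closely as published polls do. A small sketch, with assumed benchmark proportions and an assumed tolerance:

```r
set.seed(2012)

n     <- 1500                            # typical poll size (assumed)
bench <- c(south = .37, female = .51, hispanic = .16, age65plus = .13)  # assumed benchmarks
tol   <- 0.005                           # "within half a percentage point"

hits <- replicate(10000, {
  est <- rbinom(length(bench), n, bench) / n   # one simple random sample's estimates
  all(abs(est - bench) < tol)
})
mean(hits)   # how often a genuine SRS matches all four benchmarks this closely
```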

Raising money for causes

Four teenagers, at the cusp of adulthood and eminently well-to-do, were out on the pavement raising money for children struck with cancer. They had been out raising money for a couple of hours, and from a glance at their tin pot, I estimated that they had raised about $30, likely less. Assuming the donation rate stays below $30/hr, roughly what the four of them would jointly earn working minimum-wage jobs, I couldn't help but wonder whether their way of raising money for charity was rational; they could easily have raised more by donating their earnings from a minimum-wage job. Of course, these teenagers aren't alone. Think of the people out in the cold raising money for the poor on New York pavements. My sense is that many people do not think as often about raising money by working at a "regular job", even when it is more efficient (money/hour), and perhaps even more pleasant. It is not clear why.
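The back-of-the-envelope arithmetic, using the rough numbers above and an assumed minimum wage of $7.25/hr:

```r
teens     <- 4
hours     <- 2
collected <- 30      # rough read of the tin pot
min_wage  <- 7.25    # assumed US federal minimum wage at the time

collected / (teens * hours)     # dollars raised per person-hour: 3.75
teens * hours * min_wage        # what the same person-hours earn at minimum wage: 58
```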

The same argument applies to those who run in marathons, etc., to raise money. Preparing for and running in a marathon generally costs at least hundreds of dollars for the average 'Joe' (think of the sneakers, the personal trainers people hire, the time they 'donate' to training, which could have been spent working and donating that money to charity, etc.). Ostensibly, as I conclude in an earlier piece, they must have motives beyond charity. These latter, non-charitable considerations, at least at first glance, do not seem to apply to the teenagers, or to those raising money out in the cold in New York.

Random redistricting (a bit more efficiently)

In a forthcoming article, Chen and Rodden estimate the effect of 'unintentional gerrymandering' on the number of seats that go to a particular party. To do so, they pick a precinct at random and then add (randomly chosen) adjacent precincts to it till the district reaches a certain size (determined by the total number of districts one wants to create). Then they create a new district in the same manner, starting from a randomly selected precinct bordering the first district. This goes on till all the precincts are assigned to a district. There are some additional details, but they are immaterial to the point of this note. A smarter way to do the same thing would be to just create one district over and over again (starting each time from a randomly chosen precinct); a sketch is below. This would reduce the computational burden (memory for storing edges, differencing shape files, etc.) while leaving the estimates unchanged.
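A minimal sketch of the single-district variant, assuming a named list `adjacency` that maps each precinct id to the ids of its neighboring precincts (the toy adjacency below is made up for illustration):

```r
# Grow a single district from a random seed precinct by repeatedly annexing a
# randomly chosen adjacent precinct. `target_size` is the number of precincts per district.
grow_one_district <- function(adjacency, target_size) {
  district <- sample(names(adjacency), 1)            # random seed precinct
  while (length(district) < target_size) {
    frontier <- setdiff(unique(unlist(adjacency[district])), district)
    if (length(frontier) == 0) break                 # no unassigned neighbors left
    district <- c(district, sample(frontier, 1))     # annex a random adjacent precinct
  }
  district
}

# toy adjacency for a 2x3 grid of precincts, for illustration only
adjacency <- list(
  "1" = c("2", "4"), "2" = c("1", "3", "5"), "3" = c("2", "6"),
  "4" = c("1", "5"), "5" = c("2", "4", "6"), "6" = c("3", "5")
)
grow_one_district(adjacency, target_size = 3)
# repeat many times and tabulate the partisan lean of the simulated districts:
# districts <- replicate(5000, grow_one_district(adjacency, 3), simplify = FALSE)
```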

A potential source of bias in estimating impact of televised campaign ads

One popular strategy for estimating the impact of televised campaign ads exploits 'accidental spillover' (see Huber and Arceneaux 2007). The identification strategy builds on the following facts: ads on local television can only be targeted at the DMA level; DMAs sometimes span multiple states; and where DMAs span battleground and non-battleground states, ads targeted at residents of battleground states are also seen by those in non-battleground states. In short, people in non-battleground states are 'inadvertently' exposed to the 'treatment'. The behavior, attitudes, etc. of the residents who were inadvertently exposed are then compared to those of other (unexposed) residents of those states. The benefit of this identification strategy is that it decouples television ads from the ground campaign and other campaign activities, such as presidential visits (though people in the spillover region are exposed to television coverage of the visits). It also decouples ad exposure from strategic targeting of people based on the characteristics of the battleground DMA. There is evidence that the content, style, volume, etc. of television ads are 'context aware', varying depending on the DMA they run in. (After accounting for the cost of running ads in a DMA, some of the variation in volume, content, etc. across DMAs within states can be explained by the partisan profile of the DMA, etc.)

By decoupling strategic targeting from message volume and content, we only get an estimate of the 'treatment' targeted dumbly. If one wants an estimate of 'strategic treatment', such quasi-experimental designs relying on accidental spillover may be inappropriate. How, then, to estimate the impact of strategically targeted televised campaign ads? First, estimate how ads are targeted depending on the characteristics of the area and of the people (political interest moderates the impact of political ads; see, e.g., Ansolabehere and Iyengar 1995). Next, estimate the effect of the messages using the Huber/Arceneaux strategy. Then re-weight the effect using the estimates of how the ads are targeted (a sketch follows).
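A rough sketch of that three-step re-weighting, with hypothetical DMA-level data and variable names (none of these numbers or variables come from the paper):

```r
set.seed(1)

# (0) hypothetical DMA-level data, for illustration only
dmas <- data.frame(
  partisan_lean = runif(50, -0.2, 0.2),        # two-party lean of the DMA, assumed
  cost          = runif(50, 50, 500),          # assumed cost of airing ads in the DMA
  pop           = round(runif(50, 1e5, 3e6))   # adult population, assumed
)
dmas$ad_volume <- with(dmas,
  pmax(0, 1200 - 3000 * abs(partisan_lean) - cost + rnorm(50, 0, 100)))

# (1) model how ad volume is strategically targeted on DMA characteristics
targeting_fit    <- lm(ad_volume ~ abs(partisan_lean) + cost, data = dmas)
predicted_volume <- predict(targeting_fit)

# (2) per-ad effect from the spillover-based (Huber/Arceneaux-style) design; assumed
per_ad_effect <- 0.002

# (3) re-weight: implied effect of strategically targeted advertising, averaged over DMAs
strategic_effect <- per_ad_effect * predicted_volume
weighted.mean(strategic_effect, w = dmas$pop)
```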

One can also try to estimate the effect of 'strategy' by comparing adjusted treatment effect estimates (adjusted by regressing out other campaign activity) in DMAs where the treatment was strategically targeted against estimates in DMAs where it wasn't.

R-Tip: Missing weights, weighted.mean

There are instances where sampling weights are not only unknown but, in fact, cannot be known (unless one makes certain unsavory assumptions). Under those circumstances, the weights for certain respondents are 'missing'. Typically, the strategy is to code those weights as 0. However, if you retain them as NA, weighted.mean etc. is wont to give you NA as an answer, even if you set na.rm=T (na.rm drops missing data values, not missing weights).

To get non-NA answers, set the NAs to zero or estimate the mean over respondents with non-missing weights; for weighted.mean the two are equivalent, as the snippet below shows.
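A quick demonstration of the behavior and of both fixes:

```r
x <- c(2, 4, 6, 8)
w <- c(1, 2, NA, 1)                  # a respondent with an unknowable weight

weighted.mean(x, w)                  # NA
weighted.mean(x, w, na.rm = TRUE)    # still NA: na.rm only drops NAs in x, not in w

# Fix 1: treat the missing weight as zero
weighted.mean(x, ifelse(is.na(w), 0, w))

# Fix 2: drop respondents with missing weights
ok <- !is.na(w)
weighted.mean(x[ok], w[ok])          # same answer as Fix 1
```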

Sample this

What do single-shot evaluations of MT samples (replace MT with anything else) vis-a-vis census figures tell us? I am afraid very little. Inference rests upon knowledge of the data (here, respondent) generating process. Without a model of the data generating process, all such research reverts to a modest tautology: sample A was closer to census figures than sample B on parameters X, Y, and Z. This kind of comparison has limited utility, as a companion for substantive research. However, it is not particularly useful if we want to understand the characteristics of the data generating process. For even if the respondent generation process is random, any one draw (here, sample) can be far from the true population parameter(s).
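A quick illustration of that last point, with an assumed population proportion and sample size:

```r
set.seed(7)

p_true <- 0.40                           # assumed population proportion
n      <- 500                            # assumed sample size
draws  <- rbinom(10000, n, p_true) / n   # 10,000 hypothetical one-shot samples

range(draws)                             # individual samples can miss by a lot
mean(abs(draws - p_true) > 0.04)         # share of samples off by more than 4 points
```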

Even with lots of samples (respondents), we may not be able to say much if the data generation process is variable. And there is little reason to expect it to be constant: it is hard to see why the MT respondent generation process for political surveys would be constant, since it likely depends on the pool of respondents (which in turn perhaps depends on the economy), the incentives offered, the changing lure of those incentives, the content of the survey, etc. Where the process is not constant, we cannot generalize. Of course, one way to correct for all of that is to model the variation in the data generating process, but that will require longer observational spans and more attention to potential sources of variation.

Moving further away from the out-party

Two things are often said about American politics (most vociferously by Mo Fiorina): that political elites are increasingly polarized, and that the issue positions of the masses haven't budged much. Assuming both to be the case, one would expect the average distance between where partisans place themselves and where they place the 'in-party' (or the 'out-party') to increase. However, it appears that the distance to the in-party has remained roughly constant (and very small), while the distance to the out-party has grown, in line with what one would expect from the theory of 'affective polarization' and group-based perception.

‘Representativeness heuristic’, base rates, and Bayes

From the Introduction of their edited volume:
Tversky and Kahneman used the following experiment to test the 'representativeness heuristic':

Subjects are shown brief personality descriptions of several individuals, sampled at random from a group of 100 professionals: engineers and lawyers.
Subjects are asked to assess whether each description is of an engineer or a lawyer.
In one condition, subjects are told the group has 70 engineers and 30 lawyers; in the other, the reverse: 70 lawyers and 30 engineers.

Results –
Both conditions produced the same mean probability judgments.

Discussion –
Tversky and Kahneman call this result a ‘sharp violation’ of Bayes Rule.

Counter Point –
I am not sure the experiment shows any such thing. The mathematical formulation of the objection is simple and boring, so here's an example. Imagine there are red and black balls in an urn. Subjects are asked whether the ball in their hand is black or red under two alternate descriptions of the urn's composition. When people are completely sure of the color, the urn's composition obviously should have no effect. Just because there is only one black ball in the urn (out of, say, 100), it doesn't mean that a person will start thinking that the black ball in her hand is actually red. So on and so forth. Bayes Rule should be applied while accounting for uncertainty about the signal. People are typically more certain than they ought to be (there is lots of evidence of this, it seems, even in their edited volume), and that certainty automatically discounts the urn's composition. People may not be violating Bayes Rule; they may just be feeding the formula incorrect data.
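To make the urn intuition concrete, here is a small sketch with made-up numbers: when the description (or the ball in hand) is treated as nearly diagnostic, Bayes Rule itself says the base rate should barely move the posterior; only when the description is uninformative should the base rate dominate.

```r
# posterior probability of "engineer" given the description, by Bayes Rule
posterior <- function(prior_engineer, p_desc_given_engineer, p_desc_given_lawyer) {
  num <- prior_engineer * p_desc_given_engineer
  num / (num + (1 - prior_engineer) * p_desc_given_lawyer)
}

# If subjects treat the description as near-certain evidence of "engineer":
posterior(0.70, 0.99, 0.01)   # ~0.996
posterior(0.30, 0.99, 0.01)   # ~0.977 -- the base rate shifts the answer very little

# With a genuinely uninformative description, the base rate should dominate:
posterior(0.70, 0.5, 0.5)     # 0.70
posterior(0.30, 0.5, 0.5)     # 0.30
```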