Capuchin Monkeys and Fairness: I want at least as much as the other

1 Dec

In a much-heralded experiment, a Capuchin monkey rejects a reward (food) for doing a task after seeing another monkey rewarded with something more appetizing for doing the same task. This has been interpreted as evidence for our ‘instinct for fairness’. But there is more to the evidence. The fact that the monkey that gets the heftier reward doesn’t protest the more meager reward given to the other monkey goes uncommented upon, though it is highly informative. If the instinct were truly for fairness, any poorly justified deviation from equality should provoke a negative reaction. Yet monkeys who get the longer end of the stick – even when aware that others are getting the shorter end – don’t complain. Primates are peeved only when they are made aware that they are getting the short end of the stick, not so much when someone else gets it. My sense is that this is true for most humans as well – people care far more about holding the short end of the stick themselves than about others holding it. It is thus incorrect to attribute such behavior to an ‘instinct for fairness’. A better attribution may be to the following rule: I want at least as much as the others are getting.

Sampling on M-Turk

13 Oct

In many of the studies that use M-Turk, there appears to be little strategy to sampling. A study is posted (and reposted) on M-Turk until a particular number of respondents take it. If (1) the pool of respondents reflects true population proportions, (2) people arrive in no particular order, and (3) all kinds of people find the monetary incentive equally attractive, the method should work well. There is reasonable evidence to suggest that at least conditions 1 and 3 are violated. One costly but easy fix for the third is to increase payment rates. We can likely do better.

If we are agnostic about the variable on which we want precision, here’s one way to sample. Start with a list of strata and their proportions in the population of interest. If the population of interest is US adults, the proportions are easily known. Set up screening questions, and recruit. Raise the payment to fill cells that are running short. Take simple precautions. For one, to prevent gaming, do not change the recruitment prompt to let people know that you want X kinds of people.
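For concreteness, here is a minimal sketch of the bookkeeping behind such quota-based recruitment. The strata, target proportions, and target sample size are hypothetical placeholders; the point is only to illustrate screening respondents into unfilled cells and flagging cells that might warrant a higher payment.

```python
# A minimal, hypothetical sketch of quota-based recruitment bookkeeping.
# Strata and target proportions are placeholders, not recommendations.

target_n = 1000

# Stratum proportions for the population of interest (e.g., US adults).
# These numbers are illustrative only.
strata = {
    ("18-34", "female"): 0.15,
    ("18-34", "male"): 0.15,
    ("35-64", "female"): 0.26,
    ("35-64", "male"): 0.25,
    ("65+", "female"): 0.10,
    ("65+", "male"): 0.09,
}

filled = {cell: 0 for cell in strata}

def admit(age_group, gender):
    """Screen a respondent: admit only if their cell is still short of its quota."""
    cell = (age_group, gender)
    if cell not in strata:
        return False
    quota = round(strata[cell] * target_n)
    if filled[cell] >= quota:
        return False
    filled[cell] += 1
    return True

def short_cells(min_fill=0.8):
    """Cells below min_fill of their quota -- candidates for a higher payment rate."""
    out = []
    for cell, prop in strata.items():
        quota = round(prop * target_n)
        if quota and filled[cell] / quota < min_fill:
            out.append(cell)
    return out

# Example: screen one respondent, then check which cells need a price bump.
print(admit("18-34", "female"))
print(short_cells())
```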

On balance, let there be imbalance on observables

27 Sep

For whatever reason, some people are concerned with imbalance when analyzing data from randomized experiments. The concern may be more general, but its fixes devolve into reducing imbalance on observables. Such fixes may fix things or break things. More generally, it is important to keep in mind what one experiment can show. If randomization is done properly, and other assumptions hold, the most common estimator of the experimental effect – the difference in means – is unbiased. We also have a good idea of how often confidence intervals built around it will cover the true effect. For tightening those bounds, relying on sample size is the way to go. General rules apply: larger is better. But that general rule admits some refinements. When every unit is the same – for instance, neutrons* (in most circumstances) – and measurement error isn’t a concern, a sample of 1 will do just fine. The point also holds for a potentially easier-to-obtain case than every unit being the same: when the treatment effect is constant across units. When the treatment effect differs across units, randomization alone won’t recover those differences, though one can always try to quantify the heterogeneity. More generally, the greater the heterogeneity, the larger the sample required for a particular level of balance. Stratified random assignment (or blocking) will help. This isn’t to say that the raw difference estimator will be biased without it. It won’t. Just that its variance will be higher. A small simulation after the footnote illustrates the point.

* Based on discussion in Gerber and Green on why randomization is often not necessary in physics.
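To make the variance point concrete, here is a small simulation sketch (not from the original post) comparing complete random assignment with block-randomized assignment when mean outcomes differ sharply across strata. The data-generating process and all numbers are made up.

```python
# Illustrative simulation (not from the post): blocking leaves the
# difference-in-means estimator unbiased but lowers its variance when
# mean outcomes differ across strata. All numbers are made up.
import numpy as np

rng = np.random.default_rng(0)
n_per_block, tau = 50, 1.0                 # units per block, constant treatment effect
block_means = np.array([0.0, 5.0, 10.0])   # heterogeneity across blocks
blocks = np.repeat(np.arange(3), n_per_block)

def one_experiment(blocked):
    y0 = block_means[blocks] + rng.normal(0, 1, blocks.size)  # control potential outcomes
    y1 = y0 + tau                                             # treated potential outcomes
    z = np.zeros(blocks.size, dtype=int)
    if blocked:
        for b in range(3):                 # treat exactly half of each block
            idx = np.where(blocks == b)[0]
            z[rng.choice(idx, n_per_block // 2, replace=False)] = 1
    else:                                  # complete randomization, ignoring blocks
        z[rng.choice(blocks.size, blocks.size // 2, replace=False)] = 1
    return y1[z == 1].mean() - y0[z == 0].mean()

for blocked in (False, True):
    est = np.array([one_experiment(blocked) for _ in range(2000)])
    label = "blocked  " if blocked else "unblocked"
    print(label, "mean:", round(est.mean(), 3), "sd:", round(est.std(), 3))
```

Both designs recover the true effect of 1 on average; the blocked design just does so with a visibly tighter spread.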

Bad Weather: Getting weather data by zip and date

27 Jun

High-quality weather data are public. But they aren’t easy to make use of.

Below is some advice, and some scripts, for finding out the weather in a particular zip code on a particular day (or a set of dates).

Some brief ground clearing before we begin. Weather data come from weather stations, which can belong to any of five or more “networks,” each of which collects somewhat different data, sometimes labels the same data differently, and has different reporting protocols. The only geographic information that typically comes with a weather station is its latitude and longitude. By “weather,” we may mean temperature, rain, wind, snow, etc., and we may want data on these for every second, minute, hour, day, month, etc. It is good to keep in mind that not all weather stations report data for all units of time, and there can be a fair bit of missing data. Getting data at coarse time units like day or month typically involves deciding which statistic is the most useful: for instance, minimum and maximum for daily temperature, or totals for rainfall and snow. With that primer, let’s begin.
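As an aside on that last point, collapsing, say, hourly records to daily statistics is a single aggregation step once you have picked the statistics. A minimal pandas sketch, assuming a hypothetical hourly file with “timestamp”, “temp”, and “precip” columns:

```python
# Hypothetical example of collapsing hourly observations to daily statistics:
# min/max for temperature, totals for precipitation. File and column names are assumed.
import pandas as pd

hourly = pd.read_csv("hourly_weather.csv", parse_dates=["timestamp"])
daily = (
    hourly.set_index("timestamp")
          .resample("D")
          .agg({"temp": ["min", "max"], "precip": "sum"})
)
print(daily.head())
```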

We begin with what not to do. Do not use the NOAA web service. The API provides a straightforward way to get “weather” data for a particular zip code for a particular month. Except that the requests often return nothing, and it isn’t clear why. The documentation doesn’t say whether the search for the closest weather station is limited to some radius, though without such a limit one should get data for all zip codes and all dates. Nor does the API return the distance to the weather station from which it got the data, though one can recover that post hoc using the Google Geocoding API. However, given the possibility that the backend for the API will improve over time, here’s a script for getting the daily weather data, and hourly precipitation data.
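For orientation, here is a minimal sketch of one such request against NOAA’s Climate Data Online (CDO) v2 endpoint. The token, zip code, dates, and dataset are placeholders, the parameter names should be checked against the current CDO documentation, and this may or may not be the exact service the linked scripts hit.

```python
# Minimal sketch of a daily-summaries request to NOAA's CDO v2 web service.
# The token, zip code, and dates are placeholders; expect empty responses
# for some zip/date combinations, as discussed above.
import requests

BASE = "https://www.ncdc.noaa.gov/cdo-web/api/v2/data"
TOKEN = "YOUR_CDO_TOKEN"  # request one from NOAA; placeholder here

params = {
    "datasetid": "GHCND",          # daily summaries
    "locationid": "ZIP:20001",     # hypothetical zip code
    "startdate": "2012-06-01",
    "enddate": "2012-06-30",
    "limit": 1000,
}
resp = requests.get(BASE, params=params, headers={"token": TOKEN})
resp.raise_for_status()
results = resp.json().get("results", [])
print(len(results), "records returned")
```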

On to what can be done. The “web service” that you can use is the Farmer’s Almanac’s. Sleuthing using scripts that we discuss later reveals that the Almanac reports data from the NWS-USAF-NAVY stations (ftp link to the data file). And it appears to have data for most times, though no information is provided on the weather station from which it got the data or the distance from that station to the zip code.

If you intend to look for data from GHCND, COOP, or ASOS, there are two kinds of crosswalks you can create – one that goes from zip codes to weather stations, and one that goes from weather stations to zip codes. I assume that we don’t have access to shape files (for census zip codes), and that postal zip codes encompass a geographic region. To create a weather-station-to-zip-code crosswalk, a web service such as Geonames or the Google Geocoding API can be used. If the station’s lat./long. is within the zip code, the distance comes up as zero; otherwise the distance is calculated from the “centroid” of the zip code (see the geonames script that finds the 5 nearest zips for each weather station). To create a zip-code-to-weather-station crosswalk, get the centroid of each zip code using a web service such as Google’s (or use the centroids provided in free zip-code databases), and then find the “nearest” weather stations by calculating the distance to each station. For a given set of zip codes, you can get a list of the closest weather stations (you can choose to get the n closest stations, or all weather stations within an x-kilometer radius, and/or restrict to particular networks) using the following script. The output lists, for each zip code, weather stations arranged by proximity. The task of getting weather data from the closest station is simple thereon – get data (on a particular set of columns of your choice) from the closest weather station for which the data are available. You can do that for a particular zip code and date (or date range) combination using the following script.
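The core of either crosswalk is a great-circle distance calculation between zip-code centroids and station coordinates. Here is a minimal sketch (not the linked script), with made-up centroids and station records standing in for the real files:

```python
# Sketch of a zip-to-weather-station crosswalk: for each zip centroid,
# rank stations by great-circle (haversine) distance. Coordinates are made up.
from math import radians, sin, cos, asin, sqrt

def haversine_km(lat1, lon1, lat2, lon2):
    """Great-circle distance between two points, in kilometers."""
    lat1, lon1, lat2, lon2 = map(radians, (lat1, lon1, lat2, lon2))
    a = sin((lat2 - lat1) / 2) ** 2 + cos(lat1) * cos(lat2) * sin((lon2 - lon1) / 2) ** 2
    return 2 * 6371.0 * asin(sqrt(a))

# Hypothetical inputs: zip centroids and station records (id, lat, lon, network).
zip_centroids = {"20001": (38.912, -77.017)}
stations = [
    ("USW00013743", 38.848, -77.034, "GHCND"),
    ("USC00186350", 39.055, -76.878, "COOP"),
]

def closest_stations(zip_code, n=3, max_km=None, networks=None):
    """Stations nearest to a zip centroid, optionally capped by radius or network."""
    lat, lon = zip_centroids[zip_code]
    ranked = []
    for sid, slat, slon, net in stations:
        if networks and net not in networks:
            continue
        d = haversine_km(lat, lon, slat, slon)
        if max_km is None or d <= max_km:
            ranked.append((d, sid, net))
    return sorted(ranked)[:n]

print(closest_stations("20001", n=2))
```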

Not feeling as warm about Whites anymore

4 Mar

The difference between Whites’ thermometer ratings (0 = cold, 100=warm) of Whites and Blacks has gone down relatively steadily over the past 50 or so years.
[Figure: Whites’ thermometer ratings of Whites and Blacks over time (ANES)]

However, the decline is almost entirely explained by a drop in Whites’ ratings of Whites.
[Figure: Whites’ thermometer ratings of Whites over time (ANES)]

Aging Curves of Political Knowledge

2 Mar

[Figure: Political knowledge by age, by survey year (ANES)]

Each line represents data from a different year. Source: American National Election Studies.

Now, aging curves of political knowledge by cohort.

[Figure: Political knowledge by age, by 4-year birth cohort (ANES)]

Why were the polls so accurate?

16 Nov

The Quant. Interwebs have overflowed with joy since the election. Poll aggregation works. And so indeed does polling, though you won’t hear as much about that on the news, which is likely biased towards celebrity intellects rather than the hardworking many. But why were the polls so accurate?

One potential explanation: because they do some things badly. For instance, most polls fail at collecting “random samples” these days because of a fair bit of nonresponse. The resulting nonresponse bias – if correlated with the propensity to vote – may actually push up the accuracy of the vote-choice means. There are a few ways to check this theory.

One way to check this hypothesis: were results from polls using likely-voter screens different from those not using them? If not, why not? From the political science literature, we know that people who vote (not just those who say they vote) do differ a bit from those who do not vote, even on things like vote choice. For instance, there is a larger proportion of ‘independents’ among non-voters.

Other kinds of evidence will be in the form of failures to match the population, or other benchmarks. For instance, election polls would likely fare poorly in predicting how many people voted in each state, or in tallying up Spanish-language households or the number of registered voters. Another way of saying this is that the bias will vary by what parameter we aggregate from these polling data.

So let me reframe the question: how do polls get the election numbers right even when they undercount Spanish speakers? One explanation is a positive correlation between selection into polling and the propensity to vote, which makes the vote-choice means much more reflective of what we will see come election day.

The other possible explanation for all this: post-stratification or other post hoc adjustment to the numbers, or innovations in how sampling is done – matching, stratification, etc. Doing so uses additional knowledge about the population, and can shrink standard errors and improve accuracy. One way to test for such non-randomness: overly tight confidence bounds. Many polls tend to do wonderfully on multiple uncorrelated variables – for instance, census region proportions, gender, … etc. – something random samples cannot regularly produce.
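As a rough illustration of what post-stratification does, here is a toy sketch (not from the post): cell weights are population shares divided by sample shares, and the weighted mean replaces the raw mean. The stratifying variable and all numbers are made up.

```python
# Toy post-stratification: reweight respondents so that cell shares in the
# sample match known population shares. All numbers are made up.

# Population shares for a single stratifying variable (e.g., education).
population_share = {"no_college": 0.60, "college": 0.40}

# Hypothetical poll: (cell, supports_candidate_A) for each respondent.
sample = [("college", 1)] * 45 + [("college", 0)] * 25 + \
         [("no_college", 1)] * 12 + [("no_college", 0)] * 18

n = len(sample)
sample_share = {c: sum(1 for cell, _ in sample if cell == c) / n
                for c in population_share}

# Weight for each respondent: population share / sample share of their cell.
weights = [population_share[cell] / sample_share[cell] for cell, _ in sample]

raw_mean = sum(y for _, y in sample) / n
weighted_mean = sum(w * y for w, (_, y) in zip(weights, sample)) / sum(weights)

print(round(raw_mean, 3), round(weighted_mean, 3))
```

In this toy example the college-educated are overrepresented in the sample, so the raw mean overstates support relative to the reweighted mean.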