Don’t Expose Yourself! Discretionary Exposure to Political Information

10 Oct

As the options have grown, so have the fears. Are the politically disinterested taking advantage of the nearly limitless options to opt out of news entirely? Are the politically interested siloing themselves into “echo chambers”? In an eponymous Oxford Research Encyclopedia article, I discuss what we think we know, and some concerns about how we can know. Some key points:

  • Is the gap between how much the politically interested and politically disinterested know about politics increasing, as Post-broadcast Democracy posits? Figure 1 suggests not.

  • Quantity rather than ratio: “If the dependent variable is partisan affect, how ‘selective’ one is may not matter as much as the net imbalance in consumption—the difference between the number of congenial and uncongenial bits consumed…”

  • To measure how much political information a person is consuming, you must be able to distinguish political information from its complement. But what isn’t political information? “In this chapter, our focus is on consumption of varieties of political information. The genus is political information. And the species of this genus differ in congeniality, among other things. But what is political information? All information that influences people’s political attitudes or behaviors? If so, then limiting ourselves to news is likely too constraining. Popular television shows like The Handmaid’s Tale, Narcos, and Law and Order have clear political themes. … Shows like Will and Grace and The Cosby Show may be less clearly political, but they also have a political subtext.” (see Figure 4) … “Even if we limit ourselves to news, the domain is still not clear. Is news about a bank robbery relevant political information? What about Hillary Clinton’s haircut? To the extent that each of these affect people’s attitudes, they are arguably pertinent.”

  • One of the challenges with inferring consumption based on domain level data is that domain level data are crude. Going to http://nytimes.com is not the same as reading political news. And measurement error may vary by the kind of person. For instance, say we label http://nytimes.com as political news. For the political junkie, the measurement error may be close to zero. For teetotalers, it may be close to 100% (see more).

  • Show people a few news headlines along with the news source (you can randomize the source). What can you learn from a few such ‘trials’? You cannot learn what proportion of news they get from a particular source. You can learn the preferences, but not reliably. More from the paper: “Given the problems with self-reports, survey instruments that rely on behavioral measures are plausibly better. … We coded congeniality trichotomously: congenial, neutral, or uncongenial. The correlations between trials are alarmingly low. The polychoric correlation between any two trials range between .06 to .20. And the correlation between choosing political news in any two trials is between -.01 and .05.”

  • Following up on the previous point: preference for a source which has a mean slant != preference for slanted news. “Current measures of [selective exposure] are beset with five broad problems. First is conceptual errors. For instance, people frequently equate preference for information from partisan sources with a preference for congenial information.”

Computing Optimal Cut-Offs

7 Oct

Probabilities from classification models can have two problems:

  1. Miscalibration: A p of .9 often doesn’t mean a 90% chance of 1 (assuming a dichotomous y). (You can calibrate it using isotonic regression.)

  2. Optimal cut-offs: For multi-class classifiers, we do not know what probability value will maximize the accuracy or F1 score. Or any metric for which you need to trade-off between FP and FN.
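For the calibration point, here is a minimal sketch of isotonic regression using the pool-adjacent-violators algorithm. It assumes a dichotomous y coded 0/1; the function name `isotonic_fit` and the toy data are mine, for illustration only.

```python
def isotonic_fit(scores, labels):
    """Pool-adjacent-violators: return (sorted scores, calibrated values).

    The calibrated values are non-decreasing in the score and are block
    averages of the 0/1 labels.
    """
    pairs = sorted(zip(scores, labels))
    blocks = []  # each block is [sum_of_labels, count]
    for _, y in pairs:
        blocks.append([float(y), 1])
        # Merge backwards while the previous block's mean exceeds the last one's.
        while (len(blocks) > 1
               and blocks[-2][0] / blocks[-2][1] > blocks[-1][0] / blocks[-1][1]):
            s, c = blocks.pop()
            blocks[-1][0] += s
            blocks[-1][1] += c
    fitted = []
    for s, c in blocks:
        fitted.extend([s / c] * c)
    return [x for x, _ in pairs], fitted


# Toy example (made-up numbers): raw scores and observed outcomes.
raw_scores = [0.1, 0.2, 0.3, 0.4]
outcomes = [0, 1, 0, 1]
sorted_scores, calibrated = isotonic_fit(raw_scores, outcomes)
print(list(zip(sorted_scores, calibrated)))
```

To score new cases, you would interpolate between the fitted points. In practice, fit the calibrator on held-out data rather than on the data used to train the classifier.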

One way to solve #2 is to run the true labels and probabilities through a brute-force optimizer that gives you the optimal cut-off for the metric. Here’s the script for doing the same along with an illustration.
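To make the idea concrete, here is a minimal sketch of a brute-force search (not the linked script). It assumes a binary y and tries every distinct predicted probability as a cut-off; the function names and the toy data are mine.

```python
def f1_score(y_true, y_pred):
    tp = sum(1 for t, p in zip(y_true, y_pred) if t == 1 and p == 1)
    fp = sum(1 for t, p in zip(y_true, y_pred) if t == 0 and p == 1)
    fn = sum(1 for t, p in zip(y_true, y_pred) if t == 1 and p == 0)
    if tp == 0:
        return 0.0
    return 2 * tp / (2 * tp + fp + fn)


def accuracy(y_true, y_pred):
    return sum(t == p for t, p in zip(y_true, y_pred)) / len(y_true)


def best_cutoff(y_true, probs, metric=f1_score):
    """Try each distinct probability as a cut-off; return (cutoff, score).

    A case is predicted 1 when its probability is >= the cut-off.
    Ties go to the lowest cut-off.
    """
    best = (None, -1.0)
    for c in sorted(set(probs)):
        y_pred = [1 if p >= c else 0 for p in probs]
        score = metric(y_true, y_pred)
        if score > best[1]:
            best = (c, score)
    return best


# Toy example (made-up numbers).
y_true = [0, 0, 1, 0, 1, 1]
probs = [0.1, 0.3, 0.35, 0.4, 0.6, 0.8]
print(best_cutoff(y_true, probs, metric=f1_score))
print(best_cutoff(y_true, probs, metric=accuracy))
```

As with calibration, pick the cut-off on a validation set. A cut-off tuned on the training data will look better than it is.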

Online Learning With Biased Sampling

3 Oct

Say that you train a model to predict who will click on an ad. Say that you deploy the model to only show ads to people who are likely to click on them. (For a discussion about the optimal strategy for who to show ads to, see here.) And say you use the clicks from the people who see the ad to continue to tune the parameters. (This is a close approximation of a standard implementation of online learning in online advertising.)

In effect, once you launch the model, you only get data from a biased set of users. Such a sampling bias can be a problem when the data generating process (how the 1s and the 0s are generated) changes such that how it changes above the threshold (among the kinds of people we get data from) is uncorrelated with how it changes below the threshold (among the people we do not get data from). The concerning aspect is that if this happens, the model continues to “work,” in that the accuracy can continue to be high even as recall (the proportion of people for whom the ad is relevant who are shown the ad) becomes lower over time. There is only one surefire way to diagnose the issue and address it: continue to collect some data from people below the threshold and learn if the data generating process is changing.
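One simple way to keep collecting data below the threshold is to show the ad to a small random fraction of those users. The sketch below assumes that setup; `explore_rate`, the function names, and the log format are hypothetical choices of mine.

```python
import random


def decide(score, threshold, explore_rate, rng):
    """Return (show_ad, is_exploration) for one user."""
    if score >= threshold:
        return True, False
    # Below the threshold: show the ad to a small random fraction anyway.
    if rng.random() < explore_rate:
        return True, True
    return False, False


def below_threshold_ctr(log):
    """log: list of (is_exploration, clicked) for users who saw the ad.

    Returns the click rate among exploration impressions, or None if
    there were none.
    """
    clicks = [clicked for is_exploration, clicked in log if is_exploration]
    return sum(clicks) / len(clicks) if clicks else None


rng = random.Random(0)
scores = [0.9, 0.2, 0.7, 0.1, 0.4]  # made-up model scores
decisions = [decide(s, threshold=0.5, explore_rate=0.05, rng=rng) for s in scores]
print(decisions)
```

Tracking `below_threshold_ctr` over time gives you a direct read on whether the people the model is skipping have started to click.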

Some Facts About Indian Polling Stations

27 Sep

Of the 748,584 polling stations for which we have self-reported data on building conditions, nearly 24% report having Internet. A similar number report having “Landline Telephone/Fax Connection.”

97.7% report having toilets for men and women.

2.6% report being in a “dilapidated or dangerous” building.

93.2% report having ramps for the disabled. 98.3% report having “proper road connectivity.” Nearly 4% report being located at a place where the “voters have to cross river/valley/ravine or natural obstacle to reach PS.”

92% of the polling stations are located in “Govt building/Premises.” And 11.4% are reportedly located in “an institution/religious place.”

8% report having a “political party office situated within 200 meters of PS premises.”

For underlying data and scripts, see here.

Growth Funnels

24 Sep

You spend a ton of time building a product to solve a particular problem. You launch. Then, either the kinds of people whose problem you are solving never arrive or they arrive and then leave. You want to know why because if you know the reason, you can work to address the underlying causes.

Often, however, businesses only have access to observational data. And since we can’t answer the why with observational data, people have swapped the why with the where. Answering the where can give us a window into the why (though only a window). And that can be useful.

The traditional way of posing ‘where’ is called a funnel. Conventionally, funnels start when the customer arrives on the website. We will forgo convention and start at the top.

There are only three things you should work to optimally do conditional on the product you have built:

  1. Help people discover your product
  2. Effectively convey the relevant value of the product to those who have discovered your product
  3. Help people effectively use the product

p.s. When the product changes, you have to do all three all over again.

One way funnels can potentially help is triage. How big is the ‘leak’ at each ‘stage’? The funnel built on the first two steps asks: of the people who discovered the product, how many did we successfully communicate the relevant value of the product to? Posing the problem in such a way makes it seem more powerful than it is. To come up with a number, you generally only have noisy behavioral signals. For instance, the proportion of people who visit the site who sign up. But a low proportion could reflect a large denominator: lots of people are visiting the site but the product is not relevant for most of them. (If bringing people to the site costs nothing, there is nothing to do.) Or it could be because you have a super kludgy sign-up process. You could drill down to try to get at such clues, but the number of potential locations for drilling remains large.

That brings us to the next point. Macro-triaging is useful, but only for resource-allocation decisions that rest on some assumptions. It doesn’t provide a way to get concrete answers to real issues. For concrete answers, you need funnels around concrete workflows. For instance, for a referral ‘product’ for Airbnb, one way to define steps (based on Gustaf Alstromer) is as follows:

  1. how many people saw the link asking people to refer
  2. of the people who saw the link, how many clicked on it
  3. of the people who clicked on it, how many invited people
  4. of the people who were invited, how many signed up (as a user, guest, host)
  5. of the people who signed up, how many made the first booking

Such a funnel allows you to optimize the workflow by experimenting with the user interface at each stage. It allows you to analyze what the users were trying to do but failed to do, or took a long time doing. It also allows you to analyze whether users are taking an action optimally (from the company’s perspective). For instance, people may want to invite all their friends but the UI doesn’t have a convenient way to import contacts from email.
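The referral funnel above can be sketched as a list of stages, each with its own numerator and denominator (the denominator changes at the invite step, from inviters to invitees). All counts below are made up and the function names are mine.

```python
def stage_rates(stages):
    """stages: ordered list of (name, numerator, denominator).

    Returns a list of (name, conversion_rate).
    """
    return [(name, num / den if den else 0.0) for name, num, den in stages]


def biggest_leak(stages):
    """Return the (name, rate) of the stage with the lowest conversion rate."""
    return min(stage_rates(stages), key=lambda pair: pair[1])


# Hypothetical counts for the referral workflow.
referral_funnel = [
    ("saw link -> clicked", 1200, 10000),
    ("clicked -> invited someone", 300, 1200),
    ("invited -> signed up", 180, 900),   # denominator: people who were invited
    ("signed up -> first booking", 45, 180),
]

for name, rate in stage_rates(referral_funnel):
    print(f"{name}: {rate:.1%}")
print("lowest conversion:", biggest_leak(referral_funnel)[0])
```

The lowest rate is only a pointer for where to look. As noted earlier, a low rate can come from a large denominator of people for whom the step is not relevant.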

Sometimes just writing out the steps in a workflow can be useful. It allows people to think if some steps are actually needed. For instance, is signing-in needed before we allow a user to understand the value of the product?

Conscious Uncoupling: Separating Compliance from Treatment

18 Sep

Let’s say that we want to measure the effect of a phone call encouraging people to register to vote on voting. Let’s define compliance as a person taking the call. And let’s assume that the compliance rate is low. The traditional way to estimate the effect of the phone call is via an RCT: randomly split the sample into Treatment and Control, call everyone in the Treatment Group, wait till after the election, and calculate the difference in the proportion who voted. Assuming that the treatment doesn’t affect non-compliers, etc., we can also estimate the Complier Average Treatment Effect.

But one way to think about non-compliance in the example above is as follows: “Buddy, you need to reach these people using another way.” That is a super useful thing to know, but it is an observational point. You can fit a predictive model for who picks up phone calls and who doesn’t. The experiment is useful in answering how much you can persuade the people you reach on the phone. And you can learn that by randomizing conditional on compliance.

For such cases, here’s what we can do:

  1. Call a reasonably large random sample of people. Learn a model for who complies.
  2. Use it to target people who are likelier to comply and randomize after a person picks up.
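The two steps above can be sketched as follows. To keep it short, the ‘model’ is just the pickup rate within a group (say, an age band) rather than a regression, and the control arm gets a placebo script; both are my assumptions, as are all the names.

```python
import random
from collections import defaultdict


def fit_pickup_rates(pilot):
    """pilot: list of (group, picked_up) from calls to a random sample."""
    calls, pickups = defaultdict(int), defaultdict(int)
    for group, picked_up in pilot:
        calls[group] += 1
        pickups[group] += int(picked_up)
    return {g: pickups[g] / calls[g] for g in calls}


def target(people, rates, min_rate):
    """Keep people whose group has an estimated pickup rate >= min_rate."""
    return [p for p in people if rates.get(p["group"], 0.0) >= min_rate]


def assign_on_pickup(rng):
    """Call this only after someone answers: randomize message vs. placebo."""
    return "message" if rng.random() < 0.5 else "placebo"


# Step 1: pilot calls to a random sample (made-up data).
pilot = [("65+", 1), ("65+", 1), ("65+", 0), ("18-29", 0), ("18-29", 0), ("18-29", 1)]
rates = fit_pickup_rates(pilot)

# Step 2: target likelier compliers and randomize once they pick up.
people = [{"id": 1, "group": "65+"}, {"id": 2, "group": "18-29"}]
rng = random.Random(0)
for person in target(people, rates, min_rate=0.5):
    print(person["id"], assign_on_pickup(rng))
```

Because randomization happens after pickup, the difference in turnout between the two arms estimates the effect among people who answer, and nobody who never picks up dilutes the estimate.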

More generally, Average Treatment Effect is useful for global rollouts of one policy. But when is that a good counterfactual to learn? Tautologically, when that is all you can do or when it is the optimal thing to do. If we are not in that world, why not learn about—and I am using the example to be concrete—a) what is a good way to reach me, b) what message do you want to show me. For instance, for political campaigns, the optimal strategy is to estimate the cost of reaching people by phone, mail, f2f, etc., estimate the probability of reaching each using each of the media, estimate the payoff for different messages for different kinds of people, and then target using the medium and the message that delivers the greatest benefit. (For a discussion about targeting, see here.)

But technically, a message could have the greatest payoff for the person who is least likely to comply. And the optimal strategy could still be to call everyone. To learn treatment effects among people who are unlikely to comply (using a particular method), you will need to build experiments to increase compliance. More generally, if you are thinking about multi-armed bandits or some such dynamic learning system, the insight is to have treatment arms around both compliance and message. The other general point, implicit in the essay, is that rather than be fixated on calculating ATE, we should be fixated on an optimization objective, e.g., the additional number of people persuaded to turn out to vote per dollar.
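As a rough sketch of that objective, assume that for a given kind of person you have estimates of the cost of each contact, the probability of reaching them, and the effect of the message if reached. Expected additional votes per dollar is then p(reach) × effect / cost. All the numbers and names below are hypothetical.

```python
def votes_per_dollar(option):
    """Expected additional votes per dollar for one (medium, message) option."""
    return option["p_reach"] * option["effect"] / option["cost"]


def best_option(options):
    """Pick the option with the highest expected payoff per dollar."""
    return max(options, key=votes_per_dollar)


# Made-up estimates for one kind of voter.
options = [
    {"medium": "phone", "message": "A", "cost": 0.5, "p_reach": 0.2, "effect": 0.03},
    {"medium": "mail", "message": "A", "cost": 0.8, "p_reach": 0.9, "effect": 0.005},
    {"medium": "f2f", "message": "B", "cost": 5.0, "p_reach": 0.4, "effect": 0.08},
]

choice = best_option(options)
print(choice["medium"], choice["message"], votes_per_dollar(choice))
```

Each of the three inputs has to be estimated, and the estimates will differ by kind of person, so in practice you would run this per segment.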

Prediction Errors: Using ML For Measurement

1 Sep

Say you want to measure how often people visit pornographic domains over some period. To measure that, you build a model to predict whether or not a domain hosts pornography. And let’s assume that for the chosen classification threshold, the False Positive rate (FP) is 10% and the False Negative rate (FN) is 7%. Below, we discuss some of the concerns with using scores from such a model and ways to address the issues.

Let’s get some notation out of the way. Let’s say that we have n users and that we can iterate over them using i. Let’s denote the total number of unique domains (domains visited by any of the n users at least once during the observation window) by k. And let’s use j to iterate over the domains. Let’s denote the number of visits to domain j by user i by c_{ij} \in \{0, 1, 2, ...\}. And let’s denote the total number of unique domains a person visits, \sum_j 1(c_{ij} \geq 1), using t_i. Lastly, let’s denote predicted labels about whether or not each domain hosts pornography by p, so we have p_1, ..., p_j, ..., p_k.

Let’s start with a simple point. Say there are 5 domains, all predicted to host pornography, so p: {1_1, 1_2, 1_3, 1_4, 1_5}. Let’s say user one visits the first three sites once and let’s say that user two visits all five sites once. Given 10% of the predictions are false positives, the expected measurement error in user one’s score = 3 * .10 = .3 and the expected measurement error in user two’s score = 5 * .10 = .5. The general point is that the total number of false positives increases with the number of predicted 1s. And the total number of false negatives increases with the number of predicted 0s.
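The arithmetic above can be sketched in a few lines. The sketch follows the simplification in the text: the 10% and 7% error rates are applied to the count of predicted 1s and predicted 0s among the domains a user visits. Domain names and the function name are made up.

```python
def expected_errors(visited, predicted, fp_rate=0.10, fn_rate=0.07):
    """Expected number of mislabeled domains among those a user visited.

    visited: set of domains the user visited at least once.
    predicted: dict mapping domain -> predicted label (1 = pornography).
    """
    n_pred_1 = sum(1 for d in visited if predicted[d] == 1)
    n_pred_0 = len(visited) - n_pred_1
    return {
        "expected_fp": n_pred_1 * fp_rate,
        "expected_fn": n_pred_0 * fn_rate,
    }


# Five domains, all predicted 1, as in the example above.
predicted = {f"d{j}": 1 for j in range(1, 6)}
user_one = {"d1", "d2", "d3"}
user_two = {"d1", "d2", "d3", "d4", "d5"}

print(expected_errors(user_one, predicted))
print(expected_errors(user_two, predicted))
```

Since the expected error grows with how many domains a person visits, the measurement error is correlated with the very quantity being measured.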

Read more here.

Seven Rules of Technical Management

31 Aug

  1. Assume that everything is being done sub-optimally. Probe everything.
  2. Push people to explain things as simply as possible.
  3. For anything complex, ask people to write things down in plain English.
  4. Get perspective from people with different technical competencies.
  5. Frontload your thinking on a project.
  6. Define the API before moving too far out.
  7. Do less. Do well.

The unspoken golden rule: hire competent people. Other virtues matter but competence generally matters the most in technical roles.

Comparing Ad Targeting Regimes

30 Aug

Ad targeting regimes are essential when we have multiple things to sell (opportunity cost) or when the cost of running an ad is non-trivial or when the cost to user’s welfare of a mistargeted ad is non-trivial or any combination of the above. I am leaving this purposely vague because all this is well known.

Say that we have built two different models for selecting people to whom we should show ads—Model A and Model B. Now say that we want to compare which model is better. And by better, we mean better CTR. How do we compare the models? Some people have run an RCT to compare the efficacy of the two models. We don’t need an RCT. All we need to know for each user is whether or not they have been selected to see an ad under each model. We can have 4 potential scenarios:

model_a, model_b
0, 0
1, 0
0, 1
1, 1

For CTR, the 0-0 row doesn’t add any information: CTR is a conditional probability, the probability of a click given that the ad is shown. To measure which of the models is better, draw a fixed-size random sample of users picked by model_a and another random sample of the same size from users picked by model_b, and compare CTR. (The same user can be picked twice. It doesn’t matter.)
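Here is a minimal sketch of the comparison. It assumes each user record carries the two selection flags and, for users who were shown the ad, whether they clicked; the field names and the toy data are hypothetical.

```python
import random


def sample_ctr(users, model, n, rng):
    """CTR in a random sample of n users picked by `model`."""
    picked = [u for u in users if u[model] == 1]
    sample = rng.sample(picked, n)
    return sum(u["clicked"] for u in sample) / n


def compare_models(users, n, seed=0):
    """Draw one same-sized sample per model and return each sample's CTR."""
    rng = random.Random(seed)
    return {m: sample_ctr(users, m, n, rng) for m in ("model_a", "model_b")}


# Made-up users covering the four selection scenarios.
users = (
    [{"model_a": 1, "model_b": 0, "clicked": 1}] * 30
    + [{"model_a": 1, "model_b": 1, "clicked": 0}] * 30
    + [{"model_a": 0, "model_b": 1, "clicked": 0}] * 30
    + [{"model_a": 0, "model_b": 0, "clicked": 0}] * 10
)
print(compare_models(users, n=25))
```

Users in the 1-1 scenario can land in both samples, which is fine: each sample only needs to represent the people its model would pick.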

Now that we know what to do, let’s understand why experiments are wasteful. The heuristic account is as follows: experiments are there to compare ‘similar people.’ When comparing targeting regimes, we are tautologically comparing different people. That is the point of the comparison.

What’s Relevant? Learning from Organic Growth

26 Aug

Say that we want to find people to whom a product is relevant. One way to do that is to launch a small campaign advertising the product and learn from people who click on the ad, or better yet, learn from people who don’t just click on the ad but go and try out the product and end up using it. But if you don’t have the luxury of running a small campaign and waiting a while, you can learn from organic growth.

Conventionally, people learn from organic growth by posing it as a supervised problem. And they generate the labels as follows: people who have ‘never’ (mostly: in the last 6–12 months) used the product are labeled as 0 and people who “adopted” the product in the latest time period, e.g., over the last month, are labeled 1. People who have used the product in the last 6–12 months or so are filtered out.

There are three problems with generating labels this way. First, not all the people who ‘adopt’ a product continue to use the product. Many of the people who try it find that it is not useful or find the price too high and abandon it. This means that a lot of 1s are mislabeled. Second, the cleanest 1s are the people who ‘adopted’ the product some time ago and have continued to use it since. Removing those is thus a bad idea. Third, the good 0s are those who tried the product but didn’t persist with it, not those who never tried the product. Generating the labels this way (persistent users as 1s, people who tried the product and dropped it as 0s) also allows you to mitigate one of the significant problems with learning from organic growth: people who organically find a product are different from those who don’t. Here, you are subsetting on the kinds of people who found the product, except that one found it useful and another did not. This empirical strategy has its problems, but it is distinctly better than the conventional approach.
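The alternative labeling can be sketched as follows. The two thresholds (how long counts as persisting, how long a gap counts as abandoning) are hypothetical and would need to be set for the product at hand.

```python
def label_user(months_since_first_use, months_since_last_use,
               persist_months=3, lapse_months=2):
    """Label a user for the organic-growth training set.

    Returns 1 for people who adopted a while ago and are still active,
    0 for people who tried the product and stopped using it, and None
    for people who never tried it or adopted too recently to tell.
    """
    if months_since_first_use is None:
        return None  # never tried the product: excluded
    if months_since_last_use >= lapse_months:
        return 0     # tried it but did not persist
    if months_since_first_use >= persist_months:
        return 1     # adopted some time ago and still using it
    return None      # too recent to know either way


print(label_user(12, 0))      # long-time active user
print(label_user(6, 5))       # tried it, then dropped it
print(label_user(1, 0))       # adopted last month
print(label_user(None, None)) # never tried
```

Note that under this rule a long-time user who recently lapsed is labeled 0. Whether that is what you want depends on the product.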