Traveling Salesman

18 Nov

White-collar elites venerate travel, especially to exotic and faraway places. There is some justification for the fervor—traveling is pleasant. But veneration creates an umbra that hides some truths:

  1. Local travel is underappreciated. We likely overlook the novelty and beauty available close to home.
  2. Virtual travel is underappreciated. We know all the ways virtual travel doesn’t measure up to the real experience. But we do not ponder enough how the gap between virtual and physical travel has closed, e.g., with high-resolution video, or how some aspects of virtual travel are better:
    1. Cost and convenience. The comfort of the sofa beats the heat and the cold, the crowds, and the fatigue.
    2. Knowledgeable guides. Access to knowledgeable guides online is much greater than offline. 
    3. New vistas. Drones give pleasing viewing angles unavailable to lay tourists.
    4. Access to less visited places. Intrepid YouTubers stream from places far off the tourist map, e.g., here.
  3. The tragedy of the commons. The more people travel, the less appealing it is for everyone because a) travelers change the character of a place and b) the crowds get in the way of enjoyment.
  4. The well-traveled are mistaken for being intellectually sophisticated. “Immersion therapy” can expand horizons by challenging perspectives. But for travel to be ‘improving,’ it often needs to be paired with books, needs to be longer, the traveler needs to make an effort to learn the language, etc.
  5. Traveling by air is extremely polluting. A round trip between LA and NYC emits .62 tons of CO2, which is about the same as the CO2 generated by driving 1,200 miles.

Noise: A Flaw in Book Writing

10 Jul

This is a review of Noise: A Flaw in Human Judgment by Kahneman, Sibony, and Sunstein.

The phrase “noise in decision making” brings to mind “random” error. Scientists, however, shy away from random error. Science is mostly about systematic error, except, perhaps, in quantum physics. So Kahneman et al. conceive of noise as seemingly random error that is a result of unmeasured biases. For instance, research suggests that heat causes bad mood. And bad mood may, in turn, cause people to judge more harshly. If this were to hold, the variability in judging stemming from the weather could end up being interpreted as noise. But, as is clear, there is no “random” error, merely bias. Kahneman et al. make a hash of this point. Early on, they give the conventional formula for total expected error as the sum of (squared) bias and variance (they don’t further decompose variance into irreducible error and ‘random’ error), with the aim of talking about the two separately, and, naturally, never succeed in doing that.
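For reference, the conventional decomposition the authors invoke, written in generic textbook notation (my rendering, not the book’s exact formula), is:

```latex
% Standard decomposition of expected squared error for a prediction \hat{y} of
% y = f + \varepsilon, with E[\varepsilon] = 0 and \operatorname{Var}(\varepsilon) = \sigma^2:
E\big[(y - \hat{y})^2\big]
  = \underbrace{\big(f - E[\hat{y}]\big)^2}_{\text{bias}^2}
  + \underbrace{\operatorname{Var}(\hat{y})}_{\text{variance}}
  + \underbrace{\sigma^2}_{\text{irreducible error}}
```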

The conceptual issues ought not to distract us from the book’s important point. It is useful to think about human judgment systems as mathematical functions. We should expect the same inputs to map to the same output. It turns out that this isn’t even remotely true in most human decision-making systems. Take insurance underwriting, for instance. Given the same data (realistic but made-up information about cases), the median percentage difference between quotes from any pair of underwriters is an eye-watering 55% (which means that for half of the cases, it is worse than 55%), about five times as large as the executives expected. A few interesting points flow from this. First, if you are a customer, your optimal strategy is to get multiple quotes. Second, what explains ignorance about the disagreement? There could be a few reasons. For one, when people come across a quote from another underwriter, they may ‘anchor’ their estimate on the number they see, reducing the gap between the number and the counterfactual. For another, colleagues plausibly read to agree—which takes less effort and optimizes for collegiality, asking, “Could this make sense?”—rather than read to evaluate, asking, “Does this make sense?” (See my notes for a fuller set of potential explanations.)
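As a rough illustration of the statistic, here is a minimal sketch with made-up quotes; dividing each pair’s absolute difference by the pair’s average is one common convention, and the book’s exact denominator may differ.

```python
# A minimal sketch of the "noise audit" statistic described above. For one case, it
# computes the percentage difference between the quotes of every pair of underwriters
# and summarizes with the median. The quotes are hypothetical.
from itertools import combinations
from statistics import median

quotes = [9500, 16000, 12300, 20000, 13500]  # hypothetical premiums for the same case

def pairwise_pct_diffs(values):
    """Percentage difference for each pair, relative to the pair's mean."""
    return [abs(a - b) / ((a + b) / 2) * 100 for a, b in combinations(values, 2)]

print(round(median(pairwise_pct_diffs(quotes)), 1))  # median pairwise % difference
```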

Data from asylum reviews is yet starker. “A study of cases that were randomly allotted to different judges found that one judge admitted 5% of applicants, while another admitted 88%.” (Paper.)

Variability can stem from only two things. It could be that the data don’t allow for a unique judgment (irreducible error). (But even here, the final judgment should reflect the uncertainty in the data.) Or it could be that at least one person is ‘wrong’ (has a different answer from the others). Among other things, this can stem from:

  1. variation in skill, e.g., in how to assess patent applications
  2. variation in effort, e.g., some people put in more effort than others
  3. agency and preferences, e.g., I am a conservative judge, and I can deny an asylum application because I have the power to do so
  4. biases like using irrelevant information, e.g., weather, hypoglycemia, etc.

(Note: a lack of variability doesn’t mean we are on to the right answer.)

The list of proposed solutions is extensive—from selecting better judges to the wisdom of crowds to using models to training people better to more elaborate schemes like dividing the decision task and asking people to make relative rather than absolute judgments. The evidence backing the solutions is not always hefty, which meshes with the ideologue-like approach to evidence present everywhere in the book. When I did a small audit of the citations, a few things stood out (the overarching theme is adherence to the “No Congenial Result Scrutinized or Left Uncited Act”):

  1. Extremely small n studies cited without qualification. Software engineers.
    Quote from the book: “when the same software developers were asked on two separate days to estimate the completion time for the same task, the hours they projected differed by 71%, on average.”
    The underlying paper: “In this paper, we report from an experiment where seven experienced software professionals estimated the same sixty software development tasks over a period of three months. Six of the sixty tasks were estimated twice.”
  2. Extremely small n studies cited without qualification. Israeli Judges.
    Hypoglycemia and judgment: “Our data consist of 1,112 judicial rulings, collected over 50 d in a 10-mo period, by eight Jewish-Israeli judges (two females) who preside over two different parole boards that serve four major prisons in Israel.”
  3. Surprising but likely unreplicable results. “When calories are on the left, consumers receive that information first and evidently think “a lot of calories!” or “not so many calories!” before they see the item. Their initial positive or negative reaction greatly affects their choices. By contrast, when people see the food item first, they apparently think “delicious!” or “not so great!” before they see the calorie label. Here again, their initial reaction greatly affects their choices. This hypothesis is supported by the authors’ finding that for Hebrew speakers, who read right to left, the calorie label has a significantly larger impact.” (Paper.)
    “We show that if the effect sizes in Dallas et al. (2019) are representative of the populations, a replication of the six studies (with the same sample sizes) has a probability of only 0.014 of producing uniformly significant outcomes.” (Paper.)
  4. Citations to HBR. Citations to think pieces in Harvard Business Review (10 citations in total, based on a keyword search) and books like ‘Work Rules!’ for a fair number of claims.

Here are my notes for the book.

Casual Inference: Errors in Everyday Causal Inference

12 Aug

Why are things the way they are? What is the effect of something? Both of these reverse and forward causation questions are vital.

When I was at Stanford, I took a class with a pugnacious psychometrician, David Rogosa. David had two pet peeves, one of which was people making causal claims with observational data. And it is in David’s class that I learned the pejorative for such claims. With great relish, David referred to such claims as ‘casual inference.’ (Since then, I have come up with another pejorative phrase for such claims—cosal inference—as in merely dressing up as causal inference.)

It turns out that despite its limitations, casual inference is quite common. Here are some fashionable costumes:

  1. 7 Habits of Successful People: We have all seen business books with such titles. The underlying message of these books is: adopt these habits, and you will be successful too! Let’s follow the reasoning and see where it falls apart. One stereotype about successful people is that they wake up early. And the implication is that if you wake up early, you can be successful too. It *seems* right. It agrees with folk wisdom that discomfort causes success. But can we reliably draw inferences about what less successful people should do based on what successful people do? No. For one, we know nothing about the habits of less successful people. It could be that less successful people wake up *earlier* than the more successful people. Certainly, growing up in India, I recall daily laborers waking up much earlier than people living in bungalows. And when you think of it, the claim that servants wake up before masters seems uncontroversial. It may even be routine enough to be canonized as a law—the Downton Abbey law. The upshot is that when you select on the dependent variable, i.e., only look at cases where the variable takes certain values, e.g., only look at the habits of financially successful people, even correlation is not guaranteed. This means that you don’t even get to mock the claim with the jibe that “correlation is not causation.”

    Let’s go back to Goji’s delivery service for another example. One of the ‘tricks’ that we had discussed was to sample failures. If you do that, you are selecting on the dependent variable. And while it is a good heuristic, it can lead you astray. For instance, let’s say that most of the late deliveries are early morning deliveries. You may infer that delivering at another time may improve outcomes. Except, when you look at the data, you find that the bulk of your deliveries are in the morning and that the rate at which deliveries run late is *lower* in the early morning than at other times.

    There is a yet more famous example of things going awry when you select on the dependent variable. During World War II, statisticians were asked where armor should be added to planes. On the aircraft that returned, the damage was concentrated in a few areas, like the wings. The top-of-the-head answer is to reinforce the areas hit most often. But if you think about the planes that didn’t return, you get to the right answer, which is that we need to reinforce the areas that weren’t hit. In the literature, people call this kind of error survivorship bias. But it is, at root, a problem of selecting on the dependent variable (whether or not a plane returned)—of looking only at the planes that returned.

  2. More frequent system crashes cause people to renew their software license. It is a mistake to treat correlation as causation. There are many reasons why doing so can lead you astray. The rarest reason is that lots of odd things are correlated in the world because of luck alone. The point is hilariously illustrated by a set of graphs showing large correlations between conceptually unrelated things, e.g., a large correlation between total worldwide non-commercial space launches and the number of sociology doctorates awarded each year.

    A more common scenario is illustrated by the example in the title of this point. Commonly, there is a ‘lurking’ or ‘confounding’ variable that explains both sides. In our case, the more frequently a person uses a system, the more crashes they see. And it makes sense that the people who use the system most frequently also need the software the most and renew the license most often.

    Another common but more subtle reason is called Simpson’s paradox. Sometimes the correlation you see is “wrong.” You may see a correlation in the aggregate, but the correlation runs the opposite way when you break the data down by group. Gender bias in U.C. Berkeley admissions provides a famous example. In 1973, 44% of the men who applied to graduate programs were admitted, whereas only 35% of the women were. But when you split by department, which is where admissions decisions were actually made, women generally had a higher batting average than men. The reason for the reversal was that women applied more often to more competitive departments, like—wait for it—English, while men were more likely to apply to less competitive departments, like Engineering. None of this is to say that there isn’t bias against women. It is merely to point out that the pattern in aggregated data may not hold when you split the data into relevant chunks. (The sketch after this list illustrates the reversal with made-up numbers.)

    It is also important to keep in mind the flip side of “correlation is not causation”: a lack of correlation does not imply a lack of causation.

  3. Mayor Giuliani brought the NYC crime rate down. There are two potential errors here:
    • Forgetting about ecological trends. Crime rates in other big US cities went down at the same time as they did in NY, sometimes more steeply. When faced with a causal claim, it is good to check how ‘similar’ people fared. The Difference-in-Differences estimator builds on this intuition.
    • Treating temporally proximate as causal. Say you had a headache, you took some medicine and your headache went away. It could be the case that your headache went away by itself, as headaches often do.

  4. I took this homeopathic medication and my headache went away. If the ailments are real, placebo effects are a bit mysterious. And mysterious they may be, but they are real enough. Not accounting for placebo effects misleads us into ascribing the total effect to the medicine.

  5. Shallow causation. We ascribe more weight to immediate causes than to causes that are a few layers deeper.

  6.  Monocausation: In everyday conversations, it is common for people to speak as if x is the only cause of y.

  7. Big Causation: Another common pitfall is reading x causes y as x causes y to change a lot. This is partly a consequence of mistaking statistical significance for substantive significance, and partly a consequence of not paying close enough attention to the numbers.

  8. Same Effect: Lastly, many people take causal claims to mean that the effect is the same across people. 
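Here is the sketch referenced in the Simpson’s paradox item, with made-up admission numbers (not the actual 1973 Berkeley figures) chosen only to show how an aggregate gap can reverse within every department:

```python
# Simpson's paradox with hypothetical numbers: the aggregate admission rate favors men,
# yet within each department women are admitted at an equal or higher rate, because
# women apply more often to the more competitive department.
applicants = {
    # department: (men_applied, men_admitted, women_applied, women_admitted)
    "Engineering (less competitive)": (800, 480, 100, 65),
    "English (more competitive)":     (200,  20, 700, 80),
}

def rate(admitted, applied):
    return admitted / applied

men_applied = sum(v[0] for v in applicants.values())
men_admitted = sum(v[1] for v in applicants.values())
women_applied = sum(v[2] for v in applicants.values())
women_admitted = sum(v[3] for v in applicants.values())

print(f"Aggregate: men {rate(men_admitted, men_applied):.0%}, "
      f"women {rate(women_admitted, women_applied):.0%}")
for dept, (ma, mad, wa, wad) in applicants.items():
    print(f"{dept}: men {rate(mad, ma):.0%}, women {rate(wad, wa):.0%}")
```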

Unsighted: Why Some Important Findings Remain Uncited

1 Aug

Poring over the first 500 of the over 900 citations for Fear and Loathing across Party Lines on Google Scholar (7/31/2020), I could not find a single study citing the paper for racial discrimination. You may think the reason is obvious—the paper is about partisan prejudice, not racial prejudice. But a more accurate description is that the paper is best known for describing partisan prejudice but also has powerful evidence on the lack of racial discrimination among white Americans—in fact, there is reasonable evidence of positive discrimination in one study. (I exclude the IAT results, which are weaker than Banaji’s results, with Cohen’s d ~ .22, because they don’t speak directly to discrimination.)

There are two independent pieces of evidence in the paper about racial discrimination.

Candidate Selection Experiment

“Unlike partisanship where ingroup preferences dominate selection, only African Americans showed a consistent preference for the ingroup candidate. Asked to choose between two equally qualified candidates, the probability of an African American selecting an ingroup winner was .78 (95% confidence interval [.66, .87]), which was no different than their support for the more qualified ingroup candidate—.76 (95% confidence interval [.59, .87]). Compared to these conditions, the probability of African Americans selecting an outgroup winner was at its highest—.45—when the European American was most qualified (95% confidence interval [.26, .66]). The probability of a European American selecting an ingroup winner was only .42 (95% confidence interval [.34, .50]), and further decreased to .29 (95% confidence interval [.20, .40]) when the ingroup candidate was less qualified. The only condition in which a majority of European Americans selected their ingroup candidate was when the candidate was more qualified, with a probability of ingroup selection at .64 (95% confidence interval [.53, .74]).”

Evidence from Dictator and Trust Games

“From Figure 8, it is clear that in comparison with party, the effects of racial similarity proved negligible and not significant—coethnics were treated more generously (by eight cents, 95% confidence interval [–.11, .27]) in the dictator game, but incurred a loss (seven cents, 95% confidence interval [–.34, .20]) in the trust game. There was no interaction between partisan and racial similarity; playing with both a copartisan and coethnic did not elicit additional trust over and above the effects of copartisanship.”

There are two plausible explanations for the lack of citations. Both are easily ruled out. The first is that the quality of evidence for racial discrimination is worse than that for partisan discrimination. Given that both claims use the same data and research design, that explanation doesn’t work. The second is that there is a difference in the base rates of production of research on racial and partisan discrimination. A quick Google search debunks that theory. Between 2015 and 2020, I get 135k results for racial discrimination and 17k for partisan polarization. It isn’t exact, but it is good enough to rule out that explanation for the results I see. This likely leaves us with just two explanations: a) researchers hesitate to cite results that run counter to their priors or their own results, or b) people are simply unaware of these results.

Addendum (9/26/2021): Why may people be unaware of the results? Here are some lay conjectures (they are general and NOT about the paper I use as an example above; I only use that paper because I am familiar with it; see below for the reason):

  1. Papers, but especially paper titles and abstracts, are written around a single point because …
    1. Authors believe that this is a more effective way to write papers.
    2. Editors/reviewers recommend that the paper focus on one key finding or not focus on some findings — via Dean Eckles (see the p.s. as well). The reason some of the key results didn’t make the abstract in the paper I use as an example is, as Sean shares, that reviewers thought the results were not strong.
  2. Authors may be especially reluctant to weave in ‘controversial’ supplementary findings in the abstract because …
    1. Sharing certain controversial results may cause reputational harm.
    2. Say the authors want to instill belief in A > B. Say a vast majority of readers have strong priors: A > B and C > D. Say a method finds A > B and D > C. There are two ways to frame the paper. Talk about A > B and bury D > C. Or start with D > C and then show A > B. Which paper’s findings would be more widely believed?
  3. Papers are read far less often than paper titles and abstracts. And even when people read a paper, they are often doing a ‘motivated search’—looking for the relevant portion of the paper. (Good, widely available within-article search should help here.)

p.s. All of the above is about cases where papers have important supplementary results. But as Dean Eckles points out, sometimes the supplementary results are dropped at reviewers’ request, and sometimes (and this has happened to me), authors never find the energy to publish them elsewhere.

Gaming Measurement: Using Economic Games to Measure Discrimination

31 Jul

Prejudice is the bane of humanity. Measurement of prejudice, in turn, is a bane of social scientists. Self-reports are unsatisfactory. Like talk, they are cheap and thus biased and noisy. Implicit measures don’t even pass the basic hurdle of measurement—reliability. Against this grim background, economic games as measures of prejudice seem promising—they are realistic and capture costly behavior. Habyarimana et al. (HHPW for short), for instance, use the dictator game (they also have a neat variation of it, which they call the ‘discrimination game’) to measure ethnic discrimination. Since then, many others have used the design, including, prominently, Iyengar and Westwood (IW for short). But there are some issues with how economic games have been set up, analyzed, and interpreted:

  1. Revealing identity upfront gives you a ‘no personal information’ estimand: One common aspect of how economic games are set up is that the party/tribe is revealed upfront. Revealing the trait upfront, however, may be sub-optimal. The likelier sequence of interaction and discovery of party/tribe in the world, especially as we move online, is regular interaction followed by discovery. To that end, a game where players interact for a few cycles before an ‘irrelevant’ trait is revealed about them is plausibly more generalizable. What we learn from such games can be provocative—discrimination after a history of fair economic transactions seems dire.
  2. Using data from subsequent movers can bias estimates. “For example, Burnham et al. (2000) reports that 68% of second movers primed by the word “partner” and 33% of second movers primed by the word “opponent” returned money in a single-shot trust game. Taken at face value, the experiment seems to show that the priming treatment increased by 35 percentage-points the rate at which second movers returned money. But this calculation ignores the fact that second movers were exposed to two stimuli, the partner/opponent prime and the move of the first player. The former is randomly assigned, but the latter is not under experimental control and may introduce bias.” (Green and Tusicisny) IW smartly sidestep the concern: “In both games, participants only took the role of Player 1. To minimize round-ordering concerns, there was no feedback offered at the end of each round; participants were told all results would be provided at the end of the study.”
  3. AMCE of conjoint experiments is subtle and subject to assumptions. The experiment in IW is a conjoint experiment: “For each round of the game, players were provided a capsule description of the second player, including information about the player’s age, gender, income, race/ethnicity, and party affiliation. Age was randomly assigned to range between 32 and 38, income varied between $39,000 and $42,300, and gender was fixed as male. Player 2’s partisanship was limited to Democrat or Republican, so there are two pairings of partisan similarity (Democrats and Republicans playing with Democrats and Republicans). The race of Player 2 was limited to white or African American. Race and partisanship were crossed in a 2 × 2, within-subjects design totaling four rounds/Player 2s.” The first subtlety is that the AMCE for partisanship is identified against the distribution of gender, age, race, etc. For generalizability, we may want a distribution close to the real world. As Hainmueller et al. write: “…use the real-world distribution (e.g., the distribution of the attributes of actual politicians) to improve external validity. The fact that the analyst can control how the effects are averaged can also be viewed as a potential drawback, however. In some applied settings, it is not necessarily clear what distribution of the treatment components analysts should use to anchor inferences. In the worst-case scenario, researchers may intentionally or unintentionally misrepresent their empirical findings by using weights that exaggerate particular attribute combinations so as to produce effects in the desired direction.” Second, there is always a chance that it is a particular higher-order combination, e.g., race–PID, that ‘explains’ the main effect. (The sketch after this list illustrates the dependence on weights with toy numbers.)
  4. Skew in outcome variables means that the mean is not a good summary statistic. As you can see in the last line of the first panel of Table 4 (Republican—Republican Dictator Game), if you take out the 20% of people who give $0, the average allocation from the others is $4.2. HHPW handle this with a variable called ‘egoist’; IW handle it with a separate column tallying the people giving precisely $0.
  5. The presence of ‘white foreigners’ can make people behave more generously. As Dube et al. find, “the presence of a white foreigner increases player contributions by 19 percent.” The point is more general, of course. 
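Here is the sketch referenced in item 3, using hypothetical effects (not IW’s estimates) to show why the AMCE depends on the distribution over the other attributes:

```python
# Toy illustration: the AMCE of copartisanship is the partisanship effect averaged
# over the distribution of the other attributes (here, Player 2's race), so the
# answer depends on which weights you use. All numbers are hypothetical.
copartisan_effect_by_race = {"white": 1.00, "African American": 0.20}  # hypothetical $ effects

def amce(effects, weights):
    """Weighted average of the conditional effects over the other attribute's distribution."""
    return sum(effects[k] * weights[k] for k in effects)

uniform = {"white": 0.5, "African American": 0.5}            # the experiment's design distribution
real_world_like = {"white": 0.75, "African American": 0.25}  # a hypothetical target distribution

print(round(amce(copartisan_effect_by_race, uniform), 2))          # 0.60
print(round(amce(copartisan_effect_by_race, real_world_like), 2))  # 0.80
```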

With that, here are some things we can learn from economic games in HHPW and IW:

  1. People are very altruistic. In HHPW: “The modal strategy, employed in 25% of the rounds, was to retain 400 USh and to allocate 300 USh to each of the other players. The next most common strategy was to keep 600 USh and to allocate 200 USh to each of the other players (21% of rounds). In the vast majority of allocations, subjects appeared to adhere to the norm that the two receivers should be treated equally. On average, subjects retained 540 shillings and allocated 230 shillings to each of the other players. The modal strategy in the 500 USh denomination game (played in 73% of rounds) was to keep one 500 USh coin and allocate the other to another player. Nonetheless, in 23% of the rounds, subjects allocated both coins to the other players.” In IW, “[of the $10, players allocated] nontrivial amounts of their endowment—a mean of $4.17 (95% confidence interval [3.91, 4.43]) in the trust game, and a mean of $2.88 (95% confidence interval [2.66, 3.10])” (Note: These numbers are hard to reconcile with the numbers in Table 4. One plausible explanation is that these numbers are over the entire population, whereas the Table 4 numbers are a subset limited to partisans, and independents are somewhat less generous than partisans.)
  2. There is no co-ethnic bias. Both HHPW and IW find this. HHPW: “we find no evidence that this altruism was directed more at in-group members than at out-group members. [Table 2]” IW: “From Figure 8, it is clear that in comparison with party, the effects of racial similarity proved negligible and not significant—coethnics were treated more generously (by eight cents, 95% confidence interval [–.11, .27]) in the dictator game, but incurred a loss (seven cents, 95% confidence interval [–.34, .20]) in the trust game.”
  3. A modest proportion of people discriminate against partisans. IW: “The average amount allocated to copartisans in the trust game was $4.58 (95% confidence interval [4.33, 4.83]), representing a “bonus” of some 10% over the average allocation of $4.17. In the dictator game, copartisans were awarded 24% over the average allocation.” But it is less dramatic than that. The key change in the dictator game is in the number of people giving $0. The change in the percentage of people giving $0 is 7% among Democrats. So the average amount of money given to R and D by people who didn’t give $0 is $4.1 and $4.4 respectively, which is a ~7% difference (see the sketch after this list).
  4. More Republicans than Democrats act like ‘homo-economicus.’ I am just going by the proportion of respondents giving $0 in dictator games.
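Here is the sketch referenced in item 3 above; the share of $0 givers and the conditional means are patterned on the numbers in the text, not computed from the paper’s data:

```python
# With a zero-inflated outcome, overall mean = (share giving > $0) * (mean among
# those giving > $0), so a shift in the share of $0 givers can account for much of
# a gap in raw means. Numbers below are illustrative.
def overall_mean(share_zero, mean_among_givers):
    """Mean allocation when a fraction `share_zero` gives exactly $0."""
    return (1 - share_zero) * mean_among_givers

copartisan = overall_mean(share_zero=0.13, mean_among_givers=4.4)   # hypothetical
outpartisan = overall_mean(share_zero=0.20, mean_among_givers=4.1)  # hypothetical
print(round(copartisan, 2), round(outpartisan, 2))
print(f"gap among givers only: {(4.4 - 4.1) / 4.1:.0%}")  # ~7%
```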

p.s. I was surprised that there are no replication scripts or even a codebook for IW. The data had been downloaded 275 times when I checked.

The Declining Value of Personal Advice

27 Jun

There used to be a time when, before buying something, you asked your friends and peers for advice, and it was the optimal thing to do. These days, it is often not a great use of time. It is generally better to go online. Today, the Internet abounds with comprehensive, detailed, and trustworthy information, and picking the best product, judging by its quality, price, appearance, or what have you, in a slew of categories is easy to do.

As goes advice about products, so goes much other advice. For instance, if a coding error stumps you, your first move should be to search StackOverflow rather than Slack a peer. If you don’t understand a technical concept, look for a YouTube video, a helpful blog, or a book rather than “leverage” a peer.

The fundamental point is that it is easier to get high-quality data and expert advice today than it has ever been. If your network includes the expert, bless you! But if it doesn’t, your network no longer damns you to sub-optimal information and advice. And that likely has welcome consequences for equality.

The only cases where advice from people near you may edge ahead of readily available help online are where the advisor has access to private information about your case or where the advisor is willing to expend greater elbow grease to get to the facts and to tailor the advice to your special circumstances. For instance, you may be able to get good advice on how to deal with alcoholic parents from an expert online but probably not about alcoholic parents with the specific set of deficiencies that your parents have. Short of such cases, the value of advice from people around you is lower today than before, and probably lower than what you can get online.

The declining value of interpersonal advice has one significant negative externality. It takes out a big way we have provided value to our loved ones. We need to think harder about how we can fill that gap.

Why do We Fail? And What to do About It?

28 May

I recently read Gawande’s The Checklist Manifesto. (You can read my review of the book here and my notes on the book here.) The book made me think harder about failure and how to prevent it. Here’s a result of that thinking.

We fail because we don’t know or because we don’t execute on what we know (Gorovitz and MacIntyre). Among the things that we don’t know are things that no one else knows either—they are beyond humanity’s reach for now. Ignore those for now. This leaves us with things that “we” know but the practitioner doesn’t.

Practitioners do not know because the education system has failed them, because they don’t care to learn, or because the production of new knowledge outpaces their capacity to learn. Given that, you can reduce ignorance by a) increasing the length of training, b) improving the quality of training, c) setting up continuing education, d) incentivizing knowledge acquisition, e) reducing the burden of how much there is to know by creating specializations, etc. On creating specialties, Gawande has a great example: “there are pediatric anesthesiologists, cardiac anesthesiologists, obstetric anesthesiologists, neurosurgical anesthesiologists, …”

Ignorance, however, ought not to damn the practitioner to error. If you know that you don’t know, you can learn. Ignorance, thus, is not a sufficient condition for failure. But ignorance of ignorance is. To fix overconfidence, leading people through provocative, personalized examples may prove useful.

Ignorance and ignorance about ignorance are but two of the three reasons why we fail. We also fail because we don’t execute on what we know. Practitioners fail to apply what they know because they are distracted, lazy, have limited attention and memory, etc. To solve these issues, we can a) reduce distractions, b) provide memory aids, c) automate tasks, d) train people on the importance of thoroughness, e) incentivize thoroughness, etc.

Checklists are one way to work toward two inter-related aims: educating people about the steps needed to make a decision and aiding memory. But awareness of the steps is not enough. To incentivize people to follow the steps, you need to develop processes to hold them accountable. Audits are one way to do that. Meetings set up at appropriate times, during which people go through the list, are another.

Wanted: Effects That Support My Hypothesis

8 May

Do survey respondents account for the hypothesis that they think people fielding the survey have when they respond? The answer, according to Mummolo and Peterson, is not much.

Their paper also very likely provides the reason why—people don’t pay much attention. Figure 3 provides data on manipulation checks—the proportion guessing the hypothesis being tested correctly. The change in proportion between control and treatment ranges from -.05 to .25, with the bulk of changes in Qualtrics between 0 and .1. (In one condition, the authors even offer an additional 25 cents to give a result consistent with the hypothesis. And presumably, people need to know the hypothesis before they can answer in line with it.) The faint increase is especially noteworthy given that, on average, the proportion of people in the control group who guess the hypothesis correctly—without the guessing correction—is between .25 and .35 (see Appendix B; pdf).

So, the big thing we may have learned from the data is how little attention survey respondents pay. The numbers obtained here are similar to those in Appendix D of Jonathan Woon’s paper (pdf). The point is humbling and suggests that we need to: a) invest more in measurement, and b) have yet larger samples, which is an expensive way to overcome measurement error—a point Gelman has made before.

There is also the point about the worthiness of including ‘manipulation checks.’ Experiments tell us the ATE of what we manipulate. The role of manipulation checks is to shed light on ‘compliance.’ If conveying experimenter demand clearly and loudly is the goal, then the experiments included probably failed. If the purpose was to know whether clear but not very loud cues about ‘demand’ matter—and for what it’s worth, I think that is a very reasonable goal; pushing further, in my mind, would have reduced the experiment to a tautology—the paper provides the answer.

Advice that works

31 Mar

Writing habits of some writers:

“Early in the morning. A good writing day starts at 4 AM. By 11 AM the rest of the world is fully awake and so the day goes downhill from there.”

Daniel Gilbert

“Usually, by the time my kids get off to school and I get the dogs walked, I finally sit down at my desk around 9:00. I try to check my email, take care of business-related things, and then turn it off by 10:30—I have to turn off my email to get any writing done.”

Juli Berwald

“When it comes to writing, my production function is to write every day. Sundays, absolutely. Christmas, too. Whatever. A few days a year I am tied up in meetings all day and that is a kind of torture. Write even when you have nothing to say, because that is every day.”

Tyler Cowen

“I don’t write everyday. Probably 1-2 times per week.”

Benjamin Hardy

“I’ve taught myself to write anywhere. Sometimes I find myself juggling two things at a time and I can’t be too precious with a routine. I wrote Name of the Devil sitting on a bed in a rented out room in Hollywood while I was working on a television series for A&E. My latest book, Murder Theory, was written while I was in production for a shark documentary and doing rebreather training in Catalina. I’ve written in casinos, waiting in line at Disneyland, basically wherever I have to.”

Andrew Mayne

Should we wake up at 4 am and be done by 11 am as Dan Gilbert does or should we get started at 10:30 am like Juli, near the time Dan is getting done for the day? Should we write every day like Tyler or should we do it once or twice a week like Benjamin? Or like Andrew, should we just work on teaching ourselves to “write anywhere”?

There is a certain tautological aspect to good advice. It is advice that works for you. Do what works for you. But don’t assume that you have been given advice that is right for you or that it is the only piece of advice on that topic. Advice givers rarely point out that the complete set of reasonable things that could work for you is often pretty large and contradictory and that the evidence behind the advice they are giving you is no more than anecdotal evidence with a dash of motivated reasoning.

None of this is to say that you should not try hard to follow advice that you think is good. But once you see the larger point, you won’t fret as much when you can’t follow a piece of advice or when the advice doesn’t work for you. As long as you keep trying to get to where you want to be (and of course, even the merit of some wished-for end states is debatable), it is ok to abandon some paths, safe in the knowledge that there are generally more paths to get there.

The Risk of Misunderstanding Risk

20 Mar

Women who participate in breast cancer screening from 50 to 69 live, on average, 12 more days. This is the best-case scenario. Gerd has more such compelling numbers in his book, Calculated Risks. Gerd shares such numbers to launch a frontal assault on the misunderstanding of risk. His key point is:

“Overcoming innumeracy is like completing a three-step program to statistical literacy. The first step is to defeat the illusion of certainty. The second step is to learn about the actual risks of relevant events and actions. The third step is to communicate the risks in an understandable way and to draw inferences without falling prey to clouded thinking.”

Gerd’s key contributions are on the third point. Gerd identifies three problems with risk communication:

  1. using relative risk rather than Numbers Needed to Treat (NNT) or absolute risk,
  2. using single-event probabilities, and
  3. using conditional probabilities rather than ‘natural frequencies.’

Gerd doesn’t explain what he means by natural frequencies in the book, but some of his other work does. Here’s a clarifying example that presents the same information in two different ways, the second of which is in the form of natural frequencies:

“The probability that a woman of age 40 has breast cancer is about 1 percent. If she has breast cancer, the probability that she tests positive on a screening mammogram is 90 percent. If she does not have breast cancer, the probability that she nevertheless tests positive is 9 percent. What are the chances that a woman who tests positive actually has breast cancer?”

vs.

“Think of 100 women. One has breast cancer, and she will probably test positive. Of the 99 who do not have breast cancer, 9 will also test positive. Thus, a total of 10 women will test positive. How many of those who test positive actually have breast cancer?”
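To make the contrast concrete, here is a small sketch (mine, not from the book) that computes the answer to the question both ways:

```python
# Compute P(cancer | positive test) from the conditional-probability version via
# Bayes' rule, and from the natural-frequency version by simple counting.
prior = 0.01        # P(breast cancer)
sensitivity = 0.90  # P(positive | cancer)
false_pos = 0.09    # P(positive | no cancer)

# conditional-probability route (Bayes' rule)
p_positive = prior * sensitivity + (1 - prior) * false_pos
posterior = prior * sensitivity / p_positive
print(f"Bayes' rule: {posterior:.1%}")  # ~9.2%

# natural-frequency route: think of 100 women
with_cancer_positive = 1     # 1 has cancer and (probably) tests positive
without_cancer_positive = 9  # 9 of the 99 without cancer also test positive
total_positive = with_cancer_positive + without_cancer_positive
print(f"Natural frequencies: {with_cancer_positive} out of {total_positive} "
      f"= {with_cancer_positive / total_positive:.0%}")
```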

For those in a hurry, here are my notes on the book.

Disgusting

7 Feb

Vegetarians recoil at the thought of eating the meat of a cow that has died from a heart attack. The disgust that vegetarians experience is not principled. Nor is the greater opposition to homosexuality that people espouse when they are exposed to a foul smell. Haidt uses provocative examples like these to expose chinks in how we think about what is moral and what is not.

Knowing that what we find disgusting may not always be “disgusting,” that our moral reasoning can be flawed, is a superpower. Because thinking that you are in the right makes you self-righteous. It makes you think that you know all the facts, that you are somehow better. Often, we are not. If we stop conflating disgust with being in the right or indeed, with being right, we shall all get along a lot better.

The Best We Can Do is Responsibly Answer the Questions that Life Asks of Us

5 Feb

Faced with mass murder, it is hard to escape the conclusion that life has no meaning. For how could it be that life has meaning when lives matter so little? As an Austrian Jew in a concentration camp, Viktor Frankl had to confront that question.

In Man’s Search for Meaning, Frankl gives two answers to the question. His first answer is a reflexive rejection of the meaninglessness of life. Frankl claims that life is “unconditional[ly] meaningful.” There is something to that, but not enough to hang on to for too long. It is also not his big point.

Instead, Frankl has a more nuanced point: “If there is … meaning in life …, then there must be … meaning in suffering.” (Because suffering is an inescapable part of life.) The meaning of suffering, according to him, lies in how we respond to it. Do we suffer with dignity? Or do we let suffering degrade us? The broader, deeper point that underpins the claim is that we cannot always choose our conditions, but we can choose the “stand [we take] toward the conditions.” And life’s meaning is stored in the stand we take, in how we respond to the questions that “life asks of us.”

Not only that, the extent of human achievement is responsibly answering the questions that life asks of us. This means two things. First, that questions about human achievement can only be answered within the context of one’s life. And second, in responsibly answering questions that life asks of us, we attain what humans can ever attain. In a limited life, circumscribed by unavoidable suffering, for instance, the peak of human achievement is keeping dignity. If your life offers you more, then, by all means, do more—derive meaning from action, from beauty, and from love. But also take solace in the fact that we can achieve the greatest heights a human can achieve in how we respond to unavoidable suffering.

Ruined by Google

13 Jan

Information on tap is a boon. But if it means that the only thing we will end up knowing—having in our heads—is where to go to find the information, it may also be a bane.

Accessible stored cognitions are vital. They allow us to verify and contextualize new information. If we need to look things up, because of laziness or forgetfulness, we will end up accepting some false statements, which we would have easily refuted had we had the relevant information in our memory, or we will fail to contextualize some statements appropriately.

Information on tap also produces another malaise. It changes the topography of what we know. As search costs go down, people move from learning about a topic systematically to narrowly searching for whatever they need to know, now. And knowledge built on narrow searches looks like Swiss cheese.

Worse, many a time when people find the narrow thing they are looking for, they think that that is all there is to know. For instance, in Computer Science and Machine Learning, people can increasingly execute sophisticated things without knowing much. (And that is mostly a good thing.) But getting something to work—by copying the code from StackOverflow—gives people the sense that they “know.” And when we think we know, we also assume that there is not much more to know. Thus, information on tap reduces the horizons of our knowledge about our ignorance.

In becoming better at fulfilling our narrower needs, lower search costs may be killing expertise. And that is mostly a bad thing.

See also this paper that suggests that searching on Google causes you to think that you know more.

The Other Side

23 Oct

Samantha Laine Perfas of the Christian Science Monitor interviewed me about the gap between perceptions and reality for her podcast ‘perception gaps’ over a month ago. You can listen to the episode here (Episode 2).

The Monitor has also made the transcript of the podcast available here. Some excerpts:

“Differences need not be, and we don’t expect them to be, reasons why people dislike each other. We are all different from each other, right. …. Each person is unique, but we somehow seem to make a big fuss about certain differences and make less of a fuss about certain other differences.”

One way to fix it:

“If you know so little and assume so much, … the answer is [to] simply stop doing that. Learn a little bit, assume a little less, and see where the conversation goes.”

The interview is based on the following research:

  1. Partisan Composition (pdf) and Measuring Shares of Partisan Composition (pdf)
  2. Affect Not Ideology (pdf)
  3. Coming to Dislike (pdf)
  4. All in the Eye of the Beholder (pdf)

Related blog posts and think pieces:

  1. Party Time
  2. Pride and Prejudice
  3. Loss of Confidence
  4. How to read Ahler and Sood

Loss of Confidence

21 Oct

We all overestimate how much we know. If the aphorism, “the more you know, the more you know that you don’t know” is true, then how else could it be? But knowing more is not the only path to learning about our ignorance. Mistakes are another. When we make mistakes, we get to adjust our parameters (understanding) about how much we know. Overconfident people, however, incur smaller losses when they make mistakes. They don’t learn as much from mistakes because they externalize the source of errors or don’t acknowledge the mistakes, believing it is you who is wrong, not them. So, the most ignorant (the most confident) very likely make the least progress in learning about their ignorance when they make mistakes. (Ignorance is just one source of why people overestimate how much they know. There are many other factors, including personality.) But if you know this, you can fix it.

Pride and Prejudice

14 Jul

It is ‘so obvious’ that policy A >> policy B that only those who don’t want to know or who want inferior things would support policy B. Does this conversation remind you of any that you have had? We don’t just have such conversations about policies. We also have them about people. Way too often, we are being too harsh.

We overestimate how much we know. We ‘know know’ that we are right, we ‘know’ that there isn’t enough information in the world that will make us switch to policy B. Often, the arrogance of this belief is lost on us. As Kahneman puts it, we are ‘ignorant of our own ignorance.’ How could it be anything else? Remember the aphorism, “the more you know, the more you know you don’t know”? The aphorism may not be true but it gets the broad point right. The ignorant are overconfident. And we are ignorant. The human condition is such that it doesn’t leave much room for being anything else (see the top of this page).

Here’s one way to judge your ignorance (see here for some other ideas). Start by recounting what you know. Sit in front of a computer and type it up. Go for it. And then add a sentence about how you know it. Do you recall reading any detailed information about this person or issue? From where? Would you have bought a car based on that much information?

We don’t just overestimate what we know; we also underestimate what other people know. Anybody with different opinions must know less than I do. It couldn’t be that they know more, could it?

Both being overconfident about what we know and underestimating what other people know lead to the same thing: being too confident about the rightness of our cause and mistaking our prejudices for obvious truths.

George Carlin got it right. “Have you ever noticed that anybody driving slower than you is an idiot, and anyone going faster than you is a maniac?” It seems the way we judge drivers is how we judge everything else. Anyone who knows less than you is either being willfully obtuse or an idiot. And those who know more than you just look like ‘maniacs.’

Firmly Against Posing Firmly

31 May

“What is crucial for you as the writer is to express your opinion firmly,” writes William Zinsser in “On Writing Well: An Informal Guide to Writing Nonfiction.” To emphasize the point, Bill repeats it at the end of the paragraph, ending with, “Take your stand with conviction.”

This advice is not for all writers—Bill particularly wants editorial writers to write with a clear point of view.

When Bill was an editorial writer for the New York Herald Tribune, he attended a daily editorial meeting to “discuss what editorials … to write for the next day and what position …[to] take.” Bill recollects,

“Frequently [they] weren’t quite sure, especially the writer who was an expert on Latin America.

“What about that coup in Uruguay?” the editor would ask. “It could represent progress for the economy,” the writer would reply, “or then again it might destabilize the whole political situation. I suppose I could mention the possible benefits and then—”

The editor would admonish such uncertainty with a curt “let’s not go peeing down both legs.”

Bill approves of taking a side. He likes what the editor is saying if not the language. He calls it the best advice he has received on writing columns. I don’t. Certainty should only come from one source: conviction born from thoughtful consideration of facts and arguments. Don’t feign certainty. Don’t discuss concerns in a perfunctory manner. And don’t discuss concerns at the end.

Surprisingly, Bill agrees with the last bit about not discussing concerns in a perfunctory manner at the end. But for a different reason. He thinks that “last-minute evasions and escapes [cancel strength].”

Don’t be a mug. If there are serious concerns, don’t wait until the end to note them. Note them as they come up.

Bad Hombres: Bad People on the Other Side

8 Dec

Why do many people think that people on the other side are not well motivated? It could be because they think that the other side is less moral than them. And since opprobrium toward the morally defective is the bedrock of society, thinking that the people in the other group are less moral naturally leads people to censure the other group.

But it can’t be that each of two groups simultaneously has better morals than the other. It can only be that people in each group think they are better. This much logic dictates. So, there has to be a self-serving aspect to moral standards. And this is what often leads people to think that the other side is less moral. Accepting this is not the same as accepting moral relativism. For even if we accept that some things are objectively more moral—not being sexist or racist, say—some groups—those that espouse that a certain sex is superior or certain races are better—will still think that they are better.

But how do people come to know of other people’s morals? Some people infer morals from political aims. And that is a perfectly reasonable thing to do as political aims reflect what we value. For instance, a Republican who values ‘life’ may think that Democrats are morally inferior because they support the right to abortion. But the inference is fraught with error. As matters stand, Democrats would also like women to not go through the painful decision of aborting a fetus. They just want there to be an easy and safe way for women should they need to.

Sometimes people infer morals from policies. But support for different policies can stem from having different information or beliefs about causal claims. For instance, Democrats may support a carbon tax because they believe (correctly) the world is warming and because they think that the carbon tax is what will help reduce global warming the best and protect American interests. Republicans may dispute any part of that chain of logic. The point isn’t what is being disputed per se, but what people will infer about others if they just had information about the policies they support. Hanlon’s razor is often a good rule.

Estimating Bias and Error in Perceptions of Group Composition

14 Nov

People’s reports of perceptions of the share of various groups in the population are typically biased. The bias is generally greater for smaller groups. The bias also appears to vary by how people feel about the group—they are likelier to think that the groups they don’t like are bigger—and by stereotypes about the groups (see here and here).

A new paper makes a remarkable claim: “explicit estimates are not direct reflections of perceptions, but systematic transformations of those perceptions. As a result, surveys and polls that ask participants to estimate demographic proportions cannot be interpreted as direct measures of participants’ (mis)information since a large portion of apparent error on any particular question will likely reflect rescaling toward a more moderate expected value…”
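To make concrete what a rescaling of that sort could look like, here is an illustrative sketch; the linear shrinkage form and the numbers are my assumptions for illustration, not the paper’s model:

```python
# Illustrative sketch of "rescaling toward a more moderate expected value": a person
# whose private perception of a small group's share is accurate may still report a
# number pulled toward a moderate prior, producing apparent overestimation.
def reported_share(perceived, prior=0.30, weight=0.6):
    """Report = weighted average of the privately perceived share and a moderate prior."""
    return weight * perceived + (1 - weight) * prior

true_share = 0.05
perceived = 0.05                    # suppose the underlying perception is accurate
report = reported_share(perceived)  # 0.6 * 0.05 + 0.4 * 0.30 = 0.15
print(f"apparent error: {report - true_share:+.2f}")  # looks like overestimation by 0.10
```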

The claim is supported by a figure that takes the form of plotting a curve over averages. (It also reports results from other papers that base their inferences on similar figures.)

The evidence doesn’t seem adequate for the claim. Ideally, we want to plot the curves within people and show that the curves are roughly the same. (I doubt that is the case.)

Second, it is one thing to claim that the reports of perceptions follow a particular rescaling formula and another to claim that people are aware of what they are doing. I doubt that people are.

Third, if the claim that ‘a large portion of apparent error on any particular question will likely reflect rescaling toward a more moderate expected value’ is true, then presenting people with correct information ought not to change how they think about groups, e.g., the perceived threat from immigrants. The calibrated error should be a much better moderator than the raw error. Again, I doubt it.

But I could be proven wrong about each. And I am ok with that. The goal is to learn the right thing, not to be proven right.