## Incentives to care

11 Sep

A lot of people have their lives cut short because they eat too much and exercise too little. Worse, the quality of their shortened lives is typically much lower as a result of ‘avoidable’ illnesses that stem from ‘bad behavior.’ And that isn’t all. People who are not feeling great are unlikely to be as productive as those who are. Ill-health also imposes a significant psychological cost on loved ones. The net social cost is likely enormous.

One way to reduce such costly avoidable misery is to invest upfront. Teach people good habits and psychological skills early on, and they will be less likely to ‘self-harm.’

So why do we invest so little up front? Especially when we know that people are ill-informed (about consequences of their actions) and myopic.

Part of the answer is that there are few incentives for anyone else to care. Health insurance companies don’t make their profits by caring. They make them by investing wisely. And by minimizing ‘avoidable’ short-term costs. If a member is unlikely to stick with a health plan for life, why invest in their long-term welfare? Or work to minimize negative externalities that may affect the next generation?

One way to make health insurers care is to rate them on estimated quality-adjusted years saved due to interventions they sponsored. That needs good interventions and good data science. And that is an opportunity. Another way is to get the government to invest heavily early on to address this market failure. Another version would be to get the government to subsidize care that reduces long-term costs.

## Measuring Segregation

31 Aug

Dissimilarity index is a measure of segregation. It runs as follows:

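$\frac{1}{2} \sum\limits_{i=1}^{n} \left| \frac{g_{i1}}{G_1} - \frac{g_{i2}}{G_2} \right|$

where:

$g_{i1}$ is the population of group $g_1$ in the $i$th area

$G_1$ is the population of group $g_1$ in the larger area against which dissimilarity is being measured

$g_{i2}$ and $G_2$ are the same quantities for group $g_2$, and $n$ is the number of areas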
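
To make the formula concrete, here is a minimal sketch of the computation in Python. It uses the standard definition, which sums the absolute value of each area's difference in shares. The function name and the per-area counts are made up for illustration.

```python
def dissimilarity_index(group1_counts, group2_counts):
    """Return 0.5 * sum_i |g_i1 / G_1 - g_i2 / G_2| over the areas."""
    if len(group1_counts) != len(group2_counts):
        raise ValueError("both groups need one count per area")
    total1 = sum(group1_counts)  # G_1: group 1 population in the larger area
    total2 = sum(group2_counts)  # G_2: group 2 population in the larger area
    if total1 == 0 or total2 == 0:
        raise ValueError("each group needs at least one member")
    return 0.5 * sum(
        abs(g1 / total1 - g2 / total2)
        for g1, g2 in zip(group1_counts, group2_counts)
    )


# Made-up counts: both groups are spread across three areas in the same proportions
even = dissimilarity_index([10, 20, 30], [20, 40, 60])  # 0.0

# Made-up counts: the two groups never share an area
split = dissimilarity_index([50, 0], [0, 50])  # 1.0
```

The index is 0 when both groups are distributed identically across areas and 1 when they never share an area.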

The measure suffers from a couple of issues:

1. Concerns about lumpiness. Even in a small area, are black people at one end, white people at another?
2. Choice of baseline. If the larger area (say a state) is 95% white (Iowa is 91.3% white), dissimilarity is naturally likely to be small.

One way to address the concern about lumpiness is to provide an estimate of the spatial variance of the quantity of interest. But to measure variance, you need local measures of the quantity of interest. One way to arrive at local measures is as follows:

1. Create a distance matrix across all addresses. Get latitude and longitude. And start with Euclidean distances, though smart measures that take account of physical features are a natural next step. (For those worried about computing super huge matrices, the good news is that computation can be parallelized.)
2. For each address, find the n closest addresses and estimate the quantity of interest. Where multiple houses are the same distance away, sample randomly or include all. One advantage of using the n closest addresses, rather than all addresses in a particular area, is that it naturally accounts for variations in density.

But once you have arrived at the local measure, why just report variance? Why not report means of compelling common-sense metrics, like the proportion of addresses (people) for whom the closest house has people of another race?
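
A rough sketch of the local measure, under several assumptions: coordinates are treated as planar so that Euclidean distance applies, neighbors are found by brute force, ties are broken by index rather than by random sampling, and the addresses and group labels are invented. The function names are hypothetical.

```python
import math
from statistics import pvariance


def nearest_neighbors(points, index, n):
    """Indices of the n points closest to points[index] (Euclidean, brute force)."""
    x0, y0 = points[index]
    others = [
        (math.hypot(x - x0, y - y0), j)
        for j, (x, y) in enumerate(points)
        if j != index
    ]
    others.sort()  # ties are broken by index here, not by random sampling
    return [j for _, j in others[:n]]


def local_other_group_share(points, groups, n):
    """For each address, the share of its n nearest neighbors in a different group."""
    shares = []
    for i in range(len(points)):
        neighbors = nearest_neighbors(points, i, n)
        different = sum(groups[j] != groups[i] for j in neighbors)
        shares.append(different / len(neighbors))
    return shares


# Invented addresses: two tight clusters, one per group
points = [(0.0, 0.0), (0.0, 1.0), (1.0, 0.0), (10.0, 10.0), (10.0, 11.0), (11.0, 10.0)]
groups = ["a", "a", "a", "b", "b", "b"]

# Proportion of addresses whose single closest neighbor is of another group
closest_differs = local_other_group_share(points, groups, n=1)
share_closest_differs = sum(closest_differs) / len(closest_differs)

# Spatial variance of the local measure when using the 3 closest addresses
spatial_variance = pvariance(local_other_group_share(points, groups, n=3))
```

For real data, a spatial index (for example a k-d tree) would replace the brute-force search, and great-circle or travel distances would replace the planar approximation.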

As for baseline numbers (generally just a couple of numbers): they are there to help you interpret. They can be brought in later.

## Unstrapped

21 Aug

When strapped for time, some resort to wishful thinking, others to lashing out. Both are illogical. If you are strapped for time, it is either because you scheduled poorly or because you were a victim of unanticipated obligations. Both are understandable, but neither justifies ‘acting out.’ So don’t.

Whatever the reason, being strapped for time means either that some things won’t get done on time, or that you will have to work harder, or that you will need more resources (yours or someone else’s), or all three. And the only things to do are:

1. Triage
2. Communicate effectively to those affected

If you have landed in soup because of poor scheduling, for instance, by not budgeting any time to deal with things you haven’t scheduled, make a note. And improve.

And since it is never rational to worry (it is at best unproductive, and at worst corrosive), avoid it like the plague.

## How do we know?

17 Aug

How can fallible creatures like us (claim to) know something? The scientific method is about answering that question well. To answer the question well, we have made at least three big innovations:

1. Empiricism. But no privileged observer. What you observe should be reproducible by all others.

2. Open to criticism: If you are not convinced about the method of observation or the claims being made, criticize. Offer reason or proof.

3. Mathematical Foundations: Reliance on math or formal logic to deduce what claims can be made if certain conditions are met.

These innovations along with two more innovations have allowed us to ‘scale.’ Foremost among the innovations that allow us to scale is our ability to work together. And our ability to preserve information on stone, paper, electrons, allows us to collaborate with and build on the work done by people who are now dead. The same principle that allows us to build as gargantuan a structure as the Hoover Dam and entire cities allows us to learn about complex phenomena. And that takes us to the final principle of science.

## Peer to Peer

20 Mar

Peers are equals, except as reviewers, when they are more like capricious dictators. (Or when they are members of a peerage.)

We review our peers’ work because we know that we are all fallible. And because we know that the single best way we can overcome our own limitations is by relying on well-motivated, informed others. We review to catch what our peers may have missed, to flag important methodological issues, to provide suggestions for clarifying and improving the presentation of results, among other such things. But given a disappointingly long history of capricious reviews, authors need assurance. So consider including in the next review a version of the following note:

Reviewers are fallible too. So this review doesn’t come with the implied contract to follow all ill-advised things or suffer. If you disagree with something, I would appreciate a small note. But rejecting a bad proposal is as important as accepting a good one.

Fear no capriciousness. And I wish you well.

## (Software) Product Development Cycle

6 Mar

1. Solicit Ideas
   Talk to customers, analyze usage data, talk to peers, managers, think, hold competitions, raffles,…

For each idea:

2. Define the idea
   Write out the idea for clarity: at least 5–10 sentences. Do some due diligence to see what else is there. Learn and revise (including abandon).
3. Does the idea make business sense? Does it make sense to the developers (if the idea is mechanistic or implementation related)?
   Ideally, peer review. But at the minimum: Talk about your idea with the manager(s) and the developers. If both manager(s) and developer(s) agree (for some mechanistic things, only developers need to agree), move to the next step.
4. Prototype
   This is optional, and only for major, complex innovations that can be easily prototyped. Write code. Produce results. Peer review, if needed. If the results don’t make sense, abandon. If they do, move to the next step.
5. Write the specifications
   Spend no less than 20% of the entire development time on writing the specs, including proposed functions, options, unit tests, concerns, implications. Run the specifications by developers, get this peer reviewed, improve, and finalize.
6. Set Priority and Release Target
   Talk to the manager about the priority order of the change, and assign it to a particular release cycle.
7. Who Does What by When?
   Create JIRA ticket(s)
8. Code
   General cycle = code, test -> peer review -> code, test -> peer review …
   MVP of Expected Code or Aspects of good code: ‘Code your documentation’ (well-documented), modular, organized, tested, nice style, profiled. Do it once, do it well.

## The Innumerate American

19 Feb

In answering a question, scientists sometimes collect data that answers a different, sometimes yet more important question. And when that happens, scientists sometimes overlook the easter egg. This recently happened to me, or so I think.

Kabir and I recently investigated the extent to which estimates of motivated factual learning are biased (see here). As part of our investigation, we measured numeracy. We asked American adults to answer five very simple questions (the items were taken from Weller et al. 2002):

1. If we roll a fair, six-sided die 1,000 times, on average, how many times would the die come up as an even number? — 500
2. There is a 1% chance of winning a $10 prize in the Megabucks Lottery. On average, how many people would win the $10 prize if 1,000 people each bought a single ticket? — 10
3. If the chance of getting a disease is 20 out of 100, this would be the same as having a ____% chance of getting the disease. — 20
4. If there is a 10% chance of winning a concert ticket, how many people out of 1,000 would be expected to win the ticket? — 100
5. In the PCH Sweepstakes, the chances of winning a car are 1 in a 1,000. What percent of PCH Sweepstakes tickets win a car? — .1%
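
For reference, each answer is a single multiplication. A small sketch that checks the arithmetic with exact fractions (the variable names are mine, not from the survey):

```python
from fractions import Fraction

# 1. Three of a die's six faces are even
even_rolls = 1000 * Fraction(3, 6)  # 500

# 2. A 1% chance across 1,000 tickets
lottery_winners = 1000 * Fraction(1, 100)  # 10

# 3. 20 out of 100, expressed as a percentage
disease_percent = Fraction(20, 100) * 100  # 20

# 4. A 10% chance across 1,000 people
ticket_winners = 1000 * Fraction(10, 100)  # 100

# 5. 1 in 1,000, expressed as a percentage
car_percent = Fraction(1, 1000) * 100  # 1/10, i.e. 0.1%
```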

The average score was about 57%, and the standard deviation was about 30%. Nearly 80% (!) of the people couldn’t answer that 1 in a 1000 chance is .1% (see below). Nearly 38% couldn’t answer that a fair die would turn up, on average, an even number 500 times every 1000 rolls. 36% couldn’t calculate how many people out of a 1000 would win if each had a 1% chance. And 34% couldn’t answer that 20 out of 100 means 20%.

If people have trouble answering these questions, it is likely that they struggle to grasp some of the numbers behind how the budget is allocated, or for that matter, how to craft their own family’s budget. The low scores also amply illustrate that the education system fails Americans.

Given the importance of numeracy in a wide variety of domains, it is vital that we pay greater attention to improving it. The problem is also tractable: with the advent of good self-learning tools, it is possible to intervene at scale. Solving it is also liable to be good business. Given numeracy is liable to improve people’s capacity to count calories, make better financial decisions, among other things, health insurance companies could lower premiums in exchange for people becoming more numerate, and lending companies could lower interest rates in exchange for increases in numeracy.

## Motivated Citations

13 Jan

The best kind of insight is the ‘duh’ insight — catching something that is exceedingly common, almost routine, but something that no one talks about. I believe this is one such insight.

The standards for citing congenial research (that supports the hypothesis of choice) are considerably lower than the standards for citing uncongenial research. It is an important kind of academic corruption. And it means that the prospects of teleological progress toward truth in science, as currently practiced, are bleak. An alternate ecosystem that provides objective ratings for each piece of research is likely to be more successful. (As opposed to the ‘echo-system’—here are the people who find stuff that ‘agrees’ with what I find—in place today.)

An empirical implication of the point is that the congenial research that gets cited is likely to be published in lower-ranked journals, on average, than the uncongenial research that gets cited. Though, for many of the ‘conflicts’ in science, all sides of the conflict will have top-tier publications, which is to say that the measure is somewhat crude.

The deeper point is that readers generally do not judge the quality of the work cited in support of specific arguments, taking many of the arguments at face value. This, in turn, means that the role of journal rankings is somewhat limited. Or more provocatively, to improve science, we need to make sure that even research published in low-ranked journals is of sufficient quality.

## (Numb)ers

7 Jan

Limited Time Only

Lifespan ~ 80 years*52 weeks/year ~ 4k weeks

If you are 40: 2k weeks to go.

If you are 70: another 500–700 weeks.

Interpolate for rest.

If you read a book a week for 50 years, you will read ~ 2,600 books.

Approximately 130M books have been published.
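
The back-of-the-envelope numbers above are easy to reproduce. A small sketch, assuming the post's round figures of an 80-year lifespan and 52 weeks per year (the helper name is mine):

```python
WEEKS_PER_YEAR = 52
LIFESPAN_YEARS = 80  # the round figure used above, not an actuarial estimate


def weeks_left(age, lifespan=LIFESPAN_YEARS):
    """Weeks remaining at a given age, floored at zero."""
    return max(lifespan - age, 0) * WEEKS_PER_YEAR


total_weeks = LIFESPAN_YEARS * WEEKS_PER_YEAR  # 4160, i.e. ~4k
weeks_at_40 = weeks_left(40)  # 2080, i.e. ~2k
weeks_at_70 = weeks_left(70)  # 520

books_read = 50 * WEEKS_PER_YEAR  # 2600 books at one a week for 50 years
share_of_all_books = books_read / 130_000_000  # 0.00002, i.e. 0.002%
```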

Wage

India’s PPP adjusted median per capita income was $1,497/yr in 2013. So ~ 550M people <$1500/yr.

• Since it is a PPP wage, think about living on $1,500/year in the US

• Another way to think about it: the wage of the median American working for a year ($26,964) > the wage of 18 Indians working for a year

• Yet another way to think about it: Avg. career ~ 35 years (30–65). What the median American will earn in 2 yrs = lifetime earnings of an Indian earning the median income

• Yet another way to think about it: median CEO/median worker wage ~ 4x; median US worker/median Indian worker ~ 18x

• Yet another way to think about it: From a company’s perspective, the median American has to be more than 45x as productive as the median Indian (where shipping costs are 0, as with lots of digital products), or outsourcing or automation makes more sense.
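
The wage comparisons follow from the two median figures quoted above. A quick check of the arithmetic (the constant names are mine, and the 45x figure is not derived here because its inputs are not given):

```python
US_MEDIAN = 26_964  # median American's annual wage, from the post
INDIA_MEDIAN = 1_497  # India's PPP-adjusted median per capita income, from the post
CAREER_YEARS = 35  # ages 30 to 65

# How many median Indian incomes fit in one median American income
wage_ratio = US_MEDIAN / INDIA_MEDIAN  # ~18.0

# Lifetime earnings at India's median income
indian_lifetime = INDIA_MEDIAN * CAREER_YEARS  # 52395

# Years a median American needs to earn the same amount
us_years_to_match = indian_lifetime / US_MEDIAN  # ~1.94
```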