Missing Women on the Streets of Delhi

19 Nov

In 1990, Amartya Sen estimated that more than 100 million women were missing in South and West Asia, and China. His NYRB article shed light on sex discrimination in parts of Asia, highlighting, among other things, pathologies such as sex-selective abortion and biases in nutrition, healthcare, and schooling.

We aim to extend that line of inquiry and shed light on the question: “How many women are missing from public life?” In particular, we aim to answer **how many women are missing from the streets.**

To estimate ‘missing women,’ we need a baseline. While there are some plausible ‘taste-based’ reasons for the sex ratio on the streets to differ from 50-50, for the current analysis, I assume that in a gender-equal society, roughly equal numbers of men and women are out on the streets. And I attribute any skew to the real (and perceived) threat of molestation, violence, and harassment; to patriarchy (whether wives, daughters, and sisters are allowed to go out); to discrimination in employment; and to similar such things.
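Under this 50-50 baseline, the number of missing women in any count is simply the gap between the men and the women observed. A minimal Python sketch (the function name and the example counts are hypothetical, for illustration only):

```python
def missing_women(men: int, women: int) -> dict:
    """Missing women under an assumed 50-50 baseline.

    With equal numbers of men and women expected on the street, the
    expected number of women equals the number of men observed, so the
    shortfall is men - women (floored at zero if women outnumber men).
    """
    if men <= 0:
        raise ValueError("need at least one man to set the baseline")
    missing = max(men - women, 0)
    return {
        "missing": missing,
        # Fraction of the expected women who are absent.
        "share_missing": missing / men,
    }


# Hypothetical counts from one photo: 9 men and 2 women.
print(missing_women(9, 2))
```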

Note About Data and Measurement

Of all the people out on the street over the course of a typical day in Delhi, what proportion are women? To answer that, I devised what I thought was a pretty reasonable sampling plan and a pretty clever data collection strategy (see here). Essentially, we would send people to random street locations at random times, ask them to take photos at head height, and then crowdsource counts of the total number of people and the number of women in each picture.
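A minimal sketch of the random-assignment step, in Python. It assumes a fixed list of candidate street locations (hypothetical placeholders here; a real plan would sample points from a street map) and draws each visit's day and minute uniformly within a daily window, taken here as 10 am to 7 pm:

```python
import random
from datetime import date, datetime, time, timedelta


def draw_visits(locations, start_day, n_days, first_hour, last_hour, n, seed=None):
    """Draw n (location, timestamp) visits uniformly at random.

    Each visit picks a location from `locations`, a day in
    [start_day, start_day + n_days), and a minute in the daily window
    [first_hour:00, last_hour:00).
    """
    rng = random.Random(seed)
    window_minutes = (last_hour - first_hour) * 60
    visits = []
    for _ in range(n):
        location = rng.choice(locations)
        day = start_day + timedelta(days=rng.randrange(n_days))
        offset = timedelta(minutes=rng.randrange(window_minutes))
        visits.append((location, datetime.combine(day, time(first_hour)) + offset))
    return visits


# Hypothetical locations and dates, for illustration only.
sites = ["site_01", "site_02", "site_03"]
for site, when in draw_visits(sites, date(2016, 11, 12), 60, 10, 19, 5, seed=42):
    print(site, when.strftime("%Y-%m-%d %H:%M"))
```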

The data we finally collected in this round bear little resemblance to the original data collection plan. As to why the data collection went off the rails, we have nothing to share publicly for now. The map of the places from which we collected data, though, lays bare the problems.

Data, Scripts, and Analyses are posted here.


The data were collected between 2016-11-12 and 2017-01-11, roughly between 10 am and 7 pm. In all, we collected 1,958 photos from 196 locations. Pooling everyone counted, about 81.5% of the people on the street were men. The average of the per-location proportions of men was 86.7%, which suggests that busier places have somewhat more women.
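The two figures differ because one is a ratio of sums (every person counts equally, so busier locations weigh more) and the other a mean of ratios (every location counts equally). A toy example with made-up counts, in Python, shows how the pooled share of men falls below the location average when the busiest place has the most women:

```python
def pooled_share_men(counts):
    """Men as a share of everyone counted across all locations (ratio of sums)."""
    men = sum(m for m, _ in counts)
    people = sum(m + w for m, w in counts)
    return men / people


def mean_location_share_men(counts):
    """Average of each location's share of men (mean of ratios)."""
    shares = [m / (m + w) for m, w in counts]
    return sum(shares) / len(shares)


# Hypothetical (men, women) counts at three locations; the first is the busiest.
counts = [(60, 40), (9, 1), (8, 2)]
print(f"pooled: {pooled_share_men(counts):.3f}")                # 77 / 120
print(f"location mean: {mean_location_share_men(counts):.3f}")  # (0.6 + 0.9 + 0.8) / 3
```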

Estimating Bias and Error in Perceptions of Group Composition

14 Nov

People’s reports of perceptions of the share of various groups in the population are typically biased. The bias is generally greater for smaller groups. The bias also appears to vary with how people feel about the group (they are likelier to think that groups they don’t like are bigger) and with stereotypes about it (see Ahler and Sood).

A new paper makes a remarkable claim: “explicit estimates are not direct reflections of perceptions, but systematic transformations of those perceptions. As a result, surveys and polls that ask participants to estimate demographic proportions cannot be interpreted as direct measures of participants’ (mis)information, since a large portion of apparent error on any particular question will likely reflect rescaling toward a more moderate expected value…”

The claim is supported by a figure that plots a curve fit over averages across respondents. (The paper also reports results from other papers that base their inferences on similar figures.)

First, the evidence doesn’t fit the claim. Ideally, we would fit curves within people and show that the curves are roughly the same. (I doubt that they are.)
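To see why a curve over averages can mislead, consider a toy simulation in Python (an illustration under stated assumptions, not a reanalysis of the paper). Each simulated respondent reports the true share plus idiosyncratic noise, clipped to the [0, 1] range. No one rescales anything, yet the average report is pulled toward the middle for very small and very large groups; the noise level of 0.2 is an arbitrary assumption.

```python
import random


def mean_report(true_share, n_people=20000, noise_sd=0.2, seed=0):
    """Average report when each person reports truth + noise, clipped to [0, 1].

    Assumption: no individual applies any rescaling; the only distortions
    are the noise and the bounded response scale.
    """
    rng = random.Random(seed)
    total = 0.0
    for _ in range(n_people):
        report = true_share + rng.gauss(0.0, noise_sd)
        total += min(1.0, max(0.0, report))
    return total / n_people


for p in (0.05, 0.25, 0.5, 0.75, 0.95):
    print(f"true {p:.2f} -> mean report {mean_report(p):.3f}")
```

With repeated estimates per respondent, fitting curves within people would separate this aggregation effect from genuine rescaling.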

Second, it is one thing to claim that the reports of perceptions follow a particular rescaling formula, and another to claim that people are aware of what they are doing. I doubt that people are.

Third, if the claim that ‘a large portion of apparent error on any particular question will likely reflect rescaling toward a more moderate expected value’ is true, then presenting people with correct information ought not to change how they think about groups, e.g., the perceived threat from immigrants. The calibrated error should be a much better moderator than the raw error. Again, I doubt it.
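One way to make this test concrete is to compute, for each respondent, both the raw error and the error left over after rescaling, and then compare how well each moderates attitudes. The Python sketch below writes the rescaling as a log-odds weighted average between the true share and a moderate expected value; that functional form and the values of `gamma` and `prior` are assumptions for illustration, not necessarily the paper's specification.

```python
import math


def logit(x):
    return math.log(x / (1 - x))


def inv_logit(z):
    return 1 / (1 + math.exp(-z))


def rescaled(true_share, gamma, prior):
    """Predicted report under an assumed log-odds rescaling.

    The report's log-odds are a weighted average of the true share's
    log-odds (weight gamma) and a moderate expected value `prior`.
    """
    return inv_logit(gamma * logit(true_share) + (1 - gamma) * logit(prior))


def raw_and_calibrated_error(report, true_share, gamma=0.6, prior=0.5):
    """Raw error vs. the error left over after the assumed rescaling."""
    raw = report - true_share
    calibrated = report - rescaled(true_share, gamma, prior)
    return raw, calibrated


# Hypothetical respondent: true share 10%, reported share 20%.
raw, calibrated = raw_and_calibrated_error(0.20, 0.10)
print(f"raw error: {raw:+.3f}, calibrated error: {calibrated:+.3f}")
```

Under these assumed parameters, most of the raw error disappears after calibration, which is exactly why the two errors should behave very differently as moderators if the paper's claim holds.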

But I could be proven wrong about each. And I am ok with that. The goal is to learn the right thing, not to be proven right.

Learning About [the] Loss (Function)

7 Nov

One of the things we often want to learn is the actual loss function people use for discounting ideological distance between themselves and a legislator. Often people try to learn the loss function using actual distances. But if the aim is to learn the loss function, perceived distance is better than actual distance, because perceived distance is what the voter believes to be true. One can then use the estimated function to simulate scenarios in which perceptions = fact.
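A minimal Python sketch of the idea, assuming (hypothetically) that a voter's rating of a legislator falls off as `a - b * |d| ** k` in perceived distance `d`, where `k = 1` is a linear loss and `k = 2` a quadratic one. The exponent is chosen by grid search, with least squares for `a` and `b` at each candidate, on simulated data; the fitted function is then evaluated at an actual distance to simulate the perceptions = fact scenario.

```python
import random


def fit_loss(distances, ratings, exponents):
    """Grid-search k in rating = a - b * |d| ** k; OLS gives a and b for each k."""
    best = None
    for k in exponents:
        x = [abs(d) ** k for d in distances]
        n = len(x)
        mx, my = sum(x) / n, sum(ratings) / n
        sxx = sum((xi - mx) ** 2 for xi in x)
        sxy = sum((xi - mx) * (yi - my) for xi, yi in zip(x, ratings))
        slope = sxy / sxx
        intercept = my - slope * mx
        sse = sum((yi - intercept - slope * xi) ** 2 for xi, yi in zip(x, ratings))
        if best is None or sse < best[0]:
            best = (sse, k, intercept, -slope)
    _, k, a, b = best
    return k, a, b


def predicted_rating(d, k, a, b):
    return a - b * abs(d) ** k


# Simulated perceived distances and ratings with a true exponent of 1.5.
rng = random.Random(1)
perceived = [rng.uniform(-3, 3) for _ in range(500)]
ratings = [10 - 2 * abs(d) ** 1.5 + rng.gauss(0, 0.5) for d in perceived]
grid = [round(0.5 + 0.1 * i, 1) for i in range(26)]  # 0.5, 0.6, ..., 3.0
k, a, b = fit_loss(perceived, ratings, grid)
print(f"k = {k}, a = {a:.2f}, b = {b:.2f}")

# Scenario: the voter perceives the actual distance (a hypothetical value).
actual_distance = 1.0
print(f"rating at actual distance: {predicted_rating(actual_distance, k, a, b):.2f}")
```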

Confirmation Bias: Confirming Media Bias

31 Oct

It used to be that searches for ‘Fox News bias’ were far more common than searches for ‘CNN bias’. Not anymore. The other notable thing: correlated peaks around presidential elections.

Note also that searches around midterm elections barely rise above the noise.

God, Weather, and News vs. Porn

22 Oct

The Internet is for porn (Avenue Q). So it makes sense to measure things on the Internet in porn units.

I jest, just a bit.

In Everybody Lies, Seth Stephens-Davidowitz points out that people search for porn more than for weather on Google. Data from Google Trends for the disbelievers.

But how do searches for news fare? Surprisingly well. And it seems the new president is causing interest in news to outstrip interest in porn. Worrying, if you take Posner’s point that people’s disinterest in politics is a sign that they think the system is working reasonably well. The last time searches for news > porn was when another Republican was in the White House!

How are searches for porn affected by Ramadan? For an answer, we turn to Google Trends data from Pakistan.

In Ireland, though, it appears that of late searches for porn increase around Christmas.

Make Phone Meetings Great Again!

20 Oct

Remote teams live and die by the phone meeting. So, how can we prevent death? Channel Rumsfeld. Begin with known knowns. Have a clear agenda, have a discussion leader, and end with a summary. Do those well. Then try out some ideas to address well-known challenges: disengagement, social friction, exclusion, time wastage, and inability to follow what’s going on.

  1. People on the phone can’t always tell apart the two uses of a brief silence: a pause within a speech and a signal that the floor is open for discussion. The speaker can address the issue by signaling the end of the speech with a phrase. A speaker may start by noting: ‘at the end of what I have to say, I will formally open up the floor and go around alphabetically among those whose speakers are unmuted.’
  2. Having ‘too many’ people = wasted time + uninterested participants. How many is too many? The maximum number of people who can productively engage when everyone is expected to contribute is probably as low as 4–5. For larger groups, ask people to discuss independently within their own teams and share notes.
  3. Prevent or cure rambling, as its side effects are the same as above: wasted time and lost interest. If people are having trouble articulating a point, the discussion leader should take responsibility for energetically working out what they are trying to get at. The discussion leader may also refer the person to the shared document to sketch out the idea and try again.
  4. Sending slides and other material in advance, and having everyone actually read it, is important. Just tell people that if they didn’t find time to read and think independently, they should opt out (semi-private opt-outs via email to meeting organizers should be allowed). The job of a meeting is not spoon-feeding.
  5. Keeping people on the same page:
    • Visual aids, e.g., slides, are useful in bringing people onto the same page.
    • Taking notes on a shared screen can also help people see that progress is being made. The document can be shared, which allows others to contribute and organize simultaneously.

Most importantly, avoid meetings when you can. If the aim of a meeting = transferring information, a meeting makes sense < 10% of the time. Alternatives = write out a document, or create a slideshow or a video and send it along.

Incentives to Care

11 Sep

A lot of people have their lives cut short because they eat too much and exercise too little. Worse, the quality of their shortened lives is typically much lower as a result of ‘avoidable’ illnesses that stem from ‘bad behavior.’ And that isn’t all. People who are not feeling great are unlikely to be as productive as those who are. Ill-health also imposes a significant psychological cost on loved ones. The net social cost is likely enormous.

One way to reduce such costly avoidable misery is to invest upfront. Teach people good habits and psychological skills early on, and they will be less likely to harm themselves.

So why do we invest so little up front? Especially when we know that people are ill-informed (about the consequences of their actions) and myopic.

Part of the answer is that there are few incentives for anyone else to care. Health insurance companies don’t make their profits by caring. They make them by investing wisely. And by minimizing ‘avoidable’ short-term costs. If a member is unlikely to stick with a health plan for life, why invest in their long-term welfare? Or work to minimize negative externalities that may affect the next generation?

One way to make health insurers care is to rate them on the estimated quality-adjusted life years saved by the interventions they sponsored. That needs good interventions and good data science. And that is an opportunity. Another way is to get the government to invest heavily early on to address this market failure. Another version would be to get the government to subsidize care that reduces long-term costs.
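A minimal Python sketch of such a rating metric: quality-adjusted life years gained are the yearly differences in quality weight (0 = dead, 1 = full health) between the path with the intervention and the path without it, optionally discounted. The paths, the horizon, and the 3% discount rate below are hypothetical; producing credible counterfactual paths is where the data science comes in.

```python
def qalys_gained(quality_with, quality_without, discount_rate=0.0):
    """QALYs gained by an intervention over a fixed horizon of years.

    quality_with / quality_without: per-year quality weights with and
    without the intervention (a year not lived counts as 0). Year t's
    difference is divided by (1 + discount_rate) ** t, with t starting at 0.
    """
    if len(quality_with) != len(quality_without):
        raise ValueError("both paths need the same horizon")
    return sum(
        (w - wo) / (1 + discount_rate) ** t
        for t, (w, wo) in enumerate(zip(quality_with, quality_without))
    )


# Hypothetical member over five years: without the intervention, health
# declines and the member dies after year three.
with_intervention = [0.9, 0.9, 0.85, 0.85, 0.8]
without_intervention = [0.9, 0.8, 0.6, 0.0, 0.0]
print(f"undiscounted: {qalys_gained(with_intervention, without_intervention):.2f}")
print(f"discounted at 3%: {qalys_gained(with_intervention, without_intervention, 0.03):.2f}")
```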

Measuring Segregation

31 Aug

The dissimilarity index is a measure of segregation. It is defined as follows:

D = \frac{1}{2} \sum_{i=1}^{n} \left| \frac{g_{i1}}{G_1} - \frac{g_{i2}}{G_2} \right|

where g_{i1} is the population of group 1 in the ith area (g_{i2} likewise for group 2),
G_1 is the population of group 1 in the larger area against which dissimilarity is measured (G_2 likewise for group 2),
and n is the number of areas.
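The index translates directly into code. A minimal Python sketch, with each area given as a pair of counts (g_{i1}, g_{i2}); D is 0 when every area has the same composition as the larger area and 1 under complete segregation:

```python
def dissimilarity(areas):
    """Dissimilarity index D = 1/2 * sum_i |g_i1 / G_1 - g_i2 / G_2|.

    areas: list of (g_i1, g_i2) counts of group 1 and group 2 in each area i.
    G_1 and G_2 are the group totals over the larger area.
    """
    G1 = sum(g1 for g1, _ in areas)
    G2 = sum(g2 for _, g2 in areas)
    return 0.5 * sum(abs(g1 / G1 - g2 / G2) for g1, g2 in areas)


print(dissimilarity([(100, 0), (0, 100)]))  # complete segregation
print(dissimilarity([(50, 10), (50, 10)]))  # identical composition everywhere
print(dissimilarity([(80, 20), (20, 80)]))  # partial segregation
```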

The measure suffers from a couple of issues:

1. Lumpiness. Even within a small area, are black people at one end and white people at the other?
2. Choice of baseline. If the larger area (say a state) is 95% white (Iowa is 91.3% white), dissimilarity is naturally likely to be small.

One way to address the concern about lumpiness is to provide an estimate of the spatial variance of the quantity of interest. But to measure variance, you need local measures of the quantity of interest. One way to arrive at local measures is as follows:

1. Get the latitude and longitude of each address and create a distance matrix across all addresses. Start with Euclidean distances, though smarter measures that take account of physical features are a natural next step. (For those worried about computing huge matrices, the good news is that the computation can be parallelized.)
2. For each address, find the n closest addresses and estimate the quantity of interest among them. Where multiple houses are at a similar distance, sample randomly among them or include all. One advantage of using the n closest addresses rather than the addresses in a particular area is that it naturally accounts for variations in density.
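The two steps above can be sketched in Python. This is a brute-force illustration: it uses plain Euclidean distance on (latitude, longitude) as a first pass, breaks ties by sort order rather than at random, and the coordinates and group labels in the example are hypothetical. The spatial variance of the resulting local shares then summarizes lumpiness.

```python
import math
import statistics


def local_shares(points, groups, n_neighbors, target):
    """For each address, the share of its n nearest other addresses in `target`.

    points: list of (lat, lon); groups: a group label per address.
    Brute force, O(N^2 log N): fine for a sketch, not for a whole city.
    """
    shares = []
    for i, p in enumerate(points):
        dists = sorted(
            (math.dist(p, q), j) for j, q in enumerate(points) if j != i
        )
        nearest = [j for _, j in dists[:n_neighbors]]
        shares.append(sum(groups[j] == target for j in nearest) / len(nearest))
    return shares


# Hypothetical layout: two tight clusters, one per group.
points = [(0, 0), (0, 1), (1, 0), (1, 1), (10, 10), (10, 11), (11, 10), (11, 11)]
groups = ["A"] * 4 + ["B"] * 4
shares = local_shares(points, groups, n_neighbors=3, target="B")
print(shares)
print("spatial variance:", statistics.pvariance(shares))
```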

But once you have arrived at the local measure, why report only the variance? Why not report means of compelling common-sense metrics, like the proportion of addresses (people) for whom the closest house has people of another race?
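A minimal Python sketch of that last metric: the share of addresses whose single nearest neighbor belongs to a different group. As above, it assumes Euclidean distance on (latitude, longitude), breaks ties by index order, and uses hypothetical coordinates and labels.

```python
import math


def share_nearest_is_other_group(points, groups):
    """Proportion of addresses whose nearest other address has a different label."""
    count = 0
    for i, p in enumerate(points):
        _, j = min((math.dist(p, q), j) for j, q in enumerate(points) if j != i)
        count += groups[j] != groups[i]
    return count / len(points)


# Hypothetical layouts: fully clustered vs. perfectly alternating.
clustered = [(0, 0), (0, 1), (10, 10), (10, 11)]
alternating = [(0, 0), (1, 0), (2, 0), (3, 0)]
print(share_nearest_is_other_group(clustered, ["A", "A", "B", "B"]))
print(share_nearest_is_other_group(alternating, ["A", "B", "A", "B"]))
```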

As for baseline numbers (generally just a couple of numbers): they are there to help you interpret. They can be brought in later.


21 Aug

When strapped for time, some resort to wishful thinking, others to lashing out. Both are illogical. If you are strapped for time, it is either because you scheduled poorly or because you were a victim of unanticipated obligations. Both are understandable, but neither justifies ‘acting out.’ So don’t.

Whatever the reason, being strapped for time means either that some things won’t get done on time, or that you will have to work harder, or that you will need more resources (yours or someone else’s), or all three. And the only things to do are:

  1. Triage,
  2. Ask for help, and
  3. Communicate effectively with those affected.

If you have landed in the soup because of poor scheduling, for instance, by not budgeting any time to deal with things you haven’t scheduled, make a note. And improve.

And since it is never rational to worry (it is at best unproductive, and at worst corrosive), avoid it like the plague.