Stereotypical Understanding

The paucity of women in Computer Science, Math, and Engineering in the US is widely, and justly, lamented. Sometimes, the imbalance is attributed to gender stereotypes. But only a small fraction of men study these fields. And in absolute terms, the proportion of women who study these fields is not a great deal lower than the proportion of men who do. So in some ways, the assertion that these fields are stereotypically male is itself a misunderstanding.

For greater clarity, a contrived example: Say that the population is split between two similar-sized groups, A and B. Say only 1% of Group A members study X, while the proportion of Group B members studying X is 1.5%. This means that 60% of those who study X belong to Group B. Or in more dramatic terms: activity X is stereotypically Group B. However, 98.5% of Group B doesn’t study X. And that number is not a whole lot different from 99%, the percentage of Group A that doesn’t study X.
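The arithmetic behind the contrived example can be checked with a few lines of Python. This is only a sketch: the function name `share_of_group_b` is made up for illustration, and the equal group sizes are the assumption stated in the example.

```python
def share_of_group_b(p_x_given_a, p_x_given_b, share_a=0.5):
    """Return p(Group B | X) for a two-group population via Bayes' rule."""
    share_b = 1 - share_a
    joint_a = p_x_given_a * share_a  # p(X and Group A)
    joint_b = p_x_given_b * share_b  # p(X and Group B)
    return joint_b / (joint_a + joint_b)


p_x_given_a, p_x_given_b = 0.01, 0.015  # numbers from the contrived example

p_b_given_x = share_of_group_b(p_x_given_a, p_x_given_b)
gap = p_x_given_b - p_x_given_a

print(f"p(Group B | X) = {p_b_given_x:.0%}")
print(f"p(X | Group B) - p(X | Group A) = {gap:.1%}")
```

The first number is the dramatic one (a clear majority of those studying X are in Group B). The second is the half-percentage-point difference in how likely members of each group are to study X.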

When people say activity X is stereotypically Group B, many interpret it as ‘activity X is quite popular among Group B.’ (That is one big stereotype about stereotypes.) That clearly isn’t so. In fact, the difference between Group A and Group B in preferences for studying X, as inferred from choices (assuming the same choices and utility), is likely pretty small.

Obliviousness to this point is quite common. For instance, it is behind arguments linking terrorism to Muslims. And Muslims typically respond with a version of the argument laid out above: they note that an overwhelming majority of Muslims are peaceful.

One straightforward conclusion from this exercise is that we may be able to make headway in tackling disciplinary stereotypes by elucidating the point in terms of the difference between p(X | Group A) and p(X | Group B) rather than in terms of p(Group A | X).

The Value of Money: Learning from Experiments Offering Money for Correct Answers

Papers at hand:

Prior et al. and Bullock et al.


Two empirical points that we learn from the papers:

1. In the control condition (no money offered), partisan gaps are highly variable. See also: Partisan Retrospection?
(Neither paper comments explicitly on this point, which has implications for proponents of partisan retrospection.)

2. When respondents are offered money for the correct answer, the partisan gap shrinks by about half on average.


Question in front of us: Interpretation of point 2.


Why are there partisan gaps on knowledge items?

1. Different Beliefs: People believe different things to be true because they learn different things. For instance, Republicans learn that Obama is a Muslim, and Democrats that he is an observant Christian. For a clear exposition on what I mean by ‘beliefs’, see Waters of Casablanca.

2. Systematic Lazy Guessing: The number one thing people lie about on knowledge items is that they have the remotest clue about the question being asked. And the reluctance to acknowledge ‘Don’t Know’ is in itself a serious point worthy of investigation and careful interpretation. (My sense is that it tells us something important about humans.) When people guess on items with partisan implications, they may use inference rules. For instance, a Republican, when asked whether the unemployment rate under Obama had increased or decreased, may reason that Obama is a socialist and, since socialism is bad policy, that the unemployment rate must have increased.

3. Cheerleading: Even when people know that things that reflect badly on their party happened, they lie. (I would be surprised if this is common.)


The Quantity of Interest: Different Beliefs.
We do not want: Different Beliefs + Systematic Lazy Guessing


Why would money reduce partisan gaps?

1. Reducing Systematic Lazy Guessing: Bullock et al. pay for ‘Don’t Know’ responses, offering people a small incentive (much smaller than the payment for a correct answer) to confess ignorance. The estimate should be closer to the quantity of interest: ‘Different Beliefs.’

2. Considered Guessing: On being offered money for the correct answer, respondents replace the ‘lazy’ partisan heuristic described above (which is optimal for a boundedly rational human) with more effortful guessing. Replacing Systematic Lazy Guessing with Considered Guessing is good to the extent that Considered Guessing is less partisan. If it is so, the estimate will be closer to the quantity of interest: ‘Different Beliefs.’ (Think of it as a version of correlated measurement error. We are now replacing systematic measurement error with error that is more evenly distributed, if not ‘randomly’ distributed.)

3. Looking up the Correct Answer: People look up answers to take the money on offer. Both papers go some way toward showing that cheating isn’t behind the narrowing of the partisan gap. Bullock et al. use ‘placebo’ questions, and Prior et al. use timing, among other things.

4. Reducing Cheerleading: Respondents for whom the utility from lying is less than the money on offer stop lying. The estimate will be closer to the quantity of interest: ‘Different Beliefs.’

5. Demand Effects: Respondents take the offer of money as a cue that their instinctive response isn’t correct. The estimate may be further away from the quantity of interest: ‘Different Beliefs.’

Sample this: Sampling randomly from the streets

Say you want to learn about the average number of potholes per unit length of paved street in a city. To estimate that quantity, the following sampling plan can be employed:

1. Get all the streets in a city from Google Maps or OSM
2. Starting from one end of the street, split each street into 0.5 km segments till you reach the end of the street. The last segment (or, if the street is shorter than 0.5 km, the only segment) can be shorter than 0.5 km.
3. Get the lat/long of start/end of the segments.
4. Create a database of all the segments: segment_id, street_name, start_lat, start_long, end_lat, end_long
5. Sample from rows of the database
6. Produce a CSV of the sampled segments (subset of step 4)
7. Plot the lat/long on Google Maps, filling in all the area within each segment.
8. Collect data on the highlighted segments.
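
Steps 2 through 6 can be sketched in Python as below. This is a rough illustration under loud assumptions: the two streets and their coordinates are made up, each street is treated as a straight line between its endpoints (real street geometry from Google Maps or OSM is a polyline), and the helper names `split_street`, `build_database`, and `to_csv` are hypothetical.

```python
import csv
import io
import math
import random

SEGMENT_KM = 0.5  # segment length from the sampling plan
FIELDS = ["segment_id", "street_name", "start_lat", "start_long", "end_lat", "end_long"]


def split_street(street_name, length_km, start, end, segment_km=SEGMENT_KM):
    """Split one street into consecutive segments of at most segment_km.

    Assumption: the street is a straight line from start to end, so segment
    endpoints are found by linear interpolation of (lat, long).
    """
    n_segments = max(1, math.ceil(length_km / segment_km))
    segments = []
    for i in range(n_segments):
        # Fractions of the street length where this segment starts and ends;
        # the last segment is clipped at the end of the street.
        f0 = i * segment_km / length_km
        f1 = min((i + 1) * segment_km / length_km, 1.0)
        segments.append({
            "street_name": street_name,
            "start_lat": start[0] + f0 * (end[0] - start[0]),
            "start_long": start[1] + f0 * (end[1] - start[1]),
            "end_lat": start[0] + f1 * (end[0] - start[0]),
            "end_long": start[1] + f1 * (end[1] - start[1]),
        })
    return segments


def build_database(streets):
    """Step 4: one row per segment, with a running segment_id."""
    rows = []
    for name, length_km, start, end in streets:
        for segment in split_street(name, length_km, start, end):
            segment["segment_id"] = len(rows) + 1
            rows.append(segment)
    return rows


def to_csv(rows):
    """Step 6: serialize the rows as CSV text."""
    buffer = io.StringIO()
    writer = csv.DictWriter(buffer, fieldnames=FIELDS)
    writer.writeheader()
    writer.writerows(rows)
    return buffer.getvalue()


# Made-up streets: (name, length in km, (start lat, long), (end lat, long)).
streets = [
    ("Main St", 1.2, (37.0, -122.0), (37.0, -121.9865)),
    ("Short Ln", 0.3, (37.01, -122.0), (37.0127, -122.0)),
]

database = build_database(streets)
rng = random.Random(42)               # fixed seed so the sample is reproducible
sampled = rng.sample(database, k=2)   # step 5: simple random sample of rows
sampled_csv = to_csv(sampled)
print(sampled_csv)
```

Because the final segment of a street can be shorter than 0.5 km, a per-unit-length estimate would presumably divide the pothole count by the total sampled length rather than by the number of sampled segments.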

Ipso Facto: Analysis of Complaints to IPSO

The Independent Press Standards Organisation (IPSO) handles complaints about accuracy etc. in the media in the U.K. Against which media organization are most complaints filed? And against which organization are the complaints most often upheld? We answer these questions using data from the IPSO website. (The data and scripts behind the analysis are posted on GitHub.)

Between its formation in September 2014 and May 20th, 2016, IPSO received 371 complaints. As expected, tabloid newspapers are well represented. Of the 371 complaints, The Telegraph alone received 35 complaints, or about 9.4% of the total. It was followed closely by The Mail with 31 complaints. The Times had 25 complaints filed against it, The Mirror and The Express 22 each, and The Sun 19.

Generally, fewer than half of the complaints were completely or partly upheld. Topping the list were The Express and The Telegraph with 10 upheld complaints each. Following close behind were The Times with 8 upheld complaints, The Mail with 6, and The Sun and the Daily Star with 4 each.

See also the plot of the batting averages of the media organizations with the most complaints against them.
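
For the outlets where the text above gives both a complaint count and an upheld count, a batting average can be computed directly. The sketch below assumes that ‘batting average’ means the share of complaints that were completely or partly upheld. The Mirror and the Daily Star are left out because only one of their two counts is quoted.

```python
# (complaints, upheld) per outlet, copied from the counts quoted in the text.
counts = {
    "The Telegraph": (35, 10),
    "The Mail": (31, 6),
    "The Times": (25, 8),
    "The Express": (22, 10),
    "The Sun": (19, 4),
}

# Assumed definition: upheld complaints divided by all complaints.
batting_average = {
    outlet: upheld / complaints
    for outlet, (complaints, upheld) in counts.items()
}

# Print outlets from highest to lowest share upheld.
for outlet, average in sorted(batting_average.items(), key=lambda kv: kv[1], reverse=True):
    print(f"{outlet}: {average:.1%}")
```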

Clustering and then Classifying: Improving Prediction for Crudely-labeled and Mislabeled Data

Mislabeled and crudely labeled data are common problems in data science. Supervised learning on such data predictably yields poor results. To alleviate the problem, one simple solution is to cluster the data within each label, and then, instead of predicting the original labels, predict the cluster labels. For a class of problems, the method can be shown to always improve both comprehensibility and accuracy.
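
A toy illustration of the idea, not the method from the research note: everything here is an assumption made for the sketch. The data are simulated in one dimension, the crude label ‘A’ hides two sub-groups, the number of clusters per label is chosen by hand, and both the tiny `kmeans_1d` helper and the nearest-centroid classifier stand in for whatever clustering and classification methods one would actually use.

```python
import random


def kmeans_1d(points, k, n_iter=20):
    """Tiny 1-D k-means; initial centroids are evenly spaced quantiles."""
    pts = sorted(points)
    centroids = [pts[(2 * j + 1) * len(pts) // (2 * k)] for j in range(k)]
    for _ in range(n_iter):
        groups = [[] for _ in range(k)]
        for p in pts:
            nearest = min(range(k), key=lambda j: abs(p - centroids[j]))
            groups[nearest].append(p)
        # Keep the old centroid if a cluster ends up empty.
        centroids = [sum(g) / len(g) if g else c for g, c in zip(groups, centroids)]
    return centroids


def nearest_label(x, labelled_centroids):
    """Return the original label attached to the centroid closest to x."""
    return min(labelled_centroids, key=lambda lc: abs(x - lc[0]))[1]


def accuracy(data, labelled_centroids):
    hits = sum(nearest_label(x, labelled_centroids) == y for x, y in data)
    return hits / len(data)


def make_data(rng, n):
    """Crude label 'A' hides two sub-groups (near 0 and 10); 'B' sits near 5."""
    data = [(rng.gauss(0, 1), "A") for _ in range(n)]
    data += [(rng.gauss(10, 1), "A") for _ in range(n)]
    data += [(rng.gauss(5, 1), "B") for _ in range(2 * n)]
    return data


rng = random.Random(0)
train, holdout = make_data(rng, 100), make_data(rng, 100)

by_label = {}
for x, y in train:
    by_label.setdefault(y, []).append(x)

# Baseline: predict the original labels with one centroid per label.
baseline = [(sum(xs) / len(xs), y) for y, xs in by_label.items()]

# Cluster within each label, predict the nearest cluster, map back to its label.
n_clusters = {"A": 2, "B": 1}  # assumed known here; in practice it must be chosen
clustered = [(c, y) for y, xs in by_label.items() for c in kmeans_1d(xs, n_clusters[y])]

baseline_acc = accuracy(holdout, baseline)
clustered_acc = accuracy(holdout, clustered)
print(f"original labels: {baseline_acc:.2f}, cluster labels: {clustered_acc:.2f}")
```

In this setup the two class means nearly coincide, so the classifier trained on the original labels has little to work with, while the cluster centroids separate the three sub-groups. The example is constructed to favor the approach and says nothing about how often real data look like this.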

Detailed research note coming soon!