It used to be that searches for ‘Fox News Bias’ were far more common than searches for ‘CNN bias’. Not anymore. The other notable thing is that both series peak together around presidential elections.
Note also that searches around midterm elections barely rise above the noise.
The Internet is for porn (Avenue Q). So it makes sense to measure things on the Internet in porn units.
I jest, just a bit.
In Everybody Lies, Seth Stephens-Davidowitz points out that people search for porn more than weather on Google. Data from Google Trends for the disbelievers.
But how do searches for news fare? Surprisingly well. And it seems the new president is causing interest in news to outstrip interest in porn. Worrying, if you take Posner’s point that people’s lack of interest in politics is a sign that they think the system is working reasonably well. The last time searches for news exceeded searches for porn was when another Republican was in the White House!
How is the search for porn affected by Ramadan? For an answer, we turn to Google Trends from Pakistan. But you may say that the trend is expected given Ramadan is seen as a period for ritual purification. And that is a reasonable point. But you see the same thing with Eid-ul-Fitr and porn.
And in Ireland, of late, it seems searches for porn increase during Christmas.
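Dissimilarity index is a measure of segregation. It runs as follows:
$D = \frac{1}{2} \sum_{i=1}^{n} \left| \frac{g_{i1}}{G_1} - \frac{g_{i2}}{G_2} \right|$
where:
$g_{i1}$ is the population of group $g_1$ in the ith area
$G_1$ is the population of group $g_1$ in the larger area against which dissimilarity is being measured
and $g_{i2}$ and $G_2$ are defined the same way for the second group, $g_2$.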
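To make the index concrete, here is a minimal sketch in Python. It assumes the standard form of the index: half the sum, across areas, of the absolute difference between each area's share of one group and its share of the other. The function name and the counts are hypothetical, made up for illustration.

```python
def dissimilarity_index(group1_counts, group2_counts):
    """Half the summed absolute gap between each area's share of the two groups."""
    if len(group1_counts) != len(group2_counts):
        raise ValueError("both groups need a count for every area")
    total1 = sum(group1_counts)  # population of group 1 in the larger area
    total2 = sum(group2_counts)  # population of group 2 in the larger area
    if total1 == 0 or total2 == 0:
        raise ValueError("each group needs a nonzero total population")
    return 0.5 * sum(
        abs(a / total1 - b / total2)
        for a, b in zip(group1_counts, group2_counts)
    )


# Hypothetical counts: both groups are spread across areas in the same proportions.
evenly_spread = dissimilarity_index([10, 20, 30, 40], [1, 2, 3, 4])

# Hypothetical counts: the two groups never share an area.
fully_separated = dissimilarity_index([100, 0], [0, 50])
```

The first call comes out at 0 and the second at 1, which are the two ends of the scale.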
The measure suffers from a couple of issues:
- Concerns about lumpiness. Even in a small area, are black people at one end, white people at another?
- Choice of baseline. If the larger area (say a state) is 95% white (Iowa is 91.3% white), dissimilarity is naturally likely to be small.
One way to address the concern about lumpiness is to provide an estimate of the spatial variance of the quantity of interest. But to measure variance, you need local measures of the quantity of interest. One way to arrive at local measures is as follows:
- Create a distance matrix across all addresses. Get latitude and longitude. And start with Euclidean distances, though smart measures that take account of physical features are a natural next step. (For those worried about computing super huge matrices, the good news is that computation can be parallelized.)
- For each address, find the n closest addresses and estimate the quantity of interest. Where multiple houses are at a similar distance, sample randomly or include all. One advantage of using the n closest addresses rather than addresses in a particular area is that it naturally accounts for variations in density.
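The two steps above can be sketched roughly as follows. This is an illustration under simplifying assumptions, not a tested pipeline: coordinates are treated as points on a plane, distances are plain Euclidean, ties are broken by list order rather than by random sampling, and the function name and data are hypothetical.

```python
import math
from statistics import pvariance


def local_shares(coords, in_group, n_neighbors):
    """For each address, the share of its n nearest addresses that are in the group."""
    if not 1 <= n_neighbors < len(coords):
        raise ValueError("n_neighbors must be between 1 and len(coords) - 1")
    shares = []
    for i, (x_i, y_i) in enumerate(coords):
        # Distance from address i to every other address (one row of the distance matrix).
        others = [
            (math.hypot(x_i - x_j, y_i - y_j), j)
            for j, (x_j, y_j) in enumerate(coords)
            if j != i
        ]
        others.sort()  # ties fall back to list order; random sampling is an alternative
        nearest = [j for _, j in others[:n_neighbors]]
        shares.append(sum(in_group[j] for j in nearest) / n_neighbors)
    return shares


# Hypothetical data: two tight clusters, one per group.
coords = [(0, 0), (0, 1), (1, 0), (10, 10), (10, 11), (11, 10)]
in_group = [1, 1, 1, 0, 0, 0]

shares = local_shares(coords, in_group, n_neighbors=2)
spatial_variance = pvariance(shares)
```

Each row of the distance matrix is computed independently of the others, which is why the work parallelizes easily.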
But once you have arrived at the local measure, why just report variance? Why not report means of compelling common-sense metrics, like the proportion of addresses (people) for whom the closest house has people of another race?
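As a rough sketch of one such metric, the snippet below computes the proportion of addresses whose single closest address belongs to a different group. The function name, coordinates, and group labels are hypothetical, distances are plain Euclidean, and ties go to whichever address comes first in the list.

```python
import math


def share_with_other_group_nearest(coords, groups):
    """Proportion of addresses whose closest other address is in a different group."""
    if len(coords) < 2:
        raise ValueError("need at least two addresses")
    count = 0
    for i, (x_i, y_i) in enumerate(coords):
        # Index of the closest other address (first one wins on ties).
        nearest = min(
            (j for j in range(len(coords)) if j != i),
            key=lambda j: math.hypot(x_i - coords[j][0], y_i - coords[j][1]),
        )
        if groups[nearest] != groups[i]:
            count += 1
    return count / len(coords)


# Hypothetical street: four houses along a line.
street = [(0, 0), (1, 0), (2.5, 0), (3, 0)]
labels = ["a", "a", "b", "a"]

share = share_with_other_group_nearest(street, labels)
```

On this made-up street, the last two houses are each other's closest neighbors and differ in group, so the share is one half.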
As for baseline numbers (generally just a couple of numbers): they are there to help you interpret. They can be brought in later.
In answering a question, scientists sometimes collect data that answers a different, sometimes yet more important question. And when that happens, scientists sometimes overlook the easter egg. This recently happened to me, or so I think.
Kabir and I recently investigated the extent to which estimates of motivated factual learning are biased (see here). As part of our investigation, we measured numeracy. We asked American adults to answer five very simple questions (the items were taken from Weller et al. 2002):
- If we roll a fair, six-sided die 1,000 times, on average, how many times would the die come up as an even number? — 500
- There is a 1% chance of winning a $10 prize in the Megabucks Lottery. On average, how many people would win the $10 prize if 1,000 people each bought a single ticket? — 10
- If the chance of getting a disease is 20 out of 100, this would be the same as having a % chance of getting the disease. — 20
- If there is a 10% chance of winning a concert ticket, how many people out of 1,000 would be expected to win the ticket? — 100
- In the PCH Sweepstakes, the chances of winning a car are 1 in a 1,000. What percent of PCH Sweepstakes tickets win a car? — .1%
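For anyone who wants to check the answer key, the arithmetic behind the five items is short. The snippet below uses exact fractions to avoid rounding; the variable names are mine, not part of the scale.

```python
from fractions import Fraction

# Three of the six faces are even, so half of 1,000 rolls.
expected_even_rolls = Fraction(3, 6) * 1000

# A 1% chance across 1,000 ticket buyers.
expected_lottery_winners = Fraction(1, 100) * 1000

# 20 out of 100, expressed as a percentage.
disease_chance_pct = Fraction(20, 100) * 100

# A 10% chance across 1,000 people.
expected_ticket_winners = Fraction(10, 100) * 1000

# 1 in 1,000, expressed as a percentage.
car_win_pct = Fraction(1, 1000) * 100
```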
The average score was about 57%, and the standard deviation was about 30%. Nearly 80% (!) of the people couldn’t answer that a 1 in 1,000 chance is 0.1% (see below). Nearly 38% couldn’t answer that a fair die would turn up, on average, an even number 500 times every 1,000 rolls. 36% couldn’t calculate how many people out of 1,000 would win if each had a 1% chance. And 34% couldn’t answer that 20 out of 100 means 20%.
If people have trouble answering these questions, it is likely that they struggle to grasp some of the numbers behind how the budget is allocated, or for that matter, how to craft their own family’s budget. The low scores also amply illustrate that the education system fails Americans.
Given the importance of numeracy in a wide variety of domains, it is vital that we pay greater attention to improving it. The problem is also tractable: with the advent of good self-learning tools, it is possible to intervene at scale. Solving it is also liable to be good business. Given that numeracy is liable to improve people’s capacity to count calories and make better financial decisions, among other things, health insurance companies could lower premiums in exchange for people becoming more numerate, and lending companies could lower interest rates in exchange for increases in numeracy.
For the data and scripts used to generate the graphs, see https://github.com/soodoku/pollbias.