Trump Trumps All: Coverage of Presidents on Network Television News

4 May

With Daniel Weitzel.

The US government is a federal system, with substantial domains reserved for local and state governments. For instance, education, most parts of the criminal justice system, and a large chunk of regulation are under the purview of the states. Further, the national government has three co-equal branches: legislative, executive, and judicial. Given these facts, you would expect news coverage to be spread across the branches and levels of government. But there is a sharp skew in news coverage of politicians, with members of the executive branch, especially national politicians (and especially the president), covered far more often than other politicians (see here). Exploiting data from the Vanderbilt Television News Archive (VTNA), the largest publicly available database of TV news—over 1M broadcast abstracts spanning 1968 to 2019—we add body to that observation. We searched each abstract for references to the sitting president and coded each hit as 1. As the figure below shows, references to the president are common. Excluding Trump, on average, a sixth of all abstracts contain a reference to the sitting president. But Trump is different: 60%(!) of abstracts refer to Trump.
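
For concreteness, here is a minimal sketch of the coding step, assuming a pandas DataFrame of abstracts with date and text columns. The president list, term dates, and toy rows are illustrative stand-ins, not the actual pipeline (see the data and scripts below).

```python
import pandas as pd

# Illustrative term dates; the real analysis covers every president from 1968 to 2019.
presidents = [
    ("Reagan", "1981-01-20", "1989-01-20"),
    ("Trump", "2017-01-20", "2021-01-20"),
]

def refers_to_sitting_president(row):
    """Code an abstract as 1 if it mentions the president in office on its air date."""
    for name, start, end in presidents:
        if pd.Timestamp(start) <= row["date"] < pd.Timestamp(end):
            return int(name.lower() in row["text"].lower())
    return 0

# Toy stand-in for the VTNA abstracts table (assumed columns: date, text).
abstracts = pd.DataFrame({
    "date": pd.to_datetime(["1987-06-01", "2018-03-15"]),
    "text": ["Reagan addresses the nation on arms control.",
             "Markets rally after strong jobs report."],
})
abstracts["hit"] = abstracts.apply(refers_to_sitting_president, axis=1)
print(abstracts.groupby(abstracts["date"].dt.year)["hit"].mean())  # share per year
```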

Data and scripts can be found here.

Trading On Overconfidence

2 May

In Thinking, Fast and Slow, Kahneman recounts a time in 1984 when he, Thaler, and Amos Tversky met a senior investment manager. Kahneman asked, “When you sell a stock, who buys it?”

“[The investor] answered with a wave in the vague direction of the window, indicating that he expected the buyer to be someone else very much like him. That was odd: What made one person buy, and the other person sell? What did the sellers think they knew that the buyers did not? [gs: and vice versa.]”

“… It is not unusual for more than 100M shares of a single stock to change hands in one day. Most of the buyers and sellers know that they have the same information; they exchange the stocks primarily because they have different opinions. The buyers think the price is too low and likely to rise, while the sellers think the price is high and likely to drop. The puzzle is why buyers and sellers alike think that the current price is wrong. What makes them believe they know more about what the price should be than the market does? For most of them, that belief is an illusion.”

Thinking, Fast and Slow, Daniel Kahneman

Note: Kahneman is not just saying that buyers and sellers have the same information but that they also know they have the same information.

There is a 1982 counterpart to Kahneman’s observation in the form of Paul Milgrom and Nancy Stokey’s paper on the No-Trade Theorem. “[If] [a]ll the traders in the market are rational, and thus they know that all the prices are rational/efficient; therefore, anyone who makes an offer to them must have special knowledge, else why would they be making the offer? Accepting the offer would make them a loser. All the traders will reason the same way, and thus will not accept any offers.”

From Lives Lost to Years Lost

2 Apr

The mortality rate is puzzling to mortals. A better number is the expected number of years lost. (A yet better number would be quality-adjusted years lost.) To make the calculation easier, Suriyan and I developed a Python package that uses SSA actuarial data and life tables to estimate the expected years lost.

We illustrate the use of the package by estimating the average number of years by which people’s lives are shortened due to coronavirus. Using data from Table 1 of the paper, which gives the distribution of ages of people who died from COVID-19 in China, and with conservative assumptions (coding each death as male, taking the middle of the age ranges), we find that people’s lives are shortened by about 11 years on average. These estimates are conservative for one additional reason: those who die likely have lower expected longevity than others of the same age. And note that given the bulk of the deaths are among older people, who are more infirm, the quality-adjusted years lost are likely yet more modest. Given that the last life tables from China are from 1981, and given that life expectancy in China has risen substantially since then (though most gains come from reductions in childhood mortality, etc.), we also run the numbers using recent US data, treating the deceased as if they had American life tables. Using the most recent SSA data, we find the number to be 16. Compare this to deaths from road accidents, the modal cause of death among those aged 5–24 and 25–44 in the US. Assuming everyone who dies in a traffic accident is a man, and assuming the age of death to be 25, we get ~52 years, roughly 3x as large as for coronavirus.
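
The core calculation is simple. Here is a stylized version (not the package’s API); the remaining-life-expectancy values are rough, illustrative approximations of SSA period life table figures.

```python
# Remaining life expectancy keyed by (sex, age at death); illustrative approximations.
remaining_years = {
    ("male", 25): 52.0,   # roughly the figure used in the road-accident comparison
    ("male", 70): 14.0,
    ("male", 80): 8.0,
}

def mean_years_lost(deaths):
    """deaths: list of (sex, age_at_death) tuples; returns average years lost."""
    losses = [remaining_years[(sex, age)] for sex, age in deaths]
    return sum(losses) / len(losses)

# Example: three deaths, all coded as male (the conservative assumption above).
print(mean_years_lost([("male", 70), ("male", 70), ("male", 80)]))  # 12.0
```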

The Puzzle of Price Dispersion on Amazon

29 Mar

Price dispersion is an excellent indicator of transactional frictions. It isn’t that absent price dispersion, we can confidently say that frictions are negligible. Frictions can be substantial even when price dispersion is zero. For instance, if search costs are high enough that it is irrational to search, all the sellers will price the good at the buyer’s Willingness To Pay (WTP). Third-world tourist markets, full of hawkers selling the same thing at the same price, are good examples of that. But when price dispersion exists, we can be reasonably sure that there are frictions in transacting. This is what makes the existence of substantial price dispersion on Amazon compelling.

Amazon makes price discovery easy, controls some aspects of quality by kicking out sellers who don’t adhere to its policies, and provides reasonable indicators of service quality with its user ratings. But still, on nearly all the items I looked at, there was substantial price dispersion. Take, for instance, the market for a bottle of Nature Made B12 vitamins. Prices go from $8.40 to nearly $30! With taxes, the dispersion is greater yet. If listing costs are non-zero, it is not immediately clear why sellers offering the product at $30 are in the market. It could be that the expected service quality of the $30 seller is higher, except that, comparing the highest-priced seller to the next highest-priced one, the ratings of the highest-priced seller are lower (take a look at shipping speed as well). And I would imagine that the ratings (and the quality) of Amazon, which comes in with the lowest price, are the highest. More generally, I have a tough time thinking of aspects of service and quality that are worth so much that the range of prices goes from 1x to 4x for a branded bottle of vitamin pills.

One plausible explanation is that the lowest-priced seller has a non-zero probability of being out of stock. And the more expensive and worse-quality sellers are there to catch these low-probability events, setting a price that is profitable for them. One way to think about it is that the marginal cost of additional supply rises in the way the listed prices show. If true, then there seems to be an opportunity to make money. And it is possible that Amazon is leaving money on the table.

p.s. Sales of the Harry Potter boxed set show a similar pattern.

It Pays to Search

28 Mar

In Reinventing the Bazaar, John McMillan discusses how search costs affect the price the buyer pays. John writes:

“Imagine that all the merchants are quoting $1[5]. Could one of them do better by undercutting this price? There is a downside to price-cutting: a reduction in revenue from any customers who would have bought from this merchant even at the higher price. If information were freely available, the price-cutter would get a compensating boost in sales as additional customers flocked in. When search costs exist, however, such extra sales may be negligible. If you incur a search cost of 10 cents or more for each merchant you sample, and there are fifty sellers offering the urn, then even if you know there is someone out there who is willing to sell it at cost, so you would save $5, it does not pay you to look for him. You would be looking for a needle in a haystack. If you visited one more seller, you would have a chance of one in fifty of that seller being the price-cutter, so the return on average from that extra price quote would be 10 cents (or $5 multiplied by 1/50), which is the same as your cost of getting one more quote. It does not pay to search.”

Reinventing the Bazaar, John McMillan

John got it wrong. It pays to search. The cost of the first quote is 10 cents, and so is its expected payoff ((1/50)*$5). But if the first quote is $15, the expected payoff of the second quote—(1/49)*$5, or about 10.2 cents—exceeds its 10-cent cost. And so on.

Another way to solve it is to work out the expected number of quotes you need before you reach the seller selling at $10. It is about 25. Given that you spend, on average, about $2.50 on quotes to save $5, you will gladly search.

Yet another way to think about it is that the worst case is that you break even: when the $10 seller is the last one you get a quote from, the 50 quotes cost you $5, exactly the amount you save. In every other case, you come out ahead.
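
Under the numbers in the passage—50 sellers, one selling at cost, a $5 saving, and a 10-cent quote cost—a quick simulation (illustrative, not from the book) confirms the point: searching until you find the price-cutter never loses money and nets about $2.45 on average.

```python
import random

def net_payoff_cents(n_sellers=50, saving=500, quote_cost=10):
    """Search sellers in random order until the price-cutter is found;
    return the $5.00 saving minus total search cost, in cents."""
    quotes_needed = random.randint(1, n_sellers)  # position of the price-cutter
    return saving - quote_cost * quotes_needed

random.seed(0)
payoffs = [net_payoff_cents() for _ in range(100_000)]
print(min(payoffs), sum(payoffs) / len(payoffs))  # worst case 0, mean ~245 cents
```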

For the equilibrium price, you need to make assumptions. But if buyers know that there is a price cutter, they will all buy from him. This means that the price cutter will end up as the only seller left.

There are two related fun points. First, one reason markets are competitive on price even when true search costs are high is likely that people price their time remarkably low. Second, when people spend time looking for the cheapest deal, they incentivize sellers charging high prices to lower them, making things better for everyone else.

Good News: Principles of Good Journalism

12 Mar

If fake news—deliberate disinformation, not uncongenial news—is one end of the spectrum, what is the other end of the spectrum?

To get at the question, we need a theory of what news should provide. A theory of news, in turn, needs a theory of citizenship, which prescribes the information people need to execute their role, and an empirically supported behavioral theory of how people get that information.

What a democracy expects of people varies by the conception of democracy. Some theories of democracy only require citizens to have enough information to pick the better candidate when differences in candidates are material. Others, like deliberative democracy, expect people to be well informed and to have thought through various aspects of policies.

I opt for deliberative democracy to guide expectations about people for two reasons. Not only does the theory best express the highest ideals of democracy, but it also has the virtue of answering a vital question well. If all news were equally profitable to produce and equally widely read, what kind of news would lead to the best political outcomes, as judged by idealized versions of people—people who have all the information and all the time to think through the issues?

There are two virtues of answering such a question. First, it offers a convenient place to start answering what we mean by ‘good’ news; we can bring in profitability and reader preferences later. Second, engaging with it uncovers some obvious aspects of ‘good’ news.

For news to positively affect political outcomes (not in the shallow, instrumental sense), the news has to be about politics. Rather than news about Kim Kardashian or opinions about the hottest boots this season, ‘good’ news covers policymakers, policy implementers, and policy-relevant developments.

News about politics is a necessary but not a sufficient condition. Switching from discussing Kim Kardashian’s dress to Hillary Clinton’s is very plausibly worse. Thus, we also want the news to be substantive, engaging with real issues rather than cosmetic concerns.

Substantively engaging with real issues is still no panacea. If the information is not correct, it will misinform rather than inform the debate. Thus, the third quality of ‘good’ news is correctness.

The criterion for “good” news, however, is not just correctness but correctness of interpretation. ‘Good’ news allows people to draw the right conclusions. For instance, reporting murder rates as, say, ‘a murder per hour’ without reporting the actual number of murders or comparing the probability of being murdered to other common threats to life may instill greater fear in people than is ‘optimal.’ (Optimal, as judged by better-informed versions of ourselves who have been given time to think. We can also judge optimal by correctness—did people form unbiased, accurate beliefs after reading the news?)

Not all issues, however, lend themselves to objective appraisals of truth. To produce ‘good’ news, the best you can do is have the right process. The primary tool that journalists have in the production of news is the sources they use to report on stories. (While journalists increasingly use original data to report, the reliance on people is widespread.) Thus, the way to increase correctness is to improve how sources are used: by raising the quality of sources (e.g., sourcing more knowledgeable people with weak incentives to cook the books), by increasing the diversity of sources (e.g., not just government officials but also major NGOs), and by increasing the number of sources.

If we emphasize correctness, we may fall short on timeliness. News has to be timely enough to be useful, aside from being correct enough to guide policy and opinion well.

News can be narrowly correct but may commit sins of omission. ‘Good’ news provides information on all sides of the issue. ‘Good’ news highlights and engages with all serious claims. It doesn’t give time to discredited claims for “balance.”

Second-to-last, news should be delivered in the right tone. Rather than speculative ad-hominem attacks, “good” news engages with arguments and data.

Lastly, news contributes to the public kitty only if it is original. Thus, ‘good’ news is original. (Plagiarism reduces the incentives for producing quality news because it eats into the profits.)

Feigning Competence: Checklists For Data Science

25 Jan

You may have heard that most published research is false (Ioannidis). But what you probably don’t know is that most corporate data science is also false.

Gaurav Sood

The returns on data science in most companies are likely sharply negative. There are a few reasons for that. First, as with any new ‘hot’ field, the skill level of the average worker is low. Second, the skill level of the people managing these workers is also low—most struggle to pose good questions, and when they stumble on one, they struggle to answer it well. Third, data science often fails silently (or there is enough corporate noise around it that most failures are well-hidden in plain sight), so the opportunity to learn from mistakes is small. And if that was not enough, many companies reward speed over correctness, and in doing that, often obtain neither.

How can we improve on the status quo? The obvious remedy for the first two issues is to increase skill levels by improving training or creating specializations. And one remedy for the latter two is to create incentives for doing things correctly.

Increasing training and creating specializations in data science is expensive and slow. Vital, but slow. Creating the right incentives for good data science work is not trivial either. There are at least two large forces lined up against it: incompetent supervisors and the fluid and collaborative nature of work—work usually involves multiple people, and there is a fluid exchange of ideas. Only the first is fixable—the latter is a property of work. And fixing it comes down to making technical competence a much more important criterion for hiring.

Aside from hiring more competent workers or increasing the competence of workers, you can also simulate the effect by using checklists—increase quality by creating a few “pause points”—times during a process where the person (team) pauses and goes through a standard list of questions.

To give body to the boast, let me list some common sources of failure in data science and how checklists at different pause points may reduce them.

  1. Learn what you will lose in translation. Good data science begins with a good understanding of the problem you are trying to solve. Once you understand the problem, you need to translate it into a suitable statistical analog, keeping in mind what gets lost in that translation.
  2. Learn the limitations. Learn what data you would love to have to answer the question if money were no object. Use that ideal to understand how far you fall short of it, and then come to a judgment about whether the question can be answered reasonably with the data at hand.
  3. Learn how good the data are. You may think you have the data, but it is best to verify it. For instance, it is good practice to think through the extent to which a variable captures the quantity of interest.
  4. Learn the assumptions behind the formulas you use and test those assumptions to find the right thing to do. Thou shalt only use math formulas when you know their limitations. Having a good grasp of when formulas don’t work is essential. For instance, say the task is to describe a distribution. Someone may use the mean and standard deviation to describe it. But we know that the sufficient statistics vary by distribution; for a binomial, it may just be p. A checklist for “describing” a variable (a code sketch follows this list) can be:
    1. Check skew by plotting: averages are useful when distributions are symmetric and lots of observations are close to the mean. If the distribution is skewed, describe various percentiles instead.
    2. Check how many values are missing and what explains the missingness.
    3. Check for unusual values and what explains them.
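
Here is a minimal sketch of that checklist in pandas, assuming a numeric Series x; the skew and z-score cutoffs are arbitrary illustrative choices, not recommendations.

```python
import pandas as pd

def describe_variable(x: pd.Series, skew_cutoff=1.0, z_cutoff=3.0):
    """Run the three checks above and return a small report dict."""
    report = {}
    # 1. Skew: mean/sd are useful summaries only for roughly symmetric distributions;
    #    otherwise report percentiles. (In practice, also plot the distribution.)
    report["skew"] = x.skew()
    if abs(report["skew"]) <= skew_cutoff:
        report["summary"] = {"mean": x.mean(), "sd": x.std()}
    else:
        report["summary"] = x.quantile([0.05, 0.25, 0.5, 0.75, 0.95]).to_dict()
    # 2. Missing values: count them; explaining *why* they are missing is manual work.
    report["n_missing"] = int(x.isna().sum())
    # 3. Unusual values: flag observations far from the center for inspection.
    z = (x - x.mean()) / x.std()
    report["n_unusual"] = int((z.abs() > z_cutoff).sum())
    return report

# e.g., describe_variable(pd.Series([1, 2, 2, 3, 100, None]))
```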

Ruling Out Explanations

22 Dec

The paper (pdf) makes the case that the primary reason for electoral cycles in dissents is priming. The paper notes three competing explanations: 1) caseload composition, 2) panel composition, and 3) volume of caseloads. And it “rules them out” by regressing case type, panel composition, and caseload on quarters from the election (see Appendix Table D). The coefficients are uniformly small and insignificant. But is that enough to rule out the alternate explanations? No. Small, insignificant coefficients do not imply that there is no path from proximity to the election to dissent via these competing mediators (to use causal language). We could conclude that the pathway doesn’t exist only if there were a sharp null. The best you can do is bound the estimated effect.

Preference for Sons in the US: Evidence from Business Names

24 Nov

I estimate the preference for passing on businesses to sons by examining how common the words son and sons are, compared to daughter and daughters, in the names of businesses.

In the US, all businesses have to register with a state. And all states provide a way to search business names, in part so that new companies can pick names that haven’t been used before.

I begin by searching for son(s) and daughter(s) in states’ databases of business names. But the results of searching for son are inflated for three reasons:

  • son is part of many words, from names such as Jason and Robinson to ordinary English words like mason (which can also be a name).
  • son is a Korean name.
  • some businesses use the word son playfully. For instance, son is a homophone of sun, and some people use that to create names like son of a beach.

I address the first concern by using a regex that matches only the exact words son or sons. But not all states allow regex searches or let people download the full set of results. Where possible, I try to draw a lower bound. But still, some care is needed in interpreting the results.
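
Here is a minimal sketch of that whole-word match (the business names below are made up). Note that it addresses only the first concern: playful uses like “son of a beach” still match.

```python
import re

SON_RE = re.compile(r"\bsons?\b", flags=re.IGNORECASE)        # matches son / sons only
DAUGHTER_RE = re.compile(r"\bdaughters?\b", flags=re.IGNORECASE)

names = ["Smith & Sons LLC", "Jason's Masonry", "Son of a Beach", "Lee & Daughters"]
son_hits = [n for n in names if SON_RE.search(n)]       # skips "Jason's Masonry"
daughter_hits = [n for n in names if DAUGHTER_RE.search(n)]
print(son_hits, daughter_hits)
```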

Data and Scripts: https://github.com/soodoku/sonny_side

In all, I find that a conservative estimate of the son-to-daughter ratio ranges from 4:1 to 26:1 across states.

Learning From the Future with Fixed Effects

6 Nov

Say that you want to predict wait times at restaurants using data with four columns: wait time (wait), restaurant name (restaurant), and the time and date of the observation. Using the time and date, you create two additional columns: time of day (tod) and day of the week (dow). And say that you estimate the following model:

\text{wait} \sim \text{restaurant} + \text{tod} + \text{dow} + \epsilon

Assume that the number of rows is about 100 times the number of columns. There is little chance of overfitting. But you still do an 80/20 train/test split and pick the model that works the best OOS.

You have every right to expect the model’s performance to be close to its OOS performance. But when you deploy the model, the model performs much worse than that. What could be going on?

In the model, we estimate a restaurant-level intercept. But in estimating that intercept, we use all of a restaurant’s wait times, including those observed after the time we are predicting for. One fix is to use rolling averages or the last X wait times in the regression. Another is to construct the data more formally so that you are always predicting the next wait time from past data only.
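
To make the two fixes concrete, here is a minimal pandas sketch. The column names (restaurant, timestamp, wait), the toy data, and the 10-observation window are illustrative assumptions, not an actual implementation.

```python
import pandas as pd

# Toy data standing in for the wait-time table described above.
df = pd.DataFrame({
    "restaurant": ["A", "A", "A", "B", "B"],
    "timestamp": pd.to_datetime(["2019-01-01 18:00", "2019-01-02 19:00",
                                 "2019-01-03 20:00", "2019-01-01 18:30",
                                 "2019-01-02 19:30"]),
    "wait": [10, 20, 30, 5, 15],
})
df = df.sort_values(["restaurant", "timestamp"])

# Expanding mean of strictly earlier waits per restaurant: shift(1) keeps the
# current row's own wait (and anything later) out of its feature.
df["past_mean_wait"] = (
    df.groupby("restaurant")["wait"]
      .transform(lambda s: s.shift(1).expanding().mean())
)

# The "last X wait times" fix, here with X = 10.
df["last10_mean_wait"] = (
    df.groupby("restaurant")["wait"]
      .transform(lambda s: s.shift(1).rolling(10, min_periods=1).mean())
)
print(df)
```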