What’s Best? Comparing Model Outputs

10 Mar

Let’s assume that you have a large portfolio of messages: n messages of k types. And say that there are n models, built by different teams, that estimate how relevant each message is to the user on a particular surface at a particular time. How would you rank order the messages by relevance, understood as the probability a person will click on the relevant substance of the message?

Isn’t the answer: use the max. operator as a service? Just using the max. operator can be a problem because of:

a) Miscalibrated probabilities: the probabilities being output from non-linear models are not always calibrated. A probability of .9 doesn’t mean that there is a 90% chance that people will click it.

b) Prediction uncertainty: prediction uncertainty for an observation is a function of the uncertainty in the betas and distance from the bulk of the points we have observed. If you were to randomly draw 1,000 samples each from the estimated distribution of p, a different ordering may dominate than the one we get when we compare the means.
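To make point (b) concrete, here is a minimal sketch. It assumes, purely for illustration, that the uncertainty about each message's click probability can be summarized by a Beta distribution. The message names, the Beta parameters, and the helper ordering_frequencies are all made up. Message A has the higher mean (0.60 vs. 0.55) but rests on far fewer observations.

```python
import random
from collections import Counter

def ordering_frequencies(posteriors, n_draws=1000, seed=0):
    """Draw click probabilities from each message's (assumed) Beta
    distribution and count how often each rank ordering comes up."""
    rng = random.Random(seed)
    counts = Counter()
    for _ in range(n_draws):
        draws = {name: rng.betavariate(a, b) for name, (a, b) in posteriors.items()}
        # Rank messages from highest to lowest sampled click probability.
        order = tuple(sorted(draws, key=draws.get, reverse=True))
        counts[order] += 1
    return {order: c / n_draws for order, c in counts.items()}

# Hypothetical numbers: A ~ Beta(3, 2) has mean 0.60 but is very uncertain;
# B ~ Beta(55, 45) has mean 0.55 and is tightly estimated.
posteriors = {"A": (3, 2), "B": (55, 45)}
freqs = ordering_frequencies(posteriors)
for order, share in sorted(freqs.items(), key=lambda kv: -kv[1]):
    print(" > ".join(order), round(share, 2))
```

Comparing the means puts A first, but in a sizable share of the draws B comes out ahead. How you act on that (e.g., rank by a lower quantile rather than the mean) is a separate design choice.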

This isn’t the end of the problems. It could be that the models are built on data that doesn’t match the data in the real world. (To discover that, you would need to compare the expected error rate to the actual error rate.) And the only way to fix the issue is to collect new data and build new models on it.
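One way to run the expected-versus-actual comparison is sketched below. It assumes you predict a click whenever the predicted probability clears a threshold. If the probabilities were right, the chance of being wrong on each prediction would be 1 - p when you predict a click and p when you don't. The function name and the toy numbers are hypothetical.

```python
def expected_vs_actual_error(probs, clicks, threshold=0.5):
    """Compare the error rate implied by the predicted probabilities
    with the error rate actually observed on fresh data."""
    preds = [p >= threshold for p in probs]
    # If the probabilities were calibrated, each prediction would be
    # wrong with probability (1 - p) when we predict a click, else p.
    expected = sum((1 - p) if pred else p for p, pred in zip(probs, preds)) / len(probs)
    actual = sum(pred != bool(y) for pred, y in zip(preds, clicks)) / len(probs)
    return expected, actual

# Toy data: the model says 0.9 for ten impressions, but only six get a click.
probs = [0.9] * 10
clicks = [1] * 6 + [0] * 4
expected, actual = expected_vs_actual_error(probs, clicks)
print(round(expected, 2), round(actual, 2))
```

A large gap between the two numbers on recent data is a hint that the model is miscalibrated or that the data it was built on no longer matches the real world.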

Comparing messages based on propensity to be clicked is unsatisfactory. A smarter comparison would optimize for profit, ideally over the long term. Moving from clicks to profits requires reframing. Profits need not only come from clicks. People don’t always need to click on a message to be influenced by it. They may choose to follow up at a later time. And the message may influence more than the person clicking on the message. To estimate profits, thus, you cannot rely on observational data. To estimate the payoff for showing a message, which is equal to the estimated winnings minus the estimated cost, you need to learn it through an experiment. And to compare payoffs of different messages, e.g., encourage people to use a product more, encourage people to share the product with another person, etc., you need to distill the payoffs to the same currency, ideally cash.

Expertise as a Service

3 Mar

The best thing you can say about Prediction Machines, a new book by a trio of economists, is that it is not barren. Most of the growth you see is about the obvious: the big gain from ML is our ability to predict better, and better predictions will change some businesses. For instance, Amazon will be able to move from shopping-and-then-shipping to shipping-and-then-shopping—you return what you don’t want—if it can forecast what its customers want well enough. Or, airport lounges will see reduced business if we can more accurately predict the time it takes to reach the airport.

Aside from the obvious, the book has some untended shrubs. The most promising of them is that supervised algorithms can have human judgment as a label. We have long known about the point. For instance, self-driving cars use human decisions as labels: we learn braking, steering, and speed as a function of road conditions. But what if we could use expert human judgment as a label for other complex cognitive tasks? There is already software that exploits that point. Grammarly, for instance, uses editorial judgments to give advice about grammar and style. But there are so many other places where we could exploit this. You could use it to build educational tools that give guidance on better ways of doing something in real time. You could also use it to reduce the need for experts.

p.s. The point about exploiting the intellectual property of experts deserves more attention.

Experiments Without Control

4 Jan

Say that you are in the search engine business. And say that you have built a model that estimates how relevant an ad is based on the ‘context’: search query, previous few queries, kind of device, location, and such. Now let’s assume that for context X, the rank-ordered list of ads based on expected profit is: product A, product B, and product C. Now say that you want to estimate how effective an ad for product A is in driving the sales of product A. One conventional way to estimate this is to randomly assign at serve time: for context X, serve half the people an ad for product A and serve half the people no ad. But if it is true (and you can verify this) that an ad for product B doesn’t cause people to buy product A, then you can replace the ‘no ad’ control, where you are not making any money, with an ad for product B. With this, you can estimate the effectiveness of the ad for product A while sacrificing the least amount of revenue. Better yet, if it is true that the ad for product A doesn’t cause people to buy product B, you can also get an estimate of the efficacy of the ad for product B at the same time.
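Here is a minimal sketch of the estimate, assuming users in context X are randomly split between an ad for product A and an ad for product B, and that neither ad affects purchases of the other product. The function name ad_effects and the purchase counts are hypothetical.

```python
def ad_effects(shown_a, shown_b):
    """Each arm is a list of (bought_a, bought_b) pairs, one per user.
    Returns the estimated effect of ad A on sales of A and of ad B on
    sales of B, each using the other arm as the control."""
    def rate(rows, i):
        return sum(r[i] for r in rows) / len(rows)

    effect_a = rate(shown_a, 0) - rate(shown_b, 0)  # arm B stands in for 'no ad'
    effect_b = rate(shown_b, 1) - rate(shown_a, 1)  # arm A stands in for 'no ad'
    return effect_a, effect_b

# Made-up data: 100 users per arm.
shown_a = [(1, 0)] * 30 + [(0, 1)] * 5 + [(0, 0)] * 65
shown_b = [(1, 0)] * 10 + [(0, 1)] * 20 + [(0, 0)] * 70
effect_a, effect_b = ad_effects(shown_a, shown_b)
print(round(effect_a, 2), round(effect_b, 2))
```

The estimates are only as good as the no-cross-effect assumption. If ad B nudges some people toward product A, the effect of ad A is understated.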

The Benefit of Targeting

16 Dec

What is the benefit of targeting? Why (and when) do we need experiments to estimate the benefits of targeting? And what is the right baseline to compare against?

I start with a business casual explanation, using examples to illustrate some of the issues at hand. Later in the note, I present a formal explanation to precisely describe the assumptions to clarify under what conditions targeting may be a reasonable thing to do.

Business Casual

Say that you have some TVs to sell. And say that you could show an ad about the TVs to everyone in the city for free. Your goal is to sell as many TVs as possible. Does it make sense for you to build a model to pick out people who would be especially likely to buy the TV and only show an ad to them? No, it doesn’t. Unless ads make people less likely to purchase TVs, you are always better-off reaching out to everyone.

You are wise. You use common sense to sell more TVs than the guy who spent a bunch of money building the model and sold fewer. You make tons of money. And you use the money to buy Honda and Mercedes dealerships. You still retain the magical power of being able to show ads to everyone for free. Your goal is to maximize profits. And selling a Mercedes nets you more profit than selling a Honda. Should you use a model to show some people ads about Mercedes and other people ads about Honda? The answer is still no. Under assumptions that are likely to hold, the optimal strategy is to show an ad for Mercedes first and then an ad for Honda. (You can show the Honda ad first if people who want to buy a Mercedes won’t buy a cheaper car if they see an ad for a cheaper car first.)

But what if you are limited to only one ad? What would you do? In that case, a model may make sense. Let’s see how things may look with some fake data. Let’s compare the outcomes of four strategies: two model-based targeting strategies and two target-everyone with one ad strategies. To make things easier, let’s assume that selling Mercedes nets ten units of profits and selling Honda nets five units of profit. Let’s also assume that people will only buy something if they see an ad for their preferred product.
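The comparison can be sketched in a few lines. The profit figures (ten and five) and the rule that people buy only if they see an ad for their preferred product come from the setup above. Everything else is an assumption made for illustration: the population of 100, the split of preferences, and the two model-based strategies (a perfect model and an imperfect one that mislabels some buyers). These are not the numbers from the pdf.

```python
PROFIT = {"mercedes": 10, "honda": 5}

def total_profit(people, pick_ad):
    """A person buys only if the single ad they see matches their preference."""
    total = 0
    for person in people:
        ad = pick_ad(person)
        if ad == person["prefers"]:
            total += PROFIT[ad]
    return total

# Hypothetical population of 100: 20 prefer Mercedes, 50 prefer Honda,
# 30 won't buy either. 'predicted' is an imperfect model's guess.
people = (
    [{"prefers": "mercedes", "predicted": "mercedes"}] * 14
    + [{"prefers": "mercedes", "predicted": "honda"}] * 6
    + [{"prefers": "honda", "predicted": "honda"}] * 35
    + [{"prefers": "honda", "predicted": "mercedes"}] * 15
    + [{"prefers": None, "predicted": "honda"}] * 30
)

strategies = {
    "everyone sees Mercedes": lambda p: "mercedes",
    "everyone sees Honda": lambda p: "honda",
    "perfect model": lambda p: p["prefers"] or "honda",
    "imperfect model": lambda p: p["predicted"],
}

results = {name: total_profit(people, pick) for name, pick in strategies.items()}
for name, profit in results.items():
    print(name, profit)
```

With these made-up numbers, even the imperfect model beats both one-ad-for-everyone strategies. With a worse model or a different mix of preferences, it may not.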

Continue reading here (pdf).

The Value of Predicting Bad Things

30 Oct

Foreknowledge of bad things is useful because it gives us an opportunity to a. prevent them, and b. plan for them.

Let’s refine our intuitions with a couple of concrete examples.

Many companies work super hard to predict customer ‘churn’—which customer is not going to use a product over a specific period (which can be the entire lifetime). If you know who is going to churn in advance, you can: a. work to prevent it, b. make better investment decisions based on expected cash flow, and c. make better resource allocation decisions.

Users “churn” because they don’t think the product is worth the price, which may be because a) they haven’t figured out a way to use the product optimally, b) a better product has come on the horizon, or c) their circumstances have changed. You can deal with this by sweetening the deal. You can prevent users from abandoning your product by offering them discounts. (It is useful to experiment to learn about the precise demand elasticity at various predicted levels of churn.) You can also give discounts in the form of offering some premium features for free. Among people who don’t use the product much, you can run campaigns to help them use the product more effectively.

If you can predict cash flow, you can optimally trade off risk so that you always have cash at hand to pay your obligations. Churn predictions can also help you with resource allocation. They can mean that you need to temporarily hire more customer success managers. Or they can mean that you need to lay off some people.

The second example is from patient care. If you could predict reasonably that someone will be seriously sick in a year’s time (and you can in many cases), you can use it to prioritize patient care, and again plan investment (if you were an insurance company) and resources (if you were a health services company).

Lastly, as is obvious, the earlier you can learn, the better you can plan. But generally, you need to trade-off between noise in prediction and headstart—things further away are harder to predict. The noise-headstart trade-off is something that should be done thoughtfully and amended based on data.

Targeting 101

22 Jun

Targeting Economics

Say that there is a company that makes more than one product. And users of any one of its products don’t use all of its products. In effect, the company has a captive audience. The company can run an ad in any of its products about the one or more other products that a user doesn’t use. Should it consider targeting, that is, showing different ads (or a different number of ads) to different users? There are five things to consider:

  • Opportunity Cost: If the opportunity is limited, could the company make more profit by showing an ad about something else?
  • The Cost of Showing an Ad to an Additional User: The cost of serving an ad; it is close to zero in the digital economy.
  • The Cost of a Worse Product: As a result of seeing an irrelevant ad in the product, the user likes the product less. (The magnitude of the reduction depends on how disruptive the ad is and how irrelevant it is.) The company suffers in the end as its long-term profits are lower.
  • Poisoning the Well: Showing an irrelevant ad means that people are more likely to skip whatever ad you present next. It reduces the company’s ability to pitch other products successfully.
  • Profits: On the flip side of the ledger are expected profits. What are the expected profits from showing an ad? If you show a user an ad for a relevant product, they may not just buy and use the other product, but may also become less likely to switch from your stack. Further, they may even proselytize your product, netting you more users.

I formalize the problem here (pdf).

Learning About [the] Loss (Function)

7 Nov

One of the things we often want to learn is the actual loss function people use for discounting ideological distance between themselves and a legislator. Often people try to learn the loss function using actual distances. But if the aim is to learn the loss function, perceived distance rather than actual distance is better. It is so because perceived distance is what the voter believes to be true. People can then use the function to simulate scenarios in which perceptions match the facts.

Incentives to Care

11 Sep

A lot of people have their lives cut short because they eat too much and exercise too little. Worse, the quality of their shortened lives is typically much lower as a result of ‘avoidable’ illnesses that stem from ‘bad behavior.’ And that isn’t all. People who are not feeling great are unlikely to be as productive as those who are. Ill-health also imposes a significant psychological cost on loved ones. The net social cost is likely enormous.

One way to reduce such costly avoidable misery is to invest upfront. Teach people good habits and psychological skills early on, and they will be less likely to self-harm.

So why do we invest so little up front? Especially when we know that people are ill-informed (about the consequences of their actions) and myopic.

Part of the answer is that there are few incentives for anyone else to care. Health insurance companies don’t make their profits by caring. They make them by investing wisely. And by minimizing ‘avoidable’ short-term costs. If a member is unlikely to stick with a health plan for life, why invest in their long-term welfare? Or work to minimize negative externalities that may affect the next generation?

One way to make health insurance companies care is to rate them on estimated quality-adjusted years saved due to interventions they sponsored. That needs good interventions and good data science. And that is an opportunity. Another way is to get the government to invest heavily early on to address this market failure. Another version would be to get the government to subsidize care that reduces long-term costs.

Companies With Benefits or Potential Welfare Losses of Benefits

27 Sep

Many companies offer employees ‘benefits.’ These include paying for healthcare, investment plans, company gym, luncheons etc. (Just ask a Silicon Valley tech. employee for the full list.)

But why ‘benefits’? And why not cash?

A company offering a young man a zero-down health care plan seems a bit like within-company insurance. Post Obamacare, it also seems a bit unnecessary. (My reading of Obamacare is that it just mandates that companies pay for the healthcare but doesn’t mandate how they pay for it. So cash payments ought to be ok?)

For investment, the reasoning strikes me as thinner still. Let people decide what they want to do with their money.

In many ways, benefits look a bit like ‘gifts’—welfare reducing but widespread.

My recommendation: just give people cash. Or give them an option to have cash.

Raising Money for Causes

10 Nov

Four teenagers, on the cusp of adulthood, and eminently well to do, were out on the pavement raising money for children struck with cancer. They had been out raising money for a couple of hours, and from a glance at their tin pot, I estimated that they had raised about 30-odd dollars, likely less. Assuming the donation rate stays below what the four would earn together if they were all working minimum wage jobs (roughly $30/hr), I couldn’t help but wonder if their way of raising money for charity was rational; they could have easily raised more by donating their earnings from minimum wage jobs. Of course, these teenagers aren’t alone. Think of the people out in the cold raising money for the poor on New York pavements. My sense is that many people do not think as often about raising money by working at a “regular job”, even when it is more efficient (money/hour) and perhaps even more pleasant. It is not clear why.

The same argument applies to those who run in marathons etc. to raise money. Preparing for and running in a marathon generally costs at least hundreds of dollars for an average ‘Joe’ (think about the sneakers, the personal trainers that people hire, the amount of time they ‘donate’ to train, which could have been spent working and donating that money to charity, etc.). Ostensibly, as I conclude in an earlier piece, they must have motives beyond charity. These latter non-charitable considerations, at least at first glance, do not seem to apply to the case of teenagers, or to those raising money out in the cold in New York.