Ad targeting regimes are essential when we have multiple things to sell (opportunity cost), when the cost of running an ad is non-trivial, when the cost to the user's welfare of a mistargeted ad is non-trivial, or any combination of the above. I am leaving this purposely vague because all of this is well known.
Say that we have built two different models for selecting people to whom we should show ads: Model A and Model B. Now say that we want to compare which model is better, where better means a higher CTR. How do we compare the models? Some people have run an RCT to compare the efficacy of the two models. We don't need an RCT. All we need to know for each user is whether or not they have been selected to see an ad under each model. Writing each user's status as (selected by A)-(selected by B), we have four potential scenarios: 1-1, 1-0, 0-1, and 0-0.
For CTR, the 0-0 cell doesn't add any information: CTR is a probability conditional on being shown an ad, and these users are shown nothing under either model. To measure which of the models is better, draw a fixed-size random sample of users picked by Model A and another random sample of the same size from users picked by Model B, and compare the CTRs. (The same user can appear in both samples. It doesn't matter.)
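A minimal sketch of this comparison, using entirely made-up synthetic data: the selection rates, click rates, and field names below are illustrative assumptions, not anything from a real system. The point is only the mechanics: drop the 0-0 users, draw equal-size samples with replacement from each model's selected set, and compare CTRs.

```python
import random

random.seed(0)

# Hypothetical data: for each user, whether Model A and/or Model B
# selected them, and whether they clicked when shown an ad.
# Rates (0.3, 0.3, 0.05) are arbitrary illustrative numbers.
users = [
    {
        "picked_a": random.random() < 0.3,
        "picked_b": random.random() < 0.3,
        "clicked": random.random() < 0.05,
    }
    for _ in range(100_000)
]

# 0-0 users (picked by neither model) carry no CTR information,
# so they drop out automatically when we condition on being picked.
picked_a = [u for u in users if u["picked_a"]]
picked_b = [u for u in users if u["picked_b"]]

def ctr(sample):
    """Click-through rate: fraction of shown users who clicked."""
    return sum(u["clicked"] for u in sample) / len(sample)

# Equal-size random samples, drawn with replacement. A user picked
# by both models (the 1-1 cell) can appear in both samples.
n = min(len(picked_a), len(picked_b))
sample_a = random.choices(picked_a, k=n)
sample_b = random.choices(picked_b, k=n)

print(f"CTR under Model A: {ctr(sample_a):.4f}")
print(f"CTR under Model B: {ctr(sample_b):.4f}")
```

Sampling with replacement keeps the two samples the same size even when one model selects far more users than the other, which makes the raw CTR comparison straightforward.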
Now that we know what to do, let's understand why experiments are wasteful here. The heuristic account is as follows: experiments exist to compare 'similar people.' But when comparing targeting regimes, we are tautologically comparing different people. That is the point of the comparison.