Taking inspiration from Darwin, I outline a workflow for making advertisements battle it out - with only the best performer winning. We'll also look at statistical significance and the trap of mistaking proximate goals for business goals.

January 17, 2021


Transcribed by Rugo Obi

Over the course of my 10 or so years of doing online advertising, I've come up with a pretty simple system for optimizing adverts.

I was able to teach this to junior staff with no experience in online marketing, and it seemed to be workable.

The core idea is to create lots of variants of advertisements and then pit them against one another in a survival-of-the-fittest contest. Added to that is proper conversion tracking (measuring conversions on your own site rather than within the platform), as well as statistical validity tests.

I'll go into a lot more detail in the course of this episode.

As a precursor, before you start advertising anything, you should check whether or not the category of content you plan on advertising is even allowed on that platform.

There are many, many restrictions these days in order to protect the user experience within each platform, and you should be aware of that.

For example, adult content, gambling, drugs, politics, and crypto are all very often restricted.

You don't want to end up in a situation where you've created 100 different adverts for some particular area, only to find out that they get immediately banned and you've wasted all that effort.

Sometimes, despite your best intentions, one of your adverts might unintentionally trip one of these content warnings.

For example, I advertised something called 'public international law', which is abbreviated as PIL, and I used that abbreviation in some of my adverts. These adverts ended up getting flagged and banned, and it took a week or two to appeal.

In more severe cases, when Google or Facebook suspect you of advertising something really really problematic, they will ban your whole account or freeze it or something like that.

This has happened to me once, and it was incredibly painful to get out of the situation. I must have filled in about 20 different forms on Facebook and went crying to many, many different members of staff before something could be done about it.

This is an unfortunate state of affairs and so it's something you should be aware of.

A possible workaround is to have multiple advertising accounts and to switch to another one if the first runs into problems like this.

Now we get to the meat of the video, where I'm going to share the workflow I use in order to iteratively arrive at effective online advertisements.

The background idea is that it's good to try out many, many different combinations of things, to search widely before you home in on something that seems to be working.

So, the way I would do that is I might create 10 different variants of the headline, and then pit them against one another while keeping the image and copy constant.

After some time advertising, it should hopefully become apparent that one of those headlines performs much better than all the others, and therefore I'll start using that one in future.

At the same time, I'll keep the headline and the copy constant but vary the image, and then I'll figure out which image is most effective.

I'll also do the same thing for the copy, keeping the image and the headline constant.

Once I've figured out which single headline, which single image, and which single piece of copy works best, I'll combine them all into a super-effective ad, then begin the whole process afresh. After a couple of iterations of this, I tend to arrive at far more efficient online advertisements, perhaps costing a third of what the first generation cost.
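The one-dimension-at-a-time approach can be sketched in a few lines of Python. The variant names and pool sizes below are purely illustrative, not the actual adverts from the episode; the point is how much smaller the test set is than the full grid of combinations.

```python
from itertools import product

# Hypothetical variant pools; the names and counts are illustrative only.
headlines = [f"headline-{i}" for i in range(1, 11)]    # 10 headline variants
images = ["flower-dog", "study-desk", "library"]
copies = ["reliable", "study-smarter"]

def one_dimension_tests(headlines, images, copies):
    """Vary one dimension at a time while holding the other two constant."""
    base_h, base_img, base_c = headlines[0], images[0], copies[0]
    tests = [(h, base_img, base_c) for h in headlines]   # headline contest
    tests += [(base_h, img, base_c) for img in images]   # image contest
    tests += [(base_h, base_img, c) for c in copies]     # copy contest
    return tests

variants = one_dimension_tests(headlines, images, copies)
full_grid = list(product(headlines, images, copies))
# 10 + 3 + 2 = 15 adverts to run, versus 10 * 3 * 2 = 60 for the full grid.
```

Testing each dimension independently keeps the number of live adverts (and the budget each needs to reach meaningful data) manageable, at the cost of ignoring interactions between, say, a particular headline and a particular image.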

Oh, and I should also mention before we go into details that having variety in the advertisements that you show people is good, in and of itself, regardless of whether or not you are comparing their performances.

This is because of ad blindness, the idea that if someone sees your ad too often, they stop perceiving it.

The mind is very good at tuning out useless or old information. Therefore, if you had only one or two ads that you were showing to people, quite soon they would completely tune out.

Compare that to a situation where you have, for example, 10, 20, or 60 different adverts; in that case, you're likely to hold attention for much longer.

If you're going to be running many, many variations of advertisements against one another, you're going to need some sort of reliable system to distinguish which ad was which and collect your data.

Nearly all online advertising platforms allow you to give your advert a name, and I use this space to store the information about what variants were being used in a particular advertisement.

For example, you see the letters C and P here. C stands for copy and P stands for picture.

In this case, I'm using the copy with the benefit reliable in it, and the picture of the flower dog.

In the next case, I have the copy of "Study Smarter" and the picture of conversion contexts.

The headline isn't present here because I wasn't testing it in this particular case, but it's somewhere further down.
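A naming scheme like this is easy to generate and parse programmatically. The exact format below (letter, hyphen, variant name, joined with underscores) is my own guess at a workable convention, not necessarily the one shown on screen:

```python
# Hypothetical naming scheme: each tested dimension becomes
# "<letter>-<variant>", joined with underscores, e.g. "C-reliable_P-flower-dog".
FIELDS = {"H": "headline", "C": "copy", "P": "picture"}

def build_ad_name(**variants):
    """Call as build_ad_name(copy='reliable', picture='flower-dog')."""
    letter_for = {field: letter for letter, field in FIELDS.items()}
    return "_".join(f"{letter_for[field]}-{value}"
                    for field, value in sorted(variants.items()))

def parse_ad_name(name):
    """Recover the variant dict from an ad name built above."""
    parts = (chunk.split("-", 1) for chunk in name.split("_"))
    return {FIELDS[letter]: value for letter, value in parts}

name = build_ad_name(copy="reliable", picture="flower-dog")
parsed = parse_ad_name(name)
```

Generating names from code, rather than typing them by hand, avoids typos that would later make two runs of the same variant look like different adverts in your reports.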

It's very important when you're measuring the relative performance of each advert, to use a realistic end metric.

For example, in my case, I was using these online advertisements in order to encourage authors to apply to sell their notes on my platform.

This required me to set up some conversion pixels on Facebook's behalf within my website's backend.

If instead I hadn't done that and had just relied on the click-through rate or the cost per click on Facebook, I might end up at a local maximum rather than the global maximum, in the sense that I might be getting very, very cheap clicks on Facebook, but those people never actually apply to become authors.

This is a real danger with any sort of online advertising: that you mistake the proximate, easy-to-calculate metric for the one that actually counts and makes a difference to your business.

Let me make that last point concrete.

So we're looking at some of the results of my online advertisements to get authors to apply. And in this column, we see the cost per author application as measured by the conversion pixel on my website. Lower is better here.

And in this column, we see the cost per click on Facebook. Lower is also better here.

Now, let's compare these two adverts. This second one is much more cost-effective in terms of my overall business goal because I'm getting an author application for $14.50 instead of $17.40.

So this would lead me to prefer this particular advert. However, if I hadn't connected my frontend to Facebook's conversion pixels, then I would just have this data.

With only cost per click to go on, and lower always seeming better, I'd choose the top ad. I would think it was significantly better when in fact it was worse once I considered the entire picture.

This is what I mean by the danger of getting trapped in a local maximum and thereby missing the global maximum, the one that actually matters to you.
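Here is a toy illustration of that ranking flip. The spend and click figures are hypothetical, chosen only so the cost-per-application numbers come out at the $17.40 and $14.50 mentioned above:

```python
# Hypothetical spend/click/application figures; only the resulting
# $17.40 vs $14.50 cost-per-application matches the report above.
ads = {
    "A": {"spend": 174.0, "clicks": 600, "applications": 10},
    "B": {"spend": 145.0, "clicks": 400, "applications": 10},
}

for name, ad in ads.items():
    cpc = ad["spend"] / ad["clicks"]          # the proximate metric
    cpa = ad["spend"] / ad["applications"]    # the business metric
    print(f"Ad {name}: ${cpc:.2f} per click, ${cpa:.2f} per application")

# Ad A wins on cost per click ($0.29 vs $0.36) -- the local maximum --
# but Ad B wins on cost per application ($14.50 vs $17.40), the metric
# that actually feeds the business goal.
```

The two metrics disagree because cheap clicks are worthless if the people clicking never convert; only end-to-end conversion tracking reveals that.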

Another insidious and very difficult-to-avoid source of error in comparing online advertisements is statistical invalidity.

Imagine in this case that only one author ended up applying for each advert. If that's what happened, then you can't really read too much into this data.

That would be the statistical equivalent of flipping a coin once, seeing it turn up heads, and then concluding that it was biased.

In order to conclude that a coin is biased, you need to have a much larger number of flips. With three or four flips all coming up heads, there's still a good chance that the coin is unbiased because that can happen by random chance. However, if you flip the coin 300 times in a row and they were all heads, then your coin is very very likely to be biased.
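The coin-flip arithmetic behind that intuition is simple enough to check directly. This is just the probability of a fair coin producing an unbroken run of heads:

```python
# Probability that a FAIR coin produces n heads in a row by pure chance.
def p_all_heads(n):
    return 0.5 ** n

print(p_all_heads(1))    # 0.5: one head tells you nothing
print(p_all_heads(4))    # 0.0625: about 6%, still quite plausible for a fair coin
print(p_all_heads(300))  # astronomically small; the coin is almost surely biased
```

A 6% chance of seeing four heads from a fair coin is far too likely to rule the fair coin out, which is exactly why a handful of conversions can't separate two adverts.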

The exact same kind of reasoning applies with your online advertisement performance. You need to make use of the law of large numbers in order to have a fair comparison between the performances that you see in these sort of reports that the advertising platforms create.

There are many statistical significance calculators online that you can use in order to test the validity of your results. On top of that, certain advertising platforms now include statistical significance calculators as part of their reports, so you should check if that's available.

Let me make my previous point more concrete by comparing the performance of two ads and their statistical validity.

So, ad A was clicked 10,000 times, and had three conversions, leading to a conversion rate of 0.03%.

Now, ad B converted a whopping eight times. That's nearly three times as much and has a conversion rate of 0.08%.

Can you conclude, at least with 95% confidence, that ad B is better than ad A?

Well, if we go down here, it says that we can't. There's only a 93% confidence level, which is below the somewhat arbitrary cutoff of 95% (or 99%) that we use for statistical significance.
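I don't know exactly which method the on-screen calculator uses, but a one-tailed two-proportion z-test under a normal approximation reproduces the roughly 93% figure for these numbers, so here is a minimal sketch of that calculation:

```python
from math import sqrt, erf

def confidence(conv_a, n_a, conv_b, n_b):
    """One-tailed two-proportion z-test: confidence that B's underlying
    conversion rate really is higher than A's (normal approximation)."""
    p_a, p_b = conv_a / n_a, conv_b / n_b
    p = (conv_a + conv_b) / (n_a + n_b)            # pooled conversion rate
    se = sqrt(p * (1 - p) * (1 / n_a + 1 / n_b))   # standard error of the difference
    z = (p_b - p_a) / se
    return 0.5 * (1 + erf(z / sqrt(2)))            # standard normal CDF

# The example above: 3 vs 8 conversions out of 10,000 clicks each.
print(confidence(3, 10_000, 8, 10_000))            # ~0.93, below the 95% cutoff

# Same conversion rates but 100x the data: the difference becomes
# overwhelmingly significant, because z grows with the sample size.
print(confidence(300, 1_000_000, 800, 1_000_000))
```

The second call shows the law-of-large-numbers effect discussed below: identical rates, but far more conversions, push the confidence essentially to 100%.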

The danger inherent in reaching conclusions without taking statistical validity into account is that you may end up sitting on a fortress of false conclusions, yet have a lot of confidence in them being correct.

It might so happen that, with time, this advert ends up getting many more conversions just because of random chance, in which case it was actually the most effective one, if only you had waited long enough to find that out.

Off-camera, I've just gone and returned these numbers to the values they started at, and I want to demonstrate something else here.

So you can see here that ad B had nearly three times as many conversions as ad A, yet we still couldn't conclude that ad B was better performing with statistical significance.

The reason is that the total number of conversions is 11, which is very low. That does not count as a large number in the law of large numbers.

That's a bit like flipping a coin only 11 times. If we increase these numbers by 100, we get a total of 1200, which is a much bigger number, and we see now that statistical significance is 100%.

Even if we bring this down to 350, our statistical significance is 98%, even though the difference is only about 20%.

The overall point here is that statistical significance depends on the number of conversions, and for certain flows on your website, or for very small budgets, you might never get sufficient conversions to reach statistical significance.

In that case, you're going to just have to do the best with what you've got.

Aside from being technically difficult and requiring some sophisticated math, statistical significance calculations are also emotionally and psychologically challenging.

This is because you might be looking at the performance of your adverts and see that one is performing quite badly relative to the others. When you look at that bad performer, you might really want to switch it off, divert the money to other adverts, and get more bang for your buck.

This is especially true if you have a small budget. However, if you take this action, then you sacrifice the chance of ever knowing with certainty whether that particular ad was better or worse than the others.

So this is not a decision you should take lightly.