Episode #3

SEO Strategies For Web Apps - Part I

This episode goes through some of the strategies I used to get north of 200k monthly organic page views to my website. I'll cover picking keywords through Google Keyword Planner (and why it's important to build your code naming conventions around these), structured data (which increases CTRs on Google), and scalable mass-content creation - what I believe to be the best strategy for SEO both when I began in 2010 and ten years later when I released this screencast in 2020.

May 17, 2020

Show Notes



  • Neovim - The vim fork I happen to use. I have no opinion on whether it's better or worse than regular Vim. This video used version 0.4.3
  • fzf - A versatile command-line fuzzy-finder for opening files


Transcribed by Rugo Obi

Welcome to Semicolon&Sons.

Today's episode is about SEO.

Oxbridge Notes is getting about 200,000 hits per month, and at least 90% of that is organic traffic.

There are a lot of ways to do SEO in 2020 but the one that appeals to me most is to generate content from data or to generate content from user input.

The most important thing to do from an SEO perspective when you're starting off is to figure out what keywords you want to target.

Unfortunately this lies outside of code so we're going to have to visit the keyword planner on Google Ads first.

I believe you need an account. I'm not sure if you have to spend any money with them but either way it's absolutely worth it.

Once in here, what you want to go for is “get search volumes and forecasts”. I'm going to add in a bunch of possible keywords that I think might be good and then we’re going to go get the data and figure out which ones are worth building a website around.

One thing to watch out for here is that the location will be set to where you are. If your market is somewhere else, as mine is (the United Kingdom and the United States), you'll want to switch that around.

Click on historical metrics next and then in the second column there you can see a bunch of numbers.

For example, "law notes" has 110 monthly searches and "law revision" has 170.

I decided to combine both of those terms and a lot of my website is structured around the idea of revision notes.

We're not done here yet since I'm also active in the United States. Let's therefore compare the historical metrics there.

Here we can see a slightly different picture.

"Law outlines" is nearly as popular as "law notes". In the past, "law outlines" was more popular than "law notes". I therefore structured my US website around the idea of outlines.

I'm showing you this because things that are true in one market are not necessarily true in another.

Once I'd chosen my keywords, I really went to town on including them throughout my website.

You can see it's in the website name, the tagline, a bunch of headers, links and so on. It's even in the URL.

On top of that, if we visit my United States website, you can see that the keyword switches for the most part here. The website name is the same but now I'm using "outlines" in many of the places where I used "notes" before. This is in order to search engine optimize for the different market.

Very briefly, let me show you how I switch those keywords between geographic locations.

Essentially I rely on a translation mechanism. This is usually used to translate between different languages but here I'm just translating SEO keywords from region to region.
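A minimal sketch of that idea in plain Ruby, assuming a simple lookup table keyed by region. The table, the key name and both helper methods are hypothetical; a Rails app would more likely lean on its built-in I18n translation files for this.

```ruby
# Hypothetical keyword table: one entry per market, keyed like a locale.
SEO_KEYWORDS = {
  "en-GB" => { product_noun: "notes" },
  "en-US" => { product_noun: "outlines" }
}.freeze

DEFAULT_LOCALE = "en-GB"

# Look up the keyword for a region, falling back to the default region
# when the region (or the key within it) is missing.
def seo_keyword(key, locale: DEFAULT_LOCALE)
  SEO_KEYWORDS.fetch(locale, {}).fetch(key) do
    SEO_KEYWORDS.fetch(DEFAULT_LOCALE).fetch(key)
  end
end

# Build a heading that swaps in the keyword for the visitor's market.
def page_title(subject, locale:)
  "#{subject} #{seo_keyword(:product_noun, locale: locale)}"
end

puts page_title("Law", locale: "en-GB")
puts page_title("Law", locale: "en-US")
```

The point of routing every keyword through one lookup is that headers, links and taglines all change together when the market changes.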

There are a ton of reasons to look up your keywords before you build anything, and principal among these is that you reduce naming drift between your code and your business, and between your code and your marketing efforts.

As you can see here, the routes within my system are called revision_notes, based on my keyword.

Unfortunately, if you look at where those routes are leading to by following on to the latter half of each line, you see that the controller names don't quite match onto the entities in question.

This sometimes leads to confusion in communicating and it can cause non-search-engine-optimized entity names to crop up on the public-facing website.

Next let's take a look at how a landing page from Oxbridge Notes appears in Google Search. I'm going to search for some medical law notes or whatever and you can see there’s some pricing information due to some micro-data about a product (I believe).

There’s the title tag, the meta-description and then at the top here you’ll also see my URL.

A gotcha for me with all of this stuff is that you don't have complete control over what Google will present in this meta description and meta tags. At best you can suggest to them. Also there is an enormous lag between changes you make and changes Google picks up.

For example, if you look inside my Product model I have a meta_description method that returns basically a fixed string that varies only based on the data for that individual product.
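Roughly, such a method might look like the sketch below. The Struct is a hypothetical stand-in for the real ActiveRecord model, the field names are assumptions, and the wording of the string is approximated from what I read out in this episode.

```ruby
# Hypothetical stand-in for the Product model; the real fields are not shown.
Product = Struct.new(:subject, :keyword, keyword_init: true) do
  # Computed on every call rather than stored, so changing the wording
  # only needs a code deploy, not a rewrite of database rows.
  def meta_description
    "Carefully curated #{subject} #{keyword} written by high-scoring Oxford grads"
  end
end

product = Product.new(subject: "criminal law", keyword: "notes")
puts product.meta_description
```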

But that string there, "Carefully curated", for example, "criminal law notes written by a high scoring Oxford grads". That doesn't appear in the meta description online, at least for the medical law notes.

Why might that be? Well, if I open up git here, I'm using the Fugitive plugin. You can see that the last commit that affected that line of code was at the end of 2019. That's about 4 months ago, as of right now.

And in the previous commit, I used to have a method, set_meta_description, instead of just a pure meta_description function.

The difference is that previously I stored the meta description in the database. But now I don't do that anymore.

What you see in Google is what I previously had within that database.

I'm bringing this up to let you know that if you make a mistake here you will spend a very long time recovering from it.

One of the biggest edges you can get today in SEO is if you can get Google to use structured data for your website.

Let's have a look at all the types of structured data available.

Here is the article type. This is something I've tried to implement but it's not working quite yet. Let's take a look in a minute...

...The Breadcrumb snippet... Course snippet.

The overall idea here is that these bits of structured data turn into pretty snippets that increase your conversion rate and make your search entries stand out on Google. So you want to have as many of these as possible.

Let's take a look at the article data and what kind of structure it has.

Of course, there is no guarantee that if you include this code Google will also show your snippet in the search result.

Here’s a web page that I’d like to use the article structure for. Let's take a look at the generated HTML and search for the article type. As you can see there is a JSON-LD block, well, a bit of JSON, similar to what we saw previously.

Now let's check to see if I'm getting one of those pretty snippets over on Google by searching for that particular page. It doesn't look like it's working for me. Let's take a look and figure out why.

In order to check if what I've got is correct, Google provides something called the structured data testing tool.

Once in here, I just enter the URL and Google will fetch the page and extract the structured data from it. And we can see here that in my article data type there’s one warning. Perhaps that's the reason why I'm not getting that snippet in the search result.

Scrolling down we see that I’m missing the mainEntityOfPage key. I’d like to fix this issue so I googled the mainEntityOfPage and ended up on the Schema.org reference page.

Here's the piece of code I need to put on my website. I have a folder full of all these structured data templates called structured_data. And in that I have an article sub template. So I'm just going to add that key we saw on Schema.org right here near the top.

I'm not really sure how to get the current URL of this exact page in Rails. I believe it's current_url because that's what they use in the test environment but the only way to be sure is to run an integration test against this.

I have an integration test for law cases which is the only area that uses this article bit. So I'm going to run that for them. It's going to take a second because my test server has not been pre-loaded. And red... It looks like I had it incorrect.

It is request.original_url. Yeah that's, that's always available. So I'm going to run the previous test again with :TestLatest and it has turned green. Excellent.
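For reference, here is a plain-Ruby sketch of the kind of JSON-LD this produces. The helper names, the headline and the example.com URL are placeholders, not what's in my codebase; in a Rails view, page_url would be request.original_url. Only a minimal subset of the Article fields is shown.

```ruby
require "json"

# Builds the data for a schema.org Article, including the
# mainEntityOfPage key that the testing tool warned about.
def article_structured_data(headline:, page_url:, date_published:)
  {
    "@context" => "https://schema.org",
    "@type" => "Article",
    "mainEntityOfPage" => { "@type" => "WebPage", "@id" => page_url },
    "headline" => headline,
    "datePublished" => date_published
  }
end

# Wraps the data in the script tag that goes into the page's HTML.
# "</" is escaped as "<\/" (still valid JSON) so that a value can never
# close the script tag early.
def json_ld_script_tag(data)
  json = JSON.generate(data).gsub("</") { "<\\/" }
  %(<script type="application/ld+json">#{json}</script>)
end

data = article_structured_data(
  headline: "Example case summary",               # placeholder
  page_url: "https://example.com/cases/example",  # placeholder
  date_published: "2020-05-17"                    # placeholder
)
puts json_ld_script_tag(data)
```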

I want to deploy this straight away. Of course, I'm going to have to stage that and commit as well. Let me do that really quick with the Fugitive plugin and push to origin which is where I have CircleCI.

We can see over on CircleCI that it's already prepared the environment and has started running all those integration tests. These are quite slow so we're gonna have to give that some time.

Let's return to this before the end of the video and see how it's doing.

I've mentioned in another episode that I initially started the business with lots of AdWords. But over time that ceased to be tenable and at some point it just wasn't worth investing a lot of money into AdWords.

If I go to my historic AdWords data and select a campaign from Oxbridge Notes, the oldest one I have is 2013/2014.

That's about 3 years after I started the business but it will do. Now we scroll down here and we get an average cost per click of 27 cents.

OK that's fine, and then we see a cost per conversion of 3.74 euros. Considering that the average lifetime value of a customer around that point was probably about 70 bucks, this was extremely profitable.

I really should have made more use of this opportunity to get traffic while it was cheap.

Compare this to the situation in the last month in 2020. If you scroll down here we see that the average cost per click is not far off what it was previously in 2013/14.

Here it is €0.39, a bit higher. But the cost per conversion is now astronomical. It's €222.

The customer lifetime value at this point in the business is higher due to email marketing. I reckon it's about 140. But a cost per conversion of 222 is too high and therefore I need to fix some of these ads because they are not really worthwhile.

Keep those figures in mind as we compare the situations for SEO over in Google Analytics. I'm going to filter just for organic search results. That's the number of organic search users I had in the last month. Next, let's do a quick calculation where we multiply the number of users by what it would have cost me to get those users using AdWords.

The cost per click is currently 39 cents, so I multiply the 40,423 users by €0.39 and I get €15,764.

That's the value of the traffic generated through SEO in the last 30 days.
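The calculation is simple enough to write down in Ruby. The method name is made up for illustration; the figures are the ones quoted above, and the money is kept in integer cents to avoid floating-point rounding.

```ruby
# Value of organic traffic, priced at what the same clicks would cost
# on AdWords. Returns the result in cents.
def adwords_equivalent_cents(users, cost_per_click_cents)
  users * cost_per_click_cents
end

value_cents = adwords_equivalent_cents(40_423, 39) # 40,423 users at €0.39
euros, cents = value_cents.divmod(100)

puts format("Equivalent AdWords spend: €%d.%02d", euros, cents)
```

This treats every organic user as one paid click, which is a rough proxy rather than an exact comparison.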

What's more, the amount of traffic available through AdWords is much less than the amount of traffic available through organic search. You know this yourself: often when you see an ad you just skip it and go for the first organic search result.

By having an organic search strategy you tap into this preference of people.

So what pages on my website are responsible for bringing in all that traffic?

Let's take a look at the landing page report over in Google Analytics and look for some patterns.

Here we see a total of 57,000 sessions within this period. The top result here only accounts for 1.74%. That's just the law taxon, not all the law products. The majority of the traffic comes from a long tail which was my intentional strategy with SEO on this website.

So if I search here for the keyword "samples", you will see that a lot of the URLs there contain "samples" pretty far in.

This accounts for 29,000 of the sessions or about 52%.

Another SEO initiative I have is cases, summaries of particular law cases. This accounts for 19,000 sessions or 34%.

If you add just those two structures together, we come to about 86% of the sessions.

So what does this content look like over on Oxbridge Notes?

I'm going to go into one of the law pages here, administrative law, and then scroll down to one of the sample pages. God, that pop-up is annoying.

Here on one of those sample pages, there is a PDF preview of the digital document that a particular author or seller on the website has uploaded. This contains unique content that’s often well worded, therefore I have a source of text, a source of human written text.

I don't believe Google indexes PDF content, so I also converted the PDF into some HTML using some back-end tools.

In general, I think as programmers the most effective SEO strategy is to create some sort of system where other people naturally write content and then use that for SEO.

For example, you may have a forum where people talk to one another and then you might rank for that reason. Or you might be able to even generate content programmatically where none existed before.

I prefer this to writing endless blog posts that may or may not ever rank.

How many of those sample pages do I have in total in my production website?

So I'm going to open up my production console which opens up an interactive Ruby session on my server. I have wrapped this in a bin/production console because I like to create bin scripts around all the commands and keep these bin scripts consistent across every single project even though the underlying technology may differ.

Now I'm going to select all the notes files with samples. Some of them do not have samples because the document is too short or I was unable to convert them for whatever reason. You will see here that I have 8,608 of these pages.

Next for the second category, let's look at how many law cases I have. There’s a total of 1,610 here. How did I generate these?

Well, I hired a virtual assistant over Upwork and over the course of about 2 months she extracted that information from documents I had the ownership rights to.

In total the project cost less than €2,000.

Why am I allowed to use this material in this way? The answer is that there's a contract between me and the authors on my website, that they would permit me to use a certain percentage of their content as free samples in order to attract people to the website, attract people to the pages where they’re selling the notes and increase their sales.

I know quite a few other web entrepreneurs who have successful SEO on their websites and sometimes you'd be amazed at the things that work.

For example, in one case someone got a lot of data I believe about mobile phones and then automatically created text from that JSON, text that compared one phone to the other. He then submitted all these generated pages to Google’s index. And — here’s the trick — the pages were interesting enough for human users. That means that they would visit the pages, spend time on them, and even comment on them.

Here is a very dumb example to recreate something like that.

I have some movies data... let's say it’s JSON. I have a function, make_wordy which takes in a movie and creates a compelling paragraph of some sort. Well, to be compelling one day. And then I simply look through those movies and create some HTML out of that.

Let's see what it looks like when I run this script. And there you can see two separate HTML pages that could potentially be indexed.
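A script along those lines might look like the following Ruby sketch. It is a reconstruction, not the code shown on screen: the movie records are made-up sample data, and movie_page is a hypothetical helper. Only the make_wordy name comes from the episode.

```ruby
require "json"
require "erb"

# Made-up sample data standing in for a real movies dataset.
MOVIES_JSON = <<~JSON
  [
    {"title": "Example Movie One", "year": 1999, "director": "A. Director", "genre": "science fiction"},
    {"title": "Example Movie Two", "year": 2010, "director": "B. Director", "genre": "comedy"}
  ]
JSON

# Turns one movie record into a sentence of prose.
def make_wordy(movie)
  "#{movie['title']} is a #{movie['genre']} film from #{movie['year']}, " \
    "directed by #{movie['director']}."
end

# Wraps the generated sentence in a bare-bones HTML page,
# escaping the text so that data can't break the markup.
def movie_page(movie)
  title = ERB::Util.html_escape(movie["title"])
  body = ERB::Util.html_escape(make_wordy(movie))
  "<html><head><title>#{title}</title></head>" \
    "<body><h1>#{title}</h1><p>#{body}</p></body></html>"
end

pages = JSON.parse(MOVIES_JSON).map { |movie| movie_page(movie) }
pages.each { |html| puts html, "" }
```

The interesting work is all in make_wordy: the more it can say that a human would actually want to read, the less spammy the resulting pages are.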

In the wrong hands this is obviously spammy and won't get you anywhere. But with a bit of thought into making things interesting you can go very far with this approach.

This screencast is already very long, so I'm going to leave it for today and continue next week.

I've got a lot more to say about this topic.

See you then.