Tracking before your web application's router will also track non-existent URLs and redirect junk

This is part of the Semicolon&Sons Code Diary - consisting of lessons learned on the job.

Last Updated: 2024-04-18

We wanted to tell users how many people visited each product page on a website as a form of social-proof marketing.

I wrote the following piece of middleware to intercept every request and track it:

class HitCounter
  def initialize(app)
    @app = app
  end

  def call(env)
    # The `Traffic` class, not shown, simply talks to a Redis database
    # and increments a counter stored under a key derived from the URL.
    Traffic.new.track_hit(Rack::Request.new(env))
    app.call(env)
  end

  private

  attr_reader :app
end

A month later, I woke up to thousands of exceptions. Why? Because Redis had run out of memory, something I hadn't foreseen given the relatively minuscule surface area of my website. There were a few reasons for this, but the one this article focuses on is that spiders, hackers, etc. attempted to crawl an ungodly number of non-existent routes.

Every one of those requests created a new key, so I ended up accumulating keys for junk that wasn't even on my website.

(Additionally, temporary URLs redirecting to S3 assets were also tracked, despite being useless for the intended data analysis.)
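To get a feel for how quickly this goes wrong, here is a rough, dependency-free simulation of the problem. It is only a sketch: a plain Hash stands in for Redis, and `NaiveHitCounter`, `KNOWN_PATHS`, and the two-route `site` lambda are made-up names for illustration, not code from the real application. The only real convention it relies on is that a Rack app is a callable that takes an env hash (with the path under `"PATH_INFO"`) and returns a status, headers, and body.

```ruby
# Counts every request BEFORE routing, like the middleware above.
# Assumption: a Hash replaces Redis so the sketch runs without any gems.
class NaiveHitCounter
  attr_reader :hits

  def initialize(app)
    @app = app
    @hits = Hash.new(0)
  end

  def call(env)
    @hits[env["PATH_INFO"]] += 1 # tracked before the router has had a say
    @app.call(env)
  end
end

# Hypothetical site with just two real pages; everything else is a 404.
KNOWN_PATHS = ["/products/1", "/products/2"].freeze

site = lambda do |env|
  if KNOWN_PATHS.include?(env["PATH_INFO"])
    [200, { "content-type" => "text/plain" }, ["ok"]]
  else
    [404, { "content-type" => "text/plain" }, ["not found"]]
  end
end

counter = NaiveHitCounter.new(site)

# Two genuine visits...
KNOWN_PATHS.each { |path| counter.call("PATH_INFO" => path) }

# ...and a crawler probing 1,000 paths that do not exist.
1_000.times { |i| counter.call("PATH_INFO" => "/wp-admin/probe-#{i}.php") }

puts counter.hits.size # prints 1002: two real keys and 1,000 junk ones
```

The site has two pages, yet the store ends up with over a thousand keys, and the number grows with whatever the crawlers decide to request rather than with the size of the site.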

The fix was to move the tracking into the controller, after the routing had decided what to do and filtered out non-existent routes.

Here's the fixed code. Notice that it is immune to 404s and other errors that are only discovered at the controller level:

module ControllerHelpers
  module HitTracking
    # Usage (within a controller): track_hits only: :show

    extend ActiveSupport::Concern

    class_methods do
      def track_hits(only: [])
        after_action -> { Traffic.new.track_hit(request) },
          if: -> { response.successful? }, only: Array.wrap(only)
      end
    end
  end
end
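The principle behind the fix (count only after the application has decided the request is legitimate) can also be tried out in isolation at the Rack level. The following is a sketch of that idea, not what I actually deployed: `SuccessOnlyHitCounter` and the demo routes are hypothetical, and a Hash again stands in for Redis. It calls the app first and records a hit only for 2xx responses, so both 404s and redirects are skipped.

```ruby
# Variation on the middleware: let the app respond first, then count
# only successful (2xx) responses.
# Assumption: a Hash replaces Redis so the sketch runs without any gems.
class SuccessOnlyHitCounter
  attr_reader :hits

  def initialize(app)
    @app = app
    @hits = Hash.new(0)
  end

  def call(env)
    status, headers, body = @app.call(env) # routing happens in here
    @hits[env["PATH_INFO"]] += 1 if (200..299).cover?(status)
    [status, headers, body]
  end
end

# Hypothetical app: one real page, one redirect to an asset, 404 otherwise.
demo_app = lambda do |env|
  case env["PATH_INFO"]
  when "/products/1"
    [200, { "content-type" => "text/plain" }, ["ok"]]
  when "/downloads/1"
    [302, { "location" => "https://example.com/asset" }, []]
  else
    [404, { "content-type" => "text/plain" }, ["not found"]]
  end
end

counter = SuccessOnlyHitCounter.new(demo_app)
["/products/1", "/downloads/1", "/wp-login.php"].each do |path|
  counter.call("PATH_INFO" => path)
end

puts counter.hits.keys.inspect # prints ["/products/1"]
```

The controller-level version above is still what I'd reach for in a Rails app, since `track_hits only: :show` lets each controller opt in per action instead of tracking every successful response on the site.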