Learn system design and architecture in context, via a behind-the-scenes tour of Oxbridge Notes, a profitable web app serving over 200,000 requests/month for 10 years (albeit with bleeding-edge code). See how performance is improved with queues, DB indexes, memcached, full-text search, microservices, and HTTP caching; how production errors are tracked and escalated when necessary; how classes are organized into a sensible architecture; how it's containerized for continuous integration testing; how git is used to minimize conflicts; and more. At the end, you'll know exactly how to do the same and be able to ace any system design interview.
Data is more important than code; therefore the most important job you, as a programmer, have is to design a system that allows for a simple, constrained, and predictable set of data. In this episode, I'll discuss how null constraints can reduce the number of types your program has to deal with, thereby simplifying your code. Then I'll discuss how check constraints can force data to take a limited (and more useful) range of values. Lastly, I'll explain why it's better to carry all this out at the database level rather than at the Ruby/Python/PHP/JS level.
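To make the idea concrete, here's a minimal sketch of null and check constraints using Python's built-in sqlite3 module. (The site itself runs on Ruby; the `products` table and its columns are invented for illustration.)

```python
import sqlite3

conn = sqlite3.connect(":memory:")

# NOT NULL removes the "is it nil?" branch from application code;
# CHECK restricts a column to a limited, useful range of values.
conn.execute("""
    CREATE TABLE products (
        id    INTEGER PRIMARY KEY,
        name  TEXT NOT NULL,
        price INTEGER NOT NULL CHECK (price >= 0)
    )
""")

conn.execute("INSERT INTO products (name, price) VALUES ('Contract Law Notes', 4999)")

# Both of these violate a constraint and are rejected by the database
# itself, regardless of which backend language wrote the row.
for label, name, price in [("NULL name", None, 100),
                           ("negative price", "Tort Notes", -5)]:
    try:
        conn.execute("INSERT INTO products (name, price) VALUES (?, ?)", (name, price))
    except sqlite3.IntegrityError as e:
        print(f"{label} rejected: {e}")
```

Because the rules live in the database, they hold no matter which codepath (or which language) touches the data.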
Here I continue on from the last episode (null constraints, etc.) in exploring ways to use an SQL database to ensure data integrity. I'll show ways to avoid shooting yourself in the foot by setting non-existent relationships or by deleting rows that are referenced elsewhere in the database and are therefore necessary. I'll show how to lean on foreign keys to build resource allocation features with practically no backend code. Next I'll demonstrate the perils of relying on uniqueness validations at the Ruby/PHP/Python backend level. Finally I'll show how to avoid bloat in pivot tables for many-to-many relationships.
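Here's a sketch of foreign keys and a composite unique index using Python's sqlite3 (the site runs Ruby; the table names are invented, and note that SQLite needs foreign keys switched on per connection):

```python
import sqlite3

conn = sqlite3.connect(":memory:")
conn.execute("PRAGMA foreign_keys = ON")  # SQLite requires this per connection

conn.execute("CREATE TABLE courses (id INTEGER PRIMARY KEY, title TEXT NOT NULL)")
conn.execute("""
    CREATE TABLE notes (
        id        INTEGER PRIMARY KEY,
        course_id INTEGER NOT NULL REFERENCES courses(id)
    )
""")
conn.execute("INSERT INTO courses (id, title) VALUES (1, 'Land Law')")
conn.execute("INSERT INTO notes (course_id) VALUES (1)")  # fine: course 1 exists

try:
    conn.execute("INSERT INTO notes (course_id) VALUES (999)")  # no such course
except sqlite3.IntegrityError as e:
    print("dangling reference rejected:", e)

try:
    conn.execute("DELETE FROM courses WHERE id = 1")  # still referenced by a note
except sqlite3.IntegrityError as e:
    print("delete of referenced row rejected:", e)

# A composite unique index prevents duplicate rows from ever
# accumulating in a many-to-many pivot table -- unlike a backend-level
# uniqueness validation, it cannot be bypassed by a race condition.
conn.execute("""
    CREATE TABLE course_tags (
        course_id INTEGER NOT NULL REFERENCES courses(id),
        tag       TEXT NOT NULL,
        UNIQUE (course_id, tag)
    )
""")
conn.execute("INSERT INTO course_tags VALUES (1, 'law')")
try:
    conn.execute("INSERT INTO course_tags VALUES (1, 'law')")  # duplicate pair
except sqlite3.IntegrityError as e:
    print("duplicate pivot row rejected:", e)
```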
In this third episode about data integrity, I talk about how to save your data such that computers have an easier time working with it (e.g. for filters), and how to ensure that data stays valid under your own rules as you add ever more validations over the years. I also cover avoiding the mistake of duplicating data across tables and then needing to keep those records in sync, using database transactions to ensure that state changes to multiple rows either all occur or all fail as a unit (preventing invalid intermediate states from cropping up), and lastly using cascades to ensure the deletion of associated records that should not exist once their 'parent' records are deleted.
A key factor in reducing my coding time for Oxbridge Notes to a few hours per month was adding comprehensive integration tests. Today I demonstrate how these tests work using the test browser's non-headless mode, which lets you actually watch the browser execute your tests. Next I show how to write such tests using tools like factories (touching on how I test tracking code). Following that, I show how to set up a continuous integration server (using Docker containers), and how to run your CI tests locally to verify they work before pushing them to the cloud. Lastly, I finish with a discussion of what we should test, given a limited testing budget.
In this episode I talk about my personal best practices for acceptance testing in web development. Firstly, how to reduce brittleness by using stable test IDs as the interface between your tests and your markup. Next I discuss why you should use non-JS test drivers where possible, for speed. Then I talk about the benefits of making your integration tests fail on ANY JS exception -- even one only tangentially related to the system under test. Lastly I give the reasoning behind why I like to automatically capture screenshots of any test failures.
Continuing on from the last episode, I discuss more best practices for acceptance tests. Firstly, I discuss how to loosen your assertions so as to reduce coupling (and brittleness). Next I talk about abstracting away your config so that changes to specific config variables won't break your tests. Lastly, I discuss the necessity of clearing state between your tests for dependencies like email collectors, file systems, databases, etc.
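As a sketch of state-clearing, here's how per-test setup and teardown might look with Python's unittest (the upload directory stands in for any file-system dependency; the class and names are invented):

```python
import os
import shutil
import tempfile
import unittest

class UploadTest(unittest.TestCase):
    # Each test gets a fresh upload directory, so leftovers from a
    # previous test can never cause a false pass or false failure.
    def setUp(self):
        self.upload_dir = tempfile.mkdtemp(prefix="test_uploads_")

    def tearDown(self):
        shutil.rmtree(self.upload_dir)  # clear file-system state

    def test_starts_empty(self):
        self.assertEqual(os.listdir(self.upload_dir), [])
```

The same pattern applies to databases (truncate between tests) and email collectors (empty the inbox between tests).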
In war, no plan survives contact with the enemy. In tech, no code survives contact with real users. In this episode, I discuss systems to monitor and stay alerted about errors in production. By watching me debug a few of my exceptions, you'll learn: 1) How these tools can integrate back-traces of the original source code, rather than any minified or transpiled versions. 2) How to configure your error reporting to include not just back-end errors, but also errors in the front-end and background queues. 3) How these tools can speed up back-end debugging by showing the SQL queries prior to the error. 4) How these tools speed up front-end debugging by using telemetry to list the browser events leading up to the error. 5) How you can use the info persisted in the error tracking tool to salvage mission-critical data you would otherwise have lost.
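A rough sketch, in plain Python, of how breadcrumb-style telemetry works under the hood (the event strings are invented; a real error tracker records things like clicks and SQL queries automatically):

```python
import traceback
from collections import deque

# Keep only the N most recent events leading up to any exception.
breadcrumbs = deque(maxlen=20)

def breadcrumb(event):
    breadcrumbs.append(event)

def report(exc):
    """Build an error payload: the exception, its back-trace, and the
    trail of recent events (e.g. SQL queries, browser clicks)."""
    return {
        "error": repr(exc),
        "backtrace": traceback.format_exception(type(exc), exc, exc.__traceback__),
        "breadcrumbs": list(breadcrumbs),
    }

breadcrumb("SQL: SELECT * FROM products WHERE id = 1")
breadcrumb("click: #buy-button")
try:
    1 / 0  # stand-in for a real production error
except ZeroDivisionError as e:
    payload = report(e)

print(payload["breadcrumbs"][-1])  # the last event before the crash
```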
Continuing on from last week's episode, I talk about: 1) How being on top of your exceptions can wow customers and clients. 2) The dangers of being inundated with too many exceptions, and how to make the numbers manageable with filters. 3) How to connect errors in production to the actual humans you might need to apologize to. 4) The need to provision error tracking in ALL of your microservices — every component needs it. 5) The need to scrub sensitive data before potentially sending it to a 3rd-party error-reporting platform. 6) The value of connecting error reporting to your deploys.
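Scrubbing can be as simple as a recursive filter applied to the error payload before it leaves your infrastructure. A minimal sketch in Python, with an invented key list:

```python
# Hypothetical scrubber: redact common sensitive fields before an error
# payload is sent to a third-party tracker.
SENSITIVE_KEYS = {"password", "credit_card", "ssn", "auth_token"}

def scrub(payload):
    """Recursively replace values of sensitive keys with a placeholder."""
    if isinstance(payload, dict):
        return {k: "[FILTERED]" if k in SENSITIVE_KEYS else scrub(v)
                for k, v in payload.items()}
    if isinstance(payload, list):
        return [scrub(v) for v in payload]
    return payload

event = {"user": {"email": "a@b.com", "password": "hunter2"},
         "params": [{"credit_card": "4111111111111111"}]}
print(scrub(event))
```

Most hosted error trackers offer built-in filter lists that work along these lines; the point is that the redaction must happen before the data crosses the wire.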
Logs are a fundamental part of diagnosing and fixing issues in your production system. I talk about: 1) the importance of having a single command to immediately tail all relevant logs; 2) the need to ensure that one line of log corresponds to one event (in order to play well with Unix tools); 3) how to use the timeless awk command to parse your logs; 4) the value of being comprehensive in what you log, so that you can better diagnose issues and re-run failed requests after a bug has been fixed; 5) my system for recreating entire user session journeys by parsing my logs.
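The episode itself uses awk; as an illustration of why one-event-per-line logs are so convenient, here is the same kind of per-field counting sketched in Python (the log lines are invented for the example):

```python
from collections import Counter

# One event per line means standard text tools (awk, grep, sort) -- or a
# few lines of Python -- can answer questions directly.
log = """\
2024-05-01T10:00:01 GET /products/1 200 35ms
2024-05-01T10:00:02 GET /products/2 404 12ms
2024-05-01T10:00:03 POST /checkout 500 210ms
2024-05-01T10:00:04 GET /products/1 200 30ms
"""

# awk equivalent: awk '{ print $4 }' access.log | sort | uniq -c
statuses = Counter(line.split()[3] for line in log.splitlines())
print(statuses)  # requests per HTTP status code
```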
In this episode, I explain why you need downtime notifiers in addition to regular error tracking systems. Briefly: because regular error reporting systems fail to even boot when the error is very serious. Also: total failure demands that you be reached via a more urgent channel, e.g. a phone call or SMS. Next I describe a common blind spot that remains undetected even when you have a downtime notifier in place: cronjobs that fail to run. Happily, there is an elegant solution to this problem, in the form of heartbeat monitoring.
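The heartbeat idea fits in a few lines: each cronjob "pings" after a successful run, and a monitor alerts when the last ping is older than the expected interval. A sketch in Python, with invented function and job names:

```python
import time

last_ping = {}  # job name -> unix timestamp of last successful run

def record_ping(job):
    """Called by the cronjob itself at the end of a successful run."""
    last_ping[job] = time.time()

def overdue_jobs(expected_interval, grace=60, now=None):
    """Jobs whose last ping is older than the expected interval plus a
    grace period -- i.e. jobs that silently failed to run."""
    now = now if now is not None else time.time()
    return [job for job, ts in last_ping.items()
            if now - ts > expected_interval + grace]

record_ping("nightly_backup")
# Simulate checking 25 hours later for a job expected every 24 hours:
future = time.time() + 25 * 3600
print(overdue_jobs(expected_interval=24 * 3600, now=future))
```

Hosted heartbeat services work the same way, except the "ping" is an HTTP request and the alert goes out by phone or SMS.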
System failure is not binary - your application may be slow, or restarting too often, or teetering on the edge of disaster. System monitoring is how you detect these issues - either before or after the fact. I show you how I scan my aggregated logs from multiple components for certain logged error conditions (e.g. hacking attempts). If serious ones crop up, an email alert is triggered. Next I demonstrate how to use Monit to get real-time stats on your system's CPU and RAM usage. Following that I use another tool to graph my RAM usage patterns over the last week. Zooming out again, I introduce response time stats.
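The log-scanning step might look something like this in Python (the patterns and log lines are invented; a real scan would run over the aggregated logs and hand the matches to an email-alerting step):

```python
import re

# Hypothetical monitor rule: flag requests probing for things that
# should never exist on this site, e.g. WordPress login pages or
# exposed .env files -- a common signature of hacking attempts.
SUSPICIOUS = re.compile(r"(wp-login\.php|phpmyadmin|\.env)", re.IGNORECASE)

log_lines = [
    "GET /products/1 200",
    "GET /wp-login.php 404",  # hacking attempt: WordPress probe
    "GET /.env 404",          # hacking attempt: secrets probe
    "GET /about 200",
]

alerts = [line for line in log_lines if SUSPICIOUS.search(line)]
print(f"{len(alerts)} suspicious requests found")
for line in alerts:
    print("ALERT:", line)
```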
Tens of thousands of programmers have picked up new tricks and/or made progress growing their software businesses.
These screencasts are great - I love the intentional focus on marketing as well as technical excellence.— Cliff Weitzman (Forbes 30 Under 30)
This content is absolute gold.— Ben P
Finally, a real engineer showcases a real-world project.— Hải Vũ
That's not all, either. Keep reading for more praise of Semicolon&Sons.
We publish a video once every week, on Sundays. Every 2nd video is completely free to watch.
If you'd like, we can remind you via email when a new episode is released. We'll also keep you up to date about the top-secret game we're developing for learning programming.