Take data interchange specs seriously

This is part of the Semicolon&Sons Code Diary - consisting of lessons learned on the job. You're in the data category.

Last Updated: 2024-04-23

An error in the JSON-LD structured data on the web page for one of my media law products went unnoticed for many months.

When I happened to open the page for that product, I saw the following error in the browser's JavaScript console:

fbevents.js:23 [Facebook Pixel] - Unable to parse JSON-LD tag. Malformed JSON found: '
{
  "@type": "product",
  "description": "A collection of the best LPC Media Law notes the director of Oxbridge Notes (an Oxford law graduate) could find after combing through twenty-nine LPC samples from outstanding students with the highest results in England and carefully evaluating each on accuracy, formatting, logical structure, spelling/grammar, conciseness and "wow-factor".

  In short these are what we believe to be the strongest set of Media Law notes available in the UK this year. This collection of notes is fully updated for recent exams, also making them the most up-to-date study materials you'll find.

  We're confident you'll find these revision materials useful - check the samples below to see why we are so excited.
  ",
  "offers": {
    "@type": "Offer",
    ...
    }
  }
'.

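The cause of the error was that the description field contained unescaped newline characters, which JSON does not allow inside a string. The unescaped double quotes around "wow-factor" are invalid for the same reason: they end the string early.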
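To make the failure concrete, here is a minimal sketch using the parser from Ruby's standard-library json (not the Facebook Pixel parser, which I am only assuming follows the same rule). The helper valid_json? and the three sample strings are hypothetical names made up for illustration.

```ruby
require "json"

# Hypothetical helper: true if the text parses as JSON, false otherwise.
def valid_json?(text)
  JSON.parse(text)
  true
rescue JSON::ParserError
  false
end

# A raw line break inside a string value, as in the broken JSON-LD tag.
RAW_NEWLINE = "{\"description\": \"first line\nsecond line\"}"

# The same line break written as the two-character escape sequence \n.
ESCAPED_NEWLINE = "{\"description\": \"first line\\nsecond line\"}"

# Unescaped inner double quotes, like those around "wow-factor".
RAW_QUOTES = "{\"description\": \"conciseness and \"wow-factor\".\"}"

puts valid_json?(RAW_NEWLINE)     # false
puts valid_json?(ESCAPED_NEWLINE) # true
puts valid_json?(RAW_QUOTES)      # false
```

A check like this could run in a test against the rendered page, so that a malformed tag is caught before a browser console is the first to report it.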

Here is the original database item's description field (as seen through a Ruby REPL):

Product.find_by(slug: "lpc-law-media-law").description
=> "A collection of the best LPC Media Law notes the director of Oxbridge Notes
(an Oxford law graduate) could find after combing through twenty-nine LPC
samples from outstanding students with the highest results in England and
carefully evaluating each on accuracy, formatting, logical structure,
spelling/grammar, conciseness and \"wow-factor\".\r\n\r\nIn short these are what
we believe to be the strongest set of Media Law notes available in the UK this
year. This collection of notes is fully updated for recent exams, also making
them the most up-to-date study materials you'll find.\r\n\r\nWe're confident
you'll find these revision materials useful - check the samples below to see why
we are so excited.\r\n"

The value looks harmless in the database, yet a JSON validator gives the following error for the JSON actually produced on the web page:

Expecting 'STRING', 'NUMBER', 'NULL', 'TRUE', 'FALSE', '{', '[', got 'undefined'

Running the Rails converter method .to_json on the description field made it validate (at the expense of introducing surrounding double quotes, which are simply the delimiters of a JSON string):

Product.find_by(slug: "lpc-law-media-law").description.to_json
=> "\"A collection of the best LPC Media Law notes the director of Oxbridge
Notes (an Oxford law graduate) could find after combing through twenty-nine LPC
samples from outstanding students with the highest results in England and
carefully evaluating each on accuracy, formatting, logical structure,
spelling/grammar, conciseness and \\\"wow-factor\\\".\\r\\n\\r\\nIn short these
are what we believe to be the strongest set of Media Law notes available in the
UK this year. This collection of notes is fully updated for recent exams, also
making them the most up-to-date study materials you'll find.\\r\\n\\r\\nWe're
confident you'll find these revision materials useful - check the samples below
to see why we are so excited.\\r\\n\""

In this JSON-encoded version, you can see that the carriage return and newline characters have been escaped, as have the inner double quotes.

Here's the effect of printing these escaped strings:

# Original version
[3] pry(main)> puts "hi \n jack"
hi
 jack

# Version after calling to_json
[4] pry(main)> puts "hi \\n jack"
hi \n jack

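i.e. to_json replaces each actual newline character with the two-character escape sequence \n (a backslash followed by the letter n). Printing the encoded string therefore shows \n literally, and a JSON parser turns it back into a real newline when it reads the document.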
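The round trip can be sketched with the standard-library json (plain Ruby here rather than Rails, whose to_json may escape a few more characters). The constant names are hypothetical and the hash wrapper is just one way to mimic a JSON-LD field.

```ruby
require "json"

ORIGINAL = "hi \n jack"

# to_json wraps the text in double quotes and replaces the raw newline
# with the escape sequence \n (a backslash followed by the letter n).
ENCODED = ORIGINAL.to_json

puts ENCODED                 # "hi \n jack"  (quotes and backslash are literal)
puts ENCODED.include?("\n")  # false: no raw newline remains

# Wrapping the value in a hash mirrors how it sits inside a JSON-LD object.
DOCUMENT = { "description" => ORIGINAL }.to_json
DECODED = JSON.parse(DOCUMENT)["description"]

puts DECODED == ORIGINAL     # true: the parser restores the real newline
```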

Lesson

The deeper lesson is that I should make fewer assumptions about JSON, or indeed about any other data representation. JSON (and other formats) have specs which I could have read. They have validators which I could have used. And, importantly, Ruby and every other language I use ship built-in converters, and I should rely on these instead of assuming that even a plain string can be dropped into a document unchanged.
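One way to apply this, sketched under assumptions: build the whole JSON-LD payload as a Ruby hash and let the converter serialize it in one go, rather than interpolating strings into a hand-written template. The helper product_json_ld and the sample description are hypothetical, and this is not necessarily how the fix was made on the real site.

```ruby
require "json"

# Hypothetical helper: build the JSON-LD payload from plain Ruby values and
# let the standard library do all quoting and escaping. The "offers" field
# from the original tag is left out because its contents are not shown.
def product_json_ld(description)
  payload = { "@type" => "product", "description" => description }
  JSON.generate(payload)
end

# Shortened stand-in for the database value, with the same troublemakers:
# inner double quotes and \r\n line breaks.
SAMPLE_DESCRIPTION = "conciseness and \"wow-factor\".\r\n\r\nIn short these are the strongest notes.\r\n"

JSON_LD = product_json_ld(SAMPLE_DESCRIPTION)

# Round-trip check: parsing must succeed and return the exact original text.
puts JSON.parse(JSON_LD)["description"] == SAMPLE_DESCRIPTION # true
puts JSON_LD.include?("\n")                                     # false
```

Because the hash is serialized as a unit, the string delimiters that looked like "weird initial quotes" end up exactly where the format expects them.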