CSV file issues

This is part of the Semicolon&Sons Code Diary - consisting of lessons learned on the job. You're in the user-input-and-output category.

Last Updated: 2024-03-02

Anticipate clients will reorder columns

We had a production issue parsing a CSV with medical test results when the client originally put the column with the medical test result in row 13 but then changed it to 15 without telling us later.

In future, always validate the headers order in some approximate way.

And inform clients that the order must not be changed.

Anticipate unexpected (e.g. windows) encodings

When the client uploaded a CSV file with coupon codes to our system, it contained an invalid byte sequence for UTF8 (0xa0), causing the upload to fail.

This is a windows-specific byte and must be handled with the encoding windows-1252

In general, when clients upload files - and it's possible they will be on windows or old machines (ASK THEM) - anticipate encoding issues.

My new practice will be either to convert to the a compatible encoding, removing harmful crap

open("file.csv", "r") { |io| io.read.encode("UTF-8", invalid: :replace, replace: "") }

Anticipate CSV-like files (Excel, Pages)

When accepting file uploads (or file input generally), always validate file extension and provide a user-understandable error.

In Project S we had a CSV uploader feature. The client uploaded an Excel sheet and a general exception was thrown, crashing the system. I should have shown an error message "must be CSV" instead.