Always use boundaries for exact matches

This is part of the Semicolon&Sons Code Diary - consisting of lessons learned on the job. You're in the regex category.

Last Updated: 2024-04-19

When looking for exact matches, do not use grep without boundaries before and after the search term. Otherwise the pattern also matches lines where the term is only a substring.

I had a bug in a shell script that acted as a simple, file-based DB.

Here is how the data is stored:

# format: file name<|>retrieval_id
send_to_chain<|>ri31698156db868fca8030fa01eb98116d1f0cce869e9f338c9bca29c74a2112e55
tags<|>ri3169818e3aa0a14f7c053eb16d99d39de0b982f153a8901e8442aef534da44446
secrets.txt<|>ri316982644edfe92c11576446aa4103b4bbfd626e7de0c26d4664ae0d0c9793f29
tanktop<|>....

Here was my incorrect code:

retrieval_id=$(grep "$file" storage.db | tail -n 1 | awk -F "$delimiter" '{print $3}')

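The issue was that this matched any line containing $file as a substring, not only the line whose file name equals it. For example, a lookup for a hypothetical file named top would also match the entry for tanktop, and tail -n 1 could then return the wrong retrieval id.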
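To make the failure concrete, here is a minimal sketch that reproduces it. The storage file, the file names and the shortened ids are all made up for illustration; only the grep/tail/awk pipeline comes from the script above.

```shell
# Hypothetical storage file in the same <|>-delimited format (ids shortened)
db=$(mktemp)
printf '%s\n' 'top<|>ri111' 'tanktop<|>ri222' > "$db"

delimiter="<|>"
file="top"

# The unbounded pattern matches both lines, because "top" is a substring of "tanktop"
match_count=$(grep -c "$file" "$db")

# tail -n 1 then picks the tanktop entry, so the wrong id is returned.
# Note: awk treats the multi-character separator <|> as a regex ("<" or ">"),
# so the literal "|" lands in $2 and the id in $3.
retrieval_id=$(grep "$file" "$db" | tail -n 1 | awk -F "$delimiter" '{print $3}')

echo "$match_count matches, got $retrieval_id"   # prints "2 matches, got ri222"
rm -f "$db"
```

The lookup asked for top (ri111) but got the id belonging to tanktop.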

The fix was to specify that the match must be bounded by the start of the line and the delimiter.

delimiter="<|>"
# ^ pins the file name to the start of the line; the delimiter marks where it ends
retrieval_id=$(grep "^${file}${delimiter}" storage.db | tail -n 1 | awk -F "$delimiter" '{print $3}')
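As a quick check of the anchored pattern, the sketch below runs it against a small hypothetical storage file (file names and shortened ids are invented for illustration). It also shows one possible alternative, which is not the fix I used: comparing the first field as a plain string in awk. This sidesteps a remaining gap, namely that grep still treats characters such as the dot in secrets.txt as regex metacharacters.

```shell
# Hypothetical storage file in the same <|>-delimited format (ids shortened)
db=$(mktemp)
printf '%s\n' 'top<|>ri111' 'tanktop<|>ri222' \
              'secrets.txt<|>ri333' 'secretsXtxt<|>ri444' > "$db"

delimiter="<|>"

# Anchored lookup: only the line that starts with "top<|>" matches
file="top"
anchored_id=$(grep "^${file}${delimiter}" "$db" | tail -n 1 | awk -F "$delimiter" '{print $3}')

# Remaining gap: the "." in the name is a regex wildcard, so secretsXtxt matches too
file="secrets.txt"
regex_id=$(grep "^${file}${delimiter}" "$db" | tail -n 1 | awk -F "$delimiter" '{print $3}')

# Alternative sketch: literal string comparison on the first field.
# Appending "" forces awk to compare as strings rather than numbers.
exact_id=$(awk -F "$delimiter" -v name="$file" '($1 "") == name { id = $3 } END { print id }' "$db")

echo "$anchored_id $regex_id $exact_id"   # prints "ri111 ri444 ri333"
rm -f "$db"
```

The awk variant keeps the "last matching line wins" behaviour of tail -n 1, since id is overwritten on every match.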