How to parallelize a loop

This is part of the Semicolon&Sons Code Diary - consisting of lessons learned on the job.

Last Updated: 2023-03-28

How would you parallelize a slow operation in a bash loop?

I started off with the following:

while read -r key; do
  aws s3api put-object-acl --bucket mybucket --key "$key" --grant-full-control 'emailaddress=""'
done < s3-keys.txt

This was too slow, so I wanted to parallelize it. I settled on a combination of three things: backgrounding the slow command with an ampersand (&), capping the number of processes running at once, and calling the wait command, which blocks until all background subprocesses have finished, each time that cap is reached.

N=50 # maximum number of background processes per batch
i=0
while read -r key; do
  echo "Processing: $key"
  # Double parentheses evaluate arithmetic. i cycles through 0..N-1, and each
  # time it wraps back to 0 we `wait` for the whole previous batch to finish.
  ((i=i%N)); ((i++==0)) && wait
  aws s3api put-object-acl --bucket mybucket --key "$key" --grant-full-control 'emailaddress=""' &
done < s3-keys.txt
wait # wait for the final batch before the script continues or exits
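
To try the pattern without touching S3, here is a minimal, self-contained sketch of the same batching idea. The names slow_task and run_in_batches are hypothetical helpers invented for this illustration, and sleep stands in for the slow aws call. It assumes bash and a sleep that accepts fractional seconds.

```bash
#!/usr/bin/env bash

# Hypothetical stand-in for the slow command (e.g. the aws call above).
slow_task() {
  sleep 0.2
  echo "finished: $1"
}

# Read one key per line from stdin and run slow_task on each,
# with at most $1 background jobs in flight per batch.
run_in_batches() {
  local max_jobs=$1 running=0 key
  while read -r key; do
    slow_task "$key" &
    # Once a full batch has been started, block until every job in it exits.
    if (( ++running >= max_jobs )); then
      wait
      running=0
    fi
  done
  wait # collect the final, possibly partial, batch
}

# Five keys in batches of two: three batches in total.
results=$(printf '%s\n' a b c d e | run_in_batches 2)
echo "$results"
```

Lines within a batch can come out in any order, since the jobs run concurrently. Each batch also waits for its slowest job before the next one starts. If that matters, bash 4.3 and later offer wait -n, which returns as soon as any single job exits and so allows a sliding window instead of fixed batches.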