How to transfer data between AWS buckets

This is part of the Semicolon&Sons Code Diary - consisting of lessons learned on the job. You're in the AWS category.

Last Updated: 2024-04-24

The first mistake I made was failing to keep the new bucket in the same region as the old one. By default, I's account created the bucket in Sao Paulo, whereas Project BX's previous bucket was in California. This risked breaking the old links, so I had to move the bucket to the original region once I realized this.
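A quick region check before transferring anything would have caught this. The following is a minimal sketch, assuming the AWS CLI is configured with credentials that can read both buckets; the function names `bucket_region` and `same_region` are my own, not part of the CLI.

```shell
# Print the region a bucket lives in (hypothetical helper).
bucket_region() {
  local region
  region=$(aws s3api get-bucket-location --bucket "$1" \
    --query LocationConstraint --output text) || return 1
  # Buckets in us-east-1 report a null LocationConstraint,
  # which the text output format renders as "None".
  if [ "$region" = "None" ]; then
    region="us-east-1"
  fi
  printf '%s\n' "$region"
}

# Succeed only if both buckets are in the same region (hypothetical helper).
same_region() {
  [ "$(bucket_region "$1")" = "$(bucket_region "$2")" ]
}
```

Usage would look something like `same_region projectb projectb2 || echo "Regions differ!"`.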

I ran an `aws s3 sync` from a bucket under my control to I's bucket. Afterwards, in I's S3 console, I was unable to make the individual objects public. The error said permission denied, despite the objects being in his account!

The issue was that the sync command had not given I's account full control as a grantee and had instead copied over my ACL:

aws s3api get-object-acl --bucket projectb2 --key missing.jpg
{
    "Owner": {
        "DisplayName": "jack.kinsella",
        "ID": "xxxxxxxxxxxxxxxxxxxxxxxxxxxxxxxxxxxxxxxxxxxxxxxxxxxxxxxxxxxxxxxx"
    },
    "Grants": [
        {
            "Grantee": {
                # Should be I's name here!
                "DisplayName": "jack.kinsella",
                "ID": "xyz",
                "Type": "CanonicalUser"
            },
            "Permission": "FULL_CONTROL"
        }
    ]
}

The following changed the grantee to I (I just put in his email address) for a single item.

aws s3api put-object-acl --bucket projectb2 --key missing.jpg --grant-full-control 'emailaddress="isemailaddress@example.com"'
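To confirm that a change like this took effect, the object's ACL can be read back and checked for the expected grantee. This is a rough sketch: the helper names are hypothetical, and it assumes you know the canonical user ID of the account that should have full control (it compares IDs rather than display names).

```shell
# List the canonical IDs of all grantees with FULL_CONTROL on an object
# (hypothetical helper; output is tab-separated on one line).
full_control_grantees() {
  aws s3api get-object-acl --bucket "$1" --key "$2" \
    --query "Grants[?Permission=='FULL_CONTROL'].Grantee.ID" --output text
}

# Succeed if the given canonical ID has FULL_CONTROL on the object.
has_full_control() {
  full_control_grantees "$1" "$2" | tr '\t' '\n' | grep -qxF -- "$3"
}
```

For example, `has_full_control projectb2 missing.jpg "$CANONICAL_ID"`, where `CANONICAL_ID` is a placeholder for the target account's canonical user ID. Running `aws s3api list-buckets --query Owner.ID --output text` with that account's credentials should print it.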

But that's just one object. I needed to loop over every object in the bucket.

To start, the following command lists all the key names inside a bucket and places them in a file:

aws s3api list-objects --bucket projectb2 --output text --query 'Contents[].[Key]' > s3-keys.txt

If your bucket is huge, you might want to test on a subfolder of your bucket using --prefix:

aws s3api list-objects --bucket projectb2 --output text --query 'Contents[].[Key]' --prefix magazine_covers
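The two listing commands can be folded into one small helper. This is only a sketch: `list_keys` is a name I made up, and it assumes no real object is literally named `None`, because that is what the text output prints when a listing comes back empty.

```shell
# Print one key per line for a bucket, optionally limited to a prefix
# (hypothetical helper).
list_keys() {
  local bucket=$1 prefix=${2:-}
  set -- s3api list-objects --bucket "$bucket" \
    --output text --query 'Contents[].[Key]'
  if [ -n "$prefix" ]; then
    set -- "$@" --prefix "$prefix"
  fi
  # An empty listing is rendered as the literal line "None"; drop it.
  aws "$@" | awk '$0 != "None"'
}
```

Something like `list_keys projectb2 magazine_covers > s3-keys.txt` would then produce the key file for a test run.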

Finally, here is how to loop through all the keys on the command line:

while IFS= read -r key; do
  aws s3api put-object-acl --bucket projectb2 --key "$key" --grant-full-control 'emailaddress="isemailaddress@example.com"'
done < s3-keys.txt
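With thousands of objects, some calls are likely to fail along the way, so it may be worth recording which keys did not update. Below is a minimal sketch of the same loop with failure logging; `grant_all` and its argument order are my own invention.

```shell
# Grant full control on every key listed in a file, writing the keys
# whose update failed to a separate file for a later retry
# (hypothetical helper).
grant_all() {
  local bucket=$1 keys_file=$2 grantee=$3 failed_file=$4 key
  : > "$failed_file"   # start with an empty failure log
  while IFS= read -r key; do
    [ -n "$key" ] || continue   # skip blank lines
    # </dev/null stops the command from consuming the key list on stdin
    if ! aws s3api put-object-acl --bucket "$bucket" --key "$key" \
         --grant-full-control "emailaddress=$grantee" </dev/null; then
      printf '%s\n' "$key" >> "$failed_file"
    fi
  done < "$keys_file"
}
```

A call might look like `grant_all projectb2 s3-keys.txt isemailaddress@example.com failed-keys.txt`, after which `failed-keys.txt` can be fed back in as the key list.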

Unfortunately, this process was taking hours because each object required its own network round trip. So I rewrote it to run the requests in parallel:

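N=50  # number of background jobs per batch
while IFS= read -r key; do
  echo "Processing: $key"
  # Before starting each new batch of N jobs, wait for the previous batch to finish
  ((i=i%N)); ((i++==0)) && wait
  aws s3api put-object-acl --bucket projectb2 --key "$key" --grant-full-control 'emailaddress="isemailaddress@example.com"' &
done < s3-keys.txt
wait  # wait for the final batch to finish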
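One weakness of batching is that every batch is only as fast as its slowest request. A possible variation keeps a rolling pool of jobs instead, starting a new one as soon as any finishes. This is a sketch I have not benchmarked against the real bucket; it assumes bash 4.3 or later (for `wait -n`), and `grant_parallel` is a hypothetical name.

```shell
# Grant full control on every key in a file, keeping up to max_jobs
# requests in flight at once (hypothetical helper; needs bash >= 4.3).
grant_parallel() {
  local bucket=$1 keys_file=$2 grantee=$3 max_jobs=${4:-50}
  local running=0 key
  while IFS= read -r key; do
    if (( running >= max_jobs )); then
      # Block until any one background job exits, then free its slot.
      wait -n || true
      running=$((running - 1))
    fi
    aws s3api put-object-acl --bucket "$bucket" --key "$key" \
      --grant-full-control "emailaddress=$grantee" </dev/null &
    running=$((running + 1))
  done < "$keys_file"
  wait   # let the remaining jobs finish
}
```

For instance: `grant_parallel projectb2 s3-keys.txt isemailaddress@example.com 50`.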

Ultimately this also proved too slow, so I redid the whole transfer from scratch using sync, this time passing the grant up front, which somehow was much faster:

aws s3 sync s3://projectb s3://projectb2 --grants 'full=emailaddress=isemailaddress@example.com'
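Given how long a full transfer takes, it seems sensible to preview it first. Here is a small sketch wrapping the sync command so that extra flags such as `--dryrun` can be passed through; the wrapper name `sync_with_grant` is hypothetical.

```shell
# Sync one bucket into another, granting full control to the given
# email address on every copied object. Any extra arguments are passed
# straight through to `aws s3 sync` (hypothetical helper).
sync_with_grant() {
  local src=$1 dst=$2 grantee=$3
  shift 3
  aws s3 sync "s3://$src" "s3://$dst" \
    --grants "full=emailaddress=$grantee" "$@"
}
```

Running `sync_with_grant projectb projectb2 isemailaddress@example.com --dryrun` should print the planned copy operations without performing them; dropping `--dryrun` does the real transfer.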