AWS S3 Batch Operations: Process Millions of Objects Without Breaking a Sweat

Zayed · April 5, 2026

If you've ever needed to copy millions of S3 objects, apply tags across an entire bucket, or run a Lambda function on every file you've ever stored — you've probably written a script that runs overnight and prays it doesn't get throttled. AWS S3 Batch Operations exists to kill that script.

In this post, we'll break down what S3 Batch Operations is, how it works under the hood, when to use it, and walk through a real example.

What is S3 Batch Operations?

S3 Batch Operations is a managed job execution feature that lets you perform large-scale operations on S3 objects — across billions of objects if needed — with a single API call or a few clicks in the console. AWS handles the parallelism, retries, error reporting, and progress tracking.

Instead of spinning up EC2 instances, managing concurrency, and handling partial failures yourself, you hand AWS a list of objects and tell it what to do. It does the rest.

Supported Operations

As of 2025, S3 Batch Operations supports the following actions:

  • Copy — Copy objects within or across buckets (even cross-region or cross-account)

  • Invoke AWS Lambda — Run custom processing logic per object

  • Replace object tagging — Apply a new tag set to all matched objects

  • Delete object tagging — Remove all tags from matched objects

  • Replace access control list (ACL) — Update ACLs on objects

  • Restore — Initiate restore requests for Glacier or Glacier Deep Archive objects

  • Object Lock retention — Apply or extend retention rules

  • Object Lock legal hold — Enable or disable a legal hold

  • Replicate — Replicate objects that were not covered by a replication rule

How It Works

Every S3 Batch Operations job has three core components:

1. The Manifest

The manifest is a list of objects the job will act on. You can provide it as:

  • An S3 Inventory report (CSV or ORC) — ideal for full-bucket operations

  • A custom CSV file you generate yourself — useful when you have a filtered subset of objects

A simple CSV manifest looks like this:

my-bucket,photos/2023/jan/img001.jpg
my-bucket,photos/2023/jan/img002.jpg
my-bucket,videos/2023/q1-review.mp4

2. The Operation

This is what you want to do to every object in the manifest — copy, tag, invoke Lambda, etc.

3. The Completion Report

When the job finishes, AWS writes a report to an S3 bucket of your choice. The report lists every object that was processed, whether it succeeded or failed, and the HTTP status code. This is invaluable for debugging partial failures.
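To make use of that report programmatically, you can filter it down to the failures. A minimal sketch, assuming the documented column order (Bucket, Key, VersionId, TaskStatus, ErrorCode, HTTPStatusCode, ResultMessage) and no header row:

```python
import csv
import io

# Column order as documented for Batch Operations completion reports;
# the report CSV has no header row.
COLUMNS = ["Bucket", "Key", "VersionId", "TaskStatus",
           "ErrorCode", "HTTPStatusCode", "ResultMessage"]

def failed_tasks(report_csv: str) -> list[dict]:
    """Return one dict per report row whose TaskStatus is not 'succeeded'."""
    rows = csv.DictReader(io.StringIO(report_csv), fieldnames=COLUMNS)
    return [r for r in rows if r["TaskStatus"].lower() != "succeeded"]

sample = (
    "my-bucket,photos/img001.jpg,,succeeded,,200,\n"
    "my-bucket,photos/img002.jpg,,failed,AccessDenied,403,Access Denied\n"
)
print(failed_tasks(sample)[0]["Key"])  # photos/img002.jpg
```

Feeding each failed key back into a new manifest gives you a cheap retry loop for partial failures.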

Real-World Example: Re-tagging Objects After a Taxonomy Change

Imagine your team has been tagging uploaded files with env=prod, but the company has decided to standardise on environment=production. You have 4 million objects to update.

Step 1 — Generate a manifest using S3 Inventory

Enable S3 Inventory on your bucket with a daily or weekly schedule. Once the first report is delivered, note the location of its manifest.json (e.g. s3://my-inventory-bucket/reports/2025-06-01T00-00Z/manifest.json) — that file is what you'll point the batch job at.

Alternatively, generate your own CSV using the AWS CLI:

aws s3api list-objects-v2 \
    --bucket my-bucket \
    --query "Contents[].Key" \
    --output text | tr '\t' '\n' | awk '{print "my-bucket," $0}' > manifest.csv

aws s3 cp manifest.csv s3://my-batch-manifests/manifest.csv
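One caveat with the pipeline above: a key containing a comma or whitespace will corrupt the CSV, and the manifest format expects URL-encoded object keys. If your key space is messy, it's safer to build the rows yourself. A minimal Python sketch (bucket name illustrative):

```python
from urllib.parse import quote

def manifest_rows(bucket: str, keys: list[str]) -> list[str]:
    """Build CSV manifest lines, URL-encoding each key so commas,
    spaces, and other special characters can't break the CSV."""
    return [f"{bucket},{quote(key, safe='/')}" for key in keys]

rows = manifest_rows("my-bucket", ["photos/2023/jan/img001.jpg",
                                   "reports/Q1, final.pdf"])
print("\n".join(rows))
# my-bucket,photos/2023/jan/img001.jpg
# my-bucket,reports/Q1%2C%20final.pdf
```

For large buckets you'd list keys with a boto3 paginator instead of a one-shot CLI call, but the row formatting stays the same.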

Step 2 — Create the batch job

aws s3control create-job \
    --account-id 123456789012 \
    --operation '{"S3PutObjectTagging": {"TagSet": [{"Key": "environment", "Value": "production"}]}}' \
    --manifest '{"Spec": {"Format": "S3BatchOperations_CSV_20180820", "Fields": ["Bucket","Key"]}, "Location": {"ObjectArn": "arn:aws:s3:::my-batch-manifests/manifest.csv", "ETag": "abc123"}}' \
    --report '{"Bucket": "arn:aws:s3:::my-batch-reports", "Format": "Report_CSV_20180820", "Enabled": true, "Prefix": "tagging-job", "ReportScope": "AllTasks"}' \
    --priority 10 \
    --role-arn arn:aws:iam::123456789012:role/S3BatchRole \
    --no-confirmation-required

Step 3 — Monitor the job

Head to S3 → Batch Operations in the AWS Console, or poll via CLI:

aws s3control describe-job \
    --account-id 123456789012 \
    --job-id your-job-id

You'll see the job's status plus a ProgressSummary with counters like NumberOfTasksSucceeded and NumberOfTasksFailed.
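If you're polling from a script, the ProgressSummary block in the describe-job response is enough to derive a completion percentage. A sketch, using the PascalCase field names the s3control API returns:

```python
def progress_pct(progress_summary: dict) -> float:
    """Percentage of tasks finished (succeeded + failed) out of the
    total, given the ProgressSummary block from describe-job."""
    total = progress_summary["TotalNumberOfTasks"]
    done = (progress_summary["NumberOfTasksSucceeded"]
            + progress_summary["NumberOfTasksFailed"])
    return 100.0 * done / total if total else 0.0

summary = {"TotalNumberOfTasks": 4_000_000,
           "NumberOfTasksSucceeded": 3_000_000,
           "NumberOfTasksFailed": 500_000}
print(f"{progress_pct(summary):.1f}% complete")  # 87.5% complete
```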

Invoking Lambda Per Object

The most powerful operation is Lambda invocation. For each object in the manifest, S3 Batch Operations calls your Lambda with a structured payload:

{
  "invocationSchemaVersion": "1.0",
  "invocationId": "abc...",
  "job": { "id": "job-id" },
  "tasks": [
    {
      "taskId": "task-id",
      "s3Key": "photos/img001.jpg",
      "s3VersionId": null,
      "s3BucketArn": "arn:aws:s3:::my-bucket"
    }
  ]
}

Your Lambda must return a result for each task:

{
  "invocationSchemaVersion": "1.0",
  "treatMissingKeysAs": "PermanentFailure",
  "invocationId": "abc...",
  "results": [
    {
      "taskId": "task-id",
      "resultCode": "Succeeded",
      "resultString": "Processed successfully"
    }
  ]
}
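A minimal handler satisfying this contract might look like the sketch below. The per-object work is a hypothetical placeholder; valid resultCode values are Succeeded, TemporaryFailure, and PermanentFailure:

```python
def handler(event, context):
    """Minimal S3 Batch Operations Lambda handler: process each task
    in the event and echo back a result with the same taskId."""
    results = []
    for task in event["tasks"]:
        bucket = task["s3BucketArn"].split(":::")[-1]
        key = task["s3Key"]
        try:
            # Hypothetical per-object work goes here, e.g. fetching
            # the object with boto3 and transforming it.
            results.append({
                "taskId": task["taskId"],
                "resultCode": "Succeeded",
                "resultString": f"Processed {bucket}/{key}",
            })
        except Exception as exc:  # report failure instead of crashing
            results.append({
                "taskId": task["taskId"],
                "resultCode": "PermanentFailure",
                "resultString": str(exc),
            })
    return {
        "invocationSchemaVersion": "1.0",
        "treatMissingKeysAs": "PermanentFailure",
        "invocationId": event["invocationId"],
        "results": results,
    }
```

Returning TemporaryFailure tells Batch Operations to retry the task, while PermanentFailure records it in the completion report and moves on.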

This pattern is perfect for image transcoding, virus scanning, metadata extraction, or any custom transformation at scale.

IAM Permissions

The IAM role used by S3 Batch Operations needs:

  • s3:GetObject on the source bucket

  • s3:PutObjectTagging (or the relevant permission for your operation)

  • s3:GetObject on the manifest bucket

  • s3:PutObject on the report bucket

  • lambda:InvokeFunction if invoking Lambda

The trust relationship must allow batchoperations.s3.amazonaws.com to assume the role.
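A trust policy along those lines looks like this:

```json
{
  "Version": "2012-10-17",
  "Statement": [
    {
      "Effect": "Allow",
      "Principal": { "Service": "batchoperations.s3.amazonaws.com" },
      "Action": "sts:AssumeRole"
    }
  ]
}
```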

Pricing

S3 Batch Operations pricing has two components:

  • $0.25 per job created

  • $1.00 per million objects processed

For most workloads this is far cheaper than the EC2 time, engineering effort, and error-handling overhead of a DIY approach.

When Should You Use It?

Use S3 Batch Operations when:

  • You need to act on more than a few thousand objects

  • You want built-in retry and failure reporting without writing it yourself

  • The job needs to be auditable (the completion report is a paper trail)

  • You want to run the operation once or infrequently (for continuous processing, use S3 Event Notifications or S3 Object Lambda instead)

Gotchas to Watch Out For

  • Manifest ETag must be accurate. If you update the manifest CSV after creating the job, the ETag won't match and the job will fail at start.

  • Lambda concurrency limits apply. If your account's Lambda concurrency is low, Batch Operations will be throttled. Request a quota increase before kicking off large jobs.

  • Jobs must be confirmed before they run (unless you pass --no-confirmation-required). A job sitting in Awaiting your confirmation state does nothing until approved.

  • Cross-account copies require bucket policies on the destination to allow the source account's role to write objects.

Final Thoughts

S3 Batch Operations is one of those AWS features that sounds niche until the day you desperately need it. When that day comes — and it will — you'll be glad you don't have to write and babysit a custom script. Set up the manifest, define your operation, attach the right IAM role, and let AWS do the heavy lifting.

#AWS S3 #S3 Batch Operations #S3 Bulk Operations #Cloud Storage #S3 Inventory #bulk copy s3