
AWS S3 Batch Operations: Process Millions of Objects Without Breaking a Sweat
If you've ever needed to copy millions of S3 objects, apply tags across an entire bucket, or run a Lambda function on every file you've ever stored — you've probably written a script that runs overnight while you pray it doesn't get throttled. AWS S3 Batch Operations exists to kill that script.
In this post, we'll break down what S3 Batch Operations is, how it works under the hood, when to use it, and walk through a real example.
What is S3 Batch Operations?
S3 Batch Operations is a managed job execution feature that lets you perform large-scale operations on S3 objects — across billions of objects if needed — with a single API call or a few clicks in the console. AWS handles the parallelism, retries, error reporting, and progress tracking.
Instead of spinning up EC2 instances, managing concurrency, and handling partial failures yourself, you hand AWS a list of objects and tell it what to do. It does the rest.
Supported Operations
As of 2025, S3 Batch Operations supports the following actions:
Copy — Copy objects within or across buckets (even cross-region or cross-account)
Invoke AWS Lambda — Run custom processing logic per object
Replace object tagging — Apply a new tag set to all matched objects
Delete object tagging — Remove all tags from matched objects
Replace access control list (ACL) — Update ACLs on objects
Restore — Initiate restore requests for Glacier or Glacier Deep Archive objects
Object Lock retention — Apply or extend retention rules
Object Lock legal hold — Enable or disable a legal hold
Replicate — Replicate objects that were not covered by a replication rule
How It Works
Every S3 Batch Operations job has three core components:
1. The Manifest
The manifest is a list of objects the job will act on. You can provide it as:
An S3 Inventory report (CSV format) — ideal for full-bucket operations
A custom CSV file you generate yourself — useful when you have a filtered subset of objects
A simple CSV manifest looks like this:
my-bucket,photos/2023/jan/img001.jpg
my-bucket,photos/2023/jan/img002.jpg
my-bucket,videos/2023/q1-review.mp4
2. The Operation
This is what you want to do to every object in the manifest — copy, tag, invoke Lambda, etc.
3. The Completion Report
When the job finishes, AWS writes a report to an S3 bucket of your choice. The report lists every object that was processed, whether it succeeded or failed, and the HTTP status code. This is invaluable for debugging partial failures.
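For illustration only (the exact column set can vary by operation, so treat the layout as approximate), report rows carry roughly the bucket, key, version ID, task status, HTTP status code, error code, and result message:

```
my-bucket,photos/2023/jan/img001.jpg,,succeeded,200,,Successful
my-bucket,photos/2023/jan/img002.jpg,,failed,403,AccessDenied,Access Denied
```

Filtering the report for failed rows gives you a ready-made retry manifest.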
Real-World Example: Re-tagging Objects After a Taxonomy Change
Imagine your team has been tagging uploaded files with env=prod, but the company has decided to standardise on environment=production. You have 4 million objects to update.
Step 1 — Generate a manifest using S3 Inventory
Enable S3 Inventory on your bucket with a daily or weekly schedule and choose CSV output. Each delivery includes a manifest.json that points at the CSV data files; when you create the batch job, you reference that manifest.json (e.g. s3://my-inventory-bucket/reports/2025-06-01T00-00Z/manifest.json).
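If you'd rather build the manifest programmatically (say, to filter by prefix), a boto3 sketch along these lines works; the bucket names and `build_manifest` helper are placeholders, not part of any AWS API:

```python
def manifest_rows(bucket, keys):
    """Format (bucket, key) pairs as S3 Batch Operations CSV rows."""
    return "\n".join(f"{bucket},{key}" for key in keys) + "\n"

def build_manifest(bucket, prefix=""):
    """List every object under a prefix; the paginator handles >1000 keys."""
    import boto3  # imported lazily so manifest_rows stays testable offline
    s3 = boto3.client("s3")
    keys = []
    for page in s3.get_paginator("list_objects_v2").paginate(Bucket=bucket, Prefix=prefix):
        keys.extend(obj["Key"] for obj in page.get("Contents", []))
    return manifest_rows(bucket, keys)

if __name__ == "__main__":
    import boto3
    body = build_manifest("my-bucket", prefix="photos/")
    boto3.client("s3").put_object(
        Bucket="my-batch-manifests", Key="manifest.csv", Body=body
    )
```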
Alternatively, generate your own CSV using the AWS CLI:
aws s3api list-objects-v2 \
--bucket my-bucket \
--query "Contents[].Key" \
--output text | tr '\t' '\n' | awk '{print "my-bucket," $0}' > manifest.csv
aws s3 cp manifest.csv s3://my-batch-manifests/manifest.csv
Step 2 — Create the batch job
aws s3control create-job \
--account-id 123456789012 \
--operation '{"S3PutObjectTagging": {"TagSet": [{"Key": "environment", "Value": "production"}]}}' \
--manifest '{"Spec": {"Format": "S3BatchOperations_CSV_20180820", "Fields": ["Bucket","Key"]}, "Location":
{"ObjectArn": "arn:aws:s3:::my-batch-manifests/manifest.csv", "ETag": "abc123"}}' \
--report '{"Bucket": "arn:aws:s3:::my-batch-reports", "Format": "Report_CSV_20180820", "Enabled": true,
"Prefix": "tagging-job", "ReportScope": "AllTasks"}' \
--priority 10 \
--role-arn arn:aws:iam::123456789012:role/S3BatchRole \
--no-confirmation-required
Step 3 — Monitor the job
Head to S3 → Batch Operations in the AWS Console, or poll via CLI:
aws s3control describe-job \
--account-id 123456789012 \
--job-id your-job-id
You'll see the job's Status plus a ProgressSummary with counts like NumberOfTasksSucceeded and NumberOfTasksFailed.
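To script the wait instead of refreshing the console, a small boto3 polling loop is enough. A sketch, assuming Complete, Failed, and Cancelled are the terminal job statuses (the account and job IDs are placeholders):

```python
import time

TERMINAL_STATUSES = {"Complete", "Failed", "Cancelled"}

def is_terminal(status):
    """True once a job can no longer make progress."""
    return status in TERMINAL_STATUSES

def wait_for_job(account_id, job_id, poll_seconds=30):
    """Poll describe-job until the job reaches a terminal state."""
    import boto3  # imported lazily so is_terminal stays testable offline
    s3control = boto3.client("s3control")
    while True:
        job = s3control.describe_job(AccountId=account_id, JobId=job_id)["Job"]
        print(job["Status"], job.get("ProgressSummary", {}))
        if is_terminal(job["Status"]):
            return job
        time.sleep(poll_seconds)
```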
Invoking Lambda Per Object
The most powerful operation is Lambda invocation. For each object in the manifest, S3 Batch Operations calls your Lambda with a structured payload:
{
"invocationSchemaVersion": "1.0",
"invocationId": "abc...",
"job": { "id": "job-id" },
"tasks": [
{
"taskId": "task-id",
"s3Key": "photos/img001.jpg",
"s3VersionId": null,
"s3BucketArn": "arn:aws:s3:::my-bucket"
}
]
}
Your Lambda must return a result for each task:
{
"invocationSchemaVersion": "1.0",
"treatMissingKeysAs": "PermanentFailure",
"invocationId": "abc...",
"results": [
{
"taskId": "task-id",
"resultCode": "Succeeded",
"resultString": "Processed successfully"
}
]
}
This pattern is perfect for image transcoding, virus scanning, metadata extraction, or any custom transformation at scale.
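A minimal Python handler for this contract might look like the following; `process_object` is a hypothetical stand-in for your real per-object logic:

```python
def process_object(bucket_arn, key):
    """Hypothetical per-object work: swap in transcoding, scanning, etc."""
    bucket = bucket_arn.split(":::")[-1]
    return f"processed s3://{bucket}/{key}"

def handler(event, context):
    results = []
    for task in event["tasks"]:
        try:
            message = process_object(task["s3BucketArn"], task["s3Key"])
            results.append({"taskId": task["taskId"],
                            "resultCode": "Succeeded",
                            "resultString": message})
        except Exception as exc:
            # TemporaryFailure asks Batch Operations to retry the task later;
            # use PermanentFailure for errors a retry cannot fix
            results.append({"taskId": task["taskId"],
                            "resultCode": "TemporaryFailure",
                            "resultString": str(exc)})
    return {
        "invocationSchemaVersion": "1.0",
        "treatMissingKeysAs": "PermanentFailure",
        "invocationId": event["invocationId"],
        "results": results,
    }
```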
IAM Permissions
The IAM role used by S3 Batch Operations needs:
s3:GetObject on the source bucket
s3:PutObjectTagging (or the relevant permission for your operation)
s3:GetObject on the manifest bucket
s3:PutObject on the report bucket
lambda:InvokeFunction if invoking Lambda
The trust relationship must allow batchoperations.s3.amazonaws.com to assume the role.
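That trust relationship is the standard service-principal shape:

```json
{
  "Version": "2012-10-17",
  "Statement": [
    {
      "Effect": "Allow",
      "Principal": { "Service": "batchoperations.s3.amazonaws.com" },
      "Action": "sts:AssumeRole"
    }
  ]
}
```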
Pricing
S3 Batch Operations pricing has two components:
$0.25 per job created
$1.00 per million object operations performed (the underlying requests, such as PUTs or Lambda invocations, are billed separately)
For most workloads this is far cheaper than the EC2 time, engineering effort, and error-handling overhead of a DIY approach.
When Should You Use It?
Use S3 Batch Operations when:
You need to act on more than a few thousand objects
You want built-in retry and failure reporting without writing it yourself
The job needs to be auditable (the completion report is a paper trail)
You want to run the operation once or infrequently (for continuous processing, use S3 Event Notifications or S3 Object Lambda instead)
Gotchas to Watch Out For
Manifest ETag must be accurate. If you update the manifest CSV after creating the job, the stored ETag no longer matches and the job will fail to start.
Lambda concurrency limits apply. If your account's Lambda concurrency is low, Batch Operations will be throttled. Request a quota increase before kicking off large jobs.
Jobs must be confirmed before they run (unless you pass --no-confirmation-required). A job sitting in the Awaiting your confirmation state does nothing until approved.
Cross-account copies require bucket policies on the destination to allow the source account's role to write objects.
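On the ETag gotcha: for objects uploaded in a single part (no multipart upload, no SSE-KMS), S3's ETag is simply the hex MD5 of the bytes, so you can compute the value to pass to create-job from a local copy of the manifest:

```python
import hashlib

def single_part_etag(data: bytes) -> str:
    """MD5 hex digest; matches the S3 ETag only for single-part,
    non-SSE-KMS uploads (multipart ETags look like '<md5>-<parts>')."""
    return hashlib.md5(data).hexdigest()

if __name__ == "__main__":
    with open("manifest.csv", "rb") as f:
        print(single_part_etag(f.read()))
```

Alternatively, `aws s3api head-object` on the uploaded manifest returns the authoritative ETag.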
Final Thoughts
S3 Batch Operations is one of those AWS features that sounds niche until the day you desperately need it. When that day comes — and it will — you'll be glad you don't have to write and babysit a custom script. Set up the manifest, define your operation, attach the right IAM role, and let AWS do the heavy lifting.