Slashed: How to Parse AWS Cost and Usage Reports (CUR) with Python to Find Hidden Waste
Your AWS bill summary hides the real story. Learn how to programmatically parse AWS Cost and Usage Reports with Python, surface the exact resources bleeding your budget, and generate an actionable cost-reduction playbook you can execute in a single afternoon.
TL;DR
The AWS billing console gives you totals. The Cost and Usage Report (CUR) gives you truth.
- The Stack: Python (pandas + boto3), AWS CUR (Parquet format), S3 (data lake), Athena (optional SQL querying).
- The Verdict: A 200-line Python script can surface more actionable cost savings than a $50,000/year FinOps SaaS tool. The data is already there — you just need to read it.
Want me to run this analysis on your AWS account? I will parse your CUR data, identify every dollar of waste, and hand you a prioritized action plan.
Book a Free 15-Minute AWS Bill Review — no commitments, just clarity.
Why the Billing Console Lies to You
Open your AWS Billing Dashboard right now. You will see a bar chart showing total spend per service: EC2, RDS, S3, Lambda. It looks clean. It looks manageable.
It is hiding everything that matters.
The billing console aggregates costs at the service level. It does not tell you:
- Which specific EC2 instance is costing you $2,400/month and running at 4% CPU utilization.
- Which S3 bucket is racking up $800/month in PUT request charges because a misconfigured application is writing millions of tiny objects.
- Which NAT Gateway is silently processing 12TB of internal traffic that should be routed through a free VPC Endpoint.
To find that data, you need the Cost and Usage Report (CUR).
Step 1: Enable CUR (5 Minutes)
If you haven't already, enable CUR delivery to S3 in Parquet format:
- Go to AWS Billing Console → Cost & Usage Reports → Create Report.
- Name it cur-detailed-report.
- Select Include resource IDs (critical for identifying specific resources).
- Choose Parquet format (columnar storage, dramatically faster to parse than CSV).
- Set the S3 delivery bucket.
AWS starts delivering reports within 24 hours and then refreshes them at least once a day. Each report is a set of Parquet files partitioned by billing period (year and month).
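If you would rather script the setup than click through the console, the same choices can be expressed through the Cost and Usage Reports API. The following is a minimal sketch, not a drop-in setup script: the function names are mine, the HOURLY granularity and the ATHENA artifact are assumptions, and it presumes the delivery bucket already has a policy that lets the AWS billing service write to it.

```python
def build_report_definition(bucket: str, prefix: str, region: str) -> dict:
    """Build a ReportDefinition payload mirroring the console choices above."""
    return {
        "ReportName": "cur-detailed-report",
        "TimeUnit": "HOURLY",                       # assumption: hourly granularity
        "Format": "Parquet",
        "Compression": "Parquet",                   # Parquet format pairs with Parquet compression
        "AdditionalSchemaElements": ["RESOURCES"],  # the "Include resource IDs" checkbox
        "S3Bucket": bucket,
        "S3Prefix": prefix,
        "S3Region": region,
        "AdditionalArtifacts": ["ATHENA"],          # assumption: Athena-friendly layout
        "RefreshClosedReports": True,
        "ReportVersioning": "OVERWRITE_REPORT",     # needed alongside the ATHENA artifact
    }


def create_report(definition: dict) -> None:
    """Submit the definition. Requires AWS credentials with CUR permissions."""
    import boto3  # imported lazily so the payload builder has no AWS dependency

    # The CUR API endpoint lives in us-east-1 regardless of the bucket's region.
    cur = boto3.client("cur", region_name="us-east-1")
    cur.put_report_definition(ReportDefinition=definition)
```

Calling create_report(build_report_definition("your-cur-bucket", "cur", "us-east-1")) would submit it; the bucket name and prefix here are placeholders.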
Step 2: The Python Parser
Here is the core script that reads your CUR data, groups spending by resource, and identifies the top cost offenders:
cur_analyzer.py
import os
from datetime import datetime, timedelta

import boto3
import pandas as pd


def download_cur_files(bucket: str, prefix: str, local_dir: str = "/tmp/cur"):
    """Download every CUR Parquet file under the given S3 prefix."""
    # download_file fails if the target directory does not exist yet
    os.makedirs(local_dir, exist_ok=True)
    s3 = boto3.client("s3")
    paginator = s3.get_paginator("list_objects_v2")
    files = []
    for page in paginator.paginate(Bucket=bucket, Prefix=prefix):
        for obj in page.get("Contents", []):
            if obj["Key"].endswith(".parquet"):
                local_path = f"{local_dir}/{obj['Key'].split('/')[-1]}"
                s3.download_file(bucket, obj["Key"], local_path)
                files.append(local_path)
    return files


def analyze_cur(files: list[str]) -> pd.DataFrame:
    """Parse CUR Parquet files and return top cost offenders."""
    if not files:
        raise ValueError("No CUR Parquet files found; check the bucket and prefix")
    frames = [pd.read_parquet(f) for f in files]
    df = pd.concat(frames, ignore_index=True)

    # Filter to the last 30 days of usage. Both sides are normalized to UTC
    # so the comparison works whether the column is timezone-aware or not.
    df["line_item_usage_start_date"] = pd.to_datetime(
        df["line_item_usage_start_date"], utc=True
    )
    cutoff = pd.Timestamp.now(tz="UTC") - pd.Timedelta(days=30)
    df = df[df["line_item_usage_start_date"] >= cutoff]

    # Group by resource and sum the unblended cost. Line items without a
    # resource ID (taxes, fees, some usage) drop out of this grouping.
    cost_by_resource = (
        df.groupby(
            [
                "line_item_product_code",
                "line_item_resource_id",
                "line_item_usage_type",
            ]
        )["line_item_unblended_cost"]
        .sum()
        .reset_index()
        .sort_values("line_item_unblended_cost", ascending=False)
    )
    return cost_by_resource.head(50)


def generate_report(top_costs: pd.DataFrame) -> str:
    """Generate a Markdown report of the top 50 cost offenders."""
    lines = ["# AWS Cost Offender Report", ""]
    lines.append(f"Generated: {datetime.now().isoformat()}")
    lines.append("")
    lines.append(
        "| Service | Resource ID | Usage Type | 30-Day Cost |"
    )
    lines.append(
        "| :------ | :---------- | :--------- | ----------: |"
    )
    for _, row in top_costs.iterrows():
        cost = f"${row['line_item_unblended_cost']:,.2f}"
        # Truncate long ARNs so the table stays readable
        resource_id = str(row["line_item_resource_id"])[:40]
        lines.append(
            f"| {row['line_item_product_code']} "
            f"| `{resource_id}` "
            f"| {row['line_item_usage_type']} "
            f"| {cost} |"
        )
    return "\n".join(lines)


if __name__ == "__main__":
    BUCKET = "your-cur-bucket"
    PREFIX = "cur-detailed-report/cur-detailed-report/year=2026/month=5"

    files = download_cur_files(BUCKET, PREFIX)
    top_costs = analyze_cur(files)
    report = generate_report(top_costs)

    with open("cost_report.md", "w") as f:
        f.write(report)

    print(report)
    print(f"\nTotal 30-day spend in top 50 resources: "
          f"${top_costs['line_item_unblended_cost'].sum():,.2f}")

Run it, and you will get a Markdown table showing exactly which resources are draining your budget. No dashboards. No SaaS subscriptions. Just raw data and truth.
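One caveat about the script: the hard-coded PREFIX pins it to a single billing period, while a rolling 30-day window usually straddles two. Here is a small sketch of how the prefixes could be derived from the current date instead. The helper name cur_prefixes is hypothetical, and the path layout simply mirrors the PREFIX used in the script (report name twice, then unpadded year=/month= partitions), so adjust it if your bucket is laid out differently.

```python
from datetime import date


def cur_prefixes(report_name: str, today: date, months: int = 2) -> list:
    """Return S3 prefixes for the current billing period and the ones before it."""
    prefixes = []
    year, month = today.year, today.month
    for _ in range(months):
        prefixes.append(
            f"{report_name}/{report_name}/year={year}/month={month}"
        )
        # Step back one calendar month, wrapping January to December
        month -= 1
        if month == 0:
            year, month = year - 1, 12
    return prefixes


if __name__ == "__main__":
    for prefix in cur_prefixes("cur-detailed-report", date.today()):
        print(prefix)
```

You would then call download_cur_files once per prefix and pass the combined file list to analyze_cur.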
Step 3: What to Look For
Once you have the output, here are the patterns that consistently reveal the biggest savings:
The Idle Compute Monster
If you see an i-0abc123... EC2 instance costing $1,800/month but your CloudWatch metrics show 3% average CPU, that instance is almost certainly over-provisioned. Confirm that memory or network throughput is not the real constraint, then downsize it or migrate the workload to a Spot-backed Kubernetes pod.
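CUR tells you what an instance costs, not how busy it is, so the CPU figure has to come from CloudWatch. A rough sketch of that lookup follows, assuming a 14-day window with hourly datapoints; both function names are mine. Average CPU is only a first-pass signal, so treat a low number as a prompt to investigate rather than proof of waste.

```python
from datetime import datetime, timedelta, timezone
from typing import Optional


def average_cpu(datapoints: list) -> Optional[float]:
    """Mean of the per-period 'Average' values, or None if nothing came back."""
    values = [point["Average"] for point in datapoints]
    return sum(values) / len(values) if values else None


def fetch_cpu_datapoints(instance_id: str, days: int = 14) -> list:
    """Fetch hourly CPUUtilization averages. Requires AWS credentials."""
    import boto3  # imported lazily so average_cpu can be used without boto3

    end = datetime.now(timezone.utc)
    response = boto3.client("cloudwatch").get_metric_statistics(
        Namespace="AWS/EC2",
        MetricName="CPUUtilization",
        Dimensions=[{"Name": "InstanceId", "Value": instance_id}],
        StartTime=end - timedelta(days=days),
        EndTime=end,
        Period=3600,  # one datapoint per hour
        Statistics=["Average"],
    )
    return response["Datapoints"]
```

Feeding the instance IDs from the top-50 table through average_cpu(fetch_cpu_datapoints(instance_id)) gives you a cost-versus-utilization shortlist.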
The NAT Gateway Tax
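Look for NatGateway-Bytes usage types. If you see $5,000+ in NAT Gateway data processing, your private subnets are probably sending AWS-bound traffic (S3, ECR, DynamoDB) through the NAT Gateway instead of through VPC Endpoints. Gateway endpoints for S3 and DynamoDB are free; interface endpoints for services such as ECR carry an hourly and per-GB charge, but the per-GB rate is well below what the NAT Gateway charges. Deploy them and watch that line item shrink. I cover this in detail in my FinOps Engineering Guide.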
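For S3 and DynamoDB, the fix is a gateway endpoint attached to the route tables of your private subnets. The sketch below shows roughly what that looks like with boto3; the function names and the example IDs are placeholders of mine, and in practice you would likely manage this through Terraform or CloudFormation rather than an ad hoc script.

```python
GATEWAY_SERVICES = {"s3", "dynamodb"}  # the services that offer gateway endpoints


def gateway_endpoint_request(
    vpc_id: str, region: str, service: str, route_table_ids: list
) -> dict:
    """Build the arguments for a gateway VPC endpoint."""
    if service not in GATEWAY_SERVICES:
        raise ValueError(f"{service} needs an interface endpoint, not a gateway")
    return {
        "VpcEndpointType": "Gateway",
        "VpcId": vpc_id,
        "ServiceName": f"com.amazonaws.{region}.{service}",
        "RouteTableIds": route_table_ids,
    }


def create_gateway_endpoint(request: dict, region: str) -> str:
    """Create the endpoint and return its ID. Requires AWS credentials."""
    import boto3  # imported lazily so the request builder has no AWS dependency

    ec2 = boto3.client("ec2", region_name=region)
    response = ec2.create_vpc_endpoint(**request)
    return response["VpcEndpoint"]["VpcEndpointId"]
```

A call such as gateway_endpoint_request("vpc-123", "us-east-1", "s3", ["rtb-1"]) produces the request; ECR is deliberately rejected because it needs interface endpoints.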
The Snapshot Graveyard
EBS snapshots accumulate silently. Old AMIs that nobody uses anymore still retain their snapshots, costing $0.05/GB/month. Filter your CUR data for EBS:SnapshotUsage and you will likely find hundreds of dollars in zombie storage.
The Data Transfer Surprise
Cross-region and cross-AZ data transfer charges are invisible in the billing console but glaringly obvious in CUR data. If your microservices are chatting across availability zones, you are paying $0.01/GB in each direction, so every gigabyte exchanged between AZs costs $0.02 in total.
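The usage-type patterns in this step can be totaled in one pass over the CUR DataFrame. This is a sketch under a few assumptions: the substrings below are the usage-type fragments I would look for (real usage types usually carry a region prefix such as USE1-, hence substring matching), the labels and function names are mine, and the list is a starting point rather than an exhaustive taxonomy.

```python
import pandas as pd

# Usage-type fragment -> human-readable waste pattern (assumed mapping)
PATTERNS = {
    "NatGateway-Bytes": "NAT Gateway data processing",
    "EBS:SnapshotUsage": "EBS snapshot storage",
    "DataTransfer-Regional-Bytes": "Cross-AZ data transfer",
    "AWS-Out-Bytes": "Cross-region data transfer",
}


def tag_pattern(usage_type):
    """Return the first matching pattern label, or None if nothing matches."""
    for fragment, label in PATTERNS.items():
        if fragment in str(usage_type):
            return label
    return None


def cost_by_pattern(df: pd.DataFrame) -> pd.Series:
    """Sum unblended cost per waste pattern, largest first."""
    tagged = df.assign(pattern=df["line_item_usage_type"].map(tag_pattern))
    return (
        tagged.dropna(subset=["pattern"])
        .groupby("pattern")["line_item_unblended_cost"]
        .sum()
        .sort_values(ascending=False)
    )
```

Run cost_by_pattern on the concatenated CUR DataFrame (before the top-50 cut) to see which of these four patterns deserves attention first.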
Automating It
Don't run this script manually. Set up an EventBridge rule to trigger a Lambda function weekly. The Lambda runs the analysis, writes the report to S3, and posts a summary to your #finops Slack channel:
lambda_handler.py
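from datetime import datetime

import boto3

from cur_analyzer import analyze_cur, download_cur_files, generate_report

BUCKET = "your-cur-bucket"
PREFIX = "cur-detailed-report/cur-detailed-report/year=2026/month=5"


def handler(event, context):
    files = download_cur_files(BUCKET, PREFIX)
    top_costs = analyze_cur(files)
    report = generate_report(top_costs)

    # Upload to S3
    boto3.client("s3").put_object(
        Bucket="reports-bucket",
        Key=f"cost-reports/{datetime.now().strftime('%Y-%m-%d')}.md",
        Body=report.encode(),
    )

    # Post to Slack. post_to_slack is a helper you supply yourself,
    # for example an HTTP POST to a Slack incoming webhook.
    total = top_costs["line_item_unblended_cost"].sum()
    post_to_slack(
        f"Weekly CUR Analysis complete. "
        f"Top 50 resources account for ${total:,.2f} in spend. "
        f"Report: https://reports-bucket.s3.amazonaws.com/..."
    )
The Operational Reality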
- CUR files are massive. A medium-sized AWS account generates hundreds of megabytes of CUR data per month. Use Parquet (not CSV) and consider AWS Athena for SQL-based querying if your account is large.
- Resource IDs change. Auto Scaling Groups create and terminate instances constantly. Focus on usage types and patterns, not individual instance IDs, for dynamic workloads.
- Savings Plans distort costs. If you have active Savings Plans or Reserved Instances, line_item_unblended_cost may not reflect your actual spend. Use the savings_plan_savings_plan_effective_cost and reservation_effective_cost columns for a more accurate picture.
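The last two caveats can be handled together: pick the cost column based on the line item type, then group by usage type instead of resource ID. The following is a simplified sketch of that idea, not a full amortization model. It assumes the reservation and Savings Plan columns are present in your export, the function names are mine, and it zeroes out fee and negation rows, which means unused commitment is ignored.

```python
import pandas as pd

# Line item types zeroed out so commitments are not counted twice (simplification)
ZEROED_TYPES = [
    "SavingsPlanNegation",
    "SavingsPlanRecurringFee",
    "SavingsPlanUpfrontFee",
    "RIFee",
]


def add_effective_cost(df: pd.DataFrame) -> pd.DataFrame:
    """Add an effective_cost column that accounts for Savings Plans and RIs."""
    item_type = df["line_item_line_item_type"]
    effective = df["line_item_unblended_cost"].astype(float)
    # Usage covered by a Savings Plan: use the Savings Plan effective cost
    effective = effective.mask(
        item_type == "SavingsPlanCoveredUsage",
        df["savings_plan_savings_plan_effective_cost"],
    )
    # Usage covered by a Reserved Instance: use the reservation effective cost
    effective = effective.mask(
        item_type == "DiscountedUsage",
        df["reservation_effective_cost"],
    )
    effective = effective.mask(item_type.isin(ZEROED_TYPES), 0.0)
    return df.assign(effective_cost=effective)


def cost_by_usage_type(df: pd.DataFrame) -> pd.Series:
    """Effective cost per service and usage type, largest first."""
    return (
        add_effective_cost(df)
        .groupby(["line_item_product_code", "line_item_usage_type"])["effective_cost"]
        .sum()
        .sort_values(ascending=False)
    )
```

Grouping this way keeps Auto Scaling churn from scattering one workload's spend across hundreds of short-lived instance IDs.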
The Payoff
Your AWS bill is not a black box. The data to slash it by 30% is already sitting in an S3 bucket, waiting for someone to read it.
This script is the diagnostic. The next step is the surgery — re-architecting the resources that are bleeding you dry. If this analysis shows you are wasting more than $1,500/month, you don't need a tool; you need an architect.
Is your AWS bill hiding $10,000 in waste? Most companies have never looked at their CUR data. When they do, the savings are immediate and dramatic.
I will run this analysis on your account and hand you a prioritized action plan — or show you exactly how to do it yourself.
Stop guessing. Book a Free Infrastructure Audit.

DevOps Engineer & Cloud Consultant | FinOps, GitOps & Kubernetes Expert
I build systems that run reliably, scale efficiently, and deploy intelligently. See how I can help your team.