
Practical AWS governance for fast-moving engineering teams

Authors
  • Younes ZADI

Giving engineers access to AWS is always a balancing act.

On one side, if you lock everything down, you get the “ticket-driven cloud”: every new S3 bucket, IAM tweak, or Lambda permission becomes a request to the SRE / platform team. The result is predictable: slow delivery, frustrated engineers, and an infrastructure team that turns into a permanent bottleneck.

On the other side, if you give broad admin permissions “so people can move fast”, you may move fast… right into problems: surprise cloud bills, accidental public exposure, production drift, broken compliance, and permission sprawl that becomes impossible to audit or reason about.

This article is about a third path: a governance framework that gives engineers real autonomy while still enforcing strong, explicit boundaries.

The goal isn’t to remove the SRE team from the loop. It’s to reserve SRE time for the right work, instead of approving day-to-day “please create X” requests:

  • building shared platforms,
  • owning foundational infrastructure (networking, observability, security baselines),
  • handling exceptions and high-risk changes.

We’ll build a model that uses AWS’s native services (Organizations, Identity Center, SCPs, Control Tower, permission boundaries, IAM policies, and resource-based policies) to provide:

  • flexibility where it’s safe (especially in staging),
  • tight control where it’s critical (production),
  • clear ownership and auditability everywhere.

By the end, you should have a concrete blueprint you can adapt to your company.

1. What Software Engineers Are Actually Allowed to Do

To design meaningful guardrails, we first need to be explicit about what engineers are expected to do in each environment — and just as importantly, what they are not.

1.1. Staging Accounts (team-x-staging)

Staging is where engineers should be able to move fast, experiment, and iterate — but still within clear cost and security boundaries.

Engineers CAN:

  • Deploy and update Lambda functions
  • Create and update S3 buckets (non-public only)
  • Create and manage DynamoDB tables
  • Create and manage SQS and SNS
  • Create and manage API Gateway resources
  • Read and write CloudWatch logs and metrics
  • Trigger Step Functions
  • Use pre-defined IAM roles provided by the platform (for example, execution roles for Lambda)
  • Deploy infrastructure using Serverless, CDK, or Terraform

Engineers CANNOT:

  • Create EC2 instances
  • Create RDS / Aurora / Redshift
  • Modify VPCs, subnets, route tables, or internet gateways
  • Create NAT gateways or other high-cost networking components
  • Disable logging, monitoring, or security services
  • Escalate permissions beyond defined boundaries
  • Create IAM users, IAM roles, or long-lived access keys
  • Modify SCPs, permission boundaries, or organization-level settings

The intent is to allow full ownership of application-level infrastructure while preventing accidental creation of high-cost or high-blast-radius resources.

1.2. Production Accounts (team-x-production)

Production prioritizes stability, auditability, and safety over speed.

Engineers CAN:

  • Read application logs
  • Read metrics and dashboards
  • Read runtime configuration
  • Inspect deployed Lambda functions and API configurations
  • Assume read-only roles via Identity Center (we will see this in a later section)

Engineers CANNOT:

  • Deploy or modify infrastructure directly
  • Modify IAM roles or policies
  • Create, update, or delete data stores
  • Create new AWS resources
  • Access secrets in plaintext
  • Assume administrative or elevated roles

All production deployments and infrastructure changes are performed via CI/CD deployment roles, not by humans.

1.3. Platform / SRE Accounts

Platform and SRE teams own the foundational infrastructure and governance layers.

SREs CAN:

  • Provision and manage shared infrastructure
  • Manage networking (VPCs, connectivity, DNS)
  • Define and maintain SCPs, permission boundaries, and permission sets
  • Own and operate CI/CD deployment roles
  • Manage observability, security baselines, and cost controls
  • Perform controlled break-glass operations when required

This separation ensures that high-risk changes are intentional, reviewed, and traceable, while application teams retain autonomy over their services.

2. Organization Structure (and Why It Works)

Before talking about permissions, it’s important to start with the most fundamental decision in AWS governance: account and organization structure. A common structure is one account per team per environment:

Org Root
├── platform-prod
├── platform-staging
├── team-a-prod
├── team-a-staging
├── team-b-prod
├── team-b-staging

Why this works:

  • Blast radius isolation
  • Clean cost attribution
  • Strong security boundaries
  • Simple mental model for engineers

Accounts are the strongest security boundary in AWS. Use them.

Managing multiple accounts can be tedious and error-prone. To simplify this, we can use AWS Organizations.

2.1. AWS Organizations

AWS Organizations is a global service that lets you group accounts into organizational units (OUs) and apply governance rules, such as SCPs and Control Tower guardrails, consistently across the entire organization. This makes OUs the right level to encode organization-wide intent, while individual accounts handle workload isolation.

In this model, accounts are used to separate teams and environments, while OUs are used to separate types of workloads and risk profiles.

Root
├── Platform OU
│   ├── platform-prod
│   └── platform-staging
├── Workloads OU
│   ├── team-a-prod
│   ├── team-a-staging
│   ├── team-b-prod
│   └── team-b-staging
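To illustrate why OUs are a convenient place to attach guardrails, the sketch below models the tree above as plain Python data and collects every SCP that applies to an account by walking from the root down to it. The SCP names (`deny-iam-users`, `deny-ec2-rds`) and their attachment points are hypothetical. AWS performs this evaluation itself; the code only mirrors the inheritance idea.

```python
# Parent of each node in the organization tree shown above.
PARENT = {
    "Platform OU": "Root",
    "Workloads OU": "Root",
    "platform-prod": "Platform OU",
    "platform-staging": "Platform OU",
    "team-a-prod": "Workloads OU",
    "team-a-staging": "Workloads OU",
    "team-b-prod": "Workloads OU",
    "team-b-staging": "Workloads OU",
}

# Hypothetical SCP attachments (names are illustrative only).
ATTACHED_SCPS = {
    "Root": ["deny-iam-users"],
    "Workloads OU": ["deny-ec2-rds"],
}


def path_from_root(node):
    """Return the chain of nodes from Root down to the given node."""
    path = [node]
    while path[-1] in PARENT:
        path.append(PARENT[path[-1]])
    return list(reversed(path))


def applicable_scps(account):
    """Collect the SCPs attached to the account and to all of its ancestors."""
    scps = []
    for node in path_from_root(account):
        scps.extend(ATTACHED_SCPS.get(node, []))
    return scps
```

With these assumed attachments, a workload account inherits both guardrails, while a platform account only inherits the one attached at the root.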

Now that we have the organization structure, we need to be explicit about how humans access AWS.

2.2. Identity Center

In this model, all human access goes through AWS IAM Identity Center (SSO). There are no IAM users in workload accounts.

Identity Center sits at the organization level, while permissions are enforced at the OU and account levels.

Human
  ↓
Identity Provider (Google / Okta / Azure AD)
  ↓
IAM Identity Center
  ↓
Permission Set
  ↓
IAM Role (in a specific account)

Important: Identity Center does not grant power by itself.

It only controls which role a human is allowed to assume. What that role can actually do is still constrained by:

  • The account it lives in
  • The OU the account belongs to
  • SCPs and Control Tower guardrails
  • Permission boundaries attached to the role
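One way to picture what Identity Center stores is a table of assignments: a group, an account, and the permission set that group may use there. The sketch below is a minimal model of that lookup. The group names and permission set names are hypothetical, and it says nothing about what the resulting role can actually do, which is decided by the layers listed above.

```python
# Hypothetical assignments: (group, account) -> permission set name.
ASSIGNMENTS = {
    ("team-a-engineers", "team-a-staging"): "StagingEngineer",
    ("team-a-engineers", "team-a-prod"): "ProdReadOnly",
    ("sre", "team-a-prod"): "PlatformAdmin",
}


def permission_sets_for(user_groups, account):
    """Return the permission sets a user may choose in one account."""
    return sorted(
        permission_set
        for (group, assigned_account), permission_set in ASSIGNMENTS.items()
        if assigned_account == account and group in user_groups
    )
```

In this model an engineer in `team-a-engineers` gets only the read-only permission set in production and nothing at all in accounts their group is not assigned to.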

Tying it all together:

[Figure: AWS Organization Overview]

Now that we have set the organization structure and how humans access AWS, we can start talking about permissions.

3. The Golden Rule of AWS Permissions

Always apply constraints at the highest level possible — and permissions at the lowest level possible.

This rule drives everything that follows.

In practice, we will use four mechanisms to achieve this:

  • Organizations SCPs
  • Permission Boundaries
  • Identity-based policies
  • Resource-based policies

We will see how to use each of these in the next sections.

3.1. Mental Model: The Permission “Layers”

Think of AWS permissions as concentric safety rings:

┌────────────────────────────┐
│ Org SCPs (hard limits)     │  ← What must NEVER happen
├────────────────────────────┤
│ Permission Boundaries      │  ← Max power roles can ever get
├────────────────────────────┤
│ Identity-based policies    │  ← What a role is allowed to do
├────────────────────────────┤
│ Resource-based policies    │  ← Cross-account & data access
└────────────────────────────┘
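The first three rings can be read as a simple rule: an action goes through only if every layer allows it, and an explicit Deny in any layer wins. The sketch below is a deliberately simplified model of that rule. It matches on action names only and ignores resources, conditions, and resource-based policies. The example policies are assumptions for illustration, not the real AWS evaluation engine.

```python
from fnmatch import fnmatchcase


def matches(patterns, action):
    """True if the action matches any pattern such as 'ec2:*'."""
    # IAM action names are case-insensitive, so compare in lower case.
    return any(fnmatchcase(action.lower(), p.lower()) for p in patterns)


def layer_allows(statements, action):
    """A layer allows an action if an Allow matches and no Deny matches."""
    denied = any(
        s["Effect"] == "Deny" and matches(s["Action"], action) for s in statements
    )
    allowed = any(
        s["Effect"] == "Allow" and matches(s["Action"], action) for s in statements
    )
    return allowed and not denied


def is_allowed(action, scp, boundary, identity):
    """Every layer must allow the action; a Deny in any layer wins."""
    return all(layer_allows(layer, action) for layer in (scp, boundary, identity))


# Illustrative layers (assumed, not taken from a real account).
scp = [
    {"Effect": "Allow", "Action": ["*"]},
    {"Effect": "Deny", "Action": ["ec2:*", "rds:*"]},
]
boundary = [{"Effect": "Allow", "Action": ["lambda:*", "s3:*", "logs:*", "dynamodb:*"]}]
identity = [{"Effect": "Allow", "Action": ["lambda:*", "s3:*", "ec2:*"]}]
```

Here `lambda:CreateFunction` passes all three layers, `ec2:RunInstances` is stopped by the SCP even though the identity policy allows it, and `dynamodb:CreateTable` is stopped because the identity policy never grants it.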

4. Permissions, from the highest level to the lowest level

Let's dive into each permission layer, starting from the top. The higher the layer, the broader its coverage.

4.1. SCPs: The Non-Negotiable Guardrails

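SCPs can be attached to the organization root, an OU, or an individual account. In this model we attach them at the root or OU level.

What SCPs Are For

SCPs define what is absolutely forbidden in member accounts, even for a role with AdministratorAccess. They never grant permissions by themselves, and they do not apply to the organization's management account.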

Example SCPs (Org or OU level)

❌ Block EC2 & RDS everywhere except Platform accounts (in this example, 111111111111 is platform-prod and 222222222222 is platform-staging; JSON policies cannot contain comments):

{
  "Effect": "Deny",
  "Action": ["ec2:*", "rds:*"],
  "Resource": "*",
  "Condition": {
    "StringNotEquals": {
      "aws:PrincipalAccount": [
        "111111111111",
        "222222222222"
      ]
    }
  }
}
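The statement above is a fragment; a complete policy document wraps it in `Version` and `Statement`. The sketch below builds that document in Python and models how the `StringNotEquals` condition behaves with several values: the Deny applies only when the caller's account matches none of them. The `Sid` value is a hypothetical label, and `deny_applies` is an illustration of the condition logic, not an AWS API.

```python
import json

# Example account IDs: platform-prod and platform-staging.
PLATFORM_ACCOUNTS = ["111111111111", "222222222222"]

scp = {
    "Version": "2012-10-17",
    "Statement": [
        {
            "Sid": "DenyEc2RdsOutsidePlatform",  # hypothetical label
            "Effect": "Deny",
            "Action": ["ec2:*", "rds:*"],
            "Resource": "*",
            "Condition": {
                "StringNotEquals": {"aws:PrincipalAccount": PLATFORM_ACCOUNTS}
            },
        }
    ],
}


def deny_applies(principal_account):
    """Model the condition: true only when the account matches no listed value."""
    condition = scp["Statement"][0]["Condition"]["StringNotEquals"]
    return principal_account not in condition["aws:PrincipalAccount"]


# Serialized form, ready to paste into the console or an IaC tool.
policy_json = json.dumps(scp, indent=2)
```

Generating the document from data like this also keeps the account list in one place, which fits the goal of SCPs that rarely change.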

❌ Prevent IAM User Creation (force Identity Center)

{
  "Effect": "Deny",
  "Action": ["iam:CreateUser", "iam:CreateAccessKey"],
  "Resource": "*"
}

Key principle:

SCPs should be boring, stable, and rarely changed.

If engineers complain about SCPs, you’re probably using them wrong.

4.2. Permission Boundaries: The Safety Net

What Permission Boundaries Are For:

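We saw that SCPs limit what entire accounts can do, but what about individual roles or permission sets?

This is where permission boundaries come in.

A permission boundary sets the maximum permissions that identity-based policies can grant to a role: the role's effective permissions are the intersection of the two. For example, you can attach a boundary to a role used for CI/CD instead of constraining an entire account. Because a boundary grants nothing by itself, and anything it does not allow is implicitly denied, it needs an Allow statement for the ceiling in addition to the Deny. The following example lets a CI/CD role's policies grant anything except IAM, EC2, RDS, and Organizations actions:

{
  "Effect": "Allow",
  "Action": "*",
  "Resource": "*"
},
{
  "Effect": "Deny",
  "Action": ["iam:*", "ec2:*", "rds:*", "organizations:*"],
  "Resource": "*"
}

Note that denying iam:* also blocks iam:PassRole, which a deployment needs in order to attach a pre-defined execution role to a Lambda function, so a real boundary would carve that action out.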

But remember:

This is still constrained by SCPs.

4.3. Identity-Based Policies: Day-to-Day Permissions

These are applied at the role level. For example, the staging engineer permission set introduced earlier carries an identity-based policy, which Identity Center attaches to the role it provisions in each assigned account.

Example: Staging Engineer Role:

{
  "Effect": "Allow",
  "Action": ["lambda:*", "s3:*", "logs:*", "cloudwatch:*"],
  "Resource": "*"
}

But remember:

This is still constrained by SCPs and permission boundaries.
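The policy above covers only part of the staging capabilities listed in section 1.1. As a sketch, the helper below generates a fuller Allow policy from a list of IAM service prefixes. `build_allow_policy` and `STAGING_SERVICES` are hypothetical names, and the service-wide wildcards are an assumption made for brevity: `states:*`, for instance, grants more than triggering Step Functions, so a real policy would likely be narrower.

```python
import json

# IAM service prefixes for the staging capabilities listed in section 1.1.
STAGING_SERVICES = [
    "lambda", "s3", "dynamodb", "sqs", "sns",
    "apigateway", "logs", "cloudwatch", "states",
]


def build_allow_policy(services):
    """Build an identity-based policy allowing every action of each service."""
    return {
        "Version": "2012-10-17",
        "Statement": [
            {
                "Effect": "Allow",
                "Action": [f"{service}:*" for service in services],
                "Resource": "*",
            }
        ],
    }


staging_policy = build_allow_policy(STAGING_SERVICES)
print(json.dumps(staging_policy, indent=2))
```

Keeping the service list as data makes it easy to review what staging engineers can touch and to diff changes over time.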

4.4. Resource-Based Policies: Data Access, Not Power

Resource-based policies are applied at the resource level (S3 buckets, SQS, SNS, KMS, Lambda invoke permissions). They grant specific principals access to specific resources.

Example: Cross-Account S3 Write (Staging Only):

{
  "Effect": "Allow",
  "Principal": {
    "AWS": "arn:aws:iam::123456789012:role/deployment-role"
  },
  "Action": "s3:PutObject",
  "Resource": "arn:aws:s3:::shared-bucket/*"
}

Resource policies are about who can touch my data, not about creating infrastructure.
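For cross-account access like the example above, the bucket policy is only half of the picture: the calling role's identity-based policy in its own account must also allow the action. Within a single account, either policy is generally enough. The sketch below captures just that distinction. It ignores explicit denies, SCPs, and permission boundaries, and the function name is hypothetical.

```python
def s3_write_allowed(caller_account, bucket_account,
                     identity_allows, bucket_policy_allows):
    """Simplified s3:PutObject decision.

    Ignores explicit denies, SCPs, and permission boundaries.
    """
    if caller_account == bucket_account:
        # Same account: an Allow in either policy is enough.
        return identity_allows or bucket_policy_allows
    # Cross-account: both the caller's policy and the bucket policy must allow.
    return identity_allows and bucket_policy_allows
```

So granting `deployment-role` access in the bucket policy has no effect until that role is also allowed `s3:PutObject` on the bucket in account 123456789012.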

5. Bonus: Monitoring compliance

Okay, we have given engineers the flexibility to move fast, but can we check that they are actually following our governance rules? For example, can we detect teams creating public or unencrypted S3 buckets?

This is where AWS Control Tower comes in. It provides two kinds of controls:

  • Preventive: SCPs applied behind the scenes, as seen earlier.
  • Detective: AWS Config rules that monitor the compliance of the accounts (not covered so far).

AWS Config continuously evaluates resources across all accounts and detects violations.

Typical checks:

  • Public or unencrypted S3 buckets
  • Open security groups
  • Missing required tags
  • IAM roles without permission boundaries
  • Logging or monitoring disabled

Violations are aggregated centrally and can trigger SNS alerts to administrators, or even remediation Lambda functions.
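To make the idea of a detective check concrete, here is a minimal sketch of the kind of logic such a rule applies to one resource. The record shape (`public`, `encrypted`, `tags`), the rule names, and the required tags are all assumptions for illustration. They are not the AWS Config data model or its managed rule identifiers.

```python
REQUIRED_TAGS = {"team", "environment"}  # hypothetical tagging standard


def bucket_violations(bucket):
    """Return the names of the rules that a bucket record violates."""
    violations = []
    if bucket.get("public", False):
        violations.append("s3-public-access")
    if not bucket.get("encrypted", False):
        violations.append("s3-unencrypted")
    missing = REQUIRED_TAGS - set(bucket.get("tags", {}))
    if missing:
        violations.append("missing-tags:" + ",".join(sorted(missing)))
    return violations


# Hypothetical bucket records, as an inventory scan might return them.
compliant = {"public": False, "encrypted": True,
             "tags": {"team": "a", "environment": "staging"}}
risky = {"public": True, "encrypted": False, "tags": {"team": "a"}}
```

A remediation function would typically receive findings like these and either notify the owning team or fix the resource, for example by blocking public access.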

[Figure: AWS Config]

6. Final Recommendation (Strong Opinion)

  • ✅ Use AWS Organizations to group accounts and apply governance rules
  • ✅ Use Identity Center to control human access (don't use IAM users)
  • ✅ Use SCPs to say “never” at the organization or OU level
  • ✅ Use permission boundaries to say “at most” on roles and permission sets
  • ✅ Use IAM policies to say “usually”
  • ✅ Use Control Tower to monitor compliance across all accounts.