
Building an AWS Landing Zone with OpenTofu


TL;DR: We rebuilt the AWS Landing Zone Accelerator architecture in OpenTofu. Same security outcomes, one toolchain, no black boxes.


Why Not Just Use LZA Directly?

AWS Landing Zone Accelerator ships as a CloudFormation-based pipeline. The architecture and security patterns behind it are solid. But if your organisation already uses OpenTofu or Terraform, adopting LZA means splitting your infrastructure across two toolchains — half in a pipeline you can't easily inspect, half in your IaC tool. Two mental models, twice the opportunity for things to fall between the cracks.

So we took a different approach: study the LZA reference architecture, extract the patterns, implement them in OpenTofu. The account isolation, centralised networking, and layered controls are what actually matter. How you deploy them is just plumbing.


Account Boundaries

Everything here builds on one idea: AWS account boundaries are the strongest isolation mechanism you have.

You can try to isolate workloads with IAM policies within a single account, but it's fragile. A misconfigured resource policy, an overly broad role trust, a leaked access key — any of these can blow past your carefully crafted IAM boundaries. Account boundaries contain all of them.

When a workload runs in its own account, a compromised role in dev can't reach production. A runaway process can't starve other workloads of service quota. Billing attribution is precise, not estimated. And SCPs give you a hard ceiling that even account administrators can't punch through.


The Multi-Account Structure

The OU hierarchy follows the LZA pattern:

text
Root
├── Security OU
│   ├── Security & Logging Account
│   └── Audit Account
├── Infrastructure OU
│   ├── Network Account
│   └── Shared Services Account
├── Workloads OU
│   ├── Dev Account
│   ├── Integration Account
│   ├── Staging Account
│   └── Production Account
├── Sandbox OU
│   └── (Developer sandbox accounts)
└── Quarantine OU
    └── (Empty — incident response only)
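The OU hierarchy itself can live in OpenTofu alongside everything else. A minimal sketch, assuming the default organisation root and using illustrative resource names (only three of the five OUs shown):

```hcl
# Sketch: declaring part of the OU tree in OpenTofu.
data "aws_organizations_organization" "org" {}

resource "aws_organizations_organizational_unit" "security" {
  name      = "Security"
  parent_id = data.aws_organizations_organization.org.roots[0].id
}

resource "aws_organizations_organizational_unit" "workloads" {
  name      = "Workloads"
  parent_id = data.aws_organizations_organization.org.roots[0].id
}

resource "aws_organizations_organizational_unit" "quarantine" {
  name      = "Quarantine"
  parent_id = data.aws_organizations_organization.org.roots[0].id
}
```

Keeping the OUs in state means SCP attachments and account placement can reference them directly instead of hard-coding IDs.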

Management Account runs AWS Organizations, SSO, and billing. Nothing else. No workloads, no CI/CD. The moment you deploy workloads into the management account, you've undermined the model.

Security & Logging Account receives CloudTrail, Config, GuardDuty, and Security Hub data from every account. One place to look when something goes wrong.

Network Account owns the Transit Gateway, centralised egress VPCs, and Network Firewall. All cross-account routing flows through here.

Shared Services Account hosts container registries, artifact repositories, and CI/CD infrastructure.

Workload Accounts are per-environment. Dev, integration, staging, production — each in its own account. A development IAM misconfiguration can't affect production. Environment-specific SCPs can tighten controls as you move towards production.
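Workload accounts follow the same declarative pattern. A sketch, assuming a `workloads` OU resource as above; the email and name are placeholders:

```hcl
# Sketch: a per-environment workload account (email must be a unique
# root address per account; values illustrative).
resource "aws_organizations_account" "production" {
  name      = "production"
  email     = "aws-production@example.com"
  parent_id = aws_organizations_organizational_unit.workloads.id

  # Keep the account if the resource is ever removed from state.
  close_on_deletion = false
}
```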


Cross-Account State Management

With multiple accounts, you need somewhere to put your state that every account can reach without passing around long-lived credentials. We use a dedicated S3 bucket and DynamoDB table in the shared services account, with cross-account IAM roles for access:

HCL
terraform {
  backend "s3" {
    bucket         = "org-tfstate-shared-services"
    key            = "network/transit-gateway/terraform.tfstate"
    region         = "eu-west-2"
    encrypt        = true
    dynamodb_table = "org-tfstate-locks"

    assume_role {
      # Shared services account ID shown as a placeholder.
      role_arn = "arn:aws:iam::111111111111:role/TerraformStateAccess"
      duration = "1h"
    }
  }
}

Each workload account assumes a role in the shared services account to read and write state. The role trust policy is scoped to the specific CI/CD principal in each account. No long-lived credentials, no shared keys.

DynamoDB locking is non-negotiable in a multi-team environment. Without it, concurrent applies corrupt state.
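The scoped trust policy is the piece that makes this safe. A sketch of the role in the shared services account, assuming a hypothetical `cicd-runner` role name and a `workload_account_id` variable:

```hcl
# Sketch: the state-access role trusts only one workload account's
# CI/CD execution role (names and ARNs illustrative).
resource "aws_iam_role" "tf_state_access" {
  name = "TerraformStateAccess"

  assume_role_policy = jsonencode({
    Version = "2012-10-17"
    Statement = [{
      Effect    = "Allow"
      Principal = { AWS = "arn:aws:iam::${var.workload_account_id}:role/cicd-runner" }
      Action    = "sts:AssumeRole"
    }]
  })
}

resource "aws_iam_role_policy" "tf_state_rw" {
  name = "tfstate-rw"
  role = aws_iam_role.tf_state_access.id

  policy = jsonencode({
    Version = "2012-10-17"
    Statement = [
      {
        # State objects and bucket listing.
        Effect = "Allow"
        Action = ["s3:GetObject", "s3:PutObject", "s3:ListBucket"]
        Resource = [
          "arn:aws:s3:::org-tfstate-shared-services",
          "arn:aws:s3:::org-tfstate-shared-services/*",
        ]
      },
      {
        # Lock table operations.
        Effect   = "Allow"
        Action   = ["dynamodb:GetItem", "dynamodb:PutItem", "dynamodb:DeleteItem"]
        Resource = "arn:aws:dynamodb:eu-west-2:*:table/org-tfstate-locks"
      },
    ]
  })
}
```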


Multi-Account Provider Configuration

OpenTofu's provider aliasing handles cross-account operations cleanly. Here's the pattern we use throughout:

HCL
provider "aws" {
  region = "eu-west-2"
  alias  = "network"

  assume_role {
    role_arn     = "arn:aws:iam::${var.network_account_id}:role/TerraformExecutionRole"
    session_name = "opentofu-network"
  }

  default_tags {
    tags = {
      ManagedBy   = "opentofu"
      Environment = var.environment
      Owner       = "platform-team"
    }
  }
}

provider "aws" {
  region = "eu-west-2"
  alias  = "security"

  assume_role {
    role_arn     = "arn:aws:iam::${var.security_account_id}:role/TerraformExecutionRole"
    session_name = "opentofu-security"
  }
}

Every resource block explicitly references its provider. No implicit defaults. When you're reading the code, it's always obvious which account a resource lives in.
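A minimal sketch of what that looks like at the resource level — the explicit `provider` argument is what pins the detector to the security account:

```hcl
# Sketch: an explicit provider reference places this resource in the
# security account, never in whatever the default provider points at.
resource "aws_guardduty_detector" "security" {
  provider = aws.security
  enable   = true
}
```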


Centralised Egress with Network Firewall

No compute instance in any workload account has direct internet access. All egress routes through a centralised inspection VPC in the network account via AWS Network Firewall.

Workload VPCs have no internet gateway and no NAT gateway. Their default routes point to the Transit Gateway, which sends internet-bound traffic to the inspection VPC. Network Firewall checks it against a domain allowlist. If the destination isn't on the list, the traffic gets dropped.

A compromised instance can't phone home to a C2 domain unless someone explicitly approved that domain in a PR.
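On the workload side this is enforced by routing, not policy. A sketch of the default route, assuming illustrative variable names for the workload route table and the shared Transit Gateway:

```hcl
# Sketch: the workload VPC's only default route points at the Transit
# Gateway — there is no IGW or NAT gateway to route around it.
resource "aws_route" "egress_via_tgw" {
  route_table_id         = var.workload_route_table_id
  destination_cidr_block = "0.0.0.0/0"
  transit_gateway_id     = var.transit_gateway_id
}
```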

HCL
resource "aws_networkfirewall_rule_group" "domain_allowlist" {
  provider = aws.network

  capacity = 100
  name     = "egress-domain-allowlist"
  type     = "STATEFUL"
  rule_group {
    rule_variables {
      ip_sets {
        key = "HOME_NET"
        ip_set {
          definition = ["10.0.0.0/8"]
        }
      }
    }
    rules_source {
      rules_source_list {
        generated_rules_type = "ALLOWLIST"
        target_types         = ["TLS_SNI", "HTTP_HOST"]
        targets = [
          ".amazonaws.com",
          ".docker.io",
          ".github.com",
          ".hashicorp.com",
          ".ubuntu.com",
        ]
      }
    }
  }

  tags = {
    Purpose = "centralised-egress-filtering"
  }
}

The targets list is deliberately short. Teams request additions via pull request, which triggers review and approval. This generates friction — developers used to unfettered internet access will push back — but it forces visibility over what workloads actually need to reach externally. The real list is usually shorter than anyone expects.


Transit Gateway

Transit Gateway ties everything together. Every VPC attaches via RAM shares, and route tables control which accounts can talk to each other.

HCL
resource "aws_ec2_transit_gateway" "main" {
  provider = aws.network

  description                     = "org-transit-gateway"
  default_route_table_association = "disable"
  default_route_table_propagation = "disable"
  auto_accept_shared_attachments  = "enable"

  tags = {
    Name = "org-tgw"
  }
}

resource "aws_ec2_transit_gateway_vpc_attachment" "workload" {
  provider = aws.network

  transit_gateway_id = aws_ec2_transit_gateway.main.id
  vpc_id             = var.workload_vpc_id
  subnet_ids         = var.workload_tgw_subnet_ids

  transit_gateway_default_route_table_association = false
  transit_gateway_default_route_table_propagation = false

  tags = {
    Name        = "tgw-attach-${var.environment}"
    Environment = var.environment
  }
}

Default route table association and propagation are disabled. Every route is explicit. A newly attached VPC doesn't accidentally gain connectivity to accounts it shouldn't reach. Workload accounts can reach shared services and the internet via the inspection VPC, but can't route directly to each other unless you deliberately add that route. Same default-deny philosophy as the SCPs, just at the network level.
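With the defaults disabled, the explicit wiring looks like this. A sketch, assuming an illustrative `inspection_vpc_attachment_id` variable for the inspection VPC's attachment:

```hcl
# Sketch: a dedicated route table for workload attachments, with one
# explicit default route towards the inspection VPC. No propagation
# means no accidental reachability.
resource "aws_ec2_transit_gateway_route_table" "workloads" {
  provider           = aws.network
  transit_gateway_id = aws_ec2_transit_gateway.main.id

  tags = { Name = "tgw-rt-workloads" }
}

resource "aws_ec2_transit_gateway_route_table_association" "workload" {
  provider                       = aws.network
  transit_gateway_attachment_id  = aws_ec2_transit_gateway_vpc_attachment.workload.id
  transit_gateway_route_table_id = aws_ec2_transit_gateway_route_table.workloads.id
}

resource "aws_ec2_transit_gateway_route" "default_to_inspection" {
  provider                       = aws.network
  destination_cidr_block         = "0.0.0.0/0"
  transit_gateway_attachment_id  = var.inspection_vpc_attachment_id
  transit_gateway_route_table_id = aws_ec2_transit_gateway_route_table.workloads.id
}
```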


Layered Security Controls

No single control layer catches everything. You need them stacked.

SCPs (Preventive)

SCPs are the hard ceiling. They apply to every principal in every account within an OU, including the account root user. No bypass from within the account.

JSON
{
  "Version": "2012-10-17",
  "Statement": [
    {
      "Sid": "DenyUnapprovedRegions",
      "Effect": "Deny",
      "Action": "*",
      "Resource": "*",
      "Condition": {
        "StringNotEquals": {
          "aws:RequestedRegion": [
            "eu-west-1",
            "eu-west-2"
          ]
        },
        "ArnNotLike": {
          "aws:PrincipalARN": [
            "arn:aws:iam::*:role/OrganizationAdmin"
          ]
        }
      }
    }
  ]
}

The OrganizationAdmin exclusion gives you a break-glass path for emergency changes. Note that global services whose control planes live in us-east-1 (IAM, CloudFront, Route 53) also need an exemption — typically a NotAction list in the same statement — or the region deny blocks them for everyone, not just unapproved workloads.

We use SCPs for region restriction, denying root account actions, and protecting security services (preventing anyone from disabling GuardDuty, Security Hub, CloudTrail, or Config).
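The SCPs themselves are managed like any other resource. A sketch of attaching the region-restriction policy, assuming the JSON lives in a `policies/` directory and a `workloads` OU resource exists (both illustrative):

```hcl
# Sketch: an SCP declared from a JSON file and attached to an OU.
resource "aws_organizations_policy" "region_restriction" {
  name    = "region-restriction"
  type    = "SERVICE_CONTROL_POLICY"
  content = file("${path.module}/policies/region-restriction.json")
}

resource "aws_organizations_policy_attachment" "workloads" {
  policy_id = aws_organizations_policy.region_restriction.id
  target_id = aws_organizations_organizational_unit.workloads.id
}
```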

Proactive Controls (CloudFormation Hooks)

Even with OpenTofu as the deployment tool, CloudFormation hooks still earn their place: they catch the deployments that bypass your pipeline — console-launched stacks and services that provision resources through CloudFormation on your behalf. Proactive controls catch non-compliant resources before creation: requiring IMDSv2 on EC2, encryption on EBS, denying public IPs on EC2, enforcing S3 encryption and public access blocks.

Detective Controls (Config Rules)

Config rules continuously evaluate compliance. They catch drift, manual console changes (and someone will always make manual console changes), and anything that slipped through preventive controls: unencrypted volumes, unrestricted SSH, missing MFA, publicly accessible S3 buckets.
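An AWS-managed rule covers most of these checks with a few lines. A sketch, assuming a Config recorder is already running in the target account:

```hcl
# Sketch: an AWS-managed Config rule flagging unencrypted EBS volumes.
# Assumes a Config recorder already exists in this account.
resource "aws_config_config_rule" "encrypted_volumes" {
  provider = aws.security
  name     = "encrypted-volumes"

  source {
    owner             = "AWS"
    source_identifier = "ENCRYPTED_VOLUMES"
  }
}
```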

Continuous Assessment

Security Hub aggregates findings from GuardDuty, Config, Inspector, and Firewall Manager. We enable NIST 800-53 Rev 5 and CIS AWS Foundations Benchmark v3.0. Findings are centralised in the security account and feed into the operations team's alerting pipeline.
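Enabling the standards is two subscription resources. A sketch, assuming Security Hub is already enabled in the account; the standards ARNs follow the regional format AWS publishes and should be verified for your region:

```hcl
# Sketch: subscribing the aggregating account to both standards
# (ARNs shown for eu-west-2 — verify against your region).
resource "aws_securityhub_standards_subscription" "nist_800_53" {
  provider      = aws.security
  standards_arn = "arn:aws:securityhub:eu-west-2::standards/nist-800-53/v/5.0.0"
}

resource "aws_securityhub_standards_subscription" "cis_v3" {
  provider      = aws.security
  standards_arn = "arn:aws:securityhub:eu-west-2::standards/cis-aws-foundations-benchmark/v/3.0.0"
}
```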

Monitoring and Audit

CIS CloudWatch alarms cover critical API-level events: root account usage, console sign-in failures, IAM policy changes, network ACL changes, security group changes, CloudTrail configuration changes.

CloudTrail is enabled organisation-wide with logs delivered to the security account's S3 bucket. The bucket policy prevents deletion. Lifecycle rules handle retention.
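The organisation trail itself is a single resource, created from the management account (or a delegated CloudTrail administrator). A sketch with illustrative bucket and key names:

```hcl
# Sketch: an organisation-wide, multi-region trail delivering to the
# security account's log bucket (names illustrative).
resource "aws_cloudtrail" "org" {
  name                       = "org-trail"
  s3_bucket_name             = "org-cloudtrail-logs"
  is_organization_trail      = true
  is_multi_region_trail      = true
  enable_log_file_validation = true
  kms_key_id                 = var.cloudtrail_kms_key_arn
}
```

Log file validation gives you tamper-evidence on top of the delete-protected bucket policy.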


The Quarantine OU

The Quarantine OU has a single SCP: deny all actions.

JSON
{
  "Version": "2012-10-17",
  "Statement": [
    {
      "Sid": "DenyAllActions",
      "Effect": "Deny",
      "Action": "*",
      "Resource": "*"
    }
  ]
}
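In OpenTofu this is the same SCP-as-resource pattern, attached once to the OU so any account moved there is frozen immediately. A sketch, assuming a `quarantine` OU resource name:

```hcl
# Sketch: the deny-all SCP attached permanently to the Quarantine OU.
resource "aws_organizations_policy" "quarantine_deny_all" {
  name = "quarantine-deny-all"
  type = "SERVICE_CONTROL_POLICY"

  content = jsonencode({
    Version = "2012-10-17"
    Statement = [{
      Sid      = "DenyAllActions"
      Effect   = "Deny"
      Action   = "*"
      Resource = "*"
    }]
  })
}

resource "aws_organizations_policy_attachment" "quarantine" {
  policy_id = aws_organizations_policy.quarantine_deny_all.id
  target_id = aws_organizations_organizational_unit.quarantine.id
}
```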

During an incident, you move the compromised account into the Quarantine OU. Every principal in that account immediately loses the ability to do anything. The attacker's access is cut. Resources stay intact for forensics, but nothing else can happen.

One MoveAccount API call. That's it. Compare that to trying to identify and revoke individual credentials while an incident is still active.


What This Gave Us

Everything is in one toolchain. Every resource is in state. Every change goes through tofu plan and code review before it touches anything. When we need to add a new account or VPC, it follows patterns we've already established — no special cases, no manual steps.

We still reference the LZA docs regularly. The architecture patterns are genuinely good. We just didn't need the deployment mechanism that comes with them.

David Christiansen

Solution Architect with 30 years in cloud infrastructure, security, identity, and .NET engineering.
