In Terraform, I’ve found that KISS > DRY and that using complex data structures as inputs is a gilded foot gun.

I’ve now created a few large Terraform repositories that managed sprawling infrastructure spanning multiple environments at once, and I’ve felt pain.

Oh! How I've hurt myself.

I learned Terraform as a software engineer and I applied my opinionated “best practices” to Terraform code – especially DRY. Terraform’s primary means of DRYing your code is Modules, and I thought of Modules as do-everything blueprints. If I needed an AWS VPC, I’d create a “network” module. As my project needs changed, so would the module; it would be made more flexible.

That single network module would morph. Originally, it would only create a VPC and maybe a couple of subnets. Then public/private subnet pairs in 2 availability zones. Then, the big mistake would happen: I would make the module dynamic.

My project would pass in subnet configuration as an input, something like:

subnets = {
  us_east_1a_public = {
    availability_zone = "us-east-1a",
    cidr              = "10.0.0.0/20",
  },
  us_east_1a_private = {
    availability_zone = "us-east-1a",
    cidr              = "10.0.16.0/20"
  },
  us_east_1b_public = {
    availability_zone = "us-east-1b",
    cidr              = "10.0.32.0/20",
  },
  us_east_1b_private = {
    availability_zone = "us-east-1b",
    cidr              = "10.0.48.0/20"
  }
  # You know there's more ... you're HA, right?
}

Then, you want to add network routing rules to just your private subnets, but you don’t want to refactor a ton, so each entry in your inputs starts to look like …

us_east_1a_public = {
  availability_zone = "us-east-1a",
  cidr              = "10.0.0.0/20",
  routes = {
    "0.0.0.0/0" = aws_internet_gateway.igw.id
  }
},
us_east_1a_private = {
  availability_zone = "us-east-1a",
  cidr              = "10.0.16.0/20"
  routes = {
    "0.0.0.0/0" = module.all_my_nat_gateways.nat_ids["us_east_1a"]
  }
}

You know that’s ugly. It looks right but feels wrong. Eh, it works, right? It’s fine.

It wasn’t fine.
-Narrator

I think most software developers go down this path. They don’t want to repeat themselves and they’re told that static values inside code files are wrong. We do everything we can to avoid typing the word resource a 2nd or 3rd time and we sure as hell don’t want to see more than one resource block of the same type … I mean, can you imagine?!

You’ve done this, too, dear reader. I’m sure of it. Maybe you’ve noticed why it hurts … maybe not.

This is what you’ve done:

  • You’ve added logic to a framework based on static files.
  • You’ve made your resource block complex – it has expressions and loops in it. Maybe you had to create locals to hide the complexity, but that made it worse.
  • You’ve stopped defining resources in Terraform code and started defining them in your own language.
  • You now have an undocumented interface.
  • You’ve taken a declarative language and made it imperative.
  • You think you’ve made something beautiful.
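
For the record, here is a minimal sketch of what the inside of such a module tends to look like. It is an illustration, not code from a real project: var.vpc_id, local.subnet_routes, and the "this" resource names are all hypothetical, and it assumes Terraform 1.3 or later (for optional() defaults and startswith()) plus the AWS provider.

```hcl
# Hypothetical internals of the "dynamic" network module (illustration only).

variable "vpc_id" {
  type = string
}

# The undocumented interface: every caller has to know this exact shape.
variable "subnets" {
  type = map(object({
    availability_zone = string
    cidr              = string
    routes            = optional(map(string), {}) # destination CIDR => gateway ID
  }))
}

locals {
  # Flatten subnet -> routes into one list so for_each can consume it.
  subnet_routes = flatten([
    for subnet_key, subnet in var.subnets : [
      for destination, target_id in subnet.routes : {
        key         = "${subnet_key}:${destination}"
        subnet_key  = subnet_key
        destination = destination
        target_id   = target_id
      }
    ]
  ])
}

resource "aws_subnet" "this" {
  for_each = var.subnets

  vpc_id            = var.vpc_id
  cidr_block        = each.value.cidr
  availability_zone = each.value.availability_zone
}

resource "aws_route_table" "this" {
  for_each = var.subnets

  vpc_id = var.vpc_id
}

resource "aws_route_table_association" "this" {
  for_each = var.subnets

  subnet_id      = aws_subnet.this[each.key].id
  route_table_id = aws_route_table.this[each.key].id
}

resource "aws_route" "this" {
  for_each = { for r in local.subnet_routes : r.key => r }

  route_table_id         = aws_route_table.this[each.value.subnet_key].id
  destination_cidr_block = each.value.destination

  # Internet gateways and NAT gateways use different arguments, so the
  # module has to guess the target type from the ID prefix.
  gateway_id     = startswith(each.value.target_id, "igw-") ? each.value.target_id : null
  nat_gateway_id = startswith(each.value.target_id, "nat-") ? each.value.target_id : null
}
```

Count the resource blocks: four. Now count the things a newcomer has to hold in their head to know which subnets and routes actually exist.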

You've really stepped in it this time.

Terraform is a framework for declaring your infrastructure using a simple, static, block-based interface. You’ve just stolen all three of its selling points from it. You’ve undone the very essence of Terraform. You monster.

As your repo of related projects grows, so does your library of widely reused, overly generic modules. And so does the complexity of each of them as they morph and bend to fit scenarios they were never intended for.

Here come my hot takes … my lessons learned. My cheat codes to avoid that pain.

More Terraform code is better than less.

Terraform projects and modules should be inflexible, single-purpose, and opinionated for each scenario. Input variables should be avoided unless their purpose is obvious.
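
As a rough idea of what an "obvious" input could look like, here is a sketch of a single scalar variable with a description and a validation rule. The variable name and the allowed values are assumptions made up for illustration, not something from a real project.

```hcl
# Hypothetical input: one scalar, one purpose, documented and validated.
variable "environment" {
  description = "Deployment environment name, used only in resource tags."
  type        = string

  validation {
    condition     = contains(["dev", "staging", "prod"], var.environment)
    error_message = "The environment value must be one of: dev, staging, prod."
  }
}
```

Nobody has to read the module source to figure out what to pass in, and a wrong value fails at plan time instead of producing surprising infrastructure.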

So what if you have each subnet defined in its own resource block with the CIDR inline as a static value?

resource "aws_subnet" "us_east_1a_public" {
  vpc_id            = aws_vpc.my_vpc.id
  cidr_block        = "10.0.0.0/20"
  availability_zone = "us-east-1a"

  tags = {
    Name = "us-east-1a Public"
  }
}

resource "aws_subnet" "us_east_1a_private" {
  vpc_id            = aws_vpc.my_vpc.id
  cidr_block        = "10.0.16.0/20"
  availability_zone = "us-east-1a"

  tags = {
    Name = "us-east-1a Private"
  }
}

Nothing. That is readable. It’s reasonable. It’s easy to reason about. People unfamiliar with Terraform can understand it. It’s documented. It’s standardized. It’s beautiful.
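
The routing rules from the earlier input map can be written the same flat way. A sketch for the private subnet, assuming a NAT gateway resource named aws_nat_gateway.us_east_1a exists elsewhere in the project (that name is hypothetical; the other references match the subnet blocks above):

```hcl
# Sketch only: aws_nat_gateway.us_east_1a is an assumed resource.
resource "aws_route_table" "us_east_1a_private" {
  vpc_id = aws_vpc.my_vpc.id

  tags = {
    Name = "us-east-1a Private"
  }
}

# Default route for the private subnet goes out through the NAT gateway.
resource "aws_route" "us_east_1a_private_default" {
  route_table_id         = aws_route_table.us_east_1a_private.id
  destination_cidr_block = "0.0.0.0/0"
  nat_gateway_id         = aws_nat_gateway.us_east_1a.id
}

resource "aws_route_table_association" "us_east_1a_private" {
  subnet_id      = aws_subnet.us_east_1a_private.id
  route_table_id = aws_route_table.us_east_1a_private.id
}
```

It is more typing, but every route is visible exactly where it is declared, with no flattening and no guessing which kind of gateway an ID belongs to.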