Avoiding Common Pitfalls in Terraform Module Design
I've been working with Terraform for over 6 years now, and during this time, I've encountered several common mistakes when designing modules that can lead to frustrations, mismanagement, and even mistrust in the tool itself.
I continue to see teams complaining that Terraform is overly complex, hard to maintain, doesn't scale well, and – the most common complaint – that state files are unreliable and need constant manual intervention. That last point, especially, leads teams to fight against Terraform rather than working with it. In some cases, teams abandon the use of CI/CD processes altogether, opting instead to run Terraform commands manually, which only leads to more problems (not to mention a compliance and security nightmare).
Before I go into more detail, I want to clarify that I will be using the terms "root module" and "child module" throughout this post. A root module is the top-level configuration in a Terraform project, while a child module is a reusable component that can be called from multiple root modules. A root module would define provider and backend configuration, while a child module would not.
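As a rough sketch of that split, a root module might look something like the following. The directory layout, state bucket name, region, and module inputs are all made up for illustration; only the shape matters: the backend and provider blocks live in the root module, and the child module is simply called from it.

```hcl
# Root module, e.g. environments/prod/user-uploads/main.tf (hypothetical layout)
terraform {
  required_version = ">= 1.3"

  required_providers {
    aws = {
      source  = "hashicorp/aws"
      version = "~> 5.0"
    }
  }

  # Backend configuration belongs to the root module only.
  backend "s3" {
    bucket = "example-terraform-state" # hypothetical state bucket
    key    = "prod/user-uploads/terraform.tfstate"
    region = "eu-west-1"
  }
}

# Provider configuration also belongs to the root module only.
provider "aws" {
  region = "eu-west-1"
}

# The child module defines no provider or backend; it is just invoked here.
module "user_uploads_bucket" {
  source = "../../../modules/user-uploads-bucket" # hypothetical path

  environment = "prod"
  application = "uploads"
}
```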
Poor Module Naming
One of the most common mistakes I see is poor module naming (especially root modules). Modules should be named by their business purpose, not by the resource type they manage.
For example, a module that manages an S3 bucket for storing user uploads should be named user-uploads-bucket rather than just s3-bucket. This makes it clear what the module is for and helps avoid confusion when multiple modules manage similar resources. It also makes it easier to understand the overall architecture of the infrastructure at a glance when you see the root modules invoked for an environment.
Even when creating a truly reusable child module, the name should reflect its purpose. For example, if I were creating a module that would encapsulate the default configuration for all S3 buckets in my organization that adhered to best practices and compliance requirements, I would name it something like compliant-s3-bucket. This makes it clear that the module is not just a generic S3 bucket but one that meets specific organizational standards. That module would then be invoked by a "static-website-bucket" module or a "user-uploads-bucket" module, etc.
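To illustrate the layering, here is a minimal sketch of what a user-uploads-bucket module could look like when it wraps compliant-s3-bucket. The input name and the bucket_id output on compliant-s3-bucket are assumptions for the example, as is the lifecycle rule chosen to represent the "uploads" purpose.

```hcl
# modules/user-uploads-bucket/main.tf (hypothetical)

variable "environment" {
  type        = string
  description = "Deployment environment, e.g. dev, staging, prod."
}

variable "application" {
  type        = string
  description = "Application that owns the uploads."
}

# The organization-wide defaults live in one place.
# Assumes compliant-s3-bucket takes a `name` input and exposes `bucket_id`.
module "bucket" {
  source = "../compliant-s3-bucket"

  name = "${var.application}-${var.environment}-user-uploads"
}

# Purpose-specific behaviour stays in the purpose-named module:
# clean up multipart uploads that were started but never completed.
resource "aws_s3_bucket_lifecycle_configuration" "uploads" {
  bucket = module.bucket.bucket_id

  rule {
    id     = "abort-incomplete-multipart-uploads"
    status = "Enabled"

    # An empty filter applies the rule to every object in the bucket.
    filter {}

    abort_incomplete_multipart_upload {
      days_after_initiation = 7
    }
  }
}

output "bucket_id" {
  value = module.bucket.bucket_id
}
```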
Poor Module Scope
This goes hand-in-hand with poor module naming. Modules should have a clear and focused scope. A module that tries to do too much becomes difficult to understand, maintain, and reuse.
I can't count how many times I've seen an engineer add a variable to a module to bolt on some new functionality, after a dozen engineers had already done the same before them, each shovelling on more complexity and scope. The result is always the same: a module that nobody understands, nobody wants to use, and nobody wants to maintain. We only really notice these modules once they've reached the point of being unmanageable, and by then, it's often easier to rewrite them from scratch than to try and untangle the mess.
I think of modules a lot like microservices. Each module should have a single responsibility and do it well. If a module is responsible for managing a VPC, it shouldn't also be responsible for managing EC2 instances or RDS databases. Those should be separate modules that can be composed together as needed. If you find yourself adding more and more variables to a module, it's a sign that the module's scope is too broad and needs to be broken down into smaller, more focused modules.
That may sound like I'm advocating for incredibly small modules, and in some cases, I am. But the key is that each module should have a clear purpose and be easy to understand. A module that encapsulates a single cloud resource is likely too small, but a module that encapsulates a single business function (like "user authentication" or "payment processing") is likely just right.
Following this principle will also make naming the module easier – if you can name the purpose, you can define the scope and you can give it a good name.
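To make the single-responsibility idea a bit more concrete, here is a sketch of a root module composing a few focused modules. The module names, inputs, and outputs are hypothetical; the point is only that each module owns one concern and they are wired together through outputs rather than merged into one large module.

```hcl
# Root module for a hypothetical "orders" service in prod.

# Owns the VPC and subnets, nothing else.
module "orders_network" {
  source = "../../modules/orders-network" # hypothetical

  environment = "prod"
}

# Owns the database; receives networking details as plain inputs.
module "orders_database" {
  source = "../../modules/orders-database" # hypothetical

  environment = "prod"
  vpc_id      = module.orders_network.vpc_id
  subnet_ids  = module.orders_network.private_subnet_ids
}

# Owns the compute for the service; knows nothing about how the
# network or database were built, only their outputs.
module "orders_api" {
  source = "../../modules/orders-api" # hypothetical

  environment       = "prod"
  subnet_ids        = module.orders_network.private_subnet_ids
  database_endpoint = module.orders_database.endpoint
}
```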
Modules are too flexible
I see a lot of recommendations and "best practices" that advocate for making modules as flexible as possible. They don't use those words directly, but they recommend that everything should be a variable, and that modules should be designed to handle a wide variety of use cases. And I see engineers taking this to the absolute extreme, creating modules with dozens of variables, many of which are complex objects or maps.
A good example is the creation of an IAM role. I often see a child module that creates an AWS IAM role in a generic way, but then also has variables for the definition of the role's policies, trust relationships, tags, and even the ability to attach managed policies. The result is a module that is so flexible that it can be used for almost any purpose, but it's also incredibly complex and difficult to use. It's also pretty meaningless.
In these cases, I typically create a module that creates a role with a specific purpose, such as service-irsa-role. The module would have a few variables, such as the role name (though probably not the role name directly; I'd more likely have other business-relevant variables that are used to construct the role name, like environment, application, and component). But I would keep the module from becoming too flexible: no arbitrary policies and no managed policy attachments. And I would absolutely NOT allow for the trust relationship to be defined as a variable. The trust relationship should be fixed based on the purpose of the module (in this case, IRSA: IAM Roles for Service Accounts). When the service using this role needs additional permissions, the root module that invokes this child module can create and attach additional policies as needed. This allows the child module to remain focused and easy to use while still allowing for flexibility in the root module – and the root module can be invoked in different environments for deploying the same application in dev, staging, and production.
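Here is a minimal sketch of what such a service-irsa-role module could look like. The variable names beyond environment, application, and component (the OIDC provider inputs, namespace, and service account) and the role naming scheme are assumptions for the example. Note that the trust policy is built inside the module and there is no variable for policies.

```hcl
# modules/service-irsa-role/main.tf (hypothetical)

variable "environment" { type = string }
variable "application" { type = string }
variable "component" { type = string }

variable "namespace" { type = string }       # Kubernetes namespace of the service
variable "service_account" { type = string } # Kubernetes service account name

variable "oidc_provider_arn" {
  type        = string
  description = "ARN of the EKS cluster's IAM OIDC provider."
}

variable "oidc_provider_url" {
  type        = string
  description = "OIDC issuer URL of the cluster, without the https:// prefix."
}

locals {
  # The role name is derived from business-relevant inputs, not passed in.
  role_name = "${var.application}-${var.component}-${var.environment}"
}

# The trust relationship is fixed by the module's purpose (IRSA) and is
# deliberately not configurable.
data "aws_iam_policy_document" "trust" {
  statement {
    effect  = "Allow"
    actions = ["sts:AssumeRoleWithWebIdentity"]

    principals {
      type        = "Federated"
      identifiers = [var.oidc_provider_arn]
    }

    condition {
      test     = "StringEquals"
      variable = "${var.oidc_provider_url}:sub"
      values   = ["system:serviceaccount:${var.namespace}:${var.service_account}"]
    }

    condition {
      test     = "StringEquals"
      variable = "${var.oidc_provider_url}:aud"
      values   = ["sts.amazonaws.com"]
    }
  }
}

resource "aws_iam_role" "this" {
  name               = local.role_name
  assume_role_policy = data.aws_iam_policy_document.trust.json
}

output "role_name" { value = aws_iam_role.this.name }
output "role_arn" { value = aws_iam_role.this.arn }
```

A root module could then invoke it and attach whatever extra permissions that particular service needs. The service name, queue permission, and input variables below are again only illustrative.

```hcl
# Root module for a hypothetical "orders" API.

variable "oidc_provider_arn" { type = string }
variable "oidc_provider_url" { type = string }
variable "orders_queue_arn" { type = string }

module "orders_api_role" {
  source = "../../modules/service-irsa-role"

  environment       = "prod"
  application       = "orders"
  component         = "api"
  namespace         = "orders"
  service_account   = "orders-api"
  oidc_provider_arn = var.oidc_provider_arn
  oidc_provider_url = var.oidc_provider_url
}

# Service-specific permissions are defined by the caller, not the module.
data "aws_iam_policy_document" "orders_queue" {
  statement {
    actions   = ["sqs:SendMessage"]
    resources = [var.orders_queue_arn]
  }
}

resource "aws_iam_role_policy" "orders_queue" {
  name   = "orders-queue-access"
  role   = module.orders_api_role.role_name
  policy = data.aws_iam_policy_document.orders_queue.json
}
```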
Given this opinion, you may not be surprised to hear that I avoid using most modules from the Terraform Registry. I find that most of them are too generic and flexible, making them difficult to use and understand. I'll often use them as a reference to determine which resources should be included in my own module. I prefer to create my own modules that are focused on specific business purposes and have a clear scope. This also allows me to enforce organizational standards and best practices more easily.
Exploiting Terragrunt's hooks
While this isn't exactly a Terraform mistake, I see a lot of teams using Terragrunt's hooks to run custom scripts or commands before or after Terraform commands. These teams often don't agree with the idea of using static and simple Terraform modules, and instead want to run custom logic to modify the state or configuration before applying changes. In my experience, they usually have a coding/scripting approach to Terraform, rather than a declarative "configuration" approach. They'll exploit Terragrunt's hooks to clean up state or modify configuration files.
I see this as exploitation. They're fighting against the nature of Terraform, trying to bend it to match their mindset without fully understanding how Terraform is designed to work. The worst part is that they won't see how great Terraform can be when used correctly because they're so focused on trying to make it work their way. These teams usually end up with brittle, hard-to-maintain infrastructure that requires constant manual intervention and breaks frequently, and they end up distrusting Terraform as a whole.
Granted, I'm not against Terragrunt hooks (or even Terragrunt's code generation features), but I think they should be used sparingly and only when absolutely necessary.
Death by a thousand cuts
Developer discipline is a big subject and very much applies to Terraform. Developers are constantly weighing the cost of doing something "the right way" versus doing it "the easy way" or "the quick way".
It's so easy to see a module called "elasticache" that is used to create ElastiCache Redis clusters and modify it to also be able to create Memcached clusters. Or to add a list(object) variable to a VPC module to allow for the creation of additional subnets. Or even just a list(string) variable to allow for additional tags. Each of these changes may seem small and insignificant on its own, but over time they add up and lead to a module so complex and difficult to use that nobody wants to use it anymore.
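One way to resist the "just one more variable" temptation, in the same spirit as the IAM role example earlier, is to leave the module alone and declare the one-off resource in the root module that actually needs it. This is only a sketch: the module name, its vpc_id output, and the CIDR and availability zone values are assumptions.

```hcl
# Root module that needs one extra subnet.

# The network module stays as it is: no `additional_subnets` variable added.
module "platform_network" {
  source = "../../modules/platform-network" # hypothetical

  environment = "prod"
}

# The one-off requirement is declared where it is needed, using the
# module's existing output.
resource "aws_subnet" "reporting" {
  vpc_id            = module.platform_network.vpc_id
  cidr_block        = "10.0.48.0/24"
  availability_zone = "eu-west-1a"

  tags = {
    Name = "reporting-prod"
  }
}
```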