Terraforming Cloudflare: in quest of the optimal setup

The Cloudflare Blog

The day my ping took countermeasures Announcing Claude Compliance API support with Cloudflare CASB Announcing Claude Managed Agents on Cloudflare Project Glasswing: what Mythos showed us Our billing pipeline was suddenly slow. The culprit was a hidden bottleneck in ClickHouse Browser Run: now running on Cloudflare Containers, it’s faster and more scalable When "idle" isn't idle: how a Linux kernel optimization became a QUIC bug Building For The Future How Cloudflare responded to the “Copy Fail” Linux vulnerability When DNSSEC goes wrong: how we responded to the .de TLD outage Code Orange: Fail Small is complete. The result is a stronger Cloudflare network Introducing Dynamic Workflows: durable execution that follows the tenant Post-quantum encryption for Cloudflare IPsec is generally available Agents can now create Cloudflare accounts, buy domains, and deploy Shutdowns, power outages, and conflict: a review of Q1 2026 Internet disruptions Making Rust Workers reliable: panic and abort recovery in wasm‑bindgen Moving past bots vs. humans Building the agentic cloud: everything we launched during Agents Week 2026 The AI engineering stack we built internally — on the platform we ship Orchestrating AI Code Review at scale Introducing the Agent Readiness score. Check to see if your site is agent-ready Shared Dictionaries: compression that keeps up with the agentic web Redirects for AI Training enforces canonical content Unweight: how we compressed an LLM 22% without sacrificing quality Agents that remember: introducing Agent Memory Agents Week: network performance update Introducing Flagship: feature flags built for the age of AI Cloudflare’s AI Platform: an inference layer designed for agents Building the foundation for running extra-large language models AI Search: the search primitive for your agents Deploy Postgres and MySQL databases with PlanetScale + Workers Artifacts: versioned storage that speaks Git Email for agents - Cloudflare Email Service now in public beta Project Think: building the next generation of AI agents on Cloudflare Introducing Agent Lee - a new interface to the Cloudflare stack Register domains wherever you build: Cloudflare Registrar API now in beta Browser Run: give your agents a browser Rearchitecting the Workflows control plane for the agentic era Add voice to your agent Managed OAuth for Access: make internal apps agent-ready in one click Securing non-human identities: automated revocation, OAuth, and scoped permissions Scaling MCP adoption: Our reference architecture for simpler, safer and cheaper enterprise deployments of MCP Secure private networking for everyone: users, nodes, agents, Workers — introducing Cloudflare Mesh Building a CLI for all of Cloudflare Durable Objects in Dynamic Workers: Give each AI-generated app its own database Agents have their own computers with Sandboxes GA Dynamic, identity-aware, and secure Sandbox auth Welcome to Agents Week 500 Tbps of capacity: 16 years of scaling our global network From bytecode to bytes- automated magic packet generation Cloudflare targets 2029 for full post-quantum security How we built Organizations to help enterprises manage Cloudflare at scale Why we're rethinking cache for the AI era Our ongoing commitment to privacy for the 1.1.1.1 public DNS resolver Introducing EmDash — the spiritual successor to WordPress that solves plugin security Introducing Programmable Flow Protection: custom DDoS mitigation logic for Magic Transit customers Cloudflare Client-Side Security: smarter detection, now open to everyone How we use Abstract Syntax Trees (ASTs) to turn Workflows code into visual diagrams A one-line Kubernetes fix that saved 600 hours a year Sandboxing AI agents, 100x faster Inside Gen 13- how we built our most powerful server yet Launching Cloudflare’s Gen 13 servers- trading cache for cores for 2x edge compute performance Powering the agents: Workers AI now runs large models, starting with Kimi K2.5 Introducing Custom Regions for precision data control Standing up for the open Internet- why we appealed Italy’s Piracy Shield fine From legacy architecture to Cloudflare One Announcing Cloudflare Account Abuse Protection: prevent fraudulent attacks from bots and humans Slashing agent token costs by 98% with RFC 9457-compliant error responses AI Security for Apps is now generally available Building a security overview dashboard for actionable insights Investigating multi-vector attacks in Log Explorer Translating risk insights into actionable protection: leveling up security posture with Cloudflare and Mastercard Fixing request smuggling vulnerabilities in Pingora OSS deployments Active defense: introducing a stateful vulnerability scanner for APIs Complexity is a choice. SASE migrations shouldn’t take years. From the endpoint to the prompt: a unified data security vision in Cloudflare One Ending the "silent drop": how Dynamic Path MTU Discovery makes the Cloudflare One Client more resilient A QUICker SASE client: re-building Proxy Mode How Automatic Return Routing solves IP overlap Always-on detections: eliminating the WAF “log versus block” trade-off Mind the gap: new tools for continuous enforcement from boot to login Stop reacting to breaches and start preventing them with User Risk Scoring Defeating the deepfake: stopping laptop farms and insider threats Moving from license plates to badges: the Gateway Authorization Proxy Evolving Cloudflare’s Threat Intelligence Platform: actionable, scalable, and ETL-less Introducing the 2026 Cloudflare Threat Report See risk, fix risk: introducing Remediation in Cloudflare CASB How Cloudy translates complex security into human action From reactive to proactive: closing the phishing gap with LLMs Modernizing with agile SASE: a Cloudflare One blog takeover Beyond the blank slate: how Cloudflare accelerates your Zero Trust journey The truly programmable SASE platform Toxic combinations: when small signals add up to a security incident We deserve a better streams API for JavaScript The most-seen UI on the Internet? Redesigning Turnstile and Challenge Pages ASPA: making Internet routing more secure Bringing more transparency to post-quantum usage, encrypted messaging, and routing security How we rebuilt Next.js with AI in one week Cloudflare One is the first SASE offering modern post-quantum encryption across the full platform Cloudflare outage on February 20, 2026

Cloudflare Team · 2019-10-09 · via The Cloudflare Blog

This is a guest post by Dimitris Koutsourelis and Alexis Dimitriadis, working for the Security Team at Workable, a company that makes software to help companies find and hire great people.

Overview

This post is about our introductive journey to the infrastructure-as-code practice; managing Cloudflare configuration in a declarative and version-controlled way. We'd like to share the experience we've gained during this process; our pain points, limitations we faced, different approaches we took and provide parts of our solution and experimentations.

Terraform world

Terraform is a great tool that fulfills our requirements, and fortunately, Cloudflare maintains its own provider that allows us to manage its service configuration hasslefree.

On top of that, Terragrunt, is a thin wrapper that provides extra commands and functionality for keeping Terraform configurations DRY, and managing remote state.

The combination of both leads to a more modular and re-usable structure for Cloudflare resources (configuration), by utilizing terraform and terragrunt modules.

We've chosen to use the latest version of both tools (Terraform-v0.12 & Terragrunt-v0.19 respectively) and constantly upgrade to take advantage of the valuable new features and functionality, which at this point in time, remove important limitations.

Workable context

Our set up includes multiple domains that are grouped in two distinct Cloudflare organisations: production & staging. Our environments have their own purposes and technical requirements (i.e.: QA, development, sandbox and production) which translates to slightly different sets of Cloudflare zone configuration.

Our approach

Our main goal was to have a modular set up with the ability to manage any configuration for any zone, while keeping code repetition to a minimum. This is more complex than it sounds; we have repeatedly changed our Terraform folder structure - and other technical aspects - during the development period. The following sections illustrate a set of alternatives through our path, along with pros & cons.

Structure

Terraform configuration is based on the project's directory structure, so this is the place to start.

Instead of retaining the Cloudflare organisation structure (production & staging as root level directories containing the zones that belong in each organization), our decision was to group zones that share common configuration under the same directory. This helps keep the code dry and the set up consistent and readable.

On the down side, this structure adds an extra layer of complexity, as two different sets of credentials need to be handled conditionally and two state files (at the environments/ root level) must be managed and isolated using workspaces.

On top of that, we used Terraform modules, to keep sets of common configuration across zone groups into a single place.Terraform modules repository

modules/
│    ├── firewall/
│        ├── main.tf
│        ├── variables.tf
│    ├── zone_settings/
│        ├── main.tf
│        ├── variables.tf
│    └── [...]  
└──

Terragrunt modules repository

environments/
│    ├── [...]
│    ├── dev/
│    ├── qa/
│    ├── demo/
│        ├── zone-8/ (production)
│            └── terragrunt.hcl
│        ├── zone-9/ (staging)
│            └── terragrunt.hcl
│        ├── config.tfvars
│        ├── main.tf
│        └── variables.tf
│    ├── config.tfvars
│    ├── secrets.tfvars
│    ├── main.tf
│    ├── variables.tf
│    └── terragrunt.hcl
└──

The Terragrunt modules tree gives flexibility, since we are able to apply configuration on a zone, group zone, or organisation level (which is inline with Cloudflare configuration capabilities - i.e.: custom error pages can also be configured on the organisation level).

Resource types

We decided to implement Terraform resources in different ways, to cover our requirements more efficiently.

1. Static resource

The first thought that came to mind was having one, or multiple .tf files implementing all the resources with hardcoded values assigned to each attribute. It's simple and straightforward, but can have a high maintenance cost if it leads to code copy/paste between environments.

So, common settings seem to be a good use case; we chose to implement access_rules Terraform resources accordingly:modules/access_rules/main.tf

resource "cloudflare_access_rule" "no_17" {
  notes   = "this is a description"
  mode    = "blacklist"
  configuration = {
    target  = "ip"
    value   = "x.x.x.x"
  }
}
[...]

2. Parametrized resources

Our next step was to add variables to gain flexibility. This is useful when few attributes of a shared resource configuration differ between multiple zones. Most of the configuration remains the same (as described above) and the variable instantiation is added in the Terraform module, while their values are fed through the Terragrunt module, as input variables, or entries inside_.tfvars_ files. The zone_settings_override resource was implemented accordingly:

modules/zone_settings/main.tf

resource "cloudflare_zone_settings_override" "zone_settings" {
  zone_id = var.zone_id
  settings {
    always_online       = "on"
    always_use_https    = "on"
    [...]
    browser_check       = var.browser_check
    mobile_redirect {
      mobile_subdomain  = var.mobile_redirect_subdomain
      status            = var.mobile_redirect_status
      strip_uri         = var.mobile_redirect_uri
    }
    
    [...]
    waf                 = "on"
    webp                = "off"
    websockets          = "on"
  }
}

environments/qa/main.tf

module "zone_settings" {
  source        = "[email protected]:foo/modules/zone_settings"
  zone_name     = var.zone_name
  browser_check = var.zone_settings_browser_check
  [...]
}

environments/qa/config.tfvars

#zone settings
zone_settings_browser_check = "off"
[...]
}

3. Dynamic resource

At that point, we thought that a more interesting approach would be to create generic resource templates to manage all instances of a given resource in one place. A template is implemented as a Terraform module and creates each resource dynamically, based on its input: data fed through the Terragrunt modules (/environments in our case), or entries in the tfvars files.

We chose to implement the account_member resource this way.modules/account_members/variables.tf

variable "users" {
  description   = "map of users - roles"
  type          = map(list(string))
}
variable "member_roles" {
  description   = "account role ids"
  type          = map(string)
}

modules/account_members/main.tf

resource "cloudflare_account_member" "account_member" {
 for_each          = var.users
 email_address     = each.key
 role_ids          = [for role in each.value : lookup(var.member_roles, role)]
 lifecycle {
   prevent_destroy = true
 }
}

We feed the template with a list of users (list of maps). Each member is assigned a number of roles. To make code more readable, we mapped users to role names instead of role ids:environments/config.tfvars

member_roles = {
  admin       = "000013091sds0193jdskd01d1dsdjhsd1"
  admin_ro    = "0000ds81hd131bdsjd813hh173hds8adh"
  analytics   = "0000hdsa8137djahd81y37318hshdsjhd"
  [...]
  super_admin = "00001534sd1a2123781j5gj18gj511321"
}
users = {
  "[email protected]"  = ["super_admin"]
  "[email protected]"  = ["analytics", "audit_logs", "cache_purge", "cf_workers"]
  "[email protected]"  = ["cf_stream"]
  [...]
  "[email protected]" = ["cf_stream"]
}

Another interesting case we dealt with was the rate_limit resource; the variable declaration (list of objects) & implementation goes as follows:modules/rate_limit/variables.tf

variable "rate_limits" {
  description   = "list of rate limits"
  default       = []
 
  type          = list(object(
  {
    disabled    = bool,
    threshold   = number,
    description = string,
    period      = number,
    
    match       = object({
      request   = object({
        url_pattern     = map(string),
        schemes         = list(string),
        methods         = list(string)
      }),
      response          = object({
        statuses        = list(number),
        origin_traffic  = bool
      })
    }),
    action      = object({
      mode      = string,
      timeout   = number
    })
  }))
}

modules/rate_limit/main.tf

locals {
 […]
}
data "cloudflare_zones" "zone" {
  filter {
    name    = var.zone_name
    status  = "active"
    paused  = false
  }
}
resource "cloudflare_rate_limit" "rate_limit" {
  count         = length(var.rate_limits)
  zone_id       =  lookup(data.cloudflare_zones.zone.zones[0], "id")
  disabled      = var.rate_limits[count.index].disabled
  threshold     = var.rate_limits[count.index].threshold
  description   = var.rate_limits[count.index].description
  period        = var.rate_limits[count.index].period
  
  match {
    request {
      url_pattern     = local.url_patterns[count.index]
      schemes         = var.rate_limits[count.index].match.request.schemes
      methods         = var.rate_limits[count.index].match.request.methods
    }
    response {
      statuses        = var.rate_limits[count.index].match.response.statuses
      origin_traffic  = var.rate_limits[count.index].match.response.origin_traffic
    }
  }
  action {
    mode        = var.rate_limits[count.index].action.mode
    timeout     = var.rate_limits[count.index].action.timeout
  }
}

environments/qa/rate_limit.tfvars

common_rate_limits = [
{
    #1
    disabled      = false
    threshold     = 50
    description   = "sample description"
    period        = 60
   
   match  = {
      request   = {
        url_pattern  = {
          "subdomain"   = "foo"
          "path"        = "/api/v1/bar"
        }
        schemes         = [ "_ALL_", ]
        methods         = [ "GET", "POST", ]
      }
      response  = {
        statuses        = []
        origin_traffic  = true
      }
    }
    action  = {
      mode      = "simulate"
      timeout   = 3600
    }
  },
  [...]
  }
]

The biggest advantage of this approach is that all common rate_limit rules are in one place and each environment can include its own rules in their .tfvars. The combination of those using Terraform built-in concat() function, achieves a 2-layer join of the two lists (common|unique rules). So we wanted to give it a try:

locals {
  rate_limits  = concat(var.common_rate_limits, var.unique_rate_limits)
}

There is however a drawback: .tfvars files can only contain static values. So, since all url attributes - that include the zone name itself - have to be set explicitly in the data of each environment, it means that every time a change is needed to a url, this value has to be copied across all environments and change the zone name to match the environment.

The solution we came up with, in order to make the zone name dynamic, was to split the url attribute into 3 parts: subdomain, domain and path. This is effective for the .tfvars, but the added complexity to handle the new variables is non negligible. The corresponding code illustrates the issue:modules/rate_limit/main.tf

locals {
  rate_limits   = concat(var.common_rate_limits, var.unique_rate_limits)
  url_patterns  = [for rate_limit in local.rate_limits:  "${lookup(rate_limit.match.request.url_pattern, "subdomain", null) != null ? "${lookup(rate_limit.match.request.url_pattern, "subdomain")}." : ""}"${lookup(rate_limit.match.request.url_pattern, "domain", null) != null ? "${lookup(rate_limit.match.request.url_pattern, "domain")}" : ${var.zone_name}}${lookup(rate_limit.match.request.url_pattern, "path", null) != null ? lookup(rate_limit.match.request.url_pattern, "path") : ""}"]
}

Readability vs functionality: although flexibility is increased and code duplication is reduced, the url transformations have an impact on code's readability and ease of debugging (it took us several minutes to spot a typo). You can imagine this is even worse if you attempt to implement a more complex resource (such as page_rule which is a list of maps with four url attributes).

The underlying issue here is that at the point we were implementing our resources, we had to choose maps over objects due to their capability to omit attributes, using the lookup() function (by setting default values). This is a requirement for certain resources such as page_rules: only certain attributes need to be defined (and others ignored).

In the end, the context will determine if more complex resources can be implemented with dynamic resources.

4. Sequential resources

Cloudflare page rule resource has a specific peculiarity that differentiates it from other types of resources: the priority attribute.When a page rule is applied, it gets a unique id and priority number which corresponds to the order it has been submitted. Although Cloudflare API and terraform provider give the ability to explicitly specify the priority, there is a catch.

Terraform doesn't respect the order of resources inside a .tf file (even in a _for each loop!); each resource is randomly picked up and then applied to the provider. So, if page_rule priority is important - as in our case - the submission order counts. The solution is to lock the sequence in which the resources are created through the depends_on meta-attribute:

resource "cloudflare_page_rule" "no_3" {
  depends_on  = [cloudflare_page_rule.no_2]
  zone_id     = lookup(data.cloudflare_zones.zone.zones[0], "id")
  target      = "www.${var.zone_name}/foo"
  status      = "active"
  priority    = 3
  actions {
    forwarding_url {
      status_code    = 301
      url            = "https://www.${var.zone_name}"
    }
  }
}
resource "cloudflare_page_rule" "no_2" {
  depends_on  = [cloudflare_page_rule.no_1]
  zone_id     = lookup(data.cloudflare_zones.zone.zones[0], "id")
  target      = "www.${var.zone_name}/lala*"
  status      = "active"
  priority    = 24
  actions {
    ssl                     = "flexible"
    cache_level             = "simplified"
    resolve_override        = "bar.${var.zone_name}"
    host_header_override    = "new.domain.com"
  }
}
resource "cloudflare_page_rule" "page_rule_1" {
  zone_id   = lookup(data.cloudflare_zones.zone.zones[0], "id")
  target    = "*.${var.zone_name}/foo/*"
  status    = "active"
  priority  = 1
  actions {
    forwarding_url {
      status_code     = 301
      url             = "https://foo.${var.zone_name}/$1/$2"
    }
  }
}

So we had to go with to a more static resource configuration because the depends_on attribute only takes static values (not dynamically calculated ones during the runtime).

Conclusion

After changing our minds several times along the way on Terraform structure and other technical details, we believe that there isn't a single best solution. It all comes down to the requirements and keeping a balance between complexity and simplicity. In our case, a mixed approach is good middle ground.

Terraform is evolving quickly, but at this point it lacks some common coding capabilities. So over engineering can be a catch (which we fell-in too many times). Keep it simple and as DRY as possible. :)

此内容由惯性聚合(RSS阅读器)自动聚合整理，仅供阅读参考。原文来自 — 版权归原作者所有。

推荐订阅源

The Cloudflare Blog

Overview

Terraform world

Workable context

Our approach

Structure

Resource types

1. Static resource

2. Parametrized resources

3. Dynamic resource

4. Sequential resources

Conclusion