Infrastructure Autonomy using DNS Delegation and internal Top Level Domains

In this post we’ll talk about using a separate Top Level domain to keep your internal application infrastructure addresses apart from what your users see. We’ll also look at how DNS delegation can give teams autonomy while keeping a predictable naming strategy.

The Problem

One of the big issues with DNS management is the security risk of allowing people to add what they want to your prized possession, your front facing domain. I’ve seen this process create teams that simply manage the DNS, which is not really very cost effective. On the opposite side, I’ve seen organisations where everyone has access to change any DNS entry, or even transfer the domain ownership.

When working with infrastructure as code, and creating things like AWS ALBs or Azure Load Balancers, their names are… less than predictable. Further, when you’re treating these environments as something that can be torn down and spun up, those services within Azure or AWS will get new, random names each time. This means that any other teams relying on these services will have to constantly change their configurations.

Giving teams the autonomy to manage as much of their stack as possible is hard when they have to submit a request, signed by the CEO, COO, Chairman, etc., that takes a week to sanction, just to point their subdomain to the new things they’ve created.

The Solution

Providing a predictable, static name for all your internal resources allows for things outside of the immediate teams to be “slow moving”, while still allowing the team the autonomy to iterate at its own pace.

I also like to use predictable DNS entries to allow people to navigate and identify the purpose, team etc. of the service. This provides a lot of consistency when it comes to cross team working, and further allows the “underlying” resource to change, without external notification.

What I’ve done at every business I’ve been in recently is create a “delegation” infrastructure.

I’ve spent quite a lot of time defining and using an approach to combat this in both Azure and AWS. It’s not exactly “news” or “complicated”, but I would say that this has been a stumbling block in cloud understanding/adoption. Moving from “old school” static servers with known IP addresses to transient services with IP addresses you don’t control is where a lot of the old school developers and engineers have struggled.

The Idea

The basic idea starts with separating how you address your “internal” services from what the end users see.

So, address your services using an internal domain name, and then link them together using CNAME entries.

E.g. if your domain is http://www.martinjt.me, use a new domain internal-martinjt.me

In this example, our users use http://www.martinjt.me. However, our server lives on an ephemeral IP inside AWS, which is currently 50.50.50.50. The team developing the website does NOT have control over martinjt.me entries, as this is managed centrally by a shared team for “security and/or compliance” reasons. They do have access to add/delete/amend records on the internal domain though, which they use to set up a consistent record that is updated every time they change their backend service. Finally, the central/shared team has set up a CNAME entry pointing to that internal record, and that entry won’t change.

The result in this example is that the team has access to change the destination of http://www.martinjt.me, without having direct access to it.
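To make the chain concrete, here is a minimal sketch in Python that models the two zones as plain dictionaries and follows the CNAME the way a resolver would. It is an illustration only, not a real DNS client: the internal record name www.internal-martinjt.me and the second IP address are assumptions made up for the example.

```python
# Zone owned by the central/shared team. It only ever holds the CNAME.
# The target name "www.internal-martinjt.me" is an assumed example name.
PUBLIC_ZONE = {
    "www.martinjt.me": ("CNAME", "www.internal-martinjt.me"),
}

# Zone the website team is allowed to edit.
INTERNAL_ZONE = {
    "www.internal-martinjt.me": ("A", "50.50.50.50"),
}


def resolve(name, zones, max_hops=8):
    """Follow CNAME records across the given zones until an A record is found."""
    records = {}
    for zone in zones:
        records.update(zone)
    for _ in range(max_hops):
        record_type, value = records[name]
        if record_type == "A":
            return value
        name = value  # CNAME: look up the target name next
    raise RuntimeError("CNAME chain too long")


before = resolve("www.martinjt.me", [PUBLIC_ZONE, INTERNAL_ZONE])

# The team redeploys and the backend gets a new address (made-up value).
# Only the internal zone is touched; the public zone stays as it was.
INTERNAL_ZONE["www.internal-martinjt.me"] = ("A", "60.60.60.60")
after = resolve("www.martinjt.me", [PUBLIC_ZONE, INTERNAL_ZONE])

print(before, "->", after)
```

The public zone is never written to after the initial setup, which is the whole point: the team changes where the site goes by editing only the record they own.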

Subdomain Delegation

You can take this a step further, so that you don’t need to set up lots of Top Level domains.

Both Azure and AWS let you set up DNS Zones for a Subdomain (e.g. team1.internal-martinjt.me, team2.internal-martinjt.me, etc.). These DNS Zones can then be placed under that team’s control, and the team can create more entries inside them. Personally, I prefer to set up a structure like this:

{application|service}.{env}.{team}.internal-martinjt.me

e.g.

In this example, “booking.internal-martinjt.me” is delegated to the “Booking” team, who manage a series of applications, and they then create DNS entries for each of their applications, in each environment.
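As a rough sketch of how the naming structure and the delegation fit together, the Python below builds a name from its parts and works out which delegated zone, and therefore which team, owns it. The zone table and the “api”/“prod” values are hypothetical; in a real setup the delegation lives in NS records in the parent zone rather than in a dictionary.

```python
import re

INTERNAL_DOMAIN = "internal-martinjt.me"

# Hypothetical delegations: zone name -> team that controls it.
DELEGATED_ZONES = {
    INTERNAL_DOMAIN: "central",
    f"booking.{INTERNAL_DOMAIN}": "booking",
}

# A single DNS label: 1-63 characters, letters/digits/hyphens,
# not starting or ending with a hyphen.
LABEL = re.compile(r"[a-z0-9]([a-z0-9-]{0,61}[a-z0-9])?")


def service_fqdn(service, env, team):
    """Build {service}.{env}.{team}.internal-martinjt.me from its parts."""
    parts = [service.lower(), env.lower(), team.lower()]
    for part in parts:
        if not LABEL.fullmatch(part):
            raise ValueError(f"invalid DNS label: {part!r}")
    return ".".join(parts + [INTERNAL_DOMAIN])


def owning_zone(fqdn):
    """Return the most specific delegated zone that contains fqdn."""
    labels = fqdn.rstrip(".").lower().split(".")
    # Try the longest suffix first, then progressively shorter ones.
    for i in range(len(labels)):
        candidate = ".".join(labels[i:])
        if candidate in DELEGATED_ZONES:
            return candidate
    raise LookupError(f"no delegated zone contains {fqdn}")


def team_can_edit(team, fqdn):
    """True if the name falls inside a zone delegated to the given team."""
    return DELEGATED_ZONES[owning_zone(fqdn)] == team


name = service_fqdn("api", "prod", "booking")
print(name, "is in zone", owning_zone(name))
```

Matching on whole labels rather than raw string suffixes matters here: a name like notbooking.internal-martinjt.me falls back to the parent zone instead of being mistaken for part of the Booking team’s zone.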

Conclusion

Granting access to your frontend domain is risky, and requires a lot of trust. However, providing the team with autonomy is predicated on granting them access to it.

All the cloud providers are able to provide this ability, and further, you’re also able to use a DNS provider like GoDaddy, etc. that you can give the teams access to.

This doesn’t, however, remove the security risk of the new internal domain being used to redirect the main site. It does stop a breach from becoming a permanent problem, because the front facing domain itself stays under central control.
