Mrugesh Patel
/lab/azure-vnet

Building Azure networking that actually works in production

How a real Azure network gets built — VNets, subnets, ExpressRoute, hybrid connectivity, and the things that bite you when you go live. Practical patterns from production deployments.

You found a hidden page. Welcome.

The 30-second version

Azure networking is just networking, with a UI on top and someone else doing the cabling. The fundamentals are the same: subnets, routing, firewalls, NAT. But Azure has its own opinions about how those should fit together, and ignoring those opinions is how you end up paying $40,000/month for traffic that should have been free.

What follows is the layout I default to for any new Azure footprint.

The hub-and-spoke topology

                ┌──────────────────────────────────┐
                │       hub-vnet (10.0.0.0/16)     │
                │  ┌────────┐  ┌─────────────────┐ │
                │  │ Azure  │  │  ExpressRoute   │ │
                │  │Firewall│  │     Gateway     │ │
                │  └────────┘  └─────────────────┘ │
                │  ┌─────────────────────────────┐ │
                │  │     VPN Gateway (backup)    │ │
                │  └─────────────────────────────┘ │
                └────────────┬─────────────────────┘
                             │ peered
              ┌──────────────┼──────────────┐
              │              │              │
        ┌─────┴──────┐ ┌─────┴──────┐ ┌─────┴──────┐
        │ prod-vnet  │ │ dev-vnet   │ │ shared-svc │
        │10.10.0.0/16│ │10.20.0.0/16│ │10.30.0.0/16│
        └────────────┘ └────────────┘ └────────────┘
  

Why this layout:

  • Hub centralizes shared services — firewall, VPN, ExpressRoute. Each spoke doesn't need its own.
  • Spokes are isolated — prod can't talk to dev unless you explicitly route it through the hub firewall.
  • VNet peering is non-transitive by default — A peered to B, B peered to C, does NOT mean A talks to C. You force traffic through the hub firewall, where you can inspect it.
  • You scale by adding more spokes. Need a new app environment? New VNet, peer it to the hub, done.
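The layout above can be written down as data, with the one check worth running before you peer anything: VNets with overlapping address spaces cannot be peered. A minimal sketch (names and CIDRs are the ones from the diagram; the check is generic Python, not an Azure API):

```python
import ipaddress

# Hub-and-spoke layout from the diagram above.
hub = {"name": "hub-vnet", "cidr": "10.0.0.0/16"}
spokes = [
    {"name": "prod-vnet",  "cidr": "10.10.0.0/16"},
    {"name": "dev-vnet",   "cidr": "10.20.0.0/16"},
    {"name": "shared-svc", "cidr": "10.30.0.0/16"},
]

def overlapping(vnets):
    """Return pairs of VNets whose address spaces overlap.

    Overlapping address spaces block VNet peering, so catching this
    in review is cheaper than a failed peering attempt later.
    """
    nets = [(v["name"], ipaddress.ip_network(v["cidr"])) for v in vnets]
    return [
        (a, b)
        for i, (a, na) in enumerate(nets)
        for b, nb in nets[i + 1:]
        if na.overlaps(nb)
    ]

# Peering is hub<->spoke only; spokes never peer with each other,
# so spoke-to-spoke traffic must transit the hub firewall.
peerings = [(hub["name"], s["name"]) for s in spokes]

print(overlapping([hub] + spokes))  # [] — no overlaps, peering can proceed
```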

Subnet design

Within a single VNet, I always carve at least these subnets:

Subnet   CIDR            What goes here
web      10.10.1.0/24    Front-end VMs / App Gateway backend pools
app      10.10.2.0/24    App tier (Tomcat, .NET, etc.)
data     10.10.3.0/24    SQL, Cosmos, Redis
pe       10.10.4.0/24    Private Endpoints (PaaS services like Storage)
mgmt     10.10.99.0/24   Bastion, jump boxes, monitoring agents
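The same carve-up expressed in code, with the sanity checks worth running before any apply. The names and roles follow the table; the only hard Azure constraints encoded here are that subnets must sit inside the VNet's address space, must not overlap, and that Azure reserves 5 IPs in every subnet:

```python
import ipaddress

vnet = ipaddress.ip_network("10.10.0.0/16")

# The subnet table above, as code.
subnets = {
    "web":  ipaddress.ip_network("10.10.1.0/24"),
    "app":  ipaddress.ip_network("10.10.2.0/24"),
    "data": ipaddress.ip_network("10.10.3.0/24"),
    "pe":   ipaddress.ip_network("10.10.4.0/24"),
    "mgmt": ipaddress.ip_network("10.10.99.0/24"),
}

# Every subnet must sit inside the VNet's address space...
for name, net in subnets.items():
    assert net.subnet_of(vnet), f"{name} is outside the VNet"

# ...and no two subnets may overlap.
nets = list(subnets.values())
for i, a in enumerate(nets):
    for b in nets[i + 1:]:
        assert not a.overlaps(b), f"{a} overlaps {b}"

# Azure reserves 5 IPs per subnet (network, broadcast, gateway, 2x DNS),
# so a /24 gives you 256 - 5 = 251 usable addresses.
usable = subnets["web"].num_addresses - 5
print(usable)  # 251
```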

NSGs (Network Security Groups) attach to each subnet. Default deny inbound, explicit allows for the flows you actually need. No "allow all from VNet" rules — those defeat the entire point of subnet isolation.

Common mistake: putting Private Endpoints in the same subnet as your VMs. Don't — give them a dedicated pe subnet. When you need to disable network policies for private endpoints (and you often do), you won't accidentally disable them for your VM subnets as well.
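The "default deny inbound, explicit allows" stance is easy to reason about because NSG evaluation is simple: rules are processed in ascending priority order and the first match wins. A toy model of that evaluation (the rule set is illustrative, mirroring the tiers above; it is not a real NSG export):

```python
import ipaddress

# (priority, name, source CIDR, destination port, action)
RULES = [
    (100,  "allow-web-to-app", "10.10.1.0/24", 8080, "Allow"),
    (110,  "allow-app-to-sql", "10.10.2.0/24", 1433, "Allow"),
    (4096, "deny-all-inbound", "*",            "*",  "Deny"),
]

def evaluate(src_ip, dst_port):
    """Return (rule name, action) of the first matching rule by priority."""
    for _prio, name, src, port, action in sorted(RULES):
        src_ok = src == "*" or ipaddress.ip_address(src_ip) in ipaddress.ip_network(src)
        port_ok = port == "*" or port == dst_port
        if src_ok and port_ok:
            return name, action
    return None

print(evaluate("10.10.1.5", 8080))  # ('allow-web-to-app', 'Allow')
print(evaluate("10.10.3.7", 22))    # ('deny-all-inbound', 'Deny')
```

Note what a blanket "allow all from VNet" rule would do here: it would match before the deny and quietly grant the SSH flow in the second example.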

ExpressRoute or VPN?

Need                                            Use
Predictable performance for production traffic  ExpressRoute
Bandwidth above 1 Gbps                          ExpressRoute
SLA on bandwidth + latency                      ExpressRoute
Quick test / dev / disaster recovery            Site-to-site VPN
Connecting a small branch office                VPN (over ExpressRoute if you have it)
Backup if ExpressRoute fails                    VPN (always have one)

The right answer for production is "both." ExpressRoute as primary, VPN as backup. The VPN tunnel costs ~$30/month and saves you the day ExpressRoute has a regional issue. Advertise the same on-prem prefixes over both paths, and Azure prefers the ExpressRoute route automatically, falling back to the VPN when the circuit goes down.
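The failover behaviour is worth internalizing because nothing needs to "switch over" — it's just route selection. A toy model (the preference values are illustrative, not Azure's internal metrics; the behaviour they model — ExpressRoute preferred over site-to-site VPN for the same prefix — is the documented one):

```python
# Lower preference wins, as with ExpressRoute vs VPN for the same prefix.
PATH_PREFERENCE = {"ExpressRoute": 0, "VPN": 1}

def best_path(routes):
    """Pick the preferred source for each advertised prefix."""
    best = {}
    for prefix, source in routes:
        if prefix not in best or PATH_PREFERENCE[source] < PATH_PREFERENCE[best[prefix]]:
            best[prefix] = source
    return best

# Both paths up: ExpressRoute wins.
both = [("192.168.0.0/16", "ExpressRoute"), ("192.168.0.0/16", "VPN")]
print(best_path(both))  # {'192.168.0.0/16': 'ExpressRoute'}

# Circuit down: the ExpressRoute route is withdrawn, only VPN remains,
# so traffic fails over with no manual intervention.
er_down = [("192.168.0.0/16", "VPN")]
print(best_path(er_down))  # {'192.168.0.0/16': 'VPN'}
```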

The thing that nobody tells you about ExpressRoute

ExpressRoute has two layers of routing: peering (between your edge and Microsoft) and circuit-to-VNet (between the circuit and your gateways). Both have to be configured. Both have to advertise the right prefixes. People burn entire days because their on-prem subnet isn't reaching Azure and they can't tell which layer is the problem.

Diagnostic order, every time:

  1. Is the circuit provisioned and BGP up at the peering layer? (Check in the Azure portal under your ExpressRoute resource.)
  2. Is the route filter advertising the right prefixes from Microsoft? (Applies to Microsoft peering — e.g. for Microsoft 365 endpoints.)
  3. Is the VNet gateway connected to the circuit? (Easy to forget — it's a separate resource.)
  4. Are the on-prem prefixes appearing in the VNet's effective routes? (Use Network Watcher → Effective Routes for any NIC.)
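Step 4 is the one people get wrong by eyeballing: "the prefix looks like it's there" isn't the same as "it's the route that wins." Effective routes use longest-prefix match, which the following sketch reproduces (the route table is a made-up example, not real Network Watcher output):

```python
import ipaddress

# A NIC's effective routes, shaped like Network Watcher's output:
# (address prefix, next hop type).
effective_routes = [
    ("10.10.0.0/16",   "VirtualNetwork"),
    ("10.0.0.0/16",    "VNetPeering"),
    ("192.168.0.0/16", "VirtualNetworkGateway"),  # learned from on-prem via BGP
    ("0.0.0.0/0",      "Internet"),
]

def route_for(dest_ip):
    """Longest-prefix match over the effective routes, as Azure does."""
    dest = ipaddress.ip_address(dest_ip)
    matches = [
        (ipaddress.ip_network(prefix), next_hop)
        for prefix, next_hop in effective_routes
        if dest in ipaddress.ip_network(prefix)
    ]
    # The most specific (longest) matching prefix wins.
    return max(matches, key=lambda m: m[0].prefixlen)[1]

print(route_for("192.168.40.7"))  # VirtualNetworkGateway — on-prem is reachable
print(route_for("8.8.8.8"))       # Internet — no on-prem route, falls to default
```

If the on-prem prefix is missing from this list entirely, the problem is upstream — back to steps 1-3.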

Cost gotchas

The most egregious surprises from our first year on Azure:

1. Cross-region traffic

Anything moving between Azure regions (e.g. East US ↔ West US) costs ~$0.02/GB. A poorly designed app that pulls 5 TB/day across regions = $3,000/month nobody budgeted. Solution: keep dependencies in-region. If you must cross regions, use Azure Front Door or paired regions to minimize the cost.

2. Public IPs on every VM

Each public IP is ~$3.65/month standalone, plus traffic. Multiply by 50 VMs and you're paying $200+/month for IPs you don't need. Use a single NAT Gateway for outbound, or route everything through Azure Firewall. Centralize public exposure.

3. Azure Firewall pricing

Azure Firewall Standard is ~$1.25/hour ≈ $900/month, before traffic. Premium is more. For low-traffic dev/test environments, this is wasteful. For prod with real traffic and threat intel needs, it's actually competitive. Just don't put one in every spoke; that's why hub-and-spoke exists.
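The three gotchas above reduce to arithmetic worth putting in front of whoever owns the budget. All prices are the approximate figures quoted above, not current Azure list prices:

```python
# 1. Cross-region traffic: ~$0.02/GB.
tb_per_day = 5
gb_per_month = tb_per_day * 1000 * 30      # ~150,000 GB/month
cross_region = gb_per_month * 0.02
print(round(cross_region))                 # 3000 ($/month)

# 2. Public IPs: ~$3.65/month each, base charge before any traffic.
public_ips = 50 * 3.65
print(round(public_ips, 2))                # 182.5 ($/month)

# 3. Azure Firewall Standard: ~$1.25/hour, before data processing.
firewall = 1.25 * 24 * 30
print(round(firewall))                     # 900 ($/month)
```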

Hybrid DNS — the worst Azure topic

DNS in hybrid Azure is genuinely the most painful surface. You have:

  • Azure-provided DNS (the default, can't be customized)
  • Private DNS zones (for resolving private endpoints)
  • On-prem DNS (your AD)
  • Azure DNS Private Resolver (newish, the right answer)

The pattern that works: Azure DNS Private Resolver in the hub VNet, with forwarding configured in both directions. On-prem DNS resolves Azure-private records via the resolver. Azure resources resolve on-prem AD records via the resolver. Both directions work, no DNS forwarder VMs to maintain. It's not free (~$200/month) but it's worth it.
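Conceptually, the resolver's job is just suffix-based routing of queries. A sketch of that decision logic (zone names are illustrative — corp.example.com stands in for your on-prem AD domain; this models the behaviour, it is not the resolver's API):

```python
# Forwarding rules: longest matching suffix decides where a query goes.
FORWARDING_RULES = {
    "corp.example.com.":                  "on-prem DNS",
    "privatelink.blob.core.windows.net.": "Azure Private DNS zone",
}

def resolver_target(qname):
    """Route a query by longest matching suffix; default to Azure DNS."""
    qname = qname.rstrip(".") + "."
    candidates = [s for s in FORWARDING_RULES if qname.endswith(s)]
    if not candidates:
        return "Azure-provided DNS"
    return FORWARDING_RULES[max(candidates, key=len)]

print(resolver_target("dc01.corp.example.com"))                        # on-prem DNS
print(resolver_target("mystorage.privatelink.blob.core.windows.net"))  # Azure Private DNS zone
print(resolver_target("example.org"))                                  # Azure-provided DNS
```

The same table drives both directions: on-prem forwarders send privatelink zones to the resolver's inbound endpoint, and the resolver's outbound rules send the AD domain back on-prem.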

Things to do on day one of a new Azure deployment

  1. Build with Terraform. Manual VNet config in the portal is a waste of time and impossible to roll back cleanly.
  2. Enable NSG flow logs to a Storage account. Day-zero habit. When something breaks at 2am you'll thank yourself.
  3. Tag everything with at least: environment, owner, cost-center. Cost reports become useful instead of opaque.
  4. Lock the resource group for production. Read-only lock at minimum. Prevents the Tuesday morning "oh god I deleted the wrong VNet" incident.
  5. Separate subscriptions per environment. Prod, non-prod, sandbox. Blast radius and budget controls are per-subscription.
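Item 3 only pays off if it's enforced, and enforcement is a small script away. A sketch of a tag-compliance check (the resource list is a made-up example; in practice you'd feed this from the Azure SDK, Azure Policy, or a Terraform plan):

```python
# Tags every resource must carry, per the day-one list above.
REQUIRED_TAGS = {"environment", "owner", "cost-center"}

# Hypothetical inventory — stand-in for real resource data.
resources = [
    {"name": "hub-vnet", "tags": {"environment": "prod", "owner": "netops", "cost-center": "1234"}},
    {"name": "dev-vnet", "tags": {"environment": "dev"}},
]

def untagged(resources):
    """Return (name, missing-tags) for every non-compliant resource."""
    return [
        (r["name"], sorted(REQUIRED_TAGS - set(r["tags"])))
        for r in resources
        if not REQUIRED_TAGS <= set(r["tags"])
    ]

print(untagged(resources))  # [('dev-vnet', ['cost-center', 'owner'])]
```

Run it in CI and fail the build on a non-empty result; cost reports stay useful because nothing untagged ever ships.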

What I'd combine with this

If you're building this topology, you'll inevitably also need:


These notes come from real Azure deployments at Camping World, JLL, and side projects. Got an Azure design question? Email me — I genuinely enjoy these conversations.