Cloud infrastructures and virtual datacenters are great. Nobody can deny it. It is amazing to start a whole infrastructure with a couple of clicks and throw it away if you don’t like it. Cool, isn’t it? Such technologies require a completely new armory of protocols, overlay technologies, security measures and many other things that we, the networking guys, must take care of to provide good, reliable, fast, redundant, secure and stable networks to the guys on the upper floors. One of the new solutions released in the last few years is VXLAN.
The idea of Virtual eXtensible LANs is to connect two physically separated networks using the same IP subnet and, if needed, the same VLAN tag on both sites. Apart from that, VXLAN offers more than 16 million segment IDs by using a 24-bit identifier (the VNI), overcoming the lack of flexibility in 802.1Q, whose 12-bit VLAN ID field allows only 4096 different values. This makes VXLAN an excellent solution for virtualized multi-tenant environments and a very scalable overlay technology for virtual datacenter (vDC) deployments. VXLAN can also use multicast to discover other VTEPs in the same network (very useful when running a virtual distributed switch, like the vSphere Distributed Switch from VMware or the Nexus 1000v from Cisco, on multiple physical servers).
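To put numbers on the 12-bit versus 24-bit comparison, here is a tiny Python sketch. The constant and function names are mine, chosen only for illustration.

```python
# Width of the identifier field in each technology, in bits.
DOT1Q_VLAN_ID_BITS = 12   # 802.1Q VLAN ID
VXLAN_VNI_BITS = 24       # VXLAN Network Identifier (VNI)


def id_space(bits: int) -> int:
    """Number of distinct values an unsigned field of `bits` bits can hold."""
    return 2 ** bits


vlan_ids = id_space(DOT1Q_VLAN_ID_BITS)
vnis = id_space(VXLAN_VNI_BITS)

print(f"802.1Q VLAN IDs: {vlan_ids}")           # 4096
print(f"VXLAN VNIs:      {vnis}")                # 16777216
print(f"Ratio:           {vnis // vlan_ids}x")   # 4096x
```

In other words, every single 802.1Q VLAN ID could be matched by 4096 VXLAN segments.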
To understand the basics of VXLAN we’ll use the following topology:
Two (or more) separated virtual machines can work in the same subnet on both ends by using VXLAN as overlay transport even when those L2 domains are separated by many L3 devices. So, how does it work? It is pure network sorcery 😉
1) When VM1 sends an IP packet destined to 172.23.153.20 (VM2), this packet is encapsulated into a normal Ethernet frame (assume no VLAN tagging to simplify the example) with the following parameters and sent to its VTEP (VXLAN Tunnel EndPoint), VTEP-1 in our case:
Destination IP : 172.23.153.20 (VM2)
Source IP : 172.23.153.10 (VM1)
Destination MAC : 00:BB:AA:00:00:12 (VM2)
Source MAC : 00:AA:AA:00:00:11 (VM1)
We will call this frame the “inner Ethernet frame” and we will assume that the IP packet carries SSH data between both VMs (SSH data + TCP header). In this example we transport 92 bytes of encrypted SSH data plus a 32-byte TCP header (don’t forget that the minimum size of a TCP header is 20 bytes and the maximum is 60), which makes a 124-byte TCP segment. The IP header adds 20 bytes more, so we have a 144-byte IP packet which is encapsulated into an Ethernet frame and travels from VM1 to its next stop (VTEP-1):
This inner Ethernet frame has a 14-byte header (preamble not counted: Destination MAC 6 bytes, Source MAC 6 bytes, Type/Length 2 bytes) plus a normal 4-byte FCS trailer.
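The byte counting for the inner frame can be double-checked with a few lines of Python. This is just the arithmetic from the text, with variable names of my own; the 20-byte IPv4 header assumes no IP options.

```python
# Sizes in bytes, taken from the example above.
SSH_DATA = 92
TCP_HEADER = 32        # 20 bytes minimum, 60 maximum; 32 in this example
IPV4_HEADER = 20       # assumes an IPv4 header without options
ETHERNET_HEADER = 14   # dst MAC (6) + src MAC (6) + Type/Length (2)
ETHERNET_FCS = 4

tcp_segment = SSH_DATA + TCP_HEADER
ip_packet = tcp_segment + IPV4_HEADER
inner_frame = ETHERNET_HEADER + ip_packet + ETHERNET_FCS

print(f"TCP segment: {tcp_segment} bytes")   # 124
print(f"IP packet:   {ip_packet} bytes")     # 144
print(f"Inner frame: {inner_frame} bytes")   # 162
```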
2) Once this frame arrives at VTEP-1, an 8-byte VXLAN header containing VNI (VXLAN Network Identifier) 5000 is added and the whole frame, except its FCS, is treated as payload inside a new UDP datagram which we will call “outer UDP”. The IANA-assigned UDP port for VXLAN is 4789; some early implementations, including the Linux kernel by default, use 8472 instead. A new IPv4 header (Outer IPv4) is added to this UDP datagram, which is then encapsulated into a new Outer Ethernet frame.
This frame travels from the VTEP-1 to the next router in the IP backbone and the packet inside is treated as any normal IP packet because it is routed based on the destination IP address (VTEP-2).
This outer frame has the following parameters:
Destination MAC : 00:CC:CC:12:34:56 (VTEP-1’s default gateway)
Source MAC : 00:DD:DD:22:22:22 (VTEP-1)
The outer IP packet, used to reach VTEP-2, contains:
Destination IP : 18.104.22.168 (VTEP-2)
Source IP : 22.214.171.124 (VTEP-1)
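To make step 2 more concrete, the sketch below builds the 8-byte VXLAN header for VNI 5000 and adds up the sizes of the outer encapsulation. The header layout follows RFC 7348 (8 flag bits with the "VNI valid" bit set, 24 reserved bits, the 24-bit VNI, 8 reserved bits). The function and constant names are mine, and the outer IPv4 header is assumed to carry no options.

```python
import struct

VXLAN_FLAG_VNI_VALID = 0x08  # the "I" flag: the VNI field is valid


def build_vxlan_header(vni: int) -> bytes:
    """Pack the 8-byte VXLAN header: flags(8) | reserved(24) | VNI(24) | reserved(8)."""
    if not 0 <= vni < 2 ** 24:
        raise ValueError("VNI must fit in 24 bits")
    return struct.pack("!II", VXLAN_FLAG_VNI_VALID << 24, vni << 8)


vxlan_header = build_vxlan_header(5000)
print(vxlan_header.hex())  # 0800000000138800

# Sizes in bytes for the example frame.
INNER_FRAME_WITH_FCS = 162
FCS = 4
UDP_HEADER = 8
OUTER_IPV4_HEADER = 20   # assumes no IP options
OUTER_ETH_HEADER = 14

inner_frame_no_fcs = INNER_FRAME_WITH_FCS - FCS
outer_udp_datagram = UDP_HEADER + len(vxlan_header) + inner_frame_no_fcs
outer_ip_packet = OUTER_IPV4_HEADER + outer_udp_datagram
outer_frame = OUTER_ETH_HEADER + outer_ip_packet + FCS

print(f"Outer UDP datagram: {outer_udp_datagram} bytes")   # 174
print(f"Outer IP packet:    {outer_ip_packet} bytes")      # 194
print(f"Outer frame:        {outer_frame} bytes")          # 212
print(f"Overhead:           {outer_frame - INNER_FRAME_WITH_FCS} bytes")  # 50
```

Those 50 extra bytes are worth remembering when sizing the MTU of the IP backbone between the VTEPs.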
3) Just before arriving at VTEP-2 we can take a look at the outer IP packet and its encapsulating frame:
As you can see, nothing has changed in the content of our outer IPv4 packet (the routers along the path only decrement its TTL and update the header checksum). Only the outer source and destination MACs are different: they now belong to the last router and VTEP-2.
4) VTEP-2 receives the outer frame and decapsulates it: it strips the outer Ethernet and IPv4 headers, removes the UDP header and checks the VNI in the VXLAN header (5000). Its internal configuration indicates that VNI 5000 is only allowed to connect to VM2. The inner Ethernet frame is then delivered to VM2 as if both VMs were connected to the same physical LAN.
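The check done by VTEP-2 can be sketched roughly as follows. This is a simplified illustration, not how any particular VTEP implements it: the `LOCAL_MACS_BY_VNI` table and the `decapsulate` function are hypothetical, the input is the UDP payload (outer Ethernet, IPv4 and UDP headers already removed), and broadcast or unknown-unicast handling is left out.

```python
import struct
from typing import Optional

VXLAN_HEADER_LEN = 8
ETHERNET_HEADER_LEN = 14
VXLAN_FLAG_VNI_VALID = 0x08

# Hypothetical VTEP-2 configuration: local MACs reachable behind each VNI.
LOCAL_MACS_BY_VNI = {5000: {"00:BB:AA:00:00:12"}}  # VM2


def format_mac(raw: bytes) -> str:
    """Render 6 raw bytes as an upper-case, colon-separated MAC address."""
    return ":".join(f"{b:02X}" for b in raw)


def decapsulate(udp_payload: bytes) -> Optional[bytes]:
    """Return the inner Ethernet frame if its VNI and destination MAC are allowed."""
    if len(udp_payload) < VXLAN_HEADER_LEN + ETHERNET_HEADER_LEN:
        return None  # too short for a VXLAN header plus an Ethernet header
    word1, word2 = struct.unpack("!II", udp_payload[:VXLAN_HEADER_LEN])
    if not (word1 >> 24) & VXLAN_FLAG_VNI_VALID:
        return None  # the VNI field is not marked as valid
    vni = word2 >> 8
    inner_frame = udp_payload[VXLAN_HEADER_LEN:]
    dst_mac = format_mac(inner_frame[:6])
    if dst_mac not in LOCAL_MACS_BY_VNI.get(vni, set()):
        return None  # this VNI may not reach that destination here
    return inner_frame


# Inner frame from step 1: dst MAC (VM2), src MAC (VM1), EtherType IPv4,
# followed by 144 placeholder bytes standing in for the IP packet.
inner = (bytes.fromhex("00BBAA000012") + bytes.fromhex("00AAAA000011")
         + b"\x08\x00" + b"\x00" * 144)

payload_vni_5000 = struct.pack("!II", 0x08 << 24, 5000 << 8) + inner
payload_vni_6000 = struct.pack("!II", 0x08 << 24, 6000 << 8) + inner

print(decapsulate(payload_vni_5000) == inner)  # True: delivered to VM2
print(decapsulate(payload_vni_6000))           # None: VNI 6000 is not configured
```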
At this point, the content of the inner Ethernet frame is the same as in step 1:
This ability allows the infrastructure to handle networks for multiple tenants: millions of VNIs can be carried inside normal UDP packets, and IP addresses and MACs may overlap between tenants (not inside the same VNI, of course).
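One way to picture why overlapping MACs are harmless is a forwarding table keyed by the pair (VNI, MAC) instead of the MAC alone. The class below is a hypothetical sketch of that idea, not a real VTEP data structure; VNI 6000 and the name "VTEP-3" are made up for the example.

```python
from typing import Dict, Optional, Tuple


class VtepForwardingTable:
    """Maps (VNI, inner destination MAC) to the remote VTEP that owns it."""

    def __init__(self) -> None:
        self._entries: Dict[Tuple[int, str], str] = {}

    def learn(self, vni: int, mac: str, remote_vtep: str) -> None:
        """Record that `mac` inside segment `vni` lives behind `remote_vtep`."""
        self._entries[(vni, mac.upper())] = remote_vtep

    def lookup(self, vni: int, mac: str) -> Optional[str]:
        """Return the remote VTEP for this (VNI, MAC), or None if unknown."""
        return self._entries.get((vni, mac.upper()))


table = VtepForwardingTable()
# Two tenants using the same MAC address in different VNIs.
table.learn(5000, "00:BB:AA:00:00:12", "VTEP-2")
table.learn(6000, "00:BB:AA:00:00:12", "VTEP-3")

print(table.lookup(5000, "00:BB:AA:00:00:12"))  # VTEP-2
print(table.lookup(6000, "00:BB:AA:00:00:12"))  # VTEP-3
print(table.lookup(7000, "00:BB:AA:00:00:12"))  # None
```

Because the VNI is part of the key, the two tenants never collide even though their MACs are identical.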
Multi-tenancy network designs are critical to implement cloud solutions and VXLAN is becoming one of the preferred protocols to provide security and flexibility to customers. What is the difference with GRE? There are many, but the most obvious is that GRE requires 3 different IP subnets (site A, tunnel and site B) while VXLAN needs only one (plus the normal routing core subnets). Some may argue that QinQ would be enough, but sadly QinQ is only an L2 technology and nobody wants to build the whole network infrastructure on such a poor design. VXLAN is becoming the de facto standard for network virtualization and is here to stay for a long while.