Metro Ethernet and MPLS circuits are often the first things people think of when trying to connect disparate networks across long distances or public networks. Those transport technologies work great, but the venerable IPsec has a number of points in its favor:

  • It’s been around forever, so it’s very standardized.
  • It’s pre-built into most routers and security devices, so it’s cost-effective.
  • It typically runs over the Internet, so there are no providers or vendors to deal with (other than the administrators controlling the remote end). If you control both ends, you don’t have to work with anyone at all.

Even though IPsec is well established and available across numerous devices, most people don’t optimize their setup to maximize their throughput, decrease CPU processing, or streamline network configurations.

This is most likely a result of IPsec’s notoriously finicky nature, as connections between sites won’t establish without what appears to be a complex set of hashing algorithms, encryption methods, and configurations that only apply to certain parts of an IPsec session. Let’s fix that.

IPsec can operate in two modes: tunnel and transport.

By far the most common is tunneling, which we’ll focus on. Transport, however, is a better choice that’s often overlooked.

IPsec Transport Mode

Transport mode operates similarly to the way TLS/SSL works at layer 7: any communication between two endpoints using a particular layer 7 protocol is authenticated (using hashes) and/or encrypted (using encryption algorithms). In the case of the well-known HTTPS protocol, this is standard web traffic.

IPsec in transport mode operates almost exactly the same way. It encrypts traffic between two endpoints with one important advantage: it encrypts ANY layer 7 protocol, and that protocol doesn’t have to know anything about encryption. This is because IPsec operates at layer 3, a much lower level in the networking stack. The benefit of this is two-fold:

  • The layer 7 protocol doesn’t have to know anything about authentication and encryption for it to be utilized.
  • It will automatically function for any protocol, whether it’s expected or not.

For example, no one in their right mind would operate a telnet daemon on a public interface in this day and age. But establish an IPsec transport link between two points, limit the telnet daemon to only respond to traffic that arrived via that link, and the telnet traffic is authenticated and encrypted in transit. Everything between the two endpoints is encrypted, from mundane pings to SIP traffic. And while you may think, “Well that’s nice. Encryption between two points is great but useless for connecting two networks together like tunnel mode does,” you have to realize that you can layer on all sorts of protocols you’d normally never run over public IPs, such as IP-in-IP or Generic Routing Encapsulation (GRE) tunnels. GRE tunnels in particular are far easier to configure and scale out than using IPsec in tunnel mode to handle routing.

This was actually to be the basis for something called opportunistic encryption, which was originally going to be required in IPv6; yes, everything on IPv6 was going to be automatically encrypted, whether the layer 7 protocol supported it or not. Due to certain technical hurdles, it never made it into the final IPv6 RFC. That said, transport mode is not often used because, with certain vendors, it is either impossible to implement or the documentation is buried or nonexistent; if it’s available, it’s usually only configurable via command line interfaces, while tunnel mode is exposed in clicky user interfaces. That’s unfortunate, as it has a number of very interesting uses. Today’s focus will be on standard IPsec tunnel mode.

IPsec Tunnel Mode

Authenticated and encrypted tunnel mode is what most people think of when IPsec is discussed. In this mode, two endpoints sitting on the edge of two separate networks establish a secure link across a hostile network (usually the internet), and packets flow between the two networks via that link as if they were directly connected without the terrifying internet in-between.

Tunnel mode accomplishes this by accepting IP packets from the internal network with the original headers, encrypting the entire packet, then encapsulating it within another packet. A hash is generated to ensure authenticity when the packet arrives. The new packet carries the encrypted original, along with the hash, as its payload. This final packet is created using the Encapsulating Security Payload (ESP) protocol, also known as IP protocol 50. ESP operates at layer 4 of the networking stack, and is therefore on the same level as protocols such as TCP and UDP. This is important to understand, as packets between the two endpoints don’t use TCP, UDP, ICMP, or anything like that; ESP is its own beast.

This packaged packet is forwarded to the other endpoint over the public and/or hostile network. After it’s received, the encapsulated packet within is extracted, decrypted, authenticated for validity using the hash, then forwarded on as determined by the original packet’s IP headers and the device’s routing policies. The sending and receiving machines have no idea that the packet traversed the hostile network.
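The encapsulate-and-verify round trip described above can be sketched in a few lines of Python. This is a simplified illustration, not a real ESP implementation: the function names are hypothetical, the outer IP header, padding, and next-header fields are left out, and `toy_keystream_xor` is a deliberately insecure stand-in for a real cipher such as AES (the standard library has none). The 96-bit truncated HMAC-SHA1 mirrors a common ESP integrity setting. Note that the sketch checks the hash before decrypting, so tampered packets are rejected without any decryption work.

```python
import hashlib
import hmac
import struct

ICV_LEN = 12  # HMAC-SHA1 truncated to 96 bits


def toy_keystream_xor(key: bytes, seq: int, data: bytes) -> bytes:
    """Insecure stand-in for a real cipher (e.g. AES); illustration only."""
    stream = bytearray()
    counter = 0
    while len(stream) < len(data):
        stream.extend(hashlib.sha256(key + struct.pack("!II", seq, counter)).digest())
        counter += 1
    # XOR is its own inverse, so the same function encrypts and decrypts.
    return bytes(a ^ b for a, b in zip(data, stream))


def encapsulate(spi: int, seq: int, enc_key: bytes, auth_key: bytes,
                inner_packet: bytes) -> bytes:
    """Wrap an entire inner IP packet into an ESP-style payload."""
    header = struct.pack("!II", spi, seq)  # SPI and sequence number
    ciphertext = toy_keystream_xor(enc_key, seq, inner_packet)
    icv = hmac.new(auth_key, header + ciphertext, hashlib.sha1).digest()[:ICV_LEN]
    return header + ciphertext + icv


def decapsulate(enc_key: bytes, auth_key: bytes, esp_payload: bytes) -> bytes:
    """Verify the hash, then recover the original inner packet."""
    header = esp_payload[:8]
    ciphertext = esp_payload[8:-ICV_LEN]
    icv = esp_payload[-ICV_LEN:]
    expected = hmac.new(auth_key, header + ciphertext, hashlib.sha1).digest()[:ICV_LEN]
    if not hmac.compare_digest(icv, expected):
        raise ValueError("integrity check failed")
    _spi, seq = struct.unpack("!II", header)
    return toy_keystream_xor(enc_key, seq, ciphertext)
```

Calling `decapsulate` on the output of `encapsulate` with the same two keys returns the original bytes; flipping any bit in transit raises `ValueError` instead.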

Easy enough. But the devil’s in the details. So let’s break down how the link is formed and what we can do to streamline things.

IPsec Phases

IPsec links are established in two phases, appropriately referred to as phase 1 and phase 2. Once a phase is established and verified as authentic by both sides, the result is considered an active security association (SA).

Each phase has its own set of SAs, which use keys to both authenticate and encrypt packets. After a specified expiration limit (usually a time counter, such as 24 hours), the SAs are rotated out for new ones with new keys: the negotiated metadata is renewed, and the phases start up again. Though these are technically new tunnels, the process is mostly (we’ll go over this later) transparent to end users.
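As a rough mental model of that rotation, here is a minimal Python sketch of an SA with a time-based key life. The `SecurityAssociation` class and `rekey_if_needed` helper are hypothetical names for illustration; real implementations negotiate the replacement keys with the peer (and typically start doing so before the old SA expires) rather than generating them locally.

```python
import secrets
from dataclasses import dataclass


@dataclass
class SecurityAssociation:
    created_at: float  # when the SA was established, in seconds
    key_life: int      # lifetime in seconds, e.g. 86400 for 24 hours
    key: bytes         # keying material protected by this SA

    def expired(self, now: float) -> bool:
        return now - self.created_at >= self.key_life


def rekey_if_needed(sa: SecurityAssociation, now: float) -> SecurityAssociation:
    """Return the same SA while it is valid, or a fresh one with a new key."""
    if sa.expired(now):
        # Assumption: a random local key stands in for a renegotiated one.
        return SecurityAssociation(created_at=now, key_life=sa.key_life,
                                   key=secrets.token_bytes(16))
    return sa
```

With a key life of 86400 seconds, an SA created at time 0 is returned unchanged at 86399 and replaced at 86400.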

Phase 2 does the heavy lifting. All traffic is authenticated and encrypted based on the policies established in this phase.

Phase 1, on the other hand, is relatively quiet. Once it creates the secure channel for phase 2, not much data passes through the phase 1 policies. Because of this, it’s important for phase 2 to be focused on performance, as it’s constantly hashing/encrypting/decrypting, while phase 1 should be focused on security.

Once phase 2 is up, phase 1 is only used for metadata transfer during subsequent re-negotiations, which are extremely sporadic.

Phase 1

Let’s start with phase 1, aka the Internet Key Exchange (IKE) phase. When endpoint Alpha decides to use a tunnel to send a packet to endpoint Beta, it looks at its own configuration and sees:

Phase 1 Configuration

  • Hash Algorithm: HMAC-SHA1
  • Encryption Algorithm: AES256
  • Diffie-Hellman Key Exchange Group: 2
  • Key life: 86400 seconds (can also be specified as number of bytes)
  • Pre-shared key: k2;2.6TbYl+{/qa
  • Endpoint Alpha IP address:
  • Endpoint Beta IP address:
  • Mode: main

Alpha wants to bring the tunnel up, so it sends an IKE packet to Beta containing a set of acceptable hashing and encryption methods. In this case, Alpha offers SHA1/AES256.

IKE is just a layer 7 protocol, operating over UDP port 500. Beta, if configured to use those methods, will acknowledge the proposal. Alpha and Beta then exchange Diffie-Hellman public values and nonces using DH Group 2, which has a 1024-bit modulus; the pre-shared key is mixed into the keys derived from this exchange rather than being sent across the wire.
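The offer-and-accept step can be pictured as a simple matching problem. The sketch below is only an illustration: the `Proposal` type and `choose_proposal` function are hypothetical, and real IKE proposals carry more attributes (lifetimes and authentication method among them) in a binary payload format.

```python
from typing import NamedTuple, Optional, Sequence


class Proposal(NamedTuple):
    hash_alg: str
    enc_alg: str
    dh_group: int


def choose_proposal(offered: Sequence[Proposal],
                    acceptable: Sequence[Proposal]) -> Optional[Proposal]:
    """Return the first offered proposal the responder also accepts, else None."""
    for proposal in offered:
        if proposal in acceptable:
            return proposal
    return None


# Alpha's offer, taken from the phase 1 configuration above.
alpha_offer = [Proposal("HMAC-SHA1", "AES256", 2)]
# Beta's configuration; matching entries mean the proposal is acknowledged.
beta_config = [Proposal("HMAC-SHA1", "AES256", 2)]
agreed = choose_proposal(alpha_offer, beta_config)
```

If Beta were configured for, say, a different DH group only, `choose_proposal` would return `None`, which corresponds to the tunnel failing to come up. This exact-match behavior is a big part of why IPsec feels finicky.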

Diffie-Hellman Key Exchange

If you don’t know about Diffie-Hellman, it’s a very interesting, effective, yet simple way for two parties to agree on a shared key over an untrusted network. Despite a lot of hand waving recently, there is still no publicly known way to brute force DH Group 2 with ordinary computing resources, although published research suggests 1024-bit groups may be within reach of very well-funded attackers. See the key life parameter? The key is rotated every 24 hours, so breaking one exchange only exposes the traffic protected under that key. Keep in mind that an attacker can record traffic and keep working on it long after the key has rotated, so a short key life limits the damage rather than the time available. If you want to be more paranoid, you could lower the key life to something like every few hours, as well as bump up the DH Group to the next one up, which is 5 (1536 bits).
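For readers who have not seen the math, here is a toy Diffie-Hellman exchange in Python. The modulus `P` and generator `G` are assumptions chosen to keep the example short: `P` is a 127-bit prime, far too small for real use, whereas DH Group 2 uses a fixed 1024-bit prime. The function names are hypothetical, and real IKE feeds the shared secret, the nonces, and the pre-shared key through a key derivation step that is omitted here.

```python
import secrets

P = 2**127 - 1  # toy prime modulus; DH Group 2 uses a 1024-bit prime instead
G = 3           # toy generator


def make_keypair() -> tuple[int, int]:
    """Pick a random private exponent and compute the matching public value."""
    private = secrets.randbelow(P - 3) + 2  # uniform in [2, P - 2]
    public = pow(G, private, P)
    return private, public


def shared_secret(own_private: int, peer_public: int) -> int:
    """Combine our private exponent with the peer's public value."""
    return pow(peer_public, own_private, P)


# Alpha and Beta each generate a keypair and swap only the public values.
alpha_private, alpha_public = make_keypair()
beta_private, beta_public = make_keypair()

alpha_view = shared_secret(alpha_private, beta_public)
beta_view = shared_secret(beta_private, alpha_public)
```

Both `alpha_view` and `beta_view` end up as the same number, even though the private exponents never cross the network. An eavesdropper sees only `alpha_public` and `beta_public`, and recovering the secret from those is the discrete logarithm problem that the group size is meant to keep out of reach.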

Interestingly enough, certain vendors consider DH Group 14 (2048 bits) to be the minimum. The conspiratorial part of me thinks it may be because they’re just trying to sell beefier hardware, though more recent IETF guidance for IKEv2 (RFC 8247) also treats Group 14 as the baseline. I do recommend against using DH Group 1 (768 bits), as it’s right on the cusp of being brute forced. If you have to use it, keep key lives short; key lives in the IPsec world already tend to be short (24 hours seems to be the max people set their phase 1 key lives to), which limits how much traffic any one key protects.

Both endpoints complete the last part of the DH key exchange by sending nonces to each other. This time, however, they use the hashing and encryption algorithms that were previously agreed upon to secure the nonce in flight. This is a final test to verify that the endpoints are in agreement on phase 1 parameters. If it checks out, the phase 1 portion is considered successful, and you now have a phase 1 security association, or SA. Most devices will issue a log record indicating something to this effect. The SA used in this example will be valid for 86400 seconds, or 24 hours.

Phase 2

Phase 2 is a bit more straightforward. Using the newly created authenticated and encrypted channel established in phase 1, the following phase 2 parameters are confirmed between both endpoints, again, using IKE:

  • Hash Algorithm: HMAC-MD5
  • Encryption Algorithm: AES128
  • Key life: 43200 seconds
  • Perfect Forward Secrecy: disabled

That’s all well and good, but this brings up one of the main issues of IPsec parameters.

Why are we using these particular settings over another?

For encryption, we can use all sorts of methods, including DES, triple DES (3DES), Blowfish, Twofish, AES128, AES192, AES256, and so on. For hashing, we can use MD5, SHA1, or even none! And why are we using 24-hour and 12-hour key lifetimes?


