This document is a comprehensive, high-level overview of what IPsec is and how it works.
IPsec is a flexible network security framework that may be applied to a number of different use case scenarios. In spite of its ubiquity, IPsec has got to be one of the most confusing standards in IT Security.
IPsec is confusing because it is defined by a large group of intertwined standards, protocols, and frameworks. No single document explains IPsec down to every detail. There are literally hundreds of documents that describe portions of IPsec processes and functions (though the vast majority of them are outdated). The purpose of this document is to distill down relevant IPsec specifications into a simple overview, eloquently capturing the essence and most salient points of what IPsec is and how it works.
Contents
- IPSec Overview
- Why IPsec Matters
- IPSec Process Flow
- Implementation
- IPsec Modes
- Security Associations (SAs)
- IKE: Internet Key Exchange
- IPsec Cryptography
- Diagrams
IPSec Overview
Internet Protocol Security (IPSec) is a collection of cryptographic services and security protocols that protects communication between devices sending traffic through an untrusted network (e.g. the Internet). It is a security gateway model, providing an end-to-end security framework at the network layer.
Part of the IPv4 suite of open standards, IPsec is a staple of Virtual Private Networks (VPNs). Protecting and authenticating IP packet data flows between other IPSec-compliant products, it provides data confidentiality, integrity, and authentication between participating peers at the IP layer (layer 3 OSI model or internet layer).
IPsec functions only at the IP layer, making it irrelevant for Layer 2 use cases.
IPsec is the de-facto Internet Engineering Task Force (IETF) standard for network layer (i.e. OSI Layer 3) security. Its saving grace is its ability to provide encryption to multiple protocols via a single negotiation phase.
IPSec uses IP packet encapsulation protocols (AH, ESP) combined with various encryption standards to create a security protocol suite. Governance is then applied, using Security Associations negotiated between network peers. The end result is a suite of authenticated keys and agreed encryption and encapsulation methods that apply various privacy and security concepts to an IP packet. AH means Authentication Header. ESP means Encapsulating Security Payload.
Why IPsec Matters
Why is IPsec so important? Not only is it a required component of the IPv6 protocol standard, but IPsec and its components are referenced in over 60 Internet RFCs! The overall IPSec implementation is guided by RFC 4301, RFC 6040 (Explicit Congestion Notification or ECN), and RFC 7296 (IKEv2). It is mandated as part of the IPv6 structure in RFC 4301 and for IPv6-via-IPv4 tunnels by RFC 4891.
IPsec's value is widely touted because it enables the following features in IP packet transmissions:
- Data Confidentiality (encapsulation): Encrypts packets before transmission.
- Data Integrity: Authenticates packets. Alerts the receiver if the data has been altered during transmission.
- Authentication: The receiver can authenticate the source of the packets. Dependent on data integrity service.
- Anti-replay: The receiver can detect and reject replayed packets.
IPsec does not require these features. Rather, it dictates a framework for applying them. As you will see shortly, which features are utilized in any given connection depends on how IPsec is configured by the peer that originates the secure connection. Mode and Security Association constraints are critical to defining characteristics of any given secure connection.
IPsec Process Flow
IPsec governs how a secure exchange of data may take place between two (2) network peers. It provides a framework to establish a secure Layer 3 network connection between the network peers. IPsec is based on various established security protocols that provide secure data tunnel features and implementation. The IPsec structure coordinates the execution of multiple sets of independent processes. Melded together, they form a cohesive security framework.
- Authenticated keying
- IKEv2
- IP Data Encapsulation
- AH (Authentication Header)
- ESP (Encapsulating Security Payload)
- Security Architecture
- Mode
- Databases (Security Associations, Policies, and Peer Authorizations)
Here is a high-level overview of IPsec's process flow:
- Establish 1st (open) management tunnel between two (2) network peers
- Negotiate 1st Security Association via IKE
- Establish 2nd (secure) management tunnel between network peers per SA
- Negotiate 2nd Security Association via IKE
- Establish data (secure) tunnel between network peers per 2nd SA
- Transmit data back and forth
Implementation
IPSec acts at the network layer, protecting and authenticating IP packets between other IPSec-compliant products. IPSec enables:
- Data Confidentiality — Sender can encrypt packets before transmitting them across a network.
- Data Integrity — Receiver can authenticate packets to ensure their data has not been altered during transmission.
- Origin Authentication — If data integrity functions were utilized by the sender, the data receiver may authenticate the source of the IPSec packets.
- Anti-replay — Receiver may detect and reject repeated packets.
IPsec uses a security protocol called Internet Key Exchange (IKE) to negotiate and create symmetric keys. These keys are used by IPsec for encryption and decryption of the body of the packet and/or the data payload, depending on the connection mode. IPsec uses separate encryption ciphers (e.g. AES, 3DES, etc) versus integrity ciphers (e.g. SHA1, SHA2, etc), and AEAD ciphers that combine both functions into a single algorithm (e.g. AES-GCM).
IPsec defines the requirements of a given secure connection, including suites of security protocols that will be applied during the negotiation and implementation phases. These characteristics are defined in policies called Security Associations, are negotiated by IKE, and managed by two (2) security protocols during data exchanges called Authentication Header (AH) and Encapsulating Security Payload (ESP).
Modes
The first question to ask is which mode makes sense for a particular scenario. IPsec has two (2): Transport and Tunnel.
Transport Mode
In transport mode, only the payload of the IP packet is encrypted and/or authenticated. The IP header is neither modified nor encrypted, but is secured by a hash. Network Address Translation Traversal (NAT-T) works in transport mode only under some circumstances, and only if IKEv2 is utilized (which should be used anyway, because IKEv1 is obsolete). If NAT-T in transport mode is a priority for you, I suggest reading RFC 7296, section 2.23 (NAT Traversal) and section 2.23.1 (Transport Mode NAT Traversal) for a detailed discussion of the circumstances under which it can be done.
Tunnel Mode
In tunnel mode, the entire IP packet is encrypted and authenticated. The original IP packet gets encapsulated into a new IP packet with a new IP header. Tunnel mode is normally the default mode for network-to-network and device-to-network communication links. Remote user access, such as VPNs connecting employees to a corporate network or users connecting to third-party VPN service providers, ordinarily uses tunnel mode.
Tunnel mode does support NAT traversal (NAT-T), because the original packet is fully preserved.
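To make the difference between the two modes concrete, here is a minimal Python sketch that lists the order of headers on the wire for a TCP segment. The function name `ipsec_layout` and the field labels are illustrative assumptions, not part of any IPsec implementation.

```python
def ipsec_layout(mode, protocol="ESP"):
    """Return the header order for a TCP segment protected by IPsec.

    Illustrative only: real packets carry binary headers, not labels.
    """
    if protocol == "ESP":
        security_header = ["ESP header"]
        trailer = ["ESP trailer", "ESP authentication data"]
    elif protocol == "AH":
        security_header = ["AH header"]
        trailer = []
    else:
        raise ValueError("protocol must be 'AH' or 'ESP'")

    inner = ["TCP header", "data"]
    if mode == "transport":
        # The original IP header stays in front; only the payload is wrapped.
        return ["original IP header"] + security_header + inner + trailer
    if mode == "tunnel":
        # The whole original packet is wrapped behind a new IP header.
        return (["new IP header"] + security_header
                + ["original IP header"] + inner + trailer)
    raise ValueError("mode must be 'transport' or 'tunnel'")


print(ipsec_layout("transport"))
print(ipsec_layout("tunnel"))
```

In the tunnel-mode result the original IP header sits behind the security header, which is why it can be hidden from the untrusted network.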
Security Associations (SAs)
Here is when the process behind IPsec begins to get confusing.
IPsec is not truly a protocol. It is a framework that provides multiple options for performing network encryption and authentication. Fundamental to IPsec accomplishing its goals are Security Associations.
A Security Association (SA) establishes parameters and describes how a secure connection between two (2) remote network peers will be managed. It is a combination of security protocols (AH, ESP) and an authentication protocol (IKE). You can think of a Security Association (SA) as a blueprint and a contract. Security associations provide governance. They describe the relationship between two or more devices and how those devices will use security services to communicate.
Inside a Security Association
A Security Association is composed of three (3) parts:
- The Security Parameter Index (SPI)
- IP destination address
- IPsec protocol identifier(s): one or both of the AH and ESP protocol types
An SA may include attributes such as:
- Cryptographic algorithm and mode
- Traffic encryption key
- Parameters for the network data connection
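One way to picture the three identifying parts and the attached attributes is as a record stored in a lookup table keyed by (SPI, destination address, protocol). The sketch below is a simplified model under that assumption; the class names `SecurityAssociation` and `SaDatabase`, the field names, and the sample addresses are all hypothetical.

```python
from dataclasses import dataclass
from typing import Dict, Optional, Tuple


@dataclass(frozen=True)
class SecurityAssociation:
    # The three identifying parts of an SA
    spi: int            # Security Parameter Index
    dst_addr: str       # IP destination address
    protocol: str       # "AH" or "ESP"
    # Example attributes an SA may carry
    cipher: str
    key: bytes
    lifetime_seconds: int


class SaDatabase:
    """Toy table of SAs, indexed by (SPI, destination, protocol)."""

    def __init__(self) -> None:
        self._entries: Dict[Tuple[int, str, str], SecurityAssociation] = {}

    def add(self, sa: SecurityAssociation) -> None:
        self._entries[(sa.spi, sa.dst_addr, sa.protocol)] = sa

    def lookup(self, spi: int, dst_addr: str, protocol: str) -> Optional[SecurityAssociation]:
        return self._entries.get((spi, dst_addr, protocol))


sad = SaDatabase()
# One SA per direction, because each SA describes one-way traffic.
sad.add(SecurityAssociation(0x1001, "198.51.100.20", "ESP", "AES-GCM", b"k" * 16, 3600))
sad.add(SecurityAssociation(0x2002, "203.0.113.7", "ESP", "AES-GCM", b"j" * 16, 3600))
print(sad.lookup(0x1001, "198.51.100.20", "ESP"))
```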
Why SAs Are Important
The beginning of an IPsec session between two (2) peers requires a negotiation of the communication's security protocols. Security Associations are used to keep track of all the particulars concerning a given IPSec communication session, such as network encryption and authentication methods agreed to by both peers.
The SA term can be confusing at times, because SAs are not just used by IPSec. For instance, IKE uses SAs to describe security parameters between two (2) IKE device peers.
Negotiating Consensus (IPsec SA Exchange)
The SA identifies the secure communication parameters for a two-way peer-to-peer connection the sender is willing to accept.
Imagine you're seeking to establish an agreement between yourself and another party. You draft a written contract and send it to the other party, asking them to agree to it. You advise them they cannot make any modifications to it. If they want to agree to the contract, they must agree as-is. This is basically what happens between peers when one sends a SA to the other.
A security association is a one-way conversation. That sounds heavy-handed and unrealistic, doesn't it? Obviously, a negotiation process is needed between the peers as part of the process of validating the secure connection. How does that happen if the SA is broadcast as a one-way communication? SAs are handled this way to prevent a third party from altering the "terms" being negotiated.
Let me reiterate this point. The purpose of the Security Association is to establish a form of secure communication. Once that happens, its job is done. The SA ensures both peer devices agree to use the same methodology for the purpose of establishing the secure communication channel.
How can this work if the SA transmission is a one-way street? Any given peer may send and receive an unlimited number of SAs with the same peer or other peers. If a connection is terminated by either peer, the process must begin again if the peers wish to establish a new SA negotiation. SA credentials cannot be re-used.
The IPsec SA process uses a two-step negotiation process to make this work.
Phase One
The SA process begins by transmitting the SA details from one network peer to another via a one-way, non-encrypted logical channel between the two (2) peers. You can think of the SA as a contract negotiation between the peers on how information will be shared between them and will be protected during transit. The SA describes exactly how the secure communication will function, including a framework and protocols. It establishes all of the details by which the communication will proceed. This includes encryption algorithms, keys, mode (transport or tunnel), expirations/lifetime periods, etc. All pertinent details are contained in the SA. It is a set of instructions for creating a secure 2-way channel.
Why is this portion of the process (the beginning phase) carried out via an open communication channel (plain text/non-encrypted data)? Because at this point, an encrypted channel between the peers does not yet exist. Not to worry though; the system is designed with this weakness in mind. And that is why there is Phase 2.
Phase Two
You can think of a security association's first phase as the part where the SA is advertised with a remote peer. The second phase is all about solidifying the agreement between the peers. There are two (2) possible methods of arriving at a successful negotiation: 1) Manual configuration of both peer devices; or 2) Management by another process.
The manual method requires logistics similar to Pre-Shared Keys (PSKs). The security association must be agreed to ahead of time, and both network peer devices must be configured to abide by the same rules. This normally makes sense only when the same system administrator or organization has control over both devices and has chosen ahead of time how they will communicate with one another. The administrator can then hard-code the IPsec instance on each device to facilitate a quick pairing process and establish a secure connection. This scenario is uncommon.
When a process is needed to manage the SA negotiation, the IPsec standard specifies using the Internet Key Exchange (IKE) protocol to mediate the negotiation. Most of the time, Security Associations are negotiated via IKE.
IKE: Internet Key Exchange
Internet Key Exchange (IKE) is an authentication protocol. It's a little bit weird in the sense it is not just used by IPsec. IKE is both a framework and a stand-alone protocol. For example, IKE may be used to create a VPN (and in fact it's quite widely used for that purpose; especially mobile device VPNs). When it comes to IPsec, IKE is part of the IPsec framework and is used to negotiate and create symmetric keys for encryption and decryption of ESP and/or AH security associations.
So, what is the difference between a protocol and a framework? A protocol is a set of rules. A framework defines structure; a design, guide, or methodology that describes the implementation of one or more protocols.
IKE is a 2-phase hybrid security protocol that implements the Oakley and SKEME key exchange protocols inside the Internet Security Association Key Management Protocol (ISAKMP) framework. Confused yet? IKE is considered a hybrid protocol because it implements a framework (ISAKMP) with two (2) related security protocols (Oakley and SKEME).
There are two versions of IKE: IKEv1 and IKEv2. The current version is IKEv2.
Other publications (i.e. other than this website) sometimes refer to IKE synonymously with ISAKMP (Internet Security Association Key Management Protocol), which is technically incorrect and adds to the confusion. IKE and ISAKMP are not the same thing. IKE incorporates ISAKMP, but there is more to IKE than ISAKMP alone.
IKEv2
IKEv2 implements a portion of two (2) key exchange protocols - Oakley and SKEME - which are both part of another security protocol called the Internet Security Association Key Management Protocol (ISAKMP). Put another way, IKE is not just a protocol. It is also a security framework composed of an amalgamation of parts of three (3) other security protocols: ISAKMP, Oakley, and SKEME.1 IKE uses Security Associations to negotiate the application of these protocols with a network peer.
IKEv2 is defined by RFC 7296, with updates and related specifications in RFC 5282, RFC 5998, RFC 6989, RFC 7427, RFC 7670, and RFC 8247.
Like IPsec's SA process, IKE has two (2) phases:
- IKE Phase 1: An IKE Security Association (SA) is negotiated between the two (2) network peers and a secure tunnel is established
- IKE Phase 2: IKE negotiates a new SA (the Phase 2 IPsec SA) between the two (2) network peers
IKEv2 Phase 1
IKE's processes are divided into two (2) phases. Phase 1 creates a bi-directional tunnel for negotiation and management between the two (2) peers, is responsible for establishing keying material authentication between the peers, and sets up the basic keying material for Phase 2. After IKE Phase 1 authenticates the two (2) network peers, it negotiates the IKE Phase 2 SA parameters.
IKEv2 Phase 1's roles and responsibilities are:
- Authenticating IPSec peers
- Negotiating IKE SAs
- Encryption algorithm
- Hash algorithm (SHA)
- Authentication method (e.g. PSK, Kerberos, etc.)
- D-H material exchange for key generation
- Setting up a secure channel for negotiating IPSec SAs in Phase 2
- Defines authentication, encryption, and hashing types
- Defines Diffie-Hellman group and key lifetime
IKEv2 Phase 2
In Phase 2, IKE negotiates IPSec's Phase 2 SA parameters and sets up matching IPSec SAs in the peers (one for outbound communications and one for inbound). At the end of Phase 2, both peers hold a symmetric shared key for IPsec.
IKEv2 Phase 2's roles and responsibilities are:
- Creates two (2) uni-directional tunnels used for end-user packets encryption/decryption
- Negotiate IPsec security protocols
- ESP and/or AH
- Hash algorithm (SHA)
- Encryption algorithm (if required)
IKEv2 Phase 2 and PFS
If PFS (Perfect Forward Secrecy) is not utilized, in Phase 2 IKE re-uses the same cryptographic key agreement used in Phase 1. When PFS is selected, a new symmetric key is generated at the beginning of Phase 2.
IKE and IPsec Security Associations
Right about now you may be pondering if the 2-phase Security Association process for IPsec and the 2-phase IKE SA processes are related. Yes, they are. IPsec and IKE's SA processes are intertwined. IKE also uses SAs, and IPsec relies on IKE to initiate IPsec's SA Phase Two process. Their order of operation looks like this:
- IPsec SA Phase 1 negotiation
- IKE SA Phase 1 negotiation
- IKE SA Phase 2 negotiation
- IPsec SA Phase 2 negotiation
ISAKMP
Internet Security Association Key Management Protocol or ISAKMP is a framework that governs Security Association (SA) negotiations. It provides the framework for authentication and key exchange, but does not define the actual key exchange.
During IKE Phase 2, ISAKMP determines the network peers agreement on particular security policies for protecting the data flow. ISAKMP does not perform implementation. It defines the procedures (the agreement's characteristics) and how packets will be transformed during implementation. You can think of this part of the process as transforming Security Association policies into a procedure that will be followed by the sending and receiving implementation processes. ISAKMP creates step-by-step instructions for those processes to follow.
ISAKMP determines:
- Payload authentication process — AH, ESP, both, or none
- Payload encryption — will ESP be applied?
- IPSec mode (transport or tunnel?)
Oakley + SKEME Methodology
As noted above, Oakley and SKEME are used to handle key agreement negotiations. There are three (3) components in the Oakley and SKEME key determination protocol:
- Cookie exchange (optionally stateless)
- Diffie-Hellman half-key exchange (optional, but required for perfect forward secrecy)
- Authentication (optionally applying anonymity and confidentiality)
IKE Phase 1 has two (2) operating modes: Main and Aggressive. DO NOT USE Aggressive Mode for IPsec connections that traverse the Internet. IKE's aggressive mode transmits a hash of the pre-shared key in the clear (plain text), where it can be captured and attacked offline. This places the connection at substantial risk of compromise from a Man-in-the-Middle attack vector.2
Oakley
Oakley Key Determination (Oakley) is a security protocol. Oakley and the SKEME security protocol work together to define key exchange techniques.
Oakley is defined under RFC 2412. Some characteristics of Oakley are:
- Authenticated key exchanges with and without EAP.
- Up to 1024-bit Diffie-Hellman keys
- Permits Pre-Shared Keys (PSKs), Secure DNS Public Key Infrastructure (DNSSEC PKI), RSA based PKI certificates, and X.509 certificates
SKEME
SKEME is a secure key exchange protocol for key management over the Internet. The protocol supports key exchange based on public key, key distribution centers, or manual installation. SKEME may (optionally) provide perfect forward secrecy and allows for the negotiation of underlying cryptographic primitives. IKEv2 uses SKEME in conjunction with Oakley to define key exchange techniques.
SKEME provides anonymity and integrity.
IPsec Cryptography
Acceptable Security Association cryptography is rarely explained properly in articles and discussions of IPsec.
IPsec itself can use various ciphers and algorithms. It can pair a separate encryption cipher (e.g. AES, ChaCha20, 3DES) with a separate integrity algorithm (e.g. SHA1, SHA2), or it can use an AEAD cipher that combines the two into one (e.g. AES-GCM). However, beyond this distinction of cipher types and purposes, IPsec is not governed by its own set of security algorithms. Rather, the components that make up IPsec's functions dictate which ciphers are acceptable. For instance, IKE, ESP, and AH all bring their own sets of cryptographic constraints to the table. The net result is that if a device is incapable of supporting the encryption methods required by IPsec's underlying protocols, then IPsec will not be an option for that device.
Some recent IETF RFCs narrow down the list of appropriate ciphers, relative to IPsec. For example, RFC 8221 was released in October 2017 and replaced RFC 7321 (2014), which in turn replaced RFC 4835 (2007), which in turn replaced RFC 4305 (2005), which in turn.... See where I'm going with this? Cryptography is evolutionary.
Authentication Header (AH)
The Authentication Header security protocol (AH) provides authentication, integrity, and anti-replay for the entire packet (yes, both the IP header and the data carried in the packet). AH signs the entire packet with a keyed, one-way hash function. This process creates a result (called a digest) that is sent to the receiver along with the IP packet. If any part of the datagram is changed during transit, it will be detected by the receiver when it performs the same one-way hash function on the datagram and compares the value of the message digest that the sender has supplied. The one-way hash also involves the use of a secret shared between the two systems, which means that authenticity can be guaranteed (integrity).
AH does not encrypt the data, so it does not provide confidentiality. You can read the data, but you cannot modify it. AH uses HMAC algorithms to sign the packet.
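AH can also enforce anti-replay protection. The sender places a monotonically increasing sequence number in the AH header, and the receiver rejects any packet whose sequence number it has already seen or that falls too far behind its sliding receive window.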
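Anti-replay in the AH and ESP specifications (RFC 4302 and RFC 4303) is built on a per-SA sequence number checked against a sliding window at the receiver. The following Python sketch shows the general idea; the class name `ReplayWindow`, the default window size, and the omission of the integrity check that a real receiver performs before updating the window are simplifying assumptions.

```python
class ReplayWindow:
    """Simplified sliding-window replay check for one inbound SA."""

    def __init__(self, size=64):
        self.size = size
        self.highest = 0   # highest sequence number accepted so far
        self.bitmap = 0    # bit i set means (highest - i) was already seen

    def check_and_update(self, seq):
        """Return True if the packet is fresh, False if it must be dropped."""
        if seq == 0:
            return False                      # sequence numbers start at 1
        if seq > self.highest:
            # New highest number: slide the window forward.
            shift = seq - self.highest
            mask = (1 << self.size) - 1
            self.bitmap = ((self.bitmap << shift) | 1) & mask
            self.highest = seq
            return True
        offset = self.highest - seq
        if offset >= self.size:
            return False                      # too old, left of the window
        if self.bitmap & (1 << offset):
            return False                      # already seen: a replay
        self.bitmap |= 1 << offset            # late but fresh packet
        return True


window = ReplayWindow(size=4)
for seq in (1, 1, 3, 2, 2, 10, 6, 7):
    print(seq, window.check_and_update(seq))
```

Out-of-order packets inside the window are accepted once; duplicates and packets older than the window are rejected.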
NAT Not Available with AH
It's worth noting AH and NAT functions are incompatible. NAT changes to an AH-protected packet will change the source IP address, which changes the AH header, causing the packets to be rejected by the IPSec peer. Modifying any IP address in the header leads to a different hash computation on the receiving end, indicating the packet has been tampered with, and resulting in IPsec rejecting it.
NAT Traversal (NAT-T)
AH breaks Network Address Translation Traversal (NAT-T) function as well, because the IP addresses in the header cannot be modified without making it appear the packet has been modified in transit (such as during a Man-in-the-Middle attack). IP address changes will be detected due to the fact AH hashes the IP header. If you must have NAT-T, to get around this problem use ESP instead.
AH Process Flow
AH works as follows:
- The IP header and data payload are hashed.
- The hash is used to build a new AH header, which is appended to the original packet.
- The new packet is transmitted to the IPSec peer.
- The peer router hashes the IP header and data payload, extracts the transmitted hash from the AH header, and compares the two hashes. They must match exactly.
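The steps above can be sketched with Python's standard `hmac` module. This is a simplified illustration, not a real AH implementation: the function names, the shared key, and the text stand-ins for the IP header are assumptions, and real AH zeroes out header fields that legitimately change in transit before hashing.

```python
import hashlib
import hmac


def ah_icv(shared_key, ip_header, payload):
    """Keyed one-way hash over the IP header and the data payload."""
    return hmac.new(shared_key, ip_header + payload, hashlib.sha256).digest()


def ah_verify(shared_key, ip_header, payload, received_icv):
    """Receiver side: recompute the hash and compare it with the sender's."""
    expected = ah_icv(shared_key, ip_header, payload)
    return hmac.compare_digest(expected, received_icv)


key = b"hypothetical-shared-secret"
header = b"src=10.0.0.5;dst=198.51.100.20"       # stand-in for a real IP header
payload = b"application data"

icv = ah_icv(key, header, payload)                # sender builds the AH value
print(ah_verify(key, header, payload, icv))       # untouched packet

# A NAT device rewrites the source address in transit.
natted_header = b"src=203.0.113.7;dst=198.51.100.20"
print(ah_verify(key, natted_header, payload, icv))
```

The second check fails because the rewritten source address changes the input to the hash, which is the reason AH and NAT do not mix.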

AH in Transport Mode
In transport mode, AH protects the external IP header along with the data payload. AH protects all the fields in the header that do not change in transit. The AH hash is stored after the IP header and before the ESP header (if present), and other higher-layer protocols.

AH in Tunnel Mode
In AH tunnel mode, the entire original header is authenticated and a new IP header is prepended. Just as in transport mode, AH protects the entire packet, including the new IP header.
Encapsulating Security Payload (ESP)
Encapsulating Security Payload (ESP) is a security protocol that encrypts the upper-layer protocols in transport mode and the entire original IP packet in tunnel mode, so that neither is readable while the datagram is in transit. ESP can also provide authentication for the packet.
ESP provides:
- Confidentiality (encryption)
- Data origin authentication
- Data integrity (data received has not been tampered with)
- Limited traffic flow confidentiality (it hides the underlying type of IP traffic) by defeating traffic flow analysis
- Anti-replay service (the packet is not a repeat) [optional feature]
Here are some tips to better understand when ESP is useful (or not):
- Confidentiality (encryption) may be selected independently of all other services
- Encryption is performed at the IP packet layer
- In IPsec Tunneling mode, the entire packet is protected
- In IPsec Transport mode, ESP only protects the data
- Integrity is not provided for the IP header (addressing)
ESP Encryption/Authentication Process Flow
When both authentication and encryption are selected, encryption is performed before authentication: the sender encrypts first, then calculates the authentication value over the encrypted result and places it in ESP's optional authentication field. The current IPSec standard specifies HMAC-SHA-1 as the minimum (default) algorithm for ESP authentication.
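The encrypt-first, authenticate-second ordering can be illustrated with a short Python sketch. Python's standard library has no AES, so the sketch uses a toy XOR keystream built from SHA-256 as a stand-in for a real cipher; that cipher, the function names, the 8-byte nonce, and the omission of ESP padding and the next-header field are all assumptions made to keep the example self-contained. Do not use it to protect real traffic.

```python
import hashlib
import hmac

ICV_LEN = 32  # length of an HMAC-SHA-256 digest


def _toy_keystream(key, nonce, length):
    """Toy keystream (NOT a real cipher), used only to show the ordering."""
    out = b""
    counter = 0
    while len(out) < length:
        out += hashlib.sha256(key + nonce + counter.to_bytes(4, "big")).digest()
        counter += 1
    return out[:length]


def _toy_crypt(key, nonce, data):
    stream = _toy_keystream(key, nonce, len(data))
    return bytes(a ^ b for a, b in zip(data, stream))


def esp_protect(enc_key, auth_key, spi, seq, nonce, payload):
    header = spi.to_bytes(4, "big") + seq.to_bytes(4, "big")
    ciphertext = _toy_crypt(enc_key, nonce, payload)            # step 1: encrypt
    body = header + nonce + ciphertext
    icv = hmac.new(auth_key, body, hashlib.sha256).digest()     # step 2: authenticate
    return body + icv


def esp_unprotect(enc_key, auth_key, packet, nonce_len=8):
    body, icv = packet[:-ICV_LEN], packet[-ICV_LEN:]
    expected = hmac.new(auth_key, body, hashlib.sha256).digest()
    if not hmac.compare_digest(expected, icv):                  # verify before decrypting
        raise ValueError("authentication failed")
    nonce = body[8:8 + nonce_len]
    return _toy_crypt(enc_key, nonce, body[8 + nonce_len:])


packet = esp_protect(b"e" * 16, b"a" * 16, 0x1001, 1, b"\x00" * 8, b"hello")
print(esp_unprotect(b"e" * 16, b"a" * 16, packet))
```

Because the authentication value covers the ciphertext, the receiver can discard a tampered packet without spending any effort on decryption.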
ESP in Transport Mode
In transport mode, the IP payload (data) is encrypted and the original headers are left intact. The ESP header is inserted after the IP header and before the upper-layer protocol header. The upper-layer protocols (e.g. TCP) are encrypted and authenticated along with the ESP header. ESP does not authenticate the IP header itself.

ESP in Tunnel Mode
When ESP is used in tunnel mode, the original IP header is protected because the entire original IP datagram is encrypted. The ESP authentication mechanism includes the original IP datagram and the ESP header; however, the new IP header is not included in the authentication hash.

AH vs. ESP: Which One Should I Use?
Deciding whether to use AH or ESP in a given situation might seem complex, but it can be simplified to a few rules.
- If you must have Integrity, but Confidentiality is not required, use AH.
- If you must have Confidentiality (encryption; privacy), use ESP.
- If you require NAT-Traversal, use ESP (normally in Tunnel mode) and do not use AH.
- Either will suffice if you desire anti-replay protection (ensuring packets are not repeated in transit).
AH's strengths are authentication and integrity, but AH cannot offer confidentiality (data encryption).
ESP, on the other hand, always provides confidentiality (data encryption), but other features are optional.
Remember, you may use both protocols at the same time.
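The rules of thumb above can be written down as a small decision helper. This Python sketch only encodes the guidance given in this section; the function name `choose_protocols` and its parameters are hypothetical.

```python
def choose_protocols(confidentiality, nat_traversal, full_packet_auth=False):
    """Return the set of IPsec protocols suggested by the rules of thumb."""
    if nat_traversal:
        # AH hashes the IP header, so address rewriting by NAT breaks it.
        return {"ESP"}
    if confidentiality and full_packet_auth:
        return {"AH", "ESP"}      # both protocols may be combined
    if confidentiality:
        return {"ESP"}
    return {"AH"}                 # integrity without encryption


print(choose_protocols(confidentiality=False, nat_traversal=False))
print(choose_protocols(confidentiality=True, nat_traversal=True))
```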
Authentication (AH vs. ESP)
The main difference between the authentication provided by ESP and that provided by AH is their scope of coverage. Specifically, ESP only protects IP headers when those fields are encapsulated by ESP and IPsec is operating in tunnel mode. AH always authenticates the entire packet.
Applying Both AH and ESP Together
A number of protective benefits are gleaned by combining AH and ESP together to form a whole that is greater than the sum of its parts. Naturally, this comes at a cost: additional processing time. However, if confidentiality and/or especially authentication are premier concerns, it may be worth the additional resources.
The Case for AH

AH excels at authentication. The AH authentication hash safeguards the entire packet, including itself.
Meanwhile, ESP's authentication hash is not quite as clever. It only protects the data ESP has encapsulated between its header and tail (inclusive). ESP does not incorporate its authentication hash into itself, while AH does. This limits the portion of the packet to which authentication is applied. Of course, there are benefits to ESP's method as they serve different purposes.
The Case for ESP

Notice how ESP's authentication process only examines the data inclusive of ESP's header and tail. This opens up the possibility of using NAT Traversal (NAT-T), whereas any type of NAT will cause the packet to be rejected if AH is used. Also, AH has no confidentiality (encryption) function. So, if the payload must be encrypted and NAT is required, it's clear AH cannot be used. If authentication is crucial, NAT is not needed, and confidentiality is less important, then AH alone may be a viable choice.
The Case for Both
What if you need full authentication and confidentiality (encryption)? Thankfully, as long as you don't require NAT-T, they may be used together to gain the best of both worlds. When that happens, the whole packet is still authenticated under AH, and the data payload enjoys an independent layer of authentication as well, via ESP.

NAT-T Via UDP Header
Network Address Translation Traversal (NAT-T) is possible by wrapping the ESP packet within UDP and using IPsec tunneling mode. This results in a UDP header inserted between the encapsulating IP header and the ESP header, as shown below.

This technique is codified in RFC 3948 (2005).
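As a rough illustration of the byte layout, the Python sketch below places an 8-byte UDP header in front of an ESP packet. RFC 3948 uses UDP port 4500 and allows the IPv4 UDP checksum to be sent as zero; the function names and the sample SPI and payload are assumptions for illustration only.

```python
import struct

NAT_T_PORT = 4500  # UDP port used for UDP-encapsulated ESP


def build_esp_header(spi, seq):
    """ESP header: 32-bit SPI followed by a 32-bit sequence number."""
    return struct.pack("!II", spi, seq)


def udp_encapsulate_esp(esp_packet, src_port=NAT_T_PORT, dst_port=NAT_T_PORT):
    """Prefix an ESP packet with a UDP header (checksum left at zero)."""
    length = 8 + len(esp_packet)          # UDP length covers header + data
    udp_header = struct.pack("!HHHH", src_port, dst_port, length, 0)
    return udp_header + esp_packet


esp_packet = build_esp_header(0x1001, 1) + b"<encrypted payload>"
datagram = udp_encapsulate_esp(esp_packet)
print(struct.unpack("!HHHH", datagram[:8]))
```

A NAT device can rewrite the outer IP address and the UDP ports freely, because ESP's authentication does not cover them.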
Describing how NAT-T works and how to implement it is beyond the scope of this document. If you need help with that, I recommend you review RFC 3715 and/or Encapsulating IPsec ESP in UDP for Load-balancing.
Cryptographic Algorithms for AH and ESP
RFC 8221 (October 2017) updated the cryptographic algorithms defined for use with ESP (Encapsulating Security Payload) and AH (Authentication Header) in IP datagrams. IPsec as a whole is not singled out by RFC 8221, but some of its components are. Among other changes, MD5 and DES were finally removed from the list of acceptable algorithms.
The encryption standards for IPsec change from time to time, whenever the protocols it depends upon change their standards. Currently, the following algorithms are acceptable for encryption in Security Associations (AH and ESP):4
- AES (default)
- ChaCha20
- Poly1305
The current supported encryption options for AH and ESP also approved for IPsec and IKE are:
- HMAC-SHA1/SHA2 for integrity protection and authenticity
- AES-CBC or TripleDES-CBC (3DES-CBC) for confidentiality (encryption)5
- AES-GCM provides confidentiality and authentication
- ChaCha20 with Poly1305 for confidentiality and authentication
- *Note: ChaCha20 + Poly1305 when implemented together provide confidentiality, authentication, and integrity 6
As I've said before, IPsec is a confusing standard. ChaCha20 and Poly1305 are not mentioned in RFC 8221. However, these AEAD algorithms are explicitly mentioned in RFC 7634 (yet another RFC related to IPsec and IKE). The vast quantity of IETF RFC documents pertaining to IPsec is daunting, and one must practically embark on a treasure hunt to discover them all.
Encryption (ESP) + Authentication (AH and ESP)
Several RFCs set cryptographic standards for portions of SAs, such as AH (Authentication Header) and ESP (Encapsulating Security Payload) based functions (relating to the IP header and data payload, respectively). As an example, here's a summary of what RFC 8221 specifies regarding encryption for ESP:
- Encryption must be authenticated (i.e. can't have peanut butter without your jelly)
- You have three (3) options for encryption + authentication
- ESP + Authenticated Encryption with Associated Data (AEAD) cipher3
- ESP with a non-AEAD cipher + Authentication (via AH)
- ESP with a non-AEAD cipher + Authentication (via method other than AH)
We can distill this down a bit further to make the concepts clearer from an implementation standpoint:
- ESP + some acceptable encryption method that also performs authentication
- ESP + separate encryption method + an authentication method
- ESP + AH where AH provides authentication
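The three options boil down to one rule: encryption must always be accompanied by authentication. Here is a minimal Python sketch of that check; the function name `esp_proposal_is_acceptable`, its parameters, and the cipher labels are hypothetical and do not correspond to any real configuration syntax.

```python
# AEAD ciphers named in this document; they authenticate as they encrypt.
AEAD_CIPHERS = {"AES-GCM", "ChaCha20-Poly1305"}


def esp_proposal_is_acceptable(encryption, integrity=None, ah_present=False):
    """Return True if the encryption would be authenticated one way or another."""
    if encryption in AEAD_CIPHERS:
        return True                       # option 1: combined-mode cipher
    if integrity is not None:
        return True                       # option 2: separate integrity algorithm
    return ah_present                     # option 3: AH supplies authentication


print(esp_proposal_is_acceptable("AES-GCM"))
print(esp_proposal_is_acceptable("AES-CBC", integrity="HMAC-SHA2"))
print(esp_proposal_is_acceptable("AES-CBC"))
```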
Remember, ESP stands for Encapsulating Security Payload. So, thinking about its name, Payload tells you it applies to the data carried by the IP packet. In fact, ESP does not do anything with the outer IP header. That is AH's job.
Diffie-Hellman (D-H)
Diffie-Hellman (D-H) is an anonymous key agreement algorithm used to create encryption keys between two communicating parties. It allows the generation of symmetric keys without explicitly sharing them (though the key is often referred to as a "shared secret"). D-H provides key agreement only; it does not encrypt data by itself. It does not provide PFS in and of itself. D-H does not authenticate the key generation participants, which makes it vulnerable to Man-in-the-Middle attacks unless it is combined with authentication. It is designed for short-term shared key creation. The IPsec and IKE standards call for specific D-H groups from which keys may be generated.
D-H is useful because mathematically, it allows both parties to derive a symmetrical secret key without sharing it explicitly. Though not required by IPsec, the resulting symmetric key may be used to facilitate Perfect Forward Secrecy (PFS).
Each Diffie-Hellman group is an algorithm formula. As of this writing, there are 24 D-H groups. The distinguishing characteristics between D-H groups are:
- Type of algorithm (e.g. modulus MODP or Elliptic Curve)
- Number of bits used to calculate the key
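The arithmetic behind a MODP-style exchange fits in a few lines of Python. The sketch below uses the tiny textbook parameters p = 23 and g = 5 purely for illustration; real D-H groups use primes of 2048 bits or more, and the function names are assumptions.

```python
import secrets


def dh_keypair(p, g):
    """Pick a random private value and derive the public value g^x mod p."""
    private = secrets.randbelow(p - 3) + 2     # in the range [2, p - 2]
    return private, pow(g, private, p)


def dh_shared(their_public, my_private, p):
    """Combine the peer's public value with our own private value."""
    return pow(their_public, my_private, p)


p, g = 23, 5                                   # toy group, NOT secure

# Fixed private values so the arithmetic can be followed by hand.
a_private, b_private = 6, 15
a_public = pow(g, a_private, p)                # 8
b_public = pow(g, b_private, p)                # 19
print(dh_shared(b_public, a_private, p))       # 2
print(dh_shared(a_public, b_private, p))       # 2

# With random private values both sides still agree.
x_priv, x_pub = dh_keypair(p, g)
y_priv, y_pub = dh_keypair(p, g)
print(dh_shared(y_pub, x_priv, p) == dh_shared(x_pub, y_priv, p))
```

Only the public values cross the network, yet both sides arrive at the same secret. Nothing in the exchange proves who the other side is, which is why IKE adds authentication on top.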
Diagrams
When IPsec is used to create a VPN, even though it is functionally a Layer 3 link, it is capable of acting like a Layer 2 link. IPsec may forward any IP traffic in Tunnel mode because the entire IP packet is packaged and securely transmitted. This makes IPsec more versatile than most Layer 3 VPN protocols.
This section provides illustrations to demonstrate the concepts described above. These diagrams demonstrate how IPsec packets look when transmitting encapsulated TCP/IP packets over TCP or UDP.
Authentication (AH Only)
Authentication without encryption (Authentication Header).
Normal TCP/IP Packet

Authentication Header (AH) Authentication | TCP | Transport Mode

Authentication Header (AH) Authentication | TCP | Tunnel Mode

UDP Encapsulated Authentication Header (AH) Authentication | Tunnel Mode

Encryption and Authentication (ESP Only)
Authentication with encryption (Encapsulating Security Payload only).
ESP | Transport Mode

ESP | Tunnel Mode

UDP Encapsulation of ESP | Tunnel Mode

Authentication And Encryption (AH + ESP)
Authentication with encryption (Authentication Header + Encapsulating Security Payload).
AH and ESP | TCP | Transport Mode

AH and ESP | TCP | Tunnel Mode

UDP Encapsulation of AH and ESP, Tunnel Mode

Footnotes
1 See RFC 7296, RFC 7427, RFC 7670, and RFC 8247.
2 IPsec VPNs using "Aggressive Mode" settings send a hash of the PSK in the clear. Among other risks, this fact is apparently exploited by the NSA using offline dictionary attacks. See https://nohats.ca/wordpress/blog/2014/12/29/dont-stop-using-ipsec-just-yet/
3 An Authenticated Encryption with Associated Data (AEAD) is just a fancy way of saying "combined mode cipher," which is a cryptographic cipher that handles encryption/decryption and authentication in a single step.
4 For non-AES cipher reference, see RFC 7634.
5 Whether or not 3DES remains an approved cipher is a gray area. It offers encryption (confidentiality) only. Avoiding it is strongly recommended. If necessary, review RFC 8221 for more information.
6 See RFC 7539.