Before delving into the topic, I want to say a huge thanks to Chris Buechler (co-founder of pfSense and current principal engineer at Ubiquiti Networks) and Clinton Campbell (guest lecturer at University of Washington) for helping me improve the quality and accuracy of this post.
Almost every internet user has heard of or used a VPN. We can use a VPN to hide our IP address so that we can surf the internet pseudo-anonymously (it is not entirely anonymous, because your VPN provider still knows who you are, even though your ISP can no longer see which sites you visit. After all, how can you be truly anonymous on the internet?). We can also use a VPN to get around government censorship and access websites that are blocked in certain regions. Company employees can use a VPN to share private data safely from remote locations over the public network. The benefits of using a VPN to surf the internet are numerous and amazing. However, there are also two noticeable downsides. Once we connect to a VPN server, our internet speed and bandwidth depend directly on the server's physical location and capacity. Another downside is that big-name VPN providers tend to attract all kinds of bad actors. If someone misbehaves using an IP address provided by the VPN service (for example, by insulting someone on Reddit), Reddit may blacklist that IP address. When we, the good actors, connect through the same IP address and try to post something on Reddit, we get blocked as well.
In most cases, we just download some software and a VPN connection is magically established. What is a VPN, really? How does it work? Why is it so secure? Is it perfectly secure? In this post, I will try to answer these questions, and hopefully you, as a VPN user, will gain a better basic understanding of the service you have been using.
What is VPN?
VPN stands for Virtual Private Network. To better understand this concept, we can divide the term into two parts: virtual and private network. Virtual means not physically existing, but made by software to appear so. A private network is the opposite of a public network. Specifically, a private network is a network that is not routable from the internet: it usually uses private IP addresses (it does not have to, but this is the most common arrangement because of IPv4 address scarcity), so its hosts cannot be reached directly from the internet. It is generally much more secure than a public network because access is typically restricted to certain groups of people, such as company employees, and data transferred on the private network cannot easily be snooped on by attackers sitting on the public network. Put together, a virtual private network is a technology that allows devices on a public network to connect securely to devices on a private network. The data is actually still transferred over a public network, but the technology secures the connection as if the data were being transferred on a private network.
Different Types of VPN
Generally speaking, we can categorize VPNs into two types based on how the connection is established.
Remote Access VPN
Remote access VPN is established when a single device connects to a remote network. This is probably the case for most home VPN users: we download VPN software on our smartphones or computers, log in with our credentials, and select a VPN server to connect to. Remote access VPN also benefits company employees who need access to company resources when working from home.
Site-To-Site VPN
Site-to-site VPN is common for companies that have offices in different geographical locations. It connects the network of one office to the network of another, which allows the offices to share private resources securely over the internet. Site-to-site VPN differs from remote access VPN mainly in that it creates a private link between two networks rather than between a single device and a network.
Different Types of VPN Protocols
The history of VPN protocols can be traced back to 1996, when a Microsoft software engineer named Gurdeep Singh-Pall invented PPTP (Point-to-Point Tunneling Protocol). As people have become more and more concerned about their online privacy and security, a number of newer VPN protocols have been devised. In this post, I will focus on VPN protocols that are widely used today.
To create a secure VPN connection, there are generally three components to consider: key exchange, tunneling, and encryption. Some protocols handle only one component; others handle two or more. As a result, an actual VPN implementation might combine multiple protocols to secure the connection.
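To make the idea of combining protocols concrete, here is a minimal Python sketch that records which of the three components each protocol covers and reports what a given combination leaves out. The coverage sets are a simplification drawn from the protocol descriptions in this post (an assumption, not an authoritative feature list).

```python
# Simplified view of which VPN components each protocol covers, based on the
# descriptions in this post (an assumption, not a complete feature matrix).
COMPONENTS = {"key exchange", "tunneling", "encryption"}

PROTOCOL_COVERAGE = {
    "IKE": {"key exchange"},
    "IPsec": {"tunneling", "encryption"},  # tunnel mode plus encryption
    "L2TP": {"tunneling"},                 # tunnels traffic but encrypts nothing
}


def missing_components(stack: list[str]) -> set[str]:
    """Return the components that a combination of protocols leaves uncovered."""
    covered: set[str] = set()
    for protocol in stack:
        covered |= PROTOCOL_COVERAGE[protocol]
    return COMPONENTS - covered


print(sorted(missing_components(["L2TP"])))         # ['encryption', 'key exchange']
print(missing_components(["L2TP", "IPsec", "IKE"]))  # set()
```

L2TP on its own leaves two gaps, which is why it is usually deployed as L2TP/IPsec with IKE handling the keys.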
IPsec (Internet Protocol Security)
IPsec is an extension of IP (Internet Protocol) and is made up of multiple protocols. It secures VPN connections by ensuring data confidentiality and integrity. Data traveling across the internet stays confidential because the content is encrypted into ciphertext: if a packet is snooped on by an attacker in the middle, all they can see is a bunch of gibberish rather than the actual content. Integrity ensures that no packet can be manipulated by unauthorized parties during transmission. This is achieved by having the sender compute a keyed hash (an HMAC) of the data using a secret key shared with the recipient; the recipient recomputes the keyed hash over the received data and compares it with the one it received. Because an attacker does not have the key, they cannot produce a valid hash for altered data. IPsec also provides authentication between the two hosts (e.g. VPN client and server) by requiring them to prove their identities first. Before communication happens, the two hosts must establish an SA (Security Association), which is negotiated by the IKE (Internet Key Exchange) protocol. The hosts then process every packet they send to each other according to the SA parameters. IPsec comes with two modes: transport mode secures only the packet payload and leaves the original IP header in place, while tunnel mode encapsulates the entire original packet, header included, inside a new packet.
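The integrity check described above can be sketched with Python's standard `hmac` module. This is a minimal illustration assuming a hypothetical pre-shared key `SHARED_KEY`; it is not the IPsec wire format, and in a real deployment the key would come from the SA negotiated by IKE.

```python
import hashlib
import hmac

# Hypothetical shared integrity key; in IPsec it would come from the SA.
SHARED_KEY = b"example-shared-key"


def sign(payload: bytes, key: bytes = SHARED_KEY) -> bytes:
    """Sender side: compute a keyed hash (HMAC-SHA256) over the payload."""
    return hmac.new(key, payload, hashlib.sha256).digest()


def verify(payload: bytes, tag: bytes, key: bytes = SHARED_KEY) -> bool:
    """Recipient side: recompute the HMAC and compare in constant time."""
    return hmac.compare_digest(sign(payload, key), tag)


packet = b"GET / HTTP/1.1"
tag = sign(packet)
print(verify(packet, tag))                 # True
print(verify(b"GET /evil HTTP/1.1", tag))  # False: the payload was altered in transit
```

A plain hash would not be enough here: an attacker could modify the data and simply recompute the hash. Mixing in the secret key is what makes forgery impractical.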
IKE (Internet Key Exchange)
The IKE protocol is typically used in conjunction with IPsec. It uses a Diffie-Hellman exchange to let the two communicating parties (e.g. VPN client and server) agree on a shared secret without ever sending that secret over the network, and symmetric keys are derived from it. These keys are used to encrypt and decrypt packets. The outcome of an IKE negotiation is an SA (Security Association), which defines a number of parameters for securing network traffic.
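A toy Diffie-Hellman exchange shows how two parties can arrive at the same secret while sending only public values. The parameters `P = 23` and `G = 5` are a classic textbook choice and far too small to be secure; real IKE deployments use standardized groups with much larger primes or elliptic curves, and derive the actual symmetric keys from the resulting secret.

```python
import secrets

# Toy parameters for illustration only; never use numbers this small.
P = 23  # public prime modulus
G = 5   # public base


def keypair() -> tuple[int, int]:
    """Pick a random private exponent and compute the matching public value."""
    private = secrets.randbelow(P - 2) + 1  # random exponent in [1, P - 2]
    public = pow(G, private, P)
    return private, public


client_priv, client_pub = keypair()
server_priv, server_pub = keypair()

# Only client_pub and server_pub would travel over the network.
client_secret = pow(server_pub, client_priv, P)
server_secret = pow(client_pub, server_priv, P)
print(client_secret == server_secret)  # True
```

Both sides end up computing G raised to the product of the two private exponents (mod P), so they agree on the secret even though neither private exponent was ever transmitted.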
L2TP (Layer 2 Tunneling protocol)
The main purpose of L2TP is to tunnel private data across an untrusted public network by encapsulating one packet inside another. The outer packet carries its own source and destination IP addresses (typically those of the two tunnel endpoints), which usually differ from the addresses in the inner packet. To the public, the outer packet appears to be a normal packet, but it actually carries a private and sensitive payload. However, the protocol itself does not provide any encryption. Therefore, it is often used together with IPsec to provide VPN security.
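The following minimal sketch models encapsulation with a hypothetical `Packet` class. The addresses are made-up examples (private addresses inside, documentation-range addresses for the tunnel endpoints). Note that the inner payload is still readable plaintext, which is exactly why L2TP is paired with IPsec.

```python
from __future__ import annotations

from dataclasses import dataclass


@dataclass
class Packet:
    src: str
    dst: str
    payload: bytes | Packet  # raw data, or a whole packet when tunneling


def encapsulate(inner: Packet, tunnel_src: str, tunnel_dst: str) -> Packet:
    """Wrap the entire inner packet as the payload of a new outer packet."""
    return Packet(src=tunnel_src, dst=tunnel_dst, payload=inner)


inner = Packet(src="10.0.0.5", dst="10.1.0.20", payload=b"quarterly report")
outer = encapsulate(inner, tunnel_src="198.51.100.1", tunnel_dst="203.0.113.1")

print(outer.dst)          # 203.0.113.1: what routers on the internet see
print(outer.payload.dst)  # 10.1.0.20: the private destination, unchanged
print(outer.payload.payload)  # b'quarterly report': still readable, no encryption
```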
SSL/TLS (Secure Socket Layer/Transport Layer Security)
We have all visited websites that are protected by SSL/TLS. We can tell that a connection to a website is protected if the URL starts with HTTPS and the browser shows a padlock icon on the left of the address bar. SSL/TLS is a cryptographic protocol that provides data confidentiality, as well as integrity and server authentication. In the OSI model, SSL/TLS is usually placed at the application layer (some people also say it sits at the session layer). It protects, for example, the connection between a web browser and a web server. Unlike IPsec VPNs, some SSL/TLS VPNs do not require users to install specialized client-side VPN software, because most web browsers nowadays support SSL/TLS by default.
How Does VPN Work?
To better understand how VPN really works, I have drawn diagrams to show the life of an IP packet with and without VPN.
What is an IP Packet?
To understand IP packets, we need to talk about PDUs (Protocol Data Units) first. A PDU is a unit of data specified by the protocol of a given OSI layer, and its name depends on the layer: in layer 3 (network layer), a PDU is called a packet, while in layer 2 (data link layer), it is called a frame. Generally speaking, any data, such as files and images, is broken down into smaller chunks called packets during transmission. This makes routing more efficient because each individual packet can travel a different route to reach its destination.
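The idea of breaking data into packets and rebuilding it can be sketched in a few lines of Python. The sequence numbers here are an illustrative assumption: IP itself does not reorder data, and in practice a transport protocol such as TCP handles ordering.

```python
import random


def packetize(data: bytes, size: int) -> list[tuple[int, bytes]]:
    """Split data into (sequence_number, chunk) pairs of at most `size` bytes."""
    return [(seq, data[i:i + size]) for seq, i in enumerate(range(0, len(data), size))]


def reassemble(packets: list[tuple[int, bytes]]) -> bytes:
    """Rebuild the original data even if the packets arrived out of order."""
    return b"".join(chunk for _, chunk in sorted(packets))


message = b"any file or image is sent as many small packets"
packets = packetize(message, size=8)
random.shuffle(packets)                # simulate packets taking different routes
print(reassemble(packets) == message)  # True
```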
Life of an IP Packet Without VPN
Step 1: the user opens the webpage whatismyipaddress.com, which sends a request to the whatismyipaddress.com server. The packet originates from the user's PC, so its source IP address is 10.19.190.132, the private IP address of the user's PC (assuming the router uses DHCP to assign 10.19.190.132 to the user's PC). Because the request is sent to the whatismyipaddress.com server, the packet's destination address is 128.95.120.1 (assuming the whatismyipaddress.com DNS record is cached locally on the user's PC).
Step 2: when the packet from the user's PC arrives at the router, the router replaces the packet's source IP address with its own public address 205.175.106.121 before sending the packet out (the source port may also be changed, which lets many devices share one public address). How does the router know which connected device a reply belongs to if the source IP address and port have been changed? The answer is NAT (Network Address Translation). The router keeps a NAT forwarding table that maps IP addresses (and ports) on the internal network to IP addresses (and ports) on the external network. For the case in the diagram, the router might generate a record like this (simplified version):
When a packet with destination IP address 205.175.106.121 and port 1234 arrives at the router from another router, the router looks up the NAT forwarding table and forwards the packet to the connected device with private IP address 10.19.190.132, which is the user's PC. It is also worth knowing that in this step, the next hop after our home router is a router belonging to our ISP, with no other routers in between (this can be verified with the traceroute command). Strictly speaking, our connection to the internet starts at our ISP. In other words, we reach the wider internet through our ISP.
Step 3: the diagram simplifies the journey of the packet from the ISP to the whatismyipaddress.com server. In reality, the packet usually passes through multiple routers to reach the server. The trip from one router to the next is called a hop, so the packet takes multiple hops. There can be many routers between the ISP and the whatismyipaddress.com server, and therefore many possible paths for the packet to take. How does a router decide where to send the packet next? The answer is a routing table. Each router stores one, and it maps ranges of destination addresses (prefixes) to a next hop; routing protocols fill in and update these entries so that packets follow the best route according to each protocol's metrics.
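A routing-table lookup can be sketched with Python's `ipaddress` module. The prefixes and next-hop names below are hypothetical; the sketch shows longest-prefix matching, where the router picks the most specific route that contains the destination and falls back to a default route otherwise.

```python
import ipaddress

# Hypothetical routing table: destination prefix -> next hop.
ROUTES = {
    ipaddress.ip_network("0.0.0.0/0"): "upstream-A",  # default route
    ipaddress.ip_network("128.95.0.0/16"): "peer-B",
    ipaddress.ip_network("128.95.120.0/24"): "peer-C",
}


def next_hop(destination: str) -> str:
    """Pick the matching route with the longest prefix (the most specific one)."""
    addr = ipaddress.ip_address(destination)
    matches = [net for net in ROUTES if addr in net]  # the default route always matches
    best = max(matches, key=lambda net: net.prefixlen)
    return ROUTES[best]


print(next_hop("128.95.120.1"))  # peer-C
print(next_hop("128.95.1.1"))    # peer-B
print(next_hop("8.8.8.8"))       # upstream-A
```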
Step 4: this step is very similar to step 3 but in the opposite direction. The source and destination IP addresses are swapped, and the packet carries the payload sent from the whatismyipaddress.com server.
Step 5: the ISP sends the packet directly to our home router.
Step 6: when the packet from the ISP reaches its destination, the user's home router, the router looks up the NAT table to find the correct connected device (the user's PC). The packet is modified again, but this time the destination IP address is replaced with the private IP address of the user's PC, 10.19.190.132, based on the record found in the NAT table. The packet is then forwarded to the user's PC based on this modified destination address.
The diagram above illustrates the life of an IP packet without VPN. It explains why whatismyipaddress.com shows an address that is different from our PC's address: the address it displays is the router's public address rather than our PC's private address. The diagram also exposes the internet's privacy and security weaknesses. If a man in the middle intercepts our packet at any step, the attacker would know where the packet comes from and, unless we are viewing an HTTPS website, what payload it contains. This is where VPN comes to the rescue.
Life of an IP Packet With VPN
Step 1: if the user's computer has client-side VPN software installed, the packet is encapsulated first. Encapsulation is simply the process of wrapping the original packet inside another packet created by the client-side VPN software. Everything in the original packet is encrypted by the VPN client and can only be decrypted by the VPN server. To the public, this is still a valid packet and will be routed as normal. Note that the destination address becomes the VPN server's IP address, 206.189.234.44.
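Steps 1 and 4 (and their mirror images, steps 6 and 8) can be sketched as encrypt-then-wrap and unwrap-then-decrypt. To stay within Python's standard library, this sketch uses a TOY cipher built from SHA-256 in counter mode plus an HMAC tag. It is not secure and is not what any real VPN uses; real protocols use vetted ciphers such as AES-GCM with separate, properly derived keys. The inner packet is written as text purely for readability, and `encapsulate`/`decapsulate` are hypothetical helpers.

```python
import hashlib
import hmac
import os


def _keystream(key: bytes, nonce: bytes, length: int) -> bytes:
    """TOY keystream (SHA-256 in counter mode). Illustration only, not a vetted cipher."""
    out = b""
    counter = 0
    while len(out) < length:
        out += hashlib.sha256(key + nonce + counter.to_bytes(8, "big")).digest()
        counter += 1
    return out[:length]


def encapsulate(inner: bytes, key: bytes, outer_src: str, outer_dst: str) -> dict:
    """Client side (step 1): encrypt the whole original packet, then wrap it in a new header."""
    nonce = os.urandom(12)
    ciphertext = bytes(a ^ b for a, b in zip(inner, _keystream(key, nonce, len(inner))))
    tag = hmac.new(key, nonce + ciphertext, hashlib.sha256).digest()  # integrity check
    return {"src": outer_src, "dst": outer_dst, "nonce": nonce, "payload": ciphertext, "tag": tag}


def decapsulate(outer: dict, key: bytes) -> bytes:
    """Server side (step 4): verify integrity, then decrypt to recover the original packet."""
    expected = hmac.new(key, outer["nonce"] + outer["payload"], hashlib.sha256).digest()
    if not hmac.compare_digest(expected, outer["tag"]):
        raise ValueError("packet was modified or encrypted with a different key")
    stream = _keystream(key, outer["nonce"], len(outer["payload"]))
    return bytes(a ^ b for a, b in zip(outer["payload"], stream))


key = os.urandom(32)  # stands in for the symmetric key agreed during key exchange
original = b"src=10.19.190.132 dst=128.95.120.1 | GET / HTTP/1.1"
outer = encapsulate(original, key, outer_src="10.19.190.132", outer_dst="206.189.234.44")
print(outer["dst"])                         # 206.189.234.44
print(decapsulate(outer, key) == original)  # True
```

Only the outer header (`src`, `dst`) is readable in transit; the original addresses and request travel inside the ciphertext.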
Step 2: this step is very similar to step 2 in the case without VPN. The source IP address is replaced with the router's public IP address, and a NAT record is generated to keep track of the association.
Step 3: the ISP is responsible for sending this packet to its designated destination, the VPN server. The best routes are selected dynamically based on routing protocols and routing tables.
Step 4: once the VPN server receives the packet, it unwraps the outer packet and then uses its symmetric key to decrypt the content of the inner packet (the original packet). Next, the VPN server sends the packet on to its real destination, the whatismyipaddress.com server. Before forwarding it, the VPN server typically replaces the packet's source address with its own public address, much as the home router does with NAT, so that the reply comes back to the VPN server.
Step 5: this step is the same as step 4 in the case without VPN.
Step 6: the VPN server encrypts the response packet and encapsulates it in a new packet addressed to the user's router before sending it out.
Step 7: this step is the same as step 5 in the case without VPN.
Step 8: this step is very similar to step 6 in the case without VPN. Once the user's computer receives the packet, the client-side VPN software unwraps it first and then decrypts the inner packet with its symmetric key.
The diagram above illustrates the life of an IP packet with VPN. As we can tell from the diagram, VPN adds two important security features: packet encapsulation and encryption. Because the original packet is encrypted and wrapped inside an outer packet, its contents and real destination are hidden from anyone along the way. If we visit whatismyipaddress.com with VPN, it shows 206.189.234.44 (IP address of the VPN server) rather than 205.175.106.121 (IP address of the router). This is why VPN allows users to surf the internet pseudo-anonymously. VPN also helps us bypass government censorship of certain websites, because it hides our packets' intended destination address and replaces it with the VPN server's IP address. For example, our ISP could prevent us from visiting whatismyipaddress.com by dropping all traffic with destination address 128.95.120.1 (IP address of the whatismyipaddress.com server). With VPN on, we would still be able to visit the website, because our ISP would never see our traffic with destination address 128.95.120.1. Instead, it sees 206.189.234.44 (IP address of the VPN server), so our traffic will not be blocked. Furthermore, VPN greatly enhances our online security. If our packets are intercepted at step 1, 2, 3, 6, 7, or 8, attackers could strip off the outer header, but all they would find inside is a bunch of gibberish, because they do not have the keys to decrypt the original packets.
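The censorship example can be reduced to a destination-address filter. In this minimal sketch, the blocklist `BLOCKED` and the helper `isp_forwards` are hypothetical; the point is that the filter can only inspect the outer header, so the tunneled request slips through. The sketch models only the destination-IP filtering described above.

```python
# Hypothetical ISP blocklist containing the whatismyipaddress.com address from the example.
BLOCKED = {"128.95.120.1"}


def isp_forwards(packet: dict) -> bool:
    """An ISP filter can only act on the outer header it is able to read."""
    return packet["dst"] not in BLOCKED


direct = {"src": "205.175.106.121", "dst": "128.95.120.1", "payload": b"GET / HTTP/1.1"}
via_vpn = {"src": "205.175.106.121", "dst": "206.189.234.44", "payload": b"<ciphertext>"}

print(isp_forwards(direct))   # False: dropped
print(isp_forwards(via_vpn))  # True: the real destination is hidden in the encrypted payload
```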
Is VPN Perfectly Secure?
If we pay for a VPN service, does it mean we will never be vulnerable to malicious attacks? Well, nothing is perfect. The level of security depends on how the VPN is set up. The diagram above demonstrates a remote access VPN. Although it helps us hide our identities, bypass government censorship, and protect our traffic from most malicious attacks, this type of setup still has some security flaws. For example, attackers can still intercept our packets at step 4 or 5 in the diagram. Traffic traveling between the VPN server and the whatismyipaddress.com server is not protected by the VPN, so those IP packets are neither encapsulated nor encrypted. If we shop online at a non-secure website (websites that use plain HTTP connections are generally considered insecure and highly vulnerable to attacks such as snooping and interception), attackers can still intercept our traffic and steal our credentials and personal information. Therefore, it is not recommended to share personal information with non-secure websites even if we are using a VPN. The last point I want to make might sound a little scary. Shopping at an HTTPS website only means that we are connecting to a legitimate server that owns the associated certificate. There is no way for us to know what the server is going to do with our payment information. It may send that information to a third-party payment processing company over an unprotected connection. Who knows?