Tunneldigger, the new VPN solution

The wlan slovenia network grew rapidly over the years and we started deploying more and more TP-Link routers, a cheap yet powerful alternative. The problem is that they have little free flash storage and cannot hold an OpenVPN installation, which we had been using to establish tunnels over the Internet where direct wireless connectivity is not available. This meant that another solution for establishing tunnels was needed, so we started looking into existing ones (#149, #919). For some time we tried n2n, but gave up because it was not reliable enough and had problems with MTU on some links.

Our core requirements were ease of deployment and high performance with limited overhead from context switching between kernel- and user-space. The latter was an additional issue we had with both OpenVPN and n2n: for many nodes deployed on FTTH (fiber) connections, the limiting factor for tunnel throughput was the CPU and not the Internet connection. This called for an in-kernel solution, and after consulting the Linux kernel mailing list we decided that L2TPv3 was the way forward. Confidentiality of communications was not a requirement, since the whole network is open anyway and openly transmitted over wireless, so we could also remove the overhead of cryptographic operations.

As we soon realized, the missing component was the brokerage of tunnels. An open source broker was available for L2TPv2, but version 2 only supports the transport of PPP sessions and we needed a real L2 solution. Version 3 supports something called an "ethernet pseudowire", which was exactly what we needed, but there was no open broker/client combination that supported it. So we decided to create our own, and thus Tunneldigger was born. Read on for a quick overview of the architecture.

Our solution consists of a simple control protocol that is used for negotiating new tunnel parameters such as ports and PMTU. Port negotiation is needed because deployed nodes are usually hidden behind network address translators, so their external source ports are rewritten. The control protocol is not compatible with L2TPv3, because we only needed a few simple messages.
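To make the idea concrete, here is a minimal sketch of how such a control protocol could frame its messages. The magic value, message types and header layout below are illustrative assumptions, not Tunneldigger's actual wire format:

```python
import struct

# Hypothetical framing: a short header (magic, version, message type)
# followed by an opaque payload. All constants here are made up for
# illustration only.
MAGIC = 0x7454
MSG_COOKIE, MSG_PREPARE, MSG_TUNNEL, MSG_PMTU_PROBE = range(4)
HEADER = struct.Struct("!HBB")  # 2-byte magic, 1-byte version, 1-byte type

def encode(msg_type, payload=b""):
    """Build a control datagram from a message type and payload."""
    return HEADER.pack(MAGIC, 1, msg_type) + payload

def decode(datagram):
    """Split a control datagram back into (message type, payload)."""
    magic, version, msg_type = HEADER.unpack_from(datagram)
    if magic != MAGIC or version != 1:
        raise ValueError("not a control message")
    return msg_type, datagram[HEADER.size:]
```

A magic prefix like this lets the receiving end cheaply distinguish control messages from data packets arriving on the same UDP socket.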

Both the control and the data messages go over the same UDP socket, which is bound to a tunnel via the new kernel L2TPv3 API; this API uses netlink sockets as its IPC interface and is used to configure new tunnels and establish sessions that create virtual ethernet interfaces. Two-way periodic PMTU measurements are performed over the control channel to automatically detect the proper PMTU values to configure the interfaces with. Using the same channel for PMTU testing increases the probability that data packets will traverse the same path and therefore encounter the same link MTUs.
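The probing logic can be sketched as follows: send padded probes of several candidate sizes over the control channel and keep the largest one that makes it back. The candidate sizes and the assumed per-packet overhead below are illustrative, not the values the broker actually uses:

```python
# Candidate probe sizes, from a full Ethernet MTU down to the IPv6
# minimum; the exact list is an assumption for this sketch.
CANDIDATES = [1500, 1492, 1476, 1452, 1424, 1400, 1334, 1280]
TUNNEL_OVERHEAD = 50  # assumed bytes consumed by IP + UDP + L2TPv3 headers

def probe_pmtu(send_probe, candidates=CANDIDATES):
    """Return the MTU to configure on the tunnel interface, or None.

    `send_probe(size)` is expected to transmit a probe padded to `size`
    bytes and return True if the peer echoed it back.
    """
    acked = [size for size in candidates if send_probe(size)]
    if not acked:
        return None
    # The largest acknowledged probe approximates the path MTU; subtract
    # the tunnel header overhead to get the interface MTU.
    return max(acked) - TUNNEL_OVERHEAD
```

Running this periodically in both directions, as the text describes, lets the endpoints react when the path (and thus its MTU) changes.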

The kernel implementation of L2TPv3 also has a limitation: it cannot establish two tunnels on the same UDP port, even when they are assigned different tunnel identifiers. But we needed this functionality, as nodes are often behind firewalls that only allow UDP packets to port 53 (and replies from the same address). We therefore added some NAT trickery on the broker side of the connection: tunnels are internally bound to different ports, but from the outside they all use the same port, with packets being rewritten based on the source/destination IP address and port. This allows us to traverse firewalls more easily.
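The broker-side idea can be illustrated with a small sketch: externally every tunnel shares one well-known port, internally each tunnel gets its own port, and a mapping keyed on the client's source address and port decides how packets are rewritten. The class, port numbers and range below are assumptions made for illustration:

```python
class PortDemux:
    """Sketch of per-client demultiplexing onto internal tunnel ports.

    Externally all clients talk to `external_port` (e.g. 53); internally
    each tunnel is bound to a distinct port, much like a DNAT rule keyed
    on the packet's source address and port.
    """

    def __init__(self, external_port):
        self.external_port = external_port
        self.mappings = {}          # (client_ip, client_port) -> internal port
        self.next_internal = 20000  # assumed start of the internal port range

    def add_tunnel(self, client_ip, client_port):
        """Allocate an internal port for a newly negotiated tunnel."""
        internal = self.next_internal
        self.next_internal += 1
        self.mappings[(client_ip, client_port)] = internal
        return internal

    def rewrite_inbound(self, client_ip, client_port):
        """Map a packet arriving at the shared external port to its tunnel."""
        return self.mappings[(client_ip, client_port)]
```

In the real deployment this rewriting would live in the kernel's NAT machinery rather than in user-space code; the sketch only shows the mapping that such rules implement.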

After some experimentation and lots of debugging, we finally came up with a solution that is currently deployed in production on many nodes in the network. Performance- and compatibility-wise it has shown itself to be a good solution, much better than anything we had tested before. Development is done on GitHub and if you have any questions, feel free to ask on our development mailing list.