ZeroTier 2.0 Status

ZeroTier 2.0 is relatively close, so we want to take this opportunity to announce some of the changes and improvements that are coming.

Version 2.0 is a major milestone release, hence our first major version increment. It will also be a “breaking release” for a few people. Version 2.0 nodes can still talk to 1.x nodes, but the command line interface and local service API are changing enough that some users with deployment and administration scripts will need to make relatively minor changes. We are using 2.0 as an opportunity to improve and generally clean up aspects of the ZeroTier user experience.

Making Development Go Faster

This version brings a major change under the hood to how the ZeroTier network virtualization service is built. While the core network hypervisor part of ZeroTier and its main packet I/O paths remain in C++ for performance and other reasons, the code that configures and controls these parts and provides higher level management, API, and CLI interfaces has been rewritten in Go.

The simple fact is that Go is a much more productive language than C++. In our subjective experience the difference in developer productivity can be as high as 2-3X. We have a long backlog of ideas and feature requests that relate not to the core ZeroTier protocol but to improved tooling for management, deployment, and troubleshooting. This sort of stuff will just be much easier to develop in Go than C++. We are a small team, so things that improve our development velocity matter quite a bit.

Go also makes it much easier for us to support optional features at the protocol and transport layer. The core ZeroTier protocol uses UDP for its data traffic and that will remain close to the metal, but we’ve often entertained supporting other transports for increased compatibility and coverage of the application space. The most prominent of these are WebSockets (over TCP) and WebRTC data channels. The Go ecosystem offers rich and mature libraries for this, while C/C++ offers both fewer choices and a lot of inconvenience around the inclusion of such dependencies.

Dragging in Go does bring two downsides, but we think they are minor.

One will be a slight increase in memory consumption. From our tests we don’t expect it to be more than 20-30 MB, which won’t even be noticed on the vast majority of devices, including most smaller ones. The only devices that may have issues are really tiny routers, and for those it may make sense to develop a super-minimal, reduced-feature-set, C++-only client. We may do this in the future if there is demand.

The second downside may be a slight loss of ability to support very old targets. The most impactful of these will be 32-bit Windows since we are not sure if cgo (Go’s system that allows us to link to our C bits) will work properly on 32-bit Windows systems. We’ve considered dumping 32-bit Windows support for a while, and this may force us to do so. Support for really ancient Linux distributions and for minority and experimental operating systems may also suffer.

We think the upsides of a larger ecosystem and much faster development strongly outweigh these downsides.

When this article gets circulated, commenters will certainly ask why we didn’t use Rust. While Rust is better than C++ in many ways, it isn’t substantially more productive compared to C++ in the hands of experienced C++ developers (like ourselves). The thing that overcame our general reluctance to drag in another language was Go’s ability to improve developer productivity, so Go was our choice. If we ever decided to rewrite the ZeroTier core network hypervisor and I/O path we might consider Rust there due to its superior security features. Rust feels like the spiritual successor to C++, while Go feels like a somewhat different animal geared toward rapidly developing high quality servers, interfaces, APIs, and management backends.

Interface and UI Improvements

The command line interface for ZeroTier will be renamed from zerotier-cli to just zerotier (packages will add a symbolic link) and is being redesigned to some extent to offer an improved user experience. Here’s a current draft (not final) of its help output:

Usage: zerotier [-options] <command> [-options] [command args]

Global Options
  -j                                   Output raw JSON where applicable
  -p <path>                            Connect to service running at this path
  -t <authtoken.secret path>           Use secret auth token from this file

Commands:
  help                                 Show this help
  version                              Print version
  service [path]                       Start in system service mode
  peers                                Show VL1 peers
  roots                                Show VL1 root servers
  addroot <type> [options]             Add a VL1 root
    static <identity> <ip/port> [...]  Add a root with a set identity and IPs
    dynamic <name> [default locator]   Add a dynamic root fetched by name
  removeroot <type> [options]          Remove a VL1 root
    static <identity>                  Remove a root with a set identity
    dynamic <name>                     Remove a dynamic root fetched by name
  networks                             Show joined VL2 virtual networks
  join <network ID>                    Join a virtual network
  leave <network ID>                   Leave a virtual network
  status                               Show ZeroTier service status and config
  show <network ID>                    Show verbose network info
  set <network ID> <option> <value>    Set a network local config option
    manageips <boolean>                Is IP management allowed?
    manageroutes <boolean>             Is route management allowed?
    globalips <boolean>                Can IPs in global IP space be managed?
    globalroutes <boolean>             Can global IP space routes be set?
    defaultroute <boolean>             Can default route be overridden?
  set <local config option> <value>    Set a local configuration option
    phy <IP/bits> blacklist <boolean>  Set or clear blacklist for CIDR
    phy <IP/bits> trust <path ID/0>    Set or clear trusted path ID for CIDR
    virt <address> try <IP/port> [...] Set explicit IPs for reaching a peer
    port <port>                        Set primary local port for VL1 P2P
    secondaryport <port/0>             Set or disable secondary VL1 P2P port
    tertiaryport <port/0>              Set or disable tertiary VL1 P2P port
    portsearch <boolean>               Set or disable port search on startup
    portmapping <boolean>              Set or disable use of UPnP and NAT-PMP
    explicitaddresses <IP/port> [...]  Set explicit external IPs to advertise

Most commands require a secret token to permit control of a running ZeroTier
service. The CLI will automatically try to read this token from the
authtoken.secret file in the service's working directory and then from a
file called .zerotierauth in the user's home directory. The -t option can be
used to explicitly specify a location.

That’s a work in progress, so don’t bother modifying any scripts yet!
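
The token lookup order described at the end of the help text can be sketched in Go as follows. This is only an illustration of the search order, not the actual CLI code: the function name findAuthToken, the injectable readFileFunc, the placeholder service directory, and the assumption that -t replaces the default search entirely are all ours.

```go
package main

import (
	"errors"
	"fmt"
	"os"
	"path/filepath"
	"strings"
)

var errNoToken = errors.New("no auth token found")

// readFileFunc matches os.ReadFile so a fake filesystem can be substituted.
type readFileFunc func(path string) ([]byte, error)

// findAuthToken returns the first non-empty token found, checking
// authtoken.secret in the service directory and then .zerotierauth in the
// user's home directory, unless an explicit path is given.
func findAuthToken(read readFileFunc, explicitPath, serviceDir, homeDir string) (string, error) {
	candidates := []string{
		filepath.Join(serviceDir, "authtoken.secret"),
		filepath.Join(homeDir, ".zerotierauth"),
	}
	if explicitPath != "" {
		// Assumption: an explicit -t path replaces the default search.
		candidates = []string{explicitPath}
	}
	for _, p := range candidates {
		data, err := read(p)
		if err != nil {
			continue // missing or unreadable: try the next location
		}
		if tok := strings.TrimSpace(string(data)); tok != "" {
			return tok, nil
		}
	}
	return "", errNoToken
}

func main() {
	home, _ := os.UserHomeDir()
	// "/var/lib/zerotier" is a placeholder, not a confirmed 2.0 path.
	tok, err := findAuthToken(os.ReadFile, "", "/var/lib/zerotier", home)
	if err != nil {
		fmt.Println("error:", err)
		return
	}
	fmt.Println("found token of length", len(tok))
}
```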

The graphical client for Windows and macOS will also get a facelift, though we don’t have mockups to share yet. No, we do not plan to use Electron. We’re not dogmatically opposed to Electron, but because of its weight we dislike it for system tray applications that run constantly.

Behind the scenes we are dropping the use of a local TCP socket to interface with the service in favor of Unix domain sockets on Unix-like operating systems and named pipes on Windows. We are doing this in response to the Zoom local web server security incident. While our local web server has never presented a security issue, we want to avoid using a pattern that is now considered bad practice by most security professionals and developers.

This might make scripting and control by third party apps somewhat less convenient, though we must point out that curl, the most popular command line web querying program, does support HTTP over Unix domain sockets.
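
For third party apps written in Go, speaking HTTP over a Unix domain socket only takes a custom dialer. The sketch below is a minimal illustration under stated assumptions: the socket path, the /status endpoint, and the JSON body are invented for the demo, which talks to a stand-in server rather than a real ZeroTier service, since the 2.0 API is not final.

```go
package main

import (
	"context"
	"fmt"
	"io"
	"net"
	"net/http"
	"os"
	"path/filepath"
)

// newUnixClient returns an HTTP client whose connections all go to the
// Unix domain socket at socketPath instead of a TCP address.
func newUnixClient(socketPath string) *http.Client {
	return &http.Client{
		Transport: &http.Transport{
			DialContext: func(ctx context.Context, _, _ string) (net.Conn, error) {
				var d net.Dialer
				return d.DialContext(ctx, "unix", socketPath)
			},
		},
	}
}

// getOverUnix performs a GET for urlPath over the socket and returns the body.
func getOverUnix(socketPath, urlPath string) (string, error) {
	client := newUnixClient(socketPath)
	defer client.CloseIdleConnections()
	// The host in the URL only fills the Host header; the dialer ignores it.
	resp, err := client.Get("http://unix" + urlPath)
	if err != nil {
		return "", err
	}
	defer resp.Body.Close()
	body, err := io.ReadAll(resp.Body)
	return string(body), err
}

// demo starts a stand-in HTTP server on a temporary socket and queries it.
func demo() (string, error) {
	dir, err := os.MkdirTemp("", "ztdemo")
	if err != nil {
		return "", err
	}
	defer os.RemoveAll(dir)
	sock := filepath.Join(dir, "demo.sock")
	ln, err := net.Listen("unix", sock)
	if err != nil {
		return "", err
	}
	defer ln.Close()
	mux := http.NewServeMux()
	mux.HandleFunc("/status", func(w http.ResponseWriter, r *http.Request) {
		fmt.Fprint(w, `{"online":true}`) // hypothetical response body
	})
	go http.Serve(ln, mux)
	return getOverUnix(sock, "/status")
}

func main() {
	body, err := demo()
	if err != nil {
		fmt.Println("error:", err)
		return
	}
	fmt.Println(body)
}
```

The curl equivalent uses its --unix-socket option, which takes the socket path followed by an ordinary http:// URL.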

Last and for most people least, we are dropping the name “one” from the name of our service. It will just be ZeroTier, or the “ZeroTier Network Virtualization Service” for the pedantic. The name “ZeroTier One” dates back to the earliest alpha releases and refers to its ability to create “one” network that spans physical boundaries. It’s descriptive but it isn’t necessary.

Root Server Decentralization

ZeroTier has always been a mostly decentralized system. Our design principle has been “decentralize until it hurts, then centralize until it works.” The vast majority of the logic in the system exists at the edge in individual nodes with its behavior being enforced mutually by both sides of each link and securely regulated by cryptographically signed objects. An old blog post by ZeroTier’s founder from 2014 describes some of the thought behind these designs.

Since then we’ve been looking for ways to further decentralize ZeroTier’s root infrastructure, and we think we found a way.

To enable high-performance, zero-configuration, rapid establishment of connectivity between edge nodes, ZeroTier roots must cache a complete database of all existing nodes and their locations on the physical network. Right now that’s easy: we run the roots, everyone contacts them, and they cache everything. (The data set is not actually that big and fits easily into the RAM of a decent sized server or VM.)

In the 1.x line we allowed users to add secondary root servers termed moons with the main root servers in turn being termed planets. (This terminology has proven confusing and is going away.) Secondary roots are able to act like roots to nodes that use them and then to delegate to the global primary roots when a node cannot be found.

Secondary roots make it possible to add redundancy in case ZeroTier’s core infrastructure goes down but they’re still not true decentralization as they still depend on core infrastructure run by ZeroTier, Inc.

We’ve had a few users hack the code to allow the use of private root servers that are entirely isolated from the network, but this sacrifices one of ZeroTier’s most compelling features: a unified global namespace for nodes. If roots are isolated it’s no longer possible for any node to join any network or be authorized on any network.

To enable true global and “flat” decentralization, we had to develop something new: a fully decentralized, fully replicated database that can be shared among multiple unrelated parties with no pre-existing trust relationship. It’s called LF (pronounced like “aleph”) and is introduced in this previous post. LF is offered as a separate project from ZeroTier and is potentially useful for many other things as well.

In the 2.x line root servers can use LF to store records called locators that contain information on how to reach a node. Locators are signed by the nodes they describe, making them unforgeable. For most nodes, the locator will enumerate the root servers the node is using. For some nodes, such as root servers themselves, the locator can also contain explicit IPv4 and IPv6 contact addresses. Using locators, a root can now look up the root that has custody of a node and vector traffic there, allowing anyone to run a root that is “co-equal” with ours.
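
To illustrate why a self-signed locator is unforgeable, here is a small Go sketch using Ed25519 from the standard library. The Locator fields, the JSON encoding, and all function names are hypothetical; the real locator format is not described here and will almost certainly differ.

```go
package main

import (
	"crypto/ed25519"
	"crypto/rand"
	"encoding/json"
	"fmt"
)

// Locator is a hypothetical record describing how to reach a node.
type Locator struct {
	Node      string   `json:"node"`                // node address (placeholder)
	Roots     []string `json:"roots"`               // roots the node is using
	Endpoints []string `json:"endpoints,omitempty"` // explicit IP/port, if any
}

// SignedLocator carries the encoded locator and the node's signature over it.
type SignedLocator struct {
	Payload   []byte
	Signature []byte
}

// newNodeKey generates a fresh Ed25519 key pair standing in for a node identity.
func newNodeKey() (ed25519.PublicKey, ed25519.PrivateKey) {
	pub, priv, err := ed25519.GenerateKey(rand.Reader)
	if err != nil {
		panic(err)
	}
	return pub, priv
}

// signLocator encodes the locator and signs it with the node's private key.
func signLocator(priv ed25519.PrivateKey, loc Locator) (SignedLocator, error) {
	payload, err := json.Marshal(loc)
	if err != nil {
		return SignedLocator{}, err
	}
	return SignedLocator{Payload: payload, Signature: ed25519.Sign(priv, payload)}, nil
}

// verifyLocator checks the signature before trusting any field of the payload.
func verifyLocator(pub ed25519.PublicKey, s SignedLocator) (Locator, bool) {
	var loc Locator
	if !ed25519.Verify(pub, s.Payload, s.Signature) {
		return loc, false
	}
	if err := json.Unmarshal(s.Payload, &loc); err != nil {
		return loc, false
	}
	return loc, true
}

func main() {
	pub, priv := newNodeKey()
	signed, err := signLocator(priv, Locator{Node: "0123456789", Roots: []string{"root-a"}})
	if err != nil {
		fmt.Println("error:", err)
		return
	}
	_, ok := verifyLocator(pub, signed)
	fmt.Println("valid:", ok)

	signed.Payload[0] ^= 0xff // any modification invalidates the signature
	_, ok = verifyLocator(pub, signed)
	fmt.Println("valid after tampering:", ok)
}
```

Because anyone holding the node’s public key can run the verification, an untrusted LF instance or root can store and relay locators without being able to alter them undetected.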

Since roots are universally interchangeable in 2.x, the old clunky “planet” and “moon” mechanism has been replaced by the straightforward ability to add and remove root servers. Roots can be configured in one of two ways: as static roots consisting of a ZeroTier identity and one or more IP/port addresses or as dynamic roots that are fetched from a DNS TXT record (or possibly other backup mechanisms). (Dynamic root names can contain encoded signing keys to protect the system against DNS poisoning attacks.)
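
The two kinds of root configuration could be modeled roughly like this in Go. This is a sketch only: the Root type, the resolve function, and the treatment of TXT records as opaque strings are assumptions, and signature checking of dynamic root records is omitted. net.LookupTXT is the real standard library call for fetching TXT records.

```go
package main

import (
	"fmt"
	"net"
)

// Root is a hypothetical configuration entry for one root server.
type Root struct {
	Identity  string   // static roots: the ZeroTier identity
	Endpoints []string // static roots: one or more IP/port addresses
	Name      string   // dynamic roots: the name to fetch records for
}

// Dynamic reports whether the root is fetched by name rather than fixed.
func (r Root) Dynamic() bool { return r.Name != "" }

// txtLookup has the same signature as net.LookupTXT.
type txtLookup func(name string) ([]string, error)

// defaultLookup queries real DNS; the demo below uses a stub instead.
var defaultLookup txtLookup = net.LookupTXT

// resolve returns the configured endpoints of a static root, or the raw
// TXT records published under a dynamic root's name.
func resolve(r Root, lookup txtLookup) ([]string, error) {
	if !r.Dynamic() {
		return r.Endpoints, nil
	}
	return lookup(r.Name)
}

func main() {
	roots := []Root{
		{Identity: "<identity>", Endpoints: []string{"192.0.2.10/9993"}},
		{Name: "roots.example.com"},
	}
	// Stub lookup so the demo runs offline.
	stub := func(name string) ([]string, error) {
		return []string{"placeholder-record-for-" + name}, nil
	}
	for _, r := range roots {
		recs, err := resolve(r, stub)
		fmt.Println("dynamic:", r.Dynamic(), "records:", recs, "err:", err)
	}
}
```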

By default ZeroTier will still use our roots, but you will be free to run your own. Your roots can use any LF instance (typically one that you run) to obtain locator records for peers, allowing nodes using your roots to share the same global cryptographic address namespace as those using ours or any other root.

There are many reasons users want this: robustness, offline or air-gapped operation, data residency regulations, the desire to control one’s own infrastructure, and the operation of ZeroTier in regions of the world where our root servers may be slow or blocked.

As far as we’re concerned we are happy to have users run their own roots. Roots don’t do much at all and we don’t make any money directly off running them.

Improved Cryptography

Cryptography has been unchanged in ZeroTier since its first beta versions. It’s a simple and deliberately boring encryption protocol using Salsa20 and Poly1305 for symmetric encryption and authentication, and Curve25519/Ed25519 for key agreement and signatures.

ZeroTier’s encryption works and it’s (to our knowledge) never experienced any catastrophic vulnerabilities, but it has its limitations. The most important of these is the absence of ephemeral keys and forward secrecy.

Version 2.x will bring ephemeral keys and at least some degree of forward secrecy, though the exact design is still not finalized. The next major version will also bring AES encryption (the exact mode is also not finalized yet) and a new identity type that is based on NIST P-384 elliptic curves. The latter is strictly optional and we expect most users will continue using Curve25519/Ed25519, but this new identity type will allow us to eventually ship a NIST and FIPS-140 compliant build of ZeroTier for enterprise customers.
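
As a generic illustration of what ephemeral keys buy, the Go sketch below has two sides generate throwaway X25519 key pairs for each session and derive a shared key from them. It is not ZeroTier’s design, which as noted above is not finalized; the function name is ours, and hashing the shared secret with SHA-256 is a stand-in for a proper key derivation function. It requires Go 1.20 or later for crypto/ecdh.

```go
package main

import (
	"crypto/ecdh"
	"crypto/rand"
	"crypto/sha256"
	"fmt"
)

// ephemeralSessionKeys simulates both ends of one session. Each side creates
// a fresh key pair, exchanges public keys, and derives the same session key.
// Once the ephemeral private keys are discarded, the session key cannot be
// recomputed, even by someone who later steals the long-term identity keys.
func ephemeralSessionKeys() (alice, bob [32]byte, err error) {
	curve := ecdh.X25519()
	a, err := curve.GenerateKey(rand.Reader)
	if err != nil {
		return alice, bob, err
	}
	b, err := curve.GenerateKey(rand.Reader)
	if err != nil {
		return alice, bob, err
	}
	aliceSecret, err := a.ECDH(b.PublicKey())
	if err != nil {
		return alice, bob, err
	}
	bobSecret, err := b.ECDH(a.PublicKey())
	if err != nil {
		return alice, bob, err
	}
	// SHA-256 here is a simplified stand-in for a real KDF.
	return sha256.Sum256(aliceSecret), sha256.Sum256(bobSecret), nil
}

func main() {
	a1, b1, err := ephemeralSessionKeys()
	if err != nil {
		fmt.Println("error:", err)
		return
	}
	a2, _, err := ephemeralSessionKeys()
	if err != nil {
		fmt.Println("error:", err)
		return
	}
	fmt.Println("both sides agree:", a1 == b1)
	fmt.Println("sessions use different keys:", a1 != a2)
}
```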

Smarter Multicast Replication

ZeroTier’s multicast is based on sender-side replication, a fancy way of saying that the sending node just sends copies of multicasts to every subscribing recipient. This is simple and works perfectly well for most common multicast use cases, but it doesn’t scale well to larger multicast groups and it performs poorly on aggressively hub-and-spoke physical network topologies.

Version 2.x supports peer-to-peer and hub-and-spoke multicast replication algorithms. Peer-to-peer replication means that peers assist in the propagation of multicasts, while hub-and-spoke replication means certain members of a network can be explicitly designated (at the controller) to be multicast replicators responsible for aggressively propagating multicasts to subscribers. In reality hub-and-spoke replication is a special case of peer-to-peer replication with greater participation by select nodes, and is implemented as such.
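
The difference in sender load between the two approaches can be sketched with a toy model in Go. Everything here is an assumption made for illustration: the sendPlan type, the function names, and the round-robin assignment of subscribers to replicators are ours, not the actual 2.x algorithm.

```go
package main

import "fmt"

// sendPlan maps each transmitting node to the nodes it sends a copy to.
type sendPlan map[string][]string

// senderSide models 1.x behavior: the sender transmits one copy to every
// subscriber itself.
func senderSide(sender string, subscribers []string) sendPlan {
	return sendPlan{sender: append([]string(nil), subscribers...)}
}

// hubAndSpoke models designated replicators: the sender transmits one copy
// to each replicator, and the replicators share the subscribers between
// them (round-robin here, purely as an assumption).
func hubAndSpoke(sender string, subscribers, replicators []string) sendPlan {
	if len(replicators) == 0 {
		return senderSide(sender, subscribers)
	}
	plan := sendPlan{sender: append([]string(nil), replicators...)}
	isReplicator := make(map[string]bool, len(replicators))
	for _, r := range replicators {
		isReplicator[r] = true
	}
	i := 0
	for _, s := range subscribers {
		if isReplicator[s] {
			continue // already receives a copy directly from the sender
		}
		r := replicators[i%len(replicators)]
		plan[r] = append(plan[r], s)
		i++
	}
	return plan
}

func main() {
	subs := []string{"a", "b", "c", "d", "e", "f"}
	reps := []string{"hub1", "hub2"}

	direct := senderSide("sender", subs)
	hubbed := hubAndSpoke("sender", subs, reps)

	fmt.Println("sender-side copies sent by sender:", len(direct["sender"]))
	fmt.Println("hub-and-spoke copies sent by sender:", len(hubbed["sender"]))
	for _, r := range reps {
		fmt.Println(r, "forwards to", hubbed[r])
	}
}
```

In this model the sender’s cost grows with the number of replicators rather than the number of subscribers, which is the property that matters on large groups and on hub-and-spoke physical topologies.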

Other Improvements

As part of our Go transition we’ve rewritten the I/O path in the service. The new I/O path uses true multithreaded I/O for UDP packets and contains additional I/O optimizations.