ZeroTier 1.1.17 (1.2.0 beta) Available for Testing

Happy Holidays! After many months of work, we now have a beta version available of ZeroTier 1.2.0. Alongside this release comes a major revision of ZeroTier Central with support for our new rules engine and many changes under the hood to improve reliability.

Your holiday gift to us can be breaking it! … and telling us how, of course. Use GitHub or e-mail contact@zerotier.com to report any problems.

We plan to release 1.2.0 sometime in early to middle January 2017.

Version 1.1.17 is available over at GitHub, and we have pre-built Windows [MSI] and Macintosh [PKG] binaries available as well. The new version of ZeroTier Central is up at my-beta.zerotier.com. Since it’s a beta test site it might be updated or reset at any time, and accounts and networks created there will eventually be cleared. If you want to upgrade your account there, use a Stripe test card number. (Though you won’t see many differences on the beta site.)

What’s new? Let’s see… where to begin…

ZeroTier Core

  • Rules and Micro-Segmentation: ZeroTier now has a true “enterprise-grade” SDN rules engine. You can read more about its design and capabilities here. Using it requires the new controller code, and the new ZeroTier Central UI provides a simple rules language that makes defining rules a lot easier.
  • Upstream Federation: ZeroTier now supports the configuration of user-designated upstream nodes that behave like root servers and can provide root functionality if root servers are not reachable. This is the first step toward a more mesh-like mode of operation, and also toward making ZeroTier a viable solution for on-premise SDN by allowing the designation of on-site upstream devices. This feature still needs a bit of work and testing, but you are free to try specifying a designated upstream in local.conf.
  • Dead Path Detection Improvements: ZeroTier is now even more aggressive at detecting path failure and finding new paths. Dead links should be renegotiated more rapidly.
  • Security Improvements: Stricter and smarter rate limits have been introduced to harden the core against DoS attacks, and a few other minor issues have been fixed.

ZeroTier One (Mac/Linux/Windows/etc. service)

  • New Windows UI: We’ve transformed our Windows UI into a task bar app that can be used to quickly join, activate, and deactivate networks. It remembers past networks and lets you easily re-join them.
  • New Macintosh UI: Our Macintosh UI has been completely rewritten and now runs as a menu bar app with similar functionality to the new Windows UI. It’s designed to resemble the Macintosh WiFi menu. Think of it as WiFi with unlimited range.
  • Controller Completely Reworked: The controller has been significantly refactored to use an in-filesystem store of JSON documents instead of SQLite to store network and member configuration and state information. This makes administration and archiving of networks easier and reduces dependencies. The controller also now builds by default in all desktop and server builds of ZeroTier One, which will allow us in the future to include simple “ad-hoc networking” capabilities in client UIs.
  • Windows High CPU Bug Fix: A major bug reported by many users running ZeroTier alongside Hyper-V that caused high CPU usage should now be fixed. This was a tough one to track down!
  • Default Route Support in UIs: Windows, Mac, Android, and iOS UIs all now expose global and default route permissions for networks.

ZeroTier Central (web UI)

  • General UI Improvements: UI sections are now collapsible, and collapsible help is now available beneath each. The look of the members area has been improved, and members now have an advanced settings panel, expandable via the little wrench icon next to the member’s ZeroTier address, that contains less common configuration options.
  • Rules Definition Language: You’ll notice a new section called “Flow Rules” below the main network settings area. This contains an editor window where you can write rule sets. We’re working on comprehensive documentation and tutorials for this, but for now you should be able to experiment by following the help and some of the examples at the bottom of this post.
  • Real-Time Updates: Previous versions of ZeroTier Central required a refresh or reload to see changes. This version streams updates continuously. It even works if more than one person is editing a network. You’ll see changes almost instantly, like Google Docs.
  • Member Paging: The members list is now paged to improve UI performance on huge networks. We found in testing that more than 25 members per page results in poor UI performance on most systems and browsers. With paging we were able to administrate networks with thousands of members without a “spinning wheel of death” or extreme browser slowdown.
  • Better Reliability: Under the hood, Central has been refactored to use RethinkDB in high availability cluster configurations instead of PostgreSQL and to automatically manage controller fail-over. The new cluster at my-beta.zerotier.com is a true automatic failover HA cluster instead of a master-backup configuration. This should allow us to keep scaling without performance issues and should prevent issues like the one we experienced on October 27th.

Outstanding Issues Remaining

  • Documentation: We have a lot of documentation to write!
  • More Documentation: The live API docs on my-beta are out of date. There have been a few changes, and new fields and capabilities have been added.
  • Federation Testing: Federation (designated upstreams) still needs testing and of course documentation!
  • Further Rules Engine Testing: While the core of the rules engine is well tested, tags and capabilities still need a bit of work. There’s still a bit of UI work remaining in Central as well to fully support these more advanced features in a user-friendly way.
  • Software Updates: We’re planning a huge overhaul of the software update mechanism for Macintosh and Windows for 1.2.0, but it’s not done yet.
  • Linux Build Farm 2.0: We’re also planning on creating a new Linux build environment based on qemu-chroot that will allow us to build native binaries for many different Linux architectures and distribution versions using native compilers and test them on “native” (emulated) versions of the target system. This will allow us to offer packages for things like MIPS, ARM64, RISC-V, etc.

There’s also been a tremendous amount of work on the ZeroTier SDK, which we’ll be going into in a separate blog post closer to the actual 1.2.0 release.

I suspect that for most users our rules engine will be the most exciting part of this release. Unfortunately we have yet to produce the kind of comprehensive documentation that will be needed to take full advantage of it. Until then, you can get started by reading the help on my-beta.zerotier.com and also referring to the example below. It shows how to create a simple network that allows IPv4 and IPv6 and how to whitelist TCP connections.

Try cutting and pasting this script into your own network on my-beta and testing it out!

# This is an example rule set to illustrate the basic syntax of
# ZeroTier Central's rules editor, as well as a few of the rules
# engine's most important capabilities.

# Central just compiles this into a raw rule set, which in the Central
# UI can be seen to the right of the editor window.

# Drop all Ethernet frame types that are not IPv4 or IPv6
# Note that 'and' is the default for chains of conditions in an action,
# so its use is strictly ornamental.
drop
	not ethertype 0x0800 # IPv4
	and not ethertype 0x0806 # IPv4 ARP
	and not ethertype 0x86dd # IPv6
;

# Senders in every sender-receiver pair will send a copy of the first
# 128 bytes of every packet to a security observer.
tee 128 beefbeef01 not chr inbound;

# Now have receivers do the same. This way one security observer will
# see all sender-side traffic, and another will see all receiver-side
# traffic!
tee 128 beefbeef02 chr inbound;

# TCP whitelisting: allow ssh, http, and https. Note the 'or' modifier.
# Conditions are evaluated in order, so each 'and' or 'or' applies to
# the next condition and is evaluated vs. the result of all previous
# conditions.
accept
	chr tcp_syn
	and dport 22
	or dport 80
	or dport 443
;

# TCP whitelisting: do not allow anything else! Since ZeroTier's filter
# is stateless, we accomplish this by prohibiting the initial TCP SYN
# packet in the TCP three-way handshake. The above whitelist rules allow
# it only for designated ports.
drop
	chr tcp_syn
	and not chr tcp_ack
;

# Accept all other traffic
accept;

October 27th Web and Controller (my.zerotier.com) Outage Postmortem

We’re totally heads-down on our next release, but we wanted to take a moment to do a short postmortem on our first significant outage ever.

So here are the basics of what happened.

Our existing (soon to be EOL) infrastructure uses bare metal machines. We do this for security reasons: these are certificate authorities, and bare metal offers an additional layer of security for the storage and integrity of key material. It consists of a primary/secondary configuration, which is a viable if a little old-school way of doing high availability.

So far this has worked beautifully. It’s had no issue handling heavy traffic and fail-over has been fine in the past.

On October 27th, an SSD belonging to a mirror set in one of these bare metal machines decided to eat itself “creatively.” Instead of failing like a light bulb, it failed slowly in a way that resulted in data corruption. This can happen with any kind of drive, but for reasons that are hard to determine (because the machine totally ate itself) this failure resulted in follow-on corruption of the mirrored data at the secondaries. This meant that when the box failed, the secondaries did not come up and the data required repair.

It was a bit yucky but we were able to compensate and bring everything progressively back up within an hour or so. It’s also important to note for those who don’t know that ZeroTier’s root server infrastructure is entirely separate from this, is a better shared-nothing redundantly-redundant configuration (two independent geo-distributed roots), and was in no way affected. As a result those running their own controllers didn’t notice anything.

Here are a few takeaways for anyone else running… well… pretty much anything:

  1. Be sure to test fail-over, auto-master-reelection, or other HA configurations regularly. The system had been so well-behaved for so long that we hadn’t done this in a while and it bit us.
  2. Be insanely paranoid about backups. (Luckily we did this.) Be sure that you have multiple backup strategies in place that are backing up (1) to multiple locations, (2) in multiple ways, and (3) using multiple strategies to store the data. That way if something ugly like the above occurs you will be able to leverage the strengths and weaknesses of every different backup strategy you’re using to compensate for any creative failure modes you encounter. Also always always remember that certain kinds of failures can result in problems that cascade down into your backups.
  3. Point 2 applies to as-a-service things like elastic whatever databases or storage as well, since not only can other peoples’ SaaS fail but your stuff could also fail in such a way as to eat itself. SaaS won’t save you if your own code told it to blow its own brains out.
  4. Have lots of monitoring. This failure actually had a part 2, namely that we were not notified immediately that the failure was “in progress.” We got notified that something was wrong but by then the trains had already wrecked.

Funny thing is we are probably weeks away from shifting to a better infrastructure for hosted ZeroTier Central and controllers. It’s multi-data-center, fully redundant in a more modern sense (Raft consensus via RethinkDB among other things), and all that good stuff. I’m personally convinced that computers know such things. This was the existing configuration’s last chance to fail in a sufficiently interesting way so as to cause an outage. In the future we’ll be more aware of this effect and plan for things to conspire to fail before they are scheduled to be replaced. 🙂

Last but not least, some users reported connectivity problems between existing hosts. We saw a bit of this ourselves. This “shouldn’t happen,” but it seems that if a controller is down long enough it’s possible to run into certificate timestamp boundary issues in which certificates that have been issued do not agree. There are changes in the new version that may already mitigate this but this is something we’re going to test more heavily in the final testing phase of the next release.

Now back to 1.2.0!

The ZeroTier Rules Engine: Bringing Capability-Based Security to Virtual Networks

The next major release of ZeroTier’s network virtualization engine (1.2.0) is a huge milestone. In addition to other improvements, our virtual networks will be getting a lot smarter. It will now be possible to set fine-grained rules and permissions and implement security monitoring at the network level with all of this being managed via the network controller and enforced cooperatively by all network participants. This brings us to near feature parity with in-data-center SDN systems and virtual private cloud backplanes like Amazon VPC.

This post describes the basic design of the ZeroTier rules engine by way of the reasoning process that led to it. As of the time of this writing (late August, 2016), a working but not quite production ready implementation is taking shape in the “dev” branch of our GitHub repository. ETA for this release is mid to late September, but if you are brave you are welcome to pull “dev” and take a look. Start with “controller/README.md”. Note that there have been other changes to the controller too, so don’t try to drop this into a production deployment!

Ruling Certain Things Out

In designing our rules engine we took inspiration from OpenFlow, Amazon VPC, and many other sources, but in the end we decided to do something a little bit different. Our mission here at ZeroTier is to “directly connect the world’s devices” by in effect placing them all in the same cloud. The requirements implied by this mission rule out (pun intended?) many of the approaches used by conventional LAN-oriented SDN switches and endpoint firewall management solutions.

ZeroTier is designed to run on small devices. That means we can’t push big rules tables. The size of the rule set pushed to each device has to be kept under control. Meanwhile the latency and unreliability of the global Internet vs on-premise networks excludes any approach that requires constant contact between endpoints and network controllers. This means we can’t use the OpenFlow approach of querying the controller when an unrecognized “flow” is encountered. That would be slow and unreliable.

At the same time we wanted to ship a truly flexible rules engine capable of handling the complex security, monitoring, and micro-segmentation needs of large distributed organizations.

We’ve wanted to add these capabilities for a long time. The delay has come from the difficulty of designing a system that delivers on all our objectives.

Defining the Problem

To solve hard problems it helps to first take a step back and think about them conceptually and in terms of first principles. We’ve had many discussions with users about rules and micro-segmentation, and have also spent a good amount of time perusing the literature and checking out what other systems can do. Here’s a rough summary of what we came up with:

  • Global Network Rules: it should be possible to define a common set of rules for a network.
  • Security Monitoring: enterprise users often want to be able to watch traffic on a network, possibly leveraging tools like Snort or other IDS and anomalous traffic detection products.
  • Fine Grained Permissions and Network Micro-Segmentation: system administrators in large networks often want to set fine-grained permissions for individual devices or groups of devices. This is like having users, groups, and ACLs on a network.
  • Traffic Priority Control and QoS Stuff: network administrators want to be able to control the priority of traffic on networks to ensure e.g. reliable VoIP operation.

Those are the high level goals that informed our design. Here’s what we did in response to them.

Global Rules and Security Monitoring

Once fine-grained permissions, per-device rules, and device group rules are conceptually separated from the definition of global network behavior it becomes practical to limit the global rules table size to something modest enough to accommodate small devices. While there might occasionally be use cases that require more, we think something on the order of a few hundred rules that apply globally to an entire network is probably enough to address most sane requirements. This is enough space to describe in great detail exactly what traffic a network will carry and to implement complex security monitoring patterns.

So that’s what we did. Keep in mind that at this stage we are intentionally ignoring the need for fine-grained per-device stuff. We’ll pull out some bigger guns to deal with that later.

Now on to security monitoring. If the goal is near-omniscience, there is no substitute for placing a man in the middle and just proxying everything. Unfortunately that’s a scalability problem even inside busy data centers, let alone across wide area networks. But it’s a case we wanted to at least support for those who want it and are willing to take the performance hit.

To support this we added a REDIRECT action to our rules engine. Our redirect operates at the ZeroTier VL1 (virtual layer 1) layer, which is actually under the VL2 virtual Ethernet layer. That means you can send all traffic matching a given set of criteria to a specific device without in any way altering its Ethernet or IP address headers. That device can then silently observe this traffic and send it along to its destination. The fact that this can be done only for certain traffic means the hit need only be taken when desired. Traffic that does not match redirection rules can still flow directly.
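
For a rough feel of what this looks like in practice, here is a minimal sketch in the rule editor syntax illustrated earlier on this page. The observer address beefbeef99 is a placeholder, and the exact keyword the editor uses for the REDIRECT action is an assumption here, not final syntax.

# Detour all IPv4 traffic through an inline inspection node before it
# reaches its destination. Ethernet and IP headers are left untouched;
# the redirect happens underneath them at VL1.
redirect beefbeef99 ethertype 0x0800;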

Now what about a lower overhead option? For that we took some inspiration from Linux’s iptables and its massive suite of capabilities. Among these is its --tee option, which allows packet cloning to remote observers.

We therefore added our own TEE, and like REDIRECT it has the advantage of operating at VL1. With our packet cloning action every packet matching a set of criteria, or even the first N bytes of every such packet, can be sent to an observer. Criteria can include TCP flags. This lets network administrators do efficient and scalable things like clone every TCP SYN and FIN to an observer to watch every TCP connection on the network without having to handle connection payload. This allows a lot of network insight with very minimal overhead. A slightly higher overhead option would involve sending, say, the first 64 bytes of every packet to an observer. That would allow the observation of all Ethernet and IP header information with less of a performance hit than full proxying.
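
Here is a hedged sketch of that pattern in the same editor syntax. The observer address is again a placeholder, and the tcp_fin characteristic name is an assumption made by analogy with the tcp_syn and tcp_ack characteristics used in the example earlier on this page.

# Clone only the first 64 bytes of every TCP SYN and every TCP FIN to a
# passive observer, so it can log connection open/close events without
# ever seeing payload.
tee 64 beefbeef03
	chr tcp_syn
	or chr tcp_fin
;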

But what about endpoint compromise?

Our security monitoring capabilities can never be quite as inescapable as a hardware tap on a physical network. That’s because ZeroTier is a distributed system that relies upon endpoint devices to correctly follow and enforce their rules. If an endpoint device is compromised its ZeroTier service could be patched to bypass any network policy. But the fact that rules are evaluated and enforced on both sides of every interaction allows us to do the next best thing. By matching on the inbound/outbound flag in our rules engine and using other clever rule design patterns it’s possible to detect cases where one side of an interaction stops abiding by our redirect and packet tee’ing policies. That means an attacker must now compromise both sides of a connection to avoid being observed, and if they’ve done that… well… you have bigger problems. (A detailed discussion of how to implement this will be forthcoming in future rules engine documentation.)

Fine Grained Permissions

Global rules take care of global network behavior and security instrumentation, but what if we want to get nit-picky and start setting policies on a per-device or per-device-group basis?

Let’s say we have a large company with many departments and we want to allow people to access ports 137, 139, and 445 (SMB/CIFS) only within their respective groups. There are numerous endpoint firewall managers that can do this at the local OS firewall level, but what if we want to embed these rules right into the network?

Powerful (and expensive) enterprise switches and SDN implementations can do this, but under the hood this usually involves the compilation and management of really big tables of rules. Every single switch port and/or IP address or other identifier must get its own specific rules to grant it the desired access, and on big networks a combinatorial explosion quickly ensues. Good UIs can hide this from administrators, but that doesn’t fix the rules table bloat problem. In OpenFlow deployments that support transport-triggered (or “reactive”) rule distribution to smart switches this isn’t a big deal, but as we mentioned up top we can’t do things that way because our network controllers might be on the other side of the world from an endpoint.

On the theory side of information security we found a concept that seems to capture the majority if not all of these cases: capability based security. From the article:

A capability (known in some systems as a key) is a communicable, unforgeable token of authority. It refers to a value that references an object along with an associated set of access rights. A user program on a capability-based operating system must use a capability to access an object.

Now let’s do a bit of conceptual search and replace. Object (noun) becomes network behavior (verb), and access rights become the right to engage in that behavior on the network. We might then conclude by saying that a user device on a capability-based network must use a capability to engage in a given behavior.

It turns out there’s been a little bit of work in this area sponsored by DARPA and others (PDF), but everything we could find still talked in terms of routers and firewalls and other middle-boxes that do not exist in the peer to peer ZeroTier paradigm. But ZeroTier does include a robust cryptosystem, and as we see in systems like Bitcoin, cryptography can be a powerful tool to decentralize trust.

For this use case nothing anywhere near as heavy as a block chain is needed. All we need are digital signatures. ZeroTier network controllers already sign network configurations, so why can’t they sign capabilities? By doing that it becomes possible to avoid the rules table bloat problem by only distributing capabilities to the devices to which they are assigned. These devices can then lazily push capabilities to each other on an as-needed basis, and the recipient of any capability can verify that it is valid by checking its signature.

But what is a capability in this context?

If we were achieving micro-segmentation with a giant rules table, a capability would be a set of rules. It turns out that can work here too. A ZeroTier network capability is a bundle of cryptographically signed rules that allow a given action and that can be presented ahead of relevant packets when that action is performed.

It works like this. When a sender evaluates its rules it first checks the network’s global rules table. If there is a match, appropriate action(s) are taken and rule evaluation is complete. If there is no match the sender then evaluates the capabilities that it has been assigned by the controller. If one of these matches, the capability is (if necessary) pushed to the recipient ahead of the action being performed. When the recipient receives a capability it checks its signature and timestamp and if these are valid it caches it and associates it with the transmitting member. Upon receipt of a packet the recipient can then check the global rules table and, if there is no match, proceed to check the capabilities on record for the sender. If a valid pushed capability permits the action, the packet is accepted.

… or in plainer English: since capabilities are signed by the controller, devices on the network can use them to safely inform one another of what they are allowed to do.

All of this happens “under” virtual Ethernet and is therefore completely invisible to layer 2 and above.

Controlling Capability Bloat with Tags

Capabilities alone still don’t efficiently address the full scope of the “departments” use case above. If a company has dozens of departments we don’t want to have to create dozens and dozens of nearly identical capabilities that do the same thing, and without some way of grouping endpoints a secondary scoping problem begins to arise. IP addresses could be used for this purpose but we wanted something more secure and easier to manage. Having to renumber IP networks every time something’s permissions change is terribly annoying.

To solve this problem we introduced a third and final component to our rules engine system: tags. A tag is a tiny cryptographically signed numeric key/value pair that can (like capabilities) be replicated opportunistically. The value associated with each tag ID can be matched in either global or capability scope rules.

This lets us define a single capability called (for example) SAMBA/CIFS that permits communication on ports 137, 139, and 445 and then include a rule in that capability that makes it apply only if both sides’ “department” tags match.
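
To make that concrete, here is a rough sketch of what such a capability might look like. Treat it as an assumption about the eventual syntax rather than a finished example: the tag and cap block forms, the enum values, and the tdiff comparison (which corresponds to MATCH_TAGS_DIFFERENCE in the list below) are illustrative and may differ from the final documentation.

# A per-member tag identifying which department a device belongs to.
tag department
	id 100
	enum 1 engineering
	enum 2 sales
	enum 3 support
;

# One capability, assigned only to members that should speak SMB/CIFS.
# It opens ports 137, 139, and 445, but only between members whose
# 'department' tags are equal (difference of zero).
cap samba
	id 1000
	accept
		dport 137
		or dport 139
		or dport 445
		and chr tcp_syn
		and tdiff department 0
	;
;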

Think of network tags as being analogous to topic tags on a forum system or ACLs in a filesystem that supports fine-grained permissions. They’re values that can be used inside rule sets to categorize endpoints.

So What Rules Are There?

A rule in our system consists of a series of zero or more MATCH entries followed by one ACTION. Matches are ANDed together (evaluated until one does not match) and their sense can be inverted. An action with no preceding matches is always taken. The default action if nothing matches is ACTION_DROP.
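
As a tiny illustration of those semantics, in the editor syntax from earlier on this page (which compiles down to these raw matches and actions; the observer address is a placeholder):

# An action with no matches is always taken: clone packet headers to an observer.
tee 64 beefbeef01;

# A single match with its sense inverted: drop everything that is not IPv4.
drop not ethertype 0x0800;

# Two matches ANDed together: accept TCP SYNs destined for port 22.
accept chr tcp_syn and dport 22;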

Here’s a list of the actions and matches currently available:

ACTION_DROP Drop this packet (halts evaluation)
ACTION_ACCEPT Accept this packet (halts evaluation)
ACTION_TEE Send this packet to an observer and keep going (optionally only first N bytes)
ACTION_REDIRECT Redirect this packet to another ZeroTier address (all headers preserved)
MATCH_SOURCE_ZEROTIER_ADDRESS   Originating VL1 address (40-bit ZT address)
MATCH_DEST_ZEROTIER_ADDRESS Destination VL1 address (40-bit ZT address)
MATCH_ETHERTYPE Ethernet frame type
MATCH_MAC_SOURCE L2 MAC source address
MATCH_MAC_DEST L2 MAC destination address
MATCH_IPV4_SOURCE IPv4 source (with mask, does not match if not IPv4)
MATCH_IPV4_DEST IPv4 destination (with mask, does not match if not IPv4)
MATCH_IPV6_SOURCE IPv6 source (with mask, does not match if not IPv6)
MATCH_IPV6_DEST IPv6 destination (with mask, does not match if not IPv6)
MATCH_IP_TOS IP type of service field
MATCH_IP_PROTOCOL IP protocol (e.g. UDP, TCP, SCTP)
MATCH_ICMP ICMP type (V4 or V6) and optionally code
MATCH_IP_SOURCE_PORT_RANGE Range of IPv4 or IPv6 ports (inclusive)
MATCH_IP_DEST_PORT_RANGE Range of IPv4 or IPv6 ports (inclusive)
MATCH_CHARACTERISTICS Bit field of packet characteristics that include TCP flags, whether this is inbound or outbound, etc.
MATCH_FRAME_SIZE_RANGE Range of Ethernet frame sizes (inclusive)
MATCH_TAGS_DIFFERENCE Difference between two tags is <= value (use 0 to test equality)
MATCH_TAGS_BITWISE_AND Bitwise AND of tags equals value
MATCH_TAGS_BITWISE_OR Bitwise OR of tags equals value
MATCH_TAGS_BITWISE_XOR Bitwise XOR of tags equals value

Detailed documentation is coming soon. Keep an eye on the “dev” branch.

Limitations and Future Work

The ZeroTier rules engine is stateless to control CPU overhead and memory consumption, and stateless firewalls have certain shortcomings. Most of these issues have work-arounds, but the work-arounds are not always obvious. “Design patterns” for dealing with common issues will eventually be documented along with the rules engine, and we’ll be building a rule editor UI into ZeroTier Central that will help as well.

We also have not addressed the problem of QoS and traffic priority. That presents additional challenges: since ZeroTier is a virtualization system that abstracts away the physical network, it is hard for it to know physical topology. One option we’re considering is to implement QoS field mirroring, allowing ZeroTier’s transport packets to inherit QoS fields from the packets they are encapsulating. That would allow physical edge switches to prioritize traffic. We’re still exploring in this domain and hope to have something in the future.

Comments Are Welcome!

As of the time of this post our rules engine is still under active development. If you have additional thoughts or input please feel free to head over to our community forums and start a thread. Intelligent input is much appreciated since now would be the most convenient time to address any holes in the approach that we’ve outlined above.

The ZeroTier SDK: Build Interoperable Peer-to-Peer Apps with Standard Protocols

As of this week it is possible to connect desktop and mobile apps to virtual networks with the ZeroTier SDK. With our SDK, applications can now communicate peer to peer with other instances of themselves, other apps, and devices using standard network protocols and with only minimal changes to existing network code. (On some platforms no changes at all are required.) The SDK repository at GitHub contains documentation and example integrations for iOS, Android, and the Unity game engine for in-game peer to peer networking using ZeroTier.

The ZeroTier SDK is an evolution of what we formerly called Network Containers, and it still supports the same Linux network stack interposition use case. It’s still in beta, so do not expect perfection. We are innovating here, so excuse the dust.

A Less WIMPy Path to Peer-to-Peer Networking

Most existing P2P apps either engineer their own special-purpose protocols from the ground up or use one or more P2P networking libraries, but in both cases P2P communication is done using a protocol stack and deployment that is peculiar to the app and can only easily interoperate with other instances of the same app. This extends the “WIMP model” of computing (“weakly interacting massive programs,” a play on the hypothetical “weakly interacting massive particle” from physics) into network space yielding programs that cannot interoperate directly.

This makes it hard to build true ecosystems where many programs can combine to provide exponentially increasing value at higher levels. It also means that peer to peer networking is a “special snowflake” in your development process, requiring special code, special protocols, etc. that are wholly different from the ones your app uses to communicate with the cloud. This is one reason many apps simply skip peer to peer. Why build the app’s networking features twice?

The ZeroTier SDK takes a different approach. It combines a lightweight TCP/IP stack (currently LWIP) with the ZeroTier network virtualization core to yield a “P2P library” that tries to be invisible. Our SDK allows apps to connect to each other using the same protocols (and usually the same code) they might use to connect to a cloud server or anything else on a TCP/IP network.

Since ZeroTier also runs on servers, desktops, laptops, mobile devices, containers, embedded/IoT “things,” etc., an app using the ZeroTier SDK can also freely communicate peer to peer with all of these.

The ZeroTier SDK lives entirely in the app. No elevated permissions, kernel-mode code or drivers, or other special treatment by the operating system is needed. This means that P2P apps speaking standard interoperable native network protocols can be shipped in mobile app stores without special entitlements or other hassles.

A Hypothetical Ecosystem

To understand what the ZeroTier SDK enables it helps to imagine how it might be used.

Let’s start by imagining an augmented reality game similar to Pokémon Go. The app is built with the ZeroTier SDK, and all instances of the app join a public virtual network and communicate directly to one another using HTTP and a RESTful API internal to the app. This allows instances of the app to exchange state information directly with lower latency (and at lower cost to the app’s developer) than relaying it through cloud servers.

Now the maker of the game does something interesting: they document the game’s peer to peer RESTful API and allow third party clients to obtain authentication tokens to communicate with running game instances.

Since the application’s peer to peer network runs on ZeroTier, anything else can join it. This includes but is not limited to servers, desktops, laptops, IoT devices, and so on. Developers can now build scoreboards, web apps, secondary or “meta” games, team communication software, or virtually anything else they can imagine. Since these apps can communicate with actual instances of the game in real time, interoperation with the game ecosystem can be extremely fast and extremely rich. Since the game developer does not have to carry all this traffic over a proprietary cloud, this introduces no additional cost burden.

Fast forward a year and there are IoT light bulbs that light up when players are near (with sub-50ms responsiveness), new PC games that extend the augmented reality experience provided by the mobile app into virtual reality worlds, and advanced players have written their own software to help their teams organize and cooperate together.

Beta Testers Wanted

The ZeroTier SDK is in beta and we’re still working to perfect integration on a variety of platforms. Right now we are looking for app and game developers who are interested in working with us. If you just want to take a look feel free to pull the code, but if your interest is more serious drop an e-mail to contact@zerotier.com and we’d be happy to work with you and help you out.

ZeroTier for Decentralists: What it Is, What it Isn’t, and Why

ZeroTier was founded as a “decentralize the Internet” project, and it’s very much in our DNA. In the spirit of the recent Decentralized Web Summit (which we unfortunately could not attend) we wanted to speak to this topic again by explaining how we fit into the larger Internet decentralization effort and responding to a few criticisms we occasionally receive.

The Problem: How to “Connect All the Things!”

It’s 2016. I should be able to take anything (mobile or not) with a chip in it (or an app) and make it talk to any other arbitrary thing in under five minutes.

Without ZeroTier this is very hard and very annoying, and we don’t think the problem is going away soon.

Why? Chiefly because: (1) NAT, (2) slowness of IPv6 adoption, (3) crummy and buggy endpoint routers/firewalls, and (4) IP routes packets to places, not things, and almost everything today is mobile.

While IPv6 helps by scaling IP, it unfortunately does not solve all these problems. (It should have, but that’s another matter.) IPv6 endpoint routers often implement IPv4-like stateful filtering that prevents IPv6 endpoint-to-endpoint connectivity without an intermediary, and traversing this requires essentially the same three-party handshake techniques as traversing IPv4 NAT. Some networks even implement NAT in IPv6 from the mistaken belief that this improves security. IPv6 mobility, a technology for IPv6 IP address portability, could help the mobility problem but is still more or less in draft stage. With IPv6 itself taking forever to deploy, it’s probably not wise to hold your breath for any extensions.

Our Solution: SDN for Earth

Our approach to the connectivity problem is to build a purely software-based virtual switch that abstracts away all this complexity and allows arbitrary network topologies to be provisioned as needed. (Right now it’s VLANs. More advanced SDN/OpenFlow-like features are in the queue.) This global-scale smart switch runs over a peer to peer communication protocol that facilitates end-to-end connectivity under most scenarios and provides seamless relay fallback for cases where this is not possible.

Create a network. Add things to it. You’re done. ZeroTier handles all the rest and gives you a perfectly flat Ethernet network that looks just like WiFi or LAN.

Virtualizing at OSI-model Layer 2 (Ethernet) is our attempt to avoid the XKCD #927 problem that plagues many other alternative-Internet projects. Yes, we’ve created another overlay network (sigh), but it’s an invisible one that’s designed to go away if it’s no longer needed.

We think this is a superior approach vs. application peer to peer libraries and protocols. Instead of solving the connectivity problem with yet another protocol, we provide a shim that lets you forget this problem exists. That way if it ever really does stop existing, you can take the shim away and software complexity can actually decrease. (As a company we’ll survive by working up the stack. In any case we’re not holding our breath since IPv6 isn’t even to 25% yet.)

ZeroTier is not a VPN, though many of our users think it is. It’s useful for that because VPN, SDN, and application peer to peer networking are really all the same problem viewed from different angles. If you think about all these things as special cases of “software defined networking,” they can all be solved together with a common system.

But It’s Not Decentralized!

ZeroTier is minimally centralized, not fully decentralized, and this is the most common criticism we get from the Internet decentralization community. At the core of the ZeroTier network are two geo-distributed “root servers.” These run exactly the same open source code as regular nodes, but they run at stable statically addressed locations on the Internet and are “blessed” as such. Their job is to cache very small bits of information and facilitate connection setup between other peers.

We call these things roots because their role is most closely analogous to DNS root name servers. Some have not inaccurately described our protocol as “dynamic DNS on steroids.”

We chose this architecture because we think it’s the best way to achieve our self-imposed design goals. These are:

  • ZeroTier must “just work” over the Internet as currently deployed. Any configuration item a user must enter or tweak to make it work is a bug.
  • Any device must be able to establish connectivity with any other device in less than a small multiple of actual physical RTT between them, e.g. <2.5s for two devices in North America.
  • Devices must come online in less than one minute. There must be no prolonged “warm-up period.”
  • The endpoint code must be simple enough that it could be ported to run on very small devices, e.g. small sub-1GHz ARM processors with <64MB RAM. This means both storage and code footprint must be minimal.
  • The design must be simple enough to describe in an RFC-type document. (We haven’t done this yet but the code is well commented.)
  • Complexity and statefulness are evil and should be avoided to the greatest extent possible. (For reasons already listed.)
  • Bandwidth use must be light enough to be mobile-friendly.
  • The protocol must be highly resistant to denial of service, sybil, and other kinds of attacks, and it must be possible for us to respond quickly to any such attack if it occurs.
  • It has to scale to Internet size, meaning (conservatively) tens of billions of devices.

That’s a tough set of objectives and it rules out a lot of things. Most distributed hash tables are too stateful and heavy, and without assistance from centralized bootstrap nodes (which look an awful lot like root servers) they usually have warm-up periods. Failure of these bootstrap nodes leads to slow launch and slow connectivity setup, and slow is a synonym for broken. Any distributed systems model that involves replicating a lot of data is out due to small device constraints, with block chains being the most secure but heaviest of these and therefore the least applicable. Simple but chatty protocols like rumor mills also fail due to mobile-unfriendly levels of bandwidth consumption and lack of scalability. All of these systems (except block chains) are very hard to harden against sybil attacks.

This blog post from a few years ago goes into even more depth. More has been learned since then, but the fundamental conclusion hasn’t changed. There are decentralization purists who will reject this design, but for us the advantages are simply too great. We’re not aware of any less-centralized way to ship nearly-stateless global-scale SDN that could (in theory) be ported to run on a smart light bulb.

The other response we have to purists is (as mentioned above) that we avoid lock-in. Using ZeroTier as a transport doesn’t imply eternal dependence on it.

The root servers could be federated, and we sometimes get questions about that. So far our thinking is to make sure the protocol is very stable for an extended period of time before doing this, since once federated a system is significantly harder to change.

Disrupting “Overcast Computing”

The standard model of endpoint connectivity today is to route all traffic through closed vertically integrated cloud backends. This is what, in now-cliche Silicon Valley jargon, we’re trying to disrupt.

We call this “overcast computing.” In this model all endpoint devices are dumb terminals for servers you don’t own and can’t control and changing the brightness of your dining room lights involves a 2000-mile-plus data path. It’s a model that facilitates lock-in, surveillance, and manipulation, and that limits deployment of whole categories of higher-bandwidth applications to large companies with data center economies of scale. It’s a very good model for a small number of huge players but it’s bad for freedom, privacy, and permission-free innovation for the rest of us.

The overcast computing model is not the result of a conspiracy. It came about as a result of the problems we outlined in the first section, namely NAT, network complexity, and mobility. That’s why we decided to tackle the connectivity problem. We think it’s fundamental.

What About Meta-Data?

While ZeroTier encrypts data end-to-end, the root servers can still see meta-data about who’s communicating with whom. (As can your ISP and anything else in between.) This is another popular criticism: we’re not protecting anonymity.

Unfortunately there is no way around this one. Not even further decentralization of the protocol would fix this. In fact it might make it worse, since in a federated or fully decentralized model sybils could act as connection facilitators and log everything more easily.

This is an unavoidable issue because efficient end-to-end connectivity and meta-data anonymity are antagonistic problems. Optimizing for one sacrifices the other. To efficiently route traffic from point A to point B, A needs to know B’s location on the network and vice versa. Meanwhile, to protect anonymity a system cannot offer latencies lower than roughly the time it takes light to travel half the circumference of the Earth, since anything faster allows latency measurements to be used to triangulate position.
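
For a rough sense of scale: half of Earth’s circumference is about 20,000 km, which light covers in roughly 67 ms in a vacuum and closer to 100 ms in fiber, so an anonymity-preserving design would have to pad every path to at least that order of one-way latency no matter how close two peers actually are.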

What’s Next?

Today ZeroTier runs on Linux, FreeBSD, Windows, Mac, iOS, and Android. We’re working on a new release that will include a number of new features and improvements, and after that we plan on releasing our SDK. The SDK project, formerly known as Network Containers, represents the extension of the ZeroTier model to apps, containers, and microservices. It brings peer to peer connectivity to user-space with little or no code modification (on some platforms it can be done without even recompiling) and no elevated privileges, special entitlements, or kernel modules.

After all that the only target left is embedded. We’ve already researched FreeRTOS and SysBIOS and we don’t think this is going to be particularly hard. The ZeroTier core is already independent of any OS-dependent code, so porting it to small devices will just be a matter of scaling down resource consumption a little.

At that point we will have “connected all the things.” For the first time developers will have a viable alternative to “overcast computing” that works seamlessly across all kinds of devices, apps, and platforms.

ZeroTier Central Update – Higher Free Limits, Pricing Changes, New Features

We just upgraded the ZeroTier Central software behind my.zerotier.com to a new version: 0.8.0. With this upgrade we bring UI improvements, bug fixes, new features, and many improvements under the hood.

We’ve also adjusted our pricing.

Free accounts can now add up to 100 devices (up from 10), and unlimited networks. (Subscriptions are now per-account, not per-network.) The price for new paid accounts has risen to $29.00/month (USD), but users with existing subscriptions will continue to pay whatever they’ve been paying. If you have one paid network for $4/month, you’ll keep paying $4/month. If you have two for $8/month, you’ll still pay $8/month. This discount will remain in effect as long as you don’t cancel your subscription. (If you actually want to pay the new rate to help support the project, cancel and re-subscribe.)

We think our new higher free limits and new pricing better reflect the split between personal and professional users. Free accounts get plenty of devices for almost any personal use case, while paid accounts get unlimited devices, special features, and services.

The first new feature for paid accounts is device health monitoring. ZeroTier Central can now notify you by e-mail or SMS (if you have it configured) if important devices on your virtual networks go down. To enable this feature on a paid account, visit the network control panel and use the pull-down menu on a device to enable monitoring.

Additional new features in development for paid users include a Slack bot, exit gateway as a service (ZeroTier One 1.1.6 will support full tunneling), web and port forwards into virtual networks, and more. We’ll be announcing these on our blog and on Twitter when they’re live.

Our payment system now supports Alipay. This should make it easier for our Chinese and other Asia-Pacific users to subscribe.

In other news our iOS VPN app is entering beta, and major new releases of our core ZeroTier One software and our Network Containers project are on the way.

Edit: since a few have asked: yes, of course you can always run your own network controller. Pricing applies only to our hosted controller and web UI. Visit our community for community support and HOWTOs on building and operating the controller code.

Connect All the Things: Year Four Recap, and 2016 Plans

Year Four: “I” becomes “We”

2015 was a big year. It was the year ZeroTier graduated from side project incubation.

ZeroTier began in early 2012 as a personal open source project. It arose from two pain points. The first was social and political. Like many I thought the Internet was becoming too centralized, and I figured the difficulty of directly connecting endpoints must be a contributing factor. If things can’t connect directly they have to make use of intermediaries, and this creates more niches for middle-boxes than perhaps ought to exist. The second pain point was professional. In my day job I struggled with the pain of networking: tunnel spaghetti, an alphabet soup of needlessly narrow hard-to-deploy protocols, terrible user (and developer) experience at every level of the stack, and endless ugly hacks to get around badly thought out physical topologies.

I wanted something that made networks as easy to create and join as IRC channels or chat rooms. I’d been entertaining some of my own ideas about peer to peer networking since 2009 so I figured I’d take a serious crack at the problem of simple direct end-to-end networking.

The design I settled on is one of pragmatically minimal centralization, trading just a bit of decentralization for huge gains in speed and user experience. “Decentralize until it hurts, then centralize until it works.” Implementing and testing, the design seemed to work almost shockingly well. A bit of math told me it would scale. I kept going.

The first usable alpha was pushed to GitHub in July 2013, and the first packaged beta binaries for end-user use were released for Macintosh and Linux in February 2014.

The original pitch I made to myself when I decided to create this was to “do a Dropbox” on VPNs. Dropbox achieved amazing success by solving an un-trendy un-sexy problem everyone thought was already solved: file sync. “But can’t I just send an e-mail attachment or upload my file to a web site?” Networking seemed like a similar domain. The problem of virtual networking is already solved by a lot of awful stuff everyone hates.

After going live, ZeroTier got a lot of positive attention. Since I started tracking the number of active devices online in early 2014, it’s had sustained ~10% month-over-month growth in user base.

In early 2015 the product went out and found its own funding. I’d been considering seed financing or crowdfunding, but two users approached me first. “I” is now “we,” the product has received quite a bit of polish, and the vision is a lot larger. What began as a “better VPN” has evolved into a vision of universal software defined networking in every setting and on every device.

Think of Earth as a data center and ZeroTier as a smart switch with cryptographic authentication, VLAN capability, and soon many other features. That’s what we’re building. 2015 was about getting it ready for a larger stage and converting the first few large-scale customers. 2016 will be about enterprise networking, applications, and beyond.

Enterprise-Grade Reliability

The Internet is only nominally peer to peer. The way we’ve deployed it makes it amazingly hostile to direct connectivity. This is partly a result of security needs. Since so much software is so broken security-wise, an ugly ham-fisted hack called a firewall has become indispensable standard practice. The second reason for the Internet’s hostility to direct communication is a kind of historical feedback loop. Since most Internet applications so far have been client-server at the protocol level, we’ve invested a lot of time and effort into making the Internet reliable for endpoints to act as clients for servers that live in “the cloud.” Comparatively little attention has been paid to the reliability or ease of endpoint-to-endpoint communication.

Developing a reliable direct networking solution that works in today’s real world is probably comparable in difficulty to making a distributed database that won’t get called “maybe.” Consider what stateful NATs might do if they run out of IP:port mappings for a particular endpoint on their network (or globally). It turns out that some implement LIFO behavior, forgetting the most recently learned mappings. Others implement FIFO behavior, expiring old ones. Finally at least a few seem to just forget them randomly and completely irrespective of whether they are being actively used for traffic. Router/gateway manufacturers simply haven’t put much thought into the reliability of these systems for this use case. After all everyone just uses the Internet to make short lived connections to web servers on port 80 or 443, right?

Building a direct networking layer means dealing with gremlins who constantly run around yanking network cables. Now throw in mobility, battery life concerns, ISP traffic shaping, and bandwidth quotas. Most people who try to do this run away screaming. This is a problem with a body count.

Here at ZeroTier we’ve taken a solemn oath to field a peer to peer network that can work reliably enough across both stationary and mobile devices (and in the cloud, etc.) to be trusted for things like point of sale networks and self-driving vehicles.

What we have so far is pretty good, but it’s not that good. We still see edge case flakiness, especially on less-standard networks. Getting that last few percent of reliability is going to take bigger guns. It will take data, analytics, and a more methodical and scientific approach.

The first step is for us to start turning on something called circuit testing. We quietly introduced it as a feature in version 1.1.0, and it allows the network controllers behind networks you’ve joined (making it opt-in) to generate test traffic between members on the same network. This traffic is small and invisible and doesn’t interfere with ordinary network operations, and it reports back some very basic statistics about what is talking to what through what network paths and how often it’s working (see the link for details).

One thing we’ve discovered is that there’s only so much we can do with scripted, synthetic scenarios and tests. The real world is full of network equipment that behaves strangely only under load or only when certain conditions arise. It’s also full of weirdly configured networks. Our customers sometimes encounter problems in the field because their network configurations are so perverse we’d never have imagined building such a thing as a test scenario. (Example: we’ve discovered that stacking multiple different brands of NAT routers is common. This causes networking mandelbugs, but hey, the web still works so the Internet is still up.) Circuit testing will provide real data that will allow us to home in on problems and edge cases that are unanticipated or hard to reproduce in the lab.

The second thing circuit testing will let us do is to provide SRE (site reliability engineering) as a service for distributed networks. In the next few months we’re going to be unveiling this as a product offering. If you want to use a ZeroTier network in a mission critical setting, we can monitor it and react to problems before you notice them.

These two things work together. When we address problems in monitored networks we’ll be taking what we learn and building it into the product. Solve, improve, repeat. Ultimately it’s not possible to exceed the reliability of the underlying physical network, but we do think it’s possible to converge with it to within tiny fractions of a percent. That is our goal.

We’re aware that a few users might have privacy concerns about this, but we’ve already been quite up front about the fact that we are not an anonymity solution like Tor. ZeroTier has encryption and using it doesn’t require you to make an account on zerotier.com, but the protocol doesn’t conceal network meta-data. Nothing less than Tor-like onion routing can achieve that, and onion routing comes with speed penalties and other issues.

Circuit testing only gathers meta-data, and only network controllers for networks you have joined are allowed to do it. As with our core design, we adhere to a philosophy of pragmatic multi-objective optimization. It’s okay to sacrifice just a little bit of one thing for a lot of another. We think making ZeroTier absolutely rock solid is worth a little bit of already easy to obtain public network information. The alternative is to never adequately solve the problem of reliable peer to peer networking, and if that isn’t done we are guaranteed a future in which all traffic is man-in-the-middle’d by design.

Rules and Policies

Right now ZeroTier virtual networks emulate flat Ethernet. It’s possible to make them private and allow only approved devices (enforced with automatically issued certificates) and to set minimal policies around what types of traffic they can carry (IPv4, IPv6, etc.). That’s good enough for a huge array of use cases, but many enterprise users and special purpose use cases require more control.

That’s why in 2016 we’re going to be adding a rules engine to ZeroTier. It will allow network-wide traffic flow rules to be set in a manner similar to a simple OpenFlow-enabled smart switch or “iptables.” You’ll be able to only allow certain IP ports or protocols, prohibit lateral traffic if your application actually is client-server, and so on.
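
For a rough, hypothetical taste of what we mean, here is a sketch in the draft rule editor syntax shown earlier on this page. The ztsrc/ztdest keywords and the server address fedcba9876 are placeholders and assumptions, not final syntax.

# Only allow traffic to or from a single server at ZeroTier address
# fedcba9876; lateral client-to-client traffic is dropped.
accept
	ztdest fedcba9876
	or ztsrc fedcba9876
;
drop;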

Rules would be enforced by all members of a network, so compromise of a single device wouldn’t permit them to be broken. To break the rules would require global compromise of a large number of hosts or of the network controller. If that happens you have bigger problems than network-level rule enforcement.

You’ll also be able to conduct security monitoring in your own networks by setting up rules that “tee” traffic of interest to observers. For example: by watching all TCP SYN packets an observer could see all TCP connections on a network without having to actually back-haul and man-in-the-middle all payload traffic. At least some of the security monitoring benefits of centralized choke points can be achieved without centralized choke points.

SDN for the Cloud and Data Centers

One of the odd beliefs we have here at ZeroTier is that VPN, SDN, and peer to peer networking are all the same problem framed in different ways. We’re evolving into an enterprise networking company because doing the crypto-hippie Internet decentralization thing well turns out to be the same as doing SDN well.

What ZeroTier does on the Internet, it can also do on the intranet. If these really are all the same problem then you can solve them all with one stack and manage them all with one interface.

We already have a number of users using ZeroTier for hybrid cloud. It’s a clear use case: add your cloud resources and your on-premise resources to the same virtual network and you have a location-independent private backplane. Some users also use it to mix and match cloud resources, allowing them to spread their infrastructure across hosts for better diversity or reduced cost.

In 2016 and beyond we plan to do more in this area. Our rules and policies work will make ZeroTier a distributed competitor to OpenFlow-enabled switches. Federation for root servers will allow on-premise hosting of ZeroTier’s “upstream” for reduced latency and better reliability during Internet outages. Finally, we’re researching options for closing the performance gap between ZeroTier and things like VXLAN and IPSec; one would be to implement ZeroTier as a kernel module, while another would be to bypass the kernel entirely.

Standards Based Peer to Peer Networking

Last month we somewhat quietly released a beta of Network Containers. As the name suggests, we’re initially targeting Docker, rkt, LXC, and runC containers with this technology. It lets you package a complete user-mode network stack inside the container, allowing network virtualization to be deployed without special host access. This is particularly well suited to mixed or multi-tenant container hosting infrastructures where single hosts might run containers belonging to more than one customer, department, or subsystem.

While containers are big hype these days, we weren’t thinking about these days when we built netcon. We were thinking about those days, the ones that come after these.

Our private mission statement is to “directly connect the world’s devices.” By that we mean all of them. We want to make it easy to create arbitrary network topologies joining anything with a CPU or that runs on something with a CPU.

While Network Containers has gained some attention in Linux devops circles, our larger vision for its future is in applications. In 2016 we plan to introduce the ZeroTier Application SDK for both desktop and mobile.

Network Containers will be a central part of our SDK and will allow an application to join virtual networks without kernel support or elevated permissions. Even better, you’ll be able to communicate over these networks as easily as you can communicate over the Internet. Instances of your app will be able to communicate securely with one another or with anything else that can join a ZeroTier network, using standard TCP/IP-based protocols and standard network I/O calls and libraries. In most cases you won’t even have to recompile your code. Just add the SDK to your build path and add a few lines to enable it.

There are clear uses for this today like scaling, avoiding bandwidth costs, and easing interoperability. In the future we think there will be more. Thanks to Einstein, cloud back-haul introduces an unavoidable latency penalty. If your data must travel over a thousand miles to and from a data center to reach another computer in the same city, you’re adding a mandatory 20-60 milliseconds to its round trip time. On the horizon are user interface paradigms like virtual and augmented reality. Achieving the best possible immersive experience using these technologies requires latency minimization and therefore shortest path direct networking. These applications are also going to be bandwidth intensive. If it’s expensive to back-haul pictures and video today, it’s going to be even more costly and inconvenient to pay the indirect networking tax for all the users of an immersive telepresence or virtual world application. Massive companies might be able to afford it, but the requirement that all data for everything flow through the cloud imposes an unacceptable cost on small independent developers. Historically these are the ones who innovate most in emerging areas like VR.

An Internet of Things You Actually Own

There’s not much of a gap between Network Containers and porting ZeroTier to embedded devices. If anything, the former was probably harder. Correctly emulating the POSIX socket API was not easy (and it’s not quite done yet).

For embedded Linux that effort often isn’t even required. It can be trivial to run ZeroTier on ARM-based Linux-powered devices. We’re already talking to the makers of IP cameras and other bandwidth-intensive devices that could benefit from a faster direct networking alternative to the “put everything in the cloud” status quo. Why should video from a baby monitor travel 1,500 miles to and from a data center to reach your phone thirty feet away in your bedroom?

Beyond speed, latency, and cost, we foresee other benefits that revolve around privacy and user control. The first age of personal computing, which lasted from the late 1970s to the late 2000s, revolved around the personalization of information processing through personal ownership of “a computer.” The grey box (or laptop) was the center of an individual’s computing world.

The Internet and the cloud (a.k.a. mainframe 2.0) have changed all that and have pushed things toward a centralized model where your devices orbit closed cloud-hosted services.

Here at ZeroTier we’ve spent a bit of time pondering what PC 2.0 might look like. One thing that’s clear is that the amount of “silicon per capita” is increasing, and any one single device is no longer the center of a person’s computing world. If computing is to become personal again, it’s going to do so through the personalization of the cloud.

What if instead of a grey box each person could own one or more private network envelopes into which they could place all their devices? This would be ideally suited to devices with a more open design, devices that are designed for you to control rather than being tethered to an opaque monolith in the cloud.

Once we release our SDK we will be positioned to start realizing this. It will become possible to embed ZeroTier in a device and also in an app for accessing that device. From there we will be exploring ways of making network virtualization even easier to use, allowing non-tech-savvy users to control their own network boundaries intuitively.

Standardizing the ZeroTier Protocol

“But isn’t ZeroTier itself a closed silo?”

Our software and protocols are open. We plan to keep some enterprise software and SaaS offerings private, but when it comes to the core endpoint connectivity code there is nothing up our sleeves. Still, our “pragmatically minimal centralization” model does get some push-back, and as we move forward we’d like to take measures to address some of our users’ concerns.

The core ZeroTier protocol has been fairly stable for a while. Once we address some lingering issues and are even more confident in its stability we plan to write it up as an RFC. That will allow third party implementations and interoperability, though we would still maintain its reference implementation.

That leaves the root servers. We’re not quite sure exactly what we’re going to do there yet, but we are exploring options such as the creation of an independent non-profit entity or consortium to manage them like the root name servers. We don’t make money off them directly, so we don’t think this would impact our revenue plans very much. If anything it would likely help us convert more customers by providing stronger assurances of the network’s long term stability and viability. We’ll just have to take care to balance this against security, performance, stability, and innovation.

In the long run we’d like the ZeroTier network virtualization protocol to be as much a part of the Internet as DNS, so maybe following that model is best.

Staying Up to Date and Supporting the Project

The best ways to stay up to date with ZeroTier are to follow us on Twitter and to subscribe to this blog’s RSS feed. You can also follow the main ZeroTierOne repository on GitHub and join our new community. It’s just getting off the ground, but we hope it will soon evolve into an active place where users and interested parties can discuss issues, improvements, and future directions.

If you want to support our work there are several things you can do. The simplest is to use it, report issues and provide feedback, and tell other people about it. If you want to support it financially, create an account on our hosted network controller interface and you can subscribe to paid network service. You’ll also get an e-mail announcement when our new enterprise offerings are available.

Meet Alice and Bob: ZeroTier’s New Global Root Server Infrastructure

ZeroTier is growing, and to facilitate that growth we’ve just made a significant upgrade to our root server infrastructure.

While the ZeroTier network is peer to peer, it relies upon a set of designated core nodes whose role and responsibility is very close to that of the DNS root name servers, hence the similar terminology. For those who are curious about the technical background of this design, this old blog post by ZeroTier’s original author explains the reasoning process that led to the system as it is today. In short: while devices encrypt end-to-end and communicate directly whenever they can, the root server infrastructure is critical to facilitating initial setup.

Since ZeroTier’s launch its root server roles have been filled by four geographically distributed, independent nodes. For some time now these have been located in San Francisco, New York, Paris, and Singapore. Each ZeroTier device announces its existence to all four and can query whichever seems to offer the lowest latency. While this setup is simple and robust and has served us well, it presents a scalability problem: as the network grows, how do we add more root servers without requiring clients to make more announcements? We could always just make those four roots bigger, but scaling and robustness will eventually demand the ability to grow elastically and to add more locations.

Since our user base is quite global, we also wanted to cover more of the world. Those four locations are great for America and Europe, but they left much of Asia, Australia, Africa, the Middle East, India, and South America with 200ms+ ping times to the nearest root. That’s not nearly as bad as it would be if ZeroTier were a completely centralized “back-haul all traffic to the cloud” protocol, but it still meant perceptibly slower connection setup and sign-on times for many users.

Anycast to the Rescue! Strikes Out

Early in 2015 we started to explore the possibility of placing our root servers behind IPv4 and IPv6 anycast addresses. ZeroTier is a UDP-based protocol like DNS and SIP, two other protocols that are frequently deployed this way. Global anycast would allow us to put the whole root infrastructure behind one or maybe two IPs and then add as much actual capacity as we want behind that facade. The advantages at first seemed obvious: a single IP would play well with the need to work behind NAT, and the global Internet infrastructure and BGP would (we thought) take care of geographic route optimization for us.

We went to the trouble of heavily researching the topic and even obtained a provisional allotment of address space. But as we talked with numerous cloud providers and ISPs, it quickly became apparent that actually deploying anycast, and doing it well, is painful and expensive. Hundreds of gigabits of cloudy goodness can today be had for a few hundred dollars per month, but with the complexities of anycast and BGP sessions the cost quickly balloons to thousands of dollars per month for significantly less actual capacity. Chief among the cost drivers (and elasticity killers) is the requirement by nearly all ISPs that anycast hosts be dedicated servers. BGP is also a bit more finicky in practice than it seems in theory. Conversations with experienced BGP administrators convinced us that getting anycast announcement and high-availability fail-over to work reliably, and to deliver optimal routing across the world, is deceptively hard: easy to prototype and get operational, but hard to tweak, optimize, and make truly robust.

BGP also introduces a security concern: if our entire root infrastructure sits behind a single IP block with a single ASN, then a single BGP hijack or poisoning attack could take it down. “Legitimate” attacks become easier too: it’s a lot easier for a censorship-happy regime to blacklist one IP block than to maintain a list of addresses scattered across blocks belonging to multiple cloud providers.

While we can see uses for anycast in the future for other potential services, a deep study of the topic made us start thinking about other options.

Then we remembered we were a software defined networking company.

ZeroTier Clustering

Clustering (also known as multi-homing) has been on the ZeroTier feature queue for quite some time. Since ZeroTier’s virtual addressing is defined entirely by cryptographic tokens, there is nothing that glues an address to a physical endpoint. That’s why I can switch WiFi networks and be back on my virtual networks in less than a minute. It also means that nothing prevents a single ZeroTier “device” from being reachable via more than one IP address. IPs are just paths and they’re completely ephemeral and interchangeable.

While the idea’s been around for a while, implementation is tricky. How are peers to determine the best path when faced with a set of possibilities? How are new paths added and old ones removed?

The thing we were stuck on was the idea that the initiator of the link should be the one making these decisions. As we were mulling over potential alternatives to anycast, a simple thought came to us: why not let the recipient decide? It has more information.

The contacting peer knows the recipient’s ZeroTier address and at least one IP endpoint. The recipient on the other hand can know all its endpoints as well as the endpoints of the contacting peer and the health and system load of all its cluster nodes. From that information it can decide which of its available endpoints the peer should be talking to, and can use already-existing protocol messages in the ZeroTier protocol (ones designed for NAT traversal and LAN location) to send the contacting peer there. Several metrics could be used to determine the best endpoint for communication.

After a few days of coding an early prototype was running. Turns out it took only about 1500 lines of C++ to implement clustering in its entirety, including cluster state management and peer handoff. Right now this code is not included in normal clients; build 1.1.0 or newer with ZT_ENABLE_CLUSTER=1 to enable it. The current version uses a geo-IP database to determine where peers should be sent. This isn’t flawless but in practice it’s good enough, and we can improve it in the future using other metrics like direct latency measurements and network load.
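
To make the idea concrete, here is a rough, hypothetical C++ sketch of the recipient-side selection step described above. It is not ZeroTier’s actual clustering code: the structure names, the load metric, and the crude distance function are all invented for illustration, standing in for whatever the real engine derives from its geo-IP database and shared cluster state.

    #include <cmath>
    #include <limits>
    #include <string>
    #include <vector>

    // One entry per cluster member, populated from shared cluster state.
    struct ClusterMember {
        std::string endpoint;   // public IP:port this member listens on
        double lat, lon;        // approximate location from a geo-IP database
        double load;            // normalized load (0.0 = idle)
        bool healthy;           // health check status
    };

    // Crude "how far apart are these?" metric; good enough for ranking members.
    static double geoDistance(double lat1, double lon1, double lat2, double lon2) {
        const double dlat = lat1 - lat2, dlon = lon1 - lon2;
        return std::sqrt(dlat * dlat + dlon * dlon);
    }

    // Called by whichever member a new peer happens to contact first: pick the
    // endpoint the peer *should* be using, given the peer's approximate location
    // (looked up from its source IP). If the winner isn't the local member, the
    // peer is redirected there using an existing protocol message.
    std::string pickEndpointFor(double peerLat, double peerLon,
                                const std::vector<ClusterMember> &members) {
        double bestScore = std::numeric_limits<double>::max();
        std::string best;
        for (const ClusterMember &m : members) {
            if (!m.healthy)
                continue;
            // Weight distance by load so busy members shed new peers.
            const double score =
                geoDistance(peerLat, peerLon, m.lat, m.lon) * (1.0 + m.load);
            if (score < bestScore) {
                bestScore = score;
                best = m.endpoint;
            }
        }
        return best;
    }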

Clustering allows us to create a global, elastically scalable root server infrastructure with all the characteristics we initially sought through anycast, but without BGP-related security bottlenecks or management overhead, and using cheap commodity cloud services. Right now the clustering code has only been heavily tested for this specific deployment scenario, but in the near future we plan to introduce it as something you can use as well. The same clustering code that now powers Alice and Bob could be used to create geographically diverse, high-availability clustered services on virtual networks. Any form of clustering is challenging to use with TCP, but UDP-based protocols should be a breeze as long as they can be backed by distributed databases. We’ll also be introducing clustering for network controllers, making them even more scalable and robust (though they can already be made very stable with simple fail-over).

Alice and Bob

Using our new clustering code we created two new root “servers”: Alice and Bob. The names Alice and Bob seemed a natural fit since these two names are used as examples in virtually every text on cryptography.

If clustering lets us spread each of these new root servers out across as many physical endpoints as we want, then why two? Answer: even greater redundancy. The old design had four completely independent shared-nothing roots. That means that a problem on one would be very unlikely to affect the others, and all four would have to go down for the net to experience problems. Introducing clustering means introducing shared state; cluster members are no longer shared-nothing or truly independent. To preserve the same level of true systematic redundancy it’s important that there always be more than one. That way we can do things like upgrade one, wait, then upgrade the other once we’ve confirmed that nothing is wrong. If one experiences serious problems clients will switch to the other and the network will continue to operate normally.

A goal in our new infrastructure was to offer sub-100ms round trip latency to almost everyone on Earth. Alice and Bob are both (as of the time of this writing) six node clusters spread out across at least three continents.

Alice: Amsterdam (Netherlands), Johannesburg (South Africa), New York (USA), São Paulo (Brazil), San Francisco (USA), Singapore
Bob: Dallas (USA), Frankfurt (Germany), Paris (France), Sydney (Australia), Tokyo (Japan), Toronto (Canada)

We’ve done some latency measurements, and the locations above bring us pretty close to that goal. There’s a gap in the Middle East and perhaps Northern India and China where latencies are likely to be higher, but they’re still going to be lower now than they were before. If we see more users in those areas we’ll try to find a good hosting provider to add a presence there.

Alice and Bob are alive now. They took over two of the previous root servers’ identities, allowing existing clients to use them with no updates or configuration changes. While clients back to 1.0.1 will work, we strongly recommend upgrading to 1.1.0 for better performance. Upgrading will also give you full dual-stack IPv4 and IPv6 capability, which the new root servers share at every one of their locations.

Before taking the infrastructure live we tested it with 50,000 Docker containers on 200 hosts in four Amazon EC2 availability zones making rapid-fire HTTP requests to and from one another. The results were quite solid, and they showed that the existing infrastructure should be able to handle 10-15 million devices without significant upgrades. Further projections suggest that the existing cluster architecture and code could theoretically handle hundreds of millions of devices using larger member nodes. Going much beyond that is likely to require something a bit more elaborate, such as a shift from simple member-polling to something like the Raft consensus algorithm (or an underlying database that uses it). But if we have hundreds of millions of devices, that’s a problem we’ll have more than enough resources to tackle.

Epilogue: The Value of Being Small

Necessity really is the mother of invention. If we were a giant company we’d have just gone the anycast route, since it seemed initially like the “right solution.” But being a comparatively poorer startup, we balked at the cost and the management overhead. Instead we set ourselves to thinking about whether we could replace the big sexy “enterprise” way of anycast with something more clever that ran on top of the same commodity cloud and VPS services we’d used so successfully in the past. In the end that led to a system that is smarter, more scalable, faster, more robust, more secure, and at least an order of magnitude cheaper. We also got a new technology we can soon make available to end users to allow them to create multi-homed geo-clustered services on virtual networks. (More on that soon!)

That’s one of the reasons startups are key to technology innovation. Big players will do things the established way because they can afford it, both in money and in personnel. Smaller players are required to substitute brains for brawn.

ZeroTier 1.1.0

ZeroTier 1.1.0 has been released. The full technical release notes can be found on GitHub. This release marks a major development milestone toward “enterprise” readiness and contains three new features that are largely invisible to the user in ordinary day-to-day use but have profound future implications.

Circuit Testing

Network controllers can now send a special probe message called a circuit test. ZeroTier endpoint nodes will only respond to these probes if they’re cryptographically signed by a network controller for a network they’ve joined (making circuit tests effectively opt-in). These probes contain a list of ZeroTier addresses to which they should be forwarded. Each node, upon receiving a valid signed circuit test, removes itself from this list and forwards the test along. It also reports back to the network controller with a little bit of information about how it received the probe and where it was sent next.
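
In code terms, the per-node handling described above amounts to something like the following hedged C++ sketch. None of these type or function names come from the ZeroTier source; the verification, reporting, and send routines are placeholder stubs standing in for whatever the real engine uses.

    #include <cstdint>
    #include <deque>
    #include <string>

    struct CircuitTest {
        uint64_t networkId;         // network whose controller signed the test
        std::string signature;      // controller's signature over the test
        std::deque<uint64_t> hops;  // remaining ZeroTier addresses to visit
    };

    // Placeholder stubs for the node's own state and transport layer.
    bool joinedNetwork(uint64_t)                       { return true; }
    bool signatureValid(const CircuitTest &)           { return true; }
    void reportToController(uint64_t, uint64_t, uint64_t) {}
    void sendToPeer(uint64_t, const CircuitTest &)     {}

    void handleCircuitTest(CircuitTest t, uint64_t myAddress, uint64_t receivedFrom) {
        // Opt-in: ignore tests not signed by a controller of a network we've joined.
        if (!joinedNetwork(t.networkId) || !signatureValid(t))
            return;

        // Remove ourselves from the front of the hop list...
        if (!t.hops.empty() && t.hops.front() == myAddress)
            t.hops.pop_front();
        const uint64_t nextHop = t.hops.empty() ? 0 : t.hops.front();

        // ...tell the controller how we received the test and where it goes next...
        reportToController(t.networkId, receivedFrom, nextHop);

        // ...and forward it along if there's anywhere left to go.
        if (nextHop != 0)
            sendToPeer(nextHop, t);
    }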

If you have a problem with your Internet connection, your ISP can perform remote network diagnostics before it actually sends a truck to your location. Circuit tests allow ZeroTier network controllers to do the same. Right now documentation on this feature is sparse, but in the future we’ll be adding more, along with GUI support in ZeroTier Central. This lets ZeroTier, acting as a virtual ISP, offer similar diagnostic services to enterprise clients, and since it’s part of the normal open source code base, DIY users can use the same functionality to perform remote diagnostics too.

NDP Emulation and Multicast-Free IPv6 Networks

The second new feature we’ve introduced is IPv6 NDP emulation for RFC4193-numbered networks. If that sounds cryptic and mysterious, don’t worry: it doesn’t affect ordinary network traffic.

NDP stands for Neighbor Discovery Protocol, the IPv6 equivalent of IPv4’s ARP. It’s how devices on an Ethernet network (physical or virtual) determine the Ethernet MAC address of a device when given its IP address.

For IPv4, ZeroTier lets you assign IP addresses from a configured assignment range on each network, in a manner similar to DHCP. We now also support an RFC4193-based IPv6 assignment scheme, but unlike IPv4 you don’t have to define specific private address ranges or parameters: IPv6’s address space is so large that these values can be determined automatically. ZeroTier-managed local IPv6 addresses follow this template:

fdNN:NNNN:NNNN:NNNN:NN99:93AA:AAAA:AAAA (/88)

In the address above, the N’s are the 64-bit ZeroTier network ID and the A’s are the 40-bit ZeroTier device address of a specific device on that network.
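
As a quick illustration of that layout, here is a small, hedged C++ sketch (written for this post, not taken from ZeroTier’s code) that packs a network ID and device address into an address of that form. The example values are the public “Earth” network ID and a made-up device address.

    #include <cstdint>
    #include <cstdio>

    // Pack fd + 64-bit network ID + 0x99 0x93 + 40-bit device address
    // into the 16 bytes of an IPv6 address, following the template above.
    void makeRfc4193Address(uint64_t networkId, uint64_t deviceAddress, uint8_t out[16]) {
        out[0] = 0xfd;
        for (int i = 0; i < 8; ++i)                 // bytes 1-8: network ID
            out[1 + i] = (uint8_t)(networkId >> (56 - 8 * i));
        out[9]  = 0x99;
        out[10] = 0x93;
        for (int i = 0; i < 5; ++i)                 // bytes 11-15: 40-bit device address
            out[11 + i] = (uint8_t)(deviceAddress >> (32 - 8 * i));
    }

    int main() {
        uint8_t a[16];
        // 8056c2e21c000001 is the public Earth network; 1234567890 is a made-up device.
        makeRfc4193Address(0x8056c2e21c000001ULL, 0x1234567890ULL, a);
        for (int i = 0; i < 16; i += 2)
            std::printf("%02x%02x%c", a[i], a[i + 1], i < 14 ? ':' : '\n');
        // Prints fd80:56c2:e21c:0000:0199:9312:3456:7890
        return 0;
    }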

This assignment scheme guarantees unique addresses, which is a nice feature in itself. With IPv4 you always have to worry that the local addressing scheme of a network you’ve joined conflicts with that of another network or of your local LAN. But the size of IPv6 addresses lets us do something else. Notice that both the network ID and the ZeroTier device address are embedded in the address. If the device ID is embedded in the address, then IPv6 NDP is technically unnecessary and can be skipped. That is exactly what ZeroTier now does, but only for addresses of the above form and only on networks where this addressing scheme is enabled. On these networks, if your client sends an IPv6 NDP query for such an address, the local ZeroTier engine immediately responds with a locally crafted, emulated NDP reply. To your OS or application it looks just like normal network behavior, but the actual sending and receiving of a multicast lookup query has been skipped.

Emulating NDP in this way has three advantages. It completely prevents man-in-the-middle and denial-of-service attacks via NDP packet spoofing. (Such attacks are already difficult on virtual networks, since MACs are not spoofable.) It speeds up connection establishment by letting the client skip a multicast query. Last but not least, it completely eliminates the need for multicast if you’re only using these IPv6 addresses on a given network.

Multicast is more complex and costly than unicast in bandwidth, CPU, code footprint, and memory. Having an addressing scheme that eliminates the need for multicast opens up the potential for a lighter-weight multicast-free build of ZeroTier for small embedded and mobile devices. This would be attractive for Internet-of-things applications. With this address mode we can combine an ultra-light build of ZeroTier with an embedded IPv6 stack and provide a virtual network endpoint for things like thermostats, home alarm systems, and appliances. We’ll be exploring this in 2016 as we begin work on a planned IoT SDK product.

Again, this only affects these specific addresses. IPv6 addresses assigned via other mechanisms like static assignment or DHCPv6 will still work as usual, as will IPv6 link-local addresses.

Clustering and Multi-Homing

The final new feature in 1.1.0 is clustering. Clustering will be used for the new ultra-scalable root server infrastructure that we just tested at scale, but it could also be used by endpoint devices to provide advanced multi-homed, high-availability services on virtual networks. This latter use is considered “alpha” and is not yet supported, but in future versions it will become a standard offering.

Clustering allows a single ZeroTier endpoint to be distributed across a geographically diverse cluster of physical devices. This is accomplished using a simple backplane protocol that shares state information such as which device has which peers and what multicast subscriptions are active. In concrete terms a clustered ZeroTier endpoint could be used to offer a geographically distributed database that appears behind a single virtual IP on a virtual network. Clients are routed automatically to the endpoint nearest them, and can be handed off on endpoint shutdown. While making this work for TCP would be difficult (but not impossible), it would be immediately useful for UDP-based applications like VoIP, video chat, games, or DNS.

It’s also potentially very useful for gateways. So far ZeroTier hasn’t offered a native “route all Internet traffic through the VPN” feature. We do plan to add this, and clustering would allow gateways to be geo-distributed for maximum performance and high availability.

More information on clustering including an in-depth discussion of its design will be posted soon.

In the immediate aftermath of 1.1.0 we plan to focus more on end-user UI improvements in ZeroTier Central and on further development of our enterprise product offerings. We’re also planning a public beta of Network Containers in December of 2015.

Introducing Network Containers

TL;DR: If you’re going to put the network in user space, then put the network in user space.

For the past six months we’ve been heads-down at ZeroTier, completely buried in code. We’ve been working on several things: Android and iOS versions of the ZeroTier One network endpoint service (Android is out, iOS coming soon), a new web UI that is now live for ZeroTier hosted networks and will soon be available for on-site enterprise use as well, and a piece of somewhat more radical technology we call Network Containers.

We’ve been at HashiConf in Portland this week. Network Containers isn’t quite ready for a true release yet, but all the talk of multi-everything agile deployment here motivated us to put together an announcement and a preview so users can get a taste of what’s in store.

Background

We’ve watched the Docker networking ecosystem evolve for the past two or more years. There are many ways to connect containers, but as near as we can tell all of them can be divided into two groups: user-space overlays that use tun/tap or pcap to create or emulate a virtual network port, and kernel-mode solutions like VXLAN and OpenVSwitch that must be configured on the Docker host itself. The former are flexible and can live inside the container, but they still often require elevated privileges and suffer from performance problems. The latter are faster but far less convenient to deploy, requiring special configuration of the container host and root access.

It’s been possible to use ZeroTier One in a Docker container since it was released, but only by launching with options like “--device=/dev/net/tun --cap-add=NET_ADMIN”. That gives it many of the same downsides as other user-mode network overlays. We wanted to do something new, something designed not only for how containers are used today but for how they’ll probably be used in the future.

Cattle Should Live in Pens

A popular phrase among container-happy devops folks today is “cattle, not pets.” If containers are the “cattle” approach to infrastructure then container hosts should be like generic cattle pens, not doggie beds with names embroidered on them. They should be pieces of metal that host “stuff” with no special application specific configuration at all.

All kernel-mode networking solutions require kernel-level configuration. This must be performed on the host as ‘root’, and can’t (easily) be shipped out with containers. It also means if a host is connected to networks X and Y it can’t host containers that need networks A and Z, introducing additional constraints for resource allocation that promote fragmentation and bin-packing problems.

We wanted our container networking solution to be contained in the container. That means no kernel, no drivers, no root, and no host configuration requirements.

The Double-Trip Problem

User-space network virtualization and VPN software usually presents itself to the system through a virtual network port (tun/tap), or by using libpcap to effectively emulate one by capturing and injecting packets on an existing real or dummy network device. The former is the approach used by ZeroTier One and by most VPN software, while the latter is used (last we checked) by Weave and perhaps a few others. The pcap “hack” has the advantage of eliminating the need for special container launch arguments and elevated permissions, but otherwise suffers from the same drawbacks as tun/tap.

User-mode network overlays that still rely on the kernel to perform TCP/IP encapsulation and other core network functions require your data to make an epic journey, passing through the kernel’s rather large and complex network stack twice. We call this the double-trip problem.

First, data exits the application by way of the socket API and enters the kernel’s TCP/IP stack. Then after being encapsulated there it’s sent to the tun/tap port or captured via pcap. Next, it enters the network virtualization service where it is further processed, encapsulated, encrypted, etc. Then the overlay-encapsulated or VPN traffic (usually UDP) must enter the kernel again, where it once again must traverse iptables, possible NAT mapping, and other filters and queues. Finally it exits the kernel by way of the network card driver and goes over the wire. This imposes two additional kernel/user mode context switches as well as several memory copy, handoff, and queueing operations.

The double-trip problem makes user-mode network overlays inherently slower than solutions that live in the kernel. But kernel-mode solutions are inflexible. They require access to the metal and root privileges, two things that aren’t convenient in any world and aren’t practical at all in the coming world of multi-tenant container hosting.

Network Containers

We think user-mode overlays that use tun/tap or pcap occupy a kind of “uncanny valley” between kernel and user mode: by relying on a kernel-mode virtual port they inherit some of the kernel’s inflexibility and limitation, but lose its performance. That’s okay for VPNs and end-user access to virtual networks, but for high performance enterprise container use we wanted something better. Network Containers is an attempt to escape this uncanny valley not by going back to the kernel but by moving the other direction and going all-in on user-mode. We’ve taken our core ZeroTier virtual network endpoint and coupled it directly to a lightweight user-mode TCP/IP stack.

This alternative network path is presented to applications via a special dynamic library that intercepts calls to the Linux socket API. This is the same strategy used by proxy wrappers like socksify and tsocks, and it requires no changes to applications or recompilation. It’s also used by high-performance kernel-bypassing bare-metal network stacks deployed in fields with stringent latency requirements like high-frequency trading and industrial process control. It’s difficult to get right, but so far we’ve tested Apache, NodeJS, Java, Go binaries, sshd, proftpd, nginx, and numerous other applications with considerable success.
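
For readers unfamiliar with the technique, here is a minimal, generic example of LD_PRELOAD-style call interception, written from scratch for illustration rather than taken from Network Containers. It merely logs and forwards socket() to libc; a real intercept library would instead hand such calls to its user-mode network stack.

    // intercept.cpp -- build: g++ -shared -fPIC intercept.cpp -ldl -o intercept.so
    // Run an unmodified program through it:
    //   LD_PRELOAD=./intercept.so curl http://example.com/
    #include <cstdio>
    #include <dlfcn.h>       // dlsym, RTLD_NEXT (GNU extension; g++ defines _GNU_SOURCE)
    #include <sys/socket.h>

    extern "C" int socket(int domain, int type, int protocol) {
        // Look up the "real" libc symbol so we can fall through to it.
        typedef int (*socket_fn)(int, int, int);
        static socket_fn real_socket = (socket_fn)dlsym(RTLD_NEXT, "socket");

        // A real intercept library would create an endpoint in its user-mode
        // TCP/IP stack here and return a descriptor that its other intercepted
        // calls (connect, bind, send, recv, ...) know how to handle.
        std::fprintf(stderr, "[intercept] socket(%d, %d, %d)\n", domain, type, protocol);
        return real_socket(domain, type, protocol);
    }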

You might be thinking about edge cases, and so are we. Socket APIs are crufty and in some cases poorly specified. It’s likely that even a well-tested intercept library will clash with someone’s network I/O code somewhere. The good news is that containers come to the rescue here by making it possible to test a specific configuration and then ship with confidence. Edge-case issues are much less likely in a well-tested, single-purpose microservice container running a fixed snapshot of software than in a heterogeneous, constantly shifting environment.

We believe this approach can combine the convenience of in-container user-mode networking with the performance of kernel-based solutions. In addition to eliminating quite a bit of context-switch, system-call, and memory-copy overhead, a private TCP/IP stack per container has the potential to offer throughput advantages on many-core host servers: since each container has its own stack, a host running sixteen containers effectively has sixteen completely independent TCP/IP stacks processing traffic in parallel. Other advantages include the potential to handle huge numbers of TCP connections per container by liberating running applications from kernel-related TCP scaling constraints. With shared-memory IPC we believe many millions of TCP connections per service are feasible; indeed, bare-metal user-mode network stacks have demonstrated this in other use cases.

Here’s a comparison of the path data takes in the Network Containers world versus conventional tun/tap or pcap based network overlays. The application sees the virtual network, while the kernel sees only encapsulated packets.

Running the Preview Demo

Network Containers is still under heavy development. We have a lot of polish, stability testing, and performance tuning to do before posting an alpha release for people to actually try with their own deployments. But to give you a taste, we’ve created a Docker container image that contains a pre-built and pre-configured instance. You can spin it up on any Docker host that allows containers to access the Internet and test it from any device in the world with ZeroTier One installed.

Don’t expect it to work perfectly, and don’t expect high performance. While we believe Network Containers could approach or even equal the performance of kernel-mode solutions like VXLAN+IPSec (but without the hassle), so far development has focused on stability and supporting a wide range of application software and we haven’t done much of any performance tuning. This build is also a debug build with a lot of expensive tracing enabled.

Here are the steps if you want to give it a try:

Step 1: If you don’t have it, download ZeroTier One and install it on whatever device you want to use to access the test container. This could be your laptop, a scratch VM, etc.

Step 2: Join 8056c2e21c000001 (Earth), an open public network that we often use for testing. (If you don’t want to stay there don’t worry. Leaving a network is as easy as joining one. Just leave Earth when you’re done.) The Network Containers demo is pre-configured to join Earth at container start.
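
If you prefer the command line, the zerotier-cli tool that ships with ZeroTier One can perform the join directly (the desktop GUIs offer the same function):

sudo zerotier-cli join 8056c2e21c000001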

Step 3: Run the demo!

docker run zerotier/netcon-preview

The container will output something like this:

***
*** ZeroTier Network Containers Preview
*** https://www.zerotier.com/
***
*** Starting ZeroTier network container host...
*** Waiting for initial identity generation...
*** Waiting for network config...
*** Starting Apache...
***
*** Up and running at 28.##.##.## -- join network 8056c2e21c000001 and try:
*** > ping 28.##.##.##
*** > curl http://28.##.##.##/
***
*** Be (a little) patient. It'll probably take 1-2 minutes to be reachable.
***
*** Follow https://www.zerotier.com/blog for news and release announcements!
***

While you’re waiting for the container to start and to print out its Earth IP address, try pinging earth.zerotier.net (28.46.55.247) from the host running ZeroTier One to test your connectivity. Joining a network usually takes less than 30 seconds, but might take longer if you’re behind a highly restrictive firewall or on a slow Internet connection. If you can ping 28.46.55.247, you’re online.

Once it’s up and running try pinging it and fetching the web page it hosts. In most cases it’ll be online in under 30 seconds, but may take a bit longer.

Next Steps, and Beyond

We’re planning to ship an alpha version of Network Containers that you can package and deploy yourself in the next few months. We’re also planning an integration with Docker’s libnetwork API, which will allow it to be launched without modifying the container image. In the end it will be possible to use Network Containers in two different ways: by embedding it into the container image itself so that no special launch options are needed, or by using it as a libnetwork plugin to network-containerize unmodified Docker images.

Docker’s security model isn’t quite ready for multi-tenancy but it’s coming, and when it does we’ll see large-scale bare metal multi-tenant container hosts that will offer compute as a pure commodity. You’ll be able to run containers anywhere on any provider with a single command and manage them at scale using solutions like Hashicorp’s Terraform, Atlas, and Nomad. The world will become one data center, and we’re working to provide a simple plug-and-play VLAN solution at global scale.

Hat tip to Joseph Henry, who has been lead developer on this particular project. A huge number of commits from him will be merged shortly!