LF: A Fully Decentralized Fully Replicated Key/Value Store

TL;DR: this post introduces LF in detail and provides a little bit of back story on its development. You can head over to GitHub if you just want to jump in and give it a spin.

Also see threads at Hacker News and lobste.rs.

ZeroTier started as an open source Internet decentralization project. That goal is still very much a part of our DNA.

Our flagship network virtualization system is mostly but not completely decentralized. Our design philosophy is “decentralize until it hurts, then centralize until it works.” (This is taken from an old relational database design adage: “normalize until it hurts, denormalize until it works.”)

Back in 2011-2012, when ZeroTier was first designed, there wasn’t a good way to achieve the ease of use, performance, and security we wanted without some centralized element to provide a source of truth and rapid node bootstrapping. A blog post written by ZeroTier’s founder goes into all the gory technical details for those who are interested. TL;DR: the vast majority of the intelligence in ZeroTier, including its rules engine, authentication, and path discovery, is fully decentralized and runs at the edge, but the network as a whole relies on a pool of ZeroTier-operated root servers (closely analogous to DNS root name servers) to help edge nodes find and recognize each other. Root servers are essential to the performance, network namespace unity, and “zero configuration” aspects of the ZeroTier experience.

A root server is a node (or cluster of them) that is contacted by every ZeroTier node in the world and that therefore knows where all nodes are located. It also caches identity information, allowing nodes to resolve ZeroTier addresses to their corresponding full length public keys. Total situational awareness might seem like a heavy burden, but keep in mind that the packets sent to roots and the information they must cache about each node are small. If every human being on Earth ran a ZeroTier node the storage overhead of a root would be 6-8TB, a capacity easily achievable at reasonable cost using a combination of large memory and super-fast SSD or managed cloud storage.
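To put the 6-8TB figure in perspective, here is a rough back-of-the-envelope calculation of the per-node storage budget it implies. The world population value is an assumption made for illustration and is not a number from this post; the result is only an order-of-magnitude sanity check.

```python
# Back-of-the-envelope check of the 6-8TB estimate.
# ASSUMPTION: a world population of roughly 7.7 billion (not stated in the post).
WORLD_POPULATION = 7_700_000_000
TB = 10**12  # decimal terabyte, in bytes


def bytes_per_node(total_tb: float, nodes: int = WORLD_POPULATION) -> float:
    """Average per-node storage budget implied by a total root capacity."""
    return total_tb * TB / nodes


low = bytes_per_node(6)
high = bytes_per_node(8)
print(f"about {low:.0f} to {high:.0f} bytes per node")
```

Under that assumption the budget works out to somewhere around one kilobyte per node, which is consistent with a root caching little more than an address, a public key, and a few last-known network locations.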

ZeroTier’s root servers are currently configured as a shared-nothing redundant pair of clustered nodes we’ve named Alice and Bob after the canonical names used in cryptography examples. Alice and Bob are both clusters of around six ZeroTier nodes scattered across multiple continents and cloud providers and tied together by a home-grown proprietary state replication algorithm. They’ve proven so scalable and reliable that we’ve barely had to touch them for years. Since ZeroTier was in beta around 2013 there hasn’t been a second of global down-time where no roots anywhere were reachable (excluding localized cases attributable to severe network issues like BGP configuration errors that are beyond our control).

The only theoretically workable substitute for centrally managed roots considered during early design was distributed hash tables (DHTs) like Kademlia. DHTs are unfortunately too slow, too unreliable under less-than-ideal network conditions, and too vulnerable to “Sybil” type attacks to meet our needs. They might be fine for something like BitTorrent magnet links where there’s not much value to be gained from an attack, but ZeroTier is an enterprise network virtualization system trusted by major corporations and thousands upon thousands of end users.

A year or two ago the thought occurred to us: if we can’t get rid of the roots, maybe we can find a way to cluster them in an open decentralized or federated way using techniques similar to those used by cryptocurrencies or federated networks like Mastodon?

Our interest in decentralizing the roots wasn’t just philosophical. Over the years we had numerous requests from policy-bound or very (justifiably) paranoid customers to run their own infrastructure that could be logically separated from ours as much as possible. We developed a somewhat ad-hoc bolted-on method called a “moon” (a term inspired by our whole “planetary data center” concept) that allows customers to set up their own roots that work alongside ours in a delegated manner and act like roots if ours are not available. Moons were good enough for some customers, but not all. They let things keep working if our roots are not reachable, but our roots still have to be in the picture all the way down to the individual node level or many of the most compelling aspects of ZeroTier, like a globally unified address space and transparent mobility, are sacrificed.

Eventually one of our largest enterprise customers offered to partly fund R&D in this area, so we set about seriously researching the topic. This customer must remain nameless for now due to NDAs, but let’s say it’s someone with a strong interest in extremely robust distributed systems.

The result of this funding is LF (pronounced “aleph”). Before we introduce LF, let’s take a detour to explain the reasoning process behind it.

Orchestration: Making Many Act as One

In the aforementioned blog post by ZeroTier’s founder, Decentralization: I Want to Believe, a key idea is that a little bit of centralization can have huge benefits in an otherwise fully decentralized system. This is why we like to say “decentralize until it hurts, then centralize until it works.” ZeroTier’s roots are what this post calls a “blind idiot god” (a playful H.P. Lovecraft reference), a minimal bit of centralization that knows only what it needs to know and does only what it needs to do and then gets out of the way.

Decentralization isn’t just for hippie hackers and crypto-anarchists. The ideas in that blog post were in part informed by a Google research paper titled On the Power of (even a little) Centralization in Distributed Processing. All the big Internet services companies have invested in distributed fault tolerant systems because there’s no single computer big enough to host things like Google Docs or Facebook, and even if there were it wouldn’t be fault tolerant or geographically distributed. A lot of this work made its way into open source via popular projects like Kubernetes and Nomad.

Kubernetes and Nomad are examples of what are called orchestration systems. An orchestration system is responsible for running many instances of a distributed application and ensuring that all those instances are properly configured, connected, and able to access the right data. A key part of any orchestration system is a distributed data cache like etcd or consul that provides a reliable and secure source of typically small but very important information like IP addresses, public keys, identities, permissions, configuration variables, and so on. These “small data” caches are probably the most critical component in an orchestration system, since without them the orchestrator would have no way of knowing the current state of the world, what needs to be run, or where key resources are located. They are the “blind idiot gods” of orchestrated service architectures.

ZeroTier’s root servers help orchestrate the ZeroTier network just like etcd and consul help Kubernetes or Nomad orchestrate services. Indeed, we could use etcd or consul to back our root servers if we wished. They might help us scale better but they wouldn’t solve our centralization problem. That’s because etcd and consul only work (securely) if every node is run by the same trusted entity.

Google, Facebook, Amazon, and all the rest run decentralized infrastructure but they are not fully decentralized. Full decentralization means any untrusted third party can also run the same stuff and it all works together and remains secure. That means decentralizing trust, and that’s a hard problem the big tech players have never needed or even wanted to solve. Facebook has no reason to help you run your own Facebook node in your closet.

Even if we can’t use orchestration systems or designs similar to them “out of the box” to decentralize our roots, could certain ideas from this area inform a solution?

On Cryptocurrency and Decentralized Trust

Many readers will be thinking at this point that the trust decentralization problem was solved by the pseudonymous Satoshi Nakamoto in 2008. Just make ZeroTierCoin and do an ICO! (… and get investigated by the SEC?) Even if we don’t want to make a coin, couldn’t we just use an existing block chain as a data cache and let other root operators do the same?

We thought about it, but decided to pass.

One problem is purely perceptual. The existing cryptocurrency community has been plagued by scams, frauds, gambling, and other bad behavior. It turns out that human beings don’t stop behaving the way they’ve always behaved when given access to complex and leveraged financial systems just because a new way to operate such a system has been invented. Cryptocurrency’s creators wanted to go build their own playground independent of bad old finance, but all the bad old finance bullies seem to have just followed them over and stolen the merry-go-round (and many people’s lunch money).

Because of this bad perception, directly using a cryptocurrency was something we feared would drive away customers. We tested the idea in casual conversation and reactions confirmed our suspicions. Many “serious” customers regard anything even touching cryptocurrency as indicative of a scam or vaporware. This perception is unfortunate but it seems to be a reality.

Beyond perception there are also major technical issues.

Satoshi tried to solve the trust problem with a strategy borrowed (perhaps unknowingly) from nature: conspicuous consumption. Conspicuous consumption is a trust or authority signaling mechanism used by everything from peacocks to governments to rappers to announce things like biological fitness, membership in a group, or ability to handle responsibility. The idea is that spending a lot of resources on something otherwise not very useful constitutes a signal that is intrinsically hard to counterfeit. A peacock can’t easily fake a giant fan of plumage that costs a lot in terms of metabolic energy and nutrients to grow. It’s got a big colorful tail or it doesn’t.

Block chains implement conspicuous consumption through proof of work mining, a process whereby entities compete to solve otherwise useless but inherently difficult computational search problems. In a conventional block chain structure like Bitcoin the version carrying the most work in the form of hard-to-find search results is considered “correct” and gets subsequently extended, adding more work and making it progressively even harder to counterfeit.
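The search-and-accumulate idea described above can be sketched in a few lines. This is a deliberately simplified illustration and not Bitcoin's actual algorithm (which uses double SHA-256 and a numeric target): here a "solution" is any nonce that makes a single SHA-256 hash start with a given number of zero bits, and the "correct" chain is simply the one with the most expected work. All function names are hypothetical.

```python
import hashlib
from typing import Sequence


def leading_zero_bits(digest: bytes) -> int:
    """Count the leading zero bits of a hash digest."""
    bits = 0
    for byte in digest:
        if byte == 0:
            bits += 8
            continue
        bits += 8 - byte.bit_length()
        break
    return bits


def pow_hash(payload: bytes, nonce: int) -> bytes:
    """Hash the payload together with an 8-byte big-endian nonce."""
    return hashlib.sha256(payload + nonce.to_bytes(8, "big")).digest()


def mine(payload: bytes, difficulty_bits: int) -> int:
    """Brute-force search for a nonce that meets the difficulty."""
    nonce = 0
    while leading_zero_bits(pow_hash(payload, nonce)) < difficulty_bits:
        nonce += 1
    return nonce


def verify(payload: bytes, nonce: int, difficulty_bits: int) -> bool:
    """Checking a solution costs one hash; finding it costs about 2**difficulty_bits."""
    return leading_zero_bits(pow_hash(payload, nonce)) >= difficulty_bits


def chain_work(difficulties: Sequence[int]) -> int:
    """Expected number of hash attempts needed to produce every block in a chain."""
    return sum(2**d for d in difficulties)


def pick_chain(a: Sequence[int], b: Sequence[int]) -> Sequence[int]:
    """Prefer the chain that carries more total work (ties go to the first)."""
    return a if chain_work(a) >= chain_work(b) else b
```

The asymmetry is the point: `mine` has to try thousands or millions of nonces, while `verify` needs a single hash. A counterfeiter who wants to replace a chain must redo at least as much work as `chain_work` reports for the honest one.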

It’s an extremely clever solution that seems to work well at a basic game theoretic level, but problems have become apparent since its introduction that make it less of a panacea than first believed.

The most obvious is that it’s too expensive. Proof of work mining only works to secure a high value ledger like a block chain if the amount of conspicuous consumption is so high that the cost of reproducing it is at least equal to the potential gain from counterfeiting it. To accomplish this Bitcoin built in a mechanism whereby those that “win” an iterative proof of work contest are rewarded with currency. This incentivizes a run-away competition, securing the chain but also consuming as much electricity as several small first world nations. When the energy consumption of a major block chain is divided by its peak transaction volume, it becomes apparent that this is incredibly wasteful and probably not economically sustainable.

A less obvious problem cuts even more deeply: due to second order effects beyond the original game theoretic assumptions, runaway proof of work mining may not actually solve the trust decentralization problem after all. Computational work is an industrial process, and nearly all industrial processes are subject to industrial scaling. The more of the process you do (and usually in one location), the cheaper it becomes. Over time mining has become almost entirely centralized in China where electricity is cheap (due in part to lax environmental regulations) and the majority of mining power is commanded by fewer than five mining pools.

Lastly, there’s a major economic issue that prevents its use for applications like our own. The bubble-prone deflationary economics of popular cryptocurrencies means there is significant uncertainty about what it would cost to use a block chain to record data in the future. What happens to ZeroTier if the cost of recording information about a new node suddenly increases by a factor of ten? Alternatives to runaway proof of work such as proof of stake still would not solve this problem. They may make it worse by making cryptocurrencies even more deflationary as proof of stake works by locking up currency.

Even if cryptocurrencies as presently deployed aren’t the right thing to solve ZeroTier’s root decentralization problem, could certain ideas from cryptocurrency inform a solution?

Introducing LF: A Reliable Data Store for Decentralized Orchestration

Most engineers tend to like “pure” systems and dislike hybrid or multi-paradigm ones. Most also have a tendency to stay in their own area of experience in terms of what tools and ideas they use to address problems. Both these impulses are often misguided.

There are almost no “pure” systems in biology, and it’s not because evolution is dumb. Evolution as embodied in biology (a deep topic far beyond this blog post) is a powerful learning and optimization machine. There are few “pure” systems in biology because in a multi-objective optimization problem the best solution is almost never at the “edge” of the solution space. Real solutions to very hard combinatorial optimization problems are usually compositions of multiple approaches tuned to settings that represent a good trade-off between many interfering or competing objectives.

ZeroTier itself is a hybrid system. It combines ideas from peer to peer networking like cryptographic addressing (VL1) with ideas from enterprise SDN (VL2) to deliver a network virtualization platform that lets you treat the whole planet like one data center.

LF echoes the same approach. It combines ideas from cryptocurrency like proof of work and cryptographically linked structures, ideas from orchestration like searchable caches of small critical data, and ideas from conventional federated trust like certificate chains and webs of trust. The result is something analogous to etcd or consul that’s designed to work with open decentralized systems that decentralize trust as well as storage and operations. All that sounds complex but as with ZeroTier itself we put a lot of effort into hiding complexity and making everything as automatic and “zero configuration” as possible. LF is not significantly harder to use than etcd, consul, or memcached. It can even be mounted as a FUSE filesystem to allow things able to cache data in files to use it without modification.

LF is a peer to peer key/value store designed to securely store small but critical bits of information. It uses proof of work like a cryptocurrency, but doesn’t rely solely upon it for trust or incentivize a runaway arms race. Fixed cost proof of work is used as a rate limiter to prevent abuse on public networks and cumulative work is used as one potential input for conflict resolution. Other potential sources of truth include local node heuristics (which conflicting entry arrived first?), elective trust of other nodes (oracles and commentary), and certificates.
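As a rough mental model of how these inputs might combine, consider the toy sketch below. It is not LF's actual algorithm or data model: the `Candidate` fields, the `MIN_WORK_BITS` threshold, and especially the priority order among certificates, trusted endorsements, cumulative work, and arrival order are all assumptions chosen for illustration. The README in the LF repository describes the real behavior.

```python
from dataclasses import dataclass
from typing import Iterable, Optional

# ASSUMPTION: a hypothetical fixed admission cost, expressed as proof of work bits.
MIN_WORK_BITS = 16


@dataclass(frozen=True)
class Candidate:
    """One of several conflicting entries claiming the same key (hypothetical model)."""
    value: str
    work_bits: int               # fixed-cost proof of work attached to the entry
    cumulative_work: int         # work accumulated by entries linked on top of it
    arrival_order: int           # local heuristic: lower means seen earlier
    trusted_votes: int = 0       # endorsements from nodes this node chose to trust
    has_valid_cert: bool = False


def admissible(c: Candidate) -> bool:
    """Rate limiting: entries that did not pay the fixed cost are dropped outright."""
    return c.work_bits >= MIN_WORK_BITS


def resolve(candidates: Iterable[Candidate]) -> Optional[Candidate]:
    """Pick a winner among conflicting entries using several signals at once.

    The ordering (certificate, then trusted votes, then cumulative work,
    then earliest arrival) is an illustrative assumption.
    """
    pool = [c for c in candidates if admissible(c)]
    if not pool:
        return None
    return max(
        pool,
        key=lambda c: (c.has_valid_cert, c.trusted_votes, c.cumulative_work, -c.arrival_order),
    )
```

Note that proof of work plays two separate roles here: a fixed minimum acts purely as an admission fee, while cumulative work is just one tie-breaking signal among several, so there is no reward for outspending everyone else.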

The README.md file in the LF git repository goes into further detail about how LF works and how to build and use it.

LF is designed to help us decentralize our systems even more, but it’s not built exclusively for ZeroTier. We’re putting it out there under the same GPL dual-licensing terms as ZeroTier itself and inviting others to use it to build their own decentralized systems. LF would make an excellent data store for identity management, service discovery, and numerous other roles in completely decentralized application architectures. It offers performance, robustness, and security far beyond what is offered by DHTs.

Please try it out and let us know what you’re doing with it. Feel free to leave feedback or report bugs in the issues section, and if you like it, please consider supporting ZeroTier by paying for our services or telling people about us!