Measuring Bitcoin’s Decentralization

https://coinmetrics.io/measuring-bitcoins-decentralization/

By Karim Helmy and the Coin Metrics Team

Key Takeaways

  • Bitcoin’s decentralization can be quantified in terms of supply dispersion, hashpower distribution, and exchange consolidation, among other metrics.
  • Key metrics like the number of active addresses and the network’s hashrate continue to rise.
  • Bitcoin’s supply is becoming more evenly dispersed, and the mining and exchange markets remain competitive.

Introduction

Over the last eleven years, Bitcoin has managed to function relatively seamlessly in the face of a large number of threats, largely due to its lack of a single controlling entity. This trait, known as decentralization, encompasses a large number of loosely-coupled characteristics. Some of these traits are difficult to describe and measure, but others lend themselves well to direct analysis. 

One directly observable feature is the dispersion of funds across addresses. The distribution of wealth is a critical factor in any economy, roughly corresponding to the distribution of economic influence. For cryptoassets, which often grant large token allocations to the founding team, it’s also a severely underexplored one.

Another characteristic, the distribution of hashpower, is arguably even more important. Bitcoin relies on decentralization at this level in order to meet its goals of sustaining a secure, censorship-resistant payments and savings system. 

Bitcoin is also highly exposed to the market share distribution of exchanges, which exercise an outsized influence on the network’s economy. The distribution of volume on fiat-quoted spot pairs is particularly important, since these represent on- and off-ramps to and from the world at large.

In this week’s feature, we’ll quantify Bitcoin’s decentralization along these three verticals and track how it’s progressed over time. 

Dispersion

The presence of whales, or users with large quantities of funds held in the asset, is a concern for the viability of many cryptocurrencies. A particularly unequal distribution of funds could grant a small set of users significant influence over the direction of an asset’s markets and protocol development and call into question the asset’s viability as a store of value or medium of exchange.

Since Bitcoin balances are easily auditable, dispersion can be assessed with on-chain data. Because funds held by custodians in omnibus accounts cannot be attributed to their owner and address reuse is generally discouraged, these estimates are imperfect. However, the degree of transparency afforded is still unprecedented when compared to the legacy financial system.

Bitcoin still has whales, but since the network’s inception, its supply has become more evenly distributed, with smaller accounts comprising an increasing proportion of the aggregate supply.

Source: Coin Metrics Network Data Pro

In addition to controlling an increasing proportion of supply, addresses with smaller balances continue to represent the majority of accounts. In the face of a fluctuating dollar-denominated price, most addresses still control less than $100 worth of Bitcoin.

A closely related metric, the number of unique active addresses, also hints at usage by a broader set of network participants. Because a single user can control multiple addresses, this metric is not a perfect proxy for the number of participants, but is generally considered to be correlated. Recently, Bitcoin’s active address count has begun to approach all-time highs.

Mining

In addition to on-chain dispersion and activity, Bitcoin’s effective decentralization depends on the distribution of computational power, or hashpower, among miners.

Bitcoin relies on miners to secure the network and add new blocks to the blockchain. These miners compete to find the next block by computing a large number of energy-intensive hashes, and often aggregate into loose coalitions known as mining pools.

The amount of hashpower securing the Bitcoin network has generally grown exponentially throughout the network’s history.

Source: Coin Metrics Network Data Pro

In addition to the amount of raw hashpower securing the network, the distribution of hashpower is also important. A malicious actor who controls more than half of the network’s hashpower could 51%-attack the network and perform a double-spend, and an attacker with considerably fewer resources could censor transactions through feather forks.

An attacker would need to double-spend a large amount of money in order to make a 51%-attack profitable. In majority-hashpower ASIC-mined coins like Bitcoin, which require significant capital expenditure by miners, it would be difficult for a rational miner to perform a 51% attack, though these attacks are made somewhat more feasible by the presence of hashpower marketplaces.

Today, Bitcoin’s mining industry is competitive. The plot below, which is subject to a degree of survivorship bias, shows mining to be a thriving, distributed ecosystem.

While Bitcoin mining is distributed, it’s still at risk of centralization through state-level coercion and vertical and horizontal integration. Several exchanges, including Binance, OKEx, and Huobi, operate mining pools. BitMAIN, a hardware manufacturer, owns both BTC.com and AntPool, and is the only investor in ViaBTC.

Even a rational, well-resourced mining pool could have difficulty coordinating a 51% attack, since miners could leave the pool if the operator decided to attack the network. New coordination protocols like Stratum V2 may significantly increase the network’s decentralization by shifting control over block composition from pool operators to miners. 

One useful metric for gauging the decentralization of hashpower is the Nakamoto coefficient, which measures the number of pools that would need to collude in order to 51%-attack a network.  While Bitcoin has never been successfully 51%-attacked, in 2014 the mining pool GHash.io controlled over half of the network’s hashpower for about a day. During this time period, Bitcoin had a Nakamoto coefficient of 1.

Today, Bitcoin has a Nakamoto coefficient of 4, indicating a significant degree of decentralization.
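The Nakamoto coefficient described above can be computed in a few lines. The following is a minimal sketch, assuming hashpower shares are available as a mapping from pool name to share; the function name and the example figures are hypothetical and are not Coin Metrics data.

```python
def nakamoto_coefficient(shares, threshold=0.5):
    """Return the smallest number of entities whose combined share
    strictly exceeds `threshold` of the total (0.5 for a 51% attack)."""
    total = sum(shares.values())
    if total <= 0:
        raise ValueError("shares must sum to a positive value")
    cumulative = 0.0
    # Greedily add the largest pools first: this minimizes the count.
    for count, share in enumerate(sorted(shares.values(), reverse=True), start=1):
        cumulative += share
        if cumulative / total > threshold:
            return count
    # Only reached if the threshold cannot be exceeded (threshold >= 1).
    return len(shares)


# Hypothetical pool shares in percent, for illustration only.
example_pools = {"pool_a": 30, "pool_b": 25, "pool_c": 20, "pool_d": 15, "pool_e": 10}
print(nakamoto_coefficient(example_pools))
```

A single pool holding more than half of the hashpower, as GHash.io briefly did, yields a coefficient of 1 under this definition.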

Exchanges

Exchanges have a less direct impact on Bitcoin’s decentralization than miners, whose role is embedded in the protocol. As the primary markets on which Bitcoin is acquired and used, however, their influence on the network is significant.

Excessive centralization among exchanges exposes the market to systemic risks in case of insolvency. In the cryptocurrency space, the most well-known example of this is the 2014 Mt. Gox crisis, discussed in depth in SOTN Issue 35.

Consolidation would also increase the potential for censorship, negating one of the primary benefits of using Bitcoin. As the primary on-ramp from fiat to Bitcoin, the BTC/USD market is particularly important in this regard. While stablecoins have recently emerged as an alternative quote asset, fiat gateways remain a crucial way for new capital to enter the market.

While several exchanges offer trading on the BTC/USD market, the field is generally dominated by a few large players.

Source: Coin Metrics Market Data Feed

A useful metric for analyzing market concentration is the Herfindahl-Hirschman Index (HHI), which increases as a market becomes more monopolistic. While our estimates are subject to survivorship bias, the HHI of the BTC/USD spot market across Coin Metrics’ coverage universe has remained flat over the last year, having dropped significantly prior to that. Currently, the market is considered moderately consolidated according to this metric.
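As a rough illustration of how the HHI is calculated, the sketch below sums the squared percentage market shares of each exchange. The function names and sample volumes are hypothetical. The classification thresholds (1,500 and 2,500) are an assumption taken from the 2010 US Horizontal Merger Guidelines; the article does not say which thresholds it applies.

```python
def hhi(volumes):
    """Herfindahl-Hirschman Index on the 0-10,000 scale:
    the sum of squared market shares expressed in percent."""
    total = sum(volumes)
    if total <= 0:
        raise ValueError("volumes must sum to a positive value")
    return sum((100.0 * v / total) ** 2 for v in volumes)


def classify_hhi(value, moderate=1500.0, high=2500.0):
    """Label a market using assumed thresholds (2010 US merger guidelines)."""
    if value < moderate:
        return "unconcentrated"
    if value <= high:
        return "moderately concentrated"
    return "highly concentrated"


# Hypothetical traded volumes for five equally sized exchanges.
sample_volumes = [20, 20, 20, 20, 20]
print(hhi(sample_volumes), classify_hhi(hhi(sample_volumes)))
```

A monopoly scores 10,000, while a market split evenly among n venues scores 10,000 / n, which is why the index rises as volume concentrates.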

In addition to reported volumes, on-chain holdings offer another glimpse into the state of the industry. The comparative balances of the spot exchanges tracked by Coin Metrics’ exchange flows are shown below. Coinbase is notably excluded from these estimates due to the company’s avoidance of hot-wallet address reuse.

Source: Coin Metrics Network Data Pro

In a similar vein, tracking exchanges’ on-chain flows enables us to form a more complete view of the market and confirm reported activity. These metrics also paint the picture of a relatively competitive marketplace. Inflows for the spot exchanges tracked by Coin Metrics’ exchange flows are shown below; the behavior of outflows is very similar.

Conclusion

Bitcoin is meaningfully decentralized in terms of miner and exchange concentration, and its supply is increasingly evenly dispersed. This analysis of Bitcoin’s decentralization is far from comprehensive, and various other metrics, such as node count and hardware manufacturer market share, should also be considered in assessing the network’s health. On the whole, however, the network’s performance in these key verticals gives reason for cautious optimism.

The post Measuring Bitcoin’s Decentralization appeared first on Coin Metrics.

Inspecting Tezos decentralization: 200+ public nodes, 1000+ in total

https://medium.com/coinmonks/inspecting-tezos-decentralization-200-public-nodes-1000-in-total-6ef0761caac9?source=rss----721b17443fd5---4

When people argue about Tezos decentralization, they usually put roll distribution first, saying: “look, the top 5 entities own more than half of the stake”. More advanced critics also highlight attacks on the voting mechanism: how many entities can block or force a proposal (a number that actually changes over time).

However, it’s not that straightforward, because in a Proof-of-Stake network there are not just rewards but also Value at Risk (VaR). At the end of the day, it’s the risk/reward ratio that matters when it comes to economic incentives, and even that holds only if we assume all agents are rational!

Ideally, for each attack vector (and strictly speaking every proposal introduces a new vector) one should estimate the reward/VaR ratio, considering all risks, for each attacker class (there is more than one profile).
We leave that for a separate study; in this article, let us focus on another aspect of decentralization, namely the P2P layer.

Collecting peers and connections

In order to conduct a comprehensive analysis, we needed a high-quality data set.
In principle, we could just set max_connections in the node config to a relatively large value and use the /network/points RPC endpoint. However, as we found out, its output is rather polluted with nodes that have a different chain_id or are not operating.

Moreover, we also wanted to try to build the network graph so we needed not only vertices (nodes) but also edges (connections). We didn’t get to do it precisely in the end, but we learned a lot about how P2P works in Tezos.

Tezos Handshaker

So we went deeper and wrote a simple P2P scanner that connects to bootstrap nodes and queries their known peers, then tries to connect to those peers and query their connections, and so on. It worked well; however, we faced several limitations:

  • Obviously, we couldn’t query known peers from nodes that are not exposed to the internet (hidden nodes). That’s fine, since we are mostly interested in public nodes;
  • Some nodes were probably rejecting our connections because they had reached the maximum connection count or for other reasons. As a workaround we repeat the scan several times; however, that does not give us a 100% guarantee that we’re not missing something;
  • The main problem is related to the way nodes respond to the request: they return no more than 50 results, of which 30 are best (active connections sorted by the time of establishment), and the remaining 20 are random (could be both active or not).
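The crawl described above is essentially a breadth-first traversal repeated over several passes. The sketch below is an illustration under stated assumptions, not the authors’ scanner: `query_peers` is a hypothetical callable standing in for the real handshake, returning a node’s known peers, or `None` when the node cannot be reached (hidden, full, or offline). Repeating the pass is meant to pick up peers missed because of the 50-result cap or rejected connections.

```python
from collections import deque


def crawl(bootstrap, query_peers, max_passes=5):
    """Repeatedly walk the peer graph starting from the bootstrap addresses.

    Returns (known, public, edges): every address seen, the addresses that
    answered a query, and the directed "knows about" pairs collected.
    """
    known = set(bootstrap)
    public = set()
    edges = set()
    for _ in range(max_passes):
        before = (len(known), len(public), len(edges))
        queue = deque(sorted(known))
        while queue:
            address = queue.popleft()
            peers = query_peers(address)  # hypothetical network call
            if peers is None:
                continue  # unreachable in this pass; retried in the next one
            public.add(address)
            for peer in peers:
                edges.add((address, peer))
                if peer not in known:
                    known.add(peer)
                    queue.append(peer)
        # Simplified stop rule: one full pass without any new information.
        if (len(known), len(public), len(edges)) == before:
            break
    return known, public, edges
```

Addresses that never answer end up in `known` but not in `public`, which matches the public/hidden split used in the rest of the analysis.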

P2P LIKE A PRO

If you are interested in how the P2P layer works in Tezos, check out the SimpleStaking blog.

Another problem relates to determining whether a node belongs to a particular network, in our case mainnet. We can confidently distinguish between public nodes, as they return a version string during the handshake; however, we cannot be 100% sure about hidden nodes. All we can say is that if a particular hidden node is known by several public mainnet nodes, it is likely to be a mainnet node as well.

We are not sure why carthagenet/zeronet/other nodes occur in the lists of known peers of mainnet nodes. This is probably due to misconfiguration, to someone running several nodes on the same machine, or to something else.

Goals and objectives

Given the above problems and limitations, we had to decide what we could calculate and how. We have formulated several goals:

  1. Identify all public nodes as they are in essence the “center” of the network and have the greatest importance;
  2. Try to detect active hidden nodes using heuristics;
  3. Perform a geographical analysis of these two groups;
  4. Draw an approximation of the network topology.

In order to do that we used the following algorithm:

  1. Do iterative peer scanning in order to handle the max-connections issue and enumerate all random points;
  2. Finish the scan when the number of nodes stops growing for a sufficiently long period of time;
  3. Filter out nodes that do not belong to the mainnet;
  4. Assign a score to each hidden node, calculated as the number of public nodes that know that particular node;
  5. Filter out hidden nodes that have a score below the average.
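Steps 4 and 5 of the algorithm can be sketched as follows. This is a minimal illustration that assumes the scan output has already been filtered to mainnet (step 3) and is available as a mapping from each public node to the peers it reported; the function and variable names are hypothetical.

```python
def score_hidden_nodes(known_peers, public_nodes):
    """Score each hidden node by the number of public nodes that know it."""
    scores = {}
    for public in public_nodes:
        # Deduplicate so one public node counts at most once per hidden node.
        for peer in set(known_peers.get(public, ())):
            if peer not in public_nodes:
                scores[peer] = scores.get(peer, 0) + 1
    return scores


def filter_by_average(scores):
    """Keep hidden nodes whose score is not below the average score."""
    if not scores:
        return set()
    average = sum(scores.values()) / len(scores)
    return {node for node, score in scores.items() if score >= average}


# Hypothetical scan result: three public nodes and the peers they reported.
example_peers = {
    "p1": ["h1", "h2", "p2"],
    "p2": ["h1", "p1"],
    "p3": ["h1", "h3"],
}
example_public = {"p1", "p2", "p3"}
likely_active = filter_by_average(score_hidden_nodes(example_peers, example_public))
print(likely_active)
```

Using the average as the cutoff is a coarse heuristic: a few very well-known hidden nodes raise the bar and can exclude hidden nodes that are active but poorly connected.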

Terms and conditions

In this article we will use the terms Public node and Hidden node. In both modes nodes connect to others, but only public ones accept incoming connections.
Bootstrap nodes are the default ones specified in the node config. This is actually a single hostname hiding a load balancer that routes requests to 27 nodes spread across the globe.

DISCLAIMER

In this article:

  • We analyse only Mainnet nodes;
  • The scanning method is time-stretched and it’s not possible to make a snapshot at a particular time;
  • We only rely on the geographical location of the nodes as well as the connections between them;
  • We recognize that we may not have scanned the entire network or may have included inactive nodes in the dataset.

Thus, it’s important to understand that our results DON’T fully characterize the system.

We will look at the criteria for decentralization which determine how well the network can oppose a breakdown or an attack.

Tezos mainnet results

NUMBERS

During the scan we have discovered:

  • 6298 addresses in total
  • 1679 presumably operating nodes
  • 203 public nodes

As you may notice, there are far more nodes in Tezos mainnet than the number of bakers. It is clear why the bakers should be decentralized (in all senses), but what about the other nodes? What are they?

Roughly speaking, while baker nodes ensure the valid state of the blockchain and actually “write” the data, the rest of the network provides decentralized access to that data (i.e. “reading”) and makes sure broadcast “write requests” reach the bakers.
This is just as important as block validation, because what’s the point of a decentralized network if you cannot access it in a decentralized way?

In the next sections we will analyze all (presumably) running nodes and, in isolation, the public nodes. Note that while we are pretty confident about public nodes, there are certainly some deviations when we work with the whole network. Still, we think it could give some interesting insights.

Geographical distribution

This is an intuitive criterion: the more continents, countries, jurisdictions, segments of the global network are covered by Tezos the better.
Connectivity and network topology are also important, especially their dependence on transcontinental communications and tier-1/2 operators, but we will examine that a bit later.

The heat map looks good, and although there are obviously countries with high concentrations of nodes, we will see later that these are mostly cloud provider data centers.

NUMBERS

Tezos nodes are distributed across 56 countries and 193 regions.

Let’s take a look at each of the sub-criteria in detail.

Hosting providers

Before we move on to detailed statistics by country and region, let’s look at the distribution of nodes by hosting providers.

Not surprisingly, we see the prevalence of popular cloud hosts, but if you take into account the country where the hosts are located, the numbers are not that big. For example, the top 3 cloud providers with data centers in the US (AWS, Google, Digital Ocean) host 300 Tezos nodes. The real question is how important those nodes are for the network in general, and although we cannot answer that from the staking perspective, we can analyze the network topology based on our dataset.

Countries

Europe and the U.S. dominate, accounting for about 2/3 of all nodes.

Interactive map

Note the (decimal) logarithmic scale.

Regions

As for the regions of individual countries, we can see that there is a correlation with the location of data centers of the largest hosting providers.

Interactive map

It’s more interesting, we think, to see how Tezos is scattered around the planet. Use the zoom to see the names of settlements.

Tezos network topology

We will only investigate the logical network topology. Unlike the physical topology, we will not consider the physical distance between nodes, latency and speed of packet propagation in the underlying network (Internet).

NOTE

As was pointed out, the numbers can differ in reality, but the topology will likely remain the same.

Using nodes as graph vertices and known-peer connections as edges, we built a network graph and calculated its basic properties.

MAINNET GRAPH

Radius: 2
Diameter: 3
Average path length: 1.9
Center size: 1082
Clustering coefficient: 0.82
Density: 0.008

Here is a simplified interpretation of the results:

  • Radius, diameter, and average path length are small, which is good for network synchronization and fast propagation, and also suggests that presumably every node can reach the network center directly or via a trusted peer, or is part of the center itself;
  • Center size is more than half of all presumably running nodes; it is arguably a more robust estimate of the network size than the one we used;
  • Clustering coefficient is high: the network is divided into three clusters, varying in the degree of connectivity. This is most likely a side effect of the way the scan is done, so let’s not give it much importance;
  • Density is low, which indicates that the Tezos graph is sparse.
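For readers who want to reproduce these properties on their own edge list, here is a minimal pure-Python sketch. It assumes the graph is treated as undirected and connected, which the article does not state explicitly, and all names are hypothetical. The clustering coefficient is left out because the article does not say which definition was used.

```python
from collections import deque


def bfs_distances(adjacency, source):
    """Hop distance from `source` to every reachable vertex."""
    distances = {source: 0}
    queue = deque([source])
    while queue:
        node = queue.popleft()
        for neighbor in adjacency[node]:
            if neighbor not in distances:
                distances[neighbor] = distances[node] + 1
                queue.append(neighbor)
    return distances


def graph_summary(edges):
    """Radius, diameter, average path length, center size and density
    of a connected undirected graph given as (a, b) pairs."""
    adjacency = {}
    for a, b in edges:
        if a == b:
            continue  # ignore self-loops
        adjacency.setdefault(a, set()).add(b)
        adjacency.setdefault(b, set()).add(a)
    n = len(adjacency)
    if n < 2:
        raise ValueError("need at least two connected vertices")
    eccentricity = {}
    total_distance = 0
    for node in adjacency:
        distances = bfs_distances(adjacency, node)
        if len(distances) != n:
            raise ValueError("graph must be connected")
        eccentricity[node] = max(distances.values())
        total_distance += sum(distances.values())
    radius = min(eccentricity.values())
    edge_count = sum(len(peers) for peers in adjacency.values()) // 2
    return {
        "radius": radius,
        "diameter": max(eccentricity.values()),
        "average_path_length": total_distance / (n * (n - 1)),
        # The center is the set of vertices whose eccentricity equals the radius.
        "center_size": sum(1 for e in eccentricity.values() if e == radius),
        "density": 2 * edge_count / (n * (n - 1)),
    }
```

Running one breadth-first search per vertex is quadratic in the worst case, which is acceptable for a graph of a few thousand nodes like the one described here.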

Public nodes

Let’s take a closer look at the public nodes. We are particularly interested in how they are distributed across hosting providers and countries.

In theory, you can optimize the latency and improve connectivity using this information, e.g. in order to deal with endorsement misses or resolve other network issues.

Top countries and hostings

While the world’s largest cloud providers provide a highly reliable service, diversification will never hurt.

Interesting observation: half of Tezos’ public nodes are running on Amazon, including all the bootstrap nodes.

Bootstrap nodes alternatives

There is a predefined set of peers (set in the default configuration) that a new node initially connects to. These peers are called bootstrap peers, and there are currently 27 of them, hidden behind load balancers. It is logical to assume that they are part of the center, so we mainly care about what proportion of it they make up and how widely they are dispersed geographically.

The question that worries many people is what happens if the bootstrap nodes suddenly stop working?

As the graph shows, nothing terrible.
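One way to check this kind of claim on a collected edge list is to delete the bootstrap vertices and see how much of the remaining graph stays in one piece. The sketch below is an assumption about how such a test could look, not the authors’ method; the function name and sample data are hypothetical, and edges are treated as undirected.

```python
def largest_component_after_removal(edges, removed):
    """Remove the given vertices and return
    (size of the largest connected component, number of surviving vertices)."""
    removed = set(removed)
    adjacency = {}
    for a, b in edges:
        # Register every surviving endpoint, even if it ends up isolated.
        for node in (a, b):
            if node not in removed:
                adjacency.setdefault(node, set())
        if a not in removed and b not in removed:
            adjacency[a].add(b)
            adjacency[b].add(a)
    unseen = set(adjacency)
    largest = 0
    while unseen:
        stack = [unseen.pop()]
        size = 1
        # Depth-first walk over one connected component.
        while stack:
            node = stack.pop()
            for neighbor in adjacency[node]:
                if neighbor in unseen:
                    unseen.remove(neighbor)
                    stack.append(neighbor)
                    size += 1
        largest = max(largest, size)
    return largest, len(adjacency)


# Hypothetical toy network with a single bootstrap node.
toy_edges = [("boot", "a"), ("boot", "b"), ("a", "b"), ("b", "c"), ("boot", "d")]
print(largest_component_after_removal(toy_edges, {"boot"}))
```

If the largest component stays close to the number of surviving vertices, the network remains reachable without the removed nodes; in the toy data, node "d" is cut off because it knew only the bootstrap node.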

Further work

Using results of this work we will enrich our products with two features:

Stay tuned!

Originally published at https://baking-bad.org on July 30, 2020.


Inspecting Tezos decentralization: 200+ public nodes, 1000+ in total was originally published in Coinmonks on Medium.