The tech: Narwhal and Tusk: A DAG-based mempool
Consensus on the idea that consensus isn't the solution.
Since the successful invention of the Proof-of-Work (PoW) consensus algorithm we’ve seen many innovations happening with regards to the consensus algorithm of which the most notable is Proof-of-Stake. However the TPS increased significantly by the transition to the PoS consensus mechanism, this inherently runs into scalability issues as well. Some scalability solutions that have been proposed over the years include:
Adjusting the block size
Restrict the amount of validators
New consensus mechanisms
Some of these ‘solutions’ have been implemented and resulted into higher transaction throughput but none of these are viable, longterm scaling solutions at all. The general consensus among cryptographers and distributed systems experts was that scalability should be sought after in the consensus mechanism. The researchers of the Diem project at former FaceBook successfully challenged this result of group thinking. According to them:
“A mempool, that reliably distributes transactions, is the key enabler of a high-performance ledger. It should be separated from the consensus protocol altogether, leaving consensus only the job of ordering small fixed-size references. This leads to an overall system throughput being largely unaffected by consensus throughput.”
The problem here is that the naive mempool design of monolithic blockchains like Bitcoin or Ethereum place transaction dissemination in the path of the consensus process and therefore it impacts the performance more than the consensus process itself. Narwhal demonstrates that it is possible to off-load transactions in a reliable manner to disseminate transactions before posting it to the mempool protocol. The only thing that consensus subsequently has to do is sequence a small amount of metadata. To reduce the latency in suboptimal circumstances, Narwhal will be complemented with a consensus protocol that generates a random coin to provide asynchronous consensus. So, in short, Tusk is a high performant consensus protocol that leverages Narwhal to maintain high throughput even if the number of validator increases.
Narwhal
Narwhal has the following properties:
The Integrity and Block-Availability properties of Narwhal enable a convenient separation of data dissemination from consensus. Meaning if implemented, Narwhal allows for the consensus layer to only order block digest certificates, which are small in size. A digest is a cryptographic hash function containing a string of digits created by a one-way hashing formula. I will elaborate this in a bit.
The Causality and Containment properties guarantee that any compatible consensus mechanism that utilizes Narwhal, even though it isn’t fully synchronous, it will reach high throughput. If you agree with a block digest when synchrony is restored after a period of asynchrony, the protocol can safely order all the causal related blocks and commit them.
The Chain-Quality property allows Narwhal to be used by blockchains providing censorship resistance. It also imposes restrictions on block creation and therefore prevents unnecessary ‘garbage collection’. DAG-based consensus mechanisms will eventually converge but it’s not sure when that moment will be, to make sure the chain does not stagnates and can reconstruct transactions there is this technique called ‘garbage collection’. However, one would assume that adversaries could censor transactions by delaying transactions and trigger the garbage collection mechanism. This won’t work due to the fact that the honest validator can re-insert the transaction in the next round, this process is called ‘Garbage Collection’. See the image below.
Last but not least, the mempool abstraction can scale and support the increasing demand in consensus. Narwhal’s throughput increases linearly with the number of resources each validator has while the latency doesn’t suffer. Which is arguably better than HotStuff since it only scales up to a certain degree.
Validators receive certificates of availability of blocks r and accumulate these into a list with certificates. If certificates for round r - 1 are accumulated from 2f + 1 distinct validators, a validator moves the local round to r, and subsequently creates and broadcasts a block for the new round.
Each block includes:
Identity of the creator
Local round r
Current list of transactions and certificates from r - 1
A signature from its creators
It must be said that correct validators only create a single block per round. The validators reliably broadcast each block they create to ‘Integrity’ and ‘Availability’. I will explain the process flow in a bit as well as the other properties of Narwhal.
A certificate of availability includes 2f + 1 signatures, this means that at least a majority of the honest validators have checked and stored the block. Since a block contains references to certificates from previous rounds, an inductive argument is used to show that all blocks in the causal history are certified, available and adhering to causality.
Consensus algorithms that are operating in partial synchrony, such as HotStuff or LibraBFT can leverage Narwhal. The aforementioned algorithms usually use round-robin leader election in which the leader proposes a block that is certified by other validators. With Narwhal, instead of proposing a bundle of transactions, a leader can propose one or more certificates of availability in Narwhal. Narwhal guarantees that given a certificate all validators see the same causal history which itself is a DAG over blocks.
A header gets voted on if it's the first you see from a specific validator for a specific number of the round. The validator aggregates all the votes into a certificate and re-broadcasts the certificate. This goes on and on wherein the next block will link to 2f + 1 certificates of the previous rounds so you build a structured round. This process is round 1 in Narwhal, by continuing this we get a Byzantine ‘Reliable’ Broadcast.
So the output to the consensus layer is the following: which is a structured mempool. With Tusk you just look at the DAG, interpret it to derive a total order of transactions. The consensus mechanism won’t send any messages at all.
Tusk
Tusk inherits its basis from DAG-rider and uses a round-based DAG-structure. The Tusk algorithm receives a causally ordered DAG constructed by Narwhal and orders its blocks with zero extra communication. Tusk differentiates itself in validators include in their blocks information to generate a distributed perfect random generated coin. To interpret the DAG, validators divide it into waves, each of which consists of 3 consecutive rounds. Conceptually, in the first round of a wave each validator proposes its block (and all its causal history); in the second round each validator votes on the proposals by including them in their block; and in the third round validators produce randomness to elect one random leader’s block in retrospect. To interpret Tusk as a DAG it is divided in waves which consists out of a 3-round structure and of which the third round is going simultaneously with the first round of the next wave. This is to reduce latency as well as creating randomness.
The random shared coin generation nor the consensus logic introduces any additional messages over Narwhal. Therefore, Tusk has zero message overhead.
As mentioned earlier, in Tusk every odd round gets a coin, so that every odd round you can go back and look at the round. The process continues and due to the coin you can see who was the leader of round 1 as a starting point. In order for a leader to be supported he needs to have f + 1 support.
But if it isn't there you just continue and ask “How much support does this leader has?” Support means: “How many nodes in the next round (R2) have linked to the leader?” As mentioned earlier, a leader needs to have f + 1 support. If there isn’t enough support you just continue to the next round and go over until you find a round wherein a leader has enough support. You recursively go back to the round that you couldn’t find a leader for and commit everything that's linked to L2.
If the leader in one of the next odd rounds gets to be elected as a leader, it goes back to the first round to commit it, up to the current round. Anything that is linked to the leader of the current round will be directly committed.
Benchmark
In theory, Narwhal and Tusk allows for a 3 second latency with 130K TPS. Let’s take a look at the consensus mechanisms which use a similar design architecture like Narwhal and Tusk. This is the benchmark with the trade-offs compared to other combinations. In this context the latency will refer to the time it takes to finalize a transaction. The last column is about the performance in suboptimal circumstances to which a blockchain definitely will be exposed. Based on this in one could say that in terms of TPS Narwhal and Tusk is the best performer, however there are more criteria which should be taken into consideration.
Thoughts
Despite the fact that I’m quite fond of this technical development, I think that there still are a few considerations prior to see this coming to fruition anytime soon.
Tested in a close environment
The protocol has been tested in a close environment and has been subjective to a simulated hostile environment but this doesn’t reflects the ‘real’ hostile and open environment which permissionless blockchains operate in. On top of this, the project has just been running with 50 single machine validators which isn’t demonstrative in a real environment as well. In theory this shouldn’t matter but in reality it is yet to be seen if it’s subjective to any practical difficulties which theoretically weren’t discovered on forehand.
Rust implementation
As far as I know, it currently exists as a Rust implementation, this isn’t per se bad but signals to me that we don’t see this being implemented anytime soon in the EVM-environment. This isn’t per se bad, since we still have to witness if this a viable concept with the imminent launch of the Sui blockchain. Regarding to the Sui implementation, they have decided in August 2022 that they are going to implement a modified version of Narwhal and Tusk, which is Narwhal and Bullshark. There is a good chance that if this concept is feasible, and an added value to blockchain architecture, existing blockchains might look into implementing this architectural adjustment as well. For the developers, the combination Narwhal and Bullshark only has 200 lines of code that needs to be implemented.
Execution engine
You are as strong as your weakest shackle and as fast as your slowest component, this definitely applies to blockchain as well. As long as there isn’t an execution engine that can utilize Narwhal and Tusk to its fullest, the added value of this architectural marvel will be insignificant.
Disclaimer: Nothing in this article is financial advise and solely serves educational purposes. If anything is incorrect, please let me know.