Introduction to Data Availability

In a typical blockchain architecture, there are usually three layers:

- Execution layer: executes transactions and computes state transitions (e.g., L2 rollups)
- Settlement layer: where transactions are ultimately confirmed (e.g., Ethereum's L1)
- Data availability (DA) layer: guarantees that the underlying transaction data has been published and can be retrieved

Generally, the settlement layer is where transactions are ultimately confirmed, such as Ethereum's L1, while various L2 rollups specialize in execution. So how do we ensure the crucial property of data availability? As a permissionless design, the DA layer must give the execution and settlement layers a way to check that transaction data is indeed available, with as few trust assumptions as possible. This is the lowest-level foundation of the entire blockchain, so how should we handle it?

What Are Data Availability and DA Problems?

As we know, a robust network has many nodes that hold all the on-chain transactions and reach consensus on a final state. Nodes play different roles: full nodes, which consume significant computing power and storage space (they keep all transaction data); light clients, which inherit security from full nodes; and consensus nodes, which are responsible for reaching consensus. So when a new block is produced, how can nodes determine whether all the data in it has actually been published to the network?

Of course, we could download and verify all the transaction data in the new block, but this would significantly increase the load on nodes, making the approach unacceptable. A solution to the data availability problem should instead give network participants who do not download and store the data themselves sufficient assurance that the complete transaction data has been published and is available for verification.

Current solutions for DA

Data Availability Sampling

Data availability sampling is a mechanism by which nodes can verify that a block's data is available without downloading all of it. Each node (including non-staking nodes) downloads a small subset of the total data chosen at random, so that no individual node carries an excessive load. To make sampling meaningful, the data set is first extended with redundant information using an erasure code such as Reed-Solomon encoding, so that the full data can be reconstructed from a fraction of the chunks. A toy sketch of the sampling logic follows.
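The sketch below is illustrative only, not any client's actual protocol. It assumes 2x erasure coding, so a malicious producer must withhold more than half of the chunks to prevent reconstruction; every function and parameter name here is made up.

```python
import random

# Toy data availability sampling (illustrative only).
# Assumption: with 2x erasure coding, a withholding producer must hide
# more than half of the chunks, so each random sample catches the attack
# with probability > 1/2.

def sample_ok(available: set, total: int, samples: int) -> bool:
    """Return True if every randomly sampled chunk index is available."""
    return all(random.randrange(total) in available for _ in range(samples))

def confidence(samples: int) -> float:
    """Lower bound on catching a withheld block: all samples miss the
    hidden chunks with probability < (1/2)^samples."""
    return 1 - 0.5 ** samples

# A producer publishes only 127 of 256 chunks; a light node takes 20 samples.
available = set(range(127))
print(sample_ok(available, 256, 20))                        # almost certainly False
print(f"confidence with 20 samples: {confidence(20):.7f}")  # ~0.9999990
```

Note how quickly the confidence grows: each additional sample halves the chance that a withheld block slips through, so a few dozen samples already give near-certainty without downloading the block.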

Erasure Code

If we want to improve the reliability of stored data, the most straightforward way is to keep multiple full copies, but that wastes storage space. Erasure coding (EC) achieves the same high reliability at a fraction of the cost of storing multiple copies. How is this achieved? Let's take a brief look at the principle of EC with an example:

The mathematical principle is simple: a polynomial of degree k-1 is uniquely determined by its k coefficients, or equivalently by any k points on its curve. So if we treat k data blocks as the coefficients of such a polynomial and evaluate it at n > k points, any k of those n values are enough to recover the original data.
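As a concrete illustration with made-up numbers, take k = 2 and treat two data values as the points (1, 4) and (2, 6). The unique line through them is y = 2x + 2, so we can publish a redundant third point (3, 8). If any one of the three points is lost, the remaining two still determine y = 2x + 2, and with it the original data.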

The process of generating parity blocks with EC is called EC encoding, which amounts to multiplying the vector of data blocks by a Vandermonde matrix. When data is lost and needs to be recovered, the EC decoding process is used. Both directions are sketched below.
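Here is a minimal sketch of Reed-Solomon encoding and decoding over the small prime field GF(257). This is purely illustrative; production systems use fields such as GF(2^8) with heavily optimized arithmetic, and all names below are made up.

```python
# Minimal Reed-Solomon sketch over GF(257) (illustrative only).
P = 257  # small prime modulus, chosen only for readability

def eval_poly(coeffs, x):
    """Evaluate the polynomial whose coefficients are the data blocks
    (lowest degree first) at point x, mod P (Horner's method)."""
    result = 0
    for c in reversed(coeffs):
        result = (result * x + c) % P
    return result

def encode(data, n):
    """Encode k data blocks into n >= k code blocks by evaluating at
    x = 1..n, equivalent to multiplying by an n x k Vandermonde matrix."""
    return [eval_poly(data, x) for x in range(1, n + 1)]

def poly_mul_linear(poly, xj):
    """Multiply a coefficient list (lowest degree first) by (x - xj)."""
    out = [0] * (len(poly) + 1)
    for t, c in enumerate(poly):
        out[t + 1] = (out[t + 1] + c) % P
        out[t] = (out[t] - xj * c) % P
    return out

def decode(points):
    """Recover the k original data blocks from any k surviving (x, y)
    pairs via Lagrange interpolation over GF(P)."""
    k = len(points)
    coeffs = [0] * k
    for i, (xi, yi) in enumerate(points):
        basis, denom = [1], 1
        for j, (xj, _) in enumerate(points):
            if j != i:
                basis = poly_mul_linear(basis, xj)
                denom = denom * (xi - xj) % P
        scale = yi * pow(denom, -1, P) % P   # modular inverse (Python 3.8+)
        coeffs = [(c + scale * b) % P for c, b in zip(coeffs, basis)]
    return coeffs

data = [42, 7, 19]                       # k = 3 original data blocks
code = encode(data, 6)                   # n = 6 coded blocks
survivors = [(2, code[1]), (4, code[3]), (6, code[5])]  # lose half of them
assert decode(survivors) == data         # full recovery from any 3 blocks
```

The final assertion shows the key property: half of the coded blocks are lost, yet any k = 3 surviving evaluations reconstruct the original data exactly.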

Implementation details

EC's approach is simple and elegant, but it rests on a very strong premise: the block producer behaves honestly while performing the erasure coding. Is there a way to enforce this premise? How do we ensure that the published chunks really encode the data in the block?

If we want to guarantee the validity of the data, it is very simple: we can have the block producer issue a vector commitment to the content after the block is constructed (a minimal commitment is sketched after this list), and the validity of the EC process itself can be guaranteed in a number of ways, for example:

- Fraud proofs: any full node can re-encode the data, and if the result does not match the commitment, it publishes a short proof that causes the block to be rejected
- Validity proofs: the block producer attaches a cryptographic proof that the encoding was performed correctly
- Polynomial commitments such as KZG, which bind the commitment to the underlying polynomial itself, so an incorrect encoding cannot pass verification
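As an example of a vector commitment, the sketch below builds a simple Merkle root in Python. This is a generic illustration rather than any particular chain's scheme, and as noted above, many DA designs use KZG polynomial commitments instead.

```python
import hashlib

# Minimal Merkle-tree vector commitment sketch (illustrative only).

def h(data: bytes) -> bytes:
    return hashlib.sha256(data).digest()

def merkle_root(chunks: list) -> bytes:
    """Commit to an ordered vector of chunks with a single 32-byte root."""
    level = [h(c) for c in chunks]
    while len(level) > 1:
        if len(level) % 2:                 # duplicate the last node if odd
            level.append(level[-1])
        level = [h(level[i] + level[i + 1])
                 for i in range(0, len(level), 2)]
    return level[0]

chunks = [b"block-data-0", b"block-data-1", b"block-data-2"]
root = merkle_root(chunks)
# The producer publishes this root; any sampled chunk can then be checked
# against it with a log-sized Merkle proof.
```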

Data Availability Committee

The committee is a trusted party that provides or certifies data availability. We set a percentage threshold and accept a blob only if at least that fraction of committee members attest to it. In the DAC scheme, instead of publishing transaction data on the base layer, the block producer sends the block to the DAC for off-chain storage.
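A toy version of this acceptance rule, with entirely hypothetical names and a two-thirds threshold chosen only for illustration:

```python
# Toy DAC acceptance rule (all names are hypothetical).
# Each committee member signs the blob commitment it has stored; the blob
# is accepted only once the attestation threshold is met.

THRESHOLD = 2 / 3  # e.g. at least two-thirds of the committee must attest

def blob_available(attesters: set, committee: set) -> bool:
    """Accept the blob if enough committee members attested to storing it."""
    valid = attesters & committee          # ignore signatures from outsiders
    return len(valid) >= THRESHOLD * len(committee)

committee = {"alice", "bob", "carol"}
print(blob_available({"alice", "bob"}, committee))       # True  (2/3 met)
print(blob_available({"alice", "mallory"}, committee))   # False (only 1/3)
```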

Depending on the degree of decentralization of the DAC, we can divide it into two different schemes, trusted and permissionless:

- Trusted: a fixed, known set of members whose honesty is simply assumed; cheap and simple, but security reduces to trusting the committee
- Permissionless: anyone can join the committee by staking collateral, and misbehavior such as attesting to unavailable data can be punished by slashing

Solution Comparison

We have briefly described two of the current mainstream DA solutions, both of which have their own advantages and limitations:

- Data availability sampling minimizes trust: security rests on erasure coding and random sampling rather than on any fixed party, but it requires a sufficient number of sampling nodes and adds encoding overhead
- A data availability committee is simpler and cheaper, but it reintroduces a trust assumption in the committee, which staking and slashing can mitigate but not eliminate

Summary

Data availability is critical to the security of any blockchain, as it ensures that anyone can inspect the transaction ledger and verify it. It is also a key difficulty in scaling a blockchain: as blocks get larger, it becomes impractical for the average user to download all the data, and users can no longer validate the chain. The DA problem will only become a hotter topic of discussion as blockchains' scaling needs grow.


© By Whisker —@whisker17