Integrity in the era of data
The data of today is actually owned, used, packaged and sold by big players; this data is not owned by you. Are we on the correct path to secure the present gold?

In today’s digitised world, data is often compared to gold. But who secures the integrity of today’s gold? Do you have a way of making sure your data wasn’t maliciously changed? Do you know what data is being sold, to whom it is being sold and for how much? Do you get something out of that sale?

Many questions arise when we think about the custodianship of data. Realistically speaking, almost no one cares, and the data of today is actually owned, used, packaged and sold by big players; this data is not owned by you. Are we on the correct path to secure the present gold?

Today, I’ll take you on a journey to discover more about the integrity of data, and to address the question: Are blockchain networks like Bitcoin, Ethereum, NEM or dHealth secure against data manipulation? Can they be used to support innovation? To align incentives? To further grow digital utility?

Let’s start with taking a look at how these protocols agree on a common chain of blocks.

Decentral agreement

Many of the launched blockchain networks started by creating their own consensus algorithms. These algorithms are used not just to communicate on a Peer-to-Peer network, but also to agree on some predetermined rules of the network. Abiding by the rules of a source code protocol: this is what consensus represents.

Bitcoin introduced a consensus algorithm called Proof-of-Work to secure the synchronization and agreement of network nodes about the distributed state of the Bitcoin Blockchain.

This consensus algorithm starts with the agreement on searching for a number: the hash of a candidate block, read as a number, must not be bigger than a given target, which is derived from the network’s difficulty (another number). To find such a block, the miner increases a field called “Nonce” again and again, until the produced hash is smaller than the target. On top of increasing the “Nonce”, a Bitcoin miner must also include transactions in the block they are mining, so the search starts over for every new block. This process requires computer resources and energy, thereby enabling a proof-of-work mechanism, where every block actually delivers a proof that work was done to find the lucky number.
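The nonce search described above can be sketched in a few lines of Python. This is a simplified illustration and not Bitcoin’s actual block format: the `block_hash`, `mine` and `verify` helpers, the string payload and the toy target are assumptions made for this example (Bitcoin hashes a binary block header twice with SHA-256 and uses a far smaller target).

```python
import hashlib

def block_hash(previous_hash: str, transactions: str, nonce: int) -> int:
    """Hash the block contents together with the nonce and read it as a number."""
    payload = f"{previous_hash}|{transactions}|{nonce}".encode()
    return int.from_bytes(hashlib.sha256(payload).digest(), "big")

def mine(previous_hash: str, transactions: str, target: int) -> int:
    """Increase the nonce until the block hash falls below the target."""
    nonce = 0
    while block_hash(previous_hash, transactions, nonce) >= target:
        nonce += 1
    return nonce

def verify(previous_hash: str, transactions: str, nonce: int, target: int) -> bool:
    """Checking a proof costs a single hash, however long the search took."""
    return block_hash(previous_hash, transactions, nonce) < target

# Toy target (assumption): the hash must start with 12 zero bits,
# which takes about 4096 attempts on average.
TARGET = 1 << (256 - 12)
PREVIOUS = "00" * 32
TRANSACTIONS = "alice->bob:5"

nonce = mine(PREVIOUS, TRANSACTIONS, TARGET)
print(nonce, verify(PREVIOUS, TRANSACTIONS, nonce, TARGET))
```

The asymmetry is the point of the design: finding the nonce takes thousands of hashes here, while any node can check the result with one.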

It is this proof of provided work that makes Bitcoin’s history so costly to rewrite, which is why Bitcoin is widely regarded as the most secure of all blockchain networks currently in existence.

Adding to this, every time a Bitcoin block is mined, other miners will include a reference to the last block in their work when searching for the next best number.
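To illustrate why this reference to the last block matters, here is a minimal sketch in Python. The block layout (`height`, `previous`, `data`) and the helper names are assumptions for this example and do not correspond to the format of any real network.

```python
import hashlib
import json

def header_hash(header: dict) -> str:
    """Deterministic SHA-256 over a block header."""
    return hashlib.sha256(json.dumps(header, sort_keys=True).encode()).hexdigest()

def append_block(chain: list, data: str) -> None:
    """Add a block that references the hash of the current last block."""
    previous = header_hash(chain[-1]) if chain else "0" * 64
    chain.append({"height": len(chain), "previous": previous, "data": data})

def first_broken_link(chain: list):
    """Return the height of the first block whose reference does not match, or None."""
    for parent, child in zip(chain, chain[1:]):
        if child["previous"] != header_hash(parent):
            return child["height"]
    return None

chain = []
for payload in ["genesis", "alice->bob:5", "bob->carol:2"]:
    append_block(chain, payload)

print(first_broken_link(chain))      # None: every reference matches
chain[1]["data"] = "alice->bob:500"  # tamper with an old block
print(first_broken_link(chain))      # 2: block 2 no longer points at block 1
```

Because every block commits to the hash of its predecessor, changing an old block invalidates every block that follows it.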

Network nodes determine the current state of the blockchain by holding a copy of the data themselves. These nodes can also be used to synchronize with the network or to verify whether another node is malicious or not, for example.

Nodes on the network agree on the state of the blockchain by following up with blocks that are mined. Each of these blocks must follow a set of rules when it is being mined, often referred to as the consensus rules.

Consensus

Many consensus algorithms were born after Bitcoin’s Proof-of-Work (PoW) idea. Another one that is well known and established is Proof-of-Stake (PoS), which works somewhat similarly in terms of finding a valid number, but defines that an account’s probability to mine blocks depends on the account’s stake (its balance or amount of tokens) rather than on using more and more computer resources in a race for the next block.
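The idea that the chance of producing a block follows the stake can be sketched as a weighted random draw. This is only an illustration of the principle in Python: the account names and balances are made up, and real Proof-of-Stake protocols select block producers with more elaborate, verifiable mechanisms.

```python
import random
from collections import Counter

def pick_block_producer(stakes: dict, rng: random.Random) -> str:
    """Draw one account, with a probability proportional to its stake."""
    accounts = list(stakes)
    weights = [stakes[account] for account in accounts]
    return rng.choices(accounts, weights=weights, k=1)[0]

# Hypothetical balances: alice holds 60%, bob 30% and carol 10% of the supply.
stakes = {"alice": 600, "bob": 300, "carol": 100}
rng = random.Random(42)  # fixed seed so that the run is repeatable

ROUNDS = 10_000
produced = Counter(pick_block_producer(stakes, rng) for _ in range(ROUNDS))
for account, count in produced.most_common():
    print(account, count / ROUNDS)
```

Over many rounds, the share of blocks per account approaches its share of the total stake.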

Such Proof-of-Stake networks are shared amongst stakeholder communities which are authorised to use the network – and to support the network – because they own a certain share of the supply in circulation.

Proof-of-Work uses a lot of computer resources and its use of energy is quite intense. Alternatives include but are not limited to: Proof-of-Stake, Proof-of-Importance and Proof-of-Correctness. These alternatives usually require fewer computer resources and less energy to align and consent.

This is also how so-called permissioned blockchains came to be. Such networks also use protocol rules to define the process of block generation, just like Bitcoin, but on top of this, a list of participants (stakeholders) is held on the network to authorise the addition of new blocks onto the blockchain. If you do not own a certain share of the network’s cryptocurrency, you will not be able to create new blocks.

In practice, PoW resembles a meritocracy, where you deliver work to find the next block, and PoS resembles more a democracy, where finding a new block is tied to your position in the network. What the two have in common is what makes blockchain relevant: they make data redundant and, more importantly, they enable verifications.

Integrity, in a network!

For a network to secure the integrity of data, it must distribute it amongst peers. In fact, the mix of public key cryptography with the redundancy of data, as it is distributed amongst all peers, is what enables verifications about integrity. Yes, you can also verify whether the data of the Bitcoin network is intact or whether it was manipulated.
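Redundancy is what makes such a check possible in the first place: when several peers hold a copy, their answers can be compared. The Python sketch below assumes that each peer reports the hash of the block at the same height; the node names and hash values are made up. A real node would additionally validate signatures and consensus rules rather than rely on a simple majority.

```python
from collections import Counter

def find_outliers(reported_hashes: dict) -> tuple:
    """Return the hash most peers agree on and the peers that report something else."""
    majority_hash, _ = Counter(reported_hashes.values()).most_common(1)[0]
    outliers = sorted(
        peer for peer, block_hash in reported_hashes.items()
        if block_hash != majority_hash
    )
    return majority_hash, outliers

# Hypothetical answers of four peers for one and the same block height.
reported = {
    "node-a": "9f2c",
    "node-b": "9f2c",
    "node-c": "9f2c",
    "node-d": "17aa",
}
print(find_outliers(reported))  # ('9f2c', ['node-d'])
```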

Another key metric of distribution is that of finding out how many miners actually execute mining operations on a given network. Is there any single actor that mines more than 51% of the network’s blocks? Let’s find that out.

Following up from some of our older articles, we will be using a few simple scripts to run the numbers on the distribution of blockchain networks. With the below source code, we will take a look at the last ten blocks of the NEM blockchain. This will let us determine whether a single miner owns more than 51% of the block generation process.

For this example, we’ll first install the nem-sdk package using npm:

npm install nem-sdk

Now, we create a new file that we’ll name index.js and add the following source code:

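// prepares the connection
const sdk = require("nem-sdk").default;
const networkId = 104; // NEM mainnet; not needed for these read-only requests
const endpoint = {
    host: "http://hugealice.nem.ninja",
    port: 7890
};

// reads the last `count` blocks, oldest first
async function readLastBlocks(count) {
    // the requests return promises; the height request resolves to { height: ... }
    const current = (await sdk.com.requests.chain.height(endpoint)).height;
    const blocks = [];

    // loop that reads the last `count` blocks
    for (let i = current; i > current - count; i--) {
        const block = await sdk.com.requests.chain.blockByHeight(endpoint, i);
        blocks.unshift(block);
    }
    return blocks;
}

readLastBlocks(10).then((blocks) => {
    // work with the retrieved blocks here
    console.log(blocks.length);
});

For each of those blocks that the script above retrieved, we will now extract the signer field. This field represents the public key of the miner of a block. A simple line of code, placed where the blocks are available, puts this public key inside a variable:

let pubKey = blocks[0].signer;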

To complete our above example, we must read the signer public key of each of the last 10 blocks. In practice, it shouldn’t happen very often that one miner would mine most of those 10 blocks.
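Counting the signers is then a matter of a few lines. The sketch below is written in Python, like the statistical examples further down, and assumes that the blocks were exported as a list of objects that each carry a signer field. The sample public keys are placeholders and not real NEM data.

```python
from collections import Counter

def signer_shares(blocks: list) -> dict:
    """Share of the given blocks that each signer (public key) generated."""
    counts = Counter(block["signer"] for block in blocks)
    return {signer: count / len(blocks) for signer, count in counts.items()}

def dominant_signer(blocks: list, threshold: float = 0.5):
    """Return the signer that generated more than `threshold` of the blocks, or None."""
    shares = signer_shares(blocks)
    signer, share = max(shares.items(), key=lambda item: item[1])
    return signer if share > threshold else None

# Placeholder data standing in for the last 10 blocks.
sample_blocks = [{"signer": key} for key in [
    "pubkey-A", "pubkey-B", "pubkey-A", "pubkey-C", "pubkey-D",
    "pubkey-A", "pubkey-E", "pubkey-B", "pubkey-F", "pubkey-G",
]]

print(signer_shares(sample_blocks)["pubkey-A"])  # 0.3
print(dominant_signer(sample_blocks))            # None: nobody mined the majority
```

Ten blocks are a very small sample, so a larger window gives a more reliable picture of how block generation is distributed.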

The above can be extrapolated to other blockchain networks, including Bitcoin and Ethereum. It is only by verifying who is actually mining blocks on the network that you can determine whether the network is operated in a distributed fashion or more centrally by a small set of actors. Please go ahead and verify the same for other networks that you may know of.

This verification as illustrated above can be used as a metric to determine how distributed the block generation is, for a specific network – given that distribution amongst peers is extremely important to ensure proper – and verifiable – data integrity.

On the probability of generating most blocks

We will now use a series of binomial tests to determine the probability of one miner mining at least two out of ten blocks of a blockchain network. Following up, we will determine the probability of one miner mining at least half of ten consecutive blocks of a network.

As an example, we’ll take 3 miners with the following block generation rates:

  M1 : 49% = 0.49
  M2 : 10% = 0.10
  M3 : 5% = 0.05

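We will then model the number of blocks X that one and the same miner generates out of 10 consecutive blocks with a binomial distribution, where p is the block generation rate of that miner:

  X ~ B(n=10, p)

The first question is how probable it is that X is at least 2.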

To execute our binomial tests, we will be using the scipy.stats module of the SciPy library for Python, which you can install with the following command:

pip install scipy

Next, using the library, we will determine the probability for each of the above listed miners, to mine at least 2 blocks out of the 10 being generated:

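import scipy.stats as st
# binom_test was removed in recent SciPy releases; binomtest(...).pvalue
# returns the same one-sided probability P(X >= 2)
M1 = st.binomtest(2, 10, 0.49, alternative='greater').pvalue
M2 = st.binomtest(2, 10, 0.10, alternative='greater').pvalue
M3 = st.binomtest(2, 10, 0.05, alternative='greater').pvalue
print(M1, M2, M3)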

The above shows that M1 has a probability of more than 98% to mine at least 2 blocks in a series of 10 generated blocks. With the above success rates, M2 only has a probability of about 26% and M3 has a low probability of just under 9% to mine at least 2 blocks out of a series of 10 consecutive blocks.

Next, we will compute the probability of mining at least 5 out of 10 consecutive blocks, which will be much smaller, even with the highest of the above success rates:

import scipy.stats as st
# one-sided probability P(X >= 5) for each miner
M1 = st.binomtest(5, 10, 0.49, alternative='greater').pvalue
M2 = st.binomtest(5, 10, 0.10, alternative='greater').pvalue
M3 = st.binomtest(5, 10, 0.05, alternative='greater').pvalue
print(M1, M2, M3)

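This time we get just under 60% for miner M1 and a sinking probability of only about 0.16% for M2. Note that 5 out of 10 is half of the blocks and not yet a majority: for a strict majority of 6 or more blocks, the probability of M1 drops further, to roughly 35%. Even with a success rate of 49% at generating blocks, a single miner is therefore more likely than not to fall short of a majority in a series of 10 blocks, and smaller miners have practically no chance. In other words, it would be common for one miner to mine more than one block out of 10 consecutive blocks, but not for a single miner to mine the majority of those blocks.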
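These tail probabilities can be cross-checked without SciPy by summing the binomial probabilities directly. The sketch below uses only the Python standard library; the helper name `prob_at_least` is made up for this example. In addition to the two cases above, it also prints the probability of a strict majority, that is 6 or more blocks out of 10.

```python
from math import comb

def prob_at_least(k: int, n: int, p: float) -> float:
    """P(X >= k) for X ~ B(n, p): the sum of the binomial probabilities from k to n."""
    return sum(comb(n, i) * p**i * (1 - p) ** (n - i) for i in range(k, n + 1))

rates = {"M1": 0.49, "M2": 0.10, "M3": 0.05}

for k in (2, 5, 6):
    for miner, p in rates.items():
        print(f"{miner}: P(at least {k} of 10) = {prob_at_least(k, 10, p):.4f}")
```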

Even with a higher success rate at generating blocks, it is not possible for miners to determine exactly when they will mine the next block. This makes manipulating the data inside those blocks extremely complicated, if not impossible, which of course speaks in favour of the integrity of data.

Proof-based systems

Proof-based systems use public-ledger technologies to find consensus – or agree – on the state of the blockchain as well as to secure their data.

These systems make it possible – not just for one actor – but for everyone in the network to verify the integrity of data and to read the latest state changes on the network.

A proof-based system lets you read and verify data from a network of sources rather than from just one trusted source on the web – effectively distributing data and making it redundant.

The digitalisation of our society has clearly begun and we think it is important that data gets stored in a distributed place, redundantly and verifiably, to avoid today’s gold ending up in the hands of just a few big players.

Proofs, like for example the proof of executed work, can help increase the security and the integrity of data. As early as 1992, another flavour of Proof-of-Work was proposed to fight spam: e-mail software would be required to produce a minimal amount of work before an e-mail can be sent to your inbox. Such a scheme hits spam bots hard, as they have to produce some work for every e-mail they send. This is a great example of how the integrity of e-mails can be increased by using a proof-based system under the hood before transporting them.

A similar principle applies to CAPTCHA systems, where a human user must complete some work before they can log in to a piece of software. The actual objective of this work is to augment the integrity of data by making sure that a human user is indeed present at the computer and could solve the requested task.

Applications

Public-ledgers that use Proof-of-Work algorithms effectively create a verifiable chain of work.

As such, any time a new block is added to the blockchain, the proof is added onto it in the form of a number that is practically impossible to guess. The delivered proof can be verified by any network participant by recomputing the hash of the block, while the transactions inside the block are verified using standard public-key cryptography.

This digital chain of work could be used in many industries to augment transparency when it comes to verifying the work that has been carried out.

Public-ledger technology and consensus algorithms can be leveraged in the digitalisation of sectors like e-Mobility, e-Governments / e-Identities, digital payments and agriculture to augment transparency and allow the end-customer to actually verify what they are being sold or told, versus the current systems where the end-customer has to trust the big players of these industries.

Example application: Shipment with receipt

In the previously mentioned digitalised society, we can perfectly think of a Peer-to-Peer parcel shipment where the delivery is done by a drone. For this example, we have to determine a consensus algorithm which neither of the two actors, sender or recipient, can manipulate.

We define a consensus algorithm similar to Proof-of-Work. Yet, our consensus algorithm will not be based on the search for a number: rather than proving that some work was done, we will be proving that a certain distance was covered by our drone. This can be achieved using the geo-location of the drone combined with its historical energy consumption. The energy consumption will be captured in the form of blocks of transactions that are distributed using our public-ledger network.

Let’s put a name on our hypothetical consensus algorithm: Proof-of-Distance. This system shall ensure that blocks get appended to the blockchain only if a drone does indeed cover the specified distance to eventually deliver the parcel as intended.
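A first building block for such a hypothetical Proof-of-Distance could look like the following Python sketch. It only covers the geo-location part and leaves the energy consumption aside. The function names, the tolerance and the sample coordinates are assumptions for this example, and distances are approximated with the haversine formula on a spherical Earth.

```python
from math import asin, cos, radians, sin, sqrt

EARTH_RADIUS_KM = 6371.0

def haversine_km(a: tuple, b: tuple) -> float:
    """Great-circle distance between two (latitude, longitude) points in degrees."""
    lat1, lon1, lat2, lon2 = map(radians, (*a, *b))
    h = sin((lat2 - lat1) / 2) ** 2 + cos(lat1) * cos(lat2) * sin((lon2 - lon1) / 2) ** 2
    return 2 * EARTH_RADIUS_KM * asin(sqrt(h))

def covered_distance_km(track: list) -> float:
    """Sum of the distances between consecutive recorded positions."""
    return sum(haversine_km(p, q) for p, q in zip(track, track[1:]))

def proof_of_distance(track: list, required_km: float, destination: tuple,
                      tolerance_km: float = 0.05) -> bool:
    """Accept only if the drone ended at the destination and covered the distance."""
    arrived = haversine_km(track[-1], destination) <= tolerance_km
    return arrived and covered_distance_km(track) >= required_km

# Made-up track: three positions recorded along the same meridian.
track = [(47.00, 8.00), (47.01, 8.00), (47.02, 8.00)]
destination = (47.02, 8.00)

print(round(covered_distance_km(track), 3))
print(proof_of_distance(track, required_km=2.0, destination=destination))
```

In a real system, the recorded positions would have to be signed or otherwise attested, since self-reported coordinates alone are easy to forge.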

Thereafter, our system will only produce blocks when a parcel shipment is completed. We define that both actors in the trade – the sender and the recipient – must confirm the transaction.

The first step to achieve this will be to encrypt the shipment information with the public key of the recipient and to sign it with the private key of the sender. We hypothetically use asymmetric cryptography such that the data of the recipient, i.e. the shipment information, can be kept private at all times, even when being distributed on a public-ledger network as defined above.

Next, the drone will produce a proof that it covered the specified distance, which can be verified by both parties involved by reading transactions. Finally, the shipment will be completed when the recipient digitally signs the proof of covered distance and the delivery with their private key.

This idea also illustrates a “permissioned” public-ledger implementation. Our consensus algorithm is said to be permissioned, because only select actors can produce blocks for the network, in contrast with more open networks like Bitcoin where anyone could theoretically append blocks to the blockchain.

Conclusion

There are many different approaches to consensus algorithms in the public-ledger sector. These are used to produce agreements about the state of ledgers over many different Peer-to-Peer networks.

The use of public-ledger technologies initiated by the Bitcoin project has a lot of potential in optimising processes and adding ways to find agreements in our society.

Data that cannot be manipulated, is distributed and transparently available – would that not be exactly what is needed for the machine economy of our digital society? We believe that the transparency added by public-ledger technologies could be a game-changer when it comes to overseeing machine processes and being able to interact with these in emergency cases.

Public-ledger technology offers redundancy and integrity for the data on which our future will be built.

We hope that this article was insightful for you and are looking forward to any feedback and messages. Please share your thoughts in the comments section below!
