An article in Inside Higher Ed, plus Appendix

Here is [a link to] an article I wrote on blockchain in higher ed:
Blockchain Pixie Dust
[yes, same title as this site; probably not a coincidence]

When I first proposed that piece to Inside Higher Ed, I also sent along an appendix that I thought might be useful to readers less familiar with certain ideas and terminology from cryptology. In the end, however, IHE did not publish those explanations.

Here, then, for those who want a little extra background, is that appendix:


It helps to understand some modern cryptography when talking about blockchains, and that understanding is well worth the time it takes to gain. Large parts of modern life take place on the Internet, and yet most drivers on this information superhighway do not even know that seat belts exist. Here is something universities would actually be wise to do as they teach more and more digital literacy: teach students how to protect themselves and their data, to the extent possible, before letting them merge onto that highway.

The first basic fact here is that there is absolutely no security or privacy built into the Internet. Sending information across the 'net is like taking your message, cutting it into small chunks [called packets], writing them on postcards, and going down to the local bus station to see if anyone will carry any of those postcards closer to your desired destination. Along the way, the postcards may pass through various other bus stations, where good Samaritans will hand them to riders on buses that get ever closer to your destination. Perhaps the bus operators (or the NSA, or nosy criminals!) will also copy down the fragments on each card and use that information for their own purposes. Bus operators might also refuse passage on the express buses to packets which have not paid a special fee; this is the issue of net neutrality.
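To make the postcard picture concrete, here is a minimal Python sketch, assuming a toy model in which a "packet" is just a sequence number plus a chunk of text (real Internet packets also carry headers with addresses and other fields). The helper names `to_packets` and `reassemble` are hypothetical. The point is that the chunks travel in plain text, so anyone who handles one can read it, and they may arrive in any order.

```python
import random

def to_packets(message: str, size: int = 8) -> list:
    """Cut a message into (sequence number, chunk) pairs: the toy 'postcards'."""
    return [(i, message[start:start + size])
            for i, start in enumerate(range(0, len(message), size))]

def reassemble(packets: list) -> str:
    """Sort the postcards by sequence number and glue the chunks back together."""
    return "".join(chunk for _, chunk in sorted(packets))

message = "My credit card number is 1234 5678 9012 3456"
packets = to_packets(message)
random.shuffle(packets)  # different buses, different routes: arrival order is not guaranteed

for number, chunk in packets:
    print(number, repr(chunk))  # any "bus operator" handling a postcard can read it

print(reassemble(packets))  # the recipient restores the original order
```

Nothing in this toy hides the message: every chunk is readable by whoever carries it.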

Encryption, Decryption, and PKI

Since messages in flight over the Internet are basically public, users must encrypt everything if they want any privacy at all in the modern world. Encryption is the process of scrambling a message so that only the intended recipient can read it, after first unscrambling it [a process called decryption]. Generally, the recipes [algorithms] for encryption and decryption are widely known, so that the community can hopefully find any flaws before they have disastrous consequences, while only a small piece of additional information [the key; think of it as a password] is kept secret by sender and receiver.

The approach I just described, where the same key is used both to encrypt and to decrypt, is called a symmetric cryptosystem, and it works well in many circumstances. For example, hopefully everyone reading this piece encrypts the main drive on their personal devices. After all, what is on your device is, in a way, a message that past you is sending to future you, and just as packets on the Internet are public, so is your stored information if your device is lost or stolen. In that scenario, it is easy for past you to share the key with future you: just remember it.
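A minimal sketch of a symmetric cipher, assuming a toy construction chosen only for illustration: SHA-256 (a hash function, explained below) is run over the key, a per-message nonce, and a counter to produce a stream of pseudo-random bytes, which are XORed with the message. The function names are hypothetical. It shows the defining property of a symmetric cryptosystem, that one secret key both encrypts and decrypts. It is not secure (for one thing, it offers no protection against tampering); real disk and network encryption uses vetted ciphers such as AES.

```python
import hashlib
import secrets

# TOY cipher for illustration only: not secure, do not use for real data.

def keystream(key: bytes, nonce: bytes, length: int) -> bytes:
    """Hash key + nonce + counter with SHA-256 until we have `length` pseudo-random bytes."""
    out = bytearray()
    counter = 0
    while len(out) < length:
        out += hashlib.sha256(key + nonce + counter.to_bytes(8, "big")).digest()
        counter += 1
    return bytes(out[:length])

def xor_cipher(key: bytes, nonce: bytes, data: bytes) -> bytes:
    """XOR the data with the keystream; the very same call encrypts and decrypts."""
    return bytes(d ^ k for d, k in zip(data, keystream(key, nonce, len(data))))

key = secrets.token_bytes(32)    # the shared secret: past you and future you both know it
nonce = secrets.token_bytes(16)  # a fresh random value per message; need not be secret

ciphertext = xor_cipher(key, nonce, b"Note to future me: the spare key is under the mat")
plaintext = xor_cipher(key, nonce, ciphertext)  # same key, same operation, original text back
print(ciphertext.hex())
print(plaintext.decode())
```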

But suppose I want to send my credit card number to Jeff Bezos, so that his minions will send me a nice canister of uranium ore [read the reviews, it gets 3 1/2 stars]. I've never met Jeff and we two do not share a key. So what to do?

The answer is to use an asymmetric cryptosystem instead, where the key used to encrypt is different from the one used to decrypt. These are called the public key and the private key, and they are made widely public and kept jealously secret, respectively. This seems paradoxical, but it is a reality, and in fact one which makes all commerce on the Internet possible. When I go to Jeff's little storefront on the 'net, his public key is posted there. I can use it to encrypt my credit card number before I send it through the [very public] Internet, and no one along the way will be able to steal it, yet Jeff can use his private key to decrypt the message and charge me $39.95 (shipping is free on uranium ore!).
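Here is a minimal sketch of the public/private key idea, assuming "textbook" RSA with the tiny primes 61 and 53, a standard classroom example. Real RSA keys use primes hundreds of digits long plus a padding scheme, and in practice browsers mostly use public-key cryptography to agree on a shared symmetric key rather than to encrypt data such as a card number directly. The helper names are hypothetical.

```python
# Toy "textbook" RSA with tiny primes: insecure, for illustration only.
p, q = 61, 53            # two secret primes (real keys use primes hundreds of digits long)
n = p * q                # 3233, part of the public key
phi = (p - 1) * (q - 1)  # 3120, used to derive the private key
e = 17                   # public exponent: the public key is (n, e)
d = pow(e, -1, phi)      # private exponent 2753, the modular inverse of e (Python 3.8+)

def encrypt(m: int, public: tuple) -> int:
    n, e = public
    return pow(m, e, n)

def decrypt(c: int, private: tuple) -> int:
    n, d = private
    return pow(c, d, n)

public_key, private_key = (n, e), (n, d)
secret = 65                               # in this toy, the message is an integer smaller than n
ciphertext = encrypt(secret, public_key)  # anyone holding the public key can do this
print(ciphertext)                         # 2790
print(decrypt(ciphertext, private_key))   # 65, recoverable only with the private key
```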

If you want a mental image to go with this kind of cryptosystem, imagine special lock-boxes which take two keys. Anyone can use the public key to operate a device inside the box, which sets it so that a message can be put in; once the box is closed, it latches shut and almost no one will be able to open it. After all, the keyhole for the public key, which everyone has, is on the inside of the closed box. The person who does have the private key, however, can insert it into a keyhole on the outside of the box, turn it, and open the box to read the message.

Some asymmetric cryptosystems, RSA being the best known, also allow the owner of a private key to put a digital signature on a public document, as follows. If I have something I want to sign, I "decrypt" it with my private key (even though it was never encrypted), and then send someone both the document and an extra piece of information: that decrypted version. A recipient can verify my signature by encrypting the extra information with my public key and checking whether the result is identical to the document, which they also have. If so, they know that whoever produced the extra information [called the digital signature] must have been the person who knows the private key corresponding to that public key.
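The "decrypt to sign, encrypt to verify" recipe can be sketched with the same toy textbook-RSA numbers. This is an illustration under stated assumptions, not a usable signature scheme: here the document must be an integer smaller than n, whereas real schemes sign a hash of the document and add padding. The names `sign` and `verify` are hypothetical.

```python
# Toy textbook-RSA signatures with the classroom key: insecure, illustration only.
p, q = 61, 53
n = p * q                          # 3233; (n, e) is the public key
e = 17
d = pow(e, -1, (p - 1) * (q - 1))  # 2753; the private key, known only to the signer

def sign(document: int) -> int:
    """'Decrypt' the document with the private exponent; the result is the signature."""
    return pow(document, d, n)

def verify(document: int, signature: int) -> bool:
    """'Encrypt' the signature with the public exponent and compare it with the document."""
    return pow(signature, e, n) == document

document = 1234                         # in this toy, a document is an integer smaller than n
signature = sign(document)
print(verify(document, signature))      # True
print(verify(document + 1, signature))  # False: an altered document no longer matches
```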

In essence, what I am doing with a digital signature is turning the lock-box inside out, using my private key to set it to latch closed, putting the document inside, and closing it. The recipient, who has both the actual document and the locked, inside-out box, can open that box because their public key fits the keyhole which is now on the outside. They can then check that the two versions of the document, the plain one and the one they took out of the now unlocked, inside-out box, are identical.

Asymmetric cryptosystems and digital signatures share a significant problem: how do I know for sure that the public key I get off of Jeff's web page is in fact his? If I were using a symmetric cryptosystem, I would have had to meet him in person to share that single key, and I would have demanded to see his driver's license. Since I have only ever interacted with him over the Internet, using a public key that I hope is his, I can never be sure. Perhaps the postcard that came back from his website when I asked for his public key was swapped, at some random bus station between him and me, by a malicious party, so that I got that person's public key instead. They could then pretend to be Jeff in their communications with me, and I would not know the difference. This is called a man-in-the-middle attack, and to prevent it we need a reliable way to associate real-world identities with public keys: a public-key infrastructure, or PKI.
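One common form of PKI has a trusted third party, a certificate authority (CA), check someone's identity offline and then sign the statement "this name owns this public key." The sketch below assumes a toy CA that uses textbook RSA together with SHA-256 (a hash function, explained in the next section). Every name in it, including `issue_certificate`, `check_certificate`, and the stand-in key values, is hypothetical, and real certificates (X.509) carry much more information. It shows why a swapped key is caught: the CA's signature covers only the original binding.

```python
import hashlib

# Toy CA key pair. These Mersenne primes are used only because they are easy to
# write down; real keys use large, randomly generated primes.
CA_P, CA_Q = 2**61 - 1, 2**89 - 1
CA_N = CA_P * CA_Q
CA_E = 65537
CA_D = pow(CA_E, -1, (CA_P - 1) * (CA_Q - 1))  # the CA's private exponent

def binding_digest(name: str, public_key: int) -> int:
    """Hash the claim 'this name owns this public key' into a number smaller than CA_N."""
    claim = f"{name}:{public_key}".encode()
    return int.from_bytes(hashlib.sha256(claim).digest(), "big") % CA_N

def issue_certificate(name: str, public_key: int) -> int:
    """The CA checks identity offline (driver's license...), then signs the binding."""
    return pow(binding_digest(name, public_key), CA_D, CA_N)

def check_certificate(name: str, public_key: int, certificate: int) -> bool:
    """Anyone holding the CA's public key (CA_N, CA_E) can verify the binding."""
    return pow(certificate, CA_E, CA_N) == binding_digest(name, public_key)

JEFF_KEY, MALLORY_KEY = 1111, 2222  # hypothetical stand-ins for two public keys
cert = issue_certificate("Jeff's storefront", JEFF_KEY)

print(check_certificate("Jeff's storefront", JEFF_KEY, cert))     # True
print(check_certificate("Jeff's storefront", MALLORY_KEY, cert))  # False: the swapped key is caught
```

This only moves the trust problem: you must already hold the CA's genuine public key, which is why browsers and operating systems ship with a built-in list of trusted CA keys.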

Hash Functions

There's one more piece of cryptographic technology we need to build a blockchain: a robust way to tie pieces of information together. Suppose I want a record $R_2$ which refers to another record $R_1$, in such a way that we can be absolutely sure it refers to $R_1$ and not to anything else that differs from $R_1$ in even a single bit [a bit is the smallest piece of data on a computer, taking on the possible values 0 or 1].

The obvious way to do this would be to have $R_2$ simply contain a miniaturized copy of $R_1$. But bits in computers are already about as small as humans can make them, so the copy of $R_1$ inside $R_2$ would take up as much space as all of $R_1$. If we kept chaining records together in this robust but naïve way, the records would quickly grow enormous.

Instead, we feed the original record $R_1$ into an algorithm which crunches it down and spits out a short summary of the whole thing, called a digest, which we can then insert into record $R_2$. These summarizing algorithms are called hash functions; one of the most widely used today is SHA-256, whose output is always a 256-bit digest, no matter how large its input. The idea is that if we feed into a hash function two large inputs that differ in only one bit, the resulting digests will look completely different.
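Python's standard `hashlib` module includes SHA-256, so the one-bit experiment described above can be tried directly. This is a minimal sketch; the record text and the helper name `differing_bits` are made up for illustration.

```python
import hashlib

record = bytearray(b"R1: Alice pays Bob 10 coins")
tampered = bytearray(record)
tampered[-1] ^= 0b00000001  # flip one bit of the last byte: 's' becomes 'r'

d1 = hashlib.sha256(record).digest()
d2 = hashlib.sha256(tampered).digest()

def differing_bits(x: bytes, y: bytes) -> int:
    """Count the bit positions at which two equal-length byte strings differ."""
    return sum(bin(a ^ b).count("1") for a, b in zip(x, y))

print(differing_bits(record, tampered))  # 1: the inputs differ in a single bit
print(d1.hex())
print(d2.hex())
print(differing_bits(d1, d2))            # typically around half of the 256 bits
print(len(hashlib.sha256(bytes(10_000_000)).digest()) * 8)  # 256, even for a 10 MB input
```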

You may wonder how such a thing is possible: surely, by feeding larger and larger random inputs into a hash function, we will eventually find two different inputs with the same digest, which is called a collision. That is a good intuition: there are far more possible inputs than possible digests, so mathematics guarantees that collisions exist, in enormous numbers. But it is practically impossible to find one, because there are a lot of possible digests: $2^{256}$, or roughly $1.16 \times 10^{77}$. If every star in the observable universe had a billion Earth-like planets, all the grains of sand on the beaches of all those planets would still number fewer, by a factor of trillions of trillions, than the possible SHA-256 digests. So it's going to be hard to find those collisions, even though mathematics tells us that they are out there somewhere.
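The arithmetic behind these magnitudes, as a rough sketch. The star, planet, and sand figures are assumed round numbers, not precise data (estimates of the number of stars in the observable universe run from about $10^{22}$ to $10^{24}$, and of the grains of sand on Earth's beaches around $10^{19}$):

$$2^{256} \approx 1.16 \times 10^{77}$$

$$\underbrace{10^{22}}_{\text{stars}} \times \underbrace{10^{9}}_{\text{planets per star}} \times \underbrace{10^{19}}_{\text{grains of sand per planet}} = 10^{50} \ll 10^{77}$$

Even a generic "birthday" search, which exploits the fact that a collision among many random digests turns up after roughly the square root of the number of possible digests, would need about $\sqrt{2^{256}} = 2^{128} \approx 3.4 \times 10^{38}$ hash evaluations before expecting to find one.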

Using hash functions, we can build immutable [more precisely, tamper-evident] chains of blocks, as follows. Everyone agrees upon the first block (called the genesis block). Every block after that contains within it the digest of the previous block. That way, if anyone ever tries to claim that some past block was different from what it actually was, the digest stored in the next block will not match the altered block. And if the forger also changes that stored digest, then the next block itself has changed, so the digest stored in the block after it will not match either, and so on down the whole chain.
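A minimal sketch of such a chain, assuming each block is a small Python dictionary whose digest is the SHA-256 of its JSON encoding. The field names `prev` and `data` and the helper functions are hypothetical, not any particular blockchain's format.

```python
import hashlib
import json
from typing import Optional

def block_digest(block: dict) -> str:
    """SHA-256 digest of a block, computed over a canonical JSON encoding."""
    return hashlib.sha256(json.dumps(block, sort_keys=True).encode()).hexdigest()

def build_chain(records: list) -> list:
    """Start from an agreed genesis block; each new block stores its predecessor's digest."""
    chain = [{"prev": None, "data": "genesis"}]
    for record in records:
        chain.append({"prev": block_digest(chain[-1]), "data": record})
    return chain

def first_broken_link(chain: list) -> Optional[int]:
    """Index of the first block whose stored digest doesn't match its predecessor, or None."""
    for i in range(1, len(chain)):
        if chain[i]["prev"] != block_digest(chain[i - 1]):
            return i
    return None

chain = build_chain(["Alice pays Bob 5", "Bob pays Carol 2", "Carol pays Dave 1"])
print(first_broken_link(chain))            # None: every link checks out

chain[1]["data"] = "Alice pays Bob 500"    # someone rewrites history in block 1...
print(first_broken_link(chain))            # 2: block 2's stored digest no longer matches

chain[2]["prev"] = block_digest(chain[1])  # ...and patches block 2 to cover it up
print(first_broken_link(chain))            # 3: now block 3's stored digest fails instead
```

A forger who rewrites every later block as well ends up with a chain that is internally consistent but different, so the tampering is exposed to anyone who remembers the digest of the latest genuine block.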