The question is: One enormous SQLite file, or actually store the chain as a collection of files?
In the minimum viable product, the blockchain will be quite small, and it will be workable to put it in one big SQLite file. The trouble with one enormous SQLite file is that when it gets big enough, we face a high and steadily increasing risk of one sector on the enormous disk going bad, corrupting the entire database. SQLite does not handle the loss of a single sector gracefully.
And one enormous SQLite file cannot possibly handle the world monetary system, which will eventually go to two hundred thousand transaction inputs per second.
We will have to use zero knowledge sharding, in which each peer handles only selected shards of the global transaction data, and submits a concise zero knowledge proof that it did so honestly.
The optimal solution is to store recently accessed data in one big SQLite file, while also storing the data in a large collection of blocks once it has become subject to wide consensus. Older blocks, fully incorporated in the current consensus, get written to disk in our own custom Merkle-patricia tree format, with append only Merkle-patricia tree node locations: a sequential append only collection of binary trees in postfix tree format.
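One possible reading of "a sequential append only collection of binary trees in postfix tree format" can be sketched as follows. This is not the custom format itself, which is not specified here. It swaps in a plain binary Merkle forest for the Merkle-patricia tree, and the class name PostfixForest, the use of SHA-256, and the one-byte domain prefixes are all assumptions made for illustration. The point it shows is that when every parent is written after both of its children, a node's location never changes once written, so the store can be strictly append only.

```python
import hashlib


def h(data: bytes) -> bytes:
    """SHA-256 stands in for whatever hash the real format uses (assumption)."""
    return hashlib.sha256(data).digest()


class PostfixForest:
    """Hypothetical append-only store of binary trees in postfix (post-order) layout.

    Every node is appended to self.nodes and never moved or rewritten.
    A parent is always written after both of its children.
    """

    def __init__(self):
        self.nodes = []  # (height, hash) in write order
        self.peaks = []  # indices in self.nodes of the current tree roots

    def append(self, leaf: bytes) -> int:
        """Append a leaf, then any parents it completes. Returns the leaf's index."""
        self.nodes.append((0, h(b"\x00" + leaf)))
        leaf_index = len(self.nodes) - 1
        self.peaks.append(leaf_index)
        # While the two rightmost trees have equal height, join them under
        # a new parent, which is written after both children.
        while len(self.peaks) >= 2:
            left, right = self.peaks[-2], self.peaks[-1]
            if self.nodes[left][0] != self.nodes[right][0]:
                break
            height = self.nodes[left][0] + 1
            parent = h(b"\x01" + self.nodes[left][1] + self.nodes[right][1])
            self.nodes.append((height, parent))
            self.peaks[-2:] = [len(self.nodes) - 1]
        return leaf_index

    def root(self) -> bytes:
        """One hash committing to every tree in the collection."""
        return h(b"\x02" + b"".join(self.nodes[i][1] for i in self.peaks))
```

Appending four leaves to an empty forest writes seven nodes whose heights, in write order, are 0, 0, 1, 0, 0, 1, 2. Everything written after the first three leaves is still at the same index after the fourth, which is the property that makes node locations safe to record elsewhere.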
Each file, incorporating a range of blocks, has its location on disk, time, size, and the roots of its Merkle-patricia trees recorded in the SQL database. On program launch, the size, touch time, and root hash of the newest block in each file are checked. If there is a discrepancy, we do a full check of the Merkle-patricia tree, editing it as necessary to an incomplete Merkle-patricia tree, download the missing data from peers, and rebuild the blocks, thus winding up with newer touch dates. Our per peer configuration file tells us where to find the block files, and if they are not stored where expected, we rebuild. If they are stored where expected, but the touch dates are unavailable or incorrect (perhaps because this is the first time the program has launched), then the entire system of Merkle-patricia trees is validated, making sure the data on disk is consistent.
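The cheap launch check described above might look something like this minimal sketch. The table name block_file, its columns, the function names, and above all the toy file layout (the newest block's root hash stored as the last 32 bytes of the file) are assumptions for illustration, not the actual schema or format. Any file it reports would then go through the full Merkle-patricia tree check and, if necessary, a rebuild from peers.

```python
import os
import sqlite3

ROOT_LEN = 32  # assumption: a 32-byte root hash sits at the very end of each file


def newest_root(path):
    """Toy layout (assumed): the newest block's root hash is the file's final bytes."""
    size = os.path.getsize(path)
    with open(path, "rb") as f:
        f.seek(max(0, size - ROOT_LEN))
        return f.read()


def open_index(db_path=":memory:"):
    """Open the SQL database that records each block file's metadata."""
    db = sqlite3.connect(db_path)
    db.execute(
        """CREATE TABLE IF NOT EXISTS block_file (
               path        TEXT PRIMARY KEY,
               size        INTEGER NOT NULL,
               mtime_ns    INTEGER NOT NULL,
               newest_root BLOB NOT NULL
           )"""
    )
    return db


def record(db, path):
    """Store the file's current size, touch time, and newest root hash."""
    st = os.stat(path)
    db.execute(
        "INSERT OR REPLACE INTO block_file VALUES (?, ?, ?, ?)",
        (path, st.st_size, st.st_mtime_ns, newest_root(path)),
    )
    db.commit()


def files_needing_full_check(db):
    """Return (path, reason) for every file whose cheap checks disagree with the database."""
    suspect = []
    rows = db.execute(
        "SELECT path, size, mtime_ns, newest_root FROM block_file ORDER BY path"
    ).fetchall()
    for path, size, mtime_ns, root in rows:
        try:
            st = os.stat(path)
        except FileNotFoundError:
            suspect.append((path, "missing"))
            continue
        if st.st_size != size:
            suspect.append((path, "size"))
        elif st.st_mtime_ns != mtime_ns:
            suspect.append((path, "touch time"))
        elif newest_root(path) != root:
            suspect.append((path, "root hash"))
    return suspect
```

The checks run cheapest first: size and touch time need only a stat call, and the root hash needs one short read from the end of the file. A clean result says only that nothing obvious has changed since the last run. It does not prove the interior of the file is intact, which is why the first launch, with no trustworthy touch dates, still calls for validating everything.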
How do we tell the one true blockchain from some other evil blockchain? Well, the running definition is consensus: you can interact with other peers because they agree on the running root hash. So you downloaded this software from somewhere, and when you downloaded it, you got the means to contact a bunch of peers, whom we suppose agree, and each of whom has evidence that other peers agree. And, having downloaded what they agree on, you then treat it as gospel and as more authoritative than what others say, so long as touch dates, file sizes, locations, and the hash of the most recent block in the file are consistent, and the internal contents of each file are consistent with the root of the most recent tree.
reaction.la gpg key 154588427F2709CD9D7146B01C99BB982002C39F
This work is licensed under the Creative Commons Attribution 4.0 International License.