# Prolly Trees

## # Why not just B-Trees?

Data structures with hysteresis have path dependency, in the case of B-trees the actual tree structure depends on the order of inserts and removes.

## # Merkle Search Trees

Merkle Search Trees are an incremental method for constructing determinstic B-trees. Their idea is to first hash all content which is inserted into the tree, and then to place items at a level proportional to the number of 0’s in the prefix of their hash written in base B.

Under the assumption that all hashes are uniformly distributed, a Merkle search tree has the following properties:

- Every internal node has on average B children
- All leaves are on average the same distance from the root
- The tree is deterministic

However, these are not Sybil resistant

## # Prolly Trees

A Prolly Tree is a search tree where the number of values stored in each node is determined probabilistically, based on the data which is stored in the tree.

Within each hash, we look for a pattern that has a known probability of occuring. If the pattern is found, that position is a boundary. We slide the window forward to the end of the containing item, and write a new chunk containing the bytes between this boundary and the previous, if any. The resulting chunk is stored in a content addressed storage system

In Noms, the pattern we look for is the 12 high bits being 1. Since this has a probability of 1/2^12, the average chunk size in Noms is 4kb.