All examples below are written in pseudocode that happens to carry a lot of syntax from Typescript. Syntax liberties are taken where intention is clear

## Spec

### Op-based

See operation-based CRDTs for more properties

### State-based

See state-based CRDTs for more properties

## Counters

### Op-based

This implementation is trivially correct as both addition and subtraction commute

### State-based

Inspired by vector clocks. Merge takes max of each entry so this forms a monotonic semilattice. We need two vectors as just operating on a single vector as max wouldn’t even work if we included decrement.

For example, say you have two states `[1, 0, 1]`

and `[1, 1, 1]`

. You would never tell if the first one happens after the second (second node subtraction) or if the second one happens after the first (second node addition).

## Last-writer-wins Registers

A register is a memory cell storing a single value.

### Op-based

`X`

is an arbitrary type

### State-based

Timestamp is monotonic increasing so compare created a valid monotonic semilattice.

## Sets

A foundational data structure that form the basis of containers, maps, and graphs.

Naively adding and removing from a set does not commute so we can only approximate the properties of a set.

Most implementations below differ by how they handle concurrent $add(e)∥remove(e)$

For example:

- Grow-only set (G-Set) avoids remove altogether
- 2-Phase set (2P-Set) is a variant where both add and remove are valid operations but an element cannot be re-added once removed
- Unique set (U-Set) is an extension of 2-Phase set where we additionally assume elements are unique. Additional requirement that causal dependencies are respected (op-based CRDTs are sufficient to ensure this)
- Add-wins set (OR-Set/AW-Set) supports both adding and removing elements. Add has precedence when an add and remove happen concurrently.

### State-based 2P-Set

The compare function (checking to see if `x`

comes before `y`

in the semilattice) here is quite tricky and not immediately obvious why it is correct.

- If
`x.set`

is a subset of`y.set`

, then`x`

must have come before`y`

because nothing is ever removed from`set`

- If
`x.set`

is the same set as`y.set`

, then`x`

can only have come before`y`

if`x.removed`

is a subset of`y.subset`

- If
`x.set`

is not a subset of`y.set`

then`x`

cannot have come before`y`

### Op-based U-Set

Again, this op-based implementation assumes causal ordering in message delivering

### Op-based AW-Set

Intuition here is to generate a unique ID for each element added. Multiple `add`

s will add multiple values and `delete`

will delete all elements with the same value.

Concurrent `add`

s commute as each `add`

is unique. If a concurrent `add`

and `remove`

happen, it also commutes as `add`

has precedence.

## Sequences

A sequence for text editing (or just sequence hereafter) is a totally-ordered set of elements, each composed of a unique identifier and an atom.

For the rest of this section, assume the following definitions

### Replicated Growable Array (RGA)

Automerge the library uses this!

Represented as a 2P-Set of vertices in a linked list.

Essentially,

- Build the tree, connecting each item to its parent
- When an item has multiple children, sort them by sequence number then by their ID.
- The resulting list (or text document) can be made by flattening the tree with a depth-first traversal.

### Continuous Sequence using real numbers

We need to translate indices into unique immutable positions (what the user intuitively means when they say ‘insert here’).

This assumption of relative order of elements remains constant over time is called the **strong list specification**.

Performance depends critically on the implementation of identifiers. One possible implementation is to use a dense identifier space like $R$ where a unique identifier can always be allocated between any two identifiers.

Indices are based off of what % of the text they get inserted at. 0.0 is the index of the start sequence, 1.0 is the index of the end sequence (this is similar to what Treedoc does).

```
0.0 1.0
NUL NUL
```

Inserting a single character would be halfway between 0.0 and 1.0 so it would have an index of 0.5.

```
0.0 0.5 1.0
NUL 'B' NUL
```

Inserting to the left of ‘B’ would be between 0.0 and 0.5 so 0.25.

```
0.0 0.25 0.5 1.0
NUL 'A' 'B' NUL
```

We represent the continuum using a tree. The first element is allocated at the root. Thereafter, it is always possible to create a new leaf $e$ between any two nodes $n$ and $m$.

We do this by representing the tree using a U-Set

## Graphs

Generally, graphs are difficult to maintain due to the property that CRDTs *cannot compute and maintain* global invariants like structure.

However, some stronger forms of acyclicity are implied by local properties, for instance a monotonic DAG, in which an edge may be added only if it oriented in the same direction as an existing path. Vertices and edges can be stores as sets.

See reference implementations in this paper