Skip to main content
2025-11-0128 min read
Software Engineering

CRDTs: The Magic Behind Collaborative Editing and Distributed Systems

Solving Conflicts Without a Therapist

If you've ever used Google Docs, Figma, or any real-time collaborative application and wondered "how the hell does this not break when multiple people edit at once?"—the answer is probably CRDTs (Conflict-free Replicated Data Types). Let's dive deep into what they are, how they work, and why they're basically magic.

The Problem: Distributed Systems Are Hard

Imagine you and a friend are editing the same document. You're in New York, they're in Tokyo. You both type at the same time:
  • You type "Hello" at position 0
  • They type "World" at position 0
  • Who wins?
Traditional approaches:
  1. Last Write Wins: Someone's work gets destroyed. Users rage quit.
  2. Locking: "This document is locked by another user." Everyone hates this.
  3. Operational Transformation (OT): Complex, requires a central server, makes engineers cry.
Enter CRDTs: data structures that magically merge concurrent operations without conflicts.

What Makes CRDTs Special?

CRDTs guarantee eventual consistency without coordination. That's huge. It means:
  • No central server required
  • Works offline
  • No locking
  • No conflicts
  • Changes always merge deterministically
The key insight: instead of trying to prevent conflicts, CRDTs are designed so that conflicts are mathematically impossible.

The Math Behind the Magic

CRDTs work because they form mathematical structures with specific properties:
1. Commutative: a ⊔ b = b ⊔ a Order doesn't matter. Applying operation A then B gives the same result as B then A.
2. Associative: (a ⊔ b) ⊔ c = a ⊔ (b ⊔ c) Grouping doesn't matter.
3. Idempotent: a ⊔ a = a Applying the same operation multiple times has no additional effect.
These properties guarantee that no matter what order operations arrive in, everyone ends up with the same state. It's beautiful.

Types of CRDTs

There are two main flavors:

1. State-based CRDTs (CvRDTs)

  • Send entire state
  • Merge states with a join operation
  • Requires monotonic growth
  • Simple but can be bandwidth-heavy

2. Operation-based CRDTs (CmRDTs)

  • Send only operations
  • Apply operations as they arrive
  • Requires exactly-once delivery
  • More efficient but needs reliable messaging
Most real-world systems use a hybrid approach.

Common CRDT Types

Counters

G-Counter (Grow-only Counter)
The simplest CRDT. Each node tracks its own count, and the total is the sum of all counts.
python
1class GCounter:
2 def __init__(self, node_id):
3 self.node_id = node_id
4 self.counts = {} # {node_id: count}
5
6 def increment(self):
7 self.counts[self.node_id] = self.counts.get(self.node_id, 0) + 1
8
9 def value(self):
10 return sum(self.counts.values())
11
12 def merge(self, other):
13 # Take maximum count for each node - this is the key!
14 for node_id, count in other.counts.items():
15 self.counts[node_id] = max(self.counts.get(node_id, 0), count)
Why it works: Each node only increments its own counter. During merge, we take the maximum value for each node. Since counts only grow, we never lose increments. If node A has seen 5 increments from node B, and node C has seen 7 increments from node B, after merging we know B incremented at least 7 times.
PN-Counter (Positive-Negative Counter)
Supports both increment and decrement by using two G-Counters:
python
1class PNCounter:
2 def __init__(self, node_id):
3 self.node_id = node_id
4 self.p = GCounter(node_id) # Positive counts
5 self.n = GCounter(node_id) # Negative counts
6
7 def increment(self):
8 self.p.increment()
9
10 def decrement(self):
11 self.n.increment()
12
13 def value(self):
14 return self.p.value() - self.n.value()
15
16 def merge(self, other):
17 self.p.merge(other.p)
18 self.n.merge(other.n)
The brilliance: decrements are just increments to a negative counter. Both counters only grow, maintaining the CRDT properties.

Sets

G-Set (Grow-only Set)
The simplest set CRDT:
python
1class GSet:
2 def __init__(self):
3 self.elements = set()
4
5 def add(self, element):
6 self.elements.add(element)
7
8 def contains(self, element):
9 return element in self.elements
10
11 def merge(self, other):
12 self.elements = self.elements.union(other.elements)
Dead simple: merge is just set union. Once added, elements can never be removed.
2P-Set (Two-Phase Set)
Allows both add and remove:
python
1class TwoPSet:
2 def __init__(self):
3 self.added = set()
4 self.removed = set()
5
6 def add(self, element):
7 self.added.add(element)
8
9 def remove(self, element):
10 # Can only remove if it was added
11 if element in self.added:
12 self.removed.add(element)
13
14 def contains(self, element):
15 return element in self.added and element not in self.removed
16
17 def merge(self, other):
18 self.added = self.added.union(other.added)
19 self.removed = self.removed.union(other.removed)
The catch: once removed, an element can never be re-added. Remove wins over add. This is called "remove bias."
LWW-Element-Set (Last-Write-Wins Element Set)
Uses timestamps to resolve conflicts:
python
1class LWWSet:
2 def __init__(self):
3 self.adds = {} # element -> timestamp
4 self.removes = {} # element -> timestamp
5
6 def add(self, element, timestamp=None):
7 timestamp = timestamp or time.time()
8 self.adds[element] = max(self.adds.get(element, 0), timestamp)
9
10 def remove(self, element, timestamp=None):
11 timestamp = timestamp or time.time()
12 self.removes[element] = max(self.removes.get(element, 0), timestamp)
13
14 def contains(self, element):
15 add_time = self.adds.get(element, 0)
16 remove_time = self.removes.get(element, 0)
17 return add_time > remove_time
18
19 def merge(self, other):
20 # Take latest timestamp for each operation
21 for elem, ts in other.adds.items():
22 self.adds[elem] = max(self.adds.get(elem, 0), ts)
23 for elem, ts in other.removes.items():
24 self.removes[elem] = max(self.removes.get(elem, 0), ts)
Now elements can be re-added after removal. The last operation (by timestamp) wins.
OR-Set (Observed-Remove Set)
The most sophisticated set CRDT:
python
1import uuid
2
3class ORSet:
4 def __init__(self):
5 self.elements = {} # element -> set of unique tags
6
7 def add(self, element):
8 if element not in self.elements:
9 self.elements[element] = set()
10 # Generate unique tag for this add
11 tag = str(uuid.uuid4())
12 self.elements[element].add(tag)
13 return tag
14
15 def remove(self, element):
16 # Remove all tags we've observed
17 if element in self.elements:
18 del self.elements[element]
19
20 def contains(self, element):
21 return element in self.elements and len(self.elements[element]) > 0
22
23 def merge(self, other):
24 for element, tags in other.elements.items():
25 if element not in self.elements:
26 self.elements[element] = set()
27 self.elements[element].update(tags)
The key insight: each add gets a unique tag. Remove only removes tags you've observed. If someone else adds the same element concurrently (with a different tag), their add survives. This solves the add-remove conflict elegantly.

The Hard Part: Sequence CRDTs for Text Editing

This is where CRDTs get fascinating. How do you maintain order in a sequence when people are inserting characters everywhere?

The Core Challenge

When you insert a character, you need to specify WHERE it goes. But positions change as others insert/delete. If I say "insert 'A' at position 5" and you delete position 3, where does 'A' go?
The solution: don't use positions. Use something that doesn't change.

WOOT (Without Operational Transformation)

WOOT was one of the first practical sequence CRDTs. Here's the brilliant idea:
Every character gets a unique, immutable ID. When you insert a character, you specify what it comes after and before using these IDs, not positions.
python
1class WootCharacter:
2 def __init__(self, char, char_id, prev_id, next_id, visible=True):
3 self.char = char
4 self.id = char_id # (counter, site_id)
5 self.prev_id = prev_id
6 self.next_id = next_id
7 self.visible = visible # False = deleted
8
9class WootDocument:
10 def __init__(self, site_id):
11 self.site_id = site_id
12 self.counter = 0
13 # Start with sentinel characters
14 self.chars = {
15 ('BEGIN', 0): WootCharacter('', ('BEGIN', 0), None, ('END', 0)),
16 ('END', 0): WootCharacter('', ('END', 0), ('BEGIN', 0), None)
17 }
18 self.sequence = [('BEGIN', 0), ('END', 0)]
19
20 def generate_id(self):
21 self.counter += 1
22 return (self.counter, self.site_id)
23
24 def insert(self, char, prev_char_id, next_char_id):
25 char_id = self.generate_id()
26 new_char = WootCharacter(char, char_id, prev_char_id, next_char_id)
27 self.chars[char_id] = new_char
28
29 # Find insertion position
30 pos = self._find_position(prev_char_id, next_char_id)
31 self.sequence.insert(pos, char_id)
32 return char_id
33
34 def _find_position(self, prev_id, next_id):
35 # Find all characters between prev and next
36 candidates = []
37 for i, cid in enumerate(self.sequence):
38 if self._is_between(cid, prev_id, next_id):
39 candidates.append((i, cid))
40
41 # Sort by ID to ensure deterministic ordering
42 candidates.sort(key=lambda x: x[1])
43
44 # Insert after the last candidate
45 if candidates:
46 return candidates[-1][0] + 1
47 else:
48 # Insert right after prev
49 return self.sequence.index(prev_id) + 1
Think of it like this:
  • Document starts with invisible beginning/end markers
  • Each character knows its neighbors when created
  • Characters form a partial order, not a total order
Example:
Initial: [BEGIN] [END]
User A inserts 'H' between BEGIN and END
User B inserts 'i' between BEGIN and END (concurrently)
Now we have two characters, both claiming to be between BEGIN and END. WOOT uses the character IDs (which include site ID and counter) to deterministically order them. Everyone computes the same final order.
The magic: even if operations arrive in different orders, the algorithm always produces the same result.
Deletion: Characters aren't removed, just marked as deleted (tombstones). This ensures the structure remains stable for future operations.

Logoot

Logoot takes a different approach: position identifiers that live in a continuous space.
python
1class LogootPosition:
2 def __init__(self, position):
3 # Position is a list of (int, site_id) tuples
4 self.position = position
5
6 def __lt__(self, other):
7 # Lexicographic comparison
8 for i in range(min(len(self.position), len(other.position))):
9 if self.position[i] < other.position[i]:
10 return True
11 elif self.position[i] > other.position[i]:
12 return False
13 return len(self.position) < len(other.position)
14
15class LogootDocument:
16 def __init__(self, site_id):
17 self.site_id = site_id
18 self.clock = 0
19 self.chars = [] # List of (position, char, visible)
20
21 # Add sentinels
22 self.chars.append((
23 LogootPosition([(0, 0)]),
24 'BEGIN',
25 True
26 ))
27 self.chars.append((
28 LogootPosition([(sys.maxsize, 0)]),
29 'END',
30 True
31 ))
32
33 def generate_position_between(self, pos1, pos2):
34 """Generate a position between pos1 and pos2"""
35 self.clock += 1
36
37 # Find first differing level
38 level = 0
39 while (level < len(pos1.position) and
40 level < len(pos2.position) and
41 pos1.position[level] == pos2.position[level]):
42 level += 1
43
44 # Generate position
45 new_pos = pos1.position[:level].copy()
46
47 if level < len(pos1.position) and level < len(pos2.position):
48 # Positions differ at this level
49 int1, site1 = pos1.position[level]
50 int2, site2 = pos2.position[level]
51
52 if int2 - int1 > 1:
53 # Space between integers
54 new_int = (int1 + int2) // 2
55 new_pos.append((new_int, self.site_id))
56 else:
57 # No space, need to go deeper
58 new_pos.append(pos1.position[level])
59 new_pos.append((sys.maxsize // 2, self.site_id))
60 else:
61 # One position is prefix of other
62 if level < len(pos1.position):
63 new_pos.append(pos1.position[level])
64 new_pos.append((sys.maxsize // 2, self.site_id))
65
66 return LogootPosition(new_pos)
Imagine positions as decimal numbers between 0 and 1:
  • First character gets position 0.5
  • Insert before it? Use 0.25
  • Insert between 0.25 and 0.5? Use 0.375
But positions are actually lists of tuples (integer, site_id):
  • Position: [(5, site1), (3, site2)]
  • Can extend infinitely: [(5, site1), (3, site2), (7, site3)]
This creates a dense space where you can always find a position between any two positions.
Why it works:
  • Positions are globally unique (include site ID)
  • Between any two positions, infinite positions exist
  • Total ordering is deterministic (lexicographic)
  • No need to track what comes before/after
The genius: Unlike WOOT, Logoot doesn't need to store predecessor/successor relationships. The position identifier contains all ordering information.

TreeDoc

TreeDoc organizes the document as a tree rather than a list. Each node in the tree represents either:
  • A character (leaf node)
  • A group of nodes (internal node)
python
1class TreeDocNode:
2 def __init__(self, is_leaf=False, char=None):
3 self.is_leaf = is_leaf
4 self.char = char
5 self.tombstone = False
6 self.children = {} # key -> child node
7
8class TreeDoc:
9 def __init__(self, site_id):
10 self.site_id = site_id
11 self.clock = 0
12 self.root = TreeDocNode()
13
14 # Initialize with sentinels
15 begin_path = [('0', 0)]
16 end_path = [('z', 0)]
17 self._insert_at_path(begin_path, 'BEGIN')
18 self._insert_at_path(end_path, 'END')
19
20 def insert(self, prev_path, next_path, char):
21 """Insert character between two paths"""
22 self.clock += 1
23
24 # Find common ancestor
25 common_len = 0
26 while (common_len < len(prev_path) and
27 common_len < len(next_path) and
28 prev_path[common_len] == next_path[common_len]):
29 common_len += 1
30
31 # Generate new path
32 new_path = prev_path[:common_len].copy()
33
34 if common_len < len(prev_path) and common_len < len(next_path):
35 # Paths diverge - insert between them
36 key1, _ = prev_path[common_len]
37 key2, _ = next_path[common_len]
38 new_key = self._generate_key_between(key1, key2)
39 new_path.append((new_key, self.site_id))
40 else:
41 # One path is prefix of other
42 if common_len < len(prev_path):
43 new_path = prev_path[:common_len + 1].copy()
44 new_key = self._generate_key_after(
45 new_path[-1][0] if new_path else '0'
46 )
47 new_path.append((new_key, self.site_id))
48
49 self._insert_at_path(new_path, char)
50 return new_path
51
52 def _generate_key_between(self, key1, key2):
53 """Generate a key between key1 and key2"""
54 # Simple approach: use midpoint character
55 if ord(key2) - ord(key1) > 1:
56 return chr((ord(key1) + ord(key2)) // 2)
57 else:
58 # Need to extend - append to key1
59 return key1 + 'h' # Midpoint in alphabet
The structure:
                Root
               /    \
          Major1    Major2
          /    \      |
      Minor1  Minor2  Minor3
        |       |       |
       'H'     'e'     'y'
Position identifiers: Path from root to leaf
  • 'H' is at path [('a', site1), ('a', site1)]
  • 'e' is at path [('a', site1), ('b', site1)]
Insertion: When inserting between two characters:
  1. Find their common ancestor in the tree
  2. Create new node(s) as needed between the paths
  3. Use disambiguators (like site ID) for concurrent inserts at same position
Why it's brilliant:
  • Tree structure naturally handles dense areas (many insertions at same spot)
  • In a list-based CRDT, inserting 1000 characters at the same position creates deep nesting
  • TreeDoc spreads these out across the tree width
  • Rebalancing can happen locally without affecting global order
  • Path identifiers are often shorter than Logoot positions
  • Better performance for common editing patterns (typing at end, inserting in middle)
The tree structure also provides natural hierarchy for optimizations like lazy evaluation and caching.

RGA (Replicated Growable Array)

RGA uses a different strategy: causal ordering with timestamps.
python
1class RGANode:
2 def __init__(self, char, timestamp, after_node=None):
3 self.char = char
4 self.timestamp = timestamp # (clock, site_id)
5 self.after = after_node # Node this was inserted after
6 self.tombstone = False
7 self.next = [] # Ordered list of following nodes
8
9class RGA:
10 def __init__(self, site_id):
11 self.site_id = site_id
12 self.clock = 0
13 # Sentinel nodes
14 self.head = RGANode('BEGIN', (0, 0))
15 self.tail = RGANode('END', (float('inf'), 0), self.head)
16 self.head.next.append(self.tail)
17 self.nodes = {
18 (0, 0): self.head,
19 (float('inf'), 0): self.tail
20 }
21
22 def insert(self, after_timestamp, char):
23 """Insert character after node with given timestamp"""
24 self.clock += 1
25 timestamp = (self.clock, self.site_id)
26
27 after_node = self.nodes.get(after_timestamp)
28 if not after_node:
29 return None
30
31 new_node = RGANode(char, timestamp, after_node)
32 self.nodes[timestamp] = new_node
33
34 # Insert into after_node's next list in timestamp order
35 insert_pos = 0
36 for i, node in enumerate(after_node.next):
37 if node.timestamp > timestamp:
38 insert_pos = i
39 break
40 else:
41 insert_pos = i + 1
42
43 after_node.next.insert(insert_pos, new_node)
44 return timestamp
45
46 def get_sequence(self):
47 """Get visible characters in order"""
48 result = []
49
50 def traverse(node):
51 if not node.tombstone and node.char not in ['BEGIN', 'END']:
52 result.append(node.char)
53 for next_node in node.next:
54 traverse(next_node)
55
56 traverse(self.head)
57 return ''.join(result)
Each character stores:
  • The character itself
  • A timestamp (logical clock + site ID)
  • A reference to the character it was inserted after
The key insight: use the "inserted after" relationship plus timestamps to determine order.
When concurrent inserts happen after the same character:
  1. Both reference the same predecessor
  2. Use timestamps to order them deterministically
  3. Higher timestamp comes first (or use site ID as tiebreaker)
Why it works:
  • The causal history (what came after what) is preserved in the structure
  • Timestamps provide total ordering for concurrent operations
  • The tree structure (each node has ordered list of successors) maintains efficiency
  • No need for complex position identifiers
RGA is often more intuitive than WOOT or Logoot because it directly models how we think about insertion: "put this character after that one."

Advanced Sequence CRDT Concepts

The Interleaving Problem

Consider this scenario:
  • User A types "HELLO" quickly
  • User B types "WORLD" quickly
  • Both at the same position, concurrently
Bad outcome: "HWEOLRLLOD" (characters interleaved) Good outcome: "HELLOWORLD" or "WORLDHELLO" (words preserved)
This happens because basic CRDTs treat each character insertion independently. Solutions:
1. Causal Grouping:
python
1class CausalGroup:
2 def __init__(self, site_id):
3 self.site_id = site_id
4 self.groups = {} # group_id -> list of operations
5 self.current_group = None
6 self.last_op_time = 0
7
8 def start_group(self):
9 self.current_group = f"{self.site_id}-{time.time()}"
10 self.groups[self.current_group] = []
11
12 def add_to_group(self, operation):
13 now = time.time()
14 # New group if too much time passed
15 if now - self.last_op_time > 0.5: # 500ms threshold
16 self.start_group()
17
18 if self.current_group:
19 self.groups[self.current_group].append(operation)
20 self.last_op_time = now
2. Extended Timestamps: Include a sequence number within rapid bursts:
python
1class BurstAwareTimestamp:
2 def __init__(self, clock, site_id, burst_seq=0):
3 self.clock = clock
4 self.site_id = site_id
5 self.burst_seq = burst_seq # Increments within burst
6
7 def __lt__(self, other):
8 # First compare clocks
9 if self.clock != other.clock:
10 return self.clock < other.clock
11 # Same clock? Compare sites
12 if self.site_id != other.site_id:
13 return self.site_id < other.site_id
14 # Same site at same time? Use burst sequence
15 return self.burst_seq < other.burst_seq
3. Intention Preservation: Some CRDTs (like YATA) track whether insertions happened "together" and try to keep them together.

Garbage Collection

Tombstones (deleted characters) accumulate forever. This is a major problem for long-lived documents.
Why we can't just delete tombstones:
python
1# Scenario that breaks without tombstones:
2# Time 1: Document is "HELLO"
3# Time 2: User A deletes 'L' (marks as tombstone)
4# Time 3: User B (who hasn't seen the delete) inserts 'X' after the 'L'
5# Time 4: If we removed the tombstone, where does 'X' go?
Solutions:
1. Stability Windows:
python
1class StabilityGC:
2 def __init__(self, window_seconds=86400): # 24 hours
3 self.window = window_seconds
4 self.tombstones = {} # timestamp -> tombstone
5
6 def can_gc_tombstone(self, tombstone):
7 age = time.time() - tombstone.timestamp
8 if age < self.window:
9 return False
10
11 # Check if all peers have seen it
12 for peer in self.peers:
13 if peer.vector_clock < tombstone.timestamp:
14 return False
15
16 return True
2. Checkpoint and Replay:
python
1class CheckpointCRDT:
2 def __init__(self):
3 self.checkpoint = None
4 self.checkpoint_version = 0
5 self.ops_since_checkpoint = []
6
7 def create_checkpoint(self):
8 # Snapshot current state without tombstones
9 visible_chars = [c for c in self.chars if not c.tombstone]
10 self.checkpoint = self.serialize(visible_chars)
11 self.checkpoint_version += 1
12 self.ops_since_checkpoint = []
13
14 def sync_with_peer(self, peer):
15 if peer.checkpoint_version > self.checkpoint_version:
16 # Take their checkpoint and replay ops
17 self.load_checkpoint(peer.checkpoint)
18 self.apply_ops(peer.ops_since_checkpoint)
19 else:
20 # Send ops they haven't seen
21 self.send_ops_since(peer.last_seen_version)
3. Causal Stability: Track causal dependencies to know when it's safe to GC:
python
1class CausalStability:
2 def __init__(self):
3 self.delivered = set() # Operations we've delivered
4 self.buffer = {} # op_id -> (op, dependencies)
5
6 def is_causally_stable(self, op_id):
7 """Check if all ops that could reference this are delivered"""
8 op, deps = self.buffer[op_id]
9
10 # Check all operations with higher timestamps
11 for other_id, (other_op, other_deps) in self.buffer.items():
12 if other_op.timestamp > op.timestamp:
13 # Could this operation reference our op?
14 if op_id in other_deps or not self.has_delivered(other_id):
15 return False
16
17 return True

Optimization Strategies

1. Run-length Encoding for Sequential Inserts:
python
1class RunLengthCRDT:
2 def __init__(self):
3 self.runs = [] # List of (start_pos, chars, metadata)
4
5 def insert_string(self, position, text, metadata):
6 # Instead of individual characters, store the whole run
7 self.runs.append({
8 'position': position,
9 'text': text,
10 'metadata': metadata,
11 'length': len(text)
12 })
13
14 def get_char_at(self, index):
15 current_pos = 0
16 for run in self.runs:
17 if current_pos <= index < current_pos + run['length']:
18 return run['text'][index - current_pos]
19 current_pos += run['length']
2. B-tree Storage for Better Cache Performance:
python
1class BTreeCRDT:
2 def __init__(self, node_size=64):
3 self.node_size = node_size
4 self.root = BTreeNode()
5
6 class BTreeNode:
7 def __init__(self):
8 self.chars = [] # Up to node_size characters
9 self.children = [] # Child nodes
10 self.is_leaf = True
11
12 def split(self):
13 """Split node when full"""
14 mid = len(self.chars) // 2
15 right = BTreeNode()
16 right.chars = self.chars[mid:]
17 right.children = self.children[mid:] if not self.is_leaf else []
18 right.is_leaf = self.is_leaf
19
20 self.chars = self.chars[:mid]
21 self.children = self.children[:mid] if not self.is_leaf else []
22
23 return right
3. Lazy Evaluation: Don't materialize the full document until needed:
python
1class LazyCRDT:
2 def __init__(self):
3 self.operations = []
4 self.materialized = None
5 self.version = 0
6
7 def insert(self, pos, char):
8 self.operations.append(('insert', pos, char))
9 self.version += 1
10 self.materialized = None # Invalidate cache
11
12 def get_text(self):
13 if self.materialized and self.materialized[1] == self.version:
14 return self.materialized[0]
15
16 # Materialize on demand
17 text = self._apply_all_operations()
18 self.materialized = (text, self.version)
19 return text
4. Delta Compression: Send only changes since last sync:
python
1class DeltaCRDT:
2 def __init__(self):
3 self.state = {}
4 self.deltas = []
5 self.version = 0
6
7 def mutate(self, operation):
8 # Apply to state
9 self.apply_to_state(operation)
10
11 # Record delta
12 delta = {
13 'op': operation,
14 'version': self.version,
15 'timestamp': time.time()
16 }
17 self.deltas.append(delta)
18 self.version += 1
19
20 def get_deltas_since(self, version):
21 return [d for d in self.deltas if d['version'] > version]
22
23 def merge_deltas(self, remote_deltas):
24 # Apply remote deltas in causal order
25 for delta in sorted(remote_deltas, key=lambda d: d['timestamp']):
26 if delta['version'] not in self.applied_versions:
27 self.apply_to_state(delta['op'])

Modern Implementations

Yjs

Yjs uses a novel approach combining several techniques:
  • YATA (Yet Another Transformation Approach) algorithm
  • Efficient encoding with relative positioning
  • Integrates with existing editors
  • Supports rich data types (not just text)
Key innovation: Tracks dependencies between operations, allowing for more efficient garbage collection and compression.

Automerge

Automerge focuses on developer experience:
  • Works with arbitrary JSON-like data
  • Automatic CRDT selection based on data type
  • Built-in compression and efficiency optimizations
  • Time-travel debugging
Uses a hybrid approach: operation-based with periodic snapshots.

Diamond Types

A research project pushing the boundaries:
  • Proven correct implementation
  • Extremely efficient (faster than OT)
  • Novel list CRDT algorithm
  • Focus on performance over features

Why CRDTs Actually Work: The Deep Theory

CRDTs leverage several mathematical concepts:

Semi-lattices

A semi-lattice is a partially ordered set where any two elements have a least upper bound (join).
For CRDTs:
  • States form a semi-lattice
  • Merge operation is the join
  • Guarantees convergence to same state

Monotonicity

CRDTs only "grow" in the lattice order:
  • Can't go backwards
  • Can't oscillate
  • Must eventually stabilize

Causal Consistency

Operations respect causality:
  • If operation A happened before B, A's effects are visible when B is applied
  • Concurrent operations (no causal relationship) can be applied in any order

Vector Clocks and Version Vectors

Track causal dependencies:
  • Each node maintains a vector of counters
  • Increment own counter for each operation
  • Include vector with operations
  • Compare vectors to determine causal relationships

Real-World Challenges

Network Partitions

CRDTs handle partitions naturally, but:
  • Divergence grows with partition duration
  • Merge after long partition can be expensive
  • User experience during merge needs consideration

Storage Overhead

CRDTs typically require more storage:
  • Metadata (IDs, timestamps, tombstones)
  • Can be 10-100x larger than plain text
  • Compression helps but adds complexity

Semantic Preservation

CRDTs preserve syntax but not always semantics:
  • Concurrent edits to code might compile but not work
  • Formatting operations can conflict oddly
  • Higher-level constraints need additional handling

Performance at Scale

Large documents present challenges:
  • O(n) operations become noticeable
  • Memory usage grows
  • Need indexing structures
  • Batch operations for efficiency

Future Directions

Move Operations

Current CRDTs don't handle "move" well. Research into:
  • Tree-based move operations
  • Maintaining move semantics
  • Efficient implementations

Selective Synchronization

Not all peers need all data:
  • Partial replication
  • Interest management
  • Privacy-preserving sync

Verified Implementations

Formal verification of CRDT implementations:
  • Prove correctness
  • Catch subtle bugs
  • Generate implementations from specifications

Integration with Traditional Databases

Hybrid systems combining:
  • CRDT eventual consistency
  • Traditional ACID transactions
  • Best of both worlds

Conclusion

CRDTs are a beautiful middle finger to the CAP theorem - not by violating it, but by choosing AP and saying "fuck strong consistency, we'll converge eventually." They're what you reach for when you realize your users don't give a shit about your database's ACID guarantees if the app stops working when they lose internet.
Sure, they're complex as hell. Your merge conflicts will look like abstract art. Debugging them is like performing surgery with oven mitts. But sometimes eventual consistency is exactly what you need - when "eventually correct" beats "immediately broken."
Just make sure your team actually understands them before you commit to this path. I've seen too many half-assed CRDT implementations that are worse than just using a central server. If you're going to embrace the chaos of distributed systems, at least do it properly.