Property-Based Testing: Let Your Computer Find Bugs You Can't Imagine
The Bug That Changed My Testing Philosophy
Picture this: You've written a function to parse timestamps, tested it with dozens of examples, and it's been running in production for months. Then one day, it crashes on "2020-02-29T23:59:60". A leap second on a leap day—a combination you never thought to test.
This is where property-based testing shines. Instead of trying to imagine every possible edge case, you describe the properties your code should satisfy, and let the computer generate thousands of test cases, including the weird ones you'd never think of.
What Makes Property-Based Testing Different?
Traditional unit testing is example-based: you, the developer, provide a few specific inputs and assert that they produce specific outputs. Property-based testing flips this on its head: you define the general properties or "rules" your code must obey, and a framework generates hundreds or thousands of examples to try to prove you wrong.
Here's what example-based tests look like:
```python
def test_sort_examples():
    assert sort([3, 1, 2]) == [1, 2, 3]
    assert sort([]) == []
    assert sort([1]) == [1]
    assert sort([2, 2, 1]) == [1, 2, 2]
```
Property-based tests describe general truths about your code:
```python
from hypothesis import given, strategies as st

@given(st.lists(st.integers()))
def test_sort_properties(lst):
    sorted_list = sort(lst)

    # Property 1: Output length equals input length
    assert len(sorted_list) == len(lst)

    # Property 2: Output is ordered
    for i in range(len(sorted_list) - 1):
        assert sorted_list[i] <= sorted_list[i + 1]

    # Property 3: Output contains same elements as input
    assert sorted(lst) == sorted_list
```
✨ The key insight: you don't specify what to test, you specify how to test. The framework generates the what.
Interactive Testing Comparison
[Interactive widget: example-based testing means manually writing specific test cases with known inputs and expected outputs, e.g. "hello" → "HELLO", "World" → "WORLD", "123" → "123", "" → "".]
Getting Started with Hypothesis
Let's build intuition with a simple example: a function that reverses strings.
```python
def reverse_string(s: str) -> str:
    """Reverse a string."""
    return s[::-1]

# Traditional test
def test_reverse_examples():
    assert reverse_string("hello") == "olleh"
    assert reverse_string("") == ""
    assert reverse_string("a") == "a"

# Property-based test
from hypothesis import given
from hypothesis import strategies as st

@given(st.text())
def test_reverse_properties(s):
    reversed_s = reverse_string(s)

    # Property: Reversing twice gives original
    assert reverse_string(reversed_s) == s

    # Property: Length is preserved
    assert len(reversed_s) == len(s)

    # Property: First char becomes last (if non-empty)
    if s:
        assert reversed_s[-1] == s[0]
        assert reversed_s[0] == s[-1]
```
When you run this test, Hypothesis will generate hundreds of strings: empty strings, single characters, Unicode snowmen (☃), null bytes, extremely long strings, and more.
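You can also steer the generation. As a small sketch (reusing the reverse_string from above), Hypothesis lets you run more cases than the default and pin hand-picked edge cases so they are always tested:

```python
from hypothesis import example, given, settings, strategies as st

@settings(max_examples=500)   # run more generated cases than the default
@given(st.text())
@example("")                  # always include these hand-picked cases too
@example("☃\x00")             # Unicode snowman followed by a null byte
def test_reverse_twice_is_identity(s):
    assert reverse_string(reverse_string(s)) == s
```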
How Property Testing Explores the Input Space
Property being tested: isInsideTriangle(x, y) correctly classifies points relative to a triangle.
Compare testing strategies: random sampling versus intelligent shrinking. Property-based testing doesn't know the boundaries beforehand; it discovers them by shrinking failures down to minimal cases.
[Interactive visualization: triangle vertices, generated points classified as inside or outside, and the point currently being tested.]
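To make this concrete, here is a sketch of what such property tests could look like in Python. The is_inside_triangle function is a hypothetical stand-in for the isInsideTriangle under test; the properties are "every vertex lies inside its own triangle" and "a point beyond the bounding box lies outside":

```python
from hypothesis import assume, given, strategies as st

def is_inside_triangle(px, py, ax, ay, bx, by, cx, cy):
    """Stand-in implementation: P is inside (or on the edge of) triangle ABC
    if it is not strictly on opposite sides of any two edges."""
    def side(x1, y1, x2, y2):
        return (x1 - px) * (y2 - py) - (y1 - py) * (x2 - px)
    d1, d2, d3 = side(ax, ay, bx, by), side(bx, by, cx, cy), side(cx, cy, ax, ay)
    has_neg = d1 < 0 or d2 < 0 or d3 < 0
    has_pos = d1 > 0 or d2 > 0 or d3 > 0
    return not (has_neg and has_pos)

coord = st.integers(min_value=-1000, max_value=1000)

@given(coord, coord, coord, coord, coord, coord)
def test_vertices_count_as_inside(ax, ay, bx, by, cx, cy):
    # Each vertex lies on the triangle's boundary, so it should classify as inside
    for px, py in [(ax, ay), (bx, by), (cx, cy)]:
        assert is_inside_triangle(px, py, ax, ay, bx, by, cx, cy)

@given(coord, coord, coord, coord, coord, coord)
def test_point_beyond_bounding_box_is_outside(ax, ay, bx, by, cx, cy):
    # Skip degenerate (zero-area) triangles
    assume((bx - ax) * (cy - ay) != (by - ay) * (cx - ax))
    far_x, far_y = max(ax, bx, cx) + 1, max(ay, by, cy) + 1
    assert not is_inside_triangle(far_x, far_y, ax, ay, bx, by, cx, cy)
```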
Real-World Properties to Test
1. Invariants
Best for: Enforcing universal rules about your data structures or system state. For example, ensuring a cache never exceeds its capacity, or a user's balance never drops below zero in a banking application.
Properties that remain true regardless of the operation:
```python
@given(st.dictionaries(st.text(), st.integers()))
def test_cache_size_invariant(initial_data):
    cache = LRUCache(capacity=100)

    for key, value in initial_data.items():
        cache.put(key, value)
        # Invariant: size never exceeds capacity
        assert len(cache) <= 100
```
2. Round-trip Properties
Best for: Verifying that data is not lost or corrupted during serialization/deserialization, compression/decompression, or any other pair of inverse operations. This is critical for data integrity in file storage, network communication, and database interactions.
Operations that can be reversed:
```python
import json
import zlib

@given(st.text())
def test_json_roundtrip(data):
    # Skip if the string contains invalid JSON characters
    try:
        json_str = json.dumps(data)
        assert json.loads(json_str) == data
    except (UnicodeDecodeError, UnicodeEncodeError):
        # Some strings can't be JSON encoded
        pass

@given(st.binary())
def test_compression_roundtrip(data):
    compressed = zlib.compress(data)
    decompressed = zlib.decompress(compressed)
    assert decompressed == data
```
3. Metamorphic Relations
Best for: Testing functions where the exact output is hard to predict, but the relationship between different inputs and outputs is well-defined. This is common in scientific computing, machine learning (e.g., "does adding a positive value to all inputs increase the average?"), or complex business logic.
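As a sketch of the "shift the inputs" relation mentioned above (compute_average here is a hypothetical stand-in for the function under test), a metamorphic test checks how the output must change when the input changes, without ever knowing the exact expected value:

```python
import math
from hypothesis import given, strategies as st

def compute_average(values):
    """Stand-in for the function under test."""
    return sum(values) / len(values)

reasonable_floats = st.floats(allow_nan=False, allow_infinity=False,
                              min_value=-1e3, max_value=1e3)

@given(st.lists(reasonable_floats, min_size=1), reasonable_floats)
def test_shifting_inputs_shifts_the_average(values, delta):
    # Metamorphic relation: average(v + delta for v in values) == average(values) + delta
    shifted = [v + delta for v in values]
    assert math.isclose(compute_average(shifted),
                        compute_average(values) + delta,
                        rel_tol=1e-9, abs_tol=1e-6)
```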
4. Test Oracles
Best for: When you're refactoring a complex algorithm or replacing a slow, simple implementation with a highly optimized one. You can use the old, trusted code as an "oracle" to verify that the new version behaves identically.
When you have a trusted reference implementation:
```python
@given(st.lists(st.integers()))
def test_custom_sort_matches_builtin(lst):
    custom_sorted = my_custom_sort(lst.copy())
    builtin_sorted = sorted(lst)
    assert custom_sorted == builtin_sorted
```
Hypothesis Strategies: Generating Complex Data
Hypothesis provides powerful strategies for generating test data:
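A few of the most commonly used built-in strategies (a small, non-exhaustive sample):

```python
from hypothesis import strategies as st

st.integers(min_value=0, max_value=120)           # bounded integers
st.floats(allow_nan=False, allow_infinity=False)  # well-behaved floats
st.text()                                         # Unicode strings, including the weird ones
st.binary()                                       # byte strings
st.booleans()                                     # True / False
st.lists(st.integers(), min_size=1)               # lists built from another strategy
st.dictionaries(st.text(), st.integers())         # dicts with generated keys and values
st.tuples(st.integers(), st.text())               # fixed-shape tuples
st.one_of(st.none(), st.integers())               # "either None or an integer"
st.datetimes()                                    # datetime objects
```

These compose nicely, and that is often how bugs surface. For example, feeding lists of integers into a median function that averages the two middle values with integer division (an illustrative buggy implementation) exposes a subtle bug:

```python
import statistics
from hypothesis import given, strategies as st

def median(numbers):
    """Buggy median: integer division truncates the midpoint of even-length lists."""
    ordered = sorted(numbers)
    n = len(ordered)
    if n % 2 == 1:
        return ordered[n // 2]
    # Bug: // truncates, so median([0, 1]) returns 0 instead of 0.5
    return (ordered[n // 2 - 1] + ordered[n // 2]) // 2

@given(st.lists(st.integers(), min_size=1))
def test_median_matches_reference(numbers):
    # Oracle property: agree with the standard library implementation
    assert median(numbers) == statistics.median(numbers)
```

Hypothesis quickly finds a falsifying example and shrinks it to something like [0, 1].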
The median should be 0.5, but our function returns 0 due to integer division!
⚠️ This bug is particularly insidious because it only appears with even-length lists where the two middle values have an odd sum. Traditional example-based tests often miss it.
Shrinking: Finding Minimal Failing Examples
One of Hypothesis's killer features is shrinking. When it finds a failing example, it automatically simplifies it to find the minimal case that still fails.
How Hypothesis Shrinking Works
Consider a buggySort(list) that should preserve all of its input's elements but silently filters out negative numbers. Hypothesis might first stumble on a messy failing case like [42, -17, 0, 23, -5, 99, -1, 7, -33, 15], then simplify it step by step until only a minimal failing input remains.
Here's a fuller example: a remove_duplicates function with a subtle ordering bug.
```python
def remove_duplicates(items):
    """Remove duplicates while preserving order."""
    seen = set()
    result = []
    for item in items:
        if item not in seen:
            seen.add(item)
            result.append(item)
    # Bug: returning the set of seen items, which is unordered
    return seen

@given(st.lists(st.integers()))
def test_remove_duplicates_properties(items):
    result = remove_duplicates(items)

    # Property 1: All items in the result are unique
    assert len(result) == len(set(result))

    # Property 2: The result contains only items from the original list
    assert set(result).issubset(set(items))

    # Property 3 (the one that fails): Order is preserved
    # We can build the expected list and compare
    expected = []
    seen = set()
    for item in items:
        if item not in seen:
            seen.add(item)
            expected.append(item)

    # This assertion will fail because `result` is an unordered set
    assert list(result) == expected
```
Hypothesis might initially find a failure with a messy input like [47, -23, 0, 47, 12, -23, 99, 47], but it will shrink it down to a minimal failing case such as [1, 0]: the returned set iterates as [0, 1], so the original order is lost.
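Once Hypothesis has shrunk a failure, a common habit is to pin the minimal case as a permanent regression test with the @example decorator (the [1, 0] value here stands in for whatever shrunk case Hypothesis reports):

```python
from hypothesis import example, given, strategies as st

@given(st.lists(st.integers()))
@example([1, 0])   # the shrunk failing input, kept forever as a regression case
def test_remove_duplicates_preserves_order(items):
    # dict.fromkeys preserves insertion order while removing duplicates
    assert list(remove_duplicates(items)) == list(dict.fromkeys(items))
```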
Hypothesis can also generate richly structured data. With the @st.composite decorator you can build custom strategies, such as this sketch of a generator for roughly balanced binary trees represented as nested dicts:

```python
from hypothesis import strategies as st

@st.composite
def balanced_trees(draw, max_size=7):
    """Generate a roughly balanced binary tree as nested dicts."""
    size = draw(st.integers(min_value=1, max_value=max_size))
    left_size = (size - 1) // 2
    right_size = size - 1 - left_size
    return {
        'value': draw(st.integers()),
        'left': draw(balanced_trees(max_size=left_size)) if left_size > 0 else None,
        'right': draw(balanced_trees(max_size=right_size)) if right_size > 0 else None
    }
```
Conclusion: A New Way of Thinking
Property-based testing isn't just another testing tool—it's a different way of thinking about correctness. Instead of asking "does my code work for these examples?", you ask "what should always be true about my code?"
This shift in perspective helps you:
•Find bugs you didn't know existed
•Understand your code's behavior more deeply
•Build more robust systems
•Sleep better at night
Start small. Pick one pure function in your codebase and write a property-based test for it. Let Hypothesis show you the edge cases you've been missing. Once you see it catch its first real bug, you'll be hooked.
Remember: The goal isn't to replace all your example-based tests. It's to add another powerful tool to your testing arsenal—one that helps you think differently about what it means for code to be correct.