2025-01-20 · 12 min read
Software Engineering

Property-Based Testing: Let Your Computer Find Bugs You Can't Imagine

The Bug That Changed My Testing Philosophy

Picture this: You've written a function to parse timestamps, tested it with dozens of examples, and it's been running in production for months. Then one day, it crashes on "2016-12-31T23:59:60". A leap second at the end of a leap year: a combination you never thought to test.
This is where property-based testing shines. Instead of trying to imagine every possible edge case, you describe the properties your code should satisfy, and let the computer generate thousands of test cases, including the weird ones you'd never think of.

What Makes Property-Based Testing Different?

Traditional unit testing is example-based: you, the developer, provide a few specific inputs and assert that they produce specific outputs. Property-based testing flips this on its head: you define the general properties or "rules" your code must obey, and a framework generates hundreds or thousands of examples to try and prove you wrong.

[Diagram: property-based vs. example-based testing. Property-based testing: the developer states three properties (output is sorted, same length, same elements) and the Hypothesis framework generates the inputs, for example sort 5,2,8; sort empty; sort 0,0,0; sort -999,1000,0; and thousands more. Example-based testing: the developer writes each case by hand: sort 3,1,2 expects 1,2,3; sort empty expects empty; sort -1,0,1 expects -1,0,1.]

Traditional unit tests are example-based: you provide specific inputs and check for specific outputs.
python
def test_sort_examples():
    assert sort([3, 1, 2]) == [1, 2, 3]
    assert sort([]) == []
    assert sort([1]) == [1]
    assert sort([2, 2, 1]) == [1, 2, 2]
Property-based tests describe general truths about your code:
python
from hypothesis import given, strategies as st

@given(st.lists(st.integers()))
def test_sort_properties(lst):
    sorted_list = sort(lst)

    # Property 1: Output length equals input length
    assert len(sorted_list) == len(lst)

    # Property 2: Output is ordered
    for i in range(len(sorted_list) - 1):
        assert sorted_list[i] <= sorted_list[i + 1]

    # Property 3: Output contains same elements as input
    assert sorted(lst) == sorted_list
The key insight: you don't list the inputs to test; you state what must be true for every input. The framework generates the inputs.
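To make that division of labor concrete, here is a toy stand-in for the framework's side of the bargain, using only the standard library. It is a sketch, not how Hypothesis works internally: the helper name `check_sort_properties`, the value range, and the list sizes are all assumptions chosen for illustration, and it has none of Hypothesis's smarter generation or shrinking.

```python
import random
from collections import Counter


def check_sort_properties(sort_fn, trials=1000, seed=0):
    """Run sort_fn on random integer lists and assert three properties."""
    rng = random.Random(seed)  # seeded so a failure can be reproduced
    for _ in range(trials):
        lst = [rng.randint(-1000, 1000) for _ in range(rng.randint(0, 20))]
        out = sort_fn(list(lst))  # pass a copy so the input stays intact

        # Property 1: output length equals input length
        assert len(out) == len(lst), lst
        # Property 2: output is ordered
        assert all(a <= b for a, b in zip(out, out[1:])), lst
        # Property 3: output is a permutation of the input
        assert Counter(out) == Counter(lst), lst
    return trials


# The built-in sorted() satisfies all three properties.
check_sort_properties(sorted)
```

A sort that silently drops duplicates, such as `lambda xs: sorted(set(xs))`, trips Property 1 as soon as a generated list contains a repeated value.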

Interactive Testing Comparison

How Example-Based Testing Works:

You manually write specific test cases with known inputs and expected outputs.

"hello" → "HELLO"
"World" → "WORLD"
"123" → "123"
"" → ""

Getting Started with Hypothesis

Let's build intuition with a simple example: a function that reverses strings.
python
from hypothesis import given
from hypothesis import strategies as st

def reverse_string(s: str) -> str:
    """Reverse a string."""
    return s[::-1]

# Traditional test
def test_reverse_examples():
    assert reverse_string("hello") == "olleh"
    assert reverse_string("") == ""
    assert reverse_string("a") == "a"

# Property-based test
@given(st.text())
def test_reverse_properties(s):
    reversed_s = reverse_string(s)

    # Property: Reversing twice gives original
    assert reverse_string(reversed_s) == s

    # Property: Length is preserved
    assert len(reversed_s) == len(s)

    # Property: First char becomes last (if non-empty)
    if s:
        assert reversed_s[-1] == s[0]
        assert reversed_s[0] == s[-1]
When you run this test, Hypothesis generates 100 strings by default (the max_examples setting raises or lowers that number), deliberately including awkward ones: the empty string, single characters, Unicode snowmen (☃), null characters, long strings, and more.

How Property Testing Explores the Input Space

Property being tested: isInsideTriangle(x, y) correctly classifies points

Compare testing strategies: Random sampling vs intelligent shrinking. Property-based testing doesn't know boundaries beforehand - it discovers them by shrinking failures to minimal cases.

[Interactive demo: test points sampled against a triangle. Blue dots mark the triangle vertices, green dots are points inside the triangle, red dots are points outside the triangle, and the yellow dot is the point currently being tested.]
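The demo's property can be approximated in a few lines of plain Python. This is a sketch under assumptions: `is_inside_triangle` is a hypothetical Python counterpart of the demo's isInsideTriangle (here taking the three vertices explicitly and counting edge points as inside), and the property checked is one reasonable choice, namely that the classification must not depend on the order in which the vertices are given.

```python
import itertools
import random


def cross(o, a, b):
    """Z-component of the cross product (a - o) x (b - o)."""
    return (a[0] - o[0]) * (b[1] - o[1]) - (a[1] - o[1]) * (b[0] - o[0])


def is_inside_triangle(p, a, b, c):
    """Return True if p lies inside or on the edge of triangle abc."""
    d1, d2, d3 = cross(a, b, p), cross(b, c, p), cross(c, a, p)
    has_neg = d1 < 0 or d2 < 0 or d3 < 0
    has_pos = d1 > 0 or d2 > 0 or d3 > 0
    # p is outside exactly when it is left of one edge and right of another.
    return not (has_neg and has_pos)


def check_vertex_order_invariance(trials=1000, seed=0):
    """Property: the answer must not depend on the order of the vertices."""
    rng = random.Random(seed)
    inside = 0
    for _ in range(trials):
        # Integer coordinates keep the arithmetic exact.
        a, b, c, p = [(rng.randint(-10, 10), rng.randint(-10, 10)) for _ in range(4)]
        expected = is_inside_triangle(p, a, b, c)
        for x, y, z in itertools.permutations((a, b, c)):
            assert is_inside_triangle(p, x, y, z) == expected, (p, a, b, c)
        inside += expected
    return inside, trials - inside
```

The returned pair mirrors the demo's Inside and Outside counters. Plain random sampling like this has no shrinking step, which is the part a framework adds on top.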

Real-World Properties to Test

1. Invariants

Best for: Enforcing universal rules about your data structures or system state. For example, ensuring a cache never exceeds its capacity, or a user's balance never drops below zero in a banking application.
Properties that remain true regardless of the operation:
python
from hypothesis import given, strategies as st

@given(st.dictionaries(st.text(), st.integers()))
def test_cache_size_invariant(initial_data):
    cache = LRUCache(capacity=100)  # LRUCache is the class under test

    for key, value in initial_data.items():
        cache.put(key, value)
        # Invariant: size never exceeds capacity
        assert len(cache) <= 100
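The test above assumes an `LRUCache` class with `put` and `__len__`. The article does not show one, so the following is only a minimal sketch of what such a class might look like, built on `collections.OrderedDict`; the `get` method and the eviction details are assumptions for illustration.

```python
from collections import OrderedDict


class LRUCache:
    """Minimal least-recently-used cache (illustrative sketch)."""

    def __init__(self, capacity):
        if capacity <= 0:
            raise ValueError("capacity must be positive")
        self.capacity = capacity
        self._items = OrderedDict()

    def put(self, key, value):
        if key in self._items:
            self._items.move_to_end(key)  # refresh recency on overwrite
        self._items[key] = value
        if len(self._items) > self.capacity:
            self._items.popitem(last=False)  # evict the least recently used

    def get(self, key, default=None):
        if key not in self._items:
            return default
        self._items.move_to_end(key)  # reading also counts as use
        return self._items[key]

    def __len__(self):
        return len(self._items)
```

If the eviction line were removed, the size invariant would fail on the first generated dictionary with more than 100 keys.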

2. Round-trip Properties

Best for: Verifying that data is not lost or corrupted during serialization/deserialization, compression/decompression, or any other pair of inverse operations. This is critical for data integrity in file storage, network communication, and database interactions.
Operations that can be reversed:
python
import json
import zlib
from hypothesis import given, strategies as st

@given(st.text())
def test_json_roundtrip(data):
    # json.dumps escapes non-ASCII characters by default, so every
    # string that st.text() generates can be encoded
    json_str = json.dumps(data)
    assert json.loads(json_str) == data

@given(st.binary())
def test_compression_roundtrip(data):
    compressed = zlib.compress(data)
    decompressed = zlib.decompress(compressed)
    assert decompressed == data

3. Metamorphic Relations

Best for: Testing functions where the exact output is hard to predict, but the relationship between different inputs and outputs is well-defined. This is common in scientific computing, machine learning (e.g., "does adding a positive value to all inputs increase the average?"), or complex business logic.
How outputs change when inputs change:
python
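import math
from hypothesis import given, strategies as st

# Bounded floats keep x * 2 and the running sum from overflowing to infinity
@given(st.lists(st.floats(min_value=-1e6, max_value=1e6), min_size=1))
def test_average_scaling(numbers):
    avg1 = average(numbers)
    scaled = [x * 2 for x in numbers]
    avg2 = average(scaled)

    # Property: Scaling all inputs scales the average
    # (compare with a relative tolerance, not a fixed absolute one)
    assert math.isclose(avg2, avg1 * 2, rel_tol=1e-9, abs_tol=1e-9)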

4. Test Oracle Properties

Best for: When you're refactoring a complex algorithm or replacing a slow, simple implementation with a highly optimized one. You can use the old, trusted code as an "oracle" to verify that the new version behaves identically.
When you have a trusted reference implementation:
python
@given(st.lists(st.integers()))
def test_custom_sort_matches_builtin(lst):
    custom_sorted = my_custom_sort(lst.copy())
    builtin_sorted = sorted(lst)
    assert custom_sorted == builtin_sorted

Hypothesis Strategies: Generating Complex Data

Hypothesis provides powerful strategies for generating test data:
python
from hypothesis import given, strategies as st
from datetime import datetime

# Basic types
integers = st.integers(min_value=0, max_value=100)
floats = st.floats(allow_nan=False, allow_infinity=False)
text = st.text(alphabet="abcdefghijklmnopqrstuvwxyz", min_size=1)

# Collections
lists_of_ints = st.lists(st.integers(), min_size=1, max_size=10)
dict_str_to_int = st.dictionaries(st.text(), st.integers())

# Complex objects
@st.composite
def user_profiles(draw):
    # The `draw` function is the magic of composite strategies.
    # It takes a strategy and "draws" a single value from it,
    # allowing you to combine multiple strategies into one complex object.
    return {
        "username": draw(st.text(min_size=3, max_size=20)),
        "age": draw(st.integers(min_value=13, max_value=120)),
        "email": draw(st.emails()),
        "joined": draw(st.datetimes(
            min_value=datetime(2020, 1, 1),
            max_value=datetime(2025, 1, 1)
        )),
        "premium": draw(st.booleans())
    }

@given(user_profiles())
def test_user_serialization(user):
    serialized = serialize_user(user)
    deserialized = deserialize_user(serialized)
    assert deserialized == user
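The test calls `serialize_user` and `deserialize_user`, which the article leaves undefined. One plausible pair, assuming JSON as the wire format, is sketched below. The interesting detail is the `joined` field: `json.dumps` cannot encode a `datetime` directly, so it is converted to an ISO 8601 string on the way out and parsed back on the way in.

```python
import json
from datetime import datetime


def serialize_user(user):
    """Encode a user profile dict as a JSON string."""
    payload = dict(user)  # shallow copy so the caller's dict is untouched
    payload["joined"] = user["joined"].isoformat()
    return json.dumps(payload)


def deserialize_user(serialized):
    """Decode a JSON string produced by serialize_user."""
    user = json.loads(serialized)
    user["joined"] = datetime.fromisoformat(user["joined"])
    return user
```

Forgetting the conversion in either direction is exactly the kind of mistake the round-trip property catches: the first generated profile would either raise a `TypeError` or come back with a string where a `datetime` was expected.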

Finding Real Bugs: A Case Study

Let's implement a function that finds the median of a list, but with a subtle bug:
python
from hypothesis import given, strategies as st

def find_median(numbers):
    """Find the median of a list of numbers."""
    if not numbers:
        raise ValueError("Cannot find median of empty list")

    sorted_nums = sorted(numbers)
    n = len(sorted_nums)

    if n % 2 == 1:
        return sorted_nums[n // 2]
    else:
        # Bug: integer division when we need float division
        return (sorted_nums[n // 2 - 1] + sorted_nums[n // 2]) // 2

# Property-based test
@given(st.lists(st.integers(), min_size=1))
def test_median_properties(numbers):
    median = find_median(numbers)

    # Property 1: Median is within the range
    assert min(numbers) <= median <= max(numbers)

    # Property 2: At least half elements are >= median
    greater_equal = sum(1 for n in numbers if n >= median)
    assert greater_equal >= len(numbers) // 2

    # Property 3: At least half elements are <= median
    less_equal = sum(1 for n in numbers if n <= median)
    assert less_equal >= len(numbers) // 2

    # Property 4: Negating every input negates the median.
    # A floored median still satisfies properties 1-3; it breaks this symmetry.
    assert find_median([-n for n in numbers]) == -median
Running this test, Hypothesis quickly finds a counterexample:
Falsifying example: test_median_properties(numbers=[0, 1])
The median should be 0.5, but our function returns 0 due to integer division!
⚠️
This bug is particularly insidious because it only appears with even-length lists where the two middle values have an odd sum. Traditional tests often miss this.
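For comparison, a corrected version might look like the sketch below. The name `find_median_fixed` is ours, chosen only to keep it apart from the buggy function; the single substantive change is true division in the even-length branch.

```python
def find_median_fixed(numbers):
    """Find the median of a list of numbers, averaging with true division."""
    if not numbers:
        raise ValueError("Cannot find median of empty list")

    sorted_nums = sorted(numbers)
    mid = len(sorted_nums) // 2

    if len(sorted_nums) % 2 == 1:
        return sorted_nums[mid]
    # True division keeps the .5 when the two middle values have an odd sum.
    return (sorted_nums[mid - 1] + sorted_nums[mid]) / 2


print(find_median_fixed([0, 1]))     # 0.5
print(find_median_fixed([3, 1, 2]))  # 2
```

The standard library's `statistics.median` behaves the same way for these inputs and can serve as a reference implementation in an oracle-style test.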

Shrinking: Finding Minimal Failing Examples

One of Hypothesis's killer features is shrinking. When it finds a failing example, it automatically simplifies it to find the minimal case that still fails.

How Hypothesis Shrinking Works

[Interactive demo: shrinking a failing test case. Property: buggySort(list) should preserve all elements. Bug: the function filters out negative numbers. The demo starts from the initial failing example [42, -17, 0, 23, -5, 99, -1, 7, -33, 15] and simplifies it over 7 steps.]

The same process applies to a different bug, an order-preserving deduplication that accidentally returns an unordered set:
python
from hypothesis import given, strategies as st

def remove_duplicates(items):
    """Remove duplicates while preserving order."""
    seen = set()
    result = []
    for item in items:
        if item not in seen:
            seen.add(item)
            result.append(item)
    # Bug: returning the set of seen items, which is unordered
    return seen

@given(st.lists(st.integers()))
def test_remove_duplicates_properties(items):
    result = remove_duplicates(items)

    # Property 1: All items in the result are unique
    assert len(result) == len(set(result))

    # Property 2: The result contains only items from the original list
    assert set(result).issubset(set(items))

    # Property 3 (the one that fails): Order is preserved
    # We can build the expected list and compare
    expected = []
    seen = set()
    for item in items:
        if item not in seen:
            seen.add(item)
            expected.append(item)

    # This assertion fails whenever the set's iteration order
    # differs from first-seen order
    assert list(result) == expected
Hypothesis might initially find a failure with [47, -23, 0, 47, 12, -23, 99, 47], but it will shrink this to a minimal failing case such as [1, 0]. (In CPython, [0, 1] happens to pass, because a set containing 0 and 1 iterates as 0, 1.)
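The idea behind shrinking can be illustrated with a toy greedy shrinker. To be clear about what this is: a simplified sketch in plain Python, not Hypothesis's actual algorithm, which works on the underlying generated data and is considerably more sophisticated. The function names are ours, and `buggy_sort` is a stand-in for the demo's buggySort, the sort that drops negative numbers.

```python
from collections import Counter


def buggy_sort(items):
    """Sort, but silently drop negative numbers (the bug)."""
    return sorted(x for x in items if x >= 0)


def fails(items):
    """True when the 'preserves all elements' property is violated."""
    return Counter(buggy_sort(items)) != Counter(items)


def simpler_values(x):
    """Integers closer to zero than x, most aggressive first."""
    if x == 0:
        return []
    step = 1 if x > 0 else -1
    return [0, int(x / 2), x - step]


def candidates(items):
    """Yield simpler variants: one element removed, or one value nearer zero."""
    for i in range(len(items)):
        yield items[:i] + items[i + 1:]
    for i, x in enumerate(items):
        for v in simpler_values(x):
            yield items[:i] + [v] + items[i + 1:]


def shrink(items, fails):
    """Greedily apply simplifications for as long as the test keeps failing."""
    assert fails(items), "shrinking starts from a failing example"
    improved = True
    while improved:
        improved = False
        for candidate in candidates(items):
            if fails(candidate):
                items = candidate
                improved = True
                break
    return items


print(shrink([42, -17, 0, 23, -5, 99, -1, 7, -33, 15], fails))  # [-1]
```

Every accepted candidate is either shorter or closer to zero, so the loop terminates. The result, a single -1, points straight at the bug: one negative number is enough to break the property.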

Integration Strategies

With pytest

python
# conftest.py
from hypothesis import Verbosity, settings

# Configure Hypothesis for CI
settings.register_profile("ci", max_examples=1000)
settings.register_profile("dev", max_examples=100)
settings.register_profile("debug", max_examples=10, verbosity=Verbosity.verbose)

# Run with: pytest --hypothesis-profile=ci

Combining with Traditional Tests

python
from datetime import datetime

import pytest
from hypothesis import assume, given, strategies as st

class TestUserRegistration:
    # Traditional edge case tests
    def test_empty_username_rejected(self):
        with pytest.raises(ValueError):
            register_user("", "email@example.com")

    def test_duplicate_email_rejected(self):
        register_user("user1", "test@example.com")
        with pytest.raises(ValueError):
            register_user("user2", "test@example.com")

    # Property-based tests for deeper coverage
    @given(st.text(), st.emails())
    def test_registration_properties(self, username, email):
        # `assume` tells Hypothesis to discard test cases that don't meet a condition.
        # It's different from an `assert` because it doesn't cause a failure;
        # it just skips uninteresting or invalid examples.
        # Here, we're not interested in testing empty usernames with this property.
        assume(username)

        user = register_user(username, email)

        # Properties that should hold
        assert user.username == username
        assert user.email == email.lower()
        assert user.id is not None
        assert user.created_at <= datetime.now()

When to Use Property-Based Testing

💡
Use property-based testing when:
  • Testing pure functions with clear mathematical properties
  • Working with data transformations (parsing, serialization, encoding)
  • Implementing algorithms with known properties
  • Building data structures with invariants
  • Testing APIs or protocols
⚠️
Be cautious when:
  • Testing stateful systems with complex dependencies
  • Working with external services or databases
  • Properties are hard to define or verify
  • Test execution time is critical

Advanced Techniques

Stateful Testing

Testing stateful systems by modeling them as state machines:
python
from hypothesis import strategies as st
from hypothesis.stateful import RuleBasedStateMachine, rule, invariant

class ShoppingCartMachine(RuleBasedStateMachine):
    def __init__(self):
        super().__init__()
        self.cart = ShoppingCart()
        self.model_items = {}

    @rule(item_id=st.integers(), quantity=st.integers(min_value=1, max_value=10))
    def add_item(self, item_id, quantity):
        self.cart.add_item(item_id, quantity)
        self.model_items[item_id] = self.model_items.get(item_id, 0) + quantity

    @rule(item_id=st.integers())
    def remove_item(self, item_id):
        if item_id in self.model_items:
            self.cart.remove_item(item_id)
            del self.model_items[item_id]

    @invariant()
    def quantities_match(self):
        for item_id, quantity in self.model_items.items():
            assert self.cart.get_quantity(item_id) == quantity

# Run the state machine test
TestCart = ShoppingCartMachine.TestCase
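The state machine drives a `ShoppingCart` with three methods: `add_item`, `remove_item`, and `get_quantity`. The article does not define the class, so here is a minimal sketch of an implementation that the machine could exercise. The internal dictionary and the error behavior are assumptions.

```python
class ShoppingCart:
    """Minimal cart that tracks a quantity per item id (illustrative sketch)."""

    def __init__(self):
        self._quantities = {}

    def add_item(self, item_id, quantity):
        if quantity <= 0:
            raise ValueError("quantity must be positive")
        self._quantities[item_id] = self._quantities.get(item_id, 0) + quantity

    def remove_item(self, item_id):
        # Raises KeyError for an unknown item; the state machine only
        # removes items that its model says are present.
        del self._quantities[item_id]

    def get_quantity(self, item_id):
        return self._quantities.get(item_id, 0)
```

A typical bug the machine would catch: if `add_item` overwrote the stored quantity instead of adding to it, the invariant would fail after two `add_item` calls with the same `item_id`.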

Custom Strategies

python
from hypothesis import strategies as st

@st.composite
def sorted_lists(draw, elements=st.integers()):
    """Generate sorted lists."""
    lst = draw(st.lists(elements))
    return sorted(lst)

@st.composite
def balanced_trees(draw, size=None):
    """Generate balanced binary trees with `size` nodes (0 to 4 if omitted)."""
    if size is None:
        size = draw(st.integers(min_value=0, max_value=4))
    if size == 0:
        return None

    left_size = size // 2
    right_size = size - left_size - 1

    # Pass the sizes down so each subtree has exactly the planned node count
    return {
        'value': draw(st.integers()),
        'left': draw(balanced_trees(size=left_size)),
        'right': draw(balanced_trees(size=right_size))
    }
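A custom strategy is itself code that can be wrong, so it is worth checking that what it generates has the shape you intended. The helpers below are a sketch for the dictionary-based trees used above; `tree_size` and `is_balanced` are hypothetical names, and "balanced" is taken here to mean that at every node the two subtrees differ in size by at most one.

```python
def tree_size(tree):
    """Count the nodes in a dict-based binary tree (None is the empty tree)."""
    if tree is None:
        return 0
    return 1 + tree_size(tree["left"]) + tree_size(tree["right"])


def is_balanced(tree):
    """True if, at every node, the subtree sizes differ by at most one."""
    if tree is None:
        return True
    diff = abs(tree_size(tree["left"]) - tree_size(tree["right"]))
    return diff <= 1 and is_balanced(tree["left"]) and is_balanced(tree["right"])
```

A property-based test of the strategy would then be a one-liner: draw a tree from `balanced_trees()` and assert `is_balanced(tree)`.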

Conclusion: A New Way of Thinking

Property-based testing isn't just another testing tool—it's a different way of thinking about correctness. Instead of asking "does my code work for these examples?", you ask "what should always be true about my code?"
This shift in perspective helps you:
  • Find bugs you didn't know existed
  • Understand your code's behavior more deeply
  • Build more robust systems
  • Sleep better at night
Start small. Pick one pure function in your codebase and write a property-based test for it. Let Hypothesis show you the edge cases you've been missing. Once you see it catch its first real bug, you'll be hooked.

Remember: The goal isn't to replace all your example-based tests. It's to add another powerful tool to your testing arsenal—one that helps you think differently about what it means for code to be correct.