The C++ Rewrite of Cassandra That Actually Delivers on the Performance Promises

In the world of distributed databases, where milliseconds matter and scalability can make or break an application, ScyllaDB has emerged as a game-changing force. This isn't just another database—it's a complete reimagining of what's possible when you combine cutting-edge engineering with a relentless focus on performance. If you've ever wrestled with Cassandra's garbage collection pauses, winced at DynamoDB's pricing for high-throughput workloads, or hit MongoDB's scaling limits, ScyllaDB might just be the solution you've been searching for.

What ScyllaDB Actually Is: Wide-Column Storage Explained

ScyllaDB is a wide-column store (also called a column-family database), which means it organizes data very differently from both relational databases and document stores like MongoDB. Understanding this distinction is crucial for knowing when to use it.

💡

Wide-Column Data Model

In a wide-column store, data is organized into:

•Partition Key: Determines which node stores the data (like a hash)
•Clustering Key: Sorts data within a partition (like an index)
•Columns: Can vary per row, stored sparsely

Think of it as a distributed, sorted map-of-maps. Each partition is like a sorted table that lives on specific nodes, and you can efficiently query ranges within partitions. This is fundamentally different from:

Relational Databases (PostgreSQL, MySQL): Fixed schemas with ACID transactions and JOIN operations across normalized tables. Great for complex business logic but struggle with horizontal scaling.

Document Stores (MongoDB): Store nested JSON-like documents. Flexible schemas but typically scale by sharding, which becomes complex. Better for content management and rapid prototyping.

Key-Value Stores (Redis, DynamoDB): Simple key-value lookups. Fast but limited query capabilities—you can only fetch by exact key.

Graph Databases (Neo4j): Optimized for relationship traversal. Perfect for social networks or recommendation engines but overkill for most use cases.

Wide-column stores like ScyllaDB excel at time-series data, event logging, real-time analytics, and any use case requiring high-throughput reads/writes with predictable access patterns.

The Genesis of Speed: Why ScyllaDB Exists

ScyllaDB's story begins with a simple but powerful observation: the fundamental architecture of existing wide-column databases was holding them back. While Apache Cassandra pioneered the distributed, eventually-consistent wide-column space, its Java-based implementation came with inherent limitations—garbage collection pauses, threading overhead, and inefficient resource utilization on modern multi-core hardware.

Enter Avi Kivity and Dor Laor, the masterminds behind ScyllaDB, who previously created the innovative OSv operating system. Rather than incrementally improving existing database technology, they chose to rebuild from the ground up using C++ and the high-performance Seastar framework. The result is a database that supports the same protocols as Cassandra (CQL) and the same file formats (SSTable), but delivers significantly higher throughputs and lower latencies.

This architectural decision pays dividends in ways that traditional databases simply cannot match. ScyllaDB uses a sharded design on each node, meaning that each CPU core handles a different subset of data. Cores do not share data, but rather communicate explicitly when they need to. This approach eliminates the coordination overhead that plagues multi-threaded databases and allows ScyllaDB to scale linearly with the number of CPU cores.

When to Choose ScyllaDB Over Alternatives

Database Selection Guide

What type of data system do you need?

First, let's determine if you need a primary database, a caching layer, or a complete application backend platform.

Choose ScyllaDB Over Relational Databases When:

You Need Horizontal Scaling: If your application requires handling millions of writes per second across multiple servers, relational databases hit fundamental limits. ScyllaDB can scale to hundreds of nodes linearly.

Your Access Patterns Are Predictable: If you primarily query by known keys or key ranges (like user timelines, sensor data by time, or game leaderboards), wide-column stores are perfect. If you need complex JOINs across normalized data, stick with PostgreSQL.

Eventual Consistency Is Acceptable: ScyllaDB prioritizes availability and partition tolerance over immediate consistency. If you need ACID transactions across multiple records, use a relational database.

Example Use Cases:

•Time-series data (IoT sensors, metrics)
•User activity feeds (social media timelines)
•Real-time leaderboards and scoring
•High-frequency trading data
•Logging and event tracking

Choose ScyllaDB Over MongoDB When:

Performance Is Critical: MongoDB's document model adds overhead. If you're storing relatively flat data and need maximum throughput, ScyllaDB's columnar approach is more efficient.

You Have Predictable Query Patterns: MongoDB shines when you need flexible queries across nested documents. If your queries follow known patterns (fetch user data, get time ranges), ScyllaDB is faster.

Linear Scaling Matters: MongoDB's sharding can become complex and uneven. ScyllaDB's consistent hashing distributes data more predictably.

Example Migration: A social media platform storing user posts might move from MongoDB (flexible document queries) to ScyllaDB (optimized for timeline reads) as they scale.

Choose ScyllaDB Over DynamoDB When:

⚠️

Cost Comparison Alert: ScyllaDB costs are significantly lower than DynamoDB costs in all but one scenario, with realistic workloads showing five to forty times lower costs.

Cost Is a Major Factor: ScyllaDB costs are significantly lower than DynamoDB costs in all but one scenario, with realistic workloads showing 5× to 40× lower costs.

You Want Multi-Cloud Flexibility: DynamoDB locks you into AWS. ScyllaDB runs anywhere.

You Need Complex Queries: DynamoDB's query capabilities are limited. ScyllaDB supports rich CQL queries with secondary indexes, filtering, and aggregations.

Predictable Performance: DynamoDB's autoscaling can introduce latency spikes. ScyllaDB provides consistent performance.

Choose ScyllaDB Over Cassandra When:

Performance Is Everything: Samsung benchmark reported that ScyllaDB outperformed Cassandra on a cluster of 24-core machines by a margin of 10–37× depending on the YCSB workload.

Operational Simplicity: Cassandra requires extensive tuning. ScyllaDB is largely self-tuning.

Faster Scaling: The tablet's architecture allows ScyllaDB to double or even quadruple a cluster's throughput capacity within just 15 minutes compared to hours for Cassandra.

Performance That Defies Belief

The performance numbers surrounding ScyllaDB read like marketing fiction, but they're backed by rigorous testing and real-world deployments. The ScyllaDB authors claim that they have measured as much as 2 million requests per second on a single machine.

Real-Time Performance Comparison

System Load50%

Lower is better (P99 Latency in ms)

ScyllaDB2.1ms

✨ Sub-5ms!

Cassandra17.1ms

DynamoDB10.6ms

10-37×

Better performance than Cassandra

5-40×

Lower cost than DynamoDB

90%

CPU utilization efficiency

✨

Real-World Performance Gains

Comcast's migration from Cassandra to ScyllaDB achieved:

•10× improvement in latency
•2× the requests handled
•Less than 5% of the cost
•Node reduction: 962 → 78 nodes

But raw throughput numbers only tell part of the story. What makes ScyllaDB truly revolutionary is its ability to maintain predictable, low-latency performance under heavy load. Traditional databases often suffer from unpredictable latency spikes due to garbage collection, background compaction, or resource contention. ScyllaDB's architecture eliminates these sources of jitter, delivering consistent single-digit millisecond response times even at extreme scale.

The Secret Sauce: Seastar and Shard-Per-Core Architecture

At the heart of ScyllaDB's performance advantage lies the Seastar framework—a sophisticated asynchronous programming library designed specifically for high-performance applications on modern multi-core systems. Unlike traditional databases that rely on thread pools and shared memory, ScyllaDB dedicates each CPU core to handling a specific shard of data.

This shard-per-core approach provides several critical advantages:

•Eliminates Lock Contention: No coordination overhead between cores
•Optimal Cache Utilization: Each core works exclusively with its own data
•True Linear Scalability: More cores = proportional performance gains
•Predictable Performance: No garbage collection or threading overhead

The Seastar framework implements user-space networking and storage stacks that bypass the kernel's overhead, allowing ScyllaDB to saturate modern NVMe SSDs and 100-gigabit networks.

Revolutionary Elasticity with Tablets

With the new ScyllaDB 6.0 open-source release, the technology is taking the next step forward with an innovative replication architecture called 'tablets' that enables some impressive performance numbers for database scaling.

Traditional wide-column databases struggle with elastic scaling because redistributing data across new nodes is time-consuming. Adding nodes to a Cassandra cluster can take hours or days. ScyllaDB's tablets architecture changes this entirely.

"We can redirect ranges of [database] keys to different nodes based on needs," explained CTO Avi Kivity. This dynamic remapping enables organizations to scale clusters in real-time to match actual demand, transforming capacity planning from guesswork to responsive management.

Dual API Compatibility: The Best of Both Worlds

ScyllaDB supports both Cassandra's CQL and DynamoDB's API, making it uniquely positioned for migrations:

cql

1 -- CQL Example: Creating a time-series table
2 CREATE TABLE sensor_data (
3     sensor_id UUID,
4     timestamp TIMESTAMP,
5     temperature DOUBLE,
6     humidity DOUBLE,
7     PRIMARY KEY (sensor_id, timestamp)
8 ) WITH CLUSTERING ORDER BY (timestamp DESC);
9  
10 -- Efficient range query
11 SELECT * FROM sensor_data 
12 WHERE sensor_id = 550e8400-e29b-41d4-a716-446655440000
13 AND timestamp >= '2023-01-01' 
14 AND timestamp < '2023-02-01';

CQL Interface: Provides SQL-like syntax familiar to developers, with powerful features like secondary indexes, materialized views, and aggregations.

DynamoDB API: Allows zero-code migrations from DynamoDB while offering better performance and lower costs.

This dual compatibility is particularly valuable for organizations consolidating multiple data stores or avoiding vendor lock-in.

The Economic Argument

ScyllaDB typically offers approximately 75% total cost of ownership savings with around 5× higher throughput compared to Cassandra. Let's look at real numbers:

Real-World Cost Comparison (1 Billion Requests/Day)

DynamoDB: ~$15,000/month

•Write capacity: $0.00065 per WCU
•Read capacity: $0.00013 per RCU
•Storage: $0.25 per GB
•No infrastructure management

Cassandra on AWS: ~$8,000/month

•10 × i3.4xlarge instances
•DevOps team overhead
•Complex tuning required

ScyllaDB on AWS: ~$2,400/month

•3 × i3.4xlarge instances (70% fewer nodes!)
•Self-tuning, minimal ops
•Same performance as 10-node Cassandra

The dramatic cost reduction comes from ScyllaDB's ability to fully utilize modern hardware. While Cassandra might use 30-50% of available CPU due to JVM overhead, ScyllaDB consistently pushes 80-95% utilization. This means you need significantly fewer nodes to handle the same workload.

For DynamoDB users, the savings are even more dramatic, especially for write-heavy workloads where DynamoDB charges 5× more for writes than reads.

Specific Use Cases Where ScyllaDB Excels

Time-Series and IoT Data

Perfect Fit: Sensors generating millions of data points, each with a timestamp. ScyllaDB's clustering keys naturally sort by time, enabling efficient range queries.

ScyllaDB vs Time-Series Databases

When to use ScyllaDB for time-series:

•Need to correlate time-series with other data types (user profiles, metadata)
•Require multi-datacenter replication with tunable consistency
•Want a general-purpose database that handles time-series well
•Need extreme write throughput (millions/sec) with horizontal scaling

When to use dedicated time-series DBs (InfluxDB, TimescaleDB):

•Require specialized time-series functions (downsampling, interpolation)
•Need continuous aggregates and automatic retention policies
•Want built-in visualization and alerting (Grafana integration)
•Primarily doing analytical queries on historical data

ScyllaDB excels at the "write path" - ingesting massive streams of time-series data. For complex time-series analytics, you might use ScyllaDB for hot data and sync to a specialized TSDB for historical analysis.

Why Not Relational: Can't scale to millions of writes per second Why Not MongoDB: Document overhead unnecessary for simple time-value pairs Why Not InfluxDB: ScyllaDB offers better general-purpose capabilities alongside time-series performance

Real-Time Analytics and Dashboards

Perfect Fit: Pre-computed aggregations stored with compound keys (user_id, date, metric_type). Sub-millisecond reads power real-time dashboards. Why Not Elasticsearch: Better write performance and lower resource usage Why Not ClickHouse: ScyllaDB handles mixed read/write workloads better

Gaming Leaderboards and User Profiles

Perfect Fit: Player stats, achievements, and leaderboards with millions of concurrent players. Clustering keys maintain sorted leaderboards automatically. Why Not Redis: Better persistence and more complex data structures Why Not DynamoDB: Much lower costs at scale

AdTech and Real-Time Bidding

Perfect Fit: Storing user profiles, campaign data, and bid responses with microsecond latency requirements. Why Not PostgreSQL: Can't handle the throughput Why Not MongoDB: Too much latency overhead

Migration Strategies and Getting Started

💡

Migration Paths

•From Cassandra: Often as simple as changing connection strings due to full CQL compatibility
•From DynamoDB: Choose between API compatibility or CQL migration
•From MongoDB: Requires data model changes but tools exist to assist
•From Relational: Denormalize schema and eliminate JOINs

From Cassandra: Often as simple as changing connection strings due to full CQL compatibility. Use the ScyllaDB Migrator for data migration.

From DynamoDB: Choose between DynamoDB API compatibility (zero code changes) or migrating to CQL (enhanced functionality).

From MongoDB: Requires data model changes but tools exist to assist. Focus on identifying access patterns first.

From Relational: Denormalize your schema and eliminate JOINs. Think "query-first" rather than "normalization-first."

When NOT to Use ScyllaDB

Complex Transactions: If you need multi-record ACID transactions, use PostgreSQL or similar.

Unpredictable Query Patterns: If you need ad-hoc analytics across arbitrary dimensions, consider ClickHouse or BigQuery.

Small Scale: If you're handling < 1000 QPS, the operational overhead isn't worth it. Use PostgreSQL or managed cloud databases.

Heavy Analytics Workloads: For data warehousing with complex aggregations, specialized analytics databases perform better.

The Bottom Line

ScyllaDB isn't just incrementally better than existing wide-column databases—it's categorically different. It excels when you need:

•Massive scale (millions of operations per second)
•Predictable access patterns (key-based or range queries)
•Consistent low latency (single-digit milliseconds)
•Cost efficiency at scale
•Operational simplicity compared to Cassandra

If you're building time-series systems, real-time analytics platforms, high-scale web applications, or any system where database performance directly impacts user experience, ScyllaDB deserves serious evaluation. The migration path is straightforward, the performance gains are immediate, and the operational benefits compound over time.

The question isn't whether ScyllaDB is impressive—it's whether your specific use case aligns with its strengths as a wide-column store optimized for predictable, high-throughput workloads.

1	`-- CQL Example: Creating a time-series table`
2	`CREATE TABLE sensor_data (`
3	`sensor_id UUID,`
4	`timestamp TIMESTAMP,`
5	`temperature DOUBLE,`
6	`humidity DOUBLE,`
7	`PRIMARY KEY (sensor_id, timestamp)`
8	`) WITH CLUSTERING ORDER BY (timestamp DESC);`
9
10	`-- Efficient range query`
11	`SELECT * FROM sensor_data`
12	`WHERE sensor_id = 550e8400-e29b-41d4-a716-446655440000`
13	`AND timestamp >= '2023-01-01'`
14	`AND timestamp < '2023-02-01';`

ScyllaDB: The Monstrously Fast Wide-Column Database Redefining Performance