When I first started with web development, I never thought much about databases, because since school (back in 2010) we had all been taught about relational databases, most prominently MySQL. With the buzz around Node.js I got to know about a NoSQL database called MongoDB. But when I started working as a software engineer I learned that the database landscape is not only about relational databases like MySQL or PostgreSQL. The landscape has a variety of shades and it’s growing day by day.
Several types of databases have emerged where relational databases could not fit the use cases or offered no easy solution. Though we did a good job of fitting (almost) all use cases into the relational data model for more than 40 years!
Today in 2020, the database landscape has grown rapidly; plenty of database types are available and the count is increasing. Relational databases had such a long dominance that we broadly categorize databases into two branches: Relational and NoSQL. NoSQL is the category that holds databases which are "Not Only SQL".
Today let’s explore the different types of databases, their use cases, and the trade-offs and complexities associated with them. I will also mention the most popular databases in each domain.
Relational Databases
Relational databases organize data into structured tables of rows and columns. For a relational database we should always be ready with the schema (the structure of the stored data), and the relationships between the data should be very clear to us. Relationships between tables are maintained using keys, and data integrity is ensured through ACID properties.
They are ideal for applications where strict data consistency and integrity are critical, and they use SQL for data definition and data manipulation.
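As a minimal sketch of these ideas, here is an example using SQLite (Python’s built-in relational database). The table and column names are purely illustrative; the point is the schema, the foreign-key relationship, and the all-or-nothing transaction.

```python
import sqlite3

conn = sqlite3.connect(":memory:")
conn.execute("CREATE TABLE users (id INTEGER PRIMARY KEY, name TEXT NOT NULL)")
conn.execute(
    "CREATE TABLE orders ("
    "  id INTEGER PRIMARY KEY,"
    "  user_id INTEGER NOT NULL REFERENCES users(id),"
    "  amount REAL NOT NULL)"
)

# The `with` block is a transaction: it commits on success and rolls
# back on error, so both inserts succeed or neither does (atomicity).
with conn:
    conn.execute("INSERT INTO users (id, name) VALUES (1, 'Alice')")
    conn.execute("INSERT INTO orders (user_id, amount) VALUES (1, 49.99)")

# A join across the relationship between the two tables.
row = conn.execute(
    "SELECT u.name, o.amount FROM users u JOIN orders o ON o.user_id = u.id"
).fetchone()
print(row)  # ('Alice', 49.99)
```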
When to Use
- When the data can be structured and there are relationships between entities.
- When we need ACID compliance and transaction support.
Tradeoffs
- Great for consistency, but schema changes require migrations, downtime planning, or complex online migration tooling.
- Vertical scaling is easier; horizontal scaling is complex and often application-aware.
- Complex joins are powerful but can be slow at scale if indexing isn’t planned carefully.
Complexities
- Handling schema evolution as data grows is a challenge.
- Scaling relational databases is tough, e.g. managing replication lag and shards.
- It is hard to build efficient indexing strategies without degrading writes.
Most Popular Choices
The most widely used relational databases are:
- MySQL
- PostgreSQL
- Microsoft SQL Server
Document Databases
Document databases store data in document-like structures, usually JSON or BSON, rather than rows and columns. Each document can have a flexible schema, allowing different documents in the same collection to have different fields. Document databases make it easy to store nested data structures, which reduces the need for complex joins. They are highly scalable and are well-suited to applications with rapidly evolving data requirements.
When to Use
- When the data is unstructured and flexibility is required.
- When we are into scenarios like event logging for audit purposes.
- When we need to store nested data or metadata for resources.
Tradeoffs
- Without strong schema enforcement, different documents may diverge structurally.
- Complex relational queries often require manual denormalization or multiple round-trips.
- Reads may return stale data; consistency guarantees vary across deployments.
Complexities
- Designing document structure to avoid deeply nested or overly large documents.
- Handling duplication due to denormalization: the same data must be updated in multiple places.
- Choosing between embedding vs referencing models.
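The embedding vs referencing choice can be sketched with plain Python dicts standing in for documents. The data here is made up for illustration; no specific document store’s API is assumed.

```python
# Embedding: the address lives inside the user document. One read fetches
# everything, but the address is duplicated if other users share it.
user_embedded = {
    "_id": "u1",
    "name": "Alice",
    "address": {"city": "Pune", "zip": "411001"},
}

# Referencing: the document stores only an id. The address has a single
# source of truth, but resolving it costs a second lookup.
addresses = {"a1": {"city": "Pune", "zip": "411001"}}
user_referenced = {"_id": "u1", "name": "Alice", "address_id": "a1"}

def resolve_address(user):
    # Follow the reference manually; document stores rarely join for us.
    return addresses[user["address_id"]]

print(resolve_address(user_referenced))  # {'city': 'Pune', 'zip': '411001'}
```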
Most Popular Choices
The most popular document databases are:
- MongoDB
- CouchDB
Key-Value Database
Key-value databases are the simplest form of database, storing data as a collection of key-value pairs. Each key is unique and maps directly to a value, which can be a string, JSON, or binary object. Databases like DynamoDB are extremely fast because retrieval is based solely on the key, without any complex querying.
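A toy in-process key-value store makes the model concrete, including the TTL (Time to Live) behavior discussed below. This is a sketch in plain Python, not how any real KV store is implemented; the keys are hypothetical examples.

```python
import time

class TTLKeyValueStore:
    """A toy key-value store with optional per-key TTL and lazy eviction."""

    def __init__(self):
        self._data = {}  # key -> (value, expires_at or None)

    def set(self, key, value, ttl_seconds=None):
        expires_at = time.monotonic() + ttl_seconds if ttl_seconds else None
        self._data[key] = (value, expires_at)

    def get(self, key, default=None):
        entry = self._data.get(key)
        if entry is None:
            return default
        value, expires_at = entry
        if expires_at is not None and time.monotonic() >= expires_at:
            del self._data[key]  # lazy eviction: expired entry removed on read
            return default
        return value

store = TTLKeyValueStore()
store.set("feature:new_ui", True)                  # e.g. a feature flag
store.set("session:42", "alice", ttl_seconds=0.05) # short-lived session
print(store.get("feature:new_ui"))  # True
time.sleep(0.1)
print(store.get("session:42"))      # None (expired)
```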
When to Use
- When we need real-time leaderboards and counters.
- When we need to store configurations or feature flags
Tradeoffs
- Great for performance, but unsuitable for complex queries unless we build secondary structures.
- Lookups usually require the exact key, and there are no native range queries.
- Some KV stores optimize for availability over strict consistency, so choose wisely.
Complexities
- Designing keys to avoid hot partitions or uneven distribution.
- Managing TTL (Time to Live) and evictions intelligently.
- Ensuring atomicity across multiple keys (multi-key transactions are limited).
- Maintaining cluster rebalancing and handling partition tolerance.
Most Popular Choices
- Redis
- Amazon DynamoDB
- Riak
Wide Column Database
Wide column databases store data in tables, rows, and dynamic columns, but columns are grouped into column families, allowing efficient storage of sparse data. They are optimized for high write and read throughput and can scale horizontally across distributed clusters. These databases are often used in big data and analytical applications due to their ability to handle large volumes of structured and semi-structured data.
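The partition key / clustering key layout can be mimicked with a dict of sorted lists: rows group by a partition key (here a hypothetical `user_id`) and sort by a clustering key (a timestamp), so a time-range query within one partition is a contiguous slice. This is only a sketch of the storage idea, not Cassandra’s actual engine.

```python
from collections import defaultdict
import bisect

# partition key (user_id) -> rows sorted by clustering key (timestamp)
partitions = defaultdict(list)

def insert(user_id, ts, event):
    bisect.insort(partitions[user_id], (ts, event))

def events_between(user_id, start_ts, end_ts):
    # Efficient: one partition, one contiguous slice of the clustering key.
    rows = partitions[user_id]
    lo = bisect.bisect_left(rows, (start_ts,))
    hi = bisect.bisect_right(rows, (end_ts, chr(0x10FFFF)))
    return rows[lo:hi]

insert("u1", 100, "login")
insert("u1", 300, "logout")
insert("u1", 200, "click")
print(events_between("u1", 100, 200))  # [(100, 'login'), (200, 'click')]
```

Note that a query like "all events across all users" would require scanning every partition, which is exactly why the data model must be designed around the queries.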
When to Use
- When we need very high availability via a masterless architecture, even if multiple nodes or entire data centers fail.
- When we need extremely high write throughput.
Tradeoffs
- Queries must match the table’s primary key and the clustering key structure.
- We can’t arbitrarily query the data, the model must be designed for the queries.
- Strong consistency requires specific configurations (quorums), reducing performance.
- Read performance varies depending on partition size and structure.
Complexities
- Query-driven data modeling usually requires duplicating data in multiple tables.
- It is painful sometimes to manage compaction, repair operations, and tombstones.
- Be careful to avoid “wide partitions”, which can degrade performance badly.
- We must ensure correct replication settings for desired consistency levels.
Most Popular Choices
The commonly used wide column databases are:
- Apache Cassandra
- HBase
Timeseries Database
Timeseries databases are specialized for storing and querying data indexed by time. They are optimized for handling high-frequency data, supporting operations like aggregation over time intervals, downsampling, and retention policies.
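Downsampling (a rollup) can be sketched in a few lines: bucket raw points into fixed time intervals and aggregate each bucket. The sensor readings below are invented; real timeseries engines do this continuously and at far larger scale.

```python
from collections import defaultdict

# Raw points: (unix_timestamp, value), e.g. readings from a sensor.
points = [(0, 10.0), (15, 12.0), (65, 20.0), (70, 22.0), (130, 5.0)]

def downsample_avg(points, bucket_seconds):
    """Average the values in each fixed-width time bucket."""
    buckets = defaultdict(list)
    for ts, value in points:
        buckets[ts - ts % bucket_seconds].append(value)  # bucket start time
    return {start: sum(vs) / len(vs) for start, vs in sorted(buckets.items())}

print(downsample_avg(points, 60))  # {0: 11.0, 60: 21.0, 120: 5.0}
```

A retention policy would then simply drop buckets older than some cutoff, keeping only the rolled-up values for long-term storage.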
When to Use
- When we require analysis of trends over time
- When we get data that is time dependent like the data from sensors or IoT.
Tradeoffs
- These databases are not ideal for relational or unstructured workloads.
- To keep performance up, old data is often auto-deleted or compressed.
- Too many unique series (high cardinality) can cause performance collapse.
Complexities
- Designing measurement/series/tag structure to avoid high-cardinality explosion.
- Handling downsampling (rollups) and retention rules cleanly.
- Dealing with backfilling data is hard. Many timeseries DBs handle out-of-order timestamps poorly.
- Scaling storage and query engines for long-term data retention.
Most Popular Choices
The popular choices for timeseries databases are:
- InfluxDB
- TimescaleDB
Graph Database
Graph databases focus on storing relationships between data points rather than just the data itself. They use nodes, edges, and properties to model real-world entities and connections. This makes them highly efficient for querying complex relationships, such as social networks or recommendation systems. Graph databases excel in scenarios where relationships are as important as the data.
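A graph traversal, the core operation here, can be sketched with an adjacency list and breadth-first search. The social data and the "follows" relationship are hypothetical; a real graph database would run a query language like Cypher over an optimized storage layout instead.

```python
from collections import deque

# Nodes are people; each edge is a "follows" relationship.
follows = {
    "alice": ["bob", "carol"],
    "bob": ["dave"],
    "carol": ["dave", "erin"],
    "dave": [],
    "erin": [],
}

def within_hops(graph, start, max_hops):
    """Everyone reachable from `start` in at most `max_hops` edges (BFS)."""
    seen = {start: 0}
    queue = deque([start])
    while queue:
        node = queue.popleft()
        if seen[node] == max_hops:
            continue  # do not traverse beyond the hop limit
        for neighbor in graph.get(node, []):
            if neighbor not in seen:
                seen[neighbor] = seen[node] + 1
                queue.append(neighbor)
    seen.pop(start)
    return set(seen)

# "Friends of friends" style query: everyone within 2 hops of alice.
print(sorted(within_hops(follows, "alice", 2)))  # ['bob', 'carol', 'dave', 'erin']
```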
When to Use
- When our application needs to explore complex relationships.
- When we need recommendation systems or knowledge graphs.
Tradeoffs
- Great for deep traversal but not efficient for big analytical scans.
- Harder to distribute horizontally than document or key–value stores.
Complexities
- Choosing between property graph vs RDF models.
- Indexing strategies that avoid slow traversals.
- Handling graph updates at scale without locking large subgraphs.
- Building hybrid architectures when part of the data is relational and part is graph oriented.
Most Popular Choices
- Neo4j
- JanusGraph
In-memory Database
In-memory databases store data primarily in RAM instead of on disk, allowing extremely fast read and write operations. They are often used to accelerate applications that require low-latency access to frequently accessed data. Many in-memory databases also support persistence to disk for durability. They are commonly used as caching layers or for high-speed transaction processing.
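The caching use case usually comes with an eviction policy, since RAM is finite. Here is a toy LRU (least recently used) cache, a common policy in in-memory stores; the keys and capacity are illustrative.

```python
from collections import OrderedDict

class LRUCache:
    """A tiny in-memory cache that evicts the least recently used entry."""

    def __init__(self, capacity):
        self.capacity = capacity
        self._data = OrderedDict()

    def get(self, key, default=None):
        if key not in self._data:
            return default
        self._data.move_to_end(key)  # mark as most recently used
        return self._data[key]

    def set(self, key, value):
        self._data[key] = value
        self._data.move_to_end(key)
        if len(self._data) > self.capacity:
            self._data.popitem(last=False)  # evict the least recently used

cache = LRUCache(2)
cache.set("a", 1)
cache.set("b", 2)
cache.get("a")         # touch "a", so "b" becomes the eviction candidate
cache.set("c", 3)      # capacity exceeded: "b" is evicted
print(cache.get("b"))  # None
print(cache.get("a"))  # 1
```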
When to Use
- When we need caching of data
- When we need real-time analytics like leaderboards and session management.
Tradeoffs
- Durability is limited: recovery relies on snapshots and disk I/O, so restarts can be slow.
- RAM is expensive. Storing large datasets entirely in-memory may be cost-prohibitive.
Complexities
- Ensuring safe persistence and recovery strategies.
- Managing memory fragmentation.
- Handling cluster topology changes while maintaining data consistency.
Most Popular Choices
The most widely used in-memory databases are:
- Redis
- Memcached
Search Database
Search databases are optimized for text search, ranking, and complex queries over large datasets. They can efficiently index and retrieve data based on keywords, relevance scoring, and full-text queries. These databases are often used in search engines, logging systems, and recommendation platforms. The most critical data structure that enables fast full-text searching in modern search databases is the Inverted Index.
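The inverted index idea fits in a few lines of Python: map each term to the set of documents containing it, then answer a query by intersecting those sets. The documents and the crude tokenizer are made up for illustration; real engines add relevance scoring, stemming, and much more.

```python
import re
from collections import defaultdict

docs = {
    1: "The quick brown fox",
    2: "The lazy brown dog",
    3: "A quick red fox",
}

def tokenize(text):
    # A very crude analyzer: lowercase, then split on non-letters.
    return re.findall(r"[a-z]+", text.lower())

# Inverted index: term -> set of document ids containing that term.
index = defaultdict(set)
for doc_id, text in docs.items():
    for term in tokenize(text):
        index[term].add(doc_id)

def search_all(query):
    """Documents containing every query term (an AND query)."""
    term_sets = [index[t] for t in tokenize(query)]
    return set.intersection(*term_sets) if term_sets else set()

print(sorted(search_all("quick fox")))  # [1, 3]
```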
When to Use
- When we want full-text search through huge amounts of data.
- When we do not want to overload our primary database with text search.
Tradeoffs
- Indexes are eventually consistent, and distributed indexing makes this harder.
- Writes are expensive due to tokenization, analysis, and multiple index segments.
- Not a replacement for primary storage.
Complexities
- Choosing analyzers/tokenizers for each field.
- Managing cluster sharding, rebalancing, and mapping evolution.
- Dealing with segment merging and index bloat.
- Avoiding node hot spots caused by uneven shard distribution.
- Preventing split-brain in multi-node clusters.
Most Popular Choices
The most popular search databases are:
- Elasticsearch
- Apache Solr
Special Mentions
ClickHouse
ClickHouse is a column-oriented database designed for real-time analytics on large volumes of data. It was open-sourced by Yandex and it is built for extremely fast query performance on analytical workloads (OLAP), especially where data is read-heavy and needs aggregation.
The distinct features of the ClickHouse database are:
- It is blazing fast for analytical queries
- It is well suited for big data because of columnar storage and compression
- It can scale horizontally across clusters
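The benefit of columnar storage can be sketched by contrasting the two layouts in plain Python. The table and figures are invented; the point is that an aggregation over one column never has to touch the others.

```python
# Row-oriented storage keeps whole rows together on disk.
rows = [
    {"country": "IN", "revenue": 120.0},
    {"country": "US", "revenue": 300.0},
    {"country": "IN", "revenue": 80.0},
]

# Column-oriented storage keeps each column contiguous (roughly what
# ClickHouse does), so similar values sit together and compress well.
columns = {
    "country": [r["country"] for r in rows],
    "revenue": [r["revenue"] for r in rows],
}

# SELECT sum(revenue): scans a single contiguous column and can skip
# the country column entirely.
total = sum(columns["revenue"])
print(total)  # 500.0
```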
Spanner
Spanner is a globally distributed, strongly consistent SQL database built by Google. It combines the relational features of traditional SQL databases (schemas, ACID transactions, SQL queries) with the horizontal scalability and distribution typically found in NoSQL systems. It also supports Protocol Buffers!
The distinct features of the Spanner database are:
- It offers global distribution with strong consistency
- It offers automatic sharding, replication, failover
- It is ideal for mission-critical systems with strict consistency needs
Conclusion
If you are getting to know so many databases for the first time, let me warn you that this is not an exhaustive list.
When it comes to choosing the right database, the decision depends on understanding our data model, access patterns, scalability requirements, and operational constraints. No single database is perfect for every use case, so align the choice with your real application needs.
While we discussed how each database is used, managing them introduces its own set of complexities. We should not simply follow market trends; we should trust our experience, understand the trade-offs, and keep learning as technologies mature.
Happy Coding!