When I first started with web development, I never thought much about databases, because since school (back in 2010) we had all been taught about relational databases, most prominently MySQL. With the buzz around Node.js I got to know about a NoSQL database called MongoDB. But when I started working as a software engineer I learned that the database landscape is not only about relational databases like MySQL or PostgreSQL. The landscape has a variety of shades and it’s growing day by day.
Several types of databases have emerged where relational databases could not fit the use cases or offered no easy solution. Though we did a good job of fitting (almost) all use cases into the relational data model for more than 40 years!
Today in 2020, the database landscape has grown rapidly; plenty of database types are available and the count is increasing. Relational databases had such a long dominance that we broadly categorize databases into two branches: Relational and NoSQL. NoSQL is the category that holds databases which are "Not Only SQL".
Today let’s explore the different types of databases, their use cases, and the trade-offs and complexities associated with them. I will also mention the most popular databases in each domain.
Relational Databases
Relational databases organize data into structured tables of rows and columns. For a relational database we should always be ready with the schema (the structure of the stored data), and the relationships between the data should be very clear to us. Relationships between tables are maintained using keys, and data integrity is ensured through ACID properties.
They are ideal for applications where strict data consistency and integrity are critical, and they use SQL for data definition and data manipulation.
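As a minimal sketch of these ideas, here is an example using SQLite (Python’s built-in relational database). The table and column names are purely illustrative; the point is the schema, the foreign-key relationship, and the all-or-nothing transaction.

```python
import sqlite3

conn = sqlite3.connect(":memory:")
conn.execute("CREATE TABLE users (id INTEGER PRIMARY KEY, name TEXT NOT NULL)")
conn.execute(
    "CREATE TABLE orders ("
    "  id INTEGER PRIMARY KEY,"
    "  user_id INTEGER NOT NULL REFERENCES users(id),"
    "  amount REAL NOT NULL)"
)

# The `with` block is a transaction: it commits on success and rolls
# back on error, so both inserts succeed or neither does (atomicity).
with conn:
    conn.execute("INSERT INTO users (id, name) VALUES (1, 'Alice')")
    conn.execute("INSERT INTO orders (user_id, amount) VALUES (1, 49.99)")

# A join across the relationship between the two tables.
row = conn.execute(
    "SELECT u.name, o.amount FROM users u JOIN orders o ON o.user_id = u.id"
).fetchone()
print(row)  # ('Alice', 49.99)
```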
When to Use
- When the data can be structured and there are relationships between entities.
- When we need ACID compliance and transaction support.
Tradeoffs
- Great for consistency, but schema changes require migrations, downtime planning, or complex online migration tooling.
- Vertical scaling is easier; horizontal scaling is complex and often application-aware.
- Complex joins are powerful but can be slow at scale if indexing isn’t planned carefully.
Complexities
- Handling schema evolution as data grows is a challenge.
- Scaling relational databases is tough, e.g. managing replication lag and shards.
- It is hard to build efficient indexing strategies without degrading writes.
Most Popular Choices
The most widely used relational databases are:
- MySQL
- PostgreSQL
- Microsoft SQL Server
Document Databases
Document databases store data in document-like structures, usually JSON or BSON, rather than rows and columns. Each document can have a flexible schema, allowing different documents in the same collection to have different fields. Document databases make it easy to store nested data structures, which reduces the need for complex joins. They are highly scalable and are well-suited to applications with rapidly evolving data requirements.
When to Use
- When the data is unstructured and flexibility is required.
- When we are into scenarios like event logging for audit purposes.
- When we need to store nested data or metadata for resources.
Tradeoffs
- Without strong schema enforcement, different documents may diverge structurally.
- Complex relational queries often require manual denormalization or multiple round-trips.
- Reads may return stale data; consistency guarantees vary across deployments.
Complexities
- Designing document structure to avoid deeply nested or overly large documents.
- Handling duplication due to denormalization: the same data must be updated in multiple places.
- Choosing between embedding vs referencing models.
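The embedding vs referencing choice can be sketched with plain Python dicts standing in for documents. The data here is made up for illustration; no specific document store’s API is assumed.

```python
# Embedding: the address lives inside the user document. One read fetches
# everything, but the address is duplicated if other users share it.
user_embedded = {
    "_id": "u1",
    "name": "Alice",
    "address": {"city": "Pune", "zip": "411001"},
}

# Referencing: the document stores only an id. The address has a single
# source of truth, but resolving it costs a second lookup.
addresses = {"a1": {"city": "Pune", "zip": "411001"}}
user_referenced = {"_id": "u1", "name": "Alice", "address_id": "a1"}

def resolve_address(user):
    # Follow the reference manually; document stores rarely join for us.
    return addresses[user["address_id"]]

print(resolve_address(user_referenced))  # {'city': 'Pune', 'zip': '411001'}
```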
Most Popular Choices
The most popular document databases are:
- MongoDB
- CouchDB
Key-Value Database
Key-value databases are the simplest form of database, storing data as a collection of key-value pairs. Each key is unique and maps directly to a value, which can be a string, JSON, or binary object. Databases like DynamoDB are extremely fast because retrieval is based solely on the key, without any complex querying.
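A toy in-process key-value store makes the model concrete, including the TTL (Time to Live) behavior discussed below. This is a sketch in plain Python, not how any real KV store is implemented; the keys are hypothetical examples.

```python
import time

class TTLKeyValueStore:
    """A toy key-value store with optional per-key TTL and lazy eviction."""

    def __init__(self):
        self._data = {}  # key -> (value, expires_at or None)

    def set(self, key, value, ttl_seconds=None):
        expires_at = time.monotonic() + ttl_seconds if ttl_seconds else None
        self._data[key] = (value, expires_at)

    def get(self, key, default=None):
        entry = self._data.get(key)
        if entry is None:
            return default
        value, expires_at = entry
        if expires_at is not None and time.monotonic() >= expires_at:
            del self._data[key]  # lazy eviction: expired entry removed on read
            return default
        return value

store = TTLKeyValueStore()
store.set("feature:new_ui", True)                  # e.g. a feature flag
store.set("session:42", "alice", ttl_seconds=0.05) # short-lived session
print(store.get("feature:new_ui"))  # True
time.sleep(0.1)
print(store.get("session:42"))      # None (expired)
```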
When to Use
- When we need real-time leaderboards and counters.
- When we need to store configurations or feature flags
Tradeoffs
- Great for performance, but unsuitable for complex queries unless we build secondary structures.
- Lookups usually require the exact key, and there are no native range queries.
- Some KV stores optimize for availability over strict consistency, so choose wisely.
Complexities
- Designing keys to avoid hot partitions or uneven distribution.
- Managing TTL (Time to Live) and evictions intelligently.
- Ensuring atomicity across multiple keys (multi-key transactions are limited).
- Maintaining cluster rebalancing and handling partition tolerance.
Most Popular Choices
- Redis
- Amazon DynamoDB
- Riak
Wide Column Database
Wide column databases store data in tables, rows, and dynamic columns, but columns are grouped into column families, allowing efficient storage of sparse data. They are optimized for high write and read throughput and can scale horizontally across distributed clusters. These databases are often used in big data and analytical applications due to their ability to handle large volumes of structured and semi-structured data.
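The partition key / clustering key layout can be mimicked with a dict of sorted lists: rows group by a partition key (here a hypothetical `user_id`) and sort by a clustering key (a timestamp), so a time-range query within one partition is a contiguous slice. This is only a sketch of the storage idea, not Cassandra’s actual engine.

```python
from collections import defaultdict
import bisect

# partition key (user_id) -> rows sorted by clustering key (timestamp)
partitions = defaultdict(list)

def insert(user_id, ts, event):
    bisect.insort(partitions[user_id], (ts, event))

def events_between(user_id, start_ts, end_ts):
    # Efficient: one partition, one contiguous slice of the clustering key.
    rows = partitions[user_id]
    lo = bisect.bisect_left(rows, (start_ts,))
    hi = bisect.bisect_right(rows, (end_ts, chr(0x10FFFF)))
    return rows[lo:hi]

insert("u1", 100, "login")
insert("u1", 300, "logout")
insert("u1", 200, "click")
print(events_between("u1", 100, 200))  # [(100, 'login'), (200, 'click')]
```

Note that a query like "all events across all users" would require scanning every partition, which is exactly why the data model must be designed around the queries.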
When to Use
- When we need very high availability via a masterless architecture, even if multiple nodes or entire data centers fail.
- When we need extremely high write throughput.
Tradeoffs
- Queries must match the table’s primary key and the clustering key structure.
- We can’t arbitrarily query the data, the model must be designed for the queries.
- Strong consistency requires specific configurations (quorums), reducing performance.
- Read performance varies depending on partition size and structure.
Complexities
- Query-driven data modeling usually requires duplicating data in multiple tables.
- It is painful sometimes to manage compaction, repair operations, and tombstones.
- Be careful to avoid “wide partitions”, which can degrade performance badly.
- We must ensure correct replication settings for desired consistency levels.
Most Popular Choices
The commonly used wide column databases are:
- Apache Cassandra
- HBase
Timeseries Database
Timeseries databases are specialized for storing and querying data indexed by time. They are optimized for handling high-frequency data, supporting operations like aggregation over time intervals, downsampling, and retention policies.
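Downsampling (a rollup) can be sketched in a few lines: bucket raw points into fixed time intervals and aggregate each bucket. The sensor readings below are invented; real timeseries engines do this continuously and at far larger scale.

```python
from collections import defaultdict

# Raw points: (unix_timestamp, value), e.g. readings from a sensor.
points = [(0, 10.0), (15, 12.0), (65, 20.0), (70, 22.0), (130, 5.0)]

def downsample_avg(points, bucket_seconds):
    """Average the values in each fixed-width time bucket."""
    buckets = defaultdict(list)
    for ts, value in points:
        buckets[ts - ts % bucket_seconds].append(value)  # bucket start time
    return {start: sum(vs) / len(vs) for start, vs in sorted(buckets.items())}

print(downsample_avg(points, 60))  # {0: 11.0, 60: 21.0, 120: 5.0}
```

A retention policy would then simply drop buckets older than some cutoff, keeping only the rolled-up values for long-term storage.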
When to Use
- When we require analysis of trends over time
- When we get data that is time dependent like the data from sensors or IoT.
Tradeoffs
- These databases are not ideal for relational or unstructured workloads.
- To keep performance up, old data is often auto-deleted or compressed.
- Too many unique series (high cardinality) can cause performance collapse.
Complexities
- Designing measurement/series/tag structure to avoid high-cardinality explosion.
- Handling downsampling (rollups) and retention rules cleanly.
- Dealing with backfilling data is hard. Many timeseries DBs handle out-of-order timestamps poorly.
- Scaling storage and query engines for long-term data retention.
Most Popular Choices
The popular choices for timeseries databases are:
- InfluxDB
- TimescaleDB
Graph Database
Graph databases focus on storing relationships between data points rather than just the data itself. They use nodes, edges, and properties to model real-world entities and connections. This makes them highly efficient for querying complex relationships, such as social networks or recommendation systems. Graph databases excel in scenarios where relationships are as important as the data.
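A graph traversal, the core operation here, can be sketched with an adjacency list and breadth-first search. The social data and the "follows" relationship are hypothetical; a real graph database would run a query language like Cypher over an optimized storage layout instead.

```python
from collections import deque

# Nodes are people; each edge is a "follows" relationship.
follows = {
    "alice": ["bob", "carol"],
    "bob": ["dave"],
    "carol": ["dave", "erin"],
    "dave": [],
    "erin": [],
}

def within_hops(graph, start, max_hops):
    """Everyone reachable from `start` in at most `max_hops` edges (BFS)."""
    seen = {start: 0}
    queue = deque([start])
    while queue:
        node = queue.popleft()
        if seen[node] == max_hops:
            continue  # do not traverse beyond the hop limit
        for neighbor in graph.get(node, []):
            if neighbor not in seen:
                seen[neighbor] = seen[node] + 1
                queue.append(neighbor)
    seen.pop(start)
    return set(seen)

# "Friends of friends" style query: everyone within 2 hops of alice.
print(sorted(within_hops(follows, "alice", 2)))  # ['bob', 'carol', 'dave', 'erin']
```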
When to Use
- When our application needs to explore complex relationships.
- When we need recommendation systems or knowledge graphs.
Tradeoffs
- Great for deep traversal but not efficient for big analytical scans.
- Harder to distribute horizontally than document or key–value stores.
Complexities
- Choosing between property graph vs RDF models.
- Indexing strategies that avoid slow traversals.
- Handling graph updates at scale without locking large subgraphs.
- Building hybrid architectures when part of the data is relational and part is graph oriented.
Most Popular Choices
- Neo4j
- JanusGraph
In-memory Database
In-memory databases store data primarily in RAM instead of on disk, allowing extremely fast read and write operations. They are often used to accelerate applications that require low-latency access to frequently accessed data. Many in-memory databases also support persistence to disk for durability. They are commonly used as caching layers or for high-speed transaction processing.
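The caching use case usually comes with an eviction policy, since RAM is finite. Here is a toy LRU (least recently used) cache, a common policy in in-memory stores; the keys and capacity are illustrative.

```python
from collections import OrderedDict

class LRUCache:
    """A tiny in-memory cache that evicts the least recently used entry."""

    def __init__(self, capacity):
        self.capacity = capacity
        self._data = OrderedDict()

    def get(self, key, default=None):
        if key not in self._data:
            return default
        self._data.move_to_end(key)  # mark as most recently used
        return self._data[key]

    def set(self, key, value):
        self._data[key] = value
        self._data.move_to_end(key)
        if len(self._data) > self.capacity:
            self._data.popitem(last=False)  # evict the least recently used

cache = LRUCache(2)
cache.set("a", 1)
cache.set("b", 2)
cache.get("a")         # touch "a", so "b" becomes the eviction candidate
cache.set("c", 3)      # capacity exceeded: "b" is evicted
print(cache.get("b"))  # None
print(cache.get("a"))  # 1
```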
When to Use
- When we need caching of data
- When we need real-time analytics like leaderboards and session management.
Tradeoffs
- Durability is limited: recovery relies on snapshots and disk I/O, so restarts can be slow.
- RAM is expensive. Storing large datasets entirely in-memory may be cost-prohibitive.
Complexities
- Ensuring safe persistence and recovery strategies.
- Managing memory fragmentation.
- Handling cluster topology changes while maintaining data consistency.
Most Popular Choices
The most widely used in-memory databases are:
- Redis
- Memcached
Search Database
Search databases are optimized for text search, ranking, and complex queries over large datasets. They can efficiently index and retrieve data based on keywords, relevance scoring, and full-text queries. These databases are often used in search engines, logging systems, and recommendation platforms. The most critical data structure that enables fast full-text searching in modern search databases is the Inverted Index.
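The inverted index idea fits in a few lines of Python: map each term to the set of documents containing it, then answer a query by intersecting those sets. The documents and the crude tokenizer are made up for illustration; real engines add relevance scoring, stemming, and much more.

```python
import re
from collections import defaultdict

docs = {
    1: "The quick brown fox",
    2: "The lazy brown dog",
    3: "A quick red fox",
}

def tokenize(text):
    # A very crude analyzer: lowercase, then split on non-letters.
    return re.findall(r"[a-z]+", text.lower())

# Inverted index: term -> set of document ids containing that term.
index = defaultdict(set)
for doc_id, text in docs.items():
    for term in tokenize(text):
        index[term].add(doc_id)

def search_all(query):
    """Documents containing every query term (an AND query)."""
    term_sets = [index[t] for t in tokenize(query)]
    return set.intersection(*term_sets) if term_sets else set()

print(sorted(search_all("quick fox")))  # [1, 3]
```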
When to Use
- When we want full-text search through huge amounts of data.
- When we do not want to overload our primary database with text search.
Tradeoffs
- Indexes are eventually consistent, and distributed indexing makes this harder.
- Writes are expensive due to tokenization, analysis, and multiple index segments.
- Not a replacement for primary storage.
Complexities
- Choosing analyzers/tokenizers for each field.
- Managing cluster sharding, rebalancing, and mapping evolution.
- Dealing with segment merging and index bloat.
- Avoiding node hot spots caused by uneven shard distribution.
- Preventing split-brain in multi-node clusters.
Most Popular Choices
The most popular search databases are:
- Elasticsearch
- Apache Solr
Special Mentions
ClickHouse
ClickHouse is a column-oriented database designed for real-time analytics on large volumes of data. It was open-sourced by Yandex and it is built for extremely fast query performance on analytical workloads (OLAP), especially where data is read-heavy and needs aggregation.
The distinct features of the ClickHouse database are:
- It is blazing fast for analytical queries
- It is well suited for big data because of columnar storage and compression
- It can scale horizontally across clusters
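The benefit of columnar storage can be sketched by contrasting the two layouts in plain Python. The table and figures are invented; the point is that an aggregation over one column never has to touch the others.

```python
# Row-oriented storage keeps whole rows together on disk.
rows = [
    {"country": "IN", "revenue": 120.0},
    {"country": "US", "revenue": 300.0},
    {"country": "IN", "revenue": 80.0},
]

# Column-oriented storage keeps each column contiguous (roughly what
# ClickHouse does), so similar values sit together and compress well.
columns = {
    "country": [r["country"] for r in rows],
    "revenue": [r["revenue"] for r in rows],
}

# SELECT sum(revenue): scans a single contiguous column and can skip
# the country column entirely.
total = sum(columns["revenue"])
print(total)  # 500.0
```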
Spanner
Spanner is a globally distributed, strongly consistent SQL database built by Google. It combines the relational features of traditional SQL databases (schemas, ACID transactions, SQL queries) with the horizontal scalability and distribution typically found in NoSQL systems. It also supports Protocol Buffers!
The distinct features of the Spanner database are:
- It offers global distribution with strong consistency
- It offers automatic sharding, replication, failover
- It is ideal for mission-critical systems with strict consistency needs
Conclusion
If you are getting to know so many databases for the first time, let me warn you that this is not an exhaustive list.
When it comes to choosing the right database, the decision depends on understanding our data model, access patterns, scalability requirements, and operational constraints. No single database is perfect for every use case, so align the choice with your real application needs.
While we discussed how each database is used, managing them introduces its own set of complexities. We should not simply follow market trends; we should trust our experience, understand the trade-offs, and keep learning as technologies mature.
Happy Coding!