How do you scale a Cloud Bigtable database if more queries per second are needed?

In today’s series we are going to dive deep into Bigtable. We will follow our usual strategy of defining the service, looking at when to use it, and going through its history and how it compares to other products. After covering Bigtable’s background, we’ll jump into the core technology, its elasticity, and its approach to security, cost and disaster recovery.

Introduction

Bigtable is a petabyte-scale, high-throughput, fully managed, scalable NoSQL database. It’s a key-value store, ideal for gaming, ad-tech, IoT, and many other applications that need high read and write throughput, up to about 10 MB/s per node, at a consistently low, sub-10 ms latency. With out-of-the-box integrations with both streaming products, such as Dataflow Streaming, Spark Streaming, or Storm, and batch products, such as Hadoop MapReduce, Dataflow, or Spark, Bigtable is the perfect companion for your big data applications. In addition, you will also see it as a backend for other database products such as GeoMesa, JGraphDB, JanusGraph, Heroic and OpenTSDB, or being used by Cloud Datastore to persist indexes.

The NoSQL revolution started in 2006 when Google released a white paper on Bigtable, explaining the model underlying the datastore powering applications such as Search, Google Finance and Gmail. Those principles were then incorporated into the open source database Apache HBase, whose API Google later adopted when it externalized Bigtable as a managed proprietary technology for its Cloud customers. Others later created their own versions of the key-value store. For instance, Facebook open-sourced Cassandra, and AWS released DynamoDB as a managed proprietary NoSQL technology. Check out DynamoDB’s white paper to understand the differences.

Data Model

In contrast to traditional relational SQL databases with tables and rows, Bigtable is a sparse, wide-column, multi-dimensional persistent map. Sparse means that empty cells don’t occupy any space. As a result, it often makes sense to create a very large number of columns (hence “wide-column”), even if most columns are empty in most rows. Another benefit of storing data by column instead of by row is improved compression, since values within a column tend to be more similar than values within a row. For instance, if you store phone numbers, the country prefix will repeat often, allowing better compression results.

The persistent key-value map is indexed by a row key, a column key, and a timestamp. Also, instead of editing rows in place, a new value is appended to the data. That way, data remains immutable, no synchronization is needed during read operations, and high throughput is achieved. For example, imagine an IoT use case where you are storing temperature sensor data. If you want to get the latest temperature values, you filter by timestamp. At the same time, since all versions are persisted in Bigtable, you can build time series by selecting a time window. Even if your purpose is to delete or mutate data, the change is appended first; at a later point in time, the system automatically compacts the data to reclaim space.
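To make the data model concrete, here is a minimal sketch using the google-cloud-bigtable Python client. The project, instance, table and column-family names ("my-project", "iot-instance", "sensor-data", "readings") are hypothetical placeholders, not part of the original article.

```python
# Minimal sketch of writing and reading versioned cells in Bigtable.
import datetime

from google.cloud import bigtable
from google.cloud.bigtable import row_filters

client = bigtable.Client(project="my-project")
instance = client.instance("iot-instance")
table = instance.table("sensor-data")

# Writes are appends: each (row key, column, timestamp) cell is a new version.
row_key = b"sensor#42"
row = table.direct_row(row_key)
row.set_cell(
    "readings",
    "temperature",
    b"21.5",
    timestamp=datetime.datetime.now(datetime.timezone.utc),
)
row.commit()

# Read only the most recent version of each cell, i.e. the latest temperature.
latest = table.read_row(row_key, filter_=row_filters.CellsColumnLimitFilter(1))
if latest is not None:
    for cell in latest.cells["readings"][b"temperature"]:
        print(cell.value, cell.timestamp)
```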

Technology

A handful of concepts make such a database possible, including an underlying distributed file system, data sharding, request rebalancing, data-access-pattern learning, and a distributed lock service. Let’s look at them in more detail.

A distributed file system takes care of reliably storing data, replicating it for disaster recovery, and distributing it for fast access. Google uses Colossus, or Google File System (GFS) v2, which is an evolution over traditional file systems and persistent object storage solutions. Its resilient design embraces failure: since it is built on top of unreliable commodity hardware, it must recover and self-heal when errors occur. Internally, files are split into 64 MB fixed-size chunks of data, which are distributed and replicated across servers. The distribution of a file across multiple machines enables clients to read data in parallel at much higher rates, overcoming traditional disk IOPS limitations. Colossus / GFS is architected in a master/slave configuration, where the master logs every transaction to persist state; this log defines the order of concurrent operations and provides a mechanism to recover the master in case it crashes. The master only stores metadata in its logs, which are later replicated; storing the actual data is the responsibility of the slaves.
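The following toy Python sketch is only an illustration of the chunking idea, not Colossus or GFS code: it splits a file into fixed-size 64 MB chunks and assigns each chunk to several servers, which is roughly the kind of metadata the master tracks while the chunk servers hold the actual bytes.

```python
# Toy illustration of fixed-size chunking and replica placement.
CHUNK_SIZE = 64 * 1024 * 1024  # 64 MB
SERVERS = ["chunkserver-1", "chunkserver-2", "chunkserver-3",
           "chunkserver-4", "chunkserver-5"]
REPLICAS = 3

def place_chunks(file_size_bytes: int) -> dict:
    """Map chunk index -> list of replica servers for a file of the given size."""
    num_chunks = -(-file_size_bytes // CHUNK_SIZE)  # ceiling division
    return {
        chunk: [SERVERS[(chunk + r) % len(SERVERS)] for r in range(REPLICAS)]
        for chunk in range(num_chunks)
    }

# A 200 MB file becomes four chunks, each replicated on three different servers,
# so a client can read the chunks in parallel from different machines.
print(place_chunks(200 * 1024 * 1024))
```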

Cloud Bigtable uses sharding, a mechanism to split data into chunks that can be distributed and replicated for high availability and disaster recovery. In Bigtable terminology, a table is sharded into blocks of contiguous rows, called tablets, to help balance the workload of queries. Tablets are stored on Colossus in the so-called SSTable format, which provides a persistent, ordered, immutable map from keys to values, where both keys and values are arbitrary byte strings. As we saw before, in addition to SSTables, Colossus stores transaction metadata in a shared log for additional reliability.

Cloud Bigtable switches the paradigm from bringing the data to the compute to bringing the compute to the data. It uses compute nodes, which don’t store the data directly but instead hold pointers to tablets stored on Colossus. These pointers can be moved from one node to another very quickly, which enables rebalancing tablets across nodes to avoid hotspots and recovering from compute node failures. On top of this, Bigtable has built-in data-access-pattern recognition algorithms, which learn, adjust, and smartly redistribute the load among the compute nodes. So-called Bloom filters help reduce the number of disk accesses by predicting whether an SSTable contains the requested data. The last piece needed to make this work is a highly available, persistent distributed lock service that manages leases on tablets so that only one actor modifies a resource at a time. Enter Chubby, Bigtable’s best friend, which ensures that only one master is active at a time and which discovers, protects, and makes tablets available to nodes. Internally, Chubby uses the Paxos algorithm to solve consensus problems such as master election and log ordering.
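As a rough illustration of the Bloom-filter idea, here is a simplified sketch (not Bigtable’s actual implementation): a node keeps one filter per SSTable and skips any SSTable whose filter says “definitely not here”; false positives are possible, false negatives are not.

```python
# Simplified Bloom filter: a cheap membership test that avoids disk reads.
import hashlib

class BloomFilter:
    def __init__(self, size_bits: int = 1024, num_hashes: int = 3):
        self.size = size_bits
        self.num_hashes = num_hashes
        self.bits = bytearray(size_bits // 8)

    def _positions(self, key: bytes):
        # Derive several bit positions from independent hashes of the key.
        for i in range(self.num_hashes):
            digest = hashlib.sha256(key + bytes([i])).digest()
            yield int.from_bytes(digest[:8], "big") % self.size

    def add(self, key: bytes) -> None:
        for pos in self._positions(key):
            self.bits[pos // 8] |= 1 << (pos % 8)

    def might_contain(self, key: bytes) -> bool:
        return all(self.bits[pos // 8] & (1 << (pos % 8))
                   for pos in self._positions(key))

sstable_filter = BloomFilter()
sstable_filter.add(b"sensor#42#2023-06-01T12:00:00")
print(sstable_filter.might_contain(b"sensor#42#2023-06-01T12:00:00"))  # True
print(sstable_filter.might_contain(b"sensor#99#2023-06-01T12:00:00"))  # almost certainly False
```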


Bigtable’s behind-the-scenes architecture.

Scalability

Unlike with relational databases, performance does not degrade with load; instead, Bigtable scales linearly. To do so, you provision Bigtable nodes, each of which supports around 10,000 queries per second (QPS). In addition, Bigtable has two levels of caching: the Scan Cache, at the key-value-pair level, and the Block Cache, at the SSTable-block level.
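So if more queries per second are needed, you add nodes. Here is a hedged sketch of resizing a cluster with the google-cloud-bigtable admin client; "my-project", "my-instance" and "my-cluster" are hypothetical IDs, and one node per 10,000 QPS is the rule of thumb from above.

```python
# Resize a Bigtable cluster to match a target query rate.
from google.cloud import bigtable

TARGET_QPS = 45_000
QPS_PER_NODE = 10_000
needed_nodes = -(-TARGET_QPS // QPS_PER_NODE)  # ceiling division -> 5 nodes

client = bigtable.Client(project="my-project", admin=True)
cluster = client.instance("my-instance").cluster("my-cluster")
cluster.reload()  # fetch the current node count
if cluster.serve_nodes < needed_nodes:
    cluster.serve_nodes = needed_nodes
    cluster.update()  # resize the cluster in place; the data itself does not move
```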

Data durability

Taking advantage of the Colossus storage system described above, Bigtable replicates data across two zones in a region of your choice in a matter of seconds or minutes. To do so, you have to configure more than one cluster. Note that the replication model is eventually consistent, meaning that a data change will become available in the second cluster at some point in the future. Hence, reads from the second cluster might return stale data.

If you need read-your-writes consistency, each application in the group must use an app profile with single-cluster routing. If you need strong consistency guarantees, then in addition to the app profile described above, you may only use the second cluster as a failover. This setup gives you a highly available configuration.
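Here is a sketch of creating such a single-cluster-routing app profile, assuming the admin surface of the google-cloud-bigtable Python client; the profile, instance, cluster and table IDs are hypothetical.

```python
# Create an app profile that pins a group of applications to one cluster.
from google.cloud import bigtable
from google.cloud.bigtable import enums

client = bigtable.Client(project="my-project", admin=True)
instance = client.instance("my-instance")

profile = instance.app_profile(
    "single-cluster-profile",
    routing_policy_type=enums.RoutingPolicyType.SINGLE,
    cluster_id="my-cluster-us-east1-b",  # route all requests to this cluster
    description="Read-your-writes consistency for this group of apps",
)
profile.create(ignore_warnings=True)

# Applications in the group then open the table through this profile.
table = instance.table("sensor-data", app_profile_id="single-cluster-profile")
```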

In addition to the resilient, highly available, replicated-cluster configuration described above, Bigtable also provides data backups, so that you can cope with catastrophic events and implement disaster recovery.

Cost

As a fully managed, elastic database, you only pay for what you use. The cost is predictable and composed of two components: compute and storage. For compute, you pay around $0.65 an hour (about $468 per month) per provisioned node, which gives you roughly 10,000 operations per second. For storage, you pay around $0.17 per GB per month for SSD or $0.026 per GB per month for HDD. When in doubt, choose SSD storage: it’s faster (~6 ms vs. ~200 ms), more predictable, and more cost-effective in most cases, since the storage savings are minimal compared to the extra nodes you would have to add to access HDD data at a good rate. Consider HDD only when you have over 10 TB of data. In addition, remember that egress data costs may apply.
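A quick back-of-the-envelope calculation using the illustrative prices above might look like the following; actual prices vary by region, so treat these constants as placeholders.

```python
# Back-of-the-envelope monthly cost: provisioned nodes plus storage.
NODE_HOUR = 0.65          # $ per node per hour
HOURS_PER_MONTH = 720     # ~30 days, matching the ~$468/month figure above
SSD_GB_MONTH = 0.17       # $ per GB per month
HDD_GB_MONTH = 0.026      # $ per GB per month

def monthly_cost(nodes: int, storage_gb: float, ssd: bool = True) -> float:
    compute = nodes * NODE_HOUR * HOURS_PER_MONTH
    storage = storage_gb * (SSD_GB_MONTH if ssd else HDD_GB_MONTH)
    return compute + storage

# Example: 3 nodes (~30,000 QPS) with 2 TB on SSD
# -> 3 * 468 + 2048 * 0.17 ≈ $1,752 per month, before network egress.
print(f"${monthly_cost(3, 2048, ssd=True):,.2f}")
```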

Security

Following Google’s defense-in-depth principle, we want to secure access at different levels. We’ll start by making sure that Bigtable is not accessible from the internet. To accomplish that, we set up Private Google Access, so that our VMs communicate with Bigtable over private IP addresses. In addition, we can set up tag-based firewall rules, so that only the specified VMs can reach Bigtable. Furthermore, we can use Identity and Access Management (IAM) roles to define who can do what on Bigtable. We can grant these permissions to users, with user roles, or to our VMs, via service accounts. That way, our datastore is secured on multiple levels.
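As one example of the IAM layer, the following sketch grants a VM’s service account the Bigtable user role on a single instance via the google-cloud-bigtable Python client; the project, instance and service-account names are hypothetical.

```python
# Grant a VM's service account data access to one Bigtable instance.
from google.cloud import bigtable

client = bigtable.Client(project="my-project", admin=True)
instance = client.instance("my-instance")

policy = instance.get_iam_policy()
policy["roles/bigtable.user"].add(
    "serviceAccount:ingest-vm@my-project.iam.gserviceaccount.com"
)
instance.set_iam_policy(policy)
```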

Sum up

We looked at Bigtable through multiple lenses, so I hope the different concepts around Bigtable are clearer now. In summary, if you are looking for single-digit-millisecond latency at petabyte scale for semi-structured data, Bigtable is for you. If you want SQL-like transactions on non-structured (document) data, you might want to check out a document store such as Cloud Datastore. If you need to go fully relational, try Cloud SQL or Cloud Spanner, and if you need interactive querying in an online analytical processing (OLAP) system, consider BigQuery.

Is Cloud Bigtable scalable?

Cloud Bigtable is a fully-managed, scalable NoSQL database service for large operational and analytical workloads on the Google Cloud Platform (GCP).

What will happen to your data in a Bigtable instance if a node goes down?

Nothing, as the storage is separated from the node’s compute: tablets live on Colossus, and their pointers are simply reassigned to other nodes.

What is the maximum number of clusters per Bigtable instance?

An instance can have clusters in up to 8 regions where Bigtable is available. Each zone in a region can contain only one cluster.

What are the differences between Google Cloud Bigtable and BigQuery?

Bigtable is a NoSQL wide-column database optimized for heavy reads and writes. On the other hand, BigQuery is an enterprise data warehouse for large amounts of relational structured data.