This pattern is part of Patterns of Distributed Systems
Key-Range Partitions
Partition data in sorted key ranges to efficiently handle range queries.
25 August 2022
Problem
To split data across a set of cluster nodes, each data item needs to be mapped to one of them. If users want to query a range of keys, specifying only a start and an end key, every partition has to be queried to acquire the values. Querying every partition for a single request is far from optimal.
Take a key-value store as an example: we can map author names to partitions using hash-based mapping (as used in Fixed Partitions).
Key | Hash | Partition (Hash % No. of Partitions (9)) | Node |
---|---|---|---|
alice | 133299819613694460644197938031451912208 | 0 | 0 |
bob | 63479738429015246738359000453022047291 | 1 | 1 |
mary | 37724856304035789372490171084843241126 | 5 | 1 |
philip | 83980963731216160506671196398339418866 | 2 | 2 |
If a user wants to get values for a range of names - beginning with, say, letters 'a' through 'f' - there's no way to know which partitions to fetch data from when the hash of the key is used to map keys to partitions. All partitions need to be queried to get the required values.
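To make the problem concrete, here is a minimal sketch - not part of the pattern's code, and using Java's hashCode in place of the wider hash shown in the table above - of why a hash-based mapping forces a range query to touch every partition:

class HashPartitioner…

  private final int noOfPartitions;

  public HashPartitioner(int noOfPartitions) {
      this.noOfPartitions = noOfPartitions;
  }

  //a single key maps to exactly one partition.
  public int partitionFor(String key) {
      return Math.floorMod(key.hashCode(), noOfPartitions);
  }

  //the hash destroys key ordering, so a range query gives no hint about
  //which partitions hold the keys - every partition has to be queried.
  public List<Integer> partitionsForRange(String startKey, String endKey) {
      return IntStream.range(0, noOfPartitions).boxed().collect(Collectors.toList());
  }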
Solution
Create logical partitions for key ranges in sorted order. The partitions can then be mapped to cluster nodes. To query a range of data, the client can find all the partitions that contain keys from the given range and query only those specific partitions to get the required values.
Predefining key ranges
If we already know the whole key space and distribution of keys, the ranges for partitions can be specified upfront.
Let's return to our simple key value store with string keys and values. In this example we are storing author names and their books. If we know the author name distribution upfront, we can then define partition splits at specific letters - let's say, in this instance, 'b' and 'd'.
The start and end of the entire key range needs to be specifically marked. We can use an empty string to mark the lowest and the highest key. The ranges will be created like this:
Key Range | Description |
---|---|
("", "b"] | Covers all the names starting from a to b, excluding b |
("b", d] | Covers all the names starting from b to d, excluding d |
("d", ""] | Covers everything else |
The range is represented by a start and an end key.
class Range…
  private String startKey;
  private String endKey;
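The MIN_KEY and MAX_KEY markers used by the coordinator below are not shown explicitly in the snippets; a minimal sketch, assuming both markers are represented by the empty string as described above, could look like this:

class Range…

  //assumption: the empty string marks both the lowest and the highest possible key.
  public static final String MIN_KEY = "";
  public static final String MAX_KEY = "";

  public Range(String startKey, String endKey) {
      this.startKey = startKey;
      this.endKey = endKey;
  }

  public String getStartKey() {
      return startKey;
  }

  public String getEndKey() {
      return endKey;
  }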
The cluster coordinator creates ranges from the specified split points. The partitions will then be assigned to cluster nodes.
class ClusterCoordinator…
  PartitionTable createPartitionTableFor(List<String> splits) {
      List<Range> ranges = createRangesFromSplitPoints(splits);
      return arrangePartitions(ranges, membership.getLiveMembers());
  }

  List<Range> createRangesFromSplitPoints(List<String> splits) {
      List<Range> ranges = new ArrayList<>();
      String startKey = Range.MIN_KEY;
      for (String split : splits) {
          String endKey = split;
          ranges.add(new Range(startKey, endKey));
          startKey = split;
      }
      ranges.add(new Range(startKey, Range.MAX_KEY));
      return ranges;
  }

  PartitionTable arrangePartitions(List<Range> ranges, List<Member> liveMembers) {
      PartitionTable partitionTable = new PartitionTable();
      for (int i = 0; i < ranges.size(); i++) {
          //simple round-robin assignment.
          Member member = liveMembers.get(i % liveMembers.size());
          int partitionId = newPartitionId();
          Range range = ranges.get(i);
          PartitionInfo partitionInfo = new PartitionInfo(partitionId, member.getAddress(), PartitionStatus.ASSIGNED, range);
          partitionTable.addPartition(partitionId, partitionInfo);
      }
      return partitionTable;
  }
The Consistent Core, acting as a cluster coordinator, stores the mapping in a fault-tolerant way by using a Replicated Log. The implementation is similar to the one explained in the Fixed Partitions pattern.
Client Interface
If a client needs to store or get values for keys in the key-value store, it first gets the partition table from the coordinator and then sends requests only to the cluster nodes hosting the relevant partitions. To get values for a range of keys, it works as follows:
class Client…
  public List<String> getValuesInRange(Range range) throws IOException {
      PartitionTable partitionTable = getPartitionTable();
      List<PartitionInfo> partitionsInRange = partitionTable.getPartitionsInRange(range);
      List<String> values = new ArrayList<>();
      for (PartitionInfo partitionInfo : partitionsInRange) {
          List<String> partitionValues = sendGetRangeMessage(partitionInfo.getPartitionId(), range, partitionInfo.getAddress());
          values.addAll(partitionValues);
      }
      return values;
  }

  private PartitionTable getPartitionTable() throws IOException {
      GetPartitionTableResponse response = sendGetPartitionTableRequest(coordinatorAddress);
      return response.getPartitionTable();
  }

  private List<String> sendGetRangeMessage(int partitionId, Range range, InetAddressAndPort address) throws IOException {
      GetAllInRangeRequest partitionGetMessage = new GetAllInRangeRequest(partitionId, range);
      GetAllInRangeResponse response = sendGetRangeRequest(address, partitionGetMessage);
      return response.getValues();
  }
class PartitionTable…
  public List<PartitionInfo> getPartitionsInRange(Range range) {
      List<PartitionInfo> allPartitions = getAllPartitions();
      List<PartitionInfo> partitionsInRange
              = allPartitions.stream().filter(p -> p.getRange().isOverlapping(range)).collect(Collectors.toList());
      return partitionsInRange;
  }
class Range…
  public boolean isOverlapping(Range range) {
      return this.contains(range.startKey) || range.contains(this.startKey) || contains(range.endKey);
  }

  public boolean contains(String key) {
      return key.compareTo(startKey) >= 0 &&
              (endKey.equals(Range.MAX_KEY) || endKey.compareTo(key) > 0);
  }
class Partition…
  public List<String> getAllInRange(Range range) {
      return kv.subMap(range.getStartKey(), range.getEndKey()).values().stream().toList();
  }
Storing a value
To store a key and value, the client needs to find the partition responsible for the given key. Once the partition is found, the request is sent to the cluster node hosting it.
class Client…
  public void put(String key, String value) throws IOException {
      PartitionInfo partition = findPartition(key);
      sendPutMessage(partition.getPartitionId(), partition.getAddress(), key, value);
  }

  private PartitionInfo findPartition(String key) {
      return partitionTable.getPartitionFor(key);
  }
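The getPartitionFor lookup used above is not shown in the snippets; a minimal sketch, assuming it reuses the Range.contains check from earlier, could look like this:

class PartitionTable…

  //hypothetical sketch: locate the single partition whose range contains the key.
  public PartitionInfo getPartitionFor(String key) {
      return getAllPartitions().stream()
              .filter(p -> p.getRange().contains(key))
              .findFirst()
              .orElseThrow(() -> new IllegalStateException("No partition found for key " + key));
  }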
An example scenario
Let's explore this with another example. Consider three data servers: athens, byzantium and cyrene. The partition splits are defined as "b" and "d". The three ranges will be created like this:
Key Range | Description |
---|---|
["", "b") | Covers all the names starting from a to b, excluding b |
["b", d) | Covers all the names starting from b to d, excluding d |
["d", "") | Covers everything else |
The coordinator then creates three partitions for these ranges and maps them to the cluster nodes.

If a client now wants to get all the values for names starting with "a" and "c", it gets all the partitions which have key ranges containing keys starting with "a" and "c". It then sends requests to only those partitions to get the values.
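Using the client interface shown earlier, such a query might look like the following sketch (the client construction and exact range boundaries are assumptions):

  //hypothetical usage of the Client shown earlier.
  //covers names from "a" up to, but not including, "d" - i.e. names starting with 'a', 'b' or 'c'.
  //only the cluster nodes hosting partitions whose key ranges overlap this range are contacted.
  List<String> values = client.getValuesInRange(new Range("a", "d"));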

Auto splitting ranges
Often it can be difficult to know the suitable split points upfront. In these instances, we can implement auto-splitting.
Here, the coordinator creates only one partition with a key range that covers the entire key space.
class ClusterCoordinator…
  private CompletableFuture initializeRangePartitionAssignment(List<String> splits) {
      partitionAssignmentStatus = PartitionAssignmentStatus.IN_PROGRESS;
      PartitionTable partitionTable = splits.isEmpty() ?
              createPartitionTableWithOneRange() : createPartitionTableFor(splits);
      return replicatedLog.propose(new PartitiontableCommand(partitionTable));
  }

  public PartitionTable createPartitionTableWithOneRange() {
      PartitionTable partitionTable = new PartitionTable();
      List<Member> liveMembers = membership.getLiveMembers();
      Member member = liveMembers.get(0);
      Range firstRange = new Range(Range.MIN_KEY, Range.MAX_KEY);
      int partitionId = newPartitionId();
      partitionTable.addPartition(partitionId, new PartitionInfo(partitionId, member.getAddress(), PartitionStatus.ASSIGNED, firstRange));
      return partitionTable;
  }
Each partition can be configured with a fixed maximum size. A background task then runs on each cluster node to track the size of the partitions. When a partition reaches its maximum size, it's split into two partitions, each one being approximately half the size of the original.
class KVStore…
  public void scheduleSplitCheck() {
      scheduler.scheduleAtFixedRate(() -> {
          splitCheck();
      }, 1000, 1000, TimeUnit.MILLISECONDS);
  }

  public void splitCheck() {
      for (Integer partitionId : allPartitions.keySet()) {
          splitCheck(allPartitions.get(partitionId));
      }
  }

  int MAX_PARTITION_SIZE = 1000;

  public void splitCheck(Partition partition) {
      String middleKey = partition.getMiddleKeyIfSizeCrossed(MAX_PARTITION_SIZE);
      if (!middleKey.isEmpty()) {
          logger.info("Partition " + partition.getId() + " reached size " + partition.size() + ". Triggering split");
          network.send(coordLeader, new SplitTriggerMessage(partition.getId(), middleKey, requestNumber++, listenAddress));
      }
  }
Calculating partition size and finding the middle key
Getting the size of a partition and finding the middle key depends on which storage engine is being used. A simple way of doing this is to scan through the entire partition to calculate its size; TiKV initially used this approach. To be able to split the partition, the key situated at the mid point needs to be found as well. To avoid scanning through the partition twice, a simple implementation can return the middle key once the accumulated size crosses half of the configured maximum.
class Partition…
  public String getMiddleKeyIfSizeCrossed(int partitionMaxSize) {
      int kvSize = 0;
      for (String key : kv.keySet()) {
          kvSize += key.length() + kv.get(key).length();
          if (kvSize >= partitionMaxSize / 2) {
              return key;
          }
      }
      return "";
  }
The coordinator, on handling the split trigger message, updates the key range metadata for the original partition and creates new partition metadata for the split range.
class ClusterCoordinator…
  private void handleSplitTriggerMessage(SplitTriggerMessage message) {
      logger.info("Handling SplitTriggerMessage " + message.getPartitionId() + " split key " + message.getSplitKey());
      splitPartition(message.getPartitionId(), message.getSplitKey());
  }

  public CompletableFuture splitPartition(int partitionId, String splitKey) {
      logger.info("Splitting partition " + partitionId + " at key " + splitKey);
      PartitionInfo parentPartition = partitionTable.getPartition(partitionId);
      Range originalRange = parentPartition.getRange();
      List<Range> splits = originalRange.split(splitKey);
      Range shrunkOriginalRange = splits.get(0);
      Range newRange = splits.get(1);
      return replicatedLog.propose(new SplitPartitionCommand(partitionId, splitKey, shrunkOriginalRange, newRange));
  }
After the partition metadata is stored successfully, the coordinator sends a message to the cluster node hosting the parent partition, asking it to split the parent partition's data.
class ClusterCoordinator…
  private void applySplitPartitionCommand(SplitPartitionCommand command) {
      PartitionInfo originalPartition = partitionTable.getPartition(command.getOriginalPartitionId());
      Range originalRange = originalPartition.getRange();
      if (!originalRange.coveredBy(command.getUpdatedRange().getStartKey(), command.getNewRange().getEndKey())) {
          logger.error("The original range start and end keys " + originalRange + " do not match split ranges");
          return;
      }

      originalPartition.setRange(command.getUpdatedRange());
      PartitionInfo newPartitionInfo = new PartitionInfo(newPartitionId(), originalPartition.getAddress(), PartitionStatus.ASSIGNED, command.getNewRange());
      partitionTable.addPartition(newPartitionInfo.getPartitionId(), newPartitionInfo);

      //send requests to cluster nodes if this is the leader node.
      if (isLeader()) {
          var message = new SplitPartitionMessage(command.getOriginalPartitionId(), command.getSplitKey(), newPartitionInfo, requestNumber++, listenAddress);
          scheduler.execute(new RetryableTask(originalPartition.getAddress(), network, this, originalPartition.getPartitionId(), message));
      }
  }
class Range…
  public boolean coveredBy(String startKey, String endKey) {
      return getStartKey().equals(startKey)
              && getEndKey().equals(endKey);
  }
The cluster node splits the original partition and creates a new partition. The data from the original partition is then copied to the new partition. It then responds to the coordinator to confirm that the split is complete.
class KVStore…
  private void handleSplitPartitionMessage(SplitPartitionMessage splitPartitionMessage) {
      splitPartition(splitPartitionMessage.getPartitionId(),
              splitPartitionMessage.getSplitKey(),
              splitPartitionMessage.getSplitPartitionId());
      network.send(coordLeader,
              new SplitPartitionResponseMessage(splitPartitionMessage.getPartitionId(),
                      splitPartitionMessage.getPartitionId(),
                      splitPartitionMessage.getSplitPartitionId(),
                      splitPartitionMessage.messageId, listenAddress));
  }

  private void splitPartition(int parentPartitionId, String splitKey, int newPartitionId) {
      Partition partition = allPartitions.get(parentPartitionId);
      Partition splitPartition = partition.splitAt(splitKey, newPartitionId);
      logger.info("Adding new partition " + splitPartition.getId() + " for range " + splitPartition.getRange());
      allPartitions.put(splitPartition.getId(), splitPartition);
  }
class Partition…
  public Partition splitAt(String splitKey, int newPartitionId) {
      List<Range> splits = this.range.split(splitKey);
      Range shrunkOriginalRange = splits.get(0);
      Range splitRange = splits.get(1);

      SortedMap<String, String> partition1Kv =
              (range.getStartKey().equals(Range.MIN_KEY))
                      ? kv.headMap(splitKey)
                      : kv.subMap(range.getStartKey(), splitKey);

      SortedMap<String, String> partition2Kv =
              (range.getEndKey().equals(Range.MAX_KEY))
                      ? kv.tailMap(splitKey)
                      : kv.subMap(splitKey, range.getEndKey());

      this.kv = partition1Kv;
      this.range = shrunkOriginalRange;

      return new Partition(newPartitionId, partition2Kv, splitRange);
  }
class Range…
  public List<Range> split(String splitKey) {
      return Arrays.asList(new Range(startKey, splitKey), new Range(splitKey, endKey));
  }
Once the coordinator receives the message, it marks the partitions as online.
class ClusterCoordinator…
  private void handleSplitPartitionResponse(SplitPartitionResponseMessage message) {
      replicatedLog.propose(new UpdatePartitionStatusCommand(message.getPartitionId(), PartitionStatus.ONLINE));
  }
One of the possible issues that arises when modifying an existing partition is that clients can no longer rely on cached partition metadata; they always need to get the latest partition metadata before sending any requests to a cluster node. Data stores use a Generation Clock for partitions, which is updated every time a partition is split. Any client request with an older generation number is rejected. Clients can then reload the partition table from the coordinator and retry the request. This ensures that clients holding older metadata don't get wrong results. YugabyteDB chooses to create two separate new partitions instead of modifying the original, as explained in their Automatic table splitting design.
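A minimal sketch of such a generation check on the cluster node might look like this (the generation map and helper methods are assumptions, not part of the pattern's code):

class KVStore…

  //hypothetical sketch: track a generation per partition and bump it on every split.
  private final Map<Integer, Long> partitionGenerations = new ConcurrentHashMap<>();

  private void incrementGenerationOnSplit(int partitionId) {
      partitionGenerations.merge(partitionId, 1L, Long::sum);
  }

  //a request carrying a stale generation is rejected, so the client reloads
  //the partition table from the coordinator and retries.
  private boolean isStale(int partitionId, long clientGeneration) {
      return clientGeneration < partitionGenerations.getOrDefault(partitionId, 0L);
  }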
Example Scenario
Consider an example where the cluster node athens holds partition P1 covering the entire key range. The maximum partition size is configured to be 10 bytes. The SplitCheck detects that the size has grown beyond 10, and finds the approximate middle key to be bob. It then sends a message to the cluster coordinator, asking it to create metadata for the split partition. Once this metadata has been successfully created by the coordinator, the coordinator asks athens to split partition P1 and passes it the partitionId from the metadata. Athens can then shrink P1 and create a new partition, copying the data from P1 to the new partition. After the new partition has been successfully created, it sends confirmation to the coordinator. The coordinator then marks the new partition as online.
Load based splitting
With auto-splitting, we only ever begin with one range. This means all client requests go to a single server even if there are other nodes in the cluster. All requests will continue to go to the single server hosting that range until the range is split and moved to other servers. This is why parameters such as the total number of requests, or CPU and memory usage, are sometimes also used to trigger a partition split. Modern databases like CockroachDB and YugabyteDB support load-based splitting. More details can be found in their documentation at [cockroach-load-splitting] and [yb-load-splitting].
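A minimal sketch of such a load-based check could look like the following (the threshold and counters are assumptions; the real implementations in CockroachDB and YugabyteDB are considerably more sophisticated):

class LoadBasedSplitCheck…

  //hypothetical sketch: count requests per partition between checks and flag
  //heavily loaded partitions for splitting, even if they are under the size limit.
  private static final long MAX_REQUESTS_PER_CHECK_INTERVAL = 10_000;
  private final Map<Integer, AtomicLong> requestCounts = new ConcurrentHashMap<>();

  public void recordRequest(int partitionId) {
      requestCounts.computeIfAbsent(partitionId, id -> new AtomicLong()).incrementAndGet();
  }

  //called periodically, alongside the size-based splitCheck shown earlier.
  public boolean shouldSplit(int partitionId) {
      AtomicLong count = requestCounts.get(partitionId);
      return count != null && count.getAndSet(0) > MAX_REQUESTS_PER_CHECK_INTERVAL;
  }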
Examples
Databases like [hbase], CockroachDB, YugabyteDB and TiKV support range partitioning.