Key-Range Partitions
Partition data in sorted key ranges to efficiently handle range queries.
Problem
To split data across a set of cluster nodes, each data item needs to be mapped to a node. If users want to query a range of keys, specifying only the start and end key, all partitions will need to be queried for the values to be acquired. Querying every partition for a single request is far from optimal.
Take a key-value store as an example. We can store the author names using hash-based mapping, as in Fixed Partitions.
Keys | Hash | Partition † | Node |
---|---|---|---|
alice | 133299819613694460644197938031451912208 | 0 | 0 |
bob | 63479738429015246738359000453022047291 | 1 | 1 |
mary | 37724856304035789372490171084843241126 | 5 | 1 |
philip | 83980963731216160506671196398339418866 | 2 | 2 |
† Partition = Hash % Number of partitions (9)
If a user wants to get values for a range of names—say, beginning with letter “a” to “f”—there's no way to know which partitions we should fetch data from if the hash of the key is being used to map keys to partitions. All partitions need to be queried to get the values required.
Solution
Create logical partitions for keys ranged in a sorted order. The partitions can then be mapped to cluster nodes. To query a range of data, the client can get all partitions that contain keys from a given range and query only those specific partitions to get the values required.
for more details go to Chapter 20 of the online ebook at oreilly.com
This pattern is part of Patterns of Distributed Systems
23 November 2023