19
HASH or RANGE in distributed databases
Here is a general idea that works in any database: you want to colocate the data that will be retrieved together. In a NoSQL database which generally is built for one use case, this is easy: you have one use-case, you have one data store (called collection in MongoDB, or improperly called "table" in DynamoDB) and have one partitioning scheme: hashing. In a relational database, this is different: data is stored to be accessed by multiple use case. Because different users may navigate, or aggregate, from different business point of view. You have multiple tables (normalization to avoid update anomalies), multiple partitioning schemes for tables and indexes. Basically, there are two structures in IT when you want to colocate data:
- HASH: you apply a hash function to the key, or index entry, and the same hash value will go to the same partition.
- RANGE: you sort your key to store the values in order, or, at least, in the same partition when they have closed values.
For my examples, on YugabyteDB, I'll load an AVENGERS table:
python -c "from sqlalchemy import create_engine;import pandas;import io;import urllib.request;
pandas.read_csv(io.StringIO(urllib.request.urlopen('https://github.com/fivethirtyeight/data/blob/master/avengers/avengers.csv?raw=True').read().decode('utf-8', errors='ignore'))).to_sql('avengers',
create_engine('postgresql+psycopg2://franck:[email protected]:5433/yugabyte'), if_exists='replace', method='multi')"
This is a short python code that loads the avengers.csv from https://fivethirtyeight.com/ and the connection here is my database that I keep publicly opened in case you want to test. YugabyteDB is fully compatible with PostgreSQL so you can use any postgres driver or tool.
However, I'll do my test on a geo-distributed cluster in order to see the latency when reading from multiple nodes:
I have 173 avengers in a YugabyteDB table and one index:
yugabyte=> select count(*) from avengers;
count
-------
173
yugabyte=> select indexdef from pg_indexes
where ( schemaname , tablename )
= ( 'public' , 'avengers' )
;
indexdef
--------------------------------------------------------------------------
CREATE INDEX ix_avengers_index ON public.avengers USING lsm (index HASH)
The index has been created on the primary key. Actually, in YugabyteDB, all tables are stored in an index structure (like InnoDB tables in MySQL, clustered indexes in SQL Server, or Index Organized Tables in Oracle) and this index structure is not a B*Tree but a LSM - Log Structure Merge. And the important here is that this one, created by default, indexes on HASH. It may be the only sharding or partitioning scheme you know when coming from other distributed databases. But in YugabyteDB you have the choice and HASH is just the default.
Let's see what are the optimized access paths. Basically we will see either:
- Seq Scan which reads all rows from all tablets, which means from all nodes, and filter them afterwards. This is optimal only when you want lot of rows from the whole table.
- Index Scan which accesses only to the specific tablet, and specific range on the tablet, which is optimal to read a specific part of the table rows.
yugabyte=> explain (analyze, verbose)
select index,"Name/Alias","Appearances","Gender","Year","Death1","Notes"
from avengers;
QUERY PLAN
-----------------------------------------------------------------------------------------------------------------------
Seq Scan on public.avengers (cost=0.00..100.00 rows=1000 width=152) (actual time=120.770..481.221 rows=173 loops=1)
Output: index, "Name/Alias", "Appearances", "Gender", "Year", "Death1", "Notes"
Without any where clause, of course we need to read all table rows and Seq Scan is the right access path.
Let's filter with a predicate on the key:
yugabyte=> select index,"Name/Alias","Appearances","Gender","Year","Death1","Notes"
from avengers where index=73;
index | Name/Alias | Appearances | Gender | Year | Death1 | Notes
------------+-----------------------+-------------+--------+------+--------+-----------------------------------------------------------------------------------------------------------------------
-------------------------------------------------------------------------------------------------
73 | Peter Benjamin Parker | 4333 | MALE | 1990 | YES | Since joining the New Avengers: First death Killed by Morlun. Ressurected in a brand new body. Died in Amazing Spider-
Man #700. Eventually mainfested in his body that Octavious stole and then took over again.
(1 row)
yugabyte=> explain (analyze, verbose)
select index,"Name/Alias","Appearances","Gender","Year","Death1","Notes"
from avengers where index=73;
QUERY PLAN
-------------------------------------------------------------------------------------------------------------------------------------------
Index Scan using avengers_pkey on public.avengers (cost=0.00..4.11 rows=1 width=152) (actual time=120.434..120.437 rows=1 loops=1)
Output: index, "Name/Alias", "Appearances", "Gender", "Year", "Death1", "Notes"
Index Cond: (avengers.index = 73)
Here, the Index Scan knows which part of the table to access thanks to the "Index Cond". This is an equality predicate and on a HASH key the value is hashed and can get to the right partition, and the right place in this partition.
Let's go further:
yugabyte=> explain (analyze, verbose)
select index,"Name/Alias","Appearances","Gender","Year","Death1","Notes"
from avengers where index in (73,6);
QUERY PLAN
-------------------------------------------------------------------------------------------------------------------------------------------
Index Scan using avengers_pkey on public.avengers (cost=0.00..4.11 rows=1 width=152) (actual time=841.051..841.057 rows=2 loops=1)
Output: index, "Name/Alias", "Appearances", "Gender", "Year", "Death1", "Notes"
Index Cond: (avengers.index = ANY ('{73,6}'::bigint[]))
With many values the Index Scan is still possible: each value is accessed in its partition.
yugabyte=> select index,"Name/Alias","Appearances","Gender","Year","Death1","Notes"
from avengers where index<5;
index | Name/Alias | Appearances | Gender | Year | Death1 | Notes
------------+-----------------------------+-------------+--------+------+--------+-----------------------------------------------------------------------------------------------------------------
----------------------------------------------------------------------
2 | Anthony Edward "Tony" Stark | 3068 | MALE | 1963 | YES | Death: "Later while under the influence of Immortus Stark committed a number of horrible acts and was killed.'
This set up young Tony. Franklin Richards later brought him back
0 | Henry Jonathan "Hank" Pym | 1269 | MALE | 1963 | YES | Merged with Ultron in Rage of Ultron Vol. 1. A funeral was held.
3 | Robert Bruce Banner | 2089 | MALE | 1963 | YES | Dies in Ghosts of the Future arc. However "he had actually used a hidden Pantheon base to survive"
1 | Janet van Dyne | 1165 | FEMALE | 1963 | YES | Dies in Secret Invasion V1:I8. Actually was sent tto Microverse later recovered
4 | Thor Odinson | 2402 | MALE | 1963 | YES | Dies in Fear Itself brought back because that's kind of the whole point. Second death in Time Runs Out has not y
et returned
(5 rows)
yugabyte=> explain (analyze, verbose)
select index,"Name/Alias","Appearances","Gender","Year","Death1","Notes"
from avengers where index<5;
QUERY PLAN
---------------------------------------------------------------------------------------------------------------------
Seq Scan on public.avengers (cost=0.00..102.50 rows=1000 width=152) (actual time=120.801..481.287 rows=5 loops=1)
Output: index, "Name/Alias", "Appearances", "Gender", "Year", "Death1", "Notes"
Filter: (avengers.index < 5)
Rows Removed by Filter: 168
With an inequality predicate we cannot use the HASH index because the hash function doesn't keep the order. In this case a Seq Scan is the only solution, reading all rows from all tablets and filtering them afterwards. This is not an optimal access path and should be avoided for the critical use cases.
When you define the primary key, you probably don't know all use cases in advance. But you know your data. In this case, the "index" primary key is a number just to get a small identifier but the value has no business meaning, and no arithmetic purpose like comparing and sorting. There is no reason to ever query it on a range, or sort it, and then it probably doesn't need a RANGE key. HASH key is more flexible when you know you don't need a range scan because, with consistent hash sharding, you can achieve the best distribution across nodes, and keep this while rebalancing, adding, or removing.
There are some columns that have an order, and where you definitely want to query on a range. Let's index the column that holds the number of appearances of the avengers character in a Marvel comic books:
yugabyte=> create index i0 on avengers("Appearances");
CREATE INDEX
yugabyte=> select indexdef from pg_indexes
where ( schemaname , tablename ) = ( 'public' , 'avengers' ) ;
indexdef
-------------------------------------------------------------------------------
CREATE INDEX ix_avengers_index ON public.avengers USING lsm (index HASH)
CREATE INDEX i0 ON public.avengers USING lsm ("Appearances" HASH)
Without any mention of ASC or DESC it defaults to HASH.
yugabyte=> select index,"Name/Alias","Appearances","Gender","Year","Death1","Notes"
from avengers where "Appearances"<5 order by "Appearances";
index | Name/Alias | Appearances | Gender | Year | Death1 | Notes
------------+---------------+-------------+--------+------+--------+-------------------------------------------------------------------------------------
125 | Fiona | 2 | FEMALE | 1900 | NO |
39 | Moira Brandon | 2 | FEMALE | 1993 | YES | Died in her second appearance earns honorary Avengers status doing so. Stays dead.
68 | Doug Taggert | 3 | MALE | 2005 | YES | Accidently killed by Zaran
65 | Gene Lorrene | 4 | MALE | 2005 | NO |
(4 rows)
yugabyte=> explain (analyze, verbose)
select index,"Name/Alias","Appearances","Gender","Year","Death1","Notes"
from avengers where "Appearances"<5 order by "Appearances";
QUERY PLAN
---------------------------------------------------------------------------------------------------------------------------
Sort (cost=152.33..154.83 rows=1000 width=152) (actual time=481.279..481.280 rows=4 loops=1)
Output: index, "Name/Alias", "Appearances", "Gender", "Year", "Death1", "Notes"
Sort Key: avengers."Appearances"
Sort Method: quicksort Memory: 25kB
-> Seq Scan on public.avengers (cost=0.00..102.50 rows=1000 width=152) (actual time=120.797..481.268 rows=4 loops=1)
Output: index, "Name/Alias", "Appearances", "Gender", "Year", "Death1", "Notes"
Filter: (avengers."Appearances" < 5)
Rows Removed by Filter: 169
My HASH index was not used here when querying for the avengers with less than 5 appearance. Hashing can be used only when having discrete values in the condition:
yugabyte=> explain (analyze, verbose)
select index,"Name/Alias","Appearances","Gender","Year","Death1","Notes"
from avengers where "Appearances" in (1,2,3,4,5) order by "Appearances";
QUERY PLAN
----------------------------------------------------------------------------------------------------------------------------------
Sort (cost=5.43..5.45 rows=10 width=152) (actual time=480.476..480.477 rows=4 loops=1)
Output: index, "Name/Alias", "Appearances", "Gender", "Year", "Death1", "Notes"
Sort Key: avengers."Appearances"
Sort Method: quicksort Memory: 25kB
-> Index Scan using i0 on public.avengers (cost=0.00..5.26 rows=10 width=152) (actual time=359.614..480.466 rows=4 loops=1)
Output: index, "Name/Alias", "Appearances", "Gender", "Year", "Death1", "Notes"
Index Cond: (avengers."Appearances" = ANY ('{1,2,3,4,5}'::bigint[]))
This is not very useful. For such a column, where a range, or order, makes sense we need to define a range index, either in ASCending or DESCending order:
yugabyte=> drop index i0;
DROP INDEX
yugabyte=> create index i0 on avengers("Appearances" asc);
CREATE INDEX
yugabyte=> explain (analyze, verbose)
select index,"Name/Alias","Appearances","Gender","Year","Death1","Notes"
from avengers where "Appearances"<5 order by "Appearances";
QUERY PLAN
----------------------------------------------------------------------------------------------------------------------------
Index Scan using i0 on public.avengers (cost=0.00..5.22 rows=10 width=152) (actual time=599.444..599.452 rows=4 loops=1)
Output: index, "Name/Alias", "Appearances", "Gender", "Year", "Death1", "Notes"
Index Cond: (avengers."Appearances" < 5)
Here the index has been scanned, even if the order of the scan (descending from 5 here) is opposite of the index order (ASC). No additional filtering on the output as we have read only the required rows. And no Sort operation as we know they come in the order we want in the ORDER BY.
yugabyte=> select index,"Name/Alias","Appearances","Gender","Year","Death1","Notes"
from avengers where "Appearances">2000 order by "Appearances";
index | Name/Alias | Appearances | Gender | Year | Death1 | Notes
------------+-----------------------------+-------------+--------+------+--------+-----------------------------------------------------------------------------------------------------------------
-------------------------------------------------------------------------------------------------------
3 | Robert Bruce Banner | 2089 | MALE | 1963 | YES | Dies in Ghosts of the Future arc. However "he had actually used a hidden Pantheon base to survive"
57 | Reed Richards | 2125 | MALE | 1989 | NO |
40 | Benjamin Jacob Grimm | 2305 | MALE | 1986 | YES | Once killed during a battle with Doctor Doom.' Brought back by the FF when they literally went to Heaven to get
him.
4 | Thor Odinson | 2402 | MALE | 1963 | YES | Dies in Fear Itself brought back because that's kind of the whole point. Second death in Time Runs Out has not y
et returned
2 | Anthony Edward "Tony" Stark | 3068 | MALE | 1963 | YES | Death: "Later while under the influence of Immortus Stark committed a number of horrible acts and was killed.'
This set up young Tony. Franklin Richards later brought him back
92 | James "Logan" Howlett | 3130 | MALE | 2005 | YES | Died in Death_of_Wolverine_Vol_1_4. Has not yet returned
6 | Steven Rogers | 3458 | MALE | 1964 | YES | Dies at the end of Civil War. Later comes back.
73 | Peter Benjamin Parker | 4333 | MALE | 1990 | YES | Since joining the New Avengers: First death Killed by Morlun. Ressurected in a brand new body. Died in Amazing S
pider-Man #700. Eventually mainfested in his body that Octavious stole and then took over again.
(8 rows)
yugabyte=> explain (analyze, verbose)
select index,"Name/Alias","Appearances","Gender","Year","Death1","Notes"
from avengers where "Appearances">2000 order by "Appearances";
QUERY PLAN
----------------------------------------------------------------------------------------------------------------------------
Index Scan using i0 on public.avengers (cost=0.00..5.22 rows=10 width=152) (actual time=239.388..239.401 rows=8 loops=1)
Output: index, "Name/Alias", "Appearances", "Gender", "Year", "Death1", "Notes"
Index Cond: (avengers."Appearances" > 2000)
The same works when reading in ascending order of course.
So, for the numeric datataypes where the order is meaningful you want to define ASC or DESC. It is the same for a character string column if the order makes sense. Let's index the avenger's name, first with a HASH:
yugabyte=> create index i1 on avengers("Name/Alias");
CREATE INDEX
yugabyte=> select index,"Name/Alias","Appearances","Gender","Year","Death1","Notes"
from avengers where "Name/Alias"='Peter Benjamin Parker';
index | Name/Alias | Appearances | Gender | Year | Death1 | Notes
------------+-----------------------+-------------+--------+------+--------+-----------------------------------------------------------------------------------------------------------------------
-------------------------------------------------------------------------------------------------
73 | Peter Benjamin Parker | 4333 | MALE | 1990 | YES | Since joining the New Avengers: First death Killed by Morlun. Ressurected in a brand new body. Died in Amazing Spider-
Man #700. Eventually mainfested in his body that Octavious stole and then took over again.
(1 row)
yugabyte=> explain (analyze, verbose)
select index,"Name/Alias","Appearances","Gender","Year","Death1","Notes"
from avengers where "Name/Alias"='Peter Benjamin Parker';
QUERY PLAN
----------------------------------------------------------------------------------------------------------------------------
Index Scan using i1 on public.avengers (cost=0.00..5.22 rows=10 width=152) (actual time=239.840..239.844 rows=1 loops=1)
Output: index, "Name/Alias", "Appearances", "Gender", "Year", "Death1", "Notes"
Index Cond: (avengers."Name/Alias" = 'Peter Benjamin Parker'::text)
With the predicate on the full name, a HASH index is used. But let's say you don't know Spiderman's middle name:
yugabyte=> explain (analyze, verbose)
select index,"Name/Alias","Appearances","Gender","Year","Death1","Notes"
from avengers where "Name/Alias" like 'Peter% Parker';
QUERY PLAN
---------------------------------------------------------------------------------------------------------------------
Seq Scan on public.avengers (cost=0.00..102.50 rows=1000 width=152) (actual time=479.191..480.680 rows=1 loops=1)
Output: index, "Name/Alias", "Appearances", "Gender", "Year", "Death1", "Notes"
Filter: (avengers."Name/Alias" ~~ 'Peter% Parker'::text)
Rows Removed by Filter: 172
With the LIKE 'Peter% Parker'
pattern we had to scan all the rows. Can we do better? Let's create a RANGE index:
yugabyte=> create index i2 on avengers("Name/Alias" asc);
CREATE INDEX
yugabyte=> explain (analyze, verbose)
select index,"Name/Alias","Appearances","Gender","Year","Death1","Notes"
from avengers where "Name/Alias" like 'Peter% Parker';
QUERY PLAN
----------------------------------------------------------------------------------------------------------------------------
Index Scan using i2 on public.avengers (cost=0.00..5.25 rows=10 width=152) (actual time=480.660..480.665 rows=1 loops=1)
Output: index, "Name/Alias", "Appearances", "Gender", "Year", "Death1", "Notes"
Index Cond: ((avengers."Name/Alias" >= 'Peter'::text) AND (avengers."Name/Alias" < 'Petes'::text))
Filter: (avengers."Name/Alias" ~~ 'Peter% Parker'::text)
Here, the query planner knows that the names starting with 'Peter' are all colocated and this is an index access to contiguous entries. This is where a relational database is more powerful than NoSQL: you have multiple access paths and a query planner to find the optimal one.
Note that I've created two indexes here but the more indexes you have and the more expensive will be the insert, delete or update of the indexed columns. Here, a HASH index makes no sense because the RANGE index can also serve equality predicates
yugabyte=> drop index i1;
DROP INDEX
There's more to say about it but I wanted to keep it simple. The YugabyteDB Documentation covers it. Data modeling is not something complex. On a SQL database you have the agility to change the model thanks to the DDL language, and the logical independence with views. But with the HASH or RANGE key, as it also defines the physical colocation of data, better think about it in advance. If you know all the critical use-cases, then look at how they access data. If they are not all planned, don't panic. Think about the business meaning of data. Some numbers have an order (your salary, your birth date, your work hours...) and some don't (a customer id, a surrogate key,...). Some character strings have an order (your name, product names, school grades...) and some don't (your car plate number, a description,...)
You can also index multiple columns, where the first one is hashed to partition it and the second one is sorted to scan by range and order it. You can also colocate tables together in tablegroups. We will see that in future posts.
19