PostgreSQL Bitmap Scan with GIN indexes on Array or Secondary Table with Index Only Scan

In a previous post I stored tags and groups as arrays in the posts table. I created GIN indexes on them, but without the possibility to add the timestamp to it. To optimize queries on a list of tags, or groups, within a range of date, I added secondary tables, automatically maintained by triggers. This was much more efficient in YugabyteDB because this additional index is like an index (tables are stored as LSM Trees rather than heap tables) and because, without this, the Rows Removed by Filter from the GIN index are very expensive in a distributed database. But when showing the execution plan with the GIN index, I mentioned that PostgreSQL would show another join method. In this post I'll show the difference. I had it in draft an this tweet by Nikolay Samokhvalov about arrays was a good occasion:

PostgreSQL

I've run the same as the previous post but this time on AWS Aurora with PostgreSQL compatibility. For what I'm doing there, this is the same as PostgreSQL. I've loaded approximately the same generated data:

aurora=> select count(*) from posts_by_user;
  count
----------
 42670000
(1 row)

Time: 13444.326 ms (00:13.444)

The first query was about getting the the last 2-days posts for one user:

aurora=> explain analyze
           select posts_by_user.*
            from posts_by_user where user_id=1
            and created_date > now() - 2*interval '1 day'
           order by created_date desc;
                                                                      QUERY PLAN
------------------------------------------------------------------------------------------------------------------------------------------------------
 Index Scan Backward using posts_by_user_pkey on posts_by_user  (cost=0.57..8607.81 rows=2141 width=405) (actual time=0.045..2.536 rows=1847 loops=1)
   Index Cond: ((user_id = 1) AND (created_date > (now() - '2 days'::interval)))
 Planning Time: 0.140 ms
 Execution Time: 2.664 ms
(4 rows)

This is fast. Aurora is not a distributed database: all writes occur on a single node, and as my data set is small, all reads are served there from the shared buffers without involving any remote call. This is not possible in a distributed database where data may be updated on other servers.

The second query gets the posts by a list of tag, and range of date, using the additional tables that I maintain with a trigger:

aurora=> explain analyze
           select posts_by_user.*
            from posts_by_user
            join posts_by_tag
            using(user_id, created_date, post_id)
            where posts_by_tag.created_date
                   > now() - interval '1 month'
                  and tag_id =1
            order by created_date desc limit 100
 ;
                                                                           QUERY PLAN
----------------------------------------------------------------------------------------------------------------------------------------------------------------
 Limit  (cost=1.15..315.72 rows=31 width=405) (actual time=0.068..0.229 rows=18 loops=1)
   ->  Nested Loop  (cost=1.15..315.72 rows=31 width=405) (actual time=0.067..0.225 rows=18 loops=1)
         ->  Index Only Scan Backward using posts_by_tag_pkey on posts_by_tag  (cost=0.58..49.20 rows=31 width=24) (actual time=0.047..0.050 rows=18 loops=1)
               Index Cond: ((tag_id = 1) AND (created_date > (now() - '1 mon'::interval)))
               Heap Fetches: 0
         ->  Index Scan using posts_by_user_user_id_post_id_key on posts_by_user  (cost=0.56..8.59 rows=1 width=405) (actual time=0.008..0.008 rows=1 loops=18)
               Index Cond: ((user_id = posts_by_tag.user_id) AND (post_id = posts_by_tag.post_id))
               Filter: (posts_by_tag.created_date = created_date)
 Planning Time: 0.372 ms
 Execution Time: 0.270 ms
(10 rows)

This, again is fast. The additional table is read by Index Only Scan because I've included all columns in the primary key, and the latest changes have been vacuumed.

Now, the most interesting, the same query without using the additional table but the GIN indexes:

aurora=> explain analyze
           select *
            from posts_by_user
            where created_date > now() - interval '1 month'
              and tag_ids @>'{1}'
order by created_date desc limit 100;
                                                                            QUERY PLAN
-------------------------------------------------------------------------------------------------------------------------------------------------------------------
 Limit  (cost=1600106.71..1600106.96 rows=100 width=405) (actual time=2302.013..2302.019 rows=18 loops=1)
   ->  Sort  (cost=1600106.71..1600112.21 rows=2201 width=405) (actual time=2302.012..2302.015 rows=18 loops=1)
         Sort Key: created_date DESC
         Sort Method: quicksort  Memory: 34kB
         ->  Bitmap Heap Scan on posts_by_user  (cost=1591353.62..1600022.59 rows=2201 width=405) (actual time=2301.754..2301.998 rows=18 loops=1)
               Recheck Cond: ((tag_ids @> '{1}'::bigint[]) AND (created_date > (now() - '1 mon'::interval)))
               Rows Removed by Index Recheck: 284
               Heap Blocks: exact=17
               ->  BitmapAnd  (cost=1591353.62..1591353.62 rows=2201 width=0) (actual time=2301.724..2301.726 rows=0 loops=1)
                     ->  Bitmap Index Scan on posts_by_user_tag_ids  (cost=0.00..3098.15 rows=269086 width=0) (actual time=3.280..3.280 rows=2000 loops=1)
                           Index Cond: (tag_ids @> '{1}'::bigint[])
                     ->  Bitmap Index Scan on posts_by_user_pkey  (cost=0.00..1588254.12 rows=440229 width=0) (actual time=2297.828..2297.828 rows=360584 loops=1)
                           Index Cond: (created_date > (now() - '1 mon'::interval))
 Planning Time: 0.132 ms
 Execution Time: 2302.057 ms
(15 rows)

This takes longer, but may still be acceptable. The GIN indexes are implemented differently in PostgreSQL than in YugabyteDB. They use a pending list, and must be read by a Bitmap Index Scan. It is an advantage here because Bitmaps Scan can combine multiple indexes: the GIN one on array of tags and the primary key to filter on the time range. This restricts the number of rows to get from the heap table (Rows Removed by Index Recheck is only 284). If you have queries combining filters on tags and groups, this may be really useful.

Today on SQL databases we have the possibility of full normalization, or nested structures like arrays, or a combination, with secondary structures maintained automatically (indexes) or though triggers. Look at the access patterns and the database engine to choose the best.

36