PostgreSQL prepared statements in PL/pgSQL

In this blog post I'll show that static SQL within PL/pgSQL blocks are automatically prepared but doing the same with dynamic statement is limited by some keyword namespace collision.

I set timing on and create a 10 million rows table:

\set timing on
Timing is on.

create table demo(id bigint primary key, quantity int);
CREATE TABLE
Time: 6.354 ms

insert into demo(id, quantity)
 select generate_series, 42
 from generate_series(1,1e7);
INSERT 0 10000000
Time: 139046.469 ms (02:19.046)

statement in PL/pgSQL loop

I'll run a select from a 10 million iteration loop in PL/pgSQL with regular SELECT statement:

do $$ declare o int; begin
-- static statement execution
for i in 1..1e7 loop
  select max(quantity) from demo
  where id=i
  into o;
end loop;
end; $$;

DO
Time: 157222.613 ms (02:37.223)

The statement execution is short as I have an index, and the parsing time may be significant when compared to the execution time:

explain analyze
 select max(quantity) from demo
 where id=42;
                                                      QUERY PLAN
----------------------------------------------------------------------------------------------------------------------
 Aggregate  (cost=8.46..8.46 rows=1 width=4) (actual time=0.046..0.046 rows=1 loops=1)
   ->  Index Scan using demo_pkey on demo  (cost=0.43..8.45 rows=1 width=4) (actual time=0.041..0.041 rows=1 loops=1)
         Index Cond: (id = 42)

 Planning Time: 0.242 ms
 Execution Time: 0.071 ms

The planning time is even longer than the execution time. This is a perfect case for prepared statements (like is most OLTP workloads).

In my PL/pgSQL loop, 1e7 executions took 157222.613 ms which is on average 0.016 ms per execution. This doesn't include the parsing time because PL/pgSQL prepares the statements when the PL/pgSQL code is parsed. This is one reason for using stored procedures: automatic optimization by the database engine.

prepared statement in PL/pgSQL

Now, let's try to prepare the statement explicitly to see if there is a difference:

do $$ declare o int; begin
-- prepared statement - doesn't work
  prepare q(int) as
  select max(quantity) from demo
  where id=$1
  ;
 for i in 1..1e7 loop
  execute q(i) into o;
 end loop;
 deallocate q;
end; $$;
ERROR:  function q(integer) does not exist
LINE 1: q(i)
        ^
HINT:  No function matches the given name and argument types. You might need to add explicit type casts.
QUERY:  q(i)
CONTEXT:  PL/pgSQL function inline_code_block line 8 at EXECUTE
Time: 1.170 ms
deallocate q;
DEALLOCATE
Time: 0.072 ms

There's an error here because EXECUTE has two meanings: execute a prepared statement in SQL and execute a function in PL/pgSQL. Here, the PL/pgSQL engine thinks I want to execute a function and can't find it.

Is this a problem? Probably not because PL/pgSQL uses prepared statements implicitly. EXECUTE is also used to run dynamic SQL and then I can EXECUTE an EXECUTE 😁:

do $$ declare o int; begin
-- prepared statement dynamic execution
  prepare q(int) as
  select max(quantity) from demo
  where id=$1
  ;
 for i in 1..1e7 loop
  execute format('execute q(%s)',i) into o;
 end loop;
 deallocate q;
end; $$;

DO
Time: 268508.538 ms (04:28.509)

04:28 instead of 02:47 there is clearly no reason for things like this with explicit prepared statements. But... the implicit prepared statement works for static SQL only. What if my table name ("demo" here) is dynamic?

dynamic statement

I'm building the query dynamically here with EXECUTE and FORMAT to inject the table name from a variable:

do $$ declare o int; begin
-- dynamic statement execution
for i in 1..1e7 loop
  execute format('
  select max(quantity) from %I
  where id=$1
  ','demo') using i into o;
end loop;
end; $$;

DO
Time: 1187668.838 ms (19:47.669)

This is much slower because the implicit prepared statement is not used here. The statement is parsed each time.

Let me prepare it explicitely:

do $$ declare o int; begin
-- dynamic prepared statement and execution
 execute format('
 prepare q(int) as
  select max(quantity) from %I where id=$1;
 ','demo');
 for i in 1..1e7 loop
  execute format('execute q(%s)',i) into o;
 end loop;
 deallocate q;
end; $$;

DO
Time: 256044.967 ms (04:16.045)

With this dynamic prepared statement, the time is back to 4 minutes, the same when I was preparing a static statement. There's no reason to use the dynamic EXECUTE to run the EXECUTE prepared statement here, because the name of the prepared statement is static. However, as we have seen above, I don't know how to execute directly the prepared statement because of the namespace collision for the EXECUTE keyword. I'm quite sure I could be back to the 2 minutes without this limitation.

If you have read my last post, SLOB on YugabyteDB, you understand why I'm looking at this. PGIO executes a dynamic query (can run on different table names) in a loop. Because SQL databases can execute complex statements, and optimize their executions with a query planner that searches for the best access path, using prepared statements is a must for OLTP workloads where you execute many small statements. This is one reason why YugabyteDB uses the PostgreSQL query layer. YB is the only distributed database that supports prepared statements, stored procedures and PL/pgSQL procedural language 🚀