Course Content
ACID Properties and Query Processing
0/2
PostgreSQL Replication
0/1
PostgreSQL Upgrade
0/1
PostgreSQL Tutorial for Absolute Beginners [Administration]
About Lesson

Explain plan or an execution plan is a plan which PostgreSQL generates for a given statement. We use the EXPLAIN statement to generate an execution plan.

Let’s start with a simple example.

test_db=> explain select * from products;
QUERY PLAN
—————————————————————
Seq Scan on products  (cost=0.00..201.00 rows=10000 width=49)

(1 row)

Let’s first understand the things the plan has gave me.

Since this query has no WHERE clause, it must scan all the rows of the table, so the planner has chosen to use a simple sequential scan plan. The numbers that are quoted in parentheses are (left to right):

  1. Estimated start-up cost (0.00). This is the time expended before the output phase can begin, e.g., time to do the sorting in a sort node.
  2. Estimated total cost (201.00). This is stated on the assumption that the plan node is run to completion, i.e., all available rows are retrieved. In practice a node’s parent node might stop short of reading all available rows (see the LIMIT example below).
  3. Estimated number of rows output by this plan node (10000). Again, the node is assumed to be run to completion.
  4. Estimated average width of rows (49) output by this plan node (in bytes).

Here, the cost is nothing but the points that PostgreSQL gives for a particular plan.

For example., The table products has following attributes.

test_db=> \dt+ products

List of relations
Schema   |   Name   | Type  |   Owner   |  Size  | Description
———–+———-+——-+———–+——–+————-
test_user | products | table | test_user | 808 kB |
(1 row)

test_db=> select count(1) from products;
count
——-
10000

(1 row)

 

Lets understand the seq_page_cost, cpu_tuple_cost and cpu_operator_cost to calculate the cost of a query.

test_db=> SHOW seq_page_cost;
seq_page_cost
—————
1
(1 row)

test_db=> SHOW cpu_tuple_cost;
cpu_tuple_cost
—————-
0.01
(1 row)

test_db=> SHOW cpu_operator_cost;
cpu_operator_cost
——————-
0.0025
(1 row)

Now,
cost = blocks * seq_page_cost + number_of_records * cpu_tuple_cost

i.e.,
cost=808/8*1+10000*0.01 => 201

Now, Let’s add a condition to the query
test_db=> explain select * from products where category=14;
QUERY PLAN
———————————————————————————-
Bitmap Heap Scan on products (cost=13.11..121.88 rows=622 width=49)
Recheck Cond: (category = 14)
-> Bitmap Index Scan on ix_prod_category (cost=0.00..12.95 rows=622 width=0)
Index Cond: (category = 14)
(4 rows)

Here
→ indicates a node, a node is an atomic unit of operation. It also indicates an action that PostgreSQL performs on any table. There are different types of nodes available and we will discuss them in subsequent sections. In this example.,Bitmap Index Scan is one such node type.
Understanding the terminology
Node Type: The operation the node is performing.
Plans : The child operations executed to provide input to this operation.

Parent Relationship :A guide as to why this operation needs to be run in order to facilitate the parent operation. There are six different possibilities:

  1. Outer is the value you’ll see most often. It means “take in the rows from this operation as input, process them and pass them on”.
  2. Inner is the “inner” part of the loop. ie, for each outer row, we look up its match using this operation.
  3. Member is used for all children of append and modifyTable nodes, and on bitmap processing nodes like BitmapAnd and BitmapOr operations.
  4. InitPlan: Used for calculations performed before the query can start, eg a constant referred to in the query or the result of a CTE scan.
  5. Subquery: The child is a subquery of the parent operation. Since Postgres always uses subquery scans to feed subquery data to parent queries, only ever appears on the children of subquery scans.
  6. SubPlan: Like a Subquery, represents a new query, but used when a subquery scan is not necessary.

Filter : When present, this is a filter used to remove rows.

Parallel Aware :Whether or not the operation will be run in a special mode to support parallelism.

Relation Name :The data source being read/written from. Almost always a table name (including when the data is accessed via an index), but can also be a materialised view or foreign data source.

Alias : The alias used to refer to the Relation Name object.

Estimate Fields : These fields are added to nodes whenever the COSTS flag is set. It is on by default, but you can turn it off.

Total Cost :The estimated total cost of this operation and its descendants. The Postgres query planner often has several different ways it could resolve the same query. It calculates a cost — which is hopefully correlated with the amount of time taken — for each potential plan, and then picks the one with the smallest cost. It’s worth bearing in mind that the costs are unit-free — they’re not designed to convert into time, or disk reads.

Startup Cost : The estimated amount of overhead necessary to start the operation. Note that, unlike “Actual Startup Time”, this is a fixed value, which won’t change for different numbers of rows.

Plan Rows: The number of rows the planner expects to be returned by the operation. This is a per-loop average, like “Actual Rows”.

Plan Width: The estimated average size of each row returned by the operation, in bytes.

Actual Value Fields

test_db=> explain analyze select * from products;
QUERY PLAN
————————————————————————————————————-
Seq Scan on products (cost=0.00..201.00 rows=10000 width=49) (actual time=0.025..2.423 rows=10000 loops=1)
Planning Time: 0.438 ms
Execution Time: 3.075 ms
(3 rows)

When you run EXPLAIN with the ANALYZE flag set, the query is actually executed — allowing real performance data to be gathered.

Actual Loops: The number of times the operation is executed. For a lot of operations it will have a value of one, but when it is not, there are three different cases:

  1. Some operations can be executed more than once. For example, “Nested Loops” run their “Inner” child once for every row returned by their “Outer” child.
  2. When an operation that would normally only consist of one loop is split across multiple threads, each partial operation is counted as a Loop.
  3. The number of loops can be zero when an operation doesn’t need to be executed at all. For example if a table read is planned to provide candidates for an inner join, but there turns out to be no rows on the other side of the join, the operation can be effectively eliminated.

Actual Total Time : The actual amount of time in milliseconds spent on this operation and all of its children. It’s per-loop average, rounded to the nearest thousandth of a millisecond.

This can cause some odd occurrences, particularly for “Materialize” nodes. “Materialize”operations persist the data they receive in memory, to allow for multiple accesses, counting each access as a loop. They often get that data from a data read operation, which is only executed once

Actual Startup Time: This is the amount of time it takes to get the first row out of the operation.

Actual Rows: The number of rows returned by the operation per loop.

Buffers Fields

test_db=> explain (analyze,buffers) select * from products;
QUERY PLAN
————————————————————————————————————-
Seq Scan on products (cost=0.00..201.00 rows=10000 width=49) (actual time=0.018..1.625 rows=10000 loops=1)
Buffers: shared hit=101
Planning Time: 0.103 ms
Execution Time: 2.371 ms
(4 rows)

These values describe the memory/disk usage of the operation. They’ll only appear when you generate a plan with the BUFFERS flag set, which defaults to off – although it’s so useful that there is a movement afoot to turn it on by default when ANALYZE is enabled.

 

Each of the ten buffers keys consist of two parts, a prefix describing the type of information being accessed, and a suffix describing how it has been read/written. Unlike the other actual values, they are total values, ie not per-loop, although they still include values for their child operations.

 

There are three prefixes:

  1. Shared blocks contain data from normal tables and indexes.
  2. Local blocks contain data from temporary tables and indexes (yes, this is quite confusing given that there is also a prefix “Temp”).
  3. Temp blocks contain short-term data used to calculate hashes, sorts, Materialize operations, and similar cases.

 

There are four suffixes:

  1. Hit means that the block was found in the cache, so no full read was necessary.
  2. Read blocks were missed in the cache and had to be read from the normal source of the data.
  3. Dirtied blocks have been modified by the query.
  4. Written blocks have been evicted from the cache.

So for example “Shared Hit Blocks” is the number of blocks read from cached indexes/tables.

I/O Write Time, I/O Read Time :These are also obtainable through buffers if you seta config options enable them. If you are a superuser, you can collect these values during the current session by executing SET track_io_timing = on;

Verbose Fields :VERBOSE is a bit of a funny flag — it’s a grab-bag of different extra pieces of information that don’t necessarily have much in common.

test_db=> explain analyze verbose select * from products;
QUERY PLAN
———————————————————————————————————————–
Seq Scan on test_user.products (cost=0.00..201.00 rows=10000 width=49) (actual time=0.017..1.589 rows=10000 loops=1)
Output: prod_id, category, title, actor, price, special, common_prod_id
Planning Time: 0.092 ms
Execution Time: 2.218 ms
(4 rows)

Workers : A summary of the operation detail broken down by thread. Note that, like the “Workers Planned” and “Workers Launched” values for a parallelised operation, these entries don’t include the main thread.

Output: The columns returned by the operation.

Schema: The schema of the “Relation Name” object.