PostgreSQL Interview Questions

1.what are your responsibilities as a PostgreSQL database administrator?

Daily

  • Backups—these are usually incremental or cumulative, weekly fulls, and logs are archived and e-mailed to DBA upon failure
  • Database Alert Logs—such as ERROR,FATAL automatic notifications through e-mail, pagers
  • Operating System File space, CPU and I/O statistics—depends on system admin support
  • SQL Tuning Sets—Top 5 to 10 SQL statements
  • Tablespace growth—Extension, Partition Management, Temporary Tablespace.
  • Replication—Log Shipping/Application in Synch
  • Alert logs—intrusion detection, removal of unused accounts
  • New account creation—should be at least partially automated
  • Personnel security changes—At least 24 hours notice
  • Migrate schema and code changes or ad hoc SQL updates
  • Large table growth, coalescing tablespace
  • Keeping a log of daily changes to the database—publishing it for certain IT staff

Weekly

  • Backups—usually full
  • Cloning for non-production databases—automated or scripted
  • Tablespace growth—daily rolled up to weekly
  • Patches/Upgrades—Milestone updates

Monthly

  • Cloning for non-production databases—automated or scripted
  • Monitoring tablespace growth—weekly rolled up to monthly
  • Trends and forecasts—CPU utilization, I/O stats, logons
  • Password changes on production—sys, system,schema.
  • Practicing recovery scenarios

Quarterly

  • Database upgrades
  • Monitoring tablespace growth—monthly rolled up to yearly

Yearly

  • Tablespace growth—yearly report
  • Trends and forecast rollups
  • Attend postgresql conferences—regional or national pgconf groups
  • PostgreSQL upgrades with planned downtime.
  • Software licensing and warranty renewals
  • Hardware evaluation and replacement
  • SSL Certificate renewals.

2. What is your experience as a PostgreSQL database administrator?

X Years into PostgreSQL Database administrator.

3. Can you briefly tell me the features of PostgreSQL?

PostgreSQL Features

Object-Relational Database Management System

PostgreSQL Database is an ORDBMS. An RDBMS that implements object-oriented features such as user-defined types, inheritance, and polymorphism is called an object-relational database management system (ORDBMS).

PostgreSQL is an open-source database management system and no License and a Large Community

PostgreSQL Database is an ORDBMS. An RDBMS that implements object-oriented features such as user-defined types, inheritance, and polymorphism is called an object-relational database management system (ORDBMS).

Extensibility

One of the most important features of PostgreSQL is that it can be extended. If you don’t find

Something that you need, you can usually add it yourself. For example, you can add new data types, new functions and operators, and even new procedural and client languages. There are many contributed packages available on the Internet.

DBaaS Support By Major Cloud Vendors

PostgreSQL has been supported widely as a DBaaS. These services are coming from Amazon(Amazon Aurora PostgreSQL Serverless), Microsoft’s with its Azure Database for PostgreSQL, and Google’s Cloud SQL for PostgreSQL.

Scalability, High-Availability Redundancy

PostgreSQL has massive support and that has a lot of options to choose from. PostgreSQL includes streaming and logical replication built-in to the core package of the software. You may also able to setup a synchronous replication for PostgreSQL to have more high-availability cluster, while making a stand by node process your read queries.

Can you tell me about the Postmaster process?

• Postmaster is the master process called postgres
• Listens on 1-and-only-1 tcp port, default port is 5432.
• Receives client connection request

Can you briefly tell me about the utility processes?

Background writer
−Writes updated data blocks to disk
•WAL writer
−Flushes write-ahead log to disk
•Checkpointer process
−Automatically performs a checkpoint based on config parameters
•Autovacuum launcher
−Starts Autovacuum workers as needed
•Autovacuum workers
−Recover free space for reuse

•Logging collector
−Routes log messages to syslog, eventlog, or log files
•Stats collector
−Collects usage statistics by relation and block
•Archiver
−Archives write-ahead log files

What is the difference between commit and checkpoint?

Before Commit
−Uncommitted updates are in memory.
•After Commit
−Committed updates written from shared memory to WAL log file (on disk).
•After Checkpoint
−Modified data pages are written from shared memory to the data files.
Commit and Checkpoint

Can you tell me the stages involved in statement processing?

Here is the short description of the stages of query processing.

Parse Stage:

The parser stage checks the query transmitted by the application program for correct syntax and creates a query tree.

The rewrite system:

The rewrite system takes the query tree created by the parser stage and looks for any rules (stored in the system catalogs) to apply to the query tree. It performs the transformations given in the rule bodies.

The planner/optimizer:

The planner/optimizer takes the (rewritten) query tree and creates a query plan that will be the input to the executor.

The executor:

The executor recursively steps through the plan tree and retrieves rows in the way represented by the plan.

How can you get the help commands in PostgreSQL?

\h or \?

How do you enable alert logging in PostgreSQL?

By setting the below parameter

  • log_destination = ‘stderr’
  • logging_collector = on
  • log_directory = ‘log’
  • log_filename = ‘alert_postgresql.log’

How often checkpoint occurs in PostgreSQL?

A checkpoint occurs in the following cases:

  1. When the administrator runs the CHECKPOINT statement.
  2. With the interval specified in parameter checkpoint_timeout(default 300 seconds)
  3. Amount of data written to the WAL has reached a parameter max_wal_size (default: 1GB).
  4. At the start of an online backup
  5. At the execution of pg_start_backup function
  6. At the execution of a pg_basebackup command
  7. At the shutdown of an instance(Except for the pg_ctl stop -m immediate command execution)
  8. At the time of database configuration such as CREATE DATABASE / DROP DATABASE statement

What is max_wal_size?

It is the parameter I set to limit the maximum disk space that my wal files should take.

What is work_mem?

This is the amount of memory reserved for either a single sort or hash table operation in a query and it is controlled by work_mem database parameter. A sort operation could be one of an ORDER BY, DISTINCT or Merge join and a hash table operation could be due to a hash-join, hash based aggregation or an IN subquery.

How do you set work_mem locally?

set local work_mem=’3GB’;

What is maintenance_work_mem?

This is the maximum amount of memory allocation of RAM consumed for maintenance operations. A maintenance operation could be one of the VACUUM, CREATE INDEX or adding a FOREIGN KEY to a table. The setting is controlled by the maintenance_work_mem database parameter.

What are the different isolation levels you have in Postgresql?

Read uncommitted

This is the lowest isolation level. In this level, dirty reads are allowed, so one transaction may see not-yet-committed changes made by other transactions.

Read Committed

This is the default isolation level in PostgreSQL.

When a transaction uses this isolation level, a SELECT query (without a FOR UPDATE/SHARE clause) sees only data committed before the query began; it never sees either uncommitted data or changes committed during query execution by concurrent transactions.

In this isolation level, a lock-based concurrency control DBMS implementation keeps write locks (acquired on selected data) until the end of the transaction, but read locks are released as soon as the SELECT operation is performed (so the non-repeatable reads phenomenon can occur in this isolation level).

Repeatable reads

In this isolation level, a lock-based concurrency control DBMS implementation keeps read and write locks (acquired on selected data) until the end of the transaction. However, range-locks are not managed, so phantom reads can occur.

Violation in SERIALIZABLE may lead to below error.

ERROR:  could not serialize access due to concurrent update

Serializable

This is the highest isolation level.

With a lock-based concurrency control DBMS implementation, serializability requires read and write locks (acquired on selected data) to be released at the end of the transaction. Also range-locks must be acquired when a SELECT query uses a ranged WHERE clause, especially to avoid the phantom reads phenomenon.

How do you set the isolation level in PostgreSQL?

SET TRANSACTION transaction_mode… command.

What are the different ways you can install PostgreSQL?

PostgreSQL installation can be done in 3 ways.

  1. source code installation
  2. yum installation
  3. rpm based installation

What are the different RPMs you need to install PostgreSQL?

postgresql-libs The postgresql-libs package provides the essential shared libraries for any PostgreSQL client program or interface. You will need to install this package to use any other PostgreSQL package or any clients that need to connect to a PostgreSQL server.
postgresql If you want to manipulate a PostgreSQL database on a local or remote PostgreSQL server, you need this package. You also need to install this package if you’re installing the postgresql-server package.
postgresql-contrib The postgresql-contrib package contains contributed packages that are included in the PostgreSQL distribution.
postgresql-devel The postgresql-devel package contains the header files and libraries needed to compile C or C++ applications which will directly interact with a PostgreSQL database management server and the ecpg Embedded C Postgres preprocessor. You need to install this package if you want to develop applications which will interact with a PostgreSQL server.
postgresql-docs The postgresql-docs package includes the SGML source for the documentation as well as the documentation in PDF format and some extra documentation. Install this package if you want to help with the PostgreSQL documentation project, or if you want to generate printed documentation.
postgresql-server The postgresql-server package includes the programs needed to create and run a PostgreSQL server, which will in turn allow you to create and maintain PostgreSQL databases. You should install postgresql-server if you want to create and maintain your own PostgreSQL databases and/or your own PostgreSQL server. You also need to install the postgresql package and its requirements.
postgresql-tcl The postgresql-tcl package contains the Pgtcl client library and its documentation.
postgresql-jdbc The postgresql-jdbc package includes the .jar files needed for Java programs to access a PostgreSQL database.
postgresql-plperl The postgresql-plperl package contains the the PL/Perl procedural language for the backend. PL/pgSQL is part of the core server package.
postgresql-pltcl The postgresql-plperl package contains PL/Tcl procedural language for the backend. PL/pgSQL is part of the core server package.

Important rpm’s are marked in bold

How do you stop and start cluster?

pg_ctl command

What are the different shutdown modes you have?

Shutdown modes are:

  1. smart             quit after all clients have disconnected
  2. fast           quit directly, with proper shutdown (default)
  3. immediate     quit without complete shutdown; will lead to recovery on restart

Give me any 5 important tools you use in your day to day administration

pg_ctl controls the postgres cluster
pg_basebackup take physical backup
pg_upgrade upgrade your cluster
pg_restore import logical backup
pg_dump used for logical backup or exporting

Can you give me few important files and folders in your data directory?

Item Description
PG_VERSION A file containing the major version number of PostgreSQL
base Subdirectory containing per-database subdirectories
current_logfiles File recording the log file(s) currently written to by the logging collector
global Subdirectory containing cluster-wide tables, such as pg_database
pg_commit_ts Subdirectory containing transaction commit timestamp data
pg_dynshmem Subdirectory containing files used by the dynamic shared memory subsystem
pg_logical Subdirectory containing status data for logical decoding
pg_multixact Subdirectory containing multitransaction status data (used for shared row locks)
pg_notify Subdirectory containing LISTEN/NOTIFY status data
pg_replslot Subdirectory containing replication slot data
pg_serial Subdirectory containing information about committed serializable transactions
pg_snapshots Subdirectory containing exported snapshots
pg_stat Subdirectory containing permanent files for the statistics subsystem
pg_stat_tmp Subdirectory containing temporary files for the statistics subsystem
pg_subtrans Subdirectory containing subtransaction status data
pg_tblspc Subdirectory containing symbolic links to tablespaces
pg_twophase Subdirectory containing state files for prepared transactions
pg_wal Subdirectory containing WAL (Write Ahead Log) files
pg_xact Subdirectory containing transaction commit status data
postgresql.auto.conf A file used for storing configuration parameters that are set by ALTER SYSTEM
postmaster.opts A file recording the command-line options the server was last started with
postmaster.pid A lock file recording the current postmaster process ID (PID), cluster data directory path, postmaster start timestamp, port number, Unix-domain socket directory path (empty on Windows), first valid listen_address (IP address or *, or empty if not listening on TCP), and shared memory segment ID (this file is not present after server shutdown)

What is a tablespace?

Tablespaces allow database administrators to have multiple file systems per machine and decide how to best use physical storage to store database objects. They are named locations within a filespace in which you can create objects. Tablespaces allow you to assign different storage for frequently and infrequently used database objects or to control the I/O performance on certain database objects. For example, place frequently-used tables on file systems that use high performance solid-state drives (SSD), and place other tables on standard hard drives.

Pg_default tablespace corresponds to the “base” directory in the database cluster. With an external tablespace is created a symbolic link is created in the {PGDATA}/pg_tblspc directory. File name of the symbolic link corresponds to the oid column of pg_tablespace catalog.

Can you tell me something about default schemas in PostgreSQL?

Information_schema consists of a standardized set of views that contain information about the objects in the database. These views get system information from the system catalog tables in a standardized way.

Pg_catalog contains the system catalog tables, built-in data types, functions, and operators. It is always part of the schema search path, even if it is not explicitly named in the search path.

Pg_toast stores large objects such as records that exceed the page size. This schema is used internally by the Greenplum Database system.

Every database has a default schema named public. If you do not create any schemas, objects are created in the public schema.

Are there any role attributes you have in your environment?

CONNECTION LIMIT connlimit          If role can log in, this specifies how many concurrent connections                                                  the role can make. -1 (the default) means no limit.

CREATEROLE | NOCREATEROLE     Determines if the role is allowed to create and manage other roles.                                                             NOCREATEROLE is the default.

What is the command used to take physical backup?

pg_basebackup

Give me any two incomplete recovery parameters?

  1. recovery_target = ‘immediate’ — consistent state has been reached
  2. recovery_target_lsn= ‘0/3019838’ — select pg_current_wal_lsn();
  3. recovery_target_time=2019-10-26 19:34:55
  4. recovery_target_xid= ‘634’ — select txid_current();

What happens when you lose any of these files?

Deleted pg_control

In order to recover the pg_control file, restore the pg_control file from a backup, and then re-create the WAL file with specifying the -x option to pg_resetxlog command. If the instance was terminated abnormally, specify -f option also at the same time. Instance cannot start in case that pg_control file is only restored.

Deleted WAL file

If an instance is successful, regardless of the case of abnormal termination, if the entire WAL file has been deleted, Instance cannot start under the condition. Execute pg_resetxlog command to re-create the WAL file. In case of immediately after an abnormal instance termination, specify -f option to pg_resetxlog command to create WAL files forcibly

Behavior on data file deletion

Instance started successfully. Error (ERROR category) occurred on the deleted table when accessing with the SELECT statement. Instance or session are not affected.

Logs are not output at the time of instance startup. At the time of SELECT statement execution, following log has been output.

Behavior of Visibility Map (VM) / Free Space Map (FSM) file deletion

Error does not occur on VM files, FSM file deletion, and SQL statement for the target table succeeds. These files are re-created on the next VACUUM.

Behavior at the time of pg_filenode.map file deletion

When pg_filenode.map file is deleted, it becomes impossible to mapping of the system catalog and the file name, and then cannot use the database.

Behavior at the time of PG_VERSION file deletion

When PG_VERSION file is deleted, the directory cannot be recognized as the directory for PostgreSQL database.

How do you take logical backup?

pg_dump

Can you tell me high-level steps involved in configuring replication in PostgreSQL?

  1. Configure password less connectivity between primary and standby
  2. create replication user.
  3. make necessary changes in primary postgresql.conf file and pg_hba file.
  4. take full backup of primary and import in standby.
  5. create recovery.conf file and start standby.

Can you tell me the contents of recovery.conf file?

standby_mode = ‘on’

  • primary_conninfo = ‘user=replica password=replicauser@ host=192.168.1.9 port=5432 sslmode=disable sslcompression=1’
  • restore_command = ‘cp /u01/archives/%f %p’
  • trigger_file = ‘/tmp/postgresql.trigger.5432’

can you tell me the changes you made in your postgresql.conf file?

  • listen_addresses = ‘*’
  • **wal_level = hot_standby
  • archive_mode = on
  • archive_command = ‘cp %p /u02/New_PostgreData/archivedlogs/%f’
  • archive_timeout = 5min
  • **max_wal_senders = 3
  • wal_keep_segments = 8
  • **hot_standby = on

What is the switchover process?

create a trigger file in standby and rewind your primary database and create recovey.conf file in new standby(old primary)

What is a replication slot?

Replication slot is a mechanism in Postgresql to ensure wal files are retained in primary if they are not applied in standby.

what is logical replication?

It is a mechanism to replicate logical objects like tables, databases in PostgreSQL.

What is vacuuming in PostgreSQL?

PostgreSQL’s VACUUM command has to process each table on a regular basis for several reasons:

  1. To recover or reuse disk space occupied by updated or deleted rows.
  2. To update data statistics used by the PostgreSQL query planner.
  3. To update the visibility map, which speeds up index-only scans.
  4. To protect against loss of very old data due to transaction ID wraparound or multixact ID wraparound.

There are two variants of VACUUM: standard VACUUM and VACUUM FULL.

VACUUM FULL can reclaim more disk space but runs much more slowly.

Also, the standard form of VACUUM can run in parallel with production database operations. (Commands such as SELECT, INSERT, UPDATE, and DELETE will continue to function normally, though you will not be able to modify the definition of a table with commands such as ALTER TABLE while it is being vacuumed.)

VACUUM FULL requires exclusive lock on the table it is working on, and therefore cannot be done in parallel with other use of the table.

VACUUM creates a substantial amount of I/O traffic, which can cause poor performance for other active sessions.

recover or reuse disk space

If you have a table whose entire contents are deleted on a periodic basis, consider doing it with TRUNCATE rather than using DELETE followed by VACUUM.

 

TRUNCATE removes the entire content of the table immediately, without requiring a subsequent VACUUM or VACUUM FULL to reclaim the now-unused disk space. The disadvantage is that strict MVCC semantics are violated.

Update data statistics

The PostgreSQL query planner relies on statistical information about the contents of tables in order to generate good plans for queries. These statistics are gathered by the ANALYZE command, which can be invoked by itself or as an optional step in VACUUM. It is important to have reasonably accurate statistics, otherwise poor choices of plans might degrade database performance

The autovacuum daemon, if enabled, will automatically issue ANALYZE commands whenever the content of a table has changed sufficiently.

what is free space map?

Free Space Map (FSM) is a file to manage the volume of the free space in each page of the table file. It manages each page included in the file of the table at a single byte. The name of the FSM file is “{RELFILENODE}_fsm”. By referring to this file, it will be able to find the storage location of a tuple at a high speed.

can you tell me any important parameters in vacuum?

vacuum_freeze_min_age governs whether or not a tuple will be frozen while vacuum is already looking at a page to see if it has dead tuples that can be cleaned up. Tuples older than vacuum_freeze_min_age will be frozen in this case

Last there is autovacuum_max_freeze_age. If the last time the table was scanned completely for unfrozen rows was more than this many transactions ago, autovacuum will start an anti-wraparound vacuum on the table. The default is 200m

vacuum_freeze_table_age (integer)

VACUUM performs a whole-table scan if the table’s pg_class.relfrozenxid field has reached the age specified by this setting. The default is 150 million transactions

vacuum_freeze_min_age (integer) and vacuum_multixact_freeze_min_age

Specifies the cutoff age (in transactions) that VACUUM should use to decide whether to freeze row versions while scanning a table. The default is 50 million transactions.

vacuum_freeze_table_age (integer) and vacuum_multixact_freeze_table_age

VACUUM performs a whole-table scan if the table’s pg_class.relfrozenxid field has reached the age specified by this setting. The default is 150 million transactions.

autovacuum_freeze_max_age (integer) and autovacuum_multixact_freeze_max_age

Specifies the maximum age (in transactions) that a table’s pg_class.relfrozenxid field can attain before a VACUUM operation is forced to prevent transaction ID wraparound within the table.

How do you get the explain plan in PostgreSQL?

explain query;

what are the things you get when you run explain plan?

  1. Estimated start-up cost (0.00). This is the time expended before the output phase can begin, e.g., time to do the sorting in a sort node.
  2. Estimated total cost (201.00). This is stated on the assumption that the plan node is run to completion, i.e., all available rows are retrieved. In practice a node’s parent node might stop short of reading all available rows (see the LIMIT example below).
  3. Estimated number of rows output by this plan node (10000). Again, the node is assumed to be run to completion.
  4. Estimated average width of rows (49) output by this plan node (in bytes).
  5. Node: the operation that my PostgreSQL is performing.

 

 

 

 

 

About the Author

You may also like these

No Related Post

error: Content is protected !!