Deploying, Configuring, and Automating pg_partman on Amazon RDS PostgreSQL

Author: Abdul Sayeed | 7 min read | November 29, 2023

Introduction

This post walks through configuring pg_partman on Amazon RDS PostgreSQL and converting an existing table into a partitioned table. It also lists the steps to schedule maintenance so that retention is enforced.

Initial Configuration

To configure pg_partman on RDS PostgreSQL, install PostgreSQL 14 on an EC2 node or Kubernetes pod. From that host you can configure pg_partman, deploy the partition scripts, and schedule the retention jobs.

$ sudo apt update

$ sudo apt install postgresql postgresql-contrib

$ sudo -i -u postgres

postgres@ip-xxx-xx-xx-xx:/usr/lib/postgresql/14/bin$ psql -h <rds-endpoint> -d dv_partman_test -U datavail -f 'Partman_Config' >> /tmp/partman_config.log

postgres@ip-xxx-xx-xx-xx:/usr/lib/postgresql/14/bin$ more Partman_Config

CREATE SCHEMA partman;

CREATE EXTENSION pg_partman WITH SCHEMA partman;

GRANT ALL ON SCHEMA partman TO partman;

GRANT ALL ON ALL TABLES IN SCHEMA partman TO partman;

GRANT EXECUTE ON ALL FUNCTIONS IN SCHEMA partman TO partman;

GRANT EXECUTE ON ALL PROCEDURES IN SCHEMA partman TO partman;
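
Before moving on, it can help to confirm that the extension was created and to note which version RDS installed, because the arguments accepted by create_parent differ between pg_partman major versions. The query below is a minimal check that uses only the standard PostgreSQL catalogs; it assumes the extension was created in the partman schema as in the script above.

```sql
-- Confirm pg_partman is installed, and show its version and schema.
SELECT e.extname,
       e.extversion,
       n.nspname AS schema_name
FROM pg_catalog.pg_extension AS e
JOIN pg_catalog.pg_namespace AS n ON n.oid = e.extnamespace
WHERE e.extname = 'pg_partman';
```

The create_parent call used in this post passes p_type := 'native' and p_interval := 'daily', which are 4.x-style values. If the query reports a 5.x version, check the pg_partman release notes before reusing the call unchanged.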

Deploying SQL scripts for each table

Let’s take a table called “orders” as an example, which will be partitioned using the pg_partman extension.

postgres@ip-xxx-xx-xx-xx:/usr/lib/postgresql/14/bin$ more partman_test.sql

---Rename original table with _old name:

ALTER TABLE public.orders rename to orders_old;

--- Check min and max values:

SELECT TO_TIMESTAMP(min(clock)) from public.orders_old;

SELECT TO_TIMESTAMP(max(clock)) from public.orders_old;

-- Create template table

CREATE TABLE public.orders_template ( itemid bigint NOT NULL, clock integer NOT NULL, value bigint NOT NULL, ns integer NOT NULL DEFAULT 0 );

CREATE INDEX orders_template_inx ON public.orders_template USING btree (itemid, clock, ns);

-- Create orders table - partition

CREATE TABLE public.orders ( itemid bigint NOT NULL, clock integer NOT NULL, value bigint NOT NULL, ns integer NOT NULL DEFAULT 0 ) PARTITION BY RANGE (clock);

CREATE INDEX orders_inx ON public.orders USING btree (itemid, clock, ns);

-- Create partitions

SELECT partman.create_parent(
    p_parent_table := 'public.orders',
    p_control := 'clock',
    p_type := 'native',
    p_interval := 'daily',
    p_constraint_cols := null,
    p_premake := 10,
    p_automatic_maintenance := 'on',
    p_start_partition := null,
    p_inherit_fk := true,
    p_epoch := 'seconds',
    p_template_table := 'public.orders_template'
);

-- Validate config in partman

select * from partman.part_config where parent_table='public.orders' \gx

-- Validate partitions

WITH RECURSIVE partition_info (relid, relname, relsize, relispartition, relkind) AS (
    SELECT oid AS relid, relname, pg_relation_size(oid) AS relsize, relispartition, relkind
    FROM pg_catalog.pg_class
    WHERE relname = 'orders' AND relkind = 'p'
    UNION ALL
    SELECT c.oid AS relid, c.relname AS relname, pg_relation_size(c.oid) AS relsize, c.relispartition AS relispartition, c.relkind AS relkind
    FROM partition_info AS p, pg_catalog.pg_inherits AS i, pg_catalog.pg_class AS c
    WHERE p.relid = i.inhparent AND c.oid = i.inhrelid AND c.relispartition
)
SELECT * FROM partition_info;

-- Move data

CALL partman.partition_data_proc('public.orders', p_batch := 1000, p_source_table := 'public.orders_old');

-- Vacuum analyze to refresh statistics after the move

VACUUM ANALYZE public.orders;

-- Validate row count with respect to child partition:

select to_char(TO_TIMESTAMP(clock),'dd-mm-YYYY') as daywise, count(*) as total_records from public.orders group by daywise order by daywise desc ;

-- Set retention:

UPDATE partman.part_config set retention = '30 day', retention_keep_table = false, retention_keep_index = false WHERE parent_table = 'public.orders';

If your table lives in a schema other than public, update the schema name in the partman_test.sql script before running it:

postgres@ip-xxx-xx-xx-xx:/usr/lib/postgresql/14/bin$ psql -h <rds-endpoint> -d dv_partman_test -U datavail -f 'partman_test.sql' >> /tmp/partman_test.log

Verify the log output:

postgres@ip-xxx-xx-xx-xx:/usr/lib/postgresql/14/bin$ more /tmp/partman_test.log

SQL Scripts Include the Following Tasks

Partman_Config – Creates the partman schema and the pg_partman extension, and grants the partman user the required permissions on the schema and its underlying tables, functions, and procedures.
ALTER TABLE – Renames the existing table with an _old suffix so that its data is kept and can be loaded after the partitions are created.
SELECT TO_TIMESTAMP(min(clock)) – Gets the oldest record in the original, non-partitioned table.
SELECT TO_TIMESTAMP(max(clock)) – Gets the latest record in the original, non-partitioned table.
CREATE TABLE (template) – Creates a template table that carries the index definition. pg_partman uses the template table when it creates child tables, so each child table is created with the index.
CREATE INDEX on template table – Manually creates the index that should apply to all child tables.
CREATE TABLE (parent) – Creates the partitioned parent table, with the same name as the original table, along with its index.
create_parent – Creates the partitions using the pg_partman extension. The p_premake parameter is set to 10, which means 10 future partitions are created in advance.
SELECT from partman.part_config – Validates the configuration stored in the partman schema.
WITH RECURSIVE – Validates the partitions.
CALL partman.partition_data_proc – Moves data from the original table into the partitioned table. Because the clock column is the partition key, the procedure places each row in the child partition that matches its date.
VACUUM – Runs VACUUM ANALYZE, which updates the statistics to improve overall performance of the partitioned table.
SELECT TO_CHAR – Validates the row count per day, which corresponds to each child partition.
UPDATE – Sets retention. It takes effect when run_maintenance is called from the scheduler, and keeps 30 days of child partitions as per our requirement.
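
As a lighter alternative to the recursive catalog query, the child partitions and the stored retention settings can be read back through pg_partman itself. This is a sketch that assumes pg_partman 4.x installed in the partman schema and the public.orders parent used throughout this post.

```sql
-- List the child partitions of the parent table, newest first.
SELECT partition_schemaname, partition_tablename
FROM partman.show_partitions('public.orders', 'DESC');

-- Read back the settings written by the "Set retention" UPDATE.
SELECT parent_table,
       premake,
       retention,
       retention_keep_table,
       retention_keep_index
FROM partman.part_config
WHERE parent_table = 'public.orders';
```

With retention_keep_table set to false, partitions older than the retention interval are dropped rather than just detached, so it is worth confirming these values before the first scheduled maintenance run.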

Post Partition Validation

Manually check the result by comparing the output of SELECT TO_TIMESTAMP(min(clock)) and SELECT TO_TIMESTAMP(max(clock)), as captured from orders_old, with the oldest and newest child partitions.

Example:

dv_partman_test=> SELECT TO_TIMESTAMP(min(clock)) from public.orders_old;

to_timestamp

------------------------

2019-11-06 20:00:00+00

(1 row)

dv_partman_test=> SELECT TO_TIMESTAMP(max(clock)) from public.orders_old;

to_timestamp

------------------------

2023-07-20 18:00:00+00

(1 row)

dv_partman_test=> SELECT * from public.orders_p2019_11_06 ;

dv_partman_test=> SELECT * from public.orders_p2023_07_20 ;
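
A few extra checks can make the validation less manual. The sketch below assumes the data move has already finished and relies on partition_data_proc moving rows out of the source table rather than copying them; it also assumes a pg_partman 4.x installation in the partman schema, where check_default() is available.

```sql
-- Rows still waiting in the source table (expect 0 once the move is complete).
SELECT count(*) AS rows_left_in_old
FROM public.orders_old;

-- Time range now covered by the partitioned table; this should match the
-- min/max values recorded from orders_old before the move.
SELECT to_timestamp(min(clock)) AS oldest,
       to_timestamp(max(clock)) AS newest
FROM public.orders;

-- Partition sets whose default partition contains rows, meaning some data
-- did not fit any existing child partition.
SELECT * FROM partman.check_default();
```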

Scheduling Maintenance

Maintenance can be scheduled with either the Linux crontab or the pg_cron extension on RDS.

Below is a cron job example using Linux crontab scheduler:

00 00 * * * /usr/lib/postgresql/14/bin/psql -h <rds-endpoint> -d dv_partman_test -U datavail -c "SELECT partman.run_maintenance('public.orders');" >> /tmp/partman_test.log

To schedule maintenance tasks for pg_partman using pg_cron in PostgreSQL, first ensure that pg_cron is installed and properly configured. Below are the steps on how to schedule maintenance tasks such as partition creation, management, and cleanup using pg_cron.

Install and configure pg_cron

Ensure pg_cron is installed and properly configured in your PostgreSQL database. You’ll need to follow the installation instructions specific to your PostgreSQL version and environment. You can find the installation and configuration steps for pg_cron in the official documentation: https://github.com/citusdata/pg_cron

Schedule partition management and cleanup

Create SQL scripts for partition management (e.g., merging or splitting partitions) and cleanup (e.g., dropping old partitions). Schedule these scripts using pg_cron to run at the desired intervals.

-- Example: Schedule partition management (e.g., merging) script

SELECT cron.schedule('0 1 * * *', 'CALL merge_partitions()');

-- Example: Schedule partition cleanup script

SELECT cron.schedule('0 2 * * *', 'CALL cleanup_partitions()');

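Adjust the cron schedules (0 1 * * * and 0 2 * * * in these examples) to your desired frequencies. Note that merge_partitions() and cleanup_partitions() are placeholder names for procedures you write yourself; they are not provided by pg_partman or pg_cron.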
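
For the partition set built in this post, the job that matters most is pg_partman's own maintenance, which creates upcoming partitions and applies the retention policy. A minimal sketch, assuming pg_cron is enabled on the instance and pg_partman is installed in the partman schema:

```sql
-- Run pg_partman maintenance for all configured partition sets daily at midnight.
SELECT cron.schedule('0 0 * * *', $$CALL partman.run_maintenance_proc()$$);

-- Confirm the job was registered and see which database it will run in.
SELECT jobid, schedule, command, database, active
FROM cron.job;
```

pg_cron runs each job in the database recorded in the database column of cron.job. If pg_partman is installed in a different database than pg_cron, that column may need to point at the pg_partman database; check the pg_cron and RDS documentation for the supported way to change it.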

Verify and monitor scheduled jobs

Use SELECT * FROM cron.job; to list the scheduled jobs and SELECT * FROM cron.job_run_details; to monitor their execution status.

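To focus on recent runs and failures, the run history can be filtered and sorted. The query below is a sketch that assumes a pg_cron version that records runs in the cron.job_run_details table.

```sql
-- Show the ten most recent job runs with their outcome and any error message.
SELECT jobid,
       status,
       return_message,
       start_time,
       end_time
FROM cron.job_run_details
ORDER BY start_time DESC
LIMIT 10;
```
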
With these steps, you have scheduled maintenance tasks for partition creation, management, and cleanup using pg_partman through pg_cron in PostgreSQL. Make sure to review the pg_partman and pg_cron documentation for a deeper understanding of their functionalities and usage.
