Create a table with 1 million rows in PostgreSQL

Ashwin Kumar Ramaswamy
4 min readSep 6, 2021

Hello Everyone I hope all are fine ! Today we are going to see about how to create a million of rows in a single table in Postgres database

Prerequisite

Install Postgres in your respective OS by using this link https://www.postgresql.org/download/

Step 1: Create an Employee table by using this query

CREATE TABLE EMPLOYEES (id SERIAL PRIMARY KEY, first_name VARCHAR(50), last_name VARCHAR(50), email VARCHAR(50), mobile_no BIGINT, date_of_birth DATE);

Step 2 — Create a function for generating a random string.

Now we will create a function that will help us create a random string of alphabets. The function can be used for generating random strings for the first_name and last_name fields.

CREATE FUNCTION get_random_string() RETURNS TEXT LANGUAGE SQL AS $$ SELECT STRING_AGG ( SUBSTR ( 'abcdefghijklmnopqrstuvwxyzABCDEFGHIJKLMNOPQRSTUVWXYZ', CEIL (RANDOM() * 52)::integer, 1), '') FROM GENERATE_SERIES(1, 10) 
$$;

The get_random_string() function generates a random string of length 10. The string generated will only have letters between A-Z and a-z.

Let’s understand the function:

CEIL (RANDOM() * 52)::integer

In the above snippet, RANDOM() will return a random number between 0 and 1, but since we want numbers between 1 and 52 (we have A-Z = 26 and a-z = 26 i.e., total 52 characters), we multiply that number with 52. The CEIL function helps us round the number to the next whole number. So if the number produced by RANDOM() * 52 is 47.9 then CEIL will round off the number to 48.
And why we are using ::INTEGER? That we will understand next.

SUBSTR ( 'abcdefghijklmnopqrstuvwxyzABCDEFGHIJKLMNOPQRSTUVWXYZ', CEIL (RANDOM() * 52)::integer, 1)

The SUBSTR function takes 3 arguments — string, start_position (integer), and length.
We have specified string parameter as letters between A-Z and a-z.
The start_position parameter is the starting position of the substring in the specified string, and it has to be an integer therefore we were casting the output from the above part of the query to the integer using ::integer.
And finally, 1 indicates the length of the string which we are expecting.

SELECT STRING_AGG ( SUBSTR ( 'abcdefghijklmnopqrstuvwxyzABCDEFGHIJKLMNOPQRSTUVWXYZ', CEIL (RANDOM() * 52)::integer, 1), '') 
FROM GENERATE_SERIES(1, 10)

The STRING_AGG function concatenates a list of strings and places a separator between them. Hence we have an empty separator.
Finally, the GENERATE_SERIES function generates a series of values, from start to stop with a step size of one. We have specified (1, 10) which means the STRING_AGG function will run the SUBSTR function 10 times, and each time gets one character and finally, after getting 10 characters, it concatenates all characters to form a string of length 10.

The rest part in the code snippet is just used for creating a function, giving it a name, and declaring the return type of that function.

Step 3 — Create a function to insert one record into the table.

We have our helper function get_random_string() ready, now we will create a function that will be responsible for inserting one record into our employee table.

CREATE FUNCTION insert_record() RETURNS VOID LANGUAGE PLPGSQL AS $$DECLARE first_name TEXT= INITCAP(get_random_string());
DECLARE last_name TEXT= INITCAP(get_random_string());
DECLARE email TEXT= LOWER(CONCAT(first_name, '.', last_name, '@gmail.com'));
DECLARE mobile_no BIGINT=CAST(1000000000 + FLOOR(RANDOM() * 9000000000) AS BIGINT);
DECLARE date_of_birth DATE= CAST( NOW() - INTERVAL '100 year' * RANDOM() AS DATE);BEGIN
INSERT INTO EMPLOYEES (first_name, last_name, email, mobile_no, date_of_birth) VALUES (first_name, last_name, email, mobile_no, date_of_birth);
END;$$;

insert_record() this function will do the magic and insert one row to the table.

Let’s understand the function:

DECLARE first_name TEXT= INITCAP(get_random_string());
DECLARE last_name TEXT= INITCAP(get_random_string());
DECLARE email TEXT= LOWER(CONCAT(first_name, '.', last_name, '@gmail.com'));
DECLARE mobile_no BIGINT=CAST(1000000000 + FLOOR(RANDOM() * 9000000000) AS BIGINT);
DECLARE date_of_birth DATE= CAST( NOW() - INTERVAL '100 year' * RANDOM() AS DATE);

First, we are declaring 5 variables, for our 5 fields in the table.
For first_name and last_name we are calling our helper function and using the INITCAP function it converts a string expression into the proper case or title case, in which the first letter of each word is in uppercase and the remaining characters in lowercase. So we got the title cased first_name and last_name.

Now for email, we concatenate first_name.last_name@gmail.com using the CONCAT function. And convert the whole string to lower case using the LOWER function.

For mobile_no, we take a random number and multiply it by 9000000000 and get the floor value out of that using the FLOOR function. We add the result with 1000000000 because it is possible that random returns 0.01 and after multiplied by 9000000000 it will create an 8 digit number. Finally, we cast it to BIGINT and not INTEGER because it is a 10 digit number. And the maximum supported 10 digit number for INTEGER is 2,147,483,647.

For date_of_birth, we pick a random DateTime from the last 100 years with the help of INTERVAL and RANDOM function and subtract that date with today’s (the day you are reading this) date. Finally, we cast that to date datatype.

I hope the insert query in the above function is self-explanatory, in the VALUES we are using declared variables (first_name, last_name, etc)for our fields.

We are done with the major things, one final query is remaining.

Step 4 — Calling insert_record() function a million times.

Now we just have to call the insert_record() function, remember it only inserts one record in the table.

SELECT insert_record() FROM GENERATE_SERIES(1, 1000000);

The above query will call the insert_record function 1,000,000 times. And guess what? it will obviously take some time to insert these many records.
After the insert is successful you can play around with the data in the table. To test it you can run the following command.

SELECT COUNT(*) from EMPLOYEES;

And it will return 1000000, the number of rows inserted in the EMPLOYEES table.

All sorts of fancy things like indexing, partitioning, sharding, etc can be done on the above table and performance can be tested.

Thank you !

--

--

Ashwin Kumar Ramaswamy

Young tech fellow having interest to work on tech domains likes Software development, Digital Marketing. Passionate in Technical Writing and Public Speaking.