My fun journey of managing a large PostgreSQL table (2024)

Recently, I have had a chance to work on a very large data table. By very large, I mean 1.5 billion rows so far, and growing by about 300,000 rows every single minute.

The table takes in a constant stream of data from IoT Edge devices, and I periodically run a data consolidation script (let's call it the consolidator) to aggregate the stream into per-minute data points.

The consolidator, at first, runs reasonably fast for the table's size: about a minute to perform an SQL SELECT/GROUP BY/ORDER BY query. One might be surprised that one minute counts as relatively fast, given the number of rows in the table. The reason is that I use an index to keep track of the data. Let me take you with me through my journey to the data wonderland.

For readers who just want the TL;DR:

  1. Index the columns that are used in conditional clauses.
  2. Think about the nature of your data. If it has an intrinsic order, a BRIN index might help.
  3. If you are using GROUP BY DATE_TRUNC or a similar aggregate expression, you might as well index the expression itself.
  4. Every time rows are inserted, the indices are updated too.
  5. SQL is declarative, so the end result of GROUP BY is order-independent. But index usage is not, and you might need to start the GROUP BY with the indexed column.

As the very first step, for anyone who works with a big data table, indexing is the go-to solution. Indexing essentially rearranges your sequential data into a data structure that is search-friendly in terms of time complexity. Postgres automatically creates indices for unique fields, including primary key fields. The default index data structure is a B-tree: a balanced tree with 8 kB pages, which keeps the depth at just 3 even when the number of records runs into the billions!
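To make that concrete, here is a minimal sketch. The table and column names (data_table, data_timestamp) are borrowed from the queries later in this article; the index name is hypothetical.

-- B-tree is the default index type, so USING btree can be omitted.
-- Index the column that appears in the WHERE clause (TL;DR tip 1).
CREATE INDEX idx_data_ts ON data_table (data_timestamp);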

If you have some spare time, I found this to be a great resource explaining indexing.

Since there is no free lunch, we sacrifice space in the space-for-time trade-off. The space used for indexing is "just" about 11 GB per index!
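You can check the damage for yourself with pg_relation_size and pg_size_pretty; a minimal sketch, using the hypothetical index name from above:

-- On-disk size of a single index:
SELECT pg_size_pretty(pg_relation_size('idx_data_ts'));

-- Or list every index on the table together with its size:
SELECT indexrelname, pg_size_pretty(pg_relation_size(indexrelid))
FROM pg_stat_user_indexes
WHERE relname = 'data_table';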


Luckily, the data is time-series data. This opens up another indexing option: the Block Range Index (a.k.a. BRIN). It requires minimal space, just over 4 kB.


This index type is based on the assumption that the column's values are well correlated with the physical order of the table's rows. In my specific case, the order of the data is well correlated (each row carries the timestamp of its measurement). And what kind of order is NOT well correlated, you might ask? Order that is artificially created. For example, user IDs 0001, 0002, 0003, … tell you nothing besides their order; they don't naturally correlate with the other columns' values.
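Creating a BRIN index is a one-liner, and pg_stats lets you sanity-check the correlation assumption first; a sketch with the same hypothetical names:

-- How well does the column's order match the physical row order?
-- A correlation close to 1.0 (or -1.0) means BRIN should work well.
SELECT correlation
FROM pg_stats
WHERE tablename = 'data_table' AND attname = 'data_timestamp';

-- The BRIN index itself, tiny compared to a B-tree:
CREATE INDEX idx_data_ts_brin ON data_table USING brin (data_timestamp);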

So, all I need to do is go ahead and implement the BRIN index and go enjoy life in the lavender field? No, not so fast…

My SQL query looks like this:

-- data1 has to be aggregated for a valid GROUP BY; AVG here is my assumption
SELECT AVG(data1), DATE_TRUNC('minute', data_timestamp) AS minute_ts, data2
FROM data_table
WHERE data_timestamp
BETWEEN DATE_TRUNC('minute', '2022-09-17 07:15+7'::timestamptz)
AND DATE_TRUNC('minute', '2022-09-17 07:30+7'::timestamptz)
GROUP BY data2, DATE_TRUNC('minute', data_timestamp);

And this query took about 14 minutes. Not ideal, but compare it to this:

SELECT AVG(data1), DATE_TRUNC('minute', data_timestamp) AS minute_ts, data2
FROM data_table
WHERE data_timestamp
BETWEEN DATE_TRUNC('minute', '2022-09-17 07:15+7'::timestamptz)
AND DATE_TRUNC('minute', '2022-09-17 07:30+7'::timestamptz)
GROUP BY DATE_TRUNC('minute', data_timestamp), data2;

This one ran for more than 12 hours before I realized that, perhaps, I would rather give up waiting.

So, what's the difference here? What makes the same query (the same, at least, from a declarative perspective) take so much more time with a different order of GROUP BY columns?

  • Aggregation is expensive.
  • You need to be careful about your indexes when you want to optimize an aggregation query (see the sketch after this list).
  • [Query plan image: the wrong indexing]
  • [Query plan image: indexing with the DATE_TRUNC function]
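My reading of the two timings above: the GROUP BY order needs to match the index's column order, so the planner can feed pre-sorted rows to the aggregate instead of sorting 1.5 billion rows. A sketch of an expression index matching the faster query; the index name is made up, and I am assuming data_timestamp is a plain timestamp. With a timestamptz column, the two-argument DATE_TRUNC is only STABLE and Postgres will reject it in an index, so you would need the three-argument form with an explicit time zone (PostgreSQL 14+).

-- Composite expression index matching GROUP BY data2, DATE_TRUNC(...):
CREATE INDEX idx_data_grouped
ON data_table (data2, DATE_TRUNC('minute', data_timestamp));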

Remember that the table I am trying to optimize takes in a constant stream of data from IoT devices? That implies the DB engine will, most of the time, be performing insert operations.

This is taken from the official documentation on the PostgreSQL website:

Once an index is created, no further intervention is required: the system will update the index when the table is modified, and it will use the index in queries when it thinks doing so would be more efficient than a sequential table scan.

The keyword here is "update"! Meaning, the index needs to be maintained every time new data is inserted. This doesn't sound too bad for the BRIN index type. However, to keep the query planner making educated decisions, its statistics need to stay fresh too: either the autovacuum daemon runs ANALYZE for you, or you have to run it regularly yourself.
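Both steps are one-liners, and EXPLAIN lets you verify the plan instead of trusting it. A sketch, reusing the query from above (with the assumed AVG aggregate):

-- Refresh the planner's statistics manually:
ANALYZE data_table;

-- Verify that the planner actually uses the index:
EXPLAIN (ANALYZE, BUFFERS)
SELECT AVG(data1), DATE_TRUNC('minute', data_timestamp) AS minute_ts, data2
FROM data_table
WHERE data_timestamp
BETWEEN DATE_TRUNC('minute', '2022-09-17 07:15+7'::timestamptz)
AND DATE_TRUNC('minute', '2022-09-17 07:30+7'::timestamptz)
GROUP BY data2, DATE_TRUNC('minute', data_timestamp);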

In conclusion: if you are working with big data in PostgreSQL, you may very well need to consider the nature of your application seriously. Choosing the right indexing can speed things up, and a BRIN index can help keep the index size in check. However, you should double-check with the query planner that it actually does what you expect.

The next article is about how I managed to break the large index (and table) down into partitions using pg_partman.
