2 years ago

#70541


Luc

How to design table partitioning for weather data from thousands of weather stations

I have a MySQL database containing weather data from multiple weather stations. All stations send their data every 5 minutes. A station has multiple sensors, and their values are stored in separate rows, one per sensor.

For instance: when a station sends temperature, humidity, rain and wind speed, 4 records are stored in the 'values' table in the database, every 5 minutes. Our database has grown to over 1 billion records. We don't have performance problems, because we have good indexes and optimized queries. However, sometimes we have to run manual queries to extract specific data with a WHERE clause on the datetime field, and such a query takes much more time. A few months ago we also had to restore a backup, which took half a day because of the big 'values' table. So I am thinking of a solution that keeps the 'values' table smaller.

In our applications we need only access to the data of the last 3 months. And users can make graphs or download data based on years (2018, 2019, 2020, 2021, 2022 etc.).

I am thinking of redesigning the database. My first idea was to create an archive cron script that automatically archives data into yearly archive tables, e.g. 'values2018', 'values2019', etc.

That way, the cron gives me full control over archiving, keeping only 3 months of data in the 'values' table. But I see a disadvantage in this approach: the archive cron will have to loop through the data, INSERT it into another table and then remove it from the 'values' table. I think this could be very time consuming, and in the meantime the whole database must remain available for inserting new data from the stations.
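To keep the database available during archiving, the move can be done in small batches so each transaction stays short. A minimal sketch, assuming the 'values' table has an indexed `datetime` column and a same-schema archive table `values2018` (table and column names are assumptions):

```sql
-- Move 2018 rows into the archive table in small batches, so each
-- transaction is short and station inserts are only briefly blocked.
INSERT INTO `values2018`
SELECT * FROM `values`
WHERE `datetime` >= '2018-01-01' AND `datetime` < '2019-01-01'
ORDER BY `datetime`
LIMIT 10000;

-- Delete the same batch from the live table.
DELETE FROM `values`
WHERE `datetime` >= '2018-01-01' AND `datetime` < '2019-01-01'
ORDER BY `datetime`
LIMIT 10000;

-- Repeat both statements from the cron loop, committing after each
-- batch, until the INSERT ... SELECT matches zero rows.
```

Note `values` is a reserved word in MySQL, so the table name needs backticks in manual queries.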

My second idea is to use MySQL partitioning and create RANGE partitions based on year. But I have no experience with partitioning, so this raises some questions:

How fast will an ALTER TABLE that creates the partitions run on over 1 billion records? Will it take long or cost performance while it executes?
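For reference, a yearly RANGE layout might look like the sketch below (column names are assumptions). Two caveats: MySQL requires the partitioning column to be part of every unique key, so a surrogate primary key may first have to be extended to include `datetime`; and on a billion-row table this ALTER rebuilds the table, which can take hours and blocks writes, so online schema-change tools such as pt-online-schema-change are often used instead.

```sql
-- Sketch: partition the table by year of the datetime column.
-- Requires `datetime` to be part of the primary/unique keys.
ALTER TABLE `values`
PARTITION BY RANGE (YEAR(`datetime`)) (
  PARTITION p2018 VALUES LESS THAN (2019),
  PARTITION p2019 VALUES LESS THAN (2020),
  PARTITION p2020 VALUES LESS THAN (2021),
  PARTITION p2021 VALUES LESS THAN (2022),
  PARTITION p2022 VALUES LESS THAN (2023),
  PARTITION pmax  VALUES LESS THAN MAXVALUE   -- catch-all for future years
);
```

A catch-all MAXVALUE partition avoids insert errors at year roll-over; a new yearly partition can then be split off with ALTER TABLE ... REORGANIZE PARTITION.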

Will MySQL automatically optimize my existing SELECT queries on the 'values' table to read only the relevant partitions, or do I need to specify which partition to SELECT from? And do I need to modify my INSERT queries for new data?
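For context: MySQL prunes partitions automatically whenever the WHERE clause constrains the partitioning column, and routes INSERTs to the right partition without query changes. Pruning can be verified with EXPLAIN, whose `partitions` column lists the partitions actually read. A sketch, assuming the table is partitioned by RANGE (YEAR(`datetime`)) and the column names are placeholders:

```sql
-- The "partitions" column of the EXPLAIN output shows which
-- partitions will be scanned for this query.
EXPLAIN SELECT * FROM `values`
WHERE `datetime` BETWEEN '2020-06-01' AND '2020-06-30';

-- INSERT statements are unchanged: MySQL places each row in the
-- partition matching its `datetime` value.
INSERT INTO `values` (station_id, sensor_id, `value`, `datetime`)
VALUES (42, 1, 21.5, NOW());
```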

Which of the two solutions described above (manual archiving or partitioning) is preferred, and why?

mysql · partitioning · archive
