Querying Empty Space on Redshift
If you’ve been using Amazon Redshift for a bit, you may be wondering how much empty space you have left on your Redshift cluster. This is important since you probably want to plan ahead before your cluster gets full and you have to upgrade to multiple clusters. Here are some options:
- rMonitor Redshift Storage via CloudWatch
- Check through “Performance” tab on AWS Console
- Query Redshift directly
Monitor Redshift Storage via CloudWatch
The easiest way to automatically monitor your Redshift storage is to set up CloudWatch Alerts when you first set up your Redshift cluster (you can set this up later as well).
This creates a CloudWatch Alarm, which you can later edit in your AWS Console. You can set thresholds such as 80% disk usage to trigger an alert. By putting in an email address into “Recipients” you can get notifications when this Alert is triggered.
When you put in an email address, make sure you click on the “Confirm subscription” link. Otherwise you won’t get any notification. On the CloudWatch side, here is what your PercentageDiskSpaceUsed might look like.
Check through “Performance” tab on AWS Console
Another quick option is to go to your AWS Console. After clicking on your Redshift cluster, you can go to the “Performance” tab and scroll to the bottom. There you will see a graph showing how much of your Redshift disk space is used. The image below is an example of a relatively empty cluster.
Query Redshift directly
Finally, you can directly query your Redshift cluster to check your disk space used. You can use the `stv_partitions` table and run a query like this:
select sum(capacity)/1024 as capacity_gbytes, sum(used)/1024 as used_gbytes, (sum(capacity) - sum(used))/1024 as free_gbytes from stv_partitions where part_begin=0;
Which will return something like this:
capacity_gbytes | used_gbytes | free_gbytes -----------------+-------------+------------- 372 | 11 | 360
Now, you might be surprised to see a capacity that is larger than what you thought you got with your cluster size. I certainly was (I expected 160GBs for the above cluster). We asked AWS Support what this was, and it turns out that the query gives disk space that includes space reserved for running the cluster (e.g., buffering data to run queries, etc.). Contrast this to the PercentageDiskSpaceUsed measure, which excludes such reserved disk space. This is why your Redshift cluster might be considered “Healthy” even if your PercentageDiskSpaceUsed hits 100%.
|Disk Space Provisioned for User Data||Disk Space Provisioned for Running Redshift|
|Querying stv_partitions using `capacity` and `used` columns||Included||Included|
It is best to use CloudWatch for automatically monitoring your disk space usage, but for quickly checking your cluster, queries against your cluster may come in handy.
How FlyData Can Help
FlyData provides continuous, near real-time replication between RDS, MySQL and PostgreSQL databases to Amazon Redshift. The FlyData Sync tool is an intuitive, powerful, cost-effective way to integrate your transactional databases and analytical data stores in a single interface with no manual scripting.
You can start a 14-day Free Trial and begin syncing your data within minutes. For questions about FlyData and how we can help accelerate your use-case and journey on Amazon Redshift, connect with us at firstname.lastname@example.org.