Querying Empty Space on Redshift

If you’ve been using Amazon Redshift for a bit, you may be wondering how much empty space you have left on your Redshift cluster. This is important since you probably want to plan ahead before your cluster gets full and you have to upgrade to multiple clusters. Here are some options:

  • rMonitor Redshift Storage via CloudWatch
  • Check through “Performance” tab on AWS Console
  • Query Redshift directly

Monitor Redshift Storage via CloudWatch

The easiest way to automatically monitor your Redshift storage is to set up CloudWatch Alerts when you first set up your Redshift cluster (you can set this up later as well).

CloudWatch for Redshift

This creates a CloudWatch Alarm, which you can later edit in your AWS Console. You can set thresholds such as 80% disk usage to trigger an alert. By putting in an email address into “Recipients” you can get notifications when this Alert is triggered.

AWS CloudWatch Confirm Email

When you put in an email address, make sure you click on the “Confirm subscription” link. Otherwise you won’t get any notification. On the CloudWatch side, here is what your PercentageDiskSpaceUsed might look like.

CloudWatch Monitoring Redshift Diskspace-1

CloudWatch Monitoring Redshift Disk Space-2

Check through “Performance” tab on AWS Console

Another quick option is to go to your AWS Console. After clicking on your Redshift cluster, you can go to the “Performance” tab and scroll to the bottom. There you will see a graph showing how much of your Redshift disk space is used. The image below is an example of a relatively empty cluster.

Query Redshift directly

Finally, you can directly query your Redshift cluster to check your disk space used. You can use the `stv_partitions` table and run a query like this:

select
  sum(capacity)/1024 as capacity_gbytes, 
  sum(used)/1024 as used_gbytes, 
  (sum(capacity) - sum(used))/1024 as free_gbytes 
from 
  stv_partitions where part_begin=0;

Which will return something like this:

capacity_gbytes  | used_gbytes | free_gbytes
-----------------+-------------+-------------
372              | 11          | 360

Now, you might be surprised to see a capacity that is larger than what you thought you got with your cluster size. I certainly was (I expected 160GBs for the above cluster). We asked AWS Support what this was, and it turns out that the query gives disk space that includes space reserved for running the cluster (e.g., buffering data to run queries, etc.). Contrast this to the PercentageDiskSpaceUsed measure, which excludes such reserved disk space. This is why your Redshift cluster might be considered “Healthy” even if your PercentageDiskSpaceUsed hits 100%.

Disk Space Provisioned for User Data Disk Space Provisioned for Running Redshift
PercentageDiskSpaceUsed Included Not Included
Querying stv_partitions using `capacity` and `used` columns Included Included

Summary

It is best to use CloudWatch for automatically monitoring your disk space usage, but for quickly checking your cluster, queries against your cluster may come in handy.

How FlyData Can Help

FlyData provides continuous, near real-time replication between RDS, MySQL and PostgreSQL databases to Amazon Redshift.  The FlyData Sync tool is an intuitive, powerful, cost-effective way to integrate your transactional databases and analytical data stores in a single interface with no manual scripting.

You can start a 14-day Free Trial and begin syncing your data within minutes.  For questions about FlyData and how we can help accelerate your use-case and journey on Amazon Redshift, connect with us at sales@flydata.com.


About Alex Weber:
Growth Manager at FlyData Inc. Passionate about analytics, copywriting, and UX. Follow me on Twitter: @alexweberk. Loves iced coffee.

FlyData handles real-time replication for Amazon RDS and Aurora, MySQL and PostgreSQL.

Get set up in minutes. Start uncovering data to make faster, better business decisions today.

By using this website you agree to accept our Privacy Policy and Terms & Conditions