In this guide, we’ll go over the Redshift COPY command: how to use it to import data into your Amazon Redshift database, its syntax, and a few common issues you may run into.

What is the Redshift COPY command?

The Redshift COPY command, funnily enough, copies data from one source and loads it into your Amazon Redshift database. The source can be one of the following items:

  1. An Amazon S3 bucket (the most common source)
  2. An Amazon EMR cluster
  3. An Amazon DynamoDB table
  4. An external host (via SSH)

If your table already has data in it, the COPY command appends the new rows to it.
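
For example, a minimal sketch of loading from a DynamoDB table might look like this (the table names and role ARN are placeholders, and READRATIO caps how much of the DynamoDB table's provisioned read capacity the load may consume):

-- hypothetical table names; replace the role ARN with your own
copy favoritemovies
from 'dynamodb://ProductCatalog'
iam_role 'arn:aws:iam::<aws-account-id>:role/<role-name>'
readratio 50;

Note that running the same COPY twice appends the rows twice; if you want a clean reload, truncate the target table first.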

Redshift COPY: Syntax & Parameters

The COPY command is pretty simple. To use it, you need three things:

  1. The name of the table you want to copy your data into
  2. Your data source (see list of potential data sources above)
  3. Authorization to access your data source (usually either an IAM role or the access ID and secret key of an IAM user)

Here’s a simple example that copies data from a text file in S3 to a table in Redshift:

copy catdemo
from 's3://awssampledbuswest2/tickit/category_pipe.txt'
iam_role 'arn:aws:iam::<aws-account-id>:role/<role-name>'
region 'us-west-2';
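
The example above authorizes with an IAM role. If you’d rather use an IAM user’s access keys (the other option from the list above), a minimal sketch looks like this, with placeholder values for the keys:

copy catdemo
from 's3://awssampledbuswest2/tickit/category_pipe.txt'
-- placeholder credentials; never paste real keys into shared scripts
access_key_id '<access-key-id>'
secret_access_key '<secret-access-key>'
region 'us-west-2';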

If the source file doesn’t naturally line up with the table’s columns, you can specify the column order by including a column list in your COPY command, like so:

copy catdemo (column1, column2, etc.)
from 's3://awssampledbuswest2/tickit/category_pipe.txt'
iam_role 'arn:aws:iam::<aws-account-id>:role/<role-name>'
region 'us-west-2';

By default, COPY assumes that your source is a UTF-8, pipe-delimited text file. If it isn’t, you need to say so with the FORMAT AS parameter. You can learn more about the exact usage in the AWS documentation for COPY. If you have any questions, let us know in the comments!
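
For example, here’s a rough sketch of loading a comma-separated file with a header row instead of the default pipe-delimited format (the bucket path and role ARN are placeholders):

copy catdemo
from 's3://<your-bucket>/tickit/category.csv'
iam_role 'arn:aws:iam::<aws-account-id>:role/<role-name>'
region 'us-west-2'
format as csv     -- source is CSV, not pipe-delimited
ignoreheader 1;   -- skip the header row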

How to load data from different S3 regions

You may run into trouble when copying data from your own S3 bucket to your Redshift cluster. This can easily happen when the S3 bucket was created in a different region than your Redshift cluster. The error message Redshift returns isn’t exactly the clearest and can be very confusing. Fortunately, the error is easy to avoid by adding one extra parameter.

ERROR: S3ServiceException

For a regular COPY command to work without any special options, the S3 bucket needs to be in the same region as the Redshift cluster.

flydataherokusandboxcluster=# copy one_column ("number") from 's3://test-buckets/region_test' CREDENTIALS 'aws_access_key_id=XXXXXXXXXX;aws_secret_access_key=XXXXXXXXXXX' delimiter '\t';

If the bucket is in a different region, you may get an error similar to this:

ERROR:  S3ServiceException:The bucket you are attempting to access must be addressed using the specified endpoint. Please send all future requests to this endpoint.,Status 301,Error PermanentRedirect,Rid 
DETAIL:
  -----------------------------------------------
  error:  S3ServiceException:The bucket you are attempting to access must be addressed using the specified endpoint. Please send all future requests to this endpoint.,Status 301,Error PermanentRedirect,Rid 
  code:      8001
  context:   Listing bucket=test-buckets prefix=region_test
  query:     12715653
  location:  s3_utility.cpp:529
  process:   padbmaster [pid=31787]
  -----------------------------------------------

The error message is a bit ambiguous, but what it’s actually saying is that you need to add an option to your COPY command specifying your S3 bucket’s region.

Specifying the S3 Region

To do this, add the REGION option to your COPY command. For example, if our S3 bucket is in the us-west-2 region, we add it to our COPY command like so:

flydataherokusandboxdbcluster=# copy one_column ("number") from 's3://test-buckets/region_test' CREDENTIALS 'aws_access_key_id=XXXXXXXXXX;aws_secret_access_key=XXXXXXXXXXX' delimiter '\t' region as 'us-west-2';

INFO:  Load into table 'one_column' completed, 1 record(s) loaded successfully.
COPY

This can also be used to verify your Redshift cluster’s region if you aren’t sure of it: if a COPY succeeds without the REGION option, your Redshift cluster and S3 bucket are in the same region.
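
If a COPY fails for some other reason (a bad delimiter, malformed rows, and so on), one place to look is the STL_LOAD_ERRORS system table. Here’s a quick sketch of a query against it; the column selection is just a suggestion:

-- show the most recent load errors and why each row was rejected
select starttime, filename, line_number, colname, err_reason
from stl_load_errors
order by starttime desc
limit 10;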

