Now you’re ready to set up your Amazon Redshift Cluster. Here are the steps to do that.
Launch the Redshift Cluster
If you already have a running Amazon Redshift cluster you want to use, please note that you will still have to follow the last step (security policy).
From your AWS console, go to the Redshift page. Change the region to US-EAST-1 (N. Virginia),Click the “Launch Cluster” button as shown here:
Next you are shown a page to input cluster details.
Input your information here. Please note that the information that is inputted here is important and needs to be noted. We recommend naming your database as well as creating a unique Master username and password. If no database name is created, the default database name will be “dev”. After you are done, click the “Continue” button.
On the node configuration page, you are asked to specify node type and cluster size.
When choosing your cluster size, you can pick from a single cluster to a multi-node cluster. We recommend a multi-node cluster for better query performance, see our benchmarks here. After you decide, click the “Continue” button.
At the Additional Configuration screen, you are met with many more options.
You can change the cluster parameter group, encrypt your database, change the availability zone, and change the security of your cluster. Encrypting your database will affect the performance of your cluster. Click “Continue” after you are finished.
Finally, review all your settings on this review page. You may go back to edit any of these settings or click the “Launch Cluster” button to launch your cluster. Deploying the cluster may take some time so please be aware of that.
After the cluster is deployed, you may see the cluster information of the deployed cluster.
Take note of your endpoint under Cluster Database Properties, it will be used to access your Redshift cluster through your Postgres client and used in setting up FlyData.
Before you can continue, you will have to edit your security group that you have your cluster under.
If you have “Security Groups” item on “Security” menu of Amazon Redshift Cluster (No VPC)
For FlyData access, add a new EC2 Security Group connection type. Select that option from the pull down menu and add the following:
- AWS Account ID: 061397855422
- Security group: flydata
To add, click the add button. Please note that to access your Redshift cluster via your own Postgres client, you will also need to add another connection type. If you are accessing from your local environment, for example, you will need to add your own IP address via the CIDR/IP connection type option.
If you see the following message on “Security” of Amazon Redshift Cluster (using Default VPC)
Go to “VPC” from the AWS Console, and edit the security group that you chose when launching your Redshift cluster. Add TCP inbound rules to allow the access from FlyData servers. The IP addresses which need to be set are shown in the FlyData setting page.
- Port range: [Your Redshift Cluster Port] Source: [IP Address on FlyData Setting Page]
Create a Table
If you are planning to use FlyData to transfer information from CSV and TSV files to an Amazon Redshift cluster, you will first have to create the table. Because of the way Amazon Redshift is, you can connect to your cluster to create a table via many PostgreSQL clients. The psql client is a good way to connect to Amazon Redshift and you can create a table with some SQL know-how. For more detailed information on creating tables for Redshift, you can take a look at their documentation. By creating your own table, you can optimize each column the way you want as well as set distribution keys and sort keys. You may also set columns to make sure that there is always a value.
However, if you are using JSON data, we can automatically create a table for you. This will only happen if the table does not already exist. For this to happen as accurately as possible though, you should output data that matches your table as closely as possible. This would mean that the JSON data would need to be output in the desired order as well as include entries of the right type. For instance, if we see the value is recognized by JSON as an integer, we will automatically create a column that is an integer type. We support columns for int, bigint, float, varchar, timestamp, and date. We will create a regular table and a _dev table for you this way.
If you are creating a table with a timestamp or date data type, please keep in mind that you will have to format these types in a specific format for Amazon Redshift.
For timestamp, it will need to be in a “YY-MM-DD HH:mm:SS” string.
For date, it will need to be in a “YY-MM-DD” string