FlyData™ automatically loads your data to Amazon Redshift continuously and securely in a matter of minutes. At FlyData we pride ourselves on helping our new and existing customers achieve the best Amazon Redshift replication strategy in the industry. While we offer many features that make replication easier and more efficient, we are constantly improving our platform and publishing articles to help customers get more out of their data replication to Amazon Redshift.
In this article we are going to talk about splitting your tables from Amazon RDS/MySQL into separate applications that use the same Amazon Redshift target. Splitting your tables between applications can significantly improve replication performance. FlyData already offers near-realtime replication, but let’s see how we can squeeze out a bit more performance by splitting our tables between two separate applications.
Let’s assume that you are going through the initial setup. In this example I have one database named “test” and two tables named “Employees” and “Persons”. The content of these tables is not important, since the goal is to demonstrate the process. Once you log in to the FlyData Console, click the Get Started button to set up your data source and target.
In this step, you will enter all of the required information for your Amazon Redshift cluster. Be sure to follow the instructions carefully so that your Amazon Redshift endpoint connects properly. Once you’ve connected, you’ll be prompted to add your Amazon RDS/MySQL data source; follow the instructions to set it up. Make a note of the database name and the table you’d like to sync. In this instance we will connect Amazon RDS/MySQL and choose the “Persons” table from the “test” database to sync.
Once you click “Register”, FlyData will connect to Amazon RDS/MySQL and your initial sync will begin. Unless there are issues with your data structure, this process completes automatically without any action on your part. Next, click “Settings” to open the application settings, where we can add another application to handle the other table, “Employees”.
Click “Add a new Application” and give it whatever name you prefer; in this case, we’re simply calling it “Application 2”.
Once you create the second application, you will again be prompted for your Amazon Redshift settings. Since we want to target the same Amazon Redshift cluster, choose the existing entry from the drop-down menu.
Click the “Connect” button to connect your existing Amazon Redshift target to “Application 2”. Once it connects successfully, you will be prompted to go through the data source wizard again. Use the same settings you used to set up the data source in Application 1, but this time enter your second table name in the table field, in this case “Employees”.
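The rule behind this step is that both applications share one data source and one Amazon Redshift target while each syncs its own tables. The following Python sketch, assuming each table should belong to exactly one application, checks that rule for a list of application settings. `AppConfig`, `check_split`, and the endpoint and target values are hypothetical names for illustration only; FlyData does not expose its console settings through this API.

```python
from dataclasses import dataclass


# Hypothetical model of the values entered in the console wizard (not a FlyData API).
@dataclass(frozen=True)
class AppConfig:
    name: str
    redshift_target: str    # the Redshift entry chosen from the drop-down menu
    source_host: str        # Amazon RDS/MySQL endpoint
    source_database: str    # "test" in this example
    tables: frozenset[str]  # tables this application syncs


def check_split(apps: list[AppConfig]) -> list[str]:
    """Return a list of problems; an empty list means the split looks consistent."""
    problems = []
    targets = {app.redshift_target for app in apps}
    sources = {(app.source_host, app.source_database) for app in apps}
    if len(targets) > 1:
        problems.append(f"applications use different Redshift targets: {sorted(targets)}")
    if len(sources) > 1:
        problems.append(f"applications use different data sources: {sorted(sources)}")
    # Remember which application first claimed each table.
    owner: dict[str, str] = {}
    for app in apps:
        for table in sorted(app.tables):
            if table in owner:
                problems.append(f"table {table!r} is synced by both {owner[table]!r} and {app.name!r}")
            else:
                owner[table] = app.name
    return problems


# Placeholder endpoint and target names.
app1 = AppConfig("Application 1", "my-redshift", "my-rds-host", "test", frozenset({"Persons"}))
app2 = AppConfig("Application 2", "my-redshift", "my-rds-host", "test", frozenset({"Employees"}))
print(check_split([app1, app2]))  # → []
```

Reporting every problem as a list, rather than stopping at the first one, makes it easier to fix a larger split with many applications in one pass.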
Once you have everything entered, click “Register” to begin the sync for “Application 2”, which will replicate the “Employees” table to your existing Amazon Redshift cluster in a separate process. When this completes, you will see a confirmation page.
Now that you’ve created two separate applications and split your tables, you will receive emails notifying you of each initial sync. A second set of emails will confirm that the sync succeeded and that continuous sync is now in place.
Now let’s make sure it works! From the FlyData Console, click the “Access Redshift” link and check that both tables were replicated properly to Amazon Redshift.
As you can see, both tables, “Employees” and “Persons”, were replicated successfully. As a final step, let’s make sure the data has been populated.
Running a simple SELECT query on each table shows that the data was replicated. Success!
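If you prefer to script this check rather than run the queries by hand, the following Python sketch runs `SELECT COUNT(*)` against each replicated table over any DB-API 2.0 connection and reports which tables are still empty. The helpers `row_counts` and `empty_tables` are hypothetical, and the sketch assumes the tables are reachable by their unquoted names on the connection's default search path (Redshift folds unquoted identifiers to lowercase). The connection details in the commented usage are placeholders.

```python
import re

# Accept plain unquoted identifiers only, so a table name cannot inject extra SQL.
_IDENTIFIER = re.compile(r"[A-Za-z_][A-Za-z0-9_]*")


def row_counts(conn, tables):
    """Return {table: row count} by running SELECT COUNT(*) on each table.

    `conn` can be any DB-API 2.0 connection, for example one from psycopg2.connect().
    """
    counts = {}
    cur = conn.cursor()
    try:
        for table in tables:
            if not _IDENTIFIER.fullmatch(table):
                raise ValueError(f"unexpected table name: {table!r}")
            cur.execute(f"SELECT COUNT(*) FROM {table}")
            counts[table] = cur.fetchone()[0]
    finally:
        cur.close()
    return counts


def empty_tables(counts):
    """List the tables that returned zero rows."""
    return [table for table, n in counts.items() if n == 0]


# Example usage against Redshift (placeholders in angle brackets):
# import psycopg2
# conn = psycopg2.connect(host="<cluster-endpoint>", port=5439, dbname="<database>",
#                         user="<user>", password="<password>")
# counts = row_counts(conn, ["Employees", "Persons"])
# print(counts, empty_tables(counts))
```

Table names are interpolated into the SQL because DB-API placeholders cannot stand in for identifiers; the regular expression check keeps that interpolation safe.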
This example was very simple and used very little data, but splitting the tables between two separate applications distributes the replication load across them, which can improve sync performance. In the end, near-realtime replication becomes even faster! Results will vary with your use case, but this technique is ideal when you have a very large data source with many tables and you’d like to speed up replication to Amazon Redshift. As an added bonus, splitting up large data sources like this can make it easier for us to diagnose potential problems if you experience any.
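To decide which tables go into which application when you have many of them, one simple approach is a greedy largest-first split: sort tables by an estimated weight and assign each one to the currently lightest application. The Python sketch below shows this; `split_tables` is a hypothetical helper, not a FlyData feature, and the weights are whatever proxy for replication load you choose (for example, table size or how often a table changes). Each resulting group would then become one application, set up in the console as described earlier in this article.

```python
import heapq


def split_tables(table_weights: dict[str, int], n_apps: int) -> list[list[str]]:
    """Assign tables to n_apps applications, largest weight first.

    Each table goes to the application whose total weight is currently smallest.
    """
    if n_apps < 1:
        raise ValueError("n_apps must be at least 1")
    # Min-heap of (total_weight, app_index); the lightest application pops first.
    heap = [(0, i) for i in range(n_apps)]
    groups: list[list[str]] = [[] for _ in range(n_apps)]
    # Heaviest tables first; ties are broken by name so the output is deterministic.
    for table, weight in sorted(table_weights.items(), key=lambda kv: (-kv[1], kv[0])):
        total, i = heapq.heappop(heap)
        groups[i].append(table)
        heapq.heappush(heap, (total + weight, i))
    return groups


# Hypothetical weights; only "Persons" and "Employees" come from the example in this article.
weights = {"orders": 900, "events": 700, "users": 300, "Persons": 200, "Employees": 100}
print(split_tables(weights, 2))
# → [['orders', 'Persons'], ['events', 'users', 'Employees']]
```

With these example weights, both applications end up with a total weight of 1,100. Greedy assignment does not guarantee the most even split in every case, but it is easy to reason about and usually close enough for planning.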
We hope you enjoyed this article and found it useful. If you have any questions or need further assistance, please email us at email@example.com and we’ll respond in a flash!