At FlyData, we get inquiries from users about replicating database shards to Amazon Redshift. This article shows how to set up FlyData Sync to replicate database shards to Amazon Redshift.

Sample Database

shard

Setup

To replicate database shards, you first create a FlyData Sync application for each shard.

Setup FlyData applications

Create a new application for each database shard. Specify a Redshift schema name in the Redshift schema name field . To avoid table name conflict, the schema name must be unique for each shard. The Redshift schemas must be created beforehand.
shard_wizard

Once you set up applications for all shards, wait and relax until initial sync completes for all applications.

Create a view on Redshift

Once Sync started for all shards, you’re ready to access data on Redshift. You can access the tables as a single table by defining a view. CREATE VIEW command looks like this:

redshift> CREATE VIEW full_pageviews AS
SELECT * FROM shard1.pageviews UNION
SELECT * FROM shard2.pageviews UNION
SELECT * FROM shard3.pageviews;

Once the view is set up, querying against the view will return results from all shard tables.

Frequently Asked Questions

Q: Does sync’ing shards require any special settings or steps?

A: The only additional requirement is to use a different Redshift schema for shard. The rest is the same as setting up sync for a regular table.

Q: My shard tables have its own primary key column but they’re not unique across the tables. Can I replicate them using FlyData Sync?

A: Yes. You can sync shard tables as normal tables and create a view even if there is no unique key across tables.

Q: What if table schema changes on the source database?

A: FlyData Sync automatically replicates table schema changes such as ALTER TABLE ADD/DROP COLUMN to corresponding Redshift tables. For the view to pick up the change, however, you need to recreate the view.

redshift> DROP VIEW full_pageviews;
redshift> CREATE VIEW full_pageviews AS
SELECT * FROM shard1.pageviews UNION
SELECT * FROM shard2.pageviews UNION
SELECT * FROM shard3.pageviews;

NOTE: In case of ALTER TABLE DROP column, Sync cannot apply the change on the Redshift table if the view exists. You need to drop the view for FlyData Sync to apply the change and recreate after the change is applied on the Redshift tables.


About Mak Inada:

Ready to integrate to Redshift?

See if FlyData is right for you. No hassle. Free consultation.