Looking for the right database technology to use? Luckily there are many database technologies to choose from, including relational databases (MySQL, Postgres), NoSQL (MongoDB), columnar databases (Amazon Redshift, BigQuery), and others. Each choice has its own pros and cons, but today let’s walk through how columnar databases are unique, by comparing it against the more traditional row-oriented database (e.g., MySQL).

database plan
 

Image Source

Row-Oriented Database

Traditional databases store data by each row. The fields for each record are sequentially stored. Let’s say you have a table like this:

+----+--------------+----------------------+----------+-------------+------------+-----------+-----+
| ID | name         | address              | zip code | phone       | city       | country   | age |
+----+--------------+----------------------+----------+-------------+------------+-----------+-----+
|  1 | Benny Smith  | 23 Workhaven Lane    | 52683    | 14033335568 | Lethbridge | Canada    |  43 |
|  2 | Keith Page   | 1411 Lillydale Drive | 18529    | 16172235589 | Woodridge  | Australia |  26 |
|  3 | John Doe     | 1936 Paper Blvd.     | 92512    | 14082384788 | Santa Clara| USA       |  33 | 
+----+--------------+----------------------+----------+-------------+------------+-----------+-----+

This two-dimensional table would be stored in a row-oriented database like this:

1,Benny Smith,23 Workhaven Lane,52683,14033335568,Lethbridge,Canada,43;2,Keith Page,1411 Lillydale Drive,18529,16172235589,Woodridge,Australia,26;3,John Doe,1936 Paper Blvd.,92512,14082384788,Santa Clara,USA,33;

As you can see, a record’s fieldsare stored one by one, then the next record’s fields are stored, then the next, and on and on…

Columnar Database

Contrast the above with how a columnar database would store this data:

1,2,3;Benny Smith,Keith Page,John Doe;23 Workhaven Lane,1411 Lillydale Drive,1936 Paper Blvd.;52683,18529,92512;14033335578,16172235589,14082384788;Lethbridge,Woodridge,Santa Clara;Canada,Australia,USA;43,26,33;

Each field is stored by the column so that each ‘id’ is stored, then the ‘name’ column, then the ‘zip codes’, etc. So what implications are there when storing data in a column-oriented fashion?

The Advantages of Columnar Databases, and the Disadvantages

The primary benefit you can get by storing data in a column-oriented database is that some of your queries could become really fast. Imagine, for example, that you wanted to know the average age of all of your users. Instead of looking up the age for each record row by row (row-oriented database), you can simply jump to the area where the “age” data is stored and read just the data you need. So when querying, columnar storage lets you skip over all the non-relevant data very quickly. Row-oriented:

1,Benny Smith,23 Workhaven Lane,52683,14033335568,Lethbridge,Canada,43;2,Keith Page,1411 Lillydale Drive,18529,16172235589,Woodridge,Australia,26;3,John Doe,1936 Paper Blvd.,92512,14082384788,Santa Clara,USA,33;

Columnar:

(skip) 43,26,33;

Hence, aggregation queries (queries where you only need to lookup subsets of your total data) could become really fast compared to row-oriented databases. Further, since the data type for each column is similar, you get better compression when running compression algorithms on each column (which would make queries even faster). And this is accentuated as your dataset becomes larger and larger. You might be thinking, “Well, what if I needed to query multiple values for each user? Aggregation is great, but my app needs to show data for each individual user??” You would be spot on. There are many cases where you actually do need multiple fields from each row. And columnar databases are generally not great for these types of queries. The more fields you need to read per record, the less benefits you get from storing in a column-oriented fashion. In fact, if your queries are for looking up user-specific values only, row-oriented databases usually perform those queries much faster.

aws
 

Image Source
Another thing to consider about columnar storage is that writing new data could take more time. If you’re inserting a new record into a row-oriented database, you can simply write that in one operation. But if you’re inserting a new record to a columnar database, you need to write to each column one by one. As a result, loading new data or updating many values in a columnar database could take much more time (perhaps, more than you expect). That’s why you would usually want a row-oriented database like MySQL running the back-end of your web app, etc. And once your app becomes huge, you would also want to consider having a columnar database like Amazon Redshift to run your BI (business intelligence) analytics queries (which usually consist of aggregation queries). We’ve seen many companies that make mobile games or web apps go through this same transition.

Conclusion

To summarize, columnar databases are good for:

  • Queries that involve only a few columns
  • Aggregation queries against vast amounts of data
  • Column-wise compression

But are not so good at:

  • Incremental data loading
  • Online Transaction Processing (OLTP) usage
  • Queries against only a few rows

In this post, we very quickly skimmed through the differences between columnar and row-oriented databases, but for those of you that want to dive in further, visit these resources that have in-depth explanations into columnar databases:

How FlyData Can Help

To integrate the transactional functions of your RDS, MySQL or PostgreSQL databases with the analytical performance of Amazon Redshift, FlyData provides continuous, near real-time replication between your various endpoints.  FlyData Sync is an extremely intuitive yet powerful, cost-effective way to integrate your transactional databases and analytical data stores in a single interface with no manual scripting.

You can start a 14-day Free Trial and begin syncing your data within minutes.  For questions about FlyData and how we can help accelerate your use-case and journey on Amazon Redshift, connect with us at sales@flydata.com.


About Alex Weber:
Growth Manager at FlyData Inc. Passionate about analytics, copywriting, and UX. Follow me on Twitter: @alexweberk. Loves iced coffee.

FlyData handles real-time replication for all of the major RDS systems, including mySQL, PostgreSQL, Amazon RDS, Amazon Aurora, MariaDB, and Percona.

With an easy one-time setup, you can focus on uncovering data to make business decisions.

No Credit Card Required

By using this website you agree to accept our Privacy Policy and Terms & Conditions