In our very first post in this series, we discussed the difference between OLTP and OLAP data structures. And, how valuable to analytics it is to have data in OLAP data structure. In this post, we’ll build on this knowledge and discuss where data used for analytical purposes can live. Before, we dive into the various options for storage, we need to take a quick pitstop in database design.
We’ve covered data structures, the power of integrations, what it takes to get data transformed and ready for storage, so naturally we must discuss how data is stored. Admittedly, this topic could lead us down a very, very technical path including discussions like the use of unique and primary keys, and fact and dimension tables. But, in an attempt to keep this blog (and series) bite sized, we’ll hold off on the full court data press and keep our scope limited to the two types of data storage.
Data Warehouses vs. Data Lakes
By now, you know that the place for cross software system analysis is not inside of one of your business applications. You get that there needs to be a data Switzerland of sorts. And, likely you’ve heard the terms “Data Warehouse” and “Data Lake” thrown around. But, what’s the difference? And, more importantly, how do you decide what is right for your organization?
Let’s start with data warehouses. For definition’s sake, a data warehouse is a highly structured way to organize data into tables and views. Data warehouses are fact and dimension tables’ stomping grounds and where OLAP plants its roots. Once you have built an OLAP data model or given yourself a head start by partnering with a pre-built analytics solution, the data warehouse is the place where data is stored ready to answer your questions. If a data model is well-built, it has already taken into account the types of questions the audience will want answered.
We could spend paragraphs on data warehouses; however, we can cut the gist a bit quicker if we compare a data warehouse to a data lake. A data lake is a pool of data that is not yet structured. In many cases, data housed in a data lake doesn’t have a purpose yet. This is where a copy of ERP production data could be stored knowing that in the future it will need to be harvested and used for reporting, analytics or machine learning.
So, which do you need?
If we had to choose, we’d pick data warehouse. You’re a non-technical business leader who is in need of insights and answers to better run your business. So, you likely know the questions you are looking to solve. The only thing standing in your way is storing your data in a central location and synthesizing it so it’s readily available for any analysis. Data lakes can be valuable for organizations trying to store data for use in the future. If you are ready to take action around your data, you’ll likely need to push for a warehouse.
Getting your hands on a data warehouse is easier said than done. Especially, one that has the context of the questions you want answers for built in and has anticipated the nuances of your various data sources. In future parts of this series, we’ll explore the benefits and drawbacks of choosing to build your analytics infrastructure internally or partner with a pre-built analytics solution. It’s a decision not meant for IT only. Nor is it a decision that can chiefly be made by the business. So, we will highlight key decision-making points and common pitfalls organizations make when making these critical decisions.
In the meantime...
We’ll wrap up the part of the series by saying we hope you understand a bit more about the effort that goes into answering your valuable business questions. And, hope you’ll join us for the next part where we explore options for delivering analytics value. Or, as one of our data engineers likes to say “the icing on the cake.”