Data lakes have many advantages for companies with a wealth of structured and unstructured data – much of which may not need to be accessed immediately, but will be important for future analytics. This data needs to be stored and managed, yet remain accessible, ideally at an affordable cost.
Data lakes are ideal for this situation because they are:
- Flexible, allowing the collection of data for “just in case” scenarios
- Available “just in time”
- Complementary to an existing Enterprise Data Warehouse (EDW)
- Able to free up existing EDW resources
- Easily scalable
If you’re considering this approach to data management, there are a few things to explore before you get started. Our data lake and data warehousing experts have identified the following four considerations as the most important to work through before implementing a data lake.
- Identify your business requirements
What new revenue streams should be explored by the business? What are the potential impacts of compliance and regulatory requirements going forward? What untapped data sources do you have access to (or would like access to) that hold potential value for your business? How are data volumes from those sources expected to grow?
- Make governance a top priority
Data cataloging is an important principle that is consistently overlooked in business intelligence. Different data nuggets have different value, and that value varies with the lineage of the data, its quality, its source, and similar factors. The data needs to be cataloged so that a data analyst or data scientist can decide for themselves which data to use for a specific analysis.
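To make the cataloging idea concrete, here is a minimal sketch of what a catalog entry might record. The `CatalogEntry` fields, the 0-to-1 quality score, and the sample data set names are all assumptions made for illustration, not the schema of any particular catalog product.

```python
from dataclasses import dataclass, field
from typing import List


@dataclass
class CatalogEntry:
    """One data set registered in the lake's catalog (hypothetical schema)."""
    name: str
    source: str                                        # system that created the data
    lineage: List[str] = field(default_factory=list)   # upstream data sets it was derived from
    quality_score: float = 0.0                         # assumed scale: 0.0 (unvetted) to 1.0 (validated)


def find_usable(catalog, min_quality):
    """Return the entries whose quality score meets the analyst's threshold."""
    return [entry for entry in catalog if entry.quality_score >= min_quality]


# Illustrative entries only; names and scores are made up.
catalog = [
    CatalogEntry("raw_clickstream", "web_logs", [], 0.4),
    CatalogEntry("orders_cleaned", "erp_export", ["raw_orders"], 0.9),
]

usable = find_usable(catalog, min_quality=0.8)
print([entry.name for entry in usable])
```

With source, lineage, and quality recorded next to each data set, an analyst can filter for the data that suits a given analysis without having to ask IT where it came from.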
- Consider whether warehouse integration is necessary
Take a close look at how easily data can be integrated on the analytics platforms you are considering. Distributed and relational database management systems are both here to stay; the name of the game is coexistence and cooperation, not replacing one with the other.
- Don’t forget self-service
In the world of big data analytics, IT needs to take a bystander role. That means the platform you choose must support self-service tools – several tools, ideally – that can be accessed and used by any user, not just data scientists.
Have questions? Contact us today to find out how we can help you build your infrastructure and manage your data more efficiently.