
Art of BI: Top 4 Initial Considerations for Data Lakes

Author: Christian Screen | December 14, 2017

Data lakes have many advantages for companies with a wealth of structured and unstructured data – much of which may not need to be accessed immediately, but will be important for future analytics. This data needs to be stored and managed while remaining accessible, ideally at an affordable cost.

Data lakes are ideal for this situation because they are:

  • Flexible, allowing the collection of data for “just in case” scenarios
  • Inexpensive
  • Available “just in time”
  • Complementary to an existing Enterprise Data Warehouse (EDW)
  • Able to free up existing EDW resources
  • Easily scalable

If you’re considering this approach to data management, there are a few things to weigh before you get started. Our data lake and data warehousing experts have identified the four most important considerations to explore before implementing a data lake.

  1. Identify your business requirements
    What new revenue streams should be explored by the business? What are the potential impacts of compliance and regulatory requirements going forward? What untapped data sources do you have access to (or would like access to) that hold potential value for your business? How are data volumes from those sources expected to grow?
  2. Make governance a top priority
    Data cataloging is an important principle that is consistently overlooked in business intelligence. Different data sets have different value, and that value varies with the lineage of the data, its quality, its source of creation, and so on. The data needs to be cataloged so that a data analyst or data scientist can decide for themselves which data to use for a specific analysis.
  3. Consider whether warehouse integration is necessary
    Take a close look at the ease with which data integration can take place on the analytics platforms you are considering. Distributed and relational database management systems are here to stay; the name of the game is coexistence and cooperation, not replacement of one for the other.
  4. Don’t forget self-service
    In the world of big data analytics, IT needs to take a bystander role. That means the platform you choose must support self-service tools – several tools, ideally – that can be accessed and used by any user, not just data scientists.
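
To make the cataloging point under consideration 2 a little more concrete, here is a minimal sketch of what a catalog entry might record and how an analyst could filter on it. Everything in it is an assumption for illustration: the `CatalogEntry` fields, the 0-to-1 `quality_score` scale, the `find_candidates` helper, and the sample data set names are hypothetical and do not come from any particular catalog product.

```python
from dataclasses import dataclass, field
from typing import List


@dataclass
class CatalogEntry:
    """One cataloged data set in the lake (all field names are illustrative)."""
    name: str
    source: str                                       # system that created the data
    lineage: List[str] = field(default_factory=list)  # upstream data sets
    quality_score: float = 0.0                        # assumed scale: 0.0 poor, 1.0 validated


def find_candidates(catalog: List[CatalogEntry], min_quality: float) -> List[CatalogEntry]:
    """Return entries at or above a quality threshold, highest quality first."""
    matches = [e for e in catalog if e.quality_score >= min_quality]
    return sorted(matches, key=lambda e: e.quality_score, reverse=True)


# Hypothetical sample catalog
catalog = [
    CatalogEntry("web_clicks_raw", "clickstream", [], 0.4),
    CatalogEntry("orders_cleaned", "erp", ["orders_raw"], 0.9),
    CatalogEntry("orders_raw", "erp", [], 0.6),
]

for entry in find_candidates(catalog, min_quality=0.5):
    print(entry.name, entry.source, entry.lineage, entry.quality_score)
```

With source, lineage, and quality recorded next to each data set, an analyst can choose between the raw and cleaned versions of the same data without asking IT, which also supports the self-service goal in consideration 4.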

Have questions? Contact us today to find out how we can help you build your infrastructure and manage your data more efficiently.
