Polybase Feature in SQL 2016

By | In SQL Server | October 18th, 2017

Microsoft’s release of SQL Server 2016 represents a step toward better big data management and real-time analytics for the enterprise. While there are improvements in the 2016 version in speed and data security, enhanced analytics capabilities are among the top reasons to make the switch.

TechRepublic’s Mark Kaelin describes the improvements as a major step toward efficient, fast, real-time analysis of transactional data streams without a separate analytics application. Because it integrates natively with Microsoft’s Business Intelligence (BI) suite, the latest iteration of SQL Server can dramatically expand the data science capabilities of an enterprise without the need to hire costly and highly sought-after Hadoop experts.

If there weren’t already enough reasons to upgrade to SQL Server 2016 Enterprise Edition, Polybase is a built-in feature for big data management that’s rightfully getting a lot of attention. While the feature was previously part of the Analytics Platform System (APS), this is the first time it has been broadly released as part of the enterprise suite. Read on to learn how Polybase works and how it can drive better big data analytics capabilities within your organization.

What Is Polybase?

Polybase is a feature that allows organizations to efficiently connect SQL Server to external data stores, including Hadoop clusters. SQL administrators can write standard T-SQL queries that are relayed to an external data lake and return the results. This eliminates the need for complex Java or MapReduce jobs, which are time-consuming barriers that have historically made big data analysis significantly more complicated for organizations.

Several external data sources can be combined with data stored in SQL Server for multisource analytics, including HDFS on Hadoop and Azure Blob Storage. Polybase has the built-in capability to reach into both external storage options with a single T-SQL query written within SQL Server.
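
As a rough illustration of what such a query can look like, the T-SQL sketch below defines an external table over delimited files in HDFS and joins it to a local table. Every object name, the IP address, the ports, the column list and the file path are hypothetical placeholders, and the `hadoop connectivity` value shown is only an assumption; the right value depends on your Hadoop distribution.

```sql
-- One-time instance setting. The value 7 is an assumption for this sketch;
-- pick the value that matches your Hadoop distribution, then restart SQL Server.
EXEC sp_configure @configname = 'hadoop connectivity', @configvalue = 7;
RECONFIGURE;

-- Point SQL Server at the cluster (hypothetical address and ports).
CREATE EXTERNAL DATA SOURCE HadoopCluster
WITH (
    TYPE = HADOOP,
    LOCATION = 'hdfs://10.10.10.10:8020',
    RESOURCE_MANAGER_LOCATION = '10.10.10.10:8050'  -- optional; allows pushdown to Hadoop
);

-- Describe how the files in HDFS are laid out (comma-delimited text here).
CREATE EXTERNAL FILE FORMAT CsvFormat
WITH (
    FORMAT_TYPE = DELIMITEDTEXT,
    FORMAT_OPTIONS (FIELD_TERMINATOR = ',')
);

-- Map a table definition onto a hypothetical HDFS directory.
CREATE EXTERNAL TABLE dbo.WebClicks_External (
    CustomerId INT NOT NULL,
    Url        NVARCHAR(400) NOT NULL,
    ClickDate  DATE NOT NULL
)
WITH (
    LOCATION = '/data/webclicks/',
    DATA_SOURCE = HadoopCluster,
    FILE_FORMAT = CsvFormat
);

-- Join Hadoop data to a local table (dbo.Customers is hypothetical) in one query.
SELECT c.CustomerName, COUNT(*) AS ClickCount
FROM dbo.Customers AS c
JOIN dbo.WebClicks_External AS w
    ON w.CustomerId = c.CustomerId
WHERE w.ClickDate >= '2017-01-01'
GROUP BY c.CustomerName;
```

Once the external table exists, it can be referenced like any other table, which is why no Java or MapReduce code appears anywhere in the query.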

What Polybase Can Do for You

For many organizations, a lack of Hadoop skills is a barrier to analyzing transactional data sources. The skills gap is among the key obstacles to big data at many organizations, and the shortage of Hadoop and related Java programming knowledge is considered one of the biggest skills crises in tech today. Polybase allows organizations to perform Hadoop analysis with no Hadoop knowledge and no need for additional Hadoop software add-ons. Polybase handles all the work needed to perform multiple actions, including:

  • Querying Hadoop data using T-SQL
  • Querying Azure Blob Storage data using T-SQL
  • Importing data from Hadoop, Azure Blob Storage or Azure Data Lake Storage without a separate import or extract, transform, load (ETL) tool
  • Exporting data to Hadoop, Azure Blob Storage or Azure Data Lake Storage
  • Integrating with Microsoft BI stack or other third-party analytics tools
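
To make the import case concrete, here is a minimal sketch that reads delimited files from Azure Blob Storage and copies a filtered subset into a local table with `SELECT ... INTO`. The storage account, container, credential, table and column names are all hypothetical, the secrets are placeholders, and the sketch assumes the instance's `hadoop connectivity` setting already permits Azure Blob Storage access.

```sql
-- A database master key is required before creating a scoped credential.
CREATE MASTER KEY ENCRYPTION BY PASSWORD = '<strong password here>';

-- IDENTITY is an arbitrary label for Azure Blob Storage; SECRET is the account key.
CREATE DATABASE SCOPED CREDENTIAL AzureStorageCredential
WITH IDENTITY = 'polybase_user', SECRET = '<storage account key>';

-- Hypothetical container "archive" in hypothetical account "examplestorage".
CREATE EXTERNAL DATA SOURCE AzureBlobArchive
WITH (
    TYPE = HADOOP,
    LOCATION = 'wasbs://archive@examplestorage.blob.core.windows.net',
    CREDENTIAL = AzureStorageCredential
);

CREATE EXTERNAL FILE FORMAT PipeDelimited
WITH (
    FORMAT_TYPE = DELIMITEDTEXT,
    FORMAT_OPTIONS (FIELD_TERMINATOR = '|')
);

CREATE EXTERNAL TABLE dbo.Orders2015_External (
    OrderId    INT NOT NULL,
    CustomerId INT NOT NULL,
    OrderTotal DECIMAL(18, 2) NOT NULL
)
WITH (
    LOCATION = '/orders/2015/',
    DATA_SOURCE = AzureBlobArchive,
    FILE_FORMAT = PipeDelimited
);

-- Import: materialize the external rows as an ordinary local table, no ETL tool involved.
SELECT OrderId, CustomerId, OrderTotal
INTO dbo.Orders2015_Local
FROM dbo.Orders2015_External
WHERE OrderTotal > 1000;
```

After the import, `dbo.Orders2015_Local` is a regular SQL Server table that can be indexed and queried by BI tools without touching the external store again.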

3 Polybase Use Cases for Organizations in Any Industry

Polybase acts as a bridge between SQL Server and external databases that are designed for the storage of massive data sets. The most exciting use cases for Polybase are related to improved mobility of data, including unprecedented access to big data sets. While Polybase’s potential isn’t limited to the use cases below, they illustrate some ways it could benefit organizations across industries.

1. Moving Infrequently Needed Data into Hadoop or Azure

Hadoop, Azure Blob Storage and Azure Data Lake Storage are all solutions designed for the efficient storage of large data sets. With Polybase, DBAs who are experienced in T-SQL but lack Hadoop skills can help with cost-effective data hygiene by moving data from SQL Server into Hadoop or Azure if demand for the data sets is minimal.
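
One way this kind of archiving might look in T-SQL is sketched below. It assumes an external data source named `HadoopCluster` and an external file format named `CsvFormat` have already been created; those names, the `dbo.Sales` table, its columns, the cutoff date and the HDFS path are all hypothetical.

```sql
-- Exporting through Polybase is off by default and must be enabled once per instance.
EXEC sp_configure 'allow polybase export', 1;
RECONFIGURE;

-- External table over a hypothetical archive directory in HDFS.
-- HadoopCluster and CsvFormat are assumed to exist already.
CREATE EXTERNAL TABLE dbo.SalesArchive_External (
    SaleId   INT NOT NULL,
    SaleDate DATE NOT NULL,
    Amount   DECIMAL(18, 2) NOT NULL
)
WITH (
    LOCATION = '/archive/sales/',
    DATA_SOURCE = HadoopCluster,
    FILE_FORMAT = CsvFormat
);

-- Export: write the rarely used rows out to Hadoop.
INSERT INTO dbo.SalesArchive_External (SaleId, SaleDate, Amount)
SELECT SaleId, SaleDate, Amount
FROM dbo.Sales
WHERE SaleDate < '2012-01-01';

-- Only after verifying the exported data, remove the rows from local storage.
DELETE FROM dbo.Sales
WHERE SaleDate < '2012-01-01';
```

The archived rows remain queryable through `dbo.SalesArchive_External`, so reports that occasionally need old data can still reach it with plain T-SQL.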

2. Streaming Analytics

Hadoop enables organizations to ingest, store and analyze fast-moving data streams, including data from devices connected to the internet of things (IoT), mobile devices and other sensors. With Polybase, organizations can extend their real-time data science capabilities by querying these streaming data sets directly from SQL Server for intelligence and reporting.

3. Extensible, Fast Data Transfer to SQL

Prior to Polybase, moving data from Hadoop to SQL Server was possible but often challenging due to limited tool availability and data formatting constraints. Polybase makes it simpler and faster than ever to move data into SQL Server for business intelligence activities. Microsoft SQL Server 2016 also offers extensibility for high-demand data movement, including Polybase scale-out groups, which let a cluster of SQL Server instances share high-volume data movement from Hadoop to SQL Server.
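
For readers curious how a scale-out group is assembled, the sketch below shows the general shape of the setup. The head node name `HEADNODE01` and the instance name are hypothetical, and the port shown is the default data movement control channel port as far as we are aware; confirm it against your own installation.

```sql
-- Run on each SQL Server instance that should act as a compute node.
-- Arguments: head node machine name, head node control channel port,
-- and head node SQL Server instance name (all assumptions for this sketch).
EXEC sp_polybase_join_group N'HEADNODE01', 16450, N'MSSQLSERVER';

-- The Polybase engine and data movement services on the compute node
-- must be restarted before the change takes effect.

-- Run on the head node afterward to list the nodes in the group.
SELECT * FROM sys.dm_exec_compute_nodes;
```

Queries are still submitted to the head node as ordinary T-SQL; the group only changes how much hardware participates in moving and processing the external data.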

Polybase Makes Big Data More Actionable

For many organizations, the Polybase feature has the potential to turn transactional data stored in Hadoop or Azure Blob Storage into actionable intelligence by making it possible to perform analytics with simple, fast queries written within SQL Server. Microsoft offers a free, 180-day trial of SQL Server 2016, allowing organizations to test Polybase and other new features with a minimal cost commitment.

When coupled with other improvements within SQL Server 2016, Polybase has the potential to make data more powerful for users across the enterprise. Because it improves communication between the SQL Server engine and external data stores, your organization can stop drowning in big data and improve its analytical capabilities.

To learn more about why Microsoft SQL Server 2016 has been called the best in this family of technology, download Datavail’s white paper “Making the Move to SQL Server 2016” today.

Eric Russo
Senior Vice President of Database Services
Eric Russo is SVP of Database Services, overseeing all of Datavail’s database practices, including project and managed services for MS SQL, Oracle, Oracle EBS, MySQL, MongoDB, SharePoint and DB2. He is also the Product Owner for Datavail Delta, a database monitoring tool. He has 21 years’ experience in technology, including 16 years in database management. His management success and style have attracted top DBAs from around the world to create one of the largest and most talented SQL Server teams. He has been with Datavail since 2008; before that, he worked as DBA Manager at StrataVia, Senior Web Developer at Manifest Information Systems and SQL Server DBA at Clark County, Nevada.
