Select Page

Merging the Hybrid World of Blobs and SQL Server 2016 with Polybase

Author: Eric Russo | | July 26, 2016

There are a number of new capabilities that have been introduced in SQL Server 2016. One of the features, that is not often discussed, is a tool called Polybase. I’ve personally been familiar with the general concept of Polybase for more than 3 years since it was first introduced with the PDW Appliance (MPP Architecture – now APS), but I’ve never been very well-versed on its functionality. It appears that Polybase has come a long way and I feel that integrating it with SQL Server 2016 was an excellent move on Microsoft’s part.

Why do I say that? Let’s answer that question by going over the purpose of Polybase. Assume you have data in structured (database like SQL Server) and unstructured (Hadoop or Blobs) formats. To make your databases system as efficient as possible, you need to pull data from both these worlds to create a single useable view – in your current database environment, this is quite a daunting task. Many customers solve this problem by using something like a SCOOP connector to Hadoop with an ETL package to bring the data from the Hadoop system into SQL Server or their data warehouse. From there, they build the needed reports.

Even though the above process works, Polybase can make it simpler by removing the need to have an ETL middle tier to transfer the data. Polybase gives you a query interface from SQL Server via T-SQL that can query both structured and unstructured data from SQL Server directly. Now you can join data sitting inside Hadoop and SQL Server in one single query and get the output. The ability to transfer data that is stored on an Azure Blob like this is impressive and incredibly useful for DBAs looking for more efficient solutions.

Now that we understand the functionality and benefits of Polybase, let’s review the TSQL Script I used to do the setup.

CREATEDATABASE Polybase

GO

USE Polybase

GO

EXECsp_configure'hadoop connectivity',7

RECONFIGURE

You’ll notice that I have used the option ‘7’. This is because we are going to use the Azure Blob as the source. Next let’s look at the security script.

-- Creating master key and credential

CREATEMASTERKEYENCRYPTIONBYPASSWORD='My$uperC00LP@ssw0rd';

CREATEDATABASESCOPEDCREDENTIAL AzureBlobSecret WITHIDENTITY='sqlauthpolybase1',

Secret='your Secret key from blob';

Next is the meat of the script which I will use to configure a pointer to the blob account. Then we will define the schema so it will recognize when SQL Server does a query. Finally, we will map it to a definition of an external table that marries the blob location of the file to the file format we used.

-- Creating external data source (Azure Blob Storage) w/ credential

CREATEEXTERNALDATASOURCE AzureStore_Polybase with (

TYPE=HADOOP,

LOCATION='wasbs://[email protected]',

CREDENTIAL= AzureBlobSecret

)

-- define the file format being used

CREATEEXTERNALFILEFORMAT Poly_FileFormat WITH (

FORMAT_TYPE=DELIMITEDTEXT,

FORMAT_OPTIONS (FIELD_TERMINATOR='|',

USE_TYPE_DEFAULT= TRUE))

-- Define the table's schema when we will access

CREATEEXTERNALTABLE EmployeeExtTable(

eid int,

ename varchar(50)

)

WITH (

LOCATION='/SQLAuth.txt',

DATA_SOURCE= AzureStore_Polybase,

FILE_FORMAT= Poly_FileFormat

);

GO

Once these are configured, we can access the data sitting inside a delimited file like any other table:

SELECT*FROM EmployeeExtTable

GO

In this example, we used a blob as the source for storing files. However, they can also reside on a Hadoop filesystem and a similar code can be written to access them.

You might have more interesting use cases in your journey to building such Hybrid scenarios with Azure; I would love to hear about them. What are you doing within your organization to leverage the benefits of Polybase? Do you find it useful? Have you found any additional capabilities we may not have covered? Please leave us your thoughts in the “Comments” section.

How to Solve the Oracle Error ORA-12154: TNS:could not resolve the connect identifier specified

The “ORA-12154: TNS Oracle error message is very common for database administrators. Learn how to diagnose & resolve this common issue here today.

Vijay Muthu | February 4, 2021

Data Types: The Importance of Choosing the Correct Data Type

Most DBAs have struggled with the pros and cons of choosing one data type over another. This blog post discusses different situations.

Craig Mullins | October 11, 2017

How to Recover a Table from an Oracle 12c RMAN Backup

Our database experts explain how to recover and restore a table from an Oracle 12c RMAN Backup with this step-by-step blog. Read more.

Megan Elphingstone | February 2, 2017

Subscribe to Our Blog

Never miss a post! Stay up to date with the latest database, application and analytics tips and news. Delivered in a handy bi-weekly update straight to your inbox. You can unsubscribe at any time.

Work with Us

Let’s have a conversation about what you need to succeed and how we can help get you there.

CONTACT US

Work for Us

Where do you want to take your career? Explore exciting opportunities to join our team.

EXPLORE JOBS