Select Page

Merging the Hybrid World of Blobs and SQL Server 2016 with Polybase

Eric Russo | | July 26, 2016

Merging Polybase

There are a number of new capabilities that have been introduced in SQL Server 2016. One of the features, that is not often discussed, is a tool called Polybase. I’ve personally been familiar with the general concept of Polybase for more than 3 years since it was first introduced with the PDW Appliance (MPP Architecture – now APS), but I’ve never been very well-versed on its functionality. It appears that Polybase has come a long way and I feel that integrating it with SQL Server 2016 was an excellent move on Microsoft’s part.

Why do I say that? Let’s answer that question by going over the purpose of Polybase. Assume you have data in structured (database like SQL Server) and unstructured (Hadoop or Blobs) formats. To make your databases system as efficient as possible, you need to pull data from both these worlds to create a single useable view – in your current database environment, this is quite a daunting task. Many customers solve this problem by using something like a SCOOP connector to Hadoop with an ETL package to bring the data from the Hadoop system into SQL Server or their data warehouse. From there, they build the needed reports.

Even though the above process works, Polybase can make it simpler by removing the need to have an ETL middle tier to transfer the data. Polybase gives you a query interface from SQL Server via T-SQL that can query both structured and unstructured data from SQL Server directly. Now you can join data sitting inside Hadoop and SQL Server in one single query and get the output. The ability to transfer data that is stored on an Azure Blob like this is impressive and incredibly useful for DBAs looking for more efficient solutions.

Now that we understand the functionality and benefits of Polybase, let’s review the TSQL Script I used to do the setup.

CREATEDATABASE Polybase

GO

USE Polybase

GO

EXECsp_configure'hadoop connectivity',7

RECONFIGURE

You’ll notice that I have used the option ‘7’. This is because we are going to use the Azure Blob as the source. Next let’s look at the security script.

-- Creating master key and credential

CREATEMASTERKEYENCRYPTIONBYPASSWORD='My$uperC00LP@ssw0rd';

CREATEDATABASESCOPEDCREDENTIAL AzureBlobSecret WITHIDENTITY='sqlauthpolybase1',

Secret='your Secret key from blob';

Next is the meat of the script which I will use to configure a pointer to the blob account. Then we will define the schema so it will recognize when SQL Server does a query. Finally, we will map it to a definition of an external table that marries the blob location of the file to the file format we used.

-- Creating external data source (Azure Blob Storage) w/ credential

CREATEEXTERNALDATASOURCE AzureStore_Polybase with (

TYPE=HADOOP,

LOCATION='wasbs://polybaseDemo@sqlauthpolybase1.blob.core.windows.net',

CREDENTIAL= AzureBlobSecret

)

-- define the file format being used

CREATEEXTERNALFILEFORMAT Poly_FileFormat WITH (

FORMAT_TYPE=DELIMITEDTEXT,

FORMAT_OPTIONS (FIELD_TERMINATOR='|',

USE_TYPE_DEFAULT= TRUE))

-- Define the table's schema when we will access

CREATEEXTERNALTABLE EmployeeExtTable(

eid int,

ename varchar(50)

)

WITH (

LOCATION='/SQLAuth.txt',

DATA_SOURCE= AzureStore_Polybase,

FILE_FORMAT= Poly_FileFormat

);

GO

Once these are configured, we can access the data sitting inside a delimited file like any other table:

SELECT*FROM EmployeeExtTable

GO

In this example, we used a blob as the source for storing files. However, they can also reside on a Hadoop filesystem and a similar code can be written to access them.

You might have more interesting use cases in your journey to building such Hybrid scenarios with Azure; I would love to hear about them. What are you doing within your organization to leverage the benefits of Polybase? Do you find it useful? Have you found any additional capabilities we may not have covered? Please leave us your thoughts in the “Comments” section.

12c Upgrade Bug with SQL Tuning Advisor

This blog post outlines steps to take on Oracle upgrade 11.2 to 12.1 if you’re having performance problems. Oracle offers a patch and work around to BUG 20540751.

Megan Elphingstone | March 22, 2017

Oracle EPM Cloud Vs. On-Premises: What’s the Difference?

EPM applications help measure the business performance. This post will help you choose the best EPM solutions for your organization’s needs and objectives.

Bobby Ellis | April 10, 2018

Scripting Out the Logins, Server Role Assignments, and Server Permissions

Imagine there are over one hundred logins in the source server and you need to migrate them all over to the destination server. Wouldn’t it be awesome if we could automate the process by generating the scripts for the required tasks?

JP Chen | October 1, 2015

Work with Us

Let’s have a conversation about what you need to succeed and how we can help get you there.

CONTACT US

Work for Us

Where do you want to take your career? Explore exciting opportunities to join our team.

EXPLORE JOBS