
Merging the Hybrid World of Blobs and SQL Server 2016 with Polybase

Author: Eric Russo | July 26, 2016

SQL Server 2016 introduces a number of new capabilities. One feature that is not often discussed is a tool called Polybase. I’ve been familiar with the general concept of Polybase for more than three years, since it was first introduced with the PDW appliance (an MPP architecture, now called APS), but I’ve never been well versed in its functionality. Polybase has come a long way, and I feel that integrating it with SQL Server 2016 was an excellent move on Microsoft’s part.

Why do I say that? Let’s answer that question by going over the purpose of Polybase. Assume you have data in both structured (a database such as SQL Server) and unstructured (Hadoop or blobs) formats. To make your database system as efficient as possible, you need to pull data from both of these worlds into a single usable view, which is quite a daunting task in a typical database environment. Many customers solve this problem by using something like a Sqoop connector to Hadoop, with an ETL package that brings the data from the Hadoop system into SQL Server or their data warehouse. From there, they build the needed reports.

Even though the above process works, Polybase can make it simpler by removing the need for an ETL middle tier to transfer the data. Polybase gives you a T-SQL query interface in SQL Server that can query both structured and unstructured data directly. You can now join data sitting in Hadoop with data in SQL Server in a single query and get the output. The ability to work with data stored in an Azure blob like this is impressive and incredibly useful for DBAs looking for more efficient solutions.

Now that we understand the functionality and benefits of Polybase, let’s review the T-SQL script I used to do the setup.



USE Polybase;
GO
EXEC sp_configure 'hadoop connectivity', 7;
RECONFIGURE;
GO


You’ll notice that I used option ‘7’. This is because we are going to use Azure Blob storage as the source. Note that the new value takes effect only after you run RECONFIGURE and restart SQL Server. Next, let’s look at the security script.
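As a quick sanity check before running the security script, it can help to confirm that the feature is installed and that the connectivity option was recorded. The following is a minimal sketch rather than part of the original setup; it only reads server state and changes nothing.

```sql
-- Returns 1 if the PolyBase feature is installed on this instance, 0 otherwise
SELECT SERVERPROPERTY('IsPolyBaseInstalled') AS IsPolyBaseInstalled;

-- Lists the configured value and the running value of the option.
-- The running value reflects the change only after RECONFIGURE and a service restart.
EXEC sp_configure 'hadoop connectivity';
```

If the configured and running values differ, the restart has most likely not happened yet.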

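-- Creating master key and credential
-- The password and identity below are placeholders; substitute your own values
CREATE MASTER KEY ENCRYPTION BY PASSWORD = '<strong password>';

CREATE DATABASE SCOPED CREDENTIAL AzureBlobSecret
WITH IDENTITY = '<any string>',
Secret = 'your Secret key from blob';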

Next is the meat of the script. First, we configure a pointer to the blob account. Then we define the file format so SQL Server knows how to read the file when it runs a query. Finally, we define the schema of an external table that marries the blob location of the file to the file format we used.

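-- Creating external data source (Azure Blob Storage) w/ credential
CREATE EXTERNAL DATA SOURCE AzureStore_Polybase
WITH (
TYPE = HADOOP,
LOCATION = 'wasbs://<container>@<storage account>.blob.core.windows.net/', -- placeholder
CREDENTIAL = AzureBlobSecret
);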


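-- Define the file format being used
CREATE EXTERNAL FILE FORMAT Poly_FileFormat
WITH (
FORMAT_TYPE = DELIMITEDTEXT,
FORMAT_OPTIONS (FIELD_TERMINATOR = ',') -- assumed delimiter; match it to your file
);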
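-- Define the table's schema for when we access the data
CREATE EXTERNAL TABLE EmployeeExtTable (
eid int,
ename varchar(50)
)
WITH (
LOCATION = '/<path to the file or folder in the container>', -- placeholder
DATA_SOURCE = AzureStore_Polybase,
FILE_FORMAT = Poly_FileFormat
);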



Once these are configured, we can access the data sitting inside a delimited file like any other table:

SELECT * FROM EmployeeExtTable;
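Because the external table behaves like any other table, it can also be joined to data that lives in SQL Server, which is the single-query scenario described earlier. Here is a minimal sketch, assuming a hypothetical local table dbo.EmployeeSalary (eid int, salary money) that is not part of the original setup:

```sql
-- dbo.EmployeeSalary is a hypothetical local table used only for illustration
SELECT e.eid,
       e.ename,
       s.salary
FROM EmployeeExtTable AS e        -- rows read from the delimited file in blob storage
JOIN dbo.EmployeeSalary AS s      -- rows stored in SQL Server
    ON s.eid = e.eid;
```

No ETL step is involved here: the blob data is read at query time and joined with the local rows.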


In this example, we used a blob as the source for storing files. However, the files can also reside on a Hadoop file system, and similar code can be written to access them.
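For the Hadoop case, the main difference is the external data source. The sketch below is an illustration under stated assumptions: the names HadoopStore_Polybase and EmployeeHdfsExtTable, the NameNode host and port, and the HDFS path are all placeholders, and the 'hadoop connectivity' value would need to match your Hadoop distribution.

```sql
-- Hypothetical data source pointing at an HDFS NameNode (host and port are placeholders)
CREATE EXTERNAL DATA SOURCE HadoopStore_Polybase
WITH (
    TYPE = HADOOP,
    LOCATION = 'hdfs://<namenode-host>:8020'
);

-- The file format defined earlier can be reused; only DATA_SOURCE and LOCATION change
CREATE EXTERNAL TABLE EmployeeHdfsExtTable (
    eid int,
    ename varchar(50)
)
WITH (
    LOCATION = '/<hdfs path to the file or folder>',
    DATA_SOURCE = HadoopStore_Polybase,
    FILE_FORMAT = Poly_FileFormat
);
```

Once the table exists, it can be queried and joined in the same way as EmployeeExtTable.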

You might have more interesting use cases in your journey to building such Hybrid scenarios with Azure; I would love to hear about them. What are you doing within your organization to leverage the benefits of Polybase? Do you find it useful? Have you found any additional capabilities we may not have covered? Please leave us your thoughts in the “Comments” section.
