SlamData / Datavail Conference Interview
Join Datavail’s Chuck Ezell and GG Gaidhane, as well as SlamData’s Jeff Carr, as they discuss overcoming the challenges of traditional data warehousing. One of those challenges: getting all of the relational data into the data warehouse takes a lot of transformative processing. Find out more.
Stephen Faig: Welcome to Data Summit 2016 in New York City. I’m Stephen Faig, director of Database Trends and Applications and Unisphere Research. I’m here with Jeff Carr of SlamData, Chuck Ezell of Datavail, and GG of Datavail. How are the needs around developing solutions quicker changing the capabilities enterprises need around data warehousing?
Chuck Ezell: There are a lot of different challenges that we face when we’re getting data out of all these ERP and CRM systems. Take Oracle EBS, for example: it’s a powerful system, and there’s a lot of great relational data there, but getting it into the data warehouse takes a lot of transformative processing, a lot of energy related to extracting that data, transforming it, and loading it into the traditional warehouse that we have today, this star schema structure. Once we get it in there, we’ve got to be able to present it. Then we’ve got all the other issues on the other end: how do you deal with the performance problems related to that star schema structure, or the end-to-end development, or trying to capture things like perishable insights? With this whole ETL process, there’s a lot of energy involved in monitoring and development, and when the jobs fail there’s a lot of daily activity wrapped around just making sure these things run.
What we wanted to do in our presentation was say, “Look, is there a better way? Is there a way we can eliminate this energy and structure that we have around transforming this data?” Here we have this opportunity to introduce a very unstructured JSON database, perhaps a MongoDB database. We can extract the data and, without doing all the transformative activity, drop it into some type of great NoSQL data store. Then we have a problem. We’ve got all this great JSON data in this MongoDB database, for example; what do we do with it? We’ve got to represent it. We’ve got to be able to present the analytics from it. So we also wanted to present some options for how we can transition from just having a lot of JSON data to representing that data in a nice analytical way.
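The contrast Chuck draws, ETL flattening into a star schema versus storing the JSON as-is, can be sketched in a few lines of Python. The record shape and field names below are hypothetical, just to illustrate the “transformative activity” a classic ETL step performs:

```python
import json

# A hypothetical nested record, as it might arrive from an ERP/CRM export.
order = {
    "order_id": 1001,
    "customer": {"id": 7, "name": "Acme Corp"},
    "lines": [
        {"sku": "A-1", "qty": 2, "price": 9.99},
        {"sku": "B-2", "qty": 1, "price": 24.50},
    ],
}

def etl_flatten(doc):
    """Classic ETL step: explode the nested document into flat
    fact-table rows suitable for a star schema."""
    return [
        {
            "order_id": doc["order_id"],
            "customer_id": doc["customer"]["id"],
            "sku": line["sku"],
            "qty": line["qty"],
            "price": line["price"],
        }
        for line in doc["lines"]
    ]

# The document-store alternative: keep the JSON as-is and query it later.
stored_as_is = json.dumps(order)

# One flat row per order line after the ETL explode.
rows = etl_flatten(order)
```

The point of the comparison: every nested field needs an explicit mapping in the ETL path, while the document-store path defers that work until query time.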
GG: We can run ad hoc data exploration like that, or even simple reports, without having that entire ETL process. We can use something like SlamData, which works great with MongoDB, and we can have all the interactive reports quickly. That’s, I think, the fastest time to market we can have.
Jeff Carr: One of the things that we’ve done with SlamData is we’ve essentially re-architected the core underpinnings of SQL to support more complex data. One of the big changes that’s happened with JSON and CSV and XML and all of these modern, non-tabular data structures is that they tend to have more dimensionality and more complexity to them. That’s problematic for traditional relational algebra and SQL, just being able to query that data and do it efficiently. One of the core innovations in SlamData is that we’ve actually re-architected it into what we call SQL², which is essentially a superset of SQL in that it supports all of the traditional things we would expect SQL to do. We haven’t created a new language. It’s the same SQL that analysts know and love, but we’ve added some additional operators on top of it, a very, very discrete number of operators, that allow us to deal with things like arrays and nested data and all of the complex things that you see in JSON and XML and these other modern data structures.
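The difficulty Jeff describes, querying into arrays and nested fields, can be made concrete without reproducing SQL² syntax itself. The Python sketch below (hypothetical data) shows the kind of direct access a nested-data dialect provides; a flat relational model would first need a separate child table or an ETL explode step to represent the array:

```python
# Hypothetical nested documents. In a flat relational model, the
# "scores" array would have to live in its own child table.
students = [
    {"name": "Ann", "scores": [88, 92, 79]},
    {"name": "Bo", "scores": [61, 70]},
]

# Query straight over the nested array -- conceptually what an
# array-aware operator in a SQL superset makes possible.
top = [s["name"] for s in students if max(s["scores"]) > 90]
```

Here `top` contains only `"Ann"`, since only her nested array holds a score above 90; no join against a flattened scores table was needed.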
By doing that, it creates an opportunity for us to essentially attach directly to that data. Instead of today’s model, where you have to bring in these very complex data flows, ETL all of them into your data warehouse, and then figure out how you want to query them, we created an environment where you can just dump all the data into whatever backend you choose, whether that’s Hadoop or MongoDB, and then immediately start interacting with that data using SlamData: you can just write queries, pull back results, visualize those results, publish those reports, pipe data to another application, and just do whatever you want to do with it.
Chuck Ezell: It kind of takes this whole highly structured traditional data warehouse and flips it upside down, really, so you can do your extraction from all your other data sources, drop them into something like a MongoDB database, and then have this really nice reporting front end where you can just touch the database. You’re not creating some intermediary schema structure, where you’re creating temporary tables and have to manage some middle layer. You basically touch the JSON data directly. It’s a very powerful, very quick way to get right to the JSON.
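Reporting “directly off the JSON,” with no temporary tables or middle layer, amounts to aggregating over the documents as they sit in the store. A minimal sketch, with hypothetical documents and field names:

```python
from collections import defaultdict

# Documents as they sit in the store -- no intermediate schema,
# no temporary tables, no middle layer to manage.
orders = [
    {"region": "east", "total": 120.0},
    {"region": "west", "total": 80.0},
    {"region": "east", "total": 40.0},
]

# A report computed straight off the documents: revenue per region.
revenue = defaultdict(float)
for o in orders:
    revenue[o["region"]] += o["total"]
```

The aggregation runs against the raw documents; in the traditional pipeline, the same report would sit at the end of an extract, transform, and load chain.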
Jeff Carr: The model we need going forward is that the analytics should actually match the data. Right now what we’re doing is taking an analytic model that was built 40 or 45 years ago and saying, “Well, if we can just get all the data to look exactly like what this tool wants, then we’re fine.” It’s getting harder and harder because our data sources just keep growing, and it’s more complex, and it’s more data. The data piece is getting more complex, and if anything, as an industry, we’re encouraging that. We’re telling people to do really radically advanced things with data, but then when analytics time comes, which it always does (that’s the whole point of data: at some point somebody’s going to want to analyze it), we default to this: “Oh, figure out a way to smash it into some perfectly flat, homogeneous table so that my tool, which was essentially built 40 years ago, at least from a technology perspective, will work with it.”
That’s a problem. The piece that we’re trying to get to is this: if data is changing and becoming more complex, which it is, then the tooling that we use to analyze it, query it, and visualize it should also get more complex. That’s what we’ve really tried to do, and I think we’ve made a lot of progress there. I’m sure there are other people working on the problem as well, but it’s the mindset that we need to bring the tooling and the analytics along with the changes in the complexity of the data.