Implementing COUNT(DISTINCT) with Fast Refresh Materialized Views

By | In Oracle | December 24th, 2010

Last week when we talked about how to implement a fast refresh materialized view with a MAX aggregate function. This week we will tackle another restriction on fast refresh materialized views. We will work through an example on how to modify a materialized view that includes a COUNT(DISTINCT) in order to get it to fast refresh.

An example COUNT(DISTINCT) query

One of the first SQL queries you probably learned to write was to get the number of rows using COUNT(*). If you want to get the number of distinct values from a set of rows, you’ll want to change this COUNT(*) into a COUNT(DISTINCT). Let’s code up an example with both COUNT(*) and COUNT(DISTINCT) using the same orders table that we used last week. Here is that table with some sample data.

For our COUNT(DISTINCT) example, let’s group by each day and get both the number of orders for that day as well as the number of distinct customers who placed orders on that day. We’ll use this example SQL query as the basis for our materialized view (mview).

We can see in our example data that on December 8th, there were four orders placed by three different customers.

What happens when we try to use COUNT(DISTINCT) in a fast refresh materialized view?

Let’s try to create a fast refresh mview with the example query we came up with above. We will need a materialized view log (mlog) on the orders table, so let’s create that first and see what happens when we try to create the mview.

We get an ORA-12015 error, telling us that the mview cannot be created. Maybe DBMS_ADVISOR has something to offer us in terms of fixing this?

What Oracle is basically telling us to do here is to remove the COUNT(DISTINCT) from our mview query and to change the GROUP BY clause from TRUNC(order_date) to a combination of order_date and customer_id. Not what we wanted, but somewhat helpful, as we will see later on.

It looks like we will be unable to implement a fast refresh mview that has a COUNT(DISTINCT) in it. So is this the end of the line? Are we stuck? No. We can create the fast refresh mview with the structure and view of the data the way we want, we’ll just need to find another way without the COUNT(DISTINCT).

A workaround for COUNT(DISTINCT) in a fast refresh materialized view

If we cannot create a fast refresh mview that includes a COUNT(DISTINCT), then perhaps we can rewrite the mview to get rid of the COUNT(DISTINCT). We still want the data in the same format that COUNT(DISTINCT) provided us, we just want to change the implementation. So how can we do this? The trick will be to create an mview on top of an mview.

For our two mview solution, we will want one mview that looks like the original SQL query. Let’s call this the “top” mview and this will be the mview that our end users will query. The “bottom” mview will look similar to the materialized view that DBMS_ADVISOR suggested. The two mviews will have different primary keys. The top mview will have a PK of order_day and the bottom mview will have a PK of order_day and customer_id. Notice how the bottom mview has the additional customer_id in it’s primary key and customer_id was also the column we were doing the COUNT(DISTINCT) on.

Let’s create this bottom mview first and name it “orders_by_day_and_cust_mv”. We’ll build the primary key for this mview based on the columns we are grouping by.

The top mview will query this bottom mview we just created. We will implement the COUNT(*) from our original query as a SUM of num_daily_customer_orders and the COUNT(DISTINCT) as a COUNT of num_daily_customer_orders. The bottom mview will need a materialized view log as well, so we’ll create that first.

So far so good. We have removed the COUNT(DISTINCT) from our mviews and the data looks exactly like the output of our example query. We missed something though, which causes us to get an error when we UPDATE some data and then try to refresh the mviews. Let’s see what that looks like and then we’ll fix that in a minute.

This ORA-32314 is easy enough to resolve. We just forgot to add an addition column because of the SUM aggregate function. Oracle requires us to have both COUNT(*) and COUNT(column_name) on our fast refresh mviews if we have SUM(column_name). In this example, we have SUM(num_daily_customer_orders), and had already added COUNT(num_daily_customer_orders), so let’s just add the missing COUNT(*) and try our UPDATE again.

It works just the way we wanted. The only thing left to decide is how we are going to refresh both the top and bottom mviews together.

Your choice on how to fast refresh the two mview solution

We need a plan to refresh both mviews in our two mview approach to this problem. Last week when we talked about how to get an mview to fast refresh with a MAX aggregate function, we ended up creating a PL/SQL procedure to do our refresh. We had some custom code in that example, so a PL/SQL procedure was the way to go. We could use that same approach here, or we could use the built-in functionality in the DBMS_MVIEW.REFRESH procedure to handle this. The choice is yours.

PL/SQL procedure approach
This is fairly straight forward. We want to refresh the bottom mview before refreshing the top mview. This could be something as simple as:

Or we might modify the procedure to optionally support a complete refresh of both mviews:

Nested refresh approach
Last week we were executing a MERGE statement inside our PL/SQL refresh procedure. In this example, we are using only the functionality of mviews as provided by Oracle, and Oracle has an optional parameter in the DBMS_MVIEW.REFRESH procedure that will refresh both mviews for us. If we supply a TRUE value to the “nested” parameter while executing a refresh of the top mview, Oracle will refresh the bottom mview first, followed by the top mview. Exactly what we want.

A final look at the code

This code was tested on both 10g and 11g versions of Oracle. We started with the following materialized view definition that failed to create an mview.

And we ended with solution that used two materialized views to remove the COUNT(DISTINCT) from our query without changing the final structure or data in the query. For the refresh of the mview, we changed our call to DBMS_MVIEW.REFRESH to pass in a TRUE value to the “nested” parameter.

Dallas Willett
Blue Gecko – Remote DBA Services

Contact Us

Leave a Reply

Your email address will not be published.
Required fields are marked (*).