Select Page

Implementing COUNT(DISTINCT) with Fast Refresh Materialized Views

Dallas Willett | | December 24, 2010

Last week when we talked about how to implement a fast refresh materialized view with a MAX aggregate function. This week we will tackle another restriction on fast refresh materialized views. We will work through an example on how to modify a materialized view that includes a COUNT(DISTINCT) in order to get it to fast refresh.

An example COUNT(DISTINCT) query

One of the first SQL queries you probably learned to write was to get the number of rows using COUNT(*). If you want to get the number of distinct values from a set of rows, you’ll want to change this COUNT(*) into a COUNT(DISTINCT). Let’s code up an example with both COUNT(*) and COUNT(DISTINCT) using the same orders table that we used last week. Here is that table with some sample data.

  CREATE TABLE orders (
    order_id                       NUMBER(38)    NOT NULL,
    customer_id                    NUMBER(38)    NOT NULL,
    order_date                     DATE          NOT NULL,
    state                          CHAR(1)       NOT NULL,
    order_amount                   NUMBER(18,2)  NULL,
    CONSTRAINT orders_pk PRIMARY KEY (order_id)
  );

  SQL> SELECT * FROM orders ORDER BY TRUNC(order_date), customer_id;

    ORDER_ID CUSTOMER_ID ORDER_DAT S ORDER_AMOUNT
  ---------- ----------- --------- - ------------
           5        1010 06-DEC-10 C        42.42
           3        1010 07-DEC-10 C       123.41
           4        1010 07-DEC-10 C        55.01
           7        1020 07-DEC-10 C        99.15
           8        1020 07-DEC-10 C        74.85
           2        1010 08-DEC-10 O        12.99
           1        1010 08-DEC-10 C         5.05
           6        1020 08-DEC-10 O
           9        1030 08-DEC-10 O        13.33

For our COUNT(DISTINCT) example, let’s group by each day and get both the number of orders for that day as well as the number of distinct customers who placed orders on that day. We’ll use this example SQL query as the basis for our materialized view (mview).

  SQL> SELECT TRUNC(order_date) as order_day
       , COUNT(*) as num_daily_orders
       , COUNT(DISTINCT customer_id) as num_daily_customers
  FROM orders
  GROUP BY TRUNC(order_date)
  ORDER BY TRUNC(order_date);

  ORDER_DAY NUM_DAILY_ORDERS NUM_DAILY_CUSTOMERS
  --------- ---------------- -------------------
  06-DEC-10                1                   1
  07-DEC-10                4                   2
  08-DEC-10                4                   3

We can see in our example data that on December 8th, there were four orders placed by three different customers.

What happens when we try to use COUNT(DISTINCT) in a fast refresh materialized view?

Let’s try to create a fast refresh mview with the example query we came up with above. We will need a materialized view log (mlog) on the orders table, so let’s create that first and see what happens when we try to create the mview.

  CREATE MATERIALIZED VIEW LOG on orders
   WITH PRIMARY KEY, ROWID,
   SEQUENCE (order_date, customer_id) INCLUDING NEW VALUES;

  CREATE MATERIALIZED VIEW orders_by_day_mv
   REFRESH FAST
   AS
  SELECT TRUNC(order_date) as order_day
       , COUNT(*) as num_daily_orders
       , COUNT(DISTINCT customer_id) as num_daily_customers
  FROM orders
  GROUP BY TRUNC(order_date);

  ERROR at line 7:
  ORA-12015: cannot create a fast refresh materialized view from a complex query

We get an ORA-12015 error, telling us that the mview cannot be created. Maybe DBMS_ADVISOR has something to offer us in terms of fixing this?

  SQL> set serveroutput on
  SQL> DECLARE
   v_task_name varchar2(30);
   BEGIN
   DBMS_ADVISOR.TUNE_MVIEW(v_task_name,'CREATE MATERIALIZED VIEW
    orders_by_day_mv REFRESH FAST AS
    SELECT TRUNC(order_date) as order_day, COUNT(*) as num_daily_orders,
    COUNT(DISTINCT customer_id) as num_daily_customers
    FROM orders GROUP BY TRUNC(order_date)');
    dbms_output.put_line(v_task_name);
  END;
  /

  TASK_433
  PL/SQL procedure successfully completed.

  SELECT * FROM user_tune_mview WHERE task_name = 'TASK_433' ORDER BY action_id;

  TASK_NAME                       ACTION_ID SCRIPT_TYPE
  ------------------------------ ---------- --------------
  STATEMENT
  --------------------------------------------------------------------------------
  TASK_433                                3 IMPLEMENTATION
  CREATE MATERIALIZED VIEW EXAMPLE.ORDERS_BY_DAY_MV   REFRESH FAST WITH ROWID DISA
  BLE QUERY REWRITE AS SELECT EXAMPLE.ORDERS.ORDER_DATE C1, EXAMPLE.ORDERS.CUSTOME
  R_ID C2, COUNT(*) M1 FROM EXAMPLE.ORDERS GROUP BY EXAMPLE.ORDERS.ORDER_DATE, EXA
  MPLE.ORDERS.CUSTOMER_ID

  TASK_433                                4 UNDO
  DROP MATERIALIZED VIEW EXAMPLE.ORDERS_BY_DAY_MV

What Oracle is basically telling us to do here is to remove the COUNT(DISTINCT) from our mview query and to change the GROUP BY clause from TRUNC(order_date) to a combination of order_date and customer_id. Not what we wanted, but somewhat helpful, as we will see later on.

It looks like we will be unable to implement a fast refresh mview that has a COUNT(DISTINCT) in it. So is this the end of the line? Are we stuck? No. We can create the fast refresh mview with the structure and view of the data the way we want, we’ll just need to find another way without the COUNT(DISTINCT).

A workaround for COUNT(DISTINCT) in a fast refresh materialized view

If we cannot create a fast refresh mview that includes a COUNT(DISTINCT), then perhaps we can rewrite the mview to get rid of the COUNT(DISTINCT). We still want the data in the same format that COUNT(DISTINCT) provided us, we just want to change the implementation. So how can we do this? The trick will be to create an mview on top of an mview.

For our two mview solution, we will want one mview that looks like the original SQL query. Let’s call this the “top” mview and this will be the mview that our end users will query. The “bottom” mview will look similar to the materialized view that DBMS_ADVISOR suggested. The two mviews will have different primary keys. The top mview will have a PK of order_day and the bottom mview will have a PK of order_day and customer_id. Notice how the bottom mview has the additional customer_id in it’s primary key and customer_id was also the column we were doing the COUNT(DISTINCT) on.

Let’s create this bottom mview first and name it “orders_by_day_and_cust_mv”. We’ll build the primary key for this mview based on the columns we are grouping by.

  CREATE MATERIALIZED VIEW orders_by_day_and_cust_mv
   REFRESH FAST
   AS
  SELECT TRUNC(order_date) as order_day
       , customer_id
       , COUNT(*) as num_daily_customer_orders
  FROM orders
  GROUP BY TRUNC(order_date)
         , customer_id;

  ALTER TABLE orders_by_day_and_cust_mv
    ADD CONSTRAINT pk_orders_by_day_and_cust_mv
        PRIMARY KEY (order_day, customer_id);

The top mview will query this bottom mview we just created. We will implement the COUNT(*) from our original query as a SUM of num_daily_customer_orders and the COUNT(DISTINCT) as a COUNT of num_daily_customer_orders. The bottom mview will need a materialized view log as well, so we’ll create that first.

  CREATE MATERIALIZED VIEW LOG on orders_by_day_and_cust_mv
   WITH PRIMARY KEY, ROWID,
        SEQUENCE (num_daily_customer_orders) INCLUDING NEW VALUES;

  CREATE MATERIALIZED VIEW orders_by_day_mv
   REFRESH FAST AS
  SELECT order_day
       , SUM(num_daily_customer_orders) as num_daily_orders
       , COUNT(num_daily_customer_orders) as num_daily_customers
  FROM orders_by_day_and_cust_mv
  GROUP BY order_day;

  SQL> SELECT * FROM orders_by_day_mv ORDER BY order_day;

  ORDER_DAY NUM_DAILY_ORDERS NUM_DAILY_CUSTOMERS
  --------- ---------------- -------------------
  07-DEC-10                1                   1
  08-DEC-10                4                   2
  09-DEC-10                4                   3

So far so good. We have removed the COUNT(DISTINCT) from our mviews and the data looks exactly like the output of our example query. We missed something though, which causes us to get an error when we UPDATE some data and then try to refresh the mviews. Let’s see what that looks like and then we’ll fix that in a minute.

  SQL> UPDATE orders
  SET order_date = order_date+1 WHERE customer_id = 1030;

  1 row updated.

  SQL> COMMIT;

  Commit complete.

  SQL> EXEC DBMS_MVIEW.REFRESH('orders_by_day_and_cust_mv','FAST');

  PL/SQL procedure successfully completed.

  SQL> EXEC DBMS_MVIEW.REFRESH('orders_by_day_mv','FAST');
  BEGIN DBMS_MVIEW.REFRESH('orders_by_day_mv','FAST'); END;

  *
  ERROR at line 1:
  ORA-32314: REFRESH FAST of "EXAMPLE"."ORDERS_BY_DAY_MV" unsupported after
  deletes/updates
  ORA-06512: at "SYS.DBMS_SNAPSHOT", line 2251
  ORA-06512: at "SYS.DBMS_SNAPSHOT", line 2457
  ORA-06512: at "SYS.DBMS_SNAPSHOT", line 2426
  ORA-06512: at line 1

This ORA-32314 is easy enough to resolve. We just forgot to add an addition column because of the SUM aggregate function. Oracle requires us to have both COUNT(*) and COUNT(column_name) on our fast refresh mviews if we have SUM(column_name). In this example, we have SUM(num_daily_customer_orders), and had already added COUNT(num_daily_customer_orders), so let’s just add the missing COUNT(*) and try our UPDATE again.

  CREATE MATERIALIZED VIEW orders_by_day_mv
   REFRESH FAST AS
  SELECT order_day
       , SUM(num_daily_customer_orders) as num_daily_orders
       , COUNT(num_daily_customer_orders) as num_daily_customers
       , COUNT(*) as prevent_errors1
  FROM orders_by_day_and_cust_mv
  GROUP BY order_day;

  SQL> UPDATE orders
  SET order_date = order_date+1 WHERE customer_id = 1030;

  1 row updated.

  SQL> COMMIT;

  Commit complete.

  SQL> EXEC DBMS_MVIEW.REFRESH('orders_by_day_and_cust_mv','FAST');

  PL/SQL procedure successfully completed.

  SQL> EXEC DBMS_MVIEW.REFRESH('orders_by_day_mv','FAST');

  PL/SQL procedure successfully completed.

  SQL> SELECT * FROM orders_by_day_mv ORDER BY order_day;

  ORDER_DAY NUM_DAILY_ORDERS NUM_DAILY_CUSTOMERS PREVENT_ERRORS1
  --------- ---------------- ------------------- ---------------
  06-DEC-10                1                   1               1
  07-DEC-10                4                   2               2
  08-DEC-10                3                   2               2
  09-DEC-10                1                   1               1

It works just the way we wanted. The only thing left to decide is how we are going to refresh both the top and bottom mviews together.

Your choice on how to fast refresh the two mview solution

We need a plan to refresh both mviews in our two mview approach to this problem. Last week when we talked about how to get an mview to fast refresh with a MAX aggregate function, we ended up creating a PL/SQL procedure to do our refresh. We had some custom code in that example, so a PL/SQL procedure was the way to go. We could use that same approach here, or we could use the built-in functionality in the DBMS_MVIEW.REFRESH procedure to handle this. The choice is yours.

PL/SQL procedure approach
This is fairly straight forward. We want to refresh the bottom mview before refreshing the top mview. This could be something as simple as:

  CREATE OR REPLACE PROCEDURE ref_orders_by_day_mv AS
  BEGIN
    DBMS_MVIEW.REFRESH('orders_by_day_and_cust_mv','FAST');
    DBMS_MVIEW.REFRESH('orders_by_day_mv','FAST');
  END;
  /

Or we might modify the procedure to optionally support a complete refresh of both mviews:

  CREATE OR REPLACE PROCEDURE ref_orders_by_day_mv
  (p_refresh_mode IN VARCHR2 DEFAULT 'FAST') AS
  BEGIN
    DBMS_MVIEW.REFRESH('orders_by_day_and_cust_mv',p_refresh_mode);
    DBMS_MVIEW.REFRESH('orders_by_day_mv',p_refresh_mode);
  END;
  /

Nested refresh approach
Last week we were executing a MERGE statement inside our PL/SQL refresh procedure. In this example, we are using only the functionality of mviews as provided by Oracle, and Oracle has an optional parameter in the DBMS_MVIEW.REFRESH procedure that will refresh both mviews for us. If we supply a TRUE value to the “nested” parameter while executing a refresh of the top mview, Oracle will refresh the bottom mview first, followed by the top mview. Exactly what we want.

  EXEC DBMS_MVIEW.REFRESH(list=>'orders_by_day_mv',method=>'FAST',nested=>TRUE);

A final look at the code

This code was tested on both 10g and 11g versions of Oracle. We started with the following materialized view definition that failed to create an mview.

  CREATE MATERIALIZED VIEW LOG on orders
   WITH PRIMARY KEY, ROWID,
        SEQUENCE (order_date, customer_id) INCLUDING NEW VALUES;

  CREATE MATERIALIZED VIEW orders_by_day_mv
   REFRESH FAST
   AS
  SELECT TRUNC(order_date) as order_day
       , COUNT(*) as num_daily_orders
       , COUNT(DISTINCT customer_id) as num_daily_customers
  FROM orders
  GROUP BY TRUNC(order_date);

And we ended with solution that used two materialized views to remove the COUNT(DISTINCT) from our query without changing the final structure or data in the query. For the refresh of the mview, we changed our call to DBMS_MVIEW.REFRESH to pass in a TRUE value to the “nested” parameter.

  CREATE MATERIALIZED VIEW LOG on orders
   WITH PRIMARY KEY, ROWID,
        SEQUENCE (order_date, customer_id) INCLUDING NEW VALUES;

  CREATE MATERIALIZED VIEW orders_by_day_and_cust_mv
   REFRESH FAST
   AS
  SELECT TRUNC(order_date) as order_day
       , customer_id
       , COUNT(*) as num_daily_customer_orders
  FROM orders
  GROUP BY TRUNC(order_date)
         , customer_id;

  ALTER TABLE orders_by_day_and_cust_mv
    ADD CONSTRAINT pk_orders_by_day_and_cust_mv
        PRIMARY KEY (order_day, customer_id);

  CREATE MATERIALIZED VIEW LOG on orders_by_day_and_cust_mv
   WITH PRIMARY KEY, ROWID,
        SEQUENCE (num_daily_customer_orders) INCLUDING NEW VALUES;

  CREATE MATERIALIZED VIEW orders_by_day_mv
   REFRESH FAST AS
  SELECT order_day
       , SUM(num_daily_customer_orders) as num_daily_orders
       , COUNT(num_daily_customer_orders) as num_daily_customers
       , COUNT(*) as prevent_errors1
  FROM orders_by_day_and_cust_mv
  GROUP BY order_day;

  EXEC DBMS_MVIEW.REFRESH(list=>'orders_by_day_mv',method=>'FAST',nested=>TRUE);

Dallas Willett
Datavail – Remote DBA Services

12c Upgrade Bug with SQL Tuning Advisor

This blog post outlines steps to take on Oracle upgrade 11.2 to 12.1 if you’re having performance problems. Oracle offers a patch and work around to BUG 20540751.

Megan Elphingstone | March 22, 2017

Oracle EPM Cloud Vs. On-Premises: What’s the Difference?

EPM applications help measure the business performance. This post will help you choose the best EPM solutions for your organization’s needs and objectives.

Bobby Ellis | April 10, 2018

Scripting Out the Logins, Server Role Assignments, and Server Permissions

Imagine there are over one hundred logins in the source server and you need to migrate them all over to the destination server. Wouldn’t it be awesome if we could automate the process by generating the scripts for the required tasks?

JP Chen | October 1, 2015

Work with Us

Let’s have a conversation about what you need to succeed and how we can help get you there.

CONTACT US

Work for Us

Where do you want to take your career? Explore exciting opportunities to join our team.

EXPLORE JOBS