Creating Materialized Views
normally specify the ROWID clause. In addition, for aggregate materialized views, it
must also contain every column in the table referenced in the materialized view, the
INCLUDING NEW VALUES clause and the SEQUENCE clause. You can typically achieve
better fast refresh performance of local materialized views containing aggregates or
joins by using a WITH COMMIT SCN clause.
An example of a materialized view log is shown as follows where one is created on the
table sales:
CREATE MATERIALIZED VIEW LOG ON sales WITH ROWID
(prod_id, cust_id, time_id, channel_id, promo_id, quantity_sold, amount_sold)
INCLUDING NEW VALUES;
Alternatively, you could create a commit SCN-based materialized view log as follows:
CREATE MATERIALIZED VIEW LOG ON sales WITH ROWID
(prod_id, cust_id, time_id, channel_id, promo_id, quantity_sold, amount_sold),
COMMIT SCN INCLUDING NEW VALUES;
Oracle recommends that the keyword SEQUENCE be included in your materialized
view log statement unless you are sure that you will never perform a mixed DML
operation (a combination of INSERT, UPDATE, or DELETE operations on multiple
tables). The SEQUENCE column is required in the materialized view log to support fast
refresh with a combination of INSERT, UPDATE, or DELETE statements on multiple
tables. You can, however, add the SEQUENCE number to the materialized view log
after it has been created.
The boundary of a mixed DML operation is determined by whether the materialized
view is ON COMMIT or ON DEMAND.
■ For ON COMMIT, the mixed DML statements occur within the same transaction
because the refresh of the materialized view will occur upon commit of this
transaction.
■ For ON DEMAND, the mixed DML statements occur between refreshes.
The following example creates a materialized view log on the table sales that
includes the SEQUENCE keyword:
CREATE MATERIALIZED VIEW LOG ON sales WITH SEQUENCE, ROWID
(prod_id, cust_id, time_id, channel_id, promo_id,
quantity_sold, amount_sold) INCLUDING NEW VALUES;
Using the FORCE Option with Materialized View Logs
If you specify FORCE and any items specified with the ADD clause have already been
specified for the materialized view log, Oracle does not return an error, but silently
ignores the existing elements and adds to the materialized view log any items that do
not already exist in the log. For example, if you used a filter column such as cust_id
and this column already existed, Oracle Database ignores the redundancy and does
not return an error.
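As a minimal sketch of the two preceding points, the following statements first add the SEQUENCE column to an existing log on sales and then reissue an ADD clause with FORCE. The filter column cust_id comes from the text above; which items your log already records is an assumption, and FORCE is what lets the second statement run either way.

```sql
-- Add the SEQUENCE column to a materialized view log that was created without it.
ALTER MATERIALIZED VIEW LOG ON sales ADD SEQUENCE;

-- Assumption: the log may already record ROWID, SEQUENCE, or cust_id.
-- With FORCE, items that already exist are skipped without an error,
-- and only the items missing from the log are added.
ALTER MATERIALIZED VIEW LOG FORCE ON sales
  ADD ROWID, SEQUENCE (cust_id)
  INCLUDING NEW VALUES;
```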
Materialized View Log Purging
Purging materialized view logs can be done during the materialized view refresh
process or deferred until later, thus improving refresh performance time. You can
choose different options for when the purge will occur, using a PURGE clause, as in the
following:
CREATE MATERIALIZED VIEW LOG ON sales
PURGE START WITH sysdate NEXT sysdate+1
WITH ROWID
(prod_id, cust_id, time_id, channel_id, promo_id, quantity_sold, amount_sold)
INCLUDING NEW VALUES;
You can also query USER_MVIEW_LOGS for purge information, as in the following:
SELECT PURGE_DEFERRED, PURGE_INTERVAL, LAST_PURGE_DATE, LAST_PURGE_STATUS
FROM USER_MVIEW_LOGS
WHERE LOG_OWNER = 'SH' AND MASTER = 'SALES';
In addition to setting the purge when creating a materialized view log, you can also
modify an existing materialized view log by issuing a statement resembling the
following:
ALTER MATERIALIZED VIEW LOG ON sales PURGE IMMEDIATE;
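In the other direction, an existing log can be switched to a scheduled purge. The following is a sketch only: it reuses the START WITH ... NEXT form from the CREATE example above, and the seven-day interval is an arbitrary choice for illustration.

```sql
-- Defer purging of the log on sales: first purge now, then every seven days.
ALTER MATERIALIZED VIEW LOG ON sales
  PURGE START WITH SYSDATE NEXT SYSDATE + 7;

-- Check what the data dictionary now records for this log.
SELECT PURGE_DEFERRED, PURGE_INTERVAL, LAST_PURGE_DATE, LAST_PURGE_STATUS
FROM   USER_MVIEW_LOGS
WHERE  MASTER = 'SALES';
```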
See Also: Oracle Database SQL Language Reference for more
information regarding materialized view log syntax
Using Oracle Enterprise Manager
A materialized view can also be created using Enterprise Manager by selecting the
materialized view object type. There is no difference in the information required if this
approach is used.
Using Materialized Views with NLS Parameters
When using certain materialized views, you must ensure that your NLS parameters
are the same as when you created the materialized view. Materialized views with this
restriction are as follows:
■ Expressions that may return different values, depending on NLS parameter
settings. For example, (date > "01/02/03") or (rate <= "2.150") are NLS
parameter dependent expressions.
■ Equijoins where one side of the join is character data. The result of this equijoin
depends on collation and this can change on a session basis, giving an incorrect
result in the case of query rewrite or an inconsistent materialized view after a
refresh operation.
■ Expressions that generate internal conversion to character data in the SELECT list
of a materialized view, or inside an aggregate of a materialized aggregate view.
This restriction does not apply to expressions that involve only numeric data, for
example, a+b where a and b are numeric fields.
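The following sketch shows one way to inspect the session settings that such expressions depend on, and contrasts an implicit conversion with an explicit one. The date literal and format mask are illustrative assumptions. An explicit numeric format mask removes the dependence on NLS_DATE_FORMAT, but this sketch does not claim that it lifts every restriction listed above.

```sql
-- Session settings that commonly influence conversions and character comparisons.
SELECT parameter, value
FROM   nls_session_parameters
WHERE  parameter IN ('NLS_DATE_FORMAT', 'NLS_NUMERIC_CHARACTERS',
                     'NLS_SORT', 'NLS_COMP');

-- NLS-dependent: the string is converted to a date using NLS_DATE_FORMAT,
-- so the result (or an error) varies from session to session.
SELECT COUNT(*) FROM sales WHERE time_id > '01/02/03';

-- Explicit mask: the conversion no longer depends on NLS_DATE_FORMAT.
SELECT COUNT(*) FROM sales
WHERE  time_id > TO_DATE('01/02/2003', 'DD/MM/YYYY');
```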
Adding Comments to Materialized Views
You can add a comment to a materialized view. For example, the following statement
adds a comment to data dictionary views for the existing materialized view:
COMMENT ON MATERIALIZED VIEW sales_mv IS 'sales materialized view';
To view the comment after the preceding statement executes, query the
catalog views {USER, DBA, ALL}_MVIEW_COMMENTS. For example:
SELECT MVIEW_NAME, COMMENTS
FROM USER_MVIEW_COMMENTS WHERE MVIEW_NAME = 'SALES_MV';
The output will resemble the following:
MVIEW_NAME  COMMENTS
----------- -----------------------
SALES_MV    sales materialized view
Note: If the compatibility is set to 10.0.1 or higher, COMMENT ON TABLE will not be
allowed for the materialized view container table. The following error message will be
thrown if it is issued.
ORA-12098: cannot comment on the materialized view.
In the case of a prebuilt table, if it has an existing comment, the comment will be
inherited by the materialized view after it has been created. The existing comment will
be prefixed with '(from table)'. For example, table sales_summary was created
to contain sales summary information. An existing comment 'Sales summary
data' was associated with the table. A materialized view of the same name is created
to use the prebuilt table as its container table. After the materialized view creation, the
comment becomes '(from table) Sales summary data'.
However, if the prebuilt table sales_summary does not have a comment and the
comment 'Sales summary data' is added to the materialized view, then dropping the
materialized view passes that comment to the prebuilt table as
'(from materialized view) Sales summary data'.
Registering Existing Materialized Views
Some data warehouses have implemented materialized views in ordinary user tables.
Although this solution provides the performance benefits of materialized views, it
does not:
■ Provide query rewrite to all SQL applications.
■ Enable materialized views defined in one application to be transparently accessed
in another application.
■ Generally support fast parallel or fast materialized view refresh.
Because of these limitations, and because existing materialized views can be extremely
large and expensive to rebuild, you should register your existing materialized view
tables whenever possible. You can register a user-defined materialized view with the
CREATE MATERIALIZED VIEW ... ON PREBUILT TABLE statement. Once registered, the
materialized view can be used for query rewrites or maintained by one of the refresh
methods, or both.
The contents of the table must reflect the materialization of the defining query at the
time you register it as a materialized view, and each column in the defining query
must correspond to a column in the table that has a matching datatype. However, you
can specify WITH REDUCED PRECISION to allow the precision of columns in the
defining query to be different from that of the table columns.
The table and the materialized view must have the same name, but the table retains its
identity as a table and can contain columns that are not referenced in the defining
query of the materialized view. These extra columns are known as unmanaged
columns. If rows are inserted during a refresh operation, each unmanaged column of
the row is set to its default value. Therefore, the unmanaged columns cannot have NOT
NULL constraints unless they also have default values.
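The rules above can be sketched as follows. The table name prod_sales_summary and the column load_note are hypothetical. The load_note column is an unmanaged column, so it carries a default value alongside its NOT NULL constraint, and dollar_sales is declared with a narrower precision than SUM returns, which is why WITH REDUCED PRECISION is specified.

```sql
-- Hypothetical prebuilt table; load_note is not referenced by the defining query.
CREATE TABLE prod_sales_summary (
  prod_id      NUMBER,
  dollar_sales NUMBER(12,2),
  load_note    VARCHAR2(30) DEFAULT 'loaded by refresh' NOT NULL
);

-- The table contents must reflect the defining query at registration time.
INSERT INTO prod_sales_summary (prod_id, dollar_sales)
SELECT s.prod_id, SUM(s.amount_sold)
FROM   sales s
GROUP BY s.prod_id;
COMMIT;

-- Register the table under the same name as the materialized view.
CREATE MATERIALIZED VIEW prod_sales_summary
  ON PREBUILT TABLE WITH REDUCED PRECISION
  ENABLE QUERY REWRITE AS
  SELECT s.prod_id, SUM(s.amount_sold) AS dollar_sales
  FROM   sales s
  GROUP BY s.prod_id;
```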
Materialized views based on prebuilt tables are eligible for selection by query rewrite
provided the parameter QUERY_REWRITE_INTEGRITY is set to STALE_TOLERATED
or TRUSTED. See Chapter 18, "Basic Query Rewrite" for details about integrity levels.
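For a quick test in a single session, the integrity level can be set as in the following sketch. Setting these parameters at the session level rather than system-wide is an assumption made for illustration.

```sql
-- Allow query rewrite in this session and trust the prebuilt table contents.
ALTER SESSION SET QUERY_REWRITE_ENABLED = TRUE;
ALTER SESSION SET QUERY_REWRITE_INTEGRITY = TRUSTED;
```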
When you drop a materialized view that was created on a prebuilt table, the table still
exists—only the materialized view is dropped.
The following example illustrates the two steps required to register a user-defined
table. First, the table is created, then the materialized view is defined using exactly the
same name as the table. This materialized view sum_sales_tab_mv is eligible for use
in query rewrite.
CREATE TABLE sum_sales_tab_mv
PCTFREE 0 TABLESPACE demo
STORAGE (INITIAL 8M) AS
SELECT s.prod_id, SUM(amount_sold) AS dollar_sales,
SUM(quantity_sold) AS unit_sales
FROM sales s GROUP BY s.prod_id;
CREATE MATERIALIZED VIEW sum_sales_tab_mv
ON PREBUILT TABLE WITHOUT REDUCED PRECISION
ENABLE QUERY REWRITE AS
SELECT s.prod_id, SUM(amount_sold) AS dollar_sales,
SUM(quantity_sold) AS unit_sales
FROM sales s GROUP BY s.prod_id;
You could have compressed this table to save space. See "Storage And Table
Compression" on page 9-16 for details regarding table compression.
In some cases, user-defined materialized views are refreshed on a schedule that is
longer than the update cycle. For example, a monthly materialized view might be
updated only at the end of each month, and the materialized view values always refer
to complete time periods. Reports written directly against these materialized views
implicitly select only data that is not in the current (incomplete) time period. If a
user-defined materialized view already contains a time dimension:
■ It should be registered and then fast refreshed each update cycle.
■ You can create a view that selects the complete time period of interest.
■ The reports should be modified to refer to the view instead of referring directly to
the user-defined materialized view.
If the user-defined materialized view does not contain a time dimension, then:
■ Create a new materialized view that does include the time dimension (if possible).
■ The view should aggregate over the time column in the new materialized view.
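As a sketch of the first case, a view can restrict a registered materialized view to complete periods. The names monthly_sales_mv, monthly_sales_complete, and dollar_sales are hypothetical. The sketch also assumes that the month is stored as text in 'YYYY-MM' form, as calendar_month_desc is in the sh sample schema, so that a plain string comparison orders months correctly.

```sql
-- Hypothetical view that exposes only months that have already ended.
CREATE OR REPLACE VIEW monthly_sales_complete AS
SELECT calendar_month_desc, dollar_sales
FROM   monthly_sales_mv
WHERE  calendar_month_desc < TO_CHAR(SYSDATE, 'YYYY-MM');
```

Reports would then select from monthly_sales_complete rather than from the materialized view itself.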
Choosing Indexes for Materialized Views
The two most common operations on a materialized view are query execution and fast
refresh, and each operation has different performance requirements. Query execution
might need to access any subset of the materialized view key columns, and might need
to join and aggregate over a subset of those columns. Consequently, query execution
usually performs best if a single-column bitmap index is defined on each materialized
view key column.
In the case of materialized views containing only joins using fast refresh, Oracle
recommends that indexes be created on the columns that contain the rowids to
improve the performance of the refresh operation.
If a materialized view using aggregates is fast refreshable, then an index appropriate
for the fast refresh procedure is created unless USING NO INDEX is specified in the
CREATE MATERIALIZED VIEW statement.
If the materialized view is partitioned, then, after doing a partition maintenance
operation on the materialized view, the indexes become unusable, and they need to be
rebuilt for fast refresh to work.
See Oracle Database Performance Tuning Guide for information on using the SQL Access
Advisor to determine what indexes are appropriate for your materialized view.
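The indexing guidance in this section can be sketched as follows. All object names are hypothetical: cust_prod_sales_mv stands for an aggregate materialized view keyed on prod_id and cust_id, and sales_times_join_mv stands for a join-only materialized view that selects the detail rowids as sales_rid and times_rid. Both are assumed to be nonpartitioned.

```sql
-- One single-column bitmap index for each materialized view key column.
CREATE BITMAP INDEX cust_prod_sales_prod_bix ON cust_prod_sales_mv (prod_id);
CREATE BITMAP INDEX cust_prod_sales_cust_bix ON cust_prod_sales_mv (cust_id);

-- Join-only materialized view: index the columns that store the detail rowids.
CREATE INDEX sales_times_join_srid_ix ON sales_times_join_mv (sales_rid);
CREATE INDEX sales_times_join_trid_ix ON sales_times_join_mv (times_rid);

-- For a partitioned materialized view, list index partitions that became
-- unusable after partition maintenance and must be rebuilt.
SELECT index_name, partition_name, status
FROM   user_ind_partitions
WHERE  status = 'UNUSABLE';
```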
Dropping Materialized Views
Use the DROP MATERIALIZED VIEW statement to drop a materialized view. For
example, consider the following statement:
DROP MATERIALIZED VIEW sales_sum_mv;
This statement drops the materialized view sales_sum_mv. If the materialized view
was prebuilt on a table, then the table is not dropped, but it can no longer be
maintained with the refresh mechanism or used by query rewrite. Alternatively, you
can drop a materialized view using Oracle Enterprise Manager.
Analyzing Materialized View Capabilities
You can use the DBMS_MVIEW.EXPLAIN_MVIEW procedure to learn what is possible
with a materialized view or potential materialized view. In particular, this procedure
enables you to determine:
■ If a materialized view is fast refreshable
■ What types of query rewrite you can perform with this materialized view
■ Whether PCT refresh is possible
Using this procedure is straightforward. You simply call
DBMS_MVIEW.EXPLAIN_MVIEW, passing in as a single parameter the schema and
materialized view name for
an existing materialized view. Alternatively, you can specify the SELECT string for a
potential materialized view or the complete CREATE MATERIALIZED VIEW statement.
The materialized view or potential materialized view is then analyzed and the results
are written into either a table called MV_CAPABILITIES_TABLE, which is the default,
or to an array called MSG_ARRAY.
Note that you must run the utlxmv.sql script prior to calling EXPLAIN_MVIEW
except when you are placing the results in MSG_ARRAY. The script is found in the
admin directory and creates MV_CAPABILITIES_TABLE in the current schema.
An explanation of the various capabilities is in Table 9–7 on page 9-32, and all the
possible messages are listed in Table 9–8 on page 9-34.
Using the DBMS_MVIEW.EXPLAIN_MVIEW Procedure
The EXPLAIN_MVIEW procedure has the following parameters:
■ stmt_id
An optional parameter. A client-supplied unique identifier to associate output
rows with specific invocations of EXPLAIN_MVIEW.
■ mv
The name of an existing materialized view or the query definition or the entire
CREATE MATERIALIZED VIEW statement of a potential materialized view you
want to analyze.
■ msg_array
The PL/SQL VARRAY that receives the output.
EXPLAIN_MVIEW analyzes the specified materialized view in terms of its refresh and
rewrite capabilities and inserts its results (in the form of multiple rows) into
MV_CAPABILITIES_TABLE or MSG_ARRAY.
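As a sketch of how the first two parameters work together, the following call analyzes a potential materialized view given only its SELECT string and tags the output rows with an identifier. The identifier prod_sales_check is a hypothetical value. Arguments are passed positionally (mv first, then stmt_id), and the sketch assumes that MV_CAPABILITIES_TABLE already exists in the current schema.

```sql
BEGIN
  DBMS_MVIEW.EXPLAIN_MVIEW(
    'SELECT s.prod_id, SUM(s.amount_sold) AS dollar_sales ' ||
    'FROM sales s GROUP BY s.prod_id',   -- mv: defining query to analyze
    'prod_sales_check');                 -- stmt_id: hypothetical identifier
END;
/

-- Retrieve only the rows written by that invocation.
SELECT capability_name, possible, msgtxt
FROM   mv_capabilities_table
WHERE  statement_id = 'prod_sales_check'
ORDER BY seq;
```

Using a distinct identifier for each call avoids having to clear the table between analyses.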
See Also: Oracle Database PL/SQL Packages and Types Reference for
further information about the DBMS_MVIEW package
DBMS_MVIEW.EXPLAIN_MVIEW Declarations
The following PL/SQL declarations that are made for you in the DBMS_MVIEW
package show the order and datatypes of these parameters for explaining an existing
materialized view and a potential materialized view with output to a table and to a
VARRAY.
Explain an existing or potential materialized view with output to
MV_CAPABILITIES_TABLE:
DBMS_MVIEW.EXPLAIN_MVIEW (mv IN VARCHAR2,
stmt_id IN VARCHAR2:= NULL);
Explain an existing or potential materialized view with output to a VARRAY:
DBMS_MVIEW.EXPLAIN_MVIEW (mv IN VARCHAR2,
msg_array OUT SYS.ExplainMVArrayType);
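The VARRAY form can be exercised from an anonymous block, as in the following sketch. It assumes that the elements of SYS.ExplainMVArrayType expose attributes named after the MV_CAPABILITIES_TABLE columns (capability_name, possible, msgtxt). Verify this against the type definition in your release before relying on it.

```sql
SET SERVEROUTPUT ON

DECLARE
  msgs SYS.ExplainMVArrayType;   -- receives one element per output row
BEGIN
  DBMS_MVIEW.EXPLAIN_MVIEW(
    'SELECT s.prod_id, SUM(s.amount_sold) AS dollar_sales ' ||
    'FROM sales s GROUP BY s.prod_id',
    msgs);

  -- Assumption: attribute names mirror the MV_CAPABILITIES_TABLE columns.
  FOR i IN 1 .. msgs.COUNT LOOP
    DBMS_OUTPUT.PUT_LINE(msgs(i).capability_name || ' ' ||
                         msgs(i).possible || ' ' ||
                         msgs(i).msgtxt);
  END LOOP;
END;
/
```

This form does not need MV_CAPABILITIES_TABLE, so the utlxmv.sql script does not have to be run first.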
Using MV_CAPABILITIES_TABLE
One of the simplest ways to use DBMS_MVIEW.EXPLAIN_MVIEW is with the
MV_CAPABILITIES_TABLE, which has the following structure:
CREATE TABLE MV_CAPABILITIES_TABLE
(STATEMENT_ID     VARCHAR(30),   -- Client-supplied unique statement identifier
 MVOWNER          VARCHAR(30),   -- NULL for SELECT based EXPLAIN_MVIEW
 MVNAME           VARCHAR(30),   -- NULL for SELECT based EXPLAIN_MVIEW
 CAPABILITY_NAME  VARCHAR(30),   -- A descriptive name of the particular
                                 -- capability:
                                 -- REWRITE
                                 --   Can do at least full text match
                                 --   rewrite
                                 -- REWRITE_PARTIAL_TEXT_MATCH
                                 --   Can do at least full and partial
                                 --   text match rewrite
                                 -- REWRITE_GENERAL
                                 --   Can do all forms of rewrite
                                 -- REFRESH
                                 --   Can do at least complete refresh
                                 -- REFRESH_FROM_LOG_AFTER_INSERT
                                 --   Can do fast refresh from an mv log
                                 --   or change capture table at least
                                 --   when update operations are
                                 --   restricted to INSERT
                                 -- REFRESH_FROM_LOG_AFTER_ANY
                                 --   Can do fast refresh from an mv log
                                 --   or change capture table after any
                                 --   combination of updates
                                 -- PCT
                                 --   Can do Enhanced Update Tracking on
                                 --   the table named in the RELATED_NAME
                                 --   column. EUT is needed for fast
                                 --   refresh after partitioned
                                 --   maintenance operations on the table
                                 --   named in the RELATED_NAME column
                                 --   and to do non-stale tolerated
                                 --   rewrite when the mv is partially
                                 --   stale with respect to the table
                                 --   named in the RELATED_NAME column.
                                 --   EUT can also sometimes enable fast
                                 --   refresh of updates to the table
                                 --   named in the RELATED_NAME column
                                 --   when fast refresh from an mv log
                                 --   or change capture table is not
                                 --   possible.
                                 -- See Table 9–7
 POSSIBLE         CHARACTER(1),  -- T = capability is possible
                                 -- F = capability is not possible
 RELATED_TEXT     VARCHAR(2000), -- Owner.table.column, alias name, and so on
                                 -- related to this message. The specific
                                 -- meaning of this column depends on the
                                 -- MSGNO column. See the documentation for
                                 -- DBMS_MVIEW.EXPLAIN_MVIEW() for details.
 RELATED_NUM      NUMBER,        -- When there is a numeric value
                                 -- associated with a row, it goes here.
 MSGNO            INTEGER,       -- When available, QSM message # explaining
                                 -- why disabled or more details when
                                 -- enabled.
 MSGTXT           VARCHAR(2000), -- Text associated with MSGNO.
 SEQ              NUMBER);       -- Useful in ORDER BY clause when
                                 -- selecting from this table.
You can use the utlxmv.sql script found in the admin directory to create
MV_CAPABILITIES_TABLE.
Example 9–8 DBMS_MVIEW.EXPLAIN_MVIEW
First, create the materialized view. Alternatively, you can use EXPLAIN_MVIEW on a
potential materialized view using its SELECT statement or the complete CREATE
MATERIALIZED VIEW statement.
CREATE MATERIALIZED VIEW cal_month_sales_mv
BUILD IMMEDIATE
REFRESH FORCE
ENABLE QUERY REWRITE AS
SELECT t.calendar_month_desc, SUM(s.amount_sold) AS dollars
FROM sales s, times t WHERE s.time_id = t.time_id
GROUP BY t.calendar_month_desc;
Then, you invoke EXPLAIN_MVIEW with the materialized view to explain. You need to
use the SEQ column in an ORDER BY clause so the rows will display in a logical order.
If a capability is not possible, N will appear in the P column and an explanation in the
MSGTXT column. If a capability is not possible for multiple reasons, a row is displayed
for each reason.
EXECUTE DBMS_MVIEW.EXPLAIN_MVIEW ('SH.CAL_MONTH_SALES_MV');
SELECT capability_name, possible, SUBSTR(related_text,1,8)
AS rel_text, SUBSTR(msgtxt,1,60) AS msgtxt
FROM MV_CAPABILITIES_TABLE
ORDER BY seq;
CAPABILITY_NAME               P REL_TEXT MSGTXT
----------------------------- - -------- ------
PCT                           N
REFRESH_COMPLETE              Y
REFRESH_FAST                  N
REWRITE                       Y
PCT_TABLE                     N SALES    no partition key or PMARKER in select list
PCT_TABLE                     N TIMES    relation is not a partitioned table
REFRESH_FAST_AFTER_INSERT     N SH.TIMES mv log must have new values
REFRESH_FAST_AFTER_INSERT     N SH.TIMES mv log must have ROWID
REFRESH_FAST_AFTER_INSERT     N SH.TIMES mv log does not have all necessary columns
REFRESH_FAST_AFTER_INSERT     N SH.SALES mv log must have new values
REFRESH_FAST_AFTER_INSERT     N SH.SALES mv log must have ROWID
REFRESH_FAST_AFTER_INSERT     N SH.SALES mv log does not have all necessary columns
REFRESH_FAST_AFTER_ONETAB_DML N DOLLARS  SUM(expr) without COUNT(expr)
REFRESH_FAST_AFTER_ONETAB_DML N          see the reason why
                                         REFRESH_FAST_AFTER_INSERT is disabled
REFRESH_FAST_AFTER_ONETAB_DML N          COUNT(*) is not present in the select list
REFRESH_FAST_AFTER_ONETAB_DML N          SUM(expr) without COUNT(expr)
REFRESH_FAST_AFTER_ANY_DML    N          see the reason why
                                         REFRESH_FAST_AFTER_ONETAB_DML is disabled
REFRESH_FAST_AFTER_ANY_DML    N SH.TIMES mv log must have sequence
REFRESH_FAST_AFTER_ANY_DML    N SH.SALES mv log must have sequence
REFRESH_PCT                   N          PCT is not possible on any of the detail
                                         tables in the materialized view
REWRITE_FULL_TEXT_MATCH       Y
REWRITE_PARTIAL_TEXT_MATCH    Y
REWRITE_GENERAL               Y
REWRITE_PCT                   N          PCT is not possible on any detail tables
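The messages in this output point at what is missing for fast refresh: materialized view logs with ROWID, SEQUENCE, new values, and the referenced columns, plus COUNT(*) and a COUNT(expr) for each SUM(expr) in the SELECT list. The following is one possible way to address them, offered as a sketch. It assumes that no materialized view logs exist yet on sales or times, and the identifier after_fix and the aliases cnt_dollars and cnt are hypothetical.

```sql
-- Logs that record ROWID, SEQUENCE, new values, and the referenced columns.
CREATE MATERIALIZED VIEW LOG ON sales WITH SEQUENCE, ROWID
  (time_id, amount_sold) INCLUDING NEW VALUES;
CREATE MATERIALIZED VIEW LOG ON times WITH SEQUENCE, ROWID
  (time_id, calendar_month_desc) INCLUDING NEW VALUES;

-- Re-create the materialized view with the counts that fast refresh needs.
DROP MATERIALIZED VIEW cal_month_sales_mv;

CREATE MATERIALIZED VIEW cal_month_sales_mv
  BUILD IMMEDIATE
  REFRESH FAST ON DEMAND
  ENABLE QUERY REWRITE AS
  SELECT t.calendar_month_desc,
         SUM(s.amount_sold)   AS dollars,
         COUNT(s.amount_sold) AS cnt_dollars,
         COUNT(*)             AS cnt
  FROM   sales s, times t
  WHERE  s.time_id = t.time_id
  GROUP BY t.calendar_month_desc;

-- Analyze again under a new identifier and list whatever is still not possible.
EXECUTE DBMS_MVIEW.EXPLAIN_MVIEW ('SH.CAL_MONTH_SALES_MV', 'after_fix');

SELECT capability_name, possible, SUBSTR(msgtxt,1,60) AS msgtxt
FROM   MV_CAPABILITIES_TABLE
WHERE  statement_id = 'after_fix' AND possible = 'N'
ORDER BY seq;
```

The PCT-related capabilities are not addressed by this change, because the materialized view still contains no partition key or partition marker.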
See Also:
■ Chapter 16, "Maintaining the Data Warehouse" for further
details about PCT
■ Chapter 19, "Advanced Query Rewrite" for further details
about PCT
MV_CAPABILITIES_TABLE.CAPABILITY_NAME Details
Table 9–7 lists explanations for values in the CAPABILITY_NAME column.
Table 9–7 CAPABILITY_NAME Column Details
CAPABILITY_NAME Description
PCT
If this capability is possible, Partition Change Tracking (PCT) is possible on at least one
detail relation. If this capability is not possible, PCT is not possible with any detail relation
referenced by the materialized view.
REFRESH_COMPLETE
If this capability is possible, complete refresh of the materialized view is possible.
REFRESH_FAST
If this capability is possible, fast refresh is possible at least under certain circumstances.
REWRITE
If this capability is possible, at least full text match query rewrite is possible. If this
capability is not possible, no form of query rewrite is possible.
PCT_TABLE
If this capability is possible, it is possible with respect to a particular partitioned table in
the top level FROM list. When possible, PCT applies to the partitioned table named in the
RELATED_TEXT column.
PCT is needed to support fast refresh after partition maintenance operations on the table
named in the RELATED_TEXT column.
PCT may also support fast refresh with regard to updates to the table named in the
RELATED_TEXT column when fast refresh from a materialized view log is not possible.
PCT is also needed to support query rewrite in the presence of partial staleness of the
materialized view with regard to the table named in the RELATED_TEXT column.
When disabled, PCT does not apply to the table named in the RELATED_TEXT column. In
this case, fast refresh is not possible after partition maintenance operations on the table
named in the RELATED_TEXT column. In addition, PCT-based refresh of updates to the
table named in the RELATED_TEXT column is not possible. Finally, query rewrite cannot
be supported in the presence of partial staleness of the materialized view with regard to
the table named in the RELATED_TEXT column.
PCT_TABLE_REWRITE
If this capability is possible, it is possible with respect to a particular partitioned table in
the top level FROM list. When possible, PCT applies to the partitioned table named in the
RELATED_TEXT column.
This capability is needed to support query rewrite against this materialized view in partial
stale state with regard to the table named in the RELATED_TEXT column.
When disabled, query rewrite cannot be supported if this materialized view is in partial
stale state with regard to the table named in the RELATED_TEXT column.
REFRESH_FAST_AFTER_INSERT
If this capability is possible, fast refresh from a materialized view log is possible at least in
the case where the updates are restricted to INSERT operations; complete refresh is also
possible. If this capability is not possible, no form of fast refresh from a materialized view
log is possible.
REFRESH_FAST_AFTER_ONETAB_DML
If this capability is possible, fast refresh from a materialized view log is possible regardless
of the type of update operation, provided all update operations are performed on a single
table. If this capability is not possible, fast refresh from a materialized view log may not be
possible when the update operations are performed on multiple tables.
REFRESH_FAST_AFTER_ANY_DML
If this capability is possible, fast refresh from a materialized view log is possible regardless
of the type of update operation or the number of tables updated. If this capability is not
possible, fast refresh from a materialized view log may not be possible when the update
operations (other than INSERT) affect multiple tables.
REFRESH_FAST_PCT
If this capability is possible, fast refresh using PCT is possible. Generally, this means that
refresh is possible after partition maintenance operations on those detail tables where PCT
is indicated as possible.
REWRITE_FULL_TEXT_MATCH
If this capability is possible, full text match query rewrite is possible. If this capability is
not possible, full text match query rewrite is not possible.
REWRITE_PARTIAL_TEXT_MATCH
If this capability is possible, at least full and partial text match query rewrite are possible.
If this capability is not possible, at least partial text match query rewrite and general query
rewrite are not possible.
REWRITE_GENERAL
If this capability is possible, all query rewrite capabilities are possible, including general
query rewrite and full and partial text match query rewrite. If this capability is not
possible, at least general query rewrite is not possible.
REWRITE_PCT
If this capability is possible, query rewrite can use a partially stale materialized view even
in QUERY_REWRITE_INTEGRITY = ENFORCED or TRUSTED modes. When this capability
is not possible, query rewrite can use a partially stale materialized view only in
QUERY_REWRITE_INTEGRITY = STALE_TOLERATED mode.
MV_CAPABILITIES_TABLE Column Details
Table 9–8 lists the semantics for RELATED_TEXT and RELATED_NUM columns.
Table 9–8 MV_CAPABILITIES_TABLE Column Details
MSGNO | MSGTXT | RELATED_NUM | RELATED_TEXT
NULL | NULL | | For PCT capability only: [owner.]name of the table upon which PCT is enabled
2066 | This statement resulted in an Oracle error | Oracle error number that occurred |
2067 | No partition key or PMARKER or join dependent expression in SELECT list | | [owner.]name of relation for which PCT is not supported
2068 | Relation is not partitioned | | [owner.]name of relation for which PCT is not supported
2069 | PCT not supported with multicolumn partition key | | [owner.]name of relation for which PCT is not supported
2070 | PCT not supported with this type of partitioning | | [owner.]name of relation for which PCT is not supported
2071 | Internal error: undefined PCT failure code | The unrecognized numeric PCT failure code | [owner.]name of relation for which PCT is not supported
2072 | Requirements not satisfied for fast refresh of nested materialized view | |
2077 | Materialized view log is newer than last full refresh | | [owner.]table_name of table upon which the materialized view log is needed
2078 | Materialized view log must have new values | | [owner.]table_name of table upon which the materialized view log is needed
2079 | Materialized view log must have ROWID | | [owner.]table_name of table upon which the materialized view log is needed
2080 | Materialized view log must have primary key | | [owner.]table_name of table upon which the materialized view log is needed
2081 | Materialized view log does not have all necessary columns | | [owner.]table_name of table upon which the materialized view log is needed
2082 | Problem with materialized view log | | [owner.]table_name of table upon which the materialized view log is needed
2099 | Materialized view references a remote table or view in the FROM list | Offset from the SELECT keyword to the table or view in question | [owner.]name of the table or view in question
2126 | Multiple master sites | | Name of the first different node, or NULL if the first different node is local
2129 | Join or filter condition(s) are complex | | [owner.]name of the table involved with the join or filter condition (or NULL when not available)
2130 | Expression not supported for fast refresh | Offset from the SELECT keyword to the expression in question | The alias name in the SELECT list of the expression in question
2150 | SELECT lists must be identical across the UNION operator | Offset from the SELECT keyword to the first different select item in the SELECT list | The alias name of the first different select item in the SELECT list
2182 | PCT is enabled through a join dependency | | [owner.]name of relation for which PCT_TABLE_REWRITE is not enabled
2183 | Expression to enable PCT not in PARTITION BY of analytic function or model | The unrecognized numeric PCT failure code | [owner.]name of relation for which PCT is not enabled
2184 | Expression to enable PCT cannot be rolled up | | [owner.]name of relation for which PCT is not enabled
2185 | No partition key or PMARKER in the SELECT list | | [owner.]name of relation for which PCT_TABLE_REWRITE is not enabled
2186 | GROUP OUTER JOIN is present | |
2187 | Materialized view on external table | |
10 Advanced Materialized Views
This chapter discusses advanced topics in using materialized views. It contains the
following topics:
■ Partitioning and Materialized Views
■ Materialized Views in Analytic Processing Environments
■ Materialized Views and Models
■ Invalidating Materialized Views
■ Security Issues with Materialized Views
■ Altering Materialized Views
Partitioning and Materialized Views
Because of the large volume of data held in a data warehouse, partitioning is an
extremely useful option when designing a database. Partitioning the fact tables
improves scalability, simplifies system administration, and makes it possible to define
local indexes that can be efficiently rebuilt. Partitioning the fact tables also improves
the opportunity of fast refreshing the materialized view because this may enable
Partition Change Tracking (PCT) refresh on the materialized view. Partitioning a
materialized view also has benefits for refresh, because the refresh procedure can then
use parallel DML in more scenarios and PCT-based refresh can use truncate partition
to efficiently maintain the materialized view. See Oracle Database VLDB and Partitioning
Guide for further details about partitioning.
Partition Change Tracking
It is possible and advantageous to track freshness to a finer grain than the entire
materialized view. The ability to identify which rows in a materialized view are
affected by a certain detail table partition is known as Partition Change Tracking.
When one or more of the detail tables are partitioned, it may be possible to identify the
specific rows in the materialized view that correspond to modified detail partitions;
those rows become stale when a partition is modified while all other rows remain
fresh.
You can use PCT to identify which materialized view rows correspond to a particular
partition. PCT is also used to support fast refresh after partition maintenance
operations on detail tables. For instance, if a detail table partition is truncated or
dropped, the affected rows in the materialized view are identified and deleted.
Identifying which materialized view rows are fresh or stale, rather than considering
the entire materialized view as stale, allows query rewrite to use those rows that are
fresh while in QUERY_REWRITE_INTEGRITY = ENFORCED or TRUSTED modes.
Several views, such as DBA_MVIEW_DETAIL_PARTITION, detail which partitions are
stale or fresh. Oracle does not rewrite against partially stale materialized views if
partition change tracking on the changed table is enabled by the presence of a join
dependent expression in the materialized view. See "Join Dependent Expression" on
page 10-3 for more information.
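A query along the following lines can show the per-partition freshness that PCT tracks. This is a sketch: the materialized view name CUST_DLY_SALES_MV is hypothetical, and the column names follow the data dictionary description of DBA_MVIEW_DETAIL_PARTITION as the editor understands it, so verify them against your release.

```sql
-- One row per detail table partition, with its freshness relative to the
-- materialized view. The materialized view name is a hypothetical placeholder.
SELECT mview_name,
       detailobj_name,
       detail_partition_name,
       freshness
FROM   dba_mview_detail_partition
WHERE  owner = 'SH'
AND    mview_name = 'CUST_DLY_SALES_MV'
ORDER BY detailobj_name, detail_partition_position;
```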
To support PCT, a materialized view must satisfy the following requirements:
■ At least one of the detail tables referenced by the materialized view must be
partitioned.
■ Partitioned tables must use either range, list or composite partitioning.
■ The top level partition key must consist of only a single column.
■ The materialized view must contain either the partition key column or a partition
marker or ROWID or join dependent expression of the detail table. See Oracle
Database PL/SQL Packages and Types Reference for details regarding the
DBMS_MVIEW.PMARKER function.
■ If you use a GROUP BY clause, the partition key column or the partition marker or
ROWID or join dependent expression must be present in the GROUP BY clause.
■ If you use an analytic window function or the MODEL clause, the partition key
column or the partition marker or ROWID or join dependent expression must be
present in their respective PARTITION BY subclauses.
■ Data modifications can only occur on the partitioned table. If PCT refresh is being
done for a table that has a join dependent expression in the materialized view,
then data modifications should not have occurred in any of the join dependent
tables.
■ The COMPATIBLE initialization parameter must be set to a minimum of 9.0.0.0.0.
■ PCT is not supported for a materialized view that refers to views, remote tables, or
outer joins.
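One way to check whether an existing materialized view meets these requirements is to ask the database to explain its capabilities. The sketch below assumes that MV_CAPABILITIES_TABLE has been created in the current schema with the utlxmv.sql script and that a materialized view named SH.CUST_DLY_SALES_MV exists; the statement identifier pct_check is arbitrary.

```sql
-- Assumes MV_CAPABILITIES_TABLE exists (created by utlxmv.sql) and that
-- SH.CUST_DLY_SALES_MV is an existing materialized view.
BEGIN
  DBMS_MVIEW.EXPLAIN_MVIEW('SH.CUST_DLY_SALES_MV', 'pct_check');
END;
/

-- Show only the PCT-related capabilities and any explanatory messages.
SELECT capability_name, possible, related_text, msgtxt
FROM   mv_capabilities_table
WHERE  statement_id = 'pct_check'
AND    capability_name LIKE '%PCT%'
ORDER  BY seq;
```

A value of Y in the POSSIBLE column indicates that the capability is available; for rows showing N, MSGTXT usually names the requirement that is not met.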
Partition Key
Partition change tracking requires sufficient information in the materialized view to be
able to correlate a detail row in the source partitioned detail table to the corresponding
materialized view row. This can be accomplished by including the detail table
partition key columns in the SELECT list and, if GROUP BY is used, in the GROUP BY
list.
Consider an example of a materialized view storing daily customer sales. The
following example uses the sh sample schema and the three detail tables sales,
products, and times to create the materialized view. The sales table is partitioned
by the time_id column and the products table is partitioned by the prod_id column.
The times table is not partitioned.
Example 10–1 Partition Key
The following statements create materialized view logs on the three detail tables and
then define the materialized view:
CREATE MATERIALIZED VIEW LOG ON SALES WITH ROWID
(prod_id, time_id, quantity_sold, amount_sold) INCLUDING NEW VALUES;
CREATE MATERIALIZED VIEW LOG ON PRODUCTS WITH ROWID
(prod_id, prod_name, prod_desc) INCLUDING NEW VALUES;
CREATE MATERIALIZED VIEW LOG ON TIMES WITH ROWID
(time_id, calendar_month_name, calendar_year) INCLUDING NEW VALUES;
CREATE MATERIALIZED VIEW cust_dly_sales_mv
BUILD DEFERRED REFRESH FAST ON DEMAND
ENABLE QUERY REWRITE AS
SELECT s.time_id, p.prod_id, p.prod_name, COUNT(*),
SUM(s.quantity_sold), SUM(s.amount_sold),
COUNT(s.quantity_sold), COUNT(s.amount_sold)
FROM sales s, products p, times t
WHERE s.time_id = t.time_id AND s.prod_id = p.prod_id
GROUP BY s.time_id, p.prod_id, p.prod_name;
For cust_dly_sales_mv, PCT is enabled on both the sales table and products
table because their respective partitioning key columns time_id and prod_id are in
the materialized view.
Join Dependent Expression
A join dependent expression is an expression consisting of columns from tables that
are directly or indirectly joined through equijoins to the partitioned detail table on
the partitioning key, and that is either a dimensional attribute or a dimension
hierarchical parent of the joining key. The tables in the join path to the detail table
are called join dependent tables. Consider the following:
SELECT s.time_id, t.calendar_month_name
FROM sales s, times t WHERE s.time_id = t.time_id;
In this query, the times table is a join dependent table because it is joined to the
sales table on the partitioning key column time_id. Moreover, calendar_month_name
is a dimension hierarchical attribute of times.time_id, because calendar_month_name
is an attribute of times.mon_id and times.mon_id is a dimension hierarchical parent
of times.time_id. Hence, the expression calendar_month_name from the times table is
a join dependent expression. Consider another example:
SELECT s.time_id, y.calendar_year_name
FROM sales s, times_d d, times_m m, times_y y
WHERE s.time_id = d.time_id AND d.day_id = m.day_id AND m.mon_id = y.mon_id;
Here, the times table is normalized into the times_d, times_m, and times_y tables. The
expression calendar_year_name from the times_y table is a join dependent
expression, and the tables times_d, times_m, and times_y are join dependent tables.
This is because the times_y table is joined indirectly through the times_m and times_d
tables to the sales table on its partitioning key column time_id.
This lets users create materialized views containing aggregates on some level higher
than the partitioning key of the detail table. Consider the following example of a
materialized view storing monthly customer sales.
Example 10–2 Join Dependent Expression
Assuming the presence of materialized view logs defined earlier, the materialized
view can be created using the following DDL:
CREATE MATERIALIZED VIEW cust_mth_sales_mv
BUILD DEFERRED REFRESH FAST ON DEMAND
ENABLE QUERY REWRITE AS
SELECT t.calendar_month_name, p.prod_id, p.prod_name, COUNT(*),
SUM(s.quantity_sold), SUM(s.amount_sold),
COUNT(s.quantity_sold), COUNT(s.amount_sold)
FROM sales s, products p, times t
WHERE s.time_id = t.time_id AND s.prod_id = p.prod_id
GROUP BY t.calendar_month_name, p.prod_id, p.prod_name;
Here, you can correlate a detail table row to its corresponding materialized view row
using the join dependent table times and the relationship that
times.calendar_month_name is a dimensional attribute determined by times.time_id.
This enables partition change tracking on the sales table. In addition, PCT is enabled
on the products table because of the presence of its partitioning key column prod_id
in the materialized view.
Partition Marker
The DBMS_MVIEW.PMARKER function is designed to significantly reduce the
cardinality of the materialized view (see Example 10–3 for an example). The function
returns a partition identifier that uniquely identifies the partition for a specified row
within a specified partitioned table. Therefore, the DBMS_MVIEW.PMARKER function
is used instead of the partition key column in the SELECT and GROUP BY clauses.
Unlike the general case of a PL/SQL function in a materialized view, use of
DBMS_MVIEW.PMARKER does not prevent rewrite with that materialized view even
when the rewrite mode is QUERY_REWRITE_INTEGRITY = ENFORCED.
As an example of using the PMARKER function, consider calculating a typical number,
such as revenue generated by a product category during a given year. If there were
1000 different products sold each month, it would result in 12,000 rows in the
materialized view.
Example 10–3 Partition Marker
Consider an example of a materialized view storing the yearly sales revenue for each
product category. With hundreds of different products in each product category,
including the partitioning key column prod_id of the products table in the
materialized view would substantially increase the cardinality. Instead, this
materialized view uses the DBMS_MVIEW.PMARKER function, which increases the
cardinality of the materialized view only by a factor of the number of partitions in
the products table.
CREATE MATERIALIZED VIEW prod_yr_sales_mv
BUILD DEFERRED
REFRESH FAST ON DEMAND
ENABLE QUERY REWRITE AS
SELECT DBMS_MVIEW.PMARKER(p.rowid), p.prod_category, t.calendar_year, COUNT(*),
SUM(s.amount_sold), SUM(s.quantity_sold),
COUNT(s.amount_sold), COUNT(s.quantity_sold)
FROM sales s, products p, times t
WHERE s.time_id = t.time_id AND s.prod_id = p.prod_id
GROUP BY DBMS_MVIEW.PMARKER (p.rowid), p.prod_category, t.calendar_year;
prod_yr_sales_mv includes the DBMS_MVIEW.PMARKER function on the products
table in its SELECT list. This enables partition change tracking on the products table
with significantly less cardinality impact than grouping by the partition key column
prod_id. In this example, the desired level of aggregation for prod_yr_sales_mv is to
group by products.prod_category. Using the DBMS_MVIEW.PMARKER function, the
materialized view cardinality is increased only by a factor of the number of partitions
in the products table. This would generally be significantly less than the cardinality
impact of including the partition key columns.
Note that partition change tracking is enabled on the sales table because of the
presence of the join dependent expression calendar_year in the SELECT list.
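To get a feel for what the partition marker contributes, the following sketch counts the rows of the products table behind each marker value. It assumes that products is partitioned, as in the examples of this chapter; the column aliases are illustrative.

```sql
-- Rows stored in the same partition of products share a marker value,
-- so the result has one row per distinct marker.
SELECT DBMS_MVIEW.PMARKER(p.rowid) AS pmarker,
       COUNT(*)                    AS rows_per_marker
FROM   products p
GROUP  BY DBMS_MVIEW.PMARKER(p.rowid)
ORDER  BY pmarker;
```

The number of rows returned gives a rough idea of the factor by which grouping on the marker multiplies the cardinality of a materialized view.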
Partial Rewrite
A subsequent INSERT statement adds a new row to the sales_part3 partition of
table sales. At this point, because cust_dly_sales_mv has PCT available on table
sales using a partition key, Oracle can identify the stale rows in the materialized
view cust_dly_sales_mv that correspond to the sales_part3 partition. (The other
rows are unchanged in their freshness state.) Query rewrite cannot identify the fresh
portion of materialized views cust_mth_sales_mv and prod_yr_sales_mv
because PCT is available on table sales only through join dependent expressions.
Query rewrite can determine the fresh portion of a materialized view on changes to a
detail table only if PCT is available on the detail table using a partition key or
partition marker.
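One way to observe this behavior is to ask the database why a particular query was or was not rewritten. The sketch below uses DBMS_MVIEW.EXPLAIN_REWRITE and assumes that REWRITE_TABLE has been created in the current schema with the utlxrw.sql script. The query text, the date boundary, and the statement identifier partial_rw are illustrative only.

```sql
ALTER SESSION SET QUERY_REWRITE_INTEGRITY = ENFORCED;

-- Arguments are passed positionally: query text, materialized view, statement id.
BEGIN
  DBMS_MVIEW.EXPLAIN_REWRITE(
    'SELECT s.time_id, p.prod_id, p.prod_name, SUM(s.amount_sold) ' ||
    'FROM sales s, products p, times t ' ||
    'WHERE s.time_id = t.time_id AND s.prod_id = p.prod_id ' ||
    'AND s.time_id < TO_DATE(''01-01-1999'', ''DD-MM-YYYY'') ' ||
    'GROUP BY s.time_id, p.prod_id, p.prod_name',
    'CUST_DLY_SALES_MV',
    'partial_rw');
END;
/

-- Read back the messages explaining the rewrite decision.
SELECT message
FROM   rewrite_table
WHERE  statement_id = 'partial_rw'
ORDER  BY sequence;
```

Running the same check before and after modifying a partition should show whether the query is still eligible for rewrite against the fresh portion of the materialized view.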
Partitioning a Materialized View
Partitioning a materialized view involves defining the materialized view with the
standard Oracle partitioning clauses, as illustrated in the following example. This
statement creates a materialized view called part_sales_mv, which uses three
partitions, can be fast refreshed, and is eligible for query rewrite:
CREATE MATERIALIZED VIEW part_sales_mv
PARALLEL PARTITION BY RANGE (time_id)
(PARTITION month1
VALUES LESS THAN (TO_DATE('31-12-1998', 'DD-MM-YYYY'))
PCTFREE 0
STORAGE (INITIAL 8M)
TABLESPACE sf1,
PARTITION month2
VALUES LESS THAN (TO_DATE('31-12-1999', 'DD-MM-YYYY'))
PCTFREE 0
STORAGE (INITIAL 8M)
TABLESPACE sf2,
PARTITION month3
VALUES LESS THAN (TO_DATE('31-12-2000', 'DD-MM-YYYY'))
PCTFREE 0
STORAGE (INITIAL 8M)
TABLESPACE sf3)
BUILD DEFERRED
REFRESH FAST
ENABLE QUERY REWRITE AS
SELECT s.cust_id, s.time_id,
SUM(s.amount_sold) AS sum_dol_sales, SUM(s.quantity_sold) AS sum_unit_sales
FROM sales s GROUP BY s.time_id, s.cust_id;
Partitioning a Prebuilt Table
Alternatively, a materialized view can be registered to a partitioned prebuilt table as
illustrated in the following example:
CREATE TABLE part_sales_tab_mv(time_id, cust_id, sum_dollar_sales, sum_unit_sales)
PARALLEL PARTITION BY RANGE (time_id)
(PARTITION month1
VALUES LESS THAN (TO_DATE('31-12-1998', 'DD-MM-YYYY'))
PCTFREE 0
STORAGE (INITIAL 8M)
TABLESPACE sf1,
PARTITION month2
VALUES LESS THAN (TO_DATE('31-12-1999', 'DD-MM-YYYY'))
PCTFREE 0
STORAGE (INITIAL 8M)
TABLESPACE sf2,
PARTITION month3
VALUES LESS THAN (TO_DATE('31-12-2000', 'DD-MM-YYYY'))
PCTFREE 0
STORAGE (INITIAL 8M)
TABLESPACE sf3) AS
SELECT s.time_id, s.cust_id, SUM(s.amount_sold) AS sum_dollar_sales,
SUM(s.quantity_sold) AS sum_unit_sales
FROM sales s GROUP BY s.time_id, s.cust_id;
CREATE MATERIALIZED VIEW part_sales_tab_mv
ON PREBUILT TABLE
ENABLE QUERY REWRITE AS
SELECT s.time_id, s.cust_id, SUM(s.amount_sold) AS sum_dollar_sales,
SUM(s.quantity_sold) AS sum_unit_sales
FROM sales s GROUP BY s.time_id, s.cust_id;
In this example, the table part_sales_tab_mv has been partitioned over three
months and then the materialized view was registered to use the prebuilt table. This
materialized view is eligible for query rewrite because the ENABLE QUERY REWRITE
clause has been included.
Benefits of Partitioning a Materialized View
When a materialized view is partitioned on the partitioning key column or join
dependent expressions of the detail table, it is more efficient to use a TRUNCATE
PARTITION statement to remove one or more partitions of the materialized view
during refresh and then repopulate the partition with new data. Oracle Database uses
this variant of fast refresh (called PCT refresh) with partition truncation if the
following conditions are satisfied in addition to the other conditions described in
"Partition Change Tracking".
■ The materialized view is partitioned on the partitioning key column or join
dependent expressions of the detail table.
■ If PCT is enabled using either the partitioning key column or join dependent
expressions, the materialized view should be range or list partitioned.
■ PCT refresh is nonatomic.
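The conditions above can be exercised with an explicit refresh call. The following is a minimal sketch, assuming that the materialized view part_sales_mv exists and satisfies the PCT requirements; whether partitions are actually truncated remains the database's decision.

```sql
BEGIN
  DBMS_MVIEW.REFRESH(
    list           => 'PART_SALES_MV',
    method         => 'P',      -- 'P' requests a PCT refresh
    atomic_refresh => FALSE);   -- nonatomic, which permits partition truncation
END;
/
```

Because a nonatomic refresh is not performed as a single transaction, queries that read the materialized view while it runs may briefly see truncated partitions.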
Rolling Materialized Views
When a data warehouse or data mart contains a time dimension, it is often desirable to
archive the oldest information and then reuse the storage for new information. This is
called the rolling window scenario. If the fact tables or materialized views include a
time dimension and are horizontally partitioned by the time attribute, then
management of rolling materialized views can be reduced to a few fast partition
maintenance operations provided the unit of data that is rolled out equals, or is at least
aligned with, the range partitions.
If you plan to have rolling materialized views in your data warehouse, you should
determine how frequently you plan to perform partition maintenance operations, and
you should plan to partition fact tables and materialized views to reduce the amount
of system administration overhead required when old data is aged out. An additional
consideration is that you might want to use data compression on your infrequently
updated partitions.
You are not restricted to using range partitions. For example, a composite partition
using both a time value and a key value could result in a good partition solution for
your data.
See Chapter 16, "Maintaining the Data Warehouse" for further details regarding
CONSIDER FRESH and for details regarding compression.
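As a rough sketch of one rolling window step, the statements below age out the oldest period from both the fact table and a materialized view partitioned on the same time boundaries. The partition name sales_1998 is hypothetical, part_sales_mv and month1 are borrowed from the partitioning example earlier in this chapter, and the sequence assumes that the rows removed from the materialized view are exactly those derived from the dropped fact partition.

```sql
-- Hypothetical partition names; adjust to your own partitioning scheme.
ALTER TABLE sales DROP PARTITION sales_1998;

-- Remove the matching range from the materialized view.
ALTER MATERIALIZED VIEW part_sales_mv DROP PARTITION month1;

-- Tell the database to treat the remaining contents as fresh.
ALTER MATERIALIZED VIEW part_sales_mv CONSIDER FRESH;
```

CONSIDER FRESH is a declaration by the administrator rather than a check, so it should only be issued when the two partition operations are known to be aligned.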
Materialized Views in Analytic Processing Environments
This section discusses the concepts used by analytic SQL and how relational databases
can handle these types of queries. It also illustrates the best approach for creating
materialized views using a common scenario.
Cubes
While data warehouse environments typically view data in the form of a star schema,
for analytical SQL queries, data is held in the form of a hierarchical cube. A
hierarchical cube includes the data aggregated along the rollup hierarchy of each of its
dimensions and these aggregations are combined across dimensions. It includes the
typical set of aggregations needed for business intelligence queries.
Example 10–4 Hierarchical Cube
Consider a sales data set with two dimensions, each of which has a 4-level hierarchy:
■ Time, which contains (all times), year, quarter, and month.
■ Product, which contains (all products), division, brand, and item.
This means there are 16 aggregate groups in the hierarchical cube. This is because the
four levels of time are multiplied by four levels of product to produce the cube.
Table 10–1 shows the four levels of each dimension.
Table 10–1 ROLLUP By Time and Product
ROLLUP By Time          ROLLUP By Product
year, quarter, month    division, brand, item
year, quarter           division, brand
year                    division
all times               all products
Note that as you increase the number of dimensions and levels, the number of groups
to calculate increases dramatically. This example involves 16 groups, but if you were
to add just two more dimensions with the same number of levels, you would have 4 x
4 x 4 x 4 = 256 different groups. Also, consider that a similar increase in groups occurs
if you have multiple hierarchies in your dimensions. For example, the time dimension
might have an additional hierarchy of fiscal month rolling up to fiscal quarter and
then fiscal year. Handling the explosion of groups has historically been the major
challenge in data storage for online analytical processing systems.
Typical online analytical queries slice and dice different parts of the cube comparing
aggregations from one level to aggregation from another level. For instance, a query
might find sales of the grocery division for the month of January, 2002 and compare
them with total sales of the grocery division for all of 2001.
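The 16 aggregate groups of Example 10–4 can be produced in a single statement with concatenated ROLLUP clauses. The sketch below substitutes columns from the sh sample schema for the year, quarter, month and division, brand, item levels of the example; that mapping is an assumption made for illustration.

```sql
-- Two 4-level rollups are combined, giving 4 x 4 = 16 grouping sets.
SELECT t.calendar_year, t.calendar_quarter_desc, t.calendar_month_desc,
       p.prod_category, p.prod_subcategory, p.prod_name,
       GROUPING_ID(t.calendar_year, t.calendar_quarter_desc, t.calendar_month_desc,
                   p.prod_category, p.prod_subcategory, p.prod_name) AS gid,
       SUM(s.amount_sold) AS sum_sales
FROM   sales s, times t, products p
WHERE  s.time_id = t.time_id AND s.prod_id = p.prod_id
GROUP  BY ROLLUP(t.calendar_year, t.calendar_quarter_desc, t.calendar_month_desc),
          ROLLUP(p.prod_category, p.prod_subcategory, p.prod_name);
```

The gid column identifies the aggregate group that each row belongs to, which is what makes it possible to address a single level of the cube.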
Benefits of Partitioning Materialized Views
Materialized views with multiple aggregate groups give their best performance for
refresh and query rewrite when partitioned appropriately.
Advanced Materialized Views 10-7
Materialized Views in Analytic Processing Environments
PCT refresh in a rolling window scenario requires partitioning at the top level on some
level from the time dimension. In addition, partition pruning for queries rewritten
against this materialized view requires partitioning on the GROUPING_ID column.
Hence, the most effective partitioning scheme for these materialized views is
composite partitioning (range-list on the (time, GROUPING_ID) columns). By
partitioning the materialized views this way, you enable:
■ PCT refresh, thereby improving refresh performance.
■ Partition pruning: only relevant aggregate groups are accessed, thereby greatly
reducing the query processing cost.
If you do not want to use PCT refresh, you can simply partition by list on the
GROUPING_ID column.
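The simpler list-only scheme might look like the following sketch. The materialized view name, the partition names, and the choice of which GROUPING_ID values receive their own partition are all assumptions; a small cube over columns of the sh sample schema keeps the example short.

```sql
-- gid is computed over (calendar_year, calendar_month_desc, prod_category):
--   0 = most detailed level, 7 = grand total, other values = intermediate levels.
CREATE MATERIALIZED VIEW sales_cube_gid_mv          -- hypothetical name
PARTITION BY LIST (gid)
 (PARTITION gid_detail VALUES (0),
  PARTITION gid_total  VALUES (7),
  PARTITION gid_other  VALUES (DEFAULT))
BUILD DEFERRED
REFRESH COMPLETE ON DEMAND
ENABLE QUERY REWRITE AS
SELECT t.calendar_year, t.calendar_month_desc, p.prod_category,
       GROUPING_ID(t.calendar_year, t.calendar_month_desc, p.prod_category) AS gid,
       SUM(s.amount_sold)   AS sum_sales,
       COUNT(s.amount_sold) AS cnt_sales,
       COUNT(*)             AS cnt_star
FROM   sales s, times t, products p
WHERE  s.time_id = t.time_id AND s.prod_id = p.prod_id
GROUP  BY ROLLUP(t.calendar_year, t.calendar_month_desc),
          ROLLUP(p.prod_category);
```

A query that needs only one aggregation level can then be answered from a single partition instead of scanning every group in the materialized view.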
Compressing Materialized Views
You should consider data compression when using highly redundant data, such as
tables with many foreign keys. In particular, materialized views created with the
ROLLUP clause are likely candidates. See Oracle Database SQL Language Reference for
data compression syntax and restrictions and "Storage And Table Compression" for
details regarding compression.
Materialized Views with Set Operators
Oracle Database provides support for materialized views whose defining query
involves set operators. Materialized views with set operators can be created with
query rewrite enabled. You can refresh the materialized view using either ON
COMMIT or ON DEMAND refresh.
Fast refresh is supported if the defining query has the UNION ALL operator at the top
level and each query block in the UNION ALL meets the requirements of a materialized
view with aggregates or a materialized view with joins only. Further, the materialized
view must include a constant column (known as a UNION ALL marker) that has a
distinct value in each query block. In the following example, the marker columns are
1 marker and 2 marker.
See "Restrictions on Fast Refresh on Materialized Views with UNION ALL" for
detailed restrictions on fast refresh for materialized views with UNION ALL.
Examples of Materialized Views Using UNION ALL
The following examples illustrate creation of fast refreshable materialized views
involving UNION ALL.
Example 10–5 Materialized View Using UNION ALL with Two Join Views
To create a UNION ALL materialized view with two join views, the materialized view
logs must have the rowid column. In the following example, the UNION ALL marker
columns are 1 marker and 2 marker.
CREATE MATERIALIZED VIEW LOG ON sales WITH ROWID;
CREATE MATERIALIZED VIEW LOG ON customers WITH ROWID;
CREATE MATERIALIZED VIEW unionall_sales_cust_joins_mv
REFRESH FAST ON COMMIT
ENABLE QUERY REWRITE AS
(SELECT c.rowid crid, s.rowid srid, c.cust_id, s.amount_sold, 1 marker
FROM sales s, customers c
WHERE s.cust_id = c.cust_id AND c.cust_last_name = 'Smith')
UNION ALL
(SELECT c.rowid crid, s.rowid srid, c.cust_id, s.amount_sold, 2 marker
FROM sales s, customers c
WHERE s.cust_id = c.cust_id AND c.cust_last_name = 'Brown');
Example 10–6 Materialized View Using UNION ALL with Joins and Aggregates
The following example shows a UNION ALL of a materialized view with joins and a
materialized view with aggregates. A couple of things can be noted in this example.
Nulls or constants can be used to ensure that the data types of the corresponding
SELECT list columns match. Also, the UNION ALL marker column can be a string
literal, which is 'Year' umarker, 'Quarter' umarker, or 'Daily' umarker in
the following example:
CREATE MATERIALIZED VIEW LOG ON sales WITH ROWID, SEQUENCE
(amount_sold, time_id)
INCLUDING NEW VALUES;
CREATE MATERIALIZED VIEW LOG ON times WITH ROWID, SEQUENCE
(time_id, fiscal_year, fiscal_quarter_number, day_number_in_week)
INCLUDING NEW VALUES;
CREATE MATERIALIZED VIEW unionall_sales_mix_mv
REFRESH FAST ON DEMAND AS
(SELECT 'Year' umarker, NULL, NULL, t.fiscal_year,
SUM(s.amount_sold) amt, COUNT(s.amount_sold), COUNT(*)
FROM sales s, times t
WHERE s.time_id = t.time_id
GROUP BY t.fiscal_year)
UNION ALL
(SELECT 'Quarter' umarker, NULL, NULL, t.fiscal_quarter_number,
SUM(s.amount_sold) amt, COUNT(s.amount_sold), COUNT(*)
FROM sales s, times t
WHERE s.time_id = t.time_id and t.fiscal_year = 2001
GROUP BY t.fiscal_quarter_number)
UNION ALL
(SELECT 'Daily' umarker, s.rowid rid, t.rowid rid2, t.day_number_in_week,
s.amount_sold amt, 1, 1
FROM sales s, times t
WHERE s.time_id = t.time_id
AND t.time_id BETWEEN TO_DATE('01-01-2001', 'DD-MM-YYYY')
AND TO_DATE('31-12-2001', 'DD-MM-YYYY'));
Materialized Views and Models
Models, which provide array-based computations in SQL, can be used in materialized
views. Because the MODEL clause calculations can be expensive, you may want to use
two separate materialized views: one for the model calculations and one for the
SELECT ... GROUP BY query. For example, instead of using one long materialized
view, you could create the following materialized views:
CREATE MATERIALIZED VIEW my_groupby_mv
REFRESH FAST
ENABLE QUERY REWRITE AS
SELECT country_name country, prod_name prod, calendar_year year,
SUM(amount_sold) sale, COUNT(amount_sold) cnt, COUNT(*) cntstr
FROM sales, times, customers, countries, products
WHERE sales.time_id = times.time_id AND
sales.prod_id = products.prod_id AND
sales.cust_id = customers.cust_id AND
customers.country_id = countries.country_id
GROUP BY country_name, prod_name, calendar_year;
CREATE MATERIALIZED VIEW my_model_mv
ENABLE QUERY REWRITE AS
SELECT country, prod, year, sale, cnt
FROM my_groupby_mv
MODEL PARTITION BY(country) DIMENSION BY(prod, year)
MEASURES(sale s) IGNORE NAV
(s['Shorts', 2000] = 0.2 * AVG(s)[CV(), year BETWEEN 1996 AND 1999],
s['Kids Pajama', 2000] = 0.5 * AVG(s)[CV(), year BETWEEN 1995 AND 1999],
s['Boys Pajama', 2000] = 0.6 * AVG(s)[CV(), year BETWEEN 1994 AND 1999],
...
<hundreds of other update rules>);
By using two materialized views, you can incrementally maintain the materialized
view my_groupby_mv. The materialized view my_model_mv is on a much smaller
data set because it is built on my_groupby_mv and can be maintained by a complete
refresh.
Materialized views with models can use complete refresh or PCT refresh only, and are
available for partial text query rewrite only.
See Also: Chapter 23, "SQL for Modeling" for further details about
model calculations
Invalidating Materialized Views
Dependencies related to materialized views are automatically maintained to ensure
correct operation. When a materialized view is created, the materialized view depends
on the detail tables referenced in its definition. Any DML operation (such as an
INSERT, UPDATE, or DELETE) or DDL operation on any dependency of the
materialized view causes it to become invalid. To revalidate a materialized view,
use the ALTER MATERIALIZED VIEW COMPILE statement.
A materialized view is automatically revalidated when it is referenced. In many cases,
the materialized view will be successfully and transparently revalidated. However, if a
column has been dropped in a table referenced by a materialized view or the owner of
the materialized view did not have one of the query rewrite privileges and that
privilege has now been granted to the owner, you should use the following statement
to revalidate the materialized view:
ALTER MATERIALIZED VIEW mview_name COMPILE;
The state of a materialized view can be checked by querying the data dictionary views
USER_MVIEWS or ALL_MVIEWS. The column STALENESS will show one of the values
FRESH, STALE, UNUSABLE, UNKNOWN, UNDEFINED, or NEEDS_COMPILE to indicate
whether the materialized view can be used. The state is maintained automatically.
However, if the staleness of a materialized view is marked as NEEDS_COMPILE, you
could issue an ALTER MATERIALIZED VIEW ... COMPILE statement to validate the
materialized view and get the correct staleness state. If the state of a materialized view
is UNUSABLE, you must perform a complete refresh to bring the materialized view
back to the FRESH state. If the materialized view is based on a prebuilt table that you
never refresh, you must drop and re-create the materialized view. The staleness of
remote materialized views is not tracked. Thus, if you use remote materialized views
for rewrite, they are considered to be trusted.
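The checks described above can be sketched as follows. The materialized view name cust_dly_sales_mv is taken from an earlier example in this chapter and stands in for whichever materialized view the first query reports as needing attention.

```sql
-- Inspect the current state of the materialized views in your schema.
SELECT mview_name, staleness, compile_state, last_refresh_type
FROM   user_mviews
ORDER  BY mview_name;

-- If STALENESS shows NEEDS_COMPILE, recompile to obtain the correct state.
ALTER MATERIALIZED VIEW cust_dly_sales_mv COMPILE;

-- If STALENESS shows UNUSABLE, a complete refresh is required.
BEGIN
  DBMS_MVIEW.REFRESH(list => 'CUST_DLY_SALES_MV', method => 'C');
END;
/
```

Rerunning the first query afterward shows whether the materialized view has returned to the FRESH state.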
Security Issues with Materialized Views
To create a materialized view in your own schema, you must have the CREATE
MATERIALIZED VIEW privilege and the SELECT privilege to any tables referenced that
are in another schema. To create a materialized view in another schema, you must
have the CREATE ANY MATERIALIZED VIEW privilege and the owner of the
materialized view needs SELECT privileges to the tables referenced if they are from
another schema. Moreover, if you enable query rewrite on a materialized view that
references tables outside your schema, you must have the GLOBAL QUERY REWRITE
privilege or the QUERY REWRITE object privilege on each table outside your schema.
If the materialized view is on a prebuilt container, the creator, if different from the
owner, must have the SELECT privilege WITH GRANT OPTION on the container table.
If you continue to get a privilege error while trying to create a materialized view and
you believe that all the required privileges have been granted, then the problem is
most likely that a privilege was granted through a role rather than explicitly. The
owner of the materialized view must have been explicitly granted SELECT access to
the referenced tables if the tables are in a different schema.
If the materialized view is being created with ON COMMIT REFRESH specified, then the
owner of the materialized view requires an additional privilege if any of the tables in
the defining query are outside the owner's schema. In that case, the owner requires the
ON COMMIT REFRESH system privilege or the ON COMMIT REFRESH object privilege on
each table outside the owner's schema.
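The privileges discussed in this section could be granted along the following lines. The account name dw_user is hypothetical, and sh.sales stands in for any referenced table outside that account's schema.

```sql
-- dw_user is a hypothetical account that will own the materialized view.
GRANT CREATE MATERIALIZED VIEW TO dw_user;

-- Grant directly to the user, not through a role.
GRANT SELECT ON sh.sales TO dw_user;

-- Needed to enable query rewrite on tables outside the owner's schema
-- (alternatively: GRANT GLOBAL QUERY REWRITE TO dw_user).
GRANT QUERY REWRITE ON sh.sales TO dw_user;

-- Needed only if the materialized view is created with ON COMMIT refresh.
GRANT ON COMMIT REFRESH ON sh.sales TO dw_user;
```

Granting the object privileges table by table keeps the account's rights narrower than the corresponding system privileges would.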
Querying Materialized Views with Virtual Private Database (VPD)
For all security concerns, a materialized view serves as a view that happens to be
materialized when you are directly querying the materialized view. When creating a
view or materialized view, the owner must have the necessary permissions to access
the underlying base relations of the view or materialized view that they are creating.
With these permissions, the owner can publish a view or materialized view that other
users can access, assuming they have been granted access to the view or materialized
view.
Using materialized views with Virtual Private Database is similar. When you create a
materialized view, there must not be any VPD policies in effect against the base
relations of the materialized view for the owner of the materialized view. However,
the owner of the materialized view may establish a VPD policy on the new
materialized view. Users who access the materialized view are subject to the VPD
policy on the materialized view. However, they are not additionally subject to the
VPD policies of the underlying base relations of the materialized view, since security
processing of the underlying base relations is performed against the owner of the
materialized view.
Using Query Rewrite with Virtual Private Database
When you access a materialized view using query rewrite, the materialized view
serves as an access structure much like an index. As such, the security implications for
materialized views accessed in this way are much the same as for indexes: all security
checks are performed against the relations specified in the request query. The index or
materialized view is used to speed the performance of accessing the data, not provide
any additional security checks. Thus, the presence of the index or materialized view
presents no additional security checking.
This holds true when you are accessing a materialized view using query rewrite in the
presence of VPD. The request query is subject to any VPD policies that are present
against the relations specified in the query. Query rewrite may rewrite the query to
use a materialized view instead of accessing the detail relations, but only if it can
guarantee to deliver exactly the same rows as if the rewrite had not occurred.
Specifically, query rewrite must retain and respect any VPD policies against the
relations specified in the request query. However, any VPD policies against the
materialized view itself do not have effect when the materialized view is accessed
using query rewrite. This is because the data is already protected by the VPD policies
against the relations in the request query.
Restrictions with Materialized Views and Virtual Private Database
Query rewrite does not use its full and partial text match modes with request queries
that include relations with active VPD policies, but it does use general rewrite
methods. This is because VPD transparently transforms the request query to apply the
VPD policy. If query rewrite were to perform a text match transformation against a
request query with a VPD policy, the effect would be to negate the VPD policy.
In addition, when you create or refresh a materialized view, the owner of the
materialized view must not have any active VPD policies in effect against the base
relations of the materialized view, or an error is returned. The materialized view
owner must either have no such VPD policies, or any such policy must return NULL.
This is because VPD would transparently modify the defining query of the
materialized view such that the set of rows contained by the materialized view would
not match the set of rows indicated by the materialized view definition.
One way to work around this restriction yet still create a materialized view containing
the desired VPD-specified subset of rows is to create the materialized view in a user
account that has no active VPD policies against the detail relations of the materialized
view. In addition, you can include a predicate in the WHERE clause of the materialized
view that embodies the effect of the VPD policy. When query rewrite attempts to
rewrite a request query that has that VPD policy, it matches up the VPD-generated
predicate on the request query with the predicate you directly specify when you create
the materialized view.
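As an illustration of this workaround, suppose a VPD policy on the sales table restricts certain users to rows with channel_id = 3. The policy, the materialized view name, and the column aliases below are all hypothetical. The materialized view is created from an account with no active VPD policy on sales and states the same predicate explicitly.

```sql
-- Hypothetical policy being mirrored: channel_id = 3.
CREATE MATERIALIZED VIEW sales_channel3_mv
ENABLE QUERY REWRITE AS
SELECT s.time_id, s.channel_id,
       SUM(s.amount_sold)   AS sum_amount,
       COUNT(s.amount_sold) AS cnt_amount,
       COUNT(*)             AS cnt_star
FROM   sales s
WHERE  s.channel_id = 3
GROUP  BY s.time_id, s.channel_id;
```

A request query from a restricted user then carries the VPD-generated predicate, which general rewrite can match against the WHERE clause of the materialized view.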
Altering Materialized Views
Six modifications can be made to a materialized view. You can:
■ Change its refresh option (FAST/FORCE/COMPLETE/NEVER).
■ Change its refresh mode (ON COMMIT/ON DEMAND).
■ Recompile it.
■ Enable or disable its use for query rewrite.
■ Consider it fresh.
■ Perform partition maintenance operations on it.
All other changes are achieved by dropping and then re-creating the materialized
view.
The COMPILE clause of the ALTER MATERIALIZED VIEW statement can be used when
the materialized view has been invalidated. This compile process is quick, and allows
the materialized view to be used by query rewrite again.
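The first five of these modifications might be issued as follows. The materialized view name part_sales_mv is borrowed from an earlier example, and the statements are independent alternatives rather than a sequence to run together.

```sql
ALTER MATERIALIZED VIEW part_sales_mv REFRESH COMPLETE;       -- change the refresh option
ALTER MATERIALIZED VIEW part_sales_mv REFRESH ON DEMAND;      -- change the refresh mode
ALTER MATERIALIZED VIEW part_sales_mv COMPILE;                -- recompile
ALTER MATERIALIZED VIEW part_sales_mv DISABLE QUERY REWRITE;  -- stop use for query rewrite
ALTER MATERIALIZED VIEW part_sales_mv CONSIDER FRESH;         -- consider it fresh
```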
See Also:
■ Oracle Database SQL Language Reference for further information
about the ALTER MATERIALIZED VIEW statement
■ "Invalidating Materialized Views" on page 10-10
11 Dimensions
This chapter discusses using dimensions in a data warehouse. It contains the following
topics:
■ What are Dimensions?
■ Creating Dimensions
■ Viewing Dimensions
■ Using Dimensions with Constraints
■ Validating Dimensions
■ Altering Dimensions
■ Deleting Dimensions
What are Dimensions?
A dimension is a structure that categorizes data in order to enable users to answer
business questions. Commonly used dimensions are customers, products, and time.
For example, each sales channel of a clothing retailer might gather and store data
regarding sales and returns of its clothing assortment. The retail chain
management can build a data warehouse to analyze the sales of its products across all
stores over time and help answer questions such as:
■ What is the effect of promoting one product on the sale of a related product that is
not promoted?
■ What are the sales of a product before and after a promotion?
■ How does a promotion affect the various distribution channels?
The data in the retailer's data warehouse system has two important components:
dimensions and facts. The dimensions are products, customers, promotions, channels,
and time. One approach for identifying your dimensions is to review your reference
tables, such as a product table that contains everything about a product, or a
promotion table containing all information about promotions. The facts are sales (units
sold) and profits. A data warehouse contains facts about the sales of each product on
a daily basis.
A typical relational implementation for such a data warehouse is a star schema. The
fact information is stored in what is called a fact table, whereas the dimensional
information is stored in dimension tables. In our example, each sales transaction
record is uniquely identified by the combination of customer, product, sales
channel, promotion, and day (time).
See Also: Chapter 20, "Schema Modeling Techniques" for further
details
In Oracle Database, the dimensional information itself is stored in a dimension table.
In addition, the database object dimension helps to organize and group dimensional
information into hierarchies. This represents natural 1:n relationships between
columns or column groups (the levels of a hierarchy) that cannot be represented with
constraint conditions. Going up a level in the hierarchy is called rolling up the data
and going down a level in the hierarchy is called drilling down the data. In the retailer
example:
■ Within the time dimension, months roll up to quarters, quarters roll up to years,
and years roll up to all years.
■ Within the product dimension, products roll up to subcategories, subcategories
roll up to categories, and categories roll up to all products.
■ Within the customer dimension, customers roll up to city. Then cities roll up to
state. Then states roll up to country. Then countries roll up to subregion.
Finally, subregions roll up to region, as shown in Figure 11–1.
Figure 11–1 Sample Rollup for a Customer Dimension
customer -> city -> state -> country -> subregion -> region
Data analysis typically starts at higher levels in the dimensional hierarchy and
gradually drills down if the situation warrants such analysis.
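To make rolling up and drilling down concrete, the following sketch assumes the sh sample schema tables sales and times with the columns used elsewhere in this guide. The quarter literal is a placeholder value; substitute one that exists in your data.

```sql
-- Higher level: sales by calendar quarter.
SELECT t.calendar_quarter_desc, SUM(s.amount_sold) AS amount_sold
FROM   sales s
JOIN   times t ON s.time_id = t.time_id
GROUP  BY t.calendar_quarter_desc;

-- Drill down one level: sales by calendar month within one quarter.
-- '1999-01' is a placeholder quarter value.
SELECT t.calendar_month_desc, SUM(s.amount_sold) AS amount_sold
FROM   sales s
JOIN   times t ON s.time_id = t.time_id
WHERE  t.calendar_quarter_desc = '1999-01'
GROUP  BY t.calendar_month_desc;
```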
Dimensions do not have to be defined. However, if your application uses dimensional
modeling, it is worth spending time creating them as it can yield significant benefits,
because they help query rewrite perform more complex types of rewrite. Dimensions
are also beneficial to certain types of materialized view refresh operations and with the
SQL Access Advisor. They are only mandatory if you use the SQL Access Advisor (a
GUI tool for materialized view and index management) without a workload to
recommend which materialized views and indexes to create, drop, or retain.
See Also:
■ Chapter 18, "Basic Query Rewrite" for further details regarding
query rewrite
■ Oracle Database Performance Tuning Guide for further details
regarding the SQL Access Advisor
In spite of the benefits of dimensions, you must not create dimensions in any schema
that does not fully satisfy the dimensional relationships described in this chapter.
Incorrect results can be returned from queries otherwise.
Creating Dimensions
Before you can create a dimension object, the dimension tables must exist in the
database, possibly already containing the dimension data. For example, if you create a
customer dimension, one or more tables must exist that contain the city, state, and
country information. In a star schema data warehouse, these dimension tables already
exist. It is therefore a simple task to identify which ones will be used.
Now you can draw the hierarchies of a dimension as shown in Figure 11–1. For
example, city is a child of state (because you can aggregate city-level data up to
state), and state is a child of country. This hierarchical information will be stored in
the database object dimension.
In the case of normalized or partially normalized dimension representation (a
dimension that is stored in more than one table), identify how these tables are joined.
Note whether the joins between the dimension tables can guarantee that each
child-side row joins with one and only one parent-side row. In the case of
denormalized dimensions, determine whether the child-side columns uniquely
determine the parent-side (or attribute) columns. If you use constraints to represent
these relationships, they can be enabled with the NOVALIDATE and RELY clauses if the
relationships represented by the constraints are guaranteed by other means.
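For a denormalized dimension table, one way to check whether a child-side column uniquely determines a parent-side column is a grouping query such as the following sketch. It assumes the customers table of the sh sample schema with the cust_city and cust_state_province columns used later in this chapter. Any row returned is a city that maps to more than one state.

```sql
-- Cities associated with more than one state violate the
-- child-determines-parent requirement.
-- COUNT(DISTINCT ...) ignores NULLs, so cities without a state
-- are not reported by this check.
SELECT cust_city,
       COUNT(DISTINCT cust_state_province) AS state_count
FROM   customers
GROUP  BY cust_city
HAVING COUNT(DISTINCT cust_state_province) > 1;
```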
You may want the capability to skip NULL levels in a dimension. An example of this is
with Puerto Rico. You may want Puerto Rico to be included within a region of North
America, but not include it within the state category. If you want this capability, use
the SKIP WHEN NULL clause. See the sample dimension later in this section for more
information and Oracle Database SQL Language Reference for syntax and restrictions.
You create a dimension using either the CREATE DIMENSION statement or the
Dimension Wizard in Oracle Enterprise Manager. Within the CREATE DIMENSION
statement, use the LEVEL clause to identify the names of the dimension levels.
See Also: Oracle Database SQL Language Reference for a complete
description of the CREATE DIMENSION statement
This customer dimension contains a single hierarchy with a geographical rollup, with
arrows drawn from the child level to the parent level, as shown in Figure 11–1 on
page 11-2.
Each arrow in this graph indicates that for any child there is one and only one parent.
For example, each city must be contained in exactly one state and each state must be
contained in exactly one country. States that belong to more than one country violate
hierarchical integrity. Also, you must use the SKIP WHEN NULL clause if you want to
include cities that do not belong to a state, such as Washington D.C. Hierarchical
integrity is necessary for the correct operation of management functions for
materialized views that include aggregates.
For example, you can declare a dimension products_dim, which contains levels
product, subcategory, and category:
CREATE DIMENSION products_dim
LEVEL product IS (products.prod_id)
LEVEL subcategory IS (products.prod_subcategory)
LEVEL category IS (products.prod_category) ...
Each level in the dimension must correspond to one or more columns in a table in the
database. Thus, level product is identified by the column prod_id in the products
table and level subcategory is identified by a column called prod_subcategory in
the same table.
In this example, the database tables are denormalized and all the columns exist in the
same table. However, this is not a prerequisite for creating dimensions. "Using
Normalized Dimension Tables" on page 11-8 shows how to create a dimension
customers_dim that has a normalized schema design using the JOIN KEY clause.
The next step is to declare the relationship between the levels with the HIERARCHY
clause and give that hierarchy a name. A hierarchical relationship is a functional
dependency from one level of a hierarchy to the next level in the hierarchy. Using the
level names defined previously, the CHILD OF relationship denotes that each child's
level value is associated with one and only one parent level value. The following
clause declares a hierarchy prod_rollup and defines the relationship between
product, subcategory, and category:
HIERARCHY prod_rollup
(product CHILD OF
subcategory CHILD OF
category)
In addition to the 1:n hierarchical relationships, dimensions also include 1:1
attribute relationships between the hierarchy levels and their dependent, determined
dimension attributes. For example, the dimension times_dim, as defined in Oracle
Database Sample Schemas, has columns fiscal_month_desc, fiscal_month_name,
and days_in_fiscal_month. Their relationship is defined as follows:
LEVEL fis_month IS TIMES.FISCAL_MONTH_DESC
...
ATTRIBUTE fis_month DETERMINES
(fiscal_month_name, days_in_fiscal_month)
The ATTRIBUTE ... DETERMINES clause relates fis_month to fiscal_month_name
and days_in_fiscal_month. Note that this is a unidirectional determination. The
only guarantee is that for a specific fis_month value, for example, 1999-11, you
find exactly one matching value for fiscal_month_name, for example, November,
and one for days_in_fiscal_month, for example, 28. You cannot determine a specific
fiscal_month_desc based on the fiscal_month_name, which is November for
every fiscal year.
In this example, suppose a query were issued that queried by fiscal_month_name
instead of fiscal_month_desc. Because this 1:1 relationship exists between the
attribute and the level, an already aggregated materialized view containing
fiscal_month_desc can be joined back to the dimension information and used to
identify the data.
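Conceptually, the join back described above resembles the following sketch. The materialized view fis_month_sales_mv and its amount_sold column are hypothetical, and this is not the exact statement that query rewrite generates; it only illustrates how the attribute is reached through the level column.

```sql
-- fis_month_sales_mv is a hypothetical materialized view that stores
-- SUM(amount_sold) AS amount_sold grouped by fiscal_month_desc.
SELECT mv.fiscal_month_desc, mv.amount_sold
FROM   fis_month_sales_mv mv
JOIN   (SELECT DISTINCT fiscal_month_desc, fiscal_month_name
        FROM   times) t
ON     mv.fiscal_month_desc = t.fiscal_month_desc
WHERE  t.fiscal_month_name = 'November';
```

The DISTINCT subquery returns one row per fiscal month only if fiscal_month_desc really determines fiscal_month_name, which is the relationship the ATTRIBUTE clause declares.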
See Also: Chapter 18, "Basic Query Rewrite" for further details of
using dimensional information
A sample dimension definition follows:
CREATE DIMENSION products_dim
LEVEL product IS (products.prod_id)
LEVEL subcategory IS (products.prod_subcategory) [SKIP WHEN NULL]
LEVEL category IS (products.prod_category)
HIERARCHY prod_rollup (
product CHILD OF
subcategory CHILD OF
category)
ATTRIBUTE product DETERMINES
(products.prod_name, products.prod_desc,
prod_weight_class, prod_unit_of_measure,
prod_pack_size, prod_status, prod_list_price, prod_min_price)
ATTRIBUTE subcategory DETERMINES
(prod_subcategory, prod_subcategory_desc)
ATTRIBUTE category DETERMINES
(prod_category, prod_category_desc);
Alternatively, the extended_attribute_clause could have been used instead of
the attribute_clause, as shown in the following example:
CREATE DIMENSION products_dim
LEVEL product IS (products.prod_id)
LEVEL subcategory IS (products.prod_subcategory)
LEVEL category IS (products.prod_category)
HIERARCHY prod_rollup (
product CHILD OF
subcategory CHILD OF
category
)
ATTRIBUTE product_info LEVEL product DETERMINES
(products.prod_name, products.prod_desc,
prod_weight_class, prod_unit_of_measure,
prod_pack_size, prod_status, prod_list_price, prod_min_price)
ATTRIBUTE subcategory DETERMINES
(prod_subcategory, prod_subcategory_desc)
ATTRIBUTE category DETERMINES
(prod_category, prod_category_desc);
The design, creation, and maintenance of dimensions is part of the design, creation,
and maintenance of your data warehouse schema. Once the dimension has been
created, verify that it meets these requirements:
■ There must be a 1:n relationship between a parent and children. A parent can
have one or more children, but a child can have only one parent.
■ There must be a 1:1 attribute relationship between hierarchy levels and their
dependent dimension attributes. For example, if there is a column
fiscal_month_desc, then a possible attribute relationship would be
fiscal_month_desc to fiscal_month_name. For skip NULL levels, if a row of the
relation of a skip level has a NULL value for the level column, then that row must
have a NULL value for the attribute-relationship column, too.
■ If the columns of a parent level and child level are in different relations, then the
connection between them also requires a 1:n join relationship. Each row of the
child table must join with one and only one row of the parent table unless you use
the SKIP WHEN NULL clause. This relationship is stronger than referential integrity
alone, because it requires that the child join key must be non-null, that referential
integrity must be maintained from the child join key to the parent join key, and
that the parent join key must be unique.
■ You must ensure (using database constraints if necessary) that the columns of each
hierarchy level are non-null unless you use the SKIP WHEN NULL clause and that
hierarchical integrity is maintained.
■ An optional join key is a join key that connects the immediate non-skip child (if
such a level exists), CHILDLEV, of a skip level to the nearest non-skip ancestor
(again, if such a level exists), ANCLEV, of the skip level in the hierarchy. Also, this
join key is allowed only when CHILDLEV and ANCLEV are defined over different
relations.
■ The hierarchies of a dimension can overlap or be disconnected from each other.
However, the columns of a hierarchy level cannot be associated with more than
one dimension.
■ Join relationships that form cycles in the dimension graph are not supported. For
example, a hierarchy level cannot be joined to itself either directly or indirectly.
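The join requirement in the list above can be spot-checked by hand before you rely on a dimension. The following sketch assumes the sh sample schema tables customers and countries, joined on country_id as in the customers_dim example later in this chapter. Both queries should return no rows.

```sql
-- Child rows whose join key is null or has no matching parent row.
SELECT c.cust_id, c.country_id
FROM   customers c
LEFT OUTER JOIN countries n ON c.country_id = n.country_id
WHERE  n.country_id IS NULL;

-- Parent join key values that are not unique.
SELECT country_id, COUNT(*) AS row_count
FROM   countries
GROUP  BY country_id
HAVING COUNT(*) > 1;
```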
Note: The information stored with a dimension object is only
declarative. The previously discussed relationships are not
enforced with the creation of a dimension object. You should
validate any dimension definition with the
DBMS_DIMENSION.VALIDATE_DIMENSION procedure, as discussed in
"Validating Dimensions" on page 11-10.
Dropping and Creating Attributes with Columns
You can use the attribute clause in a CREATE DIMENSION statement to specify one or
multiple columns that are uniquely determined by a hierarchy level.
If you use the extended_attribute_clause to create multiple columns
determined by a hierarchy level, you can drop one attribute column without dropping
them all. Alternatively, you can specify an attribute name in each attribute clause of a
CREATE or ALTER DIMENSION statement, so that the multiple level-to-column
relationships can be specified individually.
The following statement illustrates how you can drop a single column without
dropping all columns:
CREATE DIMENSION products_dim
LEVEL product IS (products.prod_id)
LEVEL subcategory IS (products.prod_subcategory)
LEVEL category IS (products.prod_category)
HIERARCHY prod_rollup (
product CHILD OF
subcategory CHILD OF category)
ATTRIBUTE product DETERMINES
(products.prod_name, products.prod_desc,
prod_weight_class, prod_unit_of_measure,
prod_pack_size,prod_status, prod_list_price, prod_min_price)
ATTRIBUTE subcategory_att LEVEL subcategory DETERMINES
(prod_subcategory, prod_subcategory_desc)
ATTRIBUTE category DETERMINES
(prod_category, prod_category_desc);
ALTER DIMENSION products_dim
DROP ATTRIBUTE subcategory_att LEVEL subcategory COLUMN prod_subcategory;
See Also: Oracle Database SQL Language Reference for a complete
description of the CREATE DIMENSION statement
Multiple Hierarchies
A single dimension definition can contain multiple hierarchies. Suppose our retailer
wants to track the sales of certain items over time. The first step is to define the time
dimension over which sales will be tracked. Figure 11–2 illustrates a dimension
times_dim with two time hierarchies.
Figure 11–2 times_dim Dimension with Two Time Hierarchies
Calendar hierarchy: day -> month -> quarter -> year
Fiscal hierarchy: day -> fis_week -> fis_month -> fis_quarter -> fis_year
From the illustration, you can construct the hierarchies of the denormalized times_dim
dimension's CREATE DIMENSION statement as follows. The complete CREATE
DIMENSION statement as well as the CREATE TABLE statement are shown in Oracle
Database Sample Schemas.
CREATE DIMENSION times_dim
LEVEL day IS times.time_id
LEVEL month IS times.calendar_month_desc
LEVEL quarter IS times.calendar_quarter_desc
LEVEL year IS times.calendar_year
LEVEL fis_week IS times.week_ending_day
LEVEL fis_month IS times.fiscal_month_desc
LEVEL fis_quarter IS times.fiscal_quarter_desc
LEVEL fis_year IS times.fiscal_year
HIERARCHY cal_rollup (
day CHILD OF
month CHILD OF
quarter CHILD OF
year
)
HIERARCHY fis_rollup (
day CHILD OF
fis_week CHILD OF
fis_month CHILD OF
fis_quarter CHILD OF
fis_year
) <attribute determination clauses>;
Using Normalized Dimension Tables
The tables used to define a dimension may be normalized or denormalized and the
individual hierarchies can be normalized or denormalized. If the levels of a hierarchy
come from the same table, it is called a fully denormalized hierarchy. For example,
cal_rollup in the times_dim dimension is a denormalized hierarchy. If levels of a
hierarchy come from different tables, such a hierarchy is either a fully or partially
normalized hierarchy. This section shows how to define a normalized hierarchy.
Suppose the tracking of a customer's location is done by city, state, and country. This
data is stored in the tables customers and countries. The customer dimension
customers_dim is partially normalized because the data entities cust_id and
country_id are taken from different tables. The clause JOIN KEY within the
dimension definition specifies how to join together the levels in the hierarchy. The
dimension statement is partially shown in the following. The complete CREATE
DIMENSION statement as well as the CREATE TABLE statement are shown in Oracle
Database Sample Schemas.
CREATE DIMENSION customers_dim
LEVEL customer IS (customers.cust_id)
LEVEL city IS (customers.cust_city)
LEVEL state IS (customers.cust_state_province)
LEVEL country IS (countries.country_id)
LEVEL subregion IS (countries.country_subregion)
LEVEL region IS (countries.country_region)
HIERARCHY geog_rollup (
customer CHILD OF
city CHILD OF
state CHILD OF
country CHILD OF
subregion CHILD OF
region
JOIN KEY (customers.country_id) REFERENCES country);
If you use the SKIP WHEN NULL clause, you can use the JOIN KEY clause to link levels
that have a missing level in their hierarchy. For example, the following statement
enables a state level that has been declared as SKIP WHEN NULL to join city and
country:
JOIN KEY (city.country_id) REFERENCES country;
This ensures that the rows at customer and city levels can still be associated with the
rows of country, subregion, and region levels.
Viewing Dimensions
Dimensions can be viewed through one of two methods:
■ Using Oracle Enterprise Manager
■ Using the DESCRIBE_DIMENSION Procedure
Using Oracle Enterprise Manager
All of the dimensions that exist in the data warehouse can be viewed using Oracle
Enterprise Manager. Select the Dimension object from within the Schema icon to
display all of the dimensions. Select a specific dimension to graphically display its
hierarchy, levels, and any attributes that have been defined.
Using the DESCRIBE_DIMENSION Procedure
To view the definition of a dimension, use the DESCRIBE_DIMENSION procedure in
the DBMS_DIMENSION package. For example, if a dimension is created in the sh
sample schema with the following statements:
CREATE DIMENSION channels_dim
LEVEL channel IS (channels.channel_id)
LEVEL channel_class IS (channels.channel_class)
HIERARCHY channel_rollup (
channel CHILD OF channel_class)
ATTRIBUTE channel DETERMINES (channel_desc)
ATTRIBUTE channel_class DETERMINES (channel_class);
Execute the DESCRIBE_DIMENSION procedure as follows:
SET SERVEROUTPUT ON FORMAT WRAPPED; --to improve the display of info
EXECUTE DBMS_DIMENSION.DESCRIBE_DIMENSION('SH.CHANNELS_DIM');
You then see the following output results:
EXECUTE DBMS_DIMENSION.DESCRIBE_DIMENSION('SH.CHANNELS_DIM');
DIMENSION SH.CHANNELS_DIM
LEVEL CHANNEL IS SH.CHANNELS.CHANNEL_ID
LEVEL CHANNEL_CLASS IS SH.CHANNELS.CHANNEL_CLASS
HIERARCHY CHANNEL_ROLLUP (
CHANNEL CHILD OF
CHANNEL_CLASS)
ATTRIBUTE CHANNEL LEVEL CHANNEL DETERMINES
SH.CHANNELS.CHANNEL_DESC
ATTRIBUTE CHANNEL_CLASS LEVEL CHANNEL_CLASS DETERMINES
SH.CHANNELS.CHANNEL_CLASS
Using Dimensions with Constraints
Constraints play an important role with dimensions. Full referential integrity is
sometimes enabled in data warehouses, but not always. This is because operational
databases normally have full referential integrity and you can ensure that the data
flowing into your data warehouse never violates the already established integrity
rules.
It is recommended that constraints be enabled and, if validation time is a concern, then
the NOVALIDATE clause should be used as follows:
ALTER TABLE time ENABLE NOVALIDATE CONSTRAINT pk_time;
Primary and foreign keys should be implemented also. Referential integrity
constraints and NOT NULL constraints on the fact tables provide information that query
rewrite can use to extend the usefulness of materialized views.
In addition, you should use the RELY clause to inform query rewrite that it can rely
upon the constraints being correct as follows:
ALTER TABLE time MODIFY CONSTRAINT pk_time RELY;
This information is also used for query rewrite. See Chapter 18, "Basic Query Rewrite"
for more information.
If you use the SKIP WHEN NULL clause, at least one of the referenced level columns
should not have NOT NULL constraints.
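As a sketch of how these recommendations fit together, the following statements inspect the constraint state in the data dictionary and add a foreign key that is enabled without validation and marked RELY. The fact table sales, the column time_id, and the constraint name fk_sales_time are assumptions made for illustration; only the table time and the constraint pk_time come from the preceding examples.

```sql
-- Inspect the state of the constraints on the time table.
SELECT constraint_name, constraint_type, status, validated, rely
FROM   user_constraints
WHERE  table_name = 'TIME';

-- Hypothetical foreign key from a fact table to the time table.
-- The referenced primary key should already be marked RELY,
-- as in the preceding ALTER TABLE ... MODIFY CONSTRAINT statement.
ALTER TABLE sales ADD CONSTRAINT fk_sales_time
  FOREIGN KEY (time_id) REFERENCES time (time_id)
  RELY ENABLE NOVALIDATE;
```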
Validating Dimensions
The information of a dimension object is declarative only and not enforced by the
database. If the relationships described by the dimensions are incorrect, incorrect
results could occur. Therefore, you should verify the relationships specified by
CREATE DIMENSION using the DBMS_DIMENSION.VALIDATE_DIMENSION
procedure periodically.
This procedure is easy to use and has only four parameters:
■ dimension: the owner and name.
■ incremental: set to TRUE to check only the new rows for tables of this
dimension.
■ check_nulls: set to TRUE to verify that all columns that are not in the levels
containing a SKIP WHEN NULL clause are not null.
■ statement_id: a user-supplied unique identifier to identify the result of each
run of the procedure.
The following example validates the dimension TIME_FN in the sh schema:
@utldim.sql
EXECUTE DBMS_DIMENSION.VALIDATE_DIMENSION ('SH.TIME_FN', FALSE, TRUE,
'my first example');
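The same call can also be written as an anonymous PL/SQL block with named notation, which makes the role of each argument explicit. This is a sketch that uses the four parameter names listed above and the same values as the preceding call.

```sql
BEGIN
  DBMS_DIMENSION.VALIDATE_DIMENSION(
    dimension    => 'SH.TIME_FN',        -- owner and name
    incremental  => FALSE,               -- check all rows, not only new ones
    check_nulls  => TRUE,                -- verify that level columns are not null
    statement_id => 'my first example'); -- tag for rows in DIMENSION_EXCEPTIONS
END;
/
```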
Before running the VALIDATE_DIMENSION procedure, you need to create a local
table, DIMENSION_EXCEPTIONS, by running the provided script utldim.sql. If the
VALIDATE_DIMENSION procedure encounters any errors, they are placed in this table.
Querying this table will identify the exceptions that were found. The following
illustrates a sample:
SELECT * FROM dimension_exceptions
WHERE statement_id = 'my first example';
STATEMENT_ID     OWNER TABLE_NAME DIMENSION_NAME RELATIONSHIP BAD_ROWID
---------------- ----- ---------- -------------- ------------ ------------------
my first example SH    MONTH      TIME_FN        FOREIGN KEY  AAAAuwAAJAAAARwAAA
However, rather than inspect this table alone, it may be better to use the rowid of the
invalid row to retrieve the actual row that has violated the constraint. In this example,
the dimension TIME_FN is checking a table called month. It has found a row that
violates the constraints. Using the rowid, you can see exactly which row in the month
table is causing the problem, as in the following:
SELECT * FROM month
WHERE rowid IN (SELECT bad_rowid
FROM dimension_exceptions
WHERE statement_id = 'my first example');
MONTH  QUARTER FISCAL_QTR YEAR FULL_MONTH_NAME MONTH_NUMB
------ ------- ---------- ---- --------------- ----------
199903 19981   19981      1998 March                    3
Altering Dimensions
You can modify a dimension using the ALTER DIMENSION statement. You can add or
drop a level, hierarchy, or attribute from the dimension using this command.
Referring to the time dimension in Figure 11–2 on page 11-7, you can remove the
attribute fis_year, drop the hierarchy fis_rollup, or remove the level fis_year.
In addition, you can add a new level called f_year as in the following:
ALTER DIMENSION times_dim DROP ATTRIBUTE fis_year;
ALTER DIMENSION times_dim DROP HIERARCHY fis_rollup;
ALTER DIMENSION times_dim DROP LEVEL fis_year;
ALTER DIMENSION times_dim ADD LEVEL f_year IS times.fiscal_year;
If you used the extended_attribute_clause when creating the dimension, you
can drop one attribute column without dropping all attribute columns. This is
illustrated in "Dropping and Creating Attributes with Columns" on page 11-6, which
shows the following statement:
ALTER DIMENSION products_dim
DROP ATTRIBUTE subcategory_att LEVEL subcategory COLUMN prod_subcategory;
If you try to remove anything with further dependencies inside the dimension, Oracle
Database rejects the altering of the dimension. A dimension becomes invalid if you
change any schema object that the dimension is referencing. For example, if the table
on which the dimension is defined is altered, the dimension becomes invalid.
You can modify a dimension by adding a level containing a SKIP WHEN NULL clause,
as in the following statement:
ALTER DIMENSION times_dim
ADD LEVEL f_year IS times.fiscal_year SKIP WHEN NULL;
You cannot, however, modify a level that contains a SKIP WHEN NULL clause. Instead,
you need to drop the level and re-create it.
To check the status of a dimension, view the contents of the column invalid in the
ALL_DIMENSIONS data dictionary view. To revalidate the dimension, use the
COMPILE option as follows:
ALTER DIMENSION times_dim COMPILE;
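A status check along these lines might look like the following sketch. It assumes the dimension is owned by the sh schema; adjust the owner to match your environment.

```sql
-- INVALID shows Y for an invalid dimension and N for a valid one.
SELECT owner, dimension_name, invalid, compile_state
FROM   all_dimensions
WHERE  owner = 'SH'
AND    dimension_name = 'TIMES_DIM';
```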
Dimensions can also be modified or deleted using Oracle Enterprise Manager.
Deleting Dimensions
A dimension is removed using the DROP DIMENSION statement. For example, you
could issue the following statement:
DROP DIMENSION times_dim;
Part IV Managing the Data Warehouse Environment
This section discusses the tasks necessary for managing a data warehouse.
It contains the following chapters:
■ Chapter 12, "Overview of Extraction, Transformation, and Loading"
■ Chapter 13, "Extraction in Data Warehouses"
■ Chapter 14, "Transportation in Data Warehouses"
■ Chapter 15, "Loading and Transformation"
■ Chapter 16, "Maintaining the Data Warehouse"
■ Chapter 17, "Change Data Capture"
12 Overview of Extraction, Transformation, and Loading
This chapter discusses the process of extracting, transporting, transforming, and
loading data in a data warehousing environment. It includes the following topics:
■ Overview of ETL in Data Warehouses
■ ETL Tools for Data Warehouses
Overview of ETL in Data Warehouses
You must load your data warehouse regularly so that it can serve its purpose of
facilitating business analysis. To do this, data from one or more operational systems
must be extracted and copied into the data warehouse. The challenge in data
warehouse environments is to integrate, rearrange and consolidate large volumes of
data over many systems, thereby providing a new unified information base for
business intelligence.
The process of extracting data from source systems and bringing it into the data
warehouse is commonly called ETL, which stands for extraction, transformation, and
loading. Note that ETL refers to a broad process, and not three well-defined steps. The
acronym ETL is perhaps too simplistic, because it omits the transportation phase and
implies that each of the other phases of the process is distinct. Nevertheless, the entire
process is known as ETL.
The methodology and tasks of ETL have been well known for many years, and are not
necessarily unique to data warehouse environments: a wide variety of proprietary
applications and database systems are the IT backbone of any enterprise. Data has to
be shared between applications or systems in an attempt to integrate them, giving at
least two applications the same picture of the world. This data sharing was mostly
addressed by mechanisms similar to what we now call ETL.
ETL Basics in Data Warehousing
What happens during the ETL process? The following tasks are the main actions in the
process.
Extraction of Data
During extraction, the desired data is identified and extracted from many different
sources, including database systems and applications. Very often, it is not possible to
identify the specific subset of interest, so more data than necessary has to be
extracted, and the relevant data is identified at a later point in time.
Depending on the source system's capabilities (for example, operating system
resources), some transformations may take place during this extraction process. The
size of the extracted data varies from hundreds of kilobytes up to gigabytes,
depending on the source system and the business situation. The same is true for the
time delta between two (logically) identical extractions: the time span may vary
between days/hours and minutes to near real-time. Web server log files, for example,
can easily grow to hundreds of megabytes in a very short period.
Transportation of Data
After data is extracted, it has to be physically transported to the target system or to an
intermediate system for further processing. Depending on the chosen way of
transportation, some transformations can be done during this process, too. For
example, a SQL statement which directly accesses a remote target through a gateway
can concatenate two columns as part of the SELECT statement.
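A minimal sketch of such a statement follows. The database link source_link, the staging table customers_stage, and the column names are hypothetical and stand in for whatever your gateway and source system expose.

```sql
-- Pull rows across a (hypothetical) database link and concatenate
-- two source columns into one target column during transport.
INSERT INTO customers_stage (cust_id, cust_name)
SELECT cust_id,
       cust_first_name || ' ' || cust_last_name
FROM   customers@source_link;
```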
The emphasis in many of the examples in this section is scalability. Many long-time
users of Oracle Database are experts in programming complex data transformation
logic using PL/SQL. These chapters suggest alternatives for many such data
manipulation operations, with a particular emphasis on implementations that take
advantage of Oracle's new SQL functionality, especially for ETL and the parallel query
infrastructure.
ETL Tools for Data Warehouses
Designing and maintaining the ETL process is often considered one of the most
difficult and resource-intensive portions of a data warehouse project. Many data
warehousing projects use ETL tools to manage this process. Oracle Warehouse
Builder, for example, provides ETL capabilities and takes advantage of inherent
database abilities. Other data warehouse builders create their own ETL tools and
processes, either inside or outside the database.
Besides the support of extraction, transformation, and loading, there are some other
tasks that are important for a successful ETL implementation as part of the daily
operations of the data warehouse and its support for further enhancements. Besides
the support for designing a data warehouse and the data flow, these tasks are typically
addressed by ETL tools such as Oracle Warehouse Builder.
Oracle is not an ETL tool and does not provide a complete solution for ETL. However,
Oracle does provide a rich set of capabilities that can be used by both ETL tools and
customized ETL solutions. Oracle offers techniques for transporting data between
Oracle databases, for transforming large volumes of data, and for quickly loading new
data into a data warehouse.
Daily Operations in Data Warehouses
The successive loads and transformations must be scheduled and processed in a
specific order. Depending on the success or failure of the operation or parts of it, the
result must be tracked and subsequent, alternative processes might be started. The
control of the progress as well as the definition of a business workflow of the
operations are typically addressed by ETL tools such as Oracle Warehouse Builder.
Evolution of the Data Warehouse
As the data warehouse is a living IT system, sources and targets might change. Those
changes must be maintained and tracked through the lifespan of the system without
overwriting or deleting the old ETL process flow information. To build and keep a
level of trust about the information in the warehouse, it should ideally be possible to
reconstruct the process flow of each individual record in the warehouse at any point
in the future.
13
13 Extraction in Data Warehouses
This chapter discusses extraction, which is the process of taking data from an
operational system and moving it to your data warehouse or staging system. The
chapter discusses:
■ Overview of Extraction in Data Warehouses
■ Introduction to Extraction Methods in Data Warehouses
■ Data Warehousing Extraction Examples
Overview of Extraction in Data Warehouses
Extraction is the operation of extracting data from a source system for further use in a
data warehouse environment. This is the first step of the ETL process. After the
extraction, this data can be transformed and loaded into the data warehouse.
The source systems for a data warehouse are typically transaction processing
applications. For example, one of the source systems for a sales analysis data
warehouse might be an order entry system that records all of the current order
activities.
Designing and creating the extraction process is often one of the most time-consuming
tasks in the ETL process and, indeed, in the entire data warehousing process. The
source systems might be very complex and poorly documented, and thus determining
which data needs to be extracted can be difficult. The data normally has to be
extracted not just once but periodically, to supply all changed data to the data
warehouse and keep it up to date. Moreover, the source system
typically cannot be modified, nor can its performance or availability be adjusted, to
accommodate the needs of the data warehouse extraction process.
These are important considerations for extraction and ETL in general. This chapter,
however, focuses on the technical considerations of having different kinds of sources
and extraction methods. It assumes that the data warehouse team has already
identified the data that will be extracted, and discusses common techniques used for
extracting data from source databases.
Designing this process means making decisions about the following two main aspects:
■ Which extraction method do I choose?
This influences the source system, the transportation process, and the time needed
for refreshing the warehouse.
■ How do I provide the extracted data for further processing?
This influences the transportation method, and the need for cleaning and
transforming the data.
Introduction to Extraction Methods in Data Warehouses
The extraction method you should choose depends heavily on the source system and on
the business needs in the target data warehouse environment. Very often, it is not
possible to add logic to the source systems to support incremental extraction of data,
because of the performance impact or the increased workload on these systems.
Sometimes the customer is not even allowed to add anything to an out-of-the-box
application system.
Logical Extraction Methods
There are two types of logical extraction:
■ Full Extraction
■ Incremental Extraction
Full Extraction
The data is extracted completely from the source system. Because this extraction
reflects all the data currently available on the source system, there's no need to keep
track of changes to the data source since the last successful extraction. The source data
will be provided as-is and no additional logical information (for example, timestamps)
is necessary on the source site. An example of a full extraction is an export file of
a distinct table or a remote SQL statement scanning the complete source table.
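As a minimal sketch of the remote SQL variant, the following statement copies a complete source table into a staging table over a database link. The database link name source_db and the staging table name orders_stage are hypothetical and are used only for illustration; orders stands in for any source table.

```sql
-- Full extraction: scan the complete source table through a database link.
-- source_db and orders_stage are illustrative names, not objects defined
-- elsewhere in this guide.
CREATE TABLE orders_stage AS
  SELECT * FROM orders@source_db;
```

Because every row is read on each run, no change tracking is needed on the source, but the cost of the extraction grows with the size of the table.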
Incremental Extraction
At a specific point in time, only the data that has changed since a well-defined event
back in history is extracted. This event may be the last time of extraction or a more
complex business event like the last booking day of a fiscal period. To identify this
delta, it must be possible to identify all the information that has changed since this
specific event. This information can be provided either by the source data itself, such
as an application column reflecting the last-changed timestamp, or by a change table,
where an additional mechanism keeps track of the changes alongside the originating
transactions. In most cases, using the latter method means adding extraction logic to
the source system.
Many data warehouses do not use any change-capture techniques as part of the
extraction process. Instead, entire tables from the source systems are extracted to the
data warehouse or staging area, and these tables are compared with a previous extract
from the source system to identify the changed data. This approach may not have
significant impact on the source systems, but it clearly can place a considerable burden
on the data warehouse processes, particularly if the data volumes are large.
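The comparison described above could be sketched with set operators, assuming the current and previous extracts are held in two staging tables with identical column lists. The table names orders_extract_current and orders_extract_previous and the key column order_id are assumptions made for this illustration.

```sql
-- Rows that are new or changed since the previous extract.
-- Assumes both staging tables have the same columns in the same order
-- and contain no LOB columns, which MINUS cannot compare.
SELECT * FROM orders_extract_current
MINUS
SELECT * FROM orders_extract_previous;

-- Keys of rows deleted on the source since the previous extract.
-- order_id is assumed to be the primary key of the source table.
SELECT order_id FROM orders_extract_previous
MINUS
SELECT order_id FROM orders_extract_current;
```

Both statements read each extract in full, which illustrates why this approach shifts the burden from the source system to the data warehouse.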
Oracle's Change Data Capture (CDC) mechanism can extract and maintain such delta
information. See Chapter 17, "Change Data Capture" for further details about the
Change Data Capture framework.
Physical Extraction Methods
Depending on the chosen logical extraction method and the capabilities and
restrictions on the source side, the extracted data can be physically extracted by two
mechanisms. The data can either be extracted online from the source system or from
an offline structure. Such an offline structure might already exist or it might be
generated by an extraction routine.
There are the following methods of physical extraction:
■ Online Extraction
■ Offline Extraction
Online Extraction
The data is extracted directly from the source system itself. The extraction process can
connect directly to the source system to access the source tables themselves or to an
intermediate system that stores the data in a preconfigured manner (for example,
snapshot logs or change tables). Note that the intermediate system is not necessarily
physically different from the source system.
With online extractions, you must consider whether the distributed transactions are
using original source objects or prepared source objects.
Offline Extraction
The data is not extracted directly from the source system but is staged explicitly
outside the original source system. The data already has an existing structure (for
example, redo logs, archive logs or transportable tablespaces) or was created by an
extraction routine.
You should consider the following structures:
■ Flat files
Data in a defined, generic format. Additional information about the source object
is necessary for further processing.
■ Dump files
Oracle-specific format. Information about the containing objects may or may not
be included, depending on the chosen utility.
■ Redo and archive logs
Information is in a special, additional dump file.
■ Transportable tablespaces
A powerful way to extract and move large volumes of data between Oracle
databases. A more detailed example of using this feature to extract and transport
data is provided in Chapter 14, "Transportation in Data Warehouses". Oracle
recommends that you use transportable tablespaces whenever possible, because
they can provide considerable advantages in performance and manageability over
other extraction techniques.
See Oracle Database Utilities for more information on using export/import.
Change Data Capture
An important consideration for extraction is incremental extraction, also called Change
Data Capture. If a data warehouse extracts data from an operational system on a
nightly basis, then the data warehouse requires only the data that has changed since
the last extraction (that is, the data that has been modified in the past 24 hours).
Change Data Capture is also the key-enabling technology for providing near real-time,
or on-time, data warehousing.
When it is possible to efficiently identify and extract only the most recently changed
data, the extraction process (and all downstream operations in the ETL process) can be
much more efficient, because it must extract a much smaller volume of data.
Unfortunately, for many source systems, identifying the recently modified data may
be difficult or intrusive to the operation of the system. Change Data Capture is
typically the most challenging technical issue in data extraction.
Because change data capture is often desirable as part of the extraction process and it
might not be possible to use the Change Data Capture mechanism, this section
describes several techniques for implementing a self-developed change capture on
Oracle Database source systems:
■ Timestamps
■ Partitioning
■ Triggers
These techniques are based upon the characteristics of the source systems, or may
require modifications to the source systems. Thus, each of these techniques must be
carefully evaluated by the owners of the source system prior to implementation.
Each of these techniques can work in conjunction with the data extraction technique
discussed previously. For example, timestamps can be used whether the data is being
unloaded to a file or accessed through a distributed query. See Chapter 17, "Change
Data Capture" for further details.
Timestamps
The tables in some operational systems have timestamp columns. The timestamp
specifies the time and date that a given row was last modified. If the tables in an
operational system have columns containing timestamps, then the latest data can
easily be identified using the timestamp columns. For example, the following query
might be useful for extracting today's data from an orders table:
SELECT * FROM orders
WHERE TRUNC(CAST(order_date AS date),'dd') = TRUNC(SYSDATE,'dd');
If the timestamp information is not available in an operational source system, you are
not always able to modify the system to include timestamps. Such modification would
require, first, modifying the operational system's tables to include a new timestamp
column and then creating a trigger to update the timestamp column following every
operation that modifies a given row.
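Where modifying the source system is permitted, the first step and a matching incremental query might look like the following sketch. The column name last_modified and the bind variable :last_extract_time are hypothetical; the extraction job is assumed to remember the time of its previous run and to supply it at execution.

```sql
-- Add a last-modified column to the source table (hypothetical column name).
-- The default value stamps newly inserted rows; updates still need a
-- trigger, as described in the preceding paragraph.
ALTER TABLE orders ADD (last_modified TIMESTAMP DEFAULT SYSTIMESTAMP);

-- Extract only the rows changed since the previous extraction.
-- :last_extract_time is supplied by the extraction job.
SELECT *
  FROM orders
 WHERE last_modified > :last_extract_time;
```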
Partitioning
Some source systems might use range partitioning, such that the source tables are
partitioned along a date key, which allows for easy identification of new data. For
example, if you are extracting from an orders table, and the orders table is
partitioned by week, then it is easy to identify the current week's data.
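A sketch of this approach, assuming the extracting user owns the orders table and that the current week's partition is named orders_w23_2009 (a hypothetical name):

```sql
-- List the partitions of the orders table with their upper bounds,
-- to find the partition that holds the current week.
SELECT partition_name, high_value
  FROM user_tab_partitions
 WHERE table_name = 'ORDERS'
 ORDER BY partition_position;

-- Extract only that partition (the partition name is hypothetical).
SELECT * FROM orders PARTITION (orders_w23_2009);
```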
Triggers
Triggers can be created in operational systems to keep track of recently updated
records. They can then be used in conjunction with timestamp columns to identify the
exact time and date when a given row was last modified. You do this by creating a
trigger on each source table that requires change data capture. Following each DML
statement that is executed on the source table, this trigger updates the timestamp
column with the current time. Thus, the timestamp column provides the exact time
and date when a given row was last modified.
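A minimal sketch of such a trigger follows. It assumes the orders table already has a TIMESTAMP column named last_modified; both that column name and the trigger name are hypothetical.

```sql
-- Assumes orders has a TIMESTAMP column named last_modified (hypothetical).
CREATE OR REPLACE TRIGGER orders_set_last_modified
  BEFORE INSERT OR UPDATE ON orders
  FOR EACH ROW
BEGIN
  -- Stamp each inserted or updated row with the current time.
  :NEW.last_modified := SYSTIMESTAMP;
END;
/
```

This sketch covers inserts and updates only. A deleted row leaves nothing to stamp, so capturing deletes would need a separate mechanism, such as a change table.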
A similar internalized trigger-based technique is used for Oracle materialized view
logs. These logs are used by materialized views to identify changed data, and these
logs are accessible to end users. However, the format of the materialized view logs is
not documented and might change over time.
Materialized view logs rely on triggers, but they provide an advantage in that the
creation and maintenance of this change-data system is largely managed by the
database.
However, if you want to use a trigger-based mechanism, Oracle recommends
synchronous Change Data Capture, because CDC provides an externalized interface
for accessing the change information and provides a framework for maintaining the
distribution of this information to various clients.
Trigger-based techniques might affect performance on the source systems, and this
impact should be carefully considered prior to implementation on a production source
system.
Data Warehousing Extraction Examples
You can extract data in two ways:
■ Extraction Using Data Files
■ Extraction Through Distributed Operations
Extraction Using Data Files
Most database systems provide mechanisms for exporting or unloading data from the
internal database format into flat files. Extracts from mainframe systems often use
COBOL programs, but many databases, and third-party software vendors, provide
export or unload utilities.
Data extraction does not necessarily mean that entire database structures are unloaded
in flat files. In many cases, it may be appropriate to unload entire database tables or
objects. In other cases, it may be more appropriate to unload only a subset of a given
table such as the changes on the source system since the last extraction or the results of
joining multiple tables together. Different extraction techniques vary in their
capabilities to support these two scenarios.
When the source system is an Oracle database, several alternatives are available for
extracting data into files:
■ Extracting into Flat Files Using SQL*Plus
■ Extracting into Flat Files Using OCI or Pro*C Programs
■ Exporting into Export Files Using the Export Utility
■ Extracting into Export Files Using External Tables
Extracting into Flat Files Using SQL*Plus
The most basic technique for extracting data is to execute a SQL query in SQL*Plus
and direct the output of the query to a file. For example, to extract a flat file,
country_city.log, with the pipe sign as delimiter between column values,
containing a list of the cities in the US in the tables countries and customers, the
following SQL script could be run:
SET echo off
SET pagesize 0
SPOOL country_city.log
SELECT distinct t1.country_name ||'|'|| t2.cust_city
FROM countries t1, customers t2 WHERE t1.country_id = t2.country_id
AND t1.country_name= 'United States of America';
SPOOL off
The exact format of the output file can be specified using SQL*Plus system variables.
This extraction technique offers the advantage of storing the result in a customized
format. Note that, using the external table data pump unload facility, you can also
extract the result of an arbitrary SQL operation. The preceding example extracts the
results of a join.
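As an illustration of how system variables shape the output file, the same extraction could be run with a few additional SET commands. The particular settings and the line size of 1000 are assumptions chosen for this sketch, not values prescribed by this guide.

```sql
-- Suppress everything except the data rows.
SET echo off
SET feedback off
SET heading off
SET pagesize 0
-- Allow long rows and strip trailing blanks from each spooled line.
SET linesize 1000
SET trimspool on
SPOOL country_city.log
SELECT DISTINCT t1.country_name ||'|'|| t2.cust_city
FROM countries t1, customers t2
WHERE t1.country_id = t2.country_id
AND t1.country_name = 'United States of America';
SPOOL off
```

The comments are placed on their own lines because SQL*Plus can treat text that follows a SQL*Plus command on the same line as part of that command.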
This extraction technique can be parallelized by initiating multiple, concurrent
SQL*Plus sessions, each session running a separate query representing a different
portion of the data to be extracted. For example, suppose that you wish to extract data
from an orders table, and that the orders table has been range partitioned by
month, with partitions orders_jan1998, orders_feb1998, and so on. To extract a
single year of data from the orders table, you could initiate 12 concurrent SQL*Plus
sessions, each extracting a single partition. The SQL script for one such session could
be:
SPOOL order_jan.dat
SELECT * FROM orders PARTITION (orders_jan1998);
SPOOL OFF
These 12 SQL*Plus processes would concurrently spool data to 12 separate files. You
can then concatenate them if necessary (using operating system utilities) following the
extraction. If you are planning to use SQL*Loader for loading into the target, these 12
files can be used as is for a parallel load with 12 SQL*Loader sessions. See Chapter 14,
"Transportation in Data Warehouses" for an example.
Even if the orders table is not partitioned, it is still possible to parallelize the
extraction either based on logical or physical criteria. The logical method is based on
logical ranges of column values, for example:
SELECT ... WHERE order_date
BETWEEN TO_DATE('01-JAN-1999','DD-MON-YYYY') AND TO_DATE('31-JAN-1999','DD-MON-YYYY');
The physical method is based on ranges of rowids. By viewing the data dictionary, it is
possible to identify the Oracle Database data blocks that make up the orders table.
Using this information, you could then derive a set of rowid-range queries for
extracting data from the orders table:
SELECT * FROM orders WHERE rowid BETWEEN value1 and value2;
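One way to derive such ranges is sketched below: it builds one pair of boundary rowids for each extent of the table, using the DBMS_ROWID package. The sketch assumes a nonpartitioned orders table owned by a hypothetical schema SH, and an extracting user with access to the DBA_EXTENTS and DBA_OBJECTS views. The row number 32767 is used only as an upper bound that no real row number in a block is expected to exceed.

```sql
-- One rowid range per extent of a nonpartitioned ORDERS table.
-- The schema name SH is hypothetical.
SELECT DBMS_ROWID.ROWID_CREATE(1, o.data_object_id, e.relative_fno,
                               e.block_id, 0)                    AS value1,
       DBMS_ROWID.ROWID_CREATE(1, o.data_object_id, e.relative_fno,
                               e.block_id + e.blocks - 1, 32767) AS value2
  FROM dba_extents e
  JOIN dba_objects o
    ON o.owner = e.owner
   AND o.object_name = e.segment_name
   AND o.object_type = 'TABLE'
 WHERE e.owner = 'SH'
   AND e.segment_name = 'ORDERS'
   AND e.segment_type = 'TABLE'
 ORDER BY e.relative_fno, e.block_id;
```

Each returned pair can be substituted for value1 and value2 in the preceding query. Adjacent extents can be grouped so that the number of ranges matches the number of concurrent sessions.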
Parallelizing the extraction of complex SQL queries is sometimes possible, although
the process of breaking a single complex query into multiple components can be
challenging. In particular, the coordination of independent processes to guarantee a
globally consistent view can be difficult. Unlike the SQL*Plus approach, using the
external table data pump unload functionality provides transparent parallel
capabilities.
Note that all parallel techniques can use considerably more CPU and I/O resources on
the source system, and the impact on the source system should be evaluated before
parallelizing any extraction technique.