Data Integration Best Practices
Subjects:
Data Integration Structure
Data Integration Organisation
Capture Control (CCT Tables)
Error Monitoring
Data Validation
Data Protection (Scrambler)
Conformed Modelling
SQL Optimisation
Self Documentation
Role Assignment
Rename Standard Transforms
SAS DI Studio Version 3.4 under SAS Intelligence Platform 9.1.3
SAS® Professionals Convention
14-16 July 2009
www.definitivequality.com Copyright © 2009 Definitive Quality Solutions Limited. Registered in England No.:05141146.
Data Validation
Challenge: How can I ensure only clean data gets loaded into the warehouse?
Solution: Use the Data Validation transformation.
Use the standard Invalid, Missing and Duplicate tabs.
Employ custom validation and apply a severity rating:
  1 = Exclusion
  2 = Correction
  3 = Improvement
Store exceptions in a permanent dataset for further analysis.
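The severity rating can be sketched as follows. This is a minimal Python model of the idea, not DI Studio code: the `Rule` and `validate` names, the sample rules and the sample rows are all hypothetical. It assumes Exclusion rejects the row, Correction loads it after an automated fix, and Improvement loads it unchanged. Every failed check produces one exception record, with field names loosely following the exceptions table described later.

```python
from dataclasses import dataclass
from enum import IntEnum
from typing import Any, Callable, Optional


class Severity(IntEnum):
    EXCLUSION = 1    # reject the row
    CORRECTION = 2   # load the row after an automated fix
    IMPROVEMENT = 3  # load the row unchanged, but record the issue


@dataclass
class Rule:
    column: str
    description: str
    is_valid: Callable[[Any], bool]
    severity: Severity
    fix: Optional[Callable[[Any], Any]] = None  # used only for CORRECTION rules


def validate(rows, rules):
    """Split rows into loadable rows and exception records."""
    clean, exceptions = [], []
    for row_number, source_row in enumerate(rows, start=1):
        row = dict(source_row)
        excluded = False
        for rule in rules:
            value = row.get(rule.column)
            if rule.is_valid(value):
                continue
            conformed = value
            if rule.severity is Severity.EXCLUSION:
                excluded = True
            elif rule.severity is Severity.CORRECTION and rule.fix is not None:
                conformed = rule.fix(value)
                row[rule.column] = conformed
            exceptions.append({
                "Row_Number": row_number,
                "Column_Name": rule.column,
                "Screen_Description": rule.description,
                "Exception_Severity": int(rule.severity),
                "Unconformed_Value": value,
                "Conformed_Value": conformed,
            })
        if not excluded:
            clean.append(row)
    return clean, exceptions


# Hypothetical rules and rows, for illustration only.
rules = [
    Rule("customer_id", "Key column is missing", lambda v: v is not None,
         Severity.EXCLUSION),
    Rule("country", "Country code is not upper case", lambda v: v == str(v).upper(),
         Severity.CORRECTION, fix=lambda v: str(v).upper()),
    Rule("email", "E-mail address is blank", lambda v: bool(v),
         Severity.IMPROVEMENT),
]
rows = [
    {"customer_id": 1, "country": "gb", "email": ""},
    {"customer_id": None, "country": "FR", "email": "a@example.com"},
]
clean, exceptions = validate(rows, rules)
print(len(clean), len(exceptions))  # prints: 1 3
```

The first row is loaded with its country code corrected and a blank e-mail noted. The second row is excluded because its key is missing.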
Data Validation
e.g. Check for truncation of key columns.
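A truncation check of this kind can be sketched in Python. The sketch assumes the width of each target key column is known. Any incoming key value longer than that width would be cut when loaded, and could then collide with another key, so it is flagged before the load. The `find_truncations` helper and the column metadata are hypothetical.

```python
def find_truncations(rows, target_lengths, key_columns):
    """Yield (row_number, column, value) for key values too long for the target column."""
    for row_number, row in enumerate(rows, start=1):
        for column in key_columns:
            value = row.get(column)
            if value is None:
                continue  # missing keys are a separate check
            if len(str(value)) > target_lengths[column]:
                yield row_number, column, value


# Hypothetical target metadata: the warehouse key column is 8 characters wide.
target_lengths = {"account_key": 8}
rows = [{"account_key": "AC000123"}, {"account_key": "AC0001234"}]
issues = list(find_truncations(rows, target_lengths, ["account_key"]))
print(issues)  # prints: [(2, 'account_key', 'AC0001234')]
```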
Data Validation
1) Create each condition.
2) Determine validation.
3) Define corrective action if required.
4) Exceptions are written to the temporary dataset ETLS_EXCEPTIONS.
5) Run the %Append_Data_Quality macro in post-processing.
6) Use BI tools to investigate data quality issues (e.g. a particular source system requires cleansing).
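Step 6 is the kind of question a BI report would answer. Here it is sketched in Python against a handful of hypothetical exception rows. The sketch assumes each source system lands in its own library, so that the library part of `Table_Name` (the text before the dot) identifies the system. That mapping is an assumption, not something the slides state.

```python
from collections import Counter


def source_of(event):
    """Library part of Table_Name, assumed here to identify the source system."""
    return event["Table_Name"].split(".")[0]


def exceptions_by_source(events):
    """Count error events per (source system, severity) pair."""
    return Counter((source_of(e), e["Exception_Severity"]) for e in events)


# Hypothetical rows from the permanent exceptions table.
events = [
    {"Table_Name": "CRM.CUSTOMER", "Exception_Severity": 1},
    {"Table_Name": "CRM.CUSTOMER", "Exception_Severity": 1},
    {"Table_Name": "CRM.ADDRESS", "Exception_Severity": 2},
    {"Table_Name": "ERP.ORDERS", "Exception_Severity": 3},
]
summary = exceptions_by_source(events)
per_source = Counter(source_of(e) for e in events)
for (source, severity), n in summary.most_common():
    print(f"{source:<4} severity {severity}: {n}")
print("Most exceptions:", per_source.most_common(1)[0])
```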
Data Validation – %Append_Data_Quality Macro Logic
Does ETLS_EXCEPTIONS exist?
Yes: Append the exceptions to the permanent table DQ_Error_Event.
No: Halt the macro, as there are no errors to process.
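The flowchart can be modelled in a few lines of Python. Dictionaries of row lists stand in for the WORK library and the permanent library. In SAS the existence test would typically use the EXIST function and the append would use PROC APPEND. The macro's actual source is not shown here, so this is only a sketch of the decision logic.

```python
def append_data_quality(work, permanent):
    """Model of the %Append_Data_Quality flowchart.

    work and permanent map table names to lists of rows, standing in for
    the WORK library and the permanent data quality library.
    """
    exceptions = work.get("ETLS_EXCEPTIONS")
    if exceptions is None:
        return 0  # "No" branch: halt, there are no errors to process
    # "Yes" branch: append the exceptions to the permanent table.
    permanent.setdefault("DQ_Error_Event", []).extend(exceptions)
    return len(exceptions)


permanent = {}
print(append_data_quality({"ETLS_EXCEPTIONS": [{"Row_Number": 7}]}, permanent))  # prints: 1
print(append_data_quality({}, permanent))  # prints: 0
print(len(permanent["DQ_Error_Event"]))  # prints: 1
```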
Data Validation – Table Properties for DQ_ERROR_EVENT
Column name            Type  Length  Description
Row_Extraction_Date    Num   8       Date-timestamp when the row was exported or extracted from the source system.
Exception_Event_Date   Num   8       Date-timestamp when the exception was identified by the data warehouse processes.
Job_Name               Char  64      The name of the ETL job which identified the exception.
Table_Name             Char  41      The library and table name which contains the row and column containing the exception.
Row_Number             Num   8       The row number containing the exception.
Column_Name            Char  32      The column name containing the datum of the exception.
Screen_Description     Char  256     The screen (data quality test) description.
Exception_Description  Char  256     Standardised description of the exception.
Exception_Action       Char  256     Automated data conform action (if any).
Exception_Severity     Num   8       The severity level of the DQ Error Event (1=Exclusion, 2=Correction, 3=Improvement).
Unconformed_ValueN     Num   8       Original value (numeric) before conforming.
Conformed_ValueN       Num   8       Conformed (numeric) value.
Unconformed_ValueC     Char  256     Original value (character) before conforming.
Conformed_ValueC       Char  256     Conformed (character) value.
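Here the table definition is expressed as a Python record type, which makes the column lengths enforceable before rows are appended. This is a sketch only. In SAS the two date-timestamps are numeric datetime values (seconds since 1 January 1960), whereas here they are Python `datetime` objects. The `DQErrorEvent` class, the `CHAR_LENGTHS` mapping and the sample values are hypothetical.

```python
from dataclasses import dataclass
from datetime import datetime
from typing import Optional

# Character column lengths from the DQ_ERROR_EVENT table definition.
CHAR_LENGTHS = {
    "Job_Name": 64,
    "Table_Name": 41,
    "Column_Name": 32,
    "Screen_Description": 256,
    "Exception_Description": 256,
    "Exception_Action": 256,
    "Unconformed_ValueC": 256,
    "Conformed_ValueC": 256,
}


@dataclass
class DQErrorEvent:
    Row_Extraction_Date: datetime
    Exception_Event_Date: datetime
    Job_Name: str
    Table_Name: str          # library and table, e.g. "STAGE.CUSTOMER"
    Row_Number: int
    Column_Name: str
    Screen_Description: str
    Exception_Description: str
    Exception_Action: str
    Exception_Severity: int  # 1=Exclusion, 2=Correction, 3=Improvement
    Unconformed_ValueN: Optional[float] = None
    Conformed_ValueN: Optional[float] = None
    Unconformed_ValueC: str = ""
    Conformed_ValueC: str = ""

    def __post_init__(self):
        # Reject values the permanent table could not store without loss.
        if self.Exception_Severity not in (1, 2, 3):
            raise ValueError("Exception_Severity must be 1, 2 or 3")
        for name, limit in CHAR_LENGTHS.items():
            if len(getattr(self, name)) > limit:
                raise ValueError(f"{name} exceeds {limit} characters")


event = DQErrorEvent(
    Row_Extraction_Date=datetime(2009, 7, 1, 2, 0),
    Exception_Event_Date=datetime(2009, 7, 1, 3, 15),
    Job_Name="Load_Customer_Dim",
    Table_Name="STAGE.CUSTOMER",
    Row_Number=42,
    Column_Name="Country",
    Screen_Description="Country code must be upper case",
    Exception_Description="Lower-case country code",
    Exception_Action="Converted to upper case",
    Exception_Severity=2,
    Unconformed_ValueC="gb",
    Conformed_ValueC="GB",
)
```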
Data Scrambling
Challenge: How can I ensure I’m not holding sensitive production data on development/test systems?
Solution: Use data scrambling routines in non-production environments.
Development source systems are often created using production data, and warehouses can propagate the risk of breaching the Data Protection Act.
Data Scrambling – Custom Transform
The %data_scrambler macro allows columns to be scrambled or passed through normally.
Data Scrambling – Custom Transform
Edit Parameters:
Select Pass – don’t scramble key fields!
Scramble method:
  Ranuni function
  MD5 function
  Translate function
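The per-column choice between Pass and a scramble method can be sketched in Python. The SAS functions named on the slide (RANUNI, MD5, TRANSLATE) are replaced here by rough Python counterparts: a seeded `random.Random`, `hashlib.md5` and `str.translate`. Exactly how %data_scrambler applies each function is not shown, so the behaviour below is an assumption. It multiplies numbers by a uniform random factor, hashes text to hex, and shifts letters and digits by three places. The column names and the method mapping are hypothetical.

```python
import hashlib
import random
import string

# Hypothetical TRANSLATE-style mapping: each letter/digit moves three places on.
_FROM = string.ascii_uppercase + string.ascii_lowercase + string.digits
_TO = (string.ascii_uppercase[3:] + string.ascii_uppercase[:3]
       + string.ascii_lowercase[3:] + string.ascii_lowercase[:3]
       + string.digits[3:] + string.digits[:3])
_TABLE = str.maketrans(_FROM, _TO)


def scramble_value(value, method, rng):
    if method == "pass":       # key fields: leave untouched so joins still work
        return value
    if method == "ranuni":     # numeric: scale by a seeded uniform number in [0, 1)
        return round(value * rng.random(), 2)
    if method == "md5":        # character: one-way hash, same input -> same output
        return hashlib.md5(str(value).encode("utf-8")).hexdigest()
    if method == "translate":  # character: fixed character substitution
        return str(value).translate(_TABLE)
    raise ValueError(f"unknown scramble method: {method}")


def scramble_rows(rows, methods, seed=12345):
    """Columns not listed in `methods` are passed through unchanged."""
    rng = random.Random(seed)
    return [{col: scramble_value(val, methods.get(col, "pass"), rng)
             for col, val in row.items()} for row in rows]


rows = [{"customer_id": 1001, "surname": "Smith",
         "email": "j.smith@example.com", "balance": 250.0}]
methods = {"surname": "translate", "email": "md5", "balance": "ranuni"}
print(scramble_rows(rows, methods))
```

Seeding the generator makes a scrambled test extract repeatable from run to run. Neither an unsalted MD5 of low-variety values such as surnames nor a fixed character substitution is hard to reverse, so these methods only disguise data. They do not anonymise it.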
Data Scrambling – What about Production?
%let liveEnvironment = PROD;
/* First four characters of the upper-cased METASERVER option value */
%let thisEnvironment = %sysfunc(substr(%sysfunc(upcase(%sysfunc(getoption(METASERVER)))),1,4));
Don’t perform the scramble routine if thisEnvironment = liveEnvironment.
When running in Dev, the METASERVER option should be different.
Alternatively, you could set up a table holding the environment value.
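Both approaches on this slide are modelled below in Python. The first mirrors the `%let` prefix test. The second uses a lookup table. The server names and the `ENVIRONMENT_TABLE` contents are hypothetical, and the prefix test only works if production metadata server names really begin with "PROD".

```python
LIVE_ENVIRONMENT = "PROD"

# Hypothetical lookup table: metadata server name -> environment.
ENVIRONMENT_TABLE = {
    "PRODMETA01": "PROD",
    "DEVMETA01": "DEV",
    "TESTMETA01": "TEST",
}


def environment_from_prefix(metaserver):
    """Mirror the %let: upper-case the METASERVER value and keep characters 1-4."""
    return metaserver.upper()[:4]


def environment_from_table(metaserver):
    """Look the server up; unknown servers count as non-production."""
    return ENVIRONMENT_TABLE.get(metaserver.upper(), "UNKNOWN")


def should_scramble(environment, live=LIVE_ENVIRONMENT):
    """Scramble everywhere except the live environment."""
    return environment != live


print(should_scramble(environment_from_prefix("prodmeta01")))  # prints: False
print(should_scramble(environment_from_prefix("devmeta01")))   # prints: True
print(should_scramble(environment_from_table("newserver")))    # prints: True
```

The table-driven version fails safe: a new or renamed server that nobody has registered is treated as non-production, so its data is still scrambled.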
Conformed Model
Challenge: How can I track trends in my data when the source systems don’t hold history?
Solution: Use a conformed data model in a warehouse, using slowly changing dimensions where appropriate.
(Diagram: Fact Tables and Reusable Dimensions.)
Conformed Model
In the Integrate layer, use the SCD Type II Loader transform to make use of effective date processing.
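Effective date processing is what lets the warehouse keep history that the source systems discard. The idea is sketched in Python below; this is not the code the SCD Type II Loader generates. The column names `valid_from` and `valid_to`, the 9999-12-31 "open" end date and the sample customer are assumptions. When a tracked attribute changes, the current version is closed on the load date and a new version opens on the same date.

```python
from datetime import date

HIGH_DATE = date(9999, 12, 31)  # assumed "open" end date for the current version


def scd2_load(dimension, incoming, load_date, key, tracked):
    """Apply Type II changes to `dimension` in place; return the number of rows added."""
    current = {row[key]: row for row in dimension if row["valid_to"] == HIGH_DATE}
    added = 0
    for rec in incoming:
        existing = current.get(rec[key])
        if existing is not None and all(existing[c] == rec[c] for c in tracked):
            continue  # nothing changed: keep the current version
        if existing is not None:
            existing["valid_to"] = load_date  # close the previous version
        new_row = {key: rec[key], **{c: rec[c] for c in tracked},
                   "valid_from": load_date, "valid_to": HIGH_DATE}
        dimension.append(new_row)
        current[rec[key]] = new_row
        added += 1
    return added


def as_at(dimension, key, value, when):
    """Return the version of a member that was effective on `when`."""
    for row in dimension:
        if row[key] == value and row["valid_from"] <= when < row["valid_to"]:
            return row
    return None


dim = []
scd2_load(dim, [{"customer_id": 1, "city": "Leeds"}], date(2009, 1, 1), "customer_id", ["city"])
scd2_load(dim, [{"customer_id": 1, "city": "York"}], date(2009, 7, 1), "customer_id", ["city"])
print(as_at(dim, "customer_id", 1, date(2009, 3, 1))["city"])   # prints: Leeds
print(as_at(dim, "customer_id", 1, date(2009, 7, 14))["city"])  # prints: York
```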
Conformed Model
In the Integrate layer, use the Surrogate Key Generator to determine keys for dimension tables.
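A surrogate key is a meaningless integer owned by the warehouse. It stays stable when source keys change, and it lets one business key have several Type II versions. The sketch below is a minimal Python illustration of sequential key generation that continues from the highest key already in the dimension. It is not the Surrogate Key Generator transform itself, and the class, function and column names are hypothetical.

```python
from itertools import count


class SurrogateKeyGenerator:
    """Issue sequential integer keys, continuing after the highest key already in use."""

    def __init__(self, existing_keys=()):
        self._counter = count(max(existing_keys, default=0) + 1)

    def next_key(self):
        return next(self._counter)


def assign_keys(rows, generator, key_column):
    """Give each new dimension row its own surrogate key."""
    for row in rows:
        row[key_column] = generator.next_key()
    return rows


# Hypothetical: the dimension already holds keys 1, 2 and 5.
generator = SurrogateKeyGenerator(existing_keys=[1, 2, 5])
new_rows = assign_keys(
    [{"customer_id": "C009", "city": "York"}, {"customer_id": "C007", "city": "Hull"}],
    generator,
    key_column="customer_sk",
)
print([row["customer_sk"] for row in new_rows])  # prints: [6, 7]
```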
SQL Optimisation
Challenge: How can I ensure the best possible SQL performance is achieved through my SQL Join transform?
Solution: Use the undocumented _Method option on the SQL procedure to show which processing methods the query uses.
SQL Optimisation: _Method Option (SAS Note 33604)
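_METHOD is specified on the PROC SQL statement (for example `proc sql _method;`). The chosen execution methods are then written to the SAS log as codes such as sqxjm or sqxjhsh. The small Python helper below picks the join methods out of a log excerpt. The code-to-name mapping covers only the four join codes. The sample log is illustrative, and real log layout may differ. A step loop join (sqxjsl) on large tables is usually the first thing worth investigating.

```python
import re

# Join-method codes reported by PROC SQL _METHOD.
JOIN_METHODS = {
    "sqxjsl": "step loop join",
    "sqxjm": "sort-merge join",
    "sqxjndx": "index join",
    "sqxjhsh": "hash join",
}


def join_methods(log_text):
    """Return the join methods named in a _METHOD log excerpt, in order of appearance."""
    codes = re.findall(r"\bsqx\w+", log_text)
    return [JOIN_METHODS[c] for c in codes if c in JOIN_METHODS]


# Illustrative log excerpt (layout approximate).
sample_log = """
NOTE: SQL execution methods chosen are:

      sqxcrta
          sqxjhsh
              sqxsrc( WORK.ORDERS )
              sqxsrc( WORK.CUSTOMERS )
"""
print(join_methods(sample_log))  # prints: ['hash join']
```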