Data Integration Best Practices - Ning ... sas ®

Published by , 2017-03-26 01:20:03

Data Integration Best Practices

Subjects:
• Data Integration Structure
• Data Integration Organisation
• Capture Control (CCT Tables)
• Error Monitoring
• Data Validation
• Data Protection (Scrambler)
• Conformed Modelling
• SQL Optimisation
• Self Documentation
• Role Assignment
• Rename Standard Transforms

SAS DI Studio Version 3.4 under SAS Intelligence Platform 9.1.3

SAS® Professionals Convention, 14-16 July 2009

www.definitivequality.com
Copyright © 2009 Definitive Quality Solutions Limited. Registered in England No.:05141146.

Data Validation

Challenge: How can I ensure only clean data gets loaded into the warehouse?

Solution: Use the Data Validation transformation.

• Use the standard Invalid, Missing, Duplicate tabs.
• Employ custom validation and apply a severity rating:
  1 = Exclusion
  2 = Correction
  3 = Improvement
• Store exceptions in permanent dataset for further analysis.

Data Validation

e.g. Check for truncation of key columns.

Data Validation

1) Create each condition
2) Determine validation
3) Define corrective action if required
4) This gets written to temp dataset ETLS_EXCEPTIONS.
5) Run %Append_Data_Quality Macro in post-processing.
6) Use BI tools to investigate Data Quality issues (e.g. Particular source system requires cleansing)
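
Steps 1 to 4 can be pictured with a hand-written DATA step. This is only a sketch and not the code the Data Validation transformation generates. The staging table, the key column and the 10-character limit are hypothetical. The exception columns borrow names from the DQ_ERROR_EVENT layout.

```sas
/* Sketch only: table, column and length limit are hypothetical. */
data work.staged_customer;
  length customer_key $12;
  input customer_key $;
  datalines;
C000000001
C00000000234
;
run;

data work.customer_clean (keep=customer_key)
     work.ETLS_EXCEPTIONS (keep=Table_Name Row_Number Column_Name
                                Screen_Description Exception_Action
                                Exception_Severity Unconformed_ValueC);
  set work.staged_customer;
  length Table_Name $41 Column_Name $32
         Screen_Description Exception_Action Unconformed_ValueC $256;

  /* 1) Condition: key is longer than the assumed 10-character target column */
  if length(customer_key) > 10 then do;
    Table_Name         = 'WORK.STAGED_CUSTOMER';
    Row_Number         = _n_;
    Column_Name        = 'customer_key';
    Screen_Description = 'Key column would be truncated';
    /* 2) and 3) Severity 1 = Exclusion, so the row is rejected, not corrected */
    Exception_Action   = 'Row excluded';
    Exception_Severity = 1;
    Unconformed_ValueC = customer_key;
    /* 4) Record the exception in the temporary dataset */
    output work.ETLS_EXCEPTIONS;
  end;
  else output work.customer_clean;
run;
```

With the two sample rows, the 12-character key lands in work.ETLS_EXCEPTIONS and the 10-character key in work.customer_clean.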

Data Validation – %Append_Data_Quality Macro Logic.

Does ETLS_EXCEPTIONS exist?
• Yes: Append exceptions to permanent table DQ_Error_Event.
• No: Halt macro as no errors to process.
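
The source of %Append_Data_Quality is not shown on the slides, so the following is only a minimal sketch of the branch described above. The macro name %append_data_quality_sketch and its two parameters are hypothetical. The target defaults to the WORK library purely so the sketch runs as-is; in practice it would point at a permanent library.

```sas
/* Sketch only: macro name, parameters and default librefs are assumptions. */
%macro append_data_quality_sketch(exceptions=work.ETLS_EXCEPTIONS,
                                  target=work.DQ_Error_Event);
  %if %sysfunc(exist(&exceptions)) %then %do;
    /* Exceptions were raised in this run: append them to the target table. */
    proc append base=&target data=&exceptions;
    run;
  %end;
  %else %do;
    /* No exceptions table means there are no errors to process. */
    %put NOTE: &exceptions does not exist, nothing to append.;
  %end;
%mend append_data_quality_sketch;

%append_data_quality_sketch()
```

PROC APPEND creates the base table on the first run if it does not yet exist, so a sketch like this needs no separate initialisation step.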

Data Validation – Table Properties for DQ_ERROR_EVENT.

Column name            Type  Length  Description
Row_Extraction_Date    Num   8       Date-timestamp when the row was exported or extracted from the source system.
Exception_Event_Date   Num   8       Date-timestamp when the exception was identified by the data warehouse processes.
Job_Name               Char  64      The name of the ETL job which identified the exception.
Table_Name             Char  41      The library and table name which contains the row and column containing the exception.
Row_Number             Num   8       The row number containing the exception.
Column_Name            Char  32      The column name containing the datum of the exception.
Screen_Description     Char  256     The screen (data quality test) description.
Exception_Description  Char  256     Standardised description of the exception.
Exception_Action       Char  256     Automated data conform action (if any).
Exception_Severity     Num   8       The severity level of the DQ Error Event (1=Exclusion, 2=Correction, 3=Improvement).
Unconformed_ValueN     Num   8       Original value (numeric) before conforming.
Conformed_ValueN       Num   8       Conformed (numeric) value.
Unconformed_ValueC     Char  256     Original value (character) before conforming.
Conformed_ValueC       Char  256     Conformed (character) value.
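
As an illustration, the column list above could be created with PROC SQL along these lines. The types and lengths are read in order from the slide. The dqlib macro variable and the datetime format are assumptions. The library is set to WORK only so the sketch runs without a permanent libref.

```sas
/* Sketch only: dqlib and the datetime20. format are assumptions. */
%let dqlib = work;   /* point this at the permanent library in practice */

proc sql;
  create table &dqlib..DQ_Error_Event
    (
      Row_Extraction_Date   num format=datetime20.,
      Exception_Event_Date  num format=datetime20.,
      Job_Name              char(64),
      Table_Name            char(41),
      Row_Number            num,
      Column_Name           char(32),
      Screen_Description    char(256),
      Exception_Description char(256),
      Exception_Action      char(256),
      Exception_Severity    num,
      Unconformed_ValueN    num,
      Conformed_ValueN      num,
      Unconformed_ValueC    char(256),
      Conformed_ValueC      char(256)
    );
quit;
```

The lengths of 41 and 32 fit a SAS two-level name (8-character libref, a dot, 32-character table name) and a 32-character column name.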


Data Scrambling

Challenge: How can I ensure I’m not holding sensitive production data on development/test systems.

Solution: Use Data Scrambling routines in non-production environments.

• Often development source systems are created using production data, and warehouses can propagate the risk of breaching the data protection act.

Data Scrambling – Custom Transform

The %data_scrambler macro allows for columns to be scrambled or passed through normally.

Data Scrambling – Custom transform

Edit Parameters:

Select
Pass – don’t scramble key fields!
Scramble method:
• Ranuni Function
• MD5 Function
• Translate Function
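
The internals of %data_scrambler are not shown, so the DATA step below is only a sketch of how the three listed functions might each be applied to a column. The customer table, its columns and the substitution alphabet are all hypothetical.

```sas
/* Sketch only: table, columns and the substitution alphabet are hypothetical. */
data work.customer;
  length customer_id 8 surname $20 postcode $8 balance 8;
  input customer_id surname $ postcode $ balance;
  datalines;
1 SMITH AB12CD 100.50
2 JONES XY98ZW 2500.00
;
run;

data work.customer_scrambled;
  set work.customer;
  length name_hash $32;

  /* Ranuni: perturb a numeric value by a uniform random factor (0.5 to 1.5). */
  balance = round(balance * (0.5 + ranuni(0)), 0.01);

  /* MD5: replace a character value with a one-way hash, shown as 32 hex digits. */
  name_hash = put(md5(strip(surname)), $hex32.);

  /* Translate: substitute each character using a fixed mapping (to, then from). */
  postcode = translate(postcode,
                       'QWERTYUIOPASDFGHJKLZXCVBNM5678901234',
                       'ABCDEFGHIJKLMNOPQRSTUVWXYZ0123456789');

  /* customer_id is passed through unchanged so that joins on the key still work. */
  drop surname;
run;
```

A fixed TRANSLATE mapping is reversible by anyone who knows the alphabet, so it is the weakest of the three for genuinely sensitive columns.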

Data Scrambling – What about Production?

%let liveEnvironment = PROD;

%let thisEnvironment =
%sysfunc(substr(%sysfunc(upcase(%sysfunc(getoption(METASERVER)))),1,4));

• Don’t perform scramble routine if thisEnvironment = liveEnvironment.
• When running in Dev the METASERVER option should be different.
• Could set up a table with environment value in.
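
One way the environment check might be wired around the scramble routine is sketched below. The wrapper name %scramble_if_not_live is hypothetical. It swaps in the macro functions %upcase and %substr for the nested %sysfunc calls, and adds a length guard (not on the slide) so a blank or short METASERVER value does not raise a %substr error.

```sas
/* Sketch only: %scramble_if_not_live is a hypothetical wrapper name. */
%let liveEnvironment = PROD;

%macro scramble_if_not_live;
  %local metaServer thisEnvironment;

  /* Upper-cased metadata server name, assumed to start with the environment. */
  %let metaServer = %upcase(%sysfunc(getoption(METASERVER)));

  /* First four characters, guarding against blank or short values. */
  %if %length(&metaServer) >= 4 %then
    %let thisEnvironment = %substr(&metaServer,1,4);
  %else
    %let thisEnvironment = &metaServer;

  %if %superq(thisEnvironment) = %superq(liveEnvironment) %then %do;
    %put NOTE: Live environment, data passed through unscrambled.;
  %end;
  %else %do;
    %put NOTE: Non-production environment, scrambling would run here.;
    /* Call the scrambling routine at this point. */
  %end;
%mend scramble_if_not_live;

%scramble_if_not_live
```

%superq is used in the comparison so that dots or hyphens in a server name are not evaluated as operators.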


Conformed Model

Challenge: How can I track trends in my data when the source systems don’t hold history?

Solution: Use a conformed data model in a warehouse, using slowly changing dimensions where appropriate.

[Diagram: Fact Tables, Re-Useable Dimensions]

Conformed Model

In the Integrate layer use the SCD Type II Loader transform to make use of effective date processing.

Conformed Model

In the Integrate Layer use the Surrogate Key Generator to determine keys for dimension tables.
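
The idea behind a surrogate key can be sketched by hand as "next integer after the current maximum". This is an illustration of the concept only, not the code the Surrogate Key Generator transform produces. All table and column names are hypothetical.

```sas
/* Sketch only: tables and columns are hypothetical. */
data work.dim_customer;
  input customer_sk customer_id $;
  datalines;
1 A100
2 A200
;
run;

data work.new_customers;
  input customer_id $;
  datalines;
A300
A400
;
run;

/* Highest key already in the dimension (0 if the dimension is empty). */
proc sql noprint;
  select coalesce(max(customer_sk), 0) into :max_sk
  from work.dim_customer;
quit;

/* Give each new business key the next integer in sequence. */
data work.new_dim_rows;
  set work.new_customers;
  customer_sk = &max_sk + _n_;
run;
```

With the sample data, A300 and A400 receive keys 3 and 4.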


SQL Optimisation

Challenge: How can I ensure the best possible SQL performance is achieved through my SQL Join transform?

Solution: Use the undocumented _Method option on the SQL procedure to determine processing.

SQL Optimisation: _Method Option (SAS Note 33604)
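
A minimal sketch of the option in use follows. The two tables are small made-up examples. The option is added to the PROC SQL statement and the chosen plan is written to the log.

```sas
/* Sketch only: the two tables are made-up examples. */
data work.orders;
  input order_id customer_id amount;
  datalines;
1 10 25.00
2 11 40.00
3 10 12.50
;
run;

data work.customers;
  input customer_id name $;
  datalines;
10 Ann
11 Bob
;
run;

/* _method asks PROC SQL to write the execution methods it chose to the log. */
proc sql _method;
  create table work.order_names as
  select o.order_id, c.name, o.amount
  from work.orders as o
       inner join work.customers as c
       on o.customer_id = c.customer_id;
quit;
```

The log then shows a small tree of method codes. Commonly reported ones include sqxjm (sort-merge join), sqxjhsh (hash join), sqxjndx (index join), sqxjsl (step-loop join), sqxsort (sort) and sqxsrc (rows read from a source table). Which join is chosen depends on table sizes, indexes and available memory.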
