CREATING CUSTOM TABLES USING SAS®

Tyler Cole, Pacific Data Designs, Inc., San Francisco, CA

ABSTRACT

Columnar tables showing various statistics and summary values across different populations are commonly
used to report data in many business environments. In some cases, the content and layout of such tables are
not dynamic – populations, table entries, and statistics are determined ahead of time and “coded into”
individual table programs. Depending on the programming approach, creating and modifying columnar tables
can be a simple or daunting task.
This paper presents one common program design implemented to create tables for clinical trials. Following
this design, statistical macros are used to create summary values which are combined and then output using
DATA _NULL_. Along with an outline of this method, key topics such as input data structure, macro design,
macro output, and DATA _NULL_ reporting are addressed in this paper. With an efficient program design and
a clear understanding of the issues involved, creating columnar tables becomes a snap.

INTRODUCTION

Tables are thought-out and designed ahead of programming; they have a fixed number of columns and a fixed
number of entries. Most tables for clinical trials have at least 3 columns: one column to describe table entries,
one column to identify statistical calculations, and one or more columns to report statistical values calculated
for different populations. Below is a sample table used to convey demographic and vital signs information:

Following the approach described in this paper, program steps taken to complete columnar tables are
organized into three separate functions: statistical computations, output collection and reporting. Factoring in
the layout of the input dataset containing records to be summarized, these three functions are successively
completed to produce tables in a reliable, well-organized manner.

THE INPUT DATASET

The input dataset contains patient records which are summarized and reported in a table. Key variables used
to identify individual patients are present in this dataset; variables used to indicate patient populations, e.g.
dose group, are included as well. All analysis variables are numeric. Categorical values such as {male,
female} and {yes, no} are converted into numeric equivalents. Using numeric codes for categorical values
makes it easier to control the order in which values appear in a table entry.
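
As a rough illustration of the numeric coding described above, the following sketch converts a text gender value into the codes 1 and 2. The raw variable name gender_txt and the dataset name Demo_Coded are hypothetical; only the code values (1 for male, 2 for female) follow the formats used later in this paper.

```sas
* Minimal sketch: gender_txt and Demo_Coded are illustrative names;
data Demo_Coded;
  length gender_txt $6;
  input patid $ gender_txt $;
  * Numeric codes follow the intended display order: 1=Male, 2=Female;
  select (upcase(gender_txt));
    when ('MALE')   gender=1;
    when ('FEMALE') gender=2;
    otherwise       gender=.;
  end;
  datalines;
A201 Male
A202 Female
;
run;
```

Because the codes sort as 1, 2, a table entry built from them lists Male before Female regardless of alphabetical order.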
Statistics in a table are calculated taking into account the record structure of the input dataset. Essentially, two
layouts are available: horizontal and vertical. Horizontal datasets contain one record per patient and have
separate variables used to store separate analysis values. In contrast, vertical datasets contain many records
per patient and may store different analysis values in the same variable. Sample datasets are shown below:

Horizontal Input Dataset: DemoVS_Hor.sas7bdat

PATID   DOSE   AGE   GENDER   RACE   VIS1_WGT   VIS2_WGT
A201      50    29        1      1       72.5       74.7
A202       0    64        2      2       54.3       55.7
A203      50    37        1      .       70.2       75.2
A204       0    55        1      4       71.1          .
A205      50    40        2      2       58.9       62.5
A206       .    20        2      5       62.2       68.0

Horizontally oriented datasets have one record per patient. Information is horizontally spread out among
different analysis variables.

Vertical Input Dataset: DemoVS_Ver.sas7bdat

PATID   DOSE   VIS   AGE   GENDER   RACE    WGT
A201      50     1    29        1      1   72.5
A201      50     2    29        1      1   74.7
A202       0     1    64        2      2   54.3
A202       0     2    64        2      2   55.7
A203      50     1    37        1      .   70.2
A203      50     2    37        1      .   75.2
A204       0     1    55        1      4   71.1
A204       0     2    55        1      4      .
A205      50     1    40        2      2   58.9
A205      50     2    40        2      2   62.5
A206       .     1    20        2      5   62.2
A206       .     2    20        2      5   68.0

Vertically oriented input datasets have many records per patient. Information is vertically organized into
value-specific records identifiable by additional key variables. Using these additional variables, appropriate records
are selected when computing statistics.
As a rule, information stored in a vertically oriented dataset requires fewer analysis variables than if stored
horizontally. Additional analysis values are easier to incorporate into vertical datasets; however, horizontal
datasets are more straightforward to use.
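
The relationship between the two layouts can be sketched with a short DATA step that turns the horizontal sample dataset into the vertical one. This is only an illustration under the assumption that the two weight variables are the only visit-specific values; the array name w is arbitrary.

```sas
* Rebuild the horizontal sample data listed above;
data DemoVS_Hor;
  input patid $ dose age gender race vis1_wgt vis2_wgt;
  datalines;
A201 50 29 1 1 72.5 74.7
A202 0 64 2 2 54.3 55.7
A203 50 37 1 . 70.2 75.2
A204 0 55 1 4 71.1 .
A205 50 40 2 2 58.9 62.5
A206 . 20 2 5 62.2 68.0
;
run;

* Write one output record per patient per visit;
data DemoVS_Ver(keep=patid dose vis age gender race wgt);
  set DemoVS_Hor;
  array w(2) vis1_wgt vis2_wgt;
  do vis=1 to dim(w);
    wgt=w(vis);   * visit-specific weight moves into a single variable;
    output;
  end;
run;
```

Adding a third visit to the vertical layout means more records but no new analysis variable, which is the trade-off noted above.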

STATISTICAL MACROS

Summary statistics, as well as specialized statistical values, are calculated using macros stored separately from
the table program. These macros summarize data contained in the input dataset. Given parameters such as
input dataset, analysis variable, output dataset name, and output variable name, a statistical macro will
produce a custom-named dataset containing, among other things, a custom-named variable having a set of
statistical values computed for a certain population. For commonly used statistics (population count, median,
mean, standard deviation, minimum, maximum, partial count, and percent), the stat code variable (statc) is
supplied and used to identify, select, and merge statistics for a table. Statc is a standardized id code shared
among statistical macros.

Sample stat code values are shown in the format procedure below.

Format: statc.

proc format;
  value statc
    1 = 'N'
    2 = 'Median'
    3 = 'Mean'
    4 = 'S.D.'
    5 = 'Min, Max'
    6 = 'n (%)';
run;

Note: stat code 1, “N,” is the number of patients in the effective population. This value represents the number
of patients in a certain population who have data sufficient to take part in a statistical computation.

EXAMPLE 1: %STAT

The statistical macro %STAT produces a set of summary statistics, each identified by a stat code value. Given
parameter values for the input dataset (DemoVS_Hor), analysis variable (vis1_wgt), output dataset name
(E2P3), and output variable (P3), this macro executes to create summary values for a table entry.

%STAT(indsn=DemoVS_Hor, invar=vis1_wgt, outdsn=E2P3, outvar=P3);

The E2P3 dataset is created.

Output: E2P3.sas7bdat

STATC   P3
    1   6
    2   66.2
    3   64.9
    4   7.5
    5   54.3, 72.5

Note: E2P3 was created using a horizontally oriented input dataset. If a vertically oriented dataset were used,
similar records would be selected using a filter.

%STAT(indsn=DemoVS_Ver, filter=if vis=1, invar=wgt, outdsn=E2P3, outvar=P3);

The resulting dataset would be the same as before.
As a general rule, records containing missing analysis variable values are not included in the effective
population and do not take part in the calculation of statistics as shown in %STAT. If records or variable values
are not sufficient for a given calculation, N/A is produced.
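
The effect of missing values on the effective population can be checked directly. The sketch below assumes the DemoVS_Hor sample dataset exists as listed earlier and uses PROC MEANS rather than %STAT itself; it is an illustration, not part of the macro.

```sas
* N counts records with a non-missing analysis value (the effective population);
* NMISS counts the records excluded from the calculation;
proc means data=DemoVS_Hor n nmiss;
  var vis2_wgt;
run;
```

In the sample data, patient A204 has no visit 2 weight, so N is 5 and NMISS is 1 even though the dataset holds six patients.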

EXAMPLE 2: %PCNT_ME

Some statistical macros produce value summaries for a categorical analysis variable. Mutually exclusive
percentages summing to 100, for example, would be calculated by a macro of this sort. Macros operating on
categorical variables produce similar statistics for different categorical values. Consequently, stat code
(statc) has repeated entries and is not sufficient to identify values. When this is the case, a line number
variable (lineno) is supplied to denote the categorical value summarized.
The %PCNT_ME (percent - mutually exclusive) macro calculates a population count with partial counts and
percents for each value present in a categorical analysis variable. For race, this macro is called as:

%PCNT_ME(indsn=DemoVS_Hor, invar=race, outdsn=E4P3, outvar=P3);

The E4P3 dataset is then created with summaries for the values present: race=1 (Caucasian), race=2 (Black),
race=4 (Hispanic), and race=5 (Other).

Output: E4P3.sas7bdat

LINENO   STATC   P3
    -1       1   5
     1       6   1 (20%)
     2       6   2 (40%)
     4       6   1 (20%)
     5       6   1 (20%)

In short, lineno identifies the categorical value summarized; statc identifies the type of summarization
performed. Together, these variables are sufficient to identify output values produced by %PCNT_ME.

INTERNAL FRAMES

Macros such as %PCNT_ME produce summaries for each value of a categorical variable. Unless a frame is
implemented, these macros can only produce statistics for values present in the analysis variable itself. This
much is intuitive: how can a macro know what is missing if it isn't there? A frame allows table entries to show
0 or N/A for values not present in the data; it can be applied internal or external to the macro. External frames
are explained in a later section; internal frames are presented here.

An internal frame is given a set of potential values through a parameter in the macro call. This parameter may
involve a list of values or a do loop. For example:

%PCNT_ME(indsn=DemoVS_Hor, invar=race, outdsn=E4P3, outvar=P3, frame=1 to 5);

The E4P3 dataset is produced.

Framed Output: E4P3.sas7bdat

LINENO   STATC   P3
    -1       1   5
     1       6   1 (20%)
     2       6   2 (40%)
     3       6   0 (0.0%)
     4       6   1 (20%)
     5       6   1 (20%)

All possible values for a categorical analysis variable are listed in a frame parameter. Depending on the
effective population count, %PCNT_ME will show 0 (0.0%) or 0 (N/A) for framed categories not present in the
data. Framed categories which are not present among analysis variable values come about in three data-specific
situations:

• No records exist for the population being evaluated (effective population count: 0)

• Records for the population exist but analysis variable values are all missing (effective population
count: 0)

• Population records and analysis variable values exist, but not all categories are present in the data
(effective population count: 1 or more)

The effective population count for the first situation is zero: no population records exist. As a result, the
denominator used to calculate percentages is also zero. Framed categories are consequently produced with 0
(N/A). Percentages for mutually exclusive values are not possible in the second situation because no analysis
values are available. Framed categories are then produced as 0 (N/A) as well. Categorical analysis variable
values exist in the third situation, yet not all categories are present. These missing categorical values are
provided by the frame; the corresponding output value is 0 (0.0%).
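
The rule behind the three situations reduces to a test on the effective population count. The sketch below is a standalone illustration and is not taken from %PCNT_ME; the variable names n_eff, count, and cell are hypothetical.

```sas
data _null_;
  length cell $13;
  count=0;            * a framed category with no records in the data;
  do n_eff=0, 5;      * effective population counts to try;
    * A percentage is only defined when the denominator is positive;
    if n_eff > 0 then
      cell=catx(' ', put(count, 3.), cats('(', put(count/n_eff, percent8.1), ')'));
    else cell=catx(' ', put(count, 3.), '(N/A)');
    put n_eff= cell=;
  end;
run;
```

With an effective population of zero (situations one and two) the cell is expected to read 0 (N/A); with a positive effective population (situation three) it is expected to read 0 (0.0%).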

THE TABLE DATASET

The table dataset holds entry and statistic codes as well as statistical values reported in the body of a table.
After statistical macros have executed, the resulting datasets are combined to form entry-specific datasets.
These entry datasets are then concatenated to form the final Table_DS table dataset. Unneeded statistics are
filtered out during this process; external frames, if used, are applied as well. Once the table dataset is
completed, records are sequentially output to an external table file using reporting methods such as DATA
_NULL_.

ENTRY DATASETS

Statistical macros execute to produce datasets containing statistical values for a given (effective) population.
When a table requires statistics to be calculated for multiple populations, macros are called separately for
each population. In other words, macros are called in sets - each set produces population-specific datasets
having values needed for an entry of a table. These datasets are then merged together when forming an entry
dataset.
Macros called in the set below create statistics for three populations: Placebo (dose=0), 50 Mg (dose=50), and
All. Output datasets are named using the table-entry/population convention (ExPy). Output variables are
named for the respective population (Py).

%STAT(indsn=DemoVS_Hor,filter=if (dose=0), invar=age,outdsn=E2P1,outvar=P1);

%STAT(indsn=DemoVS_Hor,filter=if (dose=50), invar=age,outdsn=E2P2,outvar=P2);

%STAT(indsn=DemoVS_Hor, invar=age,outdsn=E2P3,outvar=P3);

The resulting datasets are:

Output: E2P1.sas7bdat Output: E2P2.sas7bdat Output: E2P3.sas7bdat

STATC P1 STATC P2 STATC P3
1 2 1 3 1 6
2 59.5 2 37.0 2 38.5
3 59.5 3 35.3 3 40.8
4 6.4 4 5.7 4 16.3
5 55.0, 64.0 5 29.0, 40.0 5 20.0, 64.0

Using statc, these datasets are merged to form the entry dataset for the second table entry. A variable
identifying the table entry itself (entry) is created as well.

data Entry2; merge E2P1 E2P2 E2P3; by statc; entry=2; run;

Contents of Entry2 are shown:

Output: Entry2.sas7bdat

STATC ENTRY P1 P2 P3
1 2 2 3 6
2 2 59.5 37.0 38.5
3 2 59.5 35.3 40.8
4 2 6.4 5.7 16.3
5 2 55.0, 64.0 29.0, 40.0 20.0, 64.0

Statistics which are computed, but not shown on a table are filtered out of entry datasets by specifying a
condition on statc.
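
A condition on statc might look like the sketch below, which assumes the Entry2 dataset created above already exists. The output dataset name Entry2_Shown and the choice of statistics to keep (N, Mean, and S.D.) are illustrative only.

```sas
* Keep only the statistics reported in the table: 1=N, 3=Mean, 4=S.D.;
data Entry2_Shown;
  set Entry2;
  where statc in (1, 3, 4);
run;
```

The same condition could instead be written as a subsetting IF inside the merge step that builds the entry dataset.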

EXTERNAL FRAMES

Depending on the data being reported, table entries which show statistics for all possible values of a
categorical variable may require a frame. If an internal frame is not specified in the macro call, an external frame may
be used. External frames are applied as entry datasets are created. Taking effective population counts from
population-specific datasets created previously, an entry frame is formed by outputting records (and lineno)
for all categorical values possible.

%PCNT_ME(indsn=DemoVS_Hor,filter=if (dose=0), invar=race,outdsn=E4P1,outvar=P1);

%PCNT_ME(indsn=DemoVS_Hor,filter=if (dose=50),invar=race,outdsn=E4P2,outvar=P2);

%PCNT_ME(indsn=DemoVS_Hor, invar=race,outdsn=E4P3,outvar=P3);

Datasets produced by this set of macros:

Unframed Output:             Unframed Output:             Unframed Output:
E4P1.sas7bdat                E4P2.sas7bdat                E4P3.sas7bdat

LINENO STATC P1              LINENO STATC P2              LINENO STATC P3
-1     1     2               -1     1     2               -1     1     5
 2     6     1(50%)           1     6     1(50%)           1     6     1(20%)
 4     6     1(50%)           2     6     1(50%)           2     6     2(40%)
                                                           4     6     1(20%)
                                                           5     6     1(20%)

If the table entry dataset were formed by merging these datasets without a frame, a category value - lineno=3
(Asian) - would be missing. Additionally, gaps would also exist in the Placebo and 50 mg populations. Using
the unframed output shown above, the corresponding entry frame is created.

data E4Frame(keep=lineno statc entry np1 np2 np3);
  merge E4P1 E4P2 E4P3;
  by lineno;
  entry=4;
  * The population-count record (lineno=-1) is expanded into one record per category;
  if lineno=-1 then do lineno=1 to 5;
    statc=6;
    np1=input(p1, 8.);
    np2=input(p2, 8.);
    np3=input(p3, 8.);
    output;
  end;
run;

Entry4 Frame: E4Frame.sas7bdat

LINENO   STATC   ENTRY   NP1   NP2   NP3
     1       6       4     2     2     5
     2       6       4     2     2     5
     3       6       4     2     2     5
     4       6       4     2     2     5
     5       6       4     2     2     5

The fourth table entry is now formed by merging population-specific datasets along with the frame.

data Entry4;
  merge E4P1
        E4P2
        E4P3
        E4Frame;
  by lineno;
  entry=4;
run;

Output: Entry4.sas7bdat

LINENO   STATC   ENTRY   P1       P2       P3       NP1   NP2   NP3
    -1       1       4   2        2        5          .     .     .
     1       6       4            1(50%)   1(20%)     2     2     5
     2       6       4   1(50%)   1(50%)   2(40%)     2     2     5
     3       6       4                                2     2     5
     4       6       4   1(50%)            1(20%)     2     2     5
     5       6       4                     1(20%)     2     2     5

All possible categories are now present in the Entry4 dataset; however, gaps do exist in the placebo and 50
Mg populations. When the Table_DS dataset is created, these gaps are filled in following the same reasoning
as described with internal frames.

THE TABLE_DS DATASET

The Table_DS dataset holds all statistical values and entry information reported in a table. Records for this
dataset (the table dataset) are gained by concatenating individual datasets created for table entries. Gaps
resulting from external frames, if any, are filled in as iterations are made through the Table_DS data step.
This is the last data step completed before reporting takes place.

data Table_DS;
  *Concatenate entry datasets;
  set Entry1 Entry2 Entry3 Entry4 Entry5 Entry6;

  *If needed, fill gaps resulting **;
  *from an external frame        **;
  array pop(*) p1-p3;
  array npop(*) np1-np3;

  if entry in(3,4) then do i=1 to dim(pop);
    if missing(pop(i)) then do;
      if 0 < npop(i) then pop(i)=' 0(0.0%)';
      else pop(i)=' 0 (N/A)';
    end;
  end;
run;

To facilitate reporting using DATA _NULL_, this dataset is sorted by entry.

proc sort data=Table_DS; by entry; run;

The reporting step then begins.

DATA _NULL_ REPORTING

DATA _NULL_ is a common reporting technique used to output records to an external file. This is the most
versatile of reporting methods offered by SAS – it imparts complete control over most features of a report. This
versatility, unfortunately, entails more programming. Even the most elementary components of a table must
be explicitly coded. The reporting program described in this section is not elaborate; however, basic elements
are described.

ENTRY LABELS

Besides population-specific statistics, the Table_DS table dataset contains variables used to identify table
entries (entry), categorical values (lineno), and statistics (statc). Formatted values of these identifier
variables are output in the leftmost columns of the table with subsequent statistical values following on the
right. Formats for these variables are thought out and created in advance.
Entry numbers, and corresponding formatted values, are created based on the order in which they appear in the table.

proc format;
  value entry
    1='Number Of Patients'
    2='Age (Years)'
    3='Gender'
    4='Ethnic Group'
    5='Weight(Lb): Visit 1'
    6='Weight(Lb): Visit 2';
run;

Separate formats for categorical variables are created as well.

proc format;
  value gender
    1='Male'
    2='Female';
  value race
    1='Caucasian'
    2='Black'
    3='Asian'
    4='Hispanic'
    5='Other';
run;

Format values for stat code (statc) were described earlier.
Entry and value labels assigned in this manner must be updated whenever table entry or category values are
reordered or reassigned. Outdated label formats can cause incorrect results to appear on a table.

COLUMN SETTINGS

Widths and spacing for table columns are set before executing DATA _NULL_. Values for these settings,
along with a column count, are set in macro variables. These variables are called when column positions are
calculated for the table.

%LET SPACING=4; ** Spaces Between Table Columns;
%LET COLCNT=5; ** Number Of Table Columns;
%LET COLW1=30; ** Width: Table Column 1;
%LET COLW2=10; ** Width: Table Column 2;
%LET COLW3=10; ** Width: Table Column 3;
%LET COLW4=10; ** Width: Table Column 4;
%LET COLW5=10; ** Width: Table Column 5;

Using these settings, column positions are calculated:

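*Calculate Column Positions;
*Note: %DO is only valid inside a macro definition. COLPOS is an illustrative wrapper name;
%MACRO COLPOS;
  %GLOBAL COL1;
  %LET COL1=8; %*Position Of The First Column;
  %DO I=2 %TO &COLCNT;
    %GLOBAL COL&I; %*Make each position available to later steps;
    %LET PREV=%EVAL(&I-1);
    %LET COL&I=%EVAL(&&COL&PREV + &&COLW&PREV + &SPACING);
  %END;
%MEND COLPOS;
%COLPOS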
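
The position arithmetic can be traced by hand: each column starts at the previous column's position plus that column's width plus the spacing. The DATA step below is a standalone sketch that repeats the calculation with the settings shown above (first position 8, widths 30, 10, 10, 10, 10, spacing 4); the variable names are illustrative and it does not use the macro variables themselves.

```sas
data _null_;
  array colw(5) _temporary_ (30 10 10 10 10);  * widths COLW1-COLW5;
  spacing=4;                                   * value of SPACING;
  col=8;                                       * value of COL1;
  do i=1 to dim(colw);
    put i= col=;                               * start position of column i;
    col=col + colw(i) + spacing;               * start position of the next column;
  end;
run;
```

With these settings the five columns start at positions 8, 42, 56, 70, and 84.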

DATA _NULL_

After label formats have been created, and column positions calculated, DATA _NULL_ executes to create an
external table file. Put statements for formatted entry and categorical variable codes are conditionally
executed based on values for entry, lineno, and statc. Outputting entry and value labels using formats
remains feasible as long as labels are short and fit into one column. However, if labels are long and require
multiple lines in the table, then other labeling methods must be used.

data _null_;
  set Table_DS end=last;   * end=last defines the flag tested before the footer;
  by entry;

  file out print header=hdr linesleft=remain;

  * --- Report Body;
  * Trailing @ (assumed intent) holds the line so labels and statistics share a row;
  * Categorical values are stored in lineno, so lineno is written with the value format;
  if first.entry then put / @&COL1 entry entry. @;
  else if entry=3 then put @&COL1 +3 lineno gender. @;
  else if entry=4 then put @&COL1 +3 lineno race. @;

  put @&COL2 statc statc.
      @&COL3 p1
      @&COL4 p2
      @&COL5 p3;

  * --- Footer;
  *Note: Cutoff value depends on the number;
  * of lines required for the footer;
  if last or remain < 9 then link ftr;
  return;

  * ----- PUT HEADER;
  hdr:
    ...Insert titles...
    ...Insert column headers...
  return;

  * ----- PUT FOOTER;
  ftr:
    ...Insert footnotes...
    ...Begin a new page...
  return;
run;

CONCLUSION

This paper presented one program design implemented to calculate and report statistics in a columnar table.
Following this design, tables are completed in three distinct programming steps: statistical computation, output
collection, and reporting. Three key identification variables (entry, lineno, and statc) were used
throughout these steps to merge, label, and report table entries and statistics. One powerful variation of this
design involves packaging statistical macros "called in sets" into bigger macros able to create table entries in a
single step given population specifications. Further variations are possible.
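
One way such packaging might look is sketched below. The macro name ENTRY_STAT, its parameters, and the work dataset names _P1 to _P3 are hypothetical; the sketch assumes %STAT is available and accepts the keyword parameters used in the calls shown in this paper, and it hard-codes the three populations from the example (dose=0, dose=50, and all patients).

```sas
%MACRO ENTRY_STAT(indsn=, invar=, entry=);
  %* One call per population, named as in the paper (P1, P2, P3);
  %STAT(indsn=&indsn, filter=if (dose=0),  invar=&invar, outdsn=_P1, outvar=P1);
  %STAT(indsn=&indsn, filter=if (dose=50), invar=&invar, outdsn=_P2, outvar=P2);
  %STAT(indsn=&indsn,                      invar=&invar, outdsn=_P3, outvar=P3);

  %* Merge the population-specific datasets into one entry dataset;
  data Entry&entry;
    merge _P1 _P2 _P3;
    by statc;
    entry=&entry;
  run;
%MEND ENTRY_STAT;

%ENTRY_STAT(indsn=DemoVS_Hor, invar=age, entry=2);
```

A more general version would take the population filters as parameters instead of fixing them inside the macro.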

APPENDIX

/*******************************************************************************/
/* File: Stat.sas                                                              */
/* Description: Generate summary statistics for an analysis variable           */
/*******************************************************************************/

%MACRO STAT(indsn=all, filter=, invar=, outdsn=Stat, outvar=P, outfmt=5.1);

  * ----- select records and copy the analysis variable;
  data _TheData; set &indsn; invar=&INVAR; &filter; run;

  * ----- compute summary statistics;
  proc summary data=_TheData;
    var invar;
    output out=_Summ n=n median=median mean=mean std=std min=min max=max;
  run;

  * ----- write one record per stat code (statc);
  data &outdsn(keep=statc &outvar);
    length &outvar $ 13;
    if 0 < nobs then set _Summ nobs=nobs;

    do statc=1 to 5;
      if statc=1 then do;
        if missing(n) then &OUTVAR='0';
        else &OUTVAR=put(n, 3.);
        output;
      end;
      else if statc=2 then do;
        if missing(median) then &OUTVAR=' N/A';
        else &OUTVAR=put(median, &OUTFMT.);
        output;
      end;
      else if statc=3 then do;
        if missing(mean) then &OUTVAR=' N/A';
        else &OUTVAR=put(mean, &OUTFMT.);
        output;
      end;
      else if statc=4 then do;
        if missing(std) then &OUTVAR=' N/A';
        else &OUTVAR=put(std, &OUTFMT.);
        output;
      end;
      else if statc=5 then do;
        if missing(min) and
           missing(max) then &OUTVAR=' N/A, N/A';
        else &OUTVAR=put(min, &OUTFMT.)||', '||left(put(max, &OUTFMT.));
        output;
      end;
    end;
  run;

  proc datasets nolist; delete _TheData _Summ; quit;
%MEND STAT;

/*******************************************************************************/
/* File: PCNT_ME.sas                                                           */
/* Description: Create mutually exclusive percents for a categorical analysis  */
/*              variable.                                                      */
/*******************************************************************************/

%MACRO PCNT_ME(indsn=all, filter=, invar=, frame=, outdsn=Pcnt_ME, outvar=P);
  data _TheData; set &indsn; invar=&INVAR; &filter; run;

  *** Missing values excluded;
  proc freq data=_TheData noprint;
    tables invar / out=_Pcnt(where=(not missing(invar)));
  run;

  *** Total number of non-missing records;
  data _Pcnt;
    if nobs=0 then count=0;
    else set _Pcnt nobs=nobs;
  run;

  *** Apply the internal frame, if one was supplied;
  %IF &FRAME NE %THEN %DO;
    data _Frame; do invar=&FRAME; output; end; run;
    proc sort data=_Frame; by invar; run;

    data _Pcnt;
      merge _Pcnt
            _Frame;
      by invar;
    run;
  %END;

  *** Calculate the total included in the percentage;
  proc sql;
    create table _PcntTotal as
    select invar, count, percent, sum(count) as total
    from _Pcnt;
  quit;

  data &outdsn(keep=lineno statc &OUTVAR);
    length &outvar $ 13;
    set _PcntTotal;

    *** Population count record;
    if _n_=1 then do;
      lineno=-1;
      &outvar=put(total, 3.);
      statc=1;
      output;
    end;

    *** One record per categorical value;
    if not missing(invar) then do;
      if missing(count) then count=0;
      if missing(percent) then percent=0;

      if round(percent, .00001) =100 then
        &outvar=put(count, 3.)||' ('||
                trim(left(put(percent/100, percent8.1)))||')';
      else if 0 < total then
        &outvar = put(count, 3.)||' ('||
                  trim(put(percent/100, percent7.1))||')';
      else &outvar = put(count, 3.)||' ( N/A)';

      statc=6;
      lineno=invar; *invar must be numeric;
      output;
    end;
  run;

  proc datasets nolist; delete _TheData _Pcnt _Frame _PcntTotal; quit;
%MEND PCNT_ME;

CONTACT INFORMATION

Tyler Cole
Pacific Data Designs, Inc.
900 North Point Street, Suite C180
San Francisco, CA 94109
(415) 776-0660

TRADEMARK INFORMATION

SAS and SAS Certified Professional are registered trademarks of SAS Institute, Inc. in the USA and other
countries. ® indicates USA registration.

