Difference Between Set Statement And Proc Append In SAS For Appending Data

Hi,

As we all know that there are three ways to upload data into the Data Warehouse:

Append – Can say :Append to existing

  1. Replace

  2. Update/ Insert

Here, we are just going to explain one way only named as APPEND

What append means: We can say that, append is a process in which data would be added in to a primary table from last observation (from the bottom) from secondary table.

Here we are going to see the difference between two ways of appending the tables in Data warehouse and we would see that which one (Data Step or Proc Step) is more efficient.

Two ways to append the data are:

  1. Set Statement ( In Data Step)

  2. Proc Append ( In Proc Step)

Suppose we have two Data sets named as Primary and Secondary both have three variables and 252000,48000 observations respectively.

Dataset Primary – 3 Variables (X, Y, Z) – 252000 Observations

Dataset Secondary – 3 Variables (X, Y, Z) – 48000 Observations

Just Run a small piece of code:

Data  Set_Append;

Set Primary Secondary;

Run;


Combined dataset SET_APPEND would have 300000 observations. If we see the log then we would get to know that SET Statements read both the data set. It means SAS processes the both data set.  Log is displaying the CPU timing and Real timings which are 0.17 and 0.18 seconds respectively.

Now try to run a small piece of code again from SAS procedure named as PROC APPEND:

Proc Append Base=Practice Data=Secondary force;

Quit;


Now Primary dataset would have 300000 observations and as per log, Proc append is reading only secondary dataset, it means it is we have saved time from SAS which was in to read primary dataset. Log is displaying this information is clearly that CPU timing and Real timings which are 0.01 and 0.04 seconds respectively.

Some useful differences between Proc Append and Data Step SET Statement while using as Load Style APPEND for data warehouse:

  • Proc Append is faster than data step SET Statement

  • We can use proc append for only two tables or datasets

  • In Data step, Set statement, we can merge any number of tables or datasets

  • Proc Append uses formats, Informats and Labels from BASE= dataset

  • Set Statement uses defined formats, informats and labels explicitly

  • For Set Statement, if same variable is having different length in two or more datasets, then length would be defined as per the first dataset in SET statement

  • Proc Append, looks for the FORCE option to concatenate and, if length of the variable does not match with primary dataset or table

For description related to FORCE option in PROC APPEND, Kindly refer to:
Why We Use FORCE Option With PROC APPEND In SAS To Upload Data To DataWarehouse

5 comments:

  1. This information is priceless. Where can I find out more?

    ReplyDelete
  2. What is the difference between single set and multiple set statements?
    How can v use multiple &'s?

    ReplyDelete
  3. Sir Uma shankar,
    Thanks for shating your valuable knowledge .may i know the difference between CPU time and real time mentioned in Set and Process Append discussion.

    Thanks ,
    B.Veers Babu,
    9948556112.

    ReplyDelete
  4. Very precisely said and covers significant information

    ReplyDelete