Proc SQL With Where Clause In SAS

We can have a look into WHERE clause in SQL Procedure. It helps us to dig the specific data but it could be faster and useful if we use WHERE Clause with operators.

Suppose we have a SAS dataset and variable of it as follows: Date, Name,LName Address, State

e.g: Name =SHANKER,  LName=SAINI

To select data from any range we can use BETWEEN – AND operator with Where clause in PROC SQL.

  • Syntax: Where Date between ‘01Jul2014’d AND ‘28Jul2014’d


We can use IN operator to select values from list:

  • Syntax: WHERE State in (‘MH’,’UP’,’PB’)


We can use MISSING operator to select missing values from data:

  • Syntax: Where State is MISSING


We can use CONTAINS operator to select a string by using its letters only:

  • Syntax: Where Name CONTAINS ‘KER’


 We can use LIKE operator with WHERE Clause with PROC SQL to select any pattern:

  • Syntax:  Where name like ’K_R’



  1. We have two wild card operators with WHERE clause ‘_’ (Underscore) and ‘%’ (Percent)

  2. With the Like operator in WHERE Clause , Underscore(_) wild card is for one character and percent ‘%’ sign is for more than one character missing in string



  • Syntax : Where Name like ‘%ER’


We can use (*) to select the data with spelling variation or for values that sound like specified values like:   Syntax : Where LName=*’SAINI’

We can use a column reference / alias in WHERE Clause to refer to a calculated value (We have to use CALCULATED keyword in this case).

 PROC SQL;

Select Var1, Var2, Var3 + Var4 as Total, CALCULATED Total/4 as Quarter from DSN; QUIT;

Or

PROC SQL;

Select Var1, Var2, Var3 + Var4 as Total from DSN

Where CALCULATER Total  > 500;      or      Where Var3 + Var4 > 500 ;  QUIT;

SAS SORTSIZE Option For Large Amount Data Files Of SAS Data Sets

Hi,

Suppose we are dealing with a file (.txt) which is around 500 mb and we need to sort the data with the help of Proc Sort. if we directly sort the data then system would not be able to do it. It would get hang and all.

To deal with such type of big data, we need to check some SAS System options like : SORTSIZE (specific to proc sort), MEMSIZE, REALMEMSIZE.

1. The MEMSIZE option is set defaults to 96 MB and SORTSIZE is set to default to 80 MB for Unix/Linux servers.

It is not good to set the MEMSIZE option to zero on Unix and Linux servers.

2. On Windowing environment, MEMSIZE option is set to zero by default and for SORTSIZE it is 64 MB.

3. SORTSIZE option creates a temporary utility file in sas work library, in which it store the data

4. The SORTSIZE sas option is similar to the REALMEMSIZE sas system option

5. SORTSIZE sas system option only effect the SORT Procedure while REALMEMSIZE sas system option effects multiple procedures.

SORTSIZE = option ;

Proc Sort statement supports the sortsize = option , which set the limit of the amount of memory available for proc sort to use.

To set sortsize option:-  Here "n" is defined as a real number only

n         -     Amount of memory in bytes

nK      -     Amount of memory in kilobytes

nM      -     Amount of memory in megabytes

nG       -     Amount of memory in Gigabytes

Min     -     Specify the minimum amount of memory available

Max    -     Specify the maximum amount of memory available

 

* SORTSIZE = value can increase or decrease CPU and I/O resource utilization

* If our machine has 14 MB of physical memory and we are sorting large data sets, setting SORTSIZE option between 4 MB to 10 MB may improve your system performance.

* We always need free disk space that should be equals to three to four time of large data sets. Suppose our data set requires 2 MB of disk space, then we would be in need 6 MB to 8 MB of disk space to sort the data

 

Finding Max Observation By Data Step Only In SAS

Hi All,

Suppose we have a data set like below:

EmpID      EmpSal     EmpName

E001          5000          A

E002          7500          B

E003          2500          C

E004          9800          D

E005          6100          E

 

I am looking for the maximum salaried EmpID / EmpName or whatever which should contain maximum value from a variable.

then you can try like it, without using any procedures and options:

 

data want;

   set have nobs=_nobs_;

   if EmpSal > max_EmpSal then do;

     max_EmpSal = EmpSal;

     max_EmpID = EmpID;

     max_EmpName = EmpName;

   end;

   retain max_:;

   keep max_:;

   if _n_ = _nobs_;

run;

 

SAS PDV - Program Data Vector In SAS

Hi,

PDV (Program Data Vector) : The information about each of the variables is stored in a reserved area of memory called PDV.

SAS Processes data steps in two stages : (1) Compile Stage  (2) Execution Stage

In Compile stage SAS does some housekeeping jobs like to prepare an area to store the sas dataset, then to check input file ( infile statement) to fix various attribute of input file. Then to put all the information at a place in memory called as INPUT BUFFER (where it place each record of data as it is read from the input file).

SAS checks for each variable is character or numeric and the storage length of each variable, this information is called the descriptor portion of the data set.

Descriptor Portion contains : Name of data set, Date and time that the data set was created , Number of observations and the number of variables (It contains attribute information for each variable in the data set )

SAS does following action during compile stage :

(a)  Syntax Scan

(b)  Sas Source code translation to machine language

(c)  Definition of input and output files

(d)  Create input buffer, PDV and data set descriptor information

(e)  Setting the variable attribute for output sas data set

(f)  Capture of variables to be initialized to missing.

 

Regards

Uma Shanker Saini