SASCERTGURU: August 2014

Difference Between SAS SUM Function And Using '+' Operator In SAS

Hi All,

Operators : For any type of calculation, we all are much familiar with Division (/), Multiplication (*), Addition (+) and Substraction (-) which are known as operators. Whenever we are in need to do some calculation in SAS, we use operators. Operators are to do indicated operation between SAS variables or constant values.

Sum Function in SAS :- It returns the sum of non missing and missing arguments.

Ex: Total = Sum (U,M,A);

Total = Sum (of U1 - U10) ;

Total = Sum ( U M A );

"+" Operator in SAS :- It returns a missing value if any of the values (arguments ) are missing.

Difference between SAS Procedure and SAS function

SAS functions would be in need for argument values to be supplied across an observation in SAS dataset.

SAS functions are already written ( pre-written) expressions which gives programming shortcuts for many manipulation and calculation of data

SAS functions are for normal (common ) mathematical operations such as square root, natural log and many more....

SAS Procedures would be in need for one argument value per observation in SAS dataset.

How To Handle Special Missing Values In SAS

Hi All,

Today we are going to discuss, how to handle missing values in SAS. Yes, I know, you are going to say that you know this thing very well. It’s not about the normal missing values. It’s for SPECIAL MISSING VALUES.

What is SPECIAL in Missing values:

Suppose I am the owner of a medical shop and I need to enter the name of the medicines which are not available. I am going to take ‘i’ for fever medicine and ‘-‘for cold.

Now I have two type of missing values ‘i’ and ‘-‘. I think you got my point for special missing values.

If not, then another example is going on:

Suppose in an organization an employee is absent on a day and organization is noting down this one with a notation ‘A’. Some time it happens that employee is in office but he forgot to swipe his details so the system would reflect absent as per the process. But for this type of missing information of absent, organization is having notation like ‘<’.

Finally you got the point for SPECIAL MISSING VALUES.

The special missing value is a type of numeric missing value that enables us to show different categories of missing data by using the letters A to Z (A-Z) or an underscore in the data.

To deal with the special missing values in SAS we have MISSING Statement. We just need to assign all special missing values in Missing Statement;

Here we go with the example:

Data DSN;

Input id x;

Missing i a _;

Datalines;

1 i

2 0

3 a

4 155

5 _

6 .

7 -24546

8 548

;

Run;

And the output would be like this :

* Here we have Special Missing values ‘I’, ‘a’ and ‘_’.

* DSN is the name of the dataset.

The order of special missing values can be understand like this: (_) < (.) < (A) < (Z) < (-24546) < (0) < (155).

*(-24546) is a negative integer and (155) is positive integer.

You can check the all special values, if it is working correctly in the following example: (Try by yourself just giving you code)

Data calc;

Input x1 x2 x3; (Missing statement is unavailable)

Datalines;

1 2 a

3 2 i

3 1 x

1 1 1

2 2 2

;

Run;

Data Test;

Set calc;

Missing a i x;

Total = x1+x2+x3;

Sumfn= sum(of x1-x3);

Run;

Data exp;

Missing a;

X1 = ._;

X2 = 5;

X3 = .a;

Sum = x1 + x2 + x3;

Sumfn = sum(of x1,x2,x3);

Run;

Note=> Remember if we are assigning a missing value in an expression or in any assignment statement, a period (.) must be used before the special missing value or an underscore otherwise there will be an error in SAS log i.e. "NOTE: Variable _ is uninitialized." And same happens with a character missing value as well.

Column Pointer Control in SAS : @, +N,/ and #N

Hi,

To get the control while reading external file (with infile statement) and with cards/dataline statement :

@n : Moves the pointer to the nth column in the input buffer

+n : Moves the pointer forward "n" column in the input buffer

/ : Moves the pointer to the next line in the input buffer

#n : Moves the pointer to the nth line in the input buffer

Example :

Data DSN;

input ID 1-4 // openweight 1-3 closeweight 5-7;

Cards;

1023 David Shaw

Red

189 167

1049 Kathy Jones

yellow

145 124

;

run;

* For others please try yourself and put the example into the comment..

Difference Between Run Statement And Quit Statement in SAS

Hi,

In SAS, Run Statement and Quit statement could be the formally last word in Data step or Proc Step but both have lots of differences. I am writing just few of them.

RUN Statement : -

1. It executes all previously submitted SAS Statements.

2. It executes the procedure but not end it.

3. For Run statement or for run group execution we need to submit either a RUN cancel or a Quit Statement.

QUIT Statement :-

1. It executes all previously submitted SAS Statements as well as it marks the end of the procedure.

2. Quit statement allow us to use the same procedure continuously without executing or submitting another procedure.

3. It is to run group processing which enables us to submit certain procedures with a RUN statement.

What Is Data Integration With Consultation Of SAS DI Studio

Hi All,

We are familiar with all the definition of Data warehouse and the four important words which could define the characteristics of data. We can use the data of the data warehouse to perform various things like: Data mining, Business Intelligence and various types of reporting.

We know it, If we want to perform any operation with the data, We have to have fetch the data from the data ware house. It means firstly we need to collect the data from various sources to create in a unified view for a business purpose.

SAS DI or SAS Data Integration is a ETL (Extract, Transform, Loading) tool. It means we can use SAS DI to create the data warehouse or to fetch the data from it to perform various things and again can upload the data to DW.

Four magical key words to define the Data ware house are Subject oriented, Time-Variant, Integrated and Non-volatile. Which are defined in the definition given by Ralph Kimball.

We are not going currently deeper inside the Data warehouse or it's contents.

Here we go for Data Integration and SAS Data Integration Tool :

Unified solution to support all project from building data warehouses to migrate to new ERP System and everything in between

Process of actively managing all of our organizational data

Data Integration is the process of consolidating of data from variety of sources to produce unified view

Can read data from all available and relevant sources

To clean the data from point of entry to the point of delivery or every touch point in between

Consolidated view from many operation system to create Data Warehouse and Data mart to support BI and Analytics

To Support BI, Analytics program and to store business historical data

Business gets data from many sources and from many vendors, which contain same information (Changes made in one system would get reflect in all system)

To load or Migrate the data from any system

To consolidate multiple databases into one

To migrate data from legacy system to reduce cost or to upgrade to latest release of critical application

Provide integrated solution builds from the ground up solution for all need of data integration

Solutions are time proven to be rock solid and stable

It requires from simple and one time migration to complex and real time solutions

ETL from across the enterprise to create consistent and accurate information

Data quality (Consistent and reliable data)

Supports impact and change analysis

Data synchronization

Metadata Management

Can read data from any source including : unstructured data, message cube and legacy systems

SAS DI Studio not only Extract, transform and load the data, it also perform data cleansing and dimension conforming

Remove mistakes and corrects data, Captures the flow of transitional data and gives a structure to data to be used by BI tools

Useful in every sector like : Public Sector, Life Science, Manufacturing, Financial …

SAS DI is a bridge for BI – Directional flow

Data migration, Data consolidation, Master data Management, Data Synchronization

Extensibility with custom connection

User friendly GUI, Metadata capture and less training time

Allows integration with existing infrastructure

Secure communication between Server and client

These notes could be useful for SAS Consultation of SAS DI Studio.

Library Concatenation in SAS : How To Refer Multiple SAS Libraries To A Single Libref

Whenever we want to access more than one SAS library in a data set by using only single Libref, SAS library combination or concatenation concept comes into the picture.

Combination of SAS library is the logical combining of two or more SAS Libraries. By this, we can access several libraries with one libref.

As we are already familiar that we can combine / concatenate more than two libraries by specifying their own libref or physical location name in LIBNAME statement.

Suppose we have some libraries at physical location like:

Libname JAN “C:\Users\Uma\Desktop\blog data”;

Libname FEB “C:\Users\Uma\Desktop\blog data\Functions”;

Libname MAR “C:\Users\Uma\Desktop\blog data\Pictures”;

Libname APR “C:\Users\Uma\Desktop\Template Code”;

Combination of more than two libraries would be as following:

Libname YEAR (JAN FEB MAR APR);

Or

Libname YEAR (“Path1” “Path2” “Path3” “Path4”);

Even we can go like this as well

Libname Total (“C:\Users\Uma\Desktop\Template Code\Backup Template” FEB MAR APR);

We can use this assigned libref with another LIBNAME as well as follows

LIBNAME COMPLETE (YEAR “Path5”);

If we don’t enclosed the physical name in single or double quotation marks then SAS would look for a previously mentioned libref with the same name

Efficiency And Performance Strategies Within SAS

Efficiency and performance strategies for SAS can classified into five stages:

1. CPU time           2. Data Storage          3.Elapsed time          4. I/O           5. Memory

1.          CPU Time

A. Use Keep= and Drop= data set options to retain desired variable.

B. Create and use indexes with large data sets.

C. Utilize macro for redundant code.

D. Use IF/Then - else statement to process data.

E. Use the data sets procedure copy statement to copy data sets with indexes.

F. Use SQL procedures to consolidate the number of steps.

G. Turn off macro facility when not needed.

H. Avoid unnecessary sorting - Plan for its use.

I. Use procedures that support CLASS statement.

J. Use the stored program facility for complex data steps.

2. Data Storage

A. Use Length statements to reduce variable size.

B. Create character variable as much as possible.

C. Use _Null_ steps for processing null data sets.

3.               I/O

A. Read only data, that is needed.

B. Use "Where" Statements to subset data.

C. Use data compression for large data set.

D. Store data in SAS Data sets, not external files.

E. Perform data subsets early and at same time.

4.                Programming time (Elapsed time)

A. Use procedures whenever possible.

B. Document programs and routines with comments.

C. Code for unknown data values.

D. Assign descriptive and meaningful variable names

For memory go with some options like sort size, buff size, mem size and Real Mem Size........

Concatenation Functions In SAS : CAT, CATT, CATS, CATX, CATQ

Hi All,

Today i am explaining something about to add two or more character strings in SAS. Adding two or more character or string which simply notify towards the CONCATENATION.

We can simply define concatenation as to put two or more strings together is concatenation. Here we would not discuss about the concatenation operators which are || (vertical bar) and !! (Exclamation mark ) but we would discuss on some concatenation functions in SAS which are CAT, CATT, CATQ, CATS and CATX.

All the Concatenation functions takes two or more arguments and concatenate the strings or variables. For all concatenation functions (if applies) removes the leading or trailing blanks from resultant concatenated string before the concatenating of strings.

Note : Whenever we use concatenation operator || or !!, The Length of the resultant string would be the sum of the lengths of all individual strings we are adding or concatenating.

CAT Function in SAS : It concatenates the two or more character strings and does not remove leading or trailing blanks. The resultant concatenated string would be a character string.

Syntax : Result_cat =CAT (String1, String2,.....StringN);

CATT Function in SAS : It concatenates the two or more character strings and removes trailing blanks from the resultant string or variable. The resultant concatenated string would be a character string.

Syntax : Result_catt =CATT(String1, String2,.....StringN);

CATS Function in SAS : It concatenates the two or more character strings and removes all leading and trailing blanks from the resultant string or variable. The resultant concatenated string would be a character string.

Synatax : Result_cats =CATS(String1, String2,.......StringN);

CATX Function in SAS : it concatenates the two or more character strings and adds a delimiters after each string's value. It also removes leading and trailing from resultant string or variable. The resultant concatenated string would be a character string. We can say that CATX is same as CATS just it adds a delimiter between values being concatenated.

Syntax: Result_catx=CATX('Delimiter', String1, String2,,..... , StringN);

CATQ Function in SAS : It concatenates the two or more character or numeric strings by adding a delimiter and quotation mark to that string which contain the delimiter. CATQ function is similar to the CATX function excepts it also adds quotation marks.

Syntax: Result_catQ=CATQ(Modifier,'Delimiter',String1, String2,.....StringN);

Modifiers for CATQ function:-

1 or ' : For single quotation mark

2 or " : For double quotation mark

a or A : For adding quotation mark to all of the arguments

c or C : For comma as a delimiter

d or D : Tells that we have specified delimiter argument

h or H : For horizontal tab as a delimiter

For all concatenate functions:-

The Default LENGTH of returned variable from any CAT* function would be 200 bytes, if Length is not previously specified to the assigned variable of CAT function

CAT* function always returns a value to a variable

For numeric variables / arguments, CAT* function removes trailing and leading after formatting numeric arguments to the BESTw. format

The returned values from CAT, CATS, CATT and CATX are normally equivalent with the resultant values of concatenation operator (with certain combination like : Trim, Left, Strip) except in length

CAT, CATS,CATT and CATX functions are faster than using TRIM and LEFT functions

In CATQ, if we do not use C,D or H as modifiers, then CATQ would use blank as delimiter

Interview Question