SAS Servers In a Glance : SAS Application Server (SASApp)

Hi Guys,

As we are learning SAS DI Studio, So it is a high time to keep an eye on SAS Servers. We can simply say that we have a list of SAS Servers but here we would put spot light on some important servers only.

SAS Application Server/s : We can easily say that this server is a bucket of all important SAS server. If we would talk with respect to SAS DI Studio, when administrator installs SAS intelligence platform, he define a metadata object which represents SAS server tier . For SMC (SAS Management Console), such type of object is known as SAS Application Server. By default SAS Application Server is named as SASApp.

SASApp is not an actual server while it contains a set of actual servers. In other word, we can say that SASApp is a logical container of actual servers.
 Usually SAS Application Server contains the following servers:

1. SAS Workspace Server: It executes SAS code, it means workspace server reads and writes data. In SAS DI Studio, when we submit any ETL job for execution, SAS DIS generates SAS code to perform execution and submits the SAS code to Workspace Server for processing.

2. Pooled Workspace Server:  This SAS Server automatically use pooling and load balancing.

3. Stored Process Server:  It is also a great time to discuss Stored Processes. Stored processes are SAS programs which are stored and can be executed by client application. These processes are to perform complex work like: to analyze data and creating reports, and then publish / return the results to the client / channel or repository.
So Stored Process Server submits stored processes by SAS for execution.

4. SAS OLAP Server: It creates cubes and processes queries against cubes. OLAP Server processes the data by using multidimensional expressions language (MDX).

5. SAS / CONNECT Server: This server is to move data between client and server machine or we can say that it enables the client to execute code on a remote host. It can also be used for interactive access to remote libraries.

6. SAS Grid Server: This server is to enable platform for LSF to start SAS/CONNECT servers on a SAS compute grid, so that it could execute grid enabled jobs which are created in SAS DI Studio and SAS Enterprise Miner.

7. SAS/SHARE Server:  It enables parallel access of sever libraries from multiple users.

8. SAS Scalable Performance Data (SPD) Server: it enables concurrent processing of large data stores, Data backup and to restore utilities as well as some tuning options.

What Is The Difference Between SAS Character Functions INDEX And INDEXC

In my previous post related to, how to read mixture dates from raw data in sas, I have used INDEXC function so it’s a good time to define INDEX and INDEXC function in SAS with difference in both.

link to previous post :How To Read Different Formats Of Dates In SAS (Reading Mixture DatesFrom Raw Data In SAS )

INDEX Function : It looks for a character expression in a character string and returns the position of string’s first character for the first position. In short, it scans the string and provides the location of sub-string.

INDEXC Function : It looks for a expression for any of the given characters and returns the position of that expression. In Short, It scans the string and provides the location of sub-string based on list of expression in sub-string.

Explanation : Suppose we have a character string “UmAShankerSaini” and we are looking for ‘ain’.

Once we are running INDEX function, then it would search for complete string like ‘ain’ and it would return the position of it, which is 12.

Once we are running INDEXC function, then it would make a list of all letters in specified string and returns the position of any character for the first instance of any of them. So when it comes to lowercase ‘a’ at the sixth character, it returns the position value which is 6.  In case of uppercase ‘A’, it would return 3.

Data DSN;
Name='UmAShankerSaini';
Exp='ain';
Indx=Index(Name,Exp);
IndxC=INDEXC(Name,Exp);
Put Indx= / Indxc = ;
Run;




How To Read Different Formats Of Dates In SAS (Reading Mixture Dates From Raw Data In SAS)

Today we are going to discuss about, how to read dates in SAS, which are in multiple format in raw data.

Suppose we have dates in raw data which contains multiple types of separators and formats.

We would read such type of mixed date data and make all dates in one required format. We have two master blaster functions to perform this task :INDEXC and WORDDATE

SAS Code would be as follows:

Data DSN;

Input @1 Dummy_Dates $15.;

If INDEXC (Dummy_Dates,'-/:') NE 0

Then Date=Input(Dummy_Dates,MMDDYY10.);

Else Date=Input(Dummy_Dates,Date9.);

Format Date WORDDATE.;

Drop Dummy_Dates;

Cards;

10/07/2014

10MAY1984

8-12-1999

1:09:60

;
Run;








For more on INDEXC : What Is The Difference Between SAS Character Functions INDEX And INDEXC

Types Of Error In SAS : Syntax Error, Execution Time Error, Data Error,Semantic Error

Many of us are familiar with Errors in SAS and most of the time we have seen the classification of errors in SAS log.

As per the log classification in SAS, There are four type of errors in SAS :

1. SYNTAX ERROR      2. EXECUTION TIME ERROR    3. DATA ERROR     4. SEMANTIC ERROR

1. SYNTAX ERROR : We can have such types of errors whenever our programming statements or codes does not follow the SAS language coding rules. SAS detects SYNTAX Error at COMPILE TIME / STAGE. Syntax errors could be due to the following reasons :

  • Any keyword which is not spelled correctly (Misspelled keyword)

  • Missing or Invalid punctuation

  • Invalid statements or data set options

Quotations marks not used properly (Unmatched)

2. EXECUTION TIME ERROR : Whenever SAS executes a program which processes data, during the processing of data value if any error occurs that is known as EXECUTION TIME ERROR. Most of the execution time errors just gives warning messages or notes in SAS log but These errors allow the program to continue the execution of data. We can identify execution time error usually by location which would be given as line and column numbers in a SAS note or SAS error message. Execution time error could be due to following reasons:


  • Any mathematical operation which is not legal ( division by zero )

  • Any argument to function which is not legal

  • For BY group processing, when observations are not in proper order

  • While referring to a member of an array which does not exist

  • Open and close error (for brackets and inverted commas) on SAS data sets and other files like INFILE

  • Any INPUT statements which do not match with the cards / datalines 

3. DATA ERROR : We can have such type of errors in SAS, whenever any data values are not as per the SAS statement which we have specified in our SAS program. Suppose we have defined a variable as a numeric but we are passing some data values which are character then SAS gives an error that would be known as DATA ERROR. Data errors occurs during the program execution but these errors allow the program to continue the execution of data and does the following things:

  • It prints the obs under the rule line

  • In the SAS log, It writes an invalid data notes with line number

  • It sets the automatic variable _ERROR_ to for current observation

  • In SAS log, it prints the input line and column number which contain invalid data

4. SEMANTIC ERRORS : We can have such type of error in SAS, whenever our programming statements or codes are up to the mark means any SAS statement is correct but any element is not valid for that specific usage. SAS detects SEMANTIC ERRORS at COMPILE TIME / STAGE. Whenever SAS finds any semantic error in program SAS goes for syntax check mode. Semantic errors could be due to the following reasons :

  • Defining or using a numeric variable where only a character variable is applicable

  • For Array, when we use any illegal reference for it

  • Whenever we do not specify correct number of arguments  for any SAS Functions

Ex:  Suppose we are using a libname for a dataset without assigning it prior, for semantic error, libname statement is valid but as it is not specified previously so it would through an error.

Why We Use FORCE Option With PROC APPEND In SAS To Upload Data To DataWarehouse

Here we are going to discuss use of ‘FORCE’ option in PROC APPEND.

 FORCE option: As per its name, it forces PROC APPEND to concatenate or add or append the data sets in some conditions in which the append procedure would normally fails like: when data sets contains different variables, formats, Lengths and labels.

 PROC APPEND without FORCE option (Using different variables in datasets):

 Suppose we have two datasets named as Primary and Secondary.

                        Primary Dataset – 3 Variables (X, Y,Z ) – 252000 observations

                       Secondary Dataset – 2 Variables (X, Y) -  48000 observations

 Now we run the following code:

         Proc Append Base=Primary Data=Secondary;

        Quit;

 We would get error due to non matching variables in Datasets. Error would be thrown only if Base data set would have different rather than secondary dataset.


Case: 1:-

Now if we use FORCE option then it will force to run PROC APPEND and FORCE would drop or truncate the variable as per the Base dataset (Primary).

Proc Append Base=Primary Data=Secondary FORCE;

Quit;






Data would be added to the primary dataset after truncating or dropping the extra variable in secondary dataset as per primary dataset.

Case: 2:- What if our Secondary dataset would have fewer variables than Primary dataset?

 FORCE option would force to run PROC APPEND and it will assign missing values for the missing variable’s observation in appended dataset from where secondary dataset have values.

                    Primary dataset – 3 variables (X, Y, Z) – 252000 observations

Secondary dataset – 2 Variables (X, Y) – 48000 observations

Code:

 Proc Append Base=Primary Data=Secondary FORCE;

Quit;

Log:

Result:

FORCE option in PROC APPEND has concatenated both dataset and assigned missing values to missing variables observation in secondary data set with respect to primary data set.

We are familiar with PROC APPEND in SAS to upload data to the data warehouse and it’s attributes, Kindly refer to : Difference Between Set Statement And Proc Append In SAS For Appending Data

Difference Between Set Statement And Proc Append In SAS For Appending Data

Hi,

As we all know that there are three ways to upload data into the Data Warehouse:

Append – Can say :Append to existing

  1. Replace

  2. Update/ Insert

Here, we are just going to explain one way only named as APPEND

What append means: We can say that, append is a process in which data would be added in to a primary table from last observation (from the bottom) from secondary table.

Here we are going to see the difference between two ways of appending the tables in Data warehouse and we would see that which one (Data Step or Proc Step) is more efficient.

Two ways to append the data are:

  1. Set Statement ( In Data Step)

  2. Proc Append ( In Proc Step)

Suppose we have two Data sets named as Primary and Secondary both have three variables and 252000,48000 observations respectively.

Dataset Primary – 3 Variables (X, Y, Z) – 252000 Observations

Dataset Secondary – 3 Variables (X, Y, Z) – 48000 Observations

Just Run a small piece of code:

Data  Set_Append;

Set Primary Secondary;

Run;


Combined dataset SET_APPEND would have 300000 observations. If we see the log then we would get to know that SET Statements read both the data set. It means SAS processes the both data set.  Log is displaying the CPU timing and Real timings which are 0.17 and 0.18 seconds respectively.

Now try to run a small piece of code again from SAS procedure named as PROC APPEND:

Proc Append Base=Practice Data=Secondary force;

Quit;


Now Primary dataset would have 300000 observations and as per log, Proc append is reading only secondary dataset, it means it is we have saved time from SAS which was in to read primary dataset. Log is displaying this information is clearly that CPU timing and Real timings which are 0.01 and 0.04 seconds respectively.

Some useful differences between Proc Append and Data Step SET Statement while using as Load Style APPEND for data warehouse:

  • Proc Append is faster than data step SET Statement

  • We can use proc append for only two tables or datasets

  • In Data step, Set statement, we can merge any number of tables or datasets

  • Proc Append uses formats, Informats and Labels from BASE= dataset

  • Set Statement uses defined formats, informats and labels explicitly

  • For Set Statement, if same variable is having different length in two or more datasets, then length would be defined as per the first dataset in SET statement

  • Proc Append, looks for the FORCE option to concatenate and, if length of the variable does not match with primary dataset or table

For description related to FORCE option in PROC APPEND, Kindly refer to:
Why We Use FORCE Option With PROC APPEND In SAS To Upload Data To DataWarehouse

ANY Function For Data Cleaning In SAS : ANYALNUM, ANYALPHA, ANYDIGIT,ANYPUNCT, ANYSPACE

We have some more SAS character functions for data cleaning. We can call this group of SAS Functions as "ANY" function group.

We can use given functions to search digits, alphanumeric (upper and lower cases, letters and digits), alpha (completely character), punctuation characters and space characters (tab, blanks, line feeds).

For these SAS Character functions you must be familiar with LOCALE and TRANTAB SAS System options.

ANYALNUM Function: It gives the first position of alphanumeric character from a variable after searching for a alphanumeric character string in a character variable.

Syntax:  Any_Alnum = ANYALNUM (String,Start_Position);



ANYALPHA Function: It gives the first position of an alphabetic character from a variable after searching for a alphabetic character string in an character variable.

Syntax: Any_Alpha=ANYALPHA (String,Start_Position);



ANYDIGIT Function: It gives the first position of a digit from a character string after searching for any digit in character variable.

Syntax: Any_Digit=ANYDIGIT (String,Start_Position);



ANYPUNCT Function: It gives the first position of a punctuation character from a character string after searching for punctuation in character variable.

 Syntax: Any_Punct=ANYPUNCT (String,Start_Position);



ANYSPACE Function: It gives the first position of a white – space character (Horizontal and Vertical tab, Blank, Carriage, line feed and form feed) from a character string after searching for a white-space in character variable.

Syntax: Any_Space=ANYSPACE (String,Start_Position);

  •   String: Could be any variable, any expression to search and any character constant

  •  Start_position: Any optional integer which tells about the position from which the search should start and some time the direction to start

  •  If you want that search should be right aligned, then Start_Position value should be positive

  •  In the same way if Start_Position value is negative, then search would be proceeds to the left

These functions would return a value zero if any following is true:

  •  If searching character not found

  •  If the value of Start_Position is more than the length of the String

  •  If value of Start_Position is equals to zero

Combined code for all above functions in one shot:

Data DSN;

Input Practice $26.;

Alpha_Num=ANYALNUM (Practice);

Alpha_Num_L=ANYALNUM (Practice,-999);

Alpha=ANYALPHA (Practice);

Alpha_L=ANYALPHA (Practice,-999);

Digit=ANYDIGIT (Practice);

Digit_L=ANYDIGIT (Practice,-999);

Punct=ANYPUNCT (Practice);

Space=ANYSPACE (Practice);

Cards;

My Employee Id is 5001710

Brawvo ! well done

1357924680

;

Run;

  •  DSN is the name of the Dataset

Example:  How could we use these functions for data cleaning

Data Alpha_only Mixed Punct;

Input Practice $26.;

If Anydigit (Practice) then output mixed;

If Anypunct (Practice) then output Punct;

Else output Alpha_only;

Cards;

My Employee Id is 5001710

Brawvo ! well done

1357924680

;

Run;

  •  Alpha_only, Mixed and Punct are newly created data set

SAS Functions - LOWCASE, UPCASE and PROPCASE Function ( SAS Characterfunction )

LOWCASE Function: It converts all letters to LOWER CASE of any character constant, Variable or Expression. It is a character function.

We can say that LOWCASE function converts all letters of any arguments to lowercase letters.

Syntax:  LOW_CASE=Lowcase(X);




UPCASE Function: It converts all letters to UPPER CASE of any character Constant, Variable or Expression. It is a character function.

We can say that UPCASE function converts all letters of any arguments to UPPERCASE letters.

Syntax: UP_CASE=Upcase(x);




PROPCASE Function: It converts all letters in PROPER CASE of any character constant, Variable or Expression. It is a character function.

We can say that PROPCASE function converts the words of any arguments into the PROPERCASE Words.

 Syntax:  PROP_CASE=Propcase(X);

 Combined code for all above function :

 Data DSN;

X='UmA ShANkeR saINI';

UP_CASE=Upcase(x);

LOW_CASE=Lowcase(X);

PROP_CASE=Propcase(X);

Run;


And the output would be as follows :

 *  DSN is the name of the Dataset

These function are not designed for SBCS, DBCS and MBCS character encoding

SAS ERROR Library JANUARY Is Not In A Valid Format For Access MethodRANDOM

Hi,

It is a very common error, whenever we try to assign a libref to the LIBNAME function.

Suppose we try to run the following code in windows environment :

Libname January "C:\Users\Uma\Desktop\blog data\1.txt";

and then we would get the following error in SAS log :

                    ERROR: Library JANUARY is not in a valid format for access method RANDOM.

                    ERROR: Error in the LIBNAME statement.

I am not going to deal with the more technical stuff for this. 

Just try to understand the basic job of LIBNAME statement. It's primary functionality is to refer a library or physical location. We can call it here as folder.

 So on the above example, we have given path for the specific file not for folder.

 Remember, whenever you are going to use LIBNAME just give the folder path ( Window environment) not path of any specific file.

 In short : Libname is to refer the folder not file (Windows environment)

 Here we go with correct one :

                                       Libname January "C:\Users\Uma\Desktop\blog data";


Output Column For Number Of Duplicate Rows In SAS

Suppose we have a table or SAS data set that contains duplicate rows/ observation.

We just want to calculate / count the number of duplicate rows from that table / SAS data set .

It would be great if we can generate an output column which could show that how many times each row occurs.

Example : 

Suppose we have a table / SAS data set that contain two variable / column known as last name (Lname) and first name (Fname) showing as :

                                                   Data DSN;

                                                  input Fname $ 1-7 Lname $ 8-15 ;

                                                  Cards;

                                                  Rebecca Smith

                                                  Kathy   John

                                                  Rebecca Smith

                                                  Tom     Andrew

                                                  Reed    Samuel

                                                  Rebecca Smith

                                                  Kathy   John

                                                  Reed    Samuel

                                                  Davis   Kerry

                                                  ;

                                                  Run;

The output would be like that :


To generate the number of duplicate column from the table :

                                                    Proc SQL;

                                                   Select *, count (*) as count 

                                                   From DSN

                                                   Group by Fname, Lname 

                                                   Having count (*) > 1;

                                                   Quit;


and the result would be as :


* DSN is the name of the SAS data set.

Implicit And Explicit PASS THROUGH Facility In SAS : Libname And ProcSQL

Hi All,

The most important job for us to retrieve or fetch the data from the database like : Oracle, DB2, Teradata, Sybase, Mysql, and many more.....

Now we have two facilities to fetch the data from any database to the SAS ( we use SAS/Access for it).

(a) IMPLICIT - Libname

(b) EXPLICIT - SQL Pass Through facility

We are going to see the comparison, difference and many other things between both of them (Libname and SQL pass through).

                  Comparison between LIBNAME and SQL PASS THROUGH

  1.  If we are using LIBNAME then SAS passes some functions to database to process the data. We expect that these functions would react in the same manner whether we pass them through the implicit with LIBNAME or explicit with PASS - THROUGH. What i can tell, using these functions would be in equal but LIBNAME would be preferable as the coding for LIBNAME is simple.  We can use following SAS function while interfacing with the ODBC: ABS, AVG, CEIL, LOG, Min, Max, Sin, Sqrt, SUM, Count and many more......

  2.   We should not forget that if we use LIBNAME statement, it is used to create a library to store and access the SAS files or datasets in the specified particular folder of an operating system. We can use this one with the data step and proc steps to run and to access the files or datasets but should not be used to access the databases.

  3. If we are trying to join more than one table means multiple tables and looking for only aggregate result, it this situation SQL PASS THROUGH facility is much better option. We can leave the aggregation part for the database or RDBMS and it would provide the result which would be in less data volume.

  4. Suppose if we are going to do the same aggregation while using LIBNAME facility or engine then  Libname would fetch all the big table (High in volume) over the SAS server and then it would do the  calculation ( like : Joining tables and merging) on the SAS session.

Now the difference would be from few minutes to few hours.. ( for upper two points ).

  1.  SQL PASS THROUGH facility which we use as a part of Proc SQL is used to make a connection between the databases such as RDBMS (Teradata / DB2...). We can use explicit facility to access and to create the tables in different RDBMS / Databases.

Some Important Points (I am fan of explicit facility)

a)     Whenever we submit any data source specific (ODBC) SQL statements directly to the any data source like Teradata by using an extension of the Proc SQL, then it would called as SQL Pass through facility.

b)     Explicit facility (SQL Pass through) uses SAS / Access to talk or to connect to the mentioned or referenced database or data sources and passes the statements directly to the database for execution.

c)     Explicit facility / SQL Pass through facility is only an alternate way of SAS   /Access LIBNAME statement to connect to the database.

d)     SQL Pass Through facility / explicit facility support any type of non ANSI standard SQL which is supported by our data source or ODBC.

e)     SQL Pass through facility and Libname facility both are providing the access to the database.

f)       SQL pass through (Explicit facility), which allow us to pass the code directly to the database while libname (Implicit facility) coding is much easier than pass through and system efficiency is optimized by SAS.

We can use SQL pass through facility for the following operations

I. To terminate or establish connection with data source (via ODBC to RDBMS) using connect and disconnect statement

(a)  Use of CONNECT TO statement: It establishes a connection to the data source / database.

(b)  Use of DISCONNECT Statement: It terminates the connection from the data source / database.

II. Explicit facility or SQL Pass through sends dynamic and data source specific (ODBC) statements of SQL to a data source (Database) using execute and connect statements

Use of Execute statement in SQL Pass Through Facility

(a)     Execute Statement Sends data source-specific and non-query SQL statements directly to the data source (Database)

(b)    We cannot store Execute Statement as a part of explicit facility in Sql view.

(c)     We can use following statements for the data source by using explicit facility Execute statement : CREATE, DROP, INSERT, UPDATE and DELETE

Examples for Execute Statement:

  • Execute (DROP table 'table name' ) by database_name;

  • Execute (CREATE table ' table_name' ( 'Col_1' varchar (10), 'Amount' Currency)) by database_name;

  • Execute (INSERT INTO 'table_name' values ('Uma18752',12345)) by database_name;

III. We can use CONNECT TO component in SQL Pass Through to directly retrieve data from a data source by using FROM clause of select statement.

Example:

  • Select * from CONNECTION TO database_name (Select * from 'table_name');

  • Select * from CONNECTION TO database_name (Select 'Col_1', Amount from 'table_name');

       SYNTAX for SQL PASS THROUGH FACILITY ( EXPLICIT FACILITY)

PROC SQL <options>;

CONNECT TO <database_name> AS <alias> (arguments for connect statements)

DISCONNECT FROM <database_name>/<alias>

EXECUTE <data_base_specific_sql_statements> BY <database_name>

SELECT <list all required column>

FROM CONNECTION TO <database_name>;

For Pass - Through facility any error conditions would also be written to the SAS log and those would be in following macro variables:  SQLXRC, SQLXMSG

Now we are going to see the comparison for all three ways to access or fetch the data from any database.

As we are comparing with the some code and the same result so we need to pass each code in the same environment for the same ODBC (database ). Then only we could get the exact difference:

PASS-THROUGH Facility:

Proc sql;

Connect to teradata(dsn=Uma18752 user=xxxxxx password=yyyyyyy);

Create table practice.explicit as

Select * from connection to teradata

(select tera1.col_1

From tera2.tab01 as tera1

Inner join tera2.cab01 as col_1

On tera1.id_uma=tera2.id_uma

Where Col_1=5001710);

Disconnect from teradata;

Quit;



LIBNAME (Implicit facility):

Libname in teradata complete="dsn=UMA18752;Uid=xxxxxx;pwd=yyyyyyy"

Schema=tera2;

Proc sql;

Create table practice.implicit as

Select tera1.col_1

From in.tab01 as tera1

Inner join in.tera2.cab01 as tera2

On tera1.id_uma=tera2.id_uma

Where Col_1=5001710;

Quit;



DATA STEP Coding:

Data Practice.DSN;

Merge in.tera2.tab01 (in=tera1 in=tera1)

  1. tera2.cab01 (in=tera2 keep=id_uma in=tera1

Where=(Col_1=5001710))

By id_uma;

If tera2 and tera1;

Run;

                                                  COMPARISON

The pass-through facility (explicit) and LIBNAME (Implicit) code produce very similar log for execution and  mean run-times. The DATA STEP code uses significantly more time for user cpu, though slightly less system cpu. (See Table 1) :

                                                           TABLE 1 

PASS-THROUGH                   1                  2                  3                     Mean

real time            01:15.7        01:05.0        01:21.2               01:14.1

user cpu time     03.90            3.91             3.94                    3.92

system cpu time  0.86             0.79             0.61                    0.75

Libname - SQL

real time           01:09.1        01:06.1         01:25.6               01:13.6

user cpu time     3.90             3.89              3.62                    3.80

system cpu time 0.55             0.62              0.8                      0.42



DATA STEP  Code

real time          02:58.4          01:13.1        02:05.0              02:05.5

user cpu time   4.79               5.14              4.56                   4.50

system cpu time 0.49             0.55              0.46                   0.50

By the above table we can calculate the data step is taking too much time for processing in comparison of Explicit and implicit so we can say that data step is not up to mark for fetching the data from database.

We can choose implicit facility or explicit facility as per the requirement to fetch the data from any database.

* Sooner we would have screen prints for table 1 result and code as well with easy example.

SAS Default Sample Data Sets Library With Proc Datasets procedure

Some useful codes for Proc datasets procedure:-

1. To get the default SAS Datasets or Sample library :

 Proc datasets library = Sampsio;

run;



2. To copy an entire library to a specific destination in SAS :

Proc datasets library = Source;

Copy out = Destination;

run;



3. To copy a dataset from one library to another library in SAS :

Proc Copy in = Practice  out = Production;

select DSN;

Run;

* Dataset DSN has been copied to Production library from practice library.



4. Using Datasets Procedure to modify the variable name of existing dataset:

Proc Datasets library = Practice;

Modify DSN;

Rename Emp=Employee;

Quit;

* Variable name EMP has been modified as Employee from Practice library.



5. To delete as SAS library:

Proc Datasets library = Libraryname Kill;

Quit;

* kill option deletes all the members from the library.



6. To delete a SAS Dataset from a SAS library:

Proc Datasets library = Libraryname  nolist;

Delete DSN;

Quit;

* Dataset DSN has been deleted. Nolist option to suppress the printing of the directory of SAS files in the SAS log or ODS output.



7. To change the name of a SAS Dataset:

Proc Datasets library = Practice nolist;

Change Learn=Earn;

Quit;

* Data set Learn has been renamed as Earn.

Change Numeric Values ( 1,5,100...) Into Alphabetic Values ( One, Five,Hundred ...) In SAS

Hi,

Most of the time we stuck with the situation where we need to change Numeric values into Alphabetic.

Suppose we want to convert some numeric values like 1 into one, 5 into five, 100 into hundred and any value.... and even 0 into zero.

Here we go :

                                                             Data DSN;

                                                             input Variable $ Number ;

                                                             Alphabetic=put(number,words30.);

                                                             cards;

                                                             A 0

                                                             B 5

                                                             C 49

                                                             D 100

                                                             E 499

                                                             F 67

                                                             ;

                                                             run;


And the output would be as you need :


In case of, if your value is big like 123456789, need not to worry, just increase the length of words argument.

Example :                    Alphabetic = put (Number,words75.);

Multilevel Referencing Of Macro In SAS - &&&&&&&& MACRO VARIABLE

Hi,

We can use any number of ampersands (&) in an indirect macro variable reference.

SAS resolves the entire reference from left to right. If a pair (double) of ampersands (&&) is encountered, the pair (double) is resolved to single ampersand, then the next part of the reference is processed.

Suppose we have following macro variables:

%let a=UMA;

%let UMA=SHANKER;

%let SHANKER=SAINI;



And we are going to check the output of the following PUT statements with multiple macro variables.

%put &a;

 %put &&a;

%put &&&a;

%put &&&&a;

%put &&&&&a;

%put &&&&&&a;

%put &&&&&&&a;



And output would be something like this:

%put &a;   * UMA *;

%put &&a;  * UMA *;

%put &&&a; * SHANKER * ;


 Explanation: && resolves to single & (put in reserve)  resolve &a to UMA, now with reserve & sas reads &UMA , result is SHANKER.

%put &&&&a; * UMA . pairing && and && comes to && resolves to & *;

Explanation: && resolves to single & (put in reserve) again && resolve to single & (put in reserve), now we have only a, no execution. in reserve we have && which is &a. again result is UMA.

%put &&&&&a;

Explanation: && resolves to single & (put in reserve) again && resolve to single & (put in reserve), now we have &a, resolve to UMA. In reserve we have two ampersand, which is one (&) and now it is &UMA, resolve to SHANKER.

%put &&&&&&a;

Explanation: && resloves to single & (put in reserve) again && resolve to single & (put in reserve) , && resolve to single & (put in reserve), now we have only a , no execution. In reserve we have &&& so again result is SHANKER.

%put &&&&&&&a;

Explanation: Would resolve to SAINI, because we would have three ampersand (in reserve) and &a resolve to UMA. &&&UMA resolve to &SHANKER and it would resolve to SAINI.

SAS Character Function- Compress and Compbl

Hi,

Here we have some more character function for data cleaning. It means to get rid off from the unwanted data / characters from our data or string, that character could be special characters like : ?,!,@,#,$ and many more..

COMPRESS function in SAS: It removes or suppresses all mentioned special characters from character string and return the desired character string.

Syntax:  Compress (string, ‘unwanted characters’, ‘Modifiers’);

Data DSN;

X='@This% is !$an (exam^$ple #*of S!@A_S';

Y=Compress(X,'@,%,!,$,^,#,*,_,(');

Put X = /

Y =;

Run;



Just add all the unwanted character into the second argument with comma (,) and it would get removed from your variable.

  •  String – a character constant, any expression which resolves to character, character variable

  • Unwanted character- here we can mention all the character which needs to be removed

  • For Data step, Length of the returned variable from COMPRESS function would be equal to the variable’s lengths given in the first argument, if length is not assigned to new variable

  • COMPRESS function also allows null arguments, any null arguments would be treated as string if it has a zero length

  • Returned variable type for COMPRESS function would always as character

  • COMPRESS function removes each and every incidence from the specified character string. If we specify a blank as a character to delete from string then COMPRESS function would delete all blanks from source variable

  • If we want to use modifiers and we not specifying any second argument, then we need to use two commas together which would indicate that the modifier is the third argument

Modifiers - A variable, character constant or any expression which modifies the action of compress function. Some of the useful modifiers are given below:

A: It adds alphabetic characters to the list of characters to be deleted

D: It adds digits to the list of the characters to be deleted

I: It tells to ignore the case of characters to be deleted or kept

K: It keeps the listed character instead of removing them

N: It adds digits and underscores character

L:  It adds all lower case character to the list of character to be deleted or kept.

P: It adds all the punctuation marks to the list of character to be deleted.

U: It adds all the uppercase letters to the list of character

W: It adds all the printable character to the list of character

Suppose we want to retain some digits as well in our string then we need to use some modifiers like it :

Data DSN;

X='@This% is !$an (exam^$ple #*of S!@AS 12345';

Y=Compress(X,' ','KN');

Put X = /

Y =;






COMPBL function: This function removes extra blanks or multiple blanks from a character string by assuming each incidence of two or more consecutive blanks into a single blank

Syntax : COMPBL (argument);

Data DSN;

X='Uma   Shanker        Saini';

Y=COMPBL(X);

Put X =  /

Y= ;

Run;



  • Argument specifies to a variable, any character constant, expression to compress or any valid expression which would evaluates to character string

  • For Data step, Length of the returned variable from COMPBL function would be equal to the variable’s lengths given in the first argument, if length is not assigned to new variable

  • COMPBL function also allows null arguments, any null arguments would be treated as string if it has a zero length

  • Returned variable type for COMPBL function would always as character

  • COMPBL function removes multiple blanks only from the source variable and there would be no effect on single blank

SAS Date Function And SAS Time Function

Hi,

Most of the time we have date in a format which contains time as well. We call it as datetime or datetime stamp. It means we are going to deal with a date which also contains time.

What could we do if we want only a specific part from given datetime stamp....

Suppose we have a variable which is containing the datetime stamp and we need to schedule the job as per the time only or in simple case we just want to make some more variables from it like date, time, hour, month and second as well.

Here we go :

For example we are taking date and time as : 26SEP97:01:47:37AM

you guys can create the main variable by using datetime() function. Just do as per the code:

Data DSN;

                                                      When=datetime();

                                                      Date=datepart(when);

                                                      Time=timepart(when);

                                                      Day=day(date);

                                                      Month=Month(date);

                                                      Year=year(date);

                                                      Hour=Hour(time);

                                                      Minute=minute(time);

                                                      Second=second(time);

                                                      Qtr=qtr(date);

                                                      Format when nldatm. date date9. time timeAMPM.;

                                                      Run;



* DSN is the name of the data set.

* Our main variable is 'When'.

* NLDTM., DATE9., TIMEAMPM. are the formats

And the result would be as follows:

In data step, what ever we wrote at the left hand side that are the variables and on the right hand side all are the function with the same name as variable.

we can also use following functions :

QTR     -  To extract quarter of the year from date.

WEEK  -  To get to know the week of the year from date.

WEEKDAY - To get the day of the week from date.

* Whenever we import data to the SAS from .txt file means if we ae using proc import to import .txt file, it's a good practice to use SAS System option Datestyle. It would put your datetime in required format.

Ex :    Options DATESTYLE = MDY;

It would read the data in sequence of Month, Day and Year when ANYDATE informat is given to the data.

Difference Between SAS SUM Function And Using '+' Operator In SAS

Hi All,

Operators :   For any type of calculation, we all are much familiar with Division (/), Multiplication (*), Addition (+) and Substraction (-) which are known as operators. Whenever we are in need to do some calculation in SAS, we use operators. Operators are to do indicated operation between SAS variables or constant values.

Sum Function in SAS :- It returns the sum of non missing and missing arguments.

Ex:          Total = Sum (U,M,A);

Total =  Sum (of  U1 - U10) ;

Total = Sum ( U M A );

"+" Operator in SAS :- It returns a missing value if any of the values (arguments ) are missing.

Difference between SAS Procedure and SAS function

  •  SAS functions would be in need for argument values to be supplied across an observation in SAS dataset.

  •  SAS functions are already written ( pre-written) expressions which gives programming shortcuts for many manipulation and calculation of data

  •  SAS functions are for normal (common ) mathematical operations such as square root, natural log and many more....

  •  SAS Procedures would be in need for one argument value per observation in SAS dataset.

How To Handle Special Missing Values In SAS

Hi All,

 Today we are going to discuss, how to handle missing values in SAS. Yes, I know, you are going to say that you know this thing very well. It’s not about the normal missing values. It’s for SPECIAL MISSING VALUES.

 What is SPECIAL in Missing values:

Suppose I am the owner of a medical shop and I need to enter the name of the medicines which are not available. I am going to take ‘i’ for fever medicine and ‘-‘for cold.

Now I have two type of missing values ‘i’ and ‘-‘. I think you got my point for special missing values.

 If not, then another example is going on:

Suppose in an organization an employee is absent on a day and organization is noting down this one with a notation ‘A’. Some time it happens that employee is in office but he forgot to swipe his details so the system would reflect absent as per the process. But for this type of missing information of absent, organization is having notation like ‘<’.

 Finally you got the point for SPECIAL MISSING VALUES.

 The special missing value is a type of numeric missing value that enables us to show different categories of missing data by using the letters A to Z (A-Z) or an underscore in the data.

 To deal with the special missing values in SAS we have MISSING Statement. We just need to assign all special missing values in Missing Statement;

 Here we go with the example:

Data DSN;

Input id x;

Missing i a _;

Datalines;

1 i

2 0

3 a

4 155

5 _

6 .

7 -24546

8 548

;

Run;


And the output would be like this :

* Here we have Special Missing values ‘I’, ‘a’ and ‘_’.

* DSN is the name of the dataset.

 The order of special missing values can be understand like this: (_) < (.) < (A) < (Z) < (-24546) < (0) < (155).

 *(-24546) is a negative integer and (155) is positive integer.

 You can check the all special values, if it is working correctly in the following example: (Try by yourself just giving you code)

Data calc;

Input x1 x2 x3; (Missing statement is unavailable)

Datalines;

1 2 a

3 2 i

3 1 x

1 1 1

2 2 2

;

Run;




Data Test;

Set calc;

Missing a i x;

Total = x1+x2+x3;

Sumfn= sum(of x1-x3);

Run;




Data exp;

Missing a;

X1 = ._;

X2 = 5;

X3 = .a;

Sum = x1 + x2 + x3;

Sumfn = sum(of x1,x2,x3);

Run;

Note=> Remember if we are assigning a missing value in an expression or in any assignment statement, a period (.) must be used before the special missing value or an underscore otherwise there will be an error in SAS log i.e. "NOTE: Variable _ is uninitialized."  And same happens with a character missing value as well.