SAS Character Functions: COMPRESS and COMPBL

Hi,

Here are two more character functions for data cleaning, that is, for getting rid of unwanted characters in our data or strings. Those could be special characters such as ?, !, @, #, $ and many more.

COMPRESS function in SAS: It removes the specified characters from a character string and returns the resulting string.

Syntax: COMPRESS(string, 'unwanted characters', 'modifiers')

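Data DSN;
X='@This% is !$an (exam^$ple #*of S!@A_S';
/* List the unwanted characters one after another, with no separators */
Y=Compress(X,'@%!$^#*_(');
Put X= / Y=;
Run;

Just list all the unwanted characters in the second argument, one after another, and they will be removed from your variable. Do not separate them with commas: COMPRESS treats every character in the second argument, including a comma, as a character to remove.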

  • String: a character constant, a character variable, or any expression that resolves to a character value

  • Unwanted characters: the list of all characters that need to be removed

  • In a DATA step, if the new variable has not been assigned a length, its length will be equal to the length of the first argument

  • COMPRESS also allows null arguments; a null argument is treated as a string of length zero

  • The value returned by COMPRESS is always character

  • COMPRESS removes every occurrence of the specified characters from the string. If we specify a blank as a character to delete, COMPRESS deletes all blanks from the source variable. Blanks are also what gets removed when the second argument is omitted and no modifier is given

  • If we want to use modifiers without specifying a second argument, we need to write two commas together, which indicates that the modifier is the third argument
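To make the last two points concrete, here is a small sketch. The sample string and the variable names (no_blanks, digits_only) are made up for illustration; the values in the comments are what we would expect from the rules above.

```sas
data _null_;
  x = 'Order #A-102 / 2024';

  /* No second argument and no modifier: only blanks are removed */
  no_blanks = compress(x);              /* Order#A-102/2024 */

  /* Two commas: the second argument is skipped, 'kd' is the modifier */
  /* k = keep instead of remove, d = digits                           */
  digits_only = compress(x,,'kd');      /* 1022024 */

  put no_blanks= / digits_only=;
run;
```

Since neither new variable has a LENGTH statement, both get the length of x, as described in the list above.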

Modifiers - A character constant, variable, or expression that modifies the action of the COMPRESS function. Modifiers are not case sensitive. Some of the useful modifiers are given below:

A: It adds alphabetic characters to the list of characters

D: It adds digits to the list of characters

I: It ignores the case of the characters to be deleted or kept

K: It keeps the listed characters instead of removing them

N: It adds digits, the underscore character, and English letters to the list of characters

L: It adds all lowercase letters to the list of characters

P: It adds all punctuation marks to the list of characters

U: It adds all uppercase letters to the list of characters

W: It adds all printable (writable) characters to the list of characters

The characters in the list are deleted, unless the K modifier is also given, in which case they are kept and everything else is deleted.
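A quick sketch of a few of these modifiers in action. The phone-style string and the variable names are invented for this example; the expected values are shown in the comments.

```sas
data _null_;
  x = 'Tel: (555) 010-9999, Ext. 42';

  /* P: remove punctuation marks */
  no_punct = compress(x,,'p');       /* Tel 555 0109999 Ext 42 */

  /* K + A: keep only alphabetic characters */
  letters = compress(x,,'ka');       /* TelExt */

  /* I: remove the letter e regardless of case */
  no_e = compress(x,'e','i');        /* Tl: (555) 010-9999, xt. 42 */

  put no_punct= / letters= / no_e=;
run;
```

Modifiers can be combined freely in the third argument, so 'ka', 'kd' or 'kad' are all valid.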

Suppose we want to keep only the letters, digits, underscores and blanks in our string. Then we can combine the K and N modifiers like this:

Data DSN;
X='@This% is !$an (exam^$ple #*of S!@AS 12345';
/* K keeps the listed characters: the blank plus everything added by N */
Y=Compress(X,' ','KN');
Put X= / Y=;
Run;






COMPBL function: This function removes extra blanks from a character string by converting each occurrence of two or more consecutive blanks into a single blank.

Syntax: COMPBL(argument)

Data DSN;
X='Uma   Shanker        Saini';
Y=COMPBL(X);
Put X= / Y=;
Run;



  • Argument: a character constant, a character variable, or any valid expression that evaluates to a character string

  • In a DATA step, if the new variable has not been assigned a length, its length will be equal to the length of the first argument

  • COMPBL also allows null arguments; a null argument is treated as a string of length zero

  • The value returned by COMPBL is always character

  • COMPBL removes only multiple blanks from the source variable; single blanks are not affected
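To see the difference between the two functions side by side, here is a short sketch. The sample value and the variable names are only for illustration.

```sas
data _null_;
  x = 'New    York     City';

  /* COMPBL: every run of blanks becomes one blank */
  one_blank = compbl(x);      /* New York City */

  /* COMPRESS with no other arguments: all blanks are removed */
  no_blank = compress(x);     /* NewYorkCity */

  put one_blank= / no_blank=;
run;
```

So COMPBL is the one to use when the words should stay separated, and COMPRESS when blanks should disappear completely.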
