SAS SORTSIZE Option for Sorting Large Data Files and SAS Data Sets

Hi,

Suppose we are dealing with a text file (.txt) of around 500 MB and we need to sort the data with PROC SORT. If we sort the data directly with the default settings, the system may not be able to handle it, and the session can hang.

To deal with data of this size, we need to check some SAS system options: SORTSIZE (specific to PROC SORT), MEMSIZE and REALMEMSIZE.

1. On Unix/Linux servers, the MEMSIZE option defaults to 96 MB and the SORTSIZE option defaults to 80 MB.

Setting the MEMSIZE option to zero is not recommended on Unix and Linux servers.

2. In the Windows environment, the MEMSIZE option is set to zero by default and the SORTSIZE default is 64 MB.

3. When the data does not fit within the SORTSIZE limit, PROC SORT creates a temporary utility file in the SAS WORK library and stores the data there.

4. The SORTSIZE system option is similar to the REALMEMSIZE system option.

5. The SORTSIZE system option affects only the SORT procedure, while the REALMEMSIZE system option affects multiple procedures.
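
Before changing anything, it helps to see what the current session is actually using. A minimal sketch, assuming a standard SAS session; the values written to the log will depend on your installation and configuration files:

```sas
/* Write the current setting of each option to the SAS log */
proc options option=sortsize;
run;

proc options option=memsize;
run;

proc options option=realmemsize;
run;

/* The same value can also be read into the log with the GETOPTION function */
%put SORTSIZE is currently: %sysfunc(getoption(sortsize));
```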

SORTSIZE = option ;

The PROC SORT statement supports the SORTSIZE= option, which sets the limit on the amount of memory available for PROC SORT to use.

The SORTSIZE= option accepts the following values. Here "n" is a real number:

n       -     Amount of memory in bytes

nK      -     Amount of memory in kilobytes

nM      -     Amount of memory in megabytes

nG      -     Amount of memory in gigabytes

MIN     -     Specify the minimum amount of memory available

MAX     -     Specify the maximum amount of memory available
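
The value forms above can be used either on the OPTIONS statement or directly on the PROC SORT statement. A minimal sketch, assuming a data set named work.bigfile and a sort key named customer_id (both hypothetical names used only for illustration); the 256M value is an example, not a recommendation:

```sas
/* Option 1: set the limit for every sort in the current session */
options sortsize=256M;

/* Option 2: set the limit for a single PROC SORT step only */
/* work.bigfile and customer_id are hypothetical names      */
proc sort data=work.bigfile out=work.bigfile_sorted sortsize=256M;
    by customer_id;
run;
```

Setting the value on the PROC SORT statement keeps the change local to that step, which is usually the safer choice on a shared server. Note that MEMSIZE, unlike SORTSIZE, can be set only when SAS starts, so SORTSIZE should be chosen to fit within the MEMSIZE already in effect.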

 

* The SORTSIZE= value can increase or decrease CPU and I/O resource utilization.

* If our machine has 14 MB of physical memory and we are sorting large data sets, setting the SORTSIZE option between 4 MB and 10 MB may improve system performance.

* We always need free disk space equal to three to four times the size of the data set being sorted. Suppose our data set requires 2 MB of disk space; then we would need 6 MB to 8 MB of free disk space to sort the data.
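
The disk space rule of thumb above is simple arithmetic and can be sketched in a short DATA step. The variable names here are made up for illustration, and dataset_mb is set to the 2 MB example from the text; change it to the size of your own data set:

```sas
/* Estimate the free disk space needed to sort a data set,   */
/* using the three-to-four-times rule of thumb from the text */
data _null_;
    dataset_mb = 2;              /* size of the data set in MB (example value) */
    low_mb  = 3 * dataset_mb;    /* lower estimate: three times the size       */
    high_mb = 4 * dataset_mb;    /* upper estimate: four times the size        */
    put "Free disk space needed (MB), low estimate:  " low_mb;
    put "Free disk space needed (MB), high estimate: " high_mb;
run;
```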

 
