//
archives

Sample Program

This category contains 5 posts

Conditional deleting observation from a dataset


If we are talking about splitting data, one of the proven method will be Hash, very efficient(one pass to the data if presorted, no need to know how many ‘types’ in advance) and fully dynamic.

data have;

input type   x   y;

cards;

1 12 13

1 14 15

1 12 13

2 14 25

2 15 26

2 26 23

3 25 26

3 26 35

3 36 38

;

data _null_;

declare hash h();

     h.definekey(‘_n_’);

     h.definedata(‘type’, ‘x’, ‘y’);

     h.definedone();

do _n_=1 by 1 until (last.type);

set have;

by type;

           rc=h.replace();

end;

     rc=h.output(dataset:cats(‘want’,type));

run;

Advertisements

Calculate a person’s age


/* Create sample data */

data birth;
  input name $ bday :mmddyy10.;
datalines;
Miguel      12/31/1973
Joe         02/28/1976
Rutger      03/29/1976
Broguen     03/01/1976
Susan       12/12/1976
Michael     02/14/1971
LeCe        11/09/1967
Hans        07/02/1955
Lou         07/30/1960
;

/* Use the INTCK function to count the number of months between       */
/* the date of birth and the current date.  Divide the number of      */
/* months by 12 to produce the number of years.  Use the MONTH        */
/* function to determine if the month of the birthday and the current */
/* date are the same.  If they are, determine if the birthday has     */
/* occurred this year.  If it hasn't, adjust the age by subtracting   */
/* one year.                                                          */  


data ages;
  set birth;
  retain current;
  if _n_=1 then current=today();
  format bday current worddate20.;
  age=int(intck('month',bday,current)/12);
  if month(bday)=month(current) then age=age-(day(bday)>day(current));
run;

proc print;
run;

Comparing a row in a dataset with the next row


SAS has the LAG function to look backwards as LinusH pointed out. However there is no looking forward function. But you can kind of simulate this using a second SET statement with the POINT= option so that you will read the I+1 observation, you will need to rename the variables from the second SET statement, so that you can compare. Taking your logic here is some sample code.

data have;
infile cards dlm=”,”;
input
    Patient_ID : 8.
    Admit_Date : anydtdtm.
    Disch_Date : anydtdtm.
    Facility_Type : $32.
  ;
format admit_date disch_date datetime19.;
cards;
12345,02JAN2014:00:00:00.000,05JAN2014:00:00:00.000,Nursing Home
12345,05JAN2014:00:00:00.000,07JAN2014:00:00:00.000,Hospital
23456,02JAN2014:00:00:00.000,06JAN2014:00:00:00.000,Nursing Home
23456,06JAN2014:00:00:00.000,09JAN2014:00:00:00.000,Hospital
23456,08JAN2014:00:00:00.000,11JAN2014:00:00:00.000,Nursing Home
23456,11JAN2014:00:00:00.000,19JAN2014:00:00:00.000,Hospital
23456,19JAN2014:00:00:00.000,23JAN2014:00:00:00.000,Nursing Home
23456,24JAN2014:00:00:00.000,25JAN2014:00:00:00.000,Hospital
23456,25JAN2014:00:00:00.000,30JAN2014:00:00:00.000,Nursing Home
34567,09JAN2014:00:00:00.000,15JAN2014:00:00:00.000,Nursing Home
45678,13JAN2014:00:00:00.000,20JAN2014:00:00:00.000,Nursing Home
;
data want;
  obs1 = 1;
do while( obs1 <= nobs);
set have nobs=nobs;
    obs2 = obs1 + 1;
set
      have(
rename=(
        Patient_ID = pId2
        Admit_Date = ad2 
        Disch_Date = dd2
        Facility_Type = ft2
        )
      ) point=obs2
    ;
If pId2 = Patient_ID and
      ft2 = “Hospital” and
      Facility_Type = “Nursing Home” and
      datepart(ad2) >= datepart(Admit_Date) and
      datepart(ad2) <= datepart(Disch_Date) Then
      Flag = 1;
Else Flag = 0;
output;
    obs1 + 1;
end;
drop obs1 obs2 pId2 ad2 dd2 ft2;
run;

SAS Date, Time, and Datetime Functions


DATE()

returns today’s date as a SAS date value.

DATEJUL( yyddd )

returns the SAS date value given the Julian date in yyddd or yyyyddd format. For example, DATE = DATEJUL(99001); assigns the SAS date value ’01JAN99’D to DATE, and DATE = DATEJUL(1999365); assigns the SAS date value ’31DEC1999’D to DATE.

DATEPART( datetime )

returns the date part of a SAS datetime value as a date value.

DATETIME()

returns the current date and time of day as a SAS datetime value.

DAY( date )

returns the day of the month from a SAS date value.

DHMS( date, hour, minute, second )

returns a SAS datetime value for date, hour, minute, and second values.

HMS( hour, minute, second )

returns a SAS time value for hour, minute, and second values.

HOLIDAY( ‘holiday‘, year )

returns a SAS date value for the holiday and year specified. Valid values for holiday are ‘BOXING’, ‘CANADA’, ‘CANADAOBSERVED’, ‘CHRISTMAS’, ‘COLUMBUS’, ‘EASTER’, ‘FATHERS’, ‘HALLOWEEN’, ‘LABOR’, ‘MLK’, ‘MEMORIAL’, ‘MOTHERS’, ‘NEWYEAR’,’THANKSGIVING’, ‘THANKSGIVINGCANADA’, ‘USINDEPENDENCE’, ‘USPRESIDENTS’, ‘VALENTINES’, ‘VETERANS’, ‘VETERANSUSG’, ‘VETERANSUSPS’, and ‘VICTORIA’. For example: EASTER2000 = HOLIDAY(’EASTER’, 2000);

HOUR( datetime )

returns the hour from a SAS datetime or time value.

INTCINDEX( ‘date-interval‘, date )

INTCINDEX( ‘datetime-interval‘, datetime )

returns the index of the seasonal cycle given an interval and an appropriate SAS date, datetime, or time value. For example, the seasonal cycle for INTERVAL=’DAY’ is ‘WEEK’, soINTCINDEX(’DAY’,’01SEP78’D); returns 35 since September 1, 1978, is the sixth day of the th week of the year. For correct results, date intervals should be used with date values, and datetime intervals should be used with datetime values.

INTCK( ‘date-interval‘, date1, date2 )

INTCK( ‘datetime-interval‘, datetime1, datetime2 )

returns the number of boundaries of intervals of the given kind that lie between the two date or datetime values.

INTCYCLE( ‘interval‘ )

returns the interval of the seasonal cycle, given a date, time, or datetime interval. For example, INTCYCLE(‘MONTH’) returns ‘YEAR’ since the months January, February, …, December constitute a yearly cycle. INTCYCLE(‘DAY’) returns ‘WEEK’ since Sunday, Monday, …, Saturday is a weekly cycle.

INTFIT( date1, date2, ‘D‘ )

INTFIT( datetime1, datetime2, ‘DT‘ )

INTFIT( obs1, obs2, ‘OBS‘ )

returns an interval that fits exactly between two SAS date, datetime, or observation values, in the sense of the INTNX function uses SAMEDAY alignment. In the following example, result1 is the same as date1 and result2 is the same as date2.

   FitInterval = INTFIT( date1, date2, 'D' );
   result1 = INTNX( FitInterval, date1, 0, 'SAMEDAY');
   result2 = INTNX( FitInterval, date1, 1, 'SAMEDAY');

More than one interval can fit the preceding definition. For instance, two SAS date values that are seven days apart could be fit with either ‘DAY7’ or ‘WEEK’. The INTFIT algorithm chooses the more common interval, so ‘WEEK’ is the result when the dates are seven days apart. The INTFIT function can be used to detect the possible frequency of the time series or to analyze frequencies of other events in a time series, such as outliers or missing values.

INTFMT(‘interval‘ ,’size‘)

returns a recommended format, given a date, time, or datetime interval for displaying the time ID values associated with a time series of the given interval. The valid values of size (‘long,’ ‘l,’ ‘short,’ ‘s’) specify whether the returned format uses a two-digit or four-digit year to display the SAS date value.

INTGET( date1, date2, date3 )

INTGET( datetime1, datetime2, datetime3 )

returns an interval that fits three consecutive SAS date or datetime values. The INTGET function examines two intervals: the first interval between date1 and date2, and the second interval between date2 and date3. In order for an interval to be detected, either the two intervals must be the same or one interval must be an integer multiple of the other interval. That is, INTGET assumes that at least two of the dates are consecutive points in the time series, and that the other two dates are also consecutive or represent the points before and after missing observations. The INTGET algorithm assumes that large values are SAS datetime values, which are measured in seconds, and that smaller values are SAS date values, which are measured in days. The INTGET function can be used to detect the possible frequency of the time series or to analyze frequencies of other events in a time series, such as outliers or missing values.

INTINDEX( ‘date-interval‘, date )

INTINDEX( ‘datetime-interval‘, datetime )

returns the seasonal index, given a date, time, or datetime interval and an appropriate date, time, or datetime value. The seasonal index is a number that represents the position of the date, time, or datetime value in the seasonal cycle of the specified interval. For example, INTINDEX(’MONTH’,’01DEC2000’D); returns 12 because monthly data is yearly periodic and DECEMBER is the th month of the year. However, INTINDEX(’DAY’,’01DEC2000’D); returns 6 because daily data is weekly periodic and December 01, 2000, is a Friday, the sixth day of the week. To correctly identify the seasonal index, the interval specification should agree with the date, time, or datetime value. For example, INTINDEX(’DTMONTH’,’01DEC2000’D); andINTINDEX(’MONTH’,’01DEC2000:00:00:00’DT); do not return the expected value of 12. However, both INTINDEX(’MONTH’,’01DEC2000’D); andINTINDEX(’DTMONTH’,’01DEC2000:00:00:00’DT); return the expected value of 12.

INTNX( ‘date-interval‘, date, n <, ‘alignment’> )

INTNX( ‘datetime-interval‘, datetime, n <, ‘alignment’> )

returns the date or datetime value of the beginning of the interval that is n intervals from the interval that contains the given date or datetime value. The optional alignment argument specifies that the returned date is aligned to the beginning, middle, or end of the interval. Beginning is the default. In addition, you can specify SAME (or S) alignment. The SAME alignment bases the alignment of the calculated date or datetime value on the alignment of the input date or datetime value. As illustrated in the following example, the SAME alignment can be used to calculate the meaning of “same day next year” or “same day 2 weeks from now.”

   nextYear = INTNX( 'YEAR', '15Apr2007'D, 1, 'S' );
   TwoWeeks = INTNX( 'WEEK', '15Apr2007'D, 2, 'S' );

The preceding example returns ’15Apr2008’D for nextYear and ’29Apr2007’D for TwoWeeks. For all values of alignment, the number of intervals n between the input date and the resulting date agrees with the input value.

   date2 = INTNX( interval, date1, n1, align );
   n2 = INTCK( interval, date1, date2 );

The result is always that n2 = n1.

INTSEAS( ‘interval‘ )

returns the length of the seasonal cycle, given a date, time, or datetime interval. The length of a seasonal cycle is the number of intervals in a seasonal cycle. For example, when the interval for a time series is described as monthly, many procedures use the option INTERVAL=MONTH to indicate that each observation in the data then corresponds to a particular month. Monthly data are considered to be periodic for a one-year seasonal cycle. There are 12 months in one year, so the number of intervals (months) in a seasonal cycle (year) is 12. For quarterly data, there are 4 quarters in one year, so the number of intervals in a seasonal cycle is 4. The periodicity is not always one year. For example, INTERVAL=DAY is considered to have a seasonal cycle of one week, and because there are 7 days in a week, the number of intervals in a seasonal cycle is 7.

INTSHIFT( ‘interval‘ )

returns the shift interval that applies to the shift index if a subperiod is specified. For example, YEAR intervals are shifted by MONTH, so INTSHIFT(‘YEAR’) returns ‘MONTH’.

INTTEST( ‘interval‘ )

returns 1 if the interval name is a valid interval, 0 otherwise. For example, VALID = INTTEST(’MONTH’); should set VALID to 1, while VALID = INTTEST(’NOTANINTERVAL’); should set VALID to 0. The INTTEST function can be useful in verifying which values of multiplier n and the shift index s are valid in constructing an interval name.

JULDATE( date )

returns the Julian date from a SAS date value. The format of the Julian date is either yyddd or yyyyddd depending on the value of the system option YEARCUTOFF=. For example, using the default system option values, JULDATE( ’31DEC1999’D ); returns 99365, while JULDATE( ’31DEC1899’D ); returns 1899365.

MDY( month, day, year )

returns a SAS date value for month, day, and year values.

MINUTE( datetime )

returns the minute from a SAS time or datetime value.

MONTH( date )

returns the numerical value for the month of the year from a SAS date value. For example, MONTH=MONTH(’01JAN2000’D); returns , the numerical value for January.

NWKDOM( n, weekday, month, year )

returns a SAS date value for the th weekday of the month and year specified. For example, Thanksgiving is always the fourth () Thursday () in November (). ThusTHANKS2000 = NWKDOM( 4, 5, 11, 2000); returns the SAS date value for Thanksgiving in the year 2000. The last weekday of a month can be specified using . Memorial Day in the United States is the last () Monday () in May (), and so MEMORIAL2002 = NWKDOM( 5, 2, 5, 2002); returns the SAS date value for Memorial Day in 2002. Because always specifies the last occurrence of the month and most months have only 4 instances of each day, the result for is often the same as the result for . NWKDOM is useful for calculating the SAS date values of holidays that are defined in this manner.

QTR( date )

returns the quarter of the year from a SAS date value.

SECOND( date )

returns the second from a SAS time or datetime value.

TIME()

returns the current time of day.

TIMEPART( datetime )

returns the time part of a SAS datetime value.

TODAY()

returns the current date as a SAS date value. (TODAY is another name for the DATE function.)

WEEK( date <, ‘descriptor’> )

returns the week of year from a SAS date value. The algorithm used to calculate the week depends on the descriptor.

If the descriptor is ‘U,’ weeks start on Sunday and the range is to . If weeks and exist, they are only partial weeks. Week 52 can be a partial week.

If the descriptor is ‘V’, the result is equivalent to the ISO 8601 week of year definition. The range is to . Week is a leap week. The first week of the year, Week , and the last week of the year, Week or , can include days in another Gregorian calendar year.

If the descriptor is ‘W’, weeks start on Monday and the range is to . If weeks and exist, they are only partial weeks. Week 52 can be a partial week.

WEEKDAY( date )

returns the day of the week from a SAS date value. For example WEEKDAY=WEEKDAY(’17OCT1991’D); returns , the numerical value for Thursday.

YEAR( date )

returns the year from a SAS date value.

YYQ( year, quarter )

returns a SAS date value for year and quarter values.

Compute daylight saving time


data _null_;
  do year=2006 to 2009;
    fdoy=mdy(1,1,year);
    /* For years prior to 2007, daylight time begins in the United States on */
    /* the first Sunday in April and ends on the last Sunday in October.     */
    if year <= 2006 then do;
      fdo_apr=intnx('month',fdoy,3);
      dst_beg=intnx('week.1',fdo_apr,(weekday(fdo_apr) ne 1));
      fdo_oct=intnx('month',fdoy,9);
      dst_end=intnx('week.1',fdo_oct,(weekday(fdo_oct) in (6,7))+4);
    end;

    /* Due to the Energy Policy Act of 2005, Pub. L. no. 109-58, 119 Stat 594 */
    /* (2005).  Starting in March 2007, daylight time in the United States    */
    /* will begin on the second Sunday in March and end on the first Sunday   */
    /* in November.  For more information, one reference is                   */
    /* http://aa.usno.navy.mil                                                */
    else do;  
      fdo_mar=intnx('month',fdoy,2);
      dst_beg=intnx('week.1',fdo_mar, (weekday(fdo_mar) in (2,3,4,5,6,7))+1);
      fdo_nov=intnx('month',fdoy,10);
      dst_end=intnx('week.1',fdo_nov,(weekday(fdo_nov) ne 1));
    end;
    put dst_beg= worddate. /
    dst_end= worddate. / ;
  end;
run;
%d bloggers like this: