
Weekends


SAS Day 3


INPUT OVERVIEW

The INPUT statement describes the arrangement of the data to be read in a DATA step. In an INPUT statement you list variable names, each optionally followed by $ (indicating a character value), pointer controls, column specifications, informats, and/or line-hold specifiers (i.e., @ and @@).

  • Column pointer controls such as @n and +n move the input pointer to a specified column in the input buffer.
  • Line pointer controls such as #n and / move the input pointer to a specified line in the input buffer.
  • Column specifications specify the columns of the input record that contain the value to read.
  • An informat is an instruction that SAS uses to read data into variables.
  • @, a single trailing @, holds an input record for the execution of the next INPUT statement within the same iteration of the DATA step. Thus, the next INPUT statement reads from the same record (line).
  • @@, a double trailing @, holds the input record for the execution of the next INPUT statement across iterations of the DATA step. Thus, the INPUT statement for the next iteration of the DATA step continues to read the same record (line).

The DATALINES statement (replacing the old CARDS statement) indicates that data lines follow in a DATA step. In order to read external data files, you have to use the INFILE statement.

There are six input styles used in the INPUT statement: list input, column input, formatted input, modified list input, named input, and mixed input. The following table summarizes features of four major styles.

[Table image not reproduced: summary of input methods]

Which input style is best? It depends on your skills and on the characteristics of the data. If your data set has just a few observations with several variables, list input or named input will be better than column input or formatted input. When data elements are not separated by a blank or another delimiter, you cannot use list input. When data are well arranged in fixed columns, column input or formatted input will be better than list input. Therefore, you need to examine the data structure carefully when choosing an input style, and ideally you should take this issue into account from the data coding stage.

LIST INPUT

The list input style simply lists variables separated by a blank. This style is also called free format.

DATA listed;
INPUT name $ id score;
DATALINES /*--1----+----2---*/;
Park 8740031 87.5
Hwang . 94.3
;
RUN;

A character variable should be followed by $. A missing value should be marked with a period (.); a blank does not mean a missing value in this input style. Do not use more than one “.” for a missing value. The default length of a character variable is 8 characters; that is, a fixed 8 bytes of memory are assigned to each variable. Therefore, a string longer than 8 characters will be truncated. To read a longer string, use a LENGTH, INFORMAT, or ATTRIB statement, or use a different input style such as column input or formatted input.

DATA _NULL_;
LENGTH analysis $15;
INFORMAT year MMDDYY10.;
INPUT analysis year;
FORMAT year DATE9.;
PUT analysis year; /* write the values to the log */
CARDS /*--1----+----2---*/;
Regression 10/1/2000
ANOVA 05/05/2004
Time-Series 09/03/2009
;
RUN;
/* Output
Regression 01OCT2000
ANOVA 05MAY2004
Time-Series 03SEP2009
*/

In the example above, you may use “INFORMAT analysis $15.;” instead of the LENGTH statement. INFORMAT tells SAS how data are read, while FORMAT tells SAS how they are displayed. MMDDYY10. reads dates in the MM/DD/YYYY form, and DATE9. displays dates in the DDMMMYYYY form. Without the FORMAT statement for year, SAS prints the internal date value, such as 14884, which is the number of days since January 1, 1960.
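To see the internal date value directly, the following minimal sketch writes a single date to the log in three ways. It is an added illustration rather than part of the example above, and the variable name d is arbitrary.

```sas
DATA _NULL_;
  d = '01OCT2000'd;     /* date literal: stored as days since 01JAN1960 */
  PUT d=;               /* no format: d=14884 */
  PUT d= DATE9.;        /* d=01OCT2000 */
  PUT d= MMDDYY10.;     /* d=10/01/2000 */
RUN;
```

The stored value never changes; only the format applied when writing it does.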

The following example reads a comma-delimited ASCII text file. Remember that the default delimiter is a blank. See the INFILE statement for details.

DATA _NULL_;
INFILE 'a:\tiger.dat' DELIMITER=',' STOPOVER;
INPUT name $ id score;
RUN;

MODIFIED LIST INPUT

The modified list style is a mixture of the list input and the formatted input. This style can deal with ill-structured data. There are three format modifiers to be used especially when reading complex data.

  • colon (:) reads a value with the specified informat, so values longer than the default 8 characters can be read; reading stops at the next delimiter or when the specified width is reached.
  • ampersand (&) reads character values that contain embedded blanks with list input; reading continues until more than one consecutive delimiter is encountered. You may include " (double quotes) in the value of a character variable.
  • tilde (~) reads and retains single quotation marks, double quotation marks, and delimiters within quoted character values. That is, double quotation marks enclosing a string are treated as values of a character variable.

The following example illustrates how : and & work in INPUT. The “Lindblom80” in the first row is truncated since it exceeds 8 characters; only the first 8 characters, as specified in the INPUT statement, are read and the last two characters “80” are ignored. In the second row, SAS reads the first four characters “Park”, which are shorter than 8 characters, and then encounters a comma (the delimiter); SAS stops reading data for the variable “name” and moves on to the next variable. The variable “title” is read with & and a maximum of 50 characters. The comma inside the quoted titles in the first and third rows is treated as part of the character value, not as a delimiter. Two consecutive double quotation marks are read as a single double quotation mark. Therefore, the title of the second observation is Reading “Small Is Beautiful” as shown in the output. Characters beyond the maximum, 50 characters in this case, are ignored.

DATA modified;
INFILE DATALINES DELIMITER=',' DSD;
INPUT name : $8. title & $50.;
DATALINES;
Lindblom80,"Still Muddling, Not Yet Through"
Park, "Reading ""Small Is Beautiful"""
Simon, """It was a disaster,"" he continue…"
;
RUN;

/* Output
Lindblom Still Muddling, Not Yet Through
Park     Reading "Small Is Beautiful"
Simon    "It was a disaster," he continue…
*/

The INFILE statement above says that the data are comma delimited and are listed after DATALINES. DSD at the end of INFILE removes the double quotation marks enclosing a character value when reading data. If you omit DSD, SAS treats a comma inside a character value as a delimiter and reads the enclosing double quotation marks as part of the value. As a result, the output would look like:

Lindblom "Still Muddling
Park "Reading ""Small Is Beautiful"""
Simon """It was a disaster,"" he continue…"

The second example shows how ~ (tilde) and DSD work together to read a string that contains a delimiter. SAS reads a comma in the string as part of the character value but does not remove the double quotation marks enclosing the string. If you omit DSD, the title of the second row will be ‘"Still Muddling’ because SAS treats the comma in the string as the delimiter and stops reading the character value for the variable “title.”

DATA modified;
INFILE DATALINES DELIMITER=',' DSD;
INPUT name : $20. year : 4.0 title ~ $50.;
DATALINES;
Meyer and Rowan,1977,"Institutionalized Organization"
Lindblom,1979,"Still Muddling, Not Yet Through"
;
RUN;
/* Output
Meyer and Rowan 1977 "Institutionalized Organization"
Lindblom        1979 "Still Muddling, Not Yet Through"
*/
/* Output without DSD
Meyer and Rowan 1977 "Institutionalized Organization"
Lindblom 1979 "Still Muddling
*/

You may not omit : after “year” in the INPUT statement above even when the data are in the same fixed format. When the variable “year” is the last item in the INPUT statement, : is not necessary.

COLUMN INPUT

The column input style reads the value of a variable from its specified column location. A variable name is followed by its starting and ending columns.

DATA columned;
INPUT name $ 1-5 id 6-12 score 14-17;
CARDS /*--1----+----2---*/;
Park 8740031 87.5
Hwang9301020 94.3
;
RUN;

SAS reads the variable “name” from columns 1 through 5, id from columns 6 through 12, and so on. This input style works well for well-structured data.

FORMATTED INPUT

The formatted input style reads input values with informats specified after variable names. Informats provide the data type and the width of an input value. Numeric variables are expressed in the w.d form, where w represents the total width of a value and d the number of digits after the decimal point. You cannot omit the period even when d = 0 (for example, 7. or 7.0). The $CHARw. or $w. informat is used for character variables, while the DATEw. or DDMMYYw. informat is used for dates.

DATA formatted;
INPUT name $5. id 7. score 4.1;
DATALINES /*--+----2---*/;
Park 8740031 875 /* score=87.5 */
Hwang9301020 943 /* score=94.3 */
;
RUN;

You can use parentheses to simplify expressions.

DATA formatted;
INPUT name $5. id 7. (grade1-grade3) (3.);
DATALINES /*--+----2---*/;
Park 8740031 89 95100
Hwang9301020100 93 99
;
RUN;

The following example illustrates how effectively formatted input uses column pointer controls, informats (e.g., COMMAn., DOLLARn., PERCENTn., and MMDDYY10.), and parentheses. SAS reads the variable x1 as a string five characters long, the numeric variable x2 as 7 digits without a decimal point, and x3 through x5 as three-digit numeric variables, and then skips one column (+1) before reading the numeric variable income, which contains commas.

DATA formatted;
INPUT (x1-x5) ($CHAR5. 7. 3*3.0) +1 income COMMA7.;
DATALINES /*--+----2----+----3*/;
Park 8740031 89 95100 84,895
Hwang9301020100 93 99 168,579
;
RUN;
/* Output
Park 8740031 89 95 100 84895
Hwang 9301020 100 93 99 168579
*/

Formatted input can use both column and line pointer controls. These pointer controls are very useful when reading multiple observations from the same line or reading one observation from multiple lines.

  • @n, a column control, moves the input pointer to the nth column
  • @@, a line holder, holds the current line so that the next INPUT statement continues reading from it
  • +n, a column control, moves the pointer to the right by n columns
  • #n, a row control, goes to the nth line
  • / goes to the first column of the next line
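As a minimal sketch of the column controls listed above, the step below reads the same two records used in the column input section with @n and +n instead of column ranges. The data set name pointers is made up for this illustration.

```sas
DATA pointers;
  /* @1 and @6 jump to absolute columns; +1 skips the blank in column 13 */
  INPUT @1 name $5. @6 id 7. +1 score 4.1;
  DATALINES;
Park 8740031 87.5
Hwang9301020 94.3
;
RUN;

PROC PRINT DATA=pointers;
RUN;
```

Because the score values already contain a decimal point, the d part of 4.1 is not applied and the values are read as 87.5 and 94.3.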

NAMED INPUT

Named input reads a data value that follows its variable name. A variable name and its data value are separated by an equal sign. String data are NOT enclosed in double quotation marks in this style. Like list input, named input supports only the default length of variables. This style provides some flexibility, but it is not appropriate for a large data set.

DATA named;
INPUT name=$ id= grade=;
DATALINES;
name=Park id=8740031 grade=89
name=Hwang id=9301020 grade=100
;
RUN;

MIXED INPUT

The INPUT statement can contain list input, column input, formatted input, and/or named input.

DATA mixed;
INPUT name $ 1-5 @7 id $7. +1 grade1 3. grade2 18-22;
CARDS /*--1----+----2---*/;
Park  8740031  89 95.1
Hwang 9301020 100 93.9
;
RUN;

READING MULTIPLE OBSERVATIONS

Let us read multiple observations from one line using the formatted input style. The following script reads the string variables “name” and “id” consecutively, reads the three-digit numeric variables x1 through x3, and then keeps reading the next observations, if available, without moving to the next line.

DATA formatted;
INPUT name $ id $ (x1-x3)(3.) @@;
CARDS /*--1----+----2----+----3----+----4----+----5-*/;
Park 8740031  89 95100 Choi 9730625 100100 95
Hwang 9301020 100 93 99 …
;RUN;
/* Output
Park 8740031 89 95 100
Choi 9730625 100 100 95
Hwang 9301020 100 93 99
*/

The following example reads data using a DO loop.

DATA rbd_block;
INPUT treat $ @@;
DO block='High', 'Medium', 'Low'; /* DO block=1 TO 3;*/
INPUT income @@; OUTPUT;
END;
DATALINES;
Drug1 34 55 34
Drug2 45 56 32
Drug3 45 56 32
;RUN;
/* Output
1 Drug1 High 34
2 Drug1 Medi 55
3 Drug1 Low 34
4 Drug2 High 45
5 Drug2 Medi 56
6 Drug2 Low 32
7 Drug3 High 45
8 Drug3 Medi 56
9 Drug3 Low 32
*/

Suppose individual observations have different numbers of repetitions. Pay attention to the IF and OUTPUT statements.

DATA repeat;
INPUT crop $ no @;
DROP no;
IF no GT 0 THEN DO;
DO trial=1 TO no;
INPUT cost benefit @;
OUTPUT;
END;
END;
DATALINES;
rice 3 54 87 98 77 57 67
bean 2 65 87 96 54
;
RUN;
/* Output
rice 1 54 87
rice 2 98 77
rice 3 57 67
bean 1 65 87
bean 2 96 54
*/

READING MULTIPLE LINES

Now, let us read observations whose data are provided in multiple lines. The #n or / indicates a data line to be read for the variable.

DATA spanned;
INPUT #1 No 7.0 #2 name $CHAR15. / address $CHAR50. #4 phone $CHAR12.;
DATALINES;
000001
Park
2451 E. 10th St. APT 311
812-857-9425
000002
Hun
800 N. Union St. APT 525
812-857-6256
;
RUN;
/* Output
1 Park 2451 E. 10th St. APT 311 812-857-9425
2 Hun 800 N. Union St. APT 525 812-857-6256
*/

The INPUT statement above tells SAS to read a 7-digit numeric variable “No” from the first line (#1), a 15-character string variable “name” from the second line (#2), a 50-character string variable “address” from the next line (/), and a 12-character string variable “phone” from the fourth line (#4). Alternatively, the INPUT statement may be replaced by “INPUT No 7.0 / Name $15. / Address $50. / Phone $12.;”.


SAS Day 2


SAS programs contain the following steps:
•Data Step
•Proc Step
•Combination of DATA and PROC step

Data Step
DATA steps typically create or modify SAS data sets, and they can also be used to produce custom-designed reports.
DATA steps are used to:
  Put data into a SAS data set
  Compute values
  Check for and correct errors in data
  Produce new SAS data sets by subsetting, merging, and updating existing data sets
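A minimal sketch of two of these uses, subsetting and computing a value, is given below. It assumes the sample data set sashelp.class that ships with SAS (height in inches, weight in pounds); the output data set name teens and the variable bmi are made up for this illustration.

```sas
DATA work.teens;
  SET sashelp.class;                 /* read an existing data set */
  WHERE age >= 13;                   /* subset: keep ages 13 and over */
  bmi = (weight / height**2) * 703;  /* compute a new value */
RUN;
```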

Proc Step
PROC steps are pre-written routines that enable us to analyze and process the data in a SAS data
set and to present the data in the form of a report. PROC steps sometimes create new
SAS data sets that contain the results of the procedure. PROC steps can list, sort, and
summarize data.
PROC steps are used to:
      •Create a report that lists the data
      •Produce descriptive statistics
      •Create a summary report
      •Produce plots and charts
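The following sketch shows three common PROC steps that sort, list, and summarize data. It assumes the sample data set sashelp.class that ships with SAS; the output data set name class_sorted is made up for this illustration.

```sas
/* Sort into a new data set so the original is left untouched */
PROC SORT DATA=sashelp.class OUT=work.class_sorted;
  BY age;
RUN;

/* Create a report that lists the data */
PROC PRINT DATA=work.class_sorted;
RUN;

/* Produce descriptive statistics for two numeric variables */
PROC MEANS DATA=sashelp.class;
  VAR height weight;
RUN;
```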

Reading In-stream Data using CARDS and DATALINES
Data can be entered into a SAS data set directly through a SAS program. Reading
in-stream data is useful when you want to create data and test programming statements on a few
observations.

To read in-stream data:
  *Use a DATALINES statement as the last statement in the DATA step (except for the RUN statement), immediately preceding the data lines.
  *Use a null statement (a single semicolon) to indicate the end of the input data.
  *Only one DATALINES statement can be used in a DATA step.
  *Use separate DATA steps to enter multiple sets of data.
  *If the data contain semicolons, use the DATALINES4 statement plus a null statement that consists of four semicolons (;;;;) to indicate the end of the input data.
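The last rule can be sketched as follows. The data set name notes, the variable text, and the two data lines are made up for this illustration; the point is only that semicolons inside the data do not end the input when DATALINES4 is used.

```sas
DATA notes;
  INPUT text $CHAR30.;   /* read the first 30 columns of each line as text */
  DATALINES4;
x = 1; y = 2;
PROC PRINT; RUN;
;;;;
RUN;

PROC PRINT DATA=notes;
RUN;
```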

Code to create a data set

Data  Day1.employee;
Length city $10;
Informat  sal  dollar10.  doj ddmmyy10. ; /* placed before INPUT so list input uses these informats */
Input  City $ Id $ Sal Doj;
Format  sal  dollar10.  doj ddmmyy10. ;
datalines;
Bangalore T101 $20,000 19/09/1979
Delhi T101 $23,000 13/01/1983
Kolkata Y109 $24,000 12/09/2001
Chennai I111 $29,000 10/10/2010
;run;

In the above code we are creating a new data set, employee. The LENGTH statement is used to increase the length of the variable (column heading) city, as the default length is 8. If the length is not increased, the letter "e" of the observation Bangalore will be truncated and will not appear in the employee data set.
INPUT is the keyword that declares the column headings, i.e. city, id, sal, and doj. The dollar sign ($) is used with the variables city and id because they are character variables.
INFORMAT is the keyword used to read values that contain special characters such as $, /, and commas. FORMAT is the keyword used to write them in the proper form, for example salary as $23,000.
DATALINES is the keyword that introduces the observations for the variables. We may also want to add a label (descriptive text) to the variable sal for better understanding, and we can change the name of a variable permanently with the RENAME keyword.

Below is the code for label and rename.

Data  Day1.employee;
Length city $10;
Informat  sal  dollar10.  doj ddmmyy10. ; /* placed before INPUT so list input uses these informats */
Input  City $ Id $ Sal Doj;
Format  sal  dollar10.  doj ddmmyy10. ;
label sal="salary of the employees";
Rename doj = date_of_joining;
datalines;
Bangalore T101 $20,000 19/09/1979
Delhi T101 $23,000 13/01/1983
Kolkata Y109 $24,000 12/09/2001
;run;
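To check the result, a sketch like the following can be run afterwards. It assumes the libref Day1 has already been assigned and the step above has run; by default PROC PRINT shows variable names, so the LABEL option is needed to display the label.

```sas
/* LABEL prints "salary of the employees" as the column heading for sal */
PROC PRINT DATA=Day1.employee LABEL;
RUN;

/* The variable list should now show date_of_joining instead of doj */
PROC CONTENTS DATA=Day1.employee;
RUN;
```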

Some Basic SAS Procedures

Proc Contents
PROC CONTENTS lists the structure of the specified SAS data set. The information includes
the names and types (numeric or character) of the variables in the data set. The most
common form of usage is

Proc contents data=second;
run;
This lists the information for the data set second.

proc contents data=_all_;
run;
_ALL_: prints the contents of every data set in the library when you specify _ALL_ in the DATA= option.

proc contents data=_all_  nods;
run;
NODS: suppresses the printing of the contents of individual files, so only the library directory is listed.

Proc contents data=sashelp.class varnum;
run;
VARNUM: prints a list of the variables by their position in the data set. By default,
PROC CONTENTS lists the variables alphabetically.

Proc contents data=sashelp.class  position short;
run;
POSITION SHORT: prints only the names of the variables in the data set, in both alphabetic and
creation order.

SAS DAY 1


SAS Programming Environment Contains 6 Main Windows:

1.  Project Designer: Shows the Process Flow of  a Project in Flow charts
2.  Project Explorer: Shows the Process Flow of a Project as Drop Down Menu
3.  Code Editor: Used to write and Edit codes
4.  Server List: Shows the Physical Storage Locations of Data
5.  Log Window: Shows Information about the execution of a program and Lists the errors found during execution
6.  Output Window: Displays the output of execution of a program

There are two types of libraries in SAS
  Temporary library
  Permanent library

Depending on the library name used when creating a file, SAS files can be stored temporarily or permanently.


Temporary Library
The temporary library is a temporary storage location for SAS data files. The files last only for the current SAS session. Work is the temporary library in SAS. When the session ends, the data files stored in
the temporary library are automatically deleted.
A file is stored in Work when:
–No specific library name is used while creating the file.
–The library name is specified as Work.

Example:
Data employee;
Set local.emp;
Run;
In the above code, the employee data set will be stored in the temporary library Work.
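The two cases can be sketched side by side. This illustration uses the sample data set sashelp.class that ships with SAS in place of local.emp, and the data set names class_a and class_b are made up.

```sas
/* One-level name: stored in Work implicitly */
DATA class_a;
  SET sashelp.class;
RUN;

/* Two-level name with Work: stored in the same temporary library */
DATA work.class_b;
  SET sashelp.class;
RUN;

/* List the members of Work; both data sets should appear */
PROC CONTENTS DATA=work._ALL_ NODS;
RUN;
```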

Permanent Library:
A permanent library is a permanent storage location for data files. Data sets stored in any permanent
SAS library are available for use in subsequent SAS sessions. A data set stored in a permanent library remains there unless we delete it physically. To store files permanently in a SAS data library, specify a library name other than the default library name
Work.
Three permanent libraries provided by SAS are:
              Local
              SASuser
              SAShelp

Creating a Permanent Library
To create a permanent library, use the LIBNAME statement. It creates a reference to the
path where SAS files are stored. The LIBNAME statement is global, which means that
librefs remain in effect until you modify them, cancel them, or end your SAS session.
The LIBNAME statement assigns a permanent library for the current SAS session only,
so a libref must be assigned to each permanent SAS data library each time a SAS session starts. SAS
no longer has access to the files in the library once the libref is deleted or the SAS session
has ended. The contents of the permanent library still exist in the path specified.

Syntax for creating a user-defined library: LIBNAME libref 'path';

where,
  libref is the name of the library to be created. The following conventions
must be followed when creating a user-defined library.
  A user-defined library, or any library for that matter in SAS, cannot have a name longer than 8
characters.
  A libref can contain both letters (A-Z) and digits (0-9).
  It can start with a letter but not with a digit.
  It cannot contain any special character except the underscore (_); it can begin with _
and can continue with any combination of _.
  path is the physical location (folder) where the SAS files are stored
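A minimal sketch of the syntax above follows. The folder C:\sasdata\day1 is a hypothetical path that must already exist on your system, and the data set name class_copy is made up; sashelp.class is a sample data set that ships with SAS.

```sas
/* Assign the libref day1 to a folder (hypothetical path) */
LIBNAME day1 'C:\sasdata\day1';

/* Store a data set permanently by using the two-level name libref.filename */
DATA day1.class_copy;
  SET sashelp.class;
RUN;

/* Cancel the libref; the file itself stays in the folder */
LIBNAME day1 CLEAR;
```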

SAS Data Sets
•SAS Data Set is a SAS file which holds Data
•Data must be in the form of a SAS data set to be processed
•Many of the data processing tasks access data in the form of a SAS data set and analyze, manage, or present the data
•A SAS data set also points to one or more indexes, which enable SAS to locate records in the data set more efficiently

Rules for SAS Data Set Names
SAS data set names :
•can be 1 to 32 characters long
•must begin with a letter (A–Z, either uppercase or lowercase) or an underscore (_).
•can continue with any combination of numbers, letters, or underscores.
These are examples of valid data set names:
•_sales1
•Datatelecom

Columns in SAS
Columns are generally known as headings or fields, but in SAS columns are called variables. A variable is a collection of values that describe a particular characteristic. In this table ID, Department, Satisfaction, Years and Status are the names of the variables in the data set.

Rows in SAS
Rows are sometimes called cases or records, but in SAS they are called observations. An observation is a collection of data values that usually relate to a single object in a SAS data set.
Example: Accounting and Chemistry are values in observations under the variable name Department.

Missing Values in SAS
If a value is unknown for a particular observation, a missing value is recorded.
  "." (a period) indicates a missing value of a numeric variable. Salary, which is a
numeric variable, has 3 missing values.
  " " (a blank) indicates a missing value of a character variable. In the table above, Department is a character variable and has 1 missing value in it.
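The sketch below shows both kinds of missing value in a small made-up data set; the data set name miss, its variables, and its three records are illustrative only. In list input a single period stands for a missing value of either type, and the MISSING function returns 1 when its argument is missing.

```sas
DATA miss;
  LENGTH dept $12;
  INPUT id dept $ salary;
  no_salary = MISSING(salary);  /* 1 when the numeric value is missing (.) */
  no_dept   = MISSING(dept);    /* 1 when the character value is blank */
  DATALINES;
1 Accounting 5000
2 Chemistry .
3 . 4200
;
RUN;

PROC PRINT DATA=miss;
RUN;
```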

Referencing Permanent SAS Files Two-Level Names
Two-level names are used to reference a permanent SAS file in SAS programs.
There are two parts in a two-level name, written as libref.filename:
1.  Libref name
2.  Filename

Candidates should attempt all questions.


1.  What is the purpose of the statement DATA _NULL_?
2.  Which pointer control is used to read multiple records sequentially?
                    1.  @n
                    2.  +n
                    3.  /
3.  How does the IN= variable improve the capability of a MERGE?
4.  How many missing values are available? When might you use them?
5.  How are numeric and character missing values represented internally?
6.  If you have a data set that contains 100 variables, but you need only 5 of those, what is the code to
force SAS to use only those variables?
7.  Code a PROC SORT on a data set containing state, district and country as the primary
variables, along with several numeric variables.
8.  How would you delete duplicate observations?
9.  How would you code a merge that will keep only the observations that have matches from both
sets?
10.  What is the difference between the nodup and nodupkey options?
11.  What is a SAS date?
12.  Explain the PRINTTO procedure with an example.
13.  Explain INTCK and INTNX with suitable examples.
14.  Explain SCAN vs. SUBSTR.
15.  What is the difference between x=a+b+c+d; and x=SUM(of a, b, c, d);?
16.  How can you import a .CSV file into SAS? Give the syntax.

Weekends



Get in touch with us :

sasindia@outlook.com
