SAS Informats and Formats


Informats are typically used to read or input data from external files called flat files (text files, ASCII files, or sequential files), whereas formats are used for outputting data. An informat tells SAS how to read data into SAS variables, while a format defines how values are written to external files, data sets, or the log.

SAS informats/formats are typically grouped into three categories: character, numeric, and date/time. These are named according to the following syntax structure:

• Character Informats: $INFORMAT/FORMATw. E.g. $CHAR6.
• Numeric Informats: INFORMAT/FORMATw.d E.g. 8.2
• Date/Time Informats: INFORMAT/FORMATw. E.g. DATE9.

The $ indicates a character informat/format. INFORMAT/FORMAT refers to the SAS informat/format name, which is sometimes optional. The w indicates the width (bytes or number of columns) of the variable. The d is used for numeric data to specify the number of digits to the right of the decimal point. Every informat and format must contain a period (.) so that SAS can distinguish it from a variable name.

The following table illustrates various ways to use informats and formats in SAS:

SAS Informats and Formats
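
As a rough illustration of the naming pattern described above, the sketch below reads one character, one numeric, and one date field with informats and then displays them with formats. The data set name, variable names, and values are made up for this example.

```sas
data payments;
    /* informats: $CHAR6. (character), COMMA8. (numeric), DATE9. (date) */
    input customer $char6. +1 amount comma8. +1 paid date9.;
    /* formats control how the stored values are displayed */
    format amount dollar10.2 paid date9.;
    datalines;
ACME01 1,250.50 15JAN2024
BOLT02   310.00 03FEB2024
;
run;

proc print data=payments noobs;
run;
```

The informats only affect how the raw text is converted on the way in; without the FORMAT statement, the date would print as a plain number of days.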

Difference between IF and WHERE

You can subset a data set with an IF or WHERE statement to select specific observations from an existing SAS data set and create a new SAS data set that includes only some of the observations from the input data source.
Let us explore the difference between IF and WHERE at the ground level. The figure below shows how IF and WHERE are processed at the PDV level.

As shown in the figure, WHERE conditions are applied before the data enters the input buffer, while IF conditions are applied after the data enters the program data vector (PDV). WHERE is therefore usually faster, because observations that fail the condition are never read into the PDV. For the same reason, WHERE can only test variables that already exist in the input data set.

Processing of Where and If

Because WHERE is applied before the PDV is filled, automatic variables and new variables created in the DATA step are not available to the WHERE statement. The following table summarizes the situations in which IF and WHERE can be used:

Difference between Where and If
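
A minimal sketch of the difference, using a made-up data set named sales: WHERE tests a variable that already exists in the input data set, while the subsetting IF tests a variable that is created inside the DATA step.

```sas
data sales;
    input region $ amount;
    datalines;
East 100
West 250
East 300
;
run;

/* WHERE: region already exists in SALES, so it can be tested
   before the observation is loaded */
data east_only;
    set sales;
    where region = 'East';
run;

/* Subsetting IF: amount_with_tax is created in this step,
   so only IF can test it */
data large_only;
    set sales;
    amount_with_tax = amount * 1.1;
    if amount_with_tax > 200;
run;
```

Replacing the IF in the second step with a WHERE on amount_with_tax would fail, because that variable does not exist in the input data set.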

Data Set Options

There are several data set options that can be specified with the SET statement. These options are enclosed in parentheses and immediately follow the name of the data set that they apply to.
SET dsname_1 ( OPTION = list ) ;

The data set options function identically to the SAS statements of the same name, with one key difference. Unlike SAS statements, which operate on the variables in the Program Data Vector, SET statement options operate on variables before they are transferred to the Program Data Vector.

These data set options are:

• DROP = varlist
• KEEP = varlist
• FIRSTOBS = num
• IN = var
• OBS = num
• RENAME = varlist
• WHERE = condition

The DROP = and KEEP = options specify which variables in the input data set are to be omitted or processed in the data step. They apply only to the data set name most recently referenced, so different DROP = or KEEP = lists can be specified when multiple data sets are listed with a single SET statement.

The RENAME = option renames variables in the same way as the RENAME statement. However, since this is a SET statement data set option, the RENAME operation occurs before the variable is added to the Program Data Vector. Unlike with the RENAME statement, your SAS code must reference the new name instead of the old name.

The SAS Supervisor DROPs or KEEPs variables first when the DROP = or KEEP = data set options are specified. It is important, therefore, to DROP or KEEP the original names and not the RENAMEd names.

The FIRSTOBS = and the OBS = data set options are often confused. Both options take a positive number as an argument. FIRSTOBS = specifies the observation number in a data set that is the starting observation for the SET statement. OBS = specifies the observation number that is the last observation in a data set to read.

The IN = data set option is used with multiple data sets where it is important to know which data set contributed an observation. A separate IN = variable can be specified for each data set defined with the SET statement. The variable named by the IN = option is set to 1 for every observation that originated from that data set and to 0 otherwise.

DATA newdata ;
SET dsname_1 ( in = in_1 ) dsname_2 ( in = in_2 ) ;
——-
IF ( in_1 ) THEN ———– ;
ELSE IF ( in_2 ) THEN ———– ;
RUN ;

The WHERE = data set option selects observations from a SAS data set that meet the conditions specified. It functions like the WHERE statement: only those observations that match the conditional test are transferred into the Program Data Vector. This makes both forms more efficient than a subsetting IF, which reads every observation from the input data set and then discards the non-selected ones. The difference between the two is scope: the WHERE statement applies to every data set named in the SET statement, while the WHERE = option applies only to the data set it follows.
If both a WHERE = option and a WHERE statement are used in the same DATA step, the WHERE statement is ignored for those data sets that have a WHERE = condition defined.
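
The options above can be combined on a single SET statement. The following is only a sketch with a made-up data set named staff; note that KEEP = lists the original variable name (salary), while the rest of the step uses the renamed one (monthly_pay).

```sas
data staff;
    input id name $ dept $ salary;
    datalines;
1 Ann HR 4000
2 Bob IT 5200
3 Cy IT 6100
4 Dee HR 4500
5 Eli IT 5800
;
run;

/* KEEP=, RENAME= and WHERE= are applied before the PDV is filled */
data it_staff;
    set staff (keep   = id name dept salary
               rename = (salary = monthly_pay)
               where  = (dept = 'IT'));
    annual_pay = monthly_pay * 12;
run;

/* FIRSTOBS= and OBS= are observation numbers: this reads 2 through 4 */
data middle;
    set staff (firstobs = 2 obs = 4);
run;
```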

Reading data from a dataset

Reading a SAS data set in a DATA step is simpler than reading raw data because the work of describing the data to SAS has already been done.
The function of the SET statement is to process existing SAS data sets as input for a DATA step.

The simplest form of the SET statement is :

SET dsname ;

With no options specified, the SAS System sequentially reads each observation in the named data set(s), one observation at a time, until there are no further observations to process.

The SET statement can be used to concatenate, merge, or interleave two or more data sets.

Concatenating SAS datasets using SET statement –

SET dsname_1 dsname_2 … dsname_n ;

In this example, all observations in the first data set are read before the SAS Supervisor starts reading the second data set. This process continues until an end-of-file condition occurs after the last observation in the last data set listed in the SET statement has been read. All variables in the original data set(s) are added to the Program Data Vector of the DATA step.
Up to 50 SAS data sets can be specified in a single SET statement. All the data sets must exist, although they may be empty (contain no observations).

Merging datasets using SET statement –

SET dsname_1 ;
SET dsname_2 ;

In this example, the SAS Supervisor maintains two pointers, one for each data set. An observation would be read from data set dsname_1 followed by an observation from data set dsname_2, and so on until an end of file condition occurs in one of the data sets. The combination of the two data sets would form a single observation in the new data set.
Where the same variables exist in both data sets, the values of the second data set would overlay the values of the first data set.

Interleaving datasets using SET statement –

SET dsname_1 dsname_2 … dsname_n ;
BY varlist ;

If the data sets are sorted, they can be interleaved based on order of the sorted variable(s). When a BY statement is used, all observations that belong to a particular BY group are read sequentially from each data set. When there are no more observations to read in any of the data sets for a particular BY group, the next BY group is selected and the process repeats itself. This continues until all observations in all the data sets have been read.
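
The three uses of SET described above can be compared side by side. This is a small sketch with two made-up data sets, a and b, that are already in order by id.

```sas
data a;
    input id x;
    datalines;
1 10
3 30
;
run;

data b;
    input id y;
    datalines;
2 200
4 400
;
run;

/* Concatenate: all of A, then all of B */
data concatenated;
    set a b;
run;

/* Two SET statements: observation n of A is combined with
   observation n of B; the id from B overlays the id from A */
data side_by_side;
    set a;
    set b;
run;

/* Interleave: observations from both inputs in id order */
data interleaved;
    set a b;
    by id;
run;
```

Printing the three results with PROC PRINT makes the differences in row count and row order easy to see.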

Special Options for reading raw files

FLOWOVER – This is the default behavior of the INPUT statement when reading an external file. If SAS reaches the end of an input line before it has read values for all the variables in the INPUT statement, it moves to the next record and continues reading there.

MISSOVER – The MISSOVER option prevents the DATA step from going to the next line if it does not find values in the current record for all of the variables in the INPUT statement. Instead, the DATA step assigns a missing value to all variables that do not have complete values according to any specified informats.

STOPOVER – SAS stops processing the DATA step and sets _ERROR_ to 1 when it encounters a short line.

TRUNCOVER – The TRUNCOVER option also prevents the DATA step from going to the next line if it does not find values in the current record for all of the variables in the INPUT statement. Unlike MISSOVER, it assigns the raw data value to the variable even if the value is shorter than the length expected by the INPUT statement, and it assigns a missing value only to variables that have no value at all.

PAD – Pads short lines with blanks to the length of the record.
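
A minimal sketch of MISSOVER with list input, using made-up data in which some records have fewer values than the INPUT statement expects. Instream data is used here to keep the example self-contained; with an external file the option goes on the INFILE statement in the same way.

```sas
data visits;
    infile datalines missover;   /* do not flow over to the next line */
    input id visit1 visit2 visit3;
    datalines;
1 10 20 30
2 15
3 12 18
;
run;

proc print data=visits noobs;
run;
```

With MISSOVER, the variables that have no value on a short line are set to missing. Removing the option restores the default FLOWOVER behavior, and SAS would take the missing values for id 2 from the following line.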

Reading raw data in SAS

SAS is a system used by many companies for data analysis. Because it works on data, getting data into SAS is the first step of any analysis.

One way to provide SAS with data is to have SAS read the data from a text file and create a SAS data set. SAS has different ways of reading data from text files and, depending on how the data values are arranged, you can choose the input method that is most convenient.

What is Raw Data –

Raw data is unprocessed data that has not been read into a SAS data set.

You can use a DATA step to read raw data into a SAS data set from two sources:

• Instream data – data lines placed in the SAS program itself. This approach is used most often to create dummy or test data.
• External file – data read from a file outside the SAS program.

Types of Data

Standard data
– are character or numeric values that can be read with list, column, formatted, or named input. Examples of standard data include:
TUTORIAL
201.21

Nonstandard data
– is data that can be read only with the aid of informats. Examples of nonstandard data include numeric values that contain commas, dollar signs, or blanks; date and time values; and hexadecimal and binary values.

Essential statements to read data

The DATA step for reading raw data from a file has three essential statements:

• DATA – to create a data set
• INFILE – to provide the file location
• INPUT – to read the data
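
The three statements fit together as in the sketch below. To keep it self-contained, the example first writes a small temporary text file and then reads it back; the fileref rawtxt, the data set name, and the values are all made up for illustration. With a real file you would put its path in quotes on the INFILE statement instead.

```sas
/* Create a small temporary text file to read (illustration only) */
filename rawtxt temp;

data _null_;
    file rawtxt;
    put '1023 Gary 10000';
    put '1049 Jim 145124';
run;

/* DATA names the data set, INFILE points to the file, INPUT reads it */
data accounts;
    infile rawtxt;
    input acc_number name $ balance;
run;

proc print data=accounts noobs;
run;
```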

Choosing an Input Style –

The INPUT statement reads raw data from instream data lines or external files into a SAS data set. You can use the following different input styles, depending on the layout of data values in the records:

• list input
• column input
• formatted input
• named input.

List Input –

• Data values should be separated by blanks or other delimiters like comma or tab.
• If there are any missing values, they must be indicated by a placeholder, such as a period.
• The data do not need to be lined up in columns, so lines can be of unequal length.
• You do not need to specify the column locations of the data fields; you only specify the variable names, followed by a dollar sign ($) for character variables.

Here is an example of a raw data file that is separated by blanks. Note that missing values are indicated by a period (.), with a blank between periods for contiguous missing values.

Wesley F 29 68000 139000
Gary F 35 64000 12000
John M . . 11200
Ramesh F 22 56000 13300

An example of list input follows:

data loans;
length name $ 12;
input name $ gender $ age Original_loan_amt Balance;
datalines;
Wesley F 29 68000 139000
Gary F 35 64000 12000
John M . . 11200
Ramesh F 22 56000 13300
;
run;

List input has several restrictions on the type of data that it can read:

• Input values must be separated by at least one blank (the default delimiter) or by the delimiter specified with the DLM= or DLMSTR= option in the INFILE statement.
• If you want SAS to read consecutive delimiters as if there is a missing value between them, specify the DSD option in the INFILE statement.
• Blanks cannot represent missing values. A real value, such as a period, must be used instead.
• To read and store a character input value longer than 8 bytes, define a variable’s length by using a LENGTH, INFORMAT, or ATTRIB statement before the INPUT statement.
• Character values cannot contain embedded blanks when the file is delimited by blanks.
• Fields must be read in order.
• Data must be in standard numeric or character format.
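
The DSD option mentioned in the list above can be tried with a comma-delimited variant of the loans data. This is a sketch: the values are adapted from the earlier example, and the two consecutive commas stand for missing values.

```sas
data loans_csv;
    /* DSD: the delimiter defaults to a comma, and consecutive
       delimiters are treated as missing values */
    infile datalines dsd;
    length name $ 12;
    input name $ gender $ age original_loan_amt balance;
    datalines;
Wesley,F,29,68000,139000
John,M,,,11200
;
run;

proc print data=loans_csv noobs;
run;
```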

Modified List Input –

• Modified list input is a more flexible version of list input that includes format modifiers.
• Format modifiers enable you to read nonstandard data by using SAS informats.
• The format modifiers that can be used with modified list input are:

The & (ampersand) format modifier enables you to read character values that contain one or more embedded blanks. SAS reads until it encounters two consecutive blanks, the defined length of the variable, or the end of the input line, whichever comes first.

The : (colon) format modifier enables you to specify an informat after a variable name, whether character or numeric, to read nonstandard data.

The ~ (tilde) format modifier enables you to read and retain single quotation marks, double quotation marks, and delimiters within character values.

The following is an example of the : and ~ format modifiers. You must use the DSD option in the INFILE statement; otherwise, the INPUT statement ignores the ~ format modifier. Notice that SAS keeps the quotation marks in the team name because of the ~ modifier, and that the name 'Venktpati' (length 9, greater than the default length of 8) is read in full because of the : modifier and the $9. informat.

data scores;
infile datalines dsd;
input Name : $9. Score1-Score3 Team ~ $25. Div $;
datalines;
Smith,12,22,46,"Green Hornets, London",AAA
Mitchel,23,19,25,"High Volts, Portland",AAA
Venktpati,09,17,54,"Vulcans, Las Vegas",AA
;
run;
proc print data=scores noobs; run;

Output from Example with Format Modifiers :

Name Score1 Score2 Score3 Team Div

Smith 12 22 46 "Green Hornets, London" AAA
Mitchel 23 19 25 "High Volts, Portland" AAA
Venktpati 9 17 54 "Vulcans, Las Vegas" AA
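
The example above does not use the & modifier. The sketch below, with made-up city names and figures, shows & reading a character value that contains a single embedded blank; two consecutive blanks mark the end of the value. The : modifier is used for the second variable so that the COMMA12. informat can remove the commas.

```sas
data cities;
    length city $ 20;
    /* & : read city until two consecutive blanks are found */
    input city $ & population : comma12.;
    datalines;
New York  1,000
Las Vegas  2,500
;
run;

proc print data=cities noobs;
run;
```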

Column Input :

Column input enables you to read standard data values that are aligned in columns in the data records. Specify the variable name, followed by a dollar sign ($) if it is a character variable, and specify the columns in which the data values are located in each record:

data scores;
infile datalines truncover;
input name $ 1-12 score2 17-20 score1 27-30;
datalines;
Riley           1132      987
Henderson       1015      1102
;
run;
Note: Use the TRUNCOVER option in the INFILE statement to ensure that SAS handles data values of varying lengths appropriately.

To use column input, data values must be:

• in the same field on all the input lines
• in standard numeric or character form.

Note: You cannot use an informat with column input.

Features of column input include the following:

• Character values can contain embedded blanks.
• Character values can be from 1 to 32,767 characters long.
• Placeholders, such as a single period (.), are not required for missing data.
• Input values can be read in any order, regardless of their position in the record.
• Values or parts of values can be reread.
• Both leading and trailing blanks within the field are ignored.
• Values do not need to be separated by blanks or other delimiters.
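
Two of the features listed above, rereading parts of a value and reading values that are not separated by delimiters, can be seen in the following sketch. The variable names and the six-character codes are made up for illustration.

```sas
data codes;
    /* columns 1-6 are read once as a whole and then reread in two parts */
    input full_code $ 1-6 region $ 1-2 serial 3-6;
    datalines;
NY0042
CA0107
;
run;

proc print data=codes noobs;
run;
```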

Formatted Input –

Formatted input combines the flexibility of using informats with many of the features of column input. By using formatted input, you can read nonstandard data for which SAS requires additional instructions. Formatted input is typically used with pointer controls that enable you to control the position of the input pointer in the input buffer when you read data.

The INPUT statement in the following DATA step uses formatted input and pointer controls. Note that $12. and COMMA5. are informats and +4 and +6 are column pointer controls.

data scores;
input name $12. +4 score1 comma5. +6 score2 comma5.;
datalines;
Riley           1,132      1,187
Henderson       1,015      1,102
;
run;

Important points about formatted input are:

• Character values can contain embedded blanks.
• Character values can be from 1 to 32,767 characters long.
• Placeholders, such as a single period (.), are not required for missing data.
• With the use of pointer controls to position the pointer, input values can be read in any order, regardless of their positions in the record.
• Values or parts of values can be reread.
• Formatted input enables you to read data stored in nonstandard form, such as packed decimal or numbers with commas.

Named Input –

You can use named input to read records in which data values are preceded by the name of the variable and an equal sign (=). The following INPUT statement reads the data lines containing equal signs.

data games;
input name=$ score1= score2=;
datalines;
name=riley score1=1132 score2=1187
;
run;
proc print data=games; run;

Note: When an equal sign follows a variable in an INPUT statement, SAS expects that data remaining on the input line contains only named input values. You cannot switch to another form of input in the same INPUT statement after using named input. Also, note that any variable that exists in the input data but is not defined in the INPUT statement generates a note in the SAS log indicating a missing field.

Here is a synopsis table to help remember which method is appropriate for which type of data:

SAS input methods

Internal processing of SAS Data step

Let us look at some of the internal workings of SAS as it processes a DATA step. SAS processes DATA steps in two stages—a compile stage and an execution stage.

When you submit a DATA step for execution, it is first compiled and then executed.
The following figure shows the flow of action for a typical SAS DATA step.

SAS Data Step phases

The Compilation Phase

When you submit a DATA step for execution, SAS checks the syntax of the SAS statements and compiles them, that is, automatically translates the statements into machine code. In this phase, SAS identifies the type and length of each new variable, and determines whether a variable type conversion is necessary for each subsequent reference to a variable. During the compile phase, SAS creates the following three items:

1. The input buffer is a logical area in memory into which SAS reads each record of raw data when SAS executes an INPUT statement. Note that this buffer is created only when the DATA step reads raw data. (When the DATA step reads a SAS data set, SAS reads the data directly into the program data vector.)
2. The program data vector (PDV) is a logical area in memory where SAS builds a data set, one observation at a time. When a program executes, SAS reads data values from the input buffer or creates them by executing SAS language statements. The data values are assigned to the appropriate variables in the program data vector. From here, SAS writes the values to a SAS data set as a single observation. Along with data set variables and computed variables, the PDV contains two automatic variables, _N_ and _ERROR_.

• The _N_ variable counts the number of times the DATA step begins to iterate.
• The _ERROR_ variable signals the occurrence of an error caused by the data during execution. The value of _ERROR_ is either 0 (indicating no errors exist), or 1 (indicating that one or more errors have occurred).

SAS does not write these variables to the output data set.

3. Descriptor information is information that SAS creates and maintains about each SAS data set, including data set attributes and variable attributes. It contains, for example, the name of the data set and its member type, the date and time that the data set was created, and the number, names and data types (character or numeric) of the variables.

The Execution Phase

By default, a simple DATA step iterates once for each observation that is being created. The flow of action in the Execution Phase of a simple DATA step is described as follows:

SAS Execution phase

1 The DATA step begins with a DATA statement. Each time the DATA statement
executes, a new iteration of the DATA step begins, and the _N_ automatic variable
is incremented by 1.

2 SAS sets the newly created program variables to missing in the program data
vector (PDV).

3 SAS reads a data record from a raw data file into the input buffer, or it reads an
observation from a SAS data set directly into the program data vector. You can use
an INPUT, MERGE, SET, MODIFY, or UPDATE statement to read a record.

4 SAS executes any subsequent programming statements for the current record.

5 At the end of the statements, an output, return, and reset occur automatically.
SAS writes an observation to the SAS data set, the system automatically returns
to the top of the DATA step, and the values of variables created by INPUT and
assignment statements are reset to missing in the program data vector. Note that
variables that you read with a SET, MERGE, MODIFY, or UPDATE statement are
not reset to missing here.

6 SAS counts another iteration, reads the next record or observation, and executes
the subsequent programming statements for the current observation.

7 The DATA step terminates when SAS encounters the end-of-file in a SAS data set
or a raw data file.
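
One way to watch the automatic variables during execution is to write them to the SAS log with a PUT statement. The sketch below uses made-up data; the second record contains a non-numeric value on purpose so that _ERROR_ changes.

```sas
data readings;
    input sensor $ value;
    /* write the automatic and data set variables to the log
       on every iteration */
    put _n_= _error_= sensor= value=;
    datalines;
A 10
B xx
C 30
;
run;
```

The log shows _N_ increasing with each iteration and _ERROR_ switching to 1 for the record whose value cannot be read as a number. Neither variable appears in the readings data set.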

Writing your first program

A Simple Program to Read Raw Data and Produce a Report

To give you a flavor of SAS programming, here is a simple program to read data from a text file and produce some basic summaries. All SAS programs are a mix of DATA steps and procedures. The following program has one DATA step and two procedures: PROC FREQ and PROC MEANS.

We have some test data values. They represent AccNumber, Name, BranchName, Balance, and ArrearBucket. Name occupies a fixed range of columns because it contains a blank; the remaining values are separated from each other by one or more blanks. You want to produce two reports: one showing the frequencies for Branch (how many accounts are in each branch), and the other showing the balance in each arrear bucket.

Here is a listing of the raw data file that you want to analyze:

1023 Gary Milby          Phoenix 10000 1
1049 Jim Fidler          London 145124 0
1219 Anthony Nance       London 210000 1
1246 Ravi Sinha          Phoenix 194100 2
1078 Ashley McKnight     London 127000 0

Here is the program:

/* The numbers in comments match the list that follows the program; */
/* item 4 is the block of data lines after the DATALINES statement. */
/* 1 */ data branch_performance;
/* 2 */ input AccNumber 1-4 Name $ 6-24 BranchName $ Balance ArrearBucket;
/* 3 */ datalines;
1023 Gary Milby          Phoenix 10000 1
1049 Jim Fidler          London 145124 0
1219 Anthony Nance       London 210000 1
1246 Ravi Sinha          Phoenix 194100 2
1078 Ashley McKnight     London 127000 0
/* 5 */ ;

run;

title "Number of accounts in each branch";
proc freq data=branch_performance;
tables BranchName;
run;

title "Arrears summary statistics";
proc means data=branch_performance;
class BranchName ArrearBucket;
var balance ;
run;

The following list corresponds to the numbered items in the preceding program. Don't worry, we will cover all of these in detail in later tutorials.

1 The DATA statement tells SAS to begin building a SAS data set named Branch_Performance.
2 The INPUT statement identifies the fields to be read from the input data and names the SAS variables to be created from them (AccNumber, Name, BranchName, Balance , and ArrearBucket).
3 The DATALINES statement indicates that data lines follow.
4 The data lines follow the DATALINES statement. This approach to processing raw data is useful when you have only a few lines of data.
5 The semicolon signals the end of the raw data, and is a step boundary. It tells SAS that the preceding statements are ready for execution.

The dollar sign following variable names tells SAS that values for Name and BranchName are character values. Without a dollar sign, SAS assumes values are numbers and should be stored as SAS numeric values.

Finally, the DATA step ends with a RUN statement. We will discuss PROC FREQ and PROC MEANS in detail in later tutorials. For the time being, just understand that FREQ is used to count the number of accounts in each branch, while MEANS is used to summarize the balance in each arrear bucket in each branch.

There are several TITLE statements in this program. As you may have guessed, the text following the keyword TITLE (placed in single or double quotes) is printed at the top of each page of SAS output. Statements such as the TITLE statement are called global statements. The term global refers to the fact that the operations these statements perform are not tied to one single DATA or PROC step. They affect the entire SAS environment.

What is SAS dataset ?

Have you ever worked with a database? If so, you are probably familiar with the term table. If not, don't worry, we will explain. A database stores data, so it needs storage space and a standard template in which to keep that data, much as you keep fruit in a basket.
A table is the basket in which a database keeps its fruit, that is, its data. Now imagine that you work in a cold storage facility where a large amount of fruit is stored. A single basket is no longer enough; you need large containers. Each container stores one kind of fruit in such a way that the fruit can be retrieved easily. The cold storage facility corresponds to a DBMS, and the containers are its tables.
So a table holds a set of related data for the DBMS. The table is stored as rows and columns. Each row is called a record, and each column stands for a variable.

Similarly, SAS organizes data into a rectangular form, or table, called a SAS data set. SAS can read data from anywhere (for example, raw data or spreadsheets), but it stores the data in its own special form, the SAS data set. Only SAS can read and write SAS data sets. If you opened a SAS data set with another program (Microsoft Word, for example), it would not be a pretty sight: it would consist of some recognizable characters and many funny-looking graphics characters. In other words, even when SAS reads data from Oracle or DB2 tables, it converts the data into SAS data set format in the background.

SAS Dataset can be divided into 2 parts –

• Descriptor Portion
• Data Portion

The descriptor information for a SAS data set makes the file self-documenting; that
is, each data set can supply the attributes of the data set and of its variables.

Descriptor information includes the number of observations, the observation length,
the date that the data set was last modified, and other facts. Descriptor information for
individual variables includes attributes such as name, type, length, format, label, and
whether the variable is indexed.
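
The two portions can be inspected separately. In the sketch below, which uses a made-up one-row data set, PROC CONTENTS reports the descriptor portion and PROC PRINT lists the data portion.

```sas
data demo;
    length name $ 10;
    input name $ age;
    label age = 'Age in years';
    datalines;
Riley 34
;
run;

/* Descriptor portion: data set and variable attributes */
proc contents data=demo;
run;

/* Data portion: the stored values */
proc print data=demo;
run;
```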

SAS dataset

The following items correspond to the portions in the figure above:
1. A SAS data file (member type DATA) contains descriptor information and data values. SAS data sets can be a member type DATA (SAS data file) or VIEW (SAS view).
2. An index is a separate file that you can create for a SAS data file in order to provide direct access to specific observations. The index file has the same name as its data file and a member type of INDEX. Indexes can provide faster access to specific observations, particularly when you have a large data set.

In a SAS data set, each row represents information about an individual entity and is called an observation. Each column represents the same type of information and is called a variable. Each separate piece of information is a data value. In a SAS data set, an observation contains all the data values for an entity; a variable contains the same type of data value for all entities.

Overview of Base SAS Programming Language

Elements of the SAS Language

The SAS language contains statements, expressions, functions and CALL routines, options, formats, and informats – elements that many programming languages share. However, the way you use the elements of the SAS language depends on certain programming rules. The most important rules are listed below –

    Rules for SAS Statements

There are only a few rules for writing SAS statements:

• SAS statements end with a semicolon.
• You can enter SAS statements in lowercase, uppercase, or a mixture of the two.
• You can begin SAS statements in any column of a line and write several statements on the same line.
• You can begin a statement on one line and continue it on another line, but you cannot split a word between two lines.
• Words in SAS statements are separated by blanks or by special characters (such as the equal sign or the minus sign).

    Rules for Most SAS Names

SAS names are used for SAS data set names, variable names, and other items. The following rules apply:
• A SAS name can contain from one to 32 characters.
• The first character must be a letter or an underscore (_).
• Subsequent characters must be letters, numbers, or underscores.
• Blanks cannot appear in SAS names.

    Special Rules for Variable Names

For variable names only, SAS remembers the combination of uppercase and lowercase letters that you use when you create the variable name. Internally, the case of letters does not matter. “CAT,” “cat,” and “Cat” all represent the same variable. But for presentation purposes, SAS remembers the initial case of each letter and uses it to represent the variable name when printing it.

    Types of Variables

SAS has only two types of variables:

• character
• numeric

This makes it much simpler to use and understand than some other programs that have many more data types (for example, integer, long integer, and logical). SAS determines a fixed storage length for every variable. Most SAS users never need to think about storage lengths for numerical values—they are stored in 8 bytes (about 14 or 15 significant digits, depending on your operating system) if you don’t specify otherwise. The majority of SAS users will never have to change this default value (it can lead to complications and should only be considered by experienced SAS programmers). Each character value (data stored as letters, special characters, and numerals) is assigned a fixed storage length explicitly by program statements or by various rules that SAS has about the length of character values.
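
A short sketch of the two variable types, with made-up names and values. The character variable gets an explicit length from the LENGTH statement, the numeric variable keeps the default length, and PROC CONTENTS reports the type and length of each.

```sas
data types_demo;
    length city $ 12;    /* character variable, 12 bytes */
    city  = 'Phoenix';
    count = 42;          /* numeric variable, default length */
run;

proc contents data=types_demo;
run;
```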

What is the purpose of the trailing @ and @@, and how would you use them?


Without a single trailing @ on an INPUT statement, SAS loads a new record when it encounters another INPUT statement in the same iteration of the DATA step. The result can be observations that are never output and values that are missing. To avoid this, a single trailing @ holds the current record, so that the next INPUT statement in the same iteration reads from the record already in the input buffer instead of loading a new one. The single trailing @ is released when an INPUT statement without a trailing @ executes, or when the DATA step begins its next iteration.

The statement input zz $ status @@; tells SAS to read two values at a time from the input buffer into the PDV without discarding the rest of the line. Without the trailing @@, SAS would read the first two values from the input buffer and ignore the rest of the line, so fewer observations would be read.

The single trailing @ is used for records with mixed layouts, such as:
001F38   H
002 F 40 G
To read these values in a DATA step:
data example;
/* read the record type from column 10 and hold the record */
input @10 type $ @;
if type = 'H' then
input @1 id 3. @4 gender $1. @5 age 2.;
else if type = 'G' then
input @1 id 3. @5 gender $1. @7 age 2.;
cards;
001F38   H
002 F 40 G
;
run;

The double trailing @@ holds the record across iterations of the DATA step until the end of the record is reached.
Data example2;
input id age @@;
cards;
001 23 002 43 003 65 004 32 005 54
;
run;

Input Method


/* List Input Method */

data work.demo;
infile cards;
input pid age color $ race$ height weight;
cards;
200 34 white Asian 56 78
201 59 black African 84 45
202 23 white Asian 45 56
;
run;
proc print data= work.demo;
run;

/*NAMED INPUT METHOD*/
/*Sometimes the raw data contains data values together with their variable names. In this case we can use the named input method. The order of the values in the data does not matter; the variables are printed in the order of the INPUT statement.*/

data work.demo1;
infile cards;
length name $ 14;
input id= age= name= $;
cards;
id=123 age=34 name= mayank
id=124 age=38 name= kioran rao
id=125 age=45 name= mayur kumar
id=126 age=40 name= mayank kumar
;
run;
proc print data = demo1;
run;

/*COLUMN INPUT METHOD*/
/*sometimes in raw data, data values are available in specific columns. */

data demo2;
infile cards;
input pid 1-4 name $ 5-16
age 17-19 color $ 20-25;
cards;
100 Kiran kumar 89 white
101 pawan       67 black
102 kranthi     89 white
;
run;

/* Note: if the data values are not aligned in the columns named in the INPUT statement, or if list input is used on column-aligned data, the values will be read incorrectly */

data demo3;
infile cards;
input pid 1-4 name $ 5-16
age 17-19 color $ 20-25;
cards;
100 Kiran kumar 89 white
101 pawan 67 black
102 kranthi 89 white
;
run;

data work.demo;
infile cards;
input pid age color $ race$ height weight;
cards;
100 Kiran kumar 89 white
101 pawan       67 black
102 kranthi     89 white
;
run;

/*FORMAT INPUT METHOD*/
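/*In formatted input, pointer controls and informats play the main role.
+n and n. are used, where n is a number:
+n -> relative column pointer control; skips n columns of data that is not required
n. -> informat width; reads the next n columns of required data */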

data demo4;
infile cards;
input +0 id 3. +1 name $ 11.
+1 age 3. +0 color $ 7.;
cards;
100 kiran kumar 89 white
101 pawan       67 black
102 kranthi     87 white
;
proc  print data= demo4;
run;

/*ABSOLUTE INPUT METHOD*/
/*Using the absolute input method we can read standard and non-standard data.
@n -> absolute column pointer control; moves the pointer to column n
n. -> informat width; the number of columns to read */

data demo5;
infile cards;
input @1 id 3. @5 name $ 11.
@17 age 2. @20 race $ 5.;
cards;
100 kiran kumar 89 white
101 pawan       67 black
102 kranthi     89 white
;
run;

/*MIXED INPUT METHOD*/
/* If more than one input technique is used in a single INPUT statement, it is called mixed input */

data demo6;
infile cards;
input @1 pid 3. +1 name $ 11.
age 17-19 @20 race $ 5.;
cards;
100 kiran kumar 89 white
101 pawan       67 black
102 kranthi     89 white
;
run;

/* Modifying List Input
There are two modifiers that can be used with list input:
The ampersand (&) modifier is used to read character values that contain embedded blanks; such values must be followed by two or more blanks.
The colon (:) modifier is used to read nonstandard data values (for example, values containing commas or dollar signs) and character values that are longer than eight characters but contain no embedded blanks. */

data sales;
   infile cards;
   input    EmpID     :       $4.
            Name      &      $15.
            Region    :       $5.
            Customer  &      $18.
            Date      : mmddyy10.
            Item      :       $8.
            Quantity  :        5.
            UnitCost  :  dollar9.;
   TotalSales = Quantity * UnitCost;
/*   format date mmddyy10. UnitCost TotalSales dollar9.;*/
  * drop Date;
cards;
1843 George Smith  North Barco Corporation  10/10/2006 144L 50 $8.99
1843 George Smith  South Cost Cutter’s  10/11/2006 122 100 $5.99
1843 George Smith  North Minimart Inc.  10/11/2006 188S 3 $5,199
1843 George Smith  North Barco Corporation  10/15/2006 908X 1 $5,129
1843 George Smith  South Ely Corp.  10/15/2006 122L 10 $29.95
0177 Glenda Johnson  East Food Unlimited  9/1/2006 188X 100 $6.99
0177 Glenda Johnson  East Shop and Drop  9/2/2006 144L 100 $8.99
1843 George Smith  South Cost Cutter’s  10/18/2006 855W 1 $9,109
9888 Sharon Lu  West Cost Cutter’s  11/14/2006 122 50 $5.99
9888 Sharon Lu  West Pet’s are Us  11/15/2006 100W 1000 $1.99
0017 Jason Nguyen  East Roger’s Spirits  11/15/2006 122L 500 $39.99
0017 Jason Nguyen  South Spirited Spirits  12/22/2006 407XX 100 $19.95
0177 Glenda Johnson  North Minimart Inc.  12/21/2006 777 5 $10.500
0177 Glenda Johnson  East Barco Corporation  12/20/2006 733 2 $10,000
1843 George Smith  North Minimart Inc.  11/19/2006 188S 3 $5,199
;
run;

/* Line Pointer Controls:
When SAS reads raw data values, it keeps track of its position with an input pointer.
A line pointer control positions the input pointer on a specific record from within the INPUT statement. There are two types of line pointer controls:
The forward slash (/) specifies a line location that is relative to the current one.
The #n specifies the absolute number of the line to which you want to move the pointer. */

data address1;
   infile datalines ;
   input #1 Name $40.
         #2 Street $40.
         #3 @1  City $20.
            @21 State $2.
            @24 Zip $5.;
datalines;
ron   coDY
1178  HIGHWAY 480
camp   verde        tx 78010
jason Tran
123 lake  view drive
East  Rockaway      ny 11518
;
proc print;run;

data LPC ;
input Lname $ 1-8 Fname $ 10-15 /
Department $ 1-12 Jobcode $ 15-19 /
Salary comma10. ;
cards ;
goel     varun
marketing     sr01
$25,209.03
goel     varun
marketing     sr01
$25,209.03
goel     varun
marketing     sr01
25,209.03
goel     varun
marketing     sr01
.
;
run;
proc print ;run;

ARRAYS


*SAS ARRAYS ARE A COLLECTION OF ELEMENTS (SAS VARIABLES) THAT ALLOW YOU TO WRITE SAS STATEMENTS
REFERRING TO THIS GROUP OF VARIABLES;

*ARRAYS ARE USED TO PERFORM A SIMILAR OPERATION ON A GROUP OF VARIABLES ;

*EX : W/O ARRAY ;

/* Changes will not be made here: the IF statements come after the CARDS data, so they are not part of the DATA step */

data new;
   infile cards;
   input height weight age;
   cards;
   999 45  34
   165 999 78
   678 87  999
   ;
   if Height = 999 then Height = .;
   if Weight = 999 then Weight = .;
   if Age    = 999 then Age    = .;
run;

 

data new;
   infile cards;
   input height weight age;
   if Height = 999 then Height = .;
   if Weight = 999 then Weight = .;
   if Age    = 999 then Age    = .;
   cards;
   999 45  34
   165 999 78
   678 87  999
   ;
  
run;

*Program 13-2 Converting values of 999 to a SAS missing value – using arrays;

data new1;
   infile cards;
   input height weight age;
   cards;
   999 45  34
   165 999 78
   678 87  999
   ;
  
run;

data new1;
set new;
      array myvars{3} Height Weight Age;
   do i = 1 to 3;
      if myvars{i} = 999 then myvars{i} = .;
   end;
   drop i;
run;

/*Program 13-3 Rewriting Program 13-2 using the CALL MISSING routine. The number in braces denotes the number of variables in the array.*/

data new2;
   set new;
   array myvars{3} Height Weight Age;
   do i = 1 to 3;
      if myvars{i} = 999 then call missing(myvars{i});
   end;
   drop i;
run;

*Program 13-4 Converting values of NA and ? to character missing values;
data chars;

   input A $ B $ x y Ques $;
datalines;
NA ? 3 4 ABC
AAA BBB 8 . ?
NA NA 9 8 NA
;
data missing;
   set chars;
   array char_vars{*} $ _character_;
   do i = 1 to dim(char_vars);
      if char_vars{i} in ('NA' '?') then call missing(char_vars{i});
   end;
   drop i;
run;

data missing1;
   set chars;
   array char_vars{*} $ a b ques;
   do i = 1 to dim(char_vars);
      if char_vars{i} in ('NA' '?') then call missing(char_vars{i});
   end;
   drop i;
run;

 

*Program 13-5 Converting all character values in a SAS data  set to lowercase;

data lower;
   set chars;
   array all_chars{*} _character_;
   do i = 1 to dim(all_chars);
      all_chars{i} = lowcase(all_chars{i});
   end;
   drop i;
run;

*Program 13-6 Using an array to create new variables;
data temp;
   input Fahren1-Fahren24 @@;
   array Fahren[24];
   array Celsius[24] Celsius1-Celsius24;
   do Hour = 1 to 24;
      Celsius{Hour} = (Fahren{Hour} - 32)/1.8;
   end;
   drop Hour;
datalines;
35 37 40 42 44 48 55 59 62 62 64 66 68 70 72 75 75
72 66 55 53 52 50 45
;

 

DATA faminc;
   INPUT famid faminc1-faminc12 ;
CARDS;
1 3281 3413 3114 2500 2700 3500 3114 3319 3514 1282 2434 2818
2 4042 3084 3108 3150 3800 3100 1531 2914 3819 4124 4274 4471
3 6015 6123 6113 6100 6100 6200 6186 6132 3123 4231 6039 6215
;
RUN;
 
PROC PRINT DATA=faminc;
RUN;

DATA faminc1a;
   SET faminc;
    taxinc1 = faminc1 * .10 ;
    taxinc2 = faminc2 * .10 ;
    taxinc3 = faminc3 * .10 ;
    taxinc4 = faminc4 * .10 ;
    taxinc5 = faminc5 * .10 ;
    taxinc6 = faminc6 * .10 ;
    taxinc7 = faminc7 * .10 ;
    taxinc8 = faminc8 * .10 ;
    taxinc9 = faminc9 * .10 ;
    taxinc10= faminc10 * .10 ;
    taxinc11= faminc11 * .10 ;
    taxinc12= faminc12 * .10 ;
RUN;
 
PROC PRINT DATA=faminc1a;
RUN;

DATA faminc1b;
   SET faminc ;
   ARRAY Afaminc(12) faminc1-faminc12 ;
   ARRAY Ataxinc(12) taxinc1-taxinc12 ;
 
   DO month = 1 TO 12;
     Ataxinc(month) = Afaminc(month) * .10 ;
   END;
RUN;

PROC PRINT DATA=faminc1b;
   VAR faminc1-faminc12 taxinc1-taxinc12;
RUN;

DATA faminc2a;
   SET faminc;
    incqtr1 = faminc1+faminc2+faminc3 ;
    incqtr2 = faminc4+faminc5+faminc6 ;
    incqtr3 = faminc7+faminc8+faminc9 ;
    incqtr4 = faminc10+faminc11+faminc12 ;
RUN;

PROC PRINT DATA=faminc2a;
   var faminc1-faminc12 incqtr1-incqtr4;
RUN;

DATA faminc2b;
   SET faminc ;

   ARRAY Afaminc(12) faminc1-faminc12 ;
   ARRAY Aincqtr(4)  incqtr1-incqtr4 ;

   DO qtr = 1 TO 4 ;
     month3 = 3*qtr;
     Aincqtr(qtr) = Afaminc(month3-2) + Afaminc(month3-1) + Afaminc(month3) ;
   END;
RUN;

 
PROC PRINT DATA=faminc2b;
   var faminc1-faminc12 incqtr1-incqtr4;
RUN;
