### Conjoint Model

Conjoint analysis indicates consumer preferences for products with multiple characteristics, where each characteristic varies across several categories (levels). For example, the researcher might want to learn consumer preferences for a coffee maker with three characteristics: price (with three levels), number of cups brewed (with three levels), and timed start (yes or no). The task is to determine which of the 3x3x2 = 18 combinations of characteristics is most preferred by consumers.
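As a quick check of the count, the full set of candidate products can be enumerated directly. The sketch below is in Python rather than SAS, and the level labels (the price tiers and cup counts) are made-up placeholders; only the number of levels per characteristic comes from the example above.

```python
from itertools import product

# Hypothetical level labels; only the level counts (3, 3, 2) come from the text.
levels = {
    "price": ["low", "medium", "high"],
    "cups": [4, 8, 12],
    "timed_start": ["yes", "no"],
}

# Full factorial: every combination of one level per characteristic.
profiles = [dict(zip(levels, combo)) for combo in product(*levels.values())]

print(len(profiles))  # number of candidate products
print(profiles[0])    # first profile in the enumeration
```

In practice subjects are rarely asked to judge every combination when there are many attributes; a fraction of the full factorial is used instead, which is what the design macros later in this post are for.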

The Conjoint Model

Conjoint analysis is based on a main-effects analysis-of-variance model. Data are collected by asking subjects about their preferences for hypothetical products defined by attribute combinations. Conjoint analysis decomposes the judgment data into components, based on qualitative attributes of the products. A numerical utility or part-worth utility value is computed for each level of each attribute. Large utilities are assigned to the most preferred levels, and small utilities are assigned to the least preferred levels. The attributes with the largest utility range are considered the most important in predicting preference. Conjoint analysis is a statistical model with an error term and a loss function.

Metric conjoint analysis models the judgments directly. When all of the attributes are nominal, the metric conjoint analysis is a simple main-effects ANOVA with some specialized output. The attributes are the independent variables, the judgments comprise the dependent variable, and the utilities are the parameter estimates from the ANOVA model. The following is a metric conjoint analysis model for three factors:

$$y_{ijk} = \mu + \beta_{1i} + \beta_{2j} + \beta_{3k} + \epsilon_{ijk}$$

This model could be used, for example, to investigate preferences for cars that differ on three attributes: mileage, expected reliability, and price. $y_{ijk}$ is one subject's stated preference for a car with the $i$th level of mileage, the $j$th level of expected reliability, and the $k$th level of price. The grand mean is $\mu$, the terms $\beta_{1i}$, $\beta_{2j}$, and $\beta_{3k}$ are the part-worth utilities, and the error is $\epsilon_{ijk}$.

Nonmetric conjoint analysis finds a monotonic transformation of the preference judgments. The model, which follows directly from conjoint measurement, iteratively fits the ANOVA model until the transformation stabilizes. The $R^2$ increases during every iteration until convergence, when the change in $R^2$ is essentially zero. The following is a nonmetric conjoint analysis model for three factors, where $\Phi$ denotes the monotonic transformation:

$$\Phi(y_{ijk}) = \mu + \beta_{1i} + \beta_{2j} + \beta_{3k} + \epsilon_{ijk}$$

The $R^2$ for a nonmetric conjoint analysis model will always be greater than or equal to the $R^2$ from a metric analysis of the same data. The smaller $R^2$ in metric conjoint analysis is not necessarily a disadvantage, since results should be more stable and reproducible with the metric model. Metric conjoint analysis was derived from nonmetric conjoint analysis as a special case. Today, metric conjoint analysis is used more often than nonmetric conjoint analysis.

In the SAS System, conjoint analysis is performed with the SAS/STAT procedure TRANSREG (transformation regression). Metric conjoint analysis models are fit using ordinary least squares, and nonmetric conjoint analysis models are fit using an alternating least squares algorithm.

SAS Program Statements

OPTIONS PAGENO=1 PAGESIZE=56 NOLABEL;
*
* Define a data set named TIRES.
* The variable RANK typically would be the average across all subjects.
*;
DATA TIRES;
INPUT BRAND 1 PRICE 3 LIFE 5 HAZARD 7 RANK 9-10;
CARDS;
1 1 2 1 3
1 1 3 2 2
1 2 1 2 14
1 2 2 2 10
1 3 1 1 17
1 3 3 1 12
2 1 1 2 7
2 1 3 2 1
2 2 1 1 8
2 2 3 1 5
2 3 2 1 13
2 3 2 2 16
3 1 1 1 6
3 1 2 1 4
3 2 2 2 15
3 2 3 1 9
3 3 1 2 18
3 3 3 2 11
;
*
* Set up value labels.
*;
PROC FORMAT;
VALUE BRANDF
1 = 'GOODSTONE'
2 = 'PIROGI '
3 = 'MACHISMO ';
VALUE PRICEF
1 = '\$69.99'
2 = '\$74.99'
3 = '\$79.99';
VALUE LIFEF
1 = '50,000'
2 = '60,000'
3 = '70,000';
VALUE HAZARDF
1 = 'YES'
2 = 'NO ';
PROC FREQ NOPRINT;
FORMAT BRAND BRANDF. PRICE PRICEF. LIFE LIFEF. HAZARD HAZARDF.;
*
* Conduct nonmetric (i.e., simple) conjoint analysis.
*;
PROC TRANSREG MAXITER=50 UTILITIES SHORT;
ODS SELECT TESTSNOTE CONVERGENCESTATUS FITSTATISTICS UTILITIES;
MODEL MONOTONE(RANK / REFLECT) = CLASS(BRAND PRICE LIFE HAZARD / ZERO=SUM);
OUTPUT IREPLACE PREDICTED;
*;
PROC PRINT LABEL;
VAR RANK TRANK PRANK BRAND PRICE LIFE HAZARD;
LABEL PRANK = 'PREDICTED RANKS';
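/*
 * Build the experimental design with the %MKTEX SAS macro.
 * The arguments in parentheses after %MKTEX define:
 *   the number of categories (levels) for each variable (3 3 3 2), and
 *   the number of combinations being evaluated (N=18).
 * SEED= [some number] is not strictly necessary, but helps ensure a
 * reproducible design.
 * A block comment is used here because macro calls named inside a
 * star-style comment can still be seen by the macro processor.
 */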
%MKTEX(3 3 3 2, N=18, SEED=448)
%MKTLAB(VARS = BRAND PRICE LIFE HAZARD, OUT=SASUSER.TIREDESIGN,
STATEMENTS = FORMAT BRAND BRANDF. PRICE PRICEF. LIFE LIFEF. HAZARD HAZARDF.)
%MKTEVAL;
PROC PRINT DATA=SASUSER.TIREDESIGN;
RUN;
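
To make the idea of part-worth utilities concrete, here is a minimal Python sketch (not the TRANSREG algorithm) that computes metric part-worths from the TIRES data above. It rests on two assumptions that are mine, not the program's: the ranks are reflected (19 minus rank) and treated as a metric score, and the zero-sum part-worth for a level is taken as the level mean minus the grand mean. That shortcut matches the least-squares main-effects estimates only because this 18-run design is balanced and orthogonal. The nonmetric MONOTONE fit in the SAS program will generally give different numbers.

```python
from collections import defaultdict

# (BRAND, PRICE, LIFE, HAZARD, RANK) rows copied from the TIRES data step.
tires = [
    (1, 1, 2, 1, 3), (1, 1, 3, 2, 2), (1, 2, 1, 2, 14), (1, 2, 2, 2, 10),
    (1, 3, 1, 1, 17), (1, 3, 3, 1, 12), (2, 1, 1, 2, 7), (2, 1, 3, 2, 1),
    (2, 2, 1, 1, 8), (2, 2, 3, 1, 5), (2, 3, 2, 1, 13), (2, 3, 2, 2, 16),
    (3, 1, 1, 1, 6), (3, 1, 2, 1, 4), (3, 2, 2, 2, 15), (3, 2, 3, 1, 9),
    (3, 3, 1, 2, 18), (3, 3, 3, 2, 11),
]
attributes = ["BRAND", "PRICE", "LIFE", "HAZARD"]

# Reflect the ranks so that larger scores mean "more preferred".
scores = [len(tires) + 1 - row[4] for row in tires]
grand_mean = sum(scores) / len(scores)


def part_worths(column):
    """Zero-sum part-worths: level mean minus grand mean (balanced design)."""
    by_level = defaultdict(list)
    for row, score in zip(tires, scores):
        by_level[row[column]].append(score)
    return {lvl: sum(v) / len(v) - grand_mean
            for lvl, v in sorted(by_level.items())}


utilities = {name: part_worths(i) for i, name in enumerate(attributes)}

# Relative importance: each attribute's utility range as a share of all ranges.
ranges = {name: max(u.values()) - min(u.values()) for name, u in utilities.items()}
importance = {name: 100 * r / sum(ranges.values()) for name, r in ranges.items()}

for name in attributes:
    rounded = {lvl: round(u, 3) for lvl, u in utilities[name].items()}
    print(name, rounded, f"importance={importance[name]:.1f}%")
```

Under these assumptions PRICE has the widest utility range, so it comes out as the most important attribute, with the lowest price level carrying the largest part-worth.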

### PROC TRANSREG

##### When using PROC TRANSREG, what are the defaults with pspline?

Proc transreg performs transformation regression in which both the outcome and predictor(s) can be transformed and splines can be fit. Psplines are piecewise polynomials that can be used to estimate relationships that are difficult to fit with a single function.

In this page, we will walk through an example proc transreg with the pspline option and explore its defaults.  The bspline, spline, and pspline options, when similarly specified, yield the same results.  Their differences lie in the number and type of transformed variables generated for estimation.

We can begin by creating a dataset with an outcome Y and a predictor X. This example data is generated as in the SAS examples for proc transreg.

```
data a;
x=-0.000001;
do i=0 to 199;
if mod(i,50)=0 then do;
c=((x/2)-5)**2;
if i=150 then c=c+5;
y=c;
end;
x=x+0.1;
y=y-sin(x-c);
output;
end;
run;

proc gplot data = a;
plot y*x;
run;
```

Clearly, there is not a single, continuous function relating Y to X.  The relationship does not appear random, but it does appear to change with X.  Thus it makes sense to try to fit this with splines.  Before running the proc transreg, we can see that our data contains four variables:

```
proc print data = a (obs = 5); run;

Obs       X       I       C          Y
1    0.10000    0    25.0000    24.7694
2    0.20000    1    25.0000    24.4427
3    0.30000    2    25.0000    24.0234
4    0.40000    3    25.0000    23.5155
5    0.50000    4    25.0000    22.9241
```

In the proc transreg command, we indicate in the model line that we wish to predict variable y without transformation with identity(y). If we wished to model a transformed version of y (the log or rank of y, for example), we would indicate the transformation here.  To predict y, we indicate that we wish to use piecewise polynomial functions of x with pspline(x). We also opted to output a dataset, a2, containing predicted values from the model.

```
proc transreg data=a;
model identity(y) = pspline(x);
output out = a2 predicted;
run;

The TRANSREG Procedure

TRANSREG Univariate Algorithm Iteration History for Identity(Y)
Iteration    Average    Maximum                Criterion
Number     Change     Change    R-Square       Change    Note
-------------------------------------------------------------------------
1    0.00000    0.00000     0.46884                 Converged
```

We can see in the output above that the model converged and has an R-squared value of 0.47.  Let's look at the dataset output by proc transreg.

`proc print data = a2 (obs = 5); run;`
```
Obs  _TYPE_  _NAME_     Y        TY       PY    Intercept    X_1      X_2

1  SCORE    ROW1   24.7694  24.7694  24.1144      1      0.10000  0.01000
2  SCORE    ROW2   24.4427  24.4427  23.4722      1      0.20000  0.04000
3  SCORE    ROW3   24.0234  24.0234  22.8424      1      0.30000  0.09000
4  SCORE    ROW4   23.5155  23.5155  22.2249      1      0.40000  0.16000
5  SCORE    ROW5   22.9241  22.9241  21.6195      1      0.50000  0.25000

Obs    X_3      TIntercept      TX_1       TX_2       TX_3        X

1  0.00100         1        0.10000    0.01000    0.00100    0.10000
2  0.00800         1        0.20000    0.04000    0.00800    0.20000
3  0.02700         1        0.30000    0.09000    0.02700    0.30000
4  0.06400         1        0.40000    0.16000    0.06400    0.40000
5  0.12500         1        0.50000    0.25000    0.12500    0.50000
```

In addition to adding the predicted values, py, to the dataset, we can see that a new variable, ty, has been added for the "transformed" value of y (since our transformation was the identity, these values are the same as y). Three variables (x_1, x_2, x_3) holding the first three powers of x have also been added.  Transformations of these three variables and the intercept are included as well and are indicated with a 't' prefix.  We can see that, by default, SAS fits a single third-degree polynomial in x to y.  Note that though splines are often used to fit piecewise functions, the default setting when using pspline in proc transreg is to estimate just one function (zero knots).

We can plot the predicted values to see how closely they match the original data.

```
legend label=none value=('y' 'predicted y') position=(bottom left inside) mode=share down = 2;
proc gplot data = a2;
plot (y py)*x / overlay legend = legend;
run;
```

For this simple example, we could achieve the same result by running an ordinary least squares regression after transforming x in the same manner as proc transreg.

```
data a3; set a;
x2 = x*x;
x3 = x*x*x;
run;

proc reg data = a3;
model y = x x2 x3;
run;
```

```
The REG Procedure
Model: MODEL1
Dependent Variable: Y

Number of Observations Used         200

Analysis of Variance
Sum of           Mean
Source                   DF        Squares         Square    F Value    Pr > F
Model                     3     7955.26078     2651.75359      57.67    <.0001
Error                   196     9012.65604       45.98294
Corrected Total         199          16968

Root MSE              6.78107    R-Square     0.4688
Dependent Mean       12.04335    Adj R-Sq     0.4607
Coeff Var            56.30551

Parameter Estimates
Parameter       Standard
Variable     DF       Estimate          Error    t Value    Pr > |t|
Intercept     1       24.76908        1.95451      12.67      <.0001
X             1       -6.60903        0.84002      -7.87      <.0001
x2            1        0.62721        0.09698       6.47      <.0001
x3            1       -0.01513        0.00317      -4.77      <.0001
```

In this example, using proc transreg only saves us the step of generating variables. However, we may wish to fit more than one function in a piecewise regression or use more complicated transformations of x.  Doing so with data and proc reg steps quickly becomes unmanageable or impossible, while doing so with proc transreg is effective and efficient.
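
To illustrate both points outside SAS, the sketch below regenerates the same data in plain Python, fits the default-style cubic by least squares, and then adds one truncated cubic term per knot. The knot locations (5, 10, 15) and the truncated power basis are assumptions chosen for illustration; they are one common way to write a piecewise cubic and are not taken from the TRANSREG output above. The rescaling of x is only for numerical stability and does not change the fit.

```python
import math


def make_data():
    """Re-create the 200 (x, y) pairs from the SAS data step above."""
    x, c, y, rows = -0.000001, 0.0, 0.0, []
    for i in range(200):
        if i % 50 == 0:
            c = (x / 2 - 5) ** 2
            if i == 150:
                c += 5
            y = c
        x += 0.1
        y -= math.sin(x - c)
        rows.append((x, y))
    return rows


def solve(a, b):
    """Solve a @ coef = b by Gaussian elimination with partial pivoting."""
    n = len(b)
    m = [row[:] + [rhs] for row, rhs in zip(a, b)]
    for col in range(n):
        pivot = max(range(col, n), key=lambda r: abs(m[r][col]))
        m[col], m[pivot] = m[pivot], m[col]
        for r in range(col + 1, n):
            f = m[r][col] / m[col][col]
            for k in range(col, n + 1):
                m[r][k] -= f * m[col][k]
    coef = [0.0] * n
    for r in range(n - 1, -1, -1):
        s = sum(m[r][k] * coef[k] for k in range(r + 1, n))
        coef[r] = (m[r][n] - s) / m[r][r]
    return coef


def r_square(rows, knots=()):
    """R-square of a cubic in x plus one truncated cubic term per knot."""
    def basis(x):
        z = (x - 10) / 10  # rescale x for numerical stability only
        cols = [1.0, z, z ** 2, z ** 3]
        cols += [max(z - (k - 10) / 10, 0.0) ** 3 for k in knots]
        return cols

    design = [basis(x) for x, _ in rows]
    y = [v for _, v in rows]
    p = len(design[0])
    xtx = [[sum(r[i] * r[j] for r in design) for j in range(p)] for i in range(p)]
    xty = [sum(r[i] * v for r, v in zip(design, y)) for i in range(p)]
    coef = solve(xtx, xty)
    fitted = [sum(c * b for c, b in zip(coef, r)) for r in design]
    mean_y = sum(y) / len(y)
    sse = sum((v - f) ** 2 for v, f in zip(y, fitted))
    sst = sum((v - mean_y) ** 2 for v in y)
    return 1 - sse / sst


rows = make_data()
r2_cubic = r_square(rows)                     # single cubic, no knots
r2_knots = r_square(rows, knots=(5, 10, 15))  # hypothetical knot choice
print(f"cubic only: {r2_cubic:.4f}")
print(f"with knots: {r2_knots:.4f}")
```

The cubic-only fit should land close to the R-squared reported above, and because the knot model contains the cubic as a special case, its R-squared cannot be lower.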

### R BASICS

Basic Commands

• quit(); q()
• help(command); help.start()
• search(); help.search()
• dir(); methods()
• library(p); identify(); attach(); detach()
• remove(); rm()
• start:end; c(); rep(); seq()
• scan(); print(); str(); ls()
• cat(); cat("concatenate", c, "and print", "\t")
• options(prompt=".", continue="///", digits=10); getOption("width")
• source(); source.url() /* run commands in a file */

Simple examples

library() # list packages available
list(data()) # list data sets in the current package
summary(Davis)
list(Davis)
list(Davis\$weight)
stem(Davis[,2]) # equal to stem(Davis\$weight)
stem(Davis\$height, scale=4)
boxplot(Davis\$weight)
w<-Davis\$weight
h<-Davis\$height
plot(w ~ h)
cor(Davis[,c(2:3)])
cor.test(w,h)
t.test(Davis[,2], mu=65)
t.test(Davis\$height, Davis\$weight, mu=100, paired=FALSE)
var.test(Davis\$height, Davis\$weight)

OPERATOR/FUNCTION

Operators

• <- (left assignment), -> (right assignment)
• +, -, *, /, ^, %% (modulus)
• >, >=, <, <=, == (equal), != (not equal)
• & (and), | (or)
• %*% (matrix product); %/% (integer division)
• %o% (outer product); %x% (Kronecker product)
• %in% (matching operator)

Functions

• abs(); sin(); cos(); tan(); exp(); sqrt(); min(); max()
• log(); log(v,10); log10(); log2(); log(v, base=10)
• mean(); sum(); median(); range(); var(); sd()
• rank(); ave(v, group); by(group)
• c(a, b, c); c(start:end); seq(start:end); seq(10, 100, by=5)
• rep(n, time); rep(7, 3); rep(start:end, time)
• rep(1:3, c(2,2,2)); rep(1:3, each=2); rep(1:3, c(1:3))
• seq(1,4); seq(1,10, by=2); seq(0,1, length=10)
• length(), sort(), order(); rev(v) ## to reverse
• dnorm(1.96); dt(1.96, 100); df(1.96, 1, 100); dchisq(1.96, 10)
• pnorm(1.96); pt(1.96, 100); pf(1.96, 1, 100); pchisq(1.96, 10)
• rpois(n, lambda); rnorm(n); rt(n, df); rt(n, df=c(1:10)); rexp(n)
• substring(s, start, stop); substr(s, start, stop); nchar(s)
• date()
• mode() ## type of object

INPUT OUTPUT

Reading Text Files

• source(f); /* to execute commands in the file */
• names(m)<-c("a", "b", "c")
• as.is=TRUE; as.is=T # not to be converted into a factor
• na.strings<-c(".", "NA", "", "#") # characters for missing
• cnt=count.fields(f); which(cnt==7);

Reading Data Frames

• data(d); data(d, package="p")
• data.frame(v1, v2) /* to make a data frame out of vectors */
• m3<-data.frame(as.matrix(m[,2:4]))
• m2<-edit(m); m2<-edit(data.frame(m))
• data.entry(df)

Handling Data

• m2<-match(v1, v2, nomatch=0) # data merging
• m2<-match(m[,1], m[,3])

Writing Data

• cat(); print()
• cat("y x1 x2", "2 4 2", "5 2 7", file="sample.txt", sep="\n")
• write(obj, f)
• write.table(df, file="firms.csv", sep=",", row.names=NA, col.names=NA)
• save(obj, file=f); save.image(f)
• sink(); format()

MATRICES

Defining Matrices

• m<-c(1, 2, 3, 4); c(1, 2, 3, 4)->m; assign("m", c(1, 2, 3, 4))
• m<-data.frame(column1=c(1,2,3), column2=c(4,5,6)); ## 3 by 2
• rep(c(1,2,3), 2); rep(c(1,2,3), each=2);
• rep(c(1,2,3), c(2,2,2)); m<-c(c1=15, c2=54, c3=50)
• seq(1,4); seq(1,10, by=2); seq(0,1, length=10);
• intm<-1:4; intm<-numeric(); intmm<-1; intmm<-2
• strm<-c("a", "b", "c"); strm<-character(); strmm<-"a"; strmm<-"b"
• blm<-c(T,F); blm<-v1>10; ## a boolean vector of TRUE and FALSE
• m<-scan()
• mm<-matrix(1:12,4); mm<-matrix(1:12, nrow=4)
• mm<-matrix(1:12, ncol=3); mm<-matrix(1:12, nrow=4)
• mm<-matrix(1:12, nrow=4, ncol=3); mm<-matrix(1:12, 4, 3)
• arrm<-array(1:10); arrm<-array(1:10, dim=c(2,5))
• cbind(); rbind(); gl(); expand.grid()
• list()

Referring Matrices

• m[,2]; v=m[2,]; m[-1, -3] ## to extract elements
• m[c(1, 5, 6)]; m2=m[-c(1, 5, 6)] ## to extract elements
• m<-c(c1=15, c2=54, c3=50); m[c("c1", "c3")]
• m2<-m\$c2; m2<-m[,2]; m2<-m[,"c2"]; m2<-m[]
• m[,3:5]; m3<-m[,c(3, 4, 5)]; m3<-m[,c("c3", "c4", "c5")]
• m<-c(4, 2, 4); names(m)<-c("Grape", "Pear", "Apple")
• m1\$v2 /*variable 2 of the data frame 1*/
• which(); which.max(); which.min()
• attr(m, which); attributes(obj)

Matrix Functions

• t(); det(); rank(); eigen(); diag(); prod(); crossprod()
• sum(); mean(); var(); sd(); min(); max(); prod(); cumsum(); cumprod()
• is.na(m) ## to check if m contains a missing value
• rowSums(); colSums(); nrow(); ncol()
• dim(m); dimnames(m)
• merge(df1, df2)
• as.factor(); as.matrix(), as.vector(); /* conversion*/
• is.factor(); is.matrix(), is.vector();
• class(); unclass()
• na.omit(); na.fail(); unique(); table(); sample()
• as.array(); as.data.frame()
• as.numeric(); as.character(); as.logical(); as.complex()

REGRESSION

Ordinary Least Squares (OLS)

• lm(); glm()
• m.ols<-lm(v1~v2+v3, data=m) ## linear model
• lm(v1~v2+v3, data=m); summary(lm(v1~v2+v3, data=m)); summary(m.ols)
• names(m.ols); coef(m.ols); fitted(m.ols); resid(m.ols)
• predict(fit); AIC(fit); logLik(fit); deviance(fit)
• model.matrix(v1~v2+v3, data=m)
• m.ols2<-model.matrix(v1~v2+v3, data=m); summary(m.ols2)

Binary Response Regressions

• summary(m.logit); coef(m.logit); fitted(m.logit); resid(m.logit)
• lsfit(v1,v2)
• nls(); m.nonlin<-lm(v1~v2+I(v2^2), data=m) ## I() keeps ^ as arithmetic inside a formula
• anova(m.ols, m.nonlin)
• m.qr<-qr(m) ## QR Decomposition of a Matrix

STATISTICS

Descriptives

• summary(m); fivenum(m)
• stem(v); boxplot(v); boxplot(v1, v2); hist(v)
• qqnorm(v); qqline(v)
• rug(); lines()
• table() /*to make a table*/
• tabulate()

Multivariate Analysis

• cor(m); cor(sqrt(m)) ## Pearson correlation
• cor.test(v1, v2)
• prcomp() /* Principal components in the stats package (formerly mva) */
• kmeans() /* Kmeans cluster analysis in the stats package (formerly mva) */
• factanal() /* Factor analysis in the stats package (formerly mva) */
• cancor() /* Canonical correlation in the stats package (formerly mva) */

Categorical Data Analysis

• chisq.test(v1,v2) ## Pearson Chi-squared Test
• fisher.test(v1,v2) ## Fisher Exact Test
• friedman.test(v1,v2) ## Friedman Test
• prop.test(); binom.test() ## sign test
• kruskal.test(v1,v2) ## Kruskal-Wallis Rank Sum Test
• wilcox.test(v1,v2) ## Wilcoxon Rank Sum (Mann-Whitney) Test
• ks.test(v1,v2) ## Two Sample Kolmogorov-Smirnov Test
• bartlett.test(v1,v2) ## Bartlett Test for Homogeneity of Variances

T-TEST AND ANOVA

• t.test(v1,v2); t.test(v1,v2, var.equal=FALSE)
• t.test(v1,v2, mu=0, paired=FALSE)
• t.test(v1,v2, mu=10, paired=F, var.equal=T)
• power.t.test(); pairwise.t.test()
• var.test(v1,v2) ## F test for equal variance
• m.anova<-aov(v1~v2+v3, data=m)
• aov(); anova()
• summary(m.anova)
• power.anova.test() ## Power calculations for balanced one-way ANOVA tests

PROGRAMMING

Modules

frame_name<-function(arguments) {…}
mile.to.km<-function(mile) {mile*8/5}
km<-mile.to.km(c(35, 55, 75))

Flow Control

if (condition) {…} else if (condition) {…} else {…}
while (condition ) {…} # {} may be omitted for a single line expression
for (index in start:end) {…}
for (i in 1:100) {sum <- sum + i}
repeat {…}
switch (statement, list)

Programming Functions

• expression(); parse(); deparse(); eval()
• optim() /* general-purpose optimization */
• nlm() /* Newton algorithm */
• lm() /* linear models */
• nls() /* nonlinear least squares model */

GRAPHICS

Plotting

• plot(y~x, data=m, pch=16) # plotting character (pch)
• pairs(m) # scatterplot matrix
• xyrange<-range(m) # to get range of m
• plot(y~x, data=m, xlim=xyrange, ylim=xyrange)
• abline(0,1)
• plot(0:10, sin((0:10)*pi), type="l") # "l" joins the points with lines
• barplot(); boxplot(); stem(); hist();
• matplot() /* matrix plot */
• pairs(m) /* scatterplots */
• coplot() /* conditional plot */
• stripplot() /* strip plot */
• qqplot(); qqnorm(); qqline() /* quantile-quantile plot */

Options

• points() # to add points to a plot
• lines() # to add lines
• text() # to add texts
• mtext() # to add margin texts
• axis() # to control axis
• par(cex=1.25, mex=1.25)
• par(mfrow=c(2,2), mfcol=c(1,1))

Regards,

SAS India

### SAS Day 3

INPUT OVERVIEW

The INPUT statement describes the arrangement of a target data to be read in a DATA step. You need to provide variable names followed by \$ (indicating a character value), pointer control, column-specifications, informat, and/or line hold specifiers (i.e., @, and @@) in an INPUT statement.

• Column pointer controls such as @n and +n move the input pointer to a specified column in the input buffer.
• Line pointer controls such as #n and / move the input pointer to a specified line in the input buffer.
• Column specifications specify the columns of the input record that contain the value to read.
• An informat is an instruction that SAS uses to read data into variables.
• @, a single trailing @, holds an input record for the execution of the next INPUT statement within the same iteration of the DATA step. Thus, the next INPUT statement reads from the same record (line).
• @@, a double trailing @, holds the input record for the execution of the next INPUT statement across iterations of the DATA step. Thus, the INPUT statement for the next iteration of the DATA step continues to read the same record (line).

The DATALINES statement (replacing the old CARDS statement) indicates that data lines follow in a DATA step. In order to read external data files, you have to use the INFILE statement.

There are six input styles used in the INPUT statement: list input, column input, formatted input, modified list input, named input, and mixed input. The following table summarizes features of four major styles. Which input style is the best? It depends on your skills and characteristics of data sets. If your data set has just a few observations with several variables, the list input or the named input will be better than the column input or the formatted input. When data elements are not separated with a blank or other delimiters, you cannot use the list input style. When data are well arranged, the column input or formatted input will be better than the list input. Therefore, you need to examine the data structure carefully when deciding the best input style. Of course, you must take this issue into account from the data coding stage.

LIST INPUT

The input style simply lists variables separated with a blank. This style is also called the free format.

DATA listed;
INPUT name \$ id score;
DATALINES /*--1----+----2---*/;
Park 8740031 87.5
Hwang . 94.3
;
RUN;

A character variable should be followed by \$. A missing value should be marked with a period (.); a blank does not mean a missing value in this input style. Do not use more than one "." for a missing value. The default maximum length of a character variable is 8 characters; that is, a fixed 8 bytes of memory are assigned to each variable. Therefore, a string longer than 8 characters will be truncated. If you want to read a string longer than 8 characters, use the LENGTH, INFORMAT, or ATTRIB statement. Or you may use a different input style such as column input or formatted input.

DATA _NULL_;
LENGTH analysis \$15.;
INFORMAT year MMDDYY10.;
INPUT analysis year;
FORMAT year DATE9.;
CARDS /*--1----+----2---*/;
Regression 10/1/2000
ANOVA 05/05/2004
Time-Series 09/03/2009
;
RUN;
/* Output
Regression 01OCT2000
ANOVA 05MAY2004
Time-Series 03SEP2009
*/

In the example above, you may use “INFORMAT analysis \$15.” instead of the LENGTH statement. INFORMAT tells how data are read, while FORMAT tells the format to be displayed. MMDDYY10. reads data in the MM/DD/YYYY format. DATE9. displays date in the DDMMMYYYY format. Without the FORMAT for year, SAS will return odd numbers such as 14884, which are internally used in SAS.
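
The "odd number" is simply a day count: SAS stores a date as the number of days since 1 January 1960. The short Python sketch below is an illustration rather than SAS code; it reproduces the value 14884 and a DATE9.-style rendering. The helper names are hypothetical.

```python
from datetime import date, timedelta

SAS_EPOCH = date(1960, 1, 1)  # SAS date value 0
MONTHS = ["JAN", "FEB", "MAR", "APR", "MAY", "JUN",
          "JUL", "AUG", "SEP", "OCT", "NOV", "DEC"]


def to_sas_date(d):
    """Number of days elapsed since 1 January 1960."""
    return (d - SAS_EPOCH).days


def from_sas_date(n):
    """Calendar date for a SAS date value."""
    return SAS_EPOCH + timedelta(days=n)


def date9(d):
    """Render a date in the DDMMMYYYY layout used by DATE9."""
    return f"{d.day:02d}{MONTHS[d.month - 1]}{d.year}"


print(to_sas_date(date(2000, 10, 1)))  # the internal number for 1 October 2000
print(date9(from_sas_date(14884)))     # the same date displayed DATE9-style
```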

The following example reads a comma-delimited ASCII text file. Remember that the default delimiter is a blank. See the INFILE statement for details.

DATA _NULL_;
INFILE 'a:\tiger.dat' DELIMITER=',' STOPOVER;
INPUT name \$ id score;

RUN;

MODIFIED LIST INPUT

The modified list style is a mixture of the list input and the formatted input. This style can deal with ill-structured data. There are three format modifiers to be used especially when reading complex data.

• colon (:) reads data longer than standard 8 characters or numbers until encountering specified delimiter or reaching the variable width specified.
• ampersand (&) format modifier reads character values that contain embedded blanks with list input and reads until encountering more than one consecutive delimiter. You may include ” (double quotes) in the value of a character variable.
• tilde (~) reads and retains single quotation marks, double quotation marks, and delimiters within quoted character values. That is, double quotation marks enclosing a string are treated as values of a character variable.

The following example illustrates how : and & work in INPUT. The "Lindblom80" in the first row is truncated since it exceeds 8 characters; only the first 8 characters, as specified in the INPUT statement, are read and the last two characters "80" are ignored. In the second row, SAS reads the first four characters "Park", which are shorter than 8 characters, and then encounters a comma (delimiter); SAS stops reading data for the variable "name" and moves on to the next variable. The variable "title" is defined by & with a maximum of 50 characters. The delimiter, a comma, in the first and third rows is treated as a character value. Two consecutive double quotation marks are read as a single double quotation mark. Therefore, the title of the second observation is Reading "Small Is Beautiful" as shown in the output. Characters exceeding the maximum, 50 characters in this case, will be ignored.

DATA modified;
INFILE DATALINES DELIMITER=',' DSD;
INPUT name : \$8. title & \$50.;
DATALINES;
Lindblom80,"Still Muddling, Not Yet Through"
Simon, """It was a disaster,"" he continue…"
;
RUN;

/* Output
Lindblom Still Muddling, Not Yet Through
Simon    "It was a disaster," he continue…
*/

The INFILE statement above says that data are comma delimited and will be listed after DATALINES. DSD at the end of INFILE eliminates double quotation marks enclosing the character value when reading data. If you omit DSD, SAS will consider a comma in character values as a delimiter and read enclosing double quotation marks as character values. As a result, the output would look like,

Lindblom "Still Muddling
Simon """It was a disaster,"" he continue…"
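
The same contrast can be reproduced outside SAS. In the Python sketch below, the standard `csv` reader plays the role of DSD (it honors enclosing quotes, keeps embedded commas, and collapses doubled quotes), while a plain `split(",")` mimics reading without DSD. This is only an analogy to the behavior described above, not a reimplementation of the INFILE options, and the blank after "Simon," in the data lines has been dropped to keep the example simple.

```python
import csv

lines = [
    'Lindblom80,"Still Muddling, Not Yet Through"',
    'Simon,"""It was a disaster,"" he continue…"',
]

# Quote-aware parsing: embedded commas survive, doubled quotes collapse.
with_quotes = list(csv.reader(lines))

# Naive parsing: every comma is treated as a delimiter.
naive = [line.split(",") for line in lines]

for name, title in with_quotes:
    print(name[:8], "|", title)  # name cut to 8 characters, like : $8.
print(naive[0])                  # the title is broken at the embedded comma
```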

The second example shows how ~ (tilde) and DSD work together to read a string with a delimiter. SAS reads a comma in the string as a character value but does not eliminate the double quotation marks enclosing the string. If you omit DSD, the title of the second row will be '"Still Muddling' because SAS treats a comma in the string as the delimiter and stops reading the character value for variable "title."

DATA modified;
INFILE DATALINES DELIMITER=',' DSD;
INPUT name : \$20. year : 4.0 title ~ \$50.;
DATALINES;
Meyer and Rowan,1977,"Institutionalized Organization"
Lindblom,1979,"Still Muddling, Not Yet Through"
;
RUN;
/* Output
Meyer and Rowan 1977 "Institutionalized Organization"
Lindblom        1979 "Still Muddling, Not Yet Through"
*/
/* Output without DSD
Meyer and Rowan 1977 "Institutionalized Organization"
Lindblom 1979 "Still Muddling
*/

You may not omit : after "year" in the INPUT statement above even when data are in the same fixed format. When the variable "year" is specified last in the INPUT statement, : is not necessary.

COLUMN INPUT

The column input style reads the value of a variable from its specified column location. A variable name is followed by its starting and ending columns.

DATA columned;
INPUT name \$ 1-5 id 6-12 score 14-17;
CARDS /*--1----+----2---*/;
Park 8740031 87.5
Hwang9301020 94.3
;
RUN;
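
The column ranges in the INPUT statement above map directly onto fixed slices of each record. A minimal Python sketch of the same idea follows; the function name is hypothetical, and the slices are the 1-based inclusive columns 1-5, 6-12, and 14-17 converted to Python's 0-based, end-exclusive indexing.

```python
def read_columns(line):
    """Mimic INPUT name $ 1-5 id 6-12 score 14-17 for one record."""
    name = line[0:5].strip()     # columns 1-5
    id_ = int(line[5:12])        # columns 6-12
    score = float(line[13:17])   # columns 14-17 (column 13 is skipped)
    return name, id_, score


for record in ["Park 8740031 87.5", "Hwang9301020 94.3"]:
    print(read_columns(record))
```

Note that "Hwang" and its id run together with no blank between them; column input does not need a delimiter because positions alone define the fields.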

SAS reads the variable "name" from columns 1 through 5, id from columns 6 through 12, and so on. This input style works well for well-structured data.

FORMATTED INPUT

The formatted input style reads input values with specified informats after variable names. Informats provide the data type and the width of an input value. Numeric variables are expressed in the w.d format, where w represents the total width of the field and d the number of digits after the decimal point. You cannot omit the period even when d = 0 (for example, 7.). The \$CHARw. or \$w. informat is used for character variables, while the DATEw. or DDMMYYw. informat is used for dates.

DATA formatted;
INPUT name \$5. id 7. score 4.1;
DATALINES /*--+----2---*/;
Park 8740031 875 /* score=87.5 */
Hwang9301020 943 /* score=94.3 */
;
RUN;
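
The score column shows the implied decimal point of the w.d informat: a field with no decimal point is divided by 10 to the power d, while a field that already contains a decimal point is taken as written. The Python sketch below mimics that rule for simple fields only (no signs, exponents, or special missing values); the function name is hypothetical.

```python
def read_wd(field, d=0):
    """Mimic the numeric w.d informat for one fixed-width field."""
    text = field.strip()
    if text in ("", "."):
        return None           # blank field or lone period: missing value
    value = float(text)
    if "." not in text:
        value /= 10 ** d      # apply the implied decimal point
    return value


record = "Hwang9301020 943"
name = record[0:5].strip()           # $5.
id_ = read_wd(record[5:12])          # 7.
score = read_wd(record[12:16], d=1)  # 4.1
print(name, id_, score)
```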

You can use parentheses to simplify expressions.

DATA formatted;
DATALINES /*--+----2---*/;
Park 8740031 89 95100
Hwang9301020100 93 99
;
RUN;

The following example illustrates how effectively formatted input uses column pointer controls, informats (e.g., COMMAn., DOLLARn., PERCENTn., and MMDDYY10.), and parentheses. SAS reads the variable x1 as a string five characters long, the numeric variable x2 as seven digits without a decimal point, and the three-digit numeric variables x3 through x5, and then skips one column (+1) before reading the numeric variable income, which contains commas.

DATA formatted;
INPUT (x1-x5) (\$CHAR5. 7. 3*3.0) +1 income COMMA7.;
DATALINES /*--+----2----+----3*/;
Park 8740031 89 95100 84,895
Hwang9301020100 93 99 168,579

RUN;
/* Output
Park 8740031 89 95 100 84895
Hwang 9301020 100 93 99 168579
*/

Formatted input can use both column and line pointer controls. These pointer controls are very useful when reading multiple observations from the same line or reading one observation from multiple lines.

• @n, a column control, moves the input pointer to nth column
• @@, a line holder, holds the pointer on the current line and waits for further input
• +n, a column control, moves the pointer to the right by n columns
• #n, a row control, goes to the nth line
• / goes to the first column of the next line

NAMED INPUT

The named input style reads a data value that follows its variable name. A variable name and its data value are separated by an equal sign. String data are NOT enclosed in double quotation marks in this style. Like the list style, the named style supports only variables of standard length. This style offers some flexibility, but it is not appropriate for a large data set.
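As a minimal sketch of the named style, the step below reads one character variable and two numeric variables. The data set name named_sketch, the variable names, and the data values are assumptions chosen for illustration; they are not taken from the original example.

```sas
DATA named_sketch;
  /* name= is followed by $ because it is a character variable */
  INPUT name=$ id= score=;
  DATALINES;
name=Park id=8740031 score=87.5
name=Hwang id=9301020 score=94.3
;
RUN;

PROC PRINT DATA=named_sketch;
RUN;
```

Each data line repeats the variable names, which is why this style becomes unwieldy for large data sets.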

DATA named;
DATALINES;

RUN;

MIXED INPUT

The INPUT statement can contain list input, column input, formatted input, and/or named input.

DATA mixed;
INPUT name \$ 1-5 @7 id \$7. +1 grade1 3. grade2 18-22;
CARDS /*--1----+----2---*/;
Park  8740031  89 95.1
Hwang 9301020 100 93.9

RUN;

READING MULTIPLE OBSERVATIONS

Let us read multiple observations from one line using the formatted input style. The following script reads the string variables “name” and “id” consecutively, reads the three-digit numeric variables x1 through x3, and then keeps reading the next observations, if available, without moving to the next line.

DATA formatted;
INPUT name \$ id \$ (x1-x3)(3.) @@;
CARDS /*--1----+----2----+----3----+----4----+----5-*/;
Park 8740031  89 95100 Choi 9730625 100100 95
Hwang 9301020 100 93 99 …
;RUN;
/* Output
Park 8740031 89 95 100
Choi 9730625 100 100 95
Hwang 9301020 100 93 99
*/

The following example reads data using a DO loop.

DATA rbd_block;
INPUT treat \$ @@;
DO block='High', 'Medium', 'Low'; /* DO block=1 TO 3;*/
INPUT income @@; OUTPUT;
END;
DATALINES;
Drug1 34 55 34
Drug2 45 56 32
Drug3 45 56 32
;RUN;
/* Output
1 Drug1 High 34
2 Drug1 Medi 55
3 Drug1 Low 34
4 Drug2 High 45
5 Drug2 Medi 56
6 Drug2 Low 32
7 Drug3 High 45
8 Drug3 Medi 56
9 Drug3 Low 32
*/
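The output above shows “Medi” rather than “Medium” because a character variable takes its length from its first use, here the four-character value 'High'. A hedged variant that avoids the truncation declares the length first; the data set name rbd_block_len and the width of 6 are assumptions for illustration.

```sas
DATA rbd_block_len;
  LENGTH block $ 6;              /* assumed width: long enough for 'Medium' */
  INPUT treat $ @@;
  DO block = 'High', 'Medium', 'Low';
    INPUT income @@;             /* read the next value on the held line */
    OUTPUT;                      /* one observation per treat-block pair */
  END;
  DATALINES;
Drug1 34 55 34
Drug2 45 56 32
Drug3 45 56 32
;
RUN;
```

Because LENGTH comes first, block also becomes the first variable in the data set.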

Suppose individual observations have different numbers of repetitions. Pay attention to the IF and OUTPUT statements.

DATA repeat;
INPUT crop \$ no @;
DROP no;
IF no GT 0 THEN DO;
DO trial=1 TO no;
INPUT cost benefit @;
OUTPUT;
END;
END;
DATALINES;
rice 3 54 87 98 77 57 67
bean 2 65 87 96 54
RUN;
/* Output
rice 1 54 87
rice 2 98 77
rice 3 57 67
bean 1 65 87
bean 2 96 54
*/

READING MULTIPLE LINES

Now, let us read observations whose data span multiple lines. The #n or / pointer indicates the data line from which a variable is read.

DATA spanned;
INPUT #1 No 7.0 #2 name \$CHAR15. / address \$CHAR50. #4 phone \$CHAR12.;
DATALINES;
000001
Park
2451 E. 10th St. APT 311
812-857-9425
000002
Hun
800 N. Union St. APT 525
812-857-6256
RUN;
/* Output
1 Park 2451 E. 10th St. APT 311 812-857-9425
2 Hun 800 N. Union St. APT 525 812-857-6256
*/

The INPUT statement above reads a 7-digit numeric variable “No” from the first line (#1), a 15-character string variable “name” from the second line (#2), a 50-character string variable “address” from the next line (/), and a 12-character string variable “phone” from the fourth line (#4). Alternatively, the INPUT statement may be replaced by “INPUT No 7.0 / Name \$15. / Address \$50. / Phone \$12.;”.

### SAS Date, Time, and Datetime Functions

SAS date, time, and datetime functions are used to perform the following tasks:

• compute date, time, and datetime values from calendar and time-of-day values
• compute calendar and time-of-day values from date and datetime values
• convert between date, time, and datetime values
• perform calculations that involve time intervals
• provide information about time intervals

For all interval functions, you can supply the intervals and other character arguments either directly as a quoted string or as a SAS character variable. When you use a character variable, you should set the length of the character variable to at least the length of the longest string for that variable that is used in the DATA step.

Also, to ensure correct results when using interval functions, use date intervals with date values and datetime intervals with datetime values.

See SAS Language Reference: Dictionary for a complete description of these functions.

The following list shows SAS date, time, and datetime functions in alphabetical order.

DATE()

returns today’s date as a SAS date value.

DATEJUL( yyddd )

returns the SAS date value when given the Julian date in yyddd or yyyyddd format. For example, DATE = DATEJUL(99001); assigns the SAS date value ’01JAN99’D to DATE, and DATE = DATEJUL(1999365); assigns the SAS date value ’31DEC1999’D to DATE.

DATEPART( datetime )

returns the date part of a SAS datetime value as a date value.

DATETIME()

returns the current date and time of day as a SAS datetime value.

DAY( date )

returns the day of the month from a SAS date value.

DHMS( date, hour, minute, second )

returns a SAS datetime value for date, hour, minute, and second values.

HMS( hour, minute, second )

returns a SAS time value for hour, minute, and second values.

HOLIDAY( 'holiday', year )

returns a SAS date value for the holiday and year specified. Valid values for holiday are 'BOXING', 'CANADA', 'CANADAOBSERVED', 'CHRISTMAS', 'COLUMBUS', 'EASTER', 'FATHERS', 'HALLOWEEN', 'LABOR', 'MLK', 'MEMORIAL', 'MOTHERS', 'NEWYEAR', 'THANKSGIVING', 'THANKSGIVINGCANADA', 'USINDEPENDENCE', 'USPRESIDENTS', 'VALENTINES', 'VETERANS', 'VETERANSUSG', 'VETERANSUSPS', and 'VICTORIA'. For example: EASTER2000 = HOLIDAY('EASTER', 2000);

HOUR( datetime )

returns the hour from a SAS datetime or time value.

INTCINDEX( 'date-interval', date )

INTCINDEX( 'datetime-interval', datetime )

returns the index of the seasonal cycle when given an interval and an appropriate SAS date, datetime, or time value. For example, the seasonal cycle for INTERVAL='DAY' is 'WEEK', so INTCINDEX('DAY','01SEP78'D); returns 35 because September 1, 1978, is the sixth day of the 35th week of the year. For correct results, date intervals should be used with date values, and datetime intervals should be used with datetime values.

INTCK( 'date-interval', date1, date2 <, 'method'> )

INTCK( 'datetime-interval', datetime1, datetime2 <, 'method'> )

returns the number of boundaries of intervals of the given kind that lie between the two date or datetime values. The optional method argument specifies that the intervals are counted using either a discrete or a continuous method. The default DISCRETE (or DISC or D) method uses discrete time intervals. For the DISCRETE method, the distance in MONTHS between January 31, 2000, and February 1, 2000, is one month. The CONTINUOUS (or CONT or C) method uses continuous time intervals. For the CONTINUOUS method, the distance in MONTHS between January 15, 2000, and February 14, 2000, is zero, but the distance in MONTHS between January 15, 2000, and February 15, 2000, is one month.
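The two counting methods can be compared directly. The sketch below reuses the dates from the description; the variable names disc, cont1, and cont2 are assumptions for illustration, and the optional method argument is assumed to be available in the SAS release in use.

```sas
DATA _NULL_;
  /* DISCRETE (default): counts month boundaries crossed */
  disc  = INTCK('MONTH', '31JAN2000'D, '01FEB2000'D);
  /* CONTINUOUS: counts whole months elapsed from the start date */
  cont1 = INTCK('MONTH', '15JAN2000'D, '14FEB2000'D, 'CONTINUOUS');
  cont2 = INTCK('MONTH', '15JAN2000'D, '15FEB2000'D, 'CONTINUOUS');
  PUT disc= cont1= cont2=;
RUN;
```

Going by the description above, the log should show 1 for disc, 0 for cont1, and 1 for cont2.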

INTCYCLE( 'interval' <, seasonality> )

returns the interval of the seasonal cycle, given a date, time, or datetime interval. For example, INTCYCLE(‘MONTH’) returns ‘YEAR’ because the months January, February, …, December constitute a yearly cycle. INTCYCLE(‘DAY’) returns ‘WEEK’ because Sunday, Monday, …, Saturday constitute a weekly cycle.

You can specify the optional seasonality argument to construct a cycle other than the default seasonal cycle. For example, INTCYCLE(‘MONTH’, 3) returns ‘QTR’. The optional second argument is the seasonal frequency.

INTFIT( date1, date2, 'D' )

INTFIT( datetime1, datetime2, 'DT' )

INTFIT( obs1, obs2, 'OBS' )

returns an interval that fits exactly between two SAS date, datetime, or observation values. That is, if the interval result of the INTFIT function is used with date1, 1, and SAMEDAY alignment in the INTNX function, then the result is date2. This concept is illustrated in the following example, where result1 is the same as date1 and result2 is the same as date2.

```
FitInterval = INTFIT( date1, date2, 'D' );
result1 = INTNX( FitInterval, date1, 0, 'SAMEDAY');
result2 = INTNX( FitInterval, date1, 1, 'SAMEDAY');
```

More than one interval can fit the preceding definition. For example, two SAS date values that are seven days apart could be fit with either ‘DAY7’ or ‘WEEK’. The INTFIT function chooses the more common interval, so ‘WEEK’ is the result when the dates are seven days apart. The INTFIT function can be used to detect the possible frequency of the time series or to analyze frequencies of other events in a time series, such as outliers or missing values.

INTFMT( 'interval', 'size' )

returns a recommended format when given a date, time, or datetime interval for displaying the time ID values associated with a time series of the given interval. The second argument to INTFMT controls the width of the year for date formats; it can take the value ‘long’ or ‘l’ to specify that the returned format display a four-digit year or the value ‘short’ or ‘s’ to specify that the returned format display a two-digit year.

INTGET( date1, date2, date3 )

INTGET( datetime1, datetime2, datetime3 )

returns an interval that fits three consecutive SAS date or datetime values. The INTGET function examines two intervals: the first interval between date1 and date2, and the second interval between date2 and date3. In order for an interval to be detected, either the two intervals must be the same or one interval must be an integer multiple of the other interval. That is, INTGET assumes that at least two of the dates are consecutive points in the time series, and that the other two dates are also consecutive or represent the points before and after missing observations. The INTGET function assumes that large values are SAS datetime values, which are measured in seconds, and that smaller values are SAS date values, which are measured in days. The INTGET function can be used to detect the possible frequency of the time series or to analyze frequencies of other events in a time series, such as outliers or missing values.

INTINDEX( 'date-interval', date <, seasonality> )

INTINDEX( 'datetime-interval', datetime <, seasonality> )

returns the seasonal index for the specified date or datetime interval and an appropriate date or datetime value. The seasonal index is a number that represents the position of the date or datetime value in the seasonal cycle of the specified interval. For example, INTINDEX('MONTH','01DEC2000'D); returns 12 because monthly data is yearly periodic and December is the 12th month of the year. However, INTINDEX('DAY','01DEC2000'D); returns 6 because daily data is weekly periodic and December 01, 2000, is a Friday, the sixth day of the week. To correctly identify the seasonal index, the interval specification should agree with the date or datetime value. For example, INTINDEX('DTMONTH','01DEC2000'D); and INTINDEX('MONTH','01DEC2000:00:00:00'DT); do not return the expected value of 12. However, both INTINDEX('MONTH','01DEC2000'D); and INTINDEX('DTMONTH','01DEC2000:00:00:00'DT); return the expected value of 12.

You can specify the optional seasonality argument to use a seasonal cycle other than the default seasonal cycle. For example, INTINDEX(’MONTH’,’01APR2000’D); returns the value 4, to indicate the fourth month of the year. However, INTINDEX(’MONTH’,’01APR2000’D,3); and INTINDEX(’MONTH’,’01APR2000’D,’QTR’); return the value 1 to indicate the first month of the quarter. Specifying either 3 or ‘QTR’ for the third argument uses a quarterly seasonal cycle instead of the default yearly seasonal cycle.

INTNX( 'date-interval', date, n <, 'alignment'> )

INTNX( 'datetime-interval', datetime, n <, 'alignment'> )

returns the date or datetime value of the beginning of the interval that is n intervals from the interval that contains the given date or datetime value. The optional alignment argument specifies that the returned date is aligned to the beginning, middle, or end of the interval. Beginning is the default. In addition, you can specify SAME (S) alignment. The SAME alignment bases the alignment of the calculated date or datetime value on the alignment of the input date or datetime value. As illustrated in the following example, the SAME alignment can be used to calculate the meaning of “same day next year” or “same day two weeks from now.”

```
nextYear = INTNX( 'YEAR', '15Apr2007'D, 1, 'S' );
TwoWeeks = INTNX( 'WEEK', '15Apr2007'D, 2, 'S' );
```

The preceding example returns ’15Apr2008’D for nextYear and ’29Apr2007’D for TwoWeeks.

For all values of alignment, the number of discrete intervals n between the input date and the resulting date agrees with the input value. In the following example, the result is always that n2 = n1:

```
date2 = INTNX( interval, date1, n1, align );
n2 = INTCK( interval, date1, date2 );
```

The preceding example uses the DISCRETE method of the INTCK function by default. The result n2 = n1 does not always apply when the CONTINUOUS method of the INTCK function is specified.

INTSEAS( 'interval' <, seasonality> )

returns the length of the seasonal cycle for the specified date or datetime interval. The length of a seasonal cycle is the number of intervals in a seasonal cycle. For example, when the interval for a time series is described as monthly, many procedures use the option INTERVAL=MONTH to indicate that each observation in the data corresponds to a particular month. Monthly data are considered to be periodic for a one-year seasonal cycle. There are 12 months in one year, so the number of intervals (months) in a seasonal cycle (year) is 12. For quarterly data, there are 4 quarters in one year, so the number of intervals in a seasonal cycle is 4. The periodicity is not always one year. For example, INTERVAL=DAY is considered to have a seasonal cycle of one week, and because there are 7 days in a week, the number of intervals in a seasonal cycle is 7.

You can specify the optional seasonality argument to use a seasonal cycle other than the default seasonal cycle. For example, INTSEAS(‘MONTH’, 3) and INTSEAS(‘MONTH’, ‘QTR’) both specify a quarterly seasonal cycle and return the value 3. If the optional seasonality argument is numeric, it is the seasonal frequency. If the optional seasonality argument is character, it is the seasonal cycle.

INTSHIFT( 'interval' )

returns the shift interval that applies to the shift index if a subperiod is specified. For example, YEAR intervals are shifted by MONTH, so INTSHIFT(‘YEAR’) returns ‘MONTH’.

INTTEST( 'interval' )

returns 1 if the interval name is valid, 0 otherwise. For example, VALID = INTTEST(’MONTH’); should set VALID to 1, while VALID = INTTEST(’NOTANINTERVAL’); should set VALID to 0. The INTTEST function can be useful in verifying which values of multiplier n and the shift index s are valid in constructing an interval name.

JULDATE( date )

returns the Julian date from a SAS date value. The format of the Julian date is either yyddd or yyyyddd depending on the value of the system option YEARCUTOFF=. For example, using the default system option values, JULDATE( ’31DEC1999’D ); returns 99365, while JULDATE( ’31DEC1899’D ); returns 1899365.

MDY( month, day, year )

returns a SAS date value for month, day, and year values.

MINUTE( datetime )

returns the minute from a SAS time or datetime value.

MONTH( date )

returns the numerical value for the month of the year from a SAS date value. For example, MONTH=MONTH('01JAN2000'D); returns 1, the numerical value for January.

NWKDOM( n, weekday, month, year )

returns a SAS date value for the nth weekday of the month and year specified. For example, Thanksgiving is always the fourth (n=4) Thursday (weekday=5) in November (month=11). Thus THANKS2000 = NWKDOM( 4, 5, 11, 2000); returns the SAS date value for Thanksgiving in the year 2000. The last weekday of a month can be specified by using n=5. Memorial Day in the United States is the last (n=5) Monday (weekday=2) in May (month=5), and so MEMORIAL2002 = NWKDOM( 5, 2, 5, 2002); returns the SAS date value for Memorial Day in 2002. Because n=5 always specifies the last occurrence of the month and most months have only 4 instances of each day, the result for n=5 is often the same as the result for n=4. NWKDOM is useful for calculating the SAS date values of holidays that are defined in this manner.

QTR( date )

returns the quarter of the year from a SAS date value.

SECOND( datetime )

returns the second from a SAS time or datetime value.

TIME()

returns the current time of day.

TIMEPART( datetime )

returns the time part of a SAS datetime value.

TODAY()

returns the current date as a SAS date value. (TODAY is another name for the DATE function.)

WEEK( date <, 'descriptor'> )

returns the week of year from a SAS date value. The algorithm used to calculate the week depends on the descriptor, which can take the value ‘U’, ‘V’, or ‘W’.

If the descriptor is 'U', weeks start on Sunday and the range is 0 to 53. If weeks 0 and 53 exist, they are only partial weeks. Week 52 can be a partial week.

If the descriptor is 'V', the result is equivalent to the ISO 8601 week of year definition. The range is 1 to 53. Week 53 is a leap week. The first week of the year, Week 1, and the last week of the year, Week 52 or 53, can include days in another Gregorian calendar year.

If the descriptor is 'W', weeks start on Monday and the range is 0 to 53. If weeks 0 and 53 exist, they are only partial weeks. Week 52 can be a partial week.

WEEKDAY( date )

returns the day of the week from a SAS date value. For example, WEEKDAY=WEEKDAY('17OCT1991'D); returns 5, the numerical value for Thursday.

YEAR( date )

returns the year from a SAS date value.

YYQ( year, quarter )

returns a SAS date value for year and quarter values.
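Several of the functions listed above can be combined in one DATA step. The following is a minimal sketch that builds a date and a datetime value and then takes them apart again; all variable names are assumptions chosen for illustration.

```sas
DATA _NULL_;
  d  = MDY(10, 17, 1991);            /* SAS date value for 17 October 1991 */
  dt = DHMS(d, 13, 30, 0);           /* SAS datetime value for 13:30:00 on that day */
  y  = YEAR(d);                      /* calendar parts of the date value */
  m  = MONTH(d);
  dd = DAY(d);
  q  = QTR(d);
  wd = WEEKDAY(d);                   /* 1 = Sunday, so Thursday is 5 */
  same_day = (DATEPART(dt) = d);     /* 1 when the date part matches d */
  h  = HOUR(dt);
  mi = MINUTE(dt);
  PUT d= DATE9. dt= DATETIME19.;
  PUT y= m= dd= q= wd= same_day= h= mi=;
RUN;
```

The value of wd should agree with the WEEKDAY example above, which uses the same date.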

### Arithmetic Functions

| Function | Description |
| --- | --- |
| ABS(argument) | returns absolute value |
| DIM(array-name) | returns the number of elements in a one-dimensional array or the number of elements in a specified dimension of a multidimensional array. n (written as a suffix, as in DIMn(array-name)) specifies the dimension, in a multidimensional array, for which you want to know the number of elements. |
| DIM(array-name,bound-n) | returns the number of elements in a one-dimensional array or the number of elements in the specified dimension of a multidimensional array. bound-n specifies the dimension, in a multidimensional array, for which you want to know the number of elements. |
| HBOUND(array-name) | returns the upper bound of an array |
| HBOUND(array-name,bound-n) | returns the upper bound of an array |
| LBOUND(array-name) | returns the lower bound of an array |
| LBOUND(array-name,bound-n) | returns the lower bound of an array |
| MAX(argument,argument, …) | returns the largest value of the numeric arguments |
| MIN(argument,argument, …) | returns the smallest value of the numeric arguments |
| MOD(argument-1, argument-2) | returns the remainder |
| SIGN(argument) | returns the sign of a value or 0 |
| SQRT(argument) | returns the square root |
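A short sketch of the array and arithmetic functions follows. The array g, its bounds, and its initial values are assumptions chosen for illustration.

```sas
DATA _NULL_;
  ARRAY g{2:4} g2-g4 (10 20 30);   /* one-dimensional array indexed from 2 to 4 */
  n_elem = DIM(g);                 /* number of elements: 3 */
  lo     = LBOUND(g);              /* lower bound: 2 */
  hi     = HBOUND(g);              /* upper bound: 4 */
  r      = MOD(17, 5);             /* remainder of 17 / 5: 2 */
  s      = SIGN(-3.2);             /* -1 for a negative value */
  a      = ABS(-3.2);              /* 3.2 */
  PUT n_elem= lo= hi= r= s= a=;
RUN;
```

Using LBOUND and HBOUND as loop limits, rather than 1 and DIM, keeps a DO loop correct when an array does not start at index 1.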

### Character Functions

| Function | Description |
| --- | --- |
| BYTE(n) | returns one character in the ASCII or EBCDIC collating sequence, where n is an integer representing a specific ASCII or EBCDIC character |
| COLLATE(start-position<,end-position>) \| (start-position<,,length>) | returns an ASCII or EBCDIC collating sequence character string |
| COMPBL(source) | removes multiple blanks between words in a character string |
| COMPRESS(source<,characters-to-remove>) | removes specific characters from a character string |
| DEQUOTE(argument) | removes quotation marks from a character value |
| INDEX(source,excerpt) | searches the source for the character string specified by the excerpt |
| INDEXC(source,excerpt-1<, … excerpt-n>) | searches the source for any character present in the excerpt |
| INDEXW(source,excerpt) | searches the source for a specified pattern as a word |
| LEFT(argument) | left-aligns a SAS character string |
| LENGTH(argument) | returns the length of an argument |
| LOWCASE(argument) | converts all letters in an argument to lowercase |
| QUOTE(argument) | adds double quotation marks to a character value |
| RANK(x) | returns the position of a character in the ASCII or EBCDIC collating sequence |
| REPEAT(argument,n) | repeats a character expression |
| REVERSE(argument) | reverses a character expression |
| RIGHT(argument) | right-aligns a character expression |
| SCAN(argument,n<,delimiters>) | returns a given word from a character expression |
| SOUNDEX(argument) | encodes a string to facilitate searching |
| SUBSTR(argument,position<,n>)=characters-to-replace | replaces character value contents |
| var=SUBSTR(argument,position<,n>) | extracts a substring from an argument. (var is any valid SAS variable name.) |
| TRANSLATE(source,to-1,from-1<,…to-n,from-n>) | replaces specific characters in a character expression |
| TRANWRD(source,target,replacement) | replaces or removes all occurrences of a word in a character string |
| TRIM(argument) | removes trailing blanks from a character expression and returns one blank if the expression is missing |
| TRIMN(argument) | removes trailing blanks from a character expression and returns a null string if the expression is missing |
| UPCASE(argument) | converts all letters in an argument to uppercase |
| VERIFY(source,excerpt-1<,…excerpt-n>) | returns the position of the first character unique to an expression |
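A few of these functions are easier to follow with concrete strings. In the sketch below, the input text and all variable names are assumptions chosen for illustration.

```sas
DATA _NULL_;
  LENGTH s $ 30;
  s = 'alpha   beta gamma';
  squeezed = COMPBL(s);                    /* 'alpha beta gamma' */
  second   = SCAN(squeezed, 2, ' ');       /* second word: 'beta' */
  pos      = INDEX(squeezed, 'beta');      /* starting position: 7 */
  piece    = SUBSTR(squeezed, 1, 5);       /* 'alpha' */
  swapped  = TRANSLATE('abc', 'x', 'a');   /* 'xbc' */
  upper    = UPCASE(second);               /* 'BETA' */
  PUT squeezed= second= pos= piece= swapped= upper=;
RUN;
```

Note the argument order of TRANSLATE: the replacement character (to) comes before the character being replaced (from).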

### Date and Time Functions

| Function | Description |
| --- | --- |
| DATDIF(sdate,edate,basis) | returns the number of days between two dates |
| DATE() | returns the current date as a SAS date value |
| DATEJUL(julian-date) | converts a Julian date to a SAS date value |
| DATEPART(datetime) | extracts the date from a SAS datetime value |
| DATETIME() | returns the current date and time of day |
| DAY(date) | returns the day of the month from a SAS date value |
| DHMS(date,hour,minute,second) | returns a SAS datetime value from date, hour, minute, and second |
| HMS(hour,minute,second) | returns a SAS time value from hour, minute, and second |
| HOUR( | |

### Mathematical Functions

| Function | Description |
| --- | --- |
| AIRY(x) | returns the value of the Airy function |
| DAIRY(x) | returns the derivative of the Airy function |
| DIGAMMA(argument) | returns the value of the digamma function |
| ERF(argument) | returns the value of the (normal) error function |
| ERFC(argument) | returns the value of the complementary (normal) error function |
| EXP(argument) | returns the value of the exponential function |
| GAMMA(argument) | returns the value of the gamma function |
| IBESSEL(nu,x,kode) | returns the value of the modified Bessel function |
| JBESSEL(nu,x) | returns the value of the Bessel function |
| LGAMMA(argument) | returns the natural logarithm of the gamma function |
| LOG(argument) | returns the natural (base e) logarithm |
| LOG2(argument) | returns the logarithm to the base 2 |
| LOG10(argument) | returns the logarithm to the base 10 |
| TRIGAMMA(argument) | returns the value of the trigamma function |

### Noncentrality Functions

| Function | Description |
| --- | --- |
| CNONCT(x,df,prob) | returns the noncentrality parameter from a chi-squared distribution |
| FNONCT(x,ndf,ddf,prob) | returns the value of the noncentrality parameter of an F distribution |
| TNONCT(x,df,prob) | returns the value of the noncentrality parameter from the Student’s t distribution |

### Probability and Density Functions

| Function | Description |
| --- | --- |
| CDF('dist',quantile,parm-1,…,parm-k) | computes cumulative distribution functions |
| LOGPDF\|LOGPMF('dist',quantile,parm-1,…,parm-k) | computes the logarithm of a probability density (mass) function. The two functions are identical. |
| LOGSDF('dist',quantile,parm-1,…,parm-k) | computes the logarithm of a survival function |
| PDF\|PMF('dist',quantile,parm-1,…,parm-k) | computes probability density (mass) functions |
| POISSON(m,n) | returns the probability from a Poisson distribution |
| PROBBETA(x,a,b) | returns the probability from a beta distribution |
| PROBBNML(p,n,m) | returns the probability from a binomial distribution |
| PROBCHI(x,df<,nc>) | returns the probability from a chi-squared distribution |
| PROBF(x,ndf,ddf<,nc>) | returns the probability from an F distribution |
| PROBGAM(x,a) | returns the probability from a gamma distribution |
| PROBHYPR(N,K,n,x<,r>) | returns the probability from a hypergeometric distribution |
| PROBMC | returns probabilities and critical values (quantiles) from various distributions for multiple comparisons of the means of several groups |
| PROBNEGB(p,n,m) | returns the probability from a negative binomial distribution |
| PROBBNRM(x,y,r) | returns the probability from the standardized bivariate normal distribution |
| PROBNORM(x) | returns the probability from the standard normal distribution |
| PROBT(x,df<,nc>) | returns the probability from a Student’s t distribution |
| SDF('dist',quantile,parm-1,…,parm-k) | computes a survival function |

### Quantile Functions

| Function | Description |
| --- | --- |
| BETAINV(p,a,b) | returns a quantile from the beta distribution |
| CINV(p,df<,nc>) | returns a quantile from the chi-squared distribution |
| FINV(p,ndf,ddf<,nc>) | returns a quantile from the F distribution |
| GAMINV(p,a) | returns a quantile from the gamma distribution |
| PROBIT(p) | returns a quantile from the standard normal distribution |
| TINV(p,df<,nc>) | returns a quantile from the t distribution |
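The quantile functions invert the corresponding probability functions. The sketch below illustrates the pairing for the standard normal and t distributions; the variable names, the cutoff 1.96, and the 10 degrees of freedom are assumptions chosen for illustration.

```sas
DATA _NULL_;
  p      = PROBNORM(1.96);     /* P(Z <= 1.96), roughly 0.975 */
  z      = PROBIT(p);          /* inverse: recovers approximately 1.96 */
  t_crit = TINV(0.975, 10);    /* 97.5th percentile of t with 10 df */
  p_back = PROBT(t_crit, 10);  /* inverse: recovers approximately 0.975 */
  PUT p= z= t_crit= p_back=;
RUN;
```

Round trips like these are only accurate to floating-point precision, so compare results with a tolerance rather than with exact equality.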

### Sample Statistics Functions

| Function | Description |
| --- | --- |
| CSS(argument,argument,…) | returns the corrected sum of squares |
| CV(argument,argument,…) | returns the coefficient of variation |
| KURTOSIS(argument,argument,…) | returns the kurtosis (or 4th moment) |
| MAX(argument,argument, …) | returns the largest value |
| MIN(argument,argument, …) | returns the smallest value |
| MEAN(argument,argument, …) | returns the arithmetic mean (average) |
| MISSING(numeric-expression \| character-expression) | returns a numeric result that indicates whether the argument contains a missing value |
| N(argument,argument, …) | returns the number of nonmissing values |
| NMISS(argument,argument, …) | returns the number of missing values |
| ORDINAL(count,argument,argument,…) | returns the largest value of a part of a list |
| RANGE(argument,argument,…) | returns the range of values |
| SKEWNESS(argument,argument,argument,…) | returns the skewness |
| STD(argument,argument,…) | returns the standard deviation |
| STDERR(argument,argument,…) | returns the standard error of the mean |
| SUM(argument,argument,…) | returns the sum |
| USS(argument,argument,…) | returns the uncorrected sum of squares |
| VAR(argument,argument,…) | returns the variance |
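These functions work across their arguments within one observation and skip missing values, which differs from ordinary arithmetic operators. A minimal sketch, with assumed variable names and values:

```sas
DATA _NULL_;
  a = 1; b = .; c = 3;              /* b is a missing value */
  total     = SUM(a, b, c);         /* 4: the missing value is ignored */
  plus      = a + b + c;            /* missing: the + operator propagates it */
  count     = N(a, b, c);           /* 2 nonmissing values */
  n_missing = NMISS(a, b, c);       /* 1 missing value */
  avg       = MEAN(a, b, c);        /* 2: mean of the nonmissing values */
  PUT total= plus= count= n_missing= avg=;
RUN;
```

SAS also writes a note to the log about missing values generated by the + expression.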

### State and ZIP Code Functions

FIPNAME(expression)

converts FIPS codes to uppercase state names

FIPNAMEL(expression)

converts FIPS codes to mixed case state names

FIPSTATE(expression)

converts FIPS codes to two-character postal codes

STFIPS(postal-code)

converts state postal codes to FIPS state codes

STNAME(postal-code)

converts state postal codes to uppercase state names

 Tip: For Version 6, the maximum length of the value that is returned is 200 characters. For Version 7 and beyond, the maximum length is 20 characters.

STNAMEL(postal-code)

converts state postal codes to mixed case state names

 Tip: For Version 6, the maximum length of the value that is returned is 200 characters. For Version 7 and beyond, the maximum length is 20 characters.

ZIPFIPS(zip-code)

converts ZIP codes to FIPS state codes

ZIPNAME(zip-code)

converts ZIP codes to uppercase state names

ZIPNAMEL(zip-code)

converts ZIP codes to mixed case state names

ZIPSTATE(zip-code)

converts ZIP codes to state postal codes
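The FIPS and postal-code functions convert between the same set of identifiers, so they can be chained. The sketch below uses North Carolina, whose FIPS state code is 37; the variable names are assumptions chosen for illustration.

```sas
DATA _NULL_;
  fips   = STFIPS('NC');      /* postal code to FIPS state code: 37 */
  upper  = FIPNAME(fips);     /* FIPS code to uppercase name */
  mixed  = FIPNAMEL(fips);    /* FIPS code to mixed case name */
  postal = FIPSTATE(fips);    /* FIPS code back to the postal code */
  PUT fips= upper= mixed= postal=;
RUN;
```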

### Trigonometric and Hyperbolic Functions

| Function | Description |
| --- | --- |
| ARCOS(argument) | returns the arccosine |
| ARSIN(argument) | returns the arcsine |
| ATAN(argument) | returns the arctangent |
| COS(argument) | returns the cosine |
| COSH(argument) | returns the hyperbolic cosine |
| SIN(argument) | returns the sine |
| SINH(argument) | returns the hyperbolic sine |
| TAN(argument) | returns the tangent |
| TANH(argument) | returns the hyperbolic tangent |

### Truncation Functions

| Function | Description |
| --- | --- |
| CEIL(argument) | returns the smallest integer that is greater than or equal to the argument |
| FLOOR(argument) | returns the largest integer that is less than or equal to the argument |
| FUZZ(argument) | returns the nearest integer if the argument is within 1E-12 of that integer |
| INT(argument) | returns the integer value |
| ROUND(argument,round-off-unit) | rounds to the nearest round-off unit |
| TRUNC(number, length) | truncates a numeric value to a specified length |
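The differences among these functions show up most clearly with a negative argument. A minimal sketch, with assumed variable names and values:

```sas
DATA _NULL_;
  x = -2.5;
  up      = CEIL(x);            /* -2: smallest integer >= x */
  down    = FLOOR(x);           /* -3: largest integer <= x */
  toward0 = INT(x);             /* -2: fractional part dropped */
  r       = ROUND(2.567, 0.01); /* 2.57: nearest multiple of 0.01 */
  PUT up= down= toward0= r=;
RUN;
```

For positive arguments INT agrees with FLOOR; for negative arguments it agrees with CEIL.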

### Variable Information Functions

| Function | Description |
| --- | --- |
| GETVARC(data-set-id,var-num) | returns the value of a SAS data set character variable |
| GETVARN(data-set-id,var-num) | returns the value of a SAS data set numeric variable |
| VARFMT(data-set-id,var-num) | returns the format assigned to a SAS data set variable |
| VARINFMT(data-set-id,var-num) | returns the informat assigned to a SAS data set variable |
| VARLABEL(data-set-id,var-num) | returns the label assigned to a SAS data set variable |
| VARLEN(data-set-id,var-num) | returns the length of a SAS data set variable |
| VARNAME(data-set-id,var-num) | returns the name of a SAS data set variable |
| VARNUM(data-set-id,var-name) | returns the number of a SAS data set variable’s position in a SAS data set |
| VARRAY(name) | returns a value that indicates whether the specified name is an array |
| VARRAYX(expression) | returns a value that indicates whether the value of the specified argument is an array |
| VARTYPE(data-set-id,var-num) | returns the data type of a SAS data set variable |
| VFORMAT(var) | returns the format associated with the given variable |
| VFORMATD(var) | returns the format decimal value associated with the given variable |
| VFORMATDX(expression) | returns the format decimal value associated with the value of the specified argument |
| VFORMATN(var) | returns the format name associated with the given variable |
| VFORMATNX(expression) | returns the format name associated with the value of the specified argument |
| VFORMATW(var) | returns the format width associated with the given variable |
| VFORMATWX(expression) | returns the format width associated with the value of the specified argument |
| VFORMATX(expression) | returns the format associated with the value of the specified argument |
| VINARRAY(var) | returns a value that indicates whether the given variable is a member of an array |
| VINARRAYX(expression) | returns a value that indicates whether the value of the specified argument is a member of an array |
| VINFORMAT(var) | returns the informat associated with the given variable |
| VINFORMATD(var) | returns the informat decimal value associated with the given variable |
| VINFORMATDX(expression) | returns the informat decimal value associated with the value of the specified argument |
| VINFORMATN(var) | returns the informat name associated with the given variable |
| VINFORMATNX(expression) | returns the informat name associated with the value of the specified argument |
| VINFORMATW(var) | returns the informat width associated with the given variable |
| VINFORMATWX(expression) | returns the informat width associated with the value of the specified argument |
| VINFORMATX(expression) | returns the informat associated with the value of the specified argument |
| VLABEL(var) | returns the label associated with the given variable |
| VLABELX(expression) | returns the variable label for the value of a specified argument |
| VLENGTH(var) | returns the compile-time (allocated) size of the given variable |
| VLENGTHX(expression) | returns the compile-time (allocated) size for the value of the specified argument |
| VNAME(var) | returns the name of the given variable |
| VNAMEX(expression) | validates the value of the specified argument as a variable name |
| VTYPE(var) | returns the type (character or numeric) of the given variable |
| VTYPEX(expression) | returns the type (character or numeric) for the value of the specified argument |
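The V-prefixed functions inspect variables of the running DATA step. The sketch below queries the type, allocated length, and name of two variables; the variable names and values are assumptions chosen for illustration.

```sas
DATA _NULL_;
  LENGTH name $ 5;
  name  = 'Park';
  score = 87.5;
  type_name  = VTYPE(name);     /* 'C' for a character variable */
  type_score = VTYPE(score);    /* 'N' for a numeric variable */
  len_name   = VLENGTH(name);   /* allocated length: 5 */
  len_score  = VLENGTH(score);  /* default numeric length: 8 */
  var_name   = VNAME(score);    /* the variable's name as text */
  PUT type_name= type_score= len_name= len_score= var_name=;
RUN;
```

VLENGTH reports the allocated size, so len_name is 5 even though 'Park' has only four characters.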

Regards,

SAS INDIA

### Bill Gates is putting his own money into a small impact-investing fund focused on India

Bill Gates arguably ushered in the current golden age of philanthropy. Now the world’s richest man has endorsed another way to put money to work for good: impact investing.

It’s safe to say Gates doesn’t need the market-rate returns expected from his new venture-capital investment in Unitus Seed Fund. His modest commitment closes a \$20 million US-Indian fund through which Unitus has taken early stakes in more than a dozen for-profit startups providing health, education and livelihoods, for Indian families living on less than \$10 a day. (Impact investments are intended to generate—and measure and report—social and environmental impact alongside a financial return.)

The investment puts Gates on one side of a debate that has divided his fellow tech titans and billionaires, and now apparently separates the Microsoft co-founder from his friend Warren Buffett. Should private investors back businesses with explicit social and environmental missions and metrics?

Buffett and Gates are co-founders of the Giving Pledge, which has signed up more than 125 billionaires to give away at least half of their fortunes. But Buffett has favored the traditional separation of business and charity. “I think it’s tough to serve two masters,” he told a conference last year. “I would rather have the investment produce the capital and then have an organization totally focused on the philanthropic aspects.”

Marc Andreessen, the Silicon Valley venture capitalist who challenged Gates in the Netscape-Microsoft Web browser wars of the 1990s, has also been critical of the idea. Two years ago, Andreessen said “I would run screaming from a B Corp,” or for-benefit company that adopts explicit social goals, which he said are distractions for startups.

“The split model makes me nervous and I don’t think we would ever touch that,” Andreessen said on a panel. “It’s like a houseboat. It’s not a great house and not a great boat.”

Will Poole, co-founder and managing director of Unitus, recently spent several days on a houseboat in Kerala, in India’s southwest. “It was an excellent boat and a fine house,” Poole said, “and we provided local economic development at the same time.”

Unitus, with headquarters in Seattle and Bangalore, in its first year has made 14 investments of generally between \$100,000 and \$250,000 in companies such as Smile Merchants, which operates low-income dental clinics near Mumbai, and Hippocampus Learning Centers, a network of private kindergartens serving more than 6,000 rural and low-income students.

Poole, who spent 13 years at Microsoft, got Gates on board after gaining commitments from other high- and ultra-high net worth investors. Unitus has attracted 15 Indian nationals and more than a dozen non-resident Indians, including venture capitalist Vinod Khosla, Romesh Wadhwani, founder of Aspect Development and Steve Singh, CEO of Concur Technologies, which SAP recently agreed to buy for more than \$8 billion.

Gates’ investment in Unitus comes out of a personal fund, not from the Gates Foundation, which has set aside \$1 billion to provide equity, loans and loan guarantees to for-profit companies. Gates personally has made a number of food and energy investments, including Hampton Creek Foods and EcoMotors, a low-emission engine maker. But Unitus is apparently his first investment in a fund or company targeting so-called base-of-the-pyramid customers, the poor in the developing world.

“Impact investing is a powerful model with the potential to build markets and drive change for the people who need it most,” Gates said in a statement confirming the investment in Unitus.

Some of the naysayers may be softening their positions. Andreessen’s wife, Laura Arrillaga-Andreessen, is the founder of a social venture-capital firm and earlier this year, Andreessen’s firm, Andreessen Horowitz, invested in AltSchool, a network of micro-schools offering personalized education for children…that is in the process of becoming a certified B Corp.

### SAS Global Certification Program

Regards,

SAS INDIA

### MISSING(), NMISS() and the CMISS() functions

SAS provides several functions to test for missing values; this post focuses on MISSING(), CMISS(), and NMISS(). The NMISS() function is intended for numeric variables, while MISSING() and CMISS() can be used with either character or numeric variables. CMISS() and NMISS() count the number of arguments with missing values, whereas MISSING() checks whether a single value is missing. Together they offer a simple way to check for missing values, and they let you write fewer lines of code by replacing long IF statements when you need to check several variables at once.

The MISSING() function is useful when you need to check whether a variable has a missing value but are not sure whether it is character or numeric. It works for both types and also detects the special numeric missing values (.A, .B, .C, ._, etc.). MISSING() returns a numeric result: 1 if the value is missing and 0 if it is present. In a condition, MISSING(varname) is the same as MISSING(varname)=1, while MISSING(varname)=0 tests that the value is present.
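As a minimal sketch of the behaviour described above, the following DATA step applies MISSING() to a character and two numeric variables. The variable names (`name`, `score`, `age`) and the flag variables are assumptions chosen for illustration.

```sas
data _null_;
   length name $8;
   name  = ' ';   /* character missing value */
   score = .;     /* ordinary numeric missing value */
   age   = 42;    /* non-missing value */

   name_flag  = missing(name);    /* 1 */
   score_flag = missing(score);   /* 1 */
   age_flag   = missing(age);     /* 0 */

   put name_flag= score_flag= age_flag=;

   /* MISSING() can be used directly as a condition */
   if missing(score) then put 'score is missing';
run;
```

The same call works for both variable types, so the test does not need to change if a variable's type changes later.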

The MISSING function is particularly useful if you use special missing values since ‘if varname=.’ will not identify all missing values in such cases.

NOTE: SAS represents missing values differently for numeric and character variables. A single period (.) represents the numeric missing value. A single blank enclosed in single or double quotes (' ' or " ") represents the character missing value. A single period followed by a single letter or an underscore (e.g., .A, .B, .Z, ._) represents a special numeric missing value. Special missing values are available for numeric variables only.
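The difference between comparing with a period and calling MISSING() can be seen with a special missing value. This is a small sketch; the variable `x` and the result variables are hypothetical.

```sas
data _null_;
   x = .A;                  /* special numeric missing value */

   eq_dot  = (x = .);       /* 0: .A is not equal to the ordinary missing value */
   is_miss = missing(x);    /* 1: MISSING() recognizes special missing values */
   le_z    = (x <= .Z);     /* 1: .Z sorts highest among the numeric missing values */

   put eq_dot= is_miss= le_z=;
run;
```

The comparison `x <= .Z` is an alternative idiom that also catches every numeric missing value, but MISSING() states the intent more directly and also works for character variables.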

The NMISS() function counts the number of arguments with missing values in the specified list of numeric variables. It is useful, for example, when you want to confirm that at least one variable in the list is not missing: the count must then be smaller than the number of variables in the list.

The CMISS() function is available in SAS 9.2 and SAS Enterprise Guide 4.3 and is similar to NMISS(). The only difference is that it counts the number of missing arguments among both character and numeric variables.

To summarize the difference: NMISS() works with multiple numeric values, whereas MISSING() works with only one value, which can be either numeric or character.
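One way to turn the NMISS() count into simple checks is sketched below, assuming three hypothetical numeric variables `a`, `b`, and `c`.

```sas
data _null_;
   a = 1;
   b = .;
   c = .;

   n_missing   = nmiss(a, b, c);         /* 2 */
   any_present = (nmiss(a, b, c) < 3);   /* 1: at least one of the three is not missing */
   all_present = (nmiss(a, b, c) = 0);   /* 0: not all values are present */

   put n_missing= any_present= all_present=;
run;
```

Comparing the count with the number of arguments answers "is anything present?", while comparing it with zero answers "is everything present?".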
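A short sketch of CMISS() with mixed variable types follows. The variables `city`, `age`, and `income` are assumptions for illustration only.

```sas
data _null_;
   length city $10;
   city   = ' ';      /* character missing value */
   age    = .;        /* numeric missing value */
   income = 50000;    /* non-missing value */

   n_missing = cmiss(city, age, income);   /* 2 */
   put n_missing=;

   /* flag an observation that has any missing value */
   if cmiss(city, age, income) > 0 then put 'at least one value is missing';
run;
```

Because CMISS() accepts both types, a single call can replace separate checks for the character and numeric variables.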

Examples:

* count the number of the variables A, B, and C which have missing values;

count=nmiss(A, B, C);

count=nmiss(of A B C);

* count the number of the variables from Var1 to Var10 which have missing values;

count=nmiss(of var1-var10);

Examples with literal values (the result is shown below each statement):

x1=nmiss(1,0,.,2,5,.);

2

x2=nmiss(1,0);

0

x3=nmiss(of x1-x2);

0