Conjoint analysis measures consumer preferences for products with multiple characteristics (attributes), each of which takes one of several levels. For example, a researcher might want to learn consumer preferences for a coffee maker with three characteristics: price (three levels), number of cups brewed (three levels), and timed start (yes or no). The task is to determine which of the 3 x 3 x 2 = 18 combinations of characteristics consumers prefer most.
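The number of product profiles in the coffee-maker example can be checked by enumerating the full factorial. The sketch below is in Python rather than SAS, purely as an illustration; the level labels are hypothetical, and only the level counts (3, 3, and 2) come from the text.

```python
from itertools import product

# Hypothetical level labels; only the number of levels per attribute is from the text.
prices = ["low", "medium", "high"]   # price: three levels
cups = [4, 8, 12]                    # number of cups brewed: three levels
timed_start = ["yes", "no"]          # timed start: two levels

# Every combination of one level per attribute is one candidate product profile.
profiles = list(product(prices, cups, timed_start))

print(len(profiles))  # 3 * 3 * 2 = 18
```

Each tuple in `profiles` is one hypothetical product that a subject could be asked to rate or rank.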
The Conjoint Model
Conjoint analysis is based on a main-effects analysis-of-variance model. Data are collected by asking subjects about their preferences for hypothetical products defined by attribute combinations. Conjoint analysis decomposes the judgment data into components, based on qualitative attributes of the products. A numerical utility, or part-worth utility, is computed for each level of each attribute. Large utilities are assigned to the most preferred levels, and small utilities are assigned to the least preferred levels. The attributes with the largest utility range are considered the most important in predicting preference. Conjoint analysis is a statistical model with an error term and a loss function.
Metric conjoint analysis models the judgments directly. When all of the attributes are nominal, metric conjoint analysis is a simple main-effects ANOVA with some specialized output. The attributes are the independent variables, the judgments comprise the dependent variable, and the utilities are the parameter estimates from the ANOVA model. The following is a metric conjoint analysis model for three factors:
$$y_{ijk} = \mu + \beta_{1i} + \beta_{2j} + \beta_{3k} + \varepsilon_{ijk}$$
This model could be used, for example, to investigate preferences for cars that differ on three attributes: mileage, expected reliability, and price. $y_{ijk}$ is one subject’s stated preference for a car with the ith level of mileage, the jth level of expected reliability, and the kth level of price. The grand mean is $\mu$, the part-worth utilities are $\beta_{1i}$, $\beta_{2j}$, and $\beta_{3k}$, and the error is $\varepsilon_{ijk}$.
Nonmetric conjoint analysis finds a monotonic transformation of the preference judgments. The model, which follows directly from conjoint measurement, iteratively fits the ANOVA model until the transformation stabilizes. The $R^2$ increases during every iteration until convergence, when the change in $R^2$ is essentially zero. The following is a nonmetric conjoint analysis model for three factors, where $\Phi$ denotes the monotonic transformation of the judgments:
$$\Phi(y_{ijk}) = \mu + \beta_{1i} + \beta_{2j} + \beta_{3k} + \varepsilon_{ijk}$$
The $R^2$ for a nonmetric conjoint analysis model will always be greater than or equal to the $R^2$ from a metric analysis of the same data. The smaller $R^2$ in metric conjoint analysis is not necessarily a disadvantage, since results should be more stable and reproducible with the metric model. Metric conjoint analysis was derived from nonmetric conjoint analysis as a special case. Today, metric conjoint analysis is used more often than nonmetric conjoint analysis. In the SAS System, conjoint analysis is performed with the SAS/STAT procedure TRANSREG (transformation regression). Metric conjoint analysis models are fit using ordinary least squares, and nonmetric conjoint analysis models are fit using an alternating least squares algorithm.
SAS Program Statements
OPTIONS PAGENO=1 PAGESIZE=56 NOLABEL;
*
* Define a data set named TIRES.
* The variable RANK typically would be the average across all subjects.
*;
DATA TIRES;
INPUT BRAND 1 PRICE 3 LIFE 5 HAZARD 7 RANK 9-10;
CARDS;
1 1 2 1 3
1 1 3 2 2
1 2 1 2 14
1 2 2 2 10
1 3 1 1 17
1 3 3 1 12
2 1 1 2 7
2 1 3 2 1
2 2 1 1 8
2 2 3 1 5
2 3 2 1 13
2 3 2 2 16
3 1 1 1 6
3 1 2 1 4
3 2 2 2 15
3 2 3 1 9
3 3 1 2 18
3 3 3 2 11
;
*
* Set up value labels.
*;
PROC FORMAT;
VALUE BRANDF
1 = 'GOODSTONE'
2 = 'PIROGI'
3 = 'MACHISMO';
VALUE PRICEF
1 = '$69.99'
2 = '$74.99'
3 = '$79.99';
VALUE LIFEF
1 = '50,000'
2 = '60,000'
3 = '70,000';
VALUE HAZARDF
1 = 'YES'
2 = 'NO';
PROC FREQ NOPRINT;
FORMAT BRAND BRANDF. PRICE PRICEF. LIFE LIFEF. HAZARD HAZARDF.;
*
* Conduct nonmetric (i.e., simple) conjoint analysis.
*;
PROC TRANSREG MAXITER=50 UTILITIES SHORT;
ODS SELECT TESTSNOTE CONVERGENCESTATUS FITSTATISTICS UTILITIES;
MODEL MONOTONE(RANK / REFLECT) = CLASS(BRAND PRICE LIFE HAZARD / ZERO=SUM);
OUTPUT IREPLACE PREDICTED;
*;
PROC PRINT LABEL;
VAR RANK TRANK PRANK BRAND PRICE LIFE HAZARD;
LABEL PRANK = 'PREDICTED RANKS';
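/*
 * Create the experimental design for a conjoint study with the MKTEX SAS macro.
 * (A block comment is used here because a macro call written with a percent
 * sign inside a statement-style comment can still be executed.)
 * The arguments in parentheses after MKTEX define:
 *   the number of levels for each attribute (3 3 3 2), and
 *   the number of combinations to be evaluated (N=18).
 * SEED= [some number] is not strictly necessary, but helps ensure a
 * reproducible design.
 */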
%MKTEX(3 3 3 2, N=18, SEED=448)
%MKTLAB(VARS = BRAND PRICE LIFE HAZARD, OUT=SASUSER.TIREDESIGN,
STATEMENTS = FORMAT BRAND BRANDF. PRICE PRICEF. LIFE LIFEF. HAZARD HAZARDF.)
%MKTEVAL;
PROC PRINT DATA=SASUSER.TIREDESIGN;
RUN;
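To make the part-worth idea concrete, the sketch below computes metric (not monotone-transformed) part-worths for the TIRES data. It is written in Python rather than SAS, purely as an illustration, and it is not what PROC TRANSREG does internally. It assumes that every pair of attributes in the 18 profiles is balanced, which appears to hold for this data set; under that assumption, main-effects least-squares estimates with sum-to-zero coding reduce to level means minus the grand mean. The function names are hypothetical.

```python
from collections import defaultdict

ATTRS = ["BRAND", "PRICE", "LIFE", "HAZARD"]

# The 18 TIRES profiles from the DATA step: BRAND PRICE LIFE HAZARD RANK.
RAW = """
1 1 2 1 3
1 1 3 2 2
1 2 1 2 14
1 2 2 2 10
1 3 1 1 17
1 3 3 1 12
2 1 1 2 7
2 1 3 2 1
2 2 1 1 8
2 2 3 1 5
2 3 2 1 13
2 3 2 2 16
3 1 1 1 6
3 1 2 1 4
3 2 2 2 15
3 2 3 1 9
3 3 1 2 18
3 3 3 2 11
"""
profiles = [tuple(int(v) for v in line.split()) for line in RAW.strip().splitlines()]


def part_worths(rows):
    """Level mean minus grand mean of the reflected ranks, per attribute."""
    n = len(rows)
    # Reflect the ranks so that larger values mean "more preferred" (rank 1 -> n).
    prefs = [n + 1 - row[-1] for row in rows]
    grand_mean = sum(prefs) / n
    result = {}
    for col, name in enumerate(ATTRS):
        by_level = defaultdict(list)
        for row, y in zip(rows, prefs):
            by_level[row[col]].append(y)
        result[name] = {lvl: sum(ys) / len(ys) - grand_mean
                        for lvl, ys in sorted(by_level.items())}
    return result


def importance(utilities):
    """Relative importance: each attribute's utility range as a percent of the total."""
    ranges = {name: max(u.values()) - min(u.values()) for name, u in utilities.items()}
    total = sum(ranges.values())
    return {name: 100 * r / total for name, r in ranges.items()}


utilities = part_worths(profiles)
imp = importance(utilities)
for name in ATTRS:
    levels = ", ".join(f"{lvl}: {u:+.3f}" for lvl, u in utilities[name].items())
    print(f"{name:7s} importance {imp[name]:5.1f}%  part-worths {levels}")
```

In this sketch PRICE has by far the largest utility range, so it would be read as the most important attribute for these ranks.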
Proc transreg performs transformation regression in which both the outcome and predictor(s) can be transformed and splines can be fit. Psplines are piecewise polynomials that can be used to estimate relationships that are difficult to fit with a single function.
On this page, we will walk through an example of proc transreg with the pspline option and explore its defaults. The bspline, spline, and pspline options, when similarly specified, yield the same results. Their differences lie in the number and type of transformed variables generated for estimation.
We can begin by creating a dataset with an outcome Y and a predictor X. This example data is generated in the SAS examples for proc transreg.
data a;
  x=-0.000001;
  do i=0 to 199;
    if mod(i,50)=0 then do;
      c=((x/2)-5)**2;
      if i=150 then c=c+5;
      y=c;
    end;
    x=x+0.1;
    y=y-sin(x-c);
    output;
  end;
run;

proc gplot data = a;
  plot y*x;
run;
Clearly, there is not a single, continuous function relating Y to X. The relationship does not appear random, but it does appear to change with X. Thus it makes sense to try to fit this with splines. Before running the proc transreg, we can see that our data contains four variables:
proc print data = a (obs = 5);
run;

Obs        X    I        C        Y
  1  0.10000    0  25.0000  24.7694
  2  0.20000    1  25.0000  24.4427
  3  0.30000    2  25.0000  24.0234
  4  0.40000    3  25.0000  23.5155
  5  0.50000    4  25.0000  22.9241
In the proc transreg command, we indicate in the model line that we wish to predict variable y without transformation with identity(y). If we wished to model a transformed version of y (the log or rank of y, for example), we would indicate the transformation here. To predict y, we indicate that we wish to use piecewise polynomial functions of x with pspline(x). We also opted to output a dataset, a2, containing predicted values from the model.
proc transreg data=a;
  model identity(y) = pspline(x);
  output out = a2 predicted;
run;

The TRANSREG Procedure

TRANSREG Univariate Algorithm Iteration History for Identity(Y)

Iteration    Average    Maximum                Criterion
   Number     Change     Change    R-Square       Change    Note
        1    0.00000    0.00000     0.46884                 Converged
We can see in the output above that the model converged and has an R-squared value of 0.47. Let’s look at the dataset output by proc transreg.
proc print data = a2 (obs = 5);
run;

Obs  _TYPE_  _NAME_        Y       TY       PY  Intercept      X_1      X_2
  1  SCORE   ROW1    24.7694  24.7694  24.1144          1  0.10000  0.01000
  2  SCORE   ROW2    24.4427  24.4427  23.4722          1  0.20000  0.04000
  3  SCORE   ROW3    24.0234  24.0234  22.8424          1  0.30000  0.09000
  4  SCORE   ROW4    23.5155  23.5155  22.2249          1  0.40000  0.16000
  5  SCORE   ROW5    22.9241  22.9241  21.6195          1  0.50000  0.25000

Obs      X_3  TIntercept     TX_1     TX_2     TX_3        X
  1  0.00100           1  0.10000  0.01000  0.00100  0.10000
  2  0.00800           1  0.20000  0.04000  0.00800  0.20000
  3  0.02700           1  0.30000  0.09000  0.02700  0.30000
  4  0.06400           1  0.40000  0.16000  0.06400  0.40000
  5  0.12500           1  0.50000  0.25000  0.12500  0.50000
In addition to the predicted values, py, the dataset contains a new variable, ty, holding the “transformed” value of y (since our transformation was the identity, these values are the same as y). Three variables (x_1, x_2, x_3) holding the first three powers of x have also been added. Transformations of these three variables and of the intercept are included as well, indicated with a “t” prefix. We can see that, by default, SAS fits a single third-degree polynomial in x to y. Note that although splines are often used to fit piecewise functions, the default setting when using pspline in proc transreg is to estimate just one function (zero knots).
We can plot the predicted values to see how closely they match the original data.
legend label=none value=('y' 'predicted y')
  position=(bottom left inside) mode=share down = 2;
proc gplot data = a2;
  plot (y py)*x / overlay legend = legend;
run;
For this simple example, we could achieve the same result by running an ordinary least squares regression after transforming x in the same manner as proc transreg.
data a3;
  set a;
  x2 = x*x;
  x3 = x*x*x;
run;

proc reg data = a3;
  model y = x x2 x3;
run;

The REG Procedure
Model: MODEL1
Dependent Variable: Y

Number of Observations Read    200
Number of Observations Used    200

Analysis of Variance

                             Sum of        Mean
Source             DF       Squares      Square    F Value    Pr > F
Model               3    7955.26078  2651.75359      57.67    <.0001
Error             196    9012.65604    45.98294
Corrected Total   199         16968

Root MSE           6.78107    R-Square    0.4688
Dependent Mean    12.04335    Adj R-Sq    0.4607
Coeff Var         56.30551

Parameter Estimates

                 Parameter    Standard
Variable    DF    Estimate       Error    t Value    Pr > |t|
Intercept    1    24.76908     1.95451      12.67      <.0001
X            1    -6.60903     0.84002      -7.87      <.0001
x2           1     0.62721     0.09698       6.47      <.0001
x3           1    -0.01513     0.00317      -4.77      <.0001
In this example, using proc transreg only saves us the step of generating variables. However, we may wish to fit more than one function in a piecewise regression or use more complicated transformations of x. Doing so with data and proc reg steps quickly becomes unmanageable or impossible, while doing so with proc transreg is effective and efficient.
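As a cross-check on the walk-through, the sketch below regenerates the example data and evaluates the single cubic polynomial that the default pspline fit amounts to. It is written in Python rather than SAS, purely as an illustration. The coefficient magnitudes are taken from the PROC REG listing, with signs chosen so that the predictions reproduce the PY column printed earlier; because they are rounded to five decimals, agreement is only approximate. The function names are hypothetical.

```python
import math


def make_data(n=200):
    """Mirror the DATA step: the offset c is reset every 50 rows and y accumulates -sin(x - c)."""
    rows = []
    x, c, y = -0.000001, 0.0, 0.0
    for i in range(n):
        if i % 50 == 0:
            c = (x / 2 - 5) ** 2
            if i == 150:
                c += 5
            y = c
        x += 0.1
        y -= math.sin(x - c)
        rows.append((x, y))
    return rows


# Rounded estimates for the intercept, x, x**2, and x**3 terms.
COEF = (24.76908, -6.60903, 0.62721, -0.01513)


def predict(x, coef=COEF):
    """Evaluate the third-degree polynomial b0 + b1*x + b2*x^2 + b3*x^3."""
    b0, b1, b2, b3 = coef
    return b0 + b1 * x + b2 * x ** 2 + b3 * x ** 3


data = make_data()
print("       x         y        py")
for x, y in data[:5]:
    print(f"{x:8.5f} {y:9.4f} {predict(x):9.4f}")
```

The first five rows should agree with the Y and PY columns shown above to roughly three decimal places.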
Basic Commands
• quit(); q()
• help(command); help.start()
• search(); help.search()
• dir(); methods()
• library(p); identify(); attach(); detach()
• remove(); rm()
• start:end; c(); rep(); seq()
• scan(); print(); str(); ls()
• cat(); cat("concatenate", c, "and print", "\t")
• options(prompt=".", continue="///", digits=10); getOption("width")
• source(); source.url() # run commands in a file
Simple examples
library() # list packages available
library(car) # load a package
list(data()) # list data sets in the current package
summary(Davis)
list(Davis)
list(Davis$weight)
stem(Davis[,2]) # equal to stem(Davis$weight)
stem(Davis$height, scale=4)
boxplot(Davis$weight)
w <- Davis$weight
h <- Davis$height
plot(w ~ h)
cor(Davis[,c(2:3)])
cor.test(w,h)
t.test(Davis[,2], mu=65)
t.test(Davis$height, Davis$weight, mu=100, paired=FALSE)
var.test(Davis$height, Davis$weight)
d <- read.csv("c:/temp/R/nes.csv", header=TRUE)
list(names(d)) # list variable names
Operators
• <- (left assignment), -> (right assignment)
• +, -, *, /, ^, %% (modulus)
• >, >=, <, <=, == (equal), != (not equal)
• & (and), | (or)
• %*% (matrix product); %/% (integer division)
• %o% (outer product); %x% (Kronecker product)
• %in% (matching operator)
Functions
• abs(); sin(); cos(); tan(); exp(); sqrt(); min(); max()
• log(); log(v,10); log10(); log2(); log(v, base=10)
• mean(); sum(); median(); range(); var(); sd()
• rank(); ave(v, group); by(group)
• c(a, b, c); c(start:end); seq(start:end); seq(10, 100, by=5)
• rep(n, time); rep(7, 3); rep(start:end, time)
• rep(1:3, c(2,2,2)); rep(1:3, each=2); rep(1:3, c(1:3))
• seq(1,4); seq(1,10, by=2); seq(0,1, length=10)
• length(), sort(), order(); rev(v) ## to reverse
• dnorm(1.96); dt(1.96, 100); df(1.96, 1, 100); dchisq(1.96, 10)
• pnorm(1.96); pt(1.96, 100); pf(1.96, 1, 100); pchisq(1.96, 10)
• rpois(n, lambda); rnorm(n); rt(n, df); rt(n, df=c(1:10)); rexp(n)
• substring(s, start, stop); substr(s, start, stop); nchar(s)
• date()
• mode() ## type of object
Reading Text Files
• source(f) # to execute commands in the file
• read.table(f); read.table.url(url)
• download.file(url); url.show(url)
• m <- read.table("f:/temp/cigar.txt", header=TRUE)
• m <- read.table('f:/temp/cigar.txt')
• names(m) <- c("a", "b", "c")
• read.csv(f, header=TRUE, sep=",", quote="\"", dec=".")
• read.csv2(f, header=TRUE, sep=";", quote="\"", dec=",")
• read.delim(f, header=TRUE, sep="\t", quote="\"", dec=".")
• read.delim2(f, header=TRUE, sep="\t", quote="\"", dec=",")
• m <- read.csv("nes2.csv", header=TRUE)
• read.fwf(file, widths=c(3,5,3), header=FALSE, sep="", as.is=FALSE)
• as.is=TRUE; as.is=T # not to be converted into a factor
• na.strings=c(".", "NA", "", "#") # strings to be read as missing
• cnt=count.fields(df); which(cnt==7);
Reading Data Frames
• load(d);
• data(d); data(d, package="p")
• data.frame(v1, v2) # to make a data frame out of vectors
• m3 <- data.frame(as.matrix(m[,2:4]))
• m2 <- edit(m); m2 <- edit(data.frame(m))
• data.entry(df)
Handling Data
• m2 <- match(v1, v2, nomatch=0) # data merging
• m2 <- match(m[,1], m[,3])
Writing Data
• cat(); print()
• cat("y x1 x2", "2 4 2", "5 2 7", file="sample.txt", sep="\n")
• write(obj, f)
• write.table(df, file='firms.csv', sep=",", row.names=TRUE, col.names=NA)
• save(obj, file=f); save.image(f)
• sink(); format()
Defining Matrices
• m <- c(1, 2, 3, 4); c(1, 2, 3, 4) -> m; assign("m", c(1, 2, 3, 4))
• m <- data.frame(column1=c(1,2,3), column2=c(4,5,6)); ## 3 rows by 2 columns
• rep(c(1,2,3), 2); rep(c(1,2,3), each=2);
• rep(c(1,2,3), c(2,2,2)); m <- c(c1=15, c2=54, c3=50)
• seq(1,4); seq(1,10, by=2); seq(0,1, length=10);
• intm <- 1:4; intm <- numeric(); intm[1] <- 1; intm[2] <- 2
• strm <- c("a", "b", "c"); strm <- character(); strm[1] <- "a"; strm[2] <- "b"
• blm <- c(T,F); blm <- v1>10; ## a boolean vector of TRUE and FALSE
• m <- scan()
• mm <- matrix(1:12,4); mm <- matrix(1:12, nrow=4)
• mm <- matrix(1:12, ncol=3); mm <- matrix(1:12, nrow=4)
• mm <- matrix(1:12, nrow=4, ncol=3); mm <- matrix(1:12, 4, 3)
• arrm <- array(1:10); arrm <- array(1:10, dim=c(2,5))
• cbind(); rbind(); gl(); expand.grid()
• list()
Referring Matrices
• m[,2]; v=m[2,]; m[1, 3] ## to extract elements
• m[c(1, 5, 6)]; m2=m[c(1, 5, 6)] ## to extract elements
• m <- c(c1=15, c2=54, c3=50); m[c("c1", "c3")]
• m2 <- m$c2; m2 <- m[,2]; m2 <- m[,"c2"]; m2 <- m[[2]]
• m[,3:5]; m3 <- m[,c(3, 4, 5)]; m3 <- m[,c("c3", "c4", "c5")]
• m <- c(4, 2, 4); names(m) <- c("Grape", "Pear", "Apple")
• m1$v2 # variable v2 of the data frame m1
• which(); which.max(); which.min()
• attr(m, which); attributes(obj)
Matrix Functions
• t(); det(); rank(); eigen(); diag(); prod(); crossprod()
• sum(); mean(); var(); sd(); min(); max(); prod(); cumsum(); cumprod()
• is.na(m) ## to check if m contains a missing value
• rowSums(); colSums(); nrow(); ncol()
• dim(m); dimnames(m)
• merge(df1, df2)
• as.factor(); as.matrix(), as.vector(); # conversion
• is.factor(); is.matrix(), is.vector();
• class(); unclass()
• na.omit(); na.fail(); unique(); table(); sample()
• as.array(); as.data.frame()
• as.numeric(); as.character(); as.logical(); as.complex()
Ordinary Least Squares (OLS)
• lm(); glm()
• m.ols <- lm(v1~v2+v3, data=m) ## linear model
• lm(v1~v2+v3, data=m); summary(lm(v1~v2+v3, data=m)); summary(m.ols)
• names(m.ols); coef(m.ols); fitted(m.ols); resid(m.ols)
• predict(fit); AIC(fit); logLik(fit); deviance(fit)
• model.matrix(v1~v2+v3, data=m)
• m.ols2 <- model.matrix(v1~v2+v3, data=m); summary(m.ols2)
Binary Response Regressions
• m.logit <- glm(v1~v2+v3, family=binomial(link=logit), data=m)
• summary(m.logit); coef(m.logit); fitted(m.logit); resid(m.logit)
• lsfit(v1,v2)
• nls(); m.nonlin <- lm(v1~v2+I(v2^2), data=m) ## I() is needed for a squared term in a formula
• anova(m.ols, m.nonlin)
• m.qr <- qr(m) ## QR Decomposition of a Matrix
Descriptives
• summary(m); fivenum(m)
• stem(v); boxplot(v); boxplot(v1, v2); hist(v)
• qqnorm(v); qqline(v)
• rug(); lines()
• table() # to make a table
• tabulate()
Multivariate Analysis
• cor(m); cor(sqrt(m)) ## Pearson correlation
• cor.test(v1, v2)
• prcomp() # Principal components (stats package, formerly mva)
• kmeans() # K-means cluster analysis (stats package, formerly mva)
• factanal() # Factor analysis (stats package, formerly mva)
• cancor() # Canonical correlation (stats package, formerly mva)
Categorical Data Analysis
• chisq.test(v1,v2) ## Pearson Chi-squared Test
• fisher.test(v1,v2) ## Fisher Exact Test
• friedman.test(v1,v2) ## Friedman Test
• prop.test(); binom.test() ## sign test
• kruskal.test(v1,v2) ## Kruskal-Wallis Rank Sum Test
• wilcox.test(v1,v2) ## Wilcoxon Rank Sum (Mann-Whitney) Test
• ks.test(v1,v2) ## Two Sample Kolmogorov-Smirnov Test
• bartlett.test(v1,v2) ## Bartlett Test for Homogeneity of Variances
T-TEST AND ANOVA
• t.test(v1,v2); t.test(v1,v2, var.equal=FALSE)
• t.test(v1,v2, mu=0, paired=FALSE)
• t.test(v1, v2, mu=10, paired=F, var.equal=T)
• power.t.test(); pairwise.t.test()
• var.test(v1,v2) ## F test for equal variance
• m.anova <- aov(v1~v2+v3, data=m)
• aov(); anova()
• summary(m.anova)
• power.anova.test() ## Power calculations for balanced one-way ANOVA tests
Modules
frame_name <- function(arguments) {…}
mile.to.km <- function(mile) {mile*8/5}
km <- mile.to.km(c(35, 55, 75))
Flow Control
if (condition) {…} else if (condition) {…} else {…}
while (condition) {…} # {} may be omitted for a single line expression
for (index in start:end) {…}
for (i in 1:100) {sum <- sum + i}
repeat {…}
switch (statement, list)
Programming Functions
• expression(); parse(); deparse(); eval()
• optim() # general-purpose optimization
• nlm() # Newton-type algorithm
• lm() # linear models
• nls() # nonlinear least squares model
Plotting
• plot(y~x, data=m, pch=16) # plotting character (pch)
• pairs(m) # scatterplot matrix
• xyrange <- range(m) # to get range of m
• plot(y~x, data=m, xlim=xyrange, ylim=xyrange)
• abline(0,1)
• plot((0:10), sin((0:10)*pi), type="l") # "l" joins the points with lines
• barplot(); boxplot(); stem(); hist();
• matplot() # matrix plot
• pairs(m) # scatterplots
• coplot() # conditional plot
• stripplot() # strip plot (lattice package)
• qqplot(); qqnorm(); qqline() # quantile-quantile plot
Options
• points() # to add points to a plot
• lines() # to add lines
• text() # to add texts
• mtext() # to add margin texts
• axis() # to control axis
• par(cex=1.25, mex=1.25)
• par(mfrow=c(2,2), mfcol=c(1,1))
INPUT OVERVIEW
The INPUT statement describes the arrangement of the data to be read in a DATA step. In an INPUT statement, you provide variable names followed by $ (indicating a character value), pointer controls, column specifications, informats, and/or line-hold specifiers (i.e., @ and @@).
The DATALINES statement (replacing the old CARDS statement) indicates that data lines follow in a DATA step. In order to read external data files, you have to use the INFILE statement.
There are six input styles used in the INPUT statement: list input, column input, formatted input, modified list input, named input, and mixed input. The following table summarizes features of four major styles.
Which input style is the best? It depends on your skills and characteristics of data sets. If your data set has just a few observations with several variables, the list input or the named input will be better than the column input or the formatted input. When data elements are not separated with a blank or other delimiters, you cannot use the list input style. When data are well arranged, the column input or formatted input will be better than the list input. Therefore, you need to examine the data structure carefully when deciding the best input style. Of course, you must take this issue into account from the data coding stage.
LIST INPUT
The list input style simply lists variables separated by blanks. This style is also called free format.
DATA listed;
INPUT name $ id score;
DATALINES /*--1---+---2---*/;
Park 8740031 87.5
Hwang . 94.3
…
RUN;
A character variable should be followed by $. A missing value should be marked with a period (.); a blank does not mean a missing value in this input style. Do not use more than one “.” for a missing value. The maximum length of a string variable is 8 characters (standard); that is, a fixed 8 bytes of memory are assigned to each variable. Therefore, a string longer than 8 characters will be truncated. If you want to read a string longer than 8 characters, use the LENGTH, INFORMAT, or ATTRIB statement, or use a different input style such as column input or formatted input.
DATA _NULL_;
LENGTH analysis $15;
INFORMAT year MMDDYY10.;
INPUT analysis year;
FORMAT year DATE9.;
PUT analysis year; /* write the values to the SAS log */
CARDS /*--1---+---2---*/;
Regression 1/2/2002
ANOVA 05/05/2007
TimeSeries 09/03/1968
RUN;
/* Output
Regression 02JAN2002
ANOVA 05MAY2007
TimeSeries 03SEP1968
*/
In the example above, you may use “INFORMAT analysis $15.” instead of the LENGTH statement. INFORMAT tells SAS how data are read, while FORMAT tells SAS how they are displayed. MMDDYY10. reads data in the MM/DD/YYYY format. DATE9. displays dates in the DDMMMYYYY format. Without the FORMAT statement for year, SAS will display raw date values such as 15342, the number of days since January 1, 1960, which is how SAS stores dates internally.
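The relationship between a displayed date and the number SAS stores internally can be sketched outside SAS. The Python below is only an illustration of the day-count arithmetic (a SAS date value is the number of days since January 1, 1960); the function names are hypothetical, and it does not reproduce every behavior of the MMDDYY10. and DATE9. formats.

```python
from datetime import date, datetime, timedelta

SAS_EPOCH = date(1960, 1, 1)  # SAS date value 0
MONTHS = ("JAN", "FEB", "MAR", "APR", "MAY", "JUN",
          "JUL", "AUG", "SEP", "OCT", "NOV", "DEC")


def to_sas_date(text):
    """Parse MM/DD/YYYY text and return the day count since 1960-01-01."""
    parsed = datetime.strptime(text, "%m/%d/%Y").date()
    return (parsed - SAS_EPOCH).days


def from_sas_date(value):
    """Render a day count as DDMMMYYYY, similar in spirit to DATE9."""
    d = SAS_EPOCH + timedelta(days=value)
    return f"{d.day:02d}{MONTHS[d.month - 1]}{d.year}"


for text in ("1/2/2002", "05/05/2007", "09/03/1968"):
    value = to_sas_date(text)
    print(text, value, from_sas_date(value))
```

For instance, the stored value 14884 corresponds to 01OCT2000, and the text 1/2/2002 corresponds to the stored value 15342.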
The following example reads a comma-delimited ASCII text file. Remember that the default delimiter is a blank. See the INFILE statement for details.
DATA _NULL_;
INFILE 'a:\tiger.dat' DELIMITER=',' STOPOVER;
INPUT name $ id score;
…
RUN;
MODIFIED LIST INPUT
The modified list style is a mixture of the list input and the formatted input. This style can deal with ill-structured data. There are three format modifiers (:, &, and ~) that are especially useful when reading complex data.
The following example illustrates how : and & work in INPUT. The “Lindblom80” in the first row is trimmed since it exceeds 8 characters; only the first 8 characters, as specified in the INPUT statement, are read and the last two characters, “80”, are ignored. In the second row, SAS reads the first four characters “Park”, which are shorter than 8 characters, and then encounters a comma (delimiter); SAS stops reading data for the variable “name” and moves on to the next variable. The variable “title” is defined by & with a maximum of 50 characters. The delimiter, a comma, inside the quoted strings in the first and third rows is treated as a character value. Two consecutive double quotation marks are read as one double quotation mark. Therefore, the title of the second observation is Reading “Small Is Beautiful” as shown in the output. Characters exceeding the maximum, 50 characters in this case, will be ignored.
DATA modified;
INFILE DATALINES DELIMITER=',' DSD;
INPUT name : $8. title & $50.;
DATALINES;
Lindblom80,"Still Muddling, Not Yet Through"
Park, "Reading ""Small Is Beautiful"""
Simon, """It was a disaster,"" he continue…"
RUN;
/* Output
Lindblom Still Muddling, Not Yet Through
Park Reading "Small Is Beautiful"
Simon "It was a disaster," he continue…
*/
The INFILE statement above says that data are comma delimited and will be listed after DATALINES. DSD at the end of INFILE eliminates double quotation marks enclosing the character value when reading data. If you omit DSD, SAS will consider a comma in character values as a delimiter and read enclosing double quotation marks as character values. As a result, the output would look like,
Lindblom "Still Muddling
Park "Reading ""Small Is Beautiful"""
Simon """It was a disaster,"" he continue…"
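The effect of DSD on quoted fields resembles ordinary CSV quoting rules. As an analogy only (this is Python's csv module, not SAS, and the two do not behave identically in every case), the sketch below contrasts a quote-aware reader with a naive split on commas. The sample lines follow the example above, with the blank after the comma removed and the elided text written as three periods; these are adjustments made for the illustration.

```python
import csv

lines = [
    'Lindblom80,"Still Muddling, Not Yet Through"',
    'Park,"Reading ""Small Is Beautiful"""',
    'Simon,"""It was a disaster,"" he continue..."',
]

# Quote-aware parsing: a comma inside quotes is data, and "" stands for one quote.
quote_aware = [row for row in csv.reader(lines)]

# Naive parsing: every comma is a delimiter and quotes are kept as data.
naive = [line.split(",") for line in lines]

for (name, title), raw in zip(quote_aware, naive):
    # Keep at most 8 characters of the name, as the $8. informat would.
    print(f"{name[:8]:8s} | {title} | naive second field: {raw[1]}")
```

The quote-aware column corresponds to the output with DSD, and the naive second field corresponds to the truncated title seen without DSD.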
The second example shows how ~ (tilde) and DSD work together to read a string that contains a delimiter. SAS reads a comma in the string as a character value but does not eliminate the double quotation marks enclosing the string. If you omit DSD, the title in the second row will be "Still Muddling (including the opening quotation mark), because SAS treats the comma in the string as the delimiter and stops reading the character value for the variable “title.”
DATA modified;
INFILE DATALINES DELIMITER=',' DSD;
INPUT name : $20. year : 4.0 title ~ $50.;
DATALINES;
Meyer and Rowan,1977,"Institutionalized Organization"
Lindblom,1979,"Still Muddling, Not Yet Through"
RUN;
/* Output
Meyer and Rowan 1977 "Institutionalized Organization"
Lindblom 1979 "Still Muddling, Not Yet Through"
*/
/* Output without DSD
Meyer and Rowan 1977 "Institutionalized Organization"
Lindblom 1979 "Still Muddling
*/
You may not omit the : after “year” in the INPUT statement above, even when the data are in the same fixed format. When the variable “year” is the last one in the INPUT statement, the : is not necessary.
COLUMN INPUT
The column input style reads the value of a variable from its specified column location. A variable name is followed by its starting and ending columns.
DATA columned;
INPUT name $ 1-5 id 6-12 score 14-17;
CARDS /*--1---+---2---*/;
Park 8740031 87.5
Hwang9301020 94.3
…
RUN;
SAS reads the variable “name” from columns 1 through 5, id from columns 6 through 12, and so on. This input style works well for well-structured data.
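The column ranges in column input behave like fixed slices of each data line. The Python sketch below illustrates that idea only; the helper name and the spec layout are hypothetical, and the two sample lines are assumed to be laid out so that name occupies columns 1 to 5, id columns 6 to 12, and score columns 14 to 17.

```python
def read_columns(line, spec):
    """Slice a line by 1-based, inclusive column ranges, as column input does."""
    record = {}
    for name, (start, end, convert) in spec.items():
        raw = line[start - 1:end].strip()
        # A field that is entirely blank is treated as missing.
        record[name] = convert(raw) if raw else None
    return record


# name in columns 1-5, id in columns 6-12, score in columns 14-17.
SPEC = {"name": (1, 5, str), "id": (6, 12, int), "score": (14, 17, float)}

lines = ["Park 8740031 87.5", "Hwang9301020 94.3"]
rows = [read_columns(line, SPEC) for line in lines]
for row in rows:
    print(row)
```

Because the positions are fixed, "Hwang9301020" is split correctly even though no blank separates the name from the id.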
FORMATTED INPUT
The formatted input style reads input values with the informats specified after variable names. Informats provide the data type and the width of an input value. Numeric variables are expressed in the w.d format, where w represents the total width of the field and d the number of digits after the decimal point. You may omit d when d = 0 (as in 7.), but you cannot omit the period. The $CHARw. or $w. informat is used for character variables, while the DATEw. or DDMMYYw. informat is used for dates.
DATA formatted;
INPUT name $5. id 7. score 4.1;
DATALINES /*--+---2---*/;
Park 8740031 875 /* score=87.5 */
Hwang9301020 943 /* score=94.3 */
…
RUN;
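The way a w.d informat supplies an implied decimal point (875 read with 4.1 becomes 87.5) can be sketched as follows. This is Python rather than SAS and covers only the simple cases shown above; the function names are hypothetical. It relies on the rule that d applies only when the field itself contains no decimal point.

```python
def read_wd(field, d):
    """Mimic a numeric w.d informat for simple fields.

    If the text has no decimal point, the last d digits are treated as decimals;
    if it already has one, d is ignored. A blank field is treated as missing.
    """
    text = field.strip()
    if not text:
        return None
    value = float(text)
    return value if "." in text else value / 10 ** d


def read_formatted(line, fields):
    """Read consecutive fixed-width fields, moving the column pointer after each one."""
    pos, record = 0, {}
    for name, width, convert in fields:
        record[name] = convert(line[pos:pos + width])
        pos += width
    return record


# name $5.  id 7.  score 4.1
FIELDS = [
    ("name", 5, str.strip),
    ("id", 7, lambda s: int(read_wd(s, 0))),
    ("score", 4, lambda s: read_wd(s, 1)),
]

rows = [read_formatted(line, FIELDS) for line in ["Park 8740031 875", "Hwang9301020 943"]]
for row in rows:
    print(row)
```

A field such as "87.5" read with the same 4.1 specification would stay 87.5, since an explicit decimal point takes precedence.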
You can use parentheses to simplify expressions.
DATA formatted;
INPUT name $5. id 7. (grade1-grade3) (3.);
DATALINES /*--+---2---*/;
Park 8740031 89 95100
Hwang9301020100 93 99
…
RUN;
The following example illustrates how effectively the formatted input uses column pointer controls, informats (e.g., COMMAn., DOLLARn., PERCENTn., and MMDDYY10.), and parentheses. SAS reads x1 as a string five characters long, x2 as a seven-digit number without a decimal point, and x3 through x5 as three-digit numbers, and then skips one column (+1) before reading the numeric variable income, which contains commas.
DATA formatted;
INPUT (x1-x5) ($CHAR5. 7. 3*3.0) +1 income COMMA7.;
DATALINES /*--+---2---+---3*/;
Park 8740031 89 95100 84,895
Hwang9301020100 93 99 168,579
…
RUN;
/* Output
Park 8740031 89 95 100 84895
Hwang 9301020 100 93 99 168579
*/
The formatted input can use both column and line pointer controls. These pointer controls are very useful when reading multiple observations from the same line or reading an observation from multiple lines.
NAMED INPUT
The named input reads a data value that follows its variable name. A variable name and its data value are separated by an equal sign. String data are NOT enclosed in double quotation marks in this style. Like the list style, the named style supports only the standard length of variables. This style provides some flexibility, but it is not appropriate for a large data set.
DATA named;
INPUT name=$ id= grade=;
DATALINES;
name=Park id=8740031 grade=89
name=Hwang id=9301020 grade=100
…
RUN;
MIXED INPUT
The INPUT statement can contain list input, column input, formatted input, and/or named input.
DATA mixed;
INPUT name $ 1-5 @7 id $7. +1 grade1 3. grade2 18-22;
CARDS /*--1---+---2---*/;
Park 8740031 89 95.1
Hwang 9301020 100 93.9
…
RUN;
READING MULTIPLE OBSERVATIONS
Let us read multiple observations in a line using the formatted input style. The following script reads the string variables “name” and “id” consecutively, reads the three-digit numeric variables x1 through x3, and then keeps reading the next observation, if available, without moving to the next line.
DATA formatted;
INPUT name $ id $ (x1-x3) (3.) @@;
CARDS /*--1---+---2---+---3---+---4---+---5*/;
Park 8740031 89 95100 Choi 9730625 100100 95
Hwang 9301020 100 93 99 …
;RUN;
/* Output
Park 8740031 89 95 100
Choi 9730625 100 100 95
Hwang 9301020 100 93 99
*/
The following example reads data using a DO loop.
DATA rbd_block;
INPUT treat $ @@;
DO block='High', 'Medium', 'Low'; /* DO block=1 TO 3;*/
INPUT income @@; OUTPUT;
END;
DATALINES;
Drug1 34 55 34
Drug2 45 56 32
Drug3 45 56 32
;RUN;
/* Output
1 Drug1 High 34
2 Drug1 Medi 55
3 Drug1 Low 34
4 Drug2 High 45
5 Drug2 Medi 56
6 Drug2 Low 32
7 Drug3 High 45
8 Drug3 Medi 56
9 Drug3 Low 32
*/
Suppose individual observations have different numbers of repetitions. Pay attention to the IF and OUTPUT statements.
DATA repeat;
INPUT crop $ no @;
DROP no;
IF no GT 0 THEN DO;
DO trial=1 TO no;
INPUT cost benefit @;
OUTPUT;
END;
END;
DATALINES;
rice 3 54 87 98 77 57 67
bean 2 65 87 96 54
RUN;
/* Output
rice 1 54 87
rice 2 98 77
rice 3 57 67
bean 1 65 87
bean 2 96 54
*/
READING MULTIPLE LINES
Now, let us read observations whose data span multiple lines. The #n pointer control moves to line n of the observation, and / advances to the next line, before the variable is read.
DATA spanned;
INPUT #1 No 7.0 #2 name $CHAR15. / address $CHAR50. #4 phone $CHAR12.;
DATALINES;
000001
Park
2451 E. 10th St. APT 311
8128579425
000002
Hun
800 N. Union St. APT 525
8128576256
RUN;
/* Output
1 Park 2451 E. 10th St. APT 311 8128579425
2 Hun 800 N. Union St. APT 525 8128576256
*/
The INPUT statement above reads a seven-digit numeric variable “No” from the first line (#1), a 15-character string variable “name” from the second line (#2), a 50-character string variable “address” from the next line (/), and a 12-character string variable “phone” from the fourth line (#4). Alternatively, the INPUT statement may be replaced by “INPUT No 7.0 / Name $15. / Address $50. / Phone $12.;”.
SAS date, time, and datetime functions are used to perform the following tasks:
For all interval functions, you can supply the intervals and other character arguments either directly as a quoted string or as a SAS character variable. When you use a character variable, you should set the length of the character variable to at least the length of the longest string for that variable that is used in the DATA step.
Also, to ensure correct results when using interval functions, use date intervals with date values and datetime intervals with datetime values.
See SAS Language Reference: Dictionary for a complete description of these functions.
The following list shows SAS date, time, and datetime functions in alphabetical order.
You can specify the optional seasonality argument to construct a cycle other than the default seasonal cycle. For example, INTCYCLE('MONTH', 3) returns 'QTR'. The optional second argument is the seasonal frequency.
FitInterval = INTFIT( date1, date2, 'D' ); result1 = INTNX( FitInterval, date1, 0, 'SAMEDAY'); result2 = INTNX( FitInterval, date1, 1, 'SAMEDAY');
More than one interval can fit the preceding definition. For example, two SAS date values that are seven days apart could be fit with either 'DAY7' or 'WEEK'. The INTFIT function chooses the more common interval, so 'WEEK' is the result when the dates are seven days apart. The INTFIT function can be used to detect the possible frequency of the time series or to analyze frequencies of other events in a time series, such as outliers or missing values.
You can specify the optional seasonality argument to use a seasonal cycle other than the default seasonal cycle. For example, INTINDEX('MONTH','01APR2000'D); returns the value 4, to indicate the fourth month of the year. However, INTINDEX('MONTH','01APR2000'D,3); and INTINDEX('MONTH','01APR2000'D,'QTR'); return the value 1 to indicate the first month of the quarter. Specifying either 3 or 'QTR' for the third argument uses a quarterly seasonal cycle instead of the default yearly seasonal cycle.
nextYear = INTNX( 'YEAR', '15Apr2007'D, 1, 'S' ); TwoWeeks = INTNX( 'WEEK', '15Apr2007'D, 2, 'S' );
The preceding example returns '15Apr2008'D for nextYear and '29Apr2007'D for TwoWeeks.
For all values of alignment, the number of discrete intervals n between the input date and the resulting date agrees with the input value. In the following example, the result is always that n2 = n1:
date2 = INTNX( interval, date1, n1, align ); n2 = INTCK( interval, date1, date2 );
The preceding example uses the DISCRETE method of the INTCK function by default. The result n2 = n1 does not always apply when the CONTINUOUS method of the INTCK function is specified.
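As a quick check of the n2 = n1 property, the following sketch advances an arbitrary date by several monthly increments and then counts the intervals back. The date and the variable names are illustrative only; the 'SAMEDAY' alignment and the default (discrete) INTCK method come from the text above.

```sas
DATA _NULL_;
   date1 = '15APR2007'D;                 /* arbitrary starting date (illustrative) */
   DO n1 = -2 TO 2;
      /* advance date1 by n1 months, keeping the same day of the month */
      date2 = INTNX('MONTH', date1, n1, 'SAMEDAY');
      /* count the month boundaries between date1 and date2 (default DISCRETE method) */
      n2 = INTCK('MONTH', date1, date2);
      PUT n1= date2= DATE9. n2=;         /* n2 should equal n1 on every line of the log */
   END;
RUN;
```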
You can specify the optional seasonality argument to use a seasonal cycle other than the default seasonal cycle. For example, INTSEAS('MONTH', 3) and INTSEAS('MONTH', 'QTR') both specify a quarterly seasonal cycle and return the value 3. If the optional seasonality argument is numeric, it is the seasonal frequency. If the optional seasonality argument is character, it is the seasonal cycle.
SECOND( time | datetime )
returns the second from a SAS time or datetime value.
TIME()
returns the current time of day.
TIMEPART( datetime )
returns the time part of a SAS datetime value.
TODAY()
returns the current date as a SAS date value. (TODAY is another name for the DATE function.)
WEEK( date <, 'descriptor'> )
returns the week of year from a SAS date value. The algorithm used to calculate the week depends on the descriptor, which can take the value 'U', 'V', or 'W'.
If the descriptor is 'U', weeks start on Sunday and the range is 0 to 53. If weeks 0 and 53 exist, they are only partial weeks. Week 52 can be a partial week.
If the descriptor is 'V', the result is equivalent to the ISO 8601 week of year definition. The range is 1 to 53. Week 53 is a leap week. The first week of the year, Week 1, and the last week of the year, Week 52 or 53, can include days in another Gregorian calendar year.
If the descriptor is 'W', weeks start on Monday and the range is 0 to 53. If weeks 0 and 53 exist, they are only partial weeks. Week 52 can be a partial week.
WEEKDAY( date )
returns the day of the week from a SAS date value. For example, WEEKDAY=WEEKDAY('17OCT1991'D); returns 5, the numerical value for Thursday.
YEAR( date )
returns the year from a SAS date value.
YYQ( year, quarter )
returns a SAS date value for year and quarter values.
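The functions listed above can be tried together in a short DATA step. This is a minimal sketch: the variable names are made up for illustration, and the date literal is the one used in the WEEKDAY example above.

```sas
DATA _NULL_;
   d       = '17OCT1991'D;      /* date literal from the WEEKDAY example */
   dow     = WEEKDAY(d);        /* 1 = Sunday, ..., 7 = Saturday */
   wk_u    = WEEK(d, 'U');      /* week of year, weeks starting on Sunday */
   wk_v    = WEEK(d, 'V');      /* ISO 8601 week of year */
   yr      = YEAR(d);           /* calendar year of the date */
   q_start = YYQ(yr, 4);        /* SAS date value for the start of quarter 4 */
   now_t   = TIMEPART(DATETIME());  /* time portion of the current datetime */
   PUT dow= wk_u= wk_v= yr= q_start= DATE9. now_t= TIME8.;
RUN;
```

The values are written to the SAS log; formats such as DATE9. and TIME8. only change how the stored numbers are displayed.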
ABS(argument) 
returns absolute value 
DIM<n>(arrayname) 
returns the number of elements in a one-dimensional array or the number of elements in a specified dimension of a multidimensional array. n specifies the dimension, in a multidimensional array, for which you want to know the number of elements. 
DIM(arrayname,boundn) 
returns the number of elements in a one-dimensional array or the number of elements in the specified dimension of a multidimensional array. boundn specifies the dimension, in a multidimensional array, for which you want to know the number of elements. 
HBOUND<n>(arrayname) 
returns the upper bound of an array 
HBOUND(arrayname,boundn) 
returns the upper bound of an array 
LBOUND<n>(arrayname) 
returns the lower bound of an array 
LBOUND(arrayname,boundn) 
returns the lower bound of an array 
MAX(argument,argument, …) 
returns the largest value of the numeric arguments 
MIN(argument,argument, …) 
returns the smallest value of the numeric arguments 
MOD(argument1, argument2) 
returns the remainder 
SIGN(argument) 
returns the sign of a value or 0 
SQRT(argument) 
returns the square root 
BYTE(n) 
returns one character in the ASCII or EBCDIC collating sequence, where n is an integer representing a specific ASCII or EBCDIC character 
COLLATE(start-position<,end-position>) | (start-position<,,length>) 
returns an ASCII or EBCDIC collating sequence character string 
COMPBL(source) 
removes multiple blanks between words in a character string 
COMPRESS(source<,characterstoremove>) 
removes specific characters from a character string 
DEQUOTE(argument) 
removes quotation marks from a character value 
INDEX(source,excerpt) 
searches the source for the character string specified by the excerpt 
INDEXC(source,excerpt1<, … excerptn>) 
searches the source for any character present in the excerpt 
INDEXW(source,excerpt) 
searches the source for a specified pattern as a word 
LEFT(argument) 
left-aligns a SAS character string 
LENGTH(argument) 
returns the length of an argument 
LOWCASE(argument) 
converts all letters in an argument to lowercase 
QUOTE(argument) 
adds double quotation marks to a character value 
RANK(x) 
returns the position of a character in the ASCII or EBCDIC collating sequence 
REPEAT(argument,n) 
repeats a character expression 
REVERSE(argument) 
reverses a character expression 
RIGHT(argument) 
right-aligns a character expression 
SCAN(argument,n<,delimiters>) 
returns a given word from a character expression 
SOUNDEX(argument) 
encodes a string to facilitate searching 
SUBSTR(argument,position<,n>)=characterstoreplace 
replaces character value contents 
var=SUBSTR(argument,position<,n>) 
extracts a substring from an argument. (var is any valid SAS variable name.) 
TRANSLATE(source,to1,from1<,…ton,fromn>) 
replaces specific characters in a character expression 
TRANWRD(source,target,replacement) 
replaces or removes all occurrences of a word in a character string 
TRIM(argument) 
removes trailing blanks from character expression and returns one blank if the expression is missing 
TRIMN(argument) 
removes trailing blanks from character expressions and returns a null string if the expression is missing 
UPCASE(argument) 
converts all letters in an argument to uppercase 
VERIFY(source,excerpt1<,…excerptn>) 
returns the position of the first character unique to an expression 
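To see a few of the character functions above working together, here is a small sketch. The input string and all variable names are invented for illustration.

```sas
DATA _NULL_;
   LENGTH second $ 10 code $ 3;          /* fix lengths so results are not padded unexpectedly */
   raw    = 'Park,  Hun   Myoung';       /* illustrative input with repeated blanks */
   tidy   = COMPBL(raw);                 /* collapse runs of blanks into single blanks */
   second = SCAN(tidy, 2, ', ');         /* second word; comma and blank are delimiters */
   code   = UPCASE(SUBSTR(second, 1, 3));/* first three characters, in uppercase */
   pos    = INDEX(tidy, 'Hun');          /* position where the excerpt starts, 0 if absent */
   packed = COMPRESS(tidy, ' ,');        /* remove every blank and comma */
   PUT tidy= second= code= pos= packed=;
RUN;
```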
DATDIF(sdate,edate,basis) 
returns the number of days between two dates 
DATE() 
returns the current date as a SAS date value 
DATEJUL(juliandate) 
converts a Julian date to a SAS date value 
DATEPART(datetime) 
extracts the date from a SAS datetime value 
DATETIME() 
returns the current date and time of day 
DAY(date) 
returns the day of the month from a SAS date value 
DHMS(date,hour,minute,second) 
returns a SAS datetime value from date, hour, minute, and second 
HMS(hour,minute,second) 
returns a SAS time value from hour, minute, and second 
HOUR(time | datetime) 
returns the hour from a SAS time or datetime value 
INTCK('interval',from,to) 
returns the number of time intervals in a given time span 
INTNX('interval',start-from,increment<,'alignment'>) 
advances a date, time, or datetime value by a given interval, and returns a date, time, or datetime value 
JULDATE(date) 
returns the Julian date from a SAS date value 
MDY(month,day,year) 
returns a SAS date value from month, day, and year values 
MINUTE(time | datetime) 
returns the minute from a SAS time or datetime value 
MONTH(date) 
returns the month from a SAS date value 
QTR(date) 
returns the quarter of the year from a SAS date value 
SECOND(time | datetime) 
returns the second from a SAS time or datetime value 
TIME() 
returns the current time of day 
TIMEPART(datetime) 
extracts a time value from a SAS datetime value 
TODAY() 
returns the current date as a SAS date value 
WEEKDAY(date) 
returns the day of the week from a SAS date value 
YEAR(date) 
returns the year from a SAS date value 
YRDIF(sdate,edate,basis) 
returns the difference in years between two dates 
YYQ(year,quarter) 
returns a SAS date value from the year and quarter 
AIRY(x) 
returns the value of the AIRY function 
DAIRY(x) 
returns the derivative of the AIRY function 
DIGAMMA(argument) 
returns the value of the DIGAMMA function 
ERF(argument) 
returns the value of the (normal) error function 
ERFC(argument) 
returns the value of the complementary (normal) error function 
EXP(argument) 
returns the value of the exponential function 
GAMMA(argument) 
returns the value of the GAMMA function 
IBESSEL(nu,x,kode) 
returns the value of the modified Bessel function 
JBESSEL(nu,x) 
returns the value of the Bessel function 
LGAMMA(argument) 
returns the natural logarithm of the GAMMA function 
LOG(argument) 
returns the natural (base e) logarithm 
LOG2(argument) 
returns the logarithm to the base 2 
LOG10(argument) 
returns the logarithm to the base 10 
TRIGAMMA(argument) 
returns the value of the TRIGAMMA function 
CNONCT(x,df,prob) 
returns the noncentrality parameter from a chi-squared distribution 
FNONCT(x,ndf,ddf,prob) 
returns the value of the noncentrality parameter of an F distribution 
TNONCT(x,df,prob) 
returns the value of the noncentrality parameter from the Student's t distribution 
CDF('dist',quantile,parm1,…,parmk) 
computes cumulative distribution functions 
LOGPDF|LOGPMF('dist',quantile,parm1,…,parmk) 
computes the logarithm of a probability density (mass) function. The two functions are identical. 
LOGSDF('dist',quantile,parm1,…,parmk) 
computes the logarithm of a survival function 
PDF|PMF('dist',quantile,parm1,…,parmk) 
computes probability density (mass) functions 
POISSON(m,n) 
returns the probability from a Poisson distribution 
PROBBETA(x,a,b) 
returns the probability from a beta distribution 
PROBBNML(p,n,m) 
returns the probability from a binomial distribution 
PROBCHI(x,df<,nc>) 
returns the probability from a chi-squared distribution 
PROBF(x,ndf,ddf<,nc>) 
returns the probability from an F distribution 
PROBGAM(x,a) 
returns the probability from a gamma distribution 
PROBHYPR(N,K,n,x<,r>) 
returns the probability from a hypergeometric distribution 
PROBMC 
returns probabilities and critical values (quantiles) from various distributions for multiple comparisons of the means of several groups 
PROBNEGB(p,n,m) 
returns the probability from a negative binomial distribution 
PROBBNRM(x,y,r) 
returns the probability from the standardized bivariate normal distribution 
PROBNORM(x) 
returns the probability from the standard normal distribution 
PROBT(x,df<,nc>) 
returns the probability from a Student’s t distribution 
SDF('dist',quantile,parm1,…,parmk) 
computes a survival function 
BETAINV(p,a,b) 
returns a quantile from the beta distribution 
CINV(p,df<,nc>) 
returns a quantile from the chi-squared distribution 
FINV(p,ndf,ddf<,nc>) 
returns a quantile from the F distribution 
GAMINV(p,a) 
returns a quantile from the gamma distribution 
PROBIT(p) 
returns a quantile from the standard normal distribution 
TINV(p,df<,nc>) 
returns a quantile from the t distribution 
CSS(argument,argument,…) 
returns the corrected sum of squares 
CV(argument,argument,…) 
returns the coefficient of variation 
KURTOSIS(argument,argument,…) 
returns the kurtosis (or 4th moment) 
MAX(argument,argument, …) 
returns the largest value 
MIN(argument,argument, …) 
returns the smallest value 
MEAN(argument,argument, …) 
returns the arithmetic mean (average) 
MISSING(numeric-expression | character-expression) 
returns a numeric result that indicates whether the argument contains a missing value 
N(argument,argument, …) 
returns the number of nonmissing values 
NMISS(argument,argument, …) 
returns the number of missing values 
ORDINAL(count,argument,argument,…) 
returns the largest value of a part of a list 
RANGE(argument,argument,…) 
returns the range of values 
SKEWNESS(argument,argument,argument,…) 
returns the skewness 
STD(argument,argument,…) 
returns the standard deviation 
STDERR(argument,argument,…) 
returns the standard error of the mean 
SUM(argument,argument,…) 
returns the sum 
USS(argument,argument,…) 
returns the uncorrected sum of squares 
VAR(argument,argument,…) 
returns the variance 
FIPNAME(expression) 
converts FIPS codes to uppercase state names 

FIPNAMEL(expression) 
converts FIPS codes to mixed case state names 

FIPSTATE(expression) 
converts FIPS codes to two-character postal codes 

STFIPS(postalcode) 
converts state postal codes to FIPS state codes 

STNAME(postalcode) 
converts state postal codes to uppercase state names


STNAMEL(postalcode) 
converts state postal codes to mixed case state names


ZIPFIPS(zipcode) 
converts ZIP codes to FIPS state codes 

ZIPNAME(zipcode) 
converts ZIP codes to uppercase state names 

ZIPNAMEL(zipcode) 
converts ZIP codes to mixed case state names 

ZIPSTATE(zipcode) 
converts ZIP codes to state postal codes 
ARCOS(argument) 
returns the arccosine 
ARSIN(argument) 
returns the arcsine 
ATAN(argument) 
returns the arctangent 
COS(argument) 
returns the cosine 
COSH(argument) 
returns the hyperbolic cosine 
SIN(argument) 
returns the sine 
SINH(argument) 
returns the hyperbolic sine 
TAN(argument) 
returns the tangent 
TANH(argument) 
returns the hyperbolic tangent 
CEIL(argument) 
returns the smallest integer that is greater than or equal to the argument 
FLOOR(argument) 
returns the largest integer that is less than or equal to the argument 
FUZZ(argument) 
returns the nearest integer if the argument is within 1E-12 of that integer 
INT(argument) 
returns the integer value 
ROUND(argument,roundoffunit) 
rounds to the nearest roundoff unit 
TRUNC(number, length) 
truncates a numeric value to a specified length 
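The difference between CEIL, FLOOR, INT, and ROUND is easiest to see on a positive and a negative value side by side. The following sketch uses made-up values and variable names.

```sas
DATA _NULL_;
   DO x = 3.7, -3.7;            /* illustrative values */
      c = CEIL(x);              /* smallest integer >= x:  4 and -3 */
      f = FLOOR(x);             /* largest integer <= x:   3 and -4 */
      i = INT(x);               /* drops the fraction:     3 and -3 */
      r = ROUND(x, 1);          /* nearest multiple of 1:  4 and -4 */
      PUT x= c= f= i= r=;
   END;
   r2 = ROUND(3.14159, 0.01);   /* round to the nearest hundredth */
   PUT r2=;
RUN;
```

Note that INT and FLOOR agree for positive values but differ for negative ones, because INT truncates toward zero.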
GETVARC(datasetid,varnum) 
returns the value of a SAS data set character variable 
GETVARN(datasetid,varnum) 
returns the value of a SAS data set numeric variable 
VARFMT(datasetid,varnum) 
returns the format assigned to a SAS data set variable 
VARINFMT(datasetid,varnum) 
returns the informat assigned to a SAS data set variable 
VARLABEL(datasetid,varnum) 
returns the label assigned to a SAS data set variable 
VARLEN(datasetid,varnum) 
returns the length of a SAS data set variable 
VARNAME(datasetid,varnum) 
returns the name of a SAS data set variable 
VARNUM(datasetid,varname) 
returns the number of a SAS data set variable’s position in a SAS data set 
VARRAY(name) 
returns a value that indicates whether the specified name is an array 
VARRAYX(expression) 
returns a value that indicates whether the value of the specified argument is an array 
VARTYPE(datasetid,varnum) 
returns the data type of a SAS data set variable 
VFORMAT(var) 
returns the format associated with the given variable 
VFORMATD(var) 
returns the format decimal value associated with the given variable 
VFORMATDX(expression) 
returns the format decimal value associated with the value of the specified argument 
VFORMATN(var) 
returns the format name associated with the given variable 
VFORMATNX(expression) 
returns the format name associated with the value of the specified argument 
VFORMATW(var) 
returns the format width associated with the given variable 
VFORMATWX(expression) 
returns the format width associated with the value of the specified argument 
VFORMATX(expression) 
returns the format associated with the value of the specified argument 
VINARRAY(var) 
returns a value that indicates whether the given variable is a member of an array 
VINARRAYX(expression) 
returns a value that indicates whether the value of the specified argument is a member of an array 
VINFORMAT(var) 
returns the informat associated with the given variable 
VINFORMATD(var) 
returns the informat decimal value associated with the given variable 
VINFORMATDX(expression) 
returns the informat decimal value associated with the value of the specified argument 
VINFORMATN(var) 
returns the informat name associated with the given variable 
VINFORMATNX(expression) 
returns the informat name associated with the value of the specified argument 
VINFORMATW(var) 
returns the informat width associated with the given variable 
VINFORMATWX(expression) 
returns the informat width associated with the value of the specified argument 
VINFORMATX(expression) 
returns the informat associated with the value of the specified argument 
VLABEL(var) 
returns the label associated with the given variable 
VLABELX(expression) 
returns the variable label for the value of a specified argument 
VLENGTH(var) 
returns the compile-time (allocated) size of the given variable 
VLENGTHX(expression) 
returns the compile-time (allocated) size for the value of the specified argument 
VNAME(var) 
returns the name of the given variable 
VNAMEX(expression) 
validates the value of the specified argument as a variable name 
VTYPE(var) 
returns the type (character or numeric) of the given variable 
VTYPEX(expression) 
returns the type (character or numeric) for the value of the specified argument 
Regards,
SAS INDIA
Bill Gates arguably ushered in the current golden age of philanthropy. Now the world’s richest man has endorsed another way to put money to work for good: impact investing.
It’s safe to say Gates doesn’t need the market-rate returns expected from his new venture-capital investment in Unitus Seed Fund. His modest commitment closes a $20 million US-Indian fund through which Unitus has taken early stakes in more than a dozen for-profit startups providing health, education and livelihoods for Indian families living on less than $10 a day. (Impact investments are intended to generate, and to measure and report, social and environmental impact alongside a financial return.)
The investment puts Gates on one side of a debate that has divided his fellow tech titans and billionaires, and now apparently separates the Microsoft cofounder from his friend Warren Buffett. Should private investors back businesses with explicit social and environmental missions and metrics?
Buffett and Gates are cofounders of the Giving Pledge, which has signed up more than 125 billionaires to give away at least half of their fortunes. But Buffett has favored the traditional separation of business and charity. “I think it’s tough to serve two masters,” he told a conference last year. “I would rather have the investment produce the capital and then have an organization totally focused on the philanthropic aspects.”
Marc Andreessen, the Silicon Valley venture capitalist who challenged Gates in the Netscape-Microsoft Web browser wars of the 1990s, has also been critical of the idea. Two years ago, Andreessen said “I would run screaming from a B Corp,” or for-benefit company that adopts explicit social goals, which he said are distractions for startups.
“The split model makes me nervous and I don’t think we would ever touch that,” Andreessen said on a panel. “It’s like a houseboat. It’s not a great house and not a great boat.”
Will Poole, cofounder and managing director of Unitus, recently spent several days on a houseboat in Kerala, in India’s southwest. “It was an excellent boat and a fine house,” Poole said, “and we provided local economic development at the same time.”
Unitus, with headquarters in Seattle and Bangalore, in its first year has made 14 investments of generally between $100,000 and $250,000 in companies such as Smile Merchants, which operates low-income dental clinics near Mumbai, and Hippocampus Learning Centers, a network of private kindergartens serving more than 6,000 rural and low-income students.
Poole, who spent 13 years at Microsoft, got Gates on board after gaining commitments from other high and ultra-high net worth investors. Unitus has attracted 15 Indian nationals and more than a dozen non-resident Indians, including venture capitalist Vinod Khosla, Romesh Wadhwani, founder of Aspect Development, and Steve Singh, CEO of Concur Technologies, which SAP recently agreed to buy for more than $8 billion.
Gates’ investment in Unitus comes out of a personal fund, not from the Gates Foundation, which has set aside $1 billion to provide equity, loans and loan guarantees to for-profit companies. Gates personally has made a number of food and energy investments, including Hampton Creek Foods and EcoMotors, a low-emission engine maker. But Unitus is apparently his first investment in a fund or company targeting so-called base-of-the-pyramid customers, the poor in the developing world.
“Impact investing is a powerful model with the potential to build markets and drive change for the people who need it most,” Gates said in a statement confirming the investment in Unitus.
Some of the naysayers may be softening their positions. Andreessen’s wife, Laura Arrillaga-Andreessen, is the founder of a social venture-capital firm and earlier this year, Andreessen’s firm, Andreessen Horowitz, invested in AltSchool, a network of micro-schools offering personalized education for children…that is in the process of becoming a certified B Corp.
SAS provides several functions to test for missing values, but in this post we will focus on the MISSING(), CMISS(), and NMISS() functions. The NMISS() function is reserved for numeric variables. The MISSING() and CMISS() functions can be used with either character or numeric variables. The CMISS() and NMISS() functions count the number of arguments with missing values, whereas the MISSING() function checks whether or not a single variable is missing. Together, these functions provide a simple way to check for missing values, and they let you write fewer lines of code by avoiding large if-statements when you need to check several values at the same time.
The MISSING() function is very useful when you need to check whether a variable has a missing value but are not sure whether it is character or numeric. MISSING() works for either character or numeric variables, and it also checks for the special numeric missing values (.A, .B, .C, ._, etc.). The MISSING() function returns a numeric result: 1 if the data point is missing and 0 if it is present. In a condition, MISSING(varname) is the same as MISSING(varname)=1, and MISSING(varname)=0 is true when the data point is present.
The MISSING function is particularly useful if you use special missing values since ‘if varname=.’ will not identify all missing values in such cases.
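The point about special missing values can be checked directly. In this sketch the variable names are invented; it compares MISSING() with a plain equality test against the period.

```sas
DATA _NULL_;
   a = .;                 /* ordinary numeric missing value */
   b = .A;                /* special numeric missing value */
   c = 0;                 /* a real value, not missing */
   d = ' ';               /* character missing value (blank) */
   m_a = MISSING(a);      /* 1 */
   m_b = MISSING(b);      /* 1: special missing values are detected */
   m_c = MISSING(c);      /* 0 */
   m_d = MISSING(d);      /* 1: works for character variables too */
   eq_b = (b = .);        /* 0: .A is not equal to the ordinary missing value */
   PUT m_a= m_b= m_c= m_d= eq_b=;
RUN;
```

The last comparison is the case the paragraph warns about: a test of the form varname=. silently skips .A, while MISSING() catches it.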
NOTE: The representation of a missing value in SAS differs between numeric and character variables. A single period (.) represents the numeric missing value. A single blank enclosed in single or double quotes (' ' or " ") represents the character missing value. A single period followed by a single letter or an underscore (e.g., .A, .B, .Z, ._) represents a special numeric missing value. Please note that these special missing values are available for numeric variables only.
The NMISS() function counts the number of arguments with missing values in the specified list of numeric variables. NMISS() is very useful if you want to make sure that at least one variable in the list is not missing.
CMISS() is available with SAS 9.2 and SAS Enterprise Guide 4.3 and is similar to the NMISS() function. The only difference is that it counts the number of arguments that are missing for both character and numeric variables.
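A short side-by-side sketch may help show the difference between the two counting functions. The variables below are hypothetical and exist only for this illustration.

```sas
DATA _NULL_;
   x = 1;  y = .;  z = .B;          /* numeric: one present, two missing */
   name = ' ';  city = 'Pune';      /* character: one missing, one present */
   n_num = NMISS(x, y, z);          /* numeric arguments only: 2 */
   n_all = CMISS(x, y, z, name, city);  /* numeric and character together: 3 */
   PUT n_num= n_all=;
RUN;
```

Passing the character variables to CMISS rather than NMISS avoids the character-to-numeric conversion that NMISS would otherwise attempt.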
The NMISS() function returns the number of argument variables that have missing values. NMISS works with multiple numeric values, whereas MISSING works with only one value, which can be either numeric or character.
Examples:
* count the number of the variables A, B, and C which have missing values;
count=nmiss(A, B, C);
count=nmiss(of A B C);
* count the number of the variables from Var1 to Var10 which have missing values;
count=nmiss(of var1-var10);
Examples:
x1=nmiss(1,0,.,2,5,.);