Knowledge Matter, Results Count.

Variables are characteristics or properties of data that can take on different values or amounts for different individuals in the population. They are data attributes such as ID number, age, gender, and body temperature. Variables can be classified according to their function or the way they’re used in a study.

A variable can be independent or dependent. An independent variable is one whose values affect or determine those of a dependent variable; a dependent variable takes different values in response to the independent variable. In some contexts, a researcher selects or controls the value of an independent variable in order to determine its relationship to the dependent variable.

For example, a researcher investigating the effect of fertilizer on plant growth could change the amount of fertilizer – the independent variable – in order to observe the effect on the plants – the dependent variable. In other contexts, however, the independent variable’s values are simply taken as given.

For example, suppose a researcher is trying to determine the effect of a variable such as incarceration rate on crime rate. The researcher can’t manipulate the variable incarceration rate. It is simply observed. In this case, you might hear this variable referred to as a predictor variable. You might also hear this type of variable referred to as an explanatory or control variable. A dependent variable is also known as a response variable or an outcome variable.


Conjoint analysis indicates consumer preferences for products with multiple characteristics, where each characteristic varies across several categories. For example, the researcher might want to learn consumer preferences for a coffee maker with three characteristics: price (with three levels), number of cups brewed (with three levels), and timed start (yes or no). The task is to determine which of the 3 x 3 x 2 = 18 combinations of characteristics is most preferred by consumers.
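To make the count of combinations concrete, here is a minimal Python sketch that enumerates every coffee-maker profile. The level labels (price tiers and cup counts) are invented for illustration; only the number of levels per characteristic comes from the example above.

```python
from itertools import product

# Hypothetical level labels; only the level counts (3, 3, 2) come from the text.
prices = ["low", "medium", "high"]
cups = [4, 8, 12]
timed_start = ["yes", "no"]

# A full-factorial design crosses every level of every characteristic.
profiles = list(product(prices, cups, timed_start))

print(len(profiles))
for price, n_cups, timer in profiles[:3]:
    print(price, n_cups, timer)
```

Each tuple in `profiles` is one hypothetical product that a respondent could be asked to rate or rank.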

**The Conjoint Model**

Conjoint analysis is based on a main-effects analysis-of-variance model. Data are collected by asking subjects about their preferences for hypothetical products defined by attribute combinations. Conjoint analysis decomposes the judgment data into components, based on qualitative attributes of the products. A numerical *utility* or *part-worth utility* value is computed for each level of each attribute. Large utilities are assigned to the most preferred levels, and small utilities are assigned to the least preferred levels. The attributes with the largest utility range are considered the most important in predicting preference. Conjoint analysis is a statistical model with an error term and a loss function.

*Metric conjoint analysis* models the judgments directly. When all of the attributes are nominal, the metric conjoint analysis is a simple main-effects ANOVA with some specialized output. The attributes are the independent variables, the judgments comprise the dependent variable, and the utilities are the parameter estimates from the ANOVA model. The following is a metric conjoint analysis model for three factors:

$$y_{ijk} = \mu + \beta_{1i} + \beta_{2j} + \beta_{3k} + \epsilon_{ijk}$$

This model could be used, for example, to investigate preferences for cars that differ on three attributes: mileage, expected reliability, and price. $y_{ijk}$ is one subject's stated preference for a car with the *i*th level of mileage, the *j*th level of expected reliability, and the *k*th level of price. The grand mean is $\mu$, the terms $\beta_{1i}$, $\beta_{2j}$, and $\beta_{3k}$ are the part-worth utilities, and the error is $\epsilon_{ijk}$.

*Nonmetric conjoint analysis* finds a monotonic transformation of the preference judgments. The model, which follows directly from conjoint measurement, iteratively fits the ANOVA model until the transformation stabilizes. The R² increases during every iteration until convergence, when the change in R² is essentially zero. The following is a nonmetric conjoint analysis model for three factors, where $\Phi$ denotes the monotonic transformation:

$$\Phi(y_{ijk}) = \mu + \beta_{1i} + \beta_{2j} + \beta_{3k} + \epsilon_{ijk}$$

The R² for a nonmetric conjoint analysis model will always be greater than or equal to the R² from a metric analysis of the same data. The smaller R² in metric conjoint analysis is not necessarily a disadvantage, since results should be more stable and reproducible with the metric model. Metric conjoint analysis was derived from nonmetric conjoint analysis as a special case. Today, metric conjoint analysis is used more often than nonmetric conjoint analysis. In the SAS System, conjoint analysis is performed with the SAS/STAT procedure TRANSREG (transformation regression). Metric conjoint analysis models are fit using ordinary least squares, and nonmetric conjoint analysis models are fit using an alternating least squares algorithm.
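The relationship between the two fits can be illustrated with a rough Python sketch. It is not how PROC TRANSREG is implemented. It uses the 18 tire profiles from the SAS program in this post, fits the main-effects model by ordinary least squares (the metric case), and then alternates that fit with a monotone (isotonic) regression step as a simplified stand-in for the alternating least squares algorithm. All function and variable names are assumptions made for this illustration.

```python
import numpy as np

# Tire profiles from the SAS program in this post:
# columns are BRAND, PRICE, LIFE, HAZARD, RANK (1 = most preferred).
tires = np.array([
    [1, 1, 2, 1, 3], [1, 1, 3, 2, 2], [1, 2, 1, 2, 14], [1, 2, 2, 2, 10],
    [1, 3, 1, 1, 17], [1, 3, 3, 1, 12], [2, 1, 1, 2, 7], [2, 1, 3, 2, 1],
    [2, 2, 1, 1, 8], [2, 2, 3, 1, 5], [2, 3, 2, 1, 13], [2, 3, 2, 2, 16],
    [3, 1, 1, 1, 6], [3, 1, 2, 1, 4], [3, 2, 2, 2, 15], [3, 2, 3, 1, 9],
    [3, 3, 1, 2, 18], [3, 3, 3, 2, 11],
])
n_levels = [3, 3, 3, 2]

def effects_code(levels, m):
    """Sum-to-zero coding: m - 1 columns, last level coded -1 in each."""
    cols = np.zeros((len(levels), m - 1))
    for row, lev in enumerate(levels):
        if lev == m:
            cols[row, :] = -1.0
        else:
            cols[row, lev - 1] = 1.0
    return cols

# Intercept plus effects-coded main effects for the four attributes.
X = np.hstack([np.ones((len(tires), 1))] +
              [effects_code(tires[:, j], m) for j, m in enumerate(n_levels)])

def ols_fit(t):
    """Fitted values and R-squared of the main-effects model for target t."""
    beta, *_ = np.linalg.lstsq(X, t, rcond=None)
    fitted = X @ beta
    r2 = 1.0 - np.sum((t - fitted) ** 2) / np.sum((t - t.mean()) ** 2)
    return fitted, r2

def isotonic_fit(values):
    """Pool-adjacent-violators: least-squares nondecreasing fit to values."""
    blocks = []  # each entry is [block mean, block size]
    for v in values:
        blocks.append([float(v), 1])
        # Merge backwards while the nondecreasing order is violated.
        while len(blocks) > 1 and blocks[-2][0] > blocks[-1][0]:
            m2, n2 = blocks.pop()
            m1, n1 = blocks.pop()
            blocks.append([(m1 * n1 + m2 * n2) / (n1 + n2), n1 + n2])
    return np.concatenate([np.full(n, m) for m, n in blocks])

def rescale(t, reference):
    """Give t the mean and standard deviation of reference (order is kept)."""
    return (t - t.mean()) / t.std() * reference.std() + reference.mean()

# Reflect the ranks so that larger numbers mean stronger preference.
pref = (len(tires) + 1 - tires[:, 4]).astype(float)
order = np.argsort(pref)

fitted, r2_metric = ols_fit(pref)      # metric analysis: one OLS fit
r2_history = [r2_metric]
for _ in range(50):
    # Monotone step: the sequence closest to the fitted values that still
    # respects the original preference order.
    t = np.empty_like(pref)
    t[order] = isotonic_fit(fitted[order])
    fitted, r2 = ols_fit(rescale(t, pref))
    r2_history.append(r2)
    if r2_history[-1] - r2_history[-2] < 1e-8:
        break
r2_nonmetric = r2_history[-1]

print(f"metric R^2 = {r2_metric:.4f}, nonmetric R^2 = {r2_nonmetric:.4f}")
```

Because the untransformed preferences are themselves one admissible monotone transformation, the alternating fit can only match or improve on the metric R², which is the ordering described above.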

**SAS Program Statements**

    OPTIONS PAGENO=1 PAGESIZE=56;

    /* Define a data set named TIRES.                                        */
    /* The variable RANK typically would be the average across all subjects. */
    DATA TIRES;
      INPUT BRAND PRICE LIFE HAZARD RANK;
      CARDS;
    1 1 2 1 3
    1 1 3 2 2
    1 2 1 2 14
    1 2 2 2 10
    1 3 1 1 17
    1 3 3 1 12
    2 1 1 2 7
    2 1 3 2 1
    2 2 1 1 8
    2 2 3 1 5
    2 3 2 1 13
    2 3 2 2 16
    3 1 1 1 6
    3 1 2 1 4
    3 2 2 2 15
    3 2 3 1 9
    3 3 1 2 18
    3 3 3 2 11
    ;
    RUN;

    /* Set up value labels. */
    PROC FORMAT;
      VALUE BRANDF
        1 = 'GOODSTONE'
        2 = 'PIROGI'
        3 = 'MACHISMO';
      VALUE PRICEF
        1 = '$69.99'
        2 = '$74.99'
        3 = '$79.99';
      VALUE LIFEF
        1 = '50,000'
        2 = '60,000'
        3 = '70,000';
      VALUE HAZARDF
        1 = 'YES'
        2 = 'NO';
    RUN;

    /* Attach the formats to TIRES so that later steps display the labels. */
    PROC DATASETS LIBRARY=WORK NOLIST;
      MODIFY TIRES;
      FORMAT BRAND BRANDF. PRICE PRICEF. LIFE LIFEF. HAZARD HAZARDF.;
    QUIT;

    /* Conduct nonmetric conjoint analysis (monotone transformation of RANK). */
    PROC TRANSREG DATA=TIRES MAXITER=50 UTILITIES SHORT;
      ODS SELECT TESTSNOTE CONVERGENCESTATUS FITSTATISTICS UTILITIES;
      MODEL MONOTONE(RANK / REFLECT) = CLASS(BRAND PRICE LIFE HAZARD / ZERO=SUM);
      OUTPUT IREPLACE PREDICTED;
    RUN;

    PROC PRINT LABEL;
      VAR RANK TRANK PRANK BRAND PRICE LIFE HAZARD;
      LABEL PRANK = 'PREDICTED RANKS';
    RUN;

    /* Generate and evaluate an experimental design with the MKTEX macro.   */
    /* The arguments in parentheses after the macro name define:            */
    /*   - the number of categories for each variable;                      */
    /*   - N=, the number of combinations being evaluated.                  */
    /* SEED= is not strictly necessary, but helps ensure a reproducible     */
    /* design.                                                              */
    %MKTEX(3 3 3 2, N=18, SEED=448)

    %MKTLAB(VARS = BRAND PRICE LIFE HAZARD, OUT=SASUSER.TIREDESIGN,
            STATEMENTS = FORMAT BRAND BRANDF. PRICE PRICEF. LIFE LIFEF. HAZARD HAZARDF.)

    %MKTEVAL;

    PROC PRINT DATA=SASUSER.TIREDESIGN;
    RUN;
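For readers who want to see what part-worth utilities and attribute importance look like numerically, the following Python sketch fits the metric (untransformed) version of the model to the same 18 tire profiles. It is an illustration under stated assumptions, not a reproduction of the PROC TRANSREG output: the SAS step above applies a monotone transformation, so its utilities will generally differ. Importance is taken here as each attribute's utility range as a percentage of the sum of ranges, and all names are invented for the example.

```python
import numpy as np

# BRAND, PRICE, LIFE, HAZARD, RANK (1 = most preferred), as in the DATA step.
tires = np.array([
    [1, 1, 2, 1, 3], [1, 1, 3, 2, 2], [1, 2, 1, 2, 14], [1, 2, 2, 2, 10],
    [1, 3, 1, 1, 17], [1, 3, 3, 1, 12], [2, 1, 1, 2, 7], [2, 1, 3, 2, 1],
    [2, 2, 1, 1, 8], [2, 2, 3, 1, 5], [2, 3, 2, 1, 13], [2, 3, 2, 2, 16],
    [3, 1, 1, 1, 6], [3, 1, 2, 1, 4], [3, 2, 2, 2, 15], [3, 2, 3, 1, 9],
    [3, 3, 1, 2, 18], [3, 3, 3, 2, 11],
])
attributes = {"BRAND": 3, "PRICE": 3, "LIFE": 3, "HAZARD": 2}

def effects_code(levels, m):
    """Sum-to-zero coding (the idea behind ZERO=SUM): m - 1 columns."""
    cols = np.zeros((len(levels), m - 1))
    for row, lev in enumerate(levels):
        if lev == m:
            cols[row, :] = -1.0
        else:
            cols[row, lev - 1] = 1.0
    return cols

# Reflect the ranks so that larger numbers mean stronger preference.
pref = (len(tires) + 1 - tires[:, 4]).astype(float)

X = np.hstack([np.ones((len(tires), 1))] +
              [effects_code(tires[:, j], m)
               for j, m in enumerate(attributes.values())])
beta, *_ = np.linalg.lstsq(X, pref, rcond=None)

# Unpack the coefficients; the last level's utility is minus the sum of the
# others, so each attribute's part-worths sum to zero.
part_worths = {}
pos = 1  # position 0 holds the intercept (grand mean)
for name, m in attributes.items():
    free = beta[pos:pos + m - 1]
    part_worths[name] = np.append(free, -free.sum())
    pos += m - 1

# Importance: utility range of each attribute relative to the total range.
ranges = {name: u.max() - u.min() for name, u in part_worths.items()}
total_range = sum(ranges.values())
importance = {name: 100.0 * r / total_range for name, r in ranges.items()}

for name in attributes:
    print(name, np.round(part_worths[name], 3), f"{importance[name]:.1f}%")
```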


**Proc transreg** performs transformation regression in which both the outcome and predictor(s) can be transformed and splines can be fit. **Psplines** are piecewise polynomials that can be used to estimate relationships that are difficult to fit with a single function.

On this page, we will walk through an example of **proc transreg** with the **pspline** option and explore its defaults. The **bspline**, **spline**, and **pspline** options, when similarly specified, yield the same results. Their differences lie in the number and type of transformed variables generated for estimation.

We can begin by creating a dataset with an outcome Y and a predictor X. This example data is generated in the SAS examples for **proc transreg**.

    data a;
      x=-0.000001;
      do i=0 to 199;
        if mod(i,50)=0 then do;
          c=((x/2)-5)**2;
          if i=150 then c=c+5;
          y=c;
        end;
        x=x+0.1;
        y=y-sin(x-c);
        output;
      end;
    run;

    proc gplot data = a;
      plot y*x;
    run;

Clearly, there is not a single, continuous function relating Y to X. The relationship does not appear random, but it does appear to change with X. Thus it makes sense to try to fit this with splines. Before running the **proc transreg**, we can see that our data contains four variables:

    proc print data = a (obs = 5);
    run;

    Obs          X      I          C          Y
      1    0.10000      0    25.0000    24.7694
      2    0.20000      1    25.0000    24.4427
      3    0.30000      2    25.0000    24.0234
      4    0.40000      3    25.0000    23.5155
      5    0.50000      4    25.0000    22.9241

In the **proc transreg** command, we indicate in the **model** line that we wish to predict variable **y** without transformation with **identity(y)**. If we wished to model a transformed version of **y** (the log or rank of **y**, for example), we would indicate the transformation here. To predict **y**, we indicate that we wish to use piecewise polynomial functions of **x** with **pspline(x)**. We also opted to output a dataset, **a2**, containing predicted values from the model.

    proc transreg data=a;
      model identity(y) = pspline(x);
      output out = a2 predicted;
    run;

    The TRANSREG Procedure

    TRANSREG Univariate Algorithm Iteration History for Identity(Y)

    Iteration    Average    Maximum                Criterion
       Number     Change     Change    R-Square       Change    Note
    -------------------------------------------------------------------------
            1    0.00000    0.00000     0.46884                 Converged

We can see in the output above that the model converged and has an R-squared value of 0.47. Let's look at the dataset output by **proc transreg**.

    proc print data = a2 (obs = 5);
    run;

    Obs    _TYPE_    _NAME_          Y         TY         PY    Intercept        X_1        X_2
      1    SCORE     ROW1      24.7694    24.7694    24.1144            1    0.10000    0.01000
      2    SCORE     ROW2      24.4427    24.4427    23.4722            1    0.20000    0.04000
      3    SCORE     ROW3      24.0234    24.0234    22.8424            1    0.30000    0.09000
      4    SCORE     ROW4      23.5155    23.5155    22.2249            1    0.40000    0.16000
      5    SCORE     ROW5      22.9241    22.9241    21.6195            1    0.50000    0.25000

    Obs        X_3    TIntercept       TX_1       TX_2       TX_3          X
      1    0.00100             1    0.10000    0.01000    0.00100    0.10000
      2    0.00800             1    0.20000    0.04000    0.00800    0.20000
      3    0.02700             1    0.30000    0.09000    0.02700    0.30000
      4    0.06400             1    0.40000    0.16000    0.06400    0.40000
      5    0.12500             1    0.50000    0.25000    0.12500    0.50000

In addition to the predicted values, **py**, the dataset contains a new variable, **ty**, holding the "transformed" value of **y** (since our transformation was the identity, these values are the same as **y**). Three variables (**x_1**, **x_2**, **x_3**) that are the first three powers of **x** have also been added. Transformations of these three variables and the intercept are included as well and are indicated with a **t** prefix. We can see that, by default, SAS fits a single third-degree polynomial in **x** to **y**. Note that though splines are often used to fit piecewise functions, the default setting when using **pspline** in **proc transreg** is to estimate just one function (zero knots).

We can plot the predicted values to see how closely they match the original data.

    legend label=none value=('y' 'predicted y')
           position=(bottom left inside) mode=share down = 2;

    proc gplot data = a2;
      plot (y py)*x / overlay legend = legend;
    run;

For this simple example, we could achieve the same result by running an ordinary least squares regression after transforming **x** in the same manner as **proc transreg**.

    data a3;
      set a;
      x2 = x*x;
      x3 = x*x*x;
    run;

    proc reg data = a3;
      model y = x x2 x3;
    run;

    The REG Procedure
    Model: MODEL1
    Dependent Variable: Y

    Number of Observations Read    200
    Number of Observations Used    200

                           Analysis of Variance

                                    Sum of           Mean
    Source               DF        Squares         Square    F Value    Pr > F
    Model                 3     7955.26078     2651.75359      57.67    <.0001
    Error               196     9012.65604       45.98294
    Corrected Total     199          16968

    Root MSE            6.78107    R-Square    0.4688
    Dependent Mean     12.04335    Adj R-Sq    0.4607
    Coeff Var          56.30551

                           Parameter Estimates

                       Parameter      Standard
    Variable     DF     Estimate         Error    t Value    Pr > |t|
    Intercept     1     24.76908       1.95451      12.67      <.0001
    X             1     -6.60903       0.84002      -7.87      <.0001
    x2            1      0.62721       0.09698       6.47      <.0001
    x3            1     -0.01513       0.00317      -4.77      <.0001
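The same equivalence can be checked outside SAS. The Python sketch below re-creates the example data by following the **data** step shown earlier, builds the columns 1, x, x², and x³ by hand, and fits them by ordinary least squares with NumPy. If the data are re-created faithfully, the R-squared should agree with the value reported by **proc transreg** and **proc reg** above. Variable names are chosen for this illustration only.

```python
import math
import numpy as np

# Re-create the example data set, mirroring the SAS data step.
x, y, c = -0.000001, 0.0, 0.0
xs, ys = [], []
for i in range(200):
    if i % 50 == 0:
        c = ((x / 2) - 5) ** 2
        if i == 150:
            c += 5
        y = c
    x += 0.1
    y -= math.sin(x - c)
    xs.append(x)
    ys.append(y)
xs, ys = np.array(xs), np.array(ys)

# Same columns that the default pspline expansion produced: 1, x, x^2, x^3.
X = np.column_stack([np.ones_like(xs), xs, xs ** 2, xs ** 3])
beta, *_ = np.linalg.lstsq(X, ys, rcond=None)

fitted = X @ beta
r2 = 1.0 - np.sum((ys - fitted) ** 2) / np.sum((ys - ys.mean()) ** 2)

print("coefficients (intercept, x, x^2, x^3):", np.round(beta, 5))
print("R-squared:", round(r2, 4))
```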

In this example, using **proc transreg** only saves us the step of generating variables. However, we may wish to fit more than one function in a piecewise regression or use more complicated transformations of **x**. Doing so with **data** and **proc reg** steps quickly becomes unmanageable or impossible, while doing so with **proc transreg** is effective and efficient.
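To give a feel for what fitting "more than one function" involves, here is a hedged Python sketch that adds knots at x = 5, 10, and 15 to the cubic fit. It assumes a truncated power basis, one extra column `max(x - k, 0)**3` per knot, which may not match the columns SAS generates internally; in **proc transreg** the knots would be requested through the **pspline** options rather than built by hand. The knot locations and all names are chosen for illustration.

```python
import math
import numpy as np

# Re-create the example data set, mirroring the SAS data step.
x, y, c = -0.000001, 0.0, 0.0
xs, ys = [], []
for i in range(200):
    if i % 50 == 0:
        c = ((x / 2) - 5) ** 2
        if i == 150:
            c += 5
        y = c
    x += 0.1
    y -= math.sin(x - c)
    xs.append(x)
    ys.append(y)
xs, ys = np.array(xs), np.array(ys)

def r_squared(design, target):
    """R-squared of an ordinary least squares fit of target on design."""
    beta, *_ = np.linalg.lstsq(design, target, rcond=None)
    resid = target - design @ beta
    return 1.0 - np.sum(resid ** 2) / np.sum((target - target.mean()) ** 2)

# Single cubic polynomial (zero knots), as in the default pspline fit.
cubic = np.column_stack([np.ones_like(xs), xs, xs ** 2, xs ** 3])

# Assumed truncated power basis: one extra cubic term per knot.
knots = [5.0, 10.0, 15.0]
extra = [np.maximum(xs - k, 0.0) ** 3 for k in knots]
with_knots = np.column_stack([cubic] + extra)

r2_cubic = r_squared(cubic, ys)
r2_knots = r_squared(with_knots, ys)
print(f"cubic R^2 = {r2_cubic:.4f}, with knots R^2 = {r2_knots:.4f}")
```

Because the knot columns are added to the cubic columns, the fit with knots can never have a lower R-squared than the single cubic. A spline of this form is continuous at the knots, so it still cannot reproduce the jumps in the data exactly.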
