
**Proc transreg** performs transformation regression in which both the outcome and predictor(s) can be transformed and splines can be fit. **Psplines** are piecewise polynomials that can be used to estimate relationships that are difficult to fit with a single function.

On this page, we will walk through an example of **proc transreg** with the **pspline** option and explore its defaults. The **bspline**, **spline**, and **pspline** options, when similarly specified, yield the same results. Their differences lie in the number and type of transformed variables generated for estimation.

We can begin by creating a dataset with an outcome Y and a predictor X. This example data is generated in the SAS examples for **proc transreg**.

    data a;
      x=-0.000001;
      do i=0 to 199;
        if mod(i,50)=0 then do;
          c=((x/2)-5)**2;
          if i=150 then c=c+5;
          y=c;
        end;
        x=x+0.1;
        y=y-sin(x-c);
        output;
      end;
    run;

    proc gplot data = a;
      plot y*x;
    run;

Clearly, there is not a single, continuous function relating Y to X. The relationship does not appear random, but it does appear to change with X. Thus it makes sense to try to fit this with splines. Before running the **proc transreg**, we can see that our data contains four variables:

    proc print data = a (obs = 5);
    run;

    Obs       X       I       C          Y
      1    0.10000    0    25.0000    24.7694
      2    0.20000    1    25.0000    24.4427
      3    0.30000    2    25.0000    24.0234
      4    0.40000    3    25.0000    23.5155
      5    0.50000    4    25.0000    22.9241

In the **proc transreg** step, we indicate in the **model** statement that we wish to predict variable **y** without transformation by specifying **identity(y)**. If we wished to model a transformed version of **y** (the log or rank of **y**, for example), we would indicate the transformation here. To predict **y**, we indicate that we wish to use piecewise polynomial functions of **x** with **pspline(x)**. We also opt to output a dataset, **a2**, containing predicted values from the model.

    proc transreg data=a;
      model identity(y) = pspline(x);
      output out = a2 predicted;
    run;

    The TRANSREG Procedure

    TRANSREG Univariate Algorithm Iteration History for Identity(Y)

    Iteration    Average    Maximum                Criterion
       Number     Change     Change    R-Square       Change    Note
    -------------------------------------------------------------------------
            1    0.00000    0.00000     0.46884                 Converged

We can see in the output above that the model converged and has an R-squared value of 0.47. Let’s look at the dataset output by **proc transreg**.

    proc print data = a2 (obs = 5);
    run;

    Obs    _TYPE_    _NAME_       Y         TY         PY      Intercept      X_1        X_2
      1    SCORE      ROW1     24.7694    24.7694    24.1144       1        0.10000    0.01000
      2    SCORE      ROW2     24.4427    24.4427    23.4722       1        0.20000    0.04000
      3    SCORE      ROW3     24.0234    24.0234    22.8424       1        0.30000    0.09000
      4    SCORE      ROW4     23.5155    23.5155    22.2249       1        0.40000    0.16000
      5    SCORE      ROW5     22.9241    22.9241    21.6195       1        0.50000    0.25000

    Obs      X_3      TIntercept      TX_1       TX_2       TX_3         X
      1    0.00100        1         0.10000    0.01000    0.00100    0.10000
      2    0.00800        1         0.20000    0.04000    0.00800    0.20000
      3    0.02700        1         0.30000    0.09000    0.02700    0.30000
      4    0.06400        1         0.40000    0.16000    0.06400    0.40000
      5    0.12500        1         0.50000    0.25000    0.12500    0.50000

In addition to the predicted values, **py**, the output dataset contains several new variables. **ty** holds the “transformed” value of **y** (since our transformation was the identity, these values are the same as **y**). Three variables (**x_1**, **x_2**, **x_3**) hold the first, second, and third powers of **x**. Transformations of these three variables and the intercept are also included and are indicated with a ‘**t**’ prefix. We can see that, by default, SAS fits a single third-degree polynomial in **x** to **y**. Note that though splines are often used to fit piecewise functions, the default setting when using **pspline** in **proc transreg** is to estimate just one function (zero knots).
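To check the earlier statement that **bspline**, **spline**, and **pspline** give the same fit and differ in the variables they generate, one could run all three on dataset **a** and compare the results. The following is a minimal sketch; the output dataset names **b1**, **s1**, and **p1** are hypothetical and chosen only for illustration.

```sas
/* Same model under three spline specifications (sketch).        */
/* b1, s1, p1 are hypothetical output dataset names.             */
proc transreg data=a;
   model identity(y) = bspline(x);
   output out=b1 predicted;
run;

proc transreg data=a;
   model identity(y) = spline(x);
   output out=s1 predicted;
run;

proc transreg data=a;
   model identity(y) = pspline(x);
   output out=p1 predicted;
run;

/* List the variables in each output dataset to see which        */
/* columns each specification generated.                         */
proc contents data=b1; run;
proc contents data=s1; run;
proc contents data=p1; run;
```

If the three specifications behave as described, the R-square in each iteration history should agree, and the differences should show up in the variable lists printed by **proc contents**.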

We can plot the predicted values to see how closely they match the original data.

    legend label=none value=('y' 'predicted y')
           position=(bottom left inside) mode=share down = 2;

    proc gplot data = a2;
      plot (y py)*x / overlay legend = legend;
    run;

For this simple example, we could achieve the same result by running an ordinary least squares regression after transforming **x** in the same manner as **proc transreg**.

    data a3;
      set a;
      x2 = x*x;
      x3 = x*x*x;
    run;

    proc reg data = a3;
      model y = x x2 x3;
    run;

    The REG Procedure
    Model: MODEL1
    Dependent Variable: Y

    Number of Observations Read         200
    Number of Observations Used         200

                             Analysis of Variance

                                    Sum of           Mean
    Source                 DF      Squares         Square    F Value    Pr > F
    Model                   3   7955.26078     2651.75359      57.67    <.0001
    Error                 196   9012.65604       45.98294
    Corrected Total       199        16968

    Root MSE              6.78107    R-Square     0.4688
    Dependent Mean       12.04335    Adj R-Sq     0.4607
    Coeff Var            56.30551

                         Parameter Estimates

                     Parameter     Standard
    Variable   DF     Estimate        Error    t Value    Pr > |t|
    Intercept   1     24.76908      1.95451      12.67      <.0001
    X           1     -6.60903      0.84002      -7.87      <.0001
    x2          1      0.62721      0.09698       6.47      <.0001
    x3          1     -0.01513      0.00317      -4.77      <.0001
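The R-square from **proc reg** (0.4688) matches the value reported by **proc transreg** (0.46884). One way to confirm that the predicted values also agree is to save the **proc reg** predictions and compare them with **py** from **a2**. This is a sketch only; the names **r3**, **both**, **py_reg**, and **diff** are hypothetical, and it assumes **a2** and **a3** still hold their 200 observations in the same order.

```sas
/* Save OLS predictions; r3 and py_reg are hypothetical names.   */
proc reg data=a3 noprint;
   model y = x x2 x3;
   output out=r3 p=py_reg;
run;
quit;

/* One-to-one merge (no BY statement): rows are paired by        */
/* position, which assumes both datasets are in the same order.  */
data both;
   merge a2(keep=py) r3(keep=py_reg);
   diff = abs(py - py_reg);
run;

/* The largest absolute difference should be close to zero       */
/* if the two fits are the same.                                 */
proc means data=both max;
   var diff;
run;
```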

In this example, using **proc transreg** only saves us the step of generating variables. However, we may wish to fit more than one function in a piecewise regression or use more complicated transformations of **x**. Doing so with **data** and **proc reg** steps quickly becomes unmanageable or impossible, while doing so with **proc transreg** is effective and efficient.
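As a sketch of the piecewise case, knots can be listed after a slash inside **pspline()** with the **knots=** option. The data step above resets **c** when **i** is 0, 50, 100, and 150, which corresponds to **x** near 5, 10, and 15 for the interior breaks, so those values are natural candidates. The knot locations and the output dataset name **a4** are assumptions made for illustration, not settings taken from the original analysis.

```sas
/* Piecewise cubic fit with knots at x = 5, 10, 15 (sketch).     */
/* Knot locations and the dataset name a4 are assumptions.       */
proc transreg data=a;
   model identity(y) = pspline(x / knots=5 10 15);
   output out=a4 predicted;
run;

/* Overlay observed and predicted values, as before. */
proc gplot data=a4;
   plot (y py)*x / overlay;
run;
```

With each knot listed once, the fitted cubic pieces are still joined smoothly, so the abrupt jumps in these data may not be fully captured. Repeating a knot value in the list relaxes the smoothness at that point, which may be worth exploring for data like these.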
