When using PROC TRANSREG, what are the defaults with pspline?

Proc transreg performs transformation regression in which both the outcome and predictor(s) can be transformed and splines can be fit. Psplines are piecewise polynomials that can be used to estimate relationships that are difficult to fit with a single function. 

On this page, we will walk through an example of proc transreg with the pspline transformation and explore its defaults. The bspline, spline, and pspline transformations, when similarly specified, yield the same fitted values. They differ in the number and type of transformed variables generated for estimation.

We can begin by creating a dataset with an outcome Y and a predictor X. The data step below comes from the SAS documentation examples for proc transreg.

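data a;
  x=-0.000001;
  do i=0 to 199;
    if mod(i,50)=0 then do;   /* start a new segment every 50 observations */
      c=((x/2)-5)**2;
      if i=150 then c=c+5;
      y=c;
    end;
    x=x+0.1;
    y=y-sin(x-c);
    output;
  end;
run;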

proc gplot data = a;
  plot y*x;
run;
quit;

Clearly, there is not a single, continuous function relating Y to X.  The relationship does not appear random, but it does appear to change with X.  Thus it makes sense to try to fit this with splines.  Before running the proc transreg, we can see that our data contains four variables:

proc print data = a (obs = 5); run;

Obs       X       I       C          Y
  1    0.10000    0    25.0000    24.7694
  2    0.20000    1    25.0000    24.4427
  3    0.30000    2    25.0000    24.0234
  4    0.40000    3    25.0000    23.5155
  5    0.50000    4    25.0000    22.9241

In the proc transreg step, the model statement uses identity(y) to indicate that we wish to predict y without transformation. If we wished to model a transformed version of y (the log or rank of y, for example), we would indicate the transformation here. To predict y, we request piecewise polynomial functions of x with pspline(x). We also output a dataset, a2, containing predicted values from the model.

proc transreg data=a;
   model identity(y) = pspline(x);
   output out = a2 predicted;
run;

The TRANSREG Procedure

     TRANSREG Univariate Algorithm Iteration History for Identity(Y)
Iteration    Average    Maximum                Criterion
   Number     Change     Change    R-Square       Change    Note
        1    0.00000    0.00000     0.46884                 Converged

We can see in the output above that the model converged and has an R-squared value of 0.47.  Let’s look at the dataset output by proc transreg.

proc print data = a2 (obs = 5); run;

Obs  _TYPE_  _NAME_     Y        TY       PY    Intercept    X_1      X_2

  1  SCORE    ROW1   24.7694  24.7694  24.1144      1      0.10000  0.01000
  2  SCORE    ROW2   24.4427  24.4427  23.4722      1      0.20000  0.04000
  3  SCORE    ROW3   24.0234  24.0234  22.8424      1      0.30000  0.09000
  4  SCORE    ROW4   23.5155  23.5155  22.2249      1      0.40000  0.16000
  5  SCORE    ROW5   22.9241  22.9241  21.6195      1      0.50000  0.25000
  Obs    X_3      TIntercept      TX_1       TX_2       TX_3        X

  1  0.00100         1        0.10000    0.01000    0.00100    0.10000
  2  0.00800         1        0.20000    0.04000    0.00800    0.20000
  3  0.02700         1        0.30000    0.09000    0.02700    0.30000
  4  0.06400         1        0.40000    0.16000    0.06400    0.40000
  5  0.12500         1        0.50000    0.25000    0.12500    0.50000

In addition to the predicted values, py, the output dataset contains a new variable, ty, holding the “transformed” value of y (since our transformation was the identity, these values are the same as y). Three variables (x_1, x_2, x_3) holding the first, second, and third powers of x have also been added. Transformed versions of these three variables and of the intercept are included as well, indicated with a ‘t’ prefix. We can see that, by default, SAS fits a single third-degree polynomial in x to y. Note that although splines are often used to fit piecewise functions, the default setting when using pspline in proc transreg is to estimate just one function (zero knots).
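The default degree can also be written out with the degree= option in the transformation's option list. The following is a minimal sketch, assuming the dataset a created above; the output dataset names a_cubic and a_quad are made up for illustration. The first step should reproduce the fit shown above, and the second requests a quadratic instead, so its output dataset should contain only x_1 and x_2.

```sas
/* Same model as above, with the default degree written out explicitly */
proc transreg data = a;
   model identity(y) = pspline(x / degree = 3);
   output out = a_cubic predicted;   /* a_cubic is a hypothetical name */
run;

/* A second-degree polynomial instead of the default cubic */
proc transreg data = a;
   model identity(y) = pspline(x / degree = 2);
   output out = a_quad predicted;    /* a_quad is a hypothetical name */
run;
```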

We can plot the predicted values to see how closely they match the original data. 

legend label=none value=('y' 'predicted y') position=(bottom left inside) mode=share down = 2;
proc gplot data = a2;
   plot (y py)*x / overlay legend = legend;
run;
quit;

For this simple example, we could achieve the same result by running an ordinary least squares regression after transforming x in the same manner as proc transreg.

data a3; set a;
  x2 = x*x;
  x3 = x*x*x;
run;

proc reg data = a3;
  model y = x x2 x3;
run;
quit;

The REG Procedure
Model: MODEL1
Dependent Variable: Y

Number of Observations Read         200
Number of Observations Used         200

                             Analysis of Variance
                                    Sum of           Mean
Source                   DF        Squares         Square    F Value    Pr > F
Model                     3     7955.26078     2651.75359      57.67    <.0001
Error                   196     9012.65604       45.98294
Corrected Total         199          16968

Root MSE              6.78107    R-Square     0.4688
Dependent Mean       12.04335    Adj R-Sq     0.4607
Coeff Var            56.30551

                        Parameter Estimates
                     Parameter       Standard
Variable     DF       Estimate          Error    t Value    Pr > |t|
Intercept     1       24.76908        1.95451      12.67      <.0001
X             1       -6.60903        0.84002      -7.87      <.0001
x2            1        0.62721        0.09698       6.47      <.0001
x3            1       -0.01513        0.00317      -4.77      <.0001
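The R-Square of 0.4688 agrees with the 0.46884 reported by proc transreg. One way to confirm that the predicted values agree as well is to compare them row by row. This is a sketch under some assumptions: the datasets a2 and a3 exist as created above and are in the same row order, and the names r3, py_reg, check, and diff are introduced here only for illustration.

```sas
/* Save the OLS predicted values; noprint suppresses the printed output */
proc reg data = a3 noprint;
  model y = x x2 x3;
  output out = r3 p = py_reg;   /* r3 and py_reg are hypothetical names */
run;
quit;

/* One-to-one merge by row position (no BY statement) */
data check;
  merge a2 (keep = py) r3 (keep = py_reg);
  diff = abs(py - py_reg);
run;

/* The largest absolute difference should be zero up to rounding error */
proc means data = check max;
  var diff;
run;
```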

In this example, using proc transreg only saves us the step of generating variables. However, we may wish to fit more than one function in a piecewise regression or use more complicated transformations of x.  Doing so with data and proc reg steps quickly becomes unmanageable or impossible, while doing so with proc transreg is effective and efficient.
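As a sketch of what fitting more than one function can look like, knots can be supplied in the option list of pspline. The data step above starts a new segment every 50 observations, with x advancing by 0.1 per observation, so x = 5, 10, and 15 are natural candidate knot locations; treating them as the knots is an assumption made here for illustration, as are the output dataset names a_knots and a_nknots. Single knots give a smooth piecewise cubic; repeating a knot value relaxes the smoothness required at that point.

```sas
/* Piecewise cubic with knots at the assumed break points x = 5, 10, 15 */
proc transreg data = a;
   model identity(y) = pspline(x / knots = 5 10 15);
   output out = a_knots predicted;    /* a_knots is a hypothetical name */
run;

/* Alternative: request three knots and let the procedure place them
   at percentiles of x */
proc transreg data = a;
   model identity(y) = pspline(x / nknots = 3);
   output out = a_nknots predicted;   /* a_nknots is a hypothetical name */
run;
```

The predicted values in either output dataset can be overlaid on y with the same proc gplot step used earlier, which makes it easy to see how much the knots improve on the single cubic.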