**Proc transreg** performs transformation regression in which both the
outcome and predictor(s) can be transformed and splines can be fit. Splines are
piecewise polynomials that can be used to estimate relationships that are
difficult to fit with a single function.

On this page, we walk through an example using some of the most commonly
used options of **proc transreg**. For more information on the available
options, see the SAS Online Documentation.

We can begin by creating a dataset with an outcome Y and a predictor X. This
example data is generated in the SAS examples for **proc transreg**.

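    data a;
      x = -0.000001;
      do i = 0 to 199;
        /* every 50 observations, recompute the level c and restart y from it */
        if mod(i,50) = 0 then do;
          c = ((x/2)-5)**2;
          if i = 150 then c = c + 5;   /* shift the last segment up by 5 */
          y = c;
        end;
        x = x + 0.1;
        y = y - sin(x-c);
        output;
      end;
    run;

    proc gplot data = a;
      plot y*x;
    run;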

Clearly, no single simple function relates Y to X. The relationship does not
appear random, but it does appear to change with X, so it makes sense to fit
it with splines. We will start with the SAS defaults and then show how you can
specify the number of polynomials (pieces) and the degree of the polynomials
you wish to fit. Before running **proc transreg**, we can see that our data
contains four variables:

    proc print data = a (obs = 5);
    run;

    Obs       X       I       C          Y

      1    0.10000    0    25.0000    24.7694
      2    0.20000    1    25.0000    24.4427
      3    0.30000    2    25.0000    24.0234
      4    0.40000    3    25.0000    23.5155
      5    0.50000    4    25.0000    22.9241

In the **proc transreg** step, we indicate in the **model** statement
that we wish to predict variable **y** without transformation with
**identity(y)**. If we wished to model a transformed version of **y** (the
log or rank of **y**, for example), we would indicate the transformation
here. To predict **y**, we indicate that we wish to use piecewise
polynomial functions of **x** with **pspline(x)**. Many other options are
available that are not covered on this page. We chose **pspline** because it
is the most commonly used and the easiest to understand.
We also opted to output a dataset, **a2**, containing predicted values from
the model.

    proc transreg data=a;
      model identity(y) = pspline(x);
      output out = a2 pprefix = p;
    run;

    The TRANSREG Procedure

    TRANSREG Univariate Algorithm Iteration History for Identity(Y)

    Iteration    Average    Maximum                Criterion
       Number     Change     Change    R-Square       Change    Note
    -------------------------------------------------------------------------
            1    0.00000    0.00000     0.46884                 Converged

We can see in the output above that the model converged with an R-squared
value of 0.47. Let's look at the dataset output by **proc transreg**.

    proc print data = a2 (obs = 5);
    run;

    Obs    _TYPE_    _NAME_       Y          TY         pY       Intercept      X_1        X_2

      1    SCORE      ROW1     24.7694    24.7694    24.1144         1        0.10000    0.01000
      2    SCORE      ROW2     24.4427    24.4427    23.4722         1        0.20000    0.04000
      3    SCORE      ROW3     24.0234    24.0234    22.8424         1        0.30000    0.09000
      4    SCORE      ROW4     23.5155    23.5155    22.2249         1        0.40000    0.16000
      5    SCORE      ROW5     22.9241    22.9241    21.6195         1        0.50000    0.25000

    Obs      X_3      TIntercept     TX_1       TX_2       TX_3        X

      1    0.00100        1        0.10000    0.01000    0.00100    0.10000
      2    0.00800        1        0.20000    0.04000    0.00800    0.20000
      3    0.02700        1        0.30000    0.09000    0.02700    0.30000
      4    0.06400        1        0.40000    0.16000    0.06400    0.40000
      5    0.12500        1        0.50000    0.25000    0.12500    0.50000

In addition to the predicted values, **py**, we can see that a
new variable, **ty**, has been added for the "transformed" value of
**y** (since our transformation was the identity, these values are the same as
**y**). Three variables (**x_1**, **x_2**, **x_3**) that are the first three
powers of **x** have also been added. Transformations of these three
variables and the intercept are included as well and are indicated with a
'**t**' prefix. We can see that, by default, SAS fits a third-degree
polynomial in **x** to **y**. We can plot the predicted values to see how
closely they match the original data.

    legend label=none value=('y' 'predicted y')
           position=(bottom left inside) mode=share down = 2;

    proc gplot data = a2;
      plot (y py)*x / overlay legend = legend;
    run;
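As a quick sanity check on the claim that the default **pspline(x)** fit is a cubic polynomial, one could regress **y** on the three power variables in **a2** with ordinary least squares. This is only a sketch: it assumes the variable names shown in the listing of **a2** (**X_1**, **X_2**, **X_3**), and the dataset name `check` and the variables `diff2` and `diff3` are made up here for illustration.

```sas
/* Confirm that X_2 and X_3 are the square and cube of x:
   both differences should be zero up to rounding. */
data check;
  set a2;
  diff2 = x_2 - x**2;
  diff3 = x_3 - x**3;
run;

proc means data=check min max;
  var diff2 diff3;
run;

/* Ordinary least squares on the power terms. If the default fit is an
   ordinary cubic, the R-square here should agree with the one that
   proc transreg reported. */
proc reg data=a2;
  model y = x_1 x_2 x_3;
run;
quit;
```

If the two R-squared values agree, that supports reading the default **pspline** transformation as a plain polynomial expansion of **x**.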

If we want SAS to fit more than one polynomial, we can indicate that by specifying a number of "knots". A knot is a point at which one polynomial ends and another begins. Similarly, we can indicate the degree of the polynomials to be fit. Generally, as we increase the number of knots or the degree, we can generate functions that fit the data more closely. This improved fit comes at the cost of estimating more parameters.
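One common way to write a spline of degree $d$ with $K$ knots $t_1 < \dots < t_K$ is the truncated power form below. It is offered as a sketch of the general idea, not as a statement of the exact basis **proc transreg** builds internally; the symbols $\beta_j$, $\gamma_k$, and $t_k$ are notation introduced here.

$$
f(x) = \beta_0 + \sum_{j=1}^{d} \beta_j x^{j} + \sum_{k=1}^{K} \gamma_k \,(x - t_k)_+^{d},
\qquad (u)_+ = \max(u, 0).
$$

Written this way, the model has $1 + d + K$ coefficients, so each additional knot or degree adds one parameter to estimate. With no knots and $d = 3$, it reduces to an ordinary cubic polynomial with an intercept and three power terms.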

Let us look at how we can specify degrees and knots to achieve different types of models.

One might believe that **x** and **y** are linearly related, but that
the slope of the line changes at some point in **x**. In such a
situation, you can fit one first-degree (straight-line) polynomial up to that
point in **x** (the "knot") and another first-degree polynomial from that point
on. The **proc transreg** code for this model and a plot of the results
are below.

    proc transreg data=a;
      model identity(Y) = pspline(X / nknots=1 degree = 1);
      output out = k1d1 PPREFIX =k1d1;
    run;

    proc gplot data = k1d1;
      plot (y k1d1y)*x / overlay legend = legend;
    run;

We can also fit two second-degree polynomials joined at a single knot:

    proc transreg data=a;
      model identity(Y) = pspline(X / nknots=1 degree = 2);
      output out = k1d2 PPREFIX =k1d2;
    run;

    proc gplot data = k1d2;
      plot (y k1d2y)*x / overlay legend = legend;
    run;
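The **nknots=** option leaves the placement of the knots to SAS. **pspline** also accepts a **knots=** list for placing them explicitly. The sketch below puts knots at 5, 10, and 15, which is roughly where the data step that generated **a** switches regimes (every 50th observation). The output dataset name `kx` and the prefix `kx` are arbitrary choices made here, and no R-squared is quoted because this model was not run for this page.

```sas
/* Second-degree pieces joined at user-chosen knots (x = 5, 10, 15). */
proc transreg data=a;
  model identity(y) = pspline(x / knots=5 10 15 degree=2);
  output out = kx pprefix = kx;
run;

/* Overlay the observed and predicted values; kxy is the prefix kx
   followed by the name of the outcome variable. */
proc gplot data = kx;
  plot (y kxy)*x / overlay;
run;
```

Choosing knot locations by hand makes sense when, as here, you have a reason to expect the relationship to change at particular values of **x**.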

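The table below shows how R-squared changes with the number of knots and the degree. The fit tends to improve as either increases, but not monotonically: with one knot, for example, the first-degree fit (0.47545) is slightly better than the second-degree fit (0.46467).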

| Knots | Degree | R² |
|---|---|---|
| 0 | 1 | 0.10061 |
| 0 | 2 | 0.40720 |
| 0 | 3 | 0.46884 |
| 1 | 1 | 0.47545 |
| 1 | 2 | 0.46467 |
| 2 | 1 | 0.41828 |
| 2 | 2 | 0.51827 |
| 2 | 3 | 0.55391 |
| 3 | 1 | 0.50651 |
| 3 | 2 | 0.53603 |
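A comparison like the one in this table could be produced by refitting the model over a grid of knot counts and degrees. The macro below is a minimal sketch, not the code used to build the table: the macro name `fitgrid` and its local variables are hypothetical, and it runs all twelve combinations of 0 to 3 knots and degrees 1 to 3, two more than the table lists. The R-squared for each fit would be read from the iteration history that **proc transreg** prints.

```sas
/* Hypothetical helper macro: refit the pspline model for 0-3 knots
   and degrees 1-3 so the R-square values can be compared. */
%macro fitgrid;
  %local k d opts;
  %do k = 0 %to 3;
    %do d = 1 %to 3;
      /* Leave nknots= out entirely when no knots are requested. */
      %if &k = 0 %then %let opts = degree=&d;
      %else %let opts = nknots=&k degree=&d;

      title "pspline: &k knot(s), degree &d";
      proc transreg data=a;
        model identity(y) = pspline(x / &opts);
      run;
    %end;
  %end;
  title;   /* clear the title when done */
%mend fitgrid;

%fitgrid
```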

These are the most basic examples of **proc transreg** using polynomial
splines. SAS offers many examples that employ other options in its
online documentation.