A Note on the Finite Sample Properties of Ridge Estimator
Madhulika Dube^{1*}, Isha^{2**}, Vinod Kumar^{3**}
Abstract: The article studies the finite sample properties of the generalized ridge regression estimator using a different approach. A comparative study of the relative bias and relative efficiency of the estimator with respect to the ordinary least squares estimator has been made empirically. The results have also been compared with existing results and are found to be quite different from those already available in the literature.
Keywords: Ordinary Least Squares, Generalized Ridge Regression (GRR), Relative Bias, Relative MSE, Relative Efficiency, Finite Sample Properties.
Introduction
In linear regression models, ridge regression is perhaps the most widely used technique in the presence of multicollinearity. Proposed by Hoerl and Kennard [3], [4], the ridge regression estimator is characterized by a scalar whose choice is subjective and requires the judgment of the analyst. However, working with the canonical form of the regression model, Hoerl and Kennard [3] defined the general ridge regression (GRR) estimator and suggested an initial choice of the characterizing scalar. Extensive work has been carried out since then, a good account of which is available in Vinod and Ullah [7] and Gruber [6]. Working with the initial choice of the characterizing scalar, Dwivedi et al. [1] derived the first two moments of the individual coefficients of the GRR estimator, assuming the error distribution to be normal. Hemmerle and Carey [2] also derived exact properties of two different forms of GRR estimators and demonstrated that the one suggested by Hoerl and Kennard [3] performs better in terms of relative bias and relative risk. Estimators perform differently when the sample size is small, all the more so in the presence of multicollinearity, because the negative effects of multicollinearity are magnified in smaller samples. Accordingly, assuming the error distribution to be normal, the paper attempts to assess the finite sample behavior of generalized ridge regression. For this purpose the relative bias, relative mean squared error and relative efficiency of the estimator in comparison to OLS have been evaluated numerically and compared with the existing results. Interestingly, the expressions for relative bias and relative risk are found to be different from the existing results obtained by Dwivedi et al. [1]. The following section describes the estimator and its properties and presents the numerical results. A brief outline of the proof of the theorem is provided at the end.
The Estimator and its Properties
Consider the canonical form of the linear regression model
$$y = X\beta + u \qquad (2.1)$$
where $y$ is an $n \times 1$ vector of observations on the dependent variable, $X$ is an $n \times p$ full column rank matrix of observations on the explanatory variables, and $\beta$ is a $p \times 1$ vector of unknown regression coefficients. The elements of the disturbance vector $u$ are assumed to be i.i.d., each following a normal distribution with mean zero and variance $\sigma^{2}$, so that
$$u \sim N(0, \sigma^{2} I_n) \qquad (2.2)$$
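Following [3], we can write
$$X'X = \Lambda = \mathrm{diag}(\lambda_1, \lambda_2, \ldots, \lambda_p)$$
This canonical reduction can be obtained by using the singular value decomposition of the matrix $X$ (see [7], p. 56). Using it, the general ridge regression estimator is given by
$$\hat{\beta}_K = (\Lambda + K)^{-1} X'y = (\Lambda + K)^{-1} \Lambda b \qquad (2.3)$$
where $K$ is a diagonal matrix with nonnegative elements $k_1, k_2, \ldots, k_p$ as the characterizing scalars, and
$$b = \Lambda^{-1} X'y$$
is the ordinary least squares estimator of $\beta$. Clearly, GRR is a biased estimator, with bias vector
$$\mathrm{Bias}(\hat{\beta}_K) = E(\hat{\beta}_K) - \beta = -(\Lambda + K)^{-1} K \beta \qquad (2.4)$$
and mean squared error matrix
$$\mathrm{MSE}(\hat{\beta}_K) = \sigma^{2} (\Lambda + K)^{-1} \Lambda (\Lambda + K)^{-1} + (\Lambda + K)^{-1} K \beta \beta' K (\Lambda + K)^{-1} \qquad (2.5)$$
As $\Lambda$ and $K$ are assumed to be diagonal matrices, $(\Lambda + K)^{-1}$ is also diagonal and the $i$th diagonal element of (2.5) is
$$\mathrm{MSE}(\hat{\beta}_{K,i}) = \frac{\lambda_i \sigma^{2} + k_i^{2} \beta_i^{2}}{(\lambda_i + k_i)^{2}} \qquad (2.6)$$
provided the $k_i$'s are nonstochastic.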
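The canonical reduction and the GRR estimator can be illustrated numerically. The following is a minimal sketch, not the authors' code: it assumes the standard forms $\hat{\beta}_K = (\Lambda + K)^{-1} Z'y$ and $b = \Lambda^{-1} Z'y$ with $\Lambda = Z'Z$ diagonal, obtains canonical regressors $Z = XV$ from the singular value decomposition $X = USV'$, and uses simulated data. The function names `canonical_form` and `grr_estimator`, the values of `k`, and the coefficient vector are all hypothetical.

```python
import numpy as np

def canonical_form(X):
    """Return Z = X V and lam with Z'Z = diag(lam), using the SVD X = U S V'."""
    _, s, vt = np.linalg.svd(X, full_matrices=False)
    Z = X @ vt.T
    return Z, s**2

def grr_estimator(Z, y, lam, k):
    """Generalized ridge estimate (Lambda + K)^{-1} Z'y for diagonal K = diag(k)."""
    return (Z.T @ y) / (lam + k)

rng = np.random.default_rng(0)
n, p = 30, 3
X = rng.normal(size=(n, p))
X[:, 2] = X[:, 1] + 0.05 * rng.normal(size=n)    # two nearly collinear columns
Z, lam = canonical_form(X)

beta = np.array([1.0, -0.5, 0.25])   # made-up coefficients in canonical coordinates
sigma2 = 1.0                         # error variance used in the simulation
y = Z @ beta + rng.normal(scale=np.sqrt(sigma2), size=n)

b = (Z.T @ y) / lam                  # OLS in canonical form
k = np.array([0.0, 0.5, 2.0])        # illustrative nonnegative k_i
beta_k = grr_estimator(Z, y, lam, k)

# Bias and per-coefficient MSE for fixed (nonstochastic) k_i
bias = -k * beta / (lam + k)
mse = (lam * sigma2 + k**2 * beta**2) / (lam + k)**2
print(beta_k, bias, mse)
```

Because $\Lambda$ and $K$ are diagonal, each GRR coefficient is simply the OLS coefficient multiplied by $\lambda_i / (\lambda_i + k_i)$, which is why the sketch needs no matrix inversion.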
Now minimizing the expression (2.6) term by term, i.e. minimizing the diagonal elements of the mean squared error matrix (2.5) with respect to $k_i$, yields
$$k_i = \frac{\sigma^{2}}{\beta_i^{2}} \qquad (2.7)$$
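The minimizing value can be checked numerically. The sketch below assumes the usual per-coefficient form of the GRR mean squared error for a nonstochastic $k$, namely $(\lambda \sigma^{2} + k^{2} \beta^{2}) / (\lambda + k)^{2}$, and compares a simple grid search with the closed form $\sigma^{2}/\beta^{2}$. The numerical values of `lam`, `beta` and `sigma2` are arbitrary illustrations.

```python
import numpy as np

def mse_i(k, lam, beta, sigma2):
    """Per-coefficient MSE of GRR for a fixed k (assumed form, see lead-in)."""
    return (lam * sigma2 + k**2 * beta**2) / (lam + k)**2

lam, beta, sigma2 = 4.0, 0.8, 1.5           # illustrative values
grid = np.linspace(0.0, 20.0, 200001)       # grid step of 1e-4
k_grid = grid[np.argmin(mse_i(grid, lam, beta, sigma2))]
k_star = sigma2 / beta**2                   # closed-form minimizer

print(k_grid, k_star)
# MSE at the minimizer compared with the OLS variance sigma^2 / lambda (k = 0)
print(mse_i(k_star, lam, beta, sigma2), sigma2 / lam)
```

At the minimizer the MSE reduces to $\sigma^{2} \beta^{2} / (\lambda \beta^{2} + \sigma^{2})$, which is always below the OLS variance $\sigma^{2}/\lambda$; the catch is that the minimizer depends on the unknown $\sigma^{2}$ and $\beta_i$.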
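Hoerl and Kennard [3] suggested starting with
$$\hat{k}_i = \frac{s^{2}}{b_i^{2}} \qquad (2.8)$$
where $b_i$ is the $i$th element of the ordinary least squares estimator $b$ of $\beta$ and
$$s^{2} = \frac{(y - Xb)'(y - Xb)}{\nu} \qquad (2.9)$$
is an unbiased estimator of $\sigma^{2}$, where $\nu = n - p$.
Using (2.8) in (2.3) leads to an adaptive estimator of $\beta$ as
$$\hat{\beta} = (\Lambda + \hat{K})^{-1} X'y, \quad \hat{K} = \mathrm{diag}(\hat{k}_1, \ldots, \hat{k}_p) \qquad (2.10)$$
where the $i$th element of $\hat{\beta}$ is given by
$$\hat{\beta}_i = \frac{\lambda_i b_i}{\lambda_i + s^{2}/b_i^{2}} = \frac{\lambda_i b_i^{3}}{\lambda_i b_i^{2} + s^{2}} \qquad (2.11)$$
with $\lambda_i$ the $i$th diagonal element of $X'X$.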
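A short sketch of the adaptive estimator follows. It assumes the plug-in choice $\hat{k}_i = s^{2}/b_i^{2}$, so that each coefficient becomes $\lambda_i b_i^{3} / (\lambda_i b_i^{2} + s^{2})$, and it expects regressors that are already in canonical form (orthogonal columns). The function name `adaptive_grr` and the simulated data are hypothetical.

```python
import numpy as np

def adaptive_grr(Z, y):
    """Adaptive GRR in canonical form, assuming k_i is estimated by s^2 / b_i^2."""
    n, p = Z.shape
    lam = np.sum(Z**2, axis=0)             # diagonal of Z'Z (Z assumed canonical)
    b = (Z.T @ y) / lam                    # OLS estimates
    resid = y - Z @ b
    s2 = resid @ resid / (n - p)           # unbiased estimator of sigma^2
    beta_hat = lam * b**3 / (lam * b**2 + s2)
    return beta_hat, b, s2, lam

rng = np.random.default_rng(1)
Q, _ = np.linalg.qr(rng.normal(size=(20, 3)))    # orthonormal columns
Z = Q * np.array([3.0, 1.0, 0.2])                # Z'Z = diag(9, 1, 0.04)
y = Z @ np.array([1.0, 0.5, 2.0]) + rng.normal(size=20)

beta_hat, b, s2, lam = adaptive_grr(Z, y)
print(b, beta_hat, s2)
```

Each adaptive coefficient equals the OLS coefficient times the data-dependent factor $\lambda_i b_i^{2} / (\lambda_i b_i^{2} + s^{2})$, which lies between 0 and 1. Because that factor is random, the moments of the estimator are no longer given by the fixed-$k$ formulas.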
For finite sample sizes, the following theorem gives the first and second moments of $\hat{\beta}_i$.
Theorem: Assuming normality of the errors, the first and second moments of $\hat{\beta}_i$ of (2.11) are given by expressions (2.12) and (2.13), where $\theta_i = \lambda_i \beta_i^{2} / (2\sigma^{2})$ is the noncentrality parameter. Using (2.12) and (2.13) we can compute the bias and mean squared error of $\hat{\beta}_i$.
Proof: See Appendix.
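Using these, it is easy to compute the relative bias and relative mean squared error using
$$\mathrm{RB}(\hat{\beta}_i) = \frac{E(\hat{\beta}_i) - \beta_i}{\beta_i} \qquad (2.14)$$
and
$$\mathrm{RelMSE}(\hat{\beta}_i) = \frac{E(\hat{\beta}_i - \beta_i)^{2}}{\beta_i^{2}} \qquad (2.15)$$
respectively. The efficiency of the OLS relative to the GRR estimator is obtained from
$$\mathrm{Eff}(\hat{\beta}_i) = \frac{\mathrm{MSE}(\hat{\beta}_i)}{V(b_i)} \times 100 \qquad (2.16)$$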
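These three measures can also be approximated by simulation, which gives an independent cross-check on any tabulated value. The sketch below is not the method of the paper, which relies on exact moment expressions. It makes the following assumptions for a single coefficient in canonical form with $\lambda_i = \sigma^{2} = 1$: $b_i \sim N(\beta_i, 1)$, $\nu s^{2} \sim \chi^{2}_{\nu}$ independently of $b_i$, $\beta_i = \sqrt{2\theta}$ (that is, $\theta = \lambda_i \beta_i^{2} / (2\sigma^{2})$), the adaptive form $b_i^{3} / (b_i^{2} + s^{2})$, and efficiency measured as $100 \times \mathrm{MSE}/V(b_i)$. The function name, seed and number of draws are arbitrary.

```python
import numpy as np

def simulate_relative_measures(theta, nu, n_draws=400_000, seed=0):
    """Monte Carlo relative bias, relative MSE and efficiency (in %) of one
    adaptive GRR coefficient, under the assumptions stated in the lead-in."""
    rng = np.random.default_rng(seed)
    beta = np.sqrt(2.0 * theta)                     # lambda = sigma^2 = 1
    b = rng.normal(loc=beta, scale=1.0, size=n_draws)
    s2 = rng.chisquare(df=nu, size=n_draws) / nu    # drawn independently of b
    beta_hat = b**3 / (b**2 + s2)
    mse = np.mean((beta_hat - beta)**2)
    rel_bias = (beta_hat.mean() - beta) / beta
    rel_mse = mse / beta**2
    efficiency = 100.0 * mse / 1.0                  # Var(b) = sigma^2 / lambda = 1
    return rel_bias, rel_mse, efficiency

rb, rm, eff = simulate_relative_measures(theta=0.01, nu=1)
print(rb, rm, eff)
```

Simulation error shrinks only at the rate $1/\sqrt{\text{n\_draws}}$, and for small $\theta$ the relative measures divide by a small $\beta_i$, so the estimates are noisy there; a larger number of draws may be needed before comparing digits.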
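The values of the relative bias, relative MSE and relative efficiency have been tabulated for a few selected values of the noncentrality parameter and the degrees of freedom. These results are provided in Tables 2.1, 2.2 and 2.3, respectively, and have been graphed for selected values of the noncentrality parameter and the degrees of freedom.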
The expressions (2.12) and (2.13) are clearly different from those obtained by Dwivedi et al. [1], and therefore a substantial difference is observed numerically. Unlike the results obtained by Dwivedi et al. [1], the magnitude of the relative bias is found to be a decreasing function of the noncentrality parameter and an increasing function of the degrees of freedom over part of the tabulated range (in Table 2.1, for one or two degrees of freedom and a noncentrality parameter of at most 2). However, for five or more degrees of freedom in Table 2.1, the magnitude of the relative bias is found to be increasing in the noncentrality parameter. Interestingly, as these quantities increase the relative bias tends to 1; the justification for this comes from the fact that the ridge regression estimator shrinks the parameter vector towards zero. The relative MSE and relative efficiency are also observed to be decreasing for specific values of the noncentrality parameter and the degrees of freedom. Hence, the finite sample properties of the ridge regression estimator depend heavily not only on the noncentrality parameter but on the degrees of freedom as well. It is also pertinent to mention that ambiguities in the numerical computations of relative bias, relative MSE and relative efficiency are found in the paper by Dwivedi et al. [1] for some parameter values, as is evident from the respective tables.
Table 2.1: Relative bias for specific values of the noncentrality parameter and the degrees of freedom
Noncentrality          Degrees of freedom
parameter      Result  1         2         5         10        20        50
0.01           E0      0.24917   0.28795   0.46717   0.73385   0.9078    0.98448
               E1      0.249     0.287     0.318     0.33      0.337     0.353
0.05           E0      0.24588   0.28419   0.47043   0.73901   0.91041   0.98502
               E1      0.246     0.283     0.313     0.325     0.332     0.348
0.1            E0      0.24187   0.27962   0.47464   0.74536   0.91358   0.98566
               E1      0.242     0.278     0.307     0.319     0.326     0.341
0.5            E0      0.21306   0.24776   0.5126    0.79201   0.93533   0.98992
               E1      0.213     0.243     0.268     0.278     0.283     0.296
0.7            E0      0.2006    0.2347    0.53362   0.81263   0.94412   0.99155
               E1      0.201     0.228     0.251     0.26      0.265     0.276
0.9            E0      0.18924   0.22337   0.55544   0.83154   0.95175   0.99292
               E1      0.189     0.215     0.236     0.244     0.248     0.258
1              E0      0.18394   0.21832   0.56653   0.84038   0.95518   0.99352
               E1      0.184     0.208     0.228     0.236     0.241     0.25
2              E0      0.14196   0.18716   0.67665   0.90889   0.97873   0.99732
               E1      0.142     0.158     0.172     0.177     0.18      0.185
5              E0      0.1107    0.25493   0.89913   0.98566   0.99789   0.99982
               E1      0.08      0.086     0.091     0.093     0.094     0.095
10             E0      0.56774   0.70662   0.99125   0.99952   0.99996   1
               E1      0.045     0.047     0.048     0.049     0.049     0.049
50             E0      1         1         1         1         1         1
               E1      0.014     0.014     0.014     0.014     0.014     0.014
E0: Results by our approach.
E1: Results by Dwivedi et al. [1].
Table 2.2: Relative MSE for specific values of the noncentrality parameter and the degrees of freedom
Noncentrality          Degrees of freedom
parameter      Result  1         2         5         10        20        50
0.01           E0      31.53006  28.35765  19.30435  8.598509  2.945015  1.17679
               E1      31.53     28.423    28.85     24.837    24.267    22.754
0.05           E0      6.525343  5.922473  4.079349  2.146142  1.245423  1.010629
               E1      6.525     5.94      5.454     5.263     5.155     4.87
0.1            E0      3.394597  3.111778  2.178566  1.348879  1.03842   0.991026
               E1      3.395     3.123     2.898     2.809     2.758     2.626
0.5            E0      0.853875  0.819369  0.686614  0.779676  0.909809  0.982917
               E1      0.854     0.827     0.804     0.795     0.789     0.776
0.7            E0      0.657999  0.63841   0.598543  0.768053  0.914648  0.985032
               E1      0.658     0.646     0.635     0.631     0.628     0.622
0.9            E0      0.543213  0.530987  0.561328  0.774361  0.922711  0.987153
               E1      0.543     0.538     0.533     0.531     0.53      0.527
1              E0      0.501285  0.570042  0.552523  0.780415  0.927023  0.988142
               E1      0.501     0.498     0.495     0.494     0.493     0.492
2              E0      0.289568  0.292192  0.593021  0.858859  0.962787  0.994925
               E1      0.29      0.292     0.294     0.295     0.295     0.296
5              E0      0.118409  0.213709  0.854008  0.975889  0.996133  0.999642
               E1      0.123     0.123     0.124     0.124     0.124     0.124
10             E0      0.455941  0.620464  0.986397  0.999155  0.999928  0.999996
               E1      0.059     0.058     0.058     0.058     0.057     0.057
50             E0      1         1         1         1         1         1
               E1      0.013     0.013     0.013     0.013     0.013     0.013
E0: Results by our approach.
E1: Results by Dwivedi et al. [1].
Graph: Relative bias and relative MSE of the GRR estimator (Figure 1 and Figure 2)
Table 2.3: Relative efficiency for specific values of the noncentrality parameter and the degrees of freedom
Noncentrality          Degrees of freedom
parameter      Result  1         2         5         10        20        50
0.01           E0      63.06011  56.7153   38.6087   17.19702  5.890029  2.35358
               E1      63.06     56.846    51.701    49.674    48.534    45.508
0.05           E0      65.25343  59.22473  40.79349  21.46142  12.45423  10.10629
               E1      65.253    59.398    54.542    52.626    51.548    48.698
0.1            E0      67.89194  62.23557  43.57132  26.97759  20.76841  19.82052
               E1      67.892    62.465    57.955    56.17     55.167    52.527
0.5            E0      85.3875   81.93687  68.66142  77.96756  90.98089  98.29169
               E1      85.388    82.69     80.394    79.46     78.937    77.632
0.7            E0      92.11985  89.37737  83.79599  107.5274  128.0507  137.9045
               E1      92.12     90.405    88.907    88.282    87.936    87.106
0.9            E0      97.77838  95.57758  101.039   139.385   166.088   177.6875
               E1      97.779    96.845    95.985    95.607    95.403    94.949
1              E0      100.2571  114.0084  110.5045  156.0829  185.4047  197.6283
               E1      100.258   99.65     99.056    98.783    98.638    98.34
2              E0      115.8271  116.8768  237.2084  343.5434  385.1149  397.9702
               E1      115.883   116.927   117.683   117.94    118.099   118.569
5              E0      118.4089  213.7089  854.0076  975.8885  996.1333  999.6423
               E1      123.245   123.494   123.602   123.635   123.653   123.833
10             E0      911.8829  1240.929  1972.794  1998.311  1999.857  1999.992
               E1      117.495   116.319   115.394   115.026   114.891   114.656
50             E0      10000     10000     10000     10000     10000     10000
               E1      129.538   128.682   128.197   128.039   127.988   127.827
E0: Results by our approach.
E1: Results by Dwivedi et al. [1].
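The entries of Tables 2.2 and 2.3 are linked to each other. If the efficiency is taken as $100 \times \mathrm{MSE}(\hat{\beta}_i)/V(b_i)$ and the noncentrality parameter as $\theta = \lambda_i \beta_i^{2} / (2\sigma^{2})$ (both are assumptions about the definitions), then $V(b_i)/\beta_i^{2} = 1/(2\theta)$ and the efficiency should equal $200\,\theta$ times the relative MSE. The sketch below checks this relation for the E0 entries with one degree of freedom, copied from the two tables; the helper name `implied_efficiency` is hypothetical.

```python
# (theta, relative MSE from Table 2.2, relative efficiency from Table 2.3)
# E0 rows, degrees of freedom = 1
rows = [
    (0.01, 31.53006, 63.06011),
    (0.05, 6.525343, 65.25343),
    (0.1, 3.394597, 67.89194),
    (0.5, 0.853875, 85.3875),
    (0.7, 0.657999, 92.11985),
    (0.9, 0.543213, 97.77838),
    (1, 0.501285, 100.2571),
    (2, 0.289568, 115.8271),
    (5, 0.118409, 118.4089),
    (10, 0.455941, 911.8829),
    (50, 1.0, 10000.0),
]

def implied_efficiency(theta, rel_mse):
    """Efficiency (%) implied by a relative MSE, assuming Var(b_i)/beta_i^2 = 1/(2*theta)."""
    return 200.0 * theta * rel_mse

# Largest relative gap between implied and tabulated efficiency
max_rel_gap = max(abs(implied_efficiency(t, m) - e) / e for t, m, e in rows)
print(max_rel_gap)
```

The same loop can be pointed at any other column or at the E1 rows, which makes it a quick way to spot transcription slips between the two tables.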
Appendix
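In order to find the expression (2.12) of the theorem, let us define
(A.1)
where $x_i$ ($i = 1, 2, \ldots, p$) is the $i$th column vector of $X$. Since $b_i$ is the OLS estimator of $\beta_i$ and follows $N(\beta_i, \sigma^{2}/\lambda_i)$ with $\lambda_i = x_i'x_i$, the quantity defined in (A.1) is also normally distributed.
Next, the distribution of $\nu s^{2}/\sigma^{2}$ is $\chi^{2}$ with $\nu = n - p$ degrees of freedom and is independent of the distribution of $b_i$. Using (A.1), we can write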
(A.2)
(A.3)
Following [5], we can write the above equation as
(A.4)
We notice that the value of the integral is zero for odd values of j, because the power of z is then odd. Dropping such terms, we have
(A.5)
Using the duplication formula for the gamma function, the above expression becomes
(A.6)
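The duplication formula referred to in this step is presumably Legendre's duplication formula for the gamma function, $\Gamma(2z) = 2^{2z-1}\,\Gamma(z)\,\Gamma(z + \tfrac{1}{2})/\sqrt{\pi}$. That identification is an assumption, since the formula is not written out in the text. A short numerical check of the identity, with a hypothetical helper name:

```python
import math

def gamma_duplication_rhs(z):
    """Right-hand side of Legendre's duplication formula for Gamma(2z)."""
    return 2.0 ** (2.0 * z - 1.0) * math.gamma(z) * math.gamma(z + 0.5) / math.sqrt(math.pi)

# Compare both sides at a few positive arguments
for z in (0.5, 1.0, 2.5, 7.25):
    print(z, math.gamma(2.0 * z), gamma_duplication_rhs(z))
```

In derivations of this kind the formula is typically used to rewrite gamma functions of half-integer arguments, which arise from even moments of a normal variable, in terms of factorial-type quantities.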
The integral part
is computed using the transformations
which gives the integral part as
(A.7)
Substituting the value of the integral from (A.7) into (A.6), we get the first raw moment of $\hat{\beta}_i$ as given in the theorem.
Proceeding in the same way, the second and higher raw moments of $\hat{\beta}_i$ can be obtained.
References
[1] Dwivedi, T. D., Srivastava, V. K. and Hall, R. L., "Finite Sample Properties of Ridge Estimators". Technometrics, Vol. 22, No. 2, pp. 205-212, 1980.
[2] Hemmerle, W. J. and Carey, M. B., "Some Properties of Generalized Ridge Estimators". Communications in Statistics: Simulation and Computation, 12:3, 239-253, 1983.
[3] Hoerl, A. E. and Kennard, R. W., "Ridge Regression: Biased Estimation for Nonorthogonal Problems". Technometrics, 12, 55-67, 1970a.
[4] Hoerl, A. E. and Kennard, R. W., "Ridge Regression: Applications to Nonorthogonal Problems". Technometrics, 12, 69-82, 1970b.
[5] Judge, G. G. and Bock, M. E., The Statistical Implications of Pre-Test and Stein-Rule Estimators in Econometrics. North-Holland Publishing Company, Amsterdam, 1978.
[6] Gruber, M. H. J., Regression Estimators: A Comparative Study. The Johns Hopkins University Press, 2010.
[7] Vinod, H. D. and Ullah, A., Recent Advances in Regression Methods. Marcel Dekker, New York, 1981.