
A Note on the Finite Sample Properties of the Ridge Estimator

1Professor and HOD, 2,3Research Scholars, Department of Statistics, M. D. University, Rohtak, Haryana, INDIA.

Research Article

Abstract: The article studies the finite sample properties of the generalized ridge regression estimator using a different approach. A comparative study of the relative bias and relative efficiency of the estimator with respect to the ordinary least squares estimator has been made empirically. The results have also been compared with the existing results and are found to be quite different from those already available in the literature.

Keywords: Ordinary Least Squares, Generalized Ridge Regression (GRR), Relative Bias, Relative MSE, Relative Efficiency, Finite Sample Properties.

Introduction

In linear regression models, ridge regression is perhaps the most widely used technique in the presence of multicollinearity. Proposed by Hoerl and Kennard [3, 4], the ridge regression estimator is characterized by a scalar whose choice is subjective, requiring the judgment of the analyst. However, working with the canonical form of the regression model, Hoerl and Kennard [3] defined the generalized ridge regression estimator and suggested an initial choice of the characterizing scalars. Extensive work has been carried out since then, a good account of which is available in Vinod and Ullah [7] and Gruber [6]. Working with this initial choice of the characterizing scalars, Dwivedi et al. [1] worked out the first two moments of the individual coefficients of the GRR estimator assuming the error distribution to be normal. Hemmerle and Carey [2] also worked out exact properties of two different forms of GRR estimators but demonstrated that the one suggested by Hoerl and Kennard [3] performs better in terms of relative bias and relative risk. It is worth mentioning that estimators perform differently when the sample size is small, and more so in the presence of multicollinearity, as the negative effects of multicollinearity are magnified in smaller samples. Owing to this, and assuming the error distribution to be normal, this paper attempts to assess the finite sample behavior of generalized ridge regression. For this purpose the relative bias, relative mean squared error and relative efficiency of the estimator in comparison with OLS have been evaluated numerically and compared with the existing results. Interestingly, the expressions for relative bias and relative risk are found to differ from the existing results obtained by Dwivedi et al. [1]. The following section describes the estimator and its properties and enumerates the results empirically. A brief outline of the proof of the theorem is provided at the end.
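To fix ideas, here is a minimal numerical sketch (illustrative only, not the paper's computations) of the ordinary ridge estimator characterized by a single scalar k, compared with OLS on nearly collinear data:

```python
# Minimal illustrative sketch: ridge estimator (X'X + kI)^{-1} X'y
# versus OLS on nearly collinear data. All values are assumptions
# chosen for illustration.
import numpy as np

rng = np.random.default_rng(0)
n, p = 30, 3
z = rng.normal(size=n)
# two nearly collinear regressors plus one independent regressor
X = np.column_stack([z, z + 0.01 * rng.normal(size=n),
                     rng.normal(size=n)])
beta = np.array([1.0, 2.0, 3.0])
y = X @ beta + rng.normal(size=n)

b_ols = np.linalg.solve(X.T @ X, X.T @ y)
k = 0.1                               # characterizing scalar (analyst's choice)
b_ridge = np.linalg.solve(X.T @ X + k * np.eye(p), X.T @ y)

# ridge shrinks the coefficient vector relative to OLS
print(np.linalg.norm(b_ridge) < np.linalg.norm(b_ols))
```

The choice of k here is arbitrary; the generalized form discussed below replaces the single scalar by one constant per coefficient.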

The Estimator and its Properties

Consider the canonical form of the linear regression model

y = Xβ + u          (2.1)

where y is an n×1 vector of observations on the dependent variable, X is an n×p full column rank matrix of n observations on p explanatory variables, and β is a p×1 vector of unknown regression coefficients. The elements of the disturbance vector u are assumed to be i.i.d., each following a normal distribution with mean zero and variance σ², so that

E(u) = 0,  E(uu') = σ²In          (2.2)

Following [3], we can write

X'X = Λ = diag(λ1, λ2, …, λp)

This canonical reduction can be obtained by using the singular value decomposition of the n×p matrix X (see [7], pp. 5-6). Using it, the general ridge regression estimator is given by

β̂(K) = (Λ + K)⁻¹X'y          (2.3)

where K = diag(k1, k2, …, kp) is a diagonal matrix whose nonnegative elements are the characterizing scalars, and

b = Λ⁻¹X'y

is the ordinary least squares estimator of β. Clearly, the GRR estimator is biased, with bias vector

Bias(β̂(K)) = E(β̂(K)) - β = -(Λ + K)⁻¹Kβ          (2.4)

and

MSE(β̂(K)) = E[(β̂(K) - β)(β̂(K) - β)'] = σ²(Λ + K)⁻¹Λ(Λ + K)⁻¹ + (Λ + K)⁻¹Kββ'K(Λ + K)⁻¹          (2.5)

As Λ and K are assumed to be diagonal matrices, (Λ + K) is also diagonal, and the ith diagonal element of (2.5) is

MSE(β̂i(ki)) = (σ²λi + ki²βi²)/(λi + ki)²          (2.6)

provided the ki's are non-stochastic.
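As a quick check of the scalar mean squared error expression above for a fixed, non-stochastic ki, one can simulate the canonical model in which bi ~ N(βi, σ²/λi); the following sketch uses assumed illustrative values:

```python
# Simulation check of the scalar MSE formula for non-stochastic k:
# MSE(beta_hat_i) = (sigma^2*lam + k^2*beta^2) / (lam + k)^2
# in the canonical model where b ~ N(beta, sigma^2/lam).
# Parameter values are illustrative assumptions.
import numpy as np

rng = np.random.default_rng(4)
lam, sigma, beta, k = 2.0, 1.0, 1.5, 0.7
reps = 400_000

b = rng.normal(beta, sigma / np.sqrt(lam), size=reps)
bhat = lam * b / (lam + k)            # ridge with fixed k, canonical form
mse_mc = np.mean((bhat - beta)**2)
mse_formula = (sigma**2 * lam + k**2 * beta**2) / (lam + k)**2

print(abs(mse_mc - mse_formula) < 0.01)
```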

Now, minimizing the expression (2.6) term by term, i.e. minimizing the diagonal elements of the mean squared error matrix (2.5) with respect to ki, yields

ki = σ²/βi²          (2.7)

Replacing the unknown σ² and βi by their least squares estimates gives the operational choice

k̂i = s²/bi²          (2.8)

where bi is the ith element of the ordinary least squares estimator b of β, and

s² = (y - Xb)'(y - Xb)/(n - p)          (2.9)

is an unbiased estimator of σ². The operational GRR estimator then becomes

β̂ = (Λ + K̂)⁻¹X'y,  K̂ = diag(k̂1, k̂2, …, k̂p)          (2.10)

where the ith element of β̂ is given by

β̂i = λibi³/(λibi² + s²)          (2.11)
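A hedged sketch of the operational estimator, assuming the Hoerl and Kennard choice k̂i = s²/bi² applied in the canonical form (the helper name grr_canonical is ours, not from the paper):

```python
# Sketch: operational generalized ridge regression in canonical form,
# with adaptive constants k_i = s^2 / b_i^2. Illustrative code only.
import numpy as np

def grr_canonical(X, y):
    """Return (b_ols, b_grr), assuming X'X is (numerically) diagonal."""
    n, p = X.shape
    lam = np.sum(X**2, axis=0)        # diagonal elements of X'X
    b = (X.T @ y) / lam               # OLS when X'X = diag(lam)
    resid = y - X @ b
    s2 = resid @ resid / (n - p)      # unbiased estimator of sigma^2
    k = s2 / b**2                     # adaptive ridge constants
    b_grr = lam * b / (lam + k)       # equals lam*b^3 / (lam*b^2 + s^2)
    return b, b_grr

rng = np.random.default_rng(2)
A = rng.normal(size=(50, 3))
U, s, Gt = np.linalg.svd(A, full_matrices=False)
Xc = A @ Gt.T                         # canonical reduction: Xc'Xc is diagonal
beta = np.array([2.0, -1.0, 0.5])
y = Xc @ beta + rng.normal(size=50)

b, b_grr = grr_canonical(Xc, y)
print(np.all(np.abs(b_grr) < np.abs(b)))   # componentwise shrinkage
```

Since each k̂i is positive, every coefficient is shrunk towards zero relative to its OLS counterpart.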

For finite sample sizes, the following theorem gives the first and second moments of β̂i.

Theorem: Assuming normality of the errors, the first and second moments of β̂i of (2.11) are given by

(2.12)

(2.13)

where θi = λiβi²/2σ² is the non-centrality parameter. Using (2.12) and (2.13) we can compute the bias and mean squared error of β̂i.

Proof: see Appendix.

Using these, it is easy to compute the relative bias and relative mean squared error using

RB(β̂i) = E(β̂i - βi)/βi          (2.14)

and

RM(β̂i) = E(β̂i - βi)²/βi²          (2.15)

respectively. The efficiency of the OLS estimator relative to the GRR estimator is obtained from

Eff = 100 · E(β̂i - βi)²/E(bi - βi)²          (2.16)

The values of the relative bias, relative MSE and relative efficiency have been tabulated for a few selected values of the non-centrality parameter θi and the degrees of freedom ν = n - p. These results are provided in Tables 2.1, 2.2 and 2.3 respectively and have been graphed for selected values of θi and ν.
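If the operational form of β̂i is taken as λibi³/(λibi² + s²) with θi = λiβi²/2σ² (our reading of the setup above, stated here as an assumption), the relative measures can also be approximated by Monte Carlo simulation; a hedged sketch:

```python
# Monte Carlo sketch of relative bias, relative MSE and relative
# efficiency of one operational GRR coefficient. The parameterization
# theta = lam*beta^2/(2*sigma^2) is our assumption from the text.
import numpy as np

def relative_measures(theta, nu, reps=200_000, seed=0):
    rng = np.random.default_rng(seed)
    lam, sigma = 1.0, 1.0
    beta = np.sqrt(2.0 * theta * sigma**2 / lam)
    b = rng.normal(beta, sigma / np.sqrt(lam), size=reps)    # OLS draws
    s2 = sigma**2 * rng.chisquare(nu, size=reps) / nu        # s^2 draws
    bhat = lam * b**3 / (lam * b**2 + s2)                    # GRR draws
    rbias = np.mean(bhat - beta) / beta
    rmse = np.mean((bhat - beta)**2) / beta**2
    eff = 100.0 * np.mean((bhat - beta)**2) / (sigma**2 / lam)
    return rbias, rmse, eff

rb, rm, ef = relative_measures(theta=0.5, nu=5)
print(rb < 0)   # shrinkage towards zero makes the relative bias negative
```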

The expressions (2.12) and (2.13) are clearly different from those obtained by Dwivedi et al. [1], and therefore a substantial difference is observed numerically. Unlike the results obtained by Dwivedi et al. [1], the magnitude of the relative bias is found to be a decreasing function of the non-centrality parameter and an increasing function of the degrees of freedom when θi is small. However, for large θi the magnitude of the relative bias is found to be increasing. Interestingly, as θi increases the relative bias tends to -1; the justification comes easily from the fact that the ridge regression estimator shrinks the parameter vector towards zero. The relative MSE and relative efficiency are also observed to be decreasing for specific values of θi and ν. Hence, the finite sample properties of the ridge regression estimator depend heavily not only on the non-centrality parameter but on the degrees of freedom as well. It is also pertinent to mention that ambiguities in the numerical computations of relative bias, relative MSE and relative efficiency are found in the paper by Dwivedi et al. [1] for large θi, which are evident in the respective tables.

Table 2.1: Relative Bias for specific values of non-centrality parameter and degrees of freedom

θ            ν = 1      ν = 2      ν = 5      ν = 10     ν = 20     ν = 50
0.01  E0   -0.24917   -0.28795   -0.46717   -0.73385   -0.9078    -0.98448
      E1   -0.249     -0.287     -0.318     -0.33      -0.337     -0.353
0.05  E0   -0.24588   -0.28419   -0.47043   -0.73901   -0.91041   -0.98502
      E1   -0.246     -0.283     -0.313     -0.325     -0.332     -0.348
0.1   E0   -0.24187   -0.27962   -0.47464   -0.74536   -0.91358   -0.98566
      E1   -0.242     -0.278     -0.307     -0.319     -0.326     -0.341
0.5   E0   -0.21306   -0.24776   -0.5126    -0.79201   -0.93533   -0.98992
      E1   -0.213     -0.243     -0.268     -0.278     -0.283     -0.296
0.7   E0   -0.2006    -0.2347    -0.53362   -0.81263   -0.94412   -0.99155
      E1   -0.201     -0.228     -0.251     -0.26      -0.265     -0.276
0.9   E0   -0.18924   -0.22337   -0.55544   -0.83154   -0.95175   -0.99292
      E1   -0.189     -0.215     -0.236     -0.244     -0.248     -0.258
1     E0   -0.18394   -0.21832   -0.56653   -0.84038   -0.95518   -0.99352
      E1   -0.184     -0.208     -0.228     -0.236     -0.241     -0.25
2     E0   -0.14196   -0.18716   -0.67665   -0.90889   -0.97873   -0.99732
      E1   -0.142     -0.158     -0.172     -0.177     -0.18      -0.185
5     E0   -0.1107    -0.25493   -0.89913   -0.98566   -0.99789   -0.99982
      E1   -0.08      -0.086     -0.091     -0.093     -0.094     -0.095
10    E0   -0.56774   -0.70662   -0.99125   -0.99952   -0.99996   -1
      E1   -0.045     -0.047     -0.048     -0.049     -0.049     -0.049
50    E0   -1         -1         -1         -1         -1         -1
      E1   -0.014     -0.014     -0.014     -0.014     -0.014     -0.014

E0 – results by our approach; E1 – results by Dwivedi et al. [1].

Table 2.2: Relative MSE for specific values of non-centrality parameter and degrees of freedom

θ            ν = 1      ν = 2      ν = 5      ν = 10     ν = 20     ν = 50
0.01  E0   31.5301    28.3576    19.3043    8.59851    2.94502    1.17679
      E1   31.53      28.423     28.85      24.837     24.267     22.754
0.05  E0   6.52534    5.92247    4.07935    2.14614    1.24542    1.010629
      E1   6.525      5.94       5.454      5.263      5.155      4.87
0.1   E0   3.3946     3.11178    2.17857    1.34888    1.03842    0.991026
      E1   3.395      3.123      2.898      2.809      2.758      2.626
0.5   E0   0.853875   0.819369   0.686614   0.779676   0.909809   0.982917
      E1   0.854      0.827      0.804      0.795      0.789      0.776
0.7   E0   0.657999   0.63841    0.598543   0.768053   0.914648   0.985032
      E1   0.658      0.646      0.635      0.631      0.628      0.622
0.9   E0   0.543213   0.530987   0.561328   0.774361   0.922711   0.987153
      E1   0.543      0.538      0.533      0.531      0.53       0.527
1     E0   0.501285   0.570042   0.552523   0.780415   0.927023   0.988142
      E1   0.501      0.498      0.495      0.494      0.493      0.492
2     E0   0.289568   0.292192   0.593021   0.858859   0.962787   0.994925
      E1   0.29       0.292      0.294      0.295      0.295      0.296
5     E0   0.118409   0.213709   0.854008   0.975889   0.996133   0.999642
      E1   0.123      0.123      0.124      0.124      0.124      0.124
10    E0   0.455941   0.620464   0.986397   0.999155   0.999928   0.999996
      E1   0.059      0.058      0.058      0.058      0.057      0.057
50    E0   1          1          1          1          1          1
      E1   0.013      0.013      0.013      0.013      0.013      0.013

E0 – results by our approach; E1 – results by Dwivedi et al. [1].

Graph: relative bias and relative MSE of the GRR estimator (Figure 1 and Figure 2)

Table 2.3: Relative Efficiency for specific values of non-centrality parameter and degrees of freedom

θ            ν = 1      ν = 2      ν = 5      ν = 10     ν = 20     ν = 50
0.01  E0   63.0601    56.7153    38.6087    17.197     5.89003    2.35358
      E1   63.06      56.846     51.701     49.674     48.534     45.508
0.05  E0   65.2534    59.2247    40.7935    21.4614    12.4542    10.10629
      E1   65.253     59.398     54.542     52.626     51.548     48.698
0.1   E0   67.8919    62.2356    43.5713    26.9776    20.7684    19.82052
      E1   67.892     62.465     57.955     56.17      55.167     52.527
0.5   E0   85.3875    81.9369    68.6614    77.9676    90.9809    98.29169
      E1   85.388     82.69      80.394     79.46      78.937     77.632
0.7   E0   92.1198    89.3774    83.796     107.527    128.051    137.9045
      E1   92.12      90.405     88.907     88.282     87.936     87.106
0.9   E0   97.7784    95.5776    101.039    139.385    166.088    177.6875
      E1   97.779     96.845     95.985     95.607     95.403     94.949
1     E0   100.257    114.008    110.504    156.083    185.405    197.6283
      E1   100.258    99.65      99.056     98.783     98.638     98.34
2     E0   115.827    116.877    237.208    343.543    385.115    397.9702
      E1   115.883    116.927    117.683    117.94     118.099    118.569
5     E0   118.409    213.709    854.008    975.889    996.133    999.6423
      E1   123.245    123.494    123.602    123.635    123.653    123.833
10    E0   911.883    1240.93    1972.79    1998.31    1999.86    1999.992
      E1   117.495    116.319    115.394    115.026    114.891    114.656
50    E0   10000      10000      10000      10000      10000      10000
      E1   129.538    128.682    128.197    128.039    127.988    127.827

E0 – results by our approach; E1 – results by Dwivedi et al. [1].

Appendix

In order to find the expression (2.12) of the theorem, let us define

(A.1)

where xi (i = 1, 2, …, p) is the ith column vector of X. Since bi is the OLS estimator of βi and follows N(βi, σ²/λi), the distribution of √λi bi/σ is N(√(2θi), 1), where θi = λiβi²/2σ².

Next, the distribution of (n - p)s²/σ² is χ² with ν = n - p degrees of freedom and is independent of the distribution of bi. Using these results, we can write

(A.2)

(A.3)

(A.4)

Following [5], we can write the above equation as

We notice that the value of the integral is zero for odd values of j because then the power of z is odd. Dropping such terms, we have

(A.5)

Using the duplication formula, the above expression becomes

(A.6)

The integral part

is computed using the transformations

which gives the integral part as

(A.7)

Substituting the above value of (A.7) into (A.6), we get the first raw moment of β̂i as given in the theorem.

Proceeding in the same way, the second and higher raw moments of β̂i can be obtained.
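The derivation above rests on bi being normal and independent of s², with (n - p)s²/σ² following a chi-square distribution. That independence is easy to check by simulation (illustrative values only):

```python
# Simulation check that an OLS coefficient and s^2 are uncorrelated
# in a small Gaussian regression, as used in the appendix derivation.
# Design matrix and parameter values are illustrative assumptions.
import numpy as np

rng = np.random.default_rng(3)
n, p, reps = 25, 2, 20_000
X = rng.normal(size=(n, p))
H = np.linalg.inv(X.T @ X) @ X.T      # maps y to the OLS estimate b
beta = np.array([1.0, -2.0])

b1 = np.empty(reps)
s2 = np.empty(reps)
for r in range(reps):
    y = X @ beta + rng.normal(size=n)
    b = H @ y
    e = y - X @ b
    b1[r] = b[0]
    s2[r] = e @ e / (n - p)

# sample correlation between b_1 and s^2 should be near zero
print(abs(np.corrcoef(b1, s2)[0, 1]) < 0.05)
```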

References

1. Dwivedi, T. D., Srivastava, V. K. and Hall, R. L., "Finite Sample Properties of Ridge Estimators". Technometrics, Vol. 22, No. 2, pp. 205-212, 1980.
2. Hemmerle, W. J. and Carey, M. B., "Some Properties of Generalized Ridge Estimators". Communications in Statistics: Simulation and Computation, 12(3), 239-253, 1983.
3. Hoerl, A. E. and Kennard, R. W., "Ridge Regression: Biased Estimation for Nonorthogonal Problems". Technometrics, 12, 55-67, 1970a.
4. Hoerl, A. E. and Kennard, R. W., "Ridge Regression: Applications to Nonorthogonal Problems". Technometrics, 12, 69-82, 1970b.
5. Judge, G. G. and Bock, M. E., The Statistical Implications of Pre-Test and Stein-Rule Estimators in Econometrics. North-Holland Publishing Company, Amsterdam, 1978.
6. Gruber, Marvin H. J., Regression Estimators: A Comparative Study. The Johns Hopkins University Press, 2010.
7. Vinod, H. D. and Ullah, A., Recent Advances in Regression Methods. Marcel Dekker, New York, 1981.