8.2 Critique of Methodology

The study findings are based on the econometric Blinder-Oaxaca decomposition method. Section 5.4 of this report discusses the problems associated with decomposition methods, and a summary of the relevant commentary is reproduced here. First, it has been argued that decomposition methods can only examine post-hiring wage discrimination. Even when occupation is incorporated as a set of indicator variables, occupation is assigned after the employee has been hired. Therefore, the decomposition cannot measure discrimination in the hiring decision and, as such, may underestimate the gender pay gap (assuming that discrimination in hiring operates in favour of men at the expense of women).

Second, the method is affected by the index number problem: the choice of reference group (male employees or female employees) affects the results. In practice, this problem is most frequently sidestepped by the convention of reporting results based on the male wage structure, and this was the approach adopted for this study. Thus the findings from this study are directly comparable to those from other studies that apply the Blinder-Oaxaca decomposition with the male wage structure as the reference.
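The index number problem can be illustrated with a minimal sketch. All means and coefficients below are hypothetical values chosen for illustration only; they are not estimates from this study. The sketch assumes a two-fold decomposition of the mean log-earnings gap into an "explained" part (differences in characteristics) and an "unexplained" part (differences in coefficients), evaluated once with the male wage structure as the reference and once with the female wage structure.

```python
# Hypothetical inputs (NOT from the study). Each list is
# [intercept, years_of_experience].
mean_x_male = [1.0, 12.0]      # mean characteristics, male employees
mean_x_female = [1.0, 10.0]    # mean characteristics, female employees
beta_male = [2.50, 0.030]      # coefficients from the male earnings regression
beta_female = [2.45, 0.020]    # coefficients from the female earnings regression


def dot(a, b):
    """Inner product of two equal-length lists."""
    return sum(x * y for x, y in zip(a, b))


def decompose(reference):
    """Return (explained, unexplained) using the chosen reference wage structure."""
    diff_x = [m - f for m, f in zip(mean_x_male, mean_x_female)]
    diff_beta = [m - f for m, f in zip(beta_male, beta_female)]
    if reference == "male":
        # Characteristics gap priced at male coefficients;
        # coefficient gap weighted by female mean characteristics.
        return dot(diff_x, beta_male), dot(mean_x_female, diff_beta)
    if reference == "female":
        # Characteristics gap priced at female coefficients;
        # coefficient gap weighted by male mean characteristics.
        return dot(diff_x, beta_female), dot(mean_x_male, diff_beta)
    raise ValueError("reference must be 'male' or 'female'")


# Total gap in mean predicted log earnings (identical under both references).
total_gap = dot(mean_x_male, beta_male) - dot(mean_x_female, beta_female)

explained_m, unexplained_m = decompose("male")
explained_f, unexplained_f = decompose("female")

print(f"total gap:        {total_gap:.3f}")
print(f"male reference:   explained={explained_m:.3f}, unexplained={unexplained_m:.3f}")
print(f"female reference: explained={explained_f:.3f}, unexplained={unexplained_f:.3f}")
```

Under either reference the two components sum to the same total gap, but the split between them differs, which is why results are only comparable across studies that use the same reference group.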

Third, as the decomposition is effectively a comparison of two identically specified regression models that typically incorporate categorical variables (such as ethnicity and occupation), an omitted reference category is required for each set of categorical variables. In some cases the choice of reference category is obvious (e.g., NZ European for ethnicity); in other cases the decision is much more arbitrary (e.g., the choice for employer or occupation). Where the decision is less clear-cut it could be preferable to include all categories. This option is not possible using a regression methodology as the model would be over-specified (see footnote 25). A related problem is that occupation indicator categories containing only male or only female employees had to be removed from the analysis. This had no effect on the decomposition results: the mean and associated coefficient in the other gender's regression model are both zero, so that occupation contributes nothing to either the earnings function or the unexplained residual. In both cases, information contained in the dataset has been omitted from the analysis. Given these two problems associated with categorical variables, and the large, artefactual effects of proxy variables on regression models, an examination of the usefulness of regression analyses of the gender pay gap would be timely.

Finally, as the regression models in the decompositions are based on the ordinary least squares method, the decomposition results are a statement about the effects on the earnings of the "average" male employee compared to the "average" female employee. These "average" results may not be reflective of the results for employees at other earnings percentiles (e.g., 10th percentile, 75th percentile) and, therefore, may be of less utility for understanding the earnings dynamics of employees earning away from the mean.
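The point that an average can mask differences elsewhere in the distribution can be shown with a small sketch. The log hourly earnings below are invented for illustration and are not drawn from the study data. The sketch compares only raw (unadjusted) gaps at the mean and at two percentiles; it is not a decomposition, and a full analysis away from the mean would need a method such as quantile regression.

```python
import statistics

# Hypothetical log hourly earnings (NOT study data), sorted ascending.
female = [2.30, 2.40, 2.50, 2.60, 2.70, 2.80, 2.90, 3.00, 3.10, 3.20]
male = [2.32, 2.43, 2.55, 2.67, 2.80, 2.93, 3.07, 3.21, 3.36, 3.52]


def percentile(data, p):
    """Return the p-th percentile (1..99) using statistics.quantiles."""
    cut_points = statistics.quantiles(data, n=100)  # 99 cut points
    return cut_points[p - 1]


# Raw gap at the mean, which is what an OLS-based summary reflects.
gap_mean = statistics.mean(male) - statistics.mean(female)

# Raw gaps at the 10th and 75th percentiles.
gap_p10 = percentile(male, 10) - percentile(female, 10)
gap_p75 = percentile(male, 75) - percentile(female, 75)

print(f"gap at mean:            {gap_mean:.3f}")
print(f"gap at 10th percentile: {gap_p10:.3f}")
print(f"gap at 75th percentile: {gap_p75:.3f}")
```

In this constructed example the gap widens towards the top of the distribution, so the gap at the mean understates the gap for higher earners and overstates it for lower earners.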

The decomposition does provide useful and interesting information. Rather than replacing the method with an alternative analysis technique, a more useful approach would appear to be to supplement the analysis with a method better suited to incorporating categorical variables, such as the Classification and Regression Trees (CART) method. A decision tree method seems well suited to such a research question, as the group of interest (male versus female employees) is already known in the dataset and a set of binary decision rules (yes/no) is suitable for analyses that combine continuous and categorical data. A second major advantage of such a supplementary methodology is that it does not require the data to meet all the rigorous assumptions associated with regression, such as the dependent variable having a normal distribution and constant variance. Alternative data mining techniques could also prove valuable.

25 The use of j-1 indicator variable categories is due to the requirements of the linear regression model that presumes the absence of perfect collinearity among the predictor variables (Hardy, 1993). This means that none of the predictor variables can be expressed as a perfect linear combination of the other predictor variables.
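The collinearity condition in this footnote can be checked directly on a toy design matrix. The occupation codes below are hypothetical, and the rank routine is a simple illustrative helper rather than the procedure used in the study. With an intercept and all j indicator columns, the indicators sum to the intercept column, so the matrix has fewer independent columns than coefficients to estimate; dropping one reference category removes the dependency.

```python
from fractions import Fraction


def matrix_rank(rows):
    """Rank via Gauss-Jordan elimination with exact rational arithmetic."""
    m = [[Fraction(v) for v in row] for row in rows]
    rank = 0
    for col in range(len(m[0])):
        # Find a row at or below the current pivot row with a non-zero entry.
        pivot = next((r for r in range(rank, len(m)) if m[r][col] != 0), None)
        if pivot is None:
            continue
        m[rank], m[pivot] = m[pivot], m[rank]
        # Eliminate this column from every other row.
        for r in range(len(m)):
            if r != rank and m[r][col] != 0:
                factor = m[r][col] / m[rank][col]
                m[r] = [a - factor * b for a, b in zip(m[r], m[rank])]
        rank += 1
    return rank


# Hypothetical occupation codes for six employees (j = 3 categories).
categories = ["A", "B", "C", "A", "C", "B"]
levels = sorted(set(categories))


def design_matrix(drop_reference):
    """Intercept column plus indicator columns, optionally omitting the first level."""
    kept = levels[1:] if drop_reference else levels
    return [[1] + [1 if c == lvl else 0 for lvl in kept] for c in categories]


full = design_matrix(drop_reference=False)     # intercept + j indicators
reduced = design_matrix(drop_reference=True)   # intercept + (j - 1) indicators

rank_full = matrix_rank(full)
rank_reduced = matrix_rank(reduced)

print(f"all categories: {len(full[0])} columns, rank {rank_full}")
print(f"j-1 categories: {len(reduced[0])} columns, rank {rank_reduced}")
```

When the rank is below the number of columns, the ordinary least squares coefficients are not uniquely determined, which is the sense in which the model with all categories cannot be estimated.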
