After cleaning, 22,822 observations remained in the 2000 dataset and 24,060 in the 2001 dataset. The same cleaning techniques were applied to both datasets. Because the decomposition methodology is multivariate, an observation must have information on every variable in the model to be included in the analysis. Rather than relying on the statistical procedure to drop observations with missing values, or imputing those values, observations with missing values were removed before the analyses. Table 1 below shows the effect of the data cleaning process on observation numbers.
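The removal of incomplete observations described above can be illustrated with a minimal sketch. The variable names (`salary`, `age`, `occupation`) and the sample records are hypothetical and are not taken from the datasets; the sketch only shows the general idea of keeping an observation when every model variable is present.

```python
# Hypothetical model variables; the actual variables in the model are not listed here.
MODEL_VARS = ["salary", "age", "occupation"]


def complete_cases(records, required):
    """Keep only records that have a non-missing value for every required variable."""
    return [r for r in records if all(r.get(v) is not None for v in required)]


# Illustrative records only (not real data).
records = [
    {"salary": 52000, "age": 41, "occupation": "analyst"},
    {"salary": None, "age": 35, "occupation": "clerk"},       # missing salary
    {"salary": 61000, "age": None, "occupation": "manager"},  # missing age
    {"salary": 48000, "age": 29, "occupation": "clerk"},
]

kept = complete_cases(records, MODEL_VARS)
n_removed = len(records) - len(kept)
print(f"kept={len(kept)} removed={n_removed}")
```

Removing incomplete observations explicitly, before any modelling, keeps the count of excluded observations visible at each step, which is what Table 1 reports.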
Table 1 Data Cleaning Steps and Numbers of Observations Affected
Cleaning Step | 2000 Dataset | 2001 Dataset |
Initial number of observations | 37,386 | 38,930 |
| -7,462 | -7,549 |
| -362 | -453 |
| -230 | -163 |
| -4,037 | -3,996 |
| -2,181 | -2,359 |
| -59 | -103 |
| -233 | -247 |
Observations included in analysis | 22,822 | 24,060 |
14 Employees who terminated their employment between 1 July and 30 June of the analysis year, and employees on secondment, leave without pay, or parental leave as at 30 June. It is assumed that the salary information provided for current employees reflects the actual salary as at 30 June, whereas the salary information for non-current employees is influenced by additional factors (e.g., no incremental pay increase in the current year).
15 Because the number of part-time employees was small, they were excluded from the analysis rather than adding a full-time status indicator variable to the model.
16 These are occupations that comprise solely women or solely men. Refer to Appendix A.
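The figures in Table 1 can be cross-checked with simple arithmetic: subtracting each removal from the initial count should reproduce the number of observations included in the analysis. The short sketch below does this using only the numbers reported in the table; the function and variable names are illustrative.

```python
# Counts as reported in Table 1.
initial = {"2000": 37_386, "2001": 38_930}
removed = {
    "2000": [7_462, 362, 230, 4_037, 2_181, 59, 233],
    "2001": [7_549, 453, 163, 3_996, 2_359, 103, 247],
}
reported_final = {"2000": 22_822, "2001": 24_060}


def remaining(start, removals):
    """Subtract each cleaning step's removals from the starting count."""
    total = start
    for n in removals:
        total -= n
    return total


computed = {year: remaining(initial[year], removed[year]) for year in initial}

for year in sorted(computed):
    matches = computed[year] == reported_final[year]
    print(f"{year}: computed={computed[year]:,} reported={reported_final[year]:,} match={matches}")
```

For both years the computed totals agree with the reported totals, so the cleaning steps account for every excluded observation.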