Modeling and forecasting of the under-five mortality rate in Kermanshah province in Iran: a time series analysis
Article information
Abstract
OBJECTIVES:
The target of the Fourth Millennium Development Goal (MDG-4) is to reduce the rate of under-five mortality by two-thirds between 1990 and 2015. Despite substantial progress towards achieving the target of the MDG-4 in Iran at the national level, differences at the sub-national levels should be taken into consideration.
METHODS:
The under-five mortality data available from the Deputy of Public Health, Kermanshah University of Medical Sciences, was used in order to perform a time series analysis of the monthly under-five mortality rate (U5MR) from 2005 to 2012 in Kermanshah province in the west of Iran. After primary analysis, a seasonal auto-regressive integrated moving average model was chosen as the best fitting model based on model selection criteria.
RESULTS:
The model was assessed and proved to be adequate in describing variations in the data. However, the unexpected presence of a stochastic increasing trend and a seasonal component with a periodicity of six months in the fitted model are very likely to be consequences of poor quality of data collection and reporting systems.
CONCLUSIONS:
The present work is the first attempt at time series modeling of the U5MR in Iran, and reveals that improvement of under-five mortality data collection in health facilities and their corresponding systems is a major challenge to fully achieving the MGD-4 in Iran. Studies similar to the present work can enhance the understanding of the invisible patterns in U5MR, monitor progress towards the MGD-4, and predict the impact of future variations on the U5MR.
INTRODUCTION
The under-five mortality rate (U5MR) is a key pointer of child well-being including health status, and is also a broad indicator of social and economic progress [1,2]. It is one of the main indicators for assessing and monitoring progress in child health status with respect to the United Nations’ Millennium Development Goal 4 [1-4]. The target of the Fourth Millennium Development Goal (MDG-4) is to reduce the U5MR by two-thirds between 1990 and 2015 [2,5]. The U5MR is defined as the probability (per 1,000 live births) that a child will die before reaching the age of five if subject to current age-specific mortality rates [2]. Despite global population growth in recent decades, substantial progress has been made towards achieving MDG-4 in most countries of the world [1,2,6], as well as in Iran [6-9]. The number of under-five deaths worldwide has declined from nearly 12.7 million in 1990 to 6.3 million in 2013 [2]. Consequently, the global U5MR has dropped from almost 90 deaths per 1,000 live births in 1990 to 46 in 2013 [2]. However, this progress is not equally distributed at national and sub-national levels [2,4,10].
Iran is a country that has experienced considerable reductions in the U5MR over the past decades [11]. In Iran, the U5MR was estimated as 281 per 1,000 live births in 1970 to 1975 [6] and declined from 46 per 1,000 live births in 1993 to 25 in 2005 [2]. Despite these gains, it has been seen that the decline in the U5MR across the country is heterogeneous and unequally distributed among provinces [2,12,13]. For example, in 2004, the estimate of the U5MR using data from the death registration system of the Ministry of Health and Medical Education was 28 per 1,000 live births for the whole country, while it was 32 per 1,000 live births in Kermanshah province in the west of Iran [11]. Thus, assessing fluctuations in the U5MR at the national level might not be sufficient and sub-national studies can provide a more detailed understanding of heterogeneity in the U5MR across the country. Moreover, since the U5MR is not constant over time, it is also important to understand how the rate evolves over time and explain its stochastic variations using a valid statistical model.
The main purpose of this study was to detect any statistically significant features in the monthly U5MR in Kermanshah province from 2005 to 2012 and find an appropriate time series model that could adequately explain variations in the monthly U5MR. The study period of 2005 to 2012 was chosen because the child mortality data were available to the authors only for these years at the time of conducting the study.
MATERIALS AND METHODS
Socio-demographic characteristics
Kermanshah province is an undeveloped province located in the west of Iran and comprising 14 counties. Based on the population and housing census data of 2011, the population of Kermanshah province was 1,945,227 residents (about 2.6% of the Iranian population, 78 persons/km2 and 69.7% urbanization rate) with mainly Kurdish ethnic background [14].
Ethics statement
This study is based on the digital file of under-five mortality records available from the Deputy of Public Health, Kermanshah University of Medical Sciences. The data were anonymized and deidentified before usage and hence no informed consent was required for this work.
Data source
The monthly frequencies of under-five mortality in Kermanshah province were extracted from the digital file of the province mortality records from 2005 to 2012 (based on Iranian calendar time) in the provincial health jurisdiction. In this mortality digital file, under-five death records were collected from hospitals, health centers and health houses located in the province. There are few studies on sub-national child mortality in Iran, but a published report by the Iranian Ministry of Health showed that in 2010, more than 91% of mortality in the neonatal period and more than 63% of mortality 1 to 59 months after birth occurred in Iranian hospitals [15]. The frequencies were converted to rates per 1,000 live births using the number of under-five mortalities in each month as the numerator and births in the same month in the province as the denominator. Real times of live births were extracted from the National Organization for Civil Registration. A sequence of 96 monthly U5MRs was obtained and studied for temporal variations. Through the following steps, a time series analysis has been conducted on the data in order to identify structural patterns in the monthly U5MR in Kermanshah province from 2005 to 2012 and a short-term (six months) prediction has been made. See Appendix 1 and references therein for the technical details of the time series analysis.
Data preparation
Box-Cox transformation for assessment of stability in the data variance (p=0.20 for the null hypothesis of θ=1) shows that variance of the U5MR series is constant over time and no variance-stabilizing transformation is required. The Holt–Winters smoothing was applied to the U5MR series to examine any trends in the data. The time series plot of the original and the smoothed U5MR data are shown in Figure 1A. No seasonal or periodic components are clearly apparent in this plot but the smoothed series suggests an increasing trend, especially in the early years of the study period. Prais-Winsten regression also confirms (t=2.20, p=0.030) the presence of a significant linear trend with a positive slope. The augmented Dickey-Fuller unit root test accepts (lags=3, Z(t)=-2.51 and approximate p=0.112) the null hypothesis that the U5MR series has a unit root and indicates that the increasing trend in the data is stochastic and is caused by the effect of random shocks. Therefore, successive differencing could be performed on the U5MR to eliminate the unit root and hence the stochastic increasing trend and obtain a zero mean stationary time series. The first difference of the U5MR and its smoothed series are shown in Figure 1B. No trend is recognizable in the smoothed differenced series and the augmented Dickey-Fuller unit root test for the differenced series rejects (lags=3, Z(t) =-6.801 and approximate p<0.001) the null hypothesis of unit root and confirms that the first difference of the U5MR is a stationary time series.
Model identification and estimation
The portmanteau Q-test (Q=91.89, p<0.001) and the Bartlett’s periodogram-based test (B=2.46, p<0.001) for white noise reject the null hypothesis of no serial correlation among observations of the differenced U5MR. To check the structure of such correlations, the sample autocorrelation function (ACF) and partial autocorrelation function (PACF) plots of the differenced U5MR series are shown in (A) and (B) of Figure 2. According to the plots, only the first lag of the ACF is significant (lying outside the grey 95% confidence bands) and the first few lags of PACF are decaying. The ACF and PACF of a first order moving average (MA), or auto-regressive integrated moving average (ARIMA) (0, 0, 1) time series have the same pattern and hence an ARIMA (0, 1, 1) model is appropriate for the original U5MR series. After fitting the ARIMA (0, 1, 1) model (Akaike information criterion [AIC]=534.2, Bayesian information criterion [BIC]=541.9) to the U5MR series and obtaining the model residuals, the ACF and PACF of the model residuals are plotted in Figure 2C and 2D. Both the ACF and PACF are significant at lag 6 (local spike) indicating that there is a six-monthly serial correlation in the data that the fitted ARIMA (0, 1, 1) model is not able to explain. This could be caused by a periodic (seasonal) component with period s=6. Thus, several seasonal auto-regressive integrated moving average (SARIMA) (p, d, q) (P, D, Q)6 models with different combinations of SARIMA (p, d, q) and (P, D, Q) orders were fitted to the U5MR series. Based on the corresponding AIC and BIC values of the fitted models, the best-fitting model was SARIMA (0, 1, 1) (0, 0, 1)6 with the smallest value of AIC=526.2 and BIC=533.9.
Diagnostic checking
According to Figure 3, there is no major discrepancy between the observed and expected U5MR from the fitted model (Figure 3A) and the model residuals vary randomly around zero (Figure 3B). In order to examine the adequacy of the fitted model, the ACF and PACF of the model residuals are shown in Figure (E) and (F) of Figure 2. All lags of ACF and PACF are within the 95% confidence bands, indicating that there is no correlation structure in the model residuals and hence that the fitted SARIMA model is adequate. This is also confirmed by the portmanteau Q-test (Q=37.4, p=0.59) and Bartlett’s periodogram-based test (B=0.60, p=0.86) for white noise, which accept the null hypothesis of no serial correlation in the model residuals. Moreover, the Shapiro-Wilk test (W=0.984, p=0.29), and Skewness-Kurtosis test (χ2=3.64, p=0.16) accept the normality assumption of the model residuals. Finally, the ARCH-LM test for heteroscedasticity in variance of the model residuals accepts (χ2= 0.004, p=0.951) the null hypothesis of no auto-regressive conditional heteroskedasticity effect. Therefore the overall goodness-of-fit of the model was evaluated.
All analyses were performed using STATA/SE version 12.0 (StataCorp, College Station, TX, USA) [16] and the significance level was chosen at p-value<0.05.
RESULTS
According to the best-fitting model, the U5MR at month t, xt, is determined by
where zt is a white noise process with standard deviation
The six months ahead (short-term) predictions of the U5MR and their corresponding 95% prediction intervals based on the fitted SARIMA model are reported in Table 2. In general, the prediction intervals are wider for time series with a stochastic trend [17] and here relatively wide ranges of prediction intervals indicate the high uncertainty associated with the predictions. Using the most recent and more accurate data, such predictions can be used for monitoring and forecasting progress towards MDG-4.
DISCUSSION
In this paper, time series analysis of monthly U5MR data in Kermanshah province in the west of Iran was conducted. According to our findings, the U5MR in Kermanshah province during the study period had a stochastic increasing trend. After primary analysis and applying necessary data adjustments, a SARIMA (0, 1, 1) (0, 0, 1)6 was selected as the best fitting model and model assessment and goodness-of-fit tests showed that the model can adequately explain the fluctuations in U5MR. Finally, predictions six months ahead of December 2012 and their corresponding prediction intervals were obtained based on the fitted model.
Mortality data sources in developing countries, such as Iran [15,18,19], often suffer from various data availability and data quality issues [1,3,4] and among these issues the under-reporting of deaths and misreporting of ages are common [2,18]. For example, despite considerable efforts to decrease the U5MR in the whole country [2,6-9,11,15], the monthly U5MR in Kermanshah province shows a significant increasing trend during the study period. As demonstrated in Figure 4, the yearly U5MR in Kermanshah province was clearly lower than the estimated national average in 2005 and then higher than the estimated national average from 2007 to 2012. This discrepancy is most likely caused by under-reporting and misreporting of data in the earlier years and improving quality in data collection and registration coverage of U5MR data in the last years of the study period, which is remarkable for the health care system of Kermanshah province. Although there are some demographic methods for assessing causes and estimation of death under-numeration, these methods assume balanced population growth [15]. Iran has undergone drastic demographic changes during the recent three decades. Population growth had a high rate 30 years ago and afterwards dropped to low values [20,21]. These changes make use of demographic methods for estimation of death under-numeration difficult [15]; thus authors were obliged to use the available data for this study.
Besides the possibility that the U5MR has been under-reported, another limitation of this study is that the available data cover a rather narrow time period of eight years due to restrictions in data availability. The third limitation of the study is that, although the model adequacy was evaluated by various statistical methods, the capability of the model for forecasting in short time periods and the accuracy of the predicted values by model are not assessed. The uncertainty of the predictions is assessed only by prediction intervals. However, it should be noticed that generating accurate estimates and predictions of child mortality is a considerable challenge for developing countries [1,2,4,10].
Despite its limitations, the greatest strength of the present study is that it is the first study in Iran that uses time series analysis in order to mathematically model and predict an out-of-sample U5MR. However, additional data from other parts of the country are needed to generalize the results to the national level. In 2015, the MDG-4 target for the Iranian U5MR is 19 deaths per 1,000 live births [2]. It is currently around one year before the 2015 deadline of the goal and substantial progress has been made, but the progress remains insufficient to achieve MDG-4, particularly in Kermanshah province. It can be seen from Figure 4 that despite considerable progress in the past two decades towards reducing the U5MR in Iran [2,10,11], as well as in Kermanshah province [11,15], the burden of child deaths is higher than the nationally estimated and expected levels in this province.
To achieve the MDG-4 on time, reducing under-five mortality inequities among Iranian provinces is an important priority. It seems that one of the major challenges ahead in fully achieving the MDG-4 in Iran is qualitative and quantitative improvement of under-five mortality data collection in health facilities and their corresponding systems. In general, for generating reliable mortality statistics at the sub-national levels it is necessary to improve the reporting and registering of child deaths by health facilities and to make sure that all child deaths that occur in health facilities are of critical importance. This is also crucial in understanding and measuring the impact and effectiveness of plans and interventions in this field. Since the definition of the U5MR indicator has not changed during the recent years, analyzing the invisible patterns and statistical modeling of this indicator within recent years is valuable and may enable health policy-makers to monitor the progress of Iranian child health status in order to evaluate the efficacy of health care systems. Such findings may be useful as they enhance the understanding of the current underlying patterns in the U5MR and monitoring of progress towards the MDG-4. Moreover, using studies similar to the present work, it is possible to predict the impact of future changes on this important child health indicator. Surely, with the expansion of study time periods and improvement of data collection methods, more accurate results will be obtained in the future.
Acknowledgements
The authors wish to thank Dr. Farshad Pourmalek, Epidemiologist, University of British Colombia, Canada, for his useful comments and technical support in the preparation of the paper.
Notes
The author has no conflicts of interest to declare for this study.
References
Appendices
Appendix 1. TIME SERIES ANALYSIS
Time series data
A sequence of observations collected at successive points in time, denoted by {xt}
Data preparation
Smoothing techniques are usually used to reduce irregular roughness (random fluctuations) and detect any persistent pattern such as long term trends or cycles in the data [17,23]. The most common types of smoothing techniques are moving averages (MA), single and double exponential smoothing and Holt-Winters exponential smoothing [16,25]. The Holt-Winters method assumes a linear trend with an intercept and slope that vary over time and filters out the irregular fluctuations using smoothing parameters (weights) [22,24]. The presence of a linear trend with a non-zero slope in the data can be tested using the Prais-Winsten regression method, which uses the generalized least-squares method to estimate the intercept and slope in a linear regression model by allowing errors to be serially correlated [16]. The trend can be deterministic or stochastic [17]. A time series with a deterministic trend reverts to the trend line in the long run and random shocks (fluctuations) have transitory effects on the level of the series [25]. A time series with a stochastic trend or a unit root is not trend reverting and random shocks have permanent effects on the level of the series [23,24]. The hypothesis that the time series contains a unit root (stochastic trend) can be tested using unit root tests like the Augmented Dickey-Fuller test [22,24]. In the case that the presence of a unit root is confirmed, successive differencing can be performed on the data to obtain a zero mean stationary time series. The first lag difference of data, yt=Δxt=xt-xt-1, can eliminate the unit root [17]. In addition, by subtracting the value of an earlier data from the value of a later data, the first lag differencing can also remove a linear deterministic trend in the data [23,25]. For more complex situations, higher orders of lag differencing might be needed to eliminate several unit roots and/or non-linear deterministic trends [17]. Besides trend, nonconstant variance can also cause nonstationarity [17]. Box-Cox transformation with parameter θ is defined as (xtθ-1)/θ and can be applied to time series in which the variance changes over time in order to stabilize the variance [17].
Model identification and estimation
The autoregressive integrated moving average (ARIMA) model by Box-Jenkins is the most well-known model for a wide variety of time series [25]. An ARIMA (p, d, q) model does not require the original time series xt to be stationary but it assumes that after d times differencing, the new series yt=Δdxt becomes a stationary zero mean time series. It also assumes that the current value of the differenced series yt is a linear combination of p previous values yt-1, yt-2,…,yt-p and q+1 values zt, zt-1, …, zt-q of a white noise process (independent and identically distributed random shocks with zero mean and common variance
The autoregressive order p is the number of terms in the model that describe the direct dependency among successive observations and the MA order q is the number of persistent random shocks that describe indirect dependency among successive observations [25]. They are identified by examining patterns in plots of the autocorrelation function (ACF) and partial autocorrelation function (PACF) of the differenced series yt [23,25]. After identifying p and q, the model parameters φ1,…, φp, θ1,…, θq and
Seasonal or periodic cycles produce local spikes at specific lags in plots of ACF and PACF [23,25]. Any ARIMA model can be extended to include seasonality and is then called a seasonal autoregressive integrated moving average (SARIMA) model. A SARIMA (p, d, q) (P, D, Q)s model consists of two multiplicative components: a regular ARIMA (p, d, q) component and a seasonal component with autoregressive order P, MA order Q and difference order D [17,26,27]. The seasonal component is responsible for the recurrence of a recognizable pattern after the seasonal periods. The general model equation of a SARIMA (p, d, q) (P, D, Q)s model has p regular autoregressive parameters, q regular MA parameters, P seasonal autoregressive parameters and Q seasonal MA parameters [17]. Similar to the regular ARIMA model, ACF and PACF plots and AIC and BIC criteria can be used to identify the seasonal orders (P, D, Q), though in this case the situation is more complicated [17]. Using the maximum likelihood method, all p+q+P+Q+1 regular and seasonal parameters of the model can be estimated [23,24].
Diagnostic checking
After fitting the desired SARIMA model to the observed time series, the model residuals are obtained by subtracting the expected values of observations under the model,
Prediction and prediction intervals
Once a suitable model has been fitted to the data, the known structure of the model can be used to predict future values of the time series based on previously observed values with a recursive method [23]. Prediction error, the difference between the actual and the predicted values, is inevitable and hence the uncertainty of the predictions must be considered; the farther the prediction beyond the observed values, the less reliable the prediction. For this reason, an interval that with a certain probability (for example 95%) will contain the actual future value, called the prediction interval (PI), is also reported for each prediction [25]. The PI is always wider than the corresponding confidence interval because of the added uncertainty involved in predicting a random variable versus its mean [23,24].