Forecasting realized volatility: The role of implied volatility, leverage effect, overnight returns, and volatility of realized volatility

We forecast realized volatility by extending the heterogeneous autoregressive (HAR) model to include implied volatility (IV), the leverage effect, overnight returns, and the volatility of realized volatility. We analyze 10 international stock indices and find that, although a simple HAR model augmented with IV (HAR-IV) is more accurate than any HAR model excluding it, all markets support further extensions of the HAR-IV model. More accurate forecasts are found using overnight returns in all markets except the UK, the volatility of realized volatility in the US, and the leverage effect in five markets. A value-at-risk exercise supports the economic significance of our findings.

At the same time, the availability of high-frequency data has enabled the development of nonparametric daily estimators of volatility, known as realized volatility. The most popular such measure is the sum of squared intraday returns, after the seminal work of Andersen and Bollerslev (1998). For (at least) two reasons, much attention has been devoted to realized volatility models over the last two decades. First, realized volatility is a more accurate measure of volatility which, unlike squared daily returns, ably captures the persistence of volatility. Second, well-known reduced form time series models (e.g., ARFIMA) based on realized volatility outperform GARCH and stochastic volatility models in forecasting future volatility (Andersen et al., 2003).¹
Due to its simplicity and ability to reproduce volatility persistence, the most popular realized volatility modeling approach is the Heterogeneous Autoregressive (HAR) model proposed by Corsi (2009). Several extensions of the standard HAR model have been proposed to capture the stylized facts of volatility. First, Corsi and Renò (2012), allowing for the leverage effect (that is, that volatility increases more after a negative shock than after an equal positive shock), find that its impact on realized volatility is significant for the S&P500 index.² Second, Wang et al. (2015) include overnight returns along with the leverage effect and find significant effects for the nighttime arrival of news and asymmetry in the Chinese stock market.³ Third, arguing that, apart from volatility itself, the volatility of realized volatility may cluster, Corsi et al. (2008) note that the inclusion of the time-varying volatility of realized volatility leads to improved S&P500 volatility forecasts.
While IV has been widely exploited in low-frequency volatility modeling, its role in realized volatility models was not considered until the work of Busch et al. (2011) who, employing a standard HAR model, find that IV is a strong predictor of realized volatility. Buncic and Gisler (2016) show that VIX contains a substantial amount of predictive information for forecasting the realized volatility of international equity markets, augmenting each foreign market's HAR model with both forward- and backward-looking US volatility information.⁴ Despite such convincing evidence on the role of IV in a standard HAR model, the wider interaction between IV and the stylized facts of volatility in the family of HAR-type models is unknown. While the above-discussed extensions of HAR are widespread, to the best of our knowledge, the corresponding extensions of an IV-augmented HAR (HAR-IV) have not been assessed. In this study we investigate the predictive performance of HAR extensions once IV is included and so consider HAR-IV extensions accounting for the leverage effect, overnight returns, and the volatility of realized volatility. Ultimately, the aim is to find the most accurate predictive specification, while also drawing conclusions on the individual and joint role of such extensions.
Unlike most of the existing studies on the role of IV in realized volatility models, which focus on the United States, we additionally consider stock market indices in Europe and Japan. All data are observed at daily frequency in the sample running from February 2, 2001 to June 27, 2019.
Our main results are as follows. First, the inclusion of IV in the standard HAR model substantially improves volatility forecasts in all 10 international stock markets, confirming the importance of IV information. Not only does the HAR-IV outperform the standard HAR but, in all markets, it is also more accurate than any HAR model variant that does not include IV. Furthermore, using the Giacomini and Rossi (2010) testing approach for local superior predictive ability, and so assessing how the relative forecasting performance of our models evolves, we find that this result is consistent over our sample period. Second, no single extension of HAR-IV outperforms the others across all the markets. Nonetheless, overnight returns are beneficial for several indices in improving the accuracy of the predictive model. While the inclusion of IV does not impair the evidence of Corsi et al. (2008) in favor of including the volatility of realized volatility in the United States, no other market leads us to the same conclusion.⁵ More mixed evidence is found on the leverage effect, which improves predictions in only five markets. Third, our main conclusions are unchanged once we apply our volatility forecasts to the practice of risk management within a value-at-risk (VaR) framework.
The remainder of the paper is organized as follows. In Section 2, we outline realized volatility and describe the predictive models together with the forecast evaluation techniques. The data are presented in Section 3. In Section 4, we present the empirical results of our volatility forecasting exercise, while VaR results are in Section 5. Section 6 concludes.
2 Similar results are provided by Martens et al. (2009) and Wang et al. (2015) for the S&P500 index and the Chinese stock market, respectively.
3 Gallo (2001) and Tsiakas (2008) also provide evidence about the predictive content of overnight returns for the subsequent daytime volatility.

4 The ability of option-implied information to forecast realized volatility has also been examined by Oikonomou et al. (2019) and Jeon et al. (2020).

5 More precisely, Corsi et al. (2008) find evidence of volatility of realized volatility in the S&P500. Accounting for IV we reach the same conclusion for both the S&P500 and the Dow Jones Industrial Average (DJIA), but not for the less liquid Nasdaq100.
This section starts with a brief introduction to realized volatility. We then describe the HAR model of Corsi (2009) and its various extensions that will be used to assess the role of IV, the leverage effect, overnight returns, and the volatility of realized volatility in forecasting realized volatility. We conclude this section with a description of the techniques we implement to evaluate the forecast performance of our models.

2.1 | Realized measure
Consider that the logarithm of an asset price at time t, $p_t$, is a continuous-time diffusion process

$$dp_t = \mu_t\,dt + \sigma_t\,dW_t, \qquad (1)$$

where $\mu_t$ is a locally bounded predictable drift, $W_t$ is a Brownian motion, and $\sigma_t$ is a stochastic process independent of $W_t$. For this log-price process, the integrated variance is

$$\int_{t-1}^{t} \sigma_s^2\,ds. \qquad (2)$$

As shown in Andersen et al. (2001), a natural estimator for the integrated variance is the sum of squared intraday returns, which is commonly known as realized variance (RV). The standard definition of the RV estimator of integrated variance is

$$RV_t = \sum_{j=1}^{m} r_{t,j}^2, \qquad (3)$$

where $r_{t,j}$ is the jth intraday return on day t and m is the number of intraday returns during day t. Letting $m \to \infty$, that is, in the case of continuous sampling, $RV_t$ converges to the true integrated variance.
Subsequently, several alternatives to the standard RV have been proposed (see Barndorff-Nielsen & Shephard, 2004; Hansen & Lunde, 2006, among others). However, the standard RV, based on 5-min intraday returns, remains the most popular in practical applications, for example, Bollerslev et al. (2009), Sévi (2014), Buncic and Gisler (2016), Bollerslev et al. (2018), Gong and Lin (2018), Jeon et al. (2020), and Zhang et al. (2020). Notably, Liu et al. (2015) consider the problem of comparing the empirical accuracy of over 400 different volatility estimators constructed from high-frequency data across multiple assets and conclude that "it is difficult to significantly beat 5-minute RV." Therefore, following the sampling frequency used in much of the existing RV literature, we choose the 5-min RV.
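As an illustration of the RV estimator, the following minimal sketch computes the sum of squared intraday log returns for a single trading day. The function name `realized_variance` and the price values are hypothetical and chosen only for the example; in practice the input would be the 5-min price grid for one day.

```python
import numpy as np


def realized_variance(intraday_prices):
    """Sum of squared intraday log returns for one trading day."""
    log_prices = np.log(np.asarray(intraday_prices, dtype=float))
    intraday_returns = np.diff(log_prices)  # r_{t,j}, j = 1, ..., m
    return float(np.sum(intraday_returns ** 2))


# Hypothetical prices sampled on a 5-min grid (illustrative values only).
prices = [100.0, 100.2, 99.9, 100.1, 100.4]
rv = realized_variance(prices)
```

With m + 1 sampled prices the function uses m returns, so a finer grid simply lengthens the input array.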

2.2 | Modeling realized volatility
We adopt the HAR model of Corsi (2009) as our benchmark to model and forecast RV. Corsi (2009), inspired by the Heterogeneous Market Hypothesis of Muller et al. (1997), addresses the heterogeneity that arises from market participants operating across a spectrum of different trading frequencies. The main idea is that investors with different time horizons perceive, react to, and cause different types of volatility. The HAR model is an additive cascade model of different volatility components (daily, weekly, and monthly) designed to mimic the trading frequencies of the different agents.
The HAR model is defined as

$$y_{t+1} = \beta_0 + \beta_d\, y_t^{(d)} + \beta_w\, y_t^{(w)} + \beta_m\, y_t^{(m)} + \varepsilon_{t+1}, \qquad (4)$$

where $y_t = \log(RV_t)$, $\varepsilon_{t+1}$ is an innovation term, and $y_t^{(d)}$, $y_t^{(w)}$, and $y_t^{(m)}$ are the three realized volatility components (daily, weekly, and monthly, respectively). Our HAR benchmark model differs from the original HAR specification of Corsi (2009) in two dimensions. First, our model relies on the logarithmic RV, $y_{t+1}$, instead of realized volatility per se, as in the original HAR formulation of Corsi (2009). This choice is motivated by the observation of Andersen et al. (2003), who note that the distribution of logarithmic RV is much closer to normal, and Andersen et al. (2007), who report similar results when RV, realized volatility, and their logarithms are used to estimate the HAR model parameters. Second, we use a slightly different lag structure for constructing the weekly and monthly volatility components from the one in Corsi (2009). The original HAR model regresses RV on the previous 1-, 5-, and 22-day average realized volatility. To avoid overlapping terms and, thus, ease interpretation, we use the numerically identical reparameterization implemented in Patton and Sheppard (2015).
Here, the weekly component is constructed based on log RV between lag 2 and 5, with the monthly component based on log RV between lag 6 and 22.⁶ Thus, to formalize the structure of our HAR model for RV, let

$$y_t^{(d)} = y_t, \qquad y_t^{(w)} = \frac{1}{4}\sum_{i=1}^{4} y_{t-i}, \qquad y_t^{(m)} = \frac{1}{17}\sum_{i=5}^{21} y_{t-i}$$

be the daily, weekly, and monthly components, respectively. One of the reasons for the popularity of the HAR model in Equation (4) is its simplicity. Notably, once the three volatility components have been constructed, the HAR model is estimated by ordinary least squares (OLS).
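The construction of the components and the OLS step can be sketched as follows. This is a minimal illustration, assuming that the weekly and monthly components are simple averages of log RV over lags 2 to 5 and lags 6 to 22 relative to the forecast target; the function names `har_design` and `fit_har` are hypothetical.

```python
import numpy as np


def har_design(y):
    """Build non-overlapping HAR regressors from a 1-D array of log RV.

    Each row is [1, y_t, mean(y_{t-1..t-4}), mean(y_{t-5..t-21})];
    the matching target is y_{t+1}.
    """
    y = np.asarray(y, dtype=float)
    rows, target = [], []
    for t in range(21, len(y) - 1):
        daily = y[t]                      # lag 1 relative to y_{t+1}
        weekly = y[t - 4:t].mean()        # lags 2 to 5 (4 observations)
        monthly = y[t - 21:t - 4].mean()  # lags 6 to 22 (17 observations)
        rows.append([1.0, daily, weekly, monthly])
        target.append(y[t + 1])
    return np.array(rows), np.array(target)


def fit_har(y):
    """Estimate the HAR coefficients by ordinary least squares."""
    X, z = har_design(y)
    beta, *_ = np.linalg.lstsq(X, z, rcond=None)
    return beta
```

Additional regressors (for example, IV components or return terms) would enter as extra columns of the design matrix, leaving the estimation step unchanged.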
To explicitly capture the salient features of stock index RV, we extend the benchmark model in Equation (4) to allow for IV, the leverage effect, overnight returns, and the volatility of realized volatility. Hence, we consider a more general HAR specification of the form

$$y_{t+1} = \beta_0 + \beta_d\, y_t^{(d)} + \beta_w\, y_t^{(w)} + \beta_m\, y_t^{(m)} + \beta' X_t + \varepsilon_{t+1}, \qquad (5)$$

where the model allows for a k-dimensional vector $X_t$ of the noted exogenous variables.⁷

We examine the predictive power of IV on RV by augmenting the HAR model with IV, similar to Busch et al. (2011). As the IV indices appear to be highly persistent, we use a HAR-type structure for IV. Of note, Fernandes et al. (2014) evaluate different HAR models to forecast the VIX index and find that it is difficult to beat the forecasting performance of the simple HAR process. Therefore, we add three components of IV to the HAR model (HAR-IV) by setting $\beta' X_t = \beta_1 \log IV_t^{(d)} + \beta_2 \log IV_t^{(w)} + \beta_3 \log IV_t^{(m)}$ in Equation (5). This allows us to capture the information content of IV across different maturities beyond that within the RV components. The log IV components are defined analogously to the RV components.⁸

Corsi and Renò (2012) extend the HAR model by adding positive and negative daily, weekly, and monthly returns to capture the leverage effect. Thus, the HAR-L model is specified by setting $\beta' X_t = \beta_1 r_t^{+} + \beta_2 r_t^{-}$, where $r_t^{+}$ and $r_t^{-}$ denote the positive and negative daily returns. As such, RV reacts asymmetrically to previous positive and negative daily returns of equal magnitude.⁹

We capture the arrival of news at nighttime by adding the overnight return as an explanatory variable in the HAR model. Similar to Wang et al. (2015), we refer to HAR-O when we set $\beta' X_t = \beta_1 r_{over,t}$. The overnight return, $r_{over,t}$, is defined as the difference in the logs of the opening price on day t + 1 and the closing price on day t.
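The return-based regressors used in the leverage and overnight extensions can be sketched as below. The overnight return follows the definition in the text. Splitting the daily return into max(r, 0) and min(r, 0) is an assumption made for this illustration, and the function names are hypothetical.

```python
import numpy as np


def overnight_return(open_next_day, close_today):
    """Log opening price on day t + 1 minus log closing price on day t."""
    return np.log(open_next_day) - np.log(close_today)


def split_daily_return(daily_returns):
    """Split daily returns into positive and negative parts.

    Assumption for this sketch: r_plus = max(r, 0) and r_minus = min(r, 0).
    """
    r = np.asarray(daily_returns, dtype=float)
    return np.maximum(r, 0.0), np.minimum(r, 0.0)


r_plus, r_minus = split_daily_return([0.01, -0.02, 0.0])
```

Under this split the two parts sum back to the original return, so separate coefficients on each part allow an asymmetric response.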
We also capture the conditional heteroskedasticity of the innovations of RV. Corsi et al. (2008) observe that the innovations of RV are not independently and identically distributed (iid), but exhibit volatility clustering. To account for the volatility of RV they extend the HAR model by incorporating a GARCH component (HAR-G). So the innovation term in Equation (5) is not a Gaussian white noise, but its variance is time-varying: $\varepsilon_{t+1} = \sqrt{h_{t+1}}\, z_{t+1}$, where $h_{t+1}$ is the conditional variance implied by the GARCH component and $z_{t+1}$ is a standardized innovation.

6 The HAR model implemented in Patton and Sheppard (2015) has also been used by Prokopczuk et al. (2016) to evaluate the importance of jumps for forecasting RV for four energy futures.

7 We do not explicitly consider the role and results for jump components in this paper. Important contributions in this area include Barndorff-Nielsen and Shephard (2005), Andersen et al. (2007), and Corsi and Renò (2012). Of note, Busch et al. (2011) find IV contains additional information about future S&P500 volatility and its continuous path and jump components beyond that in RV and its components. However, more recently, Buncic and Gisler (2017), using aggregate daily realized measures, assessed the role of jumps and leverage in predicting RV for 18 international stock markets. They find that the separation of RV into a continuous and a jump component is beneficial only for the S&P500 index and has limited value for the non-US markets. Nonetheless, using the bi-power variation, we do consider jumps and find that while accounting for jumps improves performance over the standard HAR model for some indices, it does not improve performance compared with the preferred forecast models reported below. Therefore, we do not report these results in full and instead focus on components that have received less attention in the literature. The results are available upon request.
8 Buncic and Gisler (2016) also augment the HAR model with the same three IV components to forecast RV. More recently, Plíhal and Lyócsa (2021) evaluate the role of IV calculated across different maturities (of up to 1 month) to similarly observe additional predictive information in forecasting the EUR/USD exchange rate RV.
9 In the original analysis, we also included positive and negative returns over the last week and month as in Corsi and Renò (2012). However, their parameters were not significant in most cases and thus were omitted.
Finally, we extend the models by allowing them to capture multiple RV features simultaneously.¹⁰ Thus, we have 16 HAR specifications to model and forecast RV.

2.3 | Forecast evaluation
The ability of the models described in Section 2.2 to accurately forecast RV is assessed using the quasi-likelihood (QLIKE) loss function

$$QLIKE_i = \frac{1}{T-N}\sum_{t=N+1}^{T}\left(\frac{y_t}{\hat{y}_{it}} - \log\frac{y_t}{\hat{y}_{it}} - 1\right),$$

where $\hat{y}_{it}$ is the prediction of $y_t$ based on t − 1 information obtained with the ith predictive model, T is the sample size, and N is the size of the estimation window. The model that yields the smallest average loss is the most accurate and therefore preferred. Because true volatility is latent, forecasted volatility must be compared against an ex post proxy of volatility that is imperfect in nature. Patton (2011) shows that the QLIKE loss function is robust in the sense that the ranking of the models is the same whether the ranking is done using the true volatility or some unbiased proxy, such as the RV presented in Equation (3). The QLIKE loss function is also used by Brownlees et al. (2011), Sévi (2014), and Tian et al. (2017), among others, to evaluate their forecasts.
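For concreteness, the average QLIKE loss can be computed as in the minimal sketch below. It assumes the normalized form discussed by Patton (2011), ratio − log(ratio) − 1, with strictly positive inputs; the function name `qlike` is hypothetical.

```python
import numpy as np


def qlike(actual, forecast):
    """Average QLIKE loss over the out-of-sample period.

    Assumption for this sketch: loss = ratio - log(ratio) - 1 with
    ratio = actual / forecast, and both inputs strictly positive.
    """
    actual = np.asarray(actual, dtype=float)
    forecast = np.asarray(forecast, dtype=float)
    ratio = actual / forecast
    return float(np.mean(ratio - np.log(ratio) - 1.0))
```

The loss is zero when the forecast equals the proxy and penalizes under-prediction more heavily than over-prediction of the same size, which is the usual asymmetry of QLIKE.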
To test for significant differences in predictive accuracy of the models, we employ the model confidence set (MCS) approach developed by Hansen et al. (2011) and the pairwise test proposed by Giacomini and White (2006). The MCS procedure determines the set that contains the best models from a full set of models M. Starting with the full set, M, the MCS procedure sequentially eliminates the models that are found to be significantly inferior until the null hypothesis of equal forecast accuracy is no longer rejected at the α significance level. The set of surviving models is the MCS, $\hat{M}^{*}_{1-\alpha}$, containing the best models from M at the $(1-\alpha)$ confidence level. Following Hansen et al. (2011), we test the null hypothesis of equal predictive ability using the range statistic and the semiquadratic statistic,

$$T_R = \max_{i,j \in M} |t_{ij}| \quad \text{and} \quad T_{SQ} = \sum_{i<j} t_{ij}^2, \qquad t_{ij} = \frac{\bar{d}_{ij}}{\sqrt{\widehat{\mathrm{var}}(\bar{d}_{ij})}},$$

where $d_{ij,t}$ is the loss differential between the ith and the jth model and $\bar{d}_{ij}$ is its sample average.

The Giacomini-White (GW) test of conditional predictive ability evaluates the forecasting performance of two competing models, accounting for parameter uncertainty. The null hypothesis of equal predictive ability is tested using a Wald statistic in which $\hat{\Omega}_T$ is the Newey and West (1987) HAC estimator of the asymptotic variance of $d_{ij,t}$.

The MCS procedure and the GW test help evaluate the model with the best forecast performance on average over the out-of-sample period. However, in unstable environments, the relative forecasting performance of the models may change over time. To evaluate whether the relative predictive performance of the models is indeed time-varying we consider the cumulative forecast error of the ith predictive model, $cumFE_{it}$ (based on QLIKE loss), for each prediction $t = N+1, N+2, \ldots, T$. In addition to $cumFE_{it}$ we apply the Giacomini and Rossi (2010) fluctuation test to formally test whether the models' relative performance is time-varying. We test the null hypothesis that the ith predictive model has equal predictive ability with HAR-IV using a test statistic $F_{t,k}$ based on the loss differential $d_{oi,t}$, where o is the HAR-IV model and $\hat{\sigma}^2$ is the HAC estimator of the variance of $d_{oi,t}$. Hence, the fluctuation test is similar to the GW test described before, but is computed over a rolling out-of-sample window of size k. If $\max_t F_{t,k}$ is higher than the critical value, the null hypothesis of local equal predictive ability between the two models is rejected.¹¹

10 Specifically, HAR-IVL, HAR-IVO, and HAR-IVG are extensions of the HAR-IV model that seek to capture the leverage effect, overnight returns, and the volatility of RV, respectively. HAR-LO and HAR-LG are extensions of the HAR-L model that capture overnight returns and the volatility of RV, respectively. HAR-OG extends the HAR model to account for both overnight returns and the volatility of RV. HAR-LOG is an extension of the HAR model to include simultaneously the leverage effect, overnight returns, and the volatility of RV. HAR-IVLO and HAR-IVLG are extensions of the HAR-IVL model to also include overnight returns and the volatility of RV, respectively. HAR-IVOG extends the HAR-IV specification to capture both overnight returns and the volatility of RV. Finally, HAR-IVLOG is the most sophisticated specification in this study that extends the simple HAR model to include simultaneously the IV, leverage effect, overnight returns, and the volatility of RV.
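The cumulative forecast error and a rolling fluctuation statistic can be sketched as follows. This is a simplified illustration, not the exact procedure of Giacomini and Rossi (2010): it assumes a trailing window of size k and replaces the HAC variance estimator with the plain sample standard deviation of the loss differential. The function names are hypothetical.

```python
import numpy as np


def cumulative_loss(losses):
    """Running sum of a model's per-period losses."""
    return np.cumsum(np.asarray(losses, dtype=float))


def fluctuation_path(loss_benchmark, loss_competitor, k):
    """Standardized rolling sums of the loss differential.

    Simplifications in this sketch: trailing windows of length k and a
    plain sample standard deviation instead of a HAC estimator.
    Negative values favor the benchmark (lower loss).
    """
    d = np.asarray(loss_benchmark, dtype=float) - np.asarray(loss_competitor, dtype=float)
    sigma = d.std(ddof=1)
    csum = np.concatenate(([0.0], np.cumsum(d)))
    window_sums = csum[k:] - csum[:-k]  # one sum per window of k periods
    return window_sums / (sigma * np.sqrt(k))
```

The resulting path would then be compared with the simulated critical values of the fluctuation test; those values are not reproduced here.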

3 | DATA
Our data set consists of daily RV data, open and close prices, and IV indices for 10 international equity markets. The daily RV measures that we employ are based on 5-min intraday returns and are obtained from the Oxford-Man Institute's Quantitative Finance Realized Library (Heber et al., 2009).¹² The daily open and close prices, and the IV indices are collected from Thomson Reuters Datastream (2019). Since the various IV indices have been listed on different dates, we consider the period from February 2, 2001 to June 27, 2019 to study the indices over the same time period.
The 10 international stock markets we examine are: the S&P500 (US), the DJIA (US), the Nasdaq100 (US), the Euro STOXX50 (Euro area), the CAC40 (France), the DAX (Germany), the AEX (The Netherlands), the SMI (Switzerland), the FTSE100 (United Kingdom), and the Nikkei225 (Japan). The IV indices constructed from the market prices of options on these stock indices are the VIX, VXD, VXN, VSTOXX, VCAC, VDAX-New, VAEX, VSMI, VFTSE, and VXJ, respectively. They represent the expected market volatility over the subsequent 22 trading days. The CBOE VIX index is probably the most widely used example of option-implied information and its popularity spawned the introduction of IV indices around the world.¹³
Table 1 provides the descriptive statistics for the (log) RV and IV as well as for daily returns of all data we use. The last six columns of the table provide the first to third order of the autocorrelation function (ACF) and partial ACF (PACF). The log RV appears approximately Gaussian with the skewness being between 0 and 1 and the kurtosis being around 3 for all indices. The only exception is the log RV series for STOXX50 where a kurtosis of 5.635 suggests slightly fatter tails than a Gaussian variable. As documented in other studies, the ACF and PACF values highlight the long memory characteristic of (realized) volatility. Similar features emerge for the log IV series, being close to a normal distribution, except for the log IV series of Nikkei225 that has fatter tails. Also, the log IV series are the most persistent series overall. Finally, the daily returns of all indices are negatively skewed and leptokurtic. This suggests that the return distribution is not symmetric and it has heavier tails than the normal.
11 Under the null hypothesis, the asymptotic distribution of the fluctuation test can be approximated by functionals of the Brownian motion. Critical values are computed by simulation and are reported in Giacomini and Rossi (2010).
12 http://realized.oxford-man.ox.ac.uk/data/download. Our RVs are from Library Version 0.3. The Oxford-Man Institute's Quantitative Finance Realized Library contains a selection of daily nonparametric estimators of volatility.
13 All IV indices apply the VIX algorithm that relies on a model-free approach rather than a model-based approach that depends on option pricing structure, such as the Black-Scholes model (Jiang and Tian, 2005). As the VIXs are quoted in annualized percentages and to make them comparable to RV measures, we compute the daily IV index as $IV = (VIX/(100\sqrt{252}))^2$. The same approach is also followed by Blair et al. (2001) and Becker et al. (2007), among others, to evaluate the information content of IV about future volatility when the alternative is historical models based only on daily returns.
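The conversion of an annualized, percentage-quoted IV index to a daily variance can be illustrated with the short sketch below. The function name `daily_implied_variance` and the index level of 20 are hypothetical; the formula is the one stated in the footnote on the IV indices.

```python
import math


def daily_implied_variance(index_level, trading_days=252):
    """Convert an annualized percentage IV index to a daily variance."""
    return (index_level / (100.0 * math.sqrt(trading_days))) ** 2


# Hypothetical index level of 20 (i.e., 20% annualized volatility).
iv_daily = daily_implied_variance(20.0)
```

An index level of 20 corresponds to an annualized variance of 0.04, so the daily value is 0.04 divided by 252.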
Table 2 presents the out-of-sample forecast evaluation results for all 10 indices using a rolling estimation window with an initial in-sample period from February 2, 2001 to December 31, 2005. To assess the one-step-ahead forecasts we report the average QLIKE loss of the benchmark HAR model in Equation (4) and the 15 extensions of the HAR model presented in Equation (5). The relative forecast error, obtained as the ratio of the average QLIKE loss for each model to the average QLIKE loss of the benchmark HAR, is provided in parentheses. The model that yields the lowest loss, and thus the model with the best forecast performance, is indicated in bold for every index. Several interesting conclusions arise from Table 2 regarding the importance of the various features of realized volatility. In most cases, the relative forecast error is below one, indicating the poor performance of the benchmark HAR model. Thus, accounting for one or more of the volatility features appears to improve forecast accuracy. Augmenting the benchmark HAR model with each of the different stylized facts individually (first part of Table 2) enhances the forecast accuracy across all indices. Explicitly incorporating IV in the HAR model notably improves its forecast performance across all indices. The HAR-IV specification yields the lowest loss when compared with the HAR, HAR-L, HAR-O, and HAR-G across all indices (except AEX). Including the leverage effect and overnight returns also improves the forecast performance of the HAR model. The only exception is the FTSE100 index. In this case the benchmark HAR performs marginally better than the HAR-O (the relative error is higher than 1). Interestingly, accounting for the volatility of realized volatility adds little or nothing in terms of forecast accuracy. The forecast error of the HAR-G is higher than that of the benchmark HAR for most indices, indicating that the HAR model performs better than the HAR-G.
Across all features, accounting for IV contributes most to the improvement in forecast performance of the HAR model. Looking at the forecast errors of the models that account for each of the various stylized facts separately, that is, HAR-IV, HAR-L, HAR-O, and HAR-G, we find that the biggest predictive gains are driven by IV. In addition, the parsimonious HAR-IV specification yields remarkably lower loss than specifications that exclude IV, but simultaneously include two or more of the other stylized facts, such as HAR-LO, HAR-LG, HAR-OG, and HAR-LOG.
Although the HAR-IV model is more accurate than any HAR-type model that does not include IV, the results in Table 2 indicate that extensions of the HAR-IV further improve forecast accuracy. Comparing the forecast error of the HAR-IV with more sophisticated models that include not only IV, but also the leverage effect, overnight returns, and/or the volatility of realized volatility, we find that an extension of the HAR-IV provides a more accurate forecast of realized volatility across all indices. The HAR-IVOG performs best for the S&P500 and DJIA, while the HAR-IVLO provides the best model for the Nasdaq100 and most of the European indices. The exceptions are the DAX and FTSE100 indices, for which the HAR-IVLO is the second-best model, with the first being the HAR-IVLOG and HAR-IVL, respectively. The HAR-IVO is the best model for forecasting the realized volatility of the Nikkei index.
To compare the relative forecast performance of the models we apply the MCS test. On the basis of the QLIKE loss function, Table 3 presents the p-values from the MCS test at the 10% and 25% significance levels. The MCS findings reveal the same picture as Table 2. Across all indices only HAR specifications that include IV belong to the MCS with p-values larger than 10%. This indicates that such models are superior to the other HAR specifications. The results are consistent using both the range and semiquadratic statistics. This indicates that IV is a strong predictor of future RV, confirming the previous finding about the importance of IV information (Busch et al., 2011). However, the simpler HAR-IV model belongs to the MCS only in one case, the DJIA index. For all the other indices, the HAR-IV is significantly outperformed by an extension of the model that accounts for one or more additional volatility stylized facts. This again indicates that these features contain incremental information about future realized volatility beyond that captured in HAR-IV.
In sum, the key result in Tables 2 and 3 demonstrates that a substantial predictive gain is driven by the inclusion of IV. In most cases, the parsimonious HAR-IV performs better than models that do not include IV even if they are more sophisticated, that is, contain more than one of the other stylized features of RV simultaneously. However, the evidence equally supports the inclusion of further extensions to the HAR-IV model.
Further graphical evidence of the superiority of HAR-IV is provided in Figure 2 based on the results of the Giacomini and Rossi (2010) test. This test compares the relative performance of two competing models over time. We evaluate the performance of HAR-IV against the standard HAR and HAR-L, HAR-O, HAR-G. The solid (blue) line is the difference between the QLIKE error of HAR-IV and one of the other HAR specifications. Giacomini and Rossi (2010) use this approach to test the null hypothesis of equal local predictive ability of the two forecasting models. The zero line indicates equal predictive ability and the dashed lines indicate the 5% critical values. The HAR-IV specification performs better (worse) than a competing model at the 5% level of significance when the solid line is below (above) the lower (upper) dashed line. For most indices the continuous line is clearly outside the critical values and below the lower dashed line. This means that the HAR-IV model consistently outperforms the standard HAR model and alternative HAR models that include either the leverage effect, overnight returns, or the volatility of realized volatility. In some periods of time the null hypothesis of equal predictive ability is not rejected as the solid line lies between the two dashed lines. In a limited number of cases, one of the competing models performs significantly better than the HAR-IV. For example, for the AEX index, the HAR-IV is significantly worse than its competitors between mid-2011 and 2013. Nonetheless, the overall superiority of the HAR-IV model is consistent across time.
4.2 | Do leverage effects, overnight returns, and the volatility of realized volatility enhance the forecast result?
From the results reported in Tables 2 and 3, and Figures 1 and 2, it is evident that IV is a strong predictor of future realized volatility across all the markets we analyze. This result confirms the previous finding about the importance of IV information for the US market (Busch et al., 2011). The HAR-IV model outperforms not only the standard HAR model, but also other HAR models that capture well-known features of realized volatility, such as the leverage effect, overnight returns, and the volatility of realized volatility, but ignore IV. However, the results above provide evidence in support of extensions to the HAR-IV. In this subsection we further investigate the predictive performance of the HAR-IV model and extensions accounting for the noted features.
Table 4 presents the Giacomini and White pairwise test results for all indices. The p-values reported in the table are based on the mean difference between the row model and the column model. The null hypothesis of equal forecasting performance between the row and column models is tested in terms of QLIKE forecast error. The sign in parentheses indicates which model performs best. A positive sign shows that the column model is significantly preferred, that is, the row model forecast yields a larger loss than the column model forecast. Similarly, a negative sign denotes that the row model forecast performs significantly better than the column model forecast, with the latter producing a larger loss.
Several interesting conclusions emerge from Table 4. Augmenting HAR-IV with overnight returns significantly enhances forecasting accuracy for all indices with the exception of the DJIA and FTSE100 indices. Incorporating the volatility of realized volatility in the HAR-IV significantly improves the forecast performance of the model only for the S&P500 and DJIA indices. This result is in line with Corsi et al. (2008), who show that allowing for the time-varying volatility of realized volatility leads to an improvement in the predictive performance of the standard HAR for the S&P500. However, the results show that this is not the case for any of the other markets we consider. For all the other indices, the HAR-IV outperforms the HAR-IVG. In some cases, the null hypothesis of equal predictive ability between HAR-IV and HAR-IVG is rejected at the 5% level, indicating that the former provides significantly better forecasts than the latter. In other cases, the null hypothesis of equal predictive ability between the two models is not rejected, which indicates that the volatility of realized volatility does not contain any significant additional information.
Mixed results are found on the role of the leverage effect.Adding the leverage effect, one of the most salient features of realized volatility, to the HAR-IV significantly improves the forecasting performance in only two cases-for the SMI and FTSE100 indices.For the S&P500 the HAR-IV is significantly better than HAR-IVL.For all other indices, augmenting HAR-IV with leverage effect does not significantly enhance the predictive power.Thus, the leverage effect does not appear to play a significant role in forecasting realized volatility.An explanation for this is that the IV indices reflect the leverage effect.IV may capture the leverage effect given the relation between leverage and volatility and changes in leverage on the volatility smile found in the option literature.This is also complemented by the observed negative empirical relation between VIX and stock returns (e.g., Giot, 2005).Notably, several researchers identify a link between leverage and IV.This includes Branger and Schlag (2004) and Ciliberti et al. (2009) who link the shape of the volatility smile to leverage.Further, Choi and Richardson (2016) demonstrate a positive relation between leverage and option IV.They further note that the asymmetric effect in an estimated EGARCH model arises from both a volatility feedback effect as well as leverage. 
We further augment the HAR-IV model to account simultaneously for more than one stylized fact. We find that the most sophisticated HAR-IV model, that is, the HAR-IV model that accounts simultaneously for the leverage effect, overnight returns, and the volatility of volatility (HAR-IVLOG), does not significantly outperform more restricted specifications across all markets, which means that on average a more parsimonious model is superior. However, no single extension of HAR-IV outperforms all others across the markets we study. Augmenting HAR-IVO with the leverage effect (HAR-IVLO) provides the best forecast model for the Nasdaq, CAC40, DAX, and SMI indices. That is, for these indices, the HAR-IVLO model performs significantly better than all other competing models. The HAR-IVOG produces the best forecast for the S&P500 and DJIA, while for the remaining indices an even more parsimonious model provides the best forecast. The HAR-IVO performs best for the STOXX50, AEX, and Nikkei225, and the HAR-IVL is the preferred model for the FTSE100.
Overall, the results regarding the interaction between IV and the stylized facts of volatility indicate that all markets support some extension of the HAR-IV to include one or more of the stylized facts of volatility. Looking at the best performing HAR-IV specifications, we conclude that overnight returns are beneficial for nine out of 10 indices, in the sense that their inclusion improves the accuracy of the predictive model. As found by Corsi et al. (2008), allowing for the time-varying volatility of realized volatility helps predict volatility in the United States. However, this result does not extend to any other market we consider. Mixed results are found for the leverage effect, which improves predictions in only five markets.
The out-of-sample ability of the realized volatility models to produce reasonable VaR estimates is assessed by computing the violation rate (VR), that is, the frequency with which the actual loss exceeds the predicted VaR. If a VaR model is correctly specified, the VR should equal the prespecified VaR level α (in our application α is 5%). The Kupiec (1995) unconditional coverage statistic tests the null hypothesis that the observed VR is statistically equal to the expected one, α. The null hypothesis is rejected whenever the observed VR is statistically different from the expected VR.
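The violation rate and the unconditional coverage test described above can be sketched as follows. This is a minimal illustration rather than the authors' code: the function names, the convention that VaR is quoted as a positive loss threshold, and the use of the chi-square(1) asymptotic distribution for the likelihood ratio are assumptions made for the example.

```python
import math

def violation_hits(returns, var_forecasts):
    """Return a 0/1 list marking days where the loss exceeds the VaR forecast.

    Assumption: VaR is quoted as a positive number, and the loss is -return.
    """
    return [1 if -r > v else 0 for r, v in zip(returns, var_forecasts)]

def kupiec_lr_uc(hits, alpha=0.05):
    """Kupiec (1995) likelihood ratio test of unconditional coverage.

    Returns (violation rate, LR statistic, asymptotic p-value).
    """
    n = len(hits)
    x = sum(hits)          # number of violations
    vr = x / n             # observed violation rate

    def loglik(p):
        # Binomial log-likelihood, using the convention 0 * log(0) = 0.
        ll = 0.0
        if x > 0:
            ll += x * math.log(p)
        if n - x > 0:
            ll += (n - x) * math.log(1.0 - p)
        return ll

    lr = -2.0 * (loglik(alpha) - loglik(vr))
    # Survival function of a chi-square(1) variable: P(X > lr) = erfc(sqrt(lr / 2)).
    p_value = math.erfc(math.sqrt(max(lr, 0.0) / 2.0))
    return vr, lr, p_value
```

A small p-value rejects the null that the observed VR equals α; a VR above α points to underestimated risk, and a VR below α to overestimated risk.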
Besides the VR, an assessment of a VaR model should also consider the clustering of violations. Christoffersen (1998) introduces a conditional coverage statistic that tests the joint hypothesis of unconditional coverage and independence of violations. That is, in a correctly specified VaR model, violations should not only occur at the specified rate, but should also occur independently and be spread across the whole evaluation period.
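A sketch of the conditional coverage test may help make the joint hypothesis concrete. It models the 0/1 violation sequence as a first-order Markov chain and compares the likelihood under the null (violations are independent with probability α) against the unrestricted chain. The function name is hypothetical, the likelihood conditions on the first observation, and the p-value relies on the chi-square(2) asymptotic distribution; none of this is taken from the authors' implementation.

```python
import math

def christoffersen_lr_cc(hits, alpha=0.05):
    """Christoffersen (1998) likelihood ratio test of conditional coverage.

    hits: 0/1 sequence of VaR violations. Returns (LR statistic, p-value).
    """
    hits = [int(h) for h in hits]
    # Count transitions n[i][j]: state i on day t-1 followed by state j on day t.
    n = [[0, 0], [0, 0]]
    for prev, cur in zip(hits[:-1], hits[1:]):
        n[prev][cur] += 1

    def xlogy(k, p):
        # k * log(p) with the convention 0 * log(0) = 0.
        return k * math.log(p) if k > 0 else 0.0

    # Unrestricted Markov chain: violation probability depends on yesterday's state.
    pi01 = n[0][1] / (n[0][0] + n[0][1]) if (n[0][0] + n[0][1]) > 0 else 0.0
    pi11 = n[1][1] / (n[1][0] + n[1][1]) if (n[1][0] + n[1][1]) > 0 else 0.0
    ll_alt = (xlogy(n[0][0], 1.0 - pi01) + xlogy(n[0][1], pi01)
              + xlogy(n[1][0], 1.0 - pi11) + xlogy(n[1][1], pi11))

    # Null: violations are independent and occur with probability alpha.
    ll_null = (xlogy(n[0][0] + n[1][0], 1.0 - alpha)
               + xlogy(n[0][1] + n[1][1], alpha))

    lr_cc = -2.0 * (ll_null - ll_alt)
    # Survival function of a chi-square(2) variable: P(X > lr) = exp(-lr / 2).
    p_value = math.exp(-max(lr_cc, 0.0) / 2.0)
    return lr_cc, p_value
```

Evenly spaced violations at roughly the nominal rate should not be rejected, whereas a run of consecutive violations should be, even when the overall VR is close to α.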
The results for the one-step-ahead VaR forecasts across all indices and all HAR specifications are reported in Table 5. The results show that no single model passes the backtesting tests across all indices. Also, the optimal model varies across indices. This result is consistent with findings in the literature that VaR models are not robust across different markets (Degiannakis et al., 2013). Very few models produce a VR equal to the expected one (5%). In some cases the observed VR exceeds the expected VR, indicating an underestimation of risk, while in other cases the observed VR is below the expected one, indicating an overestimation of risk. However, according to the Kupiec (1995) and Christoffersen (1998) tests, in many cases the observed VR is not statistically different from the expected VR and the VaR violations are independently distributed. Also, in most cases the VR moves closer to its expected value when IV is added to a HAR specification. This result indicates that models that include IV appear to improve the forecast accuracy of VaR. Models that pass the backtesting tests and produce a VR equal or closest to the expected one can be considered to produce the most efficient VaR estimates. Specifically, the HAR-IV model produces the most accurate estimates for the Nasdaq, CAC40, and AEX indices, the HAR-IVL performs best for the DJIA, STOXX50, DAX, and SMI indices, the HAR-IVG produces the most efficient VaR for the S&P500 and FTSE100 indices, and the HAR-IVO is the best performing model for the Nikkei225.
Overall, and consistent with the results presented in Section 4, models that include IV appear to improve the forecast accuracy of VaR. Also, a more parsimonious model seems to produce better VaR estimates than a more sophisticated specification. Thus, a HAR model that accounts for IV and, in most cases, one of the other stylized facts produces the most accurate VaR estimates.
its average value among the $n = T - N$ one-period-ahead forecasts computed, and $\widehat{\mathrm{var}}(\bar{d}_{ij})$ is an estimate of $\mathrm{var}(\bar{d}_{ij})$ obtained by using a block-bootstrap procedure of 10,000 resamples. The MCS procedure yields p-values for each model in the initial set. For a given model $i \in \mathcal{M}$, the MCS p-value, $\hat{p}_i$, is the threshold at which $i \in \mathcal{M}_{\alpha}$
Abbreviation: HAR, heterogeneous autoregressive. * Indicates that the model belongs to the MCS at the 75% confidence level. ** Indicates that the model belongs to the MCS at the 90% confidence level.

FIGURE 2 Fluctuation test of Giacomini and Rossi (2010). Note: The figure reports Giacomini and Rossi's (2010) fluctuation test statistic for comparing forecasts of the HAR-IV model relative to either HAR, HAR-L, HAR-O, or HAR-G for all stock markets. The test examines the stability over time of the out-of-sample relative forecasting performance of a pair of models in an unstable environment. The null hypothesis is that two models have equal predictive ability at each point in time, estimated over a rolling window of data. The zero line indicates equal predictive ability and the dashed lines indicate the 5% critical values. The solid blue lines indicate the fluctuation test statistic. The HAR-IV specification is significantly better (worse) than one of the competing models at the 5% level when the solid line is below (above) the lower (upper) dashed line. HAR, heterogeneous autoregressive.

Giacomini and White (2006) test. The null hypothesis that the row model and column model perform equally well is tested in terms of QLIKE forecast error. The superscripts + and − indicate rejection of the null hypothesis, with a positive (negative) sign denoting that the row (column) model is outperformed by the column (row) model. The forecast evaluation period covers January 1, 2006 to June 27, 2019. Abbreviations: HAR, heterogeneous autoregressive; QLIKE, quasi-likelihood.

TABLE 5 Results of backtesting the 5% VaR estimates across all indices
Entries report the violation rate (VR) and the p-values of Kupiec's likelihood ratio test of unconditional coverage (LR_uc) and Christoffersen's likelihood ratio test of conditional coverage (LR_cc) for each model. A violation occurs if the empirical return exceeds the predicted VaR on a particular day. The expected value of the VaR violation ratio is the corresponding tail size, that is, 5%. *, **, and *** denote significance at the 10%, 5%, and 1% levels, respectively. Abbreviations: HAR, heterogeneous autoregressive; VaR, value-at-risk.
TABLE 1 Summary statistics of (log) RV, (log) IV, and daily returns
Entries report the summary statistics of the daily (log) realized volatility, the (log) implied volatility, and the returns of the 10 international stock markets. The sample period covers February 2, 2001 to June 27, 2019. The last six columns of the table provide the first to the third order of the autocorrelation function (ACF) and partial ACF (PACF).