Spatial Dependency in Nonstationary GEV Modelling of Extreme Precipitation over Great Britain

This paper presents a study on extreme precipitation using both stationary and non-stationary Generalized Extreme Value (GEV) models over a large number of samples distributed over Great Britain (GB) for the last century, aiming to gain insights in the spatial dependency of the GEV distribution. Not only L-Moments (LM) and Maximum Likelihood (ML) estimation methods but a Bayesian Markov-Chain Monte Carlo (B-MCMC) method are incorporated into the GEV models to 10 characterize the uncertainty in the nonstationary risk-based assessment. The samples are generated using a toolbox of spatial random sampling for grid-based data analysis (SRS-GDA). The results show that a markedly large proportion (70%) of the samples are favour nonstationary assumption GEV models as far as the annual maximum daily rainfall (AMDR) is concerned. The most frequent AMDR, as represented by the location parameter tend to be increasing over the time for more than half of the samples and in contrast, only 8% have a downward trend. A spatially clustering pattern is also clearly present. For rarer 15 (with 0.1 probability) AMDR, they are shown to have a tendency of becoming more extreme over time, for more than half of the samples. For the three methods, the LM method with stationary GEV maintain best results but for AMDR values with higher probability (5-year return level); the B-MCMC method with nonstationary GEV, however, outperform other combinations by a large margin for more extreme events (50-year return level). The findings suggest that an overhaul of the current engineering design storm practice may be needed in view of environmental change impact on natural processes. 20


Introduction
Extreme value (EV) theory and its application in modelling meteorological and environmental processes is a standard practice for designing and validating many infrastructure systems. Following this, a classic analysis approach is to use historical hydroclimatic data, such as rainfall, temperature, flood etc., to estimate the parameters of the required EV model which would offer probability distribution of the natural phenomenon in question, so as to address its occurrence or exceedance probability at 25 given thresholds. Since Jenkinson (1955) proposed a generalized approach to analyzing the frequency distribution of annual maxima, many efforts have been put in quantifying the natural phenomena at extreme levels by using the Generalized Extreme Value (GEV) models (e.g. Gumbel, Fréchet, Weibull) with parameter estimation using the Maximum Likelihood method (ML) https://doi.org/10.5194/hess-2020-44 Preprint. Discussion started: 24 February 2020 c Author(s) 2020. CC BY 4.0 License. and the L-Moments method (LM), especially in designing and planning water engineering systems (Coles and Tawn, 1996;Lazoglou and Anagnostopoulou, 2017;Mannshardt-Shamseldin et al., 2010;Shukla et al., 2012;Yoon et al., 2015). 30 Recently, there has been a growing interest in explaining natural events from a climate-change perspective as many key hydroclimatic variables, e.g. precipitation, temperature, streamflow etc., are indeed changing due to the impact from climate change (Assani and Guerfi, 2017;Herring et al., 2018). In view of the reliability of infrastructure designs based upon extreme value analysis, stationary risk analyses have been re-assessed from a new adaptive perspective where Sarhadi et al. (2016) proposed a multivariate time-varying risk framework for all stochastic multidimensional systems under the influence of the changing 35 environment. For the commonly used GEV model, this is meant to assume its scale and location parameters to be varying with time or other climate indices. For example, Hasan et al. (2012) proposed two nonstationary GEV models for extreme temperature and each model assumes only one parameter as nonstationary depending linearly and exponentially on time respectively; Sarhadi and Soulis (2017)  Different from the researches listed above which assume a constant shape parameter, Ragulina and Reitan (2017) explored the change of the shape parameter and found that it evidently depends on the elevation of study areas. 45 Although in the last few decades there have been several studies applying nonstationary GEV distribution to fit extreme rainfall, most of them focused on a limited number of specific domains because of data availability issues in hydrological observations; therefore, their conclusions are mostly of rationale and lack of generalization (Ganguli and Coulibaly, 2017). In addition, the performance of the existing GEV fitting methods, has not been systematically assessed as to their suitability, especially in the context of fitting nonstationary models. 50 To address these issues, we present in this paper a study of extreme rainfall using both stationary and non-stationary GEV models over a large number of samples distributed over Great Britain (GB), aiming to gain insights in the spatial dependency of the GEV distribution as well as the performance of the existing three methods. There are 88 domain samples with the same size of 500 km 2 , spatially distributed in the mainland of GB. These samples are generated by using the toolbox we developed for spatial random sampling for grid-based data analysis (SRS-GDA) (Wang and Xuan, 2020). The underlying precipitation 55 dataset is taken from the GEAR dataset (Gilbert, 1987) which covers the GB area with a spatial resolution of 1 km×1 km over the region of 700×1250 km 2 . The quality and homogeneity of the GEAR dataset have been well tested by its provider, the Centre for Ecology & Hydrology (CEH) of the UK.
The main objectives of this study include: 1) to reveal the extreme rainfall pattern that varies with the time during the last century in GB; 2) to assess the applicability of both stationary and nonstationary GEV models; 3) to test the three mainstream 60 parameter estimation methods: LM, ML and a Bayesian Markov-Chain Monte-Carol (B-MCMC) with regards to their https://doi.org/10.5194/hess-2020-44 Preprint. Discussion started: 24 February 2020 c Author(s) 2020. CC BY 4.0 License.
goodness of fitness at different levels of rarity of rainfall extremes. The specific focus on the spatial dependency of the methods tested the highlights of this study.
The rest of this paper starts with presenting the main methodology used including parameter estimation for both stationary and nonstationary GEV models; it then follows the introduction of the study area and the datasets in Sect. 3; Sect. 4 shows the 65 results alongside a detailed discussion focusing on the spatial feature of both stationary and nonstationary GEV models and their performances. The conclusions and recommendation are given in Sect. 5.

Methodology
We propose the following approach which covers the four related aspects of this study:  Evaluate the performance of the two types of models in various contexts with regards to the geographical locations, level of extremity as well the method of fit. 75

Stationary Generalized Extreme Value Model (S-GEV)
For a given sampled area, the annual maxima series of the areal daily rainfall is extracted and denoted as . We then consider using the GEV to fit the series with the cumulative distribution function defined in Eq. (1) and its inversion (Eq. (2)) to obtain the threshold value at a different return level .
The cumulative probability function is defined for {1 + ( − )/ > 0}, −∞ < < ∞, > 0 and −∞ < < ∞ , 80 where is the location parameter, is the scale parameter, and is the shape parameter. There are three types of distributions from the GEV family which are distinguished by their shape parameters. The Type I GEV, also known as the Gumbel distribution, refers to the case where = 0; while the type II and III are known as the Fréchet distribution and the Weibull distribution corresponding to the cases where > 0 and < 0 respectively. (LM) method (Hosking, 1990;Hosking and Wallis, 2005) which is a common choice. The linear moments are the expectations of certain linear combinations of order statistics which contain the estimated parameters. For the GEV distribution ( ≠ 0), the first three linear moments are: 90 where ( = 0,1,2) indicates the expectations of the quantiles or non-exceedance probabilities of the -th random variable (i.e. the -th AMDR) if the expectation [ ] exists. They can be calculated by using the probability-weighted moment estimator which is given by After the three linear moments are estimated, an approximate explicit solution for the shape parameter in the interval −0.5 ≤ ≤ 0.5, is calculated by using Eq. (5) (Hosking et al., 1985). 95 The other two parameters can then be estimated by plugging back into Eq. (3).

Nonstationary Generalized Extreme Value Model (NS-GEV)
Compared with the S-GEV model, the nonstationary GEV (NS-GEV) model makes an important extension by assuming that the parameters change over elapsing time. In this study, the scale and location parameters are considered to vary monotonically and linearly with time (see Eq. (7)) and thus its cumulative probability function and inversion are given as: 100 Basically, the CDF follows the same form as the stationary one with an additional subscript added to the location and scale parameters which indicates that both parameters are time dependent. The shape parameter, is assumed to be constant.
The linearly time-varying parameters are further shown in Eq. (7).
The NS-GEV model thus has five parameters { 0 , 1, 0 , 1, } which are denoted by a vector form to help our discussion.
The LM method which is previously applied to estimate the parameters of the S-GEV model is unsuitable for the case of NS-105 https://doi.org/10.5194/hess-2020-44 Preprint. Discussion started: 24 February 2020 c Author(s) 2020. CC BY 4.0 License.
GEV; therefore, in this study, the Maximum Likelihood (ML) method is employed to estimate parameters and the Bayesian Markov-Chain Monte-Carlo (B-MCMC) method is incorporated into NS-GEV model to characterize the uncertainty.

 The ML method
The ML method (Myung, 2003) is built upon the likelihood function of the occurrence of the annual maximum daily rainfall (AMDR) : 110 where (⋅) is the univariate density function and is the length of dataset . Its product is the likelihood function . The set of the parameter can then be obtained by maximizing the likelihood function: Often, Eq. (9) cannot be solved analytically and in this study a numerical scheme was applied to obtain the three parameters.

 The B-MCMC method
The B-MCMC method makes use of the Bayesian inference to estimate the posterior distribution of the time-varying location 115 and scale parameters of the NS-GEV model. In this study, the estimated parameters of the S-GEV model are used to define the initial prior values of the NS-GEV model. The prior distribution of parameters is assumed to be a uniform distribution.
Equation 10 presents the transformation from prior distribution to posterior distribution by multiplying by its likelihood (Rasmussen and Ghahramani, 2003).
Numerical iterations for processing the posterior distribution are carried out by using MCMC simulation (Binder et al., 2012;Manly, 2018;Metropolis and Ulam, 1949), which is also aimed at analyzing the uncertainty of the NS-GEV model. The final simulation results are compared with those estimated using the ML method.
The inputs to the B-MCMC method include: the initial values of the parameters taken from the S-GEV model, the likelihood 125 function, the prior probability and the step set-up function which returns the step length of each iteration. For this study, a random step length is used. The length of the Markov chain is set as 15,000 which is long enough for the simulation; a value 1 is used for setting the skip set-up (N) to thin the chain by only storing every N steps.
Suppose that is the current (known) state with a prior probability ( | ) and +1 is the next-step state (unknown) with an a prior probability of ( ′ | ); an MCMC iteration can be described by the following steps and the flowchart shown in Fig. 2 130 (Carlo, 2004): 1) Propose a new step state +1 by following a random walk and calculate the prior probability of ( ′ | ) of this state; in the meantime, drawing a random number * from (0,1). https://doi.org/10.5194/hess-2020-44 Preprint. Discussion started: 24 February 2020 c Author(s) 2020. CC BY 4.0 License.
) ≥ * , then calculate the likelihood ( | ′ , ) of +1 and go to step 3, otherwise reject this state and go back to step 1 to regenerate a state; 135 ) ≥ * , then accept +1 and store its parameters and go to the step 4, otherwise reject this state and go back to step 1; 4) Check the iteration with the length of Markov chain, if the number of iterations is less than 15000, continue executing the loop (step 1 to 4); otherwise, finish the Monte-Carlo simulation and analyze the estimated parameters.
Finally, a quantile-quantile plot (Q-Q plot) is produced to compare the quantiles simulated by both the S-GEV and NS-GEV 140 models against the empirical quantiles. The Q-Q plot has a reference line along which the data indicates the equalization between simulations and observations. The larger the deviation from this reference, the worse the performance of the model (S-GEV or NS-GEV) or method (LM, ML or B-MCMC).

Goodness of Fitness and Performance of S-GEV and NS-GEV Models
The Kolmogorov-Smirnov (K-S) Goodness of Fitness test (Kolmogorov, 1933;Smirnov, 1948) is widely used to assess the 145 quality of the convergence of GEV distribution with the extreme hydro-climatic datasets. The test is carried out by comparing the empirical cumulative probability distribution with the GEV cumulative probability distribution. The maximum difference between the two distributions is used to covert the -value which indicates whether the testing dataset follows the assumed distribution. The null hypothesis in this study is that the data are drawn from GEV distribution. The K-S test rejects the null hypothesis if the -value is below the significance level of 5% in this study. 150 Meanwhile, the difference between empirical designed rainfall ( ) by its empirical cumulative probability distribution at different return periods and its counterparts ( ′ ) by S-GEV and NS-GEV models, is applied to show the performance of GEV models and uncertainty arose from stationary and nonstationary assumptions, as defined in Eq. (11).
Small absolute value of Diff can be related to a less uncertainty and a better performance. In order to show such variation, a boxplot is employed to indicate the under/overestimate the risk of extremes. 155

Dataset and Study Area
The dataset used in this study, named GEAR, is a gridded daily rainfall at a spatial resolution of 1 km × 1 km from 1898 to 2010 over Great Britain (GB) (Tanguy et al., 2016). The rainfall estimates are derived from the UK Met Office national database of observed precipitations. To derive the estimates, the precipitations from the UK rain gauge network were used.
The Natural Neighbor interpolation method, with an extra normalization step based on average annual rainfall, was used to 160 generate the daily estimates. The estimated rainfall on any given day refers to the rainfall amount precipitated in the 24 hours between 9am on the day of report until 9am on the following day. The origin of the GEAR data matrix starts from the location https://doi.org/10.5194/hess-2020-44 Preprint. Discussion started: 24 February 2020 c Author(s) 2020. CC BY 4.0 License. 400 km west, 100 km north of the true Origin (49°N, 2°W), spreading 700 km east-westly and 1250 km southnorthly. Figure   1b shows the GEAR data matrix where only the grids within the green area (over the mainland) were used in this study.
The SRS-GDA toolbox (Wang and Xuan, 2020)is then employed to generate samples of areas over the study domain. This 165 toolbox can generate samples with either randomly or manually defined properties, e.g., location, size, shape and total number of samples, from the entire study area. In this study, each of the samples is predefined with a fixed size of 500 km 2 and a fixed spatial property = 0.8. The is an important parameter that indicates the irregularity of the shape of areas sampled, expressed as the ratio of the north/south dimension of the domain in question over the east/west dimension. The value 0.8 was used to guarantee a regular shape of the domains generated. It should be noted that the SRS-GDA toolbox is able to randomize 170 location, size as well as shapes; in this study, however, our focus is on the impact of location only. One of such samples is shown in Fig. 1a which consists of 500 grids with a grid size of 1km × 1km, e.g. the same resolution as that of the GEAR dataset. The sampling is then repeated with randomized (non-overlapping) locations with a spatial interval of 40 km until finally we obtained 88 such samples located all over the study domain (see Fig. 1b).

Simulation Results of the S-GEV and NS-GEV Models
The parameters of the GEV distribution under both the stationary and nonstationary assumptions, are estimated by using the three methods (LM, ML and B-MCMC) for the 88 samples. The -values, which indicates the goodness of fitness, are all very close to 1.0, which indicate a failure on rejecting the null hypothesis, i.e., the AMDR follows the GEV distribution at 5% significance level. It should be noted that the high p-values cannot be used to confirm that the AMDR follows the GEV 180 distribution, however, we follow other researchers here to use them to indicate that the AMDR is highly likely to follow the GEV distribution (De Michele and Avanzi, 2018;Hasan et al., 2012;Machiwal and Jha, 2008;Martin, 2013). Meanwhile, the Diff is applied to identify the best performance and uncertainty on nonstationary-based assumption. Among those 70% samples favoring the nonstationary assumption, the B-MCMC method always converge to better results than the ML method does. Geographically, the samples that favor stationary models (labelled by crosses) concentrate around the region of 100 km north in the vicinity of Manchester and Liverpool, with several others distribute in Southern England. Figure 3b presents the spatial distribution of the best selected GEV models in terms of their types. Out of all samples, there are more than half (55.7%) following the Fréchet distribution ( > 0), mainly located in Southern England, 37.5% following 190 the Weibull distribution ( < 0) and the rest following Gumbel distribution ( = 0). Figure 3c shows the spatial distribution of the samples with regards to how the parameters of the GEV distributions vary with time, i.e. the two parameters 1 and 1 . The results are further summarised in Table 1 and Table 2 with more discussions in the following section.

Spatial Nonstationary Patterns of AMDR in GB 195
It is worth revisiting the implication of time-varying scale and location parameters of the GEV models. The scale and location parameters determine the shape and location of the GEV distribution (Kantar and Şenoğlu, 2008;Mann, 1967). The location parameter indicates the mode of the time series, which in our case, is related to the AMDR that has the most frequent occurrence. An increasing means that the AMDR values of the highest probability goes upward. The scale parameter is related to the deviation of the AMDR values from , which determines the stretch (for increasing ) or squeeze (for decreasing 200 ) of the GEV probability distribution curve. The larger the scale parameter, the more spread-out the distribution is.
Conversely, the smaller the parameter, the more compressed the distribution is. In our study, if is estimated to be increasing with the time, the occurrence probability of extreme AMDR, i.e. rainfall ranked in the higher positions is increased.
As seen in Fig. 3c, several intriguing yet remarkable patterns can be found with regards to the fitted sale and location parameters: 205  Most of the samples are in favor of the NS-GEV model, with only 30% samples are shown to have stationary and .
Geographically, these 30% samples are centered around 100 km north (the vicinity of Manchester and Liverpool) with only a few distributed in southern England and Scotland. One of such samples is examined to reveal the difference among the models and the combination of the three methods, as seen in Fig. 4. This sample is located to the west of Glasgow with a location index of (240 km, 660 km). 210 Figure 4 presents the observed AMDR over the entire period of 113 years, comparing the simulated series fitted by the two models (S-GEV and NS-GEV) using the three different methods discussed above (LM, ML and B-MCMC). The majority of the AMDR values, which are related to and can be regarded as the most frequent rainfall, fluctuate between 40mm and 60mm during the entire period. And such fluctuations, which are related to , are even and have no perceivable changes from the first to the second 50 years. 215  About 56% samples are detected to have an increasing As mentioned previously, an increasing location parameter indicates an upward trend of the most frequent AMDR values. It is found that more than half of the samples over GB demonstrate such increasing trend. In fact, if including those samples with = 0, there are 92% of samples coming with non-decreasing . Location wise, samples with increasing generally are from the middle England and Wales, the Lake District and the Highlands. Figure 5 presents three examples with their characteristics 220 shown in Table 1. It is also worth noting that more than half of these samples come with an increasing scale parameter while less than a quarter of them have a decreasing one. This implies that not only are the AMDR in majority samples getting higher on average, they also are becoming more extreme in those areas. It is also clear from Table 1 that the changing scale parameter with an increased in the first two example samples, which represents 70% of all samples, leads to an increasingly more frequent 1-in-50-year rainfall over time; only the third sample, representing the rest 30%, whose dropping makes such 225 rainfall rarer as an 1-in-60-year event. This corroborates, quantitively, with other studies suggesting that extreme rainfalls are https://doi.org/10.5194/hess-2020-44 Preprint. Discussion started: 24 February 2020 c Author(s) 2020. CC BY 4.0 License. more likely to have become more frequent, or in other words, the return level of the events that engineering designs rely on could possibly be reduced and become less reliable.
 Only 8% samples present a decreasing trend of most frequent AMDR values.
Differing from the first two patterns, there are only 7 over 88 samples showing a decreasing trend in their parameters. 230 Remarkably, these samples are all located in Scotland. Figure 6 presents three examples with their characteristics shown in Table 2. Except the first sample, the most samples with a decreased and unchanged or decreased show 1-in-50-years rainfall become rarer after 100 years, especially when drops significantly.

Performance of Methods
In order to compare the three statistical methods LM, ML and B-MCMC, we divide the AMDR values by their associated 235 probability P into four levels separated by: P50, the 50 th quantile (or 1-in-2 years in terms of return level); P80, the 80 th quantile (1-in-5 years); P98, the 98 th (1-in-50 years) and even P99, the 99 th (1-in-100 years) quantiles of their empirical CDF's:  L1: P ≤ P 50 ;  L2: P 50 < ≤ P 80 ;  L3: P 80 < ≤ P 98 ; 240  L4:P > P 98 ; These four levels can be considered as the low (L1), the medium (L2), the high (L3) and the very high AMDR (L4). The higher the AMDR is, the less frequently it appears. Therefore, L4 is considered to be the extreme case. The assessment of the three fitting methods are then conducted on a level-by-level basis over the entire 113 years for all sampled domains. Q-Q plot is used to compare the performance of the three methods with one example shown in Fig. 7a. The reference line (dash 245 line) indicates the perfect fit. Larger deviation from this reference line implies worse performance of the fitted GEV model. It is interesting to see for this sample, the LM method (for S-GEV) and the ML (for NS-GEV) both work better for the lower quantile of rainfall (below L1) where the B-MCMC method tends to a bit underestimation. For the medium quantiles, e.g. L2 and L3, all three methods achieve similar level of performance. It is the extreme case, e.g. L4 that the B-MCMC method gets a clear leading edge with much closer results. This pattern of performance not only appears for the selected sample, but it also 250 represents most of all samples when plotting the simulation Diffs (Eq.(11)), as shown in Fig. 7b. Again, for the quartile rainfall less than 10years return level (i.e., L1 and L2), the smallest difference and least uncertainty are observed in the model results by S-GEV with the LM method; for L4, the boxplot of the difference by S-GEV is skewed left with 2 outliers while NS-GEV by B-MCMC method show a much smaller uncertainty without outliers and less difference but a bit underestimation.
Furthermore, the uncertainty grows as the return period increases. More details can be found in Table 3. 255 The spatial distribution of the best selected methods with regards to 4 levels are further summarized in Fig. 8 and Table 4.
They confirm the said pattern change, i.e., the LM methods dominates the L1 level and gradually, the ML method gets more https://doi.org/10.5194/hess-2020-44 Preprint. Discussion started: 24 February 2020 c Author(s) 2020. CC BY 4.0 License. and more contribution as the best performers from L2 to L3. And again, for the extreme case (L4), the B-MCMC is a clear winner.
It is also interesting to interpret Table 4 from another perspective of model choice. For low and medium AMDR, the stationary 260 model, e.g. S-GEV can sufficiently represent them very well; however, for higher level, NS-GEV models are preferred. This also implies that there have been more possible time-varying changes to the higher level of AMDR than to the normal, medium range of AMDR.

Conclusions
We present a study of spatial dependency of both stationary and nonstationary GEV modelling of annual maximum daily 265 rainfall over Great Britain for the period of 113 years using a large grid-based dataset GEAR. We also demonstrate the performance of the three most commonly used fitting methods LM, ML, B-MCMC, particularly over different level of rarity of the event. The study is assisted with a toolbox of spatial random sampling (SRS-GDA) which provides enough samples.
We find that: 1) In general, 70% samples favor the NS-GEV model whose parameters and are assumed to be linearly changing with 270 time; 2) Among those NS-GEV applications, B-MCMC always performs better than ML. However, S-GEV model estimated by the LM method is the best chose for modelling the rainfall less than 1-in-5-year return level while NS-GEV model incorporated by the B-MCMC method prevails for the extreme cases, e.g. the rainfall higher than 1-in-50-year return level; 3) More than half of those samples favoring the NS-GEV model show a continuous increase on which is related to the 275 increase of the most frequent AMDR in those samples. Meanwhile more than half of them are further accompanied by an increase on , which leads to an overall dropping of the return level from 1-in-50-year to 1-in-20 year over the study period of 113 years. If translated into everyday language, this means that not only do the most frequent events (w.r.t. ) becomemore extreme, the extreme events also become more frequent (w.r.t. ).
We trust that the findings from this study are of great relevance as they not only further corroborate other research findings on 280 extreme rainfall, e.g. extreme events are likely to become more frequent due to climate change impact, but they also quantitatively address how such changes may affect the original engineering design standard. The fact that the combination of NS-GEV/B-MCMC always perform the best for evaluating the extreme events regardless of the GEV model, may inspire a reconsideration of the current practice of designing storms.
Further work is recommended to have a closer look at the underlying datasets with respect to the potential inconsistency in the 285 resolution of the data observed near the west coast of Scotland. In addition, comparative study with long-term, single gauge observations, as well as catchment orientated sampling will make conclusions more robust.

Code/Data availability
The GEAR dataset for this study is provided by Centre of Hydrology and Ecology (CEH). The open-source toolbox of spatial random sampling for grid-based data analysis (SRS-GDA) developed by authors to generate samples used in this study can be 290 obtained from https://github.com/wanghan924/SRS-GDA_Toolbox.git.

Author contribution
There is a contribution

Competing interests 295
The authors declare no competing interests.   Location (    Location (