Modelling hyperspectral-and thermal-based plant traits for the early detection of Phytophthora -induced symptoms in oak decline

Holm oak decline is a complex phenomenon mainly influenced by the presence of Phytophthora cinnamomi and water stress. Plant functional traits (PTs) are altered during the decline process — initially affecting the physiological condition of the plants with non-visual symptoms and subsequently the leaf pigment content and canopy structure — being its quantification critical for the development of scalable detection methods for effective management. This study examines the relationship between spectral-based PTs and oak decline incidence and severity. We evaluate the use of high-resolution hyperspectral and thermal imagery ( < 1 m) together with a 3-D radiative transfer model (RTM) to assess a supervised classification model of holm oak decline. Field surveys comprising more than 1100 trees with varying disease incidence and severity were used to train and validate the model and predictions. Declining trees showed decreases of model-based PTs such as water, chlorophyll, carotenoid, and anthocyanin contents, as well as fluorescence and leaf area index, and increases in crown temperature and dry matter content, compared to healthy trees. Our classification model built using different PT indicators showed up to 82% accuracy for decline detection and successfully identified 34% of declining trees that were not detected by visual inspection and confirmed in a re-evaluation 2 years later. Among all variables analysed, canopy temperature was identified as the most important variable in the model, followed by chlorophyll fluorescence. This methodological approach identified spectral plant traits suitable for the detection of pre-symptomatic trees and mapping of oak forest disease outbreaks up to 2 years in advance of identification via field surveys. Early detection can guide management activities such as tree culling and clearance to prevent the spread of dieback processes. Our study demonstrates the utility of 3-D RTM models to untangle the PT alterations produced by oak decline due to its heterogeneity. In particular, we show the combined use of RTM and machine learning classifiers to be an effective method for early detection of oak decline potentially applicable to many other forest diseases worldwide.


Introduction
Plant functional traits (PTs), such as biochemical composition, chlorophyll fluorescence, water and dry matter content, crown temperature, and vegetation structure, are closely linked to plant health conditions and the responses to environmental and biotic stressors (Ahrens et al., 2020).Changes in PTs may alert managers to biotic and abiotic stressors and thus enable timely management interventions (Cunniffe et al., 2016).Hyperspectral signatures of plants provide an efficient alternative to standard field surveys by enabling monitoring of vegetation status (including biochemical and functional assessments) over large areas at a reduced cost (Homolová et al., 2013;Rocha et al., 2019).Recent studies provide evidence that the quantification of PTs from hyperspectral and thermal images can successfully detect previsual symptoms of harmful crop pathogens, such as Xylella fastidiosa (Xf) infection in olive trees (Zarco-Tejada et al., 2018).
Retrieving PTs from spectra obtained in non-agricultural contexts, such as forest canopies, is challenging because of their high variability.Natural forests, for example, are highly heterogenous in species composition and canopy structure, resulting in spectral mixture effects produced by forest canopy structure, shadows, and understory.Furthermore, they may have high levels of intraspecific variability, driven by microsite and ecophysiological conditions (Fernández i Marti et al., 2018;Navarro-Cerrillo et al., 2018).The spectral mixing produced in heterogeneous forest canopies reduces the accuracy of PTs retrieved from images, especially those derived from narrow regions of the spectrum such as the chlorophyll fluorescence emission region (Hernández-Clemente et al., 2017).
Forest decline is a pervasive decrease of forest health resulting from a complex interaction of a potentially large number of biotic and abiotic factors (Hutchings et al., 2000), including stresses such as water deficit, air pollution, and invasive pests (Manion and Lachance, 1992;Trumbore et al., 2015).In the case of oak decline on the Iberian Peninsula, water stress and root rot caused by Phytophthora cinnamomi (Pc) and related oomycetes are thought to be the main drivers of tree death (Ruiz-Gómez et al., 2019).This pathogen is one of the most pervasive invasive alien species in forest ecosystems of the northern hemisphere (Burgess et al., 2017).It is a challenge to identify the relationship between water stress and root rot.But it has been observed that the reduction in water availability caused by water stress increases susceptibility to Pc infection (Corcobado et al., 2013).Infected trees exhibit regressive decline immediately after showing visual symptoms such as defoliation, crown or canopy discolouration, and brown foliage remaining attached to the canopy (Camilo-Alves et al., 2013).After these symptoms are detected, there is no opportunity for forest management to prevent tree death.For this reason, it is critical to developing accurate methods for the early detection of oak decline that maximises the effectiveness of silvicultural treatments such as calcium soil fertilisers, biofumigant crops, or fosetylaluminium treatments (Romero et al., 2019).Understanding the early phase as a pre-symptomatic or non-symptomatic stage, where trees may be affected but have not yet developed symptoms.
Several spectral-based strategies have been developed to quantify critical PTs in natural forest canopies, as recently reviewed by Hernández-Clemente et al. (2019).Methodologies range from those based on empirical relationships between field observations and specific spectral bands or vegetation indices (VIs) to approaches involving 3-D radiative transfer models (RTMs) or machine learning (ML) techniques.While empirical relationships can be readily developed for a wide range of traits of interest, 3-D RTM requires significant computational effort.A main disadvantage of the empirical approach is its limited generalisability to different spatial and temporal contexts.By contrast, RTMs are causal models robust to variations in geometry, illumination, and scene components (i.e., canopy, understory, soil), helping incorporate context dependency and enabling generalisation to different environments.These properties are important for deriving PTs from forest canopies, where 3-D RTMs such as FLIGHT (North, 1996) or DART (Gastellu-Etchegorry et al., 1996) represent the spatial heterogeneity of forest canopies fairly effectively (Hernández-Clemente et al., 2017, 2012;Kötz et al., 2004;Liu et al., 2020;Roberts et al., 2020).A recent study using FLIGHT8 has shown the need to account for effects of shrub and/or grass understories in addition to tree canopies in quantifying variables such as chlorophyll fluorescence (Hornero et al., 2021).
A semi-causal method is the combined use of PTs retrieved with RTMs and VIs (Zarco-Tejada et al., 2018).Numerous VIs have been formulated and tested for quantifying biomass loss related to advanced stages of plant diseases (Castrignano et al., 2020).Some formulations, such as the soil-adjusted vegetation index (SAVI) or the modified chlorophyll absorption ratio index (MCARI), have been shown to minimise the background and atmospheric effects and perform better for forest canopies than traditional formulations such as NDVI (Hornero et al., 2020).Zarco-Tejada et al. (2001) demonstrated that a red edge spectral index, R 750 /R 710 , reduced forest shadow effects better than other standard chlorophyll indicators used to estimate chlorophyll a and b content.
The diagnosis of plant diseases requires quantifying not only forest biomass but also the physiological condition of that biomass (Cunniffe et al., 2016).Functional PTs such as photosynthetic rate, water stress, leaf anthocyanin, chlorophyll a and b, and carotenoid content may be used for early detection of diseases (Hernández-Clemente et al., 2019).Also, a group of carotenoids, the xanthophyll cycle carotenoids, plays a photoprotective role, preventing damage from excess light to photosynthetic systems, and are potentially detected through the photochemical reflectance index (PRI), thus serving as a proxy for forest health (Hernández-Clemente et al., 2011;Sims and Gamon, 2002).Other useful indicators of plant health include sun-induced chlorophyll fluorescence (SIF) emission and canopy temperature, which are often used as powerful non-invasive markers to track the status, resilience, and recovery of vegetation (Gonzalez-Dugo et al., 2014;Mohammed et al., 2019;Zarco-Tejada et al., 2012).
However, the relative importance of different PT indicators for detecting disease remains largely unknown for many forest species and ecosystems.Understanding the sensitivity of different spectral-based physiological indicators for detecting forest decline in these heterogeneous environments will help guide management and future monitoring campaigns.In this study, we i) expanded our understanding of the contributions of different PTs in detecting symptomatic and asymptomatic trees affected by biotic and abiotic stressors in a holm oak forest and ii) used this information to construct a PT-based analytical approach for the early detection and severity assessment of forest decline.

Study site and field data collection
The study was conducted in an open Mediterranean-like oak savannah or dehesa located in Andalusia, southern Spain (37 • 36 ′ 45 ′′ N, 7 • 21 ′ 8 ′′ W, 148 ha, Fig. 1).The dominant species in the forest was holm oak, Quercus ilex subsp.ballota (Desf.)Samp.Tree density ranged from 30 to 40 trees ha − 1 .There was an understory of annual plants and typical Mediterranean sclerophyllous and sub-sclerophyllous shrub species, i.e., Cistus spp., Pistacia spp., Phillyrea spp., and Rosmarinus officinalis.The climate at the study site is dry thermo-Mediterranean, with mild winters and hot summers, including approximately 120-150 biologically dry days, a mean annual rainfall of 570 mm, and an average annual temperature of 16.8 • C, according to the Agroclimatic Information Network of Andalusia (Meteorological Station of Puebla de Guzmán, 37 • 33 ′ 07 ′′ N, 07 • 14 ′ 54 ′′ W).The bedrock is calcareous, and the terrain is characterised by smooth hills (slope < 15%).Soils are Eutric Cambisols, Chromic Luvisols, and Lithosols with Dystric Cambisols and Rankers (REDIAM, Junta de Andalucía, 2021).The study area is also affected by the combined effect of water deficiency and erosion, soil compaction, and nutrient losses (Moralejo et al., 2009).
Two field surveys were conducted in the study site in summer 2017 and summer 2019.During the surveys, disease severity (DS) and disease incidence (DI) were assessed for 1146 individual holm oak trees.Seem (1984) defined DS as the quantity of disease affecting entities within a sampling unit; DI is a quantal measure, defined as the proportion or percentage of diseased entities within a sampling unit.DS thus accounts for disease severity, while DI considers only whether a tree is affected or not.
Based on visual inspection, we assigned individual trees to one of the four DS categories available (Fig. 2) depending on the proportion of the crown affected by defoliation (Eichhorn et al., 2017) and other typical Pc-induced symptoms, including dead branches in the crown, stem cankers, and adventitious epicormic sprouts (Jung et al., 2000).DS ranged from 0, indicating the absence of visual symptoms, to 3, in which most of the branches in the crown were dead, following the classification of the Andalusian Forest Damage Monitoring Network (Consejería de Medio Ambiente y Ordenación del Territorio, 2018) (Table 1; Fig. 2).According to this classification, defoliation refers to both reduced leaf retention and premature loss compared to regular tree growth cycles.The part of the crown that is evaluated includes all live branches and thin branches that are dead but still bear leaves.However, it excludes thick branches that have been dead for years and have already lost their natural buds, epicormic shoots below the crown, and gaps in the crown where branches have never existed.DI was either 0 or 1, indicating nonsymptomatic trees and symptomatic trees, respectively, where nonsymptomatic trees corresponded to a DS of 0 and symptomatic trees to any other severity (DS ≥ 1).
The presence of Pc on holm oak roots was confirmed through molecular analyses in the study area.Soil samples were collected on three different trees located in the centre of the study area.The analysis and the results are detailed in Ruiz-Gómez et al. (2019).

Leaf pigment quantification
Biochemical measurements were taken on leaves from 15 selected trees in the study area in the summers of 2013, 2015, and 2017, in which the chlorophyll (C ab ), carotenoid (C ar ), and anthocyanin (A nth ) contents were measured (Table 2).Leaf pigment content was measured by destructive methods on 12 samples per tree (three biological replicates per orientation, i.e., North, East, South, and West).Samples were collected from the sunlit branches at the top of the crown during a 1hour window around solar noon.Leaves were immediately frozen in liquid nitrogen in the field and kept below − 20 • C until the analysis of pigment concentration was performed in the laboratory.Photosynthetic pigment extracts (chlorophylls and carotenoids) were obtained from a mixture of 2-cm 2 ground leaf material per sample (four discs of 0.5 cm 2 ); the leaves were milled in a mortar bed on ice with liquid nitrogen and diluted in acetone to 5 mL (in the presence of sodium ascorbate).Extracts were then filtered through a 0.45-μm PTFE hydrophobic filter to separate pigment extracts from remaining fractions.Extractions and measurements were performed under reduced light conditions to avoid degradation of the pigments, with five technical replications conducted per biological sample.Photosynthetic pigment quantification was done through absorbance measurement after separation by high-precision liquid chromatography (HPLC) following the methodology detailed by Hernández-Clemente et al. (2012).
Anthocyanins were extracted by suspending two 0.5-cm 2 leaf discs in acidic solution (methanol 1% HCl) following Murray and Hackett (1991).The absorbance of anthocyanins (AAs) in the samples was calculated by subtracting 24% of the maximum absorbance of chlorophylls (653 nm) from the maximum absorbance of the anthocyanins (532 nm) (1) Concentrations were estimated using a molar extinction coefficient of 30 mL mol − 1 cm − 1 (Steele et al., 2009).Five technical replicates were performed for each biological sample, and results are shown in units of μg cyanidin-3-glucoside equivalents per cm 2 (Lee et al., 2008).

Plant functional traits
Steady-state leaf fluorescence (F s ) was measured for 15 trees using 12 leaves per tree (three per orientation) with a FluorPen FP100 (Photon Systems Instruments, Drásov, Czech Republic).These measurements were used as a proxy of the airborne SIF retrievals and a field-level assessment of plant functional stress for each severity level.
In July 2013, the leaf area index (LAI) was measured using an LAI-2000 Plant Canopy Analyzer (LI-COR, Inc., Lincoln, NE, USA) for the same 15 trees as above.At each tree, the device was placed with the optical sensor in eight different orientations under the canopy, 1 m above the ground, and using a 90 • view-restricting cap.Measurements for LAI estimation included a reference reading above the canopy and several readings below the canopy.All measurements were made at dawn.The coordinates for all trees (both sampled and visually scored) were recorded using a GPS (Garmin GPSMAP 64s) device with a spatial accuracy below 3 m.

Airborne hyperspectral and thermal imagery 2.2.1. High-resolution image data collection
We collected high-resolution images on 19 July 2017 using a visible near-infrared (VIS-NIR) hyperspectral imager (Hyperspec model, Headwall Photonics Inc., Fitchburg, MA, USA), a hyperspectral sensor covering NIR and short-wave infrared (SWIR) regions (Hyperspec NIR-100, Headwall Photonics), and a thermal camera (FLIR SC655, FLIR Systems, Wilsonville, OR, USA) installed in tandem onboard a Cessna aircraft operated by the Laboratory for Research Methods in Quantitative Remote Sensing (QuantaLab), Spanish National Research Council (CSIC).The imagery was acquired at 350 m above ground level with the aircraft flying on the solar plane, with a track width of 185 m, resulting in 720 ha of ground surface covered (Fig. 3).The VIS-NIR camera operated with 260 spectral bands (400-885 nm) and a radiometric resolution of 12 bits at a 1.865-nm centre wavelength (CWL) interval,   The thermal sensor (FLIR SC655, FLIR Systems, Inc., USA) had a resolution of 640 × 480 pixels and was connected to an acquisition board via the Gigabit Ethernet protocol.It was equipped with a 24.5mm F-number 1.0 lens providing an angular FOV of 45 × 33.7 • .The detector is a focal plane array uncooled microbolometer and has a spectral range from 7.5 to 14 μm.This camera is equipped with a thermoelectric cooling (TE) stabilisation system, which enables a thermal sensitivity below 50 mK.The characteristics of the sensors on board, as well as their specific setup in this study, are detailed in Table 3.

Image post-processing
Both hyperspectral sensors were radiometrically calibrated by means of an integrating sphere (Uniform Source System, model CSTM-USS-2000C, Labsphere Inc., North Sutton, NH, USA) using coefficients derived from the calibrated light source at four constant levels of illumination.Atmospheric correction for the VIS-NIR sensor was performed using the total incoming radiance measured with a field spectroradiometer (ASD HandHeld Pro, Malvern Panalytical Ltd., Malvern, England).Atmospheric correction was simulated with the SMARTS model (Gueymard, 1995(Gueymard, , 2001) ) for the NIR-100 sensor, which allowed the conversion of the radiance images to reflectance for the full range of both sensors.Optical thickness measurements from a Microtops II sunphotometer (Solar Light Co., Philadelphia, PA, USA) and meteorological measurements from a weather station (model WXT510, Vaisala Corporation, Vantaa, Finland) were used as input parameters for the model.Additionally, the effects of illumination and viewing angle were also adjusted using cross-track correction (San and Süzen, 2011) in both hyperspectral processing chains (Fig. 4).
Thermal calibration was conducted in the laboratory using a black body calibration source (LANDCAL model P80P, Land Instruments International Ltd., Dronfield, England) and by indirect calibration using ground temperature measurements with a handheld infrared thermometer (LaserSight from Optris GmbH, Berlin, Germany) as described by Calderón et al. (2015) (Fig. 4).Standardised canopy temperature (T c -T a ) was calculated by subtracting weather station air temperature (T a ) from canopy temperature derived from calibrated thermal imagery (T c ).
Orthorectification of hyperspectral images was performed using PARGE (ReSe Applications LLC, Wil, Switzerland) image rectification software for airborne optical scanner systems.Data from inertial measurement units installed on each sensor (IG-500N, SBG Systems S.A.S., Carrières-sur-Seine, France) were synchronised with each camera's imager and used as inputs for the software.Orthomosaicing thermal imagery was performed using Pix4D (version 3.1.23,Lausanne, Switzerland) photogrammetry software.Data pre-processing and image correction were as described in detail by Hernández-Clemente et al.

Spectral-based indicators
The high-resolution imagery acquired from each airborne sensor allowed us to identify and delineate tree crowns independently, seeking to minimise the effect of background and shadowing.This image segmentation was achieved using object-based methods through Niblack's threshold (Niblack, 1986) and Sauvola's binarisation techniques (Sauvola and Pietikäinen, 2000).Finally, we applied a binary watershed analysis using the Euclidean distance map for individual objects to automate the separation of the trees with overlapping crowns (Fig. 5).
Mean reflectance values for each tree were used to calculate 96 spectral-based indicators, including: i) VIs related to tree crown structure, chlorophyll, carotenoid, anthocyanin and water contents, and the epoxidation state of the xanthophyll cycle (detailed in Appendix A.); ii) chlorophyll fluorescence emission through the Fraunhofer line depth (FLD) method as described by Maier et al. (2003) using three bands for the in (L 763 nm ) and out (L 750 nm ; L 780 nm ) bands (3FLD); and iii) thermal dissipation using T c -T a , as previously described.We selected indicators mainly related to pigment composition and physiological variables to intensify the discriminatory capability of the models detecting healthy trees from trees with low severity levels (e.g.DS0 to DS1).

Model simulation analysis and plant trait retrieval
Canopy structural traits and biochemical composition were quantified by inverting the 3-D RTM FLIGHT8 model, using the pixels extracted from the tree crowns.We selected this model to minimise the impact of structural canopy variations, soil background, shadows and understory affecting the retrieval of PTs in heterogeneous forest canopies (Hernández-Clemente et al., 2017;Hornero et al., 2021).The model simulations were conducted using the atmospheric and ground data set collected during the image acquisition.Input variables for the model (Table 4) were established according to the field measurements, estimates from existing literature, and nominal parameters to ensure that the generated look-up table (LUT) covered the range of spectral variability in the tree crowns.The ill-posed problem generated when a wide range of PTs can be obtained from the same spectrum was alleviated using restricting ranges of input parameters based on field data measurements (Combal et al., 2003).The LUT calculation is processed in two phases, with the purpose of sequencing the inversion process to minimise the improperly posed problem and using the inversion methods best suited to each step, as detailed in Fig. 6.FLIGHT8 is coupled to leaf model FLUSPECT-B in the first phase to allow the retrieval of sun-induced fluorescence quantum efficiency (F i ) and with PROSPECT-D in the second phase to allow the retrieval of anthocyanins content.
In the first phase of analysis (Fig. 6 top), we determined LAI, C ab , C ar , and the sun-induced fluorescence quantum efficiency (F i ).We built a LUT of +800k simulations coupling the FLUSPECT-B (Vilfan et al., 2016) leaf reflectance model with the FLIGHT8 (Hornero et al., 2021) canopy model.FLUSPECT-B considers the pigment concentrations in the leaf and its photosynthetic efficiency, and FLIGHT8 takes into account the structural properties of the canopy and the effect of the soil and the understory.The senescence material, water (C w ), and dry matter (C dm ) contents, and the structural parameter N were set to nominal values using a value previously determined on this particular species in the same study area following Hernández-Clemente et al. ( 2017) (Table 4 -Phase 1).For comparisons with airborne hyperspectral images, we used convoluted model simulations assuming Gaussian band spectral response functions for their corresponding FWHM, centred on the band locations of each imager.The LUT-based inversion followed a multi-step approach in which the LAI values were determined first, followed by C ab , C ar , and finally, F i , using the MSR, PSSR b , CRI 700m , and 3FLD spectralbased indicators as proxies for each PT, respectively.
In the second phase, parameterisations retrieved from each tree were used to build a LUT of +200k simulations by coupling the PROSPECT-D (Féret et al., 2017) leaf reflectance model with the FLIGHT8 canopy model.The leaf reflectance model was used to specifically quantify A nth , as well as C w and C dm (Fig. 6 bottom).For the simulations and images, a smoothing algorithm based on local polynomial regression fitting (Cleveland et al., 1992) was applied to eliminate the noise affecting the model inversion.Through the use of wavelets (Strang and Nguyen, 1996), we decomposed the hyperspectral signatures into frequency components at different spectral scales, allowing us to identify the LUT spectra that showed a closer correspondence to the image spectra, which enhances the retrieval of the spectral features and hence plant traits.The continuous wave transformation was performed over three spectral ranges, a) 470-710 nm, b) 670-850 nm, and c) 1000-1300 nm and 1500-1700 nm, for the retrieval of A nth , C dm , and C w , respectively.At this stage, Kattenborn et al. (2017) and, more recently, Suarez et al.
(2021) used a similar method to obtain the PTs from hyperspectral images; however, the methods used in this study differ in that a) an extended spectral range was used based on double-coupled hyperspectral imagers, and b) only the first four transformation scales were used to characterise more specific spectral regions of interest, instead of the whole range of the signal.The performance of the model-based PTs was evaluated based on the Normalised Root Mean Square Error (NRMSE) (2) with the field data (LAI, C ab , C ar , A nth ).F s /F i were excluded from this comparison since they are both unitless.
where n is the number of observations, y i represents the ith actual observation of the PT y, y its mean and ŷi the predicted value from the model-based retrieval.

Plant trait selection and classification model approach
Once the PTs were obtained for each tree, feature selection was performed using a random forest (RF) classifier (Breiman, 2001;Liaw and Wiener, 2002) combined with an adaptation of an algorithm developed by Kursa and Rudnicki (2010), henceforth referred to as the Boruta algorithm.In the Boruta algorithm, shadow variables (permuted copies) are created by shuffling the original ones.The RF classifier is then applied to the initial data set, which is composed of the original variables and their shadow counterparts at the same time.The Boruta algorithm evaluates iteratively the importance of each original variable against the shadow variables to determine which variables are essential and at what magnitude.Variables are marked "Unconfirmed" when they are significantly lower than the shadows and are permanently discarded, while variables that are significantly higher than the shadows are marked "Confirmed".The process is repeated by re-generating the shadow variables and continues until only confirmed variables are left or until the maximum number of iterations defined at this stage is reached (set at 100 iterations).If the second scenario occurs, some variables may remain undecided, and they are considered "Tentative." The confidence level defined in the Boruta algorithm was established at 99% with a multiple comparisons adjustment using the Bonferroni method (Haynes, 2013) to control false positives.Once this process was completed, the importance of each PT in the severity and incidence classification process was obtained.
As an initial step, we performed the Boruta analysis using the fieldbased PT measurements, combining 2013, 2015, and 2017 evaluations, using only the three variables that were measured in all three years (F s , C ab , and C ar ) on 45 observations (15 evaluations and physiological measurements per year) and comparing them to the levels of severity and incidence.The purpose of this analysis was to understand the sensitivity of field-based PT to forest decline.The feature selection process started using all the model-based PTs retrieved for each tree, including 8 variables and 1146 observations.Then, the Boruta analysis was repeated for all the spectral-based indicators (N = 96).The objective was to improve the reliability of the model using complementary information added by VIs to the initial model-based PT feature selection.Due to the high fluctuations in the importance calculation when a large number of variables are used, the process in Boruta starts with three rounds, in which only the selected shadow variables are compared, while in the remaining roundsup to 100 iterationsthe original variables are compared with all the shadow variables.Fig. 7a presents an overview of the entire process for the selection of variables conducted in this study.
To strengthen the selection of features used in the classification model, the PTs were set in the established order according to their importance, and the VIs were added based on their previously calculated importance as well.At each stage of accumulation, the variance inflation factor (VIF) -an indicator that measures the extent to which the variance of an estimated regression coefficient increases due to collinearity (James et al., 2013) was calculated to avoid multicollinearity among the predictor variables.The variable was included only if the VIFs for all variables were below the threshold of 10.The final set of selected variables (PTs + VI) was used in the next screening stage.
Finally, Pearson's correlation analysis and p-values were used to determine the degree of relationship between the previously selected variables.Through the calculated correlation matrix, the variables to be A. Hornero et al. excluded were chosen to reduce the pair-wise correlations establishing a cutoff filter of 0.85 (Dormann et al., 2013).The Boruta algorithm was applied to the remaining variables to determine the importance of each selected variable.A principal component analysis (PCA) was also conducted to determine to what extent the components capture the majority of the variance and to identify the variables that provide the most information and whether the less relevant ones could be discarded to reduce the dimensionality of the data set.The filtered variables were retained for the development of the classification model, as shown in Fig. 7.
Two ML algorithms were used to classify disease incidence and severity levels: a supervised non-linear support vector machine (SVM) with a Gaussian kernel radial base function (Scholkopf et al., 1997) and the RF algorithm (Breiman, 2001), which were reported as the predominant classifiers on airborne imaging (Gigović et al., 2019;Gualtieri et al., 1999;Liu et al., 2017;Pal, 2005).
We evaluated models for two different cases (Fig. 7b), assessing incidence and severity classification from i) CASE 1, all trees assessed in 2017 (N = 1146), and ii) CASE 2, only confirmed trees, which were either still affected or unaffected again in 2019 (N = 506).To validate the selected models, we performed 100 iterations in which the data set was randomly divided into two samples, the training and the test samples by 80% and 20%, respectively, including k-fold cross-validation, in which the original sample was randomly partitioned into 10 equal-sized subsamples and repeated five times.Training data were subsampled for each iteration to avoid disproportionate frequencies of classes, which could negatively impact the model fit.Finally, we assessed the classification accuracy by calculating the overall accuracy (OA) and the Cohen's kappa coefficient (κ), which is based on comparing the observed agreement in a data set compared to what could occur by mere randomness (Richards and Jia, 1999).
After assessing the models' accuracy, we evaluated the anticipation capability using the visual evaluation 2 years later.In particular, we analysed whether the model was able to predict the unconfirmed cases trees that were assessed at a given incidence level and in the subsequent assessment, 2 years later, were assessed at the opposing leveland refined towards those that improve or worsen, i.e., those that change from having incidence to not having it and the opposite, respectively.This last analysis helped us understand the applicability of the model to predict a subsequent evaluation of forest decline using the data from previous images and evaluations.

Results
In this section, we present the results of the evaluation of the field and PT indicators to predict oak decline.The predictions of the remote sensing spatial model are described below, focusing on the ability to discriminate between damage levels as a function of PT alterations caused by oak decline.

Plant trait indicator assessment based on forest health field measurements
The bi-annual empirical data collected from 2013 to 2017 show the capability of the field-based PTs -C ab , C ar , and F s -to discriminate different levels of severity.Trees with low disease severity levels consistently had high values for F s , C ab , and C ar content (Fig. 8).F s was identified as having importance values two times higher than C ab and C ar in both severity and incidence levels (Fig. 8

Spectral-and model-based plant trait predictors of oak decline
As with empirical measurements, model-based values of F i and pigment content (C ab and C ar ) were inversely related to severity level (Fig. 9).The model-based PTs corresponded well with field data, having relatively low normalised error (NRMSE LAI = 0.13, NRMSE Cab = 0.16, NRMSE Car = 0.2, and NRMSE Anth = 0.12) and values within the expected range (data not shown).In Fig. 9, we also included the model-based retrievals of three other PTs (C w , C dm , and A nth ) and T c -T a derived from thermal data.Severity level was positively associated with T c -T a and C dm but negatively associated with LAI, A nth , and C w .These results are also consistent with the classification of incidence and severity obtained from field-based PT measurements, described in the previous section, where F s was one of the most relevant variables to detect oak decline.
Variable importance scores for model-based PTs and T c -T a are presented in Fig. 10.T c -T a and F i had the highest importance scores in models discriminating the first and second severity levels, while LAI and C dm were determined to be the most important for differentiating the remaining severity levels (Fig. 11a).
The principal components PC1 and PC2 explain 59.2% of the total variability, with 42.5% for PC1 and 16.7% for PC2 (Fig. 11b).The PTs T c -T a and LAI were strongly negatively correlated in PC1 and PC2 space, having nearly the same magnitude and angle but different directions.These results may indicate that the more abundant the vegetation, the greater the transpiration capacity and the lower the temperature difference.On an orthogonal ray, we find C ar , which is scarcely related to them, and its importance indicates its limited contribution to the model.The projection of F i in the first two components was opposite that of C ar , and this variable contributed substantially to model performance.This variable was more important than LAI for the development of an incidence classification model as well as distinguishing the first two severity levels.

Remote sensing spatial model predictions of oak decline
To find the best variables for predicting oak decline, the model-based PTs were combined with 95 VIs, of which only four passed the iterative VIF screening and pair-wise correlation threshold: LIC 3 , CI 2 , GnyLi, and MND (Fig. 12a).The variables with the lowest correlation coefficient    The variable selection process yielded 12 final indicators with a VIF below the established threshold.Two indicators were associated with photosynthesis regulation: F i and T c -T a .Four indicators were related to pigment content: C ab , C ar , A nth , and CI 2 .One indicator was related to fractional cover, namely, LAI.Five indicators were related to water content: C w , C dm , GnyLi, MND, and LIC 3 .Among all the indicators, the variables contributing the most to detecting different levels of incidence and severity were T c -T a and F i .These PTs were included as predictors for the final classification model of oak decline; their importance scores are presented in Fig. 12b.Variables with the highest importance included T c -T a , F i , and CI 2 .
Model accuracy was estimated on the basis of the total number of trees evaluated and confirmed cases reported in the subsequent survey (Fig. 13).Models classifying severity had an overall accuracy of 0.71 (κ = 0.51) in the best case (sampling of confirmed cases with RF algorithm).Models classifying incidence were more accurate (OA = 0.82; κ = 0.62) for this same scenario.The SVM algorithm was slightly more accurate when we used the complete data set (all trees; N = 1146), while RF performed better with the reduced-input data set (confirmed cases; N = 506).For models predicting incidence, the OAs were greater than 0.75 (thus considered 'high'), and the Cohen's kappa scores were fair to excellent, according to Cicchetti and Sparrow (1981).
The findings obtained when evaluating the anticipation capabilities (Table 5) indicate a better behaviour of the RF algorithm when building the model with both confirmed casesin which the best result is found and all cases.When we analyse the prediction rate while segregating  between trees that worsen (incidence: 0 → 1) and those that improve (1 → 0), for the former, the RF algorithm behaves better, and for the latter, SVM.
Example predictions from a final incidence classification model using the SVM algorithm are presented in Fig. 14, with results within the expected performance (OA = 0.81; κ = 0.62); comprehensive statistics are detailed in Appendix B. Through this map and the field evaluations, the differences found can be appreciated, as well as their spatial variability.

Discussion
The first objective of this study was to identify the PTs that are most useful for detecting the incidence and severity of decline symptoms in holm oak.One of the main challenges encountered in quantifying PTs in heterogeneous forest canopies was to minimise the impacts of shadows, soil background, and understory, which dilute the spectral signature of pure crowns (Hernández-Clemente et al., 2019;Liu et al., 2020;Markiet et al., 2016;Pisek et al., 2015).For this reason, advanced 3-D simulation models designed specifically for heterogeneous forest canopies were a major methodological component of this study.The critical step resided in the successful retrieval of model-based PTs that allowed us to understand the contribution to each PT and complete the ML modelling approach with additional information derived from other spectralbased, uncorrelated variables.

Table 5
Prediction rate for non-confirmed cases (NC) using models built with all cases or only confirmed ones.The best results for each case are highlighted in light green and in darker green overall.Field data confirmed the association between Q. ilex status and several key PTs.Trees with lower disease incidence had higher values of C ab , C ar , and F s .As symptom severity increases, the concentration of these pigments and the chlorophyll fluorescence decrease.The decrease rate we observed in chlorophyll fluorescence and pigment content associated with disease incidence are consistent with declines associated with drought and root rot stress found in other experiments under controlled conditions (Früchtenicht et al., 2018;Koller et al., 2013;Ruiz Gómez et al., 2018) and field surveys (Baquedano and Castillo, 2007;Camarero et al., 2012).
It is notable that we found F s to be more important than the other two PTs in identifying disease incidence from field data.Among model-based PTs retrieved from hyperspectral images, F i similarly had a higher importance score than any other pigment content indicator for discriminating severity.This pattern is consistent with the variable importance ranking of variables in Zarco-Tejada et al. (2018) for detecting Xf-induced symptoms in olive trees.
Including spectral-based PT indicators in our analysis provided insight into the functional responses of oak trees to different disease levels.T c -T a was the most important indicator, regardless of whether we discriminate by incidence or severity.Thermal imaging has improved the detection of several crop diseases in other studies, including Verticillium wilt in olive orchards (Calderón et al., 2015), water stress in peach orchards (Gonzalez-Dugo et al., 2020), and red leaf blotch in almond orchards (López-López et al., 2016).In this study, other important PTs included LAI and F i , followed by C dm , C w , and A nth , and to a lesser extent C ab and C ar .
Focusing on the discrimination capacity of each PT between the different stages of severity, T c -T a was generally an important predictive variable for determining disease incidence, but LAI and C dm were more relevant for discriminating mild and advanced severity classes.PCA showed that T c -T a and LAI contributed strongly to the same component but in opposite directions.Severity subsampling supports that while canopy temperature is particularly important for early incidence detection, LAI may provide more information about severity levels when a tree is infected.
Another important aspect of this study is the consideration of VIs alongside other model-based PTs for classification.CI 2 , GnyLi, MND, and LIC 3 were variables that passed through selection criteria, providing additional information and avoiding collinearity with other variables.In the final model, F s was selected as highly important, since part of the weight of LAI was distributed among other indicators such as CI 2 or LIC 3 .The importance of indicators from the SWIR region (MND and GnyLi) also exceeded that of C ab , C ar , C dm , and A nth .
This study showed that remotely derived PTs can support the early detection of holm oak decline, which was the second objective of this work.By applying a combination of 3-D model simulation and statistical analysis using ML approaches, we found that oak forest decline can potentially be detected at an earlier stage and that severity levels can be accurately assessed at broad scales.Predictive model accuracy was high, with an OA > 0.8 and κ > 0.6, indicating that the PTs we identified may be helpful for understanding physiological responses to disease and other stressors.The model accuracy achieved in this study is comparable to that of prediction models developed for olive trees by Zarco-Tejada et al. (2018).Taking advantage of a subsequent field evaluation performed 2 years later, the model's anticipation ability was evaluated, which brought us significantly improved results since it managed to anticipate up to 40% in the best scenario.
These results help bridge a gap in the understanding of how forest decline alters PTs via complex interactions between biotic and abiotic factors.Unlike in agricultural studies, where factors such as nutrient deficiency or water availability can be controlled, in forests these interactions are difficult to dissociate.Forest canopy heterogeneity poses a challenge for spectral data modelling, due to discontinuous architectures and interference from shadows, understory, and soil composition.The utility of satellite-based spectral indicators for detecting diseases has been examined by Hornero et al. (2020) in olive trees and Hernández-Clemente et al. (2017) in holm oak.A common finding in these studies was that the soil and the understory both influence the spectral signature and the fluorescence signal of aggregated pixels.In this work, we used the FLIGHT8 model, a recently improved version of the FLIGHT model, which minimises background effects by considering the spectral contribution of the understory.The success of the methods presented here may be partially due to the high spatial resolution of hyperspectral images collected and to the open nature of the woodland landscape.However, the FLIGHT8 model also accounts for increasing levels of pixel aggregation (e.g., using medium-to low-resolution satellite imagery) in heterogeneous canopies (Hornero et al., 2021).Future work should investigate the assessment and validation of the methods presented here performed with satellite imagery and/or different types of forest canopies.
In a practical level for the management of holm oak decline, the results show that T c -T a , F i , LAI and C dm are sensitive indicators to discriminate between DI levels [0-1].However, being able to quantify between DS [0-4] is clearly advantageous for effective management and mitigation of forest decline.According to the results, monitoring holm oak decline should include the analysis of the transition between severity levels based on indicators such as T c -T a , F i and LAI to discriminate between DS [0-1]; LAI, C dm , and T c -T a between DS [1-2]; and LAI, C dm and C w between DS [2-3].The transition between DS [0-1] is particularly important, as it indicates the progression between nonsymptomatic to symptomatic trees.
The proposed methodology has been validated on holm oak decline affected jointly by the presence of Pc and abiotic stress, mainly water stress.But the methods proposed here should be further tested to analyse the sensitivity of PTs to disentangle the interactive biotic and abiotic effects.Future studies should thus include the analysis of a wider range of holm oak forest locations solely affected by either abiotic or abiotic factors.This situation is quite unlikely, as it has been shown that holm oak decline is often linked to a combination of factors (Camilo-Alves et al., 2013;Corcobado et al., 2014).However, each factor may have different contribution in the decline process (Colangelo et al., 2018).Therefore, the discrimination between both factors should be considered in future studies.Furthermore, the sensitivity of PTs in oak forest affected by other types of pathogens or abiotic stress could be different.This work provides a breakthrough in analysing the spectral changes caused by xylem-limited factors such as root rot, water stress or soil compaction in heterogeneous forest stands.The challenge is to evaluate other types of forest decline processes on the methodological and empirical basis shown in this work.
Large-scale monitoring may be further improved by including multitemporal data to track disease evolution.However, such data will increase the complexity of analyses, particularly due to variation in understory and soil reflectance from image to image, their impact on aggregated pixels, and the need to account for those variations with RTM.The methodology presented here may be particularly relevant for the Sentinel-2 mission, which provides multitemporal data in the visible, infrared, and short infrared regions, and the FLEX mission, which will provide fluorescence data after 2022.

Conclusions
This study develops a new methodology that integrates field data, airborne imagery, physical RTM, and empirical modelling to retrieve PTs and assess their association with forest decline and provides a tool to detect early-onset symptoms of decline in holm oak.Hyperspectral image data, including VNIR and SWIR spectral regions, combined with thermal imaging and RTM can be used to monitor the spread of forest decline over large areas.Thermal-based canopy temperature (T c -T a ) was the most important PT in the model to discriminate between different levels of severity and incidence, followed by the fluorescence (F i ) and LAI, whereas LAI and C dm were the most relevant indicators for discriminating advanced stages of severity.Additional spectral indicators such as CI 2 or LIC 3 complemented LAI, and VIs in the SWIR region (GnyLi and MND) were more important than PTs such as C ab , C ar , or A nth .Overall, our results demonstrate that an integrated approach combining spectral-and model-based PT retrievals using 3-D RTM and classification methods is needed for the large-scale monitoring of forest decline.This approach enabled the successful prediction of holm oak decline at an early stage; it is essential to monitor harmful forest diseases, and this task can be augmented through the retrieval of accurate forest health traits from advanced airborne imagery and satellite data observations.

Fig. 1 .
Fig. 1.Location of the study site selected for PT retrieval through high-resolution imaging (top).The square shaded in red represents the area of the field survey, and the grey dots indicate individual evaluation.Photographs illustrating the heterogeneity of the landscape within the study area are shown below.(For interpretation of the references to colour in this figure legend, the reader is referred to the web version of this article.)

Fig. 2 .
Fig. 2. Examples of the four forest disease severity (DS) levels assigned to holm oak trees (N = 1146) during a field survey in 2017, which was repeated in 2019.The classes range from apparently healthy trees (DS = 0) to trees whose canopies show a prevalence of dead branches (DS = 3).
yielding 6.4-nm full-width at half-maximum (FWHM) spectral resolution with a 25-μm slit.The acquisition frame rate on board the aircraft was 50 frames per second with an integration time of 18 ms.The focal length was 8 mm, producing an angular field of view (FOV) of 49.82 • .The images derived from this sensor resulted in a ground resolution of 60 cm, allowing us to distinguish individual oak tree crowns from the background.Further details regarding the platform and sensor configuration can be found in Zarco-Tejada et al. (2013).The NIR-SWIR sensor was operated with 165 spectral bands (950-1750 nm), yielding 6.05 nm FWHM (25-μm slit size) and 16-bit radiometric resolution.The sensor was configured with an acquisition rate of 25 fps with an integration time of 40 ms.The 12.5-mm-focal-length lens resulted in an angular FOV of 38.6 • , with a 90 cm/px spatial resolution.The FWHM and the centre wavelength for each spectral band were derived after spectral calibration using a monochromator (Cornerstone 260 1/4 m, model 74,100, Newport Oriel Instrument, CA, USA) and an XE-1 Xenon calibration light source (Ocean Optics, USA).

Fig. 3 .
Fig. 3. Flight path for image acquisition.White arrows and line indicate the flight path and the hashed green square is framed over the study area.The background shows the VIS-NIR hyperspectral mosaic, overlaid on an orthophoto from the Spanish National Geographic Institute (IGN, OrtoPNOA 2017 CC-BY 4.0).(For interpretation of the references to colour in this figure legend, the reader is referred to the web version of this article.)

Fig. 4 .
Fig. 4. From left to right, the images from the VIS-NIR, NIR-SWIR, and thermal sensors are shown over the study area.Bottom row contains zoomed-in views of scenes above (green rectangle).(For interpretation of the references to colour in this figure legend, the reader is referred to the web version of this article.)

Fig. 7 .
Fig. 7. Overview of the methodology used for a) the feature selection using the Boruta algorithm, including the iterative reduction of variables and the correlation analysis; and b) the classification approach based on 2017 with the different cases assessed and a final comparison with a subsequent evaluation in 2019.

Fig. 8 .
Fig. 8. Relationship between the level of severity and field-based plant traitschlorophyll content (C ab ), carotenoid content (C ar ), and steady-state leaf fluorescence (F s ) -in N = 45 trees measured in 2013, 2015, and 2017.Importance scores for field-based plant traits in detecting oak decline computed via the Boruta algorithm are shown at right.

Fig. 9 .
Fig. 9. Relationship between severity and plant traits retrieved from hyperspectral and thermal images in 2017.

Fig. 10 .
Fig. 10.Overall importance scores for each plant trait when classifying both incidence and severity disease levels using the Boruta algorithm.

Fig. 11 .
Fig. 11.Severity subsampling importance scores for each plant trait (PT) (a) and spectral-based principal component (PC) predictors' analysis (b) for both incidence (0-1) and severity (0-3) levels using the model-based PTs (C ab , C ar , A nth , C w , C dm , LAI, and F i ) and the thermal-image-based PT (T c -T a ).The bidimensional plots display each variable's loading, with vectors and the tree samples as points coloured by severity and incidence levels.The vectors' length approximates the variance represented by each variable, whereas the angles between them represent their correlations.

Fig. 12 .
Fig. 12. Plant traits (PTs) and vegetation indices (VIs) correlations (a) and variable importance scores for spectral-based PT and VIs with severity and incidence (b) to detect oak decline.

Fig. 13 .
Fig. 13.Overall accuracy (OA) and Cohen's kappa scores for classification models.Results were obtained from 100 iterations of random data subsets for training and validation (80/20).Average OA and kappa values are shown as horizontal bars, the former in colour and the latter as narrower grey bars with dotted edges.The error bars indicate the minimum and maximum OA values across iterations.

Table 2
Summary of field measurements and surveys.Forest health condition assessment: crown-level severity and incidence levels.

Table 3
Technical characteristics of the airborne imaging sensors and operational settings.

Table 4
Inputs for the model simulation analysis.F i and A nth are not modelled in PROSPECT-D and in Fluspect-B, respectively.