Performance indicators associated with match outcome within the United Rugby Championship

Objectives: The aims of this study were to: i) identify performance indicators associated with match outcomes in the United Rugby Championship; ii) compare the efficacy of isolated and relative datasets to predict match outcome; and iii) investigate whether reduced statistical models can reproduce predictive accuracy. Design: Retrospective analysis of key performance indicators in the United Rugby Championship. Methods: Twenty-seven performance indicators were selected from 96 matches (2020-21 United Rugby Championship). Random forest classification was completed on isolated and relative datasets, using a binary match outcome (win/lose). Maximum relevance and minimum redundancy performance indicator selection was utilised to reduce models. In addition, models were tested on 53 matches from the 2021-22 season to ascertain prediction accuracy. Results: Within the 2020-21 datasets, the full models correctly classified 83% of match performances for the relative dataset and 64% for isolated data; the equivalent reduced models classified 85% and 66% respectively. The reduced relative model successfully predicted 90% of match performances in the 2021-22 season, highlighting that five performance indicators were significant: kicks from hand, metres made, clean breaks, turnovers conceded and scrum penalties. Conclusions: Relative performance indicators were more effective in predicting match outcomes than isolated data. Reducing features used in random forest classification did not degrade prediction accuracy, whilst also simplifying interpretation for practitioners. Increased kicks from hand, metres made, and clean breaks compared to the opposition, as well as fewer scrum penalties and turnovers conceded, were all indicators of winning match outcomes within the United Rugby Championship. © 2022 The Authors. Published by Elsevier Ltd on behalf of Sports Medicine Australia. This is an open access article under the CC BY license (http://creativecommons.org/licenses/by/4.0/).


Introduction
When quantifying success within Rugby Union, performance indicators (PIs) can be used to investigate and infer key processes that underpin winning performances. This approach has been studied in Rugby Union, including within the English Premiership and at international level, with most studies focussed on the predictive ability of PIs without consideration of the opposition's performance [1]. In these studies, numerical techniques such as supervised machine learning [2][3][4] and varied hypothesis testing [5][6][7] were utilised to compare PIs between the winning and losing team performances. For example, winning teams in World Cup matches tend to win opposition lineouts [6], gain more metres [5], kick out of hand more [7] and concede most of their penalties between their opposition's 50 and 22 m [7]. In contrast, losing teams carry less and have low lineout success [5]. Other studies have also developed more complex models, using methods such as principal component and discriminant analyses to interpret PIs and alternative areas of performance such as training effects and physical markers [8,9]. Additionally, Ortega et al. reported similar results to studies using random forest [2][3][4], whilst also reporting higher average values for mauls won and line breaks in winning teams [8]. Other factors that have been reported to have an impact on game performance include the match location [10] and the stage of competition [7].
Recently, several articles have focussed on performance indicators with consideration of the opposition, in which team PIs are relativised to reflect the differences between two teams within a match. For example, if one team made 100 passes and the opposition made 150 passes, the relativised passes would be −50 and 50 for each team, respectively [2]. Adopting this simple mathematical process, Bennett et al. [2,3] reported that many relative variables had significant relationships with match outcome in both the English Premiership and the 2015 World Cup. These included kicks from hand, clean breaks, and average carry distance, and similar outcomes were established by Mosey et al. within sub-elite Australian Rugby [4]. Interestingly, Bennett et al. [2] identified that there was a clear improvement to match prediction when relative data were utilised in place of standard PIs, whereas Mosey and Mitchell [4] reported no clear improvement.
To date, feature selection methods, such as maximum relevance minimum redundancy, have not been used to develop performance prediction models, particularly in the United Rugby Championship (URC) or its predecessors (PRO14 and PRO12). Due to this, results of previous studies include large groups of PIs, which can be complex to interpret and difficult to implement within training or tactical strategies. Statistically optimising PI selection with reduced datasets has the potential to simplify dissemination and maximise up-take by sporting practitioners.
The primary aims of this study were to: i) identify performance indicators associated with match outcomes in the United Rugby Championship, ii) compare efficacy of isolated data and data relative to opposition in predicting match outcome, and iii) investigate whether reduced PI statistical models can reproduce predictive accuracy.

Data selection
Within the 2020-21 URC season, PIs for 96 regular matches were downloaded from OPTA (www.optaprorugby.com). Finals and knockout matches were not selected (n = 1 in this season due to structuring) as they may differ from regular matches, given previous research has separated different stages of competition [3,5]. There has been no published analysis of the reliability of OPTA for Rugby Union, but OPTA data within football have been shown to have high reliability, with kappa values of 0.92-0.94 [11]. Rugby Union data from OPTA are used by major clubs and broadcasters worldwide [12]. The following 27 PIs were tabulated from each team's match summaries: carries, metres made, defenders beaten, offloads, passes, tackles, missed tackles, turnovers conceded, kicks from hand, clean breaks, turnovers won, lineouts won, lineouts lost, scrums won, scrums lost, rucks won, rucks lost, penalties conceded, free kicks, scrum penalties, lineout penalties, tackle/ruck/maul penalties, general play penalties, control penalties, yellow cards, red cards and home/away status. These PIs were selected as they formed the match report statistics at the time of download, thereby encompassing many areas of the game.
The 27 PIs formed the isolated data, whereas the relative data were calculated by deducing the difference in each PI between teams per match. For example, if team A made 200 m and team B made 400 m, the relative metres made for each team would be −200 and 200, respectively. Nomenclature was used to identify features belonging in each group as follows: PI_I indicated a PI in its isolated form and PI_R indicated a PI in its relative form. For example, Carries_I relates to isolated carries and Carries_R relates to relative carries.
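The relativisation step can be illustrated with a minimal Python sketch. The function name `relativise` and the dictionary layout are hypothetical and are not taken from the study's code, which was written in R; the numbers are the worked examples given in the text.

```python
def relativise(team_a, team_b):
    """Return relative PIs for both teams (own value minus opposition value).

    team_a and team_b map PI names to isolated values for one match.
    Only PIs present for both teams are relativised.
    """
    shared = team_a.keys() & team_b.keys()
    rel_a = {pi: team_a[pi] - team_b[pi] for pi in shared}
    rel_b = {pi: team_b[pi] - team_a[pi] for pi in shared}
    return rel_a, rel_b


# Worked examples from the text: 200 m vs 400 m, and 100 vs 150 passes.
team_a = {"metres_made": 200, "passes": 100}
team_b = {"metres_made": 400, "passes": 150}
rel_a, rel_b = relativise(team_a, team_b)
print(rel_a["metres_made"], rel_b["metres_made"])
```

By construction the two relative values for a match are equal in magnitude and opposite in sign, so each match contributes one winning and one losing performance that mirror each other.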

Approach
Random forest classification (RFC) [13] was completed on the full dataset for both isolated and relative data to categorise matches as either wins or losses. Each of the 27 PIs represents a feature in the RFC; the combination of all PIs, across the matches considered, forms the feature space of the algorithm. Generally, this feature space is interrogated by the RFC process to generate and drive ensemble decisions that promote classification of the data to one of the binary win/lose outcomes.
The method used an ensemble of classification trees by drawing a new training set each time, with replacement, from the original sample [14]. This training set was drawn randomly using two thirds of the full dataset, with the remaining data forming the out-of-bag (OOB) test set. The tree was then tested using the OOB set [14]. From this set, the error rate (number of incorrect predictions divided by the total number of predictions) was noted [13]. This was averaged over each tree built, to give an OOB error for the random forest model [13].
The mean decrease accuracy (MDA) was used as the measure of importance [13]. MDA represents how much the model accuracy would decrease if a PI were removed from the model, with high values indicating that the PI is relatively more important. The prediction error from OOB data was recorded after permuting each PI. The difference between the model with and without the PI was determined, then averaged over all trees and normalised [14]. The z-scores of the MDA values were then calculated to determine significance. Partial dependence plots (see Fig. 2) were also used to monitor the relationship between match outcome and features used within modelling [15].
Maximum relevance minimum redundancy (MRMR) was used to simplify the dataset. The PI with the highest mutual information with match outcome is selected first, with successive features selected in order of maximising the mutual information that they have with match outcome, whilst minimising the mutual information shared with the features already selected [17]. The mutual information was calculated as:
$$I(x, y) = -\frac{1}{2} \ln\left(1 - \rho(x, y)^{2}\right)$$
where I represents the mutual information, and ρ the correlation coefficient of features x and y. Pearson was used as the correlation method between two continuous features, Cramer's V for two binary features and Somers' D_xy was used to compare continuous and binary features [17]. The score q_j for maximising MRMR at each jth step was calculated as:
$$q_j = I(x_j, y) - \frac{1}{|S|} \sum_{x_k \in S} I(x_j, x_k)$$
where x_j is the candidate feature, y is the match outcome, S refers to the set of features already selected and k represents the iteration steps prior to j.
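The greedy MRMR selection can be sketched in a few lines of Python. This is an illustrative simplification and not the mRMRe implementation used in the study: it assumes the correlation-based estimate of mutual information, I = −½ ln(1 − ρ²), and uses Pearson correlation for every pair of variables, whereas the study used Cramer's V and Somers' D_xy where binary features were involved. The function names and the toy data are hypothetical.

```python
import math


def pearson(x, y):
    """Pearson correlation of two equal-length numeric sequences."""
    n = len(x)
    mx, my = sum(x) / n, sum(y) / n
    sxy = sum((a - mx) * (b - my) for a, b in zip(x, y))
    sxx = sum((a - mx) ** 2 for a in x)
    syy = sum((b - my) ** 2 for b in y)
    if sxx == 0 or syy == 0:
        return 0.0  # a constant variable carries no information
    return sxy / math.sqrt(sxx * syy)


def mutual_information(x, y):
    """Correlation-based estimate I = -0.5 * ln(1 - rho^2)."""
    rho_sq = min(pearson(x, y) ** 2, 1.0 - 1e-12)  # guard against log(0)
    return -0.5 * math.log(1.0 - rho_sq)


def mrmr_select(features, outcome, n_select):
    """Greedily pick n_select feature names by relevance minus mean redundancy."""
    selected = []
    candidates = list(features)
    while candidates and len(selected) < n_select:

        def score(name):
            relevance = mutual_information(features[name], outcome)
            if not selected:
                return relevance  # first pick: maximum relevance only
            redundancy = sum(
                mutual_information(features[name], features[s]) for s in selected
            ) / len(selected)
            return relevance - redundancy

        best = max(candidates, key=score)
        selected.append(best)
        candidates.remove(best)
    return selected


# Toy data (invented): eight performances, 1 = win and 0 = loss.
outcome = [1, 0, 1, 0, 1, 0, 1, 0]
features = {
    "kicks": [6, -6, 5, -5, 4, -4, 3, -3],
    "kicks_dup": [6, -6, 5, -5, 4, -4, 3, -2],  # near copy of "kicks"
    "metres": [50, -60, -40, 30, 70, -20, 10, -80],
}
selected_pis = mrmr_select(features, outcome, 2)
print(selected_pis)
```

In the toy data the near-duplicate feature is highly relevant on its own but is penalised heavily once the original has been selected, which is the behaviour that lets MRMR drop correlated PIs while keeping each retained PI in its original, interpretable form.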
An optimisation loop was created to maximise the model accuracy in predicting matches, whilst minimising the features used in modelling. A summary of the steps taken can be seen in Fig. 1.
Once a unique set of features was chosen for both datasets, the RFC parameters were then optimised. The features selected in the previous step were maintained, but the number of features considered at each split was changed at each iteration instead. Values from one to the maximum number of features in the model were chosen and tested in steps of one. The same metric was used to choose a value as described in Fig. 1. Similar optimisation was used for the number of trees within the model, with only the number of trees changed in each iteration. Tree values between 50 and 2500 were tested at 50 tree intervals. The initial optimised number of trees was kept the same in both models to allow comparison when the model was permuted, because differences in the number of trees affect MDA values. Modelling and analysis were completed using the following packages in R: randomForest [15], rfUtilities [18], mRMRe [19] and rfPermute [16].

Model evaluation
After a final model was established for both datasets, data were sourced for the opening rounds of the 2021/22 season. This included 53 matches up to and including round 10 of the competition, excluding any matches postponed due to COVID-19.
The model was then applied in prediction. McNemar's test was used throughout for model comparison [19]. The test statistic is calculated from B and C, where B represents the number of outcomes correctly identified by the first model only, and C represents the number of outcomes correctly identified by the second model only [19]. A 5% significance level was utilised for p-values and 95% confidence intervals were used throughout this study.
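As a hedged illustration, McNemar's statistic can be computed from the two discordant counts as (|B − C| − 1)² / (B + C) when a continuity correction is applied, or (B − C)² / (B + C) without it. The text does not state which form was used; the corrected form is assumed here because it is the default of R's `mcnemar.test`. The function name and the example counts below are hypothetical, since the study does not report B and C.

```python
def mcnemar_statistic(b, c, continuity=True):
    """McNemar chi-squared statistic from the two discordant counts.

    b: outcomes classified correctly by the first model only.
    c: outcomes classified correctly by the second model only.
    """
    if b + c == 0:
        return 0.0  # the models never disagree
    diff = abs(b - c)
    if continuity:
        diff -= 1  # continuity correction (assumed, not confirmed by the text)
    return diff ** 2 / (b + c)


# Hypothetical discordant counts with a net difference of four performances.
corrected = mcnemar_statistic(4, 8)
uncorrected = mcnemar_statistic(4, 8, continuity=False)
print(corrected, uncorrected)
```

The statistic is compared against a chi-squared distribution with one degree of freedom, for which the 5% critical value is approximately 3.84. Only performances on which the two models disagree contribute, so two models with very different overall accuracy can still be compared on a shared set of match performances.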

Results
The initial RFC for season 2020-21 was completed on both datasets. The full isolated model correctly classified 122 match performances out of 192, giving an accuracy of 64% with a 95% confidence interval (CI) of (56%, 70%). Within this, 66% of wins were correctly classified compared to 61% of losses.
The full relative model correctly classified 159 out of 192 match performances (83%, CI (77%, 88%)), with 82% of wins correctly classified and 83% of losses. McNemar's test confirmed that the full relative model outperformed the full isolated model with a value of 16.00 (p < 0.05).
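The text does not state how the confidence intervals for accuracy were obtained. As one common choice for a proportion, an exact (Clopper-Pearson) binomial interval can be computed as sketched below; the method, the function names and the bisection approach are assumptions made for illustration only.

```python
from math import comb


def binom_cdf(k, n, p):
    """P(X <= k) for X ~ Binomial(n, p)."""
    return sum(comb(n, i) * p ** i * (1 - p) ** (n - i) for i in range(k + 1))


def _root_of_increasing(func, iters=60):
    """Bisection on [0, 1] for a function that increases from negative to positive."""
    lo, hi = 0.0, 1.0
    for _ in range(iters):
        mid = (lo + hi) / 2
        if func(mid) < 0:
            lo = mid
        else:
            hi = mid
    return (lo + hi) / 2


def clopper_pearson(k, n, alpha=0.05):
    """Exact two-sided (1 - alpha) interval for k successes out of n trials."""
    # Lower bound: the p at which P(X >= k) equals alpha / 2.
    lower = 0.0 if k == 0 else _root_of_increasing(
        lambda p: (1 - binom_cdf(k - 1, n, p)) - alpha / 2
    )
    # Upper bound: the p at which P(X <= k) equals alpha / 2.
    upper = 1.0 if k == n else _root_of_increasing(
        lambda p: alpha / 2 - binom_cdf(k, n, p)
    )
    return lower, upper


# Full isolated model: 122 of 192 match performances correctly classified.
ci_low, ci_high = clopper_pearson(122, 192)
print(round(122 / 192, 3), round(ci_low, 3), round(ci_high, 3))
```

With 192 performances the interval spans well over ten percentage points, which is why overlapping or non-overlapping intervals alone are a coarse guide and a paired test such as McNemar's is used for the model comparisons.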
The full models for both the isolated and relative sets were then reduced with feature selection and the RFC parameters optimised. For the isolated data, the optimum number of features was six. These features were Metres Made_I, Kicks from Hand_I, Turnovers Conceded_I, Scrum Penalties_I, Turnovers Won_I, and Lineouts Lost_I. Using this reduced feature set, 1650 was identified as the optimal number of trees, and the number of features tested at each split was optimised at five. The reduced isolated model, given the above parameters and features, accurately classified 126 out of 192 match performances (66%, CI (58%, 72%)), including 69% of wins and 63% of losses.
Within the relative set, optimisation led to the selection of seven features for the reduced relative model. These features were Kicks from Hand_R, Metres Made_R, Scrum Penalties_R, Scrums Lost_R, Control Penalties_R, Turnovers Conceded_R, and Clean Breaks_R. The optimal number of features tried at each split was one for the reduced relative model. To maintain the ability to compare MDA in both models, the number of trees was set to 1650 to match the reduced isolated model. The reduced relative model correctly classified 163 out of 192 match performances (85%, CI (79%, 90%)), of which it correctly identified 84% of wins and 85% of losses. McNemar's test value was 16.40 (p < 0.05), illustrating that relative data outperformed the isolated data. There was no significant difference between full and reduced model performance, with McNemar's values of 0.25 (p > 0.05) for the isolated models' comparison and 0.75 (p > 0.05) for the relative models' comparison.
Both reduced models were used in prediction on the 2021-22 datasets for URC matches that had been completed at the time of analysis (n = 53). The reduced isolated model accurately predicted 76 out of 106 match performances (72%, CI (62%, 80%)), including 79% of wins and 64% of losses. With the reduced relative model, 95 match performances out of 106 were correctly predicted (90%, CI (82%, 95%)), with 89% of wins and 91% of losses. In prediction, the reduced relative model outperformed the reduced isolated model based on McNemar's test (χ² = 10.62, p < 0.05).
Both full models were also used in prediction on the 2021-22 datasets for URC matches. The full isolated model accurately predicted 77 out of 106 match performances (73%, CI (63%, 81%)), including 74% of wins and 72% of losses. With the full relative model, 96 match performances out of 106 were correctly predicted (91%, CI (83%, 95%)), with 91% of wins and 91% of losses. In prediction, the full relative model outperformed the full isolated model based on McNemar's test (χ² = 24.60, p < 0.05). When the full and reduced models were compared in prediction, there was no evidence of significant differences in performance (χ² = 0 in both cases, p > 0.05).
The MDA z-scores for each feature in the model are summarised in Table 1 along with the corresponding p-values. Within the reduced isolated model, only five features were significant at the 5% significance level. These features were Metres Made_I, Turnovers Won_I, Kicks from Hand_I, Scrum Penalties_I and Turnovers Conceded_I, with their related MDA z-scores ranging from 21.7 to 12.5.
Within the reduced relative model, only five features were significant: Kicks from Hand_R, Clean Breaks_R, Scrum Penalties_R, Metres Made_R and Turnovers Conceded_R. The range of MDA z-scores was much larger for significant features in the relative set (51.6-17.7), and this is primarily due to the magnitude of the MDA z-score for Kicks from Hand_R.
Partial dependence plots for all statistically significant features within the reduced relative model, based on MDA z-scores, are presented in Fig. 2. Plots A-C illustrate the partial dependence across the range of Kicks from Hand_R, Metres Made_R, and Clean Breaks_R respectively, which were all positively associated with winning. Plots D and E show partial dependence for Turnovers Conceded_R and Scrum Penalties_R respectively, with both PIs negatively associated with winning. There is no increase in probability of winning after 10 relative kicks (Fig. 2A) and no increase in probability of winning after approximately 300 relative metres made (Fig. 2B). Clean Breaks_R has no increase in probability of winning after 10 additional clean breaks (Fig. 2C), whereas Scrum Penalties_R tend to give no further increase in probability of winning below −5 relative penalties (Fig. 2E).

Discussion
The focus of this study was to investigate which PIs were associated with match outcome within the URC, compare efficacy of isolated and relative data in match outcome prediction and investigate whether feature selection methods could be used in PI statistical models to reproduce accuracy with smaller datasets. The current results indicate that kicks from hand, metres made, clean breaks, turnovers conceded, and scrum penalties were key PIs in differentiating between winning and losing performances within URC matches. Furthermore, the current study corroborates what has been recognised within literature: team performance data are much more efficient at predicting match outcome when expressed relative to the opposition's performance [2]. This suggests that team performance should not be analysed independently, but in context of the opposition. The study also demonstrated that utilising feature selection methods to reduce datasets does not negatively impact model effectiveness in this context. The ability to generate smaller datasets with methods such as MRMR, whilst maintaining high predictive accuracy, is valuable when ensuring that results can be successfully translated into a practical application. This, in turn, assists with the development of tactical planning, informs elements of coaching, and simplifies monitoring processes.
Kicking was significant within both isolated and relative modelling datasets; however, when data were relativised, kicks from hand became a more effective differentiator between match outcomes. Removing kicks from hand from the relative model led to a mean decrease accuracy z-score of 51.6, demonstrating its power in differentiating between wins and losses. Over time, the nature of kicking has changed within Rugby Union [21] and can be performed in search of territorial or tactical advantage. The outcomes of this study suggest that promoting tactics that allow a team to gain additional kicks against their opposition may be beneficial to match success. Using box kicks in vulnerable positions, kicks for touch, kick chase tactics and "winning the kicking battle" are all examples of areas where teams may be able to perform additional kicks against their opposition. The latter refers to periods of play where teams exchange many kicks in a row, and starting and finishing these battles could be another area that would assist with increasing relative kicks from hand within a match. The introduction of the 50:22 law in the test data is likely to drive additional kicks within winning teams [21][4].
Attacking metrics, including metres made and clean breaks, also ranked highly within the reduced relative model, demonstrating the importance of an effective attack and conversely a strong defence. This corroborates what has been reported within international and sub-elite Rugby Union [2,4]. Increased relative metres made can be an indicator of gainline success but also of preventing this for the opposition, hence it is clear why this metric can indicate a winning performance over the entirety of a match. Clean breaks featured in the reduced relative model only and was the third most important feature based on MDA. This suggests that making clean breaks alone is not key to match success, but the ability to make more than the opposition is. This could be achieved by completing more clean breaks or potentially by preventing the opposition from executing clean breaks.
Another key area of importance is at the breakdown, with turnovers conceded a significant feature within both models, which is in line with what had been reported within literature for men's Rugby Union [4]. This suggests that conceding fewer turnovers than the opposition, or alternatively forcing more turnovers from the opposition, is key to winning matches.
Set piece discipline is also considered important based on MDA, in the form of scrum penalties. Over time, the attitude towards scrums has changed, with packs getting heavier [23], law changes, and entire front row substitution typical in every match. Stolen scrums are uncommon within professional Rugby, averaging 0.47 per match in the 2020/21 URC season. This means other methods are needed to force turnovers in the scrum and, hence, scrum penalties become more important to the game. The team's ability to control their own scrum and opposition scrums is key to forcing the opposition to concede scrum penalties. Teams can then use awarded penalties to either kick for points or gain a tactical or territorial advantage. Scrum penalties have not previously been identified as a contributor to winning performances, but many studies have identified total penalties conceded as a key indicator of match success [2,7,24]. It is possible that breaking down penalties into different areas of the game may be useful, especially in practical implementation of research.
Some key differences were found between the URC and other competitions. Within World Cup group and knockout matches, missed tackles and tackle ratio were both reported as significantly different between winning and losing teams [6]. This finding suggests that missed tackles have a different impact in higher stake games such as group and knockout matches. It is not clear whether this relationship is strictly seen at the international level, as knockout matches have not been analysed within this study. Within the Six Nations, there were many similarities between studies such as line breaks, turnovers won, and possessions kicked [8]. Additionally, mauls won was highlighted as a key feature, which was not identified in the URC. This is interesting as it could suggest that mauls are more effective within winning teams in the Six Nations. Premiership Rugby, arguably the closest in style to the URC structure, has reported many key features in common with this paper, as previously discussed [2]. However, winning teams in the Premiership tend to have a higher difference in metres per carry relative to their opposition. This was not identified within our current study, as average carry was not available as part of this dataset, but it could be calculated in future analysis in this area.
Whilst random forest modelling is a recognised and popular method within Rugby Union performance analysis [1], feature selection has not been used within key literature, with the possibility of its application only being discussed briefly [25]. Using MRMR has allowed the current model to target the key features that are driving successful performances, whilst removing highly correlated features from within the model. This assists with removing similar features, for example metrics such as defenders beaten and clean breaks, which are highly correlated due to their relationship in matches. This is useful both in reducing the noise in the modelling and within practical applications of this research. Within a professional Rugby environment, the reduced feature set gives practitioners a manageable set of parameters to focus on in training. Many other methods, such as Principal Component Analysis and Partial Least Squares regression, can also be used to reduce features. However, these methods create features based on linear combinations of the original features, which adds complexity when interpreting and applying results [26], as seen in research in Rugby League [27]. The interpretation cost may outweigh the benefit of model simplification, which is avoided by using MRMR as it maintains features in their original form [18]. The benefit of MRMR feature selection in combination with other machine learning methods is unknown, but it may be advantageous for future studies to investigate this.
The reduced relative model was effective in prediction, with 90% of match performances correctly classified. The majority of errors (9 out of 11) were from matches with point differences of six points or less, suggesting that close matches may be more difficult to predict. This is interesting as in Rugby Union, any team who lose by seven points or fewer is awarded a bonus point [28]. Close matches have been studied independently, with studies using cluster analysis to decide what defines a close game rather than using the bonus point laws to decide [29,30]. Further research is needed to understand this implication as well as how it can be avoided, if at all, in future studies.
Previous studies have highlighted the importance of match location to winning and losing teams [10]; however, our current study did not find that this was the case. Given the COVID-19 pandemic, all matches within the 2020/21 dataset took place with no crowds, which may have diminished the influence of home advantage.

Conclusions
Indicators of winning performances within the URC can be simplified to five key features: kicks from hand, metres made, clean breaks, turnovers conceded and scrum penalties. Kicking has been highlighted as a key driver in match success, with a team kicking more than their opposition leading to increased probability of winning. This study has also demonstrated the effectiveness of using data relative to the opposition, and that simplified datasets can be used to understand the drivers of match outcome in Rugby Union. MRMR has allowed a small set of PIs to be highlighted in this study, leading to manageable results when put into a practical perspective.

Fig. 1. Flow diagram outlining steps taken within the optimisation loop to maximise model OOB accuracy whilst minimising the number of features used in modelling. A similar loop was used to optimise random forest parameters.

Fig. 2. Partial dependence plots for significant relative features (based on mean decrease accuracy z-scores). The plots show the marginal effect of relative kicks from hand (A), relative metres made (B), relative clean breaks (C), relative turnovers conceded (D) and relative scrum penalties (E) on classification of match outcome. The x axis contains the range of values for each of the previously named PIs, and the y axis contains the partial dependence. Partial dependence values range from −1 to 1, with negative values leading to increased likelihood of a match performance being classified as a loss and positive values leading to increased likelihood of a performance being classified as a win.

Table 1
The mean decrease accuracy values and associated p-values for the isolated and relative reduced model features.