Expanding the Statistical Toolkit of Sports Scientists
Timothy J. Newans
B. Exercise Science, M. Medical Research
Griffith Health - Griffith University
March 2023
Submitted in fulfilment of the requirements of the degree of Doctor of Philosophy.
Supervisors: Clare Minahan, PhD; Phillip Bellinger, PhD; Brent Richards, MB ChB
Preface
One of the most appealing aspects of watching sport is seeing world-class athletes compete against each other. While these performances create enthralling experiences, their exclusivity unintentionally creates pitfalls when robustly analysing athletes’ relative performances and changes in their performance. With the increasing professionalism of sport and the increasing availability of wearable technologies, Sports Scientists can gather a vast amount of data about athletes’ physical, physiological, psychological, and tactical state.
This thesis aims to expand the statistical toolkit of Sports Scientists by presenting novel research that uses statistical techniques underutilised by the sports science community, providing real-world examples of how sound statistical methodologies and principles can help overcome pitfalls commonly seen in sports science data sets. While none of these pitfalls is unique to sports science, the combination of multiple pitfalls can hinder robust statistical analysis. Sports Scientists are required to utilise statistical methods that can account for an individual’s baseline level before an intervention, especially when there are imbalanced amounts of data for each athlete. Additionally, with the continual development of testing batteries and methodologies for understanding an athlete’s profile, it is easy to simply select the players who are exceptional in each attribute; however, more consideration is required to identify ‘all-rounder’ players, who perform well in numerous testing batteries without being exceptional in any single attribute. Furthermore, given these athletes are at, or close to, the peak of their ability, the levels of improvement required for these athletes are substantially lower than those that would be deemed successful in a general population. Similarly, given that only a limited number of athletes compete at the highest level of competition in their given sport, consideration needs to be given to statistical methods that can accommodate limited sample sizes.
This thesis calls for an increased adoption of mixed models for data sets in which there are repeated measures of athletes and it is necessary to account for the athlete’s inherent baseline in an imbalanced data set. This thesis presents Pareto frontiers as a statistical principle for identifying the ‘extreme’ athletes that excel across multiple attributes of interest, even if they may not be exceptional in any single attribute. Finally, this thesis highlights Bayesian inference as a statistical framework in which prior subject-matter expert knowledge of the data typically seen can be incorporated into the statistical model to identify beneficial interventions more easily when dealing with small effect sizes and small sample sizes.
The first three studies provide methodological overviews of these three statistical aspects (i.e., mixed models, Pareto frontiers, and Bayesian inference). The final two studies are more applied and show how more rigorous statistical methods can become more commonplace throughout sports science research. In particular, the final study serves as a ‘capstone’ study, in which the uncertainty around the Pareto frontier can be estimated by a mixed model run in a Bayesian framework. Each of the five studies is presented in manuscript form in the format required by the individual journal guidelines (e.g., abstract length and structure); however, to assist with thesis flow, some slight formatting changes have been made. A single reference style was used to assist with readability, with all references placed together at the end of the thesis, rather than keeping the references in the style of each journal.
Throughout the thesis, all the code required to run the statistical analyses in each study is published alongside the study, with the intent of encouraging Sports Scientists to adopt similar practices in their own research. By publishing this thesis as an eBook, it is intended that the barrier to entry for such statistical methods and frameworks can be lowered, as Sports Scientists can read research using these methods and frameworks published within their own discipline.
Acknowledgements
To my primary supervisors, Clare Minahan and Phil Bellinger, thank you for all your support, advice, and guidance over the past six years’ worth of research project, Masters, and now PhD candidature. Thanks for welcoming me into the Griffith Sports Science team and putting up with my rants about statistics in sports. I didn’t think I’d be able to stand getting through my planned 5 years at uni, let alone this 9-year stint; however, Griffith has always felt like a place where I belonged, and the Griffith Sports Science team has been a large part of that. To Brent Richards, thanks for our 30,000-ft view conversations of the health and sports science landscape and always looking further down the track of where we could be and working out how we can get there. To Chris Drovandi, your statistical insights and catchups have meant you have felt like a supervisor to me and I’m so thankful for your patience with me in explaining statistical concepts.
To Brad and Bucko from the NRL, it’s been a crazy journey, starting with identifying talent for the inaugural NRLW premiership and finishing with a World Cup win at Old Trafford! Thanks for always making me feel part of the team and encouraging me in my academic pursuits. It’s been a huge honour to have been even a small part of the evolution of female rugby league and I’m so excited to see what it becomes in the future.
To the Queensland Academy of Sport, thanks for your partnership in this project, allowing opportunities to present my research, and bounce ideas back and forth with like-minded sports scientists.
To my parents, Jamie and Jenny, and to Josh and Brodie, thank you for all your support and encouragement as I moved away from home. Even when I was 7 years old and said I wanted to be a sports statistician when we thought it wasn’t a real thing, thanks for encouraging me to explore opportunities and make it into a reality. To my in-laws, Ian and Jayne, thanks for the countless meals and for finally coming to terms with the fact that what I do is actually a real job.
Finally, to my wife Breeana, I could not have finished this without you. Thank you for just being you. Pushing me when I need to be pushed but also telling me to step back when I needed a break. It has been a whirlwind 4 years through this PhD: from getting married, to COVID dramas, and finally with the birth of our beautiful son, Theodore. Bringing Theo into the world has brought me so much joy and you have held our family together through these final few months throughout my submission period and just showed me unconditional love, even when I showed my rare signs of stress.
Abstract
Given the proliferation of data regarding an athlete’s physical, physiological, psychological, and tactical state, Sports Scientists in the applied setting are increasingly required to provide statistics, both descriptive and inferential, to coaches and other support staff to inform decision-making. However, these data sets can be imbalanced (i.e., more/less data on some athletes), contain many variables, include small sample sizes, and display only small individual/group differences. While these properties are not unique to sports science, they are often a barrier to performing robust statistical analysis. Therefore, the overall aim of this thesis is to provide Sports Scientists with access to applications of statistical methods that will expand their statistical toolkit to accommodate data sets regularly seen in a sports science context.
As sports science research consistently contains repeated measures and imbalanced data sets, Study 1 calls for further adoption of mixed models when analysing longitudinal sports science data sets. Mixed models were used to understand whether the level of competition affected the intensity of women’s rugby league match play recorded during club-, state-, and international-level competitions. As athletes featured in all three levels of competition and there were multiple matches within each competition (i.e., repeated measures), mixed models were shown to be the appropriate statistical method for these data. If a repeated-measures ANOVA had been used for the statistical analysis in this study, at least 48.7% of the data would have been omitted to meet ANOVA assumptions. However, using a mixed model, it was determined that the mean speed recorded during Test matches was 73.4 m·min⁻¹, while the mean speeds for NRLW and Origin matches were 77.6 and 81.6 m·min⁻¹, respectively. Study 1 demonstrates how to use mixed models with typical data sets acquired in the professional sports setting and calls for mixed models to be more readily used within sports science, especially in observational, longitudinal data sets such as movement pattern analyses.
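To illustrate why a balanced repeated-measures design discards data, the following minimal Python sketch counts the share of observations lost when only athletes observed at every competition level are retained. The records, the function name `complete_case_loss`, and the level labels are hypothetical and chosen for illustration only; this is not the calculation behind the 48.7% figure reported in Study 1.

```python
from collections import defaultdict

def complete_case_loss(records, levels):
    """Fraction of observations dropped when only athletes observed
    at every competition level are kept (complete-case filtering)."""
    seen = defaultdict(set)
    for athlete, level, _speed in records:
        seen[athlete].add(level)
    # Athletes with at least one observation at every level
    complete = {a for a, observed in seen.items() if set(levels) <= observed}
    dropped = sum(1 for athlete, _level, _speed in records if athlete not in complete)
    return dropped / len(records)

# Hypothetical toy data: (athlete, competition level, mean speed in m/min)
records = [
    ("A", "club", 78.0), ("A", "state", 82.0), ("A", "international", 74.0),
    ("B", "club", 76.5), ("B", "club", 77.5),
    ("C", "club", 79.0), ("C", "state", 81.0),
    ("D", "state", 80.5),
]
levels = ["club", "state", "international"]

loss = complete_case_loss(records, levels)
print(f"{loss:.1%} of observations dropped")  # 62.5% for this toy data
```

In this toy example only athlete A has data at all three levels, so five of the eight observations would be excluded. A mixed model can instead retain every observation, because each athlete contributes whatever data they have through an athlete-level random effect.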
As athletes often require a mix of physical, physiological, psychological, and skill-based attributes, multiple variables need to be considered in tandem when identifying talent. Study 2 introduces Pareto frontiers as a technique that can identify the observations that possess an optimal balance of the desired attributes, especially when these attributes are negatively correlated. The study explores the trade-off relationship between batting average and strike rate as well as bowling strike rate, economy, and average in Twenty20 cricket. Batting and bowling data from both the men’s (MBBL) and women’s (WBBL) Australian Big Bash Leagues were compiled to determine the best batting and bowling performances, both within a single innings and across each player’s Big Bash career. Each Pareto frontier identified players that were not the highest-ranked athlete in any metric when analysed univariately and yet possessed an optimal ‘trade-off’ between attributes. Study 2 concludes that Pareto frontiers can be used when assessing talent across multiple metrics, especially when these metrics may be conflicting or uncorrelated, and that Pareto frontiers can identify athletes that may not have the highest ranking on a given metric but have an optimal balance across multiple metrics that are associated with success in a given sport.
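The idea of a Pareto frontier can be sketched in a few lines of Python. The example below is a minimal illustration, not the code used in Study 2: the player labels and the (batting average, strike rate) values are hypothetical, and both metrics are assumed to be "higher is better". Metrics where lower is better, such as bowling economy, could be negated before being passed in.

```python
def dominates(a, b):
    """True if a is at least as good as b on every metric
    and strictly better on at least one (higher is better)."""
    return all(x >= y for x, y in zip(a, b)) and any(x > y for x, y in zip(a, b))

def pareto_frontier(players):
    """Return the names of players that no other player dominates."""
    return [
        name
        for name, metrics in players.items()
        if not any(
            dominates(other_metrics, metrics)
            for other, other_metrics in players.items()
            if other != name
        )
    ]

# Hypothetical data: name -> (batting average, strike rate)
players = {
    "P1": (45.0, 120.0),  # highest average
    "P2": (30.0, 150.0),  # highest strike rate
    "P3": (40.0, 140.0),  # top-ranked on neither metric
    "P4": (35.0, 125.0),  # dominated by P3
    "P5": (28.0, 145.0),  # dominated by P2
}

frontier = pareto_frontier(players)
print(frontier)  # ['P1', 'P2', 'P3']
```

Player P3 leads neither metric yet sits on the frontier, which is the type of ‘trade-off’ athlete a univariate ranking would overlook.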
As sporting success can be determined by the smallest margins, detecting small, worthwhile effects is notoriously difficult, especially given the small sample sizes seen in sports science. Study 3 demonstrates how utilising a Bayesian framework when conducting research can incorporate prior information to assist in decision making when sample sizes and effect sizes are small. The study revisits a paper published in the European Journal of Sport Science investigating the effects of β-alanine on 4-km cycling time-trial (TT) performance through a randomised placebo-controlled trial with 14 trained cyclists. By analysing the data in both a frequentist framework and in a Bayesian framework with priors varying in terms of informativeness, the study demonstrates why incorporating prior information can improve the quality of sports science research and reinforces that β-alanine supplementation may be beneficial for cycling time trials of ~6 min in duration.
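The influence of prior informativeness can be illustrated with a conjugate normal-normal update, in which the posterior mean is a precision-weighted average of the prior mean and the observed estimate. The sketch below is a simplified stand-in and not the model fitted in Study 3; the effect estimate, standard error, and both priors are hypothetical numbers chosen only to show the mechanics.

```python
import math

def normal_posterior(prior_mean, prior_sd, estimate, std_error):
    """Posterior mean and SD for a normal prior combined with a
    normally distributed estimate of known standard error."""
    prior_prec = 1.0 / prior_sd ** 2
    data_prec = 1.0 / std_error ** 2
    post_var = 1.0 / (prior_prec + data_prec)
    post_mean = post_var * (prior_prec * prior_mean + data_prec * estimate)
    return post_mean, math.sqrt(post_var)

def prob_positive(mean, sd):
    """P(effect > 0) under a normal distribution with the given mean and SD."""
    return 0.5 * (1.0 + math.erf(mean / (sd * math.sqrt(2.0))))

# Hypothetical observed improvement and its standard error (arbitrary units)
estimate, std_error = 1.5, 1.5

# Vague prior: centred on no effect, very wide
vague = normal_posterior(0.0, 10.0, estimate, std_error)
# Informative prior: hypothetical expert belief in a small benefit
informative = normal_posterior(2.0, 1.0, estimate, std_error)

for label, (mean, sd) in [("vague", vague), ("informative", informative)]:
    print(f"{label}: mean={mean:.2f}, sd={sd:.2f}, P(effect>0)={prob_positive(mean, sd):.3f}")
```

With the same data, the informative prior yields a narrower posterior and a higher probability of a beneficial effect, which is why the choice and justification of the prior matters when samples are small.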
Study 4 builds on the call in Study 1 for mixed models to be further adopted, by using mixed models to quantify the position-specific demographics, technical match statistics, and movement patterns of the National Rugby League Women’s (NRLW) Premiership. As women’s rugby league grows, understanding the movement patterns of the sport is essential for coaches and Sports Scientists. Global positioning system, demographic, and match statistics collected from all NRLW clubs across the full 2018 and 2019 seasons were compared between the ten positions using generalised linear mixed models. By understanding the load of NRLW matches, coaches, high-performance staff, and players can better prepare as the NRLW Premiership expands. These movement patterns and match statistics of NRLW matches can lay the foundation for future research as women’s rugby league expands.
Finally, Study 5 brings together the mixed models discussed in Study 1, Pareto frontiers discussed in Study 2, and Bayesian inference discussed in Study 3. Team sport athletes require both speed and endurance to perform in their given sport; however, it is difficult to excel at an elite level in both attributes. Univariate analysis of these attributes is commonplace in sports science whereby speed is typically evaluated independently of endurance. While this methodology readily identifies athletes excelling in either speed or endurance, it fails to highlight athletes who possess ‘the best compromise’ in speed and endurance. Study 5 presents an innovative approach to evaluating running intensity during short (10 s; anaerobic power/speed) and long (20 min; aerobic power/endurance) periods across three seasons of elite female football matches by using the Pareto frontier to visualise athlete characteristics simultaneously. Given the differing number of observations for each athlete, the study uses samples drawn from the Bayesian posterior distribution of both rolling averages estimated by a multivariate mixed model to calculate the probability an athlete sits on the true population Pareto frontier. The 10-s and 20-min rolling averages were calculated (325.1 and 104.7 m·min⁻¹, respectively) for the 18 elite female footballers to provide coaches with a holistic view of their athletes’ speed and endurance capabilities and enable a more informed decision when identifying the athletes with the optimal balance of these polarising attributes.
This thesis encourages Sports Scientists to develop their statistical literacy to ensure validity and robustness in the statistics they present for decision-making. As repeated measures, imbalanced data sets, multiple variables of interest, small samples, and small effect sizes are commonplace in sports science, this thesis is written to urge Sports Scientists to develop a deeper understanding of, and improved competency in, statistical methods. The thesis culminates with Study 5 illustrating how the three different statistical concepts explored in this thesis (i.e., mixed models, Pareto frontiers, and Bayesian inference) can be used in tandem to produce novel research. The thesis provides all the data and R code required to run all the analyses within this thesis to ensure Sports Scientists can replicate the analyses with their own data, as well as to provide an example to the sports science community of open and transparent research practices. By compiling this thesis as an eBook, it is intended that Sports Scientists can utilise these studies as a resource when conducting their own research.
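The step from posterior samples to a frontier probability can be sketched as follows: for each posterior draw, compute the Pareto frontier across athletes, then report the share of draws in which each athlete appears on it. This Python sketch is an illustration under stated assumptions, not the Study 5 implementation. The athlete labels and posterior summaries are hypothetical, and the draws are simulated here from independent normal distributions, whereas in practice they would be joint draws taken from the fitted multivariate mixed model.

```python
import random

def on_frontier(points):
    """For each (speed, endurance) point, True if no other point is at least
    as good on both axes and strictly better on one (higher is better)."""
    flags = []
    for i, (s_i, e_i) in enumerate(points):
        dominated = any(
            s_j >= s_i and e_j >= e_i and (s_j > s_i or e_j > e_i)
            for j, (s_j, e_j) in enumerate(points)
            if j != i
        )
        flags.append(not dominated)
    return flags

def frontier_probability(draws):
    """draws: athlete -> list of (speed, endurance) posterior draws, all the
    same length. Returns athlete -> share of draws spent on the frontier."""
    names = list(draws)
    n_draws = len(draws[names[0]])
    counts = dict.fromkeys(names, 0)
    for k in range(n_draws):
        flags = on_frontier([draws[name][k] for name in names])
        for name, flag in zip(names, flags):
            counts[name] += flag
    return {name: counts[name] / n_draws for name in names}

# Hypothetical posterior summaries in m/min:
# (mean 10-s speed, mean 20-min speed, SD of 10-s, SD of 20-min)
summaries = {
    "A": (330.0, 100.0, 3.0, 1.0),
    "B": (310.0, 108.0, 3.0, 1.0),
    "C": (322.0, 104.0, 6.0, 2.0),  # fewer matches, so a wider posterior
    "D": (300.0, 90.0, 2.0, 0.5),   # clearly below athlete A on both axes
}

rng = random.Random(0)  # fixed seed for reproducibility
draws = {
    name: [(rng.gauss(m_s, sd_s), rng.gauss(m_e, sd_e)) for _ in range(2000)]
    for name, (m_s, m_e, sd_s, sd_e) in summaries.items()
}

probs = frontier_probability(draws)
for name, p in probs.items():
    print(f"Athlete {name}: P(on frontier) = {p:.2f}")
```

Because the frontier is recomputed for every draw, athletes with few observations, and therefore wider posteriors, receive a probability rather than a hard in-or-out classification.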
Statement of Originality
This work has not been submitted previously for a degree or diploma in any university. To the best of my knowledge and belief, the thesis contains no material previously published or written by another person except where due reference is made in the thesis itself.
29/03/2023
Timothy Newans
Acknowledgement of Papers included in this Thesis
Section 9.1 of the Griffith University Code for the Responsible Conduct of Research (“Criteria for Authorship”), in accordance with Section 5 of the Australian Code for the Responsible Conduct of Research, states: To be named as an author, a researcher must have made a substantial scholarly contribution to the creative or scholarly work that constitutes the research output, and be able to take public responsibility for at least that part of the work they contributed. Attribution of authorship depends to some extent on the discipline and publisher policies, but in all cases, authorship must be based on substantial contributions in a combination of one or more of:
• conception and design of the research project
• analysis and interpretation of research data
• drafting or making significant parts of the creative or scholarly work or critically revising it so as to contribute significantly to the final output.
Section 9.3 of the Griffith University Code (“Responsibilities of Researchers”), in accordance with Section 5 of the Australian Code, states: Researchers are expected to:
• Offer authorship to all people, including research trainees, who meet the criteria for authorship listed above, but only those people.
• Accept or decline offers of authorship promptly in writing.
• Include in the list of authors only those who have accepted authorship.
• Appoint one author to be the executive author to record authorship and manage correspondence about the work with the publisher and other interested parties.
• Acknowledge all those who have contributed to the research, facilities or materials but who do not qualify as authors, such as research assistants, technical staff, and advisors on cultural or community knowledge. Obtain written consent to name individuals.
Included in this thesis are papers in Chapters 2-6 and Appendix A which are co-authored with other researchers. My contribution to each co-authored paper is outlined at the front of the relevant chapter. The bibliographic details (if published or accepted for publication)/status (if prepared or submitted for publication) for these papers including all authors, are:
Chapter 2: Newans, T., Bellinger, P., Drovandi, C., Buxton, S., & Minahan, C. The utility of mixed models in sports science: A call for further adoption in longitudinal datasets. International Journal of Sports Physiology and Performance. 17:8 p. 1289-1295.
Chapter 3: Newans, T., Bellinger, P., & Minahan, C. The balancing act: Identifying multivariate sports performance using Pareto frontiers. Frontiers in Sports and Active Living. 4:918946.
Chapter 4: Newans, T., Bellinger, P., Drovandi, C., & Minahan, C. The role of informative priors: A new look at the role of β-alanine on 4-km time-trial performance in cyclists. European Journal of Sport Science. Submitted for publication.
Chapter 5: Newans, T., Bellinger, P., Buxton, S., Quinn, K., & Minahan, C. Movement patterns and match statistics in the National Rugby League Women’s (NRLW) Premiership. Frontiers in Sports and Active Living. 3:618913.
Chapter 6: Newans, T., Bellinger, P., Drovandi, C., Griffin, J. & Minahan, C. Bayesian approximation of the trade-off relationship between running intensity measured during short and long periods using Pareto frontiers. Prepared for publication.
Appendix A: Newans, T., Bellinger, P., & Minahan, C. Identifying multivariate cricket performance using Pareto frontiers. MathSport Conference 2022.
Appropriate acknowledgements of those who contributed to the research but did not qualify as authors are included in each paper.
29/03/2023
Timothy Newans
29/03/2023
Supervisor: Clare Minahan
Additional publications not included in this thesis that have been made during this candidature
Minahan, C., Newans, T., Quinn, K., Parsonage, J., Buxton, S., & Bellinger, P. Strong, fast, fit, lean, and safe: A positional comparison of physical and physiological qualities within the 2020 Australian women’s rugby league team. The Journal of Strength & Conditioning Research. 35 p. S11-S19.
https://doi.org/10.1519/JSC.0000000000004106
Griffin, J., Newans, T., Horan, S., Keogh, J., Andreatta, M., & Minahan, C. Acceleration and high-speed running profiles of women’s international and domestic football matches. Frontiers in Sports and Active Living. 3:604605.
https://doi.org/10.3389/fspor.2021.604605
Quinn, K., Newans, T., Buxton, S., Thomson, T., Tyler, R., & Minahan, C. Movement patterns of players in the Australian Women’s Rugby League team during international competition. Journal of Science and Medicine in Sport. 23:3 p. 315-319.
https://doi.org/10.1016/j.jsams.2019.10.009
Bellinger, P., Newans, T., Whalen, M., & Minahan, C. Quantifying the activity profile of female beach volleyball tournament match-play. Journal of Sports Science & Medicine. 20:1 p. 142-148.
https://doi.org/10.52082/jssm.2021.142
Bellinger, P., Ferguson, C., Newans, T., & Minahan, C. No influence of prematch subjective wellness ratings on external load during elite Australian Football match play. International Journal of Sports Physiology and Performance. 15:6 p. 801-807.
https://doi.org/10.1123/ijspp.2019-0395