Introduction
Precision medicine refers to disease prevention and treatment strategies that take individual variability into account.1 Recently, a similar concept called precision exercise medicine was proposed, acknowledging the role of physical activity (PA) and cardiorespiratory fitness (CRF) in health enhancement.2 Currently, however, precision exercise medicine focuses mainly on treatment procedures and exercise response variability in adults.2 3 Yet many chronic diseases originate in early childhood.4 Prevention strategies therefore warrant more focus on children and adolescents, especially because health risks are associated with CRF5 and can be reversed with exercise interventions in this age group.6
The 20-m shuttle run test (20MSRT) is the most commonly used field test for estimating CRF.7 A low 20MSRT score is adversely associated with many aspects of children's and adolescents' daily lives. Previous studies have reported low 20MSRT performance to be associated with lower overall physical performance,8 poorer tissue health (including adiposity,8 brain9 and bone tissue10), poorer cardiometabolic and psychosocial health, and lower cognitive performance.8 However, the methods currently used to assign interventions based on the 20MSRT are limited in their individual-level accuracy.7 11 The ability to predict 20MSRT prospects during adolescence would improve the identification of candidates for lifestyle interventions.
Machine learning (ML)-based pattern recognition approaches have emerged as promising alternatives to traditional statistical methods in precision exercise medicine.3 Random forest (RF) is a commonly used ML algorithm. In contrast to other methods with high learning capacity, such as neural networks and support vector machines, RF does not require extensive hyperparameter tuning, and overfitting is usually of lesser concern. An additional benefit, especially suited to our research goals, is that RF yields an importance estimate for each variable in the data.12 13 The main aim of this study was to evaluate the performance of RF in predicting unfavourable future 20MSRT status and development in individual adolescents based on 48 baseline variables, including physical, psychological and social indicators. Two prediction tasks were implemented: (Task 1) prediction of unfavourable future 20MSRT status (identification of individuals in the lowest 20MSRT tertile after 2 years), and (Task 2) prediction of unfavourable 20MSRT development in adolescents with limitations in their 20MSRT performance (identification of individuals with 20MSRT development in the lowest tertile among adolescents with baseline 20MSRT below the median level). Task 1 focuses on the general population, while Task 2 focuses specifically on children and adolescents who are more likely to experience the adverse outcomes related to lower 20MSRT performance.
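To make the two outcome definitions concrete, the following minimal Python sketch shows one way the binary targets for Task 1 and Task 2 could be derived from baseline and 2-year follow-up 20MSRT scores. It is an illustration under stated assumptions rather than the procedure used in this study: the function names, the use of an inclusive tertile cut point, the "at or below the cut point" rule, the definition of development as follow-up minus baseline, and the absence of any age- or sex-specific stratification are all hypothetical choices.

```python
from statistics import median, quantiles


def lowest_tertile_cutoff(values):
    """Return the cut point below which the lowest third of values lies.

    Assumption: inclusive quantile method; the study's exact rule is not stated here.
    """
    return quantiles(values, n=3, method="inclusive")[0]


def task1_labels(followup):
    """Task 1 (sketch): flag individuals in the lowest tertile of follow-up 20MSRT."""
    cutoff = lowest_tertile_cutoff(followup)
    return [score <= cutoff for score in followup]


def task2_labels(baseline, followup):
    """Task 2 (sketch): among individuals whose baseline 20MSRT is below the
    median, flag those whose change (follow-up minus baseline, an assumed
    definition of "development") falls in the lowest tertile of that subgroup.

    Returns a dict mapping the index of each subgroup member to a boolean label.
    """
    base_median = median(baseline)
    subgroup = [i for i, b in enumerate(baseline) if b < base_median]
    changes = [followup[i] - baseline[i] for i in subgroup]
    cutoff = lowest_tertile_cutoff(changes)
    return {i: (followup[i] - baseline[i]) <= cutoff for i in subgroup}


# Hypothetical shuttle counts for six adolescents (not study data).
baseline = [10, 20, 30, 40, 50, 60]
followup = [12, 25, 31, 50, 55, 70]

task1 = task1_labels(followup)
task2 = task2_labels(baseline, followup)
```

In this sketch, Task 1 labels every individual, whereas Task 2 labels only the subgroup with baseline scores below the median, which mirrors the difference in target populations described above.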
We hypothesised that the baseline data contain variables that can predict future 20MSRT status and development. A secondary aim was to identify, using a data-driven approach, the best predictors of unfavourable 20MSRT prospects from a wide range of baseline characteristics. Furthermore, we provide the predictive modelling algorithms used in this study for future research.