Clinician-friendly physical performance tests in athletes part 3: a systematic review of measurement properties and correlations to injury for tests in the upper extremity
  1. Daniel T Tarara1,
  2. Lucas K Fogaca2,
  3. Jeffrey B Taylor3,
  4. Eric J Hegedus3
  1. 1Department of Exercise Science, High Point University, School of Health Sciences, High Point, North Carolina, USA
  2. 2Department of Biology, High Point University, High Point, North Carolina, USA
  3. 3Department of Physical Therapy, High Point University, School of Health Sciences, High Point, North Carolina, USA
  1. Correspondence to Daniel T Tarara, Department of Exercise Science, High Point University, School of Health Sciences, 833 Montlieu Avenue, High Point 27268, NC, USA; dtarara{at}highpoint.edu

Abstract

Objective In parts 1 and 2 of this systematic review, the methodological quality as well as the quality of the measurement properties of physical performance tests (PPTs) of the lower extremity in athletes was assessed. In this study, part 3, PPTs of the upper extremity in athletes are examined.

Methods Database and hand searches were conducted to identify primary literature addressing the use of upper extremity PPTs in athletes. The Preferred Reporting Items for Systematic Reviews and Meta-Analyses (PRISMA) guidelines were followed and the Consensus-based Standards for the selection of health Measurement Instruments (COSMIN) checklist was used to critique the methodological quality of each paper. The Terwee Scale was used to analyse the quality of the measurement properties of each test.

Results 11 articles that examined 6 PPTs were identified. The 6 PPTs were: closed kinetic chain upper extremity stability test (CKCUEST), seated shot put (2 hands), unilateral seated shot put, medicine ball throw, modified push-up test and 1-arm hop test. Best evidence synthesis provided moderate positive evidence for the CKCUEST and unilateral seated shot put. Limited positive evidence was available for the medicine ball throw and 1-arm hop test.

Conclusions There are a limited number of upper extremity PPTs used as part of musculoskeletal screening examinations, or as outcome measures in athletic populations. The CKCUEST and unilateral seated shot put are 2 promising PPTs based on moderate evidence. However, the utility of the PPTs in injured populations is unsubstantiated in literature and warrants further investigation.

  • Upper extremity
  • Injury
  • Functional
  • Performance
  • Athlete

Numerous healthcare organisations and sports bodies have released consensus statements regarding the importance of the preparticipation screening examination to identify athletes at risk of injury.1 ,2 Specifically, the IOC has stated that screening must be reliable, sensitive, specific, inexpensive, easy to perform and widely available.2 Physical performance tests (PPTs) meet this definition and go one step further: they are portable and can be performed in many different environments and contexts.

Despite the obvious benefits of PPTs, little is known about whether these tests are appropriate tools for musculoskeletal injury screening, and what little is known is uninspiring.3 Two previous systematic reviews of PPTs in the lower extremity found little evidence to support their use as either outcome measures or prognostic variables.4 ,5 Specifically, no test possessed the measurement properties outlined by the IOC to serve as part of a musculoskeletal injury screening examination. Further, no single PPT had sufficient measurement properties to warrant use in monitoring recovery and determining readiness for return to play. Therefore, a comprehensive research agenda, including a review of literature, on upper extremity PPTs is needed.

In comparison to PPTs that focus on the lower extremity, there are fewer upper extremity PPTs. This is despite the time lost from sport and long-term sequelae that can result from upper extremity injuries.6 To date, such metrics as the reliability, validity and responsiveness of upper extremity PPTs have yet to be comprehensively reported. Therefore, the purpose of this study was to systematically review the literature pertaining to PPTs of the upper extremity, and to examine the quality of the literature and measurement properties of established tests. Our hypothesis was that findings would be similar to those of the lower extremity: limited evidence to support the use of PPTs as screening examinations or outcome measures for musculoskeletal injury.

Methods

We framed our research question using the Patient, Intervention, Comparison and Outcome (PICO) method. Our primary research question was: ‘Do upper extremity performance tests have any relationship to upper extremity injuries in athletes of any age?’ Secondary research questions were whether PPTs predicted injury, and whether PPTs could be used as an outcome measure in the clinic.

Literature search

The search was conducted in PubMed and translated with the assistance of a librarian for CINAHL and SPORTDiscus databases (see online supplementary appendix A). All articles were included up to 17 April 2015. The search strategy included terms pertaining to athletes and injury confined to the upper extremity, and these were combined with terminology relating to performance tests. Results were limited to English language and humans. In addition, a hand search was conducted based on the identification of review articles using the ‘Clinical Queries’ function in PubMed. Reference lists of these reviews were assessed for potentially eligible articles, as were the reference lists of any article that we read in full. Finally, Google Scholar was searched using the names of known upper extremity PPTs.

Eligibility criteria

Two authors (DTT and LKF) reviewed the titles and abstracts of all studies identified during the literature search to generate a list of selected articles to read in full. Discrepancies were resolved via consensus discussion, and a third author (EJH) refereed if disagreements could not be resolved.

The inclusion criteria were:

  • Analysed at least one upper extremity PPT that was accessible across settings (clinic, courtside, field side), and feasible (portable, affordable and easy to administer).

  • (1) Studied an exclusive population of organised professional, collegiate or high school sports athletes (eg, baseball, volleyball, tennis, javelin); (2) used the terms ‘elite athlete’, ‘professional athlete’, ‘semi-professional athlete’, ‘club sport’, ‘intramural athlete’, ‘recreational athlete’ or ‘sport participant’; or (3) included a sample in which 50% of the study's participants had a physical activity level of 5 or higher on the Tegner scale.7

  • Original, primary research (reviews were excluded).

We operationally defined a PPT as one of a subset of functional assessments8 used by sports medicine clinicians to discern aspects of athleticism (power, agility, endurance, flexibility), injury risk and return-to-play readiness.4 PPTs were accepted as upper extremity if they tested the region of the body from the shoulder girdle, including the scapula and supporting structures, to the end of the fingers.

We excluded studies of PPTs that used equipment for three-dimensional motion capture, upper body ergometers, rowing ergometers, or other technology-dependent instrumentation. We also excluded conference proceedings, dissertations and theses, case studies, and case series. All articles were read in full by two authors (EJH and DTT) to further include or exclude them for scoring of methodological quality and quality of measurement properties. Any discrepancies were resolved via consensus discussion or consultation with a third author (LKF) if consensus could not be reached.

Data extraction, summaries and best evidence synthesis

Included studies were summarised based on population, injury classification, sport, PPT description and study results. PPTs were grouped based on testing procedures to determine whether the naming of PPTs and their conduct was consistent across studies. Finally, we summarised the methodological quality of the literature and the quality of the measurement properties of the PPTs and combined them using a best evidence synthesis.9 One author scored the methodological quality and measurement properties (DTT).

A modified version of the Consensus-based Standards for the selection of health Measurement Instruments (COSMIN) tool10 ,11 was used to assess the quality of methodology of all included articles. Our version of the COSMIN tool consisted of five categories (reliability, agreement, hypothesis testing, criterion validity and responsiveness). The methodological quality of each categorical property was individually scored on a scale of excellent, good, fair and poor. To determine the quality of PPT measurement properties, the Terwee Scale of positive (+), indeterminate (?) and negative (−) was used.12

For the best evidence synthesis, we combined the results of the COSMIN tool assessment13 and the Terwee Scale12 for each article. Since the COSMIN scoring guidelines were originally designed for health-related patient-reported outcomes (HR-PROs)10 and adapted for performance measures,14 final COSMIN quality scores were not based on sample size to avoid automatic exclusion of small (n<30) studies.15 The COSMIN's original scoring guidelines set a minimal threshold for adequate sample size at >30 (fair quality).13 Automatically disqualifying small studies (<30) as poor quality may unnecessarily exclude studies that report large effect sizes for physical performance measures.14 Following the recommendation by Bartel et al,14 studies scored as poor quality based solely on small sample sizes (n<30) and without formal power analysis were retained, and accounted for in the best evidence synthesis as limited evidence. The lowest score methodology was used; poor quality studies were eliminated, as recommended previously, to contextualise best evidence.4 ,14 The grading key4 for the best evidence summary was:

  • Unknown: investigated in studies of exclusively poor methodology or not investigated in any study.

  • Conflicting: contradictory findings.

  • Limited: one study of fair methodological quality.

  • Moderate: multiple studies of fair methodological quality or one study of good methodological quality.

  • Strong: multiple studies of good methodological quality or at least one study of excellent methodological quality.
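The grading key above can be read as a small decision rule. The following is a minimal sketch in Python, not the scoring procedure actually used in the review: the function name `best_evidence_grade`, the tuple layout and the handling of mixed cases (for example, one good plus one fair study) are assumptions made for illustration, and the special treatment of studies rated poor solely because of sample size is not modelled.

```python
from collections import Counter


def best_evidence_grade(studies):
    """Grade one measurement property of one test.

    `studies` is a list of (quality, finding) tuples, where quality is
    'excellent', 'good', 'fair' or 'poor' (COSMIN-style rating) and
    finding is '+', '-' or '?' (Terwee-style rating).
    Hypothetical helper: names and data layout are illustrative only.
    """
    # Studies of poor methodological quality are dropped before grading.
    kept = [(q, f) for q, f in studies if q != "poor"]
    if not kept:
        return "unknown"

    # Contradictory findings (both '+' and '-') give a conflicting grade.
    directions = {f for _, f in kept if f in ("+", "-")}
    if len(directions) > 1:
        return "conflicting"

    counts = Counter(q for q, _ in kept)
    if counts["excellent"] >= 1 or counts["good"] >= 2:
        return "strong"
    if counts["good"] == 1 or counts["fair"] >= 2:
        return "moderate"
    return "limited"  # exactly one study of fair quality remains
```

For example, a property examined in a single study of good quality would be graded `moderate` under this sketch, whereas a property examined only in studies of poor quality would be graded `unknown`.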

Results

Description of included studies and PPTs

The database searches identified 1021 studies for preliminary title/abstract screening. A majority (n=854) of studies were eliminated for not meeting the inclusion criteria of PPT, athletic population or upper extremity (UE) body region. In addition, 4 summary abstracts, 114 case reports, 17 narrative reviews and 19 systematic reviews were excluded. Thirteen studies were eligible for full-text review. The preliminary article selection process yielded moderate16 inter-rater agreement (κ=0.55) with 99.2% raw agreement. The full-text article screening process eliminated 11 studies for not meeting inclusion criteria as a feasible PPT or being administered in an athletic population. Hand searching identified nine additional articles for inclusion. The outline of study selection and inclusion is in figure 1. The preliminary analysis consisted of 11 studies (table 1). Online supplementary appendix B provides the COSMIN scoring tables.
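A κ of 0.55 alongside 99.2% raw agreement can look contradictory, but it is expected when almost every record is excluded by both reviewers, because agreement expected by chance is then very high. The sketch below illustrates this with Cohen's κ for two raters. The 2×2 counts are hypothetical (the review does not report the screening table) and were chosen only to reproduce figures of a similar size; the function names are also illustrative.

```python
def cohen_kappa(both_include, only_rater1, only_rater2, both_exclude):
    """Cohen's kappa for two raters making include/exclude decisions."""
    n = both_include + only_rater1 + only_rater2 + both_exclude
    p_observed = (both_include + both_exclude) / n
    # Chance agreement: product of each rater's marginal proportions.
    p_include = ((both_include + only_rater1) / n) * ((both_include + only_rater2) / n)
    p_exclude = ((only_rater2 + both_exclude) / n) * ((only_rater1 + both_exclude) / n)
    p_expected = p_include + p_exclude
    return (p_observed - p_expected) / (1 - p_expected)


def implied_chance_agreement(p_observed, kappa):
    """Solve kappa = (p_o - p_e) / (1 - p_e) for p_e."""
    return (p_observed - kappa) / (1 - kappa)


# Hypothetical screening table for 1021 records (NOT data from the review).
kappa = cohen_kappa(both_include=5, only_rater1=4, only_rater2=4, both_exclude=1008)
print(round(kappa, 2))

# Chance agreement implied by the reported 99.2% agreement and kappa of 0.55.
print(round(implied_chance_agreement(0.992, 0.55), 3))
```

Under these assumptions, chance agreement alone is roughly 98%, which is why only eight disagreements in 1021 records are enough to pull κ down to the moderate range.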

Table 1

Summary of included studies in preliminary analysis

Figure 1

Flow chart of study selection and inclusion.

Summary of methodology quality (COSMIN) of included studies

Reliability

Based on two reliability studies of upper extremity PPTs, there is poor25 to good20 evidence of test-retest reliability. Both studies reported good to excellent intraclass correlation coefficient (ICC) values (>0.75). A small sample size was used for the one-arm hop test,25 which resulted in a poor quality rating. There was good methodological quality evidence for the reliability of the closed kinetic chain upper extremity stability test (CKCUEST).20

Agreement/measurement error

Measurement error of UE PPTs was analysed in one study of good methodological quality, which reported the standard error of measurement (SEM) and minimal detectable change (MDC) with 95% CIs for the CKCUEST.20

Hypothesis testing/construct validity

There was conflicting evidence from two studies for hypothesis testing24 ,27 and consistent evidence from two studies of construct validity.20 ,21 Methodological quality was excellent for hypothesis testing of the unilateral seated shot put test,24 while that of the modified push-up test was fair.27 Construct validity was established for the CKCUEST, and the evidence was of fair quality.20 ,21

Criterion/predictive validity

Two studies, of fair methodological quality, investigated the criterion validity of PPTs of the shoulder.22 ,23 Both correlated one-repetition maximum (1-RM) bench press values with the 4.5 kg two-handed seated shot put test distance. Predictive validity for the CKCUEST was reported in one study, but was of poor quality due to a small sample size (n=26) and a prediction model based on only six injury cases.18

Responsiveness

Responsiveness of upper extremity PPTs was reported in two studies.17 ,26 The CKCUEST was used to quantify improvement for SICK (Scapular malposition, Inferior medial border prominence, Coracoid pain and malposition, and dysKinesis) scapula in asymptomatic overhead athletes following a 3-week open kinetic chain exercise intervention and was rated as poor quality because of low sample size (n<30).17 The seated medicine ball throw was used to quantify improvement following traditional and maximal concentric acceleration strength training programmes and was rated to be of fair methodological quality.26

Summary of quality of measurement properties (Terwee Scale) for included PPTs

Ten studies reported measurement properties. One study19 did not report any measurement properties included in the COSMIN tool or Terwee Scale, and was excluded from further analysis. Measurement properties were reported for the following six PPTs (online supplementary appendix C provides the Terwee Scale scoring tables):

  • CKCUEST;17 ,18 ,20 ,21

  • Seated shot put (two hands);22 ,23

  • Unilateral seated shot put;24

  • Medicine ball throw;26

  • Modified push-up test;27

  • One-arm hop test.25

The seated shot put (two hands), unilateral seated shot put and medicine ball throw were not grouped due to differences in testing procedures (see table 1). The quality of PPT measurements, based on the Terwee Scale, is described below.

Reliability

Test-retest reliability was reported for two of the six PPTs. The studies reporting on the one-arm hop test25 and the CKCUEST20 were of positive quality and reported ICC values >0.75.

Agreement/measurement error

One study reported data consistent with agreement/measurement error (SEM and MDC for the CKCUEST).20 Since the Terwee Scale scores this category based on reported minimal important change (MIC) and smallest detectable change (SDC) values,12 we calculated the MIC and SDC. There was positive evidence for the quality of the measurement properties of agreement/measurement error for the CKCUEST.
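For readers unfamiliar with these quantities, the sketch below shows the formulas commonly used to derive the SEM from a reliability coefficient and the smallest detectable change (equivalent to the MDC at the 95% level) from the SEM. It is an illustration under stated assumptions, not the calculation performed in this review: the function names are hypothetical, and the SD and ICC values are made up for the example rather than taken from any included study.

```python
import math


def standard_error_of_measurement(sd, icc):
    """SEM = SD * sqrt(1 - ICC), using the between-subject SD of the scores."""
    return sd * math.sqrt(1.0 - icc)


def smallest_detectable_change(sem, z=1.96):
    """SDC (MDC95) = z * sqrt(2) * SEM; sqrt(2) reflects two measurements."""
    return z * math.sqrt(2.0) * sem


# Illustrative values only (not taken from any included study).
sem = standard_error_of_measurement(sd=4.0, icc=0.84)
sdc = smallest_detectable_change(sem)
print(round(sem, 2), round(sdc, 2))
```

On the Terwee Scale, agreement is rated positively when the SDC is smaller than the MIC, so a change that matters to the athlete or clinician can be distinguished from measurement error.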

Hypothesis testing/construct validity

Hypothesis testing of the unilateral seated shot put yielded positive quality.24 Conversely, the modified push-up test27 was of negative quality. Construct validity of the CKCUEST was of positive quality for discriminant validity20 in distinguishing between healthy, active UE recreational sport athletes and sedentary individuals with a positive history of subacromial impingement syndrome. With respect to convergent validity, the CKCUEST was rated as negative quality in correlating with the Upper Quarter Y-Balance Test (UQYBT).21

Criterion/predictive validity

The two-hand seated shot put22 ,23 demonstrated negative quality criterion validity when referenced to 1-RM bench press values. Criterion validity, in this context, is an indication of how a PPT reflects a ‘gold’ or reference standard based on statistical correlation.13 There was positive quality for predictive validity: the CKCUEST prospectively predicted in-season shoulder injury in male, collegiate football players.18

Responsiveness

The responsiveness of the CKCUEST17 and the medicine ball throw26 were both of positive quality, with demonstrated score improvement after following 3-week17 and 14-week26 resistive exercise programmes, respectively.

Best evidence synthesis

The results of the best evidence synthesis are summarised by test in table 2.

Table 2

Best evidence synthesis by PPT

Closed kinetic chain upper extremity stability test

Several studies reported on the CKCUEST. However, two articles17 ,18 were eliminated for poor quality evidence and one for not reporting measurement properties.19 The two remaining studies20 ,21 provided varied evidence based on measurement properties. There was moderate evidence for reliability and agreement,20 fair evidence for discriminant construct validity,20 and limited evidence for convergent construct validity.21 The CKCUEST discriminated between young (21.7–23.1 years), active, recreational athletes and sedentary adults (45.1–49.8 years). With respect to convergent validity, the relationship between the UQYBT and CKCUEST had a positive weak correlation for the dominant (r=0.43) and non-dominant (r=0.49) sides.16 ,21 The responsiveness, criterion validity and injury prediction in athletic populations are unknown.

Unilateral seated shot put test

There was moderate evidence for hypothesis testing based on a good methodology, and positive quality measurement properties with hypothesis testing in one study.24 Chmielewski et al24 reported that the unilateral seated shot put test should be scored on an allometric scale (exponent of 0.35), and that in healthy athletes, a 5–10% difference can be accounted for when comparing the dominant to the non-dominant arm. The reliability, agreement, criterion validity and responsiveness are unknown. There is no evidence pertaining to athletes with UE injury.
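As an illustration of what allometric scoring and a between-arm comparison might look like in practice, the following Python sketch divides throw distance by body mass raised to the 0.35 exponent and expresses the dominant-arm advantage as a percentage of the non-dominant score. Both formulas are assumptions based on the general form of allometric scaling; the exact normalisation and comparison used by Chmielewski et al are not described here, and the function names and input values are hypothetical.

```python
def allometric_score(distance_cm, body_mass_kg, exponent=0.35):
    """Scale throw distance by body mass**exponent (assumed form of scaling)."""
    return distance_cm / (body_mass_kg ** exponent)


def percent_difference(dominant, non_dominant):
    """Dominant-arm advantage as a percentage of the non-dominant score."""
    return 100.0 * (dominant - non_dominant) / non_dominant


# Hypothetical athlete: 80 kg, dominant throw 520 cm, non-dominant throw 480 cm.
dom = allometric_score(520.0, 80.0)
non_dom = allometric_score(480.0, 80.0)
print(round(percent_difference(dom, non_dom), 1))
```

Because both arms are scaled by the same body mass, the percentage difference between arms is unchanged by the scaling; the allometric score matters when comparing athletes of different sizes.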

Seated shot put (two hands)

Two studies22 ,23 reported fair evidence on the criterion validity of the seated shot put test in relation to 1-RM bench press. The correlation between the PPT and the strength measure was weak in female athletes (r=0.38) and weak to moderate in male athletes (r=0.17 to 0.57). There was no evidence for reliability, agreement, construct validity, responsiveness or injury prediction.

Medicine ball throw

There was limited evidence for the measurement properties of the seated medicine ball throw. Jones et al26 reported 2.8–9.4% improvement in throwing distance following traditional and maximal concentric acceleration strength training programmes. These findings reflect positive quality measurement properties for the PPT, but were only demonstrated in healthy athletes. The evidence relating to reliability, agreement, hypothesis testing/construct validity, and criterion validity for the seated medicine ball throw is unknown.

One-arm hop test

Limited quality evidence is available for the one-arm hop test. One study reported measurement properties for this PPT in a small sample (n=26) of uninjured athletes.25 Test-retest reliability was established in subsets of 13 wrestlers (ICC=0.8) and 13 football players (ICC=0.78). One-arm hop test scores were 4.4% lower in the non-dominant versus dominant arm. The evidence relating to agreement, hypothesis testing/construct validity, criterion validity and responsiveness for the one-arm hop test is unknown.

Modified push-up test

The modified push-up test had fair evidence for discriminant construct validity, but was unable to distinguish between healthy university-level modern dancers and non-dancers.27 No evidence was available for reliability, agreement, criterion validity, responsiveness or injury prediction.

Discussion

This study provided a systematic review of upper extremity PPTs in response to a broader research recommendation specific to the use of PPTs in sports medicine.5 The methodological quality (COSMIN) of eight studies and the quality of measurement properties (Terwee Scale) of six PPTs were included in the synthesis of best evidence. Consistent with our hypothesis, the evidence relating to upper extremity PPTs is mostly unknown or limited.

The use of upper extremity PPTs as assessment tools must be grounded in the tenets of reliability and validity. In general, the available studies that report UE PPTs in athletic populations fail to provide adequate reliability and validity evidence based on COSMIN criteria due to a number of factors. The first issue relates to studies relying on poor (n<30)17 ,18 ,19 ,25 and fair (n=30–49)21 ,22 ,26 ,27 quality sample sizes. Small sample sizes affect statistical power and limit the generalisability of study results. The exceptions to this were studies reporting on the CKCUEST,20 seated shot put23 and unilateral shot put.24 However, the use of small subgroups of healthy upper extremity sport-specific athletes (n=40) and upper extremity-injured athletes (n=28) to evaluate the reliability, agreement and discriminant construct validity of the CKCUEST weakened the methodological quality of the study examining the CKCUEST. Collectively, these findings highlight the low quality of the available research on UE PPTs.

Studies reporting strong evidence relating to reliability, agreement, hypothesis testing/construct validity, criterion validity and responsiveness of upper extremity PPTs are lacking. Research designs consistent with moderate evidence relating to reliability, agreement and discriminant construct validity are only available based on two studies reporting on the CKCUEST.20 ,21 One study presented moderate evidence of hypothesis testing for the unilateral shot put.24 Limited evidence from two studies was available for the one-arm hop test reliability25 and medicine ball throw responsiveness.26 Aside from these investigations, the quality of research designs for the remaining studies was mostly rated as unknown due to an absence of research focused on the constructs of reliability and validity.

Beyond the COSMIN criteria, the majority of studies included in the best evidence synthesis examined healthy athletes21 ,24 ,27 or failed to report upper extremity injury history.22 ,23 ,26 Tucci et al20 included an upper extremity-injured group with a history of subacromial impingement syndrome to determine discriminant construct validity for CKCUEST scores. However, comparisons were made between an older (41.5–49.8 years) sedentary injured group and a younger (21.7–23.1 years) upper extremity athletic population. The difference in age and athletic status is a confounder when interpreting the discriminant validity of the CKCUEST for injury status, and calls into question the validity of this test. The absence of studies that evaluate the measurement properties of PPTs in injured athletes makes it difficult for sports medicine practitioners to interpret PPTs in the context of injury screening, evaluation and rehabilitation responsiveness.

The naming and testing procedures for comparative tests were not a matter of concern because of the low number of PPTs in the best evidence synthesis. The CKCUEST was performed and measured consistently across studies20 ,21 and was reflective of the original test description.28 In one study, the CKCUEST was modified for females by allowing a push-up start position from the knees.20 Although this modification to the original test may seem like a reasonable accommodation, it is a deviation that may influence scoring and limit generalisability, and warrants separate reliability and validity assessment.

The unilateral seated shot put,24 seated shot put22 ,23 and medicine ball throw26 are PPTs designed to measure upper extremity power. As these tests may appear to measure the same construct, it is tempting for clinicians to group them together for comparative analysis. However, this should not be done because the testing procedures differ in the weight of the ball and the alignment of the trunk and upper extremity. Despite modest evidence that supports the relationship of these tests to other measures of strength, there are as yet no data supporting their inter-relationship. Thus, each test should be considered independently. For example, the unilateral seated shot put may be the best test to quantify performance and outcomes in overhead sport athletes (eg, tennis, volleyball, baseball) who rely on dominant arm performance.

Limitations

Systematic reviews are prone to a number of limitations,29 and several related to this review are worthy of consideration. The quality of our systematic review is dependent on the availability and quality of primary literature catalogued in electronic databases. There is no standard search strategy for PPTs in athletic populations,4 nor is there a MeSH (Medical Subject Headings) term to facilitate the cataloguing and searching of PPTs. The absence of a MeSH term may compromise identification of all relevant studies.30 However, to account for this, we included empirical31 and manual30 search strategies in addition to the electronic database searches. Our search was confined to English language studies due to language translation limitations. Therefore, we may have missed articles that were published in languages other than English.

Our operational definition for qualifying PPTs was restricted to tests that were portable, affordable and easy to administer in diverse settings, including clinics, courtside or field side. This eliminated studies and PPTs that required technology (force plates, three-dimensional motion capture) for data collection or scoring.

Methodological quality and PPT measurement properties were assessed with the COSMIN and Terwee Scale, respectively. These tools were originally designed to evaluate the quality of HR-PROs with respect to reliability, validity and responsiveness.12 ,13 ,32 Recently, the COSMIN and Terwee Scale have been adapted for systematic reviews relating to PPTs.4 ,5 ,14 ,15 ,33 Although their measurement properties have been questioned,14 ,15 ,33 these tools provide an objective means for determining quality and evidence relating to PPTs. Lastly, the review was based on a small number of available PPTs.

Summary

There are a limited number of upper extremity PPTs that can be used as part of a musculoskeletal screening examination or as an outcome measure in athletic populations. Based on current evidence, the CKCUEST, unilateral seated shot put, one-arm hop test and medicine ball throw may have promise as PPTs. However, the utility of these PPTs in injured populations is unsubstantiated in the literature and warrants further investigation. Higher quality research needs to focus on the measurement properties of upper extremity PPTs across a spectrum of healthy and injured athletic populations. In addition, future research needs to address standardisation of PPT names, descriptions and testing procedures to ensure the best methodological quality and appropriate comparisons of measurement properties. This is a critical link for reporting PPTs in the sports medicine literature and for supporting the clinical feasibility of tests used by physiotherapists, athletic trainers, coaches and trainers. In the absence of sound evidence relating to reliability and validity, upper extremity PPTs provide little utility in determining the scope of functional performance or injury status in athletic populations.

What are the findings?

  • Six upper extremity physical performance tests (PPTs) were investigated and provided modest evidence pertaining to reliability and validity.

  • The available studies on upper extremity PPTs relied on small samples; larger sample sizes are needed to provide higher quality evidence and improve the generalisability of findings.

  • The closed kinetic chain upper extremity stability test (CKCUEST) and unilateral seated shot put were associated with moderate quality evidence relating to reliability, agreement and hypothesis testing.

  • The evidence pertaining to one-arm hop test reliability and medicine ball throw responsiveness was of limited quality.

How might it impact on clinical practice in the future?

  • Clinicians should exercise caution in using upper extremity PPTs in athletic populations as musculoskeletal screening tools or outcome measures, as current tests lack evidence to suggest their widespread use in clinical practice.

  • The CKCUEST has moderate test-retest reliability and may demonstrate discriminant validity in subacromial impingement syndrome.

  • The unilateral shot put test may be used to quantify dominant versus non-dominant (5–10%) upper extremity power differences.

  • The medicine ball throw is best used to quantify responsiveness to upper extremity strength training programmes in healthy athletes.

References

Supplementary materials

  • Supplementary Data

    This web only file has been produced by the BMJ Publishing Group from an electronic file supplied by the author(s) and has not been edited for content.

Footnotes

  • Contributors DTT, EJH and LKF planned the study, executed the searches, examined the articles for quality and edited the manuscript. DTT and JBT wrote the initial version of the manuscript. JBT edited and contributed to all other versions of the manuscript.

  • Competing interests None declared.

  • Provenance and peer review Not commissioned; externally peer reviewed.

  • Data sharing statement There are additional figures from the meta-analysis that the authors are happy to share with receipt of a written request by the corresponding author.