Motor competence assessments for children with intellectual disabilities and/or autism: a systematic review
  1. Samantha J Downs1,
  2. Lynne M Boddy1,
  3. Bronagh McGrane2,
  4. James R Rudd1,
  5. Craig A Melville3,
  6. Lawrence Foweather1
  1. 1 Research Institute for Sport and Exercise Sciences, Liverpool John Moores University, Liverpool, UK
  2. 2 School of Arts Education and Movement, Dublin City University, Dublin, Ireland
  3. 3 Institute of Health and Well-being, University of Glasgow, Glasgow, UK
  1. Correspondence to Lawrence Foweather; L.Foweather{at}


Objective Gross motor competence is essential for daily life functioning and participation in physical activities. Prevalence of gross motor competence in children with intellectual disabilities (ID) and/or autism is unclear. This systematic review aimed to identify appropriate assessments for children with ID and/or autism.

Design & data sources An electronic literature search was conducted using the EBSCOhost platform searching MEDLINE, Education Research Complete, ERIC, CINAHL Plus and SPORTDiscus databases.

Eligibility criteria Included studies sampled children with ID and/or autism aged between 1 and 18 years, used field-based gross motor competence assessments, reported measurement properties, and were published in English. The utility of assessments was appraised for validity, reliability, responsiveness and feasibility.

Results The initial search produced 3182 results, with 291 full text articles screened. Thirteen articles including 10 assessments of motor competence were included in this systematic review. There was limited reporting across measurement properties, mostly for responsiveness and some aspects of validity. The Bruininks–Oseretsky Test of Motor Proficiency-2 followed by The Test of Gross Motor Development-2 demonstrated the greatest levels of evidence for validity and reliability. Feasibility results varied: most instruments required little additional equipment (n=8) and were suitable for a school setting, but additional training (n=7) was needed to score and interpret the results.

Conclusion This review found the BOT-2 followed by the TGMD-2 to be the most psychometrically appropriate motor competency assessments for children with ID and/or autism in field-based settings. Motor competence assessment research is limited for these cohorts and more research is needed.

PROSPERO registration number CRD42019129464.

  • Measurement
  • Children
  • Adolescent
  • Fitness testing
  • Physical activity

This is an open access article distributed in accordance with the Creative Commons Attribution Non Commercial (CC BY-NC 4.0) license, which permits others to distribute, remix, adapt, build upon this work non-commercially, and license their derivative works on different terms, provided the original work is properly cited, appropriate credit is given, any changes made indicated, and the use is non-commercial. See:


Introduction

Motor competence refers to performing goal-directed human movements in a coordinated, accurate and relatively error-free manner.1–3 Fundamental movement skills (FMS; also termed foundational/gross motor skills) such as stability (eg, balancing), locomotor (eg, jumping) and object-control (eg, catching) skills are an important constituent of gross motor competence.4 5 The development of gross motor competencies, including FMS, is considered an essential foundation for daily life functioning and to build more complex skills necessary for sport-specific activities or physical activity (PA) participation.5–10 Gross motor competence promotes positive PA and health trajectories in children and adolescents,2 11–15 including those with intellectual disabilities (ID) and Autistic Spectrum Condition (ASC).16 17 Compared to typically developing peers, children with ID and ASC engage in less PA,18 19 have lower fitness levels20 and greater rates of overweight and obesity.19 21 22 Thus, motor competence deficits could further exacerbate these health inequalities. It is therefore imperative to monitor gross motor competence in children with ID and ASC to identify and diagnose motor development problems and support targeted interventions.

ID and ASC are neurodevelopmental disabilities. ID is characterised by impairments to intellectual and adaptive functioning,23 which presents with difficulties in comprehending new and complex information, learning and applying cognitive, language, motor and social skills, as well as challenging behaviours.24 ASC, on the other hand, is a permanent neurodevelopmental condition characterised by social, communication and interaction difficulties, and by repetitive and/or restrictive patterns of behaviour.25 26 These social deficits observed in ASC can also present with more severe forms of ID, while individuals with ASC can display features that overlap with ID, such as taking longer to understand information.27 This leads to diagnostic challenges for clinicians in distinguishing between ID as its own diagnosis, ID with an additional diagnosis of ASC, and ASC only, particularly in infants and very young children where some of these abilities are yet to emerge.27 Furthermore, individuals with both ASC and ID appear to have a common genetic aetiology, with up to 50% of the children with ASC thought to have comorbid ID.28 Therefore, given the similarities and complexities around the clinical manifestation of ASC and ID and the associated diagnostic challenges, the present systematic review focuses on children with ID and/or ASC.

Clinicians, physical therapists, physical educators and scholars require field-based assessments of gross motor competence that are valid, reliable and feasible to provide them with useful information for clinical, educational, and research purposes.29 30 Validity is defined as ‘the degree to which (an instrument) is an adequate reflection of the construct to be measured’31 (eg, content validity). Reliability refers to ‘the degree to which the measurement is free from measurement error’31 (ie, test–retest reliability, intra- and inter-rater reliability). Feasibility refers to the usability of the assessment, including ease of administration, training or equipment requirements, cost and the length of time required.32 33 Thus, the assessment must be acceptable to children and adolescents, researchers and/or professionals. Assessments should also be responsive and able to detect changes in gross motor competence, in order to monitor growth and development, and to evaluate the impact of interventions.30 Information on these measurement properties is important as it influences the selection of the appropriate gross motor competence assessment for the intended purpose in the population of interest.

Several reviews have examined the measurement properties of gross motor assessment tools for use with typically developing children and adolescents.34–37 While there is no ‘gold standard’ measure of gross motor competence, these reviews indicate the availability of an abundance of process- or product-oriented measures, or hybrid approaches. Process measures focus on the analysis of movement technique. This provides rich data on movement quality but extensive training is typically required due to the higher expertise needed for scoring skill criteria as present or absent. Product measures, which focus on the outcome of the movement (eg, running velocity, number of catches), are more objective, easy to score and less time consuming, and consequently have more limited training requirements (for a more detailed guide, see references 38 39). While these reviews highlight a number of valid, reliable and feasible tools for use within typically developing children, it should not be assumed that these tools are appropriate for use with children with ID and/or ASC and more specific research is warranted.

Children with ID and/or ASC have complex needs, including communication issues, a limited attention span and ability to retain information. These populations may need to receive instructions and information in a different way to typically developing children,40–42 thus requiring adapted forms of gross motor assessment administration. For instance, Wilson, Enticott and Rinehart43 adapted the 3rd edition of the Test of Gross Motor Development (TGMD) to include visual support for those with ASC as it is known that children with ASC may have a preference for visual learning. They found the TGMD-3 raw scores of children with ASC were significantly lower than those of typically developing peers; however, their raw scores significantly improved using the TGMD-3 visual support protocol compared to the TGMD-3 traditional protocol. This indicates that children with ID and/or ASC may not understand the assessment requirements in existing assessments,44 which could lead to the documentation of greater deficits in gross motor competencies in these populations relative to typically developing children than truly exist.

A number of studies have assessed gross motor competence in children with ID and/or ASC.45–47 Despite a growth in research in these populations over the last decade, studies have used different assessment tools such as the TGMD-2 (eg,48 49), TGMD-3 (eg,50) or the Bruininks-Oseretsky Test of Motor Proficiency-2 (BOT-2; eg,51), which means that the results are not directly comparable and hinders broader interpretations of gross motor competence levels. It also highlights that there has been difficulty deciding on an assessment tool which may be most appropriate for use with children with ID and/or ASC, as these assessment tools were not originally designed for use with these populations. This is important as it is recommended that the quality of an assessment tool should be established in the target population in which the measure will be administered.52 Furthermore, some of the available evidence used measures of only one dimension of FMS (typically locomotor or object-control), providing a narrow picture of gross motor competence, while the reliability of the assessments was unclear.47 To overcome these methodological weaknesses in the literature, more clarity is needed regarding the measurement properties of gross motor competence assessment tools in children with ID and/or ASC.

The purpose of this systematic review was to evaluate the measurement properties of field-based assessments of gross motor competence for use in children with ID and/or ASC aged 3 to 18 years old. This information is needed to help professionals (educators/clinicians) and researchers determine the most appropriate and feasible tool for use with this specific population.


Methods

The Preferred Reporting Items for Systematic Reviews and Meta-Analyses (PRISMA)53 guided the methodology and reporting of this study. The study protocol was registered with PROSPERO, registration no: CRD42019129464.

Inclusion criteria

Target population included:

  • Children and adolescents with ID and/or ASC (mild, moderate, severe and profound).

  • Participants aged between 3 and 18 years old.

Studies were included if they:

  • Included a representative sample of the target population (>50% of sample had ID/ASC and >50% of sample <18 years old).

  • Included a field-based assessment tool of gross motor competence (studies that included both fine and gross motor assessments were included if the data could be separated between fine and gross motor competence).

  • Included measurement properties data of a gross motor competence tool.

  • Were published in English language in a peer-reviewed journal.

  • Were published between January 1937 and October 2019.

Exclusion criteria

During screening, studies were excluded if they:

  • Included a laboratory assessment of gross motor competence.

  • Included only fine motor skill assessments.

  • Did not have a full text article available.
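The inclusion and exclusion criteria above can be read as a single study-level filter. The following is a minimal sketch of that logic, not the procedure used by the review team (screening was carried out manually in Covidence). The `Study` class and all of its field names are hypothetical, and the date window is approximated by publication year only.

```python
from dataclasses import dataclass


@dataclass
class Study:
    # All field names are hypothetical labels for the criteria listed above.
    prop_id_or_asc: float          # proportion of the sample with ID and/or ASC
    prop_under_18: float           # proportion of the sample younger than 18 years
    field_based: bool              # False for laboratory assessments
    gross_motor_separable: bool    # gross motor data can be separated from fine motor data
    reports_measurement_properties: bool
    english_peer_reviewed: bool
    year: int                      # publication year (approximates Jan 1937 to Oct 2019)
    full_text_available: bool


def is_eligible(s: Study) -> bool:
    """Return True when a study meets every inclusion criterion and no exclusion criterion."""
    return (
        s.prop_id_or_asc > 0.5
        and s.prop_under_18 > 0.5
        and s.field_based
        and s.gross_motor_separable
        and s.reports_measurement_properties
        and s.english_peer_reviewed
        and 1937 <= s.year <= 2019
        and s.full_text_available
    )
```

A study in which only 40% of participants had ID and/or ASC, for example, would fail the first check and be excluded regardless of the other fields.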

Literature search and study selection

An electronic literature search was conducted using the EBSCOhost platform searching MEDLINE, Education Research Complete, ERIC, CINAHL Plus and SPORTDiscus databases. These databases were searched independently and included publications from January 1937 to October 2019. The search strategies applied in the databases included combinations of key search terms from four subheadings: population (Child* OR adolesc* OR youth* OR teenage* OR girl* OR boy* OR preschool* OR juvenile OR p$ediatric) AND disability (Disab* OR autis* OR ‘down syndrome’ OR (intellectual* OR learning OR develop* OR mental* OR special OR additional OR cognitive) NEAR/2 (disab* OR disorder* OR impair* OR difficult* OR handicap* OR deficien* OR subnorm* OR delay* OR retard* OR needs)) AND motor competence (Coordination OR agil* OR balanc* OR (‘fundamental movement’ OR physical* OR movement OR motor OR locomotor OR ‘object control’ OR ‘gross motor’ OR stability OR actual) NEAR/2 (skill* OR compet* OR develop* OR abil* OR proficien* OR learning OR assess*)) AND assessment tool (Assess* OR measure* OR test* OR tool OR battery OR instrument* OR evaluat* OR valid* OR reliab* OR feasib* OR responsiveness OR ‘psychometric test’ OR ‘measurement properties’). Boolean searches were also used.
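The structure of the search string (terms joined by OR within each subheading, and the four subheadings joined by AND) can be illustrated with a short sketch. The term lists below are abbreviated, the NEAR/2 proximity clauses are omitted for brevity, and the helper name `or_group` is hypothetical; this is an illustration of the Boolean structure rather than the exact string submitted to EBSCOhost.

```python
def or_group(terms):
    """Join terms with OR and wrap the result in parentheses."""
    return "(" + " OR ".join(terms) + ")"


# Abbreviated term lists; the full lists are given in the search strategy above.
population = ["Child*", "adolesc*", "youth*", "teenage*"]
disability = ["Disab*", "autis*", "'down syndrome'"]
motor_competence = ["Coordination", "agil*", "balanc*"]
assessment_tool = ["Assess*", "measure*", "test*", "valid*", "reliab*"]

# The four subheadings are combined with AND.
query = " AND ".join(
    or_group(group)
    for group in (population, disability, motor_competence, assessment_tool)
)
print(query)
```

Keeping each subheading as its own list makes it straightforward to check that a record must match at least one term from every subheading to be retrieved.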

Once the initial literature search was completed, the lead author removed all duplicates and remaining papers were transferred to Covidence (online systematic review software) for screening. Title and abstract screening of all articles was conducted by the lead author; a second researcher (last author) also independently screened 20% of the article titles and abstracts. Disagreements were resolved through discussion with a third researcher (second author). Full-texts were retrieved, screened and labelled ‘yes’, ‘no’ or ‘maybe’ for inclusion by the first author. The last author checked 10% of the articles labelled ‘no’ and all of the articles labelled ‘yes’. Articles labelled ‘maybe’ were discussed between the first and last author until a consensus on the decision was made, with the involvement of the second author where necessary. In addition to the electronic literature search, researchers also checked reference lists of included papers and searched author bibliographies.

Data extraction

The first author independently extracted individual study data relating to: study information (title, authors, year and country of publication, study design and environment), participant information (sample size, age, sex, disability diagnosis and severity, inclusion/exclusion criteria, body mass index [BMI] and weight status), assessment purpose and administration, and measurement properties of tools (reliability, validity, responsiveness and feasibility). Data extraction was checked for accuracy by co-authors and inconsistencies were resolved with discussion between the first and last author.

Quality appraisal

A quality appraisal tool for rating the measurement properties of motor competence assessments was developed on the basis of previous checklists (see online supplemental file 1).30 31 52 54 In addition, to examine the feasibility of assessments, a utility matrix was developed using criteria gathered from previous recent systematic reviews exploring related concepts (see online supplemental file 2).32 33 Risk of bias across studies was assessed using a modified version of the Grading of Recommendations Assessment, Development and Evaluation (GRADE) approach.52 55


Results

Figure 1 provides the PRISMA flow diagram of the search and screening process. Thirteen articles were identified for inclusion, covering 10 unique instruments: the Test of Gross Motor Development 2nd Edition (TGMD-2),56 Test of Gross Motor Development 3rd Edition (TGMD-3) [traditional and visual],56 Bruininks-Oseretsky Test of Motor Proficiency Second Edition (BOT-2),57 Movement Assessment Battery for Children 2nd Edition (MABC-2),58 Peabody Developmental Motor Scales 2nd Edition (PDMS-2),59 Test of Motor Proficiency,60 Cratty Six-Category Gross Motor Test,61 Data-Based Dance Skills Placement Test,62 Ages and Stages Questionnaire-263 and The Four Square Step Test.64 Five corresponding manuals were identified and added to the final pool for the TGMD-2, TGMD-3, BOT-2, MABC-2 and the PDMS-2 tools; the remaining tools either did not have a manual or they could not be located.

Figure 1

PRISMA flow diagram showing identification and screening process.

Tool characteristics

Table 1 shows the characteristics of the 10 motor competence assessment tools documented in the included studies and manuals. Five tools are suitable for use with children (TGMD-2,56 TGMD-3,56 PDMS-2,59 Ages and Stages Questionnaire-263), three tools with children and adolescents (BOT-2,57 MABC-2,58 Test of Motor Proficiency60), with the remainder of tools suitable for youth and adult populations (Cratty Six-Category Gross Motor Test,61 Data-Based Dance Skills Placement Test,62 Four Square Step Test64). With the exception of the TGMD-2, TGMD-3 traditional and visual, the Four Square Step Test, the Data-Based Dance Skills Placement Test and the Cratty Six-Category Gross Motor Test, all other tools measure at least one other construct in addition to gross motor competence, such as fine motor skills or strength. Despite differences in test administration and structure, the tools tend to include similar items within the gross motor skill subsets, specifically object control skills such as catching, throwing and kicking, or locomotor skills such as walking, running and jumping. Further similarities can be seen in strength tasks for the BOT-2 and Test of Motor Proficiency tools, as well as dexterity tasks for the BOT-2, PDMS-2 and Test of Motor Proficiency. Five tools (BOT-2; PDMS-2; Test of Motor Proficiency; Cratty Six-Category Gross Motor Test; Ages and Stages Questionnaire-2) assess all three gross motor skill domains: stability, object-control and locomotor skills. It should be noted that the Ages and Stages Questionnaire-2 is the only tool within this study that is a questionnaire and covers both fine and gross motor skills. All tools use product-oriented scoring procedures, with the exception of the process measures of TGMD-2 and TGMD-3.

Table 1

Characteristics of the gross motor competence assessments

Study characteristics

The characteristics of included studies in the review are described in table 2. The studies were conducted across four regions: Asia (n=7), Australia (n=1), Europe (n=3) and the USA (n=2). Typically, data collection took place indoors within school grounds in an open area such as a multi-use hall or sports hall (n=10), working with children individually or in small groups (2–4 children). Study participants included individuals with Down syndrome (DS) (n=3 studies), ASC (n=2 studies) and ID (n=8 studies), with participant ages ranging from 1 year 10 months to 18 years old and study sample sizes ranging from 13 to 446 participants (average = 93 participants).

Table 2

Characteristics of the studies reporting gross motor competence assessment measurement properties

Validity, reliability and responsiveness

Online supplemental file 3 (validity and responsiveness) and online supplemental file 4 (reliability) display the data extracted from the included studies with regard to the measurement properties of the assessments in children with ID and/or ASC. Table 3 presents a summary of the overall evidence for each of the 12 measurement properties, synthesising the outcomes of all studies for each measurement instrument. In general, evidence was limited, with few studies reporting across the range of measurement properties. Test-retest reliability, inter-rater reliability and internal consistency were most frequently evaluated, while no studies assessed content validity.

Table 3

Summary of appraisal of measurement properties of gross motor competence assessments

Validity data were available for nine of the ten assessments within 11 studies. Internal consistency was reported for seven tools (TGMD-2, TGMD-3 traditional, TGMD-3 visual, BOT-2, MABC-2, PDMS-2 and Test of Motor Proficiency) within 69% of studies. Discriminant validity (BOT-2), construct/structural validity (TGMD-2) and cross-cultural validity/measurement invariance (BOT-2) were reported for one tool within one study each; hypothesis testing for construct validity for three tools (TGMD-2 and TGMD-3 Traditional & Visual) within 15% of studies, and criterion/concurrent validity for seven tools (TGMD-3 Visual, BOT-2, MABC-2, PDMS-2, Test of Motor Proficiency, Ages and Stages Questionnaire-2 and the Four Square Step Test) within 38% of studies. Positive ratings were reported for all validity measurement properties with the exception of criterion/concurrent validity, which showed some indeterminate ratings for the PDMS-2 and the Ages and Stages Questionnaire-2. No tools had sufficient validity data to present a positive rating for all aspects of validity.

Reliability data were available for nine of the 10 identified measurement instruments. All studies that assessed elements of reliability were given a positive rating. Test-retest reliability was reported for 10 tools within 77% of studies (no test-retest data for Ages and Stages Questionnaire-2), intra-rater reliability for the TGMD-3 only (traditional and visual) within 8% of studies, and inter-rater reliability for eight tools (TGMD-2, TGMD-3 traditional, TGMD-3 visual, BOT-2, MABC-2, PDMS-2, Test of Motor Proficiency and the Four Square Step Test) within 62% of studies. There were no reliability data for the Ages and Stages Questionnaire-2 in this population.

Overall responsiveness results were mixed (table 3). Studies included in this review provided responsiveness results (including floor and ceiling effects) for three of the 10 measurement instruments, reported within 23% of studies. The minimum important change and smallest detectable change results showed positive ratings, while the area under the curve (AUC) results were negative for BOT-2, MABC-2 and PDMS-2. This suggests that these tools are capable of detecting some change at an acceptable level but not a good level in this population. Floor and ceiling effects for the MABC-2 and PDMS-2 were both negative, while the BOT-2 reported a negative rating in one study72 and a positive rating in another.71 No floor and ceiling effects were reported for the other instruments.
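For readers less familiar with these responsiveness indices, the sketch below shows how a smallest detectable change and the share of participants at the floor or ceiling of a scale are commonly computed. It assumes the widely used formulation SDC = 1.96 × √2 × SEM with SEM = SD × √(1 − ICC); the included studies may have used other formulations, and the function names are hypothetical. A share above roughly 15% at either extreme is often treated as a floor or ceiling effect, but that cut-off is likewise an assumption here rather than a value taken from this review.

```python
import math


def smallest_detectable_change(sd: float, icc: float) -> float:
    """SDC = 1.96 * sqrt(2) * SEM, where SEM = SD * sqrt(1 - ICC)."""
    sem = sd * math.sqrt(1.0 - icc)
    return 1.96 * math.sqrt(2.0) * sem


def floor_ceiling_share(scores, min_score, max_score):
    """Return the share of participants at the lowest and at the highest possible score."""
    n = len(scores)
    floor = sum(1 for s in scores if s == min_score) / n
    ceiling = sum(1 for s in scores if s == max_score) / n
    return floor, ceiling
```

Under these assumptions, a change score smaller than the SDC cannot be distinguished from measurement error, which is why a tool with high reliability (ICC close to 1) yields a smaller SDC.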


Feasibility

The detailed information concerning the utility of each assessment is provided in online supplemental file 5. Table 4 presents a summary of the feasibility ratings for each assessment. Data are missing for the Ages and Stages Questionnaire-2, as this detail was not presented in the included studies and no manual could be located online or via the library, while the authors did not respond to contact made by the lead author. Results showed that only one measurement instrument (PDMS-2) took a long time to conduct (>60 mins), while the majority of instruments were rated positively, taking a short (<15 mins: Four Square Step Test; <30 mins: TGMD-2, TGMD-3, MABC-2) to moderate (30–60 mins: BOT-2, Test of Motor Proficiency, Data-Based Dance Skills Placement Test) length of time to complete. The amount of space required to conduct different tests was more varied, with the TGMD-2 and TGMD-3 requiring the largest space (60 feet and 18.3 metres of clear space is required, respectively, for the run task alone). With regard to the equipment needed to administer the tests, over half of the instruments received positive ratings, with either the equipment required likely to be present in a typical school (Test of Motor Proficiency, Four Square Step Test) or minimal additional equipment required (TGMD-2, TGMD-3, BOT-2, PDMS-2, Cratty Six-Category Gross Motor Test). The Data-Based Dance Skills Placement Test requires no equipment. A large proportion of the instruments (60%) do not report specific qualifications required to carry out the assessments; however, the MABC-2 and PDMS-2 require a researcher with specific qualifications. Further, results imply that, as well as familiarising themselves with the examiner manuals, administrators would need additional training for general administration and scoring purposes (0.5 to 1.5 days) for 50% of the instruments (TGMD-2, TGMD-3, BOT-2, MABC-2, Test of Motor Proficiency).
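The time bands reported above (short: under 30 minutes; moderate: 30 to 60 minutes; long: over 60 minutes) can be expressed as a simple rating rule. The sketch below is illustrative only: the function name and the string labels are hypothetical, and the full utility matrix in online supplemental file 2 includes further criteria that are not reproduced here. The conversion constant also shows that the 60 feet quoted for the TGMD-2 and the 18.3 metres quoted for the TGMD-3 describe essentially the same run distance.

```python
FEET_TO_METRES = 0.3048  # exact conversion factor


def rate_duration(minutes: float) -> str:
    """Classify administration time using the bands reported in the text."""
    if minutes < 30:
        return "short"
    if minutes <= 60:
        return "moderate"
    return "long"


# 60 feet of clear space for the run task, expressed in metres.
run_space_m = round(60 * FEET_TO_METRES, 1)
```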

Table 4

Feasibility scores for gross motor competence assessments

Risk of bias across studies

Online supplemental file 6 shows the risk of bias across studies for each measurement property per assessment. The quality of the evidence was low or very low across measurement properties for the Test of Motor Proficiency, Cratty Six-Category Motor Test, the Data-Based Dance Skill Placement Test, Ages & Stages Questionnaire-2 and the Four Square Step Test. The quality of the evidence for the remaining assessments (TGMD-2, TGMD-3, BOT-2, MABC-2, PDMS-2) was mixed and varied by measurement property, with moderate to high-quality evidence typically available for test-retest and inter-rater reliability, as well as internal consistency. The measurement properties were generally downgraded due to lack of available studies including a lack of high-quality studies, imprecision due to the total sample size included in the studies being below 100, and/or inconsistency in the similarity of results across studies for the measurement property.
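The downgrading reasons described above can be sketched as a simple rule set. This is a simplified illustration and not the modified GRADE procedure itself: it assumes that evidence starts at "high" and drops one level for each reason (no high-quality studies, pooled sample size below 100, inconsistent results), and the function and parameter names are hypothetical.

```python
LEVELS = ["very low", "low", "moderate", "high"]


def grade_quality(n_high_quality_studies: int,
                  total_sample_size: int,
                  results_consistent: bool) -> str:
    """Start at 'high' and downgrade one level per reason (an assumed simplification)."""
    level = len(LEVELS) - 1
    if n_high_quality_studies == 0:   # lack of high-quality studies
        level -= 1
    if total_sample_size < 100:       # imprecision
        level -= 1
    if not results_consistent:        # inconsistency across studies
        level -= 1
    return LEVELS[max(level, 0)]
```

For example, a measurement property supported by a single adequate study of 80 children with consistent results would be rated "moderate" under these assumed rules.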


Discussion

This systematic review aimed to identify and evaluate the measurement properties of field-based assessments of gross motor competence for use in children with ID and/or ASC aged 3 to 18 years old. In general, there was a lack of published studies (n=13) concerning gross motor competence assessments in ID and/or ASC populations. Nevertheless, the results suggest assessments exist which are psychometrically sound to examine motor competence in these populations within a field-based setting (eg, a school). However, the quality of the evidence was limited by the small number of studies and low pooled sample size of the included studies for each measurement property per assessment. Only three tools were assessed across each measurement dimension (validity, reliability and responsiveness): BOT-2,70 72 MABC-272 and PDMS-2,72 while none of the studies included in this review reported across all 12 measurement properties. Two studies reported on six measurement properties covering four tools: the BOT-2, MABC-2, PDMS-272 and the TGMD-3 (traditional and visual).68 However, most studies (71%) reported on three or fewer measurement properties, with test-retest reliability, inter-rater reliability and internal consistency properties most commonly reported. Taken together, the review has highlighted the lack of research conducted around the measurement of motor competence in children with ID and/or ASC and more research is warranted.

Of the 13 published studies in the review, most measurement instruments only appear in one (64%) or two (18%) papers. The BOT-2 and TGMD-2 both appeared within four studies and were therefore most commonly reported. These findings are consistent with systematic reviews of gross motor competence assessments for typically developing children.33 35 37 These previous reviews include over 30 different gross motor competence assessments that, due to a lack of published data on measurement properties in children with ID or ASC, were not included in the current review. These include traditional style assessments such as CHAMPS Motor Skill Protocol,76 and obstacle/circuit-based instruments such as Dragon Challenge77 and MOBAK-5-6,78 as well as the use of advanced technologies such as inertial sensors to measure competence.79 Further research examining the validity, reliability and feasibility of these tools and new technologies in children with ID or ASC is needed to ascertain their suitability for use with this population.


Validity

Internal consistency was the most commonly reported form of validity (reported in 69% of studies), followed by criterion/concurrent validity (reported in 38% of studies). The TGMD-2 reported good internal consistency when used on populations with ID.17 65–67 The BOT-2, MABC-2 and PDMS-2 also reported good internal consistency when used with ID and ASC populations.66 70 72 73 Positive criterion/concurrent validity was observed for the TGMD-3 visual when assessed against the TGMD-3 traditional68; the Four Square Step Test when assessed against the Functional Reach Test75; the Test of Motor Proficiency when assessed against the Behavioural Assessment Scale for Indian Children with Mental Retardation (BASIC-MR), Part A60; and the BOT-2, MABC-2 and PDMS-2 when compared with one another.72 In one study,73 criterion/concurrent validity data were unclear, resulting in indeterminate ratings for the Ages and Stages Questionnaire-2 when assessed against the PDMS-2.

Many other aspects of validity were underreported: content validity was not reported for any tools, while discriminant validity (BOT-271), construct/structural validity (TGMD-267) and cross-cultural validity/measurement invariance (BOT-271) properties were rarely reported. These findings differ somewhat from similar systematic reviews conducted in typically developing (TD) populations.36 37 These reviews found that construct validity was the most commonly assessed aspect of validity and that generally assessments had sound quantitative evidence for proposed factor structures for motor constructs. Our finding that content validity was reported the least is in agreement with Hulteen et al,37 yet in contrast to Scheuer et al,36 who found that content validity was the second most commonly reported form of validity (60% of studies). It should be noted that the differences observed between these reviews and ours may be linked to methodological factors. Regardless, the lack of information concerning content validity and the lack of testing for validity within studies included in the current review may suggest that it is assumed that tests developed for use with TD children will be valid and appropriate for use with children with ID and ASC. Given the importance of using assessment tools that are validated for use with that specific population,52 more validity testing is required in children with ID or ASC to make definitive statements regarding the validity of assessments.


Reliability

Test-retest reliability was the most commonly reported property for reliability, followed by inter-rater reliability, while only a single study examined intra-rater reliability.68 As noted by Hulteen et al,37 it is interesting that test-retest reliability was reported the most given that it is more time consuming, with greater burden for the participant, as this construct requires data collection on at least two time points for each participant, ideally 2 weeks apart. In comparison, inter-rater and intra-rater reliability constructs can be checked during the same testing session. Of the 10 measurement instruments reviewed in the current study, the TGMD-2, BOT-2 and PDMS-2 reported the highest levels of reliability. These studies included researchers,66 70 72 licensed physiotherapists,17 occupational therapists,70 72 university students,69 physical education specialists and psychomotor therapists67 when testing reliability. These three tools consistently demonstrated strong inter-rater17 66 67 72 73 and test-retest reliability (ICC or r >0.70).17 67 69 70 72 73 The TGMD-3 was the only tool that had data for all three reliability measurement properties,68 while the Ages and Stages Questionnaire-2 did not have any reliability data.73
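To make the reliability threshold concrete, the sketch below computes a Pearson correlation and an intraclass correlation coefficient (ICC) for test-retest scores and compares them with the 0.70 cut-off. It assumes the two-way random-effects, absolute-agreement, single-measurement form, often written ICC(2,1); the included studies may have used other ICC forms, and the function names are hypothetical.

```python
from statistics import mean


def pearson_r(x, y):
    """Pearson product-moment correlation between two score lists."""
    mx, my = mean(x), mean(y)
    sxy = sum((a - mx) * (b - my) for a, b in zip(x, y))
    sxx = sum((a - mx) ** 2 for a in x)
    syy = sum((b - my) ** 2 for b in y)
    return sxy / (sxx * syy) ** 0.5


def icc_2_1(ratings):
    """ICC(2,1). ratings: one row per child, each row holding k scores (eg, test and retest)."""
    n, k = len(ratings), len(ratings[0])
    grand = mean(v for row in ratings for v in row)
    row_means = [mean(row) for row in ratings]
    col_means = [mean(row[j] for row in ratings) for j in range(k)]
    ss_rows = k * sum((m - grand) ** 2 for m in row_means)
    ss_cols = n * sum((m - grand) ** 2 for m in col_means)
    ss_total = sum((v - grand) ** 2 for row in ratings for v in row)
    ss_err = ss_total - ss_rows - ss_cols
    msr = ss_rows / (n - 1)               # between-children mean square
    msc = ss_cols / (k - 1)               # between-sessions mean square
    mse = ss_err / ((n - 1) * (k - 1))    # residual mean square
    return (msr - mse) / (msr + (k - 1) * mse + k * (msc - mse) / n)


# Hypothetical scores: every child improves by exactly 2 points at retest.
test = [10, 12, 14, 16]
retest = [12, 14, 16, 18]
r = pearson_r(test, retest)
icc = icc_2_1([list(pair) for pair in zip(test, retest)])
acceptable = r > 0.70 and icc > 0.70
```

In this hypothetical example r is 1 because the rank order and spacing of scores are preserved, whereas the absolute-agreement ICC is lower because it also penalises the systematic 2-point shift between sessions. This is one reason the choice between r and ICC matters when judging test-retest reliability.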


Responsiveness

Assessments should be responsive and able to detect changes in gross motor competence, in order to monitor growth and development, and to evaluate the impact of interventions.30 In the current review, responsiveness and floor and ceiling effects were reported for three (BOT-2, MABC-2 and PDMS-2) of the 10 measurement instruments. Responsiveness properties were reported in two papers for the BOT-2 and demonstrated positive ratings in both studies, while the floor and ceiling effects showed mixed results (+ −), suggesting the tool can detect change but only at an acceptable level. Similar to previous systematic reviews in TD children,34–37 in general the reporting of responsiveness measurement properties within studies included in this review was limited. This could be due to the low prevalence of reporting.37 Alternatively, testing for responsiveness requires researchers to conduct longitudinal or experimental studies, which are more difficult and time consuming to conduct. Testing for responsiveness has been identified as an important area for future research.35


Given the wide range of disabilities within SEN school settings, it is important that field-based assessments are feasible for researchers and professionals to use in schools. Different factors can influence the feasibility of assessments,33 such as time, space, environment and administrator expertise/training.34 As the development of gross motor competence is optimal during childhood,6 feasibility was assessed in relation to primary school settings, and the majority of included studies (71%) were conducted within this environment. Feasibility varied both between and within instruments. For example, the PDMS-2 scored poorly for time and qualifications required to complete the test, yet scored well for the amount of space and equipment needed. Instruments that scored poorly with regard to ‘time’ tended to assess a wider array of gross motor domains and skills (BOT-2, PDMS-2 and Test of Motor Proficiency). When conducting these more detailed assessments, administrators could break testing sessions up by subsets to alleviate some of the time burden for participants.80 As feasibility information was incomplete for eight of the 10 measurement instruments, it is not possible to make a fair and ‘final’ judgement. However, the limited results suggest the Test of Motor Proficiency60 scored the highest and the PDMS-259 scored the lowest for feasibility, with the need for administrator training unclear. The Test of Motor Proficiency was specifically developed for the assessment of gross and fine motor skills of children with mild-moderate ID within SEN schools, perhaps explaining why it is the most feasible instrument within this study.60 More research is required to establish the feasibility of existing assessments for use with children with ID and ASC in primary school settings.

Strengths and limitations

Strengths of this paper include adherence to the PRISMA53 guidelines and the appraisal of a wide range of measurement properties to examine the quality of the assessments in terms of validity, reliability, responsiveness and feasibility. Limitations of the review include the exclusion of fine motor skills, which are an important constituent of motor development, and of laboratory or clinical-based assessments. These were excluded because we wanted to focus on gross motor competence assessments that can be administered by researchers and professionals in the children’s natural environments, such as home, school and community settings. Further, only papers written in English within peer-reviewed journals were included, meaning we may have missed some relevant assessments.


This is the first paper to systematically review the validity, reliability, responsiveness and feasibility of assessments of gross motor competence in children with ID and/or ASC. While 10 instruments were identified, the available evidence was of mixed quality: literature was sparse with many measurement properties unreported or not yet examined in the target population. The limited evidence available suggests that the BOT-2,57 followed by the TGMD-2,56 have the strongest measurement properties to support use of these assessments with children with ID and/or ASC to date. Assessments developed specifically for use with children with ID and/or ASC such as the Test of Motor Proficiency60 scored highest for feasibility, supporting the importance of using assessment tools designed for use with this specific population. More population-specific research is required to establish the validity, reliability, responsiveness and feasibility of existing assessments for use with children with ID and ASC in primary school settings.

Summary box

  • Gross motor competence is an essential foundation for daily life functioning and for building the more complex skills necessary for sport-specific activities or physical activity participation.

  • It is unclear which motor competence tool is reliable, valid and feasible for use with children with intellectual disabilities and/or autistic spectrum disorder.

  • This review identified the Bruininks-Oseretsky Test of Motor Proficiency, Second Edition, followed by the Test of Gross Motor Development-2 as assessments with the best population-specific measurement properties to date.

  • While there are many assessments of gross motor competence, more research is needed on their validity, reliability and feasibility for use in children with ID and/or autism.



  • Contributors SD and LF conceived the study and secured the funding. All authors were involved in the study design and set the scope of the review, agreed articles for inclusion, and made contributions to the quality appraisal of assessments. SD and LF wrote the manuscript. All other coauthors read, reviewed, commented and approved the final draft.

  • Funding This study was funded by the Baily Thomas Charitable Fund (award ref: TRUST/VC/AC/SG/4771-7962).

  • Competing interests None declared.

  • Patient consent for publication Not required.

  • Ethics approval Not required.

  • Provenance and peer review Not commissioned; externally peer reviewed.

  • Data availability statement All data relevant to the study are included in the article or uploaded as supplemental information.

  • Supplemental material This content has been supplied by the author(s). It has not been vetted by BMJ Publishing Group Limited (BMJ) and may not have been peer-reviewed. Any opinions or recommendations discussed are solely those of the author(s) and are not endorsed by BMJ. BMJ disclaims all liability and responsibility arising from any reliance placed on the content. Where the content includes any translated material, BMJ does not warrant the accuracy and reliability of the translations (including but not limited to local regulations, clinical guidelines, terminology, drug names and drug dosages), and is not responsible for any error and/or omissions arising from translation and adaptation or otherwise.