The validity of the Youth Physical Activity Questionnaire in 12–13-year-old Scottish adolescents

Background The development of accurate methods to measure health behaviours forms an integral component of behavioural epidemiology. Population surveillance of physical activity often relies on self- or proxy-reported questionnaires because of their low cost and relative ease of administration. The aim of this study was to examine the criterion validity and measurement agreement between the Youth Physical Activity Questionnaire (YPAQ) and accelerometry before the YPAQ was included in a Scotland-wide study. Methods Forty-four participants (12–13 years old; 61% girls) completed the YPAQ following 7 days wearing the ActiGraph GT3X+. Mean moderate-to-vigorous physical activity (MVPA) per day was derived from YPAQ and accelerometer, and validity was assessed using Spearman's correlation; Bland-Altman plots examined absolute agreement between methods. Results Pearson's and Spearman's correlations between YPAQ and accelerometer were r = 0.47 and rs = 0.39 (p<0.01), respectively. The YPAQ over-reported mean MVPA by 25.6 ± 50.2 minutes (95% CI 10.4 to 40.9 minutes; p<0.001), with 95% limits of agreement of −72.69 minutes and +123.99 minutes. There was evidence of under-reporting at lower levels of activity and over-reporting at higher levels of activity (Pearson's r=0.81), in addition to heteroscedasticity, where variances increased as MVPA increased. Conclusions Although a moderate correlation between the two methods was apparent, the YPAQ should not be used interchangeably with accelerometry. The YPAQ does demonstrate a reasonable ability to rank MVPA, although it tends to under-report lower levels and over-report higher levels. This, and other administration factors, should be taken into consideration if the YPAQ is used for group or individual level analyses.

BACKGROUND
The development of accurate methods to measure health behaviours forms an integral component of behavioural epidemiology. 1 Within physical activity (PA) research, high quality measures are crucial in all stages of the research process, including population surveillance. Accelerometry and movement sensors have become a widely used objective method for quantifying PA levels through their ability to derive information relating to frequency, duration and intensity of PA from actual body movement/acceleration. Although successfully integrated into large-scale studies, 2 only a few population level datasets exist using this particular method. Self- or proxy-reported questionnaires remain popular, despite a number of limitations: 3 4 questionnaire responses depend on perception, encoding, storage and retrieval of information; 5 and concerns exist over the accuracy of questionnaire data from children under 10 years due to their cognitive underdevelopment. 6 These concerns translate to poor validity coefficients, 7 where a tendency exists for questionnaires to over-report PA levels compared with directly measured PA. 8 However, within population surveillance research, a questionnaire approach requires less technical knowledge and expertise and is considered less burdensome than accelerometry. Although cheaper and more practical to administer, 9 13 we set out to examine the ability of the YPAQ to accurately capture the main outcome variable used to assess guideline adherence, namely moderate-to-vigorous physical activity (MVPA). Specifically, we examined the individual level criterion validity and measurement agreement between YPAQ-derived MVPA and accelerometry-derived MVPA (ActiGraph GT3X+; ActiGraph LLC, Pensacola, FL, USA) to assess the suitability of the YPAQ to measure this outcome variable for inclusion in the SPACES study.

METHODS
Participants
A convenience sample of 90 adolescents (12-13 years old) from two schools in Central/West Scotland were invited to take part. Participants were automatically enrolled (following participant assent) in the study unless parents withdrew consent (opt out consent).
Ethical approval for the study was granted by the University of Glasgow's College of Social Sciences, the participating schools' local educational authorities and the head teachers of both schools. The study fieldwork was conducted in May 2013 and included three school days, two weekend days and two days which fell on public holidays.

Measures
Objective measurement: accelerometer
PA was measured using an accelerometer (ActiGraph GT3X+) worn on a belt around the waist for seven consecutive days. The GT3X+ is a small (4.6×3.3×1.5 cm), lightweight (19 g), tri-axial device that records and stores raw acceleration signals in three axes, at a user-specified sample rate (between 30 and 100 Hz). It has a dynamic range of ±6 G and memory capacity of 512 MB. ActiGraph devices are used extensively, and internationally, in children's PA research; 2 14 15 the GT3X+ has been validated against indirect calorimetry in children's energy expenditure research. 16 Following data collection, ActiGraph data were uploaded to a computer for post-processing using ActiGraph's proprietary software (ActiLife, v6.7.1). PA files were trimmed to include only the measurement period. The software aggregated the raw acceleration data (100 Hz) into 30-second epochs. Periods of 60 consecutive zeros, allowing for 'spikes' of 2 min of activity (less than 100 counts/min), were classified as non-wear and excluded from all PA outcome measures. Participants had to wear the device for at least 500 min for a day to be classified as valid, 9 and a minimum of three valid days was required for inclusion in the analyses. 17 18 MVPA per valid day per participant was extracted using the Evenson cut point (>2295 counts per minute). 19 Mean MVPA was calculated per participant (as a function of number of valid days per participant), and then across the full sample.
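The inclusion rules above (30-second epochs, the >2295 counts/min cut point, 500 min of wear per valid day and at least three valid days) can be illustrated with a minimal Python sketch. It is not the ActiLife implementation. The function names, the input layout and the choice to scale each epoch to counts per minute before applying the cut point are assumptions made for illustration, and wear time is taken as already computed rather than derived from the non-wear rule.

```python
EVENSON_MVPA_CPM = 2295   # cut point from the text, in counts per minute
EPOCH_SECONDS = 30        # epoch length from the text
MIN_WEAR_MINUTES = 500    # wear time needed for a valid day
MIN_VALID_DAYS = 3        # valid days needed for inclusion


def mvpa_minutes(epoch_counts):
    """Minutes of MVPA in one day, given counts per 30-second epoch."""
    epochs_per_min = 60 / EPOCH_SECONDS
    # Assumption: scale each epoch to counts/min before applying the cut point.
    mvpa_epochs = sum(
        1 for c in epoch_counts if c * epochs_per_min > EVENSON_MVPA_CPM
    )
    return mvpa_epochs / epochs_per_min


def mean_daily_mvpa(days):
    """Mean MVPA (min/day) over valid days, or None if the participant
    has fewer than MIN_VALID_DAYS valid days.

    days: list of (wear_minutes, epoch_counts) tuples, one per day
    (a hypothetical layout; wear_minutes is assumed precomputed).
    """
    valid = [
        mvpa_minutes(counts)
        for wear, counts in days
        if wear >= MIN_WEAR_MINUTES
    ]
    if len(valid) < MIN_VALID_DAYS:
        return None
    return sum(valid) / len(valid)
```

Returning `None` for participants with too few valid days mirrors their exclusion from the agreement analyses instead of silently averaging over incomplete data.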
Self-reported questionnaire: YPAQ
The YPAQ contains 47 different activities and requests participants to report the frequency and duration of each activity for both weekdays and weekend days over the past 7 days. The YPAQ is broken into contextual settings/domains: sporting, leisure, school and free-time activities. 11 On completion of the accelerometer protocol (on day 8), participants attended a large classroom, where trained fieldworkers assisted with the completion of the YPAQ over an allocated school period (55 min). The fieldworkers read the instructions, showed an example of how a question should be filled out and allowed the pupils to ask questions before starting. Upon completion, fieldworkers were instructed to check for errors or omissions (eg, missed questions, illegible/ambiguous answers).

Scoring
Each activity in the questionnaire was assigned a metabolic equivalent (MET) value according to previously published values. 20 For the purposes of this study, activities with values above 4 METs were considered to be at least moderate and included in the analysis. 21 The activities included cricket, dancing, football, gymnastics, martial arts and rugby. Mean time per day in MVPA was calculated per participant (derived from the total MET minutes divided by seven) and then across the group.
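As a rough illustration of this scoring rule, the sketch below keeps only activities whose MET value is above 4 and averages the reported time over the 7 recalled days. The response layout (one weekly frequency and one session length per activity) and the function name are hypothetical. The sketch also assumes that the daily figure is time in qualifying activities (minutes) and not MET-weighted minutes, since the results are reported in minutes per day. MET values are passed in by the caller and are not reproduced here.

```python
MVPA_MET_THRESHOLD = 4.0  # activities above this value count as MVPA
DAYS_RECALLED = 7         # the YPAQ asks about the past 7 days


def ypaq_mvpa_per_day(responses, met_values):
    """Mean self-reported MVPA in minutes per day.

    responses:  {activity: (sessions_in_week, minutes_per_session)}
                (hypothetical layout, not the YPAQ form itself)
    met_values: {activity: MET value taken from published tables}
    """
    total_minutes = 0.0
    for activity, (sessions, minutes) in responses.items():
        # Only activities above the MET threshold contribute.
        if met_values[activity] > MVPA_MET_THRESHOLD:
            total_minutes += sessions * minutes
    return total_minutes / DAYS_RECALLED
```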

Statistical analyses
The null hypothesis that no bias exists between measurement methods (YPAQ vs accelerometer) was initially tested using a paired t-test. The strength of the association between both measures was tested using Pearson's correlation and Spearman's rank correlation. A Bland-Altman plot, 22 showing mean bias and 95% limits of agreement, was used to assess the degree of absolute agreement between methods, and differences between measurements were calculated for each participant (YPAQ−accelerometer) and plotted against the mean of each method ((YPAQ+accelerometer)/2). The relationship between these differences (YPAQ−accelerometer) and the mean was tested using a Pearson correlation. This provided an indication of the dependency of the differences on the underlying measurement range.
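The Bland-Altman quantities described above can be sketched in a few lines of Python. This is an illustration only, not the authors' code: the function names are hypothetical, and the limits are taken as bias ± 1.96 × SD of the differences, which is the conventional definition.

```python
import math
import statistics


def pearson(x, y):
    """Pearson correlation coefficient of two equal-length sequences."""
    mx, my = statistics.mean(x), statistics.mean(y)
    sxy = sum((a - mx) * (b - my) for a, b in zip(x, y))
    sxx = sum((a - mx) ** 2 for a in x)
    syy = sum((b - my) ** 2 for b in y)
    return sxy / math.sqrt(sxx * syy)


def bland_altman(ypaq, accel):
    """Mean bias, 95% limits of agreement, and the correlation between
    the differences and the means of the two methods."""
    diffs = [y - a for y, a in zip(ypaq, accel)]        # YPAQ - accelerometer
    means = [(y + a) / 2 for y, a in zip(ypaq, accel)]  # mean of both methods
    bias = statistics.mean(diffs)
    sd = statistics.stdev(diffs)                        # sample SD (n - 1)
    return {
        "bias": bias,
        "lower": bias - 1.96 * sd,
        "upper": bias + 1.96 * sd,
        "r_diff_mean": pearson(diffs, means),
    }
```

As a check on the arithmetic, a bias of 25.65 min with an SD of about 50.2 min gives limits of roughly 25.65 ± 98.4, that is about −72.7 and +124.0 min, which agrees with the reported limits up to rounding of the SD.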
Considering accelerometry to be the criterion method, the values representing the differences (YPAQ−accelerometer) were plotted against the accelerometer (figure 3A). Potential heteroscedasticity across the range of MVPA (accelerometer) was assessed by conducting a Breusch-Pagan/Cook-Weisberg test, 23 visually represented by plotting the residuals versus predicted values (figure 3B).
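For readers unfamiliar with the test, the sketch below implements the original score form of the Breusch-Pagan/Cook-Weisberg statistic for a single predictor: the squared residuals are scaled by their mean, regressed on the fitted values, and half of the explained sum of squares is compared with a chi-squared distribution on 1 degree of freedom. Whether this matches the exact options used in the authors' software is an assumption, and the function names are hypothetical. The sketch assumes a non-zero fitted slope and an imperfect fit, so that neither regression is degenerate.

```python
import math


def ols(x, y):
    """Intercept and slope of a simple least-squares line."""
    n = len(x)
    mx, my = sum(x) / n, sum(y) / n
    sxx = sum((a - mx) ** 2 for a in x)
    slope = sum((a - mx) * (b - my) for a, b in zip(x, y)) / sxx
    return my - slope * mx, slope


def breusch_pagan(x, y):
    """Score test for non-constant variance of y given x.

    Here x would be accelerometer MVPA and y the differences
    (YPAQ - accelerometer). Returns (chi2_statistic, p_value).
    """
    n = len(x)
    b0, b1 = ols(x, y)
    fitted = [b0 + b1 * a for a in x]
    resid = [b - f for b, f in zip(y, fitted)]
    sigma2 = sum(e * e for e in resid) / n       # ML variance estimate
    g = [e * e / sigma2 for e in resid]          # scaled squared residuals
    g0, g1 = ols(fitted, g)                      # auxiliary regression
    g_bar = sum(g) / n
    ess = sum((g0 + g1 * f - g_bar) ** 2 for f in fitted)
    stat = ess / 2
    # Upper tail of a chi-squared distribution with 1 degree of freedom.
    p_value = math.erfc(math.sqrt(stat / 2))
    return stat, p_value
```

A small p-value indicates that the residual variance changes with the fitted values, which is the pattern the residual-versus-predicted plot is meant to show.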
All statistical analyses were performed using Stata version 13 (Stata Corp, College Station, TX, USA).

RESULTS
Of the original 90 participants invited, seven opted out prior to the study commencing. A further six withdrew their consent during the study, four were absent during data collection and two accelerometers were lost during the monitoring period, leaving 71 participants who took part in the full data collection period. Forty-four participants (61% girls) provided at least three valid days of PA and were included in the agreement analyses. The mean age was 12.7 years.

Mean PA levels
On average, children spent 58.2±20.3 min per day in MVPA according to accelerometry, with boys spending approximately 1.3 more minutes per day in MVPA than girls. Self-reported time spent in MVPA was much higher than that recorded by accelerometer, with an average time of 99.8±56.2 min per day, with boys reporting on average 29.3 more minutes in MVPA than girls (table 1).

Validity coefficients
Pearson's and Spearman's correlations between YPAQ and accelerometer were r=0.47 and rs=0.39 (p<0.01), respectively, indicating a statistically significant medium monotonic relationship between the two methods.
The mean difference in minutes spent in MVPA between YPAQ and accelerometer was 25.6±50.2 min (95% CI 10.4 to 40.9; p<0.001); the mean difference between methods was more pronounced among boys (table 2). Figure 1A demonstrates that data points do not fall on the line of equality (perfect agreement) across levels of measurement. This initial plot illustrates that YPAQ scores tend to be greater than accelerometer-derived MVPA, with a slight trend in the bias: negative (YPAQ scores lower) for lower levels of accelerometer-derived MVPA and positive for high levels of accelerometer-derived MVPA.

Agreement between methods
The Bland-Altman plot (figure 1B) identified a mean bias between the methods of 25.65 min of MVPA, with 95% limits of agreement of −72.69 and +123.99 min (YPAQ−accelerometer). There is evidence of both under- and over-reporting, dependent on the mean level of MVPA. The differences tended to be negative when mean MVPA was low and positive when mean MVPA was high. Pearson's correlation between the difference and the mean was 0.81, indicating a significant positive linear relationship between these two variables. Where there are instances of a relationship of this magnitude, Bland and Altman 24 suggest a regression approach for non-uniform differences (figure 2). Using this approach, the limits are slightly narrower at lower levels of MVPA and widen as MVPA increases. The Breusch-Pagan/Cook-Weisberg test (figure 3A and B) was conducted to test for constant variance of residuals across predicted values (of differences between measurement methods). On the full sample, the null hypothesis that all error variances were equal could not be rejected (p=0.2899). However, once influential cases and outliers (figure 3B, circled data points) were identified and removed, there was evidence of heteroscedasticity, as shown in figure 3B (the variance increases as the values increase).
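The regression approach for non-uniform differences can be sketched as follows. This follows the general recipe described by Bland and Altman: regress the differences on the means, regress the absolute residuals on the means, and form limits as the fitted difference ± 1.96 × √(π/2) × the fitted absolute residual (a multiplier of about 2.46). It is a sketch under those assumptions, not a reproduction of the analysis behind figure 2, and the function names are hypothetical.

```python
import math


def fit_line(x, y):
    """Intercept and slope of a simple least-squares line."""
    n = len(x)
    mx, my = sum(x) / n, sum(y) / n
    sxx = sum((a - mx) ** 2 for a in x)
    slope = sum((a - mx) * (b - my) for a, b in zip(x, y)) / sxx
    return my - slope * mx, slope


def regression_limits(ypaq, accel):
    """Return a function mapping mean MVPA to (lower, upper) limits of
    agreement that vary across the measurement range."""
    diffs = [y - a for y, a in zip(ypaq, accel)]
    means = [(y + a) / 2 for y, a in zip(ypaq, accel)]
    b0, b1 = fit_line(means, diffs)              # trend in the bias
    abs_resid = [abs(d - (b0 + b1 * m)) for d, m in zip(diffs, means)]
    c0, c1 = fit_line(means, abs_resid)          # trend in the spread
    k = 1.96 * math.sqrt(math.pi / 2)            # converts mean |residual| to 1.96 SD

    def limits(mean_mvpa):
        centre = b0 + b1 * mean_mvpa
        half_width = k * (c0 + c1 * mean_mvpa)
        return centre - half_width, centre + half_width

    return limits
```

When the slope of the second regression is positive, the limits widen as mean MVPA increases, which is the behaviour described in the text.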

CONCLUSIONS
Interpretation of findings
The main purpose of the analysis was to investigate the validity metrics of the YPAQ as a self-reported measure for extracting time spent in MVPA in young adolescents as compared with accelerometry. In the event of its acceptability, the measure could be translated to testing in the population setting where its purpose would be to estimate the population prevalence of children meeting the PA guidelines. The results demonstrated that a moderate linear correlation existed between methods (Pearson's r=0.47; Spearman's rs=0.39), although results from the Bland-Altman analysis demonstrated a poor level of agreement, with error between measures dependent on the underlying PA level (r=0.81).
We were interested in determining whether the YPAQ would be a valid proxy for accelerometry, given that questionnaires could be considered more practical for population surveillance than activity monitors. Sallis and Saelens 25 have stated the importance of measuring absolute levels of validity when a measure is used to assess the prevalence of children meeting PA guidelines. As can be seen from our findings, the agreement between the two methods becomes less evident as MVPA increases (error and overestimation increase), effectively widening the limits of agreement. The YPAQ, although demonstrating acceptable validity through correlational metrics, including the ability to rank individuals' PA, shows systematic bias through the measurement range as demonstrated by the Bland-Altman analyses. We would therefore advise caution if it is used to extract accurate levels of MVPA to be used in population prevalence estimates.

Comparisons with the original validation work
The initial validation work undertaken by Corder and colleagues 11 was conducted using a population group (12-13 year olds; n=25) similar to that of the present study (12-13 year olds; n=44). Compared with participants in our study, those in the Corder study recorded 14 min/day more in MVPA (72 vs 58 min) as measured by accelerometry; median MVPA by YPAQ in the Corder sample was 92 min/day compared with 100 min/day in our sample. The differences in accelerometry can be explained, to some degree at least, by the particular cut point used in each method (>1952 vs >2295 counts/min), although the use of different ActiGraph models, epoch length from which the MVPA was calculated and processing options, such as non-wear time and total valid time per day, will have also contributed to these differences. YPAQ scores were  Both studies demonstrated a general over-reporting of MVPA by the YPAQ although a stronger, and significant, bias was found in the present study; 22.4 min MVPA/week (95% CI −155.6 to 200.4) in the Corder sample compared with 25.6 min/day (95% CI 10.4 to 40.9) in the present study. Only our study found that the degree of questionnaire error was dependent on activity level (Pearson correlation of 0.81 vs 0.02), with the complicated pattern observed suggesting under-reporting at lower levels of activity and over-reporting at higher levels. The dependence of error across the measurement range is seldom reported in the literature, 7 but it can be seen from the present study that this finding may be considered problematic if the YPAQ is used in population surveillance studies where guideline prevalence is of key importance.

Comparisons with other literature
The literature supports the premise that self-reported/indirect measures of PA may over-report activity levels. In a systematic review conducted by Adamo and colleagues, 8 it was found that 72% of the reviewed indirect measures overestimated the directly measured values. Within the same review, correlations ranged from −0.56 to 0.89, highlighting both negative and positive relationships between the measures. In contrast to our findings, the review by Adamo and colleagues reported that girls were more likely to overestimate direct values of PA than boys (by 584% vs 114%, girls and boys, respectively). Why we have observed the opposite pattern is unclear, although one potential explanation may be that football and running were more commonly recorded among boys, often with large and extreme values.
The ability of the YPAQ to successfully rank MVPA is supported by a number of recent reviews on self-reported measures. 7 26 However, having the ability to rank PA is different to its ability to be used accurately as a surrogate for PA prevalence. Helmerhorst and colleagues in their recent review suggested that 'despite considerable effort, accurate and precise self-report physical activity instruments are still scarce'. 7 The reduction of a complex multidimensional construct (PA) into a single metric, potentially misleading understandings of what criterion methods are and, importantly, a lack of a comprehensive measurement framework have been cited as potential reasons for the inconsistencies seen in the literature. 27 We have to consider the participants themselves when we discuss inconsistency. Feedback from our fieldworkers suggested that many participants struggled with the concept of frequency and duration of activities. Cognitive immaturity, including memory recall, and the comprehension of questionnaire content can be problematic in youth. 25 Future work will collect data from 10-year-old children, and the questionnaire will be self-administered rather than interviewer/fieldworker administered. Our experience in this study, with older children, suggests that issues may arise over comprehension of the YPAQ, and consequently affect data quality. A recent review 28 assessed 89 PA measures for their applicability to population surveillance and identified a small group of measures that received scientific and expert support. One of these measures, the Physical Activity Questionnaire for Children (PAQ-C), 10 may address the memory and comprehension issues faced within this study by assessing general levels of PA rather than trying to ascertain all facets of the behaviour.

Strengths and limitations
This study endeavoured to replicate the original design conducted by Corder and colleagues, 11 including similar participant ages and statistical approach. A strong scientific approach, particularly in measurement studies, is one where previous work is replicated, challenged, supported or refuted. This is especially true when employing different population groups in different settings. By doing so, greater confidence in the YPAQ's validation properties can be expected. As advocated in the literature, 29 we have been clear with regard to the measurement purpose, derivation of our outcome and analysis. Furthermore, this study tried to improve data quality by allocating significant time and resources to the process, including the employment of fieldworkers to actively supervise questionnaire completion.
One limitation of our study is that our sample size lacked the power to detect subgroup differences (eg, by gender), as 52% (n=46) of the participants did not meet the accelerometry inclusion criteria. Additionally, our decision to include children with 3 days of valid PA may not have been sufficient to provide an accurate representation of daily PA. 30 Moreover, these valid days may not have included a weekend day, which often involves less wear time and lower PA levels than weekdays. 31 Therefore, compared with the YPAQ, which asks participants to recall based on the 'previous 7 days', we may have introduced error into the analyses and inflated the level of MVPA as measured by accelerometry. Even so, there remained a significant overestimation by the YPAQ relative to the accelerometer. Furthermore, the accelerometers were worn during waking hours and removed only for water-based activities and contact activities. As such, some activities (eg, swimming or rugby) may not have been recorded. Additionally, some activities (eg, cycling) may have been misclassified due to the placement of the accelerometer. This information could be included in future studies through the addition of self-reported cycling time; doing so may reduce the size of the overestimation.
In summary, although moderately correlated, these two methods should not be used interchangeably as agreement was poor, with error in the measurement highly dependent on activity level. From a practical perspective, the face-to-face administration of the YPAQ highlighted a number of concerns, and its employment in population surveillance (where face-to-face delivery may not be possible) to extract individual level MVPA should be considered carefully. Conversely, if a suitable standardised error was identified and adjusted for, then the YPAQ could be a cheaper, more practical way to measure PA if methods were employed to improve in situ participant comprehension.
Contributors PRWM was responsible for the original idea, with all authors contributing to the analytical approach. AP was the dedicated statistician on the paper, conducting all analyses. PRWM was responsible for drafting the paper and AP assisted with the Methods and Conclusions sections. All authors reviewed, commented on and signed off the final draft.
Funding All authors are supported by the UK Medical Research Council Neighbourhoods and Communities Programme (MC_UU_12017/10).