## Discussion

The CONSORT group is a long-standing international collaboration of medical professionals from a variety of areas related to research conduct and publication, including ‘trialists, methodologists and medicine journal editors.’4 Their primary aim has been to improve the reporting of RCTs, and their recommendations are endorsed by nearly 600 journals. Item 15 of the CONSORT statement advises presenting a table with baseline characteristics of the randomised groups, but discourages statistical testing of baseline differences.1 Our systematic evaluation of RCTs published in the sports medicine literature in 2005 and 2015 found that about two-thirds reported statistical testing of baseline differences. Our results suggest that the practice of statistical testing for baseline differences is more common in sports medicine journals than in the highest-impact medical journals such as *JAMA*, *BMJ* and *Lancet*.3 We also found that about 20% of studies across both years failed to include a baseline table.

We recognise that there are compelling reasons why authors might argue in favour of reporting statistical tests of baseline differences. One is to check whether the randomisation was successful.2 If covariate imbalances far exceed what one would expect by chance, then there is reason to question the randomisation process. In addition, p values provide a uniform measure of baseline differences, combining the magnitude of the differences and the sample size into a single number. Statistical tests are also easy to perform and provide the reader with more quantitative information. However, there are sound reasons not to present statistical tests of baseline differences.5 A statistical test estimates the probability of observing baseline differences at least as large as those seen if the groups were drawn from the same population. Yet, as described by the CONSORT statement, if the participants were truly randomised, then it is already known that any baseline differences are due to chance. Furthermore, the main concern of the CONSORT group is that statistical testing can ‘mislead investigators and their readers’1 (p 21).
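The point that baseline differences in a properly randomised trial are due to chance can be illustrated with a small simulation. The sketch below is not part of our analysis; it assumes a hypothetical continuous covariate (age, with an arbitrary mean of 30 and SD of 8 years), 100 participants per group and a large-sample z-test, all chosen purely for illustration. Because both groups are drawn from the same population, roughly 5% of simulated trials should show p < 0.05 at baseline, regardless of how well the randomisation worked.

```python
import math
import random


def two_sided_p_from_z(z):
    """Two-sided p value for a standard normal test statistic."""
    return math.erfc(abs(z) / math.sqrt(2))


def baseline_p_value(group_a, group_b):
    """Large-sample z-test for a difference in means between two groups."""
    mean_a = sum(group_a) / len(group_a)
    mean_b = sum(group_b) / len(group_b)
    var_a = sum((x - mean_a) ** 2 for x in group_a) / (len(group_a) - 1)
    var_b = sum((x - mean_b) ** 2 for x in group_b) / (len(group_b) - 1)
    se = math.sqrt(var_a / len(group_a) + var_b / len(group_b))
    return two_sided_p_from_z((mean_a - mean_b) / se)


def simulate_false_alarm_rate(n_trials=2000, n_per_group=100, seed=1):
    """Share of simulated randomised trials with a 'significant' baseline test."""
    rng = random.Random(seed)
    significant = 0
    for _ in range(n_trials):
        # Hypothetical covariate: every participant comes from one population.
        ages = [rng.gauss(30, 8) for _ in range(2 * n_per_group)]
        rng.shuffle(ages)  # random allocation to the two groups
        p = baseline_p_value(ages[:n_per_group], ages[n_per_group:])
        if p < 0.05:
            significant += 1
    return significant / n_trials


if __name__ == "__main__":
    rate = simulate_false_alarm_rate()
    print(f"Share of trials with p < 0.05 at baseline: {rate:.3f}")
```

Under these assumptions, a 'significant' baseline difference carries no information about the groups beyond the play of chance, which is the reasoning behind the CONSORT recommendation.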

How can statistical tests mislead investigators? Consider a study with a small number of participants. If investigators use p values for baseline differences to decide which covariates differed and which were roughly equal between randomised groups, then they may overlook potential confounding from a covariate whose p value falls above the cut-off for statistical significance, generally 0.05. This may be a substantial problem in the sports medicine literature, where many studies have small sample sizes: our evaluation of studies published in sports medicine journals found that about 80% sampled fewer than 100 participants. For example, in a 2007 RCT comparing operative and non-operative management of mid-shaft clavicle fractures,6 the outcomes (eg, strength, fracture non-unions) were known to correlate with the sex of the patient, a baseline covariate subject to randomisation. The operative group comprised 85% men and the non-operative group comprised 69% men, an absolute difference of 16 percentage points. Since there were only 111 participants in the trial, the p value for the difference in sex was 0.06 and the authors did not adjust their analyses, stating, “there were no demographic differences between the operative and non-operative groups”6 (p 6). Yet the difference in the proportion of men between the groups was large, even though it did not reach statistical significance at a p value cut-off of 0.05. Since men tend to be stronger and have fewer non-unions, this difference would be expected to affect the outcomes of the treatment groups. Conversely, in large studies, small differences may reach statistical significance yet not be meaningful. Authors may then adjust their analyses for these differences, adding extraneous covariates to their models that may have no consequences for the results.
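The dependence of the p value on sample size can be made concrete with a short calculation. The sketch below is illustrative only: it applies a pooled two-proportion z-test to the 85% versus 69% difference described above, but with hypothetical equal group sizes, so it does not reproduce the allocation or the test used in the cited trial and will not return its reported p value. The final line uses an invented 78% versus 76% comparison to show the converse problem in a large study.

```python
import math


def two_proportion_p_value(p1, p2, n_per_group):
    """Two-sided p value from a pooled two-proportion z-test.

    Assumes two groups of equal size (a simplification for illustration).
    """
    pooled = (p1 + p2) / 2  # pooled proportion when group sizes are equal
    se = math.sqrt(pooled * (1 - pooled) * 2 / n_per_group)
    z = (p1 - p2) / se
    return math.erfc(abs(z) / math.sqrt(2))


if __name__ == "__main__":
    # Same 16 percentage point difference, hypothetical group sizes.
    for n in (25, 50, 100, 500):
        p = two_proportion_p_value(0.85, 0.69, n)
        print(f"85% vs 69%, n = {n:>4} per group: p = {p:.4f}")

    # Converse case: a 2 percentage point difference in a large hypothetical trial.
    p_large = two_proportion_p_value(0.78, 0.76, 5000)
    print(f"78% vs 76%, n = 5000 per group: p = {p_large:.4f}")
```

In this sketch the imbalance is identical in every row of the first comparison; only the hypothetical sample size moves the p value across the 0.05 threshold. A decision to adjust based on that threshold therefore reflects the size of the trial as much as the size of the imbalance.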

Our study had limitations. While we aimed to identify all RCTs in the included journals, it is possible that some were missed in our PubMed search due to ambiguous language in titles and abstracts. We attempted to minimise this risk by using a publication type term (applied as part of formal indexing in PubMed) and several keywords related to randomisation. Although each article was independently reviewed by two authors, there is a chance that some articles were misclassified. Finally, for convenience we selected only the years 2005 and 2015. There is a small possibility that these two years were outliers. We welcome other researchers applying similar methods to RCTs published in other years.

In summary, we found that 65% of RCTs in the sports medicine literature reported statistical testing of baseline differences between randomised groups, a value that changed little between 2005 and 2015. Reporting statistical tests of baseline differences runs counter to the recommendations of the 2010 CONSORT statement.1 Authors should understand the rationale for and against statistical testing of baseline differences. Ideally, prior to the analysis, authors should select baseline covariates for adjustment (ie, those known to affect the outcome) and incorporate these covariates into their models. Journals that ask authors to follow the CONSORT statement guidelines should be aware that many manuscripts do not follow the recommendation against statistical testing of baseline differences.