Introduction
As a service to its readership, BJSM publishes educational editorials in order to provide methodological guidance and direction to clinicians and researchers.1 For example, a recent BJSM invited commentary outlined important methodological aspects of randomised controlled trials (RCTs) in order to help readers better understand and interpret findings.2 Similarly, a recent BJSM editorial written for clinicians outlined five essential methodological considerations or ‘hacks’ to consider when reading a systematic review (SR), while another editorial outlined common pitfalls for authors conducting a SR.3 4 In light of recent suggestions that the findings of systematic reviews and meta-analyses (SRMA) may be untrustworthy,5–7 researchers are encouraged to maximise the transparency of their work. Similarly, clinicians and policymakers are encouraged to scrutinise SRMA in order to decide whether the findings should influence clinical decisions or policies. The purpose of our two-part educational editorial series is to highlight important methodological aspects of SRMA of the effects of interventions and thus help clinicians make judicious choices when considering whether to accept the findings. By using a worked example, we hope this resource serves as a guide for clinicians when critically appraising SRMA findings and deciding whether to accept the reported findings or the authors’ interpretation.
What is a systematic review and meta-analysis and how should it inform my practice?
A SR attempts to collate all empirical evidence that fits prespecified eligibility criteria in order to answer a specific research question.8 An interventional SR specifically assesses the benefits and harms of an intervention and should provide an unbiased, robust, transparent and reproducible overview of the effect of an intervention and the quality of the evidence from identified clinical studies.9 Should the SR contain sufficient and appropriate quantitative data from the included studies, a meta-analysis (MA) can be conducted to provide a pooled estimate of effect and to examine variation among individual study effect estimates.9 As such, SRMA represent the highest standard of evidence of the effects of interventions and have the capacity to usefully inform clinical decisions, reduce research waste and direct policy making.3 5 7 10 However, the flip side, and an increasingly worrying prospect, is that poorly conducted or redundant SRMA may mislead clinicians, policy makers and the public.6 11–15 This can reasonably lead clinicians to question whether the findings of a SRMA can be trusted.16 Thus, it is vital that clinicians have the confidence and ability to assess the quality of a SRMA and therefore interpret its findings appropriately. We encourage clinicians to appraise the findings before deciding ‘Should this SRMA change my practice?’. Central to this decision is an understanding of transparency and reproducibility in SRMA. The worked example below is intended to illustrate these issues.
How is treatment effect determined?
An interventional SRMA pools the effects from multiple studies to obtain a pooled estimate of the effect of an intervention. Studies that use the same measure for a continuous outcome (e.g. pain intensity during running using a visual analogue scale [VAS]) may be pooled using an effect measure called the ‘mean difference’ (MD). The pooled effect size from the MA reflects the weighted average across studies of the difference (in pain during running) between groups, observed in each study, at the time point of interest (e.g. 12 weeks postintervention).17 Studies that use different measures for continuous outcomes may be pooled using an effect measure called the ‘standardised mean difference’ (SMD). Here, the difference in outcome between groups in each study is standardised to a uniform scale (by dividing this value by the standard deviation (SD) observed in the outcome across both groups). Thus, different measures such as a VAS (usually scored 0–100) and the pain component from the SF-36 questionnaire which is scored from 1 to 6 (1=None; 6=Very severe) can be pooled in a MA. The SMD is often more difficult to interpret than a MD because the scale reflects units of SD in outcome, rather than units of difference in outcome.17
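The arithmetic behind these two effect measures can be sketched in a few lines. The figures below are invented purely for illustration, and the SMD here is the Cohen's d form; meta-analysis software typically applies a small-sample correction such as Hedges' g, so real pooled values may differ slightly:

```python
import math

def mean_difference(mean_tx, mean_ctrl):
    """MD: difference in group means, on the outcome's original scale."""
    return mean_tx - mean_ctrl

def standardised_mean_difference(mean_tx, sd_tx, n_tx, mean_ctrl, sd_ctrl, n_ctrl):
    """SMD (Cohen's d form): the between-group difference divided by the
    SD pooled across both groups, giving an effect in units of SD."""
    pooled_sd = math.sqrt(((n_tx - 1) * sd_tx**2 + (n_ctrl - 1) * sd_ctrl**2)
                          / (n_tx + n_ctrl - 2))
    return (mean_tx - mean_ctrl) / pooled_sd

# Hypothetical trial: pain during running on a 0-100 VAS (lower = less pain)
md = mean_difference(35.0, 45.0)                                  # -10 VAS points
smd = standardised_mean_difference(35.0, 15.0, 30, 45.0, 18.0, 30)
```

On a 0–100 VAS the MD of −10 points is directly interpretable, whereas the same comparison expressed as an SMD of roughly −0.6 is in units of SD, which illustrates why the SMD is often harder to interpret clinically.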
In our worked example, Miller et al (2017) reported a pooled SMD of 0.47 (95% CI 0.22 to 0.72, p<0.001), which suggests a moderate treatment effect in favour of platelet-rich plasma (PRP) injections for tendinopathy. This is illustrated in the forest plot below (figure 1). In this example, negative values favour control injections, while positive values favour PRP. The SMD for each study is indicated by the boxes in the forest plot and is calculated from the mean outcome score, SD and sample size of each arm of each study at the time point of interest. Of note, the included studies used different outcome measures which measured different outcomes: for example, pain intensity (scored on a VAS) and disability (the Victorian Institute of Sport Assessment—Achilles (VISA-A)). This comparison raises a key question: was the use of SMD appropriate?
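The weighting behind a pooled estimate in a forest plot can be illustrated with a minimal inverse-variance sketch. The per-study SMDs and standard errors below are hypothetical, and a fixed-effect model is shown for simplicity; the example SRMA may well have used a random-effects model, which additionally incorporates between-study variance:

```python
import math

def pool_inverse_variance(effects, ses):
    """Fixed-effect inverse-variance pooling: each study's effect is weighted
    by 1/variance, so more precise studies pull the pooled estimate harder."""
    weights = [1 / se**2 for se in ses]
    pooled = sum(w * e for w, e in zip(weights, effects)) / sum(weights)
    pooled_se = math.sqrt(1 / sum(weights))
    ci = (pooled - 1.96 * pooled_se, pooled + 1.96 * pooled_se)
    return pooled, ci

# Hypothetical per-study SMDs (positive favours PRP) and their standard errors
pooled, (lo, hi) = pool_inverse_variance([0.6, 0.3, 0.5], [0.25, 0.20, 0.30])
```

Because each study's weight depends on its precision, the pooled estimate sits closest to the most precisely estimated study rather than the simple average of the three SMDs.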
The varied outcome measures employed by trials included in the example SRMA are listed in figure 2 and raise concerns about what treatment effect is actually being reported. Pain (e.g. VAS) is a different outcome than disability (e.g. VISA-A) as they reflect different domains of the condition.18 The clinician could be justified in asking ‘how could a result from a pain score at the elbow be compared with a disability scale for Achilles tendinopathy?’. The authors correctly did not use MD, given that the outcome measures differed across trials. However, combining measures of different domains of tendinopathy with the SMD is nevertheless not appropriate here. The SMD is used to standardise effects from different scales that measure the same outcome domain. It may not be used to standardise effects from scales used to measure different outcome domains (pain and disability, in this example). Consequently, the pooled effect of 0.47 is difficult to interpret in terms of what outcome domain the effect pertains to. It may be argued that the example SRMA would have better informed clinicians had the authors pooled measures for pain and disability into separate analyses. In that case, use of the SMD would have been appropriate, as the different scales within each analysis would have measured the same outcome domain.
Interpreting the treatment effect and considering its trustworthiness
Clinicians do not work with SMDs in practice. Translating a pooled SMD into clinically usable terms is aided when the original trial data (between-group comparisons) are presented by the authors. Having these data readily available would allow clinicians to gain some understanding of the between-group differences that underlie the reported SMD. For example, a SMD for the VISA-A of 0.03 (95% CI −0.51 to 0.56) is very difficult to interpret in clinical terms. It indicates essentially no difference between groups at the time point of interest, but it becomes easier to interpret once we know that it is derived from a control group mean (SD) of 22.4 (17.2) and a PRP group mean (SD) of 21.8 (25.9) on the VISA-A.
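As a sketch of the arithmetic, the group summaries quoted above reproduce the magnitude of that SMD. Equal group sizes are assumed here because the trial's sample sizes are not given in the text, and the sign depends on which direction of the VISA-A is taken to favour PRP:

```python
import math

# Group summaries quoted above (VISA-A, higher scores = less disability)
ctrl_mean, ctrl_sd = 22.4, 17.2
prp_mean, prp_sd = 21.8, 25.9

# Pooled SD; with equal group sizes this reduces to the mean of the variances
pooled_sd = math.sqrt((ctrl_sd**2 + prp_sd**2) / 2)
smd = (prp_mean - ctrl_mean) / pooled_sd   # magnitude ~0.03: essentially no effect
```

A raw difference of 0.6 VISA-A points against a pooled SD of roughly 22 points is what produces such a near-zero SMD, which is exactly the kind of clinical context that the standardised number alone conceals.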
These trial data were not reported in the example SRMA. Transparent methodological reporting and data sharing facilitate reproducibility. In order to explore transparency and appraise the trustworthiness in our worked example, we attempted to source the outcome data from each individual RCT included in the MA by Miller et al (2017).
Using these data, we sought to reproduce the forest plot. Three authors (MJT/MCM/MKB), independently extracted data from the 16 included RCTs. However, eight of the included studies did not explicitly report the required data (mean and SD) for the outcome measure used in the MA. The Cochrane Handbook for Systematic Reviews of Interventions19 outlines recommended options for obtaining the required data.20
Accordingly, we contacted the corresponding authors of the RCTs that did not report the required data. Only one author replied with the requested data. For three RCTs, we calculated the SD from the reported estimate of variance (standard error (SE) or 95% CI) and included these studies in the forest plot. One additional RCT reported only the mean change in pain from baseline and the SE of that change, so the SE reported at baseline was used to calculate the SD. Finally, one RCT gave no estimate of variance at all; for this study, an estimate of the SD was derived from the group means and the reported p value.
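The recovery steps described above can be sketched as follows. This is an illustration of the general Cochrane Handbook approaches rather than of any particular trial's data; it assumes normal and t-distributed statistics, and uses SciPy for the distribution quantiles:

```python
import math
from scipy import stats

def sd_from_se(se, n):
    """Recover a group's SD from the SE of its mean: SD = SE * sqrt(n)."""
    return se * math.sqrt(n)

def sd_from_ci(lower, upper, n, level=0.95):
    """Recover a group's SD from a CI around its mean (normal approximation):
    SE = (upper - lower) / (2 * z), then SD = SE * sqrt(n)."""
    z = stats.norm.ppf(1 - (1 - level) / 2)   # 1.96 for a 95% CI
    return (upper - lower) / (2 * z) * math.sqrt(n)

def sd_from_p(mean_diff, p, n1, n2):
    """Recover a pooled SD from a two-sided, two-sample t-test p value:
    SE of the difference = |MD| / t, then divide out the sample-size factor."""
    t = stats.t.ppf(1 - p / 2, df=n1 + n2 - 2)
    se_diff = abs(mean_diff) / t
    return se_diff / math.sqrt(1 / n1 + 1 / n2)
```

Each conversion adds approximation error, particularly the p-value route when only a threshold (e.g. ‘p<0.05’) rather than an exact value is reported, which is one reason such imputations should be disclosed in the review.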
We were unable to reproduce the SMD for two studies. For one, it was unclear what outcome variable was actually used to derive the SMD as there were multiple possible outcomes that may have been appropriate to use; however, this was not defined. The second study did not report raw scores for the pain outcome used.
Given our difficulties in obtaining the data required to reproduce the forest plot (outlined in detail in online supplementary file 1), we wrote to the corresponding author of the example SRMA to request the data and clarity on how these SMDs were generated. Unfortunately, we received no response. Nonetheless, we used the data we extracted or calculated, as described, in an attempt to reproduce the example SRMA’s forest plot. We present this (and reasons for exclusions) in figure 3. The difficulty encountered, and our ultimately unsuccessful attempt to exactly reproduce the reported forest plot, illustrates that non-transparent reporting seriously inhibits reproducibility and therefore undermines the trustworthiness of SRMA findings. As expected, our results do not exactly match those of the example SRMA: we obtained a pooled effect for PRP compared with the comparator of SMD 0.41 (95% CI 0.11 to 0.70, p=0.007). However, because our results are derived from an incomplete dataset with varied outcomes, it cannot be inferred that PRP is effective for tendinopathy.
It is possible that the authors of the example SRMA faced similar difficulties and may have needed to contact RCT authors or impute SDs from similar studies. However, this was not reported. Reporting to this level of detail would afford the reader greater clarity on how data were obtained. This is especially pertinent as we could not directly reproduce a result for six of the SMDs, and could not reproduce two of them at all. Furthermore, SRMA authors are encouraged to pre-register their protocol. Transparency, methodological rigour and reproducibility are enhanced by pre-registering the SRMA protocol in advance, yet the example SRMA did not report its protocol being accessible in any registry (e.g. PROSPERO).
How should this be interpreted by clinicians?
An interventional SRMA should provide an unbiased, robust, transparent and reproducible overview of the effect of an intervention and the quality of the evidence supporting this finding. These reviews may usefully inform treatment choices for clinicians, but only if the clinician can assess the robustness and trustworthiness of the findings. Our worked example illustrates that there is merit in reading reviews with a sceptical and forensic eye as not all peer-reviewed findings are trustworthy. This example highlighted the importance of carefully considering the pooled estimate of the treatment effect with respect to whether MD or SMD is reported. If the latter, it is necessary to assess what outcome measures have been combined in a MA. Clinicians are well placed to critically judge whether combining varied outcome measures makes sense from a clinical perspective. If the combination of outcome measures does not appear appropriate, then it is reasonable to question whether the pooled estimate is meaningful. Second, when a pooled estimate of effect is provided (MD or SMD), we encourage clinicians to question whether individual trial data reporting appears sufficient to allow for an informed assessment of how the pooled estimate was generated. A lack of transparency through missing data or data simply not being reported renders the results non-reproducible and therefore undermines their trustworthiness.
Key messages
Systematic Review and Meta-analysis (SRMA) of the effects of interventions may usefully inform treatment choices for clinicians, but only if the clinician can assess the robustness and trustworthiness of the findings. Thus, we encourage clinicians to:
Consider the pooled estimate of the treatment effect with respect to whether mean difference or standardised mean difference (SMD) is reported.
If SMD is reported, it is important to assess what outcome measures have been combined in a meta-analysis.
Examine whether individual trial data reporting appears sufficient to allow for an informed assessment of how the pooled estimate was generated.
Recognise that a lack of transparency and reproducibility, through missing data or data simply not being reported, undermines the trustworthiness of the results of a SRMA.
Acknowledgments
Benedict Martin Wand provided advice, direction and oversight to the planning of this project.
References
Footnotes
Contributors All authors contributed to the inception of the research idea. MJT, MCM and MKB performed the independent data extraction. WG provided oversight to the data extraction and associated disputes. MJT, WG and MCM performed the statistical analysis with oversight from PC and MB. All authors contributed to drafting/review of the manuscript. MKB = Matthew K Bagg; MB = Max K Bulsara
Funding The authors have not declared a specific grant for this research from any funding agency in the public, commercial or not-for-profit sectors.
Competing interests None declared.
Patient consent for publication Not required.
Provenance and peer review Not commissioned; externally peer reviewed.