Recommended methods for determining responsiveness and minimally important differences for patient-reported outcomes

doi:10.1016/j.jclinepi.2007.03.012

Journal of Clinical Epidemiology

Volume 61, Issue 2, February 2008, Pages 102-109

https://doi.org/10.1016/j.jclinepi.2007.03.012

Abstract

Objective

The objective of this review is to summarize recommendations on methods for evaluating responsiveness and minimal important difference (MID) for patient-reported outcome (PRO) measures.

Study Design and Setting

We review, summarize, and integrate information on issues and methods for evaluating responsiveness and determining MID estimates for PRO measures. Recommendations are made on best-practice methods for evaluating responsiveness and MID.

Results

The MID for a PRO instrument is not an immutable characteristic, but may vary by population and context, and no one MID may be valid for all study applications. MID estimates should be based on multiple approaches and triangulation of methods. Anchor-based methods applying various relevant patient-rated, clinician-rated, and disease-specific variables provide primary and meaningful estimates of an instrument's MID. Results for the PRO measures from clinical trials can also provide insight into observed effects based on treatment comparisons and should be used to help determine MID. Distribution-based methods can support estimates from anchor-based approaches and can be used in situations where anchor-based estimates are unavailable.

Conclusion

We recommend that the MID is based primarily on relevant patient-based and clinical anchors, with clinical trial experience used to further inform understanding of MID.

Introduction

Patient-reported outcomes (PROs) are frequently incorporated in clinical trials comparing health interventions for chronic diseases. These PROs include measures of health-related quality of life (HRQL), symptoms, and treatment satisfaction. PROs provide the patient's perspective and help us understand the effects of disease and treatment on symptoms, functioning, and other outcomes [1], [2], [3]. For many chronic diseases, PROs represent one of the most important health outcomes for evaluating the effectiveness of treatments and changes in disease trajectory. As far back as Hippocrates, listening to the patient has been considered an integral part of medical science [4]. Therefore, the patient's perspective of her health is integral to understanding health outcomes. The application of relevant and psychometrically sound PROs in clinical trials assists patients, their family members, and clinicians in understanding the comprehensive impact of treatment on patient symptoms, functioning, treatment preferences, and general well being.

To be useful in clinical trials evaluating new health interventions, PROs, similar to other health outcomes, must have acceptable reliability and validity [1], [2], [5], [6]. Responsiveness is an aspect of construct validity [7] and is determined by evaluating the relationship between changes in clinical and patient-based endpoints and changes in the PRO scores over time, or based on the application of a treatment of known and demonstrated efficacy [2], [5], [8]. Responsiveness can be evaluated based on observational studies or in clinical trials. Evidence supporting responsiveness and for interpreting PRO results is critical for clinical trial settings. Information on the interpretation of changes or differences in PRO scores is based on the minimal important difference (MID). Demonstrating a MID is also important evidence for achieving successful PRO claims through regulatory agencies [9], [10]. Nonetheless, virtually all instruments found to differentiate among clinically distinct groups are also found to be responsive to change.

Although responsiveness and interpretation of PRO measures have been discussed for the past 15 years or more [2], [5], [7], [11], [12], [13], [14], [15], [16], [17], [18], [19], [20], [21], [22], [23], [24], [25], [26], [27], [28], [29], recommendations about the best approach for evaluating responsiveness and determining MIDs for PRO instruments are still needed. For example, the FDA requested further information and guidance on methods for determining responsiveness and MID [9]. Although there is an evolving consensus as to the best approach to evaluating responsiveness and MID [25], [26], [29], there is no clear statement about the recommended methods and about important issues underlying responsiveness and MID.

This report focuses on issues and recommendations for evaluating responsiveness and MID for PRO measures in chronic disease. These issues are especially germane given that for most chronic diseases cure is not feasible, and that the main objective of treatment is to maintain or improve patient functioning and well being. The remainder of this report will cover (1) conceptual issues and definitions; (2) methods for evaluating responsiveness and MID; (3) recommended decision criteria for determining MID; and (4) summary and conclusions. We will illustrate methods and concepts using published health outcomes literature.

Section snippets

Interpretation of PROs: conceptual issues and definitions

PROs require the patient to assign a response to questions (or statements) about their perceptions or activities, such as symptoms, capabilities, or performance of roles or responsibilities. These responses are typically combined in some way to create summary scores that can be used to measure concepts such as physical, psychological, or social functioning and well being, or symptom burden or severity. Symptoms can be rated based on frequency, severity, duration, degree of bother, or impact on

Methods of evaluating responsiveness and clinical significance

Longitudinal studies are needed to determine whether a PRO instrument is responsive to changes or differences. These studies may be randomized clinical trials comparing treatments of known efficacy or observational studies where patients are treated with usual medical care and followed over relevant periods of time. For clinical trial designs, there needs to be some evidence that the treatment is effective and that the expected changes in clinical status are linked to expected changes in the
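As a rough illustration of how responsiveness might be quantified from such longitudinal data, the sketch below computes two commonly used indices: an effect size (mean change divided by the baseline standard deviation) and a standardized response mean (mean change divided by the standard deviation of the change scores). The choice of these two indices, the function names, and the scores are assumptions made for illustration; they are not taken from the text above.

```python
from statistics import mean, stdev


def effect_size(baseline, followup):
    """Mean change divided by the SD of the baseline scores."""
    changes = [f - b for b, f in zip(baseline, followup)]
    return mean(changes) / stdev(baseline)


def standardized_response_mean(baseline, followup):
    """Mean change divided by the SD of the change scores."""
    changes = [f - b for b, f in zip(baseline, followup)]
    return mean(changes) / stdev(changes)


# Hypothetical PRO scores for five patients (illustration only).
baseline = [40.0, 50.0, 60.0, 50.0, 50.0]
followup = [50.0, 55.0, 70.0, 60.0, 55.0]

print(f"Effect size: {effect_size(baseline, followup):.2f}")
print(f"Standardized response mean: {standardized_response_mean(baseline, followup):.2f}")
```

Both indices use the same numerator and differ only in the denominator, so they can diverge when change scores are much less variable than baseline scores.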

Recommended decision criteria for determining MID

The application of multiple methods to determine the MID for a PRO instrument in a specific patient population will almost always result in a range of values for the MID. This is the essence of triangulation, that is, examining multiple values from different approaches and hopefully converging on a small range of values (or one single value). It is recommended that the different MID estimates be graphed to visually depict the range of estimates. Figure 1 provides a summary of MID estimates from
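The triangulation idea can be sketched in a few lines, assuming one anchor-based estimate (mean change among patients who rate themselves "a little better") and two distribution-based estimates (half a baseline standard deviation and one standard error of measurement). The specific estimators, the reliability value of 0.84, the anchor wording, and all scores are hypothetical and chosen only to show the mechanics.

```python
from math import sqrt
from statistics import mean, stdev


def half_sd(baseline_scores):
    """Distribution-based estimate: half of the baseline standard deviation."""
    return 0.5 * stdev(baseline_scores)


def standard_error_of_measurement(baseline_scores, reliability):
    """Distribution-based estimate: SD * sqrt(1 - reliability)."""
    return stdev(baseline_scores) * sqrt(1.0 - reliability)


def anchor_mean_change(changes, anchor_ratings, category):
    """Anchor-based estimate: mean PRO change among patients in one anchor category."""
    selected = [c for c, a in zip(changes, anchor_ratings) if a == category]
    if not selected:
        raise ValueError(f"no patients rated themselves as {category!r}")
    return mean(selected)


# Hypothetical data for five patients (illustration only).
baseline = [40.0, 50.0, 60.0, 50.0, 50.0]
changes = [10.0, 5.0, 10.0, 10.0, 5.0]
anchor = ["much better", "a little better", "much better",
          "much better", "a little better"]

estimates = {
    "anchor: 'a little better'": anchor_mean_change(changes, anchor, "a little better"),
    "distribution: 0.5 SD": half_sd(baseline),
    "distribution: 1 SEM": standard_error_of_measurement(baseline, reliability=0.84),
}

low, high = min(estimates.values()), max(estimates.values())
for label, value in sorted(estimates.items(), key=lambda kv: kv[1]):
    # Roughly one '#' per score point: a crude text stand-in for a graph.
    print(f"{label:28s} {value:6.2f} {'#' * round(value)}")
print(f"MID range: {low:.2f} to {high:.2f}")
```

Listing the estimates side by side makes it easy to see whether the methods converge on a narrow range or whether one approach is an outlier that needs a closer look.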

Summary and conclusions

For PRO endpoint data to be accepted as evidence of treatment efficacy there must be evidence documenting the instrument's conceptual framework, content validity, and psychometric qualities. For responsiveness, it is necessary to demonstrate that the PRO scores are sensitive to actual changes in health status. Although demonstrating responsiveness is a key component to establishing an instrument's construct validity, it is also important to determine the MID to assist in interpreting

Acknowledgments

This paper was supported in part by Genentech, South San Francisco, California, the UCLA/DREW Project EXPORT, National Institutes of Health, National Center on Minority Health & Health Disparities, (P20-MD00148-01), the UCLA Center for Health Improvement in Minority Elders/Resource, Centers for Minority Aging Research, National Institutes of Health, National Institute of Aging, (AG-02-004), and the National Institute of Aging (AG20679-01).

References (52)

N.K. Leidy et al.
Recommendations for evaluating the validity of quality of life claims for labeling and promotion
Value Health
(1999)
G. Guyatt et al.
Measuring change over time: assessing the usefulness of evaluative instruments
J Chronic Dis
(1987)
R. Jaeschke et al.
Measurement of health status. Ascertaining the minimal clinically important difference
Contr Clin Trials
(1989)
D. Cella et al.
Group vs individual approaches to understanding the clinical significance of differences or changes in quality of life
Mayo Clin Proc
(2002)
G. Guyatt et al.
Methods to explain the clinical significance of health status measures
Mayo Clin Proc
(2002)
R.D. Crosby et al.
Defining clinically meaningful change in health-related quality of life
J Clin Epidemiol
(2003)
J.A. Sloan et al.
Clinical significance of patient-reported questionnaire data: another step toward consensus
J Clin Epidemiol
(2005)
K.J. Yost et al.
Minimally important differences were estimated for the Functional Assessment of Cancer Therapy-Colorectal (FACT-C) instrument using a combination of distribution- and anchor-based approaches
J Clin Epidemiol
(2005)
D. Cella et al.
Combining anchor and distribution based methods to derive minimal clinically important differences on the Functional Assessment of Cancer Therapy anemia and fatigue scales
J Pain Symptom Manage
(2002)
M.A.G. Sprangers et al.
Assessing meaningful changes in quality of life over time: a user's guide for clinicians
Mayo Clin Proc
(2002)
E.F. Juniper et al.
Determining a minimal important change in a disease-specific quality of life questionnaire
J Clin Epidemiol
(1994)
K. Niebauer et al.
Impact of omalizumab on quality-of-life outcomes in patients with moderate-to-severe allergic asthma
Ann Allergy Asthma Immunol
(2006)
K.W. Wyrwich et al.
Further evidence supporting an SEM-based criterion for identifying meaningful intra-individual changes in health-related quality of life
J Clin Epidemiol
(1999)
D.A. Revicki et al.
Recommendations on health-related quality of life research to support labeling and promotional claims in the United States
Qual Life Res
(2000)
R.J. Willke et al.
Measuring treatment impact: a review of patient-reported outcomes and other efficacy endpoints in approved product labels
Contr Clin Trials
(2004)
Hippocrates. On decorum. Hippocrates, with an English translation. In: Jones WH, translator. Cambridge, MA: Harvard...
R.D. Hays et al.
Reliability and validity (including responsiveness)
P.M. Fayers et al.
Quality of life: Assessment, analysis and interpretation
(2000)
R.D. Hays et al.
Responsiveness to change: an aspect of validity, not a separate dimension
Qual Life Res
(1992)
FDA
Guidance for industry: patient-reported outcome measures: Use in medical product development to support labeling claims
(2006)
Committee for Medicinal Products for Human Use
Reflection paper on the regulatory guidance for the use of health-related quality of life (HRQL) measures in the evaluation of medicinal products
(2005)
N.S. Jacobson et al.
Clinical significance: a statistical approach to defining meaningful change in psychotherapy research
J Consult Clin Psychol
(1991)
M.H. Liang
Evaluating measurement responsiveness
J Rheumatol
(1995)
E. Lydick et al.
Interpretation of quality of life changes
Qual Life Res
(1993)
M. Testa et al.
Interpreting pharmacoeconomic and quality-of-life clinical trial data for use in therapeutics
Pharmacoeconomics
(1992)
D. Osoba et al.
Interpreting the significance of changes in health-related quality of life scores
J Clin Oncol
(1998)

