Structured Abstract
Background
Critical appraisal of individual studies, with a formal summary judgment of methodological quality, and subsequent assessment of the strength of a body of evidence addressing a specific question are essential activities in conducting comparative effectiveness reviews (CERs). Uncertainty about the optimal approach to quality assessment has given rise to wide variation in practice. A well-defined and transparent methodology for evaluating the robustness of quality assessments is critical to the interpretation of systematic reviews, as well as to the larger CER process.
Purpose
To complete the first phase of a project to develop such a methodology, we aimed to examine the extent and potential sources of inter- and intra-rater variation in quality assessments, as conducted in our Evidence-based Practice Center (EPC).
Methods
We conducted three sequential exercises: (1) quality assessment of randomized controlled trials (RCTs) using the default quality-item checklist employed in EPC reports, without further instruction; (2) quality assessment of RCTs guided by explicit definitions of the quality items; and (3) quality assessment of RCTs from manuscripts stripped of identifying information, followed by sensitivity analyses of the quality items. The RCTs used in these exercises had been included in a previous CER on sleep apnea. Three experienced systematic reviewers participated in these exercises.
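To illustrate one way the extent of inter-rater variation in such exercises could be quantified, the following minimal Python sketch computes pairwise Cohen's kappa on item-level ratings from three reviewers. The ratings, the item coding, and the use of scikit-learn's cohen_kappa_score are illustrative assumptions and do not reflect the procedure used in this report.

```python
# Minimal sketch (not the report's actual procedure): pairwise Cohen's kappa
# on item-level quality ratings from three hypothetical reviewers.
from itertools import combinations
from sklearn.metrics import cohen_kappa_score  # assumed dependency

# Hypothetical ratings: one label per RCT for a single quality item,
# e.g. allocation concealment coded yes/no/unclear.
ratings = {
    "reviewer_1": ["yes", "no", "unclear", "yes", "yes"],
    "reviewer_2": ["yes", "no", "no",      "yes", "unclear"],
    "reviewer_3": ["no",  "no", "unclear", "yes", "yes"],
}

# Cohen's kappa for every pair of reviewers; kappa corrects raw percent
# agreement for the agreement expected by chance alone.
for r1, r2 in combinations(ratings, 2):
    kappa = cohen_kappa_score(ratings[r1], ratings[r2])
    print(f"{r1} vs {r2}: kappa = {kappa:.2f}")
```

The same calculation, repeated on a reviewer's ratings from two time points, would quantify intra-rater variation.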
Limitations
The results presented here are based on a small sample of RCTs, selected from a single CER and assessed by only three reviewers from one EPC. The definitions of the items in our checklist were not evaluated for adequacy and clarity beyond the face validity assessed by the reviewers in this study. We acknowledge that this default checklist may not be in widespread use across evidence synthesis practices and is not directly aligned with the current trend to shift the focus from methodological (and reporting) quality to explicit assessment of studies' risk of bias. For these reasons, the generalizability and the target audience of this research activity may be limited. Furthermore, we did not examine how our quality assessment tool compares with other available tools or how our assessments would differ if applied to a different clinical question. Thus, our findings are preliminary, and no definitive conclusions can or should be drawn from this pilot study.
Conclusions
We identified extensive variation in overall study ratings among the three experienced reviewers, and discrepancies among reviewers in the assignment of individual items were common. While it may be desirable to produce a single rating assessed by multiple reviewers through a process of reconciliation, in the absence of a gold-standard method it may be even more important to report the variation in assessments among different reviewers. A study with large variation in quality assessments may be fundamentally different from one with little variation, even though both are assigned the same consensus quality rating. Further investigations are needed to evaluate these hypotheses.
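As a hedged illustration of reporting variation alongside a consensus-style rating, the sketch below summarizes, for each study, the majority rating together with a crude dispersion measure (the number of distinct ratings assigned). The data and the dispersion measure are hypothetical choices, not methods proposed in this report.

```python
# Minimal sketch (hypothetical data and metric): report per-study variation
# among reviewers alongside a single consensus-style rating.
from collections import Counter

# Hypothetical overall quality ratings (A/B/C) from three reviewers.
studies = {
    "RCT-1": ["A", "A", "A"],  # full agreement
    "RCT-2": ["A", "B", "C"],  # maximal disagreement
    "RCT-3": ["B", "B", "C"],  # partial agreement
}

for study, grades in studies.items():
    counts = Counter(grades)
    majority, _ = counts.most_common(1)[0]  # one consensus-style summary
    spread = len(counts)                    # distinct ratings as a crude dispersion measure
    print(f"{study}: majority={majority}, distinct ratings={spread}, all={grades}")
```

Reporting the dispersion measure alongside the consensus rating makes visible the distinction drawn above: two studies can share the same consensus rating while differing greatly in the reviewer variation behind it.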