Semantics: One can think of the "level" of a body of evidence as the certainty of the assessment that the evidence permits. This may also be thought of as the "quality" of the evidence for addressing a specific question. Because the word "quality" can have other meanings (e.g., the extent to which individual studies meet given criteria), the Editorial Board prefers the word "level."
Certainty: Certainty can be considered in several ways, one of which is to use the heuristic of the "confidence interval." This "confidence interval" does not come from a statistical calculation. It is instead a "conceptual confidence interval" (CCI) representing the Editorial Board's assessment of the range of values for the direction and magnitude of the intervention's effect that is consistent with the overall evidence. The CCI addresses an effectiveness question, not simply an efficacy question. The question at issue is:
What would the health effects be if the intervention were widely implemented in routine practice?
Even well-conducted RCTs may not provide "good" evidence if they are not conducted in generalizable populations or do not use generalizable tests and treatments. The CCI for evidence based on RCTs conducted in highly selected populations might therefore be wide rather than narrow ("fair" as opposed to "good") unless there are reasons to believe the effects would be the same in the general population.
Benefits: Potential benefits include mortality reduction (noting whether the reduction is disease-specific or overall), improved quality of life, improved function, and reduced need for invasive procedures or interventions. If an RCT shows a disease-specific mortality reduction with no trend toward an overall mortality reduction, the Editorial Board might consider the contribution of the disease-specific reduction to overall mortality to be less certain. Because the question at issue is an all-cause mortality question, the CCI for a study showing only a disease-specific reduction, without other indications of an overall reduction, may be considered wider than it would be if such indications were present.
Harms: Harms may be considered in three categories:
Psychological harms (often from labeling, or from anxiety after a false-positive test or after a diagnosis of an "intermediate" condition of no clinical importance).
Complications from diagnostic or monitoring tests (e.g., perforation during a colonoscopy performed to follow up a positive fecal occult blood test [FOBT]).
Complications or side effects from treatment, especially treatment that confers no benefit (e.g., in cases of "overdiagnosis").
Extrapolation: Estimates of the presence and magnitude of benefits or harms may be extrapolated from indirect evidence; the degree of extrapolation determines the width of the CCI for each estimate. For example, evidence shows that screening for ovarian cancer produces many false-positive tests (because of the low prevalence of the disease), and the workup for a positive test is an invasive procedure. Other evidence could provide an estimate of the effects of that invasive workup, including not only complications but also discomfort, anxiety, and time spent with reduced usual activity. The evidence for the harms of screening is therefore indirect (i.e., not from an RCT of screening), but it would still provide at least "fair" evidence of harms (i.e., a CCI of intermediate width).
Judgment: Judgment is involved at several steps in this process, including the assessment of "quality" (internal validity), consistency/coherence, external validity, and the overall "level" of evidence. It is important in each case that the rationale and conclusion be as explicit and transparent as possible. The reasoning behind the judgment of the overall Level of Evidence should be stated clearly.