Hanging by a thread: How reliable are data from Phase III studies?

Many randomized, controlled phase III studies, which are intended to show the superiority of new cancer drugs over control therapies, have a shaky statistical foundation, as a recent study in The Lancet Oncology indicates.

Oncology Blog
By Dr. Sophie Christoph

Decisions by the European Medicines Agency (EMA) or the US Food and Drug Administration (FDA) are based on a wide range of scientific evidence. Randomized, controlled phase III trials (RCTs) are an important component of such decisions. However, a study published in a recent issue of The Lancet Oncology and its accompanying editorial warn that the evidence used for tumor therapy approvals has some worrying weaknesses [1, 2].

An analysis of approval studies for oncological drugs from 2014 to 2018 showed that slightly more than half of the evaluable RCTs had a fragility index of 2 or less. This means that if as few as 2 patients in the intervention group had experienced an event such as progression or death instead of remaining event-free, the study results would no longer have reached statistical significance, i.e. no significant difference between the intervention and control cohorts would have been detectable.

Doubts cast on results’ robustness

The fragility index is the minimum number of changes from "no event" to "event" that leads to a loss of statistical significance (i.e. the p-value is no longer below 0.05). It is thus a kind of confidence parameter for the real presence of a positive effect reported in an RCT.
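To make the definition concrete, here is a minimal sketch of how such an index could be computed. It rests on several assumptions that are not taken from the Lancet Oncology analysis: the outcome is treated as a simple binary event count per arm (ignoring time-to-event information), significance is assessed with a two-sided Fisher's exact test, and the function names fisher_exact_two_sided and fragility_index are purely illustrative.

```python
from math import comb


def fisher_exact_two_sided(a, b, c, d):
    """Two-sided Fisher exact p-value for the 2x2 table [[a, b], [c, d]]."""
    row1, row2, col1 = a + b, c + d, a + c
    denom = comb(row1 + row2, col1)

    def prob(x):
        # Hypergeometric probability of x events in the first row.
        return comb(row1, x) * comb(row2, col1 - x) / denom

    p_observed = prob(a)
    lo, hi = max(0, col1 - row2), min(row1, col1)
    # Sum all tables that are no more likely than the observed one.
    return sum(p for p in map(prob, range(lo, hi + 1))
               if p <= p_observed * (1 + 1e-9))


def fragility_index(events_a, n_a, events_b, n_b, alpha=0.05):
    """Number of "no event" -> "event" switches needed to lose significance.

    Switches are made in the arm with the lower event rate (assumption:
    a binary outcome and Fisher's exact test). Returns None if the result
    is not significant to begin with or significance is never lost.
    """
    arms = [[events_a, n_a], [events_b, n_b]]

    def p_value():
        (ea, na), (eb, nb) = arms
        return fisher_exact_two_sided(ea, na - ea, eb, nb - eb)

    if p_value() >= alpha:
        return None
    # Pick the arm with the lower event proportion.
    target = arms[0] if events_a * n_b <= events_b * n_a else arms[1]
    flips = 0
    while p_value() < alpha and target[0] < target[1]:
        target[0] += 1  # one patient switched from "no event" to "event"
        flips += 1
    return flips if p_value() >= alpha else None


# Hypothetical toy trial: 0 of 10 events with the new drug, 7 of 10 with control.
print(fragility_index(0, 10, 7, 10))  # 2
```

In this invented example, two additional events in the intervention arm are enough to push the p-value from about 0.003 to about 0.07, so the apparently clear result would no longer count as significant.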

The evaluation identified 36 phase III RCTs, 17 of which were suitable for determining a fragility index (two-arm studies with 1:1 randomization and a time-to-event outcome). The median fragility index was only 2, and in 9 out of 17 studies (53%) it was 2 or less. In these studies, the fragility index was one percent or less of the total sample size. In 5 studies, the number of patients lost to follow-up was greater than the fragility index.

Do we need to redefine our notion of a "positive outcome"?

Should more than one significant, positive phase III RCT be required for new approvals of oncology drugs? There are many arguments against it: the costs are high, survival endpoints require long observation periods, patients should get access to the probably superior therapy as quickly as possible, and many sponsors are reluctant to fund more than one study. However, all this misses one essential point, which the above-mentioned study makes obvious: a second positive result is not a foregone conclusion.

A further problem, which we have addressed before, is the necessary differentiation between statistical and clinical benefit. The Lancet Oncology editorial concludes with an obvious suggestion: if the quantity of evidence required for regulatory approvals does not change, we would have to require real-world data on efficacy (and not just toxicity) for all approved drugs, either via phase IV/post-marketing studies or real-world patient evidence, and act on these results if they differ significantly from the data originally presented as evidence. Another idea would be to report the fragility index as an additional marker in studies. Even though it is not a measure of efficacy, it would help assess how much weight can be given to the results.

References:
1. Editor. Are results from clinical trials reliable? The Lancet Oncology 20, 1035 (2019).
2. Del Paggio, J. C. & Tannock, I. F. The fragility of phase 3 trials supporting FDA-approved anticancer medicines: a retrospective analysis. The Lancet Oncology 20, 1065-1069 (2019).