Citation: Kimmelman J, London AJ (2011) Predicting Harms and Benefits in Translational Trials: Ethics, Evidence, and Uncertainty. PLoS Med 8(3): e1001010. doi:10.1371/journal.pmed.1001010
Published: March 8, 2011
Copyright: © 2011 Kimmelman, London. This is an open-access article distributed under the terms of the Creative Commons Attribution License, which permits unrestricted use, distribution, and reproduction in any medium, provided the original author and source are credited.
Funding: The authors were supported by Canadian Institutes of Health Research (EOG 102823) and a fellowship from the Andrew W. Mellon Foundation. The funders had no role in study design, data collection and analysis, decision to publish, or preparation of the manuscript.
Competing interests: The authors have declared that no competing interests exist.
Provenance: Not commissioned; externally peer reviewed.
- Ethical judgments about risk, benefit, and patient eligibility in clinical trials hinge on predictions about harm, therapeutic response, and clinical promise.
- Predictions for novel interventions in preclinical stages of development suffer from two problems: insufficient attention to threats to validity in preclinical research and a reliance on an overly narrow base of evidence that includes only animal and clinical studies of the intervention in question (“evidential conservatism”).
- To improve ethical and scientific decision-making in early phase studies, decision-makers should explicitly attend to reporting quality and methodological features in preclinical experiments that address threats to internal, construct, and external validity.
- Decision-makers should also use evidence that sheds light on the reliability of causal claims embedded within a proposed trial. This evidence can be gathered from outcomes of previous trials involving agents targeting related biological pathways (“reference classes”).
First-in-human clinical trials represent a critical juncture in the translation of laboratory discoveries. However, because they involve the greatest degree of uncertainty at any point in the drug development process, their initiation is beset by a series of nettlesome ethical questions: Has clinical promise been sufficiently demonstrated in animals? Should trial access be restricted to patients with refractory disease? Should trials be viewed as therapeutic? Have researchers adequately minimized risks?
The resolution of such ethical questions inevitably turns on claims about future events like harms, therapeutic response, and clinical translation. Recurrent failures in clinical translation, like Eli Lilly's Alzheimer's candidate semagacestat, highlight the severe limitations of current methods of prediction. In this case, patients in the active arm of the placebo-controlled trial had earlier onset of dementia and elevated rates of skin cancer.
Various authoritative accounts of human research ethics state that decision-making about risk and benefit should be careful, systematic, and non-arbitrary. Yet these sources provide little guidance about what kinds of evidence stakeholders should use to ensure their estimates of such events ground responsible ethical decisions. In this article, we suggest that investigators, oversight bodies, and sponsors often base their predictions on a flawed and inappropriately narrow preclinical evidence base.
Prediction and Ethical Decision-Making
According to the core tenets of human research ethics, investigators, sponsors, and institutional review boards (IRBs) are obligated to ensure that risks to volunteers are minimized and balanced favorably with anticipated benefits to society and, if applicable, to the volunteers themselves. Accurate prediction plays a critical role in this process. When research teams underestimate the probability of favorable clinical or translational outcomes, they undermine health care systems by impeding clinical translation. When investigators overestimate the probability of favorable outcomes, they potentially expose individuals to unjustified burdens, which may be considerable for phase 1 studies involving unproven drugs. In both cases, misestimation threatens the integrity of the scientific enterprise, because it frustrates prudent allocation of research resources.
Naturally, there are limits to the reliability with which forecasts based on experimental evidence predict clinical outcomes. However, in late stages of clinical development, forecasts underwriting ethical and scientific decision-making have proven fairly reliable. Several analyses of cancer randomized controlled trials indicate that new interventions are just as likely to prove more effective than comparator ones as they are to prove inferior. Similar findings have been reported for other indications. In the aggregate at least, researchers and review committees neither overestimate nor underestimate the medical benefits of allocating some patients to new interventions and others to standard drugs.
Whether decision-makers utilize evidence as effectively when predicting outcomes in early phase research has not been systematically investigated. Nevertheless, there are grounds for concern, and a systematic investigation is overdue. Highly promising preclinical findings in cancer, stroke, HIV vaccines, and neurodegenerative diseases frequently fail clinical translation. In cancer, only 5% of products entering trials are eventually licensed. In one study, approximately 5% of high impact basic science reports were clinically translated within 10 years. We suggest that these disappointments partly reflect two problems in the way evidence is used in predicting clinical outcomes.
Preclinical Reporting and Validity
First, decision-makers may not be adequately responsive to problems in preclinical research practice. Systematic reviews repeatedly demonstrate that many animal studies do not enable reliable causal inference and clinical generalization because they do not address important threats to internal, construct, and external validity. With respect to internal validity, one recent analysis of animal studies showed that only 12% used random allocation and 14% used blinded outcome assessment. Construct validity concerns the relationship between clinical implementation of an intervention and implementations evaluated in preclinical studies. A recent review found that preclinical studies of cardiac arrest interventions applied treatment significantly sooner after cardiac events than clinical studies did. In the case of AstraZeneca's failed stroke drug NXY-059, use of normotensive rodents in preclinical development may have led to spurious predictions of clinical activity. Preclinical studies do not always test the extent to which cause and effect relationships hold up under varied conditions (external validity). In a systematic review of neuroprotective agents in phase 2 and 3 trials, only two of ten agents were tested in both rodents and higher order species. Finally, deficiencies in reporting and aggregation of preclinical evidence deprive decision-makers of crucial evidence. In one recent analysis, publication bias in preclinical stroke studies led to a 30% overestimation of treatment effect size. Clearly, preclinical researchers should endeavor to follow reporting guidelines such as the recently proposed Animals In Research: Reporting In Vivo Experiments Guidelines (ARRIVE; http://www.nc3rs.org.uk/page.asp?id=1357), and clinical predictions following from animal studies should take into account deficiencies in design and reporting.
In the case of semagacestat, it has been over 5 years since the drug was first tested in human beings, and preclinical studies have yet to be published. However, narrative reviews by Eli Lilly scientists indicate trials were launched on the basis of molecular, rather than behavioral, endpoints. Although the absence of publication makes any assessment of animal study quality difficult, the use of molecular endpoints raises questions about the construct validity of clinical generalizations drawn from preclinical experiments.
A second concern about forecasting outcomes in translational trials relates to a tendency to base clinical inferences on a relatively narrow class of evidence: those preclinical studies that involve the particular agent. We call this “evidential conservatism.” Such evidential conservatism is reflected in various policies. For example, the American Society of Clinical Oncology states that “the decision to move an agent into phase I evaluation is based… central[ly on]… the observation of sufficient preclinical antitumor activity, such that a therapeutic effect in human cancer is anticipated”. International Conference on Harmonisation policy requires investigators to furnish ethics review committees with only a narrow type of preclinical evidence. Similarly, some commentators argue that risk-benefit decisions in early phase trials should be driven by mechanistic evidence about an agent.
Evidential conservatism, however, fails to address the higher-order question of the reliability of forecasts made from such a narrow evidence base. This higher-order question is of special relevance for early phase research because agents that do not enjoy the support of promising preclinical results will not be plausible candidates for translation. Yet when agents are supported by equally promising preclinical results they may be differentiated by the maturity of the knowledge surrounding a nexus of variables concerning the relationship between test and target populations.
For instance, although neuroprotective stroke treatments have moved to translation on the basis of very encouraging preclinical studies, they have consistently failed randomized trials. Estimates of the risks and benefits of any particular neuroprotective compound that are based solely on preclinical evaluation of that compound will be less reliable than those that incorporate information about the relative success of neuroprotective compounds as a class. In part, this is because the success or failure of other interventions in this reference class provides evidence about the degree to which clinical development is guided by a reliable working knowledge of relevant disease processes.
Our claim that decision-makers need to use a broader base of evidence for evaluating early phase research is consistent with a recent call for incorporating whole research program outcomes into systematic reviews of particular agents.
Assessing Relevant Evidence
How might researchers depart from evidential conservatism in a way that is open to scrutiny and amenable to assessment, revision, and improvement? Decision-makers who make forecasts about agent activity in early phase research must identify reference classes that are relevant to the decision at hand. Delimiting the reference class of relevant evidence poses a challenge in that interventions possess limitless characteristics. A drug might be classed within neuroprotective compounds, stroke drugs, and drugs beginning with the letter “n.” Decision-makers thus confront the timeless problem of selecting those characteristics most salient for prediction.
There are no simple formulas here. In some cases, choice of reference classes will be straightforward (e.g., a new, small molecule HMG-CoA reductase inhibitor); in other cases, consensus may be elusive. Nevertheless, we suggest that the very act of attending to reference class identity would be a departure from evidential conservatism. As a starting place, decision-makers should identify reference classes that index the maturity of knowledge regarding central causal premises embedded within a protocol. In an era in which basic science heavily informs product development, drug developers themselves often class their agents according to explicit ambitions about causal pathways. Asserting that a drug targets a particular pathophysiologic process should prompt us to look at how other drugs that target the same process performed in clinical translation. We can then base our estimates of the maturity of knowledge about these causal premises on the success or failure of past attempts at redeeming these ambitions. Decision-makers should therefore adjust their confidence in clinical generalizations on the basis of outcomes with previous interventions that addressed the same pathological processes.
Semagacestat was screened and designed to target amyloid-β production, which is believed to be a key step in dementia onset. Eight other anti-amyloid drugs have either failed randomized trials or been abandoned due to toxicity (Table 1). Although a variant of this approach may eventually succeed, promising preclinical evidence supporting semagacestat should have been tempered by the accumulation of data about outcomes in the same reference class.
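One way to make this kind of tempering explicit is a simple Beta-binomial update. The sketch below is an illustration only, not a method proposed in this article: the `BetaBelief` class, the uniform Beta(1, 1) starting belief, and the treatment of earlier agents as exchangeable with the new one are all assumptions introduced here. The only figure taken from the text is the count of eight earlier anti-amyloid drugs that failed or were abandoned.

```python
from dataclasses import dataclass


@dataclass(frozen=True)
class BetaBelief:
    """Beta(alpha, beta) belief about the probability that an agent in a
    reference class translates successfully (hypothetical helper)."""
    alpha: float
    beta: float

    @property
    def mean(self) -> float:
        # Expected probability of success under the Beta distribution.
        return self.alpha / (self.alpha + self.beta)

    def update(self, successes: int, failures: int) -> "BetaBelief":
        # Conjugate update: add observed successes and failures to the counts.
        if successes < 0 or failures < 0:
            raise ValueError("counts must be non-negative")
        return BetaBelief(self.alpha + successes, self.beta + failures)


# Assumption: a uniform starting belief, standing in for optimism
# grounded only in promising preclinical results for the agent itself.
prior = BetaBelief(1.0, 1.0)

# From the text: eight earlier anti-amyloid drugs failed or were abandoned.
posterior = prior.update(successes=0, failures=8)

print(round(prior.mean, 3), round(posterior.mean, 3))  # prints: 0.5 0.1
```

The specific numbers matter less than the direction of the adjustment: each failure in the same reference class lowers the expected probability of success. Choosing the prior and deciding which earlier agents belong in the class remain judgment calls that no formula settles.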
To illustrate how our suggestions interface with ethical decision-making, consider recent proposals to reinitiate trials of fetal-derived tissues for Parkinson's disease. Previous trials involved treatment-refractory patients, but investigators are now proposing trials involving patients with recent onset. The rationale is that fetal-derived tissues can only protect dopaminergic neurons to the extent that the latter remain intact. However, the risk-benefit balance is contentious, because the trial will expose patients who can manage symptoms with standard treatments to the risks of neurosurgery, immunosuppression, and cell transplantation.
According to evidential conservatism, investigators and ethics bodies should evaluate the risk-benefit balance by consulting preclinical studies and the biological rationale for patient-subject selection. One commentator notes that, on the basis of preclinical studies showing the intervention is designed to address early disease processes, performing studies in patients with advanced disease would be unethical. We think this way of using evidence in ethical evaluation is misguided.
Our proposal directs decision-makers to make risk-benefit decisions in light of two additional factors. First, to what degree do the preclinical studies incorporate design elements that support reliable inferences about clinical activity? This directs stakeholders to attend to those methodological features of the preclinical studies that support credible claims of internal, construct, and external validity. As these preclinical studies are presently underway, researchers have an opportunity to overcome past limitations in addressing validity threats in Parkinson's disease models.
Second, our proposal directs stakeholders to consider evidence that sheds light on the maturity of the knowledge relating to key causal claims presupposed by therapeutic predictions. As investigators propose to intervene in degenerative processes, a claim of therapeutic action would need to be evaluated in light of outcomes in previous Parkinson's trials involving surgically delivered neuroprotective agents and/or transplanted tissues. No such strategies have produced positive randomized trials (Table 2). Accordingly, even with carefully collected preclinical evidence, decision-makers should approach new trials with modest therapeutic expectations.
Table 2. Outcomes in randomized trials of neuroregenerative and/or cell-transplantation strategies for Parkinson's disease. doi:10.1371/journal.pmed.1001010.t002
Thoughtful commentators have argued that, before initiating cell-based dopamine replacement, strategies should be “clinically competitive” with standard of care. However, this may present an unworkable standard. Previous unsuccessful attempts at translation betray profound uncertainty concerning risks and benefits for research volunteers. Given the preliminary nature of such interventions, the ethical justification for their administration in early phase trials should not hinge on the prospect of benefit for volunteers. It should rest instead on a compelling claim of knowledge value and on the reduction of avoidable risks. The latter entails pursuing trials in patients less likely to suffer opportunity costs from study participation, and maintaining a background of medical management that does not fall below standard of care. Rather than being told that the approach is comparable to standard of care, the consent process should emphasize that clinical benefit is unlikely.
Systematic study of preclinical research has centered on stroke and practices focused on internal validity. Our proposal makes clear the need to broaden the scope of this research agenda to cover a wider range of preclinical research, and to expand its focus to include issues of construct and external validity. A key component of this process will involve creating databases for aggregating translational outcomes according to relevant reference classes.
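As a rough illustration of what aggregating translational outcomes by reference class could look like, the sketch below tallies trial results per class. The `TrialOutcome` record, the `tally_by_class` function, and the agent and class labels are hypothetical placeholders introduced here, not data from this article or an existing database schema.

```python
from collections import defaultdict
from typing import Dict, Iterable, NamedTuple, Tuple


class TrialOutcome(NamedTuple):
    """One randomized trial result (hypothetical record layout)."""
    agent: str
    reference_class: str
    positive: bool


def tally_by_class(outcomes: Iterable[TrialOutcome]) -> Dict[str, Tuple[int, int]]:
    """Return {reference_class: (positive_trials, non_positive_trials)}."""
    tallies = defaultdict(lambda: [0, 0])
    for outcome in outcomes:
        # Index 0 counts positive trials, index 1 counts non-positive trials.
        tallies[outcome.reference_class][0 if outcome.positive else 1] += 1
    return {cls: (pos, neg) for cls, (pos, neg) in tallies.items()}


# Placeholder records; real entries would come from curated trial reports.
records = [
    TrialOutcome("agent_A", "class_X", False),
    TrialOutcome("agent_B", "class_X", False),
    TrialOutcome("agent_C", "class_Y", True),
]

summary = tally_by_class(records)
print(summary)  # prints: {'class_X': (0, 2), 'class_Y': (1, 0)}
```

A real database would also need explicit, publicly reviewable rules for assigning agents to classes, since a single agent can plausibly belong to several.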
Some may worry that such an analysis might produce less optimistic predictions, and hence stymie product development. However, we do not see how medicine is advanced by forging ahead on the basis of predictions of dubious reliability. Moreover, there are many productive ways in which stakeholders may respond to less optimistic projections. For instance, review of relevant information may prompt researchers to test certain hypotheses before moving ahead with human trials. Investigators might adjust the design of translational studies to align the risk profile with ethical judgments. Or, investigators might decide that moving forward with a protocol represents the best way to advance a particular scientific initiative, but that risks can only be justified by appealing to the value of the knowledge sought, rather than the product's therapeutic activity.
Stakeholders might already adjust their predictions in light of intuitions about validity or experiences with success or failure for similar agents. If so, they do so on the basis of private beliefs, and often without the data needed to make these adjustments systematically. Our approach provides a more publicly accessible basis for making and adjudicating risk-benefit predictions. We suggest that this would better cohere with a sage prescription offered by the National Commission: “there should first be a determination of the validity of the presuppositions of the research…. The method of ascertaining risks should be explicit… It should also be determined whether an investigator's estimates of the probability of harm or benefits are reasonable, as judged by known facts or other available studies.”
Wrote the manuscript: JK AJL. ICMJE criteria for authorship read and met: JK AJL. Agree with the manuscript's results and conclusions: JK AJL.
- 1. Kimmelman J (2010) Gene transfer and the ethics of first-in-human research: lost in translation. Cambridge: Cambridge University Press.
- 2. Extance A (2010) Alzheimer's failure raises questions about disease-modifying strategies. Nat Rev Drug Discov 9: 749–751.
- 3. The National Commission for the Protection of Human Subjects of Biomedical and Behavioural Research (1979) The Belmont report: ethical principles and guidelines for the protection of human subjects of research. Bethesda: Department of Health Education and Welfare.
- 4. World Medical Association (1964) Declaration of Helsinki. Helsinki: 18th World Medical Assembly.
- 5. Mann H (2010) ASSERT: a standard for the review and monitoring of randomized clinical trials. Available: http://www.assert-statement.org/. Accessed 31 January 2011.
- 6. Department of Health and Human Services (2005) Protection of human subjects: criteria for IRB approval of research. Title 45 CFR 46.111(a)(1). pp. 1–12.
- 7. London AJ, Kimmelman J, Emborg ME (2010) Research ethics. Beyond access vs. protection in trials of innovative therapies. Science 328: 829–830.
- 8. Djulbegovic B, Kumar A, Soares HP, Hozo I, Bepler G, et al. (2008) Treatment success in cancer: new cancer treatment successes identified in phase 3 randomized controlled trials conducted by the National Cancer Institute-sponsored cooperative oncology groups, 1955 to 2006. Arch Intern Med 168: 632–642.
- 9. Kumar A, Soares H, Wells R, Clarke M, Hozo I, et al. (2005) Are experimental treatments for cancer in children superior to established treatments? Observational study of randomised controlled trials by the Children's Oncology Group. BMJ 331: 1295.
- 10. Soares HP, Kumar A, Daniels S, Swann S, Cantor A, et al. (2005) Evaluation of new treatments in radiation oncology: are they better than standard treatments? JAMA 293: 970–978.
- 11. Gross CP, Krumholz HM, Van Wye G, Emanuel EJ, Wendler D (2006) Does random treatment assignment cause harm to research participants? PLoS Med 3: e188. doi:10.1371/journal.pmed.0030188.
- 12. Kola I, Landis J (2004) Can the pharmaceutical industry reduce attrition rates? Nat Rev Drug Discov 3: 711–715.
- 13. Pangalos MN, Schechter LE, Hurko O (2007) Drug development for CNS disorders: strategies for balancing risk and reducing attrition. Nat Rev Drug Discov 6: 521–532.
- 14. Contopoulos-Ioannidis DG, Ntzani E, Ioannidis JP (2003) Translation of highly promising basic science research into clinical applications. Am J Med 114: 477–484.
- 15. van der Worp HB, Howells DW, Sena ES, Porritt MJ, Rewell S, et al. (2010) Can animal models of disease reliably inform human studies? PLoS Med 7: e1000245. doi:10.1371/journal.pmed.1000245.
- 16. Kilkenny C, Parsons N, Kadyszewski E, Festing MF, Cuthill IC, et al. (2009) Survey of the quality of experimental design, statistical analysis and reporting of research using animals. PLoS ONE 4: e7824. doi:10.1371/journal.pone.0007824.
- 17. Reynolds JC, Rittenberger JC, Menegazzi JJ (2007) Drug administration in animal studies of cardiac arrest does not reflect human clinical experience. Resuscitation 74: 13–26.
- 18. Bath PM, Gray LJ, Bath AJ, Buchan A, Miyata T, et al. (2009) Effects of NXY-059 in experimental stroke: an individual animal meta-analysis. Br J Pharmacol 157: 1157–1171.
- 19. Philip M, Benatar M, Fisher M, Savitz SI (2009) Methodological quality of animal studies of neuroprotective agents currently in phase II/III acute ischemic stroke trials. Stroke 40: 577–581.
- 20. Sena ES, van der Worp HB, Bath PM, Howells DW, Macleod M (2010) Publication bias in reports of animal stroke studies leads to major overstatement of efficacy. PLoS Biol 8: e1000344. doi:10.1371/journal.pbio.1000344.
- 21. Macleod MR, Fisher M, O'Collins VE, Sena ES, Dirnagl U, et al. (2009) Good laboratory practice. Preventing introduction of bias at the bench. Stroke 40: e50–e52.
- 22. Kilkenny C, Browne W, Cuthill I, Emerson M, Altman D (2010) Improving bioscience research reporting: the ARRIVE guidelines for reporting animal research. PLoS Biol 8: e1000412. doi:10.1371/journal.pbio.1000412.
- 23. Henley DB, May PC, Dean RA, Siemers ER (2009) Development of semagacestat (LY450139), a functional gamma-secretase inhibitor, for the treatment of Alzheimer's disease. Expert Opin Pharmacother 10: 1657–1664.
- 24. American Society of Clinical Oncology (1997) Critical role of phase I clinical trials in cancer treatment. J Clin Oncol 15: 853–859.
- 25. Christian M, Shoemaker D (2002) The investigator's handbook: a manual for participants in clinical trials of investigational agents sponsored by DCTD, NCI. Bethesda: Cancer Therapy Evaluation Program.
- 26. International Conference on Harmonisation of Technical Requirements for Registration of Pharmaceuticals for Human Use (ICH) (1996) ICH Harmonized Tripartite Guideline. Guideline for Good Clinical Practice E6(R1).
- 27. Lowenstein PR (2008) A call for physiopathological ethics. Mol Ther 16: 1771–1772.
- 28. Ioannidis JP, Karassa FB (2010) The need to consider the wider agenda in systematic reviews and meta-analyses: breadth, timing, and depth of the evidence. BMJ 341: c4875.
- 29. Mangialasche F, Solomon A, Winblad B, Mecocci P, Kivipelto M (2010) Alzheimer's disease: clinical trials and drug development. Lancet Neurol 9: 702–716.
- 30. Cummings J (2010) What can be inferred from the interruption of the semagacestat trial for treatment of Alzheimer's disease? Biol Psychiatry 68: 876–878.
- 31. Holden C (2009) Neuroscience. Fetal cells again? Science 326: 358–359.
- 32. Kimmelman J, London AJ, Ravina B, Ramsay T, Bernstein M, et al. (2009) Launching invasive, first-in-human trials against Parkinson's disease: ethical considerations. Mov Disord 24: 1893–1901.
- 33. Lindvall O, Kokaia Z (2010) Stem cells in human neurodegenerative disorders—time for clinical translation? J Clin Invest 120: 29–40.
- 34. Anderson JA, Kimmelman J (2010) Extending clinical equipoise to phase 1 trials involving patients: unresolved problems. Kennedy Inst Ethics J 20: 75–98.
- 35. Gilman S, Koller M, Black RS, Jenkins L, Griffith SG, et al. (2005) Clinical effects of Abeta immunization (AN1792) in patients with AD in an interrupted trial. Neurology 64: 1553–1562.
- 36. Feldman HH, Doody RS, Kivipelto M, Sparks DL, Waters DD, et al. (2010) Randomized controlled trial of atorvastatin in mild to moderate Alzheimer disease: LEADe. Neurology 74: 956–964.
- 37. Elan Corporation (2010) Elan and Transition Therapeutics announce topline summary results of Phase 2 study and plans for Phase 3 for ELND005 (Scyllo-inositol) [press release].
- 38. Salloway S, Sperling R, Gilman S, Fox NC, Blennow K, et al. (2009) A phase 2 multiple ascending dose trial of bapineuzumab in mild to moderate Alzheimer disease. Neurology 73: 2061–2070.
- 39. Winblad B, Giacobini E, Frolich L, Friedhoff LT, Bruinsma G, et al. (2010) Phenserine efficacy in Alzheimer's disease. J Alzheimers Dis 22: 1201–1208.
- 40. Gold M, Alderton C, Zvartau-Hind M, Egginton S, Saunders AM, et al. (2010) Rosiglitazone monotherapy in mild-to-moderate alzheimer's disease: results from a randomized, double-blind, placebo-controlled phase III study. Dement Geriatr Cogn Disord 30: 131–146.
- 41. Green RC, Schneider LS, Amato DA, Beelen AP, Wilcock G, et al. (2009) Effect of tarenflurbil on cognitive decline and activities of daily living in patients with mild Alzheimer disease: a randomized controlled trial. JAMA 302: 2557–2564.
- 42. Bellus Health Inc (2008) Neurochem announces results from Tramiprosate (ALZHEMED(TM)) North American Phase III clinical trial.
- 43. Marks WJ Jr, Bartus RT, Siffert J, Davis CS, Lozano A, et al. (2010) Gene delivery of AAV2-neurturin for Parkinson's disease: a double-blind, randomised, controlled trial. Lancet Neurol 9: 1164–1172.
- 44. Nutt J, Burchiel KJ, Comella CL, Jankovic J, Lang AE, et al. (2003) Randomized, double-blind trial of glial cell line-derived neurotrophic factor (GDNF) in PD. Neurology 60: 69–73.
- 45. Lang AE, Gill S, Patel NK, Lozano A, Nutt JG, et al. (2006) Randomized controlled trial of intraputamenal glial cell line-derived neurotrophic factor infusion in Parkinson disease. Ann Neurol 59: 459–466.
- 46. Freed CR, Greene PE, Breeze RE, Tsai WY, DuMouchel W, et al. (2001) Transplantation of embryonic dopamine neurons for severe Parkinson's disease. N Engl J Med 344: 710–719.
- 47. Olanow CW, Goetz CG, Kordower JH, Stoessl AJ, Sossi V, et al. (2003) A double-blind controlled trial of bilateral fetal nigral transplantation in Parkinson's disease. Ann Neurol 54: 403–414.
- 48. Watts RL, Freeman TB, Hauser RA, Bakay RAE, Ellias SA, et al. (2001) A double-blind, randomised, controlled, multicenter clinical trial of the safety and efficacy of stereotaxic intrastriatal implantation of fetal porcine ventral mesencephalic tissue (Neurocelli-PD) vs. imitation surgery in patients with Parkinson's disease (PD). Parkinsonism Relat Disord 7(Suppl): S87.
- 49. Olanow CW, Stern MB, Sethi K (2009) The scientific and clinical basis for the treatment of Parkinson disease. Neurology 72(21 Suppl 4): S1–S136.