Research Article

Initial Severity and Antidepressant Benefits: A Meta-Analysis of Data Submitted to the Food and Drug Administration

  • Irving Kirsch,

    To whom correspondence should be addressed. E-mail:

    Affiliation: Department of Psychology, University of Hull, Hull, United Kingdom

  • Brett J Deacon,

    Affiliation: University of Wyoming, Laramie, Wyoming, United States of America

  • Tania B Huedo-Medina,

    Affiliation: Center for Health, Intervention, and Prevention, University of Connecticut, Storrs, Connecticut, United States of America

  • Alan Scoboria,

    Affiliation: Department of Psychology, University of Windsor, Windsor, Ontario, Canada

  • Thomas J Moore,

    Affiliation: Institute for Safe Medication Practices, Huntingdon Valley, Pennsylvania, United States of America

  • Blair T Johnson

    Affiliation: Center for Health, Intervention, and Prevention, University of Connecticut, Storrs, Connecticut, United States of America

  • Published: February 26, 2008
  • DOI: 10.1371/journal.pmed.0050045

Reader Comments (48)

Analytical differences

Posted by plosmedicine on 31 Mar 2009 at 00:23 GMT

Author: PJ Leonard
Submitted Date: March 16, 2008
Published Date: March 17, 2008
This comment was originally posted as a “Reader Response” on the publication date indicated above. All Reader Responses are now available as comments.

It is good of Johnson et al. to reply to the responses here. However, I do not think they have adequately addressed some of the reservations concerning their paper.

In particular, I do not think they have engaged with my finding that, when the raw HRSD change scores are used, the placebo response does not in fact decrease as baseline HRSD severity increases.

I am not clear exactly what they mean when they say that I used between-subjects analyses to suggest that the effect size (in raw HRSD change scores) is larger than reported in their paper, whereas they used within-subjects analyses.

My analyses use conventional meta-analytic methods, in which the drug-versus-placebo effect size within each study is analysed directly. By contrast, the low estimated effect size in HRSD units in this paper seems likely to result from weighting the drug and placebo groups separately (a 'within-subjects' analysis?) and then comparing the two pooled estimates. This would also explain why the paper contains no forest plots.

This is not an acceptable analytic technique. It ignores the fact that the improvements in the placebo and drug groups of the same study are related, yet the placebo and drug groups from any given study can receive grossly different weights when pooled separately. For example, the fluoxetine trials would receive half as much weight in the drug analysis as in the placebo analysis, owing, for instance, to different sample sizes in the experimental arms.
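To illustrate the weighting problem, here is a minimal Python sketch of a fixed-effect, inverse-variance meta-analysis. It assumes two hypothetical trials with invented numbers (not data from this paper) in which every trial shows the same 3-point drug advantage but the arm sizes are unbalanced in opposite directions. The function names are illustrative only, and this is not a reconstruction of the authors' actual procedure. It simply contrasts pooling each study's drug-minus-placebo difference with pooling the drug and placebo arms separately and then subtracting.

```python
from dataclasses import dataclass


@dataclass
class Arm:
    n: int               # patients in the arm
    mean_change: float   # mean HRSD improvement
    sd_change: float     # SD of the HRSD improvement


@dataclass
class Study:
    drug: Arm
    placebo: Arm


def se2(arm: Arm) -> float:
    """Squared standard error of an arm's mean change."""
    return arm.sd_change ** 2 / arm.n


def pooled_difference(studies: list[Study]) -> float:
    """Conventional approach: pool each study's drug-minus-placebo
    difference, weighted by the inverse of that difference's variance."""
    num = den = 0.0
    for s in studies:
        diff = s.drug.mean_change - s.placebo.mean_change
        weight = 1.0 / (se2(s.drug) + se2(s.placebo))
        num += weight * diff
        den += weight
    return num / den


def pooled_arm(arms: list[Arm]) -> float:
    """Inverse-variance pooled mean for one kind of arm, ignoring
    which study each arm came from."""
    weights = [1.0 / se2(a) for a in arms]
    return sum(w * a.mean_change for w, a in zip(weights, arms)) / sum(weights)


def separate_arms_difference(studies: list[Study]) -> float:
    """Questioned approach: pool drug arms and placebo arms separately,
    then subtract the two pooled means."""
    return (pooled_arm([s.drug for s in studies])
            - pooled_arm([s.placebo for s in studies]))


# Hypothetical trials: each shows a 3-point drug advantage, but the
# arm sizes are unbalanced in opposite directions.
STUDIES = [
    Study(drug=Arm(200, 10.0, 8.0), placebo=Arm(50, 7.0, 8.0)),
    Study(drug=Arm(50, 14.0, 8.0), placebo=Arm(200, 11.0, 8.0)),
]

if __name__ == "__main__":
    print(f"per-study differences pooled: {pooled_difference(STUDIES):.2f}")        # 3.00
    print(f"arms pooled separately:       {separate_arms_difference(STUDIES):.2f}")  # 0.60
```

In this toy case the pooled drug mean is pulled toward the trial with the lower drug response, and the pooled placebo mean toward the trial with the higher placebo response. The separately pooled difference therefore shrinks from 3 to 0.6 points even though no individual trial changed.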

Normalising the HRSD change to the change standard deviation of each group separately is also unacceptable, because a larger change in HRSD score in the drug group could be accompanied by a larger variance, which would shrink the drug group's standardised change. This does not, however, appear to be the case in this study.
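The concern about normalising each group separately can be shown with a single hypothetical trial. The Python sketch below uses invented numbers and assumes that the drug arm improves by 2 more HRSD points than placebo but has a larger change SD. It compares standardising each arm by its own SD and subtracting with standardising the raw difference once by the pooled SD (a conventional Cohen's d). As noted above, such a variance difference does not appear to be present in this study, so the example only shows why the per-arm method is fragile.

```python
import math


def within_arm_d(mean_change: float, sd_change: float) -> float:
    """Change score standardised by that arm's own change SD."""
    return mean_change / sd_change


def pooled_sd(n1: int, sd1: float, n2: int, sd2: float) -> float:
    """Pooled SD of two independent groups (the usual Cohen's d denominator)."""
    return math.sqrt(((n1 - 1) * sd1 ** 2 + (n2 - 1) * sd2 ** 2) / (n1 + n2 - 2))


def between_arm_d(n1: int, m1: float, sd1: float,
                  n2: int, m2: float, sd2: float) -> float:
    """Drug-minus-placebo mean change standardised by the pooled SD."""
    return (m1 - m2) / pooled_sd(n1, sd1, n2, sd2)


# Hypothetical trial: (n, mean HRSD change, SD of change) per arm.
DRUG = (100, 10.0, 10.0)
PLACEBO = (100, 8.0, 6.0)

# Per-arm standardisation, then subtraction.
separate = within_arm_d(*DRUG[1:]) - within_arm_d(*PLACEBO[1:])
# Standardise the raw difference once, using the pooled SD.
pooled = between_arm_d(*DRUG, *PLACEBO)

if __name__ == "__main__":
    print(f"difference of per-arm standardised changes: {separate:.2f}")  # -0.33
    print(f"pooled-SD standardised difference:          {pooled:.2f}")    # 0.24
```

Here the per-arm approach makes the drug look worse than placebo, even though its raw improvement is 2 points larger. The pooled-SD version keeps the sign of that raw advantage.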

Robert Waldmann estimates that the analytical method in this paper introduces more bias than the publication bias present in the data itself:


I note that Figure 4 in the paper by Kirsch et al. is actually more consistent with my finding of 'clinical significance' at a baseline of 26 points than with their suggestion of 28 points. (The 26-point threshold emerges both from regression on the difference scores and from separate regressions on each group's change score.) This discrepancy is undoubtedly because that figure uses raw HRSD scores, as my analyses did, and because the NICE 'clinical significance' threshold of d > 0.5 is actually stricter than the NICE threshold of an HRSD difference > 3.
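The threshold calculation can be sketched in Python as follows. It fits a weighted least-squares line of study-level drug-minus-placebo differences against baseline HRSD, then solves for the baseline at which the fitted difference reaches a chosen threshold, here the 3-point HRSD criterion. The study data and the use of sample sizes as weights are hypothetical assumptions for illustration. They are not the data or the exact weighting used by Kirsch et al. or in my analyses.

```python
def weighted_linear_fit(x: list[float], y: list[float],
                        w: list[float]) -> tuple[float, float]:
    """Weighted least-squares fit of y = intercept + slope * x."""
    sw = sum(w)
    x_bar = sum(wi * xi for wi, xi in zip(w, x)) / sw
    y_bar = sum(wi * yi for wi, yi in zip(w, y)) / sw
    sxy = sum(wi * (xi - x_bar) * (yi - y_bar) for wi, xi, yi in zip(w, x, y))
    sxx = sum(wi * (xi - x_bar) ** 2 for wi, xi in zip(w, x))
    slope = sxy / sxx
    return y_bar - slope * x_bar, slope


def crossing_baseline(intercept: float, slope: float, threshold: float) -> float:
    """Baseline severity at which the fitted difference equals `threshold`."""
    if slope == 0:
        raise ValueError("a flat regression line never crosses the threshold")
    return (threshold - intercept) / slope


# Hypothetical study-level data (not taken from the paper).
BASELINE = [22.0, 24.0, 28.0, 30.0]        # mean baseline HRSD per study
DIFFERENCE = [0.8, 1.6, 3.2, 4.0]          # drug-minus-placebo HRSD change
SAMPLE_SIZE = [100.0, 150.0, 120.0, 80.0]  # assumed regression weights

intercept, slope = weighted_linear_fit(BASELINE, DIFFERENCE, SAMPLE_SIZE)
threshold_baseline = crossing_baseline(intercept, slope, threshold=3.0)

if __name__ == "__main__":
    print(f"fitted line: difference = {intercept:.2f} + {slope:.2f} * baseline")
    print(f"baseline at which the difference reaches 3 points: {threshold_baseline:.1f}")  # 27.5
```

Because least-squares estimates are linear in the outcome, fitting the drug and placebo change scores separately on the same baselines and weights and then subtracting the two lines gives exactly the line fitted to the differences. Under those conditions the two regression routes yield the same threshold.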

I concur that there is a relationship between baseline HRSD severity and effect size. It is worth noting, however, that almost all of the studies examined had baselines above 23 points (and thus fell within the APA/NICE category of 'very severe' depression), so a threshold of 26 points is a fairly typical baseline severity for the studies analysed in this paper (as can be seen from my regression plots or their Figure 4). Any generalisation to less severe categories of depression is unwarranted, since it would depend on extrapolating the regression line into a region containing only a single study.

No competing interests declared.