**DOI:** 10.1128/AAC.01743-10

## ABSTRACT

Antimicrobial drug development has greatly diminished due to regulatory uncertainty about the magnitude of the antibiotic treatment effect. Herein we evaluate the utility of pharmacometric-based analyses for determining the magnitude of the treatment effect. Frequentist and Bayesian pharmacometric-based logistic regression analyses were conducted by using data from a phase 3 clinical trial of tigecycline-treated patients with hospital-acquired pneumonia (HAP) to evaluate relationships between the probability of microbiological or clinical success and the free-drug area under the concentration-time curve from time zero to 24 h (AUC_{0-24})/MIC ratio. By using both the frequentist and Bayesian approaches, the magnitude of the treatment effect was determined using three different methods based on the probability of success at free-drug AUC_{0-24}/MIC ratios of 0.01 and 25. Point estimates of the treatment effect for microbiological response (method 1) were larger using the frequentist approach than using the Bayesian approach (Bayesian estimate, 0.395; frequentist estimate, 0.637). However, the Bayesian credible intervals were tighter than the frequentist confidence intervals, demonstrating increased certainty with the former approach. The treatment effect determined by taking the difference in the probabilities of success between the upper limit of a 95% interval for the minimal exposure and the lower limit of a 95% interval at the maximal exposure (method 2) was greater for the Bayesian analysis (Bayesian estimate, 0.074; frequentist estimate, 0.004). After utilizing bootstrapping to determine the lower 95% bounds for the treatment effect (method 3), treatment effect estimates were still higher for the Bayesian analysis (Bayesian estimate, 0.301; frequentist estimate, 0.166). These results demonstrate the utility of frequentist and Bayesian pharmacometric-based analyses for the determination of the treatment effect using contemporary trial endpoints. Additionally, as demonstrated by using pharmacokinetic-pharmacodynamic data, the magnitude of the treatment effect for patients with HAP is large.

## INTRODUCTION

Antibacterial drug development has greatly diminished due to the impact of regulatory uncertainty regarding the magnitude of drug effect, contemporary clinical trial endpoints, and the ability of these endpoints to capture a meaningful benefit. These elements are critical to statistically power noninferiority (NI) studies, and the lack of this information undermines the viability of contemporary clinical trial designs. One approach to ascertain this knowledge is through the examination of historical data, as presented in U.S. Food and Drug Administration (FDA) draft guidance documents for acute bacterial skin and skin structure infections (ABSSSI) (15) and for hospital-acquired pneumonia (HAP) and ventilator-associated pneumonia (VAP) (16).

Historical data from studies of sulfanilamide (13) and prontosil (14) for the treatment of erysipelas in the 1930s serve as the foundation for determining the magnitude of the drug effect and identifying new clinical trial endpoints in the recently released guidance for ABSSSI (15). There is great controversy, however, surrounding the use of historical data for this purpose. In the papers from the 1930s, for example, there are obvious limitations that can introduce potential biases (e.g., no placebo control, a lack of blinding and randomization, and the use of medically dubious interventions), but most concerning is the inability to authenticate the source data. Source data verification is an essential component of good clinical practice (GCP) guidelines that are ensconced in Federal law (19a). It is of concern that these studies, deemed pivotal sources for justifying statistical NI margin calculations, actually predate GCP concepts and standards and, thus, are in direct conflict with Federal law.

Using historical data, the U.S. FDA and others have employed primarily a single statistical approach, which is based on frequentist inference, to evaluate the above-described issues. Alternative statistical approaches, such as Bayesian inference and pharmacometric methods, have not been fully considered by the U.S. FDA.

In brief, frequentist inference draws conclusions about the magnitude of the drug effect, which is treated as a fixed unknown value that is estimated by using only the data. With knowledge of the distribution of sample statistics based on a specified model, one constructs a confidence interval that contains potential values of the drug effect for which the observed data are plausible to a specified level. In contrast, Bayesian inference begins with a quantification of the prior belief and uncertainty concerning possible values of the drug effect. The data are then used to modify that belief, resulting in the construction of a credible interval within which the drug effect is likely to lie, to some specified level. Thus, a Bayesian approach, which incorporates prior information, can be coupled with the results of pharmacokinetic-pharmacodynamic (PK-PD) analyses derived by using pharmacometric tools and data from contemporary clinical trials that were performed by using fundamental elements of GCP, to more precisely estimate the drug effect.

To illustrate the potential impact of the two different statistical approaches, frequentist and Bayesian pharmacometric-based analyses of phase 3 patient data involving tigecycline for the treatment of HAP were carried out. Tigecycline is a broad-spectrum intravenous antibiotic first approved for use by the U.S. FDA in 2005. Using this contemporary data set, the goal of these analyses was to provide an alternative statistical paradigm by examining the utility of frequentist and Bayesian pharmacometric-based approaches for estimating the magnitude of the treatment effect.

## MATERIALS AND METHODS

Clinical data. A description of the clinical data from the above-described clinical trial of patients with HAP was provided previously (3, 8).

Analysis populations. Only those patients with HAP who were microbiologically evaluable and who had sufficient pharmacokinetic data for tigecycline exposure estimations were included. Among those patients who were clinically and microbiologically evaluable, 22.8% (61/268 patients) and 31.4% (61/194), respectively, had sufficient pharmacokinetic data for tigecycline exposure estimations (3, 8). Among the 61 microbiologically evaluable patients, a total of 23 (37.7%) had VAP (3).

Statistical methods. Logistic regression analyses were carried out to evaluate PK-PD relationships for efficacy. Separate analyses were conducted for the two dependent variables of interest, microbiological and clinical responses at the test-of-cure (TOC) visit. The independent variable in each analysis was the log_{10} transformation of the ratio of the free-drug area under the concentration-time curve from time zero to 24 h (AUC_{0-24}) to the MIC. Logistic regression analyses were carried out in a frequentist fashion based on maximum likelihood estimation. Bayesian logistic regression analyses, employing a noninformative uniform prior distribution for the intercept parameter but an informative normal prior distribution for the slope parameter corresponding to the free-drug AUC_{0-24}/MIC ratio, were also performed.

The treatment effect was defined as the difference in the probabilities of microbiological or clinical success at low and high free-drug AUC_{0-24}/MIC ratios. Low and high free-drug AUC_{0-24}/MIC ratios were fixed at 0.01 and 25, respectively. Normal prior mean and standard deviation parameters for the slope term for Bayesian logistic regression analyses were selected to reflect a 99% prior likelihood of a treatment effect within the interval of −0.25 to 0.75, with a 0.5% likelihood beyond each extreme.
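Written out, the regression model and the treatment effect defined above take the following form. The symbols p, x, β₀, β₁, μ, σ, and Δ are notation introduced here for illustration only; the prior parameters μ and σ are left symbolic because their numerical values are not reported.

$$
\operatorname{logit} p(x) = \ln\frac{p(x)}{1 - p(x)} = \beta_0 + \beta_1 x,
\qquad x = \log_{10}\!\left(\frac{f\mathrm{AUC}_{0\text{-}24}}{\mathrm{MIC}}\right)
$$

$$
\Delta = p(\log_{10} 25) - p(\log_{10} 0.01) \approx p(1.398) - p(-2)
$$

In the Bayesian analyses, the intercept β₀ carries a flat prior and the slope carries the informative prior β₁ ~ Normal(μ, σ²), with μ and σ chosen so that the implied prior for Δ places 99% of its mass between −0.25 and 0.75.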

For the relationships between probability of microbiological or clinical success and free-drug AUC_{0-24}/MIC ratio based on frequentist logistic regression, 95% pointwise confidence intervals were calculated based on the model parameter estimates. For the set of relationships based on the Bayesian logistic regression, 95% pointwise credible intervals were calculated based on posterior distributions for the model parameters.

The term “point estimate” at a given value of the free-drug AUC_{0-24}/MIC ratio was used to represent the model-predicted probability of success based on the frequentist analyses and the probability of success calculated by using the centers of the normal posterior distribution of the intercept and slope based on the Bayesian analyses.

By using both the frequentist and Bayesian approaches, the treatment effect was estimated using three different methods based on the probability of success at free-drug AUC_{0-24}/MIC ratios of 0.01 and 25. For the first method, the treatment effect was simply calculated as the difference in point estimates of the probability of success at free-drug AUC_{0-24}/MIC ratios of 0.01 and 25 (method 1). By using an approach that has been the current practice for the design of NI clinical trials for antibacterial agents, a highly conservative estimate of the treatment effect was calculated as the difference between the upper limit of a 95% interval for the probability of success at a free-drug AUC_{0-24}/MIC ratio of 0.01 and the lower limit of a 95% interval for the probability of success at a free-drug AUC_{0-24}/MIC ratio of 25 (method 2). For the third method, 95% lower bounds for the treatment effect were obtained by using 1,000 bootstrap samples and the bias-corrected and accelerated (BCa) method (6, 7) (method 3).
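The first two estimators can be stated compactly. In the notation below, which is introduced here for illustration, p̂(r) is the point estimate of the probability of success at free-drug AUC_{0-24}/MIC ratio r, and L₉₅(r) and U₉₅(r) are the lower and upper limits of the 95% pointwise interval at that ratio.

$$
\hat{\Delta}_1 = \hat{p}(25) - \hat{p}(0.01),
\qquad
\hat{\Delta}_2 = L_{95}(25) - U_{95}(0.01)
$$

Provided that each point estimate lies inside its own interval, so that L₉₅(25) ≤ p̂(25) and p̂(0.01) ≤ U₉₅(0.01), it follows that

$$
\hat{\Delta}_2 \le \hat{\Delta}_1 ,
$$

and the gap between the two grows with the widths of the two intervals. Method 3 instead places a single 95% lower bound directly on the difference.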

Logistic regression model parameters were estimated by using the frequentist and Bayesian approaches and the R functions “glm” and “bayesglm” (package “arm”), respectively. All analyses were performed by using R, version 2.8.1 (12). Bootstrapping was performed by using the package “boot” in the above-described version of R.
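The bootstrap step of method 3 can be sketched as follows. This is a minimal illustration in Python using only the standard library; it is not the R code used for the analyses. It makes three simplifying assumptions: the data are synthetic (patient-level data are not given in the text), the lower bound is a plain percentile bound rather than the BCa bound used in the analyses, and the bound is taken as one-sided at the 95% level. All function and variable names are hypothetical.

```python
import math
import random

X_LOW = math.log10(0.01)   # log10 of the low free-drug AUC0-24/MIC ratio
X_HIGH = math.log10(25.0)  # log10 of the high free-drug AUC0-24/MIC ratio


def sigmoid(z):
    z = max(-30.0, min(30.0, z))  # clamp to avoid overflow in exp
    return 1.0 / (1.0 + math.exp(-z))


def fit_logistic(x, y, iters=50, jitter=1e-6):
    """Maximum-likelihood fit of logit(p) = b0 + b1 * x by Newton-Raphson."""
    b0 = b1 = 0.0
    for _ in range(iters):
        g0 = g1 = h00 = h01 = h11 = 0.0
        for xi, yi in zip(x, y):
            p = sigmoid(b0 + b1 * xi)
            w = p * (1.0 - p)
            g0 += yi - p            # score for the intercept
            g1 += (yi - p) * xi     # score for the slope
            h00 += w
            h01 += w * xi
            h11 += w * xi * xi
        h00 += jitter  # keeps the 2x2 system solvable for degenerate resamples
        h11 += jitter
        det = h00 * h11 - h01 * h01
        d0 = (h11 * g0 - h01 * g1) / det
        d1 = (h00 * g1 - h01 * g0) / det
        b0 += d0
        b1 += d1
        if abs(d0) + abs(d1) < 1e-8:
            break
    return b0, b1


def treatment_effect(x, y):
    """Difference in fitted success probability between the high and low ratios."""
    b0, b1 = fit_logistic(x, y)
    return sigmoid(b0 + b1 * X_HIGH) - sigmoid(b0 + b1 * X_LOW)


def bootstrap_lower_bound(x, y, n_boot=1000, level=0.95, seed=1):
    """One-sided percentile lower bound (assumption: percentile, not BCa)."""
    rng = random.Random(seed)
    n = len(x)
    effects = []
    for _ in range(n_boot):
        idx = [rng.randrange(n) for _ in range(n)]  # resample patients with replacement
        effects.append(treatment_effect([x[i] for i in idx], [y[i] for i in idx]))
    effects.sort()
    return effects[int((1.0 - level) * n_boot)]


# Synthetic data for illustration only: 12 exposure levels, 5 patients each,
# with the number of successes rising with exposure.
levels = [-2.0 + 0.3 * k for k in range(12)]
successes = [1, 1, 2, 2, 2, 3, 3, 3, 4, 4, 4, 4]
x, y = [], []
for lv, s in zip(levels, successes):
    x += [lv] * 5
    y += [1] * s + [0] * (5 - s)

point = treatment_effect(x, y)
lower = bootstrap_lower_bound(x, y)
print(f"point estimate = {point:.3f}, bootstrap lower bound = {lower:.3f}")
```

The key design point is that each resample is refitted and the difference in probabilities is recomputed as a single quantity, so the bound is placed on the treatment effect itself rather than being assembled from two separate pointwise intervals.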

## RESULTS

Frequentist and Bayesian logistic regression. Estimated relationships between microbiological or clinical response and free-drug AUC_{0-24}/MIC ratio based on frequentist logistic regression were significant (*P* = 0.031 for each). Based on Bayesian logistic regression, the posterior likelihood of a positive slope parameter was high for both microbiological and clinical responses (99.85% for each), thus demonstrating strong evidence of a positive treatment effect.
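To give a sense of scale for the 99.85% figure, the sketch below shows how a posterior probability of a positive slope follows from a normal posterior, which is the posterior form referred to in the Methods. How the reported value was actually computed is not stated, so the function name and the normal-approximation route are assumptions made for illustration.

```python
from statistics import NormalDist


def prob_positive_slope(post_mean, post_sd):
    """P(slope > 0) when the slope posterior is Normal(post_mean, post_sd)."""
    return 1.0 - NormalDist(post_mean, post_sd).cdf(0.0)


# Posterior mean, in units of posterior standard deviations, that would
# correspond to a 99.85% probability of a positive slope.
z = NormalDist().inv_cdf(0.9985)
print(f"posterior mean / posterior sd = {z:.2f}")
print(f"check: P(slope > 0) = {prob_positive_slope(z, 1.0):.4f}")
```

Under this assumption, a 99.85% posterior probability corresponds to a posterior mean lying close to three posterior standard deviations above zero.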

Estimated relationships for microbiological and clinical responses based on frequentist and Bayesian logistic regression are shown in Fig. 1A and B and 2A and B, respectively. For each set of relationships, 95% pointwise confidence and credible intervals are shown based on frequentist and Bayesian logistic regression, respectively.

Estimates of treatment effect. Estimates of the treatment effect determined for the three different methods based on the probability of microbiological and clinical success at free-drug AUC_{0-24}/MIC ratios of 0.01 and 25 using the frequentist and Bayesian approaches are summarized in Table 1. For method 1, a free-drug AUC_{0-24}/MIC ratio of 0.01 yielded a higher point estimate of the probability of microbiological success using a Bayesian approach than that using the frequentist approach (0.370 versus 0.233). At a free-drug AUC_{0-24}/MIC ratio of 25, the point estimate of the probability of microbiological success using the Bayesian approach was lower than that using the frequentist approach (0.765 versus 0.870), thus yielding an estimate of a treatment effect for this endpoint of 0.395 for the Bayesian approach, versus 0.637 for the frequentist approach. Using a Bayesian approach, a free-drug AUC_{0-24}/MIC ratio of 0.01 yielded a higher point estimate of the probability of clinical success than that using the frequentist approach (0.355 versus 0.195). At a free-drug AUC_{0-24}/MIC ratio of 25, the point estimate of the probability of clinical success using the Bayesian approach was lower than that using the frequentist approach (0.760 versus 0.867), thus yielding an estimate of a treatment effect for this endpoint of 0.405 for the Bayesian approach, versus 0.672 for the frequentist approach.
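Because a two-parameter logistic curve is fixed by any two points on it, the point estimates quoted above are enough to back-calculate the intercept and slope they imply. The sketch below does this and recomputes the method 1 differences. The coefficients it prints are derived here from the reported probabilities and are not values reported by the analyses; the function names are hypothetical.

```python
import math


def logit(p):
    """Log-odds of a probability strictly between 0 and 1."""
    return math.log(p / (1.0 - p))


def implied_line(p_low, p_high, ratio_low=0.01, ratio_high=25.0):
    """Intercept and slope of the logistic line through two reported points."""
    x_low, x_high = math.log10(ratio_low), math.log10(ratio_high)
    slope = (logit(p_high) - logit(p_low)) / (x_high - x_low)
    intercept = logit(p_low) - slope * x_low
    return intercept, slope


# (probability at ratio 0.01, probability at ratio 25), as quoted in the text
reported = {
    ("microbiological", "frequentist"): (0.233, 0.870),
    ("microbiological", "Bayesian"): (0.370, 0.765),
    ("clinical", "frequentist"): (0.195, 0.867),
    ("clinical", "Bayesian"): (0.355, 0.760),
}

for (endpoint, approach), (p_low, p_high) in reported.items():
    b0, b1 = implied_line(p_low, p_high)
    print(f"{endpoint:15s} {approach:11s} effect={p_high - p_low:.3f} "
          f"intercept={b0:.2f} slope={b1:.2f}")
```

For the microbiological endpoint, the implied slope is roughly 0.9 per log_{10} unit for the frequentist fit and roughly 0.5 for the Bayesian fit. A flatter Bayesian curve would be consistent with the informative slope prior pulling the estimate toward the prior mean, although that reading is an inference from the reported numbers rather than a finding stated in the analyses.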

As shown in Fig. 2A and B, the Bayesian credible intervals are notably tighter than the frequentist confidence intervals shown in Fig. 1A and B, demonstrating an increased level of certainty with the incorporation of prior information. As a result of the increased certainty, even the very conservative estimate of the treatment effect determined by using method 2 for microbiological response using the lower limit of the 95% interval at the maximal exposure minus the upper limit of the 95% interval at the minimal exposure was greater for the Bayesian approach than for the frequentist approach (0.663 − 0.589 = 0.074, versus 0.694 − 0.690 = 0.004). Similarly, for the clinical response, the lower limit of the 95% interval at the maximal exposure minus the upper limit of the 95% interval at the minimal exposure was greater for the Bayesian approach than for the frequentist approach (0.657 − 0.572 = 0.085, versus 0.690 − 0.647 = 0.043).
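The method 2 arithmetic quoted above can be reproduced directly from the reported interval limits. The following is a minimal sketch; the function name and the dictionary layout are illustrative choices, and the numbers are those given in the text.

```python
def method2_effect(lower_at_max_exposure, upper_at_min_exposure):
    """Method 2: lower 95% limit at ratio 25 minus upper 95% limit at ratio 0.01."""
    return lower_at_max_exposure - upper_at_min_exposure


# (lower limit at ratio 25, upper limit at ratio 0.01), as quoted in the text
limits = {
    ("microbiological", "Bayesian"): (0.663, 0.589),
    ("microbiological", "frequentist"): (0.694, 0.690),
    ("clinical", "Bayesian"): (0.657, 0.572),
    ("clinical", "frequentist"): (0.690, 0.647),
}

for (endpoint, approach), (lo_max, up_min) in limits.items():
    effect = method2_effect(lo_max, up_min)
    print(f"{endpoint:15s} {approach:11s} method 2 effect = {effect:.3f}")
```

Because the estimate is built from the two interval limits that face each other, any widening of either pointwise interval shrinks it, which is why the wider frequentist intervals leave almost no estimated effect for the microbiological response.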

In comparison to the conservatively calculated lower bounds for the treatment effect described above based on method 2, the 95% lower confidence and credible bounds for the treatment effect derived by method 3, which entailed the use of bootstrapping and the bias-corrected and accelerated (BCa) method, demonstrated a substantial treatment effect. The 95% lower confidence and credible bounds for the treatment effect based on the frequentist and Bayesian approaches were 0.166 and 0.301, respectively, for the microbiological response and 0.211 and 0.314, respectively, for the clinical response.

## DISCUSSION

The objective of these analyses was to demonstrate the utility of frequentist and Bayesian pharmacometric-based logistic regression analyses to determine the magnitude of the treatment effect and the ability of clinical trial endpoints to capture the drug benefit using contemporary data. A tigecycline clinical trial data set containing data for patients with HAP (37.7% of whom also had VAP) was selected for these analyses (3). These data were considered especially valuable and timely given that they were gathered from a contemporary trial and given the considerable discussions for this specific labeled indication that have been generated at U.S. FDA- and Infectious Diseases Society of America (IDSA)-cosponsored workshops (18, 19) and by U.S. FDA guidance documents (16).

We considered a Bayesian pharmacometric-based approach to be rational for three reasons. First, tigecycline has demonstrated *in vitro* activity against bacteria of clinical interest (9). Second, because of the *in vitro* microbiological activity of tigecycline, it was evaluated in animal infection models in which effective bacterial killing and mortality reduction were demonstrated (20). Moreover, tigecycline exposure, as measured by the free-drug AUC_{0-24}/MIC ratio, and outcome were well described by PK-PD models in these animal systems. Third, based upon the *in vitro* and *in vivo* activities of tigecycline, tigecycline was studied in patients for whom efficacy was demonstrated and related to the AUC_{0-24}/MIC ratio using PK-PD models (1, 3, 4, 10, 11). Acknowledging each of the above-mentioned steps in the drug development process is critical to a Bayesian understanding that antibacterial agents entering late-stage clinical development have been effectively prescreened and, thus, have a high potential to be efficacious, assuming that appropriate dose regimens are selected for clinical study.

Using PK-PD relationships for tigecycline efficacy in patients with HAP, the lower bounds for the treatment effect were identified by using frequentist and Bayesian logistic regression analyses for microbiological and clinical responses at the TOC visit. These findings have a number of profound implications. First, the relationships indicate that the above-described contemporary clinical endpoints capture a reasonable measure of the drug effect. This observation negates the first of the two reasons underpinning the use of historical data to determine the magnitude of the treatment effect, namely, the perception that no data are available to support the ability of contemporary endpoints to capture a measure of the drug effect.

Second, the magnitude of the treatment effect based on the difference between point estimates was high for this population of patients (0.395 and 0.637 for the microbiological response and 0.405 and 0.672 for the clinical response, using the Bayesian and frequentist approaches, respectively). Using frequentist and Bayesian logistic regression models, the probability of a successful response as free-drug AUC_{0-24}/MIC ratio approached zero was determined, thus providing an estimate of the no-treatment (i.e., placebo) response. This observation negates the second reason underpinning the use of historical data, namely, the perception that data are not available to provide an estimate of the placebo response rate. Note that the estimates of the treatment effect described above are based upon the difference in drug effects at high and low free-drug AUC_{0-24}/MIC ratios (rather than placebo). Therefore, these estimates of the treatment effect are biased low; that is, the estimates of the treatment effect described above are expected to be lower than those that hypothetically could be derived from patients receiving placebo.

Rather than the use of the best estimate of the treatment effect, which is based on the difference between point estimates (method 1) and which is not biased, the current practice for estimating the treatment effect of antibacterial agents when designing NI clinical trials involves taking the difference between the upper and lower bounds of the 95% confidence intervals to estimate the magnitude of the treatment effect (method 2). The magnitude of the treatment effect, which is referred to as M1, is then empirically discounted by 30 to 50% to yield M2 (15–17). Although this step is implemented to derive a conservative estimate of the treatment effect, the above-described adjustments for M1 are arbitrary and result in biologically implausible estimates. Nonetheless, by using this method, the estimate of the treatment effect was higher for the Bayesian approach than for the frequentist approach (e.g., 0.074 × 0.5 = 0.037 versus 0.004 × 0.5 = 0.002, respectively, for the microbiological response).
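The discounting step can be written as a one-line formula. Here d denotes the fractional discount, a symbol introduced for illustration; the 30 to 50% range and the M1 values for the microbiological response are those given in the text.

$$
M_2 = (1 - d)\,M_1, \qquad 0.30 \le d \le 0.50
$$

$$
\text{Bayesian: } M_1 = 0.074 \;\Rightarrow\; 0.037 \le M_2 \le 0.052,
\qquad
\text{frequentist: } M_1 = 0.004 \;\Rightarrow\; 0.002 \le M_2 \le 0.003
$$

The upper ends of these ranges (0.074 × 0.7 ≈ 0.052 and 0.004 × 0.7 ≈ 0.003) correspond to the 30% discount, and the lower ends correspond to the 50% discount used in the example above.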

In order to determine an appropriate lower 95% confidence or credible bound for the treatment effect that is not dependent on the subtraction of point estimates for the probability of a successful response at minimal and maximal free-drug AUC_{0-24}/MIC ratios, bootstrapping was employed. Using this third method, the estimate of the treatment effect was higher for the Bayesian approach than for the frequentist approach (0.301 versus 0.166, respectively, for the microbiological response). These treatment effect estimates are conservative and biologically plausible (i.e., drug effect decreases as exposure decreases), thereby negating the need for the arbitrary M2 adjustment. While U.S. FDA guidance for industry for NI clinical trials does advocate accounting for variability when estimating the treatment effect (17), it does not describe the approach encompassed by method 3, which accomplishes this goal effectively.

It is important to recognize why antibacterial treatment effect estimates have been so low when calculated by use of method 2. The method rationale is to obtain a “conservative” estimate of the antibacterial treatment effect, but the description of conservative is completely misleading. The most accurate estimate of the antibacterial treatment effect, based on method 1, is the actual difference between the point estimates of the mean. Instead, the current practice, reflected by method 2, involves calculating the difference between the lower limit of the 95% confidence interval for the maximal effect and the upper limit of the 95% confidence interval for the minimal effect. The resulting estimate of the treatment effect is neither accurate nor reasonably conservative; it is merely biased to be much lower than the best estimate of the true effect. Even this highly biased estimate of the treatment effect is not considered sufficiently conservative. This is evidenced by another completely unsupported reduction (discounting) of the magnitude of the already biased estimate of the treatment effect by 30 to 50%, the scientific basis of which is lacking. Method 3, which employed bootstrap samples and the bias-corrected and accelerated (BCa) method, provided an estimate of the treatment effect that is appropriately conservative based on the selection of a single level of confidence or credibility for the resulting lower bound.

In addition to the debate about how to calculate the treatment effect, the pharmacometric-based approach to determine the magnitude of the treatment effect described herein has been subject to certain criticisms. Critics argue that there could be alternate, unobserved factors that result in a spurious relationship between drug exposure and effect. However, pharmacometric-based analyses have been replicated numerous times with the identification of PK-PD relationships for efficacy within an antibiotic class as well as for different drug classes with different dynamically linked indices (e.g., AUC/MIC ratio versus time above the MIC) (2). This degree of consistency alone renders the probability low that an unobserved factor can lead to erroneous relationships based on pharmacometric analyses.

One needs to ponder the wisdom of discounting the massive amount of evidence that supports the use of pharmacometric analyses over historical data to determine the magnitude of the treatment effect, evidence that critics wish to ignore. The supporting evidence includes the consistencies of observations between studies with animals (the exposures for which are assigned) and infected patients. Similar results can be shown between studies with animals and those with patients across multiple indications (HAP, VAP, community-acquired bacterial pneumonia, ABSSSI, and acute bacterial exacerbations of chronic bronchitis) and within and across multiple antibiotic classes (2). The biological plausibility and the identification of a PK-PD index that is predictive of outcomes across multiple indications and within and across multiple antibiotic classes strongly argue for the reliability of these analyses. The alternate proposal, that a single patient factor or a constellation of other patient factors is responsible for the manifestation of statistically significant but spurious relationships in these analyses, seems highly unlikely.

As described herein, these results demonstrate the utility of frequentist and Bayesian pharmacometric-based analyses for the determination of the treatment effect and the calculation of NI margins using contemporary trial data that employ the fundamental elements of GCP. While estimates of the treatment effect using the Bayesian approach were higher for methods 2 and 3 than those based on the frequentist approach, the benefit of the Bayesian over the frequentist approach is dependent on the existence (and quality) of prior information, which is not always available. Irrespective of the type of approach, the incorporation of bootstrapping to obtain lower bounds for the treatment effect serves to improve upon an overly imprecise and arbitrary practice of taking the difference between the lower bound of the interval for the maximal effect and the upper bound of the interval for the minimal effect. As demonstrated by the analyses described herein, approaches that integrate contemporary data with pharmacometric-based approaches for the determination of the treatment effect and calculations of NI margins will provide a bridge forward for drug development that is anchored on sounder scientific footings.

## ACKNOWLEDGMENTS

P. G. Ambrose, S. M. Bhavnani, and C. M. Rubino were responsible for tigecycline PK-PD analyses conducted for Wyeth Research and submitted as part of the drug's new drug application (NDA). P. G. Ambrose and S. M. Bhavnani serve as consultants for Pfizer, which currently holds the tigecycline NDA. P. G. Ambrose is currently a special FDA employee and is a committee member for the Foundation of the National Institutes for Health Biomarker Consortium. E. J. Ellis-Grosse is a consultant to various pharmaceutical and drug development companies. She has received financial compensation and/or owns benefits from the following companies: Affinium, Astra-Zeneca, Cadence, Cempra, Cerexa, Cubist, Elan, Hospira, Middlebrook, Novartis, Protez, Pfizer, Rempex, and Targanta. G. L. Drusano has no conflicts to disclose.

## FOOTNOTES

- Received 14 December 2010.
- Returned for modification 14 January 2011.
- Accepted 6 December 2011.
- Accepted manuscript posted online 12 December 2011.

- Copyright © 2012, American Society for Microbiology. All Rights Reserved.