Antimicrobial Agents and Chemotherapy, January 2006, p. 62-67, Vol. 50, No. 1
0066-4804/06/$08.00+0 doi:10.1128/AAC.50.1.62-67.2006
Copyright © 2006, American Society for Microbiology. All Rights Reserved.
Sujata M. Bhavnani,1,2* Ronald N. Jones,3,4 Alan Forrest,2 and Paul G. Ambrose1,2
Cognigen Corporation, Buffalo, New York,1 School of Pharmacy and Pharmaceutical Sciences, University at Buffalo, Buffalo, New York,2 The JONES Group/JMI Laboratories, North Liberty, Iowa,3 and Tufts University School of Medicine, Boston, Massachusetts4
Received 28 July 2004/ Returned for modification 2 January 2005/ Accepted 26 September 2005
0.5 mg/liter or MIC > 8 mg/liter) imposes certain limitations on the use of SR. In order to investigate the nature of these limitations, simulations were performed to compare a regression tailored for censored data (censored regression [CR]) and one tailored for an SR. By using a model relating piperacillin-tazobactam MICs against Enterobacter spp. to patient age and hospital bed capacity, 200 simulations of 500 isolates were performed. Various MIC censoring patterns were imposed by using 26 left- or right-censored (L,R) pairs (i.e., MICs ≤ 2^L mg/liter [2^L] or MICs > 2^R mg/liter [2^R], respectively). Data were fit by CR and SR for which censored MICs were either (i) excluded, (ii) replaced by 2^L or 2^R, or (iii) replaced by 2^(L - 1) or 2^(R + 1). Total censoring for the 26 pairs ranged from 7 to 86%. By CR, deviations of average parameter estimates from the true parameter values were <0.10 log2 (mg/liter) for all parameters for each of the 26 pairs. By SR, these deviations were >0.10 log2 (mg/liter) for at least 18 of the 26 pairs for all but one parameter. Two-standard-error confidence intervals for individual parameters contained as little as 0% of cases for all SR approaches but ≥91.5% of cases for the CR approach. When censored MIC data are modeled, CR may reduce or eliminate biased parameter estimates obtained by SR.
Most regression analyses for quantitative data are performed by using a dependent variable for which all observations are known. However, the analysis of MICs which include censored values within and at the lower margins of the MIC distribution entails certain challenges. Typically, a high proportion of left-censoring (i.e., MICs ≤ x) is common with highly active drugs. As reported previously, the total proportion of censored MICs was as high as 90% for MICs for cefepime against Klebsiella pneumoniae (3). In such cases, one option is to simply replace the censored value with an actual value (e.g., replace a MIC of >2 mg/liter with a MIC of 2 mg/liter, or replace a MIC of >2 mg/liter with a MIC of 4 mg/liter). Alternatively, all observations for which the dependent variable is censored may be removed from the analysis. A third approach is to retain censored MIC data and use maximum-likelihood estimation. In this manner, a multiple-regression model is fit to the data, and the model incorporates censored MIC observations into the likelihood function by using the tail probabilities of the error distribution (3, 7). By the use of such an approach, the uncertainty of the censored MICs is preserved.
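The way censored observations enter the likelihood through tail probabilities can be sketched in a few lines. This is a minimal illustration that assumes a normal error distribution on the log2 scale; the function name `censored_normal_loglik` and the `(value, kind)` input format are hypothetical and are not taken from the software used in the study. In a regression setting, `mu` would be the linear predictor for each observation rather than a single common mean.

```python
import math
from statistics import NormalDist


def censored_normal_loglik(observations, mu, sigma):
    """Log-likelihood of log2 MICs under a normal error model with censoring.

    observations: list of (value, kind) pairs, where kind is
      "exact" (value observed), "left" (true value <= value),
      or "right" (true value > value).
    """
    dist = NormalDist(mu, sigma)
    total = 0.0
    for value, kind in observations:
        if kind == "exact":
            total += math.log(dist.pdf(value))        # density at the observed value
        elif kind == "left":
            total += math.log(dist.cdf(value))        # lower tail probability
        elif kind == "right":
            total += math.log(1.0 - dist.cdf(value))  # upper tail probability
        else:
            raise ValueError(f"unknown censoring kind: {kind}")
    return total


# Illustrative data only: one left-censored, one exact, one right-censored value.
example = [(-1, "left"), (1, "exact"), (3, "right")]
print(censored_normal_loglik(example, mu=1.0, sigma=1.9))
```

Maximizing this function over the model parameters, rather than discarding or overwriting the censored values, is what keeps the uncertainty of the censored MICs in the fit.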
To compare how well general linear regression that accounts for the censored pattern of MICs (i.e., "censored regression") and standard regression, which excludes or replaces censored MICs, estimate model parameters, we conducted a series of simulations using data with various proportions of left- and right-censored MICs.
[Equation 1]
18 years (parameter 2), 1.2 if 41 years ≤ age ≤ 60 years (parameter 3), 0.7 if 61 years ≤ age ≤ 75 years (parameter 4), and 0.6 if age is >75 years (parameter 5); and the hospital bed size effect is equal to 1.1 if 401 ≤ bed size ≤ 900 (parameter 6), 0.1 if 901 ≤ bed size ≤ 1,350 (parameter 7), and 1.1 if bed size is >1,350 (parameter 8). The reference categories were 19 to 40 years for age and 0 to 400 beds for hospital bed size. By using SAS version 8.2, a total of 200 simulated data sets were generated. Each simulated data set contained 500 isolates with MICs determined by equation 1. A random error was assigned according to a normal probability distribution with a mean of 0 and a standard deviation of 1.9 log2 (mg/liter). Random errors from different observations were assumed to be statistically independent. MICs were log transformed (log2) and rounded to the nearest integer to create MIC data of the same quantitative and integer-valued nature as the original observed and modeled MIC data. For each simulation, patient age and hospital bed size were randomly generated according to a joint probability distribution in which the probability of any combination of the age and bed size variables was equal to the proportion of that combination observed within the original data set (3).
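The data-generating step described above can be sketched as follows. This is only an illustration of the structure (log2 MIC = intercept + age effect + bed size effect + normal error, then rounding), not the SAS program used in the study. Several pieces are assumptions: the intercept value is a placeholder because it does not appear in the text here, the youngest age category is left out because its effect is not given, and covariates are drawn uniformly rather than from the joint distribution observed in the original data set.

```python
import random

# Effects stated in the text, in log2 (mg/liter); reference categories are 0.
AGE_EFFECT = {"19-40": 0.0, "41-60": 1.2, "61-75": 0.7, ">75": 0.6}
BED_EFFECT = {"0-400": 0.0, "401-900": 1.1, "901-1350": 0.1, ">1350": 1.1}
INTERCEPT = 1.0   # ASSUMPTION: placeholder value, not taken from the article
ERROR_SD = 1.9    # standard deviation of the random error, from the text


def simulate_log2_mics(n_isolates, rng):
    """Return a list of (age_category, bed_category, rounded log2 MIC)."""
    rows = []
    for _ in range(n_isolates):
        # ASSUMPTION: uniform draws; the study used the observed joint distribution.
        age = rng.choice(list(AGE_EFFECT))
        beds = rng.choice(list(BED_EFFECT))
        mean = INTERCEPT + AGE_EFFECT[age] + BED_EFFECT[beds]
        log2_mic = round(rng.gauss(mean, ERROR_SD))  # integer-valued log2 MIC
        rows.append((age, beds, log2_mic))
    return rows


# 200 simulated data sets of 500 isolates each, one seed per data set.
datasets = [simulate_log2_mics(500, random.Random(seed)) for seed in range(200)]
print(len(datasets), len(datasets[0]))
```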
Imposed censoring. The proportion of left and right censoring of MICs in a data set is dependent upon the integer-valued censoring pair which defines the lower and the upper margins of susceptibility tested. In order to compare censored regression and standard regression methods for a wide range of censoring patterns, 26 distinct censoring conditions [i.e., 26 censoring pairs, left and right (L,R)] were applied to the MICs in each simulated data set. For each censoring pattern, the proportions of total censoring and the balance between left and right censoring were averaged across the 200 simulations to describe the extent of censoring.
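Applying one (L,R) censoring pair and summarizing the resulting extent of censoring might look like the sketch below. It works on the log2 scale, so a value is left-censored when it is at or below L and right-censored when it is above R. The function names and the `(value, kind)` output format are hypothetical, and L < R is assumed.

```python
def apply_censoring(log2_mics, left, right):
    """Label each log2 MIC as left-censored, right-censored, or exact."""
    censored = []
    for y in log2_mics:
        if y <= left:
            censored.append((left, "left"))    # only "MIC <= 2^L" is known
        elif y > right:
            censored.append((right, "right"))  # only "MIC > 2^R" is known
        else:
            censored.append((y, "exact"))
    return censored


def censoring_summary(censored):
    """Proportions of total, left, and right censoring in one data set."""
    n = len(censored)
    n_left = sum(1 for _, kind in censored if kind == "left")
    n_right = sum(1 for _, kind in censored if kind == "right")
    return {"total": (n_left + n_right) / n, "left": n_left / n, "right": n_right / n}


# Illustrative values only.
labelled = apply_censoring([-3, -1, 0, 2, 5], left=-1, right=2)
print(labelled)
print(censoring_summary(labelled))
```

Averaging these proportions over all simulated data sets gives the per-pattern description of quantity and balance of censoring.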
General linear modeling. For each of the 200 simulated data sets and each censoring pattern, censored regression and each of three standard regression methods were used to fit the regression model in equation 1 with eight coefficient parameters (the intercept, the four age category parameters, and the three bed size category parameters). In order to provide MICs suitable for use in multiple-regression modeling of only uncensored dependent variable values, the following three standard regression methods were used to handle censoring: (i) multiple regression with censored MICs excluded from the analyses (exclude observations); (ii) multiple regression with censored MICs of the form MIC ≤ 2^L mg/liter (2^L) or MIC > 2^R mg/liter (2^R) replaced by the censoring boundaries 2^L or 2^R (ignore inequality); and (iii) multiple regression with censored MICs replaced by 2^(L - 1) or 2^(R + 1) (adjust by 1). The model fits for censored regression were obtained by using the LIFEREG procedure in SAS version 8.2 (SAS Institute), while the fits for the standard regression methods were obtained by using the REG procedure.
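The three ways of preparing censored MICs for standard regression can be sketched as a single helper. The sketch works on the log2 scale, where "adjust by 1" is read as moving one dilution step beyond the boundary (L - 1 or R + 1); that reading, the function name, and the method labels are assumptions made for illustration. Row indices are returned so that the covariates can be kept aligned with the retained observations.

```python
def prepare_for_standard_regression(censored, method):
    """Turn (value, kind) pairs into (row_index, log2 MIC) pairs.

    censored: list of (value, kind), kind in {"exact", "left", "right"},
              with value equal to the censoring boundary for censored rows.
    method:   "exclude", "ignore_inequality", or "adjust_by_1".
    """
    if method not in ("exclude", "ignore_inequality", "adjust_by_1"):
        raise ValueError(f"unknown method: {method}")
    prepared = []
    for index, (value, kind) in enumerate(censored):
        if kind == "exact":
            prepared.append((index, value))
        elif method == "exclude":
            continue                              # (i) drop censored observations
        elif method == "ignore_inequality":
            prepared.append((index, value))       # (ii) use the boundary itself
        else:
            shift = -1 if kind == "left" else 1   # (iii) one step past the boundary
            prepared.append((index, value + shift))
    return prepared


# Illustrative values only.
rows = [(-1, "left"), (0, "exact"), (2, "right")]
for name in ("exclude", "ignore_inequality", "adjust_by_1"):
    print(name, prepare_for_standard_regression(rows, name))
```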
Evaluation of parameter estimation performance. For any given modeling method, the difference between the mean of a model parameter estimator's probability distribution and the true parameter value represented estimator bias. For each of the four regression methods evaluated, each of the 26 censoring conditions, and each of the eight model parameters, the parameter estimate average (PEAV) over the 200 simulated data sets was computed. PEAV values were plotted in three dimensions against the extent of censoring (both the total proportion and the balance between left and right censoring) produced by the 26 censoring conditions. Deviations (in absolute values) between PEAV and the true parameter values were computed to estimate the bias. Although the results for individual censoring conditions are the most applicable to a specific modeling situation, the average over the 26 censoring conditions of the deviations between the PEAV and true parameter values was used to measure the overall performance of an individual regression method. In addition to the average difference between PEAV and the true values, the percentage of PEAV values among the 26 pairs within 0.10 log2 (mg/liter) of the true parameter value was also summarized for each of the four regression methods. The graphical and numerical summaries of PEAV were generated by using S-Plus UNIX version 6.0.1 (Insightful).
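The PEAV summaries for one parameter and one regression method can be sketched as below. The input layout (a dictionary from censoring condition to the list of estimates across simulated data sets) and the function name are hypothetical; the numbers in the usage example are invented for illustration and are not results from the study.

```python
from statistics import mean


def peav_summary(estimates_by_condition, true_value, tolerance=0.10):
    """Summarize bias for one parameter and one regression method.

    estimates_by_condition: dict mapping a censoring condition, e.g. (L, R),
        to the parameter estimates obtained from the simulated data sets.
    Returns (PEAV per condition, average absolute deviation over conditions,
             percentage of conditions with deviation below the tolerance).
    """
    peav = {cond: mean(values) for cond, values in estimates_by_condition.items()}
    abs_dev = {cond: abs(avg - true_value) for cond, avg in peav.items()}
    average_deviation = mean(list(abs_dev.values()))
    n_within = sum(1 for dev in abs_dev.values() if dev < tolerance)
    pct_within = 100.0 * n_within / len(abs_dev)
    return peav, average_deviation, pct_within


# Invented estimates for two censoring conditions and a true value of 1.2.
demo = {(-2, 3): [1.15, 1.25, 1.20], (0, 1): [0.80, 0.90, 1.00]}
print(peav_summary(demo, true_value=1.2))
```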
When assumptions of negligible bias and approximate normality of a parameter estimate are satisfied, a two-standard-error (2 · SE) deviation on either side of the parameter estimate provides an approximate 95% confidence interval for the true parameter value. The percentage of the 200 simulated data sets for which the 2 · SE interval contained a given parameter (or coverage percentage) was also used to assess the parameter estimation performance. The coverage percentage for such intervals should be close to 95%. The 2 · SE confidence intervals within each simulation were computed for each parameter and each of the 26 censoring conditions. Confidence intervals were computed in conjunction with the general linear regression modeling by using the LIFEREG procedure (SAS version 8.2), and numerical and graphical summaries of the coverage percentages were generated by using S-Plus.
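The coverage percentage amounts to counting how often the interval from estimate - 2 · SE to estimate + 2 · SE contains the true parameter value. A minimal sketch, with a hypothetical function name and invented numbers:

```python
def coverage_percentage(estimates, standard_errors, true_value):
    """Percentage of simulated data sets whose 2*SE interval contains the truth."""
    hits = 0
    for estimate, se in zip(estimates, standard_errors):
        lower, upper = estimate - 2 * se, estimate + 2 * se
        if lower <= true_value <= upper:
            hits += 1
    return 100.0 * hits / len(estimates)


# Invented estimates and standard errors for four simulated data sets.
print(coverage_percentage([1.0, 1.5, 1.2, 0.2], [0.1, 0.1, 0.1, 0.1], true_value=1.1))
```

Under negligible bias and approximate normality, this percentage should sit near 95% across the simulated data sets; a biased estimator or an understated standard error pulls it down.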
FIG. 1. Simulated MIC histogram with censoring conditions for the 26 (L,R) censoring pairs. The (L,R) censoring pairs are represented by closed and open dots, respectively.
FIG. 2. Comparison of PEAV values across a range of censoring conditions for parameters 1 and 2. A surface was fit to PEAV by a second-order regression equation. Portions of the surface (PEAV) within 0.10 log2 (mg/liter) of the true value are displayed in black. Gray surfaces represent PEAV deviating from the true value by more than 0.10 log2 (mg/liter).
The graphs in Fig. 2 for the nonintercept parameter showed negatively biased estimates for the exclude-observations and ignore-inequality standard regression methods, and the degree of bias increased as the total proportion of censored observations increased. Although not as severe, the bias for the adjust-by-1 method was in the positive direction, and it appeared to improve somewhat as the total proportion of censored observations increased. Visual inspection of the surfaces for the exclude-observations and adjust-by-1 methods revealed a more severe bias when there was an even balance of left and right censoring. In comparison, there was no visual indication that the balance of censoring affected the bias for the ignore-inequality approach. As in the case of the intercept parameter, censored regression consistently produced estimates with no perceivable bias across the censoring conditions.
Table 1 summarizes the numerical measures of bias in the parameter estimates for each regression method. When the absolute deviation between the PEAV and the true parameter values was examined, censored regression provided a deviation of less than 0.1 log2 (mg/liter) for each of the eight parameters under all 26 censoring conditions. For the standard regression methods, deviations within 0.1 log2 (mg/liter) of the true value were attained for all, or nearly all, of the censoring conditions for parameter 7 only, the true value of which was 0.1 log2 (mg/liter). The magnitudes of the deviations from the true value, irrespective of the regression method used, were generally smaller for parameters with true values closer to 0. With the standard regression methods, parameters with true values of 0.6 log2 (mg/liter) or larger, which included all but parameter 7, were estimated with a bias of less than 0.1 log2 (mg/liter) for no more than 30.8% of the censoring conditions (i.e., at most 8 of the 26).
TABLE 1. Summary statistics for PEAV
TABLE 2. Range of coverage percentage for 2 · SE confidence intervals
FIG. 3. Comparison of coverage percentage for a 2 · SE confidence interval (CI) across a range of censoring conditions for parameters 1 and 2.
FIG. 4. Comparison of the parameter estimate standard error across a range of censoring conditions for parameters 1 and 2.
The low coverage percentages of the 2 · SE confidence intervals under some censoring patterns for the standard regression methods were striking. One (and perhaps the most important) contributing factor to this finding was that the presence of censoring led to biased parameter estimates, as shown by the PEAV results. Interestingly, as the proportion of censored data increased, the standard error for the parameter estimates based on standard regression methods decreased. This finding was counterintuitive, since increased censoring of a dependent variable results in the loss of specific information and thus should result in an increase in the standard error. Stated more generally, less information intuitively implies more uncertainty. In practice, one might not be aware of this counterintuitive behavior of the standard error and the resulting low coverage percentage of the confidence intervals obtained by using standard regression methods with a modest or large number of censored dependent variable values. The danger is that confidence intervals that very likely do not contain the true parameter value may be reported with an assumed confidence level (typically near 95%). The findings of such analyses could be seriously misleading.
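One way to see why the standard error can shrink under heavier censoring is an intercept-only toy case with the ignore-inequality replacement. Replacing censored values by the boundaries compresses the spread of the dependent variable, so the usual standard error of the mean falls even though information has been lost. The sketch below is an illustration under assumed values (a mean of 1.0 log2 mg/liter, hypothetical censoring pairs) and is not a reproduction of the study's regression fits.

```python
import random
from statistics import stdev


def naive_standard_error(log2_mics, left, right):
    """SE of the mean after replacing censored values by the boundaries.

    Values <= left become left and values > right become right, which is
    the ignore-inequality replacement on the log2 scale.
    """
    replaced = [min(max(y, left), right) for y in log2_mics]
    return stdev(replaced) / len(replaced) ** 0.5


rng = random.Random(0)
# ASSUMPTION: mean of 1.0 is a placeholder; SD of 1.9 follows the text.
sample = [round(rng.gauss(1.0, 1.9)) for _ in range(500)]

se_light = naive_standard_error(sample, left=-4, right=6)  # little censoring
se_heavy = naive_standard_error(sample, left=0, right=2)   # heavy censoring
print(se_light, se_heavy)
```

The narrower window always yields a standard error no larger than the wider one, because clipping can only shrink the differences between observations.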
Through the analyses undertaken, we demonstrated both the usefulness and the potential benefit of the use of linear regression models which appropriately accommodate a dependent variable with censored observations. In addition to MICs, concentration and CFU data are frequently censored due to lower limits of detection. Statistical methods which can appropriately account for these data would be of great benefit in providing a better understanding of exposure-response relationships at higher and lower concentrations or colony counts. As demonstrated by these analyses, naïve approaches which exclude or replace such observations may provide misleading results. Unfortunately, many commonly used desktop statistical packages do not provide the capability of handling both left and right censoring of the dependent variable. While commonly used packages such as SAS and S-Plus do provide the capability to perform regression procedures for censored data, their use is currently limited to linear models with independent error structures. Moreover, only a small selection of independent error distributions is available. The MIC data in these analyses were obtained from only single measurements from individual patients, so the independent error structure was appropriate. Analyses that contain repeated measurements may require the implementation of censored regression methods that also allow the use of dependent error structures.
In our analyses of the simulated data, we correctly assumed a normal error distribution, since this was the error distribution imposed upon the data. We did not use alternative error distributions to thoroughly evaluate the impact of misspecification of the error structure. We recommend that caution be taken when an appropriate distribution is selected, since the robustness of error distribution selection has yet to be explored. In addition, selection of the error distribution should be based on the residuals of model fits rather than the original MICs, since independent variables in the model may explain much of the variation or multimodal nature of the MIC distribution, particularly in the right tail. If, after careful evaluation, a suitable error distribution is not available using standard statistical packages (i.e., no near approximation to the supposed error distribution is found), then the use of alternative approaches (e.g., transformations or different software) should be considered.
Although the modeling of MIC data, due to their censored nature, is more challenging than the modeling of other end points such as the percentage of resistant isolates, greater insight may be gained by evaluating these data, especially when a low burden of resistant isolates exists. For example, multivariable censored regression analyses could be applied to the identification of subgroups of patients at increased risk for infection arising from organisms with increased MICs. Such information may then be used to identify patients of interest for clinical study as well as patients for whom interventions (e.g., increased infection control and/or treatment with a potent antimicrobial regimen) may increase the likelihood of clinical success and reduce the emergence of resistance. A number of large and long-standing surveillance systems, such as the SENTRY Antimicrobial Surveillance Program (5, 9) and the Trust (10) and Mystic (8) surveillance programs, have performed extensive MIC testing of isolates collected globally. Furthermore, these surveillance systems frequently have more information about the patient and institution from which isolates were collected than is ever reported. Analyses that use quantitative MIC data, and especially those that can appropriately integrate censored MICs, represent a step forward in the effort to better understand predictors of resistance.
Present address: Collaborative Biostatistics Center, Cleveland Clinic Foundation, Cleveland, OH 44195.
Present address: Institute for Clinical Pharmacodynamics, Ordway Research Institute, Albany, NY 12208.