Antimicrobial Agents and Chemotherapy, January 2006, p. 62-67, Vol. 50, No. 1
0066-4804/06/$08.00+0 doi:10.1128/AAC.50.1.62-67.2006
Copyright © 2006, American Society for Microbiology. All Rights Reserved.
Comparison of Censored Regression and Standard Regression Analyses for Modeling Relationships between Antimicrobial Susceptibility and Patient- and Institution-Specific Variables
Jeffrey P. Hammel,1,
Sujata M. Bhavnani,1,2*
Ronald N. Jones,3,4
Alan Forrest,2 and
Paul G. Ambrose1,2,
Cognigen Corporation, Buffalo, New York,1
School of Pharmacy and Pharmaceutical Sciences, University at Buffalo, Buffalo, New York,2
The JONES Group/JMI Laboratories, North Liberty, Iowa,3
Tufts University School of Medicine, Boston, Massachusetts4
Received 28 July 2004/
Returned for modification 2 January 2005/
Accepted 26 September 2005
ABSTRACT
In order to identify patients likely to be infected with resistant bacterial pathogens, analytic methods such as standard regression (SR) may be applied to surveillance data to determine patient- and institution-specific factors predictive of an increased MIC. However, the censored nature of MIC data (e.g., MIC ≤ 0.5 mg/liter or MIC > 8 mg/liter) imposes certain limitations on the use of SR. In order to investigate the nature of these limitations, simulations were performed to compare a regression method tailored for censored data (censored regression [CR]) with SR. By using a model relating piperacillin-tazobactam MICs against Enterobacter spp. to patient age and hospital bed capacity, 200 simulations of 500 isolates were performed. Various MIC censoring patterns were imposed by using 26 left- or right-censored (L,R) pairs (i.e., MICs ≤ 2^L mg/liter [2^L] or MICs > 2^R mg/liter [2^R], respectively). Data were fit by CR and by SR for which censored MICs were either (i) excluded, (ii) replaced by 2^L or 2^R, or (iii) replaced by 2^(L−1) or 2^(R+1). Total censoring for the 26 pairs ranged from 7 to 86%. By CR, deviations of average parameter estimates from the true parameter values were <0.10 log2 (mg/liter) for all parameters for each of the 26 pairs. By SR, these deviations were >0.10 log2 (mg/liter) for at least 18 of the 26 pairs for all but one parameter. Two-standard-error confidence intervals for individual parameters contained the true value in as few as 0% of cases for all SR approaches but in ≥91.5% of cases for the CR approach. When censored MIC data are modeled, CR may reduce or eliminate biased parameter estimates obtained by SR.
INTRODUCTION
One challenge in the clinical study of resistant bacteria has been the difficulty in identifying patients likely to be infected with such pathogens. In an effort to identify factors predictive of resistance, linear regression has frequently been applied to epidemiologic data collected across multiple institutions (1, 2, 4, 6; C. K. Johnson, and R. Polk, Abstr. 43rd Intersci. Conf. Antimicrob. Agents Chemother., abstr. K-1399, 2003). In these analyses, the outcome variable is typically the percentage of isolates that are nonsusceptible or resistant to a given antimicrobial agent. However, the use of qualitative susceptibility data places undue restrictions upon the sensitivity of an analysis: significant increases in MICs within any of the qualitative susceptibility categories (susceptible, intermediate, or resistant) cannot be detected (S. M. Bhavnani, Abstr. 41st Infect. Dis. Soc. Am., Session 125, abstr. 1005, 2003). Epidemiologic analyses involving quantitative MIC data overcome the latter limitation.
Most regression analyses for quantitative data are performed by using a dependent variable for which all observations are known. However, the analysis of MICs, which include censored values at the lower and upper margins of the MIC distribution, entails certain challenges. A high proportion of left censoring (i.e., MICs ≤ x) is common with highly active drugs. As reported previously, the total proportion of censored MICs was as high as 90% for MICs for cefepime against Klebsiella pneumoniae (3). In such cases, one option is to simply replace the censored value with an actual value (e.g., replace a MIC of >2 mg/liter with a MIC of 2 mg/liter, or replace a MIC of >2 mg/liter with a MIC of 4 mg/liter). Alternatively, all observations for which the dependent variable is censored may be removed from the analysis. A third approach is to retain censored MIC data and use maximum-likelihood estimation. In this manner, a multiple-regression model is fit to the data, and the model incorporates censored MIC observations into the likelihood function by using the tail probabilities of the error distribution (3, 7). By the use of such an approach, the uncertainty of the censored MICs is preserved.
In order to compare how well general linear regression that accounts for the censored pattern of MICs (i.e., "censored regression") and standard regression, which excludes or replaces censored MICs, estimate model parameters, we conducted a series of simulations using data with various proportions of left- and right-censored MICs.
MATERIALS AND METHODS
Data simulation.
Simulations were based on the MICs generated by using a simplified approximation of a multiple-regression model for piperacillin-tazobactam MICs against Enterobacter species reported previously (3). While other significant relationships were found, important independent variables associated with the MIC included the categories of patient age and hospital bed size. The form of the regression model used for the simulations is described by equation 1:
$$\log_2(\mathrm{MIC}) = \text{intercept} + \text{age effect} + \text{hospital bed size effect} \tag{1}$$
where the intercept (parameter 1) is equal to 0.8; the age effect is equal to 0.8 if the age is ≤18 years (parameter 2), 1.2 if 41 years ≤ age ≤ 60 years (parameter 3), 0.7 if 61 years ≤ age ≤ 75 years (parameter 4), and 0.6 if age is >75 years (parameter 5); and the hospital bed size effect is equal to −1.1 if 401 ≤ bed size ≤ 900 (parameter 6), −0.1 if 901 ≤ bed size ≤ 1,350 (parameter 7), and 1.1 if bed size is >1,350 (parameter 8). The reference categories were 19 to 40 years for age and 0 to 400 beds for hospital bed size.
By using SAS version 8.2, a total of 200 simulated data sets were generated. Each simulated data set contained 500 isolates with MICs determined by equation 1. A random error was assigned according to a normal probability distribution with a mean of 0 and a standard deviation of 1.9 log2 (mg/liter). Random errors from different observations were assumed to be statistically independent. MICs were log transformed (log2) and rounded to the nearest integer to create MIC data of the same quantitative and integer-valued nature as the original observed and modeled MIC data. For each simulation, patient age and hospital bed size were randomly generated according to a joint probability distribution of which the probability for any combination of the age and the bed size variables was equal to the proportion of the combination observed within the original data set (3).
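The data-generation step can be sketched in Python as follows. This is an illustration under stated assumptions, not the SAS program used in the study. The joint distribution of age and bed size came from the original data set (3) and is not reproduced here, so the sketch draws the two categories uniformly and independently as a placeholder. Parameters 6 and 7 are entered with negative signs, following the statement in Results that their true values are negative. The names `simulate_dataset`, `AGE_EFFECT`, and `BED_EFFECT` are hypothetical.

```python
import random

# True coefficients of equation 1, in log2 (mg/liter); reference categories are 0.
INTERCEPT = 0.8
AGE_EFFECT = {"<=18": 0.8, "19-40": 0.0, "41-60": 1.2, "61-75": 0.7, ">75": 0.6}
BED_EFFECT = {"0-400": 0.0, "401-900": -1.1, "901-1350": -0.1, ">1350": 1.1}
ERROR_SD = 1.9  # standard deviation of the normal error, log2 (mg/liter)


def simulate_dataset(n_isolates, rng):
    """Return a list of (age_category, bed_category, log2_mic) tuples."""
    age_levels = list(AGE_EFFECT)
    bed_levels = list(BED_EFFECT)
    rows = []
    for _ in range(n_isolates):
        # ASSUMPTION: uniform, independent categories stand in for the
        # empirical joint distribution of the original data set.
        age = rng.choice(age_levels)
        bed = rng.choice(bed_levels)
        mean_log2_mic = INTERCEPT + AGE_EFFECT[age] + BED_EFFECT[bed]
        # Add independent normal error, then round to an integer log2 MIC.
        log2_mic = round(mean_log2_mic + rng.gauss(0.0, ERROR_SD))
        rows.append((age, bed, log2_mic))
    return rows


# 200 data sets of 500 isolates each; one seed per data set for reproducibility.
datasets = [simulate_dataset(500, random.Random(seed)) for seed in range(200)]
```

Seeding each data set separately makes any single simulation reproducible without rerunning the others.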
Imposed censoring.
The proportion of left and right censoring of MICs in a data set is dependent upon the integer-valued censoring pair which defines the lower and the upper margins of susceptibility tested. In order to compare censored regression and standard regression methods for a wide range of censoring patterns, 26 distinct censoring conditions [i.e., 26 censoring pairs, left and right (L,R)] were applied to the MICs in each simulated data set. For each censoring pattern, the proportions of total censoring and the balance between left and right censoring were averaged across the 200 simulations to describe the extent of censoring.
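A minimal sketch of how one (L,R) pair could be imposed on integer log2 MICs is given below, assuming that values at or below L are reported as "≤ 2^L" and values above R as "> 2^R". The function names and the tuple layout are hypothetical choices made for this illustration.

```python
def impose_censoring(log2_mics, left, right):
    """Label each log2 MIC as (value_or_bound, status) for one (L,R) pair.

    status is "left" if the MIC is <= 2^left, "right" if it is > 2^right,
    and "none" if the MIC is observed exactly.
    """
    if left > right:
        raise ValueError("left bound must not exceed right bound")
    labelled = []
    for y in log2_mics:
        if y <= left:
            labelled.append((left, "left"))
        elif y > right:
            labelled.append((right, "right"))
        else:
            labelled.append((y, "none"))
    return labelled


def censoring_summary(labelled):
    """Return (total proportion censored, left share of censored values)."""
    n_left = sum(1 for _, status in labelled if status == "left")
    n_right = sum(1 for _, status in labelled if status == "right")
    n_censored = n_left + n_right
    total = n_censored / len(labelled)
    left_share = n_left / n_censored if n_censored else 0.0
    return total, left_share


example = impose_censoring([-3, -1, 0, 1, 2, 3, 5], left=-1, right=2)
total_censored, left_share = censoring_summary(example)
```

Averaging the two summary values over the 200 simulated data sets would give the extent and balance of censoring for a given pair.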
General linear modeling.
For each of the 200 simulated data sets and each censoring pattern, censored regression and each of three standard regression methods were used to fit the regression model in equation 1 with eight coefficient parameters (the intercept, the four age category parameters, and the three bed size category parameters). In order to provide MICs suitable for use in multiple-regression modeling of only uncensored dependent variable values, the following three standard regression methods were used to handle censoring: (i) multiple regression with censored MICs excluded from the analyses (exclude observations); (ii) multiple regression with censored MICs of the form MIC ≤ 2^L mg/liter (2^L) or MIC > 2^R mg/liter (2^R) replaced by the censoring boundaries 2^L or 2^R (ignore inequality); and (iii) multiple regression with censored MICs replaced by 2^(L−1) or 2^(R+1) (adjust by 1). The model fits for censored regression were obtained by using the LIFEREG procedure in SAS version 8.2 (SAS Institute), while the fits for the standard regression methods were obtained by using the REG procedure.
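The difference between the four approaches lies in what each does with a censored value. The sketch below, which is not the SAS code used in the study, shows the three substitution rules on the log2 scale and the log-likelihood that a censored regression would maximize under a normal error distribution. The `(value_or_bound, status)` tuple layout and both function names are assumptions made for this illustration; the optimization itself (performed by LIFEREG in the study) is not reproduced.

```python
import math
from statistics import NormalDist

STD_NORMAL = NormalDist()


def standard_regression_values(labelled, method):
    """Return (row_index, log2_mic) pairs as seen by a standard regression.

    labelled holds (value_or_bound, status) tuples with status "left",
    "right", or "none". Row indices are kept so that covariates of
    excluded rows can be dropped as well.
    """
    if method not in ("exclude", "ignore", "adjust"):
        raise ValueError("unknown method: " + method)
    kept = []
    for index, (value, status) in enumerate(labelled):
        if status == "none":
            kept.append((index, value))
        elif method == "ignore":
            kept.append((index, value))  # use the bound itself
        elif method == "adjust":
            kept.append((index, value - 1 if status == "left" else value + 1))
        # method == "exclude": censored rows are dropped
    return kept


def censored_log_likelihood(labelled, fitted_means, sigma):
    """Normal log-likelihood with tail probabilities for censored rows."""
    total = 0.0
    for (value, status), mean in zip(labelled, fitted_means):
        z = (value - mean) / sigma
        if status == "none":
            total += math.log(STD_NORMAL.pdf(z) / sigma)
        elif status == "left":
            total += math.log(STD_NORMAL.cdf(z))
        else:
            total += math.log(1.0 - STD_NORMAL.cdf(z))
    return total


rows = [(-1, "left"), (0, "none"), (2, "right")]
log_lik = censored_log_likelihood(rows, fitted_means=[0.0, 0.0, 0.0], sigma=1.0)
```

Only the censored-regression likelihood treats a bound as a bound; the three substitution rules all hand the regression a definite number or no number at all.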
Evaluation of parameter estimation performance.
For any given modeling method, the difference between the mean of a model parameter estimator's probability distribution and the true parameter value represented estimator bias. For each of the four regression methods evaluated, each of the 26 censoring conditions, and each of the eight model parameters, the parameter estimate average (PEAV) over the 200 simulated data sets was computed. PEAV values were plotted in three dimensions against the censoring conditions imposed (showing both quantity and balance) as a result of the 26 censoring conditions. Deviations (in absolute values) between PEAV and the true parameter values were computed to estimate the bias. Although the results for individual censoring conditions are the most applicable to a specific modeling situation, the average over the 26 censoring conditions of the deviations between the PEAV and true parameter values was used to measure the overall performance of an individual regression method. In addition to the average difference between PEAV and the true values, the percentage of PEAV values among the 26 pairs within 0.10 log2 (mg/liter) of the true parameter value was also summarized for each of the four regression methods. The graphical and numerical summaries of PEAV were generated by using S-Plus UNIX version 6.0.1 (Insightful).
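The bias summaries described above reduce to a few averages. The following sketch assumes that the estimates of one parameter by one method are stored in a dictionary keyed by censoring pair; that layout and the function name `summarize_bias` are hypothetical, and the original summaries were produced in S-Plus.

```python
from statistics import mean


def summarize_bias(estimates_by_condition, true_value, tolerance=0.10):
    """Summarize estimator bias for one parameter and one regression method.

    estimates_by_condition maps each censoring pair to the list of
    estimates obtained over the simulated data sets.
    Returns (deviation per condition, average deviation, % within tolerance).
    """
    deviations = {
        condition: abs(mean(estimates) - true_value)  # |PEAV - true value|
        for condition, estimates in estimates_by_condition.items()
    }
    average_deviation = mean(list(deviations.values()))
    n_within = sum(1 for d in deviations.values() if d < tolerance)
    percent_within = 100.0 * n_within / len(deviations)
    return deviations, average_deviation, percent_within


toy_estimates = {(-1, 2): [0.8, 0.9, 0.7], (0, 1): [0.5, 0.5, 0.5]}
deviations, average_deviation, percent_within = summarize_bias(toy_estimates, 0.8)
```

The toy input has two censoring pairs and three simulations each, far fewer than the 26 pairs and 200 data sets of the study, and serves only to show the shape of the calculation.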
When assumptions of negligible bias and approximate normality of a parameter estimate are satisfied, a two-standard-error (2 · SE) deviation on either side of the parameter estimate provides an approximate 95% confidence interval for the true parameter value. The percentage of the 200 simulated data sets for which the 2 · SE interval contained a given parameter (or coverage percentage) was also used to assess the parameter estimation performance. The coverage percentage for such intervals should be close to 95%. The 2 · SE confidence intervals within each simulation were computed for each parameter and each of the 26 censoring conditions. Confidence intervals were computed in conjunction with the general linear regression modeling by using the LIFEREG procedure (SAS version 8.2), and numerical and graphical summaries of the coverage percentages were generated by using S-Plus.
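The coverage percentage can be computed as in the short sketch below, which assumes that the estimate and its standard error from each simulated data set are available as two parallel lists. The function name and the toy numbers are illustrative only.

```python
def coverage_percentage(estimates, standard_errors, true_value, width=2.0):
    """Percentage of (estimate +/- width * SE) intervals containing true_value."""
    hits = sum(
        1
        for estimate, se in zip(estimates, standard_errors)
        if estimate - width * se <= true_value <= estimate + width * se
    )
    return 100.0 * hits / len(estimates)


# Four toy simulations for a parameter whose true value is 0.8.
coverage = coverage_percentage(
    estimates=[0.8, 0.95, 0.2, 1.5],
    standard_errors=[0.1, 0.1, 0.1, 0.4],
    true_value=0.8,
)
```

With 200 simulations and an unbiased, approximately normal estimator, this value would be expected to fall near 95%.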
RESULTS
Censoring proportions.
The histogram of all simulated MICs and the summary of censoring conditions for the 26 (L,R) censoring pairs, averaged across the 200 simulated data sets, are shown in Fig. 1. For any given pair, the MICs to the left of the closed datum point and the MICs to the right of the open datum point were censored. The average proportion of the total sample censored ranged from 7 to 86%, with the average left censoring of the MIC (as a percentage of the censored observations) ranging from 0.9 to 92%.

FIG. 1. Simulated MIC histogram with censoring conditions for the 26 (L,R) censoring pairs. The (L,R) censoring pairs are represented by closed and open dots, respectively.
PEAV summaries.
Three-dimensional graphical summaries of PEAV for model parameters 1 and 2 for each of the four regression methods are shown in Fig. 2. Within each graph, the 26 datum points represented the censoring pairs, which demonstrated the total proportion and balance (between left and right) of censoring. A second-order regression surface was fit to these datum points and is displayed as a grid to assist in the visual description. Although they are not shown, graphs for parameters 3 through 8 demonstrated surface shapes which were the same as those seen in the series of graphs for parameter 2. The patterns for parameters 6 and 7, however, were flipped upside down because their true values are negative, unlike the positive value of parameter 2 and the other parameters.

FIG. 2. Comparison of PEAV values across a range of censoring conditions for parameters 1 and 2. A surface was fit to PEAV by a second-order regression equation. Portions of the surface (PEAV) within 0.10 log2 (mg/liter) of the true value are displayed in black. Gray surfaces represent PEAV deviating from the true value by more than 0.10 log2 (mg/liter).
The graphs in Fig. 2 for both the intercept parameter (parameter 1) and an example nonintercept parameter (parameter 2) showed that as the total proportion of censored observations approached 0, the PEAV values converged toward the true parameter value. The surfaces in Fig. 2 describing the PEAV against the censoring conditions differed in character between the intercept parameter and the nonintercept parameter. For the intercept, the bias appeared to be low for the standard regression methods when the proportions of left and right censoring were balanced. With a greater proportion of left censoring than right censoring, there was a trend toward overestimates of the intercept for the exclude observations and ignore-inequality methods. The adjust-by-1 method had an opposite and less dramatic effect in these cases. In cases where there was a greater proportion of right censoring than left censoring, these trends were reversed (i.e., estimates for the intercept were low for the exclude observations and ignore-inequality methods and, to a lesser degree, high for the adjust-by-1 method). In contrast to the three standard regression methods, censored regression consistently produced estimates of the intercept with no perceivable bias across all censoring conditions.
The graphs in Fig. 2 for the nonintercept parameter demonstrated negative bias estimates for the exclude observations and ignore-inequality standard regression methods, the degree of which increased as the total proportion of censored observations increased. Although not as severe, the bias for the adjust-by-1 method was in the positive direction. There appeared to be some improvement in this bias as the total proportion of censored observations increased. Visual inspection of surfaces for the exclude observations and adjust-by-1 methods revealed a more severe bias when there was an even balance of left and right censoring. In comparison, there was no visual indication that bias was affected for the ignore-inequality approach when censoring was balanced. As in the case of the intercept parameter, censored regression consistently produced estimates with no perceivable bias across the censoring conditions.
Table 1 summarizes the numerical measures of bias in the parameter estimates for each regression method. When the absolute deviations between the PEAV and the true parameter values were examined, censored regression provided a deviation of less than 0.1 log2 (mg/liter) for each of the eight parameters under all of the 26 censoring conditions. For the standard regression methods, deviations within 0.1 log2 (mg/liter) of the true value were attained for 100%, or nearly 100%, of the censoring conditions for parameter 7 only, the true value of which was −0.1 log2 (mg/liter). The magnitudes of the deviations from the true value, irrespective of the regression method used, were generally smaller for parameters with true values closer to 0. With the standard regression methods, parameters with true values of 0.6 log2 (mg/liter) or larger in magnitude, which included all but parameter 7, were estimated with a bias of less than 0.1 log2 (mg/liter) for only 30.8% of the censoring conditions or less.
Confidence interval coverage.
Table 2 shows summaries of the 2 · SE confidence interval coverage percentage for individual parameters for each of the regression methods. For most parameters and for each of the standard regression methods, there were censoring conditions under which none of the 200 simulated confidence intervals contained the true parameter value. In contrast, confidence intervals for censored regression consistently contained the true parameter values for close to 95% of the simulations. Figure 3, which displays the estimated coverage percentage against the average proportion of the total sample censored, demonstrates a low coverage percentage for the 2 · SE confidence intervals based on each of the standard regression methods when the proportion of censoring was high. The confidence intervals for censored regression, however, consistently covered the true parameter values in nearly 95% of the simulations, regardless of the amount of the sample censored. The relationships between the proportion of confidence intervals that covered the true parameter values and the balance between left and right censoring were also explored, but there appeared to be no relationship between the two for any of the methods. Figure 4 shows graphs of the standard errors for parameter estimates. When the three standard regression methods were used, the standard error tended to decrease as the proportion of censored data increased. In contrast, the standard error increased with increasing proportions of censored data when censored regression was used.

FIG. 3. Comparison of coverage percentage for a 2 · SE confidence interval (CI) across a range of censoring conditions for parameters 1 and 2.
FIG. 4. Comparison of the parameter estimate standard error across a range of censoring conditions for parameters 1 and 2.
DISCUSSION
The results of the simulations presented herein, which were based on a simplified approximation of a multiple-regression model for piperacillin-tazobactam MICs against Enterobacter species (3), allowed the evaluation of four regression approaches for different patterns of MIC censoring and a range of censoring conditions. As expected, the performance of each of the approaches was comparable under conditions of very limited or no censoring (i.e., the PEAV converged toward the true parameter value). However, irrespective of the proportion of the sample that was censored or the pattern of censoring, censored regression proved to be superior to the three standard regression methods. With this approach, deviations between PEAV and true parameter values were within 0.1 log2 (mg/liter) for 100% of the censoring patterns for all parameters. The rank order of performance for the three standard regression methods evaluated was as follows: adjust by 1, ignore inequality, and exclude observations. Of these three methods, adjustment by ±1 on the log2 scale was perhaps the most arbitrary. Nonetheless, the adjust-by-1 method may be considered a reasonable approach, since an adjustment of ±1 or any other integer preserves the integer nature of the MIC on the log2 scale. The superior performance of censored regression was likely the result of the inclusion of all information for the dependent variable without the imposition of specific nonobserved values when the exact value is unknown.
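The contrast between the four approaches can be illustrated with a deliberately reduced example: an intercept-only model, so that each standard regression estimate is simply a mean. The sketch below is not the study's method. It swaps in an expectation-maximization (EM) iteration for the maximum-likelihood fit that LIFEREG performs, omits the rounding to integer log2 values, and uses a single assumed censoring pair (L = 0, R = 3) with the intercept and error standard deviation of equation 1. All names are hypothetical.

```python
import random
from statistics import NormalDist, mean, pstdev

STD_NORMAL = NormalDist()


def fit_censored_normal(observed, n_left, left, n_right, right, n_iter=300):
    """EM estimate of (mu, sigma) for a normal sample censored at two bounds."""
    n = len(observed) + n_left + n_right
    mu, sigma = mean(observed), pstdev(observed)  # crude starting values
    for _ in range(n_iter):
        # E-step: mean and variance of a censored value given the current fit.
        a = (left - mu) / sigma
        lam_left = STD_NORMAL.pdf(a) / STD_NORMAL.cdf(a)
        mean_left = mu - sigma * lam_left
        var_left = sigma ** 2 * (1.0 - a * lam_left - lam_left ** 2)
        b = (right - mu) / sigma
        lam_right = STD_NORMAL.pdf(b) / (1.0 - STD_NORMAL.cdf(b))
        mean_right = mu + sigma * lam_right
        var_right = sigma ** 2 * (1.0 + b * lam_right - lam_right ** 2)
        # M-step: update mu and sigma from the completed-data moments.
        mu = (sum(observed) + n_left * mean_left + n_right * mean_right) / n
        sum_sq = (
            sum((y - mu) ** 2 for y in observed)
            + n_left * ((mean_left - mu) ** 2 + var_left)
            + n_right * ((mean_right - mu) ** 2 + var_right)
        )
        sigma = (sum_sq / n) ** 0.5
    return mu, sigma


rng = random.Random(2006)
TRUE_MEAN, TRUE_SD, LEFT, RIGHT = 0.8, 1.9, 0, 3  # assumed settings
latent = [rng.gauss(TRUE_MEAN, TRUE_SD) for _ in range(5000)]
observed = [y for y in latent if LEFT < y <= RIGHT]
n_left = sum(1 for y in latent if y <= LEFT)
n_right = sum(1 for y in latent if y > RIGHT)
n = len(latent)

estimates = {
    "exclude observations": mean(observed),
    "ignore inequality": (sum(observed) + n_left * LEFT + n_right * RIGHT) / n,
    "adjust by 1": (sum(observed) + n_left * (LEFT - 1) + n_right * (RIGHT + 1)) / n,
    "censored regression": fit_censored_normal(observed, n_left, LEFT, n_right, RIGHT)[0],
}
for method, estimate in estimates.items():
    print(f"{method}: {estimate:.2f} (true value {TRUE_MEAN})")
```

With more left than right censoring, as in this assumed setting, the exclude-observations and ignore-inequality means would be expected to sit above the true value, which matches the direction reported for the intercept in Fig. 2.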
The low coverage percentages of the 2 · SE confidence intervals under some censoring patterns for the standard regression methods were striking. One (and perhaps the most important) contributing factor to this finding was that the presence of censoring led to biased parameter estimates, as shown by the PEAV results. Interestingly, as the proportion of censored data increased, the standard error for the parameter estimates based on standard regression methods decreased. This finding was counterintuitive, since increased censoring of a dependent variable results in the loss of specific information and thus should result in an increase in the standard error. Stated more generally, less information intuitively implies more uncertainty. In practice, one might not be aware of this counterintuitive behavior of the standard error and the resulting low coverage percentage of the confidence intervals obtained by using standard regression methods with a modest or large amount of censored dependent variable values. The danger is that confidence intervals that very likely do not contain the true parameter value may be reported with an assumed confidence level (typically near 95%). The findings of such analyses could be seriously misleading.
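One mechanical reason for the shrinking standard error can be seen in a small sketch for the ignore-inequality method alone, again reduced to an intercept-only setting. Replacing every censored value by its bound pulls the extremes of the sample inward, so the sample standard deviation, and with it the standard error of the mean, cannot increase as the bounds tighten. The three censoring pairs and all names below are assumptions chosen for illustration, not pairs taken from the study.

```python
import random
from statistics import stdev


def ignore_inequality(log2_mics, left, right):
    """Replace values <= left by left and values > right by right."""
    return [min(max(y, left), right) for y in log2_mics]


def standard_error_of_mean(values):
    """Usual standard error of a sample mean."""
    return stdev(values) / len(values) ** 0.5


rng = random.Random(1)
log2_mics = [round(0.8 + rng.gauss(0.0, 1.9)) for _ in range(500)]

# Progressively narrower windows of exactly observed values.
pairs = [(-4, 6), (-1, 3), (0, 1)]
standard_errors = [
    standard_error_of_mean(ignore_inequality(log2_mics, left, right))
    for left, right in pairs
]
```

The reported standard error therefore falls as more information is lost, which is the opposite of what the uncertainty in the data warrants.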
Through the analyses undertaken, we demonstrated both the usefulness and the potential benefit of the use of linear regression models which appropriately accommodate a dependent variable with censored observations. In addition to MICs, concentration and CFU data are frequently censored due to lower limits of detection. Statistical methods which can appropriately account for these data would be of great benefit in providing a better understanding of exposure-response relationships at higher and lower concentrations or colony counts. As demonstrated by these analyses, naïve approaches which exclude or replace such observations may provide misleading results. Unfortunately, many commonly used desktop statistical packages do not provide the capability of handling both left and right censoring of the dependent variable. While commonly used packages such as SAS and S-Plus do provide the capability to perform regression procedures for censored data, their use is currently limited to linear models with independent error structures. Moreover, only a small selection of independent error distributions is available. The MIC data in these analyses were obtained from only single measurements from individual patients, so the independent error structure was appropriate. Analyses that contain repeated measurements may require the implementation of censored regression methods that also allow the use of dependent error structures.
In our analyses of the simulated data, we correctly assumed a normal error distribution, since this was the error distribution imposed upon the data. We did not use alternative error distributions to thoroughly evaluate the impact of misspecification of the error structure. We recommend caution when an error distribution is selected, since the robustness of the method to the choice of error distribution has yet to be explored. In addition, selection of the error distribution should be based on the residuals of model fits rather than the original MICs, since independent variables in the model may explain much of the variation or multimodal nature of the MIC distribution, particularly in the right tail. If, after careful evaluation, a suitable error distribution is not available using standard statistical packages (i.e., no near approximation to the supposed error distribution is found), then the use of alternative approaches (e.g., transformations or different software) should be considered.
Although the modeling of MIC data, due to their censored nature, is more challenging than the modeling of other end points such as the percentage of resistant isolates, greater insight may be gained by evaluating these data, especially when a low burden of resistant isolates exists. For example, multivariable censored regression analyses could be applied to the identification of subgroups of patients at increased risk for infection arising from organisms with increased MICs. Such information may then be used to identify patients of interest for clinical study as well as patients for whom interventions (e.g., increased infection control and/or treatment with a potent antimicrobial regimen) may increase the likelihood of clinical success and reduce the emergence of resistance. A number of large and long-standing surveillance systems, such as the SENTRY Antimicrobial Surveillance Program (5, 9) and the TRUST (10) and MYSTIC (8) surveillance programs, have performed extensive MIC testing of isolates collected globally. Furthermore, these surveillance systems frequently have more information about the patient and institution from which isolates were collected than is ever reported. Analyses that use quantitative MIC data, and especially those that can appropriately integrate censored MICs, represent a step forward in the effort to better understand predictors of resistance.
FOOTNOTES
* Corresponding author. Present address: Institute for Clinical Pharmacodynamics, Ordway Research Institute, 150 New Scotland Avenue, Albany, NY 12208. Phone: (518) 641-6473. Fax: (518) 641-6304. E-mail: SBhavnani-ICPD@OrdwayResearch.org.
Present address: Collaborative Biostatistics Center, Cleveland Clinic Foundation, Cleveland, OH 44195. 
Present address: Institute for Clinical Pharmacodynamics, Ordway Research Institute, Albany, NY 12208. 
REFERENCES
1. Ballow, C. H., and J. J. Schentag. 1992. Trends in antibiotic utilization and bacterial resistance. Diagn. Microbiol. Infect. Dis. 15(2 Suppl.):37S-42S.
2. Bhavnani, S. M., W. A. Callen, A. Forrest, K. K. Gilliland, D. A. Collins, J. A. Paladino, and J. J. Schentag. 2003. Effect of fluoroquinolone expenditures on susceptibility of Pseudomonas aeruginosa to ciprofloxacin in U.S. hospitals. Am. J. Health-Syst. Pharm. 60:1962-1970.
3. Bhavnani, S. M., J. P. Hammel, A. Forrest, R. N. Jones, and P. G. Ambrose. 2003. Relationships between patient- and institution-specific variables and decreased susceptibility of gram-negative pathogens. Clin. Infect. Dis. 37:344-350.
4. Fridkin, S. K., R. Lawton, J. R. Edwards, F. C. Tenover, J. E. McGowan, and R. P. Gaynes. 2002. Intensive Care Antimicrobial Resistance Epidemiology Project. National Nosocomial Infections Surveillance Systems hospitals. Monitoring antimicrobial use and resistance: comparison with a national benchmark on reducing vancomycin use and vancomycin-resistant enterococci. Emerg. Infect. Dis. 8:702-707.
5. Fritsche, T. R., H. S. Sader, and R. N. Jones. 2003. Comparative activity and spectrum of broad-spectrum beta-lactams (cefepime, ceftazidime, ceftriaxone, piperacillin/tazobactam) tested against 12,295 staphylococci and streptococci: report from the SENTRY Antimicrobial Surveillance Program (North America: 2001-2002). Diagn. Microbiol. Infect. Dis. 47:435-440.
6. Lesch, C. A., G. S. Itokazu, L. H. Danziger, and R. A. Weinstein. 2001. Multi-hospital analysis of antimicrobial usage and resistance trends. Diagn. Microbiol. Infect. Dis. 41:149-154.
7. Klein, J. P., and M. L. Moeschberger. 1997. Survival analysis: techniques for censored and truncated data, p. 65-70. Springer-Verlag, New York, N.Y.
8. Rhomberg, P. R., R. N. Jones, and the MYSTIC Program (USA) Study Group. 2003. Antimicrobial spectrum of activity for meropenem and nine broad spectrum antimicrobials: report from the MYSTIC Program (2002) in North America. Diagn. Microbiol. Infect. Dis. 47:365-372.
9. Sader, H. S., D. J. Biedenbach, and R. N. Jones. 2003. Global patterns of susceptibility for 21 commonly utilized antimicrobial agents tested against 48,440 Enterobacteriaceae in the SENTRY Antimicrobial Surveillance Program (1997-2001). Diagn. Microbiol. Infect. Dis. 47:361-364.
10. Thornsberry, C., D. F. Sahm, L. J. Kelly, I. A. Critchley, M. E. Jones, A. T. Evangelista, and J. A. Karlowsky. 2002. Regional trends in antimicrobial resistance among clinical isolates of Streptococcus pneumoniae, Haemophilus influenzae, and Moraxella catarrhalis in the United States: results from the TRUST Surveillance Program, 1999-2000. Clin. Infect. Dis. 34:S4-S16.