Antimicrobial Agents and Chemotherapy, July 2007, p. 2483-2488, Vol. 51, No. 7
0066-4804/07/$08.00+0 doi:10.1128/AAC.01457-06
Copyright © 2007, American Society for Microbiology. All Rights Reserved.

Division of Laboratory Medicine, Women's and Children's Hospital, North Adelaide, Australia,1 Dade Behring—MicroScan, West Sacramento, California2
Received 20 November 2006/ Returned for modification 23 January 2007/ Accepted 9 April 2007
| ABSTRACT |
|---|
| INTRODUCTION |
|---|
The Clinical and Laboratory Standards Institute (CLSI; formerly NCCLS) describes methods whereby quality control (QC) ranges are set for its susceptibility testing methods (8). Preliminary QC range studies, or tier 1 studies, are used principally for controlling the performance of susceptibility tests during drug development. They are usually performed in a single laboratory with a limited number of replicates. For establishing published QC ranges for a new antimicrobial agent, a tier 2 study is recommended in the standard. A tier 2 study must involve at least seven independent laboratories, which are required to test the antimicrobial agent in or on three separate lots of medium from two different manufacturers at least 30 times (from 30 separately prepared inocula). In the case of disk testing, two separate disk lots from two manufacturers are tested. The choice of the number of laboratories, medium lots, disk lots, and replications has been determined by cumulative experience and with assistance over the years from statisticians employed in the susceptibility test manufacturing industry. Until now, QC ranges for MICs and zone diameters (ZDs) have been determined largely by visual inspection of the histogram of the data generated, enhanced by "common sense" rules of thumb and, in the case of disk testing, by a statistical method involving medians which was developed in the early 1980s (4). In the latter method, a tentative QC range is calculated as the overall median of the ZDs observed in the study ± 0.5 times the range of the medians of ZDs of the individual laboratories, rounded up or down to the nearest whole millimeter. Current methods of setting QC ranges do not take advantage of the fact that the data generated follow statistical distributions, nor do they use any unbiased techniques to detect and reject outlier laboratories or results.
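The median-based disk method described above can be illustrated with a minimal sketch. The function name and the input layout (a mapping from laboratory to its list of ZDs in millimeters) are hypothetical, and the rounding direction (lower limit down, upper limit up) is an assumption, since the description says only "rounded up or down."

```python
import math
from statistics import median


def median_based_zd_range(zds_by_lab):
    """Tentative ZD QC range: overall median +/- 0.5 x range of the lab medians.

    zds_by_lab maps a laboratory label to its list of zone diameters (mm).
    """
    lab_medians = [median(values) for values in zds_by_lab.values()]
    pooled = [zd for values in zds_by_lab.values() for zd in values]
    overall_median = median(pooled)
    half_width = 0.5 * (max(lab_medians) - min(lab_medians))
    # Assumption: round the lower limit down and the upper limit up
    # to the nearest whole millimeter.
    lower = math.floor(overall_median - half_width)
    upper = math.ceil(overall_median + half_width)
    return lower, upper
```

With three laboratories whose medians are 25, 26, and 27 mm and an overall median of 26 mm, the sketch yields a tentative range of 25 to 27 mm.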
Here we show that relatively simple statistical techniques can be applied to data generated in CLSI QC studies and that these can be used as the primary output, to which few arbitrary rules need be applied, thereby reducing the risk of incorrectly setting QC ranges.
| MATERIALS AND METHODS |
|---|
Calculated ZD ranges. Means and standard deviations were calculated for each laboratory and for the pooled laboratory data. From the pooled statistics, ZD ranges were calculated to encompass 95% of the values, that is, from the 2.5th percentile to the 97.5th percentile of the distribution. These limits were rounded downwards and upwards, respectively, to the nearest whole millimeter, thus ensuring that at least 95%, and mostly more, of the predicted distribution of ZDs was included in the range.
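As a rough illustration of this calculation, the sketch below assumes that the pooled ZDs are approximately normally distributed, so that the 2.5th and 97.5th percentiles lie 1.96 standard deviations either side of the pooled mean. The function name and the use of the sample standard deviation are assumptions and are not taken from the study.

```python
import math
from statistics import mean, stdev

Z_95 = 1.96  # two-sided 95% quantile of the standard normal distribution


def calculated_zd_range(pooled_zds):
    """Return (lower, upper) whole-millimeter limits covering at least 95%
    of a normal distribution fitted to the pooled zone diameters."""
    m = mean(pooled_zds)
    s = stdev(pooled_zds)  # assumption: sample standard deviation
    lower = math.floor(m - Z_95 * s)  # round the lower limit down
    upper = math.ceil(m + Z_95 * s)   # round the upper limit up
    return lower, upper
```

Because the limits are rounded outwards, the resulting range always covers at least 95% of the fitted distribution.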
Detection of possible outlier data. Occasionally, visual inspection suggested that data from some laboratories, or individual values, were substantially different from the others. This might be attributed to errors in test performance, including setup and reading, or to transcription errors, or it may represent true variation in the test. In order to ensure that true variation in the data was not lost, a conservative approach was developed for the detection of possible outlier data. First, three central tendency statistics were calculated for each laboratory data set: the mean, the median, and the mode. Second, control ranges were set for each of these. For the mean, the control range was set to be within 1.645 standard deviations of the mean for the pooled data (encompassing 90% of the data). For the median, the range was set from the 25th percentile of the pooled data minus 1.5 times the interquartile range to the 75th percentile of the pooled data plus 1.5 times the interquartile range. For the mode, the range was set at the mode of the pooled data ± 1 dilution for MIC tests and ± 2 mm for disk diffusion tests. To be considered a possible outlier laboratory, at least two of the three central tendency statistics of an individual laboratory's data needed to be outside the control ranges.
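The two-of-three rule can be sketched as follows. This is an illustration only: the function names are hypothetical, MICs are assumed to be expressed as log2 dilution steps so that "1 dilution" is a difference of 1, the mode tolerance for disk tests is read as ± 2 mm, and the quartiles come from Python's `statistics.quantiles` with its default method, which may differ slightly from the quartile definition used in the study.

```python
from statistics import mean, median, mode, quantiles, stdev


def control_ranges(pooled, mode_tolerance):
    """Control ranges for the mean, median, and mode from the pooled data.

    mode_tolerance: 1 for MICs in log2 dilution steps, 2 for ZDs in mm.
    """
    m, s = mean(pooled), stdev(pooled)
    q1, _, q3 = quantiles(pooled, n=4)  # 25th, 50th, 75th percentiles
    iqr = q3 - q1
    pooled_mode = mode(pooled)
    return {
        "mean": (m - 1.645 * s, m + 1.645 * s),
        "median": (q1 - 1.5 * iqr, q3 + 1.5 * iqr),
        "mode": (pooled_mode - mode_tolerance, pooled_mode + mode_tolerance),
    }


def is_possible_outlier(lab_values, ranges):
    """Flag a laboratory when at least two of its three statistics fall
    outside the corresponding control ranges."""
    lab_stats = {
        "mean": mean(lab_values),
        "median": median(lab_values),
        "mode": mode(lab_values),
    }
    outside = sum(
        not (ranges[name][0] <= value <= ranges[name][1])
        for name, value in lab_stats.items()
    )
    return outside >= 2
```

A flagged laboratory would then be reviewed rather than excluded automatically, in keeping with the conservative intent of the rule.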
Election to not set ranges. CLSI has not formally established a set of rules for electing not to establish QC ranges. However, it has generally been agreed that ranges which result in an excessively broad range of MICs or ZDs are not acceptable. A twofold dilution range of ≥5 for MICs or a ZD range of >12 mm is thought to represent excessive scatter and/or interlaboratory variation, and usually ranges have not been established for such data sets by CLSI. Other reasons for not setting ranges have included (i) technical issues, e.g., the test dilution range did not go sufficiently high or low to accurately capture the variation in results; (ii) major differences in results between medium lots, i.e., usually one of the medium lots yielded significantly higher or lower results; and (iii) all results in MIC studies being at very low concentrations, where accuracy of preparation can be problematic.
In this study, ranges were not set when there were clearly identifiable technical, medium lot, or low-concentration issues as described above. Ranges were also not set if there was excessive variation between laboratories, defined as more than three laboratories with one central tendency statistic indicating them as possible outliers. In contrast, calculated ranges were accepted for MIC ranges of >4 dilutions and for all ZD ranges in order to allow comparison with those set by CLSI.
| RESULTS |
|---|
The QC ranges calculated by the statistical method were then compared to the actual QC ranges approved by the relevant CLSI subcommittee.
MIC ranges. There were 178 MIC range comparisons (Table 2). In 15 instances (8.4%), one laboratory was identified as a possible outlier by the predefined rules. For 10 of these, all three criteria were met, and for the remaining 5, two criteria were met. The data from each of these laboratories were excluded before determination of the calculated QC ranges. In three instances, exclusion of one laboratory's data set led to a second becoming a possible outlier. Data from these second laboratories were also excluded, and ranges were then calculated using data from six laboratories. The relevant CLSI subcommittee elected not to set ranges for these particular QC strain-antimicrobial agent combinations.
Two-thirds (121/178 ranges) of the calculated MIC ranges were identical to those set by CLSI. In 12 of the 121 instances where they were identical, adjustments had been made to the calculated ranges as outlined in Materials and Methods. In all cases where the calculated ranges resulted in narrower ranges, the calculated ranges covered 3 dilutions while the ranges set by CLSI covered 4 dilutions. Frequently, in these cases, the CLSI ranges were extended to 4 dilutions because of the "shoulder" rule, which states that if the frequency of observations at an MIC above or below the modal MIC is about 65% that of the mode or greater, then the QC range should be extended 1 twofold dilution lower or higher than that concentration, respectively.
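The "shoulder" rule can be expressed as a short sketch. The function name and data layout are hypothetical: MIC frequencies are assumed to be keyed by log2 dilution step, and the rule is interpreted as extending the range to one dilution beyond a shoulder that sits immediately next to the mode, on the same side as the shoulder. The 0.65 threshold stands in for "about 65%."

```python
def apply_shoulder_rule(counts, low, high, threshold=0.65):
    """Extend a QC range (low, high) according to the shoulder rule.

    counts maps a log2 MIC dilution step to its number of observations;
    low and high are the current range limits in the same units.
    """
    modal_step = max(counts, key=counts.get)
    modal_count = counts[modal_step]
    # Shoulder one dilution below the mode: extend one dilution below it.
    if counts.get(modal_step - 1, 0) >= threshold * modal_count:
        low = min(low, modal_step - 2)
    # Shoulder one dilution above the mode: extend one dilution above it.
    if counts.get(modal_step + 1, 0) >= threshold * modal_count:
        high = max(high, modal_step + 2)
    return low, high
```

For example, with 100 observations at the mode and 70 at the next higher dilution, a 3-dilution range centered on the mode would be extended by one dilution at the upper end to give a 4-dilution range.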
When calculated and CLSI ranges were different, the CLSI ranges were more likely to be narrower, mostly by a single dilution. There were six instances where the calculated ranges ran to 5 dilutions (e.g., gentamicin versus Escherichia coli ATCC 25922 in Brucella microdilution broth after 24 and 48 h of incubation) and two where the ranges included 6 dilutions (e.g., doripenem versus Bacteroides fragilis ATCC 25285 in supplemented Brucella microdilution broth). Closer inspection of the data raises the question of whether it was appropriate to set ranges at all because of considerable variation between laboratories, but without any standout individual laboratory. When there is that much statistical variation between laboratories, one questions the wisdom of trimming the ranges to include 3 or 4 dilutions, even if 95% of the observed values are captured, as it is likely that >5% of QC results will be out of control when the QC range is put into wide routine practice.
In eight instances, the new method calculated a range that was not set by CLSI. This suggests that the new method can give guidance on whether to set ranges, even if there are apparent difficulties with the data.
Sixteen sets (9%) of calculated ranges required adjustment to include 3 twofold dilutions, with 11 sets (6.2%) covering a single calculated dilution and 5 sets (2.8%) covering two calculated dilutions.
ZD ranges. There were 48 ZD range comparisons (Table 3). No laboratory data were considered possible outliers, and all data were included in the calculation of the ranges. The majority of calculated ranges generated a wider range of ZDs than those determined by CLSI, but only by 1 or 2 mm. In some cases, the calculated ranges covered a narrower range, by 1 to 2 mm. In one-third of cases, the ranges were identical. By inspection, the fits to a normal distribution were generally very good (Fig. 2).
| DISCUSSION |
|---|
In developing the statistical method, we have attempted to embrace the "rules of thumb" that are currently employed by the CLSI subcommittees, while enhancing them by (i) attempting to identify possible outlier data in a reproducible manner and (ii) using predominantly statistical values to define the ranges rather than having them defined by visual inspection plus capture of at least 95% of the observed data in the study.
Although participation of more laboratories will possibly generate ranges that have better predictive value, the costs of conducting these studies constrain the numbers, and it has been calculated that seven laboratories should provide sufficient data to allow estimates of ranges to be reasonably predictive of those likely to be observed in routine testing (Ullery, personal communication). On the basis that data from one laboratory might be nonrepresentative, it has therefore been common practice to use eight laboratories in tier 2 studies. However, CLSI has not established criteria that would detect nonrepresentative laboratories, and judgments are usually made "by committee."
With regard to possible outlier detection, the statistical method proposed here has been designed to minimize the possibility of data rejection and to ensure that true variation is included. Indeed, while we excluded data from laboratories identified as possible outliers for the purposes of analysis and comparison, we would not recommend exclusion as a matter of course. Instead, we envision the identification of possible outlier data as a flag to investigate possible causes with the laboratory concerned before considering data exclusion. For instance, in one case where MIC ranges were being examined, it was clear that the possible outlier data from one laboratory for one QC organism were actually data from the same laboratory for another QC strain against the same agent, i.e., they were the result of a transcription error. On the other hand, we suggest that serious consideration be given to including such data when no reasonable technical or transcription cause can be found.
Calculated MIC ranges resulted in some QC ranges that were narrower than those set by the CLSI subcommittees. The calculated ranges merely reflect the amount of variation in MICs observed in the study. The current CLSI convention is to adjust the ranges to include at least 3 twofold dilutions, and this convention was applied to our calculated ranges for comparison purposes. Indeed, in some other susceptibility testing standards, this convention has been codified (1). However, the validity of doing this can be questioned. The fact that ranges calculated from eight-laboratory tier 2 studies can be only 1 or 2 twofold dilutions is a consequence of the relatively coarse grouping that the twofold dilution series imposes on the data. If finer grouping of MICs were to be used, such as that generated by the commercial Etest (Solna, Sweden) gradient diffusion method (6), it would be more obvious that the scatter in MICs for individual QC strain-antimicrobial agent combinations would vary significantly. It is therefore possible that the intra- and interlaboratory variation in MICs of QC strains could be quite small and consist of values within just 1 or 2 twofold dilutions. An example is the combination of Enterococcus faecalis ATCC 29212 and doripenem. Of 240 replicate measurements, 231 were at a single MIC (2 µg/ml), and in four of the eight laboratories this was the only value observed in 30 replicates. The calculated QC ranges were a single concentration, as this clearly captured much more than 95% of the data. According to the distribution of the data and the calculated statistical parameters, there is a 0.4% probability of observing a value of 1 µg/ml and a 0.6% probability of observing a value of 4 µg/ml. Adjusting to 3 dilutions is done to address the fear that a 1- or 2-twofold-dilution range will not be representative when applied to routine work. However, it seems inconsistent to apply this adjustment only to calculated 1- and 2-twofold-dilution ranges. This implies that these ranges are not predictive while those with 3 and 4 twofold dilutions are predictive when both are found in a single study. Further discussion will be required to address the problem of narrow MIC ranges.
In contrast, it can be argued more easily that calculated MIC ranges producing an excessively broad range of twofold dilutions are problematic. A calculated range of 5 or 6 twofold dilutions for a particular QC strain-antimicrobial agent combination means that there is significant interlaboratory variation, which suggests that the particular QC strain is not a reliable QC indicator for that antimicrobial agent. There were eight such ranges (4.5%) in the data sets we examined. At present, CLSI subcommittees are likely to accept such combinations if 95% of the observed data are within 4 twofold dilutions. However, it is the spread of variation with combinations that produce a 5- to 6-twofold-dilution range that should send a warning about the particular QC strain's reliability when applied in a routine context. We advocate caution in setting QC ranges for any QC strain where the calculated MIC ranges produce such broad ranges, because we can predict from the statistics that in routine practice, out-of-range values will occur more often than 5% of the time.
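The prediction that a trimmed range will produce out-of-range results more than 5% of the time can be approximated with a simple sketch. It assumes that results follow a continuous normal distribution with the pooled mean and standard deviation (on a log2 scale for MICs) and ignores the grouping imposed by the twofold dilution series; the function names are hypothetical.

```python
import math


def normal_cdf(x, mu, sigma):
    """Cumulative distribution function of a normal distribution."""
    return 0.5 * (1.0 + math.erf((x - mu) / (sigma * math.sqrt(2.0))))


def predicted_out_of_range(mu, sigma, low, high):
    """Fraction of a normal(mu, sigma) distribution lying outside [low, high]."""
    below = normal_cdf(low, mu, sigma)
    above = 1.0 - normal_cdf(high, mu, sigma)
    return below + above
```

Under these assumptions, a range spanning 1.96 standard deviations either side of the mean leaves about 5% of results out of range, whereas a range trimmed to 1.5 standard deviations either side leaves about 13%.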
Far fewer issues arose in the comparison of ZD ranges. There were no problems with possible outliers, coarse grouping of data, or excessively broad or narrow calculated ranges. The tendency for calculated ranges to produce a slightly wider range of ZDs was expected, as the current CLSI method accommodates 95% of the data observed in the study while the calculated ranges are meant to apply to the indefinitely large number of QC tests that will be performed in routine laboratories.
Overall, we believe the statistical approach adds value to the current CLSI method of establishing QC ranges. It was easily set up in a spreadsheet which requires only entry of the raw data and has visual alerts to possible outlier data once entry is complete. The calculated ranges and other fields are automatically recalculated whenever the raw data are modified, such as exclusion of possible outlier data, allowing the group collating the tier 2 study data to examine the effects of data adjustment immediately. Ultimately, the proof of this concept will be in its application in the field. Unfortunately, CLSI does not currently have a direct system for measuring the performance of its published QC ranges. Rather, it relies on feedback from clients (e.g., laboratories and pharmaceutical sponsors) to raise concerns about "abnormal" rates of out-of-control data. We look forward to such data being collected in future, as it would allow direct comparison between the current range-setting methods and the proposed statistical method.
| ACKNOWLEDGMENTS |
|---|
| FOOTNOTES |
|---|
Published ahead of print on 16 April 2007.
| REFERENCES |
|---|