Statistical Analyses
As residents could be included more than once, the unit of analysis throughout is episode of illness. In our major analysis, we developed a multivariable logistic model to estimate the probability of radiographic pneumonia (possible or probable). Before beginning modeling, we imputed mean values for missing continuous data and the largest category for missing dichotomous variables (the number of missing values is noted in Table 2). Data imputation is less biased than dropping cases in developing multivariable models.14
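The imputation rule described above (mean for continuous variables, largest category for dichotomous variables) can be sketched as follows. This is a minimal illustration only, not the authors' SAS code; the function names and the sample values are hypothetical, and missing values are assumed to be coded as None.

```python
from collections import Counter
from statistics import mean


def impute_continuous(values):
    """Replace missing (None) entries with the mean of the observed values."""
    observed = [v for v in values if v is not None]
    fill = mean(observed)
    return [fill if v is None else v for v in values]


def impute_dichotomous(values):
    """Replace missing (None) entries with the most frequent observed category."""
    observed = [v for v in values if v is not None]
    fill = Counter(observed).most_common(1)[0][0]
    return [fill if v is None else v for v in values]


# Hypothetical data: pulse rates and a 0/1 indicator for crackles.
pulse = [80, None, 100, 120]
crackles = [1, 0, None, 0, 0]

pulse_imputed = impute_continuous(pulse)
crackles_imputed = impute_dichotomous(crackles)
```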
Illness episodes were then randomly assigned to a two-thirds model-development sample and a one-third model-validation sample. On the basis of the literature and clinical experience, we defined categories of variables that might relate to the presence or absence of pneumonia, such as lung findings (eg, crackles, wheezes), respiratory symptoms (eg, cough, sputum production), vital signs, findings of delirium (eg, acute confusion, decreased alertness), and laboratory findings. Restricting our focus to the development sample, we selected the best representatives of these groups on clinical and statistical grounds. For continuous variables, we considered the shape of the relationship to presence of pneumonia. For example, both very high and very low pulse rates predicted increased risk of pneumonia. In such cases, we considered several different ways to represent the variable in the model. We also limited the range of some variables to avoid undue influence of outliers (approximately the 1% most extreme values). For example, pulse rates above 140 were set equal to 140.
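The random split and the capping of extreme values might look like the sketch below. It assumes that episodes are identified by simple integer IDs and that the split is a plain shuffle with a fixed seed; the function names, the seed, and the example values are hypothetical and are not taken from the study.

```python
import random


def split_episodes(episode_ids, dev_fraction=2 / 3, seed=0):
    """Randomly assign episodes to development and validation samples."""
    rng = random.Random(seed)  # fixed seed so the split is reproducible
    shuffled = list(episode_ids)
    rng.shuffle(shuffled)
    n_dev = round(len(shuffled) * dev_fraction)
    return shuffled[:n_dev], shuffled[n_dev:]


def cap(value, upper):
    """Limit a value to an upper bound to reduce the influence of outliers."""
    return min(value, upper)


# Nine hypothetical episodes: six go to development, three to validation.
development, validation = split_episodes(range(9))

# A pulse rate above 140 is set equal to 140; lower values are unchanged.
capped_pulses = [cap(p, 140) for p in [88, 155, 140]]
```

Note that this sketch randomizes episodes rather than individuals, matching the unit of analysis stated in the text.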
We then employed forward and backward stepwise logistic regression with possible or probable pneumonia (also referred to as positive x-ray results) as the dependent variable. For final model inclusion, we required variables to bear a plausible relationship to the diagnosis of pneumonia and meet a statistical significance criterion (α=.05).
To obtain final estimates of the relationship of each model variable to pneumonia probability, we considered adjustments for 2 kinds of correlation within our data: (1) individuals are nested within facilities, and (2) subjects could be represented by more than one episode.15 Using generalized estimating equations (GEE) in Proc Genmod in SAS software (SAS Institute, Cary, NC),16 we noted that the effect of facilities was minor, but the effect of repeat episodes by the same subject was more marked. Consequently, we used GEE to account for repeat episodes on subjects. To avoid unstable GEE estimates, we dropped 5 episodes in the development sample and 8 in the overall sample (episodes beyond the 5th and 6th per individual, respectively).
Using parameter estimates from the development sample, we tested the model’s discrimination and calibration in the validation sample.17 To assess discrimination, we used the c-statistic, which evaluates among all possible pairs of individuals whether those with higher predicted risk are more likely to have pneumonia. The c-statistic is also equal to the area under the receiver operating characteristic curve. To assess calibration (agreement between observed and predicted pneumonia over the range of predicted risk), we used the Hosmer-Lemeshow goodness-of-fit statistic.18 We then used estimates fitted to the overall sample to develop a simple additive score to provide a clinically usable prediction rule. Statistical analyses were performed with SAS statistical software.16
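For readers unfamiliar with these two measures, the sketch below computes them from predicted probabilities and observed 0/1 outcomes. It is an illustration under stated assumptions, not the SAS procedure used in the study: the c-statistic is computed over all pairs with discordant outcomes (ties counted as one half), and the Hosmer-Lemeshow statistic uses equal-sized groups ordered by predicted risk. The function names and example data are hypothetical.

```python
def c_statistic(predicted, observed):
    """Share of (event, non-event) pairs in which the event has the higher
    predicted risk; tied predictions count as half a concordant pair."""
    events = [p for p, y in zip(predicted, observed) if y == 1]
    non_events = [p for p, y in zip(predicted, observed) if y == 0]
    concordant = 0.0
    for p_event in events:
        for p_non in non_events:
            if p_event > p_non:
                concordant += 1.0
            elif p_event == p_non:
                concordant += 0.5
    return concordant / (len(events) * len(non_events))


def hosmer_lemeshow(predicted, observed, groups=10):
    """Chi-square-type statistic comparing observed and expected event counts
    within groups of increasing predicted risk. Assumes that no group has a
    mean predicted risk of exactly 0 or 1."""
    pairs = sorted(zip(predicted, observed), key=lambda pair: pair[0])
    n = len(pairs)
    statistic = 0.0
    for g in range(groups):
        chunk = pairs[g * n // groups:(g + 1) * n // groups]
        if not chunk:
            continue
        n_g = len(chunk)
        expected = sum(p for p, _ in chunk)
        events = sum(y for _, y in chunk)
        mean_p = expected / n_g
        statistic += (events - expected) ** 2 / (n_g * mean_p * (1 - mean_p))
    return statistic


# Hypothetical predictions and outcomes (1 = radiographic pneumonia).
risk = [0.9, 0.8, 0.4, 0.3, 0.2]
outcome = [1, 0, 1, 0, 0]
c = c_statistic(risk, outcome)  # 5 of 6 discordant pairs are concordant
```

A c-statistic of 0.5 indicates no discrimination and 1.0 perfect discrimination; a small Hosmer-Lemeshow statistic indicates close agreement between observed and predicted event counts.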
Results
Project nurses performed 2592 evaluations. In 90% (2337), residents received chest x-rays either in the nursing home or on hospital transfer. In 3 additional cases, crucial information was missing from nursing home records. This left 2334 episodes in 1474 individuals for final analysis (Figure 1).
Fifty-five percent of radiographs were interpreted as negative, 12% showed possible pneumonia, and 33% showed probable pneumonia. Most nursing home residents with pneumonia had few presenting symptoms; 80% had 3 or fewer respiratory or general symptoms. However, only 7.5% of subjects evaluated had no respiratory symptoms. Table 2 shows the relationship of selected variables to radiographic findings of absent, possible, or probable pneumonia. Though a few signs and symptoms were more common in those with positive (possible or probable pneumonia) than negative chest x-ray results, most did not discriminate at all. Fever (temperature ≥38°C) was present in 44.4% of positives but only 28.5% of negatives (P=.001).
Multivariable Analysis and Prediction Score
Our GEE model to predict radiographic pneumonia includes 3 vital sign abnormalities (fever, rapid pulse, and rapid respiratory rate), 2 lung findings (presence of crackles and absence of wheezes), 2 potential indicators of delirium (somnolence or decreased alertness and acute confusion), and elevated white blood count. Table 3 reports GEE estimates for the entire sample. Though only exhibiting fair overall performance, the model did well at distinguishing subjects with a high probability of pneumonia. In the 20% of subjects with the highest predicted risks, more than two thirds had pneumonia.
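The statement about the 20% of subjects with the highest predicted risks corresponds to a calculation like the one sketched below. The function name and the example data are hypothetical and do not reproduce the study data or the Table 3 estimates.

```python
def observed_rate_in_top_fraction(predicted, observed, fraction=0.20):
    """Observed event rate among the subjects with the highest predicted risk."""
    ranked = sorted(zip(predicted, observed), key=lambda pair: pair[0], reverse=True)
    n_top = max(1, round(len(ranked) * fraction))
    top = ranked[:n_top]
    return sum(y for _, y in top) / n_top


# Ten hypothetical episodes: predicted risk and observed outcome (1 = pneumonia).
risk = [0.9, 0.8, 0.7, 0.6, 0.5, 0.4, 0.3, 0.2, 0.1, 0.05]
outcome = [1, 0, 1, 1, 0, 0, 1, 0, 0, 0]
top_rate = observed_rate_in_top_fraction(risk, outcome)
```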