RADON POTENTIAL MAPPING - THE PROBLEM OF GEOGRAPHIC SCALE

George Aspbury
Department of Geography-Geology, Illinois State University
Normal, IL

ABSTRACT

The purpose of this study is to investigate the significance of geographic scale in the construction of radon hazard potential maps using a vector-based Geographic Information System (GIS). I compare and discuss radon potential maps produced at the national scale by the United States Geological Survey; several state-level maps based on observed residential radon levels, using counties and zipcode areas as the units of analysis; and lastly, a highly localized (large scale) map based on individual (non-aggregated) observations. In addition, one-way analysis of variance is used to test the relationship between type of glacial deposits and observed radon levels. I demonstrate that when data is aggregated at both the zipcode and county scales, there appears to be no strong relationship between Quaternary units and observed radon levels. However, at the large scale, statistically significant relationships are observed. I also demonstrate that such analyses within a GIS environment aid in the more meaningful construction of radon hazard potential maps.

INTRODUCTION

Due to the efforts of the United States Environmental Protection Agency, the public has become increasingly aware of the potential risks and health hazards associated with elevated levels of residential radon. Several studies in recent years, however, have questioned the extent to which radon is a contributing risk factor in lung cancer mortality. Other studies have gone so far as to reject completely the notion that residential radon is related to lung cancer. A moderate position on this issue is to assume that residential radon is in fact a causal factor, but that the critical level of residential radon may have to be re-assessed. If this is the case, one can argue for additional research.
This paper explores some of the difficulties inherent in the cartographic and geographic information systems (GIS) analysis of residential radon data. I also examine and demonstrate some of the problems of geographic scale and show why the development of radon potential maps must carefully consider this factor. This is particularly critical when researchers are attempting to explain spatial variations in radon concentrations as the result of a multivariate process. As a general rule, very different spatial patterns may develop as a result of the same explanatory processes operating at different spatial scales. The data used in this study is drawn from a variety of sources and for three different geographical scales for the state of Illinois.

1996 International Radon Symposium, I - 4.1

CARTOGRAPHIC AND GIS ANALYSIS OF RADON DATA

It is well established that residential radon levels are not geographically constant but typically exhibit significant geographical variation regardless of the spatial scale of the data. If a map is worth a thousand words, then it is only natural to represent tabular data on residential radon levels cartographically. Thus, for example, the United States Geological Survey has published a map (Map 1) of the United States, "The Generalized Geologic Radon Potential of the United States" (United States Geological Survey, 1993). This simple map identifies three categories of radon potential regions: low potential areas (<2 pCi/L), moderate potential areas (2 - 4 pCi/L), and high potential areas (>4 pCi/L). As a generalization, this map is an excellent start in that it links characteristics of geologic provinces to radon potential. With reference to this work, the "EPA's Map of Radon Zones: Illinois" (United States Environmental Protection Agency, 1993) states: "It is important to note that EPA's extrapolation from the province level to the county level may mask significant "highs" and "lows" within specific counties.
In other words, within-county variations in radon potential are not shown on the Map of Radon Zones."

Radon level and radon potential maps have also been developed for individual states; these state-level maps have been developed by extrapolating from the national geologic province level to the county level. Other state-level studies, namely for Ohio, have been more ambitious, treating residential radon levels aggregated at the zipcode level and incorporating geologic factors into a large data base (Kumar et al., 1990). A number of these same studies also endeavor to explain the resulting spatial patterns on the basis of certain geologic factors, including soil permeability, thickness of glacial deposits, type of bedrock, depth to bedrock, etc.

Recent developments in automated and analytical cartography and geographic information systems (GIS) have greatly improved the extent and quality of radon potential analyses and research. GIS platforms (including software and hardware) have simultaneously become less expensive, more sophisticated, more powerful and certainly more user-friendly compared to only several years ago. While GIS has become a progressively more widely accepted technology in a variety of disciplines, problems can and do arise because of the manner in which the data itself is handled by the researcher.

Ordinary descriptive statistics and the compilation of data in the form of frequency distributions applied to radon levels can provide obvious and useful insights about the extent and intensity of the residential radon hazard. In this paper I am concerned with some problems inherent in analyzing and visualizing such data, namely identifying the spatial distribution of the data for a given geographic area. A number of studies routinely incorporate maps in which radon concentrations are cartographically displayed. Most published maps of radon concentrations that this author is aware of use either counties or zipcode areas as the spatial units.
In all likelihood, maps using either of these geographical units were ultimately constructed by aggregating individual residential observations. This data aggregation, however, raises serious questions with respect to the usefulness of such maps for analytical purposes. Stated differently, these maps are a generalization of the real pattern of spatial distribution and thus may mask very significant statistical variations both within and among the spatial units being utilized. The question then arises as to which spatial units should be used in the construction of more powerful, predictive maps of radon potential.

SPATIAL SCALE AND DATA AGGREGATION

The creation of frequency distributions from 'raw' or 'unclassified' data forces the researcher to carefully consider issues such as the number of frequency classes and the upper and lower limits of each of these frequency classes. Failure to select a meaningful number of classes or appropriate class intervals can lead to erroneous interpretation of the generalized data. An analogous problem is faced by geographers or others creating choropleth maps. Barber (1988) states, "...spatial aggregation tends to reduce the variation depicted on a map. Comparisons of maps of a variable at different levels of aggregation must take into account the ramifications of this variance reduction. The problem becomes even more acute when we try to examine the relationship between two maps of different variables." He proceeds to illustrate this with the following example (Figure 1).

Figure 1: Spatial Scale and Aggregation Effects on Summary Statistics
Fig. 1.a  LARGE SCALE DATA:   m = 7.5, s^2 = 9.75
Fig. 1.b  MEDIUM SCALE DATA:  m = 7.5, s^2 = 1.75
Fig. 1.c  SMALL SCALE DATA:   m = 7.5, s^2 = 0.00

In this illustration assume that each group of square cells is a map of some variable for the same region.
In Figure 1.a, the region is subdivided into sixteen square cells and the mean for the entire region is m = 7.5 with a variance s^2 = 9.75. If this data is then aggregated by combining two horizontally adjacent cells (Figure 1.b) such that the region is partitioned into only eight cells, the mean remains unchanged but the variance is now reduced to s^2 = 1.75. If further data aggregation and generalization is undertaken (Figure 1.c) such that the region is partitioned into only four subregions, the mean again remains unchanged but the variance is now s^2 = 0.00. In other words, all of the original statistical and spatial variation in the data has been lost. Other examples could be provided in which both mean and variance would change in response to different spatial configurations and aggregations of the data.

Another example of aggregation effects is presented by Clark and Hosking (1986) and Clark and Avery (1976). With particular emphasis on the effects of data aggregation with respect to correlation and regression analyses, they observed that, "Aggregation of observational units on the basis of proximity leads to substantially biased correlation coefficients, with an increase in r as the level of grouping increases."

In the case of cartographic representation of radon concentrations or the development of radon potential maps, the researcher should utilize (in most cases) data from the smallest geographic units available. Similarly, in the development of multivariate statistical models in which radon concentration, the dependent variable Y, is a function of a set of independent variables X_1, ..., X_n, symbolically represented as:

$Y = f(X_1, X_2, \ldots, X_n) + e$    (Equation 1)

the effects of data aggregation must be acknowledged in the research design and dealt with explicitly. Failure to account for this effect may lead to erroneous and misleading results.
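The variance reduction summarized in Figure 1 can be reproduced with a short sketch. The cell values below are hypothetical: they are not Barber's, and were chosen only so that the summary statistics match those reported for Figure 1. The sketch also assumes that s^2 is the population variance (divisor N), and the function names `aggregate` and `summary` are illustrative.

```python
from statistics import fmean, pvariance

# Hypothetical 4 x 4 grid of cell values (not Barber's data), chosen so that
# its mean and population variance match the figures quoted for Fig. 1.a.
GRID = [
    [5.5, 11.5, 5.5, 11.5],
    [3.5, 9.5, 9.5, 3.5],
    [6.5, 12.5, 11.5, 5.5],
    [2.5, 8.5, 5.5, 7.5],
]


def aggregate(grid, block_rows, block_cols):
    """Replace each non-overlapping block of cells with the block mean."""
    out = []
    for r in range(0, len(grid), block_rows):
        row = []
        for c in range(0, len(grid[0]), block_cols):
            cells = [
                grid[r + i][c + j]
                for i in range(block_rows)
                for j in range(block_cols)
            ]
            row.append(fmean(cells))
        out.append(row)
    return out


def summary(grid):
    """Return (mean, population variance) over all cells of a grid."""
    values = [v for row in grid for v in row]
    return fmean(values), pvariance(values)


large = GRID                    # sixteen cells, as in Fig. 1.a
medium = aggregate(GRID, 1, 2)  # pairs of horizontally adjacent cells, Fig. 1.b
small = aggregate(GRID, 2, 2)   # four subregions, Fig. 1.c

for name, grid in (("large", large), ("medium", medium), ("small", small)):
    m, s2 = summary(grid)
    print(f"{name:6s} scale: m = {m:.2f}, s^2 = {s2:.2f}")
```

The mean is the same at every level because each aggregate is an unweighted mean of equal-sized blocks, while the variance shrinks as within-block differences are averaged away.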
If it is operationally necessary or imperative to aggregate data into larger geographic units, I suggest that aggregation should occur among proximal units with approximately the same data values. This practice will tend to minimize the problem of variance reduction. In a quantitative sense, the effect of numerical map generalization is to obscure or mask fundamental and important spatial variations in the data.

Cartographers and geographers make critical distinctions with respect to spatial data. Any univariate spatial data set can be reduced to a geometric primitive of point, line, area or volume. Of concern here is the distinction between areal data and point data. Areal data is most commonly represented in the form of a choropleth map (for example, Map 2). The implicit assumption is that the variable being mapped is continuous and constant over each areal unit and hence, any variance within the original data aggregated to this scale is reduced to zero. Furthermore, the data, in a mathematical sense, is implicitly discrete. In many cases, it is common practice to assign the data variable for each areal unit to the centroid of that unit and then use some automated technique to contour this data. The result is a contour map which is a continuous representation of the variable. A three dimensional representation of this data can also be easily generated from a contour map to produce a continuous data surface. Barber states, "Cartographers sometimes argue that a continuous areal representation is appropriate if the phenomena exists everywhere on the map, both at and between observation points. This argument appears to be valid for many physical phenomena such as rainfall and temperature, but is less compelling for a variable such as population density." (Barber, 1988).

GIS DATA SETS

Three separate and independent data sets are used in this analysis of radon concentration levels in Illinois.
The first data set is drawn from data published by the U.S. Environmental Protection Agency (U.S. Environmental Protection Agency, 1993). This data is published in tabular form and also shown cartographically as a choropleth map. The second data set consists of radon concentration values compiled by an independent radon testing agency. Although this data contains information for individual residential radon levels, the locational information for each of these is identified at the zipcode level rather than the county level. The third data set, used to observe micro-scale radon levels within a single county, was provided by another testing company. Most of the individual observations in this data are concentrated in the central Illinois area. For the purpose of this study this data set was further reduced to include only observations located within the Bloomington-Normal, Illinois, metropolitan area. The locational information in this set contains both a zipcode identifier and a street address for each observation. This data has been treated in such a manner as to guarantee complete locational confidentiality of individuals by minimally aggregating the data at the subdivision level.

Each of the three data sets was provided in digital form, either in a spreadsheet or a database file format. Because of the relative ease of file conversion, each set was translated into a standard dBase IV format. The second major task was to create a standard digital map such that the county outline map, zipcode area map, and address location map all share a common coordinate system. A pcARC/INFO coverage of county boundaries was provided by the Illinois Department of Energy and Natural Resources. The zipcode area map data was translated from an Atlas GIS data file to a pcARC/INFO coverage. For purposes of address-matching for the Bloomington-Normal data, the most recent version of the U.S.
Bureau of the Census TIGER (Topologically Integrated Geographic Encoding and Referencing) line files was utilized. These files were translated into a pcARC/INFO coverage. Because TIGER line files are based upon latitude and longitude coordinates, these were converted into a Lambert conformal projection using state plane feet as the coordinate units, to conform to the same projection units as the zipcode and county coverages. Each of the three radon concentration data sets was then related to its respective GIS coverage. The three separate resulting pcARC/INFO coverages could then be individually displayed and queried as well as overlaid on other coverages.

For analytical purposes, the Illinois State Geological Survey provided pcARC/INFO coverages of the stack unit coverage (digital map and related data base) and Quaternary deposits coverages for Illinois. They additionally provided bedrock depth and thickness of Quaternary deposits coverages for McLean County, Illinois, which will be used in subsequent analyses of the radon concentration data for the Bloomington-Normal metropolitan area.

COUNTY LEVEL DATA ANALYSIS

Table 1 summarizes data on radon concentrations by county. Of the 102 counties in the state, fifty nine (58%) are located in areas dominated by Wisconsinan and Woodfordian stage glacial deposits. Although only seven counties are associated with Liman stage deposits, these counties have the highest mean radon concentration (6.47 pCi/L). Of the remaining groups, only Woodfordian counties have a mean above 4.00 pCi/L.

Average county radon concentration is shown in Map 2. Clearly the counties having average radon concentrations above 4.00 pCi/L are geographically concentrated in the central, west-central and northern tier of counties in the state.
Recall that representing data in this fashion implies that, even as average values, these values are assumed to be uniform over each entire county, and the county boundaries imply sharp transitions in average value. The continuous nature of the data is more effectively shown by contouring the data. pcARC/INFO does not provide a tool for contouring discrete data values. To remedy this problem, a point coverage using county centroids was developed and x- and y-coordinates assigned to each point. This database was then imported into SURFER (Golden Software, 1995), a powerful and versatile contouring and 3-D surface mapping package, and gridded using the kriging routine. This provides a more meaningful cartographic representation of the spatial trends of radon concentrations across the state.

Map 3, the contoured map, is a more accurate representation of the data. The spatial trends in the data now become more apparent, with the high "ridge" of radon values extending in a band from the southeast to northwest across north central Illinois. A trend to progressively lower values extends from this ridge toward the southern part of the state. Map 4 is a three-dimensional representation of the contour surface and allows for the easy identification of the "hills" (high concentrations) and "valleys" (low concentrations).

ANALYSIS OF VARIANCE MODEL

Although the maps reveal a great deal about the spatial distribution of radon concentration using the county data, we are now in a position to ask whether there are statistically significant differences among counties based on their dominant Quaternary deposits. A simple one-way analysis of variance (ANOVA) seems appropriate to apply to this problem. In general, this model provides a simple statistical technique for identifying whether statistically significant differences exist among a partitioning of the data into meaningful categories.
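As a rough sketch of the one-way model, the following Python computes the between-category and within-category sums of squares and the resulting F-ratio. The group values are hypothetical and serve only to exercise the function; the county figures passed to `f_ratio_from_table` are the sums of squares and degrees of freedom reported in Table 2. The function names are illustrative assumptions, not part of any software used in the study.

```python
from statistics import fmean


def one_way_anova(groups):
    """Compute sums of squares, degrees of freedom and F for a one-way ANOVA.

    `groups` is a list of lists, one list of observations per category.
    """
    all_values = [v for g in groups for v in g]
    grand_mean = fmean(all_values)

    # Between-category sum of squares: category means around the grand mean,
    # weighted by the number of observations in each category.
    ss_between = sum(len(g) * (fmean(g) - grand_mean) ** 2 for g in groups)

    # Within-category sum of squares: observations around their category mean.
    ss_within = sum((v - fmean(g)) ** 2 for g in groups for v in g)

    df_between = len(groups) - 1
    df_within = len(all_values) - len(groups)
    return {
        "ss_between": ss_between,
        "ss_within": ss_within,
        "df_between": df_between,
        "df_within": df_within,
        "f_ratio": (ss_between / df_between) / (ss_within / df_within),
    }


def f_ratio_from_table(ss_between, df_between, ss_within, df_within):
    """Recompute F from the entries of a published ANOVA table."""
    return (ss_between / df_between) / (ss_within / df_within)


# Hypothetical observations for three categories (not data from this study).
hypothetical_groups = [[1.0, 2.0, 3.0], [2.0, 3.0, 4.0], [5.0, 6.0, 7.0]]
result = one_way_anova(hypothetical_groups)
print(result)

# Sums of squares and degrees of freedom as reported in Table 2.
county_f = f_ratio_from_table(75.948, 5, 393.57, 97)
print(f"county-level F-ratio: {county_f:.3f}")
```

Recomputing F from the tabulated sums of squares in this way is a quick consistency check on a published ANOVA table.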
In the context of this study, the radon concentration data for the county-level and zipcode area-level data can be categorized on the basis of the particular glacial stage of Quaternary deposits and till members. In the case of the local level data, individual observations are classified on the basis of a glacial sub-stage. In the analysis of variance, the variance within each of the categories is compared to the variance between categories. The ratio of the between-categories variance to the within-categories variance yields the F-ratio, and the F probability distribution is used as the statistical test. Calculating each of the variance terms requires first computing the sum of squares between categories and within categories. The null hypothesis can be formally stated as:

$H_0: \mu_1 = \mu_2 = \cdots = \mu_k$

and thus the alternate hypothesis:

$H_A: \mu_i \neq \mu_j$ for at least one pair of categories $i, j$

where $\mu_j$ is the mean radon concentration of category $j$ and $k$ is the number of categories. If the computed F-ratio exceeds the tabled value of the F-distribution for the relevant degrees of freedom and at a given level of significance (α = .05 or .01 typically), we can reject the null hypothesis.

The analysis of variance (Table 2) for the county level data yields a calculated value of F = 3.744. For 97 and 5 degrees of freedom respectively and with α = .05, the tabled value is F = 4.40. Therefore H_0 is not rejected, implying that there are no statistically significant differences between counties on the basis of their dominant Quaternary deposits.

ZIPCODE LEVEL DATA ANALYSIS

Average radon concentrations based on zipcode area data and Quaternary formation are shown in Table 3. Of the approximately thirteen hundred zipcode areas in Illinois, only 862 had at least one valid observation. Even though the individual observations from which the averages for each zipcode area were compiled represent a statistically independent sample (from the county level data), the Liman formation zipcode areas have the highest mean but also a large variance. This is comparable to the county area data.
Of the 862 valid zipcode areas, a total of 34.06 percent had average radon levels in excess of 4.00 pCi/L. The highest mean is associated with the Liman formation (6.47 pCi/L), with 55.9% of zipcodes exceeding 4 pCi/L. Although the Jubilean Formation has the highest percent of zipcode areas above 4 pCi/L (68.8%), its mean is considerably lower (4.5 pCi/L). The Wisconsinan, Mixed and Monican Formations also have high coefficients of variation (.9973, .8821, and .831).

Map 5 shows the spatial pattern of radon concentrations. Maps 6 and 7 show the same data as a contour map and a three dimensional surface respectively. This surface appears rather "spiked" because there are a number of zipcode areas that have no data. Generally, however, these maps do suggest a spatial pattern comparable to what was observed in the county level data (namely higher values concentrated in the central, west central and northern areas of the state).

Table 4 shows the analysis of variance of the zipcode level data. At this level, in contrast to the county level data, the calculated F-ratio is 12.84. The value of the F-distribution for 8 and 1000 degrees of freedom and α = .01 is 2.53. Thus the null hypothesis (H_0) is rejected. At the 99 percent confidence level, differences therefore exist among the Quaternary units in their associated average radon concentrations.

LOCAL LEVEL DATA ANALYSIS
BLOOMINGTON-NORMAL METRO AREA

The Bloomington-Normal metropolitan area is located in McLean County, Illinois. The area has experienced relatively recent rapid population growth. Most of this growth has been concentrated on the eastern side of the community, resulting in the development of new residential subdivisions. More recently, there has also been a population expansion toward the southwest. A locally based radon testing agency supplied data on residential radon levels covering mostly the McLean County area.
Three hundred ninety one observations in the data base were located within the metropolitan area. This database identified the geographic location of each tested residence by street name and address and subdivision. This database was then easily merged with the GIS street coverage for Bloomington-Normal. Using the address matching procedures available in pcARC/INFO, a point coverage of all residences was created. This coverage was then overlaid on the Quaternary deposits coverage. Unlike the case with the county and zipcode level data (in which the Quaternary stages were used), at this scale the analysis could be undertaken using individual till units, smaller spatial units associated with Quaternary substages. Table 5 presents these results. Although there are clear differences in the mean radon concentration values by till units, a total of 147 of the 391 observations (37 percent) had concentrations above 4.01 pCi/L. For all observations, the mean level for the metropolitan area is 4.04 pCi/L. Figure 2 shows the frequency distribution of radon concentration by till units.

Because each observation's location was address-matched to a given city block's address range, for reasons of confidentiality individual observations were aggregated to the approximate geographic centroid of residential subdivisions. This data was then contoured and overlaid on the street network and Quaternary deposits coverages. Spatial trends in the radon concentration surface are again evident. A ridge of high values exists over the northern section of the metropolitan area. From that ridge there is generally a trend toward areas of progressively lower concentrations. Maps 8 and 9 show these patterns.

Lastly, an analysis of variance was performed on this data set. The calculated F-ratio is 19.15.
For the appropriate degrees of freedom the null hypothesis is again rejected at the α = .01 level, indicating that there are statistically significant differences among till units in terms of their radon concentrations. The specific characteristics of these till units and their relationship to radon emissions await further investigation.

SUMMARY

Geographic Information Systems provide an excellent analytical tool for researching and modeling the spatial patterns of radon concentrations. Researchers wishing to analyze radon concentration data using a GIS must, however, confront the problem of geographic scale. As suggested earlier, data at the least aggregated scale possible should be utilized without compromising the confidentiality of individuals. By doing this, the problem of masking meaningful spatial and statistical variation that exists within and among the geographic units of analysis will be avoided.

The most significant difficulty in this study is the lack of a sufficiently large database at the zipcode level. As more individual observations are aggregated to this spatial scale, the contour surface should become more powerful as a predictive tool for the development of radon hazard potential maps. It will also aid in the analysis of the explanatory causes, geologic or otherwise, of high radon areas.

REFERENCES

ARCVIEW, Environmental Systems Research Institute, Redlands, CA

Atlas GIS, Strategic Mapping, Inc., Santa Clara, CA

Barber, Gerald M. Elementary Statistics for Geographers, New York: The Guilford Press, 1988, pp. 114-119

Christensen, Lindsay G. and Rigby, James G. GIS Applications of Radon Hazard Studies: An Example from Nevada, Nevada Bureau of Mines and Geology, 1996

Clark, W.A.V. and Hosking, P.L. Statistical Methods for Geographers, New York: John Wiley & Sons, Inc., 1986

Clark, W.A.V. and Avery, K. (1976) The Effects of Data Aggregation in Statistical Analysis, Geographical Analysis 8:428-438.
Illinois Department of Energy and Natural Resources, Digital GIS Data, 1994

Kumar, Ashok, Heydinger, Andrew G. and Harrell, James A. Development of an Indoor Radon Information System for Ohio and Its Application in the Study of the Geology of Radon in Ohio, Ohio Air Quality Development Authority, 1990

Lineback, Jerry A. Quaternary Deposits of Illinois: Illinois State Geological Survey, scale 1:500,000, 1979

Nero, Anthony. Developing a Methodology for Identifying High Radon Areas, http://eande.lbl.gov/CBS/Newsletter/NL3/Radon2.html, 1995, p. 3.

pcARC/INFO, Environmental Systems Research Institute, Redlands, CA

SURFER For Windows, Golden Software, Inc., Golden, CO

United States Bureau of the Census, 1993. TIGER Line Postcensus Files (Illinois), Washington, D.C.

United States Geological Survey, Geologic Radon Potential Maps for Counties in the Washington, D.C. Metro Area, http://sedwww.cr.usgs.gov:8080/radon/mcounty.html

United States Environmental Protection Agency, EPA's Map of Radon Zones - Illinois, Radon Division, Office of Radiation and Indoor Air, U.S. Environmental Protection Agency, September, 1993

Willman, H.B. and Frye, John C. Pleistocene Stratigraphy of Illinois, Bulletin 94, Illinois State Geological Survey, Urbana, IL, 1970

Table 1: Radon Concentration Analysis by County

FORMATION   NUMBER OF OBSERVATIONS   MEAN   VARIANCE   STANDARD DEVIATION   COEFF. OF VARIATION
TOTALS                         102   3.85      3.868                1.967                 0.511

MAP 3: Average Radon Concentrations By Counties - Contoured Values

MAP 4: Average Radon Concentrations By Counties (Surface) - pCi/L

Table 2: Analysis of Variance, County-Scale Data

Source of Variation   Sum of Squares   Degrees of Freedom   Variance   F-Ratio
Between                       75.948                    5     15.189     3.744
Within                       393.57                    97      4.057
Total                        469.518                  102     19.246

Table 3: Radon Concentration Data, Zipcode Data by Quaternary Deposits

Columns: FORMATION; NUMBER OF OBSERVATIONS; NUMBER OF OBSERVATIONS > 4 pCi/L; PERCENT > 4.0 pCi/L; MEAN; VARIANCE; STANDARD DEVIATION; COEFF. OF VARIATION

MAP 6: Average Radon Concentrations By Zipcode Area - Contoured Values

Table 4: Analysis of Variance, Zipcode-Scale Data

Source of Variation   Sum of Squares   Degrees of Freedom   Variance   F-Ratio
Between                      471.77                     8      58.97     12.84
Within                      3963.35                   854       4.59
Total                       4435.12                   862      63.56

Table 5: Radon Concentration Data, Bloomington-Normal Data

Columns: UNITS CLASS; TOTALS; MEAN; MEDIAN; VARIANCE; ST. DEVIATION; % ABOVE 4.00; % BELOW 4.00

UNITS CLASS   TOTALS
hm                32
wb                90
wbn               60
ws                24
wsm              185
TOTAL            391

Values (column alignment not recoverable): 5.5 3.74 3.96 3.24 3.51 2.70 5.18 9.69 3.72 3.00 4.04 6.22 3.15 39% 61% 3.38 35% 65% 3.19 42% 58% 2.97 37% 63% 4.16 44% 56%

hm: Mackinaw member of Henry formation; wb: Batestown till member of Wedron formation; ws: Snider till member of Wedron formation; "m" as identifier indicates moraine

FIGURE 2: Indoor Radon Levels by Type (axis label: Count)