STP 1283
Geostatistics for Environmental and Geotechnical Applications Shahrokh Rouhani, R. Mohan Srivastava, Alexander J. Desbarats, Marc V. Cromer, and A. Ivan Johnson, editors
ASTM Publication Code Number (PCN): 04-012830-38
ASTM 100 Barr Harbor Drive West Conshohocken, PA 19428-2959 Printed in the U.S.A.
Library of Congress Cataloging-in-Publication Data Geostatistics for environmental and geotechnical applications / Shahrokh Rouhani ... [et al.]. p. cm. - (STP; 1283) Papers presented at the symposium held in Phoenix, Arizona on 26-27 Jan. 1995, sponsored by ASTM Committee D-18 on Soil and Rock. Includes bibliographical references and index. ISBN 0-8031-2414-7 1. Environmental geology-Statistical methods-Congresses. 2. Environmental geotechnology-Statistical methods-Congresses. I. Rouhani, Shahrokh. II. ASTM Committee D-18 on Soil and Rock. III. Series: ASTM special technical publication; 1283. QE38.G47 1996 628.5'01'5195-dc20 96-42381 CIP
Copyright © 1996 AMERICAN SOCIETY FOR TESTING AND MATERIALS, West Conshohocken, PA. All rights reserved. This material may not be reproduced or copied, in whole or in part, in any printed, mechanical, electronic, film, or other distribution and storage media, without the written consent of the publisher.
Photocopy Rights Authorization to photocopy items for internal or personal use, or the internal or personal use of specific clients, is granted by the AMERICAN SOCIETY FOR TESTING AND MATERIALS for users registered with the Copyright Clearance Center (CCC) Transactional Reporting Service, provided that the base fee of $2.50 per copy, plus $0.50 per page is paid directly to CCC, 222 Rosewood Dr., Danvers, MA 01923; Phone: (508) 750-8400; Fax: (508) 750-4744. For those organizations that have been granted a photocopy license by CCC, a separate system of payment has been arranged. The fee code for users of the Transactional Reporting Service is 0-8031-2414-7/96 $2.50 + .50
Peer Review Policy Each paper published in this volume was evaluated by three peer reviewers. The authors addressed all of the reviewers' comments to the satisfaction of both the technical editor(s) and the ASTM Committee on Publications. To make technical information available as quickly as possible, the peer-reviewed papers in this publication were printed "camera-ready" as submitted by the authors. The quality of the papers in this publication reflects not only the obvious efforts of the authors and the technical editor(s), but also the work of these peer reviewers. The ASTM Committee on Publications acknowledges with appreciation their dedication and contribution of time and effort on behalf of ASTM.
Printed in Ann Arbor, MI October 1996
Foreword This publication, Geostatistics for Environmental and Geotechnical Applications, contains papers presented at the symposium of the same name held in Phoenix, Arizona on 26-27 Jan. 1995. The symposium was sponsored by ASTM Committee D-18 on Soil and Rock. The symposium co-chairmen were: R. Mohan Srivastava, FSS International; Dr. Shahrokh Rouhani, Georgia Institute of Technology; Marc V. Cromer, Sandia National Laboratories; and A. Ivan Johnson, A. Ivan Johnson, Inc.
Contents

OVERVIEW PAPERS

Geostatistics for Environmental and Geotechnical Applications: A Technology Transferred - Marc V. Cromer ... 3
Describing Spatial Variability Using Geostatistical Analysis - R. Mohan Srivastava ... 13
Geostatistical Estimation: Kriging - Shahrokh Rouhani ... 20
Modeling Spatial Variability Using Geostatistical Simulation - Alexander J. Desbarats ... 32

ENVIRONMENTAL APPLICATIONS

Geostatistical Site Characterization of Hydraulic Head and Uranium Concentration in Groundwater - Bruce E. Buxton, Darlene E. Wells, and Alan D. Pate ... 51
Integrating Geophysical Data for Mapping the Contamination of Industrial Sites by Polycyclic Aromatic Hydrocarbons: A Geostatistical Approach - Pierre Colin, Roland Froidevaux, Michel Garcia, and Serge Nicoletis ... 69
Effective Use of Field Screening Techniques in Environmental Investigations: A Multivariate Geostatistical Approach - Michael R. Wild and Shahrokh Rouhani ... 88
A Bayesian/Geostatistical Approach to the Design of Adaptive Sampling Programs - Robert L. Johnson ... 102
Importance of Stationarity in Geostatistical Assessment of Environmental Contamination - Kadri Dagdelen and A. Keith Turner ... 117
Evaluation of a Soil Contaminated Site and Clean-Up Criteria: A Geostatistical Approach - Daniela Leone and Neil Schofield ... 133
Stochastic Simulation of Space-Time Series: Application to a River Water Quality Modelling - Amilcar O. Soares, Pedro J. Patinha, and Maria J. Pereira ... 146
Solid Waste Disposal Site Characterization Using Non-Intrusive Electromagnetic Survey Techniques and Geostatistics - Gary N. Kuhn, Wayne E. Woldt, David D. Jones, and Dennis D. Schulte ... 162

GEOTECHNICAL AND EARTH SCIENCES APPLICATIONS

Enhanced Subsurface Characterization for Prediction of Contaminant Transport Using Co-Kriging - Craig H. Benson and Salwa M. Rashad ... 181
Geostatistical Characterization of Unsaturated Hydraulic Conductivity Using Field Infiltrometer Data - Stanley M. Miller and Anja J. Kannengieser ... 200
Geostatistical Simulation of Rock Quality Designation (RQD) to Support Facilities Design at Yucca Mountain, Nevada - Marc V. Cromer, Christopher A. Rautman, and William P. Zelinski ... 218
Revisiting the Characterization of Seismic Hazard Using Geostatistics: A Perspective after the 1994 Northridge, California Earthquake - James R. Carr ... 236
Spatial Patterns Analysis of Field Measured Soil Nitrate - Farida S. Goderya, M. F. Dahab, W. E. Woldt, and I. Bogardi ... 248
Geostatistical Joint Modeling and Probabilistic Stability Analysis for Excavations - Dae S. Young ... 262

Indexes ... 277
Overview Papers
Marc V. Cromer1
Geostatistics for Environmental and Geotechnical Applications: A Technology Transferred
REFERENCE: Cromer, M. V., "Geostatistics for Environmental and Geotechnical Applications: A Technology Transferred," Geostatistics for Environmental and Geotechnical Applications, ASTM STP 1283, R. M. Srivastava, S. Rouhani, M. V. Cromer, A. J. Desbarats, A. I. Johnson, Eds., American Society for Testing and Materials, 1996.

ABSTRACT: Although successfully applied during the past few decades for predicting the spatial occurrences of properties that are cloaked from direct observation, geostatistical methods remain somewhat of a mystery to practitioners in the environmental and geotechnical fields. The techniques are powerful analytical tools that integrate numerical and statistical methods with scientific intuition and professional judgment to resolve conflicts between conceptual interpretation and direct measurement. This paper examines the practicality of these techniques within the entitled fields of study and concludes by introducing a practical case study in which the geostatistical approach is thoroughly executed.

KEYWORDS: geostatistics, environmental investigations, decision analysis tool
INTRODUCTION

Although geostatistics is emerging on environmental and geotechnical fronts as an invaluable tool for characterizing spatial or temporal phenomena, it is still not generally considered "standard practice" in these fields. The technology is borrowed from the mining and petroleum exploration industries, starting with the pioneering work of Danie Krige in the 1950s and the mathematical formalization by Georges Matheron in the early 1960s. In these industries, it has found acceptance through successful application to cases where decisions concerning high capital costs and operating practices are based on interpretations derived from sparse spatial data. The application of geostatistical methods has since extended to many fields relating to the earth sciences. As many geotechnical and, certainly, environmental studies are faced with identical "high-stakes" decisions, geostatistics appears to be a natural transfer of technology. This paper outlines the unique characteristics of this sophisticated technology and discusses its applicability to geotechnical and environmental studies.
1Principal Investigator, Sandia National Laboratories/Spectra Research Institute, MS 1324, P.O. Box 5800, Albuquerque, NM 87185-1342
GEOSTATISTICAL APPLICATIONS
IT'S GEOSTATISTICS

The field of statistics is generally devoted to the analysis and interpretation of uncertainty caused by limited sampling of a property under study. Geostatistical approaches deviate from more "classical" methods of statistical data analysis in that they are not wholly tied to a population distribution model that assumes samples to be normally distributed and uncorrelated. Most earth science data sets, in fact, do not satisfy these assumptions, as they often have highly skewed distributions and spatially correlated samples. Whereas classical statistical approaches are concerned only with examining the statistical distribution of sample data, geostatistics incorporates interpretations of both the statistical distribution of the data and the spatial relationships (correlation) between the sample data. Because of these differences, environmental and geotechnical problems are more effectively addressed using geostatistical methods when interpretations derived from the spatial distribution of data have an impact on decision-making risk. Geostatistical methods provide the tools to capture, through rigorous examination, the descriptive information on a phenomenon from sparse, often biased, and often expensive sample data. The continued examination and quantitative rigor of the procedure provide a vehicle for integrating qualitative and quantitative understanding by allowing the data to "speak for themselves." In effect, the process produces the most plausible interpretation by continued examination of the data in response to conflicting interpretations.
A GOAL-ORIENTED, PROJECT COORDINATION TOOL

The application of geostatistics to large geotechnical or environmental problems has also proven to be a powerful integration tool, allowing coordination of activities from the acquisition of field data to design analysis (Ryti, 1993; Rautman and Cromer, 1994; Wild and Rouhani, 1995). Geostatistical methods encourage a clear statement of objectives prior to any study. With these study objectives defined, the flow of information, the appropriate use of interpretations and assumptions, and the customer/supplier feedback channels are defined. This type of coordination provides a desirable level of traceability that is often not realized. With environmental restoration projects, the information collected during the remedial investigation is the sole basis for evaluating the applicability of various remedial strategies, yet this information is often incomplete. Incomplete information translates to uncertainty in bounding the problem and increases the risk of regulatory failure. While this type of uncertainty can often be reduced with additional sampling, these benefits must be balanced against the increasing costs of characterization. The probabilistic roots deeply entrenched in geostatistical theory offer a means to quantify this uncertainty, while leveraging existing data in support of sampling optimization and risk-based decision analyses. For example, a geostatistically based cost/risk/benefit approach to sample optimization has been shown to provide a framework for examining the many tradeoffs encountered when juggling the risks associated with remedial investigation, remedial
CROMER ON A TECHNOLOGY TRANSFERRED
design, and regulatory compliance (Rautman et al., 1994). An approach such as this explicitly recognizes the value of information provided by the remedial investigation, in that additional measurements are only valuable to the extent that the information they provide reduces total cost.

GEOSTATISTICAL PREDICTION
The ultimate goal of geostatistical examination and interpretation, in the context of risk assessment, is to provide a prediction of the probable or possible spatial distribution of the property under study. This prediction most commonly takes the form of a map or series of maps showing the magnitude and/or distribution of the property within the study area. There are two basic forms of geostatistical prediction: estimation and simulation. In estimation, a single, statistically "best" estimate of the spatial occurrence of the property is produced, based on the sample data and on the model determined to most accurately represent the spatial correlation of the sample data. This single estimate (map) is produced by the geostatistical technique commonly referred to as kriging. With simulation, many equally likely, high-resolution images of the property distribution can be produced using the same model of spatial correlation as developed for kriging. The images have a realistic texture that mimics an exhaustive characterization, while maintaining the overall statistical character of the sample data. Differences between the many alternative images (models) provide a measure of joint spatial uncertainty that allows one to resolve risk-based questions, an option not available with estimation. Like estimation, simulation can be accomplished using a variety of techniques, and the development of alternative simulation methods is currently an area of active research.

NOT A BLACK BOX
Despite successful application during the past few decades, geostatistical methods remain somewhat of a mystery to practitioners in the geotechnical and environmental fields. The theoretical complexity and effort required to produce the intermediate analysis tools needed to complete a geostatistical study have often deterred the novice from this approach. Unfortunately, to many earth scientists, geostatistics is considered to be a "black box." Although this is far from the truth, such perceptions are often the Achilles' heel of many mathematical/numerical analytical procedures that harness data to yield their true worth, because they require a commitment in time and training from the practitioner to develop some baseline proficiency. Geostatistics is not a solution, only a tool. It cannot produce good results from bad data, but it will allow one to maximize the information the data contain. Geostatistics cannot replace common sense, good judgment, or professional insight; in fact, it demands that these skills be brought to bear. The procedures often take one down a blind alley, only to force a redirection because of an earlier misinterpretation. While these exercises are nothing more than cycling through the scientific method, they are often more than the novice is willing to commit to. The time and frustration associated with continually rubbing one's nose in the
details of the data must also be weighed against the risks to the decision maker. Given the tremendous level of financial resources being committed to field investigation, data collection, and information management to provide decision-making power, it appears that such exercises are warranted.

CASE STUDY INTRODUCTION

This introductory paper only attempts to provide a gross overview of geostatistical concepts, with some hints at practical application of these tools within the entitled fields of scientific study. Although geostatistics has been practiced for several decades, it has also evolved both practically and theoretically with the advent of faster, more powerful computers. During this time a number of practical methods and various algorithms have been developed and tested, many of which still have merit and are practiced, but many have been left behind in favor of promising research developments. Some of the concepts that I have touched upon will come to better light in the context of the practical examination addressed in the following suite of three overview papers provided by Srivastava (1996), Rouhani (1996), and Desbarats (1996).

In this case study, a hypothetical database has been developed that represents sampling of two contaminants of concern: lead and arsenic. Both contaminants have been exhaustively characterized as a baseline for comparison, as shown in Figures 1 and 2. The example scenario proposes a remedial action threshold (performance measure) of 500 ppm for lead and 30 ppm for arsenic for the particular remediation unit or "VSR" (as discussed by Desbarats, 1996). Examination of the exhaustive sample histograms and univariate statistics in Figures 1 and 2 indicates that about one fifth of the area is contaminated with lead, and one quarter is contaminated with arsenic. The two exhaustive databases have been sampled in two phases, the first of which was on a pseudo-regular grid (square symbols in Figure 3) at roughly a separation distance of 50 m.
In this first phase, only lead was analyzed. In the second sampling phase, each first-phase sample location determined to have a lead concentration exceeding the threshold was targeted with eight additional samples (circle symbols in Figure 3) to delineate the direction of propagation of the contaminant. To mimic a problem often encountered in an actual field investigation, arsenic contamination was detected during the second phase of sampling and subsequently included in the characterization process. Arsenic concentrations are posted in Figure 4 with accompanying sample statistics. The second-phase samples, therefore, all have recorded values for both arsenic and lead. Correlation between lead and arsenic is explored by examining the co-located exhaustive data, which are plotted in Figure 5. This comparison indicates moderately good correlation between the two constituents, with a correlation coefficient of 0.66, as compared to the slightly higher correlation coefficient of 0.70 derived from the co-located sample data plotted in Figure 6. There are a total of 77 samples from the first phase of sampling and 135 from the second phase. The second sampling phase, though, has been biased because of its focus on "hot-
FIGURE 1: EXHAUSTIVE PB DATA
[Location map (grey scale, 0-500-1000 ppm) and histogram of Pb concentrations, 0-1000 ppm.]
Number of samples: 7700; samples = 0 ppm: 213 (3%); samples > 500 ppm: 1426 (19%).
Minimum: 0 ppm; lower quartile: 120 ppm; median: 261 ppm; upper quartile: 439 ppm; maximum: 1066 ppm.
Mean: 297 ppm; standard deviation: 218 ppm.
FIGURE 2: EXHAUSTIVE AS DATA
[Location map (grey scale, 0-200 ppm) and histogram of As concentrations, 0-200 ppm; first histogram bar truncated at 44%.]
Number of samples: 7700; samples = 0 ppm: 1501 (19%); samples > 30 ppm: 1851 (24%).
Minimum: 0 ppm; lower quartile: 1 ppm; median: 6 ppm; upper quartile: 29 ppm; maximum: 550 ppm.
Mean: 22 ppm; standard deviation: 35 ppm.
FIGURE 3: SAMPLE PB DATA
[Posting of sample locations (first-phase grid samples as squares, second-phase samples as circles; grey scale 0-500-1000 ppm) and histogram of Pb concentrations.]
Number of samples: 212; samples = 0 ppm: 1 (0%); samples > 500 ppm: 91 (43%).
Minimum: 0 ppm; lower quartile: 239 ppm; median: 449 ppm; upper quartile: 613 ppm; maximum: 1003 ppm.
Mean: 431 ppm; standard deviation: 237 ppm.
FIGURE 4: SAMPLE AS DATA
[Posting of sample locations (grey scale, 0-200 ppm) and histogram of As concentrations; first histogram bar truncated at 22%.]
Number of samples: 135; samples = 0 ppm: 12 (9%); samples > 30 ppm: 51 (38%).
Minimum: 0 ppm; lower quartile: 6 ppm; median: 21 ppm; upper quartile: 50 ppm; maximum: 157 ppm.
Mean: 33 ppm; standard deviation: 36 ppm.
FIGURE 5: EXHAUSTIVE DATA
[Scatterplot of As (ppm, 0-200) versus Pb (ppm, 0-1000) for the co-located exhaustive data.]
Correlation coefficient: 0.66.
FIGURE 6: SAMPLE DATA
[Scatterplot of As (ppm, 0-200) versus Pb (ppm, 0-1000) for the co-located sample data.]
Correlation coefficient: 0.70.
spot" delineation. This poses some difficult questions from the perspective of spatial data analysis: Which data are truly representative of the entire site and should be used for variography or for developing distributional models? Which data are redundant or create bias? Has arsenic contamination been characterized adequately? These questions are frequently encountered, especially in the initial phases of a project that has not exercised careful pre-planning. The co-located undersampling of arsenic presents an interesting twist to a hypothetical, yet realistic, problem from which we can explore the paths traveled by the geostatistician.
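The co-located comparison described above reduces to a Pearson correlation coefficient computed on paired lead and arsenic values. A minimal sketch of that computation is shown below; the arrays are synthetic stand-ins, since the actual case-study database is not reproduced here.

```python
import numpy as np

rng = np.random.default_rng(0)

# Synthetic stand-ins for 135 co-located (Pb, As) pairs; the real
# case-study values live in the STP database, not in this sketch.
pb = rng.gamma(shape=2.0, scale=150.0, size=135)                  # lead, ppm
as_ppm = 0.05 * pb + rng.gamma(shape=1.5, scale=15.0, size=135)   # arsenic, ppm, partly tied to lead

# Pearson correlation coefficient between the co-located pairs;
# this single number is the statistic quoted for Figures 5 and 6.
r = np.corrcoef(pb, as_ppm)[0, 1]
print(f"correlation coefficient: {r:.2f}")
```

Because the synthetic arsenic values contain a positive contribution from lead, r comes out moderately positive, in the same spirit as the 0.66 and 0.70 reported for the exhaustive and sample data.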
REFERENCES

Desbarats, A.J., "Modeling Spatial Variability Using Geostatistical Simulation," Geostatistics for Environmental and Geotechnical Applications, ASTM STP 1283, R. Mohan Srivastava, Shahrokh Rouhani, Marc V. Cromer, A. Ivan Johnson, Eds., American Society for Testing and Materials, Philadelphia, 1996.

Rautman, C.A., M.A. McGraw, J.D. Istok, J.M. Sigda, and P.G. Kaplan, "Probabilistic Comparison of Alternative Characterization Technologies at the Fernald Uranium-In-Soils Integrated Demonstration Project," Vol. 3, Technology and Programs for Radioactive Waste Management and Environmental Restoration, proceedings of the Symposium on Waste Management, Tucson, AZ, 1994.

Rautman, C.A. and M.V. Cromer, 1994, "Three-Dimensional Rock Characteristics Models Study Plan: Yucca Mountain Site Characterization Plan SP 8.3.1.4.3.2," U.S. Department of Energy, Office of Civilian Radioactive Waste Management, Washington, DC.

Rouhani, S., "Geostatistical Estimation: Kriging," Geostatistics for Environmental and Geotechnical Applications, ASTM STP 1283, R. Mohan Srivastava, Shahrokh Rouhani, Marc V. Cromer, A. Ivan Johnson, Eds., American Society for Testing and Materials, Philadelphia, 1996.

Ryti, R., "Superfund Soil Cleanup: Developing the Piazza Road Remedial Design," Journal of the Air and Waste Management Association, Vol. 43, February 1993.

Srivastava, R.M., "Describing Spatial Variability Using Geostatistical Analysis," Geostatistics for Environmental and Geotechnical Applications, ASTM STP 1283, R. Mohan Srivastava, Shahrokh Rouhani, Marc V. Cromer, A. Ivan Johnson, Eds., American Society for Testing and Materials, Philadelphia, 1996.

Wild, M. and S. Rouhani, "Taking a Statistical Approach: Geostatistics Brings Logic to Environmental Sampling and Analysis," Pollution Engineering, February 1995.
R. Mohan Srivastava1

DESCRIBING SPATIAL VARIABILITY USING GEOSTATISTICAL ANALYSIS
REFERENCE: Srivastava, R. M., "Describing Spatial Variability Using Geostatistical Analysis," Geostatistics for Environmental and Geotechnical Applications, ASTM STP 1283, R. M. Srivastava, S. Rouhani, M. V. Cromer, A. J. Desbarats, A. I. Johnson, Eds., American Society for Testing and Materials, 1996.

ABSTRACT: The description, analysis and interpretation of spatial variability is one of the cornerstones of a geostatistical study. When analyzed and interpreted properly, the pattern of spatial variability can be used to plan further sampling programs, to improve estimates and to build geologically realistic models of rock, soil and fluid properties. This paper discusses the tools that geostatisticians use to study spatial variability. It focuses on two of the most common measures of spatial variability, the variogram and the correlogram, and describes their appropriate uses, their strengths and their weaknesses. The interpretation and modelling of experimental measures of spatial variability are discussed and demonstrated with examples based on a hypothetical data set consisting of lead and arsenic measurements collected from a contaminated soil site.
KEYWORDS: Spatial variation, variogram, correlogram.
INTRODUCTION
Unlike most classical statistical studies, in which samples are commonly assumed to be statistically independent, environmental and geotechnical studies involve data that are not statistically independent. Whether we are studying contaminant concentrations in soil, rock and fluid properties in an aquifer, or the physical and mechanical properties of soil, data values from locations that are close together tend to be more similar than data values from locations that are far apart. To most geologists, the fact that closely

1Manager, FSS Canada Consultants, 800 Millbank, Vancouver, BC, Canada V5V 3K8
spaced samples tend to be similar is hardly surprising since samples from closely spaced locations have been influenced by similar physical and chemical processes. This overview paper addresses the description and analysis of spatial dependence in geostatistical studies, the interpretation of the results and the development of a mathematical model that can be used in spatial estimation and simulation. More specific guidance on the details of analysis, interpretation and modelling of spatial variation can be found in the ASTM draft standard guide entitled Standard Guide for Analysis of Spatial Variation in Geostatistical Site Investigations.

DESCRIBING AND ANALYZING SPATIAL VARIATION
Using the sample data set presented earlier in this volume in the paper by Cromer, Figure 1 shows an example of a "variogram", the tool that is most commonly used in geostatistical studies to describe spatial variation. A variogram is a plot of the average squared differences between data values as a function of separation distance. If the phenomenon being studied was very continuous over short distances, then the differences between closely spaced data values would be small, and would increase gradually as we compared pairs of data further and further apart. On the other hand, if the phenomenon was completely erratic, then pairs of closely spaced data values might be as wildly different as pairs of widely spaced data values. By plotting the average squared differences between data values (the squaring just makes everything positive so that large negative differences do not cancel out large positive ones) against the separation distance, we can study the general pattern of spatial variability in a spatial phenomenon.

Figure 2 shows an example of another tool that can be used to describe spatial variation, the "correlogram" or "correlation function". On this type of plot, we again group all of the available data into different classes according to their separation distance, but rather than plotting the average squared difference between the paired data values, we plot their correlation coefficient. If the phenomenon under study was very continuous over short distances, then closely spaced data values would correlate very well, and the correlation would gradually decrease as we compared pairs of data further and further apart. On the other hand, if the phenomenon was completely erratic, then pairs of closely spaced data values might be as uncorrelated as pairs of widely spaced data values. A plot of the correlation coefficient between pairs of data values as a function of the separation distance provides a description of the general pattern of spatial continuity.
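Both tools can be sketched concretely by grouping pairs of samples into distance classes. The sketch below uses synthetic data (the case-study coordinates are not reproduced here), and the function and variable names are illustrative only.

```python
import numpy as np

rng = np.random.default_rng(1)

# Synthetic irregularly spaced samples with some short-range continuity.
xy = rng.uniform(0.0, 300.0, size=(200, 2))               # coordinates in metres
z = np.sin(xy[:, 0] / 40.0) + 0.3 * rng.standard_normal(200)

def variogram_correlogram(xy, z, lags, tol):
    """Experimental variogram and correlogram by distance class.

    variogram(h):   mean squared difference of pairs roughly h apart
    correlogram(h): correlation coefficient of the paired values
    """
    d = np.sqrt(((xy[:, None, :] - xy[None, :, :]) ** 2).sum(axis=-1))
    i, j = np.triu_indices(len(z), k=1)                   # count each pair once
    gam, rho = [], []
    for h in lags:
        m = np.abs(d[i, j] - h) <= tol                    # pairs in this distance class
        head, tail = z[i[m]], z[j[m]]
        gam.append(np.mean((head - tail) ** 2))
        rho.append(np.corrcoef(head, tail)[0, 1])
    return np.array(gam), np.array(rho)

lags = np.arange(10, 121, 10)
gam, rho = variogram_correlogram(xy, z, lags, tol=5.0)
# gam rises with distance while rho falls: approximate mirror images.
```

Note that `gam` uses the average squared difference exactly as defined in the text, so its plateau sits near twice the data variance; halving it gives the semivariogram convention.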
SRIVASTAVA ON SPATIAL VARIABILITY
Figure 1. An example of a variogram using the sample lead data set described by Cromer (1996). [Variogram rising to a plateau near 60 000 over separation distances of 0 to 120 m.]

Figure 2. An example of a correlogram using the sample lead data set described by Cromer (1996). [Correlogram dropping from about 1.0 toward 0.0 over the same distances.]
As can be seen by the examples in Figures 1 and 2, the variogram and the correlogram are, in an approximate sense, mirror images. As the variogram gradually rises and reaches a plateau, the correlogram gradually drops and also reaches a plateau. They are not exactly mirror images of one another, however, and a geostatistical study of spatial continuity often involves both types of plots. There are other tools that geostatisticians use to describe spatial continuity, but they all fall into two broad categories: measures of dissimilarity and measures of similarity. The measures of dissimilarity record how different the data values are as a function of separation distance and tend to rise like the variogram. The measures of similarity record how similar the data values are as a function of separation distance and tend to fall like the correlogram.

INTERPRETING SPATIAL VARIATION
Variograms are often summarized by the three characteristics shown in Figure 3:
Sill: The plateau that the variogram reaches; for the traditional definition of the variogram - the average squared difference between paired data values - the sill is approximately equal to twice the variance of the data.3

3The "semivariogram", which is simply the variogram divided by two, has a sill that is approximately equal to the variance of the data.
Range: The distance at which the variogram reaches the sill; this is often thought of as the "range of influence" or the "range of correlation" of data values. Up to the range, a sample will have some correlation with the unsampled values nearby. Beyond the range, a sample is no longer correlated with other values.

Nugget Effect: The vertical height of the discontinuity at the origin. For a separation distance of zero (i.e. samples that are at exactly the same location), the average squared differences are zero. In practice, however, the variogram does not converge to zero as the separation distance gets smaller. The nugget effect is a combination of:

• short-scale variations that occur at a scale smaller than the closest sample spacing
• sampling error due to the way that samples are collected, prepared and analyzed
[Variogram sketch rising to a plateau, with dashed lines marking the Range, the Sill, and the Nugget effect; separation distance (in m) on the horizontal axis.]
Figure 3. Terminology commonly used to describe the main features of a variogram.

Of the three characteristics commonly used to summarize the variogram, it is the range and the nugget effect that are most directly linked to our intuitive sense of whether the phenomenon under study is "continuous" or "erratic". Phenomena whose variograms have a long range of correlation and a low nugget effect are those that we think of as "well behaved" or "spatially continuous"; attributes such as hydrostatic head, thickness of a soil layer and topographic elevation typically have long ranges and low nugget effects. Phenomena whose variograms have a short range of correlation and a high nugget
effect are those that we think of as "spatially erratic" or "discontinuous"; contaminant concentrations and permeability typically have short ranges and high nugget effects. Figure 4 compares the lead and arsenic variograms for the data set presented earlier in this volume by Cromer. For these two attributes, the higher nugget effect and shorter range on the arsenic variogram could be used as quantitative support for the view that the lead concentrations are somewhat more continuous than the arsenic concentrations.
[Panels (a) Lead and (b) Arsenic: sample variograms over separation distances of 0 to 120 m.]
Figure 4. Lead and arsenic variograms for the sample data described by Cromer (1996).
[Figure 5 plots: (a) Northwest-Southeast and (b) Northeast-Southwest directional lead variograms plotted against separation distance (in m), 0 to 120]
Figure 5. Directional variograms for the sample lead data described by Cromer (1996).

In many earth science data sets, the pattern of spatial variation is directionally dependent. In terms of the variogram, the range of correlation often depends on direction.
GEOSTATISTICAL APPLICATIONS
Using the example presented earlier in this volume by Cromer, the lead values appear to be more continuous in the NW-SE direction than in the NE-SW direction. Geostatistical studies typically involve the calculation of separate variograms and correlograms for different directions. Figure 5 shows directional variograms for the sample lead data presented by Cromer. The range of correlation shown by the NW-SE variogram (Figure 5a) is roughly 80 meters, but only 35 meters on the NE-SW variogram (Figure 5b). This longer range on the NW-SE variogram provides quantitative support for the observation that the lead values are, indeed, more continuous in this direction and more erratic in the perpendicular direction.

MODELLING SPATIAL VARIATION
Once the pattern of spatial variation has been described using directional variograms or correlograms, this information can be passed to geostatistical estimation or simulation procedures. Unfortunately, variograms and correlograms based on sample data cannot provide information on the degree of spatial continuity for every possible distance and in every possible direction. The directional variograms shown in Figure 5, for example, provide information on the spatial continuity every 10 m in two specific directions. The estimation and simulation algorithms used by geostatisticians require information on the degree of spatial continuity for every possible distance and direction. To create a model of spatial variation that can be used for estimation and simulation, it is necessary to fit a mathematical curve to the sample variograms.
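One crude way to fit such a curve is a least-squares search: for each candidate range, the nugget and sill of a single spherical structure enter linearly and can be solved for directly. This is our own illustration under simplifying assumptions (one structure only; real studies typically fit nested structures, often interactively):

```python
import numpy as np

def spherical_gamma(h, nugget, sill, rng):
    """Spherical variogram model: nugget + sill * (1.5x - 0.5x^3), capped at
    nugget + sill beyond the range."""
    x = np.minimum(np.asarray(h, dtype=float) / rng, 1.0)
    return nugget + sill * (1.5 * x - 0.5 * x ** 3)

def fit_spherical(lags, gammas, candidate_ranges):
    """For each candidate range, solve for nugget and sill by linear least
    squares; keep the candidate with the smallest squared residual."""
    lags = np.asarray(lags, dtype=float)
    gammas = np.asarray(gammas, dtype=float)
    best = None
    for rng in candidate_ranges:
        x = np.minimum(lags / rng, 1.0)
        basis = np.column_stack([np.ones_like(x), 1.5 * x - 0.5 * x ** 3])
        coef, *_ = np.linalg.lstsq(basis, gammas, rcond=None)
        resid = float(np.sum((basis @ coef - gammas) ** 2))
        if best is None or resid < best[0]:
            best = (resid, coef[0], coef[1], rng)
    _, nugget, sill, rng = best
    return nugget, sill, rng
```

The candidate ranges and the returned parameters are illustrative; nothing here reproduces the actual fits shown in Figure 6.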
[Figure 6 plots: (a) Northwest-Southeast and (b) Northeast-Southwest variogram models overlaid on the sample variograms, plotted against separation distance (in m), 0 to 120]
Figure 6. Variogram models for the directional sample variograms shown in Figure 5.
The traditional practice of variogram modelling makes use of a handful of mathematical functions whose shapes approximate the general character of most sample variograms. The basic functions - the "spherical", "exponential" and "gaussian" variogram models - can be combined to capture the important details of almost any sample variogram. Figure 6 shows variogram models for the directional variograms of lead (Figure 5). Both of these use a combination of two spherical variogram models, one to capture short range behavior and the other to capture longer range behavior, along with a small nugget effect to model the essential details of the sample variograms. In kriging algorithms such as those described later in this volume by Rouhani, it is these mathematical models of the spatial variation that are used to calculate the variogram value between any pair of samples, and between any sample and the location being estimated.
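A nested model of the kind just described (a nugget effect plus a sum of spherical structures) can be evaluated at any separation distance. A minimal sketch, with illustrative sill and range values rather than the fitted parameters of Figure 6:

```python
import numpy as np

def spherical(h, a):
    """Unit-sill spherical structure with range a: 1.5x - 0.5x^3 for h < a,
    and 1 beyond the range."""
    h = np.minimum(np.asarray(h, dtype=float), a)
    x = h / a
    return 1.5 * x - 0.5 * x ** 3

def nested_model(h, nugget, structures):
    """Nugget effect plus a sum of spherical structures, each given as a
    (sill, range) pair. The nugget applies only for h > 0."""
    g = np.where(np.asarray(h, dtype=float) > 0, nugget, 0.0)
    for sill, a in structures:
        g = g + sill * spherical(h, a)
    return g
```

Because each structure is a legitimate variogram function, any non-negative combination is also legitimate, which is what makes this nested approach so flexible in practice.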
REFERENCES

ASTM, 1996, Standard Guide for Analysis of Spatial Variation in Geostatistical Site Investigations, Draft standard from D18.01.07 Section on Geostatistics.

Cromer, M.V., 1996, "Geostatistics for Environmental and Geotechnical Applications: A Technology Transfer," Geostatistics for Environmental and Geotechnical Applications, ASTM STP 1283, R. Mohan Srivastava, Shahrokh Rouhani, Marc V. Cromer, A. Ivan Johnson, Eds., American Society for Testing and Materials, West Conshohocken, PA.

Deutsch, C.V. and Journel, A.G., 1992, GSLIB: Geostatistical Software Library and User's Guide, Oxford University Press, New York, 340 p.

Isaaks, E.H. and Srivastava, R.M., 1989, An Introduction to Applied Geostatistics, Oxford University Press, New York, 561 p.

Journel, A.G. and Huijbregts, C., 1978, Mining Geostatistics, Academic Press, London, 600 p.

Rouhani, S., 1996, "Geostatistical Estimation: Kriging," Geostatistics for Environmental and Geotechnical Applications, ASTM STP 1283, R. Mohan Srivastava, Shahrokh Rouhani, Marc V. Cromer, A. Ivan Johnson, Eds., American Society for Testing and Materials, West Conshohocken, PA.

Srivastava, R.M. and Parker, H.M., 1988, "Robust Measures of Spatial Continuity," Geostatistics, M. Armstrong (ed.), Reidel, Dordrecht, p. 295-308.
Shahrokh Rouhani1

GEOSTATISTICAL ESTIMATION: KRIGING
REFERENCE: Rouhani, S., "Geostatistical Estimation: Kriging," Geostatistics for Environmental and Geotechnical Applications, ASTM STP 1283, R. Mohan Srivastava, Shahrokh Rouhani, Marc V. Cromer, A. Ivan Johnson, Alexander J. Desbarats, Eds., American Society for Testing and Materials, 1996.

ABSTRACT: Geostatistics offers a variety of spatial estimation procedures which are known as
kriging. These techniques are commonly used for interpolation of point values at unsampled locations and estimation of average block values. Kriging techniques provide a measure of accuracy in the form of an estimation variance. These estimates are dependent on the model of spatial variability and the relative geometry of measured and estimated locations. Ordinary kriging is a linear minimum-variance interpolator that assumes a constant, but unknown, global mean. Other forms of linear kriging include simple and universal kriging, as well as co-kriging. If measured data display non-Gaussian tendencies, more accurate interpolation may be obtained through non-linear kriging techniques, such as lognormal and indicator kriging.

KEYWORDS: Geostatistics, kriging, spatial variability, mapping, environmental investigations.

Many environmental and geotechnical investigations are driven by biased or preferential sampling plans. Such plans usually generate correlated, and often clustered, data. Geostatistical procedures recognize these difficulties and provide tools for various forms of spatial estimation. These techniques are collectively known as kriging in honor of D. G. Krige, a South African mining engineer who pioneered the use of weighted moving averages in the assessment of ore bodies. Common applications of kriging in environmental and geotechnical engineering include delineation of contaminated media, estimation of average concentrations over exposure domains, and mapping of soil parameters and piezometric surfaces (Journel and Huijbregts, 1978; Delhomme, 1978; ASCE, 1990). The present STP offers a number of papers that cover various forms of geostatistical estimation, such as Benson and Rashad (1996), Buxton (1996), Goderya et al. (1996), and Wild and Rouhani (1996). Comparison of kriging to other commonly used interpolation techniques, such as distance-weighting functions, reveals a number of advantages (Rouhani, 1986).
Kriging directly incorporates the model of the spatial variability of the data. This allows kriging to produce site-specific and variable-specific interpolation schemes. Estimation criteria of kriging are based on

1Associate Professor, School of Civil and Environmental Engineering, Georgia Institute of Technology, Atlanta, GA 30332-0355.
ROUHANI ON KRIGING
well-defined statistical conditions, and thus, are superior to subjective interpolation techniques. Furthermore, the automatic declustering of data by kriging makes it a suitable technique to process typical environmental and geotechnical measurements. Kriging also yields a measure of the accuracy of its interpolated values in the form of estimation variances. These variances have been used in the design of sampling plans because of two factors: (1) each estimate comes with an estimation variance, and (2) the estimation variance does not depend on the individual observations (Loaiciga et al., 1992). Therefore, the impact of a new sampling location can be evaluated before any new measurements are actually conducted (Rouhani, 1985). Rouhani and Hall (1988), however, noted that in most field cases the use of the estimation variance alone is not sufficient to expand a sampling plan. Such plans usually require consideration of many factors in addition to the estimation variance. To use the estimation variance as a basis for sampling design, additional assumptions must be made about the probability density function of the estimation error. A common practice is to assume that, at any location in the sampling area, the errors are normally distributed with a mean of zero and a standard deviation equal to the square root of the estimation variance, referred to as the kriging standard deviation. The normal distribution of the errors has been supported by practical evidence (Journel and Huijbregts, 1978, p. 50 and 60).
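Under this normality assumption, the kriging standard deviation translates directly into a confidence interval around each estimate. A one-line sketch with illustrative numbers (not values from any study cited here):

```python
def kriging_confidence_interval(estimate, kriging_std, z=1.96):
    """Two-sided confidence interval around a kriged estimate, assuming
    (as described above) normally distributed errors with mean zero and
    standard deviation equal to the kriging standard deviation.
    z = 1.96 corresponds to a 95% interval."""
    return estimate - z * kriging_std, estimate + z * kriging_std

# Hypothetical example: a kriged estimate of 300 ppm with a kriging
# standard deviation of 50 ppm.
lo, hi = kriging_confidence_interval(300.0, 50.0)
```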
Ordinary Kriging

Among geostatistical estimation methods, ordinary kriging is the most widely used in practice. This procedure produces minimum-variance estimates by taking into account: (1) the distance vector between the estimated point and the data points; (2) the distance vectors between the data points themselves; and (3) the statistical structure of the variable. This structure is represented by either the variogram, the covariance or the correlogram function. Ordinary kriging is also capable of processing data averaged over different volumes and sizes. Ordinary kriging is a "linear" estimator. This means that its estimate, Z*, is computed as a weighted sum of the nearby measured values, denoted as Z1, Z2, ..., Zn. The form of the estimate is

\[ Z^* = \sum_{i=1}^{n} \lambda_i Z_i \qquad (1) \]

where the \(\lambda_i\)'s are the estimation weights. Z* can either represent a point or a block-averaged value, as shown in Fig. 1. Point kriging provides the interpolated value at an unsampled location. Block kriging yields an areal or a volumetric average over a given domain. The kriging weights, \(\lambda_i\), are chosen so as to satisfy two statistical conditions: (1) Non-bias condition: This condition requires the estimator Z* to be free of any systematic error, which translates into
[Fig. 1 sketches: sample points Z1 ... Zn surrounding (a) an estimated point Z* and (b) an estimated block]
Fig. 1. Example of Spatial Estimation: (a) Point Kriging; (b) Block Kriging.
[Fig. 2: map of the simulated soil lead concentration field, in ppm]
\[ \sum_{i=1}^{n} \lambda_i = 1 \qquad (2) \]
(2) Minimum-variance condition: This requires that the estimator Z* have the minimum variance of estimation. The estimation variance of Z*, \(\sigma^2\), is defined as

\[ \sigma^2 = 2\sum_{i=1}^{n} \lambda_i \gamma_{i0} - \sum_{i=1}^{n}\sum_{j=1}^{n} \lambda_i \lambda_j \gamma_{ij} - \gamma_{00} \qquad (3) \]

where \(\gamma_{i0}\) is the variogram between the i-th measured point and the estimated location, and \(\gamma_{ij}\) is the variogram between the i-th and j-th measured points. The kriging weights are computed by minimizing the estimation variance (Eq. 3) subject to the non-bias condition (Eq. 2). The computed weights are then used to calculate the interpolated value (Eq. 1). As Delhomme (1978) notes: "the kriging weights are tailored to the variability of the phenomenon. With regular variables, kriging gives higher weights to the closest data points, precisely since continuity means that two points close to each other have similar values. When the phenomenon is irregular, this does not hold true and the weights given to the closest data points are dampened." Such flexibility does not exist in methods, such as distance weighting, where the weights are pre-defined as functions of the distance between the estimated point and the data point.
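Minimizing Eq. 3 subject to Eq. 2 via a Lagrange multiplier leads to a small linear system in the weights. The following sketch is our own illustration of that system (it omits practical details such as search neighborhoods and covariance-form equivalents):

```python
import numpy as np

def ordinary_kriging_weights(gamma_ij, gamma_i0):
    """Solve the ordinary kriging system in variogram form, with a
    Lagrange multiplier mu enforcing the non-bias condition:
        sum_j lambda_j * gamma_ij + mu = gamma_i0   for each i
        sum_j lambda_j = 1
    gamma_ij: n x n matrix of variogram values between data points.
    gamma_i0: length-n vector of variogram values between each data point
    and the estimated location. Returns (weights, mu)."""
    n = len(gamma_i0)
    A = np.ones((n + 1, n + 1))
    A[:n, :n] = gamma_ij
    A[n, n] = 0.0
    b = np.append(np.asarray(gamma_i0, dtype=float), 1.0)
    sol = np.linalg.solve(A, b)
    return sol[:n], sol[n]
```

For two samples placed symmetrically about the estimated location, the system returns equal weights of 0.5, which matches the intuition behind the Delhomme quotation above.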
Case Study: Kriging of Lead Data
As noted in Cromer (1996), a soil lead field is simulated as a case study, as shown in Fig. 2. The measured values are collected from this simulated field. As in most environmental investigations, the sampling activities are conducted in two phases. During the first phase, a pseudo-regular grid of 50x50 m is used for soil sampling. In the second phase, locations with elevated lead concentrations are targeted for additional irregular sampling, as indicated in Fig. 3. The analysis of the spatial variability of the simulated field is presented in the previous paper (Srivastava, 1996). Using this information, ordinary kriging is conducted. Fig. 4 displays the kriging results of point estimations. The comparison of the original simulated field (Fig. 2) and the kriged map (Fig. 4) shows that the kriged map captures the main spatial features of lead contamination. This comparison, however, indicates a degree of smoothing in the kriged map, which is a consequence of the interpolation process. In cases where the preservation of the spatial variability of the measured field is critical to the study objectives, the use of kriging for estimation alone is inappropriate and simulation methods are recommended (Desbarats, 1996). Each kriged map is accompanied by its accuracy map. Fig. 5 displays the kriging
[Fig. 3: map of the two-phase soil lead sampling locations described above]

Fig. 4. Soil Lead Concentration Map by Ordinary Kriging in ppm (Blank spaces are not estimated)

Fig. 5. Kriging Standard Deviation of Soil Lead Concentration in ppm

standard deviation map of soil lead data. This latter map can be used to distinguish between zones of high versus poor data coverage.
Block Kriging

In many instances, available measurements represent point or quasi-point values, but the study requires the computation of an areal or volumetric value over a larger domain. For instance, in environmental risk assessments, the desired concentration term should represent the average contamination over an exposure domain. Depending on the computed average concentration or its upper confidence limit, a block is declared impacted or not-impacted. This shows that the decision is based on the estimated block value, and not its true value. So there is a chance of error in two forms: (1) Wrong Rejection: certain blocks will be considered impacted while their true average concentration is below the target level, and (2) Wrong Acceptance: certain blocks will be considered not-impacted when their true average concentrations are above the target level. As shown in Journel and Huijbregts (1978, p. 459), the kriging block estimator, Z*, is the linear estimator that minimizes the sum of the above two errors. Therefore, the block kriging procedure is preferred to any other linear estimator for such selection problems.
Alternative Forms of Kriging

As noted before, ordinary kriging is a linear minimum-variance estimator. There are other forms of linear kriging. For example, if the global mean of the variable is known, the non-bias condition (Eq. 2) is not required. This leads to simple kriging. If, on the other hand, the global mean is not constant and can be expressed as a polynomial function of the spatial coordinates, then universal kriging may be used. In many instances, added information is available whenever more than one variable is sampled, provided that some relationship exists between these variables. Co-kriging uses a linear estimation procedure to estimate Z* as

\[ Z^* = \sum_{i=1}^{n} \lambda_i Z_i + \sum_{j=1}^{m} \omega_j Y_j \qquad (4) \]

where \(Z_i\) is the i-th measured value of the "primary" variable with a kriging weight of \(\lambda_i\), and \(Y_j\) is the j-th "auxiliary" measured value with a kriging weight of \(\omega_j\). Co-kriging is especially advantageous in cases where the primary measurements are limited and expensive, while
auxiliary measurements are available at low cost. Ahmed and de Marsily (1987) enhanced their limited transmissivity data, based on pumping tests, with the more abundant specific capacity data. This resulted in an improved transmissivity map. The present STP provides examples of co-kriging, such as Benson and Rashad (1996) and Wild and Rouhani (1996).

Non-linear Kriging

The above linear kriging techniques do not require any implicit assumptions about the underlying distribution of the interpolated variable. If the investigated variable is multivariate normal (Gaussian), then linear estimates have the minimum variance. In many cases where the histogram of the measured values displays a skewed tendency, a simple transformation may produce normally distributed values. After such a transformation, linear kriging may be used. If the desired transformation is logarithmic, then the estimation process is referred to as lognormal kriging. Although lognormal kriging can be applied to many field cases, its estimation process requires back-transformation of the estimated values. These back-transformations are complicated and must be performed with caution (e.g. Buxton, 1996). Sometimes, the observed data clearly exhibit non-Gaussian characteristics, whose log-transforms are also non-Gaussian. Examples of such data sets include cases of measurements with multi-modal histograms, highly skewed histograms, or data sets with a large number of below-detection measurements. These cases have motivated the development of a set of techniques to deal with non-Gaussian random functions. One of these methods is indicator kriging. In this procedure, the original values are transformed into indicator values, such that they are zero if the datum value is less than a pre-defined cutoff level or unity if greater. The estimated value by indicator kriging represents the probability of not-exceedence at a location. This technique provides a simple, yet powerful procedure for generating probability maps (Rouhani and Dillon, 1990).

Recommended Sources

For more information on kriging, readers are referred to Journel and Huijbregts (1978), de Marsily (1986), Isaaks and Srivastava (1989), and ASCE (1990). ASTM Standard D 5549, titled "Standard Guide for Content of Geostatistical Site Investigations," provides information on the various elements of a kriging report. ASTM D18.01.07 on Geostatistics has also drafted a guide titled "Standard Guide for Selection of Kriging Methods in Geostatistical Site Investigations." This guide provides recommendations for selecting appropriate kriging methods based on study objectives and common situations encountered in geostatistical site investigations.
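The indicator transformation used in indicator kriging, described above, is a one-line coding step. A sketch (our own illustration; note that conventions differ, and the convention chosen determines whether the kriged indicator estimates an exceedence or a not-exceedence probability):

```python
import numpy as np

def indicator_transform(values, cutoff):
    """Indicator coding for indicator kriging: 0 where the datum is below
    the cutoff, 1 where it is at or above it (the >= treatment of ties is
    our own choice). Kriging these indicators estimates the probability
    of exceeding the cutoff; its complement is the probability of
    not-exceedence."""
    return (np.asarray(values, dtype=float) >= cutoff).astype(int)
```

Repeating the transformation for several cutoffs, and kriging each set of indicators, is what allows the method to build up a full probability distribution at each location.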
References

(1) ASCE Task Committee on Geostatistical Techniques in Geohydrology, "Review of Geostatistics in Geohydrology, 1. Basic Concepts, 2. Applications," ASCE Journal of Hydraulic Engineering, 116(5), 612-658, 1990.
(2) Ahmed, S., and G. de Marsily, "Comparison of geostatistical methods for estimating transmissivity using data on transmissivity and specific capacity," Water Resources Research, 23(9), 1717-1737, 1987.
(3) Benson, C.H., and S.M. Rashad, "Using Co-kriging to Enhance Subsurface Characterization for Prediction of Contaminant Transport," Geostatistics for Environmental and Geotechnical Applications, ASTM STP 1283, R. Mohan Srivastava, Shahrokh Rouhani, Marc V. Cromer, A. Ivan Johnson, Eds., American Society for Testing and Materials, Philadelphia, 1996.
(4) Buxton, B.E., "Two Geostatistical Studies of Environmental Site Assessments," Geostatistics for Environmental and Geotechnical Applications, ASTM STP 1283, R. Mohan Srivastava, Shahrokh Rouhani, Marc V. Cromer, A. Ivan Johnson, Eds., American Society for Testing and Materials, Philadelphia, 1996.
(5) Cromer, M., "Geostatistics for Environmental and Geotechnical Applications," Geostatistics for Environmental and Geotechnical Applications, ASTM STP 1283, R. Mohan Srivastava, Shahrokh Rouhani, Marc V. Cromer, A. Ivan Johnson, Eds., American Society for Testing and Materials, Philadelphia, 1996.
(6) Delhomme, J.P., "Kriging in the hydrosciences," Advances in Water Resources, 1(5), 251-266, 1978.
(7) Desbarats, A., "Modeling of Spatial Variability Using Geostatistical Simulation," Geostatistics for Environmental and Geotechnical Applications, ASTM STP 1283, R. Mohan Srivastava, Shahrokh Rouhani, Marc V. Cromer, A. Ivan Johnson, Eds., American Society for Testing and Materials, Philadelphia, 1996.
(8) Goderya, F.S., M.F. Dahab, and W.E. Woldt, "Geostatistical Mapping and Analysis of Spatial Patterns for Farm Fields Measured Residual Soils Nitrates," Geostatistics for Environmental and Geotechnical Applications, ASTM STP 1283, R. Mohan Srivastava, Shahrokh Rouhani, Marc V. Cromer, A. Ivan Johnson, Eds., American Society for Testing and Materials, Philadelphia, 1996.
(9) Isaaks, E.H. and R.M. Srivastava, An Introduction to Applied Geostatistics, Oxford University Press, New York, 561 p., 1989.
(10) Journel, A.G. and C. Huijbregts, Mining Geostatistics, Academic Press, London, 600 p., 1978.
(11) Loaiciga, H.A., R.J. Charbeneau, L.G. Everett, G.E. Fogg, B.F. Hobbs, and S. Rouhani, "Review of Ground-Water Quality Monitoring Network Design," ASCE Journal of Hydraulic Engineering, 118(1), 11-37, 1992.
(12) Marsily, G. de, Quantitative Hydrogeology, Academic Press, Orlando, 1986.
(13) Rouhani, S., "Variance Reduction Analysis," Water Resources Research, Vol. 21, No. 6, pp. 837-846, June 1985.
(14) Rouhani, S., "Comparative study of ground water mapping techniques," Ground Water, 24(2), 207-216, 1986.
(15) Rouhani, S., and M.E. Dillon, "Geostatistical Risk Mapping for Regional Water Resources Studies," Use of Computers in Water Management, Vol. 1, pp. 216-228, V/O "Syuzvodproekt", Moscow, USSR, 1989.
(16) Rouhani, S., and Hall, T.J., "Geostatistical Schemes for Groundwater Sampling," Journal of Hydrology, Vol. 103, 85-102, 1988.
(17) Srivastava, R.M., "Describing Spatial Variability Using Geostatistical Analysis," Geostatistics for Environmental and Geotechnical Applications, ASTM STP 1283, R. Mohan Srivastava, Shahrokh Rouhani, Marc V. Cromer, A. Ivan Johnson, Eds., American Society for Testing and Materials, Philadelphia, 1996.
(18) Wild, M.R., and S. Rouhani, "Effective Use of Field Screening Techniques in Environmental Investigations: A Multivariate Geostatistical Approach," Geostatistics for Environmental and Geotechnical Applications, ASTM STP 1283, R. Mohan Srivastava, Shahrokh Rouhani, Marc V. Cromer, A. Ivan Johnson, Eds., American Society for Testing and Materials, Philadelphia, 1996.
Alexander J. Desbarats 1
MODELING SPATIAL VARIABILITY USING GEOSTATISTICAL SIMULATION
REFERENCE: Desbarats, A. J., "Modeling Spatial Variability Using Geostatistical Simulation," Geostatistics for Environmental and Geotechnical Applications, ASTM STP 1283, R. M. Srivastava, S. Rouhani, M. V. Cromer, A. I. Johnson, A. J. Desbarats, Eds., American Society for Testing and Materials, 1996.

ABSTRACT: This paper, the last in a four-part introduction to geostatistics, describes the application of simulation to site investigation problems. Geostatistical simulation is a method for generating digital representations or "maps" of a variable that are consistent with its values at sampled locations and with its in situ spatial variability, as characterized by histogram and variogram models. Continuing the synthetic case study of the three previous papers, the reader is led through the steps of a geostatistical simulation. The simulated fields are then compared with the exhaustive data sets describing the synthetic site. Finally, it is shown how simulated fields can be used to answer questions concerning alternative site remediation strategies.

KEYWORDS: Geostatistics, kriging, simulation, variogram
INTRODUCTION
In a geostatistical site investigation, after we have performed an exploratory analysis of our data and have modeled its spatial variation structure, the next step is usually to produce a digital image or "map" of the variables of interest from a set of measurements at scattered sample locations. We are then faced with a choice between two possible approaches: estimation and simulation. This choice is largely dictated by study objectives. Detailed guidance for selecting between these two approaches, and among the various types of simulation, is provided in the draft ASTM Guide for the Selection of Simulation Approaches in Geostatistical Site Investigations. Producing a map from scattered measurements is a classical spatial estimation problem that can be addressed using a non-geostatistical interpolation method such

1Geological Survey of Canada, 601 Booth St., Ottawa, ON K1A 0E8, Canada
DESBARATS ON SPATIAL VARIABILITY
as inverse-distance weighting or, preferably, using one of the least-squares weighting methods collectively known as kriging discussed in Rouhani (this volume). Regardless of the interpolation method that is selected, the result is a representation of our variable in which its spatial variability has been smoothed compared to in situ reality. Along with this map of estimated values, we can also produce a map of estimation (or error) variances associated with the estimates at each unsampled location. This map provides a qualitative or, at best, semi-quantitative measure of the degree of uncertainty in our estimates and the corresponding level of smoothing we can expect. Unfortunately, maps of estimated values, even when accompanied by maps of estimation variances, are often an inadequate basis for decision-making in environmental or geotechnical site investigations. This is because they fail to convey a realistic picture of the uncertainty and the true spatial variability of the parameters that affect the planning of remediation strategies or the design of engineered structures. The alternative to estimation is simulation. Geostatistical simulation (Srivastava, 1994) is a Monte-Carlo procedure for generating outcomes of digital maps based on the statistical models chosen to represent the probability distribution function and the spatial variation structure of a regionalized variable. The simulated outcomes can be further constrained to honor observed data values at sampled locations on the map. Therefore, not only does geostatistical simulation allow us to produce a map of our variable that more faithfully reproduces its true spatial variability, but we can generate many equally probable alternative maps, each one consistent with our field observations. A set of such alternative maps allows a more realistic assessment of the uncertainty associated with sampling in heterogeneous geological media.
This paper presents an introduction to the geostatistical tool of simulation. Its goals are to provide a basic understanding of the method and to illustrate how it can be used in site investigation problems. To do this, we will continue the synthetic soil contamination case study started in the three previous papers. We will proceed step by step through the simulation study, pausing here and there to compare our results with the underlying reality and the results of the kriging study (Rouhani, this volume). Finally, we will use our simulated fields to answer some questions that can arise in actual soil remediation studies. STUDY OBJECTIVES
The objective of our simulation study is to generate digital images or maps of lead (Pb) and arsenic (As) concentrations in soil. We will then use these maps to determine the proportion of the site area in which Pb or As concentrations exceed the remediation thresholds of 150 ppm and 30 ppm, respectively. The maps are to reproduce the histograms and variograms of Pb and As in addition to observed measurements at sampled locations. Although the full potential of the simulation method is truly achieved only in sensitivity or risk analysis studies involving multiple outcomes of the simulated maps, we will focus on the generation of a single outcome. In many respects, even a single map of simulated concentrations is more useful than a map of kriged values. This is because a realistic portrayal of in situ spatial variability is often a sobering warning to planners whereas maps of kriged values are easily
misinterpreted as showing much smoother spatial variations. For our study, we have chosen the concentrations of Pb and As as the two regionalized variables to work with. This may seem like an obvious choice; however, we could have taken another approach based on an indicator or binary transformation of our original variables. The new indicator variables corresponding to each contaminant would take a value of 1 if the concentration exceeds the remediation threshold and a value of 0 otherwise. Proceeding in a somewhat different manner than shown here, we could then generate maps of simulated indicator variables for the two contaminants. From such maps, the proportion of the site requiring remediation is readily determined. The drawback with an indicator approach is that we have sacrificed detailed knowledge of contaminant concentrations in exchange for simplicity and conciseness. Should the remediation thresholds change, new indicator variables would have to be defined and the study repeated. Here, we will stick with the more involved but also more flexible approach of simulating contaminant concentrations. An application of indicator simulation is described in Cromer et al. (this volume).

HISTOGRAM MODELS
The first step in our simulation study is to decide what probability distribution functions or, more prosaically, what histogram models are to be honored by our simulated concentrations. We would like these histograms to be representative of the entire site. Often, the raw histograms of sample data are the most appropriate choice. However, here this isn't the case: the sampling of our contaminated site was carried out in two stages. In the first stage, we obtained 77 measurements of Pb distributed on a fairly regular grid. In the second stage, we focused our sampling on areas identified in the first stage as having high Pb concentrations. Furthermore, by then we had become aware that arsenic contamination was present and we analyzed an additional 135 samples for both Pb and As. Thus, our Pb data consist of 77 values that are probably representative of the entire site area and another 135 values drawn from the most contaminated region. As for arsenic, our 135 samples were obtained exclusively from the most contaminated region and are probably not representative of the entire site. The raw histograms of Pb and As shown in Cromer (this volume) reflect the preferential or biased sampling procedure and do not provide adequate models for our simulation. The answer to this problem is to weight our sample data in such a way as to decrease the influence of clustered measurements while increasing that of more isolated values. In geostatistics, this exercise is known as "declustering" and can be accomplished in several ways (Isaaks and Srivastava, 1989; Deutsch and Journel, 1992). Here we used a cell declustering scheme to find sample weights. This involved moving a 10 x 10 unit cell over N non-overlapping positions covering the study area. At each cell position, the number n of samples within the cell was counted and each sample was then assigned a relative weight of 1/(Nn).
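The cell declustering scheme just described can be sketched for a single grid position as follows (our own illustration: here N is taken as the number of occupied cells and the weights are renormalized to sum to one; a full implementation would also average over several cell-origin offsets and often several cell sizes):

```python
import numpy as np

def cell_declustering_weights(coords, cell_size, origin=(0.0, 0.0)):
    """Cell declustering for one grid position: each sample falling in a
    cell containing n samples gets a relative weight proportional to
    1/(N*n), where N is the number of occupied cells. Samples in crowded
    cells are down-weighted; isolated samples are up-weighted."""
    coords = np.asarray(coords, dtype=float)
    ix = np.floor((coords[:, 0] - origin[0]) / cell_size).astype(int)
    iy = np.floor((coords[:, 1] - origin[1]) / cell_size).astype(int)
    cells = list(zip(ix, iy))
    counts = {c: cells.count(c) for c in set(cells)}
    n_occupied = len(counts)
    w = np.array([1.0 / (n_occupied * counts[c]) for c in cells])
    return w / w.sum()  # normalize so the weights sum to one
```

A declustered histogram is then simply the weighted histogram of the data using these weights.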
This procedure may be expected to work well for Pb but for As there is no escaping the fact that our samples are restricted to a few small, highly contaminated patches and are hardly representative of the site as a whole. Obtaining a reasonably representative histogram is crucial for a simulation
study; therefore, desperate measures are called for. Although no geostatistical method can truly compensate for a lack of data, the following "fix" was attempted here: using our knowledge of the correlation between Pb and As provided by the 135 samples of the second sampling campaign (Cromer, this volume), we filled in the missing As values at the 77 locations of the first campaign. For each of the 77 Pb values, we looked up the closest Pb value from the second campaign and read off the corresponding As value. Thus, all 212 sample locations have both Pb and As measurements, and the same declustering weights can be used for both variables. The resulting histograms of weighted Pb and As samples are shown in Figures 1 a) and b), respectively. They should be compared with the un-declustered histograms shown in Cromer (this volume). We now have histograms that, we think, provide reasonable models of the exhaustive distributions of Pb and As that we are trying to replicate in our simulated fields. A peek at the true exhaustive distributions (Cromer, this volume) shows that our declustered Pb histogram does a fairly good job of reproducing the main statistical parameters, whereas our As histogram does a rather mediocre job despite our best efforts. Further discussion of the declustering issue can be found in Rossi and Dresel (this volume).

Figure 1: Declustered histograms of a) Pb and b) As.

NORMAL-SCORE TRANSFORMATION OF VARIABLES
The next step of our study involves transforming our Pb and As sample values into standard Normal deviates (Deutsch and Journel, 1992). This "normal-score" transformation is required because the simulation algorithm we will be using is based on the multivariate Normal (or Gaussian) distribution model and assumes that all sample data are drawn from such a distribution. In simple terms, this transformation is performed by replacing the value corresponding to a given quantile of the original distribution with the value from a standard Normal distribution associated with the same quantile. For example, a Pb value of 261 ppm corresponding to a quantile of 0.50 (i.e., the median) in the sample histogram is transformed into a value of 0, corresponding to the median of a standard Normal distribution. In mathematical terms, we seek the transformations Z1 and Z2 of Pb and As such that:
G(Z1) = F1(Pb),    G(Z2) = F2(As)    (1)
where G( ) is the cumulative distribution function (cdf) of a standard Normal distribution and F1( ) and F2( ) are the sample cdfs for lead and arsenic, respectively. Implementation of this transformation is fairly straightforward except when identical sample values are encountered. In such cases, ties are broken by adding a small random perturbation to each sample value and ranking the samples accordingly (Deutsch and Journel, 1992). Here, this "despiking" procedure was required to deal with a large number of below-detection As values. In general, however, it is good practice to avoid extensive recourse to this procedure. If, for example, large numbers of samples have values below detection limits, it is better to subdivide the data set into two populations, above and below detection, and analyze each group separately, or to adopt an indicator approach (Zuber and Kulkarni, this volume).

PRINCIPAL COMPONENT TRANSFORMATION
Before we can proceed to the analysis of spatial variation, one last step is required. Our simulation algorithm can only be used to generate fields of one variable at a time. However, we wish to simulate two variables Z1 and Z2, reproducing not only their respective spatial variation structures but also the relationship between them shown in Figure 2. We must therefore "decouple" the variables Z1 and Z2 so that we can simulate them independently. To do this, we use the following principal component transformation, which yields the independent variables Y1 and Y2 from the correlated variables Z1 and Z2:
Y1 = Z1,    Y2 = (Z2 - ρ Z1) / √(1 - ρ²)    (2)

where ρ is the correlation coefficient between Z1 and Z2, which is found to be 0.839.
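Taken together, the normal-score transformation of equation (1) and the decoupling of equation (2) can be sketched as below. This is a minimal Python illustration on made-up stand-in data, not the authors' code; the plotting-position quantile convention (rank + 0.5)/n is an assumption, and despiking of ties is assumed to have been done already.

```python
import numpy as np
from statistics import NormalDist

def normal_scores(values):
    """Map each value at quantile p of the sample cdf to G^{-1}(p),
    the matching quantile of a standard Normal distribution."""
    v = np.asarray(values, dtype=float)
    n = len(v)
    ranks = v.argsort().argsort()            # assumes distinct (despiked) values
    p = (ranks + 0.5) / n                    # sample-cdf quantiles
    return np.array([NormalDist().inv_cdf(q) for q in p])

def decouple(z1, z2, rho):
    """Equation (2): turn correlated standard Normal Z1, Z2 into
    independent Y1, Y2, with Y1 identical to Z1."""
    y1 = z1
    y2 = (z2 - rho * z1) / np.sqrt(1.0 - rho ** 2)
    return y1, y2

# Made-up, correlated stand-ins for the Pb and As samples.
rng = np.random.default_rng(0)
pb = rng.lognormal(5.5, 0.8, 200)
as_ = pb * rng.lognormal(0.0, 0.5, 200)      # correlated with pb by construction
z1, z2 = normal_scores(pb), normal_scores(as_)
rho = float(np.corrcoef(z1, z2)[0, 1])       # the paper's value was 0.839
y1, y2 = decouple(z1, z2, rho)
```

After the transform, Y1 and Y2 are uncorrelated standard Normal variables that can be simulated one at a time and recombined afterwards by reversing the two equations.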
Figure 2: Scatter plot of Z1 and Z2 for 135 sample values. The correlation coefficient is 0.839; the rank correlation is 0.869.

VARIOGRAM MODELS
In this section, we examine and model the spatial variation structure of the two independent variables Y1 (≡ Z1) and Y2. The jargon and the steps involved in an analysis of spatial variation are described in more detail by Srivastava (this volume), so only a summary of results is given here. Directional variograms, or more specifically correlograms, were calculated for Y1 using all 212 data values, and for Y2 using the 135 values of the second sampling campaign. For each variable, eight directional correlograms were calculated at azimuth intervals of 22.5° using overlapping angular tolerances of 22.5°. Lag intervals and distance tolerances were 10 grid units and 5 grid units, respectively, for Y1, and 5 grid units and 2.5 grid units, respectively, for Y2. The purpose of these directional correlograms is to reveal general features of spatial variation such as directional anisotropies and nested structures. Results for Y1 and Y2 are shown in Figures 3 a) and b), respectively. These figures provide a planimetric representation of the spatial correlation structure, displaying correlogram values as a surface, a function of location in the plane of East-West (x) and North-South (y) lag components. For Y1, we observe, in addition to a significant nugget effect, what we interpret as two nested structures with different principal directions of spatial continuity. The first, shorter-scale, structure has a direction of maximum continuity approximately North North-West, a maximum range of about 20 grid units and an anisotropy ratio of about 1.4:1. The second, larger-scale structure has a direction of maximum continuity approximately West North-West, an indeterminate maximum range and a minimum range of at least 40 grid units.
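An experimental directional correlogram of the kind just described can be computed by binning data pairs by lag distance and azimuth tolerance and correlating the "head" and "tail" values in each bin. The sketch below is a minimal illustration on a made-up toy grid, not the GSLIB-style code the authors would have used.

```python
import numpy as np

def directional_correlogram(x, y, v, azimuth_deg, atol_deg, lag, ltol, nlags):
    """For each lag class, correlate tail and head values of all pairs whose
    separation falls within the distance and angular tolerances.
    Azimuth is measured clockwise from north."""
    x, y, v = (np.asarray(a, float) for a in (x, y, v))
    i, j = np.triu_indices(len(v), k=1)
    dx, dy = x[j] - x[i], y[j] - y[i]
    dist = np.hypot(dx, dy)
    ang = np.degrees(np.arctan2(dx, dy)) % 180.0
    dang = np.abs(ang - azimuth_deg % 180.0)
    dang = np.minimum(dang, 180.0 - dang)            # angular distance, 0..90
    result = []
    for k in range(1, nlags + 1):
        sel = (np.abs(dist - k * lag) <= ltol) & (dang <= atol_deg)
        if sel.sum() < 2:
            result.append((k * lag, float("nan")))
            continue
        heads = np.concatenate([v[i][sel], v[j][sel]])   # count pairs both ways
        tails = np.concatenate([v[j][sel], v[i][sel]])
        result.append((k * lag, float(np.corrcoef(heads, tails)[0, 1])))
    return result

# Toy data: a smooth surface on a 15 x 15 grid, so nearby values correlate.
gx, gy = np.meshgrid(np.arange(15.0), np.arange(15.0))
vals = gx + 0.5 * gy
rho_ew = directional_correlogram(gx.ravel(), gy.ravel(), vals.ravel(),
                                 azimuth_deg=90.0, atol_deg=10.0,
                                 lag=2.0, ltol=1.0, nlags=5)
```

Repeating this for eight azimuths at 22.5° intervals gives the planimetric views of Figure 3; the first lag here shows a high correlation, as expected for a smooth field.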
Figure 3: Directional correlograms (planimetric view) for a) Y1 and b) Y2.
Detailed experimental correlograms were calculated in the directions of maximum and minimum continuity of the larger-scale structure. They are shown in Figures 4 a) and b), respectively. These correlograms were used in the fitting of a model to the spatial variation structure of Y1. The fitted model is the sum of three components:

1. A nugget effect accounting for 35% of the spatial variance.

2. A short-scale structure accounting for 50% of the spatial variance. It is represented by an exponential model with maximum continuity in the North North-West direction. It has a range parameter of 6 grid units and an anisotropy ratio of 1.43:1.

3. A large-scale structure accounting for 15% of the spatial variance. This structure is also represented by an exponential model with, however, maximum continuity in the West North-West direction. The model range parameter is 300 grid units with an anisotropy ratio of 10:1. Such a large maximum range ensures that the "sill" value of the structure is not reached in the direction of maximum continuity within the limits of the site. What we have done here is model a "zonal anisotropy" (Journel and Huijbregts, 1978) as a geometric anisotropy with an arbitrarily large range in the direction of greatest continuity.

This model is also shown in Figures 4 a) and b) for comparison with the experimental results.

Figure 4: Experimental correlograms for Y1 in a) direction WNW; b) direction NNE. The fitted model is shown in dashed lines.

For Y2, there are fewer data and we are careful not to over-interpret the directional correlograms. Indeed, the apparent periodicity in the NNE direction is probably an artifact of the sampling pattern. With somewhat more confidence, we note a strong nugget effect and a structure with maximum continuity in the West North-West direction, a maximum range of about 30 grid units and an anisotropy ratio of about 3:1. Detailed directional correlograms were calculated for the directions of maximum and minimum continuity and are shown in Figures 5 a) and b), respectively. The model fitted to these correlograms is the sum of two components:

1. A nugget effect accounting for 55% of the spatial variance.

2. A structure accounting for 45% of the spatial variance. This structure is represented by an exponential model with greatest continuity in the West North-West direction. The model has a range parameter of 8 grid units and an anisotropy ratio of 3:1.

This model is shown in Figures 5 a) and b) for comparison with the experimental results.

SEQUENTIAL GAUSSIAN SIMULATION
We are now ready to simulate fields of the two independent standard Normal variables, Y1 and Y2. The simulations of Y1 and Y2 are to be conditioned on 212 and 135 sample values, respectively. Both fields are simulated on the same 110 x 70 grid as the exhaustive data sets for Pb and As. To perform our simulations, we are going to use the Sequential Gaussian method. This method is based on two important theoretical properties of the multivariate Normal (or Gaussian) distribution: First, the conditional distribution of an unknown
variable at a particular location, given a set of known values at nearby locations, is Normal. Second, the mean and variance of this conditional distribution are given by the simple kriging (SK) estimate of the unknown value and its associated error variance. Simple kriging is a variant of ordinary kriging (OK) described in Rouhani (this volume). It follows that since the conditional distribution is Normal, it is completely determined by the mean and variance provided by simple kriging.

Figure 5: Experimental correlograms for Y2 in a) direction WNW; b) direction NNE. The fitted model is shown in dashed lines.

The Sequential Gaussian Simulation algorithm is described in detail elsewhere (Deutsch and Journel, 1992; Srivastava, 1994); however, because of its simplicity, it is briefly outlined here:

1. Start with a set of conditioning data values at scattered locations over the field to be simulated.

2. Select at random a point on the grid discretizing the field where there is not yet any simulated or conditioning data value.

3. Using both conditioning data and values already simulated from the surrounding area, calculate the simple kriging estimate and corresponding error variance. These are the mean and variance of the conditional distribution of the unknown value at the point, given the set of known values from the surrounding area.

4. Select at random a value from this conditional distribution.

5. Add this value to the set of already simulated values.
6. Return to step 2 and repeat these steps recursively until all points of the discretized field have been assigned simulated values.

Thus, in many ways, the Sequential Gaussian simulation method is similar to the point kriging process described by Rouhani (this volume). The difference is that we are drawing our simulated value at random from a distribution having the kriged estimate as its mean, rather than using the kriged estimate itself as a "simulated" value. Intuitively, we see how this process leads to fields having greater spatial variability than fields of kriged values.

BACK-TRANSFORMATIONS
We now have simulated fields of the two independent standard Normal variables Y1 and Y2. In order to obtain the corresponding fields of Pb and As, we must reverse the earlier transformations. First, we reverse equations (2) to get the correlated standard Normal variables Z1 and Z2 from Y1 and Y2. Then we reverse equations (1) to get the variables Pb and As from the standard Normal deviates Z1 and Z2. Finally, we are left with simulated fields of Pb and As on a dense 110 x 70 grid discretizing the site. Although here we are focusing on single realizations of each of these fields, multiple realizations can be generated by repeating the simulation step using different seed values for the random number generator.

COMPARISON OF TRUE AND SIMULATED FIELDS
In addition to honoring values of Pb and As at sampled locations, the simulated fields should reproduce the histogram and correlogram models that we used to characterize contaminant spatial variability. These fields should also reproduce the correlation between Pb and As. Because this is a synthetic case study, we have exhaustive knowledge of Pb and As contamination levels over the entire site, something we would never have in practice. Therefore, we can conduct a postmortem of our study, comparing our simulated fields with the exhaustive fields described in Cromer (this volume). In order to compare the distributions of true and simulated values, we will use what is known as a Q-Q plot. This involves plotting the quantiles of one data set against the same quantiles of the other data set. For example, we would plot the median (0.5 quantile) of our simulated values against the median of our true values. If the histograms of the two data sets are similar, all points should plot close to the 45° line. Figures 6 a) and b) show Q-Q plots between exhaustive and simulated values of Pb and As, respectively. These results show that while we did a rather good job of reproducing the exhaustive histogram of Pb, we can claim no great success for As. Although we did our best to correct for the effects of a grossly unrepresentative sampling of As, in the end this was not good enough. This failure serves as a reminder that geostatistics alone cannot compensate for a biased site sampling campaign. Next, we check how well our simulation reproduced the correlation between Pb and As concentrations. Figure 7 shows a scatter plot of simulated Pb versus As values. This figure is to be compared with the scatter plot of true values given by
Cromer (this volume). The comparison shows that we did quite a respectable job of reproducing the relationship between Pb and As in our simulated fields.

Figure 6: Q-Q plots of exhaustive and simulated data: a) Pb; b) As.

Directional correlograms calculated on our two simulated fields are shown in Figures 8 a) and b). The main features of these correlograms compare favorably with those observed in the correlograms presented by Srivastava (this volume). Given the limited number of data and their spatial clustering, the models we fitted to the experimental correlograms were quite successful in representing the true spatial variation structures of Pb and As.

No comparison of true and simulated fields would be complete without looking at images or maps of the simulated fields. Although qualitative, the visual comparison of simulated and true fields is in fact the most stringent measure of the success of our simulation. We must check how well we have captured the character of contaminant spatial variability at the site: its "noisiness", the grain of any spatial patterns, and any trends. We should also check to see what our simulated values are doing in areas far from conditioning data values. The spatial variability in such areas should be consistent in character with that observed in more densely sampled areas. Grey-scale digital images of the simulated Pb and As fields are shown in Figures 9 and 10, respectively. Comparison with the corresponding true images in Cromer (this volume) shows that we have reason to be satisfied with our simulation. Discrepancies between simulated and true fields exist; however, these are manifestations of the uncertainty associated with our knowledge of site contamination as provided by the rather limited sampling data. It should be emphasized that we are looking at but one pair of images of contamination from amongst the many equally possible alternatives that would be consistent with the sampling information.
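The Q-Q comparison described above amounts to sorting both data sets and pairing matching quantiles. Here is a minimal sketch with made-up stand-in data (not the exhaustive data of the case study):

```python
import numpy as np

def qq_points(true_vals, sim_vals, quantiles=None):
    """Return matching quantiles of two data sets. When plotted against
    each other, points near the 45-degree line indicate similar histograms."""
    if quantiles is None:
        quantiles = np.linspace(0.01, 0.99, 99)
    t = np.quantile(np.asarray(true_vals, float), quantiles)
    s = np.quantile(np.asarray(sim_vals, float), quantiles)
    return t, s

# Stand-in data: a "good" simulation draws from the same distribution.
rng = np.random.default_rng(1)
true_vals = rng.lognormal(5.0, 0.7, 5000)
sim_vals = rng.lognormal(5.0, 0.7, 5000)
t, s = qq_points(true_vals, sim_vals)
# e.g. plt.plot(t, s, 'o'); plt.axline((0, 0), slope=1) with matplotlib
```

Systematic departures from the diagonal, as seen for As in Figure 6 b), indicate that the simulated histogram does not match the exhaustive one.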
We can also compare Figure 9
with the kriged field shown in Rouhani (this volume). We see that kriging smooths spatial variations in a non-uniform manner: less in regions with abundant sample control, more in unsampled regions. This may lead the unsuspecting to conclude that large portions of the site are quite homogeneous! Simulation, on the other hand, preserves in-situ spatial variability regardless of the proximity of sampling points.

Figure 7: Scatter plot of simulated Pb and As values (7700 pairs; correlation 0.722, rank correlation 0.881).

Figure 8: Directional correlograms (planimetric view) for simulated a) Pb and b) As.

Figure 9: Grey-scale digital image of the simulated Pb field.

APPLICATION
Now that we have simulated fields of Pb and As that we confidently assume are representative of the true yet unknown contamination at the site, we can use these fields to answer some simple questions. Perhaps the most basic question that we may ask is: what fraction of the site requires remediation, given the contamination thresholds of 150 ppm and 30 ppm for Pb and As, respectively? However, before we can attempt to answer that question, we must decide on a "volume of selective remediation" or VSR. Note that the concept of the volume of selective remediation is identical to that of the selective mining unit (smu) described in the mining geostatistics literature (chapter 6 of Journel and Huijbregts, 1978; chapter 19 of Isaaks and Srivastava, 1989). The volume, or in the present case, area of selective remediation is the smallest portion of soil that can be either left in place or removed for treatment, based upon its average contaminant concentration. The VSR may depend on several factors, including the size of the equipment being used in the remediation and the sampling information ultimately available for the selection process. It is an important design parameter because the variance of spatially averaged concentrations decreases as the
VSR becomes larger. This reduces the spread and alters the shape of the histogram of contaminant concentrations, thereby affecting the proportion of values above a given threshold and the fraction of the site requiring remediation. Here, the original sample size or "support", as it is known in geostatistics, is a square of 1 x 1 grid units (5 m x 5 m). The corresponding standard deviations of Pb and As concentrations are 218 ppm and 35 ppm, respectively. If we were to consider a VSR with a support of 10 x 10 grid units (50 m x 50 m), the standard deviations of VSR-averaged Pb and As concentrations are reduced to 172 ppm and 18 ppm, respectively.

Figure 10: Grey-scale digital image of the simulated As field.

In Tables 1 and 2, we compare fractions of the site requiring remediation for VSRs of 1 x 1 grid units and 10 x 10 grid units, respectively. Within each table, we also compare remediation fractions based on kriged, simulated and true values. For selection based on Pb concentration alone, results for both VSR sizes show good agreement between remediated fractions calculated on simulated and true fields. Remediated fractions based on kriged fields are overestimated for the smaller VSR. We note that the fraction of the site requiring remediation increases for the larger VSR. This is because the spatial averaging of Pb concentrations over a VSR smears high values over the entire block area, thereby pushing its average over the remediation threshold. The same phenomenon may also happen in reverse, with low values diluting a few high values and thus lowering the average VSR concentration below the threshold. In either case, it is obvious that the choice of VSR will have a significant impact on the fraction of the site requiring remediation. For selection based on As values alone, remediated fractions calculated on the simulated fields are almost half those calculated on the true fields. On the other hand, remediated fractions based on the kriged fields are much larger than those
Table 1: Fraction of site requiring remediation based on a VSR of 1 x 1 grid units. The Pb threshold is 150 ppm and the As threshold is 30 ppm.

Field        Pb cutoff   As cutoff   Combined cutoff
Kriged       0.7956      0.4327      0.8360
Simulated    0.6793      0.1692      0.6796
True         0.6998      0.2474      0.7026

Table 2: Fraction of site requiring remediation based on a VSR of 10 x 10 grid units. The Pb threshold is 150 ppm and the As threshold is 30 ppm.

Field        Pb cutoff   As cutoff   Combined cutoff
Kriged       0.7975      0.4248      0.8011
Simulated    0.7922      0.1688      0.7922
True         0.7792      0.3116      0.7792
based on the true fields. The cause of the poor simulation results can be traced back to our difficulties in obtaining a representative histogram for As concentrations. The poor kriging results are due to the smearing of As values from the densely sampled, highly contaminated zone into the surrounding area. Considering only the results for the true field, we see an increase in remediated fraction with the larger VSR size, as we saw previously with Pb. For selection based on either Pb or As threshold exceedance, results are similar to those for Pb selection alone: VSRs that would otherwise be misclassified based on their As value are correctly classified based on their Pb value. Although the simulated Pb field gave remediation fractions close to those obtained for the true field, this may be partly fortuitous and, in any case, does not ensure that the blocks selected for remediation are the correct ones, i.e., the same as in the true field. In practice, multiple simulations should be performed and, for each VSR within the site, a contamination threshold exceedance probability should be calculated from the resulting distribution of simulated concentrations for that location. The decision on whether or not to remediate a given VSR would then be based on its threshold exceedance probability and not on a single concentration value.
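The VSR averaging and the per-block exceedance probability just described can be sketched as follows. This is a minimal illustration on made-up noise fields, not the case-study realizations; the grid and block sizes echo those of the paper.

```python
import numpy as np

def vsr_exceedance(realizations, threshold, block=10):
    """For each VSR (a block x block average), compute across realizations
    the probability that the block-average concentration exceeds the
    threshold. `realizations` has shape (n_real, ny, nx)."""
    r = np.asarray(realizations, dtype=float)
    n_real, ny, nx = r.shape
    by, bx = ny // block, nx // block
    blocks = r[:, :by * block, :bx * block].reshape(n_real, by, block, bx, block)
    means = blocks.mean(axis=(2, 4))          # VSR-averaged concentrations
    return (means > threshold).mean(axis=0)   # exceedance probability per VSR

# Toy check: 50 made-up realizations of noise around 100 ppm on a 70 x 110 grid.
rng = np.random.default_rng(2)
fields = 100.0 + 20.0 * rng.standard_normal((50, 70, 110))
p = vsr_exceedance(fields, threshold=150.0, block=10)
# Block averaging shrinks the spread, so exceeding 150 ppm becomes very unlikely.
```

The variance reduction under averaging is exactly the support effect discussed above: the same point-level histogram yields a different remediated fraction once concentrations are averaged over a larger VSR.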
CONCLUSIONS
In this paper, we have described the steps involved in a geostatistical simulation study, going from a small and possibly biased sample data set to a detailed numerical representation of contamination levels at a hypothetical site. We have shown that geostatistical simulation is a tool for producing maps of a variable that honor data values at sampled locations as well as models of the histogram and spatial variation structure that characterize the phenomenon. We have shown that maps of a variable produced by simulation are, in general, more useful than maps produced by kriging or other spatial interpolation methods because they provide a more faithful representation of in-situ variability. We have shown that geostatistical theory and conditional simulation provide a powerful means of studying alternative remediation strategies based on the concept of the volume of selective remediation. We have shown, perhaps unintentionally, that geostatistical methods are not a panacea. They cannot compensate for insufficient or grossly unrepresentative sampling. In the end, a geostatistical study is only as good as the sampling data that it is based on. Hopefully, through the case study, we have shown that the geostatistical approach is flexible, allowing for the incorporation of much collateral information and expert judgement concerning a site that might otherwise be neglected. Indeed, the tailoring of a geostatistical approach to specific site conditions is the hallmark of a successful study. With the overview of geostatistical simulation provided here, the reader should now be able to fully appreciate the subsequent papers on the topic contained in these proceedings.

ACKNOWLEDGMENTS
The author wishes to thank Doug Hartzell, who coined the term "VSR", and one anonymous reviewer for their comments on the original manuscript. Geological Survey of Canada contribution no. 20995.

REFERENCES
ASTM Standard Guide for the Selection of Simulation Approaches in Geostatistical Site Investigations, American Society for Testing and Materials, Philadelphia, draft submitted for Society approval by section D18.01.07.

Cromer, M., 1996, Geostatistics for Environmental and Geotechnical Applications: A Technology Transfer, Geostatistics for Environmental and Geotechnical Applications, ASTM STP 1283, R. Mohan Srivastava, Shahrokh Rouhani, Marc V. Cromer, A. Ivan Johnson, Eds., American Society for Testing and Materials, Philadelphia.
Cromer, M.V., C.A. Rautman and W.P. Zelinski, 1996, Geostatistical Simulation of Rock Quality Designation (RQD) to Support Facilities Design at Yucca Mountain, Nevada, Geostatistics for Environmental and Geotechnical Applications, ASTM STP 1283, R. Mohan Srivastava, Shahrokh Rouhani, Marc V. Cromer, A. Ivan Johnson, Eds., American Society for Testing and Materials, Philadelphia.

Deutsch, C.V. and A.G. Journel, 1992, GSLIB: Geostatistical Software Library and User's Guide, Oxford University Press, New York.

Isaaks, E.H. and R.M. Srivastava, 1989, An Introduction to Applied Geostatistics, Oxford University Press, New York.

Journel, A.G. and C. Huijbregts, 1978, Mining Geostatistics, Academic Press, London.

Rossi, R.E. and P. Evan Dresel, 1996, Declustering and Stochastic Simulation of Ground-Water Tritium Concentrations at Hanford, Washington, Geostatistics for Environmental and Geotechnical Applications, ASTM STP 1283, R. Mohan Srivastava, Shahrokh Rouhani, Marc V. Cromer, A. Ivan Johnson, Eds., American Society for Testing and Materials, Philadelphia.

Rouhani, S., 1996, Spatial Variability and Geostatistical Estimation: Kriging, Geostatistics for Environmental and Geotechnical Applications, ASTM STP 1283, R. Mohan Srivastava, Shahrokh Rouhani, Marc V. Cromer, A. Ivan Johnson, Eds., American Society for Testing and Materials, Philadelphia.

Srivastava, R.M., 1994, An Overview of Stochastic Methods for Reservoir Characterization, in Stochastic Modeling and Geostatistics: Principles, Methods and Case Studies, J. Yarus and R. Chambers, Eds., American Association of Petroleum Geologists, Computer Applications 3, p. 3-16, Tulsa.

Srivastava, R.M., 1996, Describing Spatial Variability Using Geostatistical Analysis, Geostatistics for Environmental and Geotechnical Applications, ASTM STP 1283, R. Mohan Srivastava, Shahrokh Rouhani, Marc V. Cromer, A. Ivan Johnson, Eds., American Society for Testing and Materials, Philadelphia.

Zuber, R.D. and R. Kulkarni, 1996, A Geostatistical Analysis of Lake Sediment Contaminants at a Superfund Site, Geostatistics for Environmental and Geotechnical Applications, ASTM STP 1283, R. Mohan Srivastava, Shahrokh Rouhani, Marc V. Cromer, A. Ivan Johnson, Eds., American Society for Testing and Materials, Philadelphia.
Environmental Applications
Bruce E. Buxton¹, Darlene E. Wells², Alan D. Pate³

GEOSTATISTICAL SITE CHARACTERIZATION OF HYDRAULIC HEAD AND URANIUM CONCENTRATION IN GROUNDWATER
REFERENCE: Buxton, B. E., Wells, D. E., Pate, A. D., "Geostatistical Site Characterization of Hydraulic Head and Uranium Concentration in Groundwater," Geostatistics for Environmental and Geotechnical Applications, ASTM STP 1283, R. Mohan Srivastava, Shahrokh Rouhani, Marc V. Cromer, A. Ivan Johnson, and Alexander J. Desbarats, Eds., American Society for Testing and Materials, 1996.

ABSTRACT: The first case study presented in this paper describes an assessment of the spatial distribution and temporal changes in hydraulic head pressure in the groundwater beneath a retired federal government uranium processing facility. Analysis of the hydraulic heads involved ordinary kriging, which was found to be a better mapping method than such alternatives as inverse-distance weighting, mainly because kriging provides measures of estimation uncertainty. The objective of this kriging was to provide estimated steady-state head values for use in calibrating a groundwater flow model for the site. In the second case study, the spatial distribution of potential uranium contamination in the aquifer was assessed with lognormal kriging. Uranium measurements for this analysis were available at roughly three-month intervals across a four-year time period. The objective of the analysis was to assess where the uranium concentrations were highest. A second objective, not addressed in this paper, was to determine if the concentrations were changing significantly during the four-year time period.

KEYWORDS: ordinary kriging, lognormal kriging, joint spatial/temporal analysis
Kriging is a statistical interpolation method for analyzing spatially and temporally varying data. It is used to estimate groundwater hydraulic heads (or any other important parameter) on a dense grid of spatial and temporal locations covering the region of interest. At each location, two values are calculated with the kriging procedure: the estimate of hydraulic head (in feet above sea level), and the precision of the estimate (also in feet above sea level). The precision can be interpreted as the half-width of a 95% confidence interval for the estimated head. The kriging approach includes two primary analysis steps:

1. Estimate and model temporal and spatial correlations in the available monitoring data using a semivariogram analysis.
¹Program Manager, Battelle, 505 King Avenue, Columbus, OH 43201.

²Senior Data Analyst, Battelle, 505 King Avenue, Columbus, OH 43201.

³Research Scientist, Battelle, 505 King Avenue, Columbus, OH 43201.
2. Use the resulting semivariogram model and the available monitoring data to interpolate (i.e., estimate) hydraulic head values at unsampled times and locations; calculate the statistical precision associated with each estimated value.

Spatial Correlation Analysis

The objective of the spatial correlation analysis is to statistically determine the extent to which measurements taken at different locations and/or times are similar or different. This section is written in terms of hydraulic head measurements; however, the analysis approach is similar for any measured parameter of interest. Generally, the degree to which head measurements taken at two locations are different is a function of the distance and direction between the two sampling locations. Also, for the same separation distance between two sampling locations, the spatial correlation may vary as a function of the direction between the sampling locations. For example, head values measured at two locations a certain distance apart are often more similar when the locations are at the same depth than when they are the same distance apart but at very different depths. Spatial/temporal correlation is statistically assessed with the semivariogram function, γ(h), which is defined as follows (Journel and Huijbregts, 1981):
γ(h) = (1/2) E{[Z(x) − Z(x+h)]²}

where Z(x) is the hydraulic head measured at location x, h is the vector of separation between locations x and x+h, and E represents the expected value or average over the region of interest. Note that the location x might be defined by an easting, northing, and depth coordinate, or for joint spatial/temporal data by an easting, northing, and time coordinate. Similarly, the vector of separation might be defined as a three-dimensional shift in space, or for joint spatial/temporal data as a shift in both space and time. The semivariogram is a measure of spatial differences, so that small semivariogram values correspond to high spatial correlation, and large semivariogram values correspond to low correlation. As an initial hypothesis, it is always wise to assume that the strength of spatial correlation is a function of both distance and direction between the sampling locations. When the spatial correlation depends on both separation distance and direction, it is said to be anisotropic. In contrast, when the spatial correlation is the same in all directions, and therefore depends only on separation distance, it is said to be isotropic. The spatial correlation analysis is conducted in the following steps using all available measured hydraulic head data:
• Experimental semivariogram curves are generated by organizing all pairs of data locations into various separation distance and direction classes (e.g., all pairs separated by 500-1500 ft (150-450 m) in the east-west direction ± 22.5°), and then calculating within each class the average squared difference between the head measurements taken at each pair of locations. The results of these calculations are plotted against separation distance and by separation direction.
• To help fully understand the spatial correlation structure, a variety of experimental semivariogram curves are generated by subsetting the data into discrete zones, such as different depth horizons or time periods. If significant differences are found in the semivariograms, they are modeled separately; if not, the data are pooled together into a single semivariogram.
BUXTON ET AL. ON SITE CHARACTERIZATION
• After the data have been pooled or subsetted accordingly, and the associated experimental semivariograms have been calculated and plotted, a positive-definite analytical model is fitted to each experimental curve. The fitted semivariogram model is then used to input the spatial correlation structure into the subsequent kriging interpolation step.

In this study, the computer software used to perform the geostatistical calculations was the GSLIB software written at the Department of Applied Earth Sciences at Stanford University, and documented and released by Prof. Andre Journel and Dr. Clayton Deutsch (Deutsch and Journel, 1992). The primary subroutine used to calculate experimental semivariograms was GAMV3, which is used for three-dimensional, irregularly spaced data.

• For three-dimensional spatial analyses, horizontal separation distance classes were defined in increments of 1000 ft (300 m) with a tolerance of 500 ft (150 m), while vertical distances were defined in increments of 20 ft (6 m) with a tolerance of 10 ft (3 m). Horizontal separation directions were defined in the four primary directions of north, northeast, east, and northwest with a tolerance of 22.5°.

• For the joint spatial/temporal analysis, spatial separation distances and directions were defined in the same way as described immediately above, although there was no vertical direction associated with this analysis. For the temporal portion of this analysis, separation distance classes were defined in increments of 30 days with a tolerance of 15 days.
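The pairing-and-averaging computation described in the steps above can be sketched as follows. This is an illustrative simplification, not the GAMV3 subroutine itself: it bins data pairs by separation distance only, omitting the directional and vertical tolerance classes used in the study.

```python
import numpy as np

def experimental_semivariogram(coords, values, lag_width, n_lags):
    """Omnidirectional experimental semivariogram: group all data
    pairs into separation-distance classes, then average one-half the
    squared difference of the paired measurements in each class."""
    coords = np.asarray(coords, dtype=float)
    values = np.asarray(values, dtype=float)
    sums = np.zeros(n_lags)
    counts = np.zeros(n_lags, dtype=int)
    for i in range(len(values) - 1):
        # distances and half squared differences from point i to all later points
        d = np.linalg.norm(coords[i + 1:] - coords[i], axis=1)
        sq = 0.5 * (values[i + 1:] - values[i]) ** 2
        bins = (d / lag_width).astype(int)
        ok = bins < n_lags
        np.add.at(sums, bins[ok], sq[ok])
        np.add.at(counts, bins[ok], 1)
    centers = (np.arange(n_lags) + 0.5) * lag_width
    gamma = np.divide(sums, counts, out=np.full(n_lags, np.nan),
                      where=counts > 0)
    return centers, gamma
```

Plotting gamma against centers, with the pairs further split by direction class, yields curves like those in Figures 2 and 3.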
Interpolation Using Ordinary Kriging

Ordinary kriging is a linear geostatistical estimation method which uses the semivariogram function to determine the optimal weighting of the measured hydraulic head values to be used for the required estimates, and to calculate the estimation precision associated with those estimates (Journel and Huijbregts, 1981). In a sense, kriging is no different from other classical interpolation and contouring algorithms. However, kriging differs in that it produces statistically optimal estimates and associated precision measures. It should be noted that the ordinary kriging variance, while easy to calculate and readily available from most standard geostatistical software packages, may have limited usefulness in cases where the data probability distribution is highly skewed or non-Gaussian. The ordinary kriging variance provides a precision measure associated with the data density and spatial data arrangement relative to the point or block being kriged. However, the ordinary kriging variance is independent of the data values themselves, and therefore may not provide an accurate measure of local estimation precision (e.g., the appropriate width of an estimation confidence interval). The kriging analysis was conducted in this study using the GSLIB computer software (subroutine KTB3D). The primary steps involved in this analysis were as follows:

• A three-dimensional grid was defined, specifying the locations at which estimated head values were required. The network included 112 blocks in the northern direction and 120 blocks in the eastern direction, and all blocks were 125 ft (37.5 m) square. For three-dimensional spatial kriging, the network included 30 vertical blocks 5 ft (1.5 m) thick. For joint spatial/temporal kriging, the network included 43 monthly blocks in increments of 30 days, starting at January 20, 1990.

• At each block in the grid, the average hydraulic head across the block was estimated using all measured data found within a pre-defined search radius. For three-dimensional spatial kriging of steady-state hydraulic head, the search radius was 6000 ft (1800 m) in all directions. For joint spatial/temporal kriging, the search radius was anisotropic and extended 6000 ft (1800 m) in space and 72 days in time.

• After the available data were identified for each grid block, the appropriate data weighting, estimated hydraulic head, and estimation precision were calculated using the appropriate semivariogram model.

• Output from the kriging process was typically displayed in the form of contour maps, to represent spatial variations, and time-series graphs, to represent temporal variations.
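For a single target location, the ordinary kriging step described above amounts to solving a small linear system. The sketch below uses a hypothetical spherical semivariogram (not the fitted model of Table 1) and point rather than block support; it returns both the estimate and the ordinary kriging variance.

```python
import numpy as np

def spherical(h, sill=7.0, rng=7000.0):
    """Spherical semivariogram with hypothetical sill and range."""
    hn = np.minimum(h / rng, 1.0)
    return sill * (1.5 * hn - 0.5 * hn ** 3)

def ordinary_krige(coords, values, target, gamma=spherical):
    """Ordinary kriging at one location: the weights are constrained
    to sum to one via a Lagrange multiplier, which also yields the
    ordinary kriging variance."""
    coords = np.asarray(coords, dtype=float)
    values = np.asarray(values, dtype=float)
    n = len(values)
    # left-hand side: semivariograms between data points, plus the constraint row/column
    A = np.ones((n + 1, n + 1))
    A[:n, :n] = gamma(np.linalg.norm(coords[:, None] - coords[None, :], axis=2))
    A[n, n] = 0.0
    # right-hand side: semivariograms from each datum to the target
    b = np.ones(n + 1)
    b[:n] = gamma(np.linalg.norm(coords - target, axis=1))
    sol = np.linalg.solve(A, b)
    weights, mu = sol[:n], sol[n]
    estimate = weights @ values
    variance = weights @ b[:n] + mu  # ordinary kriging variance
    return estimate, variance
```

Kriging is exact at data locations (zero variance with a zero nugget), and the precision measure used in the text corresponds roughly to 1.96 times the square root of this variance.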
ANALYSIS OF HYDRAULIC HEAD

Steady-state hydraulic heads were needed for calibration of a steady-state groundwater flow model. A two-step data analysis approach was used to estimate the steady-state heads:

1. A joint spatial-temporal kriging analysis was performed to estimate monthly hydraulic head changes at one depth horizon, and to select a single month representative of steady-state conditions.

2. A three-dimensional spatial kriging analysis was performed with data from the selected month at all available depth horizons to estimate steady-state hydraulic heads.

Joint Spatial-Temporal Analysis

The joint spatial-temporal kriging analysis was performed using monthly hydraulic head measurements in 177 wells (Figure 1) collected during the period from January 1990 through July 1993. Figure 1 shows the well locations where data were available for only the joint spatial-temporal analysis (denoted "JST" in the figure), for only the three-dimensional steady-state analysis (denoted "SS"), and for both analyses (denoted "Both"). There were a total of 3791 joint spatial-temporal measurements analyzed; the minimum, maximum, mean, and standard deviation of these data were 493.7, 568.9, 519.9, and 5.3 ft (148.1, 170.7, 156.0, 1.6 m), respectively. The semivariogram curves quantifying spatial and temporal correlation in these data are shown in Figures 2 and 3. The spatial semivariograms in Figure 2 were calculated for four standard directions: north, northeast, east, and northwest. These semivariograms show clear anisotropy, with the highest variabilities directed north along the predominant flow direction and the lowest variabilities directed east, perpendicular to the predominant flow. The corresponding temporal semivariogram for the monthly hydraulic heads is shown in Figure 3. Note that the units for separation distances between data locations are days in Figure 3 and feet in Figure 2.
The semivariograms in Figures 2 and 3 were modeled with an anisotropic mathematical model containing three nested variance structures; the parameters of the model are listed in Table 1. Note in this table that three types of semivariogram models were used in various parts of these analyses: spherical, Gaussian, and linear models. These models are fully described by Journel and Huijbregts (1981). In Figure 2 the bold line denotes the model in the north direction; the dashed line denotes the model in the northeast or northwest direction; and the dotted line denotes the model in the east direction.
Fig. 1--Well locations where hydraulic head levels were monitored from January 1990 through July 1993.

The monthly head measurements were used along with the semivariogram model to estimate, via kriging, monthly changes in the heads across the entire groundwater modeling grid. The time period for this kriging analysis was taken as every 30 days starting January 20, 1990 and ending July 2, 1993. The monthly heads (in feet above mean sea level) are depicted in Figure 4 for six locations uniformly spaced across the groundwater modeling grid. This figure shows that hydraulic heads during this period were relatively high in 1990, decreased in 1991, were relatively low in 1992, and were increasing in 1993.
Fig. 2--Spatial semivariograms from joint spatial-temporal analysis of hydraulic head. Note that 1 foot = 30 cm.
Fig. 3--Temporal semivariogram from joint spatial-temporal analysis of hydraulic head levels. Note that 1 foot = 30 cm.
TABLE 1--Fitted semivariogram models of spatial and temporal correlation. (Note that 1 foot = 30 cm.)

Data: Joint Spatial-Temporal Hydraulic Head Pressure in ft.
Semivariogram Model: K = 3 Nested Structures, Nugget Variance = 0 ft.²
  1. Geometric Anisotropic Spherical, Variance = 1.5 ft.², Spatial Range = 1200 ft., Temporal Range = 30 days
  2. Geometric Anisotropic Spherical, Variance = 5.5 ft.², Spatial Range = 7000 ft., Temporal Range = 700 days
  3. Zonal Gaussian in Spatial NS Direction, Variance = 22 ft.², Spatial NS Range = 6060 ft.

Data: Steady-State Hydraulic Head Pressure in ft. (June, 1993 Data)
Semivariogram Model: K = 2 Nested Structures, Nugget Variance = 0 ft.²
  1. Isotropic Linear, Slope = 0.00045 ft.²/ft.
  2. Zonal Gaussian in Horizontal NS Direction, Variance = 13 ft.², NS Range = 8660 ft.

Data: 1990 Uranium Levels in µg/L
Semivariogram Model: K = 1 Structure, Nugget Variance = 0.3 [ln(µg/L)]²
  1. Geometric Anisotropic Spherical, Variance = 2.7 [ln(µg/L)]², Horizontal Range = 3000 ft., Vertical Range = 120 ft.
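A nested model such as the first row of Table 1 is evaluated as a sum of component structures. The sketch below is illustrative only: it rescales each lag component by its range before combining (the usual treatment of geometric anisotropy) and assumes the practical-range convention for the Gaussian structure, since the paper does not spell out these conventions.

```python
import math

def spherical_unit(h):
    """Unit-sill spherical structure; h is the lag scaled by the range."""
    h = min(h, 1.0)
    return 1.5 * h - 0.5 * h ** 3

def gaussian_unit(h):
    """Unit-sill Gaussian structure, practical-range convention (assumed)."""
    return 1.0 - math.exp(-3.0 * h * h)

def joint_st_semivariogram(h_space_ft, h_time_days, h_ns_ft):
    """Joint spatial-temporal model of Table 1 (units: ft^2): zero
    nugget, two geometric anisotropic spherical structures combining
    space and time, and a zonal Gaussian structure acting only on the
    N-S spatial lag component."""
    g = 0.0  # nugget variance = 0 ft^2
    g += 1.5 * spherical_unit(math.hypot(h_space_ft / 1200.0,
                                         h_time_days / 30.0))
    g += 5.5 * spherical_unit(math.hypot(h_space_ft / 7000.0,
                                         h_time_days / 700.0))
    g += 22.0 * gaussian_unit(h_ns_ft / 6060.0)
    return g
```

At very large lags in all components the model approaches its total sill of 1.5 + 5.5 + 22 = 29 ft².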
Steady-State Analysis

One primary reason for performing the joint spatial-temporal analysis in the previous section was to select, for the steady-state analysis, a single month which was representative of average hydraulic head levels during the 1990-1993 time period. Examining the results in Figure 4, it appears that three months can be considered representative: January, 1990; November, 1991; and June, 1993. Hydraulic heads in each of these three months appear to be approximately equal to the average head levels across the entire 1990-1993 time period. However, several new wells were installed in the area in 1993, particularly in the southeastern part of the modeling grid. Therefore, a significantly greater number of head measurements were available for June, 1993 in comparison with January, 1990 and November, 1991. As a result, June, 1993 was selected as the month to represent steady-state conditions. Hydraulic head measurements for June, 1993 were available for the steady-state kriging analysis in 202 wells at various depths. The horizontal spatial semivariograms for these data are shown in Figure 5. As in the joint spatial-temporal analysis (Figure 2), the horizontal semivariograms in Figure 5 were calculated in four primary directions. The fitted model is also shown in Figure 5, where the bold line denotes the model in the north direction and the dashed line denotes the model in the east direction. A kriging analysis was performed with the June, 1993 data and the semivariogram model shown in Figure 5, as well as Table 1. This analysis estimated steady-state head levels across the groundwater modeling grid at regular 5 ft (1.5 m) vertical intervals from 390 to 540 ft (117 to 162 m) above sea level. The horizontal variability in steady-state head levels at the 490 ft (147 m) elevation is shown in
Fig. 4--Temporal profiles at six selected estimation locations: grid locations (10,10), (40,30), (60,10), (60,90), (80,30), and (10,90). Note that 1 foot = 30 cm.

Figure 6. This figure shows a general trend of decreasing head pressure to the south, associated with the predominant flow direction, along with a hydraulic head depression caused by two pumping wells in the eastern portion of the grid. Note in the extreme south-central part of Figure 6 that an unrealistically abrupt transition appears between the uniform head values to the west and the pumping depression to the east. Unrealistic kriging features like this are possible when estimates are calculated for areas beyond the coverage of the data locations (see Figure 1). Figure 7, which presents the statistical uncertainty (in feet) associated with the estimates in Figure 6, shows that the steady-state heads are generally estimated to within 1 or 2 ft (0.3 or 0.6 m). For areas beyond the spatial coverage of the data, the uncertainties increase to 3 ft (0.9 m) or more. These uncertainties can be interpreted as half-widths of a 95% confidence interval for the estimates. That is, the confidence interval for the hydraulic head at any location in the grid is
Fig. 5--Horizontal spatial semivariograms for steady-state hydraulic head analysis using June, 1993 data. Note that 1 foot = 30 cm.
HEAD ± PREC

where HEAD is the estimated hydraulic head from Figure 6 and PREC is the estimation uncertainty from Figure 7.

ANALYSIS OF URANIUM LEVELS

Estimated uranium levels were needed for calibration of a groundwater solute transport model. Separate three-dimensional spatial kriging analyses, similar to that for steady-state hydraulic head, were performed using average uranium levels measured during 1990, 1991, and 1992. One important difference between these uranium analyses and the head data analysis was that a logarithmic transformation of the uranium data was performed prior to the semivariogram and kriging analyses. This transformation was required to reduce the extreme variability seen in the uranium concentrations, making the semivariogram analysis more reliable. That is, the semivariograms calculated with untransformed data showed extreme variability which would have been difficult to model, while the semivariograms calculated with transformed data exhibited less variability to which a model could more reliably be fit. However, this transformation of the data leads to possible complications when the subsequent kriging results are back-transformed. The most direct back-transformation is a simple inverse-logarithmic transformation of the kriging estimate and estimation uncertainty. However, this back-transform corresponds to estimation of the median uranium concentration across a grid block, instead of the mean uranium concentration. In addition, the 95% confidence intervals for uranium
Fig. 6--Estimated steady-state hydraulic head (feet). Note that 1 foot = 30 cm.

concentrations are multiplicative, rather than additive, in format. That is, the confidence interval at any location in the grid is

[CONC/PREC, CONC*PREC]

where CONC is the back-transformed estimated median uranium concentration and PREC is the back-transformed estimation uncertainty. In the alternative approach, the kriging estimate and estimation uncertainty can be back-transformed using analytical relationships between the mean and variance of the normal and lognormal probability distributions (Journel and Huijbregts, 1981, p. 572), resulting in an
Fig. 7--Statistical uncertainty (width of 95% confidence interval) in feet for estimated steady-state hydraulic head. Note that 1 foot = 30 cm.

estimate of the mean uranium concentration rather than the median. In this case, as with the analysis of water levels discussed earlier, the confidence intervals are additive, although there is no guarantee that the lower confidence bounds will be greater than zero.

1990 Uranium Levels

The spatial kriging analysis was performed using average uranium concentrations (µg/L) measured during 1990 in 169 wells at various depths (Figure 8). The mean of these measurements was 29.3 µg/L, although the maximum concentration (691 µg/L) was considerably higher.
Fig. 8--Well locations where uranium concentrations were measured in 1990.

The overall variability in the uranium data, as measured by the coefficient of variation, was also relatively high (2.90), particularly in comparison with that of the hydraulic head data (0.01). Horizontal semivariograms were calculated with the log-transformed data in the four primary directions (Figure 9), using horizontal separation distance classes defined in increments of 500 ft (150 m) with a tolerance of 250 ft (75 m), and vertical distance classes in increments of 20 ft (6 m) with a tolerance of 10 ft (3 m). The horizontal semivariograms indicated no significant anisotropy; that is, all four directional curves exhibited the same shape and variability. The vertical semivariogram (Figure 10) was found to plateau at the same overall variance as the horizontal semivariograms; however, the vertical semivariogram reaches its plateau at a separation distance of about 120 ft (36 m) while the horizontal semivariograms reach their plateau at a separation distance of about 3000 ft (900 m). As a result, a geometric anisotropic semivariogram model was fitted to these curves, as shown in Figures 9 and 10, as well as Table 1.
Fig. 9--Horizontal semivariograms for log-transformed 1990 average uranium concentrations. Note that 1 foot = 30 cm.

A three-dimensional kriging analysis was next performed using the log-transformed 1990 average uranium concentrations and the semivariogram model discussed above. In this analysis, a data search radius of 12,000 ft (3600 m) was used. The resulting estimated spatial distribution of the median uranium concentrations is depicted in Figure 11, which is a horizontal cross-section at a depth of 512 ft (153.6 m) above sea level. The most significant uranium concentrations, those above 70 µg/L, occur in a northeast-oriented area extending about 2500 ft (750 m) by 900 ft (270 m) horizontally, and about 40 ft (12 m) vertically. A surrounding area, about 5-10 times larger, contains lower uranium concentrations between 10 and 70 µg/L. Figure 12, which presents the statistical uncertainty associated with the estimates in Figure 11, indicates that these uranium concentrations are typically estimated to within a multiplicative factor of 10; that is, the true concentrations could be 10 times higher or lower. In contrast, Figure 13 presents the mean 1990 uranium concentrations calculated from the same lognormal kriging, but using the second back-transform described above. Note that because the estimated mean concentration is more strongly affected by high uranium data values than is the estimated median concentration, the mean estimates in Figure 13 exhibit greater spatial variability than the median estimates in Figure 11. This is particularly true in the northwest, northeast, and southeast corners of the figure, where the kriging is extrapolating beyond the spatial coverage of the available data. Kriging estimates in those areas should not be trusted, and are probably best excluded from the final map. However, they have been retained in Figure 13 to point out this common problem.
The corresponding statistical uncertainty associated with the mean estimates is presented in Figure 14; the uncertainties range from about 150 µg/L to 240 µg/L. Qualitatively, the uncertainty results in Figures 12 and 14 are similar and result in
Fig. 10--Vertical semivariograms for log-transformed 1990 average uranium concentrations. Note that 1 foot = 30 cm.
similar upper confidence bounds. However, as noted earlier, the uncertainties for the mean estimates (Figure 14) imply lower confidence bounds which are often below zero µg/L, while the uncertainties for the median estimates (Figure 12) are not plagued by this problem.
CONCLUSION

This paper presents four variations of the ordinary kriging methodology which were found useful for environmental characterization of groundwater at a potentially contaminated site. When estimating hydraulic head pressure in the groundwater aquifer, three-dimensional ordinary kriging was used in two different ways: (1) to assess temporal changes in the two-dimensional spatial distribution of head pressures, and (2) to estimate the three-dimensional spatial distribution of head pressures at a fixed point in time. In both cases ordinary kriging was applied directly to the head data, and the resulting kriging variances were used to construct statistical confidence intervals for the estimated head values. Ordinary kriging could be used directly in these cases because the head data exhibited relatively low overall variability and a symmetric probability distribution. In contrast, kriging of uranium concentrations in the groundwater, which exhibited much greater variability and a skewed probability distribution, required modification of the standard ordinary kriging procedure. In this case, ordinary kriging was performed after making a natural logarithmic transformation of the uranium data to help reduce the variability and make the subsequent semivariogram analysis more reliable. The major complication with this approach is related to the back-transformation which must be performed after kriging to convert the estimates back into the original scale of measurement. Two approaches
Fig. 11--Median estimate of 1990 uranium concentrations (µg/L).
were presented in this paper. The first approach leads to estimated median (rather than mean) uranium concentrations, and multiplicative (rather than additive) confidence bounds. The second approach results in estimated mean uranium concentrations and additive confidence bounds. However, there is no guarantee that the lower confidence bounds will be non-negative, and the widths of the confidence intervals are independent of the local data, which may not be appropriate for highly skewed data distributions.
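The two back-transformations compared above can be sketched as follows, for a log-kriged estimate y and kriging variance v (both in log units). The exact relations used in the study come from Journel and Huijbregts (1981); the versions below are the simplified point-support forms, so treat them as illustrative.

```python
import math

def median_backtransform(y, v):
    """First approach: direct inverse-log transform. Returns the
    estimated median and its multiplicative 95% interval."""
    conc = math.exp(y)                    # median concentration
    prec = math.exp(1.96 * math.sqrt(v))  # multiplicative factor
    return conc, (conc / prec, conc * prec)

def mean_backtransform(y, v):
    """Second approach: lognormal mean/variance relations. Returns
    the estimated mean and its additive 95% interval, whose lower
    bound can fall below zero."""
    mean = math.exp(y + 0.5 * v)
    var = (math.exp(v) - 1.0) * math.exp(2.0 * y + v)
    half = 1.96 * math.sqrt(var)
    return mean, (mean - half, mean + half)
```

Note that for any positive log-scale variance the mean estimate exceeds the median estimate, the multiplicative bounds stay positive, and the additive lower bound can go negative, exactly the trade-off discussed in the text.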
Fig. 12--Multiplicative uncertainty factor (width of 95% confidence interval) for median estimates of 1990 uranium concentrations.
REFERENCES

Deutsch, Clayton V., and Andre G. Journel, GSLIB: Geostatistical Software Library and User's Guide, Oxford University Press, New York, 1992, 340 pp.

Journel, A. G., and Ch. J. Huijbregts, Mining Geostatistics, Academic Press, reprinted with corrections, 1981, 600 pp.
Fig. 13--Mean estimate of 1990 uranium concentrations (µg/L).
Fig. 14--Additive uncertainty factor (width of 95% confidence interval) in µg/L for mean estimates of 1990 uranium concentrations.
Pierre Colin,1 Roland Froidevaux,2 Michel Garcia,3 and Serge Nicoletis4
INTEGRATING GEOPHYSICAL DATA FOR MAPPING THE CONTAMINATION OF INDUSTRIAL SITES BY POLYCYCLIC AROMATIC HYDROCARBONS: A GEOSTATISTICAL APPROACH
REFERENCE: Colin, P., Froidevaux, R., Garcia, M., and Nicoletis, S., "Integrating Geophysical Data for Mapping the Contamination of Industrial Sites by Polycyclic Aromatic Hydrocarbons: A Geostatistical Approach," Geostatistics for Environmental and Geotechnical Applications, ASTM STP 1283, R. Mohan Srivastava, Shahrokh Rouhani, Marc V. Cromer, A. Ivan Johnson, and Alexander J. Desbarats, Eds., American Society for Testing and Materials, 1996.

ABSTRACT: A case study is presented of building a map showing the probability that the concentration of polycyclic aromatic hydrocarbons (PAH) exceeds a critical threshold. This assessment is based on existing PAH sample data (direct information) and on an electrical resistivity survey (indirect information). Simulated annealing is used to build a model of the range of possible values for PAH concentrations and of the bivariate relationship between PAH concentrations and electrical resistivity. The geostatistical technique of simple indicator kriging is then used, together with the probabilistic model, to infer, at each node of a grid, the range of possible values which the PAH concentration can take. The risk map is then extracted from this characterization of the local uncertainty. The difference between this risk map and a traditional iso-concentration map is then discussed in terms of decision-making.

KEYWORDS: polycyclic aromatic hydrocarbon contamination, geostatistics, integration of geophysical data, uncertainty characterization, probability maps, simulated annealing, bivariate distributions, indicator kriging
1Head, Environmental and Industrial Risks, Geostock, Rueil-Malmaison, France
2Manager, FSS Consultants SA, Geneva, Switzerland
3Manager, FSS International r&d, Chaville, France
4Head, Geophysical Services, Geostock, Rueil-Malmaison, France
Steelwork and coal processing sites are prone to contamination by polycyclic aromatic hydrocarbons (PAH), some of which are known to be carcinogenic. Consequently, local and state regulatory agencies require that all contaminated sites be characterized and that remediation solutions be proposed. The traditional approach for delineating the horizontal and vertical extent of the contamination is to use wells and boreholes to construct vertical profiles of the contamination at several locations. This approach, however, is both time consuming and expensive. Recent work has shown that, in some situations, electrical conductivity and resistivity surveys can be used as a pathfinder for delineating contaminated areas. These geophysical surveys, which are both expedient and cost effective, could be used to reduce the number of wells and boreholes to be drilled. Geophysical data, however, do not provide direct information on soil chemistry. They are indicative of the nature of the ground, which in turn may reflect human activities (backfill material, tar tanks) and potential sources of ground pollution. These geophysical data therefore have to be treated as indirect and imprecise information. The mapping of the contamination requires that imprecise geophysical data be correctly integrated with precise chemical analyses from wells and boreholes. Geostatistics offers an ideal framework for addressing such problems. Different types of information can be integrated in a manner which takes into consideration not only the statistical correlation between the different types of information, but also the spatial continuity characteristics of both. Using this approach, it is possible to provide maps showing the probability that the PAH concentration exceeds some critical level. A case study from an industrial site in Lorraine (northern France) is used to compare the geostatistical approach to the traditional approach of directly contouring the data from wells and boreholes.
AVAILABLE DATA

The available information consisted of chemical measurements of PAH concentrations (in ppm) from 51 boreholes. Figure 1 shows the location of these boreholes. As can be seen, the coverage is not even, and the lower right quadrant of the map is undersampled. In terms of the distribution of the concentration values, the highest concentrations are located toward the middle, where the cokeworks were located. The geophysical information included both conductivity measurements (electromagnetic survey) and resistivity measurements (dipole-dipole electrical measurements). The electromagnetic survey made it possible to investigate the overall site area and to identify anomalous zones where more accurate resistivity measurements were carried out. These resistivity measures are average values over large volumes of soil and are directly related to the soil nature (recent alluvium, slag deposits and other backfill material). They depend also on soil heterogeneities and the spatial arrangement of these heterogeneities. Although the physical phenomena that govern PAH transport, and the reactions between PAH and soils of different nature, are not yet well understood (they are the subject of ongoing research projects), the presence of PAH in significant amounts has been found to be
COLIN ET AL. ON GEOPHYSICAL DATA
associated with low resistivity values (i.e., it locally increases the soil conductivity). This relationship, however, remains site specific and cannot be considered a general law. The available resistivity measurements (in ohm-meters) come from 14 electrical lines, tightly criss-crossing the contaminated area. Gaps between electrical lines were filled, first, by sequential simulation to produce the full resistivity map shown in Figure 2.
Figure 1 : Sample Location Map (PAH concentration classes: > 200 ppm, 40-200 ppm, < 40 ppm)
Figure 2 : Electrical Resistivity Map (legend in ohm·m: 15, 25, 40, 95, 250, 375, 750, 1500, 3000)
OBJECTIVE OF THE STUDY
The problem at hand is to delineate (on a 10 by 10 meter grid) areas where the risk that the PAH contamination is in excess of a critical threshold is deemed large enough to warrant either remediation or further testing. The critical threshold used for this study is 200 ppm PAH, and three classes of risk were considered:
Low Risk: The probability that the PAH concentration exceeds 200 ppm is less than 20 percent.
Medium Risk: The probability of exceedance is in the 20 to 50 percent range.
High Risk: The probability of exceedance is over 50 percent.
From a methodological point of view, assessing the risk of exceedance implies that, at each node of the grid, the range of possible values for the PAH concentration, along with their associated probabilities, be available. The challenge therefore is to take advantage of both the direct measurements of PAH concentration and the indirect information provided by electrical resistivity to infer, at each grid node, the range of possible values that the PAH concentration could take. These ranges of possible values, also called local conditional distribution functions, can be viewed as measures of the local uncertainty in the PAH concentration. Once these local uncertainties are established, the risk maps can be produced.
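As a hypothetical illustration of this step, the sketch below classifies a single grid node from a discretized local conditional distribution. The 200 ppm threshold and the 20/50 percent cut-offs follow the study; the function names and the ccdf values are invented.

```python
import bisect

# Sketch: classify one grid node from its local conditional distribution.
# The 200 ppm threshold and the 20/50 percent cut-offs come from the study;
# everything else (names, numbers) is illustrative.

def prob_exceedance(ccdf_values, ccdf_probs, threshold=200.0):
    """Given a discretized cdf (thresholds z_k and cumulative probabilities
    P[Z <= z_k]), return P[Z > threshold] by linear interpolation."""
    i = bisect.bisect_left(ccdf_values, threshold)
    if i == 0:
        return 1.0 - ccdf_probs[0]
    if i == len(ccdf_values):
        return 0.0
    z0, z1 = ccdf_values[i - 1], ccdf_values[i]
    p0, p1 = ccdf_probs[i - 1], ccdf_probs[i]
    p = p0 + (p1 - p0) * (threshold - z0) / (z1 - z0)
    return 1.0 - p

def risk_class(p_exceed):
    """Map a probability of exceedance to the study's three risk classes."""
    if p_exceed < 0.20:
        return "low"
    if p_exceed <= 0.50:
        return "medium"
    return "high"

# One node's local distribution, discretized at a few PAH thresholds (ppm)
values = [40.0, 100.0, 200.0, 500.0, 1000.0]
probs = [0.45, 0.60, 0.72, 0.90, 0.98]      # P[Z <= z_k]
p = prob_exceedance(values, probs)           # 1 - 0.72 = 0.28
print(risk_class(p))                         # medium
```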
EXPLORATORY DATA ANALYSIS
Given the objectives of the study, we need to understand the following critical features of the available data:
1- The range of possible values for the PAH concentrations which may be encountered away from sampled locations;
2- The relationship which exists between PAH concentrations and electrical resistivity. In other words, knowing the resistivity value at a particular location, what can we say about the possible range of PAH concentration values at the same location?
3- The spatial correlation structure of PAH concentrations.
Univariate distribution of PAH concentrations
Any geostatistical estimation or simulation algorithm requires a model describing the probability distribution function of the variable under consideration, i.e., an enumerated list of possible values with their associated probabilities. Traditionally, this probability distribution function is based on the experimental histogram built from the available data. Figure 3 shows the experimental histogram, and the corresponding summary statistics, of the available PAH
concentration data. We can see that:
- The number of data available to construct the histogram is fairly limited, resulting in a lack of sufficient resolution: the class frequencies tend to jump up and down erratically and there are gaps between data values.
- The histogram is extremely skewed: the bulk of the values are below 300 ppm, with some erratic high values extending all the way up to 6500 ppm (Figure 3a). Not surprisingly, the coefficient of variation is very high (3.03).
- The mean and variance of the data are severely affected by this high variability and cannot be established at any acceptable level of reliability: removing the two largest values reduces the average by a factor of 3 and the variance by a factor of 65!
- If we use a logarithmic scale to visualize the same histogram (Figure 3b), we clearly see the existence of three populations: a first one below 50 ppm, accounting for 61 percent of the total population; a second one in the range 50 to 500 ppm, including 34 percent of the population; and a third, small (5 percent) population characterized by extreme PAH values ranging from 600 to 6500 ppm.
Summary statistics: number of data: 51; mean: 382 ppm; standard deviation: 1159 ppm; coefficient of variation: 3.03; minimum: 2 ppm; 1st quartile: 9 ppm; median: 33 ppm; 3rd quartile: 206 ppm; maximum: 6500 ppm
Figure 3a : Histogram and summary statistics for PAH concentrations (in ppm)
Figure 3b : Histogram of PAH concentrations (logarithmic scale)
The first population can be interpreted as representing the background concentration level on the site. The second population seems clearly related to the contamination itself, with the bulk of it above the critical threshold of 200 ppm. The third population is more difficult to interpret, primarily because it is represented by only three samples. Although it is obviously associated with the contamination, it is not entirely clear whether it represents a different source of contamination or is merely the tail end of the second population. From this analysis it is clear that the experimental histogram cannot be used as-is as a model of the distribution function of PAH concentrations over the site area. This probability distribution function should, instead, be modelled, and the model should have the following features:
- It should not be based on parameters like the mean and variance, which are highly affected by extreme values and are, as a result, not known with any degree of reliability;
- It should provide probabilities for the entire range of possible values, from the absolute minimum to the absolute maximum, and fill the gaps between existing data values;
- It should reproduce the existence of the three populations and their respective frequencies.
Bivariate Distribution
The cross-plot shown in Figure 4 describes the relationship which exists between PAH concentrations and electrical resistivity.
Figure 4 : Cross plot and bivariate statistics of PAH vs. electrical resistivity (covariance: -0.799; Pearson correlation: -0.360; Spearman correlation: -0.266)
The most important feature of this plot is the existence of two distinct clouds of points, which is a direct consequence of the multi-modality of the PAH distribution and of the bimodality of the electrical resistivity:
- An upper cloud where PAH values are in excess of 35 ppm and the electrical resistivity ranges from 15 to 150 ohm·m. Within this cloud, the correlation between the two attributes is positive.
- A lower cloud with PAH values below 35 ppm and with electrical resistivities in the 30 to 1600 ohm·m range. The correlation, again, appears to be positive, but less significantly so.
From this cross-plot it seems that high concentrations of PAH (over 35 ppm) are associated with rather low resistivity values. One possible explanation of this feature, which still remains to be confirmed, is that PAH, which are viscous fluids, tend to flow down through backfill materials until they reach the top of the natural soil. At this level they fill up the soil pore volume, thus creating a flow barrier to water.
Traditionally, bivariate distributions are parametrized by the means and variances of their marginal distributions together with the correlation coefficient. Such an approach is inapplicable in our case, since it would completely fail to reflect the most important feature of the cross-plot, which is the existence of the two populations. The solution adopted for this study consists of using a bivariate histogram directly to describe the bivariate distribution model. Because of the sparsity of data, the experimental cross plot is not sufficient to inform all the possible bivariate probabilities: it is spiky and full of gaps. The required bivariate histogram, therefore, will be obtained by an appropriate smoothing of this experimental cross plot, making sure that the two clouds of points are correctly reproduced.
Spatial Continuity Analysis
The variogram analysis performed on the natural logarithm of PAH concentrations (Figure 5) shows that the phenomenon is reasonably well structured, with a maximum correlation distance (range) of approximately 70 meters. There was no evidence of anisotropy and the shape of the variogram was exponential.
Figure 5 : Experimental variogram for Ln(PAH) concentrations
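The variogram computation itself can be sketched as follows. This is a generic experimental semivariogram on log-transformed values with made-up coordinates and an assumed lag tolerance, not the code used for the study.

```python
import numpy as np

# Generic experimental semivariogram on Ln-transformed concentrations:
# gamma(h) = average of 0.5 * (ln z(u) - ln z(u + h))^2 over all pairs
# whose separation falls within a lag tolerance. Data below are made up.

def experimental_variogram(coords, values, lags, tol):
    coords = np.asarray(coords, dtype=float)
    logs = np.log(np.asarray(values, dtype=float))
    dist = np.linalg.norm(coords[:, None, :] - coords[None, :, :], axis=-1)
    halfsq = 0.5 * (logs[:, None] - logs[None, :]) ** 2
    upper = np.triu(np.ones(dist.shape, dtype=bool), k=1)  # each pair once
    gamma = []
    for h in lags:
        mask = upper & (dist > h - tol) & (dist <= h + tol)
        gamma.append(halfsq[mask].mean() if mask.any() else np.nan)
    return np.array(gamma)

coords = [[0, 0], [10, 0], [20, 0]]
pah = [np.e, np.e ** 2, np.e ** 4]       # logs are 1, 2, 4
print(experimental_variogram(coords, pah, lags=[10, 20], tol=5))
```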
BUILDING THE PROBABILISTIC MODEL
Based on the results of the exploratory analysis, the probabilistic model to be used in estimation and uncertainty assessment will consist of the following:
- A smooth univariate histogram approximating the marginal distribution of the PAH concentration, and
- A smooth bivariate histogram describing the bivariate distribution of PAH concentration and electrical resistivity.
Several approaches have been proposed to produce smooth histograms and cross-plots: quadratic programming (Xu 1994), fitting of kernel functions (Silverman 1986; Tran 1994) and simulated annealing (Deutsch 1994). The technique selected for this study is simulated annealing, because it was perceived to be the most flexible in accommodating all the requirements of the probabilistic model. Simulated annealing is a constrained optimization technique which is increasingly used in the earth sciences to produce models which reflect complex multivariate statistics. A detailed discussion of the technique can be found in Press et al. (1992), Deutsch and Journel (1992), and Deutsch and Cockerham (1994). In this study the modelling of the bivariate probabilistic model was done in two steps: first the marginal distribution of the PAH concentration was modelled, and then the cross-plot between PAH and electrical resistivity (there was no need to model the marginal distribution of resistivity, since it was directly available from the resistivity map).
The modelling of the marginal distribution of PAH via simulated annealing was implemented as follows:
1- An initial histogram is created by subdividing the range of possible values into 100 classes, and assigning initial frequency values to each of these classes by performing a moving average of the original experimental histogram and then rescaling these frequencies so that they sum to one.
2- An energy function (Deutsch 1994) is defined to measure how close the current histogram is to the desired features of the final histogram. In the present case the energy function takes into consideration the reproduction of the mean, variance and selected quantiles, and a smoothing index devised to eliminate spurious spikes in the histogram.
3- The original probability values are then perturbed by choosing at random a pair of classes, adding an incremental value Δp to the first class and subtracting it from the second, hence ensuring that the sum of the frequencies is still one.
4- The perturbation is accepted if it improves the histogram, i.e., if the energy function decreases. If not, the perturbation may still be accepted with a small probability. This ensures that the process will not converge to some local minimum.
5- This perturbation procedure is repeated until the resulting histogram is deemed
satisfactory (the energy function has reached a minimum value) or until no further progress is possible.
The modelling of the cross plot followed a similar general approach:
1- An initial bivariate histogram is created by subdividing the range of possible values along both axes into 100 classes, and assigning initial bivariate frequency values to each of these cells by performing a moving average of the original cross-plot followed by a rescaling of the frequencies to ensure that they sum to one.
2- An energy function is defined to measure the goodness of fit of the current bivariate histogram to the desired features of the final one. In the present case the energy function takes into consideration the reproduction of the marginal distributions defined previously, the correct reproduction of some critical bivariate quantiles and, again, a smoothing index devised to eliminate spurious spikes in the cross plot.
3- The original bivariate frequencies are perturbed by randomly selecting a pair of cells, adding an incremental probability Δp to the first cell and subtracting it from the second, therefore leaving the sum of frequencies unchanged.
4- As before, the perturbation is accepted if it decreases the energy function, and accepted with a certain probability if not.
5- The perturbation mechanism is iterated until the energy function has converged to some minimum value.
A detailed discussion on how to use simulated annealing for modelling histograms and cross plots can be found in Deutsch (1994). The result of this modelling is shown in Figure 6 (experimental histogram of PAH concentrations and smooth model) and Figure 7 (experimental cross plot of PAH versus resistivity and corresponding smooth bivariate histogram). As can be seen, all the important features appear to be well reproduced: the multi-modality of PAH, the bi-modality of the resistivity and the existence of the two clouds on the cross plot.
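A minimal sketch of this perturbation loop is given below, assuming a toy energy function (deviation from a target mean plus a roughness penalty) rather than the full set of constraints used in the study; all names and numbers are illustrative.

```python
import math
import random

# Toy simulated-annealing smoothing of a histogram, following the
# perturbation scheme described above. The energy function here only
# penalizes roughness and deviation from a target mean; the study's
# full objective also matched the variance and selected quantiles.

def smooth_histogram(freqs, class_centers, target_mean,
                     n_iter=20000, dp=1e-3, temp=1e-4, seed=0):
    rng = random.Random(seed)
    p = list(freqs)

    def energy(q):
        mean = sum(f * c for f, c in zip(q, class_centers))
        rough = sum((q[i + 1] - q[i]) ** 2 for i in range(len(q) - 1))
        return (mean - target_mean) ** 2 / target_mean ** 2 + 1e3 * rough

    e = energy(p)
    for _ in range(n_iter):
        i, j = rng.randrange(len(p)), rng.randrange(len(p))
        if i == j or p[j] < dp:
            continue
        p[i] += dp          # transfer probability mass between two classes;
        p[j] -= dp          # the total stays equal to one
        e_new = energy(p)
        # accept any improvement; accept a degradation with small probability
        if e_new < e or rng.random() < math.exp((e - e_new) / temp):
            e = e_new
        else:
            p[i] -= dp
            p[j] += dp
    return p

spiky = [0.5, 0.0, 0.3, 0.0, 0.2]
centers = [1.0, 2.0, 3.0, 4.0, 5.0]
smooth = smooth_histogram(spiky, centers, target_mean=2.4)
print(round(sum(smooth), 6))   # total probability stays 1.0
```

The pairwise mass transfer is what guarantees that every intermediate histogram remains a valid probability distribution, which is the point of step 3 above.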
ESTIMATING THE LOCAL DISTRIBUTION FUNCTIONS
Having developed the bivariate probabilistic model for PAH and electrical resistivity, we will now use it to infer the local conditional cumulative distribution function (ccdf) of the PAH concentration. This inference (see Appendix I) involves two steps:
1- The local a priori distribution function (cdf) of the PAH, given the local resistivity value, is extracted from the bivariate histogram. This local cdf characterizes the uncertainty of the PAH value based on the overall relationship existing between PAH and resistivity, but
before using the local PAH data values themselves.
Figure 6 : PAH concentration histogram (experimental) and smooth histogram model (logarithmic scale)
2- The local a priori cdf is then conditioned to the nearby PAH data values via simple indicator kriging (Journel 1989). This ccdf now describes the uncertainty in the PAH concentration once the local conditioning information has been accounted for. Note that simple indicator kriging calls for a model of the spatial continuity of the residual indicators (see Appendix I). This model is shown in Figure 8.
Figure 8 : Variogram model for the indicator residuals: γ(h) = 0.01 + 0.25 Gauss70(h)
In this approach, the two types of information (direct measurements of PAH concentrations and electrical resistivities) are mixed in a smooth, transitional fashion: when there are abundant nearby sample data, the simple indicator kriging system puts a lot of emphasis on this conditioning information and downplays the influence of the indirect information, whereas when there is little or no conditioning data, the range of possible outcomes for PAH is primarily controlled by the local resistivity value.
PROBABILITY MAPS
Having inferred the local distribution functions of PAH concentrations, we can now build the probability map (Figure 9c) showing the risk that this concentration exceeds the critical threshold of 200 ppm.
Figure 7 : PAH vs. electrical resistivity: experimental cross-plot and bivariate histogram model
Figure 9 : Iso-concentration map and probability maps. (a) Iso-concentration map (PAH > 200 ppm / PAH < 200 ppm); (b) probability map (PAH only); (c) probability map (PAH and resistivity); risk classes: high, medium, low
For the sake of comparison, this probability map is compared with two other maps:
- A probability map also built by indicator kriging, but taking into account the PAH concentrations only (Figure 9b), and
- A map showing the area where the estimated PAH concentration (expected value) exceeds the critical threshold. This map was obtained by ordinary kriging (Figure 9a).
If we look first at the estimated map (Figure 9a), we see that the area where the estimated PAH value exceeds 200 ppm forms a rather homogeneous, smoothly contoured zone concentrated around the cokeworks, where all the high sample values are located. If this map were used for decision-making, one could come to the conclusion that this zone, representing an area of 135,900 square meters, is the only one requiring attention. Looking now at the probability map produced by indicator kriging based on the PAH concentrations only (Figure 9b), we see that the picture gets more complex: the center zone is still high risk but less homogeneously so, and the medium risk zone extends further to the south. The medium and high risk zones now represent 190,700 square meters. However, because of the lack of direct information, the periphery appears mostly as a low risk zone. Finally, by including the information provided by the electrical resistivity, we see that there is a significant probability (medium risk) that peripheral areas to the south, but also to the north and east, are contaminated. These areas would probably warrant further testing to confirm the level of contamination. In this case, the medium and high risk areas total 271,900 square meters. From these results it is clear that estimated maps are inadequate for delineating risks of contamination: they provide information on the expected value of the concentration, but not on the local uncertainty. And because of their intrinsic smoothing properties, they may either vastly overestimate or underestimate the extent of the risk zone.
In this case the contour map indicates a risk area (estimated concentration greater than 200 ppm) which is half the size of the medium and high risk area shown on the probability map inferred from both the PAH concentrations and the electrical resistivity. It is also worth remembering that the objective of such a study is to provide a classification of whether or not the soil, at a particular location, is likely to be affected by the contamination, and not to provide a good estimate of the concentration at that location. The challenge is to come up with probabilities, not with estimated concentrations. One may even argue that the latter are irrelevant to the task at hand: high estimated values may correspond to a low probability of exceedance and, conversely, moderate estimated values may be associated with high probabilities of exceedance.
CONCLUSION
Prediction of contaminant concentration can be improved by taking into consideration indirect, related information such as geophysical data. To effectively integrate this indirect
information into the estimation process, it is crucial that the bivariate relationship existing between the direct and indirect information be correctly rendered. Very often this relationship cannot be captured by the classical parametric description based on the means and variances of the marginal distributions and the correlation coefficient. A more general alternative consists in using a full bivariate histogram to describe the relationship between the two attributes, and it is proposed that this bivariate histogram be modeled by simulated annealing. This bivariate probabilistic model can then be used to infer the local a priori distribution function of the contaminant concentration given the known value of the secondary attribute. This local a priori distribution function is finally conditioned to the existing local contaminant data by simple indicator kriging. This approach is very general and can be used to address many different situations. It should be stressed, however, that the relevance of its results depends heavily on how physically meaningful the relationship between the main attribute and the co-attribute is, as described by the bivariate probabilistic model.
APPENDIX I
Simple Indicator Kriging with a Bivariate Histogram
This paper is concerned with PAH concentration and electrical resistivity. The technique, however, is completely general and can be used whenever secondary information is provided in the form of a bivariate histogram. We will use the following notation:
Z(u) = random variable describing the attribute of main interest, informed by N data values z(u_α), α = 1, ..., N
Y(u) = random variable describing the secondary attribute, informed by M data values y(u_β), β = 1, ..., M
u = location coordinates vector
f_ij = Prob{ z_{i-1} ≤ Z(u) < z_i and y_{j-1} ≤ Y(u) < y_j } = bivariate histogram frequency, with:
z_i, i = 1, ..., N_z the thresholds discretizing the range of values [z_min, z_max], and
y_j, j = 1, ..., N_y the thresholds discretizing the range of values [y_min, y_max].
At a given location u, the a priori distribution function of the main attribute is given by:

F0(u; z_k | y(u)) = Prob{ Z(u) ≤ z_k | y(u) } = [ Σ_{i=1..k} f(u; z_i; y(u)) ] / [ Σ_{i=1..N_z} f(u; z_i; y(u)) ]    (1)

where f(u; z_i; y(u)) denotes the bivariate frequency f_ij of the class containing z_i and the class containing the local value y(u). The estimated local ccdf (posterior distribution) is given by:

F*(u; z_k | z(u_α), α = 1, ..., n) = Σ_{α=1..n} λ_α r(u_α; z_k) + F0(u; z_k | y(u))    (2)

with z(u_α), α = 1, ..., n being the n local conditioning data and r(u_α; z_k) being the residual indicator value:

r(u_α; z_k) = i(u_α; z_k) - F0(u_α; z_k | y(u_α)), with i(u_α; z_k) = 1 if z(u_α) ≤ z_k; 0 otherwise    (3)

Simple kriging with a bivariate histogram therefore involves the following steps:
1- Select the number of thresholds z_k, k = 1, ..., K required to provide an adequate discretization of the local ccdf. This number K of thresholds depends on the goal of the study and need not be as large as the number of thresholds N_z used to build the bivariate histogram;
2- For each datum z(u_α), define the residual indicator values r(u_α; z_k), k = 1, ..., K, using the local a priori distribution function F0(u_α; z_k);
3- Establish the variogram model γ_r(h; z_k) of the residual indicator values;
4- Then, at each grid node u and for each threshold z_k:
- determine the local a priori distribution of the main attribute Z(u), given the local secondary attribute value y(u), using equation 1;
- estimate, by simple kriging, the local ccdf of Z(u) using equation 2;
- check and correct for potential order relation problems;
5- Process the estimated ccdf to extract the required probabilities, quantiles or estimated values.
In principle, this involves solving, at each grid node u, K simple kriging systems, since the variogram models may be different for each threshold z_k. If it can be shown that the variogram model does not change significantly from one threshold to another, then it is sufficient to solve a single system and to use the same weighting scheme for every threshold. This approach, called Median Indicator Kriging or "mosaic model" (Lemmer 1984; Journel 1984), can simplify and speed up the whole estimation process.
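The steps above can be sketched for a single grid node as follows. This is a hypothetical toy implementation: the bivariate histogram, data coordinates, covariance model and the crude order-relation fix are all assumptions for illustration, not the study's actual code.

```python
import numpy as np

# Toy sketch of equations (1)-(3): a local prior cdf is read from the
# bivariate histogram, then updated by simple kriging of the residual
# indicators. All numbers below are illustrative, not the site's data.

def prior_cdf(biv_hist, y_class):
    """Equation (1): a priori cdf of Z given the local y class.
    biv_hist rows are z classes, columns are y classes."""
    col = biv_hist[:, y_class]
    return np.cumsum(col) / col.sum()

def sk_weights(coords_data, coord_node, cov):
    """Simple kriging weights for the residual indicators: solve C lam = c0."""
    C = np.array([[cov(np.linalg.norm(a - b)) for b in coords_data]
                  for a in coords_data])
    c0 = np.array([cov(np.linalg.norm(a - coord_node)) for a in coords_data])
    return np.linalg.solve(C, c0)

def posterior_ccdf(z_data, priors_at_data, prior_at_node, z_classes, lam):
    """Equations (2)-(3): posterior cdf at every threshold z_k."""
    F = np.empty(len(z_classes))
    for k, zk in enumerate(z_classes):
        indicators = (z_data <= zk).astype(float)       # i(u_a; z_k)
        residuals = indicators - priors_at_data[:, k]   # r(u_a; z_k)
        F[k] = prior_at_node[k] + lam @ residuals
    # crude order-relation correction: clip and force monotonicity
    return np.maximum.accumulate(np.clip(F, 0.0, 1.0))

# Covariance consistent with a model gamma(h) = 0.01 + 0.25 Gauss_70(h)
cov = lambda h: 0.26 if h == 0 else 0.25 * np.exp(-3.0 * h ** 2 / 70.0 ** 2)

z_classes = np.array([40.0, 200.0, 1000.0])     # thresholds (ppm)
biv_hist = np.array([[0.30, 0.05],              # toy z/y class frequencies
                     [0.20, 0.15],
                     [0.05, 0.25]])

coords = np.array([[0.0, 10.0], [30.0, 0.0], [15.0, 40.0]])
node = np.array([10.0, 15.0])
z_data = np.array([12.0, 350.0, 25.0])          # PAH at the three wells

prior_node = prior_cdf(biv_hist, y_class=1)     # low-resistivity class
priors_data = np.vstack([prior_cdf(biv_hist, 1) for _ in coords])
lam = sk_weights(coords, node, cov)
F = posterior_ccdf(z_data, priors_data, prior_node, z_classes, lam)
print("P[PAH > 200 ppm] =", 1.0 - F[1])
```

Note how, with no nearby data, F would reduce to the prior extracted from the histogram, which is exactly the transitional behavior described in the text.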
REFERENCES
Xu, W., 1994, "Histogram and Scattergram Smoothing Using Convex Quadratic Programming," SCRF Proceedings, Stanford University.
Silverman, B., 1986, "Density Estimation for Statistics and Data Analysis," Chapman and Hall, New York.
Tran, T., 1994, "Density Estimation Using Kernel Methods," SCRF Proceedings, Stanford University.
Deutsch, C.V., 1994, "Constrained Modelling of Histograms and Cross-Plots with Simulated Annealing," SCRF Proceedings, Stanford University.
Press, W., et al., 1992, "Numerical Recipes in C: The Art of Scientific Computing," Cambridge University Press.
Deutsch, C.V., and Journel, A.G., 1992, "GSLIB: Geostatistical Software Library and User's Guide," Oxford University Press.
Deutsch, C.V., and Cockerham, P.W., 1994, "Practical Considerations in the Application of Simulated Annealing to Stochastic Simulation," Mathematical Geology, Vol. 26, No. 1, pp. 67-82.
Journel, A.G., 1989, "Fundamentals of Geostatistics in Five Lessons," Volume 8, Short Course in Geology, American Geophysical Union, Washington, D.C.
Lemmer, I.C., 1984, "Estimating Local Recoverable Reserves via Indicator Kriging," in G. Verly et al. (Eds.), Geostatistics for Natural Resources Characterization, pp. 349-364, Reidel, Dordrecht, Holland.
Journel, A.G., 1984, "The Place of Non-Parametric Geostatistics," in G. Verly et al. (Eds.), Geostatistics for Natural Resources Characterization, pp. 307-335, Reidel, Dordrecht, Holland.
Michael R. Wild 1 and Shahrokh Rouhani 2
EFFECTIVE USE OF FIELD SCREENING TECHNIQUES IN ENVIRONMENTAL INVESTIGATIONS: A MULTIVARIATE GEOSTATISTICAL APPROACH
REFERENCE: Wild, M. R. and Rouhani, S., "Effective Use of Field Screening Techniques in Environmental Investigations: A Multivariate Geostatistical Approach," Geostatistics for Environmental and Geotechnical Applications, ASTM STP 1283, R. Mohan Srivastava, Shahrokh Rouhani, Marc Cromer, A. Ivan Johnson, and Alexander J. Desbarats, Eds., American Society for Testing and Materials, 1996.
ABSTRACT: Environmental investigations typically entail broad data gathering efforts which include field screening surveys and laboratory analyses. Although usually collected extensively, data from field screening surveys are rarely used in the actual delineation of media contamination. On the other hand, laboratory analyses, which are used in the delineation, are minimized to avoid potentially high cost. Multivariate geostatistical techniques, such as indicator cokriging, were employed to incorporate volatile organic screening and laboratory data in order to better estimate soil contamination concentrations at an underground storage tank site. In this work, the direct and cross variographies are based on a multi-scale approach. The results indicate that soil gas measurements show good correlations with laboratory data at large scales. These correlations, however, can be masked by poor correlations at micro-scale distances. Consequently, a classical direct correlation analysis between the two measured values is very likely to fail. In contrast, the presented multi-scale co-estimation procedure provides tools for a cost-effective and reliable assessment of soil contamination based on a combined use of laboratory and field screening data.
KEYWORDS: geostatistics, cokriging, multivariate, field screening, volatile organics
Assessing the extent of soil contamination can be very costly. Laboratory analysis of common environmental contaminants can range from $200 to $1000 per sample for standard method testing. Consequently, many investigations first use field screening techniques to help identify relative levels of contamination and then select a few samples for laboratory analysis. In many cases, the validity of field data is questioned and such data are rarely used in the actual delineation of source contamination. This paper presents a geostatistical technique for an optimal and defensible incorporation of field screening and laboratory data. This approach is intended to
1 Project Environmental Engineer, Dames & Moore, Inc., Six Piedmont Center, 3525 Piedmont Road, Suite 500, Atlanta, Georgia 30305.
2 Associate Professor, School of Civil and Environmental Engineering, Georgia Institute of Technology, Atlanta, Georgia 30332-0355.
accomplish the following objectives:
- Perform site characterization in a cost-effective and information-efficient manner,
- Minimize the need for additional environmental investigations, and
- Employ defensible approaches based on rigorous mathematical techniques for the analysis of spatial data.
Several screening devices or kits are available to measure various contaminants, such as volatile organics, metals and pesticides. The most commonly used devices are portable soil-gas probes, X-ray fluorescence spectrometers, and immunoassay kits. The measured results of these tools are mostly qualitative in nature and may not correlate well with actual laboratory measurements. Almost all environmental investigations require procedures to determine the extent of contamination. Investigators would prefer to employ all types of available information, including field and laboratory data, to perform this task. However, federal and state regulatory agencies discourage the direct use of field data. Consequently, a large portion of useful information is either neglected or under-utilized. This paper provides a timely solution to extract the maximum amount of information from the available data. For this purpose, multivariate, non-linear techniques, such as indicator cokriging, are used. This paper applies these concepts to a site whose soil has been contaminated by several underground storage tanks over a period of 20 to 30 years. The tanks were used primarily for storage of kerosene, gasoline and diesel fuels and various industrial solvents. Although the site was extensively investigated with over 300 samples, limited laboratory confirmation of volatile organic compounds (VOCs) was performed. This attempt to save money in laboratory cost actually prevented the delineation of the contamination extent.
BACKGROUND INFORMATION
Screening tools for VOCs, such as photoionization detectors (PID) and organic vapor analyzers (OVA), can provide an inexpensive alternative to laboratory testing, especially for large, multi-layer investigations. Manufacturers of these instruments advocate their use as an effective method of measuring VOCs for preliminary site characterization, including the delineation of subsurface contamination. Unfortunately, the reliability of these devices has proven to be dependent on weather conditions, soil type and actual contaminant concentrations. Several published case studies exist that incorporate data gathered using these instruments. One study performed by Marrin and Kerfoot (1988) used a portable gas chromatograph and PID to predict the extent of groundwater contamination by measuring volatile organics in the soil gas. Thompson and Marrin (1987) also measured soil gas concentrations at 49 locations to estimate groundwater contamination. The results of this estimation, however, were verified by an inadequate number of groundwater samples (five). Crouch (1990) used gas detection tubes to estimate contaminant concentrations in soil vapor. According to Crouch, gas detection tubes were used because the OVA and PID are not compound specific and only provide a total VOC measurement. None of the above investigations performed any correlation analyses between soil gas probe
readings and laboratory measurements and therefore assumed that the measured results from these screening devices were reasonably accurate. Siegrist (1991) compared gas chromatography to two PIDs of varying ionization potentials to measure volatile organics in a controlled environment. The results of the comparison showed poor correlation between both PIDs and the gas chromatograph and demonstrated that the PIDs were very sensitive to water vapor and responded to natural organics including methane, ethylenes and alcohols. Smith and Jensen (1987) tried correlating OVA and PID readings to laboratory measurements of total petroleum hydrocarbons (TPH) in soil. Again, poor correlation prohibited using screening tools to estimate actual TPH, and the authors cautioned against using screening tools as a sole criterion for determining soil contamination. The above works clearly indicate that the screening tools which are widely used and accepted in the environmental field provide only qualitative information on total VOCs. Their use in delineating contamination is cautioned against because their effectiveness and reliability remain unverified. On the other hand, comprehensive laboratory analyses of environmental investigations can be prohibitively expensive. This study presents geostatistical procedures, such as cokriging, to link the two measurement techniques and produce accurate maps of contamination. Cokriging is defined as the estimation of one variable using not only observations of that variable but also data on one or more additional, related variables defined over the same field (Olea 1991). Cokriging is suitable for cases where the targeted variable is not sampled sufficiently to provide acceptably precise estimates of that variable over the entire investigated field (Journel and Huijbregts 1978). Such estimates may then be improved by correlating this variable with better-sampled auxiliary variables as a function of their separation distance.
This approach is fully compatible with actual field conditions, where VOC screening data are extensively collected but only relatively few samples are verified by laboratory analysis. However, it must be emphasized that the utility of cokriging depends on the level of spatial cross-correlation between the primary and auxiliary variables. To obtain an adequate cross-correlation between investigated variables, it may become necessary to apply data transformations prior to the actual co-estimation process. This may require use of non-linear cokriging techniques. The use of non-linear geostatistical techniques is preferable if one or both of the variables exhibit non-Gaussian tendencies. Such distributions are commonly observed in contamination assessments of VOCs in soil. VOC data sets are usually characterized by a few significant outliers and a majority of very low or non-detectable samples. Indicator kriging has been found to be useful for highly variant phenomena where data present long-tailed distributions (Journel 1983). A similar characterization applies to mineral-deposit data sets (Isaaks and Srivastava 1989). This type of kriging uses a non-parametric approach that does not suffer from the impact of outliers, since the original values are transformed to either a 0 or a 1 based on cutoff or threshold limits (Isaaks and Srivastava 1989). The transformed values can then be used to estimate the spatial distribution of the data.
WILD AND ROUHANI ON FIELD SCREENING

GEOSTATISTICAL METHODOLOGY

Geostatistics provides tools for the analysis of spatially correlated data and is well suited to the study of natural phenomena (Journel and Huijbregts 1978). The theory of geostatistics has been well documented over the years; therefore, only a general description of the techniques applicable to this investigation is provided. These techniques are cokriging and indicator kriging. For more information on geostatistics, see Journel and Huijbregts (1978). Geostatistics allows for the estimation of values at unsampled locations. This estimation approach is commonly known as kriging and is a linear combination of known nearby values, as shown by

Z*_0 = Σ_{j=1}^{n} λ_j Z_j    (1)

where Z*_0 = the estimated value of Z (an arbitrary parameter) at location x_0; Z_j = the measured value at location x_j; λ_j = the kriging weight of the parameter value at x_j; and n = the number of nearby sample points to be used in the estimation. The weights are calculated to produce the lowest estimation error or variance and to satisfy the unbiasedness condition (Σ λ_j = 1, j = 1 to n). The minimized variance for ordinary kriging can be written as
V*_0 = Σ_{j=1}^{n} λ_j γ^Z_{j0} + μ    (2)

where V*_0 = the minimum variance of estimation error; γ^Z_{j0} = the variogram of Z between locations x_j and x_0; and μ = the Lagrange multiplier.

Cokriging
Cokriging is the estimation of one variable based on measured values of two or more variables. This procedure can be regarded as a generalization of kriging in the sense that, at every location, there is a vector [Z(x_i), Y(x_j), ...] of several variables instead of a single variable Z(x) (Olea 1991). The variable to be estimated is denoted the target or primary variable, while all other variables are categorized as auxiliary or secondary variables. The secondary variable is cross-correlated with the primary variable. The cokriging procedure is especially advantageous in cases where secondary data are more abundant than primary data. The co-estimate of the primary variable is calculated as

Z*_0 = Σ_{i=1}^{n} λ_i Z_i + Σ_{j=1}^{m} ν_j Y_j    (3)

where ν_j = the weight factor for the secondary variable, Y, measured at x_j; and m = the number of secondary-variable measurements (which is typically much greater than n).
Minimizing the variance of estimation error, V*_0, subject to the cokriging unbiasedness conditions (Σ λ_i = 1, Σ ν_j = 0) results in

V*_0 = Σ_{i=1}^{n} λ_i γ^Z_{i0} + Σ_{j=1}^{m} ν_j γ^{ZY}_{j0} + μ    (4)

where γ^Z = the variogram of the primary variable; and γ^{ZY} = the cross-variogram of the primary and secondary variables.
This technique of cokriging improves the estimation and reduces the variance of estimation error (Ahmed and de Marsily 1987).

Indicator Kriging

Indicator kriging is a non-parametric technique used when parametric assumptions are not appropriate to describe the distribution of the data set. Instead, indicator kriging provides an estimate of the cumulative distribution of the data set by calculating conditional probabilities. These probabilities can be estimated by transforming the variables to a one or a zero, depending upon whether they fall below or above a cutoff level (Sullivan 1984):
i(x; z_k) = 1 if z(x) ≤ z_k; 0 if z(x) > z_k    (5)
where z_k is the cutoff level. By using kriging, the interpolated indicator variable at any point x_0 can be estimated by

i*_k(x_0) = Σ_{j=1}^{n} λ_j i_k(x_j)    (6)

where i*_k(x_0) = the estimate of the conditional probability at x_0; and λ_j = the kriging weight for the indicator value at point x_j (Rouhani and Dillon 1989). The conditional probability in this case is defined as

i*_k(x_0) = Prob[z(x_0) ≤ z_k | nearby data]    (7)

By varying z_k, the cumulative probability distribution can then be constructed.
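Equations (5) and (6) amount to averaging 0/1 transforms of the data with kriging weights. The short sketch below uses assumed weights (in practice they come from solving the kriging system at each cutoff) and hypothetical sample values.

```python
import numpy as np

def indicator(z, z_k):
    """Equation (5): 1 where z(x) <= z_k, else 0."""
    return (np.asarray(z, dtype=float) <= z_k).astype(float)

# Hypothetical nearby sample values and kriging weights (weights assumed, sum to 1;
# a real application would solve the kriging system for each cutoff).
z = np.array([5.0, 12.0, 80.0, 400.0, 2.0])
weights = np.array([0.3, 0.25, 0.2, 0.15, 0.1])

# Equation (6): estimated conditional probability Prob[z(x0) <= z_k | data]
cutoffs = [10.0, 100.0, 1000.0]
cdf = [round(float(weights @ indicator(z, zk)), 6) for zk in cutoffs]
print(cdf)  # a non-decreasing cumulative probability across cutoffs
```

Varying the cutoff z_k traces out the estimated cumulative distribution at the point, exactly as the paragraph above describes.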
CASE STUDY

The case site has been in operation for over fifty years and is currently a commercial site. The area under investigation has a total of nine underground storage tanks (USTs) situated around the site. Some of the tanks had been in operation for up to 30 years and were suspected to have
leaked for an unknown number of years. The site is approximately 90 to 95 percent covered with concrete and is relatively level. The investigated data and some site characteristics have been altered to maintain confidentiality. The horizontal and vertical contamination at the site were assessed by three boring programs. A total of 82 borings were advanced. All USTs were eventually excavated and removed after the environmental investigation. A confirmatory campaign was performed after the tanks were removed and produced 12 additional sample locations. Figure 1 shows the locations of the 94 borings and the previous locations of the tanks. A Foxboro® OVA³ was used to screen for VOCs in each boring for all four investigations. The OVA was used to screen over 300 samples from these borings.
Figure 1-- Site Map.

In addition to the OVA screening, laboratory analysis was performed on 35 of the 300 samples. Typically, laboratory testing is performed on high OVA readings if one wants to identify the highest concentrations for risk assessment. Alternatively, when OVA readings are neither high nor low, laboratory results can be used to determine potential contamination of a sample. In this case study, the boring campaigns collected VOC information over the entire site, but far too few laboratory samples were collected for a site of this size. From an agency standpoint, this site characterization would be incomplete and unacceptable. Figure 2 shows the laboratory-sampled locations in the surficial soils.
³A Foxboro® OVA uses the principle of hydrogen flame ionization for detection and measurement of total organic vapors. The OVA meter has a scale from 0 to 10 which can be set to read at 1X, 10X, or 100X, corresponding to 0-10 ppm, 10-100 ppm, and 100-1000 ppm, respectively. The OVA is factory-calibrated to methane.
Figure 2-- Laboratory analyzed sample locations; ethylbenzene concentrations are shown.

This paper salvages this extensive investigation by providing the means to extract the maximum level of information from the existing data.
ANALYSIS OF DATA SET
Cross-correlations were computed between the laboratory-analyzed compounds in order to determine applicable indicator compounds, if any. The available data were also analyzed for correlation between the laboratory-analyzed data and the OVA samples. Next, the characteristic distribution of the data was determined to select the most appropriate geostatistical technique for the spatial or structural analysis. Only benzene, ethylbenzene, toluene, and xylene (BTEX) were investigated. All other compounds detected in the samples together made up less than eight percent of the total volatile compounds detected (EPA Method 8240 or Priority Pollutant compounds were tested). These compounds, which were mostly methylene chloride measurements, were consistently measured at low levels. Both the OVA and VOC measurements were grouped into 3-foot intervals. Because the three boring campaigns did not always collect data at consistent depths, the intervals had varying amounts of data and spatial distribution. However, these measurements were distributed over various depths. Figure 2 shows only 13 samples in the surficial layer, which was the most impacted layer in terms of horizontal extent.

Cross-Correlation Analysis

A cross-correlation analysis was performed between each variable to determine applicability of
indicator compounds. For this purpose, the entire data set of 35 samples, which spanned various depths, was used. Ethylbenzene had the highest average correlation value with the other BTEX compounds, R² = 0.92 (R² is the squared correlation coefficient). In addition, ethylbenzene correlated well with total BTEX (R² = 0.94). As mentioned previously, the soil-gas probes measure total VOCs. Therefore, ethylbenzene was used as an indicator of the other three parameters and for total VOCs. The correlation analysis performed for each BTEX compound versus the corresponding OVA reading produced poor correlations (Figure 3). The highest correlation coefficient was for the
complete OVA data set versus ethylbenzene, yielding an R² = 0.37.

Figure 3-- OVA to ethylbenzene correlation analysis.

This low correlation coefficient indicates that a direct correlation between laboratory and OVA measurements could not be justified. As discussed previously, similar results were found by many investigators.

Structural Analysis of OVA Measurements

A structural analysis was performed on the surficial-soil OVA measurements to determine their spatial correlation. Due to the qualitative nature of the OVA, the structural analysis exhibited a high degree of variability. It was therefore concluded that, given the non-Gaussian shape of the histogram of OVA measurements, an indicator transformation was preferable. Two approaches were considered for this analysis. The first approach, suggested by Isaaks and Srivastava (1989), uses the median value of the data as the cutoff. Given the qualitative nature of the OVA data, the median cutoff value may have no real significance. Therefore, a second approach was developed. This approach identifies an OVA cutoff that would provide a high degree of confidence that the soil is less contaminated than an established regulatory threshold for petroleum hydrocarbons. This threshold was based on a review of a number of government guidelines on petroleum-contaminated soils. The threshold or target value could then be used to develop the conditional probability

Prob[Ethylbenzene ≤ Target | OVA]    (8)
where Target = the cleanup or suggested maximum-allowable hydrocarbon contamination level. Calculation of the conditional probability for varying target levels (Figure 4) showed that there was a greater than 95 percent chance that the ethylbenzene level in the soil was equal to or less than a 20 parts per billion (ppb) target level, given an OVA reading of 20 parts per million (ppm) or less.

Figure 4-- Conditional probability based on OVA readings and regulatory cleanup standards.

Therefore, a target level of 20 ppb for ethylbenzene was selected as a conservative cleanup standard. The 20 ppm cutoff value for the OVA readings was similar to the median value for the surficial soils. Using the surficial soils data, the indicator variogram of OVA at the 20 ppm threshold (Y) is shown as Figure 5.
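The cutoff-selection logic behind equation (8) can be illustrated with a toy calculation. The paired OVA and ethylbenzene values below are hypothetical, not the study's data; the empirical conditional probability simply counts how often lab results meet the target among samples whose OVA reading falls at or below a candidate cutoff.

```python
import numpy as np

# Hypothetical paired samples: OVA readings (ppm) and lab ethylbenzene (ppb).
ova = np.array([2, 5, 8, 15, 18, 22, 40, 120, 300, 650], dtype=float)
eb = np.array([0, 2, 5, 10, 15, 30, 90, 500, 4000, 20000], dtype=float)

def p_clean_given_cutoff(cutoff, target=20.0):
    """Empirical Prob[ethylbenzene <= target | OVA <= cutoff], as in equation (8)."""
    mask = ova <= cutoff
    if not mask.any():
        return 1.0  # no samples below the cutoff: no evidence of exceedance
    return float((eb[mask] <= target).mean())

# Choose the largest candidate OVA cutoff that still gives >= 95% confidence
# that soil below the cutoff meets the 20 ppb ethylbenzene target.
candidates = [10, 20, 50, 100]
passing = [c for c in candidates if p_clean_given_cutoff(c) >= 0.95]
print(max(passing))  # -> 20
```

With these made-up numbers the procedure lands on a 20 ppm cutoff, mirroring the site-specific value derived from Figure 4.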
Figure 5-- Indicator variogram of OVA measurements at 20 ppm threshold
As shown by this figure, the variogram demonstrated a well-defined spatial structure.
Structural Analysis of Ethylbenzene

Recalling Figure 2, only 13 surficial ethylbenzene measurements were available for mapping soil contamination. As determined in the exploratory data analysis, the ethylbenzene measurements exhibited a tendency toward a lognormal distribution. To account for this tendency, the natural log of the ethylbenzene measurements was taken. Furthermore, in order to avoid the possibility of numerical errors in the cokriging process, the log-transformed values were then normalized (mean = 0, standard deviation = 1). This made the latter data set numerically consistent with the indicator OVA values, thus minimizing the chance of numerical errors in cokriging. Unlike the OVA measurements, the variogram for the standardized, log-transformed ethylbenzene measurements (Z) demonstrated a relatively poor spatial structure with a short range (Figure 6). This short range prohibited accurate mapping with the ethylbenzene data set alone.
Figure 6-- Direct variogram of normalized ethylbenzene measurements.

All the above direct variographies were performed using the U.S. Environmental Protection Agency (EPA) public domain program GEO-EAS (Englund and Sparks 1988).

Cross-Variography of OVA and Ethylbenzene Measurements

Cross-variography between the above two variables was conducted based on the linear model of co-regionalization (Rouhani and Wackernagel 1990). These computations were conducted using EPA's program GEOPACK (Yates and Yates 1989). In this approach, the relationship between the direct and cross-variograms is defined as
γ^Z = a_1 g_1 + a_2 g_2 + ... + a_k g_k
γ^Y = b_1 g_1 + b_2 g_2 + ... + b_k g_k
γ^{ZY} = c_1 g_1 + c_2 g_2 + ... + c_k g_k    (9)

such that

c_i² ≤ a_i b_i    (10)

where γ^Z = the direct variogram of standardized, log-transformed ethylbenzene, Z; γ^Y = the direct variogram of indicator OVA, Y; γ^{ZY} = the cross-variogram of Z and Y; g_i = the ith basic variogram model (sill = 1); a_i = the sill of g_i in the nested variogram model of Z; b_i = the sill of g_i in the nested variogram model of Y; c_i = the sill of g_i in the nested cross-variogram model of Z and Y; and k = the number of basic models used in the nested variograms. The ratio of the fitted c_i²/(a_i b_i) represents the correlation coefficient (R_i²) at a scale consistent with the range of the ith basic variogram. Figure 7 depicts the cross-variogram between the standardized, log-transformed ethylbenzene and
the indicator OVA.

Figure 7-- Cross-variogram of standardized ethylbenzene and indicator OVA.

Table 1 summarizes the variogram models used for all structural analyses.
TABLE 1 -- Summary of variography.

Variable   Model       Nugget   Sill (21 m structure)   Sill (40 m structure)   Ranges (m)
Y          Spherical   0.03     0.07                    0.2                     21, 40
Z          Spherical   0.6      0.4                     0.25                    21, 40
Z,Y        Spherical   0.05     0.167                   0.223                   21, 40
The scale-specific correlation coefficients displayed in Table 1 indicate that at micro-scales, associated with the nugget effect, the correlation between the log-transformed ethylbenzene and the indicator OVA values is rather low (R_1² = 0.14). This can be attributed to measurement fluctuations. However, at larger scales, associated with the spherical models with ranges of 21 and 40 meters, the correlation between the above variables improves significantly (R_2² = 0.996, R_3² = 0.995). These correlations, however, can be masked by the poor correlations at the micro-scale distances. Consequently, a classical direct correlation analysis between the two measured values is very likely to fail.
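Using the sills from Table 1, the scale-specific correlation coefficients quoted above can be reproduced directly; the snippet also checks the coregionalization constraint of equation (10) for each nested structure.

```python
# Scale-specific correlation coefficients R_i^2 = c_i^2 / (a_i * b_i), computed
# from the fitted sills in Table 1 (a_i for Z, b_i for Y, c_i for the cross model).
a = [0.6, 0.4, 0.25]      # sills of Z: nugget, 21 m, and 40 m structures
b = [0.03, 0.07, 0.2]     # sills of Y
c = [0.05, 0.167, 0.223]  # sills of the cross-variogram

r2 = [ci ** 2 / (ai * bi) for ai, bi, ci in zip(a, b, c)]
print([round(x, 3) for x in r2])  # -> [0.139, 0.996, 0.995]

# Equation (10) must hold for each structure for the model to be admissible:
assert all(ci ** 2 <= ai * bi for ai, bi, ci in zip(a, b, c))
```

The computed values match the R_1², R_2², and R_3² reported in the text, confirming the internal consistency of the fitted coregionalization model.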
CONCLUSIONS

The mapping of the under-sampled ethylbenzene measurements is made feasible by incorporating the well-sampled OVA measurements. Figure 8 demonstrates the target 20 ppb contour for ethylbenzene. While the ethylbenzene measurements alone do not provide a basis for delineation of the contaminated soil, the cokriged map allows us to define the extent of contamination. However, specific areas of the site still require additional laboratory confirmation. These areas are south of the 26 ppb ethylbenzene measurement and between the two major tank pits. The above results show that field screening of contaminated sites can provide valuable information for characterization and mapping. This objective is accomplished by: indicator transformation of OVA measurements based on a site-specific and regulatory-dependent conditional probability analysis; multi-scale direct variography of the transformed data; and multi-scale cross-variography of the transformed OVA and laboratory measurements.
Figure 8-- Results of co-estimation; ethylbenzene contamination extent.

In conclusion, attempting to directly correlate field screening and laboratory data is usually prone to failure. Instead, the auxiliary data are subjected to an indicator transformation, which is consistent with the qualitative nature of field screening data. This transformation, however, requires that a cutoff or threshold value be determined. In this work, the cutoff value is computed based on an analysis of the conditional probabilities of soil samples passing various regulatory criteria. Such an approach provides a flexible, site-specific algorithm for the transformation of field screening data and their eventual use in co-estimation. The transformed field data can then be cokriged with laboratory measurements to produce an information-efficient assessment of the extent of contamination.
REFERENCES

Ahmed, S. and G. de Marsily, 1987, "Comparison of geostatistical methods for estimating transmissivity using data on transmissivity and specific capacity," Water Resources Research, 23(9), pp 1717-1734

Crouch, M. S., 1990, "Check soil contamination easily," Chemical Engineering Progress, pp 41-45

Englund, E., and A. Sparks, 1988, "GEO-EAS (Geostatistical Environmental Assessment Software) User's Guide," EPA 600/4-88/033, EMSL, Environmental Protection Agency, Las Vegas, NV

Isaaks, E. H. and R. M. Srivastava, 1989, Applied Geostatistics, Oxford University Press, New York

Journel, A. G. and C. J. Huijbregts, 1978, Mining Geostatistics, Academic Press, London
Journel, A. G., 1983, "Non-parametric estimation of spatial distributions," Mathematical Geology, 15(3), pp 445-468

Marrin, D. L. and H. Kerfoot, 1988, "Soil gas surveying techniques," Environmental Science and Technology, 22(7), pp 740-745

Olea, R. A., 1991, Geostatistical Glossary and Multilingual Dictionary, International Association for Mathematical Geology Studies in Mathematical Geology No. 3, Oxford University Press, New York

Rouhani, S. and M. Dillon, 1989, "Geostatistical risk mapping for regional water resources studies," The Use of Computers in Water Management, International Water Resources Association Technical Session, Moscow, pp 216-228

Rouhani, S. and H. Wackernagel, 1990, "Multivariate geostatistical approach to space-time data analysis," Water Resources Research, 26(4), pp 585-591

Siegrist, R. L., 1991, "Volatile organic compounds in contaminated soil: the nature and validity of the measurement process," Conference on Characterization and Cleanup of Chemical Waste Sites, Washington, D.C., Journal of Hazardous Materials, 29(1), pp 3-15

Smith, P. G., and S. Jensen, 1987, "Assessing the validity of field screening of soil samples for preliminary determination of hydrocarbon contamination," Superfund '87, Hazardous Materials Control Research Institute, pp 101-103

Sullivan, J., 1984, "Conditional recovery estimation through probability kriging: theory and practice," in G. Verly et al., eds., Geostatistics for Natural Resources Characterization, Part 1, D. Reidel Publishing Co., Dordrecht, pp 365-384

Thompson, G. M. and D. L. Marrin, 1987, "Soil gas contaminant investigations: a dynamic approach," Ground Water Monitoring Review, 7(3), pp 88-93

Yates, S. R., and M. V. Yates, 1989, "Geostatistics for Waste Management: A User's Manual for the GEOPACK (Version 1.0) Geostatistical Software System," EPA, R. S. Kerr Environmental Research Laboratory, Ada, OK
Robert L. Johnson¹

A Bayesian/Geostatistical Approach to the Design of Adaptive Sampling Programs

REFERENCE: R. L. Johnson, "A Bayesian/Geostatistical Approach to the Design of Adaptive Sampling Programs," Geostatistics for Environmental and Geotechnical Applications, ASTM STP 1283, R. Mohan Srivastava, Shahrokh Rouhani, Marc V. Cromer, A. Ivan Johnson, and Alexander J. Desbarats, Eds., American Society for Testing and Materials, 1996.

ABSTRACT: Traditional approaches to the delineation of subsurface contamination extent are costly and time consuming. Recent advances in field screening technologies present the possibility for adaptive sampling programs, that is, programs that adapt or change to reflect sample results generated in the field. A coupled Bayesian/geostatistical methodology can be used to guide adaptive sampling programs. A Bayesian approach quantitatively combines "soft" information regarding contaminant location with "hard" sampling results. Soft information can include historical information, non-intrusive geophysical survey data, preliminary transport modeling results, past experience with similar sites, etc. Soft information is used to build an initial conceptual image of where contamination is likely to be. As samples are collected and analyzed, indicator kriging is used to update the initial conceptual image. New sampling locations are selected to minimize the uncertainty associated with contaminant extent. An example is provided that illustrates the methodology.
KEYWORDS: adaptive sampling program, indicator kriging, Bayesian analysis, site characterization, sampling strategy
INTRODUCTION

Characterizing the nature and extent of contamination at hazardous waste sites is an expensive and time-consuming process that typically involves successive sampling programs. The total cost per sample can be prohibitive when sampling program
¹Staff engineer, Environmental Assessment Division, Argonne National Laboratory, Bldg. 900, 9700 S. Cass Ave., Argonne, IL 60439.
JOHNSON ON SAMPLING PROGRAMS
mobilization costs, drilling or bore hole expenses, and sample analysis costs are all included. For example, the Department of Energy (DOE) estimates that it will spend between $15 billion and $45 billion for analytical services alone over the next 30 years to support environmental restoration activities at its facilities (DOE 1992). One of the primary products of a site characterization study is an estimate of the extent of contamination. Traditional characterization methodologies rely on pre-planned sampling grids, off-site sample analyses, and multiple sampling programs to determine contamination extent. Adaptive sampling programs present the potential for substantial savings in the time and cost associated with characterizing the extent of contamination. Adaptive sampling programs rely on recent advances in field analytical methods (FAMs) to generate real-time information on the extent and level of contamination (McDonald et al. 1994). Adaptive sampling programs result in more cost-effective characterizations by reducing the analytical costs per sample collected, by limiting the number of samples collected by strategically locating samples in response to field data, and finally by bringing characterization to closure in the course of one sampling program. Adaptive sampling programs can result in characterization cost savings on the order of 50% to 80% (Johnson 1993). Supporting adaptive sampling programs requires the ability to estimate the extent of contamination based on available information, to measure the uncertainty associated with those estimates, to determine the reduction of uncertainty one might expect from collecting additional samples, and to direct sample collection so that sample locations maximize information gained. Two key characteristics of contaminated sites must be taken into account. The first is that spatial autocorrelation is often present when samples are collected.
The second is that there may be abundant "soft" information regarding the location and extent of contamination, even if little "hard" sample data are initially available. Soft data refers to information such as historical records, non-intrusive geophysical survey results, preliminary fate and transport modeling results, aerial photographs, past experience with similar sites, etc. A number of geostatistical approaches to the design of sampling programs for characterizing hazardous waste sites have been proposed in the past. Early methods focused on minimizing some form of kriging variance (e.g., Olea 1984 and Rouhani 1985). More recent work has centered on stochastic conditional simulation techniques, Bayesian implementations of geostatistics, and more complex decision rules (for example, Englund and Heravi 1992; McLaughlin et al. 1993; James and Gorelick 1994). In practice, site characterization sampling program designs tend to blend rigid sampling grids with selective sampling based on best engineering judgement. Typically there is little quantitative analysis to support the final sampling program design. A combined Bayesian/geostatistical methodology is well suited to quantitative adaptive sampling program support. Bayesian analysis allows the quantitative integration of soft information with hard data. Geostatistical analysis provides a means for interpolating results from locations where hard data exist to areas where they do not. A general Bayesian/geostatistical approach to merging soft and hard data is the Markov Bayes model described by Deutsch and Journel (1992). The Markov Bayes model estimates conditional cumulative density functions by developing covariance relationships between soft and hard data sets and pooling the two different data sources through a form of indicator cokriging.
The methodology described in this paper exploits the fact that environmental indicator sampling resembles binomial sampling. Binomial sampling events allow for the derivation of conjugate prior and posterior probability density functions, which in turn greatly simplifies computational effort. By incorporating soft information into an initial conceptual model that is subsequently updated as hard sampling data become available, the development of covariance models between soft and hard data is avoided. The classification of areas as clean or contaminated, and the selection of additional sampling locations, is based on a form of Type I and Type II error analysis, an approach consistent with the Environmental Protection Agency's (EPA) Data Quality Objectives approach to environmental restoration decision making.
METHODOLOGY

Classical statistics estimates the most likely value for π, the probability of encountering contamination, by using hard sample data results. For example, if 20 random locations were sampled at a site and 5 of these samples returned contamination levels above an action threshold, then an unbiased estimator of the true probability of observing contamination above that threshold for any random location at the site would be the number of hits divided by the number of samples, or 0.25. In classical statistics one could carry the analysis one step further and develop confidence intervals around this estimator with some basic assumptions about the underlying probability distribution. Kriging provides similar results for individual points in space, accommodating spatial autocorrelation as well. Neither classical statistics nor geostatistics provides a means of quantitatively accommodating soft information in the analysis. For the design of sampling programs to characterize contamination extent, and for the subsequent analysis of sampling program results, soft information often plays a crucial role. A Bayesian approach differs from classical statistics by assuming that parameters (such as the presence of contamination at a node) are unknown initially but have some known probability distribution called the prior probability density function (pdf). As additional information becomes available (such as results from new sampling locations), these prior pdfs can be updated quantitatively using Bayes' rule to produce posterior probability density functions:

P(X|Y) ∝ P(X) P(Y|X)    (1)

P(X|Y) is the posterior pdf for X, P(X) is the prior pdf for X, and P(Y|X) reflects the probability distribution associated with observing Y given the prior pdf of X. From a Bayesian perspective, a two-parameter beta distribution Be(α,β) is a conjugate prior in the context of Bernoulli trials and the binomial distribution (Lee 1989). Be(α,β) ranges between zero and one and can assume a variety of shapes depending on the values of α and β. For a random variable π that follows a beta distribution, the expected value of π is given by
E(π) = α / (α + β)    (2)
where α, β = parameters associated with the beta pdf for π, with α, β ≥ 0. The variance of π is given by

Var(π) = αβ / [(α + β)²(α + β + 1)]    (3)
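Equations (2) and (3), together with the conjugate update Be(α,β) → Be(α+X, β+N−X) described in the next paragraph, can be sketched in a few lines; the prior parameters and sample counts below are illustrative only.

```python
def beta_mean(a, b):
    """Equation (2): E[pi] = a / (a + b)."""
    return a / (a + b)

def beta_var(a, b):
    """Equation (3): Var[pi] = a*b / ((a + b)**2 * (a + b + 1))."""
    return a * b / ((a + b) ** 2 * (a + b + 1))

def update(a, b, hits, n):
    """Conjugate update: Be(a, b) -> Be(a + X, b + N - X) after N binomial trials."""
    return a + hits, b + n - hits

# A vague prior and a confident prior can share the same expected probability ...
assert beta_mean(0.4, 0.4) == beta_mean(40, 40) == 0.5
# ... but the confident prior has far smaller variance.
assert beta_var(40, 40) < beta_var(0.4, 0.4)

# Updating a non-informative Be(1,1) prior with 5 hits in 20 samples
# pulls the estimate toward the classical X/N = 0.25.
a1, b1 = update(1.0, 1.0, hits=5, n=20)
print(round(beta_mean(a1, b1), 3))  # -> 0.273
```

As N grows, the posterior mean converges to X/N and the variance shrinks, which is exactly the large-sample behavior described below.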
Binomial distributions provide the probability of observing a specified number of successes within a specified number of trials. Conjugate priors are priors that retain the same underlying pdf after the application of Bayes' rule. In the case of a binomial trial with an unknown underlying probability π of seeing a success in any given trial, if X successes are obtained in N trials, a prior for π of the form Be(α,β) becomes the posterior Be(α+X, β+N−X). N functions as the total amount of additional information supplied to the prior. As N grows large, E(π) approaches the classical maximum likelihood estimator for π, X/N, and Var(π) decreases monotonically. When one considers only the presence or absence of contamination above some threshold, environmental sampling resembles a binomial trial: N samples are collected, X of which encounter contamination above the threshold. The primary difference is that environmental samples are not independent, as required in a traditional binomial sampling sequence. Sample values, even at an indicator level, are spatially autocorrelated. The issue is how to update a prior beta distribution at a given point in space with results from nearby samples in a way that is consistent with the derivation of beta distributions as conjugate priors for binomial distributions and that recognizes this spatial autocorrelation. Two pieces of information are required from the set of samples: N*, the total amount of information represented by the set of samples appropriate for that point in space, x_0; and p*, the probability of encountering contamination at x_0 based on the sample results. Indicator kriging provides a means for deriving these two pieces of information. An unbiased estimator of p* at x_0 is given by
p*(x_0) = Σ_{i=1}^{N} w_i I(x_i)    (4)

where x_i = locations where samples have been collected; I(x_i) = 0 or 1, depending on whether the sample at x_i encountered
contamination below or above the threshold; and w_i = the kriging weights. The set of kriging weights can be derived by solving the following set of simultaneous linear equations:

Σ_{i=1}^{N} C_ij w_i + μ = C_j0,  for j = 1, ..., N    (5)

Σ_{i=1}^{N} w_i = 1    (6)

where C_ij = the covariance between sample locations x_i and x_j; C_j0 = the covariance between sample location x_j and the point where the interpolation is taking place, x_0; and μ = the Lagrange multiplier. N* at x_0 can be tied to N, the number of samples taken, through the following relationship:
Varestim
- 1
(7)
N
Varestim
where Varestim Coo ~
=
=
Coo -
(E WiCiO
+
f.L)
(8)
i= 1
the estimation variance associated with the interpolation of p* at location xo; the variance of the indicator values; the average of the indicator values for the sample locations involved in the updating.
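The updating step implied by equations (4)-(8) can be sketched in a few lines. This is an illustrative sketch, not code from the paper; the function names and the example numbers (indicator variance 0.25, estimation variance 0.05, kriged probability 0.75) are assumptions chosen for demonstration.

```python
def effective_n(c00, var_estim):
    """Heuristic effective sample count of equation (7): N* = C00/Var - 1."""
    return c00 / var_estim - 1.0

def beta_update(alpha, beta, p_star, n_star):
    """Update a Be(alpha, beta) prior with kriging-derived evidence.

    Treats the neighborhood as supplying X = p_star * n_star successes in
    n_star effective trials, mirroring the Be(alpha+X, beta+N-X) rule.
    """
    x = p_star * n_star
    return alpha + x, beta + (n_star - x)

# Hypothetical values: indicator variance C00 = 0.25, estimation
# variance 0.05, and an indicator-kriged probability p* = 0.75.
n_star = effective_n(0.25, 0.05)            # 4.0 effective samples
a, b = beta_update(1.0, 1.0, 0.75, n_star)  # non-informative Be(1,1) prior
posterior_mean = a / (a + b)                # expected contamination probability
```

With these numbers the posterior is Be(4, 2), whose mean of about 0.67 is pulled from the prior's 0.5 toward the kriged estimate of 0.75.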
Equation (7) is heuristically based. When the sampled locations are all "distant" from the point of interest (i.e., greater than the spatial autocorrelation range), N* goes to zero, implying that the sampled locations contribute no information at the point of interest. As a sampled location comes close to the point of interest, N* goes to infinity, indicating that the sample information has specified the probability at the point of interest exactly. The methodology begins by defining a uniform grid over the region of interest. Grid nodes are designated as Decision Points (DPs). At each DP, a pdf based on the two
JOHNSON ON SAMPLING PROGRAMS
parameter beta distribution Be(α, β) is defined. The beta pdf associated with each DP describes the probability of encountering contamination above a pre-selected threshold level at that DP. Initial values for α and β are selected to represent a synthesis of any soft information available for a site, using equations (2) and (3). For a particular DP, the values of α and β relative to each other determine the expected probability of contamination at that DP. The absolute sizes of α and β determine the certainty associated with the beta distribution at that DP. For example, both α = β = 0.4 and α = β = 40 result in an expected probability of contamination equal to 0.5. However, in the latter case, the variance as calculated in equation (3) is much less. In the unlikely case where no information is available at a particular DP, a "non-informative" prior can be selected that sets α and β equal to one at that DP. Updating the set of decision points with hard sampling data requires knowledge of the variogram or covariance function for the site. Because the values of p* and N* at x0 are independent of C00, the primary covariance function parameters of concern are its shape, or functional form, and its range. If sufficient hard data exist, one can estimate the covariance function from an experimental variogram analysis. A simple measure of the uncertainty associated with contamination extent is to categorize decision points as "clean", "contaminated", or "state uncertain" at a given certainty level, where the probability of contamination being present at any given decision point is based on equation (2) using the posterior beta pdf parameters associated with that decision point. For example, if one wishes to be 90% certain that the classification is correct when a decision point is classified as either clean or contaminated, then decision points with E(π) ranging between 0.1 and 0.9 would be classified as state uncertain.
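The three-way classification just described can be expressed directly. This is a sketch of the rule as stated in the text, with hypothetical parameter values; `classify` is not a function from the paper.

```python
def classify(alpha, beta, certainty=0.9):
    """Classify a decision point from its posterior Be(alpha, beta).

    With a `certainty` requirement of 0.9, points whose expected
    contamination probability E = alpha / (alpha + beta) falls between
    0.1 and 0.9 are left "state uncertain".
    """
    e = alpha / (alpha + beta)
    if e <= 1.0 - certainty:
        return "clean"
    if e >= certainty:
        return "contaminated"
    return "state uncertain"

print(classify(0.5, 40.0))   # E ~ 0.01, well below 0.1: clean
print(classify(40.0, 0.5))   # E ~ 0.99, well above 0.9: contaminated
print(classify(4.0, 6.0))    # E = 0.4: state uncertain
```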
This definition of uncertainty parallels the use of uncertainty by the EPA in its Data Quality Objectives approach to decision-making. This method for handling uncertainty also leads naturally to measures of benefit one might expect from additional data collection. For example, one might wish to sample those locations that would be expected to maximize the number of decision points classified as "contaminated" or "clean" at a given certainty level, or to minimize the number classified as state uncertain.
EXAMPLE APPLICATION

A simple example illustrates this methodology in action. Figure 1 provides a plan view of a hypothetical site with surface soil contamination. The site contains a waste lagoon that was breached during a storm. The owner's property is bounded by two secondary roads. The demarcated area indicates where surface soil contamination actually exists (7 940 m2), an area unknown to the site owner. The owner acknowledges that contamination exists, and that portions of the site will require remedial action. The purpose of the characterization effort is to determine the extent of contamination so that the soils can be removed and treated off-site. The responsible regulator wants all contaminated soils identified and removed. The regulator wants to ensure that the sampling program is designed so that soils that are contaminated are not erroneously classified as clean. The owner will have to pay for the characterization, excavation, and remediation of all soils believed to be contaminated.
FIG. 1--Example site
The owner wants to avoid remediating soils that are actually clean, and also to minimize his characterization costs. After negotiations, the regulator agrees to tolerate a 20% chance that a soil volume identified as clean is actually contaminated. The owner will be responsible for removing and remediating all areas that have a greater than 20% chance of contamination being present. There are no initial hard sampling data for this site. The available soft information includes the location of the lagoon, scattered survey points from which a terrain model can be built to indicate the probable direction of overland flow and hence contaminant migration, the location of a utility building on site that would have been a barrier to flow, and the location of roads with embankments that would have also blocked flow. This soft information is used to construct the initial conceptual image of where contamination likely is, and where it likely is not. A grid is superimposed over the site that consists of 625 decision points (Figure
FIG. 2--Decision point grid
2). At each decision point, a beta distribution is defined, with parameters selected to reflect the soft information available. For decision points that are in the building, α is set equal to zero and β to a very large number to reflect the fact that the interior of the building is known to be clean. For decision points within the lagoon, α is set equal to a very large number and β equal to zero, to reflect the fact that the lagoon is known to be contaminated. For the balance of the decision points, α and β are set to values less than 0.5, with their relative sizes selected so that equation (2) reflects the initial probability of the presence of contamination. Figure 3 shows the gray-scale representation of the initial conceptual model once the beta distribution parameter values have been selected, along with a set of terrain contours based on the available survey points. As is shown in Figure 3, the initial conceptual image is faithful to the location of the lagoon, building, and land surface contours. The area demarcated with the heavy black line indicates soil with contamination probability greater than 0.2 based on this initial conceptual model. At this
FIG. 3--Initial conceptual model (gray scale: contamination probability, 0 to 1.00)
point, without any sampling, the owner would have to clean up 34 440 m2 of soil, more than four times what is actually contaminated. Before the adaptive sampling program can begin, the methodology requires a covariance function. At the outset there are no hard data upon which to base a covariance function choice. If the covariance function were selected to honor the initial conceptualization, a range of approximately 200 meters would be used. The larger the assumed range, however, the fewer the samples that would be required to characterize the site. As a conservative start, an isotropic exponential covariance function with a range of 50 meters is assumed for this example. A traditional sampling program for a site such as this would probably rely on a pre-planned, regular sampling grid. As a point of comparison for the subsequent adaptive sampling examples, Figure 4 shows an example pre-planned sampling program based on a triangular grid pattern. The gray-shaded surface contained in Figure 4 shows the results when a non-informative initial conceptual model is updated with the
FIG. 4--Standard sampling program results (gray scale: contamination probability)
information that would have been derived from this sampling program. The underlying beta distribution parameters for each decision point were set to α = β = 0.1. In this scenario, the 14 samples result in classifying 23 230 m2 of soil as requiring remedial action (i.e., the probability of contamination for these soils is greater than 0.2). This captures 87% of the soils actually contaminated, and includes 16 230 m2 of uncontaminated soil. The classification of much of the clean area in Figure 4 as contaminated is a product of the uncertainty associated with the use of an "ignorant" or non-informative prior during the updating process. If one uses the initial conceptual model shown in Figure 3, and updates it with the results from the sampling program shown in Figure 4, one obtains a different interpretation of the site. Figure 5 shows the results graphically. Using an initial conceptual model that reflects what is known at the outset about the site results in classifying 22 000 m2 of soil as requiring remedial action. This captures more than 98% of the soil actually contaminated, and includes 14 190 m2 of uncontaminated
FIG. 5--Standard sampling grid with initial conceptual model
soil. If one incorporates the underlying soft information available for the site, as displayed in Figure 3, and then sequentially selects 14 sampling locations that maximize the area that would be classified as clean at the 80% certainty level, then one obtains the pre-planned sampling program shown in Figure 6. The sequential selection of sampling locations proceeded as follows. First, a set of potential sampling locations based on a tight grid was established. Second, each potential sampling point was evaluated based on the impact sampling that point would have on the categorization of soils as requiring remedial action or not. If a potential sampling location had already been selected for sampling, then it was discarded. In this evaluation, it was assumed that the sampling result observed would be the most likely result based on the initial conceptual model conditioned with any locations either already sampled, or already selected for sampling. The potential sampling location that provided the greatest increase in the area of soil classified as clean would be added to the list of locations to sample. This process was
FIG. 6--Preplanned sampling program with initial conceptual model (gray scale: contamination probability)
used iteratively until 14 locations had been selected. Figure 6 also shows the results from updating the underlying conceptual model with the results that would actually have been obtained from this pre-planned sampling program. These 14 samples reduce the amount of soil classified as requiring remedial action from 34 440 m2 in the original conceptual model to 15 120 m2, a reduction, on average, of 1 380 m2 of soil reclassified as clean per sample collected. This captures more than 97% of the soil that is actually contaminated, and includes 7 395 m2 of uncontaminated soil. The selection of sampling locations for the pre-planned program was based on what was assumed would have been the results from sampling each of those locations. As the number of sampling locations included in a pre-planned program increases, the probability that at least one sample will encounter results that are unexpected also grows. In an adaptive sampling program, the results from previously selected sampling locations are available when the decision is made where to sample next. While the same process is
FIG. 7--Extended adaptive sampling program (gray scale: contamination probability)
used for identifying the next sampling location, the difference is that the decision is conditioned on actual sample results, not assumed sample results as in the selection process for the pre-planned program. An adaptive sampling program at this site, driven by the objective of maximizing the area classified as clean at the 80% certainty level, would initially follow the same course as the pre-planned program shown in Figure 6. The reason is that all fourteen samples collected as part of the pre-planned sampling program encountered what was expected---no contamination. In the case of an adaptive sampling program, one has the additional option of continuing to sample until the goals of the program have been met. Figure 7 shows the locations of an additional 14 samples for this site, along with the results from updating the underlying conceptual model with their results. The additional 14 samples reduced the area classified as requiring remedial action to 10 070 m2. This included 96% of the soil actually contaminated, and 2 460 m2 of uncontaminated soils. Each sample reclassified, on average, 350 square meters of soil, a significantly smaller amount than
obtained from the first 14 samples. There are two reasons for this: first, there is simply less area available for reclassification to clean; second, the sampling has begun to encounter the unexpected---contaminated soil.
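The adaptive loop described above (pick the candidate whose assumed most-likely result would most enlarge the clean area, sample it, then condition the model on the actual result) can be sketched as follows. Everything here is hypothetical scaffolding: the `ToyModel` with fixed per-candidate benefits stands in for the full geostatistical model, and the names are invented for illustration.

```python
class ToyModel:
    """Hypothetical stand-in for the site model: each candidate location
    carries a fixed expected benefit (m2 reclassified clean if sampled)."""
    def __init__(self, benefit):
        self.benefit = dict(benefit)
        self.results = {}
    def expected_clean_area_if_sampled(self, c):
        return self.benefit[c]
    def update(self, c, result):
        self.results[c] = result   # a real model would re-krige here

def adaptive_sampling(candidates, model, take_sample, goal_met, max_n):
    """Greedy adaptive selection conditioned on actual sample results."""
    chosen = []
    while len(chosen) < max_n and not goal_met(model):
        best = max((c for c in candidates if c not in chosen),
                   key=model.expected_clean_area_if_sampled)
        model.update(best, take_sample(best))   # actual, not assumed, result
        chosen.append(best)
    return chosen

model = ToyModel({"A": 1380.0, "B": 900.0, "C": 350.0})
order = adaptive_sampling(["A", "B", "C"], model,
                          take_sample=lambda c: 0,    # every sample clean
                          goal_met=lambda m: False,
                          max_n=2)
# order == ["A", "B"]: the two highest-benefit locations, in order
```

The pre-planned variant differs only in that `take_sample` would return the *assumed* most-likely result rather than a field measurement.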
CONCLUSIONS

Adaptive sampling programs provide the opportunity for significant cost savings during the characterization of a hazardous waste site. The challenge for adaptive sampling programs is providing real-time sampling program support that both incorporates the typically significant amounts of soft information available and accounts for the spatial autocorrelation that is omnipresent. A joint Bayesian analysis/indicator geostatistical method can be used to guide the selection of sampling locations, to estimate the extent of contamination based on available data, and to determine the expected benefits to be gained from additional sampling. The example illustrates how the addition of soft information to the design of a sampling program can result in a more directed sampling strategy. When the ability to guide the program while in the field is added, the potential for cost savings is great.
ACKNOWLEDGEMENTS

The work presented in this paper was funded through the Mixed Waste Landfill Integrated Demonstration by the Office of Technology Development, Office of Environmental Restoration and Waste Management, U.S. Department of Energy, under contract W-31-109-ENG-38.
REFERENCES

Department of Energy, Analytical Services Program Five-Year Plan, Laboratory Management Division, Office of Environmental Restoration and Waste Management, Washington, D.C., January 29, 1992.

Deutsch, C. V. and A. G. Journel, GSLIB: Geostatistical Software Library and User's Guide, Oxford University Press, New York, NY, 1992.

Englund, E. J. and N. Heravi, "Conditional Simulation: Practical Application for Sampling Design Optimization", Geostatistics Troia '92, A. Soares, ed., Kluwer Academic Publishers, Dordrecht, 1992, pp. 631-624.

James, B. R. and S. M. Gorelick, "When Enough is Enough: The Worth of Monitoring Data in Aquifer Remediation Design", Water Resources Research, Vol. 30, No. 12, December 1994, pp. 3499-3514.

Johnson, R. L., Adaptive Sampling Strategy Support for the Unlined Chromic Acid Pit, Chemical Waste Landfill, Sandia National Laboratories, Albuquerque, New Mexico, ANL/EAD/TM-2, Argonne National Laboratory, Argonne, IL, November 1993.

Lee, P. M., Bayesian Statistics: An Introduction, Oxford University Press, New York, NY, 1989.

McDonald, W. C., M. D. Erickson, B. M. Abraham, and A. R. Robbat, "Developments and Applications of Field Mass Spectrometers", Environmental Science & Technology, Vol. 28, No. 7, 1994, pp. 336-343.

McLaughlin, D. B., L. B. Reid, S.-G. Li, and J. Hyman, "A Stochastic Method for Characterizing Ground-Water Contamination", Ground Water, Vol. 31, No. 2, 1993, pp. 237-249.

Olea, R. A., "Sampling Design Optimization for Spatial Functions", Mathematical Geology, Vol. 16, No. 4, 1984, pp. 369-392.

Rouhani, S., "Variance Reduction Analysis", Water Resources Research, Vol. 21, No. 6, June 1985, pp. 837-846.
Kadri Dagdelen1 and A. Keith Turner2

IMPORTANCE OF STATIONARITY FOR GEOSTATISTICAL ASSESSMENT OF ENVIRONMENTAL CONTAMINATION
REFERENCE: Dagdelen, K. and Turner, A. K., "Importance of Stationarity for Geostatistical Assessment of Environmental Contamination," Geostatistics for Environmental and Geotechnical Applications, ASTM STP 1283, R. M. Srivastava, S. Rouhani, M. V. Cromer, A. I. Johnson, A. J. Desbarats, Eds., American Society for Testing and Materials, 1996.
ABSTRACT: This paper describes a geostatistical case study to assess TCE contamination from multiple point sources that is migrating through geologically complex conditions involving several aquifers. The paper highlights the importance of the stationarity assumption by demonstrating how biased assessments of TCE contamination result when ordinary kriging is applied to data that violate stationarity assumptions. Division of the data set into more homogeneous geologic and hydrologic zones improves the accuracy of the estimates. Indicator kriging offers an alternate method for providing a stochastic model that is more appropriate for the data. Further improvement in the estimates results when indicator kriging is applied to individual subregional data sets that are based on geological considerations. This further enhances the data homogeneity and makes the use of a stationary model more appropriate. By combining geological and geostatistical evaluations, more realistic maps may be produced that reflect the hydrogeological environment and provide a sound basis for future investigations and remediation.

KEYWORDS: geostatistics, environmental contamination, kriging, second-order stationarity
INTRODUCTION
Determination of the extent of contamination at a site is usually based on the collection and analysis of a limited number of samples. Accurate assessment of these sample values requires knowledge of the geologic conditions and correct application of geostatistical methods to extend the sample values over the entire site area. These two requirements are mutually supportive. Geological conditions control the movement of contaminants; therefore evaluations of existing information concerning contamination are dependent on a clear and unambiguous understanding of the geologic framework. Site contamination patterns may be affected in significant ways by the geologic framework in regions surrounding the site. Thus, geologic studies should extend appropriate distances beyond the immediate site boundaries. Geostatistical methods are frequently employed to convert sampled values into a complete description of the contamination pattern at a site.
1 Assistant Professor, Mining Engineering Dept., Colorado School of Mines, Golden, CO 80401.
2 Professor, Geological Engineering Dept., Colorado School of Mines, Golden, CO 80401.
Ordinary kriging is recognized as the best linear unbiased estimator (B.L.U.E.) that minimizes the variance of the error in determining the average contaminant concentration at unsampled locations. The mechanics of the kriging process are relatively straightforward. However, kriging is based on several assumptions concerning the character of the model, and kriging produces a B.L.U.E. only as long as these assumptions are not violated. Violation of these assumptions may result in strongly biased kriged estimates and a flawed site assessment. Kriging procedures are based on a random function model that is second-order stationary. The stationarity of the model is the chief assumption of the kriging procedure that is often violated. A random function is said to be stationary if the probability distribution of each of its random variables is the same. A random function is first-order stationary if the expected mean value of each of its random variables is the same. A random function is said to be second-order stationary if, in addition, the covariance between pairs of random variables exists and is the same for all points separated by a distance h (Journel and Huijbregts 1975). The unbiasedness condition of kriged estimates is based on a random function model that is first-order stationary. The ability to view a particular sample data set as an outcome of a first-order stationary random function is directly related to the ability of a set of samples to represent a local population whose expected value is the same at all locations of the search neighborhood. This paper describes why a data set coming from an environmental site may not be suitable for analysis under the assumptions of a stationary random function model. It shows how ignoring this condition leads to biased kriged estimates and documents approaches to address the stationarity issue, thereby producing more accurate estimates of contaminant distribution.
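As a concrete illustration of the kriging mechanics discussed here (not code from either paper), the ordinary kriging weights for a handful of samples can be computed by solving the usual covariance system with a Lagrange multiplier. The exponential covariance form C(h) = sill * exp(-3h/range) and all numeric values are assumptions for demonstration.

```python
import math

def exp_cov(h, sill=1.0, rng=50.0):
    """Isotropic exponential covariance with practical range `rng`."""
    return sill * math.exp(-3.0 * h / rng)

def solve(A, b):
    """Gaussian elimination with partial pivoting (small systems only)."""
    n = len(A)
    M = [row[:] + [bi] for row, bi in zip(A, b)]
    for col in range(n):
        piv = max(range(col, n), key=lambda r: abs(M[r][col]))
        M[col], M[piv] = M[piv], M[col]
        for r in range(col + 1, n):
            f = M[r][col] / M[col][col]
            for c in range(col, n + 1):
                M[r][c] -= f * M[col][c]
    x = [0.0] * n
    for r in range(n - 1, -1, -1):
        x[r] = (M[r][n] - sum(M[r][c] * x[c] for c in range(r + 1, n))) / M[r][r]
    return x

def ok_weights(samples, x0):
    """Ordinary kriging weights: covariance system plus sum-to-one constraint."""
    n = len(samples)
    A = [[exp_cov(math.dist(samples[i], samples[j])) for j in range(n)] + [1.0]
         for i in range(n)]
    A.append([1.0] * n + [0.0])
    b = [exp_cov(math.dist(p, x0)) for p in samples] + [1.0]
    sol = solve(A, b)
    return sol[:n], sol[n]   # weights and the Lagrange multiplier

w, mu = ok_weights([(0.0, 0.0), (100.0, 0.0)], (10.0, 0.0))
# the weights sum to one and the nearer sample receives the larger weight
```

Note that nothing in this machinery checks stationarity: the system happily produces weights for any data handed to it, which is exactly why pooling distinct populations yields biased estimates.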
SITE DESCRIPTION

The site is located on 464 acres of land in the foothills of the Colorado Front Range, 20 miles south-southwest of the city of Denver. Since 1957, activities at the site have consisted of missile assembly, engine testing, and research and development for the Titan I, II, and III missile programs, and included fuels development, purification, and testing in support of the Titan III program.
Geologic Framework

The site straddles the eastern margin of a portion of the Colorado Front Range. The western portions are dominated by Precambrian high-grade metamorphic and igneous intrusive rocks. Younger sandstone formations are found to the east of the Precambrian rocks. These now dip away from the mountain front at relatively steep angles, up to 50°. Consequently, the eastern portions of the site are entirely restricted to the lower and middle portions of the Fountain Formation. Large and small fractures, faults, and shear zones, some over a mile wide and extending for many tens of miles, are common in the Precambrian rocks. Renewed movements along several of these zones of weakness introduced fractures and faults within the younger sedimentary rocks. These sandstones are partly covered by unconsolidated Quaternary and Holocene deposits, composed of silty sandy gravels with substantial proportions of clay. However, the older of these units represent pediment surface deposits and are distinct from the younger units, which are valley-fill alluvium deposited at lower elevations in more geographically restricted areas
DAGDELEN AND TURNER ON STATIONARITY
following a period of valley down-cutting.

The Data Set

The bedrock at the site is penetrated by about 80 drill holes extending to various depths (Figure 1). Figure 1 shows these borehole locations and highlights the five samples with the highest TCE concentrations. TCE concentrations were reported for 111 samples, but 31 of these samples were duplicates. The sampled TCE concentrations range between 0 and 10,000 ppm and are skewed, with an arithmetic average of 328.8 ppm and a coefficient of variation of 3.96. As shown in Figure 2, a considerable number of samples show "non-detect" conditions, and only 50% of the samples exceed 3.0 ppm TCE.

Evaluation of Hydrogeologic Conditions

Three distinct hydrologic regimes are obvious at the site: the older Precambrian rocks, younger sedimentary rocks, and overlying unconsolidated deposits. Each has distinctive characteristics, and interactions between these regimes are relatively complex. Ground-water flow in the Precambrian rocks may be characterized as a system governed by fracture flow. The intrinsic permeability of these rocks is so low as to be negligible, but fractures and foliation planes are common and pervasive. Numerous studies have demonstrated the importance of fractures in controlling ground-water flow within these otherwise relatively impermeable Precambrian rocks. A large zone of sheared rock is mapped along one major trend that crosses the western boundary of the site. Within such regions, there may be substantial hydraulic interconnection between surface water, ground water in the relatively thin, spatially-confined and discontinuous alluvial deposits, and the regional ground-water flow systems. Water movement through the Fountain sandstone is primarily controlled by its relatively low matrix permeability. These rocks contain considerable silt and clay, which reduce and clog the pores between the sand and gravel particles.
Within the Fountain, the highest permeabilities are generally oriented parallel to the inclined bedding. Fractures are often sealed by calcite, which further reduces their ability to transmit water. The Fountain thus appears to have a relatively consistent and generally low value of effective hydraulic conductivity, especially in the direction normal to the Precambrian-Fountain contact. This value is lower than the effective rock-mass permeability of the fractured portions of the Precambrian terrane. At least in some areas, it seems probable that the Fountain sediments may act as a "permeability blanket" to the regional ground-water flow system in the Precambrian terrane. The presence of a leaky hydraulic barrier in the lower Fountain would be manifested by artesian or confined ground-water conditions within the lower Fountain sediments, and by springs and seeps or recharging streams along or near the Precambrian-Fountain contact. The presence of such seeps and recharging streams has been reported. The unconsolidated deposits have generally similar textures and hydraulic conductivity values, but their recharge and discharge characteristics and inter-connections are highly variable throughout the site. The ability of these deposits to act as a single shallow aquifer system is uncertain. Movement of contaminants through upper portions of weathered Fountain bedrock may provide hydrologic connections, but the existence of such flow paths does not yet appear convincing.
Figure 1. Map of the site showing drill hole locations (the five heavier circles represent the five largest-valued samples).
Figure 2. Histogram (logarithmic scale) and descriptive statistics defining the sample distribution of the TCE values in the bedrock: number of data 111; mean 328.77 ppm; std. dev. 1304.33; coef. of var. 3.97; maximum 10 000.0 ppm; upper quartile 3.0; median 3.0; lower quartile 0.0; minimum 0.0.
Available well measurements were adequate to allow the construction of two potentiometric surface maps, one showing the distribution of heads in the bedrock aquifers and the second showing conditions within the shallow "alluvial" aquifer represented by the entire suite of unconsolidated deposits. Potentiometric contours for the "bedrock" ground-water system suggest water flows from the west and discharges toward the east and southeast. Shallow aquifer contours also show a general west-to-east flow direction over much of the site. A difference map was created by subtracting values of the heads in the alluvial aquifer from those in the bedrock. On this map, positive difference values correspond to areas where the potential flow is upward, from bedrock to the alluvial aquifer. Similarly, negative difference values correspond to areas where the heads in the alluvial aquifer are higher than in the bedrock aquifer and the potential for downward flow exists. Areas of upward flow are found mostly in the lower portions of the Fountain Formation, supporting the concept that significant ground-water flow from the Precambrian rocks along fracture systems may be partially blocked, causing increased pressures within the lower Fountain. From this difference map, three distinct zones were defined:
• a zone of downward flow, where ground water may flow from the shallow alluvial units into the bedrock;
• a zone of upward flow, where ground water may flow from the bedrock units into alluvium; and
• a zone where neither upward nor downward flow gradients are strong and there are no apparent preferred directions of vertical ground-water movement.

Definition of Subregions at Site
In those zones where upward flow from the bedrock into the alluvium appears to dominate, most wells monitoring the alluvial units report TCE contamination. Yet, in this same zone, the majority of wells monitoring the bedrock ground-water flow system show no TCE contamination. In contrast, in those zones where downward flow dominates, alluvial wells, with only a few exceptions, report no TCE contamination at locations where most bedrock wells report TCE contamination. In the zone where neither upward nor downward flow appears to dominate, many bedrock and alluvial wells report TCE contamination. TCE contamination of the Fountain bedrock thus appears to be mostly restricted to those portions of the site where downward ground-water movement from the alluvial units may be occurring. In these locations, small groups of bedrock wells reporting TCE contamination are surrounded by non-contaminated bedrock wells. The reported TCE contaminants in the bedrock thus appear to be directly related to downward movement of contaminants from the overlying unconsolidated materials. Based on such hydrogeologic evidence, the site was divided into subregions. Four initial subregions were defined by examining the shallow unconsolidated deposits in terms of: (a) the hydrogeological setting, (b) the directions of ground-water flow, and (c) the location of known contaminant sources. Their boundaries were largely defined by interpreted ground-water flow directions and ground-water divides identified by analysis of the potentiometric contours. Comparison of these initial subregions with the zones of potential vertical ground water movements, defined by the methods described earlier, and with known major contaminant source areas, yielded six subregions (Figure 3).
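The head-difference zoning used in this subregion analysis can be sketched as a simple classification. This is illustrative only; the 0.5 m dead-band `tol` is a hypothetical tolerance, not a value from the paper.

```python
def flow_zone(head_bedrock, head_alluvial, tol=0.5):
    """Classify vertical flow potential from a head difference (in meters).

    A positive difference (bedrock head higher) suggests potential upward
    flow into the alluvium; a negative difference suggests potential
    downward flow into the bedrock; small differences indicate no strong
    vertical gradient.
    """
    diff = head_bedrock - head_alluvial
    if diff > tol:
        return "upward"
    if diff < -tol:
        return "downward"
    return "neutral"

print(flow_zone(102.3, 100.0))   # bedrock head higher: upward
print(flow_zone(100.0, 103.1))   # alluvial head higher: downward
print(flow_zone(100.2, 100.0))   # within the dead-band: neutral
```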
Figure 3. Site map showing the six subregions.

Each subregion represents an area that is believed to contain a distinct combination of surface and bedrock geologic conditions, and a common contaminant source or sources. Thus each should have a distinct population of contaminant values, and sample values from within each subregion should be considered as an outcome of a stationary random field. Each subregion should be evaluated independently by geostatistical methods, when there are sufficient samples within the area, or by visual inspection when there are too few samples to allow for geostatistical analysis.

THE ASSUMPTION OF STATIONARITY IN ESTIMATION

The theoretical derivation of the kriging procedure is based on the assumption that the data observations can be conceptualized as an outcome of a second-order stationary random function. That is, the variable being measured has the same mean value at all locations and the same spatial covariance or variogram function between all points separated by a distance h.
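A crude first-order stationarity check along these lines is to compare sample means subregion by subregion; strongly different means argue against pooling the data under one stationary model. The sketch below uses invented numbers for illustration.

```python
from statistics import mean

def mean_by_zone(values, zones):
    """Group sample values by interpreted subregion and return each mean."""
    grouped = {}
    for v, z in zip(values, zones):
        grouped.setdefault(z, []).append(v)
    return {z: mean(vs) for z, vs in grouped.items()}

# Invented TCE-like values (ppm) tagged with two hypothetical subregions:
print(mean_by_zone([0.0, 0.5, 900.0, 1200.0], ["A", "A", "B", "B"]))
# {'A': 0.25, 'B': 1050.0} -- zone B clearly carries a distinct population
```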
DAGDELEN AND TURNER ON STATIONARITY
123
In practice, the assumption concerning stationarity of the mean values requires the sample set being evaluated to be derived from the domain under study in such a way that, at any point in the domain, the expected values of samples surrounding this point are the same. In other words, the probability of sampling high values within any local region will be the same throughout the domain, as will the probability of sampling low values. In a similar fashion, second-order stationarity requires that the expected value of the squared difference between pairs of points a distance h apart in a given direction remain the same throughout the domain.

Site-wide sampling campaigns may provide data sets that are inappropriate for analysis by stationary random function models, especially if sampled locations preferentially represent contaminated zones within a larger domain (Isaaks and Srivastava 1989). When data sets representing zones of different concentration levels are mixed to form a single data set, the stationary random function model may no longer be justified. When this combined data set is analyzed by kriging, samples coming from one stationary domain will influence the estimation of unknown concentrations in other domains, violating the assumptions of the stationary random function model and resulting in biased estimates.

Indicator kriging determines, by using the samples in the neighborhood, the probability that data values in a given area are greater than a defined threshold value (Journel 1983; Isaaks 1984). To conduct indicator kriging, data values are transformed into indicator values: original values which exceed the chosen threshold value are coded 1, and those below the threshold value are coded 0. These indicators are then analyzed to determine their spatial directional variability with a series of experimental variograms. By inspection of these variograms, orientations of greatest and least spatial continuity are selected.
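The indicator transform just described can be sketched in a few lines; the concentrations below are hypothetical, and the 3.0 ppm threshold is the one used later in this paper:

```python
# Sketch of the indicator transform: values exceeding the threshold are
# coded 1, those below are coded 0. The sample values are hypothetical.

def indicator_transform(values, threshold):
    """Return 0/1 indicator codes for a list of concentrations."""
    return [1 if v > threshold else 0 for v in values]

# Hypothetical TCE concentrations (ppm) and the 3.0 ppm threshold.
tce_ppm = [0.4, 5.2, 1.1, 12.0, 2.9, 3.5]
indicators = indicator_transform(tce_ppm, 3.0)
print(indicators)  # → [0, 1, 0, 1, 0, 1]
```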
Variogram models are fitted to the experimental variograms corresponding to these two directions. Then the indicator data are kriged using these variogram models to determine the probability of exceeding the threshold value at a series of desired grid locations. Though actual values from multiple local zones of contamination cannot be combined to provide unbiased estimates of the average contamination at a given unsampled location, experience has shown that it may be appropriate to combine the median indicator values and treat them as an outcome of a stationary random function model.

ANALYSIS PROCEDURES

Limitations to the use of ordinary kriging are illustrated with the data from this site. For these analyses, a threshold limit of 3.0 ppm TCE contamination was selected because it was close to the lowest value reported in any of the wells and slightly less than the drinking water standard of 5 mg/l. When data from the entire site were combined and evaluated by ordinary kriging procedures, the resulting bias produced over-estimation of the observed concentration values over much of the site (Table 1). The data were then divided into subregions defined by careful interpretation of geologic conditions at the site, as described previously. These subregional data sets were individually analyzed with ordinary kriging procedures, and although a lower degree of over-estimation of the observed values was observed (Table 1), the bias in these estimates was still considered unacceptable. Thus indicator kriging methods were used to determine if the assumptions of stationarity could be better satisfied, thereby providing more accurate ("unbiased") estimates (Table 1).

Table 1--Summary of cross-validation results, showing over- and under-estimation rates achieved by different analysis methods.

    Procedure                                        # False Positives   # False Negatives
    Ordinary Kriging on entire data set (Fig. 7)     25 (35%)            5 (7%)
    Ordinary Kriging on subregions (Fig. 9)          21 (29%)            4 (5%)
    Indicator Kriging on entire data set (Fig. 11)   19 (26%)            17 (24%)
    Indicator Kriging on subregions (Fig. 13)        13 (18%)            12 (17%)
Ordinary Kriging Using Data from Entire Site

Pairwise relative variograms were created using routines in GSLIB (Deutsch and Journel 1992) to determine directional anisotropies within the entire data set. Figure 4 shows the contour map of the resulting variogram surface. The main axis of anisotropy is aligned along the azimuth of 112.5° (see Figure 4), and the anisotropy ratio is 0.5. Figure 5 shows eight directional variograms oriented at 22.5° intervals. The modeled variogram uses a spherical model with a range of 700 ft, a sill of 1.2 ppm, and a nugget of 0.2 ppm.

Figure 6 shows results of ordinary block kriging of the entire data set using the above parameters, and a minimum of 3 and a maximum of 16 samples. The map suggests that almost all the areas covered by drill holes are contaminated at levels exceeding 3.0 ppm, although examination of the observational data revealed that only 43.5% of the drill holes exceed this value. Considerable bias toward over-estimation has apparently occurred (Table 1).

Figure 7 shows the results observed by cross-validation of kriged estimates and sampled values. Cross-validation allows testing of the estimation method at locations of existing samples. The sample value at a particular location is temporarily discarded from the sample data set; the value at the same location is then estimated using the remaining samples. The procedure is repeated for all available samples (Isaaks and Srivastava 1989). On Figure 7 (and also in Figures 9, 11, and 13), a circle enclosing a plus sign represents locations where the sample value is below 3.0 ppm, yet the kriged estimate is greater than 3.0 ppm. Thirty-six percent of the locations (25 of the 72 locations that had at least 3 samples within the search window) were estimated as contaminated (over 3.0 ppm) when, in reality, the sampled value was below 3.0 ppm. These results are summarized in Table 1.
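The cross-validation loop described above can be sketched as follows. Kriging itself is replaced here by a simple inverse-distance estimator as a stand-in, and the (x, y, value) samples are hypothetical:

```python
# Leave-one-out cross-validation sketch: each sample is temporarily
# discarded, re-estimated from the rest, and the estimate is compared with
# the true value against the 3.0 ppm threshold. Inverse-distance weighting
# stands in for kriging; the samples are hypothetical.
import math

samples = [(0, 0, 1.0), (100, 0, 4.2), (0, 100, 0.8),
           (100, 100, 6.1), (50, 50, 3.4)]

def idw_estimate(x, y, data, power=2):
    """Inverse-distance-weighted estimate at (x, y) from (sx, sy, sv) data."""
    num = den = 0.0
    for sx, sy, sv in data:
        d = math.hypot(x - sx, y - sy)
        if d == 0:
            return sv            # exact at a sample location
        w = 1.0 / d ** power
        num += w * sv
        den += w
    return num / den

threshold = 3.0
false_pos = false_neg = 0
for i, (x, y, v) in enumerate(samples):
    rest = samples[:i] + samples[i + 1:]      # temporarily discard sample i
    est = idw_estimate(x, y, rest)
    if est > threshold and v <= threshold:    # estimated contaminated, actually clean
        false_pos += 1
    elif est <= threshold and v > threshold:  # estimated clean, actually contaminated
        false_neg += 1
print(false_pos, false_neg)
```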
Figure 4. Contour map of the pairwise relative variogram surface for the TCE data set for the entire site.
Such over-estimation has important consequences: the kriging suggested that 86% of the area may be considered contaminated while only 43.5% of the samples showed such contamination. If the kriged values are accepted as correct, substantial remediation costs can be expected. Figure 7 also shows that 5 sample locations were estimated as not contaminated (under 3.0 ppm) when, in reality, the sampled value was above 3.0 ppm.
Figure 5. Eight directional experimental relative variograms and the fitted model for the data set for the entire site.
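As an illustration, the spherical model and geometric anisotropy quoted above (nugget 0.2, sill contribution 1.2, range 700 ft, anisotropy ratio 0.5 about an azimuth of 112.5°) might be coded as follows; the rotation convention is an assumption, since the paper does not state one:

```python
# Sketch of an anisotropic spherical variogram. Parameters are those quoted
# in the text; the azimuth-to-math-angle rotation is an assumed convention.
import math

NUGGET, SILL, RANGE_FT = 0.2, 1.2, 700.0
AZIMUTH_DEG, RATIO = 112.5, 0.5

def spherical(h, nugget=NUGGET, sill=SILL, a=RANGE_FT):
    if h == 0:
        return 0.0
    if h >= a:
        return nugget + sill
    return nugget + sill * (1.5 * h / a - 0.5 * (h / a) ** 3)

def anisotropic_distance(dx, dy):
    # Rotate into the major/minor axes, then stretch the minor axis so a
    # single isotropic range applies to the transformed distance.
    theta = math.radians(90.0 - AZIMUTH_DEG)   # azimuth -> math angle
    major = dx * math.cos(theta) + dy * math.sin(theta)
    minor = -dx * math.sin(theta) + dy * math.cos(theta)
    return math.hypot(major, minor / RATIO)

# Variogram value for a 350 ft separation along the major axis:
theta = math.radians(90.0 - AZIMUTH_DEG)
h = anisotropic_distance(350 * math.cos(theta), 350 * math.sin(theta))
print(round(spherical(h), 3))  # → 1.025
```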
Ordinary Kriging Using Data by Sub-regions
In order to produce data sets that can be viewed as outcomes of a second-order stationary model for kriging purposes, the entire data set was partitioned according to the six subregions described previously. Ordinary kriging procedures were applied to these individual subregions using the global variogram model given earlier. Figure 8 shows the results of this process, while Figure 9 shows the cross-validation plot of these results. By comparison with Figure 7, it can be seen that the over-estimation bias has been somewhat reduced, yet 21 (29%) sample locations remain over-estimated (Table 1).
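The ordinary kriging step applied within each subregion amounts to solving a small linear system for the sample weights. The sketch below uses the spherical model quoted earlier (treated isotropically for simplicity) and a hypothetical three-sample layout:

```python
# Minimal ordinary-kriging weight solve. gamma() is the spherical model
# quoted in the text (nugget 0.2, sill contribution 1.2, range 700 ft);
# the sample layout and target location are hypothetical.
import math

def gamma(h, nugget=0.2, sill=1.2, a=700.0):
    if h == 0:
        return 0.0
    if h >= a:
        return nugget + sill
    return nugget + sill * (1.5 * h / a - 0.5 * (h / a) ** 3)

def solve(A, b):
    """Naive Gauss-Jordan elimination with partial pivoting."""
    n = len(A)
    M = [row[:] + [bv] for row, bv in zip(A, b)]
    for col in range(n):
        piv = max(range(col, n), key=lambda r: abs(M[r][col]))
        M[col], M[piv] = M[piv], M[col]
        for r in range(n):
            if r != col and M[r][col] != 0.0:
                f = M[r][col] / M[col][col]
                for c in range(col, n + 1):
                    M[r][c] -= f * M[col][c]
    return [M[i][n] / M[i][i] for i in range(n)]

def ok_weights(sample_xy, target_xy):
    """Ordinary kriging weights: n gamma equations plus the sum-to-one row."""
    n = len(sample_xy)
    A = [[0.0] * (n + 1) for _ in range(n + 1)]
    b = [0.0] * (n + 1)
    for i, (xi, yi) in enumerate(sample_xy):
        for j, (xj, yj) in enumerate(sample_xy):
            A[i][j] = gamma(math.hypot(xi - xj, yi - yj))
        A[i][n] = A[n][i] = 1.0              # unbiasedness constraint terms
        b[i] = gamma(math.hypot(xi - target_xy[0], yi - target_xy[1]))
    b[n] = 1.0
    return solve(A, b)[:n]                   # drop the Lagrange multiplier

pts = [(0.0, 0.0), (200.0, 0.0), (0.0, 200.0)]
w = ok_weights(pts, (50.0, 50.0))
print([round(x, 3) for x in w], round(sum(w), 6))   # weights sum to 1
```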
Figure 6. Map showing results of ordinary kriging using the data set for the entire site.
Figure 7. Map showing cross-validations for ordinary kriging applied to the data set for the entire site.
Figure 8. Map showing results of ordinary kriging applied to the data sets for the subregions.
Figure 9. Map showing cross-validations for ordinary kriging applied to the data sets for the subregions.
Indicator Kriging Using Data from Entire Site
To further explore the applicability of estimators based on second-order stationary models for the data, indicator kriging was used to analyze the entire data set. The TCE data values were transformed into 0 and 1 values, depending on their values relative to the 3.0 ppm TCE threshold (the median of the sample values). Directional variograms were produced. Indicator kriging was then applied to produce a map of the probability of any location exceeding the 3.0 ppm threshold (Figure 10). Figure 11 shows the cross-validation plot of these probability estimates against the actual occurrence of sample values greater than 3.0 ppm (using probabilities of 50% or greater). There are 19 locations (26%) with false positives and 17 locations (24%) with false negatives (Table 1). The bias toward over-estimation has been further reduced, but additional reduction in the numbers of false positive and false negative locations appeared desirable.

Indicator Kriging Using Data by Sub-regions
The indicator kriging process was then independently applied to the individual sub-regional data sets. Figure 12 shows the estimated probability of exceeding 3.0 ppm in each block. Figure 13 shows the cross-validation plot for these resulting probability estimates against the actual occurrence of sample values greater than 3.0 ppm. A further reduction in the degree of over-estimation of the area of contamination exceeding the 3.0 ppm threshold is evident. There are 13 false positives (18%) and 12 false negatives (17%) (see also Table 1).
CONCLUSIONS
Mis-interpretation of the extent and degree of contamination at a site is likely to occur when traditional kriging is applied to a sample data set that does not account for the geological complexity of the site. This result is likely because:

• Kriging should not be applied to data sets having a coefficient of variation greater than 1.0, since a few high-concentration samples in such skewed data sets make the model assumptions inappropriate for the data at hand, resulting in biased estimation of local averages.

• One of the important assumptions of geostatistics is second-order stationarity. In order to apply kriging, a given data set must combine samples so that they can be conceptualized as an outcome of a second-order stationary random function. This means that the data being processed by geostatistical kriging should come from a single consistent population. Only data from similar contaminant sources and geologic environments are likely to satisfy the stationarity assumption of the model.
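The coefficient-of-variation screen in the first point above is easy to compute; the concentrations below are hypothetical:

```python
# Quick screening check: the coefficient of variation is the standard
# deviation divided by the mean. The concentration data are hypothetical.
import statistics

def coefficient_of_variation(values):
    return statistics.pstdev(values) / statistics.mean(values)

skewed = [1, 1, 2, 2, 3, 3, 4, 120]   # one extreme value inflates the CV
cv = coefficient_of_variation(skewed)
print(round(cv, 2), "apply ordinary kriging directly?", cv <= 1.0)
```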
For the example site discussed in this paper, the coefficient of variation for the bedrock TCE data is approximately 3.96, much greater than the limit of 1.0 defined above. At this site, because the TCE contamination appears to come from multiple point sources and to be
Figure 10. Map showing the estimated probability of threshold exceedance by indicator kriging applied to the data set for the entire site.
Figure 11. Map showing cross-validation of indicator kriging applied to the data set for the entire site.
Figure 12. Map showing the estimated probability of threshold exceedance by indicator kriging applied to the data sets for the subregions.
Figure 13. Map showing cross-validations for indicator kriging applied to the data sets for the subregions.
controlled by different ground-water flow regimes, the assumption of stationarity was not satisfied. Hence, application of the ordinary kriging technique to the entire site without subdivision gave biased and erroneously high estimates of local TCE values, "smearing" high TCE values into locations where they do not actually occur. Such a "smearing" effect presents a false impression of widespread TCE contamination throughout the site, and suggests the presence of a large contaminant plume. Cross-validation analysis provides a means for assessing the degree of bias, and therefore the appropriateness of the kriged estimates, at existing sample locations.
Indicator kriging was used to analyze bedrock TCE contamination data with a threshold limit of 3.0 ppm (the median of the data values). This method indicates that many areas within the site have low probabilities of being contaminated with TCE above this very low threshold level. Application of indicator kriging at higher threshold levels will define even more restricted areas as having significant probabilities for higher levels of TCE contamination. Analysis of the entire data set by indicator kriging procedures still resulted in slightly biased estimation; better results were obtained when indicator kriging was applied to subregional data sets. These results are summarized in Table 1.

Indicator kriging is thus proposed as an appropriate method for developing realistic estimates of contamination levels at many geologically complex sites. It provides a mechanism for substantially meeting the underlying assumptions of stationarity in the model. Coupled with a complete conceptualization of the geological and hydrological framework for the site, optimal estimates may be achieved by applying indicator kriging methods to subregional data sets that reflect geologic controls. This approach will identify the locations of misclassification bias, both over-estimation and under-estimation, and provide a more accurate assessment of contamination limits.

REFERENCES
Journel, A. G., and Huijbregts, C. J., 1975, Mining Geostatistics, Academic Press, New York, NY.

Isaaks, E., and Srivastava, R., 1989, An Introduction to Applied Geostatistics, Oxford University Press, New York, NY.

Journel, A. G., 1983, "Non-Parametric Estimation of Spatial Distribution," Mathematical Geology, Vol. 15, No. 3, pp. 445-468.

Isaaks, E., 1984, "Risk Qualified Mappings for Hazardous Waste Sites: A Case Study in Distribution-Free Geostatistics," Master's thesis, Stanford University, CA.

Deutsch, C. V., and Journel, A. G., 1992, GSLIB: Geostatistical Software Library and User's Guide, Oxford University Press, New York, NY.
Daniela Leonte 1 , and Neil Schofield 2
EVALUATION OF A SOIL CONTAMINATED SITE AND CLEAN-UP CRITERIA: A GEOSTATISTICAL APPROACH
REFERENCE: Leonte, D. and Schofield, N., "Evaluation of a Soil Contaminated Site and Clean-up Criteria: A Geostatistical Approach," Geostatistics for Environmental and Geotechnical Applications, ASTM STP 1283, R. Mohan Srivastava, Shahrokh Rouhani, Marc V. Cromer, A. Ivan Johnson, Alexander J. Desbarats, Eds., American Society for Testing and Materials, 1996.
ABSTRACT: A case study of soil contamination assessment and clean-up at a site proposed for residential development is presented in this paper. The contamination consists mainly of heavy metals, of which lead is the most important contaminant. The site has been sampled on an approximately 25 x 25 m grid, to between 1 and 3 meters depth, to evaluate the extent of the contamination. Three hotspots were identified based on visual inspection of the lead sample values and a crude contouring. A geostatistical approach is proposed to map the lead contamination and provide an alternative evaluation. The results suggest a significantly different evaluation of the area for clean-up, based on the probability of the lead concentration exceeding allowable levels.
KEYWORDS: soil contamination, hotspots, thresholds, indicator kriging, conditional probability, geostatistics
INTRODUCTION

The issue of contaminated land has only recently become of importance in Australia, although chemical contamination of land and groundwater has a long history, going back to the first years of European settlement. The actual extent of the problem is yet to be accurately determined, with some predictions placing the number of
1Senior Environmental Scientist, McLaren Hart Environmental Engineering (Australia), 54 Waterloo Road, North Ryde, NSW 2113.
2Manager, FSSI Consultants (Australia) Pty. Ltd., Suite 6, 3 Trelawney Street, Eastwood NSW 2122.
contaminated sites at around 10 000 [1]. Much of the regulatory framework dealing with the management of contaminated sites has been developed during the last decade. The Australian and New Zealand Guidelines for the Assessment and Management of Contaminated Sites, prepared jointly by the Australian and New Zealand Environment and Conservation Council (ANZECC) and the National Health and Medical Research Council (NHMRC), were released in January 1992 [2]. This document provides the unified framework within which individual States are developing their own legislation and guidance. Pollution control requirements are administered directly by the Environment Protection Authorities in New South Wales, Western Australia and Victoria, by the Department of Environment and Planning in South Australia, and by the Departments of Environment in Queensland and Tasmania.

Specifically for the soil medium, the present lack of a unified legislative approach results from a combination of factors, the most important being:

1. More than twenty different soil profiles exist in Australia, including many where there is a sharp distinction between various horizons; as a result, the natural levels and range of chemical components vary significantly throughout the country.

2. A myriad of different plant and animal species are unique to this continent [3].

3. The value of land is still driven by commercial rather than environmental factors.

Consequently, criteria-based standards which involve predetermined clean-up levels are not entirely favoured by either the public or the various regulatory bodies. The ANZECC/NHMRC document, recognising the need for flexibility, concluded that "the most appropriate approach for Australia is to adopt a combination of the site-specific and criteria-based standards".
This methodology incorporates, at a national level, a general set of management principles and soil quality guidelines which guide site assessment and clean-up action, obviating, where appropriate, the need to develop costly site-specific criteria. However, this approach also recognises that "every site is different" and that "in many cases, site-specific acceptance criteria and clean-up technologies will need to be developed to reflect local conditions". As a result, the national guidelines provide a set of criteria to assist in judging whether investigation of a site is necessary. Soil quality guidelines, based on a review of overseas information and Australian conditions, give investigation threshold criteria for human health (Table 1) and the environment (Table 2). Levels refer to the total concentrations of contaminants in dry soil, and have been defined from a small number of investigations in both urban and rural areas. Background criteria pertaining to the level of natural occurrence for various chemical components in soils are also specified in the guidelines. Site data with levels less than the criteria indicate that the quality of soil may be considered acceptable irrespective of land use, and that no further contamination assessment of the site is required. In cases where the contaminant concentration exceeds the
LEONTE AND SCHOFIELD ON CONTAMINATED SITE
135
criteria, a contamination problem may exist at the site and requires further assessment. As the guidelines do acknowledge that Table 2 is "conservative and has been set for the protection of groundwater", most state environmental regulatory agencies use these levels as a starting point for further investigation and determination of clean-up levels specifically for each site.

TABLE 1--Proposed Health Investigation Guidelines [2].
    Substance          Health Level, mg/kg
    Lead               300
    Arsenic (total)    100
    Cadmium            20
    Benzo(a)pyrene     1
TABLE 2--Proposed Environmental Soil Quality Guidelines [2].
    Substance     Background     Env. Investigation
    Antimony      4 - 44         20
    Arsenic       0.2 - 30       20
    Barium        20 - 200       -
    Cadmium       0.04 - 2       3
    Chromium      0.5 - 110      50
    Cobalt        2 - 170        -
    Copper        1 - 190        60
    Lead          <2 - 200       300
    Mercury       0.001 - 0.1    1
    Molybdenum    <1 - 20        -
    Nickel        2 - 400        60
    Zinc          2 - 180        200
These criteria have been applied to the geostatistical case study discussed below.
THE DATA SET
Site Description

The site considered in this study is an almost rectangular parcel of land of some 70 000 m2, having a generally flattish topography with a slight fall to the centre. Its entire history is not well recorded: the site is only known to have been occupied by a brewery from 1885 to 1910. The site is now vacant, all buildings having been removed between 1984 and 1986. The majority of the land in the region was used by timber merchants for the milling of timber from 1928 to 1980. It is
also known that the whole area, being a long strip of land along a wharf, is heavily contaminated. The contamination is associated with old practices of dumping both domestic and industrial residues, in times when legislation controlling waste disposal in Australia was nonexistent and mudflat "reclamation" practices of this manner were actually encouraged. These residues are known, from other nearby areas, to be of both Australian and overseas sources. A development proposal to use the site for a medium-density residential development comprising some 200 residential units and a retirement village complex initiated a site assessment as an initial evaluation of the potential for soil contamination.

The site was sampled by taking 54 mm diameter continuous cores from boreholes located on a grid of approximately 25 x 25 m. The boreholes were sampled every 500 mm to a depth of 3 m, and the samples were submitted for chemical analysis. The 1 000 - 1 500 mm and 1 500 - 2 000 mm layers were alternated between boreholes. The depth to which samples were taken was based on the depth to groundwater and natural soil, which was recorded on the borelogs. Samples were taken by splitting the core down the middle. The remaining core was retained for reference and, in the case of "hotspots", was used for further testing.

Sampling of Hotspots

All samples collected initially were analysed for a suite of parameters which included pH, Cadmium (Cd), Chromium (Cr), Copper (Cu), Nickel (Ni), Zinc (Zn), Lead (Pb), Mercury (Hg), Sulphur (S), Arsenic (As), Cobalt (Co), phenols, cyanides and total hydrocarbons. The chemical analysis revealed certain boreholes where concentrations of heavy metals, of which lead was the most important, were significantly higher than the global mean for the site. These groups of holes defined hotspots, which were investigated in detail to map the extent of the contamination. Four additional boreholes were drilled around each hotspot at approximately 12.5 m spacing.
Discussion of Chemical Analyses

Chemical analysis was carried out on a total of 378 samples. Elevated levels of lead were found in approximately 150 samples. Calculation of the arithmetic mean across the site for various contaminants showed that, in the case of lead, the mean was 572 ppm, with a high standard deviation which reflects the highly variable nature of the contamination. In hotspot areas, lead levels as high as 3.7% (37 000 ppm) were measured. Analysis of the lead levels across the site and down the individual boreholes showed the following arithmetic mean and standard deviation values (in ppm) for the number of data points (n) in each layer:
    Layer                    Mean      St. deviation   n
    1 (0 - 500 mm)           963.96    4 269.92        126
    2 (500 - 1 000 mm)       589.55    1 876.59        126
    3 (1 000 - 1 500 mm)     387.44    746.76          88
    4 (1 500 - 2 000 mm)     83.16     148.34          52
    5 (2 000 - 2 500 mm)     11.00     8.32            18
These statistics suggest a decrease in contamination concentration with depth, especially below 1.5 m. The number of samples in each layer also decreases with depth.

Findings of the Initial Site Assessment

Following chemical analysis of lead concentrations, results were used to delineate the hotspots by considering the midpoint between two nearby samples, generally following the "rule" of one sample showing a lead concentration above the acceptable limit and the other one below this limit. However, this rule was not obeyed for all sampling points. Specifically, in bore 109 (see Figure 2), lead levels of 2 450 ppm in the first 500 mm and 1 640 ppm in the 500 - 1 000 mm soil depth intervals were considered as being isolated, and mixing during the excavation of soil was recommended as a clean-up method. However, bores 76 and 75, located to the south and north of bore 109, indicated a much lower concentration in layer 1 and a higher concentration in layer 2, as shown below:
    Depth              B 76        B 75
    0 - 500 mm         470 ppm     14 ppm
    500 - 1 000 mm     2 650 ppm   3 150 ppm
    1 000 - 1 500 mm   180 ppm     not sampled
Furthermore, Figure 2 shows that two of the hotspots on site are located immediately to the east of bores 76 and 75. These details indicate that the extent of contamination in the north could be larger than that estimated in the first site assessment study. A similar situation is encountered in the south-east corner of the site, where boreholes on eastings 295 m and 320 m showed very high lead concentrations at one or several depth levels. Because high concentrations did not appear clustered on all levels, it was concluded in the earlier study that no contamination existed there.
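The midpoint rule used to delineate the hotspots can be sketched as follows; the borehole coordinates and lead values are hypothetical, and the 300 ppm action level is the investigation limit cited earlier:

```python
# Sketch of the midpoint rule: the hotspot boundary between a sample above
# the action level and a neighbouring sample below it is taken halfway
# between the two borehole locations. Coordinates/values are hypothetical.
ACTION_LEVEL_PPM = 300.0

def boundary_midpoint(loc_a, lead_a, loc_b, lead_b, limit=ACTION_LEVEL_PPM):
    """Return the midpoint if exactly one of the two samples exceeds the limit."""
    if (lead_a > limit) == (lead_b > limit):
        return None                      # rule does not apply: both same side
    return tuple((p + q) / 2.0 for p, q in zip(loc_a, loc_b))

print(boundary_midpoint((100.0, 40.0), 2450.0, (125.0, 40.0), 180.0))
# → (112.5, 40.0)
```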
GEOSTATISTICAL ANALYSIS OF LEAD DATA
A file of the lead data was created for use in a geostatistical study. The analysis was restricted to samples from the first three layers, where sampling has been relatively uniform.

Statistical Analysis of Lead Data

Figure 1 shows the declustered histogram of lead concentration in some 335 samples from the boreholes. The data have been declustered using the method of Schofield (1992) to account for the clustering of sampling in the hotspots. Lead shows a strongly positively skewed histogram with a very high coefficient of variation, related to the presence of a few extreme values in excess of 10 000 ppm. The mean of lead is 793 ppm, well in excess of the third quartile and over twice the limit of 300 ppm above which investigation is recommended. For lead concentrations between 100 and 1 200 ppm, the histogram may be well fitted with a lognormal distribution model. Table 3 compares the cumulative probabilities for the declustered data and for a lognormal distribution with the same mean and log variance.
Figure 1--Histogram of declustered lead concentrations in 500 mm samples. (Summary statistics: number of data 323; mean 693.8; variance 8 190 539; coefficient of variation 4.12; minimum 4.0; 1st quartile 20.0; median 105.0; 3rd quartile 430.0; maximum 40 800.0; MAD 665.5.)
Table 3--Comparison of the cumulative histogram of the declustered data and a lognormal distribution with the same mean and log variance.

    Lead, ppm   Cumulative Prob.         Cumulative Prob.
                Lognormal Distribution   Declustered Data
    100         0.48                     0.49
    300         0.70                     0.68
    500         0.78                     0.78
    700         0.83                     0.84
    900         0.86                     0.86
    1 200       0.89                     0.90
Spatial Distribution of Lead and Local Hotspots

Figure 2 presents a contour map of lead concentration generated from a moving average of the lead sample values. The moving average method is described by Isaaks and Srivastava (1989) as a useful tool for identifying local anomalies in the variability of data values. The map also shows the locations of the boreholes and the three hotspots
shaded in grey that were previously identified for clean-up. The contours show three or possibly four areas of significant lead contamination. The contours do not show any preferred directional structure to the lead contamination, but the lead concentration in the most easterly area is significantly higher than that in the other areas. The five samples with lead concentrations above 10 000 ppm all occur in the most eastern hotspot.

Global Lognormal Model for Lead

Swane et al. have suggested that some regulatory agencies in Australia may favour defining acceptable clean-up criteria in terms of the 75th percentile of a lognormal model. This means that if the 75th percentile of a lognormal distribution model applied to the global histogram of lead concentrations is higher than the recommended limit of 300 ppm lead, further investigation and possible clean-up would be recommended. For the present lead data set, the 75th percentile of a lognormal model is 470 ppm. By removing the five samples with the highest lead values, above 10 000 ppm, the 75th percentile is reduced to 296 ppm, which may indicate that an acceptable site clean-up has been achieved. Therefore the global lognormal model would only require remediation of small areas of extreme contamination in order to reduce the global level of contamination and satisfy the acceptance threshold of 300 ppm lead for the site.

Variogram Analysis of Lead
The spatial continuity of lead concentration in the soil has been characterised by a set of omni-directional indicator variograms for a range of relevant indicator thresholds. The actual spatial continuity measure used is the correlogram of Srivastava and Parker (1988). Analysis of directional variograms did not indicate any preferred orientation to the lead contamination. There is no reason to believe that some structured pattern of dumping lead-contaminated waste would have been used at the site. Figure 3 shows a plot of the omni-directional horizontal variograms. The limited vertical extent of sampling does not allow reasonable inference of the vertical continuity of lead contamination. The horizontal variograms indicate an omni-directional structure at all thresholds and an increasing nugget with increasing threshold. In all cases, the sample variograms have been reasonably fitted with a nugget and a single exponential model.

Indicator Kriging Model

The indicator kriging approach was used to map locally the probability that the lead contamination in soil exceeds certain threshold concentrations, some of which are used as clean-up criteria. The approach follows that of Isaaks (1984) in mapping lead concentration in soil around a lead smelter in Dallas, Texas, and that discussed by Journel (1988).
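The fitted models are a nugget plus a single exponential structure. A common convention (used, for example, in GSLIB) treats the quoted range as the practical range, giving gamma(h) = C0 + C1(1 - exp(-3h/a)); whether the authors used this convention is an assumption. The parameters below are those fitted for the 300 ppm indicator in Figure 3:

```python
# Sketch of a nugget-plus-exponential variogram model. Parameters are the
# Figure 3 fit for the 300 ppm indicator; the practical-range convention
# (factor of 3 in the exponent) is an assumed, GSLIB-style choice.
import math

def exp_variogram(h, c0=0.11, c1=0.88, a=61.0):
    if h == 0:
        return 0.0
    return c0 + c1 * (1.0 - math.exp(-3.0 * h / a))

# At the practical range the model reaches about 95% of the sill contribution:
print(round(exp_variogram(61.0), 3))  # → 0.946
```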
Figure 2--Contour map of lead concentrations in the first soil layer (0 - 500 mm) showing borehole and hotspot locations in grey shading.
LEONE AND SCHOFIELD ON CONTAMINATED SITE
Figure 3--Omni-directional horizontal sample variograms for lead and for several indicator thresholds. (Fitted model parameters: lead: C0 = 0.11 nugget, C1 = 0.91 exp, range 52; indicator 300 ppm: C0 = 0.11 nugget, C1 = 0.88 exp, range 61; indicator 900 ppm: C0 = 0.51 nugget, C1 = 0.51 exp, range 72.)

Estimates of the local conditional probability for the lead concentration to exceed given thresholds were made using indicator point kriging on a 10 metre square grid. Contours of the probability of exceedance for the 300 ppm lead threshold for layer 1 and layer 2 are shown in Figure 4. The previously identified hotspots of high lead contamination are also shown with grey shading. The maps indicate
areas with a probability of at least 70 percent that the lead concentration in samples exceeds 300 ppm. The hotspots identified for clean-up lie close to the contaminated centres of two of these areas. However, a large area of high lead contamination in the south-eastern part of the site (20 m to 120 m northing and 270 m to 320 m easting) has been ignored completely, most likely because remediation of the previously identified hotspots would ensure compliance under the 75th percentile of a global lognormal criterion for lead at the site.
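The indicator kriging step can be sketched in a few lines: ordinary kriging of the 0/1 indicator data yields, at each grid node, an estimate of the probability of exceeding the threshold. This is a minimal sketch, not the authors' implementation; the covariance parameters echo the fitted 300 ppm indicator model (nugget 0.11, sill 0.88, range 61) under an assumed practical-range convention, and the sample coordinates and indicators are invented:

```python
import numpy as np

def exp_cov(h, nugget=0.11, sill=0.88, rng_param=61.0):
    """Exponential covariance model; nugget contributes only at h = 0."""
    return sill * np.exp(-3.0 * h / rng_param) + nugget * (h == 0)

def ik_probability(coords, indicators, x0):
    """Ordinary kriging of indicator data: the estimate at x0 is the
    conditional probability that the threshold is exceeded there."""
    n = len(indicators)
    d = np.linalg.norm(coords[:, None, :] - coords[None, :, :], axis=2)
    K = np.ones((n + 1, n + 1))          # kriging matrix with
    K[:n, :n] = exp_cov(d)               # unbiasedness constraint
    K[n, n] = 0.0
    rhs = np.ones(n + 1)
    rhs[:n] = exp_cov(np.linalg.norm(coords - x0, axis=1))
    lam = np.linalg.solve(K, rhs)[:n]
    # Clip order-relation violations to the [0, 1] probability range.
    return float(np.clip(lam @ indicators, 0.0, 1.0))

# Invented borehole locations and 300 ppm exceedance indicators.
coords = np.array([[0.0, 0.0], [10.0, 0.0], [0.0, 10.0], [30.0, 30.0]])
ind_300 = np.array([1.0, 1.0, 0.0, 0.0])   # 1 where Pb > 300 ppm
p = ik_probability(coords, ind_300, np.array([5.0, 5.0]))
```

Repeating this at every node of the 10 m grid produces the probability surfaces contoured in Figures 4 and 5.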
Figure 4--Contours of the conditional probability for lead concentration to exceed the recommended level of 300 ppm, for layer 1 and layer 2. Hotspots are shown by grey shading.
Figure 5 presents contour maps of the estimated probability for lead concentration to exceed 500 ppm in soils for layers 1 and 2. On these maps, the areas with very low probability of contamination are clearly shown. The areas with potentially high contamination are also clear, with the southern area (20 m to 120 m northing and 270 m to 320 m easting) again standing out as unrecognised by the previous investigation.
Figure 5--Contours of the conditional probability for lead concentration to exceed the recommended level of 500 ppm, for layer 1 and layer 2. Hotspots are shown by grey shading.
CONCLUSIONS
The recommendation of the ANZECC/NHMRC document for the use of both criteria-based and site-specific standards to assess soil contamination and clean-up is supported by the authors of this paper. The use of a universal or blanket standard for assessment of all sites appears inappropriate. This conclusion is supported by the outcome of applying the 75th percentile of a lognormal model criterion to the site in question in this paper. The cleaning of small areas of extreme contamination may often reduce the global level of contamination below some acceptance threshold. However, large areas carrying a significant risk of contamination above the acceptance threshold may go unrecognised and uncleaned.

The application of geostatistical methods to analyse and model the lead contamination at this site appears appropriate. The dumping of lead contaminated material at the site does not seem to have been highly organised, introducing considerable uncertainty as to the exact location of the contamination. Subsequent migration of the lead in soil due to natural processes has likely modified the spatial distribution of lead, introducing greater complexity and uncertainty into its spatial geometry.

Indicator kriging has enabled a mapping of the lead contamination at a local scale which permits an assessment of the risk associated with certain levels of contamination. When compared to previous attempts to identify areas of significant contamination (hotspots), the IK mapping indicates much larger areas associated with those hotspots where the risk of contamination is high. In addition, a large area of significant contamination which had previously gone unrecognised due to a naive decision rule has been identified through geostatistical analysis.
Although other techniques would have enabled estimation of the global lead contamination at the site by accounting for its specific directions of spatial continuity, the IK tool uniquely introduces the risk factor through the quantification of the uncertainty associated with the estimation process. Making decisions on the extent and nature of remedial action to be implemented therefore becomes a more informed process in which clean-up cost and the potential liability associated with it can be evaluated.
REFERENCES
[1] M.G. Knight, "Scale of the hazardous waste problem in Australia and disposal practice," Symposium on Environmental Geotechnics and Problematic Soils and Rocks, Bangkok: Asian Institute of Technology, South-east Asian Geomechanics Society, 1985.

[2] Australian and New Zealand Environment and Conservation Council, and National Health and Medical Research Council (ANZECC/NHMRC), Australian and New Zealand Guidelines for the Assessment and Management of Contaminated Sites, January 1992.

[3] J. Daffern, C.M. Gerard and R. McFarland, "Regulatory and non-regulatory control of contaminated sites," Geotechnical Management of Waste and Contamination, Fell, Phillips and Gerrard (editors), Balkema, Rotterdam, 1993.

[4] E.H. Isaaks, Risk Qualified Mappings for Hazardous Waste Sites: A Case Study in Distribution Free Geostatistics, Master's thesis, Stanford University, 1984.

[5] E.H. Isaaks and R.M. Srivastava, An Introduction to Applied Geostatistics, Oxford University Press, 1989.

[6] A.G. Journel, "Non-parametric geostatistics for risk and additional sampling assessment," Principles of Environmental Sampling, L. Keith (ed.), American Chemical Society, 1988.

[7] N.A. Schofield, "Using the entropy statistic to infer population parameters from spatially clustered sampling," Proceedings of the 4th International Geostatistical Congress, Troia 92, pages 109-120, Kluwer, Holland, 1992.

[8] R.M. Srivastava and H. Parker, "Robust measures of spatial continuity," M. Armstrong (ed.), Third International Geostatistics Congress, D. Reidel, Dordrecht, Holland, 1988.

[9] M. Swane, I.C. Dunbavan and P. Riddell, "Remediation of contaminated sites in Australia," Fell, Phillips and Gerrard (editors), Geotechnical Management of Waste and Contamination, Balkema, Rotterdam, 1993, pp. 127-141.
Amilcar O. Soares¹, Pedro J. Patinha², Maria J. Pereira²
STOCHASTIC SIMULATION OF SPACE-TIME SERIES: APPLICATION TO A RIVER WATER QUALITY MODELLING
REFERENCE: Soares, A. O., Patinha, P. J., Pereira, M. J., "Stochastic Simulation of Space-Time Series: Application to a River Water Quality Modelling," Geostatistics for Environmental and Geotechnical Applications, ASTM STP 1283, R. Mohan Srivastava, Shahrokh Rouhani, Marc V. Cromer, A. Ivan Johnson, Alexander J. Desbarats, Eds., American Society for Testing and Materials, 1996.
ABSTRACT: This study aims to develop a methodology to simulate the joint behaviour, in space and time, of some water quality indicators of a river, resulting from a mine effluent discharge, in order to enable the prediction of extreme scenarios for the entire system. Considering one pollutant characteristic measured in N monitoring stations along the time T, a random function X(e,t), e=1,...,N, t=1,...,T, can be defined. The proposed methodology, a data driven approach, intends to simulate the realisation of a variable located in station e at time t, based on values located before e and t, and using a sequential algorithm. To simulate one value from the cumulative distribution F{X(e,t) | x(e-1,t), ..., x(1,t), x(e,t-1), ..., x(e,1)}, the basic idea of the proposed methodology is to replace the conditioning values by a linear combination of those:

   [x(e,t)]* = Σ_{α=1}^{e-1} a_α x(α,t) + Σ_{β=1}^{t-1} b_β x(e,β)

which allows the values to be drawn sequentially from bidistributions. The final simulated time series of pH and dissolved oxygen reproduce the basic statistics and the experimental time and spatial covariances calculated from historical data recorded over 15 months at a selected number of monitoring stations on a river with an effluent discharge of a copper mine located in the south of Portugal.
KEYWORDS: stochastic simulation, space-time series, water quality
¹Professor, CVRM - Centro de Valorização de Recursos Minerais, Instituto Superior Técnico, Av. Rovisco Pais, 1096 Lisboa Codex, Portugal. ²Research Fellow, CVRM - Centro de Valorização de Recursos Minerais, Instituto Superior Técnico, Av. Rovisco Pais, 1096 Lisboa Codex, Portugal.
SOARES ET AL. ON SPACE-TIME SERIES
RIVER WATER QUALITY MODELLING

The main objective of this study is to develop a methodology to simulate the behaviour, in space and time, of some water quality indicators of a river resulting from a mine effluent discharge, in order to enable the prediction of extreme scenarios for the entire system. The first results of the model application, in a case study of a mine located in the south of Portugal, will be presented. The time series of pH and dissolved oxygen were simulated, using historical data recorded over time at a selected number of monitoring stations on a river at Neves Corvo mine. The occurrence of extreme situations in the joint simulated time series (simultaneous high spikes of the parameters, and the persistence in time and space of one extreme situation) can be visualised.
MODELLING METHODOLOGY: STOCHASTIC SIMULATION OF SPACE-TIME SERIES

The proposed methodology is a data driven approach of sequential simulation of a random vector. Defining N dependent random variables X1, X2, ..., XN, the simulation of the cdf F(x1, x2, ..., xN) can be generated by a conditional distribution approach (Law and Kelton 1982, Johnson 1987, Ripley 1987), involving the following sequential procedure:

· draw the first value x1 from the marginal distribution F(X1) of the RV X1
· draw the second value x2 from the distribution of RV X2 conditioned on X1=x1: F(X2 | X1=x1)
· draw the Nth value from the conditional distribution: F(XN | X1=x1, X2=x2, ..., XN-1=xN-1)

The random variables X1, X2, ..., XN could be the same attribute of water quality measured at different time and spatial locations. The main limitation of the practical implementation of this algorithm is the calculation of all these conditional distributions, considering that, in most applications of earth and environmental sciences, just one realisation of X1, X2, ..., XN is available. Journel and Gomez-Hernandez (1989) presented one solution for spatial realisations of F(X1, X2, ..., XN), in the geostatistical framework, using the indicator formalism (Sequential Indicator Simulation - SIS) or the multiGaussian approach (Sequential Gaussian Simulation - SGS) (Deutsch and Journel 1992). Both methods assume spatial stationarity: the stationarity of the indicator random function in the SIS method or the stationarity of the Gaussian transform function in SGS. Some cases, like a pollutant along the river coming from one point source, cannot be considered spatially stationary. As a matter of fact, in the case study presented here, some sharp drops in the pollutant concentration are found between two monitoring stations. Thus, the application of the mentioned methodologies is not straightforward.
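In the Gaussian case the conditional distribution approach above can be written down exactly, since each F(Xk | X1, ..., Xk-1) is itself normal. The sketch below, with an invented 3-variable covariance matrix, draws realisations sequentially and checks that the target covariance is reproduced on average:

```python
import numpy as np

def sequential_gaussian_draw(mean, cov, rng):
    """Draw one realisation of (X1, ..., XN) by the conditional
    distribution method: X1 from its marginal, then each Xk from the
    normal distribution conditioned on the values already drawn."""
    n = len(mean)
    x = np.empty(n)
    for k in range(n):
        if k == 0:
            m, v = mean[0], cov[0, 0]
        else:
            S11 = cov[:k, :k]            # covariance of drawn values
            s12 = cov[:k, k]             # their covariance with Xk
            w = np.linalg.solve(S11, s12)
            m = mean[k] + w @ (x[:k] - mean[:k])   # conditional mean
            v = cov[k, k] - w @ s12                # conditional variance
        x[k] = rng.normal(m, np.sqrt(v))
    return x

# Invented target model: three correlated variables.
rng = np.random.default_rng(42)
mean = np.zeros(3)
cov = np.array([[1.0, 0.6, 0.3],
                [0.6, 1.0, 0.6],
                [0.3, 0.6, 1.0]])
draws = np.array([sequential_gaussian_draw(mean, cov, rng) for _ in range(5000)])
emp_cov = np.cov(draws.T)  # should approach the target covariance
```

The limitation noted in the text is precisely that, outside the Gaussian (or indicator) framework, these conditional distributions are not available from a single realisation, which motivates the bidistribution approach that follows.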
Considering the pollutant characteristic measured in N monitoring stations over time T, the random function X(e,t), e=1,...,N, t=1,...,T, can be defined. The previous sequential procedure is written for X(e,t):

Monitoring Station 1
t=1: draw a value x1 of F(X(1,1))
t=2: draw a value x2 of F(X(1,2) | X(1,1)=x1)
...
t=T: draw a value xT of F(X(1,T) | X(1,1)=x1, ..., X(1,T-1)=xT-1)

Monitoring Station N
t=1: draw a value x1 of F(X(N,1))
t=2: draw a value x2 of F(X(N,2) | X(N,1)=x1)
...
t=T: draw a value xT of F(X(N,T) | X(N,1)=x1, ..., X(N,T-1)=xT-1)

The basic idea of the proposed methodology is to substitute the NxT conditioning values by a linear combination of these:

   [X(e,t)]* = Σ_{α=1}^{N-1} a_α X(α,t) + Σ_{β=1}^{T-1} b_β X(e,β)

With this approximation, to simulate one value of X in monitoring station e at time t, instead of using the cdf F[X(e,t) | X(1,1), ..., X(e-1,t-1)] one uses the bidistribution:

   F( X(e,t) | [X(e,t)]* = Σ_{α=1}^{N-1} a_α X(α,t) + Σ_{β=1}^{T-1} b_β X(e,β) )

Thus, for all monitoring stations we have:

Monitoring Station 1
t=1: draw a value x1 of F(X(1,1))
t=2: draw a value x2 of F(X(1,2) | [X(1,2)]* = X(1,1) = x1)
...
t=T: draw a value xT of F(X(1,T) | [X(1,T)]* = b1 X(1,1) + ... + b_{T-1} X(1,T-1)) = F(X(1,T) | b1 x1 + ... + b_{T-1} xT-1)

Monitoring Station N
t=1: draw a value x1 of F(X(N,1))
t=2: draw a value x2 of F(X(N,2) | [X(N,2)]* = X(N,1) = x1)
...
t=T: draw a value xT of F(X(N,T) | [X(N,T)]* = Σ_{α=1}^{N-1} a_α x(α,T) + Σ_{β=1}^{T-1} b_β x(N,β))

Practical Implementation of the Algorithm
If the same pattern of neighbourhood values (in space and time) is used to calculate all [X(e,t)]*, e=1,...,N, t=1,...,T, the bidistribution functions F(X(e,t) | [X(e,t)]*) can be inferred from the historical data. Using the same neighbourhood pattern one can calculate the pairs (X(e,t), [X(e,t)]*) for all t=1,...,T and e=1,...,N. The bi-plots (X(e,t), [X(e,t)]*) can thus be calculated for each monitoring station and for homogeneous periods of time (see Fig. 1). To simulate one value for X(e,t) conditioned on the known value [X(e,t)]* = x* (calculated with the previously simulated values X(e-1,t-1)), first we need to select all pairs which belong to the class of [X(e,t)]* and, afterwards, one value of X(e,t) is drawn randomly from them. Once the value xs is simulated, X(e,t)=xs will be part of the next estimators [X(e+1,t)]* or [X(e,t+1)]*.
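This class-selection step can be sketched as follows; the historical record here is synthetic (in the paper the pairs come from the monitored pH and dissolved oxygen series), and the class discretisation of x* is an assumed implementation choice:

```python
import numpy as np

def simulate_from_bidistribution(x_star, hist_x, hist_xstar, rng, n_classes=10):
    """Draw a simulated value of X(e,t): select the historical pairs
    (X, X*) whose X* falls in the same class as the current x*, then
    draw one X value at random from that class."""
    edges = np.quantile(hist_xstar, np.linspace(0, 1, n_classes + 1))
    k = int(np.clip(np.searchsorted(edges, x_star) - 1, 0, n_classes - 1))
    in_class = (hist_xstar >= edges[k]) & (hist_xstar <= edges[k + 1])
    pool = hist_x[in_class]
    if pool.size == 0:          # fall back to the nearest historical pair
        pool = hist_x[[np.argmin(np.abs(hist_xstar - x_star))]]
    return rng.choice(pool)

# Synthetic historical record: X* is a noisy linear combination of X.
rng = np.random.default_rng(7)
hist_x = rng.normal(7.0, 1.0, size=400)            # e.g. pH values
hist_xstar = 0.8 * hist_x + rng.normal(0, 0.3, 400)

xs = simulate_from_bidistribution(0.8 * 7.5, hist_x, hist_xstar, rng)
```

Each simulated value is then fed back into the linear combinations [X(e+1,t)]* and [X(e,t+1)]* for the next draws, exactly as the text describes.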
Fig 1. Illustrative representation of the simulation procedure from the bi-distribution F(X(e,t) | x* = [X(e,t)]*).
STATISTICAL CHARACTERISATION OF SIMULATED TIME-SERIES

With this type of data driven approach one usually wishes to mimic the real data in some statistics of fundamental variables, which measure their spatial and time structures. The proposed algorithm generates a time series in each monitoring station with some statistics identical to the historical data: means, marginal histograms and spatial and time correlations. Consider, for each monitoring station e, the statistics of historical data:

m(e) - mean of the time data
F_X(x;e) = Prob{X(e,t) < x} - the marginal cdf of e
Ce(h) = E{X(e,t) X(e,t+h)} - m(e)² - the time covariance in e
C(h) = E{X(e,t) X(e+h,t)} - m(e)·m(e+h) - the spatial covariance

Ce(h) and C(h) are measures of the spatial and time structure of the variable X(e,t). If the simulation sequence starts with a limited set of conditioning values with the marginal distribution F_X(x;e), the bidistribution F(X(e,t) | [X(e,t)]*) will generate a simulated time series with the same mean, m_s(e) = m(e), and the same marginal cdf, F_Xs(x;e) = F_X(x;e). The simulated values Xs(e,t) reproduce the covariances between X(e,t) and [X(e,t)]*:
   cov{X(e,t), [X(e,t)]*} = Σ_{α=1}^{e-1} a_α cov[X(e,t), X(α,t)] + Σ_{β=1}^{t-1} b_β cov[X(e,t), X(e,β)]   [1]

The simulated values Xs(e,t) reproduce an average of the time and space covariances. Consequently, the weights a_α and b_β must be chosen in such a way that the individual covariances are represented in the final time series.
DEFINITION OF THE LINEAR COMBINATION [X(e,t)]*

To define [X(e,t)]* = Σ_{α=1}^{N-1} a_α X(α,t) + Σ_{β=1}^{T-1} b_β X(e,β) one could choose a centered and minimum variance estimator: E{X(e,t)} = E{[X(e,t)]*} and min(var{X(e,t) - [X(e,t)]*}), which leads to a kriging system written in terms of time and space covariances. For simplicity's sake let us use the same notation for the weights and for the spatial and time locations. The estimator of any location x0 is written:

   [X(x0)]* = Σ_α λ_α X(x_α)

   Σ_β λ_β C(x_α, x_β) + μ = C(x_α, x0)
   Σ_β λ_β = 1

The solution vector λ combines two distinct effects:
i) The proximity of the samples to the estimated point x0 determines their influence in terms of weights (2nd member of the kriging system).
ii) The declustering effect of the first member of the kriging system usually leads to underweighting of clustered samples.

Now the problem is that the samples in time series are usually clustered. Thus, due to the declustering effect, this minimum variance estimator tends to overweight the influence of the nearest sample and underweight the others. Consequently, the average covariance of [1] represents mainly the short-distance structures. To avoid this drawback, in this study an estimator has been chosen which accounts only for the proximity effect, i.e., the weights are directly proportional to the correlation coefficient (ρ) between any sample x_α and the estimated point x0:

   λ_α = ρ_{α,0} + (1/N) [ 1 - Σ_β ρ_{β,0} ]

Note: this is the solution of the kriging system when there is null correlation between the conditioning samples x_α.
Obviously, this is no longer a minimum variance estimator, but it assures the representativity of all covariances (in space and time) in the simulated time series.
In summary, the estimator [X(e,t)]* = Σ_{α=1}^{N-1} a_α X(α,t) + Σ_{β=1}^{T-1} b_β X(e,β) is defined by the weights:

   a_α = ρ_{α,0} + (1/(N·T)) [ 1 - ( Σ_{α=1}^{e-1} ρ_{α,0} + Σ_{β=1}^{t-1} ρ_{β,0} ) ]

   b_β = ρ_{β,0} + (1/(N·T)) [ 1 - ( Σ_{α=1}^{e-1} ρ_{α,0} + Σ_{β=1}^{t-1} ρ_{β,0} ) ]

Note - This estimator was chosen for this particular case to avoid the overweighting of the small-distance structures resulting from the ordinary kriging of clustered string data. The solution consisted of putting all covariances between samples equal to zero. However, other corrections of the covariances between samples can be adopted (for example, Deutsch 1994 suggests a kriging estimator with a Journel redundancy measure correction of the 1st member of the equations system, for a similar purpose).
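The correlation-proportional weighting can be sketched in a few lines; the correction term simply redistributes whatever is needed for the weights to sum to one (the correlation values below are invented for the example):

```python
import numpy as np

def proximity_weights(rho):
    """Weights directly proportional to the sample-to-target correlation,
    corrected so they sum to one: the kriging solution when the
    conditioning samples are assumed mutually uncorrelated."""
    rho = np.asarray(rho, dtype=float)
    n = rho.size
    return rho + (1.0 - rho.sum()) / n   # equal share of the deficit

# Example: four conditioning samples with decreasing correlation to x0.
lam = proximity_weights([0.9, 0.7, 0.4, 0.2])
# lam keeps the proximity ordering while summing to one.
```

Because the correction is a constant shift, the ordering of the weights follows the ordering of the correlations, so no single nearby sample dominates the way it can under ordinary kriging of clustered strings.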
MODEL VALIDATION WITH A CASE STUDY

The data set used to implement the stochastic simulation consisted of monitored values of pH (daily analysis in lab) and dissolved oxygen at 4 monitoring stations located along a river with an effluent discharge of a mine (Fig. 2), during a period of 15 months. The monitored data are shown in Figs. 3a, 3b and Figs. 4a, 4b, representing the time series, histograms and time variograms of the 15 month period for pH and dissolved oxygen.
Fig 2 - Layout of the mine site and the 4 monitoring stations along the river.
Fig. 3a - Time series, histograms of pH historical data.
Fig. 3b - Time variograms of pH historical data.
Two different simulations were implemented: a conditional simulation in which the input is the real time series of the first station, corresponding to the mine effluent, and a non-conditional simulation where all time series (including the mine effluent) were simulated.

Conditional Simulation

The estimators [X(e,t)]* were calculated with four samples before time t and two samples spatially located before e:

   [X(e,t)]* = a1 X(e-1,t) + a2 X(e-2,t) + b1 X(e,t-1) + b2 X(e,t-2) + b3 X(e,t-3) + b4 X(e,t-4)

Based on experimental bi-plots {X(e,t), [X(e,t)]*} for each monitoring station, the simulation procedure has been initialised with the real time series of the first station. The resulting simulated time series of the remaining 3 stations are shown in Figs. 5, 6 and 7a, 7b, for the two elements studied. The time correlations of the real data are quite satisfactorily reproduced in the variograms of the simulated values.
Fig. 4a - Time series, histograms of Dissolved Oxygen historical data.
Fig. 4b - Time variograms of Dissolved Oxygen historical data.
Non-Conditional Simulation

The non-conditional simulation is presented only for pH. The time series of all monitoring stations were simulated, including the first one corresponding to the mine effluent, based on experimental bi-histograms {X(e,t), [X(e,t)]*}. The simulated time series of the four stations are reproduced in Figs. 8a, 8b, as well as the time covariances of the simulated values.

Discussion of the Results

The proposed simulation methodology presented very satisfactory results regarding the main objectives: generation of a time series with the same basic statistics and time and spatial correlation as the observed historical data. The first simulation, as it is conditioned to the experimental data of the first monitoring station, reproduces at the remaining stations not only the variograms but also the main features of the experimental time series. The non-conditional simulation generated a time series of pH with excellent reproduction of the spatial and time correlation of the historical data.
Fig. 5 - Conditional Simulation of pH time series and histograms.
Fig. 6 - Conditional Simulation of Dissolved Oxygen time series and histograms.
Fig. 7a - Conditional Simulation of pH time variograms.
Fig. 7b - Conditional Simulation of Dissolved Oxygen time variograms.
Fig. 8a - Non Conditional Simulation of pH time series and histograms.
Fig. 8b - Non Conditional Simulation of pH time variograms.
CONCLUSIONS

The presented stochastic data driven approach aims to generate a set of realisations reproducing some basic statistics regarding the contiguity in space and time of relevant variables of a river's water quality. With the stochastic realisations of time-series one can visualise the joint behaviour of the water quality characteristics and predict extreme scenarios in the environmental system. The crucial point of the proposed methodology consists of the definition of the bidistributions between real and estimated values. They should generate posterior time-series reproducing the same space and time covariances and marginal histograms for each monitoring station as the equivalent statistics of the historical data. This means that the pattern of neighbourhood values of a space and time location to be simulated must be chosen in order to represent the relevant covariances in space and time. This simple and easy to implement data driven approach has one limitation: one can generate time-series only at the spatial locations (and with the time periodicity) of the historical data, corresponding to the monitoring stations of the presented case study. However, in these situations it typically is useless to simulate the time behaviour of a pollutant between two monitoring stations. Unless one has another pollutant source
between them, any expected value in the middle of two stations belongs to the interval of its values.
REFERENCES

Deutsch, C., and Journel, A., 1992, GSLIB: Geostatistical Software Library and User's Guide, Oxford University Press, New York.
Deutsch, C., 1994, "Kriging with Strings of Data," Mathematical Geology, Vol. 26, No. 5, pp. 623-638.
Johnson, M., 1987, Multivariate Statistical Simulation, John Wiley & Sons, New York.
Journel, A., and Gomez-Hernandez, J., 1989, Stochastic Imaging of the Wilmington Clastic Sequence, SPE paper #19857.
Law, A., and Kelton, D., 1982, Simulation Modeling and Analysis, McGraw-Hill Int. Ed., New York.
Ripley, B., 1987, Stochastic Simulation, John Wiley & Sons, New York.
Gary N. Kuhn¹, Wayne E. Woldt², David D. Jones², Dennis D. Schulte³
SOLID WASTE DISPOSAL SITE CHARACTERIZATION USING NON-INTRUSIVE ELECTROMAGNETIC SURVEY TECHNIQUES AND GEOSTATISTICS
REFERENCE: Kuhn G. N., Woldt W. E., Jones D. D., Schulte D. D., "Solid Waste Disposal Site Characterization Using Non-Intrusive Electromagnetic Survey Techniques and Geostatistics," Geostatistics for Environmental and Geotechnical Applications, ASTM STP 1283, R. M. Srivastava, S. Rouhani, M. V. Cromer, A. I. Johnson, A. J. Desbarats, Eds., American Society for Testing and Materials, Philadelphia, 1995.

ABSTRACT: Prior to the research reported in this paper, a site-specific hydrogeologic investigation was developed for a closed solid waste facility in Eastern Nebraska using phased subsurface characterizations. Based on the findings of this prior investigation, a surface based geoelectric survey using electromagnetic induction to measure subsurface conductivity was implemented to delineate the vertical and horizontal extent of buried waste and subsurface contamination. This technique proved to be a key non-intrusive, cost-effective element in the refinement of the second phase of the hydrogeologic investigation. Three-dimensional ordinary kriging was used to estimate conductivity values at unsampled locations. These estimates were utilized to prepare a contaminant plume map and a cross section depicting interpreted subsurface features. Pertinent subsurface features were identified by associating a unique range of conductivity values with solid waste, saturated and unsaturated soils, and possible leachate migrating from the identified disposal areas.

KEYWORDS: Geoelectrics, Electromagnetics, Geostatistics, Conductivity, Leachate, Hydrogeology, Vadose
¹Graduate Student, University of Nebraska - Lincoln, Department of Biological Systems Engineering. ²Assistant Professor, University of Nebraska - Lincoln, Department of Biological Systems Engineering. ³Professor, University of Nebraska - Lincoln, Department of Biological Systems Engineering.
KUHN ET AL. ON SOLID WASTE DISPOSAL
INTRODUCTION
Past landfill management and operational practices in the United States have created environmental problems and are commonly associated with soil and groundwater contamination. These landfills have either been upgraded to meet current State and Federal legislation or have closed. As a result, the number of operational landfills has decreased from over 20,000 in 1978 to approximately 3,300 in 1994. Because of strict State and Federal legislation passed dealing with closure of these landfills, the severe impact that past operations had on the environment is becoming more apparent. For example, in 1987 the State of Nebraska required the Nebraska Department of Environmental Quality (NDEQ) to conduct a comprehensive assessment of all community solid waste disposal sites (SWDS). The purpose of this assessment was to ascertain compliance of SWDS with standards established by the Nebraska Environmental Protection Act (NEPA) and the Federal Resource Conservation and Recovery Act Subtitle D (RCRA Subtitle D) (SCS 1991). In 1991, nearly 210 landfills in Nebraska had ceased operations or were recommended for closure by NDEQ. This recommendation was based on insufficient capacities or significant constraints imposed on owners to maintain compliance with RCRA Subtitle D requirements (SCS 1991). Due to the risk posed to Nebraska's surface and groundwater by the existence of unlicensed landfills, the NDEQ focused on closure activities. Information compiled by NDEQ identified sites located near public or private drinking water sources, that were underlain by a shallow water table surface, or that were located in a 100 year flood plain. The survey revealed that a significant number of SWDS required further investigation. Based on this study, the NDEQ directed most of its efforts at the currently unregulated sites utilized primarily by rural communities (NDEQ 1990).
These sites had not been subject to regulation since 1972, when the Whitney Amendment to the Nebraska Environmental Protection Act specifically exempted all cities of the second class (5,000 population or less) and villages from state solid waste rules and regulations (NDEQ 1990). Although these sites were exempt from state solid waste regulations, this amendment was revoked in 1991 when RCRA Subtitle D was reauthorized. As solid waste disposal sites are forced to close over the next few years, hundreds of millions of dollars will be spent in the United States to identify, characterize and remediate sites contaminated with hazardous materials. Traditional site investigation techniques typically include compiling hydrogeologic and contaminant fate and transport information from testhole and groundwater monitoring well data. Commonly, background information is limited until the results of the first round of groundwater samples are available. Only then does it become apparent that the plume may not be completely delineated and that additional monitoring wells are required.
GEOSTATISTICAL APPLICATIONS
To address this problem, non-intrusive field methods and geostatistical analysis tools were utilized to gather preliminary subsurface information pertaining to hydrogeologic features, horizontal and vertical extents of wastes, and suspected leachate plumes. This information can be utilized to optimize the locations, and minimize the number, of testholes or permanent monitoring wells.
LITERATURE REVIEW

Electromagnetic (EM) surveying techniques, combined with appropriate geostatistical analyses, are rapid and non-intrusive methods of characterizing subsurface environments. The non-intrusive nature of this technique reduces the need for drilling or other intrusive investigative tools. EM surveying is based on the principle of utilizing varying subsurface conductivity measurements as an indication of differing geologic and/or other subsurface constructs. Surface electrical methods have been used successfully in many types of subsurface investigations. Kelly (1976) showed that the d-c resistivity method can be effective in delineating a plume moving off-site from a landfill. The use of EM data sources for delineation of contaminated groundwater has been described by Greenhouse and Slaine (1986). French et al. (1988) utilized geoelectric surveying to identify anomalous regions on which to focus subsequent boring and sampling activities. Hagemeister (1993) identified potential waste volumes and suspected contaminant migration present at an unregulated landfill. In each case, differing electrical conductivity was interpreted as an indication of changes in the systems being investigated. Geostatistics has been utilized in numerous investigations to estimate expected values at unsampled locations. Cooper and Istok (1988) utilized geostatistics to estimate and map contaminant concentrations and estimation errors in a groundwater plume from a set of measured contaminant concentrations. Cressie (1989) prepared kriged estimate and error maps to predict a migration pathway of radionuclide contaminants from a potential high-level nuclear waste repository site. Woldt (1990) mapped the location of a suspected contaminant plume based on observed geoelectric measurements and geostatistics. Hagemeister (1993) utilized geostatistics to map subsurface electrical conductivity in two dimensional cross sections across a site.
In each case, kriged estimate and error maps were prepared to assist in the interpretation of the measured data.
DATA COLLECTION
A three dimensional data set, developed by obtaining readings at several sounding depths across a gridded area, was subjected to geostatistical analyses. This data set was utilized in conjunction with available background data to identify pertinent subsurface features and approximate their general locations. The background data included boring logs, groundwater analytical reports and industrial waste disposal permits. These permits
allowed the disposal of industrial wastes until the late 1970's. The methods utilized during this study consisted of establishing a sampling grid, completing an electromagnetic survey, and performing a geostatistical analysis. Each procedure was directed towards non-intrusive characterization of the subsurface environment. Existing testhole data were correlated with the predicted locations of pertinent site features for validation purposes.

Site Description

Based upon information obtained from the NDEQ, the study site was operated as a "trench and fill" SWDS from 1975 to 1987. During this time, the owner accepted domestic and industrial waste from nearby rural communities. Information pertaining to the actual quantities received is not available. The site and the surrounding areas are located near the easternmost edge of the Nebraska Sandhills region. The topography of this region is mostly undulating to rolling. The elevation of the site is approximately 460 to 466 meters above mean sea level (MSL) near the northeast and southwest corners, respectively (NDEQ 1990). The surface geology consists of approximately 38 to 42 meters of fine to medium grained sands interbedded with coarse sand and fine gravel deposits, which is characteristic of this region. The uppermost monitorable aquifer is located in sand and gravel deposits of the High Plains aquifer system. The water table is approximately 11 and 15 meters below grade level (BGL) in the northeast and southwest corners of the site, respectively, and the saturated thickness of the unconfined aquifer is approximately 27 meters (NDEQ 1990). Based on regional bedrock maps for this area, it appears that the top of the uppermost confining unit is the Niobrara Shale formation, which underlies the water table aquifer at an approximate elevation ranging from 422 to 424 meters above MSL near the northeast and southwest corners of the site, respectively.
Under a Multi-Site Cooperative Agreement with Region VII of the Environmental Protection Agency (EPA), the NDEQ performed a Preliminary Assessment (PA) at the site to assess the threat posed by the site to human health and the environment. The NDEQ concluded that leachate from the site resulted in a leachate contaminant plume migrating in an east-northeast direction towards a river 2.5 kilometers away. Because of the low human and livestock population in the area, no evidence was found indicating that the site posed an immediate threat to human health and the environment (NDEQ 1990). Interviews with the site owner revealed that the standard operating procedures involved excavating a 5 meter deep cell with a backhoe, depositing refuse at the toe of the working face, compacting the refuse and providing 15 centimeters of daily cover material. After each cell was completely filled, 1 to 1.5 meters of silty clay was placed on top of the waste as a final cover. Based on this information, MSL elevations were assigned to the pertinent subsurface features and are presented in Table 1.
Table 1. Approximate Elevation of Pertinent Site Features (meters)

Site Feature        Southwest Corner    Northeast Corner
Surface                  466                 460
Bottom of Trench         461                 455
Water Table              451                 449
Top of Bedrock           424                 422

Sampling Grid

Sampling point locations were established based on minimizing data collection efforts, maintaining minimum measurement support volumes of each instrument, and spatially defining the study area. The sampling point spacing utilized at the site was approximately 30 meters in the north and east directions and extended nearly 30 meters beyond all four property boundaries. The horizontal extent was selected based on the geology and obtaining an adequate number of sampling points beyond the limits of the suspected landfill cells to establish background subsurface conductivity levels.

Electromagnetic Survey
Electromagnetic techniques measure terrain conductivity to identify geologic and other subsurface formations. In most environmental EM applications, differing conductivity measurements are interpreted as a change in geologic formations or subsurface conditions (McNeill 1980). The EM instrument operates by generating alternating current loops with a transmitter coil (Tx). A time-varying magnetic field arising from the alternating current induces secondary currents, which are sensed by a receiver coil (Rx) along with the primary field. The EM receiver coil intercepts a portion of the magnetic field from each loop generated by the transmitter coil, resulting in an output voltage that is linearly proportional to the terrain conductivity. The resulting reading is in millisiemens per meter (mS/m).
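The linear relationship between the secondary-to-primary field ratio and terrain conductivity is the low induction number result given by McNeill (1980). A sketch of that relation follows; the frequency and field-ratio values in the usage comments are illustrative, not instrument specifications from this study:

```python
import math

# Low induction number relation (McNeill 1980, Geonics TN-6):
#   sigma_a = (4 / (omega * mu0 * s^2)) * (Hs / Hp)
# where Hs/Hp is the secondary-to-primary magnetic field ratio at the receiver,
# omega is angular frequency, and s is the intercoil spacing.
MU0 = 4e-7 * math.pi  # magnetic permeability of free space (H/m)

def apparent_conductivity(hs_over_hp, freq_hz, spacing_m):
    """Apparent terrain conductivity in S/m (multiply by 1000 for mS/m)."""
    omega = 2.0 * math.pi * freq_hz
    return 4.0 * hs_over_hp / (omega * MU0 * spacing_m ** 2)
```

Because the relation is linear in Hs/Hp, the instrument can read conductivity directly; note the inverse-square dependence on intercoil spacing.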
The reading obtained from the EM instruments is a conductivity measurement averaged over a volume of subsurface media. Because the effective depths of penetration are small in comparison to the overall horizontal and vertical dimensions of the site, these readings were interpreted as being representative of a sampling point at the calculated effective depth. The effective depth of penetration by the induced current is directly proportional to the intercoil spacing and depends on the orientation of the instrument. By varying the intercoil spacing, conductivity measurements can be collected at varying depths. Also, operating the instrument in the horizontal dipole mode reduces the effective depth of penetration to approximately one half that of the vertical dipole mode. Therefore, the instrument was operated in the horizontal and vertical dipole positions at four different intercoil spacings to obtain readings at eight different depths.
Theoretically, the total instrument response represents a weighted average of subsurface conductivities extending to infinite depth, but it does have practical limits. Interpretation, or modeling, of geophysical data to determine a reasonable unique solution to the nonunique problem was not performed. Although modeling this data provides a more comprehensive interpretation of the data set, the preliminary nature of this research did not warrant the level of effort involved with the modeling process. Instead, Hagemeister (1993) calculated effective exploration depths of four intercoil spacings for both the vertical and horizontal dipole modes. These calculations are based on the assumption that 60 percent of the total signal contribution over a volume of subsurface media is associated with a discernible layer. Based on the small diameter and thickness of the support volume, in relation to the overall area of the site and the preliminary nature of the investigation, each instrument reading was assigned to a point located at the centroid of the calculated effective depth of penetration. Table 2 depicts the exploration depths at various intercoil spacings.
Table 2. Exploration Depths

                     Exploration Depth (meters)
Intercoil Spacing    Horizontal Mode    Vertical Mode
3.7 meters                1.0                2.5
10.0 meters               3.5                6.5
20.0 meters               8.0               13.0
40.0 meters              15.5               26.0
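The proportionality described above can be illustrated with a small helper. The coefficients below are assumptions inferred from Table 2 (vertical-dipole depth roughly 0.65 times the intercoil spacing, horizontal-dipole depth roughly half of that); they are not Hagemeister's (1993) published calculation, which the table's horizontal values deviate from slightly:

```python
# Sketch: effective exploration depth versus intercoil spacing, assuming
# depth is directly proportional to spacing (see text) with an assumed
# constant of 0.65 for the vertical dipole mode.
def exploration_depth(spacing_m, mode):
    """Approximate effective exploration depth (meters) for an EM sounding."""
    vertical = 0.65 * spacing_m       # assumed proportionality constant
    if mode == "vertical":
        return vertical
    if mode == "horizontal":
        return 0.5 * vertical         # roughly one half of the vertical depth
    raise ValueError("mode must be 'vertical' or 'horizontal'")

for s in (3.7, 10.0, 20.0, 40.0):
    print(s, round(exploration_depth(s, "horizontal"), 1),
          round(exploration_depth(s, "vertical"), 1))
```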
GEOSTATISTICAL ANALYSIS

Environmental professionals are often confronted with the problem of providing detailed information about a site based on a minimum number of sampling points. Geostatistics provides a means of utilizing spatial continuity for estimating the expected value at unsampled locations. Geostatistics is commonly utilized to describe the spatial continuity of earth science data and aims at understanding and modeling the spatial variability of the data. The geostatistical analytical process for this study consisted of 1) describing and understanding the statistical distribution of the data, 2) modeling the spatial variability of the data, 3) estimating expected values at unsampled locations and 4) computing estimation variance values at the unsampled locations.

Univariate Description

Univariate description deals with organizing, presenting and summarizing data and provides an effective means of describing the data by identifying outliers and extreme values. The univariate descriptive tools utilized to analyze the conductivity data set were:
1) histograms, 2) probability plots and 3) summary statistics. Because the data set is three dimensional, a descriptive scatter plot could not effectively be obtained.
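As an illustration of these descriptive tools, the summary statistics reported in Table 3 can be computed as follows. This is a sketch only; Geo-EAS may use different percentile and skewness conventions than the simple nearest-rank and population-moment versions assumed here:

```python
import math

# Sketch: univariate summary statistics for a list of conductivity
# readings (mS/m), using population moments and nearest-rank percentiles.
def summary_stats(data):
    xs = sorted(data)
    n = len(xs)
    mean = sum(xs) / n
    var = sum((x - mean) ** 2 for x in xs) / n      # population variance
    sd = math.sqrt(var)
    skew = sum((x - mean) ** 3 for x in xs) / n / sd ** 3

    def pct(p):                                     # simple nearest-rank percentile
        return xs[min(n - 1, int(p * n))]

    return {"n": n, "min": xs[0], "max": xs[-1], "mean": mean,
            "variance": var, "std": sd, "skewness": skew,
            "p25": pct(0.25), "median": pct(0.50), "p75": pct(0.75)}
```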
Geo-EAS 1.2.1 Geostatistical Environmental Assessment Software (Englund and Sparks 1988) was utilized to prepare histograms and probability plots for both the observed data set and logarithmic transformations of the observed data set. Characteristic of many environmental data sets, the observed data exhibited a large number of low values, which offset the mean of the data distribution to the right of the median. The data were transformed to prepare lognormal histograms and probability plots to determine if the data exhibit a lognormal distribution. The tests for lognormality indicated that the data do not approach this distribution. Therefore, the observed data set was utilized for the analysis. Figure 1 presents a histogram plot for the observed conductivity data set.

[Figure 1. Histogram of observed conductivity values: frequency versus conductivity (mmhos/m).]

The summary statistics presented in Table 3 numerically describe the location, spread and shape of the observed data distribution.

Table 3. Observed Value Summary Statistics

Number             1679     Mean                  23.8
Minimum             0.5     Variance             263.6
25th Percentile    10.2     Standard Deviation    16.2
Median             22.7     Skewness               1.1
75th Percentile    31.9
Maximum           125.0

Experimental Variogram

A variogram is a plot of the variance, or one-half the mean squared difference, of paired data points as a function of the distance between the two points (Deutsch and Journel 1992). An omnidirectional variogram can be developed to obtain a general understanding of the spatial characteristics of the sample data. The omnidirectional variogram does not take into account spatial continuity changes due to directional changes in the data. Therefore, directional variograms are developed to identify these changes, if present. To ensure a more realistic sample variogram, the window for the lag distance did not extend greater than one-half the length or width of the data set. The lag distance between points
was also restricted such that a minimum of 30 pairs per lag distance were available to increase the confidence of variogram calculations. GSLIB - Geostatistical Software Library and User's Guide (Deutsch and Journel 1992) was utilized to develop directional and omnidirectional experimental variograms from the three dimensional conductivity data set. Generally, two sets of directional variograms were developed by restricting the paired data points to be either horizontally coplanar or vertically cocolumnar. Attempts to identify directions of maximum and minimum continuity within a horizontal plane revealed the same structure as the omnidirectional variogram in all directions. Therefore, isotropic conditions were considered for the horizontal plane. Paired data
points within a 250 meter horizontal search region generally followed the pattern of the omnidirectional variogram. This indicates that the kriging estimation process was not significantly influenced by the orientation of the principal axis of the search neighborhood and data orientation within this horizontal search region. The experimental variograms for the horizontal plane and the vertical direction are presented in Figure 2.

[Figure 2. Experimental variograms for the EM conductivity data set: variogram value versus lag distance (meters).]

Model Variogram

Once an acceptable experimental variogram was developed, the model variogram was constructed. Variogram modeling entailed fitting a mathematical function, using visual techniques, to the experimental variogram points by varying the model type and the nugget effect, sill and range values until the model variogram closely resembled the experimental variogram.
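The experimental variogram calculation described above can be sketched as follows. This is a simplified omnidirectional version; GSLIB's routines add direction, bandwidth, and tolerance controls not shown here, and the lag list and tolerance below are illustrative:

```python
import numpy as np

# Sketch: omnidirectional experimental (semi)variogram,
# gamma(h) = one-half the mean squared difference of data pairs whose
# separation distance falls within +/- tol of lag h.
def experimental_variogram(coords, values, lags, tol):
    coords = np.asarray(coords, float)
    values = np.asarray(values, float)
    # all pairwise separation distances and squared value differences
    d = np.linalg.norm(coords[:, None, :] - coords[None, :, :], axis=-1)
    sq = (values[:, None] - values[None, :]) ** 2
    iu = np.triu_indices(len(values), k=1)      # count each pair once
    d, sq = d[iu], sq[iu]
    gammas, counts = [], []
    for h in lags:
        mask = np.abs(d - h) <= tol
        counts.append(int(mask.sum()))          # pairs per lag (want >= 30)
        gammas.append(0.5 * sq[mask].mean() if mask.any() else np.nan)
    return gammas, counts
```

The returned pair counts make it easy to enforce the 30-pairs-per-lag restriction used in this study.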
The exponential variogram model was fit to both the horizontal omnidirectional and the vertical variograms (Figure 2) utilizing the parameters presented in Table 4. Although the nugget and sill are identical, the range significantly decreases in the vertical direction. This is characteristic of geometric anisotropy and is commonly encountered in earth science.

Table 4. Variogram Model Parameters

Parameter          Horizontal Model    Vertical Model
Model Structure    Exponential         Exponential
Range              250                 50
Sill               180                 180
Nugget Effect      130                 130

Cross Validation
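A sketch of the fitted model of Table 4 follows. It assumes the common practical-range convention gamma(h) = nugget + c[1 - exp(-3h/a)], takes the tabulated sill of 180 as the sill contribution above the nugget (the table is ambiguous on this point), and handles the geometric anisotropy by rescaling vertical lags by the range ratio 250/50 = 5:

```python
import math

# Assumed model parameters from Table 4 (sill taken as the contribution
# above the nugget; convention not stated in the paper).
NUGGET, C, A_HORIZ, A_VERT = 130.0, 180.0, 250.0, 50.0

def gamma(dx, dy, dz):
    """Exponential variogram with geometric anisotropy (meters in, gamma out)."""
    # stretch vertical lags so a single horizontal range applies
    h = math.sqrt(dx ** 2 + dy ** 2 + (dz * A_HORIZ / A_VERT) ** 2)
    if h == 0.0:
        return 0.0                       # gamma(0) = 0 by definition
    return NUGGET + C * (1.0 - math.exp(-3.0 * h / A_HORIZ))
```

Under this scaling, a 50 meter vertical lag produces the same variogram value as a 250 meter horizontal lag, which is the anisotropy described in the text.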
Model variograms were cross validated to compare the sample point values to the estimated values at those locations. It is important to develop a variogram model that minimizes the standard deviation of the estimation error as determined by the cross validation process. A variogram model that produces good results does not necessarily indicate that the estimation at unknown locations will be accurate. However, good results from cross validation suggest, with more confidence, the effectiveness of the selected model.
Cross validation consists of removing a data point from the data set and calculating an estimated value at that location utilizing the model variogram. Once the estimate is calculated, a comparison can be made between the estimated and observed values at each sampling point by calculating the difference between the two values. The three summary statistics utilized to evaluate the cross validation results are: 1) average kriging error (AKE), 2) mean squared error (MSE) and 3) standardized mean squared error (SMSE) (Woldt 1990). The AKE provides a measure of the degree of bias introduced by the kriging process and should equal 0 if the estimates are unbiased. The MSE should be less than the variance of the measured values. The SMSE is a measure of consistency and is satisfied if the SMSE is within the interval 1.0 ± 2(2/n)^(1/2). The results are summarized in Table 5 along with their calculated expected values.
Table 5. Cross Validation Summary Statistics

Statistic   Expected Value      Cross Validation Results
AKE         0.0                 -0.03
MSE         <263.6              131.0
SMSE        0.931 to 1.069      0.89
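The three statistics and the SMSE acceptance interval can be computed as follows. This is a sketch; the estimates and kriging variances (z_hat and s2 below) would come from the leave-one-out kriging runs described above:

```python
import math

# Cross validation summary statistics (after Woldt 1990).
# z: observed values; z_hat: leave-one-out kriged estimates;
# s2: kriging (estimation) variances at the removed points.
def cross_validation_stats(z, z_hat, s2):
    n = len(z)
    errors = [zh - zo for zh, zo in zip(z_hat, z)]
    ake = sum(errors) / n                                   # average kriging error
    mse = sum(e * e for e in errors) / n                    # mean squared error
    smse = sum(e * e / v for e, v in zip(errors, s2)) / n   # standardized MSE
    half = 2.0 * math.sqrt(2.0 / n)                         # SMSE tolerance
    return ake, mse, smse, (1.0 - half, 1.0 + half)
```

With n = 1679 the interval evaluates to 0.931 to 1.069, matching Table 5.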
As depicted in Table 5, the results meet the recommended AKE and MSE criteria, and the SMSE falls just outside the range of expected values. These results are generally considered acceptable.

Ordinary Kriging

Ordinary point kriging was selected to estimate expected values at unsampled locations. This method was selected because it is a linear unbiased estimator that attempts to minimize the error variance and generally has the lowest mean absolute error and mean squared error in comparison to other estimation methods (i.e. polygonal, triangulation, local sample mean, and inverse distance squared). GSLIB (Deutsch and Journel 1992) was utilized to calculate the expected values at 8,000 unsampled locations, or nodes, on a 20 x 20 x 20 grid from the three dimensional conductivity data set. The nodes were spaced at 25 meters in the north and east horizontal directions and 2 meters in the vertical direction. These spacings were selected based on the anticipated spatial orientation and depths of the pertinent subsurface features.
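A minimal sketch of ordinary point kriging at a single node follows, assuming an exponential covariance consistent with Table 4 (nugget omitted for brevity). It illustrates the method only; GSLIB's implementation differs in detail:

```python
import numpy as np

# Sketch: ordinary point kriging at one node x0, assuming the covariance
# model C(h) = c * exp(-3h/a). Not the GSLIB implementation.
def ordinary_krige(coords, values, x0, c=180.0, a=250.0):
    coords = np.asarray(coords, float)
    values = np.asarray(values, float)
    n = len(values)

    def cov(p, q):
        return c * np.exp(-3.0 * np.linalg.norm(p - q) / a)

    # kriging system: data-to-data covariances plus unbiasedness constraint
    A = np.ones((n + 1, n + 1))
    A[n, n] = 0.0
    for i in range(n):
        for j in range(n):
            A[i, j] = cov(coords[i], coords[j])
    b = np.ones(n + 1)
    for i in range(n):
        b[i] = cov(coords[i], np.asarray(x0, float))
    w = np.linalg.solve(A, b)            # weights plus Lagrange multiplier
    estimate = float(w[:n] @ values)
    variance = float(c - w @ b)          # ordinary kriging variance
    return estimate, variance
```

Because the weights sum to one, the estimator is unbiased; with no nugget, it interpolates the data exactly with zero kriging variance at the data locations.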
Search Neighborhood

The search neighborhood was established based on the following criteria: 1) selecting the greatest distance on the variogram model that closely fit the experimental variogram, 2) at least 30 pairs were utilized to calculate the experimental variogram at each point and 3) the distance did not exceed half the length of the horizontal sampling grid diagonal. The search neighborhood was defined in the horizontal plane at 250 meters. The geometric anisotropy limited the search to within 25 meters in the vertical direction, which reflects the region of the variogram with the higher level of confidence.

Background Conductivities

Generally, the expected values located west and south of the site and outside the property boundaries were considered to represent background subsurface conductivities, or expected values assumed not to be impacted by past site activities. This was established based on the groundwater flow direction. These background values generally ranged from less than 0 mS/m to 12 mS/m in the vadose zone and from 12 mS/m to 24 mS/m below the water table. These ranges of values established the basis of interpretation for identifying hydrogeologic features, landfill cells and potential leachate migrating from the cells.
DISCUSSION

The previous sections discussed a methodology that can be utilized to interpret surface based electrical data in an effort to construct reliable maps of suspected subsurface features. Based on the selected cross sections of expected conductivity values presented
in Figures 3 and 4 and limited knowledge of the site, interpretations of: 1) site specific hydrogeology, 2) horizontal and vertical extents of waste and 3) potential sources for leachate migration were developed.

Hydrogeology

Station 000 North (Figure 3) depicts a vertical cross section of the expected subsurface conductivity values for background reference. This station is located upgradient of the site and was utilized as an indication of subsurface conditions not impacted by past landfill operations. Based on information obtained from NDEQ records, the natural subsurface environment adjacent to the boreholes consists of fine to medium sand with a static water table elevation near 450 meters. Therefore, 0 to 12 and 12 to 24 mS/m were determined to represent unsaturated and saturated fine to medium sands, respectively. A variance from these ranges was interpreted as an indication of differing subsurface structures, or impact from the landfill operation.

Landfill Cell Identification

Based on the nature of the site operations, landfill cells are expected to be located from the ground surface down to an approximate depth of 5 meters. The selected vertical cross sections (Figure 3) depicted expected conductivity values near the surface ranging in excess of 24 mS/m within the 0 to 5 meter BGL depth range. Generally, the landfill cells appear to cover the entire site. Based on the vertical cross section maps (Figure 3), two areas exhibiting conductivity values in excess of 24 mS/m were elongated in the north and south directions with the centerlines located near stations 150 East and 300 East. Based on a personal interview with the owner, it appears that these two areas are actually several landfill cells spaced close together.

Potential Leachate Migration

The primary concern with SWDS is the potential leachate contamination associated with the nature of the operation.
Leachate is a liquid that consists of refuse moisture and all precipitation that mixes with this moisture as it migrates through the landfill. Leachate migrating from a landfill naturally due to gravity, or forced out as a result of the consolidation of refuse, may transport contaminants from the refuse to the groundwater environment. The presence of elevated conductivity values within the vadose zone and directly below the landfill cells was interpreted as potential leachate or possible instrument interference from the overlying waste. These conductivity values ranged from 12 mS/m to 24 mS/m (Figure 3).
[Figure 3. Vertical cross sections of expected conductivity values (mS/m) at Stations 000, 100, 225 and 425 North: elevation (meters above MSL) versus easting (meters).]

[Figure 4. Horizontal cross sections of expected conductivity values (mS/m) at elevation 440 meters (saturated conditions) and elevation 452 meters (unsaturated conditions above the water table): northing (meters) versus easting (meters).]
Interpreted Subsurface Environment

Based on the information obtained from this study, it appears that leachate may have migrated from the landfill and impacted the groundwater table. The plume appears to be migrating horizontally in the northeast direction with a vertical component. Figure 5 consists of a plan view and cross section illustrating an interpreted leachate plume located relative to the identified waste.
[Figure 5. Schematic of Interpreted Subsurface Environment: plan view and cross section showing the property line, identified waste and interpreted leachate plume.]
Computer Software Support

All data description and estimation efforts requiring computer software support were completed utilizing an IBM 80386 processor. Software support utilized in this study consisted of Geo-EAS, GSLIB and TecPlot Version 6.0 (TecPlot). Probability and histogram plots describing the 1679 observed conductivity values were prepared utilizing Geo-EAS. Geo-EAS generated on-screen plots within minutes, allowing efforts to be focused on the descriptive analyses. The expected values are presented on cross section contour maps included as Figures 3 and 4. The three dimensional expected value data set was imported into TecPlot, which utilizes linear interpolation to construct each contour line. TecPlot generated cross section maps based on the three dimensional data set; by fixing one dimension, a cross section at a desired location was generated within minutes.
CONCLUSIONS

Geostatistical analysis demonstrated that the data are spatially correlated, which allowed an interpreted subsurface model to be developed based on kriged estimated values. As an alternative to traditional intrusive characterization techniques, surface based electromagnetic surveying proved to be a key non-intrusive, cost-effective element in the refinement of the second phase of the hydrogeologic investigation. Review of kriging error maps can further refine this second phase by focusing on the areas with the largest error. This study demonstrated that this methodology, as a preliminary field screening tool, can provide sufficient information to optimize the placement and minimize the number of permanent groundwater monitoring wells.
REFERENCES

Barlow, P.M., Ryan, B.J., 1985, "An Electromagnetic Method of Delineating Ground-Water Contamination, Wood River Junction, Rhode Island," Selected Papers in the Hydrologic Sciences, U.S. Geological Survey Water-Supply Paper 2270, pp. 35-49.

Cooper, R.M., Istok, J.D., 1988, "Geostatistics Applied to Groundwater Contamination. I: Methodology," Journal of Environmental Engineering, Vol. 114, No. 2, pp. 270-285.

Cressie, N.A., 1989, "Geostatistics," American Statistician, Vol. 43, pp. 197-202.

Deutsch, C.V., Journel, A.G., 1992, "GSLIB - Geostatistical Software Library and User's Guide," Oxford University Press, New York.

Environmental Protection Agency, 1994, "EPA Criteria for Municipal Solid Waste Landfills," The Bureau of National Affairs, Inc., 40 CFR Part 258.

Englund, E., Sparks, A., 1988, "Geo-EAS 1.2.1 User's Guide," EPA Report 600/8-91/008, EPA-EMSL, Las Vegas, Nevada.

French, R.B., Williams, T.R., Foster, A.R., 1988, "Geophysical Surveys at a Superfund Site, Western Processing, Washington," Symposium on the Application of Geophysics to Engineering and Environmental Problems, Golden, Colorado, pp. 747-753.

Greenhouse, J.P., Slaine, D.D., 1986, "Geophysical Modeling and Mapping of Contaminated Groundwater Around Three Waste Disposal Sites in Southern Ontario," Canadian Geotechnical Journal, Vol. 23, pp. 372-384.

Hagemeister, M.E., 1993, "Systems Approach to Landfill Hazard Assessment with Geophysics (SALHAG)," Unpublished Masters Thesis, University of Nebraska - Lincoln.

1994, "Handbook of Solid Waste Management," McGraw-Hill Publishing.
Isaaks, E.H., Srivastava, R.M., 1989, "An Introduction to Applied Geostatistics," Oxford University Press, New York.

Journel, A., Huijbregts, C., 1978, "Mining Geostatistics," Academic Press, New York.

McNeill, J.D., October 1980, "Electromagnetic Terrain Conductivity Measurement at Low Induction Numbers," Geonics Limited Technical Note TN-6.

NDEQ, February 1990, "Ground Water Quality Investigation of Five Solid Waste Disposal Sites in Nebraska," Nebraska Department of Environmental Quality.

SCS Engineers, December 1991, "Volume 1 - Recommendations to State and Local Governments," Nebraska Solid Waste Management Plan, Nebraska Department of Environmental Quality.

Woldt, W.E., 1990, "Ground Water Contamination Control: Detection and Remedial Planning," Ph.D. Dissertation, University of Nebraska - Lincoln.
Geotechnical and Earth Sciences Applications
Craig H. Benson1 and Salwa M. Rashad2

ENHANCED SUBSURFACE CHARACTERIZATION FOR PREDICTION OF CONTAMINANT TRANSPORT USING CO-KRIGING

REFERENCE: Benson, C. H. and Rashad, S. M., "Enhanced Subsurface Characterization for Prediction of Contaminant Transport Using Co-Kriging," Geostatistics for Environmental and Geotechnical Applications, ASTM STP 1283, R. Mohan Srivastava, Shahrokh Rouhani, Marc V. Cromer, A. Ivan Johnson, Alexander J. Desbarats, Eds., American Society for Testing and Materials, 1996.

ABSTRACT: Groundwater flow and advective transport were simulated in a heterogeneous synthetic aquifer. These simulations were conducted when the aquifer was fully defined and when it was characterized using a limited amount of hard and soft data (hydraulic conductivity data and soil classifications). Co-kriging was used to combine the data types when estimating the hydraulic conductivity field throughout the aquifer. Results of the flow and transport simulations showed that soil classifications were useful in characterizing the hydraulic conductivity field and reducing errors in statistics describing the plume.

KEYWORDS: kriging, co-kriging, ground water, contaminant transport, hydraulic conductivity, soil classifications

INTRODUCTION
Simulating flow and contaminant transport is often an essential feature of remediation projects dealing with contaminated groundwater. In recent years, numerous sophisticated groundwater models have been developed to conduct such simulations. The complexity of these models allows one to realistically simulate the fate of contaminants provided properties of the aquifer affecting transport are adequately characterized. Unfortunately, what level of characterization is "adequate" is unknown, especially at sites where the subsurface is heterogeneous. Thus, when limited data are available to describe subsurface conditions, predictions of contaminant transport can be uncertain even when sophisticated models are used. Although many factors affect the fate of groundwater contaminants, the spatial distribution of hydraulic conductivity is the primary factor affecting which pathways are active in transport (Webb and Anderson 1996). To better define these pathways, additional data must be collected and analyzed. The most useful data are hydraulic conductivity measurements. However, "hard" data such as hydraulic conductivity measurements are expensive to obtain, especially if the data are to be collected from a site that is contaminated. It is advantageous, therefore, to investigate the effectiveness of using less expensive "soft" data, such as soil classifications, to reduce uncertainty. Soft data can be readily collected using less expensive exploration techniques such as ground penetrating radar, terrain resistivity surveys, or cone penetrometer soundings. The objective of the project described in this paper was to evaluate how characterizing the subsurface affects predictions of contaminant transport. Simulations of

1Assoc. Prof., Dept. of Civil & Environ. Eng., Univ. of Wisconsin, Madison, WI, 53706, USA.
2Asst. Scientist, Dept. of Civil & Environ. Eng., Univ. of Wisconsin, Madison, WI, 53706, USA.
GEOSTATISTICAL APPLICATIONS
groundwater flow and advective transport were conducted in a heterogeneous "synthetic aquifer." The aquifer was characterized using various amounts of hard data (hydraulic conductivities) and soft data (soil classifications). Co-kriging was used to combine the two data types when estimating the hydraulic conductivity field. Similar uses of co-kriging have been described by Seo et al. (1990a,b) and Istok et al. (1993).
SYNTHETIC AQUIFER

Characteristics
A "synthetic aquifer" was used in this study because it can be fully-defined; that is, the hydraulic properties throughout the aquifer are defined with certainty. In this particular application, fully-defined means that hydraulic conductivities and soil classifications can be assigned to every cell in the finite-difference grid used in simulating flow and transport in the aquifer. Thus, flow and transport simulations conducted with the "fully-defined" aquifer are representative of its "true" behavior. Comparisons can then be made between results obtained using the fully-defined case and cases where the aquifer has been characterized with a limited amount of sub-surface data. This comparison provides a direct means to evaluate the inherent inaccuracies associated with estimating subsurface conditions from a limited amount of information. A schematic illustration of the aquifer is shown in Fig. 1. It is extremely heterogeneous, as might be encountered in a supra-glacial depositional environment such as those occurring in the upper midwestern United States (Mickelson 1986, Simpkins et al. 1987). Details of the method used to design the aquifer are in Cooper and Benson (1993). Although an attempt was made to create a realistic aquifer, the synthetic aquifer was created without any site-specific data and thus may not be "geologically correct." The reader should keep this limitation in mind when considering the results and conclusions described later.
6 Clay 5 Clayey Sill • Silty Sand 3 Ane Sand
2 Coarse-Medium Sand 1 Clean Gravel
Upstream Boundary
Hydraulic Gradient. 0.01
Downstream
Boundary • FIG. 1 - Synthetic aquifer.
The aquifer is discretized into 12,500 cells that comprise a finite-difference grid used in simulating flow and transport. The aquifer is segregated into 25 layers. Each layer contains 20 rows and 25 columns of finite-difference cells. Each cell is 100 cm
BENSON AND RASHAD ON CO-KRIGING
on each side. Groundwater flow was induced by applying an average hydraulic gradient of 0.01. Constant-head boundary conditions were applied at the upstream and downstream boundaries of the aquifer. No-flow boundaries were applied along the remaining surfaces of the aquifer.

An important feature of the aquifer is that soil types are layered to create continuous and non-continuous soil lenses. Lenses with high hydraulic conductivity, such as clean gravel and coarse to medium sand, simulate preferential flow paths that might not be detected during a subsurface investigation. Low hydraulic conductivity soils such as clayey silt and clay are layered to create pinches and stagnation points that may cause the flow of groundwater to slow or even stop. These intricacies of the aquifer also might not be detected during a subsurface investigation.

Hydraulic Conductivity of Geologic Units
A soil classification was assigned to each geologic unit (i.e., the geologic facies) in the fully-defined synthetic aquifer. The soil classifications used to describe the geology of the aquifer are: (1) clean gravel, (2) coarse to medium sand, (3) fine sand, (4) silty sand, (5) clayey silt, and (6) clay. These soil classifications are represented numerically using the integers 1-6. The writers note that the integer ordering of these classifications is arbitrary. Consequently, results somewhat different from those described herein may have been obtained had a different categorical scheme been used.

Each cell in a given geologic unit was assigned a single realization from the distribution of hydraulic conductivity corresponding to the unit. Single realizations were generated using Monte Carlo simulation via inversion. In addition, no spatial correlation was assumed to exist within a geologic unit. Thus, the correlation structure inherent in the aquifer is due primarily to the relative location and size of the geologic units.

The triangular distribution (Fig. 2) was used to describe spatial variability in hydraulic conductivity for a given soil type. The distribution is defined using an upper bound (Kmax), a lower bound (Kmin), and the peak of the density function (Kp). To select Kmax, Kmin, and Kp for each soil classification, a chart was developed that summarizes the hydraulic conductivities assigned to various soil types in thirteen publications (Fig. 3). The hydraulic conductivities recommended by others were synthesized into a single "composite chart" having the six different soil types that comprise the synthetic aquifer, each with a corresponding range of hydraulic conductivities (Table 1).
FIG. 2 - Distribution of hydraulic conductivity in a geologic unit.
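Monte Carlo simulation via inversion from the triangular distribution above can be sketched as follows. This is an illustrative sketch, not the authors' code; in particular, sampling in log10(K) space (so the triangle spans the orders of magnitude in Table 1 evenly) is an assumption, since the paper does not state whether the triangle is defined in K or log K.

```python
import math
import random

def sample_triangular(a, c, b, u=None):
    """Inverse-transform sample from a triangular distribution with
    lower bound a, mode c, and upper bound b."""
    if u is None:
        u = random.random()              # uniform deviate on [0, 1)
    f_c = (c - a) / (b - a)              # CDF value at the mode
    if u < f_c:
        return a + math.sqrt(u * (b - a) * (c - a))
    return b - math.sqrt((1.0 - u) * (b - a) * (b - c))

# Example: one realization of K for a clay cell (Table 1 bounds),
# sampled in log10 space (an assumption for illustration).
log_k = sample_triangular(math.log10(1e-10), math.log10(1e-8), math.log10(1e-6))
k_clay = 10.0 ** log_k                   # cm/sec
```

Because each cell receives an independent draw, this reproduces the stated assumption of no spatial correlation within a geologic unit.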
FIG. 3 - Range in hydraulic conductivities for different soil types.

TABLE 1 - Parameters describing hydraulic conductivity distributions.

Soil Type            Kmin (cm/sec)   Kp (cm/sec)   Kmax (cm/sec)
Clean Gravel         5 x 10^-1       5 x 10^0      5 x 10^2
Coarse - Med. Sand   1 x 10^-3       5 x 10^-2     1 x 10^0
Fine Sand            1 x 10^-4       5 x 10^-3     5 x 10^-2
Silty Sand           5 x 10^-5       5 x 10^-4     5 x 10^-3
Clayey Silt          1 x 10^-7       1 x 10^-6     5 x 10^-5
Clay                 1 x 10^-10      1 x 10^-8     1 x 10^-6
Spatial Correlation Structure
The spatial correlation structure inherent in the soil type and hydraulic conductivity fields was characterized by computing directional experimental variograms in three dimensions. A model was then fit to the experimental variograms. A similar approach was also used to characterize the spatial cross-correlation structure between hydraulic conductivity and soil type. Experimental variograms were computed using the program GAM3 from the GSLIB geostatistical library (Deutsch and Journel 1992). The experimental variograms were computed by:

\gamma^*_{\ln K}(h) = \frac{1}{2N(h)} \sum_{i=1}^{N(h)} [\ln K(x_i + h) - \ln K(x_i)]^2   (1)

\gamma^*_S(h) = \frac{1}{2N(h)} \sum_{i=1}^{N(h)} [S(x_i + h) - S(x_i)]^2   (2)

In Eqs. 1-2, \gamma^*_{\ln K}(h) is the estimated variogram for lnKs separated by the vector h, \gamma^*_S(h) is the estimated variogram for soil classifications (S), N(h) is the number of data pairs separated approximately by the same vector h, and x_i is a generic location in the aquifer. The cross-variogram between lnK and S is computed as:

\gamma^*_{\ln K,S}(h) = \frac{1}{2N(h)} \sum_{i=1}^{N(h)} [S(x_i + h) - S(x_i)] [\ln K(x_i + h) - \ln K(x_i)]   (3)
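For a single row of equally spaced cells, the estimators in Eqs. 1-3 reduce to the following sketch (a 1-D simplification of the 3-D directional calculation performed by GAM3; the function name is illustrative):

```python
import numpy as np

def experimental_variograms(ln_k, s, max_lag):
    """Experimental variogram of lnK (Eq. 1), of soil class S (Eq. 2),
    and their cross-variogram (Eq. 3) along a 1-D transect of equally
    spaced cells, for integer lags 1..max_lag (in cells)."""
    g_lnk, g_s, g_cross = [], [], []
    for h in range(1, max_lag + 1):
        dk = ln_k[h:] - ln_k[:-h]        # lnK(x_i + h) - lnK(x_i)
        ds = s[h:] - s[:-h]              # S(x_i + h) - S(x_i)
        n = len(dk)                      # N(h), number of pairs
        g_lnk.append(float(np.sum(dk ** 2)) / (2 * n))
        g_s.append(float(np.sum(ds ** 2)) / (2 * n))
        g_cross.append(float(np.sum(ds * dk)) / (2 * n))
    return g_lnk, g_s, g_cross
```

Note that the cross-variogram (Eq. 3) can be negative, as in Fig. 4c, when increases in soil class number accompany decreases in lnK.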
The principal axes for soil classification were identified by computing a series of experimental variograms, each having a different orientation relative to the traditional Cartesian axes. The analysis showed that mild anisotropy exists in the X-Y plane, with the principal axis oriented 45° counterclockwise from the X-axis. For the vertical direction, the principal axis coincided with the vertical (Z) axis (Benson and Rashad 1994). The principal axes for the hydraulic conductivity field were assumed to correspond to the principal axes for soil type because the hydraulic conductivity field was generated directly from the soil type field. A similar assumption was made regarding the cross-variogram (lnK-soil type).

Experimental directional variograms for soil type and hydraulic conductivity corresponding to the principal axes are shown in Figs. 4a and 4b. The experimental cross-variogram (lnK vs. S) is shown in Fig. 4c. For each set of variograms, the range is largest in the Y' direction and smallest in the Z' direction, which is consistent with the size and shape of the geologic units shown in Fig. 1. In contrast, the sill is essentially the same for the Y' and Z' directions, but is smaller in the X' direction.

A spherical model with no nugget was found to best represent the experimental variograms. The spherical variogram is described by (Isaaks and Srivastava 1989):

\gamma(h) = C [1.5(h/a) - 0.5(h/a)^3]   if h < a   (4a)

\gamma(h) = C   if h \geq a   (4b)

where C is the sill and a is the range. Table 2 provides a summary of C and a for each variogram. The directional experimental variograms exhibit a mixture of geometric and zonal anisotropies. Geometric anisotropy is characterized by directional variograms that have approximately the same sill but different ranges. In contrast, zonal anisotropy corresponds to changes in the sill with direction while the range remains nearly constant (Isaaks and Srivastava 1989). The X'-Z' anisotropy is primarily geometric, whereas the X'-Y' and Z'-Y' anisotropies are primarily zonal.
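A minimal implementation of the spherical model in Eqs. 4a-4b (function name illustrative):

```python
def spherical(h, sill, a):
    """Spherical variogram (Eqs. 4a-4b): sill C, range a, no nugget."""
    if h >= a:
        return sill                              # Eq. 4b: at or beyond the range
    r = h / a
    return sill * (1.5 * r - 0.5 * r ** 3)       # Eq. 4a

# Example: the lnK model along X' (Table 2: C = 25, a = 925 cm)
gamma = spherical(500.0, 25.0, 925.0)
```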
FIG. 4 - Experimental variograms: (a) soil classifications, (b) hydraulic conductivities, and (c) soil classification-hydraulic conductivity. (X', Y', and Z' are components of h along the principal axes.)
TABLE 2 - Sill (C) and range (a) for variograms.

                             Direction
Parameter                  X'      Y'      Z'
lnK - Sill                 25      32      27
lnK - Range (cm)           925     1840    500
Soil Class. - Sill         1.5     2.25    1.75
Soil Class. - Range (cm)   925     1840    500
Cross - Sill               -5.8    -7.7    -6.5
Cross - Range (cm)         925     1840    500
A three-level nested structure was used to combine the geometric and zonal anisotropies into a single variogram model. The model has the form:

\gamma_j(h) = w_{1,j}\,\gamma_1(h_1) + w_{2,j}\,\gamma_2(h_2) + w_{3,j}\,\gamma_3(h_3)   (5)

The model \gamma_1 is a spherical function (Eq. 4) having C = 1; it provides the basis for the nested structure. The subscript j denotes the variable being described (S, lnK, or S-lnK for the cross-variogram). The separation distance h_1 is an "equivalent" distance:

h_1 = \sqrt{ (h_{X'}/a_{X'})^2 + (h_{Y'}/a_{Y'})^2 + (h_{Z'}/a_{Z'})^2 }   (6)

The weight w_{1,S} corresponds to the smallest sill for soil classification (X' axis: sill = w_{1,S} = 1.5 for S). The models \gamma_2 and \gamma_3 are also spherical models; they are used to ensure that the sills corresponding to the Z' and Y' axes are preserved. That is:

w_{2,S} = 1.75 - w_{1,S} = 0.25   (7a)

and

w_{3,S} = 2.25 - (w_{1,S} + w_{2,S}) = 0.50   (7b)

The equivalent distances h_2 and h_3 are:

h_2 = \sqrt{ (h_{Y'}/a_{Y'})^2 + (h_{Z'}/a_{Z'})^2 }   (8a)

and

h_3 = h_{Y'}/a_{Y'}   (8b)
A summary of the weights used for the soil classification, hydraulic conductivity, and cross-variogram models is contained in Table 3. It is important to note that data for hydraulic conductivity and soil classification were available for each cell in the aquifer when computing the cross-variogram. In more realistic cases, both types of data will probably not be available at each location. Problems associated with this disparity can be resolved by using the pseudo-cross variogram (Myers 1991).
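The nested model of Eqs. 5-8 can be sketched as below. The forms of h2 and h3 are inferred from the requirement that each principal direction recover its Table 2 sill (Eqs. 7a-7b), so they should be read as a reconstruction rather than a definitive statement of the authors' model; the constant and function names are illustrative.

```python
import math

# Ranges along the principal axes (Table 2), in cm
A_X, A_Y, A_Z = 925.0, 1840.0, 500.0

def spherical_unit(h, sill):
    """Spherical function (Eq. 4) with unit range, scaled by a sill."""
    if h >= 1.0:
        return sill
    return sill * (1.5 * h - 0.5 * h ** 3)

def nested_variogram(hx, hy, hz, w1, w2, w3):
    """Three-level nested model (Eq. 5) for a lag vector expressed in
    principal-axis components (hx, hy, hz).  h1 follows Eq. 6; h2 and
    h3 are inferred so each direction reproduces its Table 2 sill."""
    h1 = math.sqrt((hx / A_X) ** 2 + (hy / A_Y) ** 2 + (hz / A_Z) ** 2)
    h2 = math.sqrt((hy / A_Y) ** 2 + (hz / A_Z) ** 2)   # zonal part: Y' and Z'
    h3 = hy / A_Y                                       # zonal part: Y' only
    return (spherical_unit(h1, w1) + spherical_unit(h2, w2)
            + spherical_unit(h3, w3))
```

With the soil-classification weights from Table 3 (w1=1.5, w2=0.25, w3=0.5), large lags along X', Z', and Y' recover sills of 1.5, 1.75, and 2.25, matching Table 2.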
TABLE 3 - Summary of variogram weights.

Variable                        w1       w2       w3
Soil Classification, S          1.5      0.25     0.5
Hydraulic Conductivity, lnK     25.0     4.0      3.0
Cross-Variogram, lnK vs. S      -5.75    -0.75    -1.2
FLOW AND ADVECTIVE TRANSPORT MODELS

MODFLOW
The three-dimensional finite difference program MODFLOW was used to simulate steady-state saturated flow in the aquifer. MODFLOW uses a block-centered finite difference scheme to solve the groundwater flow equation. A detailed description of MODFLOW can be found in McDonald and Harbaugh (1988). MODFLOW was modified for use in this study by adding subprograms and changing the existing data collection and storage procedures. A detailed description of these modifications can be found in Benson and Rashad (1994). For each field of hydraulic conductivity that was simulated, MODFLOW was used to compute the total head at each node and the total flow rate (Q) emanating from cells at the downstream end. The hydraulic head field is used by the advective transport model for simulating contaminant transport.

PATH3D
The program PATH3D (Zheng 1988) was used to simulate advective contaminant transport. PATH3D is a general particle-tracking program for calculating groundwater paths and travel times in steady-state or transient, two- or three-dimensional flow fields. A detailed description of PATH3D can be found in Zheng (1988). Changes in PATH3D were required before it could be used in this study. These changes included modifying the algorithm for time step adjustment and modifying the post-processor to describe characteristics of the plume. Details of these changes can be found in Benson and Rashad (1994). Transport was initiated by releasing contaminant fluid particles along a vertical profile located at X = 0, Y = 1000 cm, and Z = 0 to 2500 cm. Eight particles were placed in each cell along this profile (200 total particles in the aquifer).

RESULTS AND ANALYSIS

Groundwater flow and contaminant transport simulations were conducted for three different conditions: (1) fully-defined aquifer, (2) partially-defined aquifer using only hydraulic conductivity data, and (3) partially-defined aquifer using hydraulic conductivity and soil classification data. In the partially-defined cases, a co-kriging program (based on the program COKB3D, Deutsch and Journel 1992) was used to
estimate hydraulic conductivity for each finite-difference cell in the aquifer using a linear coregionalization model for the variograms (see previous sections). In this application, co-kriging was only used to estimate the primary variable, hydraulic conductivity. In addition, point kriging was used instead of block kriging because it was more easily implemented and the cells used to discretize the aquifer were small. Nevertheless, a small error was introduced by point kriging. The variogram models previously discussed were used to describe the spatial correlation structure. Input for the co-kriging program included subsurface information consisting of profiles of hydraulic conductivity or soil classifications. A description of the co-kriging implementation can be found in Benson and Rashad (1994).

Estimating the Hydraulic Conductivity Field
Initial Comparison: Kriging vs. Co-Kriging -- An initial comparison was made between the estimated hydraulic conductivity field and the fully-defined hydraulic conductivity field along a transect (single row of cells; X = 50 cm, Z = 50 cm, Y = 0 to 2500 cm) through the aquifer. In one case, the hydraulic conductivity field was estimated with kriging using only hydraulic conductivity data; co-kriging using hydraulic conductivity and soil classification data was used for the other case.

The hydraulic conductivity fields estimated using kriging and co-kriging are shown in Fig. 5. Hydraulic conductivities measured along three vertical profiles were used as input when only kriging was conducted. The resulting estimated lnK field is a smooth, nearly linear interpolation between the profiles at which measurements were made. The estimated lnK field is very different from the "true" hydraulic conductivity field obtained from the fully-defined synthetic aquifer. That is, the irregular spatial variations in lnK are not preserved.
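For context, ordinary point kriging along a transect reduces to solving a small linear system built from the variogram. The sketch below is a generic 1-D illustration under the paper's no-nugget spherical model, not the modified COKB3D implementation used in the study; the function name is illustrative.

```python
import numpy as np

def ordinary_krige_1d(xs, zs, x0, sill, a):
    """Ordinary point kriging of lnK at location x0 from transect data
    (xs, zs), using a no-nugget spherical variogram (sill, range a).
    Returns the estimate and the ordinary kriging variance."""
    def sph(h):
        h = np.abs(h)
        g = sill * (1.5 * h / a - 0.5 * (h / a) ** 3)
        return np.where(h >= a, sill, g)
    n = len(xs)
    # Kriging system: [gamma_ij  1; 1  0] [lambda; mu] = [gamma_i0; 1]
    G = np.ones((n + 1, n + 1))
    G[:n, :n] = sph(xs[:, None] - xs[None, :])
    G[n, n] = 0.0
    g0 = np.ones(n + 1)
    g0[:n] = sph(xs - x0)
    lam = np.linalg.solve(G, g0)
    estimate = float(lam[:n] @ zs)
    variance = float(lam @ g0)      # sum(lambda_i * gamma_i0) + mu
    return estimate, variance
```

With no nugget the estimator honors the data exactly (zero kriging variance at a sampled location), which is why the kriged field interpolates smoothly between the measured profiles.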
FIG. 5 - Estimated hydraulic conductivities along a transect.

Co-kriging was conducted using three profiles of hydraulic conductivity (primary variable) and 11 profiles of soil classifications (secondary variable). Figure 5 shows that
addition of the secondary variable greatly improves the estimated hydraulic conductivity field. The estimated lnKs along the transect more closely resemble the "true" lnKs. However, even with co-kriging, the estimated field is smoother than the "true" field.

Selecting Profiles -- Various exploration schemes (i.e., collections of hydraulic conductivity and soil classification profiles) were selected to evaluate the relative effectiveness of different types of subsurface data in improving the accuracy of the estimated hydraulic conductivity field. The initial exploration scheme consisted of five profiles of hydraulic conductivity and five profiles of soil classifications. Subsequent schemes incorporating more data were constructed by adding more profiles of either hydraulic conductivities or soil classifications.

A consistent method was needed to select locations for additional profiles. The writers chose to select subsequent profiles at locations where the co-kriging variance is largest. These locations have the greatest uncertainty in the estimated hydraulic conductivity. The writers note, however, that locations where the co-kriging variance is largest are not necessarily the critical locations where uncertainty in hydraulic conductivity has the greatest impact on contaminant transport. However, these critical locations cannot be identified a priori, because under normal circumstances the detailed characteristics of the aquifer are unknown.

The aforementioned methodology was used to select 14 exploration schemes. The first seven schemes consist of five profiles of hydraulic conductivity (NK=5) and a varying number of soil classification profiles (NS=0, 5, 9, 15, 22, 32, 125). The second set of seven schemes was similar, except ten hydraulic conductivity profiles (NK=10) were used. The layout of each exploration scheme is contained in Benson and Rashad (1994).
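The greedy selection rule described above (place the next profile where the co-kriging variance is largest) can be sketched as follows; the array layout and function name are assumptions for illustration:

```python
import numpy as np

def next_profile_location(ck_var, sampled):
    """Greedy rule: choose the plan-view (row, col) location whose
    depth-averaged co-kriging variance is largest, skipping locations
    that already hold a profile.  ck_var has shape (layers, rows, cols);
    a profile occupies one (row, col) position through all layers."""
    mean_var = ck_var.mean(axis=0)       # depth-average per plan location
    for rc in sampled:
        mean_var[rc] = -np.inf           # exclude existing profiles
    flat = int(np.argmax(mean_var))
    return np.unravel_index(flat, mean_var.shape)
```

As the text notes, this targets the largest estimation uncertainty, which is not necessarily where uncertainty most affects transport.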
Precision of the Hydraulic Conductivity Field -- In this section, two statistics are used to characterize the precision of the estimated field: the maximum co-kriging variance and the mean co-kriging variance. The mean co-kriging variance (\bar{\sigma}^2_{ck}) is used to quantify the global estimation error. It is computed as:

\bar{\sigma}^2_{ck} = \frac{1}{N_c} \sum_{i=1}^{N_c} \sigma^2_{ck,i}   (9)

where \sigma^2_{ck,i} is the co-kriging variance at the ith cell in the finite-difference grid and N_c is the total number of grid cells (N_c = 12,500). In each case, \sigma^2_{ck,i} is the variance of the primary variable (hydraulic conductivity) being estimated, as described in Isaaks and Srivastava (1989, p. 404).

The maximum co-kriging variance is shown in Fig. 6. It is essentially the same for the schemes using five and ten profiles of hydraulic conductivity; it is only slightly larger for the scheme using five profiles. More importantly, however, addition of soil classification profiles results in a significant reduction in the maximum co-kriging variance.

The mean co-kriging variance is shown in Fig. 7 as a function of the number of soil classification profiles. In this case, the error is significantly larger when five hydraulic conductivity profiles are used instead of ten profiles. A significant difference is
FIG. 6 - Maximum co-kriging variance for lnK.
FIG. 7 - Mean co-kriging variance for lnK (NK=5 and NK=10).
expected, because the mean co-kriging variance represents a global measure of uncertainty whereas the maximum co-kriging variance is a local measure of uncertainty. Adding more hydraulic conductivity profiles will have a significant effect on the maximum co-kriging variance only if the profiles are located near or directly at the location where the maximum co-kriging variance exists, because the co-kriging variance is a point measure. In contrast, because the mean co-kriging variance is a global measure of uncertainty, it will be reduced by adding more profiles, regardless of their location.

Figure 7 also shows that exploration schemes employing more soil classification profiles with fewer hydraulic conductivity profiles can be as effective in reducing uncertainty as schemes that simply use more hydraulic conductivity profiles. For example, the scheme consisting of five hydraulic conductivity profiles and five soil classification profiles has a mean co-kriging variance similar to that of the scheme using ten hydraulic conductivity profiles and no soil classification profiles. Furthermore, the scheme using more soil classification profiles and fewer hydraulic conductivity profiles is likely to be less expensive. Thus, a similar reduction in uncertainty can be obtained at less cost.

Total Flow
One means to evaluate how well the aquifer is characterized is to compare the total flow rate across the compliance surface for the fully-defined condition with the total flow rate when the aquifer is characterized using a limited amount of subsurface data. For the synthetic aquifer, the compliance surface was defined as the downstream boundary (Fig. 1). If the flow rates are not nearly equal, then the aquifer is not adequately characterized. If the flow rate is too high, low conductivity regions blocking flow have been missed. In contrast, a flow rate that is too low is indicative of missing preferential pathways (Fogg 1986).

Figure 8 shows the total flow rate when the aquifer is characterized with 5 or 10 profiles of hydraulic conductivity and a variable number of soil classification profiles. When no profiles of soil classifications are used (kriging only), the flow rate is one-third to one-half the true total flow rate. Apparently, the sampling program has inadequately defined the preferential pathways controlling true total flow. However, when more soil classification profiles are added, the flow rate begins to rise and then becomes equal (i.e., > 10 profiles) to the flow rate for the fully-defined condition.

Two other characteristics of Fig. 8 are notable. First, similar flow rates were obtained when five or ten profiles of hydraulic conductivity (but no soil classifications) were used to characterize the aquifer. Apparently, neither set of measurements is of sufficient extent to capture the key features controlling flow. Second, the aquifer was better characterized (in terms of total flow rate) using five hydraulic conductivity profiles and 15 soil classification profiles than 10 hydraulic conductivity profiles and 15 soil classification profiles.
This indicates that collecting a greater quantity of index measurements (i.e., soil classifications) may be more useful in characterization than collecting a smaller number of more precise measurements (i.e., hydraulic conductivities). In this case, hydraulic conductivity inferred from a soil classification had a precision of two to three orders of magnitude, whereas the hydraulic conductivity "measurements" were exact. Thus, in this case, simply defining the existence of critical flow paths apparently is more important than precisely defining their hydraulic conductivity.
FIG. 8 - Total flow rate through the synthetic aquifer.

Trajectory of the Plume - Centroid
The trajectory of the plume can be characterized by the coordinates (X, Y, Z) of its centroid. Trajectories for several different exploration schemes are shown in Fig. 9. In each case, the trajectory is recorded for only four years. For longer times, a portion of the plume has passed the downstream edge of the aquifer; consequently, the statistics used to describe the plume (centroid and variance) are ambiguous.

At early times, the trajectory of the centroid does not depend greatly on the exploration scheme. However, as the plume evolves, different trajectories of the centroid are obtained. In particular, the plume moves more slowly in the down-gradient (X) direction when the aquifer is characterized with a limited amount of subsurface data. Apparently, the preferential pathways controlling down-gradient movement were inadequately characterized. Addition of soil classification data did not consistently improve the trajectory in the X-direction. Adding five soil classification profiles improved the trajectory significantly, but the worst trajectory was obtained when 22 soil classification profiles were used. Adding even more soil classification profiles (NS = 32 or 125) improved the trajectory only slightly. This is particularly discouraging because 125 soil classification profiles corresponds to sampling 25% of the entire aquifer.

The cause of this discrepancy is inadequate representation of subsurface anomalies that affect movement of the plume. At approximately 0.2 years, the centroid of the plume moves dramatically as the particles flow around a low conductivity region. The Y-coordinate increases and the Z-coordinate decreases (i.e., the plume moves upward and towards the rear face of the aquifer). None of the exploration schemes provided enough information to adequately characterize this movement. However, adding soil classification profiles did improve the prediction.
When only hydraulic conductivity profiles were used (kriging only), the plume moved downward and to the front, which is the exact opposite of the behavior occurring in the fully-defined case. Adding soil classification profiles did prevent the centroid from moving in the opposite direction
FIG. 9 - Centroid of plume: (a) X-coordinate, (b) Y-coordinate, and (c) Z-coordinate.
and, in the case where 125 profiles were used, did result in a subsurface where the plume moved upward and to the rear of the aquifer. Unfortunately, the degree of plume movement existing when 125 soil classification profiles were used is still too small to simulate the fully-defined condition.

For brevity, graphs of the trajectory of the centroid are not shown for the exploration schemes where 10 hydraulic conductivity profiles were used. These graphs can be found in Benson and Rashad (1994). Smaller errors in the predicted trajectory occurred when ten hydraulic conductivity profiles were used in the exploration scheme. In this case, addition of soil classification profiles also had a smaller impact on the predicted down-gradient movement of the plume. However, adding soil classification profiles did improve the Y- and Z-coordinates of the centroid. When only ten hydraulic conductivity profiles were used (kriging only), the plume moved in the opposite direction, as was observed in Fig. 9 (NK=5, NS=0). But, when soil classification profiles were added, the plume moved in the correct direction. Nevertheless, the movement occurred more slowly than in the fully-defined case, which caused the down-gradient movement of the plume in the estimated aquifers to lag behind that in the fully-defined aquifer.

Spreading - Variance of the Plume
Spreading of the plume is characterized by the variance (or second central moment); a larger variance corresponds to a greater amount of spreading. Evolution of the variance of the plume is shown for various exploration schemes in Fig. 10.

Some general features of Fig. 10 are noteworthy. First, the variance is larger in the X- and Z-directions. The Z-variance is large because the particles are uniformly distributed along a vertical profile (i.e., the Z-direction) at the onset of the flow and transport simulation. A large X-variance occurs because down-gradient movement of the plume occurs in the X-direction, and thus the X-variance corresponds to longitudinal spreading of the plume. Accordingly, the Y-variance is much smaller because it corresponds to lateral spreading orthogonal to the average hydraulic gradient, which is generally smaller than spreading in the longitudinal direction.

The X-variance increases with time. At short times, the variance is small because little spreading of the plume has occurred. However, as the plume moves down-gradient, the variance increases as more spreading occurs. Furthermore, the ability to capture the true amount of spreading depends on the amount of subsurface information used in characterization (Fig. 10). When less information is used (e.g., kriging only, NK=5, NS=0), the variance is smallest, and when more information is used (i.e., by adding soil classification profiles) the variance increases. This is expected, because a smoother subsurface containing fewer heterogeneities exists when less data are used in characterization. However, adding more soil classification profiles does not consistently improve the X-variance. In fact, the X-variance for NS=5 is closer to the X-variance in the fully-defined case than the schemes having NS=22, 32, and 125. Similar behavior was noted for the Y-variance. However, adding more soil classification profiles had a more consistent effect on the Z-variance: it consistently resulted in a Z-variance that was closer to the Z-variance in the fully-defined case.
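The centroid and variance statistics used above can be computed directly from the particle coordinates; a minimal sketch (function name illustrative):

```python
import numpy as np

def plume_moments(particles):
    """Centroid (first moment) and per-axis variance (second central
    moment) of a plume represented by an (n, 3) array of particle
    coordinates (X, Y, Z)."""
    p = np.asarray(particles, dtype=float)
    centroid = p.mean(axis=0)      # (X, Y, Z) of the center of mass
    variance = p.var(axis=0)       # spreading along each axis
    return centroid, variance
```

The X-variance then measures longitudinal spreading, and the Y- and Z-variances measure lateral and vertical spreading, as discussed above.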
FIG. 10 - Variance of plume: (a) X-variance, (b) Y-variance, and (c) Z-variance.
When ten hydraulic conductivity profiles were used for characterization, the X-variance in the estimated aquifers was similar to the X-variance in the fully-defined case regardless of the number of soil classification profiles used in characterization (Benson and Rashad 1994). Apparently, the ten hydraulic conductivity profiles used for characterization resulted in a sufficiently heterogeneous subsurface such that spreading in the down-gradient direction was preserved.

In contrast, spreading in the Y- and Z-directions for the fully-defined case was distinctly different from the spreading that occurred in these directions when ten hydraulic conductivity profiles were used for characterization (Benson and Rashad 1994). Adding five soil classification profiles resulted in a Y-variance that was closer to the Y-variance in the fully-defined case. However, as even more soil classification profiles were added (NS=22, 32, 125), the Y-variance became much different from that observed in the fully-defined case. Apparently, the heterogeneities causing spreading in the Y-direction were inadequately represented when the subsurface was characterized with additional soil classification profiles.

Kriging with only ten hydraulic conductivity profiles resulted in a Z-variance that differed greatly from the Z-variance in the fully-defined case. For larger times, the Z-variance was much smaller than the Z-variance for the fully-defined condition. However, when soil classifications were added, the Z-variance more closely resembled the Z-variance for the fully-defined case. Thus, using soil classifications apparently resulted in heterogeneities that were similar to those controlling spreading in the Z-direction in the fully-defined aquifer.

SUMMARY AND CONCLUSIONS

The objective of this study was to illustrate how predictions of contaminant transport differ as the quantity and type of information used to characterize the subsurface changes.
Groundwater flow and advective contaminant transport were simulated through a heterogeneous synthetic aquifer that was fully defined. The aquifer was highly heterogeneous, as might be encountered in supraglacial sediments, such as those found in the upper mid-western United States. Additional flow and transport simulations were conducted using versions of the aquifer that were characterized using a limited amount of subsurface data. Comparisons were then made between the true movement of the plume (in the fully defined aquifer) and movement of the plume in versions of the aquifer that were characterized with limited subsurface data. Results of the flow and transport simulations show that soil classifications can be used to augment or replace more costly hydraulic conductivity measurements while maintaining similar accuracy in terms of total flow through the aquifer. However, the geologic details that govern transport through the synthetic aquifer apparently were never sufficiently characterized. Bulk movement of the plume (i.e., the centroid) and spreading (i.e., variance) of the plume were never simulated accurately, regardless of the amount of subsurface data (hard or soft) that were used for characterization.
198
GEOSTATISTICAL APPLICATIONS
ACKNOWLEDGMENT
The study described in this paper was sponsored by the U.S. Dept. of Energy (DOE), Environmental Restoration and Waste Management Young Faculty Award Program. This program is administered by Oak Ridge Associated Universities (ORAU). Neither DOE nor ORAU has reviewed this paper, and no endorsement should be implied.

REFERENCES
Benson, C. and S. Rashad (1994), "Using Co-Kriging to Enhance Hydrogeologic Characterization," Environmental Geotechnics Report No. 94-1, Dept. of Civil and Environmental Engineering, University of Wisconsin, Madison, WI.
Bowles, J. (1984), Physical and Geotechnical Properties of Soils, 2nd Edition, McGraw-Hill, New York.
Cooper, S. and C. Benson (1993), "An Evaluation of How Subsurface Characterization Using Soil Classifications Affects Predictions of Contaminant Transport," Environmental Geotechnics Report No. 93-1, Dept. of Civil and Environmental Engineering, University of Wisconsin, Madison, WI.
Das, B. (1985), Principles of Geotechnical Engineering, PWS-Kent Publishing, Boston.
Deutsch, C. and A. Journel (1992), GSLIB: Geostatistical Software Library and User's Guide, Oxford University Press, New York.
Domenico, P. and F. Schwartz (1990), Physical and Chemical Hydrogeology, John Wiley, New York.
Fogg, G. (1986), "Groundwater Flow and Sand Body Interconnectedness in a Thick, Multiple-Aquifer System," Water Resources Research, 22(5), 679-694.
Holtz, R. and W. Kovacs (1981), An Introduction to Geotechnical Engineering, Prentice-Hall, Englewood Cliffs, NJ.
Hough, B. (1969), Basic Soils Engineering, 2nd Edition, Ronald Press Co., New York.
Isaaks, E. and R. Srivastava (1989), Applied Geostatistics, Oxford Univ. Press, New York.
Istok, J., Smyth, J., and Flint, A. (1993), "Multivariate Geostatistical Analysis of Ground-Water Contamination: A Case History," Ground Water, 31(3), 63-73.
Lee, I., White, W., and Ingles, O. (1983), Geotechnical Engineering, Pitman Co., Boston.
McCarthy, D. (1982), Essentials of Soil Mechanics and Foundations: Basic Geotechnics, Reston Publishing, Reston, VA.
McDonald, M. and A. Harbaugh (1988), "A Modular Three-Dimensional Finite-Difference Ground-Water Flow Model," Techniques of Water-Resources Investigations of the United States Geological Survey, USGS, Reston, VA.
Means, R. and J. Parcher (1963), Physical Properties of Soils, Merrill Books, Columbus.
Myers, D. (1991), "Pseudo-Cross Variograms, Positive Definiteness and Co-Kriging," Mathematical Geology, 23, 805-816.
Mickelson, D. (1986), "Glacial and Related Deposits of Langlade County, Wisconsin," Information Circular 52, Wisconsin Geologic and Natural History Survey, Madison, WI.
Mitchell, J. (1976), Fundamentals of Soil Behavior, John Wiley and Sons, New York.
Scott, C. (1980), An Introduction to Soil Mechanics and Foundations, 3rd Edition, Applied Science Publishers, London.
Simpkins, W., McCartney, M., and Mickelson, D. (1987), "Pleistocene Geology of Forest County, Wisconsin," Information Circular 61, Wisconsin Geologic and Natural History Survey, Madison, WI.
Smith, G. (1978), Elements of Soil Mechanics for Civil and Mining Engineers, 4th Ed., Granada Publishing, London.
Seo, D-J, Krajewski, W., and Bowles, D. (1990a), "Stochastic Interpolation of Rainfall Data from Rain Gages and Radar Using Co-Kriging," Water Resources Research, 26(3), 469-477.
Seo, D-J, Krajewski, W., Azimi-Zonooz, A., and Bowles, D. (1990b), "Stochastic Interpolation of Rainfall Data from Rain Gages and Radar Using Co-Kriging: 2. Results," Water Resources Research, 26(5), 915-924.
Sowers, G. and G. Sowers (1970), Introductory Soil Mechanics and Foundations, 3rd Ed., Macmillan Co., New York.
Webb, E. and Anderson, M. (1996), "Simulation of Preferential Flow in Three-Dimensional, Heterogeneous Conductivity Fields with Realistic Internal Architecture," Water Resources Research, 31(3), 63-73.
Whitlow, R. (1983), Basic Soil Mechanics, Construction Press, New York.
Zheng, C. (1988), "PATH3D, A Groundwater Path and Travel Time Simulator, User's Manual," S. S. Papadopulos and Associates, Inc., Rockville, MD.
Stanley M. Miller¹ and Anja J. Kannengieser²
GEOSTATISTICAL CHARACTERIZATION OF UNSATURATED HYDRAULIC CONDUCTIVITY USING FIELD INFILTROMETER DATA
REFERENCE: Miller, S. M., and Kannengieser, A. J., "Geostatistical Characterization of Unsaturated Hydraulic Conductivity Using Field Infiltrometer Data," Geostatistics for Environmental and Geotechnical Applications, ASTM STP 1283, R. Mohan Srivastava, Shahrokh Rouhani, Marc V. Cromer, A. Ivan Johnson, Alexander J. Desbarats, Eds., American Society for Testing and Materials, 1996.
ABSTRACT: Estimation of water infiltration and retention in surficial soils is a critical aspect of many geotechnical and environmental site evaluations. The recent development of field-usable tension infiltrometers now allows insitu measurements of unsaturated hydraulic conductivity (Ku), thus avoiding some uncertainties associated with remolded soil samples tested in the laboratory. Several different geostatistical "mapping" methods can be used to spatially characterize Ku, including ordinary and indicator kriging, as well as spatial simulations that provide realizations (stochastic images) of Ku that exhibit more natural variability than do the smoothed spatial estimations of kriging. Multivariate procedures, such as cokriging and Markov-Bayes simulation, can incorporate information from a secondary attribute (e.g., particle size information) to enhance the spatial characterization of an undersampled Ku field. These geostatistical procedures are demonstrated and compared for a case study at a 700 sq. meter site comprised of coarse soil material. Results indicate that percent-by-weight fractions can be used effectively to enhance insitu spatial characterization of Ku.

KEY WORDS: unsaturated hydraulic conductivity, particle size, site characterization, geostatistics, kriging, spatial simulation.
An important physical property to be measured when investigating water infiltration through surficial soils is the insitu unsaturated hydraulic conductivity (Ku). The recent development of field-usable tension infiltrometers now provides the capability to measure insitu Ku, thus avoiding some of the uncertainties associated with remolded soil specimens tested in the laboratory (e.g., loss of insitu soil structure). Even though field measurements of unsaturated hydraulic conductivity exhibit spatial variability, enough spatial dependence typically

¹Dept. of Geology and Geol. Engrg., Univ. of Idaho, Moscow, ID 83844
²Dept. of Mathematics and Statistics, Univ. of Idaho, Moscow, ID 83844
MILLER AND KANNENGIESER ON CONDUCTIVITY
201
is observed to warrant a geostatistical investigation to characterize Ku across the study site. Because of a fairly rapid sampling time, as many as 20 to 30 Ku field measurements can be obtained in two days, a much more time-efficient procedure than laboratory testing of remolded specimens. This provides an adequate number of data for many geostatistical assessments, and the data base can be supplemented by additional data on other physical properties related to Ku (particle-size distribution attributes or insitu density).

Several different geostatistical "mapping" methods can be used to spatially characterize Ku. Univariate procedures include: 1) ordinary kriging, which provides a smoothed map of Ku estimates at unsampled locations across the site, 2) indicator kriging, which provides local estimates of conditional probability distributions of Ku at specified grid locations across the site, and 3) Gaussian-based simulations, which provide spatial realizations (stochastic images) of Ku that exhibit more natural variability than do the smoothed spatial estimations of kriging. Multivariate procedures, such as cokriging and Markov-Bayes simulation, can incorporate spatial information from a secondary attribute (e.g., the median particle size) to enhance the spatial characterization of an undersampled Ku field.

To demonstrate and evaluate these various spatial characterization methods, a case study at a 700 sq. meter site was conducted. The coarse-grained soil material, as represented by 20 sampling locations across the site, had a median particle size of approximately 7 mm and averaged less than 6% by weight fines (i.e., finer than a No. 200 sieve). Insitu Ku values at a 5-cm tension were obtained at each of the 20 sites to provide a minimally sized sample for geostatistical studies. Background information, analytical procedures, and results of the study are presented below.
TENSION INFILTROMETER
For nearly a hundred years, soil scientists have been describing the flow of water through unsaturated materials. Estimating the amount and rate of such water flow requires knowledge of the Ku/moisture content relationship or the Ku/soil tension (water potential) relationship. The most commonly used method to define these relationships relies on laboratory measurements obtained by pressure desorption of a saturated core of soil material, which leads to the construction of a moisture characteristic curve of moisture content vs. soil tension (Klute 1986). However, there are three problems associated with such testing: 1) the time required to set up samples and then test over a wide range of soil tensions; 2) the cost of field sampling, remolding specimens, and monitoring the laboratory tests, which may take several weeks; and 3) potentially unrealistic results due to the remolding of specimens, which destroys any insitu soil structure or packing arrangements that may have a strong influence on flow characteristics. This latter concern is especially applicable to coarse-grained soil materials, such as those containing a significant amount of gravel or coarse sand. Because of these concerns, there has been considerable interest in recent years among soil scientists to develop methods for field measurements of unsaturated flow properties (Ankeny et al. 1988; Perroux and
White 1988; Clothier and Smettem 1990; Ankeny et al. 1991; Reynolds and Elrick 1991). Field-capable devices for such work are characterized as "tension infiltrometers" or "disk permeameters." They allow direct measurement of insitu infiltration (flow rate) as a function of tension, which leads to estimation of the insitu Ku value.

The tension infiltrometer used in this study was manufactured by Soil Measurement Systems of Tucson, Arizona. It has a 20-cm diameter infiltration head that supplies water to the soil under tension from a Mariotte tube arrangement with a 5-cm diameter water tower and a 3.8-cm diameter bubbling tower (Fig. 1). Three air-entry tubes in the stopper on top of the bubbling tower are used to set the operating tension. All major parts are constructed of polycarbonate plastic, with a very fine nylon mesh fabric covering the infiltration head. Pressure transducers installed at the top and bottom of the water tower are used to measure accurately the infiltration rate. Output from the transducers is fed electronically to a field datalogger for real-time data acquisition and storage. Procedures for field setup and use of the instrument are given in the SMS User Manual (1992).

Using the measured flow rates, Q (cm³/hr), from the field tests, values of Ku can be obtained using formulae given by Ankeny et al. (1988), Ankeny et al. (1991), and Reynolds and Elrick (1991). The first step is to calculate the pore-size distribution parameter, α, for a pair of tension settings:

    α = ln(Q1/Q2) / (h1 - h2)                                        (1)

where: h1 = first soil tension value, h2 = second soil tension value (a higher tension than h1), Q1 = volumetric infiltration rate for the first tension setting, Q2 = volumetric infiltration rate for the second tension setting. Next, a parameter known as Ks (akin to saturated hydraulic conductivity) is calculated as follows:

    Ks = Q1 / { π r² exp(α h1) [1 + 4/(π r α)] }                     (2)

where: r = effective radius of wetted area beneath the infiltration disk (cm), h1 = selected soil tension value in the testing range, Q1 = volumetric infiltration rate corresponding to h1. Then, an exponential relationship is used to calculate the desired Ku(h) given the results from Eqns. (1) and (2):

    Ku(h) = Ks exp(α h)                                              (3)

For our field study, measured infiltration rates were recorded at soil tensions (suctions) of -3, -6, and -15 cm of water. The pair of tensions at -3 and -15 cm was used in Eqns. (1) and (2) to obtain estimates of α and Ks, respectively. Values of Ku then could be calculated at any soil tension h desired.
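The chain from measured flows to Ku(h) can be sketched directly from Eqns. (1)-(3). The sketch below assumes the exponential conductivity model above; the flow values in the usage note are hypothetical, not field data from this study.

```python
import math

def alpha_param(q1, q2, h1, h2):
    # Eq. (1): pore-size distribution parameter from flows at two tensions
    # (h1, h2 are pressure heads in cm, e.g. -3 and -15; q in cm^3/hr)
    return math.log(q1 / q2) / (h1 - h2)

def ks_param(q1, h1, a, r):
    # Eq. (2): Ks from the disk infiltration rate at tension h1,
    # with r = effective radius (cm) of the wetted area
    return q1 / (math.pi * r**2 * math.exp(a * h1) * (1.0 + 4.0 / (math.pi * r * a)))

def ku(h, ks, a):
    # Eq. (3): exponential relationship Ku(h) = Ks exp(alpha * h)
    return ks * math.exp(a * h)
```

For example, with hypothetical flows q1 = 2000 and q2 = 500 cm³/hr at h1 = -3 and h2 = -15 cm, `alpha_param` gives α, `ks_param` gives Ks, and `ku(-5.0, ks, a)` evaluates the conductivity at the -5 cm tension used in this study.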
FIG. 1--Schematic diagram of tension infiltrometer.
SELECTED GEOSTATISTICAL MAPPING METHODS
Geostatistical spatial characterization of a specified attribute generally involves the generation of maps by "filling in" values of the attribute at numerous unsampled locations. Such filling-in processes that honor the available data can be achieved by one of two methods, interpolation or simulation. Spatial interpolation methods tend to smooth the spatial pattern of the attribute (causing the set of estimates to have a smaller variance than the actual data set), but generally provide good local estimations. Spatial simulations, on the other hand, provide more realistic fluctuations, with the set of simulated values having a variance that approximates that of the actual data set.

The theoretical basis and important practical considerations of ordinary kriging, the common geostatistical estimation method, have been described in published literature over recent years (for example, see David 1977; Journel and Huijbregts 1978; Clark 1979; Isaaks and Srivastava 1989). In essence, the procedure involves calculating a weighted average of neighborhood data, where the weights represent least-squares regression coefficients obtained by incorporating spatial covariances between the data locations and the estimation location (CBi's) and those between the pairs of the data values (Cij's). Ordinary kriging provides unbiased and minimum-error estimates, and it can be used to estimate values at point locations or to estimate the average value of blocks (areas or volumes). The estimated value of the block (or point, if point kriging is used) is obtained by a weighted average of n data in the immediate neighborhood (x's at locations ui):
    VB = Σ (i=1 to n) ai x(ui)                                       (4)

where the ai's are the kriging weights. In practice, the number of neighborhood data used in kriging estimation is limited so that only those data locations within a range of influence (or so) of the block or point location are used. Range of influence is defined as that distance beyond which data values are not dependent (i.e., covariance is zero). The block covariance CBB is a constant value for all blocks of identical dimensions; it is estimated by averaging the calculated covariance values between location pairs in the block defined by 4, 9, 16, or 25 locations. For point kriging, CBB = s², the sample variance. The CBi values are obtained by averaging the covariances between 4, 9, 16, or 25 locations in the block with each i-th data location in the neighborhood. Any given Cij value is the covariance calculated for the lag and direction defined by the i-th and j-th data locations in the neighborhood. In all cases, the desired covariance value is obtained from the modeled variogram or complementary covariance at the specified lag distance and direction of the pair of locations being considered.

Ordinary kriging is a useful spatial interpolation and mapping tool, because it honors the data locations, provides unbiased estimates at unsampled locations, and provides for minimum estimation variance. It also produces a measure of the goodness of estimates via the calculated kriging variance or kriging standard deviation. Because kriging is an interpolator, it produces a smoothed representation of the spatial attribute being mapped. Consequently, the variance of kriged estimates often is considerably less than the sample variance, and a kriged map will appear smoother than a map of the raw data. Kriging also accounts for redundancy in sample locations through the incorporation of the Cij information. Thus, kriging weights assigned to data locations clustered in the neighborhood will be less than those assigned to solitary data locations. In fact, data "overloading" to one side of the estimation point or block may result in the calculation of small negative kriging weights.

The ordinary kriging system of equations to be solved for the kriging weights is given by (a Lagrange term, λ, is used to preserve unbiased conditions and to optimize estimates by minimizing the estimation error):

    Σ (j=1 to n) aj Cij + λ = CBi,   i = 1, ..., n                   (5a)

    Σ (j=1 to n) aj = 1                                              (5b)

This system of equations is solved to obtain the ai weights and λ. In addition, the estimation variance, or kriging variance, can be obtained at each estimation location.
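The system (5a)-(5b) is a small linear solve once the covariances have been evaluated from a fitted variogram or covariance model. The following is a minimal sketch, assuming those covariance values are already in hand; the exponential covariance in the usage note is illustrative only, not the model fitted in this study.

```python
import numpy as np

def ordinary_kriging_weights(C_data, C_target):
    """Solve the ordinary kriging system (5a)-(5b).

    C_data:   n x n matrix of covariances Cij between data locations
    C_target: length-n vector of covariances CBi between data and target
    Returns the kriging weights ai and the Lagrange term lam.
    """
    n = len(C_target)
    A = np.ones((n + 1, n + 1))    # last row/column of ones enforce (5b)
    A[:n, :n] = C_data
    A[n, n] = 0.0                  # no Lagrange-Lagrange entry
    b = np.append(C_target, 1.0)   # right-hand side: CBi values, then 1
    sol = np.linalg.solve(A, b)
    return sol[:n], sol[n]
```

For example, with three data points on a line and a toy exponential covariance C(h) = exp(-h/5), the weights sum to one and the nearest datum receives the largest weight.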
When describing the spatial dependence of an attribute of interest (i.e., the covariance values needed for kriging), either the semivariogram function or the spatial covariance function can be used (for example, see Isaaks and Srivastava 1989). Sometimes, difficulties are encountered when estimating these functions using skewed data sets that contain outliers. The influence of such outliers can be mitigated in many cases by using monotonous data transforms or by using an indicator-transform framework that leads to computing indicator variograms for use in indicator kriging (Journel 1983). The goal when kriging indicator-transformed data is not to estimate the unsampled value at location u0, X(u0), nor its indicator transform i(u0; xk), which equals 1 if X(u0) ≤ xk and equals 0 if X(u0) > xk. Instead, indicator kriging yields a least-squares estimate of the local, conditional cumulative distribution function (cdf) for each cutoff xk, this estimate being valued between 0 and 1 and obtained by ordinary kriging of indicator values. Thus, at each of k cutoffs, an estimated (designated by *) conditional cdf value for xk is obtained, which is equivalent to the indicator kriged value (a weighted average of neighboring 0's and 1's) at location u0 with cutoff xk:

    F*[xk | (n)] = P*[{X(u0) ≤ xk} | (n)] = E*[{I(u0; xk)} | (n)] = [i(u0; xk)]*    (6)

where (n) represents the local conditioning information in the neighborhood surrounding the unsampled location u0. Once local conditional cdf's are estimated and then post-smoothed (if needed), maps of probability information or percentiles can be constructed to characterize the site.

When bivariate data sets are available, cokriging can be used to provide estimates of the main attribute of interest that incorporate additional information from a secondary attribute. This requires computing semivariograms or spatial covariances for each individual attribute, as well as the cross-semivariogram between the two attributes (for example, see Isaaks and Srivastava 1989). Linear coregionalization of the semivariogram models allows for the cokriging covariance matrix to be positive definite and thus avoid theoretical and computational problems, such as estimating negative kriging variances (Isaaks 1984). When adequate data are available and sufficient intervariable relationships observed, cokriging may provide a more comprehensive estimation than univariate kriging.

Several types of spatial simulations also are available for mapping a spatial attribute. As discussed earlier, simulations provide natural-looking fluctuations in spatial patterns, while still honoring known data locations and preserving the desired variance and spatial covariance. Thus, simulations do not provide the smoothed appearance on maps typical to kriging estimations. For the case study that follows, we wanted to compare a straightforward simulation procedure to a more complicated approach. Therefore, we investigated sequential Gaussian simulation and Markov-Bayes simulation, respectively. In addition, we used simulated annealing to generate numerous "data" values to supplement the available 20 "hard" data, and thus, provide a reference or training image to be used as a standard basis for comparisons. Discussions of these simulation methods and related software are given by Deutsch and Journel (1992).
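The indicator-transform step behind Eqn. (6) can be sketched simply: each datum is coded as 0 or 1 at each cutoff, and a weighted average of the neighborhood indicators yields the conditional cdf estimate. The values and equal weights below are illustrative, not neighborhood weights from this study's kriging runs.

```python
import numpy as np

def indicator_transform(values, cutoffs):
    # i(u; x_k) = 1 if value <= x_k, else 0, for each cutoff x_k
    values = np.asarray(values, dtype=float)
    cutoffs = np.asarray(cutoffs, dtype=float)
    return (values[:, None] <= cutoffs[None, :]).astype(float)

def ik_cdf_estimate(weights, indicators):
    # Eqn. (6): kriged cdf estimate = weighted average of neighboring
    # 0's and 1's at each cutoff, clipped into the valid [0, 1] range
    return np.clip(np.asarray(weights) @ indicators, 0.0, 1.0)
```

With neighborhood values 124, 189, and 238 cm/day, cutoffs 140, 190, and 235 cm/day, and equal weights, the estimated cdf is (1/3, 2/3, 2/3), i.e., a one-third chance of falling at or below 140 cm/day.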
CASE STUDY
The site selected for the case study was a portion of a heap leaching pad at a base-metal mine in the Western U.S. Material at the site consisted of blasted ore, with particle sizes ranging from several microns up to several tens of millimeters. Although not a typical soil in agricultural terms (i.e., possessing necessary organic materials to support plant life), this material would be classified by engineers as a coarse gravel with some sand and fines. This type of coarse material would provide a rigorous test for the SMS tension infiltrometer, which was designed to be used primarily for agricultural-type soils.

Field and Laboratory Work

Due to time and budgetary limitations, only 20 locations were sampled over the study site, which was approximately 30 m (E-W) by 20 m (N-S). Prior to selecting the sampling locations in the field, various sampling layouts were studied by investigating their lag (separation distance between any two locations) distributions. The goal was to have a sampling plan that would provide adequate numbers of data pairs at short and intermediate lags to facilitate the computation and modeling of semivariograms. At the same time, fairly uniform coverage across the site was desired to establish a solid basis for kriging and for simulation. Numerous sampling layouts were evaluated by a trial-and-error method before the final layout was selected. Even this plan was not final, because some changes would be needed in the field, such as when a specified location lay directly over a large cobble. At each of the 20 sampling locations, an infiltrometer test pad was leveled by hand, large rocks were removed (those greater than about 8 cm across), and a 3-mm layer of fine sand was laid down to provide proper contact between the infiltrometer head and the ground surface. The 20-cm diameter head of the SMS tension infiltrometer device then was placed on the prepared pad and the infiltration test conducted.
Water flow quantities were measured for three different tensions (suctions): -3 cm, -6 cm, and -15 cm of water head. Pressure transducers and electronic data-acquisition hardware were used to record the flow data on a storage module for later use. Once the infiltration test was completed at a given location, the wetted soil material directly beneath the infiltration disc was sampled. Several kilograms of the material were placed in sealed sample bags for subsequent analysis at the University of Idaho. Insitu measurements of density were not attempted at this particular site, due to the amount of gravel and larger-sized rocks. However, such measurements with a neutron moisture/density gage are recommended for similar studies of unsaturated hydraulic conductivity. At the University of Idaho Soils Laboratory, the 20 specimens were air-dried and rolled to break up aggregated fines prior to sieve analyses conducted according to procedure ASTM D 422, excluding the hydrometer analyses. A stack of 13 sieves was used to sieve the granular materials, and particle-size distribution curves then were plotted to display the sieve results. All specimens showed fairly well-graded particle size distributions over size ranges from less than 0.075 mm (fines) to 75 mm (coarse gravel). None of the specimens had more than 8% by weight passing the No. 200 sieve (0.075 mm). Consequently,
hydrometer analyses were not deemed necessary. Based on the Unified Soil Classification System, the materials were identified as sandy gravel with nonplastic fines.

Data Analysis

Given the measured volumetric flow rates at -3 and -15 cm tensions, values of α and Ks were calculated according to Eqns. (1) and (2). Values of Ku at several selected tensions, h, then were computed and compared. Desiring to stay within the field measurement range and yet wanting to approximate field behavior at near-saturation conditions (such as after a heavy precipitation event or during spring snow-melt), we eventually selected a soil tension of -5 cm for all subsequent calculations of the unsaturated hydraulic conductivity. Resulting values of Ku(-5), expressed in cm/day, for the 20 sampling locations are shown in the postplot of Fig. 2. Sample statistics for Ku(-5) are summarized below (units are cm/day):

    mean        183       minimum   124
    s.d.(n-1)    36.9     median    189
    s.d.(n)      35.9     maximum   238
Various particle-size attributes were studied to evaluate their influence on Ku(-5), including the D10, D25, and D50 sizes, as well as the percent-by-weight finer than 2.0 mm (No. 10 sieve) and 4.75 mm (No. 4 sieve). Scatterplots of Ku(-5) vs. each of these attributes were generated and fitted with linear regression models. The three characteristics showing the strongest linear relationship were the percent-by-weight finer than 2.0 mm and finer than 4.75 mm, and the D25 size. Linear correlation coefficients were in the 0.55 to 0.60 range, positive for the first two attributes and negative for the third. Subsequent computations of experimental semivariograms for these three characteristics indicated that only the percent-by-weight finer than 2.0 mm (PF2) showed any significant univariate spatial dependence and cross spatial dependence with Ku(-5). Thus, this parameter from the particle-size distributions was selected as a secondary attribute to help estimate and map the primary attribute, Ku(-5). Sample statistics for PF2 are summarized below (units are %):

    mean        24.5      minimum   17.5
    s.d.(n-1)    2.50     median    25.1
    s.d.(n)      2.44     maximum   27.9
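For a small, irregularly spaced data set like this one, an experimental semivariogram with overlapping lag bins (a sliding lag window) can be sketched as follows; the window width, step, and maximum lag are parameters, and the coordinates in the usage note are illustrative rather than the study's sampling layout.

```python
import numpy as np

def sliding_semivariogram(coords, values, window=5.0, step=1.0, max_lag=22.0):
    """Experimental semivariogram with overlapping lag bins (0-5 m, 1-6 m, ...).

    coords: (n, 2) array of easting/northing; values: (n,) attribute values.
    Returns bin-center lags and semivariance estimates (omnidirectional).
    """
    coords = np.asarray(coords, dtype=float)
    values = np.asarray(values, dtype=float)
    # all pairwise separation distances and squared value differences
    d = np.linalg.norm(coords[:, None, :] - coords[None, :, :], axis=2)
    sq = (values[:, None] - values[None, :]) ** 2
    iu = np.triu_indices(len(values), k=1)      # each pair counted once
    d, sq = d[iu], sq[iu]
    lags, gammas = [], []
    lo = 0.0
    while lo + window <= max_lag:
        mask = (d >= lo) & (d < lo + window)
        if mask.sum() > 0:
            lags.append(lo + window / 2.0)
            gammas.append(0.5 * sq[mask].mean())   # gamma(h) = half mean sq diff
        lo += step
    return np.array(lags), np.array(gammas)
```

Overlapping bins trade some statistical independence between plotted points for more stable estimates per bin, which is the motivation for the sliding window with only 20 data locations.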
Computing usable semivariograms with small data sets can be a challenging task. With only 20 data locations for this study, it was difficult to select computational lag bins that would provide adequate numbers of data pairs for the irregularly spaced data set. Therefore, we decided to use a "sliding lag window" approach for computing the experimental semivariograms. A sliding window 5-m wide was used for both the Ku(-5) and the PF2 data sets. Thus, the plotted points shown on the semivariogram graphs in Fig. 3 represent overlapping lag bins of 0-5 m, 1-6 m, 2-7 m, and so on. Because of the limited number of data, only isotropic (omnidirectional) semivariograms were computed. The two
FIG. 2--Postplot of estimated insitu values of Ku(-5) in cm/day; northing and easting coordinates are in meters.
FIG. 3--Estimated semivariograms and fitted spherical models; (a) Ku(-5) model: γ(h) = 671 + 620 Sph12(h); (b) PF2 model: γ(h) = 0.4 + 5.55 Sph12(h).
experimental semivariograms were fitted with spherical variogram models, as described and annotated in Fig. 3. Sills on the models were set equal to the sample variance in both cases. The Ku(-5) data set was fitted with a first-order trend surface
model to determine if there was any significant trend in mean across the site. Calculated F-statistics for this regression fit were not large enough to induce a rejection of the null hypothesis that a significant trend was not present. Thus, one of the primary considerations (i.e., that the mean does not depend on spatial location) of the covariance stationarity model for spatial random functions could be readily accepted for subsequent spatial estimations and simulations.

Site Characterization and Mapping Using Geostatistics

The isotropic semivariogram model shown in Fig. 3a provided the spatial covariance model to conduct ordinary point kriging on regular grids to generate estimates of Ku(-5) across the study site. GeoEAS (Englund and Sparks 1991) computer software was used. A regular 25 x 35 grid at 1-m spacings was used, because at field sampling locations an area approximately 0.7 to 1 m in diameter was wetted during each infiltrometer test. The shaded contour map of Fig. 4 clearly shows the smoothing characteristics of kriging. Summary statistics for these estimations are presented in Table 1. Comparisons of the sample variances again reflect the significant amount of spatial smoothing inherent to kriging estimations. Given similar estimates for storativity and soil-layer thickness, water-balance computations that incorporate annual precipitation and evapotranspiration values can be conducted to predict total annual recharge at each grid location at the site (Miller et al. 1990).

Point cokriging of Ku(-5) also was conducted on the 1-m grid, incorporating the spatial information and codependence of PF2 (percent-by-weight finer than 2.0 mm). The two semivariogram models of Fig. 3, as well as a cross-semivariogram between the two attributes (Fig. 5), were used in the GSLIB cokriging software (Deutsch and Journel 1992). Cokriged estimates were plotted and contoured to produce the shaded contour map given in Fig. 6.
Summary statistics of the estimates are reported in Table 1. Cokriging did yield estimates with greater variance than ordinary point kriging, but not as great a relief (i.e., difference between maximum and minimum). This estimation method would be especially applicable in situations where numerous sampling sites with particle-size analyses could be used to supplement a few actual Ku sampling sites.
TABLE 1--Summarized statistics of kriging results for Ku(-5), cm/day.

                            n     mean    s.d.    var.     min.    max.
    Data                    20    183.0   35.9    1291.0   124.0   238.0
    Ordinary Point Kriging  875   183.2   11.8    140.1    127.2   237.8
    Cokriging               875   182.3   13.1    170.8    139.6   215.7
FIG. 4--Shaded contour map of Ku(-5) point kriging estimates (cm/day) on a 1-m regular grid.
FIG. 5--Estimated cross-semivariogram between Ku(-5) and PF2 with fitted spherical model: γ(h) = 8.0 + 42.5 Sph12(h).
Indicator kriging yields a different kind of mapping information that often is useful to characterize a spatial attribute. As discussed previously, local conditional cdf's are estimated across the site by indicator kriging at several different data threshold values. For this study, five thresholds were assigned for the Ku(-5) data: 140, 161.5, 190, 211.5, and 235 cm/day. Computer software in GSLIB (Deutsch and Journel 1992) was used to conduct the indicator kriging, smooth the
MILLER AND KANNENGIESER ON CONDUCTIVITY
FIG. 6--Shaded contour map of Ku(-5) cokriging estimates on a 1-m regular grid, using PF2 as the secondary attribute.
estimated cdf's, and produce E-type estimates (expectation, or mean values) for mapping purposes (Fig. 7a). The probability of exceeding 200 cm/day also was calculated at each estimation location, and a shaded contour map of this exceedance probability was produced (Fig. 7b). This exceedance cutoff value was selected arbitrarily, but it serves to illustrate the types of probability maps that can be generated to help characterize Ku at the site and to provide input for cost-benefit studies that assist in treatment or remediation designs. Another advantage of the indicator kriging framework is that "soft" information (inequality relations, professional judgments, etc.) can be coded probabilistically and used to supplement available "hard" indicator data (Journel 1986). As shown in the Ku(-5) postplot of Fig. 2, spatial variability in the unsaturated hydraulic conductivity is typical. For example, at a northing coordinate of 3386 m, a value of 205 cm/day is adjacent to a value of 148 cm/day, and a 209-cm/day value is adjacent to one of 127 cm/day. This spatial variability is to be expected, especially for coarser grained soil materials. Therefore, smoothed kriging maps of the type presented thus far may not always be the most appropriate way to characterize Ku. Spatial simulations that honor available data and also preserve the sample variance provide quite a different prediction of spatial patterns. To compare the performances of two types of spatial simulators for small data sets of Ku, we first used simulated annealing (Deutsch and Journel 1992) on a 1-m grid to generate a pseudo ground-truth image of Ku(-5) that could serve as a reference base map (Fig. 8). Sample statistics for the 875 simulated values are summarized below (in cm/day):
mean = 184, s.d. = 35.8, var. = 1280
minimum = 115, median = 189, maximum = 248
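The annealing idea behind such a reference image can be illustrated with a toy one-dimensional version: repeatedly swap two node values, and accept swaps by the Metropolis rule so the experimental semivariogram is driven toward a target model. This is a sketch of the general principle only, not the GSLIB program used by the authors; the lags, target values, schedule, and data are all hypothetical.

```python
import numpy as np

def anneal_transect(values, target_gamma, lags, n_iter=4000, t0=1.0,
                    decay=0.999, seed=0):
    """Toy simulated annealing on a 1-D transect. Swapping node values
    preserves the sample histogram exactly while reshaping the
    experimental semivariogram toward target_gamma."""
    rng = np.random.default_rng(seed)
    img = rng.permutation(values).astype(float)

    def gamma(a):  # experimental semivariogram at the chosen integer lags
        return np.array([0.5 * np.mean((a[l:] - a[:-l]) ** 2) for l in lags])

    obj = float(np.sum((gamma(img) - target_gamma) ** 2))
    t = t0
    for _ in range(n_iter):
        i, j = rng.integers(0, len(img), size=2)
        img[i], img[j] = img[j], img[i]
        new = float(np.sum((gamma(img) - target_gamma) ** 2))
        # Metropolis rule: always accept improvements; accept deteriorations
        # with a probability that shrinks as the temperature t cools.
        if new <= obj or rng.random() < np.exp((obj - new) / t):
            obj = new
        else:
            img[i], img[j] = img[j], img[i]   # undo the swap
        t *= decay
    return img, obj

# Hypothetical values mimicking the Ku(-5) sample statistics (cm/day);
# the target semivariogram values are assumptions for illustration.
rng = np.random.default_rng(1)
vals = rng.normal(184.0, 35.8, size=200)
lags = [1, 2, 4, 8]
target = np.array([600.0, 900.0, 1150.0, 1280.0])
image, final_obj = anneal_transect(vals, target, lags)
```

Note that, because only swaps are performed, the simulated image reproduces the input histogram exactly, which is why annealing preserves the sample variance that kriging smooths away.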
FIG. 7--Shaded contour maps of Ku(-5) indicator kriging results on a 1-m regular grid; (a) E-type map showing cdf expectation values, cm/day; (b) probability of exceeding 200 cm/day.
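The post-processing that turns kriged cdf values into an E-type estimate and an exceedance probability can be sketched as below. This is schematic, not the GSLIB routine the authors used; the tail bounds, the linear within-class interpolation, and the example cdf values are assumptions.

```python
import numpy as np

# Ku(-5) thresholds (cm/day) from the study, plus assumed lower/upper
# tail bounds needed to close the distribution.
thresholds = np.array([140.0, 161.5, 190.0, 211.5, 235.0])
z_min, z_max = 110.0, 250.0

def etype_and_exceedance(cdf_vals, cutoff=200.0):
    """E-type estimate and P[Ku > cutoff] from kriged cdf values
    F(z_k) = Prob[Ku <= z_k] at the five thresholds."""
    F = np.clip(np.sort(cdf_vals), 0.0, 1.0)   # crude order-relations fix
    z = np.concatenate(([z_min], thresholds, [z_max]))
    cdf = np.concatenate(([0.0], F, [1.0]))
    # E-type: probability-weighted class midpoints (linear within classes).
    mids = 0.5 * (z[:-1] + z[1:])
    probs = np.diff(cdf)
    etype = float(mids @ probs)
    # Exceedance probability by linear interpolation of the cdf at cutoff.
    p_exceed = 1.0 - float(np.interp(cutoff, z, cdf))
    return etype, p_exceed

# Hypothetical local cdf values at one grid node.
et, pe = etype_and_exceedance(np.array([0.10, 0.30, 0.55, 0.80, 0.95]))
```

Mapping `et` over all grid nodes gives an E-type map like Fig. 7a; mapping `pe` gives an exceedance-probability map like Fig. 7b.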
FIG. 8--Shaded contour map of Ku(-5) "reference data" based on simulated annealing.
The isotropic semivariogram for the simulated values was similar to that shown in Fig. 3a. Sequential Gaussian and Markov-Bayes simulations of Ku(-5) were conducted on a 1-m grid using software from GSLIB (Deutsch and Journel 1992), based on the 20 known data values and on the semivariogram model of Fig. 3a. The Markov-Bayes procedure also uses secondary information (PF2 in this case), but when both the primary and secondary attributes have data values at the same locations, the primary information is given precedence over the secondary (Zhu 1991; Miller and Luark 1993). Thus, results of the two different simulation methods for four trials (simulation passes, or iterations) were quite similar, showing average mean-square-errors on the order of 2,300 cm/day squared. These errors were calculated as squared differences between the 875 simulated values and the 875 values of the ground-truth image. It was not surprising to observe that the largest of these errors occurred in the most sparsely sampled areas of the study site, where uncertainties are greatest for the simulated annealing approach and the other simulation methods. Advantages of a bivariate simulation method, such as the Markov-Bayes (MB) procedure, become apparent when the primary attribute is undersampled relative to the secondary attribute. To illustrate this, we selected several subsets (reduced data sets) containing 10 of the original 20 Ku sampling sites. The goal became one of simulating 875 Ku(-5) values, given 20 PF2 sites and 10 Ku(-5) sites. These results then could be compared to those based on sequential Gaussian (SG) simulation using only the 10 Ku(-5) data. Basic statistical information for the three different subsets, A, B, and C, is presented in Table 2.
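The mean-square-error comparison against the ground-truth image amounts to the following computation. This is a small sketch with synthetic stand-in arrays; the noise level is chosen only so the result lands near the roughly 2,300 (cm/day)-squared figure quoted above.

```python
import numpy as np

def avg_mse(realizations, reference):
    """Mean-squared error of each simulated grid against the reference
    image, averaged over trials (units: cm/day squared)."""
    realizations = np.asarray(realizations, dtype=float)
    per_trial = ((realizations - reference) ** 2).mean(axis=1)
    return float(per_trial.mean())

# Hypothetical stand-in for four 875-node trials and a reference image;
# the noise s.d. of 48 cm/day corresponds to a variance near 2,300.
rng = np.random.default_rng(1)
ref = rng.normal(184.0, 35.8, size=875)
trials = ref + rng.normal(0.0, 48.0, size=(4, 875))
mse = avg_mse(trials, ref)
```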
TABLE 2--Summarized statistics for the three subsets of Ku(-5) and for the corresponding simulation results (units are cm/day).

Original data:
              n    mean    s.d.   var.     min.    max.
Full set      20   183.0   35.9   1291.0   124.0   238.0
Subset A      10   177.4   35.6   1264.0   123.7   237.8
Subset B      10   194.4   30.3   918.6    138.4   232.8
Subset C      10   176.5   40.1   1610.0   123.7   232.8

SG simulation results (averaged from four trials):
              n     mean    s.d.   mean sq. err. using ref. image
Subset A      875   176.4   34.5   2391.
Subset B      875   194.2   31.1   2176.
Subset C      875   173.0   40.3   2763.

MB simulation results (averaged from four trials):
              n     mean    s.d.   mean sq. err. using ref. image
Subset A      875   175.6   36.2   2588.
Subset B      875   182.1   33.4   2344.
Subset C      875   178.8   37.2   2476.
Note that Set B has a smaller variance than the original data set, and that Set C has a larger variance. In terms of overall mean squared error, SG simulation outperformed MB simulation for Subsets A and B, where the sample variance was relatively small. However, when the subset had a larger variance, MB simulation was the better procedure according to this criterion. The beneficial influence of the secondary data used in the MB simulation method is shown clearly by the sample means and standard deviations of the simulated sets based on the three different subsets. Note especially the more consistent results for Subsets B and C given by the MB procedure, compared with the inconsistent results of the SG procedure. Examples of MB simulation results for these two subsets are presented as shaded contour maps in Fig. 9.
CONCLUSIONS

A variety of geostatistical tools are available for mapping and characterizing unsaturated hydraulic conductivity. With the recently developed tension infiltrometers for field use, measurements of volumetric infiltration rates provide a basis for estimating Ku values that reflect in-situ conditions of soil density, packing, and structure. Although a vast improvement over laboratory testing of disturbed specimens, such in-situ testing still requires enough time and effort that numerous measurements (greater than 30) at a study site likely will
FIG. 9--Examples of Markov-Bayes simulations of Ku(-5), cm/day; (a) based on a subset (Set B) of 10 Ku data with variance lower than that of original data; (b) based on a subset (Set C) of 10 Ku data with variance higher than that of original data.
not be affordable except for large-budget investigations. However, secondary information that is more economical to obtain, especially particle-size characteristics such as the percent-by-weight finer than 2.0 mm,
can be used in bivariate types of kriging and simulation to fill in Ku values at unsampled locations and provide enhanced spatial mappings. The case study presented here dealt only with surface measurements and two-dimensional maps. However, trenching with benched sidewalls could be used to provide in-situ Ku assessments at various elevations and add a third dimension of elevation into the characterization scheme. The kriging and simulation methods described herein are readily adapted to three-dimensional situations. If point estimates are desired for generating contour maps of estimated Ku, then ordinary point kriging (or indicator kriging for local cdf's) would be preferred. When local cdf's are estimated by indicator kriging, a variety of probabilistic-type maps can be generated to characterize spatial patterns of Ku across the study site. When secondary data are available, and a recognizable relationship is present between the secondary and primary data, Markov-Bayes simulation often will provide better results than those produced by univariate simulations, such as the sequential Gaussian method. The former method is particularly advantageous when the primary sample data are sparse, and perhaps not representative of the entire population, and when a larger sample of the secondary attribute is available.
ACKNOWLEDGEMENTS

Portions of this research work were supported by the Idaho Center for Hazardous Waste Remediation Research under Grant No. 676-X405. The authors also express appreciation to John Hammel, John Cooper, and Mit Linne of the University of Idaho for their technical advice and assistance in the operation of the tension infiltrometer, the analysis of its measurements, and the laboratory testing program. The University of Idaho does not endorse the use of any specific commercial material or product mentioned in this paper.
REFERENCES

Ankeny, M.D., M. Ahmed, T.C. Kaspar, and R. Horton, 1991, "Simple Field Method for Determining Unsaturated Hydraulic Conductivity," Soil Sci. Soc. of America Jour., Vol. 55, No. 2, p. 467-470.

Ankeny, M.D., T.C. Kaspar, and R. Horton, 1988, "Design for Automated Tension Infiltrometer," Soil Sci. Soc. of America Jour., Vol. 52, p. 893-896.

Clark, I., 1979, Practical Geostatistics, Applied Sci. Publ., London, 129 p.

Clothier, B.E., and K.R.J. Smettem, 1990, "Combining Laboratory and Field Measurements to Define the Hydraulic Properties of Soil," Soil Sci. Soc. of America Jour., Vol. 54, No. 2, p. 299-304.

David, M., 1977, Geostatistical Ore Reserve Estimation, Elsevier, Amsterdam, 364 p.
Deutsch, C.V., and A.G. Journel, 1992, GSLIB: Geostatistical Software Library and User's Guide, Oxford Univ. Press, New York, 340 p.

Englund, E., and A. Sparks, 1991, Geostatistical Environmental Assessment Software User's Guide (GeoEAS 1.2.1), USEPA Env. Monitoring Systems Lab., Las Vegas, NV.

Isaaks, E.H., 1984, "Risk Qualified Mappings for Hazardous Waste Sites: A Case Study in Distribution-Free Geostatistics," M.S. Thesis, Stanford Univ., Stanford, CA, 111 p.

Isaaks, E.H., and R.M. Srivastava, 1989, An Introduction to Applied Geostatistics, Oxford Univ. Press, New York, 561 p.

Journel, A.G., 1983, "Nonparametric Estimation of Spatial Distributions," Math. Geology, Vol. 15, No. 3, p. 445-468.

Journel, A.G., 1986, "Constrained Interpolation and Qualitative Information -- the Soft Kriging Approach," Math. Geology, Vol. 18, No. 3, p. 269-286.

Journel, A.G., and C.J. Huijbregts, 1978, Mining Geostatistics, Academic Press, New York, 600 p.

Klute, A., 1986, "Methods of Soil Analysis, Part 1," Amer. Soc. of Agronomy, Monograph 9.

Miller, S.M., J.E. Hammel, and L.F. Hall, 1990, "Characterization of Soil Cover and Estimation of Water Infiltration at CFA Landfill II, Idaho National Engineering Laboratory," Res. Report C85-110544, Idaho Water Resources Research Inst., Univ. of Idaho, Moscow, ID, 216 p.

Miller, S.M., and R.D. Luark, 1993, "Spatial Simulation of Rock Strength Properties Using a Markov-Bayes Method," Int. Jour. Rock Mech. Min. Sci. & Geomech. Abstr., Vol. 30, No. 7, p. 1631-1637.

Perroux, K.M., and I. White, 1988, "Designs for Disk Permeameters," Soil Sci. Soc. of America Jour., Vol. 52, No. 5, p. 1205-1215.

Reynolds, W.D., and D.E. Elrick, 1991, "Determination of Hydraulic Conductivity Using a Tension Infiltrometer," Soil Sci. Soc. of America Jour., Vol. 55, No. 3, p. 633-639.

Soil Measurement Systems, 1992, "Tension Infiltrometer User Manual," Soil Measurement Systems, Tucson, AZ.

Zhu, H., 1991, "Modeling Mixture of Spatial Distributions with Integration of Soft Data," Ph.D. dissertation, Dept. of Applied Earth Sci., Stanford Univ., Stanford, CA.
Marc V. Cromer,1 Christopher A. Rautman,2 and William P. Zelinski3
Geostatistical Simulation of Rock Quality Designation (RQD) to Support Facilities Design at Yucca Mountain, Nevada
REFERENCE: Cromer, M. V., Rautman, C. A., and Zelinski, W. P., "Geostatistical Simulation of Rock Quality Designation (RQD) to Support Facilities Design at Yucca Mountain, Nevada," Geostatistics for Environmental and Geotechnical Applications, ASTM STP 1283, R. M. Srivastava, S. Rouhani, M. V. Cromer, A. I. Johnson, A. J. Desbarats, Eds., American Society for Testing and Materials, 1996.

ABSTRACT: The conceptual design of the proposed Yucca Mountain nuclear waste repository facility includes shafts and ramps as access to the repository horizon, located 200 to 400 m below ground surface. Geostatistical simulation techniques are being employed to produce numerical models of selected material properties (rock characteristics) in their proper spatial positions. These numerical models will be used to evaluate the behavior of various engineered features, the effects of construction and operating practices, and the waste-isolation performance of the overall repository system. The work presented here represents the first attempt to evaluate the spatial character of the rock strength index known as rock quality designation (RQD). Although it is likely that RQD reflects an intrinsic component of the rock matrix, this component becomes difficult to resolve given the frequency and orientation of data made available from vertical core records. The constraints of the two-dimensional study along the axis of an exploratory drift allow bounds to be placed upon the resulting interpretations, while the use of an indicator transformation allows focus to be placed on specific details that may be of interest to design engineers.
The analytical process and subsequent development of material property models is anticipated to become one of the principal means of summarizing, integrating, and reconciling the diverse suite of earth-science data acquired through site characterization, and of recasting the data in formats specifically designed for use in further modeling of various physical processes.

KEYWORDS: indicator simulation, rock quality designation, variogram, core data

1 Principal Investigator, Sandia National Laboratories/Spectra Research Institute, MS 1324, P.O. Box 5800, Albuquerque, NM 87185-1342
2 Principal Investigator and Senior Member Technical Staff, Sandia National Laboratories, MS 1324, P.O. Box 5800, Albuquerque, NM 87185-1342
3 Principal Investigator, Sandia National Laboratories/Spectra Research Institute, MS 1324, P.O. Box 5800, Albuquerque, NM 87185-1342
CROMER ET AL. ON ROCK QUALITY DESIGNATION
INTRODUCTION

Yucca Mountain, Nevada is currently being studied by the U.S. Department of Energy as a potential site for the location of a high-level nuclear waste repository. Geologic, hydrologic, and geotechnical information about the site will be required both for engineering design studies and for activities directed toward assessing the waste-isolation performance of the overall repository system. The focus of the overall Yucca Mountain Site Characterization Project is the acquisition of basic geologic and other information through a multidisciplinary effort being conducted on behalf of the U.S. Department of Energy by several federal agencies and other organizations. The location of the proposed underground facilities and the proposed subsurface access drift are shown on Figure 1, along with the locations of the bore holes used in this two-dimensional study. The Yucca Mountain site consists of a gently eastward-dipping sequence of volcanic tuffs (principally welded ash flows with intercalated nonwelded and reworked units). Various types of alteration phenomena, including devitrification, zeolitization, and the formation of clays, appear superimposed upon the primary lithologies. The units are variably fractured and faulted. This faulting has complicated characterization efforts by offsetting the various units, locally juxtaposing markedly different lithologies. Most design interest is focused on the Topopah Spring Member and immediately adjacent units. By comparison, the waste-isolation performance of the repository system must be evaluated within a larger geographic region termed the "controlled area" (Figure 1). The region evaluated by this study is contained entirely within the controlled area. In general, this study is further restricted to the location of the subsurface access drift known as the North Ramp, in keeping with a general engineering orientation.
This two-dimensional study represents the first attempt to identify local uncertainty in the rock structural index known as Rock Quality Designation (RQD).
CONCEPTUAL MODEL

The U.S. Geological Survey provided the original geological cross-section model along the North Ramp (USGS, 1993). That model was subsequently modified by others, and new cross-sections have also been prepared manually. For this study, the cross-section shown in Figure 2 was recreated interactively using the Lynx GMS Geosciences Modeling System to ensure that all of the new bore hole data and corroborative surface control (Scott and Bonk, 1984) were honored. The cross-section shown in Figure 2 is consistent with the conventional assumption that all faults in the repository area are generally down-thrown on the west side. This interpretation requires a variable, but relatively steep, dip to the beds that can locally exceed 6 degrees (10% grade). This cross-section also suggests the possible existence of one or more faults with the east side down-thrown. The eight bore holes noted in Figure 2 are of variable lengths and are shown in their proper orientation with respect to the
Figure 1 General site map of the proposed repository area.

Figure 2 Cross-section along the axis of the North Ramp.
cross-section. For ease in interpretation, only the variations in gross lithology between welded and non-welded tuffs are differentiated in this figure.
RQD AS A REGIONALIZED VARIABLE

During the construction, emplacement, retrieval (if required), and closure phases of the project, consideration of excavation stability must be incorporated into the design to ensure worker health and safety, and to prevent development of potential pathways for radionuclide migration during the post-closure period. In addition to the loads imposed by the in-situ stress field, the repository drifts will be impacted by thermal loads developed after waste emplacement and, periodically, by seismic loads. Rock mass mechanical properties, which reflect the intact rock properties and the fracture/joint characteristics, are used in detailed mechanical analyses to evaluate the host rock response to loading. The RQD index is widely used as an indicator of rock quality/integrity in rock mechanics practice. The concept of RQD is that of a modified core-recovery percentage that incorporates only nonfractured pieces of core that are 0.33 ft (0.10 m) or greater in length:
Run RQD = [ Σ (piece lengths ≥ 0.33 ft) / run length (ft) ] × 100%
Although other parameters of rock quality are available and widely accepted, e.g., the rock mass rating system (RMR) and rock mass quality (Q), the RQD index is considered to be a good indicator since it reflects a combined measure of joint frequency, degree of alteration, and discontinuity filling, if these exist (Deere and Deere, 1989). In fact, both RMR and Q measurements incorporate a factorial component of RQD in their derivation. Common tunnelers' rock quality classifications (Deere & Deere, 1989) are correlated to RQD values in Table 1, while the information provided in Table 2 summarizes expected shotcrete and additional support requirements for a tunnel in rock that has been excavated by a boring machine (Cecil, 1970). This study found the recovery data for individual core runs to be highly variable. Core runs with poor or no recovery are often short and numerous, while intervals with high recovery are usually as long as the drillers could make them. RQD was measured on individual core runs, but the high local variability in core recovery and the disparate lengths of core runs made analysis of core-run data difficult. A weighted-average composite of RQD values on 10-foot intervals provided useful information with which to perform geostatistical analyses.
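The run-length formula above and the length-weighted compositing onto fixed intervals can be sketched as follows; the piece lengths and run lengths in the example are invented for illustration.

```python
def run_rqd(piece_lengths_ft, run_length_ft):
    """RQD (%) for one core run: sum of intact pieces >= 0.33 ft,
    expressed as a percentage of the total run length."""
    return 100.0 * sum(p for p in piece_lengths_ft if p >= 0.33) / run_length_ft

def composite_rqd(runs):
    """Length-weighted average RQD over (rqd, run_length_ft) pairs, as
    used to composite variable-length core runs onto fixed intervals."""
    total = sum(length for _, length in runs)
    return sum(rqd * length for rqd, length in runs) / total

# Hypothetical 4-ft run with pieces of 0.9, 0.2, 0.5, 0.1, and 1.3 ft;
# only the three pieces >= 0.33 ft count (2.7 ft), giving RQD = 67.5%.
rqd1 = run_rqd([0.9, 0.2, 0.5, 0.1, 1.3], run_length_ft=4.0)
# Composite that run with a hypothetical 6-ft run of RQD 20% to
# represent one 10-ft interval.
rqd_10ft = composite_rqd([(rqd1, 4.0), (20.0, 6.0)])
```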
TABLE 1 ROCK QUALITY CLASSIFICATION (from Deere & Deere, 1989)

Rock Quality   RQD      Description
Excellent      90-100   Intact
Good           75-90    Massive, moderately jointed
Fair           50-75    Blocky and seamy
Poor           25-50    Shattered, very blocky and seamy
Very Poor      0-25     Crushed
TABLE 2 GUIDELINES FOR SELECTION OF PRIMARY SUPPORT FOR 2-FOOT TO 40-FOOT TUNNELS IN ROCK (from Cecil, 1970)

Rock Quality                     Shotcrete, Crown                    Shotcrete, Sides                    Additional Support
Excellent (RQD > 90)             None to occasional                  None                                None
Good (RQD 75 to 90)              Local 2 to 3 inches                 None                                None
Fair (RQD 50 to 75)              2 to 4 inches                       None                                Provide for rock bolts
Poor (RQD 25 to 50)              4 to 6 inches                       4 to 6 inches                       Rock bolts as required (approx. 4-6 ft cc)
Very Poor (RQD < 25,             6 inches or more on whole section   6 inches or more on whole section   Medium steel sets as required
excluding swelling)
At the heart of geostatistics is the concept of the regionalized variable (ReV). Without expanding upon random function theory, the ReV can be considered to be a single-valued function defined over a metric space that has properties intermediate between those of a truly random variable and one that is deterministic. In practice, a ReV is preferentially used to describe natural phenomena which are spread out in space (and/or time) and which display a certain structure. This structure is typically characterized by fluctuations that are smooth at a global scale but erratic enough at a local scale to preclude their analytical modeling (Olea, 1991). Unlike true random variables, the ReV has continuity from point to point, but the changes in the variable are complex.
Previous studies of RQD at Yucca Mountain (Lin et al., 1993) concluded that, in general, fracture frequency increases with increasing degree of welding in the volcanic tuffs. The use of an average RQD value to represent the rock quality of an entire unit, though, was not deemed appropriate to account for its observed spatial dispersion. The lateral variation of fracture frequencies and RQD observed by Lin led to the recommendation that a range of values should be considered in the drift design methodology. While Lin's previous work recognized lateral variability of RQD within units, this paper outlines the first attempt to model or further examine the nature of these changes. Although it would appear, at this point, that RQD could be considered a ReV in a manner similar to other rock properties that can be expected to vary in space, e.g., porosity or hydraulic conductivity, the dependence of RQD not only upon the structural fabric of Yucca Mountain but also upon the relationship between the vertical borehole data and that same structure produced some unanticipated problems. Of the four drill cores available to Lin (1993) for evaluation, nearly 95% of the 4000 fractures measured occurred within the more densely welded units and possessed near-vertical dip orientations. While this vertical nature of fracturing is consistent with most of the faults and fractures in the Basin and Range geological province that characterizes the Yucca Mountain area, it required Lin to make corrections when estimating the nondirectional volumetric fracture frequency for each unit. All the data available to this study are also from drill cores and subject to similar considerations, i.e., there may be question as to the validity of RQD measurements derived from vertical drill holes that align themselves sub-parallel with the structural fabric.
For example, intervals of good core recovery and relatively high RQD may simply reflect an isolated block of intact rock in pervasively fractured ground. It can also be shown that the orientation of the drill cores and the sample volume analyzed can influence the interpretation of RQD and distort characterization and modeling of the ReV. Fortunately, this study is focused in two dimensions along the axis of a subsurface drift known as the North Ramp. The definition of the ReV, and any interpretations made from its modeling, will therefore be constrained and specific to this locale. Although limited, the RQD data are consistent in their sample size (boring diameter and 10-foot composite lengths) and general orientation (all vertical, except for drillhole NRG-3).
ANALYSIS OF SPATIAL VARIATION
RQD data were developed along the North Ramp from eight boring cores. These borings are shown in Figure 3 and are vertically exaggerated by five times their actual displacement. The histogram in Figure 4 shows the frequency of RQD values, as grouped into 10 classes. A review of this histogram shows a positively skewed distribution having a mean value of 25.3 and a standard deviation of 24.5. It is interesting to note that 75% of these values are below an RQD value of 44.0. The limited and sporadic
Figure 3 Posting of RQD data in the bore holes along the axis of the North Ramp.

Figure 4 Histogram of the RQD data along the North Ramp.
occurrences of higher RQD values are problematic for extracting a single model (variogram) that accurately represents the spatial correlation of the ReV. There are many situations in which the pattern of spatial continuity of higher values is not the same as that of lower values. For example, preferential groundwater flow resulting from aquifer heterogeneity can often be attributed to isolated occurrences of sand/gravel stringers and lenses that possess the very highest permeability. When such marked disparities exist within a distribution, the higher values tend to increase within-lag variability and make variogram interpretation difficult. The focus of this study was, therefore, directed more specifically at trying to understand the nature of the spatial structure of the lower RQD values and to predict their occurrence as it relates to specific design issues. Indicator methods (Isaaks and Srivastava, 1989) are non-parametric and provide the flexibility needed to focus this study on particular classes of data. This focus is accomplished by transforming the raw data distribution into K mutually exclusive classes of binary indicator variables. The indicator transform of the raw data is typically defined as either one or zero, depending upon whether the datum falls below or above a particular data value (cutoff threshold):
i(x; z_k) = 1 if z(x) ≤ z_k
          = 0 if z(x) > z_k
where z_k is the cutoff threshold for class k; k = 1, 2, ..., K. Spatial relationships between the indicator transforms of RQD were determined by examining all data pairs oriented along the principal directions of anisotropy and separated by pre-defined "lag" distances. The variogram is defined as half of the average squared difference between the indicator pairs, and Cromer and Srivastava (1992) suggest the indicator variogram can be viewed as the probability of switching from one indicator class to another over a certain distance along a given direction. With limited information, it is always a challenge to extract well-structured (low nugget effect, well-defined range) variograms. Isaaks and Srivastava (1989) noted that this is especially true with bore-hole data where, typically, vertical information is closely spaced in comparison to the horizontal separation between borings. Because of the abundant data pairs at all lag intervals and the likelihood of within-unit similarities along the length of core, vertical variograms often appear stable and well-structured. Horizontal variograms, by comparison, often display poor short-scale correlation structure when data are limited. Figure 5 displays the vertical indicator variogram developed for the 25 RQD cutoff. A clearly defined structure was apparent in the vertical orientation and was modeled using the parameters outlined in Figure 5. The computer software UNCERT (Wingle et al., 1994) was used to generate and model the variograms used in this study.
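The experimental indicator variogram described above can be computed, in omnidirectional form for brevity, roughly as follows. This is a sketch, not the UNCERT implementation, and it omits the directional search windows (azimuth, plunge, bandwidths) used in the study; the coordinates and RQD postings are hypothetical.

```python
import numpy as np

def indicator_variogram(coords, values, cutoff, lags, tol):
    """Experimental indicator semivariogram: half the average squared
    difference between indicator pairs falling in each lag class."""
    ind = (values <= cutoff).astype(float)                 # indicator transform
    d = np.linalg.norm(coords[:, None, :] - coords[None, :, :], axis=2)
    iu = np.triu_indices(len(values), k=1)                 # unique pairs only
    h = d[iu]
    sq = (ind[:, None] - ind[None, :])[iu] ** 2
    out = []
    for lag in lags:
        m = (h >= lag - tol) & (h < lag + tol)             # lag class window
        out.append(0.5 * sq[m].mean() if m.any() else np.nan)
    return np.array(out)

# Hypothetical RQD postings; coordinates in feet, omnidirectional lags.
rng = np.random.default_rng(2)
xy = rng.uniform(0.0, 100.0, size=(40, 2))
rqd = rng.uniform(0.0, 100.0, size=40)
lags = np.arange(10.0, 60.0, 10.0)
gam = indicator_variogram(xy, rqd, cutoff=25.0, lags=lags, tol=5.0)
```

Because the squared difference between two indicators is either 0 or 1, the indicator semivariogram is bounded by 0.5, and its theoretical sill is p(1 - p) for a cutoff with indicator proportion p, which is the sense in which it measures the probability of switching classes over a given separation.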
Contrary to the vertical orientation, the horizontal spatial structure (Figure 6) at the 25 RQD cutoff is not as clearly definable. Although some structure is apparent in Figure 6, there are many instances where one must rely on additional information, external to the sample data, to formulate a professional judgment on the horizontal component of this anisotropy. For example, geologic information collected from surface transects can often provide details on principal directions of spatial correlation for the ReV at a scale not easily observed through isolated borings. An omni-directional variogram developed for this cutoff would not, in an explicit fashion, draw attention to limitations and anisotropy in the bore hole data, making this an excellent case for emphasizing that exploratory data analysis cannot be discounted as simply an exercise, nor should geometric anisotropy be ignored. The indicator formalism allows us to model mixtures of populations loosely defined as classes of values of a single attribute (Deutsch and Journel, 1992). By isolating specific classes of data from the global cumulative distribution, variograms built on the indicator transform can often reveal a pattern in spatial continuity not available through other techniques. When estimating or simulating an indicator variable, the variogram model should be consistent with the particular cutoff under study. For example, to examine the distribution of RQD values less than or equal to 25, a variogram that captures the spatial continuity of the 25 threshold should be used. The lower RQD values display a relatively stable, continuous structure consistent with the conceptual geologic framework. This stability, however, does not persist when examining the 50 RQD indicator threshold. At the 50 RQD threshold, the vertical variogram in Figure 7 continues to exhibit good structure and was modeled with the parameters shown.
The horizontal variogram (Figure 8), on the other hand, has degenerated with the inclusion of 78 additional indicator data previously assigned a value of 0 at the 25 cutoff. Not many conclusions can be drawn from such a relationship; therefore, the horizontal component was modeled with an effective nugget at the theoretical sill of 0.147. The observed degeneration of the variogram structure (higher nugget, poorly defined range) reflects the erratic spatial occurrence of higher RQD values, which is again consistent with the conceptual framework.
SIMULATION OF RQD ALONG THE NORTH RAMP

Simulation of the complete cumulative distribution of RQD was attempted using indicator cutoffs at 10, 25, 50, and 75. These cutoffs were selected to roughly correspond to the tunnel support design guidelines shown in Table 2. Although selection of these cutoffs can provide only a crude approximation of the cumulative distribution of RQD, examining additional thresholds was not warranted given the objectives of this evaluation. Since limited and sporadic high RQD data imposed the observed rapid degeneration of the variogram structure at higher thresholds, the potential for order relations problems
CROMER ET AL. ON ROCK QUALITY DESIGNATION
Figure 5 Variogram along the vertical principal direction of anisotropy, for the 25 RQD indicator cutoff. [Indicator semivariogram, point data, spherical model; x-axis: separation distance (feet), 0-240.]
Figure 6 Variogram along the horizontal principal direction of anisotropy, for the 25 RQD indicator cutoff. [Indicator semivariogram, point data, spherical model; x-axis: separation distance (feet), 0-2400.]
GEOSTATISTICAL APPLICATIONS
Figure 7 Variogram along the vertical principal direction of anisotropy, for the 50 RQD indicator cutoff. [Indicator semivariogram, point data, spherical model; x-axis: separation distance (feet), 0-240.]
Figure 8 Variogram along the horizontal principal direction of anisotropy, for the 50 RQD indicator cutoff. [Indicator semivariogram, point data, spherical model; x-axis: separation distance (feet), 0-2400.]
existed. These problems threaten the capability of the simulation algorithm to represent the two higher thresholds (50 and 75) because covariance reproduction is not constrained where high nugget effects prevail. For this reason, quantitative inferences will not be made from these upper thresholds. The indicator transforms were simulated using the sequential indicator simulation algorithm SISIM (Deutsch and Journel, 1992). Figure 9 shows three separate simulated fields of RQD along the axis of the North Ramp. Each field is conditioned to the existing bore-hole data and presents a plausible version of the "reality" defined by first- and second-order statistical moments. These figures have been vertically exaggerated by five times the horizontal dimension for detailed examination. The three images display some similar textural characteristics, while the uncertainty in their representation is captured by the differences between images. If each image were used, for example, as input to some process modeling code for design purposes (say, in a Monte Carlo fashion), the variation in outcomes from the process model would explicitly account for uncertainty. Geostatistical simulation was selected over estimation in this study because of its robustness in addressing potential "downstream" application questions. Simulation differs from estimation in two major aspects: (1) simulation techniques provide high-resolution models that strive for overall textural and statistical representation rather than local accuracy, and (2) the differences among the alternative models provide a measure of joint spatial uncertainty. For example, some uncertainty issues can be addressed simply from the equiprobable images (models) prior to any downstream process modeling.
In Figure 10, a total of 100 conditional simulations have been processed using the crude cumulative distribution function defined by the four indicator thresholds 10, 25, 50, and 75 to determine the distribution of expected (mean) RQD values. The gray scale limits the displayed variability in the image to a range between a maximum of 50 and a minimum of 0 to capture detail. Notice the limited occurrence of values that equal or exceed 50, indicating that these are isolated occurrences that should not be expected to propagate spatially. Most areas of the expected value map shown in Figure 10 are dominated by values that range between 20.0 and 30.0. Although this is consistent with the histogram of RQD values shown in Figure 4, it may also be due, in part, to the selection of simple kriging as the local estimator. Simple kriging was chosen over ordinary kriging because of the scarcity of data and the risk of unwarranted data propagation (Deutsch and Journel, 1992). If ordinary kriging is used with sequential simulation, there may be a tendency to propagate locally simulated values in a manner inconsistent with the conceptual model. This characteristic becomes more problematic when there is a lack of constraining, original data. Since each pixel (model grid cell) is simulated 100 times, the statistical distribution of each local outcome allows us to query characteristics of the outcome distribution of RQD values on a pixel-by-pixel basis. Post-processing of several outcomes can provide information such as the probability of exceeding a specified threshold or the average
[Figure 9 gray scale: 0 to 50.0.]
Figure 9 Three alternative 2-D images (realizations) of RQD along the axis of the North Ramp. The angular trace in the middle of each image represents the vertical orientation of the ramp within the cross-section. Each image can be considered equally probable given the state of existing knowledge, because each is conditioned to the same sample data and honors the same spatial statistics. The differences between the images provide a measure of joint spatial uncertainty.
value above, or below, a threshold. A map showing the value at which an individual pixel reaches a specified cumulative probability, for example, would provide valuable information for quantifying risk. Figure 11 shows the probability of exceeding an RQD value of 25. Although this map looks very similar to the expected value map of Figure 10, it reveals very different information. The gray scale in Figure 11 ranges between zero (0% probability) and one (100% probability), unlike the expected value map.
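The pixel-by-pixel post-processing described above can be sketched as follows. The realization array here is a random stand-in for the 100 SISIM outputs, and the grid dimensions are illustrative only:

```python
import numpy as np

rng = np.random.default_rng(0)

# Stand-in for 100 conditional simulations on a small grid; in the study
# each realization came from SISIM, so these values are placeholders only.
n_real, ny, nx = 100, 20, 50
realizations = rng.uniform(0, 100, size=(n_real, ny, nx))

# Pixel-by-pixel expected (mean) RQD -- the kind of map shown in Figure 10
expected = realizations.mean(axis=0)

# Pixel-by-pixel probability of exceeding RQD 25 -- as in Figure 11
prob_exceed_25 = (realizations > 25).mean(axis=0)

print(expected.shape, prob_exceed_25.shape)
```

Any other query on the local outcome distribution (a percentile map, the mean above a threshold) reduces to the same kind of reduction along the realization axis.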
CONCLUSIONS

Unfortunately, the 2-D simulated images along the North Ramp cross-section do not explicitly focus information on the expected variability to be encountered along the drift itself. To evaluate anticipated conditions specifically along the drift, the designed inclination of the drift has been projected from the tunnel entrance and is shown as the trace superimposed on the images in Figure 9. The expected (mean) value of RQD along the tunnel projection has been extracted on a pixel-by-pixel basis for comparison against each of the three simulations presented in Figure 9. The graphs shown in Figure 12 allow us to compare the variability in simulated RQD along the three tunnel projections (taken from Figure 9) as a function of distance from the right (east) edge of the cross-section. As a point of reference, the east edge of the cross-section also corresponds to the location of boring NRG-1. The most immediate observation in Figure 12 is the widespread, erratic fluctuation of simulated values about their expected (mean) value. This was to be expected following our variography exercises and discovery of the limited horizontal correlation range (approx. 800.0 ft (243.8 m)) for lower RQD values and negligible spatial correlation in the higher RQD values. What is not so apparent is the performance of the simulation in areas that are conditioned by the available boring logs. At distances of less than 3200 ft (975.4 m) from NRG-1, the simulations, in general, tend to deviate less from the expected value. Boring log data in this region are available to constrain uncertainty and, therefore, reduce the spread of likely outcomes for a local prediction.
FINAL THOUGHTS

Basic exploratory data analysis identified a great deal of local variability in RQD. Although very low RQD (i.e., less than 25) can be anticipated periodically along the entire length of the North Ramp, it would not be prudent to extrapolate this interpretation to the entire mountain. Three factors were found to influence the interpretation of RQD: 1) stratigraphic setting, 2) proximity to major fault/fracture zones, and 3) very local foot-by-foot factors (likely due to individual high-angle fractures sub-parallel to the drill core). The high degree of variability over very short distances may require design planning to accommodate the worst rock conditions along the entire length of excavation.
Figure 10. Mean (expected) value map developed from 100 individual simulations of RQD.
Figure 11 Probability map reflecting the likelihood of exceeding an RQD value of 25. Note the scale reflects a probability range from 0% to 100%.
[Figure 12 panels: realizations RN-112063, RN-30157, and RN-22475; x-axis: horizontal distance from NRG-1.]
Figure 12 Simulated RQD values along the proposed North Ramp, taken from the three fields shown in Figure 9. For comparison, their expected values derived from the 100 simulations are also shown (in bold).
Investigative work on rock properties in the exploratory studies facility is underway to supplement drill hole data with an adequate number and distribution of data pairs collected in a fashion that will support geostatistical analyses. In the meantime, simulation analyses have provided a preliminary assessment of the conditions that could be encountered during the excavation of the North Ramp. Indicator simulation along the axis of this drift identifies the need for additional information if this study, or similar studies, are to forecast engineering requirements for facilities design, especially with respect to spatial continuity of higher RQD values.
This study has demonstrated how the measurement and analysis of data may lead to interpretations that are not obvious or apparent using other means of research. Although many statistical tools are useful in developing insights into a wide variety of natural phenomena, many others can be used to develop quantitative answers to specific questions. Unfortunately, most classical statistical methods make no use of the spatial information in earth science data sets. However, like classical statistical tests, geostatistical techniques are based on the premise that information about a phenomenon can be deduced from an examination of a small sample collected from a vastly larger set of potential observations on the phenomenon. Geostatistics offers a way of describing the spatial continuity that is an essential feature of many natural phenomena and provides adaptations of classical regression techniques to take advantage of this continuity. The quantitative methodology found in applications of geostatistical modeling techniques can reveal the insufficiency of data, the tenuousness of assumptions, or the paucity of information contained in most geologic studies.
REFERENCES

Cecil III, O.S., 1970, "Correlations of Rock Bolt-Shotcrete Support and Rock Quality Parameters in Scandinavian Tunnels," Ph.D. Thesis, University of Illinois, Urbana.

Cromer, M. V. and R. M. Srivastava, 1992, "Indicator Variography for Spatial Characterization of Aquifer Heterogeneities," in Water Resources Planning and Management, Proceedings of the Water Resources Sessions at Water Forum '92, August 2-5, 1992, American Society of Civil Engineers, Baltimore, MD, pp. 420-425.
Deere, D.U., and D.W. Deere, 1989, "Rock Quality Designation (RQD) after Twenty Years: US Army Corps of Engineers," Contract Report GL-89-1.

Deutsch, C.V. and A.G. Journel, 1992, "GSLIB: Geostatistical Software Library and User's Guide," Oxford University Press, New York, New York.

Isaaks, E. H., and R. M. Srivastava, 1989, "An Introduction to Applied Geostatistics," New York: Oxford University Press.
Lin, M., M. P. Hardy, and S. J. Bauer, 1993, "Fracture Analysis and Rock Quality Designation Estimation for the Yucca Mountain Site Characterization Project: Sandia Report SAND92-0449," Sandia National Laboratories, Albuquerque, NM.

Olea, R.A., 1991, "Geostatistical Glossary and Multilingual Dictionary," International Association for Mathematical Geology Studies in Mathematical Geology No. 3, Oxford University Press.

Scott, R.B. and J. Bonk, 1984, "Preliminary Geologic Map of Yucca Mountain, Nye County, Nevada, with Geologic Sections," U.S. Geol. Survey Open-File Report 84-494.

US Geological Survey, 1993, "Methodology and Source Data Used to Construct the Demonstration Lithostratigraphic Model: Second Progress Report."

Wingle, W. L., E. P. Poeter, and S. A. McKenna, 1994, "UNCERT User's Guide: A Geostatistical Uncertainty Analysis Package Applied to Ground Water Flow and Contaminant Transport Modeling," draft report to the United States Bureau of Reclamation, Colorado School of Mines.
James R. Carr¹

REVISITING THE CHARACTERIZATION OF SEISMIC HAZARD USING GEOSTATISTICS: A PERSPECTIVE AFTER THE 1994 NORTHRIDGE, CALIFORNIA EARTHQUAKE
REFERENCE: Carr, J. R., "Revisiting the Characterization of Seismic Hazard Using Geostatistics: A Perspective After the 1994 Northridge, California Earthquake," Geostatistics for Environmental and Geotechnical Applications, ASTM STP 1283, R. Mohan Srivastava, Shahrokh Rouhani, Marc V. Cromer, A. Ivan Johnson, and Alexander J. Desbarats, Eds., American Society for Testing and Materials, 1996.
ABSTRACT: An indicator kriging model of seismic hazard for southern California, based on the time period 1930-1971, is developed. This hazard assessment is evaluated in light of the occurrence of more recent, moderate earthquakes: the 1987 Whittier Narrows, the 1990 Upland, and the 1994 Northridge earthquakes. The hazard map shows relatively poor spatial correlation between regions of high hazard and known, active faults. A hypothesis is developed, however, suggesting that high seismic hazard in southern California is a function of spatial proximity to all active faults, not to any one active fault.

KEYWORDS: seismic hazard, modified Mercalli intensity, southern California, kriging, semivariogram, indicator functions
Geostatistical analysis of earthquake ground motion was first attempted by Glass (1978). Therein, modified Mercalli intensity data for the 1872 Pacific Northwest earthquake were analyzed using semivariogram analysis, then regularized (gridded) using kriging and contoured. Glass (1978) demonstrates the usefulness of geostatistics vis-a-vis semivariogram analysis and kriging for analyzing earthquake ground motion. Based on the success of Glass (1978), an experiment was attempted to characterize seismic hazard for southern California (Carr 1983; Carr and Glass 1984). Kriging was used to form digital rasters of modified Mercalli intensity data for all earthquakes in the time period, 1930-1971, that occurred within a 125 km radius of San Fernando, California (an arbitrary choice). These digital rasters

¹Professor, Department of Geological Sciences/172, University of Nevada, Reno, NV 89557
CARR ON NORTHRIDGE EARTHQUAKE
were geographically registered and, as such, served as input to a Gumbel (1958) extreme events model for computing seismic hazard. Procedures for developing this model consisted of the following steps: 1) kriging was used to form a digital raster for each earthquake in the aforementioned time frame; all of these rasters were geographically registered; 2) for each year, 1930 - 1971, if more than one earthquake occurred, then the maximum kriged intensity for each cell of the raster was found and a summary raster formed reflecting maximum intensity for the year; this process resulted in 42 digital rasters, each a record of maximum intensity values for an entire year; 3) Gumbel (1958) statistics of extreme values was used to compute the probability that an intensity VI was exceeded for a raster cell over the 1930 - 1971 time period; an intensity VI was an arbitrary choice, but this is the intensity value at which exterior damage to buildings begins. These exceedance probabilities constitute the seismic hazard (Fig. 1).
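Step 3 of this procedure can be sketched with a simple method-of-moments Gumbel fit for one raster cell; the 42 annual-maximum intensities below are invented for illustration, not the study's data, and the fitting method is one common choice rather than necessarily the one used by Carr (1983):

```python
import math

# Hypothetical annual-maximum intensities for one raster cell over 42 years
annual_max = [3, 0, 5, 4, 6, 2, 0, 3, 7, 4, 5, 3, 2, 6, 4, 3, 5, 0, 4, 2,
              3, 5, 6, 4, 3, 2, 4, 5, 3, 4, 6, 2, 3, 4, 5, 3, 4, 2, 5, 3,
              4, 6]

n = len(annual_max)
mean = sum(annual_max) / n
std = math.sqrt(sum((x - mean) ** 2 for x in annual_max) / n)

# Method-of-moments estimates of the Gumbel location (mu) and scale (beta)
beta = std * math.sqrt(6) / math.pi
mu = mean - 0.5772 * beta  # 0.5772... is the Euler-Mascheroni constant

# Probability that the annual maximum intensity exceeds VI (6):
# P = 1 - F(6), with Gumbel CDF F(x) = exp(-exp(-(x - mu) / beta))
p_exceed_vi = 1.0 - math.exp(-math.exp(-(6 - mu) / beta))
print(0.0 < p_exceed_vi < 1.0)
```

Repeating this fit for every cell of the registered annual-maximum rasters yields an exceedance-probability surface of the kind contoured in Fig. 1.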
Fig. 1. Seismic hazard model developed using Gumbel (1958); from Carr (1983) and also published in Carr and Glass (1984). Contoured values are probabilities (%) of exceeding an intensity VI over a 50-year period.
A Gumbel (1958) model requires that certain decisions be made when computing the probability of exceeding a particular level of ground motion. For example, a minimum, or threshold, ground motion value must be chosen for calculations. In Carr and Glass (1984), for instance, a minimum intensity value of III was chosen, yet in many years, the minimum value was actually 0. The choice of an intensity III was entirely arbitrary. As an alternative to a Gumbel (1958) model, Carr and Bailey (1985) developed an indicator kriging (cf. Journel 1983) seismic hazard model. This model does not use Gumbel's statistics of extremes method for computing exceedance probabilities. Instead, modified Mercalli intensity data are first converted to indicator values as is described later. Once converted to indicator values, kriging is applied to the indicator data to form digital rasters. As in the Carr and Glass (1984) model, these rasters
were geographically registered during the kriging process. Because the rasters are registered, the final step in the indicator kriging model is simply a summing of all rasters to form one, combined raster. A contour map of the combined raster shows the frequency of exceeding a threshold VI over a particular time period. This frequency constitutes the seismic hazard for a particular geographic region. Carr and Bailey (1985) applied the indicator kriging model to the New Madrid, Missouri seismic zone in the time period, 1811 - 1980. Because the indicator kriging model is considerably easier to apply in comparison to one using the Gumbel (1958) method, the seismic hazard in southern California in the time frame 1930 - 1971 is revisited herein using indicator kriging. One objective of this study is to compare the seismic hazard map from indicator kriging to that obtained using a Gumbel calculation. Another aspect of this analysis is to compare the occurrence of recent southern California earthquakes, in particular the 1987 Whittier Narrows, the 1990 Upland, and the 1994 Northridge earthquakes, to the seismic activity that preceded them (1930 - 1971).
A BRIEF REVIEW OF GEOSTATISTICS

In general, geostatistical methods are useful for characterizing the spatial variation of regionalized phenomena. Other than earthquake ground motion, geotechnical applications include soil density and strength, ground water level, and ground water salinity; of course, there are many more examples. The term geostatistics is often considered synonymous with the spatial estimation technique known as kriging (Matheron 1963). This estimator is a relatively simple, weighted average of the form:

    Z*(x_0) = sum_{i=1}^{N} a_i Z(x_i)
wherein Z(x_i) are the data values at the N data locations nearest the estimation location, x_0; Z*(x_0) is the estimated value at x_0; and the values a_i are weights applied to the N data values to obtain the estimate. A restriction is placed on the weights in ordinary kriging such that their sum is 1; this assures unbiased estimation. That kriging is a relatively simple estimator is seen in its equation form, a simple weighted average. Obtaining the weights for this equation is more complicated. The weights are obtained by solving the matrix system [COV_ij]{a} = {COV_0i}. Notice that these matrices are functions of spatial covariance (COV). Covariance in this case is the autocovariance of the spatial data, Z, between two locations in space. Knowledge of spatial covariance is obtainable from what is known as the semivariogram (often referred to simply as the variogram; see Matheron 1963 or Journel and Huijbregts 1978). The semivariogram is estimated from the spatial data
as follows:

    γ(h) = (1 / 2N) sum_{i=1}^{N} [Z(x_i) - Z(x_i + h)]^2
which is the average squared difference in Z as a function of spatial separation distance (lag), h. Once the semivariogram is calculated, it must be modeled for use in kriging. Only a few functions, those that are negative semi-definite, qualify as valid models (see Journel and Huijbregts 1978). The most useful semivariogram model is known as the spherical model and is graphed in Fig. 2. To model a calculated semivariogram (Fig. 2), values for the nugget, sill, and range are interpreted, allowing the spherical model equation to fit the calculated semivariogram as closely as possible. Then, spatial covariance is obtainable from the semivariogram model as follows:

    COV(h) = sill - γ(h)
In kriging, once a semivariogram model is selected and its parameters defined (nugget, sill, and range), the covariance entries in the foregoing matrix system are computed using the semivariogram model. How these calculations are performed is described in Carr (1995) using hand calculation examples. Once the covariance matrix entries are obtained, the matrix system is solved for the weights, a, using an equation solver such as Gauss elimination or LU decomposition. Software for semivariogram calculation and kriging is given in Deutsch and Journel (1992), including diskettes containing FORTRAN source code. Software is also given in Carr (1995) along with graphics routines for displaying results.
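As a sketch of the weight computation just described, the small system below uses hypothetical covariance values and solves the ordinary kriging equations with the unbiasedness constraint appended via a Lagrange multiplier:

```python
import numpy as np

# Hypothetical covariances among three data locations (COV_ij) and between
# each datum and the estimation location (COV_0i); all values are invented.
C = np.array([[1.00, 0.40, 0.25],
              [0.40, 1.00, 0.30],
              [0.25, 0.30, 1.00]])
c0 = np.array([0.55, 0.45, 0.20])

# Ordinary kriging appends the constraint sum(a_i) = 1 to the system,
# adding one extra row and column of ones and a zero corner entry.
A = np.ones((4, 4))
A[:3, :3] = C
A[3, 3] = 0.0
b = np.append(c0, 1.0)

sol = np.linalg.solve(A, b)
weights = sol[:3]                   # the kriging weights a_i
z = np.array([42.0, 37.0, 51.0])    # hypothetical data values Z(x_i)
estimate = float(weights @ z)       # Z*(x_0), the weighted average

print(round(float(weights.sum()), 6))  # the constraint forces this to 1
```

A library solver stands in for the Gauss elimination or LU decomposition mentioned above; the structure of the augmented system is the point of the sketch.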
Fig. 2. A calculated semivariogram modeled using a spherical model; note nugget (C0), sill, and range (from Carr 1995). [Axes: γ(h) versus lag distance h (km); sill = 165, range = 2.3.]
A BRIEF NOTE REGARDING THE DATA Herein is presented a seismic hazard model of southern California that is based on modified Mercalli intensity data. Such data are subjectively assigned and are integer values in the range 0 to XII (12). A value, 0, represents no ground motion; a
value, XII (12), represents total damage, landsliding, fissuring, liquefaction, and so on. A value, VI (6), is that value at which exterior structural damage is noticed, such as cracked chimneys. Interior damage is noted with a value, V (5). Subsequent to an earthquake, the United States Geological Survey distributes questionnaires to citizens living within the region experiencing the earthquake. They are asked to describe what they experienced during the earthquake. Examples include: 1) Did you observe damage and, if so, what was the damage? 2) Did you feel the earthquake and, if so, where were you when you felt it? Intensity values are then assigned [subjectively] to each questionnaire. That modified Mercalli intensity data are subjective is obvious. What is not obvious is that geostatistics (kriging) is validly applied to grid (estimate) such data. Clearly, Glass (1978) showed this empirically. Journel (1986) discusses the application of geostatistics to "soft," or subjective, data in considerable detail.
INDICATOR KRIGING SEISMIC HAZARD MODEL

Indicator kriging is a form of kriging that does not entail a change in the equation for the kriging estimator, but does entail a change in the data to which kriging is applied. With indicator kriging, a transform is applied to the data, in this case modified Mercalli intensity values. This transform is a simple one: i(x) = 0 if Z(x) < c; i(x) = 1 otherwise. This simple transform yields the indicator function, i. Notice that the indicator function is a binary one, taking on only two possible values, 0 and 1. Because of this, the indicator function is said to be a nonparametric function, because the notion of a probability distribution for such a function is not pertinent. The nonparametric nature of the indicator function has certain advantages in geostatistics (Journel 1983), chiefly the minimization of the influence of extreme data values on the calculation of the semivariogram and in kriging. The value, c, used to define the indicator function is called a threshold value. In this study of seismic hazard, c is that critical ground motion value chosen to define the hazard. Here, c is chosen to be an intensity value of VI (6) because this intensity value is that at which exterior structural damage is first noticed. When performing indicator kriging, the indicator function, i, is used rather than the raw data, Z. Other than this substitution, the kriging estimator is applied using the same equation as shown before. Weights, a, are calculated using the matrix system shown previously; covariance entries in this matrix system are obtained using the semivariogram for the function, i. When performing kriging on i, estimates are obtained that range between 0 and 1, inclusive. As the function, i, is defined for seismic hazard analysis, the estimate of i is interpreted as the probability at the estimation location that ground motion exceeds the threshold value, c, used to define the indicator function.
An indicator kriging model for assessing seismic hazard is a simple one. Modified Mercalli intensity data for each earthquake in a particular time period are
transformed to indicator values as follows: if intensity is VI or greater, the intensity value is converted to 1; otherwise the intensity value is converted to zero. Kriging is used to form a regular grid (a digital raster) of the indicator values. For this study, 50 x 50 rasters were designed, registered to geographic coordinates as shown in various figures herein (for example, Fig. 3). Once rasters are formed for each earthquake in the time period, the digital rasters are simply added together to form a final, composite-sum map. Higher hazard appears in this map as regions associated with a higher sum.
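Under the simplifying assumption that each earthquake's registered raster has already been reduced to 0/1 values, the composite-sum step is just an element-wise addition; three random stand-in rasters replace the study's kriged ones here:

```python
import numpy as np

rng = np.random.default_rng(1)

# Stand-ins for three geographically registered indicator rasters
# (0/1 per cell after thresholding at intensity VI); the study used 46.
rasters = [(rng.uniform(size=(50, 50)) > 0.7).astype(int) for _ in range(3)]

# Because the rasters are registered, the hazard map is their element-wise
# sum: each cell counts episodes of intensity VI or greater ground motion.
hazard = np.sum(rasters, axis=0)
print(hazard.shape, int(hazard.max()) <= len(rasters))
```

Contouring this summed raster and shading the highest counts reproduces the kind of display shown in Fig. 3.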
APPLICATION TO SOUTHERN CALIFORNIA SEISMICITY, 1930-1971

Indicator kriging has been used to characterize southern California earthquake hazard previously (van der Meer and Carr 1992). The present study uses all 46 earthquakes that occurred between 1930 and 1971 that were associated with intensity values of VI or greater (see Carr 1983 for a list of these earthquakes). Van der Meer and Carr (1992) used only the 11 largest magnitude earthquakes of these 46. Hence, one objective of this current study is to revisit the earlier indicator kriging model and to update it using more information. Another objective, one not considered by van der Meer and Carr (1992), is to compare recent, large earthquakes with the seismic patterns analyzed in the indicator kriging model that is based on the time period, 1930-1971. An indicator kriging seismic hazard model based on the 46 earthquakes is shown (Fig. 3). It shares some similarities with that obtained previously (Fig. 1). In particular, a region of high hazard is found in each map near Oxnard/Santa Barbara. However, the indicator kriging hazard map finds particularly high hazard north to northeast of Long Beach. Both maps (Figs. 1 and 3) are also associated with relatively low hazard near Mojave, California. Van der Meer and Carr (1992) focused analytical attention on whether high hazard correlated spatially with known, active faults. That study found that higher hazard could not be directly related to any one active fault in southern California. This study verifies this conclusion. Higher hazard does not directly correlate spatially with known active faults (Fig. 3). Because southern California is associated with so many active faults, it is perhaps not surprising that higher hazard sometimes occurs spatially where it is not expected. A hypothesis (Fig. 4) is forwarded as a possible explanation. This figure shows three hypothetical earthquakes.
A circle encloses each epicenter and ground motion intensity of at least MMI VI (6) is assumed to have occurred everywhere within each circle. The dark, gray patterned area is that affected by all three earthquakes and therefore has a higher hazard because three episodes of damaging
Fig. 3. Indicator kriging hazard map with major active faults superimposed. Regions associated with at least 6 episodes of intensity VI or higher ground motion in the time period, 1930-1971, are highlighted in gray. The faults are coded as follows: A) White Wolf; B) Garlock; C) Big Pine; D) Santa Ynez; E) Oak Ridge; F) San Andreas; G) San Gabriel; H) Newport-Inglewood; I) San Jacinto.
ground motion were experienced. But a higher hazard would not necessarily be expected within this gray-patterned region because it is not near any one fault. Its proximity to three active faults, however, makes it vulnerable to damage during earthquakes occurring on all three faults. This hypothetical model is thought to explain the regions of higher hazard in Figure 3. With respect to Long Beach, it has experienced damaging ground motion from earthquakes occurring on the Newport-Inglewood fault (the 1933 Long Beach earthquake), faults in the San Fernando Valley (e.g., the 9 February 1971 earthquake), the White Wolf fault (the 1952 Kern County earthquake), and also earthquakes occurring on the San Gabriel, San Andreas, San Jacinto, Oak Ridge, and Santa Ynez faults. With respect to Oxnard, it has been affected by earthquakes on the Newport-Inglewood fault (1933 Long Beach earthquake), the Oak Ridge fault (1941 Santa Barbara and 1957 Ventura earthquakes), the White Wolf fault (1952 Kern County earthquake), and to a lesser extent by earthquakes in the San Fernando Valley. As a test of the hypothesis (Fig. 4), the active faults shown in Figure 3 are idealized as shown in Figure 5. A digital raster is developed for each of these faults as follows: 1) an attenuation function was designed from a general formula given in Cornell (1968): intensity = 5.4 + M - 3 ln R, where M is Richter magnitude and R is the distance from the fault; 2) a typical Richter magnitude was chosen for each of the nine (9) faults (Table 1); 3) a 34 x 34 digital raster (an arbitrary choice of size) was
CARR ON NORTHRIDGE EARTHQUAKE
243
Fig. 4. Three hypothetical earthquakes occurring on the faults shown. Notice that the gray-shaded region is affected by all three earthquakes.
Fig. 5. Idealized active fault locations. Codes for faults are the same as described in the caption to Fig. 3.
developed, geographically registered to the kriged seismic hazard rasters (note that this grid size is smaller than that used for indicator kriging; both grid sizes, however, are arbitrary and merely facilitate the construction of contour maps). An intensity value was estimated for each cell of the raster using the foregoing attenuation formula (not by indicator kriging in this case); 4) if the estimated intensity was VI or greater, the raster cell was assigned a value of 1; otherwise the cell was assigned the value 0. Once a digital raster was developed by this procedure for each of the nine active faults (Fig. 6), a composite raster was formed as the sum of all nine rasters. The frequency of intensity VI or greater ground motion was then contoured (Fig. 7). Gray shading highlights the geographic regions associated with the highest frequency of damaging ground motion. A comparison of Figure 7 to Figure 3 shows that regions of higher hazard found in the hypothetical map (Fig. 7) do not exactly match those in the indicator kriging hazard map (Fig. 3). However, the region of higher hazard near Long Beach (Fig. 3) is near one of the higher hazard regions of Figure 7, and the higher hazard found near Oxnard (Fig. 3) is near another region of higher hazard found in Figure 7. Both Figures 3 and 7 identify lower hazard near Mojave. Similarities between these two maps are interesting and lend credibility to the foregoing hypothesis (Figure 4).
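The raster construction described in steps 1) through 4) can be sketched numerically. The following is a hypothetical illustration, not the author's code: the fault traces, grid coordinates, and the crude vertex-based distance computation are all invented for the example, while the attenuation formula and the Table 1 magnitudes come from the text.

```python
import numpy as np

def intensity_raster(fault_xy, magnitude, grid_x, grid_y):
    # Cornell (1968)-style attenuation from the text:
    # intensity = 5.4 + M - 3 ln R, R = distance to the fault.
    gx, gy = np.meshgrid(grid_x, grid_y)
    cells = np.column_stack([gx.ravel(), gy.ravel()])
    # distance from each cell to the nearest fault vertex (a crude
    # stand-in for distance to the fault trace)
    r = np.min(np.linalg.norm(cells[:, None, :] - fault_xy[None, :, :], axis=2), axis=1)
    r = np.maximum(r, 1.0)  # avoid ln(0) on the fault itself
    return (5.4 + magnitude - 3.0 * np.log(r)).reshape(gy.shape)

def composite_hazard(faults, grid_x, grid_y, threshold=6.0):
    # One 0/1 raster per fault (1 where estimated intensity >= VI),
    # summed into a frequency-of-damaging-motion map, as in the text.
    total = np.zeros((len(grid_y), len(grid_x)))
    for fault_xy, mag in faults:
        total += intensity_raster(fault_xy, mag, grid_x, grid_y) >= threshold
    return total

# hypothetical fault traces (km coordinates), magnitudes in the style of Table 1
faults = [
    (np.array([[0.0, 0.0], [10.0, 5.0]]), 7.0),
    (np.array([[20.0, 0.0], [20.0, 15.0]]), 6.5),
]
grid = np.linspace(0.0, 33.0, 34)  # a 34 x 34 raster, as in the text
hazard = composite_hazard(faults, grid, grid)
```

Cells where `hazard` equals 2 are affected by both hypothetical faults, which is exactly the overlap effect the hypothesis attributes to Long Beach and Oxnard.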
244
GEOSTATISTICAL APPLICATIONS
TABLE 1--Richter magnitudes used for nine active faults.

    Fault               Magnitude
    White Wolf          7.0
    Garlock             7.0
    Big Pine            6.5
    Santa Ynez          6.5
    Oak Ridge           6.5
    San Andreas         8.3
    San Gabriel         6.5
    Newport-Inglewood   6.5
    San Jacinto         6.5
RECENT SOUTHERN CALIFORNIA EARTHQUAKES

Epicenters of three recent southern California earthquakes are plotted (Fig. 8): 1) the 1987 Whittier Narrows earthquake, magnitude 5.5 to 6.0; 2) the 1990 Upland earthquake, magnitude 5.0 to 5.4; and 3) the 1994 Northridge earthquake, magnitude 6.6 (estimated). It is interesting that these three earthquakes occurred close to the San Gabriel fault. With respect to the indicator kriging result, none of these earthquakes occurred within a region identified as having a high seismic hazard. Of course, this is the point made with the foregoing hypothesis (Figures 5 and 8): higher hazard cannot be spatially correlated with any one active fault in southern California. The 1987 Whittier Narrows and the 1994 Northridge earthquakes, for example, caused damaging levels of ground motion within the region of higher hazard found north of Long Beach; these earthquakes increased the hazard within this region. Furthermore, the 1994 Northridge earthquake caused damaging levels of ground motion in the Oxnard area, another region identified as having higher hazard. Only the 1990 Upland earthquake occurred in a lower hazard area; it did not have a large enough magnitude to influence any of the higher hazard regions. Figure 8 also shows these three epicenters plotted on the hypothetical hazard map (Figure 7). The epicenters for the 1990 Upland and 1994 Northridge earthquakes occur just outside regions of highest hazard, whereas the epicenter for the 1987 Whittier Narrows earthquake occurs within the region of high hazard north of Long Beach.
CONCLUSION

An indicator kriging seismic hazard model is much more easily developed than one based on Gumbel's statistics of extreme values (Gumbel 1958). With the indicator kriging model, modified Mercalli intensity data are first transformed to indicator values: 1 if the intensity is VI (6) or greater, 0 otherwise. Kriging is used to estimate the 0/1 indicator data at the nodes of a regular grid, hence forming a raster. Once rasters are formed for all
Fig. 6. Region of intensity VI or greater ground motion for earthquakes occurring anywhere along the Newport-Inglewood fault having Richter magnitudes of 6.5.
Fig. 7. Resultant hazard map produced by hypothesizing earthquakes along the entire spatial domain of active faults. Gray shading shows regions where the theoretical model predicts the highest frequency of intensity VI or greater ground motion.
Fig. 8. The seismic hazard maps of Figures 3 and 7 (the indicator kriging model and the theoretical model) with epicenters plotted for the 1987 Whittier Narrows, 1990 Upland, and 1994 Northridge earthquakes.
earthquakes occurring within a time window of interest, the rasters are simply summed to yield a final composite map (e.g., Figure 3). No normalization is performed subsequent to this summing process. Final maps therefore do not represent probabilities, but instead represent the total frequency of experiencing ground motion severe enough to cause damage. Moreover, this summing process assumes all rasters are geographically registered, a condition easy to achieve with kriging because, when used for gridding, the geographic coordinates defining the grid must be entered into the computer program performing the kriging.

Accepting the hypothesis that high hazard is a function of spatial proximity to all active faults, not just to any one active fault, the hazard map produced herein using indicator kriging is judged to be plausible. Extremely high hazard is identified just to the north and west of Long Beach, California. Long Beach is within that region of southern California affected by more active faults than other regions of the state. In fact, two recent earthquakes, the 1987 Whittier Narrows and the 1994 Northridge earthquakes, occurred close enough to this high hazard region to have produced damage within it. Another region of relatively high hazard is identified around Oxnard, California, and reflects a relatively high level of seismic activity on the Oak Ridge fault within the Santa Barbara Channel.

In summary, geostatistics offers spatial analysis tools that are quite useful for producing maps of seismic activity. Ground motion for individual earthquakes is readily gridded using kriging. Furthermore, what is presented herein is nothing more than a raster-based geographic information system. Hence, GIS programs having a raster-import capability, such as ARC/INFO, are capable of displaying the results given herein once digital rasters have been formed using software such as is given in Carr (1995).
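The indicator coding and raster summing described above can be sketched in a few lines. This is a hypothetical illustration with invented intensity values; a plain array threshold stands in for the kriging step, and no normalization is applied, so the result is a frequency count, as in the text.

```python
import numpy as np

def indicator(mmi, threshold=6):
    # 1 where modified Mercalli intensity >= VI, else 0
    return (np.asarray(mmi) >= threshold).astype(int)

def sum_rasters(rasters):
    rasters = [np.asarray(r) for r in rasters]
    # summing only makes sense if all rasters share one geographic grid
    assert all(r.shape == rasters[0].shape for r in rasters)
    return np.sum(rasters, axis=0)  # a total frequency, not a probability

# two hypothetical earthquakes on a tiny 2 x 2 grid
quake_a = indicator([4, 6, 8, 5]).reshape(2, 2)
quake_b = indicator([7, 7, 3, 6]).reshape(2, 2)
frequency = sum_rasters([quake_a, quake_b])  # cells hit by damaging motion 0, 1, or 2 times
```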
References

Carr, J. R., 1983, "Application of the Theory of Regionalized Variables to Earthquake Parametric Estimation and Simulation," unpublished doctoral dissertation, University of Arizona, 259 p.

Carr, J. R., 1995, Numerical Analysis for the Geological Sciences, Prentice-Hall, Englewood Cliffs, New Jersey.

Carr, J. R. and Glass, C. E., 1984, "A Regionalized Variables Model for Seismic Hazard Assessment," Eighth World Conf. on Earthquake Engineering, Prentice-Hall, Englewood Cliffs, New Jersey, Vol. 1, pp. 207-213.

Carr, J. R. and Bailey, R. E., 1986, "An Indicator Kriging Model for the Investigation of Seismic Hazard," Mathematical Geology, Vol. 18, No. 4, pp. 409-428.

Cornell, C. A., 1968, "Engineering Seismic Risk Analysis," Bulletin of the Seismological Society of America, Vol. 58, No. 5, pp. 1583-1606.

Deutsch, C. V. and Journel, A. G., 1992, GSLIB: Geostatistical Software Library and User's Guide, Oxford University Press, New York.

Glass, C. E., 1978, "Application of Regionalized Variables to Microzonation," Proc. 2nd International Conference on Microzonation for Safer Construction - Research and Application, Vol. 1, pp. 509-521.

Gumbel, E. J., 1958, Statistics of Extremes, Columbia University Press, New York.

Journel, A. G., 1983, "Nonparametric Estimation of Spatial Distributions," Journal of the International Association for Mathematical Geology, Vol. 15, No. 3, pp. 445-468.

Journel, A. G., 1986, "Constrained Interpolation and Qualitative Information - The Soft Kriging Approach," Mathematical Geology, Vol. 18, No. 3, pp. 269-286.

Journel, A. G. and Huijbregts, Ch. J., 1978, Mining Geostatistics, Academic Press, London.

Matheron, G., 1963, "Principles of Geostatistics," Economic Geology, Vol. 58, pp. 1246-1266.

van der Meer, F. D. and Carr, J. R., 1992, "Geostatistical Investigation of Earthquake Hazards in Southern California," ITC (Int. Inst. for Aerospace Survey and Earth Sciences) Journal, 1992-2, pp. 164-171.
Farida S. Goderya, M. F. Dahab, W. E. Woldt, and I. Bogardi
SPATIAL PATTERNS ANALYSIS OF FIELD MEASURED SOIL NITRATE
REFERENCE: Goderya, F. S., Dahab, M. F., Woldt, W. E., and Bogardi, I., "Spatial Patterns of Field Measured Residual Soil Nitrate," Geostatistics for Environmental and Geotechnical Applications, ASTM STP 1283, R. Mohan Srivastava, Shahrokh Rouhani, Marc V. Cromer, A. Ivan Johnson, and Alexander J. Desbarats, Eds., American Society for Testing and Materials, 1996.

ABSTRACT: The purpose of this study was to assess the spatial variability of residual soil nitrate, measured in three contiguous 16 ha fields. Available data for residual soil nitrate were examined using conventional statistics. Data tended to be skewed, with the mean greater than the median. Geostatistical methods were used to characterize and model the spatial structure. Three dimensional spatial variability was examined using two semivariograms: horizontal-spatial and vertical. Two dimensional horizontal-spatial semivariograms were also computed for each 0.3 m (1 ft) layer. Semivariogram analysis showed that there were similarities in the patterns of spatial variability for all fields. The results suggest that the spatial patterns in residual soil nitrate may be correlated with irrigation practices. Furthermore, a trend was found to be present along the vertical direction, which may be related to the time of sampling.

KEYWORDS: spatial variability, 3-D semivariogram, 2-D semivariogram, directional semivariogram, residual soil nitrate
INTRODUCTION

Nitrate contamination in groundwater is often related to nitrogen fertilizer applied in excess of crop needs. Residual soil nitrate is frequently the largest source of inorganic N available to crops. The amount of nitrate in the soil profile is important for determining a fertilizer nitrogen recommendation that ensures sufficient nitrogen for crop production as well as preventing potential groundwater problems.

The origin and nature of soil resource variability includes natural and management induced soil parameters, and factors exhibiting variability in space and

1 Graduate Research Assistant and 2 Professor, Dept. of Civil Engineering, University of Nebraska, Lincoln, NE 68588. 3 Assistant Professor, Dept. of Biological Systems Engineering, University of Nebraska, Lincoln, NE 68583.
GODERYA ET AL. ON SOIL NITRATE
249
time (Bouma and Finke 1992). It is an outcome of many processes acting and interacting over a continuum of spatial and temporal scales. Nitrate is a mobile nutrient; moreover, soil resource and meteorological variability obscures the contemplation of its spatial structure. For example, soil nitrate concentrations from individual samples are usually quite variable; in addition, the non-uniform distribution of irrigation water complicates the issue.

Classical statistical procedures have traditionally been used to assess the variability of various properties in soils (Biggar et al. 1973; Biggar and Nielsen 1976; Bresler 1989). The use of these techniques assumes that observations in the field are independent of one another, regardless of their location. However, there is a significant volume of literature in various disciplines such as geology (Davis 1986; Journel 1989), mining (Guaracio et al. 1975; Isaaks and Srivastava 1989; Journel and Huijbregts 1978), and soil science (Beckett and Webster 1971; Dahiya et al. 1984; Bhatti et al. 1991) which shows that variations in geologic properties tend to be correlated across space. Thus, the classical methods may be inadequate for interpolation of spatially dependent variables, because they assume random variation and do not consider the spatial correlation and relative location of samples. The geostatistical approach has received increasing attention in science and engineering during the last decade (Kalinski et al. 1993; Woldt et al. 1992; Woldt and Bogardi 1992; Tabor et al. 1985; Berndtsson et al. 1993; Jury et al. 1987; Mulla 1988; Ovalles 1988; Rolston et al. 1989; Sutherland et al. 1991).
The primary reasons for the adoption of geostatistics in various fields are that this methodology (1) provides an estimate of the minimum distance for the spacing of independent samples, (2) provides a basis for an efficient monitoring program from an initial reconnaissance survey, (3) allows the quantification of unbiased measurements of location and spread, (4) furnishes optimal, unbiased estimates of regionalized variables at unsampled locations, based on neighboring data, and (5) can also be used to characterize associated uncertainty using geostatistical simulations.

To date, we are not aware of any attempts to characterize the spatial variability of residual soil nitrate using three dimensional spatial statistics. This information is necessary since the spatial variability in residual soil nitrates has been considered a major factor associated with inherent leaching of nitrate in many production agriculture situations. The primary objective of this study is to measure quantitatively the spatial variability of residual soil nitrates in three fields. The hypothesis is that the variability of residual soil nitrate in a field contributes to the variability of leaching to the groundwater from the available soil-N pools. The analysis conducted in this study will be further utilized in the modeling of nitrate contamination of groundwater. The eventual goal of the project is to explore variable rate application methods by relating residual soil nitrates and other parameters to the amount of nitrate leaching to groundwater using geostatistical simulation and transport models.
250
GEOSTATISTICAL APPLICATIONS
METHODOLOGY
Samples from three contiguous 16 ha fields with differing management histories were used to determine the spatial variability of residual soil nitrates (Peterson and Schepers, 1992). Two fields are 396 m x 426 m, and one field is 365 m x 426 m. Field data consist of residual soil nitrate measurements at each location on a 30.5 m x 30.5 m (100 ft x 100 ft) grid, with a spacing of 20-40 m from the boundaries. At each grid location, a single 5 cm (2 in.) diameter, 1.5 m (5 ft) long soil core was collected and divided into 0.3 m (1 ft) increments. Hence, each layer in the three separate fields contained 156, 156, and 143 points, respectively. Each sample was analyzed separately and the results are reported as nitrate-nitrogen in 0.3 m (1 ft) depth increments. The data for each point were used to study the 3-dimensional and 2-dimensional spatial continuity of the residual soil nitrate. Classical statistical parameters such as the mean, the standard deviation, and the coefficient of variation were calculated for each layer. Statistical parameters for the overall three dimensional data sets (vertically averaged over each core), as well as for the profile (vertically integrated nitrate content for each hole), were also calculated.

Structural analysis of the field data was used to evaluate the semivariogram function using programs from GSLIB (Deutsch and Journel, 1993). Semivariograms (Journel and Huijbregts 1978) were used to examine the spatial dependence between measurements at pairs of locations as a function of the distance of separation. Three dimensional spatial variability was examined for each of the fields using two semivariograms: a horizontal-spatial semivariogram and a vertical semivariogram. The semivariogram for horizontally spatially related data identifies the variability due to distance and is combined over all depths. The vertical semivariogram, however, describes the variability due to depth irrespective of horizontal location.
Hence, for the available data set of each field, two semivariograms were constructed. Two dimensional horizontal-spatial semivariograms also were calculated for each layer, that is, for each 0.3 m (1 ft) layer, resulting in 5 different semivariograms for each field. Furthermore, a 2-dimensional horizontal semivariogram also was prepared for the vertically integrated nitrate content at each sample location. In order to explore anisotropies, directional semivariograms were calculated for each field in the horizontal spatial direction, keeping the direction of the vertical dimension constant. They were prepared using the concept of layers, in which semivariograms were calculated in different spatial directions by restricting the search window in the vertical dimension.
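The experimental semivariogram underlying all of these analyses can be sketched as follows. This is an illustrative omnidirectional version only (the study itself used the GSLIB programs); the function name and the lag/tolerance scheme are choices made for this sketch.

```python
import numpy as np

def semivariogram(coords, values, lags, tol):
    # Experimental semivariogram: gamma(h) = (1 / 2N(h)) * sum, over the
    # N(h) pairs separated by roughly h, of (z_i - z_j)^2.
    coords = np.asarray(coords, float)
    values = np.asarray(values, float)
    dist = np.linalg.norm(coords[:, None, :] - coords[None, :, :], axis=2)
    i, j = np.triu_indices(len(values), k=1)  # count each pair once
    pair_d = dist[i, j]
    pair_sq = (values[i] - values[j]) ** 2
    gamma = []
    for h in lags:
        sel = np.abs(pair_d - h) <= tol
        gamma.append(0.5 * pair_sq[sel].mean() if sel.any() else np.nan)
    return np.array(gamma)
```

With 3-D sample coordinates (easting, northing, depth) the same pairing logic applies; restricting pairs to one depth layer gives the 2-D layer semivariograms described above.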
RESULTS AND DISCUSSION
Residual soil nitrate in the profile was highly variable, ranging from 64 to 650 kg/ha (57 to 580 lbs/acre) with a mean of 192 kg/ha (173 lbs/acre). Table 1 shows
the statistical parameters for the three fields. For each layer, data tended to be skewed with the mean greater than the median. The general trend was toward an increase in the values of the coefficient of variation and a decrease in the values of residual soil nitrogen with increasing depth. For the overall 3-dimensional measurement values, the distribution of data was skewed with a large coefficient of variation.

TABLE 1--Residual soil nitrate from three fields.

Field    Layer     Minimum   Maximum   Mean      Median    Std. dev   C.V
                   (kg/ha)   (kg/ha)   (kg/ha)   (kg/ha)   (kg/ha)    (%)
Field 1  1         19.35     239.9     84.95     81.45     41.18      48.47
         2         9.27      144.35    39.87     33.67     25.02      62.67
         3         8.06      125.0     30.63     24.6      20.46      66.79
         4         8.47      123.78    32.76     23.79     22.98      70.15
         5         7.66      95.57     29.73     23.79     18.85      63.39
         profile   68.14     650.76    217.94    198.17    96.24      44.16
         overall   7.66      239.90    43.6      31.85     34.09      78.20
Field 2  1         20.16     271.76    73.81     68.95     33.32      45.15
         2         9.27      117.33    35.87     30.64     20.05      57.16
         3         7.26      151.6     31.12     26.81     21.83      70.14
         4         7.66      157.65    28.10     22.18     21.26      75.64
         5         6.45      75.8      25.97     25.4      14.44      55.61
         profile   64.11     574.16    194.87    177.61    91.73      47.07
         overall   6.45      271.76    39.00     30.24     29.08      74.61
Field 3  1         17.34     160.47    72.71     67.33     30.53      41.99
         2         12.5      124.99    32.87     30.24     15.13      46.04
         3         6.85      75.4      25.51     23.79     11.93      46.78
         4         6.85      51.61     19.30     17.74     8.01       41.48
         5         5.24      43.55     14.86     13.31     6.67       44.87
         profile   65.72     362.48    165.25    156.44    51.68      31.27
         overall   5.24      160.50    33.01     24.60     26.70      80.7

C.V = Coefficient of Variation
The horizontal-spatial semivariograms are shown in Figures 1a, 2a, and 3a for the three fields. The semivariograms for all three fields have similar shapes. Theoretically, the semivariogram should pass through the origin when the distance is zero. However, all sample semivariograms appeared to approach non-zero values as the distance decreased to zero, indicating the presence of a nugget effect.
The vertical experimental semivariograms and the fitted models are shown in Figures 1b, 2b, and 3b. The maximum distance considered in the computation of the semivariogram cannot exceed half the maximum dimension of the field (i.e., 0.75 m for the vertical semivariogram) (Journel and Huijbregts, 1978). Thus, only the first two values of the vertical semivariogram are reliable. None of the vertical semivariograms reaches a sill, indicating a trend in the property studied. If the information contained in the semivariogram is to be used for kriging at unsampled locations, the trend may need to be removed, or universal kriging may be used. A reason for this trend is most probably the presence of high amounts of residual soil nitrate in the surface layer. Figure 4 shows the average amount of nitrate-N in each layer for the three fields. Significant differences between the top layer and subsequent layers may be related to the time of sampling. The results probably exhibit the influence of temporal dynamics due to the spring sampling of the fields. This may be because high mineralization and almost no precipitation/irrigation occurred at the time of sampling of these fields. For this reason, two different types of theoretical models were fitted to the vertical semivariograms: power and spherical models. If the data are to be used for simulation purposes, the power model may not be used, and hence another model should be used.
Fitting a model to the experimental semivariogram is a significant step in the geostatistical analysis. It is important to select an appropriate model for the semivariogram because each model yields different values for the nugget effect and range. A satisfactory fit to the sample variogram was accomplished by the trial and error approach as described by Isaaks and Srivastava (1989). Due to resource constraints, only omni-directional horizontal-spatial semivariograms and vertical semivariograms were fit to the sample variogram for each field. Table 2 provides the values of the semivariogram models for the above mentioned cases. Parameters for the two types of theoretical semivariograms for the vertical direction also are provided in Table 2. Good agreement was obtained between calculated semivariogram values and the corresponding models, as shown in Figures 1, 2, and 3.

The range values for horizontal-spatial semivariograms showed considerable variability among the fields: the scale of horizontal-spatial correlation varies from about 150 m to 244 m (500 ft to 800 ft). The range of the semivariogram model for Field 1 was significantly larger than for the other two fields. In the vertical direction the range varied between 1.5 m and 3 m (5 ft and 10 ft). There was a two order of magnitude difference between the ranges of the horizontal-spatial and vertical dimensions. This represents a system in which the vertical plane is much smaller in scale than the horizontal plane. A typical approach employed for such a system is to examine the transport process locally as a vertical one-dimensional flow perpendicular to any layering in the medium (Jury et al., 1987). The complete structural analysis for both the horizontal-spatial dimension and the vertical dimension represents a combination of geometric and zonal anisotropy. The complete structural analysis of hydraulic properties for both dimensions may show the same pattern.
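The two theoretical models fitted above can be written as short functions. This is an illustrative sketch: the parameter values are those reported for Field 1 in Table 2, while the function names and the convention gamma(0) = 0 (with the nugget as a jump just off the origin) are choices made for the example.

```python
import numpy as np

def spherical(h, nugget, sill, rng):
    # Spherical model: rises from the nugget and reaches the sill at the range.
    h = np.asarray(h, float)
    inside = nugget + (sill - nugget) * (1.5 * h / rng - 0.5 * (h / rng) ** 3)
    return np.where(h == 0.0, 0.0, np.where(h < rng, inside, sill))

def power_model(h, nugget, slope, exponent):
    # Power model: no sill (0 < exponent < 2), consistent with a trend.
    h = np.asarray(h, float)
    return np.where(h == 0.0, 0.0, nugget + slope * h ** exponent)

# Field 1 horizontal-spatial fit from Table 2: nugget 420, sill 810, range 244 m
gamma_h = spherical([0.0, 122.0, 244.0, 400.0], nugget=420.0, sill=810.0, rng=244.0)
# Field 1 vertical power fit from Table 2: nugget 420, slope 110, power 1.0
gamma_v = power_model([0.0, 0.3, 0.6], nugget=420.0, slope=110.0, exponent=1.0)
```

The absence of a sill in the power model is what makes it unsuitable for simulation, as noted above: the modeled variance grows without bound as the lag increases.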
There were no data available for lag distances less than 30 m (100 ft) in the
FIGURE 1--Experimental (symbols) and theoretical (lines) semivariograms for Field 1; (a) horizontal-spatial and (b) vertical; (-) spherical model and (--) power model.
FIGURE 2--Experimental (symbols) and theoretical (lines) semivariograms for Field 2; (a) horizontal-spatial and (b) vertical; (-) spherical model and (--) power model.
FIGURE 3--Experimental (symbols) and theoretical (lines) semivariograms for Field 3; (a) horizontal-spatial and (b) vertical; (-) spherical model and (--) power model.
FIGURE 4--Amount of average residual soil nitrate in each layer.

TABLE 2--Semivariogram parameters of residual soil nitrates for three fields.

                                        Field 1   Field 2             Field 3
Horizontal-Spatial  Nugget (kg/ha)^2    420       130                 130
                    Sill (kg/ha)^2      810       330 (1), 430 (2)    190 (1), 290 (2)
                    Range (m)           244       30.5 (1), 122 (2)   30.5 (1), 152.4 (2)
Vertical-Model 1    Nugget (kg/ha)^2    420       130                 130
(power)             Slope (kg/ha)^2     110       200                 230
                    Power               1.0       1.15                1.2
Vertical-Model 2    Nugget (kg/ha)^2    130       130                 130
(spherical)         Sill (kg/ha)^2      1550      900                 1000
                    Range (m)           1.5       1.5                 1.5

Note: numbers in parentheses refer to nested structures 1 and 2.
horizontal-spatial direction; hence, the nugget effect was estimated by visual inspection. There appear to be two different values for the nugget effect in the horizontal and vertical directions for Field 1. The small nugget effect of the vertical semivariogram may be detected because of the small spacing between data points in the vertical direction (see Figures 1a and 1b). Spatial variability can also be investigated using the semivariogram and the relative nugget effect, that is, the ratio of nugget to total semivariance expressed as a percentage. A ratio less than 25% indicates strong spatial dependence, between 25% and 75% indicates moderate spatial dependence, and greater than 75% indicates weak
spatial dependence (Cambardella 1994). The horizontal-spatial semivariograms may be described as having moderate spatial dependence for residual soil nitrate. However, if one considers the spherical model for the vertical semivariograms, then the vertical semivariograms may be characterized by strong spatial dependence, exhibiting ratios of less than 25%. Strong to moderate spatially dependent structures may be controlled by intrinsic and extrinsic variations as well as seasonal variations.

Two dimensional horizontal semivariograms are shown in Figures 5, 6, and 7. These semivariograms are calculated individually for each layer and also for the profile (i.e., the sum of amounts in all layers for each grid location), without any regard to the vertical dimension. If one compares the form of spatial variability of each individual layer with that of the profile, it is obvious that the form of the structure is similar to the top layer, indicating that the top layer structure is representative of the overall spatial structure. The large impact of the top layer semivariogram on the profile semivariogram is due to the larger variance of residual nitrate concentrations in the top layer relative to other layers (see Table 1). As a result, if one has to measure the field again or measure other fields with a similar structure, it may be appropriate to assess each location to a depth of 0.3 m (1 ft) and then sample every fourth or fifth location at lower depths. However, classical statistics reveal high coefficient of variation values for the deeper layers as compared to the first layer. Further analysis is necessary to determine an ideal sampling approach. There was less nitrogen in the soil profile in the third field, and there was less variability in samples from different layers of this field, as compared to the other two fields. However, overall (vertically averaged over core) sample variability was the same or higher (see Table 1 and Figures 3 and 7).
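The relative nugget effect classification of Cambardella (1994) quoted above reduces to a simple ratio test. The following is an illustrative sketch, not code from the study, using the Field 1 values from Table 2:

```python
def spatial_dependence(nugget, total_sill):
    # Relative nugget effect classes after Cambardella (1994):
    # < 25 % strong, 25-75 % moderate, > 75 % weak spatial dependence.
    ratio = 100.0 * nugget / total_sill
    if ratio < 25.0:
        return "strong"
    if ratio <= 75.0:
        return "moderate"
    return "weak"

# Field 1 horizontal-spatial (nugget 420 of sill 810, ratio ~52 %)
horizontal = spatial_dependence(420.0, 810.0)
# Field 1 vertical spherical model (nugget 130 of sill 1550, ratio ~8 %)
vertical = spatial_dependence(130.0, 1550.0)
```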
Further investigation indicated that this field received more irrigation water in the previous two years than the other two fields. It is probable that the excessive application of irrigation water leached much of the nitrate from the profile and reduced the amount and spatial variability of residual soil nitrate.

Six directional semivariograms were calculated for each field. All directions corresponded to rotations in the horizontal plane only. The directions considered were North, N30E, N60E, N90E, N120E, and N150E, with an azimuth half-tolerance of 45 degrees. Directional semivariograms are presented as contour maps of the sample variogram surface (planimetric form) in Figures 8, 9, and 10 for Fields 1, 2, and 3, respectively. The values contoured are the semivariance in every direction to a distance of at least 200 meters, with contour interval units in (kg/ha)^2. Differences between direction-dependent semivariograms for the fields studied could be the result of differences in geology, topography, and/or management of the area. In this case it is speculated that the significant effects of the north-south and east-west directions across each field were largely due to the irrigation pattern in the fields. These fields were surface irrigated, with water being distributed on the west side of the field. Hence, residual soil nitrate appears to follow trends in irrigation water supply. The variogram surface in the east-west direction is more continuous
FIGURE 5--2-D semivariograms for Field 1; (a) five depths and (b) total in profile.
FIGURE 6--2-D semivariograms for Field 2; (a) five depths and (b) total in profile.
FIGURE 7--2-D semivariograms for Field 3; (a) five depths and (b) total in profile.
FIGURE 8--A contour map of the semivariogram values for Field 1. Contour interval is 50 (kg/ha)^2.
FIGURE 9--A contour map of the semivariogram values for Field 2. Contour interval is 50 (kg/ha)^2.

than in the north-south direction. In other words, the irrigation pattern seems to result in high variability (larger sill values), with the variogram surface rising rapidly in the north-south direction. Hence, the directional semivariograms indicate the
FIGURE 10--A contour map of the semivariogram values for Field 3. Contour interval is 25 (kg/ha)^2.

presence of anisotropy. However, this can be classified as a mild case of geometric and zonal anisotropy, which is apparent in all three fields at larger distances.
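The directional semivariograms behind Figures 8-10 restrict the pairing by separation azimuth. The following sketch shows the idea for the horizontal plane; the function and its tolerance handling are illustrative only, not the GSLIB implementation used in the study.

```python
import numpy as np

def directional_semivariogram(coords, values, azimuth, tol_deg, lags, lag_tol):
    # Pairs are kept only when their separation azimuth (clockwise from
    # north, folded onto 0-180 degrees since pairs are undirected) falls
    # within +/- tol_deg of the requested azimuth.
    coords = np.asarray(coords, float)
    values = np.asarray(values, float)
    i, j = np.triu_indices(len(values), k=1)
    dx = coords[j, 0] - coords[i, 0]
    dy = coords[j, 1] - coords[i, 1]
    dist = np.hypot(dx, dy)
    az = np.degrees(np.arctan2(dx, dy)) % 180.0
    dev = np.abs(az - azimuth % 180.0)
    dev = np.minimum(dev, 180.0 - dev)  # wrap-around, e.g. 179 deg vs 1 deg
    sq = (values[j] - values[i]) ** 2
    out = []
    for h in lags:
        sel = (np.abs(dist - h) <= lag_tol) & (dev <= tol_deg)
        out.append(0.5 * sq[sel].mean() if sel.any() else np.nan)
    return np.array(out)
```

Calling this for azimuths 0, 30, 60, 90, 120, and 150 degrees with a 45-degree half-tolerance reproduces the six-direction scheme described in the methodology; differing sills by direction are the signature of the zonal anisotropy noted above.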
SUMMARY AND CONCLUSIONS

Geostatistical analyses showed that residual soil nitrates in three fields were spatially structured. This spatial structure is important to consider, both for fertilizer application and for evaluation of potential pollutant transport to the groundwater. The apparent spatial variability in the residual soil nitrate has the potential to seriously limit the efficiency of fertilizer application according to traditional practices. Conventional statistical analysis showed that the residual soil nitrate in the profile was variable, ranging from 64 to 650 kg/ha (57 to 580 lbs/acre) with a mean of 192 kg/ha (173 lbs/acre). Data tended to be skewed with the mean greater than the median. Geostatistical techniques offer alternative methods to conventional statistics for the estimation of parameters and their associated variability.

Three-dimensional semivariograms were calculated for each field. Two different semivariograms were calculated for each field: a horizontal-spatial semivariogram and a vertical semivariogram. In addition, two dimensional semivariograms were prepared for each layer. Finally, six directional semivariograms also were calculated for each field. Semivariogram analysis demonstrated that there were similarities in the patterns of spatial variability for the three fields. This may suggest that spatial
GODERYA ET AL. ON SOIL NITRATE
259
relationships derived from one set of measurements for one field may have applicability at other field sites. Since spatial structures are influenced by the scale of the investigation, it remains to be seen whether this approach will be useful for extrapolating spatial information obtained at the field scale to the watershed or regional scale. The 3-dimensional and 2-dimensional semivariogram analyses resulted in similar structure and form for all three fields. Three-dimensional horizontal-spatial semivariograms showed that, for all three fields, the range was about 120 to 245 m. In the vertical direction the range varied between 1.5 and 3 m (5 to 10 ft). The complete structure for both the horizontal-spatial dimension and the vertical dimension represents a combination of geometric and zonal anisotropy. A complete structural analysis of hydraulic properties in both dimensions may show the same pattern. Three-dimensional vertical semivariograms also displayed a significant trend, which may be related to conditions at the time of data collection. The nugget value, expressed as a percentage of the total semivariance, defines different classes of spatial dependence. Horizontal-spatial semivariograms indicated moderate spatial dependence, while the vertical semivariograms were characterized by strong spatial dependence, exhibiting ratios less than 25%. Strongly to moderately spatially dependent structures may be controlled by intrinsic and extrinsic variations as well as seasonal variations. The two-dimensional analysis showed a strong spatial pattern in the top layer, which is displayed in the overall structure of the 2-dimensional semivariograms. The analysis further revealed that the soil nitrates at 0.6 m to 1.5 m (2 to 5 ft) depths may be sampled without great sensitivity to location, with similar resulting variance. Direction-dependent semivariograms showed that residual soil nitrates apparently followed trends in irrigation water supply.
This pattern resulted in high variability in the direction perpendicular to irrigation water flow. The structural information can be useful in the management of production agriculture systems, in which variable-rate application of nitrogen can be used to increase production and reduce the risk of groundwater contamination. Balancing crop uptake rates against residual soil nitrogen can also lead to more cost-effective fertilizer application rates without increasing the risk of groundwater pollution.
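The nugget-to-sill ratio classes used in the conclusions (strong spatial dependence below 25%) can be expressed in a few lines. The following Python sketch is illustrative only: the 25% lower bound comes from the text, while the 75% boundary between moderate and weak dependence is an assumed convention (after Cambardella et al. 1994, cited in the references), and the example values are hypothetical.

```python
def spatial_dependence(nugget, total_sill):
    """Classify spatial dependence from the nugget expressed as a
    percentage of the total semivariance (sill).

    <25% strong (per the text); 25-75% moderate and >75% weak are an
    assumed convention, not stated explicitly in the paper.
    """
    if total_sill <= 0:
        raise ValueError("total sill must be positive")
    ratio = 100.0 * nugget / total_sill
    if ratio < 25.0:
        label = "strong"
    elif ratio <= 75.0:
        label = "moderate"
    else:
        label = "weak"
    return ratio, label

# Hypothetical nugget/sill pairs for a vertical and a horizontal semivariogram:
print(spatial_dependence(0.2, 1.0))
print(spatial_dependence(0.5, 1.0))
```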
ACKNOWLEDGEMENT
This paper was supported, in part, by the Center for Infrastructure Research, the Water Center, and the University of Nebraska-Lincoln and, in part, by the Cooperative State Research Service (CSRS) of the U.S. Department of Agriculture (Grant Number 92-34214-7457). Assistance provided by Dr. T. A. Peterson of the Department of Agronomy, University of Nebraska-Lincoln, is acknowledged.
REFERENCES

Beckett, P. H. T., and Webster, R., 1971, "Soil variability: A review", Soils and Fertilizers, Vol. 34, No. 1, pp. 1-15
Berndtsson, R., Bhari, A., and Jinno, K., 1993, "Spatial dependence of geochemical elements in a semiarid agricultural field: I. Geostatistical properties", Soil Science Society of America J., Vol. 57, pp. 1323-1329
Bhatti, A. U., Mulla, D. J., Koehler, F. E., and Gurmani, A. H., 1991, "Identifying and removing spatial correlation from yield experiments", Soil Science Society of America J., Vol. 55, pp. 1523-1528
Biggar, J. W., Nielsen, D. R., and Erh, K. T., 1973, "Spatial variability of field-measured soil-water properties", Hilgardia, Vol. 42, No. 7, pp. 214-259
Biggar, J. W., and Nielsen, D. R., 1976, "Spatial variability of the leaching characteristics of a field soil", Water Resources Research, Vol. 12, No. 1, pp. 78-84
Bouma, J., and Finke, P. A., 1992, "Origin and nature of soil resource variability", Proceedings of the Soil Specific Crop Management Conference, Minneapolis, Minnesota, April 14-16
Bresler, E., 1989, "Estimation of statistical moments of spatial field averages for soil properties and crop yields", Soil Science Society of America J., Vol. 53, pp. 1645-1653
Cambardella, C. A., Moorman, T. B., Novak, J. M., Parkin, T. B., Karlen, D. L., Turco, R. F., and Konopka, A. E., 1994, "Field-scale variability of soil properties in central Iowa soils", Soil Science Society of America J., Vol. 58 (in press)
Dahiya, I. S., Richter, J., and Malik, R. S., 1984, "Soil spatial variability: A review", International Journal of Tropical Agriculture, Vol. 11, No. 1, pp. 1-102
Davis, J., 1986, "Statistics and Data Analysis in Geology", John Wiley & Sons, New York, NY
Deutsch, C. V., and Journel, A. G., 1992, "GSLIB: Geostatistical Software Library and User's Guide", Oxford University Press, New York, NY
Guarascio, M., David, M., and Huijbregts, C. J., 1975, "Advanced Geostatistics in the Mining Industry", D. Reidel Publishing Company, Dordrecht, Holland
Isaaks, E. H., and Srivastava, R. M., 1989, "An Introduction to Applied Geostatistics", Oxford University Press, New York, NY
Journel, A. G., and Huijbregts, C. J., 1978, "Mining Geostatistics", Academic Press, New York, NY
Jury, W. A., Russo, D., Sposito, G., and Elabd, H., 1987, "The spatial variability of water and solute transport properties in unsaturated soil: I. Analysis of property variation and spatial structure with statistical models", Hilgardia, Vol. 55, No. 4, pp. 1-32
Kalinski, R. J., Kelly, W. E., Bogardi, I., and Pesti, G., 1993, "Electrical resistivity measurements to estimate travel times through unsaturated ground water protective layers", Journal of Applied Geophysics, Vol. 30, pp. 161-173
Mulla, D. J., 1988, "Estimating spatial patterns in water content, matric suction, and hydraulic conductivity", Soil Science Society of America J., Vol. 52, pp. 1547-1553
Ovalles, F. A., and Collins, M. E., 1988, "Evaluation of soil variability in northwest Florida using geostatistics", Soil Science Society of America J., Vol. 52, pp. 1702-1708
Peterson, T. A., and Schepers, J. S., 1992, "Spatial distribution of soil nitrate at the Nebraska MSEA site", Agriculture Research to Protect Water Quality, Poster Paper, USDA Agricultural Research Service, University of Nebraska, Lincoln, NE
Rolston, D. E., and Liss, H. J., 1989, "Spatial and temporal variability of water soluble organic carbon in a cropped field", Hilgardia, Vol. 57, No. 3, pp. 1-19
Sutherland, R. A., Kessel, C. V., and Pennock, D. J., 1991, "Spatial variability of Nitrogen-15 natural abundance", Soil Science Society of America J., Vol. 55, pp. 1339-1347
Tabor, J. A., Warrick, A. W., Myers, D. E., and Pennington, D. A., 1985, "Spatial variability of nitrate in irrigated cotton: II. Soil nitrate and correlated variables", Soil Science Society of America J., Vol. 49, pp. 390-394
Webster, R., and Burgess, T. M., 1983, "Spatial variation in soil and the role of Kriging", Agricultural Water Management, Vol. 6, pp. 111-122
Woldt, W., and Bogardi, I., 1992, "Ground water monitoring network design using multiple criteria decision making and geostatistics", Water Resources Bulletin, Vol. 28, No. 1, pp. 45-62
Woldt, W., Bogardi, I., Kelly, W. E., and Bardossy, A., 1992, "Evaluation of uncertainties in a three-dimensional groundwater contamination plume", Journal of Contaminant Hydrology, Vol. 9, pp. 271-288
Dae S. Young 1

GEOSTATISTICAL JOINT MODELING AND PROBABILISTIC STABILITY ANALYSIS FOR EXCAVATIONS
REFERENCE: Young, D. S., "Geostatistical Joint Modeling and Probabilistic Stability Analysis for Excavations," Geostatistics for Environmental and Geotechnical Applications, ASTM STP 1283, R. M. Srivastava, S. Rouhani, M. V. Cromer, A. I. Johnson, A. J. Desbarats, Eds., American Society for Testing and Materials, 1996.
ABSTRACT: Two geostatistical interpolation methods were studied for rock joint modeling: ordinary kriging and indicator kriging. Geostatistics was extended to improve the spatial interpretation of joint parameters, especially for pole vectors, which were kriged on the sphere. A matrix approach was introduced for probabilistic block failure analysis and applied to study the stability of pit slopes and subway tunnels. The localized structural stability was expressed in probabilistic terms based on the geostatistical joint model.
KEYWORDS: joint models, geostatistics, block failure, matrix approach, probabilistic stability analysis
Rock joints play an important role in rock mechanics (particularly for structural stability analysis) and geohydrology (particularly for fluid flow in fractured rocks). Site characterization requires the characterization of both the rock mass and the joint systems within it. In this paper, recent advances in joint system modeling and its applications to rock mechanics are presented to demonstrate the superiority of the engineering analysis that results from the advanced models. This was achieved mainly through geostatistics, which incorporates the spatial variability of joint characteristic parameters into the modeling and localizes them to build a localized discrete cell-block model of joint systems. Considering that many characteristics of rock joints are best described in probabilistic terms, there are intrinsic advantages in geotechnical approaches that directly employ the relevant statistical distributions (Baecher et al. 1977, Warburton 1980). Consequently, an appropriate model of joint systems in a rock mass is the localized probabilistic model, and a realistic geotechnical analysis of rock structures is a probabilistic approach made on the
1 Associate Professor, Mining Engineering Department, Michigan Technological University, Houghton, MI 49931.
YOUNG ON ANALYSIS OF EXCAVATIONS
263
model, which will yield local structural stability in terms of the probability of failure. Since the block size distribution (i.e., of the blocks formed by the joints in a rock mass) can describe or be related to these engineering criteria, it is a pertinent characteristic parameter in numerous engineering studies, including tunneling and underground excavations, rock bolting and other support systems, engineering classifications of rock masses, key block analysis for structural stability, drilling and blasting, and transmissibility of fluids through fractured rock formations. In this paper, a numerical method was developed to identify blocks and calculate their sizes (or volumes), shapes, and locations, as well as their stability. The connectivity matrix was introduced in this numerical approach; it is equivalent to the stiffness matrix of the finite element method of stress analysis. Then, the key block analysis was extended for probabilistic structural analysis based on the connectivity matrix. Finally, the key objective, the localized probabilistic structural analysis, was achieved by applying the finite element approach for the key block analysis to the discrete cell-block model of the joint system.

JOINT MODELS
Rock joint systems surveyed in the field and characterized statistically are often incorporated in the geotechnical analysis through a joint model. Because of the complexity of joint geometries and their characteristic nature, as well as limited accessibility in the field for joint surveys, various degrees of simplification or assumption are made in joint modeling. The corresponding joint models are thus developed depending on the field geology, the modeling purposes, and the model's end uses. Basic joint models used in geotechnology are created either to simulate the spatial distribution of joints (joint network simulations that duplicate the statistics of the sampled joint parameters) or to replace the joint systems with equivalent continuous media that effectively represent the statistics of the joints. Most of these models share common features and assumptions, such as planar joints, a uniform spatial distribution of joint locations, and the independence of joint locations from other joint parameters. Also, these models can be generated as either deterministic or stochastic models by using the statistical distributions of joint parameters.

GEOSTATISTICAL APPROACH
Recently, geostatistics has been applied to joint system modeling, joint network simulations (Chiles 1988), and discrete block models of equivalent continuum media (Young 1987a, 1987b). In the joint network simulation, joint planes are considered as discs and a parent-daughter model is applied for the location of joint disc centers, in which daughters are nucleated around parents. The density of the parents is then regionalized for the geostatistical simulation of the joint networks. When studying a large volume of rock, such an approach requires generating a huge number of joints that cannot be handled easily on a computer. A simple shortcut around this problem is to estimate average characteristic properties in discrete cell-blocks.

DISCRETE CELL-BLOCK MODEL
In most engineering analyses in rock mechanics and geohydrology, the network geometry of joint systems can be replaced with the discrete cell-block model, because the joint parameters can be used directly as
input data, or they can be converted into equivalent continuum media that represent the joint systems effectively. In this discrete model, the entire area (or rock mass) to be modeled is divided into uniform cell-blocks, and the characteristic parameters are inferred for each cell-block from the sparse sample data measured in the field. A few cases of geostatistical applications to geotechnology have been reported in which rock mass characteristic parameters were found to be spatially correlated random variables, and geostatistical interpretation is required to incorporate this phenomenon into the modeling (Chiles 1988, Young 1987a, 1987b, Miller 1979). In most of these cases, the regionalized variables are scalars, but joint orientations or poles are considered unit vectors. This means that the pole vectors should be kriged on the unit sphere, where they are traditionally projected and analyzed (a stereonet projection for orientations) (Young 1987a, 1987b).
KRIGING ON THE SPHERE
Traditionally, joint orientations are projected on a stereonet as poles to define and analyze them. A stereonet is simply the projection of a unit sphere on a plane, so a pole is a unit vector projected on a unit sphere or stereonet. Therefore, a kriging system was developed on the sphere for the spatial analysis of pole vectors (Young 1987a). In this kriging, the pole vector Z(x) at a location x was regionalized directly. The spatial variability of poles was introduced through the vectorial variogram, defined as the expectation of the squared norm of the difference of two pole vectors separated by a vectorial distance h:

    2γ(x, h) = E[ |Z(x) - Z(x+h)|² ]
So it is the magnitude of the vector difference (the distance AB in Figure 1) rather than an angular difference.
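This definition can be realized empirically by averaging squared pole-vector differences over distance classes. The following Python sketch is a minimal omnidirectional illustration (the paper also computes directional semivariograms); the function and variable names are ours, not the paper's.

```python
import numpy as np

def vector_variogram(coords, poles, lag_width, n_lags):
    """Experimental vectorial variogram: 2*gamma(h) is estimated as the
    mean squared norm |Z(x) - Z(x+h)|^2 over all sample pairs whose
    separation distance falls in each lag class (omnidirectional binning).

    coords: (n, d) sample locations; poles: (n, 3) unit pole vectors.
    """
    coords = np.asarray(coords, float)
    poles = np.asarray(poles, float)
    sums = np.zeros(n_lags)
    counts = np.zeros(n_lags, dtype=int)
    n = len(coords)
    for i in range(n):
        for j in range(i + 1, n):
            h = np.linalg.norm(coords[i] - coords[j])
            k = int(h // lag_width)
            if k < n_lags:
                sums[k] += np.sum((poles[i] - poles[j]) ** 2)
                counts[k] += 1
    # lag classes with no pairs are reported as NaN
    two_gamma = np.where(counts > 0, sums / np.maximum(counts, 1), np.nan)
    lags = (np.arange(n_lags) + 0.5) * lag_width
    return lags, two_gamma, counts
```

Identical poles at a given separation give 2γ(h) = 0, while antipodal unit poles give the maximum value of 4, consistent with the vector-difference (not angular) definition above.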
FIG. 1--Two pole vectors regionalized and their difference vector projected on the upper reference hemisphere.

Under the intrinsic hypothesis (Journel and Huijbregts 1979), the variogram was estimated by a mean value over samples (or poles) grouped at a distance h, as is done for scalar variables:

    2γ(h) = (1/N(h)) Σ |Z(x) - Z(x+h)|²
where N(h) is the number of sample pairs available at h. As seen here, the vector variogram turns out to be in scalar terms, consistent with the definition of the classical scalar-variable variogram (Journel and Huijbregts 1979). Therefore, the classical operations of geostatistics can be applied to the pole vectors through the vector variogram, including estimation variance analysis, dispersion variance analysis, and variogram structural analysis (Journel and Huijbregts 1979). By defining the estimation variance on the difference vector,

    σE² = E[ |Zv - Zv*|² ]

where Zv = actual vector to be estimated for the support V, which is equivalent to a local cell-block, and Zv* = the estimate of Zv, the kriging system for pole vectors was rederived as follows (Young 1987a):

    Σβ λβ γ(vα, vβ) + μ = γ(vα, V),   α = 1, ..., n
    Σβ λβ = 1

[the notation exclusively follows Journel and Huijbregts (1978, p. 306)].

Then, the kriging estimate and its variance are obtained respectively as follows:

    Zv* = Σα λα Z(xα)

and

    σK² = Σα λα γ(vα, V) + μ - γ(V, V)

(μ = Lagrange multiplier)

As shown above, the kriging system of vector variables (or poles) is the same as ordinary kriging (OK) of scalar variables, depending on the definition of the estimation variance. In this vector kriging system, the magnitude of the estimation error vector was optimized. The kriging variance σK² is not a local conditional estimation variance, and its application is limited (Journel and Huijbregts 1979). The kriged mean vector represents the average orientation of joints within the cell-block V, and its accuracy can be measured by its kriging variance. So it creates a deterministic model, good enough only for deterministic geotechnical analysis, but it is still a localized model. However, the full statistical distribution of pole vectors within a cell-block V is needed to generate a stochastic model of poles for probabilistic engineering analysis. This was achieved by using Indicator Kriging (IK) (Young 1987b, Young and Hoerger 1988a).
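The pole-vector OK system can be sketched numerically. The Python fragment below is a minimal point-support illustration: the paper's system uses block-average terms γ(vα, V), which are simplified here to point-to-point variogram values, and renormalizing the kriged vector to unit length is our assumption, not a step stated in the text. The spherical model parameters in the usage line borrow the East-set orientation values reported later in Table 2 (treated here as a total sill).

```python
import numpy as np

def spherical(h, nugget, sill, a):
    """Spherical variogram model: gamma(0) = 0, reaches `sill` at range `a`."""
    h = float(h)
    if h == 0.0:
        return 0.0
    if h >= a:
        return sill
    return nugget + (sill - nugget) * (1.5 * h / a - 0.5 * (h / a) ** 3)

def ok_weights(sample_xy, target_xy, vario):
    """Solve the OK system: sum_b lam_b g(x_a, x_b) + mu = g(x_a, x0),
    with the weights constrained to sum to 1 (Lagrange multiplier mu)."""
    n = len(sample_xy)
    A = np.zeros((n + 1, n + 1))
    for i in range(n):
        for j in range(n):
            A[i, j] = vario(np.linalg.norm(sample_xy[i] - sample_xy[j]))
    A[:n, n] = 1.0
    A[n, :n] = 1.0
    b = np.array([vario(np.linalg.norm(x - target_xy)) for x in sample_xy] + [1.0])
    sol = np.linalg.solve(A, b)
    lam, mu = sol[:n], sol[n]
    kriging_var = float(b[:n] @ lam + mu)  # sigma_K^2 at point support
    return lam, mu, kriging_var

def krige_pole(sample_xy, poles, target_xy, vario):
    """Component-wise weighted average of unit pole vectors, renormalized
    to unit length so the estimate remains a pole (our assumption)."""
    sample_xy = np.asarray(sample_xy, float)
    lam, _, kv = ok_weights(sample_xy, np.asarray(target_xy, float), vario)
    z = lam @ np.asarray(poles, float)
    return z / np.linalg.norm(z), kv

# East-set orientation variogram (Table 2) as the spatial model:
vario = lambda h: spherical(h, nugget=0.06, sill=0.145, a=250.0)
```

For a target midway between two samples with this isotropic model, the weights come out equal by symmetry and sum to one, as the second kriging equation requires.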
LOCAL DETERMINISTIC MODEL
In this localized deterministic cell-block model, mean values of joint parameters are estimated for each local block by OK, and the joint systems are characterized for every block in the entire area modeled. This is the best type of model that can be developed from the sample information typically available, since OK incorporates the spatial variability observed in the sample into the kriging estimator via variograms and minimizes the estimation variance.

LOCAL STOCHASTIC MODEL
Considering the dispersion of joint parameters about their means and the complexity of the characteristic nature of joints, the mean value does not carry much meaning nor is it close to reality, and neither is a deterministic engineering analysis based on it. This difficulty was corrected in the stochastic model, which provides the full statistical distribution of joint parameters for each local cell-block. Thus, probabilistic engineering is applicable to the model at the early stages of site exploration and engineering design. The local probability distribution was estimated by IK, more precisely mononodal IK, which is a non-parametric approach (Lemmer 1984). The original IK approach was rederived for vectorial variables (pole vectors in this case), and indicator variables were defined on the two-dimensional area of class intervals, which was projected on Grossman's tangent plane for pole histograms (Lemmer 1984). The accuracy of this stochastic model was then cross-validated by comparing the model with the actual field data in the open pit case that follows (Young 1987b, Young and Hoerger 1988a).

FINITE ELEMENT APPROACH FOR KEY BLOCK FAILURE
The traditional key block theorem for the stability analysis of structures excavated in a jointed rock mass is a deterministic method based on deterministic, infinite joint planes. This means that the location and frequency of joints and the size of joint planes are excluded from the block failure analysis, and it provides a worst-case analysis. Consequently, a numerical approach was developed that is general for both joint system models and any structure (of any size and shape). Also, it is an effective algorithm for computerizing the entire block failure analysis in probabilistic terms. Therefore, it can be combined easily with the local stochastic model of joint systems to achieve the localized probabilistic analysis of block failures. The numerical algorithm was developed based on the connectivity matrix, which is comparable to the stiffness matrix of the finite element method of stress analysis in continuum mechanics.

Connectivity Matrix Approach

The local area, where the joint systems were simulated and the key block analysis desired, was replaced with a discrete finite element model as used in the finite element method of engineering mechanics. However, the elements were constructed of two-force bars, as in truss structures, rather than of solid elements. Then, the local area can be represented as a large truss structure with bars connected at nodal points, whose continuity and immobility were secured. When the rock mass in the local area is cut by joints, some elements will be cut, as well as the connection bars within those elements. Also, many independent small truss structures will be formed when the rock mass is cut into many rock blocks by joints; that is, the whole truss structure is cut into many parts corresponding to those rock blocks. The connectivity matrix was introduced to define the connecting
condition of bars (or their continuity conditions), and the whole truss structure was represented by the global connectivity matrix. Then, each small independent structure representing a block formed by joints can be searched for and identified as an independent block matrix system within the global connectivity matrix. Each of the independent elemental matrix blocks in the whole system matrix has its own size, shape, and location. So, the complete information for a block geometry is known and available from the nodal numbers of the elemental matrix block.

Element Model

The element constructing the whole system of the rock mass is replaced with a truss formed by simple bars connected between two nodes. The element can be of any shape or size, with different numbers of nodes. For simplicity, in this paper a rectangular element with 8 nodal points was used for the rock block calculations. The 8-point equal-parameter truss element then appears like the usual 8-point solid element of the finite element method, but it consists of 28 two-force bars, as shown in Figure 2. The volume of this element is the same as that of a solid element and is distributed equally onto its nodal points. Therefore, the nodal point system and the number of degrees of freedom in the element model were not changed from the finite element model. The global continuous truss structure for the entire rock mass was developed by constructing this type of truss element on every element in the model. The inner nodal points will have 26 bars connecting to the adjacent nodes around them.
FIG. 2--28-bar truss element model.

Elemental and Global Connectivity Matrices

The connectivity matrix of the truss element is an 8x8 matrix, since the element has 8 nodal points and only one degree of freedom at each node is needed for the block calculations. Also, the distribution of connection bars at a node is not required to be known exactly. Consequently, two indicator numbers are enough to define the continuity condition of the connection bars between any two nodes: 0 for no connection (or a connection cut by a joint), and 1 for a positive connection (a bar connects them). By applying this indicator system, the connectivity matrix of a truss element can be written simply as follows:
K =

    | 7 1 1 1 1 1 1 1 |
    | 1 7 1 1 1 1 1 1 |
    | 1 1 7 1 1 1 1 1 |
    | 1 1 1 7 1 1 1 1 |
    | 1 1 1 1 7 1 1 1 |
    | 1 1 1 1 1 7 1 1 |
    | 1 1 1 1 1 1 7 1 |
    | 1 1 1 1 1 1 1 7 |
Compared with the exact stiffness matrix of the finite element method, the freedom of [K] was reduced from 24 to 8, and the matrix elements of [K] do not express exactly the mechanical behavior of the element. However, the connectivity matrix [K] remains symmetric, following the principles of mechanics. Also, the global connectivity matrix can be assembled from the element connectivity matrix [K] by following the same procedure used to assemble the global stiffness matrix in the finite element method. The global connectivity matrix of the entire truss structure in the rock mass, [M], is an (n x n) matrix, where n is the number of nodal points in the whole structural matrix. Therefore, the information on the continuity or connection between two nodes can be stored at each matrix element in [M], because each node has one degree of freedom. The [M] matrix is symmetric and banded along the diagonal, as in the finite element method.

Block Calculation

Simulated rock joints were introduced into the whole structural matrix one by one (Young and Hoerger 1988a). Whenever one joint plane was introduced, the elements that might be cut by the joint were searched, and which of the 26 bars within each element were cut was checked. Whenever a bar is cut by a joint, the continuity of the global truss structure is weakened. This weakness was reflected in the global connectivity matrix by modifying the matrix element corresponding to the bar cut by the joint; the matrix element was reduced by one from the original number of 26. Consequently, the global connectivity matrix was modified continually while all of the joints were introduced and all the connection bars were tested for continuity. The final global connectivity matrix must be singular, and will consist of as many independent substructures as there are blocks formed by the joint systems.
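The block search amounts to finding the connected components of the surviving bar graph. The Python sketch below is a hedged illustration, not the paper's code: it assumes a regular grid of nodes and planar joints of the form n·x = d, keeps each 26-neighbor bar only if no joint separates its end nodes, and uses union-find in place of the elemental transformation of the global connectivity matrix; the resulting components are the independent substructures (rock blocks), with block volume proportional to the node count.

```python
import itertools

def _find(parent, i):
    # union-find with path halving
    while parent[i] != i:
        parent[i] = parent[parent[i]]
        i = parent[i]
    return i

def _union(parent, a, b):
    ra, rb = _find(parent, a), _find(parent, b)
    if ra != rb:
        parent[rb] = ra

def identify_blocks(nx, ny, nz, joints):
    """Identify rock blocks on an nx x ny x nz grid of nodes.

    joints: list of (normal, d) planes n.x = d (normal is a 3-tuple).
    Returns a list of blocks, each a list of node coordinates; a block's
    volume is proportional to len(block), mirroring the node-counting
    volume calculation described in the text.
    """
    def side(p, joint):
        n, d = joint
        return sum(pi * ni for pi, ni in zip(p, n)) - d > 0

    nodes = list(itertools.product(range(nx), range(ny), range(nz)))
    index = {p: k for k, p in enumerate(nodes)}
    parent = list(range(len(nodes)))
    for p in nodes:
        for dp in itertools.product((-1, 0, 1), repeat=3):
            if dp == (0, 0, 0):
                continue
            q = (p[0] + dp[0], p[1] + dp[1], p[2] + dp[2])
            if q not in index:
                continue
            # keep the bar only if no joint plane separates p and q
            if all(side(p, j) == side(q, j) for j in joints):
                _union(parent, index[p], index[q])
    blocks = {}
    for p in nodes:
        blocks.setdefault(_find(parent, index[p]), []).append(p)
    return list(blocks.values())
```

A single joint plane cutting a 4x1x1 strip of nodes at x = 1.5, for example, yields two blocks of two nodes each.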
It is obvious that when a rock block is isolated from the rock mass by joints, its corresponding small substructure will be separated from the global structure. These substructures are independent from each other; i.e., no connection exists among them. This means that the global connectivity matrix, which has been modified completely after all of the joints were introduced, can be transformed into a block matrix by elementary transformations (Golub and Van Loan 1983). Then, each elemental matrix block in the final block matrix transformed from the global connectivity matrix represents an independent substructure, which is nothing but a rock block isolated by the joints, the block sought in the block calculations. In this way the problem of searching for blocks in the rock mass was replaced with the problem of identifying the independent matrix blocks in the global connectivity matrix. Knowledge of the nodal numbers within an independent matrix block is enough to identify the shape, size (or volume), and location of a rock block formed by the joint systems. The volume of a rock block was calculated simply by counting the nodes within its substructural matrix block and summing their volumes. For this connectivity matrix method, there are no limits on the sizes and shapes of joint planes as well as the geometry of rock blocks
formed by the joint systems, including any random aggregation of elements in three-dimensional space.

Key Block Failure

Once blocks were identified, key blocks were sorted out in three steps:
1. Collect the joint and excavation plane geometry associated with a particular block.
2. Evaluate the kinematic stability of the block with an algorithm based on Shi's theorem (Goodman and Shi 1984).
3. If the block is kinematically unstable, evaluate its mechanical stability with Warburton's algorithm (Warburton 1980).

Positional Probability of Failure

The probabilistic analysis of key block failures was achieved by the positional probability of failure, which was defined by the number of times a position (or node) was evaluated as being contained in a key block. In this way the probabilistic key block analysis can be effectively combined with the stochastic joint system simulation, and a realistic structural stability can be obtained in probabilistic terms. One of the interesting aspects of this type of analysis is that it provides for the evaluation of progressive key block failures. If it is assumed that a key block displaces into the excavation, the next level of blocks exposed to the excavation surface modified by the initial key block failures become potential key blocks. The process may continue over many levels of failure. With this type of analysis it is very simple to evaluate the successive levels of key block failure around an excavation surface. It has a specific application in mining engineering: cavability analysis for the caving method of mining.

CASE STUDIES
A few cases were analyzed to demonstrate the capability of localized discrete cell-block models and the corresponding improvements achieved in the engineering analysis by the finite element method of probabilistic key block theory.

Open Pit Slope Stability

An extensive statistical analysis was made on a total of 939 joint survey data points taken by the cell mapping technique from an open pit mine. When pole vectors were projected on the upper hemisphere, three joint sets were identified and separated by the FRACTAN computer code (Shanley and Mahtab 1975), as shown in Figure 3. Characteristic parameters and their statistics for those joint sets are summarized in Table 1, computed on Grossman's tangent plane (Grossman 1985). Their variogram parameters for the isotropic spherical model are presented in Table 2 (Young and Hoerger 1988b).
TABLE 1--Joint parameters and their statistics.

             Mean Attitude
Joint Set    Dip Direction (deg.)    Dip (deg.)
1            92.21                   74.1
2            353.05                  70.5
3            190.3                   49.2

Spacing (m), Mean/Variance:       0.845/0.582
Roughness, Mean/Variance:         3.17/7.34
Avg. Length (m), Mean/Variance:   0.545/0.235
TABLE 2--Spherical variogram of joint parameters for East set.

             Orientation    Spacing    Roughness    Avg. Length
Sill         0.145          0.53       6.87         0.23
Nugget       0.06           0.275      4.00         0.135
Range (m)    250            250        250          250
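The isotropic spherical model behind Table 2 can be evaluated directly. The short Python sketch below restates the standard spherical semivariogram and prints its value at a few lags for each East-set parameter; treating the tabulated sill as the total sill (rather than a partial sill added to the nugget) is our reading of the table, not stated explicitly.

```python
def spherical(h, nugget, sill, a):
    """Spherical semivariogram: gamma(0) = 0, nugget jump at h > 0,
    reaching the (total) sill at the range a."""
    if h <= 0:
        return 0.0
    if h >= a:
        return sill
    return nugget + (sill - nugget) * (1.5 * h / a - 0.5 * (h / a) ** 3)

# Table 2, East joint set: (sill, nugget) per parameter, range = 250 m for all.
east = {"orientation": (0.145, 0.06), "spacing": (0.53, 0.275),
        "roughness": (6.87, 4.00), "avg_length": (0.23, 0.135)}
for name, (sill, nugget) in east.items():
    values = [round(spherical(h, nugget, sill, 250.0), 3) for h in (50, 125, 250)]
    print(name, values)
```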
FIG. 3--Contours of poles projected on a stereonet.

FIG. 4--Discrete cell-block model for an open pit mine.
The entire mine was subdivided into cell-blocks, as shown in Figure 4, to build a discrete cell-block model. Then geostatistical operations were performed to define the spatial variability of every joint parameter through variograms and to characterize the joint systems within each cell-block by estimating those parameters with kriging techniques. First, the local deterministic cell-block model of the three joint sets was generated for the mine by using OK. The average values of the joint parameters were obtained for every cell-block, which is equivalent to characterizing the joint systems within a small unit block. In other words, the average local joint system properties were inferred and characterized from the global sample data. Then the key block theorems (Goodman and Shi 1984) were applied to study the local slope stability in terms of the maximum safe slope angles within each local cell-block area. The local joint parameters kriged previously were used as input for this local slope analysis. The localized slope stability was then compared with the slope stability obtained from the global average values of the joint parameters. The maximum safe slope angles based on local input showed
significant deviations from those based on global averages as input. To illustrate these results graphically, the localized maximum safe slope angles were plotted along the various pit slope dip directions, as shown in Figure 5. These significant local deviations would have a major effect on the overall behavior of the mine slope (Hoerger and Young 1987). For an open pit already designed using global averages as design inputs, geostatistics can identify areas whose slopes could be steepened, as well as local high-risk areas deserving increased monitoring; for a new mine, using only the limited information available during the development stage, kriging can create a block model of joint orientations that could be used to design not only the final pit slope but also the intermediate slopes.
FIG. 5--Safe slope angles using global averages and local estimates as input.

FIG. 6--Histograms of PF for cell-blocks.
Secondly, the local stochastic joint model was developed for the mine using IK, as described before (Young 1987b). In this case the full probabilistic distribution of the joint parameters is available for each local block, and a probabilistic stability analysis can be performed on every local block to obtain the localized probability of slope stability. The probability of failure (PF) based on the localized probabilistic model of joint systems [PF (IK)] was then compared with PF (sample), calculated similarly from the global sample distributions, by constructing a histogram of failure probability for the local blocks (Figure 6). PF (sample) assigned a marginal PF of 50% to every cell-block, which could be expected from the symmetric distribution of the global sample data (Young and Hoerger 1988b). Therefore, PF (sample) did not improve the stability analysis over the deterministic method currently used. PF (IK) draws a distinctly different histogram, which spreads over a wide range of PF between 25% and 80%. Only 17% of a total of 36 cell-blocks showed the marginal 50% PF by PF (IK). Fifty percent of the 36 blocks had a higher PF than the marginal PF, and the remaining 33% showed a lower PF. This clearly indicates that the local variation, or spatial variability, of joints plays a significant role in slope stability, and the local probabilistic approach should be applied to achieve an effective slope analysis. Also, local risk assessments on the regional pit slopes can be made from the PF (IK) analysis at any period of the mine life. This is an important improvement in slope design in general and should be exercised routinely in field projects.
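A minimal sketch of how IK output turns into a local PF: kriged indicator means at a set of dip thresholds form a local cdf (after an order correction), and PF is read off at the critical dip for the slope. The thresholds, cdf values, and critical dip below are illustrative assumptions, not the mine data:

```python
import numpy as np

thresholds = np.array([20., 30., 40., 50., 60., 70.])   # dip cutoffs (deg), assumed

def local_pf(cdf_at_thresholds, critical_dip):
    """PF = P(dip < critical_dip) from a cdf known at discrete thresholds."""
    # Order correction: kriged indicators need not be monotone or in [0, 1].
    cdf = np.clip(np.maximum.accumulate(cdf_at_thresholds), 0.0, 1.0)
    return float(np.interp(critical_dip, thresholds, cdf))

# Hypothetical kriged indicator means for two cell-blocks and the global sample cdf
block_a    = np.array([0.05, 0.15, 0.40, 0.70, 0.90, 1.00])
block_b    = np.array([0.20, 0.45, 0.65, 0.85, 0.95, 1.00])
global_cdf = np.array([0.10, 0.25, 0.50, 0.75, 0.92, 1.00])

crit = 40.0   # critical dip for the analysed slope (assumed)
print("PF(IK), block A:", local_pf(block_a, crit))      # 0.40
print("PF(IK), block B:", local_pf(block_b, crit))      # 0.65
print("PF(sample):", local_pf(global_cdf, crit))        # 0.50 -> marginal everywhere
```

Repeating the last step over all cell-blocks gives the PF (IK) histogram of Figure 6, while the global cdf, by construction, hands the same marginal PF to every block.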
272
GEOSTATISTICAL APPLICATIONS
"¥ ; r---.,.---..,..---...,....--...,....---,-----,
!L-__'-__.......__.......__-'-__-'-__-'
..... .....
.....
"01 .
EAST ,METERSI "
IIUt.
"".
'1f1
FIG . 7--Spatial distribution of PF (IK) in the pit. The spatial distribution of PF (IK) plotted in Figure 7 indicates cell-blocks of higher and lower PF's than the marg i nal 50%. Regional zones of higher and lower PF's were formed and zones were scattered throughout the pit, randomly, but it was noticed that block PF's were not scattered randomly in the pit, showing that the analysis can identify local stability trends in the mine. A Subway Tunnel A metropolitan subway tunnel was studied to illustrate the difference between traditional key block theorem and the positional probability of key block failures by the finite element approach for the block failure. The joint systems and their statistical details were published by cording and Mahar (1974). An unit length of the tunnel was isolated based on the discrete cell-block model. The joint systems within this cell-block were modeled and simulated for the stability analysis as done for pit slopes. When the frequency of positional block failure was projected along the unit length of tunnel, the cumulative probability of positional failure can be plotted around the tunnel as shown in Figure 8. Compared with the worst case type of analysis by the traditional key block theorem, the positional probability analysis shows clearly the size and frequency of key block occurrences in this projection.
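The positional probability of Figure 8(b) can be approximated by Monte Carlo: simulate key block realizations from the stochastic joint model and accumulate, for each perimeter segment, the fraction of realizations in which a removable block intersects it. The geometry below (angular segments, normally distributed block extents) is purely illustrative, not the tunnel model of the paper:

```python
import numpy as np

rng = np.random.default_rng(42)
n_segments = 36          # 10-degree perimeter segments around the tunnel
n_realizations = 2000
hits = np.zeros(n_segments)

for _ in range(n_realizations):
    # Hypothetical simulated key block: center angle and angular half-width (deg)
    center = rng.uniform(0, 360)
    half_width = rng.normal(15, 5)
    if half_width <= 0:
        continue
    seg_angles = np.arange(n_segments) * 10.0 + 5.0
    # Wrap-around angular distance from each segment midpoint to the block center
    diff = np.abs((seg_angles - center + 180) % 360 - 180)
    hits += (diff <= half_width)

positional_pf = hits / n_realizations
print("max positional PF %.2f at segment %d"
      % (positional_pf.max(), positional_pf.argmax()))
```

Segments with near-zero positional PF are the candidate anchor zones for roof bolts discussed below.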
[Figure 8: two tunnel cross-sections with joint traces; failure frequencies posted around the perimeter in panel (b).]
a) Maximum removable area    b) Positional probabilities of failure
FIG. 8--Comparison of key block failures (a) with the positional probability of block failures (b) for a tunnel.
The positional probability of key block failure carries important features that can be implemented simply in the geotechnical design of excavations. For the design of a roof bolting system, the anchor should be located in an area where the positional probability is zero or low; the positional probability is the probability of the bolt not being anchored effectively to a stable portion of the rock mass. Also, the parts of the excavation requiring the most support can be identified easily from it. The other measures of key block statistics included here are:
1. The distribution (or histogram) of key block sizes and its mean and standard deviation.
2. The total volume of key blocks, which summarizes the susceptibility to key block failure.
3. The frequency of different sizes of key blocks.

Block Size Distributions

The matrix approach for block calculations is general: general in block size, shape, and location. Therefore, block calculations can be completed within a discrete element. Results can be presented in a histogram which shows the frequency of the block size distribution (Figure 9). In this case the circular disk model was applied in the joint simulation. Such an analysis should be a part of site characterization for risk assessments in geotechnical and geohydrological engineering.
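The three key block statistics listed above are straightforward to compute once block volumes are available from the joint simulation. A sketch with invented volumes (not results from the paper):

```python
import numpy as np

# Hypothetical key block volumes (m^3) from one simulated realization
volumes = np.array([0.4, 1.2, 0.8, 3.5, 0.6, 2.1, 0.9, 1.7])

mean, std = volumes.mean(), volumes.std(ddof=1)          # 1. distribution summary
total = volumes.sum()                                    # 2. total key block volume
counts, edges = np.histogram(volumes, bins=[0, 1, 2, 4]) # 3. frequency by size class

print(f"mean {mean:.2f} m^3, std {std:.2f} m^3, total {total:.1f} m^3")
print("size-class counts:", dict(zip(zip(edges[:-1], edges[1:]), counts)))
```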
" ~------------------------------------~
~
~
~
~
block volume
~
~
~
- =
FIG. 9--The frequency distribution of block sizes formed by three joint sets.

CONCLUSIONS

1. The most important conclusion, in general, which could be drawn from this work is that a localized probabilistic stability analysis for geotechnical structures can be made at the early stages of engineering design and construction, when only sparse sample data are available. It leads to the optimum design of geotechnical structures: optimum in their relative locations and orientations with respect to other peripheral structures, and in their shapes and sizes. This is achievable through the geostatistical model of the characteristic parameters of rock masses. It can therefore be said that this is an ideal model of joint systems for many engineering analyses in both rock mechanics and geohydrology.

2. Comparing PF (IK) and PF (sample), the local probabilistic analysis of pit slopes is more powerful in drawing a detailed and realistic picture of slope stability conditions. The local variation of joint orientations played an important role in the slope stability and should
be included in slope design and construction. Also, a localized full-scale risk assessment can be made from PF (IK), and the pit design and operation can be optimized progressively.

3. Geostatistics contributed significant improvements to the modeling of rock joint systems (or site characterization) for geotechnical structural analysis. The spatial variability was fully incorporated in this modeling, and the characteristic model parameters were localized. The non-parametric approach by IK simplified the modeling of the three-dimensional probability distributions of pole vectors projected on the reference sphere. Otherwise, the local probability distribution of poles would never be achievable from the sparse sample data available at the early stages of engineering design. The mathematical probability density function (pdf) for directional data projected on a sphere is little known, and a non-parametric approach has long been desirable [13]. Even if a mathematical pdf were available, the sample data available at the early stages would never be enough to describe the local statistical distribution.

4. Geostatistics is general and applicable to any characteristic parameters of physical properties of geotechnical materials, such as strength values, elastic or plastic constants, and flow parameters. Therefore, local probabilistic models of these parameters are readily available from geostatistics, and the corresponding geotechnical analyses can be made in terms of probability, as seen here. Considering the dispersion of these geotechnical parameters around their mean values as well as their spatial variations, the local probability analysis is a natural choice for various geotechnical fields in the future.
The stochastic analysis based on the global sample data distribution did not improve the overall picture of slope stability conditions over the deterministic analysis using sample mean attitudes, although it yielded a probability of slope failure over a full range of slope angles.

5. The deterministic approach based on block theorems treats joint orientations as constant and requires "engineering judgement" to qualitatively incorporate the quantitatively ignored factors of joint sizes and spacings and their variabilities. Because of the fixed joint orientations and the assumption of infinite joint size, the maximum removable area approach of the deterministic key block analysis provides an upper bound to the key block size identified. When the probabilistic analysis is coupled with the localized stochastic model of joint systems in geological formations, a significant amount of the engineering judgement required to optimize the size, shape, and orientation of an excavation could be replaced by quantitative solutions.

6. The positional probability of key block failure carries important features: the parts of the excavation requiring the most support, the probability of roof bolts not being anchored in stable zones, distributions of key block volumes, and key block sizes and frequency.

7. The in-situ block size distribution should be a part of geotechnical and geohydrological site characterizations. It is a pertinent parameter in key block failures, and it is related directly to the transmissibility of fluid flow through fractured rock, as in granular materials. This hydrological application of the block size distribution deserves further study.

REFERENCES

Baecher, G. B., Lanney, N. A., and Einstein, H. H., 1977, "Statistical Description of Rock Properties and Sampling," Proceedings of the Eighteenth Symposium on Rock Mechanics, Golden, Colorado.

Chiles, J. P., 1988, "Fractal and Geostatistical Methods for Modeling of a Fracture Network," Mathematical Geology, Vol. 20, pp. 631-654.
Cording, E. J. and Mahar, J. W., 1974, "The Effects of Natural Geologic Discontinuities on Behavior of Rock in Tunnels," Proceedings of the 1974 Rapid Excavation and Tunneling Conference, San Francisco, CA, pp. 107-138.

Golub, G. H. and Van Loan, C. F., 1983, "Matrix Computations," Johns Hopkins University Press, Baltimore, MD.

Goodman, R. E. and Shi, G. H., 1984, "Block Theory and Its Application to Rock Mechanics," Prentice-Hall, Englewood Cliffs, NJ.

Grossman, N. F., 1985, "The Bivariate Normal Distribution on the Tangent Plane at the Mean Attitude," Proceedings of the International Symposium on Fundamentals of Rock Joints, Bjorkliden, Sweden, pp. 3-11.

Hoerger, S. F. and Young, D. S., 1987, "Predicting Local Rock Mass Behavior Using Geostatistics," Proceedings of the Twenty-eighth U.S. Symposium on Rock Mechanics, Tucson, AZ, pp. 99-106.

Journel, A. G. and Huijbregts, Ch. J., 1978, "Mining Geostatistics," Academic Press, London.

Lemmer, I. C., 1984, "Estimating Local Recoverable Reserves via Indicator Kriging," Geostatistics for Natural Resources Characterization (ed. by G. Verly), D. Reidel, Dordrecht, pp. 349-364.

Miller, S. M., 1979, "Geostatistical Analysis for Evaluating Spatial Dependence in Fracture Set Characteristics," Proceedings of the Sixteenth Symposium on Application of Computers and Operations Research in the Mineral Industry, Tucson, AZ, pp. 537-545.

Shanley, R. J. and Mahtab, M. A., 1975, "FRACTAN: A Computer Code for Analysis of Clusters Defined on the Unit Hemisphere," U.S. Bureau of Mines, IC 8671, Washington, DC.

Warburton, P. M., 1980, "Stereological Interpretation of Joint Trace Data: Influence of Joint Shape and Implications for Geological Surveys," International Journal of Rock Mechanics & Mining Science, Vol. 17, pp. 305-316.

Young, D. S., 1987a, "Random Vectors and Spatial Analysis by Geostatistics for Geotechnical Applications," Mathematical Geology, Vol. 19, pp. 467-479.

Young, D. S., 1987b, "Indicator Kriging for Unit Vectors: Rock Joint Orientations," Mathematical Geology, Vol. 19, pp. 481-502.

Young, D. S. and Hoerger, S. F., 1988a, "Non-Parametric Approach for Localized Stochastic Model of Rock Joint Systems," Geostatistical, Sensitivity, and Uncertainty Methods for Ground-Water Flow and Radionuclide Transport Modeling (ed. by B. Buxton), Battelle Press, Columbus, OH, pp. 361-385.

Young, D. S. and Hoerger, S. F., 1988b, "Geostatistics Applications to Rock Mechanics," Proceedings of the Twenty-ninth U.S. Symposium on Rock Mechanics, Balkema, Brookfield, pp. 271-282.
Author Index

B
Benson, C. H., 181
Bogardi, I., 248
Buxton, B. E., 51

C
Carr, J. R., 236
Colin, P., 69
Cromer, M. V., 3, 218

D
Dagdelen, K., 117
Dahab, M. F., 248
Desbarats, A. J., 32

F
Froidevaux, R., 69

G
Garcia, M., 69
Goderya, F. S., 248

J
Johnson, R. L., 102
Jones, D. D., 162

K
Kannengieser, A. J., 200
Kuhn, G. N., 162

L
Leonte, D., 133

M
Miller, S. M., 200

N
Nicoletis, S., 69

P
Pate, A. D., 51
Patinha, P. J., 146
Pereira, M. J., 146

R
Rashad, S. M., 181
Rautman, C. A., 218
Rouhani, S., 20, 88

S
Schofield, N., 133
Schulte, D. D., 162
Soares, A. O., 146
Srivastava, R. M., 13

T
Turner, A. K., 117

W
Wells, D. E., 51
Wild, M. R., 88
Woldt, W. E., 162, 248

Y
Young, D. S., 262

Z
Zelinski, W. P., 218
Subject Index

A
Annealing, 69
Arsenic, 13
ASTM standards, 13, 32

B
Bayesian analysis, 102
Bivariate distribution, 69
Block failure, 262
Block value estimation, 20

C
Conceptual interpretation, 3
Conditional probability, 133
Conductivity, 162
Contamination
    copper, 146
    delineation, 88, 102
    lead, 13, 133
    metal, 13, 51, 133, 146
    plume mapping, 162
    site mapping, 69
    soil, 133
    stationarity, assessment with, 117
    subsurface, 162
    transport, 181
    uranium, 51
Contouring, 133
Copper, 146
Core data, 218
Correlogram, 13

D
Design analysis, 3
Direct measurement, 3

E
Earthquakes, 236
Electrical resistivity, 69
Electromagnetics, 162
Estimation procedures, 20

F
Flow model, 51
Flow simulation, 181

H
Histogram, 32
Hotspots, 133
Hydraulic conductivity, 181, 200
Hydraulic head analysis, 51

I
Indicator simulation, 218
Infiltrometer, 200
Interpolation techniques, 20
Inverse-distance weighting, 51
Irrigation practices, 248

J
Joint model, 262

K
Kriging, 20, 32, 162, 200, 262
    cokriging, 88, 181
    indicator, 69, 102, 117, 133, 236, 262
    lognormal, 51

L
Leachate, 162
Lead, 13, 133

M
Mapping, 20, 32, 51, 200
    plume, 162, 181
    probability, 69
Markov-Bayes simulation, 200
Matrix approach, 262
Mercalli intensity, 236
Metals, heavy, 13, 133
Mine effluent, 146
Modeling
    flow, 51
    histogram, 32
    joint, 262
    numerical, 218
    spatial variability measures, 13
    stationary, 117
    stochastic, 117, 146
    transport, 102
    variogram, 13, 32, 218, 236, 248
    water quality, 146
Multivariate approach, 88, 200

N
Nitrate, 248
Numerical methods, 3
Numerical model, 218

P
Pit slope stability, 262
Polycyclic aromatic hydrocarbons, 69
Probabilistic stability analysis, 262

R
Repository system, 218
Rock quality designation, 218

S
Sampling program
    adaptive, 102
    planning, 13
    strategy, 102
Second order stationarity, 117
Seismic hazard, 236
Site characterization, 102
Site remediation strategies, 32
Soil classifications, 181
Soil contamination, 133
Soil gas measurements, 88
Soil nitrate, 248
Space-time series, 146
Spatial patterns analysis, 248
Spatial simulation, 200
Spatial temporal analysis, 51
Spatial variation, 13, 20, 248
    modeling, 32
Stability analysis, 262
Stationarity, second order, 117
Stationary model, 117
Statistical methods, 3
Stochastic images, 200
Stochastic modeling, 117, 146
Structural stability, 262
Subway tunnels, 262

T
Transport simulation, 102, 181

U
Uncertainty, 102
    characterization, 69
    measures, kriging, 51
Uranium, 51

V
Variogram, 13, 32, 218, 236
    semivariograms, 248
Variographies, 88
Volatile organic screening, 88

W
Water quality modeling, 146