STATISTICAL ASPECTS OF WATER QUALITY MONITORING Proceedings of the Workshop held at the Canada Centre for Inland Waters, October 7-10, 1985
DEVELOPMENTS IN WATER SCIENCE, 27
OTHER TITLES IN THIS SERIES

1 G. BUGLIARELLO AND F. GUNTER
  COMPUTER SYSTEMS AND WATER RESOURCES
2 H.L. GOLTERMAN
  PHYSIOLOGICAL LIMNOLOGY
3 Y.Y. HAIMES, W.A. HALL AND H.T. FREEDMAN
  MULTIOBJECTIVE OPTIMIZATION IN WATER RESOURCES SYSTEMS: THE SURROGATE WORTH TRADE-OFF METHOD
4 J.J. FRIED
  GROUNDWATER POLLUTION
5 N. RAJARATNAM
  TURBULENT JETS
6 D. STEPHENSON
  PIPELINE DESIGN FOR WATER ENGINEERS
7 V. HALEK AND J. SVEC
  GROUNDWATER HYDRAULICS
8 J. BALEK
  HYDROLOGY AND WATER RESOURCES IN TROPICAL AFRICA
9 T.A. McMAHON AND R.G. MEIN
  RESERVOIR CAPACITY AND YIELD
10 G. KOVACS
  SEEPAGE HYDRAULICS
11 W.H. GRAF AND C.H. MORTIMER (EDITORS)
  HYDRODYNAMICS OF LAKES: PROCEEDINGS OF A SYMPOSIUM 12-13 OCTOBER 1978, LAUSANNE, SWITZERLAND
12 W. BACK AND D.A. STEPHENSON (EDITORS)
  CONTEMPORARY HYDROGEOLOGY: THE GEORGE BURKE MAXEY MEMORIAL VOLUME
13 M.A. MARINO AND J.N. LUTHIN
  SEEPAGE AND GROUNDWATER
14 D. STEPHENSON
  STORMWATER HYDROLOGY AND DRAINAGE
15 D. STEPHENSON
  PIPELINE DESIGN FOR WATER ENGINEERS (completely revised edition of Vol. 6 in the series)
16 W. BACK AND R. LETOLLE (EDITORS)
  SYMPOSIUM ON GEOCHEMISTRY OF GROUNDWATER
17 A.H. EL-SHAARAWI (EDITOR) IN COLLABORATION WITH S.R. ESTERBY
  TIME SERIES METHODS IN HYDROSCIENCES
18 J. BALEK
  HYDROLOGY AND WATER RESOURCES IN TROPICAL REGIONS
19 D. STEPHENSON
  PIPEFLOW ANALYSIS
20 I. ZAVOIANU
  MORPHOMETRY OF DRAINAGE BASINS
21 M.M.A. SHAHIN
  HYDROLOGY OF THE NILE BASIN
22 H.C. RIGGS
  STREAMFLOW CHARACTERISTICS
23 M. NEGULESCU
  MUNICIPAL WASTEWATER TREATMENT
24 L.G. EVERETT
  GROUNDWATER MONITORING HANDBOOK FOR COAL AND OIL SHALE DEVELOPMENT
25 W. KINZELBACH
  GROUNDWATER MODELLING: AN INTRODUCTION WITH SAMPLE PROGRAMS IN BASIC
26 D. STEPHENSON AND M.E. MEADOWS
  KINEMATIC HYDROLOGY AND MODELLING
STATISTICAL ASPECTS OF WATER QUALITY MONITORING Proceedings of the Workshop held at the Canada Centre for Inland Waters, October 7-10,1985
Edited by
A.H. EL-SHAARAWI National Water Research Institute, Burlington, Ontario, Canada
and
R.E. KWIATKOWSKI Water Quality Branch, Inland Waters Directorate, Ottawa, Ontario, Canada
ELSEVIER Amsterdam - Oxford - New York - Tokyo 1986
ELSEVIER SCIENCE PUBLISHERS B.V. Sara Burgerhartstraat 25 P.O. Box 211, 1000 AE Amsterdam, The Netherlands
Distributors for the United States and Canada: ELSEVIER SCIENCE PUBLISHING COMPANY INC. 52, Vanderbilt Avenue New York, NY 10017, U.S.A.
Library of Congress Cataloging-in-Publication Data

Statistical aspects of water quality monitoring.
(Developments in water science ; 27)
Bibliography: p.
Includes index.
1. Water quality--Measurement--Congresses. 2. Water quality--Statistical methods--Congresses. I. El-Shaarawi, A. H. II. Kwiatkowski, R. E., 1949- . III. Series.
TD3C.7. S73 1990 628.1 '61 ub-24035
ISBN 0-444-42698-1 (U.S.)

ISBN 0-444-42698-1 (Vol. 27)
ISBN 0-444-41669-2 (Series)

© Elsevier Science Publishers B.V., 1986

All rights reserved. No part of this publication may be reproduced, stored in a retrieval system or transmitted in any form or by any means, electronic, mechanical, photocopying, recording or otherwise, without the prior written permission of the publisher, Elsevier Science Publishers B.V./Science & Technology Division, P.O. Box 330, 1000 AH Amsterdam, The Netherlands.

Special regulations for readers in the USA - This publication has been registered with the Copyright Clearance Center Inc. (CCC), Salem, Massachusetts. Information can be obtained from the CCC about conditions under which photocopies of parts of this publication may be made in the USA. All other copyright questions, including photocopying outside of the USA, should be referred to the copyright owners, Elsevier Science Publishers B.V., unless otherwise specified.

Printed in The Netherlands
PREFACE

Statistics provides a collection of techniques for extracting maximum information from a given data set and allows the construction of strategies for future data collection. These techniques have proven invaluable in such fields as agriculture, medical science and business. However, in the area of environmental sciences, statistical applications are still in their infancy, with few attempts to systematically develop techniques dealing with environmental issues. The "Workshop on the Statistical Aspects of Water Quality Monitoring", held October 7-10, 1985 at the National Water Research Institute in Burlington, Ontario, Canada, was an attempt to bring together international scientists, statisticians, and users of statistical methodology in limnology, water quality regulation and control, monitoring network design, and modelling of aquatic environments. The publication of this book is one step towards identifying appropriate statistical techniques and diagnosing problems in Water Quality Monitoring which require new statistical methodologies. The papers presented in this volume were peer reviewed and represent international expertise, consolidating detailed information on both conventional and new methods.

The prime objective of the Workshop was to generate interaction between the statistical community and scientists working in the area of Water Quality Monitoring. To this end, topics covered in this Workshop fall into two categories: (1) Methods Development, and (2) the Imaginative Application of Existing Methodologies. Subjects covered include: Time Series, Estimation of Loading, Clustering, Model Development, Censoring Data Analysis, Quality Control and Data Acquisition.

Deepest appreciation is extended to the National Water Research Institute, Department of Environment (Mr. D.L. Egar, Director), and the Inland Waters Directorate, Water Quality Branch Headquarters (Mr. Wm. Traversey, Director) for their co-sponsorship of the Workshop on the Statistical Aspects of Water Quality Monitoring. We would also like to express our gratitude to members of the Organizing Committee (Mr. T.J. Dafoe, Dr. A. Demayo, Dr. S. Esterby, and Dr. G. Haffner) for their untiring help in running this event. Furthermore, we are indebted to each of the Session Moderators for their diligent efforts. In addition, sincere thanks are conveyed to Ms. J. Major and Mr. J.D. Smith, who assisted greatly with preparations for the Workshop, and to Mrs. B. Arafat, Ms. 6. Jones and Ms. S. Austin for the secretarial, word processing, and text editing services provided. Last, but not least, we would like to thank the individual authors for their submissions. Countries represented by these papers include: Argentina, Australia, Canada, Denmark, Egypt, England, France, Holland, Jordan, Norway, Saudi Arabia, Singapore, South Africa, Sweden, and the United States of America.

A.H. El-Shaarawi
R.E. Kwiatkowski
CONTENTS

1. Spatial Heterogeneity of Water Quality Parameters
   S.R. Esterby
2. Uncertainty in Water Quality Data
   R.H. Montgomery and T.G. Sanders
3. The Use of Multivariate Methods in the Interpretation of Water Quality Monitoring Data of a Large Northern Reservoir
   R. Schetagne
4. Modeling River Acidity - A Transfer Function Approach
   E. Damsleth
5. Sulphate, Water Colour and Dissolved Organic Carbon Relationships in Organic Waters of Atlantic Canada
   G.D. Howell and T.L. Pollock
6. Sulfate in Coloured Waters. I. Evaluation of Chromatographic and Colorimetric Data Compatibility
   V. Cheam, A.S.Y. Chau and S. Todd
7. The Importance of Design Quality Control to a National Monitoring Program
   R.E. Kwiatkowski
8. Determination of Water Quality Zonation in Lake Ontario Using Multivariate Techniques
   M.A. Neilson and R.J.J. Stevens
9. Spatial Variability in the Water Quality of Québec Rivers
   M. Simoneau
10. Estimation of Distributional Parameters for Censored Water Quality Data
    D.R. Helsel
11. Natural Variability of Water Quality in a Temperate Estuary
    L.E. Gadbois and B.J. Neilson
12. Extension of Water Quality Data Bases in Planning for Water Treatment
    G.T. Orlob and N. Marjanovii
13. Statistical Inferences from Coliform Monitoring of Potable Water
    W.O. Pipes
14. Modelling of Bacterial Populations and Water Quality Monitoring in Distribution Systems
    A. Maul, A.H. El-Shaarawi and J.C. Block
15. A Goodness-of-Fit Test for the Negative Binomial Distribution Applicable to Large Sets of Small Samples
    B. Heller
16. Reporting Bacteriological Counts from Water Samples: How Good is the Information from an Individual Sample?
    H.E. Tillett
17. Some Applications of Linear Models for Analysis of Contaminants in Aquatic Biota
    R.H. Green
18. A Comparative Study of the Sampling Properties of Four Similarity Indices
    H.W. Khoo and T.M. Lim
19. Randomized Similarity Analysis of Multispecies Laboratory and Field Studies
    E.P. Smith
20. Association of Chlorophyll a with Physical and Chemical Factors in Lake Ontario, 1967-1981
    A.H. El-Shaarawi, J.R. Elliott, R.E. Kwiatkowski and D.R. Peirson
21. Gamma Markov Processes
    R.M. Phatarfod
22. Dynamic Covariate Adjustment of Water Quality Parameters for Streamflow: Transfer Function Model Selection
    L.D. Haugh, Y. Noda and J. McClallen
23. Residuals from Regression with Dependent Errors
    R.J. Kulperger
24. Alternatives for Identifying Statistically Significant Differences
    E.A. McBean
25. Global Variance and Root Mean Square Error Associated With Linear Interpolation of a Markovian Time-Series
    D.A. Cluis
26. Empirical Power Comparisons of Some Tests for Trend
    K.W. Hipel, A.I. McLeod and P.K. Fosu
27. Statistical Assessment of Limnological Data Set: Intervention Analysis
    R. Clifford, J.W. Wilkinson and N.L. Clesceri
28. The Change Point Problem: A Review of Applications
    V.K. Jandhyala and I.B. MacNeill
29. Spectral Analysis of Long-Term Water Quality Records
    P.H. Whitfield
30. Bayes Estimation of Parameters of First Order Autoregressive Process
    M.S. Abu-Salih and A.A. Abd-Alla
31. A Systems Approach to Computerizing Data Acquisition
    T.R. Clune
32. High Frequency Water Quality Monitoring of a Coastal Stream
    N.E. Dalley
33. The Design of a Cost Effective Microcomputer-Based Data Acquisition System
    K. Okamura and K. Aghai-Tabriz
34. On the Estimation of Monthly Mean Phosphorus Loadings
    M.E. Thompson and K. Bischoping
35. Estimation of Loading by Numerical Integration
    A.H. El-Shaarawi, K.W. Kuntz and A. Sylvestre
36. Intervention Analysis of Seasonal and Nonseasonal Data to Estimate Treatment Plant Phosphorus Loading Shifts
    K.A. Booman, P.M. Berthouex and L. Pallesen
37. Sediment Responses During Storm Events in Small Forested Watersheds
    W.A. Rieger and L.J. Olive

INDEX
SPATIAL HETEROGENEITY OF WATER QUALITY PARAMETERS
S.R. ESTERBY
National Water Research Institute, Canada Centre for Inland Waters, Burlington, Ontario, Canada

1. INTRODUCTION
In water quality monitoring, it is difficult to think of a situation in which a decision does not have to be made where to collect water quality samples. Thus, potentially at least, all measurements have a spatial component. In general, knowledge of how water quality parameters vary over the region of interest is necessary to the understanding of a system. It may be of primary interest, as, for example, in the study of the transport of a pollutant, or of secondary interest in that it may be necessary to remove the spatial component in order to detect changes over time.

Dependent upon our state of knowledge, the analysis of water quality data for the spatial component can be placed under one of the following objectives: 1) characterization of heterogeneity, or 2) testing for and estimation of a well-defined spatial component. In this paper, only the first objective will be considered.

The nature of the data and the reasons for the analysis will determine the methods used for characterizing heterogeneity. Three broad classes of procedures are available: 1) grouping methods, 2) spatial autocorrelation methods and 3) methods which involve fitting a function for the relationship between a water quality parameter and location, with or without the assumption of independence.

In the present paper, a general overview of methods suitable for water quality studies has been attempted. The topic is clearly too extensive to review here. Although the discussion has been limited to analysis for spatial variation, no suggestion that spatial and temporal variability are entirely separable is intended. Haugh (1984) suggests directions such analyses might take for regularly spaced sample points based on extensions of the Box and Jenkins approach to time series analysis.

It will be useful to consider an example in which spatial heterogeneity is important before discussing methodology.
Such an example is the issue (Barica, 1982) of anoxic conditions in Lake Erie (for a brief review see Kwiatkowski, 1984). An objective of the 1978 Great Lakes Water Quality Agreement between Canada and the United States was the restoration of year round aerobic conditions in the bottom waters of the Central Basin of Lake Erie. The historic records of hypolimnetic dissolved oxygen have been examined for the existence of a time trend by several authors (Dobson and Gilbertson, 1971; Charlton, 1979; and Rosa and Burns, 1981). Data selection was considered essential by all three sets of authors (Anderson et al., 1984). Dobson and Gilbertson (1971) used dissolved oxygen concentrations from samples for which the temperature was within 3°C of the minimum. Charlton (1979) used only near-bottom values at stations that had a depth over 15 m, were stratified and showed no evidence of incursion of Eastern Basin water. Rosa and Burns (1981) attempted to establish a representative, homogeneous area by calculating a depletion rate distribution map. Some of these criteria are clearly spatial considerations. Others are reasons why spatial variation would exist. This raises an additional point. Spatial heterogeneity can sometimes be associated with a supplementary measurement such as temperature. Inclusion of this additional variable in the model may make simplifying assumptions, such as independence of errors, tenable and permit the use of conventional statistical methods.

2. DATA SETS AND NOTATION
Notation is given here (Table 1) for a data set collected over both space and time and containing a number of water quality parameters and descriptors of the location or water mass, which will be called supplementary observations in this paper. The primary use for the notation will be to show which dimensions of such a general data set can be or are normally handled by a particular method of analysis.

TABLE 1
Description of a general water quality data set.

Description of rows                  Subscripts of Y matrix, by water quality parameter
Station   Depth   Row                1              2              ...   p
1         1       1                  1,1            1,2            ...   1,p
1         2       2                  2,1            2,2            ...   2,p
...       ...     ...                ...            ...            ...   ...
1         n_1     n_1                n_1,1          n_1,2          ...   n_1,p
2         1       n_1+1              n_1+1,1        n_1+1,2        ...   n_1+1,p
2         2       n_1+2              n_1+2,1        n_1+2,2        ...   n_1+2,p
...       ...     ...                ...            ...            ...   ...
2         n_2     n_1+n_2            n_1+n_2,1      n_1+n_2,2      ...   n_1+n_2,p
...       ...     ...                ...            ...            ...   ...
I         1       n-n_I+1            n-n_I+1,1      n-n_I+1,2      ...   n-n_I+1,p
I         2       n-n_I+2            n-n_I+2,1      n-n_I+2,2      ...   n-n_I+2,p
...       ...     ...                ...            ...            ...   ...
I         n_I     n                  n,1            n,2            ...   n,p

Vector notation                      Y matrix                      X matrix (a, b), supplementary observations 1, 2, ..., q
length = n                           y_1   y_2   ...  y_p          x_1   x_2   ...  x_q
length = I (c)                       y_11  y_21  ...  y_p1         x_11  x_21  ...  x_q1
length = I, derived variable (d)     z_1   z_2   ...  z_p          u_1   u_2   ...  u_q

(a) Subscripts of matrix X take the same form as those of matrix Y.
(b) Assuming the coordinates of the stations appear in columns 1 and 2 of the X matrix, the elements of a column denoted by unbroken vertical lines are equal since they correspond to the coordinate for the same station.
(c) The second subscript indicates that the vector includes only the elements corresponding to the first depth for each station.
(d) An example of a derived variable is some function of the values of the original variable for all depths sampled at a particular station.

The matrix of measurements on p water quality parameters at I stations at time t is denoted by $Y_t = \{y_{ij}\}_t$ and the corresponding matrix of q supplementary observations by $X_t = \{x_{ik}\}_t$. The number of values for each parameter is n, since samples are assumed to be collected at $n_i$ depths at the ith station and

$$n = \sum_{i=1}^{I} n_i .$$

The subscript t will be dropped to simplify the notation, but it is implicit.
For ease of description, it will be assumed that the first two columns of X contain the coordinates of the station, and the third column the depth of the measurement.

3. THE IMPORTANCE OF CHARACTERIZING SPATIAL HETEROGENEITY

The idea that, if sampling is being conducted over time and space, the variability in space should be accounted for in the design of the sampling program and the analysis of the data is so fundamental that it needs no elaboration. Two of the objectives of statistical design and analysis which are achievable in environmental field studies are the increase in precision by removing as much variability as possible from the error term and the elimination of bias. An ignored spatial component could enter either as large variability or as bias. Although the examples given below are for other components, the comments are equally applicable to spatial variability.
In the first example the day to day variability was overlooked at the design stage. In studying the spatial heterogeneity of phytoplankton, Platt et al. (1970) drew the conclusion that the between-station variance rose as the sampling area increased to a density of ten stations per mile, and then remained relatively constant. By identifying the points in their Figure 1 by date and replotting in the original variance units (Figure 1), one sees that estimates of between-station variance which were obtained on the same day are approximately equal. The impression is obtained that the date of sampling is important in sorting out the reasons for different between-station variances. However, since variances based on markedly different distances between stations were obtained for one day only, the day-to-day variation in the between-station variances cannot be assessed.
[Figure 1: plot not reproduced; horizontal axis AREA (km2).]
Fig. 1. Plot of between-station variance and sample area as reported by Platt et al. (1970). Numbers give the day of sampling.
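The regrouping by date described above can be sketched in a few lines of Python. This is only an illustration under stated assumptions: the records, station labels and the function name `between_station_variance_by_day` are hypothetical and are not data or code from Platt et al. (1970).

```python
from collections import defaultdict
from statistics import variance

# Hypothetical records: (sampling day, station, concentration).
records = [
    (12, "A", 2.1), (12, "B", 3.4), (12, "C", 2.8),
    (13, "A", 5.0), (13, "B", 9.1), (13, "C", 7.2),
]

def between_station_variance_by_day(records):
    """Return {day: sample variance of the station values observed that day}."""
    by_day = defaultdict(list)
    for day, _station, value in records:
        by_day[day].append(value)
    # A variance needs at least two stations on the same day.
    return {day: variance(vals) for day, vals in by_day.items() if len(vals) >= 2}

result = between_station_variance_by_day(records)
for day in sorted(result):
    print(day, round(result[day], 3))
```

Comparing the per-day values side by side is what makes a day effect visible; with only one day per spacing, as in the example discussed in the text, the day effect and the spacing effect cannot be separated.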
An example of adjusting for a supplementary observation, by the inclusion of an additional term in a model of a time trend, is given by El-Shaarawi (1984) in further work related to the issue of Lake Erie anoxia. Water level was treated as a covariate in the analysis of oxygen depletion rates for the existence of a time trend. This provides a statistical solution to the problem of correcting for hypolimnion thickness which had been identified earlier by Charlton (1979) and Rosa and Burns (1981).

4. GROUPING PROCEDURES
In the present context, the common element to the procedures in this broad class is the division of a set of points in space, usually sampling stations, into two or more groups such that members in the same group are more similar to each other than to members in other groups with respect to one or more water quality parameters. Included are geometrical methods and clustering methods. The geometrical methods consist of plotting points in low-dimensional Euclidean space so that points which are similar to one another occur close together. Clustering procedures use a mathematical criterion to partition the set of points into homogeneous groups. The two types of methods are complementary since the clustering procedures provide objectivity and the geometrical procedures display natural groupings (Gordon, 1981). To discover the structure in a data set, more than one procedure will often be needed. These procedures are widely used in ecology and recognition of their importance is to be found in the book by Pielou (1984), which is devoted entirely to this topic. The geometric method, factor analysis, is the most frequently used method in geology (Jöreskog et al., 1976). Books concerned more with the methods than with applications in a particular discipline include Anderberg (1973), Hartigan (1975), Gordon (1981) and Lebart et al. (1984).

With a few exceptions, these procedures divide the data set into groups strictly on the basis of the values of the parameters and no information about location is used in this categorization. The examination of the spatial distribution of the members of the groups is done after the categorization is complete, usually by plotting the groups on a map or other diagram which represents the location in space. Constrained clustering methods, which require members of a cluster to be spatially contiguous, are available for data on which a linear ordering has been imposed (e.g. transect or depth profile data) but other spatial arrangements are much more complicated. Constrained clustering methods have been applied to pollen percentages for sediment cores (Gordon, 1981).

Grouping procedures are generally applied to the values of one or more water quality parameters or some combination of water quality parameters and supplementary observations at one sampling depth, or to derived values corresponding to a specified part of the water column. For example, the matrices $Y = (\mathbf{y}_{11}\ \mathbf{y}_{21}\ \cdots\ \mathbf{y}_{p1})$ or $Z = (\mathbf{z}_1\ \mathbf{z}_2\ \cdots\ \mathbf{z}_p)$ might be used (Table 1). A simple example of a derived variable is the mean of the concentration for all depths sampled at a station.
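As a minimal sketch of these three layouts for a single parameter, the Python fragment below builds the stacked column (one row per station and depth), the first-depth vector, and a derived vector of depth means. The station names, depths and concentrations are hypothetical, and the function names are introduced here only for illustration.

```python
from statistics import mean

# Hypothetical depth profiles: station -> list of (depth in m, concentration).
profiles = {
    "S1": [(1.0, 8.2), (10.0, 7.9), (20.0, 5.1)],
    "S2": [(1.0, 8.4), (10.0, 8.0)],
    "S3": [(1.0, 8.1)],
}

def stack_rows(profiles):
    """Stack all samples station by station: n = n_1 + ... + n_I rows."""
    return [(station, depth, value)
            for station, samples in profiles.items()
            for depth, value in samples]

def first_depth(profiles):
    """One value per station: the measurement at the first sampled depth."""
    return {station: samples[0][1] for station, samples in profiles.items()}

def depth_mean(profiles):
    """One derived value per station: the mean over all sampled depths."""
    return {station: mean(v for _, v in samples)
            for station, samples in profiles.items()}

print(len(stack_rows(profiles)))   # number of rows n
print(first_depth(profiles))
print(depth_mean(profiles))
```

Either of the per-station dictionaries has length I and could serve as input to a grouping procedure; which one is appropriate depends on the part of the water column of interest.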
4.1 Examples of grouping based on water quality data

Clustering methods have been developed for the zonation of a lake into regions, homogeneous with respect to the level of one or more water quality parameters. A clustering method, which defines a homogeneous set of concentrations as one which can be fitted by a Poisson distribution, was applied to surface coliform concentrations at sampling stations on Lake Erie, cruise by cruise (El-Shaarawi et al., 1981; Esterby and El-Shaarawi, 1984). A spatial pattern which changed with season was found to be qualitatively consistent from year to year, whereas apparent biases from year to year made comparison of concentrations impossible. This is an example where a grouping procedure was a natural choice due to the discontinuous nature of the spatial regions of similar concentration. A second procedure, suitable for a larger class of water quality parameters, is based on a linear additive model for the station and cruise components of data collected in one year (El-Shaarawi and Shah, 1978). Allowance is made for non-orthogonality and the need to transform the data. Provided the station component is significantly different from zero, a criterion based on the change in the residual sum of squares is used to group stations. The procedure has been used mostly for a single water quality parameter (e.g. El-Shaarawi and Kwiatkowski, 1977) but the multivariate extension was also given by El-Shaarawi and Shah.

Existing data on water quality parameters related to acid precipitation have been analyzed for sets of lakes in regions of Eastern Canada by grouping methods (El-Shaarawi et al., 1985; Haemmerli and Bobée, this volume) to determine subsets of similar lakes which can subsequently be used in the design of future data collections. El-Shaarawi et al. applied both a graphical method, in which the lakes were plotted on the first three principal components, and a k-means non-hierarchical nearest-centroid clustering method to determine groups of lakes. Plots of the clusters in variables space, the percentage of variation explained by a given number of clusters, and descriptions of the clusters, by means of within-cluster summary statistics, histograms and Q-Q plots, were used to select the number of clusters and to characterize the clusters.
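To make the nearest-centroid idea concrete, the following is a minimal sketch of k-means (Lloyd's algorithm) together with the percentage of variation explained by the clustering. It is not the procedure used by the authors cited above: the lake values are hypothetical, the seeding rule (first k points) is an arbitrary assumption chosen to keep the sketch deterministic, and the names `kmeans` and `explained_variation` are introduced here for illustration.

```python
import math

def kmeans(points, k, max_iter=100):
    """Plain Lloyd's algorithm; the first k points seed the centroids."""
    centroids = [list(p) for p in points[:k]]
    labels = None
    for _ in range(max_iter):
        # Assignment step: each point joins its nearest centroid.
        new_labels = [
            min(range(k), key=lambda c: math.dist(p, centroids[c]))
            for p in points
        ]
        if new_labels == labels:
            break  # assignments stopped changing
        labels = new_labels
        # Update step: each centroid moves to the mean of its members.
        for c in range(k):
            members = [p for p, lab in zip(points, labels) if lab == c]
            if members:
                centroids[c] = [sum(col) / len(members) for col in zip(*members)]
    return labels, centroids

def explained_variation(points, labels, centroids):
    """Percentage of the total sum of squares accounted for by the clusters."""
    grand = [sum(col) / len(points) for col in zip(*points)]
    total = sum(math.dist(p, grand) ** 2 for p in points)
    within = sum(math.dist(p, centroids[lab]) ** 2
                 for p, lab in zip(points, labels))
    return 100.0 * (1.0 - within / total)

# Hypothetical lakes described by two variables (e.g. pH and sulphate).
lakes = [(4.8, 6.1), (5.0, 5.8), (4.9, 6.0), (6.9, 2.1), (7.1, 2.0), (7.0, 2.3)]
labels, centroids = kmeans(lakes, 2)
print(labels, round(explained_variation(lakes, labels, centroids), 1))
```

Running the sketch for several values of k and comparing the explained percentages is one simple way to look at the trade-off between the number of clusters and their homogeneity; in practice the variables would usually be standardized first.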
Two examples of the use of clustering methods which are considerably different from the above examples are the following. The first example is another analysis of dissolved oxygen concentrations in Lake Erie, the issue which was discussed in the introduction. Anderson et al. (1984) used a clustering method on three-dimensional spatial data, consisting of the temperature and the dissolved oxygen concentration of surface and bottom samples at each station, to divide these points into groups which were assumed to be either hypolimnetic or not hypolimnetic. El-Shaarawi et al. (1985) grouped stations on the Niagara River by applying a complete linkage clustering method separately to the standardized Euclidean distances between stations calculated from ranks of chlorinated organic substances in the water and in suspended sediments. Ranks were used as a means of dealing with values below the detection limit and, for each phase, the stations were ranked separately for each substance.

5. SPATIAL AUTOCORRELATION METHODS

If the value of a variable depends upon the values of the same variable at neighbouring locations, then spatial autocorrelation exists. The problem of determining whether autocorrelation exists is more difficult for spatial data than for a time series since dependence may extend in all directions for the spatial data but only into the past for the time series. The monograph by Cliff and Ord (1973) provides a thorough treatment of the spatial autocorrelation measures which were introduced earlier by Moran and Geary. Although only examples from geography were used by Cliff and Ord, the methods are applicable to any fixed set of points in space.

Let
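$\mathbf{y}' = [y_1, y_2, \ldots, y_m]$ be the vector of values of a variable at m points in space; for example, $\mathbf{y}$ could be $\mathbf{y}_1$, $\mathbf{y}_{11}$ or $\mathbf{z}_1$, corresponding to the first water quality parameter of matrix Y (Table 1). Then the general forms of the Moran and Geary spatial autocorrelation coefficients are, in the form given by Cliff and Ord (1973),

$$I = \frac{m}{W}\,\frac{\sum_{i \neq j} w_{ij}\,(y_i - \bar{y})(y_j - \bar{y})}{\sum_{i=1}^{m} (y_i - \bar{y})^2} \qquad (1)$$

and

$$c = \frac{m-1}{2W}\,\frac{\sum_{i \neq j} w_{ij}\,(y_i - y_j)^2}{\sum_{i=1}^{m} (y_i - \bar{y})^2} \qquad (2)$$

respectively, where $\bar{y} = \frac{1}{m}\sum_{i=1}^{m} y_i$ and $W = \sum_{i \neq j} w_{ij}$. The spatial information enters only through the matrix of weights $\{w_{ij}\}$. The test for spatial autocorrelation is in the form of a test of the hypothesis of no spatial autocorrelation, that is, a random distribution in space, against an alternative as specified by the matrix $\{w_{ij}\}$.

Jumars et al. (1977) first applied this method in the field of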
These authors recognized that the method uses the
spatial information in the data, which is ignored in tests such as Fisher's
index o f dispersion, but
is less restrictive than
spectral analysis which requires intensive and regularly-spaced samples.
The
pairs
connected,
are
matrix
{wij] in
requires
some
the
sense, and
definition of which either
a measure
of
adjacency or of distance between pairs of connected localities (Sokal,
1979).
When
no
a
priori
distance
is
available, a
spatial correlogram, obtained using the unweighted autocorrelation
coefficient,
can
be
used
to
suggest
the
form.
A
more
recent application and a list of other ecological applications are given by Mackas (1984). One
of
the
applications
of
the
above
method
is
in
the
estimation of patch diameters.
Spectral analysis has been used
in ecology
Two examples are
for this purpose.
(1970) and Lekan and Wilson (1978). regularly sampled
Platt
et
al.
In both cases, densely and
transects with gradients of
temperature
and
salinity were used and the objectives included inferences about patch
size
or
acknowledged
length
that
this
scale
of
method
phytoplankton. would
Platt
et
underestimate patch
al. size
since the transect may not pass through the widest part of the patches.
As far as the author can determine, the methods based on the Moran and
Geary spatial autocorrelation coefficients have not
been applied to water quality parameters other than chlorophyl
10 1984).
(Mackas, biomass and
used
Lekan
and
considered
the
same
new
indicator
analyses
(1978).
each
the
the
spectral
Wilson
for
parameters,
is
which
in
The
by
form of
application
literature
on
of
phytoplankton
Platt
and
ecological
et
for
(1978)
al.
{wij}
needs
water
applications
to
be
quality
is
more
r e l e v a n t t h a n t h a t on g e o g r a p h i c a l a p p l i c a t i o n s .
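The two coefficients can be computed directly from a data vector and a weight matrix. The sketch below assumes the usual Cliff and Ord (1973) forms, $I = (m/W)\sum_{i\neq j} w_{ij}(y_i-\bar{y})(y_j-\bar{y}) / \sum_i (y_i-\bar{y})^2$ and $c = ((m-1)/2W)\sum_{i\neq j} w_{ij}(y_i-y_j)^2 / \sum_i (y_i-\bar{y})^2$ with $W = \sum_{i\neq j} w_{ij}$. The transect values, the adjacency weighting and the function names are hypothetical choices made for illustration only.

```python
def moran_geary(y, w):
    """Return (Moran's I, Geary's c) for values y and weight matrix w."""
    m = len(y)
    ybar = sum(y) / m
    dev = [v - ybar for v in y]
    ss = sum(d * d for d in dev)  # sum of squared deviations from the mean
    pairs = [(i, j) for i in range(m) for j in range(m) if i != j]
    W = sum(w[i][j] for i, j in pairs)
    cross = sum(w[i][j] * dev[i] * dev[j] for i, j in pairs)
    sqdiff = sum(w[i][j] * (y[i] - y[j]) ** 2 for i, j in pairs)
    moran_i = (m / W) * cross / ss
    geary_c = ((m - 1) / (2 * W)) * sqdiff / ss
    return moran_i, geary_c

def chain_weights(m):
    """Adjacency weights for m stations along a transect (neighbours only)."""
    return [[1 if abs(i - j) == 1 else 0 for j in range(m)] for i in range(m)]

# Hypothetical transect values: a smooth gradient and an alternating pattern.
print(moran_geary([1, 2, 3, 4], chain_weights(4)))
print(moran_geary([1, 4, 1, 4], chain_weights(4)))
```

For the smooth gradient, I is positive and c is below 1, the pattern expected when neighbouring values are alike; the alternating series gives a negative I and c above 1. A different choice of $\{w_{ij}\}$, for example inverse distance between stations, changes only the `w` argument.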
6. THE PARAMETER EXPRESSED AS A FUNCTION OF POSITION

Methods of the previous section provide indirect inferences about the response surface. In this section, the methods are based upon an explicit relationship between a variable and its position. However, since there is considerable overlap between the two sections, spatial autocorrelation is also considered
here. in
As
the
Consider
x'
at
points
variable
m
1).
(Table quality
previous
again
The
parameter,
but of
x;p) the =
5;
or
a
=
1,2
,...,m .
location,
where
space
procedures
variable
term
(xil, Then
expressed
could
derived
this
xiz,
the
value the
sum
values
~ a
be
used
of
the
51
the
water
quality
only
5;
for (xil,
=
location
three-dimensional of
a
water
Let
provide
of a
1 ~, 1 o 1r
either
from
w i l l
univariate.
of
be of
section.
xi3) or
are
vector
values
parameter in
i n two-
as
the
which
gives
model
i t h v a r i a t e value
variable
deterministic
of
space for at
the
and
i
ith
random
is
terms,
If
or
the a
in
vector
parameter
parameters
section,
= [yl,y2, ...,ym],
X
is
a
the value
matrix of
the
m
rows
given
random
of
term
at
by
the
row
vectors
i does not
location
5;.
depend
upon the value at nearby locations,
where
the
parameters, in
(4).
ei's
are
uncorrelated.
ordinary least
In the more general form,
stationarity
is required.
For
f
linear
in
the
squares can be used to fit the model
If
(3),
some
assumption
the dispersion matrix
about
is assumed
11 known, p.
generalized
least
squares
Hand-drawn
contour
To
variability. geologists
can
procedure model
be
which
of
be
applied
grid
as
consists of
1)
surface
the
on
(Rao,
3)
on
(Davis,
method the
literature
is
the
thus,
using standard
known
is
prediction consists the
the
of
of
sum o f
unobserved
on
between
estimating
the value of done
point
in
Equation
(3)
covers
i.e.
stationarity
case
with
constant
drift,
the
unbiased
BLUE
and
mean
which
he
statistical
new
terminology
the observations
a
for
under
the mean,
(Delhomme,
at
this
estimator p,
kriging
(BLUE) and
for
that
it at
E(yp),
which
accounts
for
at
that an
this
is analogous
the
1981,
since
point
is
28-31).
hypothesis",
kriging,
f(xi)
and
to what
pp.
"intrinsic
and universal
the
distinction
unobserved
Smith,
1978)
are
variable
value,
Note
and
that
Contours
the mean
term
value
the
point
observation
kriging
of
shows
the
the
i.e.
reduces
to
a
in the first case.
Delhomme
(1978)
provides
an estimate of
contouring,
states
including
that
an advantage
to kriging
precision which most
least
squares,
do
not.
The
method
of
ordinary
least
is
that
techniques
As
can
from the analogy with regression which was given above, so.
trend
reference
in
variogram.
value
(1971)
(Draper
of
the
the
unobserved
an individual
regression
value
accounts
useful
(1971),
In kriging,
values.
the
which
language
defines
an
p
nearby
called
and the correlation takes the form of a
best-linear at
constant
at
was developed by
A
Watson
regression
2) using the
is
procedure,
English
effect,
Watson
yp
the
dependence
by
the The
the variable
of
Matheron.
the
predicting
points.
estimator
by
to in
procedures, assume
polynomial
as kriging,
spatial
position.
procedure A
function called
by
unobserved
of
contours
statistical terms.
parametric
constructed
fitting a
known
paper
are assumed correlated,
not
1965,
procedure,
which
function
1973).
led
technique
and
a
display this
several
methods
interpolation
geomathematicians
this
adopted
constructing
numerical
analysis
relates
to of
spatial coordinates,
for spatial autocorrelation, French
used
to estimate the mean value of
and
some
have
statistical
the variable
points
often
subjectivity
expressed
regression model
using
the
others)
well-known
variable
are
maps
avoid
(among
on
based
it
can
180).
squares
be
for seen
this
provides
is an
estimate of the precision; however, the method of trend surface analysis, as defined above, does not use this capability. El-Shaarawi and Esterby (1981) have shown how this can be done to construct contours of constant value, for either the mean or an individual observation, and to attach confidence bands to these contours. The method was applied to surface temperature data from Lake Erie.
Whatever the assumption about the form of the autocorrelation, the difficulty is in estimation of the dispersion matrix. This requires estimation of the variogram in kriging (Delhomme, 1978), which is usually done by ad hoc procedures. Iterative procedures for regression with correlated errors are given by Cliff and Ord (1973) and Cook and Pocock (1983). Maximum likelihood methods of estimation are considered by Cook and Pocock (1983) and Mardia and Marshall (1984).
Methods of testing for spatial autocorrelation in regression residuals are given by Cliff and Ord (1973) and have been applied in paleoecology (Howe et al., 1984). Cliff and Ord (1973) stress the fact that detection of autocorrelation in the residuals may be due to one of the following: 1) an inadequate form for the relationship between dependent and independent variables, such as using a linear model when curvature is present, 2) omission of one or more regressors and 3) the need for autocorrelation structure in the model. Clearly, in 1) and 2), means of removing the autocorrelation from the residuals exist, which are simpler than the methods incorporating spatial autocorrelation.
Data collected in space need not exhibit spatial autocorrelation for various reasons. It may not be detectable because distances between points in space are larger than the distance within which the dependence occurs. Cook and Pocock (1983) discuss aggregation to remove correlation. Analogously, both spacing and the use of means or medians over seasons have been suggested as methods of reducing serial correlation in the analysis of water quality data for temporal trends (van Belle and Hughes, 1984). The consequences of using ordinary least squares, when errors are correlated, are inefficient estimators of the regression parameters and a downwards biased estimator of the variance, with the latter resulting in an overestimate of the significance of the regression (Cliff and Ord, 1973).
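The residual check described above can be sketched in a few lines. The example below is only an illustration under stated assumptions: the 6 x 6 station grid, the bowl-shaped response, the distance-band weights and all function names are hypothetical and do not come from the papers cited. It fits a first-order trend surface by ordinary least squares and then computes the Moran coefficient of the residuals, so that curvature left out of the model (cause 1 in the list above) shows up as positive residual autocorrelation.

```python
import numpy as np

def trend_surface_residuals(coords, y):
    """Fit y = b0 + b1*x1 + b2*x2 by ordinary least squares; return residuals."""
    X = np.column_stack([np.ones(len(y)), coords])
    beta, *_ = np.linalg.lstsq(X, y, rcond=None)
    return y - X @ beta

def distance_band_weights(coords, radius):
    """Binary weights (assumed form): w_ij = 1 if 0 < distance(i, j) <= radius."""
    d = np.linalg.norm(coords[:, None, :] - coords[None, :, :], axis=2)
    return ((d > 0) & (d <= radius)).astype(float)

def morans_i(values, weights):
    """Moran coefficient I = (n / S0) * (z'Wz) / (z'z), with z the centred values."""
    z = values - values.mean()
    return (len(z) / weights.sum()) * (z @ weights @ z) / (z @ z)

# Hypothetical 6 x 6 grid of stations with a curved (bowl-shaped) response.
g = np.arange(6.0)
coords = np.array([(a, b) for a in g for b in g])
y = (coords[:, 0] - 2.5) ** 2 + (coords[:, 1] - 2.5) ** 2

resid = trend_surface_residuals(coords, y)      # a planar fit misses the curvature
w = distance_band_weights(coords, radius=1.1)   # nearest grid neighbours only
moran_resid = morans_i(resid, w)
print(round(moran_resid, 3))
```

A clearly positive value here reflects the misspecified surface rather than any need for an autocorrelated error model; adding quadratic terms would remove it. The sketch only computes the coefficient. A formal test on regression residuals needs the moments appropriate to residuals, as treated by Cliff and Ord (1973).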
Methods in this section also provide the capability of expressing the variable as the sum of spatial components, temporal components and other explanatory variables such as temperature. An example of this, which encompasses many of the points discussed in this section, is the complicated model used by Eynon and Switzer (1983) to construct contour maps of rainfall pH.
7.
DISCUSSION
The many dimensions of water quality data sets make analysis difficult. Data are often collected to meet objectives related to monitoring the change in water quality conditions which are necessarily too general to be of help in reducing the dimensions of the problem. Thus, cluster analysis and related methods, which do not use the spatial location but can be used to examine the structure of multivariate data, are complementary to the univariate methods which do use the spatial location. The analyst can expect to use the classes of methods discussed here in an iterative fashion, coupled with scientific understanding of the system, to arrive at a characterization of spatial structure. Of the methods discussed, only the grouping procedures are strictly for the purpose of discovering structure in the data. The other methods, even in the characterization stage, are used for testing hypotheses and estimation.
REFERENCES
Anderberg, M.R., 1973. Cluster Analysis for Applications. Academic Press, New York, 359 pp.
Anderson, J.E., El-Shaarawi, A.H., Esterby, S.R. and Unny, T.E., 1984. Dissolved oxygen concentrations in Lake Erie (U.S.A.-Canada), 1. Study of spatial and temporal variability using cluster and regression analysis. J. Hydrol., 72: 209-229.
Barica, J., 1982. Lake Erie depletion controversy. J. Great Lakes Res., 8(4): 719-722.
Charlton, M.N., 1979. Hypolimnetic oxygen depletion in central Lake Erie: Has there been any change? Inland Waters Dir., Sci. Publ. Ser. No. 110, 25 pp.
Cliff, A.D. and Ord, J.K., 1973. Spatial Autocorrelation. Methuen, New York, 178 pp.
Cook, D.G. and Pocock, S.J., 1983. Multiple regression in geographical mortality studies, with allowance for spatially correlated errors. Biometrics, 39: 361-371.
Davis, J.C., 1973. Statistics and Data Analysis in Geology. Wiley, New York.
Delhomme, J.P., 1978. Kriging in the hydrosciences. Adv. Water Resources, 1: 251-266.
Dobson, H.H. and Gilbertson, M., 1971. Oxygen depletion in the hypolimnion of the Central Basin of Lake Erie, 1929-1970. Proc. 14th Conf. Great Lakes Res., Int. Assoc. Great Lakes Res., pp. 743-748.
Draper, N.R. and Smith, H., 1981. Applied Regression Analysis. Wiley, New York, 709 pp.
El-Shaarawi, A.H. and Kwiatkowski, R.E., 1977. A model to describe the inherent spatial and temporal variability of parameters in Lake Ontario, 1974. J. Great Lakes Res., 3: 177-183.
El-Shaarawi, A.H. and Shah, K.R., 1978. Statistical procedures for classification of a lake. Scientific Series No. 86. Inland Waters Directorate, National Water Research Institute, Canada Centre for Inland Waters, Burlington, Ontario.
El-Shaarawi, A.H., Esterby, S.R. and Dutka, B.J., 1981. Bacterial density in water determined by Poisson or negative binomial distributions. Appl. Environ. Microbiol., 41: 107-116.
El-Shaarawi, A.H. and Esterby, S.R., 1981. Analyse de regression de la variation spatiale d'une caracteristique limnologique. Eau du Québec, 14: 222-228.
El-Shaarawi, A.H., 1984. Dissolved oxygen concentrations in Lake Erie (U.S.A.-Canada), 2. A statistical model for dissolved oxygen in the Central Basin of Lake Erie. J. Hydrol., 72: 231-243.
El-Shaarawi, A.H., Esterby, S.R., Warry, N.D. and Kuntz, K.W., 1985. Evidence of contaminant loading to Lake Ontario from the Niagara River. Can. J. Fish. Aquat. Sci., 42: 1278-1289.
El-Shaarawi, A.H., Esterby, S.R., Clair, T. and Lemieux, R., 1985. Spatial variability of acidification-related water quality parameters for lakes in Eastern Canada. NWRI Contribution C85- . Environment Canada, Burlington, Ont.
Esterby, S.R. and El-Shaarawi, A.H., 1984. Coliform concentrations in Lake Erie - 1966 to 1970. Hydrobiologia, 111: 133-146.
Eynon, B.P. and Switzer, P., 1983. The variability of rainfall acidity. Canad. J. Statist., 11: 11-24.
Gordon, A.D., 1981. Classification. Chapman and Hall, London, 193 pp.
Hartigan, J.A., 1975. Clustering Algorithms. Wiley, New York, 351 pp.
Haugh, L.D., 1984. An introductory overview of some recent approaches to modelling spatial time series. In: O.D. Anderson (Editor), Time Series Analysis: Theory and Practice 5. North Holland, Amsterdam, pp. 287-301.
Howe, S. and Webb, T., III, 1984. Calibrating pollen data in climatic terms: improving the methods. Quaternary Science Reviews, 2: 17-51.
Jöreskog, K.G., Klovan, J.E. and Reyment, R.A., 1976. Geological Factor Analysis. Elsevier, Amsterdam, 178 pp.
Jumars, P.A., Thistle, D. and Jones, M.L., 1977. Detecting two-dimensional spatial structure in biological data. Oecologia, 28: 109-123.
Kwiatkowski, R.E., 1984. Lake Erie review. In: A.H. El-Shaarawi (Editor), Statistical Assessment of the Great Lakes Surveillance Program, 1966-1981, Lake Erie. Sci. Ser. No. 136, Inland Waters Directorate, Environment Canada, Burlington, Ontario, pp. 3-26.
Lebart, L., Morineau, A. and Warwick, K.M., 1984. Multivariate Descriptive Statistical Analysis. Wiley, New York, 231 pp.
Lekan, J.F. and Wilson, R.E., 1978. Spatial variability of phytoplankton biomass in the surface waters of Long Island. Estuarine and Coastal Marine Science, 6: 239-251.
Mackas, D.L., 1984. Spatial autocorrelation of plankton community composition in a continental shelf ecosystem. Limnol. Oceanogr., 29: 451-471.
Mardia, K.V. and Marshall, R.J., 1984. Maximum likelihood estimation of models for residual covariance in spatial regression. Biometrika, 71: 135-146.
Pielou, E.C., 1984. The Interpretation of Ecological Data. Wiley, New York, 263 pp.
Platt, T., Dickie, L.M. and Trites, R.W., 1970. Spatial heterogeneity of phytoplankton in a near-shore environment. J. Fish. Res. Board Can., 27: 1453-1473.
Rao, C.R., 1965. Linear Statistical Inference and its Applications. Wiley, New York, 522 pp.
Rosa, F. and Burns, N.M., 1981. Oxygen depletion rates in the hypolimnion of central and eastern Lake Erie. A new approach indicates change. Presented at Workshop on Central Basin Oxygen Depletion, Dec. 2-3, 1981, Canada Centre for Inland Waters, Burlington, Ontario.
Sokal, R.R., 1979. Ecological parameters inferred from spatial correlograms. In: G.P. Patil and M. Rosenzweig (Editors), Contemporary Quantitative Ecology and Related Ecometrics. International Co-operative Publishing House, Fairland, Maryland, pp. 167-196.
van Belle, G. and Hughes, J.P., 1984. Nonparametric tests for trend in water quality. Water Resour. Res., 20: 127-136.
Watson, G.S., 1971. Trend Surface Analysis. Jour. Math. Geology, 3: 215-226.
UNCERTAINTY IN WATER QUALITY DATA
ROBERT H. MONTGOMERY and THOMAS G. SANDERS
Environmental Engineering Program, Department of Civil Engineering, Colorado State University, Fort Collins, CO 80523
1.1
ABSTRACT
Water quality data are collected to provide information to assist in the understanding and managing of water resources. The usefulness of water quality data collected is inversely related to the amount of uncertainty in the data. Data uncertainty may be defined as a state of doubt in how representative observed values are of the true population characteristics. Data uncertainty may be estimated as a function of both sampling and nonsampling errors. Sampling errors result from the sampling network design (location and frequency of sample collection) which samples only a subset of the total population. Nonsampling errors result from the process of measuring the amount of water quality material present. The measurement process may be divided into sample collection and laboratory analysis. Sample collection includes the physical procedure for obtaining, storing, and transporting a water sample for later analysis. Laboratory analysis consists of some method of estimating the amount (concentration) of a given material in the water sample. Presented here is a general discussion of the sources of water quality data uncertainty, a method to estimate data uncertainty in water quality variables, and the implications of uncertainty in water quality data.
1.2
INTRODUCTION
Finding solutions to water quality management problems is an extremely difficult task. Systems analysis techniques provide a method to assist decision makers in solving problems. The systems approach to problem solving may be conceptualized as a decision maker who has a desired objective with at least two possible courses of action and a state of doubt as to the best course of action (Ackoff, 1962). After defining the objective, courses of action, and criteria for the selection of the "best" solution, the decision maker must choose a course of action based upon available information. This information may be qualitative or quantitative, or both, in nature. Qualitative information is based on experience and judgement. Quantitative information is obtained through statistical analysis and mathematical modeling utilizing observed values (data) on selected variables. Unfortunately, this information is often incomplete or in error; therefore, there exists uncertainty (state of doubt) as to whether a given course of action will result in a specific outcome.
Types of uncertainty in the decision making process are data, model, parameter, and supplemental (Montgomery et al., 1983). Data uncertainty consists of error (variability and bias) in the data that resulted from imperfections in the sample design and measurement techniques. Model uncertainty results from the fact that models are only representations of real world processes, and may be disturbed by extraneous sources of variation or may not be complete representations. Parameter uncertainty results from the fact that model parameters are only estimates obtained from observed data. Supplemental uncertainty consists of all remaining uncertainty not already included, for example, errors in interpretation of data, the selection of a wrong model, or human coding error.
The objectives of this paper are to: 1) describe the sources of data uncertainty in water quality variables collected by a water quality monitoring network system, 2) present a method to estimate the amount of data uncertainty, and 3) discuss the implications of water quality data uncertainty.
1.3
WATER QUALITY DATA UNCERTAINTY
Water quality data are observations obtained from a water quality monitoring system. A water quality monitoring system may be divided into two major categories: data acquisition (operational) and data utilization (informational) (Sanders et al., 1983). Data acquisition consists of three activities: 1) network design, 2) sample collection, and 3) laboratory analysis. Data utilization also consists of three activities: 1) data handling, 2) data analysis, and 3) information utilization. Figure 1.1 presents a summary of the specific functions for each of the six activities. The data acquisition activities will create uncertainty in the data. Model and parameter uncertainty will result from the data analysis activity. Supplemental uncertainty may occur from any of the water quality monitoring activities.
The amount of data uncertainty due to the network design is a function of sample site location (spatial) and sampling frequency (temporal). The intent in determining sample location and frequency is to make a representative selection that will provide reasonable estimates of the true population characteristics that are of concern. Data uncertainty arises because only a subset of the total population is sampled, and then inferences are made about the population based on the limited number of observed values.
DATA ACQUISITION
Network Design: 1. Station Location; 2. Sampling Frequency; 3. Variable Selection
Sample Collection: 1. Sampling Technique; 2. Field Measurements; 3. Sample Preservation; 4. Sample Transport
Laboratory Analysis: 1. Analysis Techniques; 2. Operational Procedures; 3. Quality Control; 4. Data Recording
DATA UTILIZATION
Data Handling: 1. Data Reception; 2. Screening and Verification; 3. Storage and Retrieval
Data Analysis: 1. Statistics; 2. Modeling
Information Utilization: 1. Reporting Formats; 2. Utilization Evaluation
Fig. 1.1. Summary of Water Quality Monitoring Activities (Sanders et al., 1983).
Sample site location is defined in two levels: macro and micro location. Macro location refers to site location in reference to the entire area (frame) of interest (e.g. watershed, aquifer). Micro location refers to the specific vertical and lateral point of sample collection at a particular macro location site. Data uncertainty at the macro location level is a function of the underlying spatial process that governs the water quality variable, the number and location of sample sites, and the population characteristic (statistic) of interest. A problem is that the underlying population is not known (i.e. that is what you are trying to estimate), thus one may really not know whether the collected data are representative of the true population. Micro location data uncertainty is related to how well the point(s) of sample collection represent the actual conditions, given that spatial concentration gradients may exist. For example, in a river there may be concentration gradients across the river and at various depths. Thus, samples would have to be taken at various points vertically and laterally to estimate the actual conditions (e.g. mean) in the entire river cross section.
Sampling frequency is the determination of the number of samples and spacing of samples to be taken per time unit. Data uncertainty due to sampling frequency arises from collecting samples at incorrect times and from limited sample size. The appropriate times for collection of samples are related to the desired objective (statistic) and the underlying temporal process of the water quality variable. For example, if the desired objective is to estimate total annual material loads to a lake, the sampling frequency (time of collection) must be such as to account for storm events which may contribute the majority of lake material load. Data uncertainty due to the number of samples collected results from the sample being only a subset or proportion of the total population. For example, if a water quality variable has an underlying temporal process with a weekly varying component, then collecting one sample per week may not be representative of the characteristics of the water quality variable for the week. Another aspect of sampling frequency that may cause data uncertainty is whether the samples collected are instantaneous grab samples or time composites, each of which provides information which is representative of different time intervals. Thus, data uncertainty may occur because the time interval related to the type of sample collected may not provide representative information on the actual conditions or the desired objective.
Sample collection is the physical method of obtaining, storing, and transporting a water sample for later analysis. The objective of sample collection is to collect a portion of material small enough in volume to be transported conveniently and handled in the lab, while still accurately representing the water source being sampled. Data uncertainty arises from all three sample collection activities. The physical device that collects the water sample should provide a representative and uncontaminated sample. For example, in ground water measurement the physical device can cause aeration of the anaerobic water, hence contaminating the sample. River sediment samplers are an example of the difficulty in obtaining representative samples, as they are unable to sample the entire vertical transect. Data uncertainty from sample storage can be attributed to chemicals added to preserve the sample, filtration prior to analysis, and the type of storage container. The transportation of the sample may create data uncertainty due to agitation, temperature, light, and time until analysis.
Laboratory analysis is the process of estimating the level of material present in a given water sample using wet chemistry and/or instrumentation. Data uncertainty in laboratory analysis activities results from interferences due to other variables in the sample, calibration procedures, sloppy experimental technique, bad or old standardized reagents, defective instruments, and material levels near the detection limit of the analytical technique. The presence of high sediment concentrations is an example of a variable that may cause potential interference in certain analytical techniques (e.g. dissolved metals). Calibration procedures are inherently uncertain due to the scatter of the data and the necessity of determining a line of "best fit" to relate the measurement readings to the variable concentrations.
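The calibration step mentioned above can be illustrated with a short sketch. It assumes a straight-line calibration fitted by least squares to a handful of standards; the standard concentrations, instrument readings and function names are hypothetical and serve only to show where the scatter about the "best fit" line enters as a source of uncertainty.

```python
import numpy as np

def fit_calibration(concentrations, readings):
    """Least-squares line reading = intercept + slope * concentration.

    Returns (slope, intercept, residual standard deviation on n - 2 degrees of freedom).
    """
    x = np.asarray(concentrations, dtype=float)
    r = np.asarray(readings, dtype=float)
    slope, intercept = np.polyfit(x, r, 1)   # coefficients, highest degree first
    residuals = r - (intercept + slope * x)
    resid_sd = np.sqrt(np.sum(residuals ** 2) / (len(x) - 2))
    return slope, intercept, resid_sd

def estimate_concentration(reading, slope, intercept):
    """Invert the calibration line to turn an instrument reading into a concentration."""
    return (reading - intercept) / slope

# Hypothetical standards (concentration units) and instrument readings.
standards = [0.0, 1.0, 2.0, 4.0, 8.0]
readings = [0.02, 0.21, 0.39, 0.82, 1.60]

slope, intercept, resid_sd = fit_calibration(standards, readings)
unknown = estimate_concentration(0.60, slope, intercept)
print(slope, intercept, resid_sd, unknown)
```

The residual standard deviation summarizes the scatter of the standards about the fitted line; a larger value means that any concentration read off the line carries more uncertainty, independent of how the sample was collected.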
1.4
DATA UNCERTAINTY ESTIMATION
One method to evaluate the amount of data uncertainty is the total survey design (TSD) concept (Horvitz, 1978). The TSD concept attempts to minimize the total error of the estimate of interest by controlling the magnitude of individual error components through allocation of available resources. In order to apply the TSD concept, information is needed on the desired monitoring (survey) objective, model of survey errors, costs of alternative measurement procedures, and costs of errors in decision making. Only the concept of survey model errors, as used to estimate the amount of data uncertainty, will be examined in this paper.
The total survey error in monitoring programs is a function of both sampling and nonsampling (measurement) error components and may be estimated in terms of variability and bias. Sampling error is a function of the network design (statistical sampling technique), underlying population of interest, and sample size. Nonsampling error is related to the specific methods used in the measurement process. Variability may be expressed in terms of the survey method precision, which may be defined as the reproducibility of a method when it is repeated on a homogeneous sample under controlled conditions, regardless of whether systematic errors are present or not. The bias term measures the difference between the measured value and the true value. Accuracy may be defined as the agreement between the amount of a component measured by a method and the actual amount present.
Thus, for a method to be accurate it should have no bias and small variability. A general survey error model may be expressed as (Kish, 1965):
\[ \text{Total Error} = \mathrm{RMSE} = \left( V_s^2 + V_{ns}^2 + \mathrm{COV}(s,ns) + B_s^2 + B_{ns}^2 \right)^{1/2} \qquad (1) \]
where RMSE = root mean square error; V_s = sampling variance; V_ns = nonsampling variance; COV(s,ns) = covariance between sample and nonsample; B_s = sampling bias; B_ns = nonsampling bias.
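As a small numerical illustration of the survey error model, the sketch below evaluates the total error from its components. The function name, argument names and input values are hypothetical; the terms enter exactly as written in the model above, and the covariance and sampling bias arguments default to zero purely as a convenience assumed for this sketch.

```python
import math

def total_survey_error(v_s, v_ns, b_ns, cov_s_ns=0.0, b_s=0.0):
    """Root mean square error from the survey error model.

    v_s, v_ns : sampling and nonsampling variability terms (squared in the model)
    b_s, b_ns : sampling and nonsampling bias terms (squared in the model)
    cov_s_ns  : covariance between the sampling and nonsampling components
    """
    return math.sqrt(v_s ** 2 + v_ns ** 2 + cov_s_ns + b_s ** 2 + b_ns ** 2)

# Hypothetical component values, all in the same concentration units.
rmse_no_bias = total_survey_error(v_s=3.0, v_ns=4.0, b_ns=0.0)
rmse_with_bias = total_survey_error(v_s=3.0, v_ns=4.0, b_ns=12.0)
print(rmse_no_bias, rmse_with_bias)
```

Because the components combine as a root sum of squares, the largest term dominates: in the second call the bias term accounts for most of the total, so reducing the variability terms alone would change the result very little.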
In general, the COV term may be considered zero or insignificant in water quality monitoring due to lack of correlation between the two error components (V_s and V_ns). Also, the sampling bias term is usually dropped since the true population must be known for it to be estimated. Thus, a general working model for estimating the total survey error in water quality variables is:
\[ \mathrm{RMSE} = \left( V_s^2 + V_{ns}^2 + B_{ns}^2 \right)^{1/2} \qquad (2) \]
This model is based on a single macro and micro station, with the V_s term a function of only sampling frequency error, estimates of which are discussed in the next paragraph. The model may be expanded to include: micro location error, by including error terms estimated from observed cross section data; and macro location error, by including error terms estimated using spatial statistics on observations at different sample site locations.
The sampling variance term is a function of the statistic of interest (e.g. mean, variance, etc.) and the statistical sampling technique (e.g. random, stratified, etc.) utilized. Estimators for various statistics and sampling techniques may be found in most statistical sampling texts (e.g. Cochran, 1977). For example, an estimate of sampling variance for the mean of samples obtained from simple random sampling is the standard error of the mean:
where sz = e s t i m a t o r of s t a n d a r d e r r o r of mean; s
= e s t i m a t o r of
sample v a r i a n c e ; n = sample s i z e ; f = sampling f r a c t i o n (n/N) For sampling from i n f i n i t e p o p u l a t i o n s w i t h o u t replacement, a s i n t h e c a s e of w a t e r q u a l i t y , t h e sampling f r a c t i o n term ( f ) i s v e r y s m a l l , t h u s one minus f may be assumed t o e q u a l one. While t h e sampling b i a s term is u s u a l l y n e g l e c t e d i n t h e water q u a l i t y s u r v e y e r r o r model because of l a c k o f i n f o r m a t i o n a b o u t t h e t r u e p o p u l a t i o n , t h e r e a r e some i m p o r t a n t p o i n t s t o c o n s i d e r under t h i s assumption. The a p p l i c a t i o n o f c e r t a i n s a m p l i n g t e c h n i q u e s may c r e a t e a biased e s t i m a t e f o r c e r t a i n s t a t i s t i c s . For example, a s e a s o n a l p r o c e s s t h a t i s sampled v i a simple random s a m p l i n g may b e b i a s e d i n t e r m s of t h e e s t i m a t e of t h e annual t o t a l . Two methods t o p r o v i d e some i n f o r m a t i o n on t h e p o t e n t i a l amount o f s a m p l i n g b i a s a r e p o s t e x a m i n a t i o n of d a t a and monte
Carlo simulation analysis. Post examination of the data consists of examining the observed data after a certain time period for the underlying temporal and spatial processes, then reevaluating the statistical sampling technique utilized for potential bias in light of the observed processes. Monte Carlo simulation analysis may be used to simulate certain underlying temporal or spatial processes, then apply a given statistical sampling technique and estimate bias for certain statistics and various sample sizes.
The amount of nonsampling variability is a function of the variability in sample collection and laboratory analysis procedures. For some water quality variables it may also be necessary to make the amount of nonsampling error a function of the level of material present, since laboratory variability may increase at very low (detection limit) and high concentrations. Estimating the nonsampling variability component of the total survey error model requires a method to combine the estimates of the variability from the different measurement activities. The general procedure is to develop a model of the measurement process, which is a function of the measurement techniques used for the specific water quality variable.
Then, some method is applied to propagate the error of individual model components through the measurement process model to estimate the total error in the estimated concentration. Some examples of error propagation methods are sensitivity analysis, first-order error analysis, probability theory of functions of random variables, and Monte Carlo simulation. Readers are referred to Berthouex (1975) and Evans et al. (1984) for descriptions of the four methods. The application of the error propagation methods requires some form of estimates of the variability in the individual components of the measurement model. These estimates may be obtained from experimentation or may be found in technical journals and laboratory reference manuals (e.g. SMEWW (1980), EPA (1979)).
The general method to estimate precision in measurement activities (e.g. laboratory analysis) is the coefficient of variation (often referred to as the relative standard deviation) (USGS, 1982):
$$ CV = \frac{s}{\bar{x}} \qquad (4) $$
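As a minimal sketch of Equation (4), the coefficient of variation can be computed from replicate analyses of a single sample. The function name and the replicate values below are hypothetical and chosen only for illustration; the sample standard deviation (n - 1 divisor) is assumed.

```python
from statistics import mean, stdev


def coefficient_of_variation(replicates):
    """Return s / x_bar for a list of replicate measurements."""
    x_bar = mean(replicates)
    if x_bar == 0:
        raise ValueError("CV is undefined when the mean is zero")
    # stdev() uses the n - 1 divisor (sample standard deviation).
    return stdev(replicates) / x_bar


# Hypothetical replicate analyses of one sample (mg/l); illustrative values only.
replicates = [10.0, 12.0, 11.0, 9.0, 13.0]
cv = coefficient_of_variation(replicates)
print(f"CV = {cv:.4f}")
```

Expressing precision as a ratio makes it comparable across concentration levels, which is why the text goes on to ask whether it is constant over those levels.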
If the precision is constant over all concentration levels, which may be determined by statistically testing for homogeneous variance, the precision of the specific measurement component may be estimated as (USGS, 1982):
To account for changes in precision as a function of the material level, a simple statistical model may be developed. If the precision varies linearly with the level of material, a linear regression line can be determined. For curvilinear variation, polynomial equations may be fitted. For either model, an estimate of model uncertainty (error) should be incorporated into the estimate of total data uncertainty.
Generally, recommended sample measurement procedures should be free of bias. However, in some instances bias-free methods may not be available; then an estimate of bias is needed. As with measurement variability, bias estimates for some measurement techniques are available in the literature. In addition, it may also be necessary to make the bias estimate a function of the level of material.
The above method for accounting for measurement error is related to intra-measurement error (i.e. error associated with one measurement process). In cases where data are collected and/or analyzed by different laboratory techniques, personnel, and agencies it is necessary to account for inter-measurement error in the estimate of total measurement error. Inter-measurement error may be estimated by the same methods as intra-measurement error.
In general, since inter-measurement error will be larger than intra-measurement error (because inter includes intra), inter-measurement error may be used to represent measurement error.
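Two of the error propagation methods named in this section, first-order error analysis and Monte Carlo simulation, can be compared on a toy measurement process model. The sketch below assumes a hypothetical model in which concentration is a measured mass divided by a measured volume, with independent, normally distributed errors; the model, the function names, and the numbers are illustrative assumptions and are not taken from the paper.

```python
import math
import random
from statistics import stdev


def first_order_sd(mass, sd_mass, volume, sd_volume):
    """First-order standard deviation of c = mass / volume,
    assuming independent errors in mass and volume."""
    d_mass = 1.0 / volume            # partial derivative dc/dmass
    d_volume = -mass / volume ** 2   # partial derivative dc/dvolume
    return math.sqrt((d_mass * sd_mass) ** 2 + (d_volume * sd_volume) ** 2)


def monte_carlo_sd(mass, sd_mass, volume, sd_volume, draws=20000, seed=1):
    """Monte Carlo standard deviation of c = mass / volume under normal errors."""
    rng = random.Random(seed)
    sims = [rng.gauss(mass, sd_mass) / rng.gauss(volume, sd_volume)
            for _ in range(draws)]
    return stdev(sims)


# Hypothetical inputs: 50 +/- 2 mass units in 2 +/- 0.04 volume units.
analytic = first_order_sd(50.0, 2.0, 2.0, 0.04)
simulated = monte_carlo_sd(50.0, 2.0, 2.0, 0.04)
print(f"first-order: {analytic:.3f}  Monte Carlo: {simulated:.3f}")
```

When the component errors are small relative to the component values, the two estimates should be close; larger differences would suggest that the first-order approximation is no longer adequate for the model in question.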
1.5 IMPLICATIONS OF DATA UNCERTAINTY
The value of estimating the total amount of water quality data uncertainty by incorporating both sampling and nonsampling errors
can be very important. Four examples of the usefulness of the information generated by estimating data uncertainty will be discussed. These are: 1) the effect of bias on hypothesis testing, 2) the effect of not including nonsampling error in the estimate of variance, 3) the determination of optimal sample size, and 4) the effect of data uncertainty on decision making.
To examine the effect of bias on hypothesis testing, assume that a sample is collected with a mean ($\bar{x}$); a normally distributed sample and population; a positive bias of $B = \mu - \bar{x}$, where $\mu$ is the true mean; and that the amount of bias in the sample is unknown. Hence the standard deviation calculated from observed values is about $\bar{x}$ and not $\mu$. The effect of the bias is to increase the region of rejection at the upper tail and decrease the region at the lower tail, where the amount of change is a function of only the ratio of bias to standard deviation (Cochran, 1977). For example, when the bias equals 0.4 times the estimated standard deviation, the probability of error of more than $1.96\sigma$ ($\sigma$ is the population standard deviation) is: 0.0685 compared to the nominal value of 0.05 for the total region; 0.0594 compared to the nominal value of 0.025 for the lower region; and 0.0091 compared to the nominal value of 0.025 for the upper region.
Thus, if one were testing for a trend (two-tailed) in a water quality variable that was measured with a negative bias equal to 0.4 times the standard deviation, the actual probability would be 0.0685, and not 0.05, for rejection of the null hypothesis.
The effect of not including an estimate of nonsampling error with sampling error is to create an estimate of variance smaller than the actual amount. This could result in rejecting a null hypothesis of equal means or variances when in reality there is no statistical difference. Hence, one increases the level of Type I error. A problem that arises when data uncertainty is accounted for is how to statistically analyze a sequence of data when each data point is not a specific value, but is actually represented by some distribution that attempts to estimate the amount of data uncertainty. The problem may be further complicated since the distribution parameters and/or the distribution function itself may change as the amount of the water variable changes.
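The tail probabilities quoted for a bias of 0.4 standard deviations follow directly from the normal distribution, and can be checked with a short calculation. This is a sketch under the stated normality assumption; the function names are illustrative, and the code reports the inflated and shrunken tails without assigning them to a direction.

```python
import math


def normal_cdf(z):
    """Standard normal cumulative distribution function."""
    return 0.5 * (1.0 + math.erf(z / math.sqrt(2.0)))


def tail_probabilities(bias_ratio, z_crit=1.96):
    """Probabilities of an error beyond z_crit standard deviations when the
    estimator is offset by bias_ratio = |B| / sigma.

    Returns (shrunken_tail, inflated_tail, total)."""
    shift = abs(bias_ratio)
    shrunken = normal_cdf(-z_crit - shift)       # tail away from the bias
    inflated = 1.0 - normal_cdf(z_crit - shift)  # tail on the side of the bias
    return shrunken, inflated, shrunken + inflated


shrunken, inflated, total = tail_probabilities(0.4)
print(f"{shrunken:.4f} {inflated:.4f} {total:.4f}")
```

With a bias ratio of zero the function returns the nominal 0.025 in each tail, so the same routine can be used to tabulate how quickly the nominal significance level degrades as the bias ratio grows.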
The determination of optimal sample size is a function of the specified network design and the level of precision required. One method to determine optimal sample size is to convert the total data error and sample size into commensurable costs in an equation of total expected cost (TEC). The TEC is equal to the cost of taking n observations plus the expected cost of error involved in using an estimate of a population parameter instead of the population parameter itself. The cost of taking n observations includes a fixed cost and the cost of taking a specific sample, and is dependent upon the statistical sampling technique employed. The expected cost of error may be represented by a linear or quadratic function and is related to both sampling and nonsampling errors. The optimal sample size is determined by differentiating the TEC with respect to n, setting the derivative to zero, and solving for the optimal sample size. If the desired objective were to develop the actual optimal sample size, estimates of both sampling and nonsampling errors are necessary; otherwise a smaller sample size than actually needed would be selected.
For a decision making problem, as formulated in the systems analysis framework, the TEC may be expanded to allow the selection of the best course of action.
The expanded TEC includes cost components for cost of the analysis of a given course of action, expected cost of a Type I error, and expected cost for a Type II error. The TEC is the sum of the four cost components and the best action is the one with minimum cost. Thus, if nonsampling error is not accounted for, a mistake may be made in the selection of the best course of action.
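The optimal sample size calculation described in this section can be sketched for one simple, assumed form of the TEC. Here the cost of error is taken to be quadratic (proportional to the variance of the sample mean), and the nonsampling variance is assumed to act independently on each observation so that it adds to the per-observation sampling variance. Under those assumptions TEC(n) = c_fixed + c_sample n + k (var_sampling + var_nonsampling) / n, and setting the derivative to zero gives n* = sqrt(k (var_sampling + var_nonsampling) / c_sample). All names and numbers below are hypothetical.

```python
import math


def total_expected_cost(n, c_fixed, c_sample, k, var_sampling, var_nonsampling):
    """Assumed TEC: sampling cost plus quadratic cost of error in the mean."""
    return c_fixed + c_sample * n + k * (var_sampling + var_nonsampling) / n


def optimal_sample_size(c_sample, k, var_sampling, var_nonsampling=0.0):
    """Solve dTEC/dn = c_sample - k * var / n**2 = 0 for n."""
    return math.sqrt(k * (var_sampling + var_nonsampling) / c_sample)


# Hypothetical costs and variances, for illustration only.
n_full = optimal_sample_size(c_sample=20.0, k=1000.0,
                             var_sampling=4.0, var_nonsampling=1.0)
n_sampling_only = optimal_sample_size(c_sample=20.0, k=1000.0, var_sampling=4.0)
print(f"with nonsampling error: {n_full:.1f}; without: {n_sampling_only:.1f}")
```

In this sketch, dropping the nonsampling variance lowers the computed optimum, which is the direction of error the text warns about. A different cost-of-error function or error structure would change the closed form, so the expression for n* should be rederived for the model actually adopted.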
1.6 CONCLUSIONS
Information is needed to assist in making water quality management decisions. The worth of this information is inversely related to the amount of uncertainty in the information. An important factor that affects information uncertainty is data uncertainty. Data uncertainty is a function of sampling and nonsampling errors. Sampling errors result from collecting only a subset of a total population. Nonsampling errors result from the measurement process of estimating the level of some water quality variable.
One method to estimate the amount of data uncertainty is to develop a model of the total survey error. The model incorporates variability and bias terms for both sampling and nonsampling components. Traditionally, the nonsampling error term of water quality variables has been neglected or assumed to be zero, resulting in an estimate of variance smaller than the true amount.
This may result in incorrect estimates of Type I and Type II errors in hypothesis testing and poor estimates of the optimal sample size to collect.
Besides sampling and nonsampling errors, other factors may increase the amount of information uncertainty. Examples include: data recording errors, biased statistical estimators, data processing errors, inaccurate computer programs, misinterpretation of data analysis results, and error in estimates of data uncertainty.
The incorporation of uncertainty analysis in the decision making process is needed to help decrease the state of doubt as to the best course of action to take to solve a problem. More research is needed to develop the: methodology of water quality data uncertainty analysis, quantitative techniques to estimate data uncertainty, data needed to make estimates of water quality data uncertainty, and statistical and mathematical techniques to analyze water quality data that have an associated estimate of data uncertainty.
1.7 ACKNOWLEDGMENTS
This research was partially funded by Office of Water Research and Technology grant No. 14-08-001-G-1060. The assistance of Dr. James Loftis in helping to formulate the concepts on water quality data uncertainty is appreciated.
REFERENCES
Ackoff, R.L., 1962. Scientific Method: Optimizing Applied Research Decisions. John Wiley and Sons, New York, New York.
Berthouex, P.M., 1975. Modeling concepts considering performance, variability and uncertainty. In: Mathematical Modeling for Water Pollution Control, ed. T.M. Keinath and M.P. Wanielista, p405-440. Ann Arbor Science Publishers, Ann Arbor, Michigan.
Cochran, W.G., 1977. Sampling Techniques (3rd Edition). John Wiley and Sons, New York, New York.
EPA, 1979. Handbook for Analytical Quality Control in Water and Wastewater Laboratories (EPA-600/4-79-019). U.S. Environmental Protection Agency, Washington, D.C.
Evans, J.S., Cooper, D.W., and Kinney, P., 1984. On the propagation of error in air pollution measurements. Environ. Monitoring and Assessment 4:139-153.
Horvitz, D.G., 1978. Some design issues in sample surveys. In: Survey Sampling and Measurement, ed. N.K. Namboodiri, p3-11. Academic Press, New York, New York.
Kish, L., 1965. Survey Sampling. John Wiley and Sons, New York, New York.
Montgomery, R.H., Lee, V.D., and Reckhow, K.H., 1983. Predicting variability in a Lake Ontario phosphorus model. J. Great Lakes Res. 9(1):74-82.
Sanders, T.G., Ward, R.C., Loftis, J.C., Steele, T.D., Adrian, D.D., and Yevjevich, V., 1983. Design of Networks for Monitoring Water Quality. Water Resources Publications, Littleton, Colorado.
SMEWW, 1980. Standard Methods for Examination of Water and Wastewater (15th Edition). American Public Health Association, Washington, D.C.
USGS, 1982. Quality assurance practices for the chemical and biological analyses of water and fluvial sediments. Book 5, Chap. A6 of Techniques of Water-Resources Investigation of the United States Geological Survey. U.S. Geological Survey, Washington, D.C.
THE USE OF MULTIVARIATE METHODS IN THE INTERPRETATION OF WATER QUALITY MONITORING DATA OF A LARGE NORTHERN RESERVOIR
R. SCHETAGNE
Biologist, André Marsan et Associés Inc., formerly from Société d'énergie de la Baie James
ABSTRACT
The Société d'énergie de la Baie James Engineering and Environment Department has established an ecological monitoring network on the La Grande Complex, Quebec, Canada. Water quality studies of the La Grande 2 reservoir were initiated in 1977, two years before its impoundment, and continued for five years after its filling. Principal component analyses were successfully used to single out the parameters showing the greatest changes (pH, dissolved oxygen, chlorophyll a and a number of nutrients), and to present the data in a clear and synthetic manner. Hierarchical clustering analysis provided good results when used on bottom data where redox decline triggered sharp chemical changes. These methods showed that the process of decomposition of submerged organic matter and the simple mixing of waters of different quality account for most of the changes measured.
INTRODUCTION
In its first phase, the La Grande Complex calls for the construction of three powerhouses on the La Grande Rivière at sites known as La Grande 2, La Grande 3 and La Grande 4, as well as the partial diversion of the Eastmain and Opinaca rivers, from the South, and the Caniapiscau river, from the East. Conscious that its project will have repercussions on the ecology of the territory, the Société d'énergie de la Baie James Engineering and Environment Department has established an ecological monitoring network with the following objectives: evaluate physical, chemical and biological changes, rationalize corrective measures and improve methods of predicting impacts in future projects. The use of principal component analysis and hierarchical clustering analysis on the water quality data greatly helped in achieving these objectives.
This paper is intended to provide an example of the use of multivariate methods in an ecological investigation rather than a discussion of their mathematical basis.
MATERIAL AND METHODS
- Sampling Program
The impounding of the La Grande 2 reservoir began towards the end of 1978 and lasted for one full year. It has a total area of 2,836 km² which includes 2,629 km² of flooded terrestrial soils. Its mean flowthrough time is 6.3 months, its mean depth, 22 metres, its total volume, 62.4 km³ and its mean drawdown, 3 metres. A total of six sampling stations permitted the monitoring of water quality in this reservoir. Their locations were selected in terms of inflow and morphometric conditions (Figure 1). A control station was set up in an undisturbed environment, to allow variations due to the impounding to be distinguished from those due to natural factors.
Water quality sampling in the area of the La Grande 2 reservoir was initiated in 1977, almost two years before its impoundment, and continued for five years after its filling. Water was sampled every two weeks during the ice-free period and four times, while under ice cover, between early December and early June. From 1982 to 1984, winter sampling was reduced to one campaign which took place at the end of the season, when maximum variations were encountered.
In addition to measuring the temperature, dissolved oxygen and conductivity from the surface to the bottom in situ, two types of samples were taken at each station. The first was a water column from the surface to a depth of 10 metres, usually the photic zone. In addition to chlorophyll pigments, the parameters analyzed in the laboratory from these samples were pH, conductivity, major anions (chlorides, bicarbonates and sulfates), color, transparency, nutrients (total Kjeldahl nitrogen, total phosphorus, total inorganic carbon, total organic carbon and silica) and finally tannins and lignins.
The second type of sample was taken one metre from the bottom. This sampling coincided with the summer and winter low water levels and the spring turnover. The parameters were limited to the nutrients, pH, conductivity, bicarbonates, temperature and dissolved oxygen. The analytic methods used comply with the methods described by APHA (1971).

Figure 1 Location of water quality stations

Table 1 Summer mean values for the parameters showing the most change (La Grande 2 Reservoir). Columns: natural conditions (1978); flooding (1979); post-impoundment maximum (1981 or 1982); post-impoundment last year (1984). Parameters: dissolved oxygen (% saturation), pH, total inorganic carbon (mg/l of C), total phosphorus (µg/l of P), chlorophyll a (µg/l), silica (mg/l of SiO2).
- Mathematical Processing of Reservoir Data
The sampling program has yielded a great amount of data. Principal component analyses were performed on the raw data matrices in which a water sample (rows) is described by 16 physical chemistry parameters (columns). This analysis allows a summary of the major portion of the total variance of a set of data by a few major dimensions (Legendre and Legendre, 1979). The result is a two-dimensional graphic instead of the 16 original dimensions (one per parameter). In our graphics the relations between the parameters are superimposed on the reduced space ordination of the sample points, as this facilitates the interpretation (Jolicoeur and Mosimann, 1960). The dotted line in Figures 2 and 6 is the circle of signification defined by Scherrer (1985). Only parameter vectors longer than the radius of this circle contribute significantly (at the 95% probability level) to the formation of the plane determined by the two principal axes.
The clustering analysis used is the hierarchical flexible linkage clustering analysis (Lance and Williams, 1966 and 1967) with the parameters set at: $\alpha_j = 0.625$, $\alpha_m = 0.625$, $\beta = -0.25$ and $\gamma = 0$. This analysis was performed on a Gower similarity coefficient matrix (Legendre and Legendre, 1979).
RESULTS
Figures 3 and 4 show the distribution of the temperature and dissolved oxygen saturation percentage isolines at a typical La Grande 2 reservoir station through one complete year. The reservoir presented, overall, two periods of thermal stratification (intense in summer and inverse and weak in winter) interspersed with two periods of water mixing. The deficiency in dissolved oxygen resulting from the degradation of submerged organic matter (Schetagne, 1980) has proved to be greatest in winter, when the presence of ice prevents contact with the atmosphere.
The spring and fall turnover periods allow a reoxygenation of the deep zones and a redistribution of the products of degradation throughout the column. Major biophysical phenomena were observed in two zones
of the reservoir: the summer photic zone and the bottom zone which is alternately anoxic and well oxygenated.
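The principal component analysis described under Mathematical Processing can be sketched in a few lines. The version below is an illustration only: it standardizes each parameter before extracting the axes (an assumption, since the paper states only that raw data matrices were used), finds the first two axes by power iteration with deflation, and uses a small hypothetical table of three parameters in place of the 16 measured ones.

```python
import math


def standardize(rows):
    """Centre each column and scale it to unit variance."""
    n, p = len(rows), len(rows[0])
    cols = list(zip(*rows))
    means = [sum(c) / n for c in cols]
    sds = [math.sqrt(sum((v - m) ** 2 for v in c) / (n - 1))
           for c, m in zip(cols, means)]
    return [[(r[j] - means[j]) / sds[j] for j in range(p)] for r in rows]


def covariance(z):
    """Covariance matrix of already centred data (rows = samples)."""
    n, p = len(z), len(z[0])
    return [[sum(r[i] * r[j] for r in z) / (n - 1) for j in range(p)]
            for i in range(p)]


def leading_eigenpair(m, iterations=500):
    """Power iteration for the dominant eigenpair of a symmetric matrix."""
    p = len(m)
    v = [1.0 + 0.1 * i for i in range(p)]  # arbitrary starting vector
    for _ in range(iterations):
        w = [sum(m[i][j] * v[j] for j in range(p)) for i in range(p)]
        norm = math.sqrt(sum(x * x for x in w))
        v = [x / norm for x in w]
    value = sum(v[i] * sum(m[i][j] * v[j] for j in range(p)) for i in range(p))
    return value, v


def first_two_components(rows):
    """Return (explained fractions, axis vectors, sample scores)."""
    z = standardize(rows)
    c = covariance(z)
    p = len(c)
    l1, v1 = leading_eigenpair(c)
    # Deflation: remove the first axis, then repeat the power iteration.
    deflated = [[c[i][j] - l1 * v1[i] * v1[j] for j in range(p)]
                for i in range(p)]
    l2, v2 = leading_eigenpair(deflated)
    scores = [(sum(a * b for a, b in zip(r, v1)),
               sum(a * b for a, b in zip(r, v2))) for r in z]
    # With standardized data the total variance equals p.
    return (l1 / p, l2 / p), (v1, v2), scores


# Hypothetical samples: pH, total phosphorus, dissolved oxygen (% saturation).
rows = [[6.4, 12.0, 82.0], [6.3, 11.0, 85.0], [6.1, 15.0, 70.0],
        [6.0, 16.0, 65.0], [6.2, 14.0, 75.0], [6.5, 10.0, 89.0]]
explained, axes, scores = first_two_components(rows)
print("fraction of variance on axes I and II:", explained)
```

The `scores` are the sample coordinates in the reduced space, and the components of each axis vector indicate how strongly each parameter contributes to that axis, which is the information the parameter vectors in the figures convey.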
- Photic Zone During the Ice Free Period
Figure 2 illustrates the parameter vectors in the reduced space obtained by principal component analysis performed on a total of 384 integrated samples (0-10 metres depth) taken during ice-free periods in 1978 (natural conditions), 1979 (impounding) and from 1980 to 1984 (operation). This figure shows the correlation between the various parameters and their relative contribution to each of the principal axes. It shows strong correlation between conductivity, bicarbonates and pH. These correlations are well known, since these parameters correspond to the main ions. This figure also shows good correlations between the following parameters: tannins, total organic carbon, color and total Kjeldahl nitrogen. These parameters all measure the organic matter present which, before or after the impounding, was essentially of allochthonous origin (Schetagne, 1985). The good correlation between this group and the first may be linked to the influence of pH on the degradation of organic matter of plant origin (Campbell et al., 1976 and Sylvester and Seabloom, 1965).
A strong, but inverse, correlation can be seen between the dissolved oxygen and the total phosphorus and total inorganic carbon parameters. The degradation of submerged organic matter, whether chemical or biological, induces consumption of the dissolved oxygen, the release of CO2 (which, at the pH values measured, accounts for a good part of the total inorganic carbon) and the release of nutrients like phosphorus (Wetzel, 1975). A strong correlation is also noted between total phosphorus and chlorophyll a, which, according to Berman and Eppley (1974), can be considered a good measure of the phytoplanktonic biomass. The importance of the phosphorus content in relation to the phytoplanktonic production is also well documented in the literature. Chlorophyll a also seems to be inversely correlated to the silica.
This might be explained by the role played by the diatoms, a phytoplanktonic group which is important in this region, in the
Figure 2 Reduced space ordination of the centroids for each year of all the samples as defined by the first two axes. (Legend: one symbol per year, 1978 to 1984; circled symbols mark the Toto station.)
annual silica cycle (Magnin, 1977 and Wetzel, 1975).
The first two axes account for 27 and 18%, respectively, of the total variance. Color, conductivity, tannins, bicarbonates, total organic carbon and total Kjeldahl nitrogen contribute most to the first axis. As we will see, this axis can be described as the dilution axis because it demonstrates firstly, the heterogeneity of the source stations in terms of degree of mineralization and organic content, and secondly the subsequent homogenization and dilution as a result of impounding by less rich waters. Silica, pheopigments, chlorophyll a, total inorganic carbon, total phosphorus and dissolved oxygen contribute most to the formation of the second axis. This axis is a better characterization of the events associated with decomposition. Indeed, decomposition of submerged organic matter resulted in a rise in nutrients (such as total phosphorus), which in turn resulted in an increase in phytoplankton biomass (chlorophyll pigments), which caused a drop in silica content.
Figure 2 also shows the centroids of the samples collected each summer period at each station. The coordinates of a centroid are the mean coordinates of the samples it represents. The interpretation is done with the reduced space ordination of all the samples but only the centroids are shown here. This figure shows that the samples taken in 1978 (symbol % ), under natural conditions, are essentially distributed along the first axis from left to right. This demonstrates the heterogeneity of the stations before impoundment. The migration of the centroids in time is from the bottom to the top of this figure. Generally, the water was relatively poor in chlorophyll pigments, total inorganic carbon and total phosphorus, but relatively rich in dissolved oxygen and silica.
Movement toward the upper part of the figure indicates a decrease in the silica and dissolved oxygen content and, inversely, a rise in the concentration of phosphorus, total inorganic carbon and chlorophyll (symbols X to 4 ). A general migration of the points toward the center of the figure in time can be seen along the first axis. This
represents a homogenization as a result of impounding. This trend can be explained mainly by the simple mixing of waters with different degrees of mineralization and organic content. Thus the centroids of the samples taken from 1981 to 1984 (symbols A and 0 ) can all be found in the upper center part of the figure, since all the stations have shown the same decomposition mechanisms and homogenization.
Analysis of the particular changes at a specific station will make it easier to follow the events caused by impounding. The circled symbols have been taken at the "Toto station". Because of its geological location, the waters of this station had a relatively high degree of mineralization and a substantial amount of organic matter. These characteristics explain that, under natural conditions (symbol % ), this station is positioned at the right of the first axis, while relatively high values of silica and dissolved oxygen and relatively low total phosphorus, total inorganic carbon and chlorophyll a account for its position near the bottom. This station was submerged in 1979 ( X ) by the waters of the La Grande Rivière which are less mineralized and less rich in organic matter. This fact explains the first movement to the left of the reduced space. The decomposition of the submerged plant matter, resulting in a drop of the dissolved oxygen, a decrease of the pH, and increases of the total inorganic carbon and phosphorus, explains the subsequent migration of the points toward the upper part of the figure (symbol A ). In 1981, a sharp increase in chlorophyll a and pheopigments and a drop in silica, associated with a continued increase in decomposition products, explain the continued rise of the points in the figure (symbol a).
Because this station is located in a small bay away from the main body of water of the reservoir (Figure 1), it has shown a degree of mineralization and organic matter content intermediate between its initial characteristics and those of the reservoir. This explains why its centroids (circled) have not quite joined those of the other stations. But continuous drawdown and subsequent partial refilling have weakened the differences and explain the subsequent
migration from right ( H in 1981) to left ( 0 in 1984), closer to the others.
The reduced space ordination of all the samples of all the stations superimposed on the parameter vectors on a large color graphic has permitted the interpretation of the events occurring from 1978 to 1984. This analysis has shown the bunching up of the 1981 through 1984 samples of all the stations. Subsequent analyses done on these samples only have shown spatial differentiation of the reservoir waters, and further analyses have given finer details. The results given by reduced space ordination (by principal component analysis) were interpretable when at least 40% of the total variance was explained by the first two axes.
Principal component analysis singled out the parameters showing the greatest changes. It has shown that in the photic zone, the dilution (first axis) explains the greatest proportion of the total variance observed after impoundment. Table 1 gives the variations shown by the important parameters. It shows that, in the photic zone, most of the absolute variations were weak.
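The centroids used in the photic zone interpretation are simply the mean coordinates, in the reduced space, of the samples sharing a station and a year. A minimal sketch of that step follows; the function name, the labels and the coordinates are hypothetical.

```python
from collections import defaultdict


def centroids(scores, labels):
    """Mean (axis I, axis II) coordinates of the samples sharing a label."""
    sums = defaultdict(lambda: [0.0, 0.0, 0])
    for (x, y), label in zip(scores, labels):
        acc = sums[label]
        acc[0] += x
        acc[1] += y
        acc[2] += 1
    return {label: (sx / k, sy / k) for label, (sx, sy, k) in sums.items()}


# Hypothetical sample coordinates labelled by (station, year).
scores = [(1.0, 2.0), (3.0, 4.0), (0.0, 0.0)]
labels = [("Toto", 1978), ("Toto", 1978), ("Toto", 1979)]
result = centroids(scores, labels)
print(result)
```

Plotting one centroid per station and year, rather than every sample, keeps the year-to-year migration readable when several hundred samples are involved.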
- Deep Zone

Figure 6 is a representation of the parameter vectors in the reduced space obtained by the principal component analysis performed on the 111 samples taken one meter from the bottom at four La Grande 2 reservoir stations. These samples were taken between 1979 and 1984 at the time of winter anoxic conditions, summer stratification and spring turnover. The distribution of the parameter vectors shows a strong negative correlation of dissolved oxygen with the following parameters: total inorganic carbon, total phosphorus, total Kjeldahl nitrogen, conductivity and bicarbonates. All these correlations are associated with the decomposition mechanisms of submerged organic matter (Burdick and Parker, 1971). The oxidation of this matter, by chemical or biological processes, results in the consumption of dissolved oxygen and the release of CO2, ions (increased conductivity and bicarbonates) and nutrients (increase of total phosphorus and total Kjeldahl nitrogen).
Figure 3. Annual evolution of isotherms at Bereziuk station (La Grande 2 Reservoir).

Figure 4. Annual evolution of isopleths of percentage oxygen saturation at Bereziuk station (La Grande 2 Reservoir).

Figure 5. Dendrogram representing the clusters obtained by flexible linkage clustering analysis (Lance and Williams).
The two main axes explain 48 and 18%, respectively, of the total variance. The parameters which contribute the most to the first axis are: total inorganic carbon, saturation percentage of dissolved oxygen, total phosphorus, total Kjeldahl nitrogen, conductivity and bicarbonates. The following parameters contributed most to the second axis: pH, total organic carbon, temperature and sulfates. The greatest variations observed at depth were shown by the parameters associated with the process of decomposition of submerged plant matter (Schetagne, 1985). Thus, unlike the first principal component analysis, the decomposition axis (I) explains the largest percentage of the total variance.

The hierarchical flexible linkage clustering analysis represented by the dendrogram in Figure 5 was created from a Gower similarity matrix calculated with the data from the bottom samples. Two very different groups can be noted. They have a similarity index of only -0.72. The two polygons in Figure 6 define the position of the sample-points of each group in the reduced space of the first two principal components. The two groups can be distinguished by the value of dissolved oxygen in their samples. Roughly speaking, the samples of the group on the right have dissolved oxygen saturation rates below 20% and those on the left have a saturation rate above 20%. This boundary corresponds to the zone of rapid change of the redox potential. The solubility of a number of compounds substantially increases in a reductive environment (Wetzel, 1975). Thus, the dissolved oxygen level is a preponderant factor in the changes in the water quality of a reservoir because it influences the rate of exchange between sediments and water. According to Campbell et al. (1976), the impact of the sediments on the quality of the overlying water can be up to fifty times greater in anaerobic conditions than in aerobic conditions.
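As an illustration of the similarity matrix mentioned above, the following minimal sketch computes Gower's coefficient for samples described only by quantitative parameters (the mean, over parameters, of one minus the absolute difference divided by the parameter range). The function name and the sample values are hypothetical, and the sketch makes no claim about the software or options actually used in the study.

```python
def gower_similarity(samples):
    """Gower similarity matrix for samples described by quantitative variables.

    `samples` is a list of rows, one row per sample, one value per parameter.
    Each parameter contributes 1 - |difference| / range, and the
    contributions are averaged over the parameters.
    """
    n = len(samples)
    p = len(samples[0])
    ranges = []
    for k in range(p):
        column = [row[k] for row in samples]
        ranges.append(max(column) - min(column))
    sim = [[1.0] * n for _ in range(n)]
    for i in range(n):
        for j in range(i + 1, n):
            total = 0.0
            for k in range(p):
                if ranges[k] > 0:
                    total += 1.0 - abs(samples[i][k] - samples[j][k]) / ranges[k]
                else:
                    total += 1.0  # a constant parameter cannot separate samples
            sim[i][j] = sim[j][i] = total / p
    return sim


# Hypothetical bottom samples: [oxygen saturation (%), conductivity, total P]
bottom_samples = [[5, 80, 40], [10, 70, 35], [85, 20, 8], [90, 15, 6]]
similarity = gower_similarity(bottom_samples)
```

With these made-up values the two poorly oxygenated samples are very similar to each other and very dissimilar to the two well oxygenated ones, which is the kind of discontinuity a clustering method can then pick up.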
In this case, the group on the right side of the reduced space can be distinguished from the one on the left by the very low or totally absent levels of dissolved oxygen and thus by a very low redox potential and by much higher values of total phosphorus, total Kjeldahl nitrogen,
Figure 6. Reduced space ordination of the first two clusters of bottom samples.

Table 2. Values measured in samples taken one meter from the bottom (La Grande 2 Reservoir). Columns: end of winter; after spring turnover. Parameters listed include percentage oxygen saturation (%), conductivity (µS/cm), bicarbonates (mg/l of HCO3-) and pH.
conductivity, bicarbonates and total inorganic carbon. The differences between these groups are not a function of station location because, as a result of the turnover periods, the deep zones of all these stations show the same alternation in their dissolved oxygen saturation rates (Figure 4).

Table 2 shows that the variations observed in the bottom samples were quite great. The much smaller variations observed in the photic zone indicate that the effect of the bottom zone on the overall reservoir is limited.

A number of clustering analyses were also performed on photic zone samples but, in contrast with the analysis of the bottom zone, they were not very instructive. The goal of clustering analysis is to locate discontinuities which in ecology are not usually well defined. In our studies, they were only helpful in the analysis of data from the bottom zone where the declining redox potential triggered sharp chemical changes.

CONCLUSION

Multivariate methods (principal component and clustering analysis) were successfully used to single out the parameters showing the greatest changes and to present very large amounts of data in a clear and synthetic manner.
REFERENCES

American Public Health Association et al., 1981. Standard Methods for the Examination of Water and Wastewater. APHA, Washington, D.C. 1134 p.
Berman, T. and Eppley, R.W., 1974. The measurement of phytoplankton parameters in nature. Sci. Prog. 61: 219-239.
Burdick, J.C. and Parker, F.L., 1971. Estimation of water quality in a new reservoir. Department of Environmental and Water Resources Engineering, School of Engineering, Vanderbilt and U.S. Army Corps of Engineers, Report no. 8500 p.
Campbell, P.G. et al., 1976. Effets du décapage de la cuvette d'un réservoir sur la qualité de l'eau emmagasinée: élaboration d'une méthode d'étude et application au réservoir de Victoriaville (Rivière Bulstrode, Québec). INRS-Eau, Québec. 301 p. (Rapport scientifique no. 37).
Jolicoeur, P. and Mosimann, J.-E., 1960. Size and shape variation in the painted turtle. A principal component analysis. Growth 24: 339-354.
Lance, G.N. and Williams, W.T., 1966. A generalized sorting strategy for computer classification. Nature (Lond.) 212-218.
Lance, G.N. and Williams, W.T., 1967. A general theory of classificatory sorting strategies. I. Hierarchical system. Computer J. 9: 373-380.
Legendre, L. and Legendre, P., 1979. Ecologie numérique. Tome 2: la structure des données écologiques. Presses de l'Université du Québec, Masson, Paris, 247 p.
Magnin, E., 1977. Ecologie des eaux douces du territoire de la Baie James. Société d'énergie de la Baie James, Montréal, 454 p.
Scherrer, B., 1985. Construction et utilisation des cercles de contribution équilibrée et de signification pour interpréter les résultats d'analyses en composantes principales. Rapport du groupe de recherche et d'études en biostatistiques et en environnement pour la Société d'énergie de la Baie James. 12 p.
Schetagne, R., 1985. Le réseau de surveillance écologique du Complexe La Grande 1977-1984. Physico-chimie et pigments chlorophylliens. Société d'énergie de la Baie James, Direction ingénierie et environnement, en préparation.
Sylvester, R.O. and Seablom, R.W., 1965. Influence of site characteristics on quality of impounded water. J. AWWA, 57: 1528-1546.
Wetzel, R.G., 1975. Limnology. W.B. Saunders Company, Toronto, 743 p.
MODELING RIVER ACIDITY
- A TRANSFER FUNCTION APPROACH
EIVIND DAMSLETH Norwegian Computing Center, Oslo, Norway
1. INTRODUCTION

1.1 General

For quite some time, Norwegian authorities have been concerned with pollution, with special emphasis on the pollution caused by acid precipitation. An extensive research program, "Acid precipitation - effects on forest and fish", was run through the period 1972-1980 (Overein, Seip and Tollan, 1980). Most of the research activities in this field are now organized at the Norwegian Water Research Institute and the Norwegian Institute for Air Research. The research is, to a large extent, financed through the Norwegian State Pollution Authority. For a long time much effort was spent on collecting data from various sources, in addition to laboratory analysis. In recent years there has been an increasing understanding of the importance of analysing the data through more sophisticated methods, which leads to statistical data analysis.

The Norwegian Computing Center (NCC) is a research institute founded by the Norwegian Council for Scientific Research. NCC is involved in research within data technology, data communication and cartography. Mathematical statistics and data analysis are also important fields of research. During recent years we have worked in close cooperation with the water and air research institutes, to assist them in their analysis and to bring their use and understanding of various statistical methods up to date. This is one of the major tasks for our Section for Statistical Analysis of Natural Resources Data.

1.2 This study

Time series analysis plays an important role in the analysis of pollution data when it comes to questions about long term trends, e.g. whether air or water pollution at specific locations has changed from one year to another, or to judge the effects of specific actions. In 1983 and 1984 NCC was engaged in a project called "Stochastic time series models applied to pollution data". The study covered air as well as water pollution. A complete description of the project and the results is given in Damsleth (1984a). The results regarding air pollution are also presented elsewhere (Damsleth, 1984b), and the present paper gives a summary of the results for river acidity.

2. DATA

We have analysed data from three rivers in the southern part of Norway: Nid River, Tovdal River and Mandal River. The locations of these rivers are shown in Figure 1. Only the analysis for Nid River is presented here; the findings from the two other rivers are given only briefly at the end of the paper.
Fig. 1. The geographical location of the three rivers under study.

The river acidity is given as pH-measurements taken weekly from March 4, 1970 until December 30, 1980, 566 weeks in all. The data are shown in Figure 2.

2.1 Missing values and outliers

There were 18 weeks where the data were missing for various reasons: three in 1970, six in 1971, seven in 1972 and one each in 1976 and 1977. In the analysis, these missing values were treated as follows:
- The missing values were first estimated using simple linear interpolation between the observations preceding and following the gap.
Fig. 2. pH (dotted, right scale) and discharge (solid line, left scale) for Nid river 1972 - 1982.
- A univariate time series model was identified and estimated for this adjusted series.
- Optimal estimates (Damsleth, 1980) using this model were inserted for the missing values.
- The series thus obtained was used to build a model for the relation between the acidity and the discharge.
- When a model for this relation was found, this was used to calculate new optimal values.
Finally the model parameters were estimated once more using the final estimates for the missing values.

The various steps in the above process gave only small changes in the estimates for the missing values, and the model and parameter estimates were almost unaffected by these changes.

During the model building process, it became obvious that the data contained some "wild" values (outliers). There can be several reasons for such values: there may be an error in the data registration, an error may have occurred during the laboratory analysis, or the value may in fact be correct but strongly affected by some occasional outlet from a non-typical acid or alkaline source. There are eight obvious outliers in the data, two in 1970 and one in 1971, 1975, 1976, 1977 and 1978. Such extreme values may have a very strong influence on the model building. To avoid this, we chose to ignore the outliers and to treat them as missing values according to the procedure above.

3. UNIVARIATE ANALYSIS

3.1 Time series model

Simple univariate analysis gave the model
$$\left(1 - \underset{(0.04)}{0.82}\,B - \underset{(0.02)}{0.14}\,B^{52}\right)\left(Y_t - \underset{(0.04)}{5.16}\right) = a_t, \qquad \sigma_a = 0.1033 \tag{1}$$
where we have used the standard notation (Box & Jenkins, 1970) for time series models, so that $Y_t$ represents the observed pH at time t, $a_t$ is a white noise sequence with standard deviation $\sigma_a$, and B is the backward shift operator, so that $B^k Y_t = Y_{t-k}$. Note the highly significant, though small, lag 52 term, which implies a certain seasonal pattern.

3.2 Residual analysis

There is reason to believe that the acidity may differ for the various seasons of the year, and there may also have been a change through time. In Table 1 we give the mean and standard deviation for the residuals from model (1), according to season and according to year. In this context we define the seasons as follows: Winter is December through February, Spring is March through May, Summer is June through August and Autumn is September through November.

There are several interesting features in Table 1. There are no significant changes in the acidity over time; none of the yearly means differs significantly from 0.
TABLE 1
Mean and standard deviation for the residuals from model (1), according to year and season. At the 5% level, the means marked * are significantly different from 0 and the standard deviations marked * are significantly different from 0.1033, the estimated overall standard deviation.

Year      1970   1971   1972   1973   1974   1975   1976   1977   1978   1979   1980
Mean     -.000   .016  -.006   .012  -.017   .016   .001  -.020  -.015   .006   .004
St.dev.   .105   .077*  .093   .107   .120   .103   .099   .099   .095   .129*  .101
# obs.      39     52     53     52     52     52     53     52     52     52     52
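The seasonal breakdown described for Table 1 can be sketched as follows. This is only an illustration under stated assumptions: the function name and the residual values are hypothetical, and the t-type ratio shown is one simple way to judge whether a seasonal mean differs from 0; the paper does not say which test was used.

```python
import math
import statistics

# Season definition taken from the text: Winter is December through February,
# Spring is March through May, Summer is June through August, Autumn is
# September through November.
SEASON_BY_MONTH = {12: "Winter", 1: "Winter", 2: "Winter",
                   3: "Spring", 4: "Spring", 5: "Spring",
                   6: "Summer", 7: "Summer", 8: "Summer",
                   9: "Autumn", 10: "Autumn", 11: "Autumn"}


def summarize_by_season(months, residuals):
    """Mean, standard deviation and mean/standard-error ratio per season.

    `months` holds the calendar month (1-12) of each observation and
    `residuals` the corresponding model residual.
    """
    groups = {}
    for month, value in zip(months, residuals):
        groups.setdefault(SEASON_BY_MONTH[month], []).append(value)
    summary = {}
    for season, values in groups.items():
        n = len(values)
        mean = statistics.mean(values)
        sd = statistics.stdev(values)
        summary[season] = {"n": n, "mean": mean, "sd": sd,
                           "t": mean / (sd / math.sqrt(n))}
    return summary


# Hypothetical residuals, for illustration only
example = summarize_by_season([1, 1, 2, 7, 7, 8],
                              [-0.01, 0.01, 0.0, 0.05, 0.07, 0.06])
```

A large ratio for one season, as for the made-up summer values here, is the pattern the text reports for the summer residuals of model (1).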
When it comes to the seasons, the summer shows a clearly significant positive mean. We can thus conclude that the model changes according to the seasons, and that the summer months are not satisfactorily described. The table also shows that there is a large variation in the residual standard deviation between years and between seasons. In particular, the standard deviation is definitely smaller during the winter period.

Figure 3 shows a histogram of the residuals from model (1). It appears quite nice and symmetric, without any peculiarities. We have not performed any formal test of normality, but the visual impression gives no reason to doubt this assumption. The normality assumption is by no means critical in the previous analysis, except perhaps for the judgements of significance.

Fig. 3. Histogram of the residuals from model (1).

4. RELATION BETWEEN ACIDITY AND DISCHARGE

There is reason to believe that the acidity in the river is affected by the discharge. The above analysis, which showed that the univariate model was not satisfactory during the summer when the flow is low, supports this hypothesis. We therefore wanted to incorporate the discharge into the model, using a transfer function model framework.

4.1 Discharge data

The Norwegian Water and Electricity Board has kindly provided data for the discharge in Nid River for the period of interest. The data are daily measurements of the discharge in m3 per second. The flow is thus measured much more frequently than the acidity, which allows several possible choices of input variables to the transfer model. The natural approach would be to use several input series: one with the flow measurements taken the same day as the pH, one with the flow the day before and so forth. Unfortunately, the daily measurements are strongly autocorrelated, so that this approach would give a transfer function where the input series are strongly correlated. This leads to problems in the identification as well as the estimation process (Damsleth, 1979). Some experimenting led to the use of the average flow during the last seven days prior to the pH-measurement as the only input series to the transfer function model. This input series is also plotted in Figure 2.
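The construction of this input series can be sketched in a few lines. The function name and the flow values below are hypothetical, and the sketch assumes that "prior to" means the sampling day itself is excluded from the seven-day window; the text does not settle that detail.

```python
from datetime import date, timedelta


def weekly_mean_discharge(daily_flow, sampling_dates, window=7):
    """Average daily discharge over the `window` days before each pH sample.

    `daily_flow` maps a date to the discharge in m3 per second. The sampling
    day itself is left out of the window (an assumption, see above).
    """
    means = []
    for day in sampling_dates:
        values = [daily_flow[day - timedelta(days=k)]
                  for k in range(1, window + 1)]
        means.append(sum(values) / window)
    return means


# Hypothetical daily flows for two weeks, starting 25 February 1970
start = date(1970, 2, 25)
daily_flow = {start + timedelta(days=i): 100 + i for i in range(14)}
inputs = weekly_mean_discharge(daily_flow, [date(1970, 3, 4), date(1970, 3, 11)])
```

Averaging over the week replaces seven strongly autocorrelated daily inputs with a single series, which is the motivation given in the text.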
4.2 The discharge/acidity model

From Figure 2 one may deduce a negative covariation between the discharge and the acidity. This is more pronounced in Figure 4a, which shows a plot of pH-values against discharge. The figure also shows a smooth estimate of the functional relationship between the two variables, where we have used the LOWESS smoothing technique (Cleveland, 1979). The curve is clearly nonlinear, and typical of a log-linear relationship. This is confirmed by Figure 4b, where the flow is plotted on a logarithmic scale, giving a nearly linear picture. Bearing the logarithmic definition of pH in mind, this is not surprising.
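The remark about the logarithmic definition of pH can be made explicit with a short derivation. It assumes, purely for illustration, that the hydrogen ion concentration varies as a power of the discharge $X$; the constants $k$ and $c$ are hypothetical and are not estimated in the paper.

```latex
[\mathrm{H}^{+}] = k\,X^{c}
\quad\Longrightarrow\quad
\mathrm{pH} = -\log_{10}[\mathrm{H}^{+}]
            = -\log_{10} k \;-\; \frac{c}{\ln 10}\,\ln X
```

Under that assumption pH is exactly linear in $\ln X$, with a negative slope whenever $c > 0$, which is consistent with the nearly linear picture obtained on the logarithmic scale.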
Fig. 4. Relationship between acidity and discharge in Nid river. In (a) the discharge is plotted directly (in m3 per second) along the X-axis, while in (b) the X-axis is given on a logarithmic scale.

A few iterations of identification, estimation and diagnostic checking finally gave the transfer function/noise model:

$$Y_t - \underset{(0.03)}{5.19} = \left(1 - \underset{(0.07)}{0.55}\,B\right)^{-1}\left(-\underset{(0.02)}{0.17} - \underset{(0.02)}{0.05}\,B\right)\left(\ln X_t - 4.38\right) + N_t$$

$$\left(1 - \underset{(0.04)}{0.73}\,B - \underset{(0.04)}{0.12}\,B^{2}\right) N_t = a_t, \qquad \sigma_a = 0.09156 \tag{2}$$
Here $X_t$ represents the discharge as defined above, $N_t$ is the noise in the transfer function model, and $Y_t$ and $a_t$ are as given earlier. The residual standard deviation for model (2) was 0.09156, compared with the value 0.1033 from the univariate model (1), a significant, though not astonishing, improvement. It is worth noticing that introduction of the discharge as input to the model has removed the seasonal effect found in model (1).

4.3

Figure 5 again shows a histogram of the residuals from model (2). As for model (1), there is no reason to doubt the normality assumption.
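To see what the transfer function in model (2) implies, one can expand it into impulse response weights. The sketch below takes the transfer function to be (1 - 0.55B)^-1 (-0.17 - 0.05B), as read from the estimates above; the function name is hypothetical, and the code only illustrates the standard expansion of a rational lag polynomial with a first-order denominator.

```python
def impulse_response(omega, delta, n_lags):
    """Weights v_k of v(B) = (omega[0] + omega[1] B + ...) / (1 - delta B).

    The weights follow the recursion v_k = omega_k + delta * v_(k-1),
    with omega_k taken as 0 beyond the numerator order.
    """
    weights = []
    for k in range(n_lags):
        w = omega[k] if k < len(omega) else 0.0
        if k > 0:
            w += delta * weights[k - 1]
        weights.append(w)
    return weights


# Coefficients as read from model (2)
weights = impulse_response(omega=[-0.17, -0.05], delta=0.55, n_lags=6)

# Long-run effect on pH of a sustained unit increase in ln(discharge)
steady_state_gain = (-0.17 - 0.05) / (1 - 0.55)
```

On this reading the weights are all negative and decay geometrically, and the long-run gain is about -0.49, so a sustained rise in the logarithm of the weekly mean discharge lowers the pH, with most of the effect appearing within the first few weeks.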
As in Table 1 for the univariate model, Table 2 gives the mean and standard deviation for the residuals from model (2), according to year and to season.

Fig. 5. Histogram of the residuals from model (2).

TABLE 2
Mean and standard deviation for the residuals from model (2), according to year and season. The means marked * are significantly different from 0 at the 5% level, and the standard deviations marked * are significantly different from 0.09156, the estimated overall standard deviation.

Year      1970   1971   1972   1973   1974   1975   1976   1977   1978   1979   1980
Mean      .007   .019   .021  -.006  -.005   .013  -.012  -.009  -.006  -.011  -.008
St.dev.   .089   .070*  .079   .092   .099   .097   .085   .080   .094   .124*  .087
# obs.      36     52     53     52     52     52     52     53     52     52     52
The entries in Table 2 have a much more homogeneous appearance when compared to Table 1. There is no reason to say that there has been any change through the years, although it is interesting to note that the means are all negative from 1976 on. The seasons still differ. Summer still has a clearly positive mean, and the spring has become negative. The standard deviation, however, has been significantly reduced for all four seasons. The introduction of the discharge has thus enabled us to adjust for most of the seasonal behaviour, but it will be necessary to develop a more complex model to remove it completely.

Overall, the introduction of discharge gave a reduction in the residual standard deviation from 0.1033 to 0.09156, that is 11.4%. Comparison of Tables 1 and 2 shows that the reduction is not uniform over the seasons. The largest reduction is for the autumn, with 18.8%. The reductions for summer and spring are 15.7% and 10.9% respectively, while the reduction during the winter period is only 1.4%. The main reason for this is that the discharge is much more stable during the winter period, compared to the rest of the year. Since there is little variation in the flow, only a small proportion of the variation in the acidity can be explained by it, and it is not important for the residual standard deviation whether the discharge is included in the model or not.

5. CONCLUSIONS

We have analysed pH-data from three rivers in Southern Norway. In Nid River and Tovdal River observations were taken weekly, while the pH in Mandal River was observed twice a month. All three series are well described by simple, univariate ARIMA-models. A small, but significant, term is included to account for the seasonal variation in the series. The model structure is very similar for all three rivers.

Two of the rivers, Nid River and Mandal River, are exploited extensively for power production. Most of the discharge in these rivers consists of storage water, which is known to have a more stable water chemistry as compared to rivers which run freely. This is reflected in the residual standard deviation of the pH, which is much larger for the unexploited river.

The discharge affects the acidity in all the three rivers. The relationship between flow and pH has been described in a transfer function model, where the input is the logarithm of the average discharge during the last seven days prior to the pH-observation. The effect, however, is much more pronounced in the two exploited rivers, which again can be explained by the stable chemistry of the storage water. The use of discharge as input to the model has removed most of the seasonal variation in all three series.

For the controlled rivers it is fairly satisfactory to apply the same model for the whole year, independent of the season. For the uncontrolled Tovdal River, this is not the case. Here the relation between discharge and acidity exists only during the summer and autumn. The explanation lies in the snow, which stores most of the water during the winter and early spring, discharging it again during the spring flood. Thus, during the winter the discharge in Tovdal River consists mainly of local rain and snow melting. During the spring,
flood water from the melting snow constitutes most of the discharge, while the summer and autumn discharge is mostly rain coming more or less directly into the river. It is not surprising that the relationship between discharge and acidity is different in these situations. For the exploited rivers, the discharge is mostly magazine water. The input to the magazines has of course the same various sources as described for Tovdal River, but the homogenization process in the magazines results in a model which can be used for the whole year.

The analysis gives no reason to say that there has been any systematic trend in the river acidity. After adjustment for the variations in the discharge, there is vague evidence that the rivers have been significantly more acid during the years 1977-79. This is somewhat surprising compared with other investigations which found a significant deterioration in the river acidity in Southern Norway (Henriksen et al., 1981). The explanation is that earlier studies used data all the way back to 1966, and thus caught the deterioration during the late sixties. The situation evidently stabilized during the seventies, when our data were collected.

REFERENCES

Box, G.E.P. and Jenkins, G.M., 1970. Time Series Analysis, Forecasting and Control.
Cleveland, W.S., 1979. Robust Locally Weighted Regression and Smoothing Scatterplots. JASA, 74: 829-836.
Damsleth, E., 1979. Analysis of Multi-input Transfer Function Models when the Inputs are Correlated. NCC-report no. 641, Norwegian Computing Center, Oslo.
Damsleth, E., 1980. Estimating Missing Values in a Time Series. Scand. J. of Stat., 7: 33-39.
Damsleth, E., 1984a. Time Series Analysis of Pollution Data - A Methodological Study (in Norwegian). NCC-report no. 745, Norwegian Computing Center, Oslo.
Damsleth, E., 1984b. Long Range Transport of Air Pollution into Norway - a Transfer Function Approach. MIC, 5: 141-150.
Henriksen, A., Snekvik, E. and Volden, R., 1981. Changes in pH during the period 1966-1979 for 38 Norwegian Rivers (in Norwegian). Governmental Program for Pollution Monitoring Report no. 2/1981. Norwegian Institute for Water Research, Oslo.
Overein, L.A., Seip, H.M. and Tollan, A., 1980. Acid Precipitation - effects on Forest and Fish. SNSF-project, FR19/80. Oslo.
SULPHATE, WATER COLOUR AND DISSOLVED ORGANIC CARBON RELATIONSHIPS IN ORGANIC WATERS OF ATLANTIC CANADA

G.D. HOWELL and T.L. POLLOCK
Monitoring and Surveys Division, Water Quality Branch, Inland Waters Directorate, Atlantic Region
ABSTRACT

This study utilizes a large data set to re-examine some of the difficulties associated with the determination of sulphate concentrations in highly organic waters. As a result of colour interference (e.g. organic anions) of colourimetric sulphate determinations and the inability to quantitatively determine organic anion concentrations, analytical laboratories often perform parallel colourimetric and ion chromatographic sulphate analyses. Using data from Atlantic Canada, a water colour threshold value of 20 Hazen units, below which there is no significant difference between the two analytical techniques, can be established. Eighteen data sets were examined to determine whether or not historical colourimetric sulphate concentrations could be corrected for organic anion interference. Nine of these data sets not only had good linear relationships (r2 ≥ .65) between ΔSO4 (SO4(MTB) - SO4(IC)) and water colour but also exhibited similar slopes. Similar results were observed for relationships between ΔSO4 and DOC, although the correlation coefficients were lower than those observed for ΔSO4 versus water colour regressions. These results indicate that with few exceptions (i.e. headwater sites or those sites draining more than one headwaters or which have highly variable water colours) the correction of SO4(MTB) data can be accomplished using a general correction equation.

ACKNOWLEDGEMENTS

The authors wish to thank the staff of the Analytical Services Division of the Water Quality Branch who analysed the water samples. In addition, appreciation is extended to Water Quality Branch, Canadian Wildlife Service and Department of Fisheries and Oceans personnel who collected the large number of samples which formed the basis of the data set used in this paper. Computer assistance was provided by D. Bingham and L. Wong and the manuscript was typed by L. Boulter.
INTRODUCTION

Recent interest in the effects of the atmospheric deposition of strong mineral acids on surface waters has increased the demand for reliable analytical determination of nitrate, sulphate and hydrogen ion concentrations. As both hydrogen ion and nitrate concentrations are mediated by a complex series of biotic and abiotic processes, the more conservative sulphate ion has been used as an indicator of hydrogen ion loading. However, there is evidence that sulphate is also mediated by biotic and abiotic processes (Rao et al., 1984; Nriagu, 1984) and thus does not behave conservatively. Regardless of these observations, sulphate has been used extensively for the calculation of target loadings and as a basis for the development of several acidification models (Henriksen, 1979; Thompson, 1982).

During the 1970's, sulphate concentrations were generally determined using an automated methyl thymol blue (MTB) procedure (Lazrus et al., 1966; Analytical Methods Manual, 1979). However, Cronan (1979) indicated that the MTB colourimetric method was susceptible to interference in highly organic coloured waters. Although this interference is not well understood, it is hypothesized that the organic ions present in solution compete with sulphate for available barium ions, resulting in an over-estimation of the actual sulphate concentration. This contention of a chemical, rather than a physical, colour interference mechanism is supported by the fact that with MTB sulphate concentrations good ion balances are observed for these coloured waters even though there is no measure of the organic anion concentration. This would suggest that MTB sulphate determination gives an indication of sulphate plus organic anion concentration.

The techniques employed to determine sulphate concentrations in the aquatic milieu have progressed considerably, reflecting advances in both analytical methodology and automated procedures. With the development of reliable ion chromatographic techniques in the late 1970's, many laboratories switched from automated colourimetric sulphate analysis to ion chromatography. In November of 1981, the Water Quality Branch, Atlantic Region, began ion chromatographic sulphate analysis for all samples collected in LRTAP programs. As waters in the Atlantic Region of Canada are characteristically highly coloured, it was necessary to document the effects of dissolved organic matter on MTB sulphate measurements and thus sulphate concentrations were measured by both ion chromatography and MTB colourimetry. Using a small data set, Kerekes et al. (1984) observed large differences between colourimetric and ion chromatographic sulphate (ΔSO4) for samples with high water colours and dissolved organic matter.

These findings had serious implications regarding the use of MTB sulphate, particularly when considering seasonal trends and mass budgets of sulphate in organic systems. In an attempt to salvage historic MTB sulphate data, these authors investigated the possibility of correcting MTB sulphate for colour interference. The authors observed that although site specific relationships did exist for ΔSO4 versus water colour and ΔSO4 versus DOC, it appeared that a general correction for the organic interference of MTB sulphate was not feasible.
4 v e r s u s DOC, i t a p p e a r e d t h a t a g e n e r a l c o r r e c t i o n f o r t h e o r g a n i c i n t e r f e r e n c e of MTB s u l p h a t e was n o t f e a s i b l e . Although
is
ion no
balancing
convenient
of
organic
of
Oliver e t a l . the
the
organic
technique
from DOC and pH, chemist
need
places
this
colour
for
as
a resource
paper
the
has
two
investigate
available
for
at
(MTB)
SO4
present
the
there
determination
Although t h e e m p i r i c a l f o r m u l a t h i s has n o t been w i d e l y a d o p t e d an
analyzing
a
dual
ion
balancing
samples on
by
tool.
both
Despite
methods,
concentrations
SO4 ( I C )
burden
the
purpose.
analytical
The
primary
the
for
all
laboratory. is
goal
to
l a r g e a v a i l a b l e d a t a s e t t o e s t a b l i s h a water
t h r e s h o l d below which
between
because
(Cheam
( 1 9 8 3 ) h a s been u s e d t o e s t i m a t e o r g a n i c a n i o n
use the relatively
to
accurate
waters
method
t i m e and c o s t of d e t e r m i n i n g samples Thus
more
concentration.
analytical
obvious
of
direct
anion
concentration by
the
1 9 8 5 ) , t h e a n a l y t i c a l chemist s t i l l r e q u i r e s
e t al., for
is
SO4 ( I C )
sulphate further
t h e r e i s no s i g n i f i c a n t d i f f e r e n c e
methods.
the
secondary
A
possibility
of
objective
using
is
empirical
r e l a t i o n s h i p s t o c o r r e c t t h e l a r g e amount of h i s t o r i c a l
SO4
(MTB) d a t a .
METHODS

Ion chromatographic sulphate was determined according to NAQUADAT method code 16309 (WQB, 1983) while automated MTB sulphate was measured using NAQUADAT method code 16304. Apparent water colour was estimated with a Hellige visual comparator equipped with 200 mm Nessler tubes and a comparator colour disc calibrated from 0 to 100 Hazen units (NAQUADAT 02011L). Water colours greater than 100 were determined by an appropriate sample dilution.

The data set employed included all samples originally collected by the Atlantic Region Water Quality Branch laboratory from July 1981 to Nov. 1984 and analyzed using both sulphate methods. The initial screening process eliminated precipitation and groundwater samples and all samples with SO4 (IC) or SO4 (MTB) concentrations above 20 mg/L. Those samples above 20 mg/L were eliminated to overcome interpretation problems associated with sample dilution at high concentrations. Additional screening eliminated samples with water colours above 100 Hazen units, as up to 1983 water colours above 100 were recorded as GT 100. This screening process resulted in a final data set of 2682 samples. Statistical analysis was performed on a VAX 11/750 computer equipped with the RS/1 data management system. Data were tested for normality using the Kolmogorov-Smirnov test while sample median values were tested using an
unpaired Mann-Whitney test.

RESULTS AND DISCUSSION

To facilitate the establishment of a water colour threshold value, the data set was sorted and subdivided into a series of water colour ranges. Water colour was chosen rather than dissolved organic carbon as colour is an easily measured parameter performed on most samples shortly after receipt at the laboratory.

TABLE 1
Results of Mann-Whitney Unpaired Test for Four Water Colour Ranges

Range   Colour   SO4 IC   SO4 IC   SO4 MTB   SO4 MTB   Z           Significance
                 Median   N        Median    N         Statistic   Level
1       0-20     3.1      869      3.2       864       -1.5        0.132*
2       20-30    2.9      183      3.2       183       -2.8        0.004**
3       30-40    2.6      279      3.2       275       -6.2        0.0001**
4       40-50    2.6      488      3.3       479       -8.5        0.0001**

*  = No significant difference between medians at 95% CI
** = Significant difference between medians at 95% CI
Table 1 presents the Mann-Whitney test of the median concentrations of SO4 (MTB) and SO4 (IC) for four colour ranges. These results indicate no significant difference in median values only for the 0-20 Hazen units colour range. Thus a cut-off level of 20 Hazen units can be established, below which it is unnecessary to analyze samples by both sulphate methods. Although this may be statistically valid, on an operational level it may not be necessary to rigidly adhere to the 20 Hazen unit limit.

The results of linear regression analysis of SO4 (IC) versus SO4 (MTB) for five colour ranges are presented in Figure 1.
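The median comparison summarised in Table 1 can be illustrated with a short sketch. The paper ran its tests in the RS/1 system, so the code below is not the authors' implementation: it is a minimal standard-library version of an unpaired Mann-Whitney test using the normal approximation with a tie correction. The function name `mann_whitney_z` and the sample values are hypothetical.

```python
import math


def mann_whitney_z(x, y):
    """Return (U, z, p) for two unpaired samples via the normal approximation."""
    n1, n2 = len(x), len(y)
    n = n1 + n2
    # Pool the observations, remembering which sample each came from.
    pooled = sorted([(v, 0) for v in x] + [(v, 1) for v in y])
    rank_sum_x = 0.0
    tie_term = 0.0
    i = 0
    while i < n:
        j = i
        while j < n and pooled[j][0] == pooled[i][0]:
            j += 1
        # Positions i..j-1 hold tied values; they share the average of ranks i+1..j.
        avg_rank = (i + 1 + j) / 2.0
        ties = j - i
        tie_term += ties ** 3 - ties
        rank_sum_x += avg_rank * sum(1 for k in range(i, j) if pooled[k][1] == 0)
        i = j
    u = rank_sum_x - n1 * (n1 + 1) / 2.0
    mean_u = n1 * n2 / 2.0
    var_u = n1 * n2 / 12.0 * ((n + 1) - tie_term / (n * (n - 1)))
    z = (u - mean_u) / math.sqrt(var_u)
    p = math.erfc(abs(z) / math.sqrt(2.0))  # two-sided significance level
    return u, z, p


# Hypothetical values (mg/L) for one colour range, for illustration only.
so4_ic = [2.1, 2.6, 3.0, 3.4, 2.8]
so4_mtb = [3.1, 3.3, 2.9, 3.8, 3.5]
u, z, p = mann_whitney_z(so4_ic, so4_mtb)
print(f"U = {u:.1f}, Z = {z:.2f}, p = {p:.3f}")
```

With SO4 (IC) passed first, a negative Z indicates that the IC values tend to be lower than the MTB values, which matches the sign convention of the Z statistics in Table 1.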
Fig. 1. Sulphate (IC) vs Predicted Sulphate (MTB) for Five Colour Ranges

Values of sulphate concentrations greater than 8.0 mg/L were excluded from the regression analysis as a few high points unduly influenced the regression. Using these regression equations, SO4 (MTB) can be calculated for any given value of SO4 (IC) for each water colour range (Table 2). This approach indicates that for the 0-20 colour range, there is little difference between SO4 (IC) and SO4 (MTB), which corroborates the results of the Mann-Whitney test of medians. For the 20 to 40 Hazen unit range considerable differences between SO4 (IC) and predicted SO4 (MTB) are apparent up to the 3 mg/L level. As the water colour range increases, significant differences between the two sulphate methods are observed at higher SO4 (IC) concentrations. In fact for the 80-90 water colour range, large differences are observed up to a SO4 (IC) concentration of 6 mg/L.
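The predicted values discussed here follow directly from the slopes and intercepts listed in Table 2. As a minimal sketch (the dictionary and function names are hypothetical, and the coefficients are the rounded values from Table 2, so results agree with the table only to within rounding):

```python
# (slope, intercept) of SO4 (MTB) on SO4 (IC), keyed by colour range in Hazen units.
# Coefficients are the rounded values reported in Table 2.
TABLE2_FITS = {
    (0, 20): (0.96, 0.35),
    (20, 40): (0.85, 1.03),
    (45, 60): (0.84, 1.24),
    (65, 80): (0.82, 1.60),
    (80, 90): (0.85, 1.63),
}


def predicted_mtb(so4_ic, colour_range):
    """Predicted SO4 (MTB), mg/L, for a given SO4 (IC) and colour range."""
    slope, intercept = TABLE2_FITS[colour_range]
    return slope * so4_ic + intercept


def predicted_difference(so4_ic, colour_range):
    """Predicted SO4 (MTB) minus SO4 (IC), mg/L."""
    return predicted_mtb(so4_ic, colour_range) - so4_ic


for colour_range in TABLE2_FITS:
    row = [round(predicted_difference(ic, colour_range), 2) for ic in (1.0, 3.0, 6.0)]
    print(colour_range, row)
```

Because every slope is below 1 and every intercept is positive, the predicted difference shrinks as SO4 (IC) rises, which is why the discrepancy persists to higher concentrations only in the more coloured ranges.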
TABLE 2
Difference between SO4 (IC) and SO4 (MTB) predicted from Regression equations of specific colour ranges

Sulphate IC   Colour 0-20    Colour 20-40   Colour 45-60   Colour 65-80   Colour 80-90
(mg/L)        Sulphate MTB   Sulphate MTB   Sulphate MTB   Sulphate MTB   Sulphate MTB
1.00          1.30           2.00           2.10           2.40           2.50
2.00          2.30           2.80           2.90           3.20           3.30
3.00          3.20           3.60           3.80           4.10           4.20
4.00          4.20           4.40           4.60           4.90           5.00
5.00          5.10           5.20           5.40           5.70           5.90
6.00          6.10           6.10           6.30           6.50           6.70
7.00          7.00           7.00           7.10           7.30           7.60
8.00          8.00           7.80           8.00           8.20           8.40

Slope         0.96           0.85           0.84           0.82           0.85
Intercept     0.35           1.03           1.24           1.60           1.63
R**2          0.95           0.87           0.87           0.79           0.73
N             867            440            462            380            319

The histogram of SO4 (IC) concentration values found in this region (Figure 2) indicates that the majority of values range from 1 to 4 mg/L, which suggests that a water colour threshold of 40 Hazen units is not feasible and the cutoff level should be set at 20.

Fig. 2. Frequency Distribution of Sulphate (IC) Concentrations for the Atlantic Region.
However in regions where SO4 (IC) concentrations are higher, it may be possible to increase the 20 Hazen units limit.

With the realization of the problems associated with SO4 (MTB) analysis in coloured water, several authors have investigated the possibility of correcting historical SO4 (MTB) data using relationships between ΔSO4 (SO4 (MTB) - SO4 (IC)) and water colour (Kerekes et al., 1984) and dissolved organic carbon (Pollock and Komadina, 1983). However the relationships observed were very site specific and it was suggested that based on the limited available data an overall equation for SO4 (MTB) was unfeasible.

TABLE 3
Regression analysis of delta sulphate vs water colour for several sites in the Atlantic Region.

Site                      Slope   Intercept   R**2    F-Value   Sample Size
 1 Mt Tom Bk              0.021   -0.108      0.860     144.0      26
 2 Atkins Bk              0.020    0.177      0.850     105.0      20
 3 West River             0.018    0.269      0.850     117.0      22
 4 NS Lakes               0.021   -0.127      0.810    1294.0     307
 5 NS Basin 'E'           0.020   -0.201      0.790    4082.0    1115
 6 Nfld Lakes             0.016    0.289      0.790     386.0     103
 7 Moose Pit Bk           0.022   -0.445      0.770     557.0     166
 8 Pebble. Outflow        0.016    0.091      0.750     119.0      41
 9 Rodgers Bk             0.017   -0.069      0.690     732.0     328
10 Nfld Rivers            0.016    0.044      0.520     214.0     196
11 Upper Mersey River     0.015    0.061      0.320     128.0     268
12 Little River           0.011    0.465      0.320      13.0      30
13 Grafton Bk             0.013    0.129      0.280      67.0     170
14 Whiteburn Bk           0.013    0.151      0.210      16.0      60
15 NS Basin 'DA to DN'    0.013    0.039      0.180      20.0      91
16 NB Rivers              0.011    0.097      0.150      43.0     252
17 Lower Mersey River     0.012    0.107      0.140      15.0      88
18 NB Lakes               0.002    0.181      0.004       0.3      75

Table 3 presents the results of linear regression analysis of ΔSO4 versus water colour for eighteen selected sites. Of the sample groups investigated, only 50% had good linear relationships (r2 > 0.65) between ΔSO4 and colour. Relationships between ΔSO4 and DOC were also considered but in most cases had coefficients of determination considerably lower than those found using water colour (Table 4).
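The quantities reported for each site (slope, intercept, R**2 and F-value) all come from an ordinary least-squares fit of ΔSO4 on water colour. The following is a minimal sketch of that calculation, not the RS/1 procedure used in the paper; the function name and the data points are hypothetical.

```python
def linear_regression(x, y):
    """Least-squares fit y = slope * x + intercept.

    Returns (slope, intercept, r2, f), where f is the F statistic for the
    regression with 1 and n - 2 degrees of freedom.
    """
    n = len(x)
    mean_x = sum(x) / n
    mean_y = sum(y) / n
    sxx = sum((xi - mean_x) ** 2 for xi in x)
    syy = sum((yi - mean_y) ** 2 for yi in y)
    sxy = sum((xi - mean_x) * (yi - mean_y) for xi, yi in zip(x, y))
    slope = sxy / sxx
    intercept = mean_y - slope * mean_x
    r2 = sxy ** 2 / (sxx * syy)
    # A perfect fit leaves no residual variance, so F is unbounded.
    f = float("inf") if r2 >= 1.0 else r2 / (1.0 - r2) * (n - 2)
    return slope, intercept, r2, f


# Hypothetical site data: water colour (Hazen units) and delta sulphate (mg/L).
colour = [0.0, 50.0, 100.0, 150.0]
delta_so4 = [0.1, 0.9, 2.1, 2.9]
slope, intercept, r2, f = linear_regression(colour, delta_so4)
print(f"slope={slope:.4f} intercept={intercept:.2f} r2={r2:.3f} F={f:.0f}")
```

The relation F = r2 / (1 - r2) * (n - 2) also gives a quick consistency check on the tabulated values: for Mt Tom Bk, r2 = 0.86 and n = 26 give F of about 147, close to the reported 144, the small gap being consistent with rounding of r2.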
TABLE 4
Regression analysis of delta sulphate vs DOC (mg/L) for several sites in the Atlantic Region.

Site                      Slope    Intercept   R**2    F-Value    Sample Size
 1 Moose Pit Bk            0.175   -0.262      0.810    688.000     167
 2 NS Basin 'E'            0.185   -0.220      0.730   2667.000    1010
 3 Nfld Lakes              0.137    0.039      0.730    235.000      86
 4 Mt Tom Bk               0.179    0.423      0.670     33.800      18
 5 NS Lakes                0.219   -0.291      0.670    446.000     222
 6 Rodgers Bk              0.149   -0.045      0.620    543.000     333
 7 Little River            0.171    0.099      0.610     26.500      19
 8 West River              0.110    1.720      0.580     19.500      15
 9 Pebble. Outflow         0.181    0.170      0.550     41.300      35
10 Atkins Bk               0.126    2.080      0.500     13.200      14
11 Nfld Rivers             0.139   -0.003      0.370     76.700     131
12 Upper Mersey River      0.109    0.365      0.320     86.500     187
13 Whiteburn Bk            0.137   -0.100      0.299     28.100      61
14 NS Basin 'DA to DN'     0.133   -0.231      0.298      2.600      62
15 Grafton Bk              0.051    0.321      0.079     11.300     132
16 Lower Mersey River      0.071    0.426      0.079      0.008      64
17 NB Rivers               0.032    0.335      0.014      5.400     187
18 NB Lakes               -0.002    0.242      0.000      0.300      75
Some insight into the problems of SO4 (MTB) correction may be gained from the results presented in Table 3. Without exception, those sites which exhibit good relationships between ΔSO4 and water colour have similar slopes (.016-.022) and intercepts (-0.5 to 0.3). This would suggest that these sites have similar types of colour causing organic matter or, to use the terminology of Cheam et al. (1985), represent an "iso-chrome". Statistics describing the water colour distribution of these sites (Table 5) indicate that those sites with high water colours and large standard deviations have the best ΔSO4 versus water colour relationships. In fact, the first nine sites, with the exception of the Newfoundland lakes, had standard deviations considerably higher than those observed for the low r2 sites. The fact that the Newfoundland lakes deviate from this pattern may be explained by the fact that these headwater lakes are sampled during homothermal conditions during the spring and fall of each year and therefore the effects of seasonal changes on the ΔSO4 versus water colour relationship have been eliminated.
TABLE 5
Mean, Median, St. Dev., Sample Size and Range of Water Colours for Selected Sites.

Site                      Mean   Median   St. Dev.   Sample Size   Range
 1 Atkins Bk              176    210      94.2         20          280
 2 Moose Pit Bk           147    140      61.2        174          260
 3 West River             136    155      58.0         22          160
 4 NS Lakes                51     30      56.8        307          240
 5 Mt Tom Bk              105    110      52.8         26          180
 6 NS Basin 'E'            70     60      51.1       1138          320
 7 Rodgers Bk              86     70      49.6        330          220
 8 Pebble. Outflow        114    120      49.2         41          180
 9 Nfld Lakes              41     35      34.3        104          220
10 Nfld Rivers             38     35      33.7        196          320
11 NS Basin 'DA to DN'     38     35      31.3         91          140
12 Little River            81     80      28.2         30           90
13 NB Rivers               35     35      24.2        264          120
14 Upper Mersey River      75     80      21.9        268          200
15 Whiteburn Bk            50     50      18.0         60           70
16 Grafton Bk              33     30      16.9        170          110
17 Lower Mersey River      61     60      16.9         88           75
18 NB Lakes                16     10      12.5         76           45

In general the sites which exhibit the greatest water colour variability are either headwater streams or drain one or more headwater lakes (Table 6). In all cases, these sites have some bog drainage, although the percentage of the drainage basin area covered by bogs is highly variable. In contrast, the sites with low water colour standard deviations are either headwater lakes with no bog drainage or are downstream of a series of lakes. These observations imply that systems with large storage reservoirs, which have longer water residence times, damp the seasonal variability of both DOC and water colour.

The organic fractionation work of Glooshenko and Bourbonniere (1985) has indicated that fulvic acids comprise a large percentage (approximately 90%) of the DOC leaving bog systems, while further downstream humic acids make up a significant portion of the DOC pool. Thus, although these systems exhibit low variability in DOC concentrations and colours, it is likely that they exhibit large variations in the relative proportions of the various fulvic and humic acids. This variation in chemical composition of the organic carbon pool at these sites may explain the poor relationships observed between ΔSO4 and water colour and ΔSO4 and DOC.
These results suggest that it is only possible to correct SO4 MTB data for bog influenced headwater lakes or systems which drain headwater lakes. Furthermore, the similarity of the regression equations for eight of the nine sites indicates that the observed ΔSO4 (mg/L) is generally 1.9% of the water colour.

TABLE 6
Drainage basin areas and percent of area covered by lakes and bogs for eleven sites.

Site  Station                   Drainage Area   Number of Lakes   Percent Area       Percent Area
                                (km2)           in System         Covered by Lakes   Covered by Bogs
 1    Moose Pit Brook            16.7            0                 0.0                1.2
 2    Atkins Brook               15.0            0                 0.0                6.3
 3    Mount Tom Brook            11.1            1*                1.4                2.7
 4    West River                119.0            8*                2.4                3.2
 5    Rodgers Brook              10.0            0                 0.0                1.0
 6    Pebbleloggitch Outflow      1.7            1*               20.6                8.8
 7    Little River              131.0           14                 6.2                1.8
 8    Grafton Brook              52.9            8                12.4                1.9
 9    Whiteburn Brook             7.1            1*                3.9                0.0
10    Upper Mersey River        295.0           46                 7.6                2.3
11    Lower Mersey River        723.0           83                 9.4                2.5

* Headwater Lakes
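The general relation noted above, that ΔSO4 (mg/L) is roughly 1.9% of water colour at bog influenced sites, suggests a simple form of correction for historical SO4 (MTB) values. The sketch below is an illustration under stated assumptions rather than a procedure given in the paper: the function name is hypothetical, the default slope of 0.019 is taken from the 1.9% figure, and the intercept is assumed to be zero. Site specific slopes and intercepts from Table 3 can be passed in instead.

```python
def corrected_so4(so4_mtb, colour_hazen, slope=0.019, intercept=0.0):
    """Estimate an IC-equivalent sulphate value (mg/L) from an MTB value.

    delta SO4 = SO4 (MTB) - SO4 (IC) is modelled as slope * colour + intercept,
    so the estimate is the MTB value minus the modelled delta.
    Assumption: intended only for bog influenced headwater systems.
    """
    if colour_hazen < 0:
        raise ValueError("water colour must be non-negative")
    delta_so4 = slope * colour_hazen + intercept
    return so4_mtb - delta_so4


# General factor, hypothetical sample: SO4 (MTB) of 5.0 mg/L at 100 Hazen units.
print(round(corrected_so4(5.0, 100.0), 2))

# Site specific coefficients (Moose Pit Bk, Table 3) for the same sample.
print(round(corrected_so4(5.0, 100.0, slope=0.022, intercept=-0.445), 2))
```

The two calls give different estimates for the same sample, which illustrates why the paper restricts a general correction to sites whose regression equations are similar.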
CONCLUSIONS

This paper employs a large data set to review the problems associated with the measurement of sulphate in coloured waters. A threshold value of 20 Hazen units was established below which there is no significant difference between SO4 (IC) and SO4 (MTB). Although this cut-off level is appropriate for Atlantic Region lakes and rivers, in areas with higher sulphate concentrations it may be possible to increase the threshold level to 40 Hazen units or greater.

The possibility of correcting historic SO4 (MTB) data was investigated and it was determined that an overall correction curve could not be developed. However, it was observed that sites with high standard deviation of water colour had good linear ΔSO4 vs water colour relationships with similar slopes and intercepts. This indicated that bog influenced sites which have limited water storage potential can be corrected using general correction factors.
REFERENCES

Analytical Methods Manual, 1979. Inland Waters Directorate, Water Quality Branch, Ottawa, Canada.
Cheam, V., Chau, A. and Todd, S., 1985. Sulphate in Coloured Waters: Investigation on Methodologies, Data Reliability and Approaches to Salvage Historical Colorimetric Data. NWRI Report. Contribution Number 85-95.
Cronan, C.S., 1979. Determination of sulphate in organically coloured water samples. Anal. Chem. 51:1333.
Glooshenko, W.A. and Bourbonniere, R.A., 1985. Impact of Organic Waters from Peatland Drainage on Aquatic Ecosystems. Long Range Transport of Air Pollutants Study Progress Report. 1984/85 Annual Report.
Henriksen, A., 1980. Acidification of freshwaters - a large scale titration. p. 68-74. In D. Drablos and A. Tollan (ed.) Ecological Impact of Acid Precipitation.
Kerekes, J., Howell, G. and Pollock, T., 1984. Problems associated with sulphate determination in coloured, humic waters in Kejimkujik National Park, Nova Scotia, Canada. Verh. Internat. Verein Limnol. 22:1811-1817.
Lazrus, A.L., Hill, K.C. and Lodge, J.P., 1965. A new colourimetric microdetermination of sulphate ion. Automatic Anal. Chem., p. 291.
Nriagu, J.O., 1984. Role of inland water sediments as sinks for anthropogenic sulphur. Sci. Total. Environ. 38:7-13.
Oliver, B.G., Thurman, E.M. and Malcolm, R.L., 1983. The Contribution of Humic Substances to the Acidity of Coloured Natural Waters. Geochim. Cosmochim. Acta, 47:2031-2035.
Pollock, T.L. and Komadina, V.A., 1983. Determination of sulphate in Atlantic Canada surface waters. Workshop Proceedings, Kejimkujik Calibrated Catchments Program, April, 1983. Edited by J. Kerekes.
Rao, S.S., Jurkovic, A.A. and Nriagu, J.O., 1984. Bacterial Activity in Sediments of Lakes Receiving Acid Precipitation. Environmental Pollution (Series A) 36:195-205.
Thompson, M.E., 1982. The cation denudation rate as a quantitative index of sensitivity of Eastern Canada rivers to acidic atmospheric precipitation. Water, Air, Soil Pollution 18:215-226.
WQB, 1983. NAQUADAT. Dictionary of Parameter Codes. Water Quality Branch, Environment Canada, Ottawa.

SULFATE IN COLOURED WATERS. I. EVALUATION OF CHROMATOGRAPHIC AND COLORIMETRIC DATA COMPATIBILITY
V. CHEAM, A.S.Y. CHAU AND S. TODD National Water Research Institute, Canada Centre for Inland Waters, Burlington, Ontario, Canada
ABSTRACT
The compatibility and reliability of colorimetric and chromatographic SO4 data were evaluated. The multiple standard addition technique was applied to numerous natural and humic acid fortified waters. A total of more than 20 different waters was used, in which the colour ranged from 50 to 440 H.U. and the organic carbon from 0.7 to 20 ppm. For the first time, it was demonstrated that Ion Chromatography (IC) data on organic-contaminated coloured waters are reliable. It was also confirmed that the Methyl Thymol Blue (MTB) colorimetric data were biased high. An approach for salvaging historical colorimetric data was found and briefly discussed.

1. INTRODUCTION
There has been a great deal of discussion and concern over the analysis of sulfate in coloured waters. This is due to its importance in the study of acid rain, and to the questionable colorimetric data caused by interference from coloured matter in the waters. Early sulfate data were generated by the colorimetric method using methyl thymol blue (MTB). The validity of these data has been discussed in several papers; many scientists believe that these data are biased high (Kerekes et al., 1984; Pollock, 1983; Underwood et al., 1983; Kerekes et al., 1982; Watt et al., 1983; Kerekes and Pollock, 1983; Underwood et al., 1982). The high bias of MTB results was suspected as early as 1979 by Cronan. In 1980, Crowther also reported high MTB results in comparison to ion chromatography (IC) results for water samples from the Dorset area. The report suggested that the colorimetric method was invalid due to the presence of tannins, lignins, humates and
fulvates whereas the IC methodology was relatively unaffected by these interferences. In 1982, Cheam conducted an interlaboratory special quality control study on soft and coloured waters and observed a significant difference between MTB and IC results.

Although the IC methodology appears to be unaffected by colour interferences, the reliability of sulfate data generated by IC has not been established (Workshop, 1983). Uncertain data lead to uncertain interpretations and conclusions. If sound conclusions are to be made, the reliability of analytical data must first be ensured. Thus in this paper, we wish to evaluate in detail the compatibility of MTB and IC data and to establish their reliability (or non-reliability). A brief discussion on handling historical data will also be made.

2. STUDY DESIGN
Establishing data compatibility or reliability would be greatly simplified if pertinent certified reference materials (CRMs) were available. Since there are no coloured water CRMs, the study design is more complicated and time consuming. The design utilizes the principle of multiple standard additions (Julshamm and Brackhan, 1975; Klein and Hack, 1977; Agemian and Cheam, 1978; Bader, 1980; Kalivas, 1983) and many different types of organic-contaminated waters, including seven different natural coloured waters from the Atlantic and Ontario regions (Table 1) and many humic acid fortified coloured waters (Table 2). By studying the commercial humic acid (H.A.) along with natural organic matter in coloured waters, we diversified the types of organic matter studied, and at the same time were able to create a more even spread of colours.

3. EXPERIMENTAL
Multiple standard addition (MSA)
An advantage of MSA is its ability to diagnose the amount present in an unknown. Bader (1980) pointed out that in nearly every case, an appropriate method of standard addition can give the best absolute value for an unknown. In our case, the MSA experimental design is schematically presented in Figure 1 where sample SO4-N represents each of the principal natural samples (Table 1) and humic acid fortified samples (Table 2).

FIG. 1. EXPERIMENTAL DESIGN FOR STUDYING NATURAL COLOURED WATER SAMPLES (TABLE 1) AND HUMIC ACID FORTIFIED WATER SAMPLES (TABLE 2)
(CMTB = SO4 concentration by MTB method; CIC = SO4 concentration by IC method; DOC = dissolved organic carbon)

TABLE 1. IDENTIFICATION OF NATURAL SAMPLES

Sample Name   Origin                                        Colour H.U.
SO4-I         Pebbleloggitch, Atlantic Region               100
SO4-III       Moose River, Ontario Region                    60
SO4-IV        Dickie Lake, Ontario Region (Dorset area)     100
SO4-V         Atkins Brook, Atlantic Region                 160
SO4-VI        Upper Mercy River, Atlantic Region             90
SO4-VII       Mount Tom Brook, Atlantic Region              100
SO4-VIII      Sand Pond, Atlantic Region                    400

TABLE 2. HUMIC ACID (HA) FORTIFIED COLOURED WATERS
(All values are rounded design values)

Sample Name   SO4, Original   Colour   H.A. Spike   pH         Ba Spike
              Spike, ppm      H.U.     mg/L         Adjusted   ppm
SO4-X         0                60       6           4.3        0
SO4-XI        0               100      10           4.3        0
SO4-XII       0               250      25           4.3        0
SO4-XIII      0               400      40           4.3        0
SO4-XIV       2                60       6           4.3        0
SO4-XV        2               100      10           4.3        0
SO4-XVI       2               250      25           4.3        0
SO4-XVII      2               400      40           4.3        0
SO4-XXXI      2               400      40           4.3        1

Before subjecting the samples to MSA, the original approximate SO4 concentrations, x0, of the natural waters (Table 1) were determined. For H.A. fortified samples SO4-X to SO4-XIII (Table 2), the original SO4 concentrations were found to be very small; subsequently, each was spiked with 2 ppm SO4 to produce samples SO4-XIV to SO4-XVII, and with 5 ppm SO4 to produce samples SO4-XXII to SO4-XXV (Table 2). These spiked values were taken as original approximate concentrations, x0, of the fortified samples.

Each of the samples was then subsampled into four groups of triplicate subsamples according to the scheme in Fig. 1, and to each subsample was added a known SO4 stock volume to yield a final added concentration equal to 0.0 x0, 0.5 x0, 1 x0 and 2 x0. (Note that a stock solution of 1000 ppm SO4 was used so that only a very small volume was added. This resulted in a negligible dilution effect). Each of these subsamples was further subdivided into duplicate subsamples for analyses as shown in Figure 1. Effectively, there are six replicate analyses of each group of subsamples with a known added SO4 level.

The original spikes of 2 and 5 ppm SO4 (Table 2) were chosen because six out of seven natural samples used (Table 1) had concentrations within this range. The colour 250 H.U. of fortified waters (Table 2) was studied to provide a more evenly spaced colour range than the natural waters. Also the pH was adjusted to approximately 4.3 so that it is within the usual acid rain pH of 4-5. To avoid Ba precipitation (Sillen and Martell, 1964), only about 1 ppm Ba was added to two samples, SO4-XXX and SO4-XXXI, to see whether there is any interference or problem associated with its presence.

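The spiking scheme described above can be checked numerically. The sketch below computes the stock volume needed for each addition level and the resulting dilution of the original sample, under stated assumptions: the 1000 ppm stock and the 0.0, 0.5, 1 and 2 x0 levels come from the text, while the 100 mL subsample volume and all function names are hypothetical.

```python
def addition_levels(x0_ppm, multiples=(0.0, 0.5, 1.0, 2.0)):
    """Added SO4 concentrations (ppm) for an original concentration x0."""
    return [m * x0_ppm for m in multiples]


def stock_volume_ml(sample_volume_ml, added_conc_ppm, stock_conc_ppm=1000.0):
    """Stock volume v (mL) such that v * C_stock / (V + v) equals the added concentration."""
    return added_conc_ppm * sample_volume_ml / (stock_conc_ppm - added_conc_ppm)


def dilution_factor(sample_volume_ml, stock_ml):
    """Fraction of the original sample concentration remaining after the addition."""
    return sample_volume_ml / (sample_volume_ml + stock_ml)


x0 = 2.0          # ppm, e.g. a sample at the 2 ppm spike level
volume = 100.0    # mL, hypothetical subsample volume
for added in addition_levels(x0):
    v = stock_volume_ml(volume, added)
    print(f"add {added:.1f} ppm: {v:.3f} mL stock, dilution factor {dilution_factor(volume, v):.4f}")
```

Even at the highest addition level the stock volume is a fraction of a millilitre, so the original sample is diluted by well under 1%, in line with the negligible dilution effect noted in the text.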
Analyses
The ion chromatography analyses of SO4 were carried out using an automated Dionex 2100 system. All samples were filtered before being introduced into a 50 µL sample loop. The eluent was prepared by dissolving 2.25 g of Na2CO3 and 2.25 g of NaHCO3 in 10 L of deionized distilled water. The eluent flow rate was 2.0 mL/minute. The sample passed through a guard column or precolumn, a separator column, an anion fibre suppressor with dilute H2SO4 as regenerant, and finally a conductivity detector. The detected signal was amplified and converted to concentration through a Hewlett Packard recorder/integrator.

The colorimetric SO4 measurements were carried out using the automated methylthymol blue (MTB) method, coded as NAQUADAT No. 16306 (Environment Canada, 1981). Using equimolar solution of BaCl2 and MTB, the method allows Ba to react with SO4 at low pH; then at high pH, Ba reacts with MTB, leaving a grey uncomplexed MTB, which is measured and equated to SO4 concentration present in the sample.

Dissolved organic carbon (DOC in ppm) was analysed by the IR Analyser Method, Naquadat code 06101 (Environment Canada, 1981); pH was measured using a Radiometer PHM64 meter. Specific conductance was measured using a CDM-83 conductivity meter, and Ba by the Atomic Absorption direct aspiration technique, Naquadat code 56101 (Environment Canada, 1981). Apparent colour, in Hazen Units, was determined by visual comparison using a Hellige Aqua Tester.

Chemicals, Glass and Plasticware
Humic acid was purchased from Aldrich Chemical Co. Inc. (Lot No. HD061197); Na2SO4, NaHCO3 and Na2CO3 from J.T. Baker Chemical Company. All containers were cleaned and stored in distilled water for at least one week before use (Cheam and Chau, 1982). Stocks and standards were made in volumetric flasks, whereas the test samples were in plastic containers with sizes ranging from 50 mL to 500 mL.

4. RESULTS AND DISCUSSION
Compatibility and reliability of MTB and IC data
The applicability of the MSA requires that the recoveries be uniform, the addition line be straight and parallel to the standard line, the dilution effect be minimal, and the addition of standard be about 0.5, 1.0 and 2.0 times the original values in the samples (Figure 1). In total, we applied the MSA procedure to 15 different samples - seven natural coloured waters (Table 1) and eight humic acid fortified coloured waters (Table 2). Each of these 15 water samples was analyzed by IC and MTB methods before and after multiple standard additions.

The general behaviour of the MSA application to the MTB and IC methods for these 15 water samples is illustrated in Figure 2. The ordinate represents the analytical response or the amount
TABLE 3. COMPARISON OF CMTB RESULTS OBTAINED BY DIRECT ANALYSIS AND BY MSA IN NATURAL AND FORTIFIED SAMPLES*

Sample      Direct Analysis    MSA
            (amount found)     (amount present)
SO4-I        5.50 ± 0.52        6.64 ± 2.26
SO4-III      9.60 ± 0.20        9.21 ± 0.15
SO4-IV       4.73 ± 0.06       10.20 ± 4.94
SO4-V        6.13 ± 0.83       10.52 ± 2.52
SO4-VI       6.44 ± 0.48        6.80 ± 0.87
SO4-VII      5.17 ± 0.31        7.47 ± 1.74
SO4-VIII     9.57 ± 0.12        8.18 ± 0.10
SO4-XIV      3.07 ± 0.17        3.54 ± 0.40
SO4-XV       3.50 ± 0.12        4.11 ± 0.27
SO4-XVI      5.82 ± 0.12        6.40 ± 0.45
SO4-XVII     8.40 ± 0.13       10.95 ± 0.66
SO4-XXII     6.75 ± 0.16        7.42 ± 0.48
SO4-XXIII    5.38 ± 0.08        5.17 ± 0.32
SO4-XXIV     7.63 ± 0.23        7.42 ± 0.62
SO4-XXV     10.74 ± 0.06       12.14 ± 1.40

*CMTB = SO4 concentration by MTB method.

TABLE 4.
COMPARISON OF CIC RESULTS OBTAINED BY DIRECT ANALYSIS AND BY MSA IN NATURAL WATERS*

Sample      Direct Analysis    MSA
            (amount found)     (amount present)
SO4-I       2.37 ± 0.16        2.86 ± 0.18
SO4-III     8.95 ± 0.45        8.99 ± 0.32
SO4-IV      1.63 ± 0.19        1.61 ± 0.11
SO4-V       1.56 ± 0.31        1.67 ± 0.08
SO4-VI      4.96 ± 0.15        5.10 ± 0.12
SO4-VII     1.93 ± 0.36        2.08 ± 0.15
SO4-VIII    2.39 ± 0.42        1.95 ± 0.06

*CIC = SO4 concentration by IC method.

TABLE 5.
COMPARISON OF CIC RESULTS OBTAINED BY DIRECT ANALYSIS AND BY MSA IN HUMIC ACID FORTIFIED SAMPLES* (AT 2 PPM SO4 SPIKE LEVEL)

Sample      Direct Analysis    MSA
            (Amount Found)     (Amount Present)
SO4-X       0.06 ± 0.03
SO4-XI      0.06 ± 0.02
SO4-XII     0.08 ± 0.02
SO4-XIII    0.11 ± 0.02
SO4-XIV     2.17 ± 0.15        2.19 ± 0.19
SO4-XV      2.09 ± 0.07        2.03 ± 0.05
SO4-XVI     2.20 ± 0.19        2.09 ± 0.18
SO4-XVII    2.07 ± 0.02        2.05 ± 0.05
SO4-XXX     2.07 ± 0.03
SO4-XXXI    2.18 ± 0.03

*CIC = SO4 concentration by IC method.

found by direct analysis, whereas the abscissa represents the concentration added and the amount "present" by extrapolation of the addition line. The amount present is defined as the absolute abscissa value at the intersection of the abscissa line and the extrapolated least-squares addition line.
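The definition of the amount "present" can be illustrated with a short sketch. It assumes a straight addition line fitted by ordinary least squares; the function name and the example readings are hypothetical and are not data from this study.

```python
def amount_present(added, response):
    """Absolute x-intercept of the least-squares line through (added, response)."""
    n = len(added)
    mean_x = sum(added) / n
    mean_y = sum(response) / n
    # Ordinary least-squares slope and intercept of the addition line.
    sxx = sum((x - mean_x) ** 2 for x in added)
    sxy = sum((x - mean_x) * (y - mean_y) for x, y in zip(added, response))
    slope = sxy / sxx
    intercept = mean_y - slope * mean_x
    # The line crosses the abscissa at x = -intercept / slope; the amount
    # "present" is the absolute value of that crossing point.
    return abs(-intercept / slope)


# Hypothetical readings: ppm SO4 added versus instrument response.
added = [0.0, 2.0, 4.0, 6.0]
response = [3.0, 6.0, 9.0, 12.0]
print(amount_present(added, response))
```

The extrapolation is only meaningful when the addition line is straight; a curved line would need to be detected (for example from the residuals) before this calculation is trusted.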
The MTB line (Figure 2) is curved, which indicates existence of interference and hence uncertainty of data. This curvature makes extrapolation meaningless. But if the line was extrapolated as a straight line, the amount "present" would be higher than the amount found, which is unacceptable. Table 3 summarizes the two types of amounts and indeed indicates that the amount present is in general higher than the amount found. Table 3 further shows that the results for samples SO4-XIV to SO4-XVII increase with colour and DOC, and are much higher than the expected 2 ppm; likewise, the results for samples SO4-XXII to SO4-XXV increase with colour and DOC and are higher than the expected 5 ppm. Thus, the MTB results are not reliable. Finally, the MTB results are higher than, thus not compatible with, the IC results (Table 3 vs. Tables 4, 5 and 6).
TABLE 6.
COMPARISON OF CIC RESULTS OBTAINED BY DIRECT ANALYSIS AND BY MSA IN HUMIC ACID FORTIFIED SAMPLES* (AT 5 PPM SO4 SPIKE LEVEL)

Sample       Direct Analysis    MSA
             (Amount Found)     (Amount Present)
SO4-XXII     5.68 ± 0.08        5.54 ± 0.15
SO4-XXIII    5.09 ± 0.07        5.03 ± 0.02
SO4-XXIV     5.16 ± 0.03        5.05 ± 0.10
SO4-XXV      5.09 ± 0.06        5.06 ± 0.17

*CIC = SO4 concentration by IC method.

The IC standard and addition lines representing each of 15 samples are depicted in Figure 3 to Figure 17, respectively. For natural waters, Figures 3-9 present the MSA plots of IC analyses. In every case, except for SO4-III, the criteria of uniformity and parallelism dictated by the MSA are obeyed. For SO4-III, we observed slight downward curvature at the third addition (= 2 x0), which corresponds to a high SO4 level of nearly 30 ppm. To avoid this concentration effect, we diluted the solutions two times, then obtained a straight addition line (Figure 4) and calculated the amounts as with the other samples.
Fig. 2. The general behavior of MSA application to the MTB and IC methods.
Fig. 3. IC standard addition curve for SO4-I (abscissa: ppm SO4 added).
Fig. 4. IC standard addition curve for SO4-III (abscissa: ppm SO4 added).
Fig. 5. IC standard addition curve for SO4-IV (abscissa: ppm SO4 added).
Table 4 compares the SO4 amounts obtained by direct analysis (amount found) and by MSA (amount present) for the natural waters, and shows good agreement within experimental errors. This gave assurance that IC produces reliable results in the presence of organic matter in natural waters.

For humic acid fortified waters, at colour 50, 90, 225 and 440 H.U., the addition lines (Figures 10-13) also indicate uniformity and parallelism. In these four samples, the original spike was 2 ppm, so the "true" SO4 value can be taken as 2 ppm since the background concentrations were very small (Table 5, SO4-X to SO4-XIII). Table 5 also shows good agreement between the two SO4 amounts determined by direct analysis and by MSA. These amounts represent for all practical purposes 100% of the true value, that is: amount found = amount present = true amount, which indicates data reliability.

For four other humic acid fortified waters, at higher SO4 levels, Figures 14-17 and Table 6 clearly indicate excellent agreement between the three amounts, found-present-true, thus adding further substantiation that the data obtained by IC are unaffected by various types of interferences from colour and organic matter, and are therefore reliable.

Other Possible Interferents
Crowther (1980) tested possible interference from pH, Fe(II), Mn(VII), humic acid, tannic acid and lignin sulfate on a single standard addition of 10 ppm SO4 and found that the IC recoveries were satisfactory.

The presence of barium may cause chemical interference. Before the introduction of Ba to the samples to be tested, we estimated the amount of Ba which could be added without precipitation of BaSO4, using solubility product data (Sillen and Martell, 1964). One ppm was estimated to be safe and this amount was added to two HA fortified waters having colour of 50 and 440 H.U. and 2 ppm SO4. The recoveries of SO4 as determined by IC were ~100% (Table 5, SO4-XXX, SO4-XXXI). These results indicate that Ba, also, causes no interference and hence further substantiate the reliability of the IC data and the validation of the IC method.
Figs. 6-9. IC standard addition curves for the remaining natural water samples (abscissa: ppm SO4 added). Legible site labels: Atkins Brook, Upper Mercy River, Mount Tom Brook.
Figs. 10-13. IC standard addition curves for the HA fortified samples at the 2 ppm SO4 spike level (abscissa: ppm SO4 added).
Figs. 14-17. IC standard addition curves for the HA fortified samples at the 5 ppm SO4 spike level (abscissa: ppm SO4 added).
Approaches for Salvaging Historical Data

Having established the quality of data generated by the IC and MTB methods, we proceeded to explore and evaluate approaches in order to salvage the historical MTB colorimetric data. This evaluation is lengthy and will be communicated elsewhere. The following briefly summarizes the findings:

1) There is no simple and universal correction factor which readily converts the historical data to true SO4 values.

2) However, we have found that for a specific amount and nature of organic matter, there exists a relationship between SO4 determined by MTB and SO4 determined by IC. Thus, to salvage historical MTB data, we simply obtain case by case the two types of SO4 values, relate them by a polynomial equation, and interpolate the corresponding historical values to obtain the expected true SO4 values.
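The case-by-case procedure just described can be sketched as follows. This is only an illustration under stated assumptions: a quadratic is assumed for the polynomial, the fit is an ordinary least-squares fit solved through the normal equations, and the paired MTB/IC values are invented for the example. The function names `polyfit` and `mtb_to_ic` are hypothetical.

```python
def polyfit(xs, ys, degree):
    """Least-squares polynomial coefficients, lowest order first."""
    m = degree + 1
    # Normal equations A c = b, with A[r][c] = sum(x^(r+c)), b[r] = sum(y * x^r).
    A = [[sum(x ** (r + c) for x in xs) for c in range(m)] for r in range(m)]
    b = [sum(y * x ** r for x, y in zip(xs, ys)) for r in range(m)]
    # Gaussian elimination with partial pivoting.
    for col in range(m):
        pivot = max(range(col, m), key=lambda r: abs(A[r][col]))
        A[col], A[pivot] = A[pivot], A[col]
        b[col], b[pivot] = b[pivot], b[col]
        for r in range(col + 1, m):
            factor = A[r][col] / A[col][col]
            for c in range(col, m):
                A[r][c] -= factor * A[col][c]
            b[r] -= factor * b[col]
    # Back substitution.
    coeffs = [0.0] * m
    for r in range(m - 1, -1, -1):
        tail = sum(A[r][c] * coeffs[c] for c in range(r + 1, m))
        coeffs[r] = (b[r] - tail) / A[r][r]
    return coeffs


def mtb_to_ic(mtb_value, coeffs, mtb_min, mtb_max):
    """Interpolate an IC-equivalent SO4 value; refuse to extrapolate."""
    if not mtb_min <= mtb_value <= mtb_max:
        raise ValueError("historical value lies outside the calibrated MTB range")
    return sum(c * mtb_value ** k for k, c in enumerate(coeffs))


# Hypothetical paired determinations (ppm SO4) for one site; not measured data.
mtb = [2.1, 3.4, 4.8, 6.5, 8.3]
ic = [1.0, 2.0, 3.0, 4.0, 5.0]
coeffs = polyfit(mtb, ic, degree=2)
print(mtb_to_ic(5.0, coeffs, min(mtb), max(mtb)))
```

Restricting the conversion to the calibrated range reflects the wording "interpolate"; a separate fit would be needed for each site or group of sites with similar organic matter.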
The case by case treatment can involve a specific site, river, lake, or a group of them which have similar amount and nature of organic matter.

ACKNOWLEDGEMENT
We thank Dr. T. Pollock for the many enlightened discussions and Dr. J. Kerekes for providing the samples from Atlantic region; R. MacCrae and P. Campbell for supplying the samples from Ontario region.

REFERENCES
Agemian, H. and Cheam, V., 1978. Anal. Chim. Acta, 101, pp. 193-197.
Bader, M., 1980. J. Chem. Ed., 57(10), pp. 703-706.
Cheam, V., 1982. "Special study on soft and coloured waters pertinent to LRTAP watershed study areas - Ionic balance, acidity, alkalinity, major ions and physical parameters IRQC 88 to 91". NWRI Manuscript No. 61, AMD-6-82-VC.
Cheam, V. and Chau, A.S.Y., 1982. "Manual for the bimonthly interregional quality control studies". NWRI Manuscript No. 48, AMD-6-82-VC.
Cronan, C.S., 1979. Anal. Chem., 51(8), pp. 1333-1335.
Crowther, J., 1980. "Sulfate analysis for streams and lakes from Dorsett area". Memorandum to S. Villard and P. Dillon, Ontario Ministry of the Environment.
Environment Canada, 1981. "Analytical Methods Manual". Inland Waters Directorate, Water Quality Branch, Ottawa.
Julshamm, K. and Braekhan, O.R., 1975. At. Absorpt. Newsletter, 14(3), pp. 49-52.
Kalivas, J.H., 1983. Anal. Chem., 55, pp. 565-567.
Kerekes, J., Howell, G., Beauchamp, S. and Pollock, T., 1982. "Characterization of three lake basins sensitive to acid precipitation in central Nova Scotia (June, 1979 to May, 1980)". Int. Revue Ges. Hydrobiol., 67(5): 679-694.
Kerekes, J. and Pollock, T., 1983. Can. J. Fish. Aquat. Sci., 40(12), pp. 2260-2261.
Kerekes, J., Howell, G., and Pollock, T., 1984. Problems associated with sulfate determination in coloured, humic waters in Kejimkujik National Park, Nova Scotia (Canada). Verh. Internat. Verein. Limnol., 22: 1811-1817.
Klein, R. Jr., and Hach, C., 1977. Am. Lab, July 1977, pp. 21-27.
Pollock, T., 1983. "Determination of sulfate in Atlantic Canada surface waters". Draft of a report.
Sillen, L.G. and Martell, A.E., 1964. Special Publication No. 17, The Chemical Society, London.
Underwood, J.K., Vaughan, H.H., Ogden, J.G., III, and Mann, C.G., 1982. "Acidification of Nova Scotia Lakes II; Ionic Balances in Dilute Waters". Nova Scotia Department of the Environment (Halifax, Nova Scotia). Technical Report.
Underwood, J.K., McCurdy, R.F. and Borgal, D., 1983. Effects of colour interference on ion balances in Atlantic Canada lake waters using methylthymol blue sulfate results: Some preliminary observations. Workshop Proceedings (Kejimkujik Calibrated Catchments Program), Life Science Centre, Dalhousie University, Halifax, Nova Scotia (April 26, 1983). Ed. J. Kerekes.
Watt, W.D., Scott, C.D. and White, W.J., 1983. Can. J. Fish. Aquat. Sci., 40, pp. 462-473.
Workshop on Chemistry of Organic Waters, January 20, 1983. NWRI, CCIW, Burlington, Ontario.
THE IMPORTANCE OF DESIGN QUALITY CONTROL TO A NATIONAL MONITORING PROGRAM

R.E. KWIATKOWSKI
Water Quality Branch, 10th Floor, Place Vincent Massey, Inland Waters Directorate, Ottawa, Canada, K1A 0E7

ABSTRACT: Water quality monitoring within Canada has been carried out by the Water Quality Branch, Department of the Environment, since 1970. Recent analyses of national (coast to coast) data for a variety of water quality parameters have identified an inherent difficulty with the interpretability of the data sets stored on NAQUADAT (Canada's National Water Quality Data File). A lacustrine system (Lake Ontario) will be used as a case example of this difficulty, described as network design assurance. The importance of network design assurance, both spatially and temporally, to regional and national data sets, will be discussed, and a possible solution presented. Comments on the design of lotic networks under the newly implemented Water Quality Federal-Provincial Agreements will be discussed in relation to the above difficulty, identifying Canada's present attempt to produce data sets capable of generating statistically and ecologically valid national reports.
INTRODUCTION

Water represents one of Canada's most valuable resources, covering some 7.6 percent of its surface. There is no substitute for water. The survival of all forms of life depends upon an adequate supply of water of acceptable quality. Thus, a sound knowledge of water quality is essential to all levels of government for the management of present water uses and for the planning of future uses. While management responsibilities for water in Canada are shared between the provinces and the federal government, the federal government plays an important leadership role, particularly when addressing water quality on a national level. The Water Quality Branch (WQB), Inland Waters Directorate (IWD), Department of the Environment (DOE) is responsible for providing this leadership. Since its conception in 1970, the Water Quality Branch has carried out water quality monitoring to provide scientific and technical information and advice on water quality in order to promote the conservation and enhancement of the quality of Canada's inland water resources for the economic and social benefit of all Canadians.
It is envisaged that nationwide (coast to coast) issues (e.g. acid rain, pesticides, etc.) will continue to be of concern and major energy developments will be the cause for water quality concerns affecting entire regions within Canada in the future. Proposed continental water diversion schemes, such as the North American Water and Power Alliance to divert water from Alaska and the Yukon to the southwestern United States, and the GRAND Canal (Great Recycling and Northern Development Canal) project, involving the building of a dam across the mouth of James Bay and diverting water through the Great Lakes to the central and eastern United States, pose significant ecological concerns. A sound knowledge of the water quality in Canada is basic to an assessment of the environmental and economic impact of such developments and it is the federal government that has the mandate and the responsibility to collect this national water quality data.

To meet this national objective the WQB consists of a headquarters, located in Ottawa, and five regional offices (Moncton - Atlantic Region, Longueuil - Quebec Region, Burlington - Ontario Region, Regina - Western and Northern Region, and Vancouver - Pacific and Yukon Region; Figure 1). Water quality monitoring programs carried out by the regional offices are in direct response to the needs of the region, as dictated by the Regional Director, while headquarters (WQB) carries out a functional guidance role. It should be noted that the WQB is operational in nature. Research and related support services are provided by the research institutes, National Water Research Institute (NWRI) and National Hydrology Research Institute (NHRI). The Branch in turn provides operational support to both institutes.

Figure 1: Regional offices and headquarters, Water Quality Branch, Canada.
Recent requests, such as that from the Pearse Commission's Inquiry on Federal Water Policy, for data on a variety of water quality parameters from various river basins across Canada have revealed a major difficulty with the interpretability of the data stored on NAQUADAT (National Water Quality Data Base, Whitlow and Lamb, 1983). This difficulty, referred to in this manuscript as "design quality assurance", must be considered when interpreting national (coast to coast) or regional data to define differences between river basins. A lacustrine system (Lake Ontario) will be offered as a case example. Comments on how to establish a lacustrine and a lotic system monitoring program to overcome the difficulty will also be supplied.

The Problem

There are various physical and biochemical processes involved in the storage and transport of pollutants in a lacustrine system. Concentrations of pollutants in the aqueous phase are maximum at the point of entry into the aquatic system.
These inputs can undergo coastal entrapment due to specific patterns and thermal structures characteristic of inshore areas (Csandy, 1970). As a result, nearshore environments often display significantly higher concentrations of inputs and greater variability than do the offshore areas (Beeton and Edmondson, 1972). As a result concentrations of various pollutants are not stationary, but rather they form a continuum, spatially and temporally. A common objective of monitoring or surveillance programs is to sample this continuum and estimate a mean value (for a given area for a given time period) for a variety of water quality variables:

$$\bar{X}_A = \frac{1}{N} \sum_{i=1}^{N} \bar{X}_{c,i} \qquad [1]$$

$$\bar{X}_{c,i} = \frac{1}{n} \sum_{j=1}^{n} x_{ij} \qquad [2]$$

where:
$\bar{X}_A$ is the annual average value for the given water quality parameter.
$N$ is the number of sampling trips for the given year, i = 1, 2, 3 .... N.
$\bar{X}_{c,i}$ (written $\bar{X}_c$ below) is the sampling trip average value for the given water quality parameter.
$x_{ij}$ is the concentration for the given water quality parameter during the ith sampling trip, at the jth station.
$n$ is the number of stations j for sampling trip i.
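The two averages defined above can be written out as a small sketch. It assumes the annual value is the unweighted mean of the trip averages, as the definitions suggest; the function names and the concentrations are illustrative only.

```python
def cruise_mean(station_values):
    """Sampling trip average: mean over the stations visited on one trip."""
    return sum(station_values) / len(station_values)


def annual_mean(trips):
    """Annual average: mean of the sampling trip averages."""
    trip_means = [cruise_mean(values) for values in trips]
    return sum(trip_means) / len(trip_means)


# Hypothetical concentrations: two trips with different numbers of stations.
trips = [[2.0, 4.0], [1.0, 2.0, 3.0]]
print([cruise_mean(t) for t in trips])
print(annual_mean(trips))
```

Note that averaging the trip means gives each trip equal weight, which differs from pooling all observations when trips visit different numbers of stations.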
82
Subsequent comparisons o f t h e mean values ( a n n u a l l y , sampling t r i p ,
XA,
o r by
Xc) u t i l i z i n g a v a r i e t y o f parametric o r nonparametric
s t a t i s t i c a l techniques. a r e then performed, e i t h e r s p a t i a l l y t o determine areas o f s i g n i f i c a n t d e t e r i o r a t i o n , o r t e m p o r a l l y t o determine trends. Major d i f f i c u l t i e s i n o b t a i n i n g an adequate e s t i m a t e o f t h e mean value
XA
include:
t h e s p e c i f i c a t i o n o f s t a t i o n l o c a t i o n t o supply adequate
s p a t i a l coverage; adequate sampling frequency, t o d e s c r i b e seasonal f l u c t u a t i o n s ; and s u f f i c i e n t observations, t o reach a d e s i r e d confidence i n t e r v a l about t h e mean value; a l l w i t h i n a predetermined budget,
Thus
t h e l o c a t i o n , number and frequency o f measurements a r e based on t h e o b j e c t i v e s o f t h e program. t h e a v a i l a b l e s c i e n t i f i c knowledge, and resource a v a i l a b i l i t y .
U n f o r t u n a t e l y these v a r i a b l e s a r e n o t constant
between regions, and o f t e n n o t between programs w i t h i n a r e g i o n . Consequently t h e e f f o r t p u t i n t o t h e e s t i m a t e
(X)
o f t h e mean value (11)
o f t h e water q u a l i t y v a r i a b l e v a r i e s s u b s t a n t i a l l y between programs, depending on t h e issues o r parameters o f i n t e r e s t . When n a t i o n a l ( c o a s t t o c o a s t ) assessments, f o r any g i v e n parameter, a r e c a r r i e d out, various d a t a f i l e s f r o m t h e v a r i o u s programs c a r r i e d o u t w i t h i n t h e f i v e WQB regions a r e simply brought t o g e t h e r f o r e i t h e r comparative purposes, o r t o e s t a b l i s h a n a t i o n a l average c o n d i t i o n . Rarely i s design v a r i a b i i t y between programs taken i n t o account i n t h e interpretation of the r e u l t s .
I t i s a simple m a t t e r t o o b t a i n t h e number
and l o c a t i o n o f s i t e s ex s t i n g i n t h e v a r i o u s r i v e r basins from t h e computer f i l e s ; however, i t i s n o t always c l e a r how t h e number o f s t a t i o n s , t h e i r l o c a t i o n and parameters measured were chosen.
Yet w i t h o u t
t h i s i n f o r m a t i o n ( d e s i g n q u a l i t y assurance), gross m i s r e p r e s e n t a t i o n o f p o l l u t a n t c o n c e n t r a t i o n s on a r e g i o n a l o r n a t i o n a l s c a l e can occur, making comparisons, though s t a t i s t i c a l l y s i g n i f i c a n t . e c o l o g i c a l l y i n s i g n i f l c a n t . The L a c u s t r i n e System Concerned about t h e d e t e r i o r a t i o n o f t h e water q u a l i t y o f t h e lower Great Lakes, t h e Governments o f Canada and t h e U n i t e d States signed t h e Great Lakes Water Q u a l i t y Agreement I n 1972.
Prior t o the signing of the
Agreement, m o n i t o r i n g o f Lake O n t a r i o was i n c o n s i s t e n t .
No two years
d i s p l a y e d s i m i l a r s t a t i o n p a t t e r n s , o r c r u i s e (sampling t r i p ) frequency. I n 1974 a c o n s i s t e n t open l a k e s u r v e i l l a n c e program was designed f o r Lake O n t a r i o t o m o n i t o r v a r i o u s water q u a l i t y parameters, i n c l u d i n g t h e biomass parameter c h l o r o p h y l l
a.
The s t a t i o n p a t t e r n was designed t o g i v e an
o v e r a l l view o f t h e lake, w i t h maximum sampling o c c u r r i n g i n t h e two t o t e n k i l o m e t r e range from shore ( t h e area which showed maximum v a r i a b i l i t y I n t h e past. F i g u r e 2 ) .
Further details on station location, sampling frequency and parameters sampled, for the Lake Ontario Surveillance Program 1968-1980, can be found in Kwiatkowski and Neilson (1983).

Figure 2: Station location for Lake Ontario Surveillance program, 1974-1980. (Station pattern for Lake Ontario, 1974-80. Legend: station deleted 1975; station not sampled 1975-1976; station deleted 1977; station added 1975; station added 1976; station added 1977.)

Three years of surveillance data from Lake Ontario, 1974-1976, for the water quality parameter chlorophyll a were subjected to a regression model developed by El-Shaarawi and Shah (1978). The model transformed the original observations, so that the transformed values approximately satisfy the assumptions required for the application of the analysis of variance; i.e. the mean value can be expressed as a linear combination of the influencing factors, constancy of variance, additivity of the error term, and normality of distribution. An additive linear model with seasonal and spatial components was then fitted to the transformed data. A hierarchical classification procedure using estimates of spatial effects was then applied to divide the lake into statistically homogeneous zones (p ≤ 0.05).
In order to provide an overview, a composite zonation map of the 1974-1976 data was drawn (Figure 3), with a point source zone (a region of direct input, with maximum concentrations and maximum variability), an inshore zone (a region where natural dilution had reduced concentrations and variability to moderate levels), and an offshore zone (the large, deep central portion of the lake which can only be affected by prolonged loadings).

Figure 3: A composite zonation map of chlorophyll a (Lake Ontario composite of chlorophyll zones: point source, inshore, offshore). Taken from Kwiatkowski 1978.

Although each year produced slightly different seasonal patterns with respect to chlorophyll a (Kwiatkowski, 1978), similarities, typified by the 1974 data, were evident. Cruise averages for the 1974 chlorophyll a data for each zone described in Figure 3 are plotted against time in Figure 4. The chlorophyll a concentrations for each zone displayed spring and fall peaks typical of temperate lakes. There was a progression for the zones closest to the nutrient inputs to have higher chlorophyll concentrations. Similarly, the timing of the spring peak was progressively later, the farther away from shore the sample was taken. The movement of the thermal bar (Rodgers, 1965) is probably the most likely explanation for this result. It is interesting to note that in zone 3 (point source) the spring peak had already occurred before the first cruise (i.e. the peak occurred before April).

Figure 4: Cruise mean value obtained for chlorophyll a concentrations, 1974, from composite zonation map. (Seasonal pattern for chlorophyll zones, Lake Ontario 1974; abscissa: date, January to November.)

Evidently a bias can be introduced in the annual ($\bar{X}_A$) or cruise means ($\bar{X}_c$) by simply moving stations from one zone to another.
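A minimal numerical sketch makes the mechanism of this bias concrete. The zone means, the station counts and all names below are invented for illustration and are not taken from the Lake Ontario data; the only assumption is that every station returns the mean concentration of its zone.

```python
# Hypothetical zone mean chlorophyll a concentrations (ug/l); illustration only.
ZONE_MEANS = {"point source": 9.0, "inshore": 5.0, "offshore": 2.5}


def expected_cruise_mean(stations_per_zone):
    """Unweighted cruise mean when each station reports its zone mean."""
    total_stations = sum(stations_per_zone.values())
    weighted_sum = sum(ZONE_MEANS[zone] * count
                       for zone, count in stations_per_zone.items())
    return weighted_sum / total_stations


# Two hypothetical designs with the same total number of stations.
open_lake_design = {"point source": 4, "inshore": 8, "offshore": 20}
nearshore_design = {"point source": 16, "inshore": 12, "offshore": 4}

print(expected_cruise_mean(open_lake_design))
print(expected_cruise_mean(nearshore_design))
```

The lake is identical in both cases; only the allocation of stations among zones changes, yet the simple cruise mean shifts substantially.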
The importance of this bias can best be explained through a hypothetical example, using the 1974 Lake Ontario chlorophyll a data. Assume that there are two Lake Ontarios identical in every aspect, except one is located in the WQB-Ontario Region, and one is located in the WQB-Pacific and Yukon Region. Both regions establish monitoring programs. Due to budget restrictions, both regions can only afford to do 32 stations. In an attempt to better define the temporal variability inherent in biological sampling, 14 cruises are conducted by each region. Ontario Region, cognizant of the fact that the water resource in Canada is a shared federal-provincial responsibility, monitors mainly the open lake component, with a few point source stations (Figure 5). Pacific and Yukon Region, interested in defining the pollutant movement from point sources and aware that maximum variability occurs in the nearshore waters, samples mainly near the known inputs, with some open lake stations to define pristine conditions (Figure 6). The mean cruise values obtained by the two regions, as well as those obtained from the true surveillance program, are plotted in Figure 7, and details on the number of observations (N), the average value ($\bar{X}_c$), standard deviation (SD) and the coefficient of variation (CV), by cruise, are given in Table 1.

Figure 5: Hypothetical station locations for Lake Ontario program, Water Quality Branch, Ontario Region.
Though the seasonal pattern is quite similar for all three sampling designs, mean cruise concentrations, especially during the spring period, are significantly (p ≤ 0.01) different between the Ontario design and the Pacific and Yukon design (Table 1). Of the fourteen cruises conducted, only two (cruises 11 and 13) were not significantly different from one another at the 0.05 level. The difference in chlorophyll a concentrations between the two regions, expressed as a percentage, varied from 8.1 to 133.2%, with an annual average difference of 44.5% (Table 2).

Dobson (1981) has indicated that chlorophyll a concentrations in surface waters can be used as a trophic index, with values less than 2 µg/l indicating oligotrophic waters, 2-6 µg/l mesotrophic and greater than 6 µg/l eutrophic.
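These class limits translate directly into a small lookup, sketched here. The function name is hypothetical, and treating exactly 2 and exactly 6 µg/l as mesotrophic is an assumption read from the wording "less than 2" and "greater than 6".

```python
def trophic_status(chlorophyll_a_ug_per_l):
    """Classify surface water by chlorophyll a concentration (ug/l)."""
    if chlorophyll_a_ug_per_l < 0:
        raise ValueError("concentration cannot be negative")
    if chlorophyll_a_ug_per_l < 2.0:
        return "oligotrophic"
    if chlorophyll_a_ug_per_l <= 6.0:
        return "mesotrophic"
    return "eutrophic"


for value in (1.5, 4.0, 7.5):
    print(value, trophic_status(value))
```

Because the classes are sharp thresholds, a modest shift in a cruise mean can move a lake from one class to the next.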
If these descriptors were to be accepted as guidelines by the WQB, data generated by the Ontario Region's network would indicate that their lake is moderately mesotrophic, with maximum concentrations occurring in early summer (5.20 µg/l), and during the fall period (8.30 µg/l) after thermal stratification breakdown (Figure 7). The late summer minimum of 2.77 µg/l would be described as a result of nutrient depletion of the epilimnetic waters. The network design by Pacific and Yukon Region would indicate that their lake is eutrophic with maximum levels occurring in the spring (8.41 µg/l), summer (6.89 µg/l) and fall (9.14 µg/l) periods. Low (mesotrophic) levels are only reached in late summer due to nutrient depletion.

If a national (across Canada) comparison of the trophic status of lakes, as predicted by the biomass indicator chlorophyll a, were to be required, the information would be extracted from the two data sets within NAQUADAT. Possible conclusions drawn from comparisons of these two data sets would be that in Ontario, though fall levels of chlorophyll a are worrisome and deserve future study, the lake does not require any remedial action. In the Pacific and Yukon Region chlorophyll a levels are significantly higher than in Ontario Region for all seasons. The lake is highly eutrophic during the spring and fall periods, with high mesotrophic levels reported in the summer. Remedial action (nutrient reduction) is warranted.

Figure 6: Hypothetical station locations for Lake Ontario program, Water Quality Branch, Pacific and Yukon Region.

Figure 7: Cruise mean values for chlorophyll a concentrations, 1974, from the Ontario, Pacific and Yukon, and Surveillance programs. (Seasonal pattern for chlorophyll from three sampling scenarios, Lake Ontario, 1974; abscissa: date, January to December.)

One possible solution to the above stated problem is areal weighting. By assuming:
that a lake can be partitioned into n homogeneous sampling populations or strata, each of area $\Delta a$; that the sample stations are independent and randomly obtained; and that the sample mean is normally distributed; N, the number of observations needed to establish the true cruise mean level at any given confidence interval, can be calculated from:

$$N = \frac{t^{2} s^{2}}{L^{2}} \qquad (3)$$

where $s$ is the estimate of $\sigma$ based on the available information, and has m degrees of freedom. The value of t to be used in this formula is the critical value read from the table of Student's t for m degrees of freedom, at the level of significance corresponding to the required confidence coefficient, and L is the specified error (Mandel 1964, Green 1979, Ward et al. 1979, Dunnette 1980 and Nelson and Ward 1981). Thus, given the required values of L and the confidence coefficient, N can be computed from a subsample where $s$ and m are known.
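As an illustration only, the sample-size calculation described above can be sketched in a few lines. The sketch assumes the relation takes the common form N = (t·s/L)², with t supplied by the user from a Student's t table, as the text describes. The function name, the choice of t = 1.96 (a large-sample value) and the specified error of 1.0 µg/l are hypothetical; the standard deviation of 2.65 µg/l is the annual value reported for the Ontario network in Table 1.

```python
import math


def required_sample_size(s, t, L):
    """Number of observations N = (t * s / L)**2, rounded up.

    s : estimate of the standard deviation from a subsample
    t : critical value of Student's t for m degrees of freedom
        (read from a table by the user; not computed here)
    L : specified error, in the same units as s
    """
    if L <= 0:
        raise ValueError("the specified error L must be positive")
    return math.ceil((t * s / L) ** 2)


# Hypothetical usage: s = 2.65 ug/l, t = 1.96, L = 1.0 ug/l
n_stations = required_sample_size(s=2.65, t=1.96, L=1.0)
print(n_stations)
```

Halving the specified error roughly quadruples the required number of observations, which is why the choice of L tends to dominate the cost of a network.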
Once the cruise means for each stratum are determined, and by assuming a simple linear dilution between the mid-points of the various strata, a whole lake mean cruise value can be estimated by:

$$\bar{x}_{cw} = \frac{\sum_{j=1}^{n} \bar{x}_{j}\, \Delta a_{j}}{\sum_{i=1}^{n} \Delta a_{i}} \qquad (4)$$

where:
$\bar{x}_{cw}$ is the whole lake weighted surface concentration for any given cruise,
$\bar{x}_{j}$ is the average surface concentration for stratum j, j = 1 to n,
$\Delta a_{i}$ is the surface area between the midpoints of each stratum, i = 1 to n.
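The areal weighting just defined amounts to an area-weighted average of the stratum means. A minimal sketch follows, assuming the weighted mean is the sum of stratum means times stratum areas divided by the total area. The function name and the stratum means and areas in the usage lines are hypothetical and are not taken from the Lake Ontario data.

```python
def area_weighted_mean(stratum_means, stratum_areas):
    """Whole-lake weighted mean: sum(mean_j * area_j) / sum(area_i)."""
    if len(stratum_means) != len(stratum_areas):
        raise ValueError("one area is needed for each stratum mean")
    total_area = sum(stratum_areas)
    if total_area <= 0:
        raise ValueError("total area must be positive")
    weighted_sum = sum(m * a for m, a in zip(stratum_means, stratum_areas))
    return weighted_sum / total_area


# Hypothetical strata: point source, inshore, offshore
means_ug_per_l = [9.0, 6.0, 3.0]      # stratum cruise means
areas_km2 = [500.0, 4500.0, 14000.0]  # stratum surface areas
print(area_weighted_mean(means_ug_per_l, areas_km2))
```

Because each stratum enters through its area rather than its station count, adding stations to a small stratum sharpens that stratum's mean without pulling the whole-lake value toward it.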
It should be noted that $\Delta a$ can be replaced by $\Delta v$ (the volume of each stratum). However, this would require a significantly greater sampling effort (with depth), especially during the summer stratification period. Clustering of the data into statistically homogeneous zones (or strata), as done in Figure 3, allows for greater effort, in terms of station numbers, to improve $\bar{x}_{j}$, the estimated mean concentration of stratum j, without adversely affecting the whole lake average. Increased effort, even within fixed networks, is an inherent component in all monitoring programs.
Within the surveillance program on Lake Ontario, a program originally designed to be stable with respect to station location and sample frequency, changes have occurred in response to the needs of the program. Between the years 1974 and 1980, changes in station location (Figure 2) have resulted in a 40% increase in effort for the point source area, a 13% increase in effort for the inshore area, and a 6% decrease in effort for the offshore area. Thus comparison of whole lake averages between historical and presently collected data must take this change into account to ensure that changes have truly occurred and are not due to relocation of sampling effort.

Areal weighted values (equation 4) for the 14 cruises conducted in 1974 were calculated using the three sampling scenarios previously described (i.e. Ontario Region, Pacific and Yukon Region, and the surveillance program), and the three strata displayed in Figure 3. The calculated values are plotted in Figure 8. The spring period still produces the greatest differences between the various scenarios; however, differences between the Ontario Region's and Pacific and Yukon Region's sampling networks (the two extreme cases) were reduced to an average of 8.0% from 44.5% (Table 2). The differences found in the first three (spring) cruises were reduced from an average of 107.9% to 15.2%. No significant (p ≤ 0.05) differences in weighted mean cruise concentrations were found between the Ontario and the Pacific and Yukon network designs.

Figure 8: Areal weighted cruise mean values for chlorophyll concentrations, 1974, from the Ontario, Pacific and Yukon, and surveillance programs. (Panel title: Seasonal pattern for chlorophyll with weighting, Lake Ontario, 1974; horizontal axis: date, January to December.)
A greater reduction in cruise mean differences between the various scenarios can be obtained by establishing progressively smaller strata ($\Delta a$'s). Application of the model (El-Shaarawi and Shah, 1978) on the 1974 chlorophyll a data at the p ≤ 0.25 significance level divided the lake into seven homogeneous zones (Figure 9). It should be pointed out, however, that as the surface areas ($\Delta a$) of each homogeneous stratum are reduced, the number of strata, and therefore the number of sampling stations (N), increases.
As $\Delta a \to 0$, $N \to \infty$.
Table 1: Number of observations (N), mean, standard deviation (SD) and coefficient of variation (CV) for chlorophyll cruises.

            Ontario                       Pacific and Yukon             Surveillance Program
Cruise      N     Mean    SD     CV       N     Mean    SD     CV       N     Mean    SD     CV
1**         30    3.28   2.95   89.9      32    7.65   4.69   61.3      81    5.28   4.28   81.1
2**         30    4.12   3.61   87.6      32    8.41   4.79   56.9      82    6.15   4.29   69.8
3**         30    4.01   3.19   79.6      31    7.47   4.29   57.4      83    6.29   4.28   68.0
4**         32    3.71   1.80   48.5      32    5.04   2.01   39.9      85    4.70   2.05   43.6
5**         32    5.20   1.58   30.4      30    6.40   2.19   34.2      83    5.48   1.92   35.0
6*          15    5.20   2.45   47.1      25    6.89   3.54   51.4      55    5.60   3.01   53.8
7*          20    3.31   1.74   52.6      21    4.39   2.42   55.1      62    3.92   2.05   52.3
8**         32    3.45   1.61   46.7      32    5.68   7.38  129.9      85    4.43   4.70  106.1
9**         32    2.77   1.01   36.5      32    4.11   2.29   55.7      83    3.45   1.88   54.5
10**        31    3.98   1.41   35.4      31    5.26   2.34   44.5      84    4.61   2.08   45.1
11          31    6.04   3.32   55.0      32    6.53   4.46   68.3      84    5.96   4.02   67.4
12**        25    5.45   1.74   31.9      26    6.82   2.33   34.2      71    6.00   2.05   34.2
13          27    8.30   2.53   30.5      31    9.14   3.16   34.6      78    8.77   2.73   31.1
14**        31    4.68   1.32   28.2      30    6.54   3.02   46.2      79    5.94   2.45   41.2
Annual     398    4.49   2.65   59.0     417    6.49   4.04   62.2    1095    5.47   3.41   62.3

*  significant differences (p ≤ 0.05) between Ontario, and Pacific and Yukon; one-tailed Student's t.
** significant differences (p ≤ 0.01) between Ontario, and Pacific and Yukon; one-tailed Student's t.
Table 2: Percentage difference between scenarios, before and after weighting. (P&Y = Pacific and Yukon Region.)

            Ontario/Surveillance      P&Y/Surveillance          P&Y/Ontario
Cruise      Before    After           Before    After           Before    After
1           -37.9     -7.7             44.9     10.1            133.2     19.3
2           -33.0     -3.3             36.7     10.0            104.1     13.7
3           -36.2     -3.6             18.8      8.6             86.3     12.7
4           -21.1     -6.1              7.2     -6.3             35.8     -0.3
5            -5.1      6.8             16.8      3.4             23.1     -3.2
6            -7.1     -1.2             23.0     16.0             32.5     17.3
7           -15.6    -11.7             12.0      6.4             32.6     20.5
8           -22.1    -11.2             20.9     -4.2             64.6      7.9
9           -19.7     -9.0             19.1     -0.7             48.4      9.2
10          -13.7    -11.6             14.1      6.6             32.2     20.6
11            1.3      1.7              9.6     -1.5              8.1     -3.1
12           -9.2      0.4             13.7     -1.3             25.1     -1.8
13           -5.4      4.2              4.2      0.0             10.1     -4.0
14          -21.2    -16.6              3.0     10.1             39.7     23.4
Annual      -17.9     -4.1             18.6      3.6             44.5      8.0
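The text does not state how the percentage differences in Table 2 were computed. For most cruises, however, the "before" entries appear consistent with the relative difference (a - b) / b × 100 applied to the unweighted cruise means of Table 1, where a/b is the pair named in the column heading. The sketch below works under that inferred assumption; the function name is hypothetical, and the input values are the cruise 1 means from Table 1.

```python
def percent_difference(a, b):
    """Relative difference of a with respect to b, in percent."""
    if b == 0:
        raise ValueError("reference value b must be non-zero")
    return (a - b) / b * 100.0


# Cruise 1 unweighted means (ug/l) as listed in Table 1
ontario, pacific_yukon, surveillance = 3.28, 7.65, 5.28

print(round(percent_difference(pacific_yukon, ontario), 1))      # P&Y/Ontario
print(round(percent_difference(ontario, surveillance), 1))       # Ontario/Surveillance
print(round(percent_difference(pacific_yukon, surveillance), 1)) # P&Y/Surveillance
```

The "after" columns would need the areal weighted cruise means, which are plotted in Figure 8 but not tabulated, so they cannot be reproduced from the tables alone.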
Figure 9: Statistical zones (p ≤ 0.25) formed with chlorophyll a data on Lake Ontario, 1974.

As has been demonstrated, location of sampling stations influences annual mean concentrations. Similarly, it has also been demonstrated that for Lake Ontario, varying the timing of sampling can also result in a biased whole lake mean concentration (Kwiatkowski 1985). Weekly chlorophyll data, collected from the inshore strata of Lake Ontario, was subjected to four different sampling scenarios based on a systematic design with k equal to one month. Not only were annual means significantly altered but so were the seasonal cycles (Figure 10).

Figure 10: Seasonal cycles obtained for chlorophyll a with varying sampling scenarios (panels: 1st week, 2nd week, 3rd week and 4th week of each month; horizontal axis: date, January to December). Taken from Kwiatkowski 1985.
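The effect of the starting week in a systematic design can be sketched as follows. This is a simplified illustration, assuming that "k equal to one month" can be approximated by keeping every fourth weekly observation, so that the four scenarios correspond to starting in week 1, 2, 3 or 4. The function name and the short weekly series are hypothetical and are not the Lake Ontario data.

```python
def systematic_means(values, k):
    """Mean of each systematic subsample: every k-th value, for each start 0..k-1."""
    if k <= 0 or len(values) < k:
        raise ValueError("need at least k observations and k > 0")
    means = []
    for start in range(k):
        subsample = values[start::k]  # observations start, start+k, start+2k, ...
        means.append(sum(subsample) / len(subsample))
    return means


# Hypothetical weekly chlorophyll values (ug/l) over 12 weeks
weekly = [2.0, 2.5, 4.0, 6.5, 7.0, 5.0, 3.5, 3.0, 2.5, 4.5, 6.0, 5.5]
for week, mean in enumerate(systematic_means(weekly, k=4), start=1):
    print(f"sampling in week {week} of each month: mean = {mean:.2f}")
```

Each starting week sees a different slice of the seasonal cycle, so the four subsample means need not agree even though they are drawn from the same weekly record.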
The Lotic System

The river basin has become the fundamental unit of evaluation within the WQB monitoring efforts. Water quality variables within a lotic system, from the headwaters to its mouth, present a continuous gradient of environmental conditions. Pollutants entering the system at a point source or at a diffuse source result in higher than background concentrations being observed at the point of entry. These levels, through dilution, are reduced downstream until they are indistinguishable from background levels. A direct parallel in the monitoring of lotic and lacustrine systems can be established. Distinct strata (pristine, point source and recovery) within the lotic system can be identified, which are equivalent to the strata (offshore, point source and nearshore) previously described in the lacustrine system (Figure 11).

Figure 11: Hypothetical zonation which can occur in a river system. (Panel title: River monitoring.)
It is not the intent of this manuscript to belabour the point of zonation; however, one case example of a lotic system from the WQB-Quebec Region will be offered. The St. Lawrence River is one of the major rivers of the world and is of prime importance to the people of Quebec. Its diversified biological resources and the number and size of the municipalities it supplies with water testify to its environmental importance, just as municipal, agricultural and industrial statistics bring out its economic importance. In Canada, the health of the
St. Lawrence River, which drains one of the most developed hydrographic basins of the world, is the subject of particular concern (St. Lawrence River Study Committee 1978). In 1971, following an inventory carried out on the Great Lakes, the Canada-Quebec Consultative Committee on water problems set up a task force to review the historical records on the quality of the water in the St. Lawrence River, and to develop programs for its management and use.

The St. Lawrence River Network (1977-1981) consisted of 46 stations located between Cornwall and Quebec City. Sampling was carried out six times a year, during the ice free period, for a variety of water quality parameters (Figure 12). The objectives of the program were to provide information on the location, severity, frequency and duration of non-achievement with the water quality objectives set for the various uses of the aquatic resource (Germain and Janson 1984).

Figure 12: Station locations for the St. Lawrence River monitoring program, 1977-1981.

Analyses of the data indicated that the water quality in the St. Lawrence River varied significantly from one location to another, generally with lower concentrations of most pollutants being found in the high velocity navigation channel. Municipal and industrial effluents resulted in particularly high concentrations along the shore, at or below the discharge outlets. These elevated levels were found to continue far downstream because cross stream mixing took place slowly, due to natural channelization of the river. Specific conductance measurements (due to their conservative nature) were used as a tracer to delineate the impacted areas and define the effects of channelization (Figure 13). As can be readily seen, mean sampling trip concentrations ($\bar{x}$) for specific conductance in the St. Lawrence River can easily be altered by simply moving stations from one area to another. Figure 13 also provides an interesting example of a problem unique to lotic systems, namely cross stream variability. Thus, not only is it important to sample downstream from the effluent source, but also to distinguish between the water quality near the shore versus that in the main channel, or from one shore to another.

Presently, the 320 kilometre long Cornwall to Quebec section of the St. Lawrence River has been divided into 23 homogeneous zones through the combined application of correspondence analysis and cluster analysis (Lachance et al. 1979). Though the zonation is based on only three water quality variables (turbidity, inorganic nitrogen and inorganic phosphorus), it brings the water quality monitoring of the St. Lawrence River a step closer to the rationalization of a statistically sound network (Lachance et al. 1979, Germain and Janson 1984).

Once zonation has been established, equation 4 can be modified for a lotic system, where $\Delta a$, the area of the homogeneous zone within the lacustrine system, now becomes $\Delta l$, the length (or stretch) of river between the mid-points of each stratum. $\Delta v$, the volume, could also be used but requires substantially more information than is routinely available from a monitoring program. Again, as in the lacustrine system, estimates of the number of samples needed to estimate the mean of each stratum within predetermined confidence limits can be calculated from equation 3.

Similarly, as with the lacustrine system, superimposed on the spatial gradient of the lotic system is the effect of the temporal gradient. Flow variation of the river and its tributaries, climatic conditions and variations in the volume of municipal and industrial wastes are only some of the causes of temporal variability, which complicates environmental quality evaluation.

Figure 13: Zones formed with specific conductance data on St. Lawrence River, 1977-1981. (Panel title: Specific conductance zonation produced for the St. Lawrence River, 1977-1981; legend classes: ≤225, 226-275, 276-325, ≥326.)
Conclusions

It is important to note that monitoring activities must be considered as part of an overall management program. The data collected, which is later transformed into information, must not only meet the immediate local or regional needs, it must also meet future and national needs (Herricks 1984). As a result, network design must become an integrated co-operative program of all interested parties (clients) to ensure that biased, meaningless, or unreliable results are not generated at either the local, regional or national level. The establishment of enough water quality stations to ensure that each stratum in each and every river basin within Canada is sampled sufficiently to establish river mean concentrations at predetermined significance levels is not possible with the present resources available to the Federal Government. The WQB, however, has embarked on an ambitious National Assessment Program, designed to ensure that scientific results are obtained within all regions of Canada. In 1982, Cabinet provided the Department of the Environment with the authority and the resources to negotiate Federal-Provincial monitoring agreements to efficiently implement a comprehensive water quality network; to improve interjurisdictional assessments; and to address nationwide aquatic environmental concerns.

Development and implementation of the Agreements is based on the premise that both the federal and provincial governments have a responsibility to collect scientifically sound water quality monitoring data. Through the wise coordination and integration of these monitoring activities there is a real opportunity to ensure effective use of existing resources to provide both federal and provincial governments with a comprehensive picture of water quality conditions (Haffner, in press).

The National Water Quality Assessment Program (Haffner, in press) consists of three interdependent network concepts:

i) a National Index Network to provide baseline data; to establish long term trends; and to act as an early warning system to hitherto unknowns at the basin, regional or national level.
ii) a Recurrent River Basin Network to determine sources and areas of impact; to identify existing or developing water quality concerns; to determine station location, sample frequency and parameter lists; and to establish river basin specific water quality objectives.
iii) a Special Studies Network to conduct in-depth studies at a local, regional or national scale to address priority issues.

Co-ordination of the specific technical details for the three networks will be developed by Regional and Headquarters federal staff, and
Provincial staff, through a Federal-Provincial Task Force. The Regional federal member will be responsible for operation of the networks, in cooperation with the Provincial member. The Federal Headquarters member will be responsible for ensuring the overall compatibility and co-ordination of the program with the nine other Federal-Provincial Task Forces, to ensure a national perspective. Headquarters will also be responsible for the storage and maintenance of a centralized data system composed of the various federal and provincial data sets, while the National Laboratory in Burlington will ensure compatible analytical results.

A wealth of information exists on how to design a specific monitoring network or a research study to provide scientifically sound information to meet a set objective on a specific river reach. Unfortunately, the same cannot be said about the establishment of large scale monitoring programs. Historically, national monitoring networks have consisted of an assemblage of existing independent programs, each designed to provide the proper information base required for the local or regional managerial group. The data files from these independent programs have all been stored and thereby formed a national data archive - but not necessarily a national water quality data bank. This approach has obvious weaknesses with regard to a national program. Therefore it is of paramount importance to obtain a concerted effort, by both the federal and provincial managers responsible for water quality, to ensure that all monitoring components are coordinated bilaterally in an efficient and cost-effective manner. Without this cooperation Canada's National Water Quality Monitoring Program will be reduced to the parable of the elephant and the blind men.

The Blind Men and the Elephant

It was six men of Indostan
To learning much inclined,
Who went to see the Elephant
(Though all of them were blind),
That each by observation
Might satisfy his mind.

The First approached the Elephant,
And happening to fall
Against his broad and sturdy side,
At once began to bawl:
"God bless! but the Elephant
Is very like a wall!"

The Second, feeling of the tusk,
Cried, "Ho! what have we here
So very round and smooth and sharp?
To me 'tis mighty clear
This wonder of an Elephant
Is very like a spear!"

The Third approached the animal,
And happening to take
The squirming trunk within his hands,
Thus boldly up and spake:
"I see," quoth he, "the Elephant
Is very like a snake!"

The Fourth reached out an eager hand,
And felt about the knee,
"What most this wondrous beast is like
Is mighty plain," quoth he:
"'Tis clear enough the Elephant
Is very like a tree!"

The Fifth who chanced to touch the ear,
Said: "E'en the blindest man
Can tell what this resembles most;
Deny the fact who can,
This marvel of an Elephant
Is very like a fan!"

The Sixth no sooner had begun
About the beast to grope,
Than, seizing on the swinging tail
That fell within his scope,
"I see," quoth he, "the Elephant
Is very like a rope!"

And so these men of Indostan
Disputed loud and long,
Each in his own opinion
Exceeding stiff and strong,
Though each was partly in the right
And all were in the wrong!

by J.G. Saxe (1816-1887)

REFERENCES
Beeton, A.M., and Edmondson, W.T., 1972. The eutrophication problem. J. Fish. Res. Board Can. 29:673-682.
Csanady, G.T., 1970. Coastal entrapment in Lake Huron. In Proceedings 5th International Conference on Water Pollution Research 3:1-7. New York, Pergamon Press.
Dobson, H.F.H., 1981. Trophic conditions and trends in the Laurentian Great Lakes. W.H.O. Water Quality Bulletin 6:146-151, 158 & 160.
Dunnette, D.A., 1980. Observed frequency optimization using a water quality index. Journal WPCF 52:2807-2811.
El-Shaarawi, A.H., and Shah, K.R., 1978. Statistical procedures for classification of a lake. Inland Waters Directorate, DOE, Scientific Series No. 86.
Germain, A., and Janson, M., 1984. Qualité des eaux du fleuve Saint-Laurent de Cornwall à Québec (1977-1981). IWD publication.
Green, R.H., 1979. Sampling Design and Statistical Methods for Environmental Biologists. John Wiley & Sons, Inc. 257 pp.
Haffner, G.D. (in press). Water Quality Branch Strategy for Assessments of the Quality of Aquatic Environments. Water Quality Branch, Inland Waters Directorate, Scientific Series.
Herricks, E.E., 1984. Aspects of monitoring in river basin management. Wat. Sci. Tech. 16:259-274.
Kwiatkowski, R.E., 1978. Scenario for an ongoing chlorophyll surveillance plan on Lake Ontario for non-intensive sampling years. J. Great Lakes Res. 4:19-26.
Kwiatkowski, R.E., 1985. The importance of temporal availability to the design of large lake water quality networks. Accepted for publication, J. Great Lakes Res. 11:462-477.
Kwiatkowski, R.E., and Neilson, M.A.T., 1983. Lake Ontario surveillance data, 1968-1980. Inland Waters Directorate, DOE, Technical Bulletin No. 126.
Lachance, M., Bobée, B., and Gouin, D., 1979. Characterization of the water quality in the Saint Lawrence River: Determination of homogeneous zones by correspondence analysis. Water Resources Research 15:1451-1462.
Mandel, J., 1964. The statistical analysis of experimental design. John Wiley and Sons, Inc.
Nelson, J.D., and Ward, R.C., 1981. Statistical considerations and sampling techniques for ground-water quality monitoring. Ground Water 19:617-625.
Rodgers, G.K., 1965. The thermal bar in the Laurentian Great Lakes. Univ. Mich., Great Lakes Research Division, Ann Arbor, Mich., Publ. No. 13:358-363.
St. Lawrence River Study Committee, 1976. Rapport du Comité d'étude sur le fleuve St-Laurent. Environment Canada et le Service de la protection de l'environnement du Québec. L'éditeur officiel du Québec, 293 pages. English version entitled: Final Report, St-Lawrence River Study Committee, 209 pages.
Ward, R.C., Loftis, J.C., Nielsen, K.S., and Anderson, R.D., 1979. Statistical evaluation of sampling frequencies in monitoring networks. Journal WPCF 51:2292-2300.
Whitlow, S., and Lamb, M., 1983. NAQUADAT - Guide to interactive retrieval. Inland Waters Directorate, Water Quality Branch, Ottawa, Canada. 63 pp.
DETERMINATION OF WATER QUALITY ZONATION IN LAKE ONTARIO USING MULTIVARIATE TECHNIQUES

M.A. NEILSON AND R.J.J. STEVENS, Water Quality Branch, Ontario Region, Inland Waters Directorate, 867 Lakeshore Road, P.O. Box 5050, Burlington, Ontario, L7R 4A6

The surface water quality characteristics of Lake Ontario were studied during 29 cruises conducted on a monthly basis throughout 1977, 1981 and 1982. El-Shaarawi and Shah's (1978) classification procedure was first used to reduce each year's multi-cruise information (each cruise sampling 94 stations for 14 parameters) to a single value (T) for each station and year. Principal components analysis was then applied to these T-values, reducing the multiple parameter list to 3 factor scores. Ward's clustering procedure grouped together stations, according to their factor scores, which demonstrated similar properties. These groups, or zones, were then validated using discriminant analysis.

INTRODUCTION

The principal objectives of water quality monitoring in Lake Ontario are to determine long-term trends in a variety of parameters and to examine how the interrelationships of variables both affect and respond to observed changes. El-Shaarawi and Kwiatkowski (1977) examined the relative magnitude of both spatial and seasonal variability for physical (temperature, transparency) and biochemical (particulate organic carbon and nitrogen, chlorophyll a) parameters in the surface waters of Lake Ontario and found these parameters to exhibit highly significant (p < 0.01) spatial gradients. While their analysis permitted the determination of distinct water masses in the lake, it was limited in that it produced a distinct zonation pattern for each parameter, thereby complicating any attempt to interrelate the parameters.
The need to consider water mass distributions from a multivariate perspective was recognized by Moll et al. (1985), who applied principal components and cluster analyses to a multivariate data set for Lake Huron. The lake was regionalized on a cruise-by-cruise basis because it was expected that homogeneous water masses would not have the same areal distributions throughout the year. Analysis of the spatial distribution of water quality parameters in Lake Ontario supported this (Neilson and Stevens, in press); however, cruise-by-cruise zonation complicates efforts to determine trends in time as well as seasonal changes within any given region. A fixed station design and, consequently, a fixed zonation pattern, has been demonstrated to be the most suitable for determining long-term trends in water quality parameters as it eliminates the confounding influences of station numbers and placement (El-Shaarawi 1982; van Belle and Hughes 1983). Illustration of this point is provided by the controversy surrounding the determination of trends in hypolimnetic oxygen deficit in Lake Erie (Charlton 1980; Barica 1982; Anderson et al. 1984; Rosa and Burns 1985), much of which is attributable to how the investigators defined a homogeneous water mass when employing a variable station design. Consequently, to achieve the stated objectives, it was necessary to derive a zonation pattern that would divide the lake into limnologically distinct water masses which, at the same time, would be applicable to all cruises in all years, thereby maintaining a fixed station design.

Confronted with similar objectives, Leach (1980) used factor and cluster analyses to determine distinct water masses in Lake St. Clair. The standardized results of biweekly sampling at 14 stations were used to derive the zones, then these zones were superimposed on the data from other years. A similar but slightly modified approach was adopted in the current study. Because of the enormity of the database (30,000 data points) and to compensate for seasonality, prior to application of principal components and cluster analyses, it was necessary to initially reduce the results for each parameter (29 sampling events in 3 years) to normalized non-dimensional estimates of the spatial effects associated with each of the 94 stations sampled. The resultant zonation pattern, validated by use of discriminant analysis, is to be used for future trend analysis and in the development of a more efficient sampling design.
METHODS
D u r i n g 1977, 1981 and 1982, lakewide cruises were conducted on Lake Ontario a t approximately monthly i n t e r v a l s sampling t h e 94 s t a t i o n s i l l u s t r a t e d i n Figure 2. Nutrient and major i o n samples were collected a t multiple d i s c r e t e depths u s i n g a modified Rosette sampler, however, only values from the l m depth were used i n this study. Chlorophyll 2 (corrected f o r phaeopigments), CHLA, p a r t i c u l a t e organic carbon, POC, and total p a r t i c u l a t e nitrogen, TPN, samples were collected u s i n g a 0-20m integrating sampler designed by the Engineering Section a t the Canada Centre f o r Inland Waters as per Schroeder (1969). Temperature, dissolved oxygen, D02, and s p e c i f i c conductance were measured on s i t e (Carew and Williams 1975). Ammonia, NH3, and n i t r a t e - p l u s - n i t r i t e , N03, samples, f i l t e r e d t h r o u g h 0.45 um Milli-pore f i l t e r s , were a l s o analyzed on board s h i p (El-Shaarawi and Neilson 1984). Samples f o r t o t a l phosphorus, TP, f i l t e r e d chloride, C L , and sulphate, S04, t o t a l f i l t e r e d nitrogen, TFN, a l k a l i n i t y , ALK, and soluble reactive s i l i c a , SRS, weve returned t o the Water Qua1 i ty Branch laboratory i n Burl ington f o r analysis. Detai 1s of preservation and analytical methodology can be found in the Analytical Methods Manual (Environment Canada 1979). Each of the fourteen parameters sampled formed a (94 x 9 ) matrix i n 1977 ( s t a t i o n s x c r u i s e s ) and a (94 x 10) matrix i n 1981 and 1982l. A regression model, developed by El-Shaarawi and Shah (1978), was applied t o each parameter matrix. The model transforms the original observations y i j , corresponding t o the j t h s t a t i o n s d u r i n g the i t h c r u i s e , according t o the equation: (Y;j -1 ) / A , X*O Z..= (1 1 1J I n Yij, X=O (Box and Cox 1964). 
The method of maximum likelihood is used to estimate the value for λ that best approximates a normal distribution of the variable Z_ij. The mean value of Z_ij can be resolved into three additive components: u, the general mean; a_i, the effect due to the ith cruise; and b_j, the effect due to the jth sampling station. The model then tests the null hypothesis of no differences between stations using Fisher's F-test (i.e., H0: b_1 = b_2 = ... = b_94 = 0).
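The transformation, the choice of λ and the test for station effects can be sketched in Python as follows. This is only an illustration under stated assumptions, not the El-Shaarawi and Shah (1978) program: it assumes a complete cruises x stations matrix of positive values (in the study, not every station was sampled on every cruise), it searches λ on a coarse grid instead of maximizing the likelihood continuously, and all function names are hypothetical.

```python
import numpy as np

def box_cox(y, lam):
    """Box-Cox transform: (y**lam - 1)/lam for lam != 0, ln(y) for lam == 0."""
    y = np.asarray(y, dtype=float)
    if abs(lam) < 1e-12:
        return np.log(y)
    return (y**lam - 1.0) / lam

def additive_fit(z):
    """Split z (cruises x stations, no missing cells) into the general mean u,
    cruise effects a_i, station effects b_j and residuals."""
    u = z.mean()
    a = z.mean(axis=1) - u
    b = z.mean(axis=0) - u
    resid = z - u - a[:, None] - b[None, :]
    return u, a, b, resid

def profile_loglik(y, lam):
    """Normal log-likelihood of the additive model, maximized over u, a, b and
    the error variance, including the Jacobian of the transformation."""
    y = np.asarray(y, dtype=float)
    resid = additive_fit(box_cox(y, lam))[3]
    n = y.size
    return -0.5 * n * np.log(np.sum(resid**2) / n) + (lam - 1.0) * np.sum(np.log(y))

def estimate_lambda(y, grid=np.arange(-1.0, 3.51, 0.5)):
    """Grid value of lambda with the largest profile log-likelihood."""
    scores = [profile_loglik(y, lam) for lam in grid]
    return float(grid[int(np.argmax(scores))])

def station_f_ratio(z):
    """Variance ratio F for H0: b_1 = ... = b_J = 0 in the additive model."""
    n_cruises, n_stations = z.shape
    _, _, b, resid = additive_fit(z)
    ms_station = n_cruises * np.sum(b**2) / (n_stations - 1)
    ms_error = np.sum(resid**2) / ((n_cruises - 1) * (n_stations - 1))
    return ms_station / ms_error
```

The grid spacing of 0.5 was chosen here only because the λ values reported in this paper are multiples of 0.5; the resulting F ratio would be compared with the F distribution on (J-1) and (I-1)(J-1) degrees of freedom for I cruises and J stations.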
¹ Due to logistical constraints, not all stations were sampled during every cruise.
Although the model then continues, constructing statistically homogeneous zones for each of the parameters (El-Shaarawi and Kwiatkowski 1977; Kwiatkowski 1980, 1984; Neilson 1983), the standardized estimates of spatial effect (T-values), which are calculated for each station and used to construct the zones, were the desired end product of this procedure. The fourteen parameter matrices for each year were thus reduced, each station's multiple cruise information having been assimilated into a single spatial estimate (T-value). The net result was a (94 x 14) (stations x T-values) matrix for each year.
Principal components analysis was then applied to the T-values, using a minimum eigenvalue of 1.0, and employing varimax rotation as it centres on simplifying the columns of a factor matrix (i.e., T-values) by maximizing the variance of the squared loadings in each column (Nie et al. 1975). From the rotated factor pattern matrix, factor scores (F) were calculated as
$$ F = \sum_{k=1}^{14} f_k t_k \tag{2} $$
where t_k are the standardized T-values and f_k are the factor score coefficients for each parameter k. The factor scores produced are composite scales representing the theoretical dimensions associated with the respective factors.
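The sequence just described (components of the correlation matrix with eigenvalue of at least 1.0, varimax rotation, then factor scores as weighted sums of standardized T-values) can be sketched with NumPy. The study used the SPSS routines of Nie et al. (1975); the code below is an assumed stand-in using the regression method for the score coefficients, and the function names are hypothetical.

```python
import numpy as np

def principal_factors(t_values, min_eigenvalue=1.0):
    """PCA of the correlation matrix of a (stations x parameters) array;
    keep the components whose eigenvalue is at least min_eigenvalue."""
    t = np.asarray(t_values, dtype=float)
    z = (t - t.mean(axis=0)) / t.std(axis=0, ddof=1)   # standardized T-values
    corr = np.corrcoef(z, rowvar=False)
    eigval, eigvec = np.linalg.eigh(corr)
    order = np.argsort(eigval)[::-1]                    # largest eigenvalue first
    eigval, eigvec = eigval[order], eigvec[:, order]
    keep = eigval >= min_eigenvalue
    loadings = eigvec[:, keep] * np.sqrt(eigval[keep])
    return z, corr, loadings

def varimax(loadings, max_iter=200, tol=1e-10):
    """Orthogonal rotation maximizing the variance of squared loadings per column."""
    p, k = loadings.shape
    rotation = np.eye(k)
    criterion = 0.0
    for _ in range(max_iter):
        rotated = loadings @ rotation
        col_ss = np.diag(np.sum(rotated**2, axis=0))
        u, s, vt = np.linalg.svd(loadings.T @ (rotated**3 - rotated @ col_ss / p))
        rotation = u @ vt
        previous, criterion = criterion, s.sum()
        if previous != 0.0 and criterion / previous < 1.0 + tol:
            break
    return loadings @ rotation, rotation

def factor_scores(z, corr, rotated_loadings):
    """Scores F = sum_k f_k t_k, with coefficients f = R^-1 * rotated loadings."""
    coef = np.linalg.solve(corr, rotated_loadings)
    return z @ coef, coef
```

With an orthogonal rotation of principal components, the scores obtained this way are uncorrelated with unit variance, which is convenient when they are passed on to a distance-based clustering step.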
The factor scores for each year were subjected to Ward's hierarchical clustering procedure (Ward 1963; Ward and Hook 1963) to examine individual segmentation schemes. Ward's procedure was then applied to the combined factor scores to derive a composite zone map. Hierarchical methods do not guarantee an optimal solution in terms of the clustering criterion as the clusters formed do not consider all possible combinations of the data (Anderberg 1973). Thus, the results of the Ward clustering procedure on the combined factor score matrix were employed as a trial composite zone map and discriminant analysis applied. Discriminant analysis (Nie et al. 1975) compares predicted group membership with actual group membership, empirically measuring the success in discrimination by observing the proportion of correct classifications. The composite zone map was then applied to the 1977, 1981 and 1982 individual factor score matrices and discriminant analysis again conducted to determine the appropriateness of the composite zone map.
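A minimal sketch of this clustering and validation step is given below. It uses SciPy's Ward linkage and a simple linear discriminant rule (pooled covariance, priors proportional to group size) written in NumPy as a stand-in for the SPSS discriminant procedure, so the classification rule and the function names are assumptions rather than the settings used in the study.

```python
import numpy as np
from scipy.cluster.hierarchy import fcluster, linkage

def ward_zones(factor_scores, n_zones=6):
    """Cut a Ward hierarchical tree into n_zones clusters (labels 1..n_zones)."""
    tree = linkage(factor_scores, method="ward")
    return fcluster(tree, t=n_zones, criterion="maxclust")

def percent_correctly_classified(factor_scores, zones):
    """Classify each station with a linear discriminant rule and return the
    percentage whose predicted zone equals its assigned zone."""
    x = np.asarray(factor_scores, dtype=float)
    zones = np.asarray(zones)
    labels = np.unique(zones)
    n, p = x.shape
    means = np.array([x[zones == g].mean(axis=0) for g in labels])
    pooled = np.zeros((p, p))
    for g, m in zip(labels, means):
        centred = x[zones == g] - m
        pooled += centred.T @ centred                 # within-zone scatter
    pooled /= n - len(labels)
    priors = np.array([np.mean(zones == g) for g in labels])
    weights = np.linalg.solve(pooled, means.T)        # one column per zone
    scores = x @ weights - 0.5 * np.sum(means.T * weights, axis=0) + np.log(priors)
    predicted = labels[np.argmax(scores, axis=1)]
    return 100.0 * np.mean(predicted == zones)
```

Applying `percent_correctly_classified` to the factor scores of one year with the zone labels from another scheme mirrors the way a trial zone map is checked against individual years.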
To evaluate the efficiency of the proposed zonation pattern in reducing the spatial variability for each parameter, the following formula from El-Shaarawi (1984) was used:
$$ EC = (1 - TS_{\cdot}/TS) \times 100 \tag{3} $$
where TS. is the sum of parameter variabilities exhibited during a cruise in each zone:
$$ TS_{\cdot} = \sum_{k} \sum_{i=1}^{n_k} (X_{ki} - \bar{X}_k)^2 \tag{4} $$
with n_k = number of stations sampled in zone k, and TS is the total cruise variability for that parameter:
$$ TS = \sum_{i=1}^{N} (X_i - \bar{X})^2 \tag{5} $$
with N = total number of stations sampled.
RESULTS AND DISCUSSION
Table 1 presents the λ values used to transform each of the parameters. For most parameters, the necessary transformation varied from year to year. For example, the biomass estimator parameters, POC, TPN, and CHLA, were log-transformed in 1977 and 1981, while in 1982, a square-root transformation was required. Frequency and timing of sampling, especially with respect to such phenomena as thermal bar formation, stratification and upwelling, affect the distribution of the data and, consequently, λ. For example, the maximum likelihood estimates of λ were 3.0, 0.5 and 0.0 for CL in 1977, 1981 and 1982. In 1977, CL was sampled during only 5 cruises (March-June and September) as opposed to 10 cruises during 1981 and 1982. Thus the respective yearly CL means and associated variances, σ², for the three years were: 27.33 (σ² = 2.99), 26.06 (σ² = 27.10) and 25.97 (σ² = 13.30).
Having transformed the data, the variance ratio, F, was calculated. For all years, significant spatial differences at the 5% level were found in all parameters. Although for all parameters considered the differences were of the same order of magnitude, conductivity showed the greatest spatial heterogeneity. This is evidence of real differences between stations, suggesting that the lake be considered as more than a single homogeneous zone.
TABLE 1. Maximum likelihood estimates for λ
PARAMETER 1977 1981 1982
Temperature Conductivity
0.5 1 .o 0.5 3.0
0.5 0.5 1 .o 3.0 0.5 3.0
0.5 0.5 2.5 3.0
DO2
ALK CL S04 P oc TPN
3.0 3.0 0.0 0.0
CHLA NO3
0.0
TFN
0.0
NH3 TP S RS
0.0 0.0
1 .o
0.5
0.0 0.0 0.0
1 .o 0.0 0.0 -1 .o 0.0
0.0 1 .o
0.5 0.5 3.5 1.5 1 .o 0.0 -1 .o 0.0
Principal components analysis was applied to these T-values and the result was, for each year, 3 factors, orthogonal to each other, which best accounted for the variance exhibited by the data. The cumulative percent variability explained by the respective factors was 83, 82 and 80 for 1977, 1981 and 1982. In each case, more than half the explained variability could be accounted for by the first factor. Table 2 presents the factor score coefficient matrices for each year. As expected, the biomass estimator parameters, POC, TPN and CHLA, always loaded highly on the same factor, usually in conjunction with temperature. Similarly, the major ions (ALK, SO4 and CL) and conductivity scored together on a second factor. Figure 1 is a graphical presentation of the rotated orthogonal factors for 1982. Note the high degree of separation between the parameter clusters, denoting the lack of correlation between them.
Dendrograms and the associated approximate zonation (clustering) patterns, resulting from application of Ward's clustering procedure to each year's factor scores, are presented in Figures 2, 3 and 4. Note that the zonation patterns described on the station maps are only approximations as, in some cases, a station was clustered with a zone to
TABLE 2. Factor score coefficient matrices of the 3 factors (F1, F2, F3) for 1977, 1981 and 1982.
Temp.
-.306
Conduct,
.912 -
F2
-
.848
-.039 -.469
F3 .240
F1 .655
.667 .176
-.202
.873 - -.083
-.129
-.235
.891
.861 - -.316
-.181 -.lo8
.931
-
.894
.275
- -.165 .937
.904
.250
.940
-
CHLA
NO3
.294
TFN
.136
NH3
-.150
-.608
TP
-.229
.222 .228
S RS
-.695 -.483
-
.893
-.814 -
.043 .321 .659
-
.006 -.080
-.151
-.119
.914 - -.068 -.202 -.009 .305 -.026
-.126
-.216
.E .908 .949 -
.028 -090
.926 - -.lo3
-.010
.951 - -.076
-.045
-
.854
-.117
-.084
.093
.E
.886
.131
.072
.945
.918 _ .
-.073
-
.813 -
.723
-.lo5
.288
.676
-.166
.002
-
-715
-.411
.288
.735
-.311
.169
-.512
.735
.163
-.638
.525
.825 .251
-.158
1. Temp. 2. Cond. 3. TP 4 . DO2
6. TPN 7. POC
8. CHLA 9. ALK 10. NH3 11. so4 12. NO3 13. TFN 14. SRS
Factor 3 F i g . 1.
-
.312
CL
-.195 -.202
-.424
-.458 -.214
-.444 -.007
TPN
- -.300 .728
.151 .796 - -.319
-.221 -.167
- .240
F3 .211
DO2 ALK
.go9 -
.755 -
-.164
F2 -935
.583 ,833 - -.162
-.177
-.535
F3
-.159
-.135
SO4
F2.
F1
-.168
-.217
POC
1982
1981
1977 F1
Rotated f a c t o r score c o e f f i c i e n t s f o r 1982 parameters.
Fig. 2. Dendrogram and resultant zonation pattern based on 1977 factor scores.
Fig. 3. Dendrogram and resultant zonation pattern based on 1981 factor scores.
Fig. 4. Dendrogram and resultant zonation pattern based on 1982 factor scores.
which, geographically, it could not belong. For example, although station 64 was clustered into zone 5 in 1977 (Figure 2), geographically it was necessary to include it with zone 6. The final definition of the clusters was somewhat subjectively determined as the stations being grouped have fixed geographical locations. Consequently, it was necessary to continue the clustering procedure until logical geographic divisions could be formed. For each year, a six-cluster structure proved the most satisfactory. Although, with the exception of zone 1, the boundaries of the zones were not static from one year to the next, a general pattern of distinct water masses was evident. Zone 1 defined the shallow northeastern region of the lake, the Kingston Basin, while the deep offshore waters were separated into an eastern (zone 6) and western (zone 5) component. Zone 2 formed a transitional zone between the Kingston Basin and midlake and also encompassed the eastern shoreline. The southern and western nearshore regions emerged as 2 distinct clusters, zones 3 and 4, respectively. Cluster analysis also identified several stations with water quality characteristic of nearshore inputs as opposed to whole lake (refer to zonation maps). These stations were thus designated as reflecting point sources and excluded from the zones.
Because the 3 segmentation schemes exhibited similar patterns in clustering of the stations, this suggested combining all factor scores (9) in an attempt to produce a composite zone map. When Ward's procedure was applied to the combined factor scores, stations 8, 90, 97, 76, 71 and 86 were identified as representative of nearshore source inputs and thus isolated, while the remaining 87 stations were merged into 6 clusters, hence defining the composite zone map shown in Figure 5.
In an investigation of the spatial distribution of nutrients and particulate organic matter throughout 1981 and 1982 in Lake Ontario, Neilson and Stevens (in press) provided supporting evidence for this regionalization of the lake. They found the Kingston Basin, the Toronto-Hamilton-Niagara region and the southern shore between the Niagara and Genesee rivers to demonstrate distinctive water quality characteristics. These areas correspond to our zones 1, 4 and 3, respectively. Furthermore, a west-to-east decreasing concentration gradient was observed for a variety of parameters by Neilson and Stevens (in press) and in previous studies (Shiomi and Chawla 1970; Gachter et al. 1974), which would explain the division of the midlake into a west (zone 5) and east (zone 6) component.
Fig. 5. Dendrogram and resultant zonation pattern when factor scores from 1977, 1981 and 1982 were combined to form composite zones.
Because the segmentation scheme derived from the combined factor scores (i.e., the composite zone map) was determined in consideration of geographical constraints, it was validated using discriminant analysis. The stations were assigned to their respective zones and, using the factor scores as discriminating variables, the percent of stations correctly classified was calculated. Discriminant analysis showed that the composite zone map was indeed the optimal regionalization as 100% of the stations were correctly classified. Table 3 lists the standardized canonical discriminant function coefficients for the four significant functions, as determined by Wilks' lambda, used in the analysis. These 4 functions were able to account for 99% of the variability in the factor scores. The first factor scores of each year, and the third factor score of 1977, were the primary determinants of functions 1, 3 and 4 while function 2 appeared a combination of many factor scores. Cross-referencing the results of the principal components analysis (Table 2) revealed that discriminant function 1 primarily represented 1981 POC, TPN and CHLA distributions, function 3 was composed of 1977 conductivity and ion and
TABLE 3. Standardized canonical discriminant function coefficients for the 4 significant functions resulting from discriminant analysis of the combined factor scores (FS).
FUNCTION 1 1977
1981
1982
FS1 FS2 FS3 FS1 FS2 FS3
.34982
FUNCTION 2
FUNCTION 3 .97442
- .08233 - .127 44
.157 42
1 .01605
-.16822 .16137 .03854
-.23994 .46603 32237
.07521 .53193 .26336
-.42521
.27212
.40318 .20713 .35664
-.091 43
-.02814 .33614
- .73885 - .04 975
FS1
-.13944
FS2
-.01110
FS3
.51483
-.12275 .48973
FUNCTION 4
.72198
- .38744
-.4546 2
-.
.25323
-.34552 -.27403
1982 POC, TPN and CHLA distributions, while function 4 was almost exclusively reflective of 1977 TP and NH3 levels. Figure 6 is a scatterplot, defined by the first two discriminant functions, showing the separation amongst zones, indicating that the factor scores were successful discriminators.
Fig. 6. Scatterplot defined by the first 2 canonical discriminant functions which resulted from analysis of combined factor scores.
When the composite zone map was then imposed on the factor scores of the individual years, discriminant analysis revealed that 74, 90 and 90 percent of the stations in 1977, 1981 and 1982, respectively, were correctly classified. When a station is designated as being misclassified, it demonstrates a higher probability of belonging to a zone other than that to which it was originally assigned. Where the predicted zone (i.e., the group for which the station showed the greatest probability of membership) did not correspond to the actual zone, the second highest probability zone was, in almost all cases, the originally proposed zone. In this study, the greatest percent of misclassifications occurred between zones 5 and 6. This is not unexpected due to the variation in areal extent of these two midlake zones during the three years (refer to Figures 2, 3 and 4).
Prior to this study, Lake Ontario had been subjectively divided into 17 zones (Figure 7) on the basis of basin geomorphology, location of nearshore inputs and the summer epilimnetic circulation patterns (International Joint Commission 1977). To investigate whether the approach adopted for this study produced a superior zonation pattern, discriminant analysis of the factor scores was repeated using these 17 zones (herein referred to as the IJC zones). Results of this analysis indicated that there was little differentiation between zones and only 47, 61 and 49 percent of the stations in 1977, 1981 and 1982, respectively, were correctly classified. When the actual group membership was compared with the predicted group membership, IJC zone 1 showed a high percent of correct classifications. All stations assigned to IJC zones 9, 10, 12 and 15, however, consistently demonstrated higher probabilities of belonging to other zones, suggesting that, based on the chosen discriminating variables, these zones were in fact not distinct water masses. Similarly, only half the stations in IJC zone 17 showed highest probability of membership for this zone; the remaining stations were categorized with up to 7 other zones.
Fig. 7. Seventeen IJC zones previously used to divide Lake Ontario.
TABLE 4. Percent explained variability (EC) for the (6) composite zones of Lake Ontario.
(a) 1977
Percent explained variation (EC)
07/1807/22
08/1508/19
09/1209/16
10/1010/13
11/1411/17
Parameter
03/1603/20
04/1204/15
05/0905/13
06/0606/10
Temperature
58.4
63.3 52.6
40.4
37.2
26.6
10.7
12.6
5.4
45.8 62.3
73.3
Conductivity
50.0
38.1
31.3
31.5
18.1
14.1
44.1
49.9
61.3
6.2
14.4
40.4
36.9
48.1
18.1 52.7
DO2
36.1
24.7
42.2
64.9
ALK
10.3
42.5
28.1
21.2
NO3
41.9
28.7
60.7
61.4
22.9
SR S
30.5
30.1
59.9
73.0
47.2
TP
17.6
17.3
38.4
15.3
TFN
18.5
57.7
48.8
NH3
9.3 38.6
60.8
22.8
21.4
CL
40.9
75.0
51.6
SO4
40.6
60.1 62.3
59.5
47.8
POC
33.2
62.7
56.4
TPN
29.5
65.9
58.0
CHU
19.6
60.3
03/1603/20
W/O60411 0
Temperature
71.1
45.6
Conductivity
20.7
16.8
DO2
76.3
44.9
ALK
11.8
35.4 6.0
NO3
73.8
80.9
SR S
64.8
63.8
TP
9.4
TFN
22.9 15.6
(b)
32.7
17.8
13.9
16.7 25.4 40.7
23.3
68.6
41.2 29.1
55.3
16.7
37.1
11.3
39.4
56.2
8.9
34.9
15.1
41.1
56.5
55.4
42.8
47.9
15.2
27.4
61.1
58.6
W/2705/01
05/1905/22
06/1506/19
0711307/17
0811008/14
10/0510/09
1111611/20
12/0712/11
53.3
74.0
29.7
16.9
45.5
20.6
56.8
67.9
63.3
43.9 27.9
32.2
25.9
28.9
61.2
41.2
25.6 43.9
28.1 29.1
33.7 56.3
20.1 49.5
-
27.4
9.5 41.9 36.4
40.4
39.7 39.7 34.1
44.0
42,8
1981
NH3
37.8 39.7
CL
53.7
SO4
16.8
29.4
46.9 7.2 50.9
23.9
39.8
-
-
-
37.9 39.8
69.7
66.9
34.2
21.0
69.1 29.0
37.1
25.0
38.4 20.5
-
49.8 29.7
-
POC
62.8
68.6
65.0
74.9
5.0
12.8
21.6
6.7
29.1
55.0
TPN
63.9
58.3
59.2
75.0
6.0
12.0
22.0
1.5
24.1
61.3
CHLA
61.1
55.2
67.2
67.3
4.7
17.1
37.6
6.6
9.9
52.1
031080311 2
03/2904/02
W/2604/30
05/1705/21
06/1406/18
07/120711 6
0811 608/20
09/1309/17
10/1210/17
11/1511/19
56.3 29.7
54.3 28.5
41.0 40.1
49.3 33.2
64.6 22.9
56.8 51.7
30.2 42.2
61.0 28.3
19.9 49.7
78.4 39.9
54.6
52.6
62.0
49.7
59.3
13.8
(c)
1982
Tempe r a t ure Conductivity
DO2
24.4
65.0
37.8
ALK
24.8
45.2
46.5
NO3
5.0
41.2
74.3
SR S
TP
46.4 49.7
59.4 36.0
73.9 43.7
TFN
15.0
53.9
44.6
NH3
46.7 44.7
32.6 72.9
46.7 74.6
POC
16.2 23.6
23.2 55.1
70.3 55.2
TPN
31.2
,61.4
CHLA
23.2
69.4
CL
SO4
-
60.1
-
33.8 13.8
25.8
66.0 37.5
47.1
91.0
41.9
50.3 41.1
78.2
26.1
-
41.9 31.3 76.3 66.5 51.2 57.4 39.6 64.0
13.1
36.5 11.3
33.8
27.6
16.1 16.9
7.2
57.9 42.1
36.4
-
-
56.6
53.5
56.3
58.1
48.6
30.4
20.6
32.2
10.5
47.0
48.7
47.2
57.3
33.3
43.5
21.2
11.2
44.4
This indicates that the proposed composite zone map is indeed a more effective scheme for identifying water masses of similar water quality. For each cruise, a measure of the effectiveness of the composite zones in reducing the total spatial variability demonstrated for each parameter, EC, is presented in Table 4. The values of EC lie between 0 and 100, and the closer this value is to 100, the more effective the zonation pattern.
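As a worked illustration of EC, the short function below computes the percentage of a cruise's total sum of squares that is removed by grouping stations into zones. It assumes that the within-zone term is the sum of squared deviations from each zone mean; the function name and the toy values are hypothetical, not data from the study.

```python
import numpy as np

def explained_variability(values, zones):
    """EC = (1 - TS_within / TS_total) * 100 for one parameter on one cruise."""
    values = np.asarray(values, dtype=float)
    zones = np.asarray(zones)
    ts_total = np.sum((values - values.mean()) ** 2)
    ts_within = 0.0
    for zone in np.unique(zones):
        in_zone = values[zones == zone]
        ts_within += np.sum((in_zone - in_zone.mean()) ** 2)
    return (1.0 - ts_within / ts_total) * 100.0

# Four stations in two zones: most of the spread lies between the zones.
ec = explained_variability([1.0, 2.0, 5.0, 6.0], [1, 1, 2, 2])
```

In this toy case the total sum of squares is 17 and the within-zone sum is 1, so EC is (1 - 1/17) x 100, or about 94.1. When the zone means coincide, the within-zone sum equals the total and EC is 0.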
For most of the parameters measured there did not appear to be a direct relationship between time of sampling and EC, although the percent variation explained by the zones was generally higher in spring (April/May). This suggests that the composite zones have optimal application in the spring. Hence, the effect of spatial variability on the accurate determination of trend is greatly reduced by incorporating this zonation scheme into the analysis of spring data.
Having now determined a classification scheme which identifies distinct water masses of similar water quality in Lake Ontario, the next step will be to use these zones to derive the minimum number of stations which need to be sampled, and their distribution over the lake, to obtain the same efficiency as that obtained using the sampling strategy outlined in this study.
REFERENCES
Anderberg, M.R. 1973. Cluster analysis for applications. Academic Press, New York. 359p.
Anderson, J.E., A.H. El-Shaarawi, S.R. Esterby and T.E. Unny. 1984. Spatial and temporal variability of dissolved oxygen in Lake Erie, p. 103-130. In A.H. El-Shaarawi (ed.) Statistical assessment of the Great Lakes Surveillance Program, 1966-1981, Lake Erie. Environment Canada. IWD Scientific Series No. 136.
Barica, J. 1982. Lake Erie depletion controversy. J. Great Lakes Res. 8(4):719-722.
Box, G.E.P. and D.R. Cox. 1964. An analysis of transformations. J. R. Statist. Soc. B 26:211-243.
Carew, T.J. and D.J. Williams. 1975. Surveillance methodology. Inland Waters Directorate. Technical Bulletin No. 92. 28p.
Charlton, M.N. 1980. Oxygen depletion in Lake Erie: Has there been any change? Can. J. Fish. Aquat. Sci. 37:72-81.
El-Shaarawi, A.H. 1982. Sampling strategy for estimating bacterial density in large lakes. J. français d'hydrologie 13:171-187.
El-Shaarawi, A.H. 1984. Temporal changes in Lake Erie, p. 27-102. In A.H. El-Shaarawi (ed.) Statistical assessment of the Great Lakes Surveillance Program, 1966-1981, Lake Erie. Environment Canada. IWD Scientific Series No. 136.
El-Shaarawi, A.H. and R.E. Kwiatkowski. 1977. A model to describe the inherent spatial and temporal variability of parameters in Lake Ontario 1974. J. Great Lakes Res. 3(3-4):177-183.
El-Shaarawi, A.H. and M.A. Neilson. 1984. Changes in nutrient levels of lake water stored at 4°C. Can. J. Fish. Aquat. Sci. 41(6):985-988.
El-Shaarawi, A.H. and K.R. Shah. 1978. Statistical procedures for classification of a lake. Inland Waters Directorate Scientific Series No. 86. 9p.
Environment Canada. 1979. Analytical Methods Manual. Water Quality Branch, Inland Waters Directorate, Ottawa, Canada.
Gachter, R., R.A. Vollenweider and W.A. Glooschenko. 1974. Seasonal variations of temperature and nutrients in the surface waters of lakes Ontario and Erie. J. Fish. Res. Board Can. 31:275-290.
International Joint Commission. 1977. Great Lakes Water Quality Board. Appendix B. Surveillance Subcommittee Report. 110p.
Kwiatkowski, R.E. 1980. Regionalization of the Upper Great Lakes with respect to surveillance eutrophication data. J. Great Lakes Res. 6:38-46.
Kwiatkowski, R.E. 1984. Comparison of 1980 Lake Huron, Georgian Bay-North Channel surveillance data with historical data. Hydrobiol. 118:255-266.
Leach, J.H. 1980. Limnological sampling intensity in Lake St. Clair in relation to distribution of water masses. J. Great Lakes Res. 6(2):141-145.
Moll, R.A., R. Rossmann, D.G. Rockwell, and W.Y.B. Chang. 1985. Lake Huron intensive survey, 1980. Special report no. 110. Great Lakes Research Division, Great Lakes & Marine Waters Center, University of Michigan, Ann Arbor, Michigan.
Neilson, M.A. 1983. Trace metals in Lake Ontario, 1979. Inland Waters Directorate. Scientific Series No. 133. 13p.
Neilson, M.A. and R.J.J. Stevens. In press. Vertical and horizontal distribution of nutrients and particulate organic matter in Lake Ontario - 1981, 1982. Can. J. Fish. Aquat. Sci.
Nie, N.H., C.H. Hull, J.G. Jenkins, K. Steinbrenner, and D.H. Bent. 1975. Statistical Package for the Social Sciences. McGraw-Hill, New York. 675p.
Rosa, F. and N.M. Burns. 1985. Lake Erie Central Basin oxygen depletion changes from 1929-1980. Environment Canada. NWRI Contribution #85-102.
Shiomi, M.T. and V.K. Chawla. 1970. Nutrients in Lake Ontario. Proc. 13th Conf. Great Lakes Res. 715-732.
Schroeder, R. 1969. Ein summierender Wasserschöpfer. Arch. Hydrobiol. 66:241-243.
van Belle, G. and J.P. Hughes. 1983. Monitoring for water quality: fixed stations versus intensive surveys. J. Water Pollut. Control Fed. 55:400-404.
Ward, Jr., J.H. 1963. Hierarchical grouping to optimize an objective function. J. Amer. Statist. Assoc. 58(301):236-244.
Ward, Jr., J.H. and M.E. Hook. 1963. Application of an hierarchical grouping procedure to a problem of grouping profiles. Educ. and Psychol. Measurement 23(1):69-82.
SPATIAL VARIABILITY IN THE WATER QUALITY OF QUEBEC RIVERS
MARC SIMONEAU
Ministère de l'Environnement du Québec, Direction des relevés aquatiques, 3900, rue Marly, 5e étage, Sainte-Foy (Québec), Canada, G1X 4E4
ABSTRACT
The spatial variability of the water quality of Quebec rivers was studied using data collected over a five-year period (1979-1983). These data, which were obtained through the operation of a monitoring network, were analyzed using multivariate analytical methods. A principal component analysis (PCA) was used to condense the information contained in the data matrix and to identify the physical chemical parameters responsible for the major portion of the among stations variance. Furthermore, a cluster analysis (using the squared Euclidean distance as similarity coefficient and Ward's method as an agglomerative hierarchical clustering algorithm) was performed to reveal the presence of homogeneous water quality regions in the province. The PCA produced two significant principal components which explain 76 percent of the among stations variance. The first axis (51 percent) represents a mineralization and nutrient gradient whereas the second (25 percent) represents an organic content and color gradient. The cluster analysis has revealed the presence of six distinct groups, whose water quality reflects the geology of the different basins and, to various degrees, the anthropogenic activities. One of these groups is composed of twelve problem-stations, most of which are found in drainage basins with small surface area affected by local point source pollution.
INTRODUCTION
Water quality monitoring of Quebec rivers is an activity that was begun in 1967 with the creation, by the ministère des Richesses Naturelles, of a water quality network. As was the case in many other countries, a basic knowledge
of the water quality (and its spatial and temporal evolution) was found to be lacking at the time, and was needed in order to solve the problems caused by the utilization of the water resources in the province of Quebec.
Since the early days, the water quality network has undergone many modifications in order to improve the quality of the collected data and to simplify its operation. Among those were the systematization of the sampling procedure, a change in the conservation method of the samples, a decrease in the delay between sampling and laboratory analyses and finally, the use of better analytical methods. Over the years, the list of measured parameters became more extensive and, in addition to the major ions and physical parameters, came to include the nutrients and trace metals. Furthermore, changes took place in the list of sampling sites, as new stations were added and old ones discontinued or relocated closer to the mouth of the rivers.
The progressive development of the network was brought about by a change in the objectives, which became more precise. The initial goal of obtaining a basic knowledge of water quality was found insufficient and too general, and therefore was replaced. The new goals became the characterization of the water quality of rivers on an annual or seasonal basis; the study of the spatial variability of Quebec rivers; the study of the temporal evolution of the measured parameters and the prediction of their long term trend; and finally, the creation for potential users of an adequate water quality data bank.
The current edition of the river water quality network, now operated by the Direction des relevés aquatiques du ministère de l'Environnement, in existence since the end of 1978 (Goulet 1979), is the result of the recommendations formulated by Bobee et al. (1977) in their thorough evaluation of the network activities. These authors studied the data collected between 1967 and 1975 and proposed corrective measures.
This paper uses the data collected between January 1979 and December 1983 to study the spatial variability in the water quality of Quebec rivers. The objectives of the study are to identify the parameters responsible for the observed among stations variance, and to detect the presence of homogeneous water quality regions in the province.
STUDY AREA
The river network is composed of 136 stations located in 81 drainage basins, south of latitude 52° (Fig. 1). The sampled rivers come from eight
Fig. 1. Position of the sampling stations used in monitoring the water quality of Quebec rivers.
of the ten hydrological regions of the province of Quebec. The choice of the rivers was based on the surface area and the population density of their drainage basin (Goulet 1979). Minimum values were set in both cases in order to eliminate the small basins (< 400 km²) and the quasi-uninhabited northern basins (< 500 inhabitants). The presence of polluting industries or mines, and the importance of lumbering and farming on the basins, were also considered in the selection of the rivers. Additional sampling stations were required in areas where socio-economic activities were more intense or dispersed, to better evaluate the effects of these activities and to observe how water quality varies along the different reaches of the same river.
Sampling procedure
The water samples collected at the different stations come from two distinct sources: the observers and the technicians of the Ministry. The observers are inhabitants living close to the sampling stations. They are recruited, trained and paid by the Ministry to collect a water sample fortnightly and to send it to the laboratory where the chemical analyses are performed. This group, which samples 113 stations, collects the major portion of the samples (> 80 percent). Furthermore, they are asked to report to the Ministry any emergency situation that might arise on the river (spills, fish kills, etc.) so that immediate action may be taken.
The rest of the samples are collected by the technicians of the Ministry on the same rivers sampled by the observers and at exactly the same location, but on a seasonal basis. The technicians are also the only ones to sample 23 other stations, on a seasonal or monthly basis. In addition to the routine water sample collection, they perform some field measurements and take additional samples for the analyses of particular parameters and for occasional bioassays.
The water samples collected by both the observers and the technicians are depth-integrated grab samples. They are obtained by sinking a sampling iron at a constant rate through the water column and retrieving it after the desired depth has been reached. Sampling takes place from a bridge, in the middle of the river bed. The water samples, contained in polyethylene bottles, are kept refrigerated and are sent to the laboratory in an insulated shipping box with ice-packs. The samples are usually received by the laboratory within a 24-hour period.
Laboratory analysis
All the chemical analyses were performed by the laboratory of the ministere de l'Environnement du Quebec (Complexe Scientifique, 2700 rue Einstein, Sainte-Foy, Quebec, G1P 3W8). The analyzed parameters included the major and minor ions, the nutrients, the trace metals and physical parameters. The complete parameter list and their measurement frequencies are shown in Table 1. The methods used in performing the chemical determination are described in Longpre et al. (1982).
Data analysis
The raw data matrix used in the present study contained all the measurements obtained for 36 parameters at 134 sampling stations between 1979 and 1983 (Fig. 2). Two stations were removed at the onset of the analysis because they were not sampled over the whole five-year period. This data set was synthesized by computing, for each parameter and by station, a median value for this time period. The new data matrix was further reduced as twelve parameters only were chosen, as discussed later, for the subsequent statistical analyses. These variables were calcium, magnesium, chloride, sulfate, iron, total nitrogen, total phosphorus, total organic carbon, tannins and lignins, turbidity, alkalinity and pH.
The first analysis performed was a principal component analysis (PCA), using the correlation matrix between the twelve parameters as a starting point. The correlation matrix (standardized data) was chosen instead of the covariance matrix (centered data) because the parameters selected for the analysis had different magnitudes, ranges and scales of measurement which, if not taken into account, would have given more weight to certain variables due entirely to their respective variance (Legendre et Legendre, 1983; Whitfield, 1983). This particular type of ordination transforms a data set containing n observations (samples) on p variables (physical chemical variables) into a reduced data set containing n observations on k components, in some manner that minimizes the loss of information caused by the reduction (Green, 1979). The PCA summarizes the parameters by accounting for the major portion of the among stations variance.
The second analysis used in the study was a clustering procedure, the purpose of which was to produce groups of stations with similar water quality. Data were standardized prior to the calculation of a similarity
TABLE 1
List of variables measured in the water samples along with their sampling frequencies.

OBSERVERS
  EVERY 4 WEEKS (13 PER YEAR): PH, ALKALINITY, COLOR, TURBIDITY, TANNINS AND LIGNINS, FLUORIDE, SILICA, SULFATE, CHLORIDE, CALCIUM, MAGNESIUM, SODIUM, POTASSIUM, IRON, MANGANESE, COPPER, ZINC, LEAD, CADMIUM, NICKEL, CHROMIUM, ARSENIC
  EVERY 2 WEEKS (25 PER YEAR): TEMPERATURE, CONDUCTIVITY, CARBON (TOTAL, INORGANIC), NITROGEN (DISSOLVED: KJELDAHL, AMMONIA, NITRATE + NITRITE), PHOSPHORUS (TOTAL DISSOLVED, TOTAL PARTICULATE)

TECHNICIANS
  SEASONAL (MONTHLY FOR 6 STATIONS): SAME PARAMETERS AS ABOVE, DIRECT MEASUREMENTS (DISSOLVED OXYGEN, PH, CONDUCTIVITY, TEMPERATURE), NONFILTRABLE RESIDUES, CYANIDES, TOTAL INORGANIC PHOSPHORUS, ALUMINUM (TOTAL, DISSOLVED), BIOASSAYS (SOME STATIONS)
  OCCASIONAL: SILVER, BARIUM, COBALT, LITHIUM, SELENIUM, STRONTIUM, OTHER TOXICANTS
[Fig. 2 flow chart: RAW DATA MATRIX (36 PARAMETERS X 21906 SAMPLES) -> MEDIANS MATRIX (36 PARAMETERS X 134 STATIONS) -> DATA REDUCTION (SELECTION OF PARAMETERS) -> MEDIANS MATRIX (12 PARAMETERS X 134 STATIONS). From there, two branches: ORDINATION (PCA): CORRELATION MATRIX -> FACTOR PATTERN -> PRINCIPAL COMPONENTS SCORES; CLUSTER ANALYSIS: DATA STANDARDIZATION -> SQUARED EUCLIDEAN DISTANCE -> WARD'S METHOD -> CLUSTERS WITH SIMILAR WATER QUALITY STATIONS. The two branches are joined by SUPERIMPOSITION.]
Fig. 2. Diagram showing the steps followed in the data analysis.
coefficient, the squared Euclidean distance. Standardization was necessary because the Euclidean distance does not have a maximum value: it increases with the number of parameters selected and is affected by the original scales of the parameters (Legendre et Legendre, 1983). Ward's method was used as the agglomerative hierarchical clustering algorithm. The results of the cluster analysis were then superimposed on the plot of the principal components scores to show the exact relationships between the objects (stations). Both the PCA and the cluster analysis were performed using SAS programs (SAS Institute Inc., 1982).
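The data reduction and distance steps described above can be sketched as follows. This is only an illustration in Python, not the SAS code used in the study: the station names, the sample values and the helper names (`station_medians`, `standardize`, `squared_euclidean`) are hypothetical, only two of the twelve parameters are shown, and the sample standard deviation is assumed for the standardization.

```python
from statistics import mean, median, stdev

# Hypothetical raw measurements: station -> parameter -> list of samples.
raw = {
    "A": {"calcium": [2.0, 3.0, 40.0], "pH": [6.4, 6.5, 6.6]},
    "B": {"calcium": [20.0, 22.0, 24.0], "pH": [7.4, 7.5, 7.9]},
    "C": {"calcium": [10.0, 12.0, 14.0], "pH": [7.0, 7.1, 7.2]},
}

def station_medians(raw):
    """One median per station and parameter (little affected by extreme values)."""
    return {s: {p: median(v) for p, v in params.items()} for s, params in raw.items()}

def standardize(table):
    """Center each parameter on its mean and divide by its standard deviation."""
    stations = list(table)
    params = list(next(iter(table.values())))
    out = {s: {} for s in stations}
    for p in params:
        column = [table[s][p] for s in stations]
        m, sd = mean(column), stdev(column)  # assumption: sample (n-1) standard deviation
        for s in stations:
            out[s][p] = (table[s][p] - m) / sd
    return out

def squared_euclidean(z, a, b):
    """Squared Euclidean distance between two stations in standardized space."""
    return sum((z[a][p] - z[b][p]) ** 2 for p in z[a])

med = station_medians(raw)
z = standardize(med)
d_ab = squared_euclidean(z, "A", "B")
d_ac = squared_euclidean(z, "A", "C")
```

In this toy example the single extreme calcium value at station A (40.0) does not move its median (3.0), which is the reason given in the text for preferring the median to the mean. A distance matrix built this way is the kind of input an agglomerative procedure such as Ward's method works from.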
RESULTS AND DISCUSSION
As often is the case with physical chemical variables, most of the parameters chosen for the statistical analyses were found to have skewed concentration distributions over time and over stations. Consequently, we used the median as estimator of the central tendency of the data since it is not affected as much as the mean by extremely high values. No attempt was made to filter out the temporal effects since most stations were sampled on a regular basis and data were obtained for each and every month of the year. The stations sampled by technicians only were visited on a seasonal basis in order to obtain data which showed the annual variability of water quality. Furthermore, since data used in this study covered a 60-month period, they take into account inter-annual variability and decrease the risk of getting non-representative water quality data imputable to unusual hydrological events which could prevail on a given year. Consequently, data used in the present study give a reliable image of the water quality of each station (river or river reach).
The variables selected to perform the PCA and the cluster analysis were chosen so as to offer a general image of the water quality. They showed considerable variation between stations and most of them were, at least logically, independent of each other. Furthermore, these variables could reflect the geological and land use effects on water quality.
Principal component analysis
A first PCA, performed on the 134 stations, has revealed that twelve rivers behaved very differently from the others. In order to avoid a distortion of the spatial variability image, these stations were removed from the data set. Their water quality will be discussed later. The PCA, conducted on the remaining 122 stations, has produced three components with eigenvalues equal to or greater than one. However, based on the broken stick model (Frontier 1976), only the first two are considered in our study, since the percentage of variance explained by the third component was inferior to the percentage predicted by the model.
Stations that lie close together in the reduced plane are not necessarily similar, since they may be apart on a third or fourth component. To solve this problem, the relationships between stations were studied using a cluster analysis. The cluster analysis produced five distinct groups of stations using the same twelve variables selected for the PCA. By superimposing the results of the cluster analysis on the principal components scores (Fig. 4), we identified five groups of stations and the respective position of these
[Fig. 3: plot of the twelve descriptor axes in the plane of the first two principal components; horizontal axis: FIRST COMPONENT, scaled from -1.0 to 1.0.]
Fig. 3. Projection of the twelve descriptor axes in the reduced plane formed by the first two principal components. Also drawn is the equilibrium circle of contribution ($(d/n)^{1/2} = (2/12)^{1/2} = 0.41$).
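The radius quoted in the caption of Fig. 3 and the broken stick criterion (Frontier 1976) used to decide how many components to retain can both be checked with a few lines of Python. This is a sketch under stated assumptions: the function name `broken_stick` is hypothetical, the model is taken in its usual form (expected share of the k-th of p components equal to (1/p) times the sum of 1/i for i from k to p), and the observed shares of 51 and 25 percent are those reported in the text for the first two axes.

```python
from math import sqrt

def broken_stick(p):
    """Expected share of variance of each of p components under the broken stick model."""
    return [sum(1.0 / i for i in range(k, p + 1)) / p for k in range(1, p + 1)]

p = 12                       # number of descriptors used in the PCA
expected = broken_stick(p)   # shares expected if the variance were split at random
observed = [0.51, 0.25]      # shares reported for the first and second axes

# A component is retained when it explains more than the model predicts.
retained = [obs > exp for obs, exp in zip(observed, expected)]

# Equilibrium circle of descriptors for a plane of d = 2 dimensions.
radius = sqrt(2 / p)
```

With twelve descriptors the model predicts about 26, 18 and 13 percent for the first three components, so the first two axes clearly pass the criterion, and a third component would have to explain more than about 13 percent to be retained.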
[Fig. 4: scatter plot of the station scores with the cluster groups outlined; horizontal axis: FIRST COMPONENT.]
Fig. 4. Superimposition of the cluster analysis results on the principal components scores (position of the stations in the reduced plane).
TABLE 2
Ranges of station median concentrations within the groups revealed by the cluster analysis.

Variable                      Group 1      Group 2      Group 3      Group 4      Group 5      Group 6a     Group 6b
                              (n=48)       (n=31)       (n=13)       (n=23)       (n=7)        (n=2)        (n=10)
Calcium (mg l-1)              1.20-8.10    4.?0-23.80   17.00-36.95  11.25-35.50  7.10-29.00   5.40-8.00    10.70-70.00
Magnesium (mg l-1)            0.30-1.35    1.40-5.00    2.90-12.00   2.00-6.60    1.50-4.30    1.00-1.40    6.00-30.00
Chloride (mg l-1)             0.2-4.0      1.1-32.0     12.6-46.5    1.1-11.0     1.7-12.2     3.2-17.9     22.5-88.5
Sulfate (mg l-1)              1.0-8.5      5.6-30.2     10.1-31.0    5.0-19.2     6.0-19.8     7.6-42.8     14.0-102.0
Iron (mg l-1)                 0.03-0.57    0.15-0.86    0.32-0.68    0.01-0.35    0.39-1.97    0.51-0.66    0.29-2.27
Total nitrogen (mg l-1)       0.10-0.49    0.34-0.90    0.84-2.07    0.20-0.64    0.42-0.77    0.28-0.73    0.90-4.00
Total phosphorus (mg l-1)     0.010-0.040  0.020-0.110  0.060-0.370  0.010-0.080  0.040-0.120  0.050-0.132  0.083-0.289
Total org. carbon (mg l-1)    6.0-15.5     7.5-14.0     10.2-17.0    5.0-10.8     15.5-23.0    43.0-80.0    9.25-18.00
Tannins and lignins (mg l-1)  0.60-2.30    0.60-1.85    0.60-1.20    0.10-0.70    1.50-3.26    10.00-15.85  0.55-1.60
Turbidity (NTU)               0.4-5.2      1.5-10.0     3.8-20.0     1.0-6.0      6.0-17.5     6.0-25.0     3.0-40.0
Alkalinity (mg l-1)           1.8-17.0     12.0-50.0    44.0-97.0    42.0-93.0    11.0-60.0    7.0-13.0     102.0-188.5
pH                            6.10-7.30    7.00-7.60    7.20-7.80    7.50-7.80    6.70-7.70    6.40-6.70    7.04-8.40

The following variables were not used in the cluster analysis:

Variable                         Group 1      Group 2      Group 3      Group 4      Groups 5 and 6a            Group 6b
Sodium (mg l-1)                  0.5-3.8      1.5-20.0     6.0-20.0     1.4-7.8      2.4-10.4; 5.5-17.0         10.0-75.0
Potassium (mg l-1)               0.2-0.8      0.5-1.8      1.5-3.7      0.3-1.6      0.9-2.5; 0.7-1.3           2.6-6.1
Manganese (mg l-1)               0.01-0.03    0.01-0.16    0.04-0.19    0.01-0.06    0.06-0.09; 0.05-0.11       0.04-0.12
Ammonia (mg l-1)                 0.01-0.04    0.02-0.16    0.06-0.70    0.01-0.08    0.02-0.15; 0.01-0.12       0.05-0.90
Nitrate + nitrite (mg l-1)       0.02-0.31    0.15-0.55    0.22-1.01    0.10-0.36    0.06-0.24; 0.05-0.10       0.50-1.55
Kjeldahl nitrogen (mg l-1)       0.06-0.31    0.16-0.50    0.38-1.51    0.03-0.32    0.22-0.59; 0.29-0.53       0.35-1.60
Total part. phosphorus (mg l-1)  0.003-0.021  0.009-0.051  0.029-0.127  0.005-0.240  0.022-0.050; 0.027-0.040   0.027-0.180
Total diss. phosphorus (mg l-1)  0.03-0.018   0.009-0.057  0.030-0.190  0.006-0.027  0.017-0.052; 0.021-0.077   0.058-0.192
Copper (µg l-1)                  2.0-26.0     2.5-14.0     4.0-8.8      2.5-33.0     2.2-29.8; 5.0-7.0          2.5-9.0
Lead (µg l-1)                    1.0-20.0     3.0-7.5      5.0-20.5     3.0-7.5      7.5-11.0; 7.5-7.5          5.0-9.0
Zinc (µg l-1)                    5.0-14.0     5.0-28.1     10.0-30.0    5.0-20.0     6.1-160.0; 15.5-20.0       8.2-180.0
Silica (mg l-1)                  2.6-10.15    2.1-6.3      3.5-6.5      0.8-5.5      4.8-7.2; 4.1-9.2           5.4-9.4
Conductivity (µS cm-1)           12.0-73.0    58.0-246.0   185.0-328.5  122.0-208.0  74.0-216.0; 68.0-227.0     252.0-791.5
True color (Hazen)               12.0-56.0    17.0-37.0    20.0-49.0    1.0-23.0     37.0-50.0; 49.0-82.0       25.0-61.0
Hardness (mg l-1)                4.2-25.9     19.1-73.1    60.0-142.0   49.0-105.1   17.8-25.8; 25.6-87.4       65.1-279.3

For groups 5 (n=7) and 6a (n=2), the two ranges of each variable are given in the order in which they appear in the source; their assignment to the two groups could not be recovered.
groups in the reduced plane informed us about their physical chemical characteristics. Hence, the two analyses complemented each other very well and produced an image which summarizes all the information contained in the initial data matrix. Table 2 provides a summary of the water quality of each group. Some of the variables, not used in the cluster analysis, are not listed in this table because they did not show any variation among the groups (fluoride, cadmium, chromium and nickel), they provided redundant information (total carbon, inorganic carbon, and apparent color) or they were only measured at some stations on few occasions (trace metals and toxicants). The first group revealed by the cluster analysis contains most of the stations (rivers) located on the Canadian Shield (Fig. 5a). They correspond to large quasi-uninhabited drainage basins virtually unaffected by human activities (low nitrogen and phosphorus concentrations). The water quality of these rivers reflects the geology of the Canadian Shield dominated by Precambrian rocks very resistant to erosion. As a result, these waters are weakly mineralized and have low alkalinity, pH and turbidity values (Table 2).
The second group contains rivers whose water quality shows the influence of various human activities. Agriculture and farming, the presence of pulp and paper mills and/or municipal discharges pollute these rivers to some extent. These waters are more mineralized than those of group 1, have higher alkalinity and pH values (Table 2), and correspond to drainage basins located in the St. Lawrence lowlands and to the Ottawa River below Temiscaming (Fig. 5b). A high percentage of the phosphorus and nitrogen values recorded at the stations of this group exceed the water quality guidelines proposed for the protection of aquatic life (McNeely et al. 1980). Members of group 3 are more polluted than those of group 2. They belong to five basins of the St. Lawrence lowlands region which also suffer from various anthropogenic activities (Fig. 5c). The Yamaska River basin, which is densely populated, has 40 percent of its surface area used for agricultural practices (including commercial livestock) and hence counts numerous agriculture food-related and textiles-related industries. The Nicolet River basin similarly has 35 percent of its territory devoted to agriculture, compared to 26 percent for the Châteauguay River and 15 percent for the L'Achigan River. These three basins also have various industries (furniture, dyeing and finishing textiles, and food-canning industries). The L'Achigan River, which is part of the L'Assomption River drainage basin, suffers particularly from the swine farming industries concentrated in this region. Finally, the Pike River, which belongs to the Richelieu River basin,
also shows the influence of agriculture, the major activity of the region. All those activities taking place on the basins, in addition to the municipal discharges from the different agglomerations, contribute to the poor water quality of these rivers. Their waters are strongly mineralized, and show high alkalinity, pH, turbidity, total nitrogen and total phosphorus values (Table 2).
The rivers which belong to group 4 are all found on the south shore of the St. Lawrence River, mainly in the Gaspe Peninsula and the lowlands regions (Fig. 5d). They correspond to drainage basins virtually unaffected by human activities and, as a result, the water quality of these rivers truly reflects the geology of the Appalachian plateau and the St. Lawrence lowlands (sedimentary rocks susceptible to weathering and composed of soluble minerals). The water characteristics of this group are similar to those of group 2 in terms of mineralization and nutrients concentrations (Table 2). However, they differ markedly for the parameters associated with the second principal component, namely tannins and lignins, total organic carbon and iron. These rivers all have relatively transparent and weakly colored waters, with a low turbidity and a high pH.
The rivers which constitute group 5 differ very much from those of group 4 since they have the most colored and turbid waters. They show the highest tannins and lignins, iron and total organic carbon concentrations (Table 2). Geographically speaking however, these rivers do not come from the same region (Fig. 5e). Their water quality seems related to the surface area of the drainage basin and the nature of the top soils. For example, the Gentilly River and the Du Chêne River both have small basins which essentially drain two regions of the St. Lawrence lowlands whose soils are dominated by marine clays. Similarly, the Ticouape River basin is small and poorly drains the lowlands found north of Lake Saint-Jean. On the other hand, the Rivière-du-Loup River, which has a larger drainage basin, seems to be affected by the numerous organic matter deposits and peat-bogs dispersed in this region of the St. Lawrence lowlands. Finally, the Kinojevis River and the headwaters of the Harricana River have a water quality which is influenced by the presence of humic soils and wetlands, and by the mining activities taking place in the region (high copper and zinc concentrations).
As mentioned above, the first PCA performed on the initial 134 stations has revealed the existence of twelve rivers which differ markedly from the
Fig. 5. Geographical location of the stations composing each of the six clusters.
rest of the stations. These problem rivers (stations) were removed from the data set to obtain a clearer image of the spatial variability which otherwise would have been distorted. A closer look at these highly polluted rivers reveals that they all have relatively small drainage basins (≤ 540 km2), and are concentrated in the St. Lawrence lowlands, except for one (the Malbaie River), which has a larger basin (1850 km2) and is located on the Canadian Shield (Fig. 5f). Furthermore, in addition to the geological effects and the influence, in some cases, of agriculture, these rivers all suffer from the presence of point sources of pollution. As a result of their low discharges, most of them have reduced self-cleaning capacity. The pollutants entering them are less diluted and tend to remain longer in the aquatic environment. The twelve rivers can be subdivided into two groups. The first one, constituted by two rivers strongly affected by a pulp and paper mill and other industries (the Malbaie and the Shawinigan Rivers), shows the highest median values for both total organic carbon and tannins and lignins. However, the water quality observed at the mouth of these rivers should not be considered representative of the whole basin since the pollution sources are concentrated in this segment of the rivers. The second group, containing the ten other stations, has the most mineralized waters. Some of these rivers present the highest median values for alkalinity, pH, turbidity, and total nitrogen, and their total phosphorus concentrations are similar to those of group 3. The uppermost station of the Becancour River belongs to this group and shows, in addition to the effects of other human activities, the influence of asbestos mining on water quality (high magnesium concentrations). However, the river condition improves markedly downstream from the mining area, and the water quality observed at Lyster (middle station on the river) places this station in group 4.
The other rivers of the group drain the St. Lawrence lowlands and have a water quality which reveals severe anthropogenic effects (agriculture, industries and municipal discharges).

The percentage of variance explained by the third component was inferior to the percentage predicted by the broken stick model. The reduced plane formed by the first two components explains 76 percent of the variance among stations. The twelve variables used in the PCA, when projected in the reduced (two-dimensional) plane, all produced vectors that exceed the equilibrium circle of descriptors (Legendre and Legendre, 1983) and consequently contribute significantly to the formation of the plane. The variables associated with the first component were sulfate, chloride, total phosphorus, total nitrogen, turbidity, calcium, magnesium, alkalinity and pH. This axis represents a mineralization gradient. The variables correlated with the second component are the tannins and lignins, iron and total organic carbon. This second axis illustrates an organic content and water color gradient. Since the eigenvectors were normalized to the square root of their respective eigenvalue, the angle between two descriptor axes or between a descriptor axis and a component (Fig. 3) represents the correlation between variables or between a variable and a component (Legendre and Legendre 1983). The percentage of variance explained by the two components is rather high. The first axis in particular (51 percent), which explains twice the amount of variance of the second axis (25 percent), suggests that there is some redundancy in the information concerning the mineralization of water (Scherrer, 1984). The major ions used to characterize the geology of the different drainage basins are strongly correlated with each other. This redundancy could have been reduced by summing cations and anions and using the two sums instead of the respective ions. There is nevertheless no doubt that mineralization plays a major role in the variability among stations since the geology of the different basins varies considerably at the scale of the province (for example, the Canadian Shield versus the Appalachian Plateau).
The positioning of the objects (stations) in the reduced plane preserved the Euclidean distances of the standardized (centered and reduced) data since the scoring coefficients were normalized to give principal components scores with unit variance (SAS Institute Inc., 1982). This representation eliminates the effects related to the units of measurement and the respective variance of the variables.
Cluster analysis
The principal components scores positioned the objects (stations) in the reduced plane according to their respective water quality. However, the proximity of two objects in a reduced plane does not necessarily imply that they are similar.

CONCLUSION
The use of multivariate techniques of analysis has produced very interesting results. The PCA has identified the list of parameters responsible for most of the among stations (rivers) variability. The superimposition of the cluster analysis results on the principal components scores (position of the objects in the reduced plane) has shown which stations share a similar water quality. Furthermore, the relative position of the five homogenous groups in the plane informed us about their general water quality characteristics. Ordination and cluster analysis complemented each other very well and summarized all the information contained in the data matrix. The six groups of stations (or rivers) revealed by our analysis show the importance of geological factors and land uses on the water quality. The rivers of groups 1 and 4 are mostly pristine, and their water quality reflects the geology of the Canadian Shield and the Appalachian Plateau region respectively. The geographical regions corresponding to these two groups have low population densities and hence, these rivers are relatively unaffected by human activities. Groups 2 and 3 represent rivers which are affected to different degrees by anthropogenic activities taking place on the drainage basins. The land use effects and municipal discharges from the agglomerations add up to the geological effects to produce the observed water quality. Agricultural practices play a major role as determinants of water quality in these geographical regions which are also densely inhabited. The seven rivers forming group 5 have a water quality which reflects the particular nature of the soils of these drainage basins, their surface area and the drainage quality. Finally, our study has identified problem rivers which are, with a few exceptions, found in the most populated and most industrialized region of Quebec. These rivers, which are characterized by small drainage basins and low discharges, suffer from the important socio-economic activities going on in the region. For some of these rivers, the observed water characteristics are biased by the presence of a few major sources of pollution which often mask what would otherwise be an acceptable water quality.

REFERENCES
Bobee, B., D. Cluis, M. Goulet, M. Lachance, L. Potvin, et A. Tessier. 1977. Evaluation du reseau de la qualite des eaux. Analyse et interpretation des donnees de la periode 1967-1975.
Service de la qualite des eaux, Ministere des Richesses naturelles du Quebec, Q.E. 20, Quebec. 2 volumes, 514 p.
Frontier, S. 1976. Etude de la decroissance des valeurs propres dans une analyse en composantes principales: Comparaison avec le modele du bâton brisé. J. exp. mar. Biol. Ecol. 25: 67-75.
Goulet, M. 1979. Reseau de base de la qualite du milieu aquatique en rivieres a l'echelle du Quebec, Service de la qualite des eaux, ministere de l'Environnement, rapport interne 79-04, 60 pages, Envirodoq 02015.
Green, R. 1979. Sampling design and statistical methods for environmental biologists. John Wiley and Sons, New York, 257 p.
Legendre, L. and P. Legendre. 1983. Numerical ecology. Developments in Environmental Modelling, 3. Elsevier, Amsterdam, 419 p.
Longpre, G., G. Joubert, et J. Trottier. 1982. Guide d'information sur l'analyse physique, chimique, biologique et bacteriologique des milieux environnementaux. Ministere de l'Environnement du Quebec, Direction des laboratoires, 152 p.
McNeely, R.N., V.P. Neimanis et L. Dwyer. 1980. References sur la qualite des eaux. Guide des parametres de la qualite des eaux. Direction generale des eaux interieures. Direction de la qualite des eaux. Ottawa. 100 p.
SAS Institute Inc. 1982. SAS user's guide: statistics, 1982 edition. SAS Institute Inc., Cary, North Carolina. 584 p.
Scherrer, B. 1984. Analyse en composantes principales. G.R.E.B.E. Inc., Montreal, 95 p.
Whitfield, P.H. 1983. Regionalization of water quality in the upper Fraser basin, British Columbia. Water Res. 17, 1053-1066.
ESTIMATION OF DISTRIBUTIONAL PARAMETERS FOR CENSORED WATER QUALITY DATA
DENNIS R. HELSEL
U.S. Geological Survey, Reston, Virginia

INTRODUCTION
Investigations of trace substances in ambient waters increasingly conducted during the last 10 years have encountered a recurring difficulty: a substantial portion of water-sample concentrations are below the limits of detection established by analytical laboratories. Data sets with these "less-than" observations are termed "censored data" in statistical terminology. Censored data do not present a serious interpretation problem if concentrations of primary interest are well above the detection limit, but this is often not the case. For some chemicals, established water-quality criteria are below commonly applied detection limits. For many others, the great uncertainty in the effects of long-term exposure to very low levels also makes it desirable to assess the frequency of occurrence of concentrations below the detection limit. In short, there is a need to estimate the frequency distribution of concentrations above, near, and below detection limits using only data above the detection limit.
The purpose of this study is to address several key aspects of estimating distributional parameters from censored data. These include:
o The performance of several estimation methods when estimating distributional parameters from small samples drawn from a wide range of underlying distributions and censored to varying degrees.
o Criteria for determining, based only on attributes of data remaining after censoring, which estimation method is most likely to be best for each data set.
o The reliability of estimates from censored data of four distributional parameters: the mean, standard deviation, median, and interquartile range.
APPROACH
1. Generation of data. Sixteen parent distributions were selected as representative of the range of frequency distributions that is typical of trace water-quality data. Five hundred data sets of sample sizes 10, 25, and 50 were generated from these distributions and censored to varying degrees by a detection limit.
2. Parameter estimation methods. Eight methods were evaluated for estimating the mean, standard deviation, median, and interquartile range of censored data. The reliability and relative performance of methods was evaluated based on their root mean squared errors (RMSEs).
3. Estimation without classification. For each censoring level and sample size, all data sets from the 16 parent distributions were combined for computation of RMSEs for each method and distribution parameter. Best methods, based on minimum RMSE, were identified for each parameter for every combination of censoring level and sample size. RMSEs of these best methods for each such combination were evaluated in relation to the most robust method over all simulation conditions.
4. Estimation with classification. Method selection and the accuracy of RMSEs might be improved by classifying data sets based on attributes of data above the detection limit. Several sample statistics were computed for each data set and the one which best indicated the parent distribution was selected. All data sets were then classified using that statistic. Benefits in method selection and improved accuracies of RMSEs were evaluated.
5. Verification. Method selection results were verified by applying the same type of analysis to actual water-quality data. The classification system developed in the simulations was tested by comparing method performance for actual and simulated data within each class, and by evaluating the ability of classification to separate water-quality data sets having different RMSEs of parameter estimates.
6. Estimation of sample statistics. The ability of the eight methods to estimate the value of uncensored sample statistics, rather than the population parameter as before, was evaluated in a simulation using the same 16 parent distributions, censoring levels, and sample sizes. The resulting RMSEs were compared to those for estimating population parameters. Finally, these results were verified using uncensored trace-metal data sets from the U.S. Geological Survey's National Stream Quality Accounting Network (NASQAN).
Each of the sections outlined above is now discussed further. Additional detail, including tables of results, may be found in Gilliom and Helsel (1985) and Helsel and Gilliom (1985).
GENERATION OF DATA

In designing the Monte Carlo experiments, a primary goal was to mimic as closely as possible the types of data that actually occur for concentrations of trace constituents in water. Based on the sample properties and the visual inspection of sample histograms, four parent distributions with positive skew were chosen: lognormal, contaminated lognormal (mixture of two lognormals), gamma, and delta (lognormal augmented by zeros). Four variants of each distribution were considered, having CVs of 0.25, 0.50, 1.0, and 2.0. The density functions for the resulting 16 parent distributions are shown in figure 1. In all cases, the means equaled 1.0.
Fig. 1. Probability density functions for the parent distributions used in simulations (curves shown for CV = 0.25, 0.50, 1.0, and 2.0).

A boxplot which combines 100 data sets from each of the 16 parent distributions is compared to boxplots for trace metal and nutrient plus sediment data from the USGS NASQAN program in figure 2. Presented are coefficients of variation (CV) and a measure of symmetry, MS,

$$MS = \frac{q_{75} - q_{50}}{q_{50} - q_{25}}$$

where $q_i$ is the ith percentile of the data set. All three types of data have similar distributions of these non-dimensional variance and symmetry characteristics. Therefore, these 16 distributions were considered representative of the distributions of trace constituent concentrations found in water.

Fig. 2. Symmetry measure (MS) and coefficient of variation (CV) for three data types: T, trace (N=781); V, verification (nutrient and sediment) (N=918); S, simulated (N=1600). [* 35 data sets have denominator = 0, and are beyond the maximum.]

The relationships used to generate data from these distributions are summarized below, followed by a brief description of the sizes and censoring of data sets. All x's refer to real-space values and all y's refer to log-space values.

Lognormal Distribution

When $y = \ln x$ is normally distributed with mean $\mu_y$ and variance $\sigma_y^2$, a set of concentrations, $x_i$, $i = 1, \ldots, n$, can be generated using equation 1:

$$x_i = \exp(\mu_y + \sigma_y \varepsilon_i) \qquad (1)$$

where $\varepsilon_i$ is a randomly chosen value from a normal distribution with a mean of zero and variance of one.

Contaminated Lognormal Distribution

The contaminated lognormal distribution used in this study consists of a mixture of one predominant lognormal ($\mu_{x1}$, $\sigma_{x1}$), which describes 80 percent of the overall population, and a contaminant lognormal ($\mu_{x2}$, $\sigma_{x2}$), which describes 20 percent of the overall population. Proportional relationships were specified between the parameters of the two distributions which allowed unique solutions for their exact parameters for any overall distribution specified by $\mu_x$ and $\sigma_x$. The conditions imposed were:

$$\mu_{x2} = 1.5\,\mu_{x1} \quad \text{and} \quad \frac{\sigma_{x2}}{\mu_{x2}} = 2.0\,\frac{\sigma_{x1}}{\mu_{x1}}$$
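The lognormal and contaminated lognormal relationships above can be sketched in Python as follows. This is an illustration only, not the program used in the study. The function names are hypothetical. The conversion from a real-space mean and CV to $\mu_y$ and $\sigma_y$ uses the standard lognormal moment relations, and the component parameters of the contaminated lognormal are found by assuming that the two imposed conditions are combined with the usual mixture formulas for the overall mean and second moment.

```python
import math
import random

def lognormal_log_params(mean, cv):
    """Log-space (mu_y, sigma_y) of a lognormal with the given real-space mean and CV."""
    sigma_y = math.sqrt(math.log(1.0 + cv * cv))
    mu_y = math.log(mean) - 0.5 * sigma_y ** 2
    return mu_y, sigma_y

def generate_lognormal(n, mean, cv, rng):
    """Equation 1: x_i = exp(mu_y + sigma_y * eps_i), with eps_i ~ N(0, 1)."""
    mu_y, sigma_y = lognormal_log_params(mean, cv)
    return [math.exp(mu_y + sigma_y * rng.gauss(0.0, 1.0)) for _ in range(n)]

def contaminated_components(mean, cv, weight=0.8, mean_ratio=1.5, cv_ratio=2.0):
    """Solve for (mean1, cv1, mean2, cv2) of the two lognormal components.

    Assumed mixture identities (w = weight of the predominant component):
        mean                = w*m1 + (1-w)*m2
        mean^2 * (1 + cv^2) = w*m1^2*(1 + cv1^2) + (1-w)*m2^2*(1 + cv2^2)
    with the imposed conditions m2 = mean_ratio*m1 and cv2 = cv_ratio*cv1.
    """
    w = weight
    m1 = mean / (w + (1.0 - w) * mean_ratio)
    m2 = mean_ratio * m1
    scaled_second_moment = mean ** 2 * (1.0 + cv ** 2) / m1 ** 2
    a = w + (1.0 - w) * mean_ratio ** 2 * cv_ratio ** 2  # coefficient of cv1^2
    b = w + (1.0 - w) * mean_ratio ** 2                   # constant term
    cv1_sq = (scaled_second_moment - b) / a
    if cv1_sq <= 0.0:
        raise ValueError("no solution for these target values")
    cv1 = math.sqrt(cv1_sq)
    return m1, cv1, m2, cv_ratio * cv1

def generate_contaminated(n, mean, cv, rng):
    """Draw each value from the predominant (80 percent) or contaminant component."""
    m1, cv1, m2, cv2 = contaminated_components(mean, cv)
    data = []
    for _ in range(n):
        m, c = (m1, cv1) if rng.random() < 0.8 else (m2, cv2)
        data.extend(generate_lognormal(1, m, c, rng))
    return data

# Example: one data set of 25 values with overall mean 1.0 and CV 0.5.
sample = generate_contaminated(25, 1.0, 0.5, random.Random(1))
```

Under these assumptions a solution exists for all four CVs considered (0.25, 0.50, 1.0, and 2.0) when the overall mean is 1.0.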
Gamma Distribution

Two-parameter gamma distributions, characterized by a shape parameter, $\alpha_x$, and a scale parameter, $\beta$, were generated using the International Mathematical and Statistical Libraries generating routine.
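A comparable sketch for the gamma case is given below. The study used an International Mathematical and Statistical Libraries routine; Python's standard-library `random.gammavariate` is swapped in here purely for illustration. The mapping from mean and CV to shape and scale relies on the standard gamma identities (mean = shape × scale, CV = 1/√shape), and the function names are hypothetical.

```python
import random

def gamma_params(mean, cv):
    """Shape and scale of a gamma distribution with the given mean and CV."""
    shape = 1.0 / (cv * cv)   # CV of a gamma distribution is 1/sqrt(shape)
    scale = mean / shape      # mean of a gamma distribution is shape * scale
    return shape, scale

def generate_gamma(n, mean, cv, rng):
    """Draw n values; gammavariate(alpha, beta) takes shape alpha and scale beta."""
    shape, scale = gamma_params(mean, cv)
    return [rng.gammavariate(shape, scale) for _ in range(n)]

# Example: one data set of 25 values with mean 1.0 and CV 0.5.
sample = generate_gamma(25, 1.0, 0.5, random.Random(0))
```

With a mean of 1.0 and CV = 2.0, this mapping gives a shape of 0.25 and a scale of 4.0.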
Delta Distribution

The delta distribution is a mixture of a lognormal distribution ($\mu_{x1}$, $\sigma_{x1}$) and some portion (p) of zero values. For all simulations, the portion of zeros was 5 percent (p = 0.05). The mean and standard deviation of the overall distribution were given by Aitchison (1955).

Sample Sizes and Censoring

Of interest was the effect of censoring on data sets of varying sample sizes. Therefore, three separate simulations were conducted, with data sets of 10, 25, and 50 observations. In each simulation, 500 data sets were generated from each of the 16 parent distributions. All data sets were censored at four different levels (detection limits): the 20th, 40th, 60th, and 80th percentiles of the parent distributions. Such high percentages of censoring are common in trace-level water-quality data. With this "type I" censoring (David, 1981), the actual percentage of observations censored varied for each data set due to sample variability. For the gamma distribution with CV=2.0, the 20th and 40th percentiles were so close to zero (0.0043 and 0.070) that they were discarded as being unrealistic detection limits.

We required the condition that at least three observations be present in each data set after censoring or the data set was discarded. For n=10, this eliminated about 72 percent of the data for censoring at the 80th percentile. Results for n=10 at the 80th percentile were therefore not considered meaningful.

PARAMETER ESTIMATION METHODS
Eight methods were evaluated for estimating the population mean, standard deviation, median, and interquartile range. These are listed below along with their abbreviations used in this report.

ZE: Censored observations were assumed to equal zero.

DL: Censored observations were assumed to equal the detection limit.

UN: Censored observations were assumed to follow a uniform distribution between zero and the detection limit. Thus, for the ordered censored observations $x_i$, $i = 1, 2, \ldots, n_c$, where $n_c$ = number of observations censored, $x_i = dl\,(i-1)/(n_c-1)$, a distribution symmetric around one-half the detection limit ($dl$).

NR: Censored observations were assumed to follow the zero-to-detection limit portion of a normal distribution which was fit to the uncensored observations using least squares regression as follows. "Normal scores," z, were computed for each uncensored observation using

$$z = \Phi^{-1}\left(\frac{r}{n+1}\right)$$

where $\Phi^{-1}$ is the inverse cumulative normal distribution function, r is the observation rank ($r = n_c+1, \ldots, n$) and n is the sample size for the entire data set. A least-squares regression of concentration on normal scores for all data above the detection limit was extrapolated to estimate censored observations (ranks $r = 1, \ldots, n_c$). Any estimated values falling below zero were set equal to zero.

LR: Censored observations are assumed to follow the zero-to-detection limit portion of a lognormal distribution fit to the uncensored observations by least squares regression. The method is identical to NR, except that concentrations were log-transformed prior to analysis.

NM: Concentrations are assumed to be normally distributed with parameters estimated from the uncensored observations by the maximum likelihood method for a censored normal distribution (Cohen, 1959).

LM: Concentrations are assumed to be lognormally distributed with parameters estimated using logarithms of the uncensored observations in Cohen's (1959) maximum likelihood method. The mean and standard deviation of the untransformed concentrations are then estimated using the equations given by Aitchison and Brown (1957).

DT: Censored observations are assumed to be zero and uncensored observations are assumed to follow a lognormal distribution. Estimates of parameters of the overall delta distribution are obtained by computing maximum likelihood estimates of parameters of the uncensored lognormal portion and using relationships between these and the overall delta distribution described by Aitchison (1955).

The commonly used method of discarding censored observations prior to calculating parameter estimates was not included in this study. Discarding censored observations will always result in both higher bias and higher RMSE than the DL method. Because this can never be the most appropriate (minimum RMSE) method, it was not considered here. The commonly used substitution of values equal to one-half the detection limit was also not included, due to its similarity to the UN method. These two methods will produce identical estimates for the mean, while a range in values between zero and the detection limit for the UN method should produce better estimates of the other three parameters than substituting a single, arbitrary value for all censored data.
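The substitution (ZE, DL, UN) and regression (NR, LR) methods listed above can be sketched as follows. This is a minimal illustration under stated assumptions rather than the code used in the study. The function names are hypothetical. A single censored value is placed at one-half the detection limit for UN, because the stated formula is undefined when $n_c = 1$. The sample standard deviation and the quartile rule are those of Python's `statistics` module, which may differ from the definitions the authors used. The maximum likelihood methods (NM, LM, DT) are not covered.

```python
import math
import statistics

NORMAL = statistics.NormalDist()  # standard normal, used for normal scores

def censor_type1(data, dl):
    """Type I censoring: values below the detection limit dl are censored."""
    uncensored = sorted(x for x in data if x >= dl)
    return uncensored, len(data) - len(uncensored)

def fill_uniform(nc, dl):
    """UN: x_i = dl*(i-1)/(nc-1) for i = 1..nc."""
    if nc == 1:
        return [dl / 2.0]  # assumption: the formula is undefined for nc = 1
    return [dl * i / (nc - 1) for i in range(nc)]

def fill_regression(uncensored, nc, log_transform):
    """NR (log_transform=False) or LR (log_transform=True)."""
    n = nc + len(uncensored)
    y = [math.log(x) for x in uncensored] if log_transform else list(uncensored)
    # Normal scores z = inverse-normal(r / (n + 1)) for ranks nc+1 .. n.
    z = [NORMAL.inv_cdf(r / (n + 1)) for r in range(nc + 1, n + 1)]
    z_bar, y_bar = statistics.fmean(z), statistics.fmean(y)
    slope = (sum((zi - z_bar) * (yi - y_bar) for zi, yi in zip(z, y))
             / sum((zi - z_bar) ** 2 for zi in z))
    intercept = y_bar - slope * z_bar
    # Extrapolate the fitted line to the censored ranks 1 .. nc.
    fitted = [intercept + slope * NORMAL.inv_cdf(r / (n + 1))
              for r in range(1, nc + 1)]
    if log_transform:
        return [math.exp(v) for v in fitted]
    return [max(v, 0.0) for v in fitted]  # NR: negative estimates set to zero

def estimate(data, dl, method):
    """Fill in censored values by `method`, then compute sample statistics."""
    uncensored, nc = censor_type1(data, dl)
    if method == "ZE":
        filled = [0.0] * nc
    elif method == "DL":
        filled = [dl] * nc
    elif method == "UN":
        filled = fill_uniform(nc, dl)
    elif method in ("NR", "LR"):
        filled = fill_regression(uncensored, nc, log_transform=(method == "LR"))
    else:
        raise ValueError("method not covered by this sketch: " + method)
    values = filled + uncensored
    q1, q2, q3 = statistics.quantiles(values, n=4)
    return {"mean": statistics.fmean(values), "sd": statistics.stdev(values),
            "median": q2, "iqr": q3 - q1}
```

Because the regression and substitution methods only fill in the censored values and then compute ordinary sample statistics, all five share the final step in `estimate`. For a hypothetical data set with two values censored at a detection limit of 0.3, UN fills in 0 and 0.3, so its mean equals the mean obtained by substituting 0.15 for both.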
The evaluation of the reliability of estimation methods was based on RMSEs computed from actual parameters of the underlying distribution. Deviations between the parameter values estimated from each censored data set and those of the underlying distribution were divided by the true population values to express RMSEs as fractions of the true values. For example, the equation for the RMSE of the mean is

$$RMSE = \left[\frac{1}{N}\sum_{i=1}^{N}\left(\frac{\bar{x}_i - \mu}{\mu}\right)^2\right]^{1/2}$$

where $\bar{x}_i$ is the estimate of the mean for the ith of N data sets. Also computed were the bias portion of the RMSE and the standard error of the RMSE, which describes the reliability of RMSE estimates.

ESTIMATION WITHOUT CLASSIFICATION

Simulation results without classification of data sets are given in figure 3 for data sets of size n=25 to show the typical pattern of results for all parameter estimation methods. Though RMSEs are higher and lower for n=10 and n=50, respectively, the same estimation methods always perform well for a particular combination of censoring level and distributional parameter.

Fig. 3. Errors of estimating the mean, standard deviation (SD), median, and interquartile range (IQR). Sample size equals 25, with censoring at the 40th percentile. [Methods plotted: ZE, DL, UN, NR, LR, NM, LM, DT.]

There are several ways to approach identifying the "best" estimation method(s). One approach would be to designate a best method for every single combination of censoring level, parameter, and sample size. Alternatively, a single robust method could be chosen that works well over the entire range of conditions simulated.

Figure 4 illustrates these two method-selection approaches. The best overall method was chosen by summing the ranks of RMSEs for each method over all sample sizes, censoring levels, and parameters. The method with the smallest sum of ranks, LR, was considered best. RMSEs for LR are shown for all parameters in figure 4, along with those for any other methods having RMSEs significantly ($\alpha$=0.05) lower than that of LR. Little reduction in RMSE for the mean and standard deviation is accomplished by considering different sample sizes and censoring levels separately. The RMSEs of LR are lowest, or not significantly different than the lowest, in virtually every situation.

For the median and interquartile range, on the other hand, significant reductions in RMSE can be achieved by using the best method for a particular set of conditions rather than using LR for all (fig. 4). The largest reductions in RMSE occur for small sample sizes and high censoring. For all but four combinations of censoring level and sample size, the best method for estimating the median and interquartile range is LM. For the interquartile range at 20 percent censoring, LM is tied with LR for n=25 and n=50. For the median at 80 percent censoring and n=25 and n=50, LM is a close second to NR.

Fig. 4. Root mean squared errors for best estimation methods. [Panels plot RMSE against population percentile of censoring level for n=10, n=25, and n=50, where n is the number of observations in each sample before censoring. Curves show the RMSE of the LR method and the RMSE of the best method, indicated for each datum.]

Figure 4, while showing the extremes of method selection approaches, suggests an effective third course: selecting LR for the mean and standard deviation and LM for the median and interquartile range. In fact, LR has the lowest sum of ranks (lowest rank with lowest RMSE) of any method for the mean and standard deviation over all censoring levels and sample sizes while LM has the lowest sum of ranks for the median and interquartile range. Little reduction in RMSE is accomplished by using other methods for differing sample sizes or censoring levels.

The LM method produces some erratically high estimates of the mean and standard deviation (figure 3), particularly for higher censoring levels. This occurred for the same data sets for which LM generally produced the best estimates of the median and interquartile range, and can be explained using figure 5. The estimated probability distributions produced by the LM and LR methods are compared to the parent distribution for one high CV data set censored at the 60th percentile. Figure 5 illustrates that the LM method produced an estimated distribution that more closely mimics the parent distribution than the LR method. This results in accurate estimates of percentiles. To do this, however, the mean and standard deviation were grossly overestimated at 4.7 and 453, respectively. The LR method, though not mimicking the shape of the parent distribution, produced accurate estimates of the mean (1.09) and standard deviation (2.10). Because the LR, NR, and UN methods involve simply calculating sample parameter statistics after estimating censored observations, they rarely produce wild estimates of distributional parameters.

Fig. 5. Estimated frequency distributions by LM and LR (n=25) compared to the gamma CV=2.0 parent distribution. The data set was censored at the 60th population percentile. [Curves: parent gamma ($\mu$ = 1.0, $\sigma$ = 2.0); LR ($\bar{x}$ = 1.09, s = 2.10); LM ($\bar{x}$ = 4.7, s = 453).]
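The relative RMSE defined at the start of this discussion and the sum-of-ranks selection used in this section can be sketched as follows. This is an illustration under assumptions, not the authors' procedure. The function names are hypothetical, tied RMSEs are given the mean of the ranks they span (the text does not say how ties were handled), and the RMSE values in the example are invented solely to show the mechanics.

```python
import math

def relative_rmse(estimates, true_value):
    """RMSE of the estimates, expressed as a fraction of the true value."""
    n = len(estimates)
    return math.sqrt(sum(((e - true_value) / true_value) ** 2 for e in estimates) / n)

def rank_methods(rmse_by_method):
    """Rank methods within one condition; rank 1 is the lowest RMSE."""
    ordered = sorted(rmse_by_method, key=rmse_by_method.get)
    ranks = {}
    i = 0
    while i < len(ordered):
        j = i
        # Extend j over any methods tied with the method at position i.
        while (j + 1 < len(ordered)
               and rmse_by_method[ordered[j + 1]] == rmse_by_method[ordered[i]]):
            j += 1
        mean_rank = (i + j) / 2.0 + 1.0  # assumption: ties share the mean rank
        for k in range(i, j + 1):
            ranks[ordered[k]] = mean_rank
        i = j + 1
    return ranks

def sum_of_ranks(conditions):
    """Sum ranks over conditions (one {method: rmse} dict per condition)."""
    totals = {}
    for rmse_by_method in conditions:
        for method, rank in rank_methods(rmse_by_method).items():
            totals[method] = totals.get(method, 0.0) + rank
    return totals

# Invented RMSEs for three conditions (sample size, censoring level, parameter).
conditions = [
    {"LR": 0.30, "LM": 0.35, "UN": 0.40},
    {"LR": 0.50, "LM": 0.45, "UN": 0.60},
    {"LR": 0.20, "LM": 0.90, "UN": 0.30},
]
totals = sum_of_ranks(conditions)
best = min(totals, key=totals.get)  # method with the smallest sum of ranks
```

A method can win on the sum of ranks without being best in every condition, which is the sense in which a single robust method is selected over the whole range of conditions.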
ESTIMATION WITH CLASSIFICATION

Rankings and RMSEs were previously presented in figure 3 with all 16 parent distributions equally represented. If the parent distribution were known, however, the other 15 could be ignored, with the resulting method ranking and RMSE magnitudes possibly quite different than figure 3. For example, figure 6 separately presents RMSEs for the mean for data sets from each of the four lognormal distributions. All data sets consisted of 25 observations and were censored at the 80th percentile. For a lognormal distribution with CV=0.25, the lowest ranked estimation method (LM) has an RMSE of 9 percent, while for CV=2.0 the LR method has a lower RMSE of 39 percent. However, with all 16 distributions together LR and LM estimate the mean with an RMSE near 30 percent (fig. 6). Therefore, if the parent distribution of a data set could be inferred from attributes of data above the detection limit, improved estimates of RMSE magnitude and perhaps method selection should result. This is the goal of classification.

Fig. 6. Estimation errors using the LR and LM methods for 4 lognormal distributions (differing CV's) and for all 16 parent distributions combined. Sample size equals 25, with censoring at the 80th percentile.
Note that if the true CV were 2.0, an RMSE of 39 percent would be larger than the 30 percent with all 16 distributions included. Yet it would be a more accurate estimate for that high error parent distribution. Classification in this case should exclude data from lower error (lower CV) distributions.

Selection of Class Boundaries

Four dimensionless sample statistics computed from the data above the detection limit were evaluated for their ability to classify each data set into a group containing one or more parent distributions. Successful classification occurred when the parent distribution generating that data set was contained in the assigned group. The most effective statistic was the relative quartile range or rqr (Gilliom and Helsel, 1985), a measure of the dispersion of data above the detection limit relative to the magnitude of the detection limit. The best separation between groups was evaluated using pairwise discriminant analysis. The probability density function equations for each consecutive group pair were solved, and the point at which two densities were equal was the optimum point of separation. Some distribution groups could not be discriminated, and therefore some rqr classes represent two distribution groups.

Benefits of Classification

The best estimation method was determined for each combination of sample size, censoring level and rqr class. In light of the results without classification, best methods for the mean and standard deviation were determined separately from those for the median and interquartile range. The best method was that which minimized the ranks of RMSEs across the two distributional parameters being considered. If additional methods had RMSEs not significantly different (t-test at $\alpha$=0.05) from the best for both parameters, these were also included as "best." Finally, a single best method over all three sample sizes was selected for each rqr class. Results are given in Gilliom and Helsel (1985).
The single best method was often the only method that qualified for best for all three sample sizes. Where more than one method qualified or where none was best over all sample sizes, the method which minimized the sum of squared RMSEs over the three sample sizes was selected.

In every rqr class, the best estimation method for the median and interquartile range was LM. Prior to classification the LR method was generally best for estimating the mean and standard deviation, but with classification the LM, UN, or NR methods sometimes produce slightly lower RMSE than did LR. These slightly lower RMSEs are in most instances not significantly different ($\alpha$=0.05) from the RMSE of LR. Even where differences are statistically significant, they are not large. In contrast, none of LM, UN, or NR is similarly robust over all rqr classes. For example, LM has a slightly but significantly lower RMSE than LR for both the mean and standard deviation at the 60th percentile censoring level and rqr = 0.25 to 0.60 (n=25). Yet LM is the worst method in the next highest rqr class (rqr = 0.60 to 1.4) for both the mean and standard deviation, with RMSEs over 100 percent of the true value for standard deviation.

When applying parameter estimation methods to actual water quality data, an important consideration is method robustness. Given the possibility of mis-classifying individual data sets based on rqr, and the small increases in RMSE when LR is used for any rqr class, the use of the more robust LR method is best for making low-risk estimates of the mean and standard deviation for all data sets.

Accuracy of RMSEs

Though the classification system does not, in practice, alter method selection compared to results with no classification, it does result in superior estimates of error (RMSE), by considering differences due to the probable parent distribution. Figure 6 showed that RMSEs vary considerably between data sets from different parent distributions. The classification system was designed to indicate the types of parent distributions from which each data set may have originated, and therefore yield more accurate estimates of error (whether higher or lower) than the average RMSE for all data sets from all 16 parent distributions, such as given in figure 3.

To illustrate the improvement in RMSE accuracy following classification, the data for 60th percentile censoring (n=25) are plotted in figure 7.
Shown in the figure are the RMSEs for perfect classification into parent distribution group, those for the actual classification according to rqr, and the RMSE without classification. When data sets are classified, more reliable RMSE estimates are obtained.

Fig. 7. Comparison of RMSEs with and without classification for estimates of the median from data sets of n=25 censored at the 60th population percentile. [Plotted against distribution group: the 95-percent confidence interval of RMSE when all data sets are correctly classified; the 95-percent confidence interval of RMSE for all data sets falling in the rqr class corresponding to each distribution group; and the RMSE for all data sets combined and no classification.]

Gilliom and Helsel (1985) show that the rqr classification system results in RMSEs which are very similar to the best estimate of true RMSE, that of perfect classification. Only at 80th percentile censoring do the RMSE values substantially depart from truth. This reflects the greater inability to correctly classify highly censored data sets. Even at 80th percentile censoring, however, rqr classification generally improves the accuracy of RMSE estimates over those with no classification.

VERIFICATION

Uncensored data sets with more than 50 observations for suspended sediment, total phosphorus, total Kjeldahl nitrogen, and nitrate nitrogen concentrations were obtained from 313 stations of the U.S. Geological Survey's NASQAN network. Most data were monthly samples taken during 1974-81, resulting in 917 data sets having more than 50 observations and no censoring.

Suspended sediment and major nutrients data were analyzed rather than trace constituents because:

o most available data sets for trace constituents consisted of less than 30 observations.

o most trace constituent data sets contained censored observations.

o suspended sediment and nutrients are transported by the same types of processes as many trace constituents.

This last point is important because similarity in transport process will tend to result in similarly shaped frequency distributions. This similarity was previously compared in figure 2.
For the verification tests, two subsamples, one of size n=10 and one of n=25, were randomly selected with replacement from each of the 917 sediment and nutrient data sets. Each resulting small sample was censored at 20, 40, 60, and 80 percent by the type II method (David, 1981), as population percentiles were not known. With this method the same fraction of each data set is censored. Each of the eight parameter estimation methods was applied to each censored sample. Sample statistics computed from the original (n>50) sediment and nutrient data sets were used as estimates of the true population parameters in RMSE calculations.

Results

Best methods for the verification data, defined as methods with the lowest RMSE or with RMSEs not significantly (t-test at α=0.05) larger than the lowest, were identical to those of the simulation. The best overall method for estimating the mean, standard deviation, median, and interquartile range, based on having the smallest sum of RMSE ranks over all four distributional parameters, four censoring levels, and three sample sizes, was a tie between LR and UN. LR produced the lowest summed RMSE rank for the moment parameters and LM for the percentile parameters for the verification data.

Verification data were then classified by relative quartile range (rqr), and RMSEs were calculated for each rqr class. Ranks of method RMSEs were again separately summed for the moment and percentile parameters over both n=10 and n=25 sample sizes. No RMSEs were significantly (t-test at α=0.05) lower than those of LR for the moment parameters and of LM for the percentile parameters. Therefore for every rqr class these two methods are either best, or not significantly different from the best, and no significant reduction in error would result from selecting separate methods for each rqr class. This method selection exactly follows that of the simulation study.

The verification results are strong evidence that the previous simulation study led to optimal choice of estimation methods for the mean, standard deviation, median, and interquartile range of censored water-quality data sets. Furthermore, the verification results show that the rqr classification system developed from simulation studies provides an effective means of distinguishing between data sets originating from different types of parent distributions.

ESTIMATION OF SAMPLE STATISTICS

For some applications, estimates of sample statistics rather than population parameters might be desired from censored data. Uncensored water-quality data are summarized by their sample statistics, and comparisons between these data and censored data should be on an equal basis.

Second Simulation Study

To determine how well the eight methods estimate sample statistics, a second simulation study was performed. Distributional shapes and other criteria are identical to the previous simulation study. However, RMSEs and bias were calculated (using the mean for example) as:
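RMSE = \left[ \frac{1}{N} \sum_{i=1}^{N} \left( \frac{x_i - \bar{x}_0}{\bar{x}_0} \right)^2 \right]^{1/2}    (3)

bias = \frac{1}{N} \sum_{i=1}^{N} \frac{x_i - \bar{x}_0}{\bar{x}_0}    (4)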
where x̄0 is the sample mean for the uncensored data set (replacing μ), and the other parameters are as previously given.

Censoring was at the 20, 40, 60, and 80th percentiles of each simulated sample (type II censoring), as opposed to percentiles of the parent population in the first simulation study (type I censoring). This was to facilitate comparison with the verification results.

An example of the results is shown in figure 8. Best methods for the moment and percentile parameters in this new simulation study were LR and LM, respectively, based on the sum of method rankings over all censoring levels. The overall best method was LR. Best performing methods for estimating sample statistics were thus identical to those for estimating population parameters. However, the magnitudes of RMSEs differ from those for population parameters. RMSEs of sample estimates in figure 8 can be compared to those of the population parameters presented in figure 3. RMSEs are generally smaller when estimating sample statistics. Therefore, confidence intervals around the LR or LM estimate calculated from the bias and RMSE (Helsel and Gilliom, 1985) are smaller for inclusion of the uncensored sample statistic as compared to the population parameter. RMSEs for the moment sample statistics decrease with increasing rqr class, the opposite trend from that of the population parameters. This is due to the greater influence of the higher observations on the sample mean and standard deviation with higher rqr. These higher observations remain after censoring, producing a more accurately estimated sample statistic while indicating much less about the population parameter.

Fig. 8. Errors of estimating the uncensored sample mean, standard deviation (SD), median, and interquartile range (IQR). Sample size equals 25, with censoring at the 40th percentile.

Verification of Sample Statistic Estimates

To verify the new simulation results, the uncensored trace metal data sets summarized in figure 2 were censored (type II) at the 20, 40, 60, and 80th sample percentiles and errors were calculated by comparison to the uncensored sample estimates. Table 1 lists the water-quality parameters chosen and the number of data sets for each. Sample sizes ranged from 10 to 40 observations. Eleven other trace constituents had no data sets which contained only uncensored observations and were not used. In order to obtain a larger number of data sets, iron and manganese data were included even though they are not usually found at "trace" levels.

Trace metal data sets containing 10 to 20 observations were combined into one group, representing sample sizes generally comparable to n=10 simulation results. Data sets having fewer than three data points after censoring were deleted. A second group of data sets having from 21 to 40 observations was formed for comparison to n=25 simulation results. The eight estimation methods were applied to these data. Again, LR proved the best overall method. LR was best for the moment parameters and LM was best for the percentile parameters, based on the rank criteria given previously.
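
The type II censoring and error calculation used in this verification can be sketched in a few lines of Python. This is only an illustration under stated assumptions, not the authors' program: it assumes that errors are expressed relative to the uncensored sample statistic (one reading of equations 3 and 4), and the function names and the simple substitution estimator are hypothetical stand-ins for any one of the eight methods.

```python
import math
import random


def type_ii_censor(sample, fraction):
    """Type II censoring: drop the lowest `fraction` of the ordered sample.

    Returns the remaining (uncensored) values in ascending order and the
    number of censored observations.
    """
    ordered = sorted(sample)
    n_censored = int(round(fraction * len(ordered)))
    return ordered[n_censored:], n_censored


def relative_rmse_and_bias(estimates, references):
    """RMSE and bias of estimates, each relative to its reference value."""
    rel = [(est - ref) / ref for est, ref in zip(estimates, references)]
    n = len(rel)
    rmse = math.sqrt(sum(r * r for r in rel) / n)
    bias = sum(rel) / n
    return rmse, bias


def substitution_mean(uncensored, n_censored):
    """Hypothetical stand-in estimator (an assumption, not a method from the
    paper): replace each censored value by half the smallest uncensored value."""
    fill = 0.5 * uncensored[0]
    total = sum(uncensored) + n_censored * fill
    return total / (len(uncensored) + n_censored)


def demo(n_sets=200, n=25, fraction=0.40, seed=0):
    """Censor synthetic lognormal samples and score the stand-in estimator."""
    rng = random.Random(seed)
    estimates, references = [], []
    for _ in range(n_sets):
        sample = [rng.lognormvariate(0.0, 1.0) for _ in range(n)]
        kept, n_cens = type_ii_censor(sample, fraction)
        if len(kept) < 3:  # mirror the rule of deleting very small data sets
            continue
        estimates.append(substitution_mean(kept, n_cens))
        references.append(sum(sample) / len(sample))  # uncensored sample mean
    return relative_rmse_and_bias(estimates, references)
```

Calling `demo()` returns a relative RMSE and bias for the stand-in estimator on synthetic data; replacing `substitution_mean` with another estimator allows methods to be ranked by RMSE in the manner described above.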
When classified by rqr, RMSEs for actual trace water-quality data were similar to those of the simulations. Only median estimates for 60 and 80 percent censoring appear different, with simulation RMSEs higher than actual. This is perhaps due to the inclusion of larger sample sizes in the actual trace-data estimates, with the simulation results representing conservative error estimates based only on n=10 or n=25.

CONCLUSIONS

The most robust estimation method for minimizing errors in estimates of the mean, standard deviation, median, and interquartile range of censored data was the log-probability regression method (LR). This method is based on the assumption that censored observations follow the zero-to-censoring level portion of a lognormal distribution obtained by a least-squares regression between logarithms of uncensored concentration observations and their normal scores. When method performance was evaluated separately for each distributional parameter, LR resulted in the lowest RMSEs for the mean and standard deviation. The lognormal maximum likelihood estimator for censored data (LM) produced lowest RMSEs for the median and interquartile range. These two methods constitute the best procedures for their respective parameters.

Using the relative quartile range (rqr), the interquartile range of uncensored observations divided by the detection limit, censored data sets can be classified into groups reflecting probable parent distributions. Within these rqr groups, the accuracy of RMSEs substantially improved over those without classification.

The eight methods were applied to uncensored suspended sediment and nutrient data having large sample sizes (n>50). Selection of the estimation method that was best overall, best for moment and percentile parameters separately, and best within every rqr class exactly followed those of the simulation.

Errors in estimating statistics of uncensored samples rather than population parameters were also evaluated. Best methods for estimating sample statistics were LR and LM, respectively, for the moment and percentile parameters. RMSEs were almost always smaller when estimating sample statistics than for population parameters (LM median estimates occasionally have greater RMSEs), and were sometimes much smaller. Therefore, estimates of uncensored sample statistics are identical to those of population parameters, but have shorter confidence intervals.

These results form the basis for making the best possible estimates of either population parameters or sample statistics from censored water-quality data. The LR method for moment parameters and LM method for percentile parameters should be the methods of choice when estimating distributional parameters for censored trace-level water-quality data.
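
The log-probability regression (LR) idea described in these conclusions can be sketched as follows. This is a minimal Python illustration, not the authors' implementation: the Blom plotting positions used for the normal scores, the single censoring level, the step of combining extrapolated values with the observed ones before computing moments, and all function names are assumptions made for the example.

```python
import math
from statistics import NormalDist, mean, stdev


def normal_scores(n):
    """Normal scores for ranks 1..n, using Blom plotting positions
    (i - 3/8) / (n + 1/4); the choice of plotting position is an assumption."""
    return [NormalDist().inv_cdf((i - 0.375) / (n + 0.25))
            for i in range(1, n + 1)]


def lr_fill_in(uncensored, n_censored):
    """Fit log(concentration) against normal scores for the uncensored
    observations by least squares, then extrapolate the fitted line to the
    ranks occupied by the censored observations.

    Returns the extrapolated values followed by the observed values.
    """
    observed = sorted(uncensored)
    if len(observed) < 2 or observed[0] <= 0:
        raise ValueError("need at least two positive uncensored observations")
    scores = normal_scores(len(observed) + n_censored)
    z_cens, z_obs = scores[:n_censored], scores[n_censored:]
    logs = [math.log(v) for v in observed]
    z_bar, y_bar = mean(z_obs), mean(logs)
    slope = (sum((z - z_bar) * (y - y_bar) for z, y in zip(z_obs, logs))
             / sum((z - z_bar) ** 2 for z in z_obs))
    intercept = y_bar - slope * z_bar
    filled = [math.exp(intercept + slope * z) for z in z_cens]
    return filled + observed


def lr_mean_sd(uncensored, n_censored):
    """Mean and standard deviation of the combined (filled + observed) sample."""
    full = lr_fill_in(uncensored, n_censored)
    return mean(full), stdev(full)
```

For example, `lr_mean_sd(values_above_limit, n_below_limit)` takes the observations above the censoring level and the count of observations below it; both argument names are placeholders.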
Table 1.--Trace constituents from the NASQAN network used to estimate sample statistics

                            Number of data sets
Parameter                   n=10-20    n=21-40
arsenic                       100          7
dissolved arsenic               3         63
barium                          5          0
boron                          11          3
dissolved boron                19          7
copper                          1         13
dissolved copper                1          5
lead                            0         17
nickel                          9          3
zinc                            1         32
dissolved zinc                  0          2
iron                           12        273
dissolved iron                  4         68
manganese                      11        180
dissolved manganese             0         15

REFERENCES

Aitchison, John, On the distribution of a positive random variable having a discrete probability mass at the origin, J. American Statistical Assoc., Sept., 901-908, 1955.
Aitchison, John, and J. A. C. Brown, The Lognormal Distribution, 176 pp., University Press, Cambridge, 1957.
Cohen, A. C., Jr., Simplified estimators for the normal distribution when samples are singly censored or truncated, Technometrics, 1, 3, 217-237, 1959.
David, H. A., Order Statistics, 2nd Ed., 360 pp., John Wiley and Sons, Inc., 1981.
Gilliom, Robert J., and Dennis R. Helsel, Estimation of distributional parameters for censored trace-level water-quality data. I: Estimation techniques, Water Resources Research, in press, 1985.
Helsel, Dennis R., and Robert J. Gilliom, Estimation of distributional parameters for censored trace-level water-quality data. II: Verification and applications, Water Resources Research, in press, 1985.
NATURAL VARIABILITY OF WATER QUALITY IN A TEMPERATE ESTUARY¹

Laurence E. Gadbois* and Bruce J. Neilson
Virginia Institute of Marine Science/School of Marine Science
The College of William & Mary in Virginia
Gloucester Point, VA 23062

ABSTRACT

Interpreting the data from water quality monitoring networks is difficult if the natural variability of the system is not known. Analysis of data from estuaries is made more difficult by the advection of spatial patterns with the oscillating tides. In this study samples were collected from a polyhaline, partially-mixed estuary which typically has minimal longitudinal gradients for most water quality measures. Water samples from a 2.5 meter shoal were analyzed for nitrogen and phosphorus content.

Data from two 57-hour intensive studies indicate that hourly fluctuations were on the order of 15%. Furthermore the variations showed no significant correlation with tidal height.

In the second part of the study, samples collected at 45 minute intervals were composited to determine daily average conditions over an annual cycle. In addition to a strong seasonal signal, it was found that daily fluctuations were on the order of 20 to 50 percent for total nitrogen and total phosphorus and 30 to 70 percent for nitrate-plus-nitrite nitrogen. Data from monitoring networks with less frequent observations must be interpreted with caution given the magnitude of these short term variations which are assumed to arise from natural phenomena.

¹VIMS Contribution No. XXXX.
*Current address: Naval Ocean Systems Center, San Diego, CA 92152.
INTRODUCTION

Assessment of water quality conditions in aquatic and marine systems typically involves the collection of grab samples on which pollutant concentrations are measured. Often we do not know the extent to which these grab samples are measuring "typical" values as opposed to values strongly influenced by time-transient perturbations. Hence, natural temporal variability can influence the validity and usefulness of conclusions based upon a single or small number of samples.

Natural variations occur in both space and time. Spatial scales range from the microgradients surrounding plankton and suspended particles (Lehman and Sandgren, 1982; Korstad, 1983) to vertical and horizontal macrogradients of the same scale as the water body. Time and space variations are inter-related in estuaries because spatial patterns are advected up and down river with the oscillating tides. This effect can be seen in the data (Figure 1) from an around-the-clock sampling of the Pagan River, a small tributary of the James River in Virginia (Rosenbaum and Neilson, 1977). Salinity levels were highest at High Water Slack (HWS) and lowest at Low Water Slack (LWS). Municipal wastewater discharges and the effluent from meat packing plants resulted in elevated bacterial levels in the upper reaches of the estuary. Fecal coliform levels were lowest at HWS when dilution with relatively clean James River water was the greatest. Thus the temporal variations of fecal coliforms and salinity were 180 degrees out of phase, but both showed semi-diurnal variations with the tides. Algal growth, stimulated by the nutrients introduced by the several discharges, varied in response to the daily cycle of sunlight. Dissolved oxygen concentrations, which were in turn affected by the photosynthetic activity, showed a marked diurnal signal with limited tidal effects.

Figure 1. Short-term variations in water quality in the Pagan River, Virginia: (a) semi-diurnal (tidal) variations in salinity levels at three stations, (b) semi-diurnal (tidal) variations in fecal coliform levels at four stations, and (c) diurnal variation in dissolved oxygen concentrations at a single station.

The present study had as its objective quantification of non-tidal temporal variability using two data sets. Day-to-day variations were studied using observations of daily average water quality conditions made over an annual cycle. Hourly water quality measurements taken over two 57 hour periods were used to investigate short term variations such as those due to astronomical tides.

MATERIALS AND METHODS

Water samples were drawn from the mid-depth of the 2.5 meter water column over a nearshore shoal area in the polyhaline York River (Latitude 37° 14.8', Longitude 76° 30.1'). Samples were collected with an ISCO automatic water sampler, deposited in glass jars packed in ice, and collected within three days. Samples that had been withdrawn from the river every 45 minutes were combined into daily composite samples. Samples were filtered through a 300 micron nylon mesh to remove detritus and large zooplankton. Sampling was conducted from July 1983 to June 1984.

During the two 57 hour intensive studies (0800 May 22 through 1600 May 24 and 0800 May 30 through 1600 June 1, 1984), samples were taken from the river every hour, collected within eight hours, filtered through a 300 micron nylon mesh as detailed above, and frozen within 12 hours of sampling. These samples were analyzed individually. The periods chosen were 180 degrees out of phase with regard to the tidal cycle.
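
The sampling design lends itself to a quick arithmetic check, sketched here in Python. The sketch assumes that the tide is dominated by the principal lunar semidiurnal (M2) constituent, with a period of about 12.42 hours, and it ignores all other tidal constituents; the function names are illustrative only.

```python
from datetime import datetime

# Period of the principal lunar semidiurnal (M2) constituent, in hours.
M2_PERIOD_HOURS = 12.4206


def hourly_sample_count(start, end):
    """Number of samples taken on the hour from `start` through `end`."""
    return int((end - start).total_seconds() // 3600) + 1


def phase_offset_degrees(start_a, start_b, period_hours=M2_PERIOD_HOURS):
    """Phase difference (0-360 degrees) between two start times for a
    purely periodic signal with the given period."""
    elapsed_hours = (start_b - start_a).total_seconds() / 3600.0
    return (elapsed_hours / period_hours % 1.0) * 360.0


first_start = datetime(1984, 5, 22, 8)
first_end = datetime(1984, 5, 24, 16)
second_start = datetime(1984, 5, 30, 8)
second_end = datetime(1984, 6, 1, 16)

samples_first = hourly_sample_count(first_start, first_end)
samples_second = hourly_sample_count(second_start, second_end)
offset = phase_offset_degrees(first_start, second_start)
```

Under that assumption each window yields 57 hourly samples, and the two start times are 192 hours apart, or about 15.46 M2 cycles, which corresponds to a phase offset of roughly 165 degrees. That is consistent with the two periods being close to opposite phase.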
Nutrient measurements included total phosphorus (EPA, 1979 - Method 365.2), total nitrogen (D'Elia and Streudler, 1977, and EPA, 1979 - Method 353.2), ammonia nitrogen (EPA, 1979 - Method 350.1), and nitrate-plus-nitrite nitrogen (EPA, 1979 - Method 353.2). Every tenth sample was run in duplicate and spiked with a known standard to measure the precision and accuracy of the analytical technique. Duplicates and spikes were within acceptable limits (EPA, 1979).

All containers and lab ware which contacted the samples were rinsed with tap water three times, rinsed with 50% HCl once, rinsed with distilled deionized water three times, and air dried before use. The intake hose for the automatic water sampler was washed as described above each week.
RESULTS

Hour-to-hour variability: Total phosphorus ranged between 0.041 and 0.093 mg/l during the two intensive sampling periods. The mean value, standard deviation, range, minimum value, maximum value, and mean hourly fluctuation were very similar for the two periods (see Table 1 and Figures 2 and 3). Total nitrogen concentrations showed generally similar behavior. Although mean concentrations were slightly higher during the second period, the standard deviation, range of values, and mean hourly fluctuation all were smaller during the latter sampling effort. When the data for the soluble inorganic portions are examined, one notes that nitrate-plus-nitrite nitrogen levels were slightly elevated and ammonia-nitrogen levels were much higher during the second sampling period. Previous studies in the York River have documented changes in water quality (Webb and D'Elia, 1980; D'Elia et al. 1981) that occur when there is increased mixing and reduced stratification around times of spring tide (Haas et al. 1981). For the case at hand, the tide range was about 55 cm during the first sampling (neap tide) and about 80 cm during the second period (spring tide). The elevated ammonia concentrations could be the result of the mixing of ammonia-rich bottom waters throughout the water column at the time of spring tides.

It is curious that total nitrogen levels, however, remained nearly constant. The increase in mean TN (0.04 mg/l) was much smaller than the increase in mean ammonia levels (0.11 mg/l). The soluble inorganic fractions accounted for about 19% of the total nitrogen during the first sampling but made up 38% of the total nitrogen during the second sampling.

Table 1. Summary of nutrient data from the intensive samplings.

First sampling: May 22 - 24, 1984
                                 TP       TN       NH4     NO2+NO3
                               ---------------- mg/l ----------------
Mean (n=57)                     0.055    0.548    0.073    0.030
Standard Deviation              0.011    0.096    0.025    0.005
Range                           0.040    0.400    0.106    0.021
Minimum                         0.041    0.343    0.025    0.019
Maximum                         0.081    0.743    0.131    0.040
Mean Hourly Fluctuation         0.012    0.094    0.014    0.004
Std Dev of Hrly Fluc            0.009    0.075    0.015    0.004
                               ----- as percent of sample mean -----
Standard Deviation                20       18       34       17
Range                             73       73      145       70
Mean Hourly Fluctuation           21       17       19       12

Second sampling: May 30 - June 1, 1984
                                 TP       TN       NH4     NO2+NO3
                               ---------------- mg/l ----------------
Mean (n=57)                     0.058    0.581    0.188    0.035
Standard Deviation              0.011    0.068    0.022    0.006
Range                           0.050    0.374    0.128    0.031
Minimum                         0.043    0.431    0.105    0.019
Maximum                         0.093    0.805    0.233    0.050
Mean Hourly Fluctuation         0.011    0.068    0.015    0.005
Std Dev of Hrly Fluc            0.009    0.055    0.014    0.005
                               ----- as percent of sample mean -----
Standard Deviation                19       12       12       18
Range                             86       64       68       89
Mean Hourly Fluctuation           18       12        8       14

Figure 2. Short-term variations in water quality in the York River at Gloucester Point, May 22-24, 1984: (a) Total phosphorus, (b) Total nitrogen, (c) Ammonia nitrogen, (d) Nitrate-plus-nitrite nitrogen, and (e) Tidal height.

Figure 3. Short-term variations in water quality in the York River at Gloucester Point, May 30-June 1, 1984: (a) Total phosphorus, (b) Total nitrogen, (c) Ammonia nitrogen, (d) Nitrate-plus-nitrite nitrogen, and (e) Tidal height.

The data indicate that hour-to-hour variations are on the order of 10% to 20% of the mean of a large number of samples. Additionally, the range of concentrations observed was of the same order of magnitude as the mean concentration for each of the water quality measures. Factor analysis of the hourly nutrient concentrations and tidal heights revealed no significant correlation between nutrient levels and the stage of the tide. The lack of correlation is apparent when the data are compared in graphical format (Figures 2 and 3).
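
Summary measures of this kind can be computed from an hourly series as sketched below in Python. Two points are assumptions rather than statements from the text: "mean hourly fluctuation" is taken to be the mean absolute change between consecutive hourly samples, and a simple Pearson correlation with tidal height is shown as a lighter stand-in for the factor analysis that was actually used. The function names are illustrative.

```python
import math
from statistics import mean, stdev


def variability_summary(conc):
    """Standard deviation, range, and mean hourly fluctuation of a series,
    each expressed as a percent of the sample mean."""
    m = mean(conc)
    # Assumed definition: absolute change between consecutive samples.
    steps = [abs(b - a) for a, b in zip(conc, conc[1:])]
    return {
        "mean": m,
        "sd_pct": 100.0 * stdev(conc) / m,
        "range_pct": 100.0 * (max(conc) - min(conc)) / m,
        "mean_hourly_fluct_pct": 100.0 * mean(steps) / m,
    }


def pearson_r(x, y):
    """Pearson correlation coefficient between two equal-length series."""
    if len(x) != len(y) or len(x) < 2:
        raise ValueError("series must have equal length of at least 2")
    x_bar, y_bar = mean(x), mean(y)
    sxy = sum((a - x_bar) * (b - y_bar) for a, b in zip(x, y))
    sxx = sum((a - x_bar) ** 2 for a in x)
    syy = sum((b - y_bar) ** 2 for b in y)
    if sxx == 0 or syy == 0:
        raise ValueError("correlation is undefined for a constant series")
    return sxy / math.sqrt(sxx * syy)
```

Applied to each 57-sample nutrient series, `variability_summary` would give percentages comparable to the lower block of Table 1, and `pearson_r(nutrient, tide_height)` gives a first look at any association with tidal stage; the two argument names here are placeholders for the measured series.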
Day-to-day variability: Seasonal fluctuations in daily average nutrient concentrations were pronounced (Figure 4). Total phosphorus levels were highest in the summer (mean for July through September of about 0.080 mg/l). From this period until mid-January, total nutrient levels declined to about half the summer values. The increase which began in mid-January and persisted through the end of sampling in June was not as rapid as the decline from mid-summer levels. Examination of the graphical summary of the data shows that most of the values fell in a band of about 0.02 to 0.04 mg/l width, but frequently values that were much higher were recorded. Phosphorus is known to sorb to mineral particles and these elevated readings could be associated with increased levels of turbidity that occur following storms.

Total nitrogen followed a similar, although less pronounced, pattern. Concentrations averaged over 0.7 mg/l from July through the end of October and 0.5 mg/l during the winter. The seasonal trend for nitrate-plus-nitrite, however, was the inverse of the total nitrogen pattern and was of a far greater magnitude. Early summer levels were near zero (mean of 0.003 mg/l for June 1983) and the mean for July and August was only 0.016 mg/l. Concentrations increased from late August and averaged around 0.083 mg/l through January. Daily values of 0.10 mg/l in late January were followed by a rapid drop in concentration in February and March; spring (February through April) values averaged about 0.040 mg/l and decreased to a mean of about 0.025 mg/l for May and June.

Figure 4. Annual variation of daily average water quality conditions in the York River at Gloucester Point from June 1983 to July 1984: (a) Total phosphorus, (b) Total nitrogen, and (c) Nitrate-plus-nitrite nitrogen.

The pattern of day-to-day variability resembles the seasonal pattern in that nitrate-plus-nitrite was substantially more variable than total nitrogen and total phosphorus. The daily fluctuations were on the order of 30% to 50% for TN and TP and several hundred percent for nitrate-plus-nitrite.
DISCUSSION

One would expect nutrient concentrations in the water column to be affected by runoff from the land. Generally speaking, high values for one nutrient usually were not correlated with high values for other nutrients. This probably is due to missing data, the large volume of the river near the sampling site, and the effects of tidal mixing. However, in mid-April 1984, all three variables measured showed elevated levels (days 116-118). River flow was high for the month with local maxima on the 18th (day 109) and the 25th (day 116). Rainfall records indicate that rainfall not only was above normal, but that most of it occurred on a few days (April 4-5, 14-16, and 22-23). It is not clear why these events had such a pronounced effect on water quality, but the concurrent rise in TN, TP and nitrate-plus-nitrite at a time of high river flow suggests that runoff was the cause.

A marked reduction in nitrate-plus-nitrite levels can be noted at about day 45. The York River estuary typically experiences a spring phytoplankton bloom and this is believed to be the cause of the change in nitrate-plus-nitrite levels. Water temperatures were about 5 C at that time. In December and January, the water was relatively clear (Secchi depth readings were on the order of 1.5 m) in part because phytoplankton levels were low (chlorophyll concentrations averaged about 6 micrograms per liter). From mid-February through the end of March, the Secchi depth averaged only about 0.75 m and chlorophyll levels averaged over 20 micrograms per liter. Whether the algae utilized the nitrate and nitrite directly, or utilized ammonia, thereby reducing the amount of ammonia available for nitrification, the data suggest that the decrease in nitrate-plus-nitrite levels was related to the spring algal bloom.
CONCLUSIONS

Data from two types of sampling indicate that natural variability in water quality conditions is great. Hour-to-hour variations are on the order of 10% to 20% of the mean of a large number of samples. The range of concentrations observed over a day or two is of the same magnitude as the mean concentration.

Seasonal variations can be pronounced for water quality. Total nitrogen and total phosphorus levels were highest in the summer and lowest in the winter; nitrate-plus-nitrite nitrogen was present at very low levels during the summer and was abundant during the winter, presumably as the result of uptake of ammonia and nitrate by phytoplankton. Day-to-day variations were on the order of 30% to 50% for TN and TP and up to several hundred percent for nitrate-plus-nitrite, despite a sampling protocol designed to reduce the influence of tides and other short term phenomena. Presumably meteorological events such as the passage of fronts, winds, and runoff from the adjacent land produce some of the variability observed.

The interpretation of monitoring data must be conducted with the understanding that there is considerable variability in the records at time scales of hours and days. Care must be taken to insure that conclusions derived from water quality monitoring programs are made with that understanding in mind.
EXTENSION OF WATER QUALITY DATA BASES IN PLANNING FOR WATER TREATMENT

G.T. ORLOB AND N. MARJANOVIĆ
University of California, Davis
ABSTRACT

Design of water treatment facilities requires estimation of extreme values of critical water quality parameters. When water quality data for the source are sparse or non-existent a sufficient record for statistical analysis must be constructed from fragmentary records at nearby locations. A procedure is described for construction of the necessary record and derivation of a design target vector of water quality. It includes spatial correlations of partial records, time series analysis, frequency analysis and correlation of water quality parameters from both continuous and grab sampling campaigns. It is demonstrated for the example of the North Bay Aqueduct of the California State Water Project.

1. INTRODUCTION

The North Bay Aqueduct, a component of the California State Water Project (SWP), will divert water from a tributary of the Sacramento River in Northern California to serve municipal and industrial users, who will have to provide treatment preparatory to distribution. Initially, the SWP planned to divert water from Cache Slough in the northern Sacramento-San Joaquin Delta, the present location of the intake for the City of Vallejo. However, progressive deterioration of water quality at this location since Vallejo's pumping plant was installed has motivated designers of the new aqueduct to consider an alternative location on nearby Lindsey Slough, as shown in Figure 1, where water is expected to be of superior quality.

It is necessary for the design of water treatment facilities to derive the statistical properties of water quality at the new location using records at Cache Slough, Lindsey Slough, and other sampling stations without the advantage of a common period of observation.
The temporal distribution of partial records at various locations in the study area is summarized in Table 1.

Records at Cache Slough, obtained by a continuous EC recorder over the period 1972 to 1984, are sufficiently detailed in the temporal sense to allow estimation of long term trends, seasonal variations, tidal cycles (the location is influenced by tides) and longer period quality changes due to discrete hydrologic events, yet they do not include water quality parameters of greatest interest to treatment plant designers. Records at Lindsey Slough, on the other hand, although more detailed in terms of quality constituents, extending over a period 1952-1969, are from monthly grab samples collected without regard to hydrologic conditions that may affect water quality. The problem in this investigation is to derive a record of quality at the Cache Slough location sufficient to allow correlation with the Lindsey Slough data. When this is accomplished the Lindsey Slough record, with more quality information relative to design, can be extended in time, translated in space and analyzed statistically to establish limiting criteria for treatment plant design.

FIGURE 1. LINDSEY SLOUGH AND VICINITY, LOCATION MAP FOR DIVERSION POINT

TABLE 1
SPATIAL CORRELATIONS BETWEEN EC AT CACHE SLOUGH AND SELECTED LOCATIONS

Station  Location                                        Sample  Analysis  Period  EC(sta.)/EC(C.S.)
2        Cache Slough at Vallejo Pumping Plant           C       EC        72-84   1.0
3        Lindsey Slough at Hastings Cut                  G       IT        77-83   0.69; 0.57-0.67
4        Barker Slough at Hwy 113                        G       OC        85      0.50
5        Calhoun Cut at Hwy 113                          G       OC        85      0.60
6        Prospect Slough Liberty Island                  G       OC        85      0.25
7        Lindsey Slough near Rio Vista                   G       OC        52-66   0.40; 0.37-0.43
8        Barker Slough at Proposed Pumping Plant         G       OC        85      0.77-0.96
9        Cache Slough at Hastings Island Pumping Plant   G       IT        77-83   0.55; 0.52

Sample: C = Continuous recorder; G = Grab
Analysis: EC = Electrical conductivity; IT = Partial (EC, Cl, TDS); OC = Complete chemical

In this paper a procedure for development of the statistical properties of EC at the proposed diversion location (Station 8, Figure 1) is described. Additionally, the extension of this record to create a vector of water quality concentrations of key design parameters is discussed. Finally, the results of statistical analysis, after adjustment for treatment plant operational constraints, are transformed into specific targets for design.

2. STATISTICS OF WATER QUALITY

Two basic problems are presented in this situation, one concerned with the spatial displacement between the location of the diversion and points of observation and the other concerned with the temporal discontinuities in the various records.

2.1 Spatial Correlations
As illustrated in Table 1 there were no periods of concurrent observation at the two locations of longest record, Cache Slough and Lindsey Slough near Rio Vista. However, one set of grab samples (EC and chlorides) taken over the period 1977-1983 does include both Cache Slough Hastings (Sta. 9) and Lindsey Slough Hastings (Sta. 3), plus the continuous EC record at Cache Slough Vallejo (Sta. 2). A correlogram for the Cache Slough stations is shown in Figure 2. Synoptic surveys conducted in 1984 and 1985 provided additional information that permitted extension of the 77-83 correlations to the diversion location. Results of these studies are summarized for all stations in the area in Table 1.

A key relationship in translating the experience of the two longer records to the diversion location is the correlation between the Lindsey Slough stations (3 and 7). Results of correlation analysis indicate that water quality is generally degraded in an upstream direction, in both Cache and Lindsey Sloughs. For example, in Cache Slough the lower station shows water quality that is superior to that at the Vallejo diversion point by the ratio 0.55:1. In Lindsey Slough the lower station is also superior by a ratio of 0.40:0.69 (in terms of Cache Slough quality). The significance of this degradation is the dominance of land-derived sources of salinity over the primary source of water for diversion, the Sacramento River at the confluence with its two tributaries.

During periods of storm runoff, water entering the upper reaches of the sloughs is generally inferior, apparently as a result of pick up of salts accumulated during the prior dry period. During dry periods this condition persists as accretions from groundwater and local irrigation drainage are added to the system. The overall result is a salinity (quality) gradient that is inverse from that normally encountered in estuarial systems, i.e. negative in the seaward direction.
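
The station-to-station ratios in Table 1 can be read as slopes of a line through the origin relating EC at one station to EC at another. A minimal Python sketch of such a fit follows. The paired readings are invented for illustration, the function name is hypothetical, and least squares through the origin is an assumed fitting method, since the paper does not state how the ratios were obtained.

```python
def ratio_through_origin(x, y):
    """Least-squares slope b for the model y = b * x (no intercept)."""
    sxy = sum(xi * yi for xi, yi in zip(x, y))
    sxx = sum(xi * xi for xi in x)
    return sxy / sxx

# Hypothetical paired EC readings in umhos/cm (not data from the paper).
ec_vallejo = [300.0, 450.0, 600.0, 800.0]    # upstream station (Sta. 2)
ec_hastings = [170.0, 240.0, 335.0, 440.0]   # lower station (Sta. 9)

ratio = ratio_through_origin(ec_vallejo, ec_hastings)
print(f"EC(Sta. 9) / EC(Sta. 2) is about {ratio:.2f}")
```

A zero-intercept model is chosen here only because the tabulated quantity is a ratio; a fit with an intercept would be an equally reasonable alternative if the data supported one.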
[Plot of EC at the lower Cache Slough station against EC at the Vallejo Pumping Plant, µmhos/cm, with a fitted line of slope 0.55]

FIGURE 2. CORRELATION BETWEEN EC's AT TWO CACHE SLOUGH STATIONS, 1978-1983

2.2 Time Series Analysis
Attempts to extend the partial records by traditional statistical methods were not altogether successful in this case, apparently due to the complexities of the estuarial environment. Nevertheless, they provided useful insight in interpretation of partial water quality records.

The Cache Slough EC record, a fragment of which is illustrated in Figure 3, was divided into two parts of equal length and tested for stationarity with BMDP (Dixon, et al, 1981). The difference in mean values was found to be significant and the time series was tested for trend. A positive trend existed, apparently due to the accumulation of salt in the tributary drainage due to domestic waste discharges of a small city where increasing use of water softeners has been noted. After detrending of the data two cycles of increased salinity were identified, a primary cycle associated with surface runoff during the period October through March and a secondary cycle related to the irrigation period April through September. The dominant cause of abnormal salinities, however, is surface runoff.
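
The split-record comparison described above can be illustrated with a short Python sketch. It computes a Welch-type t statistic for the difference in means between the two halves of a series. This is an assumed stand-in, not the BMDP routine the authors ran; the function name and the synthetic series are hypothetical.

```python
import math
import statistics

def split_half_t(series):
    """Welch t statistic for the difference in means of the two halves."""
    half = len(series) // 2
    first, second = series[:half], series[half:2 * half]
    var_first = statistics.variance(first)
    var_second = statistics.variance(second)
    std_err = math.sqrt(var_first / len(first) + var_second / len(second))
    return (statistics.mean(second) - statistics.mean(first)) / std_err

# Synthetic EC-like series with an upward drift (illustrative only).
series = [500 + 2 * i + 30 * math.sin(i / 3) for i in range(120)]
print(split_half_t(series))
```

Serial correlation in a daily record inflates a statistic of this kind, so a large value is better treated as a prompt for a formal trend test than as a significance level.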
FIGURE 3. PARTIAL RECORD OF EC AND PRECIPITATION AT CACHE SLOUGH, CITY OF VALLEJO PUMPING PLANT
Regression with precipitation data at the nearest meteorologic station was attempted. This effort was unsuccessful; that is, correlations were not sufficiently strong to justify utilization of a regression equation to extend the data base to overlap that of the Lindsey Slough station. It was necessary then to resort to frequency analysis of the partial records, relating these through the spatial correlations described above.

2.3 Frequency Analysis

The time series analysis did produce information of value in frequency analysis of Cache Slough EC data. It associated the dominant episodes of EC with periods of high surface runoff, thus indicating the importance of this source of salinity in establishing critical design criteria for water treatment.

Two factors control the design of water treatment from the point of view of specific water quality parameters: peak concentration and duration. In analysis of EC data at Cache Slough individual episodes were characterized by frequencies of exceedence at specified durations of 1, 3, 7 and 30 days. Results of this analysis are summarized in Table 2. Typical distributions for EC at Cache Slough are illustrated in Figure 4. These distributions are then translated to Lindsey Slough and the location of proposed diversion by the correlation relationships summarized in Table 1.
TABLE 2
FREQUENCY-DURATION-EXCEEDENCE, ELECTRICAL CONDUCTIVITY AT CACHE SLOUGH, 1972-1984

Limits of Exceedence, µmhos/cm

Duration,      Recurrence Interval, years
days           1       2       5       10
1           1170    1220    1350    1950
3           1070    1110    1140    1160
7            950    1000    1070    1120
30           580     740     870     950
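
Translating these limits to another station with the ratios of Table 1 amounts to a simple scaling if the ratio is assumed to hold at all frequencies and durations. The Python sketch below makes that assumption explicit. The dictionary layout and the function name are illustrative, and the paper does not state that the translation was carried out exactly this way.

```python
# Table 2: EC limits at Cache Slough (umhos/cm), keyed by duration in days,
# then by recurrence interval in years.
CACHE_SLOUGH_EC = {
    1:  {1: 1170, 2: 1220, 5: 1350, 10: 1950},
    3:  {1: 1070, 2: 1110, 5: 1140, 10: 1160},
    7:  {1: 950,  2: 1000, 5: 1070, 10: 1120},
    30: {1: 580,  2: 740,  5: 870,  10: 950},
}

def translate(table, ratio):
    """Scale every limit by a station ratio EC(sta.)/EC(C.S.) from Table 1."""
    return {d: {t: ec * ratio for t, ec in row.items()}
            for d, row in table.items()}

# Station 3 (Lindsey Slough at Hastings Cut) has a ratio of 0.69 in Table 1.
lindsey = translate(CACHE_SLOUGH_EC, 0.69)
print(lindsey[1][5])  # 5-year, 1-day limit translated to Station 3
```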
2.4 Other Quality Parameters
Electrical conductivity is not itself sufficient for design purposes. It is necessary also to describe the water supply in terms of its mineral composition, hardness, silica, turbidity and the concentration of certain metals, e.g. iron and manganese. Since these data were not available at Cache Slough they had to be developed for the Lindsey Slough - Rio Vista location, then transferred to the diversion point.

For the mineral constituents, i.e. the principal cations and anions and quantities derived from these, like hardness and total dissolved solids, the required values can be derived by correlation with EC. In general these correlations take the form

$X = K(\mathrm{EC})^{n}$    (1)

where X is the desired quality parameter and K and n are constants. Table 3 summarizes the EC correlations developed for the Lindsey Slough location. Figure 5 presents a representative quality correlation, chlorides vs EC.
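
Equation (1) becomes a straight line in logarithms, log X = log K + n log EC, so K and n can be estimated by ordinary least squares on the logged data. The Python sketch below assumes that fitting method, since the paper does not say how the constants were obtained. It uses synthetic values generated from K = 0.015 and n = 1.3, the constants read from the chloride relation of Figure 5, and `fit_power_law` is a hypothetical helper.

```python
import math

def fit_power_law(ec, x):
    """Fit x = K * ec**n by least squares on log(x) versus log(ec)."""
    log_ec = [math.log(v) for v in ec]
    log_x = [math.log(v) for v in x]
    mean_ec = sum(log_ec) / len(log_ec)
    mean_x = sum(log_x) / len(log_x)
    n = (sum((a - mean_ec) * (b - mean_x) for a, b in zip(log_ec, log_x))
         / sum((a - mean_ec) ** 2 for a in log_ec))
    k = math.exp(mean_x - n * mean_ec)
    return k, n

# Synthetic data (not measurements) generated from K = 0.015, n = 1.3.
ec_values = [140.0, 200.0, 300.0, 400.0, 500.0]
chloride = [0.015 * e ** 1.3 for e in ec_values]

k, n = fit_power_law(ec_values, chloride)
print(f"K = {k:.3f}, n = {n:.2f}")
```

Fitting in log space weights relative rather than absolute errors, which may or may not match the authors' choice.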
[Plot of EC, µmhos/cm, against frequency of exceedence for the several durations]

FIGURE 4. FREQUENCY OF EXCEEDENCE OF EC AT VARIOUS DURATIONS, CACHE SLOUGH--VALLEJO

[Plot of chlorides against electrical conductivity, µmhos/cm, with fitted curve Cl- = 0.015 EC^1.3]

FIGURE 5. WATER QUALITY CORRELATION, CHLORIDES VS ELECTRICAL CONDUCTIVITY, LINDSEY SLOUGH NEAR RIO VISTA, 1952-1966
TABLE 3
CORRELATION OF WATER QUALITY CONSTITUENTS WITH ELECTRICAL CONDUCTIVITY, LINDSEY SLOUGH NEAR RIO VISTA

Constituent        Range       EC Correlation
EC                 140 - 500   1.0
TDS                100 - 270   40 + 0.46 EC
Cl-                10 - 50     0.015 EC^1.3
TH (as CaCO3)      50 - 160    0.153 EC^1.14
Na+                10 - 40     0.035 EC^1.14
Ca++ (as CaCO3)    8 - 36      0.075 EC^1.14
Mg++ (as CaCO3)                0.078 EC^1.14
SO4                7 - 70      0.0008 EC^1.83
HCO3               60 - 200    0.71 EC^0.9
SiO2               10 - 25     none
Turbidity          20 - 700    none

Reactive (dissolved) silica, an important parameter for certain industrial processes, cannot be related to EC, but is more closely identified with the indigenous soils of the tributary area. In this locality soluble silica concentrations varied between rather narrow limits, from 10 to 25 mg/l, and did not appear to depend on hydrologic or agricultural conditions.

Turbidity, on the other hand, was closely related to hydrologic conditions, particularly to episodes of surface runoff generated by heavy precipitation. Since these were generally stochastic in character traditional frequency analysis was possible, although limited to some extent by available data. Turbidities measured at the Cache Slough Vallejo intake for a period of about four years 1980-1983 served as surrogate measures for the diversion point. They were utilized directly without correction for geographic dislocation.

2.5 Water Quality at Diversion Point

Five year-1 day concentrations of key water quality parameters were determined at the several sampling locations, then translated to the diversion point by means of correlations presented in Table 1. Thus, a quality vector for the diversion point was formed that was considered representative of extremes that would have to be accommodated in an economic design for water treatment. The final design criteria are presented in Table 4.
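
As an informal check on how the EC correlations feed the design targets, the chloride relation (Cl = 0.015 EC^1.3) can be evaluated at the design EC of 810 µmhos/cm given in Table 4 and converted to CaCO3 equivalents. This Python sketch assumes the usual equivalent-weight conversion (about 50.04 for CaCO3 and 35.45 for chloride). That conversion is an assumption about how the table was prepared, not something the paper states, and the function names are hypothetical.

```python
EQ_WT_CACO3 = 50.04   # g per equivalent, CaCO3 (assumed conversion basis)
EQ_WT_CL = 35.45      # g per equivalent, chloride

def chloride_mg_per_l(ec):
    """Chloride from EC using the relation Cl = 0.015 * EC**1.3."""
    return 0.015 * ec ** 1.3

def as_caco3(mg_per_l, eq_wt):
    """Express a concentration as mg/L of CaCO3 equivalents."""
    return mg_per_l * EQ_WT_CACO3 / eq_wt

cl = chloride_mg_per_l(810.0)
cl_caco3 = as_caco3(cl, EQ_WT_CL)
print(f"{cl:.1f} mg/L as Cl, {cl_caco3:.1f} mg/L as CaCO3")
```

Under these assumptions the result is close to the chloride target of 128 listed in Table 4. The other constituents do not reproduce as simply, so the full translation evidently involved further steps.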
TABLE 4
WATER TREATMENT DESIGN TARGETS, NORTH BAY AQUEDUCT POINT OF DIVERSION

Constituent                             Target, mg/L*
Turbidity, NTU                          710
Dissolved SiO2, mg/L                    30
Calcium                                 180
Magnesium                               170
Total Hardness                          350
Sodium                                  180
Potassium                               14
Chloride                                128
Sulfate                                 175
Alkalinity                              241
Total Dissolved Solids, mg/L            760
Electrical Conductivity, µmhos/cm       810

*As equivalent CaCO3 except as otherwise noted

3. SUMMARY AND CONCLUSIONS

A general procedure for developing water quality targets for design of water treatment facilities using a water of unknown quality has been described. It includes considerations of spatial and temporal variations in quality at adjacent locations, fragmentary and discontinuous records, and cause-effect relationships. The principal steps in the procedure are as follows:

1. Spatial correlation between stations with partial records
2. Time series analysis of selected records
3. Frequency analysis
4. Selection of design frequency and duration of exceedence
5. Correlation analysis between multiple parameters
6. Translation of quality characteristics to design location
7. Formation of a design target vector.

The procedure was applied to water quality data from the Cache-Lindsey Slough area in the vicinity of a proposed pumping diversion to the North Bay Aqueduct of the California State Water Project. A vector of water quality targets for design of a water treatment plant was derived.
REFERENCES

Dixon, W.J., Brown, M.B., Engleman, L., Frane, J.W., Hill, M.A., Jennrich, R.I. and Toporek, J.D., 1981. "BMDP Statistical Software", University of California Press, Berkeley, Ca.
STATISTICAL INFERENCES FROM COLIFORM MONITORING OF POTABLE WATER

WESLEY O. PIPES

INTRODUCTION

Coliform monitoring of water distribution systems involves collecting samples from water service locations and determining if coliform bacteria are present in one or more subsamples, each subsample having a standard volume of either 10 ml or 100 ml. If the membrane filter technique (MF) for sample examination is used, a single subsample of 100 ml is tested and a number, the MF coliform colony count, is obtained along with the information about the presence of coliform bacteria. If the fermentation tube (FT) method is used, five 10 ml subsamples are tested and the number of subsamples with positive reactions (coliforms present) is recorded. Samples are collected one or more days per month (but usually not every day of the month) and from one or more sampling locations (but certainly not every possible sampling location). At the end of the month the laboratory results are tabulated, certain parameters are calculated and compared with standards and the acceptability of the water for human consumption is determined from the comparisons.

There are several very interesting statistical problems related to the process of coliform monitoring. There is a large section of the statistical literature which developed from the problems of coliform monitoring. This literature has been reviewed elsewhere (El Shaarawi and Pipes, 1982) and will not be explored further here. Some of the statistical problems have been dealt with in great depth while others have barely been touched.

The regulatory rationale for coliform monitoring is to provide a basis for decision making. The sampling results for a month are compared with acceptance criteria. If the results exceed the criteria, then some action must be taken to reduce the level of contamination of the water system. On the other hand, if the results are less than the acceptance criteria, no action need be taken. It is usual to report to the public that the water meets the bacteriological standards without explaining that certain levels of contamination are acceptable under the standards used.
However, if the standard is exceeded and this fact is reported to the public, it is usual to explain that in spite of the existence of "contamination" in the water there is no danger to health.

Table 1
U.S. PRIMARY DRINKING WATER REGULATIONS
Microbiological Maximum Contaminant Levels

A. Membrane Filter (MF) Method (100 ml Samples)
   1. Sample average count shall not be greater than 1 per 100 ml.
   2. No more than 1 sample with count > 4 per 100 ml, if less than 20 samples are examined.
   3. No more than 5% of samples with count > 4 per 100 ml, if 20 or more samples are examined.

B. Fermentation Tube (FT) Technique (five 10 ml portions)
   1. No more than 10% of tubes positive.
   2. No more than 1 sample with 3 or more portions positive, if less than 20 samples are examined.
   3. No more than 5% of samples with 3 or more portions positive, if 20 or more samples are examined.
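
The MF rules of Table 1 translate directly into a monthly acceptance check. The Python sketch below is only an illustration of the rules as tabulated: the function name is hypothetical, counts are assumed to be colonies per 100 ml, and details such as the handling of uncountable filters or check samples are ignored.

```python
def meets_mf_rules(counts):
    """Apply the MF rules of Table 1 to one month of counts per 100 ml."""
    n = len(counts)
    if n == 0:
        raise ValueError("at least one sample is required")
    average_ok = sum(counts) / n <= 1           # rule 1: average count
    high = sum(1 for c in counts if c > 4)      # samples with count > 4
    if n >= 20:
        high_ok = high <= 0.05 * n              # rule 3: 20 or more samples
    else:
        high_ok = high <= 1                     # rule 2: fewer than 20 samples
    return average_ok and high_ok

print(meets_mf_rules([0, 0, 0, 5, 0, 0]))   # one high count, average 5/6
print(meets_mf_rules([0, 0, 6, 7, 0, 0]))   # two high counts, average 13/6
```

The check operates only on sample statistics and says nothing directly about the coliform density in the distribution system itself.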
The maximum microbiological contaminant levels (MCL's) of the U.S. Drinking Water Regulations are given in Table 1. These are examples of acceptance criteria presently in use. There are two different rules for each method of examination. It should be noted that the rules are written in terms of sample parameters rather than parameters of the occurrence of coliform bacteria in the distribution system. The two rules for each examination method are parallel. The first rule in each case is a limit on the average number of coliform bacteria in the samples and the second rule is a limit on the fraction of the samples with large numbers of coliform bacteria present. The number of samples examined each month varies from 1 for systems serving less than 1000 people to more than 500 for very large systems.

There are two other problems which will be mentioned here as an aside and then considered further in later sections. It is not clear that there is any reason for using one month as a standard sampling period other than as a matter of convenience. Ideally, the reporting period should be related to the persistence of the microbiological water quality. Also, it is not clear why the number of samples examined per reporting period should be different for water distribution systems of different sizes. Indeed, sampling theory suggests that the number of samples required is related to the desired precision of the parameter estimation, not to the size of the water system.

Since 1978, we have examined some of the questions related to the monitoring of water distribution systems for coliform bacteria in studies at Drexel University. The objective of this paper is to identify some of the yet unsolved problems and stimulate further interest in attempts to find solutions for these problems. Coliform monitoring data can provide much more information about water systems than is now obtained and there are some significant problems needing further statistical investigation.
FREQUENCY DISTRIBUTIONS FOR COLIFORM DENSITY

The initiation of the studies at Drexel was the question of the minimum number of samples per month needed for monitoring the smallest water distribution systems for coliform bacteria (Pipes and Christian, 1982). It was widely recognized that the one sample per month for the smallest systems is not adequate but, in 1978, there was no good method of determining how many samples would be adequate. To approach the question of the adequacy of the number of samples it is necessary to assume that the rule was intended as a limit on the average coliform density in the water distribution system. Clearly, the average coliform colony count of the samples can be used to estimate the mean coliform density of the water in the distribution system and it seems reasonable to assume that the committee that formulated the first rule intended, in some way, to put a limit on the total number of coliform bacteria which is the mean density times the volume of water in the system.

Also, in order to evaluate adequacy of the number of samples it is necessary to assume something about the desired precision of the estimate of the mean coliform density. Since the limit was set at 1 per 100 ml, we assumed that the formulators of the rule were concerned that a mean coliform density of 1 per 100 ml would indicate lack of adequate protection; i.e., that there is something significant about 1 per 100 ml other than that it is a small number which is not zero. We further assumed that, if 1 per 100 ml is a matter of concern, then 10 per 100 ml would indicate serious deficiencies. In other words, a confidence interval on the estimate of the mean coliform density which included 10 per 100 ml would not be acceptable. This leads to the formulation of a criterion that the sample statistics should allow an estimation of a mean coliform density of 1 per 100 ml with a 95% confidence interval of + or - 1 per 100 ml.
186 E s t i m a t i o n o f t h e mean c o l i f o r m d e n s i t y o f a w a t e r d i s t r i b u t i o n system i s e a s i e r i f t h e f r e q u e n c y d i s t r i b u t i o n o f c o l i f o r m d e n s i t y
i s known.
I n particular,
i f the variance o f the coliform density
i s r e l a t e d t o t h e mean d e n s i t y ,
then i t i s essential
t o know t h e
frequency d i s t r i b u t i o n . Our i n v e s t i g a t i o n s o f t h e f r e q u e n c y d i s t r i b u t i o n s o f c o l i f o r m d e n s i t i e s have r e l i e d e n t i r e l y o n MF c o l i f o r m c o l o n y c o u n t s .
MF
c o l i f o r m c o l o n y c o u n t s h a v e t o b e i n t e g e r s w h i c h h a s l e d some i n vestigators t o t r y t o f i t the counts t o a negative binomial d i s tribution.
We have published on this (Christian and Pipes, 1983) but now believe that this procedure is incorrect. Use of the negative binomial requires the assumption that 100 ml is a natural sampling unit. It is true that coliform bacteria occur only in units of one cell; however, the 100 ml volume for examination was arbitrarily selected. If 123.74 ml had been selected as the sample volume, it would have been clear that coliform density is a continuous variable because an MF count of 1 would have indicated a density of 0.81 per 100 ml.

There are several continuous frequency distributions which are suitable for describing the MF coliform colony counts which are obtained in samples from water distribution systems. We have used the lognormal distribution because it is familiar to some water works personnel and it is convenient to work with. The lognormal distribution can be described completely by two parameters which can be specified in two different domains. Let $X$ be coliform density and $Y = \log X$. Then $Y$ is normally distributed with mean $\mu_Y$ and variance $\sigma_Y^2$. The parameters in the count domain are the geometric mean, $\mu_X = \operatorname{antilog} \mu_Y$, and the geometric standard deviation, $\sigma_X = \operatorname{antilog} \sigma_Y$. The mean and variance of the untransformed densities are $\alpha = \exp(\mu_Y + \tfrac{1}{2}\sigma_Y^2)$ and $\beta = \alpha^2(\exp \sigma_Y^2 - 1)$ respectively.
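The two-domain description of the lognormal can be illustrated with a short sketch. It assumes natural logarithms, which the exponential form of the mean and variance formulas suggests; the function name `lognormal_moments` and the example values are illustrative and do not come from the paper.

```python
from math import exp, log


def lognormal_moments(geometric_mean, geometric_sd):
    """Arithmetic mean and variance of a lognormal density.

    Assumes natural logs: mu_Y = ln(geometric mean), sigma_Y = ln(geometric SD).
    """
    mu_y = log(geometric_mean)
    sigma_y = log(geometric_sd)
    mean = exp(mu_y + 0.5 * sigma_y ** 2)          # alpha in the text
    variance = mean ** 2 * (exp(sigma_y ** 2) - 1)  # beta in the text
    return mean, variance


if __name__ == "__main__":
    # Hypothetical parameters: geometric mean 0.01 per 100 ml, geometric SD 10.
    mean, variance = lognormal_moments(0.01, 10.0)
    print(f"arithmetic mean = {mean:.3f}, variance = {variance:.3f}")
```

The sketch makes the point of the following paragraphs concrete: with a large geometric standard deviation the arithmetic mean sits far above the geometric mean.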
It has already been pointed out that a sample of water with a low coliform density is unlikely to produce coliform colonies on a MF filter when a 100 ml subsample is used. If the coliform density is 0.1 per 100 ml (1 per liter), the probability of one or more coliforms in a 100 ml sample is 0.0952 and, if the coliform density is 0.01 per 100 ml (1 per 10 liters), the probability is 0.01 and so forth.
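The two probabilities quoted above are what one obtains if the number of coliforms captured in a subsample is treated as Poisson with mean equal to density times volume. That Poisson assumption is ours for illustration (the text does not name the model), and `prob_at_least_one` is a hypothetical helper.

```python
from math import expm1


def prob_at_least_one(density_per_100ml, volume_ml=100.0):
    """P(one or more coliforms in the subsample), assuming Poisson counts."""
    expected = density_per_100ml * volume_ml / 100.0
    return -expm1(-expected)  # 1 - exp(-expected), computed stably


if __name__ == "__main__":
    for density in (0.1, 0.01):
        print(density, round(prob_at_least_one(density), 4))
```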
If the water in a distribution system meets the regulatory criteria of an average of no more than 1 per 100 ml, the geometric mean is considerably less than 1 per 100 ml even with a moderately small $\sigma_X$. Thus, in using MF coliform colony counts we are trying to evaluate a $\mu_X$ which is usually much less than any of the coliform densities that we are able to measure.

There is also an upper limit to the coliform density which can be measured by the MF method. If two coliform bacteria land next to each other on a membrane filter, the colonies that they produce will merge and be counted as a single colony. This effect is not too frequent at densities in the 1 to 10 colonies per filter range but it becomes more prevalent at higher densities. The rule that the U.S. Environmental Protection Agency uses to minimize this effect is to record any MF coliform colony count greater than 80 on a single filter as "too numerous to count" or TNTC. Thus, we have coliform densities which are "indeterminate high" as well as coliform densities which are "indeterminate low."

Figure 1 is a cumulative lognormal frequency distribution (Hazen Plot) for $\sigma_X$ of 30 and $\mu_X$ between 10^-[...] and [...]. The horizontal lines represent sample volumes which might be used for monitoring water distribution systems. Any density less than 1 per sample volume will be indeterminate as will any density greater than about 80 per sample volume. The points used to estimate the slope $\sigma_X$ are relatively close together and $\mu_X$ is estimated from a rather long extrapolation. Thus, it is difficult to have much confidence in the estimates of the lognormal parameters or even in the selection of the lognormal as the frequency distribution.

Estimation of the arithmetic mean is a somewhat different problem than estimation of the lognormal parameters and the estimate should be more precise. However, the value of interest is at the lower limit of detection, the variance of the densities is very large in relation to the mean and most of the samples have indeterminate densities. The problem of estimating a mean value from indeterminate results has not been treated adequately in the statistical literature. All things considered, it might be wise to select some other parameter to characterize the monitoring results.

ESTIMATION OF FREQUENCY-OF-OCCURRENCE

The second microbiological MCL rule of the U.S. Drinking Water Regulations is an example of a frequency-of-occurrence type of rule. A coliform density is selected as a limit to distinguish between "contaminated" water and "uncontaminated" water. In the present U.S. Regulations the limit is set at a MF count of 4 per 100 ml. Then a fraction is selected (5% of the samples examined in any month) which is allowed as positive or "contaminated".
[Figure 1 plot: Hazen plot with a line labelled GM = .007 and GSD = 30, horizontal lines at 1 per 50 ml, 1 per 100 ml and 1 per 200 ml, and a marker at 5% positive samples. Axes: Percent of Samples with Coliforms; Percent of Samples without Coliforms.]

Figure 1. Cumulative Lognormal Frequency Distribution (Hazen Plot) for Coliform Densities in Water Samples.

The fraction positive is an estimator of the frequency-of-occurrence of coliform bacteria.

The U.S. Environmental Protection Agency is considering the elimination of the first MCL rule for revised drinking water regulations. This would leave only the frequency-of-occurrence type of rule. If this change is adopted, it is likely that the limiting coliform density will be reduced from 4 per 100 ml to 1 per 100 ml although the 5% fraction positive will probably be retained.
The adoption of this approach to microbiological monitoring of water distribution systems provides several practical advantages for sample examination and for parameter estimation. It is easier and cheaper to determine if coliform bacteria are present in a sample of water than it is to determine how many coliform bacteria are present. The laboratory examination can be a simple broth fermentation test such as Clark's P-A test (Clark 1969) and the reduced cost per sample can make feasible the examination of more samples. The appropriate frequency distribution for frequency-of-occurrence is the binomial and the calculation of confidence limits is relatively simple. For instance, if 60 samples are examined and 3 of the 60 (5%) are positive, we can say that we are 95% confident that less than 10% of the water is "contaminated".
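The 60-sample statement can be reproduced with the normal approximation to the binomial. Using that approximation is an assumption on our part: the text does not say how its limits were computed, although the same formula gives roughly 0.08 to 0.14 for 38 positives in 342 samples, an interval that appears among the entries of Table 2. The function names are illustrative.

```python
from math import sqrt
from statistics import NormalDist


def wald_interval(positives, n, confidence=0.95):
    """Two-sided normal-approximation interval for a fraction positive."""
    p = positives / n
    z = NormalDist().inv_cdf(0.5 + confidence / 2)
    half_width = z * sqrt(p * (1 - p) / n)
    return max(0.0, p - half_width), min(1.0, p + half_width)


def wald_upper_bound(positives, n, confidence=0.95):
    """One-sided upper bound for a fraction positive (normal approximation)."""
    p = positives / n
    z = NormalDist().inv_cdf(confidence)
    return min(1.0, p + z * sqrt(p * (1 - p) / n))


if __name__ == "__main__":
    print(wald_upper_bound(3, 60))   # just under 0.10
    print(wald_interval(38, 342))
```

For counts as small as 3 in 60 the approximation is rough, and exact binomial limits are somewhat wider.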
On the basis of the studies done at Drexel, we have recommended that the minimum number of samples per month required for monitoring be 5. This would then give a total of 60 samples in a 12 month period. The 5% rule would allow 3 of the 60 samples to be positive in any 12 month period. In all probability there would also be a limit of no more than one positive sample in any month and any time the fourth positive sample turned up in any 12 month period, an MCL violation would be recorded without waiting until the end of the year or even the end of the month.

This approach to microbiological monitoring of small water systems brings up again the question of the persistence of microbiological water quality. Is it reasonable to try to characterize the microbiological quality of the water in a distribution system over a period of a year or even over a period of a month? At the present time, there is no good basis for answering that question. This problem seems to be an interesting one for a time series analysis approach.

EXAMPLE - SYSTEM WH

An example of some of our studies on microbiological monitoring of water distribution systems is based on several samplings of Woodbury Heights, New Jersey. This system serves a population of 3,600 people. The water is supplied from a well and the only treatment is chlorination.

A summary of our sampling data for system WH is given in Table 2. Period I was two weeks in April 1979, Period II was two weeks in May 1979, Period III was two weeks in June 1981, Period IV was four weeks in August 1983 and Period V was four weeks in October
Table 2. Coliform Sampling Data for System WH
Sampling Period
Number of 100 ml Samples
Total
Positive '
9D 9E PerGd I 9F 9G P e r G d I1
46
4
90 -
4
136
8
126 172 298
45 -
~
168 174
31
76
10
Fraction Positive
Frequency-of-Occurrence (95% C.I.)
0.01-0.11
0.25 0.26 0.26
0 . 1 7 - 0 . 32a 0.20-0.33, 0.21-0.31
0.06 0.16 0.11
0.02-0.10, 0.11-O.2la 0.08-0.14
342
3E 3F 36 3H P e r f i d IV
55 52 35 63 __ 205
1 0 0 1 2
0.07.
0 0 0.02 0.01
<0.05 ~0.04 ~0.05
50 49 45 60
6 1 2
0.12 0.02 0.04 0.02 0.05
0.03-0.21 <0.06
3s 3T 3v 3x Perfid V a.
204
38
1 10
Average
45
0.98 0.16 0.43
13.84 0.99 5.42
0.03-0.83
1.29 1.81 1.59
25.85 39.76 34.06
0.39-2.1ga 0.87-2.75, 0.92-2.26
>536
>O. 71 >2.40 >1.57
>39.43 >117.79 >79.78
2
0.04
0.04
< o . 09
Significant change from preceding period (week or month)
14 59
162 312 474
>119 >=
-
1
-
Variance
Mean Coliform Density (95% C.I.)
Total
0.09 0.04 0.06
1G 1H P e r z d I11
28
MF Coliform Colony Count
-
-
3
0.02 0.01
0.02 0.01
29 1 2 1 33
0.58 0.02 0.04 0.02 0.16
0.46 0.02 0.03 0.02 0.11
-
<2.09 <0.36
-
~1.51 <0.06
10.10
<0.06
[Figure 2 map: Isolated Areas of Woodbury Heights Distribution System]

Figure 2. Fractions Positive for the Isolated Areas for Sampling Period III Show a Significantly Lower Coliform Occurrence in the Central Area.
1983. The number of positive samples listed in column 3 is the number of samples for which the MF coliform colony count was 1 or more for a 100 ml sample. The fraction of the samples positive is given in column 4 and the frequency-of-occurrence, which is the 95% confidence interval for the fraction positive, is given in column 5. The total MF coliform colony counts for the positive samples are given in column 6. The > sign indicates that at least one of the samples was "TNTC". Those results were included as >80 per 100 ml in the calculation of the means and variances. The average MF coliform colony counts are given in column 7 and are used to estimate the mean coliform density of the water distribution system. When a >80 count occurs, the mean density is not estimated because of the inaccuracy due to high range data truncation.
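The mean-density column described above can be sketched as follows, assuming the interval is the average plus or minus a normal-theory multiple of the standard error (the text does not state the formula). With 172 samples, an average of 1.81 and a variance of 39.76, values that appear together in Table 2, this gives about 0.87 to 2.75. The function name and the `has_tntc` flag are illustrative.

```python
from math import sqrt
from statistics import NormalDist


def mean_density_interval(n_samples, average, variance,
                          has_tntc=False, confidence=0.95):
    """Approximate interval for the mean coliform density per 100 ml.

    Returns None when any count was TNTC (>80), following the text's rule
    that the mean density is not estimated from truncated data.
    """
    if has_tntc:
        return None
    z = NormalDist().inv_cdf(0.5 + confidence / 2)
    half_width = z * sqrt(variance / n_samples)
    return max(0.0, average - half_width), average + half_width


if __name__ == "__main__":
    print(mean_density_interval(172, 1.81, 39.76))
    print(mean_density_interval(168, 0.71, 39.43, has_tntc=True))
```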
Table 3. Differences Among Isolated Areas of WH Distribution System

Sampling   Isolated     Number of 100 ml Samples    χ² for
Period     Area         Total       Positive        Contingency Table
I          East           42           4            6.41
           Central        14           0
           Southwest      36           4
           North          44           0
II         East           74          16            2.83
           Central        72          23
           Southwest      78          21
           North          74          16
III        East           33           0            7.91 a
           Central        74           2
           Southwest      84          12
           North         106          16
IV         East           33           0            1.78
           Central        38           0
           Southwest      25           0
           North         109           2
V          East           91           4            3.95
           Central        25           3
           Southwest      23           0
           North          65           2

a - Significant differences at 5% level.
There are significant changes in the frequency-of-occurrence from week to week and from month to month. The most startling change over time was Period I to Period II (also week 9E to week 9F). This change was also detected as a significant difference in the mean density parameter. However, the microbiological water quality remained constant for at least four weeks at a time in 1983 and possibly for as long as four months (Periods IV and V).

The WH water distribution system was divided into isolated areas as shown in Figure 2. Since the well and standpipe are both in the East area the flow of water is always from East to Central and from Central to North and Southwest areas. These isolated areas were produced by the fact that the community was divided by a turnpike, a railroad, a major highway, and a private school. The coliform data for each of the sampling periods are distributed among isolated areas in Table 3. There was one period with a significant difference among the isolated areas. This suggests that it may be desirable to monitor the isolated areas separately. This could be a rationale for requiring more samples per sampling period for larger water distribution systems.
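The comparison among isolated areas can be checked with a short calculation. The sketch assumes a plain Pearson chi-square on the 2 x 4 table of positive and negative samples by area, which the paper does not spell out; `chi_square_by_area` is an illustrative name. Applied to the Period I and Period II counts in Table 3 it gives about 6.41 and 2.83, the values listed for those periods. The 5% critical value for 3 degrees of freedom (7.815) is taken from standard tables.

```python
CRITICAL_5PCT_3DF = 7.815  # chi-square 0.95 quantile, 3 degrees of freedom


def chi_square_by_area(positives, totals):
    """Pearson chi-square for positive/negative counts across areas."""
    n = sum(totals)
    p = sum(positives) / n  # pooled fraction positive
    chi2 = 0.0
    for observed_pos, total in zip(positives, totals):
        expected_pos = total * p
        expected_neg = total * (1 - p)
        observed_neg = total - observed_pos
        chi2 += (observed_pos - expected_pos) ** 2 / expected_pos
        chi2 += (observed_neg - expected_neg) ** 2 / expected_neg
    return chi2


if __name__ == "__main__":
    # Areas in the order East, Central, Southwest, North (Table 3).
    period_1 = chi_square_by_area([4, 0, 4, 0], [42, 14, 36, 44])
    period_2 = chi_square_by_area([16, 23, 21, 16], [74, 72, 78, 74])
    print(round(period_1, 2), period_1 > CRITICAL_5PCT_3DF)
    print(round(period_2, 2), period_2 > CRITICAL_5PCT_3DF)
```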
SUMMARY

Routine monitoring of water distribution systems for coliform bacteria for regulatory purposes has resulted in the recognition of several interesting statistical problems. A reasonably large body of statistical literature dealing with some of these problems has developed over the last 75 years but there are still problems which need additional study.

One problem is the estimation of mean coliform density. Most of the sample results are indeterminate, either too high or too low for the available methods to measure accurately. A technique for estimation of an average from indeterminate values would probably be of great benefit in several different scientific fields but, short of that, it is probably better to use some parameters other than mean density for regulatory purposes.

The normal reporting period for microbiological drinking water quality is one month, but there is no scientific basis for using a month rather than a week or a year. The question of the persistence of microbiological water quality in a distribution system needs to be investigated further in relation to regulation and monitoring.

Under present regulations in the United States, the number of samples per month required for microbiological monitoring runs from 1 per month to 500 per month and increases with increasing size of the water distribution system. Sampling theory suggests that the number of samples needed is not related to the size of the system. There may be differences in microbiological water quality among different areas of a water system and larger systems probably have greater heterogeneity. This greater heterogeneity may be a rationale for requiring more samples for larger systems. This also needs further study.

REFERENCES

Christian, R. R. and Pipes, W. O., 1983. Frequency Distribution of Coliforms in Water Distribution Systems, Appl. Environ. Microbiol., 45: 603-609.
Clark, H., 1969. The Detection of Various Bacteria Indicative of Water Pollution by a Presence-Absence (P-A) Procedure, Can. J. Microbiol., 15: 771-780.
El-Shaarawi, A. and Pipes, W. O., 1982. Enumeration and Statistical Inference, pp. 43-65, In: Bacterial Indicators of Pollution, (W. O. Pipes, ed.), CRC Press, Boca Raton, FL.
Pipes, W. O. and Christian, R. R., 1982. Sampling Frequency-Microbiological Drinking Water Regulations, EPA 570/9-82-001, Office of Drinking Water, U.S. Environmental Protection Agency, Washington, DC.
MODELLING OF BACTERIAL POPULATIONS AND WATER QUALITY MONITORING IN DISTRIBUTION SYSTEMS

A. MAUL¹, A.H. EL-SHAARAWI² and J.C. BLOCK¹
¹Centre des Sciences de l'Environnement, University of Metz, France
²National Water Research Institute, Burlington, Ontario, Canada
ABSTRACT

Bacteriological surveys were performed on the drinking water distribution system of the city of Metz in France according to a systematic sampling design to determine the spatial and temporal distribution of heterotrophic bacteria in the network. A non-hierarchical nearest-centroid clustering method was used for dividing the water distribution system into zones corresponding to different levels of bacterial density. Since the frequency distributions of microorganisms within the zones could be modelled by the negative binomial distribution, the water distribution system studied may be considered as being composed of several heterogeneous subsystems. Information on the spatial and temporal variability of bacteriological data is used to develop a sampling design for use in future water quality monitoring. Under the assumption that the objective of monitoring is to determine whether or not the mean bacterial density of the water exceeds a specific standard, a criterion is given which determines the optimal number of sampling stations allocated to each zone in case of a one-run sampling design. These stations are determined by assuming that either the risk of sampling (i.e., making the wrong decision) is prespecified or that the total number of stations to be sampled is predetermined.
1.
INTRODUCTION Sampling
programs
designed
to
monitor
or
to
study
the
bacterial density in water distribution systems usually involve
the
collection
locations
and
objective
of
of
a
over
number
an
public safe
bacterial
quality
specify
( i )
a
and
maximum of
( i i ) a threshold
v a l u e of
1976). the
period
basic
have
biological
been
counts
may
(Colwell 1981)
al.,
et
and
organisms/mL) Drinking
Water
plate
even
Regulations
Community
(EEC)
that
arithmetic
mean
the
should not Prior
exceed to
monitoring
Where
to
samples be erroneous the
fact, the
drinking
level.
guidelines
depends
(A),
on
(i)
( i i )
the
variability Pipes a
and
given
quality the
of
the
the
temporal
in
data al.,
et
Interim
500
Primary
the
European
water
states
(i.e.,
mean of
1982).
program
water
of
during
the
that
on
bacterial
both
in
Therefore,
6
of and
the
of
at
a
water
bacterial
programs the
water
( i i i )
(Esterby,
the
the
when,
monitoring
that
the
a ) and
risk
density
monitoring depends
two
monitoring the
bacteria
shows
nature,
8).
collected,
of
This for
largely
variation
controlling
samples
the
violating
risk,
risk,
bacterial
distribution
should
regulations
probability
true
number
is
producers'
of
the
samples?
is declaring that
microbiological
consist the
this
o€
first
consumers'
the
When
for
to
How many
( i i i )
the
program
answers
(i)
system
with
violated
the
water,
and
the
are
Christian,
of
methods
(i.e.,
sampling
the
(i.e.,
of
However,
sampling
be
bacterial
Means
drinking
required:
samples?
not
should
specific
a
possible:
true
quality
of
compliance
objective
water
to
quality
Further,
drinking
are
water
is
it
i s not
this
other
1978;
U.S.
for
EPA,
U.S.
continues
water
any sampling d e s i g n
are
declaring
basic
(1976).
of
the
For
the
when
is
second
collect
taken?
regulation
1978,
(Y) f o r h e t e r o t r o p h i c b a c t e r i a a t 2 0 ° C
quality
of
the
a
over a
averaged
limitation
in
preparation
decisions
quality
exceeding
coliforms) or/and
heterotrophic
al.,
regulation
following three questions ( i i )
samples
100 organisms/mL.
the
the
thus
potability,
et
proposed
Economic
most
Canada,
count
ensure
regulations
(e.g.
complementary
McFeters
total was
water
major
water,
enumeration
instance,
useful
1978;
a
of
Welfare
of
For
provide
or
various a
to
drinking
bacteria
and
test
advanced.
is
microbiology
although coliform
at
Since
t h e mean s a m p l e c o u n t
(Health
However,
time.
guidelines
indicator
samples
of
reliable
proportion
particular value
specified
water
period
health
bacteriologically water
of
extended
the 1982;
efficiency of
microbiological the
spatial
populations
in
and the
systems
sampled.
population
as
1976) is not setting a the
Moreover,
in
always advisable.
(El-Shaarawi Clearly, the
degree
this
of
of
an
primary
is
paper
aim of
of
assumption
of
water
distinct
a
in
structured
distribution
regions
of
bacterial
heterogeneity observed
that
need
the
determining to
6,
risk
different of
and
zones
sampled ( i i ) of
is
stations
which
be
in
a
w i l l
becomes a v a i l a b l e ,
presented
the
and
in
showing
by
a
underlying
composed
be
of
the
examined.
in this and
the
run
the
several negative
allocating
The
number a
the
of
stations
given
level
stations total
sequential
information
patterns
study a r e then used
for
Further,
new
this
spatial
d e n s i t y assuming the
prespecified. takes
of
in
sampling
populations
modelled
single
optimally
analysis
with
an o b j e c t i v e c l a s s i f i c a t i o n
location
bacterial
continually
2.
the
and the
being
(i.e.,
system i n t o zones w i l l
(i)
system
heterogeneity
as
the water
population.
setting
picture
in
is
water
correlated
particular,
of
for:
in
the
be
bacterial
d i s t r i b u t i o n ) by means o f
parameter
in
s i z e of
to
EPA,
the
water
spatial
system
(U.S.
bacteria
clear
In
samples
quality
statistical
a
system.
important
might
the
of
drinking water
factor
the
give
number
regulations
the
heterotrophic
heterogeneous
binominal
not
population
indirect
to
variation
the
and
the
heterogeneity
distribution
whole
the
The most
of
1985)
al.,
size
The
temporal water
patterns
represents
design. in
et
the
it
relating
water quality
sampling program t o monitor
dispersion
case
U.S.
the
into
of
to
the
number
sampling,
account
it
as
a l s o be d i s c u s s e d .
2. MATERIALS AND METHODS

2.1 Sampling stations and collection of water samples

The sampling was confined to the portion of the water distribution system of the city of Metz in France served exclusively by water coming from the same treatment plant. The area in which this study was conducted covers all the districts of the city of Metz, from the northwestern to the southern end, and includes a few small enclaved adjoining communities.

Water samples were collected from a number of sites spread over the study area according to a systematic sampling design (Figure 1). Six bacteriological surveys were thus performed from December 1983 to June 1984. During each survey, 102 samples were taken over a 3 hr. period with sterile 500 mL glass bottles (containing 1 mL of a 3 percent sodium thiosulfate solution) from taps which were previously flamed and flushed at full pressure as long as necessary to obtain water at a stabilized temperature. Bottle samples were then held at ambient temperature and transported to the laboratory to be processed within six hours from the beginning of each survey.

2.2 Sample processing

After the water samples were shaken manually, serial ten-fold dilutions to 0.01 percent were made in sterile 0.8 percent NaCl solution. Ten mL portions of the original samples and of the dilutions were then filtered through sterile membrane filters (Millipore, HAWG, 0.45 µm pore size), deposited on R-2A low nutrient agar (Reasoner and Geldreich, 1985) and incubated at 20°C for 72 hours. The bacterial density, expressed as the number of heterotrophic bacteria contained in 1 mL of the original sample, was finally calculated from the combination of the available counts from the initial sample and the successive dilutions according to the maximum likelihood method (Maul et al., 1981).

2.3 Statistical analyses

Fitting probability distributions and parameter estimation. If the number of bacteria in the samples is assumed to follow a negative binomial distribution (Fisher, 1941), the probability of finding $r$ organisms in a sample is given by equation (1). This distribution is specified by the parameters $p$ and $k$. The mean of the distribution is $\lambda = pk$. The maximum-likelihood estimate $\hat{k}$ for $k$ satisfies equation (2), where $\bar{r}$ is the arithmetic mean of $r_1, \ldots, r_n$ which are the bacterial counts from $n$ independent samples. The maximum-likelihood estimate of $p$ is $\hat{p} = \bar{r}/\hat{k}$. The chi-square goodness-of-fit statistic (Snedecor and Cochran, 1967) was used
to
the
represent If
each
binomial
1,2
adequacy
the
of
sets
1,
of
distribution
,..., a ) ,
used
of
negative
the
test
to
estimation
of
data
with
likelihood
are
distribution
ratio
the
equality
the
common
represented
parameters
of
test the
value
a s s u m i n g t h e n u l l h y p o t h e s i s Ho ki's).
binomial
to
the data.
ki's.
(kc)
sets
equality of
estimate,
kc,
( i
=
can
be
requires
II
the
negative
k;
1968)
This
of
a
and
(Lindgren,
(i.e.,
The m a x i m u m - l i k e l i h o o d
by
pi
the
of
data
the
f o r k,
s a t i s f i e s the equation
where $\bar{r}_i$ is the arithmetic mean of $r_{i1}, r_{i2}, \ldots, r_{im_i}$, which are the bacterial counts from the $m_i$ samples of the $i$th data set. Equations (2) and (3) can be solved numerically on a computer by the Newton-Raphson method.

When $H_0$ is rejected (i.e., the data sets do not have a common $k$), it is of interest to determine a subset of data sets which can be fitted to negative binomial distributions with a common $k$. The procedure used for doing this is as follows: (i) arrange the values of the $k_i$'s in ascending order; (ii) starting with the two data sets which correspond to the two lowest values of $k$, test the equality of the two $k$'s; (iii) if the difference between the two values is not significant at a prespecified significance level, then test the equality of the three smallest values of $k$; (iv) continue the process of adding one data set at a time, as long as the test is not significant; (v) once the test is significant, the process is stopped. If $m$ represents the number of data sets for which equality is accepted, the common estimate of $k$ for those $m$ data sets is obtained from equation (3).

Correlation analysis. Kendall rank correlation coefficient ($\tau$) was calculated to study the reproducibility of the variability between the surveys.

Clustering method. A non-hierarchical nearest-centroid clustering method was used to divide the stations in the water
distribution system into sets. When these sets are given on the map of the distribution system, they will be called zones. The clustering method is given in detail in Anderson et al. (1984).

Sampling strategy. If a system is modelled by the negative binomial distribution with parameters $p$ and $k$ ($\lambda = pk$), the probability ($P_A$) of accepting that the water in the network is in a state of control (e.g., not exceeding the 100 bacteria/mL EEC standard) is given as

$$ P_A = \sum_{r=0}^{100n} \frac{(nk + r - 1)!}{(nk - 1)!\, r!} \, \frac{p^r}{(1 + p)^{nk + r}} \qquad (4) $$

which is a function of the number of samples collected, $n$. El-Shaarawi et al. (1985) approximated $P_A$ as

$$ P_A = \Phi\!\left( \frac{100 - \lambda}{\sqrt{\dfrac{\lambda}{n}\left(1 + \dfrac{\lambda}{k}\right)}} \right) \qquad (5) $$

where $\Phi(z)$ is the area under the standard normal distribution from $-\infty$ to $z$. Hence, the estimate of the number of samples ($n$) to be collected, when $P_A$ is set at a prespecified level, $\beta$, is given as

$$ n \ge \frac{\lambda\, z_\beta^2\, (1 + \lambda/k)}{(100 - \lambda)^2} \qquad (6) $$

where $z_\beta$ is the normal variable corresponding to the probability level $\beta$. From formula (6) it appears that $n$ is inversely related to $k$. Reciprocally, equations (4) or (5) can be used to calculate the risk $\beta$ associated with a given number of samples, $n$.

More generally, if a system can be divided into $\ell$ zones with the negative binomial distribution representing a suitable model for the dispersion of bacteria in each zone, then the above formula permits the determination of the number of samples to be allotted to each zone. Suppose $\beta$ is fixed for all the zones, the number ($n_i$) of samples needed from the $i$th zone can then be calculated directly from equation (6). However, for administration purposes or monetary constraints, the total number of samples ($N$) to be collected might be fixed and the problem consists of determining how to divide $N$ among the zones. The optimal allocation is obtained from equation (7)

$$ n_i = N \, \frac{1 + \lambda/k_i}{\sum_{j=1}^{\ell} (1 + \lambda/k_j)} \qquad (7) $$

where $k_i$ is the parameter associated with zone $i$. The risk $\beta$ corresponding to such a design is given as
- x
100
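A minimal sketch of how the estimation of $k$ and the sampling formulas (5), (6) and (7) might be evaluated. It assumes the parametrization implied by equation (4), in which a single count $r$ has probability $\frac{(k+r-1)!}{(k-1)!\,r!}\,\frac{p^r}{(1+p)^{k+r}}$; under that assumption the maximum-likelihood condition for $k$ can be written $\sum_i \sum_{j=0}^{r_i-1} \frac{1}{k+j} = n \ln(1 + \bar{r}/k)$. The root is found here by bisection rather than the Newton-Raphson method the text mentions, purely for robustness in a short example. All function names and numerical values are illustrative.

```python
from math import ceil, log1p, sqrt
from statistics import NormalDist, fmean, pvariance


def score_k(k, counts):
    """ML score for k: sum_i sum_{j<r_i} 1/(k+j) - n*ln(1 + mean/k)."""
    harmonic = sum(1.0 / (k + j) for r in counts for j in range(r))
    return harmonic - len(counts) * log1p(fmean(counts) / k)


def fit_negative_binomial(counts):
    """Return (k_hat, p_hat) by bisection on the score (not Newton-Raphson)."""
    if pvariance(counts) <= fmean(counts):
        raise ValueError("no finite ML estimate of k: variance <= mean")
    lo, hi = 1e-6, 1.0
    while score_k(hi, counts) > 0:   # widen the bracket until the sign changes
        hi *= 2.0
    for _ in range(200):
        mid = 0.5 * (lo + hi)
        if score_k(mid, counts) > 0:
            lo = mid
        else:
            hi = mid
    k_hat = 0.5 * (lo + hi)
    return k_hat, fmean(counts) / k_hat


def acceptance_probability(lam, k, n, standard=100.0):
    """Equation (5): normal approximation to P_A for n samples."""
    sd = sqrt(lam / n * (1 + lam / k))
    return NormalDist().cdf((standard - lam) / sd)


def samples_needed(lam, k, beta, standard=100.0):
    """Equation (6): smallest integer n giving P_A at level beta."""
    z = NormalDist().inv_cdf(beta)
    return ceil(lam * z ** 2 * (1 + lam / k) / (standard - lam) ** 2)


def allocate(total_samples, lam, k_by_zone):
    """Equation (7): share of N per zone, proportional to 1 + lam/k_i."""
    weights = [1 + lam / k for k in k_by_zone]
    return [total_samples * w / sum(weights) for w in weights]


if __name__ == "__main__":
    counts = [0, 1, 1, 2, 3, 5, 8, 13, 21, 40]   # hypothetical counts per mL
    print(fit_negative_binomial(counts))
    print(samples_needed(lam=120, k=2, beta=0.05))
    print(allocate(100, lam=120, k_by_zone=[1, 2, 4]))
```

Zones with smaller $k$ (more clumped counts) receive more of the fixed total, which is the behaviour the text attributes to formula (6).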
Sequential sampling. If the results of a monitoring program for a system, modelled by the negative binomial distribution, are to be reported every specific period of time (say monthly), then sequential sampling can be performed to determine whether the system is in compliance with the regulations. The first task in sequential sampling is to set out two opposing hypotheses about the level of the population's true mean ($\lambda$), namely

$$ H_0 : \lambda \le \lambda_0, \qquad H_1 : \lambda \ge \lambda_1 $$

(e.g., $\lambda_0$ and $\lambda_1$ can be chosen as 100 and 200, respectively). Sequential sampling allows a decision to be made on the mean bacterial density as soon as the accumulated data provide a very strong indication for the rejection or acceptance of $H_0$; otherwise, sampling is continued. Wald (1947) showed that if $\alpha$ is the risk of accepting $H_1$ when $H_0$ is true and $\beta$ is the risk of accepting $H_0$ when $H_1$ is actually true, $H_0$ is accepted if the likelihood ratio ($R_n$) is less than $\beta/(1-\alpha)$ and $H_1$ is accepted if $R_n$ is larger than $(1-\beta)/\alpha$. In the present case this is equivalent to accepting $H_0$ when $T_n$ is less than $A(n)$ and accepting $H_1$ when $T_n$ is larger than $B(n)$,
[Figure 2 plot: cumulative frequency (0 to 1.0) against number of bacteria per mL (log scale), one curve per survey; legend entries JAN 10, FEB 21, APR 3, MAY 15, JUN 26.]

FIG. 2 Observed cumulative frequency distributions for the Metz water distribution system data.
where Tn is the cumulative number of bacteria found in the first n samples, with A(n) and B(n) given as
where
The sampling process has to be continued until Tn falls outside the interval determined by A(n) and B(n).
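The stopping rule can be illustrated with a minimal sketch. It assumes the standard Wald sequential probability ratio test for negative binomial counts with known k and mean λ, testing λ0 against λ1 with risks α and β. The function names, the labels "lower" and "upper" for the two boundaries, and the defaults α = β = 0.05 are assumptions made for illustration; they are not taken from the authors' formulas.

```python
import math

def sprt_bounds(n, k, lam0, lam1, alpha, beta):
    """Lower and upper limits for the cumulative count Tn after n samples.

    Assumes negative binomial counts with known k and mean lam0 under H0,
    lam1 > lam0 under H1 (standard Wald test, not the authors' own code).
    """
    # Per-count and per-sample terms of the log-likelihood ratio.
    a = math.log(lam1 * (k + lam0) / (lam0 * (k + lam1)))
    b = math.log((k + lam1) / (k + lam0))
    lower = (math.log(beta / (1 - alpha)) + n * k * b) / a
    upper = (math.log((1 - beta) / alpha) + n * k * b) / a
    return lower, upper

def sprt_decision(counts, k, lam0, lam1, alpha=0.05, beta=0.05):
    """Scan the counts in order and stop as soon as Tn leaves the interval."""
    total = 0
    for n, x in enumerate(counts, start=1):
        total += x
        lower, upper = sprt_bounds(n, k, lam0, lam1, alpha, beta)
        if total <= lower:
            return n, "accept H0"
        if total >= upper:
            return n, "accept H1"
    return len(counts), "continue sampling"

if __name__ == "__main__":
    # Hypothetical counts; k is the zone 1 value quoted in the text.
    print(sprt_decision([120, 40, 0, 15, 300, 10], k=0.43927, lam0=100, lam1=200))
```

Both boundaries are straight lines in n with the same slope, so the interval for Tn keeps a constant width while sampling continues.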
3. RESULTS AND APPLICATIONS
3.1 Characterization of the spatial and the temporal variability
The empirical surveys
(i.e.,
May 1 5 a n d the
cumulative distribution
December
June
chi-square
26)
binomial
were
not
significant
unsuitability of in
s p a t i a l reproducibility of the
different
of
A l l
of
probability
stable
water
divide
the
data
surveys
coefficient
surveys.
percent
from
at
the
the
six
21, A p r i l
first
the
can
the
level,
the
and
3,
last
by
calculated
into
The
is
It
distribution
density
rank
for
pair
each
at
zones
on
the
1
bacteriologically
appropriate
application
in
Kendall
significant a
This
The d e g r e e o f
bacterial
reflecting
system.
surveys.
of
were
T'S
thus
binomial
assessed was
level.
surveys.
pattern be
15
1 percent
negative
(T) which
distribution
six
the
Only t h e v a l u e s o f
for
the
four of
d i s t r i b u t i o n system the
February 1.
for
the probability distribution is negative
for describing the data
correlation
test
goodness-of-fit
assuming t h a t
the
10,
are given i n Figure
surveys,
shows
January
13,
functions
the of
then
basis
the
of
to the
clustering
method t o a l l t h e d a t a a d e q u a t e l y d i v i d e d t h e w a t e r system i n t o four
z o n e s a s shown i n F i g u r e
this
clustering
total
(R)
variations.
density,
includes
Zone
53
2.
The v a r i a b i l i t y e x p l a i n e d by
represented
1,
stations
sampled and h e n c e c o v e r e d most
more
the from of
than
zone a
80
of total
t h e a r e a of
percent
lowest of
102
of
the
bacterial stations
t h e network.
The
TABLE 1.
Number of stations, mean, estimates of the parameters of the negative binomial distribution and goodness-of-fit statistic for each combination of survey and zone.

                 Number of                Parameters of the negative    Goodness-of-fit statistic
                 stations                 binomial distribution         Degrees of
Date      Zone   n           Mean         p            k                freedom      χ²
Dec. 13   1      52          36.81        105.54       0.34877          3            8.60*
          2      31          668.23       1224.24      0.54583          1            0.54
          3      14          561.64       1072.43      0.52371          -            -
          4      3           3820.00      53.21        71.78516         -            -
Jan. 10   1      53          24.21        68.53        0.35324          3            10.53*
          2      31          852.65       2676.59      0.31856          1            2.79
          3      15          233.20       448.11       0.52041          -            -
          4      3           9420.00      10.78        874.22991        -            -
Feb. 21   1      52          44.08        127.08       0.34683          3            4.88
          2      31          627.94       1547.44      0.40579          1            4.39*
          3      14          537.58       1206.38      0.44561          -            -
          4      3           13433.33     487.67       27.54602         -            -
Apr. 03   1      52          35.92        66.95        0.53659          3            5.22
          2      31          284.64       308.46       0.92280          1            2.27
          3      15          787.39       2266.19      0.34745          -            -
          4      3           7833.33      1090.56      7.18284          -            -
May 15    1      51          87.10        128.98       0.67527          3            11.08*
          2      30          446.70       468.96       0.95253          1            2.22
          3      14          577.72       626.41       0.92227          -            -
          4      3           2669.67      45.20        59.05907         -            -
June 26   1      52          197.60       420.58       0.46982          3            2.31
          2      31          3168.90      2387.71      1.32717          1            1.84
          3      15          29119.99     3305.08      8.81069          -            -
          4      3           7633.33      1088.53      7.01250          -            -

*  Value is significant at the 5% level.
** Value is significant at the 1% level.
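The entries of Table 1 can be cross-checked with a short sketch. It assumes the parametrization in which the negative binomial mean equals the product kp, which the tabulated values appear to follow; the row selection and function name are illustrative only.

```python
import math

# (survey, zone, mean, p, k) transcribed from a few rows of Table 1.
ROWS = [
    ("Dec. 13", 1, 36.81, 105.54, 0.34877),
    ("Dec. 13", 2, 668.23, 1224.24, 0.54583),
    ("Dec. 13", 4, 3820.00, 53.21, 71.78516),
    ("Jan. 10", 1, 24.21, 68.53, 0.35324),
    ("June 26", 3, 29119.99, 3305.08, 8.81069),
    ("June 26", 4, 7633.33, 1088.53, 7.01250),
]

def max_relative_gap(rows):
    """Largest relative difference between the tabulated mean and k * p."""
    return max(abs(mean - p * k) / mean for _, _, mean, p, k in rows)

if __name__ == "__main__":
    for survey, zone, mean, p, k in ROWS:
        print(survey, zone, mean, round(p * k, 2))
    print("consistent:", math.isclose(max_relative_gap(ROWS), 0.0, abs_tol=1e-3))
```

Agreement to within rounding supports reading the two parameter columns as p and k, in that order.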
FIG. 3 Temporal variation of the bacterial density for each zone.
zones of higher density consisted and e a s t e r n e n d s o f The for zone.
Thus,
the
(calculated
t h e number o f
was
significant
the
negative
the
zones.
to
at
negative been
s t a t i o n s n,
Chi-square
binomial
s u r v e y and the
None o f
level.
d i s t r i b u t i o n may
heterotrophic
each
the
the negative binomial
goodness-of-fit
1 percent
within
t h e mean Y = A ,
of
zone.
distribution
reconsidered
bacterial
i n Table 1 for
the values
This be
(x2)
statistic
1 and 2 o n l y ) are p r e s e n t e d
binomial
represent
the has
the parameters
f o r zone
each combination of
model
of
data
e s t i m a t e s p and k o f d i s t r i b u t i o n and
the northwestern
the city.
appropriateness
describing
primarily of
taken
of
indicates as
counts
a
x2
that
suitable
within
the
The temporal the
various
variation
c l e a r l y shows t h e in
the
system
emphasized the
uniformally
indicate
by
3.2
an
in
the
in
the
words,
the
zones
graph
population
it
However,
consistency
within
The
bacterial
survey.
change i n b a c t e r i a l
rather
density
3.
Figure
must
be
trajectories
i n t e r a c t i o n between the
other
the
seem t o behave
mean b a c t e r i a l
increase of
lack of
in
the
last
the
the
effect:
of
illustrated
general
for
that
centroids
time
is
zones
zone
are
density
and
not
and,
of the
affected thus,
they
variability
w i l l
i n d e p e n d e n t l y from each o t h e r .
3.2 Sampling strategy for future data collection
Available
information
now b e u s e d
-
One-run
2
Table
presents
estimated, test
number
kc,
to
samples of
1 only. 2
zone
bacterial
maximum-Likelihood
the
ki's
However,
and
zone
this
common
each
the
has been chosen t o c h a r a c t e r i z e
then
formula
that
should
6 and X
for (n)
as
is
shown
a
(6) be
in
The
dictate
to
results
been
zone
was
k
assigned
the
the
number
provided
another
total
of
these
by n g i v e s
w h i l e column headed
Table
3.
for
the
second
k;
(i.e.,
of
using
the
Assuming
zone,
stations
the values
samples
way,
number
of
that
from formula
calculations in
The
assigned
t o each
levels
that
for
zone 4.
t h e o p t i m a l a l l o c a t i o n among t h e
summarized
of
assuming d i f f e r e n t Seen
be
observed
of
4.
low.
accepted
could
The minimum number
the r i s k 6 is obtained
2 are
lowest
determine
t o each
k,
Figure
i s N,
column headed
0.05,
used
given.
function of
to equalize
Figure
is
for k has
allocated
are
constraints collected
value
too
approach
7.01250)
specific
the
k
4 since the
zone
was
zone for
in
a
while
of
s i g n i f i c a n c e of
for
zone
value
following
paper
the
except
within
a
3
in
estimate
t h e maximum-likelihood
section
When
this
ki's,
involved
the
zone,
f o r e a c h zone and
comparing
homogeneity zone
the
s u r v e y and
€or kc
for
of
the
future data collection.
design
each combination of
the
about
for designing
required
€or 6 a n d A , if
practical
samples
to
be
zones i n order (7).
zones
shown
in
X i s s e t a t 200,
the allocation corresponding t o 6 = by n p r e s e n t s t h e
optimal allocation
TABLE 2.
Maximum-likelihood estimate of k (ki) for each combination of zone and survey, maximum-likelihood estimate of the common k (kc) for each zone and significance of the test for homogeneity.

                                             Test for the homogeneity
                                             of the ki's (significance)
Zone   Survey    ki           kc             P
1      Dec 13    0.34877      0.43927        0.100
       Jan 10    0.35324
       Feb 21    0.34683
       Apr 3     0.53659
       May 15    0.67527
       Jun 26    0.46982
2      Dec 13    0.54583      0.65823        < 0.001
       Jan 10    0.31856
       Feb 21    0.40579
       Apr 3     0.92280
       May 15    0.95253
       Jun 26    1.32717
3      Dec 13    0.52371      0.84629
       Jan 10    0.52041
       Feb 21    0.44561
       Apr 3     0.34745
       May 15    0.92227
       Jun 26    8.81069
4      Dec 13    71.78516     41.54600       < 0.001
       Jan 10    874.22991
       Feb 21    27.54602
       Apr 3     7.18284
       May 15    59.05907
       Jun 26    7.01250
FIG. 4 Number of samples to be collected as a function of k, assuming different levels for the bacterial density λ and the risk β. Curves are shown for λ = 200 and λ = 150; vertical axis: number of samples; horizontal axis: k.
TABLE 3.
Optimal allocation of sampling stations to the water distribution system of the city of Metz.

                    Number of sampling stations
Zone   k            Used in 1983-84    β = 0.05, λ = 200    N = 100 (β = 0.0235)
1      0.43927      52                 24.70                35.96
2      0.50514      31                 21.48                31.28
3      0.51945      15                 20.89                30.42
4      7.01250      3                  1.59                 2.33
of
100
=
(6
samples
0.0235).
The
efficiency
of
the
1983-84
s a m p l i n g d e s i g n c a n b e e v a l u a t e d by c o m p a r i n g t h e v a l u e s of or n w i t h
those
n.
Table
3 shows
the
Metz
water
detect
0.95
X
(i) at
that
=
the
water
200
and
( i i )
a s s o c i a t e d with 6 = 0.05, first
two
i n zone
-
zones
system
of
and
order
the
to
sampling
t h e number of the
able
probability
optimal
increases
be
from to
above
design,
stations in the
number
of
stations
3. Sequential sampling
Although 1983-84 from
sequential
study,
the
1
zone
sampling has
is
method
which
yielded
6
set
are
different displays
equal
an
levels the
of
path
the of
two
the
fell
to
opposing
outside
the
the
The
using
of
kc
process
uncertainty
as
the data
0.43927.
are
given
hypotheses. bacterial
first
in
the
( 9 ) when b o t h a
curves
cumulative
1.
zone
performed
here
from formula These
t a k e n randomly from t h e d a t a of corresponding
been
estimate
0.05.
to
not
illustrated
5 shows t h e c u r v e s o b t a i n e d
Figure and
Tn
in
quality with
reduces
slightly
n
d e n o t e d by
70 s a m p l e s a r e n e e d e d
least
distribution
violations when
used during the s i x surveys
actually
for
Figure counts
5
(Tn)
and t h e s i x t h s u r v e y s
was
continued
region
until
delimited
by
In two
associated curves. Thus, from
the
the
compliance
11th
sample
while v i o l a t i o n of sample
onwards
example, February
21,
April
r e g u l a t i o n had and
regulation
stated 28th
June
26
3
and
been m e t
December
be
data
procedure
the
could
t h e s t a n d a r d c o u l d be o b s e r v e d from t h e the
for
surveys
performed
May
data. on
15 s u r v e y s d a t a
respectively
from t h e
a
As
the
further
January
showed 12th,
that
22nd,
10, the 14th
14th sample onwards.
4. DISCUSSION AND CONCLUSION
The goodness-of-fit
the
the
13 surveys
for
same
the
with
onwards
unsuitability
representing However, zones
the
showed
the
of data
tests the
negative
from
goodness-of-fit that
the
s u m m a r i z e d i n T a b l e 1 c l e a r l y show
the
six
tests
negative
binomial
distribution
bacteriological calculated
binomial
model
for
within
f i t s
for
surveys.
the
the data
FIG. 5 Sequential sampling illustrated for zone 1 using the data for the December 13 and June 26 surveys. Horizontal axis: number of samples.
well
in
the
different
distribution being
system
composed
modelled
by
of
the
by d i f f e r e n t
able
in
the
negative
l e v e l s of
The
The
the
of
of
system,
hydraulic
which in
a
may
2
are
more
of
the
plant
In
and
other to
to
a
water as
(i.e.,
characterized
Furthermore,
the
can be c o n s i d e r -
necessarily occur
system
spatial
bacteria
in
of
of
therefore
age
of
al.,
the
zones
may
be
pipes,
the
residuals,
and
habitat,
succession
1981).
of
In
this
heterotrophic
peripheral it.
close to
of
chlorine
ecological et
bacterial
into
heterogeneity.
and c h e m i c a l c h a r a c t e r i s t i c s
parameters
in
than
in
far
locations
some o f patterns regard, bacteria from
the
Such an u n d e r s t a n d i n g
density
the
a
be
helps
to
improve
the
bacterial
was
constancy
to
likely
month
time.
might
collected
be
to
k
to in
vary
the
order
within
of
to
zone
bacteria
the
values
number to
of
maintain
i n view of
preferable
shown
1
than
s i g n i f i c a n t l y from
between
However,
so t h a t
of
dispersion
discrepancies
heterogeneity
sampling design is
fix
of
k
samples the
same
administration the
t h e maximum r i s k
number
of
is controlled
level.
zonation
of
a
and w i l l
In t h i s regard,
the
during all
during the
k
from month
it
of
the
readjustment
sampling over
to
relative
stability in
The
imply
incidence
p a r a m e t e r k ) on t h e
where
problem a r e a s
violated
the
densities
occur
zones
specific
The
not
chronological
(Means
high
month.
constraints, samples
of
level
stage
The
better
continually r i s k of
the
rather
study,
a
normally
at
that
emphasized.
the
subsystems
distribution
the
other
the
i n t e r m s of
reflects
month
does
the
considered
f u t u r e monitoring sampling programs.
this
(i.e.,
water
as
of
variability
design of
it
but
pattern
systems
likely
treatment
be
distribution)
t o some p h y s i c a l such
dictate
shows
Hence
may
heterogeneous
heterotrophic
number
unperturbed
Figure
a
conditions,
probably
Metz
binomial
network
partially related of
network.
of
the bacterial density.
structured
incidence
the
same m a n n e r .
the
division a
of
city
in the bacterial population
whole
in
reflects
the
several
temporal variation
t h e zones
zones
of
last
network thus
be
useful
facilitate
may
taking
100 b a c t e r i a l m i s t a n d a r d , the
s i x surveys,
survey only,
for
zone
for
for
for zones
1.
determining
remedial
2,
action.
i n s t a n c e , was
3 and 4 ,
and
Sequential sampling offers another way to monitor the level of bacterial density in a system. In addition to k, the procedure takes into account the concentration of bacteria through the samples analysed after each collection. This may avoid such a strenuous sampling program as using a predetermined large number of samples collected in a single run. Moreover, sequential sampling may show increased efficiency in comparison with the one-run design, especially when the true mean bacterial density is far enough above or below the standard. Nevertheless, if the cumulative sample Tn remains between A(n) and B(n) for some large sample size, the true mean bacterial density probably lies somewhere between λ0 and λ1. In sequential sampling, both the producers' risk (α) and the consumers' risk (β) are considered and the graphical procedure illustrated in Figure 5 allows going further into classification of water in different classes of the bacterial
concentration. Although the examples given in this paper use heterotrophic bacteria and a specific regulation, the sampling designs presented may be easily adapted to other bacteriological regulations or even to other water quality data described by the negative binomial distribution.
5. ACKNOWLEDGEMENTS
The authors thank S.R. Esterby (Canada Centre for Inland Waters) who kindly made available to us a computer program to perform the clustering method.
6. LITERATURE CITED
Anderson, J.E., El-Shaarawi, A.H., Esterby, S.R. and Unny, T.E., 1984. Dissolved oxygen concentrations in Lake Erie (U.S.A. - Canada), 1. Study of spatial and temporal variability using cluster and regression analysis. J. Hydrol. 72: 209-229.
Colwell, R.R., Austin, B. and Wan, L., 1978. Public health considerations of the microbiology of "potable" water. p. 65-75. In: Evaluation of the microbiology standards for drinking water (C.W. Hendricks, ed.), U.S. Environmental Protection Agency, Washington, D.C.
El-Shaarawi, A.H., Block, J.C. and Maul, A., 1985. The use of historical data for estimating the number of samples required for monitoring drinking water. Sci. Tot. Environ. 42: 289-295.
Esterby, S.R., 1982. Fitting probability distributions to bacteriological data. Considerations for surveys and for water quality guidelines. J. Fr. Hydrol. 13: 189-203.
Fisher, R.A., 1941. The negative binomial distribution. Ann. Eugen. 11: 182-187.
Health and Welfare Canada, 1978. Guidelines for Canadian drinking water quality. Canadian Government Publishing Centre, Supply and Services Canada, Hull, Quebec.
Lindgren, B.W., 1968. Statistical theory, 2nd ed. Collier-Macmillan, London.
Maul, A., Dollard, M.A. and Block, J.C., 1981. Application du principe du maximum de vraisemblance (N.P.P) au titrage bactérien sur milieu gélosé. J. Fr. Hydrol. 12: 245-254.
McFeters, G.A., Shillinger, J.E. and Stuart, D.G., 1978. Alternative indicators of water contamination and some physiological characteristics of heterotrophic bacteria in water. p. 37-48. In: Evaluation of the microbiology standards for drinking water (C.W. Hendricks, ed.), U.S. Environmental Protection Agency, Washington, D.C.
Means, E.G., Hanami, L., Ridgway, H.F. and Olson, B.H., 1981. Evaluating mediums and plating techniques for enumerating bacteria in water distribution systems. J. Am. Water Works Ass. 73: 585-590.
Pipes, W.O. and Christian, R.R., 1982. Sampling frequency-microbiological drinking water regulations-final report. U.S. Environmental Protection Agency, EPA R-805-63719-82-001, Washington, D.C.
Reasoner, D.J. and Geldreich, E.E., 1985. A new medium for the enumeration and subculture of bacteria from potable water. Appl. Environ. Microbiol. 49: 1-7.
Snedecor, G.W. and Cochran, W.G., 1967. Statistical methods, 6th ed. Iowa State University Press, Ames, Iowa.
U.S. Environmental Protection Agency, Office of Drinking Water, 1976. National interim primary drinking water regulations, EPA-570/9-76-003, U.S. Environmental Protection Agency, Washington, D.C.
Wald, A., 1947. Sequential analysis. New York, Wiley.
A GOODNESS-OF-FIT TEST FOR THE NEGATIVE BINOMIAL DISTRIBUTION APPLICABLE TO LARGE SETS OF SMALL SAMPLES
BARBARA HELLER, Illinois Institute of Technology, Mathematics Department
Frequently, in microbiological work, bacterial counts are obtained serially in time or in space. If there are replicates, they are few in number. If we assume that the same probability model can be used for the whole set of counts, then parameter values might vary from one point in time to the next. Our object is to devise a goodness-of-fit test for a given probability model, taking into account the effect of varying parameters and small sample sizes.
If we assume a Poisson model, then the index of dispersion statistic D² is available for testing goodness-of-fit. Suppose that we have a sequence of size M of sets of replicates of size n, where n is small and M is large. Using the property that the Poisson mean is equal to its variance, we compute the ratio Di² of those two sample moments for each sample i = 1, 2, ..., M. Under the null hypothesis, each Di² has, asymptotically, a χ² distribution with n-1 degrees of freedom. We utilize the fact that M is large, even though n is small, by considering the frequency distribution of the set {Di²} and comparing it with the χ²(n-1) distribution using, e.g., the chi-squared goodness-of-fit test.
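A minimal sketch of this step follows. It assumes the usual Fisher form of the index of dispersion, D² = (n-1)s²/x̄, with s² the sample variance and x̄ the sample mean; the function names, the cut points and the treatment of all-zero samples are assumptions made for illustration.

```python
from statistics import mean, variance

def index_of_dispersion(sample):
    """Return D^2 = (n - 1) * s^2 / xbar, or None when the sample mean is zero."""
    xbar = mean(sample)
    if xbar == 0:
        return None  # the ratio is undefined for an all-zero sample
    return (len(sample) - 1) * variance(sample) / xbar

def dispersion_frequencies(samples, cut_points):
    """Count the D^2 values falling in the intervals defined by cut_points."""
    counts = [0] * (len(cut_points) + 1)
    for sample in samples:
        d2 = index_of_dispersion(sample)
        if d2 is None:
            continue  # skipped here; other conventions are possible
        counts[sum(d2 > c for c in cut_points)] += 1
    return counts

if __name__ == "__main__":
    # Hypothetical triplicate counts.
    replicates = [[2, 4, 6], [3, 3, 3], [0, 0, 9], [1, 0, 2]]
    print(dispersion_frequencies(replicates, cut_points=[1.6, 3.1, 4.7, 6.3]))
```

The observed interval counts would then be compared with the frequencies expected under the χ² distribution with n-1 degrees of freedom.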
We consider, here, the case where the assumed model is negative binomial. See El-Shaarawi, Esterby, and Dutka (1981), Christian and Pipes (1983), Pipes and Christian (1984). As in the case of the Poisson distribution, we devise a sample statistic based upon a characterizing property which involves sample moments. Then we take advantage of large M by combining the individual sample statistics into one test statistic. However, here we have the added complication of dealing with two unknown parameters. Estimating unknown parameters directly, when the sample size n is small, leads to difficulties due to high variance of the estimators, especially for moments higher than the first. By working conditionally on the sample mean, we can avoid estimating one of the parameters, but the other one remains a problem.
We have data of the following form. Let {Xi1, Xi2, ..., Xin}, i = 1, 2, ..., M, represent M samples of size n. For i ≠ i', we assume that the sets {Xij} and {Xi'j} are mutually independent random variables with the same type of probability distribution but with possibly different parameter values. Consider an underlying negative binomial distribution with probabilities [Γ(r+x) / (x! Γ(r))] p^r q^x for x = 0, 1, 2, ...
In Lukacs (1963) and Heller (1985b), it is shown that the negative binomial, Poisson pair of distributions are characterized by the zero regression on the mean of a statistic T:
T = nL4 - [n - (n-4)L]L3 + (3-2n)L2² + [(n+1)L + L²]L2 - L³   (1)
where Lk = X1^k + X2^k + ... + Xn^k (k = 2, 3, 4) and L = X1 + X2 + ... + Xn for sample X1, X2, ..., Xn.   (2)
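The statistic T can be sketched in a few lines. The sketch assumes that L2, L3 and L4 denote the sums of squares, cubes and fourth powers of the observations; the helper that checks the zero-regression property uses the fact that independent Poisson counts are multinomial given their sum. Both function names are hypothetical.

```python
from fractions import Fraction
from itertools import product
from math import factorial

def lukacs_T(sample):
    """Statistic T of formula (1), assuming Lk is the sum of k-th powers."""
    n = len(sample)
    L = sum(sample)
    L2 = sum(x ** 2 for x in sample)
    L3 = sum(x ** 3 for x in sample)
    L4 = sum(x ** 4 for x in sample)
    return (n * L4 - (n - (n - 4) * L) * L3 + (3 - 2 * n) * L2 ** 2
            + ((n + 1) * L + L ** 2) * L2 - L ** 3)

def conditional_mean_T_poisson(n, total):
    """Exact E(T | L = total) for n Poisson counts with a common mean."""
    expectation = Fraction(0)
    for cells in product(range(total + 1), repeat=n):
        if sum(cells) != total:
            continue
        ways = factorial(total)
        for x in cells:
            ways //= factorial(x)  # multinomial coefficient, exact at each step
        expectation += Fraction(ways, n ** total) * lukacs_T(cells)
    return expectation

if __name__ == "__main__":
    print(lukacs_T([2, 2, 0]))
    print(conditional_mean_T_poisson(3, 4))
```

The exhaustive enumeration is only practical for small n and small totals; it is meant as a check on the formula, not as part of the test itself.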
As given in Heller (1985a), we construct a test statistic based upon the statistic T. If X is either negative binomial or Poisson, E(T | L) = 0. The statistic T does not depend upon any unknown parameters. Normalization is accomplished by utilizing the conditional variance of T given L. (See Gart (1974) for conditional tests involving the Poisson distribution.) From Heller (1985c) we have two formulas for conditional variance. Put V = Var(T | L) if the underlying distribution is Poisson. Put W(r) = Var(T | L) if the underlying distribution is negative binomial. In the Poisson case, V doesn't depend upon any unknown parameters. But in the negative binomial case, W depends upon the parameter r. As r → ∞, the statistic W(r) approaches the statistic V.
In Heller (1985a), there are constructed two test statistics, representing two ways of dealing with the unknown parameter in W(r). In each case, the null hypothesis is that the underlying distribution is either negative binomial or Poisson. Also in each case, the test statistic has, approximately, a t-distribution with M-1 degrees of freedom, under the null hypothesis.
Using formulas (1) and (2) (and the formulas for V and W), we compute, for each sample,
Y1i = Ti/Vi^(1/2),   Y2i = Ti/Wi^(1/2).
Then, for each of the above sets {Y1i} and {Y2i}, put Ȳ equal to the sample mean and S² equal to the sample variance (according to the usual formulas).
For the first statistic, A, we use the set {Y1i}. The normalization is correct if the underlying distribution is Poisson and approximate if it is negative binomial. Put
A = (Ȳ1/S1) M^(1/2).
For the second statistic, C(R), we use the set {Y2i} and approximate the unknown parameters {ri} with one "central" value R. Put
C(R) = (Ȳ2(R)/S2(R)) M^(1/2),
where R is chosen such that S2²(R) has a value which is close to 1. (See Heller (1985a).)
In Heller (1985a), Monte Carlo studies on tests A and C(R) indicate that significance levels are satisfactory for M ≥ 10 and n ≤ 10. Also, we must have n > 2.
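Once the values Ti and their conditional variances are available, the final step has the form of a one-sample t statistic. The sketch below assumes that the conditional variances are supplied by the caller, since the formulas for V and W are given in Heller (1985c) and not here; the function names and the example numbers are hypothetical.

```python
from math import sqrt
from statistics import mean, stdev

def normalized_values(t_values, variances):
    """Y_i = T_i / sqrt(variance_i); the variances come from V or W(R)."""
    return [t / sqrt(v) for t, v in zip(t_values, variances)]

def t_type_statistic(y_values):
    """(mean of Y / standard deviation of Y) * sqrt(M), the form of A and C(R)."""
    m = len(y_values)
    return mean(y_values) / stdev(y_values) * sqrt(m)

if __name__ == "__main__":
    # Hypothetical T values and conditional variances for M = 4 samples.
    y = normalized_values([4.0, -6.0, 1.0, -2.0], [16.0, 9.0, 4.0, 4.0])
    print(round(t_type_statistic(y), 3))
```

The resulting value would be referred to the t-distribution with M-1 degrees of freedom, as stated in the text.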
If the null hypothesis is not rejected, we can distinguish between negative binomial and Poisson data by using the index of dispersion test.
We consider an example of coliform counts (MF) which were obtained from the U.S. Environmental Protection Agency, A.E. McDaniels (1984). The data consist of several series of replicate counts obtained from a study on samples which were collected from a chlorinated municipal distribution system. The samples were split, held at 4° and 20°C, and analyzed by the standard total coliform method at 0, 2, 6, 24, 30, 48, and 72 hours. Most analyses were made in triplicate but there were also a good many sets of 6 replicates. One set of 12 samples was analyzed as above for naturally occurring coliforms. Another set of 12 samples was dosed with a pure coliform culture. (A discussion of this example may also be found in Haas and Heller (1985).)
We perform the negative binomial-Poisson goodness-of-fit test on 4 sets of data: coliform count for "natural" samples for n = 6 and n = 3, and coliform count for "dosed" samples for n = 6 and n = 3.
We consider first the case where n = 6. For coliform counts from dosed samples, there were 68 sets of 6 replicates each. For natural samples, there were 54 such sets. Results are given in Tables 1 and 2. In Table 1, significance of statistic A and C values is to be estimated by using the t-distribution with M-1 degrees of freedom. We also note that if the data were Poisson, statistic S1 would be close to 1.00 and statistic R would be large. In Table 2, D² frequencies, f, are to be compared with expected frequencies, f̂, calculated from the χ² distribution with 5 degrees of freedom.
We consider the D² results first. Evidently we reject the Poisson model for the natural samples and do not reject it for the dosed. Now looking at the A and C test results for the natural samples, we see that a negative binomial model is not rejected. Therefore, we are led to the conclusion that the Poisson model is appropriate for dosed samples and negative binomial for natural ones in this experiment.
Looking more closely at the A and C tests we see corroborative evidence for our conclusions. For the dosed samples: (i) A and C values are close to each other, (ii) S1 is not far from 1, (iii) R is not small. These are all commensurate with the Poisson model. For the natural samples:
Table 1. Negative binomial goodness-of-fit test results for dosed and natural samples, n = 6.

Dosed     M = 68    A = -1.40     C = -1.26    S1 = 1.834     S2 = 1.000    R = 12.7
Natural   M = 54    A = -0.898    C = 0.903    S1 = 11.726    S2 = 1.000    R = 8.94

Table 2. D² frequencies, n = 6.

               DOSED                                     NATURAL*
INTERVAL       OBS. FREQ.   EXP. FREQ.   (f-f̂)²/f̂      OBS. FREQ.   EXP. FREQ.
               f            f̂                           f            f̂
(0, 1.6)       5            6.48         0.34            1            5.15
(1.6, 3.1)     15           15.44        0.01            4            12.26
(3.1, 4.7)     12           15.37        0.74            6            12.20
(4.7, 6.3)     12           11.70        0.01            9            9.29
(6.3, 7.9)     8            7.82         0.00            8            6.21
(7.9, 9.4)     6            4.83         0.28            4            3.84
> 9.4          10           6.34         2.11            22           5.04
Total          68           67.98        3.49            54           53.99
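The chi-squared totals can be recomputed from the tabulated frequencies. This is a minimal sketch using the observed and expected frequencies transcribed from Table 2; the function and variable names are illustrative.

```python
def chi_square_statistic(observed, expected):
    """Sum of (f - f_hat)^2 / f_hat over the intervals."""
    return sum((f - e) ** 2 / e for f, e in zip(observed, expected))

# Frequencies transcribed from Table 2 (n = 6).
DOSED_OBS = [5, 15, 12, 12, 8, 6, 10]
DOSED_EXP = [6.48, 15.44, 15.37, 11.70, 7.82, 4.83, 6.34]
NATURAL_OBS = [1, 4, 6, 9, 8, 4, 22]
NATURAL_EXP = [5.15, 12.26, 12.20, 9.29, 6.21, 3.84, 5.04]

if __name__ == "__main__":
    print(round(chi_square_statistic(DOSED_OBS, DOSED_EXP), 2))
    print(round(chi_square_statistic(NATURAL_OBS, NATURAL_EXP), 2))
```

The dosed total agrees with the tabulated 3.49 up to the rounding of the individual terms, while the natural samples give a far larger value, dominated by the last interval.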
Table 3. Negative binomial goodness-of-fit test results for dosed and natural samples, n = 3.

Dosed     M = 147   A = -0.956    C = -0.805   S1 = 1.112     S2 = 1.00     R = 127.0
Natural   M = 117   A = 0.113     C = 0.661    S1 = 1.796     S2 = 1.00     R = 21.0

Table 4. D² frequencies, n = 3.

               DOSED                                     NATURAL*
INTERVAL       OBS. FREQ.   EXP. FREQ.   (f-f̂)²/f̂      OBS. FREQ.   EXP. FREQ.   (f-f̂)²/f̂
               f            f̂                           f            f̂
(0, 1.6)       71           80.00        1.01            55           63.7         1.20
(1.6, 3.1)     39           36.46        0.18            27           29.0         0.14
(3.1, 4.7)     18           16.62        0.11            13           13.2         0.00
(4.7, 6.3)     9            7.57         0.27            11           6.0          4.17
> 6.3          10           6.34         2.11            11           5.1          6.83
Total          147          146.99       3.68            117          117.0        12.34
(i) A and C are both not significant but are not close to each other, (ii) S1 is appreciably larger than 1. These are commensurate with the negative binomial model.
We perform a similar analysis on the triplicate data. Here there were 147 sets of dosed triplicates and 117 sets of natural triplicates. We see here that, just as in the case where n = 6, the Poisson hypothesis is not rejected for dosed samples. For the natural samples, the Poisson model is rejected and the negative binomial not rejected; but the Poisson model is not so far off the mark as was the case for n = 6. This illustrates the loss in discriminatory power (for both tests) in the case of n = 3 as compared to n = 6.
In conclusion, we note various alternative distributions to the negative binomial model which are in common use. On the one hand, if the data derive from the Neyman Type A or Poisson-with-added-zeros distribution, the expected value of statistic T is negative. Therefore we expect the test statistics A and C to be "too" negative. On the other hand, if the data are from the logarithmic-with-zeros distribution, the statistics A and C will have positive expected value and we expect to see values which are "too" positive. (See El-Shaarawi (1985) for a discussion of some of these alternatives.) A detailed discussion of the power of this goodness-of-fit test for the negative binomial distribution with respect to the above alternatives can be found in Heller (1985a).
REFERENCES
Christian, R.R. and Pipes, W.O., 1983. Frequency distribution of coliforms on water distribution systems. Applied and Environmental Microbiology, 45: 603-609.
El-Shaarawi, A.H., Esterby, S.R. and Dutka, B.J., 1981. Bacterial density in water determined by Poisson or negative binomial distributions. Applied and Environmental Microbiology, 41: 107-116.
El-Shaarawi, A.H., 1985. Some goodness-of-fit methods for the Poisson plus added zeros distribution. Applied and Environmental Microbiology, 49: 1304-1306.
Gart, J.J., 1974. The Poisson distribution: The theory and application of some conditional tests. In: G.P. Patil et al. (Editors), Statistical Distributions in Scientific Work 2, pp. 125-140.
Haas, C. and Heller, B., 1985. Statistics of enumerating total coliforms in water samples by membrane filter procedures. Water Research (In Press).
Heller, B., 1985a. A new negative binomial goodness-of-fit test based upon characterization by zero regression; useful for sequences of small samples (Submitted).
Heller, B., 1985b. Characterization of the negative binomial distribution by regression properties; re-examination of a statistic due to Lukacs (Submitted).
Heller, B., 1985c. Computation of certain conditional variances relating to the Poisson and negative binomial distributions with the aid of MACSYMA (Submitted).
Lukacs, E., 1963. Characterization problems for discrete distributions. In: G. Patil (Editor), Proc. International Symposium on Classical and Contagious Discrete Distributions, pp. 65-73, Pergamon Press, Oxford, New York.
McDaniels, A.E., 1984. Personal communication.
McDaniels, A.E. and Bordner, R.H., 1983. Effects of holding time and temperature on coliform numbers in drinking water. Journal of the American Water Works Association, 75: 458-463.
Pipes, W.O. and Christian, R.R., 1984. Estimating mean coliform densities of water distribution systems. Journal of the American Water Works Association, pp. 60-64.
REPORTING BACTERIOLOGICAL COUNTS FROM WATER SAMPLES: HOW GOOD I S THE INFORMATION FROM AN INDIVIDUAL SAMPLE? HILARY E . TILLETT Communicable Disease S u r v e i l l a n c e Centre, P u b l i c H e a l t h L a b o r a t o r y S e r v i c e , 61 C o l i n d a l e Avenue, London NW9 5EQ ABSTRACT I n a s s e s s i n g t h e m i c r o b i o l o g i c a l q u a l i t y o f w a t e r t h e s t a t i s t i c i a n has two main r e s p o n s i b i l i t i e s . i n d i v i d u a l sample.
F i r s t l y , t o assess c o r r e c t l y t h e i n f o r m a t i o n f r o m each
Secondly, t o a d v i s e on s a m p l i n g schemes f o r e f f i c i e n t and
r e a l i s t i c monitoring.
T h i s paper i s concerned w i t h t h e f i r s t problem.
I n B r i t a i n b a c t e r i o l o g i c a l e x a m i n a t i o n o f d r i n k i n g and r e c r e a t i o n a l w a t e r s i s o f t e n assessed u s i n g t h e m u l t i p l e d i l u t i o n ( " m u l t i p l e t u b e " ) method o r , i f n o t , by t h e membrane f i l t r a t i o n t e c h n i q u e .
Evidence f r o m q u a l i t y c o n t r o l
t r i a l s , where r e p l i c a t e s i m u l a t e d specimens were i s s u e d t o v o l u n t e e r l a b o r a t o r i e s , shows t h a t c o l i f o r m c o u n t s f r o m membrane f i l t r a t i o n tended t o be lower than the intended r e s u l t .
The m u l t i p l e t u b e method was more s e n s i t i v e i n
d e t e c t i n g the b a c t e r i a i n waters w i t h low contamination. Membrane f i l t r a t i o n g i v e s a p r e c i s e c o u n t whereas t h e m u l t i p l e t u b e method g i v e s an e s t i m a t e d c o u n t w h i c h s h o u l d be q u a l i f i e d by a range o f p r o b a b l e counts.
P u b l i s h e d t a b l e s o f most p r o b a b l e numbers (MPN) o f b a c t e r i a use expo-
n e n t i a l a p p r o x i m a t i o n s w h i c h r e q u i r e t h e assumption t h a t t h e w a t e r examined comes f r o m a l a r g e body o f homogeneous w a t e r .
Some MPN's have been r e c a l c u l a t e d
w i t h o u t making any such assumption and u s i n g occupancy t h e o r y .
It is suggested that, in situations where there are close contenders for the title MPN, a range of probable numbers should be quoted rather than a single MPN. If the bacteriological content of the water is being compared with a Standard, then should the whole of the "probable range" of counts pass this Standard? The bacteriological result from a single water sample should be reported with care. It should be made clear that it represents that place at that time only. It gives no information about likely ranges of counts at the water source, except in the unlikely situation that the sample comes from a homogeneous body of water.

INTRODUCTION

In England and Wales responsibility for routine testing of water samples for micro-organisms is shared by the Public Health Laboratory Service (PHLS) and regional water authorities.
Since joining the European Economic Community (EEC) discussions have led to the introduction of Standards for, amongst other things, drinking and bathing waters. The pre-existing Standards used for water supplied for drinking have not had to be altered in this country. They include requirements that sampling be frequent and that "satisfactory" samples should contain no Escherichia coli organisms, that no consecutive samples should contain any coliform organisms and that no individual sample should contain more than three coliforms per 100 ml (DOE, 1983). EEC directives on bathing waters require a minimum sampling frequency of fortnightly, and guidelines include that coliform organisms should not exceed 500 and faecal coliforms should not exceed 100 per 100 ml. Individual countries are allowed to set other levels than these, but they should be within certain limits (European Communities, 1976).

The results from routine sampling should be studied for time trends but also it is clear that each individual sample needs to meet certain criteria. Therefore the statistician needs to advise, not only on sampling strategy and time series analysis, but also on the interpretation of bacterial results from individual samples.
This paper is concerned with bacterial counts from individual samples and considers whether the laboratory method used to achieve the count should influence the interpretation of that count.

METHODS

Routine water samples are investigated for total coliform organisms and for E. coli. In the PHLS two methods predominate, the membrane filtration technique and the multiple tube (dilution series) method.

(i) Membrane filtration technique yields a count of viable organisms (selected by media, temperature and culture conditions) in the volume filtered. If the water sample comes from a homogeneous body of water then the count can be taken as an estimate of the variance of bacterial density in that body (using the Poisson distribution). If the sample comes from a non-homogeneous water then the count represents that sample site and at that time only.

(ii) Multiple tube method yields an estimated count (traditionally the most probable number, MPN) in the volume examined. For standard dilution series MPN's and accompanying confidence intervals (which apply to homogeneous waters) can be obtained from published tables (DOE, 1983; APHA, 1975) or, for non-standard series, by computer program (Hurley and Roscoe, 1983). If the water sample comes from a non-homogeneous water then actual ranges of probable numbers can be calculated using occupancy theory (Tillett and Coleman, 1985).

A homogeneous body of water is one in which the bacteria are distributed with random variation only. Such a criterion is unlikely to be encountered in recreational waters or pre-treatment drinking waters. Perhaps random variation ought not to be assumed in treated waters since the aim of sampling is to look for evidence of breakdown in treatment resulting in an influx of bacteria. If water samples are assumed to come from water sources with potentially non-random variation (non-homogeneous) then the membrane filtration count cannot be qualified by a confidence interval and therefore it is clear whether or not the count exceeds a Standard.
But the MPN is an estimated count and there may be a range of other likely counts. This paper illustrates such examples and suggests that there should be more discussion on selecting the appropriate range for comparison with a Standard. Then a comparison will be made between membrane filtration and multiple tube results in some multi-laboratory quality control trials, with regard to detecting the presence and to counting numbers of coliforms. The question will be asked whether either results or Standards need to be adjusted to take into account the laboratory method used.

RESULTS

1. Precision of multiple tube method
If V is a volume of water in which n bacteria are randomly distributed and if a subportion, v, is inoculated and incubated in a tube of culture medium then the probability, p, of no growth is $p = (1 - v/V)^n$; if v/V is very small then approximately $p = e^{-vn/V}$.
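The size of the error introduced by the exponential approximation can be seen with a short numerical sketch. The function names and the choice of n = 10 bacteria are illustrative assumptions, not part of the original paper; the volumes are those of the 11-tube series discussed later (1, 10 and 50 ml out of 105 ml).

```python
import math


def p_no_growth_exact(n: int, v: float, V: float) -> float:
    """Probability that none of n randomly placed bacteria falls in subportion v of V."""
    return (1.0 - v / V) ** n


def p_no_growth_approx(n: int, v: float, V: float) -> float:
    """Exponential approximation, adequate only when v/V is very small."""
    return math.exp(-v * n / V)


if __name__ == "__main__":
    n = 10  # hypothetical number of bacteria in the whole sample
    for v, V in [(1.0, 105.0), (10.0, 105.0), (50.0, 105.0)]:
        exact = p_no_growth_exact(n, v, V)
        approx = p_no_growth_approx(n, v, V)
        print(f"v/V = {v / V:.3f}: exact = {exact:.5f}, approx = {approx:.5f}")
```

Because $1 - x \le e^{-x}$, the approximation can only overstate the probability of a sterile tube, and the discrepancy grows as the tube takes a larger share of the sample.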
This approximation has been made for published tables of MPN's and confidence intervals (DOE, 1983; APHA, 1975) and computer program (Hurley and Roscoe, 1983) and is valid when a very large sample is collected and mixed for examination of a very small subsample, or where the sample examined comes from a large homogeneous source. If a relatively small sample is collected from a non-homogeneous water source then probable numbers of bacteria can be calculated as follows:
If m test tubes with equal volumes of the sample contain n bacteria distributed at random, then the probability that (m-j) tubes will be sterile is obtained using occupancy theory (David and Barton, 1962):

$$p(j \text{ occupied} \mid n \text{ bacteria}) = \frac{1}{m^{n}} \, \frac{m!}{(m-j)!} \, S_{j,n}$$

where $S_{j,n}$ is Stirling's number of the second kind with initial conditions p(1/1) = 1, p(0/n) = 0 for all n and p(j/n) = 0 for j>n.
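As a minimal sketch, the equal-volume occupancy probability can be computed exactly with the standard recurrence for Stirling numbers of the second kind. The function names are hypothetical, and exact rational arithmetic is used here only to make the check that the probabilities sum to one straightforward.

```python
from fractions import Fraction
from functools import lru_cache
from math import factorial


@lru_cache(maxsize=None)
def stirling2(n: int, j: int) -> int:
    """Stirling number of the second kind: ways to split n items into j non-empty groups."""
    if n == j:
        return 1
    if j == 0 or j > n:
        return 0
    return j * stirling2(n - 1, j) + stirling2(n - 1, j - 1)


def p_occupied(j: int, n: int, m: int) -> Fraction:
    """Probability that exactly j of m equal-volume tubes receive at least one of n bacteria."""
    if j > m:
        return Fraction(0)
    return Fraction(factorial(m) * stirling2(n, j), factorial(m - j) * m ** n)


if __name__ == "__main__":
    m, n = 5, 8  # hypothetical: 5 tubes, 8 bacteria
    for j in range(m + 1):
        print(j, float(p_occupied(j, n, m)))
```

The probability that (m-j) tubes are sterile is then `p_occupied(j, n, m)`; summing over j from 0 to m gives 1 for any n.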
Probabilities can be developed for different dilution series as shown by Tillett and Coleman (1985) where the particular 11-tube series 1 x 50 ml : 5 x 10 ml and 5 x 1 ml is explored and conditional probabilities tabulated in detail. The most probable numbers of bacteria, using these exact conditional probabilities, are very close to those obtained from the Poisson approximation. What is apparent is that there are situations in which a single MPN is inappropriate.

Figure 1 shows conditional probabilities associated with a single positive reaction in this 11-tube series. If the reaction is in a 1 ml tube then it is very unlikely that there are two or more bacteria present in the 105 ml sample examined. If the reaction is, as is most likely, in the 50 ml tube then the estimate of n is not so clear cut.

Figure 2 shows conditional probabilities associated with a sample giving (1,5,1) positive reactions. Clearly there are many close contenders for the title "MPN"; in fact all values of n from 30 to 42 have a probability at least 95% of the maximum probability (most probable range) and values of n from 12 to 120 have a probability at least 10% of the maximum ("possible" range). Most probable ranges for selected results for this dilution series are shown in the final column of Table 1.

TABLE 1
11-Tube Dilution Series 1 x 50 : 5 x 10 : 5 x 1 ml
Selected combinations (i,j,k) of positive reactions which are the most likely given the presence of n bacteria per 100 ml; and the most probable numbers of bacteria for these combinations when n is unknown.

(i,j,k)     Values of n for which this      Most Probable Numbers, i.e. quoted
            combination is most likely      result when (i,j,k) is observed*
(0,1,0)     1†                              1
(1,0,0)     1†                              1
(1,1,0)     2 - 3                           2
(1,2,0)     4 - 6                           4 - 5
(1,3,0)     7 - 10                          7 - 9
(1,4,0)     10 - 17                         11 - 14
(1,5,0)     18 - 19                         20 - 27
(1,5,1)     20 - 40                         29 - 40
(1,5,2)     41 - 68                         44 - 65
(1,5,3)     69 - 110                        75 - 110
(1,5,4)     111 - 175                       134 - 190
(1,5,5)     176 - ∞                         >300 - ∞

† both combinations equally likely.
* defined as n such that p(i,j,k | n) > 0.95 x maximum p(i,j,k | n).
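The exact conditional probabilities for the unequal-volume 11-tube series can be sketched by inclusion-exclusion over the tubes that are required to be positive. This is an illustrative reconstruction under the stated occupancy assumptions (each of n bacteria falls independently into a tube with probability proportional to the tube volume), not the authors' own program; the function names are hypothetical. Here n counts bacteria in the 105 ml examined, whereas Table 1 is expressed per 100 ml.

```python
from math import comb

VOLUMES = (50, 10, 1)  # ml per tube in each group
TUBES = (1, 5, 5)      # number of tubes in each group
TOTAL = sum(v * t for v, t in zip(VOLUMES, TUBES))  # 105 ml examined


def p_pattern(i: int, j: int, k: int, n: int) -> float:
    """P(i, j, k tubes positive | n bacteria in the 105 ml), by inclusion-exclusion."""
    total = 0.0
    for a in range(i + 1):          # 50 ml tubes retained in the subset
        for b in range(j + 1):      # 10 ml tubes retained
            for c in range(k + 1):  # 1 ml tubes retained
                sign = (-1) ** ((i - a) + (j - b) + (k - c))
                ways = comb(i, a) * comb(j, b) * comb(k, c)
                volume = 50 * a + 10 * b + c
                total += sign * ways * (volume / TOTAL) ** n
    # Multiply by the number of ways of choosing which tubes are the positive ones.
    return comb(TUBES[0], i) * comb(TUBES[1], j) * comb(TUBES[2], k) * total


def probable_range(i: int, j: int, k: int, fraction: float = 0.95, n_max: int = 400):
    """Most probable n, and the range of n whose probability is >= fraction * maximum."""
    probs = {n: p_pattern(i, j, k, n) for n in range(1, n_max + 1)}
    best = max(probs, key=probs.get)
    keep = [n for n, p in probs.items() if p >= fraction * probs[best]]
    return best, min(keep), max(keep)


if __name__ == "__main__":
    print(probable_range(1, 5, 1))        # most probable range for (1,5,1)
    print(probable_range(1, 5, 1, 0.10))  # "possible" range for (1,5,1)
```

Under these assumptions the sketch should agree closely with the values quoted in the text, for instance a most probable range of roughly 30 to 42 for a (1,5,1) result.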
Fig 1. Probabilities of observing growth in one tube in the dilution series 1 x 50 ml : 5 x 10 ml : 5 x 1 ml, conditional on the presence of n bacteria. [Figure: curves labelled 2. p(0,1,0 | n) and 3. p(1,0,0 | n).]

Fig 2. Probability of observing growth in (1,5,1) tubes in the dilution series 1 x 50 ml : 5 x 10 ml : 5 x 1 ml, conditional on the presence of n bacteria. [Figure: horizontal axis n; most probable number = 35; probable numbers = 30 - 42; possible numbers = 12 - 120.]
2. Comparison of methods

A series of multi-laboratory quality control trials is being organised by the PHLS Water Committee. So far E. coli at different densities for each trial have been introduced into a prepared single batch of water using the method described by Gray and Lowe (1976). The replicate samples (usually 10) are sent to each laboratory and interspersed with sterile samples. Each laboratory is asked to examine specimens using both the multiple tube method and membrane filtration. Laboratories have been using an 11-tube dilution series and published MPN's (DOE, 1983). Results have been compared with respect to the detection rate and the size of reported counts.

(i) Detection of E. coli. Overall 387 samples from batches inoculated with E. coli were examined by the multiple tube method and 19 (5%) were reported sterile compared with 37 (11%) of 330 samples examined by membrane filtration. This difference is significant but it would be more appropriate to confine the comparison to samples actually examined by both methods and to compare within trials because of varying densities between trials.

Table 2 shows that there were consistently more false negatives by membrane filtration than by multiple tube (p = 0.0002).
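The overall (unstratified) comparison of 19/387 against 37/330 false negatives can be checked with a simple pooled two-proportion z-test. This is only a rough sketch of that first comparison; it is not the within-trial Cochran test reported in Table 2, and the function name is hypothetical.

```python
import math


def two_proportion_z(x1: int, n1: int, x2: int, n2: int):
    """Pooled two-proportion z statistic and its two-sided normal p-value."""
    p1, p2 = x1 / n1, x2 / n2
    pooled = (x1 + x2) / (n1 + n2)
    se = math.sqrt(pooled * (1.0 - pooled) * (1.0 / n1 + 1.0 / n2))
    z = (p1 - p2) / se
    p_two_sided = math.erfc(abs(z) / math.sqrt(2.0))
    return z, p_two_sided


if __name__ == "__main__":
    # Counts quoted in the text: multiple tube 19 of 387, membrane filtration 37 of 330.
    z, p = two_proportion_z(19, 387, 37, 330)
    print(f"z = {z:.2f}, two-sided p = {p:.4f}")
```

A negative z here means a lower false-negative rate for the multiple tube method, in line with the direction reported in the text.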
The false negatives were spread amongst most of the participating laboratories and were not confined to laboratories in which the membrane filtration technique was the less familiar method. When more trials have been done it is hoped to make a more detailed analysis of results within and between laboratories.

TABLE 2
Detection of E. coli in Six Quality Control Trials
245 samples examined by both methods

Trial No.    Multiple Tube         Membrane Filtration
             -ves      total       -ves      total
1            1         20          3         20
2            0         60          3         60
3            0         80          5         80
4            0         60          4         60
5            5         9           6         9
6            0         16          2         16

Cochran's X = -3.68, p = 0.0002
(ii) Reported counts. Table 3 shows the median counts of E. coli reported in each quality control trial for samples examined by both methods. In every trial the median result by multiple tube was higher than by membrane filtration except for one trial where they were the same.

TABLE 3
Reported numbers of E. coli in six quality control trials
245 samples examined by both methods

Trial No.    No. of samples    Median count
                               Multiple tube*    Membrane filtration
1            20                50                15
2            60                35                24
3            80                90                63
4            60                50                50
5            9                 1                 0
6            16                13                6

* Laboratories reported MPN's from published tables for the 1 x 50 : 5 x 10 : 5 x 1 ml dilution series
Was this a genuine difference or an artefact caused by the large gaps in tabulated MPN's? For example, in the 11-tube series used by these laboratories the published tables give no count between 50 for a (1,5,2) result and 90 for a (1,5,3) result, so that if an actual count was 75, it might register as a (1,5,3) result and, traditionally, be reported as an MPN of 90.

In order to investigate this question results have been studied for each laboratory and each quality control trial. For example laboratory 1 in trial 3 reported 10 membrane filtration counts of 42, 46, 62, 71, 79, 79, 80, 84, 84, 85 with a median value of 79. Thus the estimated bacterial density for this water batch was 79 per 100 ml by this laboratory. A typical sample from a water with 79 E. coli per 100 ml would contain 83 E. coli in the 105 ml used in the dilution series. From probabilities conditional on the presence of 83 organisms (Tillett and Coleman 1985) it can be shown that the probable numbers of tubes observed positive would be:

p(1,5,2/83) = 0.280
p(1,5,3/83) = 0.341
p(1,5,4/83) = 0.202

Therefore the most probable multiple tube result would be (1,5,3) for which the laboratory would have reported the MPN of 90. However the MPN results recorded for these 10 replicate water samples were 50, 2 x 90, 7 x 160 with a median value of 160. By this comparison the higher counts from the multiple tube method are not accounted for by the fact that the observed membrane filtration average count of 79 could have yielded an MPN of 90. (Table 1 shows the most likely multiple tube results for given values of n bacteria per 100 ml.)

In Table 4 this comparison is repeated for each laboratory in the three largest quality control trials. For 14 of the 20 comparisons the membrane median count, expressed as its equivalent MPN, was less than the observed median MPN. In only one comparison was it larger.
TABLE 4
Comparison of median counts in each laboratory between methods, adjusting membrane filtration counts to their most likely MPN

Trial No.    Membrane count converted to MPN,      Total
             compared with observed MPN            laboratories
             <         =         >
2            3         3         0                 6
3            6         1         1                 8
4            5         1         0                 6
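The arithmetic behind the laboratory 1, trial 3 example can be retraced in a few lines. The counts are those quoted in the text; the variable names are illustrative only.

```python
from statistics import median

# Replicate results quoted in the text for laboratory 1 in trial 3.
membrane_counts = [42, 46, 62, 71, 79, 79, 80, 84, 84, 85]  # counts per 100 ml
mpn_results = [50] + [90] * 2 + [160] * 7                    # reported MPNs

membrane_median = median(membrane_counts)
mpn_median = median(mpn_results)

# The dilution series examines 105 ml, so scale the per-100 ml density accordingly.
expected_in_105ml = round(membrane_median * 105 / 100)

if __name__ == "__main__":
    print(membrane_median, mpn_median, expected_in_105ml)
```

The scaled value is the n at which the conditional probabilities of the (1,5,2), (1,5,3) and (1,5,4) results are compared.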
DISCUSSION

When routine sampling of water to monitor the bacteriological content is planned then allowance should be made for variability of counts with time and place. In recreational and pre-treatment water these variations could be large, but must be observed in order to check that there is no undesirable upward trend. As well as studying trends, the result from each individual water sample has to be checked to see that it conforms to certain Standards in Britain. If the sample has been examined by membrane filtration then an actual count is achieved and can be compared directly with the Standard. If the water has been examined using a multiple tube method then there may be an estimated range of counts so that the comparison with the Standard may not be clearcut.

An explanation of exact probability methods to calculate the probable numbers of bacteria was introduced in a previous paper (Tillett and Coleman, 1985) and has been continued here. The selection of a range of most probable numbers should, perhaps, reflect two things. Firstly, the presence of close contenders for the title 'most probable number'. It has been shown that these ranges are quite large when most of the tubes have shown a positive reaction. As a suggestion a range has been presented where all the conditional probabilities p(i,j,k/n) are at least 95% as great as the value of p(i,j,k/MPN), the maximum conditional probability. However, it could be argued that this range is not wide enough and that values with a 90% probability, or even less, could still be regarded as probable.

Secondly, it must be realised that the dilution series may have gaps between most probable numbers, especially in the range where nearly all the tubes show a positive reaction. Thus a true count of 75, for example, is most likely to give a combination of positive reactions (in the 11-tube series studied) for which the MPN has traditionally been quoted as 90. However, the suggested most probable range is 75 to 110. Indeed, for the most often observed combinations of positive tubes in this series, the range of counts for which each combination is the most likely is often comparable to the most probable range of 95% maximum conditional probabilities.

With the membrane filtration method interpreting results and comparing them with a Standard is straightforward.
However, multi-laboratory quality control trials in England and Wales have, over a two-year period, consistently shown the multiple tube method to be more sensitive in detecting the presence of E. coli and to give higher counts from the same freshwater sample. (Each water 'sample' was of sufficient volume to allow half to be used in the 11-tube dilution series described and 100 ml to be filtered.) Differences in sensitivity were larger than could be accounted for by the slight differences in volumes examined: 105 ml compared with 100 ml. Twelve laboratories have been involved in these trials so far, but it is hoped to include more and to explore other and mixed bacteria.

If these differences apply elsewhere then it would seem that slightly greater volumes of water should be filtered when low levels of contamination are expected. This should improve sensitivity. With higher counts thought should be given to 'double standards'. Should the level of acceptable counts be set higher when the multiple tube method is used so as to allow for reporting of ranges rather than single counts and to allow for the fact that results may tend to be higher because of the method and because of the rounding up for true counts which fall between the sequence of observable combinations of tube reactions?

In conclusion, the results from the multiple tube method should be expressed as a range of most probable counts.
It should be made clear that the range is because of the uncertainty of the method. (It is not meant as an indication of likely range of counts at the water source; that range can be estimated only by taking multiple samples.) In most situations the range should be calculated using methods appropriate for non-homogeneous waters. Evidence from one set of quality control trials implies that a water sample is more likely to erroneously pass a Standard if it is examined by membrane filtration rather than by multiple tube method.

ACKNOWLEDGEMENT

Thanks are due to the Public Health Laboratory Service Water Committee for allowing me to present results from the Quality Control trials.
REFERENCES

American Public Health Association, American Water Works Association, Water Pollution Control Federation, 1975. Standard Methods for the Examination of Water and Waste-water. APHA, Washington D.C.
David, F.N. and Barton, D.E. 1962. Combinatorial Chance. Griffin, London.
Department of the Environment, Department of Health and Social Security, Public Health Laboratory Service, 1983. The Bacteriological Examination of Drinking Water Supplies, 1982. Her Majesty's Stationery Office, London.
European Communities, 1976. Hygiene of bathing waters. International Digest of Health Legislation. 27: 709-724.
Gray, R.D. and Lowe, G.H. 1976. The preparation of simulated water samples for the purpose of bacteriological quality control. Journal of Hygiene. 76: 49.
Hurley, M.A. and Roscoe, M.E. 1983. Automated statistical analysis of microbial enumeration by dilution series. Journal of Applied Bacteriology. 55: 159-164.
Tillett, H.E. and Coleman, R. 1985. Estimated numbers of bacteria in samples from non-homogeneous bodies of water: how should MPN and membrane filtration results be reported? Journal of Applied Bacteriology. 59: 38L
SOME APPLICATIONS OF LINEAR MODELS FOR ANALYSIS OF CONTAMINANTS IN AQUATIC BIOTA
ROGER H. GREEN University of Western Ontario
1. INTRODUCTION

This paper deals with log-log linear models and some examples of their application to water quality monitoring. Such models arise out of any situation where it is desired to estimate a proportion, percentage, or ratio which is in practice calculated from two other observed variables. The common practice of actually calculating this derived variable for each sampled observation, and then using the derived value as the response variable in a statistical model such as ANOVA, leads to problems both of validity of the statistical analysis and of interpretation of the results (Sokal and Rohlf 1973, Atchley et al 1976, Green 1979). Log-log regression or analysis of covariance (ANCOVA) models can usually satisfy the objectives in such studies without derived variables being used in statistical analysis.

The problem and this solution to it can best be explained by an example, which does not relate to water quality monitoring.
Then three examples which are in a water quality monitoring context will be presented. All four examples are based on simulated data, so that known parameters can be estimated by statistical procedures which can then be evaluated by their success in estimating the parameters and by their success in testing hypotheses whose truth or falsehood is known. The simulated data for all four examples are given in Appendix 1, so that readers can do the analyses themselves. All data simulation was done in the MINITAB statistical package. Statistical analyses were done both in MINITAB and SAS. In Appendix 2 is an annotated SAS (Statistical Analysis System) job listing which will carry out both the ratio-variable and the log-log ANCOVA analyses for the first example. It can also be used to do the analyses for the other three examples, or for any log-log ANCOVA analysis with these objectives, by changing the variable names and changing the data to be analyzed.
2. WATER CONTENT OF SPRING AND FALL FROGS - AN EXAMPLE

Frogs are collected in spring and fall, and the question is whether there is a difference between seasons in the percentage water content of the frogs. A common approach would be to determine total weight and dry weight for each frog, calculate percent water, and then do an ANOVA with the response variable being the derived percent water values and the treatment groups being the two seasons.

Apart from the question of the poor statistical behavior of a ratio variable calculated using a denominator which has substantial variance, there is the problem that critical questions are being ignored and particular answers to them are being assumed. For one thing, is percentage water content independent of the size of a frog? The "derived ratio-variable" approach assumes that it is, and treats the data in a way that obscures the problem. Furthermore, it may be that at one time of year the relationship between percent water and frog size is different than at the other time of year.
For example, there may be a relationship in fall but not in spring.

Let us instead build a log-log ANCOVA model as follows. Let W be the water content of a frog and let D be the dry weight. Therefore the frog's total weight is W+D. If frog size changes, the percent water will stay the same if and only if the water content changes at the same percentage rate as does the dry weight. For example, if a 10 g frog is 6 g water and 4 g dry weight, then both water and dry weight must go up 20 percent (to 7.2 g and 4.8 g) if a 12 g frog is to have the same percent water. Thus $dW/W = b\,(dD/D)$ with b=1. If this differential model is integrated we obtain the log-log model $\log W = a + b \log D$, and the nonlinear form $W = A D^{b}$ where $A = e^{a}$. Again, only if b=1 does percent water remain constant as frog size varies, and of course this is true of the W/D ratio as well. If b=1 then W = AD, and A = W/D = the constant ratio of water content to dry weight. If b<1 then percent water, and the W/D ratio, decrease as frog size increases. If b>1 then percent water and the W/D ratio increase with increasing frog size.

The log-log form of the model is convenient for analysis because it represents a linear relationship between log W and log D.
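The role of the exponent b can be illustrated numerically. In this sketch the function name is hypothetical, and the value A = 1.5 is chosen only because it matches the 6 g water to 4 g dry weight frog used as an example in the text.

```python
def percent_water(D: float, A: float, b: float) -> float:
    """Percent water of a frog with dry weight D under the model W = A * D**b."""
    W = A * D ** b
    return 100.0 * W / (W + D)


if __name__ == "__main__":
    A = 1.5  # W/D ratio of the 6 g water : 4 g dry weight example
    for b in (0.9, 1.0, 1.1):
        values = [round(percent_water(D, A, b), 1) for D in (1.0, 4.0, 16.0)]
        print(f"b = {b}: percent water at D = 1, 4, 16 g -> {values}")
```

Only the b = 1 row stays constant across dry weights; the other rows drift down (b < 1) or up (b > 1) as the frog gets larger.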
A dummy variable can be added to represent intercept differences between seasons, and another variable equal to the product of the dummy variable and log D to represent slope differences between seasons, and we then have a log-log ANCOVA model. By ANCOVA we can answer all four of the following questions:
(a) What is the percentage water content, or equivalently the W/D ratio?
(b) Does percent water differ between seasons?
(c) Does the size of the frog influence the percentage water content?
(d) Does the relationship, if any, between percentage water content and size of frog differ between seasons?

Question (d) corresponds to a test of H0: "common slope" in the log-log ANCOVA model, and question (c) corresponds to a test of H0: "slope b=1". Given acceptance of H0: "common slope", question (b) corresponds to a test of H0: "common intercept". Given acceptance of H0: "slope b=1" for either or both of the regressions (for each season), question (a) corresponds to estimation of the intercept a from which $A = e^{a} = W/D$ can easily be calculated. Of course percent water content can then be calculated as 100W/(W+D) = 100(W/D)/(W/D+1).
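The conversion from an estimated intercept to a percent water content can be sketched as follows. It assumes natural logarithms and that H0: "slope b=1" has been accepted, so that W/D is a constant; the function name and the example intercept are illustrative.

```python
import math


def percent_water_from_intercept(a: float) -> float:
    """Percent water implied by intercept a when b = 1, using A = e**a = W/D."""
    ratio = math.exp(a)  # A = W/D
    return 100.0 * ratio / (ratio + 1.0)


if __name__ == "__main__":
    a = math.log(1.5)  # hypothetical intercept corresponding to W/D = 1.5
    print(round(percent_water_from_intercept(a), 1))
```

When b differs from 1 the same formula still applies, but only at a stated dry weight, because W/D then changes with D.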
Figure 1 shows the log-log plot of data simulated to represent this analysis problem. The 30 values of log D are from a uniform random distribution between 0 and 3, corresponding to a range of 1 to 20 for D. We simulate the data to represent the case that the answer to question (d) is "no", by simulating under a common slope model. However we make the answer to question (c) "yes" by choosing a common slope b=1.1, implying that percent water increases with size of frog because b>1. The intercept for spring frogs is chosen to be $a_S = -0.511$ corresponding to $A = e^{a} = 0.6$, and for fall frogs $a_F = -0.223$ corresponding to $A = e^{a} = 0.8$. Thus an average spring frog with 1 g dry weight has a W/D ratio of 0.6, equivalent to a percentage water content of 100(0.6)/(0.6+1) = 37.5 percent. For an average fall frog with 1 g dry weight W/D = 0.8 and percentage water content is 100(0.8)/(0.8+1) = 44 percent. For a frog of any given dry weight, the fall frog will average one-third greater water content ($W_F/W_S = 0.8\,D^{1.1}/0.6\,D^{1.1} = 1.33$).
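A simulation of this kind, followed by the common-slope estimates, can be sketched with the standard library alone. This is not the MINITAB or SAS code used in the paper: the split into 15 frogs per season, the random seed, the error standard deviation of 0.1 on the log scale, and the function names are all assumptions made for illustration, and only point estimates (no hypothesis tests) are computed.

```python
import math
import random


def simulate(n_per_season=15, b=1.1, a_spring=math.log(0.6), a_fall=math.log(0.8),
             sigma=0.1, seed=1):
    """Simulate (log D, log W) pairs for each season under a common-slope model."""
    rng = random.Random(seed)
    data = {"spring": [], "fall": []}
    for season, a in (("spring", a_spring), ("fall", a_fall)):
        for _ in range(n_per_season):
            log_d = rng.uniform(0.0, 3.0)
            log_w = a + b * log_d + rng.gauss(0.0, sigma)
            data[season].append((log_d, log_w))
    return data


def fit_common_slope(data):
    """Pooled within-group slope and per-group intercepts (common-slope estimates)."""
    sxx = sxy = 0.0
    means = {}
    for season, pts in data.items():
        xbar = sum(x for x, _ in pts) / len(pts)
        ybar = sum(y for _, y in pts) / len(pts)
        means[season] = (xbar, ybar)
        sxx += sum((x - xbar) ** 2 for x, _ in pts)
        sxy += sum((x - xbar) * (y - ybar) for x, y in pts)
    slope = sxy / sxx
    intercepts = {s: ybar - slope * xbar for s, (xbar, ybar) in means.items()}
    return slope, intercepts


if __name__ == "__main__":
    slope, intercepts = fit_common_slope(simulate())
    print(f"common slope = {slope:.3f}")
    for season, a in intercepts.items():
        print(f"{season}: a = {a:.3f}, A = e^a = {math.exp(a):.3f}")
```

The pooled within-group slope is the usual ANCOVA estimate under a common-slope model; a full analysis would add the F-tests for common slope, b=1 and common intercept.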
Any realistic data set must have error variance in it, and in this case we introduce an error standard deviation (in predicting log W from knowledge of log D and season) of σ=0.1.

The ANCOVA on these data yielded the following results. The H0: "common slope" was accepted (p>0.05). The H0: "b=1" was rejected (p ... common slope was estimated to be b=1.106. The H0: "common intercept" was rejected (p ... and the fall intercept $a_F$ as -0.215 (corresponding ... The square root of the error mean square, which is an estimate of σ, was 0.096. It can be seen that our estimates are close to the model parameters, and that all questions and associated hypotheses were answered correctly.

3. WATER QUALITY MONITORING EXAMPLES

3.1 BIOMAGNIFICATION OF A CONTAMINANT

The first example of an application to water quality monitoring has to do with biomagnification of a contaminant's concentration. Let us assume that clams are sampled from a mercury-contaminated river at various distances from the point source.
At each clam's location a sediment sample is taken and analyzed for Hg, and both muscle and liver from the clam are also analyzed for Hg.

Figure 1. Log-log plot of simulated data representing water content versus dry weight for fall and spring frogs. The fitted model is drawn in. The diagonal is shown as a solid line. [Figure: WATER CONTENT OF SPRING AND FALL FROGS; vertical axis: log water weight; dashed line and "S" symbols: spring frogs; broken line and "F" symbols: fall frogs.]

Suppose that we wish to estimate:
a) the relationship between Hg concentration in sediment ($[\mathrm{Hg}]_S$) and the Hg concentration in clam tissue ($[\mathrm{Hg}]_T$ where T=M or T=L for muscle or liver respectively),
b) the influence of tissue-type (muscle versus liver) on the biomagnification (i.e., on the ratio of $[\mathrm{Hg}]_T$ to $[\mathrm{Hg}]_S$),
c) the influence of varying sediment contamination ($[\mathrm{Hg}]_S$) on the biomagnification, and
d) any difference between the tissues in how biomagnification responds to varying $[\mathrm{Hg}]_S$.

The ratio-variable approach would start by calculating a synthetic variable "$[\mathrm{Hg}]_T$ divided by $[\mathrm{Hg}]_S$", and then continue as an ANOVA where this synthetic ratio-variable (the estimated biomagnification for that particular tissue and clam) is the response, and the treatment or group is tissue type.

The problems with this approach are the same as for the "spring and fall frogs" example previously described. In particular, only (a) and (b) above are estimated and they are estimated on the tacit assumption that effects (c) and (d) are nonexistent.

The log-log regression approach derives from the following logic, which is analogous to that in the "spring and fall frogs" example.
As [HglS varies
over the contaminated area, the percentage variation of [HgIT should be proportional to the percentage variation of [HglS.
That is, d[HglT/[HglT
=
If (c), and therefore (a), are not significant effects then b(d[HgIS/[HglS). the word "proportional" can be replaced by the word "equal", and b=l. By integration we obtain the log-log model log[HglT nonlinear model [HglT =
AIHglS and A
=
=
A ([Hg]s)b
[HglT/[HglS
=
where A=ea.
=
a
+
b log[HgIS, and the
If and only if b=l, then [HglT
the estimate of the biomagnification.
If bfl
then the biomagnification varies as [HglS varies, and its magnitude cannot be determined unless [HgIS is specified. The entire analysis, with all desired tests of hypotheses (a)-(d),
is
easily done as a log-log ANCOVA in the same mannner as for the "spring and fall frogs'' example.
The dummy variable for group membership now identifies
tissue-type instead of season.
The response and predictor variables are now
log [HglT and log [HglS instead of log Ww and log WD.
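The log-log ANCOVA described here can be sketched in a few lines of Python. This is only an illustration, not the SAS analysis used for the paper: it assumes the sediment and tissue Hg values are those listed in Appendix 1 (first ten rows muscle, last ten liver, as transcribed here), and the function names `group_sums` and `loglog_ancova` are hypothetical. It fits a separate line per tissue on natural-log scales and forms the F statistic for the hypothesis of common slopes.

```python
import math

# Sediment and tissue Hg values as transcribed from Appendix 1
# (assumption: first ten rows are muscle "M", last ten are liver "L").
sediment = [26, 90, 1, 48, 14, 60, 17, 70, 55, 46,
            23, 86, 39, 55, 12, 1, 88, 83, 35, 31]
tissue_hg = [129, 190, 12, 110, 89, 153, 73, 141, 165, 177,
             89, 267, 182, 282, 52, 6, 526, 422, 121, 165]
tissue = ["M"] * 10 + ["L"] * 10


def group_sums(x, y):
    """Return n, means, and centred sums of squares and products."""
    n = len(x)
    mx, my = sum(x) / n, sum(y) / n
    sxx = sum((a - mx) ** 2 for a in x)
    sxy = sum((a - mx) * (b - my) for a, b in zip(x, y))
    syy = sum((b - my) ** 2 for b in y)
    return n, mx, my, sxx, sxy, syy


def loglog_ancova(x, y, group):
    """Fit separate log-log lines per group and test for a common slope."""
    stats = {}
    for g in sorted(set(group)):
        lx = [math.log(a) for a, k in zip(x, group) if k == g]
        ly = [math.log(b) for b, k in zip(y, group) if k == g]
        stats[g] = group_sums(lx, ly)

    fits, rss_full = {}, 0.0
    for g, (n, mx, my, sxx, sxy, syy) in stats.items():
        b = sxy / sxx
        fits[g] = (my - b * mx, b)      # (intercept a, slope b)
        rss_full += syy - b * sxy       # residual SS of the separate-slopes model

    # Reduced model: one pooled within-group slope, separate intercepts.
    sxx_w = sum(s[3] for s in stats.values())
    sxy_w = sum(s[4] for s in stats.values())
    syy_w = sum(s[5] for s in stats.values())
    rss_common = syy_w - sxy_w ** 2 / sxx_w

    k = len(stats)
    df_error = len(x) - 2 * k
    mse = rss_full / df_error
    f_slopes = ((rss_common - rss_full) / (k - 1)) / mse
    return fits, f_slopes, df_error, math.sqrt(mse)


fits, f_slopes, df_error, rmse = loglog_ancova(sediment, tissue_hg, tissue)
for g, (a, b) in fits.items():
    print(f"{g}: log[Hg]_T = {a:.3f} + {b:.2f} log[Hg]_S  (A = {math.exp(a):.1f})")
print(f"F(common slopes) = {f_slopes:.1f} on (1, {df_error}) df, root MSE = {rmse:.2f}")
```

The F statistic would be compared with a tabulated critical value (about 4.49 for 1 and 16 degrees of freedom at the 5% level); a large F indicates that effect (d) is present, i.e. the tissues differ in how biomagnification responds to sediment contamination.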
Figure 2 shows the log-log plot of a set of data simulated to represent this kind of analysis problem. The 20 values of $[\mathrm{Hg}]_S$ are from a uniform random distribution between 0 and 100. For muscle the $[\mathrm{Hg}]_{T,M}$ values satisfy the relationship $\log[\mathrm{Hg}]_{T,M} = 2.303 + 0.7 \log[\mathrm{Hg}]_S + \epsilon$, where the error term $\epsilon$ has standard deviation $\sigma = 0.2$. In nonlinear form this is the predictive model $[\mathrm{Hg}]_{T,M} = 10\,[\mathrm{Hg}]_S^{0.7}$. For liver the corresponding models are $\log[\mathrm{Hg}]_{T,L} = 1.792 + 0.9 \log[\mathrm{Hg}]_S + \epsilon$, again with $\sigma = 0.2$, and $[\mathrm{Hg}]_{T,L} = 6\,[\mathrm{Hg}]_S^{0.9}$. The choice of parameters is intended to represent a situation where (a)-(b) the biomagnification at $[\mathrm{Hg}]_S = 1$ is 10 for muscle and 6 for liver, (c) biomagnification drops as $[\mathrm{Hg}]_S$ increases, but (d) the decrease is greater for muscle than for liver.

Figure 2. Log-log plot of simulated data representing mercury concentration in liver and muscle of clams versus mercury concentration in sediment. The fitted model is drawn in. The diagonal is shown as a solid line. (Plot title: BIOMAGNIFICATION IN TISSUES; horizontal axis: log Hg conc. in sediment. Dashed line and "L" symbols: liver; broken line and "M" symbols: muscle.)

The ANCOVA on these data yielded the following results. The $H_0$: "common slopes" was rejected (p ...) and the two regression models were estimated to be $\log[\mathrm{Hg}]_{T,M} = \log 13.9 + 0.60 \log[\mathrm{Hg}]_S$, or $[\mathrm{Hg}]_{T,M} = 13.9\,[\mathrm{Hg}]_S^{0.60}$, and $\log[\mathrm{Hg}]_{T,L} = \log 5.2 + 0.96 \log[\mathrm{Hg}]_S$, or $[\mathrm{Hg}]_{T,L} = 5.2\,[\mathrm{Hg}]_S^{0.96}$. The square root of the error mean square, estimating $\sigma$, is 0.22. These parameter estimates, for slopes, intercepts and residual error, can be compared with the previously stated parameters of the simulated data. They are reasonably close, and the questions corresponding to (a)-(d) that we wished to answer are answered correctly, i.e., there are no Type I or II errors.

3.2 RATIO OF ISOTOPES OF ELEMENTS IN BIOGENIC MATERIAL

The second water quality monitoring example has to do with the use of
relative concentration of isotopes of two elements in biogenic material as a monitor of potential pollution by an effluent containing one of the elements. Let us assume that clams are sampled from two areas, one of which receives an effluent which contains or may contain strontium. The other area serves as the control. The shell of each of these clams is analyzed for Sr and Ca concentration, and then converted to total Sr and Ca content in that shell using the weight of the shell as a multiplier.

Suppose that we wish to estimate:
(a) the relationship between Sr and Ca content in a shell,
(b) the influence of location (impact versus control areas) on the Sr/Ca ratio,
(c) the influence of clam size (and presumably age), which is measured by Ca content, on the Sr/Ca ratio, and
(d) any difference between the locations in how the Sr/Ca ratio responds to varying clam size.

The ratio-variable approach would contrast observed $\mathrm{Sr}_L/\mathrm{Ca}_L$ ratios between locations L=I and L=C, by ANOVA. Again this approach would only estimate effects (a) and (b) and would do so assuming that effects (c) and (d) are nonexistent.

The log-log regression approach is based on the assumption that any percentage variation in Sr content of shells would be proportional to percentage variation in Ca content. That is, $d\,\mathrm{Sr}_L/\mathrm{Sr}_L = b\,(d\,\mathrm{Ca}_L/\mathrm{Ca}_L)$. By integration we obtain the log-log model $\log \mathrm{Sr}_L = a + b \log \mathrm{Ca}_L$, and the nonlinear model $\mathrm{Sr}_L = A\,\mathrm{Ca}_L^{\,b}$ where $A = e^a$. If and only if b=1, then $\mathrm{Sr}_L = A\,\mathrm{Ca}_L$ and $A = \mathrm{Sr}_L/\mathrm{Ca}_L$. If $b \neq 1$ then the $\mathrm{Sr}_L/\mathrm{Ca}_L$ ratio varies as clam size varies, and its magnitude cannot be determined unless clam size (i.e., calcium content $\mathrm{Ca}_L$) is specified.
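A common-slope version of the log-log ANCOVA, with a t-test of b=1, might look like the following Python sketch. It is an illustration under stated assumptions rather than the original SAS analysis: the Ca and Sr values are those listed in Appendix 1 as transcribed here (first eight rows control, last eight impacted), and the function name `common_slope_fit` is hypothetical. The slope is the pooled within-location slope; the intercepts under the fixed-slope b=1 model are simply the differences of the location means of log Sr and log Ca.

```python
import math

# Ca and Sr shell contents as transcribed from Appendix 1
# (assumption: first eight rows control "C", last eight impacted "I").
ca = [1.03, 3.34, 7.27, 4.94, 7.87, 18.82, 1.47, 6.34,
      9.64, 3.49, 12.10, 5.21, 7.21, 8.64, 1.84, 1.76]
sr = [0.010, 0.029, 0.029, 0.021, 0.032, 0.589, 0.006, 0.021,
      2.552, 0.882, 6.699, 0.408, 0.050, 1.664, 0.420, 0.070]
loc = ["C"] * 8 + ["I"] * 8


def common_slope_fit(x, y, group):
    """Fit log y = a_g + b log x with one slope shared by all groups."""
    sxx = sxy = syy = 0.0
    means = {}
    for g in sorted(set(group)):
        lx = [math.log(a) for a, k in zip(x, group) if k == g]
        ly = [math.log(b) for b, k in zip(y, group) if k == g]
        mx, my = sum(lx) / len(lx), sum(ly) / len(ly)
        means[g] = (mx, my)
        sxx += sum((a - mx) ** 2 for a in lx)
        sxy += sum((a - mx) * (b - my) for a, b in zip(lx, ly))
        syy += sum((b - my) ** 2 for b in ly)

    b = sxy / sxx                              # pooled within-group slope
    df = len(x) - len(means) - 1               # n minus intercepts minus slope
    rmse = math.sqrt((syy - b * sxy) / df)     # root error mean square
    se_b = rmse / math.sqrt(sxx)
    t_b1 = (b - 1.0) / se_b                    # t statistic for H0: b = 1
    a_fixed = {g: my - mx for g, (mx, my) in means.items()}  # intercepts if b = 1
    return b, se_b, t_b1, df, rmse, a_fixed


b, se_b, t_b1, df, rmse, a_fixed = common_slope_fit(ca, sr, loc)
print(f"common slope b = {b:.2f} (SE {se_b:.2f}), t for b=1: {t_b1:.2f} on {df} df")
for g, a in a_fixed.items():
    print(f"{g}: with b=1, log Sr = {a:.4f} + log Ca, Sr/Ca ratio A = {math.exp(a):.4f}")
```

When b=1 cannot be rejected, `math.exp(a_fixed[g])` is the estimated Sr/Ca ratio for location g, and the difference between locations is then a difference in intercepts only.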
Figure 3 shows the log-log plot of data simulated to represent this situation. The 16 values of log Ca are from a uniform random distribution between 0 and 3, which corresponds to a range of 1 to 20 for Ca. Here we simulate the situation where the (c) and (d) effects are nonexistent, i.e. the slopes are the same for the two locations and they are equal to b=1. Therefore the $\mathrm{Sr}_L/\mathrm{Ca}_L$ ratio does not change as clam size varies. However, we do create differences between the locations in the intercepts $a = \log A$, such that the Sr/Ca ratio is $A_C = 0.01$ for the control area and $A_I = 0.1$ for the impacted area. Thus $\mathrm{Sr}_C = 0.01\,\mathrm{Ca}_C$ or $\log \mathrm{Sr}_C = -4.605 + \log \mathrm{Ca}_C$, and $\mathrm{Sr}_I = 0.1\,\mathrm{Ca}_I$ or $\log \mathrm{Sr}_I = -2.303 + \log \mathrm{Ca}_I$. For both locations the error standard deviation is $\sigma = 1$ for the log-log models.

The ANCOVA on these data yielded the following results. The $H_0$: "common slopes" was accepted (p > 0.05) and the common slope was estimated to be b = 1.27. The $H_0$: "b=1" could not be rejected (p > 0.05). The $H_0$: "common intercepts assuming common slope" was rejected (p < 0.01). The estimates for the "common slope b=1" model are $\log \mathrm{Sr}_C = -5.0835 + \log \mathrm{Ca}_C$ or $\mathrm{Sr}_C = 0.0062\,\mathrm{Ca}_C$ for the control area, and $\log \mathrm{Sr}_I = -2.1452 + \log \mathrm{Ca}_I$ or $\mathrm{Sr}_I = 0.1171\,\mathrm{Ca}_I$ for the impacted area. The square root of the error mean square, estimating $\sigma$, is 1.14. Again our estimates are close to the true parameter values, and we have answered the questions corresponding to (a)-(d) correctly.

3.3 RATIO OF SENSITIVE SPECIES TO RESISTANT SPECIES

The third water quality monitoring example uses the ratio of the number
of species classed as "sensitive" (S) to the number classed as "resistant" (R) as a community index of water quality. It is assumed that such a categorization of species has been done a priori, and this is certainly a valid assumption for certain taxonomic groups in fresh water. Again samples are from two areas, one a possible impacted area and the other a control area. The species present in each sample are determined, and the number of species on the "sensitive" and "resistant" lists are recorded. We wish to estimate:
(a) the ratio of S to R in a sample,
(b) the influence of location (impact versus control area) on the S/R ratio,
(c) the influence of number of species present, as indicated by R, on the S/R ratio, and
(d) any differences between the locations in how the S/R ratio responds to varying species richness.

Figure 3. Log-log plot of simulated data representing the strontium content versus calcium content of clam shells from impacted and control areas. The fitted model is drawn in. The diagonal is shown as a solid line. (Plot title: RATIO OF ISOTOPES IN SHELL; horizontal axis: log Ca content. Dashed line and "I" symbols: impacted area; broken line and "C" symbols: control area.)

The ratio-variable approach would contrast observed S/R ratios between locations L=I and L=C, by ANOVA. Again only effects (a) and (b) would be estimated based on the assumption that effects (c) and (d) are nonexistent.

The log-log regression approach assumes that percentage variation in number of "sensitive" species (S) in a sample would be proportional to percentage variation in number of "resistant" species (R) in the sample. That is, $dS_L/S_L = b\,(dR_L/R_L)$. By integration we obtain the log-log model $\log S_L = a + b \log R_L$ and the nonlinear model $S_L = A\,R_L^{\,b}$ where $A = e^a$. If and only if b=1, then $S_L = A\,R_L$ and $A = S_L/R_L$. If $b \neq 1$ then the $S_L/R_L$ ratio varies as species richness varies, and its magnitude cannot be determined unless $R_L$ is specified.

Figure 4 shows the log-log plot of data simulated to represent this situation. The 26 values of log R are from a uniform random distribution between 1.386 and 3.689, which corresponds to a range of 4 to 40 for R. As in the previous example we simulate the situation where the (d) effect is nonexistent, but here we simulate a (c) effect. That is, the slopes are the same for the two locations but the common slope is b=1.2. Therefore as the number of species increases (greater species richness) the proportion of sensitive species, and the S/R ratio, increases. This fits the perception that higher diversity communities tend to have more "K-selected" (biologically rather than physically accommodated) species. For the control area the log-log model is $\log S_C = -0.916 + 1.2 \log R_C$ and the nonlinear form is $S_C = 0.4\,R_C^{1.2}$. For the impact area the model is $\log S_I = -2.303 + 1.2 \log R_I$. For both locations the error standard deviation is $\sigma = 0.8$.

Figure 4. Log-log plot of simulated data representing the number of species in a sample classed as sensitive versus the number classed as resistant. The samples are from impacted and control areas. The fitted model is drawn in. The diagonal is shown as a solid line. (Plot title: RATIO OF SENSITIVE TO RESISTANT SPECIES; horizontal axis: log # resistant spp. Dashed line and "I" symbols: impact area; broken line and "C" symbols: control area.)

The ANCOVA on these data yielded the following results. The $H_0$: "common slope" was accepted (p > 0.05), and the $H_0$: "b=1" just fails rejection (t = 1.93, $t_{.05}$ (23 df) = 2.07). Therefore we would marginally accept a common slope b=1 model, and conclude that the S/R ratio does not vary as species richness varies. In doing so, of course, we would commit the Type II error since in fact b=1.2. However, the model shown in Figure 4 is that based on the estimated common slope of b=1.48, as it would have been had we correctly rejected $H_0$: "b=1". That model is $\log S_C = -1.718 + 1.48 \log R_C$ or $S_C = 0.18\,R_C^{1.48}$ for the control area, and $\log S_I = -2.966 + 1.48 \log R_I$ or $S_I = 0.05\,R_I^{1.48}$ for the impact area. The square root of the error mean square, estimating $\sigma$, is 0.81. In this example we have not done as good a job at estimating the parameters, and we would have committed one Type II error. A larger sample would have been needed to correctly reject $H_0$: "b=1", and to obtain better estimates of the parameters.
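The remark about sample size can be explored with a small Monte Carlo sketch. The Python below is an illustration only, not part of the original analysis: it regenerates data from the stated species model (common slope b=1.2, intercepts log 0.4 and log 0.1, error standard deviation 0.8, log R uniform between log 4 and log 40) and records how often the t-test rejects b=1. The function names, the number of replicates, the seed and the critical value 1.97 used for the larger design are assumptions; 2.07 is the 23-df value quoted in the text.

```python
import math
import random


def t_for_unit_slope(n_per_group, b_true, sigma, rng):
    """Simulate one two-location data set and return t for H0: b = 1."""
    sxx = sxy = syy = 0.0
    for log_a in (math.log(0.4), math.log(0.1)):    # control, impact intercepts
        x = [rng.uniform(math.log(4), math.log(40)) for _ in range(n_per_group)]
        y = [log_a + b_true * xi + rng.gauss(0.0, sigma) for xi in x]
        mx, my = sum(x) / n_per_group, sum(y) / n_per_group
        sxx += sum((xi - mx) ** 2 for xi in x)
        sxy += sum((xi - mx) * (yi - my) for xi, yi in zip(x, y))
        syy += sum((yi - my) ** 2 for yi in y)
    b_hat = sxy / sxx                       # pooled within-location slope
    df = 2 * n_per_group - 3                # two intercepts and one slope fitted
    mse = (syy - b_hat * sxy) / df
    return (b_hat - 1.0) / math.sqrt(mse / sxx)


def rejection_rate(n_per_group, t_crit, reps=1000, seed=1):
    """Fraction of simulated data sets in which H0: b = 1 is rejected."""
    rng = random.Random(seed)
    hits = sum(abs(t_for_unit_slope(n_per_group, 1.2, 0.8, rng)) > t_crit
               for _ in range(reps))
    return hits / reps


power_small = rejection_rate(13, 2.07)    # design of the example: 26 samples in all
power_large = rejection_rate(200, 1.97)   # hypothetical larger design: 400 samples
print(f"estimated power with 13 samples per area:  {power_small:.2f}")
print(f"estimated power with 200 samples per area: {power_large:.2f}")
```

Under these assumptions the rejection rate for 13 samples per area should be low, which is consistent with the Type II error committed in the example, while a much larger design detects the departure from b=1 far more often.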
4. DISCUSSION

A few cautionary comments should be added. It should be recognized that in all the above examples we have applied Model I least squares regression analysis and ANCOVA to what are obviously Model II data. That is, both the response variable and the predictor variable are really response variables that are sampled rather than controlled, and presumably both are measured with error. The seriousness and consequences of this have been endlessly debated in the literature (Madansky 1959, Kidwell and Chase 1967, Kuhry and Marcus 1977, Ricker 1973, 1975, 1984, Jolicoeur 1975, Snedecor and Cochran 1980). There is little agreement on remedies or alternatives, or indeed on whether remedies are needed. The main concern is that the estimate of the slope is biased downward by error in estimating the predictor variable (Snedecor and Cochran 1980). If this error is similar in different groups, and the range of observations on the predictor variable is similar in the different groups, then the test of $H_0$: "common slope" should not be affected. It is estimates of what the slopes are, and tests of $H_0$: "b=1", that may be biased. Similarly, if the common slope model is fitted then the test of $H_0$: "common intercepts" should not be affected, but the estimates of what the intercepts are may be biased.

My own approach to this problem is to re-estimate slopes by other more robust methods. A variety of methods are available, such as finding the slope of the first principal component in the 2-dimensional space defined by the two log-transformed variables (see Sokal and Rohlf 1981 for calculations), or using the method of grouping (Wald 1940, Nair and Banerjee 1942, Bartlett 1949, Madansky 1959, Kidwell and Chase 1967) which is implemented as the RLINE procedure in the MINITAB statistical package (see Velleman and Hoaglin 1981, which also contains FORTRAN and BASIC programs). If these estimates are in close agreement with the least squares Model I estimates, then bias is probably not a problem. If they are not in agreement, then perhaps a barely significant slope b < 1 should be taken with a grain of salt. For the "percent water in frog" data, the RLINE estimate of the common slope was 1.1062 compared to the least squares estimate of 1.1061 and the true value of 1.1. For the "biomagnification of mercury" data, the RLINE estimate of the slope for muscle was 0.57 compared to the least squares estimate of 0.60 and the true value of 0.7. In neither case would there seem to be any cause for concern.

Another concern is that the groups being contrasted should not differ on the predictor variable. If they do differ, then an unbalanced design results. Unfortunately, since the predictor variable is usually not controlled, it may differ between the groups. This could result in ambiguous interpretation of results. For discussion see Snedecor and Cochran 1980, Huitema 1980, Cox and McCullagh 1982. In all the examples presented here data were simulated for both groups from a common distribution on the predictor variable.
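One of the checks mentioned in the Discussion, the slope of the first principal component of the two log-transformed variables, can be sketched as follows. This Python fragment is an illustration under assumptions: it uses the ten muscle observations from Appendix 1 as transcribed here, the closed-form major-axis slope for two variables, and a hypothetical function name `ols_and_major_axis`. It is not the RLINE (grouping) method, which gives a different estimate.

```python
import math

# Muscle rows of the biomagnification data as transcribed from Appendix 1.
sediment = [26, 90, 1, 48, 14, 60, 17, 70, 55, 46]
muscle = [129, 190, 12, 110, 89, 153, 73, 141, 165, 177]


def ols_and_major_axis(x, y):
    """Return the least squares slope and the first-principal-component slope
    for log y against log x."""
    lx = [math.log(v) for v in x]
    ly = [math.log(v) for v in y]
    n = len(lx)
    mx, my = sum(lx) / n, sum(ly) / n
    sxx = sum((a - mx) ** 2 for a in lx)
    syy = sum((b - my) ** 2 for b in ly)
    sxy = sum((a - mx) * (b - my) for a, b in zip(lx, ly))
    ols = sxy / sxx
    # Direction of the leading eigenvector of the 2 x 2 covariance matrix,
    # expressed as a slope (requires sxy != 0).
    major_axis = (syy - sxx + math.sqrt((syy - sxx) ** 2 + 4.0 * sxy ** 2)) / (2.0 * sxy)
    return ols, major_axis


ols_slope, pc_slope = ols_and_major_axis(sediment, muscle)
print(f"least squares slope = {ols_slope:.3f}, principal component slope = {pc_slope:.3f}")
```

For positively correlated variables the principal component slope is never smaller than the least squares slope, so close agreement between the two suggests that downward bias from error in the predictor is not a serious problem for these data.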
REFERENCES

Atchley, W.R., C.T. Gaskins, and D. Anderson, 1976. Statistical properties of ratios. I. Empirical results. Syst. Zool., 25: 137-148.
Bartlett, M.S., 1949. Fitting a straight line when both variables are subject to error. Biometrics, 5: 207-212.
Cox, D.R. and P. McCullagh, 1982. Some aspects of analysis of covariance. Biometrics, 38: 541-554.
Green, R.H., 1979. Sampling design and statistical methods for environmental biologists. Wiley, New York, 257 p.
Huitema, B.E., 1980. The analysis of covariance and alternatives. Wiley, New York, 445 p.
Jolicoeur, P., 1975. Linear regression in fishery research: some comments. J. Fish. Res. Board Canada, 32: 1491-1494.
Kidwell, J.F. and H.B. Chase, 1967. Fitting the allometric equation - a comparison of ten methods by computer simulation. Growth, 31: 165-179.
Kuhry, B. and L.F. Marcus, 1977. Bivariate linear models in biometry. Syst. Zool., 26: 201-209.
Madansky, A., 1959. The fitting of straight lines when both variables are subject to error. Am. Statist. Assoc. J., 54: 173-205.
Nair, K.R. and K.S. Banerjee, 1942. Note on fitting of straight lines if both variables are subject to error. Sankhya, 6: 331.
Ricker, W.E., 1973. Linear regressions in fishery research. J. Fish. Res. Board Canada, 30: 409-434.
Ricker, W.E., 1975. A note concerning Professor Jolicoeur's comments. J. Fish. Res. Board Canada, 32: 1494-1498.
Ricker, W.E., 1984. Computation and uses of central trend lines. Can. J. Zool., 62: 1897-1905.
Snedecor, G.W. and W.G. Cochran, 1980. Statistical methods. 7th ed., Iowa State Univ. Press, Ames, Iowa, 507 p.
Sokal, R.R. and F.J. Rohlf, 1973. Introduction to biostatistics. Freeman, San Francisco, 368 p.
Sokal, R.R. and F.J. Rohlf, 1981. Biometry. Freeman, San Francisco, 859 p.
Velleman, P.F. and D.C. Hoaglin, 1981. Applications, basics, and computing of exploratory data analysis. Duxbury Press, Boston, Mass., 354 p.
Wald, A., 1940. Fitting of straight lines if both variables are subject to error. Ann. Math. Statistics, 11: 284-300.
APPENDIX 1. Simulated data used in the examples.

Water in frogs

DWT      WWT      Season
2.13     1.65     F
3.04     2.63     F
4.49     4.97     F
2.32     2.10     F
1.56     1.30     F
4.42     3.99     F
8.39     8.60     F
5.56     5.83     F
4.46     3.71     F
11.49    10.71    F
6.78     7.57     F
3.12     3.05     F
9.55     9.44     F
8.71     9.61     F
1.31     1.03     F
2.83     2.22     S
2.98     2.04     S
3.10     2.36     S
2.04     1.31     S
1.59     1.05     S
1.93     1.50     S
1.55     1.07     S
4.75     3.58     S
8.41     7.08     S
3.12     2.13     S
2.24     1.77     S
18.43    16.03    S
11.24    8.93     S
2.66     1.59     S
1.29     0.92     S

Biomagnification Hg

Sed.     Clam     Tiss.
26       129      M
90       190      M
1        12       M
48       110      M
14       89       M
60       153      M
17       73       M
70       141      M
55       165      M
46       177      M
23       89       L
86       267      L
39       182      L
55       282      L
12       52       L
1        6        L
88       526      L
83       422      L
35       121      L
31       165      L

Isotopes in Shell

Ca       Sr       Loc.
1.03     0.010    C
3.34     0.029    C
7.27     0.029    C
4.94     0.021    C
7.87     0.032    C
18.82    0.589    C
1.47     0.006    C
6.34     0.021    C
9.64     2.552    I
3.49     0.882    I
12.10    6.699    I
5.21     0.408    I
7.21     0.050    I
8.64     1.664    I
1.84     0.420    I
1.76     0.070    I

Ratio of species types

Resis.   Sen.     Loc.
38.31    15.15    C
8.55     5.86     C
5.00     1.60     C
6.56     2.85     C
6.83     3.11     C
4.78     5.75     C
7.96     5.33     C
8.18     11.79    C
4.66     0.33     C
7.09     6.41     C
4.32     0.63     C
29.02    63.11    C
6.20     1.48     C
37.58    14.09    I
13.03    2.56     I
5.83     0.61     I
23.22    6.23     I
17.21    0.78     I
4.24     0.49     I
25.92    11.26    I
16.99    2.90     I
12.11    1.55     I
16.22    2.02     I
6.25     3.72     I
11.43    3.23     I
7.14     0.50     I
APPENDIX 2. SAS job listing for analysis of ratio variables.

TITLE WATER CONTENT OF SPRING AND FALL FROGS;

* CREATE A DATA SET CONTAINING THE VARIABLES DRY WT., WATER WT.,
  SEASON (F=FALL, S=SPRING), LN DRY WT., LN WATER WT. AND %WATER,
  AND A NUMERICAL SEASON CODE (F=1, S=2);
DATA ONE;
  INPUT DRWT WAWT SEASON $;
  LNDRWT=LOG(DRWT);
  LNWAWT=LOG(WAWT);
  TWT=DRWT+WAWT;
  PCWAWT=100*(WAWT/TWT);
  IF SEASON='F' THEN SCODE=1;
  ELSE SCODE=2;
CARDS;
2.13 1.65 F
3.04 2.63 F
. . .
;

* PRODUCE STATS ON THE VARIABLE %WATER FOR EACH SEASON;
PROC MEANS; VAR PCWAWT; BY SEASON;

* DO 2-GROUP ANOVA TO SEE IF %WATER DIFFERS BETWEEN SEASONS;
PROC GLM; MODEL PCWAWT=SCODE;

* TEST WHETHER THE SLOPES OF THE LN WATER WT. VS. LN DRY WT.
  REGRESSIONS DIFFER BETWEEN SEASONS;
PROC GLM; MODEL LNWAWT=SCODE LNDRWT SCODE*LNDRWT;

* FIT THE COMMON SLOPE ANALYSIS OF COVARIANCE MODEL AND OUTPUT
  THE PREDICTED LN WATER WT. VALUES TO A NEW DATA SET;
PROC GLM; MODEL LNWAWT=SCODE LNDRWT;
  OUTPUT OUT=OUT1 P=PRLNY;

* CREATE A NEW COMBINED DATA SET CONTAINING OBSERVED AND PREDICTED,
  UNTRANSFORMED AND TRANSFORMED, WEIGHT AND %WATER WEIGHT VALUES;
DATA NEW; SET ONE; SET OUT1;
  PRY=EXP(PRLNY);
  PRPCWAWT=100*(PRY/(PRY+DRWT));

* PRINT OUT THE NEW DATA SET;
PROC PRINT;

* DO A PLOT OF LN WATER WT. VERSUS LN DRY WT., CODED BY SEASON;
PROC PLOT; PLOT LNWAWT*LNDRWT=SEASON /
  VAXIS=0 TO 3 BY 0.5 HAXIS=0 TO 3 BY 0.5 VPOS=50 HPOS=66;

* SUBSET THE SPRING DATA ONLY;
DATA SPRING; SET NEW; IF SEASON='S';

* DO LOG-LOG PLOT OF SPRING DATA WITH THE FITTED MODEL SUPERIMPOSED;
PROC PLOT; PLOT LNWAWT*LNDRWT PRLNY*LNDRWT='*' / OVERLAY
  VAXIS=0 TO 3 BY 0.5 HAXIS=0 TO 3 BY 0.5 VPOS=50 HPOS=66;

* SUBSET THE FALL DATA ONLY;
DATA FALL; SET NEW; IF SEASON='F';

* DO THE SAME PLOT FOR THE FALL DATA;
PROC PLOT; PLOT LNWAWT*LNDRWT PRLNY*LNDRWT='*' / OVERLAY
  VAXIS=0 TO 3 BY 0.5 HAXIS=0 TO 3 BY 0.5 VPOS=50 HPOS=66;

* PLOT PREDICTED LN WATER WT. VERSUS LN DRY WT., CODED BY SEASON;
PROC PLOT DATA=NEW; PLOT PRLNY*LNDRWT=SEASON /
  VAXIS=0 TO 3 BY 0.5 HAXIS=0 TO 3 BY 0.5 VPOS=50 HPOS=66;

* PLOT PREDICTED WATER WT. VERSUS DRY WT., CODED BY SEASON;
PROC PLOT DATA=NEW; PLOT PRY*DRWT=SEASON /
  VAXIS=0 TO 20 BY 5 HAXIS=0 TO 20 BY 5 VPOS=50 HPOS=66;

* PLOT PREDICTED %WATER VERSUS TOTAL WT.;
PROC PLOT DATA=NEW; PLOT PRPCWAWT*TWT=SEASON;

* FOR SPRING PLOT %WATER VERSUS TOTAL WT., WITH THE FITTED MODEL
  SUPERIMPOSED;
PROC PLOT DATA=SPRING; PLOT PCWAWT*TWT PRPCWAWT*TWT='*' / OVERLAY;

* DO THE SAME PLOT FOR THE FALL DATA;
PROC PLOT DATA=FALL; PLOT PCWAWT*TWT PRPCWAWT*TWT='*' / OVERLAY;
A COMPARATIVE STUDY OF THE SAMPLING PROPERTIES OF FOUR SIMILARITY INDICES

HONG-WOO KHOO AND LIM TIT-MENG

1.1 INTRODUCTION

Dissimilarity and similarity indices have played an important role in recent ecological studies but unfortunately this does not appear to have extended to aquatic ecology according to Washington (1984). He stated that there is an obvious need for the further and greater evaluation of similarity indices especially for aquatic ecosystems and water pollution problems. This paper hopes to contribute to this need.

In terms of sampling properties, a reliable similarity index would be one with low tendency to give values that deviate from the true similarity values and with small dispersion of index values for repeated measurements of the same community similarity. The magnitude of bias and dispersion of a sample of good similarity measures should not vary with factors such as sample size and the number of species involved in the community comparison.

Several workers have evaluated in many different ways the practicality of different similarity indices in biological work (e.g. Bullock, 1971; Johnston, 1976; Huhta, 1979; Lamont and Grant, 1979; Wolda, 1981; Rice and Belland, 1982). But few studies really looked into the sampling properties of the affinity measures. Ricklefs and Lau (1980) are among the very few who have studied the sampling properties of similarity indices.

To determine how a similarity index behaves with different sampling parameters in the field is quite tedious and difficult since nature is almost always too complex to allow for controlled sampling experiments. But with the aid of a computer, it is possible for us to simulate artificial communities on which proper and repeatable sampling trials can be carried out and the sampling behaviours of the indices studied.

Using computer generated samples taken from simulated communities, this study investigated the sampling responses of four commonly used similarity indices. All these indices have an upper limit of 1 (identical resemblance) and a lower limit of 0 (no resemblance). The four chosen indices are: Gower's general coefficient of similarity (Gower, 1971), Bray-Curtis' index (see Huhta, 1979), Morisita's index (modified by Horn, 1966) and the Euclidean distance index. The sampling responses of these four similarity measures were studied with respect to various sample sizes, quadrate sizes and the number of species attributes.
1.2 METHOD

1.2.1 Similarity Indices

The following are the indices investigated in this study:

(i) Gower's general coefficient of similarity or Gower's Similarity Index (GSI)

$\mathrm{GSI}_{ij} = \frac{1}{N} \sum_{k} \left(1 - \frac{|x_{ik} - x_{jk}|}{R_k}\right)$

$N$ = total number of species involved in the community comparison
$|x_{ik} - x_{jk}|$ = absolute difference between the abundance of species k in community i and j
$R_k$ = species range; defined as the difference between the maximum and minimum abundance of species k present in all communities under comparison
$\sum$ = sum of k = 1 to N species.

(ii) Bray-Curtis' Similarity Index (BCSI)

$\mathrm{BCSI}_{ij} = 1 - (\ldots)$

$P_{jk} = x_{jk} / \sum_k x_{jk}$ = relative frequency of species k in community j

(iv) Euclidean Similarity Index (ESI)

$\mathrm{ESI}_{ij} = 1 - (\ldots)$

$P_{ik}, P_{jk}$ = same as above

The complement versions of the Bray-Curtis and Euclidean indices were used to standardize the outcome in order that 0 stands for no similarity and 1 for identical similarity. Calculation for the ESI was based on relative frequency data.
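As a reading aid, the four indices can be sketched in Python. Only the Gower form is taken from the formula given in the text; the Bray-Curtis, Morisita (Horn-modified) and Euclidean forms below are commonly used textbook versions and are assumptions, since the corresponding formulas are not legible in the source. In particular, dividing the Euclidean distance by the square root of 2 so that the complement stays between 0 and 1 is a choice made for this sketch, and all function names are hypothetical.

```python
import math


def relative(x):
    """Convert abundances to relative frequencies P_k = x_k / sum(x)."""
    total = sum(x)
    return [v / total for v in x]


def gower(xi, xj, communities):
    """GSI: mean over species of 1 - |x_ik - x_jk| / R_k, where R_k is the range
    of species k over all communities under comparison."""
    score = 0.0
    for k in range(len(xi)):
        r_k = max(c[k] for c in communities) - min(c[k] for c in communities)
        # A species with zero range contributes full similarity (assumption).
        score += 1.0 if r_k == 0 else 1.0 - abs(xi[k] - xj[k]) / r_k
    return score / len(xi)


def bray_curtis(xi, xj):
    """Complement of the Bray-Curtis dissimilarity (standard form, assumed)."""
    return 1.0 - sum(abs(a - b) for a, b in zip(xi, xj)) / sum(a + b for a, b in zip(xi, xj))


def morisita_horn(xi, xj):
    """Morisita's index as modified by Horn, on relative frequencies (assumed form)."""
    p, q = relative(xi), relative(xj)
    return 2.0 * sum(a * b for a, b in zip(p, q)) / (sum(a * a for a in p) + sum(b * b for b in q))


def euclidean(xi, xj):
    """Complement of the Euclidean distance between relative frequencies,
    scaled by sqrt(2) so the result lies in [0, 1] (scaling is an assumption)."""
    p, q = relative(xi), relative(xj)
    return 1.0 - math.sqrt(sum((a - b) ** 2 for a, b in zip(p, q))) / math.sqrt(2.0)


# Hypothetical three-species communities used only to exercise the functions.
c1, c2, c3 = [50, 30, 20], [20, 30, 50], [50, 30, 20]
everyone = [c1, c2, c3]
for name, value in [("GSI", gower(c1, c2, everyone)), ("BCSI", bray_curtis(c1, c2)),
                    ("MSI", morisita_horn(c1, c2)), ("ESI", euclidean(c1, c2))]:
    print(f"{name}(c1, c2) = {value:.3f}")
```

All four functions return 1 for identical communities and 0 for two communities that share no species, which matches the limits stated in the Introduction.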
1.2.2 Simulation Technique

The method is similar to that described by Lim and Khoo (1985). The strategy was to create artificial communities so that they can be sampled, analysed and studied in the same way as if studies were conducted on real communities. The basic idea of the simulation follows that of Khoo and Wilimovsky (1978).

In the sampling simulation, a square sample quadrate of the required size was first randomly allocated in a 100 x 100 co-ordinate space which represented the community "universe" dimension. In this "universe" individual members of each community were allotted according to the number of species and species numbers or community structure and randomly positioned as Cartesian co-ordinates in the "universe" space. Overlapping of individual positions was allowed. The number of individuals per species that were found in the sample quadrate was counted by comparing the individual's co-ordinates with those of the four corners of the square quadrate. The whole process was repeated for the number of sampling units (quadrates in this case) required by the experiment and then the mean species abundance per quadrate calculated. The whole sampling process was repeated for another community with either the same or different community structure. These sampling results were then used to calculate the similarity index values.

In order to measure the bias and precision the above simulation was repeated thirty times and the mean and standard deviation of the 30 sets of sample values for each similarity index were calculated. The mean and standard deviation values were found to stabilize after 30 rounds. All the simulated communities in this study had a constant density of 1000 individuals in the 100 x 100 co-ordinate "universe".
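The sampling procedure can be sketched in Python as follows. This is a minimal illustration under several assumptions, not the authors' program: individuals are placed once per replicate and quadrates are then dropped at random with their full area inside the "universe"; the six-species abundance pattern is hypothetical; and a standard Bray-Curtis complement stands in for the similarity index. The mean and standard deviation of the 30 replicate values are computed at the end.

```python
import random
import statistics

UNIVERSE = 100.0   # side of the square co-ordinate "universe"


def make_community(abundances, rng):
    """Place every individual of every species at a random point."""
    return [(k, rng.uniform(0, UNIVERSE), rng.uniform(0, UNIVERSE))
            for k, n in enumerate(abundances) for _ in range(n)]


def mean_abundance(community, n_species, n_quadrates, side, rng):
    """Mean count per quadrate for each species over randomly placed quadrates."""
    totals = [0] * n_species
    for _ in range(n_quadrates):
        x0 = rng.uniform(0, UNIVERSE - side)   # lower-left corner of the quadrate
        y0 = rng.uniform(0, UNIVERSE - side)
        for k, x, y in community:
            if x0 <= x <= x0 + side and y0 <= y <= y0 + side:
                totals[k] += 1
    return [t / n_quadrates for t in totals]


def bray_curtis(xi, xj):
    """Complement of the Bray-Curtis dissimilarity (standard form, assumed)."""
    denom = sum(a + b for a, b in zip(xi, xj))
    if denom == 0:
        return 1.0   # two empty samples treated as identical (assumption)
    return 1.0 - sum(abs(a - b) for a, b in zip(xi, xj)) / denom


def replicate_index(abund_i, abund_j, n_quadrates, side, reps=30, seed=1):
    """Repeat the whole sampling exercise and collect the index values."""
    rng = random.Random(seed)
    values = []
    for _ in range(reps):
        ci, cj = make_community(abund_i, rng), make_community(abund_j, rng)
        mi = mean_abundance(ci, len(abund_i), n_quadrates, side, rng)
        mj = mean_abundance(cj, len(abund_j), n_quadrates, side, rng)
        values.append(bray_curtis(mi, mj))
    return values


# Hypothetical six-species structure with 1000 individuals in total.
structure = [400, 250, 150, 100, 60, 40]
values = replicate_index(structure, structure, n_quadrates=10, side=8)
expected = 1.0                                   # identical community structures
bias = statistics.mean(values) - expected        # deviation of the mean from expectation
dispersion = statistics.stdev(values)            # spread of the 30 replicate values
print(f"bias = {bias:.3f}, dispersion = {dispersion:.3f}")
```

Because the index cannot exceed 1, two communities with identical structure can only yield a zero or negative bias in this sketch; changing `n_quadrates` or `side` shows how the bias and dispersion respond to sample size and quadrate size.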
1.2.3 Community Structure

Communities with 3, 6 and 12 species were created and only comparisons between communities with the same number of species were made. For each species number level, e.g. 6 species (Fig. 1), eight different abundance patterns were created so that when they were compared, their expected indices would range from 0 to 1. Table 1 shows the expected similarity values between the communities for each of the four indices.

1.2.4 Treatments

Two sets of sampling strategies were conducted to study the effects of sample and quadrate sizes. In the first set, sample sizes of 5, 10, 20, 40 and 80 quadrates were taken. Quadrate sizes of 8 x 8 co-ordinate units were used. The second set used 10 quadrate samples with quadrate sizes changing from 4 x 4, 8 x 8, 16 x 16 to 32 x 32 co-ordinate units.

1.2.5 Response Measures

The measures in response to the treatments were the deviation of the mean observed similarity value from the expected similarity value, and the standard deviation of the resulting 30 replicate index values. The former was termed the bias and the latter the dispersion of the sample index values. The bias would reflect the accuracy of the sample indices in measuring the inherent community similarity while the dispersion would measure the precision of the indices in estimating similarity.

1.3 RESULTS

1.3.1 Bias
The degree o f b i a s i n r e l a t i o n t o t h e e x p e c t e d v a l u e s f o r Gower's, Bray- C u r t i s ' , M o r i s i t a ' s and E u c l i d e a n S i m i l a r i t y I n d i c e s a t d i f f e r e n t sample s i z e s a r e shown i n F i g . 2 w h i c h i l l u s t r a t e s t h e r e s u l t s f o r t h e 6 s p e c i e s s i m u l a t i o n model. F o r Gowers' i n d e x ( G S I )
h i g h e r b i a s was o b t a i n e d a t h i g h ( t h e
i n d e x t e n d s t o 1 ) as w e l l as l o w ( t h e i n d e x t e n d s t o 0) e x p e c t e d values
while
l o w b i a s was
Gowers' i n d e x v a l u e s . similarity,
observed a t t h e
intermediate
expected
T h i s means t h a t when two communities have h i g h
Gowers' i n d e x t e n d e d t o u n d e r e s t i m a t e ( n e g a t i v e b i a s ) t h e
t r u e s i m i l a r i t y w h i l e a t low t r u e s i m i l a r i t y i t tended t o overestimate (positive bias) it.
T h i s p a t t e r n remained t h e same even when sample
250
400
1
Communities 1 & 2
3
5
4
r
mh 9
L
m
U
C
2
400
m
-
6
7
8
rn
.-W
-
U
aJ
Q u)
200
-
0
abcdef
abcdef
abcdef
Species composition
h
abcdef
Pig. 1. Diagram showing the community structures or speciesabundance patterns o f the computer simulated communities for the six-species community model.
251
1. Expected similarity values f o r comparisons between communities w i t h 3 , 6 , and 1 2 species with 1 to 5 types of c o m m u n i t y structure (see fig. 1 . ) for Gower's index (GSI), Bray-Curtis' index (BCSI), Morisita's index (MSI) and Euclidean distance index ( E S I ) . TABLE
Comparison between community types 1 & 2 2 & 33 1
:"I
1
Similarity
Indices
GSI
BCSI
MS I
ESI
1.00
1.00
1.00
1.00
0.83
0.92
0.98
0.88
& 4 & 5 6.4 & 5 & 5
0.67
0.83
0.92
0.77
0.83 0.17 0.0
0.92 0.75 0.67
0.98 0.87 0.72
0.88 0.65 0.53
1 & 2
1.00
1.00
1.00
1.00
'"'3
0.61
0.83
0.93
0.84
2 2 3 3 4
=/
2 & 3 6 4 & 5 6 4 & 5 & 5
0.50
0.79
0.89
0.80
0.90 0.10 0.0
0.96 0.61 0.57
0.99 0.67 0.61
0.96 0.64 0.60
1 & 2
1.00
1.00
1.00
1.00
2 & 3
0.71
0.84
0.89
0.89
0.50
0.77
0.88
0.85
0.80 0.20 0.0
0.93 0.61 0.54
0.99 0.64 0.56
0.96 0,73 0.70
2 2 3 3 4
ii:i
2 3 3 4
& & & &
5 4 5 5
Number of species in each community 3
6
12
252
Sample N = 5
0
0.5
0.5
40
20
10
1 0
size
1 0
0.5
Expected S i m i l a r i t y
1 0
0.5
80
1 0
0.5
1
I n d e x Values
Fig. 2. Diagram showing the deviation of the sample (observed) index values from the expected similarity values f o r the six-species community comparison. GSI= Gower's Similarity Index, BCSI= BrayCurtis' Similarity Index, MSI= Morisita's Similarity Index and ESI= Euclidean Similarity Index.
253
sizes were increased to 80, but the magnitude of bias decreased with the increase in sample size. Unlike Gower's index, the other three indices showed, at high expected similarity, a negative bias which decreased as the expected values decreased to 0. This means that they would tend to underestimate the association of two highly similar communities but would give more accurate evaluations when the expected similarity is low.

The magnitude or degree of bias in Gower's index decreased with increasing sample size and reached an asymptotic constant value beyond the sample size of 40 (Fig. 3). Beyond this optimum sample size, however, a negative and a positive bias at high and low expected values, respectively, are still observed for Gower's index. This means that an increase in sample size would reduce the magnitude of bias but will not eliminate its bias pattern in relation to the expected values. The decrease in bias magnitude in the other three indices also reached an asymptotic value, but for each of them the value was reached at a different sample size. For Bray-Curtis' and Euclidean indices the optimum sample sizes were between 20 and 40, whereas for Morisita's index they were between 10 and 20. When the expected similarity was low, however, the bias of these three indices reached the asymptotic values at a smaller sample size than at high expected values. This means that when the true similarity is low a smaller sample size is sufficient to measure it, while a larger sample size is needed when the true similarity is high.

Of the four indices, Morisita's index has the least bias and is therefore the most suitable index for comparing communities in situations when only smaller sample sizes (<20) are available. Morisita's index also appears to give the most accurate measure of the true similarity. Ricklefs and Lau (1980) also showed that it has less bias than the Euclidean distance. At their respective optimum sample sizes for high expected similarity comparisons, the percentage bias for Morisita's index is less than 5% of the expected value, while those for Bray-Curtis', Euclidean and Gower's indices were all greater than 10%.

Table 2 shows the relationship between the bias potential of the four indices and the number of species for different sample sizes at the expected similarity value of 1. The bias of Gower's index was independent of the number of species. This was true irrespective of
Fig. 3. Diagram showing the relation between bias (observed minus the expected similarity values) of each of the four indices (GSI, BCSI, MSI and ESI) with respect to the increase in sample size. The example is for the six-species comparisons. (Horizontal axis: sample size 5, 10, 20, 40 and 80; curves are labelled by their expected similarity values.)
TABLE 2. Bias of sample index values for the 3, 6 and 12 species communities in relation to sample size (5 - 80) when the expected similarity is 1. Results are for the four indices GSI, BCSI, MSI and ESI. Cells marked ... could not be recovered from the source; the values -0.09, -0.07 and -0.05 appear in the source without a recoverable position.

Number of species   Similarity   Sample size
in each community   index        5       10      20      40      80
3                   GSI          -0.39   -0.30   -0.36   -0.34   -0.25
                    BCSI         -0.17   -0.11   -0.08   -0.06   -0.04
                    MSI          -0.05   -0.02   -0.01   -0.01   -0.004
                    ESI          -0.18   -0.11   -0.08   -0.05   -0.04
6                   GSI          -0.35   -0.31   -0.37   -0.27   -0.21
                    BCSI         -0.19   -0.15   -0.11   -0.08   -0.06
                    MSI          -0.05   -0.03   -0.02   -0.01   ...
                    ESI          -0.18   -0.14   -0.10   -0.07   -0.05
12                  GSI          -0.42   -0.36   -0.32   -0.38   -0.33
                    BCSI         -0.31   -0.20   -0.15   -0.11   -0.08
                    MSI          -0.22   -0.11   -0.06   -0.03   -0.02
                    ESI          -0.21   -0.14   -0.11   ...     ...
TABLE 3. Dispersion (standard deviation) of sample index values for 3, 6 and 12 species communities at expected value of 1, in relation to sample size.

No. of species   Similarity   Sample size
in community     index        5      10     20     40     80
3                GSI          0.14   0.14   0.12   0.10   0.11
                 BCSI         0.07   0.05   0.03   0.03   0.02
                 MSI          0.04   0.02   0.01   0.01   0.003
                 ESI          0.08   0.05   0.04   0.03   0.02
6                GSI          0.10   0.11   0.09   0.07   0.07
                 BCSI         0.06   0.05   0.03   0.02   0.02
                 MSI          0.05   0.04   0.02   0.01   0.01
                 ESI          0.05   0.04   0.03   0.02   0.02
12               GSI          0.08   0.09   0.07   0.05   0.05
                 BCSI         0.07   0.05   0.03   0.02   0.02
                 MSI          0.08   0.05   0.03   0.01   0.01
                 ESI          0.05   0.03   0.02   0.01   0.01
sample sizes. This bias, however, was observed to increase with increase in species number for the other three indices.
1.3.2 Dispersion

The dispersion (standard deviation) patterns of the four indices in relation to sample size and similarity values are shown in Fig. 4. The standard deviation values were greatest at the mid-range values (0.5) and least at both the low and the high end similarity values (0 and 1). The greatest dispersion values were observed for Gower's index, followed by Bray-Curtis' and Morisita's indices and then by the Euclidean index, which appears to be the most precise of the four indices.

A similarity measure with high precision (low dispersion) would be more sensitive in detecting smaller differences or changes than one which is not. In this regard the Euclidean index is the most precise. Its standard deviation at the optimum sample size of 40 was about 0.027 under the worst conditions. Thus its 95% confidence interval is less than 10% of the mean similarity value. This interval tends to get smaller when the similarity values tend towards the two extremes of 0 and 1. The confidence intervals of the other three indices were greater than 10% at the mid-range mean similarity values.

Increase in sample size had the effect of lowering the dispersion values and hence increasing the precision of all four indices. No optimum sample size at which the dispersion stabilizes could be seen.

Heterogeneity of variance of the similarity values was observed for all four indices. Uneven dispersion at different values of similarity is not a good property of an index. The variances were related to the index values in a parabolic manner, with the greatest variance at the mid-range values. This parabolic and uneven distribution of the standard deviation, however, tended to break down as the sample size increased to 80. It would appear therefore that, to obtain homogeneity of variance and independence of the dispersion from the mean index values, one solution is to increase sample size, but in this case up to 80. This number may not be practical under field conditions. Ricklefs and Lau (1980) had also observed uneven distribution of standard deviation even up to sample sizes of 400 for another similarity index. An alternative solution is to log-transform the index values to obtain homogeneity of variance for subsequent statistical tests.
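The two sampling properties discussed above, bias (mean observed index minus the expected value) and dispersion (standard deviation of the observed index), can be illustrated with a small Monte Carlo sketch. This is not the simulation design used in this study: it assumes that a "sample" is a fixed number of individuals drawn at random from each community, it uses the Bray-Curtis similarity in its common form 2 Σ min(x, y) / (Σx + Σy), and the community proportions, function names and seed are all hypothetical.

```python
import random
import statistics


def bray_curtis_similarity(x, y):
    """Bray-Curtis similarity, 2 * sum(min(x_k, y_k)) / (sum(x) + sum(y))."""
    shared = sum(min(a, b) for a, b in zip(x, y))
    return 2.0 * shared / (sum(x) + sum(y))


def draw_sample(proportions, n_individuals, rng):
    """Count individuals per species in one random (multinomial) sample."""
    counts = [0] * len(proportions)
    for k in rng.choices(range(len(proportions)), weights=proportions, k=n_individuals):
        counts[k] += 1
    return counts


def bias_and_dispersion(p1, p2, n_individuals, n_trials=2000, seed=1):
    """Return (bias, standard deviation) of the sample index over n_trials."""
    rng = random.Random(seed)
    expected = bray_curtis_similarity(p1, p2)  # index on the true proportions
    observed = [
        bray_curtis_similarity(draw_sample(p1, n_individuals, rng),
                               draw_sample(p2, n_individuals, rng))
        for _ in range(n_trials)
    ]
    return statistics.mean(observed) - expected, statistics.stdev(observed)


# Hypothetical six-species community compared with itself (expected similarity 1).
community = [0.40, 0.30, 0.15, 0.10, 0.04, 0.01]
for n in (5, 10, 20, 40, 80):
    bias, sd = bias_and_dispersion(community, community, n)
    print(f"N={n:2d}  bias={bias:+.3f}  sd={sd:.3f}")
```

Because a similarity index cannot exceed 1, the bias at an expected similarity of 1 is necessarily negative in such a simulation; how fast it shrinks with sample size depends on the index and on the sampling assumptions.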
Fig. 4. Diagram showing the dispersion patterns of the sample similarity values for the four indices. The results are for the six-species community comparisons. (Panels by sample size; horizontal axis: average similarity values, 0 to 1.)
As the number of species in the communities increased, the dispersions of Gower's and Euclidean indices decreased while that of Morisita's index increased (Table 3). The dispersion pattern of Bray-Curtis' index was, however, independent of species number.

1.3.3 Quadrat size

Simulation results showed that increases in quadrat size had the same effects on the bias potential and the dispersion of the four indices as increases in sample size. They reduced the bias and increased the precision in the same manner, but likewise did not alter the pattern of relationship between the bias potential and the expected similarity values, nor the parabolic relationship between the dispersion values and the similarity values, in any of the four indices.

1.4 DISCUSSION
A good similarity index should be simple to calculate; it should be 0 if the two entities or communities under comparison are completely different and 1 when both communities are identical; it should vary linearly; it should be testable statistically; and it should have the desirable sampling properties of being accurate and precise. From the above comparative study it can be seen that none of the indices was absolutely superior. In terms of accuracy with respect to sample size, Morisita's index had the least bias at small sample size, followed by the Euclidean and Bray-Curtis' indices, and the least accurate was Gower's index. In terms of accuracy with respect to species number, Gower's index was the best, its bias being independent of the number of species, whereas the bias of the other three indices increased with species number. In terms of precision and sample size, the Euclidean index was most precise, followed by Bray-Curtis' and Morisita's indices, and the least precise was Gower's index. In terms of precision in relation to species number, Bray-Curtis' index was the best because of its independence from the number of species.

Gower's general coefficient of similarity has been claimed by Johnston (1976) to be robust. Some of its merits listed by Johnston include its applicability to any level of measurement (from discrete, presence-absence data through continuous, biomass data) and the fact that it does not require standardization of the data used in the calculation. This study, however, showed that out of the four statistical criteria discussed above, Gower's index satisfied only one, and it is therefore, in comparison with the other indices, not as useful as claimed by Johnston in terms of its sampling properties.

Bray-Curtis' index has been evaluated by Huhta (1979) to be good in showing succession in forest floor arthropod communities only after logarithmic transformation of data. This study, however, showed that its one advantage is that its sample dispersion values are independent of the number of species in the communities.

Morisita's index has been recommended by Wolda (1981) for its independence of sample size and species number. This study, however, showed that it is dependent on sample size (though not as strongly as the other three indices) and on species number.

Euclidean distance is considered to be one of the good similarity indices by Lamont and Grant (1979) for its sensitivity to differences in the number of species between communities. This study, however, only compared abundance patterns between communities with the same number of species. The index is found to be precise, and its precision increases with sample size and the number of species attributes. It is therefore a suitable index to use for complex communities and when sample sizes are large.

Morisita's index should be used when the accuracy of community similarity measurement is the main objective, whereas the Euclidean index should be used when the precision of repeated measurements is of importance in situations where only relative differences are needed.

According to Wolda (1981), statistical tests for similarity indices are not yet available. However, Ricklefs and Lau (1980) suggested the use of field data and computer simulation to obtain statistical confidence limits for similarity estimates. The present study also shows that it is possible to estimate confidence limits for these similarity indices through simulation, provided the sampling properties are taken into account. It is therefore possible that through this kind of simulation study the properties of similarity indices under various other conditions can be explored, leading to a better understanding of the many similarity indices at present in use before they are used in the field.
REFERENCES
Bullock, J.A., 1971. The investigation of samples containing many species. II. Sample comparison. Biol. J. Linn. Soc. 3: 23-56.
Gower, J.C., 1971. A general coefficient of similarity and some of its properties. Biometrics 27: 857-871.
Horn, H.S., 1966. Measurement of "overlap" in comparative ecological studies. American Naturalist 100 (914): 419-424.
Huhta, V., 1979. Evaluation of different similarity indices as measures of succession in arthropod communities of the forest floor after clear-cutting. Oecologia 41: 11-23.
Johnston, J.W., 1976. Similarity Indices I: What do they measure? BNWL-2152. Battelle, Pacific Northwest Laboratories, Richland, Washington, U.S.A.
Khoo, H.W. and Wilimovsky, N.J., 1978. Similarity Index. Department of Zoology, University of Singapore (unpublished report).
Lamont, B.B. and Grant, K.J., 1979. A comparison of twenty-one measures of site dissimilarity. In: Orloci, L., Rao, C.R. and Stiteler, W.M. (eds.), Multivariate Methods in Ecological Work, pp. 101-126. International Co-operative Publishing House, Maryland, U.S.A.
Lim, T.M. and Khoo, H.W., 1985. Sampling properties of Gower's general coefficient of similarity. Ecology (in press).
Morisita, M., 1959. Measuring interspecific association and similarity between communities. Mem. Fac. Sci. Kyushu Univ., Ser. E (Biol.), 3: 65-80.
Rice, J. and Belland, R.J., 1982. A simulation study of moss flora using Jaccard's coefficient of similarity. J. Biogeography 9: 411-419.
Ricklefs, R.E. and Lau, M., 1980. Bias and dispersion of overlap indices: Results of some Monte Carlo simulations. Ecology 61 (5): 1019-1024.
Washington, H.G., 1984. Diversity, biotic and similarity indices: A review with special relevance to aquatic ecosystems. Water Res. 18 (6): 653-694.
Wolda, H., 1981. Similarity indices, sample size and diversity. Oecologia 50: 296-302.
RANDOMIZED SIMILARITY ANALYSIS OF MULTISPECIES LABORATORY AND FIELD STUDIES

ERIC P. SMITH
Department of Statistics, Virginia Polytechnic Institute and State University, Blacksburg, Virginia 24061

INTRODUCTION
Biological monitoring studies involving measurements on a large number of species are difficult to analyze. Biological concerns are many, ranging from the loss of important species, to changes in the abundance, biomass or biovolume of important species, to changes in the composition or diversity of groups of species. Although a number of researchers have recommended multivariate methods for detecting the changes associated with differences in locations or levels of a toxicant, most studies cannot use these methods for inference because the sample sizes are not adequate and it is not feasible to obtain adequate sample sizes. For example, it is not uncommon for a researcher to observe as many as 100 different species in a study. To apply multivariate analysis of variance, one would need over 100 replicate samples. Furthermore, it is unlikely that the assumptions of MANOVA would be realized even if this many samples were obtained. Because of the large number of species typically absent at a given site, the normality assumption cannot be met. Therefore, some alternate methods are needed for the analysis of this type of data.

This paper then focuses on the analysis of biological data arising from multispecies studies. Of interest in this paper are three basic questions which arise in these studies:
(1) Are there differences due to the locations or treatments?
(2) Which species are primarily involved in the differences?
(3) Which locations or treatments are different?

The primary inferential method we propose is based on permutation or randomization procedures. Such procedures were proposed for use in monitoring studies using diversity measures by van Belle and Fisher (1977) and also by Bell et al. (1981). Here, the methods are based on comparing similarities between samples from like and unlike sites. Similarities or measures of distance between species are commonly used to compare sites. However, most of the comparisons tend to be graphical and not based on an inferential procedure. The permutation methods presented here are complemented by graphical and summary measures to aid in interpreting the test results.

METHODS AND DATA

For simplicity, the methods will be discussed assuming a completely randomized design with a single factor of interest. For example, we may be interested in the effects of different concentrations of a single chemical on growth in microcosm experiments. Each microcosm would receive only one treatment and each treatment would be applied to several replicate microcosms. Observations on, for example, species biomass would then be taken at some suitable time. Alternatively, one may be interested in deciding if there are differences in the aquatic community above and below a chemical plant. Several similar sites would be chosen above and below and the community composition compared. If only a single species were to be studied and the variable measured were biomass, an analysis of variance might be an appropriate method of analysis. However, we have multiple species, so an alternate approach is required.

Assume then that we have sampled at several sites in a river.
At each site, a single sample (possibly a composite sample) is taken and information recorded on a number of species. This information may be presence or absence of species, or abundance, biomass or biovolume etc. of individual species. We assume further that the river has some point source of pollution and interest is in whether the pollution has an effect on the aquatic community. The approach that we take here uses the randomization or permutation method, a common approach in nonparametric statistics (Pitman 1938, Bradley 1968, Mielke et al. 1981, Moore et al. 1984).

The first step in the randomization analysis is the summarization of the data vectors through the use of similarity (or distance) measures. A similarity measure, $S_{ij}$, describes the degree of relatedness between the species at two sites, i and j. There exist a number of measures and a number of research papers describing the merits and demerits of the various measures. The object here is not to enter into that debate but to describe methodology that is useful after the measure is chosen. However, the choice of the measure is very important and determines the interpretation of the statistical hypothesis. Some guidelines on the choice of appropriate measures are found in discussions in Sneath and Sokal (1973), Hellawell (1978), Lamont and Grant (1979), Hajdu (1981), and others. A simplified summarization of the similarity measures categorizes them into three groups that are related to types of changes in community structure.

First, if presence-absence data is used, the focus is on loss of species
associated with the pollutant. Measures such as Jaccard's coefficient

$$S_{ij} = \frac{a}{a+b+c} \tag{1}$$

where a is the number of species present in both sites, b is the number present in site i only and c is the number present in site j only, or the simple matching coefficient

$$S_{ij} = \frac{a+d}{a+b+c+d} \tag{2}$$

where d is the number absent in both sites i and j, are useful for detecting changes in the occurrence of species.
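As a minimal sketch of equations (1) and (2), the counts a, b, c and d can be tallied from two species lists and the two coefficients computed directly. The function names and the example abundances are hypothetical; any value greater than zero is treated as "present".

```python
def presence_counts(site_i, site_j):
    """Return (a, b, c, d): species in both sites, in i only, in j only, in neither."""
    a = b = c = d = 0
    for x_i, x_j in zip(site_i, site_j):
        if x_i > 0 and x_j > 0:
            a += 1
        elif x_i > 0:
            b += 1
        elif x_j > 0:
            c += 1
        else:
            d += 1
    return a, b, c, d


def jaccard(site_i, site_j):
    """Equation (1): a / (a + b + c); undefined when neither site has any species."""
    a, b, c, _ = presence_counts(site_i, site_j)
    if a + b + c == 0:
        raise ValueError("no species present at either site")
    return a / (a + b + c)


def simple_matching(site_i, site_j):
    """Equation (2): (a + d) / (a + b + c + d)."""
    a, b, c, d = presence_counts(site_i, site_j)
    return (a + d) / (a + b + c + d)


# Hypothetical abundances of five species at two sites.
site_i = [10, 5, 8, 0, 0]
site_j = [5, 0, 9, 15, 0]
print(presence_counts(site_i, site_j))   # (2, 1, 1, 1)
print(jaccard(site_i, site_j))           # 0.5
print(simple_matching(site_i, site_j))   # 0.6
```

The only difference between the two coefficients is whether joint absences (d) count as agreement, which matters when many species are absent from most sites.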
Loss of species is of course not the only type of change that may occur in an ecosystem. With mild pollution, one may expect global decreases in the abundance of species, or some intolerant species may decline in abundance while others increase, or the relative abundances of species may change. In the first situation, measures that are based on quantitative or absolute measurements are to be preferred. Some common measures include Euclidean distance

$$D_{ij} = \Big[\sum_k (x_{ik} - x_{jk})^2\Big]^{1/2} \tag{3}$$

where $x_{ik}$ and $x_{jk}$ refer, say, to the biomass or biovolume of species k at sites i and j, or a version of the Minkowski metric

$$D_{ij} = \sum_k |x_{ik} - x_{jk}|. \tag{4}$$
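A short sketch of the two distance measures follows, assuming that equation (4) is the sum of absolute differences (the Minkowski metric with exponent 1, often called the city-block distance). The function names and the example biomass values are hypothetical.

```python
import math


def euclidean_distance(x_i, x_j):
    """Equation (3): square root of the summed squared differences."""
    return math.sqrt(sum((a - b) ** 2 for a, b in zip(x_i, x_j)))


def city_block_distance(x_i, x_j):
    """Equation (4), read as the sum of absolute differences."""
    return sum(abs(a - b) for a, b in zip(x_i, x_j))


# Hypothetical biomass of five species at two sites.
x_i = [10, 5, 8, 2, 1]
x_j = [5, 7, 9, 15, 5]
print(round(euclidean_distance(x_i, x_j), 3))  # sqrt(215), about 14.663
print(city_block_distance(x_i, x_j))           # 25
```

Both measures work on absolute quantities, so a uniform decline in abundance at one site increases the distance even when the relative composition is unchanged.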
These measures may be converted to similarity measures.

For relative changes, proportional abundances may be used and a measure such as Bhattacharyya's (1946) measure of similarity

$$S_{ij} = \sum_k (p_{ik}\, p_{jk})^{1/2} \tag{5}$$

or the proportional similarity measure

$$S_{ij} = \sum_k \min(p_{ik}, p_{jk}) \tag{6}$$

may be useful. Here $p_{ik}$ and $p_{jk}$ refer to the proportion of species k at sites i and j respectively.
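Equations (5) and (6) can be sketched as follows. The helper `proportions` and the example counts are hypothetical; both measures equal 1 for identical proportional composition and 0 when the two sites share no species.

```python
import math


def proportions(counts):
    """Convert abundances at one site to proportions that sum to 1."""
    total = sum(counts)
    return [c / total for c in counts]


def bhattacharyya(p_i, p_j):
    """Equation (5): sum over species of sqrt(p_ik * p_jk)."""
    return sum(math.sqrt(a * b) for a, b in zip(p_i, p_j))


def proportional_similarity(p_i, p_j):
    """Equation (6): sum over species of min(p_ik, p_jk)."""
    return sum(min(a, b) for a, b in zip(p_i, p_j))


# Hypothetical abundances of five species at two sites.
p_i = proportions([10, 5, 8, 2, 1])
p_j = proportions([5, 7, 9, 15, 5])
print(round(bhattacharyya(p_i, p_j), 3))
print(round(proportional_similarity(p_i, p_j), 3))
```

Because only proportions enter, doubling every abundance at one site leaves both measures unchanged, which is what makes them suited to detecting relative rather than absolute changes.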
Alternatively, with biomass data a commonly used measure (Sullivan 1978) is Stander's (1970) measure, which is more generally the cosine of the angle between two vectors,

$$S_{ij} = \frac{\sum_k p_{ik}\, p_{jk}}{\big(\sum_k p_{ik}^2 \sum_k p_{jk}^2\big)^{1/2}} \tag{8}$$

if proportions are used.

The second step in the randomization analysis is the test of site differences based on permutations of the matrix of similarity or distance measures.
To help clarify the basic ideas, we shall make use of the simple data and calculations in Table 1, which represent sampling from two locations, one above and one below a suspected source of pollution. We shall assume that there were 5 species studied and three replicate samples taken at each location. Below the data matrix is the matrix of similarity coefficients, using Stander's cosine measure. Note the obvious structure in the matrix of coefficients. There are two groups of high similarities and a single block of small coefficients. The large coefficients represent the "within" similarities, that is, the similarities between the replicates from the same location (receiving the same treatment). The block of small coefficients is the "between" similarities. These coefficients measure the similarity between samples from different locations (receiving different treatments). If there are no treatment effects, we expect the between similarities to have roughly the same values as the within similarities; otherwise the between similarities should be lower than the within values. The permutation test compares the between and within similarities assuming that there is no difference due to the location. If there are large differences between the locations, the test usually will indicate these differences.
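The similarity coefficients of Table 1 can be reproduced from the data matrix with a few lines of code. The sketch below applies the cosine measure to the raw values of Table 1 (the cosine is unchanged if each site is first converted to proportions); the variable and function names are illustrative only.

```python
import math

# Data of Table 1: rows are sites 1-6, columns are species sp1-sp5.
DATA = [
    [10, 5, 8, 2, 1],    # location 1, site 1
    [12, 2, 9, 5, 0],    # location 1, site 2
    [18, 9, 4, 1, 2],    # location 1, site 3
    [5, 7, 9, 15, 5],    # location 2, site 4
    [3, 4, 6, 12, 9],    # location 2, site 5
    [4, 8, 2, 16, 8],    # location 2, site 6
]


def cosine_similarity(x, y):
    """Cosine of the angle between two abundance vectors."""
    dot = sum(a * b for a, b in zip(x, y))
    return dot / (math.sqrt(sum(a * a for a in x)) * math.sqrt(sum(b * b for b in y)))


similarity = [[cosine_similarity(x, y) for y in DATA] for x in DATA]
for row in similarity:
    print("  ".join(f"{s:.3f}" for s in row))
```

The upper-left and lower-right 3 x 3 blocks of the printed matrix hold the within similarities and the off-diagonal block holds the between similarities.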
TABLE 1
Hypothetical data and similarity estimates for three replicate sites at two locations, one above a source of pollution and one below.

DATA
location   site   sp1   sp2   sp3   sp4   sp5
1          1      10    5     8     2     1
1          2      12    2     9     5     0
1          3      18    9     4     1     2
2          4      5     7     9     15    5
2          5      3     4     6     12    9
2          6      4     8     2     16    8

ESTIMATED SIMILARITIES
site   1      2      3      4      5      6
1      1.00   .955   .908   .685   .556   .486
2      .955   1.00   .836   .717   .586   .506
3      .908   .836   1.00   .515   .413   .444
4      .685   .717   .515   1.00   .946   .925
5      .556   .586   .413   .946   1.00   .941
6      .486   .506   .444   .925   .941   1.00

To test for differences, a statistic is needed to summarize the differences. Recognizing the similarity of the above situation with the analysis of variance method, a possible statistic (Good 1982) is

$$L = \bar{B}/\bar{W}$$

where $\bar{B}$ is the mean between similarity and $\bar{W}$ is the mean within similarity. It turns out that because T, the total of the similarities, is a fixed number for a given similarity matrix, one may also just consider $\bar{B}$ or $\bar{W}$ for testing purposes (see for example Mielke et al. 1981 or Good 1982).

To carry out the permutation test, we compute the statistic L for the data as collected.
Call this value L(data). Now, because we have assumed that there is no effect due to the pollutant, we should be able to switch one or more of the upstream samples with the same number of downstream samples and this should not change the value of the statistic by very much. If, however, there is a difference between locations, switching the data should cause a relatively large change in the value of the statistic L. We shall refer to the value of L computed after such a switch as L(permute). To test for differences in location, we carry out a large number of these data switches or permutations, say 1,000, and compare the original value, L(data), with the permuted values. If L(data) is more extreme than, say, 95% of the L(permute) values, then one would reject the null hypothesis of no differences between locations. Note that one may permute similarities and not the data to save computation.

Several comments need to be made at this point. First, the example using locations, while illustrative, is problematic in that the pollution is confounded with location so that any differences may not be due to pollution. As with most statistical procedures, a significant difference does not necessarily imply causation. Second, in some cases (small samples) there are only a small number of permutations possible. If there are a total of N observations at k locations or treatments with $n_i$ replicate sites per location, there are $N!/(n_1!\, n_2! \cdots n_k!)$ different permutations. When this number is small, one may wish to compute all possible permutations of the data (Berry and Mielke 1984). Third, the test is a one-sided test and rejection depends on the statistic that is used. If $\bar{B}/\bar{W}$ is used, one rejects for large values of $\bar{B}/\bar{W}$ if a distance measure is used or small values of $\bar{B}/\bar{W}$ if a similarity measure is used. Alternately, if a similarity measure is used and one considers $\bar{B}$ or $\bar{W}$ as the statistic, then we reject for small values of $\bar{B}$ or large values of $\bar{W}$. If a distance measure is used, then one would reject for large values of $\bar{B}$ or small values of $\bar{W}$.
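The test described above can be sketched for the hypothetical data of Table 1. Because there are only 6!/(3! 3!) = 20 ways to assign the six sites to two locations of three, the sketch enumerates all of them rather than drawing 1,000 random permutations. It permutes site labels on the similarity matrix (not the data), uses $L = \bar{B}/\bar{W}$ with the cosine measure, and rejects for small L. Function and variable names are illustrative only.

```python
import math
from itertools import combinations

# Data of Table 1: rows are sites 1-6 (indices 0-5), columns are species sp1-sp5.
DATA = [
    [10, 5, 8, 2, 1], [12, 2, 9, 5, 0], [18, 9, 4, 1, 2],
    [5, 7, 9, 15, 5], [3, 4, 6, 12, 9], [4, 8, 2, 16, 8],
]


def cosine_similarity(x, y):
    """Cosine of the angle between two abundance vectors."""
    dot = sum(a * b for a, b in zip(x, y))
    return dot / (math.sqrt(sum(a * a for a in x)) * math.sqrt(sum(b * b for b in y)))


def ratio_statistic(sim, group_a, group_b):
    """L = (mean between-group similarity) / (mean within-group similarity)."""
    within = [sim[i][j] for g in (group_a, group_b) for i, j in combinations(g, 2)]
    between = [sim[i][j] for i in group_a for j in group_b]
    return (sum(between) / len(between)) / (sum(within) / len(within))


def exact_permutation_test(sim, group_a, group_b):
    """Enumerate every relabelling of sites; return (L(data), p-value, count)."""
    sites = sorted(group_a + group_b)
    observed = ratio_statistic(sim, group_a, group_b)
    permuted = []
    for new_a in combinations(sites, len(group_a)):
        new_b = [s for s in sites if s not in new_a]
        permuted.append(ratio_statistic(sim, list(new_a), new_b))
    # One-sided: small L is extreme for a similarity measure.
    # The tolerance absorbs floating-point differences in summation order.
    n_extreme = sum(1 for value in permuted if value <= observed + 1e-12)
    return observed, n_extreme / len(permuted), len(permuted)


sim = [[cosine_similarity(x, y) for y in DATA] for x in DATA]
l_data, p_value, n_permutations = exact_permutation_test(sim, [0, 1, 2], [3, 4, 5])
print(f"L(data) = {l_data:.3f}, p = {p_value:.2f} from {n_permutations} permutations")
```

With three replicates per location the observed arrangement and its mirror image (the two location labels exchanged) are the only ones that reach the smallest L, so the smallest attainable p-value is 2/20 = 0.10. This illustrates the second comment above: with very small samples the number of permutations limits what the test can show.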
FOLLOW-UP ANALYSES OF REPLICATES

After rejecting the null hypothesis of no location differences, there are a number of analyses that could be used to determine which locations (assuming there are more than two) differ and which species are the important species for indicating the differences. It may also be of interest to look for possible odd data values that may influence the analysis. While there are a number of procedures that could be utilized, we will focus on just a few here.

For comparing locations, a set of graphical procedures useful for ordering the samples are the multidimensional scaling procedures (Gower 1966). The purpose of this set of procedures is to find a set of x and y data points that, when graphed, represent the location of the samples in a two-dimensional space (a reduced species space). The points (x, y) are chosen so that the visual (Euclidean) distance between the points is approximately the same as the distance (or inverse to the similarity) between the samples as measured by the chosen index. By looking at the graph of the sample sites, one can look for groups of samples, for possible ordering of the sites, or for odd data values.

A more inferential approach is to compare the mean similarities for the different sites. These comparisons may be made using the multiple comparisons permutation procedure developed by Foutz et al. (1985). This procedure allows one to control the error rates associated with the overall set of comparisons.
FOLLOW-UP ANALYSES OF SPECIES

There are a number of possible methods useful for evaluating the individual species. Our interest here, however, is not only in changes in individual species but in which species are primarily responsible for the differences indicated by the test. As the test is strongly dependent on the measure of similarity, we propose using the contribution (or importance) of the species to the overall statistic as an importance measure. If a measure is additive (for example, Bhattacharyya's measure), the contribution may be directly computed. However, most measures are not additive, so a method based on the effect of removing a species on the similarity is proposed. Let $\bar{B}_{-i}$ be the mean of the between similarities with species i removed. Then

$$\mathrm{INF}_i = 100\,(\bar{B}_{-i} - \bar{B})/\bar{B} \tag{10}$$

measures the percentage relative influence of species i on the mean between similarity. Large positive values indicate a species whose removal greatly increases the between similarity. These species show differences between locations. Species with large negative values decrease the between similarity when removed and represent species that show little change over most locations but contribute to the between similarity. Species that have influence close to zero are relatively unimportant or do not dominate the measure.
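A minimal sketch of the influence measure in equation (10), applied to the hypothetical data of Table 1 with the cosine measure, is given below. Each species is dropped in turn, the mean between similarity is recomputed, and the percentage change is reported. Function and variable names are illustrative only.

```python
import math

# Data of Table 1: rows are sites 1-6 (indices 0-5), columns are species sp1-sp5.
DATA = [
    [10, 5, 8, 2, 1], [12, 2, 9, 5, 0], [18, 9, 4, 1, 2],
    [5, 7, 9, 15, 5], [3, 4, 6, 12, 9], [4, 8, 2, 16, 8],
]


def cosine_similarity(x, y):
    """Cosine of the angle between two abundance vectors."""
    dot = sum(a * b for a, b in zip(x, y))
    return dot / (math.sqrt(sum(a * a for a in x)) * math.sqrt(sum(b * b for b in y)))


def mean_between(data, group_a, group_b):
    """Mean similarity over all pairs with one site in each group."""
    sims = [cosine_similarity(data[i], data[j]) for i in group_a for j in group_b]
    return sum(sims) / len(sims)


def species_influence(data, group_a, group_b):
    """Equation (10): percentage change in mean between similarity per removed species."""
    base = mean_between(data, group_a, group_b)
    influence = []
    for k in range(len(data[0])):
        reduced = [row[:k] + row[k + 1:] for row in data]  # drop species k everywhere
        influence.append(100.0 * (mean_between(reduced, group_a, group_b) - base) / base)
    return influence


influence = species_influence(DATA, [0, 1, 2], [3, 4, 5])
for k, value in enumerate(influence, start=1):
    print(f"sp{k}: {value:+.1f}%")
```

For these data, removing sp4, which is scarce above the source and abundant below it, raises the mean between similarity and so gives a clearly positive influence, in line with the interpretation given above.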
EXAMPLE

For an application of the above methods, we shall use a data set from an experiment on the effect of zinc on the periphyton community in the New River at Glen Lyn, Virginia. Twelve artificial streams were placed by the river and received one of four zinc treatments (0.0, 0.05, 0.5 or 1.0). Artificial substrates were placed in the streams and removed at a number of times throughout the experiment. Thus, the experimental design is a split-plot design but without blocking (Milliken and Johnson 1983). As the experiment was designed to obtain a significant time-zinc interaction, data for individual times were analyzed separately. A detailed analysis of the full data set will be presented elsewhere.

Table 2 gives the data for day 20 of the experiment and some summary measures using Stander's measures for the eleven dominant species. The permutation test indicated significant differences between the four treatments for both measures (no mean between similarities were lower than the observed in 1000 permutations of the data). Figure 1 shows that the low zinc treatments (0, 1 and 2) are close together while the highest dose of zinc (I) is well separated from the other doses. The mean similarities suggest a relatively high degree of similarity for replicates within a treatment (the diagonal elements), while the between means show a decreasing similarity with increasing zinc concentration. Note that this is not clearly displayed in Figure 1 due to the horseshoe effect (Kendall 1971). Variance estimates suggest a high degree of variability for replicates using Stander's index. The multiple comparison procedure applied to the between means suggests overlaps between the 0.0, 0.05 and 0.50 treatments.
[Figure 1: multidimensional scaling plot of the replicates, Axis 1 against Axis 2.]

Figure 1. Plots of 12 replicate artificial streams from the Glen Lyn study in the first two species axes using multidimensional scaling on the matrix of Stander's similarities. Symbol 0 represents replicates for the 0.0 zinc treatment, 1 the 0.05 treatment, 2 the 0.50 treatment and I the 1.00 treatment.
TABLE 2

Data (in cells/cm2) on 11 species from a study on the effects of zinc on the algae community in the New River, influence measures and mean similarities.

ZN         0.0   0.0   0.0  0.05  0.05  0.05   0.5   0.5   0.5   1.0   1.0   1.0
REP          1     2     3     1     2     3     1     2     3     1     2     3  INFLUENCE
SPECIES
1          175   134    77    44    18    49   111    59    29    81    98    66       0.06
2         1745   931   393  1738   241   716   846   386   482   862   953   794       4.18
3          642     0   393    14    11    16    44    22    29    65    53    37       0.47
4          408   412   323    59    55    66    89    40    51    85    80    14       0.69
5           21    53    48   564   137   874     7     3     0    44    17     0       0.19
6          730   596   705  1923  1103  1891     7    18    14    40    26    51       2.33
7            0     0     3     0     0     0   163     3    44   694   561   185       0.02
8          204   134   230   876   427   924  5133   802   995  8457 14012 10028      44.73
9           29   187   141   222   200   208   326   155   111   102   205    14       0.06
10         379   245   300   490   159   216  1515   594  1084    98   989    37      -0.01
11        2293  1070  3613  5615  3725  7099 12940  9489 20116  1013  5740  1121     -15.09

MATRIX OF MEAN SIMILARITIES USING STANDER'S INDEX

ZN       0.00   0.05   0.50   1.00
0.00     .872
0.05     .882   .982
0.50     .824   .933   .967
1.00     .265   .322   .365   .973
The influence measures indicate some interesting relationships between the data and the similarity statistics. On looking at the data, one notices that some cell densities tend to be quite large. Easily observed differences are for species 5 and 7. However, these species have negligible importance to the differences in similarities. Species 8 and 11 are the species that determine the magnitude of the similarities. This peculiarity is due to the high abundance of species 11. To better understand the relationship between the data, influence and the differences between treatments, one must consider the proportions adjusted for the square root of the sum of the squared proportions, which are presented in Table 3. Note that species 11 dominates Stander's measure for treatments 0.0, 0.05 and 0.50. Only the last treatment alters this species' relative abundance. Species that have a strong decreasing effect on between similarity are species 8, 6, and 2. Species 8 increases relative to others in replicates of the 1.0 treatment while species 2 and 6 are diminished by increasing zinc. Differences in Figure 1 between replicates for the 0.0 and 0.05 treatments are primarily due to species 2, while species 6 contributes to differences between the replicates for the 0.05 and 0.50 treatments and species 8 and 11 affect separation of the 0.50 and 1.0 treatments.
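As a rough check on the adjustment described above, the following sketch scales the counts of one replicate by the square root of the sum of their squares. The function names are illustrative, and treating Stander's measure as the sum of products of adjusted proportions is an assumption that is consistent with the description of Table 3 but is not spelled out in this section. The two count vectors are columns of Table 2.

```python
import math


def adjusted_proportions(counts):
    """Divide each count by the square root of the sum of squared counts.

    Scaling raw counts gives the same result as scaling proportions,
    because the sample total cancels out of the ratio.
    """
    norm = math.sqrt(sum(c * c for c in counts))
    return [c / norm for c in counts]


def stander_similarity(counts_a, counts_b):
    """Sum of products of adjusted proportions (assumed form of Stander's index)."""
    pa = adjusted_proportions(counts_a)
    pb = adjusted_proportions(counts_b)
    return sum(a * b for a, b in zip(pa, pb))


# Species 1 to 11 from Table 2: replicate 1 of the 0.0 and of the 1.0 zinc treatment.
zn00_rep1 = [175, 1745, 642, 408, 21, 730, 0, 204, 29, 379, 2293]
zn10_rep1 = [81, 862, 65, 85, 44, 40, 694, 8457, 102, 98, 1013]

adj = adjusted_proportions(zn00_rep1)
sim = stander_similarity(zn00_rep1, zn10_rep1)
```

Under these assumptions the adjusted values for the first replicate agree with the first column of Table 3 to within rounding, and the similarity between the two replicates comes out low (roughly 0.2), in line with the small between mean for the 0.0 and 1.0 treatments in Table 2.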
DISCUSSION

The methods discussed above represent only a few of the possible techniques available for analysing community data. These methods are, however, oriented to the analysis of data using a specific measure of similarity. A number of other techniques such as principal components, discriminant analysis and detrended correspondence analysis (Greenacre 1984, Gauch 1982) are dependent on Euclidean type measures of distance. If interest is in Euclidean (or weighted Euclidean) distance then these methods, which are available on some computer packages, should provide useful results. One method of interest is Gabriel's biplot analysis (ter Braak 1983) which allows graphical displays of both the species and the replicates.

One drawback to the permutation approach is that the difference between the mean between and mean within similarity may be statistically significant without being large enough to be important. If all between similarities are only 0.01 less than the smallest within similarity, the degree of significance using the permutation test is the same as if the difference was 0.50. This problem is common with many nonparametric methods. Hence one should consider the magnitude of the between and within similarities as well as the significance of the test. An approach based on the size of the similarities is available if one is willing to make additional assumptions. Dyer (1978) presents a method based on linear models and the assumptions of normality and homogeneity of variances.
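The insensitivity of the permutation test to the size of the difference can be illustrated with a small sketch. It is not the procedure used in the example: instead of 1000 random permutations it enumerates every relabelling of two groups of three replicates, and the similarity matrices, function names and values are made up for illustration.

```python
from itertools import combinations


def mean_between(sim, group_a, group_b):
    """Mean of the similarities between every member of group_a and every member of group_b."""
    total = sum(sim[j][k] for j in group_a for k in group_b)
    return total / (len(group_a) * len(group_b))


def exact_p_value(sim, group_a):
    """Share of all equal-sized relabellings whose mean between similarity
    is no larger than the observed one (small values suggest a difference)."""
    items = range(len(sim))
    observed_b = [k for k in items if k not in group_a]
    observed = mean_between(sim, group_a, observed_b)
    count = total = 0
    for candidate in combinations(items, len(group_a)):
        rest = [k for k in items if k not in candidate]
        total += 1
        if mean_between(sim, candidate, rest) <= observed + 1e-12:
            count += 1
    return count / total


def block_matrix(within, between, size=3):
    """Similarity matrix for two groups of `size` replicates each."""
    n = 2 * size
    return [
        [1.0 if j == k else (within if (j < size) == (k < size) else between)
         for k in range(n)]
        for j in range(n)
    ]


# Between similarities 0.01 below the within similarities, then 0.50 below.
p_small_gap = exact_p_value(block_matrix(0.90, 0.89), (0, 1, 2))
p_large_gap = exact_p_value(block_matrix(0.90, 0.40), (0, 1, 2))
```

Both calls give the same p-value, because only the observed labelling and its mirror image place every pair in different groups. The test therefore says nothing about whether the gap is 0.01 or 0.50, which is why the magnitudes of the similarities should be examined as well.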
TABLE 3

Adjusted proportions (proportions divided by the square root of the sum of the squared proportions) for Glen Lyn data.

ZN          0.0    0.0    0.0   0.05   0.05   0.05   0.50   0.50   0.50    1.0    1.0    1.0
REP           1      2      3      1      2      3      1      2      3      1      2      3
SPECIES
1         0.056  0.082  0.021  0.007  0.005  0.007  0.008  0.006  0.001  0.010  0.006  0.007
2         0.562  0.570  0.105  0.276  0.061  0.096  0.060  0.040  0.024  0.100  0.063  0.079
3         0.207  0.000  0.105  0.002  0.003  0.002  0.003  0.002  0.001  0.008  0.004  0.004
4         0.132  0.252  0.086  0.009  0.014  0.009  0.006  0.004  0.003  0.010  0.005  0.001
5         0.007  0.032  0.013  0.090  0.035  0.117  0.001  0.000  0.000  0.005  0.001  0.000
6         0.235  0.365  0.188  0.306  0.281  0.252  0.001  0.002  0.001  0.005  0.002  0.005
7         0.000  0.000  0.001  0.000  0.000  0.000  0.012  0.000  0.002  0.081  0.037  0.018
8         0.066  0.082  0.061  0.139  0.109  0.123  0.366  0.084  0.049  0.984  0.921  0.991
9         0.009  0.115  0.038  0.035  0.051  0.028  0.023  0.016  0.006  0.012  0.013  0.001
10        0.122  0.150  0.080  0.078  0.041  0.029  0.108  0.062  0.054  0.011  0.065  0.004
11        0.739  0.655  0.961  0.892  0.949  0.947  0.922  0.994  0.997  0.118  0.377  0.111
The examples discussed above are for relatively simple designs. For example, we did not test directly for a time effect or interaction between time and treatment in the Glen Lyn study. Analyses of more complex designs are possible but not as straightforward due to the use of repeated measurements on the same experimental unit over time (Carter et al. 1982).
ACKNOWLEDGEMENTS

This paper was supported in part by NIB grant #18770.
REFERENCES

Bell, C.B., L.L. Conquest, R. Pyke and E.P. Smith 1981. Some nonparametric statistics for monitoring water quality using benthic species counts. pp.100-121. In Environmetrics 81: Selected Papers. Society for Industrial and Applied Mathematics, Philadelphia.
Berry, K.J. and P.W. Mielke 1984. Computation of exact probability values for multi-response permutation procedures (MRPP). Communications in Statistics: Simulation and Computation 13:417-432.
Bhattacharyya, A. 1946. On a measure of divergence between two multinomial populations. Sankhya 7:401-406.
Bradley, J.W. 1968. Distribution Free Statistical Tests. Prentice Hall, New York.
Carter, R.L., R. Morris and R.K. Blashfield 1982. Clustering two-dimensional profiles: a comparative study. Technical Report 175, Department of Statistics, University of Florida, Gainesville.
Dyer, D.P. 1978. An analysis of species dissimilarity using multiple environmental variables. Ecology 59:117-125.
Foutz, R.V., D.R. Jensen and G.W. Anderson 1985. Multiple comparisons in the randomization analysis of designed experiments with growth curve responses. Biometrics 41:29-38.
Gauch, H.G. 1982. Multivariate Analysis in Community Ecology. Cambridge University Press, Cambridge.
Good, I.J. 1982. An index of separateness of clusters and a permutation test for its significance. Journal of Statistical Computation and Simulation 15:81-84.
Gower, J.C. 1966. Some distance properties of latent root and vector methods used in multivariate analysis. Biometrika 53:325-338.
Greenacre, M.J. 1984. Theory and Applications of Correspondence Analysis. Academic Press, London.
Hajdu, L.J. 1981. Graphical comparison of resemblance measures in phytosociology. Vegetatio 48:47-59.
Hellawell, J.M. 1978. Biological Surveillance of Rivers: A Biological Monitoring Handbook. Water Research Centre, Stevenage, England.
Kendall, D.G. 1971. Seriation from abundance matrices. In F.R. Hodson, D.G. Kendall and P.A.P. Tautu (eds.). Mathematics in Archaeological and Historical Sciences pp.215-252, Edinburgh University Press, Edinburgh.
Lamont, B. and K.J. Grant 1979. A comparison of twenty measures of site dissimilarity. In L. Orloci, C.R. Rao and W.M. Stiteler (eds.). Multivariate Methods in Ecological Work. International Cooperative Publishing House, Fairland, Maryland pp.101-126.
Mielke, P.W., K.J. Berry, P.J. Brockwell and J.S. Williams 1981. A class of nonparametric procedures based on multiresponse permutation procedures. Biometrika 60:720-724.
Milliken, G.A. and D.E. Johnson 1984. Analysis of Messy Data. Volume 1: Designed Experiments. Wadsworth Inc., London.
Moore, W.E.C., L.V. Holdeman, E.P. Cato, I.J. Good, E.P. Smith, R.R. Ranney, J. Bunnerster 1984. Variation in periodontal flora. Infection and Immunity 46:720-726.
Pitman, E.J.G. 1937. Significance tests which may be applied to samples from any populations. Journal of the Royal Statistical Society (Series B) 4:119-130.
Sneath, P.H.A. and R.R. Sokal 1973. Numerical Taxonomy: The Principles and Practice of Numerical Classification. Freeman Publishing Company, San Francisco.
Stander, J.M. 1970. Diversity and similarity of benthic fauna off Oregon. M.S. thesis, Oregon State University, Corvallis, Oregon. 72 pp.
Sullivan, M.J. 1978. Diatom community structure: taxonomic and statistical analysis of a Mississippi salt marsh. J. Phycol. 14:468-475.
ter Braak, C.T.F. 1983. Principal components biplots and alpha and beta diversity. Ecology 64:454-462.
van Belle, G. and L. Fisher 1977. Monitoring the environment for ecological change. Journal of the Water Pollution Control Federation 49:1671-1679.
ASSOCIATION OF CHLOROPHYLL a WITH PHYSICAL AND CHEMICAL FACTORS IN LAKE ONTARIO, 1967-1981

A.H. El-Shaarawi1, J. Richard Elliott2, R.E. Kwiatkowski3 and David R. Peirson4

1National Water Research Institute, Burlington, Ontario
Department of Mathematics3 and Biology4, Wilfrid Laurier University, Waterloo, Ontario
1. INTRODUCTION

Eutrophication is a naturally-occurring process in all lakes. However, the rate at which a lake proceeds from an oligotrophic state (limited nutrient availability) to a eutrophic state (excessive nutrient availability) can be accelerated by anthropogenic inputs. The most visible impact of eutrophication on an aquatic ecosystem is that of increased biomass/number of biological organisms, particularly at the primary producer (algae) level. This excessive increase in biomass has been responsible for tainted drinking water supplies, reduced or depleted oxygen levels in the hypolimnetic (cold, deep) waters of lakes, beaches clogged with dead or decaying algal mats, and the overall impairment of the aquatic resource, making it unsuitable for other desirable uses.

Concerned about the deterioration of the water quality of the lower Great Lakes due to eutrophication (International Lake Erie Water Pollution Board and the International Lake Ontario-St. Lawrence Water Pollution Board, 1969), the governments of Canada and the United States signed the 1972 Canada-United States Great Lakes Water Quality Agreement (renewed in 1978). Though whole lake sampling of Lake Ontario water quality has occurred since 1967, a surveillance program specifically addressing the requirements of the Agreements was initiated in 1974.
One of the objectives of the surveillance program was to describe lake conditions on a spatial and temporal basis. Due to the fact that increased biomass is an expected effect of increased nutrient loadings (Vollenweider, 1976), it was concluded that the cumulative progress of eutrophication could be described by the long-term periodic measurement of phytoplankton standing stocks (Watson et al., 1975).

Two methods were readily available for measurement of phytoplankton biomass: direct enumeration and cell volume measurement with subsequent biomass calculation, or indirect measurement through chlorophyll a. Both methods have advantages and disadvantages. Direct enumeration was costly (about $100.00/sample) and time consuming (2 to 3 samples/day/person), and difficulties existed with accurate linear measurements of the cells and adequate dry weight estimates of the organisms. The advantage, assuming that an adequate estimate of numbers could be obtained by subsampling the parent population, was a highly accurate estimate of phytoplankton biomass. Chlorophyll a was inexpensive (about $2.00/sample) and could be done quickly and efficiently (80 samples/day/person). The disadvantage was simply that the chlorophyll a concentration within a given phytoplankton cell varies with species, age and the overall physiological status of the algal cell, as well as the nutrient and physical environment in which the phytoplankton cell exists.

Due to the size of the Lake Ontario surveillance program (approximately 90 stations sampled monthly, Figure 1), direct phytoplankton enumeration and measurement was considered to be too costly and time prohibitive to be used as a routine surveillance parameter. As a result, chlorophyll a became the only biomass indicator routinely measured on all surveillance cruises.

Now that reductions in spring total phosphorus concentrations have been clearly documented (IJC, 1983), corresponding reductions in algae biomass are expected, as predicted by the Vollenweider model (Vollenweider, 1976). As a result, the long-term historical chlorophyll a data set has become a valuable record of the trophic status of Lake Ontario. The aim of this paper is to find if statistical relationships between chlorophyll a and other commonly measured water quality variables exist, and to study the stability of these relationships seasonally, as well as over time, to provide insight into the overall limnological processes influencing the trophic status of Lake Ontario.

[Figure 1: map of station locations. Legend: station deleted 1975; station not sampled 1975-1976; station deleted 1971; station added 1975; station added 1976; station added 1977.]

Fig. 1. Station Pattern for Lake Ontario, 1974-80.
2. MATERIALS AND METHODS

Chlorophyll a has been collected on all whole lake cruises on Lake Ontario since 1967. The number of cruises in any one year ranged from three (1971 and 1973) to 17 (1974), while the number of stations in any given cruise ranged from 32 (1972, 1980) to 95 (1969). It should be pointed out that in the pre-1974 years no two years had identical station and cruise patterns. Consistency in whole lake monitoring began with the surveillance cruises, starting in 1974. Similarly, changes in the methodology in collection and analysis of chlorophyll a have occurred. For 1968 and 1969, continuous fluorometric measurements of surface waters were performed using the method
276 of
Lorenzen
(1966)
concentration.
to
In
estimate
1970,
photometrically using the Parsons,
1968)
Starting
in
at
phacopigments
the
analytical
Yentsch
the
0-20
a
1974. and
Further
integrator
details lists
on
for
the
data
used
in
computer
at
the
Canada
Ontario,
in
the
STAR
stored 1973
in
two
files,
inclusive
Lake
on.
efficient
The
-a
data
silica
procedures (19741, Due
can
and
found
differences
in
programs were r e q u i r e d had
been
record the
sampled
for
further
period
1967
variables
were
collected
between
found
on
a
Cyber
Burlington,
data
the
data
from
data
was
1967 t o
from
retrieved
1974
from t h e
p r o g r a m PRETl w h i c h w a s
the
(mg
chlorophyll
total
phosphorus
P/Q),
integrated
temperature
depth
at
(meters),
the
soluble
integrated particulate organic of
i n mg N / Q ) m e a s u r e m e n t s w e r e
on
frequently
integrated
not
analytical
Department
of
the
techniques,
these
SORT1 w a s 1969,
1969
methods
stations at
combine
at
recorded,
and
so
field
Environment
(1975).
select
July
measured
(metals),
N/Q),
sampling
to
July
be
were
analysis. to
location
can
Ontario
the
1973 measurements
in
to
and
in
(both
details
be
Lake
all
study were
(mg
and Watson and W i l l i a m s to
stored
secchi
nitrogen
Further
station
Waters,
phosphorus
C),
ammonia a n d n i t r a t e / n i t r i t e retrieved.
starting
i n v e s t i g a t i o n were
From 1 9 6 7 t o
particulate
1969)
are
The
the
(mg S i O , / Q )
(me C / Q ) .
phytoplankton
Inland
station
nitrogen
reactive
and
samples were obtained
containing
for
reactive
(deg.
biomass
using
(1969)
program a v a i l a b l e .
retrieved
soluble
point
total
for
particulate
sampling
carbon
base.
sounding depth of
P/Q),
(mg total
study
general access
retrieval
variables
(pg/Q),
Lorenzen
frequency,
for
other
The v a l u e s r e q u i r e d
analyzed
2)
1983. this
Centre
the
data base using the most
by
column.
also
Ontario cruises
one c o n t a i n i n g
and
were
(Schroeder,
cruise
the
water
phacophorbide
chlorophyll
i n Kwiatkowski and N e i l s o n ,
A l l
and
spectro-
( S t r i c k l a n d and
the
samples
5
described
stratum,
metre
parameter
chlorophyll
chlorophyll
determined
in
To p r o v i d e e s t i m a t e s o f
i n t h e t o p 20 m e t r e with
depths
(phacophytin
procedure
(1970).
was
SCOR/UNESCO e q u a t i o n
discrete
1972,
for
fi viva
the
chlorophyll
during
the
water
and
1971,
various
which c h l o r o p h y l l
data
into
applied
to
which
a l l
surface. a l l
sorting
of
the
a
single
data
from
relevant For
data
variables
277 except one
t e m p e r a t u r e and c h l o r o p h y l l were m e a s u r e d
meter,
and
the
remaining fewer
variables.
than for
than
75 s t a t i o n s ,
these;
measured data.
at
a
From
and
the
meter
and
used
from
20
the
1973,
at
a depth
combine
of
the
one
s u r f a c e d a t a on t h e
most
surface data
cruises
only
cruises
so
which
sampled
SORT1 w a s
sampled
more
t e m p e r a t u r e were
depth
applied
and
SORT3 w a s
sampling
generate
to
these
for
total
particulate
comparable
SORT4 a v e r a g e d
all
of
or
less
for
analysed
more
than
meters
used
was
phosphorus,
To
variables,
to
the variables except
total
chlorophyll.
readings
had
integrated
nitrogen,
remaining
and
remaining
a l l of
1974 on,
particulate carbon
for
one
1972
In
75 s t a t i o n s
used
ding
SORT2 w a s
r e a d i n g s on t h e s e v a r i a b l e s w i t h
meter
the
program
readings
the
each
organic for
correspon-
variable
per
stat ion. Some
data
variables variable for
were such
measured
at
secchi
depth
as
a n a l y s i s but
s i z e by more
others
would
were
partitioned
into The
alternate data
be
not. two
set
be
would
reduce
seemed
the
large
was
all
example,
a
enough
p o s s i b l e sample
instances,
the
not
not
often
measured
containing
analysis
For
In o t h e r
consistently
subsets
as
measured
Consequently,
second
once
station.
would
approximately one-third.
variables.
3.
each
including it
variables
where
were
sets
data
at
one
would
different performed
enough t o warrant
or
stations become
sets
unless
of the
it.
3. RESULTS

The following tables and graphs summarize the results of the regressions on the data from 1967 to 1984 inclusively. Tables 1.1 to 1.4 contain the results for each cruise. Cruises are identified by their starting dates, together with the maximum R-squared value, the number of cases and the total number of variables in the equation for each regression. This R-squared value is obtained using the default conditions in the program for entering variables. There is a minor drop in the R-squared values for July and August, which can be seen more clearly in Figure 2, which graphs the R-squareds versus Julian days for the period 1974-1981. This drop is most easily identified in the results from the years 1974 to 1981. Tables 2.1 to 2.4 indicate the rating given each variable entered into the equation by the stepwise regression program.
TABLE
1.1
R2 v a l u e s t o 1971
for
Date
** ** ** ** ** NS
** ** ** ** **
** NS ** ** ** ** NS NS NS
** ** ** ** NS
NS
** ** ** ** ** ** NS
** ** ** ** ** NS NS
* ** **
Jan Feb Mar Mar Mar Apr Apr Apr Apr May May May May Jun Jun Jun Jun Jul Jut Jul Jul
3 30 31 31 12 27 28 30 I 12 25 29 3 9 20 22 2
data
on
Number
from 1967
of Cases
Number o f Variables
13 46 49 21 20 17
7 6 7 8 3 7
.61 .60 .49 . 9 3 (**) .54 .59 .80
11
18
7
(**)
.09
37 24 19 30 37 15 12 37 14 40 31 23 31 37
.52 .31 .88 .84 (**) . 6 8 .II
25 34 21 17 33
.83
38
3
.53
33
4
.78 .34 .66 .79 .67 ( .92 .86 .47 .60 .67 . 8 9 (**I . 5 6 (**I
22 28 21 27 16 16 26 25 30 22 24 15 23 19
I 3 5 5 8 6 3 4 7 5 8 8 3 7 7 3
.71 .67 .71 .75 .67 .97 .20 .52 .86 .45 .34 .46 .I1
R 10 16 Jill 2 3 J u l 29 Aug 5 Aug 5 Aug 9 Aug 1 7 Aug 1 9 Aug 2 1 Aug 2 1 Sep 5 Sep 8 Sep 10 Sep 14 Sep 16 act 1 Ost 2 Oct 5 OCt 8 Oct 1 7 O c t 27 Oct 2 8 Oct 31 NOV 15 Nov 1 6 NOV 1 8 Dec 1 Dec 8
.I9
.44 .21
.oo
.91 .85 .40 at at
regression
program.
order
done
R2
* Significant **Significant
its
regressions
the the
of
.73
(**)
3 5 3 8 2
6 3 8 3 3 6 7
5 3
5 4 7 8 3
36
39
59 42
54
18
22
7
7
7 6
7
5% l e v e l 1% l e v e l
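The rating reported in Tables 2.1 to 2.4 is the order in which a variable entered the stepwise equation, multiplied by the number of variables available and divided by the number of variables that actually entered. A minimal sketch of that arithmetic follows; the function name and argument names are illustrative and not taken from the original programs.

```python
def rating(order_of_entry, n_available, n_entered):
    """Order of entry scaled by (variables available) / (variables entered).

    A variable entering first gets the smallest rating, and the last
    variable to enter always gets a rating equal to n_available.
    """
    if not 1 <= order_of_entry <= n_entered <= n_available:
        raise ValueError("expected 1 <= order_of_entry <= n_entered <= n_available")
    return order_of_entry * n_available / n_entered


# Eight variables were available for the 1967 to 1972 data.
first_of_seven = rating(1, 8, 7)   # first variable in a seven-variable equation
second_of_three = rating(2, 8, 3)  # second variable in a three-variable equation
last_of_five = rating(5, 8, 5)     # last variable in a five-variable equation
```

Rounded to two decimals these give 1.14, 5.33 and 8.00, values of the kind that appear repeatedly in the rating tables. The scaling puts equations with different numbers of entered variables on a common 0 to 8 (or 0 to 9) scale, so low ratings mark variables that entered early.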
The rating was obtained by multiplying its order of entry by the total number of variables available (8 for 1967 to 1972 and 9 for 1974 to 1981), and then dividing by the number of variables which actually entered the equation. An average rating for each variable was then obtained. This was done to facilitate comparison of the importance of each variable between equations and over all the equations. In Tables 1 and 2, a line under the number
279 1.2
TABLE
R2 v a l u e s
f o r r e g r r r s i o r l r d o n e on d a t a
R2
Date
** ** ** ** ** ** ** ** ** ** * **
**
** ** ** ** ** ** ** ** ** ** ** ** ** ** ** NS NS
** ** NS ** ** **
**
** **
** **
** **
Jan Jan Jan Jan Jan Jan Jan Jan Feb Feb Mar Mar Mar Mar Mar Apr Apr Apr Apr Apr Apr Apr Apr Apr Apr May May May May May May Jun Sep nct Oct Oct Oct Nov Dec Dec Dec Dec Dec Dec
3 3 8 9 15 16 29 30 12 26 6 12 13 26 27 4 5 10 10 10 11 17 18 24 25
lowest
(72) (72) (72) (72) (72) (72)
11 12 18 19
the
the
important
the
.
.I5 .84
.56 .51 .77 .38 .70 .87 .63 .86 .75
obtained
determined In
the
nitrogen and
1% l e v e l ) , first
at the
significant in
by
years
1967
(second
temperature
were
In
to
and
1971
at
and
at
most
the
important
the
with three
the most
1% l e v e l ) ,
the
and
first
1% l e v e l
the
1973,
the
three
nitrogen
(second
soluble
second
5% l e v e l .
the
1% l e v e l
the
silica
the
(third at
(first
and
at
variables
nitrate/nitrite
5% l e v e l )
1% l e v e l
at
1972
soluble reactive the
2,
Table choosing
secchi depth
5% l e v e l ) .
variables
and
at
.
v a r i a b l e s were
the
(first
.
averages were
5% l e v e l ) at
2 2 2 7 2 2 2 2 3 3 7 3 2 3 2 3 2 3 8 7 3 3 3 3 3 3 3 3 3 8 7 7 7 3 7 8 7 7 2 3 3 2 2 2
t h e v a r i a b l e was
nitrate/nitrite
second
19 33 23 19 37 44 17 21 17 17 29 23 44 26 43 24 43 25 15 32 19 24 16 23 18 31 19 28 20 16 31 31 24 18 31 17 59 32 43 19 17 40 37 21
.
(I) (2)
ratings.
important
Number o f Variables
.79 .73 .81 .94 .84 84 .93 .94 .73 .85 .40 .47 .33 .61 .42 .60 .53 74 .95 .74 .50 58 .62 .81 76 .75 .79 .59 78 .88 .66 .84 .80
(2)
72) 72) 72) A 72) B 72) 72) 72) 72) 72 1 72) 72) 72) 72) 72) 72) 72) 72) 72) 72)
indicates that From
Number of Cases
.
I
variables
(1)
(73) (73)
2 8 9 23 23 19 5 3 17 30 30 20 5 5
from 1972
1973
to
at
reactive
and most
(third
the
at
at
1% l e v e l
phosphorous
5% l e v e l ) .
For
the
TABLE
1.3
R2 t o
for
values 1977
repressions
R2
Date
** ** ** ** ** ** ** ** ** ** ** **
*
** ** ** ** t* ** ** ** ** ** ** ** ** ** ** * ** ** ** ** ** ** ** NS
** ** * **
** ** **
** ** ** ** ** **
** ** ** ** ** ** ** **
**
* ** ** ** ** ** ** ** **
Mar Mar Apr Apr Apr Apr Apr Apr Apr Apr Apr Apr Apr Apr May May Hay Hay May May Jun Jun Jun Jun Jun Jun Jun Jun Jun Jun Jun Jun Jun Jun Jul Jul Jul Jul .Ju1 Jul Jut Jul Jul Jul Jul Jul Jul Aug Aug Aug Aug Aug Aug Aug Aug Aug Aug Aug Aug Aug Aug Sep Sep Sep Sep Sep Sep Sep
15 15 5 5 11
11 12 12 16 16 26 26 29 29 9 9 13 13 23 23 2 2 3 3 6 6 7 7 17 17 28 28 28 28 2 2 2 I8 18 19
LL 21 22 27 27 27 27 6
6 12 12 12 12 12 12 15 15 17 17 19 19 2 2 2 2 2 3 3
(77) (77) (76) (76) (75) (75) (77) (77) (74) (74) (76) (76) (74) (74) (77) (77) 74) 74) 75) 75) 75) 75) 74) 74) 77) 77) 76) 76) 74) 74) 76) 76) 76) 76) 74) 75) 75) 77) 77) 76) (75) (75) (74) (76) (74) (76) (76) (74) (74) (74) (74) (74) (74) (75) (75) (77) (77) (76) (76) (74) (74) (75) (75) (75) (75) (75) (74) (74)
A B A
B A
B A B A B A B A B A B A B A B
A B A B A B A B A B A B C
D A B C A B A
A B A A
a C
D A
B A B
c D A
B A B A B A B 0 A B
c D A B
.
96 .88 .93 .93 .95 .90 .92 .86 .86 .R3 .95 .94 .93 .94 .94 .90 .92 .72 .79 .69 .93 .92 .89 76 .95 .90 .95 .84 .78 f38 .82 64 .79 .65 .85 .73 .38 .77 .73 .32 .60 .70 .69 .55 .63 .67 .68 .39 .17 .80 .68 .79 .54 .58 .65 .55 76 .89 .84 .66 77 .86 63 .58 .70 .31 .a2 .70
.
.
.
. .
done
on
data
from
1974
Number of Cases
N u m b e r of Variables
31 71 20 48 21 45 50 87 19 38 24 47 16 40 48 93 21 40 23 43 17 28 22 40 31 53 24 46 27 42 14 23 42 64 14 20 62 28 82 18 28 46 30 23 40 35 58 22 42 15 30 15 20 28 46 27 53 27 47 22 39 13 24 42 18 31 20 33
7 6 7 6 8 7 7 6 5 4 5 4 5 4 7 6 6 5 5 4 8 7 5 4 7 6 7 6 5 4 6 5 5 4 5 6 4 5 4 3 5 4 5 6 2 5 4 5 4 5 4 5 4 5 4 5 4 7 6 5 4 7 6 5 6 5 6 5
283 R2 v a l u e s f o r r e g r e s s i o n s done
1.3
TABLE
1977
to
Date
** ** ** ** ** ** ** * ** ** ** **
** ** ** NS
** ** *
** **
** ** ** ** ** ** ** ** ** ** ** ** ** **
R2
14 14 15 NOV 15 Nov 1 5 Nov 1 5 Nov 2 5 Nov 2 5 Dec 3 Dec 3 Dec 5 Dec 5 Nov
NOV
5 4 7 6 5 4 6 5 5 6 5 5 4 5 4 5 4 7 6 5 4 5 4
.
1977,
soluble
reactive
reactive
phosphorous
5%
29 63 38 94 24 42 18 32 22 18 34 34 65 20 43 20 37 18 45 19 53 19 44 18 53 11 33 28 63 16 38 21 48 16 45
.
1974 t o
level)
1974
from
Number o f Variables
.
Nov
period
data
Number o f Cases
.95 88 .73 .78 ,91 .92 .91 .87 .99 .91 .91 .89 .87 .78 .72 76 .67 .93 .82 .70 .75 .89 .72 .54 .69 .96 .79 .69 76 .84 .70 .89 .92 .86 .73
7 7 12 12 16 16 22 22 30 4 4 4 4 14 14 15 15 25 25 25 25 3 3
Sep Sep Sep Sep Sep Sep Sep Sep Sep Oct Oct Oct Oct Oct Oct Oct Oct Oct Oct Oct Oct Nov Nov
on
Continued
and
the
silica
three
total
most
(third
(third at
5 4 6 5 .5 4 6 5 5 4 5 4
important
at
1% l e v e l ) ,
the
1% l e v e l a n d
the
particulate
nitrogen
were
variables
soluble
second
(first
at
at
the both
l e v e l s 1. For
all
inclusive
remaining
were
observed.
This
inconsistencies above. period
The
summaries,
excluded
the
method
inconsistencies
virtually
impossible
of
also to
from
1967
significant
no
principally
was
in
results
since
due
data
to
to
1973
trends
were
the
numerous
collection,
made
the
relate
to
results the
as
noted
from
1974
this
to
1981
data. Examination variables R-squared high.
of
available value
Total
was
the
data
indicates
that
the
number
to
the
regression
was
lower
when
the
low
versus
value
was
particulate
when
nitrogen
the
was
R-squared
the
most
common
of
first
Fig. 2. R2 values for regressions done on data from 1974 to 1981.
v a r i a b l e when the
year,
the
the
R-squared
was
low.
of
the
percentage
accounted
for
the
equation;
they
were
percent.
Based
on
were
three
variables
very
high,
usually
the
ratings
4.
i n o r d e r of
between
from T a b l e
integrated
their
R-squared entered
2,
the
and
100
70
the
value
into
three
most
for estimating the chlorophyll
s o l u b l e r e a c t i v e p h o s p h o r u s and carbon,
total
first
frequently utilized variables concentration
Consistently throughout
total
particulate
5
nitrogen,
integrated particulate organic
importance.
DISCUSSION number
A
of
d i s t r i b u t i o n of 1970;
Nicholson,
et al., and past
1973;
have
chlorophyll 1970;
been
2
relate
1980).
published
Few
chlorophyll
on
o v e r Lake O n t a r i o
Glooschenko
Glooschenko and
Kwiatkowski, to
papers
et &.,
Dobson, attempts
2
1975; have
the
horizontal
(Chau et
1972;
al.,
Glooschenko
Kwiatkowski, been
concentrations
to
made the
1978
in
the
other
283 TABLE
1.4
R 2 v a l u e s f o r r e g r e s s i o n s d o n e o n d a t a from 1 9 7 4 to
1977
Date
** ** ** ** ** ** ** ** ** ** ** ** ** ** ** ** * *;
** ** ** ** ** ** ** ** ** ** ** ** ** ** ** ** ** ** ** ** ** ** ** ** ** ** ** ** ** ** ** ** ** ** ** ** ** ** ** ** ** ** ** ** ** ** **
Mar Har Mar Mar Mar Mar Har Mar Apr Apr Apr Apr
Apr Apr Apr Apr Apr Apr Apr Apr May May May May May May Jun Jun Jun Jun Jun Jun Jul Jul Jul Jul Jul Jul Aug Aug Aug Aug Aug Aug Aug Aug
R2
16 16 20 20 21 21 24 24
. .
4 4 8 8 10 10 21 21 27 27 30 30 8 8 19 19 28 28 5 5 15 15 25 25 4 4 13 17 30 30 8 8 10 10 25 25 27 27
Sep 5 Sep 5 S e p 14 S e p 14 S e p 17 Sep 17 Oct 5 Oct 5 O c t 10 O c t 10 Nov 1 4 Nov 14 NOV 1 6 Nov 1 6 Nov 1 9 Nov 1 9 Dec 7 Dec 7
.86 .86 .83 76 96 .92 .94 .94 .94 .R4 .95 .92 .94 .91 .93 .92 .87 .80 .95 .85 .95 .82 .78 .79 .93 .82 .82 .73 .89 .85 .90 .88 .66 .72 .57 .43
79) 78) 78) 81) 81) 79) 79) 78) 78) 81) 81) 80)
80) 79) 79) 78) 78) 31)
B A B A B A B A B A B A B A
B A B A
81)
5
79) 79) 31) 31) 78) 78) 38) 38) (81) (81) (79) (79) (81) (81)
A B A B A B A B A B A B A 5
.80 .66 .79 .66 .86 .87 .65 .69 .80 .62 .66
.62 .88 .85 .91 .90 .74 .82 .85 .73 .55 64 .90 .82 .89 .84 .93 .79
.
Number of Cases
Niirnber o f Variables
40 90 39 83 43 89 42 88 43 94 35 92
7 6 7 6 7 6 7 6 7 6 7 6 7 6 7 6 5
45 92 59 92 55 93 45 93 27 56 62 94 35 59 33 54 61 93 34 59 30 58 55 92 33 59 28 58 45 89 35 93 26 59 54 92 51 94 52 94 44 93 26 59 18 53 31 88 39 93 31 93
4 7 6 7 6
5 4 5 4 7 6 5
4 5 4 5 4 5 4 5 4 5 4 7 6 7 6 5 4
7 6 5
4 7 6 5 4 5 4 5 4 7 6 7 6 5 4
284 Rated o r d e r of v a r i a b l e s
TABLE 2 . 1 Date
Depth
Jan I1 Feb 3 Mar 3 Mar 3 0 Mar 3 0 Mar 31
Aug
Aug Aug Aug Auz Aug
Sep Sep Ser, Sep Oct Oct Oct Oct Oct Oct Oct Oct Oct Oct Nov Nov Nov NOV Dec Dec Dec
2.67 4.00
--
5.00 1.60 2. 67 1.60
variables Notable
8.00
5.33 5.33 8.00 8.00
2.67 __ 6.40 4.80 1.00
-
Stadelmann only one
and
a
chlorophyll the
a
1.14 3.43 3.43
1 .00 3 .00
5.71
4.57
6.86
6.86
5 . 71 3.20 --
4.80
0.00
8.00
5.00
(8.00)
4.57
3.00 7 . 0 0 2.00 1.14 2.29 3.43
8.00
7.00
6.00
5.00
2.00
3.43 7.00 6.40
2.29
(8.00) 3.00
6.36
(8.00)
1. I 4 2.00
5.00 6.00
7.00 5.00 4.57
1.00
2.00
1.00
4.80 6.40 3.20
2.00
6.00 7.00 6.86
3.00 8.00
2.29
1.00 3.43 - -
4.00 6.00
3.00
7.00
5.00
3.20 8.00 6.00 5.71 7.43
4.80 6.40 7.00 4.57 5.71
2.00 1.14 1.14
6.00 4.57
(8.001
5.71
8.00
3.00 2.67
8.00
1.60 6.86 8.00 4.00 6.86 8.00 2.29
1.14
3.43
8.00
(8.00)
7.00 4.57
3.43 6.40 6.00 4.57 3.00 4.57
2.00
1.00
7.00 8.00 4.00
1.16 7.00 6.86
I _
8.00
5.00 5.71 1.00
5.71
2. 67 5.71 4.00 5.71
5.00
6.86
6.86
1.99 2.79 3.19 3.18
3.43 2.29 -
4.98
are significant
5.09 at
5% l e v e l
the
measured are
Munawar
(1975), located
Kwiatkowski concentrations
The p r e s e n t
on
on and from
a
4.49
__ 2.18 2.05 3.0L
surveillance and
although
variables paper
1.97 4.87
the
Stadelmann
2.15 2.31
1.97 --
2.20 2.20 5. 11
I _
2.29
2.00-
8.00
physio-chemical
1974.
2.00
5.71 4.57 2.29
1.00
(8.00)
1.60
few s t a t i o n s year.
6.86
(8.00)
8.00
4.80
1.84 2.04 3.95
routinely
4.57
(8.00)
(5.00)
I _
exceptions
3.43
8.00
rn
1.14
underlined
6.86
8.00 2.67
2.67
4.61
year,
8.00
8.00 2.67
3.20
*Values
with
-
1.00
2.35 3.37
to
2.29 5.33
1.14
1 8
over
8.00
1.00
27 5 5 9 9 17 21 5 10 14 16 I 2 2 5 13 17 27 28 31 31 15 15 16 18 1
1.14 -
TP
SRP
N02,NOs
8.00 5.71 4.57 7.00 8.00
4.00
5.00
1.60
May 1 Mav 1 2 May 2 5 May 27 Jun 3 Jun 9 J u n 20 J u n 22 .Jill 2 Jul 8 J u l 10 J u l 16
NH3
3.43 6.86 5.71
2.67
12 Apr 28 Apr 30
Jul
Temp S e c c h i S i l i c a
(70) 4.57 2.29 (70) 2.29 *1.14 (70) (71)A (5.00) 6.00 ( 7 1 ) B 2.29 3.43 (70) 5.33 8.00 1.14 (69) (70) 8.00 (68) 6.40 (69) 5.33 2.67 (69) 6.00 4.00 (8.00) 5.33 70) 6.86 68) 5.71 8.00 69) 5.33 69) 3.00 69) 5.33 2.67 70) 68) 4.57 5.71 69) 6.00 4.00 8.00 ?.20 67) 8.00 5.33 70) 67) 4.80 3.20 67) 69) 3.00 4.06 7 1 ) A 4.00 2.00 71)B 5.71 70) 2.67 8.00 8.00 (57) 2.67 2.2 6.00 (67) (69) 4.00 5.33 (70) 8.00 8.00 1.60 (67) (67) (69)A 4.00 5.00 3.43 ( 6 9 ) B 6.86 (68) 6.86 2.29 (70) 5.33 (67) 3.20 6.40 (68) 4.57 2.29 (67) 4.80 3.20 2.00 (69)A 3.00 ( 6 9 ) B 3.43 2.29 ( 7 1 ) A 6.00 5.00 1.14 (71)B 3.43 8.00 (70) 5.33 (68) 4.57 1.14 5.00 ( 6 9 ) A 6.00 8.00 (69)B 2.67 5 . 3 3 (70)
Apr
1967 t o 1971
Fraser these
transect
El-Shaarawi all
measured,
and
were r e s t r i c t e d of
Lake
(1977)
surveillance
represents the
program.
(1974)
but
only
first
Ontario related stations for
attempt
one to
TABLE 2.2
Rated order of variables 1972 to 1973
Date Jan Jan
Jan Jan Jan Jan Jan Jan Feb Feb Mar Mar Mar Mar Mar Apr Apr Apr Apr Apr Apr Apr Apr Apr Apr May May May May May May Jun
Sep Oct Oct Oct Oct Nov Dec Dec Dec Dec Dec Dec
3 3 8 9 15 16 29 30 12 26 6 12 13 26 27 4 5 10 10 10 11 17 18
24 25 1 2 8 9 23 23 19 5 3 17 30 30 20 5
5 11 12 18
19
Depth
Temp
8.00 8.00 8.00
4.00
(73)(1) (73)(2) (73) (73) (73) (73) (73) (73) (73) (73) (73) (73) (73) (73) (73) (72) (72) (72) (72)A (72)B (72) (72) (72) (72) (72) (72) (72) (72) (72) (72)A (72)B (72) (72) (72) (72) (73)A (73)R (72) (72)(1) (72)(2) (72) (72) (72) (72)
Secchi
Silica
NOZ,N03
(8.00)
2.29
3.43
5.72
6.86
SRP
TP
4.00
4.00
4.34
5.72
4.00 4.00 4.00 4.00 5.33
8.00
8.00 8.00 5.00
2.67 5.33 2.67 1.14 3.43
8.on 8.00
5.33
8.00 8.00
__ 2. 67
2.67
5.33
8.00
2.67
8.00 (8.00)
4.00
4.00
4.00 8.00
5.33
8.00
2.67 2.67 2.67
__ 2.67
5.33
5.33
8.00 8.00 8.00 8.00 8.00 8.00 8.00 8.00
4.00
8.00
2.67
2.67 2.67 7.00 2.29 8.00 4.57
5.33 5.33 6.00 8.00
4.57 6.86 5.33 4.57 4.00
__ 1.00
2.67
8.00
6.84 7.00
2.29
8.00
4.57
3.43
4.00
8.00 8.00
4.00
2.00
4.00 4.00
8.00 8.00 8.00
1.00
7.00
5.33 2.67
5.33
5.33 5.33
(8.00)
6.86
1.14
4.57
2.29
5.00
8.00
5.33
8.00 5.33 6.00 6.86
2.67
3.00
2.00
5.72
2.29
3.43
1.14
5.00 3.43 1.14 2.29 -
3.00 6.84 3.43 8.00
2.00 5.72 5.72
1.14
5.72 6.00 5.72 6.86
8.00
3.00
3.43
2.29
8.00
4.57
4.00 4.57
3.43 1.14 5.00 1.14 1.14
6.36 5.72
2.29
3.43 1.00
2.29 8.00 4.57 8.00
1.77 2.24 -
1.14 2.29 1.68 1.71 -
6.86 5.72
1.14
8.00
4.00
3.61 - 4.18 - 5.27 3.56 3.76 5.17 -
-
2.11
2.75 __
4.60 4.37
6.37
2.67 -
1.46 __ 1.46 __
__ 2.11
NH3
9 4.24
establish if a consistent pattern exists between chlorophyll a and the other variables measured on the Lake Ontario cruises 1967-1984.

In Lake Ontario, phytoplankton biomass maxima, represented by chlorophyll a concentration, were consistently found in the spring and fall, with minima in the summer. This bimodal pattern has been reported as being typical of large temperate oligotrophic lakes (Hutchinson, 1967). It is interesting to note that the other biomass indicators, integrated particulate organic carbon and
TABLE 2.3
Rated order of variables 1974 to 1977
Date
Depth
Mar 15 (77)A Mar 15 (77)B 5 (76)A
Apr Apr
5.72 (8.00) 5.72 5.33 8.00 2.29
5 (7618
A D r 11 (75)A
Apr 11 (75)B 4.57 Apr 12 (77)A Apr 12 (77)B 4.00 Apr 16 (74)A 4.110 Apr 16 (74)B 8.00 Apr 26 (76)A 8.00 Apr 26 (76)B 8.00 Apr 29 (74)A (8.00) Apr 29 (74)B 8.00 May 9 (77)A 4.57 May 9 (77)B 5.33 Mav 13 (74)A 6.67 Ma; 13 (74)B 1.60 May 23 (75)A 4.80 May 23 (75)B 6.00 Jun 2 (75)A 6.00 Jun 2 (75)B (8.00) Jun 3 (74)A 4.80 Jun 3 (74)B 6.00 Jun 6 (77)A 8.00 J u n 6 (77)B 5.33 4.57 Jun 7 (76)A Jun 7 (76)B 4.00 J u n 17 (74)A 3.20 Jun 17 (7418 4.00 Jun 28 (76)A 5.33 Jun 28 (76)B 6.40 Jun 28 ( 7 6 ) c 3.20 4.00 Jun 28 (76)O J u l 2 (74)A 4.80 J u l 2 (74)B 5.33 J u l 2 (74)C 6.00 J u l 18 (77)A 6.40 8.00 J u l 18 (77)B J u l 19 (76)A 8.00 J u l 21 (75)A 8.00 J u l 21 (75)B (8.00) J u l 22 (74)A 4.80 J u l 27 (76)A 5.33 J u l 27 (76)C (8.00) J u l 27 (76)O 6.00 Aug 6 (74)A 1.60 Aug 6 (74)B 2.00 Aug 12 (74)A Aug 12 (7418 2.00 Aug 12 (74)C Aug 12 (74)O 2.00 Aug 12 (74)A 6.40 8.00 Aug 12 (7418
1.60
2.35 3.37 4.61
integrated biomass,
noted
8.00 6.67 3.43 4.00 6.00 8.00 5.72 8.00 3.20 6.00 6.40 6.00 3.20 6.00 6.86 6.67 8.00 8.00 3.20 4.00 2.00 2.29 6.40 8.00 2.29 2. 67 8.00 8.00 4.80 6.00 4.00 8.00 4.80 8.00 6.40 2.67 4.00 3.80 4.00 5.33 3.20 4.00 1.60 8.00 4.80 8.00 3.20 4.00 4.80 4.00 8.00 6.00 3.20 4.00
4.57
2.29 -
2.29 -
4.57
(8.00)
1.14 4.00
-
SRP
IPOC
3.43 1.33
6.86 2.67
1.00 1.14 2.29
5.00 4.57 3.43 2.67
1.148.00 6.67 2.67
3.00 3.43
5.72 1.14 1.33
5.33
E
m
2.00 1.60 -
3.20
2.
4.00
on
4.
no
6.40 2.00
2.29 -
5.72 4.00 -
1.33 (6.40) 4.00
3.00 5.72
3.20
-
4.57 -
-
3.43 -
8.00 2.29 6.67
6.86 6.40
x -
1.33 4.00 4.80 (8.00) 2.00 __ 1.00 3.43 8.00 4.00 1.14 1.33 5.72 2.67 1.60 2.00 1.60 __ 1.60 2.00 8.00 (8.on) 8.00 8.00 6.00
6.40 3.20 6.67 4. A0 -
(8.00)
1.14
1.60 2.00 -
6.86
5.72 6.67
4.00 -
1.14
3.43 5.33
1.33
(8.00) (11.00) (8.00) 4.80 8.00 6.00 1.60 1.33 2.00 1. 60 2.00
2.67 3.20
-
4.00
_ .
8.00 8.00 6.00
8.00 1.33
4.00
E 4.00
3.20 -
4.80
6.40 4.00
(R.00)
4.80 6. on
3.20 4.00 1.99 3.19 4.98
2.79 3.18 5.09
particulate plus
-
6.00 4.57
6.00 6.40 8.00
3.20 6.67 6.40
3.20
living
ts.oo)
7.00 6.86
8.00
4.80 6.00
in
8.00 5.33 6.40 1.60
1.60 2.00 m 2.67 3.20 2 .00 6.40
6.40
4.80
-
-
2.67 3.20 _ .
-
2.67 -
-
4.00
37z
8.00 2.67
-
6.67
-
the
ITP
1.33 2.00 -
(8.00)
living
that
ITN
8.00 4.00 6.86 6.86 6.67
7.00
variables
-
Silica 5.33
-
total
phytoplankton been
Secchi
1.84 2.04 3.95
e.g.
influential
Temp
2.20 2.20 5.11
1.60 -
2.00 -
nitrogen
detritus)
2.15 2.31 4.49
1.97 1.97 4.87
(measure
proved
to
explaining chlorophyll biomass) v a r i a b i l i t y .
bicarbon
ratio
(a
2.18 2.05 3.01
ratio
of
of
be
a
It
total
the
most
(measure of has
the
already
amount
of
TABLE 2.4
Date Mar Mar Mar Mar Mar Mar Mar Mar Apr Apr Apr Apr Apr Apr Apr Apr Apr Apr Apr
Apr May May May May May May Jun Jun Jun
Jun Jun Jun Jul Jul Jul Jul
Jul Jul Aug Aug Aug Aug Aug Aug Aug Aug Sep
Sep Sep Sep Sep Sep Oct Oct Oct Oct
Depth
6.86 16 (81)A 6.67) 16 (81)B 20 (78)A 5.72 8.00 20 (78)B 5.72 21 (79)A 21 (79)B (8.00) 8.00 24 (80)A 24 (80)B 4.00 4 (81)A 6.86 4 (8118 8.00 8 (79)A 8.00 6.67 8 (79)B 6.86 10 (78)A 10 (78)B 8.00 21 (80)A 6.86 21 (80)B 8.00 27 (81)A 6.40 27 (81)B 6.00 30 (79)A 5.72 30 (79)B 6.67 4.57 8 (78)A 8 (78)B 5.33 19 (81)A 4.80 8.00 19 (81)B 28 (79)A 8.00 8.00 28 (79)B 8.00 5 (78)A 5 (7B)B 8.00) 15 (78)A 6.40 1 5 (78)B 4.00 6.40 25 (79)A 4.00 25 (79)B 3.20 4 (78)A 6.00 4 (78)B 13 (81)A 3.20 4.00 13 (81)B 6.40 30 (79)A 6.00 30 (79)B 8 (78)A (8.00) 6.00 8 (78)B 10 (81)A 4.57 8.00 10 (81)B 25 (80)A 5.72 25 (80)B 6.67 4.80 27 (19)C 6.00 27 ( 7 9 ) 0 5 (78)A (8.00) 5.33 5 (78)B 14 14 17 17 5 5 10
10 Nov 14
NOV 14
NOV 16 Nov 16
Nov 19 Nov 19
Dec Dec
Rated order of variables 1978 to 1981
7 7
(81)A (81)B (79)A (79)B (81)A (81)B (78)A (78)B (78)A (78)B (81)A (81)B (79)A (79)B (81)A (81)B
6.00 4.80 6.00 6.40 6 .00 4.57
3.43
4.57
6.72 8.00
1.14
(8.00)
4.57 5.33 6.86 6.67 6.86 5.67 3.43 5.33 6.86
(8.00)
3.43 4.52 4.52
8.00
5.72 2.67 -
2.29 5.33
2.29
3.20
1.90 4.00
8.00
(8.00)
4.80
-
(8.00)
1.33 4.80 2. on
2.67 5 . 72
2.67
6.40 4.00 2.29 5.33
3.20
6.00 1.14 4.00 1.60 -
m
2.29
5.33 3.20 4.00 -
6.40
-- -
2.00) 5.72 5.67
R.no
2.00 -
8.00
6.40 2. no 6.40
1.60 8.00
1.60 __
2.00 1.60 2. on 1.60
8.00 4.00 4.80
6.86 2.67 4.51 8.00
5.72 4. 00
1.14 1.33 -
3.43
__
2. 00
3.43 1 . 1 4 __ 6.67
3.20
6.40
4.57 4.00
6.68
-
5.72 8.00
4.80 -
1.14 1.33 -
I. 60 2.00 2.29
2.67
2.67 4.00 1.60 __ 4.51 4.00
8.00 2.29 6.67
8.00 6.67
1 .14 1.33
4.80 4. no
6.40 8.00
3.20
- -
2.51
2.92
1.60 2.00 1.69
2. 71
2.92
1.96
4.534.99
;%55
3.43 8 . no
2.00 -
2.29 1.14 2.67 1.33 3 . 4 3 1.14 5.33 1.33
4.80
~
m3-m
1.33
mA.OO
8.00
6.86 3.00
4.51)
1.33 R.00 6.00
2.00
5.72 2.67
3.68 4.04
8.00
8.00 3.43 1 . 1 4 1.33 4.00 3.43 6.67 1.33 (8.00) 5.33 2, 0 0
2.00 1.60
1. 60 -
4.51 4.66
--
-ITb(i
6.86 6.67
3.20 4.00 8.00 4.00 5.72 6.67 6.40 6.00
8.00
( 8 . 0 0 ) 2.29 6.67 4.00 3.43 5.72 -1_ .33_ 4.00 2.29 1.14 -2.61 _ _ 5.33 2.29 5.72 2.67 4.00 -
8.00 (8.00)
4.00
2.29 5.33 2 .67
8.00
4.80
6.40
2.67
-
1.33 1.14 1.33 1.14 2. 67
6.00
6.00
3.20 8.00
4.00
1.33
ri-6 ~ . __
3.20 -
3.20
4.80 6.00
1.60 2.00
--
-
4.80
8.00 5.33 3.86 4.00
.
2.Zv3-73
(8.00)
1.60
_
2.29 2.67
3.43
(8.00)
6.00
-
6.86 1.33
1.33 1.14
2.29 8.00 2.29 4.00
IPOC
SRP
2.29 2.67
1.14 1.33 -
6.77 4.57 6.67
8.00
4.00 3.43 5.33 4.80
6.67 4.57 5.33 5.12
ITP
ITN
4.57
6.86
8.00 6.40 4.00
Silica
5.72
1.60 -
8.00) 3.43 8.00
Secchi
4.00 -
8.00
5.33
Temp
3.20 4.00 R.00 R.OO 8.00 8.00
6.86 6.67 2.29 2.67 -
5.72 5.33 6.96 8.00
3.20 __
8.00 -
2.47 2. 70
2.44 3.00
m m
living material to the amount of detritus) for the 0-20 m layer of Lake Ontario is uniform and approximately equal to 12.5%, while the summer ratio is highly variable, ranging from 25 to 50% (MacKinnon and Kwiatkowski, 1980). Obviously the relationship between detrital pool and living material plays an important role in the variable interrelationships found in this study.

It was found that over 90% of the variation in the size of the algal biomass (as estimated by the chlorophyll a concentration) could be accounted for by a multiple regression model typically containing only three variables; namely, integrated total particulate nitrogen, soluble reactive phosphorus and integrated particulate organic carbon, in order of importance. Moreover, the coefficients in the multiple regression models, although varying within each year, appeared to be similar from year to year. It was also noted that the magnitude of the squared multiple correlation coefficient was positively correlated with the size of the algae population; that is, the R-squared values were high when the population was large and low when the population was low.
Low R-squared values were consistently found during the summer stratified period, with the physical rather than the nutrient variables being the most influential. Several explanations exist for why decreasing relationships between chlorophyll a and the other physio-chemical variables have been found during the summer stratified period. Good agreements between chlorophyll a and direct phytoplankton measurements have been found (e.g. Tolstoy, 1979). However, chlorophyll a/phytoplankton ratios have been known to decrease from the spring, when diatoms dominated the phytoplankton community, to the summer, when the population was dominated by greens and blue-greens (Watson et al., 1975); this was in agreement with the decrease in chlorophyll a concentration found during the summer. During summer stratification, phytoplankton populations have been known to settle out of the epilimnion and form a subthermocline chlorophyll peak (Steele and Yentsch, 1960). In Lake Ontario the thermocline depth ranges from 15 to 25 meters; thus the 0-20 m integrator may not have adequately sampled the water column.
Bacterial growth (total biomass) has been found to be maximal after the diatom peak in algal biomass (chlorophyll a) (Rao et al., 1979), while Mortimer (1974) has proposed that turbulence generated by internal waves and seiches within Lake Ontario contributes to the mid-summer breakup of the major spring chlorophyll a peak through alteration of the physio-chemical environment. Obviously, more detailed observations on the chemical and physical environment, as well as chlorophyll a, phytoplankton, zooplankton and bacterial measurements, coupled with primary productivity data at selected stations, at a frequency greater than monthly, will be required to resolve questions on what regulates summer phytoplankton populations.
REFERENCES

Chau, Y.K., Chawla, V.K., Nicholson, H.F. and Vollenweider, R.A. 1970. Distribution of trace elements and chlorophyll a in Lake Ontario. Proc. 13th Conf. Great Lakes Res. 1970: 659-672, Internat. Assoc. Great Lakes Res.

Department of the Environment. 1974. Analytical Methods Manual. Inland Waters Directorate, Water Quality Branch, Ottawa, Canada.

Glooschenko, W.A., Moore, J.E. and Vollenweider, R.A. 1972. The seasonal cycle of pheopigments in Lake Ontario with particular emphasis on the role of zooplankton grazing. Limnol. Oceanogr. 17: 597-605.

Glooschenko, W.A., Nicholson, H.F. and Moore, J.E. 1973. Surface distribution of total chlorophyll a in Lakes Ontario and Erie, 1970. Tech. Rep. No. 351, Fish. Res. Bd. Can.

Glooschenko, W.A. and Dobson, H.F.H. 1975. Water quality in the Great Lakes. Nature Canada, 4: 3-6.

Hutchinson, G.E. 1967. A Treatise on Limnology. Vol. 2. Introduction to lake biology and the limnoplankton. J. Wiley and Sons. New York, New York.

International Joint Commission. 1983. Great Lakes Water Quality Board's 1983 Report on Great Lakes Water Quality. Appendix. Great Lakes Surveillance, 100 Ouellette Avenue, Windsor, Ontario.

International Lake Erie Water Pollution Board and International Lake Ontario - St. Lawrence Water Pollution Board. 1969. Report to the International Joint Commission on the Pollution of Lake Erie, Lake Ontario and the International Section of the St. Lawrence River. Vol. 1, Summary.

Kwiatkowski, R.E. and El-Shaarawi, A.H. 1977. Physicochemical surveillance data obtained for Lake Ontario, 1974 and their relationship to chlorophyll a. J. Great Lakes Res. 3: 132-143.

Kwiatkowski, R.E. 1978. Scenario for an ongoing chlorophyll a surveillance plan on Lake Ontario for non-intensive sampling years. J. Great Lakes Res. 4: 19-26.

Kwiatkowski, R.E. 1980. Chlorophyll a measurements from Lake Ontario, 1974-1979. Canadian Technical Report of Fisheries and Aquatic Sciences No. 933.

Kwiatkowski, R.E. and Neilson, M.A.T. 1983. Lake Ontario surveillance data, 1968-1980. Technical Bulletin No. 126. Inland Waters Directorate, Environment Canada, Ottawa, Canada.

Lorenzen, C.J. 1966. A method for the continuous measurement of in vivo chlorophyll concentration. Deep Sea Res. 13: 223-227.

Lorenzen, C.J. 1967. Determination of chlorophyll and phaeopigments: spectrophotometric equations. Limnol. Oceanogr. 12: 343-346.

MacKinnon, M. and Kwiatkowski, R.E. 1980. A survey of ATP concentrations in Lake Ontario, 1975-1976. J. Great Lakes Res. 6: 177-183.

Nicholson, H.F. 1970. The chlorophyll a content of the surface water of Lake Ontario, June to November, 1967. Fish. Res. Bd. Can. Tech. Rep. No. 186.

Rao, S.S., Kwiatkowski, R.E. and Jurkovic, A.A. 1979. Distribution of bacteria and chlorophyll a at a nearshore station in Lake Ontario. Hydrobiologia, 66: 33-39.

Strickland, J.D.H. and Parsons, T.R. 1968. A practical handbook of seawater analysis. Fish. Res. Board Can. Bull. 167.

Tolstoy, A. 1979. Chlorophyll a in relation to phytoplankton volume in some Swedish lakes. Arch. Hydrobiol. 85: 133-151.

Vollenweider, R.A. 1976. Advances in defining critical loading levels for phosphorus in lake eutrophication. Mem. Ist. Ital. Idrobiol. 33: 53-83.

Watson, N.H.F., Carpenter, G.F. and Munawar, M. 1975. Problems in the monitoring of biomass. In Water Quality Parameters, ASTM STP 573, American Society for Testing and Materials, 1975: 311-319.

Watson, N.H.F. and Williams, D.J. 1975. Design and operation of a pilot surveillance program for Lake Ontario. Paper presented at the 18th Annual Conference on Great Lakes Research, International Association for Great Lakes Research, Albany, New York, 1975.

Yentsch, C.S. 1970. The state of chlorophyll in the aquatic environment. In Prediction and Measurement of Productivity, Proc. IBP/PP Tech. Meeting, Trebon. pp. 489-592.
GAMMA MARKOV PROCESSES
R.M. PHATARFOD
Monash University, Clayton, Victoria, Australia
1. INTRODUCTION AND GENERALITIES

The statistical problem of monitoring of water quality over time is essentially a study of the time series of some variable of interest (usually called a parameter) such as, for example, dissolved oxygen, suspended solids, various kinds of chemicals, organic matter and other impurities, the values being observed at various intervals of time, such as a day, a week or a month. There are two situations involved here. One is when the observations are made on the body of water, such as a lake. Here an unusually high observed value of a parameter would make us inquire as to its cause. This may lead us to check the input source, such as a river, for excessive pollution, or acid rain etc. The other situation is when the observations are made on the input source itself, such as a river, and the problem of interest is the effect the observed values (or a statistical model of them) would have on the value of some parameter in the body of water fed by that river. An example of a problem of the latter kind is: having observed the concentration of suspended solids in the river over a period of years, we would want to know about the probability distribution of the totality of the load of that type in the lake in the future. It is with the second problem that the paper is concerned.

Let us denote the amount of water flowing into a lake during the $i$th time period by $Q_i$, and the volume of impurities (load) by $P_i$. Usually the $P_i$ would be estimated from observed values of the concentration and estimated values of $Q_i$. If the impurities are conservative, i.e. do not decay over time, we are then concerned with the sequence of concentrations $C_j$ in the lake at time $j$, where

$$C_j = \sum_{i=1}^{j} P_i \Big/ \sum_{i=1}^{j} Q_i .$$

On the other hand, if organic matter is involved, the concentration $C_j$ is given by

$$C_j = \sum_{i=1}^{j} b^{\,j-i} P_i \Big/ \sum_{i=1}^{j} Q_i .$$

We are interested in the probabilistic behaviour of the sequence $\{C_j\}$ over time; of particular interest is the probability distribution of the time $N$ such that $C_N$ crosses a threshold $T$ for the first time. Now, even if the $\{P_i\}$ and $\{Q_i\}$ were sequences of independent and identically distributed random variables, $\{C_j\}$ is a sequence of dependent random variables, the dependence being of a complicated kind, and the problem of finding the distributions of $\{C_j\}$ and $N$ is a formidable one.

As a first step towards the solution of the above problems, we consider here the simpler problem of formulating a model for input variables (loads), their properties, and their mathematical tractability, particularly in the direction of the derivation of the distribution of their cumulative sums and weighted sums. In the context of water quality monitoring, we are in effect ignoring the variation of the $Q$'s and concentrating on the cumulative sums of the $P$'s.

First, let us see some broad features of the input variables. As in many geophysical phenomena, a time series of input variables may be regarded as a realization of an autocorrelated stationary process which may be approximated by a Markov process. The random variables are obviously non-negative; they are usually of the additive type (except when they are of a biological kind, in which case they are of a multiplicative type); they are continuous rather than discrete. These considerations lead us to assume the input model to be a Gamma-Markov process.

Let us now consider the properties a proposed model (Gamma-Markov) for inputs should have:

(A) First, since the purpose is to study real life phenomena, and therefore use historical data, one should be able to estimate parameters of the model from the data in an efficient manner. This is particularly important for geophysical phenomena, as in most cases the historical series are very short.

(B) The model should be easy to simulate. Such a facility would allow us to study the behaviour of processes derived from the model when it is not possible to do so mathematically.
(C) The majority of hydrological phenomena have a pronounced seasonal element; it follows therefore that the model should be capable of extension to take seasonality into account. The word 'season' is used here in the wide sense. A year can be divided into 4 (natural) seasons or 12 'seasons' (or months); a week is composed of 2 'seasons' - week-days and week-ends, etc.

(D) The model should be mathematically tractable; in the present context this implies that one should be able to derive the properties of sums - weighted or otherwise - of the random variables involved.

A survey of Gamma-Markov models was given in Phatarfod (1976). The situation has changed somewhat since then. First, a new model - the Gamma Autoregressive model (Gaver and Lewis, 1980) - has appeared; secondly, in comparing the properties of the various models proposed, such as simulation and estimation of parameters, the advance in computer technology has meant that what was a difficult problem for some models is a comparatively simple one now. There are five Gamma-Markov models proposed - three of which have been proposed in a hydrological context (to be sure, for water quantity, not quality) and two as models for point processes; the latter two are mathematically tractable - they would have to be, or otherwise they wouldn't have been proposed! Phatarfod (1976) has given a description of the models by Thomas and Fiering, Yevjevich, and Moran. It was shown that while these models have most of the properties A to C above, they do not have property D. A description of the Linearly Regressive Gamma model and the properties of its cumulative sums were also given there. What we consider here is the problem of simulation of that model, its seasonal extension and the properties of the cumulative sums - weighted or otherwise - of the seasonal extension. It should be mentioned that it is possible to obtain M.L.E. of the parameters of the models; for reasons of shortness of space these are given elsewhere; see Phatarfod (1985). For the same reason, the properties of the Gamma Autoregressive model are not given. It can be shown that that model cannot be extended to take seasonality into account.

Let us now define our variables of interest by $X$, i.e. $\{X_i\}$ forms a first-order Markov chain such that, in the equilibrium condition, $X$ has a gamma distribution.
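The two concentration sequences defined in the introduction (conservative impurities, and organic matter decaying at rate $b$) can be computed with a single recursion on the cumulative load. The following is a minimal sketch, not part of the original paper: the function names `lake_concentrations` and `first_crossing` and the numerical inputs are hypothetical and chosen only for illustration.

```python
import numpy as np

def lake_concentrations(P, Q, b=1.0):
    """Return C_1..C_n with C_j = sum_{i<=j} b**(j-i) * P_i / sum_{i<=j} Q_i.

    b = 1 is the conservative case; 0 < b < 1 lets earlier loads decay.
    """
    P = np.asarray(P, dtype=float)
    Q = np.asarray(Q, dtype=float)
    load = np.empty_like(P)
    running = 0.0
    for j, p_j in enumerate(P):
        running = b * running + p_j   # recursion for the (weighted) cumulative load
        load[j] = running
    return load / np.cumsum(Q)

def first_crossing(C, T):
    """Smallest period N (1-based) with C_N >= T, or None if T is never reached."""
    hits = np.flatnonzero(np.asarray(C) >= T)
    return int(hits[0]) + 1 if hits.size else None

P = [1.0, 2.0, 3.0]   # illustrative loads, not data from the paper
Q = [1.0, 1.0, 1.0]   # illustrative inflows
C_cons = lake_concentrations(P, Q)           # conservative impurities
C_decay = lake_concentrations(P, Q, b=0.5)   # decaying (organic) matter
```

With random $P_i$ and $Q_i$, repeated calls to `first_crossing` would give an empirical distribution of the crossing time $N$, which is the quantity the paper describes as hard to obtain analytically.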
First consider the case when the $X$ represents impurities (loads) in the form of organic matter, i.e. the non-conservative case. It is then of interest to study the distribution of $Y_n = X_n + bX_{n-1} + \ldots + b^n X_0$, where $0 < b < 1$ represents the rate of decay. Assuming that the process has been going on for a long time we replace this by

$$Y_n = \sum_{r=0}^{\infty} b^r X_{n-r} .$$

This is considered in Section 4 for the (seasonal) Linearly Regressive model.

Secondly consider the more difficult case when the $X$ represents conservative matter. The cumulative sum $\sum X_i$ must increase over time, and what is of interest is the first time $N_T$ such that $\sum_{i=0}^{N_T} X_i \ge T$, i.e. crosses the threshold $T$. For example, $X$ may be volume of suspended particles, and $T$ is the volume such that $\sum X_i \ge T$ implies that the outlets of a reservoir are blocked; $N_T$ is then the life-time of that reservoir. When the sequence $X_i$ are i.i.d., the asymptotic cumulants of $N_T$ are most easily obtained by applying Wald's Identity (see e.g. Cox and Miller, 1965). For, we have, ignoring the overshoot over the barrier $T$,

$$e^{-tT}\, E\big[\{f^*(t)\}^{-N_T-1}\big] = 1 , \qquad (1)$$

where $f^*(t)$ is the Laplace Transform (L.T.) of each $X_i$. Therefore, if $N_T$ is large so that $N_T + 1$ can be replaced by $N_T$, we have

$$\log_e E\big[e^{-sN_T}\big] = tT , \qquad (2)$$

where $s = \log f^*(t)$. The left-hand side of (2) is the cumulant generating function (g.f.) of $N_T$ and hence the above two equations show that the cumulant g.f.'s of $N_T$ and of $X$ are inverse functions. Inverting (2) we have,

$$\log_e E\big[e^{-sN_T}\big] = T\left\{ -\frac{s}{\kappa_1} + \frac{\kappa_2}{\kappa_1^3}\,\frac{s^2}{2!} - \frac{3\kappa_2^2 - \kappa_1\kappa_3}{\kappa_1^5}\,\frac{s^3}{3!} + \ldots \right\} ,$$

where $\kappa_1, \kappa_2, \ldots$ are the cumulants of $X$. It then follows that the first four cumulants of $N_T$ are asymptotically,

$$\frac{T}{\kappa_1}, \qquad \frac{T\kappa_2}{\kappa_1^3}, \qquad \frac{T(3\kappa_2^2 - \kappa_1\kappa_3)}{\kappa_1^5}, \qquad \frac{T(15\kappa_2^3 - 10\kappa_1\kappa_2\kappa_3 + \kappa_1^2\kappa_4)}{\kappa_1^7} . \qquad (3)$$
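A quick numerical check of the i.i.d. case may help here. Inverting the cumulant g.f. as described gives, to leading order, mean $T/\kappa_1$, variance $T\kappa_2/\kappa_1^3$ and third cumulant $T(3\kappa_2^2 - \kappa_1\kappa_3)/\kappa_1^5$ for $N_T$. The sketch below compares the first two with a Monte Carlo estimate for i.i.d. Gamma$(p, \alpha)$ variables, whose cumulants are $\kappa_r = p\,\alpha^r (r-1)!$. The function names and parameter values are hypothetical, and NumPy's generator stands in for whatever random number source one prefers.

```python
import numpy as np

def first_passage_iid_gamma(p, alpha, T, n_rep, rng):
    """Simulate N_T = min{n : X_0 + ... + X_n >= T} for i.i.d. Gamma(p, alpha) X_i."""
    m = int(3 * T / (p * alpha)) + 50            # generous number of terms per path
    paths = np.cumsum(rng.gamma(p, alpha, size=(n_rep, m)), axis=1)
    if np.any(paths[:, -1] < T):
        raise RuntimeError("increase m: some paths never reached T")
    return np.argmax(paths >= T, axis=1)         # 0-based index of the crossing term

def asymptotic_cumulants(k1, k2, k3, T):
    """First three asymptotic cumulants of N_T from the cumulants k1, k2, k3 of X."""
    return T / k1, T * k2 / k1**3, T * (3 * k2**2 - k1 * k3) / k1**5

p, alpha, T = 2.0, 1.5, 300.0                    # illustrative values only
rng = np.random.default_rng(1)
N = first_passage_iid_gamma(p, alpha, T, n_rep=4000, rng=rng)
mean_th, var_th, k3_th = asymptotic_cumulants(p * alpha, p * alpha**2,
                                              2 * p * alpha**3, T)
```

The simulated mean and variance of `N` should be close to `mean_th` and `var_th`; small discrepancies of order one are expected because the asymptotic argument ignores the overshoot over $T$.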
Now it is shown in Phatarfod (1971a) that an analogue of Wald's Identity holds for the $\{X_i\}$ forming a Markov chain for which

$$E\Big[\exp\Big(-t\sum_{i=1}^{n} X_i\Big)\Big] \sim D(t)\,\lambda(t)^n ,$$

in which case $f^*(t)$ is replaced by $\lambda(t)$. If now $\log \lambda(t)$ can be expressed as $\sum_r \kappa_r(-t)^r/r!$ (where the $\kappa_r$ need not be cumulants) the cumulants of $N_T$ are given by (3).

2. THE LINEARLY REGRESSIVE MODEL

Let $X_i$ $(i = 0, 1, 2, \ldots)$ be a sequence of random variables such that the conditional L.T. of $X_{i+1}$ given $X_i = x_i$ is

$$L_{X_{i+1}}(t \mid x_i) = [1 + t\alpha(1-\rho)]^{-p} \exp\big[-\rho t x_i/\{1 + t\alpha(1-\rho)\}\big], \qquad \alpha, p > 0,\ 0 < \rho < 1 . \qquad (5)$$

From this, the conditional density of $X_{i+1}$ given $X_i = x_i$ is

$$f(x_{i+1} \mid x_i) = \frac{1}{\alpha(1-\rho)} \left(\frac{x_{i+1}}{\rho x_i}\right)^{(p-1)/2} \exp\left[-\frac{x_{i+1} + \rho x_i}{\alpha(1-\rho)}\right] I_{p-1}\!\left(\frac{2\sqrt{\rho x_i x_{i+1}}}{\alpha(1-\rho)}\right),$$

where $I_r$ is the modified Bessel function of order $r$. The equilibrium distribution of $X$ is a Gamma$(p, \alpha)$ distribution, with L.T. $L_X(t) = (1 + t\alpha)^{-p}$. From (5), we have

$$E(X_{i+1} \mid x_i) = \rho x_i + (1-\rho)p\alpha ,$$
$$V(X_{i+1} \mid x_i) = 2\rho\alpha(1-\rho)x_i + p\alpha^2(1-\rho)^2 ,$$
$$\mathrm{Corr}(X_i, X_{i+j}) = \rho^j .$$

Denoting $\rho x_i/\{\alpha(1-\rho)\}$ by $\lambda$, we have

$$L_{X_{i+1}}(t \mid x_i) = e^{-\lambda} \sum_{r=0}^{\infty} \frac{\lambda^r\,[1 + t\alpha(1-\rho)]^{-(p+r)}}{r!} . \qquad (6)$$

Eq. (6) shows that given $X_i = x$, $X_{i+1}$ is a Poisson-Gamma mixture, i.e., $X_{i+1}$ has a $G(U + p, \alpha(1-\rho))$ distribution, where $U$ is a Poisson variable with mean $\lambda$. This result allows us to simulate the sequence $x_0, x_1, x_2, \ldots$ as follows. Generate a value of a Poisson random variable (e.g. by using the IMSL/GGPOS (1980) subroutine) with mean $\lambda = \rho x_0/\{\alpha(1-\rho)\}$. Call it $u$. Now generate a value of a gamma random variable (e.g. by using the IMSL/GGAMS (1980) subroutine) with parameters $u + p$, $\alpha(1-\rho)$. The value so obtained is $x_1$. This procedure is repeated $n$ times to give the sequence $x_0, x_1, x_2, \ldots, x_n$.
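The two-step recipe above (a Poisson draw followed by a gamma draw) can be sketched with NumPy's random generator in place of the IMSL routines. This is only an illustration under stated assumptions: the function name `simulate_lr_gamma` is hypothetical, the chain is started from the equilibrium Gamma$(p, \alpha)$ distribution unless `x0` is supplied, and the parameter values are arbitrary.

```python
import numpy as np

def simulate_lr_gamma(n, p, alpha, rho, rng, x0=None):
    """Simulate x_0, ..., x_n from the Linearly Regressive Gamma model.

    Given x_i: draw u ~ Poisson(rho * x_i / (alpha * (1 - rho))), then
    x_{i+1} ~ Gamma(shape = u + p, scale = alpha * (1 - rho)).
    """
    beta = alpha * (1.0 - rho)                 # scale of the conditional gamma
    x = np.empty(n + 1)
    x[0] = rng.gamma(p, alpha) if x0 is None else x0
    for i in range(n):
        u = rng.poisson(rho * x[i] / beta)     # Poisson mixing variable U
        x[i + 1] = rng.gamma(u + p, beta)      # gamma draw with shape U + p
    return x

rng = np.random.default_rng(0)
x = simulate_lr_gamma(20000, p=2.0, alpha=1.5, rho=0.6, rng=rng)
lag1 = np.corrcoef(x[:-1], x[1:])[0, 1]        # sample lag-one correlation
```

For a long run the sample mean of `x` should be near $p\alpha$ and `lag1` near $\rho$, in line with the equilibrium distribution and correlation structure stated for the model.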
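The L.T. of the sum $S_n = \sum_{i=1}^{n} X_i$ for a given $X_0 = x_0$ is given in Phatarfod (1971b); it involves the factor $[A\mu_1^n(t) + B\mu_2^n(t)]^{-p}$, where

$$\mu_1(t), \mu_2(t) = \tfrac{1}{2}\Big[1 + \rho + t\alpha(1-\rho) \pm \big\{[1 + \rho + t\alpha(1-\rho)]^2 - 4\rho\big\}^{1/2}\Big] .$$

For large $n$ we have

$$L_{S_n}(t \mid x_0) \sim D\,\mu_1^{-np}(t) , \qquad (7)$$

from which we obtain $\lambda(t) = \mu_1^{-p}(t)$. Expanding $\log \lambda(t)$ gives us

$$\kappa_1 = p\alpha , \qquad \kappa_2 = p\alpha^2(1+\rho)/(1-\rho) , \ \ldots$$

and, from (3), the cumulants of $N_T$.

3. LINEARLY REGRESSIVE SEASONAL MODEL

We first consider the case of two seasons. The sequence $X_{1i}, X_{2i}$ $(i = 0, 1, 2, \ldots)$ of random variables form a cyclic Markov sequence with the transitions $X_{2i} \to X_{1,i+1}$ and $X_{1i} \to X_{2i}$ given by the conditional L.T.'s as follows:

$$L_{X_{1,i+1}}(t \mid X_{2i} = y) = [1 + t\alpha_1(1-\rho_2)]^{-p} \exp\{-t\rho_2\alpha_1 y/[\alpha_2 + t\alpha_1\alpha_2(1-\rho_2)]\} \equiv L(\alpha_1, \alpha_2, \rho_2 \mid y) , \qquad (8)$$

$$L_{X_{2i}}(t \mid X_{1i} = x) = L(\alpha_2, \alpha_1, \rho_1 \mid x) . \qquad (9)$$

The equilibrium distributions of $X_1$, $X_2$ are gamma with parameters $(p, \alpha_1)$ and $(p, \alpha_2)$ respectively. Also,

$$\mathrm{corr}(X_{1i}, X_{2i}) = \rho_1 ; \qquad \mathrm{corr}(X_{2i}, X_{1,i+1}) = \rho_2 .$$

Equations (8), (9) can be written as

$$L_{X_{1,i+1}}(t \mid X_{2i} = y) = e^{-\lambda_2} \sum_{r=0}^{\infty} \frac{\lambda_2^r\,[1 + t\alpha_1(1-\rho_2)]^{-(p+r)}}{r!} , \qquad L_{X_{2i}}(t \mid X_{1i} = x) = e^{-\lambda_1} \sum_{r=0}^{\infty} \frac{\lambda_1^r\,[1 + t\alpha_2(1-\rho_1)]^{-(p+r)}}{r!} ,$$

where $\lambda_1 = \rho_1 x/\{\alpha_1(1-\rho_1)\}$, $\lambda_2 = \rho_2 y/\{\alpha_2(1-\rho_2)\}$.

Thus the conditional distributions are Poisson-Gamma mixtures. Given $X_{2i} = y$, $X_{1,i+1}$ has a $G(U + p, \alpha_1(1-\rho_2))$ distribution where $U$ is a Poisson variable with mean $\rho_2 y/\{\alpha_2(1-\rho_2)\}$, and similarly given $X_{1i} = x$, $X_{2i}$ has a $G(V + p, \alpha_2(1-\rho_1))$ distribution where $V$ is a Poisson variable with mean $\rho_1 x/\{\alpha_1(1-\rho_1)\}$. From this the sequence $x_{11}, x_{21}, x_{12}, x_{22}, \ldots$ can be generated in a manner similar to that for the non-seasonal case.

As in the non-seasonal case, the particular form of the conditional L.T.'s and the Markov dependence can be exploited to derive the L.T. of the sum $S_n = \sum_{i=1}^{n} (X_{1i} + X_{2i})$. The derivation is somewhat cumbersome and is given in Phatarfod (1985), together with the resulting expressions for $L_n(t) = E[\exp(-tS_n)]$ and $\lambda(t)$. Moreover, expanding $\log \lambda(t)$ gives us (for simplicity, we consider only the mean and the variance)

$$\kappa_1 = p(\alpha_1 + \alpha_2) , \qquad \kappa_2 = \frac{p\,[(\alpha_1^2 + \alpha_2^2)(1 + \rho_1\rho_2) + 2\alpha_1\alpha_2(\rho_1 + \rho_2)]}{1 - \rho_1\rho_2} ,$$

$$E(N_T) = \frac{T}{p(\alpha_1 + \alpha_2)} , \qquad \mathrm{Var}(N_T) = \frac{[(\alpha_1^2 + \alpha_2^2)(1 + \rho_1\rho_2) + 2\alpha_1\alpha_2(\rho_1 + \rho_2)]\,T}{(1 - \rho_1\rho_2)\,p^2(\alpha_1 + \alpha_2)^3} .$$

For the three-seasons case, the transitions of the sequence $X_{1i}, X_{2i}, X_{3i}$ are given by the conditional L.T.'s,

$$L_{X_{1,i+1}}(t \mid X_{3i} = z) = L(\alpha_1, \alpha_3, \rho_3 \mid z) ,$$
$$L_{X_{2i}}(t \mid X_{1i} = x) = L(\alpha_2, \alpha_1, \rho_1 \mid x) ,$$
$$L_{X_{3i}}(t \mid X_{2i} = y) = L(\alpha_3, \alpha_2, \rho_2 \mid y) ,$$

with the equilibrium distribution of $X_i$ being Gamma$(p, \alpha_i)$ and $\mathrm{Corr}(X_i, X_{i+1}) = \rho_i$ $(i = 1, 2)$ and $\mathrm{Corr}(X_3, X_1) = \rho_3$. Also, using the results from Phatarfod (1985) we obtain

$$\kappa_1 = p\sum_{i=1}^{3} \alpha_i$$

and

$$\kappa_2 = p\,\Big[(1 + \rho_1\rho_2\rho_3)\sum_{i=1}^{3}\alpha_i^2 + 2\alpha_1\alpha_2(\rho_1 + \rho_2\rho_3) + 2\alpha_2\alpha_3(\rho_2 + \rho_1\rho_3) + 2\alpha_1\alpha_3(\rho_3 + \rho_1\rho_2)\Big]\Big/(1 - \rho_1\rho_2\rho_3) .$$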
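The two-season model can be simulated by alternating the two Poisson-Gamma steps described in Section 3. The sketch below is an illustration only: the function names `simulate_two_season` and `kappa_two_season` are hypothetical, the chain is assumed to start from the equilibrium Gamma$(p, \alpha_1)$ distribution of season 1, and the parameter values are arbitrary.

```python
import numpy as np

def simulate_two_season(n, p, a1, a2, r1, r2, rng):
    """Simulate n cycles (x1[i], x2[i]) of the two-season model.

    X_1i -> X_2i   : V ~ Poisson(r1*x/(a1*(1-r1))),  X_2i     ~ Gamma(V+p, a2*(1-r1))
    X_2i -> X_1,i+1: U ~ Poisson(r2*y/(a2*(1-r2))),  X_1,i+1  ~ Gamma(U+p, a1*(1-r2))
    """
    x1 = np.empty(n)
    x2 = np.empty(n)
    x1[0] = rng.gamma(p, a1)                   # start in equilibrium for season 1
    for i in range(n):
        v = rng.poisson(r1 * x1[i] / (a1 * (1.0 - r1)))
        x2[i] = rng.gamma(v + p, a2 * (1.0 - r1))
        if i + 1 < n:
            u = rng.poisson(r2 * x2[i] / (a2 * (1.0 - r2)))
            x1[i + 1] = rng.gamma(u + p, a1 * (1.0 - r2))
    return x1, x2

def kappa_two_season(p, a1, a2, r1, r2):
    """kappa_1 and kappa_2 of the two-season model as given in Section 3."""
    k1 = p * (a1 + a2)
    k2 = p * ((a1**2 + a2**2) * (1 + r1 * r2) + 2 * a1 * a2 * (r1 + r2)) / (1 - r1 * r2)
    return k1, k2

rng = np.random.default_rng(2)
x1, x2 = simulate_two_season(20000, p=2.0, a1=1.0, a2=3.0, r1=0.5, r2=0.3, rng=rng)
c12 = np.corrcoef(x1, x2)[0, 1]                # within-cycle correlation
c21 = np.corrcoef(x2[:-1], x1[1:])[0, 1]       # season 2 to next season 1
k1, k2 = kappa_two_season(2.0, 1.5, 1.5, 0.6, 0.6)
```

The sample means of `x1` and `x2` should be near $p\alpha_1$ and $p\alpha_2$, and `c12`, `c21` near $\rho_1$ and $\rho_2$. With equal parameters in both seasons, `kappa_two_season` returns twice the per-step values of the non-seasonal model, which is a useful consistency check on the formula.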
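4. WEIGHTED SUM OF SEASONAL GAMMA VARIABLES

For simplicity, we consider here the case of two seasons only. Let $X_{1i}, X_{2i}$ $(i = 0, 1, 2, \ldots)$ be the sequence as given in Section 3 and let

$$Y_{2n} = X_{2n} + bX_{1n} + b^2X_{2,n-1} + b^3X_{1,n-1} + \ldots = \sum_{i=0}^{\infty} \big(b^{2i}X_{2,n-i} + b^{2i+1}X_{1,n-i}\big) .$$

Conditioning successively on the earlier values and using (8), (9), the L.T. of $Y_{2n}$ can be written as

$$L(t) = \prod_{i=0}^{\infty} [1 + \alpha_1(1-\rho_2)K_i(t)]^{-p}\,[1 + \alpha_2(1-\rho_1)J_i(t)]^{-p} ,$$

where $K_0(t) = 0$ and

$$J_i(t) = b^{2i}t + \frac{\alpha_1\rho_2}{\alpha_2}\,\frac{K_i(t)}{1 + \alpha_1(1-\rho_2)K_i(t)} , \qquad K_{i+1}(t) = b^{2i+1}t + \frac{\alpha_2\rho_1}{\alpha_1}\,\frac{J_i(t)}{1 + \alpha_2(1-\rho_1)J_i(t)} . \qquad (11)$$

Writing $Q_i = K_i'(0)$, $R_i = J_i'(0)$, $q = \sum Q_i$ and $r = \sum R_i$, the mean of $Y_{2n}$ is

$$\mu = p\,[\alpha_1(1-\rho_2)\,q + \alpha_2(1-\rho_1)\,r] . \qquad (12)$$

Now, from (11) we have

$$Q_{i+1} = b^{2i+1} + \alpha_2\rho_1R_i/\alpha_1 ; \qquad R_i = b^{2i} + \alpha_1\rho_2Q_i/\alpha_2 , \qquad (13)$$

from which by taking g.f.'s we obtain

$$Q(z) = \sum Q_iz^i = \frac{z(\alpha_1b + \alpha_2\rho_1)}{\alpha_1(1-b^2z)(1-\rho_1\rho_2z)} , \qquad R(z) = \sum R_iz^i = \frac{\alpha_2 + b\alpha_1\rho_2z}{\alpha_2(1-b^2z)(1-\rho_1\rho_2z)} ,$$

$$q = Q(1) = \frac{\alpha_1b + \alpha_2\rho_1}{\alpha_1(1-b^2)(1-\rho_1\rho_2)} ; \qquad r = R(1) = \frac{\alpha_2 + b\alpha_1\rho_2}{\alpha_2(1-b^2)(1-\rho_1\rho_2)} . \qquad (14)$$

Substituting in (12), we obtain

$$\mu = p(\alpha_2 + \alpha_1b)/(1-b^2) = p\alpha/(1-b) \quad \text{when } \alpha_1 = \alpha_2 = \alpha .$$

To obtain $\sigma^2 = \mathrm{Var}(Y_{2n})$ we have,

$$\sigma^2 = \frac{d^2}{dt^2}\log L(t)\Big|_{t=0} = p\,\big[\alpha_1^2(1-\rho_2)^2\,t - \alpha_1(1-\rho_2)\,f + \alpha_2^2(1-\rho_1)^2\,h - \alpha_2(1-\rho_1)\,s\big] , \qquad (15)$$

where $F_i = K_i''(0)$, $S_i = J_i''(0)$, $f = \sum F_i$, $s = \sum S_i$, $h = \sum R_i^2$ and $t = \sum Q_i^2$ (on the right-hand side $t$ denotes this sum, not the argument of the L.T.). From (11), we have,

$$F_{i+1} = \frac{\alpha_2\rho_1}{\alpha_1}\,S_i - \frac{2\alpha_2^2\rho_1(1-\rho_1)}{\alpha_1}\,R_i^2 , \qquad S_i = \frac{\alpha_1\rho_2}{\alpha_2}\,F_i - \frac{2\alpha_1^2\rho_2(1-\rho_2)}{\alpha_2}\,Q_i^2 , \qquad (16)$$

from which, summing over $i$ and solving for $s$ and $f$, we obtain $s$ and $f$ in terms of $h$ and $t$. Substituting in (15) gives us

$$\sigma^2 = p\,\big[\alpha_1^2(1-\rho_2^2)\,t + \alpha_2^2(1-\rho_1^2)\,h\big] .$$

Also, from (13) we have on squaring and taking g.f.'s,

$$t = \frac{b^2}{1-b^4} + \frac{\alpha_2^2\rho_1^2}{\alpha_1^2}\,h + \frac{2\alpha_2\rho_1b}{\alpha_1}\,R(b^2) , \qquad h = \frac{1}{1-b^4} + \frac{\alpha_1^2\rho_2^2}{\alpha_2^2}\,t + \frac{2\alpha_1\rho_2}{\alpha_2}\,Q(b^2) . \qquad (17)$$

Using (14), (16) and (17) we finally obtain $\sigma^2$ explicitly in terms of $p$, $b$, $\alpha_1$, $\alpha_2$, $\rho_1$ and $\rho_2$. For the case $\alpha_1 = \alpha_2 = \alpha$, $\rho_1 = \rho_2 = \rho$, $\sigma^2$ reduces to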
$p\alpha^2(1 + b\rho)/[(1 - b^2)(1 - \rho b)]$. This final result was obtained by Lloyd and Warren (1981) using different methods.

REFERENCES

Cox, D.R. and Miller, H.D., 1965. The Theory of Stochastic Processes. Methuen & Co., London.

Gaver, D.P. and Lewis, P.A.W., 1980. First-order autoregressive gamma sequences and point processes. Adv. Appl. Prob., 12, 727-745.

IMSL/GGAMS, 1980. International Mathematical and Statistical Libraries, Houston, Texas, Vol. 2, 8th ed.

IMSL/GGPOS, 1980. International Mathematical and Statistical Libraries, Houston, Texas, Vol. 2, 8th ed.

Lloyd, E.H. and Warren, D., 1981. The linear reservoir with seasonal gamma-distributed Markovian inflows. Time Series Methods in Hydrosciences, Edited by A.H. El-Shaarawi and S.R. Esterby. Elsevier, Amsterdam.

Pegram, G.G.S., 1974. Factors affecting draft from a Lloyd reservoir. Water Resour. Res., 10, 63-66.

Phatarfod, R.M., 1971a. Some approximate results in Renewal and Dam Theories. Jour. Austr. Math. Soc., 12, 425-432.

Phatarfod, R.M., 1971b. Sequential tests for normal Markov sequence. Jour. Austr. Math. Soc., 12, 433-440.

Phatarfod, R.M., 1976. Some aspects of stochastic reservoir theory. J. Hydrol., 30, 199-217.

Phatarfod, R.M., 1985. The linearly regressive Gamma process. Stats. Res. Report No. 130, Dept. of Mathematics, Monash University, Australia.
DYNAMIC COVARIATE ADJUSTMENT OF WATER QUALITY PARAMETERS FOR STREAMFLOW: TRANSFER FUNCTION MODEL SELECTION

LARRY D. HAUGH, YOSUKE NODA, JOAN MCCLALLEN
University of Vermont
ABSTRACT

Weekly concentration data for total phosphorus and total suspended solids, as well as mean weekly flow, were studied for a period of 229 weeks (mid 1979 to late 1983) at one station in the LaPlatte River Watershed of Vermont. The autocorrelation patterns in these data and their stationarity in the mean are described via selection of univariate ARIMA time series models. The relationship of the concentration data to the flow data is modelled by selection of a transfer function model. Contrasts are drawn between stationary and nonstationary (first order differenced) models. The relative advantage of the flow-dependent model for understanding trends in concentration is considered. The model provides a dynamic adjustment, or covariate analysis, of concentration changes due to flow, which would be critical in assessing intervention effects. Such a model also captures seasonal variation in concentration via the seasonal variation in flow.
This study focuses on the relationship between water quality parameters measured in a stream, such as suspended solids, phosphorus and nitrogen concentrations, and the stream's flow rate. Our objective was statistical in nature, in that we wished to assess whether a standard time series transfer function model would be adequate to describe such relationships. The water quality parameters mentioned are examples of outcome variables that would be considered in many water quality monitoring programs. The particular data discussed here arose in a long term water monitoring project in the LaPlatte River watershed of Vermont. Of interest in such studies are short and long term trends in water quality as well as the effects of various changes in the environment or man-made interventions, due to changes in agricultural or industrial practices.

Of course to properly assess the effect of interventions or understand the reasons for trends in water quality, one must first account for the changes in such quality variables which would just be due to natural changes in the weather or environment. In particular, the concentration of various elements in a stream will be a function of the stream's flow rate. As the flow rate changes with rainfall or with natural alterations in runoff patterns, the concentration of various elements in the stream will change, quite apart from (or in addition to) man-made interventions. Thus any statistical study of causal effects in water quality must account for such a time-varying covariate as streamflow. (Kirsch, et. al., 1982 emphasize this for example). Of course some interventions by their nature may also alter total streamflow or the timing of appearance of water in the streambed and such effects would require separate analysis to study the confounding that would then exist with other causes of flow changes. In the Vermont study the intervention of primary
concern w a s not expected to alter streamflow directly, but m y aitV coricentrations for other reasou. It is hoped that such transfer function analyses of water quality parameters w i l l h e l p advance t h e state OE t h e art in a s s d g trends and intervention.
Although such analytical techniques have been available for some time (Box and Jenkins, 1976) and have been applied in hydrological or water quality work in other contexts (Snorrason, et al., 1984; Hipel, et al., 1985, and references therein), they are not used as a standard methodology in this context (see McLeod, et al., 1983 and Damsleth, 1986, though, for related studies). Also, the inclusion of flow rate explicitly as a variable in the analysis of water quality parameters may serve to simplify the analysis of seasonality. That is, much of the seasonality in elemental concentrations is directly due to, or at least confounded with, changes in flow rate through the seasons of the year. Rather than forcing more arbitrary seasonal adjustments on concentration levels, which typically would somehow average out flow patterns across years, it could be better to use models incorporating flow explicitly via a stochastic transfer function model, or perhaps a regression model in special cases.
In the following section the particular Vermont time series data to be analyzed will be placed in perspective. Then the statistical methods to be used will be summarized. The results of univariate time series analysis of total suspended solids (TSS), total phosphorus (TP) and streamflow will be given. The relationship of TSS and TP to streamflow will be summarized via cross correlation analysis and selection of transfer function models to be fit. Finally, more general remarks will be given in the discussion section.

LAPLATTE RIVER WATERSHED PROJECT
The LaPlatte River watershed lies in southern Chittenden County in northwestern Vermont, USA. It is a major nonpoint source of phosphorus and sediment, which has resulted in poor water quality in the river and in the Shelburne Bay of Lake Champlain into which it empties. Fifty percent of the watershed is used for agricultural purposes, with considerable acreage in cropland and large numbers of cattle in the watershed. Cropland erosion and poor management of animal manure were felt to be significant nonpoint source contributors to these water quality problems. The data used in this study were taken within the Mud Hollow Brook subwatershed area, the brook itself being characterized by high turbidity. This corresponds to Station No. 2 within the water quality monitoring program of the LaPlatte River Watershed Project of the Vermont Water Resources Research Center of the University of Vermont, which is sponsored by the USDA Soil Conservation Service, in cooperation with other local agencies within Vermont. See Figure 1. The reports of the project (e.g. Meals, 1983) should be seen for further details of the watershed area, sampling locations, procedures used in obtaining the quality and water flow measurements, as well as motivation for establishing best management practices (BMP) to help control manure runoff in the area. The 1983 report serves as the source for the general comments made here, and the scope of the project is summarized in Meals, 1985.
Fig. 1. Location of the LaPlatte River Watershed. The subwatershed corresponding to Monitoring Station No. 2 is darkened.
Water quality monitoring for the project began in Spring 1979. The purposes of the monitoring program include evaluating the impact of implementing conservation practices on the surface waters of the watershed and on the export of sediment and nutrients to Shelburne Bay. Five automated stations were established in the watershed for this purpose, at which sediment, phosphorus, nitrogen and other pollutants are determined on a regular time-sampling basis. The sampling program is part of an eleven-year study (1979-1990). The Mud Hollow Brook Station (No. 2) monitors a subwatershed of about 4000 acres. The soils in the area are predominantly lacustrine clay, and the area contains a variety of tillage and manure management practices with a high level of farmer participation in BMP implementation. The only major point source is a water treatment plant which is downstream of this station.

Stream stage was measured at Station 2 by bubbler-type recorders, and discharge data were calculated using stage-discharge ratings developed earlier and periodically updated from manual gauging data. Samples for quality variables were taken every eight hours automatically and composited in the laboratory to yield four 24-hour samples and one 72-hour sample each week. Analyses for total phosphorus, total suspended solids and total Kjeldahl nitrogen were performed in the University Water Quality Laboratory according to accepted analytical techniques (APHA standard methods).

In the statistical studies which served as a basis for our study, both the concentration and mass of the variables total phosphorus (TP), total suspended solids (TSS), and total Kjeldahl nitrogen (TKN) were considered. The measurement units for concentration are milligrams per liter. The mean weekly flow (MWF) is measured in cubic feet per second. Mass, in units of pounds per week, is derived from concentration on an aggregated weekly basis as being proportional to the product of concentration and flow. Because of space limitations the emphasis in this paper will be on the relationship of both TSS and TP to MWF. TKN was also analyzed similarly, but there were long stretches of missing values, and the results are not given here. The total data available for this study (TSS, TP, MWF) spanned the time period of 1979-1983, for 229 consecutive weeks.

OVERVIEW OF STATISTICAL ANALYSIS
Water quality and flow data such as these have a substantial inherent variance, are subject to extreme events (rainstorms, etc.) and are subject to sampling and measurement errors. Thus a great deal of initial data editing is prerequisite to a formal multivariate time series analysis. Unfortunately there is also a potential for missing values in any of the variables, due for example to automatic samplers not functioning. A standard regression analysis is not affected by these as much as a transfer function fit, because of the time lags inherent in the latter models, and a method for supplying missing values consistent with the autocorrelated model being fit would be the best practice. Fortunately, in the actual models we fit to the data, the extent of autocorrelation was weak (because we are using weekly values), so that occasional missing values could more safely be supplied by interpolation.

Based on a variety of statistical tools, including time plots, the autocorrelation function (ACF), the partial autocorrelation function (PACF), and consideration of time differencing and transformation of the values, a univariate time series model was identified, fit and checked for each of the variables (Box and Jenkins, 1976). Various extensions (for shifts in level, seasonality and trend) of the basic ARIMA models often are adequate to consider for this purpose. The models are fit and checked using an efficient statistical estimation method, which in our case was the approximate maximum likelihood method available in the BMDP2T program of BMDP (Dixon, 1981).

Cross correlation of the univariate residuals from each series fit can be used as an aid to identification of a transfer function model relating quality variables to flow (Haugh, 1976; Haugh and Box, 1977). The cross correlation approach available in BMDP follows that outlined in Box and Jenkins (1976), in which the univariate model form of the flow series is used to transform correspondingly the quality variable series, so that the cross correlation function between the flow residual series and the transformed quality series is proportional to the transfer function weights relating flow to the quality series. The pattern in this cross correlation function serves to identify the form of a rational transfer function model which can be fit. The residuals of such a fit are used to identify (via e.g. the ACF) the autocorrelation form of the noise component of the model.

Final selection of a particular model will be based on various criteria. Statistical criteria which can aid in the selection include the mean square error of the residuals (MSE) or an Akaike-based criterion of the AIC form (Hipel, 1981). Ultimately the selected model must also be tested for out-of-sample forecast error, if it is to be used in a forecasting application. Our primary application in this study is not forecasting, however, but adjustment of the quality variable series to changes in flow. Box and Jenkins (1976) rightly emphasize the iterative nature of the model building process. Although automatic selection criteria like AIC can be helpful for selecting a best fit model within a specified class of models, one should not try to escape from the diagnostic checking and identification stages of the whole process. One must imaginatively use graphical and numerical checks of fitted models to help discover ways that the model could be improved.

Although mathematically a stochastic time series model is either stationary or nonstationary, it is not always so clear cut, in a data record of a few years length, which would provide the more useful model. Thus in practice series can appear to be "nearly nonstationary," this also often being the case in economic applications. Thus in this study a special effort was made in fitting transfer function models to explore carefully each of the possibilities that could exist with respect to whether the quality variable or flow was considered as stationary or as requiring first differencing. As long as one follows up each possibility by efficiently fitting an appropriate transfer function model, one will find approximately the same adjustment model implied for flow.

UNIVARIATE DESCRIPTION OF THE SERIES
In this section some univariate time series models will be given for TSS, TP and MWF. However, we have chosen to ignore longer lags (e.g. seasonal lags) in the modeling presented here. Autocorrelation coefficients for lags less than 52 were only marginally significant in any case (say at about 26 weeks), and we eventually will use flow itself via the transfer function model to explain seasonal effects. Our emphasis here is on whether the series for shorter lags are stationary and what sort of ARIMA model at low lags may be necessary to explain short term persistence. Parenthetically, it should be noted that events such as spring runoff affecting flow do not occur at fixed weeks in the year, so that stochastic seasonality will be present (Damsleth, 1986). In terms of the ACF, the seasonal effect at say a 52 week lag can be fairly weak, and not sharply focused at 52 weeks, but exist also at neighboring lags. Our purpose is not to fully characterize the seasonality of TSS or TP but instead to adjust ultimately the quality series by the flow series. A general principle that is of value to keep in mind when dealing with seasonality is to aim for explicitly including the cause of seasonality in the model whenever possible, rather than just describing it via a spectral analysis or high order ARIMA model. In our case we can use the flow series as a surrogate for seasonal effect in general. Inclusion of the flow series as an independent variable in the transfer function model for a quality series, if successful in this regard, would yield a residual series which is essentially nonseasonal.

Mean Weekly Flow (MWF)

Stream discharge is considered on a mean weekly basis, and generally is expected to follow a seasonal pattern of high and variable discharge during the wet fall season, lower values in the winter, peak values in the spring and very low flows during the summer (Meals, 1983). See Figure 2. Graphical analysis of the weekly flow figures indicates that a log transformation would be helpful, and henceforth just results for log MWF are given. See Figure 3.
For the given sample data there is some question as to whether a first order differencing of the series is desirable or not. In the modelling process we tracked through to completion the two alternative strategies of not differencing or differencing the series. Any higher order differencing is not needed. Assuming a weakly stationary model for flow yields an AR(1) as the selected model among the class of low order ARMA models, $(1 - 0.75B) z_t = a_t$, with an error standard deviation (RMSE) of 0.71, where $B$ stands for the usual backshift operator, $a_t$ is white noise error and $z_t$ is mean corrected log MWF. Summaries of this and other univariate models appear in Table A. In fact the best model among the class of low order ARIMA models with first differencing had a standard deviation somewhat larger (0.80), $(1 - B) z_t = (1 - 0.265B) a_t$. Note that rewriting this IMA(1,1) model as an infinite order AR model yields $(1 - 0.735B - 0.195B^2) z_t = a_t$ up to second order, which agrees approximately with the first order term of the AR(1) model above. This raises the general point that when fitting alternative ARIMA models, it should be found that contending models which fit well will be similar to each other when expressed simultaneously as infinite autoregressive or infinite moving average models, in terms of their coefficient weights in such expansions.

Fig. 2. Stream discharge on a weekly basis over a one year period, showing seasonal highs and lows. (Plot title: Stream Discharge - Watershed 2, Year 5, September 1982 - August 1983, LaPlatte River Watershed Project.)

Fig. 3. Stream discharge on a weekly basis over more than a four year period, now on a log scale. (Plot title: Trends in Stream Discharge - Years 1-5, Watershed 2 - Mud Hollow Brook, LaPlatte River Watershed Project; time axis 1979-1984.)

Parenthetically, the variables in all stationary univariate fits here are mean corrected as necessary, and the estimated means of the differenced series models in all cases were insignificantly different from zero. Although the results will not be given here, several alternative, higher order ARIMA models were fit and found to be overparameterized relative to the models displayed in Table A. Such simplifications in model can be justified via significance tests on the estimated coefficients or via the AIC criterion. The same modelling approach of overfitting and then simplifying when reasonable was used for all univariate variable fits discussed in the following sections.

TABLE A. FITTED UNIVARIATE ARIMA MODELS

Variable                      Model                                           Parameter estimates (standard errors)                      Standard deviation (RMSE)
Log Mean Weekly Flow          AR(1)                                           0.747 (.044)                                               0.711
                              IMA(1,1)                                        0.265 (0.64)                                               0.799
Log Total Suspended Solids    AR(1)                                           0.682 (.048)                                               0.638
                              IMA(1,1)                                        0.334 (.063)                                               0.670
                              IMA(1,4), $\theta_2 = 0$                        0.439 (.056), 0.180 (.062), 0.238 (.065)                   0.636
Log Total Phosphorus          AR(1)                                           0.536 (.056)                                               0.428
                              ARMA(1,8), $\theta_2 = \dots = \theta_7 = 0$    0.366 (.101), -0.246 (.103), 0.238 (.064)                  0.417
                              IMA(1,10), $\theta_3 = \dots = \theta_8 = 0$    0.454 (.060), 0.445 (.060), 0.163 (.062), -0.147 (.061)    0.424
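The agreement noted above between the IMA(1,1) and AR(1) flow models can be checked by expanding $(1 - B)/(1 - \theta B)$ as a power series in $B$. The following is a minimal sketch of that arithmetic only; the function name `ima11_pi_weights` is hypothetical and is not part of BMDP or of the original analysis.

```python
def ima11_pi_weights(theta, n_terms):
    """AR(infinity) weights of the IMA(1,1) model (1 - B) z_t = (1 - theta*B) a_t.

    Writing the model as z_t = pi_1 z_{t-1} + pi_2 z_{t-2} + ... + a_t and
    expanding (1 - B) / (1 - theta*B) gives pi_j = (1 - theta) * theta**(j - 1).
    """
    return [(1.0 - theta) * theta ** (j - 1) for j in range(1, n_terms + 1)]


# theta = 0.265 is the fitted IMA(1,1) coefficient for log MWF in Table A.
pi_weights = ima11_pi_weights(0.265, 4)
print([round(w, 3) for w in pi_weights])
```

The first two weights come out as 0.735 and about 0.195, matching the expansion quoted in the text, and the first of these is close to the AR(1) estimate of 0.747.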
Total Suspended Solids (TSS)

In general the sediment concentration, as measured by Total Suspended Solids (TSS), is responsive to the changes in stream discharge, and thus is highest in the rainy fall and spring runoff periods. See Figure 4. In the analyses considered here the logarithm of TSS was used as the quality variable.

The best fitting low order ARMA model is an AR(1), $(1 - 0.682B) z_t = a_t$, with an alternative low order first difference model being again the IMA(1,1), $(1 - B) z_t = (1 - 0.334B) a_t$ (see Table A). However, for this variable some more complicated, but still relatively low order, ARIMA models actually had a lower mean squared error and AIC value. In particular the following model was better: $(1 - B) z_t = (1 - 0.439B - 0.180B^3 - 0.238B^4) a_t$. This is an example of a subset moving average model, wherein some of the usual MA parameters (here just $\theta_2$) are set to 0, and makes the general point that best fitting models in the ARIMA class can be of this constrained parameter form. The higher order terms present here indicate a "persistence" in TSS extending beyond the one week lag in differenced values.
Fig. 4. Total Suspended Solids (TSS) concentration on a weekly basis over more than a four year period. (Plot title: Trends in TSS Concentration - Years 1-5, Watershed 2 - Mud Hollow Brook, LaPlatte River Watershed Project.)

Total Phosphorus (TP)
As with suspended solids, the concentration of total phosphorus generally follows changes in stream discharge, having seasonal peaks in the fall and spring and its lowest values in the winter and summer. See Figure 5. A logarithmic transformation is used below, also mean corrected in the stationary case. Treating the series as stationary, a simple model to consider would be the AR(1), which in fact would be preferred to an AR(2) or ARMA(1,1) model, $(1 - 0.536B) z_t = a_t$. However, at somewhat longer lags there is also autocorrelation which is at least marginally significant. For example, the following model has a better MSE and AIC than the AR(1) model, $(1 - 0.366B) z_t = (1 + 0.246B + 0.238B^8) a_t$. This again is a model with longer apparent persistence than the AR(1) model.
Fig. 5. Total Phosphorus (TP) concentration on a weekly basis over more than a four year period. (Plot title: Trends in Total P Concentration - Years 1-5, Watershed 2 - Mud Hollow Brook, LaPlatte River Watershed Project; time axis 1979-1984.)

An important general point to beware of in cases like this, when it may be difficult to argue why there should be a stronger lag 8 persistence than at intermediate lags, is that the large correlation may be somewhat artifactual, and in fact due to a few unusual occurrences (e.g. outlying values) in the data set. A scatter plot of $z_t$ versus $z_{t-8}$, for example, may reveal a few unusual pairs of points contributing to the large autocorrelation.
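A numerical counterpart of the scatter plot check described above is to list the individual lagged products that make up the lag 8 autocovariance and see whether a handful of them dominate. The sketch below uses simulated white noise with two planted outliers, not the LaPlatte data; the function names and the planted values are hypothetical and chosen only for illustration.

```python
import random


def lag_autocorr(x, k):
    """Sample autocorrelation of x at lag k (mean corrected)."""
    n = len(x)
    mean = sum(x) / n
    c0 = sum((v - mean) ** 2 for v in x)
    ck = sum((x[t] - mean) * (x[t - k] - mean) for t in range(k, n))
    return ck / c0


def largest_lag_products(x, k, top=3):
    """Return the `top` (product, t) pairs contributing most to the lag-k sum."""
    n = len(x)
    mean = sum(x) / n
    products = [((x[t] - mean) * (x[t - k] - mean), t) for t in range(k, n)]
    return sorted(products, reverse=True)[:top]


# Hypothetical data: 229 weeks of white noise plus two large values 8 weeks apart.
random.seed(1)
clean = [random.gauss(0.0, 1.0) for _ in range(229)]
with_outliers = list(clean)
with_outliers[100] = 10.0
with_outliers[108] = 10.0

print("lag-8 autocorrelation, clean series:  ", round(lag_autocorr(clean, 8), 3))
print("lag-8 autocorrelation, with outliers: ", round(lag_autocorr(with_outliers, 8), 3))
print("largest lag-8 products (value, t):    ", largest_lag_products(with_outliers, 8))
```

If one or two pairs account for most of the lag 8 sum, as the planted pair does here, the apparent persistence at that lag deserves scrutiny before a lag 8 term is kept in the model.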
Considering the differenced series, again a longer lagged model happens to fit best, $(1 - B) z_t = (1 - 0.454B - 0.445B^2 - 0.163B^9 + 0.147B^{10}) a_t$, which can be viewed as a subset MA model. The simpler models do not pass diagnostic checks on their residual series.
TRANSFER FUNCTION MODELS FOR TSS

Once models have been identified for each of the series individually, a cross correlation analysis of the prewhitened versions of each series can be undertaken. The use of the residual series from the univariate model fit in the cross correlation analysis aids the interpretation of the observed lead-lag relationships, in that each series' autocorrelation structure has been removed (Haugh, 1977). We view this cross correlation analysis as an aid to identifying the dynamic relationship between the two series (Haugh and Box, 1977). The transfer function identification technique of Box and Jenkins (1976) as implemented in BMDP just requires a prewhitening of the input series, which is here MWF.

As an exercise in model identification we proceeded to identify and fit transfer function models under the four possible cases of stationary or nonstationary input and stationary or nonstationary output series. The stationary ARMA and nonstationary ARIMA models for each of TSS and flow have been previously described. Log flow, for example, was AR(1), and the residuals of that AR(1) model can be cross correlated with the correspondingly transformed (i.e. filtered) TSS series, as would be done in the BMDP approach. The only strongly significant point in the plot is at lag 0, indicating a contemporaneous correlation (not surprising). Thus one identifies a transfer function as $z_t = w_0 (\log \mathrm{MWF}_t) + n_t$, where $z_t$ is the mean corrected log TSS, $n_t$ is the noise in the relationship and $w_0$ is the regression coefficient of log TSS on log flow.

Having identified the transfer function form, one then fits a model with an assumed ARMA noise model. The noise model identification can either wait for an analysis of the residuals and their ACF from a first fit, or one can calculate an estimated noise series as $\hat{n}_t = z_t - \hat{V}(B)(\log \mathrm{MWF}_t)$, where in our case $\hat{n}_t = z_t - \hat{w}_0 (\log \mathrm{MWF}_t)$, and identify a noise model from the ACF of the $\hat{n}_t$. The estimate $\hat{w}_0$ can be calculated from the value of the cross correlation at lag 0 and the standard deviations of the input and output series. The ACF of the noise series indicates a highly autocorrelated series, which it turns out can be well fit by an AR(1) model (see Table B). Other slightly more complicated noise models did not fit significantly better, and the residuals passed the usual diagnostic checks. The residual standard deviation of fit was 0.602 for log TSS, which can be compared to the univariate model error standard deviation for log TSS of 0.638. Thus the inclusion of log MWF in a transfer function model yields an adjusted series with a 6% smaller standard deviation than an adjusted (or residual) series for log TSS based on its own history. Of course the comparison is more dramatic with the standard deviation of log TSS itself, if no adjustment were done at all.

Even though the reduction in error standard deviation is not dramatic for the transfer function model, if compared to the univariate model, that is a perspective that is useful only in the situation of providing short term forecasts of the quality variable. That is, for short term prediction (a few weeks ahead) one can do just about as well using a well fit univariate model, as it in essence incorporates the flow history via the historical quality levels. However, the purpose of our fitting the transfer functions is to understand how to adjust quality levels for flow changes in future intervention studies. So to determine whether trends in quality have been significant over time, or whether there are significant differences in different stretches of the monitoring record, it is important to have a proper adjustment for flow so that one can assess the effect of other variables beyond those effects simply due to flow changes. Thus the closer analogy is with a dynamic covariate adjustment, rather than with dynamic forecasting.

TABLE B. FITTED TRANSFER FUNCTIONS FOR TOTAL SUSPENDED SOLIDS AND MEAN WEEKLY FLOW

Transformation of series         Transfer function coefficients* (standard errors)    Noise coefficients**                           Error standard deviation (RMSE)
log TSS, log MWF                 0.298 (.056)                                         AR(1): 0.707 (.047)                            .602
log TSS, (1-B) log MWF           0.299 (.054), 0.992 (.008)                           AR(1): 0.689 (.048)                            .602
(1-B) log TSS, log MWF           0.273 (.058), 0.281 (.057)                           AR(1): 0.653 (.060), MA(1): 0.971 (.023)       .610
(1-B) log TSS, (1-B) log MWF     0.277 (.056)                                         AR(1): 0.634 (.061), MA(1): 0.955 (.021)       .608

* Numerator coefficients given first, then denominator coefficients of the rational transfer function.
** The first two noise models are AR(1), the next two are ARMA(1,1).
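The identification steps described in this section (prewhiten the input, pass the output through the same filter, cross correlate, and scale the lag 0 cross correlation by the ratio of standard deviations) can be sketched on simulated data. Everything below is an illustrative assumption rather than the BMDP procedure or the LaPlatte data: the series are simulated with a known contemporaneous weight, the AR(1) filter coefficient is taken as known instead of estimated, and all names are hypothetical.

```python
import random


def ar1_filter(x, phi):
    """Apply the prewhitening filter (1 - phi*B) to x; the first value is dropped."""
    return [x[t] - phi * x[t - 1] for t in range(1, len(x))]


def std(x):
    """Standard deviation with divisor n."""
    mean = sum(x) / len(x)
    return (sum((v - mean) ** 2 for v in x) / len(x)) ** 0.5


def cross_corr(a, b, k):
    """Sample cross correlation between a[t] and b[t + k], for lag k >= 0."""
    n = len(a)
    ma, mb = sum(a) / n, sum(b) / n
    cov = sum((a[t] - ma) * (b[t + k] - mb) for t in range(n - k)) / n
    return cov / (std(a) * std(b))


def simulate_ar1(n, phi, sigma, rng):
    """Simulate an AR(1) series started at zero."""
    x = [0.0]
    for _ in range(n - 1):
        x.append(phi * x[-1] + rng.gauss(0.0, sigma))
    return x


rng = random.Random(0)
n, phi_flow, w0_true = 2000, 0.75, 0.3            # assumed values for the simulation
log_flow = simulate_ar1(n, phi_flow, 0.7, rng)    # stand-in for log MWF
noise = simulate_ar1(n, 0.7, 0.4, rng)            # autocorrelated noise n_t
log_tss = [w0_true * x + e for x, e in zip(log_flow, noise)]

alpha = ar1_filter(log_flow, phi_flow)            # prewhitened input
beta = ar1_filter(log_tss, phi_flow)              # output passed through the same filter
r = [cross_corr(alpha, beta, k) for k in range(4)]
w0_hat = r[0] * std(beta) / std(alpha)            # lag-0 transfer function weight

print("cross correlations at lags 0-3:", [round(v, 3) for v in r])
print("estimated w0:", round(w0_hat, 3))
```

With a purely contemporaneous relationship, only the lag 0 cross correlation should stand out, and the scaled lag 0 value should land near the weight used in the simulation.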
Nonstationary Flow Implications

In the above transfer function model fit, MWF was treated as stationary, via an AR(1) model. Suppose instead that a first differenced model form for MWF were selected. What transfer function model would result? In this case one would select, say, the IMA(1,1) model to filter the log flow and log TSS data before cross correlation. The cross correlation in this case does not as clearly indicate the pattern in the transfer function, as the weights at lags 0 and 1 are just marginally significant. As a possible summarization of the pattern (and in this case we know what we might be looking for) one notes that the weights for positive lags drop slowly from the lag zero weight. The identification then is of a transfer function of the form $w_0 (1 - \delta B)^{-1}$. With this function, various low order noise models were fit, the final selection being an AR(1) (see Table B). Note that the dynamic parameter $\hat{\delta} = 0.992$ is insignificantly different from 1.00, which indicates that the transfer function combined with the $(1 - B)$ differencing operator of log MWF gives the same constant transfer function as that with the stationary input log MWF previously seen. The noise models are also essentially the same, as is the error standard deviation. Thus one would need other bases beyond statistical fit to choose between the stationary and nonstationary versions of flow as input.

Nonstationary TSS Implications

In both of the above identifications of transfer functions, TSS was treated as stationary, but in this and in the next subsection we will follow up on the identification results if TSS were first differenced instead. If one treats flow as stationary and filters it, as well as the differenced TSS series, by the AR(1) model, one sees a cross correlation pattern which has significant spikes at lags 0 and 1 (other marginally significant lags will not be explored in this discussion). This leads to a transfer function of the form $(w_0 - w_1 B)$. Various low order noise models were considered, with the simplest models, such as AR(1) and MA(1), found to be inadequate. An ARMA(1,1) model did fit adequately, with coefficients of $\hat{\phi} = 0.653$ and $\hat{\theta} = 0.971$, but the MA operator is insignificantly different from $(1 - B)$ (see Table B). Also the fitted transfer function is $(0.273 - 0.281B)$, which is insignificantly different from a model of the form $w_0 (1 - B)$. Thus there is approximately a common factor of $(1 - B)$ in the transfer function, in the noise model and in the differencing used for log TSS. Elimination of the common factor would yield a model like that first found between log TSS and log MWF, where each was treated as stationary initially.

Nonstationary TSS and MWF Implications

In this case both of the first differenced series are filtered by the univariate IMA(1,1) model for log flow. The result is a cross correlation function with just a spike at lag 0, leading to a transfer function model of just $w_0$. Among low order noise models, the ARMA(1,1) is found to be adequate, with $(1 - 0.634B) n_t = (1 - 0.955B) a_t$ (see Table B). To match the previous results seen when both series are stationary, there should be a $(1 - B)$ term in the noise model. Here the MA coefficient is marginally significantly different from 1, although clearly close to $(1 - B)$ in form. Eliminating the common factor of $(1 - B)$ in the noise model and in the differencing operators for output and input yields the equivalent model found when treating both series as stationary.

TRANSFER FUNCTION MODELS FOR TP
A similar exercise was done for TP as was done with TSS. Each of the four possible differencing combinations among TP and MWF was explored, as to the ultimate form of the fitted transfer functions. Results are summarized in Table C. Note that the fitted models are all within 2% of each other in terms of their error standard deviation. The forms of transfer function can be reconciled with each other by taking into account the differencing operator and ignoring small coefficients. For example, the transfer function for log TP with $(1 - B)$ log MWF is $(0.182 - 0.132B - 0.084B^2)$, which when factored by $(1 - B)$ gives $(1 - B)(0.182 + 0.05B - 0.034B^2 \dots)$, which agrees approximately with the $(0.176)$ form of transfer function for log MWF, ignoring the coefficients which are .05 and smaller.

TABLE C. FITTED TRANSFER FUNCTIONS FOR TOTAL PHOSPHORUS AND MEAN WEEKLY FLOW

Series transformations          Transfer function coefficients (standard error)    Noise coefficients                                                                     Error standard deviation (RMSE)
log TP, log MWF                 0.176 (.036)                                       AR(1): 0.389 (.097), MA(1): -0.262 (.10), MA(8): -0.202 (.065)                         0.396
log TP, (1-B) log MWF           0.182 (.035), 0.132 (.042), 0.084 (.034)           AR(1): 0.340 (.103), MA(1): -0.286 (.102), MA(8): -0.224 (.066)                        0.395
(1-B) log TP, log MWF           0.163 (.036), -0.172 (.036)                        MA(1): 0.958 (.021), MA(1): -0.294 (.120), MA(8): 0.171 (.069)                         0.402
(1-B) log TP, (1-B) log MWF     0.169 (.036)                                       AR(1): 0.308 (.126), MA(1): 0.948 (.022), MA(1): 0.166 (.069), MA(8): 0.20 (.120)      0.403
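The factoring by $(1 - B)$ quoted in the text above can be reproduced with a few lines of arithmetic: dividing a polynomial in $B$ by $(1 - B)$ gives a power series whose coefficients are the running sums of the original coefficients. This sketch takes the polynomial exactly as printed, $(0.182 - 0.132B - 0.084B^2)$; the function name is hypothetical.

```python
def divide_by_one_minus_b(coeffs, n_terms):
    """Power series coefficients of coeffs(B) / (1 - B).

    coeffs[j] is the coefficient of B**j. Because 1 / (1 - B) = 1 + B + B**2 + ...,
    the quotient coefficients are running sums of the input coefficients.
    """
    out, total = [], 0.0
    for j in range(n_terms):
        if j < len(coeffs):
            total += coeffs[j]
        out.append(total)
    return out


# Transfer function for log TP with (1 - B) log MWF, as printed in the text.
v = [0.182, -0.132, -0.084]
quotient = divide_by_one_minus_b(v, 5)
print([round(c, 3) for c in quotient])
```

The leading quotient coefficients are 0.182, 0.050 and -0.034, as in the text. Since the original coefficients do not sum to zero, the factorization is not exact and the -0.034 term repeats at higher lags, which is why it holds only after ignoring small coefficients.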
CONCLUSION AND DISCUSSION

Although there will be cases where the choice between stationary and nonstationary versions of the series is clear, there are situations where it is not obvious from the stretch of historical record. For the purpose of adjustment of the quality series the distinction is not so critical, as a careful iterative model building process of identification, estimation and diagnostic checking should bring us round to models which are close enough for practical purposes. For forecasting purposes one would hopefully have other knowledge to guide the choice in terms of the implied short and longer term forecast functions.

As in any careful statistical analysis, time must be invested in considering reasonable transformations of the data and in trying various alternative model forms. Although a criterion such as AIC can be helpful in selecting among several fitted models, one must have investigated a broad enough class of models to ensure that the optimal model will ultimately be found within the class. In practice one should not have to explore as completely as was done here the various possible differencings for each of the independent and dependent variables in the models. It is comforting to realize that the ultimate model chosen for historical adjustment will be somewhat robust to such a choice. One would choose, in the absence of other considerations, that model with a fit criterion optimized (e.g. MSE or AIC).
The adjustment procedure for the quality variables will depend on the overall assumptions made about the joint relationship among the quality series, the flow and the other variables of interest such as interventions. If one wants to adjust for flow, apart from any other factors, one could use the (mean-corrected) transfer function model,
$$z_t = V(B)\,MWF_t + n_t.$$
The residual series $\hat n_t$ would then become the flow-adjusted series. In terms of the joint model, if specified, one could fit simultaneously a model involving all components,
$$z_t = V(B)\,MWF_t + \text{(other variable effects)} + n_t.$$
This would give a dynamic covariate analysis of the effects of the other variables on $z_t$. This of course assumes that there is no interaction between flow and the other variables considered. In some cases a more complex interaction model might be appropriate, or separate fits may be appropriate depending on the season or flow rate.
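The flow adjustment described above can be sketched in a few lines. This is only an illustration under strong simplifying assumptions that are not taken from the paper: the transfer function is reduced to a single contemporaneous weight, V(B) = v0, the noise n_t is taken to be AR(1), and the fit uses a Cochrane-Orcutt style iteration instead of the full transfer function estimation. The names `flow_adjust`, `centred` and `slope_through_origin` are hypothetical, and the data are synthetic.

```python
import random


def centred(series):
    """Subtract the sample mean (the model is stated for mean-corrected data)."""
    mean = sum(series) / len(series)
    return [value - mean for value in series]


def slope_through_origin(x, y):
    """Least squares slope of y on x with no intercept."""
    return sum(a * b for a, b in zip(x, y)) / sum(a * a for a in x)


def flow_adjust(z, flow, iterations=20):
    """Fit z_t = v0 * flow_t + n_t with AR(1) noise n_t = phi * n_{t-1} + a_t.

    Assumption for illustration: V(B) is the single weight v0.
    Returns (v0, phi, flow-adjusted series n_hat).
    """
    zc, fc = centred(z), centred(flow)
    v0 = slope_through_origin(fc, zc)  # ordinary regression as a starting value
    phi = 0.0
    for _ in range(iterations):
        noise = [a - v0 * b for a, b in zip(zc, fc)]
        phi = slope_through_origin(noise[:-1], noise[1:])
        # Quasi-difference both series so the transformed errors are white.
        z_star = [zc[t] - phi * zc[t - 1] for t in range(1, len(zc))]
        f_star = [fc[t] - phi * fc[t - 1] for t in range(1, len(fc))]
        v0 = slope_through_origin(f_star, z_star)
    adjusted = [a - v0 * b for a, b in zip(zc, fc)]
    return v0, phi, adjusted


# Synthetic illustration only: true v0 = 0.8, true phi = 0.6.
rng = random.Random(42)
flow = [rng.gauss(0.0, 1.0) for _ in range(3000)]
noise, previous = [], 0.0
for _ in flow:
    previous = 0.6 * previous + rng.gauss(0.0, 1.0)
    noise.append(previous)
z = [0.8 * f + e for f, e in zip(flow, noise)]
v0_hat, phi_hat, adjusted = flow_adjust(z, flow)
print(round(v0_hat, 2), round(phi_hat, 2))
```

The returned `adjusted` list plays the role of the residual series that the text calls the flow-adjusted series. A model with lagged weights or additional covariates would need a proper transfer function fit.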
Note also that the usual transfer function model assumes a linearity of effect of flow on elemental concentration. That is, a particular change in flow rate should cause the same change in concentration, whether the concentration is relatively low or relatively high. If that were not true, a more complex nonlinear model would have to be formulated, or separate models would have to be used for different ranges of flow level. The forms of transfer function in our examples were quite simple; for example, between log TSS and log MWF (see Table B) all the lagged weights are zero. This implies that a simpler proportional regression adjustment for contemporaneous flow rate would be adequate. However, the autocorrelated noise in such a model serves notice that ordinary regression analysis would not have been statistically efficient in making the adjustment. Similar results of a simple proportional adjustment for flow held true also for TP and TKN, where again the noise model was autocorrelated.
ACKNOWLEDGEMENT
We gratefully acknowledge the help given by the Vermont Water Resources Center personnel, Dr. Alan Cassell, Don Meals and most particularly Dr. Jack Clausen. Computer time was provided by the Academic Computing Center of the University of Vermont.
RESIDUALS FROM REGRESSION WITH DEPENDENT ERRORS
R. J. KULPERGER
Department of Statistical and Actuarial Sciences, The University of Western Ontario, London, Ontario, Canada, N6A 5B9

§1. INTRODUCTION
Regression models
$$Y_i = \sum_{\ell=0}^{r} \alpha_\ell f_\ell(Z_i) + X_i \qquad (1.1)$$
are very useful in practice. Here we are interested in the residuals obtained after fitting the parameters. The residuals are given by
$$\hat X_{i,n} = Y_i - \sum_{\ell=0}^{r} \hat\alpha_{\ell,n} f_\ell(Z_i).$$
If $\{X_i\}$ is an independent and identically distributed (i.i.d.) process, MacNeill (1974, 1978) and MacNeill and Jandhyala (1985) have considered some properties of the residual partial sum process.
Recently the case where $X$ is a dependent series, specifically an autoregressive (AR) process, has become of interest (see El-Shaarawi and Esterby (1982) for several such examples). We consider some properties of the residuals in this case for some simple regression cases in section 3.
In section 2 we summarize some results in the AR case with no regression. Section 3 considers the regression case and also some remarks on differencing. Section 4 describes some simulation examples to illustrate some of the results.

§2. AUTOREGRESSIVE RESIDUALS
Kulperger (1985a) considered the model
$$X_i = \sum_{j=1}^{p} \beta_j X_{i-j} + \varepsilon_i$$
where $\varepsilon$ is an i.i.d. process, mean zero and variance $\sigma^2$. The process is assumed to satisfy the invertibility conditions of Box and Jenkins (1976). Observe data $X_i$, $i = -p+1, -p+2, \ldots, n$. Estimate $\beta_1, \ldots, \beta_p$ by $(\hat\beta_{1,n}, \ldots, \hat\beta_{p,n})$, the ordinary least squares estimate, which minimizes
$$\sum_{i=1}^{n} \Big( X_i - \sum_{j=1}^{p} \beta_j X_{i-j} \Big)^2 .$$
The residuals are then defined by
$$\hat\varepsilon_{i,n} = X_i - \sum_{j=1}^{p} \hat\beta_{j,n} X_{i-j}, \qquad i = 1, 2, \ldots, n.$$
Let $s_n^2$ be a consistent estimate of $\sigma^2$, for example $s_n^2 = \frac{1}{n} \sum_{i=1}^{n} \hat\varepsilon_{i,n}^2$. The residual partial sum process is then defined by
$$\hat B_n(t) = \frac{1}{s_n \sqrt{n}} \sum_{i=1}^{[nt]} \hat\varepsilon_{i,n}, \qquad 0 \le t \le 1. \qquad (2.2)$$
Kulperger (1985a) then shows that $\hat B_n$ converges weakly to $B$, standard Brownian motion (see Billingsley (1968) for details on weak convergence and Brownian motion). The weak convergence means that for any continuous function $f$ on $D$,
$$f(\hat B_n) \xrightarrow{D} f(B),$$
where $D$ is the function space in which $\hat B_n$ lives, and $\xrightarrow{D}$ means convergence in distribution. For example
(i) $\sup\{|\hat B_n(t)| : 0 \le t \le 1\} \xrightarrow{D} \sup\{|B(t)| : 0 \le t \le 1\}$, and
(ii) functionals of $\hat B_n$ defined through a nice function $g$ converge in the same sense.
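The construction of the residual partial sum process in (2.2) can be sketched as follows. This is a minimal illustration, not the author's program: it assumes p = 1, Gaussian errors, and a simulated series with a discarded start-up phase, and the names `simulate_ar1`, `ols_ar1` and `residual_partial_sums` are hypothetical.

```python
import math
import random


def simulate_ar1(beta, n, burn_in=100, seed=0):
    """Return X_0, ..., X_n from X_i = beta * X_{i-1} + e_i (assumed N(0,1) errors)."""
    rng = random.Random(seed)
    x, series = 0.0, []
    for step in range(burn_in + n + 1):
        x = beta * x + rng.gauss(0.0, 1.0)
        if step >= burn_in:
            series.append(x)
    return series


def ols_ar1(x):
    """Ordinary least squares estimate of beta, minimizing sum (X_i - beta X_{i-1})^2."""
    numerator = sum(x[i] * x[i - 1] for i in range(1, len(x)))
    denominator = sum(x[i - 1] ** 2 for i in range(1, len(x)))
    return numerator / denominator


def residual_partial_sums(x):
    """Return (beta_hat, path) with path[k] equal to B_n-hat evaluated at t = k/n."""
    beta_hat = ols_ar1(x)
    residuals = [x[i] - beta_hat * x[i - 1] for i in range(1, len(x))]
    n = len(residuals)
    s_n = math.sqrt(sum(e * e for e in residuals) / n)
    path, running = [0.0], 0.0
    for e in residuals:
        running += e
        path.append(running / (s_n * math.sqrt(n)))
    return beta_hat, path


series = simulate_ar1(beta=0.5, n=2000)
beta_hat, path = residual_partial_sums(series)
# Functional (i) of the text: the supremum of |B_n-hat(t)| over 0 <= t <= 1.
sup_abs = max(abs(value) for value in path)
print(round(beta_hat, 2), round(sup_abs, 2))
```

Repeating the simulation over many seeds and collecting `sup_abs` would give an empirical counterpart of the limiting distribution in example (i).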
§3. SOME REGRESSION MODELS WITH AR ERRORS
Work is currently in progress on these types of results. In this section we will present only some more specific results. More details are given in Kulperger (1985b).

3.1 First Order Polynomial
We consider first a special case of section 3.2. Consider the model
$$Y_i = \alpha_0 + \alpha_1 i + X_i$$
where $X_i = \beta X_{i-1} + \varepsilon_i$ is an AR(1) process. The estimates $\hat\alpha_{0n}, \hat\alpha_{1n}$ of $\alpha_0, \alpha_1$ minimize
$$\sum_{i=1}^{n} (Y_i - \alpha_0 - \alpha_1 i)^2 .$$
It can be shown that the normalized estimation errors of $\hat\alpha_{0n}$ and $\hat\alpha_{1n}$ converge weakly, jointly with
$$E_n(t) = \frac{1}{\sigma\sqrt{n}} \sum_{i=1}^{[nt]} \varepsilon_i \Rightarrow B(t), \qquad (3.1)$$
where $\Rightarrow$ means converges weakly.
The AR(1) process is now estimated by
$$\hat X_{i,n} = Y_i - \hat\alpha_{0n} - \hat\alpha_{1n} i,$$
with $\hat\beta_n$ the least squares estimate of $\beta$ computed from the $\hat X_{i,n}$. The residuals are finally defined to be
$$\hat\varepsilon_{i,n} = \hat X_{i,n} - \hat\beta_n \hat X_{i-1,n}, \qquad i = 1, 2, \ldots, n.$$
It easily follows that
$$\hat\varepsilon_{i,n} = \varepsilon_i + (1-\hat\beta_n)(\alpha_0 - \hat\alpha_{0n}) + (\beta - \hat\beta_n) X_{i-1} + (1-\hat\beta_n)(\alpha_1 - \hat\alpha_{1n})\, i + \hat\beta_n (\alpha_1 - \hat\alpha_{1n}). \qquad (3.2)$$
Let
$$\hat B_n(t) = \frac{1}{\sigma\sqrt{n}} \sum_{i=1}^{[nt]} \hat\varepsilon_{i,n}, \qquad 0 \le t \le 1,$$
be the residual partial sum process. Then using (3.1) and (3.2) it now follows that
$$\hat B_n(t) \Rightarrow B(t) + 2\Big(B(1) - 3\int_0^1 B(s)\,ds\Big) t - 3\Big(B(1) - 2\int_0^1 B(s)\,ds\Big) t^2,$$
the same limit process as in the case in which the errors are i.i.d.

3.2 Polynomial Plus Centred Period Component
Consider the model
$$Y_i = \alpha_0 + \alpha_1 i + \alpha_2 f(i) + X_i$$
where $X$ is an AR(p) process. We need the following assumptions on $f$, among them the centring condition $c_1 = 0$ [...]. The assumption $c_1 = 0$ is not such a restriction. Otherwise let $g(i) = f(i) - c_1$, with
$$Y_i = \alpha_0 + \alpha_1 i + \alpha_2 (g(i) + c_1) + X_i = (\alpha_0 + \alpha_2 c_1) + \alpha_1 i + \alpha_2 g(i) + X_i .$$
The AR process is estimated by
$$\hat X_{i,n} = Y_i - \hat\alpha_{0n} - \hat\alpha_{1n} i - \hat\alpha_{2n} f(i), \qquad i = -p+1, \ldots, n.$$
Upon fitting the AR(p) model, the residuals are obtained as
$$\hat\varepsilon_{i,n} = \hat X_{i,n} - \sum_{j=1}^{p} \hat\beta_{j,n} \hat X_{i-j,n} .$$
Then $\hat B_n(t)$ decomposes into $E_n(t)$ plus terms involving the estimation errors $(\beta_j - \hat\beta_{jn})$, $(\alpha_0 - \hat\alpha_{0n})$, $(\alpha_1 - \hat\alpha_{1n})$ and $(\alpha_2 - \hat\alpha_{2n})$, the last of these entering through partial sums of $f(i-j)$. The regression estimates satisfy a joint weak convergence result [...], where $(B, Z_0, Z_1)$ has the joint limit law of $E_n$ and the normalized errors of $\hat\alpha_{0n}$ and $\hat\alpha_{1n}$. Therefore
$$\hat B_n(t) \Rightarrow B(t) + 2\Big(B(1) - 3\int_0^1 B(s)\,ds\Big) t - 3\Big(B(1) - 2\int_0^1 B(s)\,ds\Big) t^2 \equiv G(t).$$
This is the same limit as in section 3.1. If the model is changed to
$$Y_i = \alpha_0 + \alpha_1 i + \alpha_2 f_1(i) + \alpha_3 f_2(i) + X_i$$
where $f_1$ and $f_2$ both satisfy the assumptions at the beginning of this section, the residual partial sum limit process again turns out to be $G(\cdot)$.
3.3 Remarks on Differencing
In processes with trends, analysis is often performed after differencing. Here we consider three simple examples, all with $|\beta| < 1$.
(M1) $X_{i+1} = \beta X_i + \varepsilon_{i+1}$
(M2) $X_{i+1} = \alpha + \beta X_i + \varepsilon_{i+1}$
(M3) $Y_i = \alpha_0 + \alpha_1 i + X_i$, with $X$ satisfying (M1).
Observe data at times $i = -1, 0, 1, \ldots, n$. Upon differencing obtain data $Z_i = X_i - X_{i-1}$ for M1 and M2 and $Z_i = Y_i - Y_{i-1}$ for M3. Then for example in M1 and M2 we have
$$Z_i = \beta Z_{i-1} + \eta_i, \qquad \text{where } \eta_i = \varepsilon_i - \varepsilon_{i-1}.$$
Estimate $\beta$ by ordinary least squares, $\hat\beta_n$. Then
$$\hat\beta_n \to \frac{\beta - 1}{2}$$
for all three cases (see Jandhyala and Kulperger (1985) for some further comments on M1).
The $\eta$ process for M1 and M2 is estimated by
$$\hat\eta_{i,n} = Z_i - \hat\beta_n Z_{i-1},$$
and the $\varepsilon_i$'s are estimated by the residuals
$$\hat\varepsilon_{0,n}(a) = a, \qquad \hat\varepsilon_{i,n}(a) = \hat\eta_{i,n} + \hat\varepsilon_{i-1,n}(a).$$
The sums of the residuals are treated in the following theorem.

Theorem 3.1
(a) For M1,
(i) if $a \ne X_0 - \hat\beta_n X_{-1}$, then $n^{-1} \sum_{i=1}^{[nt]} \hat\varepsilon_{i,n}(a)$ [...], and
(ii) if $a = X_0 - \hat\beta_n X_{-1}$, then $\frac{1}{\sigma\sqrt{n}} \sum_{i=1}^{[nt]} \hat\varepsilon_{i,n}(a)$ [...],
where $B$ is standard Brownian motion.
(b) For M2 [...]

For M3
$$Z_i = Y_i - Y_{i-1} = \alpha_1 + U_i, \qquad \text{where } U_i = X_i - X_{i-1}.$$
Estimate $\alpha_1$ by $\hat\alpha_{1n} = n^{-1} \sum_{i=1}^{n} Z_i$. Estimate the process $U_i$ by $\hat U_{i,n} = Z_i - \hat\alpha_{1n}$. Something different from Theorem 3.1 above occurs.

Theorem 3.2
For M3 [...], where $X_0, X_1$ have the AR(1) distribution and [...] is independent of $X_0, X_1, \ldots, X_n$ and has the same distribution as [...].
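The effect of differencing on the least squares estimate can be checked numerically. For a stationary AR(1) series the lag-one autocorrelation of the differenced series equals (β - 1)/2, which is the value the estimate settles on. The sketch below is an illustration only: it uses symmetric exponential errors and β = -.4, but a much longer series than one would meet in practice so that the limit is visible, and the function names are hypothetical.

```python
import random


def symmetric_exponential(rng):
    """Draw from the density f(x) = exp(-|x|) / 2."""
    magnitude = rng.expovariate(1.0)
    return magnitude if rng.random() < 0.5 else -magnitude


def simulate_ar1(beta, n, burn_in, rng):
    """Simulate X_i = beta * X_{i-1} + e_i and discard the start-up phase."""
    x, series = 0.0, []
    for step in range(burn_in + n):
        x = beta * x + symmetric_exponential(rng)
        if step >= burn_in:
            series.append(x)
    return series


def least_squares_ar1(z):
    """Least squares slope of Z_i on Z_{i-1} (no intercept)."""
    numerator = sum(z[i] * z[i - 1] for i in range(1, len(z)))
    denominator = sum(z[i - 1] ** 2 for i in range(1, len(z)))
    return numerator / denominator


beta = -0.4
series = simulate_ar1(beta, n=20000, burn_in=100, rng=random.Random(7))
differenced = [series[i] - series[i - 1] for i in range(1, len(series))]
beta_hat = least_squares_ar1(differenced)
limit = (beta - 1.0) / 2.0
print(round(beta_hat, 3), limit)
```

With a short record the estimate can sit noticeably away from the limit, so the long series here is a deliberate choice for the demonstration.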
§4. REMARKS
In this section we will just illustrate Theorem 3.1 for model M1. Let $\varepsilon$ be distributed symmetric exponential, that is with density $f(x) = \tfrac{1}{2} e^{-|x|}$. Let $\beta = -.4$ and let $n = 200$. In order to simulate the AR process and to remove the startup phase, the first $N$ points of the AR(1) process are not used. By trial, it seems that dropping the first $N$ points is reasonable; we take $N = 100$. Fitting an AR(1) model after differencing gives $\hat\beta_n = -.79$. Here $(\beta - 1)/2 = -.7$.
The first picture (Figure 1) has asymptotic slope $-1.3687$. The second picture (Figure 2) uses $c_0 = -.3687$ and gives $\frac{1}{\sigma\sqrt{n}} \sum_{i=1}^{[nt]} \hat\varepsilon_{i,n}$. These only illustrate some of the known difficulties in working with non-invertible models. Also, if one is dealing with a process close to these, strange things can happen.

FIGURE 1

FIGURE 2

In many cases the limit of the residual partial sum process for regression with AR errors is the same as that of the i.i.d. errors case. It still remains to be seen if these results are useful in detecting changes in regression over time. However, for heuristic tests based on residual partial sums, these results and those in Kulperger (1985b) can give some null distributions, that is where the null hypothesis is that of no change in regression.

ACKNOWLEDGEMENT
Supported by NSERC grant number A5724.

REFERENCES
Billingsley, P. (1968). Convergence of Probability Measures. Wiley, New York.
Box, G.E.P. and Jenkins, G.M. (1976). Time Series Analysis: Forecasting and Control. Holden-Day, San Francisco.
El-Shaarawi, A. and Esterby, S. (1982). Time Series Methods in Hydrosciences. Developments in Water Science, 17. Elsevier, New York.
Jandhyala, V.K. (1985). Ph.D. Thesis. Department of Statistics, University of Western Ontario, Canada.
Jandhyala, V.K. and Kulperger, R.J. (1985). Estimation of the autoregressive parameters in some non-stationary ARMA(p,1) models.
Kulperger, R.J. (1985a). On the residuals of autoregressive processes and polynomial regression. To appear in Stochastic Processes and Their Applications.
Kulperger, R.J. (1985b). Some remarks on regression with autoregressive errors and their residual processes. Tech. Report, Department of Statistics, University of Western Ontario.
MacNeill, I.B. (1974). Tests for change of parameter at unknown time and distributions of some related functionals on Brownian motion. Ann. Statist., 2, 950-962.
MacNeill, I.B. (1978). Properties of sequences of partial sums of polynomial regression residuals with applications to tests for change of regression at unknown times. Ann. Statist., 6, 422-433.
MacNeill, I.B. and Jandhyala, V.K. (1985). The residual process for non-linear regression. To appear in J. Appl. Prob.
ALTERNATIVES FOR IDENTIFYING STATISTICALLY SIGNIFICANT DIFFERENCES
EDWARD A. McBEAN

INTRODUCTION
The need to discriminate between two or more sets of data is commonplace. Examples where discrimination is needed include the determination of the impact of an implemented remedial technology and the examination of whether a non-point pollutant source is producing a statistically significant impact.
In responding to these types of questions, a number of testing procedures have been utilized. However, in selecting the procedure for use in a particular application, there are no absolute rules, only guidelines. To a large extent, the selection of the best procedure involves careful scrutiny of the characteristics of the problem at hand, and of the assumptions implicit in the particular discrimination technique being considered.
The most frequently used procedure for environmental problems is the t-test. However, there are assumptions implicit to the test which require different approaches in application to a problem. The intent of this paper is to discuss the nature of these assumptions and some of the available alternatives in application to analysis of water quality monitoring data.

BACKGROUND
Mathematically, the testing procedure as presented by Fisher (1925) allows the testing of whether the means from two sets of measurements, say X (where elements of X are $x_i$, $i = 1, 2, \ldots, m$) and Y (where elements of Y are $y_j$, $j = 1, \ldots, n$), are the same. Assuming that X and Y are normally distributed with the same variance but that their population means $\mu_X$ and $\mu_Y$ may be different, then the difference between the sample means $\bar x - \bar y$ will be normally distributed with mean $(\mu_X - \mu_Y)$ and variance $\sigma^2 \left( \frac{1}{m} + \frac{1}{n} \right)$. Then
$$t = \frac{|\bar x - \bar y|}{\sigma \sqrt{\frac{1}{m} + \frac{1}{n}}} \qquad (1)$$
where $|\;|$ denotes the absolute value sign and $\sigma$ represents the standard deviation, will follow a t-distribution with $m+n-2$ degrees of freedom.
Noteworthy points regarding the above include:
(a) the assumption that distributions of X and Y have the same variance is essential to the argument;
(b) the variance $\sigma^2 \left( \frac{1}{m} + \frac{1}{n} \right)$ is normally referred to as the common variance;
(c) the t-test is based on the assumption that this underlying distribution is normal or gaussian.
Unfortunately, one or more of these assumptions is frequently violated in surface water quality monitoring data. As well, numerous other difficulties with the data include:
- the tests are applicable if the observations within, and between, samples can be treated as independent of one another. In many cases, however, this independence may not exist.
- all laboratory analytical techniques have detection limits below which only "less than" values may be reported. The reporting of less than values provides a degree of quantification, but even at their detection limits, the concentration levels of particular contaminants may be of considerable importance because of their potential health hazard. How does one then calculate the necessary statistics for use in Equation (1), or equation modifications thereof?

ALTERNATIVE FORMS
Out of the fundamental developments by Gosset and Fisher, a number of different tests for statistical discrimination have been developed. The different tests include:
(i) the two sample t-test requires that all three assumptions indicated above, (a) through (c), be met;
(ii) modified t-tests have been developed (e.g. Satterthwaite (1946), Behrens (1929), Cochran's Approximation to the Behrens-Fisher Students' t-test (see Cochran (1964))) which relax the stringency of assumptions (a) and (b). As well, the t-test is reasonably insensitive to moderate deviations from normality in the distribution of the data. As an example, the Resource Conservation Recovery Act assumes that a sample with a coefficient of variation less than 1.00 is likely to have a normal distribution (Federal Register, 1982);
(iii) paired sample t-tests are used when the sample populations are not independent, such as occur when successive sampling takes place of the same water samples upstream and downstream of some source.
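TABLE 1
Summary Table of t-Test Statistics, Degrees of Freedom and Assumptions

Two Sample t-Test
  t statistic:         $t = \dfrac{|\bar x - \bar y|}{S \sqrt{\frac{1}{m} + \frac{1}{n}}}$
  Degrees of freedom:  $df = m + n - 2$
  Comments:            Since $\sigma$ is unknown, it is replaced by $S$, the sample standard deviation. The same formulae are used with transformed data, as with untransformed data.

Satterthwaite Approximation to the Two Sample t-Test
  t statistic:         $t = \dfrac{|\bar x - \bar y|}{\sqrt{\frac{S_x^2}{m} + \frac{S_y^2}{n}}}$
  Degrees of freedom:  $df = \dfrac{\left( \frac{S_x^2}{m} + \frac{S_y^2}{n} \right)^2}{\frac{(S_x^2/m)^2}{m-1} + \frac{(S_y^2/n)^2}{n-1}}$
  Comments:            Round 'df' down to the next nearest integer.

Cochran's Approximation to the Behrens-Fisher Test
  t statistic:         $t = \dfrac{|\bar x - \bar y|}{\sqrt{\frac{S_x^2}{m} + \frac{S_y^2}{n}}}$
  Degrees of freedom:  $df_x = m - 1$, $df_y = n - 1$
  Comments:            $t_x$ is taken from t tables with $m-1$ degrees of freedom and $t_y$ from t tables with $n-1$ degrees of freedom. With $W_x = S_x^2/m$ and $W_y = S_y^2/n$, the comparison t-statistic is $t_c = \dfrac{W_x t_x + W_y t_y}{W_x + W_y}$.

Paired t-Test
  t statistic:         $t = \dfrac{|\bar D|}{S_D / \sqrt{m}}$, where $D_i = x_i - y_i$ for $i = 1, \ldots, m$, $\bar D = \frac{1}{m} \sum_{i=1}^{m} D_i$ and $S_D = \sqrt{\dfrac{\sum_{i=1}^{m} (D_i - \bar D)^2}{m-1}}$
  Degrees of freedom:  $df = m - 1$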
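The statistics summarized in Table 1 can be sketched in code as below. This is a minimal illustration working from summary statistics (and from raw pairs for the paired test); the function names are hypothetical, and critical values still have to be read from t tables because the sketch does not compute them.

```python
import math


def two_sample_t(mean_x, mean_y, s, m, n):
    """Two sample t-test with a common standard deviation estimate s."""
    t = abs(mean_x - mean_y) / (s * math.sqrt(1.0 / m + 1.0 / n))
    return t, m + n - 2


def satterthwaite_t(mean_x, var_x, m, mean_y, var_y, n):
    """Satterthwaite statistic and its approximate (unrounded) degrees of freedom."""
    w_x, w_y = var_x / m, var_y / n
    t = abs(mean_x - mean_y) / math.sqrt(w_x + w_y)
    df = (w_x + w_y) ** 2 / (w_x ** 2 / (m - 1) + w_y ** 2 / (n - 1))
    return t, df


def cochran_comparison_t(var_x, m, var_y, n, t_x, t_y):
    """Weighted comparison value; t_x and t_y come from tables (m-1 and n-1 df)."""
    w_x, w_y = var_x / m, var_y / n
    return (w_x * t_x + w_y * t_y) / (w_x + w_y)


def paired_t(x, y):
    """Paired t statistic on the differences D_i = x_i - y_i."""
    d = [a - b for a, b in zip(x, y)]
    m = len(d)
    d_bar = sum(d) / m
    s_d = math.sqrt(sum((v - d_bar) ** 2 for v in d) / (m - 1))
    return abs(d_bar) / (s_d / math.sqrt(m)), m - 1


# Summary statistics as reported for columns II and III of Table 2.
t_star, df_star = satterthwaite_t(1.69, 3.67 ** 2, 10, 4.74, 4.28 ** 2, 10)
print(round(t_star, 2), math.floor(df_star))
```

With those reported means and standard deviations the Satterthwaite statistic comes out near 1.71 with degrees of freedom that round down to 17.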
A summary table of the mathematics implied in some of the resulting tests is included as Table 1. As an example of the difficulties of test selection, the surface water quality monitoring results obtained from measuring both upstream and downstream of a potential nonpoint source are included as columns II and III in Table 2. Some remedial technologies were implemented in October/November 1980 and the water quality monitoring data are as characterized by column V, as measured in 1981/82. Of interest are two questions: (i) Is the source contributing significantly to the river? and (ii) Did the remedial technologies significantly impact the water quality? Each will be briefly addressed.

Statistical Discrimination for Non-Point Loadings - Columns II and III
Using Satterthwaite's Approximation, an examination of the upstream and downstream concentrations finds
  $\bar x = 1.69$, $\bar y = 4.74$
  $m = 10$, $\nu_m = 9$; $n = 10$, $\nu_n = 9$
  $S_x^2/m = 1.35$, $S_y^2/n = 1.83$
  $t^* = 1.71$, $\nu^* = 17.5$, which is then taken as 17.
Finally, for a one-sided test (from standard t tables), $t_c = 1.74$ at the 0.05 level. Since $t^* < t_c$, a statistically significant change has not been identified at the 95% level.
However, a visual inspection of the upstream/downstream data clearly demonstrates that the downstream water quality is at a lesser water quality level. For the type of correlation existing between upstream and downstream points, the pairing of individual observations and then observing only the differences between the observations is appropriate. Once the differences in the pairs are calculated, they are treated as a single random, independent sample. This capability is particularly important for data series possessing seasonality. Therefore, although the paired test has half the degrees of freedom of the two-sample t-test, the paired test does not "see" the cyclical variation which affects both populations and thus does not include it in the calculation of the standard error.
Using the paired test
  $\bar D = 2.95$, $S_D = 1.55$, $t = 6.0$
For 9 degrees of freedom $t_c = 1.83$ for 95%, and a statistically significant impact ($t \ge t_c$) is profoundly demonstrated (as the data very logically indicate that such must be the situation).

Impact of the Remedial Technology
Using Satterthwaite's Approximation in comparing pre- and post-remedial measurements gives $t^* = 2.56$ and $t_c = 1.7$, which indicates the remedial measures have had a statistically significant impact.

TABLE 2
Upstream and Downstream Water Quality Monitoring Records

            Pre-Remedial Records                Post-Remedial Records
I           II              III             IV          V
Date of     Upstream        Downstream      Date of     Downstream
Sampling    Measurements    Measurements    Sampling    Measurements
            (mg/L)          (mg/L)                      (mg/L)
10/79       .29             4.3             10/81       .53
11/79       12              16              11/81       1.5
12/79       .32             6.1             12/81       1.3
1/80        ---             ---             1/82        ---
2/80        .49             2.66            2/82        2.1
3/80        .14             3.0             3/82        1.1
4/80        1.58            4.42            4/82        ---
5/80        1.77            5.74            5/82        1.8
6/80        1.07            1.40            6/82        1.2
7/80        .07             1.49            7/82        .64
8/80        .14             2.3             8/82        1.1
9/80        ---             ---             9/82        ---
Mean        1.69            4.74                        1.25
Standard
Deviation   3.67            4.28                        .50

TABLE 3
Impact of Alternative Equality Assignments

I           II                  III         IV          V
Data        Phenol              Impact of Alternative Equality Assignments
Number      Concentrations      (i)         (ii)        (iii)
            (mg/L)
1           4                   4           4           4
2           46                  46          46          46
3           <1                  1           .5          0
4           <1                  1           .5          0
5           <1                  1           .5          0
6           2                   2           2           2
7           <1                  1           .5          0
8           <1                  1           .5          0
9           3                   3           3           3
10          1                   1           1           1
11          4                   4           4           4
12          <1                  1           .5          0
13          9                   9           9           9
14          8                   8           8           8
15          2                   2           2           2
16          2                   2           2           2
17          7                   7           7           7
mean                            5.5         5.35        5.18
standard deviation              10.8        10.8        10.9
mean (no outlier)               3.0         2.8         2.6
standard deviation*             2.7         2.9         3.0
(no outlier)

Notes:
- * with removal of the outlier
- Location of monitoring data: phenol levels in Schneider Creek, June 1976, by grab sample with a frequency not exceeding one per day.

DETECTION LIMIT DATA ANALYSES
Recognition of the carcinogenic relationships of many constituents (most notably the chlorinated carbons), even at concentrations in the range of detection limits of the instrumentation, has created a difficulty in discrimination analyses. Utilization of the parametric discrimination procedures requires replacement of "less than" detection level data by quantified values.
Frequent practice to allow quantification of the parameter set is to assign less than values as "equal to" either (i) the detection limit, (ii) one-half the detection limit, or (iii) zero. These assignments provide a degree of quantification but may seriously affect subsequent utilization of the parameters used in the t-test.
As an indication of the consequences, consider the example where the 'less than' values are assumed equal to their detection limit ((i) above). In this case, the estimate of the mean is high and the estimate of the standard deviation is low. As an indication of the problem, consider the chemical concentration data reported in Table 3 from phenol concentrations in Schneider Creek in June 1976. The resulting statistical parameter calculations when the less than values are replaced in accord with (i), (ii) and (iii) are as indicated in columns III through V in Table 3.

Figure 1. Probability of Exceedance of Phenol Concentrations (axis: probability of exceedance)

The impacts of assuming (i), (ii) or (iii) include the results that Snedecor's F-test might incorrectly suggest that the variance computed from these data is significantly different from the variance calculated from other data (e.g. post remedial technology). Alternatively, the results could suggest an impact has occurred because of the incorrect estimates of means and variances where, in fact, there is no impact (a false positive).
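The effect of the three assignments on the mean and standard deviation can be recomputed directly from the phenol record of Table 3. The sketch below is illustrative only: `None` is an assumed encoding for a "<1" entry, the detection limit is taken as 1 mg/L as the "<1" entries imply, and the function names are hypothetical.

```python
import statistics

DETECTION_LIMIT = 1.0
# Phenol record in sampling order; None stands for a reported "<1" value.
RECORD = [4, 46, None, None, None, 2, None, None, 3, 1, 4, None, 9, 8, 2, 2, 7]


def substitute(record, fill):
    """Replace every below-detection entry by the chosen fill value."""
    return [fill if value is None else float(value) for value in record]


def summarize(record, fill, drop_largest=False):
    """Mean and sample standard deviation after substitution."""
    data = substitute(record, fill)
    if drop_largest:
        data = sorted(data)[:-1]  # remove the single largest value (the outlier)
    return statistics.mean(data), statistics.stdev(data)


ASSIGNMENTS = {
    "(i) detection limit": DETECTION_LIMIT,
    "(ii) half the detection limit": DETECTION_LIMIT / 2.0,
    "(iii) zero": 0.0,
}
results = {}
for label, fill in ASSIGNMENTS.items():
    results[label] = (summarize(RECORD, fill), summarize(RECORD, fill, drop_largest=True))
    (mean_all, sd_all), (mean_trim, sd_trim) = results[label]
    print(f"{label}: all values {mean_all:.2f} / {sd_all:.1f}, "
          f"no outlier {mean_trim:.1f} / {sd_trim:.1f}")
```

Running the three assignments side by side is also a direct way to carry out a sensitivity analysis on the substitution rule before any t-test is applied.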
In response to these concerns, several procedures are possible; it is potentially best to utilize more than a single procedure and interpret the collective findings. The suggestions include:
- Examine the significance tests by sensitivity analyses, i.e. utilize each of (i), (ii) and (iii) previously described. If all the tests are in agreement, then the assumption utilized is unimportant.
- Fit a probability distribution to the information above the technological limit. To be successful, a reasonable portion of the data must be in excess of the detection limit. Interesting candidate distributions include the normal and lognormal distributions, which tend to be useful in describing many data sets because of the central limit theorem.
- Utilize a non-parametric test such as the Mann-Whitney test instead of the t-test.
If the statistical outlier (the second monitored value) is left in the data set, little difference in the calculated means and standard deviations is apparent from the reported values in the second last row of Table 3. However, if the outlier is removed, the changes associated with assumptions of (i), (ii) or (iii) are
potentially more important.
As an alternative procedure, the ranked data and plotting position using the Weibull plotting formula ($m/(n+1)$, where $m$ = the rank of the sample and $n$ = the number of samples) are contained in Figure 1 on lognormal probability paper. Determining the best-fit line to the resulting data must be carried out with caution: with a brief data set, one outlier (e.g. the value of 46) can substantially bias the resulting line. The resulting statistics to characterize the data are:
- when all values are included (Line A on Figure 1): $\bar x = 1.79$, $S = 10.4$
- when the outlier is not included (Line B on Figure 1): $\bar x = 1.91$, $S = 8.24$
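The plotting-position procedure can be sketched as follows. This is an illustration under stated assumptions and is not claimed to reproduce Line A or Line B: ranks run from the largest value down, below-detection entries (encoded as `None`) occupy the lowest ranks and are left out of the fit, and the straight line on lognormal probability paper is obtained by least squares of log10 concentration on the standard normal quantile. All function names are hypothetical.

```python
import math
from statistics import NormalDist


def weibull_exceedance(n):
    """Plotting positions m/(n+1) for ranks m = 1 (largest) to n (smallest)."""
    return [m / (n + 1) for m in range(1, n + 1)]


def fit_lognormal_line(record):
    """Least squares line log10(value) = intercept + slope * z.

    z is the standard normal quantile of the non-exceedance probability.
    Entries given as None (below detection) keep their ranks but are not fitted.
    """
    n = len(record)
    quantified = sorted((v for v in record if v is not None), reverse=True)
    points = []
    for value, p_exceed in zip(quantified, weibull_exceedance(n)):
        z = NormalDist().inv_cdf(1.0 - p_exceed)
        points.append((z, math.log10(value)))
    z_mean = sum(z for z, _ in points) / len(points)
    y_mean = sum(y for _, y in points) / len(points)
    slope = (sum((z - z_mean) * (y - y_mean) for z, y in points)
             / sum((z - z_mean) ** 2 for z, _ in points))
    return y_mean - slope * z_mean, slope


RECORD = [4, 46, None, None, None, 2, None, None, 3, 1, 4, None, 9, 8, 2, 2, 7]
line_all = fit_lognormal_line(RECORD)
line_no_outlier = fit_lognormal_line([v for v in RECORD if v != 46])
print(line_all, line_no_outlier)
```

The intercept estimates the mean of the log10 concentrations and the slope their standard deviation, so comparing the two fitted lines shows how strongly the single value of 46 pulls the fit.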
As the third procedure, the non-parametric procedures such as the Mann-Whitney test do not require the assignment of means and standard deviations and therefore avoid the problem to some extent. On the negative side, non-parametric tests typically do not have the same discrimination capability and so may not be as effective in application.

CONCLUSIONS
To make inferences about the means of small samples, the t-distribution, which describes the distribution of the means of small samples from a normally distributed population, is frequently chosen as the reference. Whether this is valid or not depends somewhat upon the purpose of the testing and how the test is applied. The concept of "statistical significance" must be reflected in a number of aspects of the monitoring program, involving not just the choice of the level of significance but also the choice of the test and the requirements of the number of samples.

REFERENCES
Behrens, W.V., Landwirtsch. Jahrb., 68, 1929, 807.
Cochran, W., "Approximate Significance Levels of the Behrens-Fisher Test", Biometrics, March 1964, p. 191.
Federal Register, Rules and Regulations, Vol. 47, No. 143, July 26, 1982.
Satterthwaite, F.E., Biometrics Bulletin, 2, 1946, 110.
Snedecor, G.W., and Cochran, W.G., Statistical Methods, The Iowa State University Press, 6th Edition, 1967.
GLOBAL VARIANCE AND ROOT MEAN SQUARE ERROR ASSOCIATED WITH LINEAR INTERPOLATION OF A MARKOVIAN TIME-SERIES
D.A. CLUIS
INRS-Eau, Université du Québec, C.P. 7500, Sainte-Foy (Québec), Canada G1V 4C7

ABSTRACT
Most general-purpose data acquisition networks provide equispaced instantaneous information; the frequency of measurements necessary to obtain this information efficiently is related to the intrinsic temporal variability of the given phenomenon. Thus, meteorological and hydrometric phenomena are sampled more intensively than are more stable groundwater variates.
Once the data have been acquired, the estimation of values which should have been taken by a time-series at higher frequency or of the combination of two or more time-series measured at different frequencies is a frequent but unsolved problem. In the field of water quality monitoring, for example, the estimation of mass-discharges is a prerequisite for the interpretation of transport phenomena, source-effects relationships and trend detection. To evaluate this important secondary variate, one must combine high frequency/high variability flow data with low frequency/low variability concentration data; this can be done by using some combination of aggregation and interpolation of data. The aggregation of high frequency data has relatively minor effects, e.g. a reduction of the variance and a modification of the persistence structure in the transformed data. However, the spreading of the information resulting from linear interpolation creates a certain level of heteroscedasticity and also produces an error of estimation, the variance of which increases with the number of partitions.
For phenomena exhibiting short-term positive Markovian persistence, tical
expression f o r t h e global
established sel f - s i m i l a r
for
t h e analy-
v a r i a n c e o f t h e e s t i m a t i o n e r r o r was f i r s t
skipped s e r i e s d e r i v e d from actual
measurements:
p e r s i s t e n c e s t r u c t u r e o f t h e Markovian processes,
using the
t h e Root Mean
Square E r r o r (RMSE) o f t h e i n t e r p o l a t e d t i m e - s e r i e s was t h e n deduced.
Thus, a
336 criterion
r e l a t i n g the
short-term
persistence
to
t h e number o f
partitions
a l l o w s t o c o n t r o l and l i m i t t h e l e v e l o f e r r o r i n t h e t r a n s f o r m e d t i m e - s e r i e s . INTRODUCTION Sometimes one wishes t o combine two o r more g e o p h y s i c a l t i m e - s e r i e s which a r e sampled s y s t e m a t i c a l l y a t t i m e i n t e r v a l s whose t i m e s t e p s may o r may n o t be i n t e g r a l m u l t i p l e s o f each o t h e r .
Consideration of only simultaneously sampled events of the series would result in a major loss of information; this is especially true if the basic data consist of the equispaced instantaneous information provided by most general-purpose data acquisition networks; in this case the frequency of measurements necessary to obtain this information efficiently is related to the intrinsic temporal variability of the given phenomenon and could differ by some orders of magnitude. Thus, meteorological and hydrometric phenomena are sampled more intensively than are more stable groundwater variates.
In the field of water quality monitoring, for example, the estimation of mass-discharges is required for the interpretation of transport phenomena, source-effect relationships and trend detection. To evaluate this important secondary variate, one has to combine high frequency/high variability flow data with low frequency/low variability concentration data. In the Province of Quebec, water levels leading to published compounded mean daily discharges are recorded every 15 minutes, while the monitoring network for running-water quality parameters provides samples regularly every 3-4 weeks. An estimation of the mass-discharges, or the fluxes, of pollutants can be made by using some combination of aggregation and interpolation, but it is important to evaluate how much noise is added by this manipulation of the data. Knowledge of the level of this noise can be of extreme value in the field of water-quality modeling where "measured" mass-discharges are used for calibration purposes. Disregarding the inaccuracy of the reference values could lead to futile attempts at further fine-tuning of a model when the measured data are in fact already fully exploited.
The definition of a suitable time-interval for calculation of a time-series combination variate, as well as the estimation of the errors involved, constitutes a frequent but unsolved problem. The aggregation of high frequency data does have structural consequences, e.g. a reduction of the variance and a modification of the persistence structure in the transformed data; but it does not introduce external information into the transformed time-series. However, the spreading of the information resulting from linear interpolation creates a certain level of heteroscedasticity, modifies the persistence structure, and also induces an error of estimation, the variance of which increases with the number of partitions. We deal with this problem in this paper.
Hypothesis
The sampled time-series $Z_i$ is taken to follow a classical additive scheme:
$$Z_i = T_i + S_i + X_i \qquad (1)$$
where $T_i$ and $S_i$ represent the trend and seasonal components of the studied phenomenon, $X_i$ the short-term temporal fluctuations of interest and $i$ an index of the time of occurrence. In this paper, we assume that a sufficiently long record of data is available to accurately identify and remove ab initio the long term trend $T$ and the seasonal variations $S$. The following developments deal essentially with the stationary innovation component $X$. The statistical model used for this component is consistent with the short term behaviour of numerous geophysical time-series: the sampled process behaves as identically distributed and correlated random variables, having the same zero mean and variance $\sigma^2$; its autocorrelation structure is only a function of the time separation between the concerned sampled data; furthermore, an exponential decay of the autocorrelogram is assumed according to the relation:
$$r_k = r_1^{k}$$
where $r_1$ is the lag-one autocorrelation coefficient corresponding to a unit sampling interval. In this development, we are specially interested in the practical case of a strongly positive dependence ($r_1 > 0$). As we are dealing here with intrinsic properties and in order to avoid the use of multiple notations, we will not attempt to formally distinguish the statistics of the process from those of their estimates in the case of large samples with neglected end effects.
LINEAR INTERPOLATION
Iterative linear interpolation (ILI) is one of the most common and practical estimation techniques used to generate values at a frequency higher than that of the measured hydrological variate. It is clear that no new information is created by interpolating linearly; however, the original content is spread in time. When only two data points, located at the ends of the time-interval which is studied retrospectively, are considered at a particular time, ILI cannot be considered as an optimal estimator. However, depending on the variability of the process under study, ILI is an estimator more powerful for the simulation of intermediate values than that obtained by a mathematical expectation which uses a general or a seasonal mean value. Numerical approximation theory provides refined means, including interpolating polynomials and spline functions, for obtaining more powerful estimators which take advantage of the short term temporal trends in the measured data. Nevertheless, due to its simplicity, ILI remains one of the most common and practical estimation techniques used in hydrology to generate values at frequencies higher than those of the measured variate.
Consider the data series $X_i$ ($i = 1 \ldots k$) of length $k$. Each interval between consecutive values of $X_i$ is then divided into $p$ equal intervals, giving the interpolated series $Y_j$ ($j = 1 \ldots N$) of length $N = (k-1)p + 1$. Each term $Y_j$ is defined as a linear interpolate of the series $X_i$ by the following equation:
$$Y_j = Y_{(i-1)p+p'} = \frac{(p-p'+1)\,X_i + (p'-1)\,X_{i+1}}{p} \qquad (2)$$
where
$i$ is an integer varying from 1 to $k$;
$p'$ is an integer varying from 1 to $p$, except for $i = k$ where $p' = 1$;
$p$ is a fixed integer defining the subdivision level used in the interpolation.
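As an illustration of equation [2], the following Python sketch builds the interpolated series from a list of measured values. The function name `linear_interpolate` and the loop variable `pp` (standing for p') are assumptions introduced here for illustration only; they do not come from the paper.

```python
def linear_interpolate(x, p):
    """Divide each interval of x into p equal parts by linear interpolation.

    Hypothetical helper following equation [2]: for each pair of
    consecutive values and each pp = 1..p, the interpolated value is
    ((p - pp + 1) * x[i] + (pp - 1) * x[i + 1]) / p.
    """
    if p < 1:
        raise ValueError("p must be a positive integer")
    k = len(x)
    y = []
    for i in range(k - 1):
        for pp in range(1, p + 1):  # pp plays the role of p'
            y.append(((p - pp + 1) * x[i] + (pp - 1) * x[i + 1]) / p)
    if k:
        y.append(float(x[-1]))  # last data point: i = k, p' = 1
    return y
```

For example, `linear_interpolate([0, 2, 4], 2)` returns `[0.0, 1.0, 2.0, 3.0, 4.0]`, a series of length N = (k-1)p + 1 = 5.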
Mean values
By definition $\bar{X} = \frac{1}{k}\sum_{i=1}^{k} X_i$ and $\bar{Y} = \frac{1}{(k-1)p+1}\sum_{j=1}^{N} Y_j$; replacing $Y_j$ by its definition equation [2] and performing the integral summations, one obtains:
$$\bar{Y} = \frac{2pk\,\bar{X} - (p-1)(X_1 + X_k)}{2[(k-1)p + 1]}$$
If unknown end-values $X_1$ and $X_k$ are estimated by the mean value $\bar{X}$ or if the length of the generating series $X_i$ is large, then $\bar{Y}$ becomes an asymptotically unbiased estimate of $\bar{X}$. Making use of this property and in order to keep the mathematical derivations as concise as possible, the developments have been limited to the case of large and centered stationary data series $X_i$. Given this large-sample restriction, $\bar{X} = \bar{Y} = 0$ and the influence of the end-values becomes negligible.
So ILI preserves the first-order stationarity of the series; however the second-order stationarity is not retained, stricto sensu, since it will be shown that the variance depends on the location $p'$ of the interpolated point. To circumvent this problem we have retained a broader definition for stationarity given by PANKRATZ (1983, p. 16) which states that: "if a data-series is stationary, then the variance of any major subset will differ from the variance of any other subset only by chance". To keep track of this restriction, we call "global" the variance obtained by summations over integral numbers of partitions.
GLOBAL VARIANCE
Noting $\sigma^2$ and $r_1$ the variance and lag 1 autocorrelation coefficient of the data series $X_i$, an unbiased expression of the variance $S^2$ of the series $Y_j$ may be written as:
$$S^2 = \frac{1}{N-1}\sum_{j=1}^{N} (Y_j - \bar{Y})^2$$
Applying the variance operator to the running point $Y_{(p,p')} = Y_{(i-1)p+p'}$ of the interpolated series expressed by equation [2], one gets:
$$S^2_{(p,p')} = \frac{\sigma^2}{p^2}\left[(p-p'+1)^2 + (p'-1)^2 + 2\,r_1\,(p-p'+1)(p'-1)\right]$$
This equation shows that the interpolated series $Y_j$ is not homoscedastic, i.e. the variance is not independent of the location $p'$ of the interpolated point. The minimal variance occurs at the middle of the interpolated segment; however in the case of series with positive persistence, this non-stationarity in the variance is not very severe: in the case of high positive values of $r_1$, this effect can be neglected. For an integral number of partitions, the series $Y_j$ considered as a whole possesses a finite and stable global variance which can be obtained by the summation of $p'$ between 1 and $p$; neglecting the end-effects, one obtains:
$$S^2 = \frac{\sigma^2}{3p^2}\left[(2p^2+1) + r_1\,(p^2-1)\right]$$
The asymptotic global variance of an interpolated series is always reduced with regard to the variance of the data series. The influence of the lag-one autocorrelation coefficient $r_1$ is generally larger than that of the partition level $p$ of $X_i$ and is displayed in Table 1.
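Table 1: Relative global variance $S^2/\sigma^2$ of the interpolated series as a function of the partition level $p$ (rows) and of $r_1$ (columns).
p \ r1    0      0.1    0.2    0.3    0.4    0.5    0.6    0.7    0.8    0.9
2         0.750  0.775  0.800  0.825  0.850  0.875  0.900  0.925  0.950  0.975
5         0.680  0.712  0.744  0.776  0.808  0.840  0.872  0.904  0.936  0.968
10        0.670  0.703  0.736  0.769  0.802  0.835  0.868  0.901  0.934  0.967
∞         0.667  0.700  0.733  0.767  0.800  0.833  0.867  0.900  0.933  0.967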
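The global variance can be checked numerically. The sketch below is a minimal illustration under stated assumptions: the local variance at position p' is taken to be $\sigma^2[(p-p'+1)^2 + (p'-1)^2 + 2r_1(p-p'+1)(p'-1)]/p^2$ (the variance operator applied to equation [2], with correlation $r_1$ between neighbouring data points), and the global value is its average over p' = 1..p. The function names are hypothetical.

```python
def relative_local_variance(p, pp, r1):
    """Variance of the interpolated point at position pp (p'), divided by sigma^2."""
    a = p - pp + 1  # weight carried by X_i (times p)
    b = pp - 1      # weight carried by X_{i+1} (times p)
    return (a * a + b * b + 2 * r1 * a * b) / p**2


def averaged_local_variance(p, r1):
    """Average of the local variances over one full partition (pp = 1..p)."""
    return sum(relative_local_variance(p, pp, r1) for pp in range(1, p + 1)) / p


def relative_global_variance(p, r1):
    """Closed form of the same average: [(2p^2 + 1) + r1 (p^2 - 1)] / (3 p^2)."""
    return ((2 * p**2 + 1) + r1 * (p**2 - 1)) / (3 * p**2)
```

For instance, `relative_global_variance(10, 0.5)` evaluates to 0.835, in agreement with the corresponding entry of Table 1, and the closed form tends to (2 + r1)/3 as p grows.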
ERROR OF ESTIMATION INDUCED BY ILI
The expected Root Mean Square Error (RMSE) resulting from a linear interpolation with $p$ partitions will be analytically developed. To achieve this, we will consider the errors involved with the $p$ possible realizations of the $p$-skipped measured data. We will then make use of the self-similar persistence structure of Markovian processes to deduce the RMSE of the interpolated series of subdivision level $p$.
Case where p = 2
Starting with the original series $X_i$ measured at unit time intervals, we consider the two possible realizations $X'_i$ and $X''_i$ obtained with the series $X_i$ skipped every two time units and then interpolated in-between:
$$X'_i = X_1,\; X'_2,\; X_3,\; X'_4,\; X_5,\; X'_6,\; \ldots$$
$$X''_i = X''_1,\; X_2,\; X''_3,\; X_4,\; X''_5,\; X_6,\; \ldots$$
The estimated values $X'_i$ or $X''_i$ are linear interpolates of type $(X_{i-1} + X_{i+1})/2$. The two series of estimation errors $X - X'$ and $X - X''$ consist of zeros alternating with generic terms $X_i - ([X_{i-1} + X_{i+1}]/2)$. Neglecting the end effects, the expected variance of the estimation error for the two series (total length $2N$) is:
$$E^2 = \frac{1}{2N}\sum_{i=1}^{N}\left[X_i - \frac{X_{i-1} + X_{i+1}}{2}\right]^2$$
Introducing the variance and covariances of the sampled Markovian series $X_i$, one gets:
$$\frac{E^2}{\sigma^2} = \frac{3}{4} - r_1 + \frac{1}{4}\,r_2 = \frac{3}{4} - r_1 + \frac{1}{4}\,r_1^2 \qquad (4)$$
This expression relates the variance of the estimation error to the variance and the persistence structure of the sampled data.
The preceding result applies to an interpolation of subdivision level 2 built on a measured series of unit time step. If one wishes now to interpolate a series to a time step half of that of the measured data, the estimation error cannot be known because the intermediate times $\frac{1}{2}, \frac{3}{2}, \frac{5}{2}, \ldots$ were not sampled, but its variance can be estimated because of the self-similar structure of the Markovian process: one has only to replace the lag-one autocorrelation coefficient $r_1$ of the non-sampled series (time step $\frac{1}{2}$) by $r_1^{1/2}$; this, because in a Markovian world, $r_1$ is the lag-two autocorrelation coefficient of this series.
Thus, in this particular case, for a Markovian process of mean zero and of variance $\sigma^2$ sampled at unit times, the global variance of the interpolated series at times $\frac{1}{2}$ is
$$S^2 = \sigma^2\left[\frac{3}{4} + \frac{r_1}{4}\right]$$
and the expected variance of the error of estimation is
$$E^2 = \sigma^2\left[\frac{3}{4} - \sqrt{r_1} + \frac{r_1}{4}\right]$$
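The step from the generic error term to equation (4) can be written out explicitly. The following is a sketch of the intermediate algebra, assuming a centered stationary series of variance $\sigma^2$ with autocorrelations $r_1$ and $r_2$ at lags one and two:

```latex
\begin{align*}
\operatorname{Var}\!\left[X_i - \tfrac{1}{2}\,(X_{i-1} + X_{i+1})\right]
  &= \sigma^2\left[1 + \tfrac{1}{4} + \tfrac{1}{4} + \tfrac{1}{2}\,r_2 - r_1 - r_1\right]
   = \sigma^2\left[\tfrac{3}{2} - 2\,r_1 + \tfrac{1}{2}\,r_2\right], \\
E^2
  &= \tfrac{1}{2}\,\sigma^2\left[\tfrac{3}{2} - 2\,r_1 + \tfrac{1}{2}\,r_2\right]
   = \sigma^2\left[\tfrac{3}{4} - r_1 + \tfrac{1}{4}\,r_2\right].
\end{align*}
```

The factor 1/2 arises because only every second term of the two error series is non-zero. With $r_2 = r_1^2$ for a Markovian process, this gives the second form of equation (4).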
Generalization for any level of partitions p
Following the previously described pattern, we consider the $p$ possible realisations of the $p$-skipped series where the intermediate values have been linearly interpolated:
$$X' = X_1,\; X'_2,\; X'_3,\; \ldots,\; X_{p+1},\; X'_{p+2},\; X'_{p+3},\; \ldots,\; X_{2p+1},\; \ldots$$
$$\vdots$$
$$X^{(p)} = X^{(p)}_1,\; \ldots,\; X_p,\; X^{(p)}_{p+1},\; \ldots,\; X_{2p},\; \ldots$$
The estimated values $X', X'', \ldots, X^{(p)}$ are linear interpolates of type:
$$X^{(j)}_{kp+j+p'-1} = \frac{(p-p'+1)\,X_{kp+j} + (p'-1)\,X_{(k+1)p+j}}{p}$$
In this expression, the index $j$ varying from 1 to $p$ represents the rows, the index $p'$ varying from 1 to $p$ represents the columns and $k$ is an integer, from 0 to $N/p$, identifying the partition.
If one considers the set of the $p$ series of interpolation errors:
$$(X - X'),\; (X - X''),\; \ldots,\; (X - X^{(p)})$$
its generic term is:
$$X_{kp+j+p'-1} - \frac{(p-p'+1)\,X_{kp+j} + (p'-1)\,X_{(k+1)p+j}}{p}$$
and the variance of the estimation error for the $p$ series (total length $pN$) can be expressed, neglecting the end-effects, by 3 terms:
$$E^2 = \frac{1}{pN}\left[\sum X^2 + \sum \left(X^{(j)}\right)^2 - 2\sum X\,X^{(j)}\right]$$
The two first summations represent the variances of the original series and of the $p$ interpolated series:
$$\sigma^2 \quad\text{and}\quad \frac{\sigma^2}{3p^2}\left[(2p^2+1) + r_p\,(p^2-1)\right]$$
The third summation gives:
$$\sigma^2\left[\frac{2}{p} - \frac{4}{p^2}\sum_{p'=1}^{p} (p-p'+1)\,r_{p'-1}\right]$$
If the persistence structure of the process is Markovian, then:
$$\sum_{p'=1}^{p} (p-p'+1)\,r_1^{\,p'-1} = \frac{p}{1-r_1} - \frac{r_1\,(1-r_1^{\,p})}{(1-r_1)^2}$$
Thus the variance of the estimation error of a $p$-skipped series whose intermediate values have been interpolated is:
$$\frac{E^2}{\sigma^2} = 1 + \frac{(2p^2+1) + r_1^{\,p}\,(p^2-1)}{3p^2} + \frac{2}{p} - \frac{4}{p^2}\left[\frac{p}{1-r_1} - \frac{r_1\,(1-r_1^{\,p})}{(1-r_1)^2}\right] \qquad (6)$$
Applying now the same rationale as with the case where $p = 2$, we take advantage of the self-similar structure of Markovian processes to estimate the expected variance of the estimation error for a series sampled at unit time and interpolated at intermediate times $\frac{1}{p}, \frac{2}{p}, \frac{3}{p}, \ldots$ To achieve that, one has only to replace in equation [6] $r_1$ by $r_1^{1/p}$ because in this series, $r_1$ constitutes the lag $p$ autocorrelation coefficient:
$$\frac{E^2}{\sigma^2} = 1 + \frac{(2p^2+1) + r_1\,(p^2-1)}{3p^2} + \frac{2}{p} - \frac{4}{p^2}\left[\frac{p}{1-r_1^{1/p}} - \frac{r_1^{1/p}\,(1-r_1)}{(1-r_1^{1/p})^2}\right] \qquad (7)$$
This equation determines the expected RMSE of interpolation, relative to the variability of the process $\sigma$, as a function of the lag-one autocorrelation coefficient $r_1$ and of the level of partition $p$. It takes fully into account the heteroscedasticity of the variance of the interpolated series and the terms of covariance introduced in the errors of estimation by this particularity.
As $p$ increases, equation [7] can be expanded according to l'Hôpital's rule:
$$\lim_{p \to \infty} p\,\left(1 - r_1^{1/p}\right) = -\ln r_1$$
so the asymptotic expansion of equation [7] becomes, for very large values of $p$:
$$E^2 = \sigma^2\left[\frac{5+r_1}{3} + \frac{4}{\ln r_1} + \frac{4\,(1-r_1)}{\ln^2 r_1}\right]$$
Table 2 illustrates the relative RMSE of estimation $E/\sigma$ resulting from ILI with $p$ partitions built on a Markovian series of parameter $r_1$. Figure 1 shows that the significant factor is the persistence parameter $r_1$, that for higher values of $r_1$ very little noise variance is introduced by ILI, and that an increase in the level of partition $p$ is not a significant factor. For lower values of $r_1$, the interpolation process, like any estimation technique, is less efficient and for $r_1 = 0$ (the case of independent sampled data) it can even lead to a larger RMSE of estimation than the "global" estimator where intermediate unsampled data points are arbitrarily estimated with the general mean and whose expected relative RMSE of estimation is $E'/\sigma = \left[\frac{p-1}{p}\right]^{1/2}$.
Table 2: Expected relative RMSE of estimation resulting from ILI with $p$ partitions built on a Markovian series of parameter $r_1$.
p \ r1    0      0.1    0.2    0.3    0.4    0.5    0.6    0.7    0.8    0.9
2         0.866  0.677  0.594  0.527  0.466  0.410  0.354  0.297  0.236  0.162
5         1.13   0.782  0.680  0.600  0.530  0.465  0.401  0.337  0.267  0.184
10        1.21   0.796  0.691  0.610  0.539  0.472  0.408  0.342  0.271  0.186
∞         1.29   0.801  0.695  0.613  0.541  0.475  0.410  0.344  0.272  0.187
Figure 1: Relative RMSE as a function of $r_1$ and $p$.
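The entries of Table 2 can be reproduced with a short Python sketch. It assumes that equation [7] is equation [6] with $r_1$ replaced by $r_1^{1/p}$ in the covariance term, that the first two terms are the unit variance of the original series and the relative global variance of the interpolated series, and that the large-$p$ limit is $(5+r_1)/3 + 4/\ln r_1 + 4(1-r_1)/\ln^2 r_1$. The function names are hypothetical and the formulas are valid for $0 \le r_1 < 1$ only.

```python
import math


def relative_rmse(p, r1):
    """Relative RMSE E/sigma of ILI with p partitions (assumed form of equation [7])."""
    if p < 2 or not 0.0 <= r1 < 1.0:
        raise ValueError("requires p >= 2 and 0 <= r1 < 1")
    a = r1 ** (1.0 / p)  # lag-one autocorrelation at the interpolation time step
    interpolated_var = ((2 * p**2 + 1) + r1 * (p**2 - 1)) / (3 * p**2)
    # Sum over p' of (p - p' + 1) * a**(p' - 1), in closed form.
    weighted_sum = p / (1 - a) - a * (1 - r1) / (1 - a) ** 2
    e2 = 1 + interpolated_var + 2 / p - 4 * weighted_sum / p**2
    return math.sqrt(e2)


def relative_rmse_limit(r1):
    """Limit of E/sigma for a very large number of partitions."""
    if not 0.0 <= r1 < 1.0:
        raise ValueError("requires 0 <= r1 < 1")
    if r1 == 0.0:
        return math.sqrt(5.0 / 3.0)  # the logarithmic terms vanish as r1 -> 0
    log_r1 = math.log(r1)
    return math.sqrt((5 + r1) / 3 + 4 / log_r1 + 4 * (1 - r1) / log_r1**2)
```

To three decimals, `relative_rmse(2, 0.5)` gives 0.410 and `relative_rmse_limit(0.5)` gives 0.475, matching the corresponding entries of Table 2.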
The analytical results obtained for the variance and expected variance of the errors induced by the linear interpolation of a Markovian series $X_i$ ($0$; $\sigma^2$; $r_k = r_1^{|k|}$) are given in Table 3.
Table 3: Synthesis of the analytical results obtained with the interpolation of a series $X_i$ [$0$; $\sigma^2$; $r_k = r_1^{k}$]. Columns: relative variance of interpolation $S^2/\sigma^2$; expected relative variance of the error $E^2/\sigma^2$. Recoverable entries for $p = 2$ (relative variance of interpolation): local value at $p' = 1$: $1$; local value at $p' = 2$: $(1 + r_1)/2$; global value: $(3 + r_1)/4$.
Occasional missing values
The preceding development allows the evaluation of the influence of the use of linear interpolation to fill in occasional missing values. Consider a time-series of length $N$ with $n$ non-consecutive missing values; the expected RMSE for a series of length $\frac{N}{2}$ interpolated once in-between would be:
$$\frac{E}{\sigma} = \left[\frac{3}{2} - 2\sqrt{r_1} + \frac{r_1}{2}\right]^{1/2}$$
Thus for only $n$ re-created data points one obtains:
$$\frac{E}{\sigma} = \left[\frac{n}{N}\left(\frac{3}{2} - 2\sqrt{r_1} + \frac{r_1}{2}\right)\right]^{1/2}$$
This equation gives a quantitative control over the variance of the error globally introduced into the time-series.
Application to mass-discharge calculations
In the interpretation of chemical transport phenomena in rivers, the mass-discharge estimation constitutes an essential preliminary step. If $c(t)$ and $q(t)$ are the continuous processes representing the variations with time of the concentrations and of the discharges, then the product $c(t)\,q(t)$ represents the flux of matter at time $t$ and $\int_{t_1}^{t_2} c(t)\,q(t)\,dt$ the mass discharges exported between time $t_1$ and $t_2$.
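The integral of the flux can be approximated numerically once concentration and discharge are available at common sampling times. The sketch below is a minimal illustration using the trapezoidal rule; the function name `mass_discharge` is hypothetical, and it assumes that the two series have already been synchronized (for instance by aggregation and interpolation) and that the units are consistent.

```python
def mass_discharge(times, conc, flow):
    """Trapezoidal approximation of the integral of c(t) * q(t) over the record.

    times, conc and flow are sequences of equal length giving the sampling
    times, the concentrations and the discharges at those times.
    """
    if not (len(times) == len(conc) == len(flow)):
        raise ValueError("times, conc and flow must have the same length")
    flux = [c * q for c, q in zip(conc, flow)]  # instantaneous flux c(t) q(t)
    total = 0.0
    for i in range(1, len(times)):
        dt = times[i] - times[i - 1]
        total += 0.5 * (flux[i - 1] + flux[i]) * dt
    return total
```

With a constant concentration of 2 and discharges 1, 3 and 5 at times 0, 1 and 2, the fluxes are 2, 6 and 10 and the function returns 12.0.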
If simultaneous and equispaced time-series are available, then the mass-discharges are evaluated by discrete summations of type $\sum_{i=1}^{N} c_i\,q_i\,\Delta t$ according to the trapezoidal rule of numerical integration. If sampling frequencies are not the same, data manipulations such as aggregation and interpolation are necessary to generate synchronous equispaced data.
As numerous combinations of such manipulations are possible to obtain a common frequency for the transformed data, some considerations are expressed about the consequences of these data manipulations:
If the transformed values of the processes $c(t)$ or $q(t)$ are to be integrated in time according to the trapezoidal integration rule, then the errors introduced by linear interpolation to increase the frequency, or by the elimination of intermediary values to reduce the frequency, have a similar mathematical formulation; the only difference originates from the number of original data points and from the values of the autocorrelation coefficient associated with this time interval. This should be the basis for maximizing the accuracy of mass-discharges; in this case the variance of the error associated with the calculation is even more complex, because one should also take into account the covariances and the possible functional relationship between $c(t)$ and $q(t)$.
CONCLUSION
Within the context of large Markovian samples, the global variance and the expected RMSE of estimation resulting from iterative linear interpolation (ILI) have been derived. The results show that the level of error introduced by ILI is more sensitive to the persistence parameter $r_1$ than to the level of realized partition $p$.
From a practical point of view, the preceding developments allow the user to control the amount of noise variance that he is willing to introduce into the recorded signal in order to make use of the sampled information at a frequency higher than that of the measurements.
REFERENCE
PANKRATZ, A. (1983). Forecasting with univariate Box-Jenkins models. 309 p. J. Wiley and Sons.
EMPIRICAL POWER COMPARISONS OF SOME TESTS FOR TREND
K.W. HIPEL¹, A.I. McLEOD² and P.K. FOSU³
¹ Professor, Department of Systems Design Engineering, University of Waterloo, Waterloo, Ontario.
² Associate Professor, Department of Statistical and Actuarial Sciences, University of Western Ontario, London, Ontario.
³ PhD Candidate, Department of Statistical and Actuarial Sciences, University of Western Ontario, London, Ontario.
ABSTRACT
Using Monte Carlo studies, the powers of Kendall's tau and the lag-one serial correlation are compared for detecting trends in time series. Simulation experiments demonstrate that tests based on Kendall's tau are more powerful than serial correlation tests for discovering deterministic trends. On the other hand, the lag-one serial correlation is more powerful when only purely stochastic trends are present.
1. INTRODUCTION
An important consideration in environmental impact assessment is whether a set of data is random or whether a systematic trend is present. There have been several investigations (both theoretical and empirical) of test statistics to be used to test for randomness. An overwhelming majority of the test statistics currently used are nonparametric. Kendall and Stuart (1979) and Kendall et al. (1983) developed and employed statistics which include the turning point test, the sign test, Kendall's tau denoted by $\tau$, and the rank correlation coefficient. Authors such as Dietz and Killeen (1981), van Belle and Hughes (1984), Hirsch et al. (1982), Hirsch and Slack (1984) and Simon (1977) considered modified versions of Kendall's tau statistic. Cox (1966) investigated the empirical distribution of the lag-one serial correlation, $r_1$. Bartels (1982) compared the rank von Neumann statistic (RVN) to the runs test and the von Neumann statistic (VN). In that paper, the asymptotic relative efficiency (ARE) of RVN to VN was established as having a lower bound of 0.89. Knoke (1975, 1977, 1979) investigated the distribution of the serial correlation at different lags and how they could be employed in tests of randomness. For example, in the 1977 paper, Knoke compared some nonparametric tests to $r_1$ and established the AREs of the rank serial correlation test and the turning point test (for normal first order autocorrelation alternatives) to be 0.91 and 0.19 with respect to $r_1$, respectively. Kendall et al. (1983) derived the ARE of $\tau$ relative to the regression estimator to be 0.89.
The main purpose of this paper is to rigorously compare the most promising tests for trend and autocorrelation. In particular, using simulation, the powers of Kendall's tau and $r_1$ are evaluated using various alternative models (the power of a test against an alternative hypothesis is the probability of rejecting the null hypothesis when the alternative hypothesis is true). Following the definition of the two test statistics, the alternative models to the null hypothesis are described. Finally, the alternative models are employed in simulation studies to ascertain the powers of $\tau$ and $r_1$.
2. TEST STATISTIC
Following the definition of $r_1$, Kendall's tau statistic is described. Let $x_1, \ldots, x_n$ be a random sample of size $n$ drawn from a population. The first serial correlation $r_1$ is defined here as
$$r_1 = \frac{\sum_{i=1}^{n-1} (x_i - \bar{x})(x_{i+1} - \bar{x})}{\sum_{i=1}^{n} (x_i - \bar{x})^2}$$
where
$$\bar{x} = \frac{1}{n}\sum_{i=1}^{n} x_i$$
For samples as small as 10, Cox (1966) observed that $r_1$ has an approximate normal null distribution for both normal and some nonnormal parent distributions. Knoke (1977) observed empirically that the normal distribution provides an adequate approximation for determining the critical regions (the critical region is the subset of the sample space for which the null hypothesis is rejected when the observed data fall in it). The asymptotic distribution of $r_1$ was established by Wald and Wolfowitz (1943) and Noether (1950). In this paper, critical regions for $r_1$ are determined by the normal approximations with the following moments (Kendall et al., 1983; Dufour and Roy, 1985)
$$\text{mean} = -\frac{1}{n} \quad\text{and}\quad \text{variance} = \frac{(n-2)^2}{n^2\,(n-1)}$$
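The test based on $r_1$ can be sketched in a few lines of Python. This is an illustration only: it assumes that $r_1$ is the ratio of the lag-one cross-products about the sample mean to the sum of squares about the mean, and it standardizes $r_1$ with the null mean $-1/n$ and variance $(n-2)^2/[n^2(n-1)]$ quoted above. The function names are hypothetical.

```python
import math


def lag_one_serial_correlation(x):
    """Lag-one serial correlation r1 of the sequence x (assumed definition)."""
    n = len(x)
    mean = sum(x) / n
    num = sum((x[i] - mean) * (x[i + 1] - mean) for i in range(n - 1))
    den = sum((v - mean) ** 2 for v in x)
    return num / den


def r1_z_score(x):
    """Standardize r1 with its null mean and variance for a normal test."""
    n = len(x)
    r1 = lag_one_serial_correlation(x)
    null_mean = -1.0 / n
    null_var = (n - 2) ** 2 / (n**2 * (n - 1))
    return (r1 - null_mean) / math.sqrt(null_var)
```

The resulting score would be compared with standard normal quantiles; for the short sequence 1, 2, 3, 4, 5 the sketch gives $r_1 = 0.4$ and a score of 2.0, although samples of at least 10 values are suggested in the text for the normal approximation.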
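Knoke (1975) noted that $r_1$ is a powerful test for detecting nonrandomness against first order autoregressive alternatives and that it performs reasonably well for a wider class of alternatives, including the first order moving average model.

The second statistic considered is Kendall's tau, $\tau$ (Kendall, 1970). For any two pairs of random variables $(X_i, Y_i)$ and $(X_j, Y_j)$, Kendall's tau is defined as the difference (Gibbons, 1971)

$$\tau = \pi_c - \pi_d \qquad (4)$$

where

$$\pi_c = P[(X_i < X_j) \cap (Y_i < Y_j)] + P[(X_i > X_j) \cap (Y_i > Y_j)] \qquad (5)$$

and

$$\pi_d = P[(X_i < X_j) \cap (Y_i > Y_j)] + P[(X_i > X_j) \cap (Y_i < Y_j)] \qquad (6)$$

When ties are impossible in both the X's and the Y's, $\tau$ can be further expressed as

$$\tau = 2\pi_c - 1 = 1 - 2\pi_d \qquad (7)$$

These relationships give practical meaning to Kendall's $\tau$, since it is not just a test statistic but can also be expressed in terms of probabilities. From the sample of size $n$ defined above, Kendall's $\tau$ is estimated by (Gibbons, 1971, 1976; Conover, 1971; Kendall, 1970; Hollander and Wolfe, 1973)

$$\hat{\tau} = \frac{N_c - N_d}{\binom{n}{2}} = \frac{S}{\binom{n}{2}} \qquad (8)$$

where $N_c$, $N_d$ and $S$ are given by

$$N_c = \sum_{i<j} \phi_{ij} \qquad (9)$$

where

$$\phi_{ij} = \begin{cases} 1, & \text{if } x_i < x_j; \\ 0, & \text{otherwise,} \end{cases} \qquad (10)$$

$$N_d = \sum_{i<j} \theta_{ij} \qquad (11)$$

where

$$\theta_{ij} = \begin{cases} 1, & \text{if } x_i > x_j; \\ 0, & \text{otherwise,} \end{cases} \qquad (12)$$

and

$$S = \sum_{i<j} \psi_{ij} \qquad (13)$$

where

$$\psi_{ij} = \begin{cases} 1, & \text{if } x_i < x_j; \\ 0, & \text{if } x_i = x_j; \\ -1, & \text{otherwise.} \end{cases} \qquad (14)$$

The statistic $S$ can also be expressed as

$$S = N_c - N_d \qquad (15)$$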
In place of the statistic (8), the test statistic most often used in the literature is $S$ (Kendall, 1970; Hirsch and Slack, 1984; Hirsch et al., 1982; van Belle and Hughes, 1984), where $S$ is as defined in (13) and (15). In fact, the estimate of $\tau$ and $S$ are statistically equivalent, since the estimate is simply $S$ divided by the constant $n(n-1)/2$.
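The counting definitions of $N_c$, $N_d$, $S$ and the estimate of $\tau$ can be illustrated with a short Python sketch. The function names are hypothetical, and the series is assumed to be supplied in time order so that each pair (i, j) with i < j is visited once.

```python
from itertools import combinations


def kendall_counts(x):
    """Return (N_c, N_d): numbers of pairs i < j with x_i < x_j and with x_i > x_j."""
    n_c = 0
    n_d = 0
    for earlier, later in combinations(x, 2):  # pairs are generated in time order
        if earlier < later:
            n_c += 1
        elif earlier > later:
            n_d += 1
        # tied pairs contribute to neither count
    return n_c, n_d


def kendall_s(x):
    """S = N_c - N_d."""
    n_c, n_d = kendall_counts(x)
    return n_c - n_d


def kendall_tau_estimate(x):
    """Estimate of tau: S divided by the number of pairs n(n-1)/2."""
    n = len(x)
    return kendall_s(x) / (n * (n - 1) / 2)
```

For the series 1, 3, 2, 4 there are five concordant pairs and one discordant pair, so $S = 4$ and the estimate of $\tau$ is 4/6.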
Under the assumption that the $X_i$ are independent and identically distributed (IID), Kendall (1970) gave the mean and the variance of $S$ as

$$E(S) = 0 \qquad (16)$$

and

$$V(S) = \begin{cases} \dfrac{n(n-1)(2n+5)}{18} & \text{for no ties;} \\[2ex] \dfrac{n(n-1)(2n+5) - \sum_t t(t-1)(2t+5)}{18} & \text{for ties,} \end{cases} \qquad (17)$$

where $t$ is the number of observations tied at a particular rank and $\sum_t$ sums over all such ties. For example, given the following set of ranks: 1, 2, 2, 4, 5, 5, 5, 8, 9, 10, 11, 12, the variance of $S$ is

$$V(S) = \frac{12(11)(29) - 2(1)(9) - 3(2)(11)}{18} = \frac{3744}{18} = 208.$$

Kendall (1970) and Mann (1945) derived the exact distribution of $S$ for $n \le 10$. For sample sizes as small as 10, the normal approximation was found to be adequate. For use with the normal approximation, Kendall (1970) suggested a continuity correction, which gives the standard normal variate $Z$ defined as

$$Z = \begin{cases} \dfrac{S-1}{\sqrt{V(S)}} & \text{if } S > 0; \\[2ex] \dfrac{S+1}{\sqrt{V(S)}} & \text{otherwise.} \end{cases} \qquad (18)$$

3. ALTERNATIVE MODELS

Under the null hypothesis, it is assumed that the time series $Z_t$, $t = 1, 2, \ldots, n$, consists
of IID random variables. In this paper, the following specific alternative models are entertained, where the first three models contain only deterministic trends while the last three have purely stochastic trends. In the case of a purely deterministic trend, the time series $Z_t$ may be written

$$Z_t = f(t) + e_t \qquad (19)$$

where $f(t)$ is a function of time only and $e_t$ is an IID sequence. On the other hand, a time series having a purely stochastic trend may be written

$$Z_t = f(Z_{t-1}, Z_{t-2}, \ldots) + a_t \qquad (20)$$

where $f(Z_{t-1}, Z_{t-2}, \ldots)$ is a function of the past data and $a_t$ is an innovation series assumed
to be IID and with the property
where $\langle \cdot \rangle$ denotes expectation. In actual practice, it may be difficult to distinguish between deterministic and stochastic trends. For example, the series plotted in Figure 1 was simulated from the model

$$(1 - B)^3 Z_t = a_t \qquad (22)$$

where $B$ is the backshift operator, $a_t \sim NID(0, 1)$ and $Z_1 = 100$, $Z_2 = 101$, $Z_3 = 102$. Based upon the shape of the plot, the series might well be fitted using a purely deterministic trend even though the correct model is purely stochastic. Box and Jenkins (1976) suggest that for forecasting purposes it is usually better to use purely stochastic trend models, provided such a model is reasonable a priori and also gives an adequate fit. However, in water quality studies it is often of interest to test whether the level of the series has changed in some way, and in this case a model with a possible deterministic trend component may seem more reasonable a priori.

3.1 Linear Model

In the water resources literature, using linear regression models as alternative hypotheses is quite common (Lettenmaier, 1976; Hirsch et al., 1982; Hirsch and Slack, 1984; van Belle and Hughes, 1984). Assume $Z_t$ is given by

$$Z_t = a + bt + e_t, \quad t = 1, 2, \ldots, n, \qquad (23)$$

where $e_t \sim NID(0, \sigma^2)$. Without loss of generality, let $a = 0.0$.
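To make the contrast between the two kinds of trend concrete, the following Python sketch generates one series of each type. It is only an illustration under stated assumptions: the differencing order of the stochastic example is taken to be three, as the three starting values suggest, and the function names, seeds and default arguments are hypothetical.

```python
import random


def simulate_triple_integrated(n, start=(100.0, 101.0, 102.0), seed=0):
    """Simulate (1 - B)^3 Z_t = a_t with a_t ~ N(0, 1); differencing order 3 is assumed.

    Expanding (1 - B)^3 gives Z_t = 3 Z_{t-1} - 3 Z_{t-2} + Z_{t-3} + a_t.
    """
    rng = random.Random(seed)
    z = list(start)
    while len(z) < n:
        z.append(3.0 * z[-1] - 3.0 * z[-2] + z[-3] + rng.gauss(0.0, 1.0))
    return z[:n]


def simulate_linear(n, b, sigma, a=0.0, seed=0):
    """Simulate Z_t = a + b t + e_t with e_t ~ N(0, sigma^2), t = 1, ..., n."""
    rng = random.Random(seed)
    return [a + b * t + rng.gauss(0.0, sigma) for t in range(1, n + 1)]
```

A plot of `simulate_triple_integrated(50)` will typically look like a smooth curve, which is why such a series could be mistaken for one with a deterministic trend.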
FIGURE 1. Simulated series plotted against sequence number.

3.2 Logistic Model

Because it is possible for a series to change rapidly at the start and then gradually approach a limit, a logistic model constitutes a reasonable choice for an alternative model. This model is defined as (Cleary and Levenbach, 1982)

$$Z_t = \frac{M}{1 - c \exp(-at)} + e_t, \quad t = 1, 2, \ldots, n, \qquad (24)$$

where $e_t \sim NID(0, 1)$ and $M$ is the limit of $Z_t$ as $t$ tends to infinity.
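The logistic alternative can be simulated with a few lines of Python. This is a sketch under the assumption that the model has the form $M / (1 - c \exp(-at))$ plus unit-variance normal noise; the function names and the seed are hypothetical.

```python
import math
import random


def logistic_mean(t, a, c, M):
    """Deterministic part of the logistic model: M / (1 - c exp(-a t))."""
    return M / (1.0 - c * math.exp(-a * t))


def simulate_logistic(n, a, c, M, seed=0):
    """Simulate Z_t = M / (1 - c exp(-a t)) + e_t with e_t ~ N(0, 1), t = 1, ..., n."""
    rng = random.Random(seed)
    return [logistic_mean(t, a, c, M) + rng.gauss(0.0, 1.0) for t in range(1, n + 1)]
```

With $0 < c < 1$ and $a > 0$ the mean decreases towards $M$ as $t$ grows, and with $M = 0$ the series reduces to pure noise, so any rejections in that case reflect the significance level rather than power.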
3.3 Step Function Model

The Step Function Model is defined as

$$Z_t = \begin{cases} e_t, & t = 1, 2, \ldots, n/2; \\ a + e_t, & t = n/2 + 1, \ldots, n, \end{cases} \qquad (25)$$

where $e_t \sim NID(0, \sigma^2)$ and $a$ is the average change in the level of the series after time $t = n/2$. The Step Function Model is actually a specific type of intervention model, which can be used to model the effects of one or more interventions upon the mean level of a series (Box and Tiao, 1975). In water resources, the intervention model has been employed in ascertaining the effects of both man-induced and natural interventions upon the mean level of water quantity (Hipel et al., 1975) and water quality (McLeod et al., 1983; Whitfield and Woods, 1984) time series. Hipel and McLeod (1986) presented a wide variety of time series applications in water resources using intervention and transfer function-noise models.

3.4 Barnard's Model

This alternative model is due to Barnard (1959) and is defined as

$$Z_t = Z_{t-1} + \sum_{i=1}^{N_t} \delta_i + e_t, \quad t = 1, 2, \ldots, n, \qquad (26)$$

where $N_t$ follows a Poisson distribution with parameter $\lambda$, $\delta_i \sim NID(0, \sigma^2)$ and $e_t \sim NID(0, 1)$. Without loss of generality, let $Z_1 = e_1$. Barnard (1959) developed this model for use in quality control, where there may be a series of $N_t$ correctional jumps between measurements.

3.5 Second Order Autoregressive Model

The second order autoregressive model may be written as (Kendall et al., 1983)

$$Z_t = \phi_1 Z_{t-1} + \phi_2 Z_{t-2} + e_t \qquad (27)$$

where $e_t \sim NID(0, \sigma^2)$. For the simulation studies executed in this paper, $\sigma^2 = 1.0$ and $E(Z_t) = 0.0$.

3.6 Threshold Autoregressive Model (TAR)

The development of this type of model is due to Tong (1977, 1978, 1983), Tong and Lim (1980) and Tong et al. (1985). Tong (1983), Tong and Lim (1980) and Tong et al. (1985) found TAR models to be suitable for modelling and forecasting riverflows. The particular model considered here (Tong, 1983; Tong et al., 1985) is given by

where $Z_t$ is the daily riverflow in cubic metres per second, $J_t$ is the temperature in degrees centigrade, $e_t^{(1)} \sim NID(0, 0.69)$ and $e_t^{(2)} \sim NID(0, 7.18)$. The above model was estimated for the Vatnsdalsa River in Iceland for the period 1972 to 1974.

4. SIMULATION EXPERIMENTS

Sample sizes of 10, 20, 50 and 100 are considered. The power functions are estimated for a significance level of 5% by the proportion of rejections from 1000 replications. The standard error of any entry in the tables is $\sqrt{\pi(1-\pi)/N}$ (Cochran, 1977), where $N$ is the number of replications and $\pi$ is the true rejection rate. For example, for the estimated significance level of 5%, the standard error is

$$\sqrt{\frac{0.05(1 - 0.05)}{1000}} = 0.0069.$$

For the estimated significance level, the test is said to be conservative if the estimated level is clearly less than the nominal level (in this case 0.05). On the other hand, if the estimated level is clearly greater than 0.05, the test is said to be optimistic. Otherwise the null distribution of the test is said to be adequately approximated. Empirical significance levels and powers are given in the six tables below, one for each of the six alternatives described in Section 3. The results suggest that the critical regions are adequately determined by the approximate null distribution.
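The kind of Monte Carlo experiment described above can be sketched in Python as follows. This is a hedged illustration rather than the authors' program: the two-sided rejection rule with critical value 1.96, the burn-in used to start the AR(2) series, the seeds and all function names are assumptions, the no-ties variance of $S$ is used because the simulated data are continuous, and the standard error is computed from the estimated rather than the true rejection rate.

```python
import math
import random


def r1_z(x):
    """r1 standardized by its null mean -1/n and variance (n-2)^2 / (n^2 (n-1))."""
    n = len(x)
    mean = sum(x) / n
    dev = [v - mean for v in x]
    r1 = sum(dev[i] * dev[i + 1] for i in range(n - 1)) / sum(d * d for d in dev)
    return (r1 + 1.0 / n) / math.sqrt((n - 2) ** 2 / (n ** 2 * (n - 1)))


def kendall_z(x):
    """Continuity-corrected normal score for S, using the no-ties variance."""
    n = len(x)
    s = sum((x[j] > x[i]) - (x[j] < x[i]) for i in range(n - 1) for j in range(i + 1, n))
    sd = math.sqrt(n * (n - 1) * (2 * n + 5) / 18.0)
    return (s - 1) / sd if s > 0 else (s + 1) / sd


def step_series(n, a, sigma, rng):
    """Step model: the mean shifts by a after t = n/2."""
    return [(a if t > n // 2 else 0.0) + rng.gauss(0.0, sigma) for t in range(1, n + 1)]


def ar2_series(n, phi1, phi2, rng, burn_in=100):
    """AR(2) with unit innovation variance; the burn-in length is an assumption."""
    z = [0.0, 0.0]
    for _ in range(n + burn_in):
        z.append(phi1 * z[-1] + phi2 * z[-2] + rng.gauss(0.0, 1.0))
    return z[-n:]


def estimate_power(make_series, z_stat, reps=1000, z_crit=1.96, seed=1):
    """Return (rejection proportion, its estimated standard error) over reps series."""
    rng = random.Random(seed)
    rejections = sum(abs(z_stat(make_series(rng))) > z_crit for _ in range(reps))
    p = rejections / reps
    return p, math.sqrt(p * (1.0 - p) / reps)


if __name__ == "__main__":
    models = {
        "step, a=1, sigma=1": lambda rng: step_series(50, 1.0, 1.0, rng),
        "AR(2), phi1=0.4, phi2=-0.2": lambda rng: ar2_series(50, 0.4, -0.2, rng),
    }
    for name, make in models.items():
        print(name, "tau:", estimate_power(make, kendall_z), "r1:", estimate_power(make, r1_z))
```

Because the random number generator and implementation details differ from those of the original study, the proportions printed by this sketch should only be expected to agree with the tabulated values to within a few standard errors.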
Empirical Rejection Rate at 5 Percent Level of Significance

TABLE 1. LINEAR MODEL

                 n = 10          n = 20          n = 50          n = 100
   b      σ      τ      r1       τ      r1       τ      r1       τ      r1
  0.00   0.05  0.052  0.041    0.035  0.044    0.040  0.045    0.048  0.051
  0.01   0.05  0.335  0.138    0.995  0.749    1.000  1.000    1.000  1.000
  0.01   0.50  0.057  0.039    0.072  0.043    0.487  0.080    1.000  0.690
  0.01   1.00  0.053  0.040    0.050  0.043    0.146  0.042    0.769  0.131
  0.01   2.00  0.050  0.041    0.036  0.047    0.073  0.042    0.268  0.065
  0.05   0.05  1.000  0.990    1.000  1.000    1.000  1.000    1.000  1.000
  0.05   0.50  0.121  0.063    0.632  0.194    1.000  1.000    1.000  1.000
  0.05   1.00  0.066  0.049    0.208  0.064    1.000  0.655    1.000  1.000
  0.05   2.00  0.050  0.040    0.084  0.043    0.669  0.121    1.000  0.927
  0.10   0.05  1.000  1.000    1.000  1.000    1.000  1.000    1.000  1.000
  0.10   0.50  0.335  0.138    0.995  0.749    1.000  1.000    1.000  1.000
  0.10   1.00  0.121  0.063    0.632  0.194    1.000  1.000    1.000  1.000
  0.10   2.00  0.066  0.049    0.208  0.064    1.000  0.655    1.000  1.000

TABLE 2. LOGISTIC MODEL

                       n = 10          n = 20          n = 50          n = 100
   a      c      M     τ      r1       τ      r1       τ      r1       τ      r1
  0.01   0.01   0.0  0.052  0.041    0.035  0.044    0.040  0.045    0.048  0.051
  0.01   0.01   0.1  0.052  0.041    0.035  0.044    0.040  0.045    0.048  0.051
  0.01   0.01   1.0  0.052  0.041    0.035  0.044    0.041  0.045    0.047  0.050
  0.01   0.01   5.0  0.052  0.041    0.037  0.045    0.041  0.045    0.047  0.050
  0.01   0.50   0.0  0.052  0.041    0.035  0.044    0.040  0.045    0.048  0.051
  0.01   0.50   0.1  0.053  0.042    0.037  0.047    0.043  0.047    0.050  0.048
  0.01   0.50   1.0  0.051  0.042    0.049  0.040    0.166  0.057    0.514  0.075
  0.01   0.50   5.0  0.096  0.046    0.375  0.078    1.000  0.771    1.000  0.999
  0.01   0.90   0.0  0.052  0.041    0.035  0.044    0.040  0.045    0.048  0.051
  0.01   0.90   0.1  0.057  0.045    0.072  0.045    0.174  0.060    0.279  0.057
  0.01   0.90   1.0  0.825  0.431    1.000  0.950    1.000  1.000    1.000  1.000
  0.01   0.90   5.0  1.000  1.000    1.000  1.000    1.000  1.000    1.000  1.000
  0.10   0.01   0.0  0.052  0.041    0.035  0.044    0.040  0.045    0.048  0.051
  0.10   0.01   0.1  0.052  0.041    0.035  0.044    0.040  0.045    0.048  0.051
  0.10   0.01   1.0  0.053  0.041    0.037  0.047    0.040  0.044    0.047  0.050
  0.10   0.01   5.0  0.053  0.041    0.037  0.047    0.043  0.045    0.046  0.050
  0.10   0.50   0.0  0.052  0.041    0.035  0.044    0.040  0.045    0.048  0.051
  0.10   0.50   0.1  0.054  0.046    0.040  0.047    0.045  0.048    0.047  0.049
  0.10   0.50   1.0  0.076  0.044    0.092  0.045    0.155  0.063    0.153  0.053
  0.10   0.50   5.0  0.622  0.272    0.934  0.579    0.983  0.893    0.951  0.905
  0.10   0.90   0.0  0.052  0.041    0.035  0.044    0.040  0.045    0.048  0.051
  0.10   0.90   0.1  0.052  0.045    0.047  0.045    0.058  0.047    0.053  0.045
  0.10   0.90   1.0  0.603  0.239    0.745  0.366    0.733  0.490    0.583  0.464
  0.10   0.90   5.0  1.000  0.976    1.000  1.000    1.000  1.000    0.999  1.000
TABLE 3. STEP FUNCTION MODEL

                 n = 10          n = 20          n = 50          n = 100
   a      σ      τ      r1       τ      r1       τ      r1       τ      r1
  0.00   0.05  0.052  0.041    0.035  0.044    0.040  0.045    0.048  0.051
  0.05   0.05  0.199  0.121    0.382  0.152    0.803  0.281    0.983  0.520
  0.05   0.50  0.058  0.040    0.038  0.048    0.051  0.039    0.069  0.051
  0.05   1.00  0.053  0.040    0.036  0.047    0.046  0.044    0.055  0.049
  0.05   2.00  0.053  0.040    0.035  0.045    0.045  0.045    0.052  0.050
  0.50   0.05  0.711  1.000    0.994  1.000    1.000  1.000    1.000  1.000
  0.50   0.50  0.199  0.121    0.382  0.152    0.803  0.281    0.983  0.520
  0.50   1.00  0.090  0.057    0.131  0.064    0.283  0.070    0.525  0.113
  0.50   2.00  0.055  0.048    0.060  0.042    0.103  0.043    0.160  0.059
  1.00   0.05  0.711  1.000    0.994  1.000    1.000  1.000    1.000  1.000
  1.00   0.50  0.490  0.352    0.887  0.594    1.000  0.958    1.000  1.000
  1.00   1.00  0.199  0.121    0.382  0.152    0.803  0.281    0.983  0.520
  1.00   2.00  0.090  0.057    0.131  0.064    0.283  0.070    0.525  0.113
  5.00   0.05  0.711  1.000    0.994  1.000    1.000  1.000    1.000  1.000
  5.00   0.50  0.711  1.000    0.994  1.000    1.000  1.000    1.000  1.000
  5.00   1.00  0.711  0.943    0.944  1.000    1.000  1.000    1.000  1.000
  5.00   2.00  0.596  0.507    0.960  0.821    1.000  0.998    1.000  1.000

TABLE 4. BARNARD'S MODEL

                 n = 10          n = 20          n = 50
   λ      σ      τ      r1       τ      r1       τ      r1
   1.0   0.05  0.484  0.579    0.681  0.933    0.809  1.000
   1.0   0.50  0.459  0.579    0.667  0.944    0.819  1.000
   1.0   1.00  0.482  0.571    0.680  0.944    0.817  1.000
   1.0   2.00  0.486  0.579    0.655  0.955    0.800  1.000
   2.0   0.05  0.477  0.573    0.688  0.937    0.802  1.000
   2.0   0.50  0.575  0.586    0.690  0.946    0.789  1.000
   2.0   1.00  0.467  0.586    0.683  0.954    0.798  1.000
   2.0   2.00  0.460  0.571    0.669  0.958    0.784  1.000
   5.0   0.05  0.485  0.560    0.689  0.935    0.812  1.000
   5.0   0.50  0.469  0.562    0.670  0.937    0.802  1.000
   5.0   1.00  0.491  0.577    0.680  0.948    0.802  1.000
   5.0   2.00  0.478  0.591    0.674  0.952    0.783  1.000
  10.0   0.05  0.488  0.565    0.690  0.938    0.796  1.000
  10.0   0.50  0.501  0.628    0.677  0.961    0.802  1.000
  10.0   1.00  0.472  0.603    0.665  0.946    0.790  1.000
  10.0   2.00  0.480  0.586    0.665  0.960    0.796  1.000
  20.0   0.05  0.473  0.569    0.689  0.941    0.801  1.000
  20.0   0.50  0.501  0.612    0.658  0.949    0.804  1.000
  20.0   1.00  0.498  0.593    0.654  0.950    0.811  1.000
  20.0   2.00  0.506  0.607    0.666  0.946    0.819  1.000
TABLE 5. AR(2) MODEL

                  n = 10          n = 20          n = 50          n = 100
   φ1     φ2      τ      r1       τ      r1       τ      r1       τ      r1
  -1.40  -0.80  0.000  0.731    0.000  1.000    0.000  1.000    0.000  1.000
  -0.70  -0.80  0.002  0.009    0.000  0.154    0.000  0.975    0.000  1.000
   0.70  -0.80  0.031  0.170    0.009  0.485    0.002  0.991    0.000  1.000
   1.40  -0.80  0.247  0.881    0.132  0.998    0.067  1.000    0.045  1.000
  -1.20  -0.50  0.000  0.660    0.000  0.989    0.000  1.000    0.000  1.000
  -0.60  -0.50  0.007  0.064    0.002  0.279    0.000  0.900    0.000  0.998
   0.60  -0.50  0.072  0.223    0.050  0.470    0.036  0.932    0.031  1.000
   1.20  -0.50  0.305  0.745    0.264  0.985    0.285  1.000    0.230  1.000
  -0.80  -0.20  0.001  0.422    0.001  0.866    0.000  1.000    0.000  1.000
  -0.40  -0.20  0.009  0.082    0.004  0.210    0.002  0.645    0.002  0.939
   0.40  -0.20  0.082  0.137    0.091  0.267    0.089  0.684    0.103  0.951
   0.80  -0.20  0.260  0.456    0.290  0.850    0.264  1.000    0.268  1.000
  -0.40   0.10  0.010  0.207    0.010  0.470    0.008  0.840    0.007  0.991
   0.40   0.10  0.180  0.180    0.187  0.386    0.248  0.814    0.245  0.985
   0.80   0.10  0.390  0.452    0.536  0.872    0.633  1.000    0.625  1.000
  -0.50   0.30  0.013  0.500    0.005  0.788    0.002  0.990    0.005  1.000
  -0.30   0.30  0.024  0.239    0.023  0.466    0.023  0.776    0.028  1.000
   0.30   0.30  0.193  0.138    0.297  0.306    0.335  0.:01    0.348  0.934
   0.50   0.30  0.309  0.243    0.401  0.585    0.524  0.975    0.527  0.999
  -0.40   0.50  0.017  0.615    0.013  0.824    0.005  0.984    0.008  1.000
  -0.20   0.50  0.046  0.319    0.059  0.470    0.088  0.682    0.061  0.881
   0.20   0.50  0.175  0.143    0.302  0.250    0.377  0.578    0.421  0.831
   0.40   0.50  0.288  0.211    0.470  0.513    0.610  0.933    0.684  0.999

TABLE 6. TAR MODEL

     n       τ      r1
    10     0.320  0.499
    20     0.435  0.904
    50     0.320  1.000
   100     0.338  1.000

4.1 Linear Model

The results for this model (Table 1) indicate that the smaller the standard deviation, the better the performance of the two tests. This implies that the better the fit of linear regression to a time series, the greater the chance of detecting nonrandomness. For samples as small as 10, the tests are very powerful for small standard deviations. An encouraging aspect of this model is that both tests attain asymptotic efficiency quite rapidly. For example, there is considerable improvement in the power functions from n=10 to n=20. A noteworthy point is that $\tau$ is generally more powerful than $r_1$, even though the difference is almost negligible for n=50 and n=100, when both tests approach asymptotic efficiency.
4.2 Logistic Model

The results for this model are presented in Table 2. Here too the tests perform better when there is a good fit, indicating nonrandomness. When $M < 1.0$, the standard deviation of 1.0 used in the simulation studies tends to have a greater impact on the simulated data than the parameters of the model. Hence a large component of $Z_t$ is determined by $e_t$, which is random. For $M > 1.0$, the two tests (especially $\tau$) prove effective for detecting the presence of trend. Finally, it is seen that $\tau$ is more powerful than $r_1$, especially for cases where the Logistic Model describes the data fairly well ($M \ge 1.0$). There is, however, not much difference between the two tests when n=100.

4.3 Step Function Model

The results for this alternative model (Table 3) indicate greater power for relatively small standard deviations (and hence fairly good fits). What is remarkable about this model is the great power of both tests even for samples as small as 10. For example, for a=5, the power is at least 50% for all sample sizes. The power functions also improve as $a$ increases. Both tests are very effective in detecting trend for even a slight shift of 0.5 in the mean level of the series, provided the standard deviation is small. For a change of 5 in the mean level, both tests are very powerful. Even though $\tau$ is more powerful than $r_1$, both tests are almost equally powerful for $n \ge 50$.

4.4 Barnard's Model

The results of Table 4 are very consistent and easily comprehensible. For all sample sizes and all combinations of lambda ($\lambda$) and standard deviation ($\sigma$), $r_1$ has greater power than $\tau$. For n as small as 50, $r_1$ attains asymptotic efficiency, while $\tau$ is only about 80% efficient. The power of the two tests can be well appreciated by considering the results for n=10. While the power of $\tau$ is about 50%, that of $r_1$ is always greater than 50%.

4.5 Second Order Autoregressive Model

The results of this model (Table 5) parallel fairly closely those of Barnard's model. The main difference between the two models is that the results here are not as dramatic as in Table 4. Here too, $r_1$ is more powerful than $\tau$. As n increases, the power of $r_1$ increases faster than that of $\tau$. For n=100, $r_1$ attains almost 100% efficiency while $\tau$ performs fairly poorly in some cases. For example, for $\phi_1 = -0.2$ and $\phi_2 = 0.5$ the power of $r_1$ is 0.881 while that of $\tau$ is only 0.061 for n=100.

4.6 TAR Model

The results for this model are very similar to those of Tables 4 and 5. The $r_1$ test is obviously more powerful than $\tau$. Also, while the power of $r_1$ increases very rapidly with increasing n, the power of $\tau$ makes only slow progress. The $r_1$ test is about 90% efficient for n=20 and it attains 100% efficiency at n=50. On the other hand, the power of $\tau$ is less than 50% even for n=100.

5. CONCLUSION

The general deduction from Section 4 is that $\tau$ is more powerful for the first three models while $r_1$ is more powerful for the last three. As noted in Section 3, the first three models contain deterministic trends and the last three have stochastic trends. Therefore, it is reasonable to conclude that $\tau$ is more powerful for detecting deterministic trends while $r_1$ is more powerful for discovering stochastic trends. In practice, it is advantageous to have both a sound physical and a sound statistical understanding of the time series being analyzed. This will allow one to decide whether one should employ models possessing deterministic trends or models having stochastic trends. For example, it may be better to model certain kinds of water quality measurements using models having deterministic trends. On the other hand, for modelling seasonal riverflows, models having stochastic trends, such as a threshold autoregressive model, may work well (Tong, 1983; Tong et al., 1985). In some cases, one may wish to use a model which possesses both deterministic and stochastic trends.
REFERENCES

Barnard, G.A. (1959). Control Charts and Stochastic Processes. Journal of the Royal Statistical Society, Series B 21, 239-271.
Bartels, R. (1982). The Rank Version of von Neumann's Ratio Test for Randomness. Journal of the American Statistical Association 77, 40-46.
Box, G.E.P.; Jenkins, G.M. (1976). Time Series Analysis: Forecasting and Control. 2nd Edition, Holden-Day.
Box, G.E.P.; Tiao, G.C. (1975). Intervention Analysis with Applications to Economic and Environmental Problems. Journal of the American Statistical Association 70, 70-79.
Cleary, J.A.; Levenbach, H. (1982). The Professional Forecaster. Lifetime Learning Publications, Belmont, California.
Cochran, W.G. (1977). Sampling Techniques. 3rd Edition, Wiley, New York.
Conover, W.J. (1971). Practical Nonparametric Statistics. Wiley, New York.
Cox, D.R. (1966). The Null Distribution of the First Serial Correlation Coefficient. Biometrika 53, 623-626.
Dietz, E.J.; Killeen, T. (1981). A Nonparametric Multivariate Test for Monotone Trend with Pharmaceutical Applications. Journal of the American Statistical Association 76, 169-174.
Dufour, J.M.; Roy, R. (1985). Some Robust Exact Results on Sample Autocorrelation and Tests of Randomness. Department of Data Processing and Operation Research, University of Montreal.
Gibbons, J.D. (1971). Nonparametric Statistical Inference. McGraw-Hill, New York.
Gibbons, J.D. (1976). Nonparametric Methods for Quantitative Analysis. Holt, Rinehart and Winston, New York.
Hipel, K.W.; Lennox, W.C.; Unny, T.E.; McLeod, A.I. (1975). Intervention Analysis in Water Research. Water Resources Research 11(6), 855-861.
Hipel, K.W.; McLeod, A.I. (1986). Time Series Modelling for Water Resources and Environmental Engineers. Elsevier, Amsterdam.
Hirsch, R.M.; Slack, J.R.; Smith, R. (1982). Techniques of Trend Analysis for Monthly Water Quality Data. Water Resources Research 18(1), 107-121.
Hirsch, R.M.; Slack, J.R. (1984). A Nonparametric Trend Test for Seasonal Data with Serial Dependence. Water Resources Research 20(6), 727-732.
Hollander, M.; Wolfe, D.A. (1973). Nonparametric Statistical Methods. Wiley, New York.
Kendall, M.G. (1970). Rank Correlation Methods. 4th Edition, Griffin, London.
Kendall, M.G.; Stuart, A. (1979). The Advanced Theory of Statistics. Vol. 1. Griffin, London.
Kendall, M.G.; Stuart, A.; Ord, J.K. (1983). The Advanced Theory of Statistics. Vol. 3. Griffin, London.
Knoke, J.D. (1975). Testing for Randomness Against Autocorrelated Alternatives: The Parametric Case. Biometrika 62, 571-575.
Knoke, J.D. (1977). Testing for Randomness Against Autocorrelation: Alternative Tests. Biometrika 64, 523-529.
Knoke, J.D. (1979). Normal Approximations for Serial Correlation Statistics. Biometrics 35, 491-495.
Lettenmaier, D.R. (1976). Detection of Trends in Water Quality Data from Records with Dependent Observations. Water Resources Research 12(5), 1036-1046.
Mann, H.B. (1945). Nonparametric Tests Against Trend. Econometrica 13, 245-259.
McLeod, A.I.; Hipel, K.W.; Camacho, F. (1983). Trend Assessment of Water Quality Time Series. Water Resources Bulletin 19(4), 537-547.
Noether, G.E. (1950). Asymptotic Properties of the Wald-Wolfowitz Test of Randomness. Annals of Mathematical Statistics 21, 231-246.
Simon, G. (1977). A Nonparametric Test of Total Independence Based on Kendall's Tau. Biometrika 64, 237-282.
Tong, H. (1977). Discussion of paper by A.J. Lawrence and N.T. Kottegoda. Journal of the Royal Statistical Society, Series A 140, 34-35.
Tong, H. (1978). On a Threshold Model. In Pattern Recognition and Signal Processing (C.H. Chen, ed.). Sijthoff and Noordhoff, The Netherlands.
Tong, H. (1983). Threshold Model in Non-Linear Time Series Analysis. Lecture Notes No. 21, Springer Verlag, New York.
Tong, H.; Lim, S. (1980). Threshold Autoregression, Limit Cycles and Cyclical Data (with discussion). Journal of the Royal Statistical Society, Series B 42, 245-292.
Tong, H.; Thanoon, B.; Gudmundsson, G. (1985). Threshold Time Series Modelling of two Icelandic Riverflow Systems. Water Resources Bulletin 21.
van Belle, G.; Hughes, J. (1984). Nonparametric Tests for Trend in Water Quality. Water Resources Research 20(1), 127-136.
Wald, A.; Wolfowitz, J. (1943). An Exact Test for Randomness in the Nonparametric Case Based on Serial Correlation. Annals of Mathematical Statistics 14, 378-388.
Whitfield, P.H.; Woods, P.F. (1984). Intervention Analysis of Water Quality Records. Water Resources Bulletin 20(5), 657-667.
Statistical Assessment of a Limnological Data Set
by Robert Clifford, Jr., John W. Wilkinson and Nicholas L. Clesceri
Rensselaer Polytechnic Institute, Troy, N.Y.
ABSTRACT
In a study of Wisconsin lakes, undertaken to examine the effects upon water quality of imposing a ban on detergent phosphorus, the design protocol employed the concept of test lakes and reference lakes. Each test lake was paired with a reference lake having as many characteristics as possible in common with the test lake, except for a loading of phosphorus from municipal wastewater effluent or septic tank seepage. The responses measured for each lake were physical, chemical and biological in nature. Measurements were taken both before and after imposition of the ban. To estimate the potential effect of the ban, three forms of statistical models were used: (i) for each test lake, a model using the reference lake variable as a covariate and the ban as a classification variable, (ii) a comprehensive model for all of the lakes combined, using the reference lakes as covariates and the test lakes as dummy variables, and (iii) multivariate models providing multiple comparison estimates for pre- and post-ban differences. The advantage of the paired lake approach is the potential for variance reduction, and this was examined for several data sets. This paper discusses comparisons of the modeling procedures as well as estimates of the "ban effects." Also presented are some of the observed distributional characteristics of the measured responses.

INTRODUCTION
The growth of algae is, to a large extent, regulated by the presence of the macronutrients nitrogen and phosphorus in the water column (Hutchinson, 1957, Wetzel, 1975). Excess growth can degrade water quality by reducing clarity, adding noxious odors and taste to the water, hampering motorboat movement, and reducing overall aesthetic quality. Of the macronutrients, phosphorus is most frequently "limiting", i.e. the amount of phosphorus input to a water body is the regulating factor in photosynthetic production (Likens, 1972, Schindler, 1977). Phosphorus is an important ingredient in laundry detergents, serving as a "builder" by, among other things, reducing water hardness. As a means of reducing the load of phosphorus to both municipal and private wastewater treatment systems, bans prohibiting the presence of phosphorus in laundry detergents have been imposed in numerous locations around the United States. Although a reduction in treatment plant loadings of phosphorus has been monitored in some of these areas, mixed reviews have appeared as to the effectiveness of detergent phosphorus bans in subsequently improving water quality in these locales (Pieczonka and Hopson, 1974, Bell and Spacie, 1978, Hartig and Horvath, 1982, Runke, 1982, Maki, Porcella and Wendt, 1984).

The state legislature of Wisconsin enacted a multi-year detergent phosphorus ban which became effective on 1 July 1979 and was in effect to 30 June 1982. The Soap and Detergent Association initiated a lake study program in 1978 and continued it through 1983 in order to determine the effectiveness of the ban. The study looked at physical, chemical and biological parameters from the study lakes to determine whether any changes in these resulted from imposition of the ban. An assumption accepted, and borne out throughout the literature, was the strong relationship between phosphorus concentrations and a number of other lake water quality parameters.

Typically, trend analysis of water quality data is hampered by several factors, among them missing values, values below detection limits, seasonality, and the non-normality of the parameter distributions (Hirsch, Slack and Smith, 1982, Van Belle and Hughes, 1984). It has also been reported that an extensive data record is necessary in the assessment of lake restoration programs in order to increase the statistical power level if parametric tests are used (Trautmann, et al., 1982). As a result, non-parametric statistical methods are usually employed to determine time related variations in water quality. These studies, however, assume that a monitoring record is available only for a limited number of lakes or for only those lakes which are impacted by phosphorus control measures.
Considering that the imposition of a detergent phosphorus ban was an experiment in improving water quality, two groups of lakes were selected for investigation. The experimental group, or "test" lakes, were those lakes within the state of Wisconsin that were determined to be receiving a significant percentage of their phosphorus loading as sewage effluent, from either public or private treatment systems. These lakes would therefore be the most likely to be impacted by a reduction in phosphorus concentration from these sources. The control group, or "reference" lakes, were lakes determined not to be impacted by sewage effluent. By concurrently monitoring reference lakes, a baseline would be established reflecting only natural fluctuations in water quality occurring over time, those chiefly a function of climatic conditions (i.e. temperature, rainfall amounts and frequency). An overall temporal trend in water quality data observed upon the test lakes, which significantly deviated from any observed upon the reference lakes, could then be ascribed to imposition of the detergent phosphorus ban.

MONITORING METHODOLOGY
In considering lakes to be included in the monitoring program, preference was given to those for which historical information was available from sources such as the National Eutrophication Survey (NES) or the Wisconsin Department of Natural Resources (WDNR) Quarterly Monitoring Program. Consideration in terms of size, depth, and hydraulic residence time followed NES selection criteria (NES, 1974). The locations of the lakes selected for the study are shown in Figure 1. The apparent concentration of study lakes in the northern part of the state is consistent with the actual partitioning of lakes within the state (WDNR, 1975). Groups of lakes fall within regional boundaries set by the WDNR and corresponding to bedrock and glacial geology as well as soil cover (Lillie and Mason, 1983). These groups include test lakes and their corresponding reference lake. In the analysis, test lakes Butternut, Elk and Balsam were paired with reference lake Teal. These lakes are situated in granite soils underlain by a sandstone bedrock (Prescott, 1962). Test lakes Moss, Enterprise and Townline were paired with reference lake Little Bearskin; all are surrounded by sandy or silty soil and underlain by sandstone. Test lake Swan is paired with reference lake Fish; both are located in the alkaline soil of the southern regions of the state and are underlain by limestone. Limnologic, morphologic and drainage basin characteristics of the lakes are summarized in Table 1. Reference lakes are geographically proximate to their test lakes and it may be noted from Table 1 that, in several cases, morphological dissimilarities are minimal between test-reference lake pairs. Though not "pristine" (residences are located along the lake shore), the reference lakes have the least amount of drainage basin area devoted to shoreline development.

Figure 1. Locations of the Wisconsin Study Lakes

The extent of impaction by sewage effluent upon the test lakes is listed in Table 2. The determination that Balsam, Moss and Enterprise lakes were not impacted by effluent phosphorus was made upon a reevaluation of nutrient loadings conducted after the monitoring study. At the time the study was initiated, in 1978, the phosphorus removal capabilities of municipal land treatment systems and private septic tank tile field systems were in question. These lakes were maintained as test lakes throughout the analysis since they did differ from their respective reference lakes by having effluent land treatment systems within their watersheds and, therefore, could be used to verify the phosphorus removal capabilities of these types of systems. Since the detergent
TABLE 1. Limnological, Morphological, and Drainage Basin Characteristics of the Study Lakes.

                              Surface  Volume   Mean   Max    Mean Hydraulic  Number of     Immed. Drain.  No. of
                              Area     (10**6   Depth  Depth  Residence Time  Tributaries   Basin          Residences
                              (ha)     cu m)    (m)    (m)    (days)          (5)           (sq km)        in 1981
Lake             County       (1)      (2)      (3)           (4)             In     Out    (6)            (7)
---------------  ---------    -------  -------  -----  -----  --------------  -----  -----  -------------  ----------
Butternut        Price          407    17.10    4.2    10.0       180           4      1        8.5           265
Elk              Price           36     0.55    1.5     6.0        <5           1      1        3.8             7
Balsam           Washburn       119     8.74    7.3    15.0        70           2      1        9.6            24
Teal             Sawyer         425    16.15    3.8     9.0       210           2      1        9.7           137
Moss             Vilas           79     2.36    3.0     9.0       900           0      1        3.0            40
Townline         Oneida          62     2.15    3.5     6.0       220           2      1        1.1            71
Enterprise       Langlade       204     7.26    3.6    11.0       620           1      1       10.9           124
Little Bearskin  Oneida          66     1.57    2.4     8.0        50           1      1        5.0            43
Swan             Columbia       164    16.03    9.8    25.0       160           1      1       21.3           104
Fish             Dane           102     6.34    6.2    19.0      1410           0      0        7.7            66

(1) Source: Wisconsin Department of Natural Resources (1981)
(2) Volumes estimated planimetrically using depth contours from maps prepared by The Clarkson Company, Kaukauna, WI
(3) Lake volume divided by surface area
(4) Lake volume divided by the mean annual flow
(5) Intermittent streams are not listed as tributaries
(6) Source: Wisconsin Department of Natural Resources (1975)
(7) Visual survey conducted by the Environmental Research Group, Inc., St. Paul, MN. (Note: a resort was counted as equivalent to 20 residences, a scout camp as equivalent to 40 residences)
phosphorus ban was intended to impact lakes which would be considered candidates for nutrient reduction measures, such as a lake possessing an effluent discharge within its watershed, the effect of the ban upon Balsam, Moss and Enterprise lakes would be relevant to the overall success of the ban. The Wisconsin lakes were monitored from 1978 through 1982. Only reference lakes were monitored during 1979, the year the ban was initiated. Monitoring of Fish Lake was discontinued in 1981 and, hence, data from Fish Lake is not included in the statistical analyses. Field trips to the lakes occurred between ice-out (late April to mid-May) and fall overturn (late October to early November). The interval between sampling was typically four weeks although samples were collected every two weeks during the summer months (July and August). Samples and measurements were taken, on all of the lakes, at the location of the deepest point and at one or two other locations, depending upon the
morphology of the lake. Transparency was measured using a standard Secchi disk. Profile measurements were made at one-meter depth increments for temperature, dissolved oxygen, and conductivity. An integrated two-meter sample of the epilimnion was obtained using a 37 mm (I.D.) PVC pipe. Aliquots of the integrated sample were stored in amber Nalgene bottles at 4 °C and earmarked for specific analyses. Chemical analysis of the samples was typically initiated within 48 hours. Total phosphorus determinations followed persulfate digestion (Menzel and Corwin, 1965); the colorimetric reaction involved reduction using ascorbic acid (Murphy and Riley, 1962). Chlorophyll-a was determined using trichromatic methods (APHA, 1976).

TABLE 2. Extent of Wastewater Treatment Within Study Lake Basins.

Lake         Name of Municipal WWTP   Final Application of Treated Wastewater   Phosphorus Load (kg/yr)   % of Total Load
Elk          Phillips                 Direct Discharge to Lake                  1660                      22
Butternut    Butternut                Indirect Discharge to Surface Water       480                       19
Swan         Pardeeville              Indirect Discharge to Surface Water       1730                      39
Townline     Three Lakes              Indirect Discharge to Surface Water       540                       8
Balsam       Birchwood                Land Disposal                             0                         0
Moss         Lac du Flambeau          Land Disposal                             0                         0
Enterprise   --                       Septic tank/Tile field                    0                         0

Note: About 30% of wastewater phosphorus may be assumed to come from detergents.

Temporal Variation of the Data
Temporal plots of the monitoring data, such as that presented in Figure 2, illustrate the amount of variability present in water quality records of either physical or chemical parameters. Yearly trends in any of the monitored parameters were difficult to discern from the plots. However, a degree of "tracking", a synchronous correspondence between plots for test and reference lake pairs, could be ascertained in several cases. The obvious imprecision of any subjective determinations made upon the data set, however, led to the statistical methodology employed.
[Figure 2. Temporal plots of the monitoring data. Plot not reproduced; the legible axis labels are Secchi depth (meters) and concentration (µg/L).]
STATISTICAL METHODOLOGY AND RESULTS
For examination of a potential ban effect, three types of statistical analysis were used: (i) covariance analysis for each test lake separately, (ii) combined covariance analyses for all test lakes, and (iii) multivariate analysis obtaining multiple comparison estimates for pre- and post-ban differences of interest. All analyses were performed using logarithmic transformations of the original lake data, a scale of measurement strongly supported by earlier lake data distribution studies.

Covariance Analysis for Individual Test Lakes
For each test lake, a covariance analysis was performed using a model of the form:

$$\log y_t = \beta_0 + \beta_1 \log y_r + \beta_2 B + \varepsilon,$$

where $y_t$ represents a test lake observation, $y_r$ represents a corresponding reference lake observation, and $B$ is a 0,1 indicator variable indicating a pre- or post-ban observation. One can think of this model as permitting variance reduction of the test lake data, due to their association with the reference lake data obtained under similar background conditions, thus allowing a potential difference due to the ban to be detected with improved sensitivity. Table 3 lists the salient features of the covariance analyses for the logarithms of the responses for the individual test lakes.

TABLE 3. Covariance Analysis for Individual Test Lakes.
[Table body not recoverable from the source. Columns: test lakes (Elk, Swan, Balsam, Butternut, Enterprise, Moss, Townline) grouped by reference lake (Fish, Teal, Little Bearskin), each with total phosphorus (TP), Secchi disc depth (SD) and chlorophyll-a (CHLA). Rows: intercept $\beta_0$; post-ban change $\beta_2$ and its standard error; slope $\beta_1$ and its standard error; R-squared; change in R-squared.]
* Significance at the 5% level

The estimate for the change in the intercept associated with the post-ban period is the feature of greatest interest. Only for Elk and Townline Lakes for Secchi disc depth is this change statistically significant. Part of this may be attributed to the small sample sizes and the large variability, encouraging an examination of the test lakes simultaneously using an "indicator variable" approach. This analysis is described next.

Covariance Analysis for All Test Lakes
Assuming that the test lake-reference lake relationship is similar for all the lakes, improved sensitivity for ban-effect detection is provided by a model that simultaneously considers all test lakes.
Such a model has the form:
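The displayed equation is missing at this point. A form consistent with the term-by-term description that follows, and extending the single-lake model, is sketched here; the coefficient symbols $\gamma_j$, $\delta_j$ and $\theta_j$ are labels assumed for illustration and are not the authors' notation:

$$\log y_t = \beta_0 + \beta_1 \log y_r + \sum_{j=1}^{g} \gamma_j D_j + \sum_{j=1}^{g} \delta_j D_j \log y_r + \beta_2 B + \sum_{j=1}^{g} \theta_j D_j B + \varepsilon,$$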
where each $D_j$ is a dummy (indicator) 0,1 variable denoting whether or not the observation is from the j-th test lake, and $B$ is a 0,1 variable for pre- or post-ban ($g$ denotes the number of test lakes minus one). This model permits estimation of the differential effect on the slope and intercept for the various test lakes. In partitioning the test lake variability, the method of estimation removed the components due to the indicator variables for the different test lakes and due to the reference lakes, before evaluating the ban component. Another way of saying this is that the log test lake response is being considered as the sum of a general intercept, a linear relationship with the log reference lake response, an adjustment in the intercept for the specific test lake, an adjustment in the slope for the specific test lake, an adjustment in the general intercept for the pre/post ban, and an adjustment in the intercept for a specific test lake for the pre/post ban, with the analysis partitioning the test lake response variability into assignable sources in the order listed.

Table 4 provides a summary of the analysis of variance for each of total phosphorus (TP), Secchi disc depth (SD) and chlorophyll-a (CHLA). For the corrected total sum of squares, the variability was partitioned sequentially into the following components: reference lake, intercept adjustment for different test lakes, adjustment of slope of reference lake variables for different test lakes, and finally, intercept adjustment for post/pre-ban effect. Another way of expressing this is that one adjusts the total test lake response variability for potential relationship with the corresponding reference lake response and for individual test lake differences and then examines for the effect of imposition of the ban. Only the Secchi disc depth measurement showed a detectable variation between the pre- and post-ban values at a five percent level of significance. By inspection of the column under $R^2$ in Table 4, one can assess the proportion of variability in the data explained by the model. The model appears to do much better in this respect for total phosphorus (.51) and Secchi disc depth (.71) than it does for chlorophyll-a (.31). Some additional information of potential interest that can be obtained from Table 4 is the proportion of the variability explained by various groups of terms in the model. Those are summarized in Table 5.

TABLE 4. Combined Covariate Analysis.
[Table body not recoverable from the source. Column headings: Model Steps; Parameter; Sum of Squares; D.F.; Mean Square; Test Stat.; $R^2$.]

TABLE 5. Proportion of Variability Explained by Various Sources.
Source                     Measurement   Proportion of Variability   Proportion of
                                         Explained by Model          Total Variability
Reference Lake Covariate   TP            0.31                        0.16
                           SD            0.22                        0.15
                           CHLA          0.19                        0.06
Test Lake Difference       TP            0.63                        0.34
                           SD            0.69                        0.49
                           CHLA          0.71                        0.22
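The sequential partition of variability behind Tables 4 and 5 can be illustrated with a short sketch. It is not the authors' program: the function name `sequential_ss`, the block labels and the synthetic data are assumptions made here, and ordinary least squares on nested design matrices stands in for whatever software was actually used.

```python
import numpy as np

def sequential_ss(log_test, log_ref, lake, post_ban):
    """Sequential sums of squares for the indicator-variable covariance model.

    Terms enter in the order described in the text: reference lake,
    test-lake intercepts, test-lake slopes, ban, lake-specific ban.
    """
    y = np.asarray(log_test, dtype=float)
    x = np.asarray(log_ref, dtype=float)
    B = np.asarray(post_ban, dtype=float)
    lake = np.asarray(lake)
    levels = np.unique(lake)
    # g = (number of test lakes - 1) dummies; the first lake is the baseline
    D = np.column_stack([(lake == lv).astype(float) for lv in levels[1:]])
    blocks = [
        ("reference lake", x[:, None]),
        ("lake intercepts", D),
        ("lake slopes", D * x[:, None]),
        ("ban", B[:, None]),
        ("lake-specific ban", D * B[:, None]),
    ]
    X = np.ones((len(y), 1))                       # general intercept
    total = float(np.sum((y - y.mean()) ** 2))     # corrected total SS
    rss_prev = total
    table = {}
    for name, cols in blocks:
        X = np.hstack([X, cols])
        beta, *_ = np.linalg.lstsq(X, y, rcond=None)
        rss = float(np.sum((y - X @ beta) ** 2))
        table[name] = rss_prev - rss               # SS gained by this block
        rss_prev = rss
    table["residual"] = rss_prev
    table["R2"] = 1.0 - rss_prev / total
    return table

# Synthetic illustration only (three hypothetical lakes, 16 samples each).
rng = np.random.default_rng(1)
lake = np.repeat(["A", "B", "C"], 16)
ban = np.tile(np.repeat([0, 1], 8), 3)
ref = rng.normal(3.0, 0.4, size=48)
resp = 0.4 + 0.8 * ref - 0.1 * ban + rng.normal(0.0, 0.2, size=48)
for source, value in sequential_ss(resp, ref, lake, ban).items():
    print(f"{source:18s} {value:8.3f}")
```

Dividing each block's sum of squares by the corrected total gives a "proportion of total variability", and dividing by the total explained by the model gives a "proportion explained by model", which is how the two columns of Table 5 appear to be related.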
Table 6 lists each test lake's estimates of the slope coefficients for the corresponding reference lake as well as estimates of the amount of shift in the model after the imposition of the ban. The estimated standard deviations of these estimates are listed in parentheses. Asterisks (*) are used to indicate statistical significance at at least the five percent level. A shift associated with the ban was detectable at the five percent level in only 4 of the 21 cases, namely for total phosphorus in Enterprise Lake, Secchi disc depth in Elk and Townline Lakes, and chlorophyll-a in Elk Lake. In two of these cases (total phosphorus for Enterprise Lake and chlorophyll-a for Elk Lake) a positive direction in the post-ban shift is not something that could be attributable to the ban. Hence, from this analysis, the only effect that appears to be associated with the ban is for Secchi disc depth. The relative magnitude of this shift is approximately 10 percent, and, although statistically significant, a question could be raised about the meaningfulness of its significance. The number of slope estimates that are statistically significant is an indicator that the relationship of the reference lakes to the test lakes is accounting for a statistically significant proportion of the variability. These data were useful in making the analysis more sensitive. However, the amount of variability not explained by this relationship is larger still.

TABLE 6. Estimates of the post-/pre-ban shift and slope coefficient for corresponding reference lakes.
[Table body not recoverable from the source. Rows: Swan, Balsam, Butternut, Elk, Enterprise, Moss, Townline; columns give the shift and slope estimates for each response variable, including chlorophyll-a.]
* Statistically significant at at least the 5% level
( ) Standard deviation

Multivariate Analysis/Multiple Comparisons
A general multivariate analysis taking into account the covariance structure of the data was carried out. It complements
the preceding two analyses by using statistical procedures which account for possible correlation of the measurements. To this end, the measurements for a given lake and year were considered to be a single multivariate variable, or vector, for purposes of analysis. For each post-ban year, the vector analyzed actually consisted of the differences from the corresponding sampling times for the single pre-ban year. In one analysis, the test lakes and the reference lakes were considered together. In another analysis, the test lakes were considered separately. In either case, the vectors of differences were analyzed in a two-way table in which the entries were identified by lake and by year. Simultaneous confidence intervals on the differences were also constructed. The description of the procedures for these analyses is given in the Appendix. A similar analysis was also performed for each test lake using the differences of the logarithms of the test lake measurements and the corresponding reference lake measurements, in a sense an analysis of the test lake data adjusted for a potential relationship with its corresponding reference lake. The estimated differences for corresponding dates between post- and pre-ban measurements and their simultaneous confidence intervals are best presented graphically. Figure 3 shows the
results for the three analyses for total phosphorus, Secchi disc depth, and chlorophyll-a. Figure 4 presents a similar analysis for test lakes only. For each response variable and year, the left curve is the lower confidence bound, the middle curve the estimated contrast value, and the right curve the upper confidence bound. A vertical "no effect" line passes through zero. It is clear from Figures 3 and 4 that the ban has not had a statistically significant effect on total phosphorus, chlorophyll-a, or Secchi disc depth, although the general positive nature of the estimate for the latter for all post-ban years may support an indication of some effect for Secchi disc depth. This multivariate analysis was also performed for data constructed from the differences of the log test lake responses and the corresponding log reference lake responses. Graphs of the simultaneous confidence intervals on the differences between post- and pre-ban years for each point in time that was sampled are given in Figure 5. In this analysis no effect of the ban is observable.

Figure 3. Estimated differences and 95% confidence bounds for post-ban effects. Analysis of data from all lakes. [Plot not reproduced. Panels: Total Phosphorus, Secchi Disc Depth, Chlorophyll a; horizontal axis: Contrast Value; curves grouped by post-ban year, 1980 to 1982.]
Figure 4. Estimated differences and 95% confidence bounds for post-ban effects. Analysis of test lakes only, logarithmic transform of data. [Plot not reproduced. Panels: Total Phosphorus, Secchi Disc Depth, Chlorophyll a; horizontal axis: Contrast Value; curves grouped by post-ban year, 1980 to 1982.]
Figure 5. Estimated differences and 95% confidence bounds for post-ban effects. Analysis of difference of logarithmic transform of data between test lakes and reference lakes. [Plot not reproduced. Panels: Total Phosphorus, Secchi Disc Depth, Chlorophyll a; horizontal axis: Contrast Value; curves grouped by post-ban year, 1980 to 1982.]

CONCLUSION
An effect of the phosphate ban, if any, was sufficiently small that its detection with statistical significance was not possible with the amount of variability observed in the data. However, it would appear that models involving reference lake measurements had their sensitivity improved for detecting ban effects. One could use this improvement to estimate the amount of additional test lake measuring that would be needed to provide the same sensitivity if one chose to eliminate sampling the reference lakes. The multivariate/multiple comparison analysis, based upon assumptions that are more supportable, would only have been capable of detecting a ban effect if there had been much more data or if the measurement variability had been greatly reduced. It does permit a valid analysis without the necessity of using a model relating the response to the time of year. Of course, with sufficient frequency of sampling over time to permit reliable estimation of such a model, considerably greater power for detection of a ban effect would result.
APPENDIX
Description of the Multivariate/Multiple Comparison Analysis
Logarithms of the measurements for each response variable for a lake in a year were analyzed as an eight-dimensional vector response. These data were then analyzed as a multivariate two-way layout. The model can be written mathematically as:

$$X_{ijk} = \mu_k + Y_{ik} + L_{jk} + e_{ijk} \qquad (i = 1, \ldots, 4;\ j = 1, \ldots, 9;\ k = 1, \ldots, 8),$$

where $X_{ijk}$ is the kth observation on lake j in year i, $\mu_k$ is the kth component of the grand mean, $Y_{ik}$ is the kth component of the effect of year i, and $L_{jk}$ is the kth component of the effect of lake j. The errors $\{e_{ij1}, e_{ij2}, \ldots, e_{ij8}\}$ are assumed to be independent eight-variable Gaussian with zero mean and covariance matrix $\Sigma$. There are two principal advantages of this model:
1. It takes account of the covariance structure of the data.
2. It is simple, allowing for differences between lakes and years without assuming a specific mathematical model for the difference.

Parameter Estimation
The multivariate analysis of variance closely parallels its univariate counterpart. Maximum likelihood estimates of the effects are given by:
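The displayed estimates are missing at this point. For a balanced two-way layout with one vector observation per lake and year, the usual estimates take the form below; this is offered as the standard result under that assumption rather than as a transcription of the lost display:

$$\hat{\mu}_k = \bar{X}_{\cdot\cdot k}, \qquad \hat{Y}_{ik} = \bar{X}_{i\cdot k} - \bar{X}_{\cdot\cdot k}, \qquad \hat{L}_{jk} = \bar{X}_{\cdot jk} - \bar{X}_{\cdot\cdot k},$$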
where the dot and bar denote averaging over subscripts. The maximum likelihood estimate, $\hat{\Sigma}$, of the error covariance matrix is proportional to the error sum of squares and cross products matrix $E$, where

$$E_{kg} = \sum_{i=1}^{4} \sum_{j=1}^{9} \left(X_{ijk} - \hat{\mu}_k - \hat{Y}_{ik} - \hat{L}_{jk}\right)\left(X_{ijg} - \hat{\mu}_g - \hat{Y}_{ig} - \hat{L}_{jg}\right).$$

The statistical tests of interest are multiple comparisons of contrasts $C_{ik} = Y_{ik} - Y_{1k}$, denoting the difference in measurement k between post-ban year i (1980, 1981, or 1982) and the pre-ban year, 1978. The $C_{ik}$ are estimated by $\hat{C}_{ik} = \hat{Y}_{ik} - \hat{Y}_{1k}$ (i = 1,2,3,4; k = 1,2,3,4,5,6,7,8).
By formula (8), pp. 200-201 of Morrison (1976), the $100(1 - \alpha)$ percent simultaneous confidence intervals on the $\{C_{ik}\}$ for all nine lakes are:

[Displayed interval expression not recoverable from the source.]

Here $x_\alpha$ is the upper $100\alpha$ percentage point of the greatest characteristic root distribution with parameters (in Morrison's notation) s = 3, m = 2, and n = 7.5. We take $\alpha = 0.05$, and find from Chart 11, p. 381 of Morrison that $x_\alpha = 0.665$. Although the above is for all nine lakes, similar expressions can be displayed for the other situations discussed in the Methods Section.
REFERENCES
American Public Health Association. 1976. Standard Methods for the Examination of Water and Wastewater, 14th Edition.
Bell, J.M. and A. Spacie. 1978. "Trophic Status of Fifteen Indiana Lakes in 1977." Purdue University.
Hartig, J.H. and F.J. Horvath. 1982. "A Preliminary Assessment of Michigan's Phosphorus Detergent Ban." Journal of the Water Pollution Control Federation, 54(2): 193-197.
Hirsch, R.M., J.R. Slack, and R.A. Smith. 1982. "Techniques of Trend Analysis for Monthly Water Quality Data." Water Resources Research, 18(1): 107-121.
Hutchinson, G.E. 1973. "Eutrophication." American Scientist, 61: 269-279.
Likens, G.E., ed. 1972. Nutrients and Eutrophication: The Limiting-Nutrient Controversy. American Society of Limnology and Oceanography, Inc. Lawrence, Kansas.
Lillie, R.A. and J.W. Mason. 1983. Limnological Characteristics of Wisconsin Lakes. Technical Bulletin No. 138. Department of Natural Resources, Madison, Wisconsin. 116 pp.
Maki, A.W., D.B. Porcella and R.H. Wendt. 1984. "The Impact of Detergent Phosphorus Bans on Receiving Water Quality." Water Research, 18(7): 893-903.
Menzel, D.W. and N. Corwin. 1965. "The Measurement of Total Phosphorus in Seawater Based on the Liberation of Organically Bound Fractions by Persulfate Digestion." Limnology and Oceanography, 10: 280-282.
Morrison, D.F. 1976. Multivariate Statistical Methods, 2nd Edition. McGraw-Hill Book Company. 415 pp.
Murphy, J. and J.P. Riley. 1962. "A Modified Single Solution Method for the Determination of Phosphate in Natural Waters." Analytica Chimica Acta, 27: 31-36.
National Eutrophication Survey. 1974. "Relationships Between Drainage Area Characteristics and Non-point Source Nutrients in Streams." Working Paper No. 25.
Pieczonka, P. and N.E. Hopson. 1974. "Phosphorus Detergent Bans How Effective?" Water and Sewage Works, July 1974: pp. 52.
Prescott, G.W. 1962. Algae of the Western Great Lakes Area. Wm. C. Brown Company Publishers. Dubuque, Iowa. 977 pp.
Runke, H. 1982. "Effects of Detergent Phosphorus on Lake Water Quality in Minnesota: A Limnological Investigation of Representative Minnesota Lakes, 1975-1980." A report prepared for the Procter and Gamble Company.
Schindler, D.W. 1977. "Evolution of Phosphorus Limitation in Lakes." Science, 195: 260-262.
Trautmann, N.M., C.E. McCulloch and R.T. Oglesby. 1982. "Statistical Determination of Data Requirements for Assessment of Lake Restoration Programs." Canadian Journal of Fisheries and Aquatic Sciences, 39: 607-610.
Van Belle, G. and J.P. Hughes. 1984. "Nonparametric Tests for Trend in Water Quality." Water Resources Research, 20(1): 127-136.
Wetzel, R.G. 1975. Limnology. W.B. Saunders, Philadelphia. 743 pp.
Wisconsin Department of Natural Resources. 1975. "Classification of Wisconsin Lakes by Trophic Condition: April 15, 1975." G. Anderson, ed. WDNR, Bureau of Water Quality, Madison, WI. 108 pp.
THE CHANGE POINT PROBLEM: A REVIEW OF APPLICATIONS V.K. Jandhyala and I.B. MacNeill, The University of Western Ontario, London, Ontario, Canada N6A 5B9
Introduction
In the context of process inspection schemes, Page (1955) proposed a test for change in a parameter occurring at an unknown time point. Since then, an extensive literature on this problem has appeared in various scientific journals. The problem has been dealt with under several model assumptions using existing and new statistical methodologies. While a majority of these papers contain theoretical developments and simulated data analysis, many contain both discussion of models applied to actual data obtained from a variety of practical situations and also analysis for the purpose of detection and estimation of change points at unknown times. The aim of this paper is to review those papers that contain analysis of statistical models applied to real data. This literature contains: new modelling techniques; new methods of data analysis developed from the Bayesian approach and from likelihood methods; and non-parametric methods. The models and corresponding analyses have been applied to various types of data. While statistical modelling and analysis of the change point problem originated with Page (1955), literature dealing with actual application to data was initiated in 1971 by Bacon and Watts, who proposed estimating the transition between two intersecting straight lines using a smooth transition function. They then applied the procedure to the estimation of change in the behaviour of stagnant surface layer height in a controlled flow of water down an inclined channel. Since then, the following authors have made contributions to both theory and applications: Griffiths and Miller (1973), Sen and Srivastava (1975), Brown, Durbin and Evans (1975), Schweder (1976), Tsurumi (1977), Bagshaw and Johnson (1977), Pettit (1979), Hsu (1979, 1982), Smith and Cook (1980), Esterby and El-Shaarawi (1981), Menzefricke (1981), Worsley (1983), Commenges and Seal (1985) and MacNeill (1985).
Two-Regime Transition Models
Bacon and Watts (1971) studied the change in behaviour of stagnant surface layer height in a controlled flow of water down an inclined channel using different surfactants. The data have been analysed by modelling them with a two-regime transition model that is sensitive to changes in the slope of a simple linear regression model. The transition model is given by

$$Y = \alpha_0 + \alpha_1 (x - x_0) + \alpha_2 (x - x_0)\,\mathrm{trn}\big((x - x_0)/\gamma\big) + \varepsilon, \qquad (1)$$

where $\mathrm{trn}((x - x_0)/\gamma)$ is a transition function satisfying the following smoothness conditions:

(i) $\lim_{s \to \infty} \mathrm{trn}(s/\gamma) = 1$,
(ii) $\mathrm{trn}(0) = 0$,
(iii) $\lim_{\gamma \to 0} \mathrm{trn}(s/\gamma) = \mathrm{sgn}(s)$,
(iv) $\lim_{s \to \infty} s\,\mathrm{trn}(s/\gamma) = s$,

and $\varepsilon$ is a random variable representing error. This model is reported to be insensitive to the particular form of the transition function, and hence a transition function of the form $\mathrm{trn}(s/\gamma) = \tanh(s/\gamma)$ is used. The parameters of the model are then obtained by a Bayesian approach. The joint marginal posterior density of $x_0$ and $\gamma$ was calculated and a peak was noted in the probability density function. Similar analysis has also been performed on another water flow data set.
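To make the role of the transition function concrete, the mean of the two-regime model with the hyperbolic tangent transition can be sketched as follows. The function name `transition_model` and the parameter values are hypothetical and chosen only for illustration; they are not estimates from Bacon and Watts (1971).

```python
import numpy as np

def transition_model(x, a0, a1, a2, x0, gamma):
    """Mean response a0 + a1*(x - x0) + a2*(x - x0)*tanh((x - x0)/gamma)."""
    s = np.asarray(x, dtype=float) - x0
    return a0 + a1 * s + a2 * s * np.tanh(s / gamma)

# Hypothetical parameter values, for illustration only.
params = dict(a0=1.0, a1=-0.5, a2=-0.3, x0=0.0, gamma=0.1)

# Far to the left of x0 the slope tends to a1 - a2; far to the right it
# tends to a1 + a2. gamma controls how abruptly the slope changes.
left_slope = transition_model(-9.0, **params) - transition_model(-10.0, **params)
right_slope = transition_model(10.0, **params) - transition_model(9.0, **params)
print(left_slope, right_slope)
```

A small value of `gamma` gives a nearly abrupt change of slope at `x0`, while a large value spreads the transition over a wide range of `x`.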
Griffiths and Miller (1973) analyzed the same water flow data by modelling it as a regression involving a modified smooth transition function. The transition function suggested by Griffiths and Miller (1973) is

$$\mathrm{trn}(s/\gamma) = \frac{\sqrt{s^2 + \gamma}}{s}, \qquad (2)$$

thereby relaxing the condition $\mathrm{trn}(0) = 0$ assumed by Bacon and Watts (1971). This transition function makes the regression line appear as a bent hyperbola. The actual fitted model for the first data set of stagnant band heights in a controlled flow of water is

$$\hat{Y} = 0.556 - 0.735(x - 0.063) - 0.359\sqrt{(x - 0.063)^2 + 0.096}. \qquad (3)$$

The model of Bacon and Watts (1971) and this model are both adequate.
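As a worked illustration of the bent hyperbola, the fitted equation (3) can be evaluated directly. The sketch below assumes that equation (3) reads 0.556 - 0.735(x - 0.063) - 0.359 sqrt((x - 0.063)^2 + 0.096); the square root is an interpretation of the printed form, and the function name `fitted_height` is introduced here.

```python
import math

def fitted_height(x):
    """Fitted bent-hyperbola model, reading equation (3) with a square root."""
    d = x - 0.063
    return 0.556 - 0.735 * d - 0.359 * math.sqrt(d * d + 0.096)

# The two asymptotes of the hyperbola have slopes -0.735 + 0.359 = -0.376
# (far left of the bend) and -0.735 - 0.359 = -1.094 (far right).
left_slope = fitted_height(-999.0) - fitted_height(-1000.0)
right_slope = fitted_height(1000.0) - fitted_height(999.0)
print(round(left_slope, 3), round(right_slope, 3))
```

Under this reading the regression passes smoothly from one straight-line regime to the other, with the bend centred at x = 0.063.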
Changes in Mean Level
Sen and Srivastava (1975) analysed the following data sets for traffic from 1962 through 1971 in the State of Illinois: number of traffic deaths, number of thousands of traffic injuries, number of thousands of traffic accidents, and the number of deaths per hundred million vehicle miles. The data are modelled as

$$X_i = Y_{i+1} - Y_i = \theta_i + \varepsilon_i \qquad (4)$$

for the purpose of detecting change of parameters at unknown time points. In the above, $Y_i$ is any of the four types of traffic data for the ith year and the $\varepsilon_i$'s are error variables assumed to be normal with mean zero and variance $\sigma^2$, which is unknown. The statistic suggested for detecting changes in the $\theta_i$'s at unknown times is

$$P = U/\sqrt{V}, \qquad (5)$$

where

$$U = \sum_{i=1}^{n-1} i\,(X_{i+1} - \bar{X})$$

and

$$V = \sum_{i=1}^{n-1} (X_{i+1} - X_i)^2 / [2(n-1)].$$

The computed values of $n^{-2}P$ for the four data sets were 0.164, 0.283, 0.069 and 0.198, respectively. The 95% significance point is 0.155; hence changes are detected in the first, second and fourth data sets. The significance level was based on a Monte Carlo study of 5000 simulations.
Detecting Changes in Regression Using Cusum Schemes
Brown, Durbin and Evans (1975) developed tests for the stability of relationships over time based on cumulative sums of recursive residuals and cumulative sums of squares of recursive residuals. A computer package called TIMVAR was developed to implement these methodologies. These tests were applied to three practical examples. First, the methodology developed by Brown et al. has been applied to a regression model to explain growth in the number of local telephone calls. The model involves a constant and four independent variables. The cusum residual plot and the cusum of squares of residuals plot have been obtained through TIMVAR. The analysis showed no change up to 1964/65 and then indicated instability thereafter. The second example analyzed is one concerning the International Monetary Fund. If

$M_t$ = per capita stock of money,
$R_t$ = long term interest rate, and
$Y_t$ = per capita income,

then the model proposed is

$$\Delta \log M_t = \alpha + \beta \log \Delta R_t + \gamma \log \Delta Y_t + \varepsilon_t, \qquad (6)$$

where $\Delta$ is the difference operator. TIMVAR analysis detected no changes that were statistically significant. Similar modelling and analysis has also been done on certain civil service data.

Schweder (1976) studied the relative growth of different body parts of fin whales. A point of structural shift in a whale's life usually indicates that the whale has entered a new phase in its development. With

$X_i$ = log length of whale i,
$Y_i$ = log height of dorsal fin of whale i, and
$Z_i$ = log length of base of dorsal fin of whale i, i = 1, 2, ..., 108,

the relations

$$Y_i = \alpha_1 + \beta_1 X_i + e_{1i} \qquad (7)$$

and

$$Z_i = \alpha_2 + \beta_2 Y_i + e_{2i} \qquad (8)$$

were studied for structural shifts. A structural shift is indicated if model (7) is transformed to

$$Y_i = \alpha_1 + \beta_1 X_i + \gamma + e_{1i} \qquad (9)$$

for some unknown i. The observations were ordered such that $x_1 < x_2 < \ldots < x_{108}$. A cusum procedure was developed to test for structural shifts. The minimum of the cusum has been observed to be -32.4 with a significance probability of 0.0014, indicating a structural shift in the regression of Y on X. The point of shift is estimated by the point at which the cusum is a minimum, and is given by $\hat{\imath} = 48$. Similar analysis has been applied to model (8).

Bagshaw and Johnson (1977) analysed the first 112 observations of the series for IBM stock given in Box and Jenkins (1970). They fitted an IMA(1,1) model with non-zero mean:

$$\nabla z_t = 1.28 + (1 + 0.29B)a_t; \qquad (11)$$

that is, $\hat{\sigma}_a^2 = 25.28$, $\hat{\theta} = -0.29$ and $\hat{\theta}_0 = 1.28$. Then, a cusum test developed for testing parameter changes in ARIMA models was applied to detect a change in $\theta$ at observation 270.
Non-Parametric Tests for Change of Parameter at Unknown Time
Pettit (1979) analyzed the Lindisfarne Scribe's binomial data for changes at unknown times using a non-parametric method. The data refer to the number of occurrences of present indicative third person singular endings "-s" and "-ð" for different sections of Lindisfarne. It is believed different scribes used the endings "-s" and "-ð" in different proportions. Pettit (1979) developed a non-parametric test for detecting changes at unknown times; this test is a version of the two-sample Mann-Whitney test. For testing change against no change, the statistics are

$$K_T = \max_{1 \le t < T} |U_{t,T}|, \qquad (12)$$

$$K_T^{+} = \max_{1 \le t < T} U_{t,T}, \qquad (13)$$

and

$$K_T^{-} = -\min_{1 \le t < T} U_{t,T}, \qquad (14)$$

where

$$U_{t,T} = \sum_{i=1}^{t} \sum_{j=t+1}^{T} D_{ij} \qquad (15)$$

and $D_{ij} = \mathrm{sgn}(X_i - X_j)$. Exact distributional properties were obtained for Bernoulli random variables. Using a simple modification of the Bernoulli for binomial data, the Lindisfarne data were analysed associating "-s" with one and "-ð" with zero. The value of

$$K_T = \max_{1 \le t < T} |U_{t,T}| \qquad (16)$$

was found to be 7906 and the standardized statistic was 1.83. This standardized statistic has the same distribution as that of Smirnov's statistic, and from this the significance level was found to be 0.25 percent, hence strongly indicating a change. The same technique has been used to analyze industrial data.

Detecting Parameter Changes Using Bayesian Methods
Smith and Cook (1980) studied changes at unknown times in the functioning of a transplanted kidney by formulating it as a simple linear regression model given by:

$$Y_i = \alpha_1 + \beta_1 x_i + e_i, \qquad i = 1, \ldots, m \qquad (20)$$

and

$$Y_i = \alpha_2 + \beta_2 x_i + e_i, \qquad i = m+1, \ldots, n. \qquad (21)$$

If the two lines meet at $\gamma = (\alpha_1 - \alpha_2)/(\beta_2 - \beta_1)$, then in the renal transplant application, $\gamma$ corresponds to the time at which a rejection occurs. Data from two patients were considered and an unconstrained version of the model was analysed by Bayesian methods with a vague prior specification consisting of the uniform distribution over $2 \le m \le n - 1$. The posterior densities for $m$ and $\gamma$ were obtained. Using these posterior distributions, the change points $m$ and $\gamma$ were estimated for both patients.
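Returning briefly to the non-parametric statistics of equations (12) to (15), they are simple to compute directly from their definitions. The following is a minimal sketch; the function name `pettitt_statistics` and the toy series are assumptions made here, and no significance level is computed, since the text gives exact results only for the Bernoulli case.

```python
import numpy as np

def pettitt_statistics(x):
    """Compute U_{t,T} for t = 1..T-1 and the statistics K, K+ and K-."""
    x = np.asarray(x, dtype=float)
    T = len(x)
    # D[i, j] = sgn(x_i - x_j)
    D = np.sign(x[:, None] - x[None, :])
    # U_{t,T}: sum of D over i <= t < j (0-based slices below)
    U = np.array([D[:t, t:].sum() for t in range(1, T)])
    return {
        "U": U,
        "K": float(np.abs(U).max()),
        "K_plus": float(U.max()),
        "K_minus": float(-U.min()),
        "t_hat": int(np.argmax(np.abs(U))) + 1,   # most likely change point
    }

# Toy binary series with an obvious change after the third observation.
result = pettitt_statistics([0, 0, 0, 1, 1, 1])
print(result["U"], result["K"], result["t_hat"])
```

The position `t_hat` at which |U| is largest is the natural estimate of the change point; this direct version costs O(T^2) memory, which is acceptable for series of moderate length.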
Hsu (1979, 1982) analyzed U.S. stock market prices from July 1971 through August 1974 for detecting parameter shifts in the variance at unknown times. Hsu (1979) also analyzed air traffic densities in the New York area. Hsu (1979) used the statistic

$$T = \frac{\sum_{i=1}^{n} (i - 1) X_i}{(n - 1) \sum_{i=1}^{n} X_i} \qquad (18)$$

in the standardized form given by

$$T^{*} = \frac{T - 1/2}{\sqrt{\mathrm{Var}(T)}} \qquad (19)$$

to detect changes at unknown times in the variances. The statistic $T^{*}$ for the squared series has been found to be 3.521, which is substantially larger than the critical point 2.326 at a 0.01 right-side level. The change point was then estimated by maximum likelihood and found to be the 89th time point. This corresponds to mid-March 1973, when Watergate events caught the full attention of the U.S. public. Hsu (1982) proposed a step change in the parameters for the same stock market price data and analyzed them using a Bayesian inference procedure in a modified form that was basically developed by Box and Tiao (1973). The posterior probability functions for the change point indicated that a change in the market return distribution occurred in late February or March of 1973. It can be seen that these results coincide with the analysis in Hsu (1979).

Hsu (1979) also studied the problem of detecting changes in air traffic densities observed in the New York area. Arrivals at the New York airports on a single day were considered for the purpose of analysis. The 213 arrival times were first analyzed to establish the inter-arrival time densities; the exponential distribution was found to fit well. Then $T^{*}$ was calculated to be 1.232, which is well below the significance bounds, thus suggesting that the aircraft were arriving at constant rates.
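The statistic T and its standardized form can be sketched in a few lines. The null variance used below is not taken from the text: it is derived here under the assumption that, with no change, the observations are independent gamma variables with a common shape parameter, in which case T has mean 1/2 and variance (n + 1) / (12 (n - 1) (n * shape + 1)). The function name `hsu_statistic` and the argument `gamma_shape` are likewise introduced for illustration.

```python
import math

def hsu_statistic(values, gamma_shape):
    """Return (T, T*) for a sequence of non-negative observations.

    gamma_shape = 0.5 corresponds to squared zero-mean normal deviates,
    gamma_shape = 1.0 to exponential inter-arrival times (assumed nulls).
    """
    n = len(values)
    total = sum(values)
    # enumerate() is 0-based, so the index equals (i - 1) for i = 1..n
    T = sum(k * v for k, v in enumerate(values)) / ((n - 1) * total)
    var_T = (n + 1) / (12.0 * (n - 1) * (n * gamma_shape + 1))
    return T, (T - 0.5) / math.sqrt(var_T)

# A perfectly stable series gives T = 1/2 and a standardized value of zero.
print(hsu_statistic([1.0] * 5, gamma_shape=1.0))   # -> (0.5, 0.0)
```

Large positive values of the standardized statistic indicate that the later observations carry more of the total, as would follow an upward shift in variance or in mean inter-arrival time.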
Tsurumi (1977) examined whether there was a parameter shift in consumer expenditures on vitamins and other nutritional supplements in the Japanese pharmaceutical industry. The demand curve is taken to be

$$\ln y_i = \alpha \ln p_i + \beta \ln x_i + u_i, \qquad i = 1, 2, \ldots, 16,$$

where:

$y_i$ = real average expenditures on vitamins and other nutritional supplements by consumers in income group i, 1965 Yen;
$p_i$ = relative price of nutritional supplements to consumers in income group i, 1965 = 100;
$x_i$ = real average disposable income of consumers in income group i, in thousands of 1965 Yen.

The study was based on data from 1969 to 1974. A Bayesian method was developed to test for parameter shifts. The posterior probability density functions of $\gamma_t = \beta_t - \beta_{t-1}$ and $\gamma_t' = \alpha_t - \alpha_{t-1}$ were derived based on diffuse priors. The results indicate that parameter shifts from one year to another did not occur in the coefficients of the price variable, but rather in those of the income variable. The shift in the income variable was estimated to have occurred in the year 1971.

Menzefricke (1981) studied the stock return data of Hsu (1979) and the industrial data of Pettit (1979) using a Bayesian procedure for detecting changes in precision at unknown times. For the stock return data, Menzefricke (1981) hypothesized the model:

$$X_i \sim N(\mu_1, \eta_1^{-1}), \qquad i = 1, 2, \ldots, m \qquad (24)$$

and

$$X_i \sim N(\mu_2, \eta_2^{-1}), \qquad i = m+1, \ldots, n. \qquad (25)$$

Based on a vague prior, the posterior probability function on $m$ was determined and the mode was detected at m = 89. Thus, this result is in close agreement with the result of Hsu (1979, 1982). Menzefricke (1981) then analyzed the industrial data giving the percentages of a particular material in 27 batches. These data were first analyzed by Pettit (1979) using a non-parametric method. Pettit (1979) concluded that a change occurs in period 16. Menzefricke (1981) applied the Bayesian method and found results in close agreement with those of Pettit.
Detection and Estimation of Change Points Using Likelihood Methods
Esterby and El-Shaarawi (1981) performed an analysis on the change of pollen concentration in a lake-sediment core by making inferences about the point at which changes occur in the relationship between concentration and depth. The data were modelled by a regression of the form
$$Y_i = \sum_{j=0}^{p} \theta_{1j} x_i^{j} + \epsilon_{1i}, \qquad i = 1, \ldots, m \qquad (22)$$
and
$$Y_i = \sum_{j=0}^{q} \theta_{2j} x_i^{j} + \epsilon_{2i}, \qquad i = m+1, \ldots, n, \qquad (23)$$
where $p$ and $q$ are unknown. The unknown parameters $p$, $q$ and $m$ are estimated by marginal, conditional and maximum likelihood methods. It was found that $\hat{p} = \hat{q} = 4$ and $\hat{m} = 12$. The analysis has been carried out under the assumption that $\sigma_1^2 \ne \sigma_2^2$ for the two regimes. The methods have been applied to other data sets.
Worsley (1983) analyzed the Gross Domestic Product and labour and capital input in the United States for the years 1929-1967 by modelling the logarithm of the gross domestic product as a linear function of the logarithms of the labour input and capital input. The log likelihood ratio statistics under the assumptions $\sigma_1 \ne \sigma_2$ and $\sigma_1 = \sigma_2$ have been computed and the distributions of the maxima of these statistics were approximated by a Bonferroni inequality. The analysis indicated a significant change in the year 1942 and another significant change in the year 1946.
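The idea behind estimating the change point m in a two-regime regression such as (22)-(23) can be illustrated with a deliberately simplified sketch. It assumes straight lines in both regimes (p = q = 1) and a common error variance, in which case the maximum likelihood estimate of m is the split that minimizes the pooled residual sum of squares. This is not the marginal or conditional likelihood procedure of Esterby and El-Shaarawi (1981); the function names and the data are hypothetical.

```python
def line_rss(xs, ys):
    """Residual sum of squares of the least-squares straight line through (xs, ys)."""
    n = len(xs)
    mean_x = sum(xs) / n
    mean_y = sum(ys) / n
    sxx = sum((x - mean_x) ** 2 for x in xs)
    sxy = sum((x - mean_x) * (y - mean_y) for x, y in zip(xs, ys))
    slope = sxy / sxx
    intercept = mean_y - slope * mean_x
    return sum((y - intercept - slope * x) ** 2 for x, y in zip(xs, ys))


def estimate_change_point(xs, ys, min_segment=3):
    """Return (m, rss): the first regime is observations 1..m, the second m+1..n."""
    n = len(xs)
    best_m, best_rss = None, float("inf")
    for m in range(min_segment, n - min_segment + 1):
        rss = line_rss(xs[:m], ys[:m]) + line_rss(xs[m:], ys[m:])
        if rss < best_rss:
            best_m, best_rss = m, rss
    return best_m, best_rss


# Synthetic, noise-free example: the relationship changes after the 12th point.
depth = [float(i) for i in range(20)]
conc = [1.0 + 0.5 * x if i < 12 else 20.0 - 1.0 * x for i, x in enumerate(depth)]

m_hat, rss = estimate_change_point(depth, conc)
print(m_hat, rss)
```

With unequal variances in the two regimes, the criterion would instead combine the logarithms of the two segment variances; that case is not shown.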
Commenges and Seal (1985) considered a key problem occurring in neurophysiology: determining whether, after presentation of a stimulus, there has been a modification in the discharge of a recorded neuron. The problem was treated as that of estimating a change point in a sequence of random variables. The analysis involved a window method which was applied systematically to look for changes corresponding to a decrease in the mean time interval between action potentials.
Detection of Regression Parameter Changes at Unknown Times Using Raw Regression Residuals
MacNeill (1985) considered the series of annual flows of the Nile river for the period 1870-1945 and analyzed the series for unknown interventions. A detection procedure, called Adaptive Forecasting and Estimation using Change-Detection, or AFECD for short, was developed to detect interventions at unknown times in time series. AFECD was applied and a change was detected in the river flow in the year 1903. This change corresponds to the year when the dam at Aswan was constructed. The procedure also indicated a negative slope to the river flow over the period leading up to the change and a flat slope thereafter. This has been interpreted as a positive effect of the dam on river flow. The AFECD procedure is based on a change detection statistic computed from the raw regression residuals.
A detailed derivation of this change detection statistic and of several other change detection statistics, along with their distributional results, can be found in the unpublished Ph.D. dissertation of Jandhyala (1985).
REFERENCES
Bacon, D.W., and Watts, D.G. (1971). Estimating the transition between two intersecting straight lines. Biometrika 58, 525-534.
Bagshaw, M., and Johnson, A.R. (1977). Sequential procedures for detecting parameter changes in a time series model. Journal of the American Statistical Association 72, 593-597.
Box, G.E.P., and Jenkins, G.M. (1970). Time Series Analysis: Forecasting and Control. San Francisco: Holden-Day.
Box, G.E.P., and Tiao, G.C. (1973). Bayesian Inference in Statistical Analysis. Reading, Mass.: Addison-Wesley.
Brown, R.L., Durbin, J., and Evans, J.M. (1975). Techniques for testing the constancy of regression relationships over time. Journal of the Royal Statistical Society, Series B 37, 149-192.
Commenges, D., and Seal, J. (1985). The analysis of neuronal discharge sequences: change point estimation and comparison of variances. Statistics in Medicine 4, 91-94.
Esterby, S.R., and El-Shaarawi, A.H. (1981). Inference about the point of change in a regression model. Applied Statistics 30, 277-285.
Griffiths, D.A., and Miller, A.J. (1973). Hyperbolic regression - a model based on two-phase piecewise linear regression with a smooth transition between regimes. Communications in Statistics 2, 561-569.
Hsu, D.A. (1979). Detecting shifts of parameter in gamma sequences with applications to stock price and air traffic flow analysis. Journal of the American Statistical Association 74, 31-40.
Hsu, D.A. (1982). A Bayesian robust detection of shift in the risk structure of stock market returns. Journal of the American Statistical Association 77, 29-39.
Jandhyala, V.K. (1985). Residual processes for regression models with applications to detection of parameter changes at unknown times. Unpublished Ph.D. thesis, The University of Western Ontario, London, Ontario.
MacNeill, I.B. (1985). Detecting unknown interventions with application to forecasting hydrological data. Water Resources Bulletin 21, 785-796.
Menzefricke, V. (1981). A Bayesian analysis of a change in the precision of a sequence of independent random variables at an unknown time point. Applied Statistics 30, 141-146.
Page, E.S. (1955). A test for a change in a parameter occurring at an unknown time point. Biometrika 42, 523-526.
Pettit, A.N. (1979). A non-parametric approach to the change point problem. Applied Statistics 28, 126-135.
Schweder, T. (1976). Some optimal methods to detect structural shifts or outliers in regression. Journal of the American Statistical Association 71, 491-501.
Sen, A.K., and Srivastava, M.S. (1975). Some one-sided tests for change in level. Technometrics 17, 61-64.
Smith, A.F.M., and Cook, D.G. (1980). Straight lines with a change point: A Bayesian analysis of some renal transplant data. Applied Statistics 29, 180-189.
Tsurumi, H. (1977). A Bayesian test of a parameter shift and an application. Journal of Econometrics 6, 371-380.
Worsley, K.J. (1983). Testing for a two-phase multiple regression. Technometrics 25, 35-42.
SPECTRAL ANALYSIS OF LONG-TERM WATER QUALITY RECORDS
PAUL H. WHITFIELD, INLAND WATERS DIRECTORATE, ENVIRONMENT CANADA, VANCOUVER, B.C.
ABSTRACT
The Greater Vancouver Regional District (GVRD) provides, along with other services, drinking water to the communities of Greater Vancouver. The total supply of water comes from three sources: the Capilano, Seymour and Coquitlam Rivers. As part of an extensive quality control program, the GVRD measures water temperature daily and pH and turbidity three to four times each week. These records commence in 1959 and continue to the present. These records were reduced to weekly averages and examined in the frequency domain using spectral analysis. The frequency approach involves estimating how much of the variation in the data arises from various frequency bands. In analysing the data presented here, the application of spectral analysis as an aid to identifying significant frequency components is examined.
INTRODUCTION
One of the objectives for gathering water quality data is the detection of trends. For trend assessment programs to be effective, the manner in which data are collected (i.e. design) and the data analysis methods must be complementary. The ultimate operational goal in trend assessment monitoring is to be able to detect the smallest possible change or trend while minimizing the amount of data gathered. This paper considers practical application of spectral analysis techniques to long-term water quality records. These records are described subsequently. Some aspects and results of spectral analysis which are useful in trend assessment will be demonstrated.
There are basically two approaches to time series analysis,
a frequency domain (or spectral) approach, and a time domain (Box-Jenkins) approach. Time series analysis methods are concerned with the following aims or goals: (1) description; (2) explanation; (3) prediction; and (4) control. Spectral analysis is a method for translating from the time domain to the frequency domain and back again. The use of these transformations has been greatly improved by the addition of the fast Fourier transform and high speed computers.
The premise of spectral analysis is that a time series can be meaningfully represented by pure sine or cosine waves summed over a range of frequencies (Bloomfield, 1976). Spectral analysis uses the Fourier transform of the time series to obtain the coefficients of the sinusoids at a discrete set of frequencies. Grouping neighbouring frequencies smooths the spectrum and enhances the statistical stability of the estimates (Chatfield, 1984).
The spectral analysis of single variables can be extended to pairs of variables. One of the most important frequency domain quantities is the coherence of the two variables. The coherence of two variables reflects the linear correlation between the variables in different frequency bands.
Spectral analysis has many useful applications to the analysis of time series data (Brillinger 1981a, 1981b; Chatfield, 1984; Chatfield and Pepper 1971; Jones 1964, 1965, etc.). Several of these applications are particularly useful in the analysis of water quality data. The material which follows avoids the mathematical areas of spectral analysis and concentrates on the use of spectra for identifying pertinent features. The variables considered here reflect some of the features that are normally seen in water quality data records, and their spectra allow their illustration. These data features include seasonality, trend, non-normal distribution and changing detection limits.
The examples presented here are limited to simple aspects of spectral analysis, namely smoothed spectra, and how these can provide useful information in the analysis of time series.
DATA SERIES
Time series data from three rivers for three different variables are presented here (Figures 1-3). The three rivers are the Capilano, Seymour and Coquitlam Rivers. These three rivers are located north of the City of Vancouver and provide its drinking water. Data for the three variables, temperature, turbidity and pH, are gathered by the Greater Vancouver Regional District at the water intakes.
The raw data were gathered on a daily basis. Measurements of temperature were made each day over the period of record. Measurements of turbidity and pH were made from three to five times each week. These measurements were converted to computer form using Lotus 1-2-3. The input daily measurements were then processed to weekly averages and temperatures converted from Fahrenheit to Celsius where needed.
The records for the Capilano and Coquitlam Rivers start in January 1959, and the Seymour River record begins in January 1961. These long records, some 1400 weekly values in each, provide a nearly ideal set of records for evaluating spectral methods. Temperature is a highly seasonal variable with higher values dominating in summer and lower values in winter. In addition, temperature values are highly autocorrelated, with warmer values being followed by warmer values and similarly for cooler periods (Whitfield & Woods, 1984).
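The reduction of daily readings to weekly averages, with the Fahrenheit to Celsius conversion, can be sketched as follows. The original processing was done in Lotus 1-2-3 and its details are not given, so this is only an assumed reconstruction: weeks are taken as consecutive blocks of seven days, missing readings are marked with None, and both function names are hypothetical.

```python
def fahrenheit_to_celsius(f):
    """Convert a temperature from degrees Fahrenheit to degrees Celsius."""
    return (f - 32.0) * 5.0 / 9.0


def weekly_means(daily, days_per_week=7):
    """Average consecutive blocks of daily values, ignoring missing (None) readings.

    A week with no readings at all yields None.
    """
    means = []
    for start in range(0, len(daily), days_per_week):
        block = [v for v in daily[start:start + days_per_week] if v is not None]
        means.append(sum(block) / len(block) if block else None)
    return means


# Illustrative values only: one complete week, then a week sampled twice.
daily_f = [50.0, 51.8, 53.6, 50.0, 48.2, 50.0, 51.8,
           None, 59.0, None, None, 68.0, None, None]
daily_c = [None if v is None else fahrenheit_to_celsius(v) for v in daily_f]
print(weekly_means(daily_c))
```

Averaging only the available readings is what allows variables measured three to five times each week to be placed on the same weekly time base as the daily temperature record.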
Figure 1. Temperature records, 1959-1985 (GVRD data; panels: Capilano River, Seymour River, Coquitlam River).
Figure 2. pH records, 1959-1985 (GVRD data; panels: Capilano River, Seymour River, Coquitlam River).
Figure 3. Turbidity records, 1959-1985 (GVRD data; panels: Capilano River, Seymour River, Coquitlam River).
Figure 1 shows temperature records for the three rivers, showing the seasonal variation mentioned previously. The Capilano and Coquitlam water supplies are both fed by relatively small reservoirs which buffer the excursion of temperatures. This is particularly evident when all temperatures in excess of 15°C are considered. Water temperatures in the Seymour River exceed 15°C almost every year while the other two rarely exceed this value.
The records of pH are shown in Figure 2; they do not show a pronounced seasonality. Coquitlam pH values are somewhat lower than those of either the Capilano or the Seymour. Of particular interest is the long term pH drift that close visual inspection of Figure 2 suggests. Is this a real trend, or an artifact?
Turbidity is a highly episodic variable, with periods of high turbidity reflecting intense rainfall events and other processes which introduce sediment into the rivers. Figure 3 demonstrates the highly episodic nature of turbidity. Coupled with the episodic nature of these records are two other features. First, the records show two changes in the detection limit of the analytical method. These are reflected in changes of the baseline in late 1965 and again at the end of 1969. Second is the occurrence of periods where there is a higher likelihood of high turbidity values being observed. The Capilano River shows regular turbidity peaks in the mid-winter period. Similar results occur for the Seymour River and for the Coquitlam River, but mid-winter peaks occur with less regularity. Does this non-normal distribution of data influence the ability of spectral analysis to aid in the evaluation of these data?
DATA ANALYSIS
All statistical analyses presented were performed using the Univariate and Bivariate Spectral analysis program (P1T) contained in the BMDP package of computer programs. This program provides graphic displays and descriptive statistics for single or paired time series (Dixon, 1983).
BANDWIDTH EFFECTS
One of the basic goals of spectral analysis is to determine how much variation in the data set is accounted for by different frequency bands of sine waves. The fast Fourier transform of a data series provides a periodogram, which gives the spectrum necessary to precisely duplicate the input series. Figure 4 shows one such periodogram for Capilano River water temperature. Detailed spectra such as this are difficult to interpret.
Spectral densities can also be estimated for frequency bands. For wider bandwidths, the periodograms are averaged over a range of frequencies to form estimated spectral densities. This enhances the stability of the estimates.
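The relation between the raw periodogram and a band-averaged estimate can be sketched as below. This is not the BMDP program: it uses a plain discrete Fourier transform, a simple unweighted moving average over neighbouring ordinates (truncated at the ends), and a synthetic weekly series with one cycle per 52 weeks. The function names, the scaling of the periodogram and the window shape are all assumptions made for illustration.

```python
import cmath
import math


def periodogram(x):
    """Return (freqs, power) at the Fourier frequencies k/n, k = 1..n//2.

    Frequencies are in cycles per sampling interval; the mean is removed first.
    """
    n = len(x)
    mean = sum(x) / n
    d = [v - mean for v in x]
    freqs, power = [], []
    for k in range(1, n // 2 + 1):
        coef = sum(d[t] * cmath.exp(-2j * math.pi * k * t / n) for t in range(n))
        freqs.append(k / n)
        power.append(abs(coef) ** 2 / n)
    return freqs, power


def band_average(power, half_width):
    """Average each ordinate with its neighbours (2 * half_width + 1 values at most)."""
    smoothed = []
    for i in range(len(power)):
        lo = max(0, i - half_width)
        hi = min(len(power), i + half_width + 1)
        smoothed.append(sum(power[lo:hi]) / (hi - lo))
    return smoothed


# Four years of synthetic weekly data with a purely annual cycle.
weeks = 208
x = [10.0 + 5.0 * math.sin(2.0 * math.pi * t / 52.0) for t in range(weeks)]

freqs, power = periodogram(x)
smoothed = band_average(power, half_width=2)
peak = power.index(max(power))
print(freqs[peak] * 52.0)           # peak frequency expressed in cycles per year
print(max(power), max(smoothed))    # averaging lowers and broadens the peak
```

A wider averaging window plays the role of a wider bandwidth: the estimate becomes more stable, but a sharp seasonal line is spread over the neighbouring frequencies.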
Averaging does, however, give less detail than narrower bandwidths. Wider bandwidths provide a more reliable picture of the observed series. Figure 5 shows wider bandwidth plots for Capilano River temperatures, namely 0.0005, 0.0018, 0.0064 and 0.0128. This figure shows the progressive smoothing with increased bandwidth. A bandwidth of 0.0128 was used in all other spectral estimates presented here. This width, while perhaps overly smoothed, shows the characteristics we wish to consider.
Figure 4. Periodogram of Capilano River water temperature.
Figure 5. Spectral estimates for Capilano River temperature at bandwidths 0.0005, 0.0018, 0.0064 and 0.0128.
ARITHMETIC OR LOGARITHMIC PLOTS?
One additional area which needs mentioning regarding spectral plots is the pros and cons of using log plots or linear plots. The choice of plotting the estimated spectrum or its logarithm depends upon a number of considerations. The advantage of plotting the estimated spectrum on a logarithmic scale is that its variance is independent of the level of the spectrum, so that confidence intervals are of constant width. Logarithmic plots also have value where large variations in power exist, and allow more detail to be shown. The converse is that logarithmic plots show exaggerated effects where variation is small. Interpreting a spectrum plotted on an arithmetic scale is straightforward, since the area under the curve corresponds to power. As a result, the relative importance of peaks is easier to assess (Chatfield, 1984).
For the most part we will use log plots, since they show more of the detail. In some cases, particularly when series are adjusted, arithmetic plots serve to emphasize the differences.
SPECTRA OF SEASONAL VARIABLE - TEMPERATURE
Logarithmic plots of the temperature spectra are given in Figure 6. These three spectra are nearly identical, showing most of the variance at low frequencies and dominated by the peak at a frequency of once per year. Secondary peaks exist at twice and three times per year but these are of much lower magnitude. With the exception of these peaks the spectrum is that of a purely random series. The peaks at two and three per year appear to be harmonics of the principal frequency. High values at zero cycles per year indicate a high degree of autocorrelation within the series.
Figure 6. Temperature spectra, bandwidth 0.0128 (horizontal axis: cycles per year).
Also evident from Figure 6 is the fact that no cycles exist at frequencies greater than 10 per year. Cycles at those frequencies might well exist if man has a direct influence on the water (i.e. effluents or other use). Additional cycles would be likely if the sampling frequency was much greater (i.e. diurnal cycles when the sampling rate is in hours).
In sampling a time series one of the main concerns is the sampling interval Δt. It is obvious that any sampling leads to a loss of information as Δt increases. However, since cost is proportional to 1/Δt, it becomes expensive to reduce Δt to a very small interval. In addition, as Δt becomes small the degree of redundancy in the data series increases (Whitfield, 1983).
For the sampled series, the spectral estimates approach zero as the frequency increases. This indicates that the current Δt is sufficiently small. If it does not approach zero, then a smaller value of Δt needs to be chosen. While this is not critical when sampling from a continuous trace, it becomes important when gathering data. Sampling requires an initial choice of Δt, which has a probability of being too large or too small. In the former case, the resulting data are difficult to analyze, and in the latter the data are unnecessarily expensive.
If the value of Δt that is chosen is too large, aliasing may occur. An alias results when variation at frequencies greater than the sampling rate (actually π/Δt) is folded back, producing an effect in the measured spectrum (Chatfield, 1984). Aliased peaks are not apparent in the temperature spectra.
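The folding described above can be made concrete with a small calculation. With sampling interval Δt, the highest resolvable frequency is 1/(2Δt) cycles per unit time, which is π/Δt in radians. The sketch below uses a hypothetical helper, alias_frequency, and an invented 5-day cycle (not a feature of the GVRD records) to show where a faster cycle would appear in a weekly record.

```python
import math


def alias_frequency(f, dt):
    """Apparent frequency (cycles per unit time) of a sinusoid of true frequency f
    when it is sampled every dt time units."""
    fs = 1.0 / dt                 # sampling rate
    folded = f % fs               # frequencies that differ by fs are indistinguishable
    return min(folded, fs - folded)


# Time unit: one week. A cycle with a 5-day period has 7/5 = 1.4 cycles per week.
true_f = 1.4
apparent_f = alias_frequency(true_f, dt=1.0)
print(apparent_f)                 # lies below the limit of 0.5 cycles per week

# At the weekly sampling instants the two sinusoids take the same values.
max_diff = max(
    abs(math.cos(2.0 * math.pi * true_f * t) - math.cos(2.0 * math.pi * apparent_f * t))
    for t in range(20)
)
print(max_diff)
```

Because the sampled values agree, no analysis of the weekly series alone can separate the two cycles, which is why the choice of Δt has to be made before the data are gathered.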
The computer program which is used to estimate the spectra allows a series to be deseasonalized. In deseasonalization the average value for a period is removed from each individual observation for that period. In the present case the weekly values are adjusted by the average of all similar weeks (e.g. the average value for the third week in July is subtracted from each occurrence of the third week in July).
Spectra for the deseasonalized temperature series are shown with the spectra for the original data series in Figures 7-9. In each case, the spectra for the deseasonalized series lack both the seasonal peak and its harmonics (2 and 3 per year). These spectra closely approximate the spectra of random series which have the same mean and variance.
Figures 7-9. Spectra of the unadjusted and deseasonalized temperature series for the three rivers, bandwidth 0.0128 (horizontal axis: cycles per year).
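The deseasonalization step described above amounts to subtracting a week-of-year mean from every observation. A minimal sketch follows, assuming that each year contributes exactly 52 consecutive weekly values so that position in the cycle is simply the index modulo 52; how the original program aligned weeks with the calendar is not stated, and the function name is hypothetical.

```python
def deseasonalize(values, period=52):
    """Subtract from each value the mean of all values at the same position in the cycle."""
    sums = [0.0] * period
    counts = [0] * period
    for t, v in enumerate(values):
        sums[t % period] += v
        counts[t % period] += 1
    # Positions that never occur keep a mean of zero and are never used below.
    means = [s / c if c else 0.0 for s, c in zip(sums, counts)]
    return [v - means[t % period] for t, v in enumerate(values)]


# Small illustration with a cycle of length 4 instead of 52.
series = [1.0, 2.0, 3.0, 4.0, 3.0, 4.0, 5.0, 6.0]
print(deseasonalize(series, period=4))
```

In the illustration the repeating pattern is removed and only the difference between the two cycles remains, which is the part of the series that the adjusted spectra then describe.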
SPECTRA OF A TREND VARIABLE - pH
The spectra of the pH series for the three rivers are shown in Figure 10. The overall variance of these three series was quite small, and this is reflected in the low magnitude of the spectra in Figure 10. The spectra for the Capilano and Seymour Rivers indicate a seasonal component, while this feature is less evident for the Coquitlam River.
Figure 10. pH spectra for the three rivers, bandwidth 0.0128 (horizontal axis: cycles per year).
Figures 11-13. Arithmetic plots of pH spectra, unadjusted and adjusted for trend, bandwidth 0.0128.
Dominating all three spectra is the strong peak at zero frequency. Such a peak usually indicates a trend in the series or a high degree of autocorrelation. Figures 11-13 show arithmetic plots of the original spectra and those for the detrended series. Detrending the series removes most of the variance at zero frequency. Detrending of the data does not affect the seasonal peak in the Capilano River (Figure 11) or the Seymour River (Figure 12). Detrending requires fitting a linear trend over the time period and subtracting this from the original series. Table 1 contains the pertinent statistics for each of the series.
TABLE 1 Detrending Statistics for the pH Series
Series                  Mean     Slope*   Max    Min
Capilano    original    6.320    0.190    6.90   5.90
            detrended   6.316    0.0      6.04   5.92
Seymour     original    6.322    0.182    7.00   5.800
            detrended   6.322    0.0      6.99   5.82
Coquitlam   original    6.164    0.275    6.75   5.75
            detrended   6.164    0.0      6.63   5.66
* Total decrease over 25 year period
As is evident from Table 1, pH has shown a decrease over the past 25 years on a linear basis in these three rivers. It might be better to try higher order fits to this relationship to more adequately describe the trend situation evident in Figure 2.
The remaining peak at zero frequency may be attributed to two sources. First, autocorrelation of the series may contribute some of the variance. Second, linear fitting of a higher order, or more complicated, trend may be the major source of variance subsequent to detrending. Similar analysis of subseries of the original series, with the inclusion of higher order terms, might serve to eliminate more of the variance.
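The linear detrending used for the pH series can be sketched as a least-squares line in time that is subtracted from the series. The version below keeps the overall mean of the series, which is an assumption suggested by the nearly unchanged means in Table 1 rather than something the source states; the function name and the example values are hypothetical.

```python
def detrend_linear(values):
    """Fit a least-squares straight line in time and remove it, keeping the series mean.

    Returns (detrended, slope, total_change), where slope is the change per time step
    and total_change is the fitted change over the whole record.
    """
    n = len(values)
    t_mean = (n - 1) / 2.0
    y_mean = sum(values) / n
    sxx = sum((t - t_mean) ** 2 for t in range(n))
    sxy = sum((t - t_mean) * (y - y_mean) for t, y in enumerate(values))
    slope = sxy / sxx
    detrended = [y - slope * (t - t_mean) for t, y in enumerate(values)]
    return detrended, slope, slope * (n - 1)


# Illustrative series: a steady decline of 0.01 units per step from 6.5.
ph = [6.5 - 0.01 * t for t in range(11)]
flat, slope, total_change = detrend_linear(ph)
print(slope, total_change)
print(min(flat), max(flat))
```

The quantity total_change corresponds to the kind of figure reported in the Slope column of Table 1, where the decrease is expressed over the whole 25 year period. A higher order fit would replace the straight line with a polynomial in time.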
The spectra of the turbidity series are given as Figure 14. In only one case do the spectral estimates approach zero with increasing frequency (Seymour). Such spectra indicate serious problems with the assumptions regarding these series. The original data series are highly episodic and the data are not normally distributed. Transformation of such series often results in series which are approximately normal. Spectra for transformed and untransformed series are given for each of the rivers as Figures 15-17. In each case the spectral estimates for the transformed series approach zero as frequency increases.
Figure 14. Turbidity spectra for the three rivers.
Figures 15-17. Turbidity spectra for the transformed and untransformed series of each river, bandwidth 0.0128.
The spectra for the transformed series for the Seymour and Capilano Rivers show a trend component (zero frequency), a seasonal component (1 per year) and a harmonic component. The Coquitlam River series is similar although the seasonal peak is less evident. These results agree with our original observation of annual mid-winter high turbidity periods (seasonal features). The changes in detection limit described previously are the dominant trend component, and this is reflected in the figures as the peak at zero frequency.
These series are plagued by the changes in detection limit. It is likely that most of the trend component (i.e. at zero frequency) is due entirely to the decreasing baseline. This situation is difficult to evaluate fully, since truncation or censoring of the later data to the original baseline is less than satisfying. Neither is the converse (i.e. uncensoring) of the earlier data a very safe way to proceed. Censoring of time series data, such as that introduced by detection limits, makes time series analysis very difficult at the present time. As better methods are developed for estimating the statistics of censored data, perhaps the time series analysis of these types of series will become more practical.
CROSS SPECTRA - BIVARIATE PROCESSES
Often in water quality analyses one wishes to consider two series which are either similar or causally related.
These two situations are the time series equivalent of correlation and regression of two series X(t) and Y(t). In the time domain, the natural tool for evaluating such relationships is the cross-correlation function. In the frequency domain the equivalent is the cross-spectrum. Usually three functions are plotted to describe the relationship between two series in the frequency domain. These are coherence, phase and gain, although other functions may be substituted (Chatfield, 1984).
The plot of coherence is a measure of the linear association between the two time series at various frequencies and is analogous to the square of the correlation coefficient. The closer the coherence is to unity, the more closely related are the two series.
The plot of phase shows how the linear filter for fitting one series from the other shifts the phase of sine waves at different frequencies. This function may be interpreted as the angle between the frequency component of X(t) and the corresponding component of Y(t), i.e. the direction of the linear association. This is analogous to the sign of the correlation coefficient.
The plot of gain shows how the filter for fitting Y(t) gains over X(t) at frequency $\lambda$. This is of the nature of the absolute value of a regression coefficient and is a constant with respect to frequency. This is analogous to the regression coefficient at each frequency.
0
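The three functions described above can be illustrated with a small sketch. It is not the software used in this study: the function names, the simple moving-average smoother, the sign convention for phase and the synthetic weekly series are all assumptions made for illustration. Smoothing across neighbouring frequencies is needed because the coherence computed from unsmoothed transforms is identically one.

```python
import cmath
import math
import random


def dft(series):
    """Naive discrete Fourier transform of a mean-removed series (frequencies 0..n/2)."""
    n = len(series)
    mean = sum(series) / n
    centred = [v - mean for v in series]
    return [
        sum(centred[t] * cmath.exp(-2j * math.pi * k * t / n) for t in range(n))
        for k in range(n // 2 + 1)
    ]


def smooth(values, half_width):
    """Moving average over up to 2*half_width + 1 neighbouring frequencies."""
    out = []
    for k in range(len(values)):
        window = values[max(0, k - half_width):k + half_width + 1]
        out.append(sum(window) / len(window))
    return out


def coherence_phase_gain(x, y, half_width=2):
    """Coherence, phase and gain of the linear filter fitting y from x (hypothetical helper)."""
    fx, fy = dft(x), dft(y)
    sxx = smooth([abs(a) ** 2 for a in fx], half_width)          # spectrum of x
    syy = smooth([abs(b) ** 2 for b in fy], half_width)          # spectrum of y
    sxy = smooth([a.conjugate() * b for a, b in zip(fx, fy)], half_width)  # cross-spectrum
    coherence = [abs(c) ** 2 / (p * q) if p * q > 0 else 0.0
                 for c, p, q in zip(sxy, sxx, syy)]
    phase = [cmath.phase(c) for c in sxy]                        # radians
    gain = [abs(c) / p if p > 0 else 0.0 for c, p in zip(sxy, sxx)]
    return coherence, phase, gain


# Synthetic example: four years of weekly values with an annual cycle.
rng = random.Random(1)
n = 208
x = [10 * math.sin(2 * math.pi * t / 52) + rng.gauss(0, 1) for t in range(n)]
y = [2 * v + rng.gauss(0, 0.5) for v in x]   # y responds to x immediately, with gain 2

coherence, phase, gain = coherence_phase_gain(x, y)
annual = n // 52   # index of the one-cycle-per-year frequency
print(coherence[annual], phase[annual], gain[annual])
```

In this synthetic case the second series is a scaled copy of the first with no delay, so at the annual frequency the coherence should be close to one, the phase close to zero and the gain close to the scaling factor.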
Figure 18 is a plot of the coherence between Seymour and Capilano River water temperatures. The coherence between these two series is particularly high around the seasonal peak and generally low at other frequencies. The plot of phase (Figure 19) shows that the phase of the linear filter for fitting Seymour temperatures from Capilano temperatures is near zero at all frequencies. This indicates that the two series are synchronous. The plot of gain of the same filter (Figure 20) is constant at most frequencies with a mean of zero (log of 1). This, coupled with the zero phase shift, suggests that the association between the Capilano and Seymour River temperature series is almost immediate. In terms of the underlying sampling frequency this is within one week. Such a result seems reasonable since both series are the result of the same influences (sources of heat and moisture).

[Figures 18, 19 and 20. Capilano River - Seymour River - Temperature: coherence, phase and gain, each plotted against frequency in cycles per year.]

APPLICATION PROBLEMS

In undertaking this study I found a number of areas where the next step was less than intuitively obvious. It is difficult to make the transition between time domain and frequency domain concepts. One becomes aware that one is constantly expressing oneself in analogues from time domain analysis. Such is always the case when one ventures from the known to the unknown. There are a couple of specific areas that I feel are worth mentioning, namely choosing a band width and aliasing.

Choosing a Band Width

The selection of an appropriate band width for smoothing the spectra is critical.
Too narrow a band width results in a large number of peaks which may or may not be relevant to evaluating the spectrum. On the other hand too broad a band width results in an over smoothed spectrum which might mask relevant cycles. The applications software that I used provides the exact spectrum and three bandwidths as described previously. All three standard band widths were too narrow for the data series that I was examining. A broader band width (0.0128) was used and was effective in showing the frequencies of primary concern.

The problem which arises then is selecting a band width which might serve for many potential variables and for records of varying lengths. Since the choice of a band width is similar to choosing a class interval when constructing a histogram, it is difficult to recommend a single band width. It will remain a trial and error process for each series. The general applicability of the method is restricted because of this, since "automatic" searching of spectra cannot include trial and error procedures.

Aliasing

The equal spacing in time of observations may introduce aliasing. This occurs when oscillations exist at frequencies higher than the sampling can resolve, that is, above half the sampling frequency. Aliased peaks introduced into spectra can be misleading if they are not recognized as aliases. Two problems are the attribution of some mechanism to an aliased peak, or the masking of a relevant peak by an aliased peak.

There are two opportunities for dealing with the aliasing problem. One option is to choose a sampling interval which is small enough so that all significant frequencies are adequately sampled. This option may be fine for variables which can be measured electronically but prohibitively expensive for other variables. The other option is to collect data at a given sampling rate. This data series is then filtered to remove the unwanted power at higher frequencies. The filtered series is then sampled at a new interval which is an integer multiple of the original interval. One possible filtering mechanism would be a weighted average, of which numerous examples exist in the literature.

APPLICATION ADVANTAGES

Spectral analysis techniques are used to look for cyclical patterns or periodicities in data. The Fourier Transform is used to decompose the data series into a sum of sine and cosine waves of various frequencies. The spectra produced represent a sum of squares (as in ANOVA) for each of the frequency components. Smoothing of the spectral estimates gives a spectrum which can be usefully applied to the description of the input series. This methodology will be extremely useful in identification of periodic and trend components in water quality time series. Like all methods there are some practical problems; however, these would appear to be minor. Spectra for water quality time series, with appropriate smoothing, can be used to identify periodic components of practical significance.

With the increasing availability of spectral methods on microcomputers, and the general decrease in computing costs, application of spectral analysis is becoming more practical. Although mathematically complex, the application of the available software is straightforward, and the results of practical use.
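The decomposition and smoothing steps described above, and the band width trade-off discussed earlier, can be sketched as follows. This is an illustrative sketch only, not the package used in this study: the function names, the equal-weight moving-average smoother, the window widths (expressed in frequency bins) and the synthetic weekly series are assumptions.

```python
import cmath
import math
import random


def periodogram(series):
    """Squared Fourier amplitudes of a mean-removed series at frequencies 0..n/2."""
    n = len(series)
    mean = sum(series) / n
    centred = [v - mean for v in series]
    spectrum = []
    for k in range(n // 2 + 1):
        coeff = sum(centred[t] * cmath.exp(-2j * math.pi * k * t / n) for t in range(n))
        spectrum.append(abs(coeff) ** 2 / n)
    return spectrum


def smooth(spectrum, half_width):
    """Equal-weight moving average over up to 2*half_width + 1 frequency bins."""
    out = []
    for k in range(len(spectrum)):
        window = spectrum[max(0, k - half_width):k + half_width + 1]
        out.append(sum(window) / len(window))
    return out


def contrast(spectrum):
    """Ratio of the highest spectral value to the median value."""
    ordered = sorted(spectrum)
    return max(spectrum) / ordered[len(ordered) // 2]


# Synthetic example: four years of weekly values with an annual cycle plus noise.
rng = random.Random(7)
n = 208
series = [10 * math.sin(2 * math.pi * t / 52) + rng.gauss(0, 1) for t in range(n)]

raw = periodogram(series)          # unsmoothed ("exact") spectrum
narrow = smooth(raw, 1)            # narrow band: 3 bins
wide = smooth(raw, 8)              # broad band: 17 bins

annual = n // 52                   # index of the one-cycle-per-year frequency
print(raw.index(max(raw)) == annual)
print(contrast(raw), contrast(narrow), contrast(wide))
```

The raw spectrum is erratic but shows the annual peak sharply; widening the band steadies the estimate while lowering and spreading the peak, which is the sense in which too broad a band width can mask a relevant cycle.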
CONCLUSIONS

This paper has shown a number of applications of spectral analysis to the description and analysis of water quality time series. In each case presented, the spectra obtained for a data series confirmed observable features of the original time series plot. Applications of spectral analysis to water quality series will allow significant frequencies to be identified and evaluated, a significant aid in trend assessment.

Spectral analysis is a useful tool in the analysis of water quality time series. This methodology complements time domain methods such as those of Box and Jenkins (1976). Brillinger (1981a) suggests that time and frequency domain methods, as well as their hybrids, are allies, rather than competitors.
ACKNOWLEDGEMENTS

I would like to thank Bob Jones and the Greater Vancouver Regional District for having provided us with the extensive data series presented here. In 1981, I attended the Time Series Methods in Hydrosciences Conference in Burlington, Ontario where I had the opportunity to be introduced to Spectral Analysis by David Brillinger. The material he presented, and our subsequent discussions, led to the application presented here.

LITERATURE CITED

Bloomfield P., 1976. Fourier analysis of time series: An introduction. John Wiley. 258 p.
Box, G.E.P. and G.M. Jenkins, 1976. Time Series Analysis: Forecasting and Control. Holden-Day. 575 p.
Brillinger D.R., 1981a. Some contrasting examples of the time and frequency domain approaches to time series analysis. In: Time Series Methods in Hydrosciences (A.H. El-Shaarawi and S.R. Esterby, eds.) 1-15.
Brillinger D.R., 1981b. Time Series: Data Analysis and Theory. Holden-Day. 540 p.
Chatfield C., 1984. The Analysis of Time Series: An Introduction. Chapman and Hall. 286 p.
Chatfield C. and M.P.G. Pepper, 1971. Time-series analysis: An example from geophysical data. Applied Statistics 20:217-238.
Dixon W.J., 1983. BMDP Statistical Software. University of California Press.
Jones R.H., 1964. Spectral analysis and linear prediction of meteorological time series. J. Applied Meteorology 3:45-52.
Jones R.H., 1965. A reappraisal of the periodogram in spectral analysis. Technometrics 7:531-542.
Whitfield P.H., 1983. Evaluation of water quality sampling locations on the Yukon River between Dawson, Yukon Territory and Eagle, Alaska. Water Resources Bulletin 19:115-121.
Whitfield P.H. and P.F. Woods, 1984. Intervention analysis of water quality records. Water Resources Bulletin 26:657-668.
Zimmerman M., 1981. A beginner's guide to Spectral Analysis. Part 1. Byte February 1981:68-90.
Zimmerman M., 1981. A beginner's guide to Spectral Analysis. Part 2. Byte March 1981:166-198.
BAYES ESTIMATION OF PARAMETERS OF FIRST ORDER AUTOREGRESSIVE PROCESS

M. S. Abu-Salih
Department of Statistics
Yarmouk University, Irbid, JORDAN

A. A. Abd-Alla
Department of Statistics
King Saud University, Riyadh, SAUDI ARABIA

1. INTRODUCTION
Consider the stationary autoregressive process of order one, AR(1), with zero mean,

$$y_t = \theta y_{t-1} + \varepsilon_t, \qquad t = \ldots, -2, -1, 0, 1, 2, \ldots \tag{1.1}$$

where $\{\varepsilon_t\}$ are independent normal $(0, \sigma^2)$ random variables and $|\theta| < 1$. This is often a stationary representation of the error time series in economic models.

Several methods of estimation like Yule-Walker's, maximum likelihood, conditional maximum likelihood (CML), and least squares, were used by Box and Jenkins (1970), Fuller (1976), Hasza (1980) and others to estimate the unknown parameters $\theta$ and $\sigma^2$. Box and Jenkins (1970) obtained the Bayes' estimator for $\theta$ assuming non-informative priors on $\theta$ and $\sigma^2$. Abd-Alla and Abouammoh (1982) calculated Bayes' estimates of $\theta$ using numerical integration methods and assuming uniform and normal priors for $\theta$.

In this paper we obtain the Bayes' estimators for $\theta$ and $\sigma^2$ under informative and non-informative priors, namely, priors (i), (ii) and (iii).

The estimators are compared with the CML and Box and Jenkins' estimators through simulation.
2. BAYES ESTIMATORS

Consider the time series given by (1.1), where $\theta$ and $\sigma^2$ are assumed to be random variables with joint prior given in (i). This prior assumes the independence of $\sigma^2$ and $\theta$ with an inverted gamma prior on $\sigma^2$ and a probability density function (pdf) prior on $\theta$ given by $1/\{\pi (1-\theta^2)^{1/2}\}$ for $-1 < \theta < 1$. We feel it is necessary to truncate the prior on $\theta$ because the series is stationary. Prior (i) reduces to Box and Jenkins' prior if $\alpha \to 0$, $d = 1$ and the range of $\theta$ is $(-\infty, \infty)$.

Let $Y_1, Y_2, \ldots, Y_n$ be a random sample from (1.1), and $y = (y_1, y_2, \ldots, y_n)$ be a realization of such a random sample. The likelihood function of $y = (y_1, y_2, \ldots, y_n)$ is given by:

The posterior joint pdf of $\theta$ and $\sigma^2$ is given by: (2.2)
where

$$R = \sum_{i=2}^{n} y_i y_{i-1} \Big/ \sum_{i=3}^{n} y_{i-1}^2 \qquad \text{and} \qquad A = \Big\{ \Big( 2\alpha + \sum_{i=1}^{n} y_i^2 \Big) \Big/ \sum_{i=3}^{n} y_{i-1}^2 \Big\} - R^2 .$$
~
j
is an integer we use standard integration formulas and get:
407 cle
J -I
-
[-
(n1!)’(4A)~
-
2
r(0-R) + A 1 2
l
m C R(j)+KI, where i=l
Hence
In seeking an estimate of $\sigma^2$ from a Bayesian point of view we consider a decision function $w(y)$, where $y = (y_1, y_2, \ldots, y_n)$ denotes a realization of the random sample $Y_1, Y_2, \ldots, Y_n$, and we assume a squared error loss function.

The Bayes' estimator of $\sigma^2$ with respect to squared-error loss (2.4) is the posterior mean of $\sigma^2$, given by:

$$\hat{\sigma}^2_1 = E[\sigma^2 \mid y].$$

Using (2.2) and (2.3) we get:
(2.5)
Simplifying (2.5),we get: n A “2 0
1
=
2
1 Yi-q
i=3 -~
n+2d-3
[l-
Bim)
Z R( j)+2 AK j=1
(2.6)

Noticing that, as $m \to \infty$, we get $R(m) \to 0$. Substituting for $A$ we get:

(2.7)
The Bayes' estimator of $\theta$ with respect to squared-error loss is

Carrying out the integration and simplifying we get:
Since,
1
1
-
[A+(l+R)21m
[A+(1-R)'Im
2
i-
=
Am
and by S t i r l i n g ' s f o r m u l a , w e have
Then,

$$\hat{\theta}_1 = R = \sum_{i=2}^{n} y_i y_{i-1} \Big/ \sum_{i=3}^{n} y_{i-1}^2 \tag{2.8}$$

We notice that for large $n$, the Bayes' estimator of $\theta$ under prior (i) is the same as that of Box and Jenkins (1970).

Next, we consider the joint prior pdf (ii). To get the posterior pdf of $\theta$ and $\sigma^2$, the computation is greatly simplified if $Y_1$ is treated as fixed and the conditional likelihood is used. This idea is justified and used by Fuller (1976) to obtain the CML estimators of $\theta$ and $\sigma^2$. The conditional likelihood function of $y = (y_2, \ldots, y_n)$ given $y_1$ is:

The posterior joint pdf of $\theta$ and $\sigma^2$ is given by: (2.10)
Carrying out the integration as in (2.3) we get:
(2.11)
n+2d-1 n+2d-3 whenever - is an integer and where rn = 2 ’
1
L, = --(tan i,
-1
* 6
-1
+tan
l+r
-),
b
5
r =
n (
a =
c Yi+2d i=2 2 - r n % 1 yi-1 i=2
The Bayes' estimator of $\sigma^2$ is:
n n 2 I: Y ~ Y ~ - Y~ /~ , -~ ~ i=l 1=2
410
n
7
(2.12)
The Bayes' estimator of $\theta$ is:

Lastly we consider the conjugate prior of $\theta$ and $\sigma^2$ given in (iii). The posterior pdf of $\theta$ and $\sigma^2$ is given by

Carrying out the integration over $\theta$ then over $\sigma^2$ we get:
Where
D =
n [ C
2
2
yi-(l+ Z y i=2 i-1 i=2
’c’ i=2
The Bayes' estimator of $\theta$ is:
n
n
n (2.15)
Similarly the Bayes' estimator of $\sigma^2$ is:
-
3.
[
n+2d-3
"
c
i=2
2
y.-( 1
c
2
"
2
y.y. ) (1+ c y . r 1 + 2 a 1 ; i = 2 1-1 i = 2 1 1-1
(2.16)
RESULTS
In order to compare the performance of the different estimates used in this paper, we generated samples of sizes 50, 100, 150, and 200 for each of the twenty four $(\theta, \sigma^2)$ combinations of parameter values, namely, for $\theta$ = -0.7, -0.4, -0.1, 0.2, 0.5, 0.8, and $\sigma^2$ = 0.5, 1.0, 1.5, and 2.0. The values of $d$ and $\alpha$ were estimated empirically by the method of moments.

For each combination of parameter values and sample size, one hundred samples were generated. For each sample, estimates of $\theta$ and $\sigma^2$ were calculated using the methods derived in this paper. The mean and mean-square error (MSE) of each estimate were recorded. Tables 1 to 4 give the values of $\hat{\theta}_i$, $i = 1, 2, 3$ and $\hat{\sigma}^2_j$, $j = 1, 2, 3$, and 4 for the twenty four combinations mentioned.

Since $\hat{\theta}_1$ is the same as the estimator used by Box and Jenkins as shown in (2.8), we did not include the latter in the comparison. The estimate $\hat{\sigma}^2_4$ is Box and Jenkins' estimate of $\sigma^2$ and it was included in the tables for the sake of comparison with our estimates.

It is noticed that the values of estimates of $\theta$ by the different methods (including Box and Jenkins) are very close to one another and very close to the assumed value. The same may be said about the estimates of $\sigma^2$, but it is noticed that the larger the sample size the closer the results are to the assumed value.

Observing that [c.f. Fuller 1976] the CML of $\theta$ is

$$\hat{\theta}_{CL} = \Big( \sum_{i=2}^{n} y_i y_{i-1} \Big) \Big( \sum_{i=2}^{n} y_{i-1}^2 \Big)^{-1}$$

we notice that $\hat{\theta}_{CL}$ is very close to $\hat{\theta}$ as given in equation (2.16) and therefore $\hat{\theta}_{CL}$ is not included in the tables.
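The simulation design described above can be sketched in a few lines. The sketch below is an assumption-laden illustration rather than the authors' program: it covers only the two closed-form estimators of $\theta$ that are fully legible in the text, namely $R$ of (2.8) and the CML estimator, and the function names, random seed and starting distribution are hypothetical choices.

```python
import math
import random


def simulate_ar1(n, theta, sigma2, rng):
    """Draw y_1..y_n from a stationary zero-mean AR(1) process as in (1.1)."""
    sigma = math.sqrt(sigma2)
    # Assumed start: the stationary distribution N(0, sigma^2 / (1 - theta^2)).
    y = [rng.gauss(0.0, sigma / math.sqrt(1.0 - theta ** 2))]
    for _ in range(n - 1):
        y.append(theta * y[-1] + rng.gauss(0.0, sigma))
    return y


def estimate_r(y):
    """R of (2.8): sum_{i=2}^n y_i y_{i-1} / sum_{i=3}^n y_{i-1}^2."""
    numerator = sum(y[j] * y[j - 1] for j in range(1, len(y)))
    denominator = sum(y[j] ** 2 for j in range(1, len(y) - 1))   # y_2 .. y_{n-1}
    return numerator / denominator


def estimate_cml(y):
    """CML estimator: sum_{i=2}^n y_i y_{i-1} / sum_{i=2}^n y_{i-1}^2."""
    numerator = sum(y[j] * y[j - 1] for j in range(1, len(y)))
    denominator = sum(v ** 2 for v in y[:-1])                    # y_1 .. y_{n-1}
    return numerator / denominator


def monte_carlo(estimator, n, theta, sigma2, reps=100, seed=0):
    """Mean and mean-square error of an estimator over `reps` simulated samples."""
    rng = random.Random(seed)
    estimates = [estimator(simulate_ar1(n, theta, sigma2, rng)) for _ in range(reps)]
    mean = sum(estimates) / reps
    mse = sum((e - theta) ** 2 for e in estimates) / reps
    return mean, mse


for n in (50, 100, 150, 200):
    print(n, monte_carlo(estimate_r, n, 0.5, 1.0), monte_carlo(estimate_cml, n, 0.5, 1.0))
```

Because both estimators share the same numerator and their denominators differ only by the single term $y_1^2$, their values should differ only slightly for the sample sizes considered, in line with the remark that the CML estimator was left out of the tables.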
0'
=
e =
-0.7
n
6
50
01
0.110 0.109 0.110
-0.386 -0.381
-0,369
0.081 0.081 0.082
-0.398 -0.395 -0.388
0.101 0.099 0.098
-0.097 -0.096 -0.094
a 3 -0.679
0.063 0.063 0.063
-0.398 -0.396 -0.392
0.073 0.072 0.072
-0.694 -0.692 -0.689
0.052 0.052 0.052
-0.402
-0.401
0.471 0.465 0.474 0.482
0.077
gi2 0.479
i2
i1 -0.695 83
-0.691 -0.683
e2
-0.686 -0.684
,e2 83 6 2
ita32 c4
100
-0.686 -0.677 -0.663
-0.1 Mean M.S.E. 0.151 0.149 0.143
150
50
M.S.E.
-0.090 -0.088 -0.085
a2
200
-0.4 Mean M.S.E.
Mean
0.133 0.130 0.127
83
100
TABLE 1
0.5
2
g2 03; 64
0.476 0.481 0.485
0.2 Mean M.S.E.
0.199
0.5 Mean M.S.E.
0.8
Mean
M.S.E.
0.196 0.188
0.125 0.124 0.119
0.474 0.466 0.151
0.133 0.130 0.128
0.788 0.781 0.768
0.096 0.096 0.098
0.094 0.093 0.092
0.201 0.199 0.195
0.093 0.092 0.091
0.490 0.787 0.479
0.084 0.084 0.083
0.788 0.784 0.778
0.063 0.064 0.064
-0.097 -0.096 -0.095
0.084 0.084 0.083
0.200 0.199 0.196
0.077 0.076
0.075
0.486 0.484 0.479
0.070 0.070 0.069
0.790 0.787 0.783
0.054 0.054 0.054
-0.397
0.066 0.067 0.066
-0.096 -0.096 -0.095
0.063 0.063 0.062
0.201 0.200 0.198
0.070 0.070 0.069
0.506 0.504 0.500
0.058 0.057 0.057
0.794 0.793 0.790
0.042 0.043 0.043
0.076 0.076 0.079
0.490 0.484 0.487 0.502
0.096 0.095 0.095 0.099
0.495 0.486 0.486 0.502
0.094 0.093 0.093 0.096
0.472 0.465 0.466 0.479
0.097 0.097 0.097 0.100
0.466 0.459 0.464 0.471
0.090 0.089 0.089 0.092
0.497 0.493 0.505 0.483
0.093 0.093 0.093 0.083
0.062 0.062 0.062 0.063
0.491 0.487 0.489 0.496
0.071 0.070 0.071 0.072
0.501 0.496 0.497 0.505
0.059 0.059 0.059 0.060
0.482 0.477 0.478 0.486
0.066 0.065 0.065 0.066
0.492 0.489 0.491 0.496
0.072 0.073 0.073 0.074
0.504 0.501 0.508 0.504
0.066 0.066 0.066 0.066
0.060 0.061 0.061
0.061
0.501 0.498 0.500 0.505
0.059 0.059 0.059 0.059
0.497 0.494 0.495 0.500
0.059 0.059 0.059 0.060
0.491 0.488 0.488 0.494
0.051 0.052 0.052 0.053
0.488 0.485 0.487 0.491
0.064 0.063 0.063 0.064
0.491 0.490 0.494 0.493
0.050 0.050 0.050 0.050
0. 045 0. 045 0.045 0.046
0.492 0.490 0.491 0.495
0.055 0.055 0.055 0.055
0.492 0.490 0.490 0.494
0.048 0.048 0.048 0.049
0.497 0.495 0,495 0.500
0.051 0.051 0.051 0.051
0.498 0.496 0.498 0.500
0.049 0.050 0.050 0.050
0.491 0.491 0.494 0.493
0.049 0.049 0,049 0.049
uz = 1 8 =
-0.7
n
8
Mean
50
:i i 2
-0.686 -0.677
83
-0.670
M.S.E.
-0.4 Mean M.S.E.
TABLE -0.1 Mean M.S.E.
CL rp
2 9.2 Mean
..
:.I.S E
Mean
IF-
0.8
0.5
M.S.E.
Mean
M.S.E.
0.096 0.096 0.097
0.110 0.109 0.110
-0.386 -0.381 -0.375
0.133 0.130 0.128
-0.090 -0,088 -0.087
0.151 0.149 0.146
0.199 0.125 0.196 0.124 0.192 0.122
0.474 0.133 0.466 0.130 0.458 0.129
0.788 0.781 0.774
8 3 -0.687
0.081 0.081 0.081
-0.398 -0.395 -0.392
0.101 0.099 0.099
-0.097 -0.096 -0.095
0.094 0.093 0.093
0.201 0.093 0.199 0.092 0.i97 0.092
0.490 0.084 0.487 0.081: 0.&83 0.084
0.788 0.063 0.784 0.06L 0.781 0.064
-0,686 -0.684 6 3 -0.682
0.063 0.063 0.063
-0.398 -0.396 -0.394
0.073 0.072 0.072
-0.097 -0.096 -0.096
0.084 0.084 0.083
0.200 0.199 0.197
0.076
0.486 0.070 0.484 0.070 0.481 0.069
0.790 0.054 0.787 0.054 0.785 0.054
-0.694 -0.692 -0.690
0.052 0.052 0.052
-0.402 -0.401 -0.399
0.066 0.066
-0.096 -0.096 -0.095
0.063 0.063 0.063
0.201 0.070 0.200 0.070 0.199 0.070
0.506 0.058 0.504 0.057 0.502 0.057
0.794 0.042 0.793 0.042 0.792 O.Oh2
0.155 0.153 0.153 0.159
0.978 0.967 0.970 1.003
0.191 0.190 0.190 0.198
0.988 0.970 0.970 1.005
0.188 0.185 0.185 0.193
0.942 0.193 0.927 0.194 0.928 0.194 0.959 0.201
0.928 0.180 0.914 0.178 0.919 0.178 0.942 0.184
0.977 0.969 0.981 0.966
0.173 0.174 0.174 0.167
0.124 0.123 0.123 0.126
0.982 0.974 0.976 0.993
0.142 0.141 0.14: 0.1~4
1.002 0.992 0.992 1.011
0.118 0.118 0.118 0.120
0.962 0.954 0.954 0.971
0.983 0.976 0.979 0.993
0.144 0.145 0.145 0.147
1.003 0.598 1.005 1.008
0.132 0.131 0.131 0.132
0'981 0:990
0.121 0.121 0.121 0.123
1.002 0.997 0.998 1.009
0.117 0.117 0.117 0.119
0.994 0.989 0.989 1.001
0.117 0.118 0.118 0.120
0.982 0.102 0.975 0.104 0.976 0.104 0.987 0.105
0.975 0.127 0.970 0.127 0.971 0.127 0.981 0.128
0.981 0.977 0.982 0.986
0.100 0.100 0.100 0.101
0.997 0.987 0.997 1.004
0.090 0.090 0.090 0.091
0.984 0.980
0.109 0.109 0.109 0.110
0.963 0.979
0.095 0.096 0.096 0.097
0.994 0.101 0.990 0.101 0.990 0.101 0.999 0.102
0.996 0.997 0.993 1.001
0.981 0.098 0.980 0.098 0.983 0.098 0.987 0.099
-0.695
i 2 -0.691 dl
e2
61 $2
e3
,.' 00'978 981
''
"
$;
:j $4
0.981 0.990
0.067
0.979 0.988
0.077
0.076
0.131 0.130 0.130 0.132
0.099 0.099 0.099 0.100
02
TABLE 3
=1.5
a =
-0.7 Mean
M. 3.0,.
-0.686
0.110
-0.677 -0.672
0.109 0.109
-0.695 -0.691
0.081 0.081
-0.14 Plean
0.5
0.2
-0.1
M.S.E.
Mean
M.S.E.
Mean
0.133
0.151
o.iL7
0.199 0.196 0.193
..
>I S. E
0.8
Mean
M.S.E.
0.125 o.12L
0.47i 0.466
0.1.22
0.461
0.133 0.130 0.129
0.490
0.08h
0.788
0.487
0.084 0.084
0.784 0.782 0.790 0.787
Mean
b1.S.E.
-0.386 -0.381 -0.37’7
0.130 0.129
-0.090 -0.088 -0.087
0.101 0.099 0.099
-0.097 -0.096 -0.095
0.094 0.093 0.093
C.201
-0.688 0.081
-0.398 -0.395 -0.393
0.199 0.198
0.093 0.092 0.092
-0.686 -0.68k -0.682
0.063 0.063 0.063
-0.398 -0.396 -0.395
0.073
0.08L 0.084 0.083
0.200 0.199 0.198
0.077 0.076 0.076
o.l.86
0.072
-0.097 -0.096 -0.096
0.h8h 0.482
0.070 0.070 0.070
0.786
0.054 0.054
-0.694 -0.692 -0.691
0.052 0.052 0.052
-0.401 -0.399
0.066 0.067 0.066
-0.096 -0.096 -0.095
0.063 0.063 0.063
0.201 0.200 0.200
0.070 0.070 0.070
0.506 0.504 0.503
0.058 0.057 0.057
0.794 0.793 0.792
0.042 0.042 0.042
1.412 1.389 1.390 1.438
0.289 0.291 0.291 0.301
1.390 1.369 1.373 1.413
0.269 0.266 0.266
1.443 1.430 1.431
0.197 0.194
0.194
1,457
0.198
0.176 0.177 0.177 0.179
1.473 1.h63 1.463
0.154 0.156 0.156 0.158
0.143 0.144 1.468 0.144 1.482 0.116
1.1191 1.&85
-0.402
1.466
0.072
0.lL9
1.410 0.232 1.393 0.229 1.403 0.229 1.4117 0.238
1.450 1.153 1.505
1.435 0.186 1.427 0.185 1.432 0.185 1 . 4 5 5 0,189
1.473 1.361 1.463 1.489
0.213 0.211 0.215
1.488 0.177 1.488 0.177 1.516 0.181 1h91 7 A83 A83 I.501
1.471 0.181
1.503
1.466 0.182
1-1-95
1.469
0.287 0.285 0.285 0.297
0.211
1 *481 0.282 1.453
0.277 1 . 4 5 1 0.278 I.508 0.289 1.503
0.182
1.496
1.485 0.184
1.514
0.176 0.176 0.176 0.178
1.496 1.491
1.477
0.161
1.475
1.470
0.163 0.165 0.165
1.L68
0.135 0.136 1.494 0.136 1.506 0.117
1.471 1.465
0.177
1.1481
1.1385 1.499
0.152 0.152 0.152 0.153
0.481
1.462 1.1454 1.456
0.788
0.781 0.776
0.096 0.096 0.097 0.063 0.064 0.064 0.054
0.276
1.472
0.191 0.190 0.190 0,192
1.494
0.148
1.471
0.147
1.488
0.149
1.489 1.501
0.149
1.469 1.472
0.147 0.147
0.150
1.480
0.148
*
F
01
b 0
=
n 50
100
-0.7 0 Meari M.S.E. 0.110 Cil2 -0.686 0.109 ^ S z 2 -0.677 0.109 8 3 2 -0.674
HIz
M.S.E. 0.125 0.124 0.123
0.5 Mean 0.474 0.466 0.462
...
M S E
0.8 Mean
M.S.E.
0.133 0.130 0.129
0.788
0.781 0.777
0.056 0.096 0.097
0.788 754 0.783
0.063 0.364 0.064
0.790 0.786
0.054 0.054 0.054 0.042 0.042 0.042
-0.398 -c.395 -0.393
0.101 0.399 0.099
-0.097 - C , 296 -0.095
0.094 c.093 0.093
0.201 z.193 0.198
0.093 c.092 0.092
0.490 0.487 C.485
0.084
-0.398 -0.396 -0.395
0.073 0.072 0.072
-0.-97
-0.683
0.063 0.063 0.063
0.084 0.084 0.084
0.200 0.199 0.198
0.077 0.076 0.076
0.486 0.484 0.483
0.070 0.070 0.070
-0.694 -0.692 -0.691
0.052 0.052 0.052
-0.402 -0.401 -0.400
0.066
-0.096 -0.096 -0.095
0.063
0.067 0.066
0.063 0.063
0.201 0.200 0.200
0.070 0.070 0.070
0.506 0.504 0.503
0.058 0.057 0.057
0.794 0.793 0.792
c . 309
1 *954
u4*
1.880 1.857 1.866 1.929
0.305 0.305 0.317
1.932 1.935 2.007
0.382 0.380 0.380 0.396
1.973 1.937 1.938 2.010
0.375 0.370 0.370 0.385
1.881 1,851 I .852 1.918
.36f 0.388 0.388 0.402
1.852 1.823 1.828 1.884
0.358 0.355 0.355 0.367
1.930 1.914 1.927 1.932
0.333 0.335 0,335 0,333
i12
1.913
1.940
1.964 1.948 I. 950 1.986
0.284 0.282 0.282
uh2
0.248 0.247 0.247 0.252
2.003 1.983 ‘I .983 2.021
0.236 0.236 0.236 0.241
1.924 1.907 1 .go7 1.942
0.263 0.259 0.259 0.264
1.964 1.951 1.953 1.985
0.288 0.289 0.289 0.294
1.999 1.990 1.997 2.015
0,262 0,261 0.261 0,264
1.961 1.955
0.242 0.242 0.242 0.245
2.003 I. 993 1.994 2.019
u .234
1.988 1.977 1.977 2,002
0.234 0.236 0.236 0.739
964 1.950 I .950 1.974
3.205
0.234 0.234 0.237
0.207 0.207 0.210
1.949 1.938 1.940 1.962
0.254 0.253 0.253 0.256
1.9ie 1.952 1.956 1.972
0,200 0.199 0.199 c.201
1.995 1.989 1.991 2.008
0.180 0.181 0.181 0.183
I.965 1.960 1.961 1.980
0.219 0.218 0.218 0.220
1.966 1.957 1.958 1.976
0.190 0.192 0.192 0.194
1.988 1.980 1.980 1.999
0.202 0.203 0.203 0.205
1.991 1.984 1.985 2.002
0.198 0.199 0.199 0.200
1.961 1.958 1.961 1.974
0.195 0.196 0.196 0.197
!Iz 0z2 832 * 2
i i 2 c~~~
100
C.2 Mean 0.199 0.196 0.194
0.081 0.081 0.081
832
50
M.S.E. 0.133 n . 130 0.129
-0.1 M.S.E. Mean 0.151 -0.090 0.149 -0.088 0.147 -0.088
-0.695 -0.691 -0.689
822
200
-0.4 Mean -0.386 -0,381 -0.378
150
200 : 2 532 Gk2
0 .287
-0.096 -0.096
‘I.
c. 394 0.084
:.
0.787
REFERENCES

Abd-Alla, A.A., and Abouammoh, A.M., 1982. A comparative study on estimation of parameters of a Markovian process-1. Time Series Methods in Hydrosciences, Editors: A.H. El-Shaarawi and S.R. Esterby. 1982. Scientific Publishing Company, Amsterdam.
Box, G.E.P., and Jenkins, G.M., 1970. Time series analysis forecasting and control. Holden-Day, San Francisco.
Fuller, W.A., 1976. Introduction to statistical time series. John Wiley & Sons Inc., New York.
Hasza, D.P., 1980. A note on maximum likelihood estimation for the first order autoregressive process. Comm. Statist. Theor. Meth. A9(13), 1413-1415.
A SYSTEMS APPROACH TO COMPUTERIZING DATA ACQUISITION
BY THOMAS R . CLUNE
Abstract: The problems of computerizing an established laboratory procedure are legion and highly specific. Even in successful computerization projects, these problems tend to be dealt with on an ad hoc basis as they arise. This paper attempts to present a systematic overview of the automating process, so that computerization may be achieved in an orderly manner, according to specification. It is necessary to consider a great deal of detail in designing a computerized installation. In this paper, the detail is always considered from the perspective of how it affects the overall performance of the acquisition system.

1.0 INTRODUCTION

Unlike most instrumentation purchases in a laboratory, microcomputers are acquired for data acquisition most commonly out of a general desire to modernize and simplify the running of the lab, rather than to perform a specific, well-defined task for which the computer is understood to be ideally suited. As a result, most attempts to computerize laboratory functions end in at least partial failure. The time required to computerize an established, well-understood procedure is enormous. Typical development times range from six months to a year. It is thus desirable to establish unequivocally that the need exists for computerization before the project is undertaken.
There are three major sources of failure in automation projects. The first stems from underestimating the amount of digital information necessary to reproduce an analog experiment. For example, a single sweep of a digital oscilloscope will typically represent 2 Kbytes of data. Related to the underestimation of the amount of data is the underestimation of the time it takes to download that data to the computer. To illustrate: I computerized a time-resolved laser spectroscopy experiment at Brandeis University which employed a Biomation 8100 waveform digitizer and an IBM CS-9000 microcomputer. The scan window of the Biomation was 20 microseconds. Biomation sells an IEEE-488 interface converter box for the 8100 that makes it compatible with most microcomputers. However, the throughput on this box is 1 Kbyte/second. It would thus have required 2 seconds just to transfer each 20-microsecond scan to the computer. We could not afford to wait that long on this experiment, so I designed a hybrid interface between the 8100 and the IEEE-488 port of the computer which was able to function at 300 Kbytes/sec. I have described the hardware [3] and software [5] of this experiment elsewhere.
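The arithmetic behind this example can be laid out as a short sketch. The 2 Kbyte scan size is inferred from the quoted figures (2 seconds at 1 Kbyte/second), the 1 megabyte/second figure is the rated IEEE-488 maximum quoted in this paper, and treating one megabyte as 1000 Kbytes is an assumption; the names are hypothetical.

```python
SCAN_KBYTES = 2.0        # one scan, inferred from 2 s at 1 Kbyte/s
SCAN_WINDOW_S = 20e-6    # duration of the scan itself

# Throughput of each transfer path, in Kbytes per second.
THROUGHPUT_KBYTES_PER_S = {
    "converter box": 1.0,
    "hybrid interface": 300.0,
    "IEEE-488 rated maximum": 1000.0,   # assumes 1 megabyte = 1000 Kbytes
}


def transfer_time_s(data_kbytes, throughput_kbytes_per_s):
    """Seconds needed to move `data_kbytes` at the given throughput."""
    return data_kbytes / throughput_kbytes_per_s


for name, rate in THROUGHPUT_KBYTES_PER_S.items():
    seconds = transfer_time_s(SCAN_KBYTES, rate)
    # Ratio of transfer time to acquisition time shows which step is the slow one.
    print(f"{name}: {seconds * 1000:.2f} ms per scan, "
          f"{seconds / SCAN_WINDOW_S:.0f} times the scan window")
```

Even on the faster paths the transfer takes far longer than the 20-microsecond scan it carries, which is the sense in which the computer functions tend to be the slow step.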
What I want t o p o i n t o u t he re i s t h a t 1 ) t h e computer f u n c t l o n s o f a d a t a a c q u i s i t i o n experiment w i l l o f t e n be t h e slow s t e p , and 2 ) t h e m a n u f a c t u r e r s a r e n o t n e c e s s a r i l y v e r y good a t o p t i m i z l n g t h e comp u teri ze d performance o f t h e i r own i n s t r u m e n t s . Indeed, H ew i e tt-Pa cka rd , which i n v e n t e d t h e IEEE-488, o f f e r s a c q u i s i t i o n r a t e s from i t s d i g i t i z e r s v i a IEEE-488 t h a t a r e e s s e n t i a l l y t h e same as Blomation's. You would n o t expect such p e rforma n ce i f you looked
419 a t the Interface speclflcatlons rather than the instrument's u t l i l z a t i o n o f the interface. The maxlmum r a t e d t h r o u g h p u t on an IEEE-488 I n t e r f a c e i s 1 m e g a b y t e / s e c o n d ! The second k l n d o f m i s u n d e r s t a n d i n g t h a t undermines s u c c e s s f u l c o m p u t e r l z a t l o n m i g h t be c h a r a c t e r l z e d a s t h e b e l i e f t h a t p u t t l n g an A / D b o a r d i n t o a m i c r o c o m p u t e r c r e a t e s a d a t a a c q u l s l t i o n Instrument. I n r e a l i t y , t h e r e I s a g r e a t deal o f e n g l n e e r l n g t h a t goes I n t o a s t a n d - a l o n e i n s t r u m e n t . An A / D b o a r d i s o n l y one s m a l l component o f a commerclal I n s t r u m e n t . If you e l e c t n o t t o pay f o r an i n s t r u m e n t company t o s o l v e y o u r e n g i n e e r i n g p r o b l e m s f o r you, you must be p r e p a r e d t o do t h a t englneerlng yourself. The t h l r d d l f f i c u l t y e n c o u n t e r e d I n c o m p u t e r i z a t l o n stems f r o m t h e d e s l r e t o I n c l u d e u n n e c e s s a r y and h l g h l y complex r e f i n e m e n t s I n t h e system. F o r example, r e a l - t l m e d i s p l a y and a n a l y s i s o f data almost always I n t e r f e r e s w l t h t h e a b i l i t y t o acqulre the data i t s e l f . S l m l l a r l y , t h e d e s l r e t o use t h e data a c q u l s l t l o n computer f o r word p r o c e s s i n g o r d e p a r t m e n t a l bookkeeping as f o r e g r o u n d t a s k s w h l i e t h e system I s c o l l e c t i n g d a t a can J e o p a r d l z e t h e d a t a a c q u i s i t i o n p r o c e s s . T h i s paper w I I I a t t e m p t t o h e l p you d e t e r m i n e whether y o u r a p p l l c a t l o n I s s u i t a b l e f o r c o m p u t e r l z a t l o n , and, I f so, what k l n d o f c o n f l g u r a t l o n you w i l l need. 
Throughout the paper, the need for a systems approach to automation is emphasized.

2.0 A/D CONVERTERS
The first process we will consider is digitizing the data. This is the area where your analog experience is least applicable and, unfortunately, also the area filled with the most false or misleading statements in the popular literature. I will discuss digitizing by using A/D boards, not because I believe that it is the best choice, but because it is the approach most fraught with difficulties. I believe that stand-alone digitizers make much more sense in a laboratory than do A/D boards. Nonetheless, the relative cost of a waveform digitizer and an A/D board is such that many people are tempted to save some money by using the A/D board. You can view what follows as an argument on why such an approach is less attractive than it may first appear.

The primary questions about digitizing data are: How fast do I need to sample the analog stream; what resolution is necessary in the digitization for useful analysis; and what level of coordination between different sensors' readings do I need?
2.1 SAMPLING RATES AND ALIASING
Minimum sampling rate is invariably discussed in the literature in conjunction with aliasing. A typical presentation goes something like this:

"Aliasing is the phenomenon in which a high-frequency signal appears to be a lower-frequency signal, and is caused by insufficient sampling rate. The Nyquist theorem states that the sampling rate should be at least twice the frequency of the fastest waveform sampled."

While this is not an exact quote of any article with which I am familiar, the contents are functionally equivalent to almost any popular exposition that you will read on the subject. It contains a variety of errors and misleading implications.

While the first sentence approaches the truth, it requires significant expansion. Aliasing is a phenomenon that can only be expressed relative to an analysis routine. If, for example, you are
analyzing data by using the fast Fourier transform (FFT), there will be conditions under which the transform introduces a systematic error into the data. To understand the nature of that error, let us recall some of the highlights of the method. First, the FFT is a discrete method, i.e., separate points of a (presumably) continuous stream are sampled and used to reconstruct the frequency components of the continuous stream. So far, the FFT does not differ from any other digital sampling technique. Second, the FFT treats all waveforms as being constructed from some combination of superimposed sine waves. Third, N data points produce an N/2-point transform (frequency-domain output). The other half of the points are thrown away because discrete transforms of real data simply produce the mirror images of the first N/2 points. The same phenomenon that produces the redundancy in discrete Fourier transforms produces aliasing for sampling rates less than twice the frequency of the highest-frequency signal component (because then the transformed spectrum overlaps the mirror-image spectrum).

The point here is that the Nyquist theorem is not about aliasing per se, but about aliasing in the FFT. Further, the notion of a frequency component in an FFT is an abstract one. The sine wave frequency of the FFT has little to do with transients.
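The mirror-image redundancy and the resulting fold-over can be checked numerically. The following is only a minimal sketch: it uses a direct (slow) discrete Fourier transform rather than a true FFT, and the function names, the 16-point window, and the 3-cycle and 13-cycle test sines are illustrative assumptions, not anything taken from a particular instrument or library.

```python
import cmath
import math

def dft_magnitudes(samples):
    """Return |X[k]| for k = 0..N-1 using a direct (slow) discrete Fourier transform."""
    n = len(samples)
    mags = []
    for k in range(n):
        acc = sum(x * cmath.exp(-2j * math.pi * k * i / n)
                  for i, x in enumerate(samples))
        mags.append(abs(acc))
    return mags

def sampled_sine(cycles_per_window, n):
    """n equally spaced samples of a sine that completes the given number of cycles."""
    return [math.sin(2 * math.pi * cycles_per_window * i / n) for i in range(n)]

def peak_bin(samples):
    """Index of the largest component among the non-redundant bins 0..N/2."""
    mags = dft_magnitudes(samples)
    half = mags[: len(samples) // 2 + 1]
    return max(range(len(half)), key=lambda k: half[k])

N = 16
mags = dft_magnitudes(sampled_sine(3, N))

# For real input, bin k and bin N-k carry the same magnitude (the mirror image).
mirror_ok = all(abs(mags[k] - mags[N - k]) < 1e-9 for k in range(1, N))
print("mirror image holds:", mirror_ok)

# 3 cycles per window lies below N/2 and is reported in its own bin.
print("3 cycles  -> bin", peak_bin(sampled_sine(3, N)))

# 13 cycles per window lies above N/2 and folds back onto bin 16 - 13 = 3.
print("13 cycles -> bin", peak_bin(sampled_sine(13, N)))
```

With these inputs the 13-cycle sine is indistinguishable from the 3-cycle sine once sampled, which is the overlap of the spectrum with its mirror image described above.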
If you have a single transient that you have collected two points on, the FFT isn't going to give you a meaningful description of the signal. Rather, it is assumed that the transient is composed of sinusoidal frequencies that repeat throughout the sampling window and constructively interfere to form the observed transient. Consider what the maximum sine-wave component is in a square wave (or any waveform that is not a simple sine wave). While the commonly made point that you should always use a low-pass filter with a cut-off frequency not more than half the sampling rate for FFT analysis is true, it doesn't tell you what the highest-frequency sine-wave component that is significant to your data is. If anyone does not clearly understand this point, I urge them to try the following simple experiment: generate a 1 kHz square wave, filter that wave through a 2 kHz low-pass filter, and display the output on an oscilloscope. There is no substitute for experience in these things.

The point you must recognize is that there is no royal road to determining the minimum necessary sampling rate, even in the well-defined and thoroughly studied realm of the FFT. While you can and should learn what the FFT limitations are in theory, the application to your experiment depends on an analysis of what your actual waveforms look like.

Aliasing is not a problem unique to the FFT.
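The square-wave experiment suggested above can be previewed on paper before going to the bench. The sketch below assumes an ideal square wave (odd harmonics only, with relative amplitude 4/(pi k) for harmonic k) and an ideal brick-wall low-pass filter; a real filter rolls off gradually, so the bench result will differ in detail. The function name is hypothetical.

```python
import math

def square_wave_harmonics(fundamental_hz, cutoff_hz):
    """List (frequency, relative amplitude) for the harmonics of an ideal
    square wave that pass an ideal (brick-wall) low-pass filter."""
    passed = []
    k = 1
    while k * fundamental_hz <= cutoff_hz:
        passed.append((k * fundamental_hz, 4 / (math.pi * k)))
        k += 2  # an ideal square wave contains only odd harmonics
    return passed

# 1 kHz square wave through a 2 kHz filter: the 3 kHz harmonic is already cut off.
print(square_wave_harmonics(1000, 2000))

# The same wave through a 20 kHz filter keeps harmonics 1, 3, ..., 19.
print(len(square_wave_harmonics(1000, 20000)), "harmonics pass at 20 kHz")
```

Under these assumptions only the 1 kHz fundamental survives the 2 kHz filter, so the filtered "square wave" is essentially a sine wave. A cut-off chosen from the repetition rate alone says nothing about the sine-wave components that give the waveform its shape.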
You have probably seen cowboy movies in which wagon wheels appear to rotate backwards while the wagon moves forwards. The cause of this is that the rate of sampling of the movie camera relative to the symmetrically equivalent positions of the wagon wheel can most directly be interpreted by our perceptual apparatus' "algorithm" by assuming that the wheel is spinning in reverse. You can readily reproduce this kind of phenomenon in the laboratory if you have a waveform digitizer and a sine wave generator. Set the sampling rate at about 1/10 the generator's frequency, then fine-tune the frequency generator until you see what appears to be a pure sine wave. What is interesting about this experiment is that this kind of aliasing is very frequency-sensitive. A very small adjustment in the frequency generator will make the digitized output look like garbage.

The FFT, on the other hand, will always find some set of frequencies that the data will fit. That is, any time you sample a signal containing components above the
Nyquist frequency and analyze the data with an FFT, you will produce meaningful-looking but false output. For this reason, band-pass filtering is more important with FFT than with most other analytical methods.

With methods of analysis other than an FFT, aliasing may occur only for a very select number of very narrow bands. This fact is exploited in some A/D systems to allow you to track continuous signals that are at a much higher frequency than the sampling rate. For example, the Hewlett-Packard company sells an inexpensive digitizer that samples at 25,000 samples/second but will track a continuous signal of up to 5 MHz. In order to accomplish this, the digitizer incorporates a very fast sample-and-hold circuit that samples the signal at random intervals for very brief periods. I must confess that I don't know what the algorithm is for reconstructing the waveform, but I know enough to be worried about it. The first problem of such a sampling method is that the waveform must be rigorously periodic. A damped sine wave, for example, cannot be analyzed by such rare and random sampling. The second problem is the occurrence of that "very select number of very narrow bands" of aliasing. I have not seen the HP digitizer malfunction, but I would expect that any such sampling technique would have to have a worst-case set of inputs under which it would.
The point here is not that Hewlett-Packard is selling a faulty product, but that you must know the limitations of your equipment and that those limitations may not be immediately obvious.

We are still left without an answer as to what the minimum sampling rate required for an experiment would be. Answering this question in the abstract is always dangerous. However, I will suggest guidelines that I believe to be reasonable. First, the problem of aliasing should never arise in an actual experiment. You must sample at a rate much higher than necessary to avoid aliasing in order to have reliable data. A good rule of thumb is that no formal analytical technique is better than your eye. If you can't tell what you want to know from a plot of the raw digitized data, a numerical method of analysis probably can't either.

As you read the literature on sample rates, you will discover that there are two different schools of thought on how many samples per period is enough. One, the computer science school, argues that the fewest number of points that will work is the best number of points. Their concern is that more data means more analysis time on the computer. I am reasonably confident that most scientists will not share this perspective. Clearly, the appropriate number of points is the maximum that you can get. If you have to wait two hours for the analysis to be completed, the payoff is better analysis.
As long as you have the time (and memory), there is no better way to use it than waiting for good results.

It is often supposed that one method of analysis is better than another, in the sense that it will generally give the same precision with fewer data points than another. In my experience, there is little difference in the efficiency (although a lot of difference in the applicability) of most common methods of analysis. The one method that comes to mind as generally less efficient than most is the moving average, or boxcar, method. The one method that is always the best for data smoothing applications (but not for reasons of efficiency) is signal averaging. There is no mathematical substitute for data. A good quick overview of methods of analysis is [10].
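Signal averaging, as named above, means collecting many sweeps of the same repeatable signal and averaging them point by point. The sketch below is a simulation under stated assumptions: the signal is one period of a sine, the noise is independent and Gaussian, and the sweep count, noise level, and function names are all illustrative choices rather than values from the text.

```python
import math
import random
import statistics

def noisy_sweep(n_points, noise_sd, rng):
    """One sweep of a repeatable signal (one sine period) plus Gaussian noise."""
    return [math.sin(2 * math.pi * i / n_points) + rng.gauss(0.0, noise_sd)
            for i in range(n_points)]

def signal_average(sweeps):
    """Point-by-point mean of several sweeps of the same signal."""
    return [statistics.fmean(column) for column in zip(*sweeps)]

def residual_sd(trace):
    """Standard deviation of a trace about the known noise-free signal."""
    n = len(trace)
    residuals = [y - math.sin(2 * math.pi * i / n) for i, y in enumerate(trace)]
    return statistics.pstdev(residuals)

rng = random.Random(1)  # fixed seed so the run is repeatable
sweeps = [noisy_sweep(500, 0.5, rng) for _ in range(64)]

single = residual_sd(sweeps[0])
averaged = residual_sd(signal_average(sweeps))
print(f"noise in one sweep:         {single:.3f}")
print(f"noise after 64-sweep average: {averaged:.3f}")
```

For independent noise the residual should shrink roughly as the square root of the number of sweeps, so 64 sweeps should give about an eightfold reduction. The improvement comes from having more data, not from a cleverer calculation.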
2.2 RESOLUTION

Discussing minimum resolution is rather like telling a good-news, bad-news joke. The good news is that you need fewer bits of resolution than you think that you do. The bad news is that you GET fewer bits than you think you do.

First, the good news. People commonly assume that the minimum resolution needed for research-quality work is 12 bits. In reality, publishable-quality research is still done with high-quality 6-bit digitizers. At Brandeis University, one researcher in coordination complexes does quantitative work on log-linear data with a 6-bit digitizer! I wouldn't recommend that as an ideal number, but it will suffice for a lot of work. Further, most work will not require more than 8 bits of resolution, assuming that you really have a full 8 bits to work with.

And there is a real advantage to not using a digitizer larger than 8 bits if you don't have to. Many computers transfer data a byte (8 bits) at a time. If you use a 10- or 12-bit converter, you require two transfers per reading. If you use an 8-bit converter, you can transfer twice as many samples in the same period of time. The rate of transfer of data is often the limiting factor in how many samples per second you can make with your A/D converter. You will generally be better served making more conversions per second than more precise conversions.
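The transfer penalty described above is simple arithmetic, sketched here for an 8-bit bus. The 100,000 transfers/second figure is an assumed round number for illustration, and the function names are hypothetical.

```python
import math

def transfers_per_reading(converter_bits, bus_bits=8):
    """Number of bus transfers needed to move one reading to the computer."""
    return math.ceil(converter_bits / bus_bits)

def max_samples_per_second(transfers_per_second, converter_bits, bus_bits=8):
    """Upper bound on the sample rate when bus transfers are the slow step."""
    return transfers_per_second // transfers_per_reading(converter_bits, bus_bits)

# Assumed bus capability: 100,000 transfers/second (illustrative value only).
for bits in (6, 8, 10, 12):
    print(bits, "bits:",
          transfers_per_reading(bits), "transfer(s),",
          max_samples_per_second(100_000, bits), "samples/second")
```

On this model a 6-bit and an 8-bit converter cost the same on an 8-bit bus, while 10 or 12 bits halve the achievable sample rate.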
The averaging of noise that comes from additional readings normally will be more useful than having a very accurate record of the noise. On the other hand, since you will have to wait the same length of time for a 6-bit transfer and an 8-bit transfer on a computer with an 8-bit bus, you might as well get the added resolution. Similarly, if your computer has a 16-bit bus, you might as well use a 16-bit converter (unless the board will support two 8-bit transfers at once). The point is to avoid paying a speed penalty for resolution, not to avoid resolution at any cost.

Now for the bad news. If you read the manufacturer's spec sheet on an A/D board, the resolution will invariably be reported as +/- 1 LSB (least significant bit) or better. For example, a 12-bit board that is designed to read 0-10 V commonly will be said to have a "resolution" of 0.0049 V. However, the effective resolution of the board will be orders of magnitude worse than that in most applications. Clearly, what you care about is what the board will really do. It is extremely rare for an A/D board manufacturer to tell you that. In my experience, it is extremely rare for an A/D board manufacturer to even know what the effective resolution of his board is.

While many things will affect the resolution of an A/D board, there is only one that you can readily do anything about.
Boards that plug into a computer's expansion slot are subject to the electromagnetic field of the computer's power transformer. This effect will be especially pronounced if you are trying to read small voltages, as with a thermocouple. It is also significantly affected by the choice of slot on the computer. In general with this kind of board, you should use the slot farthest from the power supply for the A/D board. To avoid this effect, some A/D manufacturers place their A/D circuitry in a box external to the computer. If the box does not have its own power supply, this can be an effective strategy. If it does, you must consider whether the external box was designed to shield the A/D from external fields or simply to provide more real estate for the A/D product.
The single largest source of error in A/D comparisons is not a function of faulty electronics, but of design choice. Most inexpensive A/D boards do not use sample-and-hold circuitry on the analog input end. In order to understand what effective resolution can be expected from a board, you must understand what the consequences of this design choice are. I will explain this by example.

Consider a 12-bit A/D comparator that can make 100,000 conversions/second and is set for 0-10 V measurements. For the sake of simplicity, let us assume that we are trying to track a triangular wave with a peak-to-peak voltage swing of 10 V, i.e., a wave that covers the full scale (fs) of values for the comparator. Comparators usually work by "successive approximation," which means that they compare the signal to 5 V, and if the signal is larger, set the most significant bit to 1, then set the next bit and so on through all twelve bits. One full comparison takes approximately 1/100,000 of a second. If the input signal is allowed to change as we are comparing it, the input voltage must change no more than 0.0049 V (the value of the LSB in this example) in 1/100,000 of a second in order to have 12-bit resolution, assuming ideal electronics. A triangular wave of 10 V goes through a 10 V change in 1/2 its period.
Thus, the maximum frequency we could track with 12-bit resolution is:

100,000 readings/sec / (2048 divisions/fs swing * 2 fs swings/cycle) = approximately 25 Hz.

Make sure that you understand this point. It is seldom recognized, but absolutely critical to evaluating the limits of precision for an A/D board of this type. If you only need 8-bit resolution, plug 256 into the formula instead of 2048 and grind it out. Understand that this value is an ideal limit of precision. It assumes the best possible waveform, changing with absolute linearity, and it assumes ideal electronics on the A/D board.

To increase the resolution of the A/D at higher speeds, some companies use sample-and-hold (S/H) circuitry. What an S/H does is take a quick reading of the analog voltage and store it until a conversion can be made. What holds the voltage is a simple capacitor. The errors to which this kind of circuit is heir are thus those associated with any RC circuit. Let us briefly indicate the major potential problems. First, the capacitor may leak (droop), that is, it may slowly lose a charge that it is trying to store. The second problem is associated with charging time. The RC circuit must be exposed to the analog voltage for a reproducible period of time at regularly spaced intervals. The deviation from regularity is called jitter.
While it is charging, the analog signal must be essentially constant, or the same average value for a signal that is increasing and a signal that is decreasing will not be stored as equal. This phenomenon is called hysteresis. Next, the comparator must not exert significant load on the RC circuit while it is making its comparison, or else the RC circuit will discharge while being read. Finally, the S/H circuit must have adequate time to discharge to below 1 LSB before re-sampling, or the voltage read will be partially due to a residual charge from the last sample. This phenomenon is called memory. Most of these problems should be the concern of the A/D manufacturer, so assuming that he has been careful in his board design (a heady assumption), you need give thought only to the RC time constant of the S/H circuit.

Before discussing how to evaluate the time constant, I want to point out a general fact of instrumental life. If you do not need an S/H, e.g., if you are only trying to track a 25 Hz wave with a 12-bit 100,000 Hz board, you are better off without
it. All electronics introduce errors of their own into your data. The fewer electronic gadgets between you and your data, the better.

Assuming that you need to track a faster signal than can be accomplished without the S/H circuitry, how do you calculate its response time? Again, the method will be illustrated by example. Assume ideal electronics, a 10 V triangular wave, and a 12-bit, 100,000 Hz converter. The time constant for an RC circuit is simply R times C. Steve Ciarcia of BYTE was kind enough to point out to me that most S/H circuits are CMOS devices, so their resistances will be about 400 ohms. The capacitor value can be read from the board itself. Typical values are in the .001 to .01 microfarad range. Let's assume that our S/H capacitor value is .01 microfarads. Then the time constant will equal 400 times 1x10^-8, or 4x10^-6 seconds. This value represents the time it takes for the circuit to gain or lose 63.2% of its charge. To determine the time to drop below 1 LSB, we multiply 4x10^-6 times ln(2048), which gives a time of approximately 3x10^-5 seconds. That is, the S/H can charge or discharge fully approximately 32,000 times a second. Remember that it must do both for each reading. (This is a bit of an oversimplification. The capacitor does not have to charge for this long. However, if we use this approximation, we do not have to concern ourselves with hysteresis.) So the S/H circuitry can complete a full charge/discharge cycle to support 12-bit precision 16,000 times a second. Of course, this does not include the time for the actual conversion. However, even when we want 12-bit precision, we seldom care about having 4096 readings per wave, so we can neglect the conversion time as irrelevant to our purposes.

In essence, an S/H acts as an integrating circuit and introduces the kind of smoothing you would normally expect of such circuitry. As mentioned above, you will usually want at least 10 or 12 samples per period, so this hypothetical S/H would provide between one and two orders of magnitude improvement over a free-running converter. Again, I emphasize that these are calculations for an ideal system. The point that needs to be recognized is that a 100,000 sample/second converter is not designed to make 100,000 conversions a second. A good discussion of S/H circuitry and many other aspects of A/D conversion can be found in [a].
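The two worked examples in this section, the free-running tracking limit and the S/H time-constant estimate, can be reproduced in a few lines. This is a sketch of the text's own arithmetic under its ideal-electronics assumptions: 400 ohms, .01 microfarads, and 2048 divisions are the figures used above, while the function names and the choice of 12 samples per period are illustrative assumptions. Note that a 12-bit converter has 4096 codes; substituting 4096 for 2048 would halve the free-running figure and lengthen the settling time slightly.

```python
import math

def free_running_limit_hz(conversions_per_second, divisions, swings_per_cycle=2):
    """Highest triangular-wave frequency a converter without S/H can follow
    while the input moves no more than one division per conversion."""
    return conversions_per_second / (divisions * swings_per_cycle)

def sample_hold_cycle_rate(resistance_ohms, capacitance_farads, divisions):
    """Full charge-plus-discharge cycles per second for an ideal RC S/H stage."""
    tau = resistance_ohms * capacitance_farads   # RC time constant, seconds
    settle = tau * math.log(divisions)           # time to settle within one division
    return 1.0 / (2.0 * settle)                  # one charge and one discharge per reading

DIVISIONS = 2048          # figure used in the text for the 12-bit example
SAMPLES_PER_PERIOD = 12   # upper end of the 10 to 12 samples per period suggested above

no_sh = free_running_limit_hz(100_000, DIVISIONS)
cycles = sample_hold_cycle_rate(400, 0.01e-6, DIVISIONS)
with_sh = cycles / SAMPLES_PER_PERIOD

print(f"free-running limit: {no_sh:.1f} Hz")
print(f"S/H cycle rate:     {cycles:.0f} per second")
print(f"S/H signal limit:   {with_sh:.0f} Hz ({with_sh / no_sh:.0f}x the free-running limit)")
```

The ratio printed on the last line falls between one and two orders of magnitude, in line with the estimate given above for this hypothetical S/H.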
2.3 COORDINATION AND CONTROL

There is more than one reason to use S/H circuitry. Besides increasing the precision of the conversion on fast signals, it can be used to coordinate readings. Typically, we want to collect correlated data on two or more sensors in an experiment. For example, we may want to measure temperature versus pressure for a system. A/D boards appear to offer a convenient way of doing this. Typically, they will provide 16 analog inputs on one board. Surely, we can simply use two lines of the A/D board and obtain our correlated reading. As you probably suspect by now, the answer is, "not necessarily."

A/D boards generally use a multiplexer to read the 16 different lines. What that means is that one comparator sequentially services each of the (up to) 16 lines that you use in an experiment. Thus, the readings will never be simultaneous. Further, they will not even be as closely spaced as the comparator conversion rate. To see why, we must look at how an A/D board handles data.
Once the comparator has made an A/D conversion, it passes the conversion through to the circuitry that presents the digitized information to an input/output (I/O) port on the computer. The computer must pick up the reading at the I/O port before a new reading can be made by the comparator. If this were not done, the reading for the next A/D line would overwrite the last reading, and you would not know which line a reading was from.

Most often, the computer I/O function is the slow step in A/D operation. I will have more to say on this when we discuss the computer, but for now we will content ourselves with the recognition that on an IBM PC, the maximum reading rate for I/O is approximately 100,000 bytes/second using DMA. Since the A/D waits for the data to be read before beginning a new conversion, the maximum throughput with a 100 kHz converter is approximately 50,000 bytes/second. With a 12-bit comparator, which needs two bytes per reading, that means that we cannot read more than 25,000 samples/second. An Apple II will be even slower. Thus, the closest to simultaneity that we could get with two A/D lines and ideal components is a 40-microsecond separation. If we wanted 1000 correlated data pairs/second, this value would represent a minimum of a 4% systematic error in the time axis. Again, this value is a best-case number. Most programming you do will not be optimal.
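The chain of estimates above (I/O rate, effective byte rate, sample rate, channel separation, timing error) can be written out step by step. The sketch follows the text's arithmetic exactly, including its halving of the DMA rate to account for the converter waiting on each read; the function and parameter names are hypothetical.

```python
def channel_skew(io_bytes_per_second, bytes_per_sample, pairs_per_second):
    """Best-case separation between two multiplexed channels and the resulting
    fractional timing error, following the estimate in the text."""
    # The converter waits for each read before starting again, so the text
    # takes the usable byte rate to be half of the raw DMA rate.
    effective_bytes_per_second = io_bytes_per_second / 2
    samples_per_second = effective_bytes_per_second / bytes_per_sample
    skew_seconds = 1 / samples_per_second
    # Fraction of the interval between data pairs taken up by the skew.
    error_fraction = skew_seconds * pairs_per_second
    return samples_per_second, skew_seconds, error_fraction

# IBM PC figures from the text: 100,000 bytes/second by DMA, 12-bit readings
# (two bytes each), 1000 correlated pairs per second.
samples, skew, error = channel_skew(100_000, 2, 1_000)
print(f"{samples:.0f} samples/second")
print(f"{skew * 1e6:.0f} microseconds between channels")
print(f"{error * 100:.0f}% of the interval between pairs")
```

Changing `pairs_per_second` shows how quickly the timing error grows: the same 40-microsecond separation is a far larger fraction of the interval at higher pair rates.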
If you program in a high-level language, the time I/O functions take to execute may be two orders of magnitude greater than the optimal rate.

One way to minimize this error is with S/H circuitry that samples all channels at once and holds each until the conversion is completed. The only problems with this approach are the problems that we have indicated before with S/H circuits. In general, it is a good, clean answer to the problem.

Another solution provided by some of the more expensive A/D products is on-board storage of converted data. These products function rather like low-end waveform digitizers. They will store a "sweep" of a few thousand samples and download the entire data set to the computer after data collection. This approach is not bad, but the level of coordination is still limited by the rate of conversion of the A/D comparator. While you may shrink the delay between conversions, some delay is still there. The major virtues of on-board memory are that it can increase reading rates and, in conjunction with an on-board timer, can improve the reproducibility of conversion intervals, as we shall discover presently.

A point worth mentioning with respect to control of computerized data acquisition is that A/D board triggers are not like oscilloscope triggers. Some experiments cannot be done on A/D boards because of this, so I want to explain how these triggers work.
Oscilloscope triggers allow you to set the voltage level and direction of motion of a signal. For example, you can trigger the scope on the falling edge of a 3 V signal. In some experiments, this capability is very important. Unfortunately, A/D board external triggers do not have this capability. The triggering is effected by voltage level only. The direction is not selectable.

Closely related to these issues is the matter of coordinating triggering of simultaneous events. There are many ways you can do this, including: outputting a voltage on a D/A line, setting a digital I/O line high, or outputting a command to a stand-alone instrument through a digital interface. The question is how much control you need. What makes control difficult is that you never know what simultaneous means until you set up your experiment. The electronics of each instrument and the cable lengths of the
426 p a r t i c u l a r s e t u p a f f e c t c o o r d i n a t i o n i n ways t h a t a r e b e s t de te r m i n e d e x p e r i m e n t a l l y . T h e r e f o r e , you want t o be a b l e t o f i n e - t u n e your c o n t r o l from t h e i n s t r u m e n t s th e mse l ve s. For example, t h e p r e t r i g g e r and delayed t r i g g e r f u n c t i o n s on your waveform d i g i t i z e r s i m p l i f y c o o r d i n a t i o n tre me n d o u sl y. Some A / D packages s u p p o r t delayed t r i g g e r i n g b u t , t o t h e b e s t o f my knowledge, none s u p p o r t p r e t r i g g e r i n g . 3 . 0 THE COMPUTER
The considerations that go into what computer configuration you will need include: choice of interface to the digitizing equipment, support chips you will need, and language you will use for programming. The choice of interface is probably the most important, so we will begin with that.

3.1 INTERFACES
When you decide to computerize a successful experiment (and you should never try to computerize an experiment that you don't already fully understand), the first thing you should look for is ways to use the equipment you already have. The reasons for this are obvious. First, you already understand the equipment and know that it will do your job. And second, you have already bought that equipment, so you can save money by using what you have.

You should begin by determining whether the instrument you are using is equipped with an interface already. Some older and many newer instruments will have an RS-232, IEEE-488, or Centronics port on them as standard equipment. If you are lucky enough to be blessed with such an instrument, your route to computerizing has been decided for you. If your instrument does not have a digital port on it, find out if the manufacturer still supports the model. If so, you may be able to retrofit a digital port to it. Again, your route to computerizing is then clear.

If neither of these conditions obtains, your task is more difficult. You should begin by reading the manual of the instrument. What you are looking for is a simple entry for an interface. For example, if your instrument has a Nixie-tube display, you may be able to wire up a binary-coded decimal (BCD) interface to the computer. Remember that, if the instrument has a digital readout, it has digitized the data at some point.
Your task is to find out where and determine whether the digitizing code is sufficiently close to a standard to permit you to use an off-the-shelf interface.

In order to approach this task intelligently, you need to know what the standard D/D (digital to digital) interfacing options are. There is an excellent overview of the various flavors of interfacing boards that was run as a six-part series in BYTE a couple of years ago [8]. This has been revised and reprinted as a book that should be easy to obtain [9]. Read one or the other of these before you peruse the instrument manual. You won't be in a position to actually do the interfacing from this series, but you will know whether an interface may be applicable to your task. Once you have selected a strategy, you can research the details of the interface to complete the task. You should also realize that not all computers support all interfaces. Make sure that the computer you buy supports the interface you intend to use.

An alternative to this approach is to do a literature search and see if someone else has interfaced your instrument to a computer already. The Review of Scientific Instruments, for example, runs digital applications notes each issue. If you can find someone else who has already solved your problem, do what they did.

Failing these, you can run an A/D converter to the chart-recorder output of an analog instrument as a quick and dirty way of computerizing. This is not a bad approach, although you must observe all the warnings on A/D converters presented above.

The point of all of this is that using the equipment you have is probably the safest, most cost-effective way of computerizing. If none of the above options obtain, you have two choices: buy an A/D converter and computerize from scratch, or buy new stand-alone equipment. The second option is much more expensive, but also much less problematic.

If you have the luxury of buying all new equipment, make sure that it is equipped with an IEEE-488 interface. This interface was designed for laboratory applications. As I have argued elsewhere [4], the IEEE-488 is vastly superior to any other for laboratory uses.

If cost is a serious limitation and speed is not critical, an attractive alternative to IEEE-488 interfacing is the HP-IL. This is a low-cost serial interface developed by HP (and available on only HP products) that contains many of the features of the IEEE-488, albeit in slow motion. You can even use a Hewlett-Packard hand-held calculator for the "computer" with this interface. More information on this option is presented in [7].
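The BCD route mentioned earlier in this section can be illustrated with a short sketch. It is written in Python rather than the BASIC discussed in this paper, and the function names, the digit ordering (most significant digit first), and the assumption that each display digit is available as a separate 4-bit value are all hypothetical choices made for illustration. The instrument manual, not this sketch, determines the actual wiring and ordering.

```python
def decode_bcd_digits(nibbles, decimal_places=0):
    """Convert 4-bit BCD digits (most significant first) to a number.

    Hypothetical helper: assumes each display digit has already been
    read from a parallel port and separated into its own 4-bit value.
    """
    value = 0
    for nibble in nibbles:
        if not 0 <= nibble <= 9:
            # Codes 10-15 are not valid BCD digits; they often indicate a
            # wiring fault or a digit latched while the display was changing.
            raise ValueError(f"invalid BCD digit: {nibble}")
        value = value * 10 + nibble
    return value / (10 ** decimal_places) if decimal_places else value


def split_packed_byte(byte):
    """Split one byte holding two BCD digits into (high digit, low digit)."""
    return (byte >> 4) & 0x0F, byte & 0x0F


# A four-digit display showing 1 9 8 5 with two decimal places
reading = decode_bcd_digits([1, 9, 8, 5], decimal_places=2)
```

Rejecting codes above 9 is a cheap sanity check on the wiring: a correctly latched BCD digit can never produce them.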
3.2 SUPPORT CHIPS

Support chips are parts of the computer other than the microprocessor that increase a computer's performance by removing some specialized task from the list of things that the microprocessor has to do. There are two major support chips that are important in computerized data acquisition.

First is a direct memory access (DMA) controller. What a DMA controller does is pick up information from one part of the computer and place it somewhere else. For example, if you are collecting data from an I/O port and storing it on disk or in main memory, the DMA controller may be used to perform this task at the maximum rate the computer can support. My own prejudice is that any computer that lacks a DMA controller does not belong in a data acquisition environment. With a DMA controller, you can program in any high-level language that allows you to access I/O ports (e.g., BASIC's OUT command) and memory locations (e.g., BASIC's PEEK command) and achieve data acquisition rates equal to fully optimized assembly-language acquisition routines.

There is one limitation on DMA controllers that is important in A/D applications. A DMA controller can access only one port location at a time. If your A/D board has more than 8-bit resolution, it may use two ports to output data to the computer. If so, you cannot use DMA. However, some 12-bit A/D boards (e.g., Data Translation products) multiplex the two-byte output to make it available to the same I/O port so that they can support DMA operation.

The second support chip that may be of value to you is a numerical co-processor (NCP). If you need to do a lot of number-crunching on your data, an NCP can speed the turn-around time by as much as two orders of magnitude. If you are doing FFTs on large data sets, for example, you will probably want this capability. The major thing to watch out for with respect to NCPs is that many computers will support them, but the languages on the computer will not use them. For example, Microsoft BASIC and FORTRAN on the IBM PC will not use the 8087 even if it is installed.

A third kind of support chip that can be of use in A/D contexts is a programmable interval timer (PIT). This chip keeps track of timing intervals within a computer. I don't emphasize its use because you are generally better served by an A/D that has its own timer for sample intervals. The problem with using the computer to keep track of time is that a variety of housekeeping functions in the computer may affect the PIT's operation. The computer's timer is designed for use by the computer, not for use by peripherals requiring high resolution of time. An excellent discussion of the kinds of problems associated with the IBM PC timing functions is [11].

The problems associated with the PIT are significant to data acquisition generally. There is a difference between computer time and real time. If you use a computer to time your data acquisition, you will get data with equal CPU time spacing. But computers do a lot of housekeeping operations that generate interrupts. What this means is that periodically your application program is put to sleep while the computer, e.g., updates its time-of-day clock.
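One way to see whether housekeeping interrupts are disturbing a software-timed acquisition is to stamp each sample against an independent clock and look at the spread of the intervals. The sketch below is in Python rather than BASIC; the function name and the example timestamps are hypothetical, and it assumes the timestamps come from a reference that is not itself subject to the computer's interrupts.

```python
def interval_jitter(timestamps, nominal_interval):
    """Return (worst_deviation, intervals) for a list of sample times.

    timestamps and nominal_interval must share one time unit.
    """
    if len(timestamps) < 2:
        raise ValueError("need at least two timestamps")
    # Spacing between each pair of consecutive samples
    intervals = [later - earlier
                 for earlier, later in zip(timestamps, timestamps[1:])]
    # Largest departure from the intended sample spacing
    worst = max(abs(dt - nominal_interval) for dt in intervals)
    return worst, intervals


# Hypothetical data in microseconds: nominal 1000 us spacing, with the
# fourth sample delayed by 300 us (e.g., by a clock-update interrupt).
times_us = [0, 1000, 2000, 3300, 4000]
worst_us, intervals_us = interval_jitter(times_us, nominal_interval=1000)
```

Note that a delayed sample produces one long interval followed by one short one, so a single interrupt distorts two adjacent intervals in the record.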
It may seem that you could solve this problem by disabling the system interrupts. Unfortunately, you cannot. There are two flavors of interrupts in a computer, maskable and non-maskable (NMI). While you can, and usually should, disable the maskable interrupts during data acquisition, you cannot disable NMIs. Further, you will probably not be able to find out what causes an NMI on your computer. In some computers, any keyboard input will generate an NMI. But you will not find that out by reading the manufacturer's documentation. Compounding the problem is that any software manufacturer may invoke an NMI for any reason that he sees fit.

Further, the fact that the computer may be occupied with housekeeping functions when data is ready means that, even if you time your data acquisition intervals external to the computer, your data may be unevenly spaced because it is downloaded at unequal intervals. To avoid this problem, some manufacturers make A/Ds with buffer memory as well as external timers. While this approach makes perfect sense, the cost of this kind of setup is typically three to five thousand dollars. For not a whole lot more money, you can get a full-function stand-alone digitizer.

There are many advantages to stand-alone digitizers.
They can be set with front-panel controls like an oscilloscope instead of only by programming, so you know what the digitizer is set to do because you can see the settings on the front-panel dials. As mentioned before, the triggering options of waveform digitizers are superior to A/Ds. The range of scan rates and voltage gains tends to be much larger and more finely adjustable. And if you must have a real-time display of data, you can connect an oscilloscope to the digitizer's analog output and view the scan without interfering with the acquisition function.

One other support chip needs to be mentioned: the programmable interrupt controller (PIC). I will withhold discussion of the PIC until the section on programming languages.

The kind of information that you need to know for laboratory interfacing tends to be very specific to the individual computer, and available (if at all) only in articles and books published by individuals who have worked with the system. For example, the IBM PC Technical Reference Manual gives no information on accessing or programming the DMA controller. It just doesn't occur to programmers or businessmen that anyone outside the manufacturer has any use for this information. However, there are various books on the market that do address this question for the PC. One good example is [1], which was written by one of the original designers of the IBM PC. The point of this is that you should probably avoid computer system clones in the lab, rather than risk their having address space or support chips that differ from the system described in the literature.

If a PC is simply too slow for your application, any of the Versabus or VMEbus 68000 systems will provide an order of magnitude improvement in performance. However, the degree of difficulty in putting your application together will also be increased by an order of magnitude. This is partly because there are fewer people writing books and articles on applications for these systems and partly because there are fewer companies supplying boards for these systems.

3.3 PROGRAMMING CONSIDERATIONS
I look on programming as a necessary evil. The goals of programming are twofold. First, you want to be done with it as quickly as possible. And second, you want to achieve the level of control that you had before you computerized the operation. Unfortunately, these goals are not complementary.

One way to lessen the time spent programming the control of data acquisition is by buying a driver program for your acquisition system. A driver is a program that sets the operation of a device for you when you invoke (supposedly) ordinary-language commands. For example, the driver may let you set the rate of conversion on line 1 of the A/D board to 1000 conversions/second by saying something like, "SET.RATE(1,1000)." The alternative to this might be outputting a series of hexadecimal numbers to a given port.

In principle, the idea of canned drivers is very attractive. In practice, the programs tend to be unnecessarily slow and filled with bugs, and to produce unreliable data. Further, they will often not support the operations you want to perform on the A/D board.

The value of drivers for D/D interfaces is somewhat higher. Most IEEE-488 board manufacturers, for example, will supply assembly language drivers for their boards that perform reasonably well. The programs are often not adequately debugged, however, so you should make sure that the source code is provided with the package.

3.4 CHOICE OF LANGUAGE
If you are going to write your own programs, what language should you use? While everyone seems to have their own preferences on this, I believe that BASIC is by far the best choice for scientists. The virtues of BASIC are that it can be interpreted while developing a program to ease debugging and then compiled for (some) speed when the program has been debugged, it provides access to ports and support chip registers with the INP and OUT commands, it provides access to memory locations by the PEEK and POKE commands, and it can be mastered in a week or two. I should mention that the computer cannot be mastered in that time, but you will know enough BASIC that the language is not what will be preventing you from doing something. The slow step is learning where the manufacturer put the support chips in the computer's address space, figuring out what port location your A/D or interface uses, what the cryptic interface or A/D documentation means, etc. The language will not be the problem. And that is all you can reasonably expect of a language.

Most commonly, BASIC on a microcomputer means Microsoft BASIC, so we will begin by discussing it. There are a few major flaws in Microsoft BASIC. These include: it is slow, has very limited dynamic range (approx. 10^-37 to 10^37, which is insufficient for solving a reasonably large matrix by pivotal condensation), and can only address a total of 64 Kbytes combined program and data space, even if your computer has ten times that available.

Although BASIC is not a fast language, there are simple ways of obtaining adequate performance for your program. These include: using the support chips intelligently; optimizing the object code of your BASIC compiler output by, e.g., keeping intermediate values of variables in registers instead of shuffling them back and forth to main memory; and not overburdening your program with needless tasks.

A number of new BASIC implementations have recently appeared on the market that attempt to redress some of the limitations of Microsoft BASIC. Three that are worth mentioning are Better BASIC, True BASIC, and MTBASIC.
I have not used any of these BASICs, so I cannot recommend them. However, they have some features that may be important to your work. Each of them can use the full amount of memory on your computer, support the use of the NCP (MTBASIC only with the $79.95 version), and provide a dynamic range of at least 10^-99 to 10^99. On the negative side, none of these languages come in interpreted versions. Further, True BASIC lacks the INP and OUT commands.

MTBASIC and True BASIC include one other feature: interrupt handling. I am not a fan of interrupts, however. Interrupts are used when you want the computer to do some task while it is waiting for some other task to be completed. For example, if you are collecting data at a relatively slow rate, you might want to have the computer plot a graph of the data it already has collected while waiting for more. When the new data point is ready, the data acquisition device will signal the computer that more data is available, i.e., it will interrupt the plotting function to perform the data acquisition function. What I find distasteful about this procedure is that it completely ignores the relative importance of the two functions. Your first concern should be to get the data, and get it right. We are not offended that an oscilloscope just sits and waits for a trigger.
We should not be alarmed that a microcomputer, which is no more expensive than a decent scope, is no more industrious. If your data acquisition rate is so slow that the computer can completely plot the data before the next point is collected, you don't need interrupts. If you need interrupts to accomplish two tasks, you shouldn't be doing both tasks on one computer.

My concerns are not purely philosophical. The overhead incurred by interrupts can be variable, depending on what instruction the computer was working on when it was interrupted. This uncertainty will manifest itself in one of two ways. First, the uncertainty caused by the timing problems already discussed will be further exacerbated. Or, if you try to force regularity in the time base by triggering off a free-running external timer, you may miss a data point entirely.

Further, the amount of time it takes to service an interrupt is not insignificant. For example, the IBM PC requires over 80 clock cycles to handle the bookkeeping associated with an interrupt. If you are simply trying to plot an incoming data point on the CRT, an assembly routine can achieve that inline in about the same time that it would take to service the interrupt.

The great advantage of using a single routine for data acquisition is that you normally debug and verify programs independently. If you try running two properly debugged routines in tandem, you run the risk of introducing a new set of errors caused by the interaction of the two routines. Such errors are difficult to detect.

To my mind, the only legitimate use of interrupts in data acquisition is signalling unanticipated events that are more important than the data. For example, if an instrument malfunctions, it is desirable to interrupt the data acquisition process. By the way, this kind of function is easily programmed on stand-alone instruments that support the IEEE-488 interface. The SRQ line of that interface can be tied to an IRQ line of the computer to automatically generate an interrupt signal for any contingency that you have programmed the instrument to monitor.
If you wanted to employ this kind of a setup, MTBASIC (which is less expensive than True BASIC) might be an attractive alternative to programming the interrupt-handling routine in assembly as is normally required in BASIC.

4.0 OVER-RELIANCE ON AUTOMATION
There is a tendency when people computerize to put too much faith in the computer. This generally takes one of two forms. First, it is easy to overtreat data. For example, sometimes data will be smoothed before it is analysed. The significance of statistical information on the degree of fit of smoothed data to a line is, of course, totally opaque.

The second trap is to try to make the computer do an analysis that you could better do without it. I will illustrate this point by example. In my data acquisition course at Brandeis, I assigned an experiment in which the student was to make a phase diagram of the acetamide/salicylic acid system. This is an interesting system because it forms a peritectic mixture at a mole fraction of 0.46 salicylic acid and because most of the mole fractions of the system are prone to supercooling. As a result, the cooling curves for this system are a mess. They are very easy to analyse by eye, especially if you fortify your analysis with an observation of the cloud points. But the students invariably tried to write programs to identify the break points for them instead of simply having the computer plot the points and doing the analysis by eye. It is probably possible to write an analysis routine for doing this, but none of my students was ever able to do it.
The point I wanted them to learn was that the computer can be a lot more trouble than it's worth if applied to the wrong problems. You should not bother automating anything that isn't a problem without automation. This is a more obvious point in the abstract than it is in practice.

5.0 PUTTING IT ALL TOGETHER
The last argument I want to make against the use of A/D boards is a systems argument. When you set up an experiment, you normally incorporate one instrument at a time into the setup. You validate the performance of that instrument, then add the next one and so on. This is a natural way to proceed. When you get to the level of running everything at once, the problems that are left are problems of coordination. You know that because you know that each of the components is performing as expected individually.

Using an A/D board on a computer, however, stands this process on its head. You cannot test performance of your sensors separately from testing the system. You must begin by introducing the computer into the loop. If there is a problem, and there always is, you don't know whether it is in the computer, the A/D board, the sensor, or coordination between some elements.

With stand-alone equipment that is digitally interfaced to the computer, however, you can work like you've always worked. The stand-alone instrument can measure its sensor without being incorporated into the system. After you know that each part is working separately, the problems that remain will clearly be understood as problems of computerization.

While I have emphasized the errors that computerization may introduce into a procedure, it may be that the precision lost in computerizing data acquisition is offset by avoiding the errors in your current procedures, e.g., manual data entry. Furthermore, the problems I have discussed are most critical in high-speed, high-precision work. The slower or more qualitative your work is, the less you need to worry about the dangers I have enumerated.
My purpose has been to make you aware of the walls in A/D, rather than to suggest that A/D boards have no use.

REFERENCES
[1] Bradley, David, Assembly Language Programming for the IBM Personal Computer, Prentice-Hall, 1984.
[2] Carr, Joseph, Interfacing Your Microcomputer to Virtually Anything, Tab, 1984.
[3] Clune, Thomas and Karnett, Martin, "Computer-independent IEEE-488 interface between a Biomation 8100 and a microcomputer," Review of Scientific Instruments, Nov. 1984, v. 55 no. 11, p. 1879.
[4] Clune, Thomas, "Interfacing for data acquisition," BYTE, Feb. 1985, v. 10 no. 2, p. 269.
[5] Clune, Thomas, "The IBM CS-9000 lab computer," BYTE, Feb. 1984, v. 9 no. 2, p. 278.
[6] Fenster, Samuel and Ford, Dr. Lincoln, "Salt," BYTE, June 1985, v. 10 no. 6, p. 147.
[7] Kane, Gerry; Harper, Steve; and Ushijima, David, The HP-IL System, Osborne/McGraw-Hill, 1982.
[8] Leibson, Steve, "The input/output primer," BYTE, six-part series from Feb. 1982, v. 7 no. 2 to July 1982, v. 7 no. 7.
[9] Leibson, Steve, The Handbook of Microcomputer Interfacing, Tab.
[10] Liscouski, Joseph, "Connecting computer and experiments: Noise rejection through software," Computer Applications in the Lab, Aug. 1984, v. 2 no. 4, p. 208.
[11] Smith, Bob and Puckett, Tom, "Life in the fast lane," PC Tech Journal, Apr. 1984, v. 1 no. 7, p. 63.
HIGH FREQUENCY WATER QUALITY MONITORING OF A COASTAL STREAM

NORMAN E. DALLEY, INLAND WATERS DIRECTORATE, ENVIRONMENT CANADA, 502-1001 WEST PENDER ST., VANCOUVER, CANADA V6E 2M9
ABSTRACT
High frequency monitoring of a number of water quality indicators was carried out for a one year period in a Pacific coastal stream. A computer program was written to facilitate presentation and preliminary analysis of the data collected. Application of the program to these data demonstrated a number of interesting short term variations in the indicators being monitored. This study confirms the conclusion that high frequency monitoring can be an appropriate strategy and concludes that in streams with widely varying discharge, it is the preferred approach. Several limitations in the data acquisition system being used are noted.

INTRODUCTION
This paper reports on the initial part of a study of coastal stream monitoring techniques and strategies. The purpose of the complete study is four-fold: (1) to develop a low-cost, versatile water quality monitoring system, (2) to evaluate the performance of data acquisition systems in the field, (3) to select appropriate statistical methods for analyzing and presenting the high frequency data produced, and (4) to make recommendations, based on this analysis, of appropriate strategies for the monitoring of coastal streams. This paper reports on work directed towards the first two goals.

High frequency monitoring of a selected suite of water quality indicators was undertaken at a
site chosen to be typical of coastal streams. The frequency of monitoring desired and the volume of data that would be produced dictated use of a digital data acquisition system in which physical analog signals are converted to digital information. Up to the present, monitoring of the stream has utilized a commercially available data acquisition system with a number of limitations, particularly the difficulty of altering the types of sensors being used. An inexpensive data acquisition system which will overcome the limitations is needed.

Data collected over the period August 1984 to August 1985 at 15 minute intervals indicate large variations in magnitude over very short periods for a number of the variables monitored. Some observed variations were: (1) a rapid and dramatic drop in stream pH correlated with heavy rainfall; (2) a large rapid response of water level to rainfall; (3) a significant diurnal variation of water level and pH during summer months; (4) a wide variation of temperature with a large diurnal frequency component throughout the year; and (5) significant diurnal variations of oxidation-reduction potential.

A computer program was developed to aid in data presentation and analysis. Also under development are a program to remove the types of noise encountered in high frequency environmental monitoring and methods for data analysis using existing software packages. In addition, reliable methods of data transfer from acquisition system to microcomputer and microcomputer to mainframe were implemented.

METHODS
Kanaka Creek, a tributary of the lower Fraser River in southern British
Columbia, was selected as the site for this study. The northern portion of the watershed is heavily forested mountain slopes while the southern portion is lightly populated with small farms and residential areas. This stream was chosen for several reasons: it exhibits highly episodic flow behaviour typical of Pacific coastal streams; it contains a hydrometric survey station with long term water quantity records; it is proximal to the city of Vancouver where Water Quality Branch offices are located; it is the site of a Salmon Enhancement Program (SEP) hatchery and has a hatchery manager on site twenty-four hours a day; it has power and telephone service.

Equipment was installed in the stream and a nearby pumphouse. For the past year of the study a Hydrolab 8000 data acquisition system was used. For a detailed discussion of this system see Whitfield (1984). The system included the data transmitter unit with sensors (pressure, temperature, conductivity, dissolved oxygen, pH and oxidation-reduction potential), the data control unit (logger) and the data management unit (for data transfer). The pH electrode was temperature compensated. Calibration and cleaning of sensors was carried out approximately once every two to three months using standard solutions as described in the Hydrolab 8000 instructions. The transmitter unit was enclosed in a PVC pipe which was anchored to a cement block in the stream bed. The sampling frequency was set at once every 15 min. in order to effectively sample even short term variations.
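The link between a sampling interval and the shortest fluctuation it can resolve follows the usual sampling-theorem rule of at least two samples per cycle. The sketch below is illustrative only; the function name is hypothetical and not part of the study's software.

```python
def shortest_resolvable_period(sampling_interval_min):
    """Shortest fluctuation period (minutes) that a fixed sampling interval can resolve.

    Uses the sampling-theorem rule of at least two samples per cycle.
    """
    if sampling_interval_min <= 0:
        raise ValueError("sampling interval must be positive")
    return 2.0 * sampling_interval_min


# A 15-minute sampling interval, as used at Kanaka Creek
print(shortest_resolvable_period(15))
```

This is a theoretical lower bound; in practice several samples per cycle are needed before an excursion can be described with any confidence.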
While this frequency would theoretically capture fluctuations with as short a time period as 30 minutes (see Fritschen and Gay, 1979), practically we expected to observe phenomena with excursions lasting in the order of hours as a minimum.

Data was transferred weekly from the data control unit to an IBM-PC compatible portable microcomputer (Hyperion) using the data management unit and a communications program with a terminal emulator (Dynalogic Info-Tech, 1983). Batteries were changed and the system memory was cleared of data weekly. Batteries (12 V, 20 ampere hour lead acid Yuasa or Gel Cells) were charged with a Johnson Controls 12 V charger which switched to float charge at 80% charge capacity. Data collected on microcomputer floppy diskettes were edited and transmitted to an IBM mainframe
computer at Simon Fraser University using one of several different communications programs (IN:TOUCH, 3101, Crosstalk, Kermit). The mainframe computer was used for further editing, for data analysis and presentation, and for archiving. Programs were written in FORTRAN IV or VS FORTRAN and utilized the Plot Description System of the Michigan Terminal System. Plots were produced with a QMS Lasergrafix 1200 printer or an HP7470A pen plotter.

[FIGURE 1. AES Station, Haney East: daily readings, August 1984 to September 1985.]

[FIGURE 2. Kanaka Creek at SEP Hatchery: manual gauge readings, August 1984 to September 1985.]

RESULTS
Frequency Monitoring
Many of the variables being monitored changed rapidly over short periods of time (within a few hours). Many water quality sites are monitored on a weekly or even monthly basis and would not, of course, demonstrate such short term variations. The rapid response and episodic nature of flow data for this stream is illustrated by daily rainfall (Figure 1, data from AES, Atmospheric Environment Service) and daily gauge height readings (Figure 2). The record for the month of November 1984, one of heavy rainfall in the Kanaka Creek watershed, serves to illustrate the dramatic drop in stream pH which can occur after a rainstorm (Figure 3a). On four occasions stream pH fell between one-half and a full pH unit. In two out of three heavy rainfall periods in December 1984, there was a dramatic drop from the normal pH range of 6.0 to 6.3 down to either 5.2 or 5.4 pH units (Figure 3b). In both cases the significant portion of the drop occurred during a 5-6 hour period and there was a slow (approximately 4-day) rebound to the normal pH level for that month. Data for July 1985 indicate a marked diurnal variation in pH, water level, and temperature (Figure 4). The two days chosen to illustrate this variation are typical examples of the behaviour of these variables throughout the month. Figure 5 shows that there were also periods with significant diurnal variations of oxidation reduction potential.

Equipment
The Hydrolab sensors were very reliable and remained stable for long periods between calibrations. The major problem causing loss of data or inaccurate readings was the unreliability of the batteries. Even new cells did not hold charges well, after as few as 10 cycles. Data transfer from the Hydrolab was slow (about 15 minutes for 4000 readings) and required attachment of the DMU to the data logger and to the microcomputer.

Computer Program
A computer program has been written specifically for high frequency environmental data presentation and is available on request from the author. In order to accentuate some types of errors and to provide smoothing for presentation purposes, this program will group data at the user's request, calculating means and standard deviations of the grouped data and producing plotted output of the mean and standard deviation of the grouped data. The user selects the number of data points he wishes to group. Figure 6a demonstrates a graph of true daily average temperatures calculated from 15 minute data while Figure 6b is a plot of the standard deviation of raw data grouped eight observations at a time. It should be noted that in many cases examination of the raw data presented graphically is sufficient to spot periods of equipment malfunction or otherwise unreliable data (not shown).

[FIGURE 3a. Kanaka Creek at SEP Hatchery: in situ sensor data, November 1984.]

[FIGURE 3b. Kanaka Creek at SEP Hatchery: in situ sensor data, December 1984.]

[FIGURE 4. Kanaka Creek at SEP Hatchery: in situ sensor data, two days in July 1985.]

[FIGURE 5. Kanaka Creek at SEP Hatchery: in situ sensor data, March 1985.]

[FIGURE 6a. Kanaka Creek at SEP Hatchery: in situ sensor data (daily average temperature), August 1984 to August 1985.]

[FIGURE 6b. Kanaka Creek at SEP Hatchery: standard deviation of grouped data, August 1984 to August 1985.]
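The grouping step described for the presentation program can be sketched as follows. This is not the author's program (which was written in FORTRAN); it is a minimal illustration in which the function name, the use of the sample (n - 1) standard deviation, and the dropping of an incomplete trailing group are all assumptions.

```python
import math


def group_stats(values, group_size):
    """Return (mean, standard deviation) for each consecutive full group of readings."""
    if group_size < 1:
        raise ValueError("group_size must be at least 1")
    stats = []
    # Step through the series one group at a time; a short trailing group is dropped.
    for start in range(0, len(values) - group_size + 1, group_size):
        group = values[start:start + group_size]
        mean = sum(group) / group_size
        if group_size > 1:
            # Sample standard deviation (n - 1 divisor); this choice is an assumption.
            variance = sum((v - mean) ** 2 for v in group) / (group_size - 1)
        else:
            variance = 0.0
        stats.append((mean, math.sqrt(variance)))
    return stats


# Two groups of eight 15-minute readings; the second group contains one spike.
readings = [10.0] * 8 + [10.0] * 7 + [25.0]
for mean, std in group_stats(readings, 8):
    print(f"mean={mean:.2f}  std={std:.2f}")
```

A single spurious reading barely moves the group mean but produces a clearly non-zero standard deviation, which is why a plot of grouped standard deviations tends to make such errors stand out.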
DISCUSSION
Frequency Monitoring
The high frequency data collected shows many interesting variations in water quality variables which would not have been noted had less frequent observations been made. Among the variables monitored were pH, temperature, dissolved oxygen, conductivity, water level and oxidation reduction potential. The rapid drop in stream pH is correlated with rainfall events (e.g. Figure 3). Measurements of precipitation pH have been taken for many rainstorms and some have yielded readings in the range of 4 to 5 pH units. It is thought that this may contribute to the drop in stream pH along with leaching of organic acids or other chemicals from the forest floor. The variation in water level during summer months was significant on a diurnal basis in this stream (Figure 4). Since the lowest water levels occur in late afternoon and the highest levels in early morning shortly after sunrise, what is observed may be the result of transpiration by the abundant vegetation in the basin, evaporation caused by solar heating, and/or withdrawal for use by local residents. Similarly a diurnal variation of oxidation-reduction potential was noted, possibly the result of photoactivation of various chemical species (see Stumm and Morgan, 1981). The wide daily variation in temperature during summer months demonstrates the quick response of this stream to physical environmental influences.

Equipment
A number of limitations were noted for the Hydrolab 8000 system. There was no facility for attaching other sensors, no local equipment servicing was available, the local supplier could not provide schematic diagrams, and any changes of sensor type would be of a permanent nature. The depth and conductivity sensors were not of sufficient sensitivity for the magnitudes being measured. This system could only support a fixed frequency of monitoring and a maximum of 4096 observations during an unattended monitoring interval. Alteration of the collection frequency required taking the unit apart, a difficult procedure to perform in the field. In addition, current drain was high and continual replacement of batteries was required. Another disadvantage was the high cost of a complete system (>US$20,000).
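How long a logger limited to 4096 observations can run unattended depends on what counts as one observation. The sketch below works the arithmetic under two assumptions that the text does not settle: that one observation is a complete scan of all sensors, or that each sensor value is stored as a separate observation. The function and parameter names are hypothetical.

```python
def unattended_days(capacity, interval_min, values_per_scan=1):
    """Days of logging before `capacity` stored values are used up."""
    if capacity <= 0 or interval_min <= 0 or values_per_scan <= 0:
        raise ValueError("all arguments must be positive")
    scans = capacity // values_per_scan      # complete scans that fit in memory
    return scans * interval_min / (60 * 24)  # minutes converted to days


# Assumption A: each 15-minute scan uses one observation slot.
print(round(unattended_days(4096, 15), 1))
# Assumption B: each of six sensor values uses its own slot.
print(round(unattended_days(4096, 15, values_per_scan=6), 1))
```

Under the first assumption the memory would last about six weeks; under the second it would last only about a week at 15-minute sampling.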
We are planning to test new devices which will allow for dynamic alteration of collection frequency, i.e. based on the values of variables being monitored. The devices will also contain standard RS-232C interfaces directly on the logging units, they will be built using CMOS technology for low power consumption and reliability at low temperatures, and they will collect up to 10 times as much data as the previously used equipment.
The cost of these systems with the six sensors is estimated to be about U.S.$5000 each. Future work with the new systems will focus on continued high-frequency monitoring and the selection and testing of sensors, e.g. ion-specific electrodes, for monitoring additional variables. Requirements for satellite transmission are currently being investigated and this facility will be added in the coming year as will the ability to activate automatic samplers. The actual reliability and accuracy of the systems in field use will be examined in the coming months.

Computer Program
Graphical output was chosen as the most appropriate for the large volume of data resulting from monitoring of six variables every 15 min. for one year (210,000 points).
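The quoted data volume can be checked with a few lines of arithmetic. The helper below is an illustrative sketch; its name and the 365-day year are assumptions.

```python
def points_per_year(n_variables, interval_min, days=365):
    """Number of individual values logged in `days` days at a fixed sampling interval."""
    scans_per_day = (24 * 60) // interval_min
    return n_variables * scans_per_day * days


# Six variables sampled every 15 minutes for one year
print(points_per_year(6, 15))
```

The result, 210,240 values, agrees with the rounded figure of 210,000 points given in the text.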
The computer program developed to deal with these data provides graphical output only. The user is prompted interactively for the time period to be analyzed, the type of graphical display (symbols used, presence or absence of a line connecting points) and the number of points to group for determining and plotting averages and standard deviations. Default values exist and can be chosen for most of these options. Data from one day up to several years will be appropriately handled. In addition, any format of time-series data can be handled easily providing each time point of observations is an individual record in the data file and that the day, month, and year are provided on each record. Hours and minutes will be used if provided. Data is scaled and appropriate divisions and labelling of the time axis are determined. When the appropriate number of points to group is chosen by the user, graphical output of the standard deviations serves to highlight many types of erroneous data.

CONCLUSIONS
High frequency monitoring can provide insights into the variation
of water quality indicators that would go unnoticed with monthly, weekly or even daily monitoring. It is an appropriate approach to monitoring in some circumstances. In a stream with highly variable discharge, or seasonally low flow rates, rapid excursions in pH and other variables can be expected. Assessment of changes in these variables under such conditions would require high frequency monitoring. Presentation of high frequency data is best accomplished graphically as tables of such data prove difficult to comprehend. Data acquisition devices should be flexible and adhere to standards in their input and output functions, and should have large data storage capacity and low power consumption. Reliability and downtime are of prime importance. Information on reliability should be obtained from other users prior to purchase if possible. Sensors and electronic recording devices must be chosen with care to ensure the appropriate sensitivity for the application being considered.

ACKNOWLEDGEMENTS
I would like to thank Mr. John Heaven for his invaluable help, the Greater Vancouver Regional District for their cooperation, and Bev McNaughton, Paul Whitfield and Normand Rousseau for their assistance. The views presented are those of the author and not necessarily those of Environment Canada.
REFERENCES
Dynalogic Info-Tech Corp., IN:TOUCH Communications Program Manual, 1983.
Fritschen, L.J. and L.W. Gay, Environmental Instrumentation, Springer-Verlag, 1979.
Stumm, W., and J.J. Morgan, Aquatic Chemistry - An Introduction Emphasizing Chemical Equilibria in Natural Waters, John Wiley and Sons, 1981.
Whitfield, P.H., Operation of the Hydrolab 8000 System for Collection of Water Quality Data. Yukon River Basin Study, Water Quality Working Group Report No. 5, Inland Waters Directorate, Environment Canada, Vancouver, B.C., 1984.
THE DESIGN OF A COST EFFECTIVE MICROCOMPUTER-BASED DATA ACQUISITION SYSTEM

Kiyohisa Okamura, Ph.D., Professor of Mechanical Engineering, Bradley University, Peoria, Illinois 61625

Kamyab Aghai-Tabriz, Electronic Software Engineer, Integrated Technical Systems, Valley City, North Dakota 58072

ABSTRACT
The designing of a data acquisition system utilizing a mass produced model of microcomputer is a very attractive proposition. The high volume production of microcomputers makes it quite inexpensive compared to a commercially available microprocessor-based general purpose data acquisition system. The microcomputer system provides the designer with the flexibility and versatility of high level language software while still retaining the capacity of high speed through the use of assembly language programming. It is possible for the designer to customize the data acquisition system to his/her own particular specifications.

In this paper the design and testing procedure for a data acquisition system using a Commodore 64 computer system are presented. The unique feature of hardware and software of the Commodore 64, which is one of the best selling and lowest cost computers, is utilized to simplify the actual design. Some of the design pitfalls are pointed out. The experimental results using the data acquisition system are shown.
INTRODUCTION
The microprocessor-based data acquisition plays an increasingly important role in modern instrumentation. The data acquired in a digital form can be used for various data-base analyses as well as for plotting, which used to be the main purpose for the data acquisition in the pre-computer age. An engineer or a scientist who needs a microprocessor-based data acquisition system may have to make a decision on selecting one of the following three alternatives: (1) Purchase a data acquisition system tailored to the specifications; (2) Purchase a general purpose data acquisition system commercially available; or (3) Design and construct a data acquisition system, possibly using a mass produced microcomputer system.
Alternative 1 requires the minimum effort on the user's part. He/she can specify the system to be a turn-key type. It becomes then the manufacturer's job to make the system user-friendly and fool-proof. Quite understandably, this alternative will prove to be the most expensive. In addition, the system probably may not be available immediately. Alternative 2 is the most frequently utilized. Since the system is produced in a moderate to large quantity, alternative 2 costs less than alternative 1 does and the system is more readily available to the user. However, since the system is not tailored to the user's exact specifications, he/she may have to modify the system depending on the situation. Alternative 3 is usually the least expensive. Since the base unit is a complete computer system, the user can easily develop customized software. In addition, the subroutines built in the computer as the firmware or kernel can be fully utilized. This paper presents the fundamentals necessary for the development of a data acquisition system based on a microcomputer system [1].
A Commodore 64 (C-64) microcomputer system has been selected as a base unit of this data acquisition system. This microcomputer system was the best-selling computer system at the time of the development of this data acquisition system. The price of the computer system was among the lowest while it satisfied all the requirements for a data acquisition system. It should be emphasized, however, that many other computer systems may also be well qualified as a base unit for a data acquisition system. Likewise, various hardware configurations are possible for interfacing the computer with the real analog world other than that shown in this paper. Also an infinite variety of software can be developed. However, an attempt has been made to simplify the circuitry and software as much as possible.
1. SYSTEM OVERVIEW
Fig.1 illustrates a block diagram of a Commodore 64-based data acquisition system (C64DAS) and its signal flow. The system consists of three subsystems: a Commodore 64 microcomputer system (C64S), the core of which is a Commodore 64 (C-64) microcomputer, a signal conditioning system (SCS), and an interfacing system (IFS). The entire process of data acquisition is under the control of the C-64. The purpose of the microcomputer-based data acquisition is to convert electrical analog signals coming from sensors/transducers to digital data and store it temporarily or permanently in the computer system.

[Fig.1 Commodore 64 Data Acquisition System: sensors/transducers; signal conditioning system (SCS) with signal conditioners; interfacing system (IFS); Commodore 64 microcomputer system (C64S) with Commodore 64 and disk drive unit; main frame computer IBM 3081-D24.]

In the following, each of these subsystems and its elements will be discussed.
2. SIGNAL CONDITIONING SYSTEM (SCS)
The output signal coming directly from a transducer is seldom suitable as an input to the Interfacing System (IS). The IS requires the input to be within certain voltage limits. For the system presented in this paper these limits are 0 and +5V. If the signal is too large, it must be attenuated. On the other hand, if the signal is too small, the resolution of the acquired data suffers. Therefore, the signal should be amplified. Accordingly, the signal conditioner should include an amplifier which would adjust the signal to fit in a desirable voltage range. Also, it is desirable that the amplifier be equipped with a bias control.

Another function desirable for the signal conditioner is noise reducing capability. The signal may contain noise. The source of noise varies but among the most frequently encountered is 60 Hz noise radiated from power lines. The use of shielded wire(s), with the shield properly grounded, usually reduces 60 Hz noise to a negligible level. If this scheme does not work, a differential amplifier may be used since the 60 Hz noise is essentially of common mode type. On many occasions the authors used a simple, low cost two stage amplifier shown in Fig.2, which consists of two operational amplifiers [2]. If a high performance, e.g., high input impedance and low drift, is desired, a commercially available instrumentation amplifier is recommended. (Example: Analog Devices AD500 and 600 series)

[Fig.2 Simplified Instrumentation Amplifier, with gain control and bias control.]

If the noise source is not of common mode nature, i.e. it cannot be reduced by a differential or instrumentation amplifier, and the noise level is excessive, filtering is necessary. The filtering can be accomplished either by hardware as part of the signal conditioner or by software in a computer, in the C-64 or the main frame computer. An example of a simple active filter is shown in Fig.3. The detailed procedure of the active filter design can be found in Ref's. [2] & [3].

[Fig.3 Example of Second Order Butterworth Active Low Pass Filter.]

It is important to note, however, that the filter also reduces the bandwidth of a signal. Therefore, the design is usually a compromise between the noise reduction and the bandwidth of the signal [2].

Filtering is also possible by software: a digital filter. One of the simplest digital filters is the rectangular window filter (r.w.f.). Although the r.w.f. filters noise of moderate frequencies well, it is quite ineffective in suppressing glitch type noise. The Olympic averaging filter [4] is as simple as the r.w.f. and reasonably effective in reducing glitches.
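The two software filters named above can be sketched in a few lines. The code is an illustration, not the authors' implementation: it assumes the rectangular window filter is a plain moving average and that Olympic averaging means discarding the largest and smallest sample in each window before averaging the rest. The function names and the window width are hypothetical.

```python
def rectangular_window_filter(samples, width):
    """Moving average: each output is the mean of `width` consecutive samples."""
    if width < 1:
        raise ValueError("width must be at least 1")
    return [sum(samples[i:i + width]) / width
            for i in range(len(samples) - width + 1)]


def olympic_filter(samples, width):
    """Average each window after discarding its largest and smallest sample."""
    if width < 3:
        raise ValueError("width must be at least 3")
    out = []
    for i in range(len(samples) - width + 1):
        window = samples[i:i + width]
        trimmed_sum = sum(window) - max(window) - min(window)
        out.append(trimmed_sum / (width - 2))
    return out


# A steady 1.0 V signal with a single glitch
signal = [1.0, 1.0, 1.0, 9.0, 1.0, 1.0, 1.0]
print(rectangular_window_filter(signal, 5))
print(olympic_filter(signal, 5))
```

With this input the moving average smears the glitch across every window that contains it, while the trimmed average removes it entirely. A glitch lasting more than one sample within a window would only be partly removed.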
3. INTERFACING SYSTEM (IFS)
An analog signal(s) must be converted to a digital signal(s) before being transmitted to the computer. The analog-to-digital conversion is executed by an analog-to-digital converter (ADC). If there is more than one signal, i.e. multichannel signals, they must be multiplexed. Thus, the IFS must perform two functions: multiplexing and A/D conversion. A number of IC's (Integrated Circuit) are available which integrate these two functions on one chip. However, an approach is adopted in this paper to use two separate chips: a multiplexer (MUX) and an ADC. Fig.4 illustrates an example of the design of the IFS, which consists of two low cost IC's: CD4051 (8-channel MUX) and ADC0804 (Analog to Digital Converter by successive approximation method). These two chips are less expensive than one chip which does both.

[Fig.4 Interface System for Commodore 64 Data Acquisition System: input channels, CD4051 MUX, ADC0804, user port CN2 (PB0 to PB7; pin N = ground) and port 1.]

Another advantage of the two-chip approach is as follows: Suppose that many transducers (or sensors) of the same type are used. Rather than assigning a signal conditioner to each transducer, only one signal conditioner shared by all transducers can be placed between MUX and ADC. If the signals are of differential type, CD4051 should be replaced with an appropriate equivalent chip, e.g. CD4052 which has a capability of multiplexing four differential inputs.

The resolution of an ADC
I.C. chip may be 8 bits, 10 bits, 12 bits or 16 bits. Larger bits result in a better resolution. For example, the error by an 8 bit-conversion is 0.39% of a full scale while the error by 12 bit-conversion is only 0.024%. It is important to note, however, that the resolution is related to precision, not accuracy. A data acquisition system can produce no more accuracy than the transducer to which the system is connected. Furthermore, few transducers are accurate enough to take full advantage of even 8 bits resolution. An 8 bit-converter is much faster and less expensive than a 10 or 12 bit-converter. The circuitry and software using an 8 bit-converter are much simpler than those using a 12 bit-converter. It is no wonder then that many of the nation's leading manufacturers of data acquisition systems offer mostly, or sometimes exclusively, 8 bit systems. This is especially true for high speed data acquisition systems. The situation will probably change in future.

There are three types of ADC's generally used at the present time: dual slope method, successive approximation method, and flash. They differ from each other in terms of the principle of operation and performance. The dual slope method is essentially an integral method. Since integration has a low pass characteristic, this method reduces high frequency noise. This filtering feature is an advantage and disadvantage. While a filtering characteristic is favorable for noise reduction, it limits the bandwidth of the signal. The conversion time by the dual slope method is typically a few ms. However, a very high accuracy can be achieved with the dual slope method. For example, a prominent manufacturer of strain gauge instrumentation reports an accuracy of 1 microinch/inch for a high performance strain gauge amplifier (12 bits), traceable to the National Bureau of Standards.

The successive approximation method in general executes conversion much faster than the dual slope method does. Typically, its conversion time is 30 µs.
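The relation between converter word length and resolution can be checked numerically. The sketch below assumes that resolution is taken as one quantization step out of 2^n, expressed as a percentage of full scale; the function name is hypothetical.

```python
def resolution_percent(bits):
    """Size of one quantization step as a percentage of full scale (1 part in 2**bits)."""
    if bits < 1:
        raise ValueError("bits must be at least 1")
    return 100.0 / (2 ** bits)


for bits in (8, 10, 12, 16):
    print(f"{bits:2d} bits: {resolution_percent(bits):.4f}% of full scale")
```

For 8 bits this gives roughly 0.39% and for 12 bits roughly 0.024%. These figures describe the quantization step only; the accuracy of the transducer and of the reference voltage sets the overall accuracy.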
capable oi :nethail
1Cli.i
ps c o n v e r s i o n .
ti:.
compared
m l c r i i i G bits
iftir35cj
dital
the
disadvantage
c~ailIvers.zon s p e e d . 51
fast
scci-ierter m a y b e u s e d . i-onvert1ng
At481!:ti3
chip
thls
Ir!
s'.?t:cessfv
i
(I I
c
1 5
Icwer
ctiannel
the
HUX atc, s h o w n .
!.i'G-
Hgwevet-.
to
1-F
dynamic
+lash
a
parallel
a
l i t a n d 3 are c o n n e c t e d
part
oi I : o m p l e x
The t-aiige o f
(CIFtl
since
r
d < d e c i ~ i m l : . The b i n a r y v a l u e of ! o r - a t z o n 56721.
Therefore,
#1,
a channel.
at:d Fi3i.i s? e l ~ e i r !t e s p e c t i v e l y a t selecCed
1 5
which employs t h e shows an The
are
ir.put vcitages i s limit
i s +5V
and
voltage references these pins
t-espectireiy.
1 . w h i c h is a c t u a l l y of
t h e C-64
a5 shown.
F o r example,
1Ihlghi
!:!lio?-;i a n d
if
I?
PBZ.
ther:
l(>lB,
equals
1 PR's is s t o r e d i n
memory
lf:tl(binary), Fort
example
i npcits
desired,
t o game p o r t
interiace Adapter
a r e ~ i s e dt o s e l e c t
i s
the
lZC!
of
speed
h i g n e r and l o w e r voltages:
11,
5
i nputs.
a n a f oq
a w i d e r t-anqe
?iris
channel
Flg.4
used.
r b i s w a y no additioi-ial
ccnr.ected
Fb:
CiT)C8(1).3,
Iri t h i s .=are the uppet-
be
pins
is
8
ca::
lhese
the
the i n m p a r a t o r s .
a
at-tained
a l o w cost c h i p ,
iar
1inl:t
a! e n e r e s s a r , . ' .
has
pasf;
preferred.
employs
a s i a s t a 5 t h e t-ise t i m e of
i ~ ! n : t e d b v VDD and VEE.
the
a
for
required,
Sirice t h e c o n v e r t e r
dpp'O::Ima-klon m e t h o d
tu
this
E m . paper
r c c i i t r *?
cvrineat.ed
is
lowers
alsc:
acquisition
ronvet-sion
;Matsushital
iiegasampf E;s;sec
I+
filter-
low
method u . t i 1 izing m a n v r e s i s t o r s a n d i o m p a r a t o r s t
c ~ ~ : v e t - s : o n 5Dee.d
.t
pass
lob4
I n cases o f d a t a
an e x t r e m e l i ;
Wt;er.
A.7
a
a
or
t h e 51-tccec-si VE; a p p r o x i m a t i o n i s g e n e r a l I y
ynai
of
c o n v e r - s i o n w h i c h w i l 1 a p p e a r a s . v i r t u a l noise.
einpluving
Eki:
does.
is a p o s s i b i l i t y o f
s l o p e method
l - t i i i ; p r o b l e m c a n b e r e s o l v e d by- s a m p i e - a n d - h o l d
filter-.
rrretiiod
E v e n a low c o s t c h i p is
major
The
slope
or
t h e c h a n n e l s e l e c t i o n c a n be e x e c u t e d
by
i n mem.10~.
s t o r i n g a c h a n n e l number
exercised,
however.
t h e C-64
Clhen
C a u t i o n must b e
is a l s o c o n n e c t e d t o t h e k ; e y b o a r d !
This p o r t
is t u r n e d o n ,
56321.
t h e bootstrapping procedure assigns
C I A #I P o r t B a s an i n p u t p o r t by poking D a t a D i r e c t i o n
at mem.lor.
(DDR)
matrix which
56.323 w i t h CJ.
makes
the
B a c t s as c o l u m n s o f
The p o r t
keyboard
work.
Therefore,
niwnber
Once
is
ttiis
comrn:.ir;ication impussible.
Ref
' 5
17J & C 1 0 7
is
keyboard through
detailed
w i t h 0 a5 p a r t o f
routine
before
information
in t u r n connected to pin 6 ,
to the of
ADC
can
one h a l f
of
of
the
reference
far- d a t a c o n v e r s i o n .
case.
R E F I T = +2.5
iliqztal
IPS
t h e program t o b e
be
back
to
obtained
the from
(:!at+, .:.?iGi ( $ - a t
I n t h i s case t h e
t h e ADC.
However.
the
ADC
is
P i n 9 c o r r e s p o n d s t o REF/2,
Here,
voltaqe.
the
reference
i .e.,
255,
is
assigned.
In
this
T h e r e f o r - e , t h e r e f e r e n c e v o l t a g e i s +5 V .
V.
t h a t +5
w h i c h is
t h e MUX,
V
corresponds
1111
to
the
maximum
converted
13318).
T h e c cw i ' v er si bn s p E E d d e p e n d s o n t h e ADC c l o c k ,
which i n t h i s
c a s e is d e t e r m i n e d b y r e s i s t a n c e a n d c a p a c i t a n c e C 6 1 .
ps5sitrle
to
s c a l e v o l t a g e t o w h i c h t h e maximum b i n a r y
riumlier
I t i i s imp!
instruction
i n p u t by u s i n g p i n s 6 a n d 7,
differential
a
means t h e f u l l
.doitaqe
and
t h e corresponding signal
ground-referenced.
is
taking
input,
w i t h pirt 7 d i s c o n n e c t e d f r o m g r o u n d . 1-e.
disabled
going
i s a u t n m a t i c a l l y c o n n e c t e d t o p i n 3, o u t p u t o f
capable
t h e channel
.
When t h e c h a n n e l has b e e n s e l e c t e d ,
input
is
t h e keyboard becomes
i t is n e c e s s a r y t u w r i t e a n
acquisition
More
the
computer
(mem.loc.56323)
~ i s e df o r d a t a
Leeyboard.
the
Hence.
t h e DDR
poke
however,
with
a
( 1 1 1 1 1111H) m u s t b e p o k e d i n t o t h e DDR.
255
done,
it
if
d e s i r e d t o u s e t h i s p o r t as a n o u t p u t p o r t t o c o n t r o l selectron,
Register
t o c o n n e c t an e x t e r n a l
I t is a l s o
clock t o p i n 4 w i t h r e s i s t o r a n d
capacitor
removed.
particular
WR
conversion
start
the
of
this
can
before
the
conversion.
INT
When t h e c o n v e r s i o n is c o m p l e t e d . signal
with
S i n c e C S a n d R D a r e b o t h h e l d low,
new d a t a i n .
!pin 3 ) a l o n e t r i g g e r s
This
speed
t h e A / D c o n v e r s i o n is a l w a y s c o m p l e t e d
t a k e s
computer
the
combination outpaces t h e a u t h o r s' software i n t h e
RC
i-e-,
C-64:
However,
!pin
5)
low.
goes
b e u t i l i z e d f o r c o m m u n i c a t i o n b e t w e e n t h e C-64
a n d t h e ADC b u t t h e d a t a a c q u i s i t i o n r o u t i n e c a n b e made c o m p l e t e withoc:t
this
connection.
executed by t h e conversion
If
microprocessor
speed
of
the
the
data
(CFU6510)
ADC,
this
kind
n e c e s s a r - y s i n c e t h e CPIJ h a s t o w a i t f o r t h e conversion
before
taking
t h e new d a t a i n .
d a t a a p p e a r s a t p i n s 17-11 w i t h p i n significant b i t #2 o f
t h e C-64,
56577,
the
ADC..
s o f t w a r e of
so t h a t
4.
completior?
of
dat.a
The c o n v e r t e d 8 b i t the
least CIG
This
t h e c o r r e s p o n d i n g DDK When m e m - l o c .
is p e e k e d b y BASIC or
assembly
feature
of
the
t h e next data C-64
simplifies
Once t h e
data
conversion
by
h a r d w a r e and 1s
stored
in
i t c a n be t r a n s f e r e d t o RAM (Random A c c e s s M e m o r y )
m e m . l o c . 56577 wi 1 1 b e r e a d y t o s t o r e t h e n e x t d a t a .
SOFTWARE DEVELOPMENT
The d a t 3 a c q u i s i t i o n p r o g r a m c a n b e w r i t t e n i n e i t k e r or
the
a u t o m a t i c a l l y s e n d s a n e g a t i v e p u l s e t o PCZ,
t h e data acquisition.
m e m . loc.5.5577,
than
communication is
b e p o k e d w i t h (3 i n a d v a n c e .
must
CIA
This
of
representing
Therefore,
which i n t u r n t r i q g e r s t h e s t a r t of the
faster
w h i c h is d e n o t e d a s U s e r P o r t CN2 i n Fig.4.
which c o n t a i n s t h e d a t a ,
language,
is
routine
1 - h i s d a t a is t r a n s m i t t e d t o p o r t E; o f
!LSB).
p o r t i s a s s i g n e d merit. l o c . 56577.
[ m e m . loc.56579)
17
acquisition
assembly
language.
block: o f f r e e FiAH s p a c e .
The I t
data
acquired
BASIC
must he s t o r e d i n a
is focmd t h a t t h e f a s t e s t con-..fersion
454 s p e e d u t i l i z i n g BASIC
abocct
is
faster
high
assembly
level
1anguage
a
If
faster
t h e p r o g r a m may h a v e t o b e w r i t t e n
c o n v e r s i o n s p e e d is r e q u i r e d , in
3i’ s a m p l e s / s e c .
language
or
programming,
assembly the
language.
speed
With
4MK!
exceeding
samples/sec is easily attained.

The data acquired can be graphically displayed on the CRT using a bitmapping scheme. Bitmapping with BASIC is possible but very slow: a screenful of bitmapping takes a few minutes. Therefore, an assembly language program is strongly recommended for bitmapping, which takes less than one second for a screenful of display.

The authors have developed a software package for data acquisition, display and transmission. The package consists of two parts: DACQ and DATALOG. DACQ is for relatively fast data acquisition (maximum: 4300 samples/sec) and is suitable for obtaining data of a transient phenomenon. DATALOG is software used for monitoring and recording rather slowly changing physical quantities, e.g., hourly atmospheric temperature and energy consumption in a building. It is similar to many commercially available data loggers. Both programs are menu-driven and self-explanatory. The main features of DACQ and DATALOG will be summarized in the following.

DACQ was introduced in Ref.[1]. It has the following features:
(1) Number of channels: 8.
(2) Speed of sampling: Programmable up to 4300 samples/sec. Standard sampling rates available from menu at 10, 100 and 1000 samples/sec.
(3) Number of data to be stored: Programmable. Default results in 320 samples/ch.
(4) Non-volatile storage of data: Sequential file in disk.
(5) Display: One, two or three channels of data vs time, selectable from menu. Resolution of display: 320 x 200 for one channel.
(6) Data transmission: RS232 or MODEM.

DATALOG is written in BASIC language only. Its main features are as follows:
(1) Number of channels: 8, or 16 with MUX modification (Fig.5).

Fig.5 Sixteen Channel Multiplexing (circuit diagram: CD4051 MUX, C-64 port)

(2) Speed of sampling: Programmable anywhere from every few seconds to every few hours.
(3) Number of data to be stored: Programmable at menu prompt.
(4) Non-volatile storage of data: Sequential file in disk.
(5) Display: Sharing the DACQ display program. Load and run DACQ. Then select Display from menu.
(6) Data transmission: RS232 and MODEM.

When the DATALOG program is activated, 16 channels of data are shown on the CRT and revised at every sampling period. Also shown on the CRT are the number of samplings made and the time elapsed from the beginning of data acquisition.
5. EXPERIMENTAL RESULTS

Examples of experimental results using the data acquisition system presented in this paper are shown in Figs. 6 & 7. Fig.6 represents a plot of cam velocity vs time. A velocity transducer was attached to a cam-cam follower assembly which had been taken out of the engine, and the output of the transducer was connected to the C64DAS. The assembly was run by an electric motor and the data was acquired by the C64DAS using the DACQ program. The digitized signal was sent to the main frame computer and was plotted using the SAS PLOT program.

Fig.6 Cam Velocity Experiment Data by DACQ

Another example, shown in Fig.7, is the record of solar intensity monitored by a photovoltaic cell. The DATALOG program was used for this experiment.

Fig.7 Solar Intensity Experiment Data by DATALOG
6. CONCLUSION

There exists a belief among academic people: a Commodore is a toy. Probably more video games have been developed for the C-64 than any other computers. So, this belief is not without foundation. At the same time, however, we have to realize the fact that "1" is "1" and "0" is "0" no matter what computer is used. No sophisticated computer can produce "1" of better quality than "1" created by the C-64. Of course, there are many limitations in using the C-64 as a base unit of data acquisition system, i.e., speed and memory capacity. But, in many cases, C64DAS produces satisfactory results.

Since the authors published an article in BYTE [1], over one hundred letters of inquiry were received. Many of them were asking some specific technical questions. Especially, the authors were surprised by many letters from Europe. It is also interesting to know the fact that one of the nation's leading manufacturers of hydraulic equipment has been using a C-64, which was hooked to million dollar equipment. Also, a very large hospital has been using C-64's to monitor medical equipment.

The data acquisition software package is available from the first author at a nominal charge to be paid to his department. The package includes DACQ and DATALOG programs in a disk and a user's manual. If you use this package very often, it is suggested to explore a possibility to transfer the programs to EPROM and put it in a game cartridge. You can really make a sophisticated toy.
REFERENCES

[1] K. Okamura and K. Aghai-Tabriz, "A Low-Cost Data-Acquisition System," BYTE, February 1985, pp. 199-202.
[2] K. Okamura, "Measurements Laboratory Note," Dept. of MEAM, North Dakota State University.
[3] H.M. Berlin, Design of Active Filters with Experiments, Blacksburg.
[4] K. Okamura and W. Chu, "Olympic Averaging Method as a Digital Filter," to be published in the Journal of Computer Applications (ACCESS).
[5] H.M. Morris, "A/D and D/A Converters Link Digital Controls to an Analog World," Control Engineering, December 1984, pp. 55-57.
[6] National Semiconductor, Linear Databook.
[7] Intersil, Data Acquisition Handbook.
[8] Analog Devices, Integrated Circuits Databook.
[9] Commodore Business Machines, Commodore 64 Programmer's Reference Guide.
[10] K. Aghai-Tabriz, Data Acquisition based on Commodore 64, M.S. Thesis, Dept. of MEAM, North Dakota State University, 1985.
ON THE ESTIMATION OF MONTHLY MEAN PHOSPHORUS LOADINGS

M.E. Thompson and K. Bischoping
University of Waterloo

ABSTRACT

This paper reports on a study of estimation of monthly mean phosphorus loadings, using daily readings from the Niagara River for 1975-1982. Two alternatives to the current method of accounting for missing data are proposed from finite population sampling theory.
1. THE PROBLEM

Suppose we wish to measure the mean daily amount of a chemical such as phosphorus flowing past a certain location in a river, over a short period of time such as a month. Formally, this can be expressed as

$$\mu_y = \frac{1}{N}\sum_{i=1}^{N} c_i x_i = \frac{1}{N}\sum_{i=1}^{N} y_i, \qquad (1.1)$$

where $N$ is the number of days in the period, $x_i$ is the flow past the location on day $i$, $c_i$ is the concentration of phosphorus in the water on day $i$, and the "loading" for day $i$ is $y_i = c_i x_i$. Typically the flow $x_i$ is known for all $N$ days, but the concentration $c_i$ is measured only for a sample $s$ of $n$ days. If we assume for simplicity that there is no technical error in the measurement of concentration when it does take place, then the $y_i$ are known for days in the sample $s$, and the problem of estimating $\mu_y$ is one of estimating a finite population mean from sample values of the variate.

Since the failure to measure concentration on certain days is not controlled, there is no reason to believe that the sample $s$ is generated by anything approaching a random sampling design. Thus the choice of estimator is not convincingly justified by an appeal to randomization based properties, although this has sometimes been attempted. For example, Dolan et al. (1981) studied several ways of estimating $\mu_y$ and concluded that the best choice was a stratified version of

$$\hat{\mu}_{yD} = \mu_x \, \frac{\bar{y}}{\bar{x}} \left\{ \frac{1 + \dfrac{1}{n}\dfrac{s_{xy}}{\bar{x}\,\bar{y}}}{1 + \dfrac{1}{n}\dfrac{s_x^2}{\bar{x}^2}} \right\}. \qquad (1.2)$$

See also Lam et al. (1983). Here,

$$\mu_x = \Big(\sum_{i=1}^{N} x_i\Big)/N = \text{mean daily flow},$$

$$\bar{y} = \Big(\sum_{i \in s} y_i\Big)/n = \text{sample mean daily loading},$$

$$\bar{x} = \Big(\sum_{i \in s} x_i\Big)/n = \text{sample mean daily flow},$$

$$s_x^2 = \Big[\sum_{i \in s} x_i^2 - n\bar{x}^2\Big]/(n-1) = \text{sample variance of flow},$$

$$s_{xy} = \Big[\sum_{i \in s} x_i y_i - n\bar{x}\,\bar{y}\Big]/(n-1) = \text{sample covariance of flow and loading}.$$
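As an illustration of the quantities just defined, the following Python sketch computes the classical ratio estimate and a bias-corrected version of it. It assumes the correction factor has the Beale form shown in the code (a factor built from $s_{xy}$, $s_x^2$, $\bar{x}$, $\bar{y}$ and $1/n$); the function name `ratio_estimators` and its arguments are hypothetical and are not taken from the paper.

```python
from statistics import mean


def ratio_estimators(flow, loading_sample, sample_days):
    """Return (classical ratio estimate, bias-corrected estimate) of mean daily loading.

    flow           -- flows x_i for all N days
    loading_sample -- loadings y_i for the sampled days, in the order of sample_days
    sample_days    -- 0-based indices of the n sampled days (n >= 2)
    Assumption: the correction factor is of Beale type, as written below.
    """
    n = len(sample_days)
    mu_x = mean(flow)                       # mean daily flow over all N days
    xs = [flow[i] for i in sample_days]     # flows on sampled days only
    x_bar = mean(xs)
    y_bar = mean(loading_sample)
    # sample variance of flow and sample covariance of flow and loading
    s_x2 = (sum(x * x for x in xs) - n * x_bar ** 2) / (n - 1)
    s_xy = (sum(x * y for x, y in zip(xs, loading_sample)) - n * x_bar * y_bar) / (n - 1)
    classical = mu_x * y_bar / x_bar
    correction = (1 + s_xy / (n * x_bar * y_bar)) / (1 + s_x2 / (n * x_bar ** 2))
    return classical, classical * correction
```

When the concentration is the same on every day, loading is exactly proportional to flow, the factor in braces equals one, and both estimates return the true mean loading; this gives a quick sanity check of an implementation.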
Without the factor in braces, $\hat{\mu}_{yD}$ is the classical ratio estimator (Cochran, 1977); the factor in braces is designed to correct for sampling bias when simple random sampling has been chosen to select the sample.

The estimator (1.2) was the only one considered by Dolan et al. which took into account the knowledge of all values of the flow $x_i$. Thus it is not surprising that it performed relatively well. However, from the standpoint of recent developments in sampling theory this choice can be criticized. First of all, as indicated above, the sample of days for which concentrations would be available is not necessarily random. Second, there are other estimates besides the ratio estimator which would use the flow data fully in estimating $\mu_y$; the choice among these ought ideally to depend on a model for the daily concentration. It can be shown that the ratio estimator would be optimal if the variance of the concentration varied as 1/flow (Royall, 1971). From data from the Niagara River 1975-1982 this inverse relationship does not appear to be attained, although in the winter months flow decreases and concentration fluctuates a little more wildly. Thus the most widely accepted justification for the ratio estimator does not apply in this case.

Since the sample is not random and some serial autocorrelation in the concentration series seems likely, a predictive or model based approach to the estimation is indicated. In this approach $y_1, \ldots, y_N$ are jointly distributed random quantities and the sampled $y_i$ are used to "predict" the sum of the unsampled $y_i$ and hence $\mu_y$. It is also to be hoped that the model will yield suitable estimates of uncertainty in the prediction of $\mu_y$.

An examination of the Niagara River daily series 1975-1982 indicates that the mean level and covariance structure of the concentration series vary seasonally. Within most months, however, the log concentration series appears approximately stationary and Gaussian. Thus, for estimating monthly means no stratification seems to be necessary, although for the estimation of yearly means it would be desirable to stratify the series by season or by month.

In the next sections we present some estimators based on simple models for concentration or its logarithm. The behaviour of these and corresponding uncertainty estimates is examined in a small scale empirical study involving artificial deletions from two months for which complete data are available.

2. SOLUTIONS FROM THE PREDICTIVE APPROACH TO SAMPLING THEORY

2.1 Estimators based on a zero correlation model for concentration

Suppose it is reasonable to assume, for simplicity, that the $c_i$ are stationary with mean $C$, variance $\sigma^2$, and zero autocorrelation. In this section they are not assumed to be Gaussian. Then the best linear unbiased estimator of $C$ is

$$\hat{c} = \frac{1}{n}\sum_{i \in s} c_i. \qquad (2.1)$$

The best linear unbiased estimator (predictor) of $\mu_y$ is

$$\hat{\mu}_{yI} = \frac{1}{N}\Big(\sum_{i \in s} y_i + \hat{c}\sum_{i \notin s} x_i\Big) \qquad (2.2)$$

(Royall, 1971). Since

$$N(\hat{\mu}_{yI} - \mu_y) = \sum_{i \notin s} x_i(\hat{c} - c_i), \qquad (2.3)$$

the mean squared error $E(\hat{\mu}_{yI} - \mu_y)^2$ in the sense of the above model can be estimated in a robust manner by the estimator $V_I$ of (2.4). See Royall and Cumberland (1978) for a discussion of the sense in which (2.4) is robust to departures from the assumed constant variance of the $c_i$ series. If the $c_i$ are close to normally distributed or the $x_i$ are not highly variable it is reasonable to apply a normal approximation to the distribution of $N(\hat{\mu}_{yI} - \mu_y)$ as exhibited in (2.3).

2.2
Estimators based on time series interpolation for concentration

We may say that interpolation type estimators take the form

$$\hat{\mu}_y = \frac{1}{N}\Big(\sum_{i \in s} x_i c_i + \sum_{i \notin s} x_i \hat{c}_i\Big), \qquad (2.5)$$

where $\hat{c}_i$ is an estimated or interpolated value for $c_i$ from the sample. Estimator (2.2) in fact is of this type, where $\hat{c}_i = \hat{c}$ for unsampled $i$, and it can be shown that the ordinary ratio estimator follows the same pattern with $\hat{c}_i = \bar{y}/\bar{x}$. However, in this section, estimators with variable $\hat{c}_i$ will be derived, as would seem appropriate under models taking account of the time series structure of the data.

Since for the Niagara River data the logarithm of concentration appears symmetrically distributed about its mean, consider the model

$$z_i = \ln c_i = \lambda + \eta_i, \qquad (2.6)$$

where $\eta_i$, $i = 1, \ldots, N$, is a mean 0 Gaussian time series with autocorrelation function $\rho_{jk} = \mathrm{corr}(\eta_j, \eta_k)$ and variance $\sigma^2$. Under this model $c_i$ is marginally lognormal, with $E c_i = \exp\{\lambda + \sigma^2/2\}$ and $\mathrm{cov}(c_j, c_k) = \exp\{2\lambda + \sigma^2(1 + \rho_{jk})\} - \exp\{2\lambda + \sigma^2\}$.

Assume to begin with that $\sigma^2$ and the $\rho_{jk}$ are known. If $\ln \hat{c}_i$ denotes the best linear unbiased predictor of an unsampled $\ln c_i$ then it can be shown (Bartlett, 1983; Ripley, 1981) that

$$\ln \hat{c}_i = \hat{\lambda} + \sum_{j \in s} a_{ij}(\ln c_j - \hat{\lambda}), \qquad (2.7)$$

where $\hat{\lambda}$ is the best unbiased estimator of $\lambda$ which is homogeneous linear in the $\ln c_j$, and the $a_{ij}$ are chosen to minimize $E(\ln \hat{c}_i - \ln c_i)^2$. In general $a_{ij}$ of (2.7) is the $(i,j)$th element of $V_{\bar{s}s} V_{ss}^{-1}$, where if $V$ is the matrix of the $\rho_{jk}$, $V_{\bar{s}s}$ is the $(N - n) \times n$ submatrix with rows indexed by $\bar{s}$ (the unsampled days) and columns by $s$, and the $n \times n$ matrix $V_{ss}$ is analogously defined. Also, $\hat{\lambda} = \sum_{j \in s} \alpha_j \ln c_j$, where $\sum_{j \in s} \alpha_j = 1$ and $\alpha_j$ is proportional to the $j$-th column sum of $V_{ss}^{-1}$. With this definition

$$\ln \hat{c}_i = \sum_{j \in s} b_{ij} \ln c_j,$$

where $b_{ij} = a_{ij} + \alpha_j(1 - \sum_{k \in s} a_{ik})$, the coefficient of $\ln c_j$ in (2.7). Thus a "corrected" estimator is

$$\ln \hat{c}_i = \sum_{j \in s} b_{ij} \ln c_j + \frac{\sigma^2}{2}\Big(1 - \sum_{j \in s}\sum_{k \in s} b_{ij} b_{ik} \rho_{jk}\Big); \qquad (2.8)$$

with this definition $E(\hat{c}_i - c_i) = 0$.

Setting $\hat{c}_i = c_i$ if the $c_i$ is not missing, we can write the estimator (2.5) of loading as

$$\hat{\mu}_{yTS} = x^t \hat{c}/N,$$

where $x^t = (x_1, \ldots, x_N)$ and $\hat{c}^t = (\hat{c}_1, \ldots, \hat{c}_N)$. Then the predictive MSE is

$$x^t \, \mathrm{Var}(\hat{c} - c) \, x / N^2,$$

where $c^t = (c_1, \ldots, c_N)$, and Var denotes the covariance matrix; note that the $(i,\ell)$th element of $\mathrm{Var}(\hat{c} - c)$ is 0 if $c_i$ or $c_\ell$ is sampled. Thus an unbiased or nearly unbiased estimator of this covariance matrix will yield an estimator of the MSE of $\hat{\mu}_{yTS}$.

With $\ln \hat{c}_i$ given by (2.8), it is easy to obtain $E \hat{c}_i \hat{c}_\ell$ and, similarly, $E \hat{c}_i c_\ell$ in exponential form. The resulting formula for $E(\hat{c}_i - c_i)(\hat{c}_\ell - c_\ell)$ is the $(i,\ell)$th element of $\mathrm{Var}(\hat{c} - c)$. The use of the formulas above for estimation of the MSE of $\hat{\mu}_{yTS}$ requires estimation of $\lambda$, $\sigma^2$ and the $\rho_{i\ell}$.
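The common structure of the interpolation type estimators can be sketched in a few lines of Python. The sketch below is only an illustration under simple assumptions: missing concentrations are marked with `None`, one fill value is used for every missing day, and the function names (`interpolation_estimator`, `iid_fill`, `ratio_fill`) are hypothetical rather than taken from the paper.

```python
def interpolation_estimator(flow, conc, fill_value):
    """Mean daily loading with every missing concentration replaced by fill_value.

    flow -- flows x_i for all N days
    conc -- concentrations c_i, with None where the reading is missing
    """
    total = 0.0
    for x, c in zip(flow, conc):
        total += x * (c if c is not None else fill_value)
    return total / len(flow)


def iid_fill(conc):
    """Fill value under the zero correlation model: mean of observed concentrations."""
    observed = [c for c in conc if c is not None]
    return sum(observed) / len(observed)


def ratio_fill(flow, conc):
    """Fill value y_bar / x_bar, which reproduces the ordinary ratio estimator."""
    pairs = [(x, c) for x, c in zip(flow, conc) if c is not None]
    return sum(x * c for x, c in pairs) / sum(x for x, _ in pairs)
```

Calling `interpolation_estimator(flow, conc, ratio_fill(flow, conc))` returns the mean flow times the ratio of sample mean loading to sample mean flow, which is the sense in which the ratio estimator "follows the same pattern".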
If a first order autoregressive structure for $\ln c_i$ is assumed, then $\rho_{i\ell}$ is of the form $\rho^{|i-\ell|}$, where $\rho$ is strictly between $-1$ and 1. For each day $i$ let $j_i$ = number of days since previous non-missing $c$ value, and $k_i$ = number of days until next non-missing $c$ value. If there is no previous [next] non-missing value set $j_i = \infty$ [$k_i = \infty$]. Now let $c_{p_i}$ = previous non-missing $c$ value and $c_{q_i}$ = next non-missing $c$ value. The uncorrected estimator (2.7) becomes

$$\ln \hat{c}_i = \hat{\lambda} + a_i(\ln c_{p_i} - \hat{\lambda}) + b_i(\ln c_{q_i} - \hat{\lambda}), \qquad (2.9)$$

where

$$a_i = \rho^{j_i}(1 - \rho^{2k_i})/(1 - \rho^{2(k_i + j_i)}),$$

$$b_i = \rho^{k_i}(1 - \rho^{2j_i})/(1 - \rho^{2(k_i + j_i)}).$$

Note that if $\rho = 0$, $\ln \hat{c}_i$ is simply $\hat{\lambda}$, while if $\rho \to 1$, $\ln \hat{c}_i \to$ a linearly interpolated value between $\ln c_{p_i}$ and $\ln c_{q_i}$. For $0 < \rho < 1$, $\ln \hat{c}_i$ will lie between $\hat{\lambda}$ and the linearly interpolated value.

In the first order autoregressive case, the variance $\sigma^2$ can be estimated in a consistent manner by

$$\hat{\sigma}^2 = \sum_{i \in s}(\ln c_i - \overline{\ln c})^2/(n - 1);$$

this will be satisfactory if $\rho$ is not too close to 1. The parameter $\rho$ may be estimated or assumed. For moderately long series with almost complete data, to estimate it seems advisable. However, for the short concentration series considered here, the maximum marginal likelihood estimate for $\rho$ (Ramakrishnan, 1985) has been found to be highly unstable when a large number of $c$ values are missing.

3. EMPIRICAL STUDY

In the empirical study two months were chosen for which data were complete. These were March 1978 and November 1979. For these two months the lag 1 autocorrelation for log concentration is estimated at .7 and .4 respectively.
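The first order autoregressive interpolation weights of (2.9) are simple to compute, and the lag 1 autocorrelations of .7 and .4 quoted above are natural values to try. The Python sketch below is a minimal illustration, not the authors' program: the function names are hypothetical, $\rho$ is restricted to $0 \le \rho < 1$ for simplicity, and `math.inf` stands for a missing previous or next reading.

```python
import math


def ar1_weights(j, k, rho):
    """Weights (a_i, b_i) of (2.9) for gaps of j days back and k days forward.

    j, k -- positive numbers of days; math.inf means no reading on that side
    rho  -- assumed lag 1 autocorrelation, 0 <= rho < 1 in this sketch
    """
    if not 0.0 <= rho < 1.0:
        raise ValueError("this sketch assumes 0 <= rho < 1")
    denom = 1.0 - rho ** (2 * (j + k))
    a = rho ** j * (1.0 - rho ** (2 * k)) / denom
    b = rho ** k * (1.0 - rho ** (2 * j)) / denom
    return a, b


def interpolate_log_conc(log_prev, log_next, j, k, rho, lam_hat):
    """Uncorrected predictor of a missing log concentration, following (2.9).

    lam_hat is an estimate of the mean log concentration. Pass any finite
    placeholder for log_prev (log_next) when j (k) is infinite; its weight is 0.
    """
    a, b = ar1_weights(j, k, rho)
    return lam_hat + a * (log_prev - lam_hat) + b * (log_next - lam_hat)
```

For example, `ar1_weights(1, 3, 0.7)` gives the weights for a day that is one day after the previous reading and three days before the next one; with $\rho$ close to 1 the same call approaches the linear interpolation weights 3/4 and 1/4.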
For each of the two months, 100 samples were generated with 5 randomly chosen concentration readings missing, and a further 100 with 15 concentration readings missing. For each sample thus generated, the following estimators were calculated.

(i) Dolan's estimator $\hat{\mu}_{yD}$ of (1.2);
(ii) the estimator $\hat{\mu}_{yI}$ of (2.2) based on the i.i.d. assumption for concentrations;
(iii) the time-series estimator $\hat{\mu}_{yTS}$ of (2.9) with $\rho$ = 0, .4, .95;
(iv) the MSE estimator $V_I$ of (2.4), associated with $\hat{\mu}_{yI}$;
(v) the MSE estimator $V_{TS}$ associated with $\hat{\mu}_{yTS}$.

Tables 3.1 and 3.2 compare the performance of the mean estimators for the two months.

TABLE 3.1
Performance of the estimators of $\mu_y$ for the month of March, 1978.

                                           5 points deleted   15 points deleted
True mean                                       6715.79            6715.79
Mean of $\hat{\mu}_{yD}$                        6745.97            6647.88
MSE of $\hat{\mu}_{yD}$                        49967.10          322972.49
Mean of $\hat{\mu}_{yI}$                        6732.16            6616.07
MSE of $\hat{\mu}_{yI}$                        48869.93          314361.84
Mean of $\hat{\mu}_{yTS}$, $\rho$ = 0           6723.55            6599.93
MSE of $\hat{\mu}_{yTS}$, $\rho$ = 0           48603.65          313892.74
Mean of $\hat{\mu}_{yTS}$, $\rho$ = .4          6700.40            6716.35
MSE of $\hat{\mu}_{yTS}$, $\rho$ = .4          43451.50          186371.48
Mean of $\hat{\mu}_{yTS}$, $\rho$ = .95         6598.35            6532.24
MSE of $\hat{\mu}_{yTS}$, $\rho$ = .95         36788.47          174288.32

Means and MSE's are over 100 replications.
TABLE 3.2
Performance of the estimators of $\mu_y$ for the month of November, 1979.

                                           5 points deleted   15 points deleted
True mean                                       3717.30            3717.30
Mean of $\hat{\mu}_{yD}$                        3714.80            3768.61
MSE of $\hat{\mu}_{yD}$                        17597.13           72685.98
Mean of $\hat{\mu}_{yI}$                        3710.18            3754.50
MSE of $\hat{\mu}_{yI}$                        17310.83           67973.80
Mean of $\hat{\mu}_{yTS}$, $\rho$ = 0           3708.96            3752.82
MSE of $\hat{\mu}_{yTS}$, $\rho$ = 0           17013.73           64804.02
Mean of $\hat{\mu}_{yTS}$, $\rho$ = .4          3689.89            3743.64
MSE of $\hat{\mu}_{yTS}$, $\rho$ = .4          18198.76           69288.76
Mean of $\hat{\mu}_{yTS}$, $\rho$ = .95         3617.29
MSE of $\hat{\mu}_{yTS}$, $\rho$ = .95         22303.53

Means and MSE's are over 100 replications.
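The design of the deletion experiment behind Tables 3.1 and 3.2 can be imitated with a short simulation. The Python sketch below is an illustration only: it is not the authors' program, it would have to be run on data supplied by the reader rather than the Niagara River series, the function names are hypothetical, and only the i.i.d. estimator (2.2) is implemented.

```python
import random


def iid_estimator(flow, masked_conc):
    """Estimator (2.2): missing concentrations replaced by the mean of observed ones."""
    observed = [c for c in masked_conc if c is not None]
    c_hat = sum(observed) / len(observed)
    total = sum(x * (c if c is not None else c_hat) for x, c in zip(flow, masked_conc))
    return total / len(flow)


def deletion_experiment(flow, conc, n_missing, estimator, reps=100, seed=0):
    """Delete n_missing random readings reps times; return (true mean, mean, MSE).

    flow, conc -- complete daily series for one month
    estimator  -- function(flow, masked_conc) returning an estimate of mean loading
    """
    rng = random.Random(seed)  # fixed seed so that a run can be repeated
    n_days = len(flow)
    true_mean = sum(x * c for x, c in zip(flow, conc)) / n_days
    estimates = []
    for _ in range(reps):
        missing = set(rng.sample(range(n_days), n_missing))
        masked = [None if i in missing else c for i, c in enumerate(conc)]
        estimates.append(estimator(flow, masked))
    mean_estimate = sum(estimates) / reps
    mse = sum((e - true_mean) ** 2 for e in estimates) / reps
    return true_mean, mean_estimate, mse
```

A call such as `deletion_experiment(flow, conc, 5, iid_estimator)` yields one "Mean of" and one "MSE of" entry of the kind tabulated above for the month supplied.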
Although the number of replications is small, some observations did emerge:

(i) The estimators $\hat{\mu}_{yD}$ and $\hat{\mu}_{yI}$ are similar in performance for both months. They have only a small bias, if any.
(ii) $\hat{\mu}_{yTS}$ tends to have a downward bias, particularly for $\rho$ = .95, which brings it close to a linear interpolation estimator. However, the stability of this estimator keeps its MSE low, and for March (with actual $\rho$ about .7) it has the lowest MSE's.
(iii) The variance estimators $V_I$ and $V_{TS}$ give realistic values, $V_I$ tending to be conservative. The values of $V_I$ and $V_{TS}$ are extremely variable. It would be useful to study the coverage properties of confidence intervals based on them.

REFERENCES

Bartlett, R.F., 1983. On Estimation with Kriging for Finite Populations under Superpopulation Models. Ph.D. Thesis, University of Waterloo, Waterloo, Canada.
Cochran, W.G., 1977. Sampling Techniques. Wiley, New York, 428 pp.
Dolan, D.M., Yui, A.K. and Geist, R.D., 1981. Evaluation of river load estimation methods for total phosphorus. J. Great Lakes Res. 7: 207-214.
Lam, D.C.L., Schertzer, W.M. and Fraser, A.S., 1983. Simulation of Lake Erie Water Quality Responses to Loading and Weather Variations. Environment Canada.
Ramakrishnan, V., 1985. Marginal Likelihood Analysis of Growth Curves. M.Phil. Thesis, University of Waterloo.
Ripley, B.D., 1981. Spatial Statistics. Wiley, New York, 252 pp.
Royall, R.M., 1971. Linear regression models in finite population sampling theory. In Foundations of Statistical Inference, ed. V.P. Godambe and D.A. Sprott. Holt, Rinehart and Winston, Toronto.
Royall, R.M. and Cumberland, W.G., 1978. Variance estimation in finite population sampling. J. Amer. Statist. Assoc. 73: 351-358.
ESTIMATION OF LOADING BY NUMERICAL INTEGRATION

A.H. EL-SHAARAWI, K.W. KUNTZ AND A. SYLVESTRE
ABSTRACT Methods
based
on
are
used
interpolation loading The
from
a
variance
is
the
estimated to the
1984.
l o a d i n g of
1
The
results
estimator
source
into
is
loading estimate
a
of
a
the
the
input
system.
given.
yearly during
steady
linear
water
also
Niagara River
indicate
and
This
chloride
the
decline
period
in
input
c h l o r i d e t o Lake O n t a r i o .
INTRODUCTION
is
It the
well
l e v e l of
lake.
known
the
_e_F_
Vollenweider loading
Slater
Bangay
and
source
phosphorus
10,000
metric
1977.
Accuracy
factors
samples
are
load
tool
has
the
estimate.
In
this
follows. into
total
Let the
paper,
i n which are
estimator only
the
Formally,
the
a the
which
sample
recorded:
the of
and
used
problem
point
approximately
on
( i ) with the
the
metric
depends
following:
upon
to
reduction
Erie
from
5,700
consistency
the
depends
eutrophication.
Lake
reduced to
the
or
lake
to of
tons/yr
in
number
of
sampling the
water
analysis
is
(iii)
the
obtain
the
estimating
l o a d i n g i s d e f i n e d as
a c t ) d t b e t h e i n s t a n t a n e o u s l.oad of a s u b s t a n c e
water
load of
in
and
loading is considered.
the
that
estimate
t h e way
a
controlling
been
results
expression
of
chemical substances
recommended t h e
for
1972173
load
care
collected,
of
report
in
the
the
quality
c.( 1 9 8 0 ) a
(1980)
of
and
mathematical
water
(loading)
include
(ii)
performed,
as
tons/yr
which
strategy;
load
that
inputs
phosphorus
S
an
point
applied
1975 t o
integration
derive
a
l o a d i n g t o Lake O n t a r i o by
approach
then
to
or
river
of
numerical
system
in
S during the
the
interval
interval
(0,T)
(t,t+dt).
Then,
the
is
$$L = \int_0^T \ell(t)\,dt. \qquad (1)$$

$\ell(t)\,dt$ can be expressed as the product of the instantaneous water flow $f(t)\,dt$ and the concentration $C(t)$ of S. This gives

$$L = \int_0^T f(t)\,C(t)\,dt. \qquad (2)$$

For a set of measurements of the concentration and the flow rate, the objectives are to estimate $L$ and its standard error. All the methods available for estimating $L$ assume the finite population approach. According to this approach, the period $T$ is divided into $N$ intervals with the concentration within the $i$th interval ($i = 1, 2, \ldots, N$) a constant. In this case formula (1) reduces to

$$L = \sum_{i=1}^{N} f_i C_i, \qquad (3)$$

where $f_i$ = the flow of water during the $i$th interval and $C_i$ = the concentration of S in the $i$th interval. The problem then becomes how to estimate $L$ given that the measurements are available for $n$ of the $N$ intervals ($n < N$). Two different methods for estimating the mean daily load are given by Casey and Salbach (1974). Dolan et al. (1981) described several ways of estimating the mean daily loading and concluded that the best is the ratio estimator. The finite population approach is described in Thompson and Bischoping (1985). Thompson and Bischoping (1985) pointed out that the ratio estimator is optimal if the variance of the concentration varies as 1/flow.

In this paper, the finite population approach is not used. The loading is assumed to be a continuous function in time and hence numerical integration methods are appropriate for estimating $L$. The stochastic characteristics of the flow and the concentration are used to derive the standard error for the estimate of $L$. Finally, this approach is applied to estimate the chloride load from the Niagara River to Lake Ontario during the period 1975 to 1984.
2 SOME COMMENTS ON THE METHODS FOR ESTIMATING THE LOADING

The ratio estimator is discussed by Thompson and Bischoping (1985) and hence will not be considered here. The methods presented by Casey and Salbach (1974) and Dolan et al. (1981) for estimating the mean daily load are

$$L_1 = \frac{1}{n}\sum_{i=1}^{n} C_i f_i \qquad (4)$$

and

$$L_2 = \bar{C}\,\bar{f}, \qquad (5)$$

respectively, where $n$ = the number of days for which the flow $f$ and the concentration $C$ are measured,

$$\bar{C} = \frac{1}{n}\sum_{i=1}^{n} C_i \quad \text{and} \quad \bar{f} = \frac{1}{n}\sum_{i=1}^{n} f_i.$$

Casey and Salbach (1974) indicated that the two methods are likely to give very close results. This is true only if the flow and the concentration are uncorrelated. To illustrate this note that

$$L_1 - L_2 = r\,S_f\,S_C, \qquad (6)$$

where $r$ is the correlation coefficient between $C$ and $f$, and $S_C$ and $S_f$ are the standard deviation of $C$ and $f$, respectively. Hence, if $r = 0$, the two methods produce the same result and if $r$ is positive (negative) $L_1$ is larger (smaller) than $L_2$.
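Identity (6) is easy to check numerically. The Python sketch below assumes that the standard deviations and the correlation coefficient in (6) use the divisor $n$ (population form), which is what makes the identity exact; the function name `compare_l1_l2` and the sample numbers in the usage note are made up for illustration.

```python
from statistics import mean, pstdev


def compare_l1_l2(conc, flow):
    """Return (L1, L2, r * S_f * S_C) for paired daily concentrations and flows.

    L1 is the mean of the daily products, L2 the product of the means.
    pstdev uses the divisor n, matching the assumption stated in the lead-in.
    """
    n = len(conc)
    c_bar, f_bar = mean(conc), mean(flow)
    l1 = mean(c * f for c, f in zip(conc, flow))
    l2 = c_bar * f_bar
    s_c, s_f = pstdev(conc), pstdev(flow)
    # correlation coefficient computed directly from the centred data
    r = sum((c - c_bar) * (f - f_bar) for c, f in zip(conc, flow)) / (n * s_c * s_f)
    return l1, l2, r * s_f * s_c
```

With concentrations `[10, 12, 9, 14]` and flows `[200, 180, 220, 160]`, which are negatively correlated, the function returns an $L_1$ smaller than $L_2$, and the difference equals the third returned value up to rounding.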
Under the assumption that $f_i$ is completely known and $C_i$ is a random variable with mean $\mu_c$ and variance $\sigma_c^2$, the two methods estimate the same thing (expected values are the same). The difference between the variance of $L_1$ and the variance of $L_2$ indicates that $L_2$ has a smaller variance than $L_1$.

Finally, if both $C_i$ and $f_i$ are assumed to be random variables with means $\mu_c$ and $\mu_f$ and variances $\sigma_c^2$ and $\sigma_f^2$, then the expected values of $L_1$ and $L_2$ are

$$E(L_1) = \mu_c\mu_f + \rho\,\sigma_c\sigma_f \quad \text{and} \quad E(L_2) = \mu_c\mu_f + \rho\,\sigma_c\sigma_f/n,$$

where $\rho$ is the correlation between $f$ and $C$. From this it appears that $L_1$ and $L_2$ are biased estimates of $\mu_c\mu_f$. This means that both methods overestimate (underestimate) the mean daily load when $\rho > 0$ ($\rho < 0$). Furthermore, the bias associated with $L_1$ is independent of sample size, while that of $L_2$ decreases as $n$ increases and hence $L_2$ is a consistent estimate of $\mu_c\mu_f$.

Under the assumption that the joint distribution of $f$ and $C$ is bivariate normal, the variances of $L_1$ and $L_2$ are given by

respectively.
Lq
i s s u p e r i o r t o L1.
is
the
develop
an
This demonstrates above
that
assumptions,
t h e mean d a i l y
3
is
The d i f f e r e n c e between t h e s e v a r i a n c e s
Lp
maximum
Indeed,
likelihood
under
the
estimator
for
load.
3 TYPES OF DATA

In order to develop an appropriate estimate of the true loading, it is important to understand the type and nature of the data available for the calculation. There are many ways in which the data are available; the most common are:

(i) the flow rate $y(t_i)$ and the concentration $x(t_i)$ are measured at the time points $t_0, t_1, \ldots, t_{n+1}$;

(ii) at the points $t_0, t_1, \ldots, t_{n+1}$, the measurements $Y(t_i)$ and $x(t_i+\delta)$ are available, where $0 < \delta \le t_{i+1} - t_i$ and $Y(t_i)$ is the amount of water which flowed into the system in the interval $(t_i, t_{i+1})$; and

(iii) at the points $t_0, t_1, \ldots, t_{n+1}$ we have $y(t_i)$ or $Y(t_i)$ available, but the concentrations $x(t_i)$ are available at only $m+2$ points, where $m < n$.
4 ESTIMATION OF LOADING

Expressions for the estimate of the loading during the interval $[0,T]$ are presented for each of the three types of data given above. The trapezoidal rule is used to estimate the integral, but this is not a restriction, and the methods given below are equally applicable for any other numerical integration method.
4.1 Data of Type 1

The trapezoidal rule estimate of $L$ is

$$\hat{L} = \frac{1}{2}\sum_{i=0}^{n}\{L(t_i) + L(t_{i+1})\}(t_{i+1} - t_i),$$

where $L(t_i) = x(t_i)\,y(t_i)$ is the estimate of the instantaneous load $\ell(t_i)$. To determine the properties of $\hat{L}$, a model for $L(t_i)$ is needed. Assume that

$$L(t_i) = \ell(t_i) + \eta(t_i),$$

where $\eta(t_i)$ is a stochastic process with expected value 0, which makes $\hat{L}$ a random variable. Denote the expected value $E(\hat{L})$ by $\tilde{L}$; then

$$\tilde{L} = E(\hat{L}) = \frac{1}{2}\left\{\ell(t_0)(t_1 - t_0) + \sum_{i=1}^{n}\ell(t_i)(t_{i+1} - t_{i-1}) + \ell(t_{n+1})(t_{n+1} - t_n)\right\}.$$

The expected mean square error is given by

$$E(\hat{L} - L)^2 = \mathrm{Var}(\hat{L}) + (\tilde{L} - L)^2,$$

which is composed of two parts, as shown by the right hand side of the above equation. The first represents the stochastic error while the second represents the error due to numerical integration.

The numerical characteristics of $\mathrm{Var}(\hat{L})$ can be evaluated if we assume a special form for the stochastic process: $\eta(t_i)$ has constant variance $\sigma^2$ and autocorrelation $\rho_u$ at lag $u$. By setting $a_0 = t_1 - t_0$, $a_i = t_{i+1} - t_{i-1}$ $(i = 1, 2, \ldots, n)$ and $a_{n+1} = t_{n+1} - t_n$, then

$$\mathrm{Var}(\hat{L}) = \frac{\sigma^2}{4}\left\{\sum_{i=0}^{n+1} a_i^2 + 2\sum_{u=1}^{n+1}\rho_u\sum_{i=0}^{n-u+1} a_i a_{i+u}\right\}.$$

When $\rho_u = 0$ for $u = 1, 2, \ldots, (n+1)$, then

$$\mathrm{Var}(\hat{L}) = \frac{\sigma^2}{4}\sum_{i=0}^{n+1} a_i^2.$$

Furthermore, when the measurements are equally spaced with the distance $\Delta = t_{i+1} - t_i$, then

$$\mathrm{Var}(\hat{L}) = \sigma^2\Delta^2\,(n + 1/2).$$

Generally for any $\rho_u$, under equal spacing, we have

$$\mathrm{Var}(\hat{L}) = \sigma^2\Delta^2\left\{(n + 1/2) + 2\sum_{u=1}^{n}(n - u + 1)\,\rho_u + \frac{\rho_{n+1}}{2}\right\}.$$

Finally, the expression for the autoregressive process where $\rho_u = \rho^u$ is

$$\mathrm{Var}(\hat{L}) = \sigma^2\Delta^2\,(n + 1/2)\left[1 + \frac{\rho^{n+1}}{2n+1} + \frac{4\rho\,\{n - (n+1)\rho + \rho^{n+1}\}}{(2n+1)(1-\rho)^2}\right].$$
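The variance expressions for data of Type 1 can be cross-checked numerically. The sketch below assumes that the errors in the instantaneous loads have a constant variance and a correlation that depends only on the lag; the function names are illustrative and are not taken from the paper. It evaluates the general double sum over the trapezoidal weights and compares it with the closed form for an autoregressive correlation under equal spacing.

```python
def trapezoid_weights(times):
    """Weights a_0..a_{n+1} such that the trapezoidal estimate is 0.5 * sum(a_i * L(t_i))."""
    m = len(times)  # m = n + 2 sampling points t_0 .. t_{n+1}
    weights = [times[1] - times[0]]
    weights += [times[i + 1] - times[i - 1] for i in range(1, m - 1)]
    weights.append(times[-1] - times[-2])
    return weights


def trapezoid_load(times, loads):
    """Trapezoidal rule estimate of the total load from instantaneous loads."""
    weights = trapezoid_weights(times)
    return 0.5 * sum(a * load for a, load in zip(weights, loads))


def var_general(times, sigma2, rho):
    """Variance of the estimate: (sigma^2 / 4) * sum_i sum_j a_i a_j rho(|i - j|)."""
    weights = trapezoid_weights(times)
    total = 0.0
    for i, a_i in enumerate(weights):
        for j, a_j in enumerate(weights):
            total += a_i * a_j * rho(abs(i - j))
    return sigma2 * total / 4.0


def var_ar1_equal_spacing(n, delta, sigma2, p):
    """Closed form for rho_u = p**u with equal spacing delta (requires p != 1)."""
    bracket = (
        1.0
        + p ** (n + 1) / (2 * n + 1)
        + 4.0 * p * (n - (n + 1) * p + p ** (n + 1)) / ((2 * n + 1) * (1.0 - p) ** 2)
    )
    return sigma2 * delta ** 2 * (n + 0.5) * bracket


# Hypothetical setting: weekly sampling over one year, lag-one correlation 0.31.
n, delta, p = 51, 7.0, 0.31
times = [delta * i for i in range(n + 2)]
general = var_general(times, 1.0, lambda lag: p ** lag)
closed = var_ar1_equal_spacing(n, delta, 1.0, p)
```

The two values agree to rounding error, and setting the correlation to zero for all nonzero lags recovers the uncorrelated, equally spaced case.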
4.2 Data of Type 2

The data available in this case are $Y(t_i)$ and $x(t_i+\delta)$. There are two estimators available, which are dependent on the assumptions used.

(i) Under the assumption that $c(t)$ is constant within the interval $(t_i, t_{i+1})$, then $E(x(t_i+\delta)) = c(t_i)$ and as an estimate for $L$ we have

$$\hat{L}_1 = \sum_{i=0}^{n} Y(t_i)\,x(t_i+\delta). \qquad (11)$$

If $Y(t_i)$ is assumed to be non-stochastic then

$$\mathrm{Var}(\hat{L}_1) = \sum_{i=0}^{n}\sum_{j=0}^{n} Y(t_i)\,Y(t_j)\,\mathrm{Cov}\{x(t_i+\delta),\, x(t_j+\delta)\}.$$

(ii) Under the assumption that $c(t)$ is a smooth function, then $E(x(t_i+\delta)) = c(t_i+\delta)$. An approximation to the instantaneous flow rate is obtained as

$$y(t) = \frac{Y(t_i)}{t_{i+1} - t_i} \quad \text{for } t_i < t \le t_{i+1},$$

and hence equation (2) becomes

$$L = \sum_{i=0}^{n}\frac{Y(t_i)}{t_{i+1} - t_i}\int_{t_i}^{t_{i+1}} c(t)\,dt.$$

By the trapezoidal rule we have

$$L = \frac{1}{2}\sum_{i=0}^{n} Y(t_i)\,\{c(t_{i+1}) + c(t_i)\}.$$

The concentrations $c(t_{i+1})$ and $c(t_i)$ are not known. They are observed at $t_{i-1}+\delta$, $t_i+\delta$ and $t_{i+1}+\delta$. By using interpolation for the concentration between the observation times, the estimate of $L$ in this case is

$$\hat{L}_2 = \frac{1}{2}\sum_{i=0}^{n} Y(t_i)\,\{\hat{c}(t_{i+1}) + \hat{c}(t_i)\}, \qquad \hat{c}(t_i) = x(t_i+\delta) - \delta\,\frac{x(t_i+\delta) - x(t_{i-1}+\delta)}{t_i - t_{i-1}}.$$

4.3 Data of Type 3

In this case the concentrations are measured less frequently than the flow measurements. The approach used above could be applied to derive an estimator for the loading. The only additional computation is the estimation of the values of the concentration at the points where the $Y(t_i)$'s or $y(t_i)$'s are measured but the $x(t_i)$'s are not available.
TABLE 1. Annual Chloride Load and Its Standard Error (x 10^5 mt yr^-1)
Columns: Year; Load and Standard Error for data of Type 1; Load and Standard Error for data of Type 3.
Year: 75-76 76-77 77-78 78-79 79-80 80-81 81-82 82-83 83-84
Standard Error: 0.807 0.834 0.622 0.931 0.835 0.352 0.540 0.298 0.715
Load: 45.413 44.090 43.393 42.066 42.546 39.035 37.670 35.510 34.890
Standard Error: 0.599 0.793 0.629 0.883 0.834 0.621 0.601 0.329 0.596
Load: 45.502 44.138 42.747 42.191 43.514 39.090 37.775 35.513 35.037

TABLE 2. Autocorrelation Function for the Chloride Load and for Chloride Concentration for 1975/1976
Columns: Lag; Autocorrelation of Load; Autocorrelation of Concentration.
Lag: 0 1 2 3 4 5 6 7 8 9 10 11 12 13 14
Autocorrelation: 1.00 .31 .08 .02 -.09 -.01 .03 .06 .03 .01 .05 .07 .02 .02 .07
Autocorrelation: 1.00 .36 .12 .08 .06 .05 -.11 .08 .04 -.02 -.05 -.01 .00 -.01 .03
5 APPLICATION

The methods described in the previous section are used to estimate the annual chloride load from the Niagara River to Lake Ontario during the period from 1975 to 1984. The chloride concentrations are available on a weekly basis while the flow is available on a daily basis. The estimates of the load assuming data of type 1 and data of type 3 are given in columns 2 and 4 of Table 1, respectively.

To evaluate the standard errors of these estimates, the autocorrelation function is necessary. Table 2 gives the autocorrelation function for the load and the concentration for the year 1975/1976. It appears that an autoregressive model of order 1, AR(1), is appropriate. Using the AR(1) model, the standard errors of the estimates of the load are calculated for the two methods and are given in columns 3 and 5 of Table 1. It is clear that the difference between the two estimates is within the standard error of the methods. It is also clear that the chloride load of the Niagara River to Lake Ontario shows a steady decline during the period of this study.
REFERENCES

Casey, D.J. and Salbach, S.E., 1974. IFYGL stream materials balance study (IFYGL). Proc. 17th Conf. Great Lakes Res., Internat. Assoc. Great Lakes Res. 668-681.
Dolan, D.M., Yui, A.K. and Geist, R.D., 1981. Evaluation of river load estimation methods for total phosphorus. J. Great Lakes Res. 7: 207-214.
Thompson, M.G. and Bischoping, U., 1985. On the estimation of monthly mean phosphorus loadings. (This volume)
Vollenweider, R.A., Rast, W. and Kerekes, J., 1980. The phosphorus loading concept and Great Lakes eutrophication, pp. 207-234. In Phosphorus Management Strategies for Lakes, R.C. Loehr, C.S. Martin and W. Rast (eds.), Ann Arbor Sci., Ann Arbor, Mich., 490 pp.
Slater, R.W. and Bangay, G.E., 1980. Action taken to control phosphorus in the Great Lakes. In Phosphorus Management Strategies for Lakes, R.C. Loehr, C.S. Martin and W. Rast (eds.), Ann Arbor Sci., Ann Arbor, Mich., 490 pp.
INTERVENTION ANALYSIS OF SEASONAL AND NONSEASONAL DATA TO ESTIMATE TREATMENT PLANT PHOSPHORUS LOADING SHIFTS

K. A. Booman, The Soap and Detergent Association, New York
P. M. Berthouex, The University of Wisconsin-Madison
Lars Pallesen, Technical University of Denmark, Copenhagen

INTRODUCTION

Interventions in environmental systems are often promulgated without complete knowledge of the system. The promulgators often impose the intervention with a mixture of confidence and hope: confidence that the intervention will cause a real change in the desired direction, tempered by a degree of hopeful speculation that the shift will be large enough to justify the cost of the intervention.

Scientists naturally want to evaluate the effectiveness of the intervention after data have become available. The general problem is to estimate "the effect of an intervention that has been made with the intent of causing a system to change where the behavior of the system is indicated by a set of data that are a time series, and so the order in which the data occur as well as their magnitude is important" (Box and Tiao, 1975).

Estimating the change in phosphorus load entering a sewage treatment plant when a detergent phosphate ban goes into effect seems to be a straightforward task. Abundant useful data exist. And, a calculation comes quickly to mind. Use simple averages to characterize the levels before and after the ban. Unfortunately, this method will frequently give misleading results. Using a simple average assumes that there has been a long-term stationary (horizontal) level about which fluctuations occur. If there is a trend (stochastic or deterministic, linear or nonlinear, upward or downward), this average is not a good representation of the time series. If there is a seasonal pattern, arbitrary decisions must be made about how to "cut out" a section of the data over which the ban took effect.
More importantly, commonly applied significance tests assume that the data are independent of each other in time, i.e., that variations about the model are random. Most environmental time series data tend to be autocorrelated. Ignoring the autocorrelation will (1) make the averages seem more precise than they actually are and (2) bias the significance tests toward indicating a statistically significant change in level when none has occurred.

Time series analysis provides the means for properly taking the autocorrelation into account. A simple observation error random walk model of the form ARIMA(0,1,1) fits many data sets. Where there is an annual seasonal pattern, a more complicated model must be used.

Figures 1, 2, and 3 show some data and the fitted model. The ARIMA(0,1,1) model fits the Kalamazoo and Saginaw, Michigan data (Figure 1). In all, it gave an adequate fit to 12 of 21 data sets for Wisconsin treatment plants, two of which are not reported because local conditions are highly atypical, and 6 of 10 data sets for Michigan treatment plants. The Ann Arbor, East Lansing (Figures 2 and 3), and Warren data illustrate a seasonal pattern for which the simple model is inadequate. A seasonal model that has been used successfully to fit these three Michigan plants is presented.
THE BASIC (NONSEASONAL) MODEL

The basic model is simple in concept. The time series is affected by an intervention that is not fully realized until a few months later (we have used a four-month transition gap for the detergent ban problem). Data in this gap are not used in the intervention analysis because, as transition values, they represent neither the before nor the after situation. An exponentially weighted moving average (EWMA) is used to estimate the levels immediately before and after the gap. The magnitude of the effect of the intervention is the difference of the two, since the forecast for the ARIMA(0,1,1), or observation error/random walk model, is a horizontal line. The degree of uncertainty associated with the estimated intervention effect increases as the forecast interval increases, and this must be accounted for when assessing the precision of the estimated effect. Details are given in Pallesen et al. (1985).
FIGURE 1. SAGINAW AND KALAMAZOO DATA AND FIT OF THE RANDOM WALK-OBSERVATION ERROR MODEL. (Axes: ln(inf. P, kg/day) against month; the date of the ban is marked.)

FIGURE 2. ANN ARBOR DATA, FIT OF RANDOM WALK, SEASONAL RANDOM WALK, OBSERVATION ERROR MODEL, AND THE SEASONAL COMPONENT OF THE FIT. (Axes: ln(inf. P, kg/day) against month; seasonal component against month.)

FIGURE 3. EAST LANSING DATA, FIT OF RANDOM WALK, SEASONAL RANDOM WALK, OBSERVATION ERROR MODEL, AND SEASONAL COMPONENT OF FIT. (Axes: ln(inf. P, kg/day) against month; seasonal component against month.)
Using the EWMA instead of a simple average to estimate the pre- and post-levels arises from the time series model that was used to describe the data series. The mathematical model describes the data as being generated by a process that drifts in the pattern of a random walk and which cannot be observed without error. The model is

$$Y_t = y_t + e_t \qquad (1)$$

$$y_t = y_{t-1} + \varepsilon_t \qquad (2)$$

where $y_t$ is the underlying true, but unmeasurable, value of the variable, $Y_t$ is the observed value of the variable, $e_t$ is the "measurement" error, and $\varepsilon_t$ is the shock causing the random walk (drift). This representation of the data series has substantial intuitive appeal. In particular, it seems easier to accept than completely random variation about a fixed level. In the application discussed in this paper, $Y_t$ will be the mass flux of total phosphorus or the logarithm of this quantity.

Equations 1 and 2 can be combined to express the model as

$$Y_t - Y_{t-1} = a_t - \theta\,a_{t-1} \qquad (3)$$

in which $a_t$ and $a_{t-1}$ are random "shocks" (white noise) and are related to $e_t$ and $\varepsilon_t$ in Equations 1 and 2. This is the ARIMA(0,1,1) model, which is described in Box and Jenkins (1976) and Harvey (1981). The single parameter of this model, $\theta$, becomes the weighting factor for exponentially weighted moving averages. Details of how this parameter is estimated are given in Pallesen et al. (1985).
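The use of the EWMA to estimate the levels on either side of the transition gap can be sketched as follows. This is only an illustration under stated assumptions: the weighting factor is taken as known rather than estimated, the level after the gap is obtained by running the EWMA backward in time over the post-gap data, and the function names and the example series are hypothetical. It is not the estimation procedure of Pallesen et al. (1985), and it gives no measure of precision.

```python
def ewma_level(values, theta):
    """EWMA level at the end of `values`; the weight on the k-th most recent value is (1 - theta) * theta**k."""
    level = values[0]  # initialise at the first observation (an assumption)
    for y in values[1:]:
        level = theta * level + (1.0 - theta) * y
    return level


def intervention_shift(series, ban_index, gap, theta):
    """Difference between the EWMA levels just after and just before the transition gap."""
    before = series[:ban_index]
    after = series[ban_index + gap:]
    pre_level = ewma_level(before, theta)
    # Reverse the post-gap data so the EWMA ends at the first post-gap month.
    post_level = ewma_level(after[::-1], theta)
    return post_level - pre_level


# Hypothetical monthly log-loads: level 6.0, a four-month transition, then level 5.5.
series = [6.0] * 30 + [5.9, 5.8, 5.7, 5.6] + [5.5] * 30
shift = intervention_shift(series, ban_index=30, gap=4, theta=0.7)
```

For this noise-free example the estimated shift equals the true change in level; the four transition values do not enter the calculation.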
THE SEASONAL MODEL

Identifying a simple, useful model has been challenging. While the data sets that are seasonal show an annual cycle, this cycle is not smoothly sinusoidal and so it has been difficult to model.

The seasonal model that fits the East Lansing data and also data for Warren and Ann Arbor, Michigan, is

$$Y_t = y_t + s_t + e_t \qquad (4)$$

$$y_t = y_{t-1} + \varepsilon_t \qquad (5)$$

$$s_t = s_{t-12} + f_t \qquad (6)$$

This model incorporates an annual pattern, but it is not a deterministic sine pattern; instead, it allows more flexibility. Observation error is accounted for by Equation 4. Equations 4, 5 and 6 are considered to represent a useful seasonal model. In these equations, $s_t$ is the seasonal effect component at time $t$ and $f_t$ is the random shock causing (or associated with) change in the seasonal effect component. The model is simple and yields level forecasts.

The smoothness priors-state space approach of Kitagawa and Gersch (1984), Gersch and Kitagawa (1983), and Kitagawa (1981), after modification to make it fully Bayesian, was used to fit the model given by Equations 4, 5, and 6 and to estimate the intervention effect. Kalman filter equations are used to carry out approximate maximum likelihood estimation.

The calculations required for analysis of the nonseasonal and seasonal data sets were readily programmed in APL and carried out on a microcomputer.

Models of the ARIMA family containing the operator $(1 - 3^{1/2}B + B^2)$, which defines a sinusoidal pattern with a 12-month cycle, were also tried, for example

$$(1 - 3^{1/2}B + B^2)\,Y_t = (1 - \ldots)\,a_t \qquad (7)$$

This deterministic sinusoidal pattern was not satisfactory.
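The link between the operator $(1 - 3^{1/2}B + B^2)$ and a 12-month sinusoid can be verified directly, since $2\cos(2\pi/12) = 3^{1/2}$. The sketch below, with illustrative names and made-up amplitude and phase, applies the operator to a pure 12-month cosine and to a non-sinusoidal pattern with the same period.

```python
import math


def apply_seasonal_operator(x):
    """Return (1 - sqrt(3) B + B^2) x_t for t >= 2, where B is the backshift operator."""
    root3 = math.sqrt(3.0)
    return [x[t] - root3 * x[t - 1] + x[t - 2] for t in range(2, len(x))]


# A pure sinusoid with a 12-month period (amplitude and phase are arbitrary).
sinusoid = [2.5 * math.cos(2.0 * math.pi * t / 12.0 + 0.4) for t in range(48)]
sinusoid_residual = apply_seasonal_operator(sinusoid)

# A sawtooth with the same 12-month period, which is not sinusoidal.
sawtooth = [float(t % 12) for t in range(48)]
sawtooth_residual = apply_seasonal_operator(sawtooth)
```

The operator reduces the sinusoid to zero (up to rounding error) but leaves a nonzero remainder for the sawtooth, which illustrates why an annual cycle that is not smoothly sinusoidal is poorly described by such a model.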
RESULTS OF INTERVENTION ANALYSIS

The phosphate detergent ban in Michigan was in force on October 1, 1977. The data sets start January 1975 and run for seven years. The Wisconsin history is more complicated. A ban was put into effect on July 1, 1979. This ban was lifted three years later, July 1, 1982, and was reimposed on January 1, 1984.
TABLE 1. ESTIMATED INFLUENT PHOSPHORUS LOAD SHIFTS DUE TO PHOSPHATE DETERGENT REGULATION IN WISCONSIN AND MICHIGAN

Treatment Plant       Shift due to ban¹     Shift due to ban lift²
                      kg/cap-yr             kg/cap-yr
Algoma, WI            -1.061                +0.410
Brookfield, WI        -0.133                +0.179
Burlington, WI        -0.150                +0.254
Clintonville, WI      -0.094                +0.065
Kewaskum, WI          -0.251                -
Omro, WI              -0.137                +0.242
Racine, WI            -0.585                +0.277
Milwaukee, WI³        -0.522                +0.193
Grand Rapids, MI      -0.247                NA⁴
Jackson, MI           -0.356                NA
Kalamazoo, MI         -0.833                NA
Lansing, MI           +0.201                NA
Midland, MI           -0.881                NA
Saginaw, MI           -0.508                NA
East Lansing, MI      -0.257                NA
Warren, MI            -1.17                 NA
Ann Arbor, MI         -0.581                NA

¹ October 1, 1977, for Michigan; July 1, 1979, for Wisconsin.
² July 1, 1982, for Wisconsin only; no ban lift in Michigan.
³ Result reported for the combined Jones Island and South Shore plants; the ARIMA(0,1,1) model also fits each plant individually.
⁴ NA - not applicable since there was no ban lift in Michigan.
FIGURE 4. BOX AND WHISKER PLOT OF THE ESTIMATED EFFECTS OF THE BAN IN MICHIGAN AND THE BAN AND BAN-LIFT IN WISCONSIN ON WASTEWATER TREATMENT PLANT INFLUENT PHOSPHORUS LOADS. (Plot title: Summary of Intervention Analysis Results. Groups: Michigan 1977, Wisconsin 1979, Wisconsin 1982.)
The data analyzed were monthly averages, expressed as pounds of total P per day. Calculations were done after taking the natural logarithm since experience showed that this stabilizes the variance.

The estimated shifts in influent phosphorus mass loading for the sixteen plants for which this model was adequate are given in Table 1. These results have been converted from the logarithmic to the original metric and then converted to kg/capita-year.

The results for East Lansing, Warren, and Ann Arbor were estimated using the seasonal model. All other results come from the simple random walk/observation error model.

Figure 4 is a box and whisker plot of the per capita shifts listed in Table 1. This plot shows the considerable variation between cities. It also indicates that the effect of a detergent ban is about 0.3 kg/capita-year.
CONCLUSIONS

An ARIMA(0,1,1) time series model has been used successfully on about sixty percent of the data sets analyzed to date. A seasonal model has been successfully applied to three data sets. The model may possibly be adequate for the remaining sets but the analysis is not complete at this time. The effect of a detergent phosphate ban on influent wastewater treatment plant P loads appears to be about 0.3 kg/cap-yr, as of 1982.
REFERENCES

Box, G.E.P. and Jenkins, G.M., 1976. Time Series Analysis: Forecasting and Control. Revised edition, Holden-Day, Oakland, California.
Box, G.E.P. and Tiao, G.C., 1965. A change of level of a nonstationary time series. Biometrika, 52: 181-192.
Box, G.E.P. and Tiao, G.C., 1975. Intervention analysis with applications to economic and environmental problems. Jour. Amer. Stat. Assoc., 70: 70-79.
Gersch, W. and Kitagawa, G., 1983. The prediction of time series with trends and seasonalities. Jour. of Bus. and Econ. Stat., 1: 253-264.
Harvey, A.C., 1981. Time Series Models. Halsted Press, New York.
Kitagawa, G., 1981. A nonstationary time series model and its fitting by a recursive filter. Jour. Time Series Analysis, 2: 103-116.
Kitagawa, G. and Gersch, W., 1984. A smoothness prior-state space modelling of time series with trend and seasonality. Jour. Amer. Stat. Assoc., 79: 378-389.
Pallesen, L., Berthouex, P.M. and Booman, K.A., 1985. Environmental intervention analysis: Wisconsin's ban on phosphate detergents. Water Research, 19: 353-362.
SEDIMENT RESPONSES DURING STORM EVENTS IN SMALL FORESTED WATERSHEDS

W.A. RIEGER and L.J. OLIVE
Department of Geography, Royal Military College, ACT, Australia

ABSTRACT

Measurements of suspended sediment concentration and discharge during storm events are examined to determine the possible patterns in response of sediment to flow in five small forested watersheds. The examination of sediment response is carried out in two contexts:
(a) The response of suspended sediment to total discharge (baseflow and quickflow or stormflow), or in the framework commonly used for sediment prediction modelling.
(b) The response of suspended sediment to quickflow, where quickflow is postulated as a possible mechanism of sediment delivery to the channel.
In both contexts, hysteresis diagrams are first used to determine the broad patterns between suspended sediment concentration and flow in the time domain. Results indicate that seven different response types are operating in the watersheds. Spectral analysis is then used on the storm event data in an attempt to isolate possible factors which may be causing the different response types. The temporal and spatial variations found to be operating in the watersheds have important implications for both the design of monitoring networks and the associated water sampling techniques; and for the commonly used linear predictive methods of estimating sediment loads.

1. INTRODUCTION

Suspended sediment concentrations (mg l^-1) in stream channels have been used by researchers as measures of rates of erosion and soil loss from drainage basins. During storm events in a basin, complex erosional processes occur over parts of the slopes of the basin and the resultant of these processes is sediment delivery to the channel (Walling, 1983). By monitoring both discharge (m^3 sec^-1) and suspended sediment concentrations at a point in the channel, a measure of sediment delivery can be obtained for the corresponding watershed area.
While such data can be used for bulk estimates of erosion, they can also be used for the determination of the behaviour of suspended sediment concentrations, or sediment responses, during storm events. Once the responses are known and fully understood, they can form the basis for water quality research on suspended sediment. Networks can be designed so all important aspects of sediment response are monitored, or possible prediction models developed based on the known response patterns.

In the past, however, the bulk of research on sediment response has been hampered by the relatively simplistic models of suspended sediment behaviour. The models are based on a simple linear relationship between discharge and suspended sediment and were originally developed for the prediction of suspended sediment in the form of a rating curve. The curve takes the form:

$$C = aQ^b, \qquad (1)$$

where $C$ is suspended sediment concentration, $Q$ is discharge, and $a$ and $b$ are constants for a particular watershed. The curves are estimated from a sample of field data consisting of a wide range of discharges and the corresponding concentrations, using least squares regression on the logarithmically transformed data. Though rating curves are convenient to use, their simple linear framework gives little indication of the dynamic behaviour of the relationship between concentration and discharge. Slope erosion, and thus sediment delivery, is a storm event based phenomenon with important temporal variations occurring throughout the storm. To use a simple linear model to describe the behaviour of sediment delivery ignores the sequential nature of the variables $C$ and $Q$, and the fixed coefficients of the rating curve do not allow for possible variations in the response of suspended sediment concentration at different scales or levels of discharge.
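The estimation of a rating curve by least squares on the logarithmically transformed data can be sketched as follows. The function name and the discharge and concentration values are hypothetical; the data are generated exactly from a power law so that the fitted constants can be checked, whereas field data would scatter about the fitted line.

```python
import math


def fit_rating_curve(discharge, concentration):
    """Least squares fit of log C = log a + b log Q; returns the constants (a, b)."""
    x = [math.log(q) for q in discharge]
    y = [math.log(c) for c in concentration]
    n = len(x)
    xbar = sum(x) / n
    ybar = sum(y) / n
    sxx = sum((xi - xbar) ** 2 for xi in x)
    sxy = sum((xi - xbar) * (yi - ybar) for xi, yi in zip(x, y))
    b = sxy / sxx
    a = math.exp(ybar - b * xbar)
    return a, b


# Hypothetical discharges (m^3 sec^-1) and concentrations (mg l^-1) following C = 3 Q^1.5.
discharge = [0.05, 0.1, 0.2, 0.4, 0.8]
concentration = [3.0 * q ** 1.5 for q in discharge]
a, b = fit_rating_curve(discharge, concentration)
```

Because the fit uses only the paired values of $C$ and $Q$, reordering the observations in time leaves $a$ and $b$ unchanged, which is the sense in which the rating curve ignores the sequential nature of the data.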
In the following discussion, these two important aspects of suspended sediment response are examined for data obtained during storm events. The sequential nature of sediment response to both total discharge and to quickflow (stormflow) is considered in terms of hysteresis diagrams which give an indication of the behaviour of the variables in the time domain. Possible scale variations between suspended sediment and discharge are then considered by a transfer to the frequency domain, or via spectral analysis. Conclusions are then drawn concerning the implications of varying response patterns to water quality monitoring.

2. STUDY AREA AND DATA

Data for the analysis are from five small forested watersheds in south eastern New South Wales, with all five streams flowing into the Wallagaraugh River. The watersheds are adjacent to one another and vary in size from 76 ha to 225 ha. Within each watershed, a 140° V-notch weir has been installed, stage was measured with a Rimco Sumner Mark II float recorder, and water samples were taken with a Gamet automatic water sampler. The water samplers take point samples and are float switch operated. Concentration of suspended sediment for each water sample was determined using a membrane filtration technique.

The present analysis is based on the period July 1977 to June 1979. During this period, 20 storm events, with rainfalls varying from 12 to 339 mm, were sampled in the five watersheds. Due to equipment failure and, in some instances, little or no sediment response in the watersheds during storm events, a total of 39 individual storm hydrographs have been analysed. Both the discharge series and the suspended sediment concentration series for these 39 storm events were interpolated to one hour time intervals before the analysis was carried out.

3. TIME DOMAIN ANALYSIS
3.1 Suspended Sediment Response t o Discharge To o b t a i n some idea of t h e broad behaviour between suspended sediment c o n c e n t r a t i o n and t o t a l d i s c h a r g e (baseflow and q u i c k f l o w ) , h y s t e r e s i s p l o t s w e r e used. These p l o t s a r e simply a s c a t t e r diagram f o r t h e two v a r i a b l e s , with t h e s z q u e n t i a l a s p e c t of t h e d a t a denoted by j o i n i n g a d j a c e n t p o i n t s i n t h e t i m e series with a straight line. Before t h e p l o t s were c o n s t r u c t e d , a t h r e e p o i n t moving average f i l t e r was a p p l i e d t o t h e two s e r i e s , t h u s removing high frequency components s o only broad p a t t e r n s of sediment response were i n d i c a t e d with t h e h y s t e r e s i s diagrams. The h y s t e r e s i s p l o t s f o r t h e d a t a from t h e 39 storm e v e n t s i n d i c a t e d t h a t seven d i f f e r e n t suspended sediment response types were o p e r a t i n g i n t h e watersheds ( F i g u r e 1 ) . A f u l l d e s c r i p t i o n of each of t h e s e responses i s given by Olive and Rieger ( 1 9 8 5 ) , and a b r i e f summary of t h e i r c h a r a c t e r i s t i c s i s a s follows: ( a ) S i n g l e rise storm events with sediment l e a d , o r a simple clockwise loop, occurred i n f o u r of t h e watersheds and made up 23% of the storm events analysed ( b ) S i n g l e rise storm events with sediment l a g , o r a c o u n t e r clockwise loop, occurred i n t h r e e watersheds and made up 8% of t o t a l events
Figure 1: Sediment response types for storm events. Panels, single rise: (a) sediment lead, (b) sediment lag, (c) sediment-discharge correlation; multiple rise: (d) sediment lead, (e) sediment lag, (f) sediment lead-lag; (g) no recognisable pattern. Axis: discharge (cumecs).
(c) Single rise with the sediment and discharge peaks in phase occurred in two watersheds and made up 5% of the total events
(d) Multiple rise with sediment lead and sediment depletion occurred in two watersheds and made up 10% of total events
(e) Multiple rise with sediment lag and sediment depletion occurred in two watersheds and made up 5% of total events
(f) Multiple rise with sediment lead and lag occurred in three watersheds and made up 8% of total events
(g) Responses in which there was no identifiable pattern occurred in four watersheds and made up 41% of the total events.

These results are made more complicated when individual watersheds and storm events are taken into consideration. Over the two year study period, particular streams demonstrated up to five different response types during storm events and no stream showed the same response type for all storms. There were also major differences in response type among the five watersheds for particular storm events. The dominance of response types with no identifiable pattern (41% of the total storms analysed) points further to the complexity of the behaviour of suspended sediment concentrations.

3.2 Suspended Sediment Response to Quickflow
The examination of the response of suspended sediment to quickflow, or storm flow, is in the realm of process studies in that the complexity of the sediment delivery problem is reduced to a form where quickflow is postulated as a possible delivery mechanism (Walling and Webb, 1982). Since the source of sediment is within a watershed and the sediment is transported to the channel by surface runoff and interflow, quickflow has appeal as a possible delivery mechanism. Baseflow separation was carried out on the discharge series for storm events using a recursive digital filter proposed by Lyne and Hollick (1979). The filtering process takes the form:

$$Q_q(t) = a\,Q_q(t-1) + \frac{1+a}{2}\,[Q(t) - Q(t-1)], \qquad (2)$$

where $Q_q(t)$ is the quickflow component, $Q(t)$ is total streamflow, and $a$ is the filtering coefficient. The values used for the coefficient, $a$, were in the range 0.7 to 0.9, and the phase characteristics of the series were preserved with a two pass forward and backward application of the filter. Hysteresis plots were generated for suspended sediment concentration and quickflow in the same fashion as outlined in Section 3.1. The results for the 39 storm events are as follows:
(a) Those responses which showed identifiable patterns, or types (a) through (f) in Section 3.1, showed similar behaviour in the sediment responses to quickflow
(b) Approximately half of the storm events which showed no identifiable pattern in the sediment-discharge plots had identifiable patterns in sediment response to quickflow.

Thus in using quickflow as a possible sediment delivery mechanism, sediment responses are almost as complex as were the responses to total discharge. Individual watersheds displayed differing responses throughout the study period and there was variation in response among the five watersheds to particular storm events. The major difference with response to quickflow was a reduction of the responses showing no identifiable pattern to 21% of the total storms studied.

4. FREQUENCY DOMAIN ANALYSIS
The object of transferring to the frequency domain, via spectral analysis, is to isolate the important frequency components, or scales, which might be part of the process operating between discharge and suspended sediment concentration. Frequency domain analysis also has the possible benefit of reducing the complexity of sediment response as indicated by the time domain analysis in the above discussion.

The Maximum Entropy Method (MEM) was used for the calculation of spectral estimates. MEM was preferred to the more traditional methods of spectral estimation (Jenkins and Watts, 1968) because it can be used for short time series (Ulrych and Bishop, 1975), which is the case for storm event data. Most of the 39 events used here have fewer than 100 observations in the discharge and suspended sediment concentration series.

As the frequency domain analysis is to be used for the study of the process operating between discharge and concentration, the data for each of these variables were combined to give a new variable which was representative of that process. This new variable was generated by calculating the slope between concentration and discharge for adjacent points in time, giving a new series which measures the changing sequential relationship between the two variables. In effect, the new series represents the slope angle of adjacent points along the hysteresis plot for a storm event.
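The two steps described above, building the slope angle series and estimating its maximum entropy spectrum, can be sketched as follows. This is an illustration under stated assumptions and not the authors' code: the text does not say how a zero change in discharge was handled (`atan2` is used here), nor which MEM algorithm or model order was used (Burg's recursion is assumed, and the order is left to the caller). All function names are hypothetical.

```python
import math


def slope_angle_series(discharge, concentration):
    """Angle (radians) of each segment joining adjacent hysteresis points.

    atan2 is an assumption: it keeps the angle defined when the change in
    discharge between two adjacent hours is zero.
    """
    pairs = list(zip(discharge, concentration))
    return [math.atan2(c1 - c0, q1 - q0)
            for (q0, c0), (q1, c1) in zip(pairs, pairs[1:])]


def burg_ar(x, order):
    """Burg estimate of AR coefficients a[0..order] (a[0] = 1) and residual
    variance for the model x[n] + a[1] x[n-1] + ... + a[p] x[n-p] = e[n]."""
    mean = sum(x) / len(x)
    x = [v - mean for v in x]
    f = x[1:]     # forward prediction errors
    b = x[:-1]    # backward prediction errors, one step behind f
    a = [1.0]
    var = sum(v * v for v in x) / len(x)
    for _ in range(order):
        num = -2.0 * sum(fi * bi for fi, bi in zip(f, b))
        den = sum(fi * fi for fi in f) + sum(bi * bi for bi in b)
        k = num / den                       # reflection coefficient
        ext = a + [0.0]
        a = [u + k * v for u, v in zip(ext, reversed(ext))]  # Levinson update
        # Update both error series from the old values, then re-align them.
        f, b = ([fi + k * bi for fi, bi in zip(f, b)][1:],
                [bi + k * fi for fi, bi in zip(f, b)][:-1])
        var *= 1.0 - k * k
    return a, var


def mem_spectrum(x, order, freqs, dt=1.0):
    """Maximum entropy power spectrum of x at freqs (cycles per unit of dt)."""
    a, var = burg_ar(x, order)
    spec = []
    for fr in freqs:
        w = 2.0 * math.pi * fr * dt
        re = sum(ak * math.cos(w * k) for k, ak in enumerate(a))
        im = -sum(ak * math.sin(w * k) for k, ak in enumerate(a))
        spec.append(var * dt / (re * re + im * im))
    return spec
```

For hourly data `dt` is 1 and the usable frequencies run from 0 to 0.5 cycles per hour. A typical call would be `mem_spectrum(slope_angle_series(q, c), order, freqs)`, with `order` kept well below the series length, since the storm event series here mostly have fewer than 100 observations.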
The generalised results of the spectral analysis of the 39 storm events are shown in Figure 2, which can be summarised as follows:

(a) All storm events are dominated by a low frequency component which is likely a manifestation of the broad loop in the hysteresis plots.
Figure 2: Generalised spectra for storm event data. Legend: identifiable sediment responses; unidentifiable sediment responses. Axis: frequency (cycles hr⁻¹), 0 to 0.5.
(b) All storm events showed a high frequency component in the range 0.37 to 0.50 cycles hr⁻¹. This frequency likely corresponds to the fluctuations about the broad hysteresis loop, and was removed by the moving average filter applied to the data when the hysteresis plots were generated in Section 3.
(c) Those storm events which showed no identifiable pattern in the time domain were differentiated from the events with recognisable patterns by a mid-frequency component in the range 0.20-0.33 cycles hr⁻¹.
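The effect of the three point moving average on the frequency bands listed above can be checked with a short sketch. Equal weights of one third are an assumption, since the text says only "three point moving average", and the function name is hypothetical.

```python
import math


def moving_average_gain(freq, weights=(1 / 3, 1 / 3, 1 / 3), dt=1.0):
    """Amplitude gain of a centred moving average at freq (cycles per dt).

    Equal weights are assumed; the source does not state the weights.
    """
    half = len(weights) // 2
    re = sum(w * math.cos(2 * math.pi * freq * (k - half) * dt)
             for k, w in enumerate(weights))
    im = sum(w * math.sin(2 * math.pi * freq * (k - half) * dt)
             for k, w in enumerate(weights))
    return math.hypot(re, im)


# Gains at frequencies spanning the low, middle and high bands.
for f in (0.05, 0.20, 0.33, 0.37, 0.50):
    print(f"{f:.2f} cycles/hr -> gain {moving_average_gain(f):.2f}")
```

For equal weights the gain reduces to |1 + 2 cos(2πf)| / 3. It is close to 1 at low frequencies, falls to zero at one third of a cycle per hour, and is about 0.12 at 0.37 and exactly one third at 0.50 cycles per hour, so under this assumption the high frequency band is strongly damped in the smoothed hysteresis plots while the broad loop is left almost unchanged.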
5. CONCLUSIONS
The examination of the behaviour of suspended sediment concentrations in stream channels during storm events has indicated a complex set of responses which have some important implications for water quality monitoring and prediction models.

In the case of monitoring, both the temporal and the spatial variation in sediment response have to be considered in any sampling network. Since sediment response varies between watersheds for individual storms, sampling a particular watershed and extrapolating the results to adjacent watersheds may prove to be invalid. The actual sampling rate at points within the network also needs careful consideration. For example, rate-of-rise water samplers assume that sediment response to discharge is a linear function, and would not correctly sample for responses with sediment leads or lags.

In the case of prediction models for suspended sediment, it is obvious that the commonly used rating curve does not manifest the true behaviour between concentration and discharge. The linear response assumed by the rating methodology occurred in only 5% of the storm events studied. However, the spectra of the storm event series showed some similarity in dominant frequency components for the storm events which had such varied responses in the time domain. All events contained a high and a low frequency component, and events which showed no identifiable pattern in the time domain contained an additional middle frequency component.
ACKNOWLEDGEMENTS
The authors would like to thank the Australian Research Grants Scheme and the Forestry Commission of New South Wales for their assistance.
REFERENCES

Jenkins, G.M. and Watts, D.G., 1968: Spectral Analysis and its Applications, Prentice-Hall, Englewood Cliffs, N.J.
Lyne, V.D. and Hollick, M., 1979: Stochastic time-varying rainfall-runoff modelling, Hydrology and Water Resources Symposium, 89-92, Institution of Engineers, Australia.
Olive, L.J. and Rieger, W.A., 1985: Variation in suspended sediment concentration during storms in five small catchments in south east New South Wales, Australian Geographical Studies, 23, 38-51.
Ulrych, T.J. and Bishop, T.N., 1975: Maximum entropy spectral analysis and autoregressive decomposition, Reviews of Geophysics and Space Physics, 13, 183-200.
Walling, D.E. and Webb, B.W., 1982: Sediment availability and the prediction of storm period sediment yields, IAHS Publication No. 137, 327-340.
Walling, D.E., 1983: The sediment delivery problem, Journal of Hydrology, 69, 209-237.
INDEX

A
Acid rain, 64
Analysis of data, 261
Appalachian Plateau, 133
Atlantic Canada, 53
Atmospheric inputs, 53
Autoregressive process, 404

B
Bhattacharyya's measure, 266
Biological monitoring, 261
Biomagnification of a contaminant, 233
Bray-Curtis index, 247
British Columbia, 434

C
Canadian Shield, 133
Censored water quality data, 137
Change point problems, 381
  Detection and estimation, 385
  Two regime transition model, 381
Chemical transport, 345
Chi-square goodness-of-fit test, 198, 215
Chloride loading, 469
Chlorophyll a, 273
Chromatographic and colorimetric evaluation, 64
Cluster analysis, 100, 133, 199
Coliform counts, 217
Coliform monitoring, 183
Collection of water samples, 196
Computerization, 418
  Aliasing, 419
  Commodore 64 (C-64), 445
  Computer configuration, 426
  Conversion to, 419
  Coordination and control, 424
  Cost-effective design, 444
  Data acquisition, 418, 443
  Interfacing system, 448
  Microcomputer based, 445
  Programming considerations, 429
  Resolution, 420
  Signal conditioning system, 446
  Software development, 453
  Systems approach, 418
Contaminant analysis: in aquatic biota, 231
Correlation analysis, 199
Covariate analysis, 303
Cross correlation, 306

D
Data acquisition, 18
Data uncertainty, 18
  Implications of, 25
  Estimation, 21
  Sources, 18, 28
Data utilization, 18
Design quality assurance, 81
Detecting changes in regression, 382
Detecting parameter changes, 384
Discrimination techniques, 326
Dispersion patterns of bacteria, 196
Distributional parameters, 137
Dissolved organic carbon, 61

E
Ecological monitoring, 30
  La Grande Complex, Quebec, 30
England and Wales, 221
Estimate verification, 155
Estimation, 405
  Bayes' estimators, 405
  Comparison, 411
  Of extreme values, 173
  Of parameters, 142
  With classification, 148
Euclidean distance index, 248
Eutrophication, 273

F
Fermentation tube technique, 184
France, 194
Frequency analysis, 177, 187, 495
Frequency component identification, 388

G
Gamma Markov processes, 293
  Input properties, 294
  Linear regressive model, 297
  Weighted sum of seasonal Gamma variables, 299
Global variance, 335, 339
Gower Similarity Coefficient Matrix, 33, 247
Great Lakes Water Quality Agreement, 273
Grouping procedures, 5

H
Hazen Plot: lognormal frequency distribution, 186
Heterogeneity characterization, 1
Hierarchical clustering analysis, 30, 33

I
Ion Chromatography (IC), 64

K
Kendall's Tau statistic, 348

L
Laboratory analysis, 21
Lag one serial correlation, 347
Lake Erie, 2
Lake Ontario, 79, 99, 469
  Surveillance program, 274
Limnological data set: statistical assessment, 363
Log-log Ancova model, 232
Long term water quality records, 388

M
Mann-Whitney test, 333
Mass discharge estimation, 345
Maximum entropy method, 495
Mean value estimation, 187
Membrane filtration technique, 184, 222
Methyl thymol blue procedure (MTB), 54, 64
Michigan, 486, 487
Microbiological water quality, 221
  Assessment, 221
  Standards, 222
Monitoring activities, 19
  Bacterial density, 194
  Coastal stream, 433
  High frequency, 433
Morisita index (modified), 247
Most probable number (MPN), 222
Multiple standard addition (MSA), 65
Multiple tube (dilution) method, 222
Multispecies studies, 261
Multivariate methods, 30, 33

N
National assessment program, 95
Natural variability, 158
Negative binomial distribution, 215
New South Wales, Australia, 492
Niagara River, 8, 461, 469
Network design, 95
Nonparametric methods, 333, 383
Numerical integration, 469
Nutrient concentration, 163, 168

P
Parameter estimation, 198
Phosphorus, 302, 479
  Detergent ban, 364, 486
  Estimation, 460
  Loading, 363, 460
  Loading shifts, 479
  Monthly mean
  Time Series: Non-Seasonal, 480
  Time Series: Seasonal, 484
  Zero correlation model, 462
Phytoplankton biomass measurement, 274
Poisson distribution, 222
Poisson model, 217
Precipitation, 51
Principal Components Analysis, 30, 33, 101, 124

Q
Quantification of non-tidal temporal variability, 161
Quebec Rivers, 117

R
Random walk/observation error, 480
Randomization procedures, 261
Ratio of isotopes of elements in biogenic material, 237
Ratio of sensitive species to resistant species, 238
Regression models, 318
  Autoregressive residuals, 318
  First order polynomial, 319
  Polynomial plus centred period component, 320
Relationship between a variable and its position, 10
Relative standard deviation, 24
Reservoir: La Grande, 30
River acidity, 44
Root mean square error, 335

S
St. Lawrence River, 93
Sample collection, 21
Sampling properties, 246
Satterthwaite's approximation, 331
Seasonal variability, 99, 329
Sequential sampling, 200
Similarity indices, 247, 262
Simulation technique, 248
Snedecor's F-test, 331
Spatial autocorrelation methods, 8
Spatial correlations, 175
Spatial distribution, 100
Spatial variability, 117
Spectral Analysis, 388
Stander's measure, 263
Stochastic transfer function, 303
Student's t-test, 327
Sulphate determination, 53, 64
Suspended sediment response, 490

T
Temperate estuary, 158
Time domain analysis, 495
Time series analysis, 44, 176, 388, 302, 335, 347, 463
  Iterative linear interpolation, 337
  Markovian, 335
  Test comparisons, 347
Total survey design concept, 21
Transfer function, 44, 302
Trend assessment monitoring, 388
Trend surface analysis, 11
Trends, 303, 347
  Deterministic, 100, 350
  Linear regression models, 351
  Logistic model, 353
  Second order autoregressive model, 353
  Stochastic, 350
  Threshold autoregressive model, 354

U
Univariate analysis, 47

V
Value of total estimate, 26
Vancouver, 388
Virginia, 159, 267

W
Water colour, 56
Water quality, 17
  Indicators, 433
Wisconsin, 486, 487
Wisconsin Lakes, 363

Z
Zonation determination, 99