Statistical Aspects of Water Quality Monitoring: Proceedings of Workshop Held at Canadian Centre for Inland Waters, Oct 85 (Developments in Water Science)


Author: A. H. El-Shaarawi


STATISTICAL ASPECTS OF WATER QUALITY MONITORING Proceedings of the Workshop held at the Canada Centre for Inland Waters, October 7-10, 1985

DEVELOPMENTS IN WATER SCIENCE, 27

OTHER TITLES IN THIS SERIES

1 G. BUGLIARELLO AND F. GUNTER, COMPUTER SYSTEMS AND WATER RESOURCES
2 H.L. GOLTERMAN, PHYSIOLOGICAL LIMNOLOGY
3 Y.Y. HAIMES, W.A. HALL AND H.T. FREEDMAN, MULTIOBJECTIVE OPTIMIZATION IN WATER RESOURCES SYSTEMS: THE SURROGATE WORTH TRADE-OFF METHOD
4 J.J. FRIED, GROUNDWATER POLLUTION
5 N. RAJARATNAM, TURBULENT JETS
6 D. STEPHENSON, PIPELINE DESIGN FOR WATER ENGINEERS
7 V. HALEK AND J. SVEC, GROUNDWATER HYDRAULICS
8 J. BALEK, HYDROLOGY AND WATER RESOURCES IN TROPICAL AFRICA
9 T.A. McMAHON AND R.G. MEIN, RESERVOIR CAPACITY AND YIELD
10 G. KOVACS, SEEPAGE HYDRAULICS
11 W.H. GRAF AND C.H. MORTIMER (EDITORS), HYDRODYNAMICS OF LAKES: PROCEEDINGS OF A SYMPOSIUM 12-13 OCTOBER 1978, LAUSANNE, SWITZERLAND
12 W. BACK AND D.A. STEPHENSON (EDITORS), CONTEMPORARY HYDROGEOLOGY: THE GEORGE BURKE MAXEY MEMORIAL VOLUME
13 M.A. MARINO AND J.N. LUTHIN, SEEPAGE AND GROUNDWATER
14 D. STEPHENSON, STORMWATER HYDROLOGY AND DRAINAGE
15 D. STEPHENSON, PIPELINE DESIGN FOR WATER ENGINEERS (completely revised edition of Vol. 6 in the series)
16 W. BACK AND R. LETOLLE (EDITORS), SYMPOSIUM ON GEOCHEMISTRY OF GROUNDWATER
17 A.H. EL-SHAARAWI (EDITOR) IN COLLABORATION WITH S.R. ESTERBY, TIME SERIES METHODS IN HYDROSCIENCES
18 J. BALEK, HYDROLOGY AND WATER RESOURCES IN TROPICAL REGIONS
19 D. STEPHENSON, PIPEFLOW ANALYSIS
20 I. ZAVOIANU, MORPHOMETRY OF DRAINAGE BASINS
21 M.M.A. SHAHIN, HYDROLOGY OF THE NILE BASIN
22 H.C. RIGGS, STREAMFLOW CHARACTERISTICS
23 M. NEGULESCU, MUNICIPAL WASTEWATER TREATMENT
24 L.G. EVERETT, GROUNDWATER MONITORING HANDBOOK FOR COAL AND OIL SHALE DEVELOPMENT
25 W. KINZELBACH, GROUNDWATER MODELLING: AN INTRODUCTION WITH SAMPLE PROGRAMS IN BASIC
26 D. STEPHENSON AND M.E. MEADOWS, KINEMATIC HYDROLOGY AND MODELLING

STATISTICAL ASPECTS OF WATER QUALITY MONITORING Proceedings of the Workshop held at the Canada Centre for Inland Waters, October 7-10, 1985

Edited by

A.H. EL-SHAARAWI National Water Research Institute, Burlington, Ontario, Canada

and

R.E. KWIATKOWSKI Water Quality Branch, Inland Waters Directorate, Ottawa, Ontario, Canada

ELSEVIER Amsterdam - Oxford - New York - Tokyo 1986

ELSEVIER SCIENCE PUBLISHERS B.V. Sara Burgerhartstraat 25 P.O. Box 211, 1000 AE Amsterdam, The Netherlands Distributors for the United States and Canada: ELSEVIER SCIENCE PUBLISHING COMPANY INC. 52, Vanderbilt Avenue New York, NY 10017, U.S.A.

Library of Congress Cataloging-in-Publication Data

Statistical aspects of water quality monitoring. (Developments in water science ; 27) Bibliography: p. Includes index. 1. Water quality--Measurement--Congresses. 2. Water quality--Statistical methods--Congresses. I. El-Shaarawi, A. H. II. Kwiatkowski, R. E., 1949- . III. Series. TD3C.7. S73 1990 628.1'61 ub-24035

ISBN 0-444-42698-1 (U.S.)

ISBN 0-444-42698-1 (Vol. 27) ISBN 0-444-41669-2 (Series)

© Elsevier Science Publishers B.V., 1986

All rights reserved. No part of this publication may be reproduced, stored in a retrieval system or transmitted in any form or by any means, electronic, mechanical, photocopying, recording or otherwise, without the prior written permission of the publisher, Elsevier Science Publishers B.V./Science & Technology Division, P.O. Box 330, 1000 AH Amsterdam, The Netherlands.

Special regulations for readers in the USA - This publication has been registered with the Copyright Clearance Center Inc. (CCC), Salem, Massachusetts. Information can be obtained from the CCC about conditions under which photocopies of parts of this publication may be made in the USA. All other copyright questions, including photocopying outside of the USA, should be referred to the copyright owners, Elsevier Science Publishers B.V., unless otherwise specified.

Printed in The Netherlands

PREFACE

Statistics provides a collection of techniques for extracting maximum information from a given data set and allows the construction of strategies for future data collection. These techniques have proven invaluable in such fields as agriculture, medical science and business. However, in the area of environmental sciences, statistical applications are still in their infancy, with few attempts to systematically develop techniques dealing with environmental issues. The "Workshop on the Statistical Aspects of Water Quality Monitoring", held October 7-10, 1985 at the National Water Research Institute in Burlington, Ontario, Canada, was an attempt to bring together international scientists, statisticians, and users of statistical methodology in limnology, water quality regulation and control, monitoring network design, and modelling of aquatic environments. The publication of this book is one step towards identifying appropriate statistical techniques and diagnosing problems in Water Quality Monitoring which require new statistical methodologies. The papers presented in this volume were peer reviewed and represent international expertise, consolidating detailed information on both conventional and new methods. The prime objective of the Workshop was to generate interaction between the statistical community and scientists working in the area of Water Quality Monitoring. To this end, topics covered in this Workshop fall into two categories: (1) Methods Development, and (2) the Imaginative Application of Existing Methodologies.
Subjects covered include: Time Series, Estimation of Loading, Clustering, Model Development, Censored Data Analysis, Quality Control and Data Acquisition.

Deepest appreciation is extended to the National Water Research Institute, Department of Environment (Mr. D.L. Egar, Director), and the Inland Waters Directorate, Water Quality Branch Headquarters (Mr. Wm. Traversey, Director) for their co-sponsorship of the Workshop on the Statistical Aspects of Water Quality Monitoring. We would also like to express our gratitude to members of the Organizing Committee (Mr. T.J. Dafoe, Dr. A. Demayo, Dr. S. Esterby, and Dr. G. Haffner) for their untiring help in running this event. Furthermore, we are indebted to each of the Session Moderators for their diligent efforts. In addition, sincere thanks are conveyed to Ms. J. Major and Mr. J.D. Smith, who assisted greatly with preparations for the Workshop, and to Mrs. B. Arafat, Ms. G. Jones and Ms. S. Austin for the secretarial, word processing, and text editing services provided. Last, but not least, we would like to thank the individual authors for their submissions. Countries represented by these papers include: Argentina, Australia, Canada, Denmark, Egypt, England, France, Holland, Jordan, Norway, Saudi Arabia, Singapore, South Africa, Sweden, and the United States of America.

A.H. El-Shaarawi
R.E. Kwiatkowski

CONTENTS

1. Spatial Heterogeneity of Water Quality Parameters
   S.R. Esterby

2. Uncertainty in Water Quality Data
   R.H. Montgomery and T.G. Sanders

3. The Use of Multivariate Methods in the Interpretation of Water Quality Monitoring Data of a Large Northern Reservoir
   R. Schetagne

4. Modeling River Acidity - A Transfer Function Approach
   E. Damsleth

5. Sulphate, Water Colour and Dissolved Organic Carbon Relationships in Organic Waters of Atlantic Canada
   G.D. Howell and T.L. Pollock

6. Sulfate in Coloured Waters. I. Evaluation of Chromatographic and Colorimetric Data Compatibility
   V. Cheam, A.S.Y. Chau and S. Todd

7. The Importance of Design Quality Control to a National Monitoring Program
   R.E. Kwiatkowski

8. Determination of Water Quality Zonation in Lake Ontario Using Multivariate Techniques
   M.A. Neilson and R.J.J. Stevens

9. Spatial Variability in the Water Quality of Québec Rivers
   M. Simoneau

10. Estimation of Distributional Parameters for Censored Water Quality Data
    D.R. Helsel

11. Natural Variability of Water Quality in a Temperate Estuary
    L.E. Gadbois and B.J. Neilson

12. Extension of Water Quality Data Bases in Planning for Water Treatment
    G.T. Orlob and N. Marjanović

13. Statistical Inferences from Coliform Monitoring of Potable Water
    W.O. Pipes

14. Modelling of Bacterial Populations and Water Quality Monitoring in Distribution Systems
    A. Maul, A.H. El-Shaarawi and J.C. Block

15. A Goodness-of-Fit Test for the Negative Binomial Distribution Applicable to Large Sets of Small Samples
    B. Heller

16. Reporting Bacteriological Counts from Water Samples: How Good is the Information from an Individual Sample?
    H.E. Tillett

17. Some Applications of Linear Models for Analysis of Contaminants in Aquatic Biota
    R.H. Green

18. A Comparative Study of the Sampling Properties of Four Similarity Indices
    H.W. Khoo and T.M. Lim

19. Randomized Similarity Analysis of Multispecies Laboratory and Field Studies
    E.P. Smith

20. Association of Chlorophyll a with Physical and Chemical Factors in Lake Ontario, 1967-1981
    A.H. El-Shaarawi, J.R. Elliott, R.E. Kwiatkowski and D.R. Peirson

21. Gamma Markov Processes
    R.M. Phatarfod

22. Dynamic Covariate Adjustment of Water Quality Parameters for Streamflow: Transfer Function Model Selection
    L.D. Haugh, Y. Noda and J. McClallen

23. Residuals from Regression with Dependent Errors
    R.J. Kulperger

24. Alternatives for Identifying Statistically Significant Differences
    E.A. McBean

25. Global Variance and Root Mean Square Error Associated With Linear Interpolation of a Markovian Time-Series
    D.A. Cluis

26. Empirical Power Comparisons of Some Tests for Trend
    K.W. Hipel, A.I. McLeod and P.K. Fosu

27. Statistical Assessment of Limnological Data Set: Intervention Analysis
    R. Clifford, J.W. Wilkinson and N.L. Clesceri

28. The Change Point Problem: A Review of Applications
    V.K. Jandhyala and I.B. MacNeill

29. Spectral Analysis of Long-Term Water Quality Records
    P.H. Whitfield

30. Bayes Estimation of Parameters of First Order Autoregressive Process
    M.S. Abu-Salih and A.A. Abd-Alla

31. A Systems Approach to Computerizing Data Acquisition
    T.R. Clune

32. High Frequency Water Quality Monitoring of a Coastal Stream
    N.E. Dalley

33. The Design of a Cost Effective Microcomputer-Based Data Acquisition System
    K. Okamura and K. Aghai-Tabriz

34. On the Estimation of Monthly Mean Phosphorus Loadings
    M.E. Thompson and K. Bischoping

35. Estimation of Loading by Numerical Integration
    A.H. El-Shaarawi, K.W. Kuntz and A. Sylvestre

36. Intervention Analysis of Seasonal and Nonseasonal Data to Estimate Treatment Plant Phosphorus Loading Shifts
    K.A. Booman, P.M. Berthouex and L. Pallesen

37. Sediment Responses During Storm Events in Small Forested Watersheds
    W.A. Rieger and L.J. Olive

INDEX


SPATIAL HETEROGENEITY OF WATER QUALITY PARAMETERS

S.R. ESTERBY

National Water Research Institute, Canada Centre for Inland Waters, Burlington, Ontario, Canada

1.

INTRODUCTION

In water quality monitoring, it is difficult to think of a situation in which a decision does not have to be made about where to collect water quality samples. Thus, potentially, all measurements have at least a spatial component. In general, knowledge of how water quality parameters vary over the region of interest is necessary to the understanding of a system. It may be of primary interest, as, for example, in the study of the transport of a pollutant, or of secondary interest, in that it may be necessary to remove the spatial component in order to detect changes over time.

Depending upon our state of knowledge, the analysis of water quality data for the spatial component can be placed under one of the following objectives: 1) characterization of heterogeneity, or 2) testing for and estimation of a well-defined spatial component. In this paper, only the first objective will be considered.

The nature of the data and the reasons for the analysis will determine the methods used for characterizing spatial heterogeneity. Three broad classes of procedures are available: 1) grouping methods, 2) spatial autocorrelation methods and 3) methods which involve fitting a function for the relationship between a water quality parameter and location, with or without the assumption of independence.

In the present paper, a general overview of methods suitable for water quality studies has been attempted; the topic is clearly too extensive to review in detail here. Although the discussion is limited to the analysis of spatial variation, no suggestion that spatial and temporal variability are entirely separable is intended. Haugh (1984) suggests directions that such space-time analyses might take for regularly spaced sample points, based on extensions of the Box and Jenkins approach to time series analysis. It will be useful to consider an example in which spatial heterogeneity is important before discussing methodology.

Such an example is the issue (Barica, 1982) of anoxic conditions in Lake Erie (for a brief review see Kwiatkowski, 1984). An objective of the 1978 Great Lakes Water Quality Agreement between Canada and the United States was the restoration of year-round aerobic conditions in the bottom waters of the Central Basin of Lake Erie. The historic records of hypolimnetic dissolved oxygen have been examined for the existence of a time trend by several authors (Dobson and Gilbertson, 1971; Charlton, 1979; Rosa and Burns, 1981). Data selection was considered essential by all three sets of authors (Anderson et al., 1984). Dobson and Gilbertson (1971) used dissolved oxygen concentrations from samples for which the temperature was within 3°C of the minimum. Charlton (1979) used only near-bottom values at stations that had a depth over 15 m, were stratified and showed no evidence of incursion of Eastern Basin water. Rosa and Burns (1981) attempted to establish a representative, homogeneous area by calculating a depletion rate distribution map. Some of these criteria are clearly spatial considerations; others are reasons why spatial variation would exist.

Spatial heterogeneity raises an additional point: it can sometimes be associated with a supplementary measurement, such as temperature. Inclusion of this additional variable in the model may make simplifying assumptions, such as independence of errors, tenable and permit the use of conventional statistical methods.

2.

DATA SETS AND NOTATION

Notation is given here (Table 1) for a data set collected over both space and time and containing a number of water quality parameters and descriptors of the location or water mass, which will be called supplementary observations in this paper. The primary use for the notation will be to show which dimensions of such a general data set can be, or normally are, handled by a particular method of analysis.

The matrix of measurements on p water quality parameters at I stations at time t is denoted by $Y_t = \{y_{ij}\}_t$, and the corresponding matrix of q supplementary observations by $X_t = \{x_{ik}\}_t$. The number of values for each parameter is n, since samples are assumed to be collected at $n_i$ depths at the ith station and

$$n = \sum_{i=1}^{I} n_i .$$

The subscript t will be dropped to simplify the notation, but it is implicit. For ease of description, it will be assumed that the first two columns of X contain the coordinates of the station, and the third column the depth of the measurement.

TABLE 1
Description of a general water quality data set.

Description of rows    Subscripts of Y matrix              Subscripts of X matrix(a)
                       (water quality parameter)           (supplementary observation)
Station Depth Row      1         2         ...  p          1(b)      2(b)      ...  q
1       1     1        1,1       1,2       ...  1,p        1,1       1,2       ...  1,q
1       2     2        2,1       2,2       ...  2,p        2,1       2,2       ...  2,q
...
1       n1    n1       n1,1      n1,2      ...  n1,p       n1,1      n1,2      ...  n1,q
2       1     n1+1     n1+1,1    n1+1,2    ...  n1+1,p     n1+1,1    n1+1,2    ...  n1+1,q
2       2     n1+2     n1+2,1    n1+2,2    ...  n1+2,p     n1+2,1    n1+2,2    ...  n1+2,q
...
2       n2    n1+n2    n1+n2,1   n1+n2,2   ...  n1+n2,p    n1+n2,1   n1+n2,2   ...  n1+n2,q
...
I       1     n-nI+1   n-nI+1,1  n-nI+1,2  ...  n-nI+1,p   n-nI+1,1  n-nI+1,2  ...  n-nI+1,q
I       2     n-nI+2   n-nI+2,1  n-nI+2,2  ...  n-nI+2,p   n-nI+2,1  n-nI+2,2  ...  n-nI+2,q
...
I       nI    n        n,1       n,2       ...  n,p        n,1       n,2       ...  n,q

Vector notation, length = n
                       y_1       y_2       ...  y_p        x_1       x_2       ...  x_q
Vector notation, length = I(c)
                       y_{1,1}   y_{2,1}   ...  y_{p,1}    x_{1,1}   x_{2,1}   ...  x_{q,1}
Vector notation, length = I, derived variable(d)
                       z_1       z_2       ...  z_p

(a) Subscripts of matrix X take the same form as those of matrix Y.
(b) Assuming the coordinates of the stations appear in columns 1 and 2 of the X matrix, the elements of these columns are equal for all rows (depths) belonging to the same station, since they correspond to the coordinates of that station.
(c) The second subscript indicates that the vector includes only the elements corresponding to the first depth for each station.
(d) An example of a derived variable is some function of the values of the original variable for all depths sampled at a particular station.

3.

THE IMPORTANCE OF CHARACTERIZING SPATIAL HETEROGENEITY

The idea that, if sampling is being conducted over time and space, the variability in space should be accounted for in the design of the sampling program and in the analysis of the data is so fundamental that it needs no elaboration. Two objectives of statistical design and analysis which are achievable in environmental field studies are the increase in precision, by removing as much variation as possible from the error term, and the elimination of bias. An ignored spatial component could enter either as large variability or as bias. Although the examples given below are for other components, the comments are equally applicable to spatial variability.

In the first example, the day-to-day variability was overlooked at the design stage. In studying the spatial heterogeneity of phytoplankton, Platt et al. (1970) drew the conclusion that the between-station variance rose as the area increased, up to a point, and then remained relatively constant. The sampling density was ten stations per mile. By identifying the points in their Figure 1 by date and replotting in the original variance units (Figure 1), one sees that estimates of between-station variance which were obtained on the same day are approximately equal. The impression is obtained that the date of sampling is important in sorting out the reasons for different between-station variances. However, since variances based on markedly different distances between stations were obtained for one day only, the day-to-day variation in the between-station variances cannot be assessed.
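The regrouping described above, in which between-station variance estimates are sorted by sampling day, can be sketched as follows. This is a minimal illustration, assuming each survey's station values are available as hypothetical `(day, area_km2, value)` records and that the ordinary sample variance across stations is the estimator; it does not reproduce Platt et al.'s data or their exact computations, and the function names are illustrative.

```python
from collections import defaultdict
from statistics import variance


def between_station_variances(records):
    """Sample variance across stations for each survey.

    records: iterable of (day, area_km2, value) tuples, one per station.
    A survey is identified by its (day, area_km2) pair (an assumption).
    Returns {(day, area_km2): variance} for surveys with at least 2 stations.
    """
    surveys = defaultdict(list)
    for day, area, value in records:
        surveys[(day, area)].append(value)
    return {key: variance(vals) for key, vals in surveys.items() if len(vals) >= 2}


def group_by_day(variances):
    """Regroup the survey variances by day: {day: [(area_km2, variance), ...]}."""
    by_day = defaultdict(list)
    for (day, area), var in sorted(variances.items()):
        by_day[day].append((area, var))
    return dict(by_day)


# Hypothetical toy records: two surveys (areas) on day 12, one on day 13.
records = [
    (12, 1.0, 2.0), (12, 1.0, 4.0), (12, 1.0, 6.0),
    (12, 4.0, 1.0), (12, 4.0, 5.0),
    (13, 4.0, 1.0), (13, 4.0, 3.0),
]
for day, estimates in group_by_day(between_station_variances(records)).items():
    print(day, estimates)
```

Only a day on which several areas were surveyed (day 12 in the toy records) lets the area effect be separated from the day effect; with one area per day the two are confounded, which is the difficulty noted above.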

Fig. 1. Plot of between-station variance against sample area (km²), as reported by Platt et al. (1970). Numbers give the day of sampling.

An example of adjusting for a supplementary observation, by the inclusion of an additional term in a model of a time trend, is given by El-Shaarawi (1984) in further work related to the issue of Lake Erie anoxia. Water level was treated as a covariate in the analysis of oxygen depletion rates for the existence of a time trend. This provides a statistical solution to the problem of correcting for hypolimnion thickness, which had been identified earlier by Charlton (1979) and Rosa and Burns (1981).
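Adjusting a time trend for a supplementary observation amounts to adding the covariate as an extra term in the regression. The sketch below assumes a simple linear model for annual oxygen depletion rates with a linear time term and a linear water-level term; it is a generic illustration of covariate adjustment, not El-Shaarawi's (1984) actual model, and the function names are hypothetical.

```python
import numpy as np


def ols(design, y):
    """Ordinary least squares: coefficient estimates and standard errors."""
    coef, *_ = np.linalg.lstsq(design, y, rcond=None)
    n, k = design.shape
    resid = y - design @ coef
    sigma2 = resid @ resid / (n - k)                 # residual variance estimate
    cov = sigma2 * np.linalg.inv(design.T @ design)  # covariance of the estimates
    return coef, np.sqrt(np.diag(cov))


def trend_without_covariate(year, rate):
    """Fit rate = b0 + b1*(year - mean year) + e."""
    year = np.asarray(year, dtype=float)
    rate = np.asarray(rate, dtype=float)
    design = np.column_stack([np.ones_like(year), year - year.mean()])
    return ols(design, rate)


def trend_with_covariate(year, rate, level):
    """Fit rate = b0 + b1*(year - mean) + b2*(level - mean) + e.

    b1 is the time trend after adjusting for the covariate (e.g. water level).
    """
    year = np.asarray(year, dtype=float)
    rate = np.asarray(rate, dtype=float)
    level = np.asarray(level, dtype=float)
    design = np.column_stack([
        np.ones_like(year),
        year - year.mean(),    # centred time
        level - level.mean(),  # centred covariate
    ])
    return ols(design, rate)
```

Comparing the slope from `trend_without_covariate` with b1 from `trend_with_covariate` indicates how much of an apparent trend could be attributed to changes in the covariate rather than to time.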

4.

GROUPING PROCEDURES

In the present context, the common element of the procedures in this broad class is the division of a set of points in space, usually sampling stations, into two or more groups such that members of the same group are more similar to each other than to members of other groups with respect to one or more water quality parameters. Included are geometrical and clustering methods. The geometrical methods consist of plotting points in a low-dimensional Euclidean space so that points which are similar to one another occur close together. Clustering procedures use a mathematical criterion to partition the set of points into homogeneous groups. The two types of methods are complementary, since the clustering procedures provide objectivity and the geometrical procedures display the natural groupings (Gordon, 1981). To discover the structure in a data set, more than one procedure will often be needed. These procedures are widely used in ecology, and recognition of their importance is to be found in the book by Pielou (1984), which is devoted entirely to this topic. The geometric method, factor analysis, is the most frequently used method in geology (Jöreskog et al., 1976). Books concerned more with the methods than with applications in a particular discipline include Anderberg (1973), Hartigan (1975), Gordon (1981) and Lebart et al. (1984).

With a few exceptions, these procedures divide the data set into groups strictly on the basis of the values of the parameters, and no information about location is used in this categorization. The examination of the spatial distribution of the members of the groups is done after the categorization is complete, usually by plotting the groups on a map or other diagram which represents location in space. Constrained clustering methods, which require members of a cluster to be spatially contiguous, are available for data on which a linear ordering has been imposed (e.g. transect or depth profile data), but other spatial arrangements are much more complicated. Constrained clustering methods have been applied to pollen percentages for sediment cores (Gordon, 1981).

Grouping procedures are generally applied to the values of one or more water quality parameters, or of some combination of water quality parameters and supplementary observations, at one sampling depth, or to derived values corresponding to a specified part of the water column. For example, the matrices $Y_1 = (y_{1,1}\; y_{2,1}\; \cdots\; y_{p,1})$ or $Z = (z_1\; z_2\; \cdots\; z_p)$ might be used (Table 1). A simple example of a derived variable is the mean of the concentration for all depths sampled at a station.
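For the grouping procedures, the n x p matrix Y is reduced to one row per station, either by keeping only the first depth ($Y_1$ in Table 1) or by forming a derived variable such as the depth mean ($Z$). A minimal sketch, assuming each row carries explicit station and depth labels and that the "first depth" is the row with the smallest depth value; the function name is hypothetical.

```python
import numpy as np


def station_summaries(station, depth, Y):
    """Build first-depth and depth-averaged station matrices from Table 1 rows.

    station, depth: length-n sequences giving each row's station and depth.
    Y: n x p array of water quality measurements (one row per sample).
    Returns (stations, Y1, Z): row k of Y1 holds the values at the shallowest
    depth of stations[k] (taken here as the 'first depth'); row k of Z holds
    the mean over all depths sampled at stations[k].
    """
    station = np.asarray(station)
    depth = np.asarray(depth, dtype=float)
    Y = np.asarray(Y, dtype=float)
    stations = list(dict.fromkeys(station.tolist()))  # order of first appearance
    Y1 = np.empty((len(stations), Y.shape[1]))
    Z = np.empty_like(Y1)
    for k, s in enumerate(stations):
        rows = station == s
        Y1[k] = Y[rows][np.argmin(depth[rows])]  # first (shallowest) depth
        Z[k] = Y[rows].mean(axis=0)              # derived variable: depth mean
    return stations, Y1, Z
```

Either Y1 or Z (I rows, p columns) can then be passed to a grouping procedure in place of the full n-row matrix.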

4.1 Examples of grouping based on water quality data

Clustering methods have been developed for the zonation of a lake into regions homogeneous with respect to the level of one or more water quality parameters. A clustering method which defines a homogeneous set of concentrations as one which can be fitted by a Poisson distribution was applied, cruise by cruise, to coliform concentrations at surface sampling stations on Lake Erie (El-Shaarawi et al., 1981; Esterby and El-Shaarawi, 1984). A spatial pattern was found to change with season and to be qualitatively consistent from year to year, whereas apparent biases from year to year made comparison of concentrations impossible. This is an example where a grouping procedure was a natural choice, owing to the discontinuous nature of the spatial regions of similar concentration.

A second procedure, suitable for a larger class of water quality parameters, is based on a linear additive model for the station and cruise components of data collected in one year (El-Shaarawi and Shah, 1978). Allowance is made for non-orthogonality and for the need to transform the data. Provided the station component is significantly different from zero, a criterion based on the change in the residual sum of squares is used to group stations. The procedure has been used mostly for a single water quality parameter (e.g. El-Shaarawi and Kwiatkowski, 1977), but the multivariate extension was also given by El-Shaarawi and Shah.

Existing data on water quality parameters related to acid precipitation have been analyzed for sets of lakes in regions of Eastern Canada (El-Shaarawi et al., 1985; Haemmerli and Bobée, this volume) using grouping methods, to determine subsets of similar lakes which can subsequently be used in the design of future data collections. El-Shaarawi et al. applied both a graphical method, in which the lakes were plotted on the first three principal components, and a k-means non-hierarchical nearest-centroid clustering method to determine groups of lakes. Plots of the clusters in a given space, the percentage of variation explained by the number of clusters, and descriptions of the variables within clusters by means of summary statistics, histograms and Q-Q plots were used to select the number of clusters and to characterize the clusters.
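The k-means nearest-centroid method and the "percentage of variation explained" criterion can be sketched with NumPy as below. This assumes Lloyd's algorithm with several random starts and, optionally, variables standardized to unit variance (a common but not universal choice); it is not the software or settings used by El-Shaarawi et al., and the function names are hypothetical.

```python
import numpy as np


def standardize(X):
    """Scale each column to mean 0 and standard deviation 1."""
    X = np.asarray(X, dtype=float)
    return (X - X.mean(axis=0)) / X.std(axis=0, ddof=1)


def kmeans(X, k, n_init=10, max_iter=100, seed=0):
    """Lloyd's k-means (non-hierarchical, nearest-centroid).

    Returns (labels, centroids, within_ss) for the best of n_init random starts.
    """
    X = np.asarray(X, dtype=float)
    rng = np.random.default_rng(seed)
    best = None
    for _ in range(n_init):
        centroids = X[rng.choice(len(X), size=k, replace=False)]
        for _ in range(max_iter):
            # Assign each point to its nearest centroid (squared Euclidean distance).
            d2 = ((X[:, None, :] - centroids[None, :, :]) ** 2).sum(axis=2)
            labels = d2.argmin(axis=1)
            # Move each centroid to the mean of its members (keep it if empty).
            new = np.array([X[labels == j].mean(axis=0) if np.any(labels == j)
                            else centroids[j] for j in range(k)])
            if np.allclose(new, centroids):
                break
            centroids = new
        d2 = ((X[:, None, :] - centroids[None, :, :]) ** 2).sum(axis=2)
        labels = d2.argmin(axis=1)
        within = d2[np.arange(len(X)), labels].sum()
        if best is None or within < best[2]:
            best = (labels, centroids, within)
    return best


def percent_variation_explained(X, labels):
    """100 * (1 - within-cluster SS / total SS about the grand mean)."""
    X = np.asarray(X, dtype=float)
    total = ((X - X.mean(axis=0)) ** 2).sum()
    within = sum(((X[labels == j] - X[labels == j].mean(axis=0)) ** 2).sum()
                 for j in np.unique(labels))
    return 100.0 * (1.0 - within / total)
```

Running `kmeans` for k = 2, 3, ... and plotting `percent_variation_explained` against k is one way to look for diminishing returns when choosing the number of clusters, alongside the graphical and descriptive checks described above.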

Two examples of the use of clustering methods which are considerably different from the above examples are the following. The first example is another analysis of dissolved oxygen concentrations in Lake Erie, the issue which was discussed in the introduction. Anderson et al. (1984) used a clustering method on three-dimensional spatial data, consisting of the temperature and the dissolved oxygen concentration of surface and bottom samples at each station, to divide these points into groups which were assumed to be either hypolimnetic or not hypolimnetic. In the second example, El-Shaarawi et al. (1985) grouped stations on the Niagara River by applying a complete linkage clustering method separately to the standardized Euclidean distances between stations, calculated from the ranks of chlorinated organic substances in the water and in suspended sediments. Ranks were used as a means of dealing with values below the detection limit and, for each phase, the stations were ranked separately for each substance.
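The rank-based grouping can be sketched as follows. This is a minimal sketch under stated assumptions: values below the detection limit are coded as `None` and treated as tied below every quantified value; "standardized" is taken to mean dividing each substance's rank vector by its standard deviation; and the clustering is a plain complete-linkage agglomeration. It is not the exact procedure of El-Shaarawi et al. (1985), and the function names are hypothetical. As in the text, it would be run separately for the water phase and the suspended sediment phase.

```python
import math


def average_ranks(column):
    """Rank values 1..m, averaging ties; None (below detection limit) ranks lowest."""
    keyed = sorted((-math.inf if v is None else v, i) for i, v in enumerate(column))
    ranks = [0.0] * len(column)
    pos = 0
    while pos < len(keyed):
        end = pos
        while end + 1 < len(keyed) and keyed[end + 1][0] == keyed[pos][0]:
            end += 1
        for j in range(pos, end + 1):  # every member of a tied block gets the mean rank
            ranks[keyed[j][1]] = (pos + end) / 2 + 1
        pos = end + 1
    return ranks


def standardized_distances(table):
    """Euclidean distances between stations on rank scores scaled by their SD.

    table[s][c] is substance c at station s (None = below detection limit).
    """
    m, q = len(table), len(table[0])
    scaled = []
    for c in range(q):
        col = average_ranks([table[s][c] for s in range(m)])
        mean = sum(col) / m
        sd = math.sqrt(sum((r - mean) ** 2 for r in col) / (m - 1)) or 1.0
        scaled.append([r / sd for r in col])
    return [[math.sqrt(sum((scaled[c][a] - scaled[c][b]) ** 2 for c in range(q)))
             for b in range(m)] for a in range(m)]


def complete_linkage(dist, k):
    """Merge clusters until k remain; cluster distance = largest pairwise distance."""
    clusters = [[i] for i in range(len(dist))]
    while len(clusters) > k:
        best = None
        for a in range(len(clusters)):
            for b in range(a + 1, len(clusters)):
                d = max(dist[i][j] for i in clusters[a] for j in clusters[b])
                if best is None or d < best[0]:
                    best = (d, a, b)
        _, a, b = best
        merged = clusters[a] + clusters[b]
        clusters[a] = merged
        del clusters[b]  # b > a, so index a is unaffected
    return [sorted(c) for c in clusters]
```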

5.

SPATIAL AUTOCORRELATION METHODS

If the value of a variable depends upon the values of the same variable at neighbouring locations, then spatial autocorrelation exists. The problem of determining whether spatial autocorrelation exists is more difficult for spatial data than for a time series, since dependence may extend in all directions for spatial data but only into the past for a time series. The monograph by Cliff and Ord (1973) provides a thorough treatment of the spatial autocorrelation measures which were introduced earlier by Moran and Geary. Although only examples from geography were used by Cliff and Ord, the methods are applicable to any fixed set of points in space.

Let $y' = [y_1, y_2, \ldots, y_m]$ be the vector of values of a variable at m points in space; for example, y could be $y_{1,1}$ or $z_1$, corresponding to the first water quality parameter of matrix Y (Table 1), in which case m = I. Then the general forms of the Moran and Geary spatial autocorrelation coefficients are

$$I = \frac{m}{W}\,\frac{\sum_{i \ne j} w_{ij}\,(y_i - \bar{y})(y_j - \bar{y})}{\sum_{i=1}^{m} (y_i - \bar{y})^2}$$

and

$$c = \frac{m-1}{2W}\,\frac{\sum_{i \ne j} w_{ij}\,(y_i - y_j)^2}{\sum_{i=1}^{m} (y_i - \bar{y})^2},$$

respectively, where $\bar{y} = \frac{1}{m}\sum_{i=1}^{m} y_i$ and $W = \sum_{i \ne j} w_{ij}$.
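The two coefficients can be computed directly from the formulas above. The sketch below assumes a dense NumPy weight matrix and, for the test of no spatial autocorrelation, uses a Monte Carlo randomization (permutation) approach, which is one option among several; Cliff and Ord (1973) also give moments under normality and randomization assumptions, which are not implemented here. The function names and the transect weighting scheme are illustrative assumptions.

```python
import numpy as np


def moran_geary(y, w):
    """Moran's I and Geary's c for values y and spatial weight matrix w."""
    y = np.asarray(y, dtype=float)
    w = np.array(w, dtype=float)     # copy, so the caller's matrix is untouched
    np.fill_diagonal(w, 0.0)         # only pairs with i != j contribute
    m, W = len(y), w.sum()
    dev = y - y.mean()
    ss = dev @ dev                   # sum of squared deviations
    moran = (m / W) * (dev @ w @ dev) / ss
    geary = ((m - 1) / (2 * W)) * (w * (y[:, None] - y[None, :]) ** 2).sum() / ss
    return moran, geary


def transect_weights(m):
    """Binary weights linking each point to its immediate neighbours on a transect."""
    w = np.zeros((m, m))
    idx = np.arange(m - 1)
    w[idx, idx + 1] = 1.0
    w[idx + 1, idx] = 1.0
    return w


def moran_permutation_test(y, w, n_perm=999, seed=0):
    """Observed I and a one-sided randomization p-value for positive autocorrelation."""
    rng = np.random.default_rng(seed)
    y = np.asarray(y, dtype=float)
    observed = moran_geary(y, w)[0]
    count = sum(moran_geary(rng.permutation(y), w)[0] >= observed - 1e-12
                for _ in range(n_perm))
    return observed, (count + 1) / (n_perm + 1)
```

Under no spatial autocorrelation the expected value of I is -1/(m-1) and that of c is 1; values of I above its expectation, or of c below 1, point towards positive autocorrelation.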

The spatial information enters only through the form of the matrix of weights $\{w_{ij}\}$. The test for spatial autocorrelation is a test of the hypothesis of no spatial autocorrelation, that is, of a random distribution in space, against an alternative as specified by the matrix $\{w_{ij}\}$. Jumars et al. (1977) first applied this method in the field of ecology. These authors recognized that the method uses the spatial information in the data, which is ignored in tests such as Fisher's index of dispersion, but is less restrictive than spectral analysis, which requires intensive and regularly spaced samples.
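Fisher's index of dispersion compares the variance of a set of counts (for example, coliform counts at a set of stations) with their mean; under a homogeneous Poisson model, D is approximately chi-square with n - 1 degrees of freedom. The sketch below shows that D is unchanged when the stations are reordered, which is the sense in which it ignores spatial information. The Wilson-Hilferty approximation is used for the tail probability only to avoid a statistics-library dependency; the function names are illustrative.

```python
import math


def index_of_dispersion(counts):
    """Fisher's index of dispersion D = sum((x - mean)^2) / mean, on n - 1 df.

    Under a homogeneous Poisson model D is approximately chi-square(n - 1).
    Requires a positive mean count.
    """
    n = len(counts)
    mean = sum(counts) / n
    d = sum((x - mean) ** 2 for x in counts) / mean
    return d, n - 1


def chi2_upper_tail(d, df):
    """Approximate P(chi-square_df >= d) with the Wilson-Hilferty transform."""
    z = ((d / df) ** (1 / 3) - (1 - 2 / (9 * df))) / math.sqrt(2 / (9 * df))
    return 0.5 * math.erfc(z / math.sqrt(2))


counts = [2, 4, 6]                      # hypothetical counts at three stations
d, df = index_of_dispersion(counts)
print(d, df, chi2_upper_tail(d, df))
# Reordering the stations leaves D unchanged: location plays no role.
print(index_of_dispersion([6, 2, 4]) == (d, df))
```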

The

pairs

connected,

are

matrix

{wij] in

requires

some

the

sense, and

definition of which either

a measure

of

adjacency or of distance between pairs of connected localities (Sokal,

1979).

When

no

a

priori

distance

is

available, a

spatial correlogram, obtained using the unweighted autocorrelation

coefficient,

can

be

used

to

suggest

the

form.

A

more

recent application and a list of other ecological applications are given by Mackas (1984). One

of

the

applications

of

the

above

method

is

in

the

estimation of patch diameters.

Spectral analysis has been used in ecology for this purpose. Two examples are Platt et al. (1970) and Lekan and Wilson (1978). Both studies used densely and regularly sampled transects with gradients of temperature and salinity. Their objectives included inferences about the patch size, or length scale, of phytoplankton. Platt et al. acknowledged that this method would underestimate patch size, since the transect may not pass through the widest part of the patches.

As far as the author can determine, methods based on the Moran and Geary spatial autocorrelation coefficients have not been applied to water quality parameters other than chlorophyll (Mackas, 1984). Platt et al. (1970) and Lekan and Wilson (1978) also used chlorophyll in their spectral analyses, as an indicator of phytoplankton biomass. Each new application requires the form of $\{w_{ij}\}$ to be considered. For water quality applications, the literature on ecological applications is more relevant than that on geographical applications.

6. THE PARAMETER EXPRESSED AS A FUNCTION OF POSITION

The methods of the previous two sections provide indirect inferences about the response surface. In this section, the relationship between a variable and its position is based on an explicit model. There is, however, considerable overlap between the methods, so spatial autocorrelation is also considered here.

As in the previous section, consider the vector $y' = [y_1, y_2, \ldots, y_m]$ of values of a water quality parameter at $m$ points (Table 1). Let $x_i' = (x_{i1}, x_{i2})$ or $x_i' = (x_{i1}, x_{i2}, x_{i3})$ give the location of the $i$th point in two- or three-dimensional space. The value of the water quality variable at location $x_i$ can then be expressed as the sum of a deterministic term, which is a function of position, and a random term:

$$y_i = f(x_i; \beta) + \varepsilon_i, \qquad i = 1, 2, \ldots, m, \qquad (3)$$

Here $\beta$ is a vector of parameters and $\varepsilon_i$ is the random term at location $i$. Procedures derived from this model are univariate. When several water quality parameters are of interest, they can be applied to each parameter in turn. In vector form, the model is

$$y = f(X; \beta) + \varepsilon, \qquad (4)$$

where $X$ is the matrix of $m$ rows given by the row vectors $x_i'$.

Suppose the random term at location $i$ does not depend on the values at nearby locations, so the $\varepsilon_i$'s are uncorrelated. If, in addition, $f$ is linear in the parameters, ordinary least squares can be used to fit model (4). In the more general form, where the $\varepsilon_i$'s are correlated, some assumption of stationarity is required. If the dispersion matrix of $\varepsilon$ is assumed known, generalized least squares can be used (Rao, 1965, p. 180).

Hand-drawn contour maps are often used to display spatial variability. To avoid the subjectivity of this procedure, geologists and geomathematicians, among others, have adopted statistical methods of constructing contours. Two well-known procedures are trend surface analysis and kriging.

Trend surface analysis, as defined by Watson (1971), consists of three steps (Davis, 1973):

1) fitting a polynomial function of the spatial coordinates by ordinary least squares;
2) using the fitted function to estimate the mean value of the variable at the points of a grid;
3) constructing contours on the grid.

Kriging was developed by Matheron. It assumes that values at nearby points are correlated, with the correlation taking the form of a function called the variogram. The French geomathematicians used a new terminology for the procedure. In statistical terms, kriging is a best linear unbiased estimator (BLUE) of the value of the variable at an unobserved point.

Equation (3) covers two cases (Delhomme, 1978):

- universal kriging, in which $f(x_i)$ is a function of position (a drift);
- constant drift, that is, stationarity of the mean under the "intrinsic hypothesis".

Kriging estimates the value $y_p$ at an unobserved point $p$, rather than the mean $E(y_p)$, and so accounts for the random term at that point. This is analogous to the distinction in regression between predicting an individual observation and estimating the mean (Draper and Smith, 1981, pp. 28-31).

Delhomme (1978) states that an advantage of kriging is that it provides an estimate of precision, which most contouring techniques, including least squares, do not. The analogy with regression given above shows that this is not so: ordinary least squares also provides an estimate of the precision. The method of trend surface analysis, as defined above, simply does not use this capability. El-Shaarawi and Esterby (1981) have shown how to use it to:

- construct contours of constant value, for either the mean or an individual observation; and
- attach confidence bands to these contours.
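The three steps of trend surface analysis can be sketched together with the regression standard errors that the method does not ordinarily use. This is a minimal Python/NumPy illustration under several assumptions: a quadratic surface, uncorrelated errors, and hypothetical function names and data. At a point $x_0$ it returns the usual OLS standard errors for the fitted mean, $s\sqrt{x_0'(X'X)^{-1}x_0}$, and for an individual new observation, $s\sqrt{1 + x_0'(X'X)^{-1}x_0}$. It does not reproduce the contour confidence-band construction of El-Shaarawi and Esterby (1981).

```python
import numpy as np


def design_matrix(x1, x2, degree=2):
    """Columns 1, x1, x2, x1^2, x1*x2, x2^2, ... up to the given degree."""
    x1 = np.asarray(x1, dtype=float)
    x2 = np.asarray(x2, dtype=float)
    cols = [x1 ** (p - q) * x2 ** q for p in range(degree + 1) for q in range(p + 1)]
    return np.column_stack(cols)


def fit_trend_surface(x1, x2, y, degree=2):
    """Step 1: ordinary least squares fit of a polynomial trend surface."""
    X = design_matrix(x1, x2, degree)
    y = np.asarray(y, dtype=float)
    beta, *_ = np.linalg.lstsq(X, y, rcond=None)
    resid = y - X @ beta
    n, p = X.shape
    s2 = resid @ resid / (n - p)                    # residual mean square
    return beta, s2, np.linalg.inv(X.T @ X), degree


def predict(fit, x1, x2):
    """Step 2: fitted mean at new (e.g. grid) points, with standard errors
    for the mean and for an individual new observation at each point."""
    beta, s2, xtx_inv, degree = fit
    X0 = design_matrix(np.atleast_1d(x1), np.atleast_1d(x2), degree)
    h = np.einsum("ij,jk,ik->i", X0, xtx_inv, X0)   # x0' (X'X)^-1 x0 per row
    return X0 @ beta, np.sqrt(s2 * h), np.sqrt(s2 * (1.0 + h))


# Hypothetical data: 5 x 5 grid of stations, quadratic gradient plus noise.
rng = np.random.default_rng(1)
g1, g2 = np.meshgrid(np.arange(5.0), np.arange(5.0))
x1, x2 = g1.ravel(), g2.ravel()
y = 10 + 0.8 * x1 - 0.5 * x2 + 0.1 * x1 ** 2 + rng.normal(0, 0.2, x1.size)
fit = fit_trend_surface(x1, x2, y)
mean, se_mean, se_obs = predict(fit, [2.5], [1.5])
```

Step 3, drawing contours, would evaluate `predict` over a grid and pass the fitted means to any contouring routine. The band for an individual observation is always wider than the band for the mean.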

The method was applied to surface temperature data from Lake Erie.

Whatever the assumption about the form of the autocorrelation, the difficulty lies in estimating the dispersion matrix. In kriging, this requires estimating the variogram (Delhomme, 1978), which is usually done by ad hoc procedures.
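The classical method-of-moments (Matheron) estimator shows where judgement enters variogram estimation:

$$\gamma(h) = \frac{1}{2N(h)} \sum_{\text{pairs in class } h} (y_i - y_j)^2$$

It is sketched below in Python. The distance classes, function name and data are illustrative assumptions. Fitting a model variogram to the resulting points, a further subjective step, is not shown. With the linear gradient used here, $\gamma(h)$ grows as $h^2/2$: a drift that is not removed first is absorbed into the variogram estimate.

```python
import numpy as np


def empirical_semivariogram(coords, y, bin_edges):
    """Classical (method-of-moments) semivariogram per distance class (lo, hi]:
    gamma = sum over pairs in the class of (y_i - y_j)^2 / (2 * N).
    Returns a list of (lo, hi, N, gamma)."""
    coords = np.asarray(coords, dtype=float)
    y = np.asarray(y, dtype=float)
    i, j = np.triu_indices(y.size, k=1)             # each unordered pair once
    dist = np.linalg.norm(coords[i] - coords[j], axis=1)
    sq_diff = (y[i] - y[j]) ** 2
    rows = []
    for lo, hi in zip(bin_edges[:-1], bin_edges[1:]):
        in_class = (dist > lo) & (dist <= hi)
        n_pairs = int(in_class.sum())
        gamma = sq_diff[in_class].sum() / (2 * n_pairs) if n_pairs else float("nan")
        rows.append((lo, hi, n_pairs, gamma))
    return rows


# Hypothetical transect with a linear trend: gamma = 0.5, 2.0, 4.5 (= h^2 / 2).
coords = [(x, 0.0) for x in range(6)]
for row in empirical_semivariogram(coords, [0, 1, 2, 3, 4, 5], [0, 1.5, 2.5, 3.5]):
    print(row)
```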

Iterative procedures for regression with correlated errors are given by Cliff and Ord (1973) and Cook and Pocock (1983). Maximum likelihood methods of estimation are considered by Cook and Pocock (1983) and Mardia and Marshall (1984).
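Suppose the dispersion matrix is treated as known up to a scale factor, $\mathrm{Var}(\varepsilon) = \sigma^2 V$. Generalized least squares then gives

$$\hat\beta = (X'V^{-1}X)^{-1}X'V^{-1}y.$$

The minimal sketch below computes it by whitening with the Cholesky factor of $V$. The exponential form $V_{ij} = \exp(-d_{ij}/r)$, the range value, the helper names and the simulated transect are all assumptions for illustration. In practice $V$ must itself be estimated, which is why the iterative and maximum likelihood methods cited above are needed.

```python
import numpy as np


def exponential_dispersion(coords, range_param):
    """Assumed dispersion matrix: V_ij = exp(-d_ij / range_param)."""
    coords = np.asarray(coords, dtype=float)
    d = np.linalg.norm(coords[:, None, :] - coords[None, :, :], axis=-1)
    return np.exp(-d / range_param)


def gls(X, y, V):
    """GLS fit of y = X beta + e with Var(e) = sigma^2 V, V treated as known.

    Returns beta_hat and its estimated covariance sigma^2 (X' V^-1 X)^-1.
    """
    X = np.asarray(X, dtype=float)
    y = np.asarray(y, dtype=float)
    L = np.linalg.cholesky(V)                  # V = L L'
    Xw = np.linalg.solve(L, X)                 # whitened design, L^-1 X
    yw = np.linalg.solve(L, y)                 # whitened response, L^-1 y
    beta, *_ = np.linalg.lstsq(Xw, yw, rcond=None)
    resid = yw - Xw @ beta
    n, p = X.shape
    sigma2 = resid @ resid / (n - p)           # estimate of sigma^2
    cov_beta = sigma2 * np.linalg.inv(Xw.T @ Xw)
    return beta, cov_beta


# Hypothetical transect: 15 stations 1 unit apart, linear trend, correlated errors.
x = np.arange(15.0)
coords = np.column_stack([x, np.zeros_like(x)])
V = exponential_dispersion(coords, range_param=2.0)
X = np.column_stack([np.ones_like(x), x])
rng = np.random.default_rng(2)
errors = np.linalg.cholesky(V) @ rng.normal(0.0, 0.5, x.size)   # Var = 0.25 V
y = 3.0 + 0.4 * x + errors
beta, cov_beta = gls(X, y, V)
```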

Methods of testing for spatial autocorrelation in regression residuals are given by Cliff and Ord (1973). They have been applied in paleoecology (Howe and Webb, 1984). Cliff and Ord (1973) stress that autocorrelation detected in the residuals may be due to one of the following:

1) an inadequate form for the relationship between the dependent and independent variables, such as using a linear model when curvature is present;
2) omission of one or more regressors;
3) the need for an autocorrelation structure in the model.

In cases 1) and 2), there are ways of removing the autocorrelation from the residuals that are simpler than methods incorporating spatial autocorrelation.

Data collected in space need not exhibit spatial autocorrelation, for various reasons. Autocorrelation may not be detectable because the distances between sampled points are larger than the distance within which dependence occurs. Cook and Pocock (1983) discuss aggregation as a way to remove correlation. Analogously, both wider spacing and the use of seasonal means or medians have been suggested for reducing serial correlation when water quality data are analysed for temporal trends (van Belle and Hughes, 1984).

Using ordinary least squares when errors are correlated has two consequences (Cliff and Ord, 1973):

- the estimators of the regression parameters are inefficient;
- the estimator of the variance is biased downward, which leads to an overestimate of the significance of the regression.
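These consequences can be checked numerically for a given design and error structure. The sketch below (Python/NumPy; function name and data hypothetical) compares two quantities when $\mathrm{Var}(\varepsilon) = V$:

- the actual covariance of the OLS estimates, $(X'X)^{-1}X'VX(X'X)^{-1}$;
- the average value of the usual formula, $E(s^2)(X'X)^{-1}$, with $E(s^2) = \mathrm{tr}(MV)/(n-p)$ and $M = I - X(X'X)^{-1}X'$.

The correlation structure $0.7^{|i-j|}$ along a transect is an assumed example. The size, and even the direction, of the bias depends on $X$ and $V$. Understatement is typical when both the errors and the regressor (here, position) are positively autocorrelated.

```python
import numpy as np


def ols_variance_under_correlation(X, V):
    """Return (actual, nominal) covariance matrices of the OLS estimates when
    Var(e) = V: the true covariance, and the average value of s^2 (X'X)^-1."""
    X = np.asarray(X, dtype=float)
    V = np.asarray(V, dtype=float)
    n, p = X.shape
    xtx_inv = np.linalg.inv(X.T @ X)
    actual = xtx_inv @ X.T @ V @ X @ xtx_inv      # true covariance of beta_hat
    M = np.eye(n) - X @ xtx_inv @ X.T             # residual-maker matrix
    expected_s2 = np.trace(M @ V) / (n - p)       # E(s^2) when Var(e) = V
    return actual, expected_s2 * xtx_inv


# Assumed example: 20 equally spaced stations, straight-line trend in position,
# error correlation 0.7 ** |i - j| between stations i and j.
n = 20
position = np.arange(n, dtype=float)
X = np.column_stack([np.ones(n), position])
lags = np.abs(np.subtract.outer(np.arange(n), np.arange(n)))
V = 0.7 ** lags
actual, nominal = ols_variance_under_correlation(X, V)
print("nominal / actual variance of the slope:", nominal[1, 1] / actual[1, 1])
```

A ratio well below 1 means the usual OLS standard error of the slope is too small, so the regression appears more significant than it is.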

The methods in this section can also express the variable as the sum of spatial components, temporal components and other explanatory variables, such as temperature. One example draws together many of the points discussed in this section: the complicated model that Eynon and Switzer (1983) used to construct contour maps of rainfall pH.

7. DISCUSSION

The many dimensions of water quality data sets make analysis difficult. Data are often collected to meet objectives related to monitoring changes in water quality conditions. Such objectives are necessarily too general to help in reducing the dimensions of the problem.

Cluster analysis and related methods do not use the spatial location, but they can be used to examine the structure of the multivariate data. They are therefore complementary to the univariate methods, which do use the spatial location.

The analyst can expect to use the classes of methods discussed here iteratively, coupled with scientific understanding of the system, to arrive at a characterization of spatial structure. Of the methods discussed, only the grouping procedures serve strictly to discover structure in the data. The other methods, even in the characterization stage, are used for testing hypotheses and for estimation.

REFERENCES

Anderberg, M.R., 1973. Cluster Analysis for Applications. Academic Press, New York, 359 pp.

Anderson, J.E., El-Shaarawi, A.H., Esterby, S.R. and Unny, T.E., 1984. Dissolved oxygen concentrations in Lake Erie (U.S.A.-Canada), 1. Study of spatial and temporal variability using regression and cluster analysis. J. Hydrol., 72: 209-229.

Barica, J., 1982. Lake Erie depletion controversy. J. Great Lakes Res., 8(4): 719-722.

Charlton, M.N., 1979. Hypolimnetic oxygen depletion in central Lake Erie: Has there been any change? Inland Waters Dir., Sci. Publ. Ser. No. 110, 25 pp.

Cliff, A.D. and Ord, J.K., 1973. Spatial Autocorrelation. Methuen, New York, 178 pp.

Cook, D.G. and Pocock, S.J., 1983. Multiple regression in geographical mortality studies, with allowance for spatially correlated errors. Biometrics, 39: 361-371.

Davis, J.C., 1973. Statistics and Data Analysis in Geology. Wiley, New York.

Delhomme, J.P., 1978. Kriging in the hydrosciences. Adv. Water Resources, 1: 251-266.

Dobson, H.H. and Gilbertson, M., 1971. Oxygen depletion in the hypolimnion of the Central Basin of Lake Erie, 1929-1970. Proc. 14th Conf. Great Lakes Res., Int. Assoc. Great Lakes Res., pp. 743-748.

Draper, N.R. and Smith, H., 1981. Applied Regression Analysis. Wiley, New York, 709 pp.

El-Shaarawi, A.H. and Kwiatkowski, R.E., 1977. A model to describe the inherent spatial and temporal variability of parameters in Lake Ontario, 1974. J. Great Lakes Res., 3: 177-183.

El-Shaarawi, A.H. and Shah, K.R., 1978. Statistical procedures for classification of a lake. Inland Waters Directorate, Scientific Series No. 86, National Water Research Institute, Canada Centre for Inland Waters, Burlington, Ontario.

El-Shaarawi, A.H., Esterby, S.R. and Dutka, B.J., 1981. Bacterial density in water determined by Poisson or negative binomial distributions. Appl. Environ. Microbiol., 41: 107-116.

El-Shaarawi, A.H. and Esterby, S.R., 1981. Analyse de régression de la variation spatiale d'une caractéristique limnologique [Regression analysis of the spatial variation of a limnological characteristic]. Eau du Québec, 14: 222-228.

El-Shaarawi, A.H., 1984. Dissolved oxygen concentrations in Lake Erie (U.S.A.-Canada), 2. A statistical model for dissolved oxygen in the Central Basin of Lake Erie. J. Hydrol., 72: 231-243.

El-Shaarawi, A.H., Esterby, S.R., Warry, N.D. and Kuntz, K.W., 1985. Evidence of contaminant loading to Lake Ontario from Niagara River. Can. J. Fish. Aquat. Sci., 42: 1278-1289.

El-Shaarawi, A.H., Esterby, S.R., Clair, T. and Lemieux, R., 1985. Spatial variability of acidification-related water quality parameters for lakes in Eastern Canada. NWRI Contribution C85-, Environment Canada, Burlington, Ont.

Esterby, S.R. and El-Shaarawi, A.H., 1984. Coliform concentrations in Lake Erie - 1966 to 1970. Hydrobiologia, 111: 133-146.

Eynon, B.P. and Switzer, P., 1983. The variability of rainfall acidity. Canad. J. Statist., 11: 11-24.

Gordon, A.D., 1981. Classification. Chapman and Hall, London, 193 pp.

Hartigan, J.A., 1975. Clustering Algorithms. Wiley, New York, 351 pp.

Haugh, L.D., 1984. An introductory overview of some recent approaches to modelling spatial time series. In: O.D. Anderson (Editor), Time Series Analysis: Theory and Practice 5. North Holland, Amsterdam, pp. 287-301.

Howe, S. and Webb, T., III, 1984. Calibrating pollen data in climatic terms: improving the methods. Quaternary Science Reviews, 2: 17-51.

Jöreskog, K.G., Klovan, J.E. and Reyment, R.A., 1976. Geological Factor Analysis. Elsevier, Amsterdam, 178 pp.

Jumars, P.A., Thistle, D. and Jones, M.L., 1977. Detecting two-dimensional spatial structure in biological data. Oecologia, 28: 109-123.

Kwiatkowski, R.E., 1984. Lake Erie review. In: A.H. El-Shaarawi (Editor), Statistical Assessment of the Great Lakes Surveillance Program, 1966-1981, Lake Erie. Sci. Ser. No. 136, Inland Waters Directorate, Environment Canada, Burlington, Ontario, pp. 3-26.

Lebart, L., Morineau, A. and Warwick, K.M., 1984. Multivariate Descriptive Statistical Analysis. Wiley, New York, 231 pp.

Lekan, J.F. and Wilson, R.E., 1978. Spatial variability of phytoplankton biomass in the surface waters of Long Island. Estuarine and Coastal Marine Science, 6: 239-251.

Mackas, D.L., 1984. Spatial autocorrelation of plankton community composition in a continental shelf ecosystem. Limnol. Oceanogr., 29: 451-471.

Mardia, K.V. and Marshall, R.J., 1984. Maximum likelihood estimation of models for residual covariance in spatial regression. Biometrika, 71: 135-146.

Pielou, E.C., 1984. The Interpretation of Ecological Data. Wiley, New York, 263 pp.

Platt, T., Dickie, L.M. and Trites, R.W., 1970. Spatial heterogeneity of phytoplankton in a near-shore environment. J. Fish. Res. Board Can., 27: 1453-1473.

Rao, C.R., 1965. Linear Statistical Inference and its Applications. Wiley, New York, 522 pp.

Rosa, F. and Burns, N.M., 1981. Oxygen depletion rates in the hypolimnion of central and eastern Lake Erie. A new approach indicates change. Presented at Workshop on Central Basin Oxygen Depletion, Dec. 2-3, 1981, Canada Centre for Inland Waters, Burlington, Ontario.

Sokal, R.R., 1979. Ecological parameters inferred from spatial correlograms. In: G.P. Patil and M. Rosenzweig (Editors), Contemporary Quantitative Ecology and Related Ecometrics. International Co-operative Publishing House, Fairland, Maryland, pp. 167-196.

van Belle, G. and Hughes, J.P., 1984. Nonparametric tests for trend in water quality. Water Resour. Res., 20: 127-136.

Watson, G.S., 1971. Trend surface analysis. Jour. Math. Geology, 3: 215-226.

UNCERTAINTY IN WATER QUALITY DATA

ROBERT H. MONTGOMERY and THOMAS G. SANDERS
Environmental Engineering Program, Department of Civil Engineering, Colorado State University, Fort Collins, CO 80523

1.1 ABSTRACT

Water quality data are collected to provide information that assists in understanding and managing water resources. The usefulness of the data collected is inversely related to the amount of uncertainty in them. Data uncertainty may be defined as a state of doubt about how representative observed values are of the true population characteristics.

Data uncertainty may be estimated as a function of both sampling and nonsampling errors. Sampling errors result from the design of the sampling network (location and frequency of sample collection), which samples only a subset of the total population. Nonsampling errors result from the process of measuring the amount of water quality material present. The measurement process may be divided into sample collection and laboratory analysis. Sample collection includes the physical procedure for obtaining, storing and transporting a water sample for later analysis. Laboratory analysis consists of some method of estimating the amount (concentration) of a given material in the water sample.

This paper presents:

- a general discussion of the sources of water quality data uncertainty;
- a method to estimate data uncertainty in water quality variables;
- the implications of uncertainty in water quality data.

1.2 INTRODUCTION

Finding solutions to water quality management problems is an extremely difficult task. Systems analysis techniques provide a method to assist decision makers in solving problems. The systems approach to problem solving may be conceptualized as a decision maker with a desired objective, at least two possible courses of action, and a state of doubt as to the best course of action (Ackoff, 1962). The decision maker first defines the objective, the courses of action, and the criteria for selecting the "best" solution. He or she must then choose a course of action based on the available information.

This information may be qualitative, quantitative, or both. Qualitative information is based on experience and judgement. Quantitative information is obtained through statistical analysis and mathematical modeling that use observed values (data) on selected variables. Unfortunately, this information is often incomplete or in error. There is therefore uncertainty (a state of doubt) as to whether a given course of action will result in a specific outcome.

The types of uncertainty in the decision making process are data, model, parameter and supplemental uncertainty (Montgomery et al., 1983):

- Data uncertainty consists of error (variability and bias) in the data that results from imperfections in the sample design and the measurement techniques.
- Model uncertainty results from the fact that models are only representations of real-world processes. Models may be disturbed by extraneous sources of variation, or may be incomplete representations.
- Parameter uncertainty results from the fact that model parameters are only estimates obtained from observed data.
- Supplemental uncertainty consists of all remaining uncertainty not already included, for example errors in the interpretation of data, the selection of a wrong model, or human coding error.

The objectives of this paper are to:

1) describe the sources of data uncertainty in water quality variables collected by a water quality monitoring network;
2) present a method to estimate the amount of data uncertainty;
3) discuss the implications of water quality data uncertainty.

1.3 WATER QUALITY DATA UNCERTAINTY

Water quality data are observations obtained from a water quality monitoring system. Such a system may be divided into two major categories, data acquisition (operational) and data utilization (informational) (Sanders et al., 1983):

- Data acquisition consists of three activities: 1) network design, 2) sample collection, and 3) laboratory analysis.
- Data utilization also consists of three activities: 1) data handling, 2) data analysis, and 3) information utilization.

Figure 1.1 summarizes the specific functions of each of the six activities. The data acquisition activities create uncertainty in the data. Model and parameter uncertainty result from the data analysis activity. Supplemental uncertainty may arise from any of the water quality monitoring activities.

The amount of data uncertainty due to the network design is a function of sample site location (spatial) and sampling frequency (temporal). Sample locations and frequencies are chosen to make a representative selection that will provide reasonable estimates of the true population characteristics of concern. Data uncertainty arises because only a subset of the total population is sampled, and inferences about the population are then based on the limited number of observed values.

Fig. 1.1. Summary of Water Quality Monitoring Activities (Sanders et al., 1983). [Flow chart.]

- DATA ACQUISITION
  - Network design: 1. station location; 2. sampling frequency; 3. variable selection.
  - Sample collection: 1. sampling technique; 2. field measurements; 3. sample preservation; 4. sample transport.
  - Laboratory analysis: 1. analysis techniques; 2. operational procedures; 3. quality control; 4. data recording.
- DATA UTILIZATION
  - Data handling: 1. data reception; 2. screening and verification; 3. storage; 4. retrieval.
  - Data analysis: 1. statistics; 2. modeling.
  - Information utilization: 1. reporting formats; 2. utilization evaluation.

Sample site location is defined at two levels, macro and micro location:

- Macro location refers to the site location relative to the entire area (frame) of interest (e.g. watershed, aquifer).
- Micro location refers to the specific vertical and lateral point of sample collection at a particular macro location site.

At the macro location level, data uncertainty is a function of three things: the underlying spatial process that governs the water quality variable, the number and location of sample sites, and the population characteristic (statistic) of interest. One problem is that the underlying population is not known; that is what one is trying to estimate. One therefore may not really know whether the collected data are representative of the true population.

Micro location data uncertainty is related to how well the point(s) of sample collection represent the actual conditions, given that spatial concentration gradients may exist. For example, a river may have concentration gradients across its width and with depth. Samples would then have to be taken at various points vertically and laterally to estimate the actual conditions (e.g. the mean) in the entire river cross section.

Sampling frequency is the determination of the number of samples, and their spacing, to be taken per unit time.
Data uncertainty due to sampling frequency arises from collecting samples at incorrect times and from limited sample size. The appropriate times for sample collection depend on the desired objective (statistic) and on the underlying temporal process of the water quality variable. For example, if the objective is to estimate total annual material loads to a lake, the sampling times must account for storm events, which may contribute the majority of the material load to the lake.

Data uncertainty due to the number of samples collected results from the sample being only a subset of the total population. For example, suppose a water quality variable has an underlying temporal process that varies within the week. Collecting one sample per week may then not represent the characteristics of the variable for that week.

A further aspect of sampling frequency that may cause data uncertainty is whether the samples collected are instantaneous grab samples or time composites. Each type provides information representative of a different time interval. Data uncertainty may therefore occur because the time interval associated with the type of sample collected does not represent the actual conditions or the desired objective.

Sample collection is the physical method of obtaining, storing and transporting a water sample for later analysis. Its objective is to collect a portion of material small enough in volume to be transported conveniently and handled in the laboratory, while still accurately representing the water source being sampled. Data uncertainty arises from all three sample collection activities:

- Obtaining the sample: the physical device that collects the water sample should provide a representative and uncontaminated sample. In ground water measurement, for example, the device can aerate the anaerobic water and so contaminate the sample. River sediment samplers illustrate the difficulty of obtaining representative samples, since they cannot sample the entire vertical transect.
- Storing the sample: data uncertainty can be attributed to chemicals added to preserve the sample, filtration prior to analysis, and the type of storage container.
- Transporting the sample: data uncertainty can arise from agitation, temperature, light, and the time until analysis.

Laboratory analysis is the process of estimating the level of material present in a given water sample using wet chemistry and/or instrumentation. Data uncertainty in laboratory analysis results from:

- interferences due to other variables in the sample;
- calibration procedures;
- sloppy experimental technique;
- bad or old standardized reagents;
- defective instruments;
- material levels near the detection limit of the analytical technique.

High sediment concentrations are one example of a variable that may cause interference in certain analytical techniques (e.g. for dissolved metals). Calibration procedures are inherently uncertain because of the scatter of the data and the need to determine a line of "best fit" relating the measurement readings to the variable concentrations.
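To illustrate why calibration contributes to data uncertainty, the sketch below does three things:

- fits a straight calibration line by least squares;
- inverts it to convert the mean of $m$ replicate readings into a concentration;
- reports the classical approximate standard error of that inverse prediction, $\frac{s}{|b|}\sqrt{\frac{1}{m} + \frac{1}{n} + \frac{(\bar{y}_0 - \bar{y})^2}{b^2 S_{xx}}}$.

Linearity, constant residual variance, the function names and the standard and reading values are assumptions for illustration, not procedures taken from the paper.

```python
import numpy as np


def fit_calibration(conc, reading):
    """Least-squares line reading = a + b * conc through the standards."""
    x = np.asarray(conc, dtype=float)
    y = np.asarray(reading, dtype=float)
    b, a = np.polyfit(x, y, 1)                    # slope, intercept
    resid = y - (a + b * x)
    s = np.sqrt(resid @ resid / (x.size - 2))     # residual standard deviation
    return {"a": a, "b": b, "s": s, "n": x.size,
            "ybar": y.mean(), "sxx": ((x - x.mean()) ** 2).sum()}


def concentration_from_readings(cal, readings):
    """Invert the calibration line for the mean of m replicate readings and
    return (estimated concentration, approximate standard error)."""
    y0 = np.asarray(readings, dtype=float)
    m = y0.size
    a, b, s = cal["a"], cal["b"], cal["s"]
    x0 = (y0.mean() - a) / b
    se = (s / abs(b)) * np.sqrt(
        1.0 / m + 1.0 / cal["n"] + (y0.mean() - cal["ybar"]) ** 2 / (b ** 2 * cal["sxx"])
    )
    return x0, se


# Hypothetical standards (mg/L) and instrument readings.
standards = [0.0, 1.0, 2.0, 3.0, 4.0, 5.0]
readings = [0.12, 2.05, 4.13, 5.98, 8.11, 10.02]
cal = fit_calibration(standards, readings)
x_single, se_single = concentration_from_readings(cal, [4.1])
x_triple, se_triple = concentration_from_readings(cal, [4.0, 4.2, 4.1])
```

Replicate readings shrink only the $1/m$ term. The scatter of the calibration standards themselves sets a floor on the uncertainty.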

1.4 DATA UNCERTAINTY ESTIMATION

One method t o e v a l u a t e t h e amount of d a t a u n c e r t a i n t y i s t h e t o t a l s u r v e y d e s i g n (TSD) concept (Horvitz,l978). The TSD

22

concept a t t e m p t s t o minimize t h e t o t a l e r r o r of t h e e s t i m a t e o f i n t e r e s t by c o n t r o l l i n g t h e m a g n i t u d e o f i n d i v i d u a l e r r o r components through a l l o c a t i o n of a v a i l a b l e r e s o u r c e s . In order t o a p p l y t h e TSD c o n c e p t , i n f o r m a t i o n i s needed on t h e d e s i r e d m o n i t o r i n g ( s u r v e y ) o b j e c t i v e , model of s u r v e y e r r o r s , c o s t s o f a l t e r n a t i v e measurement p r o c e d u r e s , and c o s t s of e r r o r s i n d e c i s i o n making. Only t h e concept of s u r v e y model e r r o r s as used t o e s t i m a t e t h e amount o f d a t a u n c e r t a i n t y , w i l l be examined i n t h i s paper. The t o t a l s u r v e y e r r o r i n m o n i t o r i n g programs i s a f u n c t i o n of both sampling and nonsampling (measurement) e r r o r c o m p o n e n t s and Sampling e r r o r may he estimated i n terms of v a r i a b i l i t y and b i a s . i s a f u n c t i o n of t h e n e t w o r k d e s i g n ( s t a t i s t i c a l s a m p l i n g t e c h n i q u e ) , u n d e r l y i n g p o p u l a t i o n of i n t e r e s t , and sample s i z e . Nonsampling e r r o r i s r e l a t e d t o t h e s p e c i f i c methods u s e d i n t h e measurement p r o c e s s . V a r i a b i l i t y may be e x p r e s s e d i n terms of t h e s u r v e y m e t h o d p r e c i s i o n , w h i c h may b e d e f i n e d a s r e p r o d u c i b i l i t y of a method when i t i s r e p e a t e d on a homogenous sample under c o n t r o l l e d c o n d i t i o n s r e g a r d l e s s i f s y s t e m a t i c e r r o r s a r e p r e s e n t o r not. The b i a s term measures t h e d i f f e r e n c e between t h e measured v a l u e and t h e t r u e v a l u e . A c c u r a c y may be d e f i n e d a s t h e a g r e e m e n t b e t w e e n t h e amount o f a component measured by a method and t h e a c t u a l amount p r e s e n t . 
Thus, for a method to be accurate it should have no bias and small variability. A general survey error model may be expressed as (Kish, 1965):

$$\text{Total Error} = \mathrm{RMSE} = \left( V_s + V_{ns} + \mathrm{COV}(s,ns) + B_s^2 + B_{ns}^2 \right)^{1/2} \qquad (1)$$

where RMSE = root mean square error; $V_s$ = sampling variance; $V_{ns}$ = nonsampling variance; $\mathrm{COV}(s,ns)$ = covariance between sampling and nonsampling errors; $B_s$ = sampling bias; $B_{ns}$ = nonsampling bias.

In general, the COV term may be considered zero or insignificant in water quality monitoring due to a lack of correlation between the two error components ($V_s$ and $V_{ns}$). Also, the sampling bias term is usually dropped, since the true population must be known for it to be estimated. Thus, a general working model for estimating the total survey error in water quality variables is:

$$\mathrm{RMSE} = \left( V_s + V_{ns} + B_{ns}^2 \right)^{1/2} \qquad (2)$$
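As a minimal sketch of equation (2), the function below combines the three retained error components. It assumes that $V_s$ and $V_{ns}$ are variances and $B_{ns}$ a bias, all in consistent concentration units; the function name and the numbers in the usage lines are hypothetical. The second call shows how much smaller the total error looks when the nonsampling terms are left out.

```python
import math

def working_rmse(v_s: float, v_ns: float, b_ns: float) -> float:
    """Total survey error from the working model, eq. (2):
    RMSE = (V_s + V_ns + B_ns**2) ** 0.5.
    v_s and v_ns are variances; b_ns is a bias in the units of the data."""
    if v_s < 0 or v_ns < 0:
        raise ValueError("variances must be non-negative")
    return math.sqrt(v_s + v_ns + b_ns ** 2)

# Hypothetical values: sampling variance 4.0 (mg/L)^2,
# nonsampling variance 1.0 (mg/L)^2, nonsampling bias 2.0 mg/L.
full = working_rmse(4.0, 1.0, 2.0)           # sqrt(4 + 1 + 4)
sampling_only = working_rmse(4.0, 0.0, 0.0)  # nonsampling terms ignored
print(f"RMSE with all terms: {full:.2f} mg/L")
print(f"RMSE from sampling error only: {sampling_only:.2f} mg/L")
```

With these illustrative inputs the total error is 3.0 mg/L, against 2.0 mg/L when only the sampling variance is counted.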

This model is based on a single macro and micro station, with the $V_s$ term a function of only sampling frequency error, estimates of which are discussed in the next paragraph. The model may be expanded to include microlocation error, by including error terms estimated from observed cross-section data, and macrolocation error, by including error terms estimated using spatial statistics on observations at different sample site locations. The sampling variance term is a function of the statistic of interest (e.g. mean, variance) and of the statistical sampling technique (e.g. random, stratified) utilized. Estimators for various statistics and sampling techniques may be found in most statistical sampling texts (e.g. Cochran, 1977). For example, for the mean of samples obtained from simple random sampling, the sampling variance is estimated by the square of the standard error of the mean, $s_{\bar{x}}$:

$$s_{\bar{x}} = \frac{s}{\sqrt{n}}\sqrt{1 - f} \qquad (3)$$

where $s_{\bar{x}}$ = estimator of the standard error of the mean; $s$ = sample standard deviation ($s^2$ = estimator of the variance); $n$ = sample size; $f$ = sampling fraction ($n/N$). For sampling from infinite populations without replacement, as in the case of water quality, the sampling fraction term ($f$) is very small; thus $1 - f$ may be assumed to equal one. While the sampling bias term is usually neglected in the water quality survey error model because of a lack of information about the true population, there are some important points to consider under this assumption. The application of certain sampling techniques may create a biased estimate for certain statistics. For example, a seasonal process that is sampled via simple random sampling may be biased in terms of the estimate of the annual total. Two methods to provide some information on the potential amount of sampling bias are post-examination of the data and Monte Carlo simulation analysis. Post-examination consists of examining the observed data after a certain time period for the underlying temporal and spatial processes, and then re-evaluating the statistical sampling technique utilized for potential bias in light of the observed processes. Monte Carlo simulation analysis may be used to simulate certain underlying temporal or spatial processes, apply a given statistical sampling technique, and estimate the bias for certain statistics and various sample sizes. The amount of nonsampling variability is a function of the variability in sample collection and laboratory analysis procedures. For some water quality variables it may also be necessary to make the amount of nonsampling error a function of the level of material present, since laboratory variability may increase at very low (detection limit) and high concentrations. Estimating the nonsampling variability component of the total survey error model requires a method to combine the estimates of variability from the different measurement activities. The general procedure is to develop a model of the measurement process, which is a function of the measurement techniques used for the specific water quality variable.
Then, some method is applied to propagate the error of the individual model components through the measurement process model to estimate the total error in the estimated concentration. Some examples of error propagation methods are sensitivity analysis, first-order error analysis, probability theory of functions of random variables, and Monte Carlo simulation. Readers are referred to Berthouex (1975) and Evans et al. (1984) for descriptions of the four methods. The application of the error propagation methods requires some form of estimate of the variability in the individual components of the measurement model. These estimates may be obtained from experimentation or may be found in technical journals and laboratory reference manuals (e.g. SMEWW (1980), EPA (1979)). The general method to estimate precision in measurement activities (e.g. laboratory analysis) is the coefficient of variation (often referred to as the relative standard deviation) (USGS, 1982):


$$CV = \frac{s}{\bar{x}} \qquad (4)$$
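The first-order error analysis and Monte Carlo simulation methods listed above can be compared on a small measurement model. The sketch below assumes a hypothetical colorimetric determination, concentration = (A − A_blank)/m, where A is the sample absorbance, A_blank the blank absorbance and m the calibration slope. The input means and standard deviations are illustrative only, and the inputs are assumed to be independent and normally distributed. The resulting relative precision is then expressed with equation (4).

```python
import math
import random
import statistics

def concentration(a: float, a_blank: float, slope: float) -> float:
    """Hypothetical measurement model: blank-corrected absorbance / calibration slope."""
    return (a - a_blank) / slope

def first_order_sd(a, sd_a, a_blank, sd_blank, slope, sd_slope):
    """First-order (Taylor series) error propagation for independent inputs."""
    d_a = 1.0 / slope                      # dC/dA
    d_blank = -1.0 / slope                 # dC/dA_blank
    d_slope = -(a - a_blank) / slope ** 2  # dC/dm
    return math.sqrt((d_a * sd_a) ** 2 + (d_blank * sd_blank) ** 2
                     + (d_slope * sd_slope) ** 2)

def monte_carlo_sd(a, sd_a, a_blank, sd_blank, slope, sd_slope, draws=20000, seed=1):
    """Monte Carlo propagation: draw every input, evaluate the model, take the SD."""
    rng = random.Random(seed)
    results = [concentration(rng.gauss(a, sd_a), rng.gauss(a_blank, sd_blank),
                             rng.gauss(slope, sd_slope))
               for _ in range(draws)]
    return statistics.stdev(results)

# Illustrative inputs: absorbance units, slope in absorbance per mg/L.
inputs = dict(a=0.50, sd_a=0.005, a_blank=0.02, sd_blank=0.003,
              slope=0.048, sd_slope=0.0005)
c = concentration(inputs["a"], inputs["a_blank"], inputs["slope"])
sd_fo = first_order_sd(**inputs)
sd_mc = monte_carlo_sd(**inputs)
print(f"C = {c:.2f} mg/L, first-order SD = {sd_fo:.3f}, Monte Carlo SD = {sd_mc:.3f}")
print(f"CV (eq. 4) = {sd_fo / c:.2%}")
```

For input uncertainties this small the two methods agree closely; the Monte Carlo route becomes more useful when the model is strongly nonlinear or the input distributions are not normal.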

If the precision is constant over all concentration levels, which may be determined by statistically testing for homogeneous variance, the precision of the specific measurement component may be estimated as (USGS, 1982):

To account for changes in precision as a function of the material level, a simple statistical model may be developed. If the precision varies linearly with the level of material, a linear regression line can be determined. For curvilinear variation, polynomial equations may be fitted. For either model, an estimate of model uncertainty (error) should be incorporated into the estimate of total data uncertainty. Generally, recommended sample measurement procedures should be free of bias. However, in some instances bias-free methods may not be available; then an estimate of bias is needed. As with measurement variability, bias estimates for some measurement techniques are available in the literature. In addition, it may also be necessary to make the bias estimate a function of the level of material. The above method for accounting for measurement error relates to intra-measurement error (i.e. error associated with one measurement process). In cases where data are collected and/or analyzed by different laboratory techniques, personnel, and agencies, it is necessary to account for inter-measurement error in the estimate of total measurement error. Inter-measurement error may be estimated by the same methods as intra-measurement error.
In general, since inter-measurement error will be larger than intra-measurement error (because inter includes intra), inter-measurement error may be used to represent measurement error.
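The linear model for precision described above can be sketched as an ordinary least-squares fit of replicate standard deviation against concentration level. The triplicate data are hypothetical, the straight line is only one of the options the text mentions (a polynomial would replace it for curvilinear variation), and the residual standard error is used here as one possible measure of the model uncertainty to be carried into the total data uncertainty. The CV from equation (4) is printed alongside to show when it is not constant.

```python
import math
import statistics

def replicate_summary(replicates):
    """(mean level, SD, CV) for each set of replicate measurements; CV = s / x̄ (eq. 4)."""
    summary = []
    for values in replicates:
        mean = statistics.fmean(values)
        sd = statistics.stdev(values)
        summary.append((mean, sd, sd / mean))
    return summary

def fit_line(x, y):
    """Ordinary least squares y = b0 + b1*x, plus the residual standard error."""
    n = len(x)
    x_bar, y_bar = statistics.fmean(x), statistics.fmean(y)
    sxx = sum((xi - x_bar) ** 2 for xi in x)
    sxy = sum((xi - x_bar) * (yi - y_bar) for xi, yi in zip(x, y))
    b1 = sxy / sxx
    b0 = y_bar - b1 * x_bar
    sse = sum((yi - (b0 + b1 * xi)) ** 2 for xi, yi in zip(x, y))
    resid_se = math.sqrt(sse / (n - 2)) if n > 2 else float("nan")
    return b0, b1, resid_se

# Hypothetical triplicate analyses at four concentration levels (mg/L).
replicates = [[0.88, 1.0, 1.12], [4.8, 5.0, 5.2], [9.7, 10.0, 10.3], [19.5, 20.0, 20.5]]
rows = replicate_summary(replicates)
levels = [r[0] for r in rows]
sds = [r[1] for r in rows]
b0, b1, resid_se = fit_line(levels, sds)
for level, sd, cv in rows:
    print(f"level {level:5.1f}  SD {sd:.3f}  CV {cv:.3f}")
print(f"SD = {b0:.3f} + {b1:.3f} * level; residual SE = {resid_se:.2e}")
print(f"predicted SD at 15 mg/L: {b0 + b1 * 15:.3f}")
```

Here the SD rises with level while the CV falls from 0.12 to 0.025, so neither a constant SD nor a constant CV would describe these hypothetical data.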

1.5 IMPLICATIONS OF DATA UNCERTAINTY

The value of estimating the total amount of water quality data uncertainty by incorporating both sampling and nonsampling errors can be very important. Four examples of the usefulness of the information generated by estimating data uncertainty will be discussed. These are: 1) the effect of bias on hypothesis testing, 2) the effect of not including nonsampling error in the estimate of variance, 3) the determination of optimal sample size, and 4) the effect of data uncertainty on decision making. To examine the effect of bias on hypothesis testing, assume that a sample is collected with mean $\bar{x}$; that the sample and population are normally distributed; that there is a positive bias of $B = \mu - \bar{x}$, where $\mu$ is the true mean; and that the amount of bias in the sample is unknown. Hence the standard deviation calculated from the observed values is taken about $\bar{x}$ and not about $\mu$. The effect of the bias is to increase the rejection probability in one tail and decrease it in the other, where the amount of change is a function only of the ratio of bias to standard deviation (Cochran, 1977). For example, when the bias equals 0.4 times the standard deviation, the probability of an error of more than $1.96\sigma$ ($\sigma$ is the population standard deviation) is 0.0685 compared with the nominal value of 0.05 for the total (two-tailed) region; 0.0594 compared with the nominal value of 0.025 for the tail in the direction of the bias; and 0.0091 compared with the nominal value of 0.025 for the opposite tail.
Thus, if one were testing for a trend (two-tailed) in a water quality variable that was measured with a bias equal to 0.4 standard deviations, the actual probability of rejecting the null hypothesis would be 0.0685, not 0.05. The effect of not including an estimate of nonsampling error with sampling error is to create an estimate of variance smaller than the actual amount. This could result in rejecting a null hypothesis of equal means or variances when in reality there is no statistical difference; hence, one increases the level of Type I error. A problem that arises when data uncertainty is accounted for is how to statistically analyze a sequence of data when each data point is not a specific value but is actually represented by some distribution that attempts to estimate the amount of data uncertainty. The problem may be further complicated since the distribution parameters and/or the distribution function itself may change as the level of the water quality variable changes.
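The bias probabilities quoted above, and the effect of leaving out the nonsampling variance, can both be checked against the standard normal distribution. The sketch assumes normally distributed estimates and a bias expressed as a multiple of the standard deviation (`bias_ratio`, a name introduced here). The second function assumes a two-sided z-test on a mean whose true error variance is $V_s + V_{ns}$ but which is standardized by $V_s$ alone; the variance values are hypothetical.

```python
import math

def normal_sf(z: float) -> float:
    """Upper-tail probability P(Z > z) for a standard normal variable."""
    return 0.5 * math.erfc(z / math.sqrt(2.0))

def bias_tail_probabilities(bias_ratio: float, z_crit: float = 1.96):
    """Chance that the error exceeds z_crit standard deviations when the estimate
    is shifted by bias_ratio standard deviations.
    Returns (total, tail toward the bias, opposite tail)."""
    toward = normal_sf(z_crit - abs(bias_ratio))
    opposite = normal_sf(z_crit + abs(bias_ratio))
    return toward + opposite, toward, opposite

def type1_rate_ignoring_nonsampling(v_s: float, v_ns: float, z_crit: float = 1.96) -> float:
    """Actual two-sided Type I error when the statistic (x̄ - μ0)/sqrt(v_s) is used
    but the true error variance is v_s + v_ns. Under H0 the statistic has SD
    sqrt((v_s + v_ns)/v_s), so |T| > z_crit  <=>  |Z| > z_crit*sqrt(v_s/(v_s + v_ns))."""
    return 2.0 * normal_sf(z_crit * math.sqrt(v_s / (v_s + v_ns)))

total, toward, opposite = bias_tail_probabilities(0.4)
print(f"bias = 0.4 SD: total {total:.4f}, toward bias {toward:.4f}, opposite {opposite:.4f}")
for v_ns in (0.0, 1.0, 2.0):
    rate = type1_rate_ignoring_nonsampling(4.0, v_ns)
    print(f"V_ns = {v_ns}: actual Type I error {rate:.4f} (nominal 0.05)")
```

The first line reproduces 0.0685, 0.0594 and 0.0091; with $V_s = 4$ and $V_{ns} = 1$ the actual Type I error is about 0.08 rather than the nominal 0.05.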


The determination of optimal sample size is a function of the specified network design and the level of precision required. One method to determine optimal sample size is to convert the total data error and sample size into commensurable costs in an equation of total expected cost (TEC). The TEC is equal to the cost of taking n observations plus the expected cost of error involved in using an estimate of a population parameter instead of the population parameter itself. The cost of taking n observations includes a fixed cost and the cost of taking each sample, and is dependent upon the statistical sampling technique employed. The expected cost of error may be represented by a linear or quadratic function and is related to both sampling and nonsampling errors. The optimal sample size is determined by differentiating the TEC with respect to n, setting the derivative to zero, and solving for n. If the desired objective were to develop the actual optimal sample size, estimates of both sampling and nonsampling errors would be necessary; otherwise a smaller sample size than actually needed would be selected. For a decision-making problem, as formulated in the systems analysis framework, the TEC may be expanded to allow the selection of the best course of action.
In addition to the cost of taking n observations, the expanded TEC includes cost components for the analysis of a given course of action, the expected cost of a Type I error, and the expected cost of a Type II error. The TEC is the sum of these four cost components, and the best action is the one with minimum cost. Thus, if nonsampling error is not accounted for, a mistake may be made in the selection of the best course of action.
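Both uses of the TEC can be sketched in a few lines. For the sample-size problem the sketch assumes a quadratic cost of error, $k$ times the variance of the sample mean, and independent sampling and nonsampling errors, so that this variance is $(\sigma_s^2 + \sigma_{ns}^2)/n$, where $\sigma_s^2$ and $\sigma_{ns}^2$ are per-observation variances (a notation introduced here). Setting $dTEC/dn = c_1 - k(\sigma_s^2 + \sigma_{ns}^2)/n^2 = 0$ gives $n^* = [k(\sigma_s^2 + \sigma_{ns}^2)/c_1]^{1/2}$. For the decision problem, each candidate action carries its own analysis cost and Type I/Type II probabilities and costs. All names and numbers are hypothetical.

```python
import math
from dataclasses import dataclass

def tec(n, c0, c1, k, sigma2_s, sigma2_ns):
    """TEC(n) = fixed cost + c1*n + k * Var(mean), Var(mean) = (sigma2_s + sigma2_ns)/n."""
    return c0 + c1 * n + k * (sigma2_s + sigma2_ns) / n

def optimal_n(c1, k, sigma2_s, sigma2_ns):
    """Continuous optimum from dTEC/dn = c1 - k*(sigma2_s + sigma2_ns)/n**2 = 0."""
    return math.sqrt(k * (sigma2_s + sigma2_ns) / c1)

def best_integer_n(c0, c1, k, sigma2_s, sigma2_ns, n_max=1000):
    """Integer sample size with the smallest TEC (brute-force check of optimal_n)."""
    return min(range(1, n_max + 1), key=lambda n: tec(n, c0, c1, k, sigma2_s, sigma2_ns))

@dataclass
class Action:
    name: str
    analysis_cost: float  # cost of analysing this course of action
    p_type1: float        # probability of a Type I error
    cost_type1: float     # loss if a Type I error occurs
    p_type2: float        # probability of a Type II error
    cost_type2: float     # loss if a Type II error occurs

def expanded_tec(action: Action, sampling_cost: float) -> float:
    """Sampling cost + analysis cost + expected Type I cost + expected Type II cost."""
    return (sampling_cost + action.analysis_cost
            + action.p_type1 * action.cost_type1
            + action.p_type2 * action.cost_type2)

# Hypothetical costs and per-observation variances.
c0, c1, k = 500.0, 50.0, 2000.0
n_sampling_only = best_integer_n(c0, c1, k, sigma2_s=4.0, sigma2_ns=0.0)
n_total = best_integer_n(c0, c1, k, sigma2_s=4.0, sigma2_ns=1.0)
print(n_sampling_only, n_total)  # 13 14

sampling_cost = c0 + c1 * n_total
actions = [
    Action("upgrade treatment", 500.0, 0.05, 20000.0, 0.10, 5000.0),
    Action("continue monitoring", 100.0, 0.02, 20000.0, 0.30, 8000.0),
]
for a in actions:
    print(f"{a.name}: TEC = {expanded_tec(a, sampling_cost):.0f}")
best = min(actions, key=lambda a: expanded_tec(a, sampling_cost))
print("best action:", best.name)
```

With these values, omitting the nonsampling variance gives 13 samples instead of 14, the smaller sample size the text warns about. In a real application the Type I and Type II probabilities would come from the data uncertainty analysis, so misstating the variance also shifts the expanded TEC of each action.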

1.6 CONCLUSIONS

Information is needed to assist in making water quality management decisions. The worth of this information is inversely related to the amount of uncertainty in the information. An important factor that affects information uncertainty is data uncertainty. Data uncertainty is a function of sampling and nonsampling errors. Sampling errors result from collecting only a subset of a total population. Nonsampling errors result from the measurement process of estimating the level of some water quality variable.


One method to estimate the amount of data uncertainty is to develop a model of the total survey error. The model incorporates variability and bias terms for both sampling and nonsampling components. Traditionally, the nonsampling error term of water quality variables has been neglected or assumed to be zero, resulting in an estimate of variance smaller than the true amount. This may result in incorrect estimates of Type I and Type II errors in hypothesis testing and poor estimates of the optimal sample size to collect. Besides sampling and nonsampling errors, other factors may increase the amount of information uncertainty. Examples include data recording errors, biased statistical estimators, data processing errors, misinterpretation of data analysis results, inaccurate computer programs, and error in estimates of data uncertainty. The incorporation of uncertainty analysis in the decision-making process is needed to help decrease the state of doubt as to the best course of action to take to solve a problem. More research is needed to develop the methodology of water quality data uncertainty analysis, quantitative techniques to estimate data uncertainty, the data needed to make estimates of water quality data uncertainty, and statistical and mathematical techniques to analyze water quality data that have an associated estimate of data uncertainty.

1.7 ACKNOWLEDGMENTS

This research was partially funded by the Office of Water Research and Technology grant No. 14-08-001-G-1060. The assistance of Dr. James Loftis in helping to formulate the concepts on water quality data uncertainty is appreciated.

REFERENCES

Ackoff, R.L., 1962. Scientific Method: Optimizing Applied Research Decisions. John Wiley and Sons, New York, New York.
Berthouex, P.M., 1975. Modeling concepts considering performance, variability and uncertainty. In: Mathematical Modeling for Water Pollution Control, ed. T.M. Keinath and M.P. Wanielista, pp. 405-440. Ann Arbor Science Publishers, Ann Arbor, Michigan.
Cochran, W.G., 1977. Sampling Techniques (3rd Edition). John Wiley and Sons, New York, New York.
EPA, 1979. Handbook for Analytical Quality Control in Water and Wastewater Laboratories (EPA-600/4-79-019). U.S. Environmental Protection Agency, Washington, D.C.
Evans, J.S., Cooper, D.W., and Kinney, P., 1984. On the propagation of error in air pollution measurements. Environmental Monitoring and Assessment 4: 139-153.
Horvitz, D.G., 1978. Some design issues in sample surveys. In: Survey Sampling and Measurement, ed. N.K. Namboodiri, pp. 3-11. Academic Press, New York, New York.
Kish, L., 1965. Survey Sampling. John Wiley and Sons, New York, New York.
Montgomery, R.H., Lee, V.D., and Reckhow, K.H., 1983. Predicting variability in a Lake Ontario phosphorus model. J. Great Lakes Res. 9(1): 74-82.
Sanders, T.G., Ward, R.C., Loftis, J.C., Steele, T.D., Adrian, D.D., and Yevjevich, V., 1983. Design of Networks for Monitoring Water Quality. Water Resources Publications, Littleton, Colorado.
SMEWW, 1980. Standard Methods for the Examination of Water and Wastewater (15th Edition). American Public Health Association, Washington, D.C.
USGS, 1982. Quality assurance practices for the chemical and biological analyses of water and fluvial sediments. Book 5, Chap. A6 of Techniques of Water-Resources Investigations of the United States Geological Survey. U.S. Geological Survey, Washington, D.C.


THE USE OF MULTIVARIATE METHODS IN THE INTERPRETATION OF WATER QUALITY MONITORING DATA OF A LARGE NORTHERN RESERVOIR

R. SCHETAGNE
Biologist, Andre Marsan et Associés Inc., formerly with the Société d'énergie de la Baie James

ABSTRACT

The Société d'énergie de la Baie James Engineering and Environment Department has established an ecological monitoring network on the La Grande Complex, Quebec, Canada. Water quality studies of the La Grande 2 reservoir were initiated in 1977, two years before its impoundment, and continued for five years after its filling. Principal component analyses were successfully used to single out the parameters showing the greatest changes (pH, dissolved oxygen, chlorophyll a and a number of nutrients) and to present the data in a clear and synthetic manner. Hierarchical clustering analysis provided good results when used on bottom data, where a redox decline triggered sharp chemical changes. These methods showed that the decomposition of submerged organic matter and the simple mixing of waters of different quality account for most of the changes measured.

INTRODUCTION

In its first phase, the La Grande Complex calls for the construction of three powerhouses on La Grande Rivière at sites known as La Grande 2, La Grande 3 and La Grande 4, as well as the partial diversion of the Eastmain and Opinaca rivers from the south and of the Caniapiscau River from the east. Conscious that its project will have repercussions on the ecology of the territory, the Société d'énergie de la Baie James Engineering and Environment Department has established an ecological monitoring network with the following objectives: to evaluate physical, chemical and biological changes, to rationalize corrective measures, and to improve methods of predicting impacts in future projects. The use of principal component analysis and hierarchical clustering analysis on the water quality data greatly helped in achieving these objectives.


This paper is intended to provide an example of the use of multivariate methods in an ecological investigation rather than a discussion of their mathematical basis.

MATERIAL AND METHODS

- Sampling Program

The impounding of the La Grande 2 reservoir began towards the end of 1978 and lasted for one full year. The reservoir has a total area of 2,836 km², which includes 2,629 km² of flooded terrestrial soils. Its mean flowthrough time is 6.3 months, its mean depth 22 metres, its total volume 62.4 km³ and its mean drawdown 3 metres. A total of six sampling stations permitted the monitoring of water quality in this reservoir. Their locations were selected in terms of inflow and morphometric conditions (Figure 1). A control station was set up in an undisturbed environment to allow variations due to the impounding to be distinguished from those due to natural factors. Water quality sampling in the area of the La Grande 2 reservoir was initiated in 1977, almost two years before its impoundment, and continued for five years after its filling. Water was sampled every two weeks during the ice-free period and four times under ice cover, between early December and early June. From 1982 to 1984, winter sampling was reduced to one campaign, which took place at the end of the season, when maximum variations were encountered. In addition to in situ measurements of temperature, dissolved oxygen and conductivity from the surface to the bottom, two types of samples were taken at each station. The first was a water column from the surface to a depth of 10 metres, usually the photic zone. In addition to chlorophyll pigments, the parameters analyzed in the laboratory from these samples were pH, conductivity, major anions (chlorides, bicarbonates and sulfates), color, transparency, nutrients (total Kjeldahl nitrogen, total phosphorus, total inorganic carbon, total organic carbon and silica) and finally tannins and lignins.
The second type of sample was taken one metre from the bottom. This sampling coincided with the summer and winter low water levels and the spring turnover. The parameters were limited to the nutrients, pH, conductivity, bicarbonates, temperature and dissolved oxygen. The analytic methods used comply with the methods described by APHA (1971).

Figure 1. Location of water quality stations.

Table 1. Summer mean values for the parameters showing the most change (La Grande 2 Reservoir). Columns: natural conditions (1978); flooding (1979); post-impoundment maximum (1981 or 1982); post-impoundment last year (1984). Rows: dissolved oxygen (% saturation); pH; total inorganic carbon (mg/L of C); total phosphorus (µg/L of P); chlorophyll a (µg/L); silica (mg/L of SiO2).

- Mathematical Processing of Reservoir Data

The sampling program has yielded a great amount of data. Principal component analyses were performed on the raw data matrices, in which each water sample (rows) is described by 16 physico-chemical parameters (columns). This analysis summarizes the major portion of the total variance of a data set by a few major dimensions (Legendre and Legendre, 1979). The result is a two-dimensional graphic instead of the 16 original dimensions (one per parameter). In our graphics the relations between the parameters are superimposed on the reduced space ordination of the sample points, as this facilitates the interpretation (Jolicoeur and Mosimann, 1960). The dotted line in Figures 2 and 6 is the circle of significance defined by Scherrer (1985). Only parameters whose vectors are longer than the radius of this circle contribute significantly (at the 95% probability level) to the formation of the plane determined by the two principal axes. The clustering analysis used is the hierarchical flexible linkage clustering analysis (Lance and Williams, 1966 and 1967) with the parameters set at $\alpha_j = 0.625$, $\alpha_m = 0.625$, $\beta = -0.25$ and $\gamma = 0$. This analysis was performed on a Gower similarity coefficient matrix (Legendre and Legendre, 1979).

RESULTS

Figures 3 and 4 show the distribution of the temperature and dissolved oxygen saturation percentage isolines at a typical La Grande 2 reservoir station through one complete year. The reservoir presented, overall, two periods of thermal stratification (intense in summer, and inverse and weak in winter) interspersed with two periods of water mixing. The deficiency in dissolved oxygen resulting from the degradation of submerged organic matter (Schetagne, 1980) has proved to be greatest in winter, when the presence of ice prevents contact with the atmosphere. The spring and fall turnover periods allow a reoxygenation of the deep zones and a redistribution of the products of degradation throughout the column. Major biophysical phenomena were observed in two zones of the reservoir: the summer photic zone and the bottom zone, which is alternately anoxic and well oxygenated.
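The two multivariate steps described under Mathematical Processing of Reservoir Data can be sketched as follows. This is not the author's software, and several choices are assumptions: the principal component analysis is done on standardized parameters (the correlation matrix), a common choice when parameters are in different units; the Gower dissimilarity is taken as one minus the Gower similarity and handles quantitative parameters only; and the Lance and Williams flexible clustering uses $\beta = -0.25$, which gives $\alpha_j = \alpha_m = (1-\beta)/2 = 0.625$ and $\gamma = 0$. The data are random numbers standing in for a samples-by-parameters matrix, and the function names are hypothetical.

```python
import numpy as np

def pca_correlation(x):
    """PCA of an (n_samples, n_parameters) matrix on standardized parameters.
    Returns sample scores, parameter loadings (correlations with the axes)
    and the fraction of total variance carried by each axis."""
    z = (x - x.mean(axis=0)) / x.std(axis=0, ddof=1)
    eigvals, eigvecs = np.linalg.eigh(np.corrcoef(z, rowvar=False))
    order = np.argsort(eigvals)[::-1]  # largest axis first
    eigvals, eigvecs = eigvals[order], eigvecs[:, order]
    scores = z @ eigvecs
    loadings = eigvecs * np.sqrt(np.clip(eigvals, 0.0, None))
    return scores, loadings, eigvals / eigvals.sum()

def group_centroids(scores, groups):
    """Mean coordinates on the first two axes for each group (e.g. each year)."""
    groups = np.asarray(groups)
    return {g: scores[groups == g, :2].mean(axis=0) for g in np.unique(groups)}

def gower_distance(x):
    """Gower dissimilarity (1 - Gower similarity) for quantitative parameters."""
    ranges = x.max(axis=0) - x.min(axis=0)
    keep = ranges > 0
    xk = x[:, keep]
    diffs = np.abs(xk[:, None, :] - xk[None, :, :]) / ranges[keep]
    return diffs.mean(axis=2)

def flexible_clustering(dist, beta=-0.25):
    """Lance-Williams flexible clustering with alpha = (1 - beta)/2 and gamma = 0.
    Returns merges as (cluster_a, cluster_b, height); new clusters get ids n, n+1, ..."""
    n = len(dist)
    alpha = (1.0 - beta) / 2.0  # 0.625 when beta = -0.25
    d = {(i, j): float(dist[i][j]) for i in range(n) for j in range(n) if i != j}
    active, next_id, merges = set(range(n)), n, []
    while len(active) > 1:
        i, j = min(((a, b) for a in active for b in active if a < b), key=lambda p: d[p])
        height = d[(i, j)]
        merges.append((i, j, height))
        active -= {i, j}
        for k in active:  # Lance-Williams update for the new cluster
            new = alpha * d[(k, i)] + alpha * d[(k, j)] + beta * height
            d[(k, next_id)] = d[(next_id, k)] = new
        active.add(next_id)
        next_id += 1
    return merges

# Stand-in data: 60 samples x 5 parameters, two of them strongly correlated.
rng = np.random.default_rng(0)
x = rng.normal(size=(60, 5))
x[:, 1] = x[:, 0] + 0.1 * rng.normal(size=60)
years = [1978] * 20 + [1979] * 20 + [1980] * 20
scores, loadings, explained = pca_correlation(x)
centroids = group_centroids(scores, years)
print("variance explained by axes 1 and 2:", np.round(explained[:2], 2))
print("yearly centroids:", centroids)
merges = flexible_clustering(gower_distance(x[:8]))  # cluster the first 8 samples
print("first merge:", merges[0])
```

The centroids mirror the yearly symbols plotted in the reduced space, and the merge list can be drawn as a dendrogram; with $\beta$ slightly negative the flexible method behaves much like a space-conserving average linkage.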

- Photic Zone During the Ice-Free Period

Figure 2 illustrates the parameter vectors in the reduced space obtained by principal component analysis performed on a total of 384 integrated samples (0-10 metres depth) taken during ice-free periods in 1978 (natural conditions), 1979 (impounding) and from 1980 to 1984 (operation). This figure shows the correlation between the various parameters and their relative contribution to each of the principal axes. It shows strong correlation between conductivity, bicarbonates and pH. These correlations are well known, since these parameters correspond to the main ions. This figure also shows good correlations between the following parameters: tannins, total organic carbon, color and total Kjeldahl nitrogen. These parameters all measure the organic matter present, which, before or after the impounding, was essentially of allochthonous origin (Schetagne, 1985). The good correlation between this group and the first may be linked to the influence of pH on the degradation of organic matter of plant origin (Campbell et al., 1976; Sylvester and Seabloom, 1965). A strong, but inverse, correlation can be seen between dissolved oxygen and the total phosphorus and total inorganic carbon parameters. The degradation of submerged organic matter, whether chemical or biological, induces consumption of dissolved oxygen, the release of CO2 (which, at the pH values measured, accounts for a good part of the total inorganic carbon) and the release of nutrients like phosphorus (Wetzel, 1975). A strong correlation is also noted between total phosphorus and chlorophyll a, which, according to Berman and Eppley (1974), can be considered a good measure of the phytoplanktonic biomass. The importance of the phosphorus content in relation to phytoplanktonic production is also well documented in the literature. Chlorophyll a also seems to be inversely correlated with silica.
This might be explained by the role played by the diatoms, a phytoplanktonic group which is important in this region, in the annual silica cycle (Magnin, 1977; Wetzel, 1975).

Figure 2. Reduced space ordination of the centroids for each year of all the samples, as defined by the first two axes (legend: years 1978 to 1984; circled symbols: Toto station).

The first two axes account for 27 and 18%, respectively, of the total variance. Color, conductivity, tannins, bicarbonates, total organic carbon and total Kjeldahl nitrogen contribute most to the first axis. As we will see, this axis can be described as the dilution axis, because it demonstrates, firstly, the heterogeneity of the source stations in terms of degree of mineralization and organic content and, secondly, their subsequent homogenization and dilution by less rich waters as a result of impounding. Silica, pheopigments, chlorophyll a, total inorganic carbon, total phosphorus and dissolved oxygen contribute most to the formation of the second axis. This axis better characterizes the events associated with decomposition. Indeed, decomposition of submerged organic matter resulted in a rise in nutrients (such as total phosphorus), which in turn resulted in an increase in phytoplankton biomass (chlorophyll pigments), which caused a drop in silica content. Figure 2 also shows the centroids of the samples collected each summer period at each station. The coordinates of a centroid are the mean coordinates of the samples it represents. The interpretation is done with the reduced space ordination of all the samples, but only the centroids are shown here. This figure shows that the samples taken in 1978 (symbol %), under natural conditions, are essentially distributed along the first axis from left to right. This demonstrates the heterogeneity of the stations before impoundment. The migration of the centroids in time is from the bottom to the top of this figure. Samples near the bottom of the figure were generally relatively poor in chlorophyll pigments, total inorganic carbon and total phosphorus, but relatively rich in dissolved oxygen and silica.
Movement toward the upper part of the figure indicates a decrease in silica and dissolved oxygen content and, conversely, a rise in the concentrations of phosphorus, total inorganic carbon and chlorophyll (symbols X to A). Over time, a general migration of the points toward the center of the figure can also be seen along the first axis. This represents a homogenization as a result of impoundment. This trend can be explained mainly by the simple mixing of waters with different degrees of mineralization and organic content. Thus the centroids of the samples taken from 1981 to 1984 (symbols A and 0) are all found in the upper center part of the figure, since all the stations have shown the same decomposition mechanisms and homogenization. Analysis of the changes at a specific station makes it easier to follow the events caused by impoundment. The circled symbols represent samples taken at the Toto station. Because of its geological location, the water at this station had a relatively high degree of mineralization and a substantial amount of organic matter. These characteristics explain why, under natural conditions (1978), this station is positioned on the right of the first axis, while relatively high values of silica and dissolved oxygen and relatively low values of total phosphorus, total inorganic carbon and chlorophyll a account for its position near the bottom. This station was submerged in 1979 (X) by the waters of La Grande Rivière, which are less mineralized and less rich in organic matter. This explains the first movement to the left in the reduced space. The decomposition of the submerged plant matter, resulting in a drop in dissolved oxygen, a decrease in pH, and increases in total inorganic carbon and phosphorus, explains the subsequent migration of the points toward the upper part of the figure (symbol A). In 1981, a sharp increase in chlorophyll a and pheopigments and a drop in silica, together with a continued increase in decomposition products, explain the continued rise of the points in the figure (symbol a).
Because this station is located in a small bay away from the main body of water of the reservoir (Figure 1), it has shown a degree of mineralization and organic matter content intermediate between its initial characteristics and those of the reservoir. This explains why its centroids (circled) have not quite joined those of the other stations. However, continuous drawdown and subsequent partial refilling have weakened the differences and explain the subsequent migration from right (H in 1981) to left (0 in 1984), closer to the other stations.

The reduced space ordination of all the samples of all the stations, superimposed on the parameter vectors on a large color graphic, made it possible to interpret the events occurring from 1978 to 1984. This analysis showed the bunching up of the 1981 through 1984 samples of all the stations. Subsequent analyses done on these samples only showed spatial differentiation of the reservoir waters and gave finer details. The results given by reduced space ordination (by principal component analysis) were interpretable when at least 40% of the total variance was explained by the first two axes. Principal component analysis singled out the parameters showing the greatest changes. It showed that, in the photic zone, dilution (first axis) explains the greatest proportion of the total variance observed after impoundment. Table 1 gives the variations shown by the important parameters. It shows that, in the photic zone, most of the absolute variations were weak.
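A minimal sketch of the ordination step described above, using NumPy's SVD. It assumes the PCA was run on standardized variables (a correlation-matrix PCA), which the text does not state; the function name `pca_centroids` and its arguments are hypothetical. It returns the fraction of variance carried by each of the first two axes (the quantity compared with the 40% rule of thumb), the sample coordinates, and each group's centroid, computed as the mean coordinates of the samples it represents. Axis signs from an SVD are arbitrary, so a plot may come out mirrored relative to Figure 2.

```python
import numpy as np


def pca_centroids(X, groups, n_axes=2):
    """PCA on standardized data, then group centroids in the reduced space.

    X      : (n_samples, n_parameters) array of water-quality measurements
    groups : one label per sample (e.g. sampling year)
    Returns (explained, scores, centroids):
      explained - fraction of total variance on each of the first n_axes axes
      scores    - sample coordinates on those axes
      centroids - dict label -> mean coordinates of the samples with that label
    """
    X = np.asarray(X, dtype=float)
    # Standardize each parameter, since they are measured in different units
    # (assumption: correlation-matrix PCA).
    Z = (X - X.mean(axis=0)) / X.std(axis=0, ddof=1)
    # The right singular vectors of Z are the principal axes.
    _, s, Vt = np.linalg.svd(Z, full_matrices=False)
    eigenvalues = s**2 / (len(Z) - 1)
    explained = eigenvalues / eigenvalues.sum()
    scores = Z @ Vt[:n_axes].T
    labels = np.asarray(groups)
    centroids = {g: scores[labels == g].mean(axis=0) for g in np.unique(labels).tolist()}
    return explained[:n_axes], scores, centroids


if __name__ == "__main__":
    rng = np.random.default_rng(0)
    X = rng.normal(size=(60, 6))  # hypothetical data: 60 samples, 6 parameters
    years = [1978] * 20 + [1979] * 20 + [1980] * 20
    explained, scores, centroids = pca_centroids(X, years)
    print("variance on first two axes:", explained.sum())
    print("interpretable (>= 40%)?", explained.sum() >= 0.40)
```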

- Deep Zone

Figure 6 is a representation of the parameter vectors in the reduced space obtained by the principal component analysis performed on the 111 samples taken one meter from the bottom at four La Grande 2 reservoir stations. These samples were taken between 1979 and 1984 during winter anoxic conditions, summer stratification and spring turnover. The distribution of the parameter vectors shows a strong negative correlation of dissolved oxygen with the following parameters: total inorganic carbon, total phosphorus, total Kjeldahl nitrogen, conductivity and bicarbonates. All these correlations are associated with the decomposition mechanisms of submerged organic matter (Burdick and Parker, 1971). The oxidation of this matter, by chemical or biological processes, consumes dissolved oxygen and releases CO2, ions (increased conductivity and bicarbonates) and nutrients (increased total phosphorus and total Kjeldahl nitrogen).

Figure 3. Annual evolution of isotherms at Bereziuk station (La Grande 2 Reservoir).

Figure 4. Annual evolution of isopleths of percentage oxygen saturation at Bereziuk station (La Grande 2 Reservoir).

Figure 5. Dendrogram representing the clusters obtained by flexible linkage clustering analysis (Lance and Williams).

The two main axes explain 48% and 18%, respectively, of the total variance. The parameters that contribute most to the first axis are total inorganic carbon, percentage saturation of dissolved oxygen, total phosphorus, total Kjeldahl nitrogen, conductivity and bicarbonates. The following parameters contribute most to the second axis: pH, total organic carbon, temperature and sulfates. The greatest variations observed at depth were shown by the parameters associated with the decomposition of submerged plant matter (Schetagne, 1985). Thus, unlike in the first principal component analysis, the decomposition axis (axis I) explains the largest percentage of the total variance. The hierarchical flexible linkage clustering analysis represented by the dendrogram in Figure 5 was computed from a Gower similarity matrix calculated with the data from the bottom samples. Two very different groups can be noted; their similarity index is only -0.72. The two polygons in Figure 6 delimit the positions of the sample points of each group in the reduced space of the first two principal components. The two groups can be distinguished by the dissolved oxygen values of their samples. Roughly speaking, the samples of the group on the right have dissolved oxygen saturation below 20% and those on the left have saturation above 20%. This boundary corresponds to the zone of rapid change of the redox potential. The solubility of a number of compounds increases substantially in a reducing environment (Wetzel, 1975). Thus, the dissolved oxygen level is a preponderant factor in the changes in the water quality of a reservoir, because it influences the rate of exchange between sediments and water. According to Campbell et al. (1976), the impact of the sediments on the quality of the overlying water can be up to fifty times greater in anaerobic conditions than in aerobic conditions.
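The clustering of the bottom samples can be sketched as a Gower similarity (restricted here to quantitative variables) followed by agglomerative clustering with the Lance and Williams flexible update. The value beta = -0.25 is a commonly used choice and an assumption here, since the text does not give the value used; the function names are hypothetical. With a negative beta, fusion distances between dissimilar groups can exceed 1, so fusion similarities (1 - distance) can fall below zero, which may explain how a similarity index such as the -0.72 reported above can occur.

```python
import numpy as np


def gower_similarity(X):
    """Gower similarity for quantitative variables:
    s_ij = mean over variables k of (1 - |x_ik - x_jk| / range_k)."""
    X = np.asarray(X, dtype=float)
    ranges = X.max(axis=0) - X.min(axis=0)
    ranges[ranges == 0] = 1.0  # a constant variable then contributes similarity 1
    diff = np.abs(X[:, None, :] - X[None, :, :]) / ranges
    return 1.0 - diff.mean(axis=2)


def flexible_clustering(S, beta=-0.25):
    """Agglomerative clustering with the Lance-Williams flexible strategy.

    Distances are D = 1 - S.  When clusters i and j merge, the distance from
    the new cluster to any other cluster k becomes
        a * d(i, k) + a * d(j, k) + beta * d(i, j),  with a = (1 - beta) / 2.
    Returns the merges in order as (members_i, members_j, fusion_similarity).
    """
    D = 1.0 - np.asarray(S, dtype=float)
    n = len(D)
    clusters = {i: (i,) for i in range(n)}
    dist = {(i, j): D[i, j] for i in range(n) for j in range(i + 1, n)}
    a = (1.0 - beta) / 2.0
    merges = []
    next_id = n
    while len(clusters) > 1:
        # Fuse the closest pair of current clusters.
        (i, j), dij = min(dist.items(), key=lambda item: item[1])
        merges.append((clusters[i], clusters[j], 1.0 - dij))
        for k in clusters:
            if k not in (i, j):
                dik = dist[(min(i, k), max(i, k))]
                djk = dist[(min(j, k), max(j, k))]
                dist[(k, next_id)] = a * dik + a * djk + beta * dij
        clusters[next_id] = clusters[i] + clusters[j]
        del clusters[i], clusters[j]
        # Drop every distance that still refers to the merged clusters.
        dist = {key: d for key, d in dist.items() if i not in key and j not in key}
        next_id += 1
    return merges
```

The last entry of the returned list is the fusion that joins the final two groups, the level at which a dendrogram such as Figure 5 would show them splitting.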
In this case, the group on the right side of the reduced space can be distinguished from the one on the left by very low or totally absent dissolved oxygen, and thus by a very low redox potential, and by much higher values of total phosphorus, total Kjeldahl nitrogen, conductivity, bicarbonates and total inorganic carbon. The differences between these groups are not a function of station location because, as a result of the turnover periods, the deep zones of all these stations show the same alternation in their dissolved oxygen saturation (Figure 4). Table 2 shows that the variations observed in the bottom samples were quite large. The much smaller variations observed in the photic zone indicate that the effect of the bottom zone on the overall reservoir is limited. A number of clustering analyses were also performed on photic zone samples but, in contrast with the analysis of the bottom zone, they were not very instructive. The goal of clustering analysis is to locate discontinuities, which in ecology are not usually well defined. In our studies, clustering was helpful only in the analysis of data from the bottom zone, where the declining redox potential triggered sharp chemical changes.

Figure 6. Reduced space ordination of the first two clusters of bottom samples.

Table 2. Values measured in samples taken one meter from the bottom (La Grande 2 Reservoir)

Parameter                              End of winter    After spring turnover
Percentage oxygen saturation (%)       <20              60-91
Conductivity (µS/cm)                   19-88            14-22
Bicarbonates (mg/L of HCO3-)           5-36             3-9

CONCLUSION

Multivariate methods (principal component and clustering analysis) were successfully used to single out the parameters showing the greatest changes and to present very large amounts of data in a clear and synthetic manner.

REFERENCES

American Public Health Association et al., 1981. Standard Methods for the Examination of Water and Wastewater. APHA, Washington, D.C., 1134 p.
Berman, T. and Eppley, R.W., 1974. The measurement of phytoplankton parameters in nature. Sci. Prog., 61: 219-239.
Burdick, J.C. and Parker, F.L., 1971. Estimation of water quality in a new reservoir. Department of Environmental and Water Resources Engineering, School of Engineering, Vanderbilt, and U.S. Army Corps of Engineers, Report no. 8500 p.
Campbell, P.G. et al., 1976. Effets du décapage de la cuvette d'un réservoir sur la qualité de l'eau emmagasinée: élaboration d'une méthode d'étude et application au réservoir de Victoriaville (Rivière Bulstrode, Québec). INRS-Eau, Québec, 301 p. (Rapport scientifique no. 37).
Jolicoeur, P. and Mosimann, J.E., 1960. Size and shape variation in the painted turtle. A principal component analysis. Growth, 24: 339-354.
Lance, G.N. and Williams, W.T., 1966. A generalized sorting strategy for computer classification. Nature (Lond.), 212: 218.
Lance, G.N. and Williams, W.T., 1967. A general theory of classificatory sorting strategies. I. Hierarchical systems. Computer J., 9: 373-380.
Legendre, L. and Legendre, P., 1979. Écologie numérique. Tome 2: la structure des données écologiques. Presses de l'Université du Québec, Masson, Paris, 247 p.
Magnin, E., 1977. Écologie des eaux douces du territoire de la Baie James. Société d'énergie de la Baie James, Montréal, 454 p.
Scherrer, B., 1985. Construction et utilisation des cercles de contribution équilibrée et de signification pour interpréter les résultats d'analyses en composantes principales. Rapport du groupe de recherche et d'études en biostatistiques et en environnement pour la Société d'énergie de la Baie James, 12 p.
Schetagne, R., 1985. Le réseau de surveillance écologique du Complexe La Grande 1977-1984. Physico-chimie et pigments chlorophylliens. Société d'énergie de la Baie James, Direction ingénierie et environnement, in preparation.
Sylvester, R.O. and Seablom, R.W., 1965. Influence of site characteristics on quality of impounded water. J. AWWA, 57: 1528-1546.
Wetzel, R.G., 1975. Limnology. W.B. Saunders Company, Toronto, 743 p.

MODELING RIVER ACIDITY - A TRANSFER FUNCTION APPROACH

EIVIND DAMSLETH
Norwegian Computing Center, Oslo, Norway

1. INTRODUCTION

1.1 General
For quite some time, Norwegian authorities have been concerned with pollution, with special emphasis on the pollution caused by acid precipitation. An extensive research program, "Acid precipitation - effects on forest and fish", was run during the period 1972-1980 (Overein, Seip and Tollan, 1980). Most of the research activities in this field are now organized at the Norwegian Water Research Institute and the Norwegian Institute for Air Research. The research is, to a large extent, financed through the Norwegian State Pollution Authority. For a long time much effort was spent on collecting data from various sources, in addition to laboratory analysis. In recent years there has been an increasing understanding of the importance of analysing the data through more sophisticated methods, that is, through statistical data analysis. The Norwegian Computing Center (NCC) is a research institute founded by the Norwegian Council for Scientific Research. NCC is involved in research within data technology, data communication and cartography. Mathematical statistics and data analysis are also important fields of research. In recent years we have worked in close cooperation with the water and air research institutes, to assist them in their analysis and to bring their use and understanding of various statistical methods up to date. This is one of the major tasks for our Section for Statistical Analysis of Natural Resources Data.

1.2 This study
Time series analysis plays an important role in the analysis of pollution data when it comes to questions about long-term trends, e.g. whether air or water pollution at specific locations has changed from one year to another, or to judge the effects of specific actions. In 1983 and 1984 NCC was engaged in a project called "Stochastic time series models applied to pollution data". The study covered air as well as water pollution. A complete description of the project and the results is given in Damsleth (1984a). The results regarding air pollution are also presented elsewhere (Damsleth, 1984b), and the present paper gives a summary of the results for river acidity.

2. DATA
We have analysed data from three rivers in southern Norway: Nid River, Tovdal River and Mandal River. The locations of these rivers are shown in Figure 1. Only the analysis for Nid River is presented here; the findings for the two other rivers are given briefly at the end of the paper.

Fig. 1. The geographical location of the three rivers under study.

The river acidity is given as pH measurements taken weekly from March 4, 1970 until December 30, 1980, 566 weeks in all. The data are shown in Figure 2.

Fig. 2. pH (dotted line, right scale) and discharge (solid line, left scale) for Nid River, 1970-1980.

2.1 Missing values and outliers
There were 18 weeks where the data were missing for various reasons: three in 1970, six in 1971, seven in 1972, and one each in 1976 and 1977. In the analysis, these missing values were treated as follows:
- The missing values were first estimated using simple linear interpolation between the observations preceding and following the gap.
- A univariate time series model was identified and estimated for this adjusted series.
- Optimal estimates (Damsleth, 1980) using this model were inserted for the missing values.
- The series thus obtained was used to build a model for the relation between the acidity and the discharge.
- When a model for this relation was found, it was used to calculate new optimal values.
Finally, the model parameters were estimated once more using the final estimates for the missing values. The various steps in the above process gave only small changes in the estimates for the missing values, and the model and parameter estimates were almost unaffected by these changes.

During the model building process, it became obvious that the data contained some "wild" values (outliers). There can be several reasons for such values: there may be an error in the data registration, an error may have occurred during the laboratory analysis, or the value may in fact be correct but strongly affected by some occasional outlet from a non-typical acid or alkaline source. There are eight obvious outliers in the data, two in 1970 and one in 1971, 1975, 1976, 1977 and 1978. Such extreme values may have a very strong influence on the model building. To avoid this, we chose to ignore the outliers and to treat them as missing values according to the procedure above.

3. UNIVARIATE ANALYSIS

3.1 Time series model
Simple univariate analysis gave the model

$$\left(1 - \underset{(0.04)}{0.82}\,B - \underset{(0.02)}{0.14}\,B^{52}\right)\left(Y_t - \underset{(0.04)}{5.16}\right) = a_t, \qquad \sigma_a = 0.1033 \qquad (1)$$

where we have used the standard notation for time series models (Box and Jenkins, 1970): $Y_t$ is the observed pH at time $t$, $a_t$ is a white noise sequence with standard deviation $\sigma_a$, and $B$ is the backward shift operator, so that $B^k Y_t = Y_{t-k}$. Note the highly significant, though small, lag-52 term, which implies a certain seasonal (annual) pattern in the weekly data.

3.2 Residual analysis
There is reason to believe that the acidity may differ between the seasons of the year, and there may also have been a change over time. In Table 1 we give the mean and standard deviation of the residuals from model (1), by season and by year. In this context we define the seasons as follows: winter is December through February, spring is March through May, summer is June through August and autumn is September through November. There are several interesting features in Table 1. There are no significant changes in the acidity over time; none of the yearly means differs significantly from 0.
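Model (1) can be read as a filter that turns the pH series into residuals; grouping those residuals by the seasons defined above gives the kind of summary reported in Table 1. The sketch below is illustrative only: it plugs in the printed estimates, and the function names are hypothetical. Residuals are formed only from the 53rd observation on, because the lag-52 term needs a value one year back.

```python
from datetime import date, timedelta
from statistics import mean, stdev

# Estimates printed for model (1).
MU, PHI1, PHI52 = 5.16, 0.82, 0.14


def residuals_model1(y):
    """a_t = w_t - PHI1 * w_{t-1} - PHI52 * w_{t-52}, with w_t = y_t - MU.

    The first 52 weeks have no value one year back, so residuals start at t = 52.
    """
    w = [v - MU for v in y]
    return [w[t] - PHI1 * w[t - 1] - PHI52 * w[t - 52] for t in range(52, len(w))]


def season(d):
    """Seasons as defined in the text: winter = Dec-Feb, spring = Mar-May, etc."""
    if d.month in (12, 1, 2):
        return "Winter"
    if d.month in (3, 4, 5):
        return "Spring"
    if d.month in (6, 7, 8):
        return "Summer"
    return "Autumn"


def weekly_dates(start, n):
    """Dates of n weekly samples beginning at `start`."""
    return [start + timedelta(weeks=k) for k in range(n)]


def summarize_by_season(dates, resid):
    """Mean, standard deviation and count of the residuals for each season."""
    groups = {}
    for d, r in zip(dates, resid):
        groups.setdefault(season(d), []).append(r)
    return {s: (mean(v), stdev(v), len(v)) for s, v in groups.items()}


if __name__ == "__main__":
    # Hypothetical flat series sampled weekly from March 4, 1970.
    y = [MU] * 120
    dates = weekly_dates(date(1970, 3, 4), len(y))
    print(summarize_by_season(dates[52:], residuals_model1(y)))
```

Note that the residuals are paired with `dates[52:]`, since the first residual belongs to the 53rd week.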

TABLE 1
Mean and standard deviation of the residuals from model (1), by year and season. At the 5% level, the means marked * are significantly different from 0, and the standard deviations marked * are significantly different from 0.1033, the estimated overall standard deviation.

Year    Mean     St.dev.   # obs.
1970    -.000    .105      39
1971     .016    .077*     52
1972    -.006    .093      53
1973     .012    .107      52
1974    -.017    .120      52
1975     .016    .103      52
1976     .001    .099      53
1977    -.020    .099      52
1978    -.015    .095      52
1979     .006    .129*     52
1980     .004    .101      52
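The text does not say which tests produced the asterisks in Table 1. One plausible choice, assumed here, is a chi-square test on each yearly variance, comparing (n - 1)s²/σ² with χ²(n - 1) for σ = 0.1033, together with a normal-approximation test on each yearly mean. The sketch uses the Wilson-Hilferty approximation to chi-square quantiles so that it needs only the standard library; the function names are hypothetical. Applied to the yearly rows of Table 1 (for instance s = .077 and s = .129 with n = 52, which are flagged, and s = .120, which is not), it appears to agree with the markings.

```python
from statistics import NormalDist


def chi2_quantile(p, k):
    """Wilson-Hilferty approximation to the p-quantile of chi-square with k d.f."""
    z = NormalDist().inv_cdf(p)
    c = 2.0 / (9.0 * k)
    return k * (1.0 - c + z * c**0.5) ** 3


def sd_differs(s, n, sigma, alpha=0.05):
    """Two-sided test of H0: standard deviation = sigma,
    using (n - 1) s^2 / sigma^2 ~ chi-square(n - 1) under normality."""
    stat = (n - 1) * s**2 / sigma**2
    lower = chi2_quantile(alpha / 2, n - 1)
    upper = chi2_quantile(1 - alpha / 2, n - 1)
    return stat < lower or stat > upper


def mean_differs(m, s, n, alpha=0.05):
    """Two-sided normal-approximation test of H0: mean = 0."""
    z = m / (s / n**0.5)
    return abs(z) > NormalDist().inv_cdf(1 - alpha / 2)


if __name__ == "__main__":
    sigma = 0.1033  # overall residual standard deviation of model (1)
    for year, m, s, n in [(1971, 0.016, 0.077, 52), (1974, -0.017, 0.120, 52),
                          (1979, 0.006, 0.129, 52)]:
        print(year, mean_differs(m, s, n), sd_differs(s, n, sigma))
```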

When it comes to the seasons, summer shows a clearly significant positive mean. We can thus conclude that the fit of the model varies with the season, and that the summer months are not satisfactorily described. The table also shows that there is a large variation in the residual standard deviation between years and between seasons. In particular, the standard deviation is definitely smaller during the winter period. Figure 3 shows a histogram of the residuals from model (1). It appears symmetric, without any peculiarities. We have not performed any formal test of normality, but the visual impression gives no reason to doubt this assumption. The normality assumption is by no means critical in the previous analysis, except perhaps for the judgements of significance.

Fig. 3. Histogram of the residuals from model (1).

4. RELATION BETWEEN ACIDITY AND DISCHARGE
There is reason to believe that the acidity in the river is affected by the discharge. The above analysis, which showed that the univariate model was not satisfactory during the summer, when the flow is low, supports this hypothesis. We therefore wanted to incorporate the discharge into the model, using a transfer function model framework.

4.1 Discharge data
The Norwegian Water and Electricity Board has kindly provided data on the discharge in Nid River for the period of interest. The data are daily measurements of the discharge in m³ per second. The flow is thus measured much more frequently than the acidity, which allows several possible choices of input variables for the transfer function model. The natural approach would be to use several input series: one with the flow measured on the same day as the pH, one with the flow on the day before, and so forth. Unfortunately, the daily measurements are strongly autocorrelated, so this approach would give a transfer function whose input series are strongly correlated. This leads to problems in the identification as well as the estimation process (Damsleth, 1979). Some experimentation led to the use of the average flow during the last seven days prior to the pH measurement as the only input series to the transfer function model. This input series is also plotted in Figure 2.
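The input series for the transfer function model can be built from the daily discharge record as sketched below. Two details are assumptions, since the text leaves them open: the seven-day window is taken as the seven days before the sampling day (excluding that day), and the natural logarithm is then taken, since the model below uses the logarithm of this average. The function name `weekly_input` and the dictionary layout of the daily record are hypothetical.

```python
import math
from datetime import date, timedelta


def weekly_input(daily_flow, ph_dates):
    """Log of the mean daily discharge over the seven days before each pH sample.

    daily_flow : dict mapping datetime.date -> discharge (m3/s)
    ph_dates   : dates on which pH was measured
    Assumption: the window is days d-7 .. d-1, i.e. the sampling day is excluded.
    """
    series = []
    for d in ph_dates:
        window = [daily_flow[d - timedelta(days=k)] for k in range(1, 8)]
        series.append(math.log(sum(window) / len(window)))
    return series


if __name__ == "__main__":
    start = date(1980, 1, 1)
    # Hypothetical daily record: discharge rising by 1 m3/s per day.
    flows = {start + timedelta(days=i): 50.0 + i for i in range(30)}
    print(weekly_input(flows, [start + timedelta(days=14), start + timedelta(days=21)]))
```

Averaging before taking the logarithm (rather than averaging log flows) follows the wording "logarithm of the average discharge" in the conclusions.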

4.2 The discharge/acidity model
From Figure 2 one may deduce a negative co-variation between the discharge and the pH. This is seen more clearly in Figure 4a, which shows a plot of pH values against discharge. The figure also shows a smooth estimate of the functional relationship between the two variables, obtained with the LOWESS smoothing technique (Cleveland, 1979). The curve is clearly nonlinear, typical of a log-linear relationship. This is confirmed by Figure 4b, where the flow is plotted on a logarithmic scale, giving a nearly linear picture. Bearing the logarithmic definition of pH in mind, this is not surprising.
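A smooth curve like the one in Figure 4a can be approximated with the simplified LOWESS sketch below: local straight-line fits with tricube weights, without the robustness iterations of Cleveland's (1979) full procedure. The function name and the default span `frac=0.5` are assumptions. A complete implementation is available, for example, as `lowess` in statsmodels (`statsmodels.nonparametric.smoothers_lowess`).

```python
import numpy as np


def lowess_fit(x, y, frac=0.5):
    """Simplified LOWESS: local weighted straight-line fits, no robustness steps.

    For each x0, the ceil(frac * n) nearest points define a half-width h;
    points get tricube weights (1 - (|x - x0| / h)^3)^3 and a weighted
    least-squares line is fitted. Its value at x0 is the smoothed value.
    """
    x = np.asarray(x, dtype=float)
    y = np.asarray(y, dtype=float)
    n = len(x)
    k = min(n, max(2, int(np.ceil(frac * n))))
    fitted = np.empty(n)
    for i, x0 in enumerate(x):
        dist = np.abs(x - x0)
        h = np.sort(dist)[k - 1]
        if h == 0:
            h = 1.0  # avoid division by zero when the k nearest points share x0
        w = np.clip(1.0 - (dist / h) ** 3, 0.0, None) ** 3
        sw = np.sqrt(w)
        # Weighted least squares for y = b0 + b1 * x, solved via scaled rows.
        A = np.column_stack([np.ones(n), x]) * sw[:, None]
        coef, *_ = np.linalg.lstsq(A, y * sw, rcond=None)
        fitted[i] = coef[0] + coef[1] * x0
    return fitted
```

Calling it with the discharge values as `x` mimics Figure 4a; calling it with `np.log(discharge)` mimics Figure 4b, where a nearly straight curve is expected.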

Fig. 4. Relationship between acidity and discharge in Nid River. In (a) the discharge is plotted directly (in m³/s) along the x-axis, while in (b) the x-axis is on a logarithmic scale.

A few iterations of identification, estimation and diagnostic checking finally gave the transfer function/noise model:

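$$Y_t - \underset{(0.03)}{5.19} = \left(1 - \underset{(0.07)}{0.55}\,B\right)^{-1}\left(\underset{(0.02)}{-0.17} - \underset{(0.02)}{0.05}\,B\right)\left(\ln X_t - 4.38\right) + N_t \qquad (2)$$

$$\left(1 - \underset{(0.04)}{0.73}\,B - \underset{(0.04)}{0.12}\,B^{2}\right)N_t = a_t, \qquad \sigma_a = 0.09156$$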

Here $X_t$ represents the discharge as defined above, $N_t$ is the noise in the transfer function model, and $Y_t$ and $a_t$ are as defined earlier. The residual standard deviation for model (2) was 0.09156, compared with 0.1033 for the univariate model (1), a significant, though not astonishing, improvement. Note that introducing the discharge as input to the model has removed the need for the seasonal (lag-52) term found in model (1).

4.3 Residual analysis
Figure 5 shows a histogram of the residuals from model (2). As for model (1), there is no reason to doubt the normality assumption.
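Model (2) can be evaluated recursively. Writing x_t = ln X_t - 4.38, the transfer component v_t satisfies (1 - 0.55B)v_t = (-0.17 - 0.05B)x_t, that is, v_t = 0.55 v_{t-1} - 0.17 x_t - 0.05 x_{t-1}; the noise N_t = Y_t - 5.19 - v_t is then whitened by the AR(2) filter. The sketch below uses the printed estimates; the function names and the zero starting values are assumptions. It also computes the long-run gain (-0.17 - 0.05)/(1 - 0.55), about -0.49 pH units per unit of ln discharge, and the relative reduction in residual standard deviation between models (1) and (2).

```python
# Estimates printed for model (2).
MU_Y, MU_X = 5.19, 4.38
DELTA, OMEGA0, OMEGA1 = 0.55, -0.17, -0.05  # (1 - 0.55B)^-1 (-0.17 - 0.05B)
AR1, AR2 = 0.73, 0.12                       # (1 - 0.73B - 0.12B^2) N_t = a_t


def transfer_component(log_flow):
    """v_t = DELTA*v_{t-1} + OMEGA0*x_t + OMEGA1*x_{t-1}, with x_t = ln X_t - MU_X.

    Starts from v = 0 and x = 0 before the first observation (an assumption);
    the start-up effect decays by a factor 0.55 per week.
    """
    v, v_prev, x_prev = [], 0.0, 0.0
    for lx in log_flow:
        x = lx - MU_X
        v_prev = DELTA * v_prev + OMEGA0 * x + OMEGA1 * x_prev
        x_prev = x
        v.append(v_prev)
    return v


def residuals_model2(ph, log_flow):
    """Noise N_t = Y_t - MU_Y - v_t, then a_t = N_t - AR1*N_{t-1} - AR2*N_{t-2}."""
    noise = [y - MU_Y - v for y, v in zip(ph, transfer_component(log_flow))]
    return [noise[t] - AR1 * noise[t - 1] - AR2 * noise[t - 2] for t in range(2, len(noise))]


def steady_state_gain():
    """Long-run change in pH per unit change in ln(discharge)."""
    return (OMEGA0 + OMEGA1) / (1.0 - DELTA)


def relative_reduction(sd_before, sd_after):
    """Fractional reduction in residual standard deviation."""
    return 1.0 - sd_after / sd_before


if __name__ == "__main__":
    print(steady_state_gain())                  # pH units per unit of ln(discharge)
    print(relative_reduction(0.1033, 0.09156))  # model (1) versus model (2)
```

With these coefficients, a sustained doubling of the seven-day mean discharge would lower the pH by about 0.49 × ln 2, roughly 0.34 units, in the long run, and 1 - 0.09156/0.1033 reproduces the 11.4% reduction quoted later in the text.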

As in Table 1 for the univariate model, Table 2 gives the mean and standard deviation of the residuals from model (2), by year and season.

Fig. 5. Histogram of the residuals from model (2).

TABLE 2
Mean and standard deviation of the residuals from model (2), by year and season. The means marked * are significantly different from 0 at the 5% level, and the standard deviations marked * are significantly different from 0.09156, the estimated overall standard deviation.

Year    Mean     St.dev.   # obs.
1970     .007    .089      36
1971     .019    .070*     52
1972     .021    .079      53
1973    -.006    .092      52
1974    -.005    .099      52
1975     .013    .097      52
1976    -.012    .085      52
1977    -.009    .080      53
1978    -.006    .094      52
1979    -.011    .124*     52
1980    -.008    .087      52

The entries in Table 2 have a much more homogeneous appearance than those in Table 1. There is no reason to say that there has been any change through the years, although it is interesting to note that the means are all negative from 1976 on. The seasons still differ. Summer still has a clearly positive mean, and the spring mean has become negative. The standard deviation, however, has been significantly reduced for all four seasons. The introduction of the discharge has thus enabled us to adjust for most of the seasonal behaviour, but it will be necessary to develop a more complex model to remove it completely.

Overall, the introduction of discharge reduced the residual standard deviation from 0.1033 to 0.09156, that is, by 11.4%. Comparison of Tables 1 and 2 shows that the reduction is not uniform over the seasons. The largest reduction is for the autumn, 18.8%. The reductions for summer and spring are 15.7% and 10.9%, respectively, while the reduction during the winter period is only 1.4%. The main reason for this is that the discharge is much more stable during the winter period than during the rest of the year. Since there is little variation in the flow, only a small proportion of the variation in the acidity can be explained by it, and it matters little for the residual standard deviation whether the discharge is included in the model or not.

5. CONCLUSIONS
We have analysed pH data from three rivers in southern Norway. In Nid River and Tovdal River observations were taken weekly, while the pH in Mandal River was observed twice a month. All three series are well described by simple univariate ARIMA models. A small but significant term is included to account for the seasonal variation in the series. The model structure is very similar for all three rivers.

Two of the rivers, Nid River and Mandal River, are exploited extensively for power production. Most of the discharge in these rivers consists of storage water, which is known to have a more stable water chemistry than freely running rivers. This is reflected in the residual standard deviation of the pH, which is much larger for the unexploited river.

The discharge affects the acidity in all three rivers. The relationship between flow and pH has been described by a transfer function model, where the input is the logarithm of the average discharge during the last seven days prior to the pH observation. The effect, however, is much more pronounced in the two exploited rivers, which again can be explained by the stable chemistry of the storage water.

The use of discharge as input to the model has removed most of the seasonal variation in all three series. For the controlled rivers it is fairly satisfactory to apply the same model for the whole year, independent of the season. For the uncontrolled Tovdal River, this is not the case. Here the relation between discharge and acidity exists only during the summer and autumn. The explanation lies in the snow, which stores most of the water during the winter and early spring, discharging it again during the spring flood. Thus, during the winter the discharge in Tovdal River consists mainly of local rain and snowmelt. During the spring, flood water from the melting snow constitutes most of the discharge, while the summer and autumn discharge is mostly rain coming more or less directly into the river. It is not surprising that the relationship between discharge and acidity differs between these situations. For the exploited rivers, the discharge is mostly reservoir water. The inputs to the reservoirs of course have the same various sources as described for Tovdal River, but the homogenization process in the reservoirs results in a model which can be used for the whole year.

The analysis gives no reason to say that there has been any systematic trend in the river acidity. After adjustment for the variations in the discharge, there is vague evidence that the rivers were more acid during the years 1977-79. This is somewhat surprising compared with other investigations, which found a significant deterioration in river acidity in southern Norway (Henriksen et al., 1981). The explanation is that earlier studies used data all the way back to 1966, and thus caught the deterioration during the late sixties. The situation has evidently stabilized during the seventies, when our data were collected.

REFERENCES
Box, G.E.P. and Jenkins, G.M., 1970. Time Series Analysis, Forecasting and Control.
Cleveland, W.S., 1979. Robust Locally Weighted Regression and Smoothing Scatterplots. JASA, 74: 829-836.
Damsleth, E., 1979. Analysis of Multi-input Transfer Function Models when the Inputs are Correlated. NCC Report no. 641, Norwegian Computing Center, Oslo.
Damsleth, E., 1980. Estimating Missing Values in a Time Series. Scand. J. Stat., 7: 33-39.
Damsleth, E., 1984a. Time Series Analysis of Pollution Data - A Methodological Study (in Norwegian). NCC Report no. 745, Norwegian Computing Center, Oslo.
Damsleth, E., 1984b. Long Range Transport of Air Pollution into Norway - a Transfer Function Approach. MIC, 5: 141-150.
Henriksen, A., Snekvik, E. and Volden, R., 1981. Changes in pH during the period 1966-1979 for 38 Norwegian Rivers (in Norwegian). Governmental Program for Pollution Monitoring, Report no. 2/1981. Norwegian Institute for Water Research, Oslo.
Overein, L.A., Seip, H.M. and Tollan, A., 1980. Acid Precipitation - Effects on Forest and Fish. SNSF Project, FR 19/80, Oslo.

SULPHATE, WATER COLOUR AND DISSOLVED ORGANIC CARBON RELATIONSHIPS IN ORGANIC WATERS OF ATLANTIC CANADA

G.D. HOWELL and T.L. POLLOCK

Monitoring and Surveys Division, Water Quality Branch, Inland Waters Directorate, Atlantic Region

ABSTRACT
This study utilizes a large data set to re-examine some of the analytical difficulties associated with the determination of sulphate in highly organic waters. As a result of colour interference (e.g. organic anions) and the inability of colourimetric determinations to quantitatively determine sulphate concentrations, analytical laboratories often perform parallel colourimetric and ion chromatographic sulphate analyses. Using data from Atlantic Canada, a water colour threshold value of 20 Hazen units, below which there is no significant difference between the two analytical techniques, can be established. Eighteen data sets were examined to determine whether or not historical colourimetric sulphate concentrations could be corrected for organic anion interference. Nine of these data sets not only had good linear relationships (r² of .65) between ΔSO4 (SO4MTB - SO4IC) and water colour but also exhibited similar slopes. Similar results were observed for relationships between ΔSO4 and DOC, although correlation coefficients were lower than those observed for ΔSO4 versus colour regressions. A few exceptions were observed for headwater sites or sites draining headwaters (i.e. those which have highly variable water colours). These results indicate that the correction of SO4MTB data can be accomplished using one general correction equation.

ACKNOWLEDGEMENTS
The authors wish to thank the staff of the Analytical Services Division of the Water Quality Branch who analysed the water samples. In addition, appreciation is extended to Water Quality Branch, Canadian Wildlife Service and Department of Fisheries and Oceans personnel who collected the large number of samples which formed the basis of the data set used in this paper. Computer assistance was provided by D. Bingham and L. Wong, and the manuscript was typed by L. Boulter.

INTRODUCTION

Recent strong interest in the effects of the atmospheric deposition of mineral acids on surface waters has increased the demand for reliable analytical determination of nitrate, sulphate and hydrogen ion concentrations. As both hydrogen ion and nitrate concentrations are mediated by a complex series of biotic and abiotic processes, sulphate has been used as the more conservative indicator of hydrogen ion loading. However, there is evidence that sulphate is also mediated by biotic and abiotic processes (Rao et al., 1984; Nriagu, 1984) and thus does not behave conservatively. Regardless of these observations, sulphate has been used extensively for the calculation of target loadings and as a basis for the development of several acidification models (Henriksen, 1979; Thompson, 1982).

During the 1970's, sulphate concentrations were generally determined using an automated methyl thymol blue (MTB) procedure (Lazrus et al., 1966; Analytical Methods Manual, 1979). Cronan (1979) indicated that the MTB colourimetric method is highly susceptible to interference in organic coloured waters. Although the interference mechanism is not well understood, it was hypothesized that the colour interference is chemical rather than physical: organic ions present in solution compete with sulphate ions for available barium, resulting in an over-estimation of the actual sulphate concentration. This contention is supported by the fact that, with MTB sulphate concentrations, good ion balances are observed for these coloured waters even though there is no measure of organic anion concentration. This would suggest that the MTB sulphate determination gives an indication of sulphate plus organic anion concentration.

The techniques employed to determine sulphate concentrations in the aquatic milieu have progressed considerably, reflecting advances in both analytical methodology and automated procedures. With the development of reliable ion chromatographic techniques in the late 1970's, many laboratories switched from automated colourimetric sulphate analysis to ion chromatography. In November of 1981, the Water Quality Branch, Atlantic Region, began ion chromatographic sulphate analysis for samples collected in LRTAP programs. As the waters of the Atlantic Region of Canada are characteristically highly coloured, it was necessary to document the effects of dissolved organic matter on MTB sulphate measurements, and thus sulphate concentrations in all samples were measured by both ion chromatography and MTB colourimetry.

Using a small data set, Kerekes et al. (1984) observed large differences between colourimetric and ion chromatographic sulphate (ΔSO4) for samples with high water colours and dissolved organic matter. These findings had serious implications regarding the use of MTB sulphate data, particularly when considering seasonal trends and sulphate mass budgets in organic systems. In an attempt to salvage historic MTB sulphate data, these authors investigated the possibility of correcting MTB sulphate for colour interference. They observed that, although site-specific relationships did exist for ΔSO4 versus water colour and ΔSO4 versus DOC, a general correction for the organic interference of MTB sulphate did not appear to be feasible.

Although the empirical formula of Oliver et al. (1983) has been used to estimate organic anion concentration from DOC and pH, it has not been widely adopted as an ion balancing tool, and there is no convenient technique for the direct determination of organic anion concentration. Thus, despite the relatively more accurate SO4 (IC) method for coloured waters (Cheam et al., 1985), the analytical chemist still requires SO4 (MTB) for ion balancing purposes. The need to analyze samples by both methods places a further burden on the analytical resources of the laboratory because of the time and cost of the dual determination. The primary goal of this paper is to use the large available data set to establish a water colour threshold below which there is no significant difference between the SO4 (IC) and SO4 (MTB) methods. A secondary objective is to investigate the possibility of using empirical relationships to correct the large amount of historical SO4 (MTB) data.

METHODS

Ion chromatographic sulphate was determined according to NAQUADAT method code 16309 (WQB, 1983), while automated MTB sulphate was measured using NAQUADAT method code 16304. Apparent water colour was estimated with a Hellige visual comparator equipped with calibrated 200 mm Nessler tubes and a comparator colour disc ranging from 0 to 100 Hazen units (NAQUADAT 02011L). Water colours greater than 100 were determined by an appropriate sample dilution.

The data set employed included all Water Quality Branch, Atlantic Region samples collected between Nov. 1981 and July 1984 and originally analyzed using both sulphate methods. The initial laboratory screening process eliminated precipitation and groundwater samples and all samples with SO4 (IC) or SO4 (MTB) concentrations above 20 mg/L. Samples above 20 mg/L were eliminated to overcome interpretation problems associated with sample dilution at high concentrations. Additional screening eliminated samples with water colours above 100 Hazen units, because up to 1983 water colours above 100 were recorded in the data set as GT 100. This screening process resulted in a final data set of 2682 samples. Statistical analysis was performed on a VAX 11/750 computer equipped with the RS/1 data management system. Data were tested for normality using the Kolmogorov-Smirnov test, and sample median values were compared using an unpaired Mann-Whitney test.

RESULTS AND DISCUSSION

To facilitate the establishment of a water colour threshold value, the data set was sorted and subdivided into a series of water colour ranges. Water colour was chosen rather than dissolved organic carbon because colour is an easily measured parameter that is determined on most samples shortly after receipt at the laboratory.

TABLE 1
Results of Mann-Whitney Unpaired Test for Four Water Colour Ranges

Colour range    SO4 (IC)           SO4 (MTB)          Z           Significance
(Hazen units)   Median (mg/L)  N   Median (mg/L)  N   statistic   level
0-20            3.1          869   3.2          864   -1.5        0.132*
20-30           2.9          183   3.2          183   -2.8        0.004**
30-40           2.6          279   3.2          275   -6.2        0.0001**
40-50           2.6          488   3.3          479   -8.5        0.0001**

*  = No significant difference between medians at 95% CI
** = Significant difference between medians at 95% CI
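The unpaired Mann-Whitney comparison in Table 1 can be sketched as follows. This is a minimal illustration, not the RS/1 routine used in the study. It pools the SO4 (IC) and SO4 (MTB) values, assigns average ranks to ties, and converts U to a Z score with a tie-corrected normal approximation. The function name `mann_whitney_z` is hypothetical, and the paper does not say whether a tie or continuity correction was applied.

```python
import math
from statistics import NormalDist


def mann_whitney_z(x, y):
    """Unpaired Mann-Whitney test using a tie-corrected normal approximation.

    x, y: concentrations for the two groups, e.g. SO4 (IC) and SO4 (MTB) in mg/L.
    Returns (U for group x, Z score, two-sided p-value). A negative Z means
    the x values tend to rank below the y values.
    """
    n1, n2 = len(x), len(y)
    if n1 == 0 or n2 == 0:
        raise ValueError("both groups need at least one value")
    n = n1 + n2
    # Pool both groups, tagging each value with its group (0 = x, 1 = y).
    pooled = sorted([(v, 0) for v in x] + [(v, 1) for v in y])
    ranks = [0.0] * n
    tie_sum = 0.0  # sum of (t**3 - t) over runs of tied values
    i = 0
    while i < n:
        j = i
        while j + 1 < n and pooled[j + 1][0] == pooled[i][0]:
            j += 1
        avg_rank = (i + j) / 2 + 1  # tied values share the mean of their 1-based ranks
        for k in range(i, j + 1):
            ranks[k] = avg_rank
        t = j - i + 1
        tie_sum += t ** 3 - t
        i = j + 1
    rank_sum_x = sum(r for r, (_, group) in zip(ranks, pooled) if group == 0)
    u_x = rank_sum_x - n1 * (n1 + 1) / 2
    mean_u = n1 * n2 / 2
    var_u = n1 * n2 / 12 * ((n + 1) - tie_sum / (n * (n - 1)))
    if var_u <= 0:
        raise ValueError("all values are tied; the test is undefined")
    z = (u_x - mean_u) / math.sqrt(var_u)
    p = 2 * (1 - NormalDist().cdf(abs(z)))
    return u_x, z, p


if __name__ == "__main__":
    # Hypothetical values for one colour range, not data from the study.
    ic = [2.8, 3.0, 3.1, 2.6, 3.4]
    mtb = [3.2, 3.5, 3.3, 3.9, 3.6]
    u, z, p = mann_whitney_z(ic, mtb)
    print(f"U={u:.1f}  Z={z:.2f}  p={p:.3f}  significant at 95%: {p < 0.05}")
```

With SO4 (IC) passed as `x`, a negative Z matches the sign convention of Table 1, where IC medians fall below MTB medians.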

Table 1 presents the Mann-Whitney test of the median concentrations of SO4 (MTB) and SO4 (IC) for four colour ranges. These results indicate no significant difference in median values only for the 0-20 Hazen units colour range. Thus a cut-off level of 20 Hazen units can be established, below which it is unnecessary to analyze samples by both sulphate methods. Although this may be statistically valid, it may not be necessary to adhere rigidly to the 20 Hazen unit limit at an operational level.

The results of linear regression analysis of SO4 (IC) versus SO4 (MTB) for five colour ranges are presented in Figure 1.

[Figure 1: regression plots of predicted SO4 (MTB) against SO4 (IC), one panel for each of five colour ranges]

Fig. 1. Sulphate (IC) vs Sulphate (MTB) for Five Colour Ranges. Sulphate concentrations greater than 8.0 mg/L were excluded from the regression analysis because a few high points unduly influenced the regression.

Using these regression equations, predicted SO4 (MTB) can be calculated for any given value of SO4 (IC) in each water colour range (Table 2). This approach indicates that for the 0-20 water colour range there is little difference between SO4 (IC) and SO4 (MTB), which corroborates the results of the Mann-Whitney test of medians. For the 20 to 40 Hazen unit range, considerable differences between SO4 (IC) and predicted SO4 (MTB) are apparent up to the 3 mg/L level. As the water colour range increases, significant differences between the two sulphate methods are observed at higher SO4 (IC) concentrations. In fact, for the 80-90 water colour range, large differences are observed up to a SO4 (IC) concentration of 6 mg/L.
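A minimal sketch of the per-range regression described above follows. It assumes ordinary least squares of SO4 (MTB) on SO4 (IC), which is the direction implied by Table 2, where SO4 (MTB) is predicted from SO4 (IC). It also assumes a pair is dropped when either concentration exceeds 8.0 mg/L, because the caption does not say which variable the cut-off applied to. The function name `fit_mtb_on_ic` is hypothetical.

```python
def fit_mtb_on_ic(pairs, max_conc=8.0):
    """Ordinary least-squares fit of SO4 (MTB) on SO4 (IC) for one colour range.

    pairs: iterable of (so4_ic, so4_mtb) in mg/L.
    Pairs with either value above max_conc are excluded (an assumed reading
    of the 8.0 mg/L cut-off).
    Returns (slope, intercept, r_squared, n_used).
    """
    kept = [(ic, mtb) for ic, mtb in pairs if ic <= max_conc and mtb <= max_conc]
    n = len(kept)
    if n < 3:
        raise ValueError("need at least three pairs after screening")
    mean_ic = sum(ic for ic, _ in kept) / n
    mean_mtb = sum(mtb for _, mtb in kept) / n
    sxx = sum((ic - mean_ic) ** 2 for ic, _ in kept)
    syy = sum((mtb - mean_mtb) ** 2 for _, mtb in kept)
    sxy = sum((ic - mean_ic) * (mtb - mean_mtb) for ic, mtb in kept)
    if sxx == 0 or syy == 0:
        raise ValueError("SO4 (IC) or SO4 (MTB) values are all identical")
    slope = sxy / sxx
    intercept = mean_mtb - slope * mean_ic
    r_squared = sxy * sxy / (sxx * syy)  # coefficient of determination
    return slope, intercept, r_squared, n


if __name__ == "__main__":
    # Hypothetical pairs for one colour range; the last one is screened out.
    data = [(1.2, 2.0), (2.1, 2.9), (3.0, 3.5), (4.4, 4.9), (5.2, 5.6), (9.5, 12.0)]
    print(fit_mtb_on_ic(data))
```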

TABLE 2
Difference between SO4 (IC) and SO4 (MTB) predicted from regression equations of specific colour ranges

Sulphate IC   Predicted sulphate MTB (mg/L), by colour range (Hazen units)
(mg/L)        0-20    20-40   45-60   65-80   80-90
1             1.30    2.00    2.10    2.40    2.50
2             2.30    2.80    2.90    3.20    3.30
3             3.20    3.60    3.80    4.10    4.20
4             4.20    4.40    4.60    4.90    5.00
5             5.10    5.20    5.40    5.70    5.90
6             6.10    6.10    6.30    6.50    6.70
7             7.00    7.00    7.10    7.30    7.60
8             8.00    7.80    8.00    8.20    8.40
Slope         0.96    0.85    0.84    0.82    0.85
Intercept     0.35    1.03    1.24    1.60    1.63
R²            0.95    0.87    0.87    0.79    0.73
N             867     440     462     380     319

The histogram of SO4 (IC) concentration values for this region (Figure 2) indicates that the majority of values fall in the lower concentration classes. This suggests that a water colour threshold above 20 Hazen units is not feasible and that the cut-off level should be set at 20.
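Each entry in Table 2 comes from evaluating the regression equation for its colour range: predicted SO4 (MTB) = slope × SO4 (IC) + intercept. The sketch below assumes the slope and intercept pairs belong to the colour ranges in the order they are printed (0-20, 20-40, 45-60, 65-80, 80-90). Because the published coefficients are rounded, recomputed values can differ from the tabulated ones by about 0.1 mg/L. The names `COEFFS`, `predicted_mtb` and `predicted_bias` are hypothetical.

```python
# Slope and intercept per colour range (Hazen units), as printed in Table 2.
# Assumption: the coefficients are listed in the same order as the colour ranges.
COEFFS = {
    "0-20": (0.96, 0.35),
    "20-40": (0.85, 1.03),
    "45-60": (0.84, 1.24),
    "65-80": (0.82, 1.60),
    "80-90": (0.85, 1.63),
}


def predicted_mtb(so4_ic, colour_range):
    """Predicted SO4 (MTB), mg/L, for a given SO4 (IC), mg/L."""
    slope, intercept = COEFFS[colour_range]
    return slope * so4_ic + intercept


def predicted_bias(so4_ic, colour_range):
    """Predicted SO4 (MTB) minus SO4 (IC): the expected MTB over-estimate, mg/L."""
    return predicted_mtb(so4_ic, colour_range) - so4_ic


if __name__ == "__main__":
    print("SO4 (IC)  " + "  ".join(f"{r:>6}" for r in COEFFS))
    for ic in range(1, 9):
        row = "  ".join(f"{predicted_bias(ic, r):+6.2f}" for r in COEFFS)
        print(f"{ic:>8}  {row}")
```

For example, at 3 mg/L SO4 (IC) the 20-40 range equation gives an expected bias of about 0.58 mg/L. The 0-20 equation gives only about 0.23 mg/L.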

[Figure 2: histogram of frequency against SO4 (IC) concentration class, mg/L]

Fig. 2. Frequency Distribution of Sulphate (IC) Concentrations for the Atlantic Region.

However, in regions where SO4 (IC) concentrations are higher, it may be possible to increase the 20 Hazen units limit.

With the realization of the problems associated with SO4 (MTB) analysis in coloured water, several authors have investigated the possibility of correcting historical SO4 (MTB) data. They used relationships between ΔSO4 (SO4 (MTB) - SO4 (IC)) and both water colour and dissolved organic carbon (Kerekes et al., 1984; Pollock and Komadina, 1983). However, the relationships observed were very site specific. It was suggested that, based on the limited available data, an overall correction equation for SO4 (MTB) was unfeasible.

TABLE 3
Regression analysis of delta sulphate vs water colour for several sites in the Atlantic Region.

Site  Name                  Slope   Intercept  R²      F-Value  Sample size
1     Mt Tom Bk             0.021   -0.108     0.860    144.0     26
2     Atkins Bk             0.020    0.177     0.850    105.0     20
3     West River            0.018    0.269     0.850    117.0     22
4     NS Lakes              0.021   -0.127     0.810   1294.0    307
5     NS Basin 'E'          0.020   -0.201     0.790   4082.0   1115
6     Nfld Lakes            0.016    0.289     0.790    386.0    103
7     Moose Pit Bk          0.022   -0.445     0.770    557.0    166
8     Pebble. Outflow       0.016    0.091     0.750    119.0     41
9     Rodgers Bk            0.017   -0.069     0.690    732.0    328
10    Nfld Rivers           0.016    0.044     0.520    214.0    196
11    Upper Mersey River    0.015    0.061     0.320    128.0    268
12    Little River          0.011    0.465     0.320     13.0     30
13    Grafton Bk            0.013    0.129     0.280     67.0    170
14    Whiteburn Bk          0.013    0.151     0.210     16.0     60
15    NS Basin 'DA to DN'   0.013    0.039     0.180     20.0     91
16    NB Rivers             0.011    0.097     0.150     43.0    252
17    Lower Mersey River    0.012    0.107     0.140     15.0     88
18    NB Lakes              0.002    0.181     0.004      0.3     75

Table 3 presents the results of linear regression analysis of ΔSO4 versus water colour for eighteen sites. Of the sites investigated, only 50% had good linear relationships (r² > 0.65) between ΔSO4 and colour. Relationships between ΔSO4 and DOC were also considered. However, the coefficients of determination were in most cases considerably lower than those found using water colour (Table 4).
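For a simple linear regression with n samples, the F-value in Table 3 is tied to the coefficient of determination by F = R²(n - 2)/(1 - R²). The sketch below uses that identity as a consistency check on the tabulated values. It also counts how many of the eighteen sites pass the r² > 0.65 screen used in the text. The function names are hypothetical. Small discrepancies against the table are expected because R² is printed to only two or three decimals.

```python
# R^2 and sample size for the 18 sites of Table 3 (delta sulphate vs water colour).
R2 = [0.860, 0.850, 0.850, 0.810, 0.790, 0.790, 0.770, 0.750, 0.690,
      0.520, 0.320, 0.320, 0.280, 0.210, 0.180, 0.150, 0.140, 0.004]
N = [26, 20, 22, 307, 1115, 103, 166, 41, 328,
     196, 268, 30, 170, 60, 91, 252, 88, 75]


def f_from_r2(r2, n):
    """F statistic (1 and n - 2 degrees of freedom) of a simple linear regression."""
    return r2 * (n - 2) / (1.0 - r2)


def count_good_fits(r2_values, cutoff=0.65):
    """Number of sites whose coefficient of determination exceeds the cutoff."""
    return sum(1 for r2 in r2_values if r2 > cutoff)


if __name__ == "__main__":
    for site, (r2, n) in enumerate(zip(R2, N), start=1):
        print(f"site {site:2d}: recomputed F = {f_from_r2(r2, n):8.1f}")
    good = count_good_fits(R2)
    print(f"{good} of {len(R2)} sites have r^2 > 0.65")
```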

TABLE 4
Regression analysis of delta sulphate vs DOC (mg/L) for several sites in the Atlantic Region.

Site  Name                  Slope    Intercept  R²      F-Value    Sample size
1     Moose Pit Bk           0.175   -0.262     0.810    688.000    167
2     NS Basin 'E'           0.185   -0.220     0.730   2667.000   1010
3     Nfld Lakes             0.137    0.039     0.730    235.000     86
4     Mt Tom Bk              0.179    0.423     0.670     33.800     18
5     NS Lakes               0.219   -0.291     0.670    446.000    222
6     Rodgers Bk             0.149   -0.045     0.620    543.000    333
7     Little River           0.171    0.099     0.610     26.500     19
8     West River             0.110    1.720     0.580     19.500     15
9     Pebble. Outflow        0.181    0.170     0.550     41.300     35
10    Atkins Bk              0.126    2.080     0.500     13.200     14
11    Nfld Rivers            0.139   -0.003     0.370     76.700    131
12    Upper Mersey River     0.109    0.365     0.320     86.500    187
13    Whiteburn Bk           0.137   -0.100     0.299     28.100     61
14    NS Basin 'DA to DN'    0.133   -0.231     0.298      2.600     62
15    Grafton Bk             0.051    0.321     0.079     11.300    132
16    Lower Mersey River     0.071    0.426     0.079      0.008     64
17    NB Rivers              0.032    0.335     0.014      5.400    187
18    NB Lakes              -0.002    0.242     0.000      0.300     75

Some insight into the problems of SO4 (MTB) correction may be gained from the results presented in Table 3. Without exception, the first nine sites, which have good relationships between ΔSO4 and water colour, have similar slopes (.016-.022) and intercepts (-0.5 to 0.3). This would suggest that these sites may have similar types of colour-causing organic matter. To use the terminology of Cheam et al. (1985), they represent "iso-chrome" sites.

Statistics describing the water colour distribution of these sites (Table 5) indicate that the sites with high water colours and large standard deviations have the best r² relationships. In fact, with the exception of the Newfoundland lakes, the first nine sites had water colour standard deviations considerably higher than those observed for the other sites. The Newfoundland lakes deviate from this pattern with a low standard deviation. This may be explained by the fact that these headwater lakes are sampled during homothermal conditions in the spring and fall of each year. The effects of seasonal changes on the ΔSO4 versus water colour relationship have therefore been eliminated.
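The grouping argument above amounts to checking whether a site's ΔSO4 versus colour regression falls inside a common band of slope (.016-.022) and intercept (-0.5 to 0.3). A sketch of that check using the Table 3 coefficients follows. The band limits come from the text; the names `in_common_band` and `TABLE3` are hypothetical. Falling inside the band is not sufficient on its own: Nfld Rivers (site 10) also lies within it but has r² = 0.52.

```python
SLOPE_BAND = (0.016, 0.022)     # mg/L delta SO4 per Hazen unit
INTERCEPT_BAND = (-0.5, 0.3)    # mg/L

# (site name, slope, intercept) for the 18 sites of Table 3, in table order.
TABLE3 = [
    ("Mt Tom Bk", 0.021, -0.108), ("Atkins Bk", 0.020, 0.177),
    ("West River", 0.018, 0.269), ("NS Lakes", 0.021, -0.127),
    ("NS Basin 'E'", 0.020, -0.201), ("Nfld Lakes", 0.016, 0.289),
    ("Moose Pit Bk", 0.022, -0.445), ("Pebble. Outflow", 0.016, 0.091),
    ("Rodgers Bk", 0.017, -0.069), ("Nfld Rivers", 0.016, 0.044),
    ("Upper Mersey River", 0.015, 0.061), ("Little River", 0.011, 0.465),
    ("Grafton Bk", 0.013, 0.129), ("Whiteburn Bk", 0.013, 0.151),
    ("NS Basin 'DA to DN'", 0.013, 0.039), ("NB Rivers", 0.011, 0.097),
    ("Lower Mersey River", 0.012, 0.107), ("NB Lakes", 0.002, 0.181),
]


def in_common_band(slope, intercept):
    """True if a site's regression lies inside the shared slope/intercept band."""
    lo_s, hi_s = SLOPE_BAND
    lo_i, hi_i = INTERCEPT_BAND
    return lo_s <= slope <= hi_s and lo_i <= intercept <= hi_i


if __name__ == "__main__":
    for name, slope, intercept in TABLE3:
        print(f"{name:22s} {'in band' if in_common_band(slope, intercept) else '-'}")
```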

TABLE 5
Mean, Median, St. Dev., Sample Size and Range of Water Colours for Selected Sites (water colour in Hazen units).

Site  Name                  Mean  Median  St. Dev.  Sample size  Range
1     Atkins Bk             176   210     94.2        20         280
2     Moose Pit Bk          147   140     61.2       174         260
3     West River            136   155     58.0        22         160
4     NS Lakes               51    30     56.8       307         240
5     Mt Tom Bk             105   110     52.8        26         180
6     NS Basin 'E'           70    60     51.1      1138         320
7     Rodgers Bk             86    70     49.6       330         220
8     Pebble. Outflow       114   120     49.2        41         180
9     Nfld Lakes             41    35     34.3       104         220
10    Nfld Rivers            38    35     33.7       196         320
11    NS Basin 'DA to DN'    38    35     31.3        91         140
12    Little River           81    80     28.2        30          90
13    NB Rivers              35    35     24.2       264         120
14    Upper Mersey River     75    80     21.9       268         200
15    Whiteburn Bk           50    50     18.0        60          70
16    Grafton Bk             33    30     16.9       170         110
17    Lower Mersey River     61    60     16.9        88          75
18    NB Lakes               16    10     12.5        76          45

In general, the sites which exhibit the greatest water colour variability are headwater streams or headwater lakes which drain bogs, or sites with some bog drainage. In all of these cases the percentage of the basin drainage area covered by lakes is low (Table 6). In contrast, the sites with low water colour standard deviations are either headwater lakes with no bog drainage or sites downstream of a series of one or more lakes. These lakes act as large storage reservoirs which damp the seasonal variability of organic matter, since such systems have longer water residence times. These observations imply a fractionation of both DOC and water colour.

The work of Glooshenko and Bourbonniere (1985) has indicated that fulvic acids comprise a large percentage (about 90%) of the humic acids in bog systems. Thus, although headwater bog systems exhibit significant variations in DOC concentrations and colours, it is likely that they exhibit large proportions of fulvic acids. Downstream systems exhibit low variability in DOC, but they may have undergone further fractionation of the DOC pool, altering the relative portion of fulvic and humic acids. This variation in the chemical composition of the various fractions of the organic carbon pool at these sites may explain the poor relationships observed between ΔSO4 and water colour, and between ΔSO4 and DOC.
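The per-site statistics in Table 5 (mean, median, standard deviation, sample size and range of water colour) can be reproduced from raw colour readings with Python's `statistics` module. This sketch assumes the sample (n - 1) standard deviation and defines range as maximum minus minimum; the paper does not state either convention. The function name `colour_summary` and the readings in the usage example are hypothetical.

```python
import statistics


def colour_summary(colours):
    """Summary statistics of water colour readings (Hazen units) for one site."""
    if len(colours) < 2:
        raise ValueError("need at least two readings for a standard deviation")
    return {
        "mean": statistics.mean(colours),
        "median": statistics.median(colours),
        "st_dev": statistics.stdev(colours),  # sample (n - 1) standard deviation
        "n": len(colours),
        "range": max(colours) - min(colours),
    }


if __name__ == "__main__":
    readings = [20, 35, 60, 80, 120, 150]  # hypothetical headwater-stream readings
    print(colour_summary(readings))
```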

These results suggest that it is possible to correct SO4 (MTB) data for bog-influenced headwater lakes, or for systems which drain only headwater lakes. Furthermore, the similarity of the regression equations for eight of the nine sites indicates that the observed ΔSO4 (mg/L) is generally 1.9% of the water colour. In other words, ΔSO4 in mg/L is roughly 0.019 times the water colour in Hazen units.

TABLE 6
Drainage basin areas and percent of area covered by lakes and bogs for eleven sites.

Site  Station                  Drainage     Number of lakes  Percent area      Percent area
                               area (km²)   in system        covered by lakes  covered by bogs
1     Moose Pit Brook           16.7         0                0.0              1.2
2     Atkins Brook              15.0         0                0.0              6.3
3     Mount Tom Brook           11.1         1*               1.4              2.7
4     West River               119.0         8*               2.4              3.2
5     Rodgers Brook             10.0         0                0.0              1.0
6     Pebbleloggitch Outflow     1.7         1*              20.6              8.8
7     Little River             131.0        14                6.2              1.8
8     Grafton Brook             52.9         8               12.4              1.9
9     Whiteburn Brook            7.1         1*               3.9              0.0
10    Upper Mersey River       295.0        46                7.6              2.3
11    Lower Mersey River       723.0        83                9.4              2.5

* Headwater lakes
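The finding that ΔSO4 (mg/L) is generally about 1.9% of the water colour suggests a simple general correction for historical MTB data at bog-influenced headwater sites: SO4 (IC) ≈ SO4 (MTB) - 0.019 × colour. The sketch below applies that factor and ignores the small site-specific intercepts (-0.5 to 0.3 mg/L). It is only meant for sites resembling the first nine in Table 3. Clipping negative results to zero is an added assumption, not part of the paper, and the names `GENERAL_SLOPE` and `corrected_so4` are hypothetical.

```python
GENERAL_SLOPE = 0.019  # mg/L of delta SO4 per Hazen unit (about 1.9 % of colour)


def corrected_so4(so4_mtb, colour_hazen, slope=GENERAL_SLOPE):
    """Estimate IC-equivalent sulphate (mg/L) from a historical MTB value.

    Assumes delta SO4 = SO4 (MTB) - SO4 (IC) is proportional to water colour,
    with the regression intercept neglected. Negative results are clipped to 0.
    """
    if colour_hazen < 0 or so4_mtb < 0:
        raise ValueError("concentration and colour must be non-negative")
    return max(0.0, so4_mtb - slope * colour_hazen)


if __name__ == "__main__":
    # Hypothetical historical record: 5.0 mg/L MTB sulphate at 100 Hazen units.
    print(round(corrected_so4(5.0, 100), 2))  # prints 3.1
```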

CONCLUSIONS

This paper employs a large data set to review the problems associated with the measurement of sulphate in coloured waters. A threshold value of 20 Hazen units was established, below which there is no significant difference between SO4 (IC) and SO4 (MTB). This cut-off level is appropriate for Atlantic Region lakes and rivers, but in areas with higher sulphate concentrations it may be possible to increase the threshold level to 40 Hazen units or greater.

The possibility of correcting historic SO4 (MTB) data was also investigated, and it was determined that an overall correction curve could not be developed. However, sites with high water colour standard deviation and limited water storage potential were observed to have good linear ΔSO4 vs water colour relationships, with similar slopes and intercepts. This indicated that bog-influenced sites with limited water storage potential can be corrected using general correction factors.

REFERENCES

Analytical Methods Manual, 1979. Inland Waters Directorate, Water Quality Branch, Ottawa, Canada.

Cheam, V., Chau, A. and Todd, S., 1985. Sulphate in Coloured Waters: Investigation on Methodologies, Data Reliability and Approaches to Salvage Historical Colorimetric Data. NWRI Report, Contribution Number 85-95.

Cronan, C.S., 1979. Determination of sulphate in organically coloured water samples. Anal. Chem. 51: 1333.

Glooshenko, W.A. and Bourbonniere, R.A., 1985. Impact of Organic Waters from Peatland Drainage on Aquatic Ecosystems. Long Range Transport of Air Pollutants Study Progress Report, 1984/85 Annual Report.

Henriksen, A., 1980. Acidification of freshwaters - a large scale titration. p. 68-74. In D. Drablos and A. Tollan (ed.) Ecological Impact of Acid Precipitation.

Kerekes, J., Howell, G. and Pollock, T., 1984. Problems associated with sulphate determination in coloured, humic waters in Kejimkujik National Park, Nova Scotia, Canada. Verh. Internat. Verein. Limnol. 22: 1811-1817.

Lazrus, A.L., Hill, K.C. and Lodge, J.P., 1965. A new colourimetric microdetermination of sulphate ion. Automatic Anal. Chem., p. 291.

Nriagu, J.O., 1984. Role of inland water sediments as sinks for anthropogenic sulphur. Sci. Total Environ. 38: 7-13.

Oliver, B.G., Thurman, E.M. and Malcolm, R.L., 1983. The Contribution of Humic Substances to the Acidity of Coloured Natural Waters. Geochim. Cosmochim. Acta, 47: 2031-2035.

Pollock, T.L. and Komadina, V.A., 1983. Determination of sulphate in Atlantic Canada surface waters. Workshop Proceedings, Kejimkujik Calibrated Catchments Program, April, 1983. Edited by J. Kerekes.

Rao, S.S., Jurkovic, A.A. and Nriagu, J.O., 1984. Bacterial Activity in Sediments of Lakes Receiving Acid Precipitation. Environmental Pollution (Series A) 36: 195-205.

Thompson, M.E., 1982. The cation denudation rate as a quantitative index of sensitivity of Eastern Canada rivers to acidic atmospheric precipitation. Water, Air, Soil Pollution 18: 215-226.

WQB, 1983. NAQUADAT Dictionary of Parameter Codes. Water Quality Branch, Environment Canada, Ottawa.

SULFATE IN COLOURED WATERS. I. EVALUATION OF CHROMATOGRAPHIC AND COLORIMETRIC DATA COMPATIBILITY

V. CHEAM, A.S.Y. CHAU AND S. TODD National Water Research Institute, Canada Centre for Inland Waters, Burlington, Ontario, Canada

ABSTRACT

The compatibility and reliability of colorimetric and chromatographic SO4 data were evaluated. The multiple standard addition technique was applied to numerous natural and humic acid fortified waters. A total of more than 20 different waters was used, in which the colour ranged from 50 to 440 H.U. and the organic carbon from 0.7 to 20 ppm. For the first time, it was demonstrated that Ion Chromatography (IC) data on organic-contaminated coloured waters are reliable. It was also confirmed that the Methyl Thymol Blue (MTB) colorimetric data were biased high. An approach for salvaging historical colorimetric data was found and is briefly discussed.

1. INTRODUCTION

There has been a great deal of discussion and concern over the analysis of sulfate in coloured waters. This is due to its importance in the study of acid rain, and to its questionable colorimetric data caused by interference from coloured matter in the waters. Early sulfate data were generated by the colorimetric method using methyl thymol blue (MTB). The validity of these data has been discussed in several papers, and many scientists believe that these data are biased high (Kerekes et al., 1984; Pollock, 1983; Underwood et al., 1983; Kerekes et al., 1982; Watt et al., 1983; Kerekes and Pollock, 1983; Underwood et al., 1982). The high bias of MTB results was suspected as early as 1979 by Cronan. In 1980, Crowther also reported high MTB results in comparison to ion chromatography (IC) results for water samples from the Dorset area. The report suggested that the colorimetric method was invalid due to the presence of tannins, lignins, humates and fulvates, whereas the IC methodology was relatively unaffected by these interferences. In 1982, Cheam conducted a special interlaboratory quality control study on soft and coloured waters and observed a significant difference between MTB and IC results. Although the IC methodology appears to be unaffected by colour interferences, the reliability of sulfate data generated by IC has not been established (Workshop, 1983). Uncertain data lead to uncertain interpretations and conclusions. If sound conclusions are to be drawn, the reliability of the analytical data must first be ensured. Thus, in this paper, we evaluate in detail the compatibility of MTB and IC data and establish their reliability (or non-reliability). The handling of historical data is also briefly discussed.

2. STUDY DESIGN

Establishing data compatibility or reliability would be greatly simplified if pertinent certified reference materials (CRMs) were available. Since there are no coloured water CRMs, the study design is more complicated and time consuming. The design uses the principle of multiple standard additions (Julshamm and Brackhan, 1975; Klein and Hack, 1977; Agemian and Cheam, 1978; Bader, 1980; Kalivas, 1983). It also uses many different types of organic-contaminated waters: seven different natural coloured waters from the Atlantic and Ontario regions (Table 1) and many humic acid fortified coloured waters (Table 2). By studying commercial humic acid (H.A.) along with the natural organic matter in coloured waters, we diversified the types of organic matter studied. At the same time, we were able to create a more even spread of colours.

3. EXPERIMENTAL

Multiple standard addition (MSA)

An advantage of MSA is its ability to determine the amount of analyte present in an unknown. Bader (1980) pointed out that in nearly every case, an appropriate method of standard addition can give the best absolute value for an unknown.
In our case, the MSA experimental design is presented schematically in Figure 1, where sample SO4-N represents each of the principal natural samples (Table 1) and humic acid fortified samples (Table 2).
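Multiple standard addition estimates the original concentration from the line of measured response against added concentration. If response = b·(x₀ + added), then x₀ equals the intercept divided by the slope, which is the magnitude of the x-axis intercept. The sketch below assumes a linear, blank-corrected response. It fits the addition line by least squares over all replicate readings at the 0, 0.5x₀, 1x₀ and 2x₀ levels. The function name `msa_estimate` and the example numbers are hypothetical.

```python
def msa_estimate(added, measured):
    """Estimate the original concentration by multiple standard addition.

    added:    SO4 concentrations added to each subsample (ppm), e.g. 0, 0.5*x0, x0, 2*x0
    measured: readings for the same subsamples (any linear, blank-corrected unit)
    Returns (original concentration, slope, intercept) of the addition line.
    """
    if len(added) != len(measured) or len(added) < 2:
        raise ValueError("need matching lists with at least two points")
    n = len(added)
    mean_a = sum(added) / n
    mean_m = sum(measured) / n
    saa = sum((a - mean_a) ** 2 for a in added)
    if saa == 0:
        raise ValueError("added concentrations must differ")
    sam = sum((a - mean_a) * (m - mean_m) for a, m in zip(added, measured))
    slope = sam / saa
    intercept = mean_m - slope * mean_a
    # The addition line crosses zero response at added = -intercept / slope.
    return intercept / slope, slope, intercept


if __name__ == "__main__":
    x0_guess = 2.0  # hypothetical approximate original concentration, ppm
    levels = [0.0, 0.5 * x0_guess, x0_guess, 2 * x0_guess]
    added = [a for a in levels for _ in range(6)]  # six replicate readings per level
    measured = [1.02 * (2.1 + a) for a in added]   # hypothetical noise-free readings
    print(msa_estimate(added, measured))
```

If the readings are already in concentration units, a slope near 1 corresponds to full recovery of the added sulphate.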

FIG 1. EXPERIMENTAL DESIGN FOR STUDYING NATURAL COLOURED WATER SAMPLES (TABLE 1) AND HUMIC ACID FORTIFIED WATER SAMPLES (TABLE 2)

(CMTB = SO4 concentration by MTB method; CIC = SO4 concentration by IC method; DOC = dissolved organic carbon)

TABLE 1.
IDENTIFICATION OF NATURAL SAMPLES

Sample name   Origin                                       Colour (H.U.)
SO4-I         Pebbleloggitch, Atlantic Region              100
SO4-III       Moose River, Ontario Region                   60
SO4-IV        Dickie Lake, Ontario Region (Dorset area)    100
SO4-V         Atkins Brook, Atlantic Region                160
SO4-VI        Upper Mersey River, Atlantic Region           90
SO4-VII       Mount Tom Brook, Atlantic Region             100
SO4-VIII      Sand Pond, Atlantic Region                   400

TABLE 2.
HUMIC ACID (HA) FORTIFIED COLOURED WATERS (All values are rounded design values)

Sample name   SO4 original   H.A. spike   Colour   pH         Ba spike
              spike (ppm)    (mg/L)       (H.U.)   adjusted   (ppm)
SO4-X         0               6            60      4.3        0
SO4-XI        0              10           100      4.3        0
SO4-XII       0              25           250      4.3        0
SO4-XIII      0              40           400      4.3        0
SO4-XIV       2               6            60      4.3        0
SO4-XV        2              10           100      4.3        0
SO4-XVI       2              25           250      4.3        0
SO4-XVII      2              40           400      4.3        0
SO4-XXXI      2              40           400      4.3        1

Before subjecting the samples to MSA, the approximate original SO4 concentrations, x₀, of the natural waters (Table 1) were determined. For the H.A. fortified samples SO4-X to SO4-XIII (Table 2), the original SO4 concentrations were found to be very small. Each was therefore spiked with 2 ppm SO4 to produce samples SO4-XIV to SO4-XVII, and with 5 ppm SO4 to produce samples SO4-XXII to SO4-XXV (Table 2). These spiked values were taken as the original approximate concentrations, x₀, of the fortified samples.

Each sample was then subsampled into four groups of triplicate subsamples according to the scheme in Fig. 1. A known amount of SO4 was added to each subsample to give a final added concentration of 0.0, 0.5 x₀, 1 x₀ or 2 x₀. (A stock solution of 1000 ppm SO4 was used, so only a very small stock volume was added and the dilution effect was negligible.) Each of these subsamples was further subdivided into duplicate subsamples for replicate analyses, as shown in Figure 1. Effectively, there are six replicate analyses for each group of subsamples with a known added SO4 level. The original spikes of 2 and 5 ppm SO4 (Table 2) were chosen because six of the seven natural samples (Table 1) had SO4 concentrations within this range.

The fortified waters at 250 H.U. colour (Table 2) were studied to provide a more evenly spaced colour range than that of the natural waters used (Table 1). The pH was also adjusted to approximately 4.3, which is within the usual pH of acid rain precipitation of 4-5 (Sillen and Martell, 1964). To avoid any Ba interference in the other samples, ~1 ppm Ba was added to only two samples, SO4-XXX and SO4-XXXI. These two samples were used to check whether the presence of Ba causes any interference or other problem.
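The note on the 1000 ppm stock can be quantified. To raise a subsample of volume V by an added concentration c using a stock of concentration S, the stock volume v must satisfy S·v/(V + v) = c, which gives v = cV/(S - c). The sketch below computes v and the resulting fractional dilution of the original sample for the 0, 0.5x₀, 1x₀ and 2x₀ levels. The 50 mL subsample volume and x₀ = 2 ppm are assumed for illustration and are not given in the text. The function names are hypothetical.

```python
def stock_volume_ml(added_ppm, sample_ml, stock_ppm=1000.0):
    """Stock volume (mL) that raises the final SO4 by added_ppm after mixing.

    Solves stock_ppm * v / (sample_ml + v) = added_ppm for v.
    """
    if not 0 <= added_ppm < stock_ppm:
        raise ValueError("added concentration must be below the stock concentration")
    return added_ppm * sample_ml / (stock_ppm - added_ppm)


def dilution_fraction(v_ml, sample_ml):
    """Fractional dilution of the original sample caused by adding v_ml of stock."""
    return v_ml / (sample_ml + v_ml)


if __name__ == "__main__":
    x0 = 2.0           # hypothetical original concentration, ppm
    sample_ml = 50.0   # hypothetical subsample volume, mL
    for factor in (0.0, 0.5, 1.0, 2.0):
        v = stock_volume_ml(factor * x0, sample_ml)
        print(f"add {factor * x0:.1f} ppm: {v * 1000:.1f} uL of stock, "
              f"dilution {100 * dilution_fraction(v, sample_ml):.2f} %")
```

Under these assumptions even the largest (2x₀) addition dilutes the sample by well under 1%, which is consistent with the "negligible dilution effect" noted above.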

Analyses

The ion chromatography analyses of SO4 were carried out using an automated Dionex 2100 system. All samples were filtered before being introduced into a 50 µL sample loop. The eluent was prepared by dissolving 2.25 g of NaHCO3 and 2.25 g of Na2CO3 in 10 L of deionized distilled water. The eluent flow rate was 2.0 mL/minute. The sample passed through a guard column (precolumn), a separator column, an anion fibre suppressor with dilute H2SO4 as regenerant, and finally a conductivity detector. The detected signal was amplified and converted to concentration with a Hewlett Packard recorder/integrator.

The colorimetric SO4 measurements were carried out using the automated methylthymol blue (MTB) method, coded as NAQUADAT No. 16306 (Environment Canada, 1981). The method uses an equimolar solution of BaCl2 and MTB. At low pH, Ba reacts with SO4; at high pH, the uncomplexed Ba then reacts with MTB. This leaves a grey uncomplexed MTB, which is measured and equated to the SO4 concentration present in the sample.

Dissolved organic carbon (DOC, in ppm) was analysed by the IR Analyser Method, Naquadat code No. 06101 (Environment Canada, 1981). pH was measured using a Radiometer PHM64 meter, and specific conductance using a CDM-83 conductivity meter. Ba was determined by the direct aspiration Atomic Absorption technique, Naquadat code No. 56101 (Environment Canada, 1981). Apparent colour, in Hazen Units, was determined by visual comparison using a Hellige Aqua Tester.

Chemicals, Glass and Plasticware

Humic acid was purchased from Aldrich Chemical Co. Inc. (Lot No. HD061197). Na2SO4, NaHCO3 and Na2CO3 were obtained from J.T. Baker Chemical Company. All containers were cleaned and tested (Cheam and Chau, 1982) and stored in distilled water for at least one week before use. Stocks and standards were made in volumetric flasks, whereas test samples were stored in plastic containers with sizes ranging from 50 mL to 500 mL.

4. RESULTS AND DISCUSSION

Compatibility and reliability of MTB and IC data

The applicability of the MSA requires that the recoveries be uniform, the addition line be straight and parallel to the standard line, the dilution effect be minimal, and the addition of standard be about 0.5, 1.0 and 2.0 times the original values in the samples (Figure 1).

In total, we applied the MSA procedure to 15 different samples: seven natural coloured waters (Table 1) and eight humic acid fortified coloured waters (Table 2). Each of these 15 water samples was analyzed by the IC and MTB methods before and after multiple standard additions. The general behaviour of MSA application to the MTB and IC methods for these 15 water samples is illustrated in Figure 2. The ordinate represents the analytical response, or the amount found by direct analysis, whereas the abscissa represents the concentration added and the amount "present" obtained by extrapolation of the addition line. The amount present is defined as the absolute abscissa value at the intersection of the abscissa and the extrapolated least-squares addition line.

TABLE 3. COMPARISON OF CMTB RESULTS OBTAINED BY DIRECT ANALYSIS AND BY MSA IN NATURAL AND FORTIFIED SAMPLES*

Sample       Direct Analysis (amount found)   MSA (amount present)
SO4-I         5.50                             6.64
SO4-III       9.60                             9.21
SO4-IV        4.73                            10.20
SO4-V         6.13                            10.52
SO4-VI        6.44                             6.80
SO4-VII       5.17                             7.47
SO4-VIII      9.57                             8.18
SO4-XIV       3.07                             3.54
SO4-XV        3.50                             4.11
SO4-XVI       5.82                             6.40
SO4-XVII      8.40                            10.95
SO4-XXII      6.75                             7.42
SO4-XXIII     5.38                             5.17
SO4-XXIV      7.63                             7.42
SO4-XXV      10.74                            12.14

*CMTB = SO4 concentration by MTB method.

TABLE 4. COMPARISON OF CIC RESULTS OBTAINED BY DIRECT ANALYSIS AND BY MSA IN NATURAL WATERS*

Sample       Direct Analysis (amount found)   MSA (amount present)
SO4-I        2.37 ± 0.16                      2.86 ± 0.18
SO4-III      8.95 ± 0.45                      8.99 ± 0.32
SO4-IV       1.63 ± 0.19                      1.61 ± 0.11
SO4-V        1.56 ± 0.31                      1.67 ± 0.08
SO4-VI       4.96 ± 0.15                      5.10 ± 0.12
SO4-VII      1.93 ± 0.36                      2.08 ± 0.15
SO4-VIII     2.39 ± 0.42                      1.95 ± 0.06

*CIC = SO4 concentration by IC method.

TABLE 5. COMPARISON OF CIC RESULTS OBTAINED BY DIRECT ANALYSIS AND BY MSA IN HUMIC ACID FORTIFIED SAMPLES* (AT 2 PPM SO4 SPIKE LEVEL)

Sample       Direct Analysis (Amount Found)   MSA (Amount Present)
SO4-X        0.06
SO4-XI       0.06
SO4-XII      0.08
SO4-XIII     0.11
SO4-XIV      2.17                             2.19 ± 0.19
SO4-XV       2.09                             2.03 ± 0.05
SO4-XVI      2.20                             2.09 ± 0.18
SO4-XVII     2.07                             2.05 ± 0.05
SO4-XXX      2.07
SO4-XXXI     2.18

*CIC = SO4 concentration by IC method.
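The definition of the amount "present" can be made concrete with a small least-squares sketch: fit a straight line to response versus concentration added, then take the absolute value of its abscissa intercept, a/b. This is a generic illustration of the standard-addition calculation, not the authors' own software, and the duplicate responses below are hypothetical.

```python
def fit_line(xs, ys):
    """Ordinary least-squares fit of y = a + b*x; returns (a, b)."""
    n = len(xs)
    if n < 2 or len(ys) != n:
        raise ValueError("need at least two paired points")
    mx, my = sum(xs) / n, sum(ys) / n
    sxx = sum((x - mx) ** 2 for x in xs)
    if sxx == 0:
        raise ValueError("added concentrations must not all be equal")
    sxy = sum((x - mx) * (y - my) for x, y in zip(xs, ys))
    b = sxy / sxx
    return my - b * mx, b


def amount_present(added, response):
    """Absolute abscissa value where the extrapolated addition line
    crosses zero response, i.e. |x-intercept| = a / b."""
    a, b = fit_line(added, response)
    return abs(-a / b)


if __name__ == "__main__":
    # Hypothetical duplicates spiked at 0, 0.5, 1 and 2 x0 (x0 about 2 ppm),
    # responses expressed in ppm read from the standard line.
    added = [0, 0, 1, 1, 2, 2, 4, 4]
    found = [2.02, 1.98, 3.01, 2.99, 4.03, 3.97, 6.02, 5.98]
    print(f"amount found (unspiked mean): {sum(found[:2]) / 2:.2f} ppm")
    print(f"amount present (MSA):         {amount_present(added, found):.2f} ppm")
```

Because the intercept is divided by the slope, the estimate is valid whether the response is in instrument units or in concentration units; it is only meaningful, however, when the addition line is actually straight.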

The MTB line (Figure 2) is curved, which indicates the existence of interference and hence uncertainty of the data. This curvature makes extrapolation meaningless. If the line were nevertheless extrapolated as a straight line, the amount "present" would be higher than the amount found, which is unacceptable. Table 3 summarizes the two types of amounts and indeed indicates that the amount present is in general higher than the amount found. Table 3 further shows that the results for samples SO4-XIV to SO4-XVII increase with colour and DOC and are much higher than the expected 2 ppm; likewise, the results for samples SO4-XXII to SO4-XXV increase with colour and DOC and are higher than the expected 5 ppm. Thus, the MTB results are not reliable. Finally, the MTB results are higher than, and thus not compatible with, the IC results (Table 3 vs. Tables 4, 5 and 6).

TABLE 6. COMPARISON OF CIC RESULTS OBTAINED BY DIRECT ANALYSIS AND BY MSA IN HUMIC ACID FORTIFIED SAMPLES* (AT 5 PPM SO4 SPIKE LEVEL)

Sample       Direct Analysis (Amount Found)   MSA (Amount Present)
SO4-XXII     5.68 ± 0.08                      5.54 ± 0.15
SO4-XXIII    5.09 ± 0.07                      5.03 ± 0.02
SO4-XXIV     5.16 ± 0.03                      5.05 ± 0.10
SO4-XXV      5.09 ± 0.06                      5.06 ± 0.17

*CIC = SO4 concentration by IC method.

The IC standard and addition lines representing each of the 15 samples are depicted in Figure 3 to Figure 17, respectively. For natural waters, the MSA plots of IC analyses (Figures 3-9) present the uniformity and parallelism dictated by the MSA criteria, and in every case except SO4-III the criteria are obeyed. For SO4-III (Figure 4), we observed slight downward curvature at the third addition (=2 x0), which corresponds to a high SO4 level of nearly 30 ppm. To avoid this concentration effect, we diluted the solutions two times, then obtained a straight addition line and calculated the amounts as with the other samples.
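The two MSA criteria discussed above, parallelism to the standard line and straightness of the addition line, can be screened numerically. The sketch below uses the slope ratio (addition slope over standard slope, times 100, read as an apparent recovery) for parallelism, and the change between the first and last segment slopes for curvature. Both heuristics and their 10% tolerances are assumptions made for illustration; the paper judged these criteria from the plots. All data are hypothetical.

```python
from collections import defaultdict
from statistics import fmean


def fit_line(xs, ys):
    """Ordinary least-squares fit of y = a + b*x; returns (a, b)."""
    n = len(xs)
    mx, my = sum(xs) / n, sum(ys) / n
    sxx = sum((x - mx) ** 2 for x in xs)
    sxy = sum((x - mx) * (y - my) for x, y in zip(xs, ys))
    b = sxy / sxx
    return my - b * mx, b


def level_means(xs, ys):
    """Average duplicate responses at each addition level, sorted by level."""
    groups = defaultdict(list)
    for x, y in zip(xs, ys):
        groups[x].append(y)
    return sorted((x, fmean(v)) for x, v in groups.items())


def check_msa(std_x, std_y, add_x, add_y, slope_tol=0.10, curve_tol=0.10):
    """Heuristic checks of two MSA criteria (tolerances are assumptions).

    parallel: addition slope within slope_tol of the standard slope.
    straight: last segment slope within curve_tol of the first one.
    """
    _, b_std = fit_line(std_x, std_y)
    _, b_add = fit_line(add_x, add_y)
    recovery = 100.0 * b_add / b_std
    pts = level_means(add_x, add_y)
    if len(pts) < 3:
        raise ValueError("need at least three addition levels")
    seg = [(y2 - y1) / (x2 - x1) for (x1, y1), (x2, y2) in zip(pts, pts[1:])]
    if seg[0] == 0:
        raise ValueError("first segment is flat; curvature undefined")
    curvature = seg[-1] / seg[0] - 1.0
    return {
        "recovery_pct": recovery,
        "parallel": abs(recovery - 100.0) <= 100.0 * slope_tol,
        "straight": abs(curvature) <= curve_tol,
    }


if __name__ == "__main__":
    std_x = [0, 1, 2, 4, 8]                  # hypothetical standards, ppm
    std_y = [10 * x for x in std_x]          # response units
    add_x = [0, 0, 1, 1, 2, 2, 4, 4]         # duplicate additions, ppm
    ic_like = [10 * (2 + x) for x in add_x]  # straight, parallel
    bent = [10 * (2 + x - 0.1 * x * x) for x in add_x]  # curved response
    print("straight sample:", check_msa(std_x, std_y, add_x, ic_like))
    print("curved sample:  ", check_msa(std_x, std_y, add_x, bent))
```

A curved response fails both checks at once. This mirrors the observation that a curved addition line cannot be extrapolated meaningfully.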

Fig. 2. THE GENERAL BEHAVIOR OF MSA APPLICATION TO THE MTB AND IC METHODS (ordinate: amount found; abscissa: concentration, with the amount present obtained by extrapolation).

Fig. 3. IC STANDARD ADDITION CURVE FOR SO4-I (MOOSE RIVER; COLOR 60, DOC 12).

Fig. 4. IC STANDARD ADDITION CURVE FOR SO4-III (COLOR 100, DOC 11).

Fig. 5. IC STANDARD ADDITION CURVE FOR SO4-IV.

Table 4 compares the SO4 amounts obtained by direct analysis (amount found) and by MSA (amount present) for the natural waters, and shows good agreement within experimental errors. This gave assurance that IC produces reliable results in the presence of organic matter in natural waters.

For humic acid fortified waters at colour 50, 90, 225 and 440 H.U., the parallelism and uniformity of the addition lines (Figures 10-13) also indicate agreement. In these four samples, the background SO4 concentrations were very small (Table 5, SO4-X to SO4-XIII), so the "true" SO4 value can be taken as 2 ppm, since the original spike was 2 ppm. Table 5 also shows good agreement between the two SO4 amounts determined by direct analysis and by MSA. These amounts represent, for all practical purposes, 100% of the true value, that is: amount found = amount present = true amount, which indicates data reliability.

For the four other humic acid fortified waters, at higher SO4 levels, Figures 14-17 and Table 6 clearly indicate excellent agreement between the three amounts (found, present, true), thus adding further substantiation that the data obtained by IC are unaffected by interferences from colour and various types of organic matter, and are therefore reliable.

Other Possible Interferents

Crowther (1980) tested possible interference from pH, Fe(II), Mn(VII), humic acid, tannic acid and lignin sulfate on a single standard addition of 10 ppm SO4 and found that the IC recoveries were satisfactory. The presence of Ba may cause chemical interference through the precipitation of BaSO4. Before introduction of Ba to the samples to be tested, we estimated the amount of Ba which could be added without precipitation, using solubility product data (Sillen and Martell, 1964). One ppm was estimated to be safe, and this amount was added to two humic acid (HA) fortified waters having colour of 50 and 440 H.U. and 2 ppm SO4. The recoveries of SO4 as determined by IC were ~100% (Table 5, SO4-XXX, SO4-XXXI). These results indicate that Ba also causes no interference, and hence further substantiate the reliability of the IC data and the validation of the IC method.

Figs. 6-17. IC STANDARD ADDITION CURVES FOR SO4-V TO SO4-VIII (Figs. 6-9; natural waters, including Atkins Brook, Upper Mersey River and Mount Tom Brook), SO4-XIV TO SO4-XVII (Figs. 10-13; humic acid fortified) AND SO4-XXII TO SO4-XXV (Figs. 14-17; humic acid fortified). Abscissa: ppm SO4 added.

Approaches for Salvaging Historical Data

Having established the quality of the data generated by the IC and MTB colorimetric methods, we proceeded to explore and evaluate approaches to salvage historical MTB data. This evaluation is lengthy and will be communicated elsewhere. The following briefly summarizes the findings:

1) There is no simple and universal correction factor which readily converts the historical data to true SO4 values.

2) However, we have found that for a specific amount and nature of organic matter, there exists a relationship between SO4 determined by MTB and SO4 determined by IC. Thus, to salvage historical MTB data, we simply obtain the two types of SO4 values case by case, relate them by a polynomial equation, and interpolate the corresponding historical values to obtain the expected true SO4 values.
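The salvage procedure in point 2 can be sketched as follows: for one site, or a group of sites with similar organic matter, fit a low-degree polynomial IC = p(MTB) to paired measurements, then convert historical MTB values only within the calibrated MTB range (interpolation, not extrapolation). The quadratic degree, the pure-Python normal-equation solver and all numbers are assumptions for illustration; the paper does not state the polynomial degree or fitting method.

```python
def polyfit(xs, ys, degree):
    """Least-squares polynomial coefficients [c0, c1, ..., cd] (ascending),
    from the normal equations solved by Gaussian elimination."""
    m = degree + 1
    if len(xs) < m:
        raise ValueError("need at least degree + 1 points")
    A = [[sum(x ** (i + j) for x in xs) for j in range(m)] for i in range(m)]
    r = [sum(y * x ** i for x, y in zip(xs, ys)) for i in range(m)]
    for col in range(m):  # forward elimination with partial pivoting
        piv = max(range(col, m), key=lambda k: abs(A[k][col]))
        A[col], A[piv] = A[piv], A[col]
        r[col], r[piv] = r[piv], r[col]
        for row in range(col + 1, m):
            f = A[row][col] / A[col][col]
            for k in range(col, m):
                A[row][k] -= f * A[col][k]
            r[row] -= f * r[col]
    c = [0.0] * m
    for i in reversed(range(m)):  # back substitution
        c[i] = (r[i] - sum(A[i][k] * c[k] for k in range(i + 1, m))) / A[i][i]
    return c


def polyval(coeffs, x):
    return sum(ci * x ** i for i, ci in enumerate(coeffs))


def salvage(historical_mtb, coeffs, mtb_min, mtb_max):
    """IC-equivalent SO4 for historical MTB values inside the calibration
    range; values outside it are returned as None (no extrapolation)."""
    return [polyval(coeffs, v) if mtb_min <= v <= mtb_max else None
            for v in historical_mtb]


if __name__ == "__main__":
    # Hypothetical paired values (ppm) for one group of similar waters.
    mtb = [3.1, 4.0, 5.2, 6.1, 7.9, 10.0]
    ic = [1.9, 2.3, 3.2, 3.9, 5.4, 7.1]
    c = polyfit(mtb, ic, 2)
    print("coefficients:", [round(v, 4) for v in c])
    print("salvaged:", salvage([3.5, 6.0, 12.0], c, min(mtb), max(mtb)))
```

Solving the normal equations directly is adequate for a quadratic over a narrow concentration range. For higher degrees, an orthogonal-polynomial or QR-based fit would be numerically safer.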

The case by case treatment can involve a specific site, river or lake, or a group of them which have a similar amount and nature of organic matter.

ACKNOWLEDGEMENT

We thank Dr. T. Pollock for the many enlightened discussions and Dr. J. Kerekes for providing the samples from the Atlantic region, and R. MacCrae and P. Campbell for supplying the samples from the Ontario region.

REFERENCES

Agemian, H. and Cheam, V., 1978. Anal. Chim. Acta, 101, pp. 193-197.
Bader, M., 1980. J. Chem. Ed., 57(10), pp. 703-706.
Cheam, V., 1982. "Special study on soft and coloured waters pertinent to LRTAP watershed study areas - Ionic balance, acidity, alkalinity, major ions and physical parameters, IRQC 88 to 91". NWRI Manuscript No. 61, AMD-6-82-VC.
Cheam, V. and Chau, A.S.Y., 1982. "Manual for the bimonthly interregional quality control studies". NWRI Manuscript No. 48, AMD-6-82-VC.
Cronan, C.S., 1979. Anal. Chem., 51(8), pp. 1333-1335.
Crowther, J., 1980. "Sulfate analysis for streams and lakes from Dorset area". Memorandum to S. Villard and P. Dillon, Ontario Ministry of the Environment.
Environment Canada, 1981. "Analytical Methods Manual". Inland Waters Directorate, Water Quality Branch, Ottawa.
Julshamn, K. and Braekkan, O.R., 1975. At. Absorpt. Newsletter, 14(3), pp. 49-52.
Kalivas, J.H., 1983. Anal. Chem., 55, pp. 565-567.
Kerekes, J., Howell, G., Beauchamp, S. and Pollock, T., 1982. "Characterization of three lake basins sensitive to acid precipitation in central Nova Scotia (June, 1979 to May, 1980)". Int. Revue Ges. Hydrobiol., 67(5), pp. 679-694.
Kerekes, J. and Pollock, T., 1983. Can. J. Fish. Aquat. Sci., 40(12), pp. 2260-2261.
Kerekes, J., Howell, G. and Pollock, T., 1984. "Problems associated with sulfate determination in coloured, humic waters in Kejimkujik National Park, Nova Scotia (Canada)". Verh. Internat. Verein. Limnol., 22, pp. 1811-1817.
Klein, R. Jr. and Hach, C., 1977. Am. Lab., July 1977, pp. 21-27.
Pollock, T., 1983. "Determination of sulfate in Atlantic Canada surface waters". Draft of a report.
Sillen, L.G. and Martell, A.E., 1964. Special Publication No. 17, The Chemical Society, London.
Underwood, J.K., Vaughan, H.H., Ogden, J.G. III and Mann, C.G., 1982. "Acidification of Nova Scotia Lakes II; Ionic Balances in Dilute Waters". Nova Scotia Department of the Environment (Halifax, Nova Scotia), Technical Report.
Underwood, J.K., McCurdy, R.F. and Borgal, D., 1983. "Effects of colour interference on ion balances in Atlantic Canada lake waters using methylthymol blue sulfate results: Some preliminary observations". Workshop Proceedings (Kejimkujik Calibrated Catchments Program), Life Science Centre, Dalhousie University, Halifax, Nova Scotia (April 26, 1983). Ed. J. Kerekes.
Watt, W.D., Scott, C.D. and White, W.J., 1983. Can. J. Fish. Aquat. Sci., 40, pp. 462-473.
Workshop on Chemistry of Organic Waters, January 20, 1983. NWRI, CCIW, Burlington, Ontario.

THE IMPORTANCE OF DESIGN QUALITY CONTROL TO A NATIONAL MONITORING PROGRAM

R.E. KWIATKOWSKI

Water Quality Branch, Inland Waters Directorate, 10th Floor, Place Vincent Massey, Ottawa, Canada, K1A 0E7

ABSTRACT: Water quality monitoring within Canada has been carried out by the Water Quality Branch, Department of the Environment, since 1970. Recent analyses of national (coast to coast) data for a variety of water quality parameters have identified an inherent difficulty with the interpretability of the data sets stored on NAQUADAT (Canada's National Water Quality Data File). A lacustrine system (Lake Ontario) will be used as a case example of this difficulty, described as network design assurance. The importance of network design assurance, both spatially and temporally, to regional and national data sets will be discussed, and a possible solution presented. Comments will also be offered on the design of lotic networks under the newly implemented Water Quality Federal-Provincial Agreements in relation to the above difficulty, identifying Canada's present attempt to produce data sets capable of generating statistically and ecologically valid national reports.

INTRODUCTION

Water represents one of Canada's most valuable resources, covering some 7.6 percent of its surface. There is no substitute for water. The survival of all forms of life depends upon an adequate supply of water of acceptable quality. Thus, a sound knowledge of water quality is essential to all levels of government for the management of present water uses and for the planning of future uses. While management responsibilities for water in Canada are shared between the provinces and the federal government, the federal government plays an important leadership role, particularly when addressing water quality on a national level. The Water Quality Branch (WQB), Inland Waters Directorate (IWD), Department of the Environment (DOE) is responsible for providing this leadership. Since its inception in 1970, the Water Quality Branch has carried out water quality monitoring to provide scientific and technical information and advice on water quality, in order to promote the conservation and enhancement of the quality of Canada's inland water resources for the economic and social benefit of all Canadians.

It is envisaged that nationwide (coast to coast) issues (e.g. acid rain, pesticides, etc.) will continue to be of concern, and that major energy developments will be the cause of water quality concerns affecting entire regions within Canada in the future. Proposed continental water diversion schemes, such as the North American Water and Power Alliance to divert water from Alaska and the Yukon to the southwestern United States, and the GRAND Canal (Great Recycling and Northern Development Canal) project, involving the building of a dam across the mouth of James Bay and diverting water through the Great Lakes to the central and eastern United States, pose significant ecological concerns. A sound knowledge of the water quality in Canada is basic to an assessment of the environmental and economic impact of such developments, and it is the federal government that has the mandate and the responsibility to collect these national water quality data. To meet this national objective, the WQB consists of a headquarters, located in Ottawa, and five regional offices (Moncton - Atlantic Region, Longueuil - Quebec Region, Burlington - Ontario Region, Regina - Western and Northern Region, and Vancouver - Pacific and Yukon Region; Figure 1). Water quality monitoring programs carried out by the regional offices are in direct response to the needs of the region, as dictated by the Regional Director, while headquarters (WQB) carries out a functional guidance role.

It should be noted that the WQB is operational in nature. Research and related support services are provided by the research institutes: the National Water Research Institute (NWRI) and the National Hydrology Research Institute (NHRI). The Branch in turn provides operational support to both institutes.

Figure 1: REGIONAL OFFICES AND HEADQUARTERS, WATER QUALITY BRANCH, CANADA.

Recent requests, such as that from the Pearse Commission's Inquiry on Federal Water Policy, for data on a variety of water quality parameters from various river basins across Canada have revealed a major difficulty with the interpretability of the data stored on NAQUADAT (National Water Quality Data Base; Whitlow and Lamb, 1983). This difficulty, referred to in this manuscript as "design quality assurance", must be considered when interpreting national (coast to coast) or regional data to define differences between river basins. A lacustrine system (Lake Ontario) will be offered as a case example. Comments on how to establish lacustrine and lotic monitoring programs that overcome the difficulty will also be supplied.

The Problem

There are various physical and biochemical processes involved in the storage and transport of pollutants in a lacustrine system. Concentrations of pollutants in the aqueous phase are at a maximum at the point of entry into the aquatic system.

These inputs can undergo coastal entrapment due to specific patterns and thermal structures characteristic of inshore areas (Csanady, 1970). As a result, nearshore environments often display significantly higher concentrations of inputs and greater variability than do the offshore areas (Beeton and Edmondson, 1972). Concentrations of various pollutants are therefore not stationary; rather, they form a continuum, spatially and temporally.

A common objective of monitoring or surveillance programs is to sample this continuum and estimate a mean value (for a given area for a given time period) for a variety of water quality variables:

$$X_A = \frac{1}{N}\sum_{i=1}^{N} X_{C,i}, \qquad X_{C,i} = \frac{1}{n_i}\sum_{j=1}^{n_i} x_{ij}$$

where:
$X_A$ is the annual average value for the given water quality parameter;
$N$ is the number of sampling trips for the given year, $i = 1, 2, 3, \ldots, N$;
$X_{C,i}$ is the sampling trip average value for the given water quality parameter;
$x_{ij}$ is the concentration for the given water quality parameter during the $i$th sampling trip, at the $j$th station;
$n_i$ is the number of stations $j$ for sampling trip $i$.
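The two averaging steps can be written directly as code. This minimal sketch uses hypothetical station values. It also contrasts $X_A$ with a pooled mean over all observations, which gives a different answer whenever the number of stations varies between trips, one simple way in which sampling design enters the estimate.

```python
from statistics import fmean


def trip_means(trips):
    """X_C,i: average over the n_i stations sampled on trip i."""
    return [fmean(stations) for stations in trips]


def annual_mean(trips):
    """X_A: average of the N trip means (each trip weighted equally)."""
    return fmean(trip_means(trips))


def pooled_mean(trips):
    """Mean of all observations, ignoring trip structure (for contrast)."""
    return fmean(v for stations in trips for v in stations)


if __name__ == "__main__":
    # Hypothetical chlorophyll a values (ug/L); trip 2 visited fewer stations.
    trips = [[1.2, 2.0, 3.1, 2.4], [4.5, 6.0], [2.2, 2.8, 3.0]]
    print("trip means:", [round(m, 2) for m in trip_means(trips)])
    print("X_A:", round(annual_mean(trips), 2))
    print("pooled:", round(pooled_mean(trips), 2))
```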

Subsequent comparisons of the mean values (annually, XA, or by sampling trip, XC), utilizing a variety of parametric or nonparametric statistical techniques, are then performed, either spatially to determine areas of significant deterioration, or temporally to determine trends. Major difficulties in obtaining an adequate estimate of the mean value XA include: the specification of station location, to supply adequate spatial coverage; adequate sampling frequency, to describe seasonal fluctuations; and sufficient observations, to reach a desired confidence interval about the mean value; all within a predetermined budget.
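The requirement of "sufficient observations to reach a desired confidence interval" is commonly quantified with the textbook sample-size formula n = (z·σ/E)², where E is the desired half-width of the interval. The sketch below assumes independent, approximately normal observations with a known standard deviation σ. Spatial and seasonal correlation, which are typical of lake surveys, would increase the number actually required. The formula is a standard illustration and not a procedure described in the paper. The σ and E values are hypothetical.

```python
import math
from statistics import NormalDist


def n_required(sigma: float, half_width: float, confidence: float = 0.95) -> int:
    """Observations needed so that a normal-theory confidence interval for
    the mean has the requested half-width (independent data, known sigma)."""
    if sigma <= 0 or half_width <= 0:
        raise ValueError("sigma and half_width must be positive")
    z = NormalDist().inv_cdf(0.5 + confidence / 2.0)
    return math.ceil((z * sigma / half_width) ** 2)


if __name__ == "__main__":
    # Hypothetical: sigma = 2.0 ug/L chlorophyll a, target +/- 0.5 ug/L.
    for conf in (0.90, 0.95, 0.99):
        print(conf, n_required(2.0, 0.5, conf))
```

Halving the desired half-width quadruples the required number of observations. This is why the budget constraint above bites quickly.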

Thus the location, number and frequency of measurements are based on the objectives of the program, the available scientific knowledge, and resource availability. Unfortunately, these variables are not constant between regions, and often not between programs within a region. Consequently, the effort put into the estimate (X̄) of the mean value (μ) of the water quality variable varies substantially between programs, depending on the issues or parameters of interest. When national (coast to coast) assessments are carried out for any given parameter, the data files from the various programs carried out within the five WQB regions are simply brought together, either for comparative purposes or to establish a national average condition. Rarely is design variability between programs taken into account in the interpretation of the results.

It is a simple matter to obtain the number and location of sites existing in the various river basins from the computer files; however, it is not always clear how the number of stations, their location and the parameters measured were chosen. Yet without this information (design quality assurance), gross misrepresentation of pollutant concentrations on a regional or national scale can occur, making comparisons that are statistically significant but ecologically insignificant.

The Lacustrine System

Concerned about the deterioration of the water quality of the lower Great Lakes, the Governments of Canada and the United States signed the Great Lakes Water Quality Agreement in 1972.

Prior to the signing of the Agreement, monitoring of Lake Ontario was inconsistent. No two years displayed similar station patterns or cruise (sampling trip) frequency. In 1974 a consistent open lake surveillance program was designed for Lake Ontario to monitor various water quality parameters, including the biomass parameter chlorophyll a. The station pattern was designed to give an overall view of the lake, with maximum sampling occurring in the two to ten kilometre range from shore, the area which had shown maximum variability in the past (Figure 2). Further details on station location, sampling frequency and parameters sampled for the Lake Ontario Surveillance Program, 1968-1980, can be found in Kwiatkowski and Neilson (1983).

Figure 2: Station location for Lake Ontario Surveillance program, 1974-1980 (STATION PATTERN FOR LAKE ONTARIO, 1974-80; the legend distinguishes stations added in 1975 and 1976, stations deleted in 1975 and 1977, and stations not sampled in 1975-1976).

Three years of surveillance data from Lake Ontario, 1974-1976, for the water quality parameter chlorophyll a were subjected to a regression model developed by El-Shaarawi and Shah (1978). The model transformed the original observations so that the transformed values approximately satisfy the assumptions required for the application of the analysis of variance, i.e. the mean value can be expressed as a linear combination of the influencing factors, constancy of variance, additivity of the error term, and normality of distribution. An additive linear model with seasonal and spatial components was then fitted to the transformed data. A hierarchical classification procedure using estimates of the spatial effects was then applied to divide the lake into statistically homogeneous zones (p ≤ 0.05).
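A simplified sketch of this zonation idea follows. It log-transforms a complete station × cruise table and fits the additive model log x_ij = μ + s_i + t_j, whose least-squares effects for a balanced table are row and column means minus the grand mean. It then groups stations whose spatial effects lie close together. Several parts are assumptions standing in for details not given here: the log transform, the requirement of a complete balanced table, and the fixed gap threshold used in place of the significance-based (p ≤ 0.05) merging criterion. It is not the El-Shaarawi and Shah (1978) procedure itself, and the data are hypothetical.

```python
import math


def additive_effects(table):
    """Fit log(x_ij) = mu + s_i + t_j to a complete station x cruise table.

    For a balanced, complete table the least-squares effects are simply
    row/column means of the log values minus the grand mean.
    """
    y = [[math.log(v) for v in row] for row in table]
    n_st, n_cr = len(y), len(y[0])
    grand = sum(map(sum, y)) / (n_st * n_cr)
    station = [sum(row) / n_cr - grand for row in y]
    cruise = [sum(y[i][j] for i in range(n_st)) / n_st - grand
              for j in range(n_cr)]
    return grand, station, cruise


def zone_stations(effects, max_gap):
    """Single-linkage grouping of 1-D station effects: sort, then start a
    new zone wherever the gap to the previous station exceeds max_gap."""
    order = sorted(range(len(effects)), key=lambda i: effects[i])
    zones = [[order[0]]]
    for prev, cur in zip(order, order[1:]):
        if effects[cur] - effects[prev] <= max_gap:
            zones[-1].append(cur)
        else:
            zones.append([cur])
    return zones


if __name__ == "__main__":
    # Hypothetical chlorophyll a (ug/L): 5 stations x 3 cruises.
    table = [[1.0, 2.1, 0.5],
             [1.1, 2.2, 0.6],
             [3.0, 6.2, 1.4],
             [3.3, 6.3, 1.6],
             [9.0, 18.5, 4.4]]
    _, station, cruise = additive_effects(table)
    print("station effects:", [round(s, 2) for s in station])
    print("cruise effects: ", [round(t, 2) for t in cruise])
    print("zones:", zone_stations(station, max_gap=0.3))
```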

In order to provide an overview, a composite zonation map of the 1974-1976 data was drawn (Figure 3). It shows a point source zone (a region of direct input, with maximum concentrations and maximum variability), an inshore zone (a region where natural dilution had reduced concentrations and variability to moderate levels), and an offshore zone (the large, deep central portion of the lake, which can only be affected by prolonged loadings). Although each year produced slightly different seasonal patterns with respect to chlorophyll a (Kwiatkowski, 1978), similarities, typified by the 1974 data, were evident. Cruise averages of the 1974 chlorophyll a data for each zone described in Figure 3 are plotted against time in Figure 4.

LAKE ONTARIO COMPOSITE OF CHLOROPHYLL ZONES (point source, inshore and offshore zones)

Figure 3: A composite zonation map of chlorophyll a. Taken from Kwiatkowski 1978.

The chlorophyll a concentrations for each zone displayed the spring and fall peaks typical of temperate lakes. Concentrations were progressively higher in the zones closer to the nutrient inputs. Similarly, the spring peak occurred progressively later the farther from shore the sample was taken. The movement of the thermal bar (Rodgers, 1965) is the most likely explanation for this result.

SEASONAL PATTERN FOR CHLOROPHYLL ZONES (LAKE ONTARIO 1974)

Figure 4: Cruise mean values obtained for chlorophyll a concentrations, 1974, from the composite zonation map.

It is interesting to note that in zone 3 (point source) the spring peak had already occurred before the first cruise (i.e. the peak occurred before April). Because the zones differ in both concentration and timing, a bias can evidently be introduced in the annual (X̄A) or cruise means (X̄c) simply by moving stations from one zone to another.

The importance of this bias can best be explained through a hypothetical example, using the 1974 Lake Ontario chlorophyll a data. Assume that there are two Lake Ontarios, identical in every aspect, except that one is located in the WQB-Ontario Region and one is located in the WQB-Pacific and Yukon Region.

Both regions establish monitoring programs. Due to budget restrictions, both regions can only afford to sample 32 stations. In an attempt to better define the temporal variability inherent in biological sampling, 14 cruises are conducted by each region.

Ontario Region, cognizant of the fact that the water resource in Canada is a shared federal-provincial responsibility, monitors mainly the open lake component, with a few point source stations (Figure 5). Pacific and Yukon Region, interested in defining pollutant movement from point sources and aware that maximum variability occurs in the nearshore waters, samples mainly near the known inputs, with some open lake stations to define pristine conditions (Figure 6).

The mean cruise values obtained by the two regions, as well as those obtained from the true surveillance program, are plotted in Figure 7. The number of observations (N), the average value (X̄c), the standard deviation (SD) and the coefficient of variation (CV) for each cruise are given in Table 1.

STATION LOCATIONS FOR LAKE ONTARIO, ONTARIO REGION

Figure 5: Hypothetical station locations for Lake Ontario program, Water Quality Branch, Ontario Region.

Though the seasonal pattern is quite similar for all three sampling designs, mean cruise concentrations, especially during the spring period, are significantly (p ≤ 0.01) different between the Ontario design and the Pacific and Yukon design (Table 1). Of the fourteen cruises conducted, only two (cruises 11 and 13) showed no significant (p ≤ 0.05) difference between the two designs.
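The footnotes to Table 1 say only that a one-tailed Student's t test was used. They do not say whether variances were pooled or whether the data were transformed. As a hedged sketch, assuming Welch's unequal-variance form computed directly from the summary statistics (N, mean, SD) in Table 1, the test statistic and its approximate degrees of freedom can be obtained as shown below. The critical values are approximate one-tailed Student's t values for about 60 degrees of freedom and are included only for illustration. Because the original test details are unknown, this sketch is not guaranteed to reproduce every significance marker in Table 1. It is shown only for a few clear-cut cruises.

```python
import math


def welch_t(n1: int, mean1: float, sd1: float,
            n2: int, mean2: float, sd2: float) -> tuple[float, float]:
    """Welch's t statistic for (mean2 - mean1) and its approximate df."""
    v1 = sd1 ** 2 / n1  # squared standard error of the first mean
    v2 = sd2 ** 2 / n2  # squared standard error of the second mean
    t = (mean2 - mean1) / math.sqrt(v1 + v2)
    # Welch-Satterthwaite approximation to the degrees of freedom
    df = (v1 + v2) ** 2 / (v1 ** 2 / (n1 - 1) + v2 ** 2 / (n2 - 1))
    return t, df


# Approximate one-tailed critical values of Student's t for about 60 df
T_CRIT_05 = 1.671
T_CRIT_01 = 2.390

# (N, mean, SD) from Table 1; means and SDs in µg/l
cruises = {
    1: ((30, 3.28, 2.95), (32, 7.65, 4.69)),
    11: ((31, 6.04, 3.32), (32, 6.53, 4.46)),
    13: ((27, 8.30, 2.53), (31, 9.14, 3.16)),
}

results = {}
for cruise, (ontario, pacific_yukon) in cruises.items():
    t, df = welch_t(*ontario, *pacific_yukon)
    if t >= T_CRIT_01:
        verdict = "p <= 0.01"
    elif t >= T_CRIT_05:
        verdict = "p <= 0.05"
    else:
        verdict = "not significant"
    results[cruise] = verdict
    print(f"cruise {cruise}: t = {t:.2f}, df = {df:.1f}, {verdict}")
```

For borderline cruises, an exact p-value would need a Student's t distribution function. The Python standard library does not provide one.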

The difference in chlorophyll a concentrations between the two regions, expressed as a percentage of the Ontario value, varied from 8.1 to 133.2% among cruises. The annual means differed by 44.5% (Table 2). Dobson (1981) has indicated that chlorophyll a concentrations in surface waters can be used as a trophic index: values less than 2 µg/l indicate oligotrophic waters, 2-6 µg/l mesotrophic waters and greater than 6 µg/l eutrophic waters.
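Dobson's (1981) ranges can be written as a simple classification rule. In this sketch, values exactly at 2 or 6 µg/l are treated as mesotrophic. That boundary choice is an assumption, because the ranges quoted above do not specify it. The inputs are the unweighted Ontario and Pacific and Yukon cruise means from Table 1.

```python
from collections import Counter


def trophic_class(chl_a_ug_per_l: float) -> str:
    """Classify a chlorophyll a concentration (µg/l) by Dobson's (1981) ranges.

    Assumption: exactly 2 and exactly 6 µg/l are treated as mesotrophic.
    """
    if chl_a_ug_per_l < 2.0:
        return "oligotrophic"
    if chl_a_ug_per_l <= 6.0:
        return "mesotrophic"
    return "eutrophic"


# Unweighted cruise means (µg/l) for cruises 1-14, from Table 1
ontario = [3.28, 4.12, 4.01, 3.71, 5.20, 5.20, 3.31,
           3.45, 2.77, 3.98, 6.04, 5.45, 8.30, 4.68]
pacific_yukon = [7.65, 8.41, 7.47, 5.04, 6.40, 6.89, 4.39,
                 5.68, 4.11, 5.26, 6.53, 6.82, 9.14, 6.54]

ontario_counts = Counter(trophic_class(x) for x in ontario)
pacific_yukon_counts = Counter(trophic_class(x) for x in pacific_yukon)
print("Ontario:", dict(ontario_counts))
print("Pacific and Yukon:", dict(pacific_yukon_counts))
```

Counted this way, 12 of the 14 Ontario cruise means fall in the mesotrophic range, and 9 of the 14 Pacific and Yukon cruise means fall in the eutrophic range, although both networks sampled the same hypothetical lake.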

If these descriptors were accepted as guidelines by the WQB, data generated by the Ontario Region's network would indicate that their lake is moderately mesotrophic. Maximum concentrations would occur in early summer (5.20 µg/l) and during the fall period (8.30 µg/l) after the breakdown of thermal stratification (Figure 7). The late summer minimum of 2.77 µg/l would be attributed to nutrient depletion of the epilimnetic waters. The network designed by the Pacific and Yukon Region would indicate that their lake is eutrophic, with maximum levels occurring in the spring (8.41 µg/l), summer (6.89 µg/l) and fall (9.14 µg/l) periods. Low (mesotrophic) levels are reached only in late summer, due to nutrient depletion. If a national (across Canada) comparison of the trophic status of lakes, as indicated by the biomass indicator chlorophyll a, were required, the information would be extracted from the two data sets within NAQUADAT.

STATION LOCATIONS FOR LAKE ONTARIO, PACIFIC AND YUKON REGION

Figure 6: Hypothetical station locations for Lake Ontario program, Water Quality Branch, Pacific and Yukon Region.

SEASONAL PATTERN FOR CHLOROPHYLL FROM THREE SAMPLING SCENARIOS (LAKE ONTARIO, 1974)

Figure 7: Cruise mean values for chlorophyll a concentrations, 1974, from the Ontario, Pacific and Yukon, and Surveillance programs.

Possible conclusions drawn from comparisons of these two data sets would be as follows. In Ontario, although fall levels of chlorophyll a are worrisome and deserve future study, the lake does not require any remedial action. In the Pacific and Yukon Region, chlorophyll a levels are significantly higher than in the Ontario Region for all seasons. That lake is highly eutrophic during the spring and fall periods, with high mesotrophic levels reported in the summer, and remedial action (nutrient reduction) is warranted.

One possible solution to the problem stated above is areal weighting. Assume that a lake can be partitioned into n homogeneous sampling populations, or strata, each of area Δa. Assume also that the sample stations are independent and randomly obtained, and that the sample mean is normally distributed. Then N, the number of observations needed to establish the true cruise mean level within any given confidence interval, can be calculated from:

$$N = \left( \frac{t\,\hat{\sigma}}{L} \right)^{2} \qquad (3)$$

where σ̂ is the estimate of σ based on the available information and has m degrees of freedom. The value of t to be used in this formula is the critical value read from the table of Student's t for m degrees of freedom, at the level of significance corresponding to the required confidence coefficient. L is the specified error, i.e. the acceptable difference between the estimated and the true mean (Mandel 1964, Green 1979, Ward et al. 1979, Dunnette 1980 and Nelson and Ward 1981). Thus, given the required values of L and the confidence coefficient, N can be computed from a subsample for which σ̂ and m are known.
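Equation 3 can be sketched in Python as follows. This is a minimal illustration, not the authors' procedure. The critical value t is passed in rather than looked up, because the Python standard library has no Student's t quantile function. The standard deviation is that of Ontario cruise 1 in Table 1 (30 stations, so m = 29). The allowable errors L are hypothetical choices.

```python
import math


def stations_needed(t_crit: float, sigma_hat: float, allowable_error: float) -> int:
    """Number of observations N = (t * sigma_hat / L)**2, rounded up.

    t_crit: critical Student's t for m degrees of freedom at the chosen
        confidence level, read from a t table.
    sigma_hat: standard deviation estimated from a subsample.
    allowable_error: L, the specified error, in the same units as sigma_hat.
    """
    return math.ceil((t_crit * sigma_hat / allowable_error) ** 2)


# Illustration only: SD of Ontario cruise 1 (Table 1), m = 29,
# t(0.975, 29) is about 2.045 for 95 % confidence.
# The allowable errors L below are hypothetical.
print(stations_needed(2.045, 2.95, 1.0))  # L = 1.0 µg/l -> 37
print(stations_needed(2.045, 2.95, 0.5))  # L = 0.5 µg/l -> 146
```

Because N grows with the square of σ̂/L, halving the allowable error roughly quadruples the number of observations required.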

Once the cruise means for each stratum are determined, and by assuming a simple linear dilution between the mid-points of the various strata, a whole lake mean cruise value can be estimated by:

$$\bar{x}_{CW} = \frac{\sum_{j=1}^{n} \bar{x}_j \, \Delta a_j}{\sum_{j=1}^{n} \Delta a_j} \qquad (4)$$

where:
x̄_CW is the whole lake weighted surface concentration for any given cruise,
x̄_j is the average surface concentration for stratum j, j = 1 to n, and
Δa_j is the surface area between the mid-points of each stratum, j = 1 to n.
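Equation 4 can be sketched as follows. The stratum areas and concentrations are hypothetical. They were chosen only to mimic the two regional designs described above. The sketch shows that the areal-weighted estimate depends on the stratum means and areas, not on how many stations each network places in each stratum.

```python
from statistics import mean


def areal_weighted_mean(samples_by_stratum: dict[str, list[float]],
                        area_by_stratum: dict[str, float]) -> float:
    """Whole-lake weighted cruise mean (equation 4).

    Each stratum mean is weighted by the stratum's surface area, so the
    number of stations per stratum does not enter the weights.
    """
    total_area = sum(area_by_stratum[s] for s in samples_by_stratum)
    weighted_sum = sum(mean(values) * area_by_stratum[s]
                       for s, values in samples_by_stratum.items())
    return weighted_sum / total_area


# Hypothetical stratum areas (km^2)
areas = {"point source": 500.0, "inshore": 3000.0, "offshore": 15500.0}

# Two hypothetical 32-station networks sampling the same lake (µg/l),
# with very different allocations of stations among the strata.
open_lake_network = {"point source": [9.0] * 2, "inshore": [6.0] * 6,
                     "offshore": [4.0] * 24}
nearshore_network = {"point source": [9.0] * 16, "inshore": [6.0] * 10,
                     "offshore": [4.0] * 6}

for name, network in [("open lake", open_lake_network),
                      ("nearshore", nearshore_network)]:
    raw = mean(v for values in network.values() for v in values)
    weighted = areal_weighted_mean(network, areas)
    print(f"{name}: unweighted {raw:.3f}, weighted {weighted:.3f}")
```

With real data, within-stratum variation means that two networks' weighted estimates agree only approximately. The unweighted means, however, shift directly with station allocation.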

It should be noted that Δa can be replaced by Δv (the volume of each stratum). However, this would require a significantly greater sampling effort (with depth), especially during the summer stratification period.

Clustering of the data into statistically homogeneous zones (or strata), as done in Figure 3, allows greater effort, in terms of station numbers, to be directed at improving x̄_j, the estimated mean concentration of stratum j, without adversely affecting the whole lake average.

Increased effort, even within fixed networks, is an inherent component of all monitoring programs. The surveillance program on Lake Ontario was originally designed to be stable with respect to station location and sample frequency, but changes have occurred in response to the needs of the program. Between 1974 and 1980, changes in station location (Figure 2) resulted in a 40% increase in effort for the point source area, a 13% increase in effort for the inshore area, and a 6% decrease in effort for the offshore area.

Thus, comparisons of whole lake averages between historical and presently collected data must take this change into account. Otherwise, apparent changes may be due to relocation of sampling effort rather than to real changes in the lake. Areal weighted values (equation 4) for the 14 cruises conducted in 1974 were calculated using the three sampling scenarios previously described (i.e. Ontario Region, Pacific and Yukon Region, and the surveillance program) and the three strata displayed in Figure 3. The calculated values are plotted in Figure 8.

The spring period still produces the greatest differences between the various scenarios. However, the difference between the annual means of the Ontario Region's and the Pacific and Yukon Region's sampling networks (the two extreme cases) was reduced from 44.5% to 8.0% (Table 2). The differences found in the first three (spring) cruises were reduced from an average of 107.9% to 15.2%. No significant (p ≤ 0.05) differences in weighted mean cruise concentrations were found between the Ontario and the Pacific and Yukon network designs.

SEASONAL PATTERN FOR CHLOROPHYLL WITH WEIGHTING (LAKE ONTARIO, 1974)

Figure 8: Areal weighted cruise mean values for chlorophyll concentrations, 1974, from the Ontario, Pacific and Yukon, and surveillance programs.

A greater reduction in cruise mean differences between the various scenarios can be obtained by establishing progressively smaller strata (Δa's). Application of the model of El-Shaarawi and Shah (1978) to the 1974 chlorophyll a data at the p ≤ 0.25 significance level divided the lake into seven homogeneous zones (Figure 9). It should be pointed out, however, that as the surface areas (Δa) of the homogeneous strata are reduced, the number of strata increases, and therefore the number of sampling stations (N) increases: as Δa → 0, N → ∞.

Table 1: Number of observations (N), mean (X, µg/l), standard deviation (SD, µg/l) and coefficient of variation (CV, %) for chlorophyll a cruises.

          Ontario                       Pacific and Yukon             Surveillance Program
Cruise       N      X     SD     CV        N      X     SD     CV        N      X     SD     CV
1**         30   3.28   2.95   89.9       32   7.65   4.69   61.3       81   5.28   4.28   81.1
2**         30   4.12   3.61   87.6       32   8.41   4.79   56.9       82   6.15   4.29   69.8
3**         30   4.01   3.19   79.6       31   7.47   4.29   57.4       83   6.29   4.28   68.0
4**         32   3.71   1.80   48.5       32   5.04   2.01   39.9       85   4.70   2.05   43.6
5**         32   5.20   1.58   30.4       30   6.40   2.19   34.2       83   5.48   1.92   35.0
6*          15   5.20   2.45   47.1       25   6.89   3.54   51.4       55   5.60   3.01   53.8
7*          20   3.31   1.74   52.6       21   4.39   2.42   55.1       62   3.92   2.05   52.3
8**         32   3.45   1.61   46.7       32   5.68   7.38  129.9       85   4.43   4.70  106.1
9**         32   2.77   1.01   36.5       32   4.11   2.29   55.7       83   3.45   1.88   54.5
10**        31   3.98   1.41   35.4       31   5.26   2.34   44.5       84   4.61   2.08   45.1
11          31   6.04   3.32   55.0       32   6.53   4.46   68.3       84   5.96   4.02   67.4
12**        25   5.45   1.74   31.9       26   6.82   2.33   34.2       71   6.00   2.05   34.2
13          27   8.30   2.53   30.5       31   9.14   3.16   34.6       78   8.77   2.73   31.1
14**        31   4.68   1.32   28.2       30   6.54   3.02   46.2       79   5.94   2.45   41.2
Annual     398   4.49   2.65   59.0      417   6.49   4.04   62.2     1095   5.47   3.41   62.3

*  Significant difference (p ≤ 0.05) between Ontario, and Pacific and Yukon; one-tailed Student's t.
** Significant difference (p ≤ 0.01) between Ontario, and Pacific and Yukon; one-tailed Student's t.

Table 2: Percentage difference between scenarios, before and after weighting. P&Y = Pacific and Yukon Region; each entry is (A - B)/B × 100 for the scenarios A/B named in the column heading.

        Ontario/Surveillance   P&Y/Surveillance       P&Y/Ontario
Cruise      Before     After       Before     After       Before     After
1            -37.9      -7.7         44.9      10.1        133.2      19.3
2            -33.0      -3.3         36.7      10.0        104.1      13.7
3            -36.2      -3.6         18.8       8.6         86.3      12.7
4            -21.1      -6.1          7.2      -6.3         35.8      -0.3
5             -5.1       6.8         16.8       3.4         23.1      -3.2
6             -7.1      -1.2         23.0      16.0         32.5      17.3
7            -15.6     -11.7         12.0       6.4         32.6      20.5
8            -22.1     -11.2         20.9      -4.2         64.6       7.9
9            -19.7      -9.0         19.1      -0.7         48.4       9.2
10           -13.7     -11.6         14.1       6.6         32.2      20.6
11             1.3       1.7          9.6      -1.5          8.1      -3.1
12            -9.2       0.4         13.7      -1.3         25.1      -1.8
13            -5.4       4.2          4.2       0.0         10.1      -4.0
14           -21.2     -16.6         10.1       3.0         39.7      23.4
Annual       -17.9      -4.1         18.6       3.6         44.5       8.0
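The "before weighting" entries of Table 2 are consistent with computing (A - B)/B × 100 from the Table 1 means, where A/B names the two scenarios compared. The short check below uses only values from Tables 1 and 2. It reproduces the P&Y/Ontario column, the 44.5% difference between the annual means, and the spring averages quoted in the text.

```python
def percent_difference(a: float, b: float) -> float:
    """Percentage difference of scenario a relative to scenario b."""
    return (a - b) / b * 100.0


# Unweighted cruise means (µg/l) for cruises 1-14, from Table 1
ontario = [3.28, 4.12, 4.01, 3.71, 5.20, 5.20, 3.31,
           3.45, 2.77, 3.98, 6.04, 5.45, 8.30, 4.68]
pacific_yukon = [7.65, 8.41, 7.47, 5.04, 6.40, 6.89, 4.39,
                 5.68, 4.11, 5.26, 6.53, 6.82, 9.14, 6.54]

# "Before weighting" P&Y/Ontario differences, cruise by cruise
before = [percent_difference(p, o) for p, o in zip(pacific_yukon, ontario)]
print([round(d, 1) for d in before])

# Difference between the annual means (Table 1: 6.49 vs 4.49 µg/l)
print(round(percent_difference(6.49, 4.49), 1))

# Averages over the three spring cruises, before and after weighting;
# the "after" values are the weighted P&Y/Ontario entries of Table 2.
spring_before = sum(before[:3]) / 3
spring_after = sum([19.3, 13.7, 12.7]) / 3
print(round(spring_before, 1), round(spring_after, 1))
```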

LAKE ONTARIO STATISTICAL (p ≤ 0.25) ZONATION, 1974 CHLOROPHYLL

Figure 9: Statistical zones (p ≤ 0.25) formed with chlorophyll a data on Lake Ontario, 1974.

As has been demonstrated, the location of sampling stations influences annual mean concentrations. It has also been demonstrated that, for Lake Ontario, varying the timing of sampling can result in a biased whole lake mean concentration (Kwiatkowski, 1985). Weekly chlorophyll data collected from the inshore strata of Lake Ontario were subjected to four different sampling scenarios based on a systematic design with k, the sampling interval, equal to one month. The four scenarios sampled in the 1st, 2nd, 3rd or 4th week of each month, respectively. Not only were annual means significantly altered, but so were the seasonal cycles (Figure 10).

Panels: 1st WEEK, 2nd WEEK, 3rd WEEK, 4th WEEK (x-axis: date, January to December)

Figure 10: Seasonal cycles obtained for chlorophyll a with varying sampling scenarios. Taken from Kwiatkowski 1985.
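The week-of-month scenarios behind Figure 10 can be imitated on synthetic data. This is a sketch under stated assumptions. The seasonal curve is invented and is not Kwiatkowski's data. "k equal to one month" is interpreted as keeping the weekly sample that falls on days 1-7, 8-14, 15-21 or 22-28 of each month. Under this rule, a continuous weekly series yields exactly one sample per month for each scenario.

```python
import math
from datetime import date, timedelta
from statistics import mean


def synthetic_chlorophyll(day_of_year: int) -> float:
    """Hypothetical seasonal curve (µg/l): a baseline plus spring and fall peaks."""
    spring = 6.0 * math.exp(-((day_of_year - 110) ** 2) / (2 * 10.0 ** 2))
    fall = 4.0 * math.exp(-((day_of_year - 290) ** 2) / (2 * 20.0 ** 2))
    return 2.0 + spring + fall


def week_of_month(d: date) -> int:
    """1 for days 1-7, 2 for days 8-14, 3 for days 15-21, 4 for 22-28, 5 otherwise."""
    return (d.day - 1) // 7 + 1


# One hypothetical year of weekly samples, starting 1 January 1974
start = date(1974, 1, 1)
weekly = []
for k in range(52):
    d = start + timedelta(weeks=k)
    weekly.append((d, synthetic_chlorophyll(d.timetuple().tm_yday)))

# Four systematic designs with k = one month: keep only the sample that
# falls in the 1st, 2nd, 3rd or 4th week of each month.
scenarios = {w: [v for d, v in weekly if week_of_month(d) == w] for w in range(1, 5)}

full_year_mean = mean(v for _, v in weekly)
for w, values in scenarios.items():
    print(f"week {w}: {len(values)} samples, annual mean {mean(values):.2f} "
          f"(weekly series: {full_year_mean:.2f})")
```

Even with an identical underlying seasonal curve, the four scenarios give different annual means. A narrow spring peak is caught near its maximum by some scenarios and only on its shoulders by others.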

The Lotic System

The river basin has become the fundamental unit of evaluation within the WQB monitoring efforts. Water quality variables within a lotic system, from the headwaters to its mouth, present a continuous gradient of environmental conditions. Pollutants entering the system at a point source or a diffuse source produce higher than background concentrations at the point of entry. Through dilution, these levels are reduced downstream until they are indistinguishable from background levels. A direct parallel can be drawn between the monitoring of lotic and lacustrine systems. Distinct strata (pristine, point source and recovery) can be identified within the lotic system. These are equivalent to the strata (offshore, point source and nearshore) previously described for the lacustrine system (Figure 11).

RIVER MONITORING

Figure 11: Hypothetical zonation which can occur in a river system.

It is not the intent of this manuscript to belabour the point of zonation; however, one case example of a lotic system from the WQB-Quebec Region will be offered. The St. Lawrence River is one of the major rivers of the world and is of prime importance to the people of Quebec. Its diversified biological resources, and the number and size of the municipalities it supplies with water, testify to its environmental importance, just as municipal, agricultural and industrial statistics bring out its economic importance. In Canada, the health of the St. Lawrence River, which drains one of the most developed hydrographic basins of the world, is the subject of particular concern (St. Lawrence River Study Committee 1978).

In 1971, following an inventory carried out on the Great Lakes, the Canada-Quebec Consultative Committee on water problems set up a task force. Its mandate was to review the historical records on the quality of the water in the St. Lawrence River and to develop programs for its management and use. The St. Lawrence River Network (1977-1981) consisted of 46 stations located between Cornwall and Quebec City. Sampling was carried out six times a year, during the ice-free period, for a variety of water quality parameters (Figure 12). The objectives of the program were to provide information on the location, severity, frequency and duration of non-achievement of the water quality objectives set for the various uses of the aquatic resource (Germain and Janson 1984). Analyses of the data indicated that water quality in the St. Lawrence River varied significantly from one location to another. Lower concentrations of most pollutants were generally found in the high velocity navigation channel. Municipal and industrial effluents resulted in particularly high concentrations along the shore, at or below the discharge outlets. These elevated levels continued far downstream, because cross-stream mixing took place slowly due to the natural channelization of the river. Specific conductance measurements, being conservative, were used as a tracer to delineate the impacted areas and define the effects of channelization (Figure 13). As can readily be seen, mean sampling trip concentrations (X̄c) for specific conductance in the St. Lawrence River can easily be altered simply by moving stations from one area to another.

Figure 12: Station locations for the St. Lawrence River monitoring program, 1977-1981.

Figure 13 also provides an interesting example of a problem unique to lotic systems, i.e. cross-stream variability. Thus, it is important not only to sample downstream from the effluent source but also to distinguish between the water quality near the shore and that in the main channel, or between one shore and the other. Presently, the 320 kilometre long Cornwall to Quebec section of the St. Lawrence River has been divided into 23 homogeneous zones through the combined application of correspondence analysis and cluster analysis (Lachance et al. 1979). The zonation is based on only three water quality variables (turbidity, inorganic nitrogen and inorganic phosphorus). Even so, it brings the water quality monitoring of the St. Lawrence River a step closer to the rationalization of a statistically sound network (Lachance et al. 1979, Germain and Janson 1984). Once zonation has been established, equation 4 can be modified for a lotic system: Δa, the area of the homogeneous zone within the lacustrine system, becomes Δl, the length (or stretch) of river between the mid-points of each stratum. Δv, the volume, could also be used, but it requires substantially more information than is routinely available from a monitoring program. Again, as in the lacustrine system, the number of samples needed to estimate the mean of each stratum within predetermined confidence limits can be calculated from equation 3. Also as in the lacustrine system, the effect of the temporal gradient is superimposed on the spatial gradient of the lotic system. Flow variation of the river and its tributaries, climatic conditions and variations in the volume of municipal and industrial wastes are only some of the causes of temporal variability, which complicates environmental quality evaluation.

SPECIFIC CONDUCTANCE ZONATION PRODUCED FOR THE ST. LAWRENCE RIVER 1977-1981 (specific conductance classes: ≤225, 226-275, 276-325, ≥326)

Figure 13: Zones formed with specific conductance data on St. Lawrence River, 1977-1981.
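For the lotic case, the only new step is deriving each stratum's length Δl from the positions of the strata along the river. The sketch below assumes that each stratum is represented by a single downstream position in km. Each stratum is credited with the reach between the mid-points to its neighbours, mirroring the mid-point rule used for Δa. The 320 km reach length matches the Cornwall to Quebec section, but the stratum positions and stratum means are hypothetical.

```python
def reach_lengths(stratum_positions_km: list[float],
                  reach_start_km: float, reach_end_km: float) -> list[float]:
    """Length of river (delta l) attributed to each stratum.

    Each stratum is credited with the stretch between the mid-points to
    its upstream and downstream neighbours; the first and last strata
    extend to the ends of the reach. Positions must be sorted downstream.
    """
    bounds = [reach_start_km]
    for upstream, downstream in zip(stratum_positions_km, stratum_positions_km[1:]):
        bounds.append((upstream + downstream) / 2.0)
    bounds.append(reach_end_km)
    return [b - a for a, b in zip(bounds, bounds[1:])]


def length_weighted_mean(stratum_means: list[float], lengths: list[float]) -> float:
    """Equation 4 with surface areas replaced by river lengths."""
    return sum(m * length for m, length in zip(stratum_means, lengths)) / sum(lengths)


# Hypothetical example on a 320 km reach: three strata centred at
# 20, 100 and 250 km downstream, with hypothetical stratum means.
lengths = reach_lengths([20.0, 100.0, 250.0], 0.0, 320.0)
print(lengths)                                               # -> [60.0, 115.0, 145.0]
print(length_weighted_mean([300.0, 250.0, 200.0], lengths))  # -> 236.71875
```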


Conclusions

It is important to note that monitoring activities must be considered as part of an overall management program. The data collected, which are later transformed into information, must not only meet immediate local or regional needs; they must also meet future and national needs (Herricks, 1984).

As a result, network design must become an integrated, co-operative program of all interested parties (clients), to ensure that biased, meaningless or unreliable results are not generated at the local, regional or national level.

The establishment of enough water quality stations to ensure that each stratum in each and every river basin within Canada is sampled sufficiently to establish river mean concentrations at predetermined significance levels is not possible with the present resources available to the Federal Government.

The WQB, however, has embarked on an ambitious National Assessment Program, designed to ensure that scientific results are obtained within all regions of Canada.

In 1982, Cabinet provided the Department of the Environment with the authority and the resources to negotiate Federal-Provincial monitoring agreements. Their purposes are to efficiently implement a comprehensive water quality network, to improve interjurisdictional assessments and to address nationwide aquatic environmental concerns. Development and implementation of the Agreements are based on the premise that both the federal and provincial governments have a responsibility to collect scientifically sound water quality monitoring data.

Through the wise coordination and integration of these monitoring activities, there is a real opportunity to use existing resources effectively and to provide both federal and provincial governments with a comprehensive picture of water quality conditions (Haffner, in press). The National Water Quality Assessment Program (Haffner, in press) consists of three interdependent network concepts:

i) a National Index Network, to provide baseline data, to establish long-term trends, and to act as an early warning system for hitherto unknown problems at the basin, regional or national level;
ii) a Recurrent River Basin Network, to determine sources and areas of impact; to identify existing or developing water quality concerns; to determine station location, sample frequency and parameter lists; and to establish river basin specific water quality objectives;
iii) a Special Studies Network, to conduct in-depth studies at a local, regional or national scale to address priority issues.

The specific technical details for the three networks will be co-ordinated by Regional and Headquarters federal staff and Provincial staff, through a Federal-Provincial Task Force. The Regional federal member will be responsible for operation of the networks, in cooperation with the Provincial member. The Federal Headquarters member will be responsible for the overall compatibility and co-ordination of the program with the nine other Federal-Provincial Task Forces, to ensure a national perspective. Headquarters will also be responsible for the storage and maintenance of a centralized data system composed of the various federal and provincial data sets. The National Laboratory in Burlington will ensure compatible analytical results.

A wealth of information exists on how to design a specific monitoring network or research study that provides scientifically sound information to meet a set objective on a specific river reach. Unfortunately, the same cannot be said about the establishment of large scale monitoring programs. Historically, national monitoring networks have consisted of an assemblage of existing independent programs, each designed to provide the information base required by a local or regional managerial group. The data files from these independent programs have all been stored and have thereby formed a national data archive, but not necessarily a national water quality data bank. This approach has obvious weaknesses with regard to a national program. Therefore, it is of paramount importance that federal and provincial managers responsible for water quality make a concerted effort to coordinate all monitoring components in a bilateral, efficient and cost-effective manner. Without this cooperation, Canada's National Water Quality Monitoring Program will be reduced to the parable of the elephant and the blind men.

The Blind Men and the Elephant

It was six men of Indostan
To learning much inclined,
Who went to see the Elephant
(Though all of them were blind),
That each by observation
Might satisfy his mind.

The First approached the Elephant,
And happening to fall
Against his broad and sturdy side,
At once began to bawl:
"God bless me! but the Elephant
Is very like a wall!"

The Second, feeling of the tusk,
Cried, "Ho! what have we here
So very round and smooth and sharp?
To me 'tis mighty clear
This wonder of an Elephant
Is very like a spear!"

The Third approached the animal,
And happening to take
The squirming trunk within his hands,
Thus boldly up and spake:
"I see," quoth he, "the Elephant
Is very like a Snake!"

The Fourth reached out an eager hand,
And felt about the knee.
"What most this wondrous beast is like
Is mighty plain," quoth he:
"'Tis clear enough the Elephant
Is very like a tree!"

The Fifth, who chanced to touch the ear,
Said: "E'en the blindest man
Can tell what this resembles most;
Deny the fact who can,
This marvel of an Elephant
Is very like a fan!"

The Sixth no sooner had begun
About the beast to grope,
Than, seizing on the swinging tail
That fell within his scope,
"I see," quoth he, "the Elephant
Is very like a rope!"

And so these men of Indostan
Disputed loud and long,
Each in his own opinion
Exceeding stiff and strong,
Though each was partly in the right,
And all were in the wrong!

by J.G. Saxe (1816-1887)

REFERENCES

Beeton, A.M., and Edmondson, W.T., 1972. The eutrophication problem. J. Fish. Res. Board Can. 29:673-682.

Csanady, G.T., 1970. Coastal entrapment in Lake Huron. In Proceedings 5th International Conference on Water Pollution Research 3:1-7. New York, Pergamon Press.

Dobson, H.F.H., 1981. Trophic conditions and trends in the Laurentian Great Lakes. W.H.O. Water Quality Bulletin 6:146-151, 158 & 160.

Dunnette, D.A., 1980. Observed frequency optimization using a water quality index. Journal WPCF 52:2807-2811.

El-Shaarawi, A.H., and Shah, K.R., 1978. Statistical procedures for classification of a lake. Inland Waters Directorate, DOE, Scientific Series No. 86.

Germain, A., and Janson, M., 1984. Qualité des eaux du fleuve Saint-Laurent de Cornwall à Québec (1977-1981). IWD publication.

Green, R.H., 1979. Sampling Design and Statistical Methods for Environmental Biologists. John Wiley & Sons, Inc. 257 pp.

Haffner, G.D. (in press). Water Quality Branch Strategy for Assessments of the Quality of Aquatic Environments. Water Quality Branch, Inland Waters Directorate, Scientific Series.

Herricks, E.E., 1984. Aspects of monitoring in river basin management. Wat. Sci. Tech. 16:259-274.

Kwiatkowski, R.E., 1978. Scenario for an ongoing chlorophyll surveillance plan on Lake Ontario for non-intensive sampling years. J. Great Lakes Res. 4:19-26.

Kwiatkowski, R.E., 1985. The importance of temporal availability to the design of large lake water quality networks. Accepted for publication, J. Great Lakes Res. 11:462-477.

Kwiatkowski, R.E., and Neilson, M.A.T., 1983. Lake Ontario surveillance data, 1968-1980. Inland Waters Directorate, DOE, Technical Bulletin No. 126.

Lachance, M., Bobée, B., and Gouin, D., 1979. Characterization of the water quality in the Saint Lawrence River: Determination of homogeneous zones by correspondence analysis. Water Resources Research 15:1451-1462.

Mandel, J., 1964. The Statistical Analysis of Experimental Data. John Wiley and Sons, Inc.

Nelson, J.D., and Ward, R.C., 1981. Statistical considerations and sampling techniques for ground-water quality monitoring. Ground Water 19:617-625.

Rodgers, G.K., 1965. The thermal bar in the Laurentian Great Lakes. Univ. Mich., Great Lakes Research Division, Ann Arbor, Mich., Publ. No. 13:358-363.

St. Lawrence River Study Committee, 1976. Rapport du Comité d'étude sur le fleuve Saint-Laurent. Environnement Canada et le Service de la protection de l'environnement du Québec. L'éditeur officiel du Québec, 293 pages. English version entitled: Final Report, St. Lawrence River Study Committee, 209 pages.

Ward, R.C., Loftis, J.C., Nielsen, K.S., and Anderson, R.D., 1979. Statistical evaluation of sampling frequencies in monitoring networks. Journal WPCF 51:2292-2300.

Whitlow, S., and Lamb, M., 1983. NAQUADAT: Guide to interactive retrieval. Inland Waters Directorate, Water Quality Branch, Ottawa, Canada. 63 pp.

DETERMINATION OF WATER QUALITY ZONATION IN LAKE ONTARIO USING MULTIVARIATE TECHNIQUES

M.A. NEILSON AND R.J.J. STEVENS

Water Quality Branch, Ontario Region, Inland Waters Directorate, 867 Lakeshore Road, P.O. Box 5050, Burlington, Ontario, L7R 4A6

The surface water quality characteristics of Lake Ontario were studied during 29 cruises conducted on a monthly basis throughout 1977, 1981 and 1982. El-Shaarawi and Shah's (1978) classification procedure was first used to reduce each year's multi-cruise information (each cruise sampling 94 stations for 14 parameters) to a single value (T) for each station and year. Principal components analysis was then applied to these T-values, reducing the multiple parameter list to 3 factor scores. Ward's clustering procedure then grouped together stations that demonstrated similar properties according to their factor scores. These groups, or zones, were then validated using discriminant analysis.

INTRODUCTION

The principal objectives of water quality monitoring in Lake Ontario are to determine long-term trends in a variety of parameters and to examine how the interrelationships of variables both affect and respond to observed changes. El-Shaarawi and Kwiatkowski (1977) examined the relative magnitude of both spatial and seasonal variability for physical (temperature, transparency) and biochemical (particulate organic carbon and nitrogen, chlorophyll a) parameters in the surface waters of Lake Ontario and found that these parameters exhibited highly significant (p < 0.01) spatial gradients. While their analysis permitted the determination of distinct water masses in the lake, it was limited in that it produced a distinct zonation pattern for each parameter, thereby complicating any attempt to interrelate the parameters.


The need to consider water mass distributions from a multivariate perspective was recognized by Moll et al. (1985), who applied principal components and cluster analyses to a multivariate data set for Lake Huron. The lake was regionalized on a cruise-by-cruise basis because it was expected that homogeneous water masses would not have the same areal distributions throughout the year. Analysis of the spatial distribution of water quality parameters in Lake Ontario supported this (Neilson and Stevens, in press); however, cruise-by-cruise zonation complicates efforts to determine trends in time as well as seasonal changes within any given region. A fixed station design and, consequently, a fixed zonation pattern has been demonstrated to be the most suitable for determining long-term trends in water quality parameters, as it eliminates the confounding influences of station numbers and placement (El-Shaarawi 1982; van Belle and Hughes 1983). This point is illustrated by the controversy surrounding the determination of trends in hypolimnetic oxygen deficit in Lake Erie (Charlton 1980; Barica 1982; Anderson et al. 1984; Rosa and Burns 1985), much of which is attributable to how the investigators defined a homogeneous water mass when employing a variable station design. Consequently, to achieve the stated objectives, it was necessary to derive a zonation pattern that would divide the lake into limnologically distinct water masses which, at the same time, would be applicable to all cruises in all years, thereby maintaining a fixed station design.

Confronted with similar objectives, Leach (1980) used factor and cluster analyses to determine distinct water masses in Lake St. Clair. The standardized results of biweekly sampling at 14 stations were used to derive the zones, and these zones were then superimposed on the data from other years. A similar but slightly modified approach was adopted in the current study. Because of the size of the database (30,000 data points), and to compensate for seasonality, it was necessary, prior to the application of principal components and cluster analyses, to reduce the results for each parameter (29 sampling events in 3 years) to normalized, non-dimensional estimates of the spatial effects associated with each of the 94 stations sampled. The resultant zonation pattern, validated by use of discriminant analysis, is to be used for future trend analysis and in the development of a more efficient sampling design.


METHODS

During 1977, 1981 and 1982, lakewide cruises were conducted on Lake Ontario at approximately monthly intervals, sampling the 94 stations illustrated in Figure 2. Nutrient and major ion samples were collected at multiple discrete depths using a modified Rosette sampler; however, only values from the 1 m depth were used in this study. Chlorophyll a (corrected for phaeopigments), CHLA, particulate organic carbon, POC, and total particulate nitrogen, TPN, samples were collected using a 0-20 m integrating sampler designed by the Engineering Section at the Canada Centre for Inland Waters as per Schroeder (1969). Temperature, dissolved oxygen, DO2, and specific conductance were measured on site (Carew and Williams 1975). Ammonia, NH3, and nitrate-plus-nitrite, NO3, samples, filtered through 0.45 µm Millipore filters, were also analyzed on board ship (El-Shaarawi and Neilson 1984). Samples for total phosphorus, TP, filtered chloride, CL, and sulphate, SO4, total filtered nitrogen, TFN, alkalinity, ALK, and soluble reactive silica, SRS, were returned to the Water Quality Branch laboratory in Burlington for analysis. Details of preservation and analytical methodology can be found in the Analytical Methods Manual (Environment Canada 1979).

Each of the fourteen parameters sampled formed a (94 × 9) matrix in 1977 (stations × cruises) and a (94 × 10) matrix in 1981 and 1982¹. A regression model, developed by El-Shaarawi and Shah (1978), was applied to each parameter matrix. The model transforms the original observations $y_{ij}$, corresponding to the $j$th station during the $i$th cruise, according to the equation (Box and Cox 1964):

$$
Z_{ij} =
\begin{cases}
\dfrac{y_{ij}^{\lambda} - 1}{\lambda}, & \lambda \neq 0 \\
\ln y_{ij}, & \lambda = 0
\end{cases}
\tag{1}
$$
The method of maximum likelihood is used to estimate the value of $\lambda$ that best approximates a normal distribution of the variable $Z_{ij}$. The mean value of $Z_{ij}$ can be resolved into three additive components, $E(Z_{ij}) = \mu + a_i + b_j$, where $\mu$ is the general mean, $a_i$ the effect due to the $i$th cruise and $b_j$ the effect due to the $j$th sampling station. The model then tests the null hypothesis of no differences between stations using Fisher's F-test (i.e., $H_0: b_1 = b_2 = \dots = b_{94} = 0$).
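A minimal sketch of the transformation step follows, in Python with NumPy. It assumes a complete (cruises × stations) matrix of strictly positive values with no missing cells, and it searches a coarse grid of $\lambda$ values in steps of 0.5, the resolution of the estimates reported in Table 1. The function names are hypothetical, and $\lambda$ is chosen here by maximizing the standard Box-Cox profile log-likelihood of the additive cruise-plus-station model, which may differ in detail from the El-Shaarawi and Shah (1978) procedure.

```python
import numpy as np


def boxcox(y, lam):
    """Box-Cox transformation: (y**lam - 1) / lam, or ln(y) when lam == 0."""
    y = np.asarray(y, dtype=float)
    if np.isclose(lam, 0.0):
        return np.log(y)
    return (y ** lam - 1.0) / lam


def additive_residual_ss(z):
    """Residual sum of squares after fitting z_ij = mu + a_i + b_j
    to a complete (cruises x stations) matrix."""
    mu = z.mean()
    a = z.mean(axis=1, keepdims=True) - mu  # cruise effects a_i
    b = z.mean(axis=0, keepdims=True) - mu  # station effects b_j
    return float(((z - mu - a - b) ** 2).sum())


def profile_loglik(y, lam):
    """Box-Cox profile log-likelihood, up to an additive constant.
    The (lam - 1) * sum(ln y) term is the Jacobian of the transformation."""
    n = y.size
    sigma2 = additive_residual_ss(boxcox(y, lam)) / n
    return -0.5 * n * np.log(sigma2) + (lam - 1.0) * np.log(y).sum()


def choose_lambda(y, grid=np.arange(-1.0, 3.51, 0.5)):
    """Return the grid value of lambda with the largest profile likelihood."""
    y = np.asarray(y, dtype=float)
    scores = [profile_loglik(y, lam) for lam in grid]
    return float(grid[int(np.argmax(scores))])
```

For a matrix generated as exp(2 + a_i + b_j + small noise), that is, data that are additive on the log scale, `choose_lambda` should pick $\lambda = 0$, the logarithmic case of the transformation.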

¹ Due to logistical constraints, not all stations were sampled during every cruise.

Although the model then continues, constructing statistically homogeneous zones for each of the parameters (El-Shaarawi and Kwiatkowski 1977; Kwiatkowski 1980, 1984; Neilson 1983), the standardized estimates of spatial effect (T-values), which are calculated for each station and used to construct the zones, were the desired end product of this procedure.
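The station-effect test and the reduction to T-values can be sketched as below. The sketch assumes the matrix is complete and already transformed, with rows as cruises and columns as stations, and it takes the T-value to be each estimated station effect divided by its standard error under the additive model. That standardization is an illustrative assumption; the exact definition used by El-Shaarawi and Shah (1978) is not reproduced here, and the function name is hypothetical.

```python
import numpy as np


def station_effects(z):
    """Fit z_ij = mu + a_i + b_j + e_ij (rows = cruises, columns = stations).

    Returns the F ratio for H0: b_1 = ... = b_p = 0, its degrees of freedom,
    and the station effects divided by their standard errors (hypothetical
    'T-values')."""
    z = np.asarray(z, dtype=float)
    c, p = z.shape                      # c cruises, p stations
    mu = z.mean()
    a = z.mean(axis=1) - mu             # cruise effects a_i
    b = z.mean(axis=0) - mu             # station effects b_j
    resid = z - mu - a[:, None] - b[None, :]

    df_station, df_resid = p - 1, (c - 1) * (p - 1)
    ms_station = c * np.sum(b ** 2) / df_station
    ms_resid = np.sum(resid ** 2) / df_resid
    f_ratio = ms_station / ms_resid

    # In a balanced two-way layout, Var(b_j estimate) = sigma^2 (p - 1) / (c p)
    se_b = np.sqrt(ms_resid * (p - 1) / (c * p))
    return f_ratio, (df_station, df_resid), b / se_b
```

Applied to each of the fourteen transformed parameter matrices, the third return value would supply one column of a stations × parameters matrix; a p-value for the F ratio could be obtained from `scipy.stats.f.sf(f_ratio, df_station, df_resid)`.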

The fourteen parameter matrices for each year were thus reduced, each station's multiple-cruise information having been assimilated into a single spatial estimate (T-value). The net result was a (94 × 14) (stations × T-values) matrix for each year. Principal components analysis was then applied to the T-values, retaining components with an eigenvalue of at least 1.0 and employing varimax rotation, which centres on simplifying the columns (factors) of the factor matrix of T-values by maximizing the variance of the squared loadings in each column (Nie et al. 1975).
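The component extraction and rotation can be sketched in NumPy as follows. This is an assumed, simplified version: components are extracted from the correlation matrix of the (stations × T-values) matrix, those with eigenvalues of at least 1.0 are kept, and the raw varimax criterion is maximized with the usual SVD-based iteration. Statistical packages commonly apply Kaiser row normalization before rotating, which this sketch omits; the function names are hypothetical.

```python
import numpy as np


def pca_kaiser(t_matrix, min_eigenvalue=1.0):
    """Principal components of the correlation matrix of a (stations x
    parameters) matrix, keeping components with eigenvalue >= min_eigenvalue.
    Returns unrotated loadings and percent of total variance per component."""
    r = np.corrcoef(np.asarray(t_matrix, dtype=float), rowvar=False)
    eigvals, eigvecs = np.linalg.eigh(r)            # ascending order
    order = np.argsort(eigvals)[::-1]
    eigvals, eigvecs = eigvals[order], eigvecs[:, order]
    keep = eigvals >= min_eigenvalue
    loadings = eigvecs[:, keep] * np.sqrt(eigvals[keep])
    pct_explained = 100.0 * eigvals[keep] / eigvals.sum()
    return loadings, pct_explained


def varimax(loadings, max_iter=100, tol=1e-8):
    """Raw varimax rotation: maximizes the variance of the squared loadings
    within each column of the loading matrix."""
    p, k = loadings.shape
    rotation = np.eye(k)
    criterion = 0.0
    for _ in range(max_iter):
        rotated = loadings @ rotation
        target = rotated ** 3 - rotated @ np.diag(np.sum(rotated ** 2, axis=0)) / p
        u, s, vt = np.linalg.svd(loadings.T @ target)
        rotation = u @ vt
        new_criterion = s.sum()
        if new_criterion < criterion * (1.0 + tol):
            break
        criterion = new_criterion
    return loadings @ rotation
```

Because the rotation is orthogonal, each parameter's communality (row sum of squared loadings) is unchanged by varimax, and the cumulative `pct_explained` corresponds to the kind of percentages reported for the three years.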

From the rotated factor pattern matrix, factor scores (F) were calculated, for each factor, as

$$
F = \sum_{k=1}^{14} f_k t_k \tag{2}
$$

where $t_k$ are the standardized T-values and $f_k$ are the factor score coefficients for each parameter $k$. The factor scores produced are composite scales representing the theoretical dimensions associated with the respective factors.
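The factor score calculation, a weighted sum of the standardized T-values $t_k$ with coefficients $f_k$, can be illustrated with the regression method, $B = R^{-1}\Lambda$, where $R$ is the correlation matrix of the T-values and $\Lambda$ the (possibly rotated) loading matrix. The choice of method is an assumption, since the paper does not name it; for principal components it reproduces exact component scores. The function name is hypothetical, and it expects a loading matrix from the principal components step.

```python
import numpy as np


def factor_scores(t_matrix, loadings):
    """Regression-method factor scores.

    t_matrix: (stations x parameters) matrix of T-values.
    loadings: (parameters x factors) loading matrix, possibly rotated.
    Returns (scores, coefficients); column m of `coefficients` holds the
    f_k values that weight the standardized t_k for factor m."""
    x = np.asarray(t_matrix, dtype=float)
    z = (x - x.mean(axis=0)) / x.std(axis=0, ddof=1)  # standardized t_k
    r = np.corrcoef(x, rowvar=False)
    coefficients = np.linalg.solve(r, loadings)       # B = R^{-1} Lambda
    return z @ coefficients, coefficients
```

With principal component loadings, the resulting scores have unit variance and are mutually uncorrelated, which offers a quick check on an implementation.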

The factor scores for each year were subjected to Ward's hierarchical clustering procedure (Ward 1963; Ward and Hook 1963) to examine individual segmentation schemes. Ward's procedure was then applied to the combined factor scores to derive a composite zone map. Hierarchical methods do not guarantee an optimal solution in terms of the clustering criterion, as the clusters formed do not consider all possible combinations of the data (Anderberg 1973).
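The agglomeration rule behind Ward's procedure can be sketched as follows: at each step, merge the pair of clusters whose union gives the smallest increase in total within-cluster sum of squares, which for clusters $A$ and $B$ equals $\frac{n_A n_B}{n_A + n_B}\lVert \bar{x}_A - \bar{x}_B \rVert^2$. This naive NumPy version is only illustrative and is adequate for about a hundred stations; in practice a library routine such as `scipy.cluster.hierarchy.linkage(X, method="ward")` would normally be used. The function name is hypothetical, and cutting the tree at a fixed number of clusters is a simplification of the more judgement-based cut described in the results.

```python
import numpy as np


def ward_clusters(x, n_clusters):
    """Naive agglomerative clustering with Ward's criterion.

    x: (stations x factor scores) matrix. Returns integer labels
    0..n_clusters-1, one per station."""
    x = np.asarray(x, dtype=float)
    clusters = {i: [i] for i in range(len(x))}   # cluster id -> member rows

    def merge_cost(a, b):
        # Increase in within-cluster sum of squares if a and b are merged
        na, nb = len(clusters[a]), len(clusters[b])
        diff = x[clusters[a]].mean(axis=0) - x[clusters[b]].mean(axis=0)
        return na * nb / (na + nb) * float(diff @ diff)

    while len(clusters) > n_clusters:
        ids = list(clusters)
        _, a, b = min((merge_cost(a, b), a, b)
                      for i, a in enumerate(ids) for b in ids[i + 1:])
        clusters[a].extend(clusters.pop(b))

    labels = np.empty(len(x), dtype=int)
    for label, members in enumerate(clusters.values()):
        labels[members] = label
    return labels
```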

Thus, the results of the Ward clustering procedure on the combined factor score matrix were employed as a trial composite zone map, and discriminant analysis was applied. Discriminant analysis (Nie et al. 1975) compares predicted group membership with actual group membership, empirically measuring the success in discrimination by observing the proportion of correct classifications. The composite zone map was then applied to the 1977, 1981 and 1982 individual factor score matrices, and discriminant analysis was again conducted to determine the appropriateness of the composite zone map.
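A minimal sketch of the classification step is given below, assuming classical linear discriminant analysis: a pooled within-zone covariance matrix, equal prior probabilities for all zones, and resubstitution (the same stations are used to build and to test the rule, which tends to give optimistic success rates). These settings are assumptions, not necessarily those used with the Nie et al. (1975) package, and the function name is hypothetical. Besides the percentage correctly classified, the sketch returns each station's most and second most probable zone.

```python
import numpy as np


def lda_classify(x, zones):
    """Linear discriminant classification with pooled covariance and equal priors.

    Returns (percent_correct, best_zone, second_zone, posteriors)."""
    x = np.asarray(x, dtype=float)
    zones = np.asarray(zones)
    labels = np.unique(zones)                        # sorted zone labels
    means = np.array([x[zones == k].mean(axis=0) for k in labels])
    resid = x - means[np.searchsorted(labels, zones)]
    pooled = resid.T @ resid / (len(x) - len(labels))
    inv = np.linalg.inv(pooled)

    # Linear discriminant score for each station (rows) and zone (columns)
    scores = x @ inv @ means.T - 0.5 * np.sum((means @ inv) * means, axis=1)
    scores -= scores.max(axis=1, keepdims=True)      # numerical stability
    post = np.exp(scores)
    post /= post.sum(axis=1, keepdims=True)          # posterior probabilities

    order = np.argsort(post, axis=1)
    best, second = labels[order[:, -1]], labels[order[:, -2]]
    percent_correct = 100.0 * np.mean(best == zones)
    return percent_correct, best, second, post
```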


To evaluate the efficiency of the proposed zonation pattern in reducing the spatial variability for each parameter, the following formula from El-Shaarawi (1984) was used:

$$
EC = \left(1 - \frac{TS_{\cdot}}{TS}\right) \times 100 \tag{3}
$$

where $TS_{\cdot}$ is the sum of parameter variabilities exhibited during a cruise in each zone:

$$
TS_{\cdot} = \sum_{k} \sum_{i=1}^{n_k} \left(X_{ik} - \bar{X}_k\right)^2 \tag{4}
$$

with $n_k$ = number of stations sampled in zone $k$, $X_{ik}$ the value at the $i$th station of zone $k$ and $\bar{X}_k$ the zone mean, and $TS$ is the total cruise variability for that parameter:

$$
TS = \sum_{i=1}^{N} \left(X_i - \bar{X}\right)^2 \tag{5}
$$

with $N$ = total number of stations sampled.

RESULTS AND DISCUSSION

Table 1 presents the $\lambda$ values used to transform each of the parameters. For most parameters, the necessary transformation varied from year to year. For example, the biomass estimator parameters, POC, TPN and CHLA, were log-transformed ($\lambda = 0$) in 1977 and 1981, while in 1982 a square-root transformation ($\lambda = 0.5$) was required. Frequency and timing of sampling, especially with respect to such phenomena as thermal bar formation, stratification and upwelling, affect the distribution of the data and, consequently, $\lambda$. For example, the maximum likelihood estimates of $\lambda$ for CL were 3.0, 0.5 and 0.0 in 1977, 1981 and 1982, respectively. In 1977, CL was sampled during only 5 cruises (March-June and September), as opposed to 10 cruises during 1981 and 1982. The respective yearly CL means and associated variances, $\sigma^2$, for the three years were 27.33 ($\sigma^2$ = 2.99), 26.06 ($\sigma^2$ = 27.10) and 25.97 ($\sigma^2$ = 13.30).

Having transformed the data, the variance ratio, F, was calculated. For all years, significant spatial differences at the 5% level were found in all parameters. Although the differences were of the same order of magnitude for all parameters considered, conductivity showed the greatest spatial heterogeneity. This provides evidence of real differences between stations, suggesting that the lake should be considered as more than a single homogeneous zone.


TABLE 1. Maximum likelihood estimates for $\lambda$

PARAMETER 1977 1981 1982

Temperature Conductivity

0.5 1.0 0.5 3.0

0.5 0.5 1.0 3.0 0.5 3.0

0.5 0.5 2.5 3.0

DO2

ALK CL SO4 POC TPN

3.0 3.0 0.0 0.0

CHLA NO3

0.0

TFN

0.0

NH3 TP SRS

0.0 0.0

1.0

0.5

0.0 0.0 0.0

1.0 0.0 0.0 -1.0 0.0

0.0 1.0

0.5 0.5 3.5 1.5 1.0 0.0 -1.0 0.0

Principal components analysis was applied to these T-values and yielded, for each year, 3 mutually orthogonal factors that best accounted for the variance exhibited by the data. The cumulative percent variability explained by the respective factors was 83, 82 and 80 for 1977, 1981 and 1982. In each case, more than half the explained variability could be accounted for by the first factor. Table 2 presents the factor score coefficient matrices for each year. As expected, the biomass estimator parameters, POC, TPN and CHLA, always loaded highly on the same factor, usually in conjunction with temperature. Similarly, the major ions (ALK, SO4 and CL) and conductivity scored together on a second factor. Figure 1 is a graphical presentation of the rotated orthogonal factors for 1982. Note the high degree of separation between the parameter clusters, denoting the lack of correlation between them. Dendrograms and the associated approximate zonation (clustering) patterns, resulting from application of Ward's clustering procedure to each year's factor scores, are presented in Figures 2, 3 and 4. Note that the zonation patterns described on the station maps are only approximations as, in some cases, a station was clustered with a zone to


TABLE 2. Factor score coefficient matrices of the 3 factors (F1, F2, F3) for 1977, 1981 and 1982.

Temp.

-.306

Conduct,

.912 -

F2

-

.848

-.039 -.469

F3 .240

F1 .655

.667 .176

-.202

.873 - -.083

-.129

-.235

.891

.861 - -.316

-.181 -.108

.931

-

.894

.275

- -.165 .937

.904

.250

.940

-

CHLA

NO3

.294

TFN

.136

NH3

-.150

-.608

TP

-.229

.222 .228

SRS

-.695 -.483

-

.893

-.814 -

.043 .321 .659

-

.006 -.080

-.151

-.119

.914 - -.068 -.202 -.009 .305 -.026

-.126

-.216

.E .908 .949 -

.028 -090

.926 - -.103

-.010

.951 - -.076

-.045

-

.854

-.117

-.084

.093

.E

.886

.131

.072

.945

.918 _ .

-.073

-

.813 -

.723

-.105

.288

.676

-.166

.002

-

-715

-.411

.288

.735

-.311

.169

-.512

.735

.163

-.638

.525

.825 .251

-.158

[Fig. 1: plot of the rotated factors (axis label: Factor 3). Parameter key: 1. Temp., 2. Cond., 3. TP, 4. DO2, 6. TPN, 7. POC, 8. CHLA, 9. ALK, 10. NH3, 11. SO4, 12. NO3, 13. TFN, 14. SRS]

-

.312

CL

-.195 -.202

-.424

-.458 -.214

-.444 -.007

TPN

- -.300 .728

.151 .796 - -.319

-.221 -.167

- .240

F3 .211

DO2 ALK

.909 -

.755 -

-.164

F2 -935

.583 .833 - -.162

-.177

-.535

F3

-.159

-.135

SO4

F2.

F1

-.168

-.217

POC

1982

1981

1977 F1

Fig. 1. Rotated factor score coefficients for 1982 parameters.


Fig. 2. Dendrogram and resultant zonation pattern based on 1977 factor scores (dendrogram axis: Station).

Fig. 3. Dendrogram and resultant zonation pattern based on 1981 factor scores.

Fig. 4. Dendrogram and resultant zonation pattern based on 1982 factor scores.

which, geographically, it could not belong. For example, although station 64 was clustered into zone 5 in 1977 (Figure 2), geographically it was necessary to include it with zone 6. The final definition of the clusters was somewhat subjectively determined, as the stations being grouped have fixed geographical locations. Consequently, it was necessary to continue the clustering procedure until logical geographic divisions could be formed.

For each year, a six-cluster structure proved the most satisfactory. Although, with the exception of zone 1, the boundaries of the zones were not static from one year to the next, a general pattern of distinct water masses was evident. Zone 1 defined the shallow northeastern region of the lake, the Kingston Basin, while the deep offshore waters were separated into an eastern (zone 6) and a western (zone 5) component. Zone 2 formed a transitional zone between the Kingston Basin and midlake and also encompassed the eastern shoreline. The southern and western nearshore regions emerged as 2 distinct clusters, zones 3 and 4, respectively. Cluster analysis also identified several stations with water quality characteristic of nearshore inputs as opposed to the whole lake (refer to zonation maps). These stations were thus designated as reflecting point sources and excluded from the zones.

Because the 3 segmentation schemes exhibited similar patterns in the clustering of the stations, all nine factor scores (3 per year) were combined in an attempt to produce a composite zone map. When Ward's procedure was applied to the combined factor scores, stations 8, 90, 97, 76, 71 and 86 were identified as representative of nearshore source inputs and thus isolated, while the remaining 87 stations were merged into 6 clusters, hence defining the composite zone map shown in Figure 5.

In an investigation of the spatial distribution of nutrients and particulate organic matter throughout 1981 and 1982 in Lake Ontario, Neilson and Stevens (in press) provided supporting evidence for this regionalization of the lake. They found that the Kingston Basin, the Toronto-Hamilton-Niagara region and the southern shore between the Niagara and Genesee rivers demonstrated distinctive water quality characteristics. These areas correspond to our zones 1, 4 and 3, respectively. Furthermore, a west-to-east decreasing concentration gradient was observed for a variety of parameters by Neilson and Stevens (in press) and in previous studies (Shiomi and Chawla 1970; Gachter et al. 1974), which would explain the division of the midlake into a west (zone 5) and an east (zone 6) component.

Fig. 5. Dendrogram and resultant zonation pattern when factor scores from 1977, 1981 and 1982 were combined to form composite zones.

Because the segmentation scheme derived from the combined factor scores (i.e., the composite zone map) was determined in consideration of geographical constraints, it was validated using discriminant analysis. The stations were assigned to their respective zones and, using the factor scores as discriminating variables, the percentage of stations correctly classified was calculated. Discriminant analysis showed that the composite zone map was indeed the optimal regionalization, as 100% of the stations were correctly classified. Table 3 lists the standardized canonical discriminant function coefficients for the four significant functions, as determined by Wilks' lambda, used in the analysis. These 4 functions were able to account for 99% of the variability in the factor scores. The first factor scores of each year, and the third factor score of 1977, were the primary determinants of functions 1, 3 and 4, while function 2 appeared to be a combination of many factor scores. Cross-referencing the results of the principal components analysis (Table 2) revealed that discriminant function 1 primarily represented 1981 POC, TPN and CHLA distributions, function 3 was composed of 1977 conductivity and ion and

TABLE 3. Standardized canonical discriminant function coefficients for the 4 significant functions resulting from discriminant analysis of the combined factor scores (FS).

FUNCTION 1 1977

1981

1982

FS1 FS2 FS3 FS1 FS2 FS3

.34982

FUNCTION 2

FUNCTION 3 .97442

-.08233 -.12744

.15742

1.01605

-.16822 .16137 .03854

-.23994 .46603 .32237

.07521 .53193 .26336

-.42521

.27212

.40318 .20713 .35664

-.09143

-.02814 .33614

-.73885 -.04975

FS1

-.13944

FS2

-.01110

FS3

.51483

-.12275 .48973

FUNCTION 4

.72198

-.38744

-.45462

-.

.25323

-.34552 -.27403


1982 POC, TPN and CHLA distributions, while function 4 was almost exclusively reflective of 1977 TP and NH3 levels. Figure 6 is a scatterplot, defined by the first two discriminant functions, showing the separation amongst zones and indicating that the factor scores were successful discriminators.
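The quantities behind Table 3 and Figure 6 can be sketched with a textbook formulation of canonical discriminant analysis; this is an assumption, not the package routine the authors used, and the function name is hypothetical. The functions solve $B v = \gamma W v$, where $W$ and $B$ are the within- and between-zone sums-of-squares-and-cross-products matrices. Raw coefficients are scaled so that each function has unit pooled within-zone variance, standardized coefficients multiply them by each variable's pooled within-zone standard deviation, and Bartlett's chi-square approximation to Wilks' lambda tests functions $k+1, \dots, m$ after the first $k$ are removed.

```python
import numpy as np


def canonical_discriminants(x, zones):
    """Canonical discriminant functions for (stations x variables) data.

    Returns eigenvalues, Wilks' lambda and Bartlett chi-square (with df) for
    testing functions k+1..m after removing the first k, raw coefficients and
    standardized coefficients (one column per function)."""
    x = np.asarray(x, dtype=float)
    zones = np.asarray(zones)
    labels = np.unique(zones)
    n, p = x.shape
    g = len(labels)
    grand = x.mean(axis=0)
    w = np.zeros((p, p))                     # within-zone SSCP
    b = np.zeros((p, p))                     # between-zone SSCP
    for k in labels:
        xk = x[zones == k]
        dev = xk - xk.mean(axis=0)
        w += dev.T @ dev
        d = (xk.mean(axis=0) - grand)[:, None]
        b += len(xk) * (d @ d.T)

    # Turn B v = gamma W v into a symmetric problem using W = L L^T
    l_inv = np.linalg.inv(np.linalg.cholesky(w))
    gammas, u = np.linalg.eigh(l_inv @ b @ l_inv.T)   # ascending order
    m = min(p, g - 1)                                 # number of functions
    gammas = gammas[::-1][:m]
    vecs = (l_inv.T @ u)[:, ::-1][:, :m]              # v^T W v = 1

    raw = vecs * np.sqrt(n - g)              # unit pooled within-zone variance
    standardized = raw * np.sqrt(np.diag(w) / (n - g))[:, None]
    wilks = np.array([np.prod(1.0 / (1.0 + gammas[k:])) for k in range(m)])
    chi2 = -(n - 1 - (p + g) / 2.0) * np.log(wilks)
    df = np.array([(p - k) * (g - k - 1) for k in range(m)])
    return gammas, wilks, chi2, df, raw, standardized
```

The percentage of discriminating variance carried by each function is `100 * gammas / gammas.sum()`, p-values could come from `scipy.stats.chi2.sf(chi2, df)`, and coordinates for a plot like Figure 6 are `(x - x.mean(axis=0)) @ raw[:, :2]`; the sign of each function is arbitrary.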

Fig. 6. Scatterplot defined by the first 2 canonical discriminant functions which resulted from analysis of combined factor scores (horizontal axis: Canonical Discriminant Function 1; vertical axis: Canonical Discriminant Function 2; stations plotted by zone number).

When the composite zone map was then imposed on the factor scores of the individual years, discriminant analysis revealed that 74, 90 and 90 percent of the stations in 1977, 1981 and 1982, respectively, were correctly classified. When a station is designated as misclassified, it demonstrates a higher probability of belonging to a zone other than that to which it was originally assigned. Where the predicted zone (i.e., the group for which the station showed the greatest probability of membership) did not correspond to the actual zone, the second highest probability zone was, in almost all cases, the originally proposed zone. In this study, the greatest percentage of misclassifications occurred between zones 5 and 6. This is not unexpected, given the variation in areal extent of these two midlake zones during the three years (refer to Figures 2, 3 and 4).

Prior to this study, Lake Ontario had been subjectively divided into 17 zones (Figure 7) on the basis of basin geomorphology, location of nearshore inputs and the summer epilimnetic circulation patterns (International Joint Commission 1977). To investigate whether the approach adopted for this study produced a superior zonation pattern, discriminant analysis of the factor scores was repeated using these 17 zones (herein referred to as the IJC zones). Results of this analysis indicated that there was little differentiation between zones, and only 47, 61 and 49 percent of the stations in 1977, 1981 and 1982, respectively, were correctly classified. When the actual group membership was compared with the predicted group membership, IJC zone 1 showed a high percentage of correct classifications. All stations assigned to IJC zones 9, 10, 12 and 15, however, consistently demonstrated higher probabilities of belonging to other zones, suggesting that, based on the chosen discriminating variables, these zones were in fact not distinct water masses. Similarly, only half the stations in IJC zone 17 showed the highest probability of membership for this zone; the remaining stations were categorized with up to 7 other zones.

Fig. 7. Seventeen IJC zones previously used to divide Lake Ontario.

TABLE 4. Percent explained variability (EC) for the (6) composite zones of Lake Ontario.

(a) 1977

Percent explained variation (EC), by cruise dates (month/day):

03/16-03/20 04/12-04/15 05/09-05/13 06/06-06/10 07/18-07/22 08/15-08/19 09/12-09/16 10/10-10/13 11/14-11/17

Parameter

Temperature

58.4

63.3 52.6

40.4

37.2

26.6

10.7

12.6

5.4

45.8 62.3

73.3

Conductivity

50.0

38.1

31.3

31.5

18.1

14.1

44.1

49.9

61.3

6.2

14.4

40.4

36.9

48.1

18.1 52.7

DO2

36.1

24.7

42.2

64.9

ALK

10.3

42.5

28.1

21.2

NO3

41.9

28.7

60.7

61.4

22.9

SRS

30.5

30.1

59.9

73.0

47.2

TP

17.6

17.3

38.4

15.3

TFN

18.5

57.7

48.8

NH3

9.3 38.6

60.8

22.8

21.4

CL

40.9

75.0

51.6

SO4

40.6

60.1 62.3

59.5

47.8

POC

33.2

62.7

56.4

TPN

29.5

65.9

58.0

CHLA

19.6

60.3

03/16-03/20

04/06-04/10

Temperature

71.1

45.6

Conductivity

20.7

16.8

DO2

76.3

44.9

ALK

11.8

35.4 6.0

NO3

73.8

80.9

SRS

64.8

63.8

TP

9.4

TFN

22.9 15.6

(b)

32.7

17.8

13.9

16.7 25.4 40.7

23.3

68.6

41.2 29.1

55.3

16.7

37.1

11.3

39.4

56.2

8.9

34.9

15.1

41.1

56.5

55.4

42.8

47.9

15.2

27.4

61.1

58.6

04/27-05/01

05/19-05/22

06/15-06/19

07/13-07/17

08/10-08/14

10/05-10/09

11/16-11/20

12/07-12/11

53.3

74.0

29.7

16.9

45.5

20.6

56.8

67.9

63.3

43.9 27.9

32.2

25.9

28.9

61.2

41.2

25.6 43.9

28.1 29.1

33.7 56.3

20.1 49.5

-

27.4

9.5 41.9 36.4

40.4

39.7 39.7 34.1

44.0

42.8

1981

NH3

37.8 39.7

CL

53.7

SO4

16.8

29.4

46.9 7.2 50.9

23.9

39.8

-

-

-

37.9 39.8

69.7

66.9

34.2

21.0

69.1 29.0

37.1

25.0

38.4 20.5

-

49.8 29.7

-

POC

62.8

68.6

65.0

74.9

5.0

12.8

21.6

6.7

29.1

55.0

TPN

63.9

58.3

59.2

75.0

6.0

12.0

22.0

1.5

24.1

61.3

CHLA

61.1

55.2

67.2

67.3

4.7

17.1

37.6

6.6

9.9

52.1

03/08-03/12

03/29-04/02

04/26-04/30

05/17-05/21

06/14-06/18

07/12-07/16

08/16-08/20

09/13-09/17

10/12-10/17

11/15-11/19

56.3 29.7

54.3 28.5

41.0 40.1

49.3 33.2

64.6 22.9

56.8 51.7

30.2 42.2

61.0 28.3

19.9 49.7

78.4 39.9

54.6

52.6

62.0

49.7

59.3

13.8

(c)

1982

Temperature Conductivity

DO2

24.4

65.0

37.8

ALK

24.8

45.2

46.5

NO3

5.0

41.2

74.3

SRS

TP

46.4 49.7

59.4 36.0

73.9 43.7

TFN

15.0

53.9

44.6

NH3

46.7 44.7

32.6 72.9

46.7 74.6

POC

16.2 23.6

23.2 55.1

70.3 55.2

TPN

31.2

61.4

CHLA

23.2

69.4

CL

SO4

-

60.1

-

33.8 13.8

25.8

66.0 37.5

47.1

91.0

41.9

50.3 41.1

78.2

26.1

-

41.9 31.3 76.3 66.5 51.2 57.4 39.6 64.0

13.1

36.5 11.3

33.8

27.6

16.1 16.9

7.2

57.9 42.1

36.4

-

-

56.6

53.5

56.3

58.1

48.6

30.4

20.6

32.2

10.5

47.0

48.7

47.2

57.3

33.3

43.5

21.2

11.2

44.4


This indicates that the proposed composite zone map is indeed a more effective scheme for identifying water masses of similar water quality. For each cruise, a measure of the effectiveness of the composite zones in reducing the total spatial variability demonstrated for each parameter, EC, is presented in Table 4. The values of EC lie between 0 and 100; the closer the value is to 100, the more effective the zonation pattern.

For most of the parameters measured, there did not appear to be a direct relationship between time of sampling and EC, although the percent variation explained by the zones was generally higher in spring (April/May). This suggests that the composite zones have optimal application in the spring. Hence, the effect of spatial variability on the accurate determination of trend is greatly reduced by incorporating this zonation scheme into the analysis of spring data.

Having now determined a classification scheme which identifies distinct water masses of similar water quality in Lake Ontario, the next step will be to use these zones to derive the minimum number of stations which need to be sampled, and their distribution over the lake, to obtain the same efficiency as that obtained using the sampling strategy outlined in this study.
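The efficiency measure EC can be computed for one parameter on one cruise as sketched below. The function name is hypothetical, and the sketch assumes that stations not sampled on that cruise have already been dropped, so that `values` and `zones` cover the same N stations.

```python
import numpy as np


def explained_variation(values, zones):
    """EC = (1 - TS_within / TS) * 100 for one parameter on one cruise.

    TS_within is the sum, over zones, of squared deviations from each zone
    mean; TS is the sum of squared deviations from the overall mean."""
    x = np.asarray(values, dtype=float)
    zones = np.asarray(zones)
    ts = np.sum((x - x.mean()) ** 2)
    ts_within = sum(np.sum((x[zones == k] - x[zones == k].mean()) ** 2)
                    for k in np.unique(zones))
    return (1.0 - ts_within / ts) * 100.0
```

For example, six stations with values 1, 2, 3, 7, 8, 9 split into two zones of three give a within-zone sum of squares of 4 and a total of 58, so EC = (1 - 4/58) × 100, about 93.1: almost all of the spread lies between zones. A single zone covering every station gives EC = 0.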

REFERENCES

Anderberg, M.R. 1973. Cluster analysis for applications. Academic Press, New York. 359 p.
Anderson, J.E., A.H. El-Shaarawi, S.R. Esterby and T.E. Unny. 1984. Spatial and temporal variability of dissolved oxygen in Lake Erie, p. 103-130. In A.H. El-Shaarawi (ed.) Statistical assessment of the Great Lakes Surveillance Program, 1966-1981, Lake Erie. Env. Can., IWD Scientific Series No. 136.
Barica, J. 1982. Lake Erie depletion controversy. J. Great Lakes Res. 8(4):719-722.
Box, G.E.P. and D.R. Cox. 1964. An analysis of transformations. J. R. Statist. Soc. B 26:211-243.
Carew, T.J. and D.J. Williams. 1975. Surveillance methodology. Inland Waters Directorate, Technical Bulletin No. 92. 28 p.
Charlton, M.N. 1980. Oxygen depletion in Lake Erie: Has there been any change? Can. J. Fish. Aquat. Sci. 37:72-81.
El-Shaarawi, A.H. 1982. Sampling strategy for estimating bacterial density in large lakes. J. français d'hydrologie 13:171-187.
El-Shaarawi, A.H. 1984. Temporal changes in Lake Erie, p. 27-102. In A.H. El-Shaarawi (ed.) Statistical assessment of the Great Lakes Surveillance Program, 1966-1981, Lake Erie. Environment Canada, IWD Scientific Series No. 136.
El-Shaarawi, A.H. and R.E. Kwiatkowski. 1977. A model to describe the inherent spatial and temporal variability of parameters in Lake Ontario 1974. J. Great Lakes Res. 3(3-4):177-183.
El-Shaarawi, A.H. and M.A. Neilson. 1984. Changes in nutrient levels of lake water stored at 4°C. Can. J. Fish. Aquat. Sci. 41(6):985-988.
El-Shaarawi, A.H. and K.R. Shah. 1978. Statistical procedures for classification of a lake. Inland Waters Directorate Scientific Series No. 86. 9 p.
Environment Canada. 1979. Analytical Methods Manual. Water Quality Branch, Inland Waters Directorate, Ottawa, Canada.
Gachter, R., R.A. Vollenweider and W.A. Glooschenko. 1974. Seasonal variations of temperature and nutrients in the surface waters of lakes Ontario and Erie. J. Fish. Res. Board Can. 31:275-290.
International Joint Commission. 1977. Great Lakes Water Quality Board. Appendix B. Surveillance Subcommittee Report. 110 p.
Kwiatkowski, R.E. 1980. Regionalization of the Upper Great Lakes with respect to surveillance eutrophication data. J. Great Lakes Res. 6:38-46.
Kwiatkowski, R.E. 1984. Comparison of 1980 Lake Huron, Georgian Bay-North Channel surveillance data with historical data. Hydrobiol. 118:255-266.
Leach, J.H. 1980. Limnological sampling intensity in Lake St. Clair in relation to distribution of water masses. J. Great Lakes Res. 6(2):141-145.
Moll, R.A., R. Rossmann, D.G. Rockwell, and W.Y.B. Chang. 1985. Lake Huron intensive survey, 1980. Special report no. 110. Great Lakes Research Division, Great Lakes & Marine Waters Center, University of Michigan, Ann Arbor, Michigan.
Neilson, M.A. 1983. Trace metals in Lake Ontario, 1979. Inland Waters Directorate, Scientific Series No. 133. 13 p.
Neilson, M.A. and R.J.J. Stevens. In press. Vertical and horizontal distribution of nutrients and particulate organic matter in Lake Ontario - 1981, 1982. Can. J. Fish. Aquat. Sci.
Nie, N.H., C.H. Hull, J.G. Jenkins, K. Steinbrenner, and D.H. Bent. 1975. Statistical Package for the Social Sciences. McGraw-Hill, New York.
Rosa, F. and N.M. Burns. 1985. Lake Erie Central Basin oxygen depletion changes from 1929-1980. Environment Canada, NWRI Contribution #85-102.
Shiomi, M.T. and V.K. Chawla. 1970. Nutrients in Lake Ontario. Proc. 13th Conf. Great Lakes Res. 715-732.
Schroeder, R. 1969. Ein summierender Wasserschöpfer. Arch. Hydrobiol. 66:241-243.
van Belle, G. and J.P. Hughes. 1983. Monitoring for water quality: fixed stations versus intensive surveys. J. Water Pollut. Control Fed. 55:400-404.
Ward, J.H., Jr. 1963. Hierarchical grouping to optimize an objective function. J. Amer. Statist. Assoc. 58(301):236-244.
Ward, J.H., Jr. and M.E. Hook. 1963. Application of an hierarchical grouping procedure to a problem of grouping profiles. Educ. and Psychol. Measurement 23(1):69-82.

.

SPATIAL VARIABILITY IN THE WATER QUALITY OF QUEBEC RIVERS

MARC SIMONEAU

Ministere de l'Environnement du Quebec, Direction des releves aquatiques, 3900, rue Marly, 5e etage, Sainte-Foy (Quebec), Canada, G1X 4E4

ABSTRACT
The spatial variability of the water quality of Quebec rivers was studied using data collected over a five-year period (1979-1983). These data, which were obtained through the operation of a monitoring network, were analyzed using multivariate analytical methods. A principal component analysis (PCA) was used to condense the information contained in the data matrix and to identify the physical chemical parameters responsible for the major portion of the among-station variance. Furthermore, a cluster analysis (using the squared Euclidean distance as the distance coefficient and Ward's method as an agglomerative hierarchical clustering algorithm) was performed to reveal the presence of homogeneous water quality regions in the province. The PCA produced two significant principal components which explain 76 percent of the among-station variance. The first axis (51 percent) represents a mineralization and nutrient gradient, whereas the second (25 percent) represents an organic content and color gradient. The cluster analysis revealed the presence of six distinct groups, whose water quality reflects the geology of the different basins and, to various degrees, the anthropogenic activities. One of these groups is composed of twelve problem stations, most of which are found in drainage basins with small surface areas affected by local point-source pollution.

INTRODUCTION
Water quality monitoring of Quebec rivers began in 1967, when the ministere des Richesses Naturelles created a water quality network. As was the case in many other countries, a basic knowledge


of the water quality (and its spatial and temporal evolution) was found to be lacking at the time, and was needed in order to solve the problems caused by the use of water resources in the province of Quebec. Since the early days, the water quality network has undergone many modifications to improve the quality of the collected data and to simplify its operation. Among these were the systematization of the sampling procedure, a change in the preservation method of the samples, a decrease in the delay between sampling and laboratory analyses and, finally, the use of better analytical methods. Over the years, the list of measured parameters became more extensive and, in addition to the major ions and physical parameters, came to include the nutrients and trace metals. Furthermore, the list of sampling sites changed, as new stations were added and old ones discontinued or relocated closer to the mouths of the rivers. The progressive development of the network was brought about by a change in its objectives, which became more precise. The initial goal of obtaining a basic knowledge of water quality was found insufficient and too general, and was therefore replaced. The new goals became the characterization of the water quality of rivers on an annual or seasonal basis; the study of the spatial variability of Quebec rivers; the study of the temporal evolution of the measured parameters and the prediction of their long-term trends; and finally, the creation of an adequate water quality data bank for potential users. The current version of the river water quality network, now operated by the Direction des releves aquatiques du ministere de l'Environnement and in existence since the end of 1978 (Goulet 1979), is the result of the recommendations formulated by Bobee et al. (1977) in their thorough evaluation of the network activities. These authors studied the data collected between 1967 and 1975 and proposed corrective measures.

This paper uses the data collected between January 1979 and December 1983 to study the spatial variability in the water quality of Quebec rivers. The objectives of the study are to identify the parameters responsible for the observed among-station variance, and to detect the presence of homogeneous water quality regions in the province.

STUDY AREA
The river network is composed of 136 stations located in 81 drainage basins, south of latitude 52° (Fig. 1). The sampled rivers come from eight of the ten hydrological regions of the province of Quebec. The choice of the rivers was based on the surface area and the population density of their drainage basins (Goulet 1979). Minimum values were set in both cases in order to eliminate the small basins (< 400 km2) and the nearly uninhabited northern basins (< 500 inhabitants). The presence of polluting industries or mines, and the importance of lumbering and farming in the basins, were also considered in the selection of the rivers. Additional sampling stations were required in areas where socio-economic activities were more intense or dispersed, to better evaluate the effects of these activities and to observe how water quality varies along the different reaches of the same river.

Fig. 1. Position of the sampling stations used in monitoring the water quality of Quebec rivers.

Sampling procedure
The water samples collected at the different stations come from two distinct sources: the observers and the technicians of the Ministry. The observers are inhabitants living close to the sampling stations. They are recruited, trained and paid by the Ministry to collect a water sample fortnightly and to send it to the laboratory where the chemical analyses are performed. This group, which samples 113 stations, collects the major portion of the samples (> 80 percent). Furthermore, observers are asked to report to the Ministry any emergency situation that might arise on the river (spills, fish kills, etc.) so that immediate action may be taken. The rest of the samples are collected by the technicians of the Ministry, on the same rivers sampled by the observers and at exactly the same locations, but on a seasonal basis. The technicians also exclusively sample 23 other stations on a seasonal or monthly basis. In addition to the routine water sample collection, they perform some field measurements and take additional samples for the analysis of particular parameters and for occasional bioassays. The water samples collected by both the observers and the technicians are depth-integrated grab samples. They are obtained by lowering a sampling iron at a constant rate through the water column and retrieving it after the desired depth has been reached. Sampling takes place from a bridge, in the middle of the river bed. The water samples, contained in polyethylene bottles, are kept refrigerated and are sent to the laboratory in an insulated shipping box with ice packs. The samples are usually received by the laboratory within 24 hours.


Laboratory analysis
All the chemical analyses were performed by the laboratory of the ministere de l'Environnement du Quebec (Complexe Scientifique, 2700 rue Einstein, Sainte-Foy, Quebec, G1P 3W8). The analyzed parameters included the major and minor ions, the nutrients, the trace metals and the physical parameters. The complete parameter list and the measurement frequencies are shown in Table 1. The methods used in performing the chemical determinations are described in Longpre et al. (1982).

Data analysis
The raw data matrix used in the present study contained all the measurements obtained for 36 parameters at 134 sampling stations between 1979 and 1983 (Fig. 2). Two stations were removed at the onset of the analysis because they were not sampled over the whole five-year period. This data set was synthesized by computing, for each parameter and station, a median value for this time period. The new data matrix was further reduced, as only twelve parameters were chosen for the subsequent statistical analyses, for reasons discussed later. These variables were calcium, magnesium, chloride, sulfate, iron, total nitrogen, total phosphorus, total organic carbon, tannins and lignins, turbidity, alkalinity and pH. The first analysis performed was a principal component analysis (PCA), using the correlation matrix between the twelve parameters as a starting point.
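The ordination step can be sketched in Python with NumPy. This is a minimal illustration under stated assumptions, not the SAS procedure used in the study: it assumes the input is a stations × variables array of medians (such as the 122 × 12 matrix analyzed here), standardizes each column so that the PCA operates on the correlation matrix, and extracts its eigenvalues and eigenvectors. The function name `pca_correlation` is hypothetical, and the two scaling choices (eigenvectors scaled by the square root of their eigenvalue, scores rescaled to unit variance) are assumptions chosen to match the normalizations described in the Results.

```python
import numpy as np


def pca_correlation(X):
    """PCA of an (n stations x p variables) array on its correlation matrix.

    Returns eigenvalues (descending), proportion of variance explained,
    loadings (eigenvectors scaled by sqrt(eigenvalue)) and unit-variance scores.
    """
    X = np.asarray(X, dtype=float)
    # Standardize each variable (centre, divide by sample std) so that units
    # and magnitudes no longer weigh on the result.
    Z = (X - X.mean(axis=0)) / X.std(axis=0, ddof=1)
    R = np.corrcoef(X, rowvar=False)            # p x p correlation matrix
    eigvals, eigvecs = np.linalg.eigh(R)        # eigh returns ascending order
    order = np.argsort(eigvals)[::-1]           # reorder to descending
    eigvals, eigvecs = eigvals[order], eigvecs[:, order]
    # With standardized data, scaled eigenvectors equal the correlations
    # between each variable and each component.
    loadings = eigvecs * np.sqrt(eigvals)
    # Project the stations onto the components and rescale to unit variance.
    scores = (Z @ eigvecs) / np.sqrt(eigvals)
    explained = eigvals / eigvals.sum()
    return eigvals, explained, loadings, scores


# Hypothetical example: 40 stations x 5 variables, two of them correlated.
rng = np.random.default_rng(0)
X_demo = rng.normal(size=(40, 5))
X_demo[:, 1] += 2.0 * X_demo[:, 0]
eigvals, explained, loadings, scores = pca_correlation(X_demo)
print(np.round(explained, 3))
```

Because the data are standardized, the eigenvalues of the correlation matrix always sum to the number of variables, which is why "eigenvalue equal to or greater than one" is a natural first cut-off for retaining components.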

The correlation matrix (standardized data) was chosen instead of the covariance matrix (centered data) because the parameters selected for the analysis had different magnitudes, ranges and scales of measurement which, if not taken into account, would have given more weight to certain variables due entirely to their respective variances (Legendre and Legendre, 1983; Whitfield, 1983). This particular type of ordination transforms a data set containing n observations (samples) on p variables (physical chemical variables) into a reduced data set containing n observations on k principal components (k < p), in a manner that minimizes the loss of information caused by the reduction (Green, 1979). The PCA thus identifies the parameters accounting for the major portion of the among-station variance. The second analysis used in the study was a clustering procedure, the purpose of which was to produce groups of stations with similar water quality.
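Returning to the PCA, the rules used in the Results to decide how many components to keep (eigenvalues of at least one, and the broken-stick model of Frontier, 1976) and the equilibrium circle of descriptors drawn in Fig. 3 reduce to simple arithmetic. The sketch below assumes the usual broken-stick expectation for component k of p, b_k = (1/p) Σ_{i=k..p} 1/i, and the equilibrium-circle radius sqrt(d/p) of Legendre and Legendre (1983). The function names are hypothetical, and the eigenvalues in the example are invented: they were chosen only so that the first two components explain 51 and 25 percent of the variance, as reported for the Quebec data.

```python
import math


def broken_stick(p):
    """Expected share of total variance for components 1..p under the broken-stick model."""
    return [sum(1.0 / i for i in range(k, p + 1)) / p for k in range(1, p + 1)]


def n_kaiser(eigenvalues):
    """Number of components with eigenvalue >= 1 (correlation-matrix PCA)."""
    return sum(1 for lam in eigenvalues if lam >= 1.0)


def n_broken_stick(eigenvalues):
    """Keep components, in decreasing order, while the observed share of
    variance exceeds the broken-stick expectation."""
    lams = sorted(eigenvalues, reverse=True)
    total = sum(lams)
    n = 0
    for lam, expected in zip(lams, broken_stick(len(lams))):
        if lam / total <= expected:
            break
        n += 1
    return n


def equilibrium_radius(d, p):
    """Radius of the equilibrium circle of descriptors in a d-dimensional reduced space."""
    return math.sqrt(d / p)


# Hypothetical eigenvalues for p = 12 standardized variables (they sum to 12).
eig = [6.12, 3.00, 1.10, 0.60, 0.40, 0.30, 0.20, 0.10, 0.08, 0.05, 0.03, 0.02]
print(n_kaiser(eig), n_broken_stick(eig), round(equilibrium_radius(2, 12), 2))  # 3 2 0.41
```

With these invented eigenvalues the two rules disagree in the same way as in the study: three components pass the eigenvalue-one rule, but the third explains about 9 percent of the variance, less than the roughly 13 percent expected under the broken-stick model, so only two are kept.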

TABLE 1
List of variables measured in the water samples, with their sampling frequencies.

Observers, every 4 weeks (13 per year): pH, alkalinity, color, turbidity, tannins and lignins, fluoride, silica, sulfate, chloride, calcium, magnesium, sodium, potassium, iron, manganese, copper, zinc, lead, cadmium, nickel, chromium, arsenic; carbon (total, inorganic); dissolved nitrogen (Kjeldahl, ammonia, nitrate + nitrite); phosphorus (total dissolved, total particulate).
Observers, every 2 weeks (25 per year): temperature, conductivity.
Technicians, seasonal (monthly for 6 stations): same parameters as above; direct measurements (dissolved oxygen, pH, conductivity, temperature); nonfilterable residues; cyanides; total inorganic phosphorus; aluminum (total, dissolved); bioassays (some stations).
Technicians, occasional: silver, barium, cobalt, lithium, selenium, strontium, other toxicants.

Fig. 2. Diagram showing the steps followed in the data analysis: raw data matrix (36 parameters × 21906 samples) → medians matrix (36 parameters × 134 stations) → data reduction and selection of parameters → medians matrix (12 parameters × 134 stations), which feeds two branches: ordination (PCA: correlation matrix, factor pattern, principal components scores) and cluster analysis (data standardization, squared Euclidean distance, Ward's method); the resulting clusters of stations with similar water quality are superimposed on the principal components scores.

Data were standardized prior to the calculation of the distance coefficient, the squared Euclidean distance. This step was necessary because the Euclidean distance has no maximum value: it increases with the number of parameters selected and is affected by the original scales of the parameters (Legendre and Legendre, 1983). Ward's method was used as the agglomerative hierarchical clustering algorithm. The results of the cluster analysis were then superimposed on the plot of the principal components scores to show more precisely the relationships between the objects (stations). Both the PCA and the cluster analysis were performed using SAS programs (SAS Institute Inc., 1982).
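The clustering step (standardization, squared Euclidean distances, Ward's agglomerative method) can be approximated with SciPy instead of the SAS programs used in the study. This is a sketch under assumptions: `scipy.cluster.hierarchy.linkage` with `method="ward"` applies Ward's minimum-variance criterion to the rows of the data matrix, which should give the same merging order as Ward's method applied to squared Euclidean distances. The helper names, the example data and the decision to cut the tree into a fixed number of groups with `fcluster` are illustrative, not the author's procedure.

```python
import numpy as np
from scipy.cluster.hierarchy import fcluster, linkage
from scipy.spatial.distance import pdist, squareform


def standardize(X):
    """Centre each variable and divide by its sample standard deviation."""
    X = np.asarray(X, dtype=float)
    return (X - X.mean(axis=0)) / X.std(axis=0, ddof=1)


def squared_euclidean(Z):
    """Square matrix of squared Euclidean distances between stations (rows)."""
    return squareform(pdist(Z, metric="sqeuclidean"))


def ward_groups(X, n_groups):
    """Standardize, build a Ward hierarchy and cut it into n_groups clusters."""
    Z = standardize(X)
    tree = linkage(Z, method="ward")   # Ward's minimum-variance criterion
    return fcluster(tree, t=n_groups, criterion="maxclust")


# Hypothetical example: two well-separated sets of five stations, with three
# variables measured on very different scales.
rng = np.random.default_rng(1)
X_demo = np.vstack([
    rng.normal([1.0, 10.0, 100.0], [0.1, 1.0, 10.0], size=(5, 3)),
    rng.normal([3.0, 30.0, 300.0], [0.1, 1.0, 10.0], size=(5, 3)),
])
labels = ward_groups(X_demo, 2)
D = squared_euclidean(standardize(X_demo))
print(labels)
```

Without the `standardize` call, the third variable (values in the hundreds) would dominate every distance, which is the scale effect the text warns about.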


RESULTS AND DISCUSSION
As is often the case with physical chemical variables, most of the parameters chosen for the statistical analyses were found to have skewed concentration distributions over time and over stations. Consequently, we used the median as the estimator of the central tendency of the data, since it is not affected as much as the mean by extremely high values.
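The reduction from individual samples to one value per station and parameter (the medians matrix of Fig. 2), and the reason for preferring the median on skewed data, can be sketched with pandas. The long-format layout (one row per sample, with a `station` column), the column names and the values are hypothetical. `groupby(...).median()` skips missing values, which matters when a parameter was not measured on every visit.

```python
import pandas as pd


def station_medians(samples, parameters):
    """Collapse one-row-per-sample data into one median per station and parameter."""
    return samples.groupby("station")[list(parameters)].median()


# Hypothetical samples: station A has one extreme turbidity value (e.g. a flood),
# station B has a missing measurement.
samples = pd.DataFrame({
    "station": ["A", "A", "A", "A", "B", "B", "B"],
    "turbidity": [2.0, 3.0, 2.5, 60.0, 10.0, 12.0, None],
})

medians = station_medians(samples, ["turbidity"])
means = samples.groupby("station")["turbidity"].mean()
print(medians.loc["A", "turbidity"], means.loc["A"])   # 2.75 16.875
```

The single high value at station A pulls the mean to about six times the median, while the median stays close to the typical readings.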

No attempt was made to filter out the temporal effects, since most stations were sampled on a regular basis and data were obtained for each month of the year. The stations sampled by technicians only were visited on a seasonal basis in order to obtain data which showed the annual variability of water quality. Furthermore, since the data used in this study covered a 60-month period, they take into account inter-annual variability and decrease the risk of obtaining non-representative water quality data attributable to unusual hydrological events that could prevail in a given year. Consequently, the data used in the present study give a reliable image of the water quality of each station (river or river reach).

The variables selected to perform the PCA and the cluster analysis were chosen so as to offer a general image of the water quality. They showed considerable variation between stations, and most of them were, at least logically, independent of each other. Furthermore, these variables could reflect the geological and land use effects on water quality.

Principal component analysis

A first PCA, performed on the 134 stations, revealed that twelve rivers behaved very differently from the others. To avoid distorting the image of spatial variability, these stations were removed from the data set; their water quality is discussed later. The PCA conducted on the remaining 122 stations produced three components with eigenvalues equal to or greater than one. However, based on the broken stick model (Frontier 1976), only the first two are considered in our study, since the percentage of variance explained by the third component was inferior to the percentage predicted by the model. The reduced plane formed by the first two components explains 76 percent of the variance among stations.

The twelve variables used in the PCA, when projected in the reduced (two-dimensional) plane, all produced vectors that exceed the equilibrium circle of descriptors (Legendre and Legendre, 1983) and consequently contribute significantly to the formation of the plane. The variables associated with the first component were sulfate, chloride, total phosphorus, total nitrogen, turbidity, calcium, magnesium, alkalinity and pH; this axis represents a mineralization gradient. The variables correlated with the second component are tannins and lignins, iron and total organic carbon; this second axis illustrates an organic content and water color gradient. Since the eigenvectors were normalized to the square root of their respective eigenvalues, the angles between two descriptor axes, or between a descriptor axis and a component (Fig. 3), reflect the correlation between the two variables, or between the variable and the component (Legendre and Legendre 1983).

The percentage of variance explained by the two components is rather high. The first axis in particular (51 percent), which explains twice the variance of the second axis (25 percent), suggests some redundancy in the information concerning the mineralization of water (Scherrer, 1984): the major ions used to characterize the geology of the different drainage basins are strongly correlated with each other. This redundancy could have been reduced by summing cations and anions and using the two sums instead of the individual ions. There is nevertheless no doubt that mineralization plays a major role in the variability among stations, since the geology of the different basins varies considerably at the scale of the province (for example, the Canadian Shield versus the Appalachian Plateau).

The positioning of the objects (stations) in the reduced plane preserved the Euclidean distances of the standardized (centered and reduced) data, since the scoring coefficients were normalized to give principal components scores with unit variance (SAS Institute Inc., 1982). This representation eliminates the effects related to the units of measurement and to the respective variances of the variables.

Fig. 3. Projection of the twelve descriptor axes in the reduced plane formed by the first two principal components. Also drawn is the equilibrium circle of descriptors, of radius $(d/p)^{1/2} = (2/12)^{1/2} = 0.41$, where d = 2 is the number of dimensions of the plane and p = 12 the number of descriptors.

Cluster analysis
The principal components scores positioned the objects (stations) in the reduced plane according to their respective water quality. However, the proximity of two objects in a reduced plane does not necessarily imply that they are similar, since they may be far apart on a third or fourth component. To solve this problem, the relationships between stations were studied using a cluster analysis based on the same twelve variables selected for the PCA. The cluster analysis produced five distinct groups of stations. By superimposing these groups on the principal components scores (Fig. 4), we could locate each group in the reduced plane, and this position informed us about its physical chemical characteristics. Hence, the two analyses complemented each other very well and produced an image which summarizes most of the information contained in the initial data matrix.

Fig. 4. Superimposition of the cluster analysis results on the principal components scores (position of the stations in the reduced plane).

TABLE 2
Ranges of station median concentrations within the groups revealed by the cluster analysis.

Variables used in the cluster analysis: Group 1 (n=48) | Group 2 (n=31) | Group 3 (n=13) | Group 4 (n=23) | Group 5 (n=7) | Group 6a (n=2) | Group 6b (n=10)
Calcium (mg/L): 1.20-8.10 | 4.80-23.80 | 17.00-36.95 | 11.25-35.50 | 7.10-29.00 | 5.40-8.00 | 10.70-70.00
Magnesium (mg/L): 0.30-1.35 | 1.40-5.00 | 2.90-12.00 | 2.00-6.60 | 1.50-4.30 | 1.00-1.40 | 6.00-30.00
Chloride (mg/L): 0.2-4.0 | 1.1-32.0 | 12.6-46.5 | 1.1-11.0 | 1.7-12.2 | 3.2-17.9 | 22.5-88.5
Sulfate (mg/L): 1.0-8.5 | 5.6-30.2 | 10.1-31.0 | 5.0-19.2 | 6.0-19.8 | 7.6-42.8 | 14.0-102.0
Iron (mg/L): 0.03-0.57 | 0.15-0.86 | 0.32-0.68 | 0.01-0.35 | 0.39-1.97 | 0.51-0.66 | 0.29-2.27
Total nitrogen (mg/L): 0.10-0.49 | 0.34-0.90 | 0.84-2.07 | 0.20-0.64 | 0.42-0.77 | 0.28-0.73 | 0.90-4.00
Total phosphorus (mg/L): 0.010-0.040 | 0.020-0.110 | 0.060-0.370 | 0.010-0.080 | 0.040-0.120 | 0.050-0.132 | 0.083-0.289
Total organic carbon (mg/L): 6.0-15.5 | 7.5-14.0 | 10.2-17.0 | 5.0-10.8 | 15.5-23.0 | 43.0-80.0 | 9.25-18.00
Tannins and lignins (mg/L): 0.60-2.30 | 0.60-1.85 | 0.60-1.20 | 0.10-0.70 | 1.50-3.26 | 10.00-15.85 | 0.55-1.60
Turbidity (NTU): 0.4-5.2 | 1.5-10.0 | 3.8-20.0 | 1.0-6.0 | 6.0-17.5 | 6.0-25.0 | 3.0-40.0
Alkalinity (mg/L): 1.8-17.0 | 12.0-50.0 | 44.0-97.0 | 42.0-93.0 | 11.0-60.0 | 7.0-13.0 | 102.0-188.5
pH: 6.10-7.30 | 7.00-7.60 | 7.20-7.80 | 7.50-7.80 | 6.70-7.70 | 6.40-6.70 | 7.04-8.40

Variables not used in the cluster analysis: Group 1 | Group 2 | Group 3 | Group 4 | Groups 5 and 6a (one range per group) | Group 6b
Sodium (mg/L): 0.5-3.8 | 1.5-20.0 | 6.0-20.0 | 1.4-7.8 | 2.4-10.4; 5.5-17.0 | 10.0-75.0
Potassium (mg/L): 0.2-0.8 | 0.5-1.8 | 1.5-3.7 | 0.3-1.6 | 0.9-2.5; 0.7-1.3 | 2.6-6.1
Manganese (mg/L): 0.01-0.03 | 0.01-0.16 | 0.04-0.19 | 0.01-0.06 | 0.06-0.09; 0.05-0.11 | 0.04-0.12
Ammonia (mg/L): 0.01-0.04 | 0.02-0.16 | 0.06-0.70 | 0.01-0.08 | 0.02-0.15; 0.01-0.12 | 0.05-0.90
Nitrate + nitrite (mg/L): 0.02-0.31 | 0.15-0.55 | 0.22-1.01 | 0.10-0.36 | 0.06-0.24; 0.05-0.10 | 0.50-1.55
Kjeldahl nitrogen (mg/L): 0.06-0.31 | 0.16-0.50 | 0.38-1.51 | 0.03-0.32 | 0.22-0.59; 0.29-0.53 | 0.35-1.60
Total particulate phosphorus (mg/L): 0.003-0.021 | 0.009-0.051 | 0.029-0.127 | 0.005-0.240 | 0.022-0.050; 0.027-0.040 | 0.027-0.180
Total dissolved phosphorus (mg/L): 0.003-0.018 | 0.009-0.057 | 0.030-0.190 | 0.006-0.027 | 0.017-0.052; 0.021-0.077 | 0.058-0.192
Copper (µg/L): 2.0-26.0 | 2.5-14.0 | 4.0-8.8 | 2.5-33.0 | 2.2-29.8; 5.0-7.0 | 2.5-9.0
Lead (µg/L): 1.0-20.0 | 3.0-7.5 | 5.0-20.5 | 3.0-7.5 | 7.5-11.0; 7.5-7.5 | 5.0-9.0
Zinc (µg/L): 5.0-14.0 | 5.0-28.1 | 10.0-30.0 | 5.0-20.0 | 6.1-160.0; 15.5-20.0 | 8.2-180.0
Silica (mg/L): 2.6-10.15 | 2.1-6.3 | 3.5-6.5 | 0.8-5.5 | 4.8-7.2; 4.1-9.2 | 5.4-9.4
Conductivity (µS/cm): 12.0-73.0 | 58.0-246.0 | 185.0-328.5 | 122.0-208.0 | 74.0-216.0; 68.0-227.0 | 252.0-791.5
True color (Hazen): 12.0-56.0 | 17.0-37.0 | 20.0-49.0 | 1.0-23.0 | 37.0-50.0; 49.0-82.0 | 25.0-61.0
Hardness (mg/L): 4.2-25.9 | 19.1-73.1 | 60.0-142.0 | 49.0-105.1 | 17.8-25.8; 25.6-87.4 | 65.1-279.3

Table 2 provides a summary of the water quality of each group. Some of the variables not used in the cluster analysis are not listed in this table because they did not show any variation among the groups (fluoride, cadmium, chromium and nickel), they provided redundant information (total carbon, inorganic carbon and apparent color), or they were measured only at some stations on a few occasions (other trace metals and toxicants).

The first group revealed by the cluster analysis contains most of the stations (rivers) located on the Canadian Shield (Fig. 5a). They correspond to large, nearly uninhabited drainage basins virtually unaffected by human activities (low nitrogen and phosphorus concentrations). The water quality of these rivers reflects the geology of the Canadian Shield, dominated by Precambrian rocks very resistant to erosion. As a result, these waters are weakly mineralized and have low alkalinity, pH and turbidity values (Table 2).

The second group contains rivers whose water quality shows the influence of various human activities. Agriculture and livestock farming, pulp and paper mills and/or municipal discharges pollute these rivers to some extent. These waters are more mineralized than those of group 1 and have higher alkalinity and pH values (Table 2); they correspond to drainage basins located in the St. Lawrence lowlands and to the Ottawa River below Temiscaming (Fig. 5b). A high percentage of the phosphorus and nitrogen values recorded at the stations of this group exceed the water quality guidelines proposed for the protection of aquatic life (McNeely et al. 1980).

Members of group 3 are more polluted than those of group 2. They belong to five basins of the St. Lawrence lowlands region which also suffer from various anthropogenic activities (Fig. 5c). The Yamaska River basin, which is densely populated, has 40 percent of its surface area used for agriculture (including commercial livestock) and hence counts numerous agri-food and textile industries. The Nicolet River basin similarly has 35 percent of its territory devoted to agriculture, compared to 26 percent for the Châteauguay River and 15 percent for the L'Achigan River. These three basins also have various industries (furniture, textile dyeing and finishing, and food canning). The L'Achigan River, which is part of the L'Assomption River drainage basin, suffers particularly from the swine farming concentrated in this region. Finally, the Pike River, which belongs to the Richelieu River basin, also shows the influence of agriculture, the major activity of the region. All these activities taking place in the basins, in addition to the municipal discharges from the different agglomerations, contribute to the poor water quality of these rivers. Their waters are strongly mineralized and show high alkalinity, pH, turbidity, total nitrogen and total phosphorus values (Table 2).

The rivers which belong to group 4 are all found on the south shore of the St. Lawrence River, mainly in the Gaspe Peninsula and the lowlands regions (Fig. 5d). They correspond to drainage basins virtually unaffected by human activities and, as a result, the water quality of these rivers truly reflects the geology of the Appalachian plateau and the St. Lawrence lowlands (sedimentary rocks composed of soluble minerals and susceptible to weathering). The water characteristics of this group are similar to those of group 2 in terms of mineralization and nutrient concentrations (Table 2). However, they differ markedly for the parameters associated with the second principal component, namely tannins and lignins, total organic carbon and iron. These rivers all have relatively transparent and weakly colored waters, with a low turbidity and a high pH.

The rivers which constitute group 5 differ greatly from those of group 4, since they have the most colored and turbid waters. They show the highest tannins and lignins, iron and total organic carbon concentrations (Table 2). Geographically speaking, however, these rivers do not come from the same region (Fig. 5e). Their water quality seems related to the surface area of the drainage basin and the nature of the topsoils. For example, the Gentilly River and the Du Chêne River both have small basins which essentially drain two regions of the St. Lawrence lowlands whose soils are dominated by marine clays. Similarly, the Ticouape River basin is small and poorly drains the lowlands found north of Lake Saint-Jean. On the other hand, the Rivière-du-Loup River, which has a larger drainage basin, seems to be affected by the numerous organic matter deposits and peat bogs dispersed in this region of the St. Lawrence lowlands. Finally, the Kinojevis River and the headwaters of the Harricana River have a water quality which is influenced by the presence of humic soils and wetlands, and by the mining activities taking place in the region (high copper and zinc concentrations).

As mentioned above, the first PCA performed on the initial 134 stations revealed the existence of twelve rivers which differ markedly from the rest of the stations. These problem rivers (stations) were removed from the data set to obtain a clearer image of the spatial variability, which otherwise would have been distorted. A closer look at these highly polluted rivers reveals that they all have relatively small drainage basins (≤ 540 km2) and are concentrated in the St. Lawrence lowlands, except for one (the Malbaie River), which has a larger basin (1850 km2) and is located on the Canadian Shield (Fig. 5f). Furthermore, in addition to the geological effects and, in some cases, the influence of agriculture, these rivers all suffer from the presence of point sources of pollution. Because of their low discharges, most of them have a reduced self-cleaning capacity: the pollutants entering them are less diluted and tend to remain longer in the aquatic environment. The twelve rivers can be subdivided into two groups. The first one (group 6a), constituted by two rivers strongly affected by a pulp and paper mill and other industries (the Malbaie and the Shawinigan Rivers), shows the highest median values for both total organic carbon and tannins and lignins. However, the water quality observed at the mouth of these rivers should not be considered representative of the whole basin, since the pollution sources are concentrated in this segment of the rivers. The second group (group 6b), containing the ten other stations, has the most mineralized waters. Some of these rivers present the highest median values for alkalinity, pH, turbidity and total nitrogen, and their total phosphorus concentrations are similar to those of group 3. The uppermost station of the Becancour River belongs to this group and shows, in addition to the effects of other human activities, the influence of asbestos mining on water quality (high magnesium concentrations). However, the river condition improves markedly downstream from the mining area, and the water quality observed at Lyster (middle station on the river) places this station in group 4. The other rivers of the group drain the St. Lawrence lowlands and have a water quality which reveals severe anthropogenic effects (agriculture, industries and municipal discharges).

Fig. 5. Geographical location of the stations composing each of the six clusters.

CONCLUSION
The use of multivariate techniques of analysis has produced very interesting results. The PCA identified the parameters responsible for most of the among-station (river) variability. The superimposition of the cluster analysis results on the principal components scores (position of the objects in the reduced plane) has shown which stations share a similar water quality. Furthermore, the relative position of the five homogeneous groups in the plane informed us about their general water quality characteristics. Ordination and cluster analysis complemented each other very well and summarized most of the information contained in the data matrix. The six groups of stations (or rivers) revealed by our analysis show the importance of geological factors and land uses for water quality. The rivers of groups 1 and 4 are mostly pristine, and their water quality reflects the geology of the Canadian Shield and the Appalachian Plateau region, respectively. The geographical regions corresponding to these two groups have low population densities and hence, these rivers are relatively unaffected by human activities. Groups 2 and 3 represent rivers which are affected to different degrees by anthropogenic activities taking place in the drainage basins. The land use effects and municipal discharges from the agglomerations add to the geological effects to produce the observed water quality. Agricultural practices play a major role as determinants of water quality in these geographical regions, which are also densely inhabited. The seven rivers forming group 5 have a water quality which reflects the particular nature of the soils of their drainage basins, their surface area and their drainage quality. Finally, our study has identified problem rivers which are, with a few exceptions, found in the most populated and most industrialized region of Quebec. These rivers, which are characterized by small drainage basins and low discharges, suffer from the intense socio-economic activities going on in the region. For some of these rivers, the observed water characteristics are biased by the presence of a few major sources of pollution, which often mask what would otherwise be an acceptable water quality.

REFERENCES
Bobee, B., D. Cluis, M. Goulet, M. Lachance, L. Potvin and A. Tessier. 1977. Évaluation du réseau de la qualité des eaux. Analyse et interprétation des données de la période 1967-1975. Service de la qualité des eaux, Ministère des Richesses naturelles du Québec, Q.E. 20, Québec. 2 volumes, 514 p.
Frontier, S. 1976. Étude de la décroissance des valeurs propres dans une analyse en composantes principales: comparaison avec le modèle du bâton brisé. J. exp. mar. Biol. Ecol. 25: 67-75.


Goulet, M. 1979. Réseau de base de la qualité du milieu aquatique en rivières à l'échelle du Québec. Service de la qualité des eaux, Ministère de l'Environnement, rapport interne 79-04, 60 p., Envirodoq 02015.

Green, R. 1979. Sampling design and statistical methods for environmental biologists. John Wiley and Sons, New York, 257 p.

Legendre, L. and P. Legendre. 1983. Numerical ecology. Developments in Environmental Modelling, 3. Elsevier, Amsterdam, 419 p.

Longpré, G., G. Joubert, and J. Trottier. 1982. Guide d'information sur l'analyse physique, chimique, biologique et bactériologique des milieux environnementaux. Ministère de l'Environnement du Québec, Direction des laboratoires, 152 p.

McNeely, R.N., V.P. Neimanis, and L. Dwyer. 1980. Références sur la qualité des eaux. Guide des paramètres de la qualité des eaux. Direction générale des eaux intérieures, Direction de la qualité des eaux, Ottawa. 100 p.

SAS Institute Inc. 1982. SAS User's Guide: Statistics, 1982 edition. SAS Institute Inc., Cary, North Carolina. 584 p.

Scherrer, B. 1984. Analyse en composantes principales. G.R.E.B.E. Inc., Montréal, 95 p.

Whitfield, P.H. 1983. Regionalization of water quality in the upper Fraser basin, British Columbia. Water Res. 17: 1053-1066.


ESTIMATION OF DISTRIBUTIONAL PARAMETERS FOR CENSORED WATER QUALITY DATA

DENNIS R. HELSEL

U.S. Geological Survey, Reston, Virginia

INTRODUCTION

Investigations of trace substances in ambient waters, increasingly conducted during the last 10 years, have encountered a recurring difficulty: a substantial portion of water-sample concentrations are below the limits of detection established by analytical laboratories. Data sets with these "less-than" observations are termed "censored data" in statistical terminology. Censored data do not present a serious interpretation problem if concentrations of primary interest are well above the detection limit, but this is often not the case. For some chemicals, established water-quality criteria are below commonly applied detection limits. For many others, the great uncertainty in the effects of long-term exposure to very low levels also makes it desirable to assess the frequency of occurrence of concentrations below the detection limit. In short, there is a need to estimate the frequency distribution of concentrations above, near, and below detection limits using only data above the detection limit.

The purpose of this study is to address several key aspects of estimating distributional parameters from censored data. These include:

• The performance of several estimation methods when estimating distributional parameters from small samples drawn from a wide range of underlying distributions and censored to varying degrees.

• Criteria for determining, based only on attributes of data remaining after censoring, which estimation method is most likely to be best for each data set.

• The reliability of estimates from censored data of four distributional parameters: the mean, standard deviation, median, and interquartile range.

APPROACH

1. Generation of data. Sixteen parent distributions were selected as representative of the range of frequency distributions that is typical of trace water-quality data. Five hundred data sets of sample sizes 10, 25, and 50 were generated from each distribution, and each data set was censored at four levels of the detection limit.

2. Parameter Estimation Methods. Eight methods were evaluated for estimating the mean, standard deviation, median, and interquartile range of censored data. The reliability and relative performance of the methods were evaluated based on their root mean squared errors (RMSEs).

3. Estimation without classification. For each censoring level and sample size, all data sets from the 16 parent distributions were combined for computation of RMSEs for each method and distributional parameter. Best methods, based on minimum RMSE, were identified for each parameter for every combination of censoring level and sample size. RMSEs of these best methods for each such combination were evaluated in relation to the most robust method over all simulation conditions.

4. Estimation with classification. Method selection and the accuracy of RMSEs might be improved by classifying data sets based on attributes of data above the detection limit. Several sample statistics were computed for each data set, and the one which best indicated the parent distribution was selected. All data sets were then classified using that statistic. Benefits in method selection and improved accuracies of RMSEs were evaluated.

5. Verification. Method selection results were verified by applying the same type of analysis to actual water-quality data. The classification system developed in the simulations was tested by comparing method performance for actual and simulated data within each class, and by evaluating the ability of classification to separate water-quality data sets having different RMSEs of parameter estimates.

6. Estimation of sample statistics. The ability of the eight methods to estimate the value of uncensored sample statistics, rather than the population parameter as before, was evaluated in a simulation using the same 16 parent distributions, censoring levels, and sample sizes. The resulting RMSEs were compared to those for estimating population parameters. Finally, these results were verified using uncensored trace-metal data sets from the U.S. Geological Survey's National Stream Quality Accounting Network (NASQAN).

Each of the sections outlined above is now discussed further. Additional detail, including tables of results, may be found in Gilliom and Helsel (1985) and Helsel and Gilliom (1985).

GENERATION OF DATA

In designing the Monte Carlo experiments, a primary goal was to mimic as closely as possible the types of data that actually occur for concentrations of trace constituents in water. Based on the sample properties and the visual inspection of sample histograms, four parent distributions with positive skew were chosen: lognormal, contaminated lognormal (mixture of two lognormals), gamma, and delta (lognormal augmented by zeros). Four variants of each distribution were considered, having CVs of 0.25, 0.50, 1.0, and 2.0. The density functions for the resulting 16 parent distributions are shown in figure 1. In all cases, the means equaled 1.0.

A boxplot which combines 100 data sets from each of the 16 parent distributions is compared to boxplots for trace-metal data and for nutrient plus sediment data from the USGS NASQAN program in figure 2. Presented are coefficients of variation (CV) and a measure of symmetry, MS,

$$MS = \frac{q_{75} - q_{50}}{q_{50} - q_{25}}$$

where $q_i$ is the ith percentile of the data set. All three types of data have similar distributions of these non-dimensional variance and symmetry characteristics. Therefore, these 16 distributions were considered representative of the distributions of trace constituent concentrations found in water. The relationships used to generate data from these distributions are summarized below, followed by a brief description of the sizes and censoring of data sets.
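The two screening statistics compared in figure 2, CV and MS, can be computed for any single data set as in the following minimal Python sketch. The linear-interpolation percentile rule is an assumption (the paper does not state which rule was used), and returning `None` for a zero denominator is a hypothetical convention for cases such as the data sets reported in figure 2 as having denominator = 0.

```python
import statistics


def percentile(data, p):
    """Percentile p (0-100) by linear interpolation between order statistics."""
    xs = sorted(data)
    if not xs:
        raise ValueError("empty data set")
    k = (len(xs) - 1) * p / 100.0
    lo = int(k)
    hi = min(lo + 1, len(xs) - 1)
    return xs[lo] + (xs[hi] - xs[lo]) * (k - lo)


def coefficient_of_variation(data):
    """CV: sample standard deviation divided by the sample mean."""
    return statistics.stdev(data) / statistics.mean(data)


def symmetry_measure(data):
    """MS = (q75 - q50) / (q50 - q25); MS > 1 indicates right (positive) skew.

    Returns None when q50 == q25, e.g. when the lower half of the data is tied.
    """
    q25, q50, q75 = (percentile(data, p) for p in (25, 50, 75))
    if q50 == q25:
        return None
    return (q75 - q50) / (q50 - q25)


# Usage: a symmetric sample gives MS = 1; a right-skewed one gives MS > 1.
print(symmetry_measure([1, 2, 3, 4, 5]))    # 1.0
print(symmetry_measure([1, 2, 3, 10, 50]))  # 7.0
```

Because MS is a ratio of quartile distances, it is unaffected by the scale of the data, which is what makes it comparable across simulated and field data sets.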

All x's refer to real-space values and all y's refer to log-space values.

Lognormal Distribution

When $y = \ln x$ is normally distributed with mean $\mu_y$ and variance $\sigma_y^2$, a set of concentrations, $x_i$, $i = 1, \ldots, n$, can be generated using equation 1:

$$x_i = \exp(\mu_y + \sigma_y \varepsilon_i) \qquad (1)$$

where $\varepsilon_i$ is a randomly chosen value from a normal distribution with a mean of zero and variance of one.

[Fig. 1. Probability density functions for the parent distributions used in simulations (curves for CV = 0.25, 0.50, 1.0, and 2.0).]

Contaminated Lognormal Distribution

The contaminated lognormal distribution used in this study consists of a mixture of one predominant lognormal ($\mu_{x1}$, $\sigma_{x1}$), which describes 80 percent of the overall population, and a contaminant lognormal ($\mu_{x2}$, $\sigma_{x2}$), which describes 20 percent of the overall population. Proportional relationships were specified between the parameters of the two distributions, which allowed unique solutions for their exact parameters for any overall distribution specified by $\mu_x$ and $\sigma_x$. The conditions imposed were:

$$\mu_{x2} = 1.5\,\mu_{x1} \quad \text{and} \quad \frac{\sigma_{x2}}{\mu_{x2}} = 2.0\,\frac{\sigma_{x1}}{\mu_{x1}}$$

[Fig. 2. Symmetry measure (MS) and coefficient of variation (CV) for three data types: T, trace (N = 781); V, verification, nutrient and sediment (N = 918); S, simulated (N = 1600). *35 data sets have denominator = 0 and are beyond the maximum.]
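Equation (1) and the two proportionality conditions above are enough to pin down the contaminated lognormal for any target mean and standard deviation. The sketch below is a minimal illustration, not the study's generator: following the document's convention that x subscripts denote real-space values, it converts each component's mean and standard deviation to log space with the standard lognormal relations $\sigma_y^2 = \ln(1 + \sigma^2/\mu^2)$ and $\mu_y = \ln\mu - \sigma_y^2/2$, and it uses Python's `random` module in place of the original random-number routines. The CV condition gives $\sigma_{x2} = 3\,\sigma_{x1}$, and matching the overall mean and second moment then yields closed-form component parameters.

```python
import math
import random

W1, W2 = 0.8, 0.2  # weights of the predominant and contaminant components


def log_params(mean, sd):
    """Log-space (mu_y, sigma_y) of a lognormal with real-space mean and sd."""
    s2 = math.log(1.0 + (sd / mean) ** 2)
    return math.log(mean) - s2 / 2.0, math.sqrt(s2)


def draw_lognormal(rng, mean, sd, n):
    """Equation (1): x_i = exp(mu_y + sigma_y * eps_i), eps_i ~ N(0, 1)."""
    mu_y, sigma_y = log_params(mean, sd)
    return [math.exp(mu_y + sigma_y * rng.gauss(0.0, 1.0)) for _ in range(n)]


def contaminated_components(mean, sd):
    """Solve for (mu1, sd1, mu2, sd2) given mu2 = 1.5*mu1 and sd2/mu2 = 2*sd1/mu1.

    The CV condition gives sd2 = 3*sd1. Matching the overall mean and the
    overall second moment E[x^2] = mean^2 + sd^2 then yields mu1 and sd1.
    """
    mu1 = mean / (W1 + 1.5 * W2)
    sd1_sq = (sd ** 2 + mean ** 2 - (W1 + 2.25 * W2) * mu1 ** 2) / (W1 + 9.0 * W2)
    if sd1_sq <= 0:
        raise ValueError("no valid component parameters for this mean and sd")
    sd1 = math.sqrt(sd1_sq)
    return mu1, sd1, 1.5 * mu1, 3.0 * sd1


def draw_contaminated(rng, mean, sd, n):
    """Draw n values: predominant lognormal with prob. W1, contaminant otherwise."""
    mu1, sd1, mu2, sd2 = contaminated_components(mean, sd)
    out = []
    for _ in range(n):
        m, s = (mu1, sd1) if rng.random() < W1 else (mu2, sd2)
        out.append(draw_lognormal(rng, m, s, 1)[0])
    return out


rng = random.Random(1985)
sample = draw_contaminated(rng, mean=1.0, sd=0.5, n=25)  # CV = 0.5 parent
```

Solving the moment equations in closed form is what makes the solution unique: the mean fixes $\mu_{x1}$, and the variance then fixes $\sigma_{x1}$.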

Gamma Distribution

Two-parameter gamma distributions, characterized by a shape parameter, $\alpha$, and a scale parameter, $\beta$, were generated using the International Mathematical and Statistical Libraries (IMSL) generating routine.
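For a gamma distribution with mean $\mu$ and coefficient of variation CV, the shape is $\alpha = 1/\mathrm{CV}^2$ and the scale is $\beta = \mu\,\mathrm{CV}^2$, since the mean is $\alpha\beta$ and the variance is $\alpha\beta^2$. The sketch below uses Python's `random.gammavariate(alpha, beta)` as a stand-in for the IMSL routine; the helper name `draw_gamma` is hypothetical.

```python
import random


def draw_gamma(rng, mean, cv, n):
    """Gamma sample with the given mean and CV: shape = 1/cv^2, scale = mean*cv^2."""
    shape = 1.0 / cv ** 2
    scale = mean * cv ** 2
    return [rng.gammavariate(shape, scale) for _ in range(n)]


rng = random.Random(42)
xs = draw_gamma(rng, mean=1.0, cv=2.0, n=100_000)  # shape 0.25, scale 4
# About 20 percent of values fall below 0.0043, the 20th percentile quoted in
# the Sample Sizes and Censoring discussion as an unrealistic detection limit.
share = sum(x < 0.0043 for x in xs) / len(xs)
```

With a shape of 0.25 the density piles up near zero, which is why the low percentiles of the CV = 2.0 gamma parent are so small.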

Delta Distribution

The delta distribution is a mixture of a lognormal distribution ($\mu_{x1}$, $\sigma_{x1}$) and some portion ($p$) of zero values. For all simulations, the portion of zeros was 5 percent ($p = 0.05$). The mean and standard deviation of the overall distribution were given by Aitchison (1955).

Sample Sizes and Censoring

Of interest was the effect of censoring on data sets of varying sample sizes. Therefore, three separate simulations were conducted, with data sets of 10, 25, and 50 observations. In each simulation, 500 data sets were generated from each of the 16 parent distributions. All data sets were censored at four different levels (detection limits): the 20th, 40th, 60th, and 80th percentiles of the parent distributions. Such high percentages of censoring are common in trace-level water-quality data. With this "type I" censoring (David, 1981), the actual percentage of observations censored varied for each data set due to sample variability. For the gamma distribution with CV = 2.0, the 20th and 40th percentiles were so close to zero (0.0043 and 0.070) that they were discarded as being unrealistic detection limits. We required that at least three observations be present in each data set after censoring; otherwise, the data set was discarded. For n = 10, this eliminated about 72 percent of the data for censoring at the 80th percentile.
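The delta parent distribution and the type I censoring rules can be sketched together. This is a minimal illustration under stated assumptions: the lognormal portion is parameterized by its real-space mean and standard deviation (0.5 here is an illustrative value), the overall moments are the standard ones for a mixture of zeros (probability p) and a lognormal (probability 1 − p), which we assume match the Aitchison (1955) expressions, the detection limit is set at a population percentile of that mixture, and all helper names are hypothetical.

```python
import math
import random
from statistics import NormalDist


def log_params(mean, sd):
    """Log-space (mu_y, sigma_y) of a lognormal with real-space mean and sd."""
    s2 = math.log(1.0 + (sd / mean) ** 2)
    return math.log(mean) - s2 / 2.0, math.sqrt(s2)


def draw_delta(rng, mean_ln, sd_ln, n, p=0.05):
    """Zero with probability p, otherwise lognormal(mean_ln, sd_ln)."""
    mu_y, s_y = log_params(mean_ln, sd_ln)
    return [0.0 if rng.random() < p else math.exp(rng.gauss(mu_y, s_y))
            for _ in range(n)]


def delta_moments(mean_ln, sd_ln, p=0.05):
    """Overall mean and standard deviation of the zero/lognormal mixture."""
    mean = (1 - p) * mean_ln
    second_moment = (1 - p) * (sd_ln ** 2 + mean_ln ** 2)
    return mean, math.sqrt(second_moment - mean ** 2)


def delta_percentile(mean_ln, sd_ln, q, p=0.05):
    """Population q-quantile (0 < q < 1) of the delta distribution; needs q > p."""
    mu_y, s_y = log_params(mean_ln, sd_ln)
    return math.exp(mu_y + s_y * NormalDist().inv_cdf((q - p) / (1 - p)))


def censor_type1(data, dl, min_uncensored=3):
    """Type I censoring at a fixed detection limit dl.

    Returns (number censored, sorted uncensored values), or None when fewer
    than min_uncensored values are at or above dl (the data set is discarded).
    """
    above = sorted(x for x in data if x >= dl)
    if len(above) < min_uncensored:
        return None
    return len(data) - len(above), above


# Overall mean 1.0 with 5 percent zeros: the lognormal portion has mean 1/0.95.
rng = random.Random(7)
dl = delta_percentile(1 / 0.95, 0.5, q=0.40)  # 40th population percentile
result = censor_type1(draw_delta(rng, 1 / 0.95, 0.5, n=25), dl)
```

Because the detection limit is fixed in advance rather than set at a sample percentile, the number censored in `result` varies from one data set to the next, which is the defining feature of type I censoring.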

Results for n = 10 at the 80th percentile were therefore not considered meaningful.

PARAMETER ESTIMATION METHODS

Eight methods were evaluated for estimating the population mean, standard deviation, median, and interquartile range. These are listed below along with the abbreviations used in this report.

ZE: Censored observations were assumed to equal zero.

DL: Censored observations were assumed to equal the detection limit.

UN: Censored observations were assumed to follow a uniform distribution between zero and the detection limit. Thus, for the ordered observations of data censored, $x_i = dl\,(i-1)/(n_c-1)$, $i = 1, 2, \ldots, n_c$, where $n_c$ is the number of censored observations; this is a distribution symmetric around one-half the detection limit ($dl$).

NR: Censored observations were assumed to follow the zero-to-detection-limit portion of a normal distribution which was fit to the uncensored observations using least squares regression, as follows. "Normal scores," $z$, were computed for each uncensored observation using

$$z = \Phi^{-1}\!\left(\frac{r}{n+1}\right)$$

where $\Phi^{-1}$ is the inverse cumulative normal distribution function, $r$ is the observation rank ($r = n_c+1, \ldots, n$), and $n$ is the sample size for the entire data set. A least-squares regression of concentration on normal scores for all data above the detection limit was extrapolated to estimate censored observations (ranks $r = 1, \ldots, n_c$). Any estimated values falling below zero were set equal to zero.

LR: Censored observations are assumed to follow the zero-to-detection-limit portion of a lognormal distribution fit to the uncensored observations by least squares regression. The method is identical to NR, except that concentrations were log-transformed prior to analysis.

NM: Concentrations are assumed to be normally distributed, with parameters estimated from the uncensored observations by the maximum likelihood method for a censored normal distribution (Cohen, 1959).

LM: Concentrations are assumed to be lognormally distributed, with parameters estimated using logarithms of the uncensored observations in Cohen's (1959) maximum likelihood method. The mean and standard deviation of the untransformed concentrations are then estimated using the equations given by Aitchison and Brown (1957).

DT: Censored observations are assumed to be zero and uncensored observations are assumed to follow a lognormal distribution. Estimates of parameters of the overall delta distribution are obtained by computing maximum likelihood estimates of parameters of the uncensored lognormal portion and using relationships between these and the overall delta distribution described by Aitchison (1955).

The commonly used method of discarding censored observations prior to calculating parameter estimates was not included in this study. Discarding censored observations will always result in both higher bias and higher RMSE than the DL method. Because this can never be the most appropriate (minimum RMSE) method, it was not considered here.

The commonly used substitution of values equal to one-half the detection limit was also not included, due to its similarity to the UN method. These two methods will produce identical estimates for the mean, while the range of values between zero and the detection limit in the UN method should produce better estimates of the other three parameters than substituting a single, arbitrary value for all censored data.
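The substitution methods (ZE, DL, UN) and the regression methods (NR, LR) listed above can be sketched as follows. This is a minimal illustration, not the study's code: the helper names are hypothetical, the value used for a single censored observation under UN (where the formula divides by zero) is an assumption, and the quartile rule in `statistics.quantiles` may differ from the one used in the study. Each method completes the data set, and ordinary sample statistics are then computed from it.

```python
import math
import statistics
from statistics import NormalDist


def fill_ze(nc, dl):
    """ZE: censored values set to zero."""
    return [0.0] * nc


def fill_dl(nc, dl):
    """DL: censored values set to the detection limit."""
    return [float(dl)] * nc


def fill_un(nc, dl):
    """UN: x_i = dl*(i-1)/(nc-1), i = 1..nc, evenly spaced from 0 to dl."""
    if nc == 1:
        return [dl / 2.0]  # assumption: the formula is undefined for nc = 1
    return [dl * (i - 1) / (nc - 1) for i in range(1, nc + 1)]


def fill_regression(nc, above, log=False):
    """NR (log=False) or LR (log=True) estimates for the nc censored values.

    Fits concentration (or ln concentration) against normal scores
    z = Phi^-1(r/(n+1)) for the uncensored ranks r = nc+1..n by least squares,
    then evaluates the line at ranks r = 1..nc. Needs at least 2 uncensored values.
    """
    n = nc + len(above)
    inv = NormalDist().inv_cdf
    z = [inv(r / (n + 1)) for r in range(nc + 1, n + 1)]
    y = [math.log(v) for v in sorted(above)] if log else sorted(above)
    zbar, ybar = statistics.fmean(z), statistics.fmean(y)
    slope = (sum((a - zbar) * (b - ybar) for a, b in zip(z, y))
             / sum((a - zbar) ** 2 for a in z))
    intercept = ybar - slope * zbar
    fitted = [intercept + slope * inv(r / (n + 1)) for r in range(1, nc + 1)]
    if log:
        return [math.exp(v) for v in fitted]
    return [max(0.0, v) for v in fitted]  # NR: negative estimates set to zero


def summarize(values):
    """Mean, standard deviation, median, and interquartile range."""
    q1, _, q3 = statistics.quantiles(values, n=4)
    return (statistics.fmean(values), statistics.stdev(values),
            statistics.median(values), q3 - q1)


# Usage: 10 observations, 4 censored below dl = 1.0 (illustrative values).
above = [1.2, 1.5, 2.1, 2.8, 4.0, 6.5]
for name, filled in [("ZE", fill_ze(4, 1.0)), ("DL", fill_dl(4, 1.0)),
                     ("UN", fill_un(4, 1.0)), ("NR", fill_regression(4, above)),
                     ("LR", fill_regression(4, above, log=True))]:
    print(name, summarize(filled + above))
```

LR differs from NR only in fitting the line in log space and exponentiating the extrapolated values, so its fill-ins stay positive without any clipping.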

The evaluation of the reliability of estimation methods was based on RMSEs computed from actual parameters of the underlying distribution. Deviations between the parameter values estimated from each censored data set and those of the underlying distribution were divided by the true population values to express RMSEs as fractions of the true values. For example, the equation for the RMSE of the mean is

$$\mathrm{RMSE} = \left[\frac{1}{N}\sum_{i=1}^{N}\left(\frac{\bar{x}_i - \mu}{\mu}\right)^2\right]^{1/2}$$

where $\bar{x}_i$ is the estimate of the mean for the ith of N data sets and $\mu$ is the true population mean.
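The relative RMSE above, together with its bias portion, can be computed as in this sketch. It assumes the estimates from the N data sets are already available. The split $\mathrm{MSE} = \mathrm{bias}^2 + \mathrm{variance}$ is standard, but the formula for the standard error of the RMSE is a delta-method approximation assumed here for illustration, since the paper does not give its formula.

```python
import math
import statistics


def relative_rmse(estimates, true_value):
    """RMSE of estimates from N data sets, as a fraction of the true value.

    Also returns the bias (mean relative error), the remaining spread
    (RMSE^2 = bias^2 + spread^2), and a delta-method standard error of the RMSE.
    """
    rel = [(e - true_value) / true_value for e in estimates]
    sq = [r * r for r in rel]
    mse = statistics.fmean(sq)
    bias = statistics.fmean(rel)
    spread = math.sqrt(max(mse - bias * bias, 0.0))
    # Assumption: SE(RMSE) ~ SE(MSE) / (2 * RMSE), with SE(MSE) = sd(sq)/sqrt(N).
    se_mse = statistics.stdev(sq) / math.sqrt(len(sq))
    se_rmse = se_mse / (2.0 * math.sqrt(mse)) if mse > 0 else 0.0
    return math.sqrt(mse), bias, spread, se_rmse


# Usage: mean estimates from four hypothetical censored data sets, true mean 1.0.
rmse, bias, spread, se = relative_rmse([1.10, 0.95, 1.30, 0.85], 1.0)
```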

Also computed were the bias portion of the RMSE and the standard error of the RMSE, which describes the reliability of RMSE estimates.

ESTIMATION WITHOUT CLASSIFICATION

Simulation results without classification of data sets are given in figure 3 for data sets of size n = 25 to show the typical pattern of results for all parameter estimation methods. Though RMSEs are higher for n = 10 and lower for n = 50, the same estimation methods always perform well for a particular combination of censoring level and distributional parameter.

There are several ways to approach identifying the "best" estimation method(s). One approach would be to designate a best method for every single combination of censoring level, parameter, and sample size. Alternatively, a single robust method that works well over the entire range of conditions simulated could be chosen. Figure 4 illustrates these two method-selection approaches. The best overall method was chosen by summing the ranks of RMSEs for each method over all sample sizes, censoring levels, and parameters. The method with the smallest sum of ranks, LR, was considered best. RMSEs for LR are shown for all parameters in figure 4, along with those for any other methods having RMSEs significantly (α = 0.05) lower than that of LR.
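The rank-sum rule for choosing a single robust method can be sketched as below. The RMSE values in the usage example are made up for illustration (they are not results from this study), and averaging ranks over ties is an assumption.

```python
def best_by_rank_sum(rmse_table):
    """Pick the method with the smallest sum of RMSE ranks.

    rmse_table maps each condition (e.g. sample size, censoring level,
    parameter) to {method: RMSE}. Within each condition the lowest RMSE gets
    rank 1; tied RMSEs share the average of their ranks.
    """
    totals = {}
    for row in rmse_table.values():
        ordered = sorted(row.items(), key=lambda kv: kv[1])
        i = 0
        while i < len(ordered):
            j = i
            while j + 1 < len(ordered) and ordered[j + 1][1] == ordered[i][1]:
                j += 1
            avg_rank = (i + j) / 2.0 + 1.0  # ranks i+1 .. j+1 averaged
            for method, _ in ordered[i:j + 1]:
                totals[method] = totals.get(method, 0.0) + avg_rank
            i = j + 1
    best = min(sorted(totals), key=totals.get)
    return best, totals


# Illustrative (made-up) RMSEs for two conditions.
table = {
    (10, 20, "mean"): {"LR": 0.30, "LM": 0.50, "DL": 0.40},
    (25, 40, "mean"): {"LR": 0.20, "LM": 0.10, "DL": 0.40},
}
best, totals = best_by_rank_sum(table)  # LR: 1 + 2 = 3, LM: 3 + 1 = 4, DL: 2 + 3 = 5
```

Summing ranks rather than RMSEs keeps one condition with very large errors from dominating the choice.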

Little reduction in RMSE for the mean and standard deviation is accomplished by considering different sample sizes and censoring levels separately.

[Fig. 3. Errors of estimating the mean, standard deviation (SD), median, and interquartile range (IQR) for each method (ZE, DL, UN, NR, LR, NM, LM, DT). Sample size equals 25, with censoring at the 40th percentile.]

The RMSEs of LR are lowest, or not significantly different from the lowest, in virtually every situation. For the median and interquartile range, on the other hand, significant reductions in RMSE can be achieved by using the best method for a particular set of conditions rather than using LR for all (fig. 4). The largest reductions in RMSE occur for small sample sizes and high censoring. For all but four combinations of censoring level and sample size, the best method for estimating the median and interquartile range is LM. For the interquartile range at 20 percent censoring, LM is tied with LR for n = 25 and n = 50. For the median at 80 percent censoring and n = 25 and n = 50, LM is a close second to NR. Figure 4, while showing the extremes of method-selection approaches, suggests an effective third course: selecting LR for the mean and standard deviation and LM for the median and interquartile range.

In fact, LR has the lowest sum of ranks (the lowest rank corresponding to the lowest RMSE) of any method for the mean and standard deviation over all censoring levels and sample sizes, while LM has the lowest sum of ranks for the median and interquartile range. Little reduction in RMSE is accomplished by using other methods for differing sample sizes or censoring levels.

[Fig. 4. Root mean squared errors for best estimation methods, by parameter. x axis: population percentile of censoring level (20 to 80). Curves for n = 10, 25, and 50, where n is the number of observations in each sample before censoring; lines show the RMSE of the LR method and the RMSE of the best method (indicated for each datum).]

The LM method produces some erratically high estimates of the mean and standard deviation (figure 3), particularly for higher censoring levels.

This occurred for the same data sets for which LM generally produced the best estimates of the median and interquartile range, and can be explained using figure 5. The estimated probability distributions produced by the LM and LR methods are compared to the parent distribution for one high-CV data set censored at the 60th percentile.

Figure 5 illustrates that the LM method produced an estimated distribution that more closely mimics the parent distribution than the LR method did. This results in accurate estimates of percentiles. To do this, however, the mean and standard deviation were grossly overestimated, at 4.7 and 453, respectively. The LR method, though not mimicking the shape of the parent distribution, produced accurate estimates of the mean (1.09) and standard deviation (2.10).

[Fig. 5. Estimated frequency distributions by LM and LR (n = 25) compared to the gamma CV = 2.0 parent distribution: parent gamma (μ = 1.0, σ = 2.0); LR (x̄ = 1.09, s = 2.10); LM (x̄ = 4.7, s = 453). The data set was censored at the 60th population percentile, marked as the censoring level. x axis: concentration, 0.00 to 3.00.]

Because the LR, NR, and UN methods involve simply calculating sample statistics after estimating censored observations, they rarely produce wild estimates of distributional parameters.
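The contrast between LM and LR follows from how LM reaches the mean: it estimates log-space parameters by maximum likelihood and then back-transforms, so any inflation of the log-space standard deviation is exponentiated. The sketch below illustrates NM/LM under stated assumptions: the censored-normal MLE for a single detection limit is computed by EM iterations rather than Cohen's (1959) tabulated procedure (both aim at the same likelihood maximum), and the back-transform uses the plain lognormal moment relations mean = exp(μ + σ²/2) and sd = mean·(exp(σ²) − 1)^½, which may differ from the specific Aitchison and Brown (1957) estimators used in the study. Function names are hypothetical.

```python
import math
from statistics import NormalDist, fmean


def censored_normal_mle(y_obs, nc, c, iters=1000, tol=1e-10):
    """MLE of (mu, sigma) for a normal sample with nc values censored below c.

    EM: the E-step replaces the censored values' first and second moments by
    those of a normal truncated above at c; the M-step re-estimates mu, sigma.
    """
    n = len(y_obs) + nc
    s1, s2 = sum(y_obs), sum(y * y for y in y_obs)
    mu = fmean(y_obs)
    sigma = max(math.sqrt(fmean([(y - mu) ** 2 for y in y_obs])), 1e-6)
    std = NormalDist()
    for _ in range(iters):
        a = (c - mu) / sigma
        lam = std.pdf(a) / std.cdf(a)              # inverse Mills ratio, lower tail
        e1 = mu - sigma * lam                      # E[Y | Y < c]
        e2 = sigma ** 2 * (1.0 - a * lam - lam ** 2) + e1 ** 2  # E[Y^2 | Y < c]
        new_mu = (s1 + nc * e1) / n
        new_sigma = math.sqrt((s2 + nc * e2) / n - new_mu ** 2)
        converged = abs(new_mu - mu) < tol and abs(new_sigma - sigma) < tol
        mu, sigma = new_mu, new_sigma
        if converged:
            break
    return mu, sigma


def lm_estimates(above, nc, dl):
    """LM: censored-normal MLE on ln(concentration), then back-transform."""
    mu, s = censored_normal_mle([math.log(x) for x in above], nc, math.log(dl))
    mean = math.exp(mu + s * s / 2.0)
    sd = mean * math.sqrt(math.exp(s * s) - 1.0)
    z75 = NormalDist().inv_cdf(0.75)
    median = math.exp(mu)
    iqr = math.exp(mu + z75 * s) - math.exp(mu - z75 * s)
    return mean, sd, median, iqr
```

In this form the median depends only on the log-space mean, while the mean carries a factor exp(σ²/2) and the standard deviation a factor of roughly exp(σ²), so a log-space σ that is too large distorts the mean and standard deviation far more than the median.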

ESTIMATION WITH CLASSIFICATION

Rankings and RMSEs were previously presented in figure 3 with all 16 parent distributions equally represented. If the parent distribution were known, however, the other 15 could be ignored, with the resulting method rankings and RMSE magnitudes possibly quite different from those in figure 3. For example, figure 6 separately presents RMSEs for the mean for data sets from each of the four lognormal distributions.

[Fig. 6. Estimation errors using the LR and LM methods for 4 lognormal distributions (differing CVs) and for all 16 parent distributions combined (LR ALL, LM ALL). Sample size equals 25, with censoring at the 80th percentile. x axis: CV (0.25, 0.50, 1.0, 2.0); y axis: RMSE, in percent (0 to 40).]

All data sets consisted of 25 observations and were censored at the 80th percentile. For a lognormal distribution with CV = 0.25, the lowest ranked estimation method (LM) has an RMSE of 9 percent, while for CV = 2.0 the LR method has the lower RMSE, 39 percent. However, with all 16 distributions together, LR and LM estimate the mean with an RMSE near 30 percent (fig. 6).

Therefore, if the parent distribution of a data set could be inferred from attributes of data above the detection limit, improved estimates of RMSE magnitude, and perhaps improved method selection, should result. This is the goal of classification. Note that if the true CV were 2.0, an RMSE of 39 percent would be larger than the 30 percent obtained with all 16 distributions included. Yet it would be a more accurate estimate for that high-error parent distribution.

Classification in this case should exclude data from lower-error (lower CV) distributions.

Selection of Class Boundaries

Four dimensionless sample statistics computed from the data above the detection limit were evaluated for their ability to classify each data set into a group containing one or more parent distributions. Successful classification occurred when the parent distribution generating that data set was contained in the assigned group. The most effective statistic was the relative quartile range, or rqr (Gilliom and Helsel, 1985), a measure of the dispersion of data above the detection limit relative to the magnitude of the detection limit. The best separation between groups was evaluated using pairwise discriminant analysis. The probability density function equations for each consecutive group pair were solved, and the point at which the two densities were equal was the optimum point of separation. Some distribution groups could not be discriminated, and therefore some rqr classes represent two distribution groups.

Benefits of Classification

The best estimation method was determined for each combination of sample size, censoring level, and rqr class.

In light of the results without classification, best methods for the mean and standard deviation were determined separately from those for the median and interquartile range. The best method was that which minimized the ranks of RMSEs across the two distributional parameters being considered. If additional methods had RMSEs not significantly different (t-test at α = 0.05) from the best for both parameters, these were also included as "best."

Finally, a single best method over all three sample sizes was selected for each rqr class. Results are given in Gilliom and Helsel (1985). The single best method was often the only method that qualified as best for all three sample sizes. Where more than one method qualified, or where none was best over all sample sizes, the method which minimized the sum of squared RMSEs over the three sample sizes was selected. In every rqr class, the best estimation method for the median and interquartile range was LM.
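The rule for choosing a single best method per rqr class fits in a short function. This is a sketch under stated assumptions. The inputs (the methods that qualified as "best" at each sample size, and their RMSEs) are hypothetical data structures. When no method is best at every sample size, the sketch considers all methods, which the text does not specify.

```python
def single_best_method(best_by_size, rmse):
    """Pick one method for an rqr class across the sample sizes.

    best_by_size: {sample_size: set of methods rated "best" at that size}
    rmse:         {method: {sample_size: RMSE}}
    """
    sizes = list(best_by_size)
    common = set.intersection(*(set(best_by_size[n]) for n in sizes))
    if len(common) == 1:              # one method was best at every sample size
        return common.pop()
    # Several qualify, or none is best at every size: minimize the sum of
    # squared RMSEs over the sample sizes. Using every method in the
    # "none qualifies" case is an assumption of this sketch.
    candidates = sorted(common) if common else sorted(rmse)
    return min(candidates, key=lambda m: sum(rmse[m][n] ** 2 for n in sizes))


# Hypothetical RMSEs (percent) for three sample sizes labelled n1..n3.
rmse = {
    "LR": {"n1": 30, "n2": 20, "n3": 10},
    "LM": {"n1": 28, "n2": 22, "n3": 12},
}
print(single_best_method({"n1": {"LR", "LM"}, "n2": {"LR", "LM"}, "n3": {"LR", "LM"}}, rmse))
```

Squaring the RMSEs before summing penalizes a method that does very badly at one sample size more heavily than one that is slightly worse everywhere.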

Prior to classification the LR method was generally best for estimating the mean and standard deviation, but with classification the LM, UN, or NR methods sometimes produce slightly lower RMSE than did LR. These slightly lower RMSEs are in most instances not significantly different (α=0.05) from the RMSE of LR.

Even where differences are statistically significant, they are not large. In contrast, none of LM, UN, or NR is similarly robust over all rqr classes. For example, LM has a slightly but significantly lower RMSE than LR for both the mean and standard deviation at the 60th percentile censoring level and rqr = 0.25 to 0.60 (n=25).

Yet LM is the worst method in the next highest rqr class (rqr = 0.60 to 1.4) for both the mean and standard deviation, with RMSEs over 100 percent of the true value for standard deviation.

When applying parameter estimation methods to actual water-quality data, an important consideration is method robustness. Given the possibility of misclassifying individual data sets based on rqr, and the small increases in RMSE when LR is used for any rqr class (which make its use low-risk), the more robust LR method is best for estimates of the mean and standard deviation for all data sets.

Accuracy of RMSEs

Though the classification system does not, in practice, alter method selection compared to results with no classification, it does result in superior estimates of error (RMSE) by considering differences due to the probable parent distribution.

Figure 6 showed that RMSEs vary considerably between data sets from different parent distributions. The classification system was designed to indicate the types of parent distributions from which each data set may have originated. It should therefore yield more accurate estimates of error (whether higher or lower) than the average RMSE for all data sets from all 16 parent distributions, such as given in figure 3. To illustrate the improvement in RMSE accuracy following classification, the data for 60th percentile censoring (n=25) are plotted in figure 7.

Shown in the figure are the RMSEs for perfect classification into parent distribution group, those for the actual classification according to rqr, and the RMSE without classification.

[Figure 7: RMSE plotted by distribution group. Legend: 95-percent confidence interval of RMSE when all data sets are correctly classified; 95-percent confidence interval of RMSE for all data sets falling in the rqr class corresponding to each distribution group; RMSE for all data sets combined and no classification.]

Fig. 7. Comparison of RMSEs with and without classification for estimates of the median from data sets of n=25 censored at the 60th population percentile.

When data sets are classified, more reliable RMSE estimates are obtained. Gilliom and Helsel (1985) show that the rqr classification system results in RMSEs which are very similar to the best estimate of true RMSE, that of perfect classification.

Only at 80th percentile censoring do the RMSE values substantially depart from truth. This reflects the greater inability to correctly classify highly censored data sets. Even at 80th percentile censoring, however, rqr classification generally improves the accuracy of RMSE estimates over those with no classification.

VERIFICATION

Uncensored data sets with more than 50 observations for suspended sediment, total phosphorus, total Kjeldahl nitrogen, and nitrate nitrogen concentrations were obtained from 313 stations of the U.S. Geological Survey's NASQAN network.

Most data were monthly samples taken during 1974-81, resulting in 917 data sets having more than 50 observations and no censoring. Suspended sediment and major nutrient data were analyzed rather than trace constituents because:

• most available data sets for trace constituents consisted of fewer than 30 observations;
• most trace constituent data sets contained censored observations;
• suspended sediment and nutrients are transported by the same types of processes as many trace constituents.

This last point is important because similarity in transport processes will tend to result in similarly shaped frequency distributions. This similarity was examined previously in figure 2.

For the verification tests, two subsamples, one of size n=10 and one of n=25, were randomly selected with replacement from each of the 917 sediment and nutrient data sets. Each resulting small sample was censored at 20, 40, 60, and 80 percent by the type II method (David, 1981), as population percentiles were not known. With this method the same fraction of each data set is censored. Each of the eight parameter estimation methods was applied to each censored sample.
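Here is a sketch of the verification resampling and type II censoring described above. It assumes that each subsample is drawn with `random.choices` and that the k = round(fraction x n) smallest values are censored. Taking the smallest retained value as the reporting limit is an assumption for illustration. The function names and the lognormal stand-in data set are hypothetical.

```python
import random


def type2_censor(sample, fraction):
    """Type II censoring: the lowest `fraction` of the sample is censored.

    Returns (limit, uncensored_values, n_censored). The reporting limit is
    taken as the smallest retained value, an assumption for illustration.
    """
    x = sorted(sample)
    k = round(fraction * len(x))          # same fraction censored in every sample
    if k >= len(x):
        raise ValueError("censoring would remove every observation")
    return x[k], x[k:], k


def draw_subsamples(data_set, sizes=(10, 25), rng=None):
    """One random subsample per size, drawn with replacement."""
    rng = rng or random.Random()
    return {n: rng.choices(data_set, k=n) for n in sizes}


# Hypothetical usage: censor each subsample at the four levels in the text.
rng = random.Random(1)
large_set = [rng.lognormvariate(0.0, 1.0) for _ in range(60)]  # stand-in for n > 50
for n, sub in draw_subsamples(large_set, rng=rng).items():
    for frac in (0.2, 0.4, 0.6, 0.8):
        limit, kept, k = type2_censor(sub, frac)
        print(n, frac, k, len(kept))
```

Under type I censoring the threshold would instead be fixed (for example a population percentile), so the number of censored values would vary from sample to sample.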

Sample statistics computed from the original (n>50) sediment and nutrient data sets were used as estimates of the true population parameters in RMSE calculations.

Results

Best methods for the verification data (the methods with the lowest RMSE, or with RMSEs not significantly larger (t-test at α=0.05) than the lowest) were identical to those of the simulation.

The best overall method for estimating the mean, standard deviation, median, and interquartile range was a tie between LR and UN. This was based on having the smallest sum of RMSE ranks over all four distributional parameters, four censoring levels, and three sample sizes. For the verification data, LR produced the lowest summed RMSE rank for the moment parameters, and LM for the percentile parameters. Verification data were then classified by relative quartile range (rqr), and RMSEs were calculated for each rqr class.

Ranks of method RMSEs were again separately summed for the moment and percentile parameters over both n=10 and n=25 sample sizes. No RMSEs were significantly (t-test at α=0.05) lower than those of LR for the moment parameters and of LM for the percentile parameters. Therefore, for every rqr class these two methods are either best or not significantly different from the best, and no significant reduction in error would result from selecting separate methods for each rqr class.
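Method selection by summed RMSE ranks, used both here and in the simulation study, can be sketched as below. Each "cell" is one combination of parameter, censoring level and sample size. The RMSE values are hypothetical, and breaking ties with ordinal ranks (rather than, say, average ranks) is an assumption.

```python
def rank_sum_order(cells):
    """Order methods by summed RMSE rank (smallest total first).

    cells: iterable of {method: RMSE} dicts, one per combination of
    parameter, censoring level and sample size. Within each cell the
    lowest RMSE gets rank 1; tied RMSEs get ordinal ranks (an assumption).
    """
    totals = {}
    for cell in cells:
        for rank, method in enumerate(sorted(cell, key=cell.get), start=1):
            totals[method] = totals.get(method, 0) + rank
    return sorted(totals.items(), key=lambda item: (item[1], item[0]))


# Hypothetical RMSEs, ranked separately for moment and percentile parameters.
moment_cells = [{"LR": 10, "LM": 12, "UN": 15}, {"LR": 20, "LM": 18, "UN": 25}]
percentile_cells = [{"LR": 9, "LM": 7, "UN": 11}, {"LR": 14, "LM": 13, "UN": 12}]
print(rank_sum_order(moment_cells))
print(rank_sum_order(percentile_cells))
```

Ranks make cells with very different RMSE magnitudes count equally, so no single high-error combination dominates the choice.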

This method selection exactly follows that of the simulation study.

The verification results are strong evidence that the previous simulation study led to the optimal choice of estimation methods for the mean, standard deviation, median, and interquartile range of censored water-quality data sets.

Furthermore, the verification results show that the rqr classification system developed from simulation studies provides an effective means of distinguishing between data sets originating from different types of parent distributions.

ESTIMATION OF SAMPLE STATISTICS

For some applications, estimates of sample statistics rather than population parameters might be desired from censored data. Uncensored water-quality data are summarized by their sample statistics, and comparisons between these data and censored data should be on an equal basis.

Second Simulation Study

To determine how well the eight methods estimate sample statistics, a second simulation study was performed. Distributional shapes and other criteria are identical to the previous simulation study.

However, RMSEs and bias were calculated (using the mean, for example) as:

$$\mathrm{RMSE} = \left[ \frac{1}{N} \sum_{i=1}^{N} \left( x_i - \bar{x}_o \right)^2 \right]^{1/2} \qquad (3)$$

$$\mathrm{bias} = \frac{1}{N} \sum_{i=1}^{N} \left( x_i - \bar{x}_o \right) \qquad (4)$$

where $\bar{x}_o$ is the sample mean for the uncensored data set (replacing μ), and the other parameters are as previously given. Censoring was at the 20, 40, 60, and 80th percentiles of each simulated sample (type II censoring), as opposed to percentiles of the parent population in the first simulation study (type I censoring). This was to facilitate comparison with the verification results. An example of the results is shown in figure 8.
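Equations (3) and (4) translate directly into code. In this sketch each estimate is paired with the statistic of its own uncensored sample, which is how the definition of x̄o is read here. For the first simulation study, the population value can be passed for every entry instead. `rmse_and_bias` is a hypothetical name.

```python
import math


def rmse_and_bias(estimates, targets):
    """Equations (3) and (4) for N censored-sample estimates.

    targets[i] is the value the i-th estimate is compared with: the
    uncensored sample statistic (x̄o) in the second simulation, or the
    population parameter in the first. Pairing each estimate with its
    own target is an interpretation of the text.
    """
    if not estimates or len(estimates) != len(targets):
        raise ValueError("need equal-length, non-empty sequences")
    errors = [x - t for x, t in zip(estimates, targets)]
    n = len(errors)
    bias = sum(errors) / n                            # equation (4)
    rmse = math.sqrt(sum(e * e for e in errors) / n)  # equation (3)
    return rmse, bias


# Hypothetical estimates of the mean from three censored samples.
print(rmse_and_bias([1.1, 0.9, 1.3], [1.0, 1.0, 1.2]))
```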

Best methods for the moment and percentile parameters in this new simulation study were LR and LM, respectively, based on the sum of method rankings over all censoring levels. The overall best method was LR. Best performing methods for estimating sample statistics were thus identical to those for estimating population parameters. However, the magnitudes of RMSEs differ from those for population parameters.

RMSEs of sample estimates in figure 8 can be compared to those of the population parameters presented in figure 3. RMSEs are generally smaller when estimating sample statistics. Confidence intervals around the LR or LM estimate can be calculated from the bias and RMSE (Helsel and Gilliom, 1985). Therefore, such intervals are narrower when they are intended to include the uncensored sample statistic than when they are intended to include the population parameter.

RMSEs for the moment sample statistics decrease with increasing rqr class, the opposite trend from that of the population parameters. This is due to the greater influence of the higher observations on the sample mean and standard deviation with higher rqr. These higher observations remain after censoring, producing a more accurately estimated sample statistic while indicating much less about the population parameter.

[Figure 8: RMSEs of the estimation methods for each sample statistic.]

Fig. 8. Errors of estimating the uncensored sample mean, standard deviation (SD), median, and interquartile range (IQR). Sample size equals 25, with censoring at the 40th percentile.

Verification of Sample Statistic Estimates

To verify the new simulation results, the uncensored trace-metal data sets summarized in figure 2 were censored (type II) at the 20, 40, 60, and 80th sample percentiles. Errors were then calculated by comparison to the uncensored sample estimates. Table 1 lists the water-quality parameters chosen and the number of data sets for each. Sample sizes ranged from 10 to 40 observations. Eleven other trace constituents had no data sets which contained only uncensored observations and were not used. In order to obtain a larger number of data sets, iron and manganese data were included even though they are not usually found at "trace" levels.

Trace metal data sets containing 10 to 20 observations were combined into one group, representing sample sizes generally comparable to n=10 simulation results. Data sets having fewer than three data points after censoring were deleted.

A second group of data sets having from 21 to 40 observations was formed for comparison to n=25 simulation results. The eight estimation methods were applied to these data. Again, LR proved the best overall method. LR was best for the moment parameters and LM was best for the percentile parameters, based on the rank criteria given previously.
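The LM method is described here only as the lognormal maximum likelihood estimator for censored data; the fitting procedure itself is not given. One standard way to compute that estimate is swapped in here as an assumption: the EM algorithm for a sample of log-concentrations left-censored at a single detection limit. At each step, the censored log-values are replaced by their expected first and second moments under the current normal fit, truncated above at ln(limit). Function names are hypothetical.

```python
import math
from statistics import NormalDist

STD_NORMAL = NormalDist()


def lognormal_mle_censored(uncensored, n_censored, limit, tol=1e-10, max_iter=10_000):
    """ML estimates (mu, sigma) of ln(concentration) for a lognormal sample in
    which n_censored values are only known to lie below `limit`.

    EM algorithm for a left-censored normal sample of log-values.
    """
    y = [math.log(v) for v in uncensored]
    n_obs = len(y)
    n = n_obs + n_censored
    s1 = sum(y)
    s2 = sum(v * v for v in y)
    mu = s1 / n_obs
    sigma = math.sqrt(max(s2 / n_obs - mu * mu, 0.0))
    if n_censored == 0:
        return mu, sigma          # no censoring: ordinary ML estimates
    sigma = sigma or 1.0          # avoid a zero starting spread
    c = math.log(limit)
    for _ in range(max_iter):
        beta = (c - mu) / sigma
        ratio = STD_NORMAL.pdf(beta) / STD_NORMAL.cdf(beta)
        e1 = mu - sigma * ratio                              # E[Y | Y < c]
        var1 = sigma**2 * (1.0 - beta * ratio - ratio**2)    # Var[Y | Y < c]
        e2 = var1 + e1 * e1                                  # E[Y^2 | Y < c]
        new_mu = (s1 + n_censored * e1) / n
        new_sigma = math.sqrt((s2 + n_censored * e2) / n - new_mu**2)
        converged = abs(new_mu - mu) < tol and abs(new_sigma - sigma) < tol
        mu, sigma = new_mu, new_sigma
        if converged:
            break
    return mu, sigma


def lm_median_iqr(mu, sigma):
    """Median and interquartile range of the fitted lognormal distribution."""
    z75 = STD_NORMAL.inv_cdf(0.75)
    return math.exp(mu), math.exp(mu + z75 * sigma) - math.exp(mu - z75 * sigma)
```

The percentile estimates come straight from the fitted distribution, which is consistent with LM being the preferred method for the median and interquartile range.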

When classified by rqr, RMSEs for actual trace water-quality data were similar to those of the simulations. Only median estimates for 60 and 80 percent censoring appear different, with simulation RMSEs higher than actual.

This is perhaps due to the inclusion of larger sample sizes in the actual trace-data estimates, with the simulation results representing conservative error estimates based only on n=10 or n=25.

CONCLUSIONS

The most robust estimation method for minimizing errors in estimates of the mean, standard deviation, median, and interquartile range of censored data was the log-probability regression method (LR). This method is based on the assumption that censored observations follow the zero-to-censoring level portion of a lognormal distribution. That distribution is obtained by a least-squares regression between logarithms of uncensored concentration observations and their normal scores. When method performance was evaluated separately for each distributional parameter, LR resulted in the lowest RMSEs for the mean and standard deviation.
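The LR procedure described above can be sketched as follows. The sketch assumes a single detection limit, so the censored values occupy the lowest ranks. It also assumes Weibull plotting positions i/(n+1) for the normal scores; the plotting-position formula actually used in the study is not stated here. Function names are hypothetical.

```python
import math
from statistics import NormalDist, mean, stdev


def lr_fill(uncensored, n_censored):
    """Log-probability regression: return the completed sample
    (extrapolated censored values followed by the sorted measured values).

    Assumes one detection limit, so censored values take the lowest ranks,
    and Weibull plotting positions i / (n + 1) for the normal scores.
    """
    n_obs = len(uncensored)
    if n_obs < 2:
        raise ValueError("need at least two uncensored observations")
    n = n_obs + n_censored
    std = NormalDist()
    scores = [std.inv_cdf(i / (n + 1)) for i in range(1, n + 1)]
    z_cens, z_obs = scores[:n_censored], scores[n_censored:]
    y_obs = [math.log(v) for v in sorted(uncensored)]
    # Ordinary least squares of ln(concentration) on normal score.
    z_bar, y_bar = mean(z_obs), mean(y_obs)
    sxy = sum((z - z_bar) * (y - y_bar) for z, y in zip(z_obs, y_obs))
    sxx = sum((z - z_bar) ** 2 for z in z_obs)
    slope = sxy / sxx
    intercept = y_bar - slope * z_bar
    # Censored values are read off the fitted line and back-transformed.
    filled = [math.exp(intercept + slope * z) for z in z_cens]
    return filled + sorted(uncensored)


def lr_mean_sd(uncensored, n_censored):
    """LR estimates of the mean and standard deviation."""
    full = lr_fill(uncensored, n_censored)
    return mean(full), stdev(full)
```

Only the censored values are taken from the fitted line; measured values are kept as observed. This makes the moment estimates less dependent on the lognormal assumption than a fully parametric fit would be.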

The lognormal maximum likelihood estimator for censored data (LM) produced the lowest RMSEs for the median and interquartile range. These two methods constitute the best procedures for their respective parameters.

Using the relative quartile range (rqr), the interquartile range of uncensored observations divided by the detection limit, censored data sets can be classified into groups reflecting probable parent distributions.

Within these rqr groups, the accuracy of RMSE estimates was substantially improved over that without classification.

The eight methods were applied to uncensored suspended sediment and nutrient data having large sample sizes (n>50). Selection of the estimation method that was best overall, best for moment and percentile parameters separately, and best within every rqr class exactly followed those of the simulation.

Errors in estimating statistics of uncensored samples rather than population parameters were also evaluated. Best methods for estimating sample statistics were LR and LM, respectively, for the moment and percentile parameters.

RMSEs were almost always smaller when estimating sample statistics than for population parameters (LM median estimates occasionally have greater RMSEs), and were sometimes much smaller.

Therefore, estimates of uncensored sample statistics are identical to those of population parameters, but have shorter confidence intervals. These results form the basis for making the best possible estimates of either population parameters or sample statistics from censored water-quality data.

The LR method for moment parameters and the LM method for percentile parameters should be the methods of choice when estimating distributional parameters for censored trace-level water-quality data.

Table 1.--Trace constituents from the NASQAN network used to estimate sample statistics

                            Number of data sets
Parameter                   n=10-20    n=21-40
arsenic                     100        7
dissolved arsenic           3          63
barium                      5          0
boron                       11         3
dissolved boron             19         7
copper                      1          13
dissolved copper            1          5
lead                        0          17
nickel                      9          3
zinc                        1          32
dissolved zinc              0          2
iron                        12         273
dissolved iron              4          68
manganese                   11         180
dissolved manganese         0          15

REFERENCES

Aitchison, John, On the distribution of a positive random variable having a discrete probability mass at the origin, J. American Statistical Assoc., Sept., 901-908, 1955.

Aitchison, John, and J. A. C. Brown, The Lognormal Distribution, 176 pp., University Press, Cambridge, 1957.

Cohen, A. C., Jr., Simplified estimators for the normal distribution when samples are singly censored or truncated, Technometrics, 1, 3, 217-237, 1959.

David, H. A., Order Statistics, 2nd Ed., 360 pp., John Wiley and Sons, Inc., 1981.

Gilliom, Robert J., and Dennis R. Helsel, Estimation of distributional parameters for censored trace-level water-quality data. I: Estimation techniques, Water Resources Research, in press, 1985.

Helsel, Dennis R., and Robert J. Gilliom, Estimation of distributional parameters for censored trace-level water-quality data. II: Verification and applications, Water Resources Research, in press, 1985.

NATURAL VARIABILITY OF WATER QUALITY IN A TEMPERATE ESTUARY¹

Laurence E. Gadbois* and Bruce J. Neilson
Virginia Institute of Marine Science/School of Marine Science
The College of William & Mary in Virginia
Gloucester Point, VA 23062

ABSTRACT

Interpreting the data from water quality monitoring networks is difficult if the natural variability of the system is not known. Analysis of data from estuaries is made more difficult by the advection of spatial patterns with the oscillating tides. In this study samples were collected from a polyhaline, partially-mixed estuary which typically has minimal longitudinal gradients for most water quality measures. Water samples from a 2.5 meter shoal were analyzed for nitrogen and phosphorus content. Data from two 57-hour intensive studies indicate that hourly fluctuations were on the order of 15%. Furthermore, the variations showed no significant correlation with tidal height. In the second part of the study, samples collected at 45 minute intervals were composited to determine daily average conditions over an annual cycle. In addition to a strong seasonal signal, it was found that daily fluctuations were on the order of 20 to 50 percent for total nitrogen and total phosphorus and 30 to 70 percent for nitrate-plus-nitrite nitrogen. Data from monitoring networks with less frequent observations must be interpreted with caution given the magnitude of these short term variations, which are assumed to arise from natural phenomena.

¹VIMS Contribution No. XXXX.
*Current address: Naval Ocean Systems Center, San Diego, CA 92152.

INTRODUCTION

Assessment of water quality conditions in aquatic and marine systems typically involves the collection of grab samples on which pollutant concentrations are measured. Often we do not know the extent to which these grab samples are measuring "typical" values as opposed to values strongly influenced by time-transient perturbations. Hence, natural temporal variability can influence the validity and usefulness of conclusions based upon a single or small number of samples.

Natural variations occur in both space and time. Spatial scales range from the microgradients surrounding plankton and suspended particles (Lehman and Sandgren, 1982; Korstad, 1983) to vertical and horizontal macrogradients of the same scale as the water body. Time and space variations are inter-related in estuaries because spatial patterns are advected up and down river with the oscillating tides.

This effect can be seen in the data (Figure 1) from an around-the-clock sampling of the Pagan River, a small tributary of the James River in Virginia (Bosenbaum and Neilson, 1977). Salinity levels were highest at High Water Slack (HWS) and lowest at Low Water Slack (LWS). Municipal wastewater discharges and the effluent from meat packing plants resulted in elevated bacterial levels in the upper reaches of the estuary. Fecal coliform levels were lowest at HWS, when dilution with relatively clean James River water was the greatest. Thus the temporal variations of fecal coliforms and salinity were 180 degrees out of phase, but both showed semi-diurnal variations with the tides. Algal growth, stimulated by the nutrients introduced by the several discharges, varied in response to the daily cycle of sunlight. Dissolved oxygen concentrations, which were in turn affected by the photosynthetic activity, showed a marked diurnal signal with limited tidal effects.

[Figure 1: three panels, (a) salinity, (b) fecal coliforms, (c) dissolved oxygen, Pagan River.]

Figure 1. Short-term variations in water quality in the Pagan River, Virginia: (a) semi-diurnal (tidal) variations in salinity levels at three stations, (b) semi-diurnal (tidal) variations in fecal coliform levels at four stations, and (c) diurnal variation in dissolved oxygen concentrations at a single station.

The present study had as its objective the quantification of non-tidal temporal variability using two data sets.

Day-to-day variations were studied using observations of daily average water quality conditions made over an annual cycle. Hourly water quality measurements taken over two 57-hour periods were used to investigate short term variations such as those due to astronomical tides.

MATERIALS AND METHODS

Water samples were drawn from the mid-depth of the 2.5 meter water column over a nearshore shoal area in the polyhaline York River (Latitude 37° 14.8', Longitude 76° 30.1'). Samples were collected with an ISCO automatic water sampler, deposited in glass jars packed in ice, and collected within three days. Samples that had been withdrawn from the river every 45 minutes were combined into daily composite samples. Samples were filtered through a 300 micron nylon mesh to remove detritus and large zooplankton. Sampling was conducted from July 1983 to June 1984.

During the two 57-hour intensive studies (0800 May 22 through 1600 May 24 and 0800 May 30 through 1600 June 1, 1984), samples were taken from the river every hour and collected within eight hours. They were filtered through a 300 micron nylon mesh as detailed above and frozen within 12 hours of sampling. These samples were analyzed individually. The periods chosen were 180 degrees out of phase with regard to the tidal cycle.

Nutrient measurements included total phosphorus (EPA, 1979 - Method 365.2), total nitrogen (D'Elia and Streudler, 1977, and EPA, 1979 - Method 353.2), ammonia nitrogen (EPA, 1979 - Method 350.1), and nitrate-plus-nitrite nitrogen (EPA, 1979 - Method 353.2).

Every tenth sample was run in duplicate and spiked with a known standard to measure the precision and accuracy of the analytical technique. Duplicates and spikes were within acceptable limits (EPA, 1979). All containers and labware which contacted the samples were rinsed with tap water three times, rinsed with 50% HCl once, rinsed with distilled deionized water three times, and air dried before use. The intake hose for the automatic water sampler was washed as described above each week.

RESULTS

Hour-to-hour variability: Total phosphorus ranged between 0.041 and 0.093 mg/l during the two intensive sampling periods. The mean value, standard deviation, range, minimum value, maximum value, and mean hourly fluctuation were very similar for the two periods (see Table 1 and Figures 2 and 3).
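The quantities in Table 1 can be computed from an hourly series as below. This sketch assumes that "hourly fluctuation" means the absolute change between consecutive hourly values. It also assumes that standard deviations are sample (n - 1) values. Neither definition is stated in the text, and `hourly_summary` is a hypothetical name.

```python
from statistics import mean, stdev


def hourly_summary(series):
    """Table-1-style summary of an hourly concentration series (mg/l).

    "Hourly fluctuation" is taken as the absolute change between
    consecutive hourly values, an assumption about the paper's definition.
    """
    if len(series) < 3:
        raise ValueError("need at least three hourly values")
    changes = [abs(b - a) for a, b in zip(series, series[1:])]
    lo, hi = min(series), max(series)
    s = {
        "mean": mean(series),
        "sd": stdev(series),
        "range": hi - lo,
        "min": lo,
        "max": hi,
        "mean_hourly_fluctuation": mean(changes),
        "sd_hourly_fluctuation": stdev(changes),
    }
    # The lower block of Table 1 expresses three of these as percent of the mean.
    for key in ("sd", "range", "mean_hourly_fluctuation"):
        s[key + "_pct"] = 100.0 * s[key] / s["mean"]
    return s


# Hypothetical four-hour series of total phosphorus (mg/l).
print(hourly_summary([0.05, 0.06, 0.05, 0.07]))
```

The same percentage arithmetic applied to Table 1's own values reproduces its percent rows. For example, for total phosphorus in the first sampling, 0.040 / 0.055 gives a range of about 73 percent of the mean.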

Total nitrogen concentrations showed generally similar behavior. Although mean concentrations were slightly higher during the second period, the standard deviation, range of values, and mean hourly fluctuation were all smaller during the latter sampling effort.

When t h e d a t a f o r t h e s o l u b l e i n o r g a n i c p o r t i o n s a r e e x a m i n e d ,

one notes that nitrate-plus-nitrite e l e v a t e d and ammonia-nitrogen sampling period.

n i t r o g e n l e v e l s were s l i g h t l y

l e v e l s were much h i g h e r d u r i n g t h e s e c o n d
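The percent-of-mean rows in Table 1, and the soluble inorganic shares of total nitrogen quoted in the text, can be recomputed from the tabulated means and standard deviations. A minimal Python sketch follows (the helper names are illustrative). Because the inputs are the rounded table values, a result can differ by a point from a printed percentage, e.g. 17 rather than 18 for the NO2+NO3 standard deviation in the second sampling.

```python
# Means and standard deviations (mg/l) as printed in Table 1.
FIRST = {  # May 22-24, 1984, n = 57
    "TP": (0.055, 0.011),
    "TN": (0.548, 0.096),
    "NH4": (0.073, 0.025),
    "NO2+NO3": (0.030, 0.005),
}
SECOND = {  # May 30-June 1, 1984, n = 57
    "TP": (0.058, 0.011),
    "TN": (0.581, 0.068),
    "NH4": (0.188, 0.022),
    "NO2+NO3": (0.035, 0.006),
}


def sd_percent_of_mean(table):
    """Standard deviation as a percentage of the mean for each constituent."""
    return {name: 100.0 * sd / mean for name, (mean, sd) in table.items()}


def soluble_inorganic_share(table):
    """(NH4 + NO2+NO3) as a percentage of total nitrogen, computed from the means."""
    return 100.0 * (table["NH4"][0] + table["NO2+NO3"][0]) / table["TN"][0]


for label, table in (("first", FIRST), ("second", SECOND)):
    pct = {name: round(value) for name, value in sd_percent_of_mean(table).items()}
    print(label, pct, f"soluble inorganic share = {soluble_inorganic_share(table):.0f}%")
```

Run as is, the shares come out at 19% and 38% for the two samplings, in line with the text.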

Table 1. Summary of nutrient data from the intensive samplings (mg/l).

First sampling: May 22-24, 1984       TP      TN      NH4     NO2+NO3
  Mean (n = 57)                       0.055   0.548   0.073   0.030
  Standard Deviation                  0.011   0.096   0.025   0.005
  Range                               0.040   0.400   0.106   0.021
  Minimum                             0.041   0.343   0.025   0.019
  Maximum                             0.081   0.743   0.131   0.040
  Mean Hourly Fluctuation             0.012   0.094   0.014   0.004
  Std Dev of Hourly Fluctuation       0.009   0.075   0.015   0.004
  As percent of sample mean:
    Standard Deviation                20      18      34      17
    Range                             73      73      145     70
    Mean Hourly Fluctuation           21      17      19      12

Second sampling: May 30-June 1, 1984  TP      TN      NH4     NO2+NO3
  Mean (n = 57)                       0.058   0.581   0.188   0.035
  Standard Deviation                  0.011   0.068   0.022   0.006
  Range                               0.050   0.374   0.128   0.031
  Minimum                             0.043   0.431   0.105   0.019
  Maximum                             0.093   0.805   0.233   0.050
  Mean Hourly Fluctuation             0.011   0.068   0.015   0.005
  Std Dev of Hourly Fluctuation       0.009   0.055   0.014   0.005
  As percent of sample mean:
    Standard Deviation                19      12      12      18
    Range                             86      64      68      89
    Mean Hourly Fluctuation           18      12      8       14

Figure 2. Short-term variations in water quality in the York River at Gloucester Point, May 22-24, 1984: (a) Total phosphorus, (b) Total nitrogen, (c) Ammonia nitrogen, (d) Nitrate-plus-nitrite nitrogen, and (e) Tidal height.

Figure 3. Short-term variations in water quality in the York River at Gloucester Point, May 30-June 1, 1984: (a) Total phosphorus, (b) Total nitrogen, (c) Ammonia nitrogen, (d) Nitrate-plus-nitrite nitrogen, and (e) Tidal height.

Previous studies in the York River have documented changes in water quality (Webb and D'Elia, 1980; D'Elia et al., 1981) that occur when there is increased mixing and reduced stratification around times of spring tide (Haas et al., 1981). For the case at hand, the tide range was about 55 cm during the first sampling (neap tide) and about 80 cm during the second period (spring tide). The elevated ammonia concentrations could be the result of the mixing of ammonia-rich bottom waters throughout the water column at the time of spring tides. It is curious, however, that total nitrogen levels remained nearly constant.

The increase in mean TN (about 0.03 mg/l) was much smaller than the increase in mean ammonia levels (0.11 mg/l). The soluble inorganic fractions accounted for about 19% of the total nitrogen during the first sampling but made up 38% of the total nitrogen during the second sampling. The data indicate that hour-to-hour variations are on the order of 10% to 20% of the mean of a large number of samples. Additionally, the range of concentrations observed was of the same order of magnitude as the mean concentration for each of the water quality measures. Factor analysis of the hourly nutrient concentrations and tidal heights revealed no significant correlation between nutrient levels and the stage of the tide. The lack of correlation is apparent when the data are compared in graphical format (Figures 2 and 3).

Figure 4. Annual variation of daily average water quality conditions in the York River at Gloucester Point from June 1983 to July 1984: (a) Total phosphorus, (b) Total nitrogen, and (c) Nitrate-plus-nitrite nitrogen.

Day-to-day variability: Seasonal fluctuations in daily average nutrient concentrations were pronounced (Figure 4). Total phosphorus levels were highest in the summer (mean for July through September of about 0.080 mg/l). From this period until mid-January, total nutrient levels declined to about half the summer values. The increase which began in mid-January and persisted through the end of sampling in June was not as rapid as the decline from mid-summer levels. Examination of the graphical summary of the data shows that most of the values fell in a band about 0.02 to 0.04 mg/l wide, but much higher values were frequently recorded. Phosphorus is known to sorb to mineral particles, and these elevated readings could be associated with the increased levels of turbidity that occur following storms. Total nitrogen followed a similar, although less pronounced, pattern.

Concentrations averaged over 0.7 mg/l from July through the end of October and 0.5 mg/l during the winter. The seasonal trend for nitrate-plus-nitrite, however, was the inverse of the total nitrogen pattern and was of a far greater magnitude. Early summer levels were near zero (mean of 0.003 mg/l for June 1983), and the mean for July and August was only 0.016 mg/l. Concentrations increased from late August and averaged around 0.083 mg/l through January. Daily values of 0.10 mg/l in late January were followed by a rapid drop in concentration in February and March; spring (February through April) values averaged about 0.040 mg/l and decreased to a mean of about 0.025 mg/l for May and June. The pattern of day-to-day variability resembles the seasonal pattern in that nitrate-plus-nitrite was substantially more variable than total nitrogen and total phosphorus. The daily fluctuations were on the order of 30% to 50% for TN and TP and several hundred percent for nitrate-plus-nitrite.
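The monthly and seasonal means quoted above are averages of daily values over calendar periods. A standard-library Python sketch of that grouping follows; it skips missing days rather than counting them as zero. The input dictionary and its values are hypothetical.

```python
import statistics
from collections import defaultdict
from datetime import date


def monthly_means(daily):
    """Average daily values by (year, month); `daily` maps date -> value, None = missing."""
    groups = defaultdict(list)
    for day, value in daily.items():
        if value is not None:  # skip missing days rather than counting them as zero
            groups[(day.year, day.month)].append(value)
    return {key: statistics.mean(values) for key, values in sorted(groups.items())}


# Hypothetical daily nitrate-plus-nitrite averages, mg/l.
daily_nox = {
    date(1983, 6, 1): 0.002,
    date(1983, 6, 2): 0.004,
    date(1983, 6, 3): None,
    date(1983, 7, 1): 0.015,
    date(1983, 7, 2): 0.017,
}
for (year, month), mean in monthly_means(daily_nox).items():
    print(f"{year}-{month:02d}: {mean:.3f} mg/l")
```

Seasonal means over several months (e.g. February through April) can be formed the same way by changing the grouping key.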

DISCUSSION
One would expect nutrient concentrations in the water column to be affected by runoff from the land. Generally speaking, however, high values for one nutrient usually were not correlated with high values for other nutrients. This probably is due to missing data, the large volume of the river near the sampling site, and the effects of tidal mixing. However, in mid-April 1984, all three variables measured showed elevated levels (days 116-118). River flow was high for the month, with local maxima on the 18th (day 109) and the 25th (day 116). Rainfall records indicate not only that rainfall was above normal, but that most of it occurred on a few days (April 4-5, 14-16, and 22-23). It is not clear why these events had such a pronounced effect on water quality, but the concurrent rise in TN, TP and nitrate-plus-nitrite at a time of high river flow suggests that runoff was the cause.

A marked reduction in nitrate-plus-nitrite levels can be noted at about day 45. The York River estuary typically experiences a spring phytoplankton bloom, and this is believed to be the cause of the change in nitrate-plus-nitrite levels. Water temperatures were about 5 °C at that time. In December and January, the water was relatively clear (Secchi depth readings were on the order of 1.5 m), in part because phytoplankton levels were low (chlorophyll concentrations averaged about 6 micrograms per liter). From mid-February through the end of March, the Secchi depth averaged only about 0.75 m and chlorophyll levels averaged over 20 micrograms per liter. Whether the algae utilized the nitrate and nitrite directly, or utilized ammonia and thereby reduced the amount of ammonia available for nitrification, the data suggest that the decrease in nitrate-plus-nitrite levels was related to the spring algal bloom.

CONCLUSIONS
Data from two types of sampling indicate that natural variability in water quality conditions is great. Hour-to-hour variations are on the order of 10% to 20% of the mean of a large number of samples. The range of concentrations observed over a day or two is of the same magnitude as the mean concentration.

Seasonal variations in water quality can be pronounced. Total nitrogen and total phosphorus levels were highest in the summer and lowest in the winter; nitrate-plus-nitrite nitrogen was present at very low levels during the summer, presumably as the result of uptake of ammonia and nitrate by phytoplankton, and was abundant during the winter. Day-to-day variations were on the order of 30% to 50% for TN and TP and up to several hundred percent for nitrate-plus-nitrite, despite a sampling protocol designed to reduce the influence of tides and other short-term phenomena. Presumably meteorological events such as the passage of fronts, winds, and runoff from the adjacent land produce some of the variability observed. Monitoring data must be interpreted with the understanding that there is considerable variability in the records at time scales of hours and days, and care must be taken to ensure that conclusions derived from water quality monitoring programs are made with that understanding in mind.

REFERENCES AND LITERATURE CITED

D'Elia, C.F. and C. Streudler, 1977. "Determination of total nitrogen in aqueous samples using persulfate digestion". Limnology & Oceanography 22(4): 760-764.

D'Elia, C.F., K.L. Webb and R.L. Wetzel, 1981. "Time Varying Hydrodynamics and Water Quality in an Estuary" in Estuaries and Nutrients, Neilson and Cronin Eds., Humana Press, Clifton, N.J.

Environmental Protection Agency (EPA), 1979. Methods for Chemical Analysis of Water and Wastes. EPA-600/4-79-020.

Gadbois, L.E., 1984. "The Response of Benthic Respiration to Nutrient Levels", unpublished MS thesis, School of Marine Science, College of William & Mary in Virginia, 91 pp.

Haas, L.W., F.J. Holden and C.S. Welch, 1981. "Short Term Changes in Vertical Salinity Distribution of the York River Estuary Associated with Neap-Spring Tidal Cycle" in Estuaries and Nutrients, Neilson and Cronin Eds., Humana Press, Clifton, N.J.

Korstad, J., 1983. "Nutrient regeneration by zooplankton in southern Lake Huron". J. Great Lakes Res. 9(3): 374-388.

Lehman, J.T. and C.D. Sandgren, 1982. "Phosphorus dynamics of the procaryotic nanoplankton in a Michigan lake". Limnol. & Oceanogr. 27(5): 828-838.

Rosenbaum, A. and B. Neilson, 1977. "Water Quality in the Pagan River". Spec. Rep. No. 132, Virginia Institute of Marine Science, Gloucester Point, VA.

Webb, K.L. and C.F. D'Elia, 1980. "Nutrient and Oxygen Redistribution During a Spring Neap Tidal Cycle in a Temperate Estuary". Science 207, 29 Feb 1980, pp. 983-985.


EXTENSION OF WATER QUALITY DATA BASES IN PLANNING FOR WATER TREATMENT
G.T. ORLOB AND N. MARJANOVIĆ
University of California, Davis

ABSTRACT
Design of water treatment facilities requires estimation of extreme values of critical water quality parameters. When water quality data for the source are sparse or non-existent, a record sufficient for statistical analysis must be constructed from fragmentary records at nearby locations. A procedure is described for construction of the necessary record and derivation of a design target vector of water quality. It includes spatial correlation of records, time series analysis, frequency analysis, and partial correlations of water quality parameters from both continuous and grab sampling campaigns. It is demonstrated for the example of the North Bay Aqueduct of the California State Water Project.

1. INTRODUCTION
The North Bay Aqueduct, a component of the California State Water Project (SWP), will divert water from a tributary of the Sacramento River in Northern California to serve municipal and industrial users, who will have to provide treatment preparatory to distribution. Initially, the SWP planned to divert water from Cache Slough in the northern Sacramento-San Joaquin Delta, the present location of the intake for the City of Vallejo. However, progressive deterioration of water quality since Vallejo's pumping plant was installed has motivated designers of the new aqueduct to consider an alternative location on nearby Lindsey Slough, as shown in Figure 1, where water is expected to be of superior quality. For the design of water treatment facilities it is necessary to derive the statistical properties of water quality at the new location using records at Cache Slough, Lindsey Slough, and other sampling stations, without the advantage of a common period of observation.

FIGURE 1. LINDSEY SLOUGH AND VICINITY, LOCATION MAP FOR DIVERSION POINT

TABLE 1
SPATIAL CORRELATIONS BETWEEN EC AT CACHE SLOUGH AND SELECTED LOCATIONS

Station  Location                                       Sample  Analysis  Period  EC(sta.)/EC(C.S.)
2        Cache Slough at Vallejo Pumping Plant          C       EC        72-84   1.0
3        Lindsey Slough at Hastings Cut                 G       P         77-83   0.69; 0.57-0.67
4        Barker Slough at Hwy 113                       G       OC        85      0.50
5        Calhoun Cut at Hwy 113                         G       OC        85      0.60
6        Prospect Slough, Liberty Island                G       OC        85      0.25
7        Lindsey Slough near Rio Vista                  G       OC        52-66   0.40; 0.37-0.43
8        Barker Slough at Proposed Pumping Plant        G       OC        85      0.77-0.96
9        Cache Slough at Hastings Island Pumping Plant  G       P         77-83   0.55; 0.52

C = Continuous recorder; G = Grab; EC = Electrical conductivity; P = Partial (EC, Cl, TDS); OC = Complete chemical

The temporal distribution of partial records at various locations in the study area is summarized in Table 1. Records at Cache Slough, obtained by a continuous EC recorder over the period 1972 to 1984, are sufficiently detailed in the temporal sense to allow estimation of long-term trends, seasonal variations, tidal cycles (the location is influenced by tides) and longer-period quality changes due to discrete hydrologic events, yet they do not include the water quality parameters of greatest interest to treatment plant designers. Records at Lindsey Slough, on the other hand, although more detailed in terms of quality constituents, extending over the period 1952-1969, are from monthly grab samples collected without regard to hydrologic conditions that may affect water quality. The problem in this investigation is to derive a record of quality at the Cache Slough location sufficient to allow correlation with the Lindsey Slough data.

When this is accomplished, the Lindsey Slough record, with more quality information relative to design, can be extended in time, translated in space and analyzed statistically to establish limiting criteria for treatment plant design.

In this paper a procedure for development of the statistical properties of EC at the proposed diversion location (Station 8, Figure 1) is described. Additionally, the extension of this record to create a vector of water quality concentrations of key design parameters is discussed. Finally, the results of statistical analysis, after adjustment for treatment plant operational constraints, are transformed into specific targets for design.

2. STATISTICS OF WATER QUALITY
Two basic problems are presented in this situation, one concerned with the spatial displacement between the points of observation and the location of the diversion, and the other concerned with temporal discontinuities in the various records.

2.1 Spatial Correlations

As illustrated in Table 1, there were no periods of concurrent observation at the two locations of longest record, Cache Slough and Lindsey Slough near Rio Vista. However, one set of grab samples (EC and chlorides) taken over the period 1977-1983 does include both Cache Slough Hastings (Sta. 9) and Lindsey Slough Hastings (Sta. 3), plus the continuous EC record at Cache Slough Vallejo (Sta. 2). A correlogram for the Cache Slough stations is shown in Figure 2. Synoptic surveys conducted in 1984 and 1985 provided additional information that permitted extension of the 77-83 correlations to the diversion location. Results of these studies are summarized for all stations in the area in Table 1.

A key relationship in translating the experience of the two longer records to the diversion location is the correlation between the Lindsey Slough stations (3 and 7). Results of correlation analysis indicate that water quality is generally degraded in an upstream direction in both Cache and Lindsey Sloughs. For example, in Cache Slough the lower station shows water quality that is superior to that at the Vallejo diversion point by the ratio 0.55:1. In Lindsey Slough the lower station is also superior, by a ratio of 0.40:0.69 (in terms of Cache Slough quality).
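Figure 2 summarizes the Cache Slough relationship as a straight line through the origin (slope 0.55), and Table 1 reports similar ratios for other stations. The paper does not say how the ratios were fitted; the sketch below assumes ordinary least squares with no intercept, slope = Σxy / Σx², and then uses a ratio to translate EC between stations. Function names and numbers are hypothetical.

```python
def ratio_through_origin(x, y):
    """Least-squares slope b of y = b * x with no intercept, skipping missing pairs (None)."""
    pairs = [(a, c) for a, c in zip(x, y) if a is not None and c is not None]
    sxx = sum(a * a for a, _ in pairs)
    if sxx == 0:
        raise ValueError("no usable x values")
    return sum(a * c for a, c in pairs) / sxx


def translate_ec(ec_reference, ratio):
    """Estimate EC at another station from EC at the reference station."""
    return ratio * ec_reference


# Hypothetical paired grab samples, µmhos/cm: Vallejo (x) and a downstream station (y).
vallejo = [200, 350, None, 600, 800]
downstream = [115, 190, 240, 330, 435]
b = ratio_through_origin(vallejo, downstream)
print(f"fitted ratio = {b:.2f}; EC 1000 at Vallejo -> {translate_ec(1000, b):.0f} downstream")
```

Forcing the line through the origin matches the proportional form EC(station) = ratio × EC(Cache Slough) used in Table 1; a fit with an intercept would not reduce to a single ratio.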

The significance of this degradation is that it reflects the dominance of land-derived sources of salinity over the primary source of water for diversion, the Sacramento River at the confluence with its two tributaries. During periods of storm runoff, water entering the upper reaches of the sloughs is generally inferior, apparently as a result of salts accumulated during the prior dry period. During dry periods this condition persists as local accretions from groundwater and pick-up of irrigation drainage are added to the system. The overall result is a salinity (quality) gradient that is the inverse of that normally encountered in estuarial systems, i.e. negative in the seaward direction.

FIGURE 2. CORRELATION BETWEEN EC's AT TWO CACHE SLOUGH STATIONS, 1978-1983 (EC at Cache Slough Hastings Island plotted against EC at the Vallejo Pumping Plant, µmhos/cm; fitted line $\mathrm{EC}_{9} = 0.55\,\mathrm{EC}_{2}$)

2.2 Time Series Analysis

Attempts to extend the partial records by traditional statistical methods were not altogether successful in this case, apparently due to the complexities of the estuarial environment. Nevertheless, they provided useful insight in the interpretation of partial water quality records. The Cache Slough EC record, a fragment of which is illustrated in Figure 3, was divided into two parts of equal length and tested for stationarity with BMDP (Dixon et al., 1981). The difference in mean values was found to be significant, and the time series was tested for trend. A positive trend existed, apparently due to the accumulation of salt in the tributary drainage from domestic waste discharges of a small city where increased salinity from increasing use of water softeners has been noted. After detrending of the data, two cycles were identified: a primary cycle associated with surface runoff during the period October through March, and a secondary cycle related to the irrigation period April through September. The dominant cause of abnormal salinities, however, is surface runoff.

FIGURE 3. PARTIAL RECORD OF EC AND PRECIPITATION AT CACHE SLOUGH, CITY OF VALLEJO PUMPING PLANT

Regression with precipitation data at the nearest meteorologic station was attempted. This effort was unsuccessful; that is, correlations were not sufficiently strong to justify utilization of a regression equation to extend the data base to overlap that of the Lindsey Slough station. It was necessary then to resort to frequency analysis of the partial records, relating these through the spatial correlations described above.

2.3 Frequency Analysis
The time series analysis did produce information of value in frequency analysis of Cache Slough EC data. It associated the dominant episodes of high EC with periods of surface runoff, thus indicating the importance of this source of salinity in establishing critical design criteria for water treatment.

Two factors control the design of water treatment from the point of view of specific water quality parameters: peak concentration and duration. In the analysis of EC data at Cache Slough, individual episodes were characterized by frequencies of exceedence at specified durations of 1, 3, 7 and 30 days. Results of this analysis are summarized in Table 2. Typical distributions for EC at Cache Slough are illustrated in Figure 4. These distributions are then translated to Lindsey Slough and the location of the proposed diversion by the correlation relationships summarized in Table 1.

TABLE 2
FREQUENCY-DURATION-EXCEEDENCE, ELECTRICAL CONDUCTIVITY AT CACHE SLOUGH, 1972-1984

Limits of exceedence, µmhos/cm

                    Recurrence interval, years
Duration, days      1       2       5       10
1                   1170    1220    1350    1950
3                   1070    1110    1140    1160
7                   950     1000    1070    1120
30                  580     740     870     950
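Table 2 gives EC levels for combinations of duration and recurrence interval, but the fitting method is not described. One conventional approach, assumed here, is to take the largest D-day running mean in each year (an annual-maximum series) and assign recurrence intervals with the Weibull plotting position T = (n + 1)/m, where m is the rank from largest to smallest. The sketch assumes complete daily records; the helper names and data are illustrative.

```python
import statistics


def running_means(values, window):
    """Means of every run of `window` consecutive daily values (assumes no gaps)."""
    return [statistics.fmean(values[i:i + window]) for i in range(len(values) - window + 1)]


def annual_maxima(daily_by_year, window):
    """Largest `window`-day running mean in each year with at least `window` values."""
    return {
        year: max(running_means(values, window))
        for year, values in daily_by_year.items()
        if len(values) >= window
    }


def weibull_recurrence(maxima):
    """Pair each annual maximum with recurrence interval T = (n + 1) / m, largest first."""
    ranked = sorted(maxima, reverse=True)
    n = len(ranked)
    return [((n + 1) / rank, value) for rank, value in enumerate(ranked, start=1)]


# Hypothetical daily EC (µmhos/cm) for three short "years".
record = {
    1980: [400, 900, 1200, 800, 500],
    1981: [300, 350, 600, 650, 400],
    1982: [450, 1000, 1100, 700, 420],
}
for window in (1, 3):
    maxima = annual_maxima(record, window).values()
    print(window, weibull_recurrence(maxima))
```

With a 13-year record such as 1972-1984, plotting positions reach at most 14 years, so values for a given interval (e.g. 5 or 10 years) would be read off by interpolation between ranked points or from a fitted distribution.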

2.4 Other Quality Parameters
Electrical conductivity is not itself sufficient for design purposes. It is necessary also to describe the water supply in terms of its mineral composition, hardness, silica, turbidity and the concentration of certain metals, e.g. iron and manganese. Since these data were not available at Cache Slough, they had to be developed for the Lindsey Slough - Rio Vista location, then transferred to the diversion point. For the principal mineral constituents, i.e. cations and anions, and quantities derived from these, like hardness and total dissolved solids, the required values can be derived by correlation with EC. In general these correlations take the form

$$X = K\,(\mathrm{EC})^{n} \qquad (1)$$

where X is the desired quality parameter and K and n are constants. Table 3 summarizes the EC correlations developed for the Lindsey Slough location. Figure 5 presents a representative quality correlation, chlorides vs EC.

FIGURE 4. FREQUENCY OF EXCEEDENCE OF EC AT VARIOUS DURATIONS, CACHE SLOUGH--VALLEJO

FIGURE 5. WATER QUALITY CORRELATION, CHLORIDES VS ELECTRICAL CONDUCTIVITY, LINDSEY SLOUGH NEAR RIO VISTA, 1952-1966 (chloride plotted against electrical conductivity, µmhos/cm; fitted curve $\mathrm{Cl^-} = 0.015\,\mathrm{EC}^{1.3}$)

TABLE 3
CORRELATION OF WATER QUALITY CONSTITUENTS WITH ELECTRICAL CONDUCTIVITY, LINDSEY SLOUGH NEAR RIO VISTA

Constituent          Range        EC Correlation
EC                   140 - 500    1.0
TDS                  100 - 270    40 + 0.46 EC
Cl-                  10 - 50      0.015 EC^1.3
TH (as CaCO3)        50 - 160     0.153 EC^1.14
Na+                  10 - 40      0.035 EC^1.14
Ca++ (as CaCO3)      8 - 36       0.075 EC^1.14
Mg++ (as CaCO3)                   0.078 EC^1.14
SO4                  7 - 70       0.0008 EC^1.83
HCO3                 60 - 200     0.71 EC^0.9
SiO2                 10 - 25      none
Turbidity            20 - 700     none

Reactive (dissolved) silica, an important parameter for certain industrial processes, cannot be related to EC, but is more closely identified with the indigenous soils of the tributary area. In this locality soluble silica concentrations varied between rather narrow limits, from 10 to 25 mg/l, and did not appear to depend on hydrologic or agricultural conditions. Turbidity, on the other hand, was closely related to hydrologic conditions, particularly episodes of surface runoff generated by heavy precipitation. Since these episodes were generally stochastic in character, traditional frequency analysis was possible, although limited to some extent by the available data. Turbidities measured at the Cache Slough Vallejo intake for a period of about four years, 1980-1983, served as surrogate measures for the diversion point. They were utilized directly, without correction for geographic dislocation.

2.5 Water Quality at Diversion Point
Five-year, 1-day concentrations of key water quality parameters were determined at the several sampling locations, then translated to the diversion point by means of the correlations presented in Table 1. Thus, a quality vector was formed for the diversion point that was considered representative of extremes that would have to be accommodated in an economic design for water treatment. The final design criteria are presented in Table 4.
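The construction of the quality vector can be sketched as: translate a design EC at a reference station by a spatial ratio of the kind listed in Table 1, then apply the Table 3 relations to obtain the constituents. The design EC and ratio below are hypothetical inputs, and the sketch flags results that fall outside the 140-500 µmhos/cm EC range of the Table 3 data, since applying the relations there is an extrapolation. It does not include the adjustment for treatment plant operational constraints mentioned in the Introduction, so its output is not expected to match Table 4.

```python
# Table 3 relations (Lindsey Slough near Rio Vista): X = K * EC**n, EC in µmhos/cm.
POWER_LAWS = {
    "Cl": (0.015, 1.3),
    "TH as CaCO3": (0.153, 1.14),
    "Na": (0.035, 1.14),
    "Ca as CaCO3": (0.075, 1.14),
    "Mg as CaCO3": (0.078, 1.14),
    "SO4": (0.0008, 1.83),
    "HCO3": (0.71, 0.9),
}
EC_RANGE = (140.0, 500.0)  # EC range covered by the Table 3 data


def quality_vector(ec_reference, station_ratio):
    """Translate a reference-station EC by a spatial ratio, then apply the Table 3 relations.

    Returns (vector, extrapolated), where `extrapolated` is True if the translated EC
    lies outside the range of the data behind the Table 3 correlations.
    """
    ec = ec_reference * station_ratio
    vector = {"EC": ec, "TDS": 40.0 + 0.46 * ec}
    for name, (k, n) in POWER_LAWS.items():
        vector[name] = k * ec ** n
    extrapolated = not (EC_RANGE[0] <= ec <= EC_RANGE[1])
    return vector, extrapolated


# Hypothetical inputs: a design EC at the reference station and a spatial ratio.
vector, extrapolated = quality_vector(1000.0, 0.3)
for name, value in vector.items():
    print(f"{name:12s} {value:8.1f}")
print("extrapolated beyond Table 3 data:", extrapolated)
```

Because the calcium and magnesium coefficients sum to the total hardness coefficient, the sketch's Ca + Mg equals its total hardness, mirroring the 180 + 170 = 350 split in Table 4.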

TABLE 4
WATER TREATMENT DESIGN TARGETS, NORTH BAY AQUEDUCT POINT OF DIVERSION

Constituent                              Target, mg/L*
Turbidity, NTU                           710
Dissolved SiO2, mg/L                     30
Calcium                                  180
Magnesium                                170
Total Hardness                           350
Sodium                                   180
Potassium                                14
Chloride                                 128
Sulfate                                  175
Alkalinity                               241
Total Dissolved Solids, mg/L             760
Electrical Conductivity, µmhos/cm        810

*As equivalent CaCO3 except as otherwise noted.

3. SUMMARY AND CONCLUSIONS
A general procedure for developing water quality targets for the design of water treatment facilities has been described. It includes consideration of spatial and temporal variations in water quality at unknown and adjacent locations, using fragmentary and discontinuous records, and cause-effect relationships. The principal steps in the procedure are as follows:
1. Spatial correlation between stations with partial records
2. Time series analysis of selected records
3. Frequency analysis
4. Selection of design frequency and duration of exceedence
5. Correlation analysis between multiple parameters
6. Translation of quality characteristics to design location
7. Formation of a design target vector.
The procedure was applied to water quality data from the Cache-Lindsey Slough area in the vicinity of a proposed pumping diversion to the North Bay Aqueduct of the California State Water Project. A vector of water quality targets for design of a water treatment plant was derived.

REFERENCES
Dixon, W.J., Brown, M.B., Engleman, L., Frane, J.W., Hill, M.A., Jennrich, R.I. and Toporek, J.D., 1981. "BMDP Statistical Software", University of California Press, Berkeley, Ca.

STATISTICAL INFERENCES FROM COLIFORM MONITORING OF POTABLE WATER
WESLEY O. PIPES

INTRODUCTION
Coliform monitoring of water distribution systems involves collecting samples from water service locations and determining whether coliform bacteria are present in one or more subsamples, each subsample having a standard volume of either 10 ml or 100 ml. If the membrane filter (MF) technique is used for sample examination, a single subsample of 100 ml is tested and a number, the MF coliform colony count, is obtained along with the information about the presence of coliform bacteria. If the fermentation tube (FT) method is used, five 10 ml subsamples are tested and the number of subsamples with positive reactions (coliforms present) is recorded. Samples are collected on one or more days per month (but usually not every day of the month) and from one or more sampling locations (but certainly not every possible sampling location). At the end of the month the laboratory results are tabulated, certain parameters are calculated and compared with standards, and the acceptability of the water for human consumption is determined from the comparisons.

There are several very interesting statistical problems related to the process of coliform monitoring. There is a large section of the statistical literature which developed from the problems of coliform monitoring. This literature has been reviewed elsewhere (El Shaarawi and Pipes, 1982) and will not be explored further here. Some of the statistical problems have been dealt with in great depth while others have barely been touched.

The regulatory rationale for coliform monitoring is to provide a basis for decision making. The sampling results for a month are compared with acceptance criteria. If the results exceed the criteria, then some action must be taken to reduce the level of contamination of the water system. On the other hand, if the results are less than the acceptance criteria, no action need be taken.

I t i s usual t o r e p o r t t o t h e p u b l i c t h a t t h e w a t e r meets t h e bac-

t e r i o l o g i c a l standards w i t h o u t e x p l a i n i n g t h a t c e r t a i n l e v e l s o f c o n t a m i n a t i o n a r e a c c e p t a b l e under t h e standards used.

However,

184

Table 1 U. S . P R I M A R Y D R I N K I N G WATER REGULATIONS M i c r o b i o l o g i c a l Maximum C o n t a i n m e n t L e v e l s

A.

Membrane F i l t e r (MF) M e t h o d ( 1 0 0 m l S a m p l e s )

1. 2. 3.

B.

Sample a v e r a g e c o u n t s h a l l n o t be g r e a t e r t h a n 1 p e r 1 0 0 m l

No m o r e t h a n 1 s a m p l e w i t h c o u n t > 4 p e r 1 0 0 m l , i f l e s s t h a n 20 samples a r e exainined. No m o r e t h a n 5 % o f s a m p l e s w i t h c o u n t > 4 p e r 1 0 0 m l , o r more samples a r e examined.

F e r m e n t a t i o n Tube ( F T ) T e c h n i q u e ( f i v e 10 m l

1. 2. 3.

i f 20

o r t i ons )

No m o r e t h a n 1 0 % o f t u b e s p o s i t i v e .

No m o r e t h a n 1 s a m p l e w i t h 3 o r m o r e p o r t o n s p o s i t i v e

if

l e s s t h a n 20 s a m p l e s a r e examined. No m o r e t h a n 5 % o f s a m p l e s w i t h 3 o r m o r e p o r t i o n s p o s t i v e , i f 20 o r more s a m p l e s a r e examined.

i f t h e s t a n d a r d i s e x c e e d e d a n d t h i s f a c t is r e p o r t e d t o t h e p u b l i c , i t i s usual t o e x p l a i n t h a t i n s p i t e o f t h e existence o f "contamination"

i n t h e water t h e r e i s no danger t o h e a l t h .

The maximum microbiological contaminant levels (MCLs) of the U.S. Drinking Water Regulations are given in Table 1. These are examples of acceptance criteria presently in use. There are two different rules for each method of examination. It should be noted that the rules are written in terms of sample parameters rather than parameters of the occurrence of coliform bacteria in the distribution system. The two rules for each examination method are parallel: the first rule in each case is a limit on the average number of coliform bacteria in the samples, and the second rule is a limit on the fraction of the samples with large numbers of coliform bacteria present.

The number of samples examined each month varies from 1 for systems serving fewer than 1000 people to more than 500 for very large systems.

There are two other problems which will be mentioned here as an aside and then considered further in later sections. It is not clear that there is any reason for using one month as a standard sampling period other than as a matter of convenience. Ideally, the reporting period should be related to the persistence of the microbiological water quality.

Also, it is not clear why the number of samples examined per reporting period should be different for water distribution systems of different sizes. Indeed, sampling theory suggests that the number of samples required is related to the desired precision of the parameter estimation, not to the size of the water system.

Since 1978, we have examined some of the questions related to the monitoring of water distribution systems for coliform bacteria in studies at Drexel University.

The objective of this paper is to identify some of the yet unsolved problems and to stimulate further interest in attempts to find solutions for these problems. Coliform monitoring data can provide much more information about water systems than is now obtained, and there are some significant problems needing further statistical investigation.

FREQUENCY DISTRIBUTIONS FOR COLIFORM DENSITY

The studies at Drexel began with the question of the minimum number of samples per month needed for monitoring the smallest water distribution systems for coliform bacteria (Pipes and Christian, 1982). It was widely recognized that one sample per month for the smallest systems is not adequate but, in 1978, there was no good method of determining how many samples would be adequate. To approach the question of the adequacy of the number of samples, it is necessary to assume that the rule was intended as a limit on the average coliform density in the water distribution system. Clearly, the average coliform colony count of the samples can be used to estimate the mean coliform density of the water in the distribution system, and it seems reasonable to assume that the committee that formulated the first rule intended, in some way, to put a limit on the total number of coliform bacteria, which is the mean density times the volume of water in the system.

Also, in order to evaluate the adequacy of the number of samples, it is necessary to assume something about the desired precision of the estimate of the mean coliform density. Since the limit was set at 1 per 100 ml, we assumed that the formulators of the rule were concerned that a mean coliform density of 1 per 100 ml would indicate lack of adequate protection; i.e., that there is something significant about 1 per 100 ml other than that it is a small number which is not zero. We further assumed that, if 1 per 100 ml is a matter of concern, then 10 per 100 ml would indicate serious deficiencies. In other words, a confidence interval on the estimate of the mean coliform density which included 10 per 100 ml would not be acceptable. This leads to the formulation of a criterion that the sample statistics should allow estimation of a mean coliform density of 1 per 100 ml with a 95% confidence interval of ±1 per 100 ml.
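The precision criterion can be turned into a sample size once a variance for single-sample counts is assumed. The sketch below rests on our own assumption that the monthly mean count is approximately normal, so that n is the smallest integer with z * sqrt(s^2 / n) <= E, where E = 1 per 100 ml and z = 1.96; because coliform counts are strongly skewed, this is optimistic for small n. The function name is hypothetical, and the two variances plugged in are the Period I and Period II values reported for system WH in Table 2.

```python
import math


def samples_for_halfwidth(variance: float, half_width: float, z: float = 1.96) -> int:
    """Smallest n with z * sqrt(variance / n) <= half_width.

    Normal approximation to the sampling distribution of the mean count;
    `variance` is the variance of single 100 ml sample counts.
    """
    if variance < 0 or half_width <= 0:
        raise ValueError("variance must be >= 0 and half_width > 0")
    return max(1, math.ceil((z / half_width) ** 2 * variance))


# Variances of single-sample MF counts reported for system WH (Table 2).
for label, var in [("Period I", 5.42), ("Period II", 34.06)]:
    print(label, samples_for_halfwidth(var, half_width=1.0))
```

Under these assumptions the Period I variance calls for 21 samples and the Period II variance for 131, which illustrates that the required number is driven by variability and the chosen precision rather than by system size.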

Estimation of the mean coliform density of a water distribution system is easier if the frequency distribution of coliform density is known. In particular, if the variance of the coliform density is related to the mean density, then it is essential to know the frequency distribution. Our investigations of the frequency distributions of coliform densities have relied entirely on MF coliform colony counts.

MF coliform colony counts have to be integers, which has led some investigators to try to fit the counts to a negative binomial distribution. We have published on this (Christian and Pipes, 1983) but now believe that this procedure is incorrect. Use of the negative binomial requires the assumption that 100 ml is a natural sampling unit.

It is true that coliform bacteria occur only in units of one cell; however, the 100 ml volume for examination was arbitrarily selected. If 123.74 ml had been selected as the sample volume, it would have been clear that coliform density is a continuous variable, because an MF count of 1 would have indicated a density of 0.81 per 100 ml. There are several continuous frequency distributions which are suitable for describing the MF coliform colony counts obtained in samples from water distribution systems.

We have used the lognormal distribution because it is familiar to some water works personnel and it is convenient to work with. The lognormal distribution can be described completely by two parameters, which can be specified in two different domains. Let $X$ be coliform density and $Y = \ln X$ (the exponential formulas below require natural logarithms). Then $Y$ is normally distributed with mean $\mu_Y$ and variance $\sigma_Y^2$. The parameters in the count domain are the geometric mean, $\mu_X = e^{\mu_Y}$, and the geometric standard deviation, $\sigma_X = e^{\sigma_Y}$. The mean and variance of the untransformed densities are, respectively,

$$\alpha = \exp\left(\mu_Y + \tfrac{1}{2}\sigma_Y^2\right), \qquad \beta = \alpha^2\left[\exp\left(\sigma_Y^2\right) - 1\right].$$
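The two parameterizations can be checked numerically. The sketch below takes Y as the natural logarithm of X, computes α and β from μ_Y and σ_Y, and, by inverting the formula for α, returns the geometric mean implied by a given arithmetic mean and geometric standard deviation. The function names are illustrative only.

```python
import math


def lognormal_moments(mu_y: float, sigma_y: float) -> tuple[float, float]:
    """Arithmetic mean (alpha) and variance (beta) of X when ln X ~ N(mu_y, sigma_y**2)."""
    alpha = math.exp(mu_y + 0.5 * sigma_y ** 2)
    beta = alpha ** 2 * (math.exp(sigma_y ** 2) - 1.0)
    return alpha, beta


def geometric_mean_for(alpha: float, gsd: float) -> float:
    """Geometric mean mu_X = exp(mu_y) implied by arithmetic mean alpha and
    geometric standard deviation sigma_X = gsd (inverts the formula for alpha)."""
    sigma_y = math.log(gsd)
    return alpha * math.exp(-0.5 * sigma_y ** 2)


alpha, beta = lognormal_moments(mu_y=0.0, sigma_y=1.0)
print(round(alpha, 4), round(beta, 4))  # 1.6487 4.6708

# Arithmetic mean held at 1 per 100 ml while the geometric SD grows.
for gsd in (3.0, 30.0):
    print(gsd, geometric_mean_for(1.0, gsd))
```

With the arithmetic mean fixed at exactly 1 per 100 ml, a geometric standard deviation of 3 already puts the geometric mean near 0.55 per 100 ml, and a value of 30 (the GSD shown in Figure 1) puts it near 0.003 per 100 ml.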

It has already been pointed out that a sample of water with a low coliform density is unlikely to produce coliform colonies on an MF filter when a 100 ml subsample is used. If the coliform density is 0.1 per 100 ml (1 per liter), the probability of one or more coliforms in a 100 ml sample is 0.0952 and, if the coliform density is 0.01 per 100 ml (1 per 10 liters), the probability is 0.01, and so forth.
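The quoted probabilities follow if coliform cells are assumed to be randomly (Poisson) dispersed in the water, so that a subsample expected to hold λ cells contains at least one with probability 1 - exp(-λ). The Poisson assumption is ours, although it reproduces both figures in the text; the function name is hypothetical.

```python
import math


def prob_at_least_one(density_per_100ml: float, volume_ml: float = 100.0) -> float:
    """P(one or more coliform cells in a subsample of `volume_ml`),
    assuming cells are randomly (Poisson) dispersed in the water."""
    expected_cells = density_per_100ml * volume_ml / 100.0
    return 1.0 - math.exp(-expected_cells)


# Prints 0.0952 for 0.1 per 100 ml and 0.01 for 0.01 per 100 ml.
for density in (0.1, 0.01):
    print(density, round(prob_at_least_one(density), 4))

# A hypothetical 200 ml sample at 0.1 per 100 ml:
print(round(prob_at_least_one(0.1, volume_ml=200.0), 3))  # 0.181
```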

If the water in a distribution system meets the regulatory criterion of an average of no more than 1 per 100 ml, the geometric mean is considerably less than 1 per 100 ml even with a moderately small $\sigma_X$. Thus, in using MF coliform colony counts we are trying to evaluate a $\mu_X$ which is usually much less than any of the coliform densities that we are able to measure.

There is also an upper limit to the coliform density which can be measured by the MF method. If two coliform bacteria land next to each other on a membrane filter, the colonies that they produce will merge and be counted as a single colony. This effect is not too frequent at densities in the 1 to 10 colonies per filter range, but it becomes more prevalent at higher densities. The rule that the U.S. Environmental Protection Agency uses to minimize this effect is to record any MF coliform colony count greater than 80 on a single filter as "too numerous to count" or TNTC. Thus, we have coliform densities which are "indeterminate high" as well as coliform densities which are "indeterminate low."

Figure 1 is a cumulative lognormal frequency distribution (Hazen plot) for $\sigma_X$ of 30 and $\mu_X$ between $10^{-\ldots}$ and $\ldots$. The horizontal lines represent sample volumes which might be used for monitoring water distribution systems. Any density less than 1 per sample volume will be indeterminate, as will any density greater than about 80 per sample volume.
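Under the lognormal model, the share of samples expected to be indeterminate low or indeterminate high for a given sample volume follows from the normal distribution of ln X. The sketch below is illustrative: the parameter values are hypothetical, the 80-colony TNTC limit from the text is used as the upper cut-off, and it treats the density in the sample volume (not the Poisson colony count) as what is observed, which is a simplification.

```python
import math


def normal_cdf(z: float) -> float:
    """Standard normal distribution function via the error function."""
    return 0.5 * (1.0 + math.erf(z / math.sqrt(2.0)))


def indeterminate_fractions(gm_per_ml: float, gsd: float, volume_ml: float,
                            upper_count: float = 80.0) -> tuple[float, float]:
    """Expected fractions of samples whose density is below 1 per sample volume
    ("indeterminate low") or above `upper_count` per sample volume
    ("indeterminate high"), for lognormal densities with geometric mean
    `gm_per_ml` (cells per ml) and geometric standard deviation `gsd`."""
    mu_y, sigma_y = math.log(gm_per_ml), math.log(gsd)
    low = normal_cdf((math.log(1.0 / volume_ml) - mu_y) / sigma_y)
    high = 1.0 - normal_cdf((math.log(upper_count / volume_ml) - mu_y) / sigma_y)
    return low, high


# Hypothetical system: geometric mean 1 per 100 ml, geometric SD 30.
for volume in (50.0, 100.0, 200.0):
    low, high = indeterminate_fractions(0.01, 30.0, volume)
    print(f"{volume:.0f} ml: low {low:.3f}, high {high:.3f}")
```

Enlarging the sample volume shifts both detection limits downward, so fewer results are indeterminate low and more are indeterminate high.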

The points used to estimate the slope $\sigma_X$ are relatively close together, and $\mu_X$ is estimated from a rather long extrapolation. Thus, it is difficult to have much confidence in the estimates of the lognormal parameters or even in the selection of the lognormal as the frequency distribution.

Estimation of the arithmetic mean is a somewhat different problem from estimation of the lognormal parameters, and the estimate should be more precise. However, the value of interest is at the lower limit of detection, the variance of the densities is very large in relation to the mean, and most of the samples have indeterminate densities. The problem of estimating a mean value from indeterminate results has not been treated adequately in the statistical literature. All things considered, it might be wise to select some other parameter to characterize the monitoring results.

ESTIMATION OF FREQUENCY-OF-OCCURRENCE

The second microbiological MCL rule of the U.S. Drinking Water Regulations is an example of a frequency-of-occurrence type of rule. A coliform density is selected as a limit to distinguish between "contaminated" water and "uncontaminated" water. In the present U.S. Regulations the limit is set at an MF count of 4 per 100 ml. Then a fraction (5% of the samples examined in any month) is selected which is allowed to be positive or "contaminated".

Figure 1. Cumulative Lognormal Frequency Distribution (Hazen Plot) for Coliform Densities in Water Samples. [Hazen plot for GSD = 30, GM = .007; horizontal lines at 1 per 50 ml, 1 per 100 ml and 1 per 200 ml; 5% positive samples marked; axes: Percent of Samples with Coliforms versus Percent of Samples without Coliforms.]

The fraction positive is an estimator of the frequency-of-occurrence of coliform bacteria. The U.S. Environmental Protection Agency is considering the elimination of the first MCL rule for revised drinking water regulations. This would leave only the frequency-of-occurrence type of rule. If this change is adopted, it is likely that the limiting coliform density will be reduced from 4 per 100 ml to 1 per 100 ml, although the 5% fraction positive will probably be retained.

The adoption of this approach to microbiological monitoring of water distribution systems provides several practical advantages for sample examination and for parameter estimation. It is easier and cheaper to determine whether coliform bacteria are present in a sample of water than it is to determine how many coliform bacteria are present. The laboratory examination can be a simple broth fermentation test such as Clark's P-A test (Clark, 1969), and the reduced cost per sample can make feasible the examination of more samples. The appropriate frequency distribution for frequency-of-occurrence is the binomial, and the calculation of confidence limits is relatively simple. For instance, if 60 samples are examined and 3 of the 60 (5%) are positive, we can say that we are 95% confident that less than 10% of the water is "contaminated".
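How the upper confidence limit is computed matters at counts this small. In the sketch below (function names hypothetical), a one-sided normal approximation for 3 positives out of 60 gives about 9.6%, consistent with the 10% quoted above, whereas the exact one-sided binomial (Clopper-Pearson) bound, found here by bisection on the binomial distribution function, lies between 12% and 13%. Which method the authors used is not stated; the comparison is ours.

```python
import math


def binom_cdf(k: int, n: int, p: float) -> float:
    """P(X <= k) for X ~ Binomial(n, p)."""
    return sum(math.comb(n, i) * p ** i * (1 - p) ** (n - i) for i in range(k + 1))


def upper_bound_exact(x: int, n: int, conf: float = 0.95) -> float:
    """One-sided Clopper-Pearson upper bound: the p at which P(X <= x) = 1 - conf."""
    if x >= n:
        return 1.0
    lo, hi = x / n, 1.0
    for _ in range(100):  # bisection; binom_cdf decreases as p increases
        mid = (lo + hi) / 2
        if binom_cdf(x, n, mid) > 1 - conf:
            lo = mid
        else:
            hi = mid
    return (lo + hi) / 2


def upper_bound_normal(x: int, n: int, z: float = 1.645) -> float:
    """One-sided normal-approximation bound p_hat + z * sqrt(p_hat * (1 - p_hat) / n)."""
    p = x / n
    return p + z * math.sqrt(p * (1 - p) / n)


print(round(upper_bound_normal(3, 60), 3))  # 0.096
print(round(upper_bound_exact(3, 60), 3))
```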

On the basis of the studies done at Drexel, we have recommended that the minimum number of samples per month required for monitoring be 5. This would then give a total of 60 samples in a 12 month period. The 5% rule would allow 3 of the 60 samples to be positive in any 12 month period.
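The 3-of-60 allowance amounts to a rolling count over the most recent 12 months. A minimal sketch, assuming (our assumption) that positives are tallied by calendar month and that a violation is recorded in the first month in which more than 3 positives fall within the trailing 12 months; the function name and the monthly record are hypothetical.

```python
from typing import List, Optional


def first_violation_month(monthly_positives: List[int], window: int = 12,
                          allowed: int = 3) -> Optional[int]:
    """Index of the first month in which the positive samples counted over the
    trailing `window` months exceed `allowed`; None if that never happens."""
    for month in range(len(monthly_positives)):
        start = max(0, month - window + 1)
        if sum(monthly_positives[start:month + 1]) > allowed:
            return month
    return None


# Hypothetical record: positive samples found each month (5 samples per month).
record = [0, 1, 0, 0, 1, 0, 0, 0, 0, 1, 0, 0, 1, 0]
print(first_violation_month(record))  # 12: months 1-12 (0-based) hold 4 positives
```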

In all probability there would also be a limit of no more than one positive sample in any month, and any time the fourth positive sample turned up in any 12 month period, an MCL violation would be recorded without waiting until the end of the year or even the end of the month.

This approach to microbiological monitoring of small water systems brings up again the question of the persistence of microbiological water quality. Is it reasonable to try to characterize the microbiological quality of the water in a distribution system over a period of a year or even over a period of a month? At the present time, there is no good basis for answering that question.

This problem seems to be an interesting one for a time series analysis approach.

EXAMPLE - SYSTEM WH

An example of some of our studies on microbiological monitoring of water distribution systems is based on several samplings of Woodbury Heights, New Jersey. This system serves a population of 3,600 people. The water is supplied from a well and the only treatment is chlorination.

A summary of our sampling data for system WH is given in Table 2. Period I was two weeks in April 1979, Period II was two weeks in May 1979, Period III was two weeks in June 1981, Period IV was four weeks in August 1983, and Period V was four weeks in October 1983.

Table 2. Coliform Sampling Data for System WH

Sampling     Number of 100 ml    Fraction   Frequency-of-   MF Coliform Colony Count     Mean Coliform
Period       Samples             Positive   Occurrence                                   Density
             Total  Positive                (95% C.I.)      Total    Average  Variance   (95% C.I.)
9D           46     4            0.09                       45       0.98     13.84      <2.09
9E           90     4            0.04                       14       0.16     0.99       <0.36
Period I     136    8            0.06                       59       0.43     5.42       0.03-0.83
9F           126    31           0.25       0.17-0.32a      162      1.29     25.85      0.39-2.19a
9G           172    45           0.26       0.20-0.33       312      1.81     39.76      0.87-2.75
Period II    298    76           0.26       0.21-0.31       474      1.59     34.06      0.92-2.26
1G           168    10           0.06       0.02-0.10       >119     >0.71    >39.43     -
1H           174    28           0.16       0.11-0.21a      >417     >2.40    >117.79    -
Period III   342    38           0.11       0.08-0.14       >536     >1.57    >79.78     -
3E           55     1            0.02
3F           52     0            0
3G           35     0            0
3H           63     1            0.02
Period IV    205    2            0.01
3S           50     6            0.12       0.03-0.21       29       0.58
3T           49     1            0.02       <0.06           1        0.02
3V           45     2            0.04       <0.10           2        0.04
3X           60     1            0.02                       1        0.02
Period V     204    10           0.05                       33       0.16

a Significant change from preceding period (week or month).

Figure 2. Fractions Positive for the Isolated Areas for Sampling Period III Show a Significantly Lower Coliform Occurrence in the Central Area. [Map: isolated areas of the Woodbury Heights distribution system.]

The number of positive samples listed in column 3 is the number of samples for which the MF coliform colony count was 1 or more for a 100 ml sample. The fraction of the samples positive is given in column 4, and the frequency-of-occurrence, which is the 95% confidence interval for the fraction positive, is given in column 5. The total MF coliform colony counts for the positive samples are given in column 6. The > sign indicates that at least one of the samples was "TNTC". Those results were included as >80 per 100 ml in the calculation of the means and variances. The average MF coliform colony counts are given in column 7 and are used to estimate the mean coliform density of the water distribution system. When a >80 count occurs, the mean density is not estimated because of the inaccuracy due to high-range data truncation.
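The column 5 intervals that can be checked, for example weeks 9F and 9G, agree to two decimals with the usual normal-approximation (Wald) interval p̂ ± 1.96 sqrt(p̂(1 - p̂)/n). A minimal sketch follows; that the authors used exactly this formula for every entry is an assumption, and the function name is hypothetical.

```python
import math


def fraction_positive_ci(positive: int, total: int,
                         z: float = 1.96) -> tuple[float, float, float]:
    """Fraction positive with a normal-approximation (Wald) confidence interval,
    clipped to [0, 1]."""
    p = positive / total
    half_width = z * math.sqrt(p * (1.0 - p) / total)
    return p, max(0.0, p - half_width), min(1.0, p + half_width)


# Weeks 9F and 9G of Period II (Table 2).
for week, positive, total in [("9F", 31, 126), ("9G", 45, 172)]:
    p, lo, hi = fraction_positive_ci(positive, total)
    print(f"{week}: {p:.2f} ({lo:.2f}-{hi:.2f})")
```

With no positive samples the Wald interval collapses to zero width, so weeks without positives need a different bound, such as the exact binomial one.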

Table 3. Differences Among Isolated Areas of WH Distribution System

Sampling   Area         Number of 100 ml Samples   χ² for
Period                  Total   Positive           Contingency Table
I          East         42      4                  6.41
           Central      14      0
           Southwest    36      4
           North        44      0
II         East         74      16                 2.83
           Central      72      23
           Southwest    78      21
           North        74      16
III        East         78      8                  7.91a
           Central      74      2
           Southwest    84      12
           North        106     16
IV         East         33      0                  1.78
           Central      38      0
           Southwest    25      0
           North        109     2
V          East         91      4                  3.95
           Central      25      3
           Southwest    23      0
           North        65      3

a Significant differences at 5% level.

There are significant changes in the frequency-of-occurrence from week to week and from month to month. The most startling change over time was from Period I to Period II (also from week 9E to week 9F). This change was also detected as a significant difference in the mean density parameter. However, the microbiological water quality remained constant for at least four weeks at a time in 1983 and possibly for as long as four months (Periods IV and V).

The WH water distribution system was divided into isolated areas as shown in Figure 2.

Since the well and standpipe are both in the East area, the flow of water is always from East to Central and from Central to the North and Southwest areas. These isolated areas were produced by the fact that the community was divided by a turnpike, a railroad, a major highway, and a private school. The coliform data for each of the sampling periods are distributed among the isolated areas in Table 3. There was one period (Period III) with a significant difference among the isolated areas. This suggests that it may be desirable to monitor the isolated areas separately. This could be a rationale for requiring more samples per sampling period for larger water distribution systems.
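The χ² column of Table 3 is a Pearson test of homogeneity on the 2 x 4 table of positive and negative samples by area. The sketch below, with a hypothetical function name, reproduces the printed statistics for Periods I, II and IV and compares them with 7.815, the upper 5% point of χ² with 3 degrees of freedom. It assumes every area has samples and that at least one sample is positive, so no expected count is zero.

```python
def chi_square_homogeneity(totals: list[int], positives: list[int]) -> float:
    """Pearson chi-square for a 2 x k table of positive/negative samples by area."""
    n = sum(totals)
    p = sum(positives) / n  # pooled fraction positive under homogeneity
    stat = 0.0
    for tot, pos in zip(totals, positives):
        for observed, expected in ((pos, tot * p), (tot - pos, tot * (1 - p))):
            stat += (observed - expected) ** 2 / expected
    return stat


# Table 3, areas in the order East, Central, Southwest, North.
periods = {
    "I": ([42, 14, 36, 44], [4, 0, 4, 0]),
    "II": ([74, 72, 78, 74], [16, 23, 21, 16]),
    "IV": ([33, 38, 25, 109], [0, 0, 0, 2]),
}
CRITICAL_5PCT_DF3 = 7.815  # upper 5% point of chi-square with 3 degrees of freedom
for name, (tot, pos) in periods.items():
    stat = chi_square_homogeneity(tot, pos)
    print(name, round(stat, 2), stat > CRITICAL_5PCT_DF3)
```

Very small expected counts, such as the positives expected in Period IV, make the χ² approximation rough; an exact test would be a more cautious choice there.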


SUMMARY

Routine monitoring of water distribution systems for coliform bacteria for regulatory purposes has resulted in the recognition of several interesting statistical problems. A reasonably large body of statistical literature dealing with some of these problems has developed over the last 75 years, but there are still problems which need additional study.

One problem is the estimation of mean coliform density. Most of the sample results are indeterminate, either too high or too low for the available methods to measure accurately. A technique for estimation of an average from indeterminate values would probably be of great benefit in several different scientific fields but, short of that, it is probably better to use some parameter other than mean density for regulatory purposes.

The normal reporting period for microbiological drinking water quality is one month, but there is no scientific basis for using a month rather than a week or a year. The question of the persistence of microbiological water quality in a distribution system needs to be investigated further in relation to regulation and monitoring.

Under present regulations in the United States, the number of samples per month required for microbiological monitoring runs from 1 per month to 500 per month and increases with increasing size of the water distribution system. Sampling theory suggests that the number of samples needed is not related to the size of the system. There may be differences in microbiological water quality among different areas of a water system, and larger systems probably have greater heterogeneity. This greater heterogeneity may be a rationale for requiring more samples for larger systems. This also needs further study.

REFERENCES

Christian, R. R. and Pipes, W. O., 1983. Frequency Distribution of Coliforms in Water Distribution Systems. Appl. Environ. Microbiol., 45: 603-609.

Clark, H., 1969. The Detection of Various Bacteria Indicative of Water Pollution by a Presence-Absence (P-A) Procedure. Can. J. Microbiol., 15: 771-780.

El-Shaarawi, A. and Pipes, W. O., 1982. Enumeration and Statistical Inference, pp. 43-65. In: Bacterial Indicators of Pollution (W. O. Pipes, ed.), CRC Press, Boca Raton, FL.

Pipes, W. O. and Christian, R. R., 1982. Sampling Frequency - Microbiological Drinking Water Regulations. EPA 570/9-82-001, Office of Drinking Water, U.S. Environmental Protection Agency, Washington, DC.

MODELLING OF BACTERIAL POPULATIONS AND WATER QUALITY MONITORING IN DISTRIBUTION SYSTEMS

A. MAUL¹, A.H. EL-SHAARAWI² and J.C. BLOCK¹

¹Centre des Sciences de l'Environnement, University of Metz, France
²National Water Research Institute, Burlington, Ontario, Canada

ABSTRACT

Bacteriological surveys were performed on the drinking water distribution system of the city of Metz in France according to a systematic sampling design to determine the spatial and temporal distribution of heterotrophic bacteria in the network. A non-hierarchical nearest-centroid clustering method was used for dividing the water distribution system into zones corresponding to different levels of bacterial density. Since the frequency distributions of microorganisms within the zones could be modelled by the negative binomial distribution, the water distribution system studied may be considered as being composed of several heterogeneous subsystems. Information on the spatial and temporal variability of bacteriological data is used to develop a sampling design for use in future water quality monitoring. Under the assumption that the objective of monitoring is to determine whether or not the mean bacterial density of the water exceeds a specific standard, a criterion is given which determines the optimal number of sampling stations allocated to each zone in the case of a one-run sampling design. These stations are determined by assuming either that the risk of sampling (i.e., of making the wrong decision) is prespecified or that the total number of stations to be sampled is predetermined.

1. INTRODUCTION

Sampling programs designed to monitor or to study the bacterial density in water distribution systems usually involve

195 the

collection

locations

and

objective

of

of

a

over

number

an

public safe

bacterial

quality

specify

( i )

a

and

maximum of

( i i ) a threshold

v a l u e of

1976). the

period

basic

have

biological

been

counts

may

(Colwell 1981)

al.,

et

and

organisms/mL) Drinking

Water

plate

even

Regulations

Community

(EEC)

that

arithmetic

mean

the

should not Prior

exceed to

monitoring

Where

to

samples be erroneous the

fact, the

drinking

level.

guidelines

depends

(A),

on

(i)

( i i )

the

variability Pipes a

and

given

quality the

of

the

the

temporal

in

data al.,

et

Interim

500

Primary

the

European

water

states

(i.e.,

mean of

1982).

program

water

of

during

the

that

on

bacterial

both

in

Therefore,

6

of and

the

of

at

a

water

bacterial

programs the

water

( i i i )

(Esterby,

the

the

when,

monitoring

that

the

a ) and

risk

density

monitoring depends

two

monitoring the

bacteria

shows

nature,

8).

collected,

of

This for

largely

variation

controlling

samples

the

violating

risk,

risk,

bacterial

distribution

should

regulations

probability

true

number

is

producers'

of

the

samples?

is declaring that

microbiological

consist the

this

o€

first

consumers'

the

When

for

to

How many

( i i i )

the

program

answers

(i)

system

with

violated

the

water,

and

the

are

Christian,

of

methods

(i.e.,

sampling

the

(i.e.,

of

However,

sampling

be

bacterial

Means

drinking

required:

samples?

not

should

specific

a

possible:

true

quality

of

compliance

objective

water

to

quality

Further,

drinking

are

water

is

it

i s not

this

other

1978;

U.S.

for

EPA,

U.S.

continues

water

any sampling d e s i g n

are

declaring

basic

(1976).

of

the

For

the

when

is

second

collect

taken?

regulation

1978,

(Y) f o r h e t e r o t r o p h i c b a c t e r i a a t 2 0 ° C

quality

of

the

a

over a

averaged

limitation

in

preparation

decisions

quality

exceeding

coliforms) or/and

heterotrophic

al.,

regulation

following three questions ( i i )

samples

100 organisms/mL.

the

the

thus

potability,

et

proposed

Economic

most

Canada,

count

ensure

regulations

(e.g.

complementary

McFeters

total was

water

major

water,

enumeration

instance,

useful

1978;

a

of

Welfare

of

For

provide

or

various a

to

drinking

bacteria

and

test

advanced.

is

microbiology

although coliform

at

Since

t h e mean s a m p l e c o u n t

(Health

However,

time.

guidelines

indicator

samples

of

reliable

proportion

particular value

specified

water

period

health

bacteriologically water

of

extended

the 1982;

efficiency of

microbiological the

spatial

populations

in

and the

196 systems

sampled.

population

as

1976) is not setting a the

Moreover,

in

always advisable.

(El-Shaarawi Clearly, the

degree

this

of

of

an

primary

is

paper

aim of

of

assumption

of

water

distinct

a

in

structured

distribution

regions

of

bacterial

heterogeneity observed

that

need

the

determining to

6,

risk

different of

and

zones

sampled ( i i ) of

is

stations

which

be

in

a

w i l l

becomes a v a i l a b l e ,

presented

the

and

in

showing

by

a

underlying

composed

be

of

the

examined.

in this and

the

run

the

several negative

allocating

The

number a

the

of

stations

given

level

stations total

sequential

information

patterns

study a r e then used

for

Further,

new

this

spatial

d e n s i t y assuming the

prespecified. takes

of

in

sampling

populations

modelled

single

optimally

analysis

with

an o b j e c t i v e c l a s s i f i c a t i o n

location

bacterial

continually

2.

the

and the

being

(i.e.,

system i n t o zones w i l l

(i)

system

heterogeneity

as

the water

population.

setting

picture

in

is

water

correlated

particular,

of

for:

in

the

be

bacterial

d i s t r i b u t i o n ) by means o f

parameter

in

s i z e of

to

EPA,

the

water

spatial

system

(U.S.

bacteria

clear

In

samples

quality

statistical

a

system.

important

might

the

of

drinking water

factor

the

give

number

regulations

the

heterotrophic

heterogeneous

binominal

not

population

indirect

to

variation

the

and

the

heterogeneity

distribution

whole

the

The most

of

1985)

al.,

size

The

temporal water

patterns

represents

design. in

et

the

it

relating

water quality

sampling program t o monitor

dispersion

case

U.S.

the

into

of

to

the

number

sampling,

account

it

as

a l s o be d i s c u s s e d .

2. MATERIALS AND METHODS

2.1 Sampling stations and collection of water

The study was confined to the portion of the water distribution system of the city of Metz (France) served exclusively by water coming from the same treatment plant. This system covers the area from the northwestern to the southern end of the city, including all the districts of Metz and a few small enclaved and adjoining communities. Six surveys were performed from December 1983 to June 1984. During each survey, 102 samples were collected over a 3 hr period from sites spread over the study area according to a systematic sampling design (Figure 1). Samples were taken with sterile 500 mL glass bottles (containing sodium thiosulfate solution) from taps. Each tap was first flamed and then flushed at full pressure for as long as necessary to obtain a stabilized water temperature. The samples were transported to the laboratory at ambient temperature and processed within six hours from the beginning of each survey.

2.2 Sample processing

After the bottles were shaken manually, serial ten-fold dilutions of the original samples, down to 0.01, were made in sterile NaCl solution. Ten mL portions of the original samples and 1 mL portions of the dilutions were filtered through sterile 0.45 µm pore size membrane filters (Millipore, HAWG). The filters were deposited on R-2A low nutrient agar (Reasoner and Geldreich, 1985) and incubated at 20°C for 72 hours. The bacterial density was expressed as the number of heterotrophic bacteria contained in 1 mL of the original sample. It was calculated from the counts available in the combination of successive dilutions according to the maximum likelihood method (Maul et al., 1981).

2.3 Statistical analyses

Fitting probability distributions and parameter estimation. If the number of bacteria in the samples is assumed to follow a negative binomial distribution (Fisher, 1941), the probability of finding r organisms in a sample is given by

$$P(r) = \frac{(k + r - 1)!}{(k - 1)!\, r!}\, \frac{p^{r}}{(1 + p)^{k + r}}, \qquad r = 0, 1, 2, \ldots \qquad (1)$$

This distribution is specified by the two parameters p and k. The mean of the distribution is λ = pk. The maximum-likelihood estimate $\hat{k}$ of k satisfies the equation

$$\sum_{i=1}^{n} \sum_{j=0}^{r_i - 1} \frac{1}{\hat{k} + j} = n \ln\!\left(1 + \frac{\bar{r}}{\hat{k}}\right) \qquad (2)$$

where $\bar{r}$ is the arithmetic mean of the bacterial counts $r_1, \ldots, r_n$ from n independent samples. The maximum-likelihood estimate of p is $\hat{p} = \bar{r}/\hat{k}$.
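The following minimal Python sketch evaluates the probability (1) and solves equation (2) for $\hat{k}$ by Newton-Raphson iteration, the numerical method mentioned below. It is an illustration under stated assumptions, not the authors' program. The function names are hypothetical. The moment estimate $\bar{r}^2/(s^2 - \bar{r})$ is assumed as the starting value. Data sets whose variance (divisor n) does not exceed the mean are rejected, because for them no finite $\hat{k}$ exists.

```python
import math


def nb_pmf(r, p, k):
    """Probability of r organisms under eq. (1): mean p*k, variance p*k*(1 + p)."""
    log_prob = (math.lgamma(k + r) - math.lgamma(k) - math.lgamma(r + 1)
                + r * math.log(p) - (k + r) * math.log1p(p))
    return math.exp(log_prob)


def score(counts, k):
    """Left side minus right side of eq. (2); it vanishes at the ML estimate of k."""
    n = len(counts)
    mean = sum(counts) / n
    lhs = sum(sum(1.0 / (k + j) for j in range(r)) for r in counts)
    return lhs - n * math.log1p(mean / k)


def score_derivative(counts, k):
    """Derivative of score() with respect to k, needed by Newton-Raphson."""
    n = len(counts)
    mean = sum(counts) / n
    d_lhs = -sum(sum(1.0 / (k + j) ** 2 for j in range(r)) for r in counts)
    return d_lhs + n * mean / (k * (k + mean))


def fit_negative_binomial(counts, tol=1e-10, max_iter=100):
    """Return (p_hat, k_hat) by Newton-Raphson on eq. (2), with p_hat = mean / k_hat."""
    n = len(counts)
    mean = sum(counts) / n
    var = sum((r - mean) ** 2 for r in counts) / n
    if var <= mean:
        raise ValueError("counts are not overdispersed; k_hat would be infinite")
    k = mean ** 2 / (var - mean)  # moment estimate used as the starting value
    for _ in range(max_iter):
        new_k = k - score(counts, k) / score_derivative(counts, k)
        if new_k <= 0:
            new_k = k / 2  # crude safeguard that keeps k positive
        if abs(new_k - k) <= tol * k:
            return mean / new_k, new_k
        k = new_k
    raise RuntimeError("Newton-Raphson did not converge")
```

For example, `p_hat, k_hat = fit_negative_binomial([0, 1, 3, 0, 12, 5, 2, 40, 0, 7])` fits a small set of hypothetical counts. The inner sums cost time proportional to the counts themselves, which is acceptable for a sketch but slow for densities in the thousands.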

The chi-square goodness-of-fit statistic (Snedecor and Cochran, 1967) was used to test the adequacy of the negative binomial distribution for representing the data. Suppose there are m data sets, the ith set consisting of the counts $r_{i1}, r_{i2}, \ldots, r_{im_i}$ (i = 1, 2, ..., m), and each set is represented by a negative binomial distribution with parameters $p_i$ and $k_i$. The likelihood ratio test (Lindgren, 1968) can then be used to test the equality of the $k_i$'s, i.e., the null hypothesis $H_0: k_1 = k_2 = \cdots = k_m$. This requires estimating the common value $k_c$ of the $k_i$'s under the null hypothesis. The maximum-likelihood estimate of $k_c$ satisfies the equation

$$\sum_{i=1}^{m} \sum_{j=1}^{m_i} \sum_{l=0}^{r_{ij}-1} \frac{1}{\hat{k}_c + l} = \sum_{i=1}^{m} m_i \ln\!\left(1 + \frac{\bar{r}_i}{\hat{k}_c}\right) \qquad (3)$$

where $\bar{r}_i$ is the arithmetic mean of the ith data set. Equations (2) and (3) can be solved numerically on a computer by the Newton-Raphson method.

When the equality of the $k_i$'s is rejected, a subset of data sets with a common k can be determined as follows:
(i) arrange the $k_i$ values in ascending order;
(ii) starting with the two data sets that correspond to the two lowest values of $k_i$, test the equality of the two k's;
(iii) if the test is not significant at a prespecified level, add the data set with the next smallest $k_i$ and test the equality of the three k's;
(iv) continue this process as long as the test is not significant;
(v) once $H_0$ is rejected, stop the process.
The data sets accepted at the last non-significant step are considered to have a common k, whose estimate $k_c$ is obtained from equation (3).
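The likelihood ratio test and steps (i) to (v) can be sketched in Python as follows. This is a hedged illustration, not the authors' program. Two substitutions are made and labelled in the code. First, equation (3) is solved by bisection on log k instead of Newton-Raphson, because bisection is simpler to make robust; the bracket [1e-4, 1e5] is an assumption. Second, the chi-square tail probability comes from a small series routine, so that only the standard library is needed. Under $H_0$, each $p_i$ is set at its maximum-likelihood value $\bar{r}_i/k_c$. The statistic $2(\sum \ell_i(\hat{k}_i) - \sum \ell_i(k_c))$ is referred to $\chi^2$ with m - 1 degrees of freedom. All function names are hypothetical.

```python
import math


def _score(sets, k):
    """Eq. (3) as left side minus right side (eq. (2) when len(sets) == 1)."""
    total = 0.0
    for counts in sets:
        mean = sum(counts) / len(counts)
        total += sum(sum(1.0 / (k + l) for l in range(r)) for r in counts)
        total -= len(counts) * math.log1p(mean / k)
    return total


def common_k(sets, lo=1e-4, hi=1e5, iters=60):
    """ML estimate of a common k by bisection on log(k) (substitute for Newton-Raphson)."""
    if _score(sets, hi) >= 0:
        raise ValueError("no overdispersion within the bracket: k is not finite")
    for _ in range(iters):
        mid = math.sqrt(lo * hi)
        if _score(sets, mid) > 0:
            lo = mid
        else:
            hi = mid
    return math.sqrt(lo * hi)


def loglik(counts, k):
    """Negative binomial log-likelihood of one set, with p at its ML value mean/k."""
    p = sum(counts) / len(counts) / k  # assumes a positive mean
    return sum(math.lgamma(k + r) - math.lgamma(k) - math.lgamma(r + 1)
               + r * math.log(p) - (k + r) * math.log1p(p) for r in counts)


def chi2_sf(x, df):
    """Upper chi-square tail from the series for the regularized lower gamma
    function; adequate for moderate x (roughly x < 1000)."""
    if x <= 0:
        return 1.0
    a, h = df / 2.0, x / 2.0
    term = total = 1.0 / a
    n = 1
    while term > total * 1e-15:
        term *= h / (a + n)
        total += term
        n += 1
    return max(0.0, 1.0 - total * math.exp(a * math.log(h) - h - math.lgamma(a)))


def lr_test_equal_k(sets):
    """Likelihood ratio test of H0: k_1 = ... = k_m; returns (G, p_value, k_c)."""
    k_c = common_k(sets)
    g = 2.0 * sum(loglik(s, common_k([s])) - loglik(s, k_c) for s in sets)
    return g, chi2_sf(g, len(sets) - 1), k_c


def homogeneous_subset(sets, level=0.05):
    """Steps (i)-(v): add sets in ascending order of k_i while H0 is not rejected."""
    order = sorted(range(len(sets)), key=lambda i: common_k([sets[i]]))
    accepted = order[:1]
    for i in order[1:]:
        _, p_value, _ = lr_test_equal_k([sets[j] for j in accepted + [i]])
        if p_value < level:
            break  # step (v): stop at the first rejection
        accepted = accepted + [i]
    return accepted, common_k([sets[j] for j in accepted])
```

Each evaluation of `_score` loops over every count, so with densities in the thousands a digamma-based formulation would be preferable. The sketch keeps the explicit sums of equation (3) for transparency.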

Correlation analysis. The Kendall rank correlation coefficient (τ) was calculated to study how reproducible the spatial variability was between surveys.
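The correlation analysis compares, for each pair of surveys, the ranks of the station densities. A minimal Python sketch follows. It assumes the tau-b variant, which allows ties; the text does not say which variant was used. For the p-value, it assumes the usual large-sample normal approximation with the no-ties variance $2(2n+5)/(9n(n-1))$. Function names are hypothetical.

```python
import math
from itertools import combinations
from statistics import NormalDist


def kendall_tau_b(x, y):
    """Kendall's tau-b between two equally long sequences (ties allowed)."""
    n0 = len(x) * (len(x) - 1) // 2
    concordant = discordant = ties_x = ties_y = 0
    for i, j in combinations(range(len(x)), 2):
        dx, dy = x[i] - x[j], y[i] - y[j]
        ties_x += int(dx == 0)
        ties_y += int(dy == 0)
        if dx * dy > 0:
            concordant += 1
        elif dx * dy < 0:
            discordant += 1
    return (concordant - discordant) / math.sqrt((n0 - ties_x) * (n0 - ties_y))


def tau_p_value(tau, n):
    """Two-sided large-sample p-value (no tie correction in the variance)."""
    z = tau / math.sqrt(2.0 * (2 * n + 5) / (9.0 * n * (n - 1)))
    return 2.0 * (1.0 - NormalDist().cdf(abs(z)))
```

With two lists holding the densities of the same stations in two surveys, `kendall_tau_b(survey_a, survey_b)` gives τ, and `tau_p_value(tau, len(survey_a))` gives its approximate significance.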

Clustering method. A non-hierarchical nearest-centroid clustering method was used to divide the stations of the water distribution system into sets. When these sets are shown on the map of the distribution system, they will be called zones. The clustering method is given in detail in Anderson et al. (1984).
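The exact algorithm of Anderson et al. (1984) is not reproduced here. As a hedged illustration of the general idea of nearest-centroid clustering, the Python sketch below uses plain Lloyd (k-means style) iterations. Each station is described by a hypothetical profile, for example its log10 densities over the surveys. The code assigns every station to its nearest centroid and recomputes the centroids until the assignment no longer changes. The deterministic starting centroids are an assumption. `explained_fraction` computes one common definition of explained variability, 1 - within/total sum of squares, which may differ from the R reported by the authors.

```python
def squared_distance(a, b):
    return sum((u - v) ** 2 for u, v in zip(a, b))


def nearest_centroid_clustering(points, n_clusters, max_iter=100):
    """Lloyd-style nearest-centroid clustering of station profiles."""
    # Deterministic start (an assumption): stations spread evenly in the
    # ordering by their overall level.
    order = sorted(range(len(points)), key=lambda i: sum(points[i]))
    step = len(points) / n_clusters
    centroids = [list(points[order[int((c + 0.5) * step)]]) for c in range(n_clusters)]
    labels = None
    for _ in range(max_iter):
        new_labels = [min(range(n_clusters), key=lambda c: squared_distance(p, centroids[c]))
                      for p in points]
        if new_labels == labels:
            break  # assignments are stable: converged
        labels = new_labels
        for c in range(n_clusters):
            members = [p for p, lab in zip(points, labels) if lab == c]
            if members:  # keep the old centroid if a cluster becomes empty
                centroids[c] = [sum(col) / len(members) for col in zip(*members)]
    return labels, centroids


def explained_fraction(points, labels, centroids):
    """Share of the total sum of squares explained by the partition."""
    grand = [sum(col) / len(points) for col in zip(*points)]
    total = sum(squared_distance(p, grand) for p in points)
    within = sum(squared_distance(p, centroids[lab]) for p, lab in zip(points, labels))
    return 1.0 - within / total
```

Lloyd iterations only reach a local optimum. In practice, several starting configurations would be tried and the partition with the largest explained fraction kept.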

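Sampling strategy. Suppose a system is modelled by the negative binomial distribution with parameters p and k (λ = pk). The probability ($P_A$) of accepting that the water in the network is in a state of control (e.g., not exceeding the 100 bacteria/mL EEC standard) is then given as

$$P_A = \sum_{r=0}^{100n} \frac{(nk + r - 1)!}{(nk - 1)!\, r!}\, \frac{p^{r}}{(1 + p)^{nk + r}} \qquad (4)$$

which is a function of the number of samples collected, n. Equation (4) follows from two facts. The total of n independent counts is negative binomial with parameters p and nk. The mean count does not exceed 100 exactly when this total does not exceed 100n. El-Shaarawi et al. (1985) approximated $P_A$ as

$$P_A = \Phi\!\left(\frac{(100 - \lambda)\sqrt{n}}{\sqrt{\lambda\,(1 + \lambda/k)}}\right) \qquad (5)$$

where Φ(z) is the area under the standard normal distribution from −∞ to z.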
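Equations (4) and (5) can be compared numerically with the short Python sketch below. It sums the negative binomial probabilities of the total count exactly, using log-gamma to avoid overflowing factorials, and evaluates the normal approximation with `statistics.NormalDist`. The function names and the `limit` parameter, which defaults to the 100 bacteria/mL standard, are illustrative assumptions.

```python
import math
from statistics import NormalDist


def prob_accept_exact(lam, k, n, limit=100.0):
    """Eq. (4): P(total of n counts <= limit * n); the total is negative
    binomial with the same p = lam / k and shape n * k."""
    p = lam / k
    shape = n * k
    log_p, log1p_p = math.log(p), math.log1p(p)
    base = math.lgamma(shape)
    total = 0.0
    for r in range(int(limit * n) + 1):
        total += math.exp(math.lgamma(shape + r) - base - math.lgamma(r + 1)
                          + r * log_p - (shape + r) * log1p_p)
    return total


def prob_accept_normal(lam, k, n, limit=100.0):
    """Eq. (5): normal approximation of eq. (4)."""
    z = (limit - lam) * math.sqrt(n) / math.sqrt(lam * (1.0 + lam / k))
    return NormalDist().cdf(z)
```

For strongly overdispersed counts (small k), the total is very skewed, so (4) and (5) can differ noticeably for small n. For large k, the two agree closely.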

Hence, when $P_A$ is set at a prespecified level β, the estimated number of samples (n) to be collected is given as

$$n = \frac{\lambda\, z_\beta^{2}\, (1 + \lambda/k)}{(100 - \lambda)^{2}} \qquad (6)$$

where $z_\beta$ is the standard normal variable corresponding to the probability level β. Formula (6) shows that n is inversely related to k. Conversely, equations (4) or (5) can be used to calculate the risk β associated with a given number of samples, n. More generally, a system can be divided into ℓ zones, with the negative binomial distribution a suitable model for the dispersion of bacteria in each zone. The above formula then permits determining the number of samples to be allotted to each zone. If β is fixed for all the zones, the number ($n_i$) of samples needed from the ith zone can be calculated directly from equation (6).
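Formula (6) is a one-line computation. The Python sketch below applies it with λ = 200 and β = 0.05 to the k values listed for the four zones in Table 3. The normal quantile comes from `statistics.NormalDist().inv_cdf`. The function name is hypothetical.

```python
from statistics import NormalDist


def samples_needed(lam, k, beta, limit=100.0):
    """Eq. (6): number of samples for which P_A equals beta at mean density lam."""
    z = NormalDist().inv_cdf(beta)
    return lam * z ** 2 * (1.0 + lam / k) / (limit - lam) ** 2


# k values as listed for the four zones of the Metz system in Table 3
for zone, k in enumerate([0.43927, 0.50514, 0.51945, 7.01250], start=1):
    print(zone, round(samples_needed(200.0, k, 0.05), 2))
```

The results should agree with the n column of Table 3 to within about 0.01; small differences come from how $z_{0.05}$ is rounded. In practice the result would be rounded up to an integer, e.g. with `math.ceil`.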

However, for administrative purposes or because of monetary constraints, the total number of samples (N) to be collected might be fixed. The problem then consists of determining how to divide N among the zones. The optimal allocation is obtained from

$$n_i = N\, \frac{1 + \lambda/k_i}{\sum_{j=1}^{\ell} (1 + \lambda/k_j)} \qquad (7)$$

where $k_i$ is the parameter associated with zone i. Substituting $n_i$ from (7) into (5) gives the same value for every zone, so this allocation equalizes the risk over the zones. The risk β corresponding to such a design is given as

$$\beta = \Phi\!\left(\frac{(100 - \lambda)\sqrt{N}}{\sqrt{\lambda \sum_{i=1}^{\ell} (1 + \lambda/k_i)}}\right) \qquad (8)$$
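Equations (7) and (8) can be checked with the Python sketch below. It allocates N = 100 samples among the four zones of Table 3 at λ = 200 and computes the common risk of the resulting design. Function names are hypothetical.

```python
import math
from statistics import NormalDist


def optimal_allocation(total_samples, lam, ks):
    """Eq. (7): split N samples among zones in proportion to 1 + lam / k_i."""
    weights = [1.0 + lam / k for k in ks]
    s = sum(weights)
    return [total_samples * w / s for w in weights]


def design_risk(total_samples, lam, ks, limit=100.0):
    """Eq. (8): common risk beta of every zone under the allocation of eq. (7)."""
    s = sum(1.0 + lam / k for k in ks)
    z = (limit - lam) * math.sqrt(total_samples) / math.sqrt(lam * s)
    return NormalDist().cdf(z)


ks = [0.43927, 0.50514, 0.51945, 7.01250]  # zone values from Table 3
print([round(n, 2) for n in optimal_allocation(100, 200.0, ks)])
print(round(design_risk(100, 200.0, ks), 4))
```

The allocation should reproduce the last column of Table 3, and the risk should come out close to the β = 0.0235 quoted with that column.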

Sequential sampling. If the results of a monitoring program are to be reported at specific time intervals (e.g., monthly), sequential sampling can be performed to determine whether the system complies with the regulations. For a system modelled by the negative binomial distribution, sequential sampling allows a decision on the true mean bacterial density (λ). The choice is between two opposing hypotheses about λ:

$$H_0: \lambda = \lambda_0 \qquad \text{versus} \qquad H_1: \lambda = \lambda_1 \quad (\lambda_0 < \lambda_1)$$

where $\lambda_0$ and $\lambda_1$ can be chosen, for instance, as 100 and 200 respectively. Rejection of $H_0$ gives a strong indication that the regulation is violated, while acceptance of $H_0$ indicates that a violation is unlikely. Let α be the risk of accepting $H_1$ when $H_0$ is actually true (producers' risk), and β the risk of accepting $H_0$ when $H_1$ is actually true (consumers' risk). Wald (1947) showed that, with $R_n$ the likelihood ratio of the accumulated data:
- $H_0$ is accepted as soon as $R_n$ is less than β/(1 - α);
- $H_1$ is accepted as soon as $R_n$ is larger than (1 - β)/α;
- otherwise, sampling is continued.
In the negative binomial case, this is equivalent to continuing sampling as long as $A(n) < T_n < B(n)$. Here $T_n$ is the cumulative number of bacteria found in the first n samples, and A(n) and B(n) are given as

$$A(n) = \frac{\ln\left[\beta/(1-\alpha)\right] + nk \ln\left[(1 + p_1)/(1 + p_0)\right]}{\ln\left[p_1 (1 + p_0) / \left(p_0 (1 + p_1)\right)\right]}, \qquad B(n) = \frac{\ln\left[(1-\beta)/\alpha\right] + nk \ln\left[(1 + p_1)/(1 + p_0)\right]}{\ln\left[p_1 (1 + p_0) / \left(p_0 (1 + p_1)\right)\right]} \qquad (9)$$

where $p_0 = \lambda_0/k$ and $p_1 = \lambda_1/k$. The sampling process has to be continued until $T_n$ falls outside the interval determined by A(n) and B(n).

FIG. 2 Observed cumulative frequency distributions for the Metz water distribution system data. [Figure: cumulative frequency (0 to 1.0) against the number of bacteria per mL (log scale), one curve per survey.]
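The two boundary lines of equation (9) are straight lines in n with a common slope. The Python sketch below computes them. It rewrites the denominator $\ln[p_1(1+p_0)/(p_0(1+p_1))]$ as $\ln(1 + k/\lambda_0) - \ln(1 + k/\lambda_1)$. The two forms are algebraically identical, but the second stays accurate when p is large, as it is here. The function name and default arguments (λ0 = 100, λ1 = 200, α = β = 0.05) are illustrative assumptions.

```python
import math


def sprt_boundaries(n, k, lam0=100.0, lam1=200.0, alpha=0.05, beta=0.05):
    """Eq. (9): acceptance line A(n) and rejection line B(n) for T_n."""
    p0, p1 = lam0 / k, lam1 / k
    # ln[p1 (1 + p0) / (p0 (1 + p1))], written with log1p for accuracy when p is large
    c = math.log1p(1.0 / p0) - math.log1p(1.0 / p1)
    drift = n * k * (math.log1p(p1) - math.log1p(p0))
    a_n = (math.log(beta / (1.0 - alpha)) + drift) / c
    b_n = (math.log((1.0 - beta) / alpha) + drift) / c
    return a_n, b_n
```

With k = 0.43927, both lines rise by roughly 139 bacteria per additional sample. This slope lies between $\lambda_0$ and $\lambda_1$, and the lines are about ±1345 apart from the line through the origin when α = β.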

3. RESULTS AND APPLICATIONS

3.1 Characterization of the spatial and the temporal variability

The empirical cumulative distribution functions of the bacterial density for the six surveys (i.e., December 13, January 10, February 21, April 3, May 15 and June 26) are given in Figure 2. The chi-square goodness-of-fit test assumed that the bacterial density over the whole water distribution system follows a negative binomial distribution. It was significant at the 1 percent level for four of the six surveys, which shows that the negative binomial distribution is unsuitable for describing the data of the whole system.

The degree of spatial reproducibility of the bacterial density pattern was assessed by the Kendall rank correlation coefficient (τ), calculated for each pair of surveys. All the τ's were significant at the 1 percent level, reflecting a stable spatial pattern of bacterial density in the distribution system. It is therefore appropriate to divide the water distribution system into zones on the basis of the data from the six surveys. Applying the clustering method to all the data adequately divided the water system into four zones, as shown in Figure 1. The variability explained by this clustering (R) represented more than 80 percent of the total variation. Zone 1 is the zone of lowest bacterial density. It includes 53 of the 102 stations sampled and hence covers most of the area of the network.

TABLE 1. Number of stations, mean, estimates of the parameters of the negative binomial distribution and goodness-of-fit statistic for each combination of survey and zone.

                         Number of              Parameters of the negative     Goodness-of-fit
                         stations   Mean, r̄     binomial distribution          statistic
Date      Zone           n                      p            k                 d.f.    χ²
Dec. 13   1              52           36.81       105.54      0.34877          3       8.60*
          2              31          668.23      1224.24      0.54583          1       0.54
          3              14          561.64      1072.43      0.52371          -       -
          4               3         3820.00        53.21     71.78516          -       -
Jan. 10   1              53           24.21        68.53      0.35324          3      10.53*
          2              31          852.65      2676.59      0.31856          1       2.79
          3              15          233.20       448.11      0.52041          -       -
          4               3         9420.00        10.78    874.22991          -       -
Feb. 21   1              52           44.08       127.08      0.34683          3       4.88
          2              31          627.94      1547.44      0.40579          1       4.39*
          3              14          537.58      1206.38      0.44561          -       -
          4               3        13433.33       487.67     27.54602          -       -
Apr. 03   1              52           35.92        66.95      0.53659          3       5.22
          2              31          284.64       308.46      0.92280          1       2.27
          3              15          787.39      2266.19      0.34745          -       -
          4               3         7833.33      1090.56      7.18284          -       -
May 15    1              51           87.10       128.98      0.67527          3      11.08*
          2              30          446.70       468.96      0.95253          1       2.22
          3              14          577.72       626.41      0.92227          -       -
          4               3         2669.67        45.20     59.05907          -       -
June 26   1              52          197.60       420.58      0.46982          3       2.31
          2              31         3168.90      2387.71      1.32717          1       1.84
          3              15        29119.99      3305.08      8.81069          -       -
          4               3         7633.33      1088.53      7.01250          -       -

* Value is significant at the 5% level.
** Value is significant at the 1% level.

FIG. 3 Temporal variation of the bacterial density for each zone.

The zones of higher density consisted primarily of stations from the northwestern and eastern ends of the city.

Table 1 presents, for each combination of survey and zone, the number of stations n, the mean $\bar{r}$ (the estimate of λ), and the estimates p and k of the negative binomial distribution. It also gives the chi-square goodness-of-fit statistic (χ²), calculated for zones 1 and 2 only. None of the χ² values was significant at the 1 percent level. This indicates that the negative binomial distribution can be taken as a suitable model for describing the heterotrophic bacterial counts within each zone.

The temporal variation of the bacterial density in the various zones is illustrated in Figure 3. The graph clearly shows the interaction between the zones and time. The trajectories of the zone centroids do not indicate a general time effect, and the zones seem to behave independently of each other. In other words, the zones are not uniformly affected by the temporal variability of the bacterial population. However, the increase in mean bacterial density in the last survey must be emphasized.

3.2 Sampling strategy for future data collection

The available information can now be used to design future data collection.

One-run sampling. Table 2 presents the maximum-likelihood estimate of k for each combination of zone and survey, together with:
- the maximum-likelihood estimate of the common value $k_c$ for each zone;
- the significance of the test for the homogeneity of the $k_i$'s.
The homogeneity of the $k_i$'s was accepted for zone 1 only. For zones 2 and 3, the value of k was estimated from the subset of surveys selected by the procedure described in section 2.3. For zone 4, the $k_i$'s differ too much, so the lowest value (7.01250) was chosen to characterize the zone. Figure 4 shows the number of samples to be collected as a function of k for different levels of the bacterial density λ and of the risk β. With λ set at 200 and β at 0.05, formula (6) gives the number of samples n to be assigned to each zone (Table 3). Table 3 also presents the optimal allocation of 100 samples among the zones obtained from formula (7), whose risk is β = 0.0235.

TABLE 2. Maximum-likelihood estimate of k for each combination of zone and survey, maximum-likelihood estimate of k_c for each zone and significance of the test for the homogeneity of the k_i's.

Zone   Survey    k_i          k_c         Test for the homogeneity of the k_i's (significance)
1      Dec 13     0.34877      0.43927    P = 0.100
       Jan 10     0.35324
       Feb 21     0.34683
       Apr 3      0.53659
       May 15     0.67527
       Jun 26     0.46982
2      Dec 13     0.54583      0.65823    P < 0.001
       Jan 10     0.31856
       Feb 21     0.40579
       Apr 3      0.92280
       May 15     0.95253
       Jun 26     1.32717
3      Dec 13     0.52371      0.84629
       Jan 10     0.52041
       Feb 21     0.44561
       Apr 3      0.34745
       May 15     0.92227
       Jun 26     8.81069
4      Dec 13    71.78516     41.54600    P < 0.001
       Jan 10   874.22991
       Feb 21    27.54602
       Apr 3      7.18284
       May 15    59.05907
       Jun 26     7.01250

FIG. 4 Number of samples to be collected as a function of k, assuming different levels for the bacterial density λ (curves for λ = 200 and λ = 150) and the risk β. [Figure: number of samples (0 to 180) against k.]

TABLE 3. Optimal allocation of sampling stations to the water distribution system of the city of Metz.

Zone   k          Number of sampling stations   n (λ = 200, β = 0.05)   Optimal allocation of N = 100 (β = 0.0235)
1      0.43927    52                            24.70                   35.96
2      0.50514    31                            21.48                   31.28
3      0.51945    15                            20.89                   30.42
4      7.01250     3                             1.59                    2.33

The efficiency of the 1983-84 sampling design can be evaluated by comparing the values of n and of the optimal allocation with the numbers of stations actually sampled. Consider the goal of detecting, with probability 0.95 (i.e., β = 0.05), that λ has reached 200. Table 3 shows that this would require fewer stations in the first two zones than were sampled in 1983-84, but more stations in zone 3. Under the optimal allocation of 100 samples, the number of stations in zone 3 increases from 15 to about 30.

3.3 Sequential sampling

Sequential sampling was not used in the 1983-84 study. The method is illustrated here for zone 1, with data taken randomly from the data of the corresponding surveys. Figure 5 displays the curves A(n) and B(n) obtained from formula (9), with both α and β set at 0.05, λ0 = 100, λ1 = 200 and the common estimate $k_c$ = 0.43927. The figure also shows the path of the cumulative bacterial counts ($T_n$) for the December 13 and June 26 surveys. Sampling was continued until $T_n$ fell outside the region delimited by the two curves. For the December 13 survey data, compliance with the regulation could be stated from the 11th sample onwards, while for the June 26 survey data a violation of the standard was observed. The same procedure, applied to the data of the January 10, February 21, April 3 and May 15 surveys, showed that the regulation had been met.
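The decision rule used in this illustration can be written as a short loop over the counts, taken in the order they are collected. The Python sketch below re-implements the boundaries of formula (9) so that it is self-contained. It returns the decision and the sample number at which the decision is reached. The function name and the constant counts in the usage example are hypothetical, not the Metz data.

```python
import math


def sequential_decision(counts, k, lam0=100.0, lam1=200.0, alpha=0.05, beta=0.05):
    """Walk through counts in collection order and stop as soon as the
    cumulative count T_n leaves the band (A(n), B(n)) of eq. (9)."""
    c = math.log1p(k / lam0) - math.log1p(k / lam1)
    per_sample = k * (math.log1p(lam1 / k) - math.log1p(lam0 / k))
    lower = math.log(beta / (1.0 - alpha))
    upper = math.log((1.0 - beta) / alpha)
    t_n = 0.0
    for n, r in enumerate(counts, start=1):
        t_n += r
        if t_n <= (lower + n * per_sample) / c:
            return "comply", n    # H0 accepted: lambda = lam0 is favoured
        if t_n >= (upper + n * per_sample) / c:
            return "violate", n   # H1 accepted: lambda = lam1 is favoured
    return "continue", len(counts)  # no decision yet: keep sampling
```

For example, `sequential_decision([10] * 20, 0.43927)` returns `("comply", 11)` for this hypothetical run of low counts. A run of constant counts of 400 is declared a violation at the 6th sample.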

4. DISCUSSION AND CONCLUSION

The goodness-of-fit tests calculated from the data of the six surveys showed that the negative binomial model is unsuitable for representing the bacteriological data of the whole system. However, the goodness-of-fit tests summarized in Table 1 clearly show that the negative binomial distribution fits the data well within zones.

FIG. 5 Sequential sampling illustrated for zone 1 using the data for the December 13 and June 26 surveys. [Figure: cumulative count T_n and the boundaries A(n) and B(n) against the number of samples.]

The Metz water distribution system thus appears to be composed of different zones. The bacteria of each zone are modelled by a negative binomial distribution with its own level of bacterial density. In other words, the network can be considered a heterogeneous population structured in several subsystems. This spatial heterogeneity of the heterotrophic bacteria is probably partly related to physical and chemical characteristics of the network, such as the age of the pipes, the chlorine residuals and the hydraulic conditions. As a result, bacterial densities may be higher in peripheral locations far from the treatment plant than in locations close to it. The heterogeneity may also reflect different stages of ecological succession of the bacterial habitat within the network (Means et al., 1981). Understanding the bacterial heterogeneity in this way helps to improve the design of future monitoring programs.

The zonation of the Metz network remained relatively stable during all six surveys, which may make it easier to readjust the sampling design from month to month. The value of k for zone 1 was shown not to vary significantly from month to month. The same number of samples can therefore be collected from this zone every month while keeping the risk at the specified level. For the other zones, the discrepancies between the monthly values of k reflect a temporal variation in the dispersion of the bacteria. Since n is inversely related to k, the number of samples should then be fixed from a small value of k, so that the maximum risk is controlled at the specified level. The effect of the dispersion (i.e., the parameter k) on the number of samples to be collected must therefore be emphasized.

The zonation of a network may also help to identify the problem areas where the regulation is violated, and thus to determine remedial action. For instance, with the 100 bacteria/mL standard, the regulation was violated during all six surveys in zones 2, 3 and 4, and only during the last survey in zone 1.

Sequential sampling offers another way to monitor the level of bacterial density in a system. In addition to k, the procedure takes into account the concentration of bacteria in the samples after each collection. This may avoid a strenuous sampling program, such as one based on a large predetermined number of samples collected in a single run. Moreover, sequential sampling may be more efficient than the one-run design, especially when the true mean bacterial density is far above or below the standard. Sequential sampling considers both the consumers' risk (β) and the producers' risk (α). The graphical procedure illustrated in Figure 5 allows the water to be classified on-line into different classes of bacterial concentration. Nevertheless, if the true mean bacterial density lies somewhere between λ0 and λ1, the cumulative count $T_n$ may remain between A(n) and B(n) even for a large sample size, and further samples would then have to be analysed.

The examples given in this paper use heterotrophic bacteria and a specific bacteriological regulation. However, the sampling designs presented can easily be adapted to other bacteriological regulations, or even to other water quality data described by the negative binomial distribution.

5. ACKNOWLEDGEMENTS

The authors thank S.R. Esterby (Canada Centre for Inland Waters), who kindly made available to us a computer program to perform the clustering method.

6. LITERATURE CITED

Anderson, J.E., El-Shaarawi, A.H., Esterby, S.R. and Unny, T.E., 1984. Dissolved oxygen concentrations in Lake Erie (U.S.A.-Canada) 1. Study of spatial and temporal variability using cluster and regression analysis. J. Hydrol., 72: 209-229.

Colwell, R.R., Austin, B. and Wan, L., 1978. Public health considerations of the microbiology of "potable" water. In: Evaluation of the microbiology standards for drinking water (C.W. Hendricks, ed.), U.S. Environmental Protection Agency, Washington, D.C., pp. 65-75.

El-Shaarawi, A.H., Block, J.C. and Maul, A., 1985. The use of historical data for estimating the number of samples required for monitoring drinking water. Sci. Tot. Environ., 42: 289-295.

Esterby, S.R., 1982. Fitting probability distributions to bacteriological data: considerations for surveys and for water quality guidelines. J. Fr. Hydrol., 13: 189-203.

Fisher, R.A., 1941. The negative binomial distribution. Ann. Eugen., 11: 182-187.

Health and Welfare Canada, 1978. Guidelines for Canadian drinking water quality. Canadian Government Publishing Centre, Supply and Services Canada, Hull, Quebec.

Lindgren, B.W., 1968. Statistical theory, 2nd ed. Collier-Macmillan, London.

Maul, A., Dollard, M.A. and Block, J.C., 1981. Application du principe du maximum de vraisemblance au titrage bactérien sur milieu gélosé (N.P.P.). J. Fr. Hydrol., 12: 245-254.

McFeters, G.A., Shillinger, J.E. and Stuart, D.G., 1978. Alternative indicators of water contamination and some physiological characteristics of heterotrophic bacteria in water. In: Evaluation of the microbiology standards for drinking water (C.W. Hendricks, ed.), U.S. Environmental Protection Agency, Washington, D.C., pp. 37-48.

Means, E.G., Hanami, L., Ridgway, H.F. and Olson, B.H., 1981. Evaluating mediums and plating techniques for enumerating bacteria in water distribution systems. J. Am. Water Works Assoc., 73: 585-590.

Pipes, W.O. and Christian, R.R., 1982. Sampling frequency - microbiological drinking water regulations - final report. U.S. Environmental Protection Agency, EPA R-805-63719-82-001, Washington, D.C.

Reasoner, D.J. and Geldreich, E.E., 1985. A new medium for the enumeration and subculture of bacteria from potable water. Appl. Environ. Microbiol., 49: 1-7.

Snedecor, G.W. and Cochran, W.G., 1967. Statistical methods, 6th ed. Iowa State University Press, Ames, Iowa.

U.S. Environmental Protection Agency, Office of Drinking Water, 1976. National interim primary drinking water regulations. U.S. Environmental Protection Agency, EPA-570/9-76-003, Washington, D.C.

Wald, A., 1947. Sequential analysis. Wiley, New York.

A GOODNESS-OF-FIT TEST FOR THE NEGATIVE BINOMIAL DISTRIBUTION APPLICABLE TO LARGE SETS OF SMALL SAMPLES

BARBARA HELLER, Illinois Institute of Technology, Mathematics Department

In microbiological work, bacterial counts are frequently obtained serially in time or in space. If there are replicates, they are few in number. Even if we assume that the same probability model can be used for the whole set of counts, the parameter values might vary from one point in time to the next.

Our object is to devise a goodness-of-fit test for a given probability model that takes into account the effect of varying parameters and small sample sizes.

If we assume a Poisson model, then the index of dispersion statistic $D^2$ is available for testing goodness-of-fit. Suppose that we have a sequence of M sets of replicates of size n, where n is small and M is large. Using the property that the Poisson mean is equal to its variance, we compute for each sample i = 1, 2, ..., M the statistic $D_i^2$, the ratio of those two sample moments multiplied by n - 1. Under the null hypothesis, each $D_i^2$ has, asymptotically, a $\chi^2$ distribution with n - 1 degrees of freedom. We exploit the fact that M is large, even though n is small. We consider the frequency distribution of the set $\{D_i^2\}$ and compare it with the $\chi^2_{(n-1)}$ distribution using, e.g., the chi-squared goodness-of-fit test.
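The Poisson version of the test can be sketched in Python as follows. This is a minimal illustration, not Heller's code. It assumes that $D_i^2$ is the sum of squared deviations divided by the mean, i.e. (n - 1) times the variance-to-mean ratio. Samples that are entirely zero are assumed to be skipped, because their ratio is undefined. Class boundaries are assumed to be supplied by the user. The $\chi^2$ distribution function is computed from a series, so only the standard library is needed. All names are hypothetical.

```python
import math


def chi2_cdf(x, df):
    """Chi-square CDF via the series for the regularized lower gamma function."""
    if x <= 0:
        return 0.0
    a, h = df / 2.0, x / 2.0
    term = total = 1.0 / a
    n = 1
    while term > total * 1e-15:
        term *= h / (a + n)
        total += term
        n += 1
    return min(1.0, total * math.exp(a * math.log(h) - h - math.lgamma(a)))


def dispersion_index(sample):
    """D^2 = (n - 1) s^2 / mean = sum((x - mean)^2) / mean for one small sample."""
    mean = sum(sample) / len(sample)
    return sum((x - mean) ** 2 for x in sample) / mean


def dispersion_gof(samples, edges):
    """Pearson chi-squared comparison of the set {D_i^2} with chi-square(n - 1).
    `edges` are class boundaries starting at 0 and ending with math.inf."""
    n = len(samples[0])
    values = [dispersion_index(s) for s in samples if sum(s) > 0]  # skip all-zero samples
    observed = [0] * (len(edges) - 1)
    for v in values:
        for j in range(len(edges) - 1):
            if edges[j] <= v < edges[j + 1]:
                observed[j] += 1
                break
    cdf = [chi2_cdf(e, n - 1) if math.isfinite(e) else 1.0 for e in edges]
    expected = [len(values) * (cdf[j + 1] - cdf[j]) for j in range(len(edges) - 1)]
    stat = sum((o - e) ** 2 / e for o, e in zip(observed, expected))
    return stat, len(edges) - 2  # statistic and its degrees of freedom (classes - 1)
```

The returned statistic is then referred to a $\chi^2$ distribution with the returned degrees of freedom. With large M, the classes can be chosen so that each expected frequency is reasonably large.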

Here we consider the case where the assumed model is negative binomial; see El-Shaarawi, Esterby and Dutka (1981), Christian and Pipes (1983), and Pipes and Christian (1984). As in the case of the Poisson distribution, we devise a sample statistic based on a characterizing property that involves sample moments. We then take advantage of large M by combining the individual sample statistics into one test statistic. However, here we have the added complication of two unknown parameters. When the sample size n is small, estimating unknown parameters directly is difficult because the estimators have high variance, especially for moments higher than the first. By working conditionally on the sample mean, we can avoid estimating one of the parameters, but the other one remains a problem.

We have data of the following form. Let $\{X_{i1}, X_{i2}, \ldots, X_{in}\}$, i = 1, 2, ..., M, represent M samples of size n. For i ≠ i', we assume that the sets $\{X_{ij}\}$ and $\{X_{i'j}\}$ are mutually independent. The random variables have the same type of probability distribution but possibly different parameter values. Consider the underlying negative binomial distribution

$$P(X = x) = \binom{x + r - 1}{x}\, p^{r} q^{x}, \qquad q = 1 - p, \qquad x = 0, 1, 2, \ldots \qquad (1)$$

In Lukacs (1963) and Heller (1985b), it is shown that the negative binomial and Poisson pair of distributions is characterized by the zero regression, on the sample mean (equivalently, on the sample total L), of a statistic T. For a sample $X_1, X_2, \ldots, X_n$,

$$T = nL_4 - [n - (n-4)L]\,L_3 + (3-2n)\,L_2^2 + [(n+1)L + L^2]\,L_2 - L^3, \qquad (2)$$

where $L_k = \sum_{j=1}^{n} X_j^k$ and $L = L_1 = X_1 + X_2 + \cdots + X_n$.
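To make formula (2) concrete, the Python sketch below evaluates T for a single sample and checks the zero-regression property $E(T \mid L) = 0$ exactly. It assumes that $L_k$ denotes the power sum $\sum_j X_j^k$ and $L = L_1$. The check enumerates every split of a fixed total L among the n replicates. Under the Poisson model the split is multinomial with equal cell probabilities, and under the negative binomial model with common parameter r it follows the corresponding negative hypergeometric (Dirichlet-multinomial) law. All function names are illustrative.

```python
from fractions import Fraction
from math import factorial

def t_statistic(xs):
    """Statistic T of formula (2); L_k = sum of k-th powers, L = L_1."""
    n = len(xs)
    L = sum(xs)
    L2 = sum(x ** 2 for x in xs)
    L3 = sum(x ** 3 for x in xs)
    L4 = sum(x ** 4 for x in xs)
    return (n * L4 - (n - (n - 4) * L) * L3 + (3 - 2 * n) * L2 ** 2
            + ((n + 1) * L + L ** 2) * L2 - L ** 3)

def compositions(total, parts):
    """All ordered tuples of `parts` non-negative integers summing to `total`."""
    if parts == 1:
        yield (total,)
        return
    for first in range(total + 1):
        for rest in compositions(total - first, parts - 1):
            yield (first,) + rest

def poisson_weight(xs):
    # Poisson split given the total: proportional to 1 / prod(x_j!)
    w = Fraction(1)
    for x in xs:
        w /= factorial(x)
    return w

def nb_weight(xs, r):
    # Negative binomial split given the total: proportional to prod C(r + x_j - 1, x_j)
    w = Fraction(1)
    for x in xs:
        for k in range(x):
            w *= Fraction(r + k, k + 1)
    return w

def conditional_mean_T(n, L, weight):
    """Exact E(T | total = L) for n i.i.d. replicates with the given split weights."""
    num = den = Fraction(0)
    for xs in compositions(L, n):
        w = weight(xs)
        num += w * t_statistic(xs)
        den += w
    return num / den
```

For example, `conditional_mean_T(3, 4, poisson_weight)` and `conditional_mean_T(3, 4, lambda xs: nb_weight(xs, 2))` both return exactly 0, although T itself takes the values 6, -16 and 4 for the samples (3,1,0), (2,2,0) and (2,1,1). For n = 2 the statistic vanishes identically, which is consistent with the requirement n > 2 stated below.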

As given in Heller (1985a), we construct a test statistic based upon the statistic T. If X is either negative binomial or Poisson, $E(T \mid L) = 0$. The statistic T does not depend upon any unknown parameters.

Normalization is accomplished by utilizing the conditional variance of T given L. (See Gart (1974) for conditional tests involving the Poisson distribution.) From Heller (1985c) we have two formulas for the conditional variance. Put $V = \mathrm{Var}(T \mid L)$ if the underlying distribution is Poisson. Put $W(r) = \mathrm{Var}(T \mid L)$ if the underlying distribution is negative binomial. In the Poisson case, V does not depend upon any unknown parameters, but in the negative binomial case, W depends upon the parameter r. As $r \to \infty$, the statistic W(r) approaches the statistic V. In Heller (1985a), two test statistics are constructed, representing two ways of dealing with the unknown parameter in W(r).

In each case, the null hypothesis is that the underlying distribution is either negative binomial or Poisson. Also in each case, the test statistic has, approximately, a t-distribution with M-1 degrees of freedom under the null hypothesis. Using formulas (1) and (2) (and the formulas for V and W), we compute for each sample

$$Y_{1i} = T_i / V_i^{1/2}, \qquad Y_{2i} = T_i / W_i^{1/2}.$$

Then, for each of the above sets $\{Y_{1i}\}$ and $\{Y_{2i}\}$, put $\bar{Y}$ equal to the sample mean and $S^2$ equal to the sample variance (according to the usual formulas). For the first statistic, A, we use the set $\{Y_{1i}\}$. The normalization is exact if the underlying distribution is Poisson and approximate if it is negative binomial. Put

$$A = (\bar{Y}_1 / S_1)\, M^{1/2}.$$

For the second statistic, C(R), we use the set $\{Y_{2i}\}$ and approximate the unknown parameters $\{r_i\}$ with one "central" value R. Put

$$C(R) = (\bar{Y}_2(R) / S_2(R))\, M^{1/2},$$

where R is chosen such that $S_2^2(R)$ has a value close to 1 (see Heller (1985a)). In Heller (1985a), Monte Carlo studies of tests A and C(R) indicate that significance levels are satisfactory for $M \geq 10$ and $n \leq 10$. Also, we must have $n > 2$.
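The construction of A and C(R) can be sketched in Python as follows. Heller's closed-form expressions for V and W(r) (Heller, 1985c) are not reproduced here. As a stand-in, this sketch obtains $\mathrm{Var}(T \mid L)$ numerically by enumerating every split of L among the n replicates: multinomial for the Poisson case, and negative hypergeometric for the negative binomial case with parameter r. The enumeration is practical only for small totals L. The grid of candidate R values is an arbitrary illustrative choice. Samples whose conditional variance is zero (for n = 3 this happens when L is 3 or less, because T cannot vary) are skipped. That skipping is an assumption of the sketch and is not part of the published procedure.

```python
import math
from fractions import Fraction
from math import factorial

def t_statistic(xs):
    """T of formula (2), with L_k the sum of k-th powers and L = L_1."""
    n, L = len(xs), sum(xs)
    L2, L3, L4 = (sum(x ** k for x in xs) for k in (2, 3, 4))
    return (n * L4 - (n - (n - 4) * L) * L3 + (3 - 2 * n) * L2 ** 2
            + ((n + 1) * L + L ** 2) * L2 - L ** 3)

def compositions(total, parts):
    """All ordered splits of `total` into `parts` non-negative integers."""
    if parts == 1:
        yield (total,)
        return
    for first in range(total + 1):
        for rest in compositions(total - first, parts - 1):
            yield (first,) + rest

def split_weight(xs, r=None):
    """Unnormalized P(split | total): Poisson if r is None, else negative binomial r."""
    w = Fraction(1)
    for x in xs:
        if r is None:
            w /= factorial(x)
        else:
            for k in range(x):
                w *= Fraction(r + k, k + 1)  # builds C(r + x - 1, x); r must be int or Fraction
    return w

def conditional_variance(n, L, r=None):
    """V = Var(T | L) when r is None, W(r) otherwise; uses E(T | L) = 0."""
    num = den = Fraction(0)
    for xs in compositions(L, n):
        w = split_weight(xs, r)
        num += w * t_statistic(xs) ** 2
        den += w
    return num / den

def t_type_statistic(ys):
    """Return (mean / S * sqrt(M), S^2) with the usual M - 1 divisor."""
    M = len(ys)
    mean = sum(ys) / M
    s2 = sum((y - mean) ** 2 for y in ys) / (M - 1)
    return mean / math.sqrt(s2) * math.sqrt(M), s2

def normalized_values(samples, r=None):
    """Y_i = T_i / sqrt(Var(T | L_i)); zero-variance samples are skipped (assumption)."""
    ys = []
    for xs in samples:
        var = conditional_variance(len(xs), sum(xs), r)
        if var > 0:
            ys.append(t_statistic(xs) / math.sqrt(var))
    return ys

def statistic_A(samples):
    return t_type_statistic(normalized_values(samples))[0]

def statistic_C(samples, r_grid):
    """Return (R, C(R), S_2^2(R)) for the R in r_grid with S_2^2(R) closest to 1."""
    best = None
    for R in r_grid:
        C, s2 = t_type_statistic(normalized_values(samples, R))
        if best is None or abs(s2 - 1) < abs(best[2] - 1):
            best = (R, C, s2)
    return best
```

Because the conditional variances are exact fractions, the limiting behaviour can be seen directly. For n = 3 and L = 4 the sketch gives W(1) = 344/5, W(2) = 512/7 and V = 224/3, so W(r) increases towards V as r grows.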

If the null hypothesis is not rejected, we can distinguish between negative binomial and Poisson data by using the index of dispersion test.

We consider an example of coliform counts (MF) which were obtained from the U.S. Environmental Protection Agency (A.E. McDaniels, 1984). The data consist of several series of replicate counts obtained from a study on samples which were collected from a chlorinated municipal distribution system. The samples were split, held at 4 °C and 20 °C, and analyzed by the standard total coliform method at 0, 2, 6, 24, 30, 48, and 72 hours. Most analyses were made in triplicate, but there were also a good many sets of 6 replicates. One set of 12 samples was analyzed as above for naturally occurring coliforms. Another set of 12 samples was dosed with a pure coliform culture.

(A discussion of this example may also be found in Haas and Heller (1985).) We perform the negative binomial/Poisson goodness-of-fit test on 4 sets of data: coliform counts for "natural" samples with n = 6 and n = 3, and coliform counts for "dosed" samples with n = 6 and n = 3.

We consider first the case where n = 6. For coliform counts from dosed samples, there were 68 sets of 6 replicates each. For natural samples, there were 54 such sets. Results are given in Tables 1 and 2. In Table 1, the significance of the A and C values is to be assessed using the t-distribution with M-1 degrees of freedom. We also note that if the data were Poisson, statistic $S_1$ would be close to 1.00 and statistic R would be large. In Table 2, the $D^2$ frequencies, f, are to be compared with expected frequencies, $\hat{f}$, calculated from the $\chi^2$ distribution with 5 degrees of freedom.
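The $(f-\hat f)^2/\hat f$ columns of Tables 2 and 4 are ordinary Pearson contributions. The short Python check below uses the observed and expected frequencies exactly as printed for the dosed samples with n = 6 and the natural samples with n = 3. Because the printed expected frequencies are rounded, the recomputed contributions and totals agree with the tabulated ones only to within rounding. The helper name is ours.

```python
def pearson_contributions(observed, expected):
    """Per-class Pearson contributions (f - f_hat)^2 / f_hat and their total."""
    parts = [(f - e) ** 2 / e for f, e in zip(observed, expected)]
    return parts, sum(parts)

# Observed and expected frequencies as printed (expected values are rounded).
dosed_n6 = ([5, 15, 12, 12, 8, 6, 10],
            [6.48, 15.44, 15.37, 11.70, 7.82, 4.83, 6.34])   # Table 2, M = 68
natural_n3 = ([55, 27, 13, 11, 11],
              [63.7, 29.0, 13.2, 6.0, 5.1])                 # Table 4, M = 117

for label, (f, e) in [("dosed, n = 6", dosed_n6), ("natural, n = 3", natural_n3)]:
    parts, total = pearson_contributions(f, e)
    print(label, [round(p, 2) for p in parts], round(total, 2))
```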

We consider the $D^2$ results first. Evidently we reject the Poisson model for the natural samples and do not reject it for the dosed. Now, looking at the A and C test results for the natural samples, we see that a negative binomial model is not rejected. Therefore, we are led to the conclusion that the Poisson model is appropriate for the dosed samples and the negative binomial for the natural ones in this experiment. Looking more closely at the A and C tests, we see corroborative evidence for our conclusions. For the dosed samples: (i) the A and C values are close to each other, (ii) $S_1$ is not far from 1, and (iii) R is not small. These are all commensurate with the Poisson model. For the natural samples:

Table 1. Negative binomial goodness-of-fit test results for dosed and natural samples, n = 6.

| Samples | M | A | C | $S_1$ | $S_2$ | R |
|---|---|---|---|---|---|---|
| Dosed | 68 | -1.40 | -1.26 | 1.834 | 1.000 | 12.7 |
| Natural | 54 | -0.898 | 0.903 | 11.726 | 1.000 | 8.94 |

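Table 2. $D^2$ frequencies, n = 6.

| Interval | Dosed: obs. freq. f | Dosed: exp. freq. $\hat f$ | Dosed: $(f-\hat f)^2/\hat f$ | Natural*: obs. freq. f | Natural*: exp. freq. $\hat f$ |
|---|---|---|---|---|---|
| (0, 1.6) | 5 | 6.48 | 0.34 | 1 | 5.15 |
| (1.6, 3.1) | 15 | 15.44 | 0.01 | 4 | 12.26 |
| (3.1, 4.7) | 12 | 15.37 | 0.74 | 6 | 12.20 |
| (4.7, 6.3) | 12 | 11.70 | 0.01 | 9 | 9.29 |
| (6.3, 7.9) | 8 | 7.82 | 0.00 | 8 | 6.21 |
| (7.9, 9.4) | 6 | 4.83 | 0.28 | 4 | 3.84 |
| > 9.4 | 10 | 6.34 | 2.11 | 22 | 5.04 |
| Total | 68 | 67.98 | 3.49 | 54 | 53.99 |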

Table 3. Negative binomial goodness-of-fit test results for dosed and natural samples, n = 3.

| Samples | M | A | C | $S_1$ | $S_2$ | R |
|---|---|---|---|---|---|---|
| Dosed | 147 | -0.956 | -0.805 | 1.112 | 1.00 | 127.0 |
| Natural | 117 | 0.113 | 0.661 | 1.796 | 1.00 | 21.0 |

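Table 4. $D^2$ frequencies, n = 3.

| Interval | Dosed: obs. freq. f | Dosed: exp. freq. $\hat f$ | Dosed: $(f-\hat f)^2/\hat f$ | Natural*: obs. freq. f | Natural*: exp. freq. $\hat f$ | Natural*: $(f-\hat f)^2/\hat f$ |
|---|---|---|---|---|---|---|
| (0, 1.6) | 71 | 80.00 | 1.01 | 55 | 63.7 | 1.20 |
| (1.6, 3.1) | 39 | 36.46 | 0.18 | 27 | 29.0 | 0.14 |
| (3.1, 4.7) | 18 | 16.62 | 0.11 | 13 | 13.2 | 0.00 |
| (4.7, 6.3) | 9 | 7.57 | 0.27 | 11 | 6.0 | 4.17 |
| > 6.3 | 10 | 6.34 | 2.11 | 11 | 5.1 | 6.83 |
| Total | 147 | 146.99 | 3.68 | 117 | 117.0 | 12.34 |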

(i) A and C are both not significant but are not close to each other; (ii) $S_1$ is appreciably larger than 1. These are commensurate with the negative binomial model.

We perform a similar analysis on the triplicate data (Tables 3 and 4). Here there were 147 sets of dosed triplicates and 117 sets of natural triplicates. We see that, just as in the case where n = 6, the Poisson hypothesis is not rejected for the dosed samples. For the natural samples, the Poisson model is rejected and the negative binomial is not rejected; but the Poisson model is not as far off the mark as it was for n = 6. This illustrates the loss in discriminatory power (for both tests) with n = 3 as compared with n = 6.

In conclusion, we note various alternative distributions to the negative binomial model which are in common use. On the one hand, if the data derive from a Neyman Type A or Poisson-with-added-zeros distribution, the expected value of statistic T is negative. Therefore we expect the test statistics A and C to be "too" negative. On the other hand, if the data are from the logarithmic-with-zeros distribution, the statistics A and C will have positive expected value and we expect to see values which are "too" positive. (See El-Shaarawi (1985) for a discussion of some of these alternatives.) A detailed discussion of the power of this goodness-of-fit test for the negative binomial distribution with respect to the above alternatives can be found in Heller (1985a).
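The sign claim for Poisson-with-added-zeros data can be checked numerically. The Python sketch below computes the unconditional expectation of T for n = 3 under a zero-inflated Poisson model and, for comparison, under a plain Poisson model. The parameters are arbitrary illustrative choices: probability 0.3 of a structural zero, Poisson mean 2, and sums truncated at 25 per replicate. The Neyman Type A and logarithmic-with-zeros alternatives are not checked here.

```python
import math

def t_statistic(xs):
    """T of formula (2), with L_k the sum of k-th powers and L = L_1."""
    n, L = len(xs), sum(xs)
    L2, L3, L4 = (sum(x ** k for x in xs) for k in (2, 3, 4))
    return (n * L4 - (n - (n - 4) * L) * L3 + (3 - 2 * n) * L2 ** 2
            + ((n + 1) * L + L ** 2) * L2 - L ** 3)

def zip_pmf(x, lam, pi0):
    """Poisson with added zeros: extra mass pi0 at 0, otherwise Poisson(lam)."""
    poisson = math.exp(-lam) * lam ** x / math.factorial(x)
    return (pi0 if x == 0 else 0.0) + (1 - pi0) * poisson

def expected_T(pmf, xmax=25):
    """E(T) for n = 3 i.i.d. replicates, summing over x_j <= xmax."""
    probs = [pmf(x) for x in range(xmax + 1)]
    total = 0.0
    for a in range(xmax + 1):
        for b in range(xmax + 1):
            for c in range(xmax + 1):
                total += probs[a] * probs[b] * probs[c] * t_statistic((a, b, c))
    return total

print(expected_T(lambda x: zip_pmf(x, 2.0, 0.3)))  # about -14.11
print(expected_T(lambda x: zip_pmf(x, 2.0, 0.0)))  # plain Poisson: essentially zero
```

The negative value arises only from samples in which exactly two of the three replicates come from the Poisson part. Samples with one such replicate give T = 0 identically, and samples with all three have conditional mean zero.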

REFERENCES

Christian, R.R. and Pipes, W.O., 1983. Frequency distribution of coliforms in water distribution systems. Applied and Environmental Microbiology, 45: 603-609.
El-Shaarawi, A.H., Esterby, S.R. and Dutka, B.J., 1981. Bacterial density in water determined by Poisson or negative binomial distributions. Applied and Environmental Microbiology, 41: 107-116.
El-Shaarawi, A.H., 1985. Some goodness-of-fit methods for the Poisson plus added zeros distribution. Applied and Environmental Microbiology, 49: 1304-1306.
Gart, J.J., 1974. The Poisson distribution: the theory and application of some conditional tests. In: G.P. Patil et al. (Editors), Statistical Distributions in Scientific Work 2, pp. 125-140.
Haas, C. and Heller, B., 1985. Statistics of enumerating total coliforms in water samples by membrane filter procedures. Water Research (in press).
Heller, B., 1985a. A new negative binomial goodness-of-fit test based upon characterization by zero regression; useful for sequences of small samples (submitted).
Heller, B., 1985b. Characterization of the negative binomial distribution by regression properties; re-examination of a statistic due to Lukacs (submitted).
Heller, B., 1985c. Computation of certain conditional variances relating to the Poisson and negative binomial distributions with the aid of MACSYMA (submitted).
Lukacs, E., 1963. Characterization problems for discrete distributions. In: G. Patil (Editor), Proc. International Symposium on Classical and Contagious Discrete Distributions, pp. 65-73. Pergamon Press, Oxford, New York.
McDaniels, A.E., 1984. Personal communication.
McDaniels, A.E. and Bordner, R.H., 1983. Effects of holding time and temperature on coliform numbers in drinking water. Journal of the American Water Works Association, 75: 458-463.
Pipes, W.O. and Christian, R.R., 1984. Estimating mean coliform densities of water distribution systems. Journal of the American Water Works Association, pp. 60-64.

REPORTING BACTERIOLOGICAL COUNTS FROM WATER SAMPLES: HOW GOOD IS THE INFORMATION FROM AN INDIVIDUAL SAMPLE?

HILARY E. TILLETT
Communicable Disease Surveillance Centre, Public Health Laboratory Service, 61 Colindale Avenue, London NW9 5EQ

ABSTRACT
In assessing the microbiological quality of water the statistician has two main responsibilities: firstly, to assess correctly the information from each individual sample; secondly, to advise on sampling schemes for efficient and realistic monitoring. This paper is concerned with the first problem.

In Britain, bacteriological examination of drinking and recreational waters is often carried out using the multiple dilution ("multiple tube") method or, if not, by the membrane filtration technique. Evidence from quality control trials, in which replicate simulated specimens were issued to volunteer laboratories, shows that coliform counts from membrane filtration tended to be lower than the intended result. The multiple tube method was more sensitive in detecting the bacteria in waters with low contamination. Membrane filtration gives a precise count, whereas the multiple tube method gives an estimated count which should be qualified by a range of probable counts.

Published tables of most probable numbers (MPN) of bacteria use exponential approximations which require the assumption that the water examined comes from a large body of homogeneous water. Some MPNs have been recalculated without making any such assumption, using occupancy theory. It is suggested that, in situations where there are close contenders for the title MPN, a range of probable numbers should be quoted rather than a single MPN. If the bacteriological content of the water is being compared with a Standard, should the whole of the "probable range" of counts then pass this Standard? The bacteriological result from a single water sample should be reported with care. It should be made clear that it represents that place at that time only. It gives no information about likely ranges of counts at the water source, except in the unlikely situation that the sample comes from a homogeneous body of water.

INTRODUCTION
In England and Wales, responsibility for routine testing of water samples for micro-organisms is shared by the Public Health Laboratory Service (PHLS) and regional water authorities.

Since the United Kingdom joined the European Economic Community (EEC), discussions have led to the introduction of Standards for, amongst other things, drinking and bathing waters. The pre-existing Standards used for water supplied for drinking have not had to be altered in this country. They include requirements that sampling be frequent, that "satisfactory" samples should contain no Escherichia coli organisms, that no consecutive samples should contain any coliform organisms, and that no individual sample should contain more than three coliforms per 100 ml (DOE, 1983). EEC directives on bathing waters require a minimum sampling frequency of once a fortnight, and the guidelines include that coliform organisms should not exceed 500 and faecal coliforms should not exceed 100 per 100 ml. Individual countries are allowed to set levels other than these, but they should be within certain limits (European Communities, 1976). The results from routine sampling should be studied for time trends, but it is also clear that each individual sample needs to meet certain criteria. Therefore the statistician needs to advise not only on sampling strategy and time series analysis, but also on the interpretation of bacterial results from individual samples.

This paper is concerned with bacterial counts from individual samples and considers whether the laboratory method used to obtain the count should influence the interpretation of that count.

METHODS
Routine water samples are investigated for total coliform organisms and for E. coli. In the PHLS two methods predominate: the membrane filtration technique and the multiple tube (dilution series) method.

(i) The membrane filtration technique yields a count of viable organisms (selected by media, temperature and culture conditions) in the volume filtered. If the water sample comes from a homogeneous body of water, then the count can be taken as an estimate of the bacterial density in that body and, using the Poisson distribution, of its variance. If the sample comes from a non-homogeneous water, then the count represents that sample site at that time only.

(ii) The multiple tube method yields an estimated count (traditionally the most probable number, MPN) in the volume examined. For standard dilution series, MPNs and accompanying confidence intervals (which apply to homogeneous waters) can be obtained from published tables (DOE, 1983; APHA, 1975) or, for nonstandard series, by computer program (Hurley and Roscoe, 1983). If the water sample comes from a non-homogeneous water, then actual ranges of probable numbers can be calculated using occupancy theory (Tillett and Coleman, 1985).

A homogeneous body of water is one in which the bacteria are distributed with random variation only. Such a criterion is unlikely to be met in recreational waters or pre-treatment drinking waters. Perhaps random variation ought not to be assumed in treated waters either, since the aim of sampling is to look for evidence of a breakdown in treatment resulting in an influx of bacteria.

If water samples are assumed to come from water sources with potentially non-random variation (non-homogeneous), then the membrane filtration count cannot be qualified by a confidence interval, and therefore it is clear whether or not the count exceeds a Standard. But the MPN is an estimated count and there may be a range of other likely counts. This paper illustrates such examples and suggests that there should be more discussion on selecting the appropriate range for comparison with a Standard. Then a comparison will be made between membrane filtration and multiple tube results in some multi-laboratory quality control trials, with regard to detecting the presence of coliforms and to counting their numbers. The question will be asked whether either the results or the Standards need to be adjusted to take into account the laboratory method used.

RESULTS

1. Precision of multiple tube method

If V is a volume of water in which n bacteria are randomly distributed, and if a subportion, v, is inoculated and incubated in a tube of culture medium, then the probability, p, of no growth is $p = (1 - v/V)^n$; if $v/V$ is very small, then approximately $p = e^{-vn/V}$.

This approximation has been made for published tables of MPNs and confidence intervals (DOE, 1983; APHA, 1975) and for a computer program (Hurley and Roscoe, 1983). It is valid when a very large sample is collected and mixed for examination of a very small subsample, or where the sample examined comes from a large homogeneous source.
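A quick numerical comparison shows when the exponential approximation is adequate. The Python sketch below evaluates the exact probability of no growth, $(1 - v/V)^n$, and the approximation $e^{-vn/V}$. It takes n = 10 bacteria (an arbitrary illustrative value) and three ratios v/V: a tiny subsample from a large mixed volume, and the 10 ml and 50 ml tubes of a 105 ml sample split into a dilution series.

```python
import math

def p_no_growth_exact(n, v, V):
    """Probability that none of n randomly placed bacteria fall in a v ml portion of V ml."""
    return (1 - v / V) ** n

def p_no_growth_poisson(n, v, V):
    """Exponential (Poisson) approximation e^(-v n / V); adequate when v / V is small."""
    return math.exp(-v * n / V)

n = 10
for v, V in [(1, 10_000), (10, 105), (50, 105)]:
    print(v, V, round(p_no_growth_exact(n, v, V), 4), round(p_no_growth_poisson(n, v, V), 4))
```

For the tiny subsample the two values agree to several decimal places. For a 50 ml tube taken from 105 ml they differ by a factor of about five, which is why exact conditional probabilities are needed when the whole of a small sample is examined.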

If a relatively small sample is collected from a non-homogeneous water source, then probable numbers of bacteria can be calculated as follows:

If m test tubes with equal volumes of the sample contain n bacteria distributed at random, then the probability that (m - j) tubes will be sterile is obtained using occupancy theory (David and Barton, 1962):

$$p(j \text{ occupied} \mid n \text{ bacteria}) = \frac{1}{m^n}\,\frac{m!}{(m-j)!}\,S_{j,n},$$

where $S_{j,n}$ is Stirling's number of the second kind, with initial conditions $p(1 \mid 1) = 1$, $p(0 \mid n) = 0$ for all n, and $p(j \mid n) = 0$ for $j > n$.
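The occupancy formula translates directly into code for m tubes of equal volume. In the Python sketch below, the Stirling numbers of the second kind are computed from their standard explicit formula. The function names are illustrative. Dilution series with tubes of unequal volume need the more general treatment of Tillett and Coleman (1985).

```python
from math import comb, factorial

def stirling2(n, j):
    """Stirling number of the second kind: ways to split n labelled items into j non-empty groups."""
    # Explicit formula: (1/j!) * sum_{i=0}^{j} (-1)^i C(j, i) (j - i)^n
    return sum((-1) ** i * comb(j, i) * (j - i) ** n for i in range(j + 1)) // factorial(j)

def p_occupied(j, n, m):
    """P(exactly j of m equal-volume tubes receive at least one of n bacteria)."""
    if j > n or j > m:
        return 0.0
    return factorial(m) // factorial(m - j) * stirling2(n, j) / m ** n

# Distribution of the number of positive tubes when 7 bacteria fall into 5 equal tubes
print([round(p_occupied(j, 7, 5), 4) for j in range(6)])
```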

Probabilities can be developed for different dilution series, as shown by Tillett and Coleman (1985), where the particular 11-tube series 1 × 50 ml : 5 × 10 ml : 5 × 1 ml is explored and conditional probabilities are tabulated in detail. The most probable numbers of bacteria obtained from these exact conditional probabilities are very close to those obtained from the Poisson approximation. What is apparent is that there are situations in which a single MPN is inappropriate. Figure 1 shows conditional probabilities associated with a single positive reaction in this 11-tube series.

If the reaction is in a 1 ml tube, then it is very unlikely that there are two or more bacteria present in the 105 ml sample examined. If the reaction is, as is most likely, in the 50 ml tube, then the estimate of n is not so clear-cut. Figure 2 shows conditional probabilities associated with a sample giving (1,5,1) positive reactions. Clearly there are many close contenders for the title "MPN". In fact, all values of n from 30 to 42 have a probability at least 95% of the maximum probability (the most probable range), and values of n from 12 to 120 have a probability at least 10% of the maximum (the "possible" range). Most probable ranges for selected results for this dilution series are shown in the final column of Table 1.

TABLE 1

11-Tube Dilution Series 1 × 50 : 5 × 10 : 5 × 1 ml. Selected combinations (i, j, k) of positive reactions which are the most likely given the presence of n bacteria per 100 ml, and the most probable numbers of bacteria for these combinations when n is unknown.

| (i, j, k) | Values of n for which this combination is most likely | Most probable numbers, i.e. quoted result when (i, j, k) is observed* |
|---|---|---|
| (0, 1, 0) | 1† | 1 |
| (1, 0, 0) | 1† | 1 |
| (1, 1, 0) | 2-3 | 2 |
| (1, 2, 0) | 4-6 | 4-5 |
| (1, 3, 0) | 7-10 | 7-9 |
| (1, 4, 0) | 10-17 | 11-14 |
| (1, 5, 0) | 18-19 | 20-27 |
| (1, 5, 1) | 20-40 | 29-40 |
| (1, 5, 2) | 41-68 | 44-65 |
| (1, 5, 3) | 69-110 | 75-110 |
| (1, 5, 4) | 111-175 | 134-190 |
| (1, 5, 5) | 176-∞ | >300-∞ |

† Both combinations equally likely.
* Defined as n such that $p(i,j,k \mid n) > 0.95 \times \max_n p(i,j,k \mid n)$.

[Fig. 1. Probabilities of observing growth in one tube in the dilution series 1 × 50 ml : 5 × 10 ml : 5 × 1 ml, conditional on the presence of n bacteria. Curves shown include p(0,1,0 | n) and p(1,0,0 | n).]

[Fig. 2. Probability of observing growth in 1,5,1 tubes in the dilution series 1 × 50 ml : 5 × 10 ml : 5 × 1 ml, conditional on the presence of n bacteria (horizontal axis: n). Annotations: most probable number = 35; probable numbers = 30-42; possible numbers = 12-120.]
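For the 11-tube series the tubes have unequal volumes, so the equal-volume occupancy formula does not apply directly. One way to obtain the exact conditional probabilities p(i,j,k | n) is sketched below in Python. It assumes that each of the n bacteria in the 105 ml sample independently lands in a given tube with probability proportional to that tube's volume, and it uses inclusion-exclusion over the set of occupied tubes. This is our reconstruction, not necessarily the computation used by Tillett and Coleman (1985). It does, however, reproduce the values p(1,5,2 | 83) = 0.280, p(1,5,3 | 83) = 0.341 and p(1,5,4 | 83) = 0.202 quoted later in the paper. The search limit N_MAX and the function names are hypothetical, and we have not checked every entry of Table 1 against this sketch.

```python
from math import comb

VOLUMES = (50, 10, 1)   # ml per tube: one 50 ml, five 10 ml, five 1 ml tubes
TUBES = (1, 5, 5)
TOTAL = sum(v * t for v, t in zip(VOLUMES, TUBES))  # 105 ml
N_MAX = 400  # hypothetical upper limit of the search over n

def p_pattern(i, j, k, n):
    """P(exactly i, j, k positive 50 ml, 10 ml, 1 ml tubes | n bacteria in 105 ml)."""
    # Inclusion-exclusion over subsets of one fixed occupied set with i, j, k tubes
    s = 0.0
    for a in range(i + 1):
        for b in range(j + 1):
            for c in range(k + 1):
                sign = (-1) ** ((i - a) + (j - b) + (k - c))
                vol = VOLUMES[0] * a + VOLUMES[1] * b + VOLUMES[2] * c
                s += sign * comb(i, a) * comb(j, b) * comb(k, c) * (vol / TOTAL) ** n
    # Number of ways to choose which tubes of each size are the positive ones
    return comb(TUBES[0], i) * comb(TUBES[1], j) * comb(TUBES[2], k) * s

def all_patterns():
    return [(i, j, k) for i in range(TUBES[0] + 1)
            for j in range(TUBES[1] + 1) for k in range(TUBES[2] + 1)]

def most_likely_pattern(n):
    return max(all_patterns(), key=lambda pat: p_pattern(*pat, n))

def probable_range(pattern, fraction):
    """MPN and the range of n with p(pattern | n) >= fraction * max (n per 105 ml).

    If the range reaches N_MAX (e.g. for (1, 5, 5)) its upper end is truncated.
    """
    probs = {n: p_pattern(*pattern, n) for n in range(1, N_MAX + 1)}
    mpn = max(probs, key=probs.get)
    keep = [n for n, p in probs.items() if p >= fraction * probs[mpn]]
    return mpn, min(keep), max(keep)
```

`probable_range((1, 5, 1), 0.95)` gives a most probable range in the sense of the text, and a smaller fraction such as 0.10 gives a "possible" range. Here n counts bacteria in the 105 ml examined, so dividing by 1.05 converts to counts per 100 ml, the basis of Table 1. For n = 1 the sketch gives p(1,0,0 | 1) = p(0,1,0 | 1) = 50/105, which is why those two combinations are marked as equally likely.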

2. Comparison of methods

A series of multi-laboratory quality control trials is being organised by the PHLS Water Committee. So far, E. coli at a different density for each trial has been introduced into a prepared single batch of water using the method described by Gray and Lowe (1976). The replicate samples (usually 10) are sent to each laboratory, interspersed with sterile samples. Each laboratory is asked to examine specimens using both the multiple tube method and membrane filtration. Laboratories have been using an 11-tube dilution series and published MPNs (DOE, 1983). Results have been compared with respect to the detection rate and the size of reported counts.

(i) Detection of E. coli. Overall, 387 samples from batches inoculated with E. coli were examined by the multiple tube method and 19 (5%) were reported sterile, compared with 37 (11%) of 330 samples examined by membrane filtration. This difference is significant, but it would be more appropriate to confine the comparison to samples actually examined by both methods and to compare within trials because of varying densities between trials. Table 2 shows that there were consistently more false negatives by membrane filtration than by multiple tube (p = 0.0002).

The false negatives were spread amongst most of the participating laboratories and were not confined to laboratories in which the membrane filtration technique was the less familiar method. When more trials have been done it is hoped to make a more detailed analysis of results within and between laboratories.

TABLE 2
Detection of E. coli in six quality control trials: 245 samples examined by both methods

| Trial No. | Multiple tube: -ves | Multiple tube: total | Membrane filtration: -ves | Membrane filtration: total |
|---|---|---|---|---|
| 1 | 1 | 20 | 3 | 20 |
| 2 | 0 | 60 | 3 | 60 |
| 3 | 0 | 80 | 5 | 80 |
| 4 | 0 | 60 | 4 | 60 |
| 5 | 5 | 9 | 6 | 9 |
| 6 | 0 | 16 | 2 | 16 |

Cochran's χ = -3.68, p = 0.0002
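The significance statements in this subsection can be retraced with a short Python sketch. For the pooled comparison (19/387 versus 37/330) we use an ordinary two-proportion z-test; this is our choice, since the paper does not name the test. For the within-trial comparison we assume the reported statistic is Cochran's weighted combination of the per-trial differences in proportions, treating the two methods' results in each trial as independent binomial samples. With the counts of Table 2 this reproduces a value of about -3.68 and a two-sided normal p-value of about 0.0002.

```python
import math

def two_proportion_z(x1, n1, x2, n2):
    """Pooled two-proportion z statistic for H0: p1 = p2."""
    p = (x1 + x2) / (n1 + n2)
    se = math.sqrt(p * (1 - p) * (1 / n1 + 1 / n2))
    return (x1 / n1 - x2 / n2) / se

def cochran_z(tables):
    """Cochran's combined statistic for several 2x2 tables.

    Each table is (x1, n1, x2, n2): negatives and totals for method 1 and method 2.
    """
    num = den = 0.0
    for x1, n1, x2, n2 in tables:
        w = n1 * n2 / (n1 + n2)
        p = (x1 + x2) / (n1 + n2)
        num += w * (x1 / n1 - x2 / n2)
        den += w * p * (1 - p)
    return num / math.sqrt(den)

def two_sided_p(z):
    return math.erfc(abs(z) / math.sqrt(2))

# All inoculated samples: multiple tube 19/387 sterile, membrane filtration 37/330
z_all = two_proportion_z(19, 387, 37, 330)

# Table 2: (multiple tube -ves, total, membrane filtration -ves, total) per trial
trials = [(1, 20, 3, 20), (0, 60, 3, 60), (0, 80, 5, 80),
          (0, 60, 4, 60), (5, 9, 6, 9), (0, 16, 2, 16)]
z_trials = cochran_z(trials)
print(round(z_all, 2), round(z_trials, 2), round(two_sided_p(z_trials), 4))
```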

(ii) Reported counts. Table 3 shows the median counts of E. coli reported in each quality control trial for samples examined by both methods. In every trial the median result by multiple tube was higher than by membrane filtration, except for one trial where they were the same.

TABLE 3
Reported numbers of E. coli in six quality control trials: 245 samples examined by both methods

| Trial No. | No. of samples | Median count: multiple tube* | Median count: membrane filtration |
|---|---|---|---|
| 1 | 20 | 50 | 15 |
| 2 | 60 | 35 | 24 |
| 3 | 80 | 90 | 63 |
| 4 | 60 | 50 | 50 |
| 5 | 9 | 1 | 0 |
| 6 | 16 | 13 | 6 |

* Laboratories reported MPNs from published tables for the 1 × 50 : 5 × 10 : 5 × 1 ml dilution series.

Was this a genuine difference or an artefact caused by the large gaps in tabulated MPNs? For example, in the 11-tube series used by these laboratories the published tables give no count between 50, for a (1,5,2) result, and 90, for a (1,5,3) result. So if an actual count was 75, it might register as a (1,5,3) result and, traditionally, be reported as an MPN of 90.

In order to investigate this question, results have been studied for each laboratory and each quality control trial. For example, laboratory 1 in trial 3 reported 10 membrane filtration counts of 42, 46, 62, 71, 79, 79, 80, 84, 84, 85, with a median value of 79. Thus the estimated bacterial density for this water batch was 79 per 100 ml by this laboratory. A typical sample from a water with 79 E. coli per 100 ml would contain 83 E. coli in the 105 ml used in the dilution series. From probabilities conditional on the presence of 83 organisms (Tillett and Coleman, 1985), it can be shown that the probabilities of the most likely numbers of positive tubes would be

$$p(1,5,2 \mid 83) = 0.280, \qquad p(1,5,3 \mid 83) = 0.341, \qquad p(1,5,4 \mid 83) = 0.202.$$

Therefore the most probable multiple tube result would be (1,5,3), for which the laboratory would have reported the MPN of 90. However, the MPN results recorded for these 10 replicate water samples were 50, 2 × 90 and 7 × 160, with a median value of 160. By this comparison, the higher counts from the multiple tube method are not accounted for by the fact that the observed membrane filtration median count of 79 could have yielded an MPN of 90. (Table 1 shows the most likely multiple tube results for given values of n bacteria per 100 ml.)

In Table 4 this comparison is repeated for each laboratory in the three largest quality control trials. For 14 of the 20 comparisons the membrane median count, expressed as its equivalent MPN, was less than the observed median MPN. In only one comparison was it larger.
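The worked example for laboratory 1 in trial 3 can be retraced step by step. The Python sketch below takes the ten membrane filtration counts and the ten reported MPNs from the text and converts the median membrane count from 100 ml to the 105 ml examined in the dilution series. It then finds the most likely tube pattern using exact probabilities p(i,j,k | n), computed by inclusion-exclusion over the occupied tubes. That computation is our reconstruction and assumes that each bacterium lands in a tube with probability proportional to the tube's volume. The lookup `PUBLISHED_MPN` is hypothetical and holds only the two published values quoted in the text; it is not the full DOE (1983) table.

```python
from math import comb
from statistics import median

def p_pattern(i, j, k, n, volumes=(50, 10, 1), tubes=(1, 5, 5)):
    """Exact P(i, j, k positive tubes | n bacteria), by inclusion-exclusion."""
    total = sum(v * t for v, t in zip(volumes, tubes))
    s = 0.0
    for a in range(i + 1):
        for b in range(j + 1):
            for c in range(k + 1):
                sign = (-1) ** ((i - a) + (j - b) + (k - c))
                vol = volumes[0] * a + volumes[1] * b + volumes[2] * c
                s += sign * comb(i, a) * comb(j, b) * comb(k, c) * (vol / total) ** n
    return comb(tubes[0], i) * comb(tubes[1], j) * comb(tubes[2], k) * s

mf_counts = [42, 46, 62, 71, 79, 79, 80, 84, 84, 85]   # membrane filtration, per 100 ml
mpn_reported = [50] + [90] * 2 + [160] * 7              # reported MPNs, per 100 ml
PUBLISHED_MPN = {(1, 5, 2): 50, (1, 5, 3): 90}          # only the values quoted in the text

mf_median = median(mf_counts)                   # bacteria per 100 ml
n_in_series = round(mf_median * 105 / 100)      # typical number in the 105 ml examined
patterns = [(i, j, k) for i in range(2) for j in range(6) for k in range(6)]
best = max(patterns, key=lambda pat: p_pattern(*pat, n_in_series))
print(mf_median, n_in_series, best, PUBLISHED_MPN[best], median(mpn_reported))
```

The contrast the text draws is between the MPN implied by the membrane count (90) and the median MPN actually reported (160).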

TABLE 4
Comparison of median counts in each laboratory between methods, adjusting membrane filtration counts to their most likely MPN

| Trial No. | Membrane count converted to MPN < observed MPN | = observed MPN | > observed MPN | Total laboratories |
|---|---|---|---|---|
| 2 | 3 | 3 | 0 | 6 |
| 3 | 6 | 1 | 1 | 8 |
| 4 | 5 | 1 | 0 | 6 |

DISCUSSION

When routine sampling of water to monitor the bacteriological content is planned, allowance should be made for variability of counts with time and place. In recreational and pre-treatment water these variations could be large, but they must be observed in order to check that there is no undesirable upward trend. As well as studying trends, the results from each individual water sample have to be checked to see that they conform to certain Standards in Britain. If the sample has been examined by membrane filtration, then an actual count is obtained and can be compared directly with the Standard. If the water has been examined using a multiple tube method, then there may be an estimated range of counts, so that the comparison with the Standard may not be clear-cut.

An explanation of exact probability methods to calculate the probable numbers of bacteria was introduced in a previous paper (Tillett and Coleman, 1985) and has been continued here. The selection of a range of most probable numbers should, perhaps, reflect two things. Firstly, the presence of close contenders for the title 'most probable number'. It has been shown that these ranges are quite large when most of the tubes have shown a positive reaction. As a suggestion, a range has been presented in which all the conditional probabilities $p(i,j,k \mid n)$ are at least 95% as great as $p(i,j,k \mid \mathrm{MPN})$, the maximum conditional probability. However, it could be argued that this range is not wide enough and that values whose probability is 90% of the maximum, or even less, could still be regarded as probable.

Secondly, it must be realised that the dilution series may have gaps between most probable numbers, especially in the range where nearly all the tubes show a positive reaction. Thus a true count of 75, for example, is most likely to give a combination of positive reactions (in the 11-tube series studied) for which the MPN has traditionally been quoted as 90. However, the suggested most probable range is 75 to 110. Indeed, for the most often observed combinations of positive tubes in this series, the range of counts for which each combination is the most likely is often comparable to the most probable range of 95% maximum conditional probabilities.

With the membrane filtration method, interpreting results and comparing them with a Standard is straightforward.

However, m u l t i - l a b o r a t o r y q u a l i t y

c o n t r o l t r i a l s i n England and Wales have, o v e r a two-year p e r i o d , c o n s i s t e n t l y shown t h e m u l t i p l e t u b e method t o be more s e n s i t i v e i n d e t e c t i n g t h e presence o f E . c o l i and t o g i v e h i g h e r c o u n t s f r o m t h e same f r e s h w a t e r sample.

(Each w a t e r 'sample' was o f s u f f i c i e n t volume t o a l l o w h a l f t o be

used i n t h e 1 1 - t u b e d i l u t i o n s e r i e s d e s c r i b e d and l O O m l t o be f i l t e r e d . ) D i f f e r e n c e s i n s e n s i t i v i t y were l a r g e r t h a n c o u l d be accounted f o r b y t h e s l i g h t d i f f e r e n c e s i n volumes examined

-

105ml compared w i t h 100ml.

Twelve l a b o r a t o r i e s have been i n v o l v e d i n t h e s e t r i a l s so f a r , b u t i t i s hoped t o i n c l u d e more and t o e x p l o r e o t h e r and m i x e d b a c t e r i a . I f t h e s e d i f f e r e n c e s a p p l y e l s e w h e r e t h e n i t would seem t h a t s l i g h t l y

g r e a t e r volumes o f w a t e r s h o u l d be f i l t e r e d when l o w l e v e l s o f c o n t a m i n a t i o n a r e expected.

T h i s s h o u l d improve s e n s i t i v i t y .

s h o u l d be g i v e n t o ' d o u b l e s t a n d a r d s ' .

With higher counts thought

Should t h e l e v e l o f a c c e p t a b l e c o u n t s

be s e t h i g h e r when t h e m u l t i p l e t u b e method i s used so as t o a l l o w f o r r e p o r t i n g o f ranges r a t h e r t h a n s i n g l e c o u n t s and t o a l l o w f o r t h e f a c t t h a t r e s u l t s may t e n d t o be h i g h e r because o f t h e method and because o f t h e r o u n d i n g up f o r t r u e c o u n t s which f a l l between t h e sequence o f o b s e r v a b l e combinations o f tube r e a c t i o n s ? I n c o n c l u s i o n , t h e r e s u l t s f r o m t h e m u l t i p l e t u b e method s h o u l d be expressed as a r a n g e o f most p r o b a b l e counts.

I t s h o u l d be made c l e a r t h a t

t h e range i s because o f t h e u n c e r t a i n t y o f t h e method.

( I t i s n o t meant as

an i n d i c a t i o n o f l i k e l y r a n g e o f c o u n t s a t t h e w a t e r s o u r c e be e s t i m a t e d o n l y b y t a l k i n g m u l t i p l e samples.)

-

t h a t range can

I n most s i t u a t i o n s t h e r a n g e

s h o u l d be c a l c u l a t e d u s i n g methods a p p r o p r i a t e f o r non-homogeneous waters. Evidence f r o m one s e t o f q u a l i t y c o n t r o l t r i a l s i m p l i e s t h a t a w a t e r sample i s more l i k e l y t o e r r o n e o u s l y pass a S t a n d a r d i f i t i s examined b y membrane f i l t r a t i o n r a t h e r t h a n b y m u l t i p l e t u b e method.

ACKNOWLEDGEMENT
Thanks are due to the Public Health Laboratory Service Water Committee for allowing me to present results from the Quality Control trials.

REFERENCES
American Public Health Association, American Water Works Association, Water Pollution Control Federation, 1975. Standard Methods for the Examination of Water and Wastewater. APHA, Washington D.C.
David, F.N. and Barton, D.E. 1962. Combinatorial Chance. Griffin, London.
Department of the Environment, Department of Health and Social Security, Public Health Laboratory Service, 1983. The Bacteriological Examination of Drinking Water Supplies, 1982. Her Majesty's Stationery Office, London.
European Communities, 1976. Hygiene of bathing waters. International Digest of Health Legislation. 27: 709-724.
Gray, R.D. and Lowe, G.H. 1976. The preparation of simulated water samples for the purpose of bacteriological quality control. Journal of Hygiene. 76: 49.
Hurley, M.A. and Roscoe, M.E. 1983. Automated statistical analysis of microbial enumeration by dilution series. Journal of Applied Bacteriology. 55: 159-164.
Tillett, H.E. and Coleman, R. 1985. Estimated numbers of bacteria in samples from non-homogeneous bodies of water: how should MPN and membrane filtration results be reported? Journal of Applied Bacteriology. 59: 381.

SOME APPLICATIONS OF LINEAR MODELS FOR ANALYSIS OF CONTAMINANTS IN AQUATIC BIOTA

ROGER H. GREEN University of Western Ontario

1. INTRODUCTION
This paper deals with log-log linear models and some examples of their application to water quality monitoring. Such models arise out of any situation where it is desired to estimate a proportion, percentage, or ratio which is in practice calculated from two other observed variables. The common practice of actually calculating this derived variable for each sampled observation, and then using the derived value as the response variable in a statistical model such as ANOVA, leads to problems both with the validity of the statistical analysis and with the interpretation of the results (Sokal and Rohlf 1973, Atchley et al. 1976, Green 1979). Log-log regression or analysis of covariance (ANCOVA) models can usually satisfy the objectives of such studies without derived variables being used in the statistical analysis.

The problem, and this solution to it, can best be explained by an example that does not relate to water quality monitoring. Three examples in a water quality monitoring context will then be presented. All four examples are based on simulated data, so that known parameters can be estimated by statistical procedures, which can then be evaluated by their success in estimating the parameters and in testing hypotheses whose truth or falsehood is known. The simulated data for all four examples are given in Appendix 1, so that readers can do the analyses themselves. All data simulation was done in the MINITAB statistical package. Statistical analyses were done both in MINITAB and in SAS. Appendix 2 gives an annotated SAS (Statistical Analysis System) job listing which will carry out both the ratio-variable and the log-log ANCOVA analyses for the first example. It can also be used to do the analyses for the other three examples, or any log-log ANCOVA analysis with these objectives, by changing the variable names and the data to be analyzed.

2. WATER CONTENT OF SPRING AND FALL FROGS - AN EXAMPLE
Frogs are collected in spring and fall, and the question is whether there is a difference between seasons in the percentage water content of the frogs. A common approach would be to determine total weight and dry weight for each frog, calculate percent water, and then do an ANOVA with the response variable being the derived percent water values and the treatment groups being the two seasons. Apart from the poor statistical behavior of a ratio variable calculated using a denominator which has substantial variance, there is the problem that critical questions are being ignored and particular answers to them are being assumed. For one thing, is percentage water content independent of the size of a frog? The "derived ratio-variable" approach assumes that it is, and treats the data in a way that obscures the problem. Furthermore, the relationship between percent water and frog size may differ between the two times of year. For example, there may be a relationship in fall but not in spring.

Let us instead build a log-log ANCOVA model as follows. Let W be the water content of a frog and let D be its dry weight, so that the frog's total weight is W + D. If frog size changes, the percent water will stay the same if and only if the water content changes at the same percentage rate as the dry weight.

For example, if a 10 g frog is 6 g water and 4 g dry weight, then both water and dry weight must go up 20 percent (to 7.2 g and 4.8 g) if a 12 g frog is to have the same percent water. Thus $dW/W = b\,(dD/D)$ with b = 1. If this differential model is integrated we obtain the log-log model $\log W = a + b\log D$, and the nonlinear form $W = A D^{b}$, where $A = e^{a}$ (log denotes the natural logarithm throughout). Again, only if b = 1 does percent water remain constant as frog size varies, and of course this is true of the W/D ratio as well. If b = 1 then $W = AD$, and $A = W/D$, the constant ratio of water content to dry weight. If b < 1 then percent water, and the W/D ratio, decrease as frog size increases. If b > 1 then percent water and the W/D ratio increase with increasing frog size. The log-log form of the model is convenient for analysis because it represents a linear relationship between log W and log D.
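Dividing the nonlinear form by D makes explicit why b = 1 separates the two size trends. A short derivation in the notation of the text:

```latex
\frac{dW}{W} = b\,\frac{dD}{D}
\;\Longrightarrow\; \log W = a + b\log D
\;\Longrightarrow\; W = A\,D^{b}, \qquad A = e^{a}

\frac{W}{D} = A\,D^{\,b-1}, \qquad
\text{percent water} = \frac{100\,W}{W + D} = \frac{100\,A\,D^{\,b-1}}{A\,D^{\,b-1} + 1}
```

Because $D^{b-1}$ increases with D when b > 1, is constant when b = 1 and decreases when b < 1, the same holds for W/D and, since $100x/(x+1)$ is increasing in x, for percent water.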

A dummy variable can be added to represent intercept differences between seasons, and another variable equal to the product of the dummy variable and log D to represent slope differences between seasons; we then have a log-log ANCOVA model. By ANCOVA we can answer all four of the following questions:
(a) What is the percentage water content, or equivalently the W/D ratio?
(b) Does percent water differ between seasons?
(c) Does the size of the frog influence the percentage water content?
(d) Does the relationship, if any, between percentage water content and size of frog differ between seasons?

Question (d) corresponds to a test of H0: "common slope" in the log-log ANCOVA model, and question (c) corresponds to a test of H0: "slope b=1". Given acceptance of H0: "common slope", question (b) corresponds to a test of H0: "common intercept". Given acceptance of H0: "slope b=1" for either or both of the regressions (for each season), question (a) corresponds to estimation of the intercept a, from which $A = e^{a} = W/D$ can easily be calculated. Of course percent water content can then be calculated as $100W/(W+D) = 100(W/D)/(W/D + 1)$.

Figure 1 shows the log-log plot of data simulated to represent this analysis problem. The 30 values of log D are from a uniform random distribution between 0 and 3, corresponding to a range of 1 to 20 for D. We simulate the data to represent the case that the answer to question (d) is "no", by simulating under a common slope model. However, we make the answer to question (c) "yes" by choosing a common slope b = 1.1, implying that percent water increases with size of frog because b > 1. The intercept for spring frogs is chosen to be $a_S = -0.511$, corresponding to $A = e^{a_S} = 0.6$, and for fall frogs $a_F = -0.223$, corresponding to $A = e^{a_F} = 0.8$. Thus an average spring frog of 1 g dry weight has a W/D ratio of 0.6, equivalent to a percentage water content of 100(0.6)/(0.6 + 1) = 37.5 percent. For an average fall frog of 1 g dry weight, W/D = 0.8 and the percentage water content is 100(0.8)/(0.8 + 1) = 44.4 percent. For a frog of any given dry weight, the fall frog will average one-third greater water content ($W_F/W_S = 0.8D^{1.1}/0.6D^{1.1} = 1.33$).
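The conversion from intercept to W/D ratio and percent water can be checked with a small helper. This is a sketch; the function name is hypothetical, natural logs are assumed (as the intercepts -0.511 = log 0.6 and -0.223 = log 0.8 imply), and the general form W/D = A D^(b-1) is used so that sizes other than 1 g dry weight can be explored.

```python
import math


def ratio_and_percent_water(a, b, dry_weight):
    """W/D ratio and percent water implied by log W = a + b log D (natural logs)."""
    A = math.exp(a)
    ratio = A * dry_weight ** (b - 1.0)      # W/D = A * D**(b - 1)
    percent = 100.0 * ratio / (ratio + 1.0)  # 100 W / (W + D)
    return ratio, percent


# Simulation parameters from the text: common slope 1.1, A = 0.6 (spring), 0.8 (fall)
b = 1.1
a_spring, a_fall = math.log(0.6), math.log(0.8)

for d in (1.0, 5.0, 20.0):
    _, pct_s = ratio_and_percent_water(a_spring, b, d)
    _, pct_f = ratio_and_percent_water(a_fall, b, d)
    print(f"D = {d:5.1f} g: spring {pct_s:5.1f}% water, fall {pct_f:5.1f}% water")
```

At D = 1 g this reproduces the 37.5 and 44.4 percent given in the text; at larger dry weights both percentages rise because b > 1.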

Any realistic data set must have error variance in it, and in this case we introduce an error standard deviation (in predicting log W from knowledge of log D and season) of σ = 0.1. The ANCOVA on these data yielded the following results. The H0: "common slope" was accepted (p > 0.05). The H0: "b=1" was rejected (p < ...), and the common slope was estimated to be b = 1.106. The H0: "common intercept" was rejected (p < ...); the spring intercept was estimated as ... and the fall intercept $a_F$ as -0.215 (corresponding to $A = e^{a_F} \approx 0.81$). The square root of the error mean square, which is an estimate of σ, was 0.096. It can be seen that our estimates are close to the model parameters, and that all questions and associated hypotheses were answered correctly.
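The common-slope ANCOVA can be reproduced from the Appendix 1 frog data with a short pure-Python sketch. Two assumptions are ours: the first Appendix 1 column is read as dry weight and the second as water weight (inferred from the INPUT DRWT WAWT SEASON statement of the Appendix 2 SAS job), and the function names are hypothetical. With separate intercepts and one shared slope, the least-squares slope is the pooled within-group slope, which is what the common-slope PROC GLM model in Appendix 2 fits.

```python
import math

# Appendix 1, "Water in frogs": (dry weight, water weight). Column order is inferred.
FALL = [(2.13, 1.65), (3.04, 2.63), (4.49, 4.97), (2.32, 2.10), (1.56, 1.30),
        (4.42, 3.99), (8.39, 8.60), (5.56, 5.83), (4.46, 3.71), (11.49, 10.71),
        (6.78, 7.57), (3.12, 3.05), (9.55, 9.44), (8.71, 9.61), (1.31, 1.03)]
SPRING = [(2.83, 2.22), (2.98, 2.04), (3.10, 2.36), (2.04, 1.31), (1.59, 1.05),
          (1.93, 1.50), (1.55, 1.07), (4.75, 3.58), (8.41, 7.08), (3.12, 2.13),
          (2.24, 1.77), (18.43, 16.03), (11.24, 8.93), (2.66, 1.59), (1.29, 0.92)]


def centred_sums(pairs):
    """n, means and corrected sums of squares/products of (log D, log W)."""
    xs = [math.log(d) for d, _ in pairs]
    ys = [math.log(w) for _, w in pairs]
    n = len(xs)
    mx, my = sum(xs) / n, sum(ys) / n
    sxx = sum((x - mx) ** 2 for x in xs)
    sxy = sum((x - mx) * (y - my) for x, y in zip(xs, ys))
    syy = sum((y - my) ** 2 for y in ys)
    return n, mx, my, sxx, sxy, syy


def common_slope_ancova(groups):
    """Least-squares fit of log W = a_g + b log D with one slope b shared by all groups."""
    stats = {g: centred_sums(p) for g, p in groups.items()}
    sxx = sum(s[3] for s in stats.values())
    sxy = sum(s[4] for s in stats.values())
    syy = sum(s[5] for s in stats.values())
    b = sxy / sxx                                    # pooled within-group slope
    intercepts = {g: s[2] - b * s[1] for g, s in stats.items()}
    df = sum(s[0] for s in stats.values()) - len(groups) - 1
    root_mse = math.sqrt((syy - b * sxy) / df)       # estimate of sigma
    return b, intercepts, root_mse


b, a, s = common_slope_ancova({"fall": FALL, "spring": SPRING})
print(f"common slope b = {b:.4f}, root MSE = {s:.3f}")
for season, a_g in a.items():
    print(f"{season}: a = {a_g:.3f}, A = exp(a) = {math.exp(a_g):.3f}")
```

The printed slope and fall intercept should agree with the reported 1.106 and -0.215, and the spring intercept fills in the value the text does not state.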

3. WATER QUALITY MONITORING EXAMPLES

3.1 BIOMAGNIFICATION OF A CONTAMINANT
The first example of an application to water quality monitoring has to do with biomagnification of a contaminant's concentration. Let us assume that clams are sampled from a mercury-contaminated river at various distances from the point source. At each clam's location a sediment sample is taken and analyzed for Hg, and both muscle and liver from the clam are also analyzed for Hg.

[Figure 1. WATER CONTENT OF SPRING AND FALL FROGS. Log-log plot of simulated data representing water content versus dry weight for fall and spring frogs (vertical axis: log water weight). Dashed line and "S" symbols: spring frogs; broken line and "F" symbols: fall frogs. The fitted model is drawn in. The diagonal is shown as a solid line.]

Suppose that we wish to estimate: (a) the relationship between Hg concentration in sediment ($[\mathrm{Hg}]_S$) and the Hg concentration in clam tissue ($[\mathrm{Hg}]_T$, where T = M or T = L for muscle or liver respectively), (b) the influence of tissue type (muscle versus liver) on the biomagnification (i.e., on the ratio of $[\mathrm{Hg}]_T$ to $[\mathrm{Hg}]_S$), (c) the influence of varying sediment contamination ($[\mathrm{Hg}]_S$) on the biomagnification, and (d) any difference between the tissues in how biomagnification responds to varying $[\mathrm{Hg}]_S$. The ratio-variable approach would start by calculating a synthetic variable "$[\mathrm{Hg}]_T$ divided by $[\mathrm{Hg}]_S$", and then continue as an ANOVA where this synthetic ratio variable (the estimated biomagnification for that particular tissue and clam) is the response, and the treatment or group is tissue type. The problems with this approach are the same as for the "spring and fall frogs" example previously described. In particular, only (a) and (b) above are estimated, and they are estimated on the tacit assumption that effects (c) and (d) are nonexistent. The log-log regression approach derives from the following logic, which is analogous to that in the "spring and fall frogs" example.

As $[\mathrm{Hg}]_S$ varies over the contaminated area, the percentage variation of $[\mathrm{Hg}]_T$ should be proportional to the percentage variation of $[\mathrm{Hg}]_S$. That is, $d[\mathrm{Hg}]_T/[\mathrm{Hg}]_T = b\,(d[\mathrm{Hg}]_S/[\mathrm{Hg}]_S)$. If (c), and therefore (d), are not significant effects, then the word "proportional" can be replaced by the word "equal", and b = 1. By integration we obtain the log-log model $\log[\mathrm{Hg}]_T = a + b\log[\mathrm{Hg}]_S$, and the nonlinear model $[\mathrm{Hg}]_T = A([\mathrm{Hg}]_S)^{b}$, where $A = e^{a}$. If and only if b = 1, then $[\mathrm{Hg}]_T = A[\mathrm{Hg}]_S$ and $A = [\mathrm{Hg}]_T/[\mathrm{Hg}]_S$, the estimate of the biomagnification. If b ≠ 1 then the biomagnification varies as $[\mathrm{Hg}]_S$ varies, and its magnitude cannot be determined unless $[\mathrm{Hg}]_S$ is specified. The entire analysis, with all desired tests of hypotheses (a)-(d), is easily done as a log-log ANCOVA in the same manner as for the "spring and fall frogs" example. The dummy variable for group membership now identifies tissue type instead of season. The response and predictor variables are now $\log[\mathrm{Hg}]_T$ and $\log[\mathrm{Hg}]_S$ instead of log W and log D.

Figure 2 shows the log-log plot of a set of data simulated to represent this kind of analysis problem. The 20 values of $[\mathrm{Hg}]_S$ are from a uniform random distribution between 0 and 100. For muscle the $[\mathrm{Hg}]_{T,M}$ values satisfy the relationship $\log[\mathrm{Hg}]_{T,M} = 2.303 + 0.7\log[\mathrm{Hg}]_S + \varepsilon$, where the error ε has standard deviation σ = 0.2. In nonlinear form this is the predictive model $[\mathrm{Hg}]_{T,M} = 10[\mathrm{Hg}]_S^{0.7}$. For liver the corresponding models are $\log[\mathrm{Hg}]_{T,L} = 1.792 + 0.9\log[\mathrm{Hg}]_S + \varepsilon$, again with σ = 0.2, and $[\mathrm{Hg}]_{T,L} = 6[\mathrm{Hg}]_S^{0.9}$. The choice of parameters is intended to represent a situation where (a)-(b) the biomagnification at $[\mathrm{Hg}]_S = 1$ is 10 for muscle and 6 for liver, (c) biomagnification drops as $[\mathrm{Hg}]_S$ increases, but (d) the decrease is greater for muscle than for liver.

[Figure 2. BIOMAGNIFICATION IN TISSUES. Log-log plot of simulated data representing mercury concentration in liver and muscle of clams versus mercury concentration in sediment (horizontal axis: log Hg concentration in sediment). Dashed line and "L" symbols: liver; broken line and "M" symbols: muscle. The fitted model is drawn in. The diagonal is shown as a solid line.]

The ANCOVA on these data yielded the following results. The H0: "common slopes" was rejected (p < ...), and the two regression models were estimated to be $[\mathrm{Hg}]_{T,M} = 13.9[\mathrm{Hg}]_S^{0.60}$ for muscle and $[\mathrm{Hg}]_{T,L} = 5.2[\mathrm{Hg}]_S^{0.96}$ for liver (slopes of 0.60 and 0.96 in the log-log form). The square root of the error mean square, estimating σ, is 0.22. These parameter estimates, for slopes, intercepts and residual error, can be compared with the previously stated parameters of the simulated data. They are reasonably close, and the questions corresponding to (a)-(d) that we wished to answer are answered correctly, i.e., there are no Type I or II errors.
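A pure-Python sketch of the separate-slopes fit and the test of H0: "common slopes" for the Appendix 1 mercury data follows. It uses the extra-sum-of-squares F statistic (separate-slopes model versus common-slope model), which is one standard way to carry out this ANCOVA test; the function names are hypothetical, and because the Appendix 1 values are rounded to integers (the sediment value 1 is a high-leverage point), the estimates may differ slightly from those reported.

```python
import math

# Appendix 1, "Biomagnification Hg": (sediment Hg, clam tissue Hg) pairs.
MUSCLE = [(26, 129), (90, 190), (1, 12), (48, 110), (14, 89),
          (60, 153), (17, 73), (70, 141), (55, 165), (46, 177)]
LIVER = [(23, 89), (86, 267), (39, 182), (55, 282), (12, 52),
         (1, 6), (88, 526), (83, 422), (35, 121), (31, 165)]


def log_fit(pairs):
    """Least-squares line log[Hg]_T = a + b log[Hg]_S, plus the sums used later."""
    xs = [math.log(s) for s, _ in pairs]
    ys = [math.log(t) for _, t in pairs]
    n = len(xs)
    mx, my = sum(xs) / n, sum(ys) / n
    sxx = sum((x - mx) ** 2 for x in xs)
    sxy = sum((x - mx) * (y - my) for x, y in zip(xs, ys))
    syy = sum((y - my) ** 2 for y in ys)
    b = sxy / sxx
    return {"n": n, "a": my - b * mx, "b": b, "sxx": sxx, "sxy": sxy, "syy": syy}


def common_slope_f(fits):
    """Extra-sum-of-squares F statistic for H0: all groups share one slope."""
    sse_separate = sum(f["syy"] - f["sxy"] ** 2 / f["sxx"] for f in fits)
    b_common = sum(f["sxy"] for f in fits) / sum(f["sxx"] for f in fits)
    sse_common = sum(f["syy"] for f in fits) - b_common * sum(f["sxy"] for f in fits)
    df_separate = sum(f["n"] for f in fits) - 2 * len(fits)
    df_extra = len(fits) - 1
    f_stat = ((sse_common - sse_separate) / df_extra) / (sse_separate / df_separate)
    return f_stat, df_extra, df_separate, math.sqrt(sse_separate / df_separate)


fits = {"muscle": log_fit(MUSCLE), "liver": log_fit(LIVER)}
F, df1, df2, root_mse = common_slope_f(list(fits.values()))
print(f"F = {F:.2f} on ({df1}, {df2}) df; root MSE = {root_mse:.3f}")
for tissue, fit in fits.items():
    A = math.exp(fit["a"])
    # Biomagnification [Hg]_T/[Hg]_S = A * [Hg]_S**(b - 1) depends on [Hg]_S unless b = 1
    bio = {s: round(A * s ** (fit["b"] - 1), 2) for s in (1, 10, 100)}
    print(f"{tissue}: b = {fit['b']:.2f}, A = {A:.1f}, biomagnification {bio}")
```

Comparing F with the 5% critical value of F on (1, 16) degrees of freedom (about 4.49) gives the decision on common slopes; the printed biomagnifications show the drop with increasing $[\mathrm{Hg}]_S$ that is steeper for muscle.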

3.2 RATIO OF ISOTOPES OF ELEMENTS IN BIOGENIC MATERIAL
The second water quality monitoring example has to do with the use of relative concentration of isotopes of two elements in biogenic material as a monitor of potential pollution by an effluent containing one of the elements. Let us assume that clams are sampled from two areas, one of which receives an effluent which contains or may contain strontium. The other area serves as the control. The shell of each of these clams is analyzed for Sr and Ca concentration, and these concentrations are converted to total Sr and Ca content in that shell using the weight of the shell as a multiplier. Suppose that we wish to estimate: (a) the relationship between Sr and Ca content in a shell, (b) the influence of location (impact versus control area) on the Sr/Ca ratio, (c) the influence of clam size (and presumably age), which is measured by Ca content, on the Sr/Ca ratio, and (d) any difference between the locations in how the Sr/Ca ratio responds to varying clam size. The ratio-variable approach would contrast observed $\mathrm{Sr}_L/\mathrm{Ca}_L$ ratios between locations L = I and L = C by ANOVA. Again this approach would only estimate effects (a) and (b), and would do so assuming that effects (c) and (d) are nonexistent. The log-log regression approach is based on the assumption that any percentage variation in Sr content of shells would be proportional to percentage variation in Ca content. That is, $d\mathrm{Sr}_L/\mathrm{Sr}_L = b\,(d\mathrm{Ca}_L/\mathrm{Ca}_L)$. By integration we obtain the log-log model $\log\mathrm{Sr}_L = a + b\log\mathrm{Ca}_L$, and the nonlinear model $\mathrm{Sr}_L = A\,\mathrm{Ca}_L^{b}$, where $A = e^{a}$. If and only if b = 1, then $\mathrm{Sr}_L = A\,\mathrm{Ca}_L$ and $A = \mathrm{Sr}_L/\mathrm{Ca}_L$. If b ≠ 1 then the $\mathrm{Sr}_L/\mathrm{Ca}_L$ ratio varies as clam size varies, and its magnitude cannot be determined unless clam size (i.e., calcium content $\mathrm{Ca}_L$) is specified.

Figure 3 shows the log-log plot of data simulated to represent this situation. The 16 values of log Ca are from a uniform random distribution between 0 and 3, which corresponds to a range of 1 to 20 for Ca. Here we simulate the situation where the (c) and (d) effects are nonexistent, i.e. the slopes are the same for the two locations and they are equal to b = 1. Therefore the $\mathrm{Sr}_L/\mathrm{Ca}_L$ ratio does not change as clam size varies. However, we do create differences between the locations in the intercepts a = log A, such that A, the Sr/Ca ratio, is $A_C = 0.01$ for the control area and $A_I = 0.1$ for the impacted area. Thus $\mathrm{Sr}_C = 0.01\,\mathrm{Ca}_C$, or $\log\mathrm{Sr}_C = -4.605 + \log\mathrm{Ca}_C$, and $\mathrm{Sr}_I = 0.1\,\mathrm{Ca}_I$, or $\log\mathrm{Sr}_I = -2.303 + \log\mathrm{Ca}_I$. For both locations σ = 1 for the log-log models.

The ANCOVA on these data yielded the following results. The H0: "common slopes" was accepted (p > 0.05), and the common slope was estimated to be b = 1.27. The H0: "b=1" could not be rejected (p > 0.05). The H0: "common intercepts assuming common slope" was rejected (p < 0.01). The estimates for the "common slope b=1" model are $\log\mathrm{Sr}_C = -5.0835 + \log\mathrm{Ca}_C$, or $\mathrm{Sr}_C = 0.0062\,\mathrm{Ca}_C$, for the control area, and $\log\mathrm{Sr}_I = -2.1452 + \log\mathrm{Ca}_I$, or $\mathrm{Sr}_I = 0.1171\,\mathrm{Ca}_I$, for the impacted area. The square root of the error mean square, estimating σ, is 1.14. Again our estimates are close to the true parameter values, and we have answered the questions corresponding to (a)-(d) correctly.
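Under the "common slope b=1" model the least-squares intercept for each area is simply the mean of log(Sr/Ca) in that area, so the estimates can be checked directly from Appendix 1. This is a sketch with hypothetical function names; the control Sr values are rounded to three decimals (e.g. 0.006), so the control intercept may differ slightly from -5.0835, and the residual standard deviation printed here is for the slope-fixed model, which need not be the model behind the reported 1.14.

```python
import math

# Appendix 1, "Isotopes in shell": (Ca content, Sr content) for each shell.
CONTROL = [(1.03, 0.010), (3.34, 0.029), (7.27, 0.029), (4.94, 0.021),
           (7.87, 0.032), (18.82, 0.589), (1.47, 0.006), (6.34, 0.021)]
IMPACT = [(9.64, 2.552), (3.49, 0.882), (12.10, 6.699), (5.21, 0.408),
          (7.21, 0.050), (8.64, 1.664), (1.84, 0.420), (1.76, 0.070)]


def intercept_with_unit_slope(pairs):
    """Least-squares a in log Sr = a + log Ca (slope fixed at 1): the mean log(Sr/Ca)."""
    logs = [math.log(sr / ca) for ca, sr in pairs]
    return sum(logs) / len(logs)


def pooled_root_mse(groups):
    """Residual standard deviation of the slope-1 model with one intercept per group."""
    sse, n_total = 0.0, 0
    for pairs in groups:
        a = intercept_with_unit_slope(pairs)
        sse += sum((math.log(sr / ca) - a) ** 2 for ca, sr in pairs)
        n_total += len(pairs)
    return math.sqrt(sse / (n_total - len(groups)))


a_c = intercept_with_unit_slope(CONTROL)
a_i = intercept_with_unit_slope(IMPACT)
print(f"control: log Sr = {a_c:.4f} + log Ca, Sr = {math.exp(a_c):.4f} Ca")
print(f"impact:  log Sr = {a_i:.4f} + log Ca, Sr = {math.exp(a_i):.4f} Ca")
print(f"root MSE (slope fixed at 1) = {pooled_root_mse([CONTROL, IMPACT]):.2f}")
```

Fixing the slope at 1 is only appropriate after H0: "b=1" has been accepted, as it was here; otherwise the Sr/Ca ratio must be quoted at a stated clam size.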

3.3 RATIO OF SENSITIVE SPECIES TO RESISTANT SPECIES
The third water quality monitoring example uses the ratio of the number of species classed as "sensitive" (S) to the number classed as "resistant" (R) as a community index of water quality. It is assumed that such a categorization of species has been done a priori, and this is certainly a valid assumption for certain taxonomic groups in fresh water. Again samples are from two areas, one a possible impacted area and the other a control area. The species present in each sample are determined, and the numbers of species on the "sensitive" and "resistant" lists are recorded. We wish to estimate:
(a) the ratio of S to R in a sample,
(b) the influence of location (impact versus control area) on the S/R ratio,
(c) the influence of number of species present, as indicated by R, on the S/R ratio, and
(d) any differences between the locations in how the S/R ratio responds to varying species richness.

[Figure 3. RATIO OF ISOTOPES IN SHELL. Log-log plot of simulated data representing the strontium content versus calcium content of clam shells from impacted and control areas (horizontal axis: log Ca content). Dashed line and "I" symbols: impacted area; broken line and "C" symbols: control area. The fitted model is drawn in. The diagonal is shown as a solid line.]

The ratio-variable approach would contrast observed S/R ratios between locations L = I and L = C by ANOVA. Again only effects (a) and (b) would be estimated, based on the assumption that effects (c) and (d) are nonexistent. The log-log regression approach assumes that percentage variation in the number of "sensitive" species (S) in a sample would be proportional to percentage variation in the number of "resistant" species (R) in the sample. That is, $dS_L/S_L = b\,(dR_L/R_L)$. By integration we obtain the log-log model $\log S_L = a + b\log R_L$, and the nonlinear model $S_L = A R_L^{b}$, where $A = e^{a}$. If and only if b = 1, then $S_L = A R_L$ and $A = S_L/R_L$. If b ≠ 1 then the $S_L/R_L$ ratio varies as species richness varies, and its magnitude cannot be determined unless $R_L$ is specified. Figure 4 shows the log-log plot of data simulated to represent this situation.

The 26 values of log R are from a uniform random distribution between 1.386 and 3.689, which corresponds to a range of 4 to 40 for R. As in the previous example we simulate the situation where the (d) effect is nonexistent, but here we simulate a (c) effect. That is, the slopes are the same for the two locations but the common slope is b = 1.2. Therefore as the number of species increases (greater species richness) the proportion of sensitive species, and the S/R ratio, increases. This fits the perception that higher diversity communities tend to have more "K-selected" (biologically rather than physically accommodated) species. For the control area the log-log model is $\log S_C = -0.916 + 1.2\log R_C$, and the nonlinear form is $S_C = 0.4R_C^{1.2}$. For the impact area the model is $\log S_I = -2.303 + 1.2\log R_I$. For both locations σ = 0.8.

The ANCOVA on these data yielded the following results. The H0: "common slope" was accepted (p > 0.05), and the H0: "b=1" just fails rejection (t = 1.93; $t_{0.05}$ with 23 df = 2.07). Therefore we would marginally accept a common slope b = 1 model, and conclude that the S/R ratio does not vary as species richness varies. In doing so, of course, we would commit a Type II error, since in fact b = 1.2. However, the model shown in Figure 4 is that based on the estimated common slope of b = 1.48, as it would have been had we correctly rejected H0: "b=1". That model is $\log S_C = -1.718 + 1.48\log R_C$, or $S_C = 0.18R_C^{1.48}$, for the control area, and $\log S_I = -2.966 + 1.48\log R_I$, or $S_I = 0.05R_I^{1.48}$, for the impact area. The square root of the error mean square, estimating σ, is 0.81. In this example we have not done as good a job at estimating the parameters, and we would have committed one Type II error.

[Figure 4. RATIO OF SENSITIVE TO RESISTANT SPECIES. Log-log plot of simulated data representing the number of species in a sample classed as sensitive versus the number classed as resistant (horizontal axis: log number of resistant species). The samples are from impacted and control areas. Dashed line and "I" symbols: impact area; broken line and "C" symbols: control area. The fitted model is drawn in. The diagonal is shown as a solid line.]

A larger sample would have been needed to correctly reject H0: "b=1", and to obtain better estimates of the parameters.
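The borderline test of H0: "b=1" can be reproduced from the Appendix 1 species data. In the common-slope model, t = (b - 1)/sqrt(MSE/Sxx), where Sxx is the pooled within-group sum of squares of log R and MSE has N - 3 degrees of freedom (two intercepts and one slope). This is a pure-Python sketch; the function name is hypothetical, and the first Appendix 1 column is read as R, consistent with its stated 4 to 40 range.

```python
import math

# Appendix 1, "Ratio of species types": (resistant R, sensitive S) per sample.
CONTROL = [(38.31, 15.15), (8.55, 5.86), (5.00, 1.60), (6.56, 2.85), (6.83, 3.11),
           (4.78, 5.75), (7.96, 5.33), (8.18, 11.79), (4.66, 0.33), (7.09, 6.41),
           (4.32, 0.63), (29.02, 63.11), (6.20, 1.48)]
IMPACT = [(37.58, 14.09), (13.03, 2.56), (5.83, 0.61), (23.22, 6.23), (17.21, 0.78),
          (4.24, 0.49), (25.92, 11.26), (16.99, 2.90), (12.11, 1.55), (16.22, 2.02),
          (6.25, 3.72), (11.43, 3.23), (7.14, 0.50)]


def t_test_unit_slope(groups):
    """Common-slope fit of log S = a_g + b log R and the t statistic for H0: b = 1."""
    sxx = sxy = syy = 0.0
    means, n_total = {}, 0
    for g, pairs in groups.items():
        xs = [math.log(r) for r, _ in pairs]
        ys = [math.log(s) for _, s in pairs]
        n = len(xs)
        mx, my = sum(xs) / n, sum(ys) / n
        sxx += sum((x - mx) ** 2 for x in xs)          # pooled within-group sums
        sxy += sum((x - mx) * (y - my) for x, y in zip(xs, ys))
        syy += sum((y - my) ** 2 for y in ys)
        means[g] = (mx, my)
        n_total += n
    b = sxy / sxx
    df = n_total - len(groups) - 1
    mse = (syy - b * sxy) / df
    t = (b - 1.0) / math.sqrt(mse / sxx)
    intercepts = {g: my - b * mx for g, (mx, my) in means.items()}
    return b, t, df, math.sqrt(mse), intercepts


b, t, df, root_mse, a = t_test_unit_slope({"control": CONTROL, "impact": IMPACT})
print(f"b = {b:.2f}, t(b=1) = {t:.2f} on {df} df, root MSE = {root_mse:.2f}")
for area, a_g in a.items():
    print(f"{area}: S = {math.exp(a_g):.2f} R^{b:.2f}")
```

The t value falls just short of the two-sided 5% point of t with 23 df (2.07), which is why the true b = 1.2 goes undetected with this sample size.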

4. DISCUSSION
A few cautionary comments should be added. It should be recognized that in all the above examples we have applied Model I least squares regression analysis and ANCOVA to what are obviously Model II data. That is, both the response variable and the predictor variable are really response variables that are sampled rather than controlled, and presumably both are measured with error.

The seriousness and consequences of this have been endlessly debated in the literature (Madansky 1959, Kidwell and Chase 1967, Kuhry and Marcus 1977, Ricker 1973, 1975, 1984, Jolicoeur 1975, Snedecor and Cochran 1980). There is little agreement on remedies or alternatives, or indeed on whether remedies are needed. The main concern is that the estimate of the slope is biased downward by measurement error in the predictor variable (Snedecor and Cochran 1980). If this error is similar in different groups, and the range of observations on the predictor variable is similar in the different groups, then the test of H0: "common slope" should not be affected. It is the estimates of what the slopes are, and tests of H0: "b=1", that may be biased. Similarly, if the common slope model is fitted, then the test of H0: "common intercepts" should not be affected, but the estimates of what the intercepts are may be biased.

My own approach to this problem is to re-estimate slopes by other, more robust methods. A variety of methods are available, such as finding the slope of the first principal component in the two-dimensional space defined by the two log-transformed variables (see Sokal and Rohlf 1981 for calculations), or using the method of grouping (Wald 1940, Nair and Banerjee 1942, Bartlett 1949, Madansky 1959, Kidwell and Chase 1967), which is implemented as the RLINE procedure in the MINITAB statistical package (see Velleman and Hoaglin 1981, which also contains FORTRAN and BASIC programs). If these estimates are in close agreement with the least squares Model I estimates, then bias is probably not a problem. If they are not in agreement, then perhaps a barely significant slope b < 1 should be taken with a grain of salt.
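Both alternative slope estimates are short enough to sketch in Python. The major-axis slope below is the slope of the first principal component of the two log-transformed variables. The second function is a three-group resistant line in the spirit of the method of grouping: it takes medians of the outer thirds of the data (sorted by x) and then adjusts the slope using the medians of the residuals in those thirds. The function names, group-size rule and stopping rule are our assumptions and may differ in detail from MINITAB's RLINE.

```python
import math
from statistics import median


def major_axis_slope(xs, ys):
    """Slope of the first principal component (major axis) of the (x, y) scatter."""
    n = len(xs)
    mx, my = sum(xs) / n, sum(ys) / n
    sxx = sum((x - mx) ** 2 for x in xs)
    syy = sum((y - my) ** 2 for y in ys)
    sxy = sum((x - mx) * (y - my) for x, y in zip(xs, ys))
    return (syy - sxx + math.sqrt((syy - sxx) ** 2 + 4 * sxy ** 2)) / (2 * sxy)


def resistant_slope(xs, ys, max_iter=10, tol=1e-8):
    """Three-group resistant-line slope, refined by iterating on residual medians."""
    pts = sorted(zip(xs, ys))
    k = (len(pts) + 1) // 3                 # size of each outer third
    left, right = pts[:k], pts[-k:]
    x_l = median(x for x, _ in left)
    x_r = median(x for x, _ in right)
    b = (median(y for _, y in right) - median(y for _, y in left)) / (x_r - x_l)
    for _ in range(max_iter):
        # Adjust b by the slope joining the median residuals of the outer thirds
        r_l = median(y - b * x for x, y in left)
        r_r = median(y - b * x for x, y in right)
        delta = (r_r - r_l) / (x_r - x_l)
        b += delta
        if abs(delta) < tol:
            break
    return b


# Usage: muscle data of the mercury example (Appendix 1), on the log scale.
sed = [26, 90, 1, 48, 14, 60, 17, 70, 55, 46]
muscle = [129, 190, 12, 110, 89, 153, 73, 141, 165, 177]
xs = [math.log(v) for v in sed]
ys = [math.log(v) for v in muscle]
print(f"major axis {major_axis_slope(xs, ys):.2f}, resistant {resistant_slope(xs, ys):.2f}")
```

For these muscle data the resistant slope settles near 0.57, in line with the RLINE value in the text, while the major-axis slope is never smaller than the least squares slope when the correlation is positive.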

For the "percent water in frog" data, the RLINE estimate of the common slope was 1.1062, compared to the least squares estimate of 1.1061 and the true value of 1.1. For the "biomagnification of mercury" data, the RLINE estimate of the slope for muscle was 0.57, compared to the least squares estimate of 0.60 and the true value of 0.7. In neither case would there seem to be any cause for concern.

Another concern is that the groups being contrasted should not differ on the predictor variable. If they do differ, then an unbalanced design results. Unfortunately, since the predictor variable is usually not controlled, it may differ between the groups. This could result in ambiguous interpretation of results. For discussion see Snedecor and Cochran 1980, Huitema 1980, Cox and McCullagh 1982. In all the examples presented here, data were simulated for both groups from a common distribution on the predictor variable.

REFERENCES
Atchley, W.R., C.I. Gaskins, and D. Anderson, 1976. Statistical properties of ratios. I. Empirical results. Syst. Zool., 25: 137-148.
Bartlett, M.S., 1949. Fitting a straight line when both variables are subject to error. Biometrics, 5: 207-212.
Cox, D.R. and P. McCullagh, 1982. Some aspects of analysis of covariance. Biometrics, 38: 541-554.
Green, R.H., 1979. Sampling design and statistical methods for environmental biologists. Wiley, New York, 257 p.
Huitema, B.E., 1980. The analysis of covariance and alternatives. Wiley, New York, 445 p.
Jolicoeur, P., 1975. Linear regression in fishery research: some comments. J. Fish. Res. Board Canada, 32: 1491-1494.
Kidwell, J.F. and H.B. Chase, 1967. Fitting the allometric equation - a comparison of ten methods by computer simulation. Growth, 31: 165-179.
Kuhry, B. and L.F. Marcus, 1977. Bivariate linear models in biometry. Syst. Zool., 26: 201-209.
Madansky, A., 1959. The fitting of straight lines when both variables are subject to error. Am. Statist. Assoc. J., 54: 173-205.
Nair, K.R. and K.S. Banerjee, 1942. Note on fitting of straight lines if both variables are subject to error. Sankhya, 6: 331.
Ricker, W.E., 1973. Linear regressions in fishery research. J. Fish. Res. Board Canada, 30: 409-434.
Ricker, W.E., 1975. A note concerning Professor Jolicoeur's comments. J. Fish. Res. Board Canada, 32: 1494-1498.
Ricker, W.E., 1984. Computation and uses of central trend lines. Can. J. Zool., 62: 1897-1905.
Snedecor, G.W. and W.G. Cochran, 1980. Statistical methods. 7th ed., Iowa State Univ. Press, Ames, Iowa, 507 p.
Sokal, R.R. and F.J. Rohlf, 1973. Introduction to biostatistics. Freeman, San Francisco, 368 p.
Sokal, R.R. and F.J. Rohlf, 1981. Biometry. Freeman, San Francisco, 859 p.
Velleman, P.F. and D.C. Hoaglin, 1981. Applications, basics, and computing of exploratory data analysis. Duxbury Press, Boston, Mass., 354 p.
Wald, A., 1940. Fitting of straight lines if both variables are subject to error. Ann. Math. Statistics, 11: 284-300.


APPENDIX 1.
Simulated data used in the examples.

Water in frogs (F = fall, S = spring)
Dry wt. (DWT): 2.13 3.04 4.49 2.32 1.56 4.42 8.39 5.56 4.46 11.49 6.78 3.12 9.55 8.71 1.31 2.83 2.98 3.10 2.04 1.59 1.93 1.55 4.75 8.41 3.12 2.24 18.43 11.24 2.66 1.29
Water wt.:     1.65 2.63 4.97 2.10 1.30 3.99 8.60 5.83 3.71 10.71 7.57 3.05 9.44 9.61 1.03 2.22 2.04 2.36 1.31 1.05 1.50 1.07 3.58 7.08 2.13 1.77 16.03 8.93 1.59 0.92
Season:        F F F F F F F F F F F F F F F S S S S S S S S S S S S S S S

Biomagnification Hg (Sed. = Hg in sediment; Clam = Hg in clam tissue; Tiss.: M = muscle, L = liver)
Sed.:  26 90 1 48 14 60 17 70 55 46 23 86 39 55 12 1 88 83 35 31
Clam:  129 190 12 110 89 153 73 141 165 177 89 267 182 282 52 6 526 422 121 165
Tiss.: M M M M M M M M M M L L L L L L L L L L

Isotopes in shell (Loc.: C = control, I = impacted)
Ca:   1.03 3.34 7.27 4.94 7.87 18.82 1.47 6.34 9.64 3.49 12.10 5.21 7.21 8.64 1.84 1.76
Sr:   0.010 0.029 0.029 0.021 0.032 0.589 0.006 0.021 2.552 0.882 6.699 0.408 0.050 1.664 0.420 0.070
Loc.: C C C C C C C C I I I I I I I I

Ratio of species types (Loc.: C = control, I = impacted)
Resis.: 38.31 8.55 5.00 6.56 6.83 4.78 7.96 8.18 4.66 7.09 4.32 29.02 6.20 37.58 13.03 5.83 23.22 17.21 4.24 25.92 16.99 12.11 16.22 6.25 11.43 7.14
Sen.:   15.15 5.86 1.60 2.85 3.11 5.75 5.33 11.79 0.33 6.41 0.63 63.11 1.48 14.09 2.56 0.61 6.23 0.78 0.49 11.26 2.90 1.55 2.02 3.72 3.23 0.50
Loc.:   C C C C C C C C C C C C C I I I I I I I I I I I I I


APPENDIX 2.


SAS job listing for analysis of ratio variables.

TITLE 'WATER CONTENT OF SPRING AND FALL FROGS';
/* CREATE A DATA SET CONTAINING THE VARIABLES DRY WT., WATER WT.,
   SEASON (F=FALL, S=SPRING), LN DRY WT., LN WATER WT. AND %WATER,
   AND A NUMERICAL SEASON CODE (F=1, S=2) */
DATA;
  INPUT DRWT WAWT SEASON $;
  LNDRWT=LOG(DRWT);
  LNWAWT=LOG(WAWT);
  TWT=DRWT+WAWT;
  PCWAWT=100*(WAWT/TWT);
  IF SEASON='F' THEN SCODE=1; ELSE SCODE=2;
  CARDS;
2.13 1.65 F
3.04 2.63 F
.
.
.
;
/* PRODUCE STATS ON THE VARIABLE %WATER FOR EACH SEASON */
PROC MEANS; VAR PCWAWT; BY SEASON;
/* DO 2-GROUP ANOVA TO SEE IF %WATER DIFFERS BETWEEN SEASONS */
PROC GLM; MODEL PCWAWT=SCODE;
/* TEST WHETHER THE SLOPES OF THE LN WATER WT. VS. LN DRY WT.
   REGRESSIONS DIFFER BETWEEN SEASONS */
PROC GLM; MODEL LNWAWT=SCODE LNDRWT SCODE*LNDRWT;
/* FIT THE COMMON SLOPE ANALYSIS OF COVARIANCE MODEL AND OUTPUT THE
   PREDICTED LN WATER WT. VALUES TO A NEW DATA SET */
PROC GLM; MODEL LNWAWT=SCODE LNDRWT; OUTPUT OUT=OUT1 P=PRLNY;
/* CREATE A NEW DATA SET CONTAINING OBSERVED AND PREDICTED,
   UNTRANSFORMED AND TRANSFORMED, WATER WEIGHT AND %WATER VALUES
   (OUT1 ALREADY CARRIES ALL THE INPUT VARIABLES) */
DATA NEW; SET OUT1;
  PRY=EXP(PRLNY);
  PRPCWAWT=100*(PRY/(PRY+DRWT));
/* PRINT OUT THE NEW DATA SET */
PROC PRINT;
/* PLOT LN WATER WT. VERSUS LN DRY WT., CODED BY SEASON */
PROC PLOT;
  PLOT LNWAWT*LNDRWT=SEASON / VAXIS=0 TO 3 BY 0.5 HAXIS=0 TO 3 BY 0.5 VPOS=50 HPOS=66;
/* SUBSET THE SPRING DATA ONLY */
DATA SPRING; SET NEW; IF SEASON='S';
/* LOG-LOG PLOT OF SPRING DATA WITH THE FITTED MODEL SUPERIMPOSED */
PROC PLOT;
  PLOT LNWAWT*LNDRWT PRLNY*LNDRWT='*' / OVERLAY VAXIS=0 TO 3 BY 0.5 HAXIS=0 TO 3 BY 0.5 VPOS=50 HPOS=66;
/* SUBSET THE FALL DATA ONLY */
DATA FALL; SET NEW; IF SEASON='F';
/* DO THE SAME PLOT FOR THE FALL DATA */
PROC PLOT;
  PLOT LNWAWT*LNDRWT PRLNY*LNDRWT='*' / OVERLAY VAXIS=0 TO 3 BY 0.5 HAXIS=0 TO 3 BY 0.5 VPOS=50 HPOS=66;
/* PLOT PREDICTED LN WATER WT. VERSUS LN DRY WT., CODED BY SEASON */
PROC PLOT DATA=NEW;
  PLOT PRLNY*LNDRWT=SEASON / VAXIS=0 TO 3 BY 0.5 HAXIS=0 TO 3 BY 0.5 VPOS=50 HPOS=66;
RUN;

;

t

PLOT P R E D I C T E D DRY WT. V E R S U S DRY WT. , CODED BY SEASON. PROC P L O T DATA=NEW: P L O T PRY*DRWT=SEASON/ V A X I S = O T O 2 0 BY 5 HAXIS.0 T O 2 0 BY 5 V P O S = 5 0 HPOS=66; PLOT P R E D I C T E D %WATER V E R S U S TOTAL WT. PROC P L O T DATA=NEW: P L O T PRPCWAWT*TWT=SEASON:

.

* FOR S P R I N G P L O T \WATER V E R S U S TOTAL W T . , WITH THE F I T T E D MODEL SUPERIMPOSED. PROC P L O T D A T A = S P R I N G ; P L O T PCWAWT*TWT PRPCWAWT"TWT='*' / OVERLAY: Do THE SAME P L O T FOR THE F A L L DATA. PROC P L O T DATA=FALL: P L O T PCWAWT'TWT PRPCWAWT*TWT="'

/ OVERLAY;

:

A COMPARATIVE STUDY OF THE SAMPLING PROPERTIES OF FOUR SIMILARITY INDICES

HONG-WOO KHOO AND LIM TIT-MENG

1.1 INTRODUCTION

Dissimilarity and similarity indices have played an important role in recent ecological studies, but unfortunately, according to Washington (1984), this does not appear to have extended to aquatic ecology. He stated that there is an obvious need for further and greater evaluation of similarity indices, especially for aquatic ecosystems and water pollution problems. This paper hopes to contribute to this need.

In terms of sampling properties, a reliable similarity index would be one with a low tendency to give values that deviate from the true similarity values and with a small dispersion of index values for repeated measurements of the same community similarity. For a good similarity measure, the magnitude of the bias and dispersion of sample values should not vary with factors such as sample size and the number of species involved in the community comparison. Several workers have evaluated in many different ways the practicality of different similarity indices in biological work (e.g. Bullock, 1971; Johnston, 1976; Huhta, 1979; Lamont and Grant, 1979; Wolda, 1981; Rice and Belland, 1982). But few studies have really looked into the sampling properties of these affinity measures. Ricklefs and Lau (1980) are among the very few who have studied the sampling properties of similarity indices.

Determining how a similarity index behaves with different sampling parameters in the field is tedious and difficult, since nature is almost always too complex to allow for controlled sampling experiments. With the aid of a computer, however, it is possible to simulate artificial communities on which proper and repeatable sampling trials can be carried out and the sampling behaviour of the indices studied. Using computer-generated samples taken from simulated communities, this study investigated the sampling responses of four commonly used similarity indices. All these indices have an upper limit of 1 (identical resemblance) and a lower limit of 0 (no resemblance). The four chosen indices are Gower's general coefficient of similarity (Gower, 1971), Bray-Curtis' index (see Huhta, 1979), Morisita's index (modified by Horn, 1966) and the Euclidean distance index. The sampling responses of these four similarity measures were studied with respect to various sample sizes, quadrate sizes and numbers of species attributes.

1.2 METHOD

1.2.1 Similarity Indices

The following are the indices investigated in this study:

(i) Gower's general coefficient of similarity, or Gower's Similarity Index (GSI)

$$GSI_{ij} = \frac{1}{N}\sum_{k=1}^{N}\left(1 - \frac{|x_{ik} - x_{jk}|}{R_k}\right)$$

where $N$ = total number of species involved in the community comparison; $|x_{ik} - x_{jk}|$ = absolute difference between the abundance of species $k$ in communities $i$ and $j$; $R_k$ = species range, defined as the difference between the maximum and minimum abundance of species $k$ present in all communities under comparison; and the sum runs over $k = 1$ to $N$ species.
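The GSI definition can be sketched in Python as follows. This is a minimal illustration and not the authors' program. The function name `gower_similarity` is hypothetical, $R_k$ is computed over every community passed in, and a species with zero range ($R_k = 0$, where the formula is undefined) is assumed to contribute full similarity.

```python
def gower_similarity(communities, i, j):
    """Gower's Similarity Index GSI_ij between communities i and j.

    `communities` is a list of abundance lists (one per community, species in
    the same order). The range R_k of each species is taken over all the
    communities in the list, i.e. all communities under comparison.
    """
    n_species = len(communities[0])
    total = 0.0
    for k in range(n_species):
        column = [community[k] for community in communities]
        r_k = max(column) - min(column)
        if r_k == 0:
            # Assumption: a species with the same abundance everywhere counts as full agreement.
            total += 1.0
        else:
            total += 1.0 - abs(communities[i][k] - communities[j][k]) / r_k
    return total / n_species


communities = [[10, 20, 30], [10, 20, 30], [30, 20, 10], [20, 20, 20]]
print(gower_similarity(communities, 0, 1))  # identical communities
print(gower_similarity(communities, 0, 2))
```

Because $R_k$ depends on which communities are included, the value for a given pair can change when other communities are added to the comparison.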

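(ii) Bray-Curtis' Similarity Index (BCSI)

$$BCSI_{ij} = 1 - \frac{\sum_{k=1}^{N} |p_{ik} - p_{jk}|}{\sum_{k=1}^{N} (p_{ik} + p_{jk})}$$

where $p_{jk} = x_{jk} / \sum_{k=1}^{N} x_{jk}$ = relative frequency of species $k$ in community $j$.

(iii) Morisita's Similarity Index (MSI) (Morisita, 1959), as modified by Horn (1966)

$$MSI_{ij} = \frac{2\sum_{k=1}^{N} p_{ik}\,p_{jk}}{\sum_{k=1}^{N} p_{ik}^{2} + \sum_{k=1}^{N} p_{jk}^{2}}$$

where $p_{ik}$, $p_{jk}$ = as above.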
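The following is a hedged sketch of the Bray-Curtis complement and of Morisita's index, written in the form usually attributed to Horn (1966). Both are computed from the relative frequencies $p_{jk}$. The function names are hypothetical, and this is an illustration rather than the authors' code.

```python
def relative_frequencies(abundances):
    """p_jk = x_jk / sum_k x_jk for one community."""
    total = sum(abundances)
    return [x / total for x in abundances]


def bray_curtis_similarity(x_i, x_j):
    """Complement (1 - dissimilarity) of Bray-Curtis on relative frequencies."""
    p_i, p_j = relative_frequencies(x_i), relative_frequencies(x_j)
    numerator = sum(abs(a - b) for a, b in zip(p_i, p_j))
    denominator = sum(a + b for a, b in zip(p_i, p_j))
    return 1.0 - numerator / denominator


def morisita_horn_similarity(x_i, x_j):
    """Morisita's index in Horn's form: 2*sum(p*q) / (sum(p^2) + sum(q^2))."""
    p_i, p_j = relative_frequencies(x_i), relative_frequencies(x_j)
    cross = sum(a * b for a, b in zip(p_i, p_j))
    return 2.0 * cross / (sum(a * a for a in p_i) + sum(b * b for b in p_j))


print(bray_curtis_similarity([3, 2, 1], [1, 2, 3]))    # about 0.667
print(morisita_horn_similarity([3, 2, 1], [1, 2, 3]))  # about 0.714
```

Both sums of relative frequencies equal 1, so the Bray-Curtis denominator is always 2. The index therefore reduces to one minus half the summed absolute differences.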


(iv) Euclidean Similarity Index (ESI)

$$ESI_{ij} = 1 - \left(\sum_{k=1}^{N} (p_{ik} - p_{jk})^{2}\right)^{1/2}$$

where $p_{ik}$, $p_{jk}$ = as above.

The complement versions of the Bray-Curtis and Euclidean indices were used to standardize the outcome, so that 0 stands for no similarity and 1 for identical communities. Calculation of the ESI was based on relative frequency data.
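A minimal sketch of the ESI on relative frequencies follows. It assumes the unscaled form $1 - \sqrt{\sum_k (p_{ik} - p_{jk})^2}$, and the function name is hypothetical.

```python
import math


def euclidean_similarity(x_i, x_j):
    """ESI_ij = 1 - sqrt(sum_k (p_ik - p_jk)^2), with p taken as relative frequencies."""
    total_i, total_j = sum(x_i), sum(x_j)
    squared = sum((a / total_i - b / total_j) ** 2 for a, b in zip(x_i, x_j))
    return 1.0 - math.sqrt(squared)


print(euclidean_similarity([3, 2, 1], [1, 2, 3]))  # 1 - sqrt(2)/3, about 0.529
```

Under this assumed unscaled form, the Euclidean distance between two relative-frequency vectors can reach $\sqrt{2}$. For example, two single-species communities of different species give $1 - \sqrt{2}$, which is below 0.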

1.2.2 Simulation Technique

The method is similar to that described by Lim and Khoo (1985). The strategy was to create artificial communities so that they could be sampled, analysed and studied in the same way as if the studies were conducted on real communities. The basic idea of the simulation follows that of Khoo and Wilimovsky (1978).

In the sampling simulation, a square sample quadrate of the required size was first randomly allocated in a 100 x 100 co-ordinate space, which represented the community "universe" dimension. In this "universe", individual members of each community were allotted according to the number of species and the species numbers, or community structure, and were randomly positioned as Cartesian co-ordinates in the "universe" space. Overlapping of individual positions was allowed. The number of individuals per species found in the sample quadrate was counted by comparing each individual's co-ordinates with those of the four corners of the square quadrate. The whole process was repeated for the number of sampling units (quadrates in this case) required by the experiment, and the mean species abundance per quadrate was then calculated. The whole sampling process was repeated for another community with either the same or a different community structure. These sampling results were then used to calculate the similarity index values.

In order to measure the bias and precision, the above simulation was repeated thirty times, and the mean and standard deviation of the 30 sets of sample values for each similarity index were calculated. The mean and standard deviation values were found to stabilize after 30 rounds. All the simulated communities in this study had a constant density of 1000 individuals in the 100 x 100 co-ordinate "universe".
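The sampling scheme above can be sketched as follows. This is an illustrative reconstruction, not the authors' program, and the helper names are hypothetical. The sketch makes several assumptions:

- Each replicate re-positions both communities.
- Each replicate then draws `n_quadrates` square quadrates lying wholly inside the 100 x 100 universe.
- The index shown is Morisita's index in Horn's form.
- Dispersion is the sample standard deviation (n - 1 divisor).

Bias is taken as the mean of the replicate index values minus the expected value.

```python
import random
import statistics

UNIVERSE = 100.0  # side of the 100 x 100 co-ordinate "universe"


def place_individuals(species_counts, rng):
    """Give every individual a random (x, y) position; overlaps are allowed."""
    return [
        (species, rng.uniform(0.0, UNIVERSE), rng.uniform(0.0, UNIVERSE))
        for species, count in enumerate(species_counts)
        for _ in range(count)
    ]


def mean_abundance(individuals, n_species, n_quadrates, side, rng):
    """Mean count per species over n randomly placed square quadrates."""
    totals = [0] * n_species
    for _ in range(n_quadrates):
        # Assumption: the quadrate lies wholly inside the universe.
        x0 = rng.uniform(0.0, UNIVERSE - side)
        y0 = rng.uniform(0.0, UNIVERSE - side)
        for species, x, y in individuals:
            if x0 <= x < x0 + side and y0 <= y < y0 + side:
                totals[species] += 1
    return [t / n_quadrates for t in totals]


def morisita_horn(a, b):
    """Morisita-Horn similarity of two abundance vectors (nan if either is empty)."""
    sa, sb = sum(a), sum(b)
    if sa == 0 or sb == 0:
        return float("nan")
    p = [v / sa for v in a]
    q = [v / sb for v in b]
    cross = sum(x * y for x, y in zip(p, q))
    return 2 * cross / (sum(x * x for x in p) + sum(y * y for y in q))


def bias_and_dispersion(counts_a, counts_b, expected, n_quadrates=10,
                        side=8.0, replicates=30, seed=1):
    """Return (mean index - expected, SD of index) over the replicates."""
    rng = random.Random(seed)
    values = []
    for _ in range(replicates):
        # Assumption: each replicate re-positions both communities.
        a = place_individuals(counts_a, rng)
        b = place_individuals(counts_b, rng)
        m_a = mean_abundance(a, len(counts_a), n_quadrates, side, rng)
        m_b = mean_abundance(b, len(counts_b), n_quadrates, side, rng)
        values.append(morisita_horn(m_a, m_b))
    return statistics.mean(values) - expected, statistics.stdev(values)


# Two identical 3-species communities of 1000 individuals (true similarity 1).
print(bias_and_dispersion([500, 300, 200], [500, 300, 200], expected=1.0))
```

Because Morisita's index in this form cannot exceed 1, sampling noise can only pull the estimate for identical communities downwards. This is consistent with a negative bias at an expected similarity of 1.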


1.2.3 Community Structure

Communities with 3, 6 and 12 species were created, and only comparisons between communities with the same number of species were made. For each species number level, e.g. 6 species (Fig. 1), eight different abundance patterns were created so that, when they were compared, their expected indices would range from 0 to 1. Table 1 shows the expected similarity values between the communities for each of the four indices.

1.2.4 Treatments

Two sets of sampling strategies were conducted to study the effects of sample and quadrate sizes. In the first set, sample sizes of 5, 10, 20, 40 and 80 quadrates were taken, with quadrate sizes of 8 x 8 co-ordinate units. The second set used 10-quadrate samples with quadrate sizes changing from 4 x 4, 8 x 8 and 16 x 16 to 32 x 32 co-ordinate units.

1.2.5 Response Measures

The measures of response to the treatments were the deviation of the mean observed similarity value from the expected similarity value, and the standard deviation of the resulting 30 replicate index values. The former was termed the bias and the latter the dispersion of the sample index values. The bias would reflect the accuracy of the sample indices in measuring the inherent community similarity, while the dispersion would measure the precision of the indices in estimating similarity.

1.3 RESULTS

1.3.1 Bias

The degree of bias in relation to the expected values for Gower's, Bray-Curtis', Morisita's and Euclidean Similarity Indices at different sample sizes is shown in Fig. 2, which illustrates the results for the 6-species simulation model. For Gower's index (GSI), higher bias was obtained at high expected values (the index tends to 1) as well as at low expected values (the index tends to 0), while low bias was observed at intermediate expected values. This means that when two communities have high similarity, Gower's index tended to underestimate the true similarity (negative bias), while at low true similarity it tended to overestimate it (positive bias).

This pattern remained the same even when sample sizes were increased to 80, but the magnitude of bias decreased with increasing sample size. Unlike Gower's index, the other three indices showed a negative bias at high expected similarity, which decreased as the expected values decreased to 0. This means that they would tend to underestimate the association of two highly similar communities but would give more accurate evaluations when the expected similarity is low.

Fig. 1. Diagram showing the community structures or species-abundance patterns of the computer simulated communities for the six-species community model. (Panels: communities 1 & 2, and communities 3 to 8; horizontal axis: species composition, species a to f; vertical axis: 0 to 400.)

TABLE 1. Expected similarity values for comparisons between communities with 3, 6 and 12 species with 1 to 5 types of community structure (see Fig. 1) for Gower's index (GSI), Bray-Curtis' index (BCSI), Morisita's index (MSI) and Euclidean distance index (ESI).

Number of species   Similarity   Comparison between community types
in each community   index        1 & 2   2 & 3   …      …      …      …
3                   GSI          1.00    0.83    0.67   0.83   0.17   0.0
                    BCSI         1.00    0.92    0.83   0.92   0.75   0.67
                    MSI          1.00    0.98    0.92   0.98   0.87   0.72
                    ESI          1.00    0.88    0.77   0.88   0.65   0.53
6                   GSI          1.00    0.61    0.50   0.90   0.10   0.0
                    BCSI         1.00    0.83    0.79   0.96   0.61   0.57
                    MSI          1.00    0.93    0.89   0.99   0.67   0.61
                    ESI          1.00    0.84    0.80   0.96   0.64   0.60
12                  GSI          1.00    0.71    0.50   0.80   0.20   0.0
                    BCSI         1.00    0.84    0.77   0.93   0.61   0.54
                    MSI          1.00    0.89    0.88   0.99   0.64   0.56
                    ESI          1.00    0.89    0.85   0.96   0.73   0.70

Fig. 2. Diagram showing the deviation of the sample (observed) index values from the expected similarity values for the six-species community comparison. GSI = Gower's Similarity Index, BCSI = Bray-Curtis' Similarity Index, MSI = Morisita's Similarity Index and ESI = Euclidean Similarity Index. (Panels: sample size N = 5, 10, 20, 40 and 80; horizontal axis: expected similarity index values, 0 to 1.)

The magnitude of bias in Gower's index decreased with increasing sample size and reached an asymptotic constant value beyond a sample size of 40 (Fig. 3). Beyond this optimum sample size, however, a negative bias at high expected values and a positive bias at low expected values are still observed for Gower's Index. This means that an increase in sample size would reduce the magnitude of bias but would not eliminate its bias pattern in relation to the expected values. The decrease in bias magnitude in the other three indices also reached an asymptotic value, but each index reached it at a different sample size. For Bray-Curtis' and Euclidean indices the optimum sample sizes were between 20 and 40, whereas for Morisita's index they were between 10 and 20. When the expected similarity was low, however, the bias of these three indices reached the asymptotic values at a smaller sample size than at high expected values. This means that when the true similarity is low a smaller sample size is sufficient to measure it, while a larger sample size is needed when the true similarity is high. Of the four indices, Morisita's Index has the least bias and is therefore the most suitable index for comparing communities in situations where only smaller sample sizes (< 20) are available. Morisita's Index also appears to give the most accurate measure of the true similarity. Ricklefs and Lau (1980) also showed that it has less bias than the Euclidean distance. At their respective optimum sample sizes for high expected similarity comparisons, the percentage bias for Morisita's Index is less than 5% of the expected value, while those for Bray-Curtis', Euclidean and Gower's Indices were all greater than 10%.

Table 2 shows the relationship between the bias potential of the four indices and the number of species for different sample sizes at an expected similarity value of 1. The bias of Gower's index was independent of the number of species, irrespective of sample size. For the other three indices, however, the bias increased with increasing species number.

Fig. 3. Diagram showing the relation between bias (observed minus expected similarity values) of each of the four indices (GSI, BCSI, MSI and ESI) and the increase in sample size. The example is for the six-species comparisons. (Horizontal axis: sample size, 5 to 80; vertical axis: bias; curves labelled by expected similarity value.)

TABLE 2. Bias of sample index values for the 3, 6 and 12 species communities in relation to sample size (5 - 80) when the expected similarity is 1. Results are for the four indices GSI, BCSI, MSI and ESI.

Number of species   Sample size   GSI     BCSI    MSI      ESI
3                   5             -0.39   -0.17   -0.05    -0.18
                    10            -0.30   -0.11   -0.02    -0.11
                    20            -0.36   -0.08   -0.01    -0.08
                    40            -0.34   -0.06   -0.01    -0.05
                    80            -0.25   -0.04   -0.004   -0.04
6                   5             -0.35   -0.19   -0.05    -0.18
                    10            -0.31   -0.15   -0.03    -0.14
                    20            -0.37   -0.11   -0.02    -0.10
                    40            -0.27   -0.08   -0.01    -0.07
                    80            -0.21   -0.06   …        -0.05
12                  5             -0.42   -0.31   -0.22    -0.21
                    10            -0.36   -0.20   -0.11    -0.14
                    20            -0.32   -0.15   -0.06    -0.11
                    40            -0.38   -0.11   -0.03    -0.09
                    80            -0.33   -0.08   -0.02    -0.07

TABLE 3. Dispersion (standard deviation) of sample index values for 3, 6 and 12 species communities at an expected value of 1, in relation to sample size.

No. of species in community   Sample size   GSI    BCSI   MSI     ESI
3                             5             0.14   0.07   0.04    0.08
                              10            0.14   0.05   0.02    0.05
                              20            0.12   0.03   0.01    0.04
                              40            0.10   0.03   0.01    0.03
                              80            0.11   0.02   0.003   0.02
6                             5             0.10   0.06   0.05    0.05
                              10            0.11   0.05   0.04    0.04
                              20            0.09   0.03   0.02    0.03
                              40            0.07   0.02   0.01    0.02
                              80            0.07   0.02   0.01    0.02
12                            5             0.08   0.07   0.08    0.05
                              10            0.09   0.05   0.05    0.03
                              20            0.07   0.03   0.03    0.02
                              40            0.05   0.02   0.01    0.01
                              80            0.05   0.02   0.01    0.01

1.3.2 Dispersion

The dispersion (standard deviation) patterns of the 4 indices in relation to sample size and similarity values are shown in Fig. 4. The standard deviation values were greatest at the mid-range values (0.5) and least at both the low and the high end similarity values (0 and 1). The greatest dispersion values were observed for Gower's Index, followed by Bray-Curtis' and Morisita's indices and then by the Euclidean Index, which appears to be the most precise of the four indices. A similarity measure with high precision (low dispersion) would be more sensitive in detecting smaller differences or changes than one with low precision. In this regard the Euclidean index is the best: its standard deviation at the optimum sample size of 40 was about 0.027 under the worst conditions. Thus its 95% confidence interval is less than 10% of the mean similarity value. This interval tends to get smaller as the similarity values tend towards the two extremes of 0 and 1. The confidence intervals of the other three indices were greater than 10% at mid-range mean similarity values. Increasing the sample size lowered the dispersion values and hence increased the precision of all four indices. No optimum sample size at which the dispersion stabilizes could be seen.

Heterogeneity of variance of the similarity values was observed for all four indices. Uneven dispersion at different values of similarity is not a good property of an index. The variances were related to the index values in a parabolic manner, with the greatest variance at the mid-range values. This parabolic and uneven distribution of the standard deviation, however, tended to break down as the sample size increased to 80. It would appear, therefore, that one way to obtain homogeneity of variance and independence of the dispersion from the mean index values is to increase the sample size, in this case up to 80. This number may not be practical under field conditions. Ricklefs and Lau (1980) also observed an uneven distribution of standard deviation even at sample sizes of up to 400 for another similarity index. An alternative solution is to log-transform the index values to obtain homogeneity of variance for subsequent statistical tests.


Fig. 4. Diagram showing the dispersion patterns of the sample similarity values for the four indices. The results are for the six-species community comparisons. (Panels: sample size 10, 20 and 40; horizontal axis: average similarity values, 0 to 1; vertical axis: dispersion, 0 to 0.150.)


As the number of species in the communities increased, the dispersions of Gower's and Euclidean indices decreased, while that of Morisita's index increased (Table 3). The dispersion pattern of Bray-Curtis' index was, however, independent of species number.

1.3.3 Quadrate size

Simulation results showed that increasing quadrate size had the same effects on the bias potential and the dispersion of the four indices as increasing sample size. It reduced the bias and increased the precision in the same manner. Likewise, it did not alter, in any of the four indices, either the relationship pattern between the bias potential and the expected similarity values or the parabolic relationship between the dispersion values and the similarity values.

1.4 DISCUSSION

A good similarity index should meet several requirements:

- It should be simple to calculate.
- It should be 0 if the two entities or communities under comparison are completely different, and 1 when both communities are identical.
- It should vary linearly and be testable statistically.
- It should have the desirable sampling properties of being accurate and precise.

From the above comparative study it can be seen that none of the four indices was absolutely superior.

- Accuracy with respect to sample size: Morisita's index had the least bias at small sample sizes, followed by the Euclidean and Bray-Curtis' indices, and the least accurate was Gower's Index.
- Accuracy with respect to species number: Gower's Index was the best, since its bias was independent of the number of species, whereas the bias of the other three indices increased with species number.
- Precision with respect to sample size: the Euclidean Index was the most precise, followed by Bray-Curtis' and Morisita's Indices, and the least precise was Gower's Index.
- Precision with respect to species number: Bray-Curtis' Index was the best because of its independence from the number of species.

Gower's general coefficient of similarity has been claimed by Johnston (1976) to be robust. Some of its merits listed by Johnston include its applicability to any level of measurement, from discrete presence-absence data through continuous biomass data. It also does not require statistical standardization of the data used in the calculation. This study, however, showed that out of the four criteria discussed above, Gower's index satisfied only one. In terms of its sampling properties it is therefore, in comparison with the other indices, not as useful as claimed by Johnston.

Huhta (1979) evaluated Bray-Curtis' index as good at showing succession in forest floor arthropod communities, but only after logarithmic transformation of the data. This study, however, showed that its one advantage is that its sample dispersion values are independent of the number of species in the communities. Morisita's index has been recommended by Wolda (1981) for its independence of sample size and species number. This study, however, showed that it is dependent on sample size (though not as strongly as the other three indices) and on species number. Lamont and Grant (1979) consider the Euclidean distance one of the good similarity indices because of its sensitivity to differences in species number between communities. This study, however, compared abundance patterns only between communities with the same number of species. The Euclidean index was found to be precise, and its precision increases with sample size and with the number of species attributes. It is therefore a suitable index for complex communities and for large sample sizes. Morisita's index should be used when accurate measurement of community similarity is the main objective. The Euclidean index should be used when the precision of repeated measurements matters and only relative differences are needed.

According to Wolda (1981), statistical tests for similarity indices are not yet available. However, Ricklefs and Lau (1980) suggested the use of field data and computer simulation to obtain statistical confidence limits for similarity estimates. The present study also shows that it is possible to estimate confidence limits for these similarity indices through simulation, provided the sampling properties are taken into account. Through this kind of simulation study, the properties of similarity indices could be explored under various other conditions. This would lead to a better understanding of the many similarity indices at present in use before they are applied in the field.


REFERENCES

Bullock, J.A., 1971. The investigation of samples containing many species. II. Sample comparison. Biol. J. Linn. Soc. 3: 23-56.
Gower, J.C., 1971. A general coefficient of similarity and some of its properties. Biometrics 27: 857-871.
Horn, H.S., 1966. Measurement of "overlap" in comparative ecological studies. American Naturalist 100 (914): 419-424.
Huhta, V., 1979. Evaluation of different similarity indices as measures of succession in arthropod communities of the forest floor after clear-cutting. Oecologia 41: 11-23.
Johnston, J.W., 1976. Similarity Indices I: What do they measure? BNWL-2152. Battelle, Pacific Northwest Laboratories, Richland, Washington, U.S.A.
Khoo, H.W. and Wilimovsky, N.J., 1978. Similarity Index. Department of Zoology, University of Singapore (unpublished report).
Lamont, B.B. and Grant, K.J., 1979. A comparison of twenty-one measures of site dissimilarity. In: Orloci, L., Rao, C.R. and Stiteler, W.M. (eds.), Multivariate Methods in Ecological Work, pp. 101-126. International Co-operative Publishing House, Maryland, U.S.A.
Lim, T.M. and Khoo, H.W., 1985. Sampling properties of Gower's general coefficient similarity. Ecology (in press).
Morisita, M., 1959. Measuring interspecific association and similarity between communities. Mem. Fac. Sci. Kyushu Univ., Ser. E (Biol.), 3: 65-80.
Rice, J. and Belland, R.J., 1982. A simulation study of moss flora using Jaccard's coefficient of similarity. J. Biogeography 9: 411-419.
Ricklefs, R.E. and Lau, M., 1980. Bias and dispersion of overlap indices: results of some Monte Carlo simulations. Ecology 61 (5): 1019-1024.
Washington, H.G., 1984. Diversity, biotic and similarity indices: a review with special relevance to aquatic ecosystems. Water Res. 18 (6): 653-694.
Wolda, H., 1981. Similarity indices, sample size and diversity. Oecologia 50: 296-302.

RANDOMIZED SIMILARITY ANALYSIS OF MULTISPECIES LABORATORY AND FIELD STUDIES

ERIC P. SMITH

Department of Statistics, Virginia Polytechnic Institute and State University, Blacksburg, Virginia 24061

INTRODUCTION

Biological monitoring studies involving measurements on a large number of species are difficult to analyze. Biological concerns are many, ranging from the loss of important species, to changes in the abundance, biomass or biovolume of important species, to changes in the composition or diversity of groups of species.

Although a number of researchers have recommended multivariate methods for detecting the changes associated with differences in locations or levels of a toxicant, most studies cannot use these methods for inference because the sample sizes are not adequate and it is not feasible to obtain adequate sample sizes. For example, it is not uncommon for a researcher to observe as many as 100 different species in a study. To apply multivariate analysis of variance, one would need over 100 replicate samples. Furthermore, it is unlikely that the assumptions of MANOVA would be realized even if this many samples were obtained. Because of the large number of species typically absent at a given site, the normality assumption cannot be met. Therefore, alternative methods are needed for the analysis of this type of data. This paper focuses on the analysis of biological data arising from multispecies studies.
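To see why replication is the bottleneck, note that MANOVA must invert the pooled within-group sums-of-squares-and-cross-products matrix, whose rank can be no larger than the number of samples minus the number of groups. The sketch below is our own illustration, not part of the original analysis: it assumes NumPy and uses simulated Poisson counts for a hypothetical 100 species in 4 treatments with 3 replicates each, with a small mean so that many species are absent. The matrix that MANOVA needs turns out to be singular.

```python
import numpy as np


def within_group_sscp(x, groups):
    """Pooled within-group sums of squares and cross-products (the MANOVA 'E' matrix)."""
    x = np.asarray(x, dtype=float)
    groups = np.asarray(groups)
    e = np.zeros((x.shape[1], x.shape[1]))
    for g in np.unique(groups):
        centered = x[groups == g] - x[groups == g].mean(axis=0)
        e += centered.T @ centered
    return e


rng = np.random.default_rng(0)
n_species = 100                        # hypothetical number of species
groups = np.repeat([0, 1, 2, 3], 3)    # 4 treatments x 3 replicates = 12 samples
counts = rng.poisson(lam=0.5, size=(groups.size, n_species))  # simulated, mostly zeros

e = within_group_sscp(counts, groups)
print("rank of E:", np.linalg.matrix_rank(e), "of", n_species)  # at most 12 - 4 = 8
print("share of zero counts:", round(float(np.mean(counts == 0)), 2))
```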

Of interest in this paper are three basic questions which arise in these studies: (1) Are there differences due to the locations or treatments? (2) Which species are primarily involved in the differences? (3) Which locations or treatments are different? The primary inferential method we propose is based on permutation or randomization procedures.

Such procedures were proposed for use in monitoring studies using diversity measures by van Belle and Fisher (1977) and also by Bell et al. (1981). Here, the methods are based on comparing similarities between samples from like and unlike sites.

Similarities or measures of distance between species are commonly used to compare sites. However, most of the comparisons tend to be graphical and not based on an inferential procedure. The permutation methods presented here are complemented by graphical and summary measures to aid in interpreting the test results.

METHODS AND DATA

For simplicity, the methods will be discussed assuming a completely randomized design with a single factor of interest.

For example, we may be interested in the effects of different concentrations of a single chemical on growth in microcosm experiments. Each microcosm would receive only one treatment, and each treatment would be applied to several replicate microcosms. Observations on, for example, species biomass would then be taken at some suitable time.

Alternatively, one may be interested in deciding if there are differences in the aquatic community above and below a chemical plant. Several similar sites would be chosen above and below the plant and the community composition compared. If only a single species were to be studied and the variable measured were biomass, an analysis of variance might be an appropriate method of analysis. However, we have multiple species, so an alternative approach is required.

Assume then that we have sampled at several sites in a river. At each site, a single sample (possibly a composite sample) is taken and information is recorded on a number of species. This information may be presence or absence of species, or abundance, biomass, biovolume, etc. of individual species.

We assume further that the river has some point source of pollution and that interest is in whether the pollution has an effect on the aquatic community. The approach that we take here uses the randomization or permutation method, a common approach in nonparametric statistics (Pitman 1938, Bradley 1968, Mielke et al. 1981, Moore et al. 1984). The first step in the randomization analysis is the summarization of the data vectors through the use of similarity (or distance) measures.

A similarity measure, $S_{ij}$, describes the degree of relatedness between the species at two sites, i and j.

There exist a number of measures and a number of research papers describing the merits and demerits of the various measures. The object here is not to enter into that debate but to describe methodology that is useful after the measure is chosen. However, the choice of the measure is very important and determines the interpretation of the statistical hypothesis. Some guidelines on the choice of appropriate measures are found in discussions in Sneath and Sokal (1973), Hellawell (1978), Lamont and Grant (1979), Hajdu (1981) and others.

A simplified summarization of the similarity measures categorizes them into three groups that are related to types of changes in community structure. First, if presence-absence data are used, the focus is on loss of species associated with the pollutant.

Measures such as Jaccard's coefficient

$$S_{ij} = \frac{a}{a+b+c} \qquad (1)$$

where a is the number of species present at both sites, b is the number present at site i only and c is the number present at site j only, or the simple matching coefficient

$$S_{ij} = \frac{a+d}{a+b+c+d} \qquad (2)$$

where d is the number absent from both sites i and j, are useful for detecting changes in the occurrence of species.
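Equations (1) and (2) can be sketched in Python with NumPy; the helper names below are our own. Any abundance vector is first reduced to presence/absence. Note that Jaccard's coefficient is undefined (here a ZeroDivisionError) when no species is present at either site.

```python
import numpy as np


def match_counts(x_i, x_j):
    """Return (a, b, c, d) for two sites, as defined for equations (1) and (2)."""
    pi = np.asarray(x_i) > 0
    pj = np.asarray(x_j) > 0
    a = int(np.sum(pi & pj))      # present at both sites
    b = int(np.sum(pi & ~pj))     # present at site i only
    c = int(np.sum(~pi & pj))     # present at site j only
    d = int(np.sum(~pi & ~pj))    # absent from both sites
    return a, b, c, d


def jaccard(x_i, x_j):
    a, b, c, _ = match_counts(x_i, x_j)
    return a / (a + b + c)


def simple_matching(x_i, x_j):
    a, b, c, d = match_counts(x_i, x_j)
    return (a + d) / (a + b + c + d)


site_i = [3, 0, 1, 0, 5, 0]   # hypothetical abundances; only presence/absence is used
site_j = [1, 2, 0, 0, 4, 0]
print(match_counts(site_i, site_j))     # (2, 1, 1, 2)
print(jaccard(site_i, site_j))          # 0.5
print(simple_matching(site_i, site_j))  # 2/3
```

The two coefficients differ only in whether joint absences (d) count as agreement, which matters when many species are absent from both sites.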

Loss of species is of course not the only type of change that may occur in an ecosystem. With mild pollution, one may expect global decreases in the abundance of species, or some intolerant species may decline in abundance while others increase, or the relative abundances of species may change. In the first situation, measures that are based on quantitative or absolute measurements are to be preferred.

Some common measures include Euclidean distance

$$D_{ij} = \Big[\sum_k (x_{ik} - x_{jk})^2\Big]^{1/2} \qquad (3)$$

where $x_{ik}$ and $x_{jk}$ refer, say, to the biomass or biovolume of species k at sites i and j, or a version of the Minkowski metric

$$D_{ij} = \sum_k |x_{ik} - x_{jk}| \qquad (4)$$
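A minimal sketch of distances (3) and (4), assuming NumPy, applied to sites 1 and 4 of Table 1 purely for illustration. The text does not say how distances are converted to similarities, so no conversion is shown.

```python
import numpy as np


def euclidean(x_i, x_j):
    """Equation (3): square root of the summed squared differences."""
    diff = np.asarray(x_i, dtype=float) - np.asarray(x_j, dtype=float)
    return float(np.sqrt(np.sum(diff ** 2)))


def city_block(x_i, x_j):
    """Equation (4): summed absolute differences (Minkowski metric with exponent 1)."""
    diff = np.asarray(x_i, dtype=float) - np.asarray(x_j, dtype=float)
    return float(np.sum(np.abs(diff)))


site1 = [10, 5, 8, 2, 1]    # Table 1, site 1 (species 1-5)
site4 = [5, 7, 9, 15, 5]    # Table 1, site 4
print(euclidean(site1, site4))   # sqrt(215), about 14.66
print(city_block(site1, site4))  # 25.0
```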

These measures may be converted to similarity measures. For relative changes, proportional abundances may be used, and a measure such as Bhattacharyya's (1946) measure of similarity

$$S_{ij} = \sum_k (p_{ik}\,p_{jk})^{1/2} \qquad (5)$$

or the proportional similarity measure

$$S_{ij} = \sum_k \min(p_{ik}, p_{jk}) \qquad (6)$$

may be useful. Here $p_{ik}$ and $p_{jk}$ refer to the proportions of species k at sites i and j respectively.
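The proportional measures (5) and (6) are sketched below, assuming NumPy. Each abundance vector is first converted to proportions, so two sites with the same composition but different totals have similarity 1. The example vectors are hypothetical.

```python
import numpy as np


def proportions(x):
    x = np.asarray(x, dtype=float)
    return x / x.sum()


def bhattacharyya(x_i, x_j):
    """Equation (5): sum over species of sqrt(p_ik * p_jk)."""
    return float(np.sum(np.sqrt(proportions(x_i) * proportions(x_j))))


def proportional_similarity(x_i, x_j):
    """Equation (6): sum over species of min(p_ik, p_jk)."""
    return float(np.sum(np.minimum(proportions(x_i), proportions(x_j))))


a = [50, 50]   # hypothetical site: two species in equal proportions
b = [25, 75]   # hypothetical site: same total, different composition
print(bhattacharyya(a, b))            # about 0.966
print(proportional_similarity(a, b))  # 0.75
print(bhattacharyya(a, [100, 100]))   # 1.0: same proportions, different totals
```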

Alternatively, with biomass data a commonly used measure (Sullivan 1978) is Stander's (1970) measure, which is more generally the cosine of the angle between two vectors:

$$S_{ij} = \frac{\sum_k p_{ik}\,p_{jk}}{\left(\sum_k p_{ik}^2 \sum_k p_{jk}^2\right)^{1/2}} \qquad (8)$$

if proportions are used.

The second step in the randomization analysis is the test of site differences based on permutations of the matrix of similarity or distance measures.
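Stander's measure (8) is the cosine of the angle between two species vectors, so it gives the same value whether raw biomass or proportions are used, because rescaling a vector does not change its direction. The sketch below, assuming NumPy, checks this on sites 1 and 2 of Table 1.

```python
import numpy as np


def stander(x_i, x_j):
    """Equation (8): cosine of the angle between two species vectors."""
    x_i = np.asarray(x_i, dtype=float)
    x_j = np.asarray(x_j, dtype=float)
    return float(x_i @ x_j / np.sqrt((x_i @ x_i) * (x_j @ x_j)))


site1 = np.array([10, 5, 8, 2, 1], dtype=float)   # Table 1, site 1
site2 = np.array([12, 2, 9, 5, 0], dtype=float)   # Table 1, site 2

raw = stander(site1, site2)
prop = stander(site1 / site1.sum(), site2 / site2.sum())
print(round(raw, 3), round(prop, 3))   # 0.955 0.955
```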

To help clarify the basic ideas, we shall make use of the simple data and calculations in Table 1, which represent sampling from two locations, one above and one below a suspected source of pollution. We shall assume that there were 5 species studied and three replicate samples taken at each location. Below the data matrix is the matrix of similarity coefficients, using Stander's cosine measure. Note the obvious structure in the matrix of coefficients. There are two groups of high similarities and a single block of small coefficients. The large coefficients represent the "within" similarities, that is, the similarities between the replicates from the same location (receiving the same treatment). The block of small coefficients is the "between" similarities. These coefficients measure the similarity between samples from different locations (receiving different treatments). If there are no treatment effects, we expect the between similarities to have roughly the same values as the within similarities; otherwise the between similarities should be lower than the within values. The permutation test compares the between and within similarities under the assumption that there is no difference due to location. If there are large differences between the locations, the test will usually indicate these differences.

TABLE 1 Hypothetical data and similarity estimates for three replicate sites at two locations, one above a source of pollution and one below.

DATA
location     1     1     1     2     2     2
site         1     2     3     4     5     6
sp1         10    12    18     5     3     4
sp2          5     2     9     7     4     8
sp3          8     9     4     9     6     2
sp4          2     5     1    15    12    16
sp5          1     0     2     5     9     8

ESTIMATED SIMILARITIES
site         1     2     3     4     5     6
1         1.00  .955  .908  .685  .556  .486
2         .955  1.00  .836  .717  .586  .506
3         .908  .836  1.00  .515  .413  .444
4         .685  .717  .515  1.00  .946  .925
5         .556  .586  .413  .946  1.00  .941
6         .486  .506  .444  .925  .941  1.00
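The similarity matrix of Table 1 can be reproduced by scaling every site vector to unit length and taking all inner products. A sketch, assuming NumPy; rounding the result to three decimals recovers the tabled values.

```python
import numpy as np

# Table 1: rows are sites 1-6 (sites 1-3 at location 1, sites 4-6 at location 2),
# columns are species sp1-sp5
data = np.array([
    [10, 5, 8, 2, 1],
    [12, 2, 9, 5, 0],
    [18, 9, 4, 1, 2],
    [5, 7, 9, 15, 5],
    [3, 4, 6, 12, 9],
    [4, 8, 2, 16, 8],
], dtype=float)


def stander_matrix(x):
    """All pairwise Stander (cosine) similarities between the rows of x."""
    unit = x / np.linalg.norm(x, axis=1, keepdims=True)   # scale each site to unit length
    return unit @ unit.T


sim = stander_matrix(data)
print(np.round(sim, 3))
```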

To test for differences, a statistic is needed to summarize the differences. Recognizing the similarity of the above situation with the analysis of variance method, a possible statistic (Good 1982) is

$$L = \bar{B}/\bar{W}$$

where $\bar{B}$ is the mean between similarity and $\bar{W}$ is the mean within similarity.

It turns out that because T, the total of the similarities, is a fixed number for a given similarity matrix, one may also just consider $\bar{B}$ or $\bar{W}$ for testing purposes (see for example Mielke et al. 1981 or Good 1982). To carry out the permutation test, we compute the statistic L for the data as collected. Call this value L(data). Now, if there is no effect due to the pollutant, as the null hypothesis assumes, we should be able to switch one or more of the upstream samples with the same number of downstream samples, and this should not change the value of the statistic by very much. If, however, there is a difference between locations, switching the data should cause a relatively large change in the value of the statistic L. We shall refer to this value of L as L(permute). To test for differences in location, we carry out a large number of these data switches or permutations, say 1,000, and compare the original value, L(data), with the permuted values. If L(data) is more extreme than, say, 95% of the L(permute) values, then one would reject the null hypothesis of no differences between locations.
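A minimal sketch of the randomization test for the Table 1 data, assuming NumPy; the function names are our own. It permutes the site labels rather than the data, so the similarity matrix is computed once, and it rejects for small L because Stander's measure is a similarity. Since the total T = n_B·B̄ + n_W·W̄ and the pair counts n_B and n_W do not change under relabelling, L = n_W·B̄/(T - n_B·B̄) increases with B̄ whenever T > 0, which is why B̄ alone gives the same test. Counting the observed value in the p-value, (count + 1)/(n_perm + 1), is a common convention we adopt here; the text does not specify it. With only three sites per location, just 2 of the 20 equally likely labellings reproduce the observed split, so the p-value settles near 0.1 however large the difference.

```python
import numpy as np

data = np.array([            # Table 1: sites 1-6 (rows) x species sp1-sp5 (columns)
    [10, 5, 8, 2, 1], [12, 2, 9, 5, 0], [18, 9, 4, 1, 2],
    [5, 7, 9, 15, 5], [3, 4, 6, 12, 9], [4, 8, 2, 16, 8],
], dtype=float)
labels = np.array([1, 1, 1, 2, 2, 2])   # location of each site


def stander_matrix(x):
    unit = x / np.linalg.norm(x, axis=1, keepdims=True)
    return unit @ unit.T


def between_within_means(sim, labels):
    """Mean between-location and mean within-location similarity over distinct pairs."""
    same = labels[:, None] == labels[None, :]
    upper = np.triu(np.ones(sim.shape, dtype=bool), k=1)   # each pair once, no diagonal
    return sim[upper & ~same].mean(), sim[upper & same].mean()


def permutation_test(sim, labels, n_perm=1000, seed=None):
    """One-sided randomization test: small L = B/W is evidence of location differences."""
    rng = np.random.default_rng(seed)
    b, w = between_within_means(sim, labels)
    l_data = b / w
    l_perm = np.empty(n_perm)
    for r in range(n_perm):
        shuffled = rng.permutation(labels)      # relabel sites; sim is reused
        b_p, w_p = between_within_means(sim, shuffled)
        l_perm[r] = b_p / w_p
    # share of permuted statistics at least as small as L(data), counting L(data) itself
    p_value = (np.sum(l_perm <= l_data + 1e-12) + 1) / (n_perm + 1)
    return l_data, p_value


sim = stander_matrix(data)
l_data, p_value = permutation_test(sim, labels, n_perm=1000, seed=1)
print(round(l_data, 3), round(p_value, 3))
```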

Note that one may permute the similarities rather than the data to save computation: the similarity matrix is computed once, and only the site labels attached to its rows and columns are rearranged.

Several comments need to be made at this point. First, the example using locations, while illustrative, is problematic in that the pollution is confounded with location, so that any differences may not be due to pollution. As with most statistical procedures, a significant difference does not necessarily imply causation. Second, in some cases (small samples) there are only a small number of permutations possible. If there are a total of N observations at k locations or treatments, with $n_i$ replicate sites per location, there are

$$\frac{N!}{n_1!\,n_2!\cdots n_k!}$$

different permutations. When this number is small, one may wish to compute all possible permutations of the data (Berry and Mielke 1984).
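The counting formula and the complete enumeration can be sketched for the two-location case, again with the Table 1 data and NumPy; the function names are hypothetical. For two groups it is enough to enumerate the subsets that form the first group (each split appears twice, once as each group). The statistic here is B̄ alone, which, as noted above, can stand in for L.

```python
from itertools import combinations
from math import factorial, prod

import numpy as np

data = np.array([            # Table 1: sites 1-6 (rows) x species sp1-sp5 (columns)
    [10, 5, 8, 2, 1], [12, 2, 9, 5, 0], [18, 9, 4, 1, 2],
    [5, 7, 9, 15, 5], [3, 4, 6, 12, 9], [4, 8, 2, 16, 8],
], dtype=float)


def n_labellings(sizes):
    """N! / (n_1! n_2! ... n_k!): ways to assign N sites to groups of these sizes."""
    return factorial(sum(sizes)) // prod(factorial(n) for n in sizes)


def exact_p_two_groups(sim, n1):
    """Exact one-sided p-value for mean between similarity; observed group 1 = first n1 sites."""
    n = sim.shape[0]
    upper = np.triu(np.ones((n, n), dtype=bool), k=1)

    def mean_between(group1):
        in1 = np.isin(np.arange(n), group1)
        return sim[upper & (in1[:, None] != in1[None, :])].mean()

    observed = mean_between(list(range(n1)))
    stats = [mean_between(list(c)) for c in combinations(range(n), n1)]
    return sum(s <= observed + 1e-12 for s in stats) / len(stats)


unit = data / np.linalg.norm(data, axis=1, keepdims=True)
sim = unit @ unit.T
print(n_labellings([3, 3]))         # 20
print(n_labellings([3, 3, 3, 3]))   # 369600 for four groups of three
print(exact_p_two_groups(sim, 3))   # 0.1
```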

Third, the test is a one-sided test, and the direction of rejection depends on the statistic that is used. If $\bar{B}/\bar{W}$ is used, one rejects for large values of $\bar{B}/\bar{W}$ if a distance measure is used, or for small values of $\bar{B}/\bar{W}$ if a similarity measure is used. Alternatively, if one considers $\bar{B}$ or $\bar{W}$ alone as the statistic, then with a similarity measure one rejects for small values of $\bar{B}$ or large values of $\bar{W}$. If a distance measure is used, then one would reject for large values of $\bar{B}$ or small values of $\bar{W}$.

FOLLOW-UP ANALYSES OF REPLICATES

After rejecting the null hypothesis of no location differences, there are a number of analyses that could be used to determine which locations (assuming there are more than two) differ and which species are the important species for indicating the differences. It may also be of interest to look for possible odd data values that may influence the analysis.

While there are a number of procedures that could be utilized, we will focus on just a few here. For comparing locations, a set of graphical procedures useful for ordering the samples are the multidimensional scaling procedures (Gower 1966). The purpose of this set of procedures is to find a set of x and y data points that, when graphed, represent the location of the samples in a two-dimensional space (a reduced species space). The points (x, y) are chosen so that the visual (Euclidean) distance between the points is approximately the same as the distance (or inverse of the similarity) between the samples as measured by the chosen index.
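Classical (metric) multidimensional scaling, the principal coordinates method of Gower (1966), can be sketched in a few lines with NumPy. The conversion from similarity to squared distance, d² = 2(1 - s), is our assumption; the text only requires that distance decrease as similarity increases. With Stander's cosine measure this choice makes the distances exactly Euclidean (they are distances between unit-length site vectors). Applied to Table 1, the sites above and below the source fall on opposite sides of the first axis.

```python
import numpy as np

data = np.array([            # Table 1: sites 1-6 (rows) x species sp1-sp5 (columns)
    [10, 5, 8, 2, 1], [12, 2, 9, 5, 0], [18, 9, 4, 1, 2],
    [5, 7, 9, 15, 5], [3, 4, 6, 12, 9], [4, 8, 2, 16, 8],
], dtype=float)


def principal_coordinates(sim, n_axes=2):
    """Classical scaling of a similarity matrix.

    Assumption: similarities are turned into squared distances d^2 = 2(1 - s).
    """
    d2 = 2.0 * (1.0 - np.asarray(sim, dtype=float))
    n = d2.shape[0]
    centering = np.eye(n) - np.ones((n, n)) / n
    b = -0.5 * centering @ d2 @ centering          # double-centred inner-product matrix
    eigval, eigvec = np.linalg.eigh(b)              # eigenvalues in ascending order
    top = np.argsort(eigval)[::-1][:n_axes]         # largest eigenvalues first
    return eigvec[:, top] * np.sqrt(np.clip(eigval[top], 0.0, None))


unit = data / np.linalg.norm(data, axis=1, keepdims=True)
coords = principal_coordinates(unit @ unit.T)
for site, (x, y) in enumerate(coords, start=1):
    print(site, round(float(x), 3), round(float(y), 3))
```

The sign of each axis is arbitrary (eigenvectors are defined only up to sign), so only the relative positions of the sites are meaningful.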

By looking at the graph of the sample sites, one can look for groups of samples, for possible ordering of the sites, or for odd data values. A more inferential approach is to compare the mean similarities for the different sites. These comparisons may be made using the multiple comparisons permutation procedure developed by Foutz et al. (1985). This procedure allows one to control the error rates associated with the overall set of comparisons.

FOLLOW-UP ANALYSES OF SPECIES

There are a number of possible methods useful for evaluating the individual species. Our interest here, however, is not only in changes in individual species but in which species are primarily responsible for the differences indicated by the test. As the test is strongly dependent on the measure of similarity, we propose using the contribution (or importance) of the species to the overall statistic as an importance measure. If a measure is additive (for example, Bhattacharyya's measure), the contribution may be directly computed. However, most measures are not additive, so a method based on the effect of removing a species on the similarity is proposed. Let $\bar{B}_{-i}$ be the mean of the between similarities with species i removed. Then

$$I_i = 100\,(\bar{B}_{-i} - \bar{B})/\bar{B} \qquad (10)$$

measures the percentage relative influence of species i on the mean between similarity.

Large positive values indicate a species whose removal greatly

increases the between similarity.

These species show differences between

locations. Species with large negative values decrease the between similarity when removed and represent species that show little change over most locations but contribute to the between similarity.

Species that have influence close

to zero indicate species that are relatively unimportant or do not dominate the measure.
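Equation (10) can be sketched directly: drop one species, recompute the similarity matrix and the mean between similarity, and express the change as a percentage. The sketch below uses Stander's measure and the Table 1 data, assuming NumPy; the function names are our own. For these data, removing sp4, which is scarce above the source and abundant below it, raises the between similarity, so its influence is positive.

```python
import numpy as np

data = np.array([            # Table 1: sites 1-6 (rows) x species sp1-sp5 (columns)
    [10, 5, 8, 2, 1], [12, 2, 9, 5, 0], [18, 9, 4, 1, 2],
    [5, 7, 9, 15, 5], [3, 4, 6, 12, 9], [4, 8, 2, 16, 8],
], dtype=float)
labels = np.array([1, 1, 1, 2, 2, 2])


def stander_matrix(x):
    unit = x / np.linalg.norm(x, axis=1, keepdims=True)
    return unit @ unit.T


def mean_between(sim, labels):
    upper = np.triu(np.ones(sim.shape, dtype=bool), k=1)
    return sim[upper & (labels[:, None] != labels[None, :])].mean()


def species_influence(x, labels, similarity=stander_matrix):
    """Equation (10): I_i = 100 (B_-i - B) / B, recomputing similarities without species i.

    Assumes no site consists only of species i (its vector would become all zeros).
    """
    b_all = mean_between(similarity(x), labels)
    return np.array([
        100.0 * (mean_between(similarity(np.delete(x, i, axis=1)), labels) - b_all) / b_all
        for i in range(x.shape[1])
    ])


infl = species_influence(data, labels)
for i, value in enumerate(infl, start=1):
    print(f"sp{i}: {value:6.2f}")
```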

EXAMPLE

For an application of the above methods, we shall use a data set from an experiment on the effect of zinc on the periphyton community in the New River at Glen Lyn, Virginia. Twelve artificial streams were placed by the river and received one of four zinc treatments (0.0, 0.05, 0.5 or 1.0). Artificial substrates were placed in the streams and removed at a number of times throughout the experiment. Thus, the experimental design is a split-plot design but without blocking (Milliken and Johnson 1983). As the experiment was designed to obtain a significant time-zinc interaction, data for individual times were analyzed separately. A detailed analysis of the full data set will be presented elsewhere. Table 2 gives the data for day 20 of the experiment and some summary measures using Stander's measure for the eleven dominant species.

The permutation test indicated significant differences between the four treatments for both measures (no mean between similarity was lower than the observed value in 1000 permutations of the data). Figure 1 shows that the low zinc treatments (0, 1 and 2) are close together, while the highest dose of zinc (I) is well separated from the other doses. The mean similarities suggest a relatively high degree of similarity for replicates within a treatment (the diagonal elements), while the between means show decreasing similarity with increasing zinc concentration. Note that this is not clearly displayed in Figure 1 due to the horseshoe effect (Kendall 1971). Variance estimates suggest a high degree of variability for replicates using Stander's index. The multiple comparison procedure applied to the between means suggests overlap between the 0.0, 0.05 and 0.50 treatments.

Figure 1. Plots of 12 replicate artificial streams from the Glen Lyn study in the first two species axes using multidimensional scaling on the matrix of Stander's similarities. Symbol 0 represents replicates for the 0.0 zinc treatment, 1 the 0.05 treatment, 2 the 0.50 treatment and I the 1.00 treatment. [Scatter plot of Axis 1 against Axis 2 not reproduced.]

TABLE 2
Data (in cells/cm²) on 11 species from a study on the effects of zinc on the algae community in the New River, influence measures and mean similarities.

ZN               0.0                 0.05                  0.5                  1.0           INFLUENCE
REP           1      2      3      1      2      3      1      2      3      1      2      3
SPECIES
1           175    134     77     44     18     49    111     59     29     81     98     66       0.06
2          1745    931    393   1738    241    716    846    386    482    862    953    794       4.18
3           642      0    393     14     11     16     44     22     29     65     53     37       0.47
4           408    412    323     59     55     66     89     40     51     85     80     14       0.69
5            21     53     48    564    137    874      7      3      0     44     17      0       0.19
6           730    596    705   1923   1103   1891      7     18     14     40     26     51       2.33
7             0      0      3      0      0      0    163      3     44    694    561    185       0.02
8           204    134    230    876    427    924   5133    802    995   8457  14012  10028      44.73
9            29    187    141    222    200    208    326    155    111    102    205     14       0.06
10          379    245    300    490    159    216   1515    594   1084     98    989     37      -0.01
11         2293   1070   3613   5615   3725   7099  12940   9489  20116   1013   5740   1121     -15.09

MATRIX OF MEAN SIMILARITIES USING STANDER'S INDEX
ZN        0.00    0.05    0.50    1.00
0.00      .872
0.05      .882    .982
0.50      .824    .933    .967
1.00      .265    .322    .365    .973
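A sketch, assuming NumPy, that recomputes the summary measures of Table 2 from the day-20 counts: the matrix of mean Stander similarities by zinc level (within-treatment means use the three distinct replicate pairs) and the influence measure (10). We assume that B̄ pools all pairs of streams from different treatments; the text does not spell this out, so the printed values should be compared with, not taken as, the published ones.

```python
import numpy as np

# Table 2 (day 20): rows are species 1-11; columns are the 12 streams,
# three replicates each at zinc 0.0, 0.05, 0.5 and 1.0
counts = np.array([
    [175, 134, 77, 44, 18, 49, 111, 59, 29, 81, 98, 66],
    [1745, 931, 393, 1738, 241, 716, 846, 386, 482, 862, 953, 794],
    [642, 0, 393, 14, 11, 16, 44, 22, 29, 65, 53, 37],
    [408, 412, 323, 59, 55, 66, 89, 40, 51, 85, 80, 14],
    [21, 53, 48, 564, 137, 874, 7, 3, 0, 44, 17, 0],
    [730, 596, 705, 1923, 1103, 1891, 7, 18, 14, 40, 26, 51],
    [0, 0, 3, 0, 0, 0, 163, 3, 44, 694, 561, 185],
    [204, 134, 230, 876, 427, 924, 5133, 802, 995, 8457, 14012, 10028],
    [29, 187, 141, 222, 200, 208, 326, 155, 111, 102, 205, 14],
    [379, 245, 300, 490, 159, 216, 1515, 594, 1084, 98, 989, 37],
    [2293, 1070, 3613, 5615, 3725, 7099, 12940, 9489, 20116, 1013, 5740, 1121],
], dtype=float)
x = counts.T                                  # streams x species
zinc = np.repeat([0.0, 0.05, 0.5, 1.0], 3)


def stander_matrix(x):
    unit = x / np.linalg.norm(x, axis=1, keepdims=True)
    return unit @ unit.T


def mean_similarity_table(sim, labels):
    """Mean similarity for each pair of treatments; the diagonal uses distinct replicate pairs."""
    levels = np.unique(labels)
    table = np.zeros((levels.size, levels.size))
    for a, la in enumerate(levels):
        for b, lb in enumerate(levels):
            block = sim[np.ix_(labels == la, labels == lb)]
            if a == b:
                block = block[np.triu_indices(block.shape[0], k=1)]
            table[a, b] = block.mean()
    return table


def mean_between(sim, labels):
    upper = np.triu(np.ones(sim.shape, dtype=bool), k=1)
    return sim[upper & (labels[:, None] != labels[None, :])].mean()


def species_influence(x, labels):
    """Equation (10) with Stander's measure: percentage change in B when species i is removed."""
    b_all = mean_between(stander_matrix(x), labels)
    return np.array([
        100.0 * (mean_between(stander_matrix(np.delete(x, i, axis=1)), labels) - b_all) / b_all
        for i in range(x.shape[1])
    ])


table = mean_similarity_table(stander_matrix(x), zinc)
influence = species_influence(x, zinc)
print(np.round(table, 3))
print(np.round(influence, 2))
```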

The influence measures indicate some interesting relationships between the data and the similarity statistics. On looking at the data, one notices that some cell densities tend to be quite large. Easily observed differences are for species 5 and 7. However, these species have negligible importance to the differences in similarities. Species 8 and 11 are the species that determine the magnitude of the differences. This peculiarity is due to the high abundance of species 11.

To better understand the relationship between the data, influence and the differences between treatments, one must consider the proportions adjusted for the square root of the sum of the squared proportions, which are presented in Table 3. Note that species 11 dominates Stander's measure for treatments 0.0, 0.05 and 0.50. Only the last treatment alters the relative abundance of this species. Species that have a strong decreasing effect on between similarity are species 8, 6 and 2. Species 8 increases relative to others in replicates of the 1.0 treatment, while species 2 and 6 are diminished by increasing zinc. Differences in Figure 1 between replicates for the 0.0 and 0.05 treatments are primarily due to species 2, while species 6 contributes to differences between the replicates for the 0.05 and 0.50 treatments, and species 8 and 11 affect separation of the 0.50 and 1.0 treatments.
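The adjusted proportions of Table 3 divide each proportion by the square root of the sum of squared proportions at that site, which is the same as dividing each count by the length of the site's count vector. Stander's similarity between two sites is then the sum over species of the products of these adjusted values, which is why a species with large adjusted values, such as species 11, dominates the measure. A sketch for replicate 1 of the 0.0 treatment, assuming NumPy:

```python
import numpy as np

# Replicate 1 of the 0.0 zinc treatment (Table 2), species 1-11
rep1 = np.array([175, 1745, 642, 408, 21, 730, 0, 204, 29, 379, 2293], dtype=float)

p = rep1 / rep1.sum()                    # proportions p_ik
adjusted = p / np.sqrt(np.sum(p ** 2))   # divide by sqrt of the sum of squared proportions
print(np.round(adjusted, 3))             # compare with the first column of Table 3
```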

DISCUSSION

The methods discussed above represent only a few of the possible techniques available for analysing community data. These methods are, however, oriented to the analysis of data using a specific measure of similarity. A number of other techniques, such as principal components, discriminant analysis and detrended correspondence analysis (Greenacre 1984, Gauch 1982), are dependent on Euclidean-type measures of distance. If interest is in Euclidean (or weighted Euclidean) distance, then these methods, which are available in some computer packages, should provide useful results. One method of interest is Gabriel's biplot analysis (ter Braak 1983), which allows graphical displays of both the species and the replicates.

One drawback to the permutation approach is that a difference between the mean between and mean within similarities may be statistically significant without being large enough to be important. If all between similarities are only 0.01 less than the smallest within similarity, the degree of significance using the permutation test is the same as if the difference were 0.50. This problem is common with many nonparametric methods. Hence one should consider the magnitude of the between and within similarities as well as the significance of the test. An approach based on the size of the similarities is available if one is willing to make additional assumptions. Dyer (1978) presents a method based on linear models and the assumptions of normality and homogeneity of variances.
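The drawback just described is easy to demonstrate. The sketch below is our own construction, assuming NumPy: two hypothetical 3 + 3 similarity matrices with constant within- and between-group values, one where the between similarities are 0.01 below the within value and one where they are 0.50 below. The exact test over all 20 labellings gives both the same p-value, 0.1, the smallest attainable with this design.

```python
from itertools import combinations

import numpy as np


def block_similarity(within, between, sizes=(3, 3)):
    """Hypothetical similarity matrix with constant within- and between-group values."""
    labels = np.repeat(np.arange(len(sizes)), sizes)
    sim = np.where(labels[:, None] == labels[None, :], within, between).astype(float)
    np.fill_diagonal(sim, 1.0)
    return sim


def exact_p(sim, n1):
    """Exact one-sided p-value for mean between similarity; group 1 = first n1 sites."""
    n = sim.shape[0]
    upper = np.triu(np.ones((n, n), dtype=bool), k=1)

    def mean_between(group1):
        in1 = np.isin(np.arange(n), group1)
        return sim[upper & (in1[:, None] != in1[None, :])].mean()

    observed = mean_between(list(range(n1)))
    stats = [mean_between(list(c)) for c in combinations(range(n), n1)]
    return sum(s <= observed + 1e-12 for s in stats) / len(stats)


p_small_gap = exact_p(block_similarity(0.90, 0.89), 3)   # between only 0.01 below within
p_large_gap = exact_p(block_similarity(0.90, 0.40), 3)   # between 0.50 below within
print(p_small_gap, p_large_gap)                          # 0.1 0.1
```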


TABLE 3 Adjusted proportions ($p_{ik}/(\sum_k p_{ik}^2)^{1/2}$) for Glen Lyn data.

ZN               0.0                 0.05                  0.5                  1.0
REP           1      2      3      1      2      3      1      2      3      1      2      3
SPECIES
1         0.056  0.082  0.021  0.007  0.005  0.007  0.008  0.006  0.001  0.010  0.006  0.007
2         0.562  0.570  0.105  0.276  0.061  0.096  0.060  0.040  0.024  0.100  0.063  0.079
3         0.207  0.000  0.105  0.002  0.003  0.002  0.003  0.002  0.001  0.008  0.004  0.004
4         0.132  0.252  0.086  0.009  0.014  0.009  0.006  0.004  0.003  0.010  0.005  0.001
5         0.007  0.032  0.013  0.090  0.035  0.117  0.001  0.000  0.000  0.005  0.001  0.000
6         0.235  0.365  0.188  0.306  0.281  0.252  0.001  0.002  0.001  0.005  0.002  0.005
7         0.000  0.000  0.001  0.000  0.000  0.000  0.012  0.000  0.002  0.081  0.037  0.018
8         0.066  0.082  0.061  0.139  0.109  0.123  0.366  0.084  0.049  0.984  0.921  0.991
9         0.009  0.115  0.038  0.035  0.051  0.028  0.023  0.016  0.006  0.012  0.013  0.001
10        0.122  0.150  0.080  0.078  0.041  0.029  0.108  0.062  0.054  0.011  0.065  0.004
11        0.739  0.655  0.961  0.892  0.949  0.947  0.922  0.994  0.997  0.118  0.377  0.111

The examples discussed above are for relatively simple designs. For example, we did not test directly for a time effect or an interaction between time and treatment in the Glen Lyn study. Analyses of more complex designs are possible but not as straightforward, due to the use of repeated measurements on the same experimental unit over time (Carter et al. 1982).

ACKNOWLEDGEMENTS

This paper was supported in part by NIH grant #18770.

REFERENCES
Bell, C.B., L.L. Conquest, R. Pyke and E.P. Smith 1981. Some nonparametric statistics for monitoring water quality using benthic species counts. pp. 100-121. In Environmetrics 81: Selected Papers. Society for Industrial and Applied Mathematics, Philadelphia.
Berry, K.J. and P.W. Mielke 1984. Computation of exact probability values for multi-response permutation procedures (MRPP). Communications in Statistics: Simulation and Computation 13:417-432.
Bhattacharyya, A. 1946. On a measure of divergence between two multinomial populations. Sankhya 7:401-406.
Bradley, J.W. 1968. Distribution Free Statistical Tests. Prentice-Hall, New York.
Carter, R.L., R. Morris and R.K. Blashfield 1982. Clustering two-dimensional profiles: a comparative study. Technical Report 175, Department of Statistics, University of Florida, Gainesville.
Dyer, D.P. 1978. An analysis of species dissimilarity using multiple environmental variables. Ecology 59:117-125.
Foutz, R.V., D.R. Jensen and G.W. Anderson 1985. Multiple comparisons in the randomization analysis of designed experiments with growth curve responses. Biometrics 41:29-38.
Gauch, H.G. 1982. Multivariate Analysis in Community Ecology. Cambridge University Press, Cambridge.
Good, I.J. 1982. An index of separateness of clusters and a permutation test for its significance. Journal of Statistical Computation and Simulation 15:81-84.
Gower, J.C. 1966. Some distance properties of latent root and vector methods used in multivariate analysis. Biometrika 53:325-338.
Greenacre, M.J. 1984. Theory and Applications of Correspondence Analysis. Academic Press, London.
Hajdu, L.J. 1981. Graphical comparison of resemblance measures in phytosociology. Vegetatio 48:47-59.
Hellawell, J.M. 1978. Biological Surveillance of Rivers: A Biological Monitoring Handbook. Water Research Centre, Stevenage, England.
Kendall, D.G. 1971. Seriation from abundance matrices. In F.R. Hodson, D.G. Kendall and P.A.P. Tautu (eds.). Mathematics in the Archaeological and Historical Sciences, pp. 215-252. Edinburgh University Press, Edinburgh.
Lamont, B. and K.J. Grant 1979. A comparison of twenty measures of site dissimilarity. In L. Orloci, C.R. Rao and W.M. Stiteler (eds.). Multivariate Methods in Ecological Work. International Cooperative Publishing House, Fairland, Maryland, pp. 101-126.
Mielke, P.W., K.J. Berry, P.J. Brockwell and J.S. Williams 1981. A class of nonparametric procedures based on multiresponse permutation procedures. Biometrika 68:720-724.
Milliken, G.A. and D.E. Johnson 1984. Analysis of Messy Data. Volume 1: Designed Experiments. Wadsworth Inc., London.
Moore, W.E.C., L.V. Holdeman, E.P. Cato, I.J. Good, E.P. Smith, R.R. Ranney, J. Bunnerster 1984. Variation in periodontal flora. Infection and Immunity 46:720-726.
Pitman 1937. Significance Tests Which May be Applied To Samples from Any Populations. Journal of the Royal Statistical Society (Series B), 4:119-130.
Sneath, P.H.A. and R.R. Sokal 1973. Numerical Taxonomy: The Principles and Practice of Numerical Classification. Freeman Publishing Company, San Francisco.
Stander, J.M. 1970. Diversity and similarity of benthic fauna off Oregon. M.S. thesis, Oregon State University, Corvallis, Oregon. 72 pp.
Sullivan, M.J. 1978. Diatom community structure: taxonomic and statistical analysis of a Mississippi salt marsh. J. Phycol. 14:468-475.
ter Braak, C.J.F. 1983. Principal components biplots and alpha and beta diversity. Ecology 64:454-462.
van Belle, G. and L. Fisher 1977. Monitoring the environment for ecological change. Journal of the Water Pollution Control Federation 49:1671-1679.

ASSOCIATION OF CHLOROPHYLL a WITH PHYSICAL AND CHEMICAL FACTORS IN LAKE ONTARIO, 1967-1981

A.H. El-Shaarawi¹, J. Richard Elliott², R.E. Kwiatkowski³ and David R. Peirson⁴

¹National Water Research Institute, Burlington, Ontario
Department of Mathematics³ and Biology⁴, Wilfrid Laurier University, Waterloo, Ontario

1. INTRODUCTION

Eutrophication is a naturally-occurring process in all lakes. However, the rate at which a lake proceeds from an oligotrophic state (limited nutrient availability) to a eutrophic state (excessive nutrient availability) can be accelerated by anthropogenic inputs. The most visible impact of eutrophication on an aquatic ecosystem is that of increased biomass/number of biological organisms, particularly at the primary producer (algae) level. This excessive increase of biomass has been responsible for depleted or reduced oxygen levels in the hypolimnetic (cold, deep) waters of lakes, beaches clogged with dead or decaying algal mats, tainted drinking water supplies, and the overall impairment of the aquatic resource, making it unsuitable for other desirable uses.

Concerned about the deterioration of the water quality of the lower Great Lakes due to eutrophication (International Lake Erie Water Pollution Board and International Lake Ontario-St. Lawrence Water Pollution Board, 1969), the governments of Canada and the United States signed the Canada-United States Great Lakes Water Quality Agreement (renewed in 1978). Though whole lake sampling of Lake Ontario has occurred since 1967, a surveillance program specifically addressing the water quality requirements of the 1972 Agreement was initiated in 1974.

One of the objectives of the surveillance program was to describe lake conditions on a spatial and temporal basis. Due to the fact that increased biomass is an expected effect of increased nutrient loadings (Vollenweider, 1976), it was concluded that the cumulative progress of eutrophication could be described by the long-term periodic measurement of phytoplankton standing stocks (Watson et al., 1975).

Two methods were readily available for measurement of phytoplankton biomass: directly, through cell enumeration and volume measurement with subsequent biomass calculation, or indirectly, through chlorophyll a measurement. Both methods have advantages and disadvantages. Direct enumeration was costly (≈$100.00/sample) and time consuming (2 to 3 samples/day/person), and difficulties existed with accurate linear measurements of the cells and adequate dry weight estimates of the organisms. The advantage, assuming that an adequate estimate of numbers could be obtained by subsampling the parent population, was a highly accurate estimate of phytoplankton biomass. Chlorophyll a was inexpensive (≈$2.00/sample) and could be done quickly and efficiently (80 samples/day/person). The disadvantage was simply that the chlorophyll a concentration within a given phytoplankton cell varies with species, age and the overall physiological status of the algal cell, as well as the nutrient and physical environment in which the phytoplankton cell exists.

Due to the size of the Lake Ontario surveillance program (approximately 90 stations sampled monthly, Figure 1), direct phytoplankton enumeration and measurement was considered to be too costly and time prohibitive to be used as a routine surveillance parameter. As a result, chlorophyll a became the only biomass indicator routinely measured on all surveillance cruises.

Now that reductions in spring total phosphorus concentrations have been clearly documented (IJC, 1983), corresponding reductions in algae biomass are expected, as predicted by the Vollenweider model (Vollenweider, 1976). As a result, the long-term historical chlorophyll a data set has become a valuable record of the trophic status of Lake Ontario. The aim of this paper is to find if statistical relationships between chlorophyll a and other commonly measured water quality variables exist, and to study the stability of these relationships seasonally, as well as over time, to provide insight into the overall limnological processes influencing the trophic status of Lake Ontario.

Fig. 1. Station pattern for Lake Ontario, 1974-80. [Map not reproduced; the legend distinguishes stations added in 1975, 1976 and 1977, stations deleted during the period, and stations not sampled in 1975-1976.]

2. MATERIALS AND METHODS

Chlorophyll a has been collected on all whole lake cruises on Lake Ontario since 1967. The number of cruises in any one year ranged from three (1971, 1980) to 17 (1974), while the number of stations in any given cruise ranged from 32 (1972 and 1973) to 95 (1969). It should be pointed out that in the pre-1974 years no two years had identical station and cruise patterns. Consistency in whole lake monitoring began with the surveillance cruises, starting in 1974. Similarly, changes have occurred in the methodology of collection and analysis of chlorophyll a. For 1968 and 1969, continuous fluorometric measurements of surface waters were performed using the method of Lorenzen (1966) to estimate chlorophyll a concentration. In 1970,

photometrically using the Parsons,

1968)

Starting

in

at

phacopigments

the

analytical

Yentsch

the

0-20

a

1974. and

The data used in this study were stored in the STAR data base on a Cyber computer at the Canada Centre for Inland Waters, Burlington, Ontario. The data from 1967 to 1973 inclusive were stored in two files; the data from 1974 on were retrieved from the general access data base using the program PRET1, which was the most efficient retrieval program available. Further details on the data base can be found in Kwiatkowski and Neilson (1983).

The variables retrieved for this study were chlorophyll a (µg/L), total phosphorus (mg P/L), soluble reactive phosphorus (mg P/L), ammonia and nitrate/nitrite (in mg N/L), soluble reactive silica (mg SiO2/L), temperature (deg. C), secchi depth (metres), sampling depth (metres) and the sounding depth of the station, together with integrated total phosphorus, integrated total particulate nitrogen and integrated particulate organic carbon (mg C/L). Due to differences in field sampling and analytical techniques, not all of these variables were measured in every year. Further details on the field sampling and analytical methods can be found in Department of the Environment (1974) and Watson and Williams (1975).

Sorting programs were required to combine the relevant data for each station into a single record for analysis. For the cruises from 1967 to 1973, temperature and chlorophyll a were measured at one metre, while the remaining variables were measured at fewer stations; only surface data were therefore used, and SORT1 was applied to combine them. SORT2 and SORT3 were applied to the remaining cruises, according to the number of stations (more or fewer than 75) and depths sampled. From 1974 on, total phosphorus, total particulate nitrogen and particulate organic carbon were analysed on integrated 0-20 m samples. To generate comparable data for the remaining variables, SORT4 averaged all of the readings with a sampling depth of 20 metres or less, for each variable per station.
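The SORT4 step reduces all readings in the upper 20 m to one value per station and variable, so that they can be set beside the integrated 0-20 m samples. A hypothetical re-expression in Python follows; the original sorting programs are not described further, and the record layout used here is an assumption.

```python
from collections import defaultdict
from statistics import mean
from typing import Dict, Iterable, List, Optional, Tuple

# One reading: (station id, variable name, sampling depth in metres, value).
Reading = Tuple[str, str, float, Optional[float]]


def average_upper_layer(readings: Iterable[Reading],
                        max_depth_m: float = 20.0) -> Dict[Tuple[str, str], float]:
    """Mean of all readings taken at or above max_depth_m, per station and variable.

    Readings deeper than max_depth_m, or with a missing value, are ignored, so
    each (station, variable) pair ends up with a single 0-20 m value.
    """
    grouped: Dict[Tuple[str, str], List[float]] = defaultdict(list)
    for station, variable, depth, value in readings:
        if value is None or depth > max_depth_m:
            continue  # not measured, or below the 0-20 m stratum
        grouped[(station, variable)].append(value)
    return {key: mean(values) for key, values in grouped.items()}
```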

Some variables, such as secchi depth, were measured only once at each station, while others were measured at more than one depth. In some instances, a variable was not measured consistently enough to warrant including it in the analysis, because including it would reduce the sample size by more than approximately one-third. Consequently, the data for such cruises were partitioned into two (occasionally more) subsets: one containing all possible variables, and a second, alternate data set without the inconsistently measured variable, whose sample size was large enough for a separate regression. These subsets are labelled A, B, C and D in the tables.
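The one-third rule for splitting a cruise into subsets can be sketched as follows. This assumes that "sample size" means the number of cases with every variable measured and treats the threshold as exactly one-third; the function names are hypothetical, and only the A/B labels come from the tables.

```python
from typing import Dict, List, Optional, Sequence, Tuple

Case = Dict[str, Optional[float]]  # variable name -> value, None if not measured


def complete_cases(cases: Sequence[Case], variables: Sequence[str]) -> List[Case]:
    """Cases in which every listed variable was measured."""
    return [c for c in cases if all(c.get(v) is not None for v in variables)]


def partition(cases: Sequence[Case], variables: Sequence[str], sparse: str,
              max_loss: float = 1 / 3) -> Dict[str, Tuple[List[str], List[Case]]]:
    """Return subset 'A' (all variables) and, when keeping `sparse` would cost
    more than `max_loss` of the usable cases, an alternate subset 'B' without it."""
    all_vars = list(variables)
    reduced_vars = [v for v in all_vars if v != sparse]
    subset_a = complete_cases(cases, all_vars)
    subset_b = complete_cases(cases, reduced_vars)
    result = {"A": (all_vars, subset_a)}
    # Fraction of otherwise usable cases lost by insisting on `sparse`.
    if subset_b and 1 - len(subset_a) / len(subset_b) > max_loss:
        result["B"] = (reduced_vars, subset_b)
    return result
```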

3. RESULTS

The following tables and graphs summarize the results of the regressions. Tables 1.1 to 1.4 contain the R-squared value obtained for each regression, together with the number of cases and the number of variables entered into the equation; cruises are identified by their starting dates. Tables 2.1 to 2.4 indicate the rating of each variable entered into the equation by the stepwise regression program, which was run under its default conditions.
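The entry criterion of the stepwise program does not survive in this copy. As an illustration only, a forward-selection loop that adds the variable giving the largest gain in R-squared at each step, and stops when the gain falls below a threshold (standing in for the F-to-enter rule such programs commonly use), yields the two quantities reported in the tables: the order of entry and the R-squared of the final equation. Names and the threshold are assumptions.

```python
from typing import List, Sequence, Tuple

import numpy as np


def r_squared(X: np.ndarray, y: np.ndarray) -> float:
    """R-squared of an ordinary least-squares fit of y on X plus an intercept."""
    design = np.column_stack([np.ones(len(y)), X])
    coef, *_ = np.linalg.lstsq(design, y, rcond=None)
    residuals = y - design @ coef
    ss_res = float(residuals @ residuals)
    ss_tot = float(((y - y.mean()) ** 2).sum())
    return 1.0 - ss_res / ss_tot


def forward_stepwise(X: np.ndarray, y: np.ndarray, names: Sequence[str],
                     min_gain: float = 0.01) -> Tuple[List[str], List[float]]:
    """Greedy forward selection on R-squared.

    At each step the remaining column that raises R-squared the most is
    entered; selection stops when the best gain is below min_gain.
    Returns the entry order and the R-squared after each entry.
    """
    remaining = list(range(X.shape[1]))
    chosen: List[int] = []
    history: List[float] = []
    current = 0.0
    while remaining:
        best_r2, best_j = max((r_squared(X[:, chosen + [j]], y), j) for j in remaining)
        if best_r2 - current < min_gain:
            break
        chosen.append(best_j)
        remaining.remove(best_j)
        history.append(best_r2)
        current = best_r2
    return [names[j] for j in chosen], history
```

For a cruise, X would hold the candidate variables (one column each) and y the chlorophyll a values, one row per station.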

There is a drop in the R-squared values during July and August. This can be seen more clearly in Figure 2, which graphs the R-squared values obtained for the years 1974 to 1981 inclusively against Julian day.
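Plotting against Julian day, taken here to mean day of the year as is usual in limnological plots, requires converting the cruise dates in the tables. A minimal sketch, assuming labels of the form "Mar 15 (77)A" as printed in Table 2.3, with the subset letter ignored; the function names are hypothetical.

```python
from datetime import date, datetime


def julian_day(year: int, month: int, day: int) -> int:
    """Day of the year, 1 January = 1 (leap years handled by datetime)."""
    return date(year, month, day).timetuple().tm_yday


def cruise_label_to_day(label: str, century: int = 1900) -> int:
    """Convert a table label such as 'Mar 15 (77)A' to its day of the year.

    Assumes the format 'Mon DD (YY)' with an optional subset letter after the
    year; the letter is ignored.
    """
    month_txt, day_txt, year_txt = label.split()[:3]
    month = datetime.strptime(month_txt, "%b").month
    year = century + int(year_txt.strip("()ABCD"))
    return julian_day(year, month, int(day_txt))
```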

TABLE 1.1
R2 values for regressions done on data from 1967 to 1971
Columns: Date, R2, Number of Cases, Number of Variables.
* Significant at 5% level; ** Significant at 1% level; NS, not significant.

The rating of each variable was obtained by multiplying the total number of variables available (8 for 1967 to 1972 and 9 for 1974 to 1981) by the variable's order of entry counted from the last variable entered, and then dividing by the number of variables entered into the equation, so that the first variable to enter received the highest rating. An average rating for each variable over all equations was then obtained. This was done to facilitate comparison between the importance of each variable over all equations.
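The printed ratings fix the arithmetic: an equation with three entered variables shows 8.00, 5.33 and 2.67, and one with seven shows 8.00, 6.86, 5.71, down to 1.14. A sketch consistent with those values follows, with `n_available` = 8 as the printed values imply; treating a variable that did not enter an equation as contributing zero to its average is an assumption, and the function names are hypothetical.

```python
from typing import Dict, List, Sequence


def ratings(entry_order: Sequence[str], n_available: int = 8) -> Dict[str, float]:
    """Rating of each variable entered into one equation.

    The first variable entered gets n_available, and each later one a
    proportionally smaller share: n_available * (k - rank + 1) / k, where k is
    the number of variables entered and rank is 1 for the first entry.
    """
    k = len(entry_order)
    return {var: n_available * (k - rank + 1) / k
            for rank, var in enumerate(entry_order, start=1)}


def average_ratings(equations: List[Sequence[str]], variables: Sequence[str],
                    n_available: int = 8) -> Dict[str, float]:
    """Mean rating of each variable over all equations.

    Assumption: a variable that did not enter an equation contributes 0.
    """
    totals = dict.fromkeys(variables, 0.0)
    for order in equations:
        for var, value in ratings(order, n_available).items():
            totals[var] += value
    return {var: total / len(equations) for var, total in totals.items()}
```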

TABLE 1.2
R2 values for regressions done on data from 1972 to 1973
Columns: Date, R2, Number of Cases, Number of Variables.
* Significant at 5% level; ** Significant at 1% level; NS, not significant.

The average ratings in Table 2 indicate the most important variables in each period. In the years 1967 to 1971, temperature and nitrate/nitrite nitrogen were the most important variables, entering at the 1% level of significance. From 1972 to 1973, soluble reactive phosphorus and nitrate/nitrite were the most important variables, followed by silica and secchi depth, entering at the 1% or 5% level.

TABLE 1.3
R2 values for regressions done on data from 1974 to 1977
Columns: Date (with subset A, B, C or D), R2, Number of Cases, Number of Variables.
* Significant at 5% level; ** Significant at 1% level; NS, not significant.

For the period 1974 to 1977, the most important variables were total particulate nitrogen (first, at the 1% level) and soluble reactive phosphorus (second, at the 1% level), with silica among the variables entering third. No significant trends were observed for the period 1967 to 1973. This was principally due to the numerous inconsistencies in data collection noted above, which made it virtually impossible to relate the results from this period to the 1974 to 1981 data; these years were therefore excluded from the summaries. Examination of the available data indicates that the R-squared value was lower when the number of variables available to the regression was low, and that total particulate nitrogen was the most common first variable when the R-squared value was high.

Fig. 2. R2 values for regressions done on data from 1974 to 1981.

When another variable entered first, the R-squared value for that cruise was low. Overall, however, the percentage of the variance accounted for by the equation was very high, usually between 70 and 100 percent. Based on the ratings in Table 2, the three most frequently utilized variables for estimating the chlorophyll a concentration, consistently throughout the period, were integrated total particulate nitrogen, soluble reactive phosphorus and integrated particulate organic carbon, in order of importance.
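Tables 1.1 to 1.4 mark each R-squared as significant at the 5% (*) or 1% (**) level, or NS. The test is not named in the surviving text; a common choice is the overall F test, F = (R2/k) / ((1 - R2)/(n - k - 1)) with k variables and n cases, sketched below in plain Python (the incomplete beta routine follows the standard continued-fraction method). After stepwise selection such p-values are optimistic, because the same data chose the variables.

```python
import math
from typing import Tuple


def _beta_cf(a: float, b: float, x: float, max_iter: int = 300,
             eps: float = 3e-14) -> float:
    """Continued fraction for the incomplete beta function (modified Lentz)."""
    tiny = 1e-300
    qab, qap, qam = a + b, a + 1.0, a - 1.0
    c = 1.0
    d = 1.0 - qab * x / qap
    d = 1.0 / (d if abs(d) > tiny else tiny)
    h = d
    for m in range(1, max_iter + 1):
        m2 = 2 * m
        even = m * (b - m) * x / ((qam + m2) * (a + m2))
        odd = -(a + m) * (qab + m) * x / ((a + m2) * (qap + m2))
        for aa in (even, odd):
            d = 1.0 + aa * d
            d = 1.0 / (d if abs(d) > tiny else tiny)
            c = 1.0 + aa / c
            c = c if abs(c) > tiny else tiny
            h *= d * c
        if abs(d * c - 1.0) < eps:  # last factor close to 1: converged
            break
    return h


def betainc_regularized(a: float, b: float, x: float) -> float:
    """I_x(a, b), the regularized incomplete beta function."""
    if x <= 0.0:
        return 0.0
    if x >= 1.0:
        return 1.0
    log_front = (math.lgamma(a + b) - math.lgamma(a) - math.lgamma(b)
                 + a * math.log(x) + b * math.log1p(-x))
    front = math.exp(log_front)
    if x < (a + 1.0) / (a + b + 2.0):
        return front * _beta_cf(a, b, x) / a
    return 1.0 - front * _beta_cf(b, a, 1.0 - x) / b


def r2_f_test(r2: float, n_cases: int, n_vars: int) -> Tuple[float, float]:
    """Overall F statistic and p-value for an OLS fit with intercept."""
    df1, df2 = n_vars, n_cases - n_vars - 1
    f = (r2 / df1) / ((1.0 - r2) / df2)
    # Upper tail of F(df1, df2) expressed through the incomplete beta function.
    p = betainc_regularized(df2 / 2.0, df1 / 2.0, df2 / (df2 + df1 * f))
    return f, p


def significance_flag(p: float) -> str:
    """Table notation: ** at 1%, * at 5%, otherwise NS."""
    if p < 0.01:
        return "**"
    if p < 0.05:
        return "*"
    return "NS"
```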

4. DISCUSSION

A number of papers have been published on the horizontal distribution of chlorophyll a over Lake Ontario (Chau et al., 1970; Nicholson, 1970; Glooschenko et al., 1972; Glooschenko et al., 1973; Glooschenko and Dobson, 1975; Kwiatkowski, 1978; Kwiatkowski, 1980). Few attempts have been made to relate chlorophyll a concentrations to the other physio-chemical variables routinely measured.

TABLE 1.4
R2 values for regressions done on data from 1978 to 1981
Columns: Date (with subset A or B), R2, Number of Cases, Number of Variables.
* Significant at 5% level; ** Significant at 1% level; NS, not significant.

TABLE 2.1
Rated order of variables 1967 to 1971
Columns: Date, Depth, Temp, Secchi, Silica, NO2,NO3, NH3, SRP, TP.
*Values significant at the 5% level.

Notable exceptions are Stadelmann and Munawar (1975) and Stadelmann and Fraser (1974), although these studies were restricted to a few stations located on one transect, and Kwiatkowski and El-Shaarawi (1977), who related chlorophyll a to all of the physio-chemical variables measured on the surveillance program, but for only one year, 1974. The present paper represents the first attempt to establish if a consistent pattern exists between chlorophyll a and the other variables measured on the Lake Ontario cruises, 1967-1984.

TABLE 2.2
Rated order of variables 1972 to 1973
Columns: Date, Depth, Temp, Secchi, Silica, NO2,NO3, SRP, TP, NH3.

In Lake Ontario, chlorophyll a concentration has consistently been reported as following a bimodal seasonal pattern, with spring and fall phytoplankton biomass maxima and a summer minimum. This pattern is typical of large temperate oligotrophic lakes (Hutchinson, 1967). It is interesting to note that the other biomass indicators, integrated particulate organic carbon and integrated total particulate nitrogen (a measure of living phytoplankton plus detritus), proved to be the most influential variables in explaining chlorophyll a (phytoplankton biomass) variability.

TABLE 2.3
Rated order of variables 1974 to 1977
Columns: Date (with subset A, B, C or D), Depth, Temp, Secchi, Silica, ITN, ITP, IPOC, SRP.

TABLE 2.4
Rated order of variables 1978 to 1981
Columns: Date (with subset A, B, C or D), Depth, Temp, Secchi, Silica, ITN, ITP, IPOC, SRP.

It has already been noted that the biocarbon ratio (a measure of the amount of living material relative to the amount of detritus) for the 0-20 m layer of Lake Ontario is uniform and approximately equal to 12.5%, while the summer ratio is highly variable, ranging from 25 to 50% (MacKinnon and Kwiatkowski, 1980). Obviously, the relationship of the detrital pool to living material plays an important role in the variable interrelationships found in this study. It was found that over 90% of the variation in the size of the algal biomass (as estimated by the chlorophyll a concentration) could be accounted for by a multiple regression model typically containing only three variables; namely, integrated total particulate nitrogen, soluble reactive phosphorus and integrated particulate organic carbon, in order of importance. Moreover, the coefficients in the multiple regression models, although varying within each year, appeared to be similar from year to year.
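Year-to-year similarity of the coefficients can be examined by fitting the same three-variable model separately for each year and comparing the slopes. The paper reports neither the fitted coefficients nor how similarity was judged, so the relative-spread measure below is a hypothetical choice, and the check data are synthetic.

```python
from typing import Dict, Tuple

import numpy as np


def ols_coefficients(X: np.ndarray, y: np.ndarray) -> np.ndarray:
    """Least-squares intercept followed by one slope per column of X."""
    design = np.column_stack([np.ones(len(y)), X])
    coef, *_ = np.linalg.lstsq(design, y, rcond=None)
    return coef


def coefficients_by_year(data: Dict[int, Tuple[np.ndarray, np.ndarray]]
                         ) -> Dict[int, np.ndarray]:
    """Fit the same model (same columns of X) separately for each year."""
    return {year: ols_coefficients(X, y) for year, (X, y) in data.items()}


def max_relative_spread(coefs: Dict[int, np.ndarray]) -> float:
    """Largest (max - min) / |mean| of any slope across years (intercept skipped)."""
    slopes = np.vstack(list(coefs.values()))[:, 1:]
    spread = slopes.max(axis=0) - slopes.min(axis=0)
    return float(np.max(spread / np.abs(slopes.mean(axis=0))))
```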

It was also noted that the magnitude of the squared multiple correlation coefficient was positively correlated with the size of the algal population; that is, the R-squared values were high when the population was large and low when the population was small.
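The positive association between R-squared and the size of the algal population can be quantified cruise by cruise by correlating each regression's R-squared with a summary of that cruise's chlorophyll a (for example, its mean). The statistic used in the paper is not stated; the Pearson coefficient below is an assumed choice, and the inputs would come from Tables 1.1 to 1.4 and the cruise data.

```python
from math import sqrt
from typing import Sequence


def pearson_r(xs: Sequence[float], ys: Sequence[float]) -> float:
    """Pearson product-moment correlation of two equal-length sequences."""
    n = len(xs)
    if n != len(ys) or n < 2:
        raise ValueError("need two sequences of equal length >= 2")
    mx, my = sum(xs) / n, sum(ys) / n
    sxy = sum((x - mx) * (y - my) for x, y in zip(xs, ys))
    sxx = sum((x - mx) ** 2 for x in xs)
    syy = sum((y - my) ** 2 for y in ys)
    return sxy / sqrt(sxx * syy)
```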

Low R-squared values were consistently found during the summer stratified period, with the physical rather than the nutrient variables being the most influential. Several explanations exist for this decrease. Chlorophyll a/phytoplankton biomass ratios have been found to decrease during summer stratification (e.g., Tolstoy, 1979). In Lake Ontario, good agreement between chlorophyll a and phytoplankton biomass was found in the spring, when diatoms dominated the phytoplankton community, but not in the summer, when blue-greens and greens dominated (Watson et al., 1975). Phytoplankton populations are also known to settle during summer stratification and to form a subthermocline chlorophyll peak (Steele and Yentsch, 1960). In Lake Ontario the thermocline depth ranges from 15 to 25 metres during the summer stratified period; although good agreement has been found between the chlorophyll a concentration of the epilimnion and that of the 0-20 m stratum, the integrator thus may not have adequately sampled the subthermocline chlorophyll in the water column.

Bacterial growth (total biomass) has also been found to be maximal after the diatom peak in algal biomass (chlorophyll a) (Rao et al., 1979), and Mortimer (1974) has proposed that turbulence generated by internal waves and seiches within Lake Ontario contributes to the mid-summer breakup of the major spring chlorophyll a peak through alteration of the physio-chemical environment. Obviously, more detailed observations on the chemical environment, as well as chlorophyll a, zooplankton, bacterial, physical and phytoplankton measurements, coupled with primary productivity data at selected stations at a frequency greater than monthly, will be required to resolve questions on what regulates summer phytoplankton populations.

REFERENCES

Chau, Y.K., Chawla, V.K., Nicholson, H.F. and Vollenweider, R.A. 1970. Distribution of trace elements and chlorophyll a in Lake Ontario. Proc. 13th Conf. Great Lakes Res., Internat. Assoc. Great Lakes Res. 1970: 659-672.

Department of the Environment. 1974. Analytical Methods Manual. Inland Waters Directorate, Water Quality Branch, Ottawa, Canada.

Glooschenko, W.A., Moore, J.E. and Vollenweider, R.A. 1972. The seasonal cycle of pheopigments in Lake Ontario with particular emphasis on the role of zooplankton grazing. Limnol. Oceanogr. 17: 597-605.

Glooschenko, W.A., Nicholson, H.F. and Moore, J.E. 1973. Surface distribution of chlorophyll a and total [...] in Lakes Erie and Ontario, 1970. Fish. Res. Bd. Can. Tech. Rep. No. 351.

Glooschenko, W.A. and Dobson, H.F.H. 1975. Water quality in the Great Lakes. Nature Canada, 4: 3-6.

Hutchinson, G.E. 1967. A Treatise on Limnology. Vol. 2. Introduction to lake biology and the limnoplankton. John Wiley and Sons, New York, New York.

International Joint Commission. 1983. Great Lakes Water Quality Board's 1983 Report on Great Lakes Water Quality. Appendix: Great Lakes Surveillance. 100 Ouellette Avenue, Windsor, Ontario.

International Lake Erie Water Pollution Board and International Lake Ontario-St. Lawrence Water Pollution Board. 1969. Report to the International Joint Commission. Vol. 2.

Kwiatkowski, R.E. 1978. Scenario for an ongoing chlorophyll a surveillance plan for non-intensive sampling years. J. Great Lakes Res. 4: 19-26.

Kwiatkowski, R.E. 1980. Chlorophyll a measurements from Lake Ontario, 1974-1979. Canadian Technical Report of Fisheries and Aquatic Sciences No. 933.

Kwiatkowski, R.E. and El-Shaarawi, A.H. 1977. Physicochemical surveillance data obtained on Lake Ontario, 1974 and their relationship to chlorophyll a. J. Great Lakes Res. 3: 132-143.

Kwiatkowski, R.E. and Neilson, M.A.T. 1983. Lake Ontario surveillance data, 1968-1980. Vol. 1, Summary. Technical Bulletin No. 126. Inland Waters Directorate, Environment Canada, Ottawa, Canada.

Lorenzen, C.J. 1966. A method for the continuous measurement of in vivo chlorophyll concentration. Deep Sea Res. 13: 223-227.

Lorenzen, C.J. 1967. Determination of chlorophyll and phaeopigments: spectrophotometric equations. Limnol. Oceanogr. 12: 343-346.

MacKinnon, M. and Kwiatkowski, R.E. 1980. A survey of ATP concentrations in Lake Ontario, 1975-1976. J. Great Lakes Res. 6: 177-183.

Nicholson, H.F. 1970. The chlorophyll a content of the surface water of Lake Ontario, June to November, 1967. Fish. Res. Bd. Can. Tech. Rep. No. 186.

Rao, S.S., Kwiatkowski, R.E. and Jurkovic, A.A. 1979. Distribution of bacteria and chlorophyll a at a nearshore station in Lake Ontario. Hydrobiologia, 66: 33-39.

Strickland, J.D.H. and Parsons, T.R. 1968. A practical handbook of seawater analysis. Fish. Res. Board Can. Bull. 167.

Tolstoy, A. 1979. Chlorophyll a in relation to phytoplankton volume in some Swedish lakes. Arch. Hydrobiol. 85: 133-151.

Vollenweider, R.A. 1976. Advances in defining critical loading levels for phosphorus in lake eutrophication. Mem. Ist. Ital. Idrobiol. 33: 53-83.

Watson, N.H.F., Carpenter, G.F. and Munawar, M. 1975. Problems in the monitoring of [...] biomass. In: Water Quality Parameters, ASTM STP 573. American Society for Testing and Materials, 1975: 311-319.

Watson, N.H.F. and Williams, D.J. 1975. Design and operation of a pilot surveillance program for Lake Ontario. Paper presented at the 18th Annual Conference on Great Lakes Research, International Association for Great Lakes Research, Albany, New York, 1975.

Yentsch, C.S. 1970. The state of chlorophyll in the aquatic environment. In: Prediction and Measurement of Photosynthetic Productivity. Proc. IBP/PP Tech. Meeting, Trebon. pp. 489-592.


GAMMA MARKOV PROCESSES

R.M. PHATARFOD

Monash University, Clayton, Victoria, Australia

1. INTRODUCTION AND GENERALITIES

The statistical problem of monitoring of water quality over time is essentially a study of the time series of some variable of interest (usually called a parameter) such as dissolved oxygen, suspended solids, various kinds of chemicals, organic matter and other impurities, the values being observed at various intervals of time, such as a day, a week or a month. There are two situations involved here. One is when the observations are made on the body of water, such as a lake. Here an unusually high observed value of a parameter would make us inquire as to its cause. This may lead us to check the input source, such as a river, for excessive pollution, or acid rain, etc. The other situation is when the observations are made on the input source itself, such as a river, and the problem of interest is the effect the observed values (or a statistical model of them) would have on the value of some parameter in the body of water fed by that river. An example of a problem of the latter kind is: having observed the concentration of suspended solids in the river over a period of years, we would want to know the probability distribution of the total load of that type in the lake in the future. It is with the second problem that this paper is concerned.

Let us denote the amount of water flowing into a lake during the $i$th time period by $Q_i$, and the volume of impurities (load) by $P_i$. Usually the $P_i$ would be estimated from observed values of the concentration and estimated values of $Q_i$. If the impurities are conservative, i.e. do not decay over time, we are then concerned with the sequence of concentrations $C_j$ in the lake at time $j$, where $C_j = \sum_{i=1}^{j} P_i \big/ \sum_{i=1}^{j} Q_i$. On the other hand, if organic matter is involved, the concentration $C_j$ is given by $C_j = \sum_{i=1}^{j} b^{j-i} P_i \big/ \sum_{i=1}^{j} Q_i$. We are interested in the probabilistic behaviour of the sequence $\{C_j\}$ over time; of particular interest is the probability distribution of the time $N$ such that $C_N$ crosses a threshold $T$ for the first time. Now, even if $\{P_i\}$ and $\{Q_i\}$ were sequences of independent and identically distributed random variables, $\{C_j\}$ is a sequence of dependent random variables, the dependence being of a complicated kind, and the problem of finding the distributions of $\{C_j\}$ and $N$ is a formidable one.

As a first step towards the solution of the above problems, we consider here the simpler problem of formulating a model for input variables (loads), their properties, and their mathematical tractability, particularly in the derivation of the distributions of their cumulative sums and weighted sums. In the context of water quality monitoring, we are in effect ignoring the variation of the $Q$'s and concentrating on the cumulative sums of the $P$'s.

First, let us see some broad features of the input variables. As in many geophysical phenomena, a time series of input variables may be regarded as a realization of an autocorrelated stationary process which may be approximated by a Markov process. The random variables are obviously non-negative; they are usually of the additive type (except when they are of a biological kind, in which case they are of a multiplicative type); and they are continuous rather than discrete. These considerations lead us to assume a Gamma-Markov input model. Let us now consider the properties a proposed (Gamma-Markov) model for inputs should have:

(A) First, since the purpose is to study real-life phenomena, and therefore to use historical data, one should be able to estimate the parameters of the model from the data in an efficient manner. This is particularly important for geophysical phenomena, as in most cases the historical series are very short.

(B) The model should be easy to simulate. Such a facility would allow us to study the behaviour of processes derived from the model when it is not possible to do so mathematically.
(C) The majority of hydrological phenomena have a pronounced seasonal element: it follows therefore that the model should be capable of extension to take seasonality into account. The word 'season' is used here in the wide sense. A year can be divided into 4 (natural) seasons or 12 'seasons' (or months): a week is composed of 2 'seasons' - week-days and week-ends, etc.


(D) The model should be mathematically tractable; in the present context this implies that one should be able to derive the properties of sums, weighted or otherwise, of the random variables involved.

A survey of Gamma-Markov models was given in Phatarfod (1976). The situation has changed somewhat since then. First, a new model, the Gamma Autoregressive model (Gaver and Lewis, 1980), has appeared; secondly, in comparing the properties of the various models proposed, such as simulation and estimation of parameters, the advance in computer technology has meant that what was a difficult problem for some models is a comparatively simple one now. Five Gamma-Markov models have been proposed, three of them in a hydrological context (to be sure, for water quantity, not quality) and two as models for point processes; the latter two are mathematically tractable (they would have to be, or otherwise they wouldn't have been proposed!). Phatarfod (1976) has given a description of the models by Thomas and Fiering, Yevdjevich, and Moran. It was shown that while these models have most of the properties (A) to (C) above, they do not have property (D). A description of the Linearly Regressive Gamma model and the properties of its cumulative sums was also given there. What we consider here is the problem of simulation of that model, its seasonal extension, and the properties of the cumulative sums, weighted or otherwise, of the seasonal extension. It should be mentioned that it is possible to obtain maximum likelihood estimates (M.L.E.) of the parameters of the models; for reasons of space these are given elsewhere (see Phatarfod, 1985). For the same reason, the properties of the Gamma Autoregressive model are not given. It can be shown that that model cannot be extended to take seasonality into account.

Let us now define our variables of interest by $X_i$, where $\{X_i\}$ forms a first-order Markov chain and, in the equilibrium condition, $X$ has a gamma distribution. First consider the case when $X$ represents impurities (loads) in the form of organic matter, i.e. the non-conservative case. It is then of interest to study the distribution of $Y_n = X_n + bX_{n-1} + \cdots + b^nX_0$, where $0 < b < 1$ and $b$ represents the rate of decay.

Assuming that the process has been going on for a long time, we replace this by

$$Y_n = \sum_{r=0}^{\infty} b^r X_{n-r}.$$

This is considered in Section 4 for the (seasonal) Linearly Regressive model. Secondly, consider the more difficult case when $X$ represents conservative matter. The cumulative sum $\sum X_i$ must increase over time, and what is of interest is the first time $N_T$ such that $\sum_{i=1}^{N_T} X_i \ge T$, i.e. the cumulative sum crosses the threshold $T$. For example, $X$ may be the volume of suspended particles, and $T$ the volume such that $\sum X_i \ge T$ implies that the outlets of a reservoir are blocked; $N_T$ is then the lifetime of that reservoir. When the $X_i$ are i.i.d., the asymptotic cumulants of $N_T$ are most easily obtained by applying Wald's Identity (see e.g. Cox and Miller, 1965). Ignoring the overshoot over the barrier $T$, we have

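$$E\left[e^{-tT}\{f^*(t)\}^{-N_T-1}\right] = 1, \qquad (1)$$

where $f^*(t)$ is the Laplace Transform (L.T.) of each $X_i$. Therefore, if $N_T$ is large so that $N_T+1$ can be replaced by $N_T$, we have

$$\log_e E\left[e^{-sN_T}\right] = tT, \qquad (2)$$

where $s = \log f^*(t)$. The left-hand side of (2) is the cumulant generating function (g.f.) of $N_T$, and $s = \log f^*(t)$ is that of $X$; hence the above two equations show that the cumulant g.f.'s of $N_T$ and of $X$ are inverse functions. Inverting (2), i.e. expressing $t$ as a power series in $s$ through $s = \log f^*(t)$, we have $\log_e E[e^{-sN_T}] = T\,t(s)$. Writing $\log f^*(t) = \sum_r \kappa_r(-t)^r/r!$, where $\kappa_1, \kappa_2, \ldots$ are the cumulants of $X$, it then follows that the first four cumulants of $N_T$ are asymptotically

$$E(N_T) = \frac{T}{\kappa_1}, \qquad \mathrm{Var}(N_T) = \frac{\kappa_2 T}{\kappa_1^3}, \qquad \kappa_3(N_T) = \frac{(3\kappa_2^2 - \kappa_1\kappa_3)T}{\kappa_1^5}, \qquad \kappa_4(N_T) = \frac{(15\kappa_2^3 - 10\kappa_1\kappa_2\kappa_3 + \kappa_1^2\kappa_4)T}{\kappa_1^7}. \qquad (3)$$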
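Equation (3) is straightforward to evaluate once the $\kappa_r$ are known. The Python sketch below is our own illustration, not the author's code, and its function names are hypothetical. It applies (3) to i.i.d. Gamma$(p,\alpha)$ loads, whose cumulants are $\kappa_r = p\alpha^r(r-1)!$.

```python
from math import factorial


def first_passage_cumulants(kappa, T):
    """Asymptotic first four cumulants of N_T from eq. (3).

    kappa: (k1, k2, k3, k4), the cumulants of X (or, for a Markov chain,
           the coefficients of log lambda(t)); T: threshold on the cumulative sum.
    """
    k1, k2, k3, k4 = kappa
    return (
        T / k1,
        k2 * T / k1**3,
        (3 * k2**2 - k1 * k3) * T / k1**5,
        (15 * k2**3 - 10 * k1 * k2 * k3 + k1**2 * k4) * T / k1**7,
    )


def gamma_cumulants(p, alpha):
    """Cumulants of Gamma(p, alpha), whose L.T. is (1 + t*alpha)**(-p)."""
    return tuple(p * alpha**r * factorial(r - 1) for r in range(1, 5))


# i.i.d. gamma loads: shape p = 2, scale alpha = 1.5, threshold T = 300
print(first_passage_cumulants(gamma_cumulants(2.0, 1.5), 300.0))
```

This prints `(100.0, 50.0, 25.0, 12.5)`. As a sanity check, with $p = 1$ (exponential loads) the underlying renewal process is Poisson, and all four values reduce to $T/\alpha$.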

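Now it is shown in Phatarfod (1971a) that an analogue of Wald's Identity holds for $\{X_i\}$ forming a Markov chain for which

$$E\left[\exp\left(-t\sum_{i=1}^{n} X_i\right)\right] \sim D(t)\,\lambda^n(t),$$

in which case $f^*(t)$ is replaced by $\lambda(t)$. If now $\log\lambda(t)$ can be expressed as $\sum_r \kappa_r(-t)^r/r!$ (where the $\kappa_r$ need not be cumulants), the cumulants of $N_T$ are given by (3).

2. THE LINEARLY REGRESSIVE MODEL

Let $X_i$ $(i = 0, 1, 2, \ldots)$ be a sequence of random variables such that the conditional L.T. of $X_{i+1}$ given $X_i = x_i$ is

$$L_{X_{i+1}}(t\,|\,x_i) = [1+t\alpha(1-\rho)]^{-p}\exp\left[-\frac{\rho t x_i}{1+t\alpha(1-\rho)}\right], \qquad \alpha, p > 0, \quad 0 < \rho < 1. \qquad (5)$$

From this, the conditional density of $X_{i+1}$ given $X_i = x_i$ is

$$f(x_{i+1}\,|\,x_i) = \frac{1}{\alpha(1-\rho)}\left(\frac{x_{i+1}}{\rho x_i}\right)^{(p-1)/2}\exp\left[-\frac{\rho x_i + x_{i+1}}{\alpha(1-\rho)}\right] I_{p-1}\!\left(\frac{2\sqrt{\rho x_i x_{i+1}}}{\alpha(1-\rho)}\right),$$

where $I_r$ is the modified Bessel function of the first kind of order $r$. The equilibrium distribution of $X$ is a Gamma$(p,\alpha)$ distribution, with L.T. $L_X(t) = (1+t\alpha)^{-p}$. From (5) we have

$$E(X_{i+1}\,|\,x_i) = \rho x_i + (1-\rho)p\alpha, \qquad V(X_{i+1}\,|\,x_i) = 2\rho\alpha(1-\rho)x_i + p\alpha^2(1-\rho)^2, \qquad \mathrm{Corr}(X_i, X_{i+j}) = \rho^j, \quad 0 \le \rho < 1.$$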

Denoting $\rho x_i/\{\alpha(1-\rho)\}$ by $\lambda$, we have

$$L_{X_{i+1}}(t\,|\,x_i) = e^{-\lambda}\sum_{r=0}^{\infty}\frac{\lambda^r}{r!}\,[1+t\alpha(1-\rho)]^{-(p+r)}. \qquad (6)$$

Eq. (6) shows that given $X_i = x_i$, $X_{i+1}$ is a Poisson-Gamma mixture, i.e. $X_{i+1}$ has a $G(U+p,\ \alpha(1-\rho))$ distribution, where $U$ is a Poisson variable with mean $\lambda$. This result allows us to simulate the sequence $x_0, x_1, x_2, \ldots$ as follows. Generate a value of a Poisson random variable (e.g. by using the IMSL GGPOS (1980) subroutine) with mean $\lambda = \rho x_0/\{\alpha(1-\rho)\}$; call it $u$. Now generate a value of a gamma random variable (e.g. by using the IMSL GGAMS (1980) subroutine) with parameters $u+p$, $\alpha(1-\rho)$. The value so obtained is $x_1$. This procedure is repeated $n$ times to give the sequence $x_0, x_1, x_2, \ldots, x_n$.
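The simulation recipe above can be sketched in Python, with NumPy's `Generator.poisson(lam)` and `Generator.gamma(shape, scale)` standing in for the IMSL GGPOS and GGAMS routines. That substitution, the function name, and starting $x_0$ from the equilibrium Gamma$(p,\alpha)$ distribution are our assumptions, not part of the original procedure.

```python
import numpy as np


def simulate_lr_gamma(n, p, alpha, rho, x0=None, rng=None):
    """Simulate x_0, ..., x_n from the Linearly Regressive Gamma model.

    Uses the Poisson-Gamma mixture of eq. (6): given x_i, draw
    U ~ Poisson(rho * x_i / (alpha * (1 - rho))) and then
    x_{i+1} ~ Gamma(shape = U + p, scale = alpha * (1 - rho)).
    """
    rng = np.random.default_rng() if rng is None else rng
    scale = alpha * (1.0 - rho)
    x = np.empty(n + 1)
    # Start from the equilibrium Gamma(p, alpha) law unless x0 is supplied.
    x[0] = rng.gamma(p, alpha) if x0 is None else x0
    for i in range(n):
        u = rng.poisson(rho * x[i] / scale)   # Poisson mixing variable
        x[i + 1] = rng.gamma(u + p, scale)    # gamma draw given u
    return x


rng = np.random.default_rng(1985)
x = simulate_lr_gamma(50_000, p=2.0, alpha=1.5, rho=0.6, rng=rng)
# Sample mean should be near p*alpha = 3.0, lag-1 correlation near rho = 0.6.
print(x.mean(), np.corrcoef(x[:-1], x[1:])[0, 1])
```

Because each step needs only one Poisson and one gamma draw, the cost is linear in $n$; the Bessel-function density is never evaluated.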

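The L.T. of the sum $S_n = \sum_{i=1}^{n} X_i$ for a given $X_0 = x_0$ is given by (see Phatarfod, 1971b)

$$L_{S_n}(t\,|\,x_0) = \left\{A\mu_1^n(t) + B\mu_2^n(t)\right\}^{-p},$$

where

$$\mu_{1,2}(t) = \tfrac{1}{2}\left[1+\rho+t\alpha(1-\rho) \pm \left\{[1+\rho+t\alpha(1-\rho)]^2 - 4\rho\right\}^{1/2}\right].$$

For large $n$ we have

$$L_{S_n}(t\,|\,x_0) \sim D\,\mu_1^{-np}(t), \qquad (7)$$

from which we obtain $\lambda(t) = \mu_1^{-p}(t)$. Expanding $\log\lambda(t)$ gives the coefficients $\kappa_r$ and hence, from (3), the cumulants of $N_T$.

3. LINEARLY REGRESSIVE SEASONAL MODEL

We first consider the case of two seasons. The sequence $X_{1i}, X_{2i}$ $(i = 0, 1, 2, \ldots)$ of random variables forms a cyclic Markov sequence, with the transitions $X_{2i} \to X_{1,i+1}$ and $X_{1i} \to X_{2i}$ given by the conditional L.T.'s as follows:

$$L_{X_{1,i+1}}(t\,|\,X_{2i} = y) = [1+t\alpha_1(1-\rho_2)]^{-p}\exp\left[-\frac{t\rho_2\alpha_1 y}{\alpha_2 + t\alpha_1\alpha_2(1-\rho_2)}\right] \equiv L(\alpha_1, \alpha_2, \rho_2\,|\,y), \qquad (8)$$

$$L_{X_{2i}}(t\,|\,X_{1i} = x) = L(\alpha_2, \alpha_1, \rho_1\,|\,x). \qquad (9)$$

The equilibrium distributions of $X_1$, $X_2$ are gamma with parameters $(p, \alpha_1)$ and $(p, \alpha_2)$ respectively. Also, $\mathrm{Corr}(X_{1i}, X_{2i}) = \rho_1$ and $\mathrm{Corr}(X_{2i}, X_{1,i+1}) = \rho_2$. Equations (8), (9) can be written as

$$L_{X_{1,i+1}}(t\,|\,X_{2i} = y) = e^{-\lambda_2}\sum_{r=0}^{\infty}\frac{\lambda_2^r}{r!}[1+t\alpha_1(1-\rho_2)]^{-(p+r)}, \qquad L_{X_{2i}}(t\,|\,X_{1i} = x) = e^{-\lambda_1}\sum_{r=0}^{\infty}\frac{\lambda_1^r}{r!}[1+t\alpha_2(1-\rho_1)]^{-(p+r)},$$

where $\lambda_1 = \rho_1 x/\{\alpha_1(1-\rho_1)\}$ and $\lambda_2 = \rho_2 y/\{\alpha_2(1-\rho_2)\}$.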

Thus the conditional distributions are Poisson-Gamma mixtures. Given $X_{2i} = y$, $X_{1,i+1}$ has a $G(U+p,\ \alpha_1(1-\rho_2))$ distribution, where $U$ is a Poisson variable with mean $\rho_2 y/\{\alpha_2(1-\rho_2)\}$; similarly, given $X_{1i} = x$, $X_{2i}$ has a $G(V+p,\ \alpha_2(1-\rho_1))$ distribution, where $V$ is a Poisson variable with mean $\rho_1 x/\{\alpha_1(1-\rho_1)\}$. From this, the sequence $x_{11}, x_{21}, x_{12}, x_{22}, \ldots$ can be generated in a manner similar to that for the non-seasonal case. As in the non-seasonal case, the particular form of the conditional L.T.'s and the Markov dependence can be exploited to derive the L.T. of the sum $S_n = \sum_{i=1}^{n}(X_{1i} + X_{2i})$. The derivation is somewhat cumbersome and is given in Phatarfod (1985). We have $L_n(t) = E[\exp(-tS_n)] \sim D\,\lambda^n(t)$ for large $n$, with $\lambda(t)$ as given there.

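Moreover, expanding $\log\lambda(t)$ gives us (for simplicity, we consider only the mean and the variance)

$$\kappa_1 = p(\alpha_1 + \alpha_2), \qquad \kappa_2 = \frac{p\left[(\alpha_1^2 + \alpha_2^2)(1 + \rho_1\rho_2) + 2\alpha_1\alpha_2(\rho_1 + \rho_2)\right]}{1 - \rho_1\rho_2},$$

so that, from (3),

$$E(N_T) = \frac{T}{p(\alpha_1 + \alpha_2)}, \qquad \mathrm{Var}(N_T) = \frac{\left[(\alpha_1^2 + \alpha_2^2)(1 + \rho_1\rho_2) + 2\alpha_1\alpha_2(\rho_1 + \rho_2)\right]T}{(1 - \rho_1\rho_2)\,p^2(\alpha_1 + \alpha_2)^3}.$$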
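The two-season generation scheme described earlier in this section can be sketched in the same way as the non-seasonal one. As before, NumPy's Poisson and gamma generators replace the IMSL routines, and the function name and the equilibrium start for $x_{10}$ are our assumptions.

```python
import numpy as np


def simulate_two_season(n, p, alpha1, alpha2, rho1, rho2, rng=None):
    """Simulate (X_{1i}, X_{2i}), i = 0..n-1, of the two-season model.

    X_{2i}    | X_{1i} = x : Gamma(V + p, alpha2*(1-rho1)), V ~ Poisson(rho1*x/(alpha1*(1-rho1)))
    X_{1,i+1} | X_{2i} = y : Gamma(U + p, alpha1*(1-rho2)), U ~ Poisson(rho2*y/(alpha2*(1-rho2)))
    """
    rng = np.random.default_rng() if rng is None else rng
    x1 = np.empty(n)
    x2 = np.empty(n)
    x1[0] = rng.gamma(p, alpha1)  # equilibrium Gamma(p, alpha1) start
    for i in range(n):
        v = rng.poisson(rho1 * x1[i] / (alpha1 * (1 - rho1)))
        x2[i] = rng.gamma(v + p, alpha2 * (1 - rho1))
        if i + 1 < n:
            u = rng.poisson(rho2 * x2[i] / (alpha2 * (1 - rho2)))
            x1[i + 1] = rng.gamma(u + p, alpha1 * (1 - rho2))
    return x1, x2


rng = np.random.default_rng(3)
x1, x2 = simulate_two_season(40_000, 2.0, 1.0, 3.0, 0.5, 0.7, rng=rng)
# Means near p*alpha1 = 2 and p*alpha2 = 6; correlations near rho1 = 0.5, rho2 = 0.7.
print(x1.mean(), x2.mean(),
      np.corrcoef(x1, x2)[0, 1], np.corrcoef(x2[:-1], x1[1:])[0, 1])
```

Empirical means and within-cycle correlations from such a run give a quick check on the stated equilibrium distributions and on $\mathrm{Corr}(X_{1i}, X_{2i}) = \rho_1$, $\mathrm{Corr}(X_{2i}, X_{1,i+1}) = \rho_2$.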

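For the three-seasons case, the transitions of the sequence $X_{1i}, X_{2i}, X_{3i}$ are given by the conditional L.T.'s

$$L_{X_{1,i+1}}(t\,|\,X_{3i} = z) = L(\alpha_1, \alpha_3, \rho_3\,|\,z), \qquad L_{X_{2i}}(t\,|\,X_{1i} = x) = L(\alpha_2, \alpha_1, \rho_1\,|\,x), \qquad L_{X_{3i}}(t\,|\,X_{2i} = y) = L(\alpha_3, \alpha_2, \rho_2\,|\,y),$$

with the equilibrium distribution of $X_i$ being Gamma$(p, \alpha_i)$. Also, $\mathrm{Corr}(X_i, X_{i+1}) = \rho_i$ $(i = 1, 2)$ and $\mathrm{Corr}(X_3, X_1) = \rho_3$. Using the results from Phatarfod (1985) we obtain

$$\kappa_1 = p\sum_{i=1}^{3}\alpha_i$$

and

$$\kappa_2 = \frac{p\left[(1 + \rho_1\rho_2\rho_3)\sum_{i=1}^{3}\alpha_i^2 + 2\alpha_1\alpha_2(\rho_1 + \rho_2\rho_3) + 2\alpha_2\alpha_3(\rho_2 + \rho_1\rho_3) + 2\alpha_1\alpha_3(\rho_3 + \rho_1\rho_2)\right]}{1 - \rho_1\rho_2\rho_3}.$$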
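The mean and variance formulas of this section can be checked for internal consistency. In the sketch below (our own; function names hypothetical) the non-seasonal $\kappa_1, \kappa_2$ are obtained by numerically differentiating $\log\lambda(t) = -p\log\mu_1(t)$ from Section 2. Our own series expansion gives $\kappa_1 = p\alpha$ and $\kappa_2 = p\alpha^2(1+\rho)/(1-\rho)$, values not stated in the text. When $\alpha_1 = \alpha_2$ and $\rho_1 = \rho_2$, a two-season (three-season) cycle is just two (three) steps of the non-seasonal chain, so the seasonal $\kappa$'s should be two (three) times the non-seasonal ones.

```python
import math


def lr_log_lambda(t, p, alpha, rho):
    """log lambda(t) = -p * log mu_1(t) for the non-seasonal model."""
    c = 1.0 + rho + t * alpha * (1.0 - rho)
    mu1 = 0.5 * (c + math.sqrt(c * c - 4.0 * rho))  # larger root
    return -p * math.log(mu1)


def lr_kappas_numeric(p, alpha, rho, h=1e-3):
    """kappa_1, kappa_2 from log lambda(t) = -kappa_1 t + kappa_2 t^2/2 - ..."""
    def f(t):
        return lr_log_lambda(t, p, alpha, rho)
    k1 = -(f(h) - f(-h)) / (2.0 * h)                # central first difference
    k2 = (f(h) - 2.0 * f(0.0) + f(-h)) / h**2       # central second difference
    return k1, k2


def two_season_kappas(p, a1, a2, r1, r2):
    k1 = p * (a1 + a2)
    k2 = p * ((a1**2 + a2**2) * (1 + r1 * r2) + 2 * a1 * a2 * (r1 + r2)) / (1 - r1 * r2)
    return k1, k2


def three_season_kappas(p, a, r):
    a1, a2, a3 = a
    r1, r2, r3 = r
    k1 = p * (a1 + a2 + a3)
    k2 = p * ((1 + r1 * r2 * r3) * (a1**2 + a2**2 + a3**2)
              + 2 * a1 * a2 * (r1 + r2 * r3)
              + 2 * a2 * a3 * (r2 + r1 * r3)
              + 2 * a1 * a3 * (r3 + r1 * r2)) / (1 - r1 * r2 * r3)
    return k1, k2


def mean_var_NT(k1, k2, T):
    """Leading terms of E(N_T) and Var(N_T) from eq. (3)."""
    return T / k1, k2 * T / k1**3


p, alpha, rho = 2.0, 1.5, 0.6
print(lr_kappas_numeric(p, alpha, rho))                  # about (3.0, 18.0)
print(two_season_kappas(p, alpha, alpha, rho, rho))      # twice the above
print(three_season_kappas(p, (alpha,) * 3, (rho,) * 3))  # three times
print(mean_var_NT(*two_season_kappas(p, 1.0, 3.0, 0.5, 0.7), T=600.0))
```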

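4. WEIGHTED SUM OF SEASONAL GAMMA VARIABLES

For simplicity, we consider here the case of two seasons only. Let $X_{1i}, X_{2i}$ $(i = 0, 1, 2, \ldots)$ be the sequence as given in Section 3 and let $Y_{2n} = X_{2n} + bX_{1n} + b^2X_{2,n-1} + b^3X_{1,n-1} + \cdots$ be the corresponding weighted sum.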

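Now, from (11) we have, with $Q_0 = 0$,

$$Q_{i+1} = b^{2i+1} + \frac{\alpha_2\rho_1}{\alpha_1}R_i, \qquad R_i = b^{2i} + \frac{\alpha_1\rho_2}{\alpha_2}Q_i, \qquad (13)$$

from which, by taking g.f.'s $Q(z) = \sum_i Q_i z^i$ and $R(z) = \sum_i R_i z^i$, we obtain

$$Q(z) = \frac{z(\alpha_1 b + \alpha_2\rho_1)}{\alpha_1(1 - b^2z)(1 - \rho_1\rho_2 z)}, \qquad R(z) = \frac{\alpha_2 + b\alpha_1\rho_2 z}{\alpha_2(1 - b^2z)(1 - \rho_1\rho_2 z)},$$

$$q = Q(1) = \frac{\alpha_1 b + \alpha_2\rho_1}{\alpha_1(1 - b^2)(1 - \rho_1\rho_2)}; \qquad r = R(1) = \frac{\alpha_2 + b\alpha_1\rho_2}{\alpha_2(1 - b^2)(1 - \rho_1\rho_2)}. \qquad (14)$$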
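The recursion (13) and the closed forms (14) can be cross-checked numerically. The sketch below is our own check, not the paper's method. It also computes the mean and variance of $Y_{2n}$ by brute-force summation of covariances, assuming (as follows from the linear regressions and the Markov property) that the correlation between two terms is the product of the intervening one-step correlations $\rho_1, \rho_2$. For $\alpha_1 = \alpha_2 = \alpha$, $\rho_1 = \rho_2 = \rho$ the variance should match the special case $p\alpha^2(1+b\rho)/[(1-b^2)(1-\rho b)]$ quoted at the end of this section.

```python
def weighted_sum_moments(p, a1, a2, r1, r2, b, terms=200):
    """Check (13)-(14) and compute E and Var of Y_{2n} by truncated summation.

    Position j = 0, 1, 2, ... (going back in time from X_{2n}) has weight b**j;
    even j is a season-2 variable, odd j a season-1 variable.
    """
    # Recursion (13), started from Q_0 = 0.
    Q, R = [0.0], []
    for i in range(terms):
        R.append(b ** (2 * i) + a1 * r2 * Q[i] / a2)
        Q.append(b ** (2 * i + 1) + a2 * r1 * R[i] / a1)
    q_closed = (a1 * b + a2 * r1) / (a1 * (1 - b**2) * (1 - r1 * r2))
    r_closed = (a2 + b * a1 * r2) / (a2 * (1 - b**2) * (1 - r1 * r2))

    m = 2 * terms                                   # positions kept in the sum
    scale = [a2 if j % 2 == 0 else a1 for j in range(m)]
    link = [r1 if j % 2 == 0 else r2 for j in range(m)]  # corr(position j, j+1)
    mean = sum(b**j * p * scale[j] for j in range(m))
    var = 0.0
    for j in range(m):
        var += b ** (2 * j) * p * scale[j] ** 2     # diagonal term
        corr = 1.0
        for k in range(j + 1, m):
            corr *= link[k - 1]                     # Markov: correlations multiply
            var += 2 * b ** (j + k) * p * scale[j] * scale[k] * corr
    return (sum(Q), q_closed), (sum(R), r_closed), mean, var


(qs, qc), (rs, rc), mean, var = weighted_sum_moments(2.0, 1.0, 3.0, 0.5, 0.7, 0.8)
print(qs, qc, rs, rc, mean)  # recursion sums agree with (14); mean = p(a2 + b a1)/(1 - b^2)
```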

Substituting in (12), we obtain

$$\mu = \frac{p(\alpha_2 + \alpha_1 b)}{1 - b^2} = \frac{p\alpha_1}{1 - b} \quad \text{when } \alpha_1 = \alpha_2.$$

To obtain $\mathrm{Var}(Y_{2n}) = \sigma^2$, we have,

on differentiating $\log L$ twice with respect to $t$ at $t = 0$, an expression (15) for $\sigma^2$ in terms of the sums

$$f = \sum F_i, \qquad s = \sum S_i, \qquad h = \sum R_i^2, \qquad t = \sum Q_i^2,$$

where $F_i = K_i''(0)$ and $S_i = J_i''(0)$. From (11) we have recurrence relations linking $F_i$, $S_i$, $Q_i^2$ and $R_i^2$; summing over $i$, solving for $s$ and $f$ in terms of $h$ and $t$, and substituting in (15), we obtain (16). Also, from (13) we have, on squaring and taking g.f.'s,

$$t = \frac{b^2}{1 - b^4} + \frac{\alpha_2^2\rho_1^2}{\alpha_1^2}\,h + \frac{2\alpha_2\rho_1 b}{\alpha_1}\,R(b^2). \qquad (17)$$

Using (14), (16) and (17) we finally obtain $\sigma^2$. For the case $\alpha_1 = \alpha_2 = \alpha$, $\rho_1 = \rho_2 = \rho$, $\sigma^2$ reduces to

$$\sigma^2 = \frac{p\alpha^2(1 + b\rho)}{(1 - b^2)(1 - \rho b)}.$$

This final result was obtained by Lloyd and Warren (1981) using different methods.

REFERENCES

Cox, D.R. and Miller, H.D., 1965. The Theory of Stochastic Processes. Methuen & Co., London.
Gaver, D.P. and Lewis, P.A.W., 1980. First-order autoregressive gamma sequences and point processes. Adv. Appl. Prob., 12: 727-745.
IMSL/GGAMS, 1980. International Mathematical and Statistical Libraries, Houston, Texas, Vol. 2, 8th ed.
IMSL/GGPOS, 1980. International Mathematical and Statistical Libraries, Houston, Texas, Vol. 2, 8th ed.
Lloyd, E.H. and Warren, D., 1981. The linear reservoir with seasonal gamma-distributed Markovian inflows. In: A.H. El-Shaarawi and S.R. Esterby (Editors), Time Series Methods in Hydrosciences. Elsevier, Amsterdam.
Pegram, G.G.S., 1974. Factors affecting draft from a Lloyd reservoir. Water Resour. Res., 10: 63-66.
Phatarfod, R.M., 1971a. Some approximate results in renewal and dam theories. J. Austral. Math. Soc., 12: 425-432.
Phatarfod, R.M., 1971b. Sequential tests for normal Markov sequence. J. Austral. Math. Soc., 12: 433-440.
Phatarfod, R.M., 1976. Some aspects of stochastic reservoir theory. J. Hydrol., 30: 199-217.
Phatarfod, R.M., 1985. The linearly regressive Gamma process. Stats. Res. Report No. 130, Dept. of Mathematics, Monash University, Australia.

DYNAMIC COVARIATE ADJUSTMENT OF WATER QUALITY PARAMETERS FOR STREAMFLOW: TRANSFER FUNCTION MODEL SELECTION

LARRY D. HAUGH, YOSUKE NODA, JOAN McCLALLEN

University of Vermont

ABSTRACT

Weekly concentration data for total phosphorus and total suspended solids, as well as mean weekly flow, were studied for a period of 229 weeks (mid 1979 to late 1983) at one station in the LaPlatte River Watershed of Vermont. The autocorrelation patterns in these data and their stationarity in the mean are described via selection of univariate ARIMA time series models. The relationship of the concentration data to the flow data is modelled by selection of a transfer function model. Contrasts are drawn between stationary and nonstationary (first order differenced) models. The relative advantage of the flow-dependent model for understanding trends in concentration is considered. The model provides a dynamic adjustment, or covariate analysis, of concentration changes due to flow, which would be critical in assessing intervention effects. Such a model also captures seasonal variation in concentration via the seasonal variation in flow.

This study focuses on the relationship between water quality parameters measured in a stream, such as suspended solids, phosphorus and nitrogen concentrations, and the stream's flow rate. Our objective was statistical in nature, in that we wished to assess whether a standard time series transfer function model would be adequate to describe such relationships.

The water quality parameters mentioned are examples of outcome variables that would be considered in many water quality monitoring programs. The particular data discussed here arose in a long-term water monitoring project in the LaPlatte River watershed of Vermont. Of interest in such studies are short- and long-term trends in water quality, as well as the effects of various changes in the environment or of man-made interventions, due to changes in agricultural or industrial practices. Of course, to properly assess the effect of interventions or understand the reasons for trends in water quality, one must first account for the changes in such quality variables which would just be due to natural changes in the weather or environment. In particular, the concentration of various elements in a stream will be a function of the stream's flow rate. As the flow rate changes with rainfall or with natural alterations in runoff patterns, the concentration of various elements in the stream will change, quite apart from (or in addition to) man-made interventions. Thus any statistical study of causal effects in water quality must account for such a time-varying covariate as streamflow (Kirsch, et al., 1982, emphasize this, for example). Of course some interventions by their nature may also alter total streamflow or the timing of appearance of water in the streambed, and such effects would require separate analysis to clarify the confounding that would then exist with other causes of flow changes. In the Vermont study the intervention of primary concern was not expected to alter streamflow directly, but may alter concentrations for other reasons. It is hoped that such transfer function analyses of water quality parameters will help advance the state of the art in assessing trends and intervention.

Although such analytical techniques have been available for some time (Box and Jenkins, 1976) and have been applied in hydrological or water quality work in other contexts (Snorrason, et al., 1984; …, et al., 1985, and references therein), they are not used as a standard methodology in this context (see McLeod, et al., 1983 and Damsleth, 1986, though, for related studies). Also, the inclusion of flow rate explicitly as a variable in the analysis of water quality parameters may serve to simplify the analysis of seasonality. That is, much of the seasonality in elemental concentrations is directly due to, or at least confounded with, changes in flow rate through the seasons of the year. Rather than forcing more arbitrary seasonal adjustments on concentration levels, which typically would somehow average out flow patterns across years, it could be better to use models incorporating flow explicitly via a stochastic transfer function model, or perhaps a regression model in special cases.

In the following section the particular Vermont time series data to be analyzed will be summarized. Then the statistical methods to be used will be placed in perspective. The results of univariate time series analysis of total suspended solids (TSS), total phosphorus (TP) and streamflow will be given. The relationship of TSS and TP to streamflow will be summarized via cross correlation analysis and selection of transfer function models to be fit. Finally, more general remarks will be given in the discussion section.

LAPLATTE RIVER WATERSHED PROJECT

The LaPlatte River watershed lies in southern Chittenden County in northwestern Vermont, USA. It is a major nonpoint source of phosphorus and sediment, which has resulted in poor water quality in the river and in Shelburne Bay of Lake Champlain, into which it empties. Fifty percent of the watershed is used for agricultural purposes, with considerable acreage in cropland and large numbers of cattle in the watershed. Cropland erosion and poor management of animal manure were felt to be significant nonpoint source contributors to these water quality problems. The data used in this study were taken within the Mud Hollow Brook subwatershed area, the brook itself being characterized by high turbidity. This corresponds to Station No. 2 within the water quality monitoring program of the LaPlatte River Watershed Project of the Vermont Water Resources Research Center of the University of Vermont, which is sponsored by the USDA Soil Conservation Service in cooperation with other local agencies within Vermont. See Figure 1. The reports of the project (e.g. Meals, 1983) should be seen for further details of the watershed area, sampling locations, procedures used in obtaining the quality and water flow measurements, as well as the motivation for establishing best management practices (BMP) to help control manure runoff in the area. The 1983 report serves as the source for the general comments made here, and the scope of the project is summarized in Meals, 1985.

Fig. 1. Location of the LaPlatte River watershed. The subwatershed corresponding to Monitoring Station No. 2 is indicated.

Water quality monitoring for the project began in Spring 1979. The purposes of the monitoring program include evaluating the impact of implementing conservation practices on the surface waters of the watershed and on the export of sediment and nutrients to Shelburne Bay. Five automated stations were established in the watershed for these purposes, at which sediment, phosphorus, nitrogen and other pollutants are determined on a regular time-sampling basis. The sampling program is part of an eleven-year study (1979-1990). The Mud Hollow Brook Station (No. 2) monitors a subwatershed of about 4000 acres. The soils in the area are predominantly lacustrine clay, and the area contains a variety of tillage and manure management practices, with a high level of farmer participation in BMP implementation. The only major point source is a wastewater treatment plant, which is downstream of this station. Stream stage was measured at Station 2 by bubbler-type recorders, and discharge data were calculated using stage-discharge ratings developed earlier and periodically updated from manual gauging data.

Samples for quality variables were taken automatically every eight hours and composited in the laboratory to yield four 24-hour samples and one 72-hour sample each week. Analyses for total phosphorus, total suspended solids and total Kjeldahl nitrogen were performed in the University Water Quality Laboratory according to accepted analytical techniques (APHA standard methods). In the statistical studies which served as a basis for our study, both the concentration and mass of the variables total phosphorus (TP), total suspended solids (TSS), and total Kjeldahl nitrogen (TKN) were considered. The measurement units for concentration are milligrams per liter. The mean weekly flow (MWF) is measured in cubic feet per second. Mass, in units of pounds per week, is derived from concentration on an aggregated weekly basis as being proportional to the product of concentration and flow. Because of space limitations the emphasis in this paper will be on the relationship of both TSS and TP to MWF. TKN was also analyzed similarly, but there were long stretches of missing values, and the results are not given here. The total data available for this study (TSS, TP, MWF) spanned the time period 1979-1983, for 229 consecutive weeks.

OVERVIEW OF STATISTICAL ANALYSIS

Water quality and flow data such as these have a substantial inherent variance, are subject to extreme events (rainstorms, etc.), and are subject to sampling and measurement errors. Thus a great deal of initial data editing is a prerequisite to a formal multivariate time series analysis. Unfortunately there is also a potential for missing values in any of the variables, due for example to automatic samplers not functioning. A standard regression analysis is not affected by these as much as a transfer function fit, because of the time lags inherent in the latter models, and a method for supplying missing values consistent with the autocorrelated model being fit would be the best practice. Fortunately, in the actual models we fit to the data the extent of autocorrelation was weak (because we are using weekly values), so that occasional missing values could more safely be supplied by interpolation.

Based on a variety of statistical tools, including time plots, the autocorrelation function (ACF), the partial autocorrelation function (PACF), and consideration of time differencing and transformation of the values, a univariate time series model was identified, fit and checked for each of the variables (Box and Jenkins, 1976). Various extensions (for shifts in level, seasonality and trend) of the basic ARIMA models often are adequate to consider for this purpose. The models are fit and checked using an efficient statistical estimation method, which in our case was the maximum likelihood method approximation available in the BMDP2T program of BMDP (Dixon, 1981). Cross correlation of the univariate residuals from each series fit can be used as an aid to identification of a transfer function model relating quality variables to flow (Haugh, 1976; Haugh and Box, 1977). The cross correlation approach available in BMDP follows that outlined in Box and Jenkins (1976), in which the univariate model form of the flow series is used to transform correspondingly the quality variable series, so that the cross correlation function between the flow residual series and the transformed quality series is proportional to the transfer function weights relating flow to the quality series. The pattern in this cross correlation function serves to identify the form of a rational transfer function model which can be fit.
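The prewhitening idea described above can be illustrated with a short Python sketch. It is our own illustration, not the BMDP implementation: the flow input is assumed to follow an AR(1) model with known $\phi$ (in practice $\phi$ would be estimated), and the series and transfer function weights are simulated, hypothetical values rather than LaPlatte data.

```python
import numpy as np


def prewhitened_ccf(x, y, phi, max_lag=5):
    """Cross correlations after prewhitening with the input's AR(1) filter.

    The filter (1 - phi*B) that whitens the input x is also applied to the
    output y; the cross correlations between the filtered series at lags
    0..max_lag are then proportional to the transfer function weights v_k
    in y_t = sum_k v_k x_{t-k} + noise.
    """
    alpha = x[1:] - phi * x[:-1]   # prewhitened input (residual series)
    beta = y[1:] - phi * y[:-1]    # correspondingly transformed output
    alpha = alpha - alpha.mean()
    beta = beta - beta.mean()
    n = len(alpha)
    denom = np.sqrt((alpha**2).sum() * (beta**2).sum())
    # Lag k pairs input at time t with output at time t + k.
    return np.array([(alpha[: n - k] * beta[k:]).sum() / denom
                     for k in range(max_lag + 1)])


# Hypothetical illustration on simulated data.
rng = np.random.default_rng(0)
n, phi = 2000, 0.75
x = np.zeros(n)
for t in range(1, n):
    x[t] = phi * x[t - 1] + rng.normal()
y = np.zeros(n)
y[2:] = 0.8 * x[1:-1] + 0.4 * x[:-2]   # true weights v_1 = 0.8, v_2 = 0.4
y += 0.5 * rng.normal(size=n)
print(np.round(prewhitened_ccf(x, y, phi), 2))
```

The printed cross correlations should be clearly nonzero only at lags 1 and 2, in roughly the 2:1 ratio of the simulated weights, which is the pattern one would use to choose the form of a rational transfer function.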

The residuals of such a fit are used to identify (via e.g. the ACF) the autocorrelation form of the noise component of the model. Final selection of a particular model will be based on various criteria. Statistical criteria which can aid in the selection include the mean square error of the residuals (MSE) or a criterion of the AIC form (Hipel, 1981). Ultimately the selected model must also be tested for out-of-sample forecast error, if it is to be used in a forecasting application. Our primary application in this study is not forecasting, however, but adjustment of the quality variable series to changes in flow. Box and Jenkins (1976) rightly emphasize the iterative nature of the model building process.

Although automatic selection criteria like AIC can be helpful for selecting a best fit model within a specified class of models, one should not try to escape from the diagnostic checking and identification stages of the whole process. One must imaginatively use graphical and numerical checks of fitted models to help discover ways that the model could be improved. Although mathematically a stochastic time series model is either stationary or nonstationary, with a data record of only a few years' length it is not always clear cut which would provide the more useful model. Thus in practice series can appear to be "nearly nonstationary," this also often being the case in economic applications. In this study, therefore, a special effort was made in fitting transfer function models to explore each of the possibilities that could exist with respect to whether the quality variable or flow was considered as stationary or requiring first differencing. As long as one follows up each possibility by efficiently fitting an appropriate transfer function model, one will find approximately the same adjustment model implied for flow.

UNIVARIATE DESCRIPTION OF THE SERIES

In this section s a m e univariate time sedes models w i l l be given for TSS, TP and MWF. However w e have chc6en to ignore longer lags (q.seasonal lags) in the

307 moaeling presented here.

Autocorrelation coefEcients for lags less than 52 w e r e only

marginally significant in any case (say at about 26 wee'ffi) and w e evenutally w i l l

€Low itself via t h e transfer f u n d i o n model to explain seasonal efEects.

ffie

O u r zmphasis

here is on whzther the series for shorter lags are stationary and what sort of A R I M A model a t l o w lags may be n e c e s a r y to explain short t e r m persistence. Parenthetically it should be noted t h a t events such as spring runoff affecting f l o w do not occur a t fixed weeks in the year, so t h a t s9xxhastic seasonality w i l l be present (Damskth, In t e r m s of the A C F the seasonal effect a t say a 52 week lag can be fairly

1986).

weak, and not sharply focused at 52 w e e k , but &t

aLr;o a t neighboring lags.

O u r purpase is not to fully characterize the seasonality of TSS or TP but

to a d j s t ultimately the quality Series by t h e f l o w Series.

ix-d

A general principle that is of value to keep in mind when dealing with seasonality is to aim for explicitly including the cause of seasonality in the model whenever possible, rather than just describing it via a spectral analysis or a high order ARIMA model. In our case we can use the flow series as a surrogate for seasonal effects in general. Inclusion of the flow series as an independent variable in the transfer function model for a quality series would yield a residual series which is essentially nonseasonal, if it is successful in this regard.

Mean Weekly Flow (MWF)

Stream discharge is considered on a mean weekly basis. It is generally expected to follow a seasonal pattern: high and variable discharge during the wet fall season, lower values in the winter, peak values in the spring and very low flows during the summer (Meals, 1983).

See Figure 2. Graphical analysis of the weekly flow figures indicates that a log transformation would be helpful, and henceforth just results for log MWF are given. See Figure 3. For the given sample data there is some question as to whether a first order differencing of the series is desirable or not. In the modelling process we tracked through to completion the two alternative strategies of not differencing or differencing the series. Any higher order differencing is not needed. Assuming a weakly stationary model for flow yields an AR(1) as the selected model among the class of low order ARMA models, $(1-0.75B)z_t = a_t$, with an error standard deviation (RMSE) of 0.71. Here $B$ stands for the usual backshift operator, $a_t$ is white noise error and $z_t$ is mean corrected log MWF. Summaries of this and other univariate models appear in Table A.

In fact the best model among the class of low order ARIMA models with first differencing had a somewhat larger standard deviation (0.80): $(1-B)z_t = (1-0.265B)a_t$.

Note that rewriting this IMA(1,1) model as an infinite order AR model yields $(1-0.735B-0.195B^2)z_t = a_t$ up to second order. This agrees approximately with the first order term of the AR(1) model above.

Fig. 2. Stream discharge on a weekly basis over a one year period, showing seasonal highs and lows. (Plot title: Stream Discharge - Watershed 2, Year 5, September 1982 - August 1983, LaPlatte River Watershed Project; horizontal axis: month, September to August.)

Fig. 3. Stream discharge on a weekly basis over more than a four year period, now on a log scale. (Plot title: Trends in Stream Discharge - Years 1-5, Watershed 2 - Mud Hollow Brook, LaPlatte River Watershed Project; horizontal axis: time, 1979 to 1984.)

This raises a general point about fitting alternative ARIMA models. Contending models which fit well should turn out to be similar to each other when they are expressed simultaneously as infinite autoregressive or infinite moving average models, in terms of the coefficient weights in such expansions. Parenthetically, the variables in all stationary univariate fits here are mean corrected as necessary, and the estimated means of the differenced series models were in all cases insignificantly different from zero. Several alternative, higher order ARIMA models were also fit, although the results will not be given here; they were found to be overparameterized relative to the models displayed in Table A. Such simplifications of a model can be justified via significance tests on the estimated coefficients or via the AIC criterion. The same modelling approach of overfitting and then simplifying when reasonable was used for all univariate fits discussed in the following sections.

TABLE A.

FITTED UNIVARIATE ARIMA MODELS

Variable                     Model                          Parameter estimates (standard errors)                     Standard deviation (RMSE)
Log Mean Weekly Flow         AR(1)                          0.747 (.044)                                              0.711
                             IMA(1,1)                       0.265 (.064)                                              0.799
Log Total Suspended Solids   AR(1)                          0.682 (.048)                                              0.638
                             IMA(1,1)                       0.334 (.063)                                              0.670
                             IMA(1,4), θ2 = 0               0.439 (.056), 0.180 (.062), 0.238 (.065)                  0.636
Log Total Phosphorus         AR(1)                          0.536 (.056)                                              0.428
                             ARMA(1,8), θ2 = ... = θ7 = 0   0.366 (.101), -0.246 (.103), 0.238 (.064)                 0.417
                             IMA(1,10), θ3 = ... = θ8 = 0   0.454 (.060), 0.445 (.06?), 0.163 (.062), -0.147 (.061)   0.424
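The second order AR expansion of the flow IMA(1,1) model can be checked mechanically. The following minimal Python sketch expands $(1-B)/(1-\theta B)$ into the weights $\pi_k$ of $(1-\pi_1B-\pi_2B^2-\cdots)z_t = a_t$. The function name `ima11_pi_weights` is our own, not something from the paper or from BMDP. With $\theta = 0.265$ from Table A, the first two weights come out at approximately 0.735 and 0.195.

```python
def ima11_pi_weights(theta, n_terms):
    """AR(infinity) weights for the IMA(1,1) model (1 - B) z_t = (1 - theta*B) a_t.

    Writing (1 - B)/(1 - theta*B) = 1 - pi_1*B - pi_2*B**2 - ..., the product
    (1 - B)(1 + theta*B + theta**2*B**2 + ...) gives pi_k = (1 - theta)*theta**(k - 1).
    """
    return [(1.0 - theta) * theta ** (k - 1) for k in range(1, n_terms + 1)]


# Log MWF, first differenced fit from Table A: theta = 0.265.
pi = ima11_pi_weights(0.265, 4)
print([round(w, 3) for w in pi])
```

After $n$ terms the weights sum to $1 - \theta^n$. This sum approaches 1 because the operator $(1-B)$ vanishes at $B = 1$.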

Total Suspended Solids (TSS)

In general the sediment concentration, as measured by Total Suspended Solids (TSS), is responsive to changes in stream discharge. It is therefore highest in the rainy fall and spring runoff periods. See Figure 4. In the analyses considered here the logarithm of TSS was used as the quality variable. The best fitting low order ARMA model is an AR(1), $(1-0.682B)z_t = a_t$. An alternative low order first difference model is again the IMA(1,1), $(1-B)z_t = (1-0.334B)a_t$ (see Table A). However, for this variable some more complicated, but still relatively low order, ARIMA models actually had a lower mean squared error and AIC value. In particular the following model was better:

$(1-B)z_t = (1-0.439B-0.180B^3-0.238B^4)a_t$.

This is an example of a subset moving average model, in which some of the usual MA parameters (here just $\theta_2$) are set to 0. It makes the general point that the best fitting models in the ARIMA class can be of this constrained parameter form. The higher order terms present here indicate a "persistence" in TSS extending beyond the one week lag in differenced values.
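One way to see what the subset MA model adds over the IMA(1,1) is to compare the autocorrelations each implies for the differenced series $w_t = (1-B)z_t$. The sketch below makes three assumptions:

- It uses the Box-Jenkins sign convention $w_t = (1-\theta_1B-\cdots-\theta_qB^q)a_t$, matching the equations above.
- `ma_acf` is a hypothetical helper name.
- The coefficients are the log TSS estimates from Table A.

```python
def ma_acf(thetas, max_lag):
    """Theoretical autocorrelations rho_1..rho_max_lag of an MA(q) process.

    The process is w_t = (1 - theta_1 B - ... - theta_q B^q) a_t, and
    thetas[j - 1] holds theta_j. Use 0.0 for a constrained (subset) term.
    """
    psi = [1.0] + [-t for t in thetas]  # MA weights psi_0 .. psi_q
    q = len(thetas)

    def autocov(k):
        # gamma_k / sigma^2 = sum_j psi_j * psi_{j+k}; zero beyond lag q
        if k > q:
            return 0.0
        return sum(psi[j] * psi[j + k] for j in range(q - k + 1))

    gamma0 = autocov(0)
    return [autocov(k) / gamma0 for k in range(1, max_lag + 1)]


# Differenced log TSS, subset MA(4) with theta_2 = 0 (Table A).
subset = ma_acf([0.439, 0.0, 0.180, 0.238], 6)
# Differenced log TSS, IMA(1,1) (Table A), for comparison.
ima11 = ma_acf([0.334], 6)
print([round(r, 3) for r in subset])
print([round(r, 3) for r in ima11])
```

With these estimates, the subset model implies a noticeable negative autocorrelation at lag 4 (about -0.19). The IMA(1,1), by contrast, implies zero autocorrelation beyond lag 1.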

Fig. 4. Total Suspended Solids (TSS) concentration on a weekly basis over more than a four year period. (Plot title: Trends in TSS Concentration - Years 1-5, Watershed 2 - Mud Hollow Brook, LaPlatte River Watershed Project; horizontal axis: time.)

Total Phosphorus (TP)

As with suspended solids, the concentration of total phosphorus generally follows changes in stream discharge. It has seasonal peaks in the fall and spring and its lowest values in the winter and summer. See Figure 5. A logarithmic transformation is used below, also mean corrected in the stationary case. Treating the series as stationary, a simple model to consider would be the AR(1), $(1-0.536B)z_t = a_t$, which in fact would be preferred to an AR(2) or ARMA(1,1) model. However, at somewhat longer lags there is also autocorrelation which is at least marginally significant. For example, the following model has a better MSE and AIC than the AR(1) model: $(1-0.366B)z_t = (1+0.246B+0.238B^8)a_t$. This again is a model with longer apparent persistence than the AR(1) model.

Fig. 5. Total Phosphorus (TP) concentration on a weekly basis over more than a four year period. (Plot title: Trends in Total P Concentration - Years 1-5, Watershed 2 - Mud Hollow Brook, LaPlatte River Watershed Project; horizontal axis: time, 1979 to 1984.)

There is an important general point to beware of in cases like this, where it may be difficult to argue why there should be a stronger persistence at lag 8 than at intermediate lags. The large correlation may be somewhat artifactual, and in fact due to a few unusual occurrences (e.g. outlying values) in the data set. A scatter plot of $z_t$ versus $z_{t-8}$, for example, may reveal a few unusual pairs of points contributing to the large autocorrelation.

Considering the differenced series, again a longer lagged model happens to fit best: $(1-B)z_t = (1-0.454B-0.445B^2-0.163B^9+0.147B^{10})a_t$. This can be viewed as a subset MA model. The simpler models do not pass diagnostic checks on their residual series.
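Besides the scatter plot, another way to follow up the warning about artifactual lag 8 correlation is to list how much each pair $(z_t, z_{t-8})$ contributes to the lag 8 sample autocorrelation. The sketch below does this on a hypothetical series, a bounded low-level wiggle plus two spikes eight weeks apart. It does not use the LaPlatte data. The names `lag_contributions` and `largest_pairs` are our own.

```python
import math


def lag_contributions(x, k):
    """Lag-k sample autocorrelation and the share of it from each pair (x[t], x[t-k]).

    Returns (r_k, contrib) with contrib[t] for t = k..n-1, so r_k == sum(contrib.values()).
    """
    n = len(x)
    m = sum(x) / n
    dev = [v - m for v in x]
    denom = sum(d * d for d in dev)
    contrib = {t: dev[t] * dev[t - k] / denom for t in range(k, n)}
    return sum(contrib.values()), contrib


def largest_pairs(contrib, m=3):
    """Time indices t of the m pairs with the largest absolute contribution."""
    return sorted(contrib, key=lambda t: abs(contrib[t]), reverse=True)[:m]


# Hypothetical weekly series: small bounded wiggle plus two spikes 8 weeks apart.
series = [0.5 * math.sin(1.7 * t) for t in range(200)]
series[50] += 8.0
series[58] += 8.0

r8, contrib = lag_contributions(series, 8)
worst = largest_pairs(contrib, m=1)[0]
print(round(r8, 3), worst, round(r8 - contrib[worst], 3))
```

In this constructed example the single spike pair dominates the numerator, and dropping its term lowers the lag 8 autocorrelation substantially. A similar listing on real data would flag which weeks deserve a closer look.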

TRANSFER FUNCTION MODELS FOR TSS

Once models have been identified for each of the series individually, a cross correlation analysis of the prewhitened versions of each series can be undertaken. Using the residual series from the univariate model fit in the cross correlation analysis aids the interpretation of the observed lead-lag relationships, because each series' autocorrelation structure has been removed (Haugh, 1977). We view this cross correlation analysis as an aid to identifying the dynamic relationship between the two series (Haugh and Box, 1977). The transfer function identification technique of Box and Jenkins (1976), as implemented in BMDP, just requires a prewhitening of the input series, which is here log MWF.

As an exercise in model identification, we proceeded to identify and fit transfer function models under the four possible cases: stationary or nonstationary input, combined with stationary or nonstationary output series. The stationary ARMA and nonstationary ARIMA models for each of TSS and flow have been previously described. Log flow, for example, was AR(1). The residuals of that AR(1) model can be cross correlated with the correspondingly transformed (i.e. filtered) TSS series, as would be done in the BMDP approach.
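The prewhitening step can be sketched as follows, under these assumptions:

- The data are simulated. The input is an AR(1) with $\phi = 0.75$, close to the log MWF estimate, and the output depends on the input only contemporaneously.
- $\phi$ is treated as known rather than estimated.
- The function names are hypothetical.

Both series are filtered by the input's AR(1) operator $(1-\phi B)$ before the cross correlations are computed. A lone spike at lag 0 should then appear.

```python
import math
import random


def ar1_filter(series, phi):
    """Mean correct, then apply (1 - phi*B); the first point is dropped."""
    mean = sum(series) / len(series)
    z = [v - mean for v in series]
    return [z[t] - phi * z[t - 1] for t in range(1, len(z))]


def cross_corr(alpha, beta, lag):
    """Sample correlation between alpha_t and beta_{t+lag} (lag may be negative)."""
    n = len(alpha)
    ma, mb = sum(alpha) / n, sum(beta) / n
    sa = math.sqrt(sum((a - ma) ** 2 for a in alpha) / n)
    sb = math.sqrt(sum((b - mb) ** 2 for b in beta) / n)
    pairs = [(alpha[t], beta[t + lag]) for t in range(n) if 0 <= t + lag < n]
    return sum((a - ma) * (b - mb) for a, b in pairs) / (n * sa * sb)


def simulate_ar1(phi, n, rng, burn=100):
    """AR(1) with unit-variance Gaussian innovations; start-up values dropped."""
    x, out = 0.0, []
    for t in range(n + burn):
        x = phi * x + rng.gauss(0.0, 1.0)
        if t >= burn:
            out.append(x)
    return out


# Hypothetical data: y depends on x only at lag 0, plus AR(1) noise.
rng = random.Random(42)
x = simulate_ar1(0.75, 500, rng)
noise = simulate_ar1(0.7, 500, rng)
y = [0.3 * a + e for a, e in zip(x, noise)]

# Prewhiten: filter BOTH series with the input's AR(1) operator.
alpha = ar1_filter(x, 0.75)
beta = ar1_filter(y, 0.75)
ccf = {k: cross_corr(alpha, beta, k) for k in range(-3, 4)}
bound = 2 / math.sqrt(len(alpha))  # rough two-standard-error band
print({k: round(v, 3) for k, v in ccf.items()}, round(bound, 3))
```

With a purely contemporaneous relationship, only the lag 0 cross correlation should stand clearly outside the rough $\pm 2/\sqrt{n}$ band.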

The only strongly significant point in the cross correlation plot is at lag 0, indicating a contemporaneous correlation (not surprisingly). Thus one identifies a transfer function as $z_t = \omega_0(\log \mathrm{MWF}_t) + n_t$. Here $z_t$ is the mean corrected log TSS, $n_t$ is the noise in the relationship and $\omega_0$ is the regression coefficient of log TSS on log flow. Having identified the transfer function form, one then fits a model with an assumed ARMA noise model. There are two ways to identify the noise model:

- Wait for an analysis of the residuals and their ACF from a first fit.
- Calculate an estimated noise series $\hat n_t = z_t - \hat v(B)(\log \mathrm{MWF}_t)$ and identify a noise model from the ACF of the $\hat n_t$. In our case this is $\hat n_t = z_t - \hat\omega_0(\log \mathrm{MWF}_t)$, where $\hat\omega_0$ can be calculated from the value of the cross correlation at lag 0 and the standard deviations of the input and output series.

The ACF of the noise series indicates a highly autocorrelated series, which, it turns out, can be well fit by an AR(1) model (see Table B). Other slightly more complicated noise models did not fit significantly better, and the residuals passed the usual diagnostic checks. The residual standard deviation of fit was 0.602 for log TSS. This can be compared to the univariate model error standard deviation for log TSS of 0.638. Thus including log flow in a transfer function model yields an adjusted series whose standard deviation is 6% smaller than that of an adjusted (or residual) series for log TSS based on its own history.
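The calculation of $\hat\omega_0$ and of the estimated noise series takes only a few lines. This sketch illustrates the procedure on simulated data, with true $\omega_0 = 0.3$ and AR(1) noise with coefficient 0.7. It is not a reproduction of the BMDP fit. It uses the usual Box-Jenkins estimate of the lag 0 impulse weight: the lag 0 cross correlation of the prewhitened series times the ratio of their standard deviations. The helper names are hypothetical. The last function reproduces the roughly 6% figure from the standard deviations 0.638 and 0.602.

```python
import math
import random


def mean(v):
    return sum(v) / len(v)


def sd(v):
    m = mean(v)
    return math.sqrt(sum((a - m) ** 2 for a in v) / len(v))


def corr0(a, b):
    """Lag 0 sample correlation of two equal-length series."""
    ma, mb = mean(a), mean(b)
    cov = sum((p - ma) * (q - mb) for p, q in zip(a, b)) / len(a)
    return cov / (sd(a) * sd(b))


def prewhiten(series, phi):
    """Mean correct, then apply the input's AR(1) operator (1 - phi*B)."""
    m = mean(series)
    z = [v - m for v in series]
    return [z[t] - phi * z[t - 1] for t in range(1, len(z))]


def omega0_hat(x, y, phi):
    """Lag 0 impulse weight r_ab(0) * s_b / s_a from the prewhitened series a, b."""
    a, b = prewhiten(x, phi), prewhiten(y, phi)
    return corr0(a, b) * sd(b) / sd(a)


def noise_series(x, y, w0):
    """Estimated noise n_t = z_t - w0 * x_t, with z and x mean corrected."""
    mx, my = mean(x), mean(y)
    return [(yv - my) - w0 * (xv - mx) for xv, yv in zip(x, y)]


def percent_reduction(sd_before, sd_after):
    return 100.0 * (sd_before - sd_after) / sd_before


def simulate_ar1(phi, n, rng, burn=100):
    x, out = 0.0, []
    for t in range(n + burn):
        x = phi * x + rng.gauss(0.0, 1.0)
        if t >= burn:
            out.append(x)
    return out


# Hypothetical data with true omega0 = 0.3 and AR(1) noise (coefficient 0.7).
rng = random.Random(7)
x = simulate_ar1(0.75, 2000, rng)
y = [0.3 * a + e for a, e in zip(x, simulate_ar1(0.7, 2000, rng))]

w0 = omega0_hat(x, y, 0.75)
n_hat = noise_series(x, y, w0)
noise_lag1 = corr0(n_hat[1:], n_hat[:-1])  # a high value points to an AR-type noise model
print(round(w0, 3), round(noise_lag1, 3), round(percent_reduction(0.638, 0.602), 1))
```

`percent_reduction(0.638, 0.602)` is about 5.6, which rounds to the 6% quoted in the text.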

Of course the comparison is more dramatic against the standard deviation of log TSS itself, that is, if no adjustment were done at all. Compared with the univariate model, the reduction in error standard deviation from the transfer function model is not dramatic. However, that comparison is the relevant one only when the goal is short term forecasting of the quality variable. For short term prediction (a few weeks ahead), a well fit univariate model does just about as well, because it in essence incorporates the flow history via the historical quality levels.

However, the purpose of our fitting the transfer functions is to understand how to adjust quality levels for flow changes in future intervention studies. Suppose one wants to determine whether trends in quality have been significant over time, or whether there are significant differences between different stretches of the monitoring record. A proper adjustment for flow is then important, so that one can assess the effect of other variables beyond those effects simply due to flow changes. Thus the closer analogy is with a dynamic covariate adjustment, rather than with dynamic forecasting.

TABLE B. FITTED TRANSFER FUNCTIONS FOR TOTAL SUSPENDED SOLIDS AND MEAN WEEKLY FLOW

Transformation of series       Transfer function coefficients*   Noise coefficients** (standard errors)      Error standard deviation (RMSE)
log TSS, log MWF               0.298 (.056)                      AR(1): 0.707 (.047)                         .602
log TSS, (1-B) log MWF         0.299 (.054), 0.992 (.008)        AR(1): 0.689 (.048)                         .602
(1-B) log TSS, log MWF         0.273 (.058), 0.281 (.057)        AR(1): 0.653 (.060), MA(1): 0.971 (.023)    .610
(1-B) log TSS, (1-B) log MWF   0.277 (.056)                      AR(1): 0.634 (.061), MA(1): 0.955 (.021)    .608

* Numerator coefficients given first, then denominator coefficients of the rational transfer function.
** The first two noise models are AR(1), the next two are ARMA(1,1).

Nonstationary Flow Implications

In the above transfer function model fit, MWF was treated as stationary via an AR(1) model. Suppose instead that a first differenced model form for MWF were selected. What transfer function model would result? In this case one would select, say, the IMA(1,1) model to filter the log flow and log TSS data before cross correlation. The cross correlation in this case does not indicate the pattern of the transfer function as clearly, because the weights at lags 0 and 1 are just marginally significant. As a possible summarization of the pattern (and in this case we know what we might be looking for), one notes that the weights for positive lags drop slowly from the lag zero weight. The identification is then of a transfer function of the form $\omega_0(1-\delta B)^{-1}$. With this function, various low order noise models were fit, the final selection being an AR(1) (see Table B).
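What the identified form implies for the level of log MWF can be made concrete by expanding $\omega_0(1-\delta B)^{-1}$, applied to $(1-B)\log \mathrm{MWF}_t$, into impulse weights on the level. The sketch below uses our own helper names and takes $\omega_0 = 0.299$ and $\delta = 0.992$ from Table B. It shows three things:

- The lag 0 weight is $\omega_0$.
- The later weights are tiny.
- The response to a sustained unit step is $\omega_0\delta^k$ after $k$ weeks, which is exactly $\omega_0$ when $\delta = 1$.

```python
def impulse_weights(omega0, delta, n_lags):
    """Weights v_0..v_n of omega0*(1 - B)/(1 - delta*B), acting on the level of log MWF.

    Expanding gives v_0 = omega0 and v_k = omega0 * delta**(k - 1) * (delta - 1) for k >= 1.
    """
    return [omega0] + [omega0 * delta ** (k - 1) * (delta - 1.0)
                       for k in range(1, n_lags + 1)]


def step_response(omega0, delta, k):
    """Cumulative response k weeks after a unit step in log MWF (equals omega0 * delta**k)."""
    return sum(impulse_weights(omega0, delta, k))


# Table B, row "log TSS, (1-B) log MWF": omega0 = 0.299, delta = 0.992.
w = impulse_weights(0.299, 0.992, 4)
print([round(v, 4) for v in w])
for weeks in (0, 4, 26, 52):
    print(weeks, round(step_response(0.299, 0.992, weeks), 3),
          round(step_response(0.299, 1.0, weeks), 3))
```

At $\delta = 1$ the model is identical to the stationary-input form $z_t = \omega_0 \log \mathrm{MWF}_t + n_t$. At the point estimate the step response decays slowly instead.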

Note that the dynamic parameter $\hat\delta = 0.992$ is insignificantly different from 1.00. This indicates that the transfer function, combined with the $(1-B)$ differencing operator of log MWF, gives the same constant transfer function as the one previously seen with the stationary input log MWF. The noise models are also essentially the same, as is the error standard deviation. Thus one would need bases other than statistical fit to choose between the stationary and nonstationary versions of flow as input.

Nonstationary TSS Implications

In both of the above identifications of transfer functions, TSS was treated as stationary. In this and the next subsection we follow up on the identification results if TSS were first differenced instead. Suppose one treats flow as stationary and filters it, together with the differenced TSS series, by the AR(1) model. One then sees a cross correlation pattern with significant spikes at lags 0 and 1 (other marginally significant lags will not be explored in this discussion). This leads to a transfer function of the form $(\omega_0 - \omega_1 B)$. Various low order noise models were considered, and the simplest models, such as AR(1) and MA(1), were found to be inadequate. An ARMA(1,1) model did fit adequately, with coefficients $\hat\phi = 0.653$ and $\hat\theta = 0.971$, but the MA coefficient is insignificantly different from 1 (see Table B). Also the fitted transfer function is $(0.273 - 0.281B)$, which is insignificantly different from a model of the form $\omega_0(1-B)$. Thus there is approximately a common factor of $(1-B)$ in the transfer function, in the noise model and in the differencing used for log TSS. Eliminating the common factor would yield a model like the one first found between log TSS and log MWF, where each was treated as stationary initially.
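The statements that a coefficient is "insignificantly" or "marginally significantly" different from 1 rest on comparing $1 - \hat c$ with its standard error. A minimal sketch using the Table B estimates follows. The normal reference and the 1.96 cut-off are our assumptions. Such Wald-type ratios are only rough guides for MA coefficients close to the invertibility boundary.

```python
# (label, estimate, standard error) from Table B, for coefficients discussed as being near 1.
TABLE_B_NEAR_ONE = [
    ("delta, log TSS on (1-B) log MWF", 0.992, 0.008),
    ("MA(1), (1-B) log TSS on log MWF", 0.971, 0.023),
    ("MA(1), (1-B) log TSS on (1-B) log MWF", 0.955, 0.021),
]


def z_against_one(estimate, std_error):
    """Approximate z statistic for the hypothesis that the coefficient equals 1."""
    return (1.0 - estimate) / std_error


for label, est, se in TABLE_B_NEAR_ONE:
    z = z_against_one(est, se)
    verdict = "beyond 1.96" if abs(z) > 1.96 else "within 1.96"
    print(f"{label}: z = {z:.2f} ({verdict})")
```

The ratios come out near 1.0, 1.3 and 2.1 respectively. This matches the wording used for these three coefficients.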

Nonstationary TSS and MWF Implications

In this case both of the first differenced series are filtered by the univariate IMA(1,1) model for log flow. The result is a cross correlation function with just a spike at lag 0, leading to a transfer function model of just $\omega_0$. Among low order noise models, the ARMA(1,1) is found to be adequate, with $(1-0.634B)n_t = (1-0.955B)a_t$ (see Table B). To match the previous results seen when both series are stationary, there should be a $(1-B)$ term in the noise model. Here the MA coefficient is marginally significantly different from 1, although the operator is clearly close to $(1-B)$ in form. Eliminating the common factor of $(1-B)$ in the noise model and in the differencing operators for output and input yields the equivalent model found when treating both series as stationary.

TRANSFER FUNCTION MODELS FOR TP

The same exercise was done for TP as for TSS. Each of the four differencing possibilities among TP and MWF was explored, in terms of the ultimate form of the fitted transfer functions. Results are summarized in Table C. Note that the fitted models are all within 2% of each other in terms of their error standard deviation. The forms of transfer function can be reconciled with each other by taking into account the differencing operator and ignoring small coefficients. For example, the transfer function for log TP with $(1-B)$ log MWF is $(0.182-0.132B-0.084B^2)$. When factored by $(1-B)$, this gives $(1-B)(0.182+0.050B-0.034B^2-\cdots)$. Ignoring the coefficients of .05 and smaller, this agrees approximately with the $(0.176)$ form of transfer function for log MWF.

TABLE C.

FITTED TRANSFER FUNCTIONS FOR TOTAL PHOSPHORUS AND MEAN WEEKLY FLOW

Series transformations        Transfer function coefficients (standard error)   Noise coefficients                                                                   Error standard deviation (RMSE)
log TP, log MWF               0.176 (.036)                                      AR(1): 0.389 (.097), MA(1): -0.262 (.10), MA(8): -0.202 (.065)                       0.396
log TP, (1-B) log MWF         0.182 (.035), 0.132 (.042), 0.084 (.034)          AR(1): 0.340 (.103), MA(1): -0.286 (.102), MA(8): -0.224 (.066)                      0.395
(1-B) log TP, log MWF         0.163 (.036), -0.172 (.036)                       MA(1): 0.958 (.021), MA(1): -0.294 (.120), MA(8): 0.171 (.069)                       0.402
(1-B) log TP, (1-B) log MWF   0.169 (.036)                                      AR(1): 0.308 (.126), MA(1): 0.948 (.022), MA(1): 0.166 (.069), MA(8): 0.20 (.120)   0.403
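The factoring step for Table C amounts to dividing the lag polynomial by $(1-B)$, that is, multiplying by $1 + B + B^2 + \cdots$, which turns the coefficients into running sums. The following minimal sketch (with a hypothetical helper name) applies this to $(0.182 - 0.132B - 0.084B^2)$. It gives the quotient coefficients $0.182, 0.050, -0.034, -0.034, \ldots$

```python
def divide_by_one_minus_b(coeffs, n_terms):
    """Power-series coefficients of c(B)/(1 - B), where c(B) = coeffs[0] + coeffs[1]*B + ...

    Since 1/(1 - B) = 1 + B + B**2 + ..., the k-th coefficient is the running sum
    coeffs[0] + ... + coeffs[k] (coefficients beyond the polynomial's degree are 0).
    """
    out, running = [], 0.0
    for k in range(n_terms):
        running += coeffs[k] if k < len(coeffs) else 0.0
        out.append(running)
    return out


# Table C, row "log TP, (1-B) log MWF": transfer function 0.182 - 0.132B - 0.084B^2.
q = divide_by_one_minus_b([0.182, -0.132, -0.084], 5)
print([round(c, 3) for c in q])
```

The series does not terminate, because the polynomial equals $-0.034$ rather than 0 at $B = 1$, so the factorization is only approximate. The leftover coefficients are all below .05 in size, which is the size the text ignores.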

CONCLUSION AND DISCUSSION

In some cases the choice between stationary and nonstationary versions of the series will be clear, but in others it is not obvious from the stretch of historical record. For the purpose of adjusting the quality series the distinction is not so critical. A careful iterative model building process of identification, estimation and diagnostic checking should bring us round to models which are close enough for practical purposes. For forecasting purposes one would hopefully have other knowledge to guide the choice, in terms of the implied short and longer term forecast functions. As in any careful statistical analysis, time must be invested in considering reasonable transformations of the data and in trying various alternative model forms. A criterion such as AIC can be helpful in selecting among several fitted models. However, one must have investigated a broad enough class of models to ensure that the optimal model will ultimately be found within that class. In practice one should not have to explore the various possible differencings for each of the independent and dependent variables as completely as was done here. It is comforting to realize that the ultimate model chosen for historical adjustment will be somewhat robust to such a choice. In the absence of other considerations, one would choose the model with an optimized fit criterion (e.g. MSE or AIC).

The adjustment procedure for the quality variables will depend on the overall assumptions made about the joint relationship among the quality series, the flow and the other variables of interest, such as interventions. If one wants to adjust for flow apart from any other factors, one could use the (mean-corrected) transfer function model

$z_t = v(B)\,\mathrm{MWF}_t + n_t$.

The residual series $\hat n_t$ from this fit would then become the flow-adjusted series. If a joint model is specified, one could instead fit a model involving all components simultaneously:

$z_t = v(B)\,\mathrm{MWF}_t + (\text{other variable effects}) + n_t$.

This would give a dynamic covariate analysis of the effects of the other variables on $z_t$. This of course assumes that there is no interaction between flow and the other variables considered. In some cases a more complex interaction model might be appropriate, or separate fits may be appropriate depending on the season or flow rate.
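For the simple case the tables arrive at, a contemporaneous weight $\omega_0$ with AR(1) noise, the flow-adjusted series can be sketched by regressing quasi-differenced series. This is the generalized least squares idea behind the remark that ordinary regression would not be efficient here. The sketch is a simplified Cochrane-Orcutt style step that treats the noise coefficient $\phi$ as known. The fits in this paper instead estimate the transfer function and the noise parameters together. The helper names and the simulated data are hypothetical.

```python
import random


def quasi_difference(v, phi):
    """Apply (1 - phi*B); the first observation is dropped."""
    return [v[t] - phi * v[t - 1] for t in range(1, len(v))]


def flow_adjust(z, x, phi):
    """Estimate omega0 in z_t = omega0*x_t + n_t with (1 - phi*B) n_t = a_t, phi known.

    Both series are mean corrected and quasi-differenced so that the regression
    errors become white. Returns (omega0_hat, adjusted), with
    adjusted[t] = z_t - omega0_hat * x_t (the flow-adjusted series n_hat_t).
    """
    mz, mx = sum(z) / len(z), sum(x) / len(x)
    zc = [v - mz for v in z]
    xc = [v - mx for v in x]
    zs, xs = quasi_difference(zc, phi), quasi_difference(xc, phi)
    w0 = sum(a * b for a, b in zip(xs, zs)) / sum(a * a for a in xs)
    adjusted = [zv - w0 * xv for zv, xv in zip(zc, xc)]
    return w0, adjusted


def simulate_ar1(phi, n, rng, burn=100):
    x, out = 0.0, []
    for t in range(n + burn):
        x = phi * x + rng.gauss(0.0, 1.0)
        if t >= burn:
            out.append(x)
    return out


rng = random.Random(19)
log_flow = simulate_ar1(0.75, 2000, rng)  # hypothetical stand-in for log MWF
log_tss = [0.3 * f + e for f, e in zip(log_flow, simulate_ar1(0.7, 2000, rng))]
w0_gls, adjusted = flow_adjust(log_tss, log_flow, phi=0.7)
w0_ols, _ = flow_adjust(log_tss, log_flow, phi=0.0)  # phi = 0 is ordinary least squares
print(round(w0_gls, 3), round(w0_ols, 3))
```

Setting `phi = 0` turns the step into ordinary least squares. That estimate is still unbiased for $\omega_0$ here, but it is less efficient when the noise is autocorrelated.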

Note that the usual transfer function model assumes a linear effect of flow on elemental concentration. That is, a particular change in flow rate should cause the same change in concentration, whether the concentration is relatively low or relatively high. If that were not true, a more complex nonlinear model would have to be formulated, or separate models would have to be used for different ranges of flow level. The forms of transfer function in our examples were quite simple. For example, between log TSS and log MWF (see Table B) all the lagged weights are zero. This implies that a simpler proportional regression adjustment for contemporaneous flow rate would be adequate. However, the autocorrelated noise in such a model serves notice that ordinary regression analysis would not have been statistically efficient in making the adjustment. A simple proportional adjustment for flow was likewise adequate for TP and TKN, where again the noise model was autocorrelated.

ACKNOWLEDGEMENT

We gratefully acknowledge the help given by the Vermont Water Resources Center personnel, Dr. Alan Cassell, Don Meals and most particularly Dr. Jack Clamen. Computer time was provided by the Academic Computing Center of the University of Vermont.

REFERENCES

Box, G.E.P. and Jenkins, G.M., 1976. Time Series Analysis: Forecasting and Control, Revised Edition. Holden-Day, San Francisco.

Damsleth, E., 1986. Modelling River Quality - A Transfer Function Approach. In: A.H. El-Shaarawi and R.E. Kwiatkowski (Editors), 1986. Developments in Water Science. Statistical Aspects of Water Quality Monitoring. Elsevier Science Publishers, Amsterdam.

Dixon, W.J. (Editor), 1981. BMDP Statistical Software, 1981 Edition. University of California Press, Berkeley.

Haugh, L.D., 1976. Checking the Independence of Two Covariance-Stationary Time Series: A Univariate Residual Cross Correlation Approach. Journal of the American Statistical Association, 71: 378-385.

Haugh, L.D. and Box, G.E.P., 1977. Identification of Dynamic Regression (Distributed Lag) Models Connecting Two Time Series. Journal of the American Statistical Association, 72: 121-130.

Hipel, K.W., 1981. Geophysical Model Discrimination Using the Akaike Information Criterion. IEEE Transactions on Automatic Control, AC-26: 358-378.

Hipel, K.W., McLeod, A.I. and Li, W.K., 1985. Causal and Dynamic Relationships between Natural Phenomena. In: O.D. Anderson, J.K. Ord and E.A. Robinson (Editors), 1985. Time Series Analysis: Theory and Practice 6. Elsevier Science Publishers, Amsterdam, pp. 13-34.

Hirsch, R.M., Slack, J.R. and Smith, R.A., 1982. Techniques of Trend Analysis for Monthly Water Quality Data. Water Resources Research, 18: 107-121.

McLeod, A.I., Hipel, K.W. and Camacho, F., 1983. Trend Assessment of Water Quality Time Series. Water Resources Bulletin, 19: 537-547.

Meals, D.W., Jr., 1983. LaPlatte River Watershed Water Quality Monitoring and Analysis Program, Program Report No. 5, Project Year 4. Vermont Water Resources Research Center, University of Vermont, Burlington.

Meals, D.W., 1985. Monitoring Changes in Agricultural Runoff Quality in the LaPlatte River Watershed, Vermont. In: Perspectives on Nonpoint Source Pollution. Proceedings of a National Conference. U.S. Environmental Protection Agency, pp. 185-190.

Snorrason, A., Newbold, P. and Maxwell, W.H.C., 1984. Multiple Input Transfer Function-Noise Modelling of River Flow. In: W.H.C. Maxwell and L.R. Beard (Editors), 1984. Frontiers in Hydrology. Water Resources Publications, Littleton, Colorado, pp. 111-126.

RESIDUALS FROM REGRESSION WITH DEPENDENT ERRORS

R. J. KULPERGER

Department of Statistical and Actuarial Sciences, The University of Western Ontario, London, Ontario, Canada, N6A 5B9

§1. INTRODUCTION

Regression models

$$Y_i = \sum_{r} \alpha_r f_r(z_i) + X_i \qquad (1.1)$$

are very useful in practice. Here we are interested in the residuals obtained after fitting the parameters. If $\{X_i\}$ is an independent and identically distributed (i.i.d.) process, the residuals are given by

$$\hat X_{i,n} = Y_i - \sum_{r} \hat\alpha_{r,n} f_r(z_i).$$

MacNeill (1974, 1978) and MacNeill and Jandhyala (1985) have considered some properties of the residual partial sum process. Recently the case where $\{X_i\}$ is a dependent series, specifically an autoregressive (AR) process, has become of interest (see El-Shaarawi and Esterby (1982) for several such examples). We consider some properties of the residuals in this case for some simple regression cases. In section 2 we summarize some results in the AR case with no regression. Section 3 considers the regression case and also gives some remarks on differencing. Section 4 describes some simulation examples to illustrate some of the results.

§2. AUTOREGRESSIVE RESIDUALS

Kulperger (1985a) considered the model

$$X_i = \beta_1 X_{i-1} + \cdots + \beta_p X_{i-p} + \epsilon_i, \qquad (2.1)$$

where $\{\epsilon_i\}$ is an i.i.d. process with mean zero and variance $\sigma^2$. The process is assumed to satisfy the invertibility conditions of Box and Jenkins (1976). Observe data $X_i$, $i = -p+1, -p+2, \ldots, n$. Estimate $\beta_1, \ldots, \beta_p$ by $(\hat\beta_{1,n}, \ldots, \hat\beta_{p,n})$, the ordinary least squares estimate, which minimizes

$$\sum_{i=1}^{n} \Big(X_i - \sum_{j=1}^{p} \beta_j X_{i-j}\Big)^2.$$

The residuals are then defined by

$$\hat\epsilon_{i,n} = X_i - \sum_{j=1}^{p} \hat\beta_{j,n} X_{i-j}, \qquad i = 1, 2, \ldots, n.$$

Let $s_n^2$ be a consistent estimate of $\sigma^2$, for example $s_n^2 = n^{-1}\sum_{i=1}^{n} \hat\epsilon_{i,n}^2$. The residual partial sum process is then defined by

$$\hat B_n(t) = \frac{1}{s_n\sqrt{n}} \sum_{i=1}^{[nt]} \hat\epsilon_{i,n}, \qquad 0 \le t \le 1. \qquad (2.2)$$

Kulperger (1985a) then shows that $\hat B_n$ converges weakly to $B$, standard Brownian motion (see Billingsley (1968) for details on weak convergence and Brownian motion). Let $D$ be the space in which $\hat B_n$ lives, and let $\xrightarrow{d}$ denote convergence in distribution. The weak convergence means that $f(\hat B_n) \xrightarrow{d} f(B)$ for any continuous function $f: D \to \mathbb{R}$. For example,

(i) $\sup\{|\hat B_n(t)| : 0 \le t \le 1\} \xrightarrow{d} \sup\{|B(t)| : 0 \le t \le 1\}$, and

(ii) [...] for nice choices of the function $g$.
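For $p = 1$, the construction of (2.2) can be sketched in four steps:

1. Fit $\beta$ by least squares without an intercept.
2. Form the residuals.
3. Estimate $\sigma$ by $s_n$.
4. Accumulate the scaled partial sums.

The data are simulated ($\beta = 0.6$, Gaussian errors, start-up values dropped), and the function names are ours. By the result above, the path should behave like standard Brownian motion on $[0, 1]$, so its maximum absolute value should be of moderate size.

```python
import math
import random


def ar1_ols(x):
    """OLS estimate of beta in X_i = beta*X_{i-1} + eps_i (no intercept, as in (2.1))."""
    num = sum(x[i] * x[i - 1] for i in range(1, len(x)))
    den = sum(x[i - 1] ** 2 for i in range(1, len(x)))
    return num / den


def residual_partial_sums(x):
    """Values of hat-B_n(t) at t = 0, 1/n, ..., 1 for an AR(1) fit, following (2.2).

    x holds X_0, X_1, ..., X_n (p = 1, so one pre-sample value).
    """
    beta = ar1_ols(x)
    eps = [x[i] - beta * x[i - 1] for i in range(1, len(x))]
    n = len(eps)
    s_n = math.sqrt(sum(e * e for e in eps) / n)  # consistent estimate of sigma
    path, running = [0.0], 0.0
    for e in eps:
        running += e
        path.append(running / (s_n * math.sqrt(n)))
    return path


rng = random.Random(3)
x = [0.0]
for _ in range(1100):
    x.append(0.6 * x[-1] + rng.gauss(0.0, 1.0))
x = x[100:]  # drop start-up values; x[0] plays the role of X_0
path = residual_partial_sums(x)
print(len(path) - 1, round(max(abs(v) for v in path), 3))
```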

§3. SOME REGRESSION MODELS WITH AR ERRORS

Work is currently in progress on these types of results. In this section we will present only some more specific results. More details are given in Kulperger (1985b).

3.1 First Order Polynomial

We consider first

$$Y_i = a_0 + a_1 i + X_i,$$

where $X_i = \beta X_{i-1} + \epsilon_i$ is an AR(1) process; this is a special case of section 3.2. The estimates $\hat a_{0n}, \hat a_{1n}$ of $a_0, a_1$ jointly minimize

$$\sum_{i=1}^{n} (Y_i - a_0 - a_1 i)^2.$$

It can then be shown that, jointly,

$$E_n(t) = \frac{1}{\sigma\sqrt{n}} \sum_{i=1}^{[nt]} \epsilon_i \Rightarrow B(t) \qquad (3.1)$$

and [...], where $\Rightarrow$ means converges weakly. The AR(1) process is now estimated by

$$\hat X_{i,n} = Y_i - \hat a_{0n} - \hat a_{1n} i,$$

and $\beta$ is estimated by least squares, $\hat\beta_n$, as in section 2. The residuals are finally defined to be

$$\hat\epsilon_{i,n} = \hat X_{i,n} - \hat\beta_n \hat X_{i-1,n}, \qquad i = 1, 2, \ldots, n.$$

It easily follows that

$$\hat\epsilon_{i,n} = \epsilon_i + (1-\hat\beta_n)(a_0 - \hat a_{0n}) + (\beta - \hat\beta_n) X_{i-1} + (1-\hat\beta_n)(a_1 - \hat a_{1n})\, i + \hat\beta_n (a_1 - \hat a_{1n}). \qquad (3.2)$$

Let

$$\hat B_n(t) = \frac{1}{\sigma\sqrt{n}} \sum_{i=1}^{[nt]} \hat\epsilon_{i,n}, \qquad 0 \le t \le 1,$$

be the residual partial sum process. Then using (3.1) and (3.2) it now follows that

$$\hat B_n(t) \Rightarrow B(t) + 2\Big(B(1) - 3\int_0^1 B(s)\,ds\Big)t - 3\Big(B(1) - 2\int_0^1 B(s)\,ds\Big)t^2.$$

This is the same limit process as in the case in which the errors are i.i.d.

3.2 Polynomial Plus Centred Period Component

Consider the model

$$Y_i = a_0 + a_1 i + a_2 f(i) + X_i,$$

where $X_i$ is an AR(p) process. We need the following assumptions on $f$: [...], with centring constant $c_1 = 0$. The assumption $c_1 = 0$ is not much of a restriction. Otherwise let $g(i) = f(i) - c_1$, so that

$$Y_i = a_0 + a_1 i + a_2\big(g(i) + c_1\big) + X_i = (a_0 + a_2 c_1) + a_1 i + a_2 g(i) + X_i.$$

The AR process is estimated by

$$\hat X_{i,n} = Y_i - \hat a_{0n} - \hat a_{1n} i - \hat a_{2n} f(i), \qquad i = -p+1, \ldots, n.$$

Upon fitting the AR(p) model, the residuals are obtained as

$$\hat\epsilon_{i,n} = \hat X_{i,n} - \sum_{j=1}^{p} \hat\beta_{j,n} \hat X_{i-j,n}.$$

The regression estimates satisfy [...]. Then

$$\hat B_n(t) = \frac{1}{\sigma\sqrt{n}}\sum_{i=1}^{[nt]}\epsilon_i + \sum_{j=1}^{p}(\beta_j - \hat\beta_{j,n})\frac{1}{\sigma\sqrt{n}}\sum_{i=1}^{[nt]} X_{i-j} + \Big(1 - \sum_{j=1}^{p}\hat\beta_{j,n}\Big)\frac{(a_0 - \hat a_{0n})[nt]}{\sigma\sqrt{n}} + \frac{a_1 - \hat a_{1n}}{\sigma\sqrt{n}}\sum_{i=1}^{[nt]}\Big(i - \sum_{j=1}^{p}\hat\beta_{j,n}(i-j)\Big) + \frac{a_2 - \hat a_{2n}}{\sigma\sqrt{n}}\sum_{i=1}^{[nt]}\Big(f(i) - \sum_{j=1}^{p}\hat\beta_{j,n} f(i-j)\Big).$$

Here $(B, Z_0, Z_1)$ has the joint limit law of [...]. Therefore

$$\hat B_n(t) \Rightarrow B(t) + 2\Big(B(1) - 3\int_0^1 B(s)\,ds\Big)t - 3\Big(B(1) - 2\int_0^1 B(s)\,ds\Big)t^2 \equiv G(t).$$

This is the same limit as in section 3.1. Suppose the model is changed to

$$Y_i = a_0 + a_1 i + a_2 f_1(i) + a_3 f_2(i) + X_i,$$

where $f_1$ and $f_2$ both satisfy the assumptions at the beginning of this section. The residual partial sum limit process again turns out to be $G(\cdot)$.
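A small Monte Carlo sketch can check the section 3.1 result at one time point. From the displayed limit, a short calculation (ours, not the paper's) gives $\operatorname{Var} G(t) = t - 4t^2 + 6t^3 - 3t^4$, so $\operatorname{Var} G(1/2) = 0.0625$ and $G(1) = 0$. The code does the following:

- It simulates $Y_i = a_0 + a_1 i + X_i$ with AR(1) errors. All parameter values are arbitrary choices.
- It applies the section 3.1 recipe: fit a line, fit an AR(1) to the detrended values, and sum the residuals.
- It compares the Monte Carlo variance of $\hat B_n(1/2)$ with 0.0625.

As a simplification it uses $s_n$ in place of $\sigma$, and it uses the residuals for $i = 2, \ldots, n$ because there is no pre-sample value.

```python
import math
import random


def fit_line(y):
    """OLS intercept and slope of y_i on i, i = 1..n."""
    n = len(y)
    idx = range(1, n + 1)
    mi, my = (n + 1) / 2, sum(y) / n
    sxx = sum((i - mi) ** 2 for i in idx)
    slope = sum((i - mi) * (v - my) for i, v in zip(idx, y)) / sxx
    return my - slope * mi, slope


def trend_ar1_partial_sum(y, t):
    """hat-B_n(t) for Y_i = a0 + a1*i + X_i with AR(1) X_i, following section 3.1."""
    a0, a1 = fit_line(y)
    xhat = [v - a0 - a1 * i for i, v in enumerate(y, start=1)]
    num = sum(xhat[i] * xhat[i - 1] for i in range(1, len(xhat)))
    den = sum(xhat[i - 1] ** 2 for i in range(1, len(xhat)))
    beta = num / den
    eps = [xhat[i] - beta * xhat[i - 1] for i in range(1, len(xhat))]
    n = len(eps)
    s_n = math.sqrt(sum(e * e for e in eps) / n)
    return sum(eps[: int(n * t)]) / (s_n * math.sqrt(n))


def var_g(t):
    """Variance of G(t) = B(t) + 2(B(1) - 3I)t - 3(B(1) - 2I)t^2, with I = int_0^1 B(s) ds."""
    return t - 4 * t ** 2 + 6 * t ** 3 - 3 * t ** 4


rng = random.Random(11)
draws = []
for _ in range(500):
    x, xs = 0.0, []
    for k in range(300):
        x = 0.5 * x + rng.gauss(0.0, 1.0)
        if k >= 100:  # drop start-up values
            xs.append(x)
    y = [2.0 + 0.05 * i + v for i, v in enumerate(xs, start=1)]
    draws.append(trend_ar1_partial_sum(y, 0.5))
mc_var = sum(d * d for d in draws) / len(draws)
print(round(mc_var, 4), var_g(0.5))
```

Agreement is only approximate at $n = 200$ and 500 replications. The point is that AR(1) errors leave the i.i.d. limit unchanged, not that the match is exact.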

3.3 Remarks on Differencing

In the analysis of processes with trends, differencing is often performed. Here we consider three simple examples, all with $|\beta| < 1$:

(M1) $X_{i+1} = \beta X_i + \epsilon_{i+1}$;

(M2) $X_{i+1} = a + \beta X_i + \epsilon_{i+1}$;

(M3) $Y_i = a_1 i + X_i$, with $X_i$ satisfying (M1).

Observe data at times $i = -1, 0, 1, \ldots, n$. Upon differencing, obtain the data

$$Z_i = X_i - X_{i-1} \ \text{for M1 and M2}, \qquad Z_i = Y_i - Y_{i-1} \ \text{for M3}.$$

Then, for example, in M1 and M2 we have

$$Z_i = \beta Z_{i-1} + \eta_i, \qquad \eta_i = \epsilon_i - \epsilon_{i-1}.$$

Estimate $\beta$ by ordinary least squares, giving $\hat\beta_n$. For M1 and M2, $\eta_i$ is then estimated by

$$\hat\eta_{i,n} = Z_i - \hat\beta_n Z_{i-1},$$

and the $\epsilon_i$'s are estimated by the residuals

$$\hat\epsilon_{0,n}(a) = a, \qquad \hat\epsilon_{i,n}(a) = \hat\eta_{i,n} + \hat\epsilon_{i-1,n}(a).$$

The sums of the residuals are [...] (see Jandhyala and Kulperger (1985) for some further comments on M1).

Theorem 3.1 (a) For M1:

(i) if $a \ne X_0 - \hat\beta_n X_{-1}$, then $n^{-1}\sum_{i=1}^{[nt]} \hat\epsilon_{i,n}(a)$ [...];

(ii) if $a = X_0 - \hat\beta_n X_{-1}$, then

$$\frac{1}{\sigma\sqrt{n}} \sum_{i=1}^{[nt]} \hat\epsilon_{i,n}(a) \Rightarrow B(t),$$

where $B$ is standard Brownian motion.

(b) For M2 [...].

For M3,

$$Z_i = Y_i - Y_{i-1} = a_1 + U_i, \qquad U_i = X_i - X_{i-1}.$$

Estimate $a_1$ by $\hat a_{1n} = n^{-1}\sum_{i=1}^{n} Z_i$, and estimate the process $U_i$ by $\hat U_{i,n} = Z_i - \hat a_{1n}$. Something different from Theorem 3.1 above occurs.

Theorem 3.2 For M3, [...], where $X_0, X_1$ have the AR(1) distribution and [...] is independent of $X_0, X_1, \ldots, X_n$ and has the same distribution as [...].
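The following sketch simulates model M1 with settings that appear to be those of section 4: $\beta = -0.4$ (our reading of the scanned value), symmetric exponential errors, and the first 100 start-up values dropped. It then forms the differenced residuals $\hat\epsilon_{i,n}(a)$. Two facts are worth seeing numerically:

- Least squares on the differenced data targets the lag one autocorrelation of $Z_i$. For an AR(1) $X_i$ this equals $(\beta-1)/2$, here $-0.7$, not $\beta$.
- Summing the recursion gives $\hat\epsilon_{i,n}(a) = X_i - \hat\beta_n X_{i-1} + (a - X_0 + \hat\beta_n X_{-1})$. A starting value other than $a = X_0 - \hat\beta_n X_{-1}$ therefore shifts every residual by the same constant, and the partial sums drift linearly.

The function names and random seeds are arbitrary.

```python
import random


def laplace(rng):
    """Symmetric exponential draw, density exp(-|x|)/2."""
    return rng.choice((-1.0, 1.0)) * rng.expovariate(1.0)


def simulate_m1(beta, n, rng, burn=100):
    """X_{-1}, X_0, ..., X_n from model M1 with Laplace errors (start-up dropped).

    Returned list: out[0] = X_{-1}, out[1] = X_0, ..., out[n + 1] = X_n.
    """
    x, out = 0.0, []
    for k in range(burn + n + 2):
        x = beta * x + laplace(rng)
        if k >= burn:
            out.append(x)
    return out


def differenced_residuals(x, a):
    """OLS beta_hat from Z_i = X_i - X_{i-1}, and residuals eps_hat_i(a), i = 1..n."""
    z = [x[k] - x[k - 1] for k in range(1, len(x))]  # Z_0, ..., Z_n
    beta_hat = sum(z[k] * z[k - 1] for k in range(1, len(z))) / sum(
        z[k - 1] ** 2 for k in range(1, len(z)))
    eps, prev = [], a  # eps_hat_0(a) = a
    for k in range(1, len(z)):
        prev = (z[k] - beta_hat * z[k - 1]) + prev  # eta_hat_i + eps_hat_{i-1}(a)
        eps.append(prev)
    return beta_hat, eps


rng = random.Random(2024)
x = simulate_m1(-0.4, 200, rng)
beta_hat, _ = differenced_residuals(x, 0.0)
c0 = x[1] - beta_hat * x[0]  # X_0 - beta_hat * X_{-1}
_, eps_good = differenced_residuals(x, c0)
_, eps_bad = differenced_residuals(x, c0 - 1.0)
print(round(beta_hat, 3), round(sum(eps_good) / len(eps_good), 3),
      round(sum(eps_bad) / len(eps_bad), 3))
```

With $a = c_0$ the residuals coincide with $X_i - \hat\beta_n X_{i-1}$. Shifting $a$ by $-1$ lowers every residual by exactly 1, which is the linear drift of the partial sums.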

§4. REMARKS

In this section we will just illustrate Theorem 3.1 for model M1. Let $\beta = -.4$ in the AR(1) process, and let $\epsilon$ be distributed symmetric exponential, that is, with density $f(x) = \tfrac{1}{2}e^{-|x|}$. We also take $n = 200$. To simulate the AR process and remove the startup phase, the first $N$ points of the AR process are not used. By trial, it seems that dropping the first $N = 100$ points is reasonable. Here $\hat\beta_n \to (\beta - 1)/2 = -.7$. Fitting an AR(1) model after differencing gives $\hat\beta_n = -.79$. The first picture, using [...], has asymptotic slope $-1.3687$. The second picture uses the known $c_0 = .3687$ and gives $\frac{1}{\sigma\sqrt{n}}\sum_{i=1}^{[nt]}\hat\epsilon_{i,n}$. These only illustrate some of the known difficulties in working with non-invertible models. Also, if one is dealing with a process close to these, strange things can happen. In many cases, the residual partial sum process for regression with AR errors has the same limit as in the i.i.d. errors case.

FIGURE 1 (plot of the residual partial sum process; vertical axis from about -1.1 to 0.1)

It s t i l l

10 100 PLOT u1 0.7. 1 I I I 0.41 I I I

I 0 . I.

00 00 0 D O 00 0 0 00

w

0 0 0

0 0

0 0

0 0

000

0

00

0 0

0

0

D

0

0

0

0 000

0

0

00

0

0

00 000

0 0 000 0

0

00

00

D O 000000

0 0

0 0 0

0 00

0 000

DODO

0 0

00

000 0

0000

-0.2.

0

00

0 0 000 0 00

0 00

00

0 0 0000

0

00 0

000

0

0 00

00 0

0

000

a

OD DOO

0

0

oa

00

I I

0

0

D

I

FIGURE 2

325 remains

to

be

seen

i f these r e s u l t s are useful

regression over time. sums,

these

results

distributions,

that

i n d e t e c t i n g changes i n

However f o r h e u r i s t i c t e s t s based on r e s i d u a l p a r t i a l and

those

i s where

i n Kulperger the

null

(1985b)

hypothesis

can g i v e

i s that

some n u l l

of

no change

in regression.

ACKNOWLEDGEMENT
Supported by NSERC grant number A5724.

REFERENCES
Billingsley, P. (1968). Convergence of Probability Measures. Wiley, New York.
Box, G.E.P. and Jenkins, G.M. (1976). Time Series Analysis: Forecasting and Control. Holden-Day, San Francisco.
El-Shaarawi, A. and Esterby, S. (1982). Time Series Methods in Hydrosciences. Developments in Water Science, 17. Elsevier, New York.
Jandhyala, V.K. (1985). Ph.D. Thesis. Department of Statistics, University of Western Ontario, Canada.
Jandhyala, V.K. and Kulperger, R.J. (1985). Estimation of the autoregressive parameters in some non-stationary ARMA(p,1) models.
Kulperger, R.J. (1985a). On the residuals of autoregressive processes and polynomial regression. To appear in Stochastic Processes and Their Applications.
Kulperger, R.J. (1985b). Some remarks on regression with autoregressive errors and their residual processes. Tech. Report, Department of Statistics, University of Western Ontario.
MacNeill, I.B. (1974). Tests for change of parameter at unknown time and distributions of some related functionals on Brownian motion. Ann. Statist., 2, 950-962.
MacNeill, I.B. (1978). Properties of sequences of partial sums of polynomial regression residuals with applications to tests for change of regression at unknown times. Ann. Statist., 6, 422-433.
MacNeill, I.B. and Jandhyala, V.K. (1985). The residual process for non-linear regression. To appear in J. Appl. Prob.

ALTERNATIVES FOR IDENTIFYING STATISTICALLY SIGNIFICANT DIFFERENCES

EDWARD A. McBEAN

INTRODUCTION
The need to discriminate between two or more sets of data is commonplace. Examples where discrimination is needed include determining the impact of an implemented remedial technology and examining whether a non-point pollutant source is producing a statistically significant impact. A number of testing procedures have been used to answer questions of this type. However, in selecting the procedure for a particular application, there are no absolute rules, only guidelines. To a large extent, selecting the best procedure involves careful scrutiny of the characteristics of the problem at hand and of the assumptions implicit in the discrimination technique being considered.

The procedure most frequently used for environmental problems is the t-test. However, the test carries implicit assumptions, and when these are not met, different approaches are required. The intent of this paper is to discuss the nature of these assumptions and some of the available alternatives for the analysis of water quality monitoring data.

BACKGROUND
Mathematically, the testing procedure presented by Fisher (1925) tests whether the means of two sets of measurements, say X (with elements $x_i$, $i = 1, 2, \ldots, m$) and Y (with elements $y_j$, $j = 1, \ldots, n$), are the same. Assuming that X and Y are normally distributed with the same variance $\sigma^2$, but that their population means $\mu_x$ and $\mu_y$ may be different, the difference between the sample means $\bar x - \bar y$ will be normally distributed with mean $(\mu_x - \mu_y)$ and variance $\sigma^2(1/m + 1/n)$. Then

$$t = \frac{|\bar x - \bar y|}{\sigma\sqrt{1/m + 1/n}} \qquad (1)$$

where $|\cdot|$ denotes the absolute value and $\sigma$ represents the standard deviation (in practice replaced by the pooled sample standard deviation S), will follow a t-distribution with m+n-2 degrees of freedom.
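Equation (1) with $\sigma$ replaced by the pooled sample standard deviation can be sketched as follows; the function name and the two small data sets are hypothetical and serve only to show the arithmetic.

```python
import math
from statistics import mean, variance


def pooled_t(x, y):
    """Two-sample t statistic of Equation (1), with sigma replaced by the
    pooled sample standard deviation; returns (t, degrees of freedom)."""
    m, n = len(x), len(y)
    # Pooled variance: weighted average of the two sample variances.
    sp2 = ((m - 1) * variance(x) + (n - 1) * variance(y)) / (m + n - 2)
    t = abs(mean(x) - mean(y)) / math.sqrt(sp2 * (1 / m + 1 / n))
    return t, m + n - 2


t, df = pooled_t([1, 2, 3, 4], [3, 4, 5, 6])
print(f"t = {t:.3f} with {df} degrees of freedom")
```

The statistic is then compared with the tabulated t value for m+n-2 degrees of freedom at the chosen significance level.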

Noteworthy points regarding the above include:
(a) the assumption that the distributions of X and Y have the same variance is essential to the argument;
(b) the variance $\sigma^2$ appearing in $\sigma^2(1/m + 1/n)$ is normally referred to as the common variance;
(c) the t-test is based on the assumption that this underlying distribution is normal, or gaussian.

Unfortunately, one or more of these assumptions is frequently violated in surface water quality monitoring data. As well, other difficulties with the data include:

- the tests are applicable only if the observations within and between samples can be treated as independent of one another. In many cases, however, this independence may not exist.

- all laboratory analytical techniques have detection limits below which only "less than" values may be reported. The reporting of less than values provides a degree of quantification, but even at their detection limits, the concentration levels of particular contaminants may be of considerable importance because of their potential health hazard. How does one then calculate the necessary statistics for use in Equation (1), or modifications thereof?

ALTERNATIVE FORMS

Out of the fundamental developments by Gosset and Fisher, a number of different tests for statistical discrimination have been developed. The different tests include:

(i) the two sample t-test, which requires that all three assumptions (a) through (c) indicated above be met;

(ii) modified t-tests (e.g. Satterthwaite (1946), Behrens (1929), and Cochran's approximation to the Behrens-Fisher test (see Cochran (1964))), which relax the stringency of assumptions (a) and (b). As well, the t-test is reasonably insensitive to moderate deviations from normality in the distribution of the data. As an example, the Resource Conservation and Recovery Act assumes that a sample with a coefficient of variation less than 1.00 is likely to have a normal distribution (Federal Register, 1982);

(iii) paired sample t-tests, which are used when the sample populations are not independent, such as occurs when successive sampling of the same water takes place upstream and downstream of some source.

Table 1
Summary Table of t-Test Statistics, Degrees of Freedom and Assumptions

Two Sample t-Test
  t statistic: $t = \dfrac{|\bar x - \bar y|}{S\sqrt{1/m + 1/n}}$
  Degrees of freedom: df = m + n - 2
  Comments: Since $\sigma$ is unknown, it is replaced by S, the (pooled) sample standard deviation. The same formulae are used with transformed data as with untransformed data.

Satterthwaite Approximation to the Two Sample t-Test
  t statistic: $t = \dfrac{|\bar x - \bar y|}{\sqrt{S_x^2/m + S_y^2/n}}$
  Degrees of freedom: $df = \dfrac{(S_x^2/m + S_y^2/n)^2}{(S_x^2/m)^2/df_x + (S_y^2/n)^2/df_y}$, with $df_x = m - 1$ and $df_y = n - 1$
  Comments: round 'df' down to the next nearest integer.

Cochran's Approximation to the Behrens-Fisher Test
  t statistic: as for the Satterthwaite approximation
  Comments: with $W_x = S_x^2/m$ and $W_y = S_y^2/n$, the comparison t-statistic is $t' = \dfrac{W_x t_x + W_y t_y}{W_x + W_y}$, where $t_x$ is taken from t tables with m-1 degrees of freedom and $t_y$ from t tables with n-1 degrees of freedom.

Paired t-Test
  t statistic: $t = \dfrac{\bar D}{S_D/\sqrt{m}}$, where $D_i = x_i - y_i$ for $i = 1, \ldots, m$ and $\bar D = \frac{1}{m}\sum_{i=1}^{m} D_i$
  Degrees of freedom: df = m - 1
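The Satterthwaite and Cochran rows of Table 1 can be sketched from summary statistics alone, taking $W_x = S_x^2/m$ and $W_y = S_y^2/n$ as inputs. The function names are hypothetical, and the usage line plugs in the summary values of the upstream/downstream example worked in the text ($\bar x = 1.69$, $\bar y = 4.74$, $S_x^2/m = 1.35$, $S_y^2/n = 1.83$, m = n = 10). The tabulated value 1.833 is the standard one-sided 5% t value for 9 degrees of freedom.

```python
import math


def satterthwaite(xbar, ybar, wx, wy, m, n):
    """Satterthwaite approximation of Table 1, with wx = Sx^2/m, wy = Sy^2/n.
    Returns (t, df before rounding, df rounded down)."""
    t = abs(xbar - ybar) / math.sqrt(wx + wy)
    df = (wx + wy) ** 2 / (wx ** 2 / (m - 1) + wy ** 2 / (n - 1))
    return t, df, math.floor(df)


def cochran_critical(wx, wy, tx, ty):
    """Cochran's weighted critical value t' = (wx*tx + wy*ty) / (wx + wy),
    where tx and ty are tabulated t values with m-1 and n-1 degrees of freedom."""
    return (wx * tx + wy * ty) / (wx + wy)


t1, df, df_used = satterthwaite(1.69, 4.74, 1.35, 1.83, 10, 10)
t_prime = cochran_critical(1.35, 1.83, 1.833, 1.833)
print(f"t = {t1:.2f}, df = {df:.1f} -> {df_used}, Cochran t' = {t_prime:.3f}")
```

With the rounded inputs the unrounded df comes out near 17.6 rather than the 17.5 reported in the text; either way it is taken as 17. When m = n, both tabulated values coincide and Cochran's t' reduces to the ordinary tabulated value.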

A summary of the mathematics implied in some of the resulting tests is included as Table 1.

As an example of the difficulties of test selection, the surface water quality monitoring results obtained from measuring both upstream and downstream of a potential non-point source are included as columns II and III in Table 2. Some remedial technologies were implemented in October/November 1980, and the water quality monitoring data measured in 1981/82 are as characterized by column V. Of interest are two questions: (i) Is the source contributing significantly to the river? and (ii) Did the remedial technologies significantly impact the water quality? Each will be briefly addressed.

Statistical Discrimination for Non-Point Loadings (Columns II and III)
Using Satterthwaite's Approximation, an examination of the upstream and downstream concentrations finds

  $\bar x = 1.69$, $m = 10$, $\nu_m = 9$, $S_x^2/m = 1.35$
  $\bar y = 4.74$, $n = 10$, $\nu_n = 9$, $S_y^2/n = 1.83$
  $t_1 = 1.71$, $\nu_1 = 17.5$, which is then taken as 17.

Finally, for a one-sided test, standard t tables give $t_c = t_{0.05} = 1.74$. Since $t_1 < t_c$, a statistically significant change has not been identified at the 95% level. However, a visual inspection of the upstream/downstream data clearly demonstrates that the downstream water quality is poorer. For the type of correlation existing between upstream and downstream points, pairing the individual observations and then observing only the differences between the observations is appropriate. Once the differences in the pairs are calculated, they are treated as a single random, independent sample. This capability is particularly important for data series possessing seasonality: although the paired test has about half the degrees of freedom of the two-sample t-test, it does not "see" the cyclical variation which affects both populations and thus does not include it in the calculation of the standard

TABLE 2
Upstream and Downstream Water Quality Monitoring Records

Pre-remedial records (I: dates of sampling 10/79 to 9/80)
  II  Upstream measurements (mg/L):   .29, 12, .32, .49, .14, 1.58, 1.77, 1.07, .07, .14
  III Downstream measurements (mg/L): 4.3, 16, 6.1, 2.66, 3.0, 4.42, 5.74, 1.40, 1.49, 2.3
  Mean:                II 1.69    III 4.74
  Standard deviation:  II 3.67    III 4.28

Post-remedial records (IV: dates of sampling 10/81 to 9/82)
  V   Downstream measurements (mg/L): .53, 1.5, 1.3, 2.1, 1.1, 1.8, 1.2, .64, 1.1, 1.25, .50

TABLE 3
Impact of Alternative Equality Assignments

I        II              III     IV      V
Data     Phenol          (i)     (ii)    (iii)
Number   (mg/L)
1        4               4       4       4
2        46              46      46      46
3        <1              1       .5      0
4        <1              1       .5      0
5        <1              1       .5      0
6        2               2       2       2
7        <1              1       .5      0
8        <1              1       .5      0
9        3               3       3       3
10       1               1       1       1
11       4               4       4       4
12       <1              1       .5      0
13       9               9       9       9
14       8               8       8       8
15       2               2       2       2
16       2               2       2       2
17       7               7       7       7

Mean                     5.5     5.35    5.18
Standard deviation       10.8    10.8    10.9
Mean (no outlier)*       3.0     2.8     2.6
Standard deviation       2.7     2.9     3.0
(no outlier)*

Notes: * with removal of the outlier. Location of monitoring data: phenol levels in Schneider Creek, June 1976, by grab sample with a frequency not exceeding one per day.

error.
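The paired test of Table 1 reduces to a one-sample t-test on the differences. A minimal sketch follows; the function names are hypothetical, the small data set in the check is made up, and the usage line uses the summary values reported for the upstream/downstream example ($\bar D = 2.95$, $S_D = 1.55$, m = 10).

```python
import math
from statistics import mean, stdev


def paired_t_from_summary(dbar, s_d, m):
    """Paired t statistic from summary values; returns (t, degrees of freedom)."""
    return dbar / (s_d / math.sqrt(m)), m - 1


def paired_t(x, y):
    """Paired t-test of Table 1: t = Dbar / (S_D / sqrt(m)), D_i = x_i - y_i."""
    d = [xi - yi for xi, yi in zip(x, y)]
    return paired_t_from_summary(mean(d), stdev(d), len(d))


t, df = paired_t_from_summary(2.95, 1.55, 10)
print(f"t = {t:.2f} with {df} degrees of freedom")
```

The resulting t of about 6.0 on 9 degrees of freedom is then compared with the tabulated one-sided value of 1.83.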

Using the paired test, $\bar D = 2.95$, $S_D = 1.55$ and $t = 6.0$. For 9 degrees of freedom, $t_c = 1.83$ at the 95% level, and a statistically significant impact ($t \ge t_c$) is profoundly demonstrated (as the data very logically indicate must be the situation).

Impact of the Remedial Technology
Using Satterthwaite's Approximation in comparing pre- and post-remedial measurements gives $t^* = 2.56$ and $t_c = 1.7$, which indicates that the remedial measures have had a statistically significant impact.

DETECTION LIMIT DATA ANALYSES

Recognition of the carcinogenic relationships of many constituents (most notably the chlorinated carbons), even at concentrations in the range of the detection limits of the instrumentation, has created a difficulty in discrimination analyses. Utilization of the parametric discrimination procedures requires replacement of "less than" detection level data by quantified values. Frequent practice to allow quantification of the parameter set is to assign less than values as "equal to" either (i) the detection limit, (ii) one-half the detection limit, or (iii) zero. These assignments provide a degree of quantification but may seriously affect subsequent utilization of the parameters used in the t-test.

As an indication of the consequences, consider the example where the 'less than' values are assumed equal to their detection limit ((i) above). In this case, the estimate of the mean is high and the estimate of the standard deviation is low. As an illustration of the problem, consider the phenol concentration data reported in Table 3 for Schneider Creek in June 1976. The resulting statistical parameter calculations when the less than values are replaced in accord with (i), (ii) and (iii) are as indicated in columns III through V of Table 3. The impacts of assuming (i), (ii) or (iii) include the results that Snedecor's F-test might incorrectly suggest that the variance

Figure 1. Probability of Exceedance of Phenol Concentrations (concentrations plotted against probability of exceedance on lognormal probability paper)

computed from these data is significantly different from the variance calculated from other data (e.g. post remedial technology). Alternatively, the results could suggest an impact has occurred because of the incorrect estimates of means and variances where, in fact, there is no impact (a false positive).
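The sensitivity analysis suggested below, which recomputes the statistics under each of the assignments (i), (ii) and (iii), can be sketched on the 17 phenol values of Table 3, with a detection limit of 1 mg/L. The names `PHENOL`, `fill_censored` and `sensitivity` are hypothetical; `None` marks a "<1" report. Up to rounding, the output reproduces the means and standard deviations listed in Table 3, with and without the outlier.

```python
from statistics import mean, stdev

# Phenol concentrations of Table 3 (mg/L); None marks a "<1" report.
PHENOL = [4, 46, None, None, None, 2, None, None, 3, 1, 4, None, 9, 8, 2, 2, 7]
DETECTION_LIMIT = 1.0

# The three equality assignments (i), (ii) and (iii) described in the text.
ASSIGNMENTS = {"i": DETECTION_LIMIT, "ii": DETECTION_LIMIT / 2, "iii": 0.0}


def fill_censored(data, value):
    """Replace every below-detection entry (None) by `value`."""
    return [value if v is None else float(v) for v in data]


def sensitivity(data, drop_outlier=False):
    """Mean and standard deviation under each assignment; optionally drop the
    largest value (the outlier 46 in Table 3) first."""
    results = {}
    for label, value in ASSIGNMENTS.items():
        vals = fill_censored(data, value)
        if drop_outlier:
            vals.remove(max(vals))
        results[label] = (mean(vals), stdev(vals))
    return results


for drop in (False, True):
    for label, (avg, sd) in sensitivity(PHENOL, drop_outlier=drop).items():
        print(f"outlier removed={drop} ({label}): mean={avg:.2f} sd={sd:.2f}")
```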

In response to these concerns, several procedures are possible; it is potentially best to utilize more than a single procedure and interpret the collective findings. The suggestions include:

- Examine the significance tests by sensitivity analyses, i.e. utilize each of (i), (ii) and (iii) previously described. If all the tests are in agreement, then the assumption utilized is unimportant.

- Fit a probability distribution to the information above the technological limit. To be successful, a reasonable portion of the data must be in excess of the detection limit. Interesting candidate distributions include the normal and lognormal distributions, which tend to be useful in describing many data sets because of the central limit theorem.

- Utilize a non-parametric test, such as the Mann-Whitney test, instead of the t-test.

If the statistical outlier (the second monitored value) is left in the data set, little difference in the calculated means and standard deviations is apparent from the reported values in the second last row of Table 3.

However, if the outlier is removed, the changes associated with assumptions (i), (ii) or (iii) are potentially more important.

As an alternative procedure, the ranked data and plotting positions using the Weibull plotting formula (m/(n+1), where m = the rank of the sample and n = the number of samples) are shown in Figure 1 on lognormal probability paper. Determining the best-fit line to the resulting data must be carried out with caution; with a brief data set, one outlier (e.g. the value of 46) can substantially bias the resulting line. The resulting statistics to characterize the data are:

- when all values are included (Line A on Figure 1): $\bar x = 10.4$, $S = 8.24$
- when the outlier is not included (Line B on Figure 1): $\bar x = 1.79$, $s = 1.91$
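The plotting-position procedure can be sketched numerically. This is one possible reading, assuming that the six "<1" values occupy the lowest ranks (so they count in n but are not plotted), that exceedance ranks run in descending order, and that the line is fitted by least squares to ln(x) against the standard normal quantile, rather than by eye as on probability paper. The function names are hypothetical, and the fitted numbers are not expected to reproduce the statistics quoted above.

```python
import math
from statistics import NormalDist


def weibull_positions(detected, n_total):
    """Weibull plotting positions P = m/(n+1) for exceedance, m = rank in
    descending order. Values reported as '<DL' are assumed to take the lowest
    ranks: they count in n_total but are not plotted."""
    ranked = sorted(detected, reverse=True)
    return [(x, m / (n_total + 1)) for m, x in enumerate(ranked, start=1)]


def fit_lognormal_line(points):
    """Least-squares line ln(x) = mu + sigma * z on lognormal probability
    paper, z being the standard normal quantile of 1 - P."""
    z = [NormalDist().inv_cdf(1.0 - p) for _, p in points]
    y = [math.log(x) for x, _ in points]
    z_bar, y_bar = sum(z) / len(z), sum(y) / len(y)
    sxy = sum((a - z_bar) * (b - y_bar) for a, b in zip(z, y))
    sxx = sum((a - z_bar) ** 2 for a in z)
    sigma = sxy / sxx
    return y_bar - sigma * z_bar, sigma


# Values of Table 3 above the "<1" detection limit (6 of the 17 values are "<1").
detected = [4, 46, 2, 3, 1, 4, 9, 8, 2, 2, 7]
pts_a = weibull_positions(detected, n_total=17)                          # all values
pts_b = weibull_positions([v for v in detected if v != 46], n_total=16)  # no outlier
for name, pts in (("A", pts_a), ("B", pts_b)):
    mu, sigma = fit_lognormal_line(pts)
    print(name, f"mu={mu:.2f} sigma={sigma:.2f}",
          f"lognormal mean={math.exp(mu + sigma ** 2 / 2):.2f}")
```

Comparing the fitted slopes for lines A and B shows directly how much the single value of 46 steepens the line.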

As the third procedure, non-parametric procedures such as the Mann-Whitney test do not require the assignment of means and standard deviations and therefore avoid the problem to some extent. On the negative side, non-parametric tests typically do not have the same discrimination capability and so may not be as effective in application.

CONCLUSIONS
To make inferences about the means of small samples, the t-distribution, which describes the distribution of the means of small samples from a normally distributed population, is frequently chosen as the reference. Whether this is valid or not depends somewhat upon the purpose of the testing and how the test is applied. The concept of "statistical significance" must be reflected in a number of aspects of the monitoring program, involving not just the choice of the level of significance but also the choice of the test and the requirements on the number of samples.

REFERENCES
Behrens, W.V., Landwirtsch. Jahrb., 68, 1929, 807.
Cochran, W., "Approximate Significance Levels of the Behrens-Fisher Test", Biometrics, March 1964, p. 191.
Federal Register, Rules and Regulations, Vol. 47, No. 143, July 26, 1982.
Satterthwaite, F.E., Biometrics Bulletin, 2, 1946, 110.
Snedecor, G.W., and Cochran, W.G., Statistical Methods, The Iowa State University Press, 6th Edition, 1967.

GLOBAL VARIANCE AND ROOT MEAN SQUARE ERROR ASSOCIATED WITH LINEAR INTERPOLATION OF A MARKOVIAN TIME-SERIES

D.A. CLUIS

INRS-Eau, Université du Québec, C.P. 7500, Sainte-Foy (Québec), Canada G1V 4C7

ABSTRACT
Most general-purpose data acquisition networks provide equispaced instantaneous information; the frequency of measurements necessary to obtain this information efficiently is related to the intrinsic temporal variability of the given phenomenon. Thus, meteorological and hydrometric phenomena are sampled more intensively than are more stable groundwater variates. Once the data have been acquired, the estimation of values which would have been taken by a time-series at higher frequency, or of the combination of two or more time-series measured at different frequencies, is a frequent but unsolved problem. In the field of water quality monitoring, for example, the estimation of mass-discharges is a prerequisite for the interpretation of transport phenomena, source-effect relationships and trend detection. To evaluate this important secondary variate, one must combine high frequency/high variability flow data with low frequency/low variability concentration data; this can be done by using some combination of aggregation and interpolation of data. The aggregation of high frequency data has relatively minor effects, e.g. a reduction of the variance and a modification of the persistence structure in the transformed data. However, the spreading of the information resulting from linear interpolation creates a certain level of heteroscedasticity and also produces an error of estimation, the variance of which increases with the number of partitions. For phenomena exhibiting short-term positive Markovian persistence, the analytical expression for the global variance of the estimation error was first established for skipped series derived from actual measurements; using the self-similar persistence structure of the Markovian processes, the Root Mean Square Error (RMSE) of the interpolated time-series was then deduced. Thus, a criterion relating the short-term persistence to the number of partitions allows one to control and limit the level of error in the transformed time-series.

INTRODUCTION
Sometimes one wishes to combine two or more geophysical time-series which are sampled systematically at time intervals whose time steps may or may not be integral multiples of each other.

Consideration of only simultaneously sampled events of the series would result in a major loss of information. This is especially true if the basic data consist of the equispaced instantaneous information provided by most general-purpose data acquisition networks; in this case the frequency of measurements necessary to obtain this information efficiently is related to the intrinsic temporal variability of the given phenomenon and could differ by some orders of magnitude. Thus, meteorological and hydrometric phenomena are sampled more intensively than are more stable groundwater variates.

In the field of water quality monitoring, for example, the estimation of mass-discharges is required for the interpretation of transport phenomena, source-effect relationships and trend detection. To evaluate this important secondary variate, one has to combine high frequency/high variability flow data with low frequency/low variability concentration data. In the Province of Quebec, water levels leading to published compounded mean daily discharges are recorded every 15 minutes, while the monitoring network for running-water quality parameters provides samples regularly every 3-4 weeks. An estimation of the mass-discharges, or the fluxes, of pollutants can be made by using some combination of aggregation and interpolation, but it is important to evaluate how much noise is added by this manipulation of the data. Knowledge of the level of this noise can be of extreme value in the field of water-quality modeling, where "measured" mass-discharges are used for calibration purposes. Disregarding the inaccuracy of the reference values could lead to futile attempts at further fine-tuning of a model when the measured data are actually already fully exploited. The definition of a suitable time-interval for the calculation of a time-series combination variate, as well as the estimation of the errors involved, constitutes a frequent but unsolved problem.

The aggregation of high frequency data does have structural consequences, e.g. a reduction of the variance and a modification of the persistence structure in the transformed data, but it does not introduce external information into the transformed time-series. However, the spreading of the information resulting from linear interpolation creates a certain level of heteroscedasticity and modifies the persistence structure, but

also induces an error of estimation, the variance of which increases with the number of partitions. We deal with this problem in this paper.

Hypothesis
The sampled time-series $Z_i$ is taken to follow a classical additive scheme:

$$Z_i = T_i + S_i + X_i \qquad (1)$$

where $T_i$ and $S_i$ represent the trend and seasonal components of the studied phenomenon, $X_i$ the short-term temporal fluctuations of interest, and $i$ an index of the time of occurrence. In this paper, we assume that a sufficiently long record of data is available to accurately identify and remove ab initio the long-term trend T and the seasonal variations S. The following developments deal essentially with the stationary innovation component X. The statistical model used for this component is consistent with the short-term behaviour of numerous geophysical time-series: the sampled process behaves as identically distributed and correlated random variables, having the same zero mean and variance $\sigma^2$; its autocorrelation structure is only a function of the time separation between the concerned sampled data; furthermore, an exponential decay of the autocorrelogram is assumed according to the relation

$$r_k = r_1^{\,k}$$

where $r_1$ is the lag-one autocorrelation coefficient corresponding to a unit sampling interval. In this development, we are especially interested in the practical case of a strongly positive dependence ($r_1 > 0$). As we are dealing here with intrinsic properties, and in order to avoid the use of multiple notations, we will not attempt to formally distinguish the statistics of the process from those of their estimates, in the case of large samples with neglected end effects.

LINEAR INTERPOLATION

Iterative linear interpolation (ILI) is one of the most common and practical estimation techniques used to generate values at a frequency higher than that of the measured hydrological variate. It is clear that no new information is created by interpolating linearly; however, the original content is spread in time. When only two data points, located at the ends of the time-interval which is studied retrospectively, are considered at a particular time, ILI cannot be considered as an optimal estimator. However, depending on the variability of the process under study, ILI is a more powerful estimator for the simulation of intermediate values than that obtained by a mathematical expectation which uses a general or a seasonal mean value. Numerical approximation theory provides refined means, including interpolating polynomials and spline functions, for obtaining more powerful estimators which take advantage of the short-term temporal trends in the measured data. Nevertheless, due to its simplicity, ILI remains one of the most common and practical estimation techniques used in hydrology to generate values at frequencies higher than those of the measured variate.

Consider the data series $X_i$ $(i = 1, \ldots, k)$ of length k. Each interval between consecutive values of $X_i$ is then divided into p equal intervals, giving the interpolated series $Y_j$ $(j = 1, \ldots, N)$ of length $N = (k-1)p + 1$. Each term $Y_j$ is defined as a linear interpolate of the series $X_i$ by the following equation:

$$Y_j = Y_{(i-1)p+p'} = \frac{(p'-1)\,X_{i+1} + (p-p'+1)\,X_i}{p} \qquad (2)$$

where
i is an integer varying from 1 to k;
p' is an integer varying from 1 to p, except for i = k where p' = 1;
p is a fixed integer defining the subdivision level used in the interpolation.
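Equation (2) translates directly into a short loop. The sketch below is a minimal, assumed implementation (the name `ili` is hypothetical): 0-based Python indexing means `x[i - 1]` is $X_i$ and `x[i]` is $X_{i+1}$, and the last point $X_k$ is appended separately because $p' = 1$ there.

```python
def ili(x, p):
    """Linear interpolation of equation (2): each interval between consecutive
    X_i is split into p equal steps; returns Y_1..Y_N with N = (k-1)p + 1."""
    y = []
    for i in range(1, len(x)):          # i = 1..k-1 in the notation of the text
        for pp in range(1, p + 1):      # pp plays the role of p'
            y.append(((pp - 1) * x[i] + (p - pp + 1) * x[i - 1]) / p)
    y.append(float(x[-1]))              # i = k, where p' = 1 gives Y_N = X_k
    return y


print(ili([0, 3, 6], 3))
```

For example, `ili([0, 3, 6], 3)` returns `[0.0, 1.0, 2.0, 3.0, 4.0, 5.0, 6.0]`: k = 3 and p = 3 give N = 7 points, and every original value reappears at positions j = 1, p + 1, 2p + 1.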

Mean values

By definition $\bar X = \frac{1}{k}\sum_{i=1}^{k} X_i$ and $\bar Y = \frac{1}{(k-1)p+1}\sum_{j=1}^{N} Y_j$. Replacing $Y_j$ by its definition (equation [2]) and performing the summations, one obtains:

$$\bar Y = \frac{2pk\,\bar X - (p-1)(X_1 + X_k)}{2\left[(k-1)p + 1\right]}$$

If the unknown end-values $X_1$ and $X_k$ are estimated by the mean value $\bar X$, or if the length of the generating series $X_i$ is large, then $\bar Y$ becomes an asymptotically unbiased estimate of $\bar X$. Making use of this property, and in order to keep the mathematical derivations as concise as possible, the developments have been limited to the case of large, centered, stationary data series $X_i$. Given this large-sample restriction, $\bar X = \bar Y = 0$ and the influence of the end-values becomes negligible.
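The closed form for $\bar Y$ can be checked numerically. The sketch below uses the hypothetical helper names `interpolate` and `mean_closed_form`, and relies on exact rational arithmetic from Python's `fractions` module so that the comparison involves no rounding.

```python
from fractions import Fraction


def interpolate(x, p):
    """Equation [2] evaluated exactly for integer (or rational) data."""
    y = [Fraction((pp - 1) * b + (p - pp + 1) * a, p)
         for a, b in zip(x, x[1:]) for pp in range(1, p + 1)]
    return y + [Fraction(x[-1])]


def mean_closed_form(x, p):
    """Y-bar = [2 p k X-bar - (p - 1)(X_1 + X_k)] / (2[(k - 1) p + 1])."""
    k = len(x)
    xbar = Fraction(sum(x), k)
    return (2 * p * k * xbar - (p - 1) * (x[0] + x[-1])) / (2 * ((k - 1) * p + 1))


x = [3, 1, 4, 1, 5, 9, 2, 6]          # arbitrary integer data
for p in (1, 2, 3, 7):
    y = interpolate(x, p)
    assert sum(y) / len(y) == mean_closed_form(x, p)   # exact equality
```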

ILI therefore preserves the first-order stationarity of the series. However, second-order stationarity is not retained, stricto sensu, since it will be shown that the variance depends on the location $p'$ of the interpolated point. To circumvent this problem we have retained a broader definition of stationarity given by PANKRATZ (1983, p. 16), which states that: "if a data-series is stationary, then the variance of any major subset will differ from the variance of any other subset only by chance". To keep track of this restriction, we call "global" the variance obtained by summations over integral numbers of partitions.

GLOBAL VARIANCE

Denoting by $\sigma^2$ and $r_1$ the variance and the lag-1 autocorrelation coefficient of the data series $X_i$, an unbiased expression of the variance $S^2$ of the series $Y_j$ may be written as:

$$S^2 = \frac{1}{N}\sum_{j=1}^{N} Y_j^2$$

Applying the variance operator to the running point $Y_{(p,p')} = Y_{(i-1)p+p'}$ of the interpolated series expressed by equation [2], one gets:

$$\operatorname{Var}\left(Y_{(i-1)p+p'}\right) = \frac{\sigma^2}{p^2}\left[(p'-1)^2 + (p-p'+1)^2 + 2\,(p'-1)(p-p'+1)\,r_1\right]$$

This equation shows that the interpolated series $Y_j$ is not homoscedastic, i.e. the variance is not independent of the location $p'$ of the interpolated point. The minimal variance occurs at the middle of the interpolated segment. However, for series with positive persistence this non-stationarity in the variance is not very severe, and for high positive values of $r_1$ the effect can be neglected.

For an integral number of partitions, the series $Y_j$ considered as a whole possesses a finite and stable global variance. This variance is obtained by summing the preceding expression for $p'$ between 1 and $p$; neglecting the end-effects, one obtains:

$$\frac{S^2}{\sigma^2} = \frac{2p^2 + 1 + (p^2 - 1)\,r_1}{3p^2}$$

The asymptotic global variance of an interpolated series is therefore always smaller than the variance of the data series. The influence of the lag-one autocorrelation coefficient $r_1$ of $X_i$ is generally larger than that of the partition level $p$; it is displayed in Table 1.

Table 1: Relative global variance $S^2/\sigma^2$ as a function of the partition level p (rows) and of r1 (columns).

p \ r1   0      0.1    0.2    0.3    0.4    0.5    0.6    0.7    0.8    0.9
2        0.750  0.775  0.800  0.825  0.850  0.875  0.900  0.925  0.950  0.975
5        0.680  0.712  0.744  0.776  0.808  0.840  0.872  0.904  0.936  0.968
10       0.670  0.703  0.736  0.769  0.802  0.835  0.868  0.901  0.934  0.967
∞        0.667  0.700  0.733  0.767  0.800  0.833  0.867  0.900  0.933  0.967
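The sketch below assumes a Markovian series of unit variance. It evaluates the local variance of the running point, checks that its average over $p' = 1, \dots, p$ equals the closed-form global variance, and prints rows laid out like Table 1. The function names are illustrative.

```python
def local_variance_ratio(p, pp, r1):
    """Var(Y)/sigma^2 at position p' = pp (1..p) of an interpolated segment."""
    a, b = pp - 1, p - pp + 1          # weights of X_{i+1} and X_i (times p)
    return (a * a + b * b + 2 * a * b * r1) / (p * p)


def global_variance_ratio(p, r1):
    """Closed form S^2/sigma^2 = [2p^2 + 1 + (p^2 - 1) r1] / (3 p^2)."""
    return (2 * p * p + 1 + (p * p - 1) * r1) / (3 * p * p)


# The closed form is the average of the local variances over one partition
for p in (2, 5, 10):
    for r1 in (0.0, 0.3, 0.9):
        average = sum(local_variance_ratio(p, pp, r1) for pp in range(1, p + 1)) / p
        assert abs(average - global_variance_ratio(p, r1)) < 1e-12

for p in (2, 5, 10):
    print(p, " ".join(f"{global_variance_ratio(p, k / 10):.3f}" for k in range(10)))
print("inf", " ".join(f"{(2 + k / 10) / 3:.3f}" for k in range(10)))  # p -> infinity
```

The local variance is largest at $p' = 1$ (a measured value, ratio 1) and smallest in the middle of the segment, which is the heteroscedasticity described above.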

ERROR OF ESTIMATION INDUCED BY ILI

The expected Root Mean Square Error (RMSE) resulting from a linear interpolation with $p$ partitions will now be derived analytically. To achieve this, we consider the errors involved with the $p$ possible realizations of the $p$-skipped measured data. We then make use of the self-similar persistence structure of Markovian processes to deduce the RMSE of the interpolated series of subdivision level $p$.

Case where p = 2

Starting with the original series $X_i$ measured at unit time intervals, we consider the two possible realizations $X'_i$ and $X''_i$. Each is obtained by skipping the series $X_i$ every two time units and then interpolating in-between (interpolated values are marked with an asterisk):

$X'_i = X_1,\ X'^{*}_2,\ X_3,\ X'^{*}_4,\ X_5,\ X'^{*}_6,\ \dots$
$X''_i = X''^{*}_1,\ X_2,\ X''^{*}_3,\ X_4,\ X''^{*}_5,\ X_6,\ \dots$

The estimated values $X'^{*}_i$ or $X''^{*}_i$ are linear interpolates of type $(X_{i-1} + X_{i+1})/2$. The two series of estimation errors $X - X'$ and $X - X''$ consist of zeros alternating with generic terms $X_i - [X_{i-1} + X_{i+1}]/2$. Neglecting the end effects, the expected variance of the estimation error for the two series (total length $2N$) is:

$$E^2 = \frac{1}{2N}\sum_{i=1}^{N}\left(X_i - \frac{X_{i-1} + X_{i+1}}{2}\right)^2$$

Introducing the variance and covariances of the sampled Markovian series $X_i$ (for which $r_2 = r_1^2$), one gets:

$$\frac{E^2}{\sigma^2} = \frac{3}{4} - r_1 + \frac{1}{4}\,r_2 = \frac{3}{4} - r_1 + \frac{1}{4}\,r_1^2 \tag{4}$$

This expression relates the variance of the estimation error to the variance and the persistence structure of the sampled data. The preceding result applies to an interpolation of subdivision level 2 built on a measured series of unit time step.

Suppose one now wishes to interpolate a series to a time step half that of the measured data. The estimation error itself cannot be known, because the intermediate times $\tfrac12, \tfrac32, \tfrac52, \dots$ were not sampled. Its variance can nevertheless be estimated, because of the self-similar structure of the Markovian process: in equation (4) one has only to replace the lag-one autocorrelation coefficient by $r_1^{1/2}$, the lag-one autocorrelation coefficient of the non-sampled series (time step $\tfrac12$). This works because, in a Markovian world, $r_1$ is the lag-two autocorrelation coefficient of this series.

Thus, in this particular case, consider a Markovian process of mean zero and variance $\sigma^2$ sampled at unit times. The global variance of the series interpolated at time step $\tfrac12$ is $S^2 = \sigma^2\,(3 + r_1)/4$, and the expected variance of the error of estimation is:

$$E^2 = \sigma^2\left[\frac{3}{4} - r_1^{1/2} + \frac{1}{4}\,r_1\right]$$
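The self-similarity argument can be probed by simulation. The sketch below assumes a Gaussian AR(1) process as the Markovian model. It simulates the unsampled half-step series directly (lag-one autocorrelation $r_1^{1/2}$), keeps every second value as the "measured" series, and interpolates the midpoints. It then compares the resulting error variance with $\sigma^2[3/4 - r_1^{1/2} + r_1/4]$. Series length, seed and function names are arbitrary choices of this sketch.

```python
import random


def ar1(n, phi, rng):
    """Stationary Gaussian AR(1) series with unit variance and lag-one autocorrelation phi."""
    x = [rng.gauss(0.0, 1.0)]
    sd = (1.0 - phi * phi) ** 0.5      # innovation s.d. that keeps Var(x) = 1
    for _ in range(n - 1):
        x.append(phi * x[-1] + sd * rng.gauss(0.0, 1.0))
    return x


def half_step_error_variance(r1, n=200_001, seed=1):
    """Error variance over the whole half-step series (midpoint errors plus zeros)."""
    rng = random.Random(seed)
    fine = ar1(n, r1 ** 0.5, rng)      # lag-one correlation at step 1/2 is r1**(1/2)
    sq = [(fine[m] - 0.5 * (fine[m - 1] + fine[m + 1])) ** 2 for m in range(1, n - 1, 2)]
    return sum(sq) / len(sq) / 2.0     # the retained (measured) half carries zero error


def half_step_error_theory(r1):
    return 0.75 - r1 ** 0.5 + 0.25 * r1


print(half_step_error_variance(0.64), half_step_error_theory(0.64))
```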

Generalization for any level of partitions p

Following the previously described pattern, we consider the $p$ possible realisations of the $p$-skipped series in which the intermediate values have been linearly interpolated:

$$X' = \left(X_1,\ X'^{*}_2,\ X'^{*}_3,\ \dots,\ X_{p+1},\ X'^{*}_{p+2},\ X'^{*}_{p+3},\ \dots,\ X_{2p+1},\ \dots\right)$$

and similarly for $X'', \dots, X^{(p)}$, each starting one time unit later. The estimated values $X', X'', \dots, X^{(p)}$ are linear interpolates of type:

$$X^{(j)}_{kp+j+p'-1} = \frac{(p'-1)\,X_{(k+1)p+j} + (p-p'+1)\,X_{kp+j}}{p}$$

In this expression, the index $j$ (varying from 1 to $p$) represents the rows, the index $p'$ (varying from 1 to $p$) represents the columns, and $k$ is an integer from 0 to $N/p$ identifying the partition.

Consider the set of the $p$ series of interpolation errors $\{(X - X'),\ (X - X''),\ \dots,\ (X - X^{(p)})\}$. Its generic term is:

$$X_{kp+j+p'-1} - \frac{(p'-1)\,X_{(k+1)p+j} + (p-p'+1)\,X_{kp+j}}{p}$$

Neglecting the end-effects, the variance of the estimation error for the $p$ series (total length $pN$) can be expressed by 3 terms. The first two summations represent the variances of the original series and of the $p$ interpolated series. In the latter, the interpolating values are $p$ time units apart, so their correlation is the lag-$p$ autocorrelation $r_p$:

$$\sigma^2 \qquad\text{and}\qquad \sigma^2\,\frac{2p^2 + 1 + (p^2-1)\,r_p}{3p^2}$$

The third summation gives:

$$\sigma^2\left[\frac{2}{p} - \frac{4}{p^2}\sum_{p'=1}^{p}(p-p'+1)\,r_{p'-1}\right]$$

If the persistence structure of the process is Markovian ($r_k = r_1^k$), then:

$$\sum_{p'=1}^{p}(p-p'+1)\,r_1^{\,p'-1} = \frac{p}{1-r_1} - \frac{r_1\,(1-r_1^{\,p})}{(1-r_1)^2}$$

Thus the variance of the estimation error of a $p$-skipped series whose intermediate values have been interpolated is:

$$\frac{E^2}{\sigma^2} = 1 + \frac{2p^2 + 1 + (p^2-1)\,r_1^{\,p}}{3p^2} + \frac{2}{p} - \frac{4}{p^2}\left[\frac{p}{1-r_1} - \frac{r_1\,(1-r_1^{\,p})}{(1-r_1)^2}\right] \tag{6}$$
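Equation [6] can be checked in the same way, again assuming a Gaussian AR(1) process. The sketch pools the squared errors of all $p$ possible $p$-skipped realisations, including the zeros at the retained points, and compares them with the closed form. The helper names are hypothetical, and $0 \le r_1 < 1$ is assumed.

```python
import random


def ar1(n, r1, rng):
    """Stationary Gaussian AR(1) series, unit variance, lag-one autocorrelation r1."""
    x = [rng.gauss(0.0, 1.0)]
    sd = (1.0 - r1 * r1) ** 0.5
    for _ in range(n - 1):
        x.append(r1 * x[-1] + sd * rng.gauss(0.0, 1.0))
    return x


def empirical_e2(x, p):
    """Pool the squared errors of the p possible p-skipped realisations (zeros included)."""
    total, count = 0.0, 0
    for j in range(p):                              # realisation j keeps x[j], x[j+p], ...
        for a in range(j, len(x) - p, p):           # segment between x[a] and x[a+p]
            for pp in range(1, p + 1):              # p' = 1..p; p' = 1 is a kept value
                est = ((pp - 1) * x[a + p] + (p - pp + 1) * x[a]) / p
                total += (x[a + pp - 1] - est) ** 2
                count += 1
    return total / count


def theoretical_e2(p, r1):
    """Equation [6]: E^2/sigma^2 for a p-skipped Markovian series (0 <= r1 < 1)."""
    rp = r1 ** p
    interp_var = (2 * p * p + 1 + (p * p - 1) * rp) / (3 * p * p)
    s = p / (1 - r1) - r1 * (1 - rp) / (1 - r1) ** 2
    return 1 + interp_var + 2 / p - 4 * s / (p * p)


x = ar1(60_000, 0.6, random.Random(3))
print(empirical_e2(x, 3), theoretical_e2(3, 0.6))
```

For $p = 2$, `theoretical_e2` reduces to $3/4 - r_1 + r_1^2/4$, which is equation (4).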

Applying now the same rationale as in the case $p = 2$, we take advantage of the self-similar structure of Markovian processes. This lets us estimate the expected variance of the estimation error for a series sampled at unit time and interpolated at intermediate times $\tfrac1p, \tfrac2p, \tfrac3p, \dots$ To achieve that, one has only to replace $r_1$ by $r_1^{1/p}$ in equation [6], because in this series $r_1$ constitutes the lag-$p$ autocorrelation coefficient:

$$\frac{E^2}{\sigma^2} = \frac{5}{3} + \frac{2}{p} + \frac{1 + (p^2-1)\,r_1}{3p^2} - \frac{4}{p\left(1 - r_1^{1/p}\right)} + \frac{4\,r_1^{1/p}\,(1 - r_1)}{p^2\left(1 - r_1^{1/p}\right)^2} \tag{7}$$

This equation gives the expected RMSE of interpolation relative to the variability $\sigma$ of the process, as a function of the lag-one autocorrelation coefficient $r_1$ and of the level of partition $p$. It fully takes into account the heteroscedasticity of the variance of the interpolated series, and the covariance terms that this heteroscedasticity introduces in the errors of estimation. As $p$ increases, equation (7) can be expanded according to l'Hôpital's rule:

$$\lim_{p\to\infty} p\left(1 - r_1^{1/p}\right) = -\ln r_1$$

so that, for very large values of $p$, the asymptotic expansion of equation [7] becomes:

$$E^2 = \sigma^2\left[\frac{5 + r_1}{3} + \frac{4}{\ln r_1} + \frac{4\,(1 - r_1)}{\ln^2 r_1}\right]$$

Table 2 illustrates the relative RMSE of estimation $E/\sigma$ resulting from ILI with $p$ partitions built on a Markovian series of parameter $r_1$. Figure 1 shows that the significant factor is the persistence parameter $r_1$. For higher values of $r_1$, very little noise variance is introduced by ILI, and an increase in the level of partition $p$ is not a significant factor. For lower values of $r_1$, the interpolation process, like any estimation technique, is less efficient. For $r_1 = 0$ (the case of independent sampled data), it can even lead to a larger RMSE of estimation than the "global" estimator. In the global estimator, intermediate unsampled data points are arbitrarily estimated with the general mean, and the expected relative RMSE of estimation is $E'/\sigma = [(p-1)/p]^{1/2}$.

Table 2: Expected relative RMSE of estimation E/σ resulting from ILI with p partitions built on a Markovian series of parameter r1.

r1     p = 2   p = 5   p = 10  p = ∞
0      0.866   1.13    1.21    1.29
0.1    0.677   0.782   0.796   0.801
0.2    0.594   0.680   0.691   0.695
0.3    0.527   0.600   0.610   0.613
0.4    0.466   0.530   0.539   0.541
0.5    0.410   0.465   0.472   0.475
0.6    0.354   0.401   0.408   0.410
0.7    0.297   0.337   0.342   0.344
0.8    0.236   0.267   0.271   0.272
0.9    0.162   0.184   0.186   0.187

[Figure 1: Relative RMSE as a function of r1 and p.]
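The following sketch evaluates equation (7), its large-$p$ limit, and the mean-value ("global") estimator $[(p-1)/p]^{1/2}$. Printed to three decimals, its rows can be compared with Table 2. It assumes $0 \le r_1 < 1$, since the expressions are singular at $r_1 = 1$; the function names are illustrative.

```python
import math


def rel_rmse_ili(p, r1):
    """E/sigma from equation (7): unit-time Markovian data interpolated at steps 1/p."""
    s = r1 ** (1.0 / p)                        # lag-one autocorrelation at time step 1/p
    e2 = (5.0 / 3.0 + 2.0 / p + (1.0 + (p * p - 1) * r1) / (3.0 * p * p)
          - 4.0 / (p * (1.0 - s))
          + 4.0 * s * (1.0 - r1) / (p * p * (1.0 - s) ** 2))
    return math.sqrt(e2)


def rel_rmse_ili_limit(r1):
    """Large-p limit of equation (7); the r1 -> 0 limit is sqrt(5/3)."""
    if r1 == 0.0:
        return math.sqrt(5.0 / 3.0)
    ln_r = math.log(r1)
    return math.sqrt((5.0 + r1) / 3.0 + 4.0 / ln_r + 4.0 * (1.0 - r1) / ln_r ** 2)


def rel_rmse_mean_fill(p):
    """'Global' estimator: the p - 1 unsampled points are set to the general mean."""
    return math.sqrt((p - 1) / p)


for r in [i / 10 for i in range(10)]:
    row = [rel_rmse_ili(p, r) for p in (2, 5, 10)] + [rel_rmse_ili_limit(r)]
    print(f"{r:.1f}", " ".join(f"{v:.3f}" for v in row))
```

For example, with $r_1 = 0$ and $p = 2$ ILI gives $E/\sigma \approx 0.866$, while the mean-value estimator gives $\approx 0.707$: this is the independent-data situation described above.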

The analytical results obtained for the variance and the expected error variance induced by the linear interpolation of a Markovian series $X_i\,[0;\ \sigma^2;\ r_k = r_1^k]$ are given in Table 3.

Table 3: Synthesis of the analytical results obtained with the interpolation of a series $X_i\,[0;\ \sigma^2;\ r_k = r_1^k]$.

                                  Relative variance of       Expected relative variance
                                  interpolation S²/σ²        of the error E²/σ²
p = 2, local value, p' = 1        1                          0
p = 2, local value, p' = 2        (1 + r1)/2                 3/2 - 2 r1^(1/2) + r1/2
p = 2, global value               (3 + r1)/4                 3/4 - r1^(1/2) + r1/4
large p, global value             (2 + r1)/3                 (5 + r1)/3 + 4/ln r1 + 4(1 - r1)/ln² r1

Occasional missing values

The preceding development allows us to evaluate the influence of using linear interpolation to fill in occasional missing values. Consider a time series of length $N$ with $n$ non-consecutive missing values. The expected RMSE for a series of length $N/2$ interpolated once in-between would be:

$$\frac{E}{\sigma} = \left[\frac{3}{4} - r_1 + \frac{r_1^2}{4}\right]^{1/2}$$

Thus, for only $n$ re-created data points, one obtains:

$$\frac{E}{\sigma} = \left[\frac{n}{N}\left(\frac{3}{2} - 2r_1 + \frac{r_1^2}{2}\right)\right]^{1/2}$$
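As a minimal sketch (with a hypothetical function name), the equation for $n$ re-created data points translates directly into code. Here $r_1$ is the lag-one autocorrelation of the series at its own time step. The gaps are assumed to be isolated, so that each one is filled from two measured neighbours.

```python
import math


def rel_rmse_missing(n_missing, n_total, r1):
    """E/sigma = [(n/N) (3/2 - 2 r1 + r1^2/2)]^(1/2) for n isolated gaps in a series of length N."""
    return math.sqrt(n_missing / n_total * (1.5 - 2.0 * r1 + 0.5 * r1 * r1))


# Hypothetical example: 12 isolated gaps in a daily record of 365 values with r1 = 0.8
print(round(rel_rmse_missing(12, 365, 0.8), 3))  # 0.085
```

With $n = N/2$ the expression reduces to the first equation (one interpolated value between every pair of measured ones).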

The equation for $n$ re-created data points gives a quantitative control over the variance of the error globally introduced into the time series.

Application to mass-discharge calculations

In the interpretation of chemical transport phenomena in rivers, mass-discharge estimation is an essential preliminary step. Let $c(t)$ and $q(t)$ be the continuous processes representing the variations with time of the concentrations and of the discharges. The product $c(t)\,q(t)$ then represents the flux of matter at time $t$, and $\int_{t_1}^{t_2} c(t)\,q(t)\,dt$ the mass discharge exported between times $t_1$ and $t_2$.

If simultaneous and equispaced time series are available, the mass-discharges are evaluated by discrete summations of type $\sum_{i=1}^{N} c_i\,q_i\,\Delta t$, according to the trapezoidal rule of numerical integration. If the sampling frequencies are not the same, data manipulations such as aggregation and interpolation are necessary to generate synchronous, equispaced data. Since numerous combinations of such manipulations can yield a common frequency for the transformed data, we consider the consequences of these manipulations. Suppose the transformed values of the processes $c(t)$ or $q(t)$ are to be integrated in time according to the trapezoidal rule. The errors introduced by linear interpolation to increase the frequency, and those introduced by eliminating intermediary values to reduce the frequency, then have a similar mathematical formulation. The only differences come from the number of original data points and from the value of the autocorrelation coefficient associated with the time interval.
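Below is a minimal sketch of the mass-discharge calculation. It assumes a discharge series on a regular grid and a sparser concentration series that is linearly interpolated onto that grid before a trapezoidal integration of $c(t)\,q(t)$. The function names, input layout and data are assumptions of this sketch, not a prescribed procedure.

```python
from bisect import bisect_right


def interp(t, ts, vs):
    """Linear interpolation of the samples (ts, vs) at time t; ts strictly increasing."""
    if t <= ts[0]:
        return vs[0]
    if t >= ts[-1]:
        return vs[-1]
    i = bisect_right(ts, t) - 1
    w = (t - ts[i]) / (ts[i + 1] - ts[i])
    return (1.0 - w) * vs[i] + w * vs[i + 1]


def mass_discharge(q_times, q, c_times, c):
    """Trapezoidal estimate of the integral of c(t) q(t) dt on the discharge time grid,
    after linearly interpolating the (less frequent) concentrations onto that grid."""
    flux = [interp(t, c_times, c) * qi for t, qi in zip(q_times, q)]
    return sum(0.5 * (flux[i] + flux[i + 1]) * (q_times[i + 1] - q_times[i])
               for i in range(len(q_times) - 1))


# Hypothetical data: hourly discharges (m3/s) and 6-hourly concentrations (mg/L = g/m3)
q_times = list(range(25))
q = [10.0 + 0.5 * t for t in q_times]
c_times, c = [0, 6, 12, 18, 24], [2.0, 2.4, 3.0, 2.6, 2.2]
load_kg = mass_discharge(q_times, q, c_times, c) * 3600 / 1000   # g/s x h -> kg
print(load_kg)
```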

This similarity of formulation should be the basis for maximizing the accuracy of mass-discharges. In this case, however, the variance of the error associated with the calculation is even more complex, because one should also take into account the covariances and the possible functional relationship between $c(t)$ and $q(t)$.

CONCLUSION

Within the context of large Markovian samples, the global variance and the expected RMSE of estimation resulting from iterative linear interpolation (ILI) have been derived. The results show that the level of error introduced by ILI is more sensitive to the persistence parameter $r_1$ than to the level of realized partition $p$. From a practical point of view, the preceding developments allow the user to control the amount of noise variance that he is willing to introduce into the recorded signal, in order to make use of the sampled information at a frequency higher than that of the measurements.

REFERENCE

PANKRATZ, A. (1983). Forecasting with Univariate Box-Jenkins Models. J. Wiley and Sons, 309 p.

EMPIRICAL POWER COMPARISONS OF SOME TESTS FOR TREND

K.W. HIPEL¹, A.I. MCLEOD² and P.K. FOSU³

¹ Professor, Department of Systems Design Engineering, University of Waterloo, Waterloo, Ontario.
² Associate Professor, Department of Statistical and Actuarial Sciences, University of Western Ontario, London, Ontario.
³ PhD Candidate, Department of Statistical and Actuarial Sciences, University of Western Ontario, London, Ontario.

ABSTRACT

Using Monte Carlo studies, the powers of Kendall's tau and the lag-one serial correlation are compared for detecting trends in time series. Simulation experiments demonstrate that tests based on Kendall's tau are more powerful than serial correlation tests for discovering deterministic trends. On the other hand, the lag-one serial correlation is more powerful when only purely stochastic trends are present.

1. INTRODUCTION

An important consideration in environmental impact assessment is whether a set of data is random or whether a systematic trend is present. There have been several investigations, both theoretical and empirical, of test statistics to be used to test for randomness. An overwhelming majority of the test statistics currently used are nonparametric. Kendall and Stuart (1979) and Kendall et al. (1983) developed and employed statistics which include the turning point test, the sign test, Kendall's tau (denoted by $\tau$) and the rank correlation coefficient. Authors such as Dietz and Killeen (1981), van Belle and Hughes (1984), Hirsch et al. (1982), Hirsch and Slack (1984) and Simon (1977) considered modified versions of Kendall's tau statistic. Cox (1966) investigated the empirical distribution of the lag-one serial correlation, $r_1$. Bartels (1982) compared the rank von Neumann statistic (RVN) to the runs test and the von Neumann statistic (VN). In that paper, the asymptotic relative efficiency (ARE) of RVN to VN was established as having a lower bound of 0.89. Knoke (1975, 1977, 1979) investigated the distribution of the serial correlation at different lags and how it could be employed in tests of randomness. For example, in the 1977 paper Knoke compared some nonparametric tests to $r_1$. He established that the AREs, with respect to $r_1$, of the rank serial correlation test and of the turning point test (for normal first order autocorrelation alternatives) are 0.91 and 0.19, respectively. Kendall et al. (1983) derived the ARE of $\tau$ relative to the regression estimator to be 0.89.

The main purpose of this paper is to rigorously compare the most promising tests for trend and autocorrelation. In particular, simulation is used to evaluate the powers of Kendall's tau and of $r_1$ against various alternative models. The power against an alternative hypothesis is the probability of rejecting the null hypothesis when the alternative hypothesis is true. Following the definition of the two test statistics, the alternative models to the null hypothesis are described. Finally, the alternative models are employed in simulation studies to ascertain the powers of $\tau$ and $r_1$.

2. TEST STATISTIC

Following the definition of $r_1$, Kendall's tau statistic is described. Let $x_1, \dots, x_n$ be a random sample of size $n$ drawn from a population. The first serial correlation $r_1$ is defined here as

$$r_1 = \frac{\sum_{t=1}^{n-1}(x_t - \bar x)(x_{t+1} - \bar x)}{\sum_{t=1}^{n}(x_t - \bar x)^2} \tag{1}$$

where

$$\bar x = \frac{1}{n}\sum_{t=1}^{n} x_t \tag{2}$$

For samples as small as 10, Cox (1966) observed that $r_1$ has an approximately normal null distribution, for normal as well as some nonnormal parent distributions. Knoke (1977) observed empirically that the normal distribution provides an adequate approximation for determining the critical regions. A critical region is the subset of the sample space such that, if the observed data fall in it, the null hypothesis is rejected. The asymptotic distribution of $r_1$ was established by Wald and Wolfowitz (1943) and Noether (1950). In this paper, critical regions for $r_1$ are determined by the normal approximation with the following moments (Kendall et al., 1983; Dufour and Roy, 1985): mean $= -1/n$ and variance

$$\operatorname{Var}(r_1) = \frac{(n-2)^2}{n^2(n-1)} \tag{3}$$
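The lag-one serial correlation $r_1$ and its normal approximation can be sketched as follows. The two-sided 5% decision rule is an assumption of this sketch, since this section does not state the sidedness of the test; the function names are illustrative.

```python
import math


def lag1_serial_correlation(x):
    """r1: lag-one sample autocorrelation about the sample mean."""
    n = len(x)
    mean = sum(x) / n
    num = sum((x[t] - mean) * (x[t + 1] - mean) for t in range(n - 1))
    den = sum((v - mean) ** 2 for v in x)
    return num / den


def r1_normal_score(x):
    """Standardize r1 with the null moments: mean -1/n, variance (n-2)^2 / (n^2 (n-1))."""
    n = len(x)
    var = (n - 2) ** 2 / (n * n * (n - 1))
    return (lag1_serial_correlation(x) + 1.0 / n) / math.sqrt(var)


def r1_rejects(x, z_crit=1.959964):
    """Two-sided 5 % test (assumed sidedness)."""
    return abs(r1_normal_score(x)) > z_crit


print(lag1_serial_correlation([1, 2, 3, 4, 5]), r1_normal_score([1, 2, 3, 4, 5]))
```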

Knoke (1975) noted that $r_1$ is a powerful test for detecting nonrandomness against first order autoregression alternatives, and that it performs reasonably well for a wider class of alternatives, including the first order moving average model.

The second statistic considered is Kendall's tau, $\tau$ (Kendall, 1970). For any two pairs of random variables $(X_i, Y_i)$ and $(X_j, Y_j)$, Kendall's tau is defined as the difference (Gibbons, 1971)

$$\tau = \pi_c - \pi_d \tag{4}$$

where

$$\pi_c = P[(X_i < X_j) \cap (Y_i < Y_j)] + P[(X_i > X_j) \cap (Y_i > Y_j)] \tag{5}$$

$$\pi_d = P[(X_i < X_j) \cap (Y_i > Y_j)] + P[(X_i > X_j) \cap (Y_i < Y_j)] \tag{6}$$

When ties are impossible in both the X's and the Y's, $\tau$ can be further expressed as

$$\tau = 2\pi_c - 1 = 1 - 2\pi_d \tag{7}$$

These relationships give practical meaning to Kendall's $\tau$: it is not just a test statistic, but can also be expressed in terms of probabilities. From the sample of size $n$ defined above, Kendall's $\tau$ is estimated by (Gibbons, 1971, 1976; Conover, 1971; Kendall, 1970; Hollander and Wolfe, 1973)

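$$\hat\tau = \frac{N_c - N_d}{n(n-1)/2} = \frac{S}{n(n-1)/2} \tag{8}$$

where $N_c$, $N_d$ and $S$ are given by

$$N_c = \sum_{i<j} Q_{ij}, \qquad Q_{ij} = \begin{cases}1, & \text{if } x_i < x_j\\ 0, & \text{otherwise}\end{cases}$$

$$N_d = \sum_{i<j} D_{ij}, \qquad D_{ij} = \begin{cases}1, & \text{if } x_i > x_j\\ 0, & \text{otherwise}\end{cases}$$

and

$$S = \sum_{i<j}\psi_{ij} \tag{13}$$

where

$$\psi_{ij} = \begin{cases}1, & \text{if } x_i < x_j\\ 0, & \text{if } x_i = x_j\\ -1, & \text{otherwise}\end{cases} \tag{14}$$

The statistic $S$ can also be expressed as

$$S = N_c - N_d \tag{15}$$

In place of the statistic (8), the test statistic most often used in the literature is $S$ (Kendall, 1970; Hirsch and Slack, 1984; Hirsch et al., 1982; van Belle and Hughes, 1984), where $S$ is as defined in (13) and (15). In fact, $\hat\tau$ and $S$ are statistically equivalent.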
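To make the bookkeeping concrete, the sketch below (with hypothetical helper names) computes $N_c$, $N_d$ and $S$ for a series paired with its time index. It checks that the two expressions of $S$ agree (a difference of counts, and a sum of signs $\psi_{ij}$), then forms $\hat\tau = S/[n(n-1)/2]$.

```python
def kendall_counts(x):
    """N_c, N_d and S for a series x_1..x_n paired with its time index (pairs i < j)."""
    n = len(x)
    pairs = [(i, j) for i in range(n) for j in range(i + 1, n)]
    nc = sum(1 for i, j in pairs if x[i] < x[j])            # concordant with time
    nd = sum(1 for i, j in pairs if x[i] > x[j])            # discordant with time
    s_psi = sum((x[j] > x[i]) - (x[j] < x[i]) for i, j in pairs)   # ties give 0
    assert s_psi == nc - nd            # the two expressions of S agree
    return nc, nd, s_psi


def kendall_tau_hat(x):
    """tau-hat = S / [n (n - 1) / 2]."""
    n = len(x)
    _, _, s = kendall_counts(x)
    return s / (n * (n - 1) / 2)


print(kendall_counts([1, 2, 2, 4, 5, 5, 5, 8, 9, 10, 11, 12]))  # (62, 0, 62)
```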

Under the assumption that $X_i$ is independent and identically distributed (IID), Kendall (1970) gave the mean and the variance of $S$ as

$$E(S) = 0$$

$$V(S) = \begin{cases}\dfrac{n(n-1)(2n+5)}{18}, & \text{for no ties}\\[2mm] \dfrac{n(n-1)(2n+5) - \sum_t t(t-1)(2t+5)}{18}, & \text{for ties}\end{cases} \tag{17}$$

where $t$ is the number of ties for a particular rank and $\sum_t$ sums over all such ties. For example, given the set of ranks 1, 2, 2, 4, 5, 5, 5, 8, 9, 10, 11, 12, the variance of $S$ is

$$V(S) = \frac{12(11)(29) - 2(1)(9) - 3(2)(11)}{18} = \frac{3744}{18} = 208$$

Kendall (1970) and Mann (1945) derived the exact distribution of $S$ for $n \le 10$. For sample sizes as small as 10, the normal approximation was found to be adequate. For use with the normal approximation, Kendall (1970) suggested a continuity correction, giving the standard normal variate $Z$ defined as

$$Z = \begin{cases}\dfrac{S-1}{\sqrt{V(S)}}, & \text{if } S > 0\\[2mm] \dfrac{S+1}{\sqrt{V(S)}}, & \text{otherwise}\end{cases}$$

3. ALTERNATIVE MODELS

Under the null hypothesis, it is assumed that the time series $Z_t$, $t = 1, 2, \dots, n$, consists of IID random variables. In this paper the following specific alternative models are entertained: the first three models contain only deterministic trends, while the last three have purely stochastic trends. In the case of a purely deterministic trend, the time series $Z_t$ may be written

$$Z_t = f(t) + \epsilon_t$$

where $f(t)$ is a function of time only and $\epsilon_t$ is an IID sequence. On the other hand, a time series having a purely stochastic trend may be written

$$Z_t = f(Z_{t-1}, Z_{t-2}, \dots) + a_t$$

where $f(Z_{t-1}, Z_{t-2}, \dots)$ is a function of the past data and $a_t$ is an innovation series, assumed to be IID and to have the property

$$\langle a_t\,Z_{t-k}\rangle = 0, \qquad k = 1, 2, \dots$$

where $\langle\cdot\rangle$ denotes expectation. In actual practice, it may be difficult to distinguish between deterministic and stochastic trends. For example, the series plotted in Figure 1 was simulated from the model

$$(1 - B)^2 Z_t = a_t$$

where $B$ is the backshift operator, $a_t \sim \mathrm{NID}(0, 1)$, and $Z_1 = 100$, $Z_2 = 101$, $Z_3 = 102$. Based on the shape of the plot, the series might well be fitted using a purely deterministic trend, even though the correct model is purely stochastic. Box and Jenkins (1976) suggest that for forecasting purposes it is usually better to use purely stochastic trend models, provided such a model is reasonable a priori and also gives an adequate fit. However, in water quality studies it is often of interest to test whether the level of the series has changed in some way. In this case, a model with a possible deterministic trend component may seem more reasonable a priori.

3.1 Linear Model

In the water resources literature, using linear regression models as alternative hypotheses is quite common (Lettenmaier, 1976; Hirsch et al., 1982; Hirsch and Slack, 1984; van Belle and Hughes, 1984). Assume $Z_t$ is given by

$$Z_t = a + bt + \epsilon_t, \qquad t = 1, 2, \dots, n \tag{23}$$

where $\epsilon_t \sim \mathrm{NID}(0, \sigma^2)$. Without loss of generality, let $a = 0.0$.
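The linear alternative is easy to simulate. The sketch below assumes Python's `random.gauss` as the NID generator and uses an illustrative function name. It also illustrates a property consistent with Table 1, where rows with the same ratio $b/\sigma$ have identical entries. With a common random seed, scaling $b$ and $\sigma$ by the same factor scales the whole series, which leaves the ranks (hence $\tau$) and the scale-free $r_1$ unchanged.

```python
import random


def linear_model(n, b, sigma, rng):
    """Equation (23) with a = 0: Z_t = b t + eps_t, eps_t ~ NID(0, sigma^2), t = 1..n."""
    return [b * t + rng.gauss(0.0, sigma) for t in range(1, n + 1)]


# Same seed, (b, sigma) scaled by 10: the series is scaled by 10, so its ranks are unchanged
z1 = linear_model(20, 0.01, 0.05, random.Random(5))
z2 = linear_model(20, 0.10, 0.50, random.Random(5))
assert sorted(range(20), key=z1.__getitem__) == sorted(range(20), key=z2.__getitem__)
```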

[FIGURE 1. Series simulated from $(1-B)^2 Z_t = a_t$, plotted against Sequence Number.]

3.2 Logistic Model

A series may change rapidly at the start and then gradually approach a limit, so a logistic model is a reasonable choice of alternative model. This model is defined as (Cleary and Levenbach, 1982)

$$Z_t = \frac{M}{1 - c\,e^{-\alpha t}} + \epsilon_t, \qquad t = 1, 2, \dots, n \tag{24}$$

where $\epsilon_t \sim \mathrm{NID}(0, 1)$ and $M$ is the limit of $Z_t$ as $t$ tends to infinity.
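A sketch of the logistic alternative follows. It implements equation (24) exactly as printed, including the minus sign in the denominator, with unit-variance Gaussian noise. The function names are illustrative, and $0 \le c < 1$ is assumed so that the denominator stays positive for $t \ge 1$.

```python
import math
import random


def logistic_trend(t, alpha, c, M):
    """Deterministic part of equation (24) as printed: M / (1 - c exp(-alpha t))."""
    return M / (1.0 - c * math.exp(-alpha * t))


def logistic_model(n, alpha, c, M, rng):
    """Z_t = trend + eps_t with eps_t ~ NID(0, 1), t = 1..n."""
    return [logistic_trend(t, alpha, c, M) + rng.gauss(0.0, 1.0) for t in range(1, n + 1)]


series = logistic_model(20, 0.10, 0.90, 5.0, random.Random(0))
```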

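3.3 Step Function Model

The Step Function Model is defined as

$$Z_t = \begin{cases}\epsilon_t, & t \le n/2\\ \Delta + \epsilon_t, & t > n/2\end{cases}$$

where $\epsilon_t \sim \mathrm{NID}(0, \sigma^2)$ and $\Delta$ is the average change in the level of the series after time $t = n/2$.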
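The Step Function Model can be sketched the same way. Applying the shift to the observations with $t > n/2$ is an assumption of this sketch about how the midpoint is handled when $n$ is odd; the function name is illustrative.

```python
import random


def step_model(n, delta, sigma, rng):
    """Level shift of size delta after t = n/2, eps_t ~ NID(0, sigma^2), t = 1..n."""
    return [(delta if t > n / 2 else 0.0) + rng.gauss(0.0, sigma) for t in range(1, n + 1)]


print(step_model(6, 5.0, 0.0, random.Random(0)))  # noise-free case shows the step itself
```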

The Step Function Model is a specific type of intervention model, which can be used to model the effects of one or more interventions upon the mean level of a series (Box and Tiao, 1975). In water resources, the intervention model has been used to ascertain the effects of both man-induced and natural interventions upon the mean level of water quantity (Hipel et al., 1975) and water quality (McLeod et al., 1983; Whitfield and Woods, 1984) time series. Hipel and McLeod (1986) presented a wide variety of time series applications in water resources using intervention and transfer function-noise models.

3.4 Barnard's Model

This alternative model is due to Barnard (1959) and is defined as

$$Z_t = Z_{t-1} + \sum_{i=1}^{N_t}\delta_i + \epsilon_t, \qquad t = 1, 2, \dots, n \tag{26}$$

where $N_t$ follows a Poisson distribution with parameter $\lambda$, $\delta_i \sim \mathrm{NID}(0, 1)$ and $\epsilon_t \sim \mathrm{NID}(0, \sigma^2)$. Without loss of generality, let $Z_1 = \epsilon_1$. Barnard (1959) developed this model for use in quality control, where there may be a series of $N_t$ correctional jumps between measurements.

3.5 Second Order Autoregressive Model

The second order autoregressive model may be written as (Kendall et al., 1983)

$$Z_t = \phi_1 Z_{t-1} + \phi_2 Z_{t-2} + \epsilon_t$$

where $\epsilon_t \sim \mathrm{NID}(0, \sigma^2)$. For the simulation studies in this paper, $\sigma^2 = 1.0$ and $E(Z_t) = 0.0$.

3.6 Threshold Autoregressive Model (TAR)

This type of model was developed by Tong (1977, 1978, 1983), Tong and Lim (1980) and Tong et al. (1985). Tong (1983), Tong and Lim (1980) and Tong et al. (1985) found TAR models to be suitable for modelling and forecasting riverflows. The particular model considered here (Tong, 1983; Tong et al., 1985) is given by

where $Z_t$ is the daily riverflow in cubic metres per second, $J_t$ is the temperature in degrees centigrade, $e_t^{(1)} \sim \mathrm{NID}(0, 0.69)$ and $e_t^{(2)} \sim \mathrm{NID}(0, 7.18)$. The above model was estimated for the Vatnsdalsa River in Iceland for the period 1972 to 1974.

4. SIMULATION EXPERIMENTS

Sample sizes of 10, 20, 50 and 100 are considered. For a significance level of 5%, the power functions are estimated by the proportion of rejections in 1000 replications. The standard error of any entry in the tables is $\sqrt{\pi(1-\pi)/N}$ (Cochran, 1977), where $N$ is the number of replications and $\pi$ is the true rejection rate. For example, at the 5% significance level the standard error is $\sqrt{0.05(1-0.05)/1000} = 0.0069$.

For the estimated significance level, the test is said to be conservative if the estimated level is clearly less than the nominal level (in this case 0.05). If the estimated level is clearly greater than 0.05, the test is said to be optimistic. Otherwise, the null distribution of the test is said to be adequately approximated. Empirical significance levels and powers are given in the six tables below, one for each of the six alternatives described in Section 3. The results suggest that the critical regions are adequately determined by the approximate null distribution.
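The simulation design can be sketched end to end. This is an illustrative reconstruction, not the authors' code. It assumes two-sided 5% tests based on the normal approximations: Kendall's $Z$ with continuity correction and tie-corrected variance, and $r_1$ standardized with mean $-1/n$ and variance $(n-2)^2/[n^2(n-1)]$. It also assumes Python's `random` module as the generator, a Knuth-style Poisson sampler for Barnard's model, and a burn-in of 200 values for the AR(2) series. The rejection rate over 1000 replications and its standard error $\sqrt{\pi(1-\pi)/N}$ follow the description above.

```python
import math
import random
from collections import Counter

Z_CRIT = 1.959964  # two-sided 5 % normal critical value (sidedness is an assumption)


def kendall_z(x):
    """Kendall's S with the tie-corrected variance and the continuity correction."""
    n = len(x)
    s = sum((x[j] > x[i]) - (x[j] < x[i]) for i in range(n) for j in range(i + 1, n))
    ties = sum(t * (t - 1) * (2 * t + 5) for t in Counter(x).values() if t > 1)
    sd = math.sqrt((n * (n - 1) * (2 * n + 5) - ties) / 18.0)
    return (s - 1) / sd if s > 0 else (s + 1) / sd


def r1_z(x):
    """Lag-one serial correlation standardized with mean -1/n and variance (n-2)^2/(n^2(n-1))."""
    n = len(x)
    m = sum(x) / n
    r1 = (sum((x[t] - m) * (x[t + 1] - m) for t in range(n - 1))
          / sum((v - m) ** 2 for v in x))
    return (r1 + 1.0 / n) / math.sqrt((n - 2) ** 2 / (n * n * (n - 1)))


def poisson(lam, rng):
    """Knuth's multiplication method; adequate for the small lambda values used here."""
    limit, k, prod = math.exp(-lam), 0, rng.random()
    while prod > limit:
        k += 1
        prod *= rng.random()
    return k


def barnard(n, lam, sigma, rng):
    """Barnard's model (26) with Z_1 = eps_1."""
    z = [rng.gauss(0.0, sigma)]
    for _ in range(n - 1):
        jumps = sum(rng.gauss(0.0, 1.0) for _ in range(poisson(lam, rng)))
        z.append(z[-1] + jumps + rng.gauss(0.0, sigma))
    return z


def ar2(n, phi1, phi2, rng, burn_in=200):
    """Second order autoregression with unit innovation variance (burn-in is an assumption)."""
    z1 = z2 = 0.0
    out = []
    for t in range(n + burn_in):
        z1, z2 = phi1 * z1 + phi2 * z2 + rng.gauss(0.0, 1.0), z1
        if t >= burn_in:
            out.append(z1)
    return out


def rejection_rate(stat, simulate, reps=1000, seed=1):
    """Proportion of the simulated series for which |stat| exceeds Z_CRIT."""
    rng = random.Random(seed)
    return sum(abs(stat(simulate(rng))) > Z_CRIT for _ in range(reps)) / reps


def standard_error(rate, reps=1000):
    return math.sqrt(rate * (1.0 - rate) / reps)


print(round(standard_error(0.05), 4))  # 0.0069
print(rejection_rate(kendall_z, lambda rng: [rng.gauss(0.0, 1.0) for _ in range(20)]))
print(rejection_rate(r1_z, lambda rng: barnard(20, 1.0, 0.05, rng)))
print(rejection_rate(r1_z, lambda rng: ar2(20, 0.5, 0.3, rng)))
```

Because the sidedness of the tests and the generator details are assumptions, rates from this sketch should not be expected to match the tables digit for digit.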

Empirical Rejection Rate at 5 Percent Level of Significance

TABLE 1. LINEAR MODEL

                n = 10          n = 20          n = 50          n = 100
b     σ         τ      r1       τ      r1       τ      r1       τ      r1
0.00  0.05      0.052  0.041    0.035  0.044    0.040  0.045    0.048  0.051
0.01  0.05      0.335  0.138    0.995  0.749    1.000  1.000    1.000  1.000
0.01  0.50      0.057  0.039    0.072  0.043    0.487  0.080    1.000  0.690
0.01  1.00      0.053  0.040    0.050  0.043    0.146  0.042    0.769  0.131
0.01  2.00      0.050  0.041    0.036  0.047    0.073  0.042    0.268  0.065
0.05  0.05      1.000  0.990    1.000  1.000    1.000  1.000    1.000  1.000
0.05  0.50      0.121  0.063    0.632  0.194    1.000  1.000    1.000  1.000
0.05  1.00      0.066  0.049    0.208  0.064    1.000  0.655    1.000  1.000
0.05  2.00      0.050  0.040    0.084  0.043    0.669  0.121    0.927  1.000
0.10  0.05      1.000  1.000    1.000  1.000    1.000  1.000    1.000  1.000
0.10  0.50      0.335  0.138    0.995  0.749    1.000  1.000    1.000  1.000
0.10  1.00      0.121  0.063    0.632  0.194    1.000  1.000    1.000  1.000
0.10  2.00      0.066  0.049    0.208  0.064    1.000  0.655    1.000  1.000

TABLE 2. LOGISTIC MODEL

                      n = 10          n = 20          n = 50          n = 100
α     c     M         τ      r1       τ      r1       τ      r1       τ      r1
.01   .01   0.0       0.052  0.041    0.035  0.044    0.040  0.045    0.048  0.051
.01   .01   0.1       0.052  0.041    0.035  0.044    0.040  0.045    0.048  0.051
.01   .01   1.0       0.052  0.041    0.035  0.044    0.041  0.045    0.047  0.050
.01   .01   5.0       0.052  0.041    0.037  0.045    0.041  0.045    0.047  0.050
.01   .50   0.0       0.052  0.041    0.035  0.044    0.040  0.045    0.048  0.051
.01   .50   0.1       0.053  0.042    0.037  0.047    0.043  0.047    0.050  0.048
.01   .50   1.0       0.051  0.042    0.049  0.040    0.166  0.057    0.514  0.075
.01   .50   5.0       0.096  0.046    0.375  0.078    1.000  0.771    1.000  0.999
.01   .90   0.0       0.052  0.041    0.035  0.044    0.040  0.045    0.048  0.051
.01   .90   0.1       0.057  0.045    0.072  0.045    0.174  0.060    0.279  0.057
.01   .90   1.0       0.825  0.431    1.000  0.950    1.000  1.000    1.000  1.000
.01   .90   5.0       1.000  1.000    1.000  1.000    1.000  1.000    1.000  1.000
.10   .01   0.0       0.052  0.041    0.035  0.044    0.040  0.045    0.048  0.051
.10   .01   0.1       0.052  0.041    0.035  0.044    0.040  0.045    0.048  0.051
.10   .01   1.0       0.053  0.041    0.037  0.047    0.040  0.044    0.047  0.050
.10   .01   5.0       0.053  0.041    0.037  0.047    0.043  0.045    0.046  0.050
.10   .50   0.0       0.052  0.041    0.035  0.044    0.040  0.045    0.048  0.051
.10   .50   0.1       0.054  0.046    0.040  0.047    0.045  0.048    0.047  0.049
.10   .50   1.0       0.076  0.044    0.092  0.045    0.155  0.063    0.153  0.053
.10   .50   5.0       0.622  0.272    0.934  0.579    0.983  0.893    0.951  0.905
.10   .90   0.0       0.052  0.041    0.035  0.044    0.040  0.045    0.048  0.051
.10   .90   0.1       0.052  0.045    0.047  0.045    0.058  0.047    0.053  0.045
.10   .90   1.0       0.603  0.239    0.745  0.366    0.733  0.490    0.583  0.464
.10   .90   5.0       1.000  0.976    1.000  1.000    1.000  1.000    0.999  1.000

TABLE 3. STEP FUNCTION MODEL

                n = 10        n = 20        n = 50        n = 100
  a      σ      τ     τ₁      τ     τ₁      τ     τ₁      τ     τ₁
 0.00  0.05   0.052 0.041   0.035 0.044   0.040 0.045   0.048 0.051
 0.05  0.05   0.199 0.121   0.382 0.152   0.803 0.281   0.983 0.520
 0.05  0.50   0.058 0.040   0.038 0.048   0.051 0.039   0.069 0.051
 0.05  1.00   0.053 0.040   0.036 0.047   0.046 0.044   0.055 0.049
 0.05  2.00   0.053 0.040   0.035 0.045   0.045 0.045   0.052 0.050
 0.50  0.05   0.711 1.000   0.994 1.000   1.000 1.000   1.000 1.000
 0.50  0.50   0.199 0.121   0.382 0.152   0.803 0.281   0.983 0.520
 0.50  1.00   0.090 0.057   0.131 0.064   0.283 0.070   0.525 0.113
 0.50  2.00   0.055 0.048   0.060 0.042   0.103 0.043   0.160 0.059
 1.00  0.05   0.711 1.000   0.994 1.000   1.000 1.000   1.000 1.000
 1.00  0.50   0.490 0.352   0.887 0.594   1.000 0.958   1.000 1.000
 1.00  1.00   0.199 0.121   0.382 0.152   0.803 0.281   0.983 0.520
 1.00  2.00   0.090 0.057   0.131 0.064   0.283 0.070   0.525 0.113
 5.00  0.05   0.711 1.000   0.994 1.000   1.000 1.000   1.000 1.000
 5.00  0.50   0.711 1.000   0.994 1.000   1.000 1.000   1.000 1.000
 5.00  1.00   0.711 0.943   0.944 1.000   1.000 1.000   1.000 1.000
 5.00  2.00   0.596 0.507   0.960 0.821   1.000 0.998   1.000 1.000

TABLE 4. BARNARD’S MODEL

                n = 10        n = 20        n = 50
  λ      σ      τ     τ₁      τ     τ₁      τ     τ₁
  1.0  0.05   0.484 0.579   0.681 0.933   0.809 1.000
  1.0  0.50   0.459 0.579   0.667 0.944   0.819 1.000
  1.0  1.00   0.482 0.571   0.680 0.944   0.817 1.000
  1.0  2.00   0.486 0.579   0.655 0.955   0.800 1.000
  2.0  0.05   0.477 0.573   0.688 0.937   0.802 1.000
  2.0  0.50   0.575 0.586   0.690 0.946   0.789 1.000
  2.0  1.00   0.467 0.586   0.683 0.954   0.798 1.000
  2.0  2.00   0.460 0.571   0.669 0.958   0.784 1.000
  5.0  0.05   0.485 0.560   0.689 0.935   0.812 1.000
  5.0  0.50   0.469 0.562   0.670 0.937   0.802 1.000
  5.0  1.00   0.491 0.577   0.680 0.948   0.802 1.000
  5.0  2.00   0.478 0.591   0.674 0.952   0.783 1.000
 10.0  0.05   0.488 0.565   0.690 0.938   0.796 1.000
 10.0  0.50   0.501 0.628   0.677 0.961   0.802 1.000
 10.0  1.00   0.472 0.603   0.665 0.946   0.790 1.000
 10.0  2.00   0.480 0.586   0.665 0.960   0.796 1.000
 20.0  0.05   0.473 0.569   0.689 0.941   0.801 1.000
 20.0  0.50   0.501 0.612   0.658 0.949   0.804 1.000
 20.0  1.00   0.498 0.593   0.654 0.950   0.811 1.000
 20.0  2.00   0.506 0.607   0.666 0.946   0.819 1.000
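Several columns of Table 3 are identical. For every n and both tests, the entries for (a, σ) = (0.05, 0.05), (0.50, 0.50) and (1.00, 1.00) coincide, and so do the entries for (0.50, 1.00) and (1.00, 2.00). This is the behaviour to expect from rank statistics if the same noise sequences were reused across (a, σ) combinations. Adding a constant to the data, or multiplying it by a positive constant, leaves the ranks unchanged, so power then depends on a and σ only through a/σ. The Python sketch below checks this numerically. It assumes that τ is Kendall's tau between the series and time and that τ₁ is Kendall's tau between consecutive observations (z_t, z_{t+1}). Both definitions, and the mid-series step location in the hypothetical `step_series`, are illustrative assumptions. They are not taken from this paper.

```python
import random

def sgn(v):
    """Sign of v as +1, 0 or -1."""
    return (v > 0) - (v < 0)

def kendall_s(x, y):
    """Kendall's S: sum over pairs i < j of sgn(x_j - x_i) * sgn(y_j - y_i)."""
    n = len(x)
    return sum(sgn(x[j] - x[i]) * sgn(y[j] - y[i])
               for i in range(n - 1) for j in range(i + 1, n))

def trend_s(z):
    """S between the series and time (assumed form of the tau statistic)."""
    return kendall_s(list(range(len(z))), z)

def lag1_s(z):
    """S between z_t and z_{t+1} (assumed form of the tau_1 statistic)."""
    return kendall_s(z[:-1], z[1:])

def step_series(a, sigma, noise):
    """Hypothetical step model: mean 0 in the first half, a in the second half."""
    half = len(noise) // 2
    return [(a if t >= half else 0.0) + sigma * e for t, e in enumerate(noise)]

rng = random.Random(1985)
noise = [rng.gauss(0.0, 1.0) for _ in range(20)]   # one noise draw, reused for every (a, sigma)

results = {}
for a, sigma in [(0.05, 0.05), (0.50, 0.50), (1.00, 1.00), (0.50, 1.00), (1.00, 2.00)]:
    z = step_series(a, sigma, noise)
    results[(a, sigma)] = (trend_s(z), lag1_s(z))
    print(f"a={a:.2f} sigma={sigma:.2f} a/sigma={a / sigma:.1f} -> (S_trend, S_lag1) = {results[(a, sigma)]}")
```

Combinations with the same a/σ print identical statistics. With a common noise draw, that is enough to make their rejection rates identical too.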

TABLE 5. AR(2) MODEL

                 n = 10        n = 20        n = 50        n = 100
  φ₁     φ₂      τ     τ₁      τ     τ₁      τ     τ₁      τ     τ₁
-1.40  -0.80   0.000 0.731   0.000 1.000   0.000 1.000   0.000 1.000
-0.70  -0.80   0.002 0.009   0.000 0.154   0.000 0.975   0.000 1.000
 0.70  -0.80   0.031 0.170   0.009 0.485   0.002 0.991   0.000 1.000
 1.40  -0.80   0.247 0.881   0.132 0.998   0.067 1.000   0.045 1.000
-1.20  -0.50   0.000 0.660   0.000 0.989   0.000 1.000   0.000 1.000
-0.60  -0.50   0.007 0.064   0.002 0.279   0.000 0.900   0.000 0.998
 0.60  -0.50   0.072 0.223   0.050 0.470   0.036 0.932   0.031 1.000
 1.20  -0.50   0.305 0.745   0.264 0.985   0.285 1.000   0.230 1.000
-0.80  -0.20   0.001 0.422   0.001 0.866   0.000 1.000   0.000 1.000
-0.40  -0.20   0.009 0.082   0.004 0.210   0.002 0.645   0.002 0.939
 0.40  -0.20   0.082 0.137   0.091 0.267   0.089 0.684   0.103 0.951
 0.80  -0.20   0.260 0.456   0.290 0.850   0.264 1.000   0.268 1.000
-0.40   0.10   0.010 0.207   0.010 0.470   0.008 0.840   0.007 0.991
 0.40   0.10   0.180 0.180   0.187 0.386   0.248 0.814   0.245 0.985
 0.80   0.10   0.390 0.452   0.536 0.872   0.633 1.000   0.625 1.000
-0.50   0.30   0.013 0.500   0.005 0.788   0.002 0.990   0.005 1.000
-0.30   0.30   0.024 0.239   0.023 0.466   0.023 0.776   0.028 1.000
 0.30   0.30   0.193 0.138   0.297 0.306   0.335 0.?01   0.348 0.934
 0.50   0.30   0.309 0.243   0.401 0.585   0.524 0.975   0.527 0.999
-0.40   0.50   0.017 0.615   0.013 0.824   0.005 0.984   0.008 1.000
-0.20   0.50   0.046 0.319   0.059 0.470   0.088 0.682   0.061 0.881
 0.20   0.50   0.175 0.143   0.302 0.250   0.377 0.578   0.421 0.831
 0.40   0.50   0.288 0.211   0.470 0.513   0.610 0.933   0.684 0.999

TABLE 6. TAR MODEL

  n        10      20      50      100
  τ      0.320   0.435   0.320   0.338
  τ₁     0.499   0.904   1.000   1.000

4.1

Linear Model
The results for this model (Table 1) indicate that the smaller the standard deviation, the better the performance of the two tests. In other words, the better a linear regression fits a time series, the greater the chance of detecting nonrandomness. For samples as small as 10, the tests are very powerful when the standard deviation is small. An encouraging aspect of this model is that both tests attain asymptotic efficiency quite rapidly; for example, the power functions improve considerably from n=10 to n=20. Note that τ is generally more powerful than τ₁, although the difference is almost negligible for n=50 and n=100, when both tests approach asymptotic efficiency.
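For reference, here is a minimal Python sketch of the Mann-Kendall form of Kendall's τ test against time (Mann, 1945; Kendall, 1970). It uses the normal approximation Var(S) = n(n−1)(2n+5)/18 with a continuity correction and no tie correction. The sketch assumes that the τ of this section is this statistic. The simulated linear-trend series are hypothetical: the intercept, slope, σ values and seed are chosen for illustration and are not the settings behind Table 1.

```python
import math
import random

def mann_kendall(z):
    """Two-sided Mann-Kendall trend test; returns (S, Z, p) via the normal approximation."""
    n = len(z)
    s = sum((z[j] > z[i]) - (z[j] < z[i])
            for i in range(n - 1) for j in range(i + 1, n))
    var_s = n * (n - 1) * (2 * n + 5) / 18.0     # null variance of S (no ties)
    if s > 0:
        zstat = (s - 1) / math.sqrt(var_s)        # continuity correction
    elif s < 0:
        zstat = (s + 1) / math.sqrt(var_s)
    else:
        zstat = 0.0
    p = math.erfc(abs(zstat) / math.sqrt(2.0))    # equals 2 * (1 - Phi(|Z|))
    return s, zstat, p

# Hypothetical linear model z_t = 1 + 0.1 t + sigma * e_t, reusing one noise sequence
rng = random.Random(42)
noise = [rng.gauss(0.0, 1.0) for _ in range(10)]
for sigma in (0.05, 0.5, 2.0):
    z = [1.0 + 0.1 * t + sigma * e for t, e in enumerate(noise)]
    s, zstat, p = mann_kendall(z)
    print(f"sigma={sigma:4.2f}  S={s:4d}  Z={zstat:6.3f}  p={p:.4f}")
```

Holding the noise fixed and increasing σ moves the series away from a clean linear trend. That is the setting in which the text reports lower power.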


4.2

Logistic Model
The results for this model are presented in Table 2. Here too the tests perform better when there is a good fit, indicating nonrandomness. When M < 1.0, the standard deviation of 1.0 used in the simulation studies has a greater impact on the simulated data than the parameters of the model do. Hence a large component of Zt is determined by εt, which is random. For M > 1.0, the two tests (especially τ) prove effective for detecting the presence of trend. Finally, τ is more powerful than τ₁, especially in cases where the Logistic Model describes the data fairly well (M ≥ 1.0). There is, however, not much difference between the two tests when n=100.

4.3

Step Function Model
The results for this alternative model (Table 3) indicate greater power for relatively small standard deviations (and hence fairly good fits). What is remarkable about this model is the great power of both tests even for samples as small as 10. For example, for a=5, the power is at least 50% for all sample sizes. The power functions also improve as a increases. When the standard deviation is small, both tests are very effective in detecting trend even for a slight shift of 0.5 in the mean level of the series. For a change of 5 in the mean level, both tests are very powerful. Even though τ is more powerful than τ₁, both tests are almost equally powerful for n ≥ 50.

4.4

Barnard’s Model
The results of Table 4 are very consistent and easily interpreted. For all sample sizes and all combinations of lambda (λ) and standard deviation (σ), τ₁ has greater power than τ. For n as small as 50, τ₁ attains asymptotic efficiency, while τ is only about 80% efficient. The difference between the two tests is already clear for n=10: the power of τ is about 50%, while that of τ₁ is always greater than 50%.

4.5

Second Order Autoregressive Model
The results for this model (Table 5) parallel fairly closely those of Barnard’s model. The main difference between the two models is that the results here are not as dramatic as in Table 4. Here too, τ₁ is more powerful than τ, and as n increases the power of τ₁ increases faster than that of τ. For n=100, τ₁ attains almost 100% efficiency, while τ performs fairly poorly in some cases. For example, for φ₁ = -0.2 and φ₂ = 0.5, the power of τ₁ is 0.881 while that of τ is only 0.061 for n = 100.

4.6

TAR Model
The results for this model (Table 6) are very similar to those of Tables 4 and 5. The τ₁ test is clearly more powerful than τ. Also, while the power of τ₁ increases very rapidly with increasing n, the power of τ progresses only slowly. The τ₁ test is about 90% efficient for n=20 and attains 100% efficiency at n=50. On the other hand, the power of τ is less than 50% even for n=100.

5.

CONCLUSION
The general deduction from Section 4 is that τ is more powerful for the first three models, while τ₁ is more powerful for the last three. As noted in Section 3, the first three models contain deterministic trends and the last three have stochastic trends. It is therefore reasonable to conclude that τ is more powerful for detecting deterministic trends, while τ₁ is more powerful for discovering stochastic trends.

In practice, it is advantageous to have a sound physical and statistical understanding of the time series being analyzed. This allows one to decide whether to employ models possessing deterministic trends or models having stochastic trends. For example, it may be better to model certain kinds of water quality measurements using models having deterministic trends. On the other hand, for modelling seasonal riverflows, models having stochastic trends, such as a threshold autoregressive model, may work well (Tong, 1983; Tong et al., 1985). In some cases, one may wish to use a model which possesses both deterministic and stochastic trends.
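The conclusion can be explored with a small Monte Carlo experiment. The Python sketch below first simulates 5% critical values for both statistics under independent normal noise. It then estimates their rejection rates for two series: a deterministic linear trend, and a stochastic AR(2) series with φ₁ = 0.5 and φ₂ = 0.3 (one of the parameter pairs in Table 5). As in the earlier sketches, it assumes that τ is Kendall's tau against time and that τ₁ is Kendall's tau between consecutive observations. The sample size, slope, seed and replication count are arbitrary, so the estimated powers are not expected to reproduce the tabulated values; only the ordering of the two tests is of interest.

```python
import random

def kendall_s(x, y):
    """Kendall's S: concordant minus discordant pairs."""
    n = len(x)
    return sum(((x[j] > x[i]) - (x[j] < x[i])) * ((y[j] > y[i]) - (y[j] < y[i]))
               for i in range(n - 1) for j in range(i + 1, n))

def tau_stat(z):
    """Association with time (assumed form of tau)."""
    return kendall_s(range(len(z)), z)

def tau1_stat(z):
    """Association between consecutive values (assumed form of tau_1)."""
    return kendall_s(z[:-1], z[1:])

def linear_trend(n, rng, slope=0.05):
    """Deterministic linear trend plus N(0, 1) noise (hypothetical slope)."""
    return [slope * t + rng.gauss(0.0, 1.0) for t in range(n)]

def ar2(n, rng, phi1=0.5, phi2=0.3, burn=100):
    """Stationary AR(2) series; the first `burn` values are discarded."""
    z1 = z2 = 0.0
    out = []
    for t in range(n + burn):
        z = phi1 * z1 + phi2 * z2 + rng.gauss(0.0, 1.0)
        z1, z2 = z, z1
        if t >= burn:
            out.append(z)
    return out

def critical_value(stat, n, reps, rng, level=0.05):
    """Simulated (1 - level) quantile of |S| under i.i.d. N(0, 1) data."""
    null = sorted(abs(stat([rng.gauss(0.0, 1.0) for _ in range(n)]))
                  for _ in range(reps))
    return null[int((1.0 - level) * reps) - 1]

def rejection_rate(stat, generator, n, crit, reps, rng):
    """Fraction of simulated series whose |S| exceeds the critical value."""
    return sum(abs(stat(generator(n, rng))) > crit for _ in range(reps)) / reps

rng = random.Random(1985)
n, reps = 40, 400
tests = {"tau": tau_stat, "tau1": tau1_stat}
crit = {name: critical_value(f, n, reps, rng) for name, f in tests.items()}
power = {}
for model, gen in {"linear trend": linear_trend, "AR(2)": ar2}.items():
    for name, f in tests.items():
        power[(model, name)] = rejection_rate(f, gen, n, crit[name], reps, rng)
        print(f"{model:12s} {name:4s} power ~ {power[(model, name)]:.2f}")
```

Simulated critical values are used because the null distribution of the lag-one statistic differs from that of the ordinary Kendall statistic. Simulating both avoids relying on a variance formula for either.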


REFERENCES
Barnard, G.A. (1959). Control Charts and Stochastic Processes. Journal of the Royal Statistical Society, Series B 21, 239-271.
Bartels, R. (1982). The Rank Version of von Neumann's Ratio Test for Randomness. Journal of the American Statistical Association 77, 40-46.
Box, G.E.P.; Jenkins, G.M. (1976). Time Series Analysis: Forecasting and Control. 2nd Edition, Holden-Day.
Box, G.E.P.; Tiao, G.C. (1975). Intervention Analysis with Applications to Economic and Environmental Problems. Journal of the American Statistical Association 70, 70-79.
Cleary, J.A.; Levenbach, H. (1982). The Professional Forecaster. Lifetime Learning Publications, Belmont, California.
Cochran, W.G. (1977). Sampling Techniques. 3rd Edition, Wiley, New York.
Conover, W.J. (1971). Practical Nonparametric Statistics. Wiley, New York.
Cox, D.R. (1966). The Null Distribution of the First Serial Correlation Coefficient. Biometrika 53, 623-626.
Dietz, E.J.; Killeen, T. (1981). A Nonparametric Multivariate Test for Monotone Trend with Pharmaceutical Applications. Journal of the American Statistical Association 76, 169-174.
Dufour, J.M.; Roy, R. (1985). Some Robust Exact Results on Sample Autocorrelation and Tests of Randomness. Department of Data Processing and Operation Research, University of Montreal.
Gibbons, J.D. (1971). Nonparametric Statistical Inference. McGraw-Hill, New York.
Gibbons, J.D. (1976). Nonparametric Methods for Quantitative Analysis. Holt, Rinehart and Winston, New York.
Hipel, K.W.; Lennox, W.C.; Unny, T.E.; McLeod, A.I. (1975). Intervention Analysis in Water Resources. Water Resources Research 11(6), 855-861.
Hipel, K.W.; McLeod, A.I. (1986). Time Series Modelling for Water Resources and Environmental Engineers. Elsevier, Amsterdam.
Hirsch, R.M.; Slack, J.R.; Smith, R. (1982). Techniques of Trend Analysis for Monthly Water Quality Data. Water Resources Research 18(1), 107-121.
Hirsch, R.M.; Slack, J.R. (1984). A Nonparametric Trend Test for Seasonal Data with Serial Dependence. Water Resources Research 20(6), 727-732.
Hollander, M.; Wolfe, D.A. (1973). Nonparametric Statistical Methods. Wiley, New York.
Kendall, M.G. (1970). Rank Correlation Methods. 4th Edition, Griffin, London.
Kendall, M.G.; Stuart, A. (1979). The Advanced Theory of Statistics. Vol. 1. Griffin, London.
Kendall, M.G.; Stuart, A.; Ord, J.K. (1983). The Advanced Theory of Statistics. Vol. 3. Griffin, London.
Knoke, J.D. (1975). Testing for Randomness Against Autocorrelated Alternatives: The Parametric Case. Biometrika 62, 571-575.
Knoke, J.D. (1977). Testing for Randomness Against Autocorrelation: Alternative Tests. Biometrika 64, 523-529.
Knoke, J.D. (1979). Normal Approximations for Serial Correlation Statistics. Biometrics 35, 491-495.
Lettenmaier, D.P. (1976). Detection of Trends in Water Quality Data from Records with Dependent Observations. Water Resources Research 12(5), 1036-1046.
Mann, H.B. (1945). Nonparametric Tests Against Trend. Econometrica 13, 245-259.
McLeod, A.I.; Hipel, K.W.; Camacho, F. (1983). Trend Assessment of Water Quality Time Series. Water Resources Bulletin 19(4), 537-547.
Noether, G.E. (1950). Asymptotic Properties of the Wald-Wolfowitz Test of Randomness. Annals of Mathematical Statistics 21, 231-246.
Simon, G. (1977). A Nonparametric Test of Total Independence Based on Kendall's Tau. Biometrika 64, 237-282.
Tong, H. (1977). Discussion of paper by A.J. Lawrence and N.T. Kottegoda. Journal of the Royal Statistical Society, Series A 140, 34-35.
Tong, H. (1978). On a Threshold Model. In Pattern Recognition and Signal Processing (C.H. Chen, ed.). Sijthoff and Noordhoff, The Netherlands.
Tong, H. (1983). Threshold Models in Non-Linear Time Series Analysis. Lecture Notes No. 21, Springer-Verlag, New York.
Tong, H.; Lim, K.S. (1980). Threshold Autoregression, Limit Cycles and Cyclical Data (with discussion). Journal of the Royal Statistical Society, Series B 42, 245-292.
Tong, H.; Thanoon, B.; Gudmundsson, G. (1985). Threshold Time Series Modelling of Two Icelandic Riverflow Systems. Water Resources Bulletin 21.
van Belle, G.; Hughes, J. (1984). Nonparametric Tests for Trend in Water Quality. Water Resources Research 20(1), 127-136.
Wald, A.; Wolfowitz, J. (1943). An Exact Test for Randomness in the Nonparametric Case Based on Serial Correlation. Annals of Mathematical Statistics 14, 378-388.
Whitfield, P.H.; Woods, P.F. (1984). Intervention Analysis of Water Quality Records. Water Resources Bulletin 20(5), 657-667.

Statistical Assessment of a Limnological Data Set
by Robert Clifford, Jr., John W. Wilkinson and Nicholas L. Clesceri
Rensselaer Polytechnic Institute, Troy, N.Y.

ABSTRACT

In a study of Wisconsin lakes examining the effects on water quality of a ban on detergent phosphorus, the design protocol employed the concept of test lakes and reference lakes. Each test lake was paired with a reference lake that shared as many characteristics as possible with it, except for a loading of phosphorus from municipal wastewater effluent or septic tank seepage. The responses measured for each lake were physical, chemical and biological in nature. Measurements were taken both before and after imposition of the ban. To estimate the potential effect of the ban, three forms of statistical models were used: (i) for each test lake, a model using the reference lake variable as a covariate and the ban as a classification variable; (ii) a comprehensive model for all of the lakes combined, using the reference lakes as covariates and the test lakes as dummy variables; and (iii) multivariate models providing multiple comparison estimates for pre- and post-ban differences. The advantage of the paired lake approach is its potential for variance reduction, and this was examined for several data sets. This paper discusses the comparisons of the modeling procedures as well as the estimates of the "ban effects." Some of the observed distributional characteristics of the measured responses are also presented.

INTRODUCTION

The growth of algae is, to a large extent, regulated by the presence of the macronutrients nitrogen and phosphorus in the water column (Hutchinson, 1957; Wetzel, 1975). Excess growth can degrade water quality by reducing clarity, adding noxious odors and taste to the water, hampering motorboat movement, and reducing overall aesthetic quality. Of the macronutrients, phosphorus is most frequently "limiting", i.e. the amount of phosphorus input to a water body is the regulating factor in photosynthetic production (Likens, 1972; Schindler, 1977). Phosphorus is an important ingredient in laundry detergents, serving as a "builder" by, among other things, reducing water hardness. As a means of reducing the load of phosphorus to both municipal and private wastewater treatment systems, bans prohibiting the presence of phosphorus in laundry detergents have been imposed in numerous locations around the United States. Although reductions in treatment plant loadings of phosphorus have been monitored in some of these areas, reviews of the effectiveness of detergent phosphorus bans in subsequently improving water quality in these locales have been mixed (Pieczonka and Hopson, 1974; Bell and Spacie, 1978; Hartig and Horvath, 1982; Runke, 1982; Maki, Porcella and Wendt, 1984).

The Wisconsin state legislature enacted a multi-year detergent phosphorus ban which became effective on 1 July 1979 and remained in effect until 30 June 1982. The Soap and Detergent Association initiated a lake study program in 1978 and continued it through 1983 in order to determine the effectiveness of the ban. The study examined physical, chemical and biological parameters of the study lakes to determine whether any changes in them resulted from imposition of the ban. An assumption accepted, and borne out throughout the literature, was the strong relationship between phosphorus concentrations and a number of other lake water quality parameters.

Typically, trend analysis of water quality data is hampered by several factors, among them missing values, values below detection limits, seasonality, and the non-normality of the parameter distributions (Hirsch, Slack and Smith, 1982; Van Belle and Hughes, 1984). It has also been reported that an extensive data record is necessary in the assessment of lake restoration programs in order to achieve adequate statistical power if parametric tests are used (Trautmann et al., 1982). As a result, non-parametric statistical methods are usually employed to determine time-related variations in water quality. These studies, however, assume that a monitoring record is available only for a limited number of lakes, or only for those lakes which are impacted by phosphorus control measures.

Considering that the imposition of a detergent phosphorus ban was an experiment in improving water quality, two groups of lakes were selected for investigation. The experimental group, or "test" lakes, were those lakes within the state of Wisconsin that were determined to be receiving a significant percentage of their phosphorus loading as sewage effluent, from either public or private treatment systems. These lakes would therefore be the most likely to be affected by a reduction in phosphorus concentration from these sources. The control group, or "reference" lakes, were lakes determined not to be impacted by sewage effluent. Monitoring the reference lakes concurrently would establish a baseline reflecting only natural fluctuations in water quality over time, chiefly a function of climatic conditions (i.e. temperature, and rainfall amounts and frequency). An overall temporal trend in the test lake data that deviated significantly from any trend observed in the reference lakes could then be ascribed to imposition of the detergent phosphorus ban.

MONITORING METHODOLOGY

In selecting lakes for the monitoring program, preference was given to those for which historical information was available from sources such as the National Eutrophication Survey (NES) or the Wisconsin Department of Natural Resources (WDNR) Quarterly Monitoring Program. Size, depth, and hydraulic residence time were considered following NES selection criteria (NES, 1974). The locations of the lakes selected for the study are shown in Figure 1. The apparent concentration of study lakes in the northern part of the state is consistent with the actual distribution of lakes within the state (WDNR, 1975).

Groups of lakes fall within regional boundaries set by the WDNR that correspond to bedrock and glacial geology as well as soil cover (Lillie and Mason, 1983). These groups include test lakes and their corresponding reference lake. In the analysis, test lakes Butternut, Elk and Balsam were paired with reference lake Teal; these lakes are situated in granite soils underlain by a sandstone bedrock (Prescott, 1962). Test lakes Moss, Enterprise and Townline were paired with reference lake Little Bearskin; all are surrounded by sandy or silty soil and underlain by sandstone. Test lake Swan is paired with reference lake Fish; both are located in the alkaline soil of the southern regions of the state and are underlain by limestone.

Limnological, morphological and drainage basin characteristics of the lakes are summarized in Table 1. Reference lakes are geographically close to their test lakes, and Table 1 shows that in several cases the morphological dissimilarities between test-reference lake pairs are minimal. Though not "pristine" (residences are located along the lake shore), the reference lakes have the least amount of drainage basin area devoted to shoreline development. The extent of impaction by
sewage effluent upon the test lakes is listed in Table 2. The determination that Balsam, Moss and Enterprise lakes were not impacted by effluent phosphorus was made upon a reevaluation of nutrient loadings conducted after the monitoring study. At the time the study was initiated, in 1978, the phosphorus removal capabilities of municipal land treatment systems and private septic tank tile field systems were in question. These lakes were retained as test lakes throughout the analysis because they did differ from their respective reference lakes by having effluent land treatment systems within their watersheds, and could therefore be used to verify the phosphorus removal capabilities of these types of systems. Since the detergent phosphorus ban was intended to affect lakes that would be considered candidates for nutrient reduction measures, such as a lake with an effluent discharge within its watershed, the effect of the ban on Balsam, Moss and Enterprise lakes would be relevant to the overall success of the ban.

Figure 1. Locations of the Wisconsin Study Lakes.

TABLE 1. Limnological, Morphological, and Drainage Basin Characteristics of the Study Lakes.

Columns: surface area (ha)(1); volume (10^6 cu m)(2); mean depth (m)(3); maximum depth (m); mean hydraulic residence time (days)(4); number of tributaries, in(5)/out; immediate drainage area (sq km)(6); number of residences in 1981(7).

Lake             County     Area   Volume   Mean    Max     Residence   Tributaries   Drainage   Residences
                                            depth   depth   time        in/out        area
Butternut        Price       407   17.10    4.2     10.0     180        4/1            8.5       265
Elk              Price        36    0.55    1.5      6.0      <5        1/1            3.8         7
Balsam           Washburn    119    8.74    7.3     15.0      70        2/1            9.6        24
Teal             Sawyer      425   16.15    3.8      9.0     210        2/1            9.7       137
Moss             Vilas        79    2.36    3.0      9.0     900        0/1            3.0        40
Townline         Oneida       62    2.15    3.5      6.0     220        2/1            1.1        71
Enterprise       Langlade    204    7.26    3.6     11.0     620        1/1           10.9       124
Little Bearskin  Oneida       66    1.57    2.4      8.0      50        1/1            5.0        43
Swan             Columbia    164   16.03    9.8     25.0     160        1/1           21.3       104
Fish             Dane        102    6.34    6.2     19.0    1410        0/0            7.7        66

(1) Source: Wisconsin Department of Natural Resources (1981).
(2) Volumes estimated planimetrically using depth contours from maps prepared by The Clarkson Company, Kaukauna, WI.
(3) Lake volume divided by surface area.
(4) Lake volume divided by the mean annual flow.
(5) Intermittent streams are not listed as tributaries.
(6) Source: Wisconsin Department of Natural Resources (1975).
(7) Visual survey conducted by the Environmental Research Group, Inc., St. Paul, MN. (Note: a resort was counted as equivalent to 20 residences, and a scout camp as equivalent to 40 residences.)

The Wisconsin lakes were monitored from 1978 through 1982. Only reference lakes were monitored during 1979, the year the ban was initiated. Monitoring of Fish Lake was discontinued in 1981 and, hence, data from Fish Lake are not included in the statistical analyses. Field trips to the lakes occurred between ice-out (late April to mid-May) and fall overturn (late October to early November). The interval between sampling was typically four weeks, although samples were collected every two weeks during the summer months (July and August). Samples and measurements were taken on all of the lakes at the location of the deepest point and at one or two other locations, depending upon the
morphology of the lake. Transparency was measured using a standard Secchi disk. Profile measurements of temperature, dissolved oxygen, and conductivity were made at one-meter depth increments. An integrated two-meter sample of the epilimnion was obtained using a 37 mm (I.D.) PVC pipe. Aliquots of the integrated sample were stored in amber Nalgene bottles at 4 °C and earmarked for specific analyses. Chemical analysis of the samples was typically initiated within 48 hours. Total phosphorus determinations followed persulfate digestion (Menzel and Corwin, 1965); the colorimetric reaction involved reduction using ascorbic acid (Murphy and Riley, 1962). Chlorophyll-a was determined using trichromatic methods (APHA, 1976).

TABLE 2. Extent of Wastewater Treatment Within Study Lake Basins.

Lake         Municipal WWTP     Final Application of Treated Wastewater    Phosphorus Load   % of Total
                                                                           (kg/yr)           Load
Elk          Phillips           Direct discharge to lake                       1660              22
Butternut    Butternut          Indirect discharge to surface water             480              19
Swan         Pardeeville        Indirect discharge to surface water            1730              39
Townline     Three Lakes        Indirect discharge to surface water              54               8
Balsam       Birchwood          Land disposal                                     0               0
Moss         Lac du Flambeau    Land disposal                                     0               0
Enterprise   --                 Septic tank/tile field                            0               0

Note: About 30% of wastewater phosphorus may be assumed to come from detergents.

Temporal Variation of the Data
Temporal plots of the monitoring data, such as that presented in Figure 2, show the amount of variability present in water quality records of both physical and chemical parameters. Yearly trends in any of the monitored parameters were difficult to discern from the plots. However, a degree of "tracking", a synchronous correspondence between plots for test and reference lake pairs, could be ascertained in several cases. The obvious imprecision of any subjective determinations made upon the data set, however, led to the statistical methodology employed.

Figure 2. Temporal plots of the monitoring data (panels: Secchi depth, meters; concentration, µg/L).

STATISTICAL METHODOLOGY AND RESULTS

For examination of a potential ban effect, three types of statistical analysis were used: (i) covariance analysis for each test lake separately, (ii) combined covariance analysis for all test lakes, and (iii) multivariate analysis obtaining multiple comparison estimates for pre- and post-ban differences of interest. All analyses were performed using logarithmic transformations of the original lake data, a scale of measurement strongly supported by earlier studies of lake data distributions.

Covariance Analysis for Individual Test Lakes
For each test lake, a covariance analysis was performed using a model of the form

$$\log y_t = \beta_0 + \beta_1 \log y_r + \beta_2 B + \varepsilon,$$

where y_t represents a test lake observation, y_r represents the corresponding reference lake observation, and B is a 0/1 indicator variable marking a pre- or post-ban observation. The model permits variance reduction of the test lake data through their association with reference lake data obtained under similar background conditions. A potential difference due to the ban can therefore be detected with improved sensitivity. Table 3 lists the salient features of the covariance analyses for the logarithms of the responses for the individual test lakes.

The estimate of the change in the intercept associated with the post-ban period (β2) is the feature of greatest interest. This change is statistically significant only for Secchi disc depth in Elk and Townline Lakes. This may be partly attributed to the small sample sizes and the large variability, which encourages an examination of the test lakes simultaneously using an "indicator variable" approach, described next.

Covariance Analysis for All Test Lakes
Assuming that the test lake-reference lake relationship is similar for all the lakes, improved sensitivity for ban-effect detection is provided by a model that simultaneously considers


TABLE 3. Covariance Analysis for Individual Test Lakes.
Reference lake Fish: test lake Swan. Reference lake Teal: test lakes Elk, Balsam, Butternut.

RererenCe Lake

I

Lake

Pa ramt e r Intercept

II

I

Po

Post-Ban Change Standard E r r o r o f Change

11.89 1.52 1.111

8

I

I .08 21

I

I

.13

,89 .27 1.0411.29 -.06

I

I

8,

I

.12 -.201 .OO

.13 -.Oll-.05

.17

.2(1II

.10

.201 .09

.05

.06

.21 1 .08 I

.05

.151 1

I

I

I I .28 . 3 5

.33l .15

I

.191 .08

I

I I .OO

.08

.ll

I

I

.13

.271 .13 I

.18

.Oll .31

.42

. 0 4 l .25

.39

.lo1 .OO

.12

.041 .OO

I

I

R-Squa red

I

.It(*.33 I

.31l .OO

* * I * * I - . 8 0 -.15l .51 .55 .281 .38 .68

1-.13

Standard E r r o r

.4311.36

.I7

0

slope

,I I I I

A R-SqUa red Rsrerence Lake

j--------------Little

*

I

I

I

*

Y

.26

-48

.13

.20 I I

.20

.I41

.27

.lll

I

I

I

I

I

I

4' I

d

.33I .14

.la1 .13 I .09 . O O l .OO

I

I

.801 .30

I

I

I I .02 I

I

I

I

t I

I

Bearskin---------------

I

Enterprise I Moss I I I ITP Se WIZ S D W l I I -I 1-.41 -.01 .13I-.28 .37 .081 I I I

I I

Lake Pa ramst e r Intercept

Po

Post-Bsn Change standard

o f Change

error

B

I st8 so0 21 I -12 -07 I 4

8,

Slope Standard E r r o r

R-SqUa red

A R-squs red

I 1.13 I I -22 I

,

0

-161 - 0 2

I

Se 2 !@ .!

.85

.314 .97

-.lo

-06 -071 -00

-22

.09l - 0 8

-05

.12

-25

.32

I

- 1 3 1 -10 .05

I

I 41

T~~nIlnd TP

4

41

0

0

.54 I .52

I)

.95

.7011.08

-26

-181 -19 -17 el21 -14 -19 -1s

I .47 .34 I I .04 .OO I

I

.33I .54 I -03I .OO I

.67

I

i

.41 .Oh

.391 .31 I

.021 .OO

I

i

.lb

.141

.33

.021

I

I

* Significance at the 5% level.

all test lakes.

Such a model has the form:
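The combined model can be written out from the verbal description that follows. The coefficient names γ_j, δ_j and η_j are labels introduced here for illustration; β₀, β₁ and β₂ follow the individual-lake model. The terms appear in the order in which the analysis partitions the variability.

$$
\log y_t = \beta_0 + \beta_1 \log y_r
         + \sum_{j=1}^{g} \gamma_j D_j
         + \sum_{j=1}^{g} \delta_j D_j \log y_r
         + \beta_2 B
         + \sum_{j=1}^{g} \eta_j D_j B
         + \varepsilon
$$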

where each D_j is a dummy (indicator) 0/1 variable recording whether or not the observation is from the j-th test lake, and B is a 0/1 variable for pre- or post-ban (g denotes the number of test lakes minus one). This model permits estimation of the differential effect on the slope and intercept for the various test lakes. In partitioning the test lake variability, the method of estimation removed the components due to the indicator variables for the different test lakes and due to the reference lakes before evaluating the ban component. Put another way, the log test lake response is considered as the sum of the following terms:
- a general intercept;
- a linear relationship with the log reference lake response;
- an adjustment in the intercept for the specific test lake;
- an adjustment in the slope for the specific test lake;
- an adjustment in the general intercept for the pre-/post-ban period;
- an adjustment in the intercept for a specific test lake for the pre-/post-ban period.
The analysis partitions the test lake response variability into assignable sources in the order listed.

Table 4 summarizes the analysis of variance for each of total phosphorus (TP), Secchi disc depth (SD) and chlorophyll-a (CHLA). The corrected total sum of squares was partitioned sequentially into the following components: reference lake, intercept adjustment for different test lakes, adjustment of the slope of the reference lake variables for different test lakes, and finally, intercept adjustment for the post-/pre-ban effect. In other words, one adjusts the total test lake response variability for a potential relationship with the corresponding reference lake response and for individual test lake differences, and then examines the effect of imposition of the ban.

TABLE 4. Combined Covariate Analysis.
Model Steps | Parameter | Sum of Squares | D.F. | Mean Square | Test Stat. | R² | ΔR²

Only the Secchi disc depth measurement showed a detectable difference between the pre- and post-ban values at the five percent level of significance. The R² column of Table 4 gives the proportion of variability in the data explained by the model. The model does much better in this respect for total phosphorus (.51) and Secchi disc depth (.71) than for chlorophyll-a (.31). Table 4 also gives the proportion of the variability explained by various groups of terms in the model; these proportions are summarized in Table 5.

TABLE 5. Proportion of Variability Explained by Various Sources.

Source: Reference Lake Covariate; Test Lake Difference

Proportion of Variability Explained by Model

Proportion of Total Variability

TP SD CHLA

0.31 0.22 0.19

0.16 0.15

TP SD CHLA

0.63

0.34 0.49

0.69 0.71

0.06

0.22
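The sequential partitioning behind Tables 4 and 5 works as follows: each group of terms is credited with the reduction in residual sum of squares it produces after the terms entered before it. It can be sketched for the simpler individual-lake model log y_t = β0 + β1 log y_r + β2 B + ε. The Python/NumPy example below uses simulated, hypothetical data; the coefficients, sample size and seed are invented for illustration and are not the Wisconsin lake values. It also assumes natural logarithms, under which a post-ban coefficient β2 corresponds to a relative shift of exp(β2) − 1 on the original measurement scale. With base-10 logarithms the conversion would be 10^β2 − 1 instead.

```python
import numpy as np

def ols_rss(X, y):
    """Least-squares coefficients and residual sum of squares for y ~ X."""
    beta, *_ = np.linalg.lstsq(X, y, rcond=None)
    resid = y - X @ beta
    return beta, float(resid @ resid)

def sequential_fit(log_yt, log_yr, ban):
    """Enter terms in order (intercept, reference lake, ban) and record R^2 after each step.

    Returns the R^2 values and the coefficients (b0, b1, b2) of the full model.
    """
    ones = np.ones_like(log_yt)
    sst = float(((log_yt - log_yt.mean()) ** 2).sum())   # corrected total sum of squares
    steps = [
        ("intercept only", np.column_stack([ones])),
        ("+ reference lake", np.column_stack([ones, log_yr])),
        ("+ ban indicator", np.column_stack([ones, log_yr, ban])),
    ]
    r2 = {}
    beta = None
    for name, X in steps:
        beta, rss = ols_rss(X, log_yt)
        r2[name] = 1.0 - rss / sst
    return r2, beta

# Hypothetical paired data: 20 pre-ban and 20 post-ban sampling dates
rng = np.random.default_rng(1979)
ban = np.repeat([0.0, 1.0], 20)
log_yr = rng.normal(3.0, 0.4, ban.size)
log_yt = 0.5 + 0.8 * log_yr + np.log(0.9) * ban + rng.normal(0.0, 0.2, ban.size)

r2, (b0, b1, b2) = sequential_fit(log_yt, log_yr, ban)
names = list(r2)
for prev, cur in zip([None] + names[:-1], names):
    delta = r2[cur] - (r2[prev] if prev else 0.0)
    print(f"{cur:18s} R^2 = {r2[cur]:.3f}  (delta R^2 = {delta:.3f})")
print(f"post-ban shift: b2 = {b2:.3f}, i.e. {100 * (np.exp(b2) - 1):+.1f}% on the original scale")
```

Because the terms are entered in a fixed order, the ΔR² credited to the ban indicator is what remains after the reference lake covariate has been fitted. This mirrors the ordering described for the combined analysis.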

For each test lake, Table 6 lists the estimated slope coefficient for the corresponding reference lake and the estimated shift in the model after imposition of the ban. The estimated standard deviations of these estimates are given in parentheses, and asterisks (*) indicate statistical significance at the five percent level or better. A shift associated with the ban was detectable at the five percent level in only 4 of the 21 cases: total phosphorus in Enterprise Lake, Secchi disc depth in Elk and Townline Lakes, and chlorophyll-a in Elk Lake. In two of these cases (total phosphorus for Enterprise Lake and chlorophyll-a for Elk Lake) the post-ban shift is positive, which could not be attributed to the ban. Hence, from

314

TABLE 6 . Estimates of the post-/pre-ban shift and slope-coefficient for corresponding reference lakes. Chlorophyll - a

----------------

.oa

Swan

(.lo) Balsam

-.01

Butternut Elk Enterprise

*

.47

(-09)

(.I71

-.01 (.09)

.37 (.IT)

-.04 (.09)

.35 * (-17)

.17 (.09)

Moss

-.13 (.23)

0

1.14

*

*

.16 (.16)

-.0 2

.54

(.04)

(-17)

.33 .21)

-.06 (.I61

.38 .21)

.31 (.16

.26 (.I61

.40 .21)

0

.75

.15

.70 * .21)

.54

(.16)

(.20

.07

.66 (.20

.07

.24

.10

(-16)

(.09)

Townline

(.16)

*

(.I71 1.11

-.19

*

.22

*

*

.21) .32

* Statistically significant at at least the 5% level (

)

Standard Deviation

this analysis, the only effect that appears to be associated with the ban is for Secchi disc depth. The relative magnitude of this shift is approximately 10 percent, and, although statistically significant, a question could be raised about the meaningfulness of its significance. The number of slope estimates that are statistically significant is an indicator that the relationship of the reference lakes to the test lakes is accounting for a statistically significant proportion of the variability. These data were useful in making the analysis more sensitive. However, the amount of variability not explained by this relationship is larger still. Multivariate Analysis/Multiple Comparisons A general multivariate analysis taking into account the covariance structure of the data was carried out. It complements

375

the preceding two analyses by using statistical procedures which account for possible correlation of the measurements. To this end, the measurements for a given lake and year were considered to be a single multivariate variable, or vector, for purposes of analysis. For each post-ban year, the vector analyzed actually consisted of the differences from the corresponding sampling times for the single pre-ban year. In one analysis, the test lakes and the reference lakes were considered together. In another analysis, the test lakes were considered separately. In either case, the vectors of differences were analyzed in a two-way table in which the entries were identified by lake and by year. Simultaneous confidence intervals on the differences were also constructed. The description of the procedures for these analyses are given in the Appendix. A similar analysis was also performed for each test lake using the differences of the logarithms of the test lake measurements and the corresponding reference lake measurements, in a sense an analysis of the test lake data adjusted for a potential relationship with its corresponding reference lake. The estimated differences for correspondings dates between post- and pre-ban measurements and their simultaneous confidence intervals are best presented graphically. Figure 3 shows the

Toral Phosphorus

Chlorophyll 2

1

1982

1982

1982

1981

1981

1981

1980

1980

-100

-1.20

0.m

Um

Contrast Value

zoo

-2.M

-1.00

0.00

1.00

Contrasr Value

200

-0.80

-0.20

Om

Convast Value

Figure 3. Estimated differences and 95% confidence bounds for post-ban effects. Analysis of data from all lakes.

&I

376

results for the three analyses for total phosphorus, Secchi disk depth, and chlorophyll-a. Figure 4 presents a similar analysis for test lakes only. For each response variable and year, the left curve is the lower confidence bound, the middle curve the estimated contrast value, and the right curve the upper confidence bound. A vertical "no effect" line passes through zero. It is clear from Figures 3 and 4 that the ban has not had a statistically significant effect on total phosphorus, chlorophyll-a, or Secchi disc depth, although the general positive nature of the estimate for the latter for all post-ban years may support an indication of some effect for Secchi disc depth. This multivariate analysis was also performed for data constructed from the differences of the log test lake responses and the corresponding log reference lake responses. Graphs of the simultaneous confidence intervals on the differences between post- and pre-ban years for each point in time that was sampled are given in Figure 5. In this analysis no effect of the ban is observable.
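For the reference-adjusted analysis (Figure 5), each test-lake series is replaced by the difference of logarithms between the test and reference lake at matching sampling times. Each post-ban year is then differenced against the pre-ban year. The sketch below assumes every year shares the same sampling schedule, and the function names and example numbers are hypothetical. It also shows that a uniform 10 percent change corresponds to a log difference of ln(1.1), about 0.095.

```python
import numpy as np

def log_ratio(test, ref):
    """log(test) - log(ref) at matching sampling times."""
    return np.log(np.asarray(test, dtype=float)) - np.log(np.asarray(ref, dtype=float))

def post_minus_pre(series_by_year, pre_year):
    """Difference vector (post-ban year minus pre-ban year, time by time)
    for every year other than pre_year."""
    pre = np.asarray(series_by_year[pre_year], dtype=float)
    return {year: np.asarray(v, dtype=float) - pre
            for year, v in series_by_year.items() if year != pre_year}

# Hypothetical Secchi depths (m) at eight matching sampling times.
test_pre = np.array([2.0, 2.5, 3.0, 3.2, 3.1, 2.8, 2.4, 2.2])
ref = np.array([1.5, 1.8, 2.0, 2.1, 2.0, 1.9, 1.7, 1.6])
adjusted = {1978: log_ratio(test_pre, ref),
            1980: log_ratio(1.1 * test_pre, ref)}   # test lake 10% deeper, reference unchanged
diffs = post_minus_pre(adjusted, 1978)
print(np.round(diffs[1980], 3))                     # each entry is ln(1.1)
```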

Figure 4. Estimated differences and 95% confidence bounds for post-ban effects. Analysis of test lakes only, logarithmic transform of data. [Panels: Secchi Disc Depth, Chlorophyll-a, Total Phosphorus; rows 1980, 1981, 1982; horizontal axis: Contrast Value.]

Figure 5. Estimated differences and 95% confidence bounds for post-ban effects. Analysis of difference of logarithmic transform of data between test lakes and reference lakes. [Panels: Total Phosphorus, Secchi Disc Depth, Chlorophyll-a; rows 1980, 1981, 1982; horizontal axis: Contrast Value.]

CONCLUSION

An effect of the phosphate ban, if any, was so small that it could not be detected with statistical significance given the variability observed in the data. However, including reference lake measurements in the models appears to have improved their sensitivity for detecting ban effects. This improvement could be used to estimate how much additional test lake sampling would be needed to provide the same sensitivity if sampling of the reference lakes were eliminated. The multivariate/multiple comparison analysis rests on more supportable assumptions. It could, however, have detected a ban effect only with much more data or greatly reduced measurement variability. It does permit a valid analysis without requiring a model relating the response to the time of year. Of course, sampling often enough to permit reliable estimation of such a model would give considerably greater power for detecting a ban effect.
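The trade-off mentioned above can be roughed out from residual mean squares. This rests on two assumptions: the standard error of the ban-effect estimate scales as the square root of MSE/n, and differences in degrees of freedom can be ignored. Under those assumptions, matching the precision of the model that includes the reference-lake covariate requires scaling the number of test-lake samples by the ratio of the two mean squares. The function and numbers below are hypothetical illustrations, not values from this study.

```python
import math

def equivalent_sample_size(n_with_ref, mse_with_ref, mse_without_ref):
    """Test-lake samples needed without the reference-lake covariate to match the
    standard error obtained with it, assuming SE is proportional to sqrt(MSE / n)."""
    return math.ceil(n_with_ref * mse_without_ref / mse_with_ref)

# Hypothetical residual mean squares from models fitted with and without the covariate.
print(equivalent_sample_size(40, 0.25, 0.5))   # 80
```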


APPENDIX

Description of the Multivariate/Multiple Comparison Analysis

The logarithms of the measurements of each response variable for a lake in a year were treated as an eight-dimensional vector response. These data were analyzed as a multivariate two-way layout. The model can be written as

$$X_{ijk} = \mu_k + Y_{ik} + L_{jk} + e_{ijk}, \qquad i = 1,\dots,4;\; j = 1,\dots,9;\; k = 1,\dots,8,$$

where:

- $X_{ijk}$ is the $k$th observation on lake $j$ in year $i$;
- $\mu_k$ is the $k$th component of the grand mean;
- $Y_{ik}$ is the $k$th component of the effect of year $i$;
- $L_{jk}$ is the $k$th component of the effect of lake $j$.

The error vectors $(e_{ij1}, e_{ij2}, \dots, e_{ij8})'$ are assumed to be independent eight-variate Gaussian with zero mean and covariance matrix $\Sigma$. This model has two principal advantages:

1. It takes account of the covariance structure of the data.
2. It is simple, allowing for differences between lakes and years without assuming a specific mathematical model for the differences.

Parameter Estimation

The multivariate analysis of variance closely parallels its univariate counterpart. Maximum likelihood estimates of the effects are given by

$$\hat\mu_k = \bar x_{\cdot\cdot k}, \qquad \hat Y_{ik} = \bar x_{i\cdot k} - \bar x_{\cdot\cdot k}, \qquad \hat L_{jk} = \bar x_{\cdot jk} - \bar x_{\cdot\cdot k},$$

where a dot and bar denote averaging over the corresponding subscript. The maximum likelihood estimate $\hat\Sigma$ of the error covariance matrix is proportional to the error sum of squares and cross products matrix $E = (e_{kg})$, where

$$e_{kg} = \sum_{i=1}^{4}\sum_{j=1}^{9} \left(x_{ijk} - \hat\mu_k - \hat Y_{ik} - \hat L_{jk}\right)\left(x_{ijg} - \hat\mu_g - \hat Y_{ig} - \hat L_{jg}\right).$$

The statistical tests of interest are multiple comparisons of the contrasts $C_{ik} = Y_{ik} - Y_{1k}$. Each contrast is the difference in measurement $k$ between post-ban year $i$ (1980, 1981 or 1982) and the pre-ban year, 1978. The $C_{ik}$ are estimated by $\hat C_{ik} = \hat Y_{ik} - \hat Y_{1k}$ ($i = 2, 3, 4$; $k = 1, \dots, 8$).
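The estimates above reduce to simple averages over the year-by-lake-by-measurement array. Here is a minimal NumPy sketch. It assumes the logged data are stored as an array of shape (years, lakes, measurements) = (4, 9, 8) with no missing values. The function name `two_way_manova` is hypothetical.

```python
import numpy as np

def two_way_manova(x):
    """Additive model X_ijk = mu_k + Y_ik + L_jk + e_ijk for x of shape (I, J, p).

    Returns mu_hat (p,), year effects (I, p), lake effects (J, p),
    the error SSCP matrix E (p, p) and contrasts C_ik = Y_ik - Y_1k, i = 2..I.
    """
    x = np.asarray(x, dtype=float)
    mu = x.mean(axis=(0, 1))                           # x-bar_..k
    Y = x.mean(axis=1) - mu                            # x-bar_i.k - x-bar_..k
    L = x.mean(axis=0) - mu                            # x-bar_.jk - x-bar_..k
    resid = x - mu - Y[:, None, :] - L[None, :, :]
    E = np.einsum("ijk,ijg->kg", resid, resid)         # e_kg
    C = Y[1:] - Y[0]                                   # post-ban minus pre-ban year
    return mu, Y, L, E, C

# Hypothetical logged data: 4 years x 9 lakes x 8 sampling times.
rng = np.random.default_rng(0)
data = rng.normal(size=(4, 9, 8))
mu, Y, L, E, C = two_way_manova(data)
Sigma_hat = E / (data.shape[0] * data.shape[1])        # ML estimate: E divided by 36
print(C.shape, E.shape)                                # (3, 8) (8, 8)
```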

By formula (8), pp. 200-201 of Morrison (1976), the $100(1-\alpha)$ percent simultaneous confidence intervals on the $\{C_{ik}\}$ for all nine lakes are

$$\hat C_{ik} - \left[\frac{x_\alpha}{1 - x_\alpha}\cdot\frac{2}{9}\, e_{kk}\right]^{1/2} \;\le\; C_{ik} \;\le\; \hat C_{ik} + \left[\frac{x_\alpha}{1 - x_\alpha}\cdot\frac{2}{9}\, e_{kk}\right]^{1/2},$$

where $e_{kk}$ is the $k$th diagonal element of $E$. Here $x_\alpha$ is the upper $100\alpha$ percentage point of the greatest characteristic root distribution with parameters (in Morrison's notation) $s = 3$, $m = 2$ and $n = 7.5$. We take $\alpha = 0.05$ and find from Chart 11, p. 381 of Morrison that $x_\alpha = 0.665$. Although the above is for all nine lakes, similar expressions can be written for the other situations discussed in the Methods section.
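Suppose the intervals take the usual greatest-root (Roy) form for a year contrast averaged over nine lakes. The contrast coefficients then contribute a factor 2/9, and the half-width for measurement k depends only on x_α and the diagonal element e_kk of E. This is our reading of Morrison's formula rather than a transcription of it. The helper below and its inputs are hypothetical.

```python
import numpy as np

def roy_half_width(e_kk, x_alpha=0.665, n_lakes=9):
    """Half-width sqrt(x/(1-x) * (2/n_lakes) * e_kk) of the simultaneous interval
    for a contrast between one post-ban year and the pre-ban year."""
    e_kk = np.asarray(e_kk, dtype=float)
    return np.sqrt(x_alpha / (1.0 - x_alpha) * (2.0 / n_lakes) * e_kk)

# Hypothetical diagonal of E and estimated contrasts for one post-ban year.
e_diag = np.array([0.9, 1.2, 0.8])
c_hat = np.array([0.10, -0.35, 0.05])
hw = roy_half_width(e_diag)
for c, h in zip(c_hat, hw):
    print(f"{c:+.2f}  [{c - h:+.2f}, {c + h:+.2f}]")
```

An interval that excludes zero would indicate a post-ban change at that sampling time. Because $x_\alpha$ covers all contrasts simultaneously, each individual interval is conservative.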

REFERENCES
American Public Health Association. 1976. Standard Methods for the Examination of Water and Wastewater, 14th Edition.
Bell, J.M. and A. Spacie. 1978. "Trophic Status of Fifteen Indiana Lakes in 1977." Purdue University.
Hartig, J.H. and F.J. Horvath. 1982. "A Preliminary Assessment of Michigan's Phosphorus Detergent Ban." Journal of the Water Pollution Control Federation, 54(2): 193-197.
Hirsch, R.M., J.R. Slack and R.A. Smith. 1982. "Techniques of Trend Analysis for Monthly Water Quality Data." Water Resources Research, 18(1): 107-121.
Hutchinson, G.E. 1973. "Eutrophication." American Scientist, 61: 269-279.
Likens, G.E., ed. 1972. Nutrients and Eutrophication: The Limiting-Nutrient Controversy. American Society of Limnology and Oceanography, Inc. Lawrence, Kansas.
Lillie, R.A. and J.W. Mason. 1983. Limnological Characteristics of Wisconsin Lakes. Technical Bulletin No. 138. Department of Natural Resources, Madison, Wisconsin. 116 pp.
Maki, A.W., D.B. Porcella and R.H. Wendt. 1984. "The Impact of Detergent Phosphorus Bans on Receiving Water Quality." Water Research, 18(7): 893-903.
Menzel, D.W. and N. Corwin. 1965. "The Measurement of Total Phosphorus in Seawater Based on the Liberation of Organically Bound Fractions by Persulfate Digestion." Limnology and Oceanography, 10: 280-282.
Morrison, D.F. 1976. Multivariate Statistical Methods, 2nd Edition. McGraw-Hill Book Company. 415 pp.
Murphy, J. and J.P. Riley. 1962. "A Modified Single Solution Method for the Determination of Phosphate in Natural Waters." Analytica Chimica Acta, 27: 31-36.
National Eutrophication Survey. 1974. "Relationships Between Drainage Area Characteristics and Non-point Source Nutrients in Streams." Working Paper No. 25.
Pieczonka, P. and N.E. Hopson. 1974. "Phosphorus Detergent Bans: How Effective?" Water and Sewage Works, July 1974: p. 52.
Prescott, G.W. 1962. Algae of the Western Great Lakes Area. Wm. C. Brown Company Publishers. Dubuque, Iowa. 977 pp.
Runke, H. 1982. "Effects of Detergent Phosphorus on Lake Water Quality in Minnesota: A Limnological Investigation of Representative Minnesota Lakes, 1975-1980." A report prepared for the Procter and Gamble Company.
Schindler, D.W. 1977. "Evolution of Phosphorus Limitation in Lakes." Science, 195: 260-262.
Trautmann, N.M., C.E. McCulloch and R.T. Oglesby. 1982. "Statistical Determination of Data Requirements for Assessment of Lake Restoration Programs." Canadian Journal of Fisheries and Aquatic Sciences, 39: 607-610.
Van Belle, G. and J.P. Hughes. 1984. "Nonparametric Tests for Trend in Water Quality." Water Resources Research, 20(1): 127-136.
Wetzel, R.G. 1975. Limnology. W.B. Saunders, Philadelphia. 743 pp.
Wisconsin Department of Natural Resources. 1975. "Classification of Wisconsin Lakes by Trophic Condition: April 15, 1975." G. Anderson, ed. WDNR, Bureau of Water Quality, Madison, WI. 108 pp.

THE CHANGE POINT PROBLEM: A REVIEW OF APPLICATIONS

V.K. Jandhyala and I.B. MacNeill, The University of Western Ontario, London, Ontario, Canada N6A 5B9

Introduction

In the context of process inspection schemes, Page (1955) proposed a test for a change in a parameter occurring at an unknown time point. Since then, an extensive literature on this problem has appeared in various scientific journals, dealing with it under several model assumptions and using both existing and new statistical methodologies. A majority of these papers contain theoretical developments and analyses of simulated data. Many, however, also discuss models applied to actual data from a variety of practical situations, including analyses for the detection and estimation of change points at unknown times. The aim of this paper is to review those papers that contain analyses of statistical models applied to real data. This literature contains new modelling techniques, new methods of data analysis developed from the Bayesian approach and from likelihood methods, and non-parametric methods. The models and corresponding analyses have been applied to various types of data.

Statistical modelling and analysis of the change point problem originated with Page (1955). The literature dealing with actual applications to data, however, was initiated in 1971 by Bacon and Watts, who proposed estimating the transition between two intersecting straight lines using a smooth transition function. They applied the procedure to estimating the change in the behaviour of stagnant surface layer height in a controlled flow of water down an inclined channel. Since then, the following authors have contributed to both theory and applications: Griffiths and Miller (1973), Sen and Srivastava (1975), Brown, Durbin and Evans (1975), Schweder (1976), Tsurumi (1977), Bagshaw and Johnson (1977), Pettit (1979), Hsu (1979, 1982), Smith and Cook (1980), Esterby and El-Shaarawi (1981), Menzefricke (1981), Worsley (1983), Commenges and Seal (1985) and MacNeill (1985).

Two-Regime Transition Models

Bacon and Watts (1971) studied the change in behaviour of stagnant surface layer height in a controlled flow of water down an inclined channel using different surfactants. They modelled the data with a two-regime transition model that is sensitive to changes in the slope of a simple linear regression. The transition model is given by

$$Y = \alpha_0 + \alpha_1 (x - x_0) + \alpha_2 (x - x_0)\,\mathrm{trn}\!\left(\frac{x - x_0}{\gamma}\right) + Z, \qquad (1)$$

where $\mathrm{trn}((x - x_0)/\gamma)$ is a transition function satisfying the following smoothness conditions:

(i) $\lim_{s \to \infty} \mathrm{trn}(s/\gamma) = 1$, (ii) $\mathrm{trn}(0) = 0$, (iii) $\lim_{\gamma \to 0} \mathrm{trn}(s/\gamma) = \mathrm{sgn}(s)$, (iv) $\lim_{\gamma \to 0} s\,\mathrm{trn}(s/\gamma) = |s|$,

and $Z$ is a random variable representing error. The model is said to be insensitive to the particular form of the transition function, so the transition function $\mathrm{trn}(s/\gamma) = \tanh(s/\gamma)$ is used. The model parameters are then estimated by a Bayesian approach. The joint marginal posterior density of $x_0$ and $\gamma$ was calculated, and a peak was noted in this density. A similar analysis was also performed on another water flow data set.
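Once $x_0$ and $\gamma$ are fixed, model (1) with the tanh transition is linear in $\alpha_0$, $\alpha_1$ and $\alpha_2$, which makes it easy to fit. The sketch below runs a grid search over $(x_0, \gamma)$ and estimates the $\alpha$'s by least squares at each grid point. This profile least-squares approach is a substitute for the Bayesian analysis of Bacon and Watts, not their method. The data are synthetic and the grids are arbitrary choices.

```python
import numpy as np

def bw_design(x, x0, gamma):
    """Columns for alpha0, alpha1, alpha2 in model (1) with trn(s) = tanh(s)."""
    s = x - x0
    return np.column_stack([np.ones_like(x), s, s * np.tanh(s / gamma)])

def fit_bacon_watts(x, y, x0_grid, gamma_grid):
    """Grid search over (x0, gamma) with alphas by least squares at each point.
    Returns (rss, x0, gamma, alphas) for the best grid point."""
    best = None
    for x0 in x0_grid:
        for gamma in gamma_grid:
            X = bw_design(x, x0, gamma)
            alphas, *_ = np.linalg.lstsq(X, y, rcond=None)
            rss = float(np.sum((y - X @ alphas) ** 2))
            if best is None or rss < best[0]:
                best = (rss, float(x0), float(gamma), alphas)
    return best

# Hypothetical synthetic data with a bend at x0 = 0.2.
rng = np.random.default_rng(0)
x = np.linspace(-1.0, 1.0, 60)
y = bw_design(x, 0.2, 0.05) @ np.array([0.5, -0.7, -0.4]) + rng.normal(scale=0.01, size=x.size)
rss, x0_hat, gamma_hat, alphas = fit_bacon_watts(
    x, y, np.linspace(-0.5, 0.5, 101), [0.01, 0.02, 0.05, 0.1, 0.2])
print(f"x0 = {x0_hat:.2f}, gamma = {gamma_hat:.2f}, alphas = {np.round(alphas, 2)}")
```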

Griffiths and Miller (1973) analyzed the same water flow data with a regression involving a modified smooth transition function,

$$\mathrm{trn}(s/\gamma) = \frac{\sqrt{s^2 + \gamma^2}}{s}, \qquad (2)$$

thereby relaxing the condition $\mathrm{trn}(0) = 0$ assumed by Bacon and Watts (1971). With this transition function the regression line takes the form of a bent hyperbola. The fitted model for the first data set of stagnant band heights in a controlled flow of water is

$$\hat Y = 0.556 - 0.735(x - 0.063) - 0.359\sqrt{(x - 0.063)^2 + 0.096}. \qquad (3)$$

The model of Bacon and Watts (1971) and this model are both adequate.

Changes in Mean Level

Sen and Srivastava (1975) analysed the following data sets on traffic in the State of Illinois from 1962 through 1971:

- the number of traffic deaths;
- the number of traffic injuries (in thousands);
- the number of traffic accidents (in thousands);
- the number of deaths per hundred million vehicle miles.

For the purpose of detecting changes of parameters at unknown time points, the data are modelled as

$$X_i = Y_{i+1} - Y_i = h_i + \delta_i, \qquad (4)$$

where $Y_i$ is any of the four types of traffic data for the $i$th year. The $\delta_i$'s are error variables assumed to be normal with mean zero and unknown variance $\sigma^2$. The statistic $P$ in (5), suggested for detecting changes in the $h_i$'s at unknown times, is formed from

$$U = \sum_{i=1}^{n-1} i\,(X_{i+1} - \bar X) \qquad \text{and} \qquad V = \frac{\sum_{i=1}^{n-1} (X_{i+1} - X_i)^2}{2(n-1)}.$$

The computed values of $P$ for the four data sets were 0.164, 0.283, 0.069 and 0.198, respectively. The 95% significance point is 0.155, so changes are detected in the first, second and fourth data sets. The significance point was based on a Monte Carlo study of 5000 simulations.

Detecting Changes in Regression Using Cusum Schemes

Brown, Durbin and Evans (1975) developed tests for the stability of relationships over time based on cumulative sums of recursive residuals and cumulative sums of squares of recursive residuals. A computer package called TIMVAR was developed to implement these methods, which were applied to three practical examples.

First, the methodology was applied to a regression model explaining growth in the number of local telephone calls. The model involves a constant and four independent variables. The cusum plot of the residuals and the cusum of squares plot were obtained through TIMVAR. The analysis showed no change up to 1964/65 and indicated instability thereafter.

The second example concerns the International Monetary Fund. If

$M_t$ = per capita stock of money,
$R_t$ = long-term interest rate, and
$Y_t$ = per capita income,

then the proposed model is

$$\Delta \log M_t = \alpha + \beta\,\Delta \log R_t + \gamma\,\Delta \log Y_t + \epsilon_t, \qquad (6)$$

where $\Delta$ is the difference operator. The TIMVAR analysis detected no statistically significant changes. Similar modelling and analysis was also done on certain civil service data.

Schweder (1976) studied the relative growth of different body parts of fin whales. A point of structural shift in a whale's life usually indicates that the whale has entered a new phase in its development. Define, for $i = 1, 2, \dots, 108$:

$X_i$ = log length of whale $i$,
$Y_i$ = log height of the dorsal fin of whale $i$, and
$Z_i$ = log length of the base of the dorsal fin of whale $i$.

The relations

$$Y_i = \alpha_1 + \beta_1 X_i + \epsilon_{1i} \qquad (7)$$

and

$$Z_i = \alpha_2 + \beta_2 Y_i + \epsilon_{2i} \qquad (8)$$

were studied for structural shifts. A structural shift is indicated if model (7) is transformed to

$$Y_i = \alpha_1 + \beta_1 X_i + \gamma + \epsilon_{1i} \qquad (9)$$

for observations beyond some unknown index $i$. The observations were ordered such that $x_1 < x_2 < \dots < x_{108}$, and a cusum procedure was developed to test for structural shifts. The minimum of the cusum was observed to be -32.4, with a significance probability of 0.0014, indicating a structural shift in the regression of $Y$ on $X$. The point of shift is estimated by the point at which the cusum is a minimum, $\hat\imath = 48$. A similar analysis was applied to model (8).

Bagshaw and Johnson (1977) analysed the first 112 observations of the series of IBM stock prices given in Box and Jenkins (1970). They fitted an IMA(1,1) model with non-zero mean,

$$\nabla z_t = 1.28 + (1 + 0.29B)\,a_t, \qquad (11)$$

that is, $\hat\sigma_a^2 = 25.28$, $\hat\theta_1 = -0.29$ and $\hat\theta_0 = 1.28$. A cusum test developed for detecting parameter changes in ARIMA models was then applied, and it detected a change in $\theta$ at observation 270.
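The cusum test of Brown, Durbin and Evans can be sketched in a few lines. Recursive residuals are standardized one-step-ahead prediction errors from regressions fitted to the first r observations. Their cumulative sum, scaled by the estimated standard deviation, is compared with straight-line boundaries through ±a√(n-k) at r = k and ±3a√(n-k) at r = n, with a = 0.948 for a 5% test. This is an illustrative NumPy sketch, not TIMVAR, and the data are synthetic.

```python
import numpy as np

def recursive_residuals(X, y):
    """Standardized one-step-ahead prediction errors w_r, r = k+1, ..., n."""
    n, k = X.shape
    w = np.empty(n - k)
    for r in range(k, n):                      # fit on the first r rows, predict row r
        Xr, yr = X[:r], y[:r]
        XtX_inv = np.linalg.inv(Xr.T @ Xr)
        b = XtX_inv @ Xr.T @ yr
        xr = X[r]
        w[r - k] = (y[r] - xr @ b) / np.sqrt(1.0 + xr @ XtX_inv @ xr)
    return w

def cusum_test(X, y, a=0.948):
    """CUSUM of recursive residuals against the 5% boundary (a = 0.948)."""
    n, k = X.shape
    w = recursive_residuals(X, y)
    W = np.cumsum(w) / w.std(ddof=1)
    r = np.arange(k + 1, n + 1)                # time index of each W_r
    bound = a * np.sqrt(n - k) + 2.0 * a * (r - k) / np.sqrt(n - k)
    return W, bound, bool(np.any(np.abs(W) > bound))

# Hypothetical data: the intercept jumps by 5 after observation 30.
rng = np.random.default_rng(2)
x = rng.uniform(0.0, 1.0, 60)
X = np.column_stack([np.ones_like(x), x])
y = 1.0 + 2.0 * x + 5.0 * (np.arange(60) >= 30) + rng.normal(scale=0.1, size=60)
W, bound, crossed = cusum_test(X, y)
print("boundary crossed:", crossed)
```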

Non-Parametric Tests for Change of Parameter at Unknown Time

Pettit (1979) analyzed the Lindisfarne scribes' binomial data for changes at unknown times using a non-parametric method. The data are the numbers of occurrences of the present indicative third person singular endings "-s" and "-ð" in different sections of the Lindisfarne text. It is believed that different scribes used the endings "-s" and "-ð" in different proportions. Pettit (1979) developed a non-parametric test for detecting changes at unknown times; the test is a version of the two-sample Mann-Whitney test. For testing change against no change, the statistic is

$$K_T = \max_{1 \le t < T} |U_{t,T}|, \qquad (12)$$

and for one-sided alternatives

$$K_T^{+} = \max_{1 \le t < T} U_{t,T} \qquad (13)$$

and

$$K_T^{-} = -\min_{1 \le t < T} U_{t,T}, \qquad (14)$$

where

$$U_{t,T} = \sum_{i=1}^{t} \sum_{j=t+1}^{T} D_{ij} \qquad (15)$$

and $D_{ij} = \mathrm{sgn}(X_i - X_j)$. Exact distributional properties were obtained for Bernoulli random variables. The Lindisfarne data were analysed with a simple modification of the Bernoulli case for binomial data, associating "-s" with one and "-ð" with zero. The value of

$$K_T = \max_{1 \le t \le T-1} |U_{t,T}| \qquad (16)$$

was found to be 7906, and the standardized statistic was 1.83. This standardized statistic has the same distribution as Smirnov's statistic. From that distribution the significance level was found to be 0.25 percent, strongly indicating a change. The same technique has been used to analyze industrial data.

Detecting Parameter Changes Using Bayesian Methods

Smith and Cook (1980) studied changes at unknown times in the functioning of a transplanted kidney by formulating the problem as a two-phase simple linear regression model:

$$Y_i = \alpha_1 + \beta_1 x_i + e_i, \qquad i = 1, \dots, m, \qquad (20)$$

and

$$Y_i = \alpha_2 + \beta_2 x_i + e_i, \qquad i = m+1, \dots, n. \qquad (21)$$

If the two lines meet at $\gamma = (\alpha_2 - \alpha_1)/(\beta_1 - \beta_2)$, then in the renal transplant application $\gamma$ corresponds to the time at which a rejection occurs. Data from two patients were considered. An unconstrained version of the model was analysed by Bayesian methods with a vague prior specification consisting of the uniform distribution over $2 \le m \le n - 1$. The posterior densities of $m$ and $\gamma$ were obtained, and from them the change points $m$ and $\gamma$ were estimated for both patients.
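Pettit's statistics (12)-(15) can be computed directly from the matrix of signs $D_{ij}$. The p-value below uses the approximation p ≈ 2 exp(-6K_T²/(T³ + T²)) for continuous data. It is not the exact Bernoulli/binomial distribution used for the Lindisfarne data. The function name is hypothetical.

```python
import numpy as np

def pettitt(x):
    """U_{t,T} for t = 1..T-1, K_T = max |U_{t,T}|, the maximizing t, and the
    continuous-data approximation p ~ 2 exp(-6 K^2 / (T^3 + T^2))."""
    x = np.asarray(x, dtype=float)
    T = x.size
    D = np.sign(x[:, None] - x[None, :])                   # D_ij = sgn(X_i - X_j)
    U = np.array([D[:t, t:].sum() for t in range(1, T)])   # sum over i <= t < j
    K = float(np.abs(U).max())
    t_hat = int(np.abs(U).argmax()) + 1                    # last index of the first segment
    p_approx = min(1.0, 2.0 * np.exp(-6.0 * K ** 2 / (T ** 3 + T ** 2)))
    return U, K, t_hat, p_approx

# A sequence of ten zeros followed by ten ones.
U, K, t_hat, p = pettitt([0] * 10 + [1] * 10)
print(K, t_hat, round(p, 4))
```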

Hsu (1979, 1982) analyzed U.S. stock market prices from July 1971 through August 1974 to detect shifts in the variance at unknown times. Hsu (1979) also analyzed air traffic densities in the New York area. To detect changes in the variances at unknown times, Hsu (1979) used the statistic

$$T = \frac{\sum_{i=1}^{n} (i-1) X_i}{(n-1)\sum_{i=1}^{n} X_i} \qquad (18)$$

in the standardized form

$$T^{*} = \frac{T - 1/2}{\sigma_T}, \qquad (19)$$

where $\sigma_T$ is the standard deviation of $T$ under the hypothesis of no change. For the squared returns, $T^{*}$ was found to be 3.521, substantially larger than the critical point 2.326 at the 0.01 right-sided level. The change point was then estimated by maximum likelihood to be the 89th time point. This corresponds to mid-March 1973, when the Watergate events caught the full attention of the U.S. public.

Hsu (1982) proposed a step change in the parameters for the same stock market data. He analyzed it using a modified form of a Bayesian inference procedure originally developed by Box and Tiao (1973). The posterior probability function for the change point indicated that a change in the market return distribution occurred in late February or March of 1973, in agreement with the analysis in Hsu (1979).

In the air traffic application, Hsu (1979) considered arrivals at the New York airports on a single day. The 213 arrival times were first analyzed to establish the distribution of inter-arrival times, and the exponential distribution was found to fit well. $T^{*}$ was then calculated to be 1.232, well below the significance bounds, suggesting that the aircraft were arriving at a constant rate.
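Statistic (18) is a weighted average of the scaled time index $(i-1)/(n-1)$ with weights proportional to $X_i$, so it centres on 1/2 when there is no change. The standardization below assumes that, under the null hypothesis, the $X_i$ are independent gamma variables with common shape κ and scale. The variance formula is derived from that assumption (it gives 1/(12(n-1)) for exponential data) and is not quoted from Hsu (1979). The function name and data are hypothetical.

```python
import numpy as np

def hsu_T(x, kappa=1.0):
    """Hsu's T (18) and T* = (T - 1/2) / sd(T), using
    Var(T) = (n + 1) / (12 (n - 1) (n kappa + 1)) for iid gamma(kappa) data."""
    x = np.asarray(x, dtype=float)
    n = x.size
    i = np.arange(1, n + 1)
    T = np.sum((i - 1) * x) / ((n - 1) * np.sum(x))
    sd = np.sqrt((n + 1) / (12.0 * (n - 1) * (n * kappa + 1)))
    return T, (T - 0.5) / sd

# Hypothetical returns whose variance increases ninefold halfway through.
rng = np.random.default_rng(4)
z = np.concatenate([rng.normal(size=80), 3.0 * rng.normal(size=80)])
T, T_star = hsu_T(z ** 2, kappa=0.5)       # squared normal data: gamma with shape 1/2
print(f"T = {T:.3f}, T* = {T_star:.2f}")
```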

Tsurumi (1977) examined whether there was a parameter shift in consumer expenditures on vitamins and other nutritional supplements in the Japanese pharmaceutical industry. The demand curve is taken to be

$$\ln y_i = \alpha \ln p_i + \beta \ln x_i + u_i, \qquad i = 1, 2, \dots, 16,$$

where:

$y_i$ = real average expenditures on vitamins and other nutritional supplements by consumers in income group $i$ (1965 yen);
$p_i$ = relative price of nutritional supplements to consumers in income group $i$ (1965 = 100);
$x_i$ = real average disposable income of consumers in income group $i$ (thousands of 1965 yen).

The study was based on data from 1969 to 1974. A Bayesian method was developed to test for parameter shifts. The posterior probability density functions of $\gamma_t = \beta_t - \beta_{t-1}$ and $\gamma'_t = \alpha_t - \alpha_{t-1}$ were derived based on diffuse priors. The results indicate that year-to-year shifts occurred not in the coefficient of the price variable but in that of the income variable. The shift in the income coefficient was estimated to have occurred in 1971.

Menzefricke (1981) studied the stock return data of Hsu (1979) and the industrial data of Pettit (1979) using a Bayesian procedure for detecting changes in precision at unknown times. For the stock return data, Menzefricke (1981) hypothesized the model

$$X_i \sim N(\mu_1, \tau_1^{-1}), \qquad i = 1, 2, \dots, m, \qquad (24)$$

and

$$X_i \sim N(\mu_2, \tau_2^{-1}), \qquad i = m+1, \dots, n. \qquad (25)$$

Based on a vague prior, the posterior probability function of $m$ was determined and its mode was found at $m = 89$. This is in close agreement with the results of Hsu (1979, 1982). Menzefricke (1981) then analyzed the industrial data giving the percentages of a particular material in 27 batches. These data were first analyzed by Pettit (1979) using a non-parametric method, and Pettit concluded that a change occurs in period 16. Menzefricke (1981) applied the Bayesian method and obtained results in close agreement with those of Pettit.

Detection and Estimation of Change Points Using Likelihood Methods

Esterby and El-Shaarawi (1981) analyzed the change in pollen concentration in a lake-sediment core by making inferences about the point at which the relationship between concentration and depth changes. The data were modelled by a regression of the form

$$Y_i = \sum_{j=0}^{p} \theta_{1j} x_i^{j} + e_{1i}, \qquad i = 1, \dots, m, \qquad (22)$$

and

$$Y_i = \sum_{j=0}^{q} \theta_{2j} x_i^{j} + e_{2i}, \qquad i = m+1, \dots, n, \qquad (23)$$

where $p$ and $q$ are unknown. The unknown parameters $p$, $q$ and $m$ are estimated by marginal, conditional and maximum likelihood methods, giving $\hat p = \hat q = 4$ and $\hat m = 12$. The analysis was carried out under the assumption that $\sigma_1^2 \ne \sigma_2^2$ for the two regimes, and the methods have been applied to other data sets.

Worsley (1983) analyzed the gross domestic product and the labour and capital inputs of the United States for the years 1929-1967. He modelled the logarithm of gross domestic product as a linear function of the logarithms of labour input and capital input. The log likelihood ratio statistics under the assumptions $\sigma_1 \ne \sigma_2$ and $\sigma_1 = \sigma_2$ were computed, and the distributions of their maxima were approximated using a Bonferroni inequality. The analysis indicated a significant change in 1942 and another in 1946.
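For fixed polynomial degrees p and q, the maximum likelihood estimate of m in (22)-(23) with unequal variances maximizes the profile log-likelihood -(m/2) log σ̂₁² - ((n-m)/2) log σ̂₂². The sketch below treats p and q as known, whereas the original analysis also estimated them. Each regime is required to keep a few residual degrees of freedom, because the likelihood is unbounded when a regime fits exactly. The function names and data are hypothetical.

```python
import numpy as np

def _poly_rss(x, y, degree):
    """Residual sum of squares of a polynomial least-squares fit."""
    X = np.vander(x, degree + 1)                  # columns x^degree, ..., x, 1
    coef, *_ = np.linalg.lstsq(X, y, rcond=None)
    return float(np.sum((y - X @ coef) ** 2))

def two_phase_mle(x, y, p, q):
    """ML estimate of m (size of the first regime) for known degrees p and q with
    unequal error variances. Returns (m_hat, profile log-likelihood by m)."""
    n = len(y)
    ll = {}
    for m in range(p + 3, n - q - 2):             # at least 2 residual d.f. per regime
        s1 = _poly_rss(x[:m], y[:m], p) / m       # ML variance, regime 1
        s2 = _poly_rss(x[m:], y[m:], q) / (n - m) # ML variance, regime 2
        ll[m] = -0.5 * m * np.log(s1) - 0.5 * (n - m) * np.log(s2)
    m_hat = max(ll, key=ll.get)
    return m_hat, ll

# Hypothetical data: two straight-line regimes with different error variances.
rng = np.random.default_rng(5)
x = np.linspace(0.0, 1.0, 40)
first = np.arange(40) < 15
y = np.where(first, 1.0 + 2.0 * x, 5.0 - 3.0 * x) + rng.normal(scale=np.where(first, 0.05, 0.2))
m_hat, ll = two_phase_mle(x, y, p=1, q=1)
print("estimated number of observations in the first regime:", m_hat)
```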

Commenges and Seal (1985) considered a key problem in neurophysiology: determining whether, after presentation of a stimulus, there has been a modification in the discharge of a recorded neuron. The problem was treated as the estimation of a change point in a sequence of random variables. The analysis involved a window method, applied systematically to look for changes corresponding to a decrease in the mean time interval between action potentials.

Detection of Regression Parameter Changes at Unknown Times Using Raw Regression Residuals

MacNeill (1985) considered the series of annual flows of the Nile river for the period 1870-1945 and analyzed it for unknown interventions. A detection procedure called Adaptive Forecasting and Estimation using Change-Detection (AFECD) was developed to detect interventions at unknown times in time series. Applied to the Nile series, AFECD detected a change in river flow in 1903, corresponding to the construction of the first dam at Aswan. The procedure also indicated a negative slope in river flow over the period leading up to the change and a flat slope thereafter. This has been interpreted as a positive effect of the dam on river flow. The AFECD procedure is based on a change detection statistic computed from the raw regression residuals.

A detailed derivation of this change detection statistic and of several others, along with their distributional results, can be found in the unpublished Ph.D. dissertation of Jandhyala (1985).

REFERENCES
Bacon, D.W., and Watts, D.G. (1971). Estimating the transition between two intersecting straight lines. Biometrika 58, 525-534.
Bagshaw, M., and Johnson, R.A. (1977). Sequential procedures for detecting parameter changes in a time series model. Journal of the American Statistical Association 72, 593-597.
Box, G.E.P., and Jenkins, G.M. (1970). Time Series Analysis: Forecasting and Control. San Francisco: Holden-Day.
Box, G.E.P., and Tiao, G.C. (1973). Bayesian Inference in Statistical Analysis. Reading, Mass.: Addison-Wesley.
Brown, R.L., Durbin, J., and Evans, J.M. (1975). Techniques for testing the constancy of regression relationships over time. Journal of the Royal Statistical Society, Series B 37, 149-192.
Commenges, D., and Seal, J. (1985). The analysis of neuronal discharge sequences: change point estimation and comparison of variances. Statistics in Medicine 4, 91-94.
Esterby, S.R., and El-Shaarawi, A.H. (1981). Inference about the point of change in a regression model. Applied Statistics 30, 277-285.

Griffiths, D.A., and Miller, A.J. (1973). Hyperbolic regression: a model based on two-phase piecewise linear regression with a smooth transition between regimes. Communications in Statistics 2, 561-569.
Hsu, D.A. (1979). Detecting shifts of parameter in gamma sequences with applications to stock price and air traffic flow analysis. Journal of the American Statistical Association 74, 31-40.
Hsu, D.A. (1982). A Bayesian robust detection of shift in the risk structure of stock market returns. Journal of the American Statistical Association 77, 29-39.
Jandhyala, V.K. (1985). Residual processes for regression models with applications to detection of parameter changes at unknown times. Unpublished Ph.D. thesis, The University of Western Ontario, London, Ontario.
MacNeill, I.B. (1985). Detecting unknown interventions with application to forecasting hydrological data. Water Resources Bulletin 21, 785-796.
Menzefricke, U. (1981). A Bayesian analysis of a change in the precision of a sequence of independent random variables at an unknown time point. Applied Statistics 30, 141-146.
Page, E.S. (1955). A test for a change in a parameter occurring at an unknown time point. Biometrika 42, 523-526.
Pettit, A.N. (1979). A non-parametric approach to the change point problem. Applied Statistics 28, 126-135.
Schweder, T. (1976). Some optimal methods to detect structural shifts or outliers in regression. Journal of the American Statistical Association 71, 491-501.
Sen, A.K., and Srivastava, M.S. (1975). Some one-sided tests for change in level. Technometrics 17, 61-64.
Smith, A.F.M., and Cook, D.G. (1980). Straight lines with a change point: a Bayesian analysis of some renal transplant data. Applied Statistics 29, 180-189.
Tsurumi, H. (1977). A Bayesian test of a parameter shift and an application. Journal of Econometrics 6, 371-380.
Worsley, K.J. (1983). Testing for a two-phase multiple regression. Technometrics 25, 35-42.

SPECTRAL ANALYSIS OF LONG-TERM WATER QUALITY RECORDS

PAUL H. WHITFIELD, INLAND WATERS DIRECTORATE, ENVIRONMENT CANADA, VANCOUVER, B.C.

ABSTRACT The Greater Vancouver Regional District (GVRD) provides, along with other services, drinking water to the communities of Greater Vancouver. The total supply of water comes from three sources: the Capilano, Seymour and Coquitlam Rivers. As part of an extensive quality control program, the GVRD measures water temperature daily and pH and turbidity three to four times each week. These records commence in 1959 and continue to the present. The records were reduced to weekly averages and examined in the frequency domain using spectral analysis. The frequency approach involves estimating how much of the variation in the data arises from various frequency bands. In analysing the data presented here, the application of spectral analysis as an aid to identifying significant frequency components is examined.

INTRODUCTION

One of the objectives for gathering water quality data is the detection of trends. For trend assessment programs to be effective, the manner in which data are collected (i.e. design) and the data analysis methods must be complementary. The ultimate operational goal in trend assessment monitoring is to be able to detect the smallest possible change or trend while minimizing the amount of data gathered.

This paper considers the practical application of spectral analysis techniques to long-term water quality records. These records are described subsequently. Some aspects and results of spectral analysis which are useful in trend assessment will be demonstrated.

There are basically two approaches to time series analysis: a frequency domain (or spectral) approach, and a time domain (Box-Jenkins) approach. Time series analysis methods are concerned with the following aims or goals: (1) description; (2) explanation; (3) prediction; and (4) control.

Spectral analysis is a method for translating from the time domain to the frequency domain and back again. The practical use of these transformations has been greatly improved by the fast Fourier transform and high-speed computers. The premise of spectral analysis is that a time series can be meaningfully represented by pure sine or cosine waves summed over a range of frequencies (Bloomfield, 1976). Spectral analysis uses the Fourier transform of the time series to obtain the coefficients of the sinusoids at a discrete set of frequencies. Grouping neighbouring frequencies smooths the spectrum and enhances the statistical stability of the estimates (Chatfield, 1984).

The spectral analysis of single variables can be extended to pairs of variables. One of the most important frequency domain quantities is the coherence of the two variables, which reflects the linear correlation between the variables in different frequency bands.

Spectral analysis has many useful applications to the analysis of time series data (Brillinger 1981a, 1981b; Chatfield, 1984; Chatfield and Pepper, 1971; Jones 1964, 1965, etc.). Several of these applications are particularly useful in the analysis of water quality data.
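The Fourier step and the grouping of neighbouring frequencies described above can be sketched in Python with NumPy. This is a minimal illustration under stated assumptions, not the BMDP program used later in the paper: the weekly series is synthetic, frequencies come out in cycles per week (multiplying by about 52.18 converts them to cycles per year), and the equally weighted band of 2m + 1 ordinates is only one simple way of grouping frequencies. The names `periodogram` and `band_average` are hypothetical helpers.

```python
import numpy as np

def periodogram(x):
    """Raw periodogram of an equally spaced series.

    Frequencies are in cycles per sampling interval (0 to 0.5). Ordinates
    are scaled so that they add up to the variance of the series, i.e. each
    is the share of the sum of squares carried by that frequency."""
    x = np.asarray(x, dtype=float)
    n = x.size
    coef = np.fft.rfft(x - x.mean())       # coefficients of the sinusoids
    power = np.abs(coef) ** 2 / n**2
    power[1:] *= 2                         # fold in the negative frequencies
    if n % 2 == 0:
        power[-1] /= 2                     # the Nyquist term has no mirror image
    return np.fft.rfftfreq(n), power

def band_average(power, m):
    """Average each ordinate with its m neighbours on either side
    (an equally weighted band of 2m+1 frequencies). Near the ends only
    the ordinates that exist are averaged."""
    kernel = np.ones(2 * m + 1)
    total = np.convolve(power, kernel, mode="same")
    count = np.convolve(np.ones_like(power), kernel, mode="same")
    return total / count

# Hypothetical weekly series: an annual cycle plus noise, 1400 weeks long.
rng = np.random.default_rng(1)
weeks = np.arange(1400)
x = 8.0 * np.sin(2 * np.pi * weeks / 52.18) + rng.normal(0.0, 1.0, weeks.size)

freq, p = periodogram(x)
smoothed = {m: band_average(p, m) for m in (0, 2, 9)}  # wider band, smoother spectrum
peak_cycles_per_year = freq[np.argmax(p)] * 52.18       # cycles/week -> cycles/year
print(round(peak_cycles_per_year, 2))
```

Because the ordinates sum to the variance, each can be read as the part of the total sum of squares found at that frequency. Increasing `m` widens the band (roughly (2m + 1)/n cycles per week here), trading detail for stability.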

The material which follows avoids the mathematical areas of spectral analysis and concentrates on the use of spectra for identifying pertinent features. The variables considered here reflect some of the features that are normally seen in water quality data records, and their spectra allow these features to be illustrated. These features include seasonality, trend, non-normal distribution and changing detection limits. The examples presented here are limited to simple aspects of spectral analysis, namely smoothed spectra, and how these can provide useful information in the analysis of time series.

DATA SERIES

Time series data from three rivers for three different variables are presented here (Figures 1-3). The three rivers are the Capilano, Seymour and Coquitlam Rivers. These rivers are located north of the City of Vancouver and provide its drinking water. Data for three variables, temperature, turbidity and pH, are gathered by the Greater Vancouver Regional District at the water intakes.

The raw data were gathered on a daily basis. Measurements of temperature were made each day over the period of record; measurements of turbidity and pH were made from three to five times each week. These measurements were converted to computer form using Lotus 1-2-3. The input daily measurements were then processed to weekly averages, and temperatures were converted from Fahrenheit to Celsius where needed. The records for the Capilano and Coquitlam Rivers start in January 1959, and the Seymour River record begins in January 1961. These long records, some 1400 weekly values in each, provide a nearly ideal set of records for evaluating spectral methods.

Temperature is a highly seasonal variable, with higher values dominating in summer and lower values in winter. In addition, temperature values are highly autocorrelated, with warmer values being followed by warmer values and similarly for cooler periods (Whitfield and Woods, 1984).
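The reduction of daily readings to weekly averages, with Fahrenheit values converted to Celsius, and the lag-one autocorrelation that summarizes the "warm follows warm" behaviour, might look like the sketch below. It assumes that weeks are consecutive 7-day blocks counted from the first day of record and that missing days are stored as NaN; the GVRD's actual week definition is not given here, and all function names are hypothetical.

```python
import numpy as np

def fahrenheit_to_celsius(temp_f):
    """Convert degrees Fahrenheit to degrees Celsius."""
    return (np.asarray(temp_f, dtype=float) - 32.0) * 5.0 / 9.0

def weekly_means(daily, days_per_week=7):
    """Average consecutive 7-day blocks, skipping missing days (NaN).

    Assumes day 0 starts a week; a trailing partial week is dropped.
    A week with no readings at all gives NaN (with a RuntimeWarning)."""
    daily = np.asarray(daily, dtype=float)
    n_weeks = daily.size // days_per_week
    blocks = daily[: n_weeks * days_per_week].reshape(n_weeks, days_per_week)
    return np.nanmean(blocks, axis=1)

def lag1_autocorrelation(x):
    """Sample autocorrelation between each value and the next one."""
    x = np.asarray(x, dtype=float)
    d = x - x.mean()
    return np.sum(d[1:] * d[:-1]) / np.sum(d * d)

# Hypothetical fortnight of daily temperatures in degrees F, and a week of
# pH readings taken on only four days (NaN = no measurement).
temps_c = fahrenheit_to_celsius([41, 42, 42, 43, 44, 44, 45, 46, 46, 47, 47, 48, 48, 49])
ph = [6.3, np.nan, 6.4, np.nan, 6.2, np.nan, 6.3]
print(weekly_means(temps_c), weekly_means(ph))
```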

[Figure 1. Temperature records for the Capilano, Seymour and Coquitlam Rivers (GVRD data), 1959-1985.]

[Figure 2. pH records for the Capilano, Seymour and Coquitlam Rivers (GVRD data), 1959-1985.]

[Figure 3. Turbidity records for the Capilano, Seymour and Coquitlam Rivers (GVRD data), 1959-1985.]

Figure 1 shows temperature records for the three rivers, with the seasonal variation mentioned previously. The Capilano and Coquitlam water supplies are both fed by relatively small reservoirs which buffer the excursion of temperatures. This is particularly evident when all temperatures in excess of 15°C are considered: water temperatures in the Seymour River exceed 15°C almost every year, while the other two rarely exceed this value.

The records of pH are shown in Figure 2; they do not show a pronounced seasonality. Coquitlam pH values are somewhat lower than either the Capilano or the Seymour. Of particular interest is the long-term pH drift that close visual inspection of Figure 2 suggests. Is this a real trend, or an artifact?

Turbidity is a highly episodic variable, with periods of high turbidity reflecting intense rainfall and other processes which introduce sediment into the rivers. Figure 3 demonstrates the highly episodic nature of turbidity. Coupled with the episodic nature of these records are two other features. First, the records show two changes in the detection limit of the analytical method. These are reflected in changes of the baseline in late 1965 and again at the end of 1969. Second is the occurrence of periods where there is a higher likelihood of high turbidity values being observed. The Capilano River shows regular turbidity peaks in the mid-winter period. Similar results occur for the Seymour River and for the Coquitlam River, but the mid-winter peaks occur with less regularity. Does this non-normal distribution of data influence the ability of spectral analysis to aid in the evaluation of these data?

DATA ANALYSIS

All statistical analyses presented were performed using the Univariate and Bivariate Spectral Analysis program (P1T) contained in the BMDP package of computer programs. This program provides graphic displays and descriptive statistics for single or paired time series (Dixon, 1983).

BANDWIDTH EFFECTS

One of the basic goals of spectral analysis is to determine how much of the variation in the data set is accounted for by different frequency bands of sine waves. The fast Fourier transform of a data series gives the Fourier coefficients, which precisely duplicate the input series; their squared magnitudes form the periodogram. Figure 4 shows one such periodogram for Capilano River water temperature. Detailed spectra such as this are difficult to interpret.

[Figure 4. Periodogram of Capilano River water temperature.]

Spectral densities can also be estimated for frequency bands. For wider bandwidths, the periodogram ordinates are averaged over a range of frequencies to form estimated spectral densities. This enhances the stability of the estimates, but gives less detail than narrower bandwidths; wider bandwidths thus provide a more reliable picture of the observed series. Figure 5 shows spectra of Capilano River temperatures for progressively wider bandwidths, namely 0.0005, 0.0018, 0.0064 and 0.0128, and shows the progressive smoothing with increased bandwidth. A bandwidth of 0.0128 was used in all other spectral estimates presented here. This width, while perhaps overly smoothed, shows the characteristics we wish to consider.

[Figure 5. Smoothed spectra of Capilano River temperatures for bandwidths of 0.0005, 0.0018, 0.0064 and 0.0128.]

ARITHMETIC OR LOGARITHMIC PLOTS?

One additional area which needs mentioning regarding spectral plots is the pros and cons of using logarithmic or linear (arithmetic) plots. The choice of plotting the estimated spectrum or its logarithm depends upon a number of considerations. The advantage of plotting the estimated spectrum on a logarithmic scale is that its variance is independent of the level of the spectrum, so that confidence intervals are of constant width. Logarithmic plots also have value where large variations in power exist, and allow more detail to be shown. The converse of these advantages is that logarithmic plots show exaggerated effects where variation is small. Interpreting a spectrum plotted on an arithmetic scale is straightforward, since the area under the curve corresponds to power; as a result, the relative importance of peaks is easier to assess (Chatfield, 1984). For the most part we will use log plots, since they show more of the detail. In some cases, particularly when series are adjusted, the arithmetic plots serve to emphasize the differences.
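The constant-width property of logarithmic plots can be checked numerically. The sketch below assumes the usual approximation that a band-averaged estimate built from 2m + 1 periodogram ordinates behaves like the true spectrum times a chi-square variable with ν = 2(2m + 1) degrees of freedom divided by ν, and it uses the Wilson-Hilferty approximation for the chi-square quantiles so that only NumPy is needed. The helper names are hypothetical.

```python
import numpy as np

Z_975 = 1.959964   # upper 2.5% point of the standard normal distribution

def chi2_quantile_wh(z, dof):
    """Wilson-Hilferty approximation to the chi-square quantile whose
    standard-normal counterpart is z."""
    c = 2.0 / (9.0 * dof)
    return dof * (1.0 - c + z * np.sqrt(c)) ** 3

def spectral_ci(s_hat, dof):
    """Approximate 95% confidence limits for smoothed spectral estimates,
    treating dof * s_hat / s as chi-square with `dof` degrees of freedom."""
    s_hat = np.asarray(s_hat, dtype=float)
    lower = dof * s_hat / chi2_quantile_wh(Z_975, dof)
    upper = dof * s_hat / chi2_quantile_wh(-Z_975, dof)
    return lower, upper

m = 9                                   # band of 2m+1 = 19 ordinates
dof = 2 * (2 * m + 1)                   # 38 degrees of freedom
s_hat = np.array([0.01, 1.0, 100.0])    # hypothetical estimates at three frequencies
lower, upper = spectral_ci(s_hat, dof)
print(np.log10(upper) - np.log10(lower))   # the same length at every level
print(upper - lower)                       # grows in proportion to the estimate
```

With ν = 38 the interval runs from about 0.67 to 1.66 times each estimate, so on a log scale every interval has the same length, while on an arithmetic scale its length grows in proportion to the estimate.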

SPECTRA OF A SEASONAL VARIABLE - TEMPERATURE

[Figure 6. Temperature spectra (logarithmic scale, bandwidth 0.0128) for the three rivers; horizontal axis in cycles per year.]

Logarithmic plots of the temperature spectra are given in Figure 6. These three spectra are nearly identical, showing most of the variance at low frequencies and dominated by the peak at a frequency of once per year. Secondary peaks exist at two and three cycles per year, but these are of much lower magnitude. With the exception of these peaks, the spectrum is that of a purely random series. The peaks at two and three per year appear to be harmonics of the principal frequency. High values at zero cycles per year indicate a high degree of autocorrelation within the series.
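Peaks at two and three cycles per year are what one would expect whenever the annual cycle is not a pure sine wave, and a synthetic example reproduces them. In this sketch the temperature curve is a hypothetical sine wave that is not allowed to fall below 4°C, and a year is simplified to exactly 52 weekly values so that each harmonic falls on a single Fourier frequency.

```python
import numpy as np

WEEKS_PER_YEAR = 52        # simplification: exactly 52 weekly values per year
n_years = 26
t = np.arange(WEEKS_PER_YEAR * n_years)

# Hypothetical water temperature: a sine wave that is not allowed to fall
# below 4 degC, so winter is flattened and the cycle is no longer sinusoidal.
temp = np.maximum(4.0, 8.0 + 8.0 * np.sin(2 * np.pi * t / WEEKS_PER_YEAR))

power = np.abs(np.fft.rfft(temp - temp.mean())) ** 2
cycles_per_year = np.fft.rfftfreq(t.size) * WEEKS_PER_YEAR  # from cycles per week

top = np.argsort(power)[::-1][:3]      # three largest ordinates
print(np.sort(cycles_per_year[top]))   # the annual frequency and two harmonics
```

Every ordinate that is not a whole number of cycles per year is essentially zero in this idealized case; in real records, noise and the 52.18-week year spread power into neighbouring frequencies.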

Also evident from Figure 6 is the fact that no cycles exist at frequencies greater than 10 per year. Cycles at those frequencies might well exist if man had a direct influence on the water (e.g. effluents or other use). Additional cycles would be likely if the sampling frequency were much greater (e.g. diurnal cycles when the sampling interval is in hours).

In sampling a time series, one of the main concerns is the sampling interval Δt. Any sampling leads to a loss of information, and the loss grows as Δt increases. However, since cost is proportional to 1/Δt, it becomes expensive to reduce Δt to a very small interval. In addition, as Δt becomes small the degree of redundancy in the data series increases (Whitfield, 1983). For the sampled series, the spectral estimates approach zero as frequency increases. This indicates that the current Δt is sufficiently small; if the estimates do not approach zero, then a smaller value of Δt needs to be chosen. While this is not critical when sampling from a continuous trace, it becomes important when gathering data. Sampling requires an initial choice of Δt which may prove to be too large or too small. In the former case the resulting data are difficult to analyze, and in the latter the data are unnecessarily expensive. If the value of Δt that is chosen is too large, aliasing may occur. An alias results when variation at frequencies greater than the Nyquist frequency, 1/(2Δt) (half the sampling rate), is folded back to a lower frequency, producing a spurious effect in the measured spectrum (Chatfield, 1984). Aliased peaks are not apparent in the temperature spectra.
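The folding described above, and the way averaging daily values over each week guards against it, can be illustrated with a short sketch. The 5-day and 7-day cycles are hypothetical, chosen only to make the arithmetic visible; the block average is the simplest equally weighted case of the filter-then-subsample remedy discussed under Application Problems, and it matches the reduction of daily readings to weekly averages used for these records. `alias_frequency` is a hypothetical helper name.

```python
import numpy as np

def alias_frequency(f, dt):
    """Frequency at which a sinusoid of true frequency f (cycles per unit
    time) appears when sampled every dt time units. Anything above the
    Nyquist frequency 1/(2*dt) is folded back into the range 0..1/(2*dt)."""
    fs = 1.0 / dt                      # sampling rate
    f_folded = f % fs                  # remove whole multiples of the sampling rate
    return min(f_folded, fs - f_folded)

# A 5-day cycle (0.2 cycles/day) read once a week appears at 0.4 cycles/week.
print(alias_frequency(1 / 5, 7.0) * 7)

# Hypothetical daily record: an annual cycle plus a 7-day cycle.
days = np.arange(7 * 520)                               # 520 whole weeks
annual = 5.0 * np.sin(2 * np.pi * days / 365.25)
weekly_cycle = 2.0 * np.sin(2 * np.pi * days / 7 + 0.8)
daily = annual + weekly_cycle

# (a) One reading per week: the 7-day cycle is caught at the same phase
#     every time, so it aliases to zero frequency (a constant offset).
point_samples = daily[::7]
# (b) Weekly means: averaging each full 7-day block removes the 7-day
#     cycle before the series is reduced to one value per week.
weekly_means = daily.reshape(-1, 7).mean(axis=1)

offset_a = point_samples - annual[::7]
offset_b = weekly_means - annual.reshape(-1, 7).mean(axis=1)
print(offset_a.mean(), np.abs(offset_b).max())
```

Reading the series once a week turns the 7-day cycle into a constant offset, so all of its power lands at zero frequency, whereas the weekly means remove it.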

[Figures 7-9. Spectra of the unadjusted and deseasonalized temperature series (bandwidth 0.0128) for the Capilano, Seymour and Coquitlam Rivers; horizontal axis in cycles per year.]

The computer program which is used to estimate the spectra allows a series to be deseasonalized. In deseasonalization, the average value for a period is removed from each individual observation for that period. In the present case all values are adjusted by the average of similar weeks (e.g. the average weekly value for the third week in July is subtracted from each occurrence of the third week in July). Spectra for the deseasonalized temperature series are shown with the spectra for the original data series in Figures 7-9. In each case, the spectra for the deseasonalized series lack both the seasonal peak and its harmonics (2 and 3 per year). These spectra closely approximate the spectra of random series which have the same mean and variance.
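The week-of-year adjustment can be sketched as follows. The sketch assumes that the series starts in the first week of a year and that every year has exactly 52 weekly values; a real calendar occasionally has a 53rd week, which would need separate handling. `deseasonalize` is a hypothetical name, not the BMDP option itself.

```python
import numpy as np

def deseasonalize(x, period=52):
    """Subtract from each observation the mean of all observations in the
    same position of the seasonal cycle (e.g. the same week of the year).

    Assumes observation 0 is week 1 of a year and every year has exactly
    `period` values; missing values (NaN) are ignored in the means."""
    x = np.asarray(x, dtype=float)
    week = np.arange(x.size) % period
    seasonal_means = np.array([np.nanmean(x[week == k]) for k in range(period)])
    return x - seasonal_means[week], seasonal_means

# Hypothetical 20-year weekly temperature record with noise.
rng = np.random.default_rng(2)
n = 52 * 20
x = 8.0 + 6.0 * np.sin(2 * np.pi * np.arange(n) / 52) + rng.normal(0.0, 1.0, n)
adjusted, means = deseasonalize(x)
print(x.std().round(2), adjusted.std().round(2))   # the seasonal swing is removed
```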

SPECTRA OF A TREND VARIABLE - pH

The spectra of the pH series for the three rivers are shown in Figure 10. The overall variance of these three series was quite small, and this is reflected in the low magnitude of the spectra in Figure 10. The spectra for the Capilano and Seymour Rivers indicate a seasonal component, while this feature is less evident for the Coquitlam River.

[Figure 10. pH spectra (bandwidth 0.0128) for the Capilano, Seymour and Coquitlam Rivers; horizontal axis in cycles per year.]

[Figures 11-13. Arithmetic plots of the unadjusted pH spectra and of the spectra adjusted for trend (bandwidth 0.0128).]

Dominating all three spectra is the strong peak at zero frequency. Such a peak usually indicates a trend in the series or a high degree of autocorrelation. Figures 11-13 show arithmetic plots of the original spectra and those for the detrended series. Detrending the series removes most of the variance at zero frequency. Detrending does not affect the seasonal peak in the Capilano River (Figure 11) or the Seymour River (Figure 12). Detrending requires fitting a linear trend to the data over the time period and subtracting this from the original series. Table 1 contains the pertinent statistics for each of the series.

TABLE 1
Detrending statistics for the pH series

Series      Version      Mean    Slope*   Max    Min
Capilano    original     6.320   0.190    6.90   5.90
            detrended    6.316   0.0      6.04   5.92
Seymour     original     6.322   0.182    7.00   5.80
            detrended    6.322   0.0      6.99   5.82
Coquitlam   original     6.164   0.275    6.75   5.75
            detrended    6.164   0.0      6.63   5.66

* Total decrease over the 25-year period (pH units).

As is evident from Table 1, pH has shown a decrease over the past 25 years on a linear basis in these three rivers. It might be better to try higher order fits to this relationship to more adequately describe the trend situation evident in Figure 2.

The remaining peak at zero frequency may be attributed to two sources. First, autocorrelation of the series may contribute some of the variance. Second, a higher order, or more complicated, trend may be the major source of variance remaining after linear detrending. Similar analysis of subseries of the original series, with the inclusion of higher order terms, might serve to eliminate more of the variance.
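A least-squares linear detrend of the kind summarized in Table 1 might be written as below. The function name is hypothetical, time is counted in weeks, and the residuals are shifted back to the original mean so that the adjusted series keeps its level, much as the detrended rows of Table 1 keep roughly the original means. The reported total change is negative for a decrease, whereas Table 1 lists the decrease as a positive number; the pH record in the example is synthetic.

```python
import numpy as np

def linear_detrend(x, weeks_per_year=52.18):
    """Remove a least-squares straight line from a weekly series.

    Returns the detrended series (shifted back to the original mean), the
    total change implied by the fitted line over the record, and the
    slope per year."""
    x = np.asarray(x, dtype=float)
    t = np.arange(x.size, dtype=float)           # time in weeks
    slope, intercept = np.polyfit(t, x, deg=1)   # highest power first
    residuals = x - (intercept + slope * t)
    total_change = slope * (x.size - 1)          # first week to last week
    return residuals + x.mean(), total_change, slope * weeks_per_year

# Hypothetical pH record: a slow decline plus noise over about 25 years.
rng = np.random.default_rng(4)
n = 1300
ph = 6.4 - 0.19 * np.arange(n) / (n - 1) + rng.normal(0.0, 0.15, n)
detrended, change, per_year = linear_detrend(ph)
print(round(change, 3), round(per_year, 4))
```

Replacing `deg=1` with a higher degree in `np.polyfit` would give the higher order fits suggested above, at the cost of a fitted curve that is harder to summarize as a single rate of change.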

The turbidity spectra for the three rivers are given as Figure 14. In only one case (Seymour) do the spectral estimates approach zero with increasing frequency. Such spectra indicate serious problems with the assumptions regarding these series. The original data series are highly episodic, and transforming such series often results in series which are approximately normal.

[Figure 14. Turbidity spectra (bandwidth 0.0128) for the Capilano, Seymour and Coquitlam Rivers.]

Spectra for the transformed and untransformed series are given for each of the rivers as Figures 15-17. In each case the spectral estimates for the transformed series approach zero as frequency increases. The spectra of the transformed series for the Seymour and Capilano Rivers show a trend component (zero frequency), a seasonal component (1 per year) and a harmonic of the seasonal component. The Coquitlam River series shows similar results, although the seasonal peak is less evident. These results agree with our observation of annual mid-winter high turbidity periods (seasonal features). The changes in detection limit described previously are the dominant trend component, and this is reflected in the figures as the peak at zero frequency.

[Figures 15-17. Spectra of the transformed and untransformed turbidity series (bandwidth 0.0128) for the three rivers.]

These series are plagued by the changes in detection limit. It is likely that most, if not all, of the trend component (i.e. at zero frequency) is due to the decreasing baseline. This situation is difficult to evaluate, since truncation (censoring) of the later data to the original baseline is less than satisfying. Neither is the converse (i.e. uncensoring of the earlier data) a very safe way to proceed.
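The effect of the detection-limit changes, and what censoring the later data to the original baseline would mean, can be shown on synthetic data. Everything here is hypothetical: the lognormal "turbidity", the limits of 1.0, 0.5 and 0.2, and the break points at weeks 350 and 570 (roughly late 1965 and the end of 1969 if week 0 is January 1959). The log transformation is only one common choice for episodic, skewed data; the transformation actually applied to these series is not specified here. The sketch illustrates the truncation the text calls less than satisfying; it is not a recommended treatment.

```python
import numpy as np

def censor_to_common_limit(values, limits):
    """Raise every reported value to the highest detection limit used
    anywhere in the record, so all periods share one baseline."""
    common = float(np.max(limits))
    return np.maximum(np.asarray(values, dtype=float), common), common

rng = np.random.default_rng(3)
n = 1400
weeks = np.arange(n)
true_turbidity = rng.lognormal(mean=0.0, sigma=1.0, size=n)   # hypothetical, episodic

# Hypothetical detection limits that drop twice during the record.
limits = np.where(weeks < 350, 1.0, np.where(weeks < 570, 0.5, 0.2))
reported = np.maximum(true_turbidity, limits)   # values below the limit reported at the limit

common_series, common = censor_to_common_limit(reported, limits)
log_reported = np.log10(reported)               # log scale tames the episodic peaks
log_common = np.log10(common_series)

for name, s in (("as reported", log_reported), ("common limit", log_common)):
    print(name, s[:350].mean().round(3), s[570:].mean().round(3))
```

In the reported series the mean log value in the early period sits above the later one purely because of the higher limit, which is the kind of step that shows up as power at zero frequency; after censoring to the common limit both periods share one baseline, at the cost of discarding everything measured below it.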

Censoring of time series data, such as that introduced by detection limits, makes time series analysis very difficult at the present time. As better methods are developed for estimating the statistics of censored data, perhaps the time series analysis of these types of series will become more practical.

CROSS SPECTRA - BIVARIATE PROCESSES

Often in water quality analyses one wishes to consider two series which are either similar or causally related. These two situations are the time series equivalents of correlation and regression of two series X(t) and Y(t). In the time domain, the natural tool for evaluating such relationships is the cross-correlation function; in the frequency domain the equivalent is the cross-spectrum. Usually three functions are plotted to describe the relationship between two series in the frequency domain: coherence, phase and gain, although other functions may be substituted (Chatfield, 1984).

The plot of coherence is a measure of the linear association between the two time series at various frequencies, and is analogous to the square of the correlation coefficient. The closer the coherence is to unity, the more closely related are the two series.

The plot of phase shows how the linear filter for fitting one series from the other shifts the phase of sine waves at different frequencies. This function may be interpreted as the angle between a frequency component of X(t) and the corresponding component of Y(t), i.e. the direction of the linear association. This is analogous to the sign of the correlation coefficient.

The plot of gain shows how much the filter for fitting Y(t) from X(t) amplifies or attenuates X(t) at each frequency. This is of the nature of the absolute value of a regression coefficient, and is analogous to a regression coefficient evaluated at each frequency; a gain that is constant with respect to frequency corresponds to a single ordinary regression coefficient.
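The three cross-spectral functions can be estimated from band-averaged auto- and cross-periodograms, as in the sketch below. This is a textbook construction under stated assumptions, not necessarily the estimator in BMDP: the band is an equally weighted average of 2m + 1 ordinates, frequencies are in cycles per sampling interval, and with the cross-periodogram taken as conj(X)·Y the phase is negative when Y(t) lags X(t). A phase φ at frequency f then corresponds to a lag of -φ/(2πf) sampling intervals. The example series and the function names are hypothetical.

```python
import numpy as np

def _band_average(z, m):
    """Equally weighted average over 2m+1 neighbouring ordinates."""
    kernel = np.ones(2 * m + 1)
    return (np.convolve(z, kernel, mode="same")
            / np.convolve(np.ones(z.size), kernel, mode="same"))

def cross_spectrum(x, y, m):
    """Coherence, phase and gain of the linear filter predicting y from x,
    from band-averaged auto- and cross-periodograms."""
    x = np.asarray(x, dtype=float)
    y = np.asarray(y, dtype=float)
    X = np.fft.rfft(x - x.mean())
    Y = np.fft.rfft(y - y.mean())
    fxx = _band_average(np.abs(X) ** 2, m)
    fyy = _band_average(np.abs(Y) ** 2, m)
    cxy = np.conj(X) * Y                          # complex cross-periodogram
    fxy = _band_average(cxy.real, m) + 1j * _band_average(cxy.imag, m)
    coherence = np.abs(fxy) ** 2 / (fxx * fyy)    # 0..1, like a squared correlation
    phase = np.angle(fxy)                         # radians; negative when y lags x
    gain = np.abs(fxy) / fxx                      # amplitude ratio of the filter
    return np.fft.rfftfreq(x.size), coherence, phase, gain

# Hypothetical weekly series: y is 0.8 times x delayed by two weeks.
rng = np.random.default_rng(7)
x = rng.normal(size=1024)
y = 0.8 * np.roll(x, 2)          # circular shift keeps the example exact

freq, coh, ph, gain = cross_spectrum(x, y, m=5)
band = (freq > 0.1) & (freq < 0.2)
lag_weeks = -ph[band] / (2 * np.pi * freq[band])   # phase converted to a time lag
print(coh[1:].min().round(3), gain[band].mean().round(3), lag_weeks.mean().round(2))
```

Coherence close to 1, gain close to 0.8 and a lag of about two weeks are recovered. On a logarithmic gain scale, a gain of 1 plots as zero.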

[Figures 18-20. Coherence, phase and gain (bandwidth 0.0128) between Capilano River and Seymour River temperatures; horizontal axis in cycles per year.]

Figure 18 is a plot of the coherence between Capilano and Seymour River water temperatures. The coherence between these two series is high, particularly around the seasonal peak, and generally low at other frequencies. The plot of phase (Figure 19) shows that the phase of the linear filter for fitting Seymour temperatures from Capilano temperatures is near zero at all frequencies. This indicates that the two series are synchronous. The plot of gain of the same filter (Figure 20) is constant at most frequencies, with a mean of zero (the log of 1). This, coupled with the zero phase shift, suggests that the association between the Capilano and Seymour River temperature series is almost immediate; in terms of the underlying sampling frequency, this is within one week. Such a result seems reasonable, since both series are the result of the same influences (sources of heat and moisture).

APPLICATION PROBLEMS

In undertaking this study I found a number of areas where the next step was less than intuitively obvious. It is difficult to make the transition between time domain and frequency domain concepts. One becomes aware that one is constantly expressing oneself in analogues from time domain analysis. Such is always the case when one ventures from the known to the unknown. There are a couple of specific areas that I feel are worth mentioning, namely choosing a band width and aliasing.

Choosing a Band Width

The selection of an appropriate band width for smoothing the spectra is critical. Too narrow a band width results in a large number of peaks which may or may not be relevant to evaluating the spectrum. On the other hand, too broad a band width results in an over-smoothed spectrum which might mask relevant cycles. The applications software that I used provides the exact spectrum and three smoothed bandwidths, as described previously. All three standard band widths were too narrow for the data series that I was examining. A broader band width (0.0128) was effective in showing the frequencies of primary concern.

The problem which arises then is selecting a band width which might serve for many potential variables and for records of varying lengths. Since the choice of a band width is similar to choosing a class interval when constructing a histogram, it is difficult to recommend a single band width. It will remain a trial and error process for each series. The general applicability of the method is restricted because of this, since "automatic" searching of spectra cannot include trial and error procedures.

Aliasing

The equal spacing in time of observations may introduce aliasing. This occurs when oscillations exist at frequencies greater than the Nyquist frequency, which is half the sampling frequency. Aliased peaks introduced into spectra can be misleading if they are not recognized as aliases.

Two problems that can arise are the attribution of some mechanism to an aliased peak, and the masking of a relevant peak by an aliased peak.

There are two opportunities for dealing with the aliasing problem. One option is to choose a sampling interval which is small enough that all significant frequencies are adequately sampled. This option may be fine for variables which can be measured electronically, but prohibitively expensive for other variables. The other option is to collect data at a given sampling rate. This data series is then filtered to remove the unwanted power at higher frequencies, and the filtered series is then sampled at a new interval which is an integer multiple of the original interval. One possible filtering mechanism would be a weighted average, of which numerous examples exist in the literature.

APPLICATION ADVANTAGES

Spectral analysis techniques are used to look for cyclical patterns or periodicities in data. The Fourier transform is used to decompose the data series into a sum of sine and cosine waves. The spectra produced represent a sum of squares (as in ANOVA) for each of the various frequency components. Smoothing of the spectral estimates gives a spectrum which can be usefully applied to the description of the input series. This methodology will be extremely useful in the identification of periodic and trend components in water quality time series. Like all methods there are some practical problems; however, these would appear to be minor. Spectra for water quality time series, with appropriate smoothing, can be used to identify periodic components of practical significance. With the increasing availability of spectral methods on microcomputers, and the general decrease in computing costs, the application of spectral analysis is becoming more practical. Although mathematically complex, the application of the available software is straightforward, and the results are of practical use.

CONCLUSIONS

This paper has shown a number of applications of spectral analysis to the description and analysis of water quality time series data. In each case, the spectra obtained for the series presented confirmed observable features of the original time series plot. Applications of spectral analysis to water quality series will allow significant frequencies to be identified and evaluated, a significant aid in trend assessment. Spectral analysis is a useful tool in the analysis of water quality time series. This methodology complements time domain methods such as those of Box and Jenkins (1976). Brillinger (1981a) suggests that time and frequency domain methods, as well as their hybrids, are allies rather than competitors.

ACKNOWLEDGEMENTS

I would like to thank Bob Jones and the Greater Vancouver Regional District for having provided us with the extensive data series presented here. In 1981, I attended the Time Series Methods in Hydrosciences Conference in Burlington, Ontario, where I had the opportunity to be introduced to spectral analysis by David Brillinger. The material he presented, and our subsequent discussions, led to the application presented here.

LITERATURE CITED

Bloomfield, P., 1976. Fourier Analysis of Time Series: An Introduction. John Wiley. 258 p.
Box, G.E.P. and G.M. Jenkins, 1976. Time Series Analysis: Forecasting and Control. Holden-Day. 575 p.
Brillinger, D.R., 1981a. Some contrasting examples of the time and frequency domain approaches to time series analysis. In: Time Series Methods in Hydrosciences (A.H. El-Shaarawi and S.R. Esterby, eds.), 1-15.
Brillinger, D.R., 1981b. Time Series: Data Analysis and Theory. Holden-Day. 540 p.
Chatfield, C., 1984. The Analysis of Time Series: An Introduction. Chapman and Hall. 286 p.
Chatfield, C. and M.P.G. Pepper, 1971. Time-series analysis: An example from geophysical data. Applied Statistics 20:217-238.
Dixon, W.J., 1983. BMDP Statistical Software. University of California Press.
Jones, R.H., 1964. Spectral analysis and linear prediction of meteorological time series. J. Applied Meteorology 3:45-52.
Jones, R.H., 1965. A reappraisal of the periodogram in spectral analysis. Technometrics 7:531-542.
Whitfield, P.H., 1983. Evaluation of water quality sampling locations on the Yukon River between Dawson, Yukon Territory and Eagle, Alaska. Water Resources Bulletin 19:115-121.
Whitfield, P.H. and P.F. Woods, 1984. Intervention analysis of water quality records. Water Resources Bulletin 20:657-668.
Zimerman, M., 1981. A beginner's guide to spectral analysis. Part 1. Byte, February 1981: 68-90.
Zimerman, M., 1981. A beginner's guide to spectral analysis. Part 2. Byte, March 1981: 166-198.


BAYES ESTIMATION OF PARAMETERS OF FIRST ORDER AUTOREGRESSIVE PROCESS

M. S. Abu-Salih
Department of Statistics
Yarmouk University, Irbid, JORDAN

A. A. Abd-Alla
Department of Statistics
King Saud University, Riyadh, SAUDI ARABIA

1. INTRODUCTION

Consider the stationary autoregressive process of order one, AR(1), with zero mean,

$$y_t = \theta y_{t-1} + \varepsilon_t, \qquad t = \ldots, -2, -1, 0, 1, 2, \ldots \qquad (1.1)$$

where $\{\varepsilon_t\}$ are independent normal $(0, \sigma^2)$ random variables and $|\theta| < 1$. This is often a stationary representation of the error time series in economic models. Several methods of estimation, such as Yule-Walker, maximum likelihood, conditional maximum likelihood (CML), and least squares, were used by Box and Jenkins (1970), Fuller (1976), Hasza (1980) and others to estimate the unknown parameters $\theta$ and $\sigma^2$. Box and Jenkins (1970) obtained the Bayes' estimator for $\theta$ assuming non-informative priors on $\theta$ and $\sigma^2$. Abd-Alla and Abouammoh (1982) calculated Bayes' estimates of $\theta$ using numerical integration methods, assuming uniform and normal priors for $\theta$. In this paper, we obtain the Bayes' estimators for $\theta$ and $\sigma^2$ under informative and non-informative priors, namely, the joint priors (i), (ii), and (iii). The estimators are compared with the CML and Box and Jenkins' estimators through simulation.
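A minimal NumPy sketch of model (1.1), with two of the classical estimators mentioned above, may help fix notation. It assumes a zero-mean series and draws $y_1$ from the stationary distribution $N(0, \sigma^2/(1-\theta^2))$. The function names are illustrative, and this is not the authors' code.

```python
import numpy as np

def simulate_ar1(n, theta, sigma2, rng):
    """Simulate y_t = theta * y_{t-1} + e_t, e_t ~ N(0, sigma2), as in (1.1).

    y_1 is drawn from the stationary distribution N(0, sigma2 / (1 - theta^2)),
    so no burn-in period is needed.
    """
    if not abs(theta) < 1:
        raise ValueError("stationarity requires |theta| < 1")
    y = np.empty(n)
    y[0] = rng.normal(0.0, np.sqrt(sigma2 / (1.0 - theta**2)))
    e = rng.normal(0.0, np.sqrt(sigma2), n)
    for t in range(1, n):
        y[t] = theta * y[t - 1] + e[t]
    return y

def yule_walker_theta(y):
    """Yule-Walker estimate for a zero-mean AR(1): the lag-1 sample autocorrelation."""
    return np.sum(y[1:] * y[:-1]) / np.sum(y ** 2)

def cml_estimates(y):
    """Conditional (on y_1) ML estimates of theta and sigma^2 for a zero-mean AR(1).

    theta is the least-squares slope of y_t on y_{t-1}; sigma^2 is the residual
    sum of squares divided by the number of residuals, n - 1.
    """
    theta = np.sum(y[1:] * y[:-1]) / np.sum(y[:-1] ** 2)
    resid = y[1:] - theta * y[:-1]
    return theta, np.mean(resid ** 2)

rng = np.random.default_rng(0)
y = simulate_ar1(200, theta=0.5, sigma2=1.0, rng=rng)
print("Yule-Walker theta:", round(yule_walker_theta(y), 3))
print("CML (theta, sigma^2):", cml_estimates(y))
```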

2. BAYES ESTIMATORS

Consider the time series given by (1.1), where $\theta$ and $\sigma^2$ are assumed to be random variables with the joint prior given in (i). This prior assumes that $\sigma^2$ and $\theta$ are independent, with an inverted gamma prior on $\sigma^2$ and a probability density function (pdf) prior on $\theta$ given by $1/[\pi(1-\theta^2)^{1/2}]$ for $-1 < \theta < 1$. We feel it is necessary to truncate the prior on $\theta$ because the series is stationary. Prior (i) reduces to Box and Jenkins' prior if $a \to 0$, $d = 1$, and the range of $\theta$ is $(-\infty, \infty)$.

Let $Y_1, Y_2, \ldots, Y_n$ be a random sample from (1.1), and let $\mathbf{y} = (y_1, y_2, \ldots, y_n)$ be a realization of such a random sample. The likelihood function of $\mathbf{y}$ is given by:

The posterior joint pdf of $\theta$ and $\sigma^2$ is given by: (2.2)

where

$$R = \sum_{i=2}^{n} y_i y_{i-1} \Big/ \sum_{i=3}^{n} y_{i-1}^2 \qquad \text{and} \qquad A = \Big(2a + \sum_{i=1}^{n} y_i^2\Big) \Big/ \sum_{i=3}^{n} y_{i-1}^2 - R^2.$$

If $m$ is an integer, we use standard integration formulas and get:

(2.3)

In seeking an estimate of $\sigma^2$ from a Bayesian point of view, we consider a random sample $Y_1, Y_2, \ldots, Y_n$ and a decision function $w(\mathbf{y})$, where $\mathbf{y} = (y_1, y_2, \ldots, y_n)$ denotes a realization of the sample. We assume the squared-error loss function

$$L\big(w(\mathbf{y}), \sigma^2\big) = \big[w(\mathbf{y}) - \sigma^2\big]^2. \qquad (2.4)$$

The Bayes' estimator of $\sigma^2$ with respect to the squared-error loss (2.4) is the posterior mean of $\sigma^2$, given by

$$\hat\sigma_1^2 = E[\sigma^2 \mid \mathbf{y}].$$

Using (2.2) and (2.3), we get: (2.5)

Simplifying (2.5), we get:

$$\hat\sigma_1^2 = \frac{A \sum_{i=3}^{n} y_{i-1}^2}{n+2d-3}\left[1 - \frac{R(m)}{\sum_{j=1}^{m} R(j) + 2AK}\right]. \qquad (2.6)$$

We get $R(m) \to 0$ as $m \to \infty$. Substituting for $A$, we get: (2.7)

The Bayes' estimator of $\theta$ with respect to squared-error loss is the posterior mean $E[\theta \mid \mathbf{y}]$. Carrying out the integration and simplifying, we get:

Since $1/[A+(1+R)^2]^m$ and $1/[A+(1-R)^2]^m$ are negligible compared with $1/A^m$ for large $m$, and by Stirling's formula, we have:

Then,

$$\hat\theta_1 = R = \sum_{i=2}^{n} y_i y_{i-1} \Big/ \sum_{i=3}^{n} y_{i-1}^2. \qquad (2.8)$$

We notice that for large $n$, the Bayes' estimator of $\theta$ under prior (i) is the same as that of Box and Jenkins (1970). Next, we consider the joint prior pdf (ii). To get the posterior pdf of $\theta$ and $\sigma^2$, the computation is greatly simplified if $Y_1$ is treated as fixed and the conditional likelihood is used. This idea is justified and used by Fuller (1976) to obtain the CML estimators of $\theta$ and $\sigma^2$.
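Read (2.8) as $\hat\theta_1 = R = \sum_{i=2}^{n} y_i y_{i-1} / \sum_{i=3}^{n} y_{i-1}^2$, and take Fuller's conditional ML estimator to be $\sum_{i=2}^{n} y_i y_{i-1} / \sum_{i=2}^{n} y_{i-1}^2$. The two estimators then differ only in whether $y_1^2$ enters the denominator. The sketch below makes the index bookkeeping explicit for 0-based arrays. It assumes NumPy, and the function names are hypothetical.

```python
import numpy as np

def theta_hat_r(y):
    """R of (2.8): sum_{i=2}^n y_i y_{i-1} / sum_{i=3}^n y_{i-1}^2.

    With 0-based arrays, y[1:] * y[:-1] holds the products for i = 2..n,
    and y[1:-1] holds y_2, ..., y_{n-1}, i.e. y_{i-1} for i = 3..n.
    """
    y = np.asarray(y, dtype=float)
    return np.sum(y[1:] * y[:-1]) / np.sum(y[1:-1] ** 2)

def theta_hat_cml(y):
    """Conditional ML estimator: sum_{i=2}^n y_i y_{i-1} / sum_{i=2}^n y_{i-1}^2.

    Here y[:-1] holds y_1, ..., y_{n-1}, so y_1^2 is included in the denominator.
    """
    y = np.asarray(y, dtype=float)
    return np.sum(y[1:] * y[:-1]) / np.sum(y[:-1] ** 2)

y = [1.0, 2.0, 3.0]
print(theta_hat_r(y), theta_hat_cml(y))   # 2.0 1.6
```

For a long stationary series, $y_1^2$ is a negligible part of either denominator. This is consistent with the close agreement among the estimates of $\theta$ reported in Section 3.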

The conditional likelihood function of $\mathbf{y} = (y_2, \ldots, y_n)$ given $y_1$ is:

The posterior joint pdf of $\theta$ and $\sigma^2$ is given by: (2.10)

Carrying out the integration as in (2.3), we get: (2.11)

whenever $(n+2d-1)/2$ is an integer, and where $m = (n+2d-3)/2$,

$$L_1 = \frac{1}{b}\left[\tan^{-1}\Big(\frac{1-r}{b}\Big) + \tan^{-1}\Big(\frac{1+r}{b}\Big)\right], \qquad r = \sum_{i=2}^{n} y_i y_{i-1} \Big/ \sum_{i=2}^{n} y_{i-1}^2,$$

$$b = \left[\Big(\sum_{i=2}^{n} y_i^2 + 2a\Big) \Big/ \sum_{i=2}^{n} y_{i-1}^2 - r^2\right]^{1/2}.$$

The Bayes' estimator of $\sigma^2$ is: (2.12)

The Bayes' estimator of $\theta$ is:

Lastly, we consider the conjugate prior of $\theta$ and $\sigma^2$ given in (iii). The posterior pdf of $\theta$ and $\sigma^2$ is given by:

Carrying out the integration over $\theta$ and then over $\sigma^2$, we get:

where

$$D = \sum_{i=2}^{n} y_i^2 - \Big(1 + \sum_{i=2}^{n} y_{i-1}^2\Big)^{-1}\Big(\sum_{i=2}^{n} y_i y_{i-1}\Big)^2 \ldots$$

The Bayes' estimator of $\theta$ is: (2.15)

Similarly, the Bayes' estimator of $\sigma^2$ is:

$$\hat\sigma_3^2 = \frac{1}{n+2d-3}\left[\sum_{i=2}^{n} y_i^2 - \Big(\sum_{i=2}^{n} y_i y_{i-1}\Big)^2 \Big(1 + \sum_{i=2}^{n} y_{i-1}^2\Big)^{-1} + 2a\right]. \qquad (2.16)$$

3. RESULTS

To compare the performance of the different estimates used in this paper, we generated samples of sizes 50, 100, 150, and 200 for each of the twenty-four $(\theta, \sigma^2)$ combinations of parameter values, namely, $\theta = -0.7, -0.4, -0.1, 0.2, 0.5, 0.8$ and $\sigma^2 = 0.5, 1.0, 1.5, 2.0$. The values of $d$ and $a$ were estimated empirically by the method of moments. For each combination of parameter values and sample size, one hundred samples were generated. For each sample, estimates of $\theta$ and $\sigma^2$ were calculated using the methods derived in this paper, and the mean and mean-square error (MSE) of each estimate were recorded. Tables 1 to 4 give the values of $\hat\theta_i$, $i = 1, 2, 3$, and $\hat\sigma_j^2$, $j = 1, 2, 3, 4$, for the twenty-four combinations mentioned. Since $\hat\theta_1$ is the same as the estimator used by Box and Jenkins, as shown in (2.8), we did not include the latter in the comparison. The estimate $\hat\sigma_4^2$ is Box and Jenkins' estimate of $\sigma^2$; it was included in the tables for the sake of comparison with our estimates.

The estimates of $\theta$ obtained by the different methods (including Box and Jenkins') are very close to one another and very close to the assumed value. The same may be said about the estimates of $\sigma^2$, although the larger the sample size, the closer the results are to the assumed value. The CML estimator of $\theta$ is (cf. Fuller, 1976)

$$\hat\theta_{CL} = \Big(\sum_{i=2}^{n} y_i y_{i-1}\Big)\Big(\sum_{i=2}^{n} y_{i-1}^2\Big)^{-1}.$$

We notice that $\hat\theta_{CL}$ is very close to $\hat\theta_3$ as given in equation (2.15), and therefore $\hat\theta_{CL}$ is not included in the tables.
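The simulation design described above can be sketched as follows, assuming NumPy. Only two estimators are included: $R$ of (2.8) and the conditional ML estimator $\hat\theta_{CL}$. The Bayes estimators under priors (ii) and (iii), and the $\sigma^2$ estimators, would be added to the loop in the same way. The seed, the stationary starting value, and all names are assumptions, so the numbers will not reproduce Tables 1 to 4.

```python
import itertools
import numpy as np

# Design described in Section 3 (random numbers are not the paper's).
THETAS = [-0.7, -0.4, -0.1, 0.2, 0.5, 0.8]
SIGMA2S = [0.5, 1.0, 1.5, 2.0]
SIZES = [50, 100, 150, 200]
REPS = 100  # samples generated per (theta, sigma^2, n) combination

def simulate_ar1(n, theta, sigma2, rng):
    """Stationary zero-mean AR(1): y_1 ~ N(0, sigma2/(1 - theta^2)), then model (1.1)."""
    y = np.empty(n)
    y[0] = rng.normal(0.0, np.sqrt(sigma2 / (1.0 - theta**2)))
    e = rng.normal(0.0, np.sqrt(sigma2), n)
    for t in range(1, n):
        y[t] = theta * y[t - 1] + e[t]
    return y

def theta_r(y):
    """(2.8): sum_{i=2}^n y_i y_{i-1} / sum_{i=3}^n y_{i-1}^2."""
    return np.sum(y[1:] * y[:-1]) / np.sum(y[1:-1] ** 2)

def theta_cml(y):
    """Conditional ML: sum_{i=2}^n y_i y_{i-1} / sum_{i=2}^n y_{i-1}^2."""
    return np.sum(y[1:] * y[:-1]) / np.sum(y[:-1] ** 2)

def mean_and_mse(estimates, true_value):
    """Mean of the estimates and their mean squared deviation from the true value."""
    estimates = np.asarray(estimates)
    return estimates.mean(), np.mean((estimates - true_value) ** 2)

rng = np.random.default_rng(1985)
results = {}
for theta, sigma2, n in itertools.product(THETAS, SIGMA2S, SIZES):
    samples = [simulate_ar1(n, theta, sigma2, rng) for _ in range(REPS)]
    results[(theta, sigma2, n)] = {
        "R": mean_and_mse([theta_r(y) for y in samples], theta),
        "CML": mean_and_mse([theta_cml(y) for y in samples], theta),
    }

mean_r, mse_r = results[(0.5, 1.0, 200)]["R"]
print(f"theta=0.5, sigma^2=1, n=200: mean={mean_r:.3f}, MSE={mse_r:.4f}")
```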

TABLE 1. Means and mean-square errors (M.S.E.) of the estimates of θ for σ² = 0.5

                  Mean                        M.S.E.
  θ      n     θ̂1      θ̂2      θ̂3        θ̂1     θ̂2     θ̂3
 -0.7    50   -0.686  -0.677  -0.663     0.110  0.109  0.110
 -0.7   100   -0.695  -0.691  -0.683     0.081  0.081  0.082
 -0.7   150   -0.686  -0.684  -0.679     0.063  0.063  0.063
 -0.7   200   -0.694  -0.692  -0.689     0.052  0.052  0.052
 -0.4    50   -0.386  -0.381  -0.369     0.133  0.130  0.127
 -0.4   100   -0.398  -0.395  -0.388     0.101  0.099  0.098
 -0.4   150   -0.398  -0.396  -0.392     0.073  0.072  0.072
 -0.4   200   -0.402  -0.401  -0.397     0.066  0.067  0.066
 -0.1    50   -0.090  -0.088  -0.085     0.151  0.149  0.143
 -0.1   100   -0.097  -0.096  -0.094     0.094  0.093  0.092
 -0.1   150   -0.097  -0.096  -0.095     0.084  0.084  0.083
 -0.1   200   -0.096  -0.096  -0.095     0.063  0.063  0.062
  0.2    50    0.199   0.196   0.188     0.125  0.124  0.119
  0.2   100    0.201   0.199   0.195     0.093  0.092  0.091
  0.2   150    0.200   0.199   0.196     0.077  0.076  0.075
  0.2   200    0.201   0.200   0.198     0.070  0.070  0.069
  0.5    50    0.474   0.466   0.451     0.133  0.130  0.128
  0.5   100    0.490   0.487   0.479     0.084  0.084  0.083
  0.5   150    0.486   0.484   0.479     0.070  0.070  0.069
  0.5   200    0.506   0.504   0.500     0.058  0.057  0.057
  0.8    50    0.788   0.781   0.768     0.096  0.096  0.098
  0.8   100    0.788   0.784   0.778     0.063  0.064  0.064
  0.8   150    0.790   0.787   0.783     0.054  0.054  0.054
  0.8   200    0.794   0.793   0.790     0.042  0.043  0.043

The σ̂1² to σ̂4² columns of this table could not be recovered from the source and are omitted.

TABLE 2. Means and mean-square errors (M.S.E.) of the estimates of θ for σ² = 1.0

                  Mean                        M.S.E.
  θ      n     θ̂1      θ̂2      θ̂3        θ̂1     θ̂2     θ̂3
 -0.7    50   -0.686  -0.677  -0.670     0.110  0.109  0.110
 -0.7   100   -0.695  -0.691  -0.687     0.081  0.081  0.081
 -0.7   150   -0.686  -0.684  -0.682     0.063  0.063  0.063
 -0.7   200   -0.694  -0.692  -0.690     0.052  0.052  0.052
 -0.4    50   -0.386  -0.381  -0.375     0.133  0.130  0.128
 -0.4   100   -0.398  -0.395  -0.392     0.101  0.099  0.099
 -0.4   150   -0.398  -0.396  -0.394     0.073  0.072  0.072
 -0.4   200   -0.402  -0.401  -0.399     0.066  n.r.   0.066
 -0.1    50   -0.090  -0.088  -0.087     0.151  0.149  0.146
 -0.1   100   -0.097  -0.096  -0.095     0.094  0.093  0.093
 -0.1   150   -0.097  -0.096  -0.096     0.084  0.084  0.083
 -0.1   200   -0.096  -0.096  -0.095     0.063  0.063  0.063
  0.2    50    0.199   0.196   0.192     0.125  0.124  0.122
  0.2   100    0.201   0.199   0.197     0.093  0.092  0.092
  0.2   150    0.200   0.199   0.197     n.r.   n.r.   n.r.
  0.2   200    0.201   0.200   0.199     0.070  0.070  0.070
  0.5    50    0.474   0.466   0.458     0.133  0.130  0.129
  0.5   100    0.490   0.487   0.483     0.084  0.084  0.084
  0.5   150    0.486   0.484   0.481     0.070  0.070  0.069
  0.5   200    0.506   0.504   0.502     0.058  0.057  0.057
  0.8    50    0.788   0.781   0.774     0.096  0.096  0.097
  0.8   100    0.788   0.784   0.781     0.063  0.064  0.064
  0.8   150    0.790   0.787   0.785     0.054  0.054  0.054
  0.8   200    0.794   0.793   0.792     0.042  0.042  0.042

n.r. = not recoverable from the source. The σ̂1² to σ̂4² columns of this table could not be recovered and are omitted.

TABLE 3. Means and mean-square errors (M.S.E.) of the estimates of θ for σ² = 1.5

                  Mean                        M.S.E.
  θ      n     θ̂1      θ̂2      θ̂3        θ̂1     θ̂2     θ̂3
 -0.7    50   -0.686  -0.677  -0.672     0.110  0.109  0.109
 -0.7   100   -0.695  -0.691  -0.688     0.081  0.081  0.081
 -0.7   150   -0.686  -0.684  -0.682     0.063  0.063  0.063
 -0.7   200   -0.694  -0.692  -0.691     0.052  0.052  0.052
 -0.4    50   -0.386  -0.381  -0.377     0.133  0.130  0.129
 -0.4   100   -0.398  -0.395  -0.393     0.101  0.099  0.099
 -0.4   150   -0.398  -0.396  -0.395     0.073  0.072  n.r.
 -0.4   200   -0.402  -0.401  -0.399     0.066  0.067  0.066
 -0.1    50   -0.090  -0.088  -0.087     0.151  n.r.   0.147
 -0.1   100   -0.097  -0.096  -0.095     0.094  0.093  0.093
 -0.1   150   -0.097  -0.096  -0.096     0.084  0.084  0.083
 -0.1   200   -0.096  -0.096  -0.095     0.063  0.063  0.063
  0.2    50    0.199   0.196   0.193     0.125  0.124  0.122
  0.2   100    0.201   0.199   0.198     0.093  0.092  0.092
  0.2   150    0.200   0.199   0.198     0.077  0.076  0.076
  0.2   200    0.201   0.200   0.200     0.070  0.070  0.070
  0.5    50    0.474   0.466   0.461     0.133  0.130  0.129
  0.5   100    0.490   0.487   n.r.      0.084  0.084  0.084
  0.5   150    0.486   0.484   0.482     0.070  0.070  0.070
  0.5   200    0.506   0.504   0.503     0.058  0.057  0.057
  0.8    50    0.788   0.781   0.776     0.096  0.096  0.097
  0.8   100    0.788   0.784   0.782     0.063  0.064  0.064
  0.8   150    0.790   0.787   0.786     0.054  0.054  0.054
  0.8   200    0.794   0.793   0.792     0.042  0.042  0.042

n.r. = not recoverable from the source. The σ̂1² to σ̂4² columns of this table could not be recovered and are omitted.

TABLE 4. Means and mean-square errors (M.S.E.) of the estimates of θ for σ² = 2.0

                  Mean                        M.S.E.
  θ      n     θ̂1      θ̂2      θ̂3        θ̂1     θ̂2     θ̂3
 -0.7    50   -0.686  -0.677  -0.674     0.110  0.109  0.109
 -0.7   100   -0.695  -0.691  -0.689     0.081  0.081  0.081
 -0.7   150    n.r.    n.r.   -0.683     0.063  0.063  0.063
 -0.7   200   -0.694  -0.692  -0.691     0.052  0.052  0.052
 -0.4    50   -0.386  -0.381  -0.378     0.133  0.130  0.129
 -0.4   100   -0.398  -0.395  -0.393     0.101  0.099  0.099
 -0.4   150   -0.398  -0.396  -0.395     0.073  0.072  0.072
 -0.4   200   -0.402  -0.401  -0.400     0.066  0.067  0.066
 -0.1    50   -0.090  -0.088  -0.088     0.151  0.149  0.147
 -0.1   100   -0.097   n.r.   -0.095     0.094  0.093  0.093
 -0.1   150   -0.097  -0.096  -0.096     0.084  0.084  0.084
 -0.1   200   -0.096  -0.096  -0.095     0.063  0.063  0.063
  0.2    50    0.199   0.196   0.194     0.125  0.124  0.123
  0.2   100    0.201   n.r.    0.198     0.093  0.092  0.092
  0.2   150    0.200   0.199   0.198     0.077  0.076  0.076
  0.2   200    0.201   0.200   0.200     0.070  0.070  0.070
  0.5    50    0.474   0.466   0.462     0.133  0.130  0.129
  0.5   100    0.490   0.487   0.485     0.084  n.r.   0.084
  0.5   150    0.486   0.484   0.483     0.070  0.070  0.070
  0.5   200    0.506   0.504   0.503     0.058  0.057  0.057
  0.8    50    0.788   0.781   0.777     n.r.   0.096  0.097
  0.8   100    0.788   n.r.    0.783     0.063  0.064  0.064
  0.8   150    0.790   0.787   0.786     0.054  0.054  0.054
  0.8   200    0.794   0.793   0.792     0.042  0.042  0.042

n.r. = not recoverable from the source. The σ̂1² to σ̂4² columns of this table could not be recovered and are omitted.

REFERENCES

Abd-Alla, A.A., and Abouammoh, A.M., 1982. A comparative study on estimation of parameters of a Markovian process-1. In: Time Series Methods in Hydrosciences, Editors: A.H. El-Shaarawi and S.R. Esterby. Elsevier Scientific Publishing Company, Amsterdam.
Box, G.E.P., and Jenkins, G.M., 1970. Time Series Analysis: Forecasting and Control. Holden-Day, San Francisco.
Fuller, W.A., 1976. Introduction to Statistical Time Series. John Wiley & Sons Inc., New York.
Hasza, D.P., 1980. A note on maximum likelihood estimation for the first order autoregressive process. Commun. Statist. Theor. Meth. A9(13), 1413-1415.

A SYSTEMS APPROACH TO COMPUTERIZING DATA ACQUISITION

BY THOMAS R. CLUNE

Abstract: The problems of computerizing an established laboratory procedure are legion and highly specific. Even in successful computerization projects, these problems tend to be dealt with on an ad hoc basis as they arise. This paper attempts to present a systematic overview of the automating process, so that computerization may be achieved in an orderly manner, according to specification. It is necessary to consider a great deal of detail in designing a computerized installation. In this paper, the detail is always considered from the perspective of how it affects the overall performance of the acquisition system.

1.0 INTRODUCTION

Unlike most instrumentation purchases in a laboratory, microcomputers are acquired for data acquisition most commonly out of a general desire to modernize and simplify the running of the lab, rather than to perform a specific, well-defined task for which the computer is understood to be ideally suited. As a result, most attempts to computerize laboratory functions end in at least partial failure. The time required to computerize an established, well-understood procedure is enormous; typical development times range from six months to a year. It is thus desirable to establish unequivocally that the need for computerization exists before the project is undertaken.
There are three major sources of failure in automation projects. The first stems from underestimating the amount of digital information necessary to reproduce an analog experiment. For example, a single sweep of a digital oscilloscope will typically represent 2 Kbytes of data. Related to the underestimation of the amount of data is the underestimation of the time it takes to download that data to the computer. To illustrate: I computerized a time-resolved laser spectroscopy experiment at Brandeis University which employed a Biomation 8100 waveform digitizer and an IBM CS-9000 microcomputer. The scan window of the Biomation was 20 microseconds. Biomation sells an IEEE-488 interface converter box for the 8100 that makes it compatible with most microcomputers. However, the throughput of this box is 1 Kbyte/second, so it would have taken 2 seconds just to transfer each 20-microsecond scan to the computer. We could not afford to wait that long in this experiment, so I designed a hybrid interface between the 8100 and the IEEE-488 port of the computer which was able to function at 300 Kbytes/sec. I have described the hardware [3] and software [5] of this experiment elsewhere.
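The transfer-time arithmetic in this example is easy to check. The sketch assumes 1 Kbyte = 1024 bytes (the ratios do not depend on that choice). It uses only the figures quoted above: 2 Kbytes per sweep, 1 Kbyte/second for the converter box, and 300 Kbytes/second for the hybrid interface.

```python
def transfer_time_s(n_bytes, bytes_per_s):
    """Seconds needed to move n_bytes over a link sustaining bytes_per_s."""
    return n_bytes / bytes_per_s

KBYTE = 1024                      # assumption: 1 Kbyte = 1024 bytes
sweep = 2 * KBYTE                 # one digitizer sweep, as quoted in the text
scan = 20e-6                      # 20-microsecond scan window, in seconds

slow = transfer_time_s(sweep, 1 * KBYTE)     # converter box: 1 Kbyte/s
fast = transfer_time_s(sweep, 300 * KBYTE)   # hybrid interface: 300 Kbytes/s

print(f"converter box:    {slow:.3f} s per sweep")         # 2.000 s
print(f"hybrid interface: {fast * 1e3:.2f} ms per sweep")  # 6.67 ms
print(f"transfer time / scan time: {slow / scan:.0f} vs {fast / scan:.0f}")
```

At 1 Kbyte/second, moving the data takes about 100,000 times longer than the 20-microsecond scan that produced it. The hybrid interface reduces that factor to roughly 333, but even then the computer link, not the digitizer, sets the pace.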
What I want to point out here is that (1) the computer functions of a data acquisition experiment will often be the slow step, and (2) the manufacturers are not necessarily very good at optimizing the computerized performance of their own instruments. Indeed, Hewlett-Packard, which invented the IEEE-488, offers acquisition rates from its digitizers via IEEE-488 that are essentially the same as Biomation's. You would not expect such performance if you looked at the interface specifications rather than at the instrument's utilization of the interface: the maximum rated throughput on an IEEE-488 interface is 1 megabyte/second!

The second kind of misunderstanding that undermines successful computerization might be characterized as the belief that putting an A/D board into a microcomputer creates a data acquisition instrument. In reality, a great deal of engineering goes into a stand-alone instrument, and an A/D board is only one small component of a commercial instrument. If you elect not to pay an instrument company to solve your engineering problems for you, you must be prepared to do that engineering yourself.

The third difficulty encountered in computerization stems from the desire to include unnecessary and highly complex refinements in the system. For example, real-time display and analysis of data almost always interferes with the ability to acquire the data itself. Similarly, the desire to use the data acquisition computer for word processing or departmental bookkeeping as foreground tasks while the system is collecting data can jeopardize the data acquisition process.

This paper will attempt to help you determine whether your application is suitable for computerization and, if so, what kind of configuration you will need. Throughout the paper, the need for a systems approach to automation is emphasized.

2.0 A/D CONVERTERS

The first process we will consider is digitizing the data. This is the area where your analog experience is least applicable and, unfortunately, also the area where the popular literature contains the most false or misleading statements. I will discuss digitizing by using A/D boards, not because I believe that it is the best choice, but because it is the approach most fraught with difficulties. I believe that stand-alone digitizers make much more sense in a laboratory than do A/D boards. Nonetheless, the relative cost of a waveform digitizer and an A/D board is such that many people are tempted to save some money by using the A/D board. You can view what follows as an argument for why such an approach is less attractive than it may first appear. The primary questions about digitizing data are: How fast do I need to sample the analog stream? What resolution is necessary in the digitization for useful analysis? And what level of coordination between different sensors' readings do I need?

2.1 SAMPLING RATES AND ALIASING

Minimum sampling rate is invariably discussed in the literature in conjunction with aliasing. A typical presentation goes something like this: "Aliasing is the phenomenon in which a high-frequency signal appears to be a lower-frequency signal, and is caused by an insufficient sampling rate. The Nyquist theorem states that the sampling rate should be at least twice the frequency of the fastest waveform sampled." While this is not an exact quote of any article with which I am familiar, its contents are functionally equivalent to almost any popular exposition that you will read on the subject. It contains a variety of errors and misleading implications. While the first sentence approaches the truth, it requires significant expansion. Aliasing is a phenomenon that can only be expressed relative to an analysis routine. If, for example, you are analyzing data by using the fast Fourier transform (FFT), there will be conditions under which the transform introduces a systematic error into the data.

To understand the nature of that error, let us recall some of the highlights of the method. First, the FFT is a discrete method, i.e., separate points of a (presumably) continuous stream are sampled and used to reconstruct the frequency components of the continuous stream. So far, the FFT does not differ from any other digital sampling technique. Second, the FFT treats all waveforms as being constructed from some combination of superimposed sine waves. Third, N data points produce an N/2-point transform (frequency-domain output). The other half of the points are thrown away because discrete transforms simply produce the mirror images of the first N/2 points. The same phenomenon that produces the redundancy in discrete Fourier transforms produces aliasing for sampling rates less than twice the frequency of the highest-frequency signal component (because then the transformed spectrum overlaps the mirror-image spectrum). The point here is that the Nyquist theorem is not about aliasing per se, but about aliasing in the FFT. Further, the notion of a frequency component in an FFT is an abstract one. The sine-wave frequency of the FFT has little to do with transients.
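Two of the FFT properties described above can be checked numerically with NumPy. For real input, the second half of the DFT is the complex-conjugate mirror of the first half. A tone above half the sampling rate folds back onto a lower frequency. The sampling rate and the 700 Hz test tone are arbitrary choices for illustration.

```python
import numpy as np

fs = 1000.0                     # sampling rate, Hz (hypothetical)
n = 1000                        # number of samples -> 1 Hz bin spacing
t = np.arange(n) / fs

# For real input, X[N-k] is the complex conjugate of X[k]: the upper half
# of the DFT mirrors the lower half and carries no new information.
x = np.random.default_rng(0).normal(size=n)
X = np.fft.fft(x)
k = np.arange(1, n // 2)
mirror_ok = np.allclose(X[n - k], np.conj(X[k]))

# A 700 Hz tone sampled at 1000 Hz lies above the Nyquist frequency (500 Hz)
# and folds onto 1000 - 700 = 300 Hz in the one-sided spectrum.
tone = np.sin(2 * np.pi * 700.0 * t)
spec = np.abs(np.fft.rfft(tone))
freqs = np.fft.rfftfreq(n, d=1 / fs)
print(mirror_ok, freqs[np.argmax(spec)])   # True 300.0
```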
If you have a single transient on which you have collected two points, the FFT is not going to give you a meaningful description of the signal. Rather, it is assumed that the transient is composed of sinusoidal frequencies that repeat throughout the sampling window and constructively interfere to form the observed transient. Consider what the maximum sine-wave component is in a square wave (or any waveform that is not a simple sine wave). The commonly made point that you should always use a low-pass filter with a cut-off frequency not more than half the sampling rate for FFT analysis is true. However, it does not tell you which is the highest-frequency sine-wave component that is significant to your data. If anyone does not clearly understand this point, I urge them to try the following simple experiment: generate a 1 kHz square wave, filter that wave through a 2 kHz low-pass filter, and display the output on an oscilloscope. There is no substitute for experience in these things. The point you must recognize is that there is no royal road to determining the minimum necessary sampling rate, even in the well-defined and thoroughly studied realm of the FFT. While you can and should learn what the theoretical limitations of the FFT are, their application to your experiment depends on an analysis of what your actual waveforms look like. Aliasing is not a problem unique to the FFT.
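The square-wave experiment suggested above can also be simulated. The sketch below assumes NumPy. It builds a 1 kHz square wave on a fine time grid, measures the amplitudes of its sine-wave components, and then applies an idealized brick-wall low-pass filter at 2 kHz by zeroing FFT bins. A real analog filter rolls off gradually, so an oscilloscope trace will differ in detail.

```python
import numpy as np

fs = 100_000.0                 # simulation sampling rate, Hz (stands in for continuous time)
f0 = 1_000.0                   # square-wave frequency, Hz
per_period = int(fs / f0)      # 100 samples per cycle
n = 100 * per_period           # exactly 100 cycles -> 10 Hz bin spacing, no leakage

j = np.arange(n)
square = np.where(j % per_period < per_period // 2, 1.0, -1.0)

X = np.fft.rfft(square)
freqs = np.fft.rfftfreq(n, d=1.0 / fs)
amp = 2.0 * np.abs(X) / n      # one-sided amplitude of each sinusoidal component

def bin_of(f):
    """Index of the FFT bin at frequency f (bins are fs/n = 10 Hz apart)."""
    return int(round(f / (fs / n)))

# Only odd harmonics are present, with amplitudes close to 4/(pi*k).
for k in (1, 2, 3, 5):
    print(f"{k * f0:6.0f} Hz: amplitude {amp[bin_of(k * f0)]:.3f}")

# Idealized brick-wall low-pass at 2 kHz; a real analog filter rolls off gradually.
X_lp = np.where(freqs <= 2_000.0, X, 0.0)
filtered = np.fft.irfft(X_lp, n)
print(f"peak of filtered output: {np.max(np.abs(filtered)):.3f}")  # about 4/pi
```

The square wave's components lie at 1, 3, 5, ... kHz, with amplitudes near 4/(πk). A 2 kHz cut-off therefore leaves essentially only the 1 kHz fundamental, and what comes out looks like a sine wave rather than a square wave. The highest significant component depends on how much of the waveform's shape you need to preserve, not on its repetition rate alone.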
You have probably seen cowboy movies in which wagon wheels appear to rotate backwards while the wagon moves forwards. The cause of this is that, given the sampling rate of the movie camera relative to the symmetrically equivalent positions of the wagon wheel, our perceptual apparatus's "algorithm" most directly interprets the images by assuming that the wheel is spinning in reverse. You can readily reproduce this kind of phenomenon in the laboratory if you have a waveform digitizer and a sine wave generator. Set the sampling rate at about 1/10 the generator's frequency, then fine-tune the frequency generator until you see what appears to be a pure sine wave. What is interesting about this experiment is that this kind of aliasing is very frequency-sensitive. A very small adjustment in the frequency generator will make the digitized output look like garbage. The FFT, on the other hand, will always find some set of frequencies that the data will fit. That is, any time the signal contains components above the Nyquist frequency and you analyze the data with an FFT, you will produce meaningful-looking but false output. For this reason, band-pass filtering is more important with the FFT than with most other analytical methods. With methods of analysis other than an FFT, aliasing may obtain only for a very select number of very narrow bands. This fact is exploited in some A/D systems to allow you to track continuous signals that are at a much higher frequency than the sampling rate. For example, the Hewlett-Packard company sells an inexpensive digitizer that samples at 25,000 samples/second but will track a continuous signal of up to 5 MHz. In order to accomplish this, the digitizer incorporates a very fast sample-and-hold circuit that samples the signal at random intervals for very brief periods. I must confess that I don't know what the algorithm is for reconstructing the waveform, but I know enough to be worried about it. The first problem of such a sampling method is that the waveform must be rigorously periodic. A damped sine wave, for example, cannot be analyzed by such rare and random sampling. The second problem is the occurrence of that "very select number of very narrow bands" of aliasing. I have not seen the HP digitizer malfunction, but I would expect that any such sampling technique would have to have a worst-case set of inputs under which it would.
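Both the wagon wheel and the digitizer experiment follow from one folding rule: a sinusoid of frequency f sampled at rate fs produces the same samples as one at f minus the nearest integer multiple of fs. The minimal sketch below returns that apparent frequency with its sign, a negative value playing the role of the backwards-turning wheel. The 1000 samples/s rate and the generator frequencies are assumed values chosen to mimic sampling at roughly 1/10 of the generator frequency.

```python
import math


def apparent_frequency(f_signal_hz, f_sample_hz):
    """Signed frequency that a sampled sinusoid appears to have.

    The result lies between -f_sample/2 and +f_sample/2. A negative value means
    the sampled waveform looks as if it runs backwards (the wagon-wheel effect).
    """
    return f_signal_hz - f_sample_hz * round(f_signal_hz / f_sample_hz)


fs = 1000.0  # assumed digitizer rate, roughly 1/10 of the generator frequency
for f in (10_000.0, 10_010.0, 9_990.0, 10_370.0):
    fa = apparent_frequency(f, fs)
    points_per_cycle = fs / abs(fa) if fa else float("inf")
    print(f"{f:>8.0f} Hz looks like {fa:+6.0f} Hz, {points_per_cycle:.1f} points per apparent cycle")

# The samples of a 10,010 Hz sine are the same numbers as those of a 10 Hz sine.
fast = [math.sin(2 * math.pi * 10_010 * n / fs) for n in range(200)]
slow = [math.sin(2 * math.pi * 10 * n / fs) for n in range(200)]
assert all(abs(a - b) < 1e-9 for a, b in zip(fast, slow))
```

At 10,010 Hz the samples trace a clean 10 Hz sine with 100 points per apparent cycle; 9,990 Hz gives the mirrored trace (the backwards wheel); and a modest detuning to 10,370 Hz leaves fewer than three points per apparent cycle, which is the kind of "garbage" described above.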
The point here is not that Hewlett-Packard is selling a faulty product, but that you must know the limitations of your equipment and that those limitations may not be immediately obvious. We are still left without an answer as to what the minimum sampling rate required for an experiment would be. Answering this question in the abstract is always dangerous. However, I will suggest guidelines that I believe to be reasonable. First, the problem of aliasing should never arise in an actual experiment: you must sample at a rate much higher than that necessary to avoid aliasing in order to have reliable data. A good rule of thumb is that no formal analytical technique is better than your eye. If you can't tell what you want to know from a plot of the raw digitized data, a numerical method of analysis probably can't either. As you read the literature on sample rates, you will discover that there are two different schools of thought on how many samples per period is enough. One, the computer science school, argues that the fewest points that will work is the best number of points. Their concern is that more data means more analysis time on the computer. I am reasonably confident that most scientists will not share this perspective. Clearly, the appropriate number of points is the maximum that you can get. If you have to wait two hours for the analysis to be completed, the payoff is better analysis.
As long as you have the time (and memory), there is no better way to use it than waiting for good results. It is often supposed that one method of analysis is better than another, in the sense that it will generally give the same precision with fewer data points. In my experience, there is little difference in the efficiency (although a lot of difference in the applicability) of most common methods of analysis. The one method that comes to mind as generally less efficient than most is the moving-average, or boxcar, method. The one method that is always the best for data-smoothing applications (but not for reasons of efficiency) is signal averaging. There is no mathematical substitute for data. A good quick overview of methods of analysis is [10].
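Signal averaging works because independent noise shrinks roughly as 1/√N when N time-aligned sweeps are averaged point by point, while the repeatable signal does not. The simulation below is a sketch under stated assumptions (a clean sine plus Gaussian noise, perfectly triggered sweeps, and a fixed random seed); none of these numbers come from the text, and the function names are hypothetical.

```python
import math
import random


def clean(i, n_points):
    """The noise-free signal: one period of a unit sine (assumed for illustration)."""
    return math.sin(2 * math.pi * i / n_points)


def noisy_sweep(n_points, noise_sd, rng):
    """One triggered sweep: the clean signal plus independent Gaussian noise."""
    return [clean(i, n_points) + rng.gauss(0.0, noise_sd) for i in range(n_points)]


def signal_average(sweeps):
    """Point-by-point mean of repeated, time-aligned sweeps."""
    return [sum(values) / len(sweeps) for values in zip(*sweeps)]


def rms_error(trace):
    """RMS deviation of a trace from the clean signal."""
    n = len(trace)
    return math.sqrt(sum((v - clean(i, n)) ** 2 for i, v in enumerate(trace)) / n)


rng = random.Random(1)  # fixed seed so the run is repeatable
single = rms_error(noisy_sweep(200, 0.5, rng))
averaged = rms_error(signal_average([noisy_sweep(200, 0.5, rng) for _ in range(100)]))
print(f"1 sweep: {single:.3f}   100 sweeps: {averaged:.3f}")  # expect roughly a 10x reduction
```

Unlike a moving average, this does not blur the time axis; the price is that the experiment must be repeatable and each sweep must start at the same point in the signal.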

2.2 RESOLUTION

Discussing minimum resolution is rather like telling a good-news, bad-news joke. The good news is that you need fewer bits of resolution than you think you do. The bad news is that you GET fewer bits than you think you do. First, the good news. People commonly assume that the minimum resolution needed for research-quality work is 12 bits. In reality, publishable-quality research is still done with high-quality 6-bit digitizers. At Brandeis University, one researcher in coordination complexes does quantitative work on log-linear data with a 6-bit digitizer! I wouldn't recommend that as an ideal number, but it will suffice for a lot of work. Further, most work will not require more than 8 bits of resolution, assuming that you really have a full 8 bits to work with. And there is a real advantage to not using a digitizer wider than 8 bits if you don't have to. Many computers transfer data a byte (8 bits) at a time. If you use a 10- or 12-bit converter, you require two transfers per reading. If you use an 8-bit converter, you can transfer twice as many samples in the same period of time. The rate of transfer of data is often the limiting factor in how many samples per second you can make with your A/D converter. You will generally be better served making more conversions per second than more precise conversions.
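The transfer arithmetic can be written down directly. This sketch assumes that each reading moves through the port one bus-width at a time and that the transfer rate, not the conversion, is the bottleneck; the 100,000 transfers/s figure and the function name are illustrative.

```python
import math


def samples_per_second(transfers_per_second, adc_bits, bus_bits=8):
    """Readings per second when each reading must pass through a bus_bits-wide port.

    A reading wider than the bus needs ceil(adc_bits / bus_bits) separate transfers,
    so the achievable sample rate is the transfer rate divided by that count.
    """
    transfers_per_reading = math.ceil(adc_bits / bus_bits)
    return transfers_per_second / transfers_per_reading


RATE = 100_000  # hypothetical transfers per second that the computer can sustain
for bits in (6, 8, 10, 12):
    print(f"{bits:2d}-bit on an 8-bit bus: {samples_per_second(RATE, bits):,.0f} readings/s")
print(f"16-bit on a 16-bit bus: {samples_per_second(RATE, 16, bus_bits=16):,.0f} readings/s")
```

The 6- and 8-bit cases cost the same single transfer, the 10- and 12-bit cases halve the reading rate, and a 16-bit converter on a 16-bit bus is back to one transfer per reading.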
The averaging of noise that comes from additional readings will normally be more useful than having a very accurate record of the noise. On the other hand, since you will have to wait the same length of time for a 6-bit transfer and an 8-bit transfer on a computer with an 8-bit bus, you might as well get the added resolution. Similarly, if your computer has a 16-bit bus, you might as well use a 16-bit converter (unless the board will support two 8-bit transfers at once). The point is to avoid paying a speed penalty for resolution, not to avoid resolution at any cost. Now for the bad news. If you read the manufacturer's spec sheet on an A/D board, the resolution will invariably be reported as ±1 LSB (least significant bit) or better. For example, a 12-bit board that is designed to read 0-10 V commonly will be said to have a "resolution" of 0.0049 V. However, the effective resolution of the board will be orders of magnitude less than that in most applications. Clearly, what you care about is what the board will really do. It is extremely rare for an A/D board manufacturer to tell you that. In my experience, it is extremely rare for an A/D board manufacturer to even know what the effective resolution of his board is. While many things will affect the resolution of an A/D board, there is only one that you can readily do anything about.
Boards that plug into a computer's expansion slot are subject to the electromagnetic field of the computer's power transformer. This effect will be especially pronounced if you are trying to read small voltages, as with a thermocouple. It is also significantly affected by the choice of slot on the computer. In general, with this kind of board, you should use the slot farthest from the power supply for the A/D board. To avoid this effect, some A/D manufacturers place their A/D circuitry in a box external to the computer. If the box does not have its own power supply, this can be an effective strategy. If it does, you must consider whether the external box was designed to shield the A/D from external fields or simply to provide more real estate for the A/D product.
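To put the spec-sheet "resolution" in perspective, the sketch below computes the ideal LSB of a 0-10 V, 12-bit board. One LSB is 10/4096 ≈ 2.44 mV; the 0.0049 V quoted above matches the full width of a ±1 LSB band (two LSBs). The `effective_bits` helper is an assumed rule of thumb (log2 of full scale over peak-to-peak noise), not the formal effective-number-of-bits definition used on converter data sheets, and the 40 mV noise figure is hypothetical.

```python
import math


def lsb_volts(full_scale_volts, bits):
    """Voltage step of one least significant bit for an ideal converter."""
    return full_scale_volts / 2 ** bits


def effective_bits(full_scale_volts, noise_peak_to_peak_volts):
    """Rough effective resolution: how many noise-wide steps fit into full scale.

    This is an assumed rule of thumb, not the data-sheet ENOB definition.
    """
    return math.log2(full_scale_volts / noise_peak_to_peak_volts)


lsb = lsb_volts(10.0, 12)
print(lsb)        # 0.00244140625 V for one LSB
print(2 * lsb)    # 0.0048828125 V: the full width of a +/-1 LSB band, the "0.0049 V" figure
print(round(effective_bits(10.0, 0.040), 1))  # hypothetical 40 mV of pickup noise: about 8 bits
```

Even a few tens of millivolts of transformer pickup turns a nominal 12-bit board into something closer to an 8-bit one, which is why slot placement matters.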

The single largest source of error in A/D comparisons is not a function of faulty electronics, but of design choice. Most inexpensive A/D boards do not use sample-and-hold circuitry on the analog input end. In order to understand what effective resolution can be expected from a board, you must understand what the consequences of this design choice are. I will explain this by example. Consider a 12-bit A/D comparator that can make 100,000 conversions/second and is set for 0-10 V measurements. For the sake of simplicity, let us assume that we are trying to track a triangular wave with a pole-to-pole voltage swing of 10 V, i.e., a wave that covers the full scale (fs) of values for the comparator. Comparators usually work by "successive approximation," which means that they compare the signal to 5 V and, if the signal is larger, set the most significant bit to 1, then set the next bit, and so on through all twelve bits. One full comparison takes approximately 1/100,000 of a second. If the input signal is allowed to change as we are comparing it, the input voltage must change no more than 0.0049 V (the value of the LSB in this example) in 1/100,000 of a second in order to have 12-bit resolution, assuming ideal electronics. A triangular wave of 10 V goes through a 10 V change in 1/2 its period.
Thus, the maximum frequency we could track with 12-bit resolution is:

100,000 readings/s ÷ (2048 divisions per fs swing × 2 fs swings per cycle) ≈ 24.4 Hz, or approximately 25 Hz.

Make sure that you understand this point; it is seldom recognized, but absolutely critical to evaluating the limits of precision for an A/D board of this type. If you only need 8-bit resolution, plug 256 into the formula instead of 2048 and grind it out. Understand that this value is an ideal limit of precision. It assumes the best possible waveform, changing with absolute linearity, and it assumes ideal electronics on the A/D board. To increase the resolution of the A/D at higher speeds, some companies use sample-and-hold (S/H) circuitry. What an S/H does is take a quick reading of the analog voltage and store it until a conversion can be made. What holds the voltage is a simple capacitor. The errors to which this kind of circuit is heir are thus those associated with any RC circuit. Let us briefly indicate the major potential problems. First, the capacitor may leak (droop), that is, it may slowly lose the charge that it is trying to store. The second problem is associated with charging time. The RC circuit must be exposed to the analog voltage for a reproducible period of time at regularly spaced intervals. The deviation from regularity is called jitter.
While it is charging, the analog signal must be essentially constant, or the same average value for a signal that is increasing and a signal that is decreasing will not be stored as equal. This phenomenon is called hysteresis. Next, the comparator must not exert a significant load on the RC circuit while it is making its comparison, or else the RC circuit will discharge while being read. Finally, the S/H circuit must have adequate time to discharge to below 1 LSB before re-sampling, or the voltage read will be partially due to a residual charge from the last sample. This phenomenon is called memory. Most of these problems should be the concern of the A/D manufacturer, so, assuming that he has been careful in his board design (a heady assumption), you need give thought only to the RC time constant of the S/H circuit. Before discussing how to evaluate the time constant, I want to point out a general fact of instrumental life. If you do not need an S/H, e.g., if you are only trying to track a 25 Hz wave with a 12-bit 100,000 Hz board, you are better off without it. All electronics introduce errors of their own into your data. The fewer electronic gadgets between you and your data, the better.

Assuming that you need to track a faster signal than can be accomplished without the S/H circuitry, how do you calculate its response time? Again, the method will be illustrated by example. Assume ideal electronics, a 10 V triangular wave, and a 12-bit, 100,000 Hz converter. The time constant for an RC circuit is simply R times C. Steve Ciarcia of BYTE was kind enough to point out to me that most S/H circuits are CMOS devices, so their resistances will be about 400 ohms. The capacitor value can be read from the board itself. Typical values are in the 0.001 to 0.01 microfarad range. Let's assume that our S/H capacitor value is 0.01 microfarads. Then the time constant will equal 400 × 10^-8, or 4 × 10^-6 seconds. This value represents the time it takes for the circuit to gain or lose 63.2% of its charge. To determine the time to drop below 1 LSB, we multiply 4 × 10^-6 by ln(2048), which gives a time of approximately 3 × 10^-5 seconds. That is, the S/H can charge or discharge fully approximately 32,000 times a second. Remember that it must do both for each reading. (This is a bit of an oversimplification. The capacitor does not have to charge for this long. However, if we use this approximation, we do not have to concern ourselves with hysteresis.) So the S/H circuitry can complete a full charge/discharge cycle to support 12-bit precision 16,000 times a second. Of course, this does not include the time for the actual conversion. However, even when we want 12-bit precision, we seldom care about having 4096 readings per wave, so we can neglect the conversion time as irrelevant to our purposes. In essence, an S/H acts as an integrating circuit and introduces the kind of smoothing you would normally expect of such circuitry. As mentioned above, you will usually want at least 10 or 12 samples per period, so this hypothetical S/H would provide between one and two orders of magnitude improvement over a free-running converter. Again, I emphasize that these are calculations for an ideal system. The point that needs to be recognized is that a 100,000 sample/second converter is not designed to make 100,000 conversions a second. A good discussion of S/H circuitry and many other aspects of A/D conversion can be found in [a].
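The two ideal-system calculations in this section can be checked side by side. The sketch follows the text's own assumptions: 100,000 conversions/s, a full-scale triangular wave, 2048 divisions for 12-bit precision, a 400 Ω CMOS switch, a 0.01 µF hold capacitor, settling to 1 LSB in τ·ln(2048), and one charge plus one discharge per reading. The function names are hypothetical.

```python
import math


def free_running_limit_hz(conversions_per_s, divisions):
    """Highest full-scale triangular-wave frequency trackable without an S/H.

    The wave sweeps full scale twice per cycle and may move by at most one
    division of full scale during a single conversion.
    """
    return conversions_per_s / (divisions * 2)


def sh_cycles_per_s(r_ohms, c_farads, divisions):
    """Charge-plus-discharge cycles per second for an ideal RC sample-and-hold.

    Settling to within 1 part in `divisions` takes tau * ln(divisions), and each
    reading needs one charge and one discharge.
    """
    tau = r_ohms * c_farads
    return 1.0 / (2 * tau * math.log(divisions))


f_free = free_running_limit_hz(100_000, 2048)   # about 24.4 Hz
cycles = sh_cycles_per_s(400, 0.01e-6, 2048)     # about 16,400 cycles/s
for samples_per_period in (10, 12):
    f_sh = cycles / samples_per_period
    print(f"{samples_per_period} samples/period: S/H tracks ~{f_sh:,.0f} Hz, "
          f"{f_sh / f_free:.0f}x the free-running limit")
```

At 10 to 12 samples per period the S/H supports roughly 1.4 to 1.6 kHz, some 56 to 67 times the 24.4 Hz free-running limit, which is the "between one and two orders of magnitude" quoted above.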

2.3 COORDINATION AND CONTROL

There is more than one reason to use S/H circuitry. Besides increasing the precision of the conversion on fast signals, it can be used to coordinate readings. Typically, we want to collect correlated data on two or more sensors in an experiment. For example, we may want to measure temperature versus pressure for a system. A/D boards appear to offer a convenient way of doing this. Typically, they will provide 16 analog inputs on one board. Surely, we can simply use two lines of the A/D board and obtain our correlated reading. As you probably suspect by now, the answer is, "not necessarily." A/D boards generally use a multiplexer to read the 16 different lines. What that means is that one comparator sequentially services each of the (up to) 16 lines that you use in an experiment. Thus, the readings will never be simultaneous. Further, they will not even be as closely spaced as the comparator conversion rate. To see why, we must look at how an A/D board handles data.


Once the comparator has made an A/D conversion, it passes the conversion through to the circuitry that presents the digitized information to an input/output (I/O) port on the computer. The computer must pick up the reading at the I/O port before a new reading can be made by the comparator. If this were not done, the reading for the next A/D line would overwrite the last reading, and you would not know which line a reading was from. Most often, the computer I/O function is the slow step in A/D operation. I will have more to say on this when we discuss the computer, but for now we will content ourselves with the recognition that on an IBM PC, the maximum reading rate for I/O is approximately 100,000 bytes/second using DMA. Since the A/D waits for the data to be read before beginning a new conversion, the maximum throughput with a 100 kHz converter is approximately 50,000 bytes/second. With a 12-bit comparator, that means that we cannot read more than 25,000 samples/second. An Apple II will be even slower. Thus, the closest to simultaneity that we could get with two A/D lines and ideal components is a 40-microsecond separation. If we wanted 1000 correlated data pairs/second, this value would represent a minimum of a 4% systematic error in the time axis. Again, this value is a best-case number. Most programming you do will not be optimal.
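The channel-skew estimate above reduces to a few lines. This sketch mirrors the text's simplified model, in which conversion time and I/O time simply add for every byte; the function names are hypothetical, and a real board may overlap the two.

```python
def channel_skew_s(conversion_rate_hz, io_bytes_per_s, bytes_per_sample):
    """Delay between readings of two adjacent multiplexed channels.

    Follows the text's simplified model: the converter waits for each byte to be
    read before starting again, so conversion time and I/O time add per byte.
    """
    byte_rate = 1.0 / (1.0 / conversion_rate_hz + 1.0 / io_bytes_per_s)
    samples_per_s = byte_rate / bytes_per_sample
    return 1.0 / samples_per_s


def timing_error_fraction(skew_s, pairs_per_s):
    """Channel skew as a fraction of the interval between correlated data pairs."""
    return skew_s * pairs_per_s


# 100 kHz converter, DMA at 100,000 bytes/s, and 2 bytes per 12-bit reading
skew = channel_skew_s(100_000, 100_000, 2)
print(f"{skew * 1e6:.0f} us skew, {timing_error_fraction(skew, 1000):.0%} of a 1 ms pair interval")
```

Passing `io_bytes_per_s=1_000` (an I/O routine 100 times slower) stretches the skew to about 2 ms, longer than the 1 ms between pairs.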
If you program in a high-level language, the time I/O functions take to execute may be two orders of magnitude greater than the optimal rate. One way to minimize this error is with S/H circuitry that samples all channels at once and holds each until the conversion is completed. The only problems with this approach are the problems that we have indicated before with S/H circuits. In general, it is a good, clean answer to the problem. Another solution provided by some of the more expensive A/D products is on-board storage of converted data. These products function rather like low-end waveform digitizers. They will store a "sweep" of a few thousand samples and download the entire data set to the computer after data collection. This approach is not bad, but the level of coordination is still limited by the rate of conversion of the A/D comparator. While you may shrink the delay between conversions, some delay is still there. The major virtues of off-board memory are that it can increase reading rates and, in conjunction with an off-board timer, can improve the reproducibility of conversion intervals, as we shall discover presently. A point worth mentioning with respect to control of computerized data acquisition is that A/D board triggers are not like oscilloscope triggers. Some experiments cannot be done on A/D boards because of this, so I want to explain how these triggers work.
Oscilloscope triggers allow you to set the voltage level and direction of motion of a signal. For example, you can trigger the scope on the falling edge of a 3 V signal. In some experiments, this capability is very important. Unfortunately, A/D board external triggers do not have this capability. The triggering is effected by voltage level only; the direction is not selectable. Closely related to these issues is the matter of coordinating the triggering of simultaneous events. There are many ways you can do this, including: outputting a voltage on a D/A line, setting a digital I/O line high, or outputting a command to a stand-alone instrument through a digital interface. The question is how much control you need. What makes control difficult is that you never know what simultaneous means until you set up your experiment. The electronics of each instrument and the cable lengths of the particular setup affect coordination in ways that are best determined experimentally. Therefore, you want to be able to fine-tune your control from the instruments themselves. For example, the pretrigger and delayed trigger functions on your waveform digitizer simplify coordination tremendously. Some A/D packages support delayed triggering but, to the best of my knowledge, none support pretriggering.

3.0 THE COMPUTER

The considerations that go into what computer configuration you will need include: choice of interface to the digitizing equipment, support chips you will need, and language you will use for programming. The choice of interface is probably the most important, so we will begin with that.

3.1 INTERFACES

When you decide to computerize a successful experiment (and you should never try to computerize an experiment that you don't already fully understand), the first thing you should look for is ways to use the equipment you already have. The reasons for this are obvious. First, you already understand the equipment and know that it will do your job. And second, you have already bought that equipment, so you can save money by using what you have. You should begin by determining whether the instrument you are using is already equipped with an interface. Some older and many newer instruments will have an RS-232, IEEE-488, or Centronics port on them as standard equipment. If you are lucky enough to be blessed with such an instrument, your route to computerizing has been decided for you. If your instrument does not have a digital port on it, find out if the manufacturer still supports the model. If so, you may be able to retrofit a digital port to it. Again, your route to computerizing is then clear. If neither of these conditions obtains, your task is more difficult. You should begin by reading the manual of the instrument. What you are looking for is a simple entry point for an interface. For example, if your instrument has a Nixie-tube display, you may be able to wire up a binary-coded decimal (BCD) interface to the computer. Remember that, if the instrument has a digital readout, it has digitized the data at some point.
Your task is to find out where and determine whether the digitizing code is sufficiently close to a standard to permit you to use an off-the-shelf interface. In order to approach this task intelligently, you need to know what the standard D/D (digital-to-digital) interfacing options are. There is an excellent overview of the various flavors of interfacing boards that was run as a six-part series in BYTE a couple of years ago [8]. This has been revised and reprinted as a book that should be easy to obtain [9]. Read one or the other of these before you peruse the instrument manual. You won't be in a position to actually do the interfacing from this series, but you will know whether an interface may be applicable to your task. Once you have selected a strategy, you can research the details of the interface to complete the task. You should also realize that not all computers support all interfaces. Make sure that the computer you buy supports the interface you intend to use.

An alternative to this approach is to do a literature search and see if someone else has interfaced your instrument to a computer already. The Review of Scientific Instruments, for example, runs digital applications notes each issue. If you can find someone else who has already solved your problem, do what they did. Failing these, you can run an A/D converter to the chart-recorder output of an analog instrument as a quick-and-dirty way of computerizing. This is not a bad approach, although you must observe all the warnings on A/D converters presented above. The point of all of this is that using the equipment you have is probably the safest, most cost-effective way of computerizing. If none of the above options obtains, you have two choices: buy an A/D converter and computerize from scratch, or buy new stand-alone equipment. The second option is much more expensive, but also much less problematic. If you have the luxury of buying all new equipment, make sure that it is equipped with an IEEE-488 interface. This interface was designed for laboratory applications. As I have argued elsewhere [4], the IEEE-488 is vastly superior to any other for laboratory uses. If cost is a serious limitation and speed is not critical, an attractive alternative to IEEE-488 interfacing is the HP-IL. This is a low-cost serial interface developed by HP (and available only on HP products) that contains many of the features of the IEEE-488, albeit in slow motion.
You can even use a Hewlet t -Packard hand-held c a l c u l a t o r f o r t h e "c o m p u te r " w i t h t h l s I n t e r f a c e . More l n f o r m a t l o n on t h l s o p t l o n I s p r e s e n t e d I n C7l.
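If you take the quick-and-dirty route of running an A/D converter on a chart-recorder output, every reading arrives as an integer code that must be scaled back to volts. The sketch below shows that conversion for a straight-binary (unipolar) converter; the function name, the 12-bit resolution and the 0 to 1 V range are hypothetical defaults, so substitute the figures from your own board and recorder manuals.

```python
def counts_to_volts(code, bits=12, v_min=0.0, v_max=1.0):
    """Convert a raw A/D output code to the voltage it represents.

    Assumes a straight-binary converter whose 2**bits codes divide the
    input range [v_min, v_max) into equal steps of one LSB each.
    """
    levels = 2 ** bits
    if not 0 <= code < levels:
        raise ValueError(f"code {code} is outside 0..{levels - 1}")
    lsb = (v_max - v_min) / levels  # size of one quantization step, in volts
    return v_min + code * lsb


# Hypothetical 12-bit board reading a 0-1 V chart-recorder output.
print(counts_to_volts(2048))                   # 0.5
print((1.0 - 0.0) / 2 ** 12 * 1e3, "mV/LSB")   # 0.244140625 mV/LSB
```

The step size is worth computing before you buy: it is the smallest change in the recorder signal that the computer can ever see.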

3.2 SUPPORT CHIPS

Support chips are parts of the computer other than the microprocessor that increase a computer's performance by removing some specialized task from the list of things that the microprocessor has to do. There are two major support chips that are important in computerized data acquisition.

The first is a direct memory access (DMA) controller. A DMA controller picks up information from one part of the computer and places it somewhere else. For example, if you are collecting data from an I/O port and storing it on disk or in main memory, the DMA controller may be used to perform this task at the maximum rate the computer can support. My own prejudice is that any computer that lacks a DMA controller does not belong in a data acquisition environment. With a DMA controller, you can program in any high-level language that allows you to access I/O ports (e.g., BASIC's OUT command) and memory locations (e.g., BASIC's PEEK command) and achieve data acquisition rates equal to fully optimized assembly-language acquisition routines. There is one limitation on DMA controllers that is important in A/D applications: a DMA controller can access only one port location at a time. If your A/D board has more than 8-bit resolution, it may use two ports to output data to the computer. If so, you cannot use DMA. However, some 12-bit A/D boards (e.g., Data Translation products) multiplex the two-byte output to make it available at the same I/O port so that they can support DMA operation.

The second support chip that may be of value to you is a numerical co-processor (NCP). If you need to do a lot of number-crunching on your data, an NCP can speed the turnaround time by as much as two orders of magnitude. If you are doing FFTs on large data sets, for example, you will probably want this capability. The major thing to watch out for with respect to NCPs is that many computers will support them, but the languages on the computer will not use them. For example, Microsoft BASIC and FORTRAN on the IBM PC will not use the 8087 even if it is installed.

A third kind of support chip that can be of use in A/D contexts is a programmable interval timer (PIT). This chip keeps track of timing intervals within a computer. I don't emphasize its use because you are generally better served by an A/D that has its own timer for sample intervals. The problem with using the computer to keep track of time is that a variety of housekeeping functions in the computer may affect the PIT's operation. The computer's timer is designed for use by the computer, not for use by peripherals requiring high time resolution. An excellent discussion of the kinds of problems associated with the IBM PC timing functions is [11].

The problems associated with the PIT are significant to data acquisition generally. There is a difference between computer time and real time. If you use a computer to time your data acquisition, you will get data with equal CPU-time spacing. But computers do a lot of housekeeping operations that generate interrupts. What this means is that periodically your application program is put to sleep while the computer, e.g., updates its time-of-day clock.
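One practical consequence of the difference between computer time and real time is that you should check the spacing of your samples rather than assume it. The following minimal sketch assumes you have a timestamp for every sample from a clock that is independent of the acquisition loop (for example, an external timer); the function name and the 10 ms example log are hypothetical.

```python
def check_spacing(timestamps, nominal, tolerance):
    """Flag gaps between successive samples that depart from the nominal interval.

    timestamps: sample times in any consistent unit, in acquisition order.
    Returns (intervals, flagged), where flagged holds the index i of every
    gap timestamps[i + 1] - timestamps[i] that differs from `nominal`
    by more than `tolerance`.
    """
    intervals = [later - earlier for earlier, later in zip(timestamps, timestamps[1:])]
    flagged = [i for i, dt in enumerate(intervals) if abs(dt - nominal) > tolerance]
    return intervals, flagged


# Hypothetical log: nominal 10 ms spacing, but one read was delayed by 5 ms
# (e.g. while the computer serviced a housekeeping interrupt).
times_ms = [0, 10, 20, 35, 40, 50]
intervals, flagged = check_spacing(times_ms, nominal=10, tolerance=1)
print(intervals)  # [10, 10, 15, 5, 10]
print(flagged)    # [2, 3]
```

A single late read shows up as a long gap followed by a short one, which is the signature of a delayed transfer rather than a missed sample.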
It may seem that you could solve the problem of housekeeping interrupts by disabling the system interrupts. Unfortunately, you cannot. There are two flavors of interrupts in a computer, maskable and non-maskable (NMI). While you can, and usually should, disable the maskable interrupts during data acquisition, you cannot disable NMIs. Further, you will probably not be able to find out what causes an NMI on your computer. In some computers, any keyboard input will generate an NMI, but you will not find that out by reading the manufacturer's documentation. Compounding the problem is that any software manufacturer may invoke an NMI for any reason that he sees fit. Further, the fact that the computer may be occupied with housekeeping functions when data is ready means that, even if you time your data acquisition intervals externally to the computer, your data may be unevenly spaced because it is downloaded at unequal intervals. To avoid this problem, some manufacturers make A/Ds with buffer memory as well as external timers. While this approach makes perfect sense, the cost of this kind of setup is typically three to five thousand dollars. For not a whole lot more money, you can get a full-function stand-alone digitizer. There are many advantages to stand-alone digitizers.
They can be set with front-panel controls like an oscilloscope instead of only by programming, so you know what the digitizer is set to do because you can see the settings on the front-panel dials. As mentioned before, the triggering options of waveform digitizers are superior to those of A/D boards. The range of scan rates and voltage gains tends to be much larger and more finely adjustable. And if you must have a real-time display of data, you can connect an oscilloscope to the digitizer's analog output and view the scan without interfering with the acquisition function.

One other support chip needs to be mentioned: the programmable interrupt controller (PIC). I will withhold discussion of the PIC until the section on programming languages. The kind of information that you need to know for laboratory interfacing tends to be very specific to the individual computer, and available (if at all) only in articles and books published by individuals who have worked with the system. For example, the IBM PC Technical Reference Manual gives no information on accessing or programming the DMA controller. It just doesn't occur to programmers or businessmen that anyone outside the manufacturer has any use for this information. However, there are various books on the market that do address this question for the PC. One good example is [1], which was written by one of the original designers of the IBM PC. The point of this is that you should probably avoid computer system clones in the lab, rather than risk their having an address space or support chips that differ from the system described in the literature. If a PC is simply too slow for your application, any of the Versabus or VMEbus 68000 systems will provide an order of magnitude improvement in performance. However, the degree of difficulty in putting your application together will also be increased by an order of magnitude. This is partly because there are fewer people writing books and articles on applications for these systems and partly because there are fewer companies supplying boards for these systems.

3.3 PROGRAMMING CONSIDERATIONS

I look on programming as a necessary evil. The goals of programming are twofold. First, you want to be done with it as quickly as possible. Second, you want to achieve the level of control that you had before you computerized the operation. Unfortunately, these goals are not complementary. One way to lessen the time spent programming the control of data acquisition is to buy a driver program for your acquisition system. A driver is a program that sets the operation of a device for you when you invoke (supposedly) ordinary-language commands. For example, the driver may let you set the rate of conversion on line 1 of the A/D board to 1000 conversions/second by saying something like "SET.RATE(1,1000)". The alternative to this might be outputting a series of hexadecimal numbers to a given port. In principle, the idea of canned drivers is very attractive. In practice, the programs tend to be unnecessarily slow and filled with bugs, and they produce unreliable data. Further, they will often not support the operations you want to perform on the A/D board. The value of drivers for D/D interfaces is somewhat higher. Most IEEE-488 board manufacturers, for example, will supply assembly-language drivers for their boards that perform reasonably well. The programs are often not adequately debugged, however, so you should make sure that the source code is provided with the package.

3.4 CHOICE OF LANGUAGE

If you are going to write your own programs, what language should you use? While everyone seems to have their own preferences on this, I believe that BASIC is by far the best choice for scientists. The virtues of BASIC are that it can be interpreted while developing a program to ease debugging and then compiled for (some) speed when the program has been debugged; it provides access to ports and support chip registers with the INP and OUT commands; it provides access to memory locations by the PEEK and POKE commands; and it can be mastered in a week or two. I should mention that the computer cannot be mastered in that time, but you will know enough BASIC that the language is not what will be preventing you from doing something. The slow step is learning where the manufacturer put the support chips in the computer's address space, figuring out what port location your A/D or interface uses, deciphering what the cryptic interface or A/D documentation means, etc. The language will not be the problem. And that is all you can reasonably expect of a language.

Most commonly, BASIC on a microcomputer means Microsoft BASIC, so we will begin by discussing it. There are a few major flaws in Microsoft BASIC: it is slow; it has a very limited dynamic range (approx. 10^-37 to 10^37, which is insufficient for solving a reasonably large matrix by pivotal condensation); and it can only address a total of 64 Kbytes of combined program and data space, even if your computer has ten times that available. Although BASIC is not a fast language, there are simple ways of obtaining adequate performance for your program. These include using the support chips intelligently; optimizing the object code of your BASIC compiler output by, e.g., keeping intermediate values of variables in registers instead of shuffling them back and forth to main memory; and not overburdening your program with needless tasks. A number of new BASIC implementations have recently appeared on the market that attempt to redress some of the limitations of Microsoft BASIC. Three that are worth mentioning are Better BASIC, True BASIC, and MTBASIC.
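To see why the dynamic range matters for pivotal condensation, note that elimination with pivoting computes a determinant as the product of the pivots, so the result can leave the representable range even when every matrix element is modest. The sketch below takes pivotal condensation to mean Gaussian elimination with partial pivoting, and uses IEEE single precision (largest value about 3.4 x 10^38) as a stand-in for Microsoft BASIC's own single-precision format, which is not IEEE but has a similar range; both are assumptions for illustration, and the function names are hypothetical.

```python
import struct


def to_single(x):
    """Round x to IEEE 754 single precision.

    struct.pack raises OverflowError if |x| is beyond the single-precision range.
    """
    return struct.unpack("<f", struct.pack("<f", x))[0]


def determinant(a, single=False):
    """Determinant by Gaussian elimination with partial pivoting.

    The determinant is accumulated as the product of the pivots. With
    single=True every intermediate result is rounded to single precision,
    mimicking a language whose only real type has a range of about 1e38.
    """
    rnd = to_single if single else float
    m = [[rnd(v) for v in row] for row in a]
    n = len(m)
    det = 1.0
    for k in range(n):
        # Partial pivoting: bring the largest remaining entry of column k up.
        p = max(range(k, n), key=lambda i: abs(m[i][k]))
        if m[p][k] == 0.0:
            return 0.0
        if p != k:
            m[k], m[p] = m[p], m[k]
            det = -det
        det = rnd(det * m[k][k])
        for i in range(k + 1, n):
            factor = rnd(m[i][k] / m[k][k])
            for j in range(k, n):
                m[i][j] = rnd(m[i][j] - factor * m[k][j])
    return det


# A 10 x 10 matrix with 1e5 on the diagonal and 1 elsewhere.
a = [[1e5 if i == j else 1.0 for j in range(10)] for i in range(10)]
print(determinant(a))  # roughly 1e50 in double precision
try:
    determinant(a, single=True)
except OverflowError as exc:
    print("single precision:", exc)
```

No element of the matrix is larger than 10^5, yet the single-precision run raises OverflowError at the eighth pivot, when the running product passes 10^38.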
I have not used Better BASIC, True BASIC, or MTBASIC, so I cannot recommend them. However, they have some features that may be important to your work. Each of them can use the full amount of memory on your computer, supports the use of the NCP (MTBASIC only with the $79.95 version), and provides a dynamic range of at least 10^-99 to 10^99. On the negative side, none of these languages comes in an interpreted version. Further, True BASIC lacks the INP and OUT commands. MTBASIC and True BASIC include one other feature: interrupt handling. I am not a fan of interrupts, however. Interrupts are used when you want the computer to do some task while it is waiting for some other task to be completed. For example, if you are collecting data at a relatively slow rate, you might want to have the computer plot a graph of the data it has already collected while waiting for more. When the new data point is ready, the data acquisition device will signal the computer that more data is available, i.e., it will interrupt the plotting function to perform the data acquisition function. What I find distasteful about this procedure is that it completely ignores the relative importance of the two functions. Your first concern should be to get the data, and get it right. We are not offended that an oscilloscope just sits and waits for a trigger.
We should not be alarmed that a microcomputer, which is no more expensive than a decent scope, is no more industrious. If your data acquisition rate is so slow that the computer can completely plot the data before the next point is collected, you don't need interrupts. If you need interrupts to accomplish two tasks, you shouldn't be doing both tasks on one computer.

My concerns are not purely philosophical. The overhead incurred by interrupts can be variable, depending on what instruction the computer was working on when it was interrupted. This uncertainty will manifest itself in one of two ways. First, the uncertainty caused by the timing problems already discussed will be further exacerbated. Or, if you try to force regularity in the time base by triggering off a free-running external timer, you may miss a data point entirely. Further, the amount of time it takes to service an interrupt is not insignificant. For example, the IBM PC requires over 80 clock cycles to handle the bookkeeping associated with an interrupt. If you are simply trying to plot an incoming data point on the CRT, an assembly routine can achieve that in line in about the same time that it would take to service the interrupt.

The great advantage of using a single routine for data acquisition is that you normally debug and verify programs independently. If you try running two properly debugged routines in tandem, you run the risk of introducing a new set of errors caused by the interaction of the two routines. Such errors are difficult to detect. To my mind, the only legitimate use of interrupts in data acquisition is signalling unanticipated events that are more important than the data. For example, if an instrument malfunctions, it is desirable to interrupt the data acquisition process. By the way, this kind of function is easily programmed on stand-alone instruments that support the IEEE-488 interface. The SRQ line of that interface can be tied to an IRQ line of the computer to automatically generate an interrupt signal for any contingency that you have programmed the instrument to monitor.
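To put the figure of over 80 clock cycles per interrupt in perspective, the arithmetic below assumes the original IBM PC's 8088 running at 4.77 MHz and a hypothetical acquisition rate of 10,000 samples per second; the function name is likewise hypothetical. Since 80 cycles is a lower bound and the handler's own work is not counted, the real share of each sample period would be larger.

```python
def interrupt_cost(cycles, clock_hz, sample_rate_hz):
    """Time spent on interrupt bookkeeping and its share of each sample period."""
    seconds = cycles / clock_hz
    fraction = seconds * sample_rate_hz  # same as seconds / (1 / sample_rate_hz)
    return seconds, fraction


# Assumed: 8088 at 4.77 MHz, 80 cycles of bookkeeping, 10,000 samples/s.
t, share = interrupt_cost(80, 4.77e6, 10_000)
print(f"{t * 1e6:.1f} us per interrupt, {share:.0%} of each sample period")
# 16.8 us per interrupt, 17% of each sample period
```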
If you wanted to use the SRQ line in this way, MTBASIC (which is less expensive than True BASIC) might be an attractive alternative to programming the interrupt-handling routine in assembly, as is normally required in BASIC.

4.0 OVER-RELIANCE ON AUTOMATION

There is a tendency when people computerize to put too much faith in the computer. This generally takes one of two forms. First, it is easy to overtreat data. For example, data will sometimes be smoothed before it is analysed. The significance of statistical information on the degree of fit of smoothed data to a line is, of course, totally opaque. The second trap is to try to make the computer do an analysis that you could better do without it. I will illustrate this point by example. In my data acquisition course at Brandeis, I assigned an experiment in which the student was to make a phase diagram of the acetamide/salicylic acid system. This is an interesting system because it forms a peritectic mixture at a salicylic acid mole fraction of 0.46 and because most of the mole fractions of the system are prone to supercooling. As a result, the cooling curves for this system are a mess. They are very easy to analyse by eye, especially if you fortify your analysis with an observation of the cloud points. But the students invariably tried to write programs to identify the break points for them instead of simply having the computer plot the points and doing the analysis by eye. It is probably possible to write an analysis routine for doing this, but none of my students was ever able to do it.
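The opacity of fit statistics for smoothed data is easy to demonstrate. In this sketch (synthetic data; the slope, noise level, random seed and 9-point window are arbitrary choices, and the function names are hypothetical), a straight line plus random noise is fitted before and after a moving average.

```python
import random


def r_squared(x, y):
    """Coefficient of determination of an ordinary least-squares straight line."""
    n = len(x)
    mx, my = sum(x) / n, sum(y) / n
    sxx = sum((xi - mx) ** 2 for xi in x)
    syy = sum((yi - my) ** 2 for yi in y)
    sxy = sum((xi - mx) * (yi - my) for xi, yi in zip(x, y))
    return sxy * sxy / (sxx * syy)


def moving_average(y, window):
    """Unweighted moving average; the output is shorter by window - 1 points."""
    return [sum(y[i:i + window]) / window for i in range(len(y) - window + 1)]


rng = random.Random(1)
x = list(range(200))
y = [0.05 * xi + rng.gauss(0.0, 1.0) for xi in x]  # straight line plus unit noise

w = 9
xs = x[w // 2: len(x) - w // 2]  # x value at the centre of each window
ys = moving_average(y, w)
print(f"raw R^2 = {r_squared(x, y):.3f}, smoothed R^2 = {r_squared(xs, ys):.3f}")
```

The smoothed series gives the higher R^2 even though the underlying relationship is unchanged. Each smoothed point shares 8 of its 9 raw values with its neighbour, so the points are no longer independent and the usual significance tests for the fit do not apply.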
The point I wanted my students to learn from the phase-diagram exercise was that the computer can be a lot more trouble than it's worth if applied to the wrong problems. You should not bother automating anything that isn't a problem without automation. This is a more obvious point in the abstract than it is in practice.

5.0 PUTTING IT ALL TOGETHER

The last argument I want to make against the use of A/D boards is a systems argument. When you set up an experiment, you normally incorporate one instrument at a time into the setup. You validate the performance of that instrument, then add the next one, and so on. This is a natural way to proceed. When you get to the level of running everything at once, the problems that are left are problems of coordination. You know that because you know that each of the components is performing as expected individually. Using an A/D board on a computer, however, stands this process on its head. You cannot test the performance of your sensors separately from testing the system. You must begin by introducing the computer into the loop. If there is a problem, and there always is, you don't know whether it is in the computer, the A/D board, the sensor, or the coordination between some elements. With stand-alone equipment that is digitally interfaced to the computer, however, you can work as you've always worked. The stand-alone instrument can measure its sensor without being incorporated into the system. After you know that each part is working separately, the problems that remain will clearly be understood as problems of computerization. While I have emphasized the errors that computerization may introduce into a procedure, it may be that the precision lost in computerizing data acquisition is offset by avoiding the errors in your current procedures, e.g., manual data entry. Furthermore, the problems I have discussed are most critical in high-speed, high-precision work. The slower or more qualitative your work is, the less you need to worry about the dangers I have enumerated.
My purpose has been to make you aware of the walls in A/D, rather than to suggest that A/D boards have no use.

REFERENCES

[1] Bradley, David, Assembly Language Programming for the IBM Personal Computer, Prentice-Hall, 1984.
[2] Carr, Joseph, Interfacing Your Microcomputer to Virtually Anything, Tab, 1984.
[3] Clune, Thomas and Karnett, Martin, "Computer-independent IEEE-488 interface between a Biomation 8100 and a microcomputer," Review of Scientific Instruments, Nov. 1984, v. 55 no. 11, p. 1879.
[4] Clune, Thomas, "Interfacing for data acquisition," BYTE, Feb. 1985, v. 10 no. 2, p. 269.
[5] Clune, Thomas, "The IBM CS-9000 lab computer," BYTE, Feb. 1984, v. 9 no. 2, p. 278.
[6] Fenster, Samuel and Ford, Dr. Lincoln, "Salt," BYTE, June 1985, v. 10 no. 6, p. 147.
[7] Kane, Gerry; Harper, Steve; and Ushijima, David, The HP-IL System, Osborne/McGraw-Hill, 1982.
[8] Leibson, Steve, "The input/output primer," BYTE, six-part series from Feb. 1982, v. 7 no. 2 to July 1982, v. 7 no. 7.
[9] Leibson, Steve, The Handbook of Microcomputer Interfacing, Tab.
[10] Liscouski, Joseph, "Connecting computer and experiments: Noise rejection through software," Computer Applications in the Lab, Aug. 1984, v. 2 no. 4, p. 208.
[11] Smith, Bob and Puckett, Tom, "Life in the fast lane," PC Tech Journal, Apr. 1984, v. 1 no. 7, p. 63.

HIGH FREQUENCY WATER QUALITY MONITORING OF A COASTAL STREAM

NORMAN E. DALLEY, INLAND WATERS DIRECTORATE, ENVIRONMENT CANADA, 502-1001 WEST PENDER ST., VANCOUVER, CANADA V6E 2M9

ABSTRACT

High frequency monitoring of a number of water quality indicators was carried out for a one-year period in a Pacific coastal stream. A computer program was written to facilitate presentation and preliminary analysis of the data collected. Application of the program to these data demonstrated a number of interesting short-term variations in the indicators being monitored. This study confirms the conclusion that high frequency monitoring can be an appropriate strategy and concludes that in streams with widely varying discharge, it is the preferred approach. Several limitations in the data acquisition system being used are noted.

INTRODUCTION

This paper reports on the initial part of a study of coastal stream monitoring techniques and strategies. The purpose of the complete study is four-fold: (1) to develop a low-cost, versatile water quality monitoring system, (2) to evaluate the performance of data acquisition systems in the field, (3) to select appropriate statistical methods for analyzing and presenting the high frequency data produced, and (4) to make recommendations, based on this analysis, of appropriate strategies for the monitoring of coastal streams. This paper reports on work directed towards the first two goals.

High frequency monitoring of a selected suite of water quality indicators was undertaken at a site chosen to be typical of coastal streams. The frequency of monitoring desired and the volume of data that would be produced dictated use of a digital data acquisition system in which physical analog signals are converted to digital information. Up to the present, monitoring of the stream has utilized a commercially available data acquisition system with a number of limitations, particularly the difficulty of altering the types of sensors being used. An inexpensive data acquisition system which will overcome these limitations is needed.

Data collected over the period August 1984 to August 1985 at 15-minute intervals indicate large variations in magnitude over very short periods for a number of the variables monitored. Some observed variations were: (1) a rapid and dramatic drop in stream pH correlated with heavy rainfall; (2) a large rapid response of water level to rainfall; (3) a significant diurnal variation of water level and pH during summer months; (4) a wide variation of temperature with a large diurnal frequency component throughout the year; and (5) significant diurnal variations of oxidation-reduction potential.

A computer program was developed to aid in data presentation and analysis. Also under development are a program to remove the types of noise encountered in high frequency environmental monitoring and methods for data analysis using existing software packages.
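The paper does not say how its noise-removal program works. One common technique for removing isolated spikes from high frequency environmental records is a running median, sketched below as an assumption rather than as the author's method; the function name, the window length and the example pH values are hypothetical.

```python
from statistics import median


def running_median(values, window=5):
    """Despike a series by replacing each point with the median of its neighbourhood.

    `window` must be a positive odd integer. Spikes no wider than window // 2
    samples are removed, while sustained steps (such as a real pH drop) are
    preserved. Near the ends the window is truncated to the points that exist.
    """
    if window < 1 or window % 2 == 0:
        raise ValueError("window must be a positive odd integer")
    half = window // 2
    return [median(values[max(0, i - half):i + half + 1]) for i in range(len(values))]


# Hypothetical 15-minute pH readings with a one-sample electrical spike at index 3.
ph = [6.2, 6.2, 6.1, 9.8, 6.1, 6.0, 6.0]
print(running_median(ph))
```

Unlike a moving average, the median does not smear the spike into its neighbours, and a genuine step change that lasts longer than half the window passes through unaltered.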

In addition, reliable methods of data transfer from the acquisition system to the microcomputer and from the microcomputer to the mainframe were implemented.

METHODS

Kanaka Creek, a tributary of the lower Fraser River in southern British Columbia, was selected as the site for this study. The northern portion of the watershed is heavily forested mountain slopes, while the southern portion is lightly populated with small farms and residential areas.

This stream was chosen for several reasons: it exhibits highly episodic flow behaviour typical of Pacific coastal streams; it contains a hydrometric survey station with long-term water quantity records; it is proximal to the city of Vancouver where Water Quality Branch offices are located; it is the site of a Salmon Enhancement Program (SEP) hatchery and has a hatchery manager on site twenty-four hours a day; and it has power and telephone service.

Equipment was installed in the stream and a nearby pumphouse. For the past year of the study a Hydrolab 8000 data acquisition system was used. For a detailed discussion of this system see Whitfield (1984). The system included the data transmitter unit with sensors (pressure, temperature, conductivity, dissolved oxygen, pH and oxidation-reduction potential), the data control unit (logger) and the data management unit (for data transfer). The pH electrode was temperature compensated. Calibration and cleaning of sensors were carried out approximately once every two to three months using standard solutions as described in the Hydrolab 8000 instructions. The transmitter unit was enclosed in a PVC pipe which was anchored to a cement block in the stream bed. The sampling frequency was set at once every 15 min in order to effectively sample even short-term variations.
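The choice of a 15-minute interval can be checked against the sampling theorem: the shortest period that can be resolved is twice the sampling interval, and faster fluctuations do not simply vanish but alias to spurious longer periods. The helper functions below are a minimal sketch of that arithmetic, with times in minutes; the function names are hypothetical.

```python
def nyquist_period(sample_interval_min):
    """Shortest periodic fluctuation (minutes) that the sampling can resolve."""
    return 2.0 * sample_interval_min


def apparent_period(true_period_min, sample_interval_min):
    """Period (minutes) at which a sinusoid appears after sampling.

    A frequency f sampled at rate fs folds back (aliases) to |f - k*fs|,
    where k is the integer nearest to f / fs.
    """
    f = 1.0 / true_period_min
    fs = 1.0 / sample_interval_min
    f_alias = abs(f - round(f / fs) * fs)
    return float("inf") if f_alias == 0 else 1.0 / f_alias


print(nyquist_period(15))         # 30.0
print(apparent_period(20, 15))    # about 60: a 20-minute cycle masquerades as hourly
print(apparent_period(1440, 15))  # about 1440: a diurnal cycle is recovered
```

For example, a hypothetical 20-minute oscillation sampled every 15 minutes would show up as a 60-minute cycle, whereas a diurnal cycle is sampled 96 times per period and is recovered faithfully.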

While a 15-minute sampling interval would theoretically capture fluctuations with periods as short as 30 minutes (see Fritschen and Gay, 1979), in practice we expected to observe phenomena with excursions lasting on the order of hours as a minimum. Data was transferred weekly from the data control unit to an IBM-PC compatible portable microcomputer (Hyperion) using the data management unit and a communications program with a terminal emulator (Dynalogic Info-Tech, 1983). Batteries were changed and the system memory was cleared of data weekly. Batteries (12 V, 20 ampere-hour lead acid Yuasa or Gel Cells) were charged with a Johnson Controls 12 V charger which switched to float charge at 80% charge capacity. Data collected on microcomputer floppy diskettes were edited and transmitted to an IBM mainframe computer at Simon Fraser University using one of several different communications programs (IN:TOUCH, 3101, Crosstalk, Kermit). The mainframe computer was used for further editing, for data analysis and presentation, and for archiving. Programs were written in FORTRAN IV or VS FORTRAN and utilized the Plot Description System of the Michigan Terminal System. Plots were produced with a QMS Lasergrafix 1200 printer or an HP7470A pen plotter.

[FIGURE 1: AES Station - Haney East, daily rainfall readings, August 1984 to September 1985]

[FIGURE 2: Kanaka Creek at SEP Hatchery, manual gauge readings, August 1984 to September 1985]

RESULTS

High Frequency Monitoring

Many of the variables being monitored changed rapidly over short periods of time (within a few hours). Many water quality sites are monitored on a weekly or even monthly basis and would not, of course, demonstrate such short term variations. The rapid response and episodic nature of flow data for this stream are illustrated by daily rainfall (Figure 1, data from AES, Atmospheric Environment Service) and daily gauge height readings (Figure 2). The record for November 1984, a month of heavy rainfall in the Kanaka Creek watershed, illustrates the dramatic drop in stream pH which can occur after a rainstorm (Figure 3a). On four occasions stream pH fell by between one-half and a full pH unit. In two out of three heavy rainfall periods in December 1984, pH dropped dramatically from the normal range of 6.0 to 6.3 down to either 5.2 or 5.4 (Figure 3b). In both cases a significant portion of the drop occurred within a 5-6 hour period, followed by a slow (approximately 4-day) rebound to the normal pH level for that month. Data for July 1985 indicate a marked diurnal variation in pH, water level, and temperature (Figure 4). The two days chosen to illustrate this variation are typical examples of the behaviour of these variables throughout the month. Figure 5 shows that there were also periods with significant diurnal variations of oxidation reduction potential.

Equipment

The Hydrolab sensors were very reliable and remained stable for long periods between calibrations. The major problem causing loss of data or inaccurate readings was the unreliability of the batteries. Even new cells did not hold a charge well after as few as 10 cycles.

Data transfer from the Hydrolab was slow (about 15 minutes for 4000 readings) and required the attachment of the DMU to the data logger and to the microcomputer.

Computer Program

A computer program has been written specifically for the presentation of high frequency environmental data and is available on request from the author. In order to accentuate some types of errors and to provide smoothing for presentation purposes, this program will group data at the user's request, calculating the mean and standard deviation of each group and plotting both. The user selects the number of data points to group. Figure 6a is a graph of true daily average temperatures calculated from 15 minute data, while Figure 6b is a plot of the standard deviation of the raw data grouped eight observations at a time. It should be noted that in many cases examination of the raw data, presented graphically, is sufficient to spot periods of equipment malfunction or otherwise unreliable data (not shown).

FIGURE 3a. Kanaka Creek at SEP Hatchery: in situ sensor data.

FIGURE 3b. Kanaka Creek at SEP Hatchery: in situ sensor data, December 1984.

FIGURE 4. Kanaka Creek at SEP Hatchery: in situ sensor data.

FIGURE 5. Kanaka Creek at SEP Hatchery: in situ sensor data, March 1985.

FIGURE 6a. Kanaka Creek at SEP Hatchery: in situ sensor data, August 1984 to August 1985.

FIGURE 6b. Kanaka Creek at SEP Hatchery: grouped data, standard deviation, August 1984 to August 1985.
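The grouping option of the program can be sketched as follows. This is a minimal Python illustration, not the author's FORTRAN program. `group_stats` is a hypothetical name, and a trailing partial group is simply dropped. The sample (n - 1) standard deviation is assumed because the text does not say which form was used. With 15-minute data, a group of 96 points corresponds to one day and a group of 8 to two hours.

```python
import statistics
from typing import List, Tuple


def group_stats(values: List[float], group_size: int) -> List[Tuple[float, float]]:
    """Split a series into consecutive, non-overlapping groups of
    `group_size` points and return (mean, sample standard deviation)
    for each complete group. A trailing partial group is ignored."""
    if group_size < 2:
        raise ValueError("group_size must be at least 2 to compute a standard deviation")
    stats = []
    for start in range(0, len(values) - group_size + 1, group_size):
        block = values[start:start + group_size]
        stats.append((statistics.mean(block), statistics.stdev(block)))
    return stats


# Two groups of eight readings; the second contains a single spurious spike.
temps = [10.0] * 8 + [10.0] * 7 + [30.0]
for mean, sd in group_stats(temps, 8):
    print(f"{mean:.2f} {sd:.2f}")
# 10.00 0.00
# 12.50 7.07
```

The spike barely moves the group mean but produces a large standard deviation. This is the property that makes a standard-deviation plot useful for flagging suspect data.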

DISCUSSION

Frequency Monitoring

The high frequency data collected show many interesting variations in water quality variables which would not have been noted had less frequent observations been made. Among the variables monitored were pH, temperature, dissolved oxygen, conductivity, water level and oxidation reduction potential. The rapid drop in stream pH is correlated with rainfall events (e.g. Figure 3). Measurements of precipitation pH have been taken for many rainstorms, and some have yielded readings in the range of 4 to 5 pH units. It is thought that this may contribute to the drop in stream pH, along with leaching of organic acids or other chemicals from the forest floor.

The variation in water level during summer months was significant on a diurnal basis in this stream (Figure 4). Since the lowest water levels occur in late afternoon and the highest levels in early morning shortly after sunrise, what is observed may be the result of transpiration by the abundant vegetation in the basin, evaporation caused by solar heating, and/or withdrawal for use by local residents. Similarly, a diurnal variation of oxidation-reduction potential was noted, possibly the result of photoactivation of various chemical species (see Stumm and Morgan, 1981). The wide daily variation in temperature during summer months demonstrates the quick response of this stream to physical environmental influences.

Equipment

A number of limitations were noted for the Hydrolab 8000 system. There was no facility for attaching other sensors, no local equipment servicing was available, the local supplier could not provide schematic diagrams, and any changes of sensor type would be of a permanent nature. The depth and conductivity sensors were not of sufficient sensitivity for the magnitudes being measured. This system could only support a fixed frequency of monitoring and a maximum of 4096 observations during an unattended monitoring interval. Alteration of the collection frequency required taking the unit apart, a difficult procedure to perform in the field. In addition, current drain was high and continual replacement of batteries was required. Another disadvantage was the high cost of a complete system (>US$20,000).

We are planning to test new devices which will allow for dynamic alteration of collection frequency, i.e. based on the values of the variables being monitored. The devices will also contain standard RS-232C interfaces directly on the logging units. They will be built using CMOS technology for low power consumption and reliability at low temperatures, and they will collect up to 10 times as much data as the previously used equipment.
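The planned devices are not described in detail. As a hedged illustration only, one simple rule for value-driven collection frequency is to shorten the sampling interval whenever a reading changes quickly and lengthen it when the variable is steady. The threshold and interval values below are assumptions chosen for the example, not specifications of the new loggers. `next_interval_min` is a hypothetical name.

```python
def next_interval_min(previous: float, current: float,
                      threshold: float = 0.2,
                      fast_min: float = 5.0,
                      slow_min: float = 30.0) -> float:
    """Choose the next sampling interval (minutes) from the latest change.

    If the variable moved by more than `threshold` since the previous
    reading (e.g. 0.2 pH units), sample quickly; otherwise sample slowly.
    All numeric defaults are illustrative assumptions.
    """
    return fast_min if abs(current - previous) > threshold else slow_min


# A rainfall-driven pH drop like those in Figure 3, sampled reading by reading.
ph = [6.2, 6.2, 5.9, 5.5, 5.4, 5.4]
intervals = [next_interval_min(a, b) for a, b in zip(ph, ph[1:])]
print(intervals)  # [30.0, 5.0, 5.0, 30.0, 30.0]
```

A rule of this kind spends the logger's limited storage on the episodes of rapid change while keeping routine periods sparse.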

The cost of these systems with six sensors is estimated to be about U.S.$5000 each. Future work with the high-frequency monitoring systems will focus on continued testing of the new systems and on the selection of sensors, e.g. ion-specific electrodes, for monitoring additional variables. Requirements for satellite transmission are currently being investigated, and this facility will be added in the coming year, as will the ability to activate automatic samplers. The actual reliability and accuracy of the systems in field use will be examined in the coming months.

Computer Program

Graphical output was chosen as the most appropriate for the large volume of data resulting from monitoring six variables every 15 min. for one year (210,000 points).
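The figure of about 210,000 points follows directly from the sampling design: six variables, 96 readings per day, 365 days. The quick check below is written in Python; the helper name is illustrative.

```python
def data_points(variables: int, interval_min: float, days: float) -> int:
    """Total observations for `variables` sensors sampled every
    `interval_min` minutes over `days` days."""
    per_day = 24 * 60 / interval_min
    return int(round(variables * per_day * days))


print(data_points(6, 15, 365))  # 210240
```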

The computer program developed to deal with these data provides graphical output only. The user is prompted interactively for the time period to be analyzed, the type of graphical display (symbols used, presence or absence of a line connecting points) and the number of points to group for determining and plotting averages and standard deviations. Default values exist and can be chosen for most of these options. Data covering anything from one day to several years are handled appropriately. In addition, any format of time-series data can be handled easily, provided each time point of observations is an individual record in the data file and the day, month, and year are provided on each record. Hours and minutes will be used if provided. Data are scaled, and appropriate divisions and labelling of the time axis are determined. When the user chooses an appropriate number of points to group, graphical output of the standard deviations serves to highlight many types of erroneous data.

CONCLUSIONS

High frequency monitoring can provide insights into the variation of water quality indicators that would go unnoticed with monthly, weekly or even daily monitoring. It is an appropriate approach to monitoring in some circumstances. In a stream with highly variable discharge or seasonally low flow rates, rapid excursions in pH and other variables can be expected. Assessment of changes in these variables under such conditions would require high frequency monitoring. Presentation of such high frequency data is best accomplished graphically, as tables of such data prove difficult to comprehend. Data acquisition devices should be flexible and adhere to standards in their input and output functions, and should have large data storage capacity and low power consumption. Reliability and downtime are of prime importance. Information on reliability should be obtained from other users prior to purchase if possible. Sensors and electronic recording devices must be chosen with care to ensure the appropriate sensitivity for the application being considered.

ACKNOWLEDGEMENTS

I would like to thank Mr. John Heaven for his invaluable help, the Greater Vancouver Regional District for their cooperation, and Bev McNaughton, Paul Whitfield and Normand Rousseau for their assistance. The views presented are those of the author and not necessarily those of Environment Canada.

REFERENCES

Dynalogic Info-Tech Corp., IN:TOUCH Communications Program Manual, 1983.

Fritschen, L.J. and L.W. Gay, Environmental Instrumentation, Springer-Verlag, 1979.

Stumm, W. and J.J. Morgan, Aquatic Chemistry - An Introduction Emphasizing Chemical Equilibria in Natural Waters, John Wiley and Sons, 1981.

Whitfield, P.H., Operation of the Hydrolab 8000 System for Collection of Water Quality Data. Yukon River Basin Study, Water Quality Working Group Report No. 5, Inland Waters Directorate, Environment Canada, Vancouver, B.C., 1984.

THE DESIGN OF A COST EFFECTIVE MICROCOMPUTER-BASED DATA ACQUISITION SYSTEM

Kiyohisa Okamura, Ph.D.
Professor of Mechanical Engineering
Bradley University
Peoria, Illinois 61625

Kamyab Aghai-Tabrizi
Electronic Software Engineer
Integrated Technical Systems
Valley City, North Dakota 58072

ABSTRACT

Designing a data acquisition system around a mass produced microcomputer is quite an attractive proposition. The high volume production of commercially available microcomputers makes such a system very inexpensive compared to a general purpose data acquisition system. The microcomputer system provides the designer with the flexibility and versatility of high level language programming while still retaining the capacity for high speed through the use of assembly language software. It is possible for the designer to customize the data acquisition system to his/her own particular specifications. In this paper the design and testing procedure for a data acquisition system using a Commodore 64 computer, one of the best selling and lowest cost computers, is presented. The unique features of the Commodore 64 hardware and software are utilized to simplify the actual design. Some of the design pitfalls are pointed out. Some of the experimental results obtained with the data acquisition system are shown.

INTRODUCTION

The microprocessor-based data acquisition system plays an increasingly important role in modern instrumentation. Data acquired in digital form can be used for various analyses as well as for a data-base and for plotting, which used to be the main purpose of data acquisition in the pre-computer age. An engineer or a scientist who needs a microprocessor-based data acquisition system may have to decide among the following three alternatives: (1) purchase a data acquisition system tailored to the user's specifications; (2) purchase a commercially available general purpose data acquisition system; or (3) design and construct a data acquisition system, possibly using a mass produced microcomputer system.

Alternative 1 requires the minimum effort on the user's part. He/she can specify the system to be a turn-key type, and it then becomes the manufacturer's job to make the system user-friendly and fool-proof. Quite understandably, this alternative will prove to be the most expensive. In addition, the system probably may not be available immediately.

Alternative 2 is frequently utilized. Since the system is produced in large quantity, alternative 2 costs less than alternative 1 does, and the system is more readily available to the user. However, since the system is not tailored to the user's exact specifications, he/she may have to modify the system depending on the situation.

Alternative 3 is usually the least expensive. Since the base unit is a complete computer system, the user can easily develop customized software. In addition, the built-in software subroutines or kernel of the computer can be fully utilized. This paper presents the fundamentals necessary for the development of a data acquisition system based on a microcomputer system [1].

The Commodore 64 (C-64) microcomputer system has been selected as the base unit of the system. This microcomputer was the best-selling computer at the time this data acquisition system was developed, and the price of the computer system was among the lowest while it satisfied the requirements for a data acquisition system. However, it should be emphasized that many other computer systems may also be well qualified as a base unit for a data acquisition system. Many hardware configurations other than that shown in this paper are possible for interfacing the computer with the real analog world. Likewise, an infinite variety of software can be developed. An attempt has been made to simplify the circuitry and software as much as possible.

1. SYSTEM OVERVIEW

Fig. 1 illustrates a block diagram of a Commodore 64-based data acquisition system (C64DAS) and its signal flow. The system consists of three subsystems:

- a Commodore 64 microcomputer system (C64S), the core of which is a Commodore 64 (C-64) microcomputer;
- a signal conditioning system (SCS);
- an interfacing system (IFS).

The entire process of data acquisition is under the control of the C-64. The purpose of the microcomputer-based data acquisition system is to convert electrical analog signals coming from sensors/transducers to digital data and store them temporarily or permanently in the computer system.

Fig. 1 Commodore 64 Data Acquisition System (sensors/transducers, signal conditioners (SCS), interfacing system (IFS), Commodore 64 microcomputer system (C64S) with disk drive unit, and main frame computer IBM 3081-D24)

In the following, each of these subsystems and its elements will be discussed.

2. SIGNAL CONDITIONING SYSTEM (SCS)

The output signal coming directly from a transducer is seldom suitable as an input to the interfacing system (IFS). The IFS requires the input to be within certain voltage limits; for the system presented in this paper these limits are 0 and +5 V. If the signal is too large, it must be attenuated. On the other hand, if the signal is too small, the resolution of the acquired data suffers, and the signal should be amplified. Therefore, the signal conditioner should include an amplifier which adjusts the signal to fit the voltage range. It is also desirable for the amplifier to be equipped with a bias control.

The signal may contain noise. The source of noise varies, but among the most frequently encountered is 60 Hz noise radiated from power lines. The use of shielded wire(s), with the shield properly grounded, usually reduces 60 Hz noise to a negligible level. Another function desirable for the signal conditioner is noise reducing capability. If the noise is essentially of common mode type, a differential amplifier may be used, e.g., the simple, low cost two stage amplifier shown in Fig. 2, which consists of two operational amplifiers [2]. On many occasions the authors have used this scheme, and it does work. If high input impedance, low drift and high performance are desired, a commercially available instrumentation amplifier is recommended (Example: Analog Devices AD500 and 600 series).

Fig. 2 Simplified Instrumentation Amplifier (with gain control and bias control)

If the noise is not of common mode nature, or if the noise level is excessive and cannot be reduced by a differential or instrumentation amplifier, filtering is necessary. Filtering is accomplished either by hardware as part of the signal conditioner or by software in a computer, i.e., in the C-64 or the main frame computer. An example of a simple active filter is shown in Fig. 3. The detailed procedure of active filter design can be found in Refs. [2] and [3].

Fig. 3 Example of Second Order Butterworth Active Low Pass Filter

Filtering, however, also reduces the bandwidth of a signal. Therefore, filter design is a compromise between noise reduction and signal bandwidth [2]. Noise reduction is also possible by software, using digital filters. One of the simplest digital filters is the rectangular window filter (r.w.f.). It is important to note that although the r.w.f. filters noise of moderate frequencies reasonably well, it is quite ineffective in suppressing glitch type noise. The Olympic averaging filter [4] is as simple as the r.w.f. and is usually effective in reducing glitches.
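The contrast between the two digital filters can be illustrated with a short Python sketch. Both filters are written here as sliding windows of n samples. The text does not say whether the r.w.f. slides one sample at a time or averages non-overlapping blocks, so the sliding form is an assumption. The Olympic filter follows its usual definition: discard the highest and lowest values in each window and average the rest. The function names are hypothetical.

```python
from typing import List


def rectangular_window(x: List[float], n: int) -> List[float]:
    """Rectangular window filter: mean of each run of n consecutive samples."""
    return [sum(x[i:i + n]) / n for i in range(len(x) - n + 1)]


def olympic_average(x: List[float], n: int) -> List[float]:
    """Olympic averaging filter: in each window of n samples, discard the
    largest and smallest values and average the remaining n - 2."""
    if n < 3:
        raise ValueError("window must hold at least 3 samples")
    out = []
    for i in range(len(x) - n + 1):
        w = sorted(x[i:i + n])
        out.append(sum(w[1:-1]) / (n - 2))
    return out


signal = [1.0, 1.0, 1.0, 9.0, 1.0, 1.0, 1.0]  # steady level with one glitch
print(rectangular_window(signal, 5))  # [2.6, 2.6, 2.6]
print(olympic_average(signal, 5))     # [1.0, 1.0, 1.0]
```

The rectangular window smears the glitch into every output that overlaps it. The Olympic filter removes it entirely, because the glitch is always the value discarded as the window maximum.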

3. INTERFACING SYSTEM (IFS)

Analog signal(s) must be converted to digital signal(s) before being transmitted to the computer. The analog-to-digital conversion is executed by an analog-to-digital converter (ADC). If there is more than one signal, i.e., multichannel signals, they must be multiplexed. Thus, the IFS must perform two functions: multiplexing and A/D conversion.

ICs (Integrated Circuits) which integrate these two functions on one chip are available. However, another approach is adopted in this paper: the use of two separate chips, a multiplexer (MUX) and an ADC. Fig. 4 illustrates an example of the IFS, which consists of a CD4051 (8-channel MUX) and a low cost ADC0804 (an analog-to-digital converter using the successive approximation method).

Fig. 4 Interface System for Commodore 64 Data Acquisition System (CD4051 MUX with input channels, ADC0804, connections to C-64 game port 1 and user port CN2; pin N of user port CN2 = ground)

The advantage of the two-chip approach is as follows. Suppose many transducers of the same type are used, e.g., strain gauges. Rather than assigning a signal conditioner to each transducer, a single signal conditioner shared by all the sensors can be placed between the MUX and the ADC. In addition, the two chips are less expensive than one chip which does the equivalent. If the signals are differential, the CD4051 should be replaced with a CD4052, which can multiplex four differential inputs.

The resolution of an ADC chip may be 8 bits, 10 bits, 12 bits or 16 bits. More bits result in better resolution. For example, the error of an 8 bit conversion is 0.39% of full scale, while the error of a 12 bit conversion is only 0.024%. It is important to note, however, that resolution is not precision or accuracy. A data acquisition system can produce no more accuracy than the transducer to which it is connected, and few transducers are accurate enough to take full advantage of high resolution. An 8 bit converter is much faster and less expensive than a 12 bit converter. The circuitry and software using an 8 bit converter are also much simpler than those using a 10 or 12 bit converter, especially for high speed data acquisition. It is no wonder that many of the nation's leading manufacturers of data acquisition systems offer mostly, or sometimes exclusively, 8 bit converter systems at the present time. The situation will probably change in the future.

There are three types of ADC generally used: the dual slope method, the successive approximation method, and the flash method. They differ from each other in terms of conversion speed and accuracy. The dual slope method is essentially an integration method. Since integration has a low pass filter characteristic, it reduces high frequency noise. This characteristic is both an advantage and a disadvantage: while the filtering reduces noise, it limits the bandwidth of the signal. Combined with a high performance instrumentation amplifier, the dual slope method can achieve very high accuracy. However, its conversion time is typically a few ms. The successive approximation method in general executes conversion much faster than the dual slope method; typically, its conversion time is 30 µs. The low cost ADC0804 chip, which employs the successive approximation method, is capable of 100 µs conversion. If a very high conversion speed is required, a flash converter may be used. The flash method, which utilizes many resistors and comparators to convert the signal in parallel, attains conversion speeds in the megasamples/sec range (e.g., a Matsushita chip). In data acquisition of an extremely fast signal, when the rise time of the signal is as fast as the conversion time, errors in conversion may occur which appear as virtual noise. This problem can be resolved by employing a sample-and-hold circuit. For general purpose data acquisition, the successive approximation method is generally preferred.

The channel selection pins A, B and C (pins 11, 10 and 9) of the MUX are connected to PB0, PB1 and PB2 of Port B of CIA #1 (Complex Interface Adapter), which is actually part of game port 1 of the C-64, as shown in Fig. 4. These pins are used to select a channel. For example, if channel 5 is desired, PB0 and PB2 are set to 1 (high) and PB1 to 0 (low); i.e., 101 (binary), which equals 5 (decimal), is stored in memory location 56321. The range of input voltages is limited by VDD and VEE, the higher and lower voltage references of the MUX, which in this case are +5 V and 0 V (ground), respectively. If a wider range of analog inputs is desired, VDD and VEE can be connected to higher and lower voltages. In this way, channel selection can be executed simply by storing a channel number in memory location 56321, and no additional chips are necessary.
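The successive approximation principle used by the ADC0804 can be sketched as a binary search on the input voltage. The same sketch shows where the 0.39% and 0.024% figures come from: one least significant bit is 1/2^n of full scale. This is an idealized Python model. It assumes a perfect internal DAC whose step is v_ref/2^n, with no offset or noise, and it does not describe the chip's actual circuitry.

```python
def sar_convert(v_in: float, v_ref: float = 5.0, bits: int = 8) -> int:
    """Idealized successive approximation conversion.

    Starting from the most significant bit, each bit is tentatively set
    and kept only if the DAC output for the trial code does not exceed
    v_in. The DAC is assumed to output code * v_ref / 2**bits.
    """
    code = 0
    for bit in reversed(range(bits)):
        trial = code | (1 << bit)
        if trial * v_ref / (1 << bits) <= v_in:
            code = trial
    return code


def lsb_percent(bits: int) -> float:
    """Size of one least significant bit as a percentage of full scale."""
    return 100.0 / (1 << bits)


print(sar_convert(2.5))  # 128
print(sar_convert(1.0))  # 51
print(f"{lsb_percent(8):.3f} {lsb_percent(12):.4f}")  # 0.391 0.0244
```

An 8 bit conversion needs only eight compare steps, one per bit. This is why the method is much faster than dual slope integration, which must integrate over a fixed time.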

Caution must be exercised, however. This port is also connected to the keyboard matrix. When the C-64 is turned on, the bootstrapping procedure assigns CIA #1 Port B as an input port by poking its Data Direction Register (DDR) at mem.loc. 56323 with 0. Port B acts as the columns of the keyboard matrix, which makes the keyboard work. Therefore, if this port is to be used as an output port to control channel selection, 255 (11111111B) must be poked into the DDR. Once this is done, the keyboard becomes disabled, i.e., communication through the keyboard is impossible. Hence, the DDR (mem.loc. 56323) must be poked with 0 as part of the data acquisition routine before control returns to the keyboard. More detailed information can be obtained from Refs. [7] and [10].

The selected analog input is automatically connected to pin 3, the output of the MUX, which is in turn connected to pin 6 of the ADC. In this case the input is ground-referenced, with pin 7 connected to ground. It is also possible to take a differential input by using pins 6 and 7, with pin 7 disconnected from ground. Pin 9 corresponds to REF/2, i.e., one half of the reference voltage. Here REF/2 = +2.5 V; therefore, the reference voltage is +5 V. This means that +5 V is the full scale voltage to which the maximum binary number 11111111 (255) is assigned. The conversion speed depends on the ADC clock, which in this case is determined by a resistance and a capacitance [6]. It is also possible to connect an external clock to pin 4 with the resistor and capacitor removed.

Since CS and RD are both held low, the WR signal (pin 3) alone triggers the start of the conversion. When the conversion is completed, the INT signal (pin 5) goes low. This signal can be utilized for communication between the C-64 and the ADC. However, the data acquisition routine can be made complete without this connection, since the A/D conversion is always completed before the computer takes the new data in; that is, the conversion speed of the RC combination outpaces the authors' software on the C-64. If, however, the data acquisition routine is executed by the microprocessor (CPU 6510) faster than the conversion speed of the ADC, this kind of communication is necessary, since the CPU has to wait for the conversion before taking the new data in.
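The register-level sequence described for channel selection and reading can be summarized in a small Python model. The model uses the memory locations given in the text: CIA #1 Port B (56321) and its DDR (56323) for channel selection, CIA #2 Port B (56577) and its DDR (56579) for the converted data. It also follows the text's assignment of 255 to the +5 V full scale. It only lists the PEEK/POKE steps a BASIC routine would perform and shows how a reading is scaled; it does not talk to hardware. The function names are hypothetical.

```python
from typing import List, Optional, Tuple

CIA1_PORT_B = 56321  # MUX channel select lines PB0-PB2 (shared with keyboard)
CIA1_DDR_B = 56323   # data direction register for CIA #1 Port B
CIA2_PORT_B = 56577  # user port CN2: 8 bit ADC output
CIA2_DDR_B = 56579   # data direction register for CIA #2 Port B


def acquisition_steps(channel: int) -> List[Tuple[str, int, Optional[int]]]:
    """Ordered (operation, address, value) steps for one reading."""
    if not 0 <= channel <= 7:
        raise ValueError("CD4051 has channels 0-7")
    return [
        ("POKE", CIA2_DDR_B, 0),         # user port as input (ADC data)
        ("POKE", CIA1_DDR_B, 255),       # Port B as output; keyboard disabled
        ("POKE", CIA1_PORT_B, channel),  # bits 0-2 select the MUX channel
        ("PEEK", CIA2_PORT_B, None),     # read data; PC2 pulse starts next conversion
        ("POKE", CIA1_DDR_B, 0),         # restore keyboard before returning
    ]


def count_to_volts(count: int, full_scale: float = 5.0) -> float:
    """Scale an 8 bit reading, with 255 assigned to full scale."""
    return count * full_scale / 255


print(acquisition_steps(5)[2])       # ('POKE', 56321, 5)
print(round(count_to_volts(128), 3))  # 2.51
```

Each read of 56577 starts the next conversion, so the value obtained is the result of the conversion started by the previous read. A routine may therefore need to discard the first reading after changing channel. This is an inference from the described hand-shake, not a statement made in the text.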

The converted 8-bit data appears at pins 18-11, with pin 18 the least significant bit (LSB). This data is transmitted to port B of CIA #2 of the C-64, which is denoted as User Port CN2 in Fig.4. This port is assigned mem.loc. 56577. The corresponding DDR (mem.loc. 56579) must be poked with 0 in advance so that the port acts as an input. When mem.loc. 56577, which contains the data, is peeked by BASIC or assembly language, the C-64 automatically sends a negative pulse to PC2, which in turn triggers the start of the next conversion. This feature of the CIA simplifies the data acquisition. Once the data has been stored in mem.loc. 56577 by the hardware, it can be transferred to RAM (Random Access Memory), and mem.loc. 56577 will be ready to store the next data.
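If the interface is reproduced on a modern host, the byte read from mem.loc. 56577 can be scaled to volts as sketched below. This is a minimal sketch under stated assumptions. It assumes a unipolar 8-bit conversion in which the maximum code 255 corresponds to the full-scale voltage, and it uses a hypothetical 5.0 V full scale. The function name `code_to_volts` is illustrative and is not part of the authors' software. The hexadecimal constants simply restate the decimal addresses quoted above.

```python
# C-64 addresses quoted in the text, restated in hexadecimal.
CIA1_DDRB = 0xDC03  # 56323: DDR of the port used for channel selection
CIA2_PRB = 0xDD01   # 56577: user port data register holding the converted byte
CIA2_DDRB = 0xDD03  # 56579: its DDR, poked with 0 so the port acts as an input


def code_to_volts(code: int, full_scale: float = 5.0, bits: int = 8) -> float:
    """Scale a raw ADC code to volts, assuming the maximum code equals full scale."""
    max_code = (1 << bits) - 1  # 255 for an 8-bit converter
    if not 0 <= code <= max_code:
        raise ValueError(f"code must lie in 0..{max_code}")
    return code * full_scale / max_code


if __name__ == "__main__":
    for raw in (0, 128, 255):  # hypothetical bytes as PEEKed from 56577
        print(raw, round(code_to_volts(raw), 3))
```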

4. SOFTWARE DEVELOPMENT

The data acquisition program can be written in either BASIC or assembly language. The acquired data must be stored in a block of free RAM space. The fastest conversion speed attainable with BASIC was found to be only a few tens of samples/sec. If a faster conversion speed is required, the program may have to be written in assembly language. With assembly language programming, a speed of several thousand samples/sec is easily attained.

The acquired data can be displayed graphically on the CRT using the bitmapping scheme. Bitmapping with BASIC is possible but very slow: a screenful of bitmapping takes a few minutes. An assembly language program is therefore strongly recommended for bitmapping. The authors have developed one that takes less than one second for a screenful of display.

The software package developed by the authors consists of two parts, DACQ and DATALOG. DACQ is for relatively fast data acquisition (maximum 4300 samples/sec) and is suitable for obtaining data from a transient phenomenon. DATALOG is similar to many commercially available dataloggers. It is suitable for monitoring slowly changing physical quantities, e.g., recording atmospheric temperature and hourly energy consumption in a building. Both programs are menu-driven and self-explanatory, and both provide data acquisition, display and transmission. The main features of DACQ and DATALOG are summarized below.

DACQ was introduced in Ref.[1]. It has the following features:
(1) Number of channels: 3.
(2) Speed of sampling: Programmable up to 4300 samples/sec. Standard sampling rates of 10, 100 and 1000 samples/sec are available from the menu.
(3) Number of data to be stored: Programmable. The default results in 320 samples/ch.
(4) Non-volatile storage of data: Sequential file on disk.
(5) Display: One, two or three channels of data vs time, selectable from the menu. Resolution of the display: 320 x 200 for one channel.
(6) Data transmission: RS232 or MODEM.
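The menu settings above translate directly into record lengths. The sketch below rests on two assumptions the text does not confirm: that the quoted rates apply per channel, and that the default of 320 samples/ch is kept. The constant and function names are hypothetical.

```python
MAX_RATE = 4300               # samples/sec, the programmable upper limit quoted for DACQ
MENU_RATES = (10, 100, 1000)  # standard rates offered from the menu
DEFAULT_SAMPLES = 320         # default number of samples per channel


def record_duration(samples_per_channel: int, rate: float) -> float:
    """Seconds of signal captured for a given number of samples at a given rate."""
    if not 0 < rate <= MAX_RATE:
        raise ValueError(f"rate must lie in (0, {MAX_RATE}] samples/sec")
    return samples_per_channel / rate


if __name__ == "__main__":
    for rate in MENU_RATES:
        print(f"{rate:>5} samples/sec -> {record_duration(DEFAULT_SAMPLES, rate):g} s")
```

Under these assumptions, a default record spans 32 s, 3.2 s or 0.32 s at the three menu rates. The default of 320 samples/ch equals the 320-pixel horizontal resolution quoted for the display.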

DATALOG is written in BASIC only. Its main features are as follows:
(1) Number of channels: 8, or 16 with the MUX modification (Fig.5).
(2) Speed of sampling: Programmable anywhere from every few seconds to every few hours.
(3) Number of data to be stored: Programmable at the menu prompt.
(4) Non-volatile storage of data: Sequential file on disk.
(5) Display: Shares the DACQ display program. Load and run DACQ, then select Display from the menu.
(6) Data transmission: RS232 and MODEM.

Fig.5 Sixteen Channel Multiplexing (schematic: two MUX CD4051 chips, selected through PB0, PB1 and PB3 of C-64 PORT 1)

When the DATALOG program is activated, the 16 channels of data are shown on the CRT, and the displayed data are updated at every sampling period. The CRT also shows the number of samplings made and the time elapsed since the beginning of data acquisition.

5. EXPERIMENTAL RESULTS

Figs.6 & 7 show examples of experimental results obtained with the data acquisition system presented in this paper. Fig.6 is a plot of cam velocity vs time. A velocity transducer taken out of an engine was attached to a cam-follower assembly. An electric motor drove the assembly, and the output of the transducer was connected to the C64DAS. The C64DAS acquired the digitized signal using the DACQ program. The data were sent to the main frame computer and plotted with the SAS PLOT program.

Fig.6 Cam Velocity Experiment Data by DACQ

Fig.7 shows another example: a record of solar intensity monitored by a photovoltaic cell. The DATALOG program was used for this experiment.

Fig.7 Solar Intensity Experiment Data by DATALOG (SOLAR INTENSITY vs time, A.M.)

6. CONCLUSION

There is a belief among academic people that a Commodore is a toy. More video games have probably been developed for the C-64 than for any other computer, so this belief is not without foundation. At the same time, however, we have to recognize that "1" is "1" and "0" is "0" no matter what computer is used. No sophisticated computer can produce a "1" of better quality than the "1" created by the C-64. Of course, there are many limitations, such as speed and memory capacity, in using the C-64 as the base unit of a data acquisition system. But in many cases C64DAS produces satisfactory results.

Since the authors published an article in BYTE [1], they have received over one hundred letters of inquiry. Many of them asked specific technical questions. The authors were especially surprised by the many letters from Europe. It is also interesting that one of the nation's leading manufacturers of hydraulic equipment has been using a C-64 hooked to million-dollar equipment. A very large hospital has also been using C-64's to monitor medical equipment.

The data acquisition software package is available from the first author for a nominal charge, payable to his department. The package includes the DACQ and DATALOG programs on a disk and a user's manual. If you use this package very often, consider transferring the programs to EPROM and putting them in a game cartridge. You really can make a sophisticated toy.

REFERENCES
[1] C. Okamura and Aghai-Tabriz, "A Low-Cost Data-Acquisition System," BYTE, February 1985, pp. 199-202.
[2] C. Okamura, "Measurements Laboratory Note," Dept. of MEAM, North Dakota State University.
[3] H.M. Berlin, Design of Active Filters with Experiments, Blacksburg.
[4] C. Okamura and W. Chu, "Olympic Averaging Method as a Digital Filter," to be published in the Journal of Computer Applications (ACCESS).
[5] H.M. Morris, "A/D and D/A Converters Link Digital Controls to an Analog World," Control Engineering, December 1984, pp. 55-57.
[6] National Semiconductor, Linear Databook.
[7] Intersil, Data Acquisition Handbook.
[8] Analog Devices, Integrated Circuits Databook.
[9] Commodore Business Machines, Commodore 64 Programmer's Reference Guide.
[10] Aghai-Tabriz, Data Acquisition based on Commodore 64, M.S. Thesis, Dept. of MEAM, North Dakota State University, 1985.

ON THE ESTIMATION OF MONTHLY MEAN PHOSPHORUS LOADINGS
M.E. Thompson and K. Bischoping
University of Waterloo

ABSTRACT
This paper reports on a study of the estimation of monthly mean phosphorus loadings, using daily readings from the Niagara River for 1975-1982. Two alternatives to the current method of accounting for missing data are proposed from finite population sampling theory.

1. THE PROBLEM

Suppose we wish to measure the mean daily amount of a chemical such as phosphorus flowing past a certain location in a river over a short period of time such as a month. Formally, this can be expressed as

$$\mu_Y = \frac{1}{N}\sum_{i=1}^{N} y_i = \frac{1}{N}\sum_{i=1}^{N} c_i x_i, \qquad (1.1)$$

where $N$ is the number of days in the period, $x_i$ is the flow past the location on day $i$, $c_i$ is the concentration of phosphorus in the water on day $i$, and the "loading" for day $i$ is $y_i = c_i x_i$. Typically the flow $x_i$ is known for all $N$ days, but the concentration $c_i$ is measured only for a sample $s$ of $n$ days. If we assume for simplicity that there is no technical error in the measurement of concentration when it does take place, then the $y_i$ are known for the days in the sample $s$. The problem of estimating $\mu_Y$ is then one of estimating a finite population mean from sample values of the variate. The failure to measure concentration on certain days is not controlled, so there is no reason to believe that the sample $s$ is generated by anything approaching a random sampling design. The choice of estimator therefore cannot be convincingly justified by an appeal to randomization-based properties, although this has sometimes been attempted. For example, Dolan et al. (1981) studied several ways of estimating $\mu_Y$ and concluded that the best choice was a stratified

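version of the Beale-type ratio estimator

$$\hat\mu_{YD} = \mu_X\,\frac{\bar y}{\bar x}\left\{\frac{1 + s_{xy}/(n\bar x\bar y)}{1 + s_x^2/(n\bar x^2)}\right\}. \qquad (1.2)$$

See also Lam et al. (1983). Here

$\mu_X = \left(\sum_{i=1}^{N} x_i\right)/N$ = mean daily flow,
$\bar y = \left(\sum_{i\in s} y_i\right)/n$ = sample mean daily loading,
$\bar x = \left(\sum_{i\in s} x_i\right)/n$ = sample mean daily flow,
$s_x^2 = \left[\sum_{i\in s} x_i^2 - n\bar x^2\right]/(n-1)$ = sample variance of flow,
$s_{xy} = \left[\sum_{i\in s} x_i y_i - n\bar x\bar y\right]/(n-1)$ = sample covariance of flow and loading.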
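A minimal sketch of a Beale-type ratio estimator is given below. We assume its unstratified form is $\hat\mu_{YD} = \mu_X(\bar y/\bar x)\{1 + s_{xy}/(n\bar x\bar y)\}/\{1 + s_x^2/(n\bar x^2)\}$, using the sample moments defined above. The stratified version used by Dolan et al. is not shown. The function name is hypothetical.

```python
from statistics import mean


def beale_ratio_estimate(x_sample, y_sample, mu_x):
    """Beale-type bias-corrected ratio estimate of the mean daily loading.

    x_sample: flows on the sampled days; y_sample: loadings on the same days;
    mu_x: mean daily flow over all N days of the period.
    """
    n = len(x_sample)
    if n < 2 or n != len(y_sample):
        raise ValueError("need at least two paired observations")
    x_bar, y_bar = mean(x_sample), mean(y_sample)
    s_x2 = sum((x - x_bar) ** 2 for x in x_sample) / (n - 1)
    s_xy = sum((x - x_bar) * (y - y_bar) for x, y in zip(x_sample, y_sample)) / (n - 1)
    correction = (1 + s_xy / (n * x_bar * y_bar)) / (1 + s_x2 / (n * x_bar**2))
    return mu_x * (y_bar / x_bar) * correction


if __name__ == "__main__":
    # Made-up data with constant concentration 2: the estimate should be 2 * mu_x.
    print(beale_ratio_estimate([1, 2, 3, 4], [2, 4, 6, 8], mu_x=2.6))
```

When the loading is exactly proportional to flow, the correction factor equals 1, and the estimator reduces to the classical ratio estimator.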

Without the factor in braces, $\hat\mu_{YD}$ is the classical ratio estimator of $\mu_Y$ (Cochran, 1977). The factor in braces is designed to correct for bias when simple random sampling has been used to select the sample. The estimator (1.2) was the only one considered by Dolan et al. that took into account the knowledge of all values of the flow $x_i$. It is therefore not surprising that it performed relatively well. From the standpoint of recent developments in sampling theory, however, this choice can be criticized. First, as indicated above, the sample of days for which concentrations are available is not necessarily random. Second, there are estimators other than the ratio estimator that would use the flow data fully in estimating $\mu_Y$. The choice among these ought ideally to depend on a model for the daily concentration. It can be shown that the ratio estimator would be optimal if the variance of the concentration varied as 1/flow (Royall, 1971). In data from the Niagara River for 1975-1982, this inverse relationship does not appear to hold, although in the winter months flow decreases and concentration fluctuates a little more wildly. The most widely accepted justification for the ratio estimator therefore does not apply in this case. Since the sample is not random and some serial autocorrelation in the concentration series seems likely, a predictive or model-based approach to the estimation is indicated.

In this approach $y_1, \ldots, y_N$ are jointly distributed random quantities. The sampled $y_i$ are used to "predict" the sum of the unsampled $y_i$, and hence $\mu_Y$. It is also to be hoped that the model will yield suitable estimates of uncertainty in the prediction of $\mu_Y$. An examination of the Niagara River daily series for 1975-1982 indicates that the mean level and covariance structure of the concentration series vary seasonally. Within most months, however, the log concentration series appears approximately stationary and Gaussian. For estimating monthly means, therefore, no stratification seems to be necessary, although for estimating yearly means it would be desirable to stratify the series by season or by month. In the next sections we present some estimators based on simple models for concentration or its logarithm. The behaviour of these estimators and of the corresponding uncertainty estimates is examined in a small-scale empirical study involving artificial deletions from two months for which complete data are available.

2. SOLUTIONS FROM THE PREDICTIVE APPROACH TO SAMPLING THEORY

2.1 Estimators based on a zero correlation model for concentration

Suppose it is reasonable to assume, for simplicity, that the $c_i$ are stationary with mean $C$, variance $\sigma^2$, and zero autocorrelation. In this section they are not assumed to be Gaussian. Then the best linear unbiased estimator of $C$ is

$$\hat C = \frac{1}{n}\sum_{i\in s} c_i. \qquad (2.1)$$

The best linear unbiased estimator (predictor) of $\mu_Y$ is

$$\hat\mu_{YI} = \frac{1}{N}\left[\sum_{i\in s} y_i + \hat C\sum_{i\notin s} x_i\right] \qquad (2.2)$$

(Royall, 1971). Since

$$N(\hat\mu_{YI} - \mu_Y) = \sum_{i\notin s} x_i\,(\hat C - c_i), \qquad (2.3)$$

the mean squared error $E(\hat\mu_{YI} - \mu_Y)^2$, in the sense of the above model, can be estimated in a robust manner by the estimator $V_I$ (2.4). See Royall and Cumberland (1978) for a discussion of the sense in which (2.4) is robust to departures from the assumed constant variance of the $c_i$ series. If the $c_i$ are close to normally distributed, or the $x_i$ are not highly variable, it is reasonable to apply a normal approximation to the distribution of $N(\hat\mu_{YI} - \mu_Y)$ as exhibited in (2.3).
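A sketch of (2.2), together with a model-based MSE, is given below. It marks missing concentrations with `None`. It also plugs the sample variance of the sampled $c_i$ into $\sigma^2[(\sum_{i\notin s}x_i)^2/n + \sum_{i\notin s}x_i^2]/N^2$. That expression follows from (2.3) when the $c_i$ are uncorrelated with a common variance. It is an illustrative assumption, not the robust estimator (2.4) of Royall and Cumberland. The names are hypothetical.

```python
from statistics import mean, variance


def iid_estimate(flow, conc):
    """Estimator (2.2) of the mean daily loading and a model-based MSE.

    flow: daily flows x_1..x_N (all known).
    conc: daily concentrations c_1..c_N, with None for missing days.
    """
    if len(flow) != len(conc):
        raise ValueError("flow and conc must have the same length")
    N = len(flow)
    sampled = [i for i, c in enumerate(conc) if c is not None]
    missing = [i for i, c in enumerate(conc) if c is None]
    n = len(sampled)
    if n < 2:
        raise ValueError("need at least two sampled concentrations")
    c_sample = [conc[i] for i in sampled]
    c_hat = mean(c_sample)                      # (2.1)
    x_missing = [flow[i] for i in missing]
    total = sum(conc[i] * flow[i] for i in sampled) + c_hat * sum(x_missing)
    mu_hat = total / N                          # (2.2)
    sigma2 = variance(c_sample)                 # plug-in estimate of sigma^2
    mse = sigma2 * (sum(x_missing) ** 2 / n + sum(x * x for x in x_missing)) / N**2
    return mu_hat, mse


if __name__ == "__main__":
    print(iid_estimate([10, 20, 30, 40], [1.0, None, 3.0, None]))
```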

2.2 Estimators based on time series interpolation for concentration

Interpolation-type estimators may be said to take the form

$$\hat\mu_Y = \frac{1}{N}\left[\sum_{i\in s} y_i + \sum_{i\notin s}\hat c_i x_i\right], \qquad (2.5)$$

where $\hat c_i$ is an estimated or interpolated value for $c_i$ from the sample. Estimator (2.2) is in fact of this type, with $\hat c_i = \hat C$ for unsampled $i$. It can be shown that the ordinary ratio estimator follows the same pattern, with $\hat c_i = \bar y/\bar x$. In this section, however, estimators with variable $\hat c_i$ will be derived, as seems appropriate under models that take account of the time series structure of the data. For the Niagara River data, the logarithm of concentration appears symmetrically distributed about its mean, so consider the model

$$z_i = \ln c_i = \lambda + \eta_i, \qquad (2.6)$$

where $\eta_i$, $i = 1, \ldots, N$, is a mean 0 Gaussian time series with autocorrelation function $\rho_{jk} = \mathrm{corr}(\eta_j, \eta_k)$ and variance $\sigma^2$. Under this model $c_i$ is marginally lognormal, with $E c_i = \exp\{\lambda + \sigma^2/2\}$ and

$$\mathrm{cov}(c_j, c_k) = \exp\{2\lambda + \sigma^2(1 + \rho_{jk})\} - \exp\{2\lambda + \sigma^2\}.$$

Assume to begin with that $\sigma^2$ and the $\rho_{jk}$ are known. If $\ln\hat c_i$ denotes the best linear unbiased predictor of an unsampled $\ln c_i$, then it can be shown (Bartlett, 1983; Ripley, 1981) that

$$\ln\hat c_i = \hat\lambda + \sum_{j\in s} a_{ij}(\ln c_j - \hat\lambda), \qquad (2.7)$$

where $\hat\lambda$ is the best unbiased estimator of $\lambda$ that is homogeneous linear in the $\ln c_j$, and the $a_{ij}$ are chosen to minimize $E(\ln\hat c_i - \ln c_i)^2$. In general, $a_{ij}$ of (2.7) is the $(i, j)$th element of $V_{\bar s s}V_{ss}^{-1}$. Here $V$ is the matrix of the $\rho_{jk}$, $V_{\bar s s}$ is its $(N - n)\times n$ submatrix with rows indexed by $\bar s$ (the unsampled days) and columns by $s$, and the $n\times n$ matrix $V_{ss}$ is defined analogously. Also, $\hat\lambda = \sum_{j\in s} a_j\ln c_j$, where $\sum_{j\in s} a_j = 1$ and $a_j$ is proportional to the $j$th column sum of $V_{ss}^{-1}$. With this definition,

$$\ln\hat c_i = \sum_{j\in s} b_{ij}\ln c_j, \qquad (2.8)$$

where $b_{ij} = a_{ij} + a_j\left(1 - \sum_{k\in s} a_{ik}\right)$ is the coefficient of $\ln c_j$ in (2.7). Because $\exp(\ln\hat c_i)$ is biased for $c_i$ under the lognormal model, a "corrected" estimator $\hat c_i$ is formed so that $E(\hat c_i - c_i) = 0$.
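One way to construct such a correction under model (2.6) is sketched below, treating $\sigma^2$ and the $\rho_{jk}$ as known. It is a standard lognormal adjustment and may differ in form from the authors' own expression. With estimated parameters plugged in, the result is only approximately unbiased.

```latex
\sum_{j\in s} b_{ij} = \sum_{j\in s} a_{ij} + \Big(1 - \sum_{k\in s} a_{ik}\Big)\sum_{j\in s} a_j = 1
\quad\Rightarrow\quad
\ln\hat c_i \sim N(\lambda,\, v_i), \qquad
v_i = \sigma^2 \sum_{j\in s}\sum_{k\in s} b_{ij}\, b_{ik}\, \rho_{jk}

E\{\exp(\ln\hat c_i)\} = \exp(\lambda + v_i/2), \qquad E(c_i) = \exp(\lambda + \sigma^2/2)

\hat c_i = \exp\{\ln\hat c_i + (\sigma^2 - v_i)/2\}
\quad\Rightarrow\quad E(\hat c_i - c_i) = 0
```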

Setting $\hat c_i = c_i$ if $c_i$ is not missing, we can write the estimator (2.5) of loading as

$$\hat\mu_{YTS} = \mathbf{x}^{t}\hat{\mathbf{c}}/N,$$

where $\mathbf{x}^t = (x_1, \ldots, x_N)$ and $\hat{\mathbf{c}}^t = (\hat c_1, \ldots, \hat c_N)$. Then the predictive MSE is

$$E(\hat\mu_{YTS} - \mu_Y)^2 = \mathbf{x}^{t}\,\mathrm{Var}(\hat{\mathbf{c}} - \mathbf{c})\,\mathbf{x}/N^2,$$

where $\mathbf{c}^t = (c_1, \ldots, c_N)$ and Var denotes the covariance matrix. Note that the $(i, \ell)$th element of $\mathrm{Var}(\hat{\mathbf c} - \mathbf c)$ is 0 if $c_i$ or $c_\ell$ is sampled. An unbiased or nearly unbiased estimator of this covariance matrix will therefore yield an estimator of the MSE of $\hat\mu_{YTS}$. With $\ln\hat c_i$ given by (2.8), closed-form expressions for $E(\hat c_i\hat c_\ell)$, and similarly for $E(\hat c_i c_\ell)$, are easy to obtain. The resulting formula for $E(\hat c_i - c_i)(\hat c_\ell - c_\ell)$ is the $(i, \ell)$th element of $\mathrm{Var}(\hat{\mathbf c} - \mathbf c)$. Using these formulas to estimate the MSE of $\hat\mu_{YTS}$ requires estimates of $\lambda$, $\sigma^2$ and the $\rho_{i\ell}$.
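The weights in (2.7)-(2.8) can be computed with a few lines of linear algebra. The sketch below assumes a known correlation matrix; the example uses a hypothetical AR(1) matrix with $\rho$ = .7 over a 10-day window. It uses linear solves rather than explicit inverses, and the function name is illustrative. A useful check is that each row of $b_{ij}$ sums to 1, which is what makes $\ln\hat c_i$ unbiased for $\lambda$.

```python
import numpy as np


def blup_log_weights(V, sampled):
    """Return (unsampled indices, B) with ln c_hat_i = sum_j B[i, j] ln c_j as in (2.8).

    V: N x N correlation matrix of the log concentrations.
    sampled: indices of days with observed concentrations.
    """
    V = np.asarray(V, dtype=float)
    s = np.asarray(sorted(sampled))
    r = np.setdiff1d(np.arange(V.shape[0]), s)
    V_ss = V[np.ix_(s, s)]
    V_rs = V[np.ix_(r, s)]
    A = np.linalg.solve(V_ss, V_rs.T).T          # a_ij = (V_rs V_ss^{-1})_ij
    w = np.linalg.solve(V_ss, np.ones(len(s)))   # column sums of V_ss^{-1} (V_ss symmetric)
    a = w / w.sum()                              # weights of lambda-hat, summing to 1
    B = A + np.outer(1.0 - A.sum(axis=1), a)     # b_ij = a_ij + a_j (1 - sum_k a_ik)
    return r, B


if __name__ == "__main__":
    N, rho = 10, 0.7
    idx = np.arange(N)
    V = rho ** np.abs(idx[:, None] - idx[None, :])
    missing_days, B = blup_log_weights(V, [0, 1, 3, 4, 7, 9])
    print(missing_days)
    print(B.sum(axis=1))
```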

If a first order autoregressive structure is assumed for $\ln c_i$, then $\rho_{i\ell}$ is of the form $\rho^{|i-\ell|}$, where $\rho$ lies strictly between -1 and 1. For each day $i$, let $j_i$ = the number of days since the previous non-missing $c$ value, and $k_i$ = the number of days until the next non-missing $c$ value. If there is no previous [next] non-missing value, set $j_i = \infty$ [$k_i = \infty$]. Now let $c_{p_i}$ = the previous non-missing $c$ value and $c_{q_i}$ = the next non-missing $c$ value. The uncorrected estimator (2.7) becomes

$$\ln\hat c_i = \hat\lambda + a_i(\ln c_{p_i} - \hat\lambda) + b_i(\ln c_{q_i} - \hat\lambda), \qquad (2.9)$$

where

$$a_i = \frac{\rho^{j_i}(1 - \rho^{2k_i})}{1 - \rho^{2(k_i + j_i)}}, \qquad b_i = \frac{\rho^{k_i}(1 - \rho^{2j_i})}{1 - \rho^{2(k_i + j_i)}}.$$

Note that if $\rho = 0$, $\ln\hat c_i$ is simply $\hat\lambda$, while as $\rho \to 1$, $\ln\hat c_i$ tends to a linearly interpolated value between $\ln c_{p_i}$ and $\ln c_{q_i}$. For $0 < \rho < 1$, $\ln\hat c_i$ will lie between $\hat\lambda$ and the linearly interpolated value. In the first order autoregressive case, the variance $\sigma^2$ can be estimated consistently by

$$\hat\sigma^2 = \sum_{i\in s}\left(\ln c_i - \overline{\ln c}\right)^2/(n - 1);$$

this will be satisfactory if $\rho$ is not too close to 1. The parameter $\rho$ may be estimated or assumed. For moderately long series with almost complete data, estimating it seems advisable. For the short concentration series considered here, however, the maximum marginal likelihood estimate of $\rho$ (Ramakrishnan, 1985) has been found to be highly unstable when a large number of $c$ values are missing.

3. EMPIRICAL STUDY

For the empirical study, two months with complete data were chosen: March 1978 and November 1979. For these two months, the lag 1 autocorrelation of log concentration is estimated at .7 and .4, respectively.
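The first order autoregressive interpolation (2.9) needs only the gaps $j_i$ and $k_i$ to the neighbouring observed days. The sketch below assumes $\hat\lambda$ has already been computed. It uses `None` for a missing neighbour, which covers the $j_i = \infty$ or $k_i = \infty$ case. The names are hypothetical. With $\rho$ close to 1 the function reproduces linear interpolation, and with $\rho$ = 0 it returns $\hat\lambda$; these are the limiting cases described for (2.9).

```python
def ar1_weights(rho, j=None, k=None):
    """Coefficients (a_i, b_i) of (2.9); j, k are the gaps to the previous and
    next observed days, or None when no such observation exists."""
    if not -1.0 < rho < 1.0:
        raise ValueError("rho must lie strictly between -1 and 1")
    if j is None and k is None:
        return 0.0, 0.0
    if j is None:                      # j_i = infinity
        return 0.0, rho**k
    if k is None:                      # k_i = infinity
        return rho**j, 0.0
    denom = 1.0 - rho ** (2 * (j + k))
    a = rho**j * (1.0 - rho ** (2 * k)) / denom
    b = rho**k * (1.0 - rho ** (2 * j)) / denom
    return a, b


def interpolate_log_conc(lam_hat, rho, prev=None, nxt=None):
    """Uncorrected ln c_hat_i of (2.9).

    prev, nxt: (gap in days, observed log concentration) or None.
    """
    a, b = ar1_weights(rho, prev[0] if prev else None, nxt[0] if nxt else None)
    est = lam_hat
    if prev:
        est += a * (prev[1] - lam_hat)
    if nxt:
        est += b * (nxt[1] - lam_hat)
    return est


if __name__ == "__main__":
    for rho in (0.0, 0.4, 0.7, 0.95):
        print(rho, interpolate_log_conc(3.0, rho, prev=(1, 2.5), nxt=(3, 3.8)))
```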

For each of the two months, 100 samples were generated with 5 randomly chosen concentration readings missing, and a further 100 with 15 concentration readings missing. For each sample thus generated, the following estimators were calculated:
(i) Dolan's estimator $\hat\mu_{YD}$ of (1.2);
(ii) the estimator $\hat\mu_{YI}$ of (2.2), based on the i.i.d. assumption for concentrations;
(iii) the time series estimator $\hat\mu_{YTS}$ of (2.9) with $\rho$ = 0, .4, .95;
(iv) the MSE estimator $V_I$ of (2.4), associated with $\hat\mu_{YI}$;
(v) the MSE estimator $V_{TS}$, associated with $\hat\mu_{YTS}$.
Tables 3.1 and 3.2 compare the performance of the mean estimators for the two months.

TABLE 3.1

Performance of the estimators of $\mu_Y$ for the month of March, 1978

                                             5 points deleted    15 points deleted
True mean $\mu_Y$                                     6715.79              6715.79
Mean of $\hat\mu_{YD}$                                6745.97              6647.88
MSE of $\hat\mu_{YD}$                                49967.10            322972.49
Mean of $\hat\mu_{YI}$                                6732.16              6616.07
MSE of $\hat\mu_{YI}$                                48869.93            314361.84
Mean of $\hat\mu_{YTS}$, $\rho$ = 0                   6723.55              6599.93
MSE of $\hat\mu_{YTS}$, $\rho$ = 0                   48603.65            313892.74
Mean of $\hat\mu_{YTS}$, $\rho$ = .4                  6700.40              6716.35
MSE of $\hat\mu_{YTS}$, $\rho$ = .4                  43451.50            186371.48
Mean of $\hat\mu_{YTS}$, $\rho$ = .95                 6598.35              6532.24
MSE of $\hat\mu_{YTS}$, $\rho$ = .95                 36788.47            174288.32

Means and MSEs are over 100 replications.

TABLE 3.2
Performance of the estimators of $\mu_Y$ for the month of November, 1979

                                             5 points deleted    15 points deleted
True mean $\mu_Y$                                     3717.30              3717.30
Mean of $\hat\mu_{YD}$                                3714.80              3768.61
MSE of $\hat\mu_{YD}$                                17597.13             72685.98
Mean of $\hat\mu_{YI}$                                3710.18              3754.50
MSE of $\hat\mu_{YI}$                                17310.83             67973.80
Mean of $\hat\mu_{YTS}$, $\rho$ = 0                   3708.96              3752.82
MSE of $\hat\mu_{YTS}$, $\rho$ = 0                   17013.73             64804.02
Mean of $\hat\mu_{YTS}$, $\rho$ = .4                  3689.89              3743.64
MSE of $\hat\mu_{YTS}$, $\rho$ = .4                  18198.76             69288.76
Mean of $\hat\mu_{YTS}$, $\rho$ = .95                 3617.29                 n.a.
MSE of $\hat\mu_{YTS}$, $\rho$ = .95                 22303.53                 n.a.

Means and MSEs are over 100 replications.
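The deletion design of this section is easy to mimic. The sketch below applies it to the estimator (2.2) only. Because the Niagara River data are not available here, it uses a synthetic 31-day month whose log concentration follows an AR(1) series with $\rho$ = .7 and whose flows are arbitrary. The numbers it prints are therefore illustrative and do not correspond to Tables 3.1 and 3.2. All names and parameter values are hypothetical.

```python
import numpy as np


def deletion_study(conc, flow, n_missing, reps=100, seed=1):
    """True mean loading, and the mean and MSE of estimator (2.2) over random deletions."""
    rng = np.random.default_rng(seed)
    conc = np.asarray(conc, dtype=float)
    flow = np.asarray(flow, dtype=float)
    N = conc.size
    true_mean = float(np.mean(conc * flow))
    est = np.empty(reps)
    for r in range(reps):
        observed = np.ones(N, dtype=bool)
        observed[rng.choice(N, size=n_missing, replace=False)] = False
        c_hat = conc[observed].mean()  # (2.1) from the days that remain
        est[r] = (np.sum(conc[observed] * flow[observed]) + c_hat * np.sum(flow[~observed])) / N
    return true_mean, float(est.mean()), float(np.mean((est - true_mean) ** 2))


def synthetic_month(N=31, rho=0.7, lam=3.0, sigma=0.3, seed=0):
    """Hypothetical month: AR(1) log concentration and smoothly varying flow."""
    rng = np.random.default_rng(seed)
    z = np.empty(N)
    z[0] = rng.standard_normal()
    for t in range(1, N):
        z[t] = rho * z[t - 1] + np.sqrt(1 - rho**2) * rng.standard_normal()
    conc = np.exp(lam + sigma * z)
    flow = 6000 + 500 * np.sin(np.linspace(0, np.pi, N))
    return conc, flow


if __name__ == "__main__":
    conc, flow = synthetic_month()
    for k in (5, 15):
        print(k, deletion_study(conc, flow, k))
```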

Although the number of replications is small, some observations did emerge:
(i) The estimators $\hat\mu_{YD}$ and $\hat\mu_{YI}$ are similar in performance for both months. They have only a small bias, if any.
(ii) $\hat\mu_{YTS}$ tends to have a downward bias, particularly for $\rho$ = .95, which brings it close to a linear interpolation estimator. However, the stability of this estimator keeps its MSE low, and for March (with actual $\rho$ about .7) it has the lowest MSEs.

(iii) The variance estimators $V_I$ and $V_{TS}$ give realistic values, with $V_I$ tending to be conservative. The values of $V_I$ and $V_{TS}$ are extremely variable. It would be useful to study the coverage properties of confidence intervals based on them.

REFERENCES
Bartlett, R.F., 1983. On Estimation with Kriging for Finite Populations under Superpopulation Models. Ph.D. Thesis, University of Waterloo, Waterloo, Canada.
Cochran, W.G., 1977. Sampling Techniques. Wiley, New York, 428 pp.
Dolan, D.M., Yui, A.K. and Geist, R.D., 1981. Evaluation of river load estimation methods for total phosphorus. J. Great Lakes Res. 7: 207-214.

Lam, D.C.L., Schertzer, W.M. and Fraser, A.S., 1983. Simulation of Lake Erie Water Quality Responses to Loading and Weather Variations. Environment Canada.
Ramakrishnan, V., 1985. Marginal Likelihood Analysis of Growth Curves. M.Phil. Thesis, University of Waterloo.
Ripley, B.D., 1981. Spatial Statistics. Wiley, New York, 252 pp.
Royall, R.M., 1971. Linear regression models in finite population sampling theory. In: V.P. Godambe and D.A. Sprott (Editors), Foundations of Statistical Inference. Holt, Rinehart and Winston, Toronto.
Royall, R.M. and Cumberland, W.G., 1978. Variance estimation in finite population sampling. J. Amer. Statist. Assoc. 73: 351-358.

ESTIMATION OF LOADING BY NUMERICAL INTEGRATION
A.H. EL-SHAARAWI, K.W. KUNTZ AND A. SYLVESTRE

ABSTRACT
Methods based on numerical integration and interpolation are used to estimate the loading of a substance from a source into a water system. The variance of the estimator is also given. The results are applied to estimate the yearly chloride loading from the Niagara River to Lake Ontario during the period 1975 to 1984. The results indicate a steady linear decline in the input of chloride to Lake Ontario.

INTRODUCTION

It is well known that the level of eutrophication of a lake depends on the loading of chemical substances, in particular phosphorus, into the lake (Vollenweider et al., 1980; Slater and Bangay, 1980). Load estimates have accordingly become a tool for controlling eutrophication; for example, the point source phosphorus loading to Lake Erie was reduced from approximately 10,000 metric tons/yr in 1972/73 to 5,700 metric tons/yr in 1977. The accuracy and consistency of a load estimate depend on several factors, including (i) the number of samples collected and the sampling strategy, (ii) the care with which the water quality analysis is performed, and (iii) the mathematical expression used to obtain the load estimate.

This paper considers the problem of estimating the total load of a substance carried from a river or point source into a lake. Formally, the loading is defined as follows. Let $\ell(t)\,dt$ be the instantaneous load of a substance $S$ into the water system during the interval $(t, t + dt)$. Then the load of $S$ during the interval $(0, T)$ is

$$L = \int_0^T \ell(t)\,dt. \qquad (1)$$

The instantaneous load $\ell(t)$ can be expressed as the product of the instantaneous water flow $f(t)$ and the concentration $C(t)$ of $S$. This gives

$$L = \int_0^T f(t)\,C(t)\,dt. \qquad (2)$$

Given a set of measurements of the concentration and the flow rate, the objectives are to estimate $L$ and its standard error. All available methods for estimating $L$ assume the finite population approach. In this approach, the period $T$ is divided into $N$ intervals, and the concentration is taken to be constant within the $i$th interval ($i = 1, 2, \ldots, N$). In this case formula (1) reduces to

$$L = \sum_{i=1}^{N} f_i C_i, \qquad (3)$$

where $f_i$ = the flow of water during the $i$th interval and $C_i$ = the concentration of $S$ in the $i$th interval. The problem then becomes how to estimate $L$ when measurements are available for only $n < N$ intervals. The finite population approach is described by Thompson and Bischoping (1985). Casey and Salbach (1974) give two methods for estimating the mean daily load. Dolan et al. (1981) studied several ways of estimating the mean daily loading and concluded that the ratio estimator is the best. Thompson and Bischoping (1985) pointed out that the ratio estimator is optimal if the variance of the concentration varies as 1/flow.

The finite population approach is not used in this paper. Instead, the loading is assumed to be a continuous function of time, so numerical integration methods are appropriate for estimating $L$. The stochastic characteristics of the flow and the concentration are used to derive the standard error of the estimate of $L$. Finally, this approach is applied to estimate the chloride load from the Niagara River to Lake Ontario during the period 1975 to 1984.

2 SOME COMMENTS ON THE METHODS FOR ESTIMATING THE LOADING

The ratio estimator is discussed by Thompson and Bischoping (1985) and will therefore not be considered here. The methods presented by Casey and Salbach (1974) and Dolan et al. (1981) for estimating the mean daily load are

$$L_1 = \frac{1}{n}\sum_{i=1}^{n} f_i C_i \qquad (4)$$

and

$$L_2 = \bar f\,\bar C, \qquad (5)$$

where $n$ is the number of days on which both the flow $f$ and the concentration $C$ are measured, and $\bar f$ and $\bar C$ are the corresponding sample means. Casey and Salbach (1974) indicated that the two methods are likely to give very close results. This is true only if the flow and concentration are uncorrelated. To illustrate this, note that

$$L_1 - L_2 = r\,S_f\,S_C, \qquad (6)$$

where $r$ is the correlation coefficient between $f$ and $C$, and $S_f$ and $S_C$ are the standard deviations (computed with divisor $n$) of $f$ and $C$, respectively. Hence, if $r = 0$ the two methods produce the same result, and if $r$ is positive (negative), $L_1$ is larger (smaller) than $L_2$.
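Identity (6) is easy to confirm numerically. The sketch below computes $L_1$, $L_2$, $r$, $S_f$ and $S_C$ from paired daily values. It uses divisor $n$ for the standard deviations, the convention under which (6) holds exactly. The data are made up, and the function name is hypothetical.

```python
from math import sqrt


def load_estimates(f, C):
    """Return L1 (4), L2 (5) and r, S_f, S_C for identity (6)."""
    n = len(f)
    if n != len(C) or n < 2:
        raise ValueError("need at least two paired observations")
    f_bar, C_bar = sum(f) / n, sum(C) / n
    L1 = sum(fi * ci for fi, ci in zip(f, C)) / n
    L2 = f_bar * C_bar
    S_f = sqrt(sum((fi - f_bar) ** 2 for fi in f) / n)  # divisor n
    S_C = sqrt(sum((ci - C_bar) ** 2 for ci in C) / n)
    cov = sum((fi - f_bar) * (ci - C_bar) for fi, ci in zip(f, C)) / n
    r = cov / (S_f * S_C)
    return L1, L2, r, S_f, S_C


if __name__ == "__main__":
    f = [1.0, 2.0, 3.0, 4.0]  # made-up daily flows
    C = [2.0, 1.0, 4.0, 3.0]  # made-up daily concentrations
    L1, L2, r, S_f, S_C = load_estimates(f, C)
    print(L1, L2, L1 - L2, r * S_f * S_C)
```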

If $f_i$ and $C_i$ are independent random variables with means $\mu_f$ and $\mu_C$ and variances $\sigma_f^2$ and $\sigma_C^2$, both methods estimate the same quantity, $\mu_f\mu_C$, and $L_2$ has a smaller variance than $L_1$. More generally, if $f_i$ and $C_i$ are random variables with means $\mu_f$ and $\mu_C$, variances $\sigma_f^2$ and $\sigma_C^2$, and correlation $\rho$, then the expected values of $L_1$ and $L_2$ are

$$E(L_1) = \mu_f\mu_C + \rho\,\sigma_f\sigma_C, \qquad E(L_2) = \mu_f\mu_C + \rho\,\sigma_f\sigma_C/n.$$

Both methods therefore estimate the same mean daily load only when $\rho = 0$ (e.g., when $f$ and $C$ are independent). Otherwise $L_1$ is an unbiased and consistent estimate of the mean daily load $\mu_f\mu_C + \rho\sigma_f\sigma_C$, while $L_2$ is biased (an underestimate when $\rho > 0$). The bias of $L_2$, $-\rho\sigma_f\sigma_C(1 - 1/n)$, does not vanish; its magnitude in fact grows as the sample size $n$ increases. Furthermore, if the joint distribution of $f$ and $C$ is bivariate normal, the variances of $L_1$ and $L_2$ can be derived in closed form. The difference between these variances shows that, when $\rho = 0$, $L_2$ is superior to $L_1$. Indeed, under these assumptions $L_2$ is the maximum likelihood estimator of the mean daily load.
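The expectations of $L_1$ and $L_2$ can be checked by simulation. The sketch below repeatedly draws $n$ bivariate normal (flow, concentration) pairs, using arbitrary hypothetical parameter values. It compares the average of each estimator with $\mu_f\mu_C + \rho\sigma_f\sigma_C$ and $\mu_f\mu_C + \rho\sigma_f\sigma_C/n$, respectively. The function name is illustrative.

```python
import numpy as np


def simulate_expectations(mu_f, mu_c, sd_f, sd_c, rho, n, reps=100_000, seed=0):
    """Monte Carlo means of L1 = mean(f*C) and L2 = mean(f)*mean(C)."""
    rng = np.random.default_rng(seed)
    cov = [[sd_f**2, rho * sd_f * sd_c], [rho * sd_f * sd_c, sd_c**2]]
    draws = rng.multivariate_normal([mu_f, mu_c], cov, size=(reps, n))  # (reps, n, 2)
    f, c = draws[..., 0], draws[..., 1]
    L1 = (f * c).mean(axis=1)
    L2 = f.mean(axis=1) * c.mean(axis=1)
    return L1.mean(), L2.mean()


if __name__ == "__main__":
    mu_f, mu_c, sd_f, sd_c, rho, n = 10.0, 2.0, 3.0, 0.5, 0.6, 5
    e1, e2 = simulate_expectations(mu_f, mu_c, sd_f, sd_c, rho, n)
    print("E(L1):", e1, "theory:", mu_f * mu_c + rho * sd_f * sd_c)
    print("E(L2):", e2, "theory:", mu_f * mu_c + rho * sd_f * sd_c / n)
```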

3 TYPES OF DATA

To develop an appropriate estimator of the true loading, it is important to understand the nature of the data available for the calculation. The data can be available in many forms; the three most common are:
(i) the concentration $x(t_i)$ and the flow rate $y(t_i)$ are measured at the time points $t_0, t_1, \ldots, t_{n+1}$;
(ii) $Y(t_i)$, the amount of water that flowed into the system in the interval $(t_i, t_{i+1})$, and the concentration $x(t_i + \delta)$ are available at the points $t_0, t_1, \ldots, t_{n+1}$, where $0 \le \delta \le t_{i+1} - t_i$;
(iii) $y(t_i)$ or $Y(t_i)$ is available at the points $t_0, t_1, \ldots, t_{n+1}$, but the concentrations $x(t_i)$ are available at only $m + 2$ points, where $m < n$.

4 ESTIMATION OF LOADING

Expressions for the estimate of the loading during the interval $[0, T]$ are presented below for each of the three types of data given above. The trapezoidal rule is used to estimate the integral. This is not a restriction: the methods given apply equally to any other numerical integration method.

4.1 Data of Type 1

The trapezoidal rule estimate of $L$ is

$$\hat L = \frac{1}{2}\left\{\ell(t_0)(t_1 - t_0) + \sum_{i=1}^{n}\ell(t_i)(t_{i+1} - t_{i-1}) + \ell(t_{n+1})(t_{n+1} - t_n)\right\},$$

where $\ell(t_i) = x(t_i)\,y(t_i)$ is the instantaneous load at $t_i$. A model for $\ell(t_i)$ is needed to determine the properties of $\hat L$. Assume that $\ell(t_i)$ is a stochastic process, and denote $E(\hat L)$ by $Q$; $Q$ is the trapezoidal-rule value computed from the expected loads $E(\ell(t_i))$.
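The trapezoidal rule above can be written in weighted form, with weights $a_0 = t_1 - t_0$, $a_i = t_{i+1} - t_{i-1}$ and $a_{n+1} = t_{n+1} - t_n$. This is equivalent to summing trapezoid areas over unequal intervals. The sketch below computes both forms so they can be compared. The function names and the sampling times are hypothetical.

```python
def trapezoid_weights(t):
    """Weights a_0..a_{n+1} so that L_hat = 0.5 * sum(a_i * load_i)."""
    if len(t) < 2:
        raise ValueError("need at least two time points")
    if any(t2 <= t1 for t1, t2 in zip(t, t[1:])):
        raise ValueError("time points must be strictly increasing")
    last = len(t) - 1
    inner = [t[i + 1] - t[i - 1] for i in range(1, last)]
    return [t[1] - t[0]] + inner + [t[last] - t[last - 1]]


def trapezoid_load(t, load):
    """Trapezoidal estimate of L from instantaneous loads l(t_i) = x(t_i) y(t_i)."""
    if len(t) != len(load):
        raise ValueError("t and load must have the same length")
    return 0.5 * sum(a * l for a, l in zip(trapezoid_weights(t), load))


if __name__ == "__main__":
    t = [0.0, 1.0, 3.0, 4.0]     # unequally spaced sampling times (days)
    load = [2.0, 4.0, 4.0, 0.0]  # made-up instantaneous loads
    areas = sum(0.5 * (load[i] + load[i + 1]) * (t[i + 1] - t[i]) for i in range(len(t) - 1))
    print(trapezoid_load(t, load), areas)
```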

The e x p e c t e d mean s q u a r e e r r o r i s g i v e n b y E(L -

which of

L)*

is c o m p o s e d o f

the

above

to

numerical

be

evaluated

a,,

=

tl

-

two p a r t s

equation.

c h a r a c t e r i s t i c s of

+ (Q - L ) ~ .

= Var(i)

L,

The

as

shown by

first

represents

right the

The n u m e r i c a l

if

we

a

t o ,

a i

assume =

special

ti+i-t;-l

( i

integration

L.

By and

for

side

e r r o r due error

1 , 2 , . . . ,n )

form =

hand

stochastic

while the second r e p r e s e n t s the

integration.

tn+l-tn, then

the

can

setting an+l

=

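$$\operatorname{Var}(\hat L) = \frac{\sigma^2}{4}\Big\{\sum_{i=0}^{n+1} a_i^2 + 2\sum_{i=0}^{n}\sum_{u=1}^{n-i+1} a_i\,a_{i+u}\,\rho_u\Big\}.$$

Furthermore, when the measurements are equally spaced with the distance $\Delta = t_{i+1} - t_i$, we have $a_0 = a_{n+1} = \Delta$ and $a_i = 2\Delta$ $(i = 1, \ldots, n)$. When $\rho_u = 0$ for $u = 1, 2, \ldots, (n+1)$, then under equal spacing

$$\operatorname{Var}(\hat L) = \sigma^2\Delta^2\,(n + \tfrac{1}{2}).$$

Generally, for any $\rho_u$,

$$\operatorname{Var}(\hat L) = \sigma^2\Delta^2\Big\{(n + \tfrac{1}{2}) + 2\sum_{u=1}^{n}(n - u + 1)\,\rho_u + \tfrac{1}{2}\rho_{n+1}\Big\}.$$

Finally, for the first-order autoregressive process, where $\rho_u = \rho^u$, the expression becomes

$$\operatorname{Var}(\hat L) = \sigma^2\Delta^2\,(n + \tfrac{1}{2})\Big[1 + \frac{\rho^{n+1}}{2n+1} + \frac{4\rho\{n - (n+1)\rho + \rho^{n+1}\}}{(2n+1)(1-\rho)^2}\Big].$$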
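As a numerical check on the variance expressions, the sketch below evaluates the general double sum directly and compares it with the equal-spacing form and the AR(1) closed form. It assumes $a(t)$ is stationary with variance $\sigma^2$ and autocorrelation $\rho_u$; the function names are hypothetical and `rho` is passed as a function of the lag $u$.

```python
import math


def var_general(a, sigma2, rho):
    """Var(L_hat) = sigma^2/4 * sum_i sum_j a_i a_j rho_|i-j|, with rho_0 = 1."""
    total = 0.0
    for i, ai in enumerate(a):
        for j, aj in enumerate(a):
            lag = abs(i - j)
            total += ai * aj * (1.0 if lag == 0 else rho(lag))
    return sigma2 / 4.0 * total


def var_equal_spacing(n, delta, sigma2, rho):
    """sigma^2 Delta^2 {(n + 1/2) + 2 sum_{u=1}^{n} (n - u + 1) rho_u + rho_{n+1}/2}."""
    s = sum((n - u + 1) * rho(u) for u in range(1, n + 1))
    return sigma2 * delta ** 2 * ((n + 0.5) + 2.0 * s + 0.5 * rho(n + 1))


def var_ar1_closed_form(n, delta, sigma2, r):
    """Closed form for rho_u = r**u; valid for r != 1."""
    bracket = (1.0
               + r ** (n + 1) / (2 * n + 1)
               + 4.0 * r * (n - (n + 1) * r + r ** (n + 1)) / ((2 * n + 1) * (1.0 - r) ** 2))
    return sigma2 * delta ** 2 * (n + 0.5) * bracket


# Hypothetical inputs: 5 interior points, spacing 7, sigma^2 = 2, AR(1) with r = 0.31.
n, delta, s2, r = 5, 7.0, 2.0, 0.31
a = [delta] + [2 * delta] * n + [delta]
print(var_general(a, s2, lambda u: r ** u), var_ar1_closed_form(n, delta, s2, r))
```

The three functions agree to rounding error; the bracketed factor in the AR(1) form shows how much a positive lag-one correlation inflates the variance relative to the independent case $\sigma^2\Delta^2(n + \tfrac12)$.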

4.2 Data of Type 2

The data available in this case are $Y(t_i)$ and $x(t_i + \delta)$. There are two estimators available, which are dependent on the assumptions used.

(i) Under the assumption that $E\{x(t_i + \delta)\} = c(t_i)$ is constant within the interval $(t_i, t_{i+1})$, an estimate for $L$ is

$$\hat L_1 = \sum_{i=0}^{n} Y(t_i)\,x(t_i + \delta).$$

If $Y(t_i)$ is assumed to be non-stochastic, then $E(\hat L_1) = \sum_{i=0}^{n} Y(t_i)\,c(t_i)$.

(ii) Under the assumption that $c(t)$ is a smooth function, an approximation to the instantaneous flow rate for $t_i < t \le t_{i+1}$ is obtained as $y(t_i) = Y(t_i)/(t_{i+1} - t_i)$, and hence equation (2) becomes

$$L = \sum_{i=0}^{n} y(t_i) \int_{t_i}^{t_{i+1}} c(t)\,dt.$$

By the trapezoidal rule we have

$$L \approx \tfrac{1}{2} \sum_{i=0}^{n} Y(t_i)\,\{c(t_{i+1}) + c(t_i)\}.$$

The concentrations $c(t_i)$ and $c(t_{i+1})$ are not known; they are observed at $t_i + \delta$ and $t_{i+1} + \delta$. By using linear interpolation between the observation times, the estimate of $L$ in this case is

$$\hat L_2 = \tfrac{1}{2} \sum_{i=0}^{n} Y(t_i)\,\{x(t_{i+1} + \delta) + x(t_i + \delta)\} - \frac{\delta}{2} \sum_{i=0}^{n} Y(t_i) \left\{ \frac{x(t_{i+1} + \delta) - x(t_i + \delta)}{t_{i+1} - t_i} + \frac{x(t_i + \delta) - x(t_{i-1} + \delta)}{t_i - t_{i-1}} \right\}.$$

4.3 Data of Type 3

In this case the concentrations are measured less frequently than the flow. The above approach could be used to derive an estimator for the loading.
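The two type 2 estimators can be sketched as follows. This is a minimal illustration, assuming $t$ holds $t_0, \ldots, t_{n+1}$, `Y` holds the interval volumes $Y(t_0), \ldots, Y(t_n)$, and `x_obs` holds the concentrations $x(t_i + \delta)$ for $i = 0, \ldots, n+1$. The formula for $\hat L_2$ needs $x(t_{-1} + \delta)$ at the first interval; the sketch instead extrapolates from the first two observations, which is an assumption of this example. Function names are hypothetical.

```python
def lerp(tq, t0, x0, t1, x1):
    """Value at tq of the straight line through (t0, x0) and (t1, x1)."""
    return x0 + (x1 - x0) * (tq - t0) / (t1 - t0)


def type2_loads(t, Y, x_obs, delta):
    """Return (L1, L2): constant-concentration and interpolated trapezoidal estimates."""
    n1 = len(Y)                                   # n + 1 intervals
    if len(t) != n1 + 1 or len(x_obs) != n1 + 1:
        raise ValueError("need len(t) == len(x_obs) == len(Y) + 1")
    obs_t = [ti + delta for ti in t]              # sampling times t_i + delta
    L1 = sum(Y[i] * x_obs[i] for i in range(n1))
    # c_hat(t_0): extrapolated from the first two samples (assumption, see lead-in)
    c_hat = [lerp(t[0], obs_t[0], x_obs[0], obs_t[1], x_obs[1])]
    # c_hat(t_i) = x(t_i + d) - d * (x(t_i + d) - x(t_{i-1} + d)) / (t_i - t_{i-1})
    c_hat += [lerp(t[i], obs_t[i - 1], x_obs[i - 1], obs_t[i], x_obs[i])
              for i in range(1, n1 + 1)]
    L2 = 0.5 * sum(Y[i] * (c_hat[i] + c_hat[i + 1]) for i in range(n1))
    return L1, L2


# Concentration c(t) = t sampled half a time unit late, unit volume per interval.
print(type2_loads([0.0, 1.0, 2.0], [1.0, 1.0], [0.5, 1.5, 2.5], 0.5))
```

With a linear concentration trend and unit flow, both estimates recover the exact load of 2; they differ when the concentration curves within an interval.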

TABLE 1.
Annual Chloride Load and Its Standard Error (×10⁵ mt yr⁻¹)

                  Type 1                     Type 3
Year      Load      Standard Error    Load      Standard Error
75-76     45.413    0.807             45.502    0.599
76-77     44.090    0.834             44.138    0.793
77-78     43.393    0.622             42.747    0.629
78-79     42.066    0.931             42.191    0.883
79-80     42.546    0.835             43.514    0.834
80-81     39.035    0.352             39.090    0.621
81-82     37.670    0.540             37.775    0.601
82-83     35.510    0.298             35.513    0.329
83-84     34.890    0.715             35.037    0.596

TABLE 2.
Autocorrelation Function for the Chloride Load and Chloride Concentration for 1975/1976

Lag             0     1     2     3     4     5     6     7     8     9     10    11    12    13    14
Load            1.00  .31   .08   .02   -.09  -.01  .03   .06   .03   .01   .05   .07   .02   .02   .07
Concentration   1.00  .36   .12   .08   .06   .05   -.11  .08   .04   -.02  -.05  -.01  .00   -.01  .03

The only additional computation is the estimation of the values of the concentration at the points where the $Y(t_i)$'s are measured but the $x(t_i)$'s are not available.

5. APPLICATION

The methods described in the previous section are used to estimate the annual chloride load of the Niagara River to Lake Ontario during the period from 1975 to 1984. The flow is available on a daily basis, while the chloride concentrations are available on a weekly basis. The estimates of the load are calculated assuming data of type 1 and data of type 3, and are given, with their standard errors, in columns 2 and 3 and in columns 4 and 5 of Table 1, respectively. To evaluate the standard errors, the load within a year is assumed to follow a first-order autoregressive model, AR(1). Table 2 gives the autocorrelation function for the chloride load and the chloride concentration for the year 1975/1976; it appears that the AR(1) model is appropriate. It is clear that the difference between the estimates given by the two methods is not large. Therefore, for this example, it is not necessary to interpolate the concentration at the points where only the flow is available. In both cases, the chloride load of the Niagara River to Lake Ontario shows a steady decline during the period of this study.
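The autocorrelations in Table 2 can be computed with the usual sample estimator $r_k = c_k / c_0$; the text does not say which estimator was used, so treat this as an assumption. Comparing $r_u$ with $r_1^u$ is one informal way to judge the AR(1) choice: for the load, $0.31^2 \approx 0.096$ and $0.31^3 \approx 0.030$, against .08 and .02 in Table 2. Function names are hypothetical.

```python
def sample_acf(series, max_lag):
    """r_k = c_k / c_0 with c_k = (1/N) * sum_t (z_t - zbar)(z_{t+k} - zbar)."""
    n = len(series)
    if n < 2 or not 0 <= max_lag < n:
        raise ValueError("need at least two values and 0 <= max_lag < len(series)")
    mean = sum(series) / n
    dev = [v - mean for v in series]
    c0 = sum(d * d for d in dev) / n
    if c0 == 0:
        raise ValueError("series is constant; autocorrelation undefined")
    return [sum(dev[t] * dev[t + k] for t in range(n - k)) / n / c0
            for k in range(max_lag + 1)]


def ar1_acf(rho1, max_lag):
    """Theoretical AR(1) autocorrelations rho_u = rho1**u."""
    return [rho1 ** u for u in range(max_lag + 1)]


print([round(v, 3) for v in ar1_acf(0.31, 3)])   # implied by the lag-1 load value
print(sample_acf([1, -1, 1, -1], 1))
```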

REFERENCES

Casey, D.J. and Salbach, S.E., 1974. IFYGL stream materials balance study (IFYGL). Proc. 17th Conf. Great Lakes Res., Internat. Assoc. Great Lakes Res., 668-681.

Dolan, D.M., Yui, A.K. and Geist, R.D., 1981. Evaluation of river load estimation methods for total phosphorus. J. Great Lakes Res., 7: 207-214.

Slater, R.W. and Bangay, G.E., 1980. Action taken to control phosphorus in the Great Lakes. In: R.C. Loehr, C.S. Martin and W. Rast (eds.), Phosphorus Management Strategies for Lakes. Ann Arbor Sci., Ann Arbor, Mich., 490 pp.

Thompson, M.G. and Bischoping, U., 1985. On the estimation of monthly mean phosphorus loadings. (This volume.)

Vollenweider, R.A., Rast, W. and Kerekes, J., 1980. The phosphorus loading concept and Great Lakes eutrophication, pp. 207-234. In: R.C. Loehr, C.S. Martin and W. Rast (eds.), Phosphorus Management Strategies for Lakes. Ann Arbor Sci., Ann Arbor, Mich., 490 pp.

INTERVENTION ANALYSIS OF SEASONAL AND NONSEASONAL DATA TO ESTIMATE TREATMENT PLANT PHOSPHORUS LOADING SHIFTS

K. A. Booman, The Soap and Detergent Association, New York
P. M. Berthouex, The University of Wisconsin-Madison
Lars Pallesen, Technical University of Denmark, Copenhagen

INTRODUCTION

Interventions in environmental systems are often promulgated without complete knowledge of the system.

The promulgators often impose the intervention with a mixture of confidence and hope: confidence that the intervention will cause a real change in the desired direction, tempered by a degree of hopeful speculation that the shift will be large enough to justify the cost of the intervention. Scientists naturally want to evaluate the effectiveness of the intervention after data have become available. The general problem is to estimate "the effect of an intervention that has been made with the intent of causing a system to change where the behavior of the system is indicated by a set of data that are a time series, and so the order in which the data occur as well as their magnitude is important" (Box and Tiao, 1975).

Estimating the change in phosphorus load entering a sewage treatment plant when a detergent phosphate ban goes into effect seems to be a straightforward task. Abundant useful data exist, and a calculation comes quickly to mind:

use simple averages to characterize the levels before and after the ban. Unfortunately, this method will frequently give misleading results. Using a simple average assumes that there has been a long-term stationary (horizontal) level about which fluctuations occur. If there is a trend (stochastic or deterministic, linear or nonlinear, upward or downward), this average is not a good representation of the time series. If there is a seasonal pattern, arbitrary decisions must be made about how to "cut out" a section of the data over which the ban took effect.

More importantly, commonly applied significance tests assume that the data, or the variations about the model, are independent of each other in time, i.e., are random. Most environmental time series data tend to be autocorrelated. Ignoring the autocorrelation will (1) make the averages seem more precise than they actually are and (2) bias the significance tests toward indicating a statistically significant change in level when none has occurred.

Time series analysis provides the means for properly taking the autocorrelation into account. A simple observation error random walk model of the form ARIMA(0,1,1) fits many data sets. Where there is an annual seasonal pattern, a more complicated model must be used. Figures 1, 2, and 3 show some data and the fitted model.

The ARIMA(0,1,1) model fits the Kalamazoo and Saginaw, Michigan data (Figure 1). In all, it gave an adequate fit to 12 of 21 data sets for Wisconsin treatment plants, two of which are not reported because local conditions are highly atypical, and 6 of 10 data sets for Michigan treatment plants. The Ann Arbor, East Lansing (Figures 2 and 3) and Warren data illustrate a seasonal pattern for which the simple model is inadequate. A seasonal model that has been used successfully to fit these three Michigan plants is presented.

THE BASIC (NONSEASONAL) MODEL

The basic model is simple in concept. The time series is affected by an intervention that is not fully realized until a few months later (we have used a four-month transition gap for the detergent ban problem). Data in this gap are not used in the intervention analysis because, as transition values, they represent neither the before nor the after situation. An exponentially weighted moving average (EWMA) is used to estimate the levels immediately before and after the gap. The magnitude of the effect of the intervention is the difference of the two. Since the forecast for the ARIMA(0,1,1), or observation error/random walk, model is a horizontal line, the degree of uncertainty associated with the estimated intervention effect increases as the forecast interval increases, and this must be accounted for when assessing the precision of the estimated effect. Details are given in Pallesen et al. (1985).
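The before/after comparison can be sketched as follows. This is a minimal illustration, not the authors' APL code: it assumes the EWMA is started at the first value, that $\theta$ is already known, and that the level just after the gap is obtained by running the EWMA backward from the end of the post-gap data (a back-forecast). The function names and the toy series are hypothetical.

```python
def ewma_level(series, theta):
    """EWMA level m_t = theta * m_{t-1} + (1 - theta) * Y_t, started at the first value."""
    level = series[0]
    for value in series[1:]:
        level = theta * level + (1.0 - theta) * value
    return level


def intervention_shift(series, start_gap, gap_len, theta):
    """Level just after the transition gap minus level just before it."""
    before = series[:start_gap]
    after = series[start_gap + gap_len:]
    pre_level = ewma_level(before, theta)          # forward EWMA up to the gap
    post_level = ewma_level(after[::-1], theta)    # backward EWMA down to the gap
    return post_level - pre_level


# Hypothetical ln(P load) series: 30 months at 6.0, 4-month transition, 30 months at 5.3.
log_p = [6.0] * 30 + [5.8, 5.6, 5.5, 5.4] + [5.3] * 30
print(intervention_shift(log_p, start_gap=30, gap_len=4, theta=0.7))
```

Working on logarithms, as in the paper, makes the shift a proportional change; the uncertainty of the two extrapolated levels is not computed here.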

[Figure 1 panels: Saginaw and Kalamazoo; ln(inf. P, kg/day) vs. month, with the ban marked.]
FIGURE 1. SAGINAW AND KALAMAZOO DATA AND FIT OF THE RANDOM WALK-OBSERVATION ERROR MODEL.

[Figure 2 panels: ln(inf. P, kg/day) vs. month; seasonal component vs. month.]
FIGURE 2. ANN ARBOR DATA, FIT OF RANDOM WALK, SEASONAL RANDOM WALK, OBSERVATION ERROR MODEL, AND THE SEASONAL COMPONENT OF THE FIT.

[Figure 3 panels: data and fit vs. month; seasonal component vs. month.]
FIGURE 3. EAST LANSING DATA, FIT OF RANDOM WALK, SEASONAL RANDOM WALK, OBSERVATION ERROR MODEL, AND SEASONAL COMPONENT OF FIT.

Using the EWMA instead of a simple average to estimate the pre- and post-levels arises from the time series model that was used to describe the data series. The mathematical model describes the data as being generated by a process that drifts in the pattern of a random walk and which cannot be observed without error. The model is

$$Y_t = y_t + e_t \qquad (1)$$

$$y_t = y_{t-1} + \varepsilon_t \qquad (2)$$

where $y_t$ is the underlying true, but unmeasurable, value of the variable, $Y_t$ is the observed value of the variable, $e_t$ is the "measurement" error, and $\varepsilon_t$ is the shock causing the random walk (drift). This representation of the data series has substantial intuitive appeal. In particular, it seems easier to accept than completely random variation about a fixed level. In the application discussed in this paper, $Y_t$ will be the mass flux of total phosphorus or the logarithm of this quantity. Equations 1 and 2 can be combined to express the model as

$$Y_t - Y_{t-1} = a_t - \theta a_{t-1} \qquad (3)$$

in which $a_t$ and $a_{t-1}$ are random "shocks" (white noise) related to $e_t$ and $\varepsilon_t$ in Equations 1 and 2. This is the ARIMA(0,1,1) model, which is described in Box and Jenkins (1976) and Harvey (1981). The single parameter of this model, $\theta$, becomes the weighting factor for exponentially weighted moving averages. Details of how this parameter is estimated are given in Pallesen et al. (1985).
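The link between Equations 1 and 2 and the EWMA weight $\theta$ can be illustrated with a Kalman filter for the random walk plus observation error model. A standard result (stated here as background, not taken from the paper) is that the two forms are equivalent when $(1-\theta)^2/\theta = q$, with $q = \operatorname{var}(\varepsilon_t)/\operatorname{var}(e_t)$, and that the steady-state Kalman gain equals $1 - \theta$, so the filtered level becomes an EWMA with weight $\theta$. The sketch assumes known variances and a diffuse start; the names are hypothetical.

```python
import math


def theta_from_q(q):
    """Invertible MA parameter solving (1 - theta)^2 / theta = q, i.e. theta^2 - (q + 2) theta + 1 = 0."""
    b = q + 2.0
    return (b - math.sqrt(b * b - 4.0)) / 2.0


def local_level_filter(obs, q, var_e=1.0, c0=1e6):
    """Kalman filter for Y_t = y_t + e_t, y_t = y_{t-1} + eps_t with var(eps) = q * var_e."""
    w = q * var_e
    m, c = obs[0], c0               # initial level and (large) initial variance
    levels, gains = [], []
    for y in obs:
        r = c + w                   # prior variance of the level
        k = r / (r + var_e)         # Kalman gain
        m = m + k * (y - m)         # filtered level: an EWMA step with weight 1 - k
        c = (1.0 - k) * r           # posterior variance
        levels.append(m)
        gains.append(k)
    return levels, gains


_, gains = local_level_filter([0.0] * 200, q=0.5)
print(gains[-1], 1.0 - theta_from_q(0.5))   # steady-state gain vs. 1 - theta
```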

THE SEASONAL MODEL

Identifying a simple, useful model has been challenging. While the data sets that are seasonal show an annual cycle, this cycle is not smoothly sinusoidal and so it has been difficult to model. The seasonal model that fits the East Lansing data, and also data for Warren and Ann Arbor, Michigan, is

$$Y_t = y_t + s_t + e_t \qquad (4)$$

$$y_t = y_{t-1} + \varepsilon_t \qquad (5)$$

$$s_t = s_{t-12} + f_t \qquad (6)$$

This model incorporates an annual pattern, but it is not a deterministic sine pattern; instead, it allows more flexibility. Observation error is accounted for by Equation 4. Equations 4, 5 and 6 are considered to represent a useful seasonal model. In these equations, $s_t$ is the seasonal effect component at time $t$ and $f_t$ is the random shock causing (or associated with) change in the seasonal effect component. The model is simple and yields level forecasts.
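To see what Equations 4 to 6 generate, the sketch below simulates them month by month. It is illustrative only: the starting level, the twelve starting seasonal effects and the three shock standard deviations are hypothetical inputs, and the shocks are assumed Gaussian. With $f_t = 0$ the seasonal pattern repeats exactly every 12 months; a positive shock variance lets it evolve slowly, which is the extra flexibility over a fixed sine pattern.

```python
import random


def simulate_seasonal(n_months, level0, season0, sd_e, sd_eps, sd_f, seed=0):
    """Simulate Y_t = y_t + s_t + e_t, y_t = y_{t-1} + eps_t, s_t = s_{t-12} + f_t."""
    if len(season0) != 12:
        raise ValueError("season0 must hold the 12 seasonal effects s_{-12}..s_{-1}")
    rng = random.Random(seed)
    level = level0
    seasonal = list(season0)
    observed, levels, seas = [], [], []
    for _ in range(n_months):
        level += rng.gauss(0.0, sd_eps)                # Equation 5: random walk level
        s_t = seasonal[-12] + rng.gauss(0.0, sd_f)     # Equation 6: seasonal random walk
        seasonal.append(s_t)
        observed.append(level + s_t + rng.gauss(0.0, sd_e))  # Equation 4: observation error
        levels.append(level)
        seas.append(s_t)
    return observed, levels, seas


obs, lev, seas = simulate_seasonal(48, 5.0, [0.1 * k for k in range(12)], 0.05, 0.02, 0.01)
print([round(v, 2) for v in obs[:12]])
```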

The smoothness priors-state space approach of Kitagawa and Gersch (1984), Gersch and Kitagawa (1983), and Kitagawa (1981), after modification to make it fully Bayesian, was used to fit the model given by Equations 4, 5, and 6 and to estimate the intervention effect. Kalman filter equations are used to carry out approximate maximum likelihood estimation. The calculations required for analysis of the nonseasonal and seasonal data sets were readily programmed in APL and carried out on a microcomputer.

Models of the ARIMA family containing the operator $(1 - \sqrt{3}B + B^2)$, which defines a sinusoidal pattern with a 12-month cycle, were also tried, for example

$$(1 - \sqrt{3}B + B^2)\,Y_t = (1 - \theta B)\,a_t \qquad (7)$$

This deterministic sinusoidal pattern was not satisfactory.
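Why the operator $(1 - \sqrt{3}B + B^2)$ describes a 12-month cycle can be checked numerically: since $2\cos(2\pi/12) = \sqrt{3}$, it annihilates any sinusoid of period 12, whatever its amplitude and phase, but not other periods. The sketch below applies the operator to two synthetic sinusoids; the helper names are hypothetical.

```python
import math


def apply_seasonal_operator(series):
    """(1 - sqrt(3) B + B^2) x_t = x_t - sqrt(3) x_{t-1} + x_{t-2}, for t >= 2."""
    r3 = math.sqrt(3.0)
    return [series[t] - r3 * series[t - 1] + series[t - 2] for t in range(2, len(series))]


def sinusoid(period, n, amplitude=1.0, phase=0.0):
    """Samples amplitude * cos(2 pi t / period + phase) for t = 0..n-1."""
    return [amplitude * math.cos(2.0 * math.pi * t / period + phase) for t in range(n)]


twelve = apply_seasonal_operator(sinusoid(12, 48, amplitude=2.0, phase=0.7))
six = apply_seasonal_operator(sinusoid(6, 48))
print(max(abs(v) for v in twelve))   # rounding-level: the 12-month cycle is removed
print(max(abs(v) for v in six))      # clearly non-zero: a 6-month cycle is not
```

The rigidity shown here, an exact fixed-shape sine wave, is what the seasonal random walk of Equation 6 relaxes.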

RESULTS OF INTERVENTION ANALYSIS

The phosphate detergent ban in Michigan went into force on October 1, 1977. The data sets start in January 1975 and run for seven years. The Wisconsin history is more complicated. A ban was put into effect on July 1, 1979. This ban was lifted three years later, on July 1, 1982, and was reimposed on January 1, 1984.

TABLE 1.
ESTIMATED INFLUENT PHOSPHORUS LOAD SHIFTS DUE TO PHOSPHATE DETERGENT REGULATION IN WISCONSIN AND MICHIGAN

Treatment Plant       Shift due to ban¹    Shift due to ban lift²
                      (kg/cap-yr)          (kg/cap-yr)
Algoma, WI            -1.061               +0.410
Brookfield, WI        -0.133               +0.179
Burlington, WI        -0.150               +0.254
Clintonville, WI      -0.094               +0.065
Kewaskum, WI          -0.251               –
Omro, WI              -0.137               +0.242
Racine, WI            -0.585               +0.277
Milwaukee, WI³        -0.522               +0.193
Grand Rapids, MI      -0.247               NA⁴
Jackson, MI           -0.356               NA
Kalamazoo, MI         -0.833               NA
Lansing, MI           +0.201               NA
Midland, MI           -0.881               NA
Saginaw, MI           -0.508               NA
East Lansing, MI      -0.257               NA
Warren, MI            -1.17                NA
Ann Arbor, MI         -0.581               NA

¹October 1, 1977, for Michigan; July 1, 1979, for Wisconsin.
²July 1, 1982, for Wisconsin only; no ban lift in Michigan.
³Result reported for the combined Jones Island and South Shore plants; the ARIMA(0,1,1) model also fits each plant individually.
⁴NA: not applicable since there was no ban lift in Michigan.

[Figure 4 panel: "Summary of Intervention Analysis Results"; groups Michigan 1977, Wisconsin 1979, Wisconsin 1982; vertical axis in kg.]
FIGURE 4. BOX AND WHISKER PLOT OF THE ESTIMATED EFFECTS OF THE BAN IN MICHIGAN AND THE BAN AND BAN-LIFT IN WISCONSIN ON WASTEWATER TREATMENT PLANT INFLUENT PHOSPHORUS LOADS.

The data analyzed were monthly averages, expressed as pounds of total P per day. Calculations were done after taking the natural logarithm, since experience showed that this stabilizes the variance. The estimated shifts in influent phosphorus mass loading for the sixteen plants for which this model was adequate are given in Table 1. These results have been converted from the logarithmic to the original metric and then converted to kg/capita-year. The results for East Lansing, Warren, and Ann Arbor were estimated using the seasonal model. All other results come from the simple random walk/observation error model.

Figure 4 is a box and whisker plot of the per capita shifts listed in Table 1. This plot shows the considerable variation between cities. It also indicates that the effect of a detergent ban is about 0.3 kg/capita-year.
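The group summaries behind Figure 4 can be reproduced approximately from Table 1. The sketch below transcribes the per capita shifts and computes a five-number summary per group with Python's `statistics` module; the quartile method (the module's default "exclusive" method) is an assumption and may differ from the one used to draw the original box plot. Kewaskum has no ban-lift entry and is omitted from that group.

```python
from statistics import median, quantiles

# Per capita shifts (kg/cap-yr) transcribed from Table 1.
michigan_1977 = [-0.247, -0.356, -0.833, 0.201, -0.881, -0.508, -0.257, -1.17, -0.581]
wisconsin_1979 = [-1.061, -0.133, -0.150, -0.094, -0.251, -0.137, -0.585, -0.522]
wisconsin_1982 = [0.410, 0.179, 0.254, 0.065, 0.242, 0.277, 0.193]  # ban lift; no Kewaskum value


def five_number_summary(values):
    """(min, Q1, median, Q3, max) using statistics.quantiles' default method."""
    q1, q2, q3 = quantiles(values, n=4)
    return min(values), q1, q2, q3, max(values)


for name, shifts in [("Michigan 1977", michigan_1977),
                     ("Wisconsin 1979", wisconsin_1979),
                     ("Wisconsin 1982", wisconsin_1982)]:
    print(name, [round(v, 3) for v in five_number_summary(shifts)])
```

The group medians come out at about -0.51, -0.20 and +0.24 kg/cap-yr, in the same range as the overall figure of about 0.3 kg/capita-year quoted in the text.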

CONCLUSIONS

An ARIMA(0,1,1) time series model has been used successfully on about sixty percent of the data sets analyzed to date. A seasonal model has been successfully applied to three data sets. The model may possibly be adequate for the remaining sets, but the analysis is not complete at this time. The effect of a detergent phosphate ban on influent wastewater treatment plant P loads appears to be about 0.3 kg/cap-yr, as of 1982.

REFERENCES

Box, G.E.P. and Jenkins, G.M., 1976. Time Series Analysis: Forecasting and Control. Revised edition, Holden-Day, Oakland, California.

Box, G.E.P. and Tiao, G.C., 1965. A change of level of a nonstationary time series. Biometrika, 52: 181-192.

Box, G.E.P. and Tiao, G.C., 1975. Intervention analysis with applications to economic and environmental problems. Jour. Amer. Stat. Assoc., 70: 70-79.

Gersch, W. and Kitagawa, G., 1983. The prediction of time series with trends and seasonalities. Jour. of Bus. and Econ. Stat., 1: 253-264.

Harvey, A.C., 1981. Time Series Models. Halsted Press, New York.

Kitagawa, G., 1981. A nonstationary time series model and its fitting by a recursive filter. Jour. Time Series Analysis, 2: 103-116.

Kitagawa, G. and Gersch, W., 1984. A smoothness priors-state space modelling of time series with trend and seasonality. Jour. Amer. Stat. Assoc., 79: 378-389.

Pallesen, L., Berthouex, P.M. and Booman, K.A., 1985. Environmental intervention analysis: Wisconsin's ban on phosphate detergents. Water Research, 19: 353-362.

SEDIMENT RESPONSES DURING STORM EVENTS IN SMALL FORESTED WATERSHEDS

W.A. RIEGER and L.J. OLIVE

Department of Geography, Royal Military College, ACT, Australia

ABSTRACT

Measurements of suspended sediment concentration and discharge during storm events are examined to determine the possible patterns in response of sediment to flow in five small forested watersheds. The examination of sediment response is carried out in two contexts: (a) the response of suspended sediment to total discharge (baseflow and quickflow or stormflow), or in the framework commonly used for sediment prediction modelling; (b) the response of suspended sediment to quickflow, where quickflow is postulated as a possible mechanism of sediment delivery to the channel. In both contexts, hysteresis diagrams are first used to determine the broad patterns between suspended sediment concentration and flow in the time domain. Results indicate that seven different response types are operating in the watersheds. Spectral analysis is then used on the storm event data in an attempt to isolate possible factors which may be causing the different response types. The temporal and spatial variations found to be operating in the watersheds have important implications both for the design of monitoring networks and the associated water sampling techniques, and for the commonly used linear predictive methods of estimating sediment loads.

1.

INTRODUCTION

Suspended sediment concentrations (mg l⁻¹) in stream channels have been used by researchers as measures of rates of erosion and soil loss from drainage basins. During storm events in a basin, complex erosional processes occur over parts of the slopes of the basin, and the resultant of these processes is sediment delivery to the channel (Walling, 1983). By monitoring both discharge (m³ s⁻¹) and suspended sediment concentrations at a point in the channel, a measure of sediment delivery can be obtained for the corresponding watershed area.

While such data can be used for bulk estimates of erosion, they can also be used for the determination of the behaviour of suspended sediment concentrations, or sediment responses, during storm events. Once the responses are known and fully understood, they can form the basis for water quality research on suspended sediment. Networks can be designed so that all important aspects of sediment response are monitored, or prediction models can be developed based on the known response patterns. In the past, however, the bulk of research on sediment response has been hampered by the relatively simplistic models of suspended sediment behaviour. The models are based on a simple linear relationship between discharge and suspended sediment and were originally developed for the prediction of suspended sediment in the form of a rating curve. The curve takes the form:

$$C = aQ^b \qquad (1)$$

where $C$ is suspended sediment concentration, $Q$ is discharge, and $a$ and $b$ are constants for a particular watershed. The curves are estimated from a sample of field data consisting of a wide range of discharges and the corresponding concentrations, using least squares regression on the logarithmically transformed data. Though rating curves are convenient to use, their simple linear framework gives little indication of the dynamic behaviour of the relationship between concentration and discharge. Slope erosion, and thus sediment delivery, is a storm event based phenomenon with important temporal variations occurring throughout the storm. Using a simple linear model to describe the behaviour of sediment delivery ignores the sequential nature of the variables $C$ and $Q$, and the fixed coefficients of the rating curve do not allow for possible variations in the response of suspended sediment concentration at different scales or levels of discharge.
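The rating-curve fit described above, least squares on $\log C = \log a + b \log Q$, can be sketched in plain Python. This is a minimal illustration with hypothetical function names; it applies no correction for the bias that arises when back-transforming from logarithms.

```python
import math


def fit_rating_curve(q, c):
    """Fit C = a * Q**b by ordinary least squares on log C = log a + b log Q."""
    if len(q) != len(c) or len(q) < 2:
        raise ValueError("need matching discharge and concentration samples")
    lx = [math.log(v) for v in q]
    ly = [math.log(v) for v in c]
    n = len(lx)
    mx, my = sum(lx) / n, sum(ly) / n
    sxx = sum((u - mx) ** 2 for u in lx)
    sxy = sum((u - mx) * (v - my) for u, v in zip(lx, ly))
    b = sxy / sxx                      # slope in log-log space
    a = math.exp(my - b * mx)          # intercept back-transformed
    return a, b


# Synthetic data lying exactly on C = 3 Q^1.5.
q = [0.1, 0.5, 1.0, 2.0, 5.0]
print(fit_rating_curve(q, [3.0 * v ** 1.5 for v in q]))
```

Because the curve is fitted to all $(Q, C)$ pairs at once, the same discharge always maps to the same concentration, which is exactly why it cannot represent the rising and falling limb differences discussed in the text.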

In the following discussion, these two important aspects of suspended sediment response are examined for data obtained during storm events. The sequential nature of sediment response to both total discharge and quickflow (stormflow) is considered in terms of hysteresis diagrams, which give an indication of the behaviour of the variables in the time domain. Possible scale variations between suspended sediment and discharge are then considered by a transfer to the frequency domain, via spectral analysis. Conclusions are then drawn concerning the implications of varying response patterns for water quality monitoring.

2.

STUDY AREA AND DATA

Data for the analysis are from five small forested watersheds in south-eastern New South Wales, with all five streams flowing into the Wallagaraugh River. The watersheds are adjacent to one another and vary in size from 76 ha to 225 ha. Within each watershed, a 140° V-notch weir has been installed, stage was measured with a Rimco Sumner Mark II float recorder, and water samples were taken with a Gamet automatic water sampler. The water samplers take point samples and are float-switch operated. Concentration of suspended sediment for each water sample was determined using a membrane filtration technique. The present analysis is based on the period July 1977 to June 1979. During this period, 20 storm events, with rainfalls varying from 12 to 339 mm, were sampled in the five watersheds. Due to equipment failure and, in some instances, little or no sediment response in the watersheds during storm events, a total of 39 individual storm hydrographs have been analysed. Both the discharge series and the suspended sediment concentration series for these 39 storm events were interpolated to one-hour time intervals before the analysis was carried out.
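Resampling an irregular record onto whole hours can be sketched as below. The text does not say which interpolation method was used; linear interpolation between neighbouring samples is an assumption of this example, as are the function name and the requirement that sample times (in hours) be strictly increasing.

```python
import math
from bisect import bisect_right


def to_hourly(times_h, values):
    """Linearly interpolate an irregularly sampled series onto whole hours within its span."""
    if len(times_h) != len(values) or len(times_h) < 2:
        raise ValueError("need at least two (time, value) pairs")
    out = []
    for h in range(math.ceil(times_h[0]), math.floor(times_h[-1]) + 1):
        j = bisect_right(times_h, h)          # index of the first sample strictly after h
        if j == len(times_h):                 # h coincides with the last sample
            out.append((h, values[-1]))
            continue
        t0, t1 = times_h[j - 1], times_h[j]
        w = (h - t0) / (t1 - t0)
        out.append((h, values[j - 1] + w * (values[j] - values[j - 1])))
    return out


# Hypothetical float-switch samples at 0, 1.5 and 4 hours.
print(to_hourly([0.0, 1.5, 4.0], [0.0, 3.0, 8.0]))
```

Putting discharge and concentration on a common hourly grid is what allows the two series to be paired point by point in the hysteresis plots and in spectral analysis.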

3.

TIME DOMAIN ANALYSIS

3.1 Suspended Sediment Response to Discharge

To obtain some idea of the broad behaviour between suspended sediment concentration and total discharge (baseflow and quickflow), hysteresis plots were used. These plots are simply a scatter diagram for the two variables, with the sequential aspect of the data denoted by joining adjacent points in the time series with a straight line. Before the plots were constructed, a three-point moving average filter was applied to the two series, thus removing high frequency components so that only broad patterns of sediment response were indicated by the hysteresis diagrams.

The hysteresis plots for the data from the 39 storm events indicated that seven different suspended sediment response types were operating in the watersheds (Figure 1). A full description of each of these responses is given by Olive and Rieger (1985), and a brief summary of their characteristics is as follows:

(a) Single rise storm events with sediment lead, or a simple clockwise loop, occurred in four of the watersheds and made up 23% of the storm events analysed.
(b) Single rise storm events with sediment lag, or a counter-clockwise loop, occurred in three watersheds and made up 8% of total events.
(c) Single rise with the sediment and discharge peaks in phase occurred in two watersheds and made up 5% of the total events.
(d) Multiple rise with sediment lead and sediment depletion occurred in two watersheds and made up 10% of total events.
(e) Multiple rise with sediment lag and sediment depletion occurred in two watersheds and made up 5% of total events.
(f) Multiple rise with sediment lead and lag occurred in three watersheds and made up 8% of total events.
(g) Responses in which there was no identifiable pattern occurred in four watersheds and made up 41% of the total events.

These results are made more complicated when individual watersheds and storm events are taken into consideration. Over the two-year study period, particular streams demonstrated up to five different response types during storm events, and no stream showed the same response type for all storms. There were also major differences in response type among the five watersheds for particular storm events. The dominance of response types with no identifiable pattern (41% of the total storms analysed) points further to the complexity of the behaviour of suspended sediment concentrations.

[Figure 1 panels: SINGLE RISE: (a) Sediment lead, (b) Sediment lag, (c) Sediment-discharge correlation; MULTIPLE RISE: (d) Sediment lead, (e) Sediment lag, (f) Sediment lead-lag, (g) No recognisable pattern; horizontal axis DISCHARGE (cumecs).]
Figure 1: Sediment response types for storm events.

3.2 Suspended Sediment Response to Quickflow

The examination of the response of suspended sediment to quickflow, or storm flow, is in the realm of process studies in that the complexity of the sediment delivery problem is reduced to a form where quickflow is postulated as a possible delivery mechanism (Walling and Webb, 1982). Since the source of sediment is within a watershed and the sediment is transported to the channel by surface runoff and interflow, quickflow has appeal as a possible delivery mechanism. Baseflow separation was carried out on the discharge series for storm events using a recursive digital filter proposed by Lyne and Hollick (1979). The filtering process takes the form:

$$Q_q(t) = a\,Q_q(t-1) + \frac{1+a}{2}\,\bigl[Q(t) - Q(t-1)\bigr] \qquad (2)$$

where Qq(t) is the quickflow component, Q(t) is total streamflow and a is the filtering coefficient. The values used for the coefficient a were in the range 0.7 to 0.9, and the phase characteristics of the series were preserved with a two-pass forward and backward application of the filter. Hysteresis plots were generated for suspended sediment concentration and quickflow in the same fashion as outlined in Section 3.1. The results for the 39 storm events are as follows:

(a) Those responses which showed identifiable patterns, or types (a) through (f) in Section 3.1, showed similar behaviour in the sediment responses to quickflow.
(b) Approximately half of the storm events which showed no identifiable pattern in the sediment-discharge plots had identifiable patterns in sediment response to quickflow.

Thus in using quickflow as a possible sediment delivery mechanism, sediment responses are almost as complex as were the responses to total discharge. Individual watersheds displayed differing responses throughout the study period, and there was variation in response among the five watersheds to particular storm events. The major difference with response to quickflow was a reduction of the responses showing no identifiable pattern to 21% of the total storms studied.
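The smoothing, baseflow separation and loop reading described in Sections 3.1 and 3.2 can be sketched in Python as follows. This is a minimal illustration under stated assumptions, not the authors' code. The hysteresis loops in the paper were classified by eye; using the signed (shoelace) area to label a loop clockwise or counter-clockwise is an added, hypothetical aid that captures only the net direction, so multiple-rise and patternless responses (types (d) to (g)) need more than this single number. The paper also does not state how equation (2) was initialised, whether quickflow was constrained, or how the backward pass was combined with the forward pass. The sketch assumes zero initial quickflow, quickflow clamped between zero and the flow being filtered, and a backward pass applied to the baseflow left by the forward pass (a commonly used convention). All function names and the example storm are hypothetical.

```python
from typing import List, Sequence, Tuple


def moving_average_3(x: Sequence[float]) -> List[float]:
    """Centred three-point moving average; the first and last points are dropped."""
    return [(x[i - 1] + x[i] + x[i + 1]) / 3.0 for i in range(1, len(x) - 1)]


def lyne_hollick_pass(q: Sequence[float], a: float) -> List[float]:
    """One pass of equation (2), returning the quickflow component.
    Assumed: quickflow starts at zero and is clamped to [0, q(t)]."""
    quick = [0.0] * len(q)
    for t in range(1, len(q)):
        qq = a * quick[t - 1] + (1.0 + a) / 2.0 * (q[t] - q[t - 1])
        quick[t] = min(max(qq, 0.0), q[t])
    return quick


def separate_baseflow(q: Sequence[float], a: float = 0.9) -> Tuple[List[float], List[float]]:
    """Forward pass on total flow, then a backward pass on the remaining
    baseflow. Returns (quickflow, baseflow), which sum to q."""
    quick1 = lyne_hollick_pass(q, a)
    base1 = [qt - qf for qt, qf in zip(q, quick1)]
    # Running the second pass backward in time is intended to offset the
    # phase shift introduced by the forward pass.
    quick2 = lyne_hollick_pass(base1[::-1], a)[::-1]
    base2 = [b - qf for b, qf in zip(base1, quick2)]
    return [qt - b for qt, b in zip(q, base2)], base2


def loop_signed_area(x: Sequence[float], y: Sequence[float]) -> float:
    """Shoelace signed area of the path through (x[i], y[i]), closed back to
    its first point; negative means clockwise with x across and y up."""
    n = len(x)
    return 0.5 * sum(x[i] * y[(i + 1) % n] - x[(i + 1) % n] * y[i] for i in range(n))


def loop_direction(x: Sequence[float], y: Sequence[float], tol: float = 1e-9) -> str:
    """Net direction of a discharge (x) versus concentration (y) loop."""
    area = loop_signed_area(x, y)
    if area < -tol:
        return "clockwise"          # sediment lead
    if area > tol:
        return "counter-clockwise"  # sediment lag
    return "no net loop"


# Hypothetical storm in which concentration peaks before discharge.
discharge = [1.0, 2.0, 4.0, 6.0, 5.0, 3.0, 2.0, 1.0]
concentration = [1.0, 5.0, 6.0, 4.0, 3.0, 2.0, 1.0, 1.0]
print(loop_direction(moving_average_3(discharge), moving_average_3(concentration)))  # -> clockwise

# Quickflow in place of total discharge, as in the Section 3.2 plots.
quickflow, baseflow = separate_baseflow(discharge, a=0.8)
print(loop_direction(moving_average_3(quickflow), moving_average_3(concentration)))
```

Using `quickflow` rather than `discharge` as the horizontal variable, as in the last line, corresponds to the quickflow hysteresis plots of Section 3.2; `a=0.8` is simply one value from the 0.7 to 0.9 range reported above.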

4.

FREQUENCY DOMAIN ANALYSIS

The object of transferring to the frequency domain via spectral analysis is to isolate the important frequency components, or scales, which might be part of the process operating between discharge and suspended sediment concentration. Frequency domain analysis also has the possible benefit of reducing the complexity of sediment response indicated by the time domain analysis in the above discussion. The Maximum Entropy Method (MEM) was used for the calculation of spectral estimates. MEM was preferred to the more traditional methods of spectral estimation (Jenkins and Watts, 1968) because it can be used for short time series (Ulrych and Bishop, 1975), which is the case for storm event data. Most of the 39 events used here have fewer than 100 observations in the discharge and suspended sediment concentration series. As the frequency domain analysis is to be used for the study of the process operating between discharge and concentration, the data for each of these variables were combined to give a new variable which was representative of that process. This new variable was generated by calculating the slope between concentration and discharge for adjacent points in time, giving a new series which measures the changing sequential relationship between the two variables. In effect, the new series represents the slope angle of adjacent points along the hysteresis plot for a storm event.
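A minimal sketch of the two steps just described: the slope-angle series, and an MEM spectrum computed with Burg's algorithm, a standard way of obtaining maximum entropy (all-pole) spectral estimates discussed by Ulrych and Bishop (1975). The paper does not state the autoregressive order, the sampling interval, or how a zero change in discharge was handled; the `order` argument, the default `dt = 1` hour (consistent with the frequency axis of Figure 2 ending at 0.5 cycles hr-1) and the plus or minus pi/2 convention for a zero discharge change are assumptions, and all function names are hypothetical.

```python
import cmath
import math
from typing import List, Sequence, Tuple


def slope_angle_series(q: Sequence[float], c: Sequence[float]) -> List[float]:
    """Angle (radians) of the segment joining adjacent points on the
    discharge-concentration hysteresis plot. Assumed convention: a zero
    change in discharge maps to +/- pi/2, or 0 if concentration is unchanged."""
    out = []
    for t in range(1, len(q)):
        dq, dc = q[t] - q[t - 1], c[t] - c[t - 1]
        if dq == 0.0:
            out.append(math.copysign(math.pi / 2, dc) if dc != 0.0 else 0.0)
        else:
            out.append(math.atan(dc / dq))
    return out


def burg(x: Sequence[float], order: int) -> Tuple[List[float], float]:
    """Burg estimate of the AR polynomial [1, a1, ..., ap] and prediction
    error power e for the model x_t + a1 x_{t-1} + ... + ap x_{t-p} = noise."""
    n = len(x)
    mean = sum(x) / n
    f = [v - mean for v in x]  # forward prediction errors
    b = f[:]                   # backward prediction errors
    a = [1.0]
    e = sum(v * v for v in f) / n
    for m in range(order):
        num = sum(f[t] * b[t - 1] for t in range(m + 1, n))
        den = sum(f[t] ** 2 + b[t - 1] ** 2 for t in range(m + 1, n))
        if den == 0.0:
            break
        k = -2.0 * num / den  # reflection coefficient, |k| <= 1
        ext = a + [0.0]
        a = [ext[i] + k * ext[m + 1 - i] for i in range(m + 2)]  # Levinson update
        for t in range(n - 1, m, -1):  # descending, so b[t - 1] is still the old value
            ft, bt = f[t], b[t - 1]
            f[t] = ft + k * bt
            b[t] = bt + k * ft
        e *= 1.0 - k * k
    return a, e


def mem_spectrum(a: Sequence[float], e: float, freqs: Sequence[float],
                 dt: float = 1.0) -> List[float]:
    """All-pole (MEM) spectrum e*dt / |A(f)|^2, f in cycles per unit of dt."""
    out = []
    for fr in freqs:
        z = sum(ak * cmath.exp(-2j * math.pi * fr * k * dt) for k, ak in enumerate(a))
        out.append(e * dt / abs(z) ** 2)
    return out
```

For one storm, `burg(slope_angle_series(q, c), order)` followed by `mem_spectrum(...)` on a grid from 0 to 0.5 cycles hr-1 gives a spectrum of the kind summarised in Figure 2. With series of fewer than 100 points the chosen order strongly affects the resolution of MEM peaks, so it would need to be selected with care.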

The generalised results of the spectral analysis of the 39 storm events are shown in Figure 2 and can be summarised as follows:

(a) All storm events are dominated by a low frequency component which is likely a manifestation of the broad loop in the hysteresis plots.

Figure 2: Generalised spectra for storm event data (curves for identifiable and unidentifiable sediment responses; horizontal axis: frequency, 0 to 0.5 cycles hr-1).

(b) All storm events showed a high frequency component in the range 0.37 to 0.50 cycles hr-1. This frequency likely corresponds to the fluctuations about the broad hysteresis loop, and was removed by the moving average filter applied to the data when the hysteresis plots were generated in Section 3.
(c) Those storm events which showed no identifiable pattern in the time domain were differentiated from the events with recognisable patterns by a mid-frequency component in the range 0.20-0.33 cycles hr-1.
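Item (b) attributes the removal of the high frequency band to the three-point moving average of Section 3.1. Assuming hourly sampling (dt = 1 h, consistent with the frequency axis of Figure 2 ending at 0.5 cycles hr-1 but not stated in the text), the gain of that centred filter is H(f) = [1 + 2 cos(2 pi f dt)] / 3. The hypothetical helper below evaluates it. The gain is 1 at zero frequency and falls to zero at 1/3 cycle hr-1. In the 0.37 to 0.50 cycles hr-1 band its magnitude lies between about 0.12 and 0.33, so that band is strongly attenuated in the smoothed hysteresis plots.

```python
import math


def ma3_gain(f: float, dt: float = 1.0) -> float:
    """Frequency response H(f) = [1 + 2 cos(2 pi f dt)] / 3 of the centred
    three-point moving average; f in cycles per hour, dt in hours."""
    return (1.0 + 2.0 * math.cos(2.0 * math.pi * f * dt)) / 3.0


# Hourly sampling (dt = 1 h) is an assumption, not a value from the paper.
for f in (0.0, 0.1, 0.2, 1 / 3, 0.37, 0.5):
    print(f"{f:.3f} cycles/hr: |H| = {abs(ma3_gain(f)):.3f}")
```

A negative H(f), as between 1/3 and 0.5 cycles hr-1, means the surviving part of that component is also inverted in sign.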

5.

CONCLUSIONS

The examination of the behaviour of suspended sediment concentrations in stream channels during storm events has indicated a complex set of responses which have some important implications for water quality monitoring and prediction models. In the case of monitoring, both the temporal and the spatial variation in sediment response have to be considered in any sampling network. Since sediment response varies between watersheds for individual storms, sampling a particular watershed and extrapolating the results to adjacent watersheds may prove to be invalid. The actual sampling rate at points within the network also needs careful consideration. For example, rate-of-rise water samplers assume that sediment response to discharge is a linear function, and would not correctly sample responses with sediment leads or lags. In the case of prediction models for suspended sediment, it is clear that the commonly used rating curve does not capture the true behaviour between concentration and discharge: the linear response assumed by the rating methodology occurred in only 5% of the storm events studied. However, the spectra of the storm event series showed some similarity in dominant frequency components for storm events which had such varied responses in the time domain. All events contained a high and a low frequency component, and events which showed no identifiable pattern in the time domain contained an additional middle frequency component.
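Why a rating curve cannot follow the loops of Figure 1 can be shown with a small sketch. The paper does not give the form of the rating curve it has in mind; the sketch assumes the widely used power law C = alpha * Q^beta, fitted by ordinary least squares on logarithms (a straight line in log-log space). Because any such curve is a single-valued function of discharge, it returns the same concentration on the rising and falling limbs whenever discharge is the same, so it cannot reproduce a sediment lead or lag. Function names and data are hypothetical.

```python
import math
from typing import List, Sequence, Tuple


def fit_rating_curve(q: Sequence[float], c: Sequence[float]) -> Tuple[float, float]:
    """Least-squares fit of log C = log(alpha) + beta * log Q.
    Returns (alpha, beta); all discharges and concentrations must be positive."""
    lx = [math.log(v) for v in q]
    ly = [math.log(v) for v in c]
    n = len(lx)
    mx, my = sum(lx) / n, sum(ly) / n
    sxx = sum((x - mx) ** 2 for x in lx)
    sxy = sum((x - mx) * (y - my) for x, y in zip(lx, ly))
    beta = sxy / sxx
    alpha = math.exp(my - beta * mx)
    return alpha, beta


def predict(alpha: float, beta: float, q: Sequence[float]) -> List[float]:
    """Rating-curve concentration for each discharge value."""
    return [alpha * v ** beta for v in q]


# Hypothetical clockwise (sediment-lead) storm: Q = 2 on both limbs.
q = [1.0, 2.0, 3.0, 2.0, 1.0]
c = [1.0, 3.0, 2.0, 1.0, 1.0]
alpha, beta = fit_rating_curve(q, c)
fitted = predict(alpha, beta, q)
print(c[1], c[3])              # observed on rising and falling limbs: 3.0 1.0
print(fitted[1] == fitted[3])  # True: the curve gives one value for both limbs
```

Only a response in which concentration and discharge rise and fall in phase, type (c), is compatible with this single-valued assumption.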

ACKNOWLEDGEMENTS

The authors would like to thank the Australian Research Grants Scheme and the Forestry Commission of New South Wales for their assistance.


REFERENCES

Jenkins, G.M. and Watts, D.G., 1968: Spectral Analysis and its Applications, Prentice-Hall, Englewood Cliffs, N.J.

Lyne, V.D. and Hollick, M., 1979: Stochastic time-varying rainfall-runoff modelling, Hydrology and Water Resources Symposium, 89-92, Institution of Engineers, Australia.

Olive, L.J. and Rieger, W.A., 1985: Variation in suspended sediment concentration during storms in five small catchments in south east New South Wales, Australian Geographical Studies, 23, 38-51.

Ulrych, T.J. and Bishop, T.N., 1975: Maximum entropy spectral analysis and autoregressive decomposition, Reviews of Geophysics and Space Physics, 13, 183-200.

Walling, D.E., 1983: The sediment delivery problem, Journal of Hydrology, 69, 209-237.

Walling, D.E. and Webb, B.W., 1982: Sediment availability and the prediction of storm period sediment yields, IAHS Publication No. 137, 327-340.

INDEX

A
Acid rain, 64
Analysis of data, 261
Appalachian Plateau, 133
Atlantic Canada, 53
Atmospheric inputs, 53
Autoregressive process, 404

B
Bhattacharyya's measure, 266
Biological monitoring, 261
Biomagnification of a contaminant, 233
Bray-Curtis index, 247
British Columbia, 434

C
Canadian Shield, 133
Censored water quality data, 137
Change point problems, 381
  Detection and estimation, 385
  Two regime transition model, 381
Chemical transport, 345
Chi-square goodness-of-fit test, 198, 215
Chloride loading, 469
Chlorophyll a, 273
Chromatographic and colorimetric evaluation, 64
Cluster analysis, 100, 133, 199
Coliform counts, 217
Coliform monitoring, 183
Collection of water samples, 196
Computerization, 418
  Aliasing, 419
  Commodore 64 (C-64), 445
  Computer configuration, 426
  Conversion to, 419
  Coordination and control, 424
  Cost-effective design, 444
  Data acquisition, 418, 443
  Interfacing system, 448
  Microcomputer based, 445
  Programming considerations, 429
  Resolution, 420
  Signal conditioning system, 446
  Software development, 453
  Systems approach, 418
Contaminant analysis: in aquatic biota, 231
Correlation analysis, 199
Covariate analysis, 303
Cross correlation, 306

D
Data acquisition, 18
Data uncertainty, 18
  Implications of, 25
  Estimation, 21
  Sources, 18, 28
Data utilization, 18
Design quality assurance, 81
Detecting changes in regression, 382
Detecting parameter changes, 384
Discrimination techniques, 326
Dispersion patterns of bacteria, 196
Distributional parameters, 137
Dissolved organic carbon, 61

E
Ecological monitoring, 30
  La Grande Complex, Quebec, 30
England and Wales, 221
Estimate verification, 155
Estimation, 405
  Bayes' estimators, 405
  Comparison, 411
  Of extreme values, 173
  Of parameters, 142
  With classification, 148
Euclidean distance index, 248
Eutrophication, 273

F
Fermentation tube technique, 184
France, 194
Frequency analysis, 177, 187, 495
Frequency component identification, 388

G
Gamma Markov processes, 293
  Input properties, 294
  Linear regressive model, 297
  Weighted sum of seasonal Gamma variables, 299
Global variance, 335, 339
Gower Similarity Coefficient Matrix, 33, 247
Great Lakes Water Quality Agreement, 273
Grouping procedures, 5

H
Hazen Plot: lognormal frequency distribution, 186
Heterogeneity characterization, 1
Hierarchical clustering analysis, 30, 33

I
Ion Chromatography (IC), 64

K
Kendall's Tau statistic, 348

L
Laboratory analysis, 21
Lag one serial correlation, 347
Lake Erie, 2
Lake Ontario, 79, 99, 469
  Surveillance program, 274
Limnological data set: statistical assessment, 363
Log-log Ancova model, 232
Long term water quality records, 388

M
Mann-Whitney test, 333
Mass discharge estimation, 345
Maximum entropy method, 495
Mean value estimation, 187
Membrane filtration technique, 184, 222
Methyl thymol blue procedure (MTB), 54, 64
Michigan, 486, 487
Microbiological water quality, 221
  Assessment, 221
  Standards, 222
Monitoring activities, 19
  Bacterial density, 194
  Coastal stream, 433
  High frequency, 433
Morisita index (modified), 247
Most probable number (MPN), 222
Multiple standard addition (MSA), 65
Multiple tube (dilution) method, 222
Multispecies studies, 261
Multivariate methods, 30, 33

N
National assessment program, 95
Natural variability, 158
Negative binomial distribution, 215
New South Wales, Australia, 492
Niagara River, 8, 461, 469
Network design, 95
Nonparametric methods, 333, 383
Numerical integration, 469
Nutrient concentration, 163, 168

P
Parameter estimation, 198
Phosphorus, 302, 479
  Detergent ban, 364, 486
  Estimation, 460
  Loading, 363, 460
  Loading shifts, 479
  Monthly mean Non-Seasonal, 480
  Time Series: Seasonal, 484
  Time Series: Zero correlation model, 462
Phytoplankton biomass measurement, 274
Poisson distribution, 222
Poisson model, 217
Precipitation, 51
Principal Components Analysis, 30, 33, 101, 124

Q
Quantification of non-tidal temporal variability, 161
Quebec Rivers, 117

R
Random walk/observation error, 480
Randomization procedures, 261
Ratio of isotopes of elements in biogenic material, 237
Ratio of sensitive species to resistant species, 238
Regression models, 318
  Autoregressive residuals, 318
  First order polynomial, 319
  Polynomial plus centred period component, 320
Relationship between a variable and its position, 10
Relative standard deviation, 24
Reservoir: La Grande, 30
River acidity, 44
Root mean square error, 335

S
St. Lawrence River, 93
Sample collection, 21
Sampling properties, 246
Satterthwaite's approximation, 331
Seasonal variability, 99, 329
Sequential sampling, 200
Similarity indices, 247, 262
Simulation technique, 248
Snedecor's F-test, 331
Spatial autocorrelation methods, 8
Spatial correlations, 175
Spatial distribution, 100
Spatial variability, 117
Spectral Analysis, 388
Stander's measure, 263
Stochastic transfer function, 303
Student's t-test, 327
Sulphate determination, 53, 64
Suspended sediment response, 490

T
Temperate estuary, 158
Time domain analysis, 495
Time series analysis, 44, 176, 388, 302, 335, 347, 463
  Iterative linear interpolation, 337
  Markovian, 335
  Test comparisons, 347
Total survey design concept, 21
Transfer function, 44, 302
Trend assessment monitoring, 388
Trend surface analysis, 11
Trends, 303, 347
  Deterministic, 100, 350
  Linear regression models, 351
  Logistic model, 353
  Second order autoregressive model, 353
  Stochastic, 350
  Threshold autoregressive model, 354

U
Univariate analysis, 47

V
Value of total estimate, 26
Vancouver, 388
Virginia, 159, 267

W
Water colour, 56
Water quality, 17
  Indicators, 433
Wisconsin, 486, 487
Wisconsin Lakes, 363

Z
Zonation determination, 99