Contents
Algebraic Computation Aliasing Amdahl's Law Analysis of Variance ANOVA Attenuation Autocorrelation Average Bandwidth Baud Bayes Theorem Bayesian Statistics Benchmarking Beta Distribution Bias Biased Sampling Binning Binomial Distribution Bivariate Normal Distribution Boolean Algebra Bootstrap Breit-Wigner Distribution Brent's Method Cauchy Distribution Cellular Automata Central Limit Theorem Centroid Chebyshev Norm Chebyshev Polynomials Chi-Square Distribution Chi-Square Test Cholesky Decomposition
Clipping Composite Hypothesis Confidence Level Constraints Convolution Coordinate Systems Correlation Coefficient Cost of Test Covariance Covariance Ellipse Cramer-Rao Inequality Cramer-Smirnov-Von-Mises Test Cramer's Rule Curtosis Cylindrical Coordinates Database Data Compression Data Structures Decibel Decision Boundary Decision Quality Diagram Deconvolution Degrees of Freedom Delta Function Derivative Matrix Differentiation Dirac Delta Function Discrete Cosine Transform Discriminant Analysis Discriminant Function Dispersion Matrix Distance Function Distribution Dynamic Range Eigenvalue Problems Entropy Error Ellipse Error Function
Error Propagation Errors, General Classification Errors, Quadratic Addition Estimator Euler Angles Expectation Value Exponential Distribution Extrapolation to the Limit F Distribution F Test Factor Analysis Fast Transforms Feature Extraction FIFO Filtering Finite Difference Method Finite Element Method Finite State Machine Fitting Folding Fourier Transform Fractile Full Width at Half Maximum Gabor Filter Gamma Function Gauss-Jordan Elimination Gauss-Markov Theorem Gauss-Seidel Iteration Gaussian Distribution Gaussian Elimination Gaussian Quadrature Genetic Algorithms Geometric Mean Geometrical Transformations Givens Rotation Global Correlation Coefficient Global Image Operations Goodness-of-fit Test
Gradient Gram-Schmidt Decomposition Graph Theory Haar Transform Hamming Distance Harmonic Mean Hash Function Heaviside Function Hessian Histogram Horner's Rule Hot Spot Hough Transform Householder Transformation Huffman Coding Hypothesis Testing Ideogram Image Enhancement Image Processing Image Recognition Image Restoration Image Segmentation Importance Sampling Interpolation Jackknife Jacobi Determinant Jacobi Iteration Jacobi Matrix Jacobian Jacobian Peak Jitter Kalman Filter Karhunen-Loeve Transform Kolmogorov Test Korobov Sequences Kronecker Delta Kurtosis Lagrange Multipliers
Landau Distribution Laplace Transform Least Squares Least Squares, Linear Left-handed Coordinate System Likelihood Linear Algebra Packages Linear Equations Linear Equations, Iterative Solutions Linear Programming Linear Regression Linear Shift-invariant Systems LU Decomposition Marginal Distribution Markov Chain Matrix Operations Matrix Operations, Complex Maximum Likelihood Method Mean Median Median Filter Metric Metropolis Algorithm MFLOPS Minimax Approximation Minimization MIPS Mode Moment Monte Carlo Methods Morphological Operations Multinomial Distribution Multivariate Normal Distribution Neural Networks Neville Algorithm Newton-Raphson Method Newton's Rule Neyman-Pearson Diagram
Noise Norm Normal Distribution Normal Equations Numerical Differentiation Numerical Integration Numerical Integration of ODE Numerical Integration, Quadrature Numerov's Method Object-oriented Programming Optimization Orthogonal Functions Orthogonal Matrices Orthogonal Polynomials Orthonormal Outlier Overdetermined Systems Pade Approximation Parallel Processing Penalty Function Petri Nets Point Spread Function Poisson Distribution Polar Coordinates Polynomials Population Positivity Power of Test Predictor-Corrector Methods Principal Component Analysis Probability Probability Calculus Probability Density Function Protocol Pseudoinverse Pseudorandom Numbers Pull Value Purity of Test
QR Decomposition Quadrature Quantile Quantization Quasirandom Numbers Radius of Curvature Radon Transform Random Numbers Random Numbers, Correlated Random Variable Rank Filter Recursion Regression Analysis Regularization Relaxation Resampling Residuals Right-handed Coordinate System Rms Error Robustness Rotations Runge-Kutta Methods Runs Runs Test Saddle Point Sagitta Sample Sample Mean, Sample Variance Sampling from a Probability Density Function Sampling Theorem Scalar Product Scatter Diagram Schwarz Inequality Shaping Sharpening Sigmoid Function Signal Processing Significance of Test
Simplex Method Simpson's Rule Simulated Annealing Singular Value Decomposition Skewness Small Samples Smoothing Software Engineering Sorting Spherical Coordinates Spline Functions Stack Standard Deviation Statistic Stirling's Formula Stratified Sampling Structured Programming Student's Distribution Student's Test Successive Over-Relaxation T-Distribution, T-Test Template Matching Thresholding Training Sample Transformation of Random Variables Trimming Truly Random Numbers Tuple Type-I Error Unfolding Uniform Distribution Validation Sample Variance Wavelet Transform Weighted Mean Width Winsorization Zero Suppression
References Index About this document ... Data Analysis BriefBook, Version 16, April 1998
Rudolf K. Bock, 7 April 1998
Algebraic Computation Also called Formula Manipulation or Symbolic Computation. Existing programs or systems in this area allow one to transform mathematical expressions in symbolic form, hence in an exact way, as opposed to numerical (and hence limited-precision) floating-point computation. Primarily designed for applications in theoretical physics or mathematics, these systems, which are usually interactive, can be used in any area where straightforward but tedious or lengthy calculations with formulae are required. Typical operations include differentiation and integration, linear algebra and matrix calculus, operations on polynomials, or the simplification of algebraic expressions. Well known systems for algebraic computation are, amongst others, Macsyma [MACSYMA87], Maple [Char91], Mathematica [Wolfram91], or Reduce [Hearn95], [Rayna87]. These systems have different scope and facilities, and some are easier to use or to access than others. Mathematica is a commercial package; Maple is available through another commercial package, Matlab (Symbolic Math Toolbox). For introductory reading and many further references, see e.g. [Buchberger83] or [Davenport88].
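The following sketch illustrates the idea of exact symbolic computation in Python with the SymPy library (SymPy is assumed to be installed; it is not one of the systems named above):

```python
import sympy as sp

# Symbolic (exact) computation: simplification, differentiation, integration.
x = sp.symbols('x')
expr = (x**2 - 1) / (x - 1)

print(sp.simplify(expr))               # x + 1   (exact simplification)
print(sp.diff(sp.sin(x)**2, x))        # 2*sin(x)*cos(x)
print(sp.integrate(x * sp.exp(x), x))  # x*exp(x) - exp(x)
```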
Welcome to the Internet version of
The Data Analysis BriefBook The BriefBook is a condensed handbook, or an extended glossary, written in encyclopedic format, covering subjects in statistics, computing, analysis, and related fields. It is intended as both an introduction and a reference for data analysts, scientists and engineers. This site has been selected as one of the top educational resources on the Web by StudyWeb.
The Data Analysis BriefBook has been prepared by R.K.Bock and W.Krischer, both at CERN (Geneva).
Enter The Data Analysis BriefBook (your browser should have graphics capability). You will access version 16, which is also available as a book (order directly from Springer or from your preferred bookstore). The Internet version will be updated occasionally, and is not necessarily identical to the printed version. In all cases, we appreciate your feedback: please send comments, error corrections, or your suggestions for new contributions, to R.K.Bock.
Part of the information has been derived, with permission, from a booklet FORMULAE AND METHODS IN EXPERIMENTAL DATA EVALUATION, published in 1984 by the European Physical Society, and out of print for many years. This BriefBook is a major update and extension, but some original contributions by V.Blobel (Hamburg), S.Brandt (Siegen), R.Frühwirth (Vienna), F.James (Geneva), J.Myrheim (Copenhagen), and M.Regler (Vienna) are acknowledged. Parts related to physics have been eliminated and are now presented separately as The Particle Detector BriefBook.
Some comments on this Internet version of The Data Analysis BriefBook: The html version has been generated automatically, using Latex2html version 3.1. Minor adjustments by hand were necessary; if in some places the html presentation is not optimal, we ask for your understanding. Although itself available on the Internet with multiple internal cross references, you will find practically no URLs of other external sites; we have found much interesting information with our browsers, but a good deal of it can be characterized as short-lived, unfinished and abandoned, unchecked, or sometimes even containing outright errors. It is our intention to avoid these pitfalls as best we can: the BriefBook has been conceived primarily as a book, i.e. with stability in mind. The BriefBook is sure to contain some errors: we will be eager to correct them. In some areas, it is incomplete: we will include obvious omissions and let it evolve slowly towards other, related subjects. Updates, however, will be carefully grouped, and somewhat oriented along the lines successive printed editions take. All this being said, we want to give here some pointers towards sites where definitely useful, in many cases more detailed, and hopefully long-lived information can be found:

Numerical Recipes (Press et al., books and algorithms in C or Fortran)
StatSoft (Statistics textbook in electronic form)
Statistics algorithms from the Royal Statistical Society
Links to General Numerical Analysis Sites
Mathematics Archives: lessons, tutorials, course material
Algorithm course material - a wide selection
Rudolf K.Bock, March 1999
Aliasing Used in the context of processing digitized signals (e.g. audio) and images (e.g. video), aliasing describes the effect of undersampling during digitization, which can generate a false (apparent) low frequency for signals, or staircase steps along edges (jaggies) in images (see Sampling Theorem). Aliasing can be avoided by an antialiasing (analogue) low-pass filter applied before sampling. The term antialiasing is also in use for a posteriori signal smoothing intended to remove the effect.
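A minimal numerical sketch of the effect (assuming NumPy is available); the frequencies are chosen only for illustration:

```python
import numpy as np

# A 9 Hz sine sampled at 10 Hz (below the 18 Hz rate required by the
# sampling theorem) is indistinguishable from a 1 Hz sine: the alias
# frequency is |9 - 10| = 1 Hz.
fs, f_true = 10.0, 9.0
t = np.arange(0, 2, 1 / fs)
print(np.allclose(np.sin(2 * np.pi * f_true * t),
                  -np.sin(2 * np.pi * 1.0 * t)))   # True
```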
Amdahl's Law Various interpretations are in use. Originally defined to show that vectorizing a program can only affect that part of the program which lends itself to vectorizing. The ``law'' can be written as

$S = \frac{1}{(1-f) + f/S_f}$

where $f$ is the fraction of the program that can be improved, $S_f$ is the improvement factor on this fraction, and $S$ is the overall improvement achieved. Obviously, for small $f$, $S \approx 1$ whatever the value of $S_f$, i.e. insignificant overall gain is achieved.

The generalization to the parallelizing of programs is obvious, although the effect of diminishing returns there is enhanced because of the introduction of communication overheads, synchronization effects, etc. A further generalization could be to a rule of thumb like work only on problems with good returns. Another accepted meaning is that of diminishing returns for parallel systems as the number of processors increases: according to this rule of thumb, the effective capacity scales not with the number of processors N, but with a more slowly growing function of N.
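A small sketch of the formula in Python (the function name and numbers are illustrative only):

```python
def amdahl_speedup(f, s_f):
    """Overall speedup S when a fraction f of the work is sped up by a factor s_f."""
    return 1.0 / ((1.0 - f) + f / s_f)

# Even an infinite improvement of 80% of a program gains at most a factor 5:
print(amdahl_speedup(0.8, 1e9))   # ~5.0
print(amdahl_speedup(0.5, 10.0))  # ~1.8
```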
Analysis of Variance Essentially corresponds to a determination of the fluctuations observed in a sample, and of their dependencies. The terminology in some statistics textbooks is somewhat different from the one used by engineers: training samples are called control samples, interrelations between variables are found by factor analysis, and the analysis of variance (ANOVA) appears under different names depending on where the emphasis is put, like one-way and two-way ANOVA, analysis of covariance (ANCOVA), multivariate analysis of variance (MANOVA), discriminant analysis, etc. For further reading, see e.g. [Edwards93].
ANOVA Short for Analysis of Variance
Attenuation A name given to phenomena of reduction of intensity according to the law

$\frac{dI}{dt} = -\frac{I}{\tau},$

resulting in an exponential decay

$I(t) = I_0\, e^{-t/\tau}.$

In this equation t may be time (e.g. attenuation of a circulating beam) or length (e.g. attenuation of light in a light guide (fibre) or scintillator), or any corresponding continuous variable. The attenuation time or attenuation length is given by $\tau$, the time (length) over which the intensity is reduced by a factor $1/e$. Frequently I is a discrete variable (number of particles), and the factor $e^{-t/\tau}$ is due to the exponential distribution of individual lifetimes; $\tau$ then is the expectation value of the distribution, i.e. the mean lifetime. If the intensity at time zero is $I_0$ and $\tau$ is the lifetime or attenuation time, then the average intensity over an interval of length T is given by

$\langle I \rangle = I_0\, \frac{\tau}{T}\left(1 - e^{-T/\tau}\right).$
Autocorrelation A random process x(t) evolves with time t according to the frequencies present. Autocorrelation is the expectation value of the product $x(t)\,x(t+\tau)$,

$R(\tau) = E[\,x(t)\,x(t+\tau)\,],$

with $\tau$ a time difference. The autocorrelation depends on x and $\tau$, but is independent of t.
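A sketch of a sample estimate of the autocorrelation for a simulated stationary process (assuming NumPy; the AR(1) process and lag values are chosen only for illustration):

```python
import numpy as np

rng = np.random.default_rng(5)
# An AR(1)-like process as an example of a (nearly) stationary random process.
x = np.zeros(5000)
for t in range(1, x.size):
    x[t] = 0.9 * x[t - 1] + rng.normal()

def autocorr(x, lag):
    """Sample estimate of E[x(t) x(t+lag)] for a zero-mean stationary series."""
    return np.mean(x[:x.size - lag] * x[lag:])

print([round(autocorr(x, k) / autocorr(x, 0), 2) for k in (0, 1, 2, 5)])
# roughly [1.0, 0.9, 0.81, 0.59] for this process
```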
Average See Weighted Mean.
Bandwidth Defines the part of the frequency spectrum where the attenuation through a device is low, thus allowing a uniform transfer of the signals within that band (passband). It is usually measured at the half-power points of the response curve, i.e. the points of -3 dB (see Decibel). For communication purposes, the bandwidth defines the amount of information that can be transferred through a particular channel in a given time interval. For analogue signals, the bandwidth defines the quality of the channel; typical values are 3000 Hz for speech and 15 to 20 kHz for high-quality channels. In the case of digital transmission, the bandwidth defines the maximum information capacity (see Baud) of the channel. The bandwidth can either be referred to an interval starting at 0 Hz (baseband) or to any other part of the spectrum. Baseband information can be modulated, by various methods, onto a high-frequency carrier. Note that after modulation, the bandwidth required to transfer the baseband information might increase. Bandwidth limiting is often applied to the readout electronics of sensors, in order to optimize the signal-to-noise ratio (``shaping'').
Baud Most often used superficially (and incorrectly) to mean bits/second. Baud is the capacity unit for data transmission in communication systems, and expresses information units per second. Each information unit may contain one or more information bits. Modern communication techniques use both amplitude and phase information to code a set of bits into each information unit, giving e.g. 4800 bits/s on a 1200 baud link. The bandwidth required is given by the baud rate, while the bit/s rate defines the quality requirements on the link. Use of the latter unit is recommended in most practical contexts.
Bayes Theorem A theorem concerning conditional probabilities of the form P(A|B) [read: ``the probability of A, given B'']:

$P(A|B) = \frac{P(B|A)\,P(A)}{P(B)},$

where P(B) and P(A) are the unconditional (or a priori) probabilities of B and A, respectively. This is a fundamental theorem of probability theory, but its use in statistics is a subject of some controversy (see Bayesian Statistics). For further discussion, see [Eadie71], [Sivia96].
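A minimal numerical illustration in Python; all probabilities below are invented and serve only to show the mechanics of the theorem:

```python
# A test detects a condition with P(B|A) = 0.99, has a false-positive rate
# P(B|not A) = 0.05, and the prior is P(A) = 0.01.
p_A = 0.01
p_B_given_A = 0.99
p_B_given_notA = 0.05

p_B = p_B_given_A * p_A + p_B_given_notA * (1 - p_A)   # total probability
p_A_given_B = p_B_given_A * p_A / p_B                   # Bayes' theorem
print(round(p_A_given_B, 3))   # ~0.167
```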
Bayesian Statistics An important school of statistical theory, in which statistics is derived from a probability interpretation that includes the degree of belief in a hypothesis. It thus refers not only to repeatable measurements (as does the frequentist interpretation). The interpretation of data can be described by Bayes Theorem:

$P(H|d) = \frac{P(d|H)\,P(H)}{P(d)},$

where H is a hypothesis and d is experimental data. The Bayesian meaning of the different terms is:

- P(H|d) is the degree of belief in the hypothesis H, after the experiment which produced data d.
- P(H) is the prior probability of H being true.
- P(d|H) is the ordinary likelihood function used also by non-Bayesians.
- P(d) is the prior probability of obtaining data d. It can be rewritten using the other terms as $P(d) = \sum_H P(d|H)\,P(H)$, where the summation runs over all hypotheses.

What is called a ``Bayesian'' viewpoint is the application of the laws of probability to non-repeatable events: H is a hypothesis or proposition, either true or untrue, and P(H) is interpreted as the degree of belief in the proposition. For further discussion, see [Eadie71], [Press95], [Sivia96].
Benchmarking In general, benchmarking (of computers) consists of defining one or several variables that describe a computer system's performance, and of measuring these variables. There is no standard or generally accepted measure for computer system capacity: ``capacity'' is a mix of multiple parameters like cycle time, memory access time, architectural peculiarities like parallelism of processors and their communication, instruction parallelism or pipelining, etc. Usually, benchmarks should include system software aspects like compiler efficiency and task scheduling. Potential buyers of computer systems, in particular large and parallel systems, usually have to acquire a more or less detailed understanding of the candidate systems, and perform benchmark tests, i.e. they execute performance measurements with their own program mix, in order to assess the overall performance of candidate systems ([Datapro83], [GML83], [Hennessy90]). Attempts to express computer capacity in a single or a few numbers have resulted in more or less controversial measures; conscientious manufacturers advertise with several or all of these. MIPS is an acronym for Million Instructions Per Second, and is one of the measures for the speed of computers. It has been attempted, theoretically, to impose an instruction mix of 70% additions and 30% multiplications (fixed point), with architectural factors as well as the efficiency of scheduling or compilation entirely ignored. This makes the measure a simple and crude one, barely superior to cycle time. In practice, vendors usually make some corrections for such factors, and the results found are considered more or less controversial. Sometimes a floating point instruction mix is used; the unit is then called MFLOPS, clearly not a useful measure for some types of programs. The Whetstone benchmark (like a later relative, Dhrystone) is a group of synthetic (i.e. artificially defined) program pieces, meant to represent an instruction mix matching the average frequency of operations and operands of ``typical'' program classes. A different effort resulted in the SPEC benchmarks: a grouping of major workstation manufacturers called the System Performance Evaluation Cooperative agreed on a set of real programs and inputs, against which to measure performance. Real programs such as a mix of Linpack (linear algebra) operations are also frequently used for benchmarks.
Beta Distribution A family of distributions which are non-zero only over a finite interval 0 < x < 1:

$f(x) = \frac{\Gamma(n+m)}{\Gamma(n)\,\Gamma(m)}\; x^{\,n-1}(1-x)^{\,m-1}.$

n and m are positive integers, and $\Gamma$ is Euler's gamma function. For appropriate n and m, these distributions resemble phase space distributions of kinematic variables like effective mass.
Bias A physical quantity $\lambda$ is measured using the estimator S, which is a function of the elements of a sample, $S = S(X_1, \ldots, X_N)$. The difference between the expectation value of the estimator, E(S), and the true value $\lambda$ of the physical quantity is the bias of the estimator: $B(S) = E(S) - \lambda$. The estimator is unbiased if $E(S) = \lambda$. For the relation between bias and variance of an estimator, see [Bishop95].
Biased Sampling See Importance Sampling.
Binning The process of grouping measured data into data classes or histogram bins. Discretization, quantization, or digitizing are very similar concepts. After binning, the fine-grain information of the original measured values is lost, and one uses only bin contents. The amount of information lost in this way is negligible if the bin widths are small compared with the experimental resolution. Many statistical methods, notably those based on the chi-square distribution, require that data be binned, and that the bins satisfy certain constraints, namely that the number of events in each bin be not less than a certain minimum number, so that the distribution of expected events per bin is approximately Gaussian. Opinions differ on the minimum number of events required, but this is usually taken as being between five and ten, provided only a few bins have this minimum number. There is no reason why bins should be of equal width, except for convenience of computation (e.g. in image processing), and many studies indicate that the statistically optimal binning is that which gives equally probable bins. Where the amount of data is so small that wide bins are necessary, it is preferable to avoid binning by using other methods if possible: for example, use the maximum likelihood fit instead of the least squares fit, and use the Kolmogorov test or the Cramer-Smirnov-Von-Mises test rather than the one-dimensional chi-square test.
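A short sketch (assuming NumPy) contrasting equal-width bins with equally probable bins obtained from the data quantiles:

```python
import numpy as np

rng = np.random.default_rng(4)
data = rng.normal(loc=10.0, scale=2.0, size=500)

# equal-width bins
counts, edges = np.histogram(data, bins=20)

# equally probable bins (bin edges at the quantiles of the data)
eq_edges = np.quantile(data, np.linspace(0.0, 1.0, 11))
eq_counts, _ = np.histogram(data, bins=eq_edges)
print(counts.min(), eq_counts)   # quantile binning gives ~50 entries per bin
```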
Binomial Distribution A given experiment may yield the event A or the event $\bar{A}$ (not A) with the probabilities P(A) = p and $P(\bar{A}) = q = 1 - p$, respectively. If the experiment is repeated n times and X is the number of times A is obtained, then the probability of X taking exactly a value k is given by

$P(X = k) = \binom{n}{k}\, p^k q^{\,n-k}$

with the binomial coefficients

$\binom{n}{k} = \frac{n!}{k!\,(n-k)!}.$

The distribution has the properties

mean: $E(X) = np$,
variance: $\sigma^2 = npq$,
skewness: $\gamma = (q - p)/\sqrt{npq}$,
curtosis: $c = (1 - 6pq)/(npq) + 3$,

which are determined by the single parameter p. If in a sample of n events k have the property A, then the maximum likelihood estimator of the parameter p is given by

$P = k/n.$

The variance of the estimator of p is

$\sigma^2(P) = \frac{p(1-p)}{n},$

for which an unbiased estimator is

$s^2(P) = \frac{P(1-P)}{n-1}.$

Note that the probability of obtaining k events out of n for a given p should not be estimated by comparing the difference of P and p against $s^2(P)$, but from a Poisson distribution with mean pn, particularly if P is close to 0 or 1.
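A minimal sketch of the estimators quoted above (numbers are invented):

```python
import numpy as np

# Estimate p from k successes in n trials, with the variance estimate above.
n, k = 50, 18
P = k / n                        # maximum likelihood estimate of p
s2 = P * (1 - P) / (n - 1)       # unbiased estimate of the variance of P
print(P, np.sqrt(s2))            # 0.36 +- ~0.07
```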
Bivariate Normal Distribution If $a = (a_1, a_2)$ is a constant vector and C is a positive definite symmetric $2\times 2$ matrix (see Positivity), then

$\phi(x_1, x_2) = \frac{1}{2\pi\sqrt{|C|}}\, \exp\!\left(-\tfrac{1}{2}\,(x-a)^T C^{-1} (x-a)\right)$

is the joint probability density of a normal distribution of the variables $X_1, X_2$. The expectation values of the variables are $E(X_1) = a_1$ and $E(X_2) = a_2$. Their covariance matrix is C. Lines of constant probability density in the $(x_1, x_2)$-plane correspond to constant values of the exponent. For a constant exponent, one obtains the condition

$(x-a)^T C^{-1} (x-a) = \mathrm{const}.$

This is the equation of an ellipse. For const = 1, the equation becomes

$\frac{1}{1-\rho^2}\left[\frac{(x_1-a_1)^2}{\sigma_1^2} - \frac{2\rho\,(x_1-a_1)(x_2-a_2)}{\sigma_1\sigma_2} + \frac{(x_2-a_2)^2}{\sigma_2^2}\right] = 1,$

and the ellipse is called the covariance ellipse or error ellipse of the bivariate normal distribution. The error ellipse is centred at the point $(a_1, a_2)$ and has as principal (major and minor) axes the (uncorrelated) largest and smallest standard deviation that can be found under any angle. The size and orientation of the error ellipse is discussed below. The probability of observing a point $(X_1, X_2)$ inside the error ellipse is $1 - e^{-1/2} \approx 0.39$.

Note that distances from the point $(a_1, a_2)$ to the covariance ellipse do not describe the standard deviation along directions other than along the principal axes. This standard deviation is obtained by error propagation, and is greater than or equal to the distance to the error ellipse, the difference being explained by the non-uniform distribution of the second (angular) variable (see figure).

For vanishing correlation coefficient ($\rho = 0$) the principal axes of the error ellipse are parallel to the coordinate axes $x_1, x_2$, and the principal semi-diameters of the ellipse $p_1, p_2$ are equal to $\sigma_1, \sigma_2$. For $\rho \neq 0$ one can find the principal axes and their orientation with respect to the coordinate axes from the relations

$\tan 2\alpha = \frac{2\rho\,\sigma_1\sigma_2}{\sigma_1^2 - \sigma_2^2}, \qquad p_{1,2}^2 = \tfrac{1}{2}\left(\sigma_1^2 + \sigma_2^2\right) \pm \tfrac{1}{2}\sqrt{(\sigma_1^2 - \sigma_2^2)^2 + 4\rho^2\sigma_1^2\sigma_2^2},$

where $\alpha$ is the angle between the $x_1$ axis and the semi-diameter of length $p_1$. Note that $\alpha$ is determined only up to multiples of $\pi/2$, i.e. for both semi-diameters of both principal axes.

The marginal distributions of the bivariate normal are normal distributions of one variable:

$\phi(x_1) = \frac{1}{\sqrt{2\pi}\,\sigma_1}\exp\!\left(-\frac{(x_1-a_1)^2}{2\sigma_1^2}\right), \qquad \phi(x_2) = \frac{1}{\sqrt{2\pi}\,\sigma_2}\exp\!\left(-\frac{(x_2-a_2)^2}{2\sigma_2^2}\right).$

Only for uncorrelated variables, i.e. for $\rho = 0$, is the bivariate normal the product of two univariate Gaussians, $\phi(x_1, x_2) = \phi(x_1)\,\phi(x_2)$.

Unbiased estimators for the parameters $a_1, a_2$ and the elements $C_{ij}$ are constructed from a sample $(X_{1k}, X_{2k})$, $k = 1, \ldots, N$, as follows:

Estimator of $a_i$: $\quad \hat{a}_i = \frac{1}{N}\sum_{k=1}^{N} X_{ik}$

Estimator of $C_{ij}$: $\quad \hat{C}_{ij} = \frac{1}{N-1}\sum_{k=1}^{N} (X_{ik} - \hat{a}_i)(X_{jk} - \hat{a}_j)$
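A short numerical cross-check of the error-ellipse relations above (assuming NumPy; the covariance values are invented):

```python
import numpy as np

# covariance matrix of (X1, X2); numbers are illustrative
s1, s2, rho = 2.0, 1.0, 0.6
C = np.array([[s1**2, rho*s1*s2],
              [rho*s1*s2, s2**2]])

# principal semi-diameters and orientation of the error ellipse
eigval, eigvec = np.linalg.eigh(C)
p1, p2 = np.sqrt(eigval[::-1])                   # largest first
alpha = np.arctan2(eigvec[1, 1], eigvec[0, 1])   # angle of the major axis
print(p1, p2, np.degrees(alpha))

# consistency with the closed-form relation tan(2a) = 2*rho*s1*s2/(s1^2 - s2^2)
print(np.tan(2 * alpha), 2*rho*s1*s2 / (s1**2 - s2**2))
```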
Boolean Algebra A set $I = \{a_1, \ldots, a_n\}$ with n elements has $2^n$ different subsets, including the empty set 0 and I itself (each element either belongs to the subset or does not belong). The Boolean algebra $B_n$ consists of these $2^n$ subsets with the operations of union $\cup$, intersection $\cap$, and complement $-$ (the complement of X is also written $\bar{X}$). Examples of rules that are valid for any X, Y, Z are

$X \cup Y = Y \cup X, \qquad X \cup X = X, \qquad \overline{X \cup Y} = \bar{X} \cap \bar{Y}.$

Every Boolean equation is equivalent to its dual, in which the operations of union and intersection are interchanged and simultaneously all variables are complemented. For example, $X \cap (Y \cup Z) = (X \cap Y) \cup (X \cap Z)$ is equivalent to $\bar{X} \cup (\bar{Y} \cap \bar{Z}) = (\bar{X} \cup \bar{Y}) \cap (\bar{X} \cup \bar{Z})$.

$B_1$ is also called propositional calculus. It is the calculus of truth values (0 = false, I = 1 = true, $\cup$ = or, $\cap$ = and, $-$ = not). Boolean variables and operations can be used in high-level programming languages (TRUE, FALSE, OR, AND, NOT, sometimes XOR). Sometimes the rules of Boolean algebra can also be used to simplify considerably the logic of a complicated sequence of tests. A much more complete discussion of Boolean algebra can be found by looking in The Free On-line Dictionary of Computing.
Bootstrap As a general term, bootstrapping describes any operation which allows a system to generate itself from its own small well-defined subsets (e.g. compilers, software to read tapes written in computer-independent form). The word is borrowed from the saying ``pull yourself up by your own bootstraps''. In statistics, the bootstrap is a method allowing one to judge the uncertainty of estimators obtained from small samples, without prior assumptions about the underlying probability distributions. The method consists of forming many new samples of the same size as the observed sample, by drawing a random selection of the original observations, i.e. usually introducing some of the observations several times. The estimator under study (e.g. a mean, a correlation coefficient) is then formed for every one of the samples thus generated, and will show a probability distribution of its own. From this distribution, confidence limits can be given. For details, see [Efron79] or [Efron82]. A similar method is the jackknife.
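A minimal bootstrap sketch in Python (assuming NumPy; sample, estimator and number of replicas are illustrative):

```python
import numpy as np

rng = np.random.default_rng(1)
sample = rng.exponential(scale=2.0, size=30)   # a small observed sample

# Form many new samples of the same size by drawing with replacement,
# and look at the spread of the estimator (here: the mean).
boot_means = np.array([rng.choice(sample, size=sample.size, replace=True).mean()
                       for _ in range(2000)])
lo, hi = np.percentile(boot_means, [2.5, 97.5])
print(sample.mean(), lo, hi)   # estimate and a 95% bootstrap interval
```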
Breit-Wigner Distribution Probability density functions of the general form

$f(x) = \frac{1}{\pi}\,\frac{1}{1 + x^2}$

are also known in statistics as Cauchy distributions. The Breit-Wigner (also known as Lorentz) distribution is a generalized form originally introduced ([Breit36], [Breit59]) to describe the cross-section of resonant nuclear scattering in the form

$f(E) = \frac{1}{2\pi}\,\frac{\Gamma}{(E - E_0)^2 + (\Gamma/2)^2},$

which had been derived from the transition probability of a resonant state with known lifetime. The equation follows from that of a harmonic oscillator with damping, and a periodic force.

The above form can be read as the definition of a probability density as a function of E; the integral over all energies E is 1. Variance and higher moments of the Breit-Wigner distribution are infinite. The distribution is fully defined by $E_0$, the position of its maximum (about which the distribution is symmetric), and by $\Gamma$, the full width at half maximum (FWHM), as obviously

$f(E_0 \pm \Gamma/2) = \tfrac{1}{2}\, f(E_0).$

The Breit-Wigner distribution has also been widely used for describing the non-interfering cross-section of particle resonant states, the parameters $E_0$ (= mass of the resonance) and $\Gamma$ (= width of the resonance) being determined from the observed data. Observed particle width distributions usually show an apparent FWHM larger than $\Gamma$, being a convolution with a resolution function due to measurement uncertainties. $\Gamma$ and the lifetime $\tau$ of a resonant state are related to each other by Heisenberg's uncertainty principle ($\Gamma\tau \approx \hbar$). A normal (Gaussian) distribution decreases much faster in the tails than the Breit-Wigner curve. For a Gaussian, FWHM = 2.355 $\sigma$ [$\sigma$ here is the distribution's standard deviation]. The Gaussian in the graph above would be even more peaked at x = 0 if it were plotted with FWHM equal to 1 (as the Breit-Wigner curve).
Brent's Method A particularly simple and robust method to find a minimum of a function f(x) dependent on a single variable x. The minimum must initially be bracketed between two values x=a and x=b. The method uses parabolic interpolation as long as the process is convergent and does not leave the boundaries (a,b), and interval subdividing methods otherwise. The algorithm requires keeping track of six function points at all times, which are iteratively updated, reducing the minimum-enclosing interval continually. An algorithm is given in [Press95].
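Brent's method is available in common libraries; a minimal sketch assuming SciPy is installed (the function and bracket are illustrative):

```python
from scipy.optimize import minimize_scalar

# Minimize a one-dimensional function with Brent's method.
f = lambda x: (x - 1.3) ** 2 + 0.5
res = minimize_scalar(f, bracket=(0.0, 3.0), method='brent')
print(res.x, res.fun)   # ~1.3, ~0.5
```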
Cauchy Distribution A random variable X follows the Cauchy distribution if its probability density function is

$f(x) = \frac{1}{\pi}\,\frac{1}{1 + x^2}.$

Its mode and median are zero, but the expectation value, variance and higher moments are undefined since the corresponding integrals diverge. A commonly used measure of the width is the full width at half maximum (FWHM), which is equal to 2. If a variable $\theta$ is uniformly distributed between $-\pi/2$ and $\pi/2$, then $x = \tan\theta$ will follow a Cauchy distribution. If y and z follow independent normal distributions, x = y/z will again follow a Cauchy distribution. A more general form of the Cauchy distribution is the Lorentz distribution, also called the Breit-Wigner distribution, which has the probability density

$f(x) = \frac{1}{\pi}\,\frac{\Gamma/2}{(x - x_0)^2 + (\Gamma/2)^2},$

where $x_0$ is the mode and $\Gamma$ the full width at half maximum or FWHM.
Cellular Automata A simple mathematical system made of cells arranged on a grid. Cells have a state; all states evolve simultaneously according to a uniform set of rules, such that the state at step i+1 depends on the state in step i of the cell in question and of cells in a small neighbourhood. Such a discrete dynamical system may serve to model physical systems; large cellular automata, despite their simplicity at the local level, can show behaviour of substantial complexity. As information processing systems, cellular automata may also be regarded as a subclass of artificial neural networks, in which node connections are of the nearest-neighbour type in two dimensions. See [Wolfram86], [Raghavan93].
Central Limit Theorem This theorem states that the sum of a large number of random variables is approximately normally distributed, even though the random variables themselves may follow any distribution or be taken from different distributions. The only conditions are that the original random variables must have finite expectation and variance. Although the theorem is only exactly true in the limit of an infinite number of variables, in practice the convergence to the Gaussian distribution is very fast. For example, the distribution of the sum of ten uniformly distributed random variables is already indistinguishable by eye from an exact Gaussian (see [Grimmett92]).
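A quick numerical illustration (assuming NumPy) of the fast convergence for the sum of ten uniform variables:

```python
import numpy as np

rng = np.random.default_rng(0)
# Sum of ten uniform random variables, repeated many times:
sums = rng.uniform(0.0, 1.0, size=(100000, 10)).sum(axis=1)

# mean and variance of the sum approach 10*(1/2) and 10*(1/12)
print(sums.mean(), sums.var())       # ~5.0, ~0.83
# standardized sums are close to a standard normal distribution
z = (sums - 5.0) / np.sqrt(10 / 12)
print(np.mean(np.abs(z) < 1.0))      # ~0.68, as for a Gaussian
```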
Centroid Synonymous with centre of gravity; most often used for two- (or more-) dimensional distributions, designating the point given by the arithmetic mean in all variables.
Chebyshev Norm Also called the $L_\infty$ norm, this is the $L_p$ norm with $p \to \infty$. In the Chebyshev norm, the distance between two sets of points or two lines is just the largest distance between any pair of points or the separation between two lines at the point where they are the farthest apart. A Chebyshev approximation minimizes the maximum distance between the data and the approximating function, hence the occasional name minimax approximation. The use of the Chebyshev norm is indicated in many cases where the residuals of the fit are known not to follow a Gaussian distribution, in particular for all approximations of an empirical nature, where residuals are dominated by the inadequacy of the approximation rather than the errors of the measurements being approximated. Programs performing fits using the Chebyshev norm are usually more time consuming than least squares fit programs, but can be found in some program libraries. A specific application to track fitting can be found in [James83].
Chebyshev Polynomials For $-1 \le x \le 1$ the Chebyshev polynomials of the first kind are defined by

$T_n(x) = \cos(n \arccos x), \qquad n = 0, 1, 2, \ldots$

In particular, $T_0(x) = 1$, $T_1(x) = x$, $T_2(x) = 2x^2 - 1$, $T_3(x) = 4x^3 - 3x$. A Chebyshev series in x,

$S(x) = \tfrac{1}{2}c_0 + \sum_{j=1}^{\infty} c_j T_j(x),$

for $x = \cos\theta$ is a Fourier series in $\theta$. Terms $c_{m+1}T_{m+1}(x)$, $c_{m+2}T_{m+2}(x)$, etc. can be ignored (for $-1 \le x \le 1$) as long as $|c_{m+1}| + |c_{m+2}| + \cdots$ is smaller than the error one can tolerate. The truncated series

$S_m(x) = \tfrac{1}{2}c_0 + \sum_{j=1}^{m} c_j T_j(x)$

can be computed by the recursion formula (see Horner's Rule)

$b_{m+2} = b_{m+1} = 0, \qquad b_j = 2x\,b_{j+1} - b_{j+2} + c_j \quad (j = m, m-1, \ldots, 1), \qquad S_m(x) = x\,b_1 - b_2 + \tfrac{1}{2}c_0,$

which is numerically stable for $-1 \le x \le 1$.

The Chebyshev series converges faster (if convergence is measured in terms of the maximum error for $-1 \le x \le 1$) than the Taylor series for the same function,

$S(x) = \sum_{j=0}^{\infty} a_j x^j.$

The two series are approximately related by $c_j \approx 2^{\,1-j} a_j$, if the sequence $|a_j|$ is rapidly decreasing. Rearrangement of a Taylor series into a Chebyshev series is called economization. The Chebyshev series is optimal in the sense that $S_m(x)$ is approximately equal to the polynomial of degree m that minimizes the maximum of the error $|S(x) - S_m(x)|$ for $-1 \le x \le 1$ (the assumption is again that the absolute values $|a_j|$ decrease rapidly).

If the function S(x) is known for $-1 \le x \le 1$, the coefficients in its Chebyshev series are

$c_j = \frac{2}{\pi}\int_{-1}^{1} \frac{S(x)\,T_j(x)}{\sqrt{1 - x^2}}\, dx.$

This follows from the orthogonality relation for the Chebyshev polynomials. For a rapidly converging series the truncation error is approximately equal to the first neglected term, and the approximation

$c_j \approx \frac{2}{m+1}\sum_{l=0}^{m} S(x_l)\,T_j(x_l)$

implies that $S_m(x_l) = S(x_l)$, where $x_l$ ($l = 0, \ldots, m$) are the m+1 zeros of $T_{m+1}(x)$. This follows from the orthogonality relation

$\sum_{l=0}^{m} T_i(x_l)\,T_j(x_l) = \begin{cases} 0 & i \neq j \\ (m+1)/2 & i = j \neq 0 \\ m+1 & i = j = 0 \end{cases}$

for $0 \le i, j \le m$. (Note an error in [NBS52], where one term is omitted.)

These results may be useful if a polynomial interpolation of measured values of S(x) is wanted. One may choose to measure $S(x_l)$ at the zeros $x_l$ of $T_{m+1}$ and use the above formula to determine the $c_j$. Then $S_m(x)$ is the best polynomial approximation to S(x) for $-1 \le x \le 1$ in the sense that the maximal error is (nearly) minimized. Moreover, if the measurement error is the same for all $S(x_l)$, then for any r < m, $S_r(x)$ determined in this way is the polynomial of degree r which gives the least squares approximation to the measured values. See also [Abramowitz74], [NBS52], [Press95].
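A sketch of Chebyshev interpolation at the zeros of $T_{m+1}$, using NumPy's chebyshev module (the target function exp(x) and the degree are illustrative):

```python
import numpy as np
from numpy.polynomial import chebyshev as C

# Approximate S(x) = exp(x) on [-1, 1] by a truncated Chebyshev series,
# with coefficients obtained from function values at the zeros of T_{m+1}.
m = 8
l = np.arange(m + 1)
x_l = np.cos((2 * l + 1) * np.pi / (2 * (m + 1)))   # zeros of T_{m+1}
c = C.chebfit(x_l, np.exp(x_l), m)                   # coefficients c_0..c_m

x = np.linspace(-1.0, 1.0, 1001)
err = np.abs(C.chebval(x, c) - np.exp(x))
print("max |S(x) - S_m(x)| on [-1,1]:", err.max())   # small, near-minimax error
```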
Chi-Square Distribution If the random variable X follows the standard normal distribution, i.e. the Gaussian distribution with zero mean and unit variance, one can draw a sample $X_1, X_2, \ldots, X_N$ of size N from this distribution and form the sum of squares

$\chi^2 = X_1^2 + X_2^2 + \cdots + X_N^2.$

The random variable $\chi^2$ (chi-square) follows the probability density of the $\chi^2$ distribution with N degrees of freedom,

$f(\chi^2) = \frac{(\chi^2)^{N/2 - 1}\, e^{-\chi^2/2}}{2^{N/2}\,\Gamma(N/2)},$

where $\Gamma$ is Euler's Gamma function. The $\chi^2$ distribution has the properties

mean: $E(\chi^2) = N$,
variance: $\sigma^2 = 2N$,
skewness: $\gamma = \sqrt{8/N}$,
curtosis: $c = 12/N + 3$.

In the limit $N \to \infty$ the $\chi^2$ distribution approaches the normal distribution with mean N and variance 2N. For an N-independent test (e.g. comparing $\chi^2$'s with different N) one can use the quantity

$(\chi^2 - N)/\sqrt{2N};$

however, the expression

$\sqrt{2\chi^2} - \sqrt{2N - 1}$

is usually preferred, as it approaches standard normal behaviour faster as N increases.
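A small check of the preferred approximation (assuming SciPy is installed; N is illustrative):

```python
import numpy as np
from scipy.stats import chi2, norm

# The quantity sqrt(2*chi2) - sqrt(2N-1) is close to standard normal already
# for moderate N; compare its 95th percentile with that of N(0,1).
N = 20
q95 = chi2.ppf(0.95, df=N)
print(np.sqrt(2 * q95) - np.sqrt(2 * N - 1), norm.ppf(0.95))   # ~1.68 vs ~1.645
```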
Chi-Square Test If N measurements $y_i$ are compared to some model or theory predicting values $g_i$, and if the measurements are assumed normally distributed around the $g_i$, uncorrelated and with variances $\sigma_i^2$, then the sum

$s = \sum_{i=1}^{N} \frac{(y_i - g_i)^2}{\sigma_i^2}$

follows the $\chi^2$ (chi-square) distribution with N degrees of freedom. The $\chi^2$ test compares s with the integral of the $\chi^2$ distribution: if the sum above is equal to the quantile $x_P$ of the $\chi^2$ distribution, i.e. if

$\int_0^{s} f(\chi^2; N)\, d\chi^2 = P,$

then the probability of obtaining s or a larger value under the 'null hypothesis' (i.e. the $y_i$ are drawn from a distribution described by the $g_i$) is given by $1 - P$.

Integral curves for the $\chi^2$ distribution exist in computer libraries or are tabulated in the literature. Note that the test may express little about the inherent assumptions; wrong hypotheses or measurements can, but need not, cause large $\chi^2$'s. The only statement to make about a measured s is the one above: ``$1 - P$ is the probability of finding a $\chi^2$ as large as s or larger, under the null hypothesis''.
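A minimal sketch of the test in Python (assuming NumPy and SciPy; measurements, predictions and errors are invented):

```python
import numpy as np
from scipy.stats import chi2

y = np.array([4.9, 10.2, 14.8, 20.4])    # measurements
g = np.array([5.0, 10.0, 15.0, 20.0])    # model predictions
sigma = np.array([0.2, 0.3, 0.2, 0.3])   # standard deviations

s = np.sum(((y - g) / sigma) ** 2)       # the chi-square sum
p = chi2.sf(s, df=len(y))                # P(chi2 >= s) under the null hypothesis
print(s, p)
```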
Cholesky Decomposition A symmetric and positive definite matrix can be efficiently decomposed into a lower and upper triangular matrix. For a matrix of any type, this is achieved by the LU decomposition which factorizes A = LU. If A satisfies the above criteria, one can decompose more efficiently into $A = LL^T$, where L (which can be seen as the ``square root'' of A) is a lower triangular matrix with positive diagonal elements. To solve Ax = b, one solves first Ly = b for y, and then $L^T x = y$ for x.

A variant of the Cholesky decomposition is the form $A = R^T R$, where R is upper triangular.

Cholesky decomposition is often used to solve the normal equations in linear least squares problems; they give $A^T A\, x = A^T b$, in which $A^T A$ is symmetric and positive definite.

To derive $A = LL^T$, we simply equate coefficients on both sides of the equation to obtain:

$a_{11} = l_{11}^2, \quad a_{21} = l_{21}l_{11}, \quad a_{22} = l_{21}^2 + l_{22}^2, \quad a_{32} = l_{31}l_{21} + l_{32}l_{22} \;\Rightarrow\; l_{32} = (a_{32} - l_{31}l_{21})/l_{22}, \quad \text{etc.}$

In general, for $j = 1, \ldots, n$ and $i > j$:

$l_{jj} = \left(a_{jj} - \sum_{k=1}^{j-1} l_{jk}^2\right)^{1/2}, \qquad l_{ij} = \frac{1}{l_{jj}}\left(a_{ij} - \sum_{k=1}^{j-1} l_{ik} l_{jk}\right).$

Because A is symmetric and positive definite, the expression under the square root is always positive, and all $l_{ij}$ are real (see [Golub89]).
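A sketch of solving the normal equations via a Cholesky factorization with NumPy (the data are random and illustrative):

```python
import numpy as np

# Solve the normal equations (A^T A) x = A^T b with a Cholesky factorization.
rng = np.random.default_rng(2)
A = rng.normal(size=(20, 3))
b = rng.normal(size=20)

N = A.T @ A                       # symmetric positive definite
L = np.linalg.cholesky(N)         # N = L L^T, L lower triangular
y = np.linalg.solve(L, A.T @ b)   # solve L y = A^T b
x = np.linalg.solve(L.T, y)       # solve L^T x = y
print(np.allclose(x, np.linalg.lstsq(A, b, rcond=None)[0]))   # True
```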
Clipping Clipping is used, apart from everyday usage, in image processing, when parts of an image are removed, usually delimited by straight lines. Images which are projections of three-dimensional computer objects may be clipped in 3-D, usually by one or several delimiting plane(s). Clipping is also in use for thresholding signal amplitudes or greyvalues in an image.
Composite Hypothesis A hypothesis with one or more free parameters. As an example, the hypothesis that the decay of a given particle is purely exponential with unknown lifetime, is a composite hypothesis. The testing of a composite hypothesis involves first estimating the unknown parameter(s). In the actual test, it is then necessary to compensate for the fact that the parameter(s) has (have) been fitted using the same data. Since one typically knows how to do this correctly only in the asymptotic limit of a large amount of data, such tests are never as safe as tests of simple (completely defined) hypotheses (see [Eadie71]).
Confidence Level A measure usually associated with the comparison of observed value(s) with a probability density function (pdf). It expresses the probability that the observation is as far as observed or further away from the most probable value of the pdf, i.e. it corresponds to the integral over the pdf from the observed value to infinity. Differences in interpretation exist, e.g. deviations on both sides may be considered, or the integral may extend over integration limits defined from case to case (like over all ``large'' deviations). A confidence interval, bounded by confidence limits, is an estimate for the range of values which an unknown parameter could take, given a confidence level. Confidence levels are often expressed as a percentage, e.g. it is 95% likely that a value of 11.07 or larger does not belong to a $\chi^2$ distribution with five degrees of freedom.
Constraints In classical mechanics, constraint and degree of freedom are complementary terms: adding constraints reduces the number of degrees of freedom. In statistics, on the other hand, the two terms are used with identical meaning, i.e. the number of degrees of freedom is equal to the number of independent constraints. Note that constraint equations are not independent if they contain free parameters, as eliminating one unknown costs one equation.

Example 1: In classical mechanics let a particle be constrained to move on the surface of a sphere of radius r. There are three coordinates x, y and z, and one constraint

$c(x,y,z) = x^2 + y^2 + z^2 - r^2 = 0,$

leaving 3-1=2 degrees of freedom (for the particle to move). In other words, the position of the particle is described by two independent coordinates, e.g. the polar angles $\theta$ and $\phi$, where

$x = r\sin\theta\cos\phi, \quad y = r\sin\theta\sin\phi, \quad z = r\cos\theta.$

Assume now that independent measurements of x, y, z are carried out. Then there is said to be one (statistical) degree of freedom, meaning that there is one constraint equation with no unknown. The true values of x, y, z must satisfy the constraint equation c(x,y,z)=0, but the observed values will usually fail to do so because of measurement errors. Given the true values x, y, z, the observed values X, Y, Z are random variables distributed according to a probability density f(X,Y,Z; x,y,z). In the maximum likelihood method, estimates for x, y, z are determined by the condition that f should be maximal, while at the same time c(x,y,z)=0. If the probability distribution f is Gaussian, with variances independent of x, y and z, then the maximum likelihood method reduces to the least squares method. If for example

$S^2(x,y,z) = \frac{(X - x)^2}{\sigma_x^2} + \frac{(Y - y)^2}{\sigma_y^2} + \frac{(Z - z)^2}{\sigma_z^2}$

and $\sigma_x, \sigma_y, \sigma_z$ are independent of x, y, z, then the maximum of f is the minimum of $S^2$. The least squares method provides not only a best fit for x, y, z, but also a test of the hypothesis c(x,y,z)=0. Define $S^2_{\min}$ as the minimum value of $S^2(x,y,z)$ with the constraint c(x,y,z)=0. Then in the above example $S^2_{\min}$ follows approximately a chi-square distribution with one degree of freedom, provided the hypothesis is true. It is not an exact $\chi^2$ distribution because the equation c(x,y,z)=0 is non-linear; however, the non-linearity is unimportant as long as the residuals X-x, Y-y, Z-z, etc. are small, which is true when $\sigma_x, \sigma_y, \sigma_z \ll r$.

A general method for solving constrained minimization problems is the Lagrange multiplier method. In this example it will result in four equations

$\frac{\partial}{\partial x}\left[S^2 + \lambda c\right] = 0, \quad \frac{\partial}{\partial y}\left[S^2 + \lambda c\right] = 0, \quad \frac{\partial}{\partial z}\left[S^2 + \lambda c\right] = 0, \quad c(x,y,z) = 0$

for the four unknowns x, y, z and $\lambda$, where $\lambda$ is a Lagrange multiplier.

A more efficient method in the present case is to use the constraint c(x,y,z)=0 to eliminate one variable, writing for example

$z = \pm\sqrt{r^2 - x^2 - y^2}.$

This elimination method gives 3-1=2 equations

$\frac{\partial S^2}{\partial x} = 0, \quad \frac{\partial S^2}{\partial y} = 0$

for two unknowns x and y, instead of the 3+1=4 equations of the Lagrange multiplier method. The chain rule (see Jacobi Matrix) is useful in computing $\partial S^2/\partial x$ and $\partial S^2/\partial y$. Counting constraints, one has three equations

$x = r\sin\theta\cos\phi, \quad y = r\sin\theta\sin\phi, \quad z = r\cos\theta$

with two free parameters $\theta$ and $\phi$, so the number of degrees of freedom is 3-2=1, as before. Note that x, y, z here are measured quantities and therefore not free parameters. Another possible method is to add a penalty function $kc^2$ to $S^2$, with k a large constant, and to minimize the sum $S^2(x,y,z) + k\,[c(x,y,z)]^2$.

Example 2: Assume an event in a scattering experiment where the energy and momentum of every particle is measured. Then the conservation of energy and momentum imposes four constraints, so there are four degrees of freedom. This example may also be treated differently. If N particle tracks are observed, meeting at the same vertex, then the 3N+3 physically interesting variables are the vertex position $(x,y,z)$ and the N 3-momenta $\vec{p}_1, \ldots, \vec{p}_N$. However, these are not directly measured; instead one measures altogether M coordinates on the N tracks, which are functions of the physical variables, i.e.

$m_i = m_i(x, y, z, \vec{p}_1, \ldots, \vec{p}_N), \qquad i = 1, \ldots, M.$

These are M equations with 3N+3 unknowns, so in this treatment there are M-3N-3 degrees of freedom. Adding the four energy- and momentum conservation equations gives M-3N+1 degrees of freedom.

In the last example the number of degrees of freedom happens to be equal to the number of measurements minus the number of parameters. Note that this relation is only true in the special case when there is one equation for every measured quantity, a common situation when fitting curves in two or three dimensions.
Convolution Convolution is both a mathematical concept and an important tool in data processing, in particular in digital signal and image processing.

Discussing first the mathematical aspect, let us assume the goal of an experiment is to measure a random variable X, described by the probability density function $f_x(x)$. Instead of X, however, the setup allows us to observe only the sum U = X+Y of two random variables, where Y has the probability density function $f_y(y)$ (typically Y is a composite of the measurement error and acceptance functions). The (convolved or folded) sum has the probability density f(u) given by the convolution integrals

$f(u) = \int f_x(x)\, f_y(u - x)\, dx = \int f_y(y)\, f_x(u - y)\, dy.$

If f(u) and $f_y(y)$ are known it may be possible to solve the above equation for $f_x(x)$ analytically (see Deconvolution or Unfolding).

Most frequently, one knows the general form of $f_x(x)$ and $f_y(y)$, but wants to determine some open parameters in one or both functions. One then performs the above integrals and, from fitting the result f(u) to the distribution obtained by experiment, finds the unknown parameters. For a number of cases f(u) can be computed analytically. A few important ones are listed below.

The convolution of two normal distributions with zero mean and variances $\sigma_1^2$ and $\sigma_2^2$ is a normal distribution with zero mean and variance $\sigma_1^2 + \sigma_2^2$.

The convolution of two $\chi^2$ distributions with $f_1$ and $f_2$ degrees of freedom is a $\chi^2$ distribution with $f_1 + f_2$ degrees of freedom.

The convolution of two Poisson distributions with parameters $\mu_1$ and $\mu_2$ is a Poisson distribution with parameter $\mu_1 + \mu_2$.

The convolution of an exponential and a normal distribution is approximated by another exponential distribution. If the original exponential distribution is

$f(x) = \frac{1}{\tau}\, e^{-x/\tau} \qquad (x \ge 0)$

and the normal distribution has zero mean and variance $\sigma^2$, then for $u \gg \sigma$ the probability density of the sum is

$f(u) \approx \frac{1}{\tau}\, e^{\sigma^2/(2\tau^2)}\, e^{-u/\tau}.$

In a semi-logarithmic diagram where $\log f(x)$ is plotted versus x and $\log f(u)$ versus u, the latter lies higher than the former by the amount $\sigma^2/(2\tau^2)$, but both are represented by parallel straight lines, the slope of which is determined by the parameter $\tau$.

The convolution of a uniform and a normal distribution results in a quasi-uniform distribution smeared out at its edges. If the original distribution is uniform in the region $a \le x \le b$ and vanishes elsewhere, and the normal distribution has zero mean and variance $\sigma^2$, the probability density of the sum is

$f(u) = \frac{1}{b - a}\left[\Phi\!\left(\frac{u - a}{\sigma}\right) - \Phi\!\left(\frac{u - b}{\sigma}\right)\right].$

Here $\Phi$ is the distribution function of the standard normal distribution. For $\sigma \to 0$ the function f(u) vanishes for u < a and u > b and is equal to 1/(b-a) in between. For finite $\sigma$ the sharp steps at a and b are rounded off over a width of the order $\sigma$.

Convolutions are also an important tool in the area of digital signal or image processing. They are used for the description of the response of linear shift-invariant systems, and are used in many filter operations. One-dimensional discrete convolutions are written

$z_n = \sum_{k} x_k\, y_{n-k}$

(often abbreviated to $z = x * y$). Convolutions are commutative, associative, and distributive; they have as the identity operation the convolution with the unit pulse d, with $d_0 = 1$ and $d_j = 0$ for all $j \neq 0$, so that $x * d = x$. The figure shows a one-dimensional example of two sequences and their convolution.

For longer sequences, convolution may pose a problem of processing time; it is often preferred to perform the operation in the frequency domain: if X, Y and Z are the Fourier transforms of x, y and z, respectively, then

$Z = X \cdot Y.$

Normally one uses a fast Fourier transform (FFT), so that the transformation becomes $z = \mathrm{FFT}^{-1}\left(\mathrm{FFT}(x)\cdot\mathrm{FFT}(y)\right)$. For the FFT, the sequences x and y are padded with zeros to a length equal to a power of 2 and of at least M + N - 1 samples. For more details and further references, see e.g. [Kunt80], [Oppenheim75] or [Rabiner75].
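A minimal sketch (assuming NumPy) of the discrete convolution and its FFT equivalent for two short sequences:

```python
import numpy as np

x = np.array([1.0, 2.0, 3.0])
y = np.array([0.5, 0.5])

# direct discrete convolution
z = np.convolve(x, y)                      # [0.5, 1.5, 2.5, 1.5]

# the same result via the FFT, after zero-padding to at least M + N - 1 samples
n = len(x) + len(y) - 1
z_fft = np.fft.irfft(np.fft.rfft(x, n) * np.fft.rfft(y, n), n)
print(np.allclose(z, z_fft))               # True
```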
Next: Correlation Coefficient Up: No Title Previous: Convolution
The mathematical description of a geometrical system (detector, magnetic field, etc.) can often be greatly simplified by expressing it in terms of an appropriate coordinate system. Let (x, y, z) be Cartesian (or Euclidean) coordinates. A point in space is represented by a vector

r = O + x ex + y ey + z ez

where O is the origin and ex, ey, ez are Cartesian unit vectors. The line element is ds² = dx² + dy² + dz².
Two different Euclidean coordinate systems are related by a translation and a rotation. For more general coordinates ui = ui(x,y,z), the chain rule (see Jacobi Matrix) gives the line element

ds² = Σij gij dui duj,   with gij = (∂r/∂ui)·(∂r/∂uj)

Any vector defined at the point r, e.g. an electric field E, will be expressed in the old and the new system as a linear combination of the respective basis vectors. The most important difference is that the new basis vectors ∂r/∂ui vary with the point r, while the Cartesian basis vectors are the same everywhere.
For orthogonal coordinate systems, which are the main systems in practice, one has gij = 0 for i ≠ j. It is then convenient to introduce orthonormal basis vectors at the point r. An orthogonal matrix A (see Rotations) relates the two bases. The determinant |A| is everywhere either +1 (for a right-handed system) or -1 (left-handed), except possibly at singularities of the transformation. In two dimensions, the most common non-Cartesian system is that of polar coordinates. In three dimensions, the most commonly used systems, apart from Cartesian, are cylindrical coordinates and spherical coordinates.
Correlation Coefficient
The correlation coefficient ρ(Xi,Xj) between two random variables Xi and Xj is their covariance divided by the square root of the product of the variances,

ρ(Xi,Xj) = Cij / √(Cii Cjj)

It has the range -1 ≤ ρ ≤ +1 and vanishes for independent variables. If ρ = ±1, Xi and Xj are linearly dependent and the covariance matrix is singular. The correlation coefficient can be regarded as a measure of the relation between the statistical distributions of the two random variables considered: if σ1² and σ2² are the variances along the uncorrelated major and minor axes in the plane defined by the two variables, the correlation coefficient after a rotation by the angle α (see Bivariate Normal Distribution) is given by

ρ = (σ1² - σ2²) sin α cos α / (σ1' σ2')

with σ1'² = σ1² cos²α + σ2² sin²α and σ2'² = σ1² sin²α + σ2² cos²α. If no minor/major axes can be defined (σ1 = σ2), the variables are uncorrelated.
The global correlation coefficient ρi is a measure for the strongest correlation between variable i and a linear combination of all other variables, and is defined by

ρi = √(1 - 1/(Cii (C⁻¹)ii))

where Cii and (C⁻¹)ii are elements in the diagonal of the covariance matrix and of its inverse, respectively.
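A small added sketch (not from the original entry; the covariance matrix is invented) of both definitions, using numpy:

import numpy as np

C = np.array([[4.0, 1.2, 0.5],
              [1.2, 2.0, 0.3],
              [0.5, 0.3, 1.0]])            # covariance matrix

sigma = np.sqrt(np.diag(C))
rho = C / np.outer(sigma, sigma)            # pairwise correlation coefficients

Cinv = np.linalg.inv(C)
rho_global = np.sqrt(1.0 - 1.0 / (np.diag(C) * np.diag(Cinv)))

print(rho)
print(rho_global)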
Cost of Test
The cost of a test is the probability of rejecting good events in hypothesis testing (see Neyman-Pearson Diagram).
Covariance
The covariance between two random variables Xi, Xj is the following moment about the means E(Xi), E(Xj) (see Expectation Value):

Cij = E[(Xi - E(Xi)) (Xj - E(Xj))]

It vanishes for independent variables. The converse is not true: a covariance of zero is not a sufficient condition for independence. The Cii are the variances of the variables. The Cij constitute the covariance matrix. The covariance matrix is always symmetric and positive semi-definite. It is diagonal if all n variables are independent. Its determinant is zero if linear relations exist between variables.
Covariance Ellipse
See Bivariate Normal Distribution.
Cramer-Rao Inequality
This inequality sets a lower limit (the minimum variance bound) on the uncertainty which can be achieved in the estimation of a parameter a. The bound is given by

V(a) ≥ (1 + ∂B/∂a)² / Ix(a)

where V is the variance (square of the standard deviation), B is the bias of the estimator, and Ix(a) is the information about a contained in the data X, which can be written

Ix(a) = E[(∂ ln L/∂a)²] = ∫ (∂ ln L/∂a)² L dX

where L is the likelihood function, and the integral is taken over all the space of the observables X (see [Eadie71]).
Cramer-Smirnov-Von-Mises Test
A powerful test that a one-dimensional data sample is compatible with being a random sampling from a given distribution. It is also used to test whether two data samples are compatible with being random samplings of the same, unknown distribution. It is similar to the Kolmogorov test, but somewhat more complex computationally. To compare data consisting of N events whose cumulative distribution is SN(x) with a hypothesis function whose cumulative distribution is F(x) and whose density function is f(x), the value W² is calculated:

W² = ∫ [SN(x) - F(x)]² f(x) dx

The confidence levels for some values of NW² are ([Eadie71]), for N > 3:
confidence level    NW²
10%                 0.347
5%                  0.461
1%                  0.743
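For illustration (an added sketch, not from the original entry), NW² can be computed with the commonly used summation form of the statistic, here assuming a standard normal hypothesis:

import numpy as np
from math import erf, sqrt

def F(x):                        # hypothesis: standard normal cumulative distribution
    return 0.5 * (1.0 + erf(x / sqrt(2.0)))

x = np.random.normal(size=200)   # data sample to be tested
xs = np.sort(x)
N = len(xs)
i = np.arange(1, N + 1)
NW2 = 1.0 / (12 * N) + np.sum((np.array([F(v) for v in xs]) - (2 * i - 1) / (2.0 * N)) ** 2)

# Compare with the table above: e.g. NW2 > 0.461 rejects the hypothesis at the 5% level.
print(NW2)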
Cramer's Rule
The solution of two linear equations in two unknowns,

a11 x + a12 y = b1
a21 x + a22 y = b2

is x = D1/D, y = D2/D, where D, D1 and D2 are the determinants

D = a11 a22 - a12 a21,   D1 = b1 a22 - a12 b2,   D2 = a11 b2 - b1 a21

Cramer's rule is the general formula for n linear equations with n unknowns: each unknown xi can be expressed as the quotient Di/D, where D is the determinant of the coefficient matrix, and Di is D with the ith column replaced by the right-hand side. For large n, the method is both inefficient on computers and numerically unstable, and hence should in general not be used for numerical computations if n > 3.
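An added sketch of Cramer's rule for two equations (numbers invented for the example):

def cramer_2x2(a11, a12, a21, a22, b1, b2):
    D = a11 * a22 - a12 * a21            # determinant of the coefficient matrix
    if D == 0.0:
        raise ValueError("singular system: no unique solution")
    D1 = b1 * a22 - a12 * b2             # first column replaced by the right-hand side
    D2 = a11 * b2 - b1 * a21             # second column replaced by the right-hand side
    return D1 / D, D2 / D

# 2x + y = 5, x + 3y = 10  ->  x = 1, y = 3
print(cramer_2x2(2.0, 1.0, 1.0, 3.0, 5.0, 10.0))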
Curtosis
The curtosis c (also kurtosis) of a distribution is defined as the quotient of the fourth moment about the mean E(X) and the fourth power of the standard deviation σ:

c = E[(X - E(X))⁴] / σ⁴

It is large if the distribution has sizeable tails which extend much further from the mean E(X) than σ. Since the normal distribution has c = 3, it is sometimes c - 3 that is called the curtosis.
Cylindrical Coordinates
The cylindrical coordinates (r, φ, z) are related to Cartesian coordinates (x, y, z) by

x = r cos φ,   y = r sin φ,   z = z

The matrix A (see Coordinate Systems) relating the two sets of unit vectors is

A = (  cos φ   sin φ   0
      -sin φ   cos φ   0
        0       0      1 )

The volume element is dV = r dr dφ dz, and the distance element is ds² = dr² + r² dφ² + dz². The Laplace differential equation ΔU = 0 becomes in cylindrical coordinates

(1/r) ∂/∂r (r ∂U/∂r) + (1/r²) ∂²U/∂φ² + ∂²U/∂z² = 0
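A small added sketch of the coordinate conversion in Python:

import math

def cartesian_to_cylindrical(x, y, z):
    r = math.hypot(x, y)
    phi = math.atan2(y, x)
    return r, phi, z

def cylindrical_to_cartesian(r, phi, z):
    return r * math.cos(phi), r * math.sin(phi), z

print(cartesian_to_cylindrical(1.0, 1.0, 2.0))   # (sqrt(2), pi/4, 2.0)
print(cylindrical_to_cartesian(*cartesian_to_cylindrical(1.0, 1.0, 2.0)))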
Database
A database is a computer-based collection of data structured in a schematic way, and usually includes an access system called a database management system (or DBMS) of variable complexity (see [Mayne81] or [Bowers91]). General database systems are, of course, available commercially; a good introduction can be found in [Loney94].
Data Compression
Large amounts of data can create enormous problems in storage and transmission. A good example is given by digitized images: a single DIN A4 colour picture, scanned at 300 dpi with 8 bits/pixel/colour, produces 30 MBytes of data. The widespread, consumer-market use of information in the form of images has contributed much to the development of data compression techniques. The design goal of image compression is to represent images with as few bits as possible, according to some fidelity criterion, to save storage and transmission channel capacity. All image compression techniques try to get rid of the inherent redundancy, which may be spatial (neighbouring pixels), spectral (pixels in different spectral bands in a colour image) or temporal (correlated images in a sequence, e.g. television).

There are lossless methods, which are reversible, viz. do not sacrifice any information, and lossy methods which may be used if the quality of a compression-decompression sequence is judged by general criteria, like unchanged quality for the human visual system. Note that in image processing jargon, ``lossless'' is sometimes used in the sense of ``no visible loss''.

Examples of lossless methods are run-length coding, Huffman coding, or the Lempel-Ziv-Welch (LZW) method. In run-length coding one replaces runs, sequences of equal greyvalues, by their lengths and the greyvalues. Huffman and LZW coding are approximations to entropy encoding, i.e. frequently used sequences are replaced by short codes, rare sequences by longer codes. In Huffman coding, sequences are single greyvalues, for LZW they are strings of greyvalues.

Of the many lossy coding techniques the simplest may be thresholding, applicable in some situations; the most important ones are predictive and transform coding. In predictive coding, one removes the correlation between neighbouring pixels locally, and quantizes only the difference between the value of a sample and a predicted value (see Quantization). Transform coding decorrelates the whole signal, e.g. pixels in an image, as a unit, and then quantizes the transform coefficients, viz. one sets a block of insignificant coefficients to zero. Only complete sets of unitary transforms are considered, i.e. transforms with the property of equal energy in the spatial domain and in the transform domain. This compression works well if the energy is clustered in a few transform samples. One talks of zonal coding if certain coefficients are systematically set to zero (e.g. frequencies in the Fourier domain), and of adaptive coding if coefficients are set to zero according to some threshold criterion of significance (e.g. rank reduction in principal component analysis).

The following sets of unitary transforms are usually described in the literature ([Rabbani91]):
- Karhunen-Loeve or principal component analysis,
- Discrete cosine transform,
- Fourier transform,
- Hadamard transform,
- Slant transform,
- Haar transform.
They are listed above in order of decreasing energy compaction and computer time used. The popular JPEG algorithm for compression of colour images uses essentially the discrete cosine transform (DCT), followed by quantization and Huffman coding (JPEG, short for the original committee ``Joint Photographic Experts Group'', is a widely used compression standard for still images).
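As an added illustration of the simplest lossless method mentioned above, a run-length coder in Python (the greyvalues are invented for the example):

def run_length_encode(values):
    encoded = []
    run_value, run_length = values[0], 1
    for v in values[1:]:
        if v == run_value:
            run_length += 1
        else:
            encoded.append((run_length, run_value))
            run_value, run_length = v, 1
    encoded.append((run_length, run_value))
    return encoded

def run_length_decode(encoded):
    out = []
    for length, value in encoded:
        out.extend([value] * length)
    return out

row = [0, 0, 0, 255, 255, 17, 17, 17, 17]
print(run_length_encode(row))            # [(3, 0), (2, 255), (4, 17)]
print(run_length_decode(run_length_encode(row)) == row)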
Data Structures
A collection of data items and the relations between them are called a data structure in designing and writing programs. Typically, data items are grouped to represent necessary conceptual units proper to the application. These might be units of physics (particle, event, shower), of measurement (pulse in a sensor), of a piece of apparatus (a scintillator cell, a VME crate) or of data processing (a list of tentatively associated signals). Items have data attributes (e.g. coordinates, momentum, signal shape), and relational attributes (pointers to other items). The proper definition of data and their relations is a key element in software engineering, so much so that modern object-oriented programming talks of ``objects'' that may be data or program pieces, and usually contain some of both. Whilst there is no discussion about their conceptual necessity (see [Maurer77]), the practical implementation of data structures is far from agreed upon. Standard programming languages offer more or less limited data structuring concepts as part of the programming language. Most typically, they are limited in the sense that they get declared once and remain rigidly the same; the concept of dynamic data structures allows structural changes during execution of an application; this is, of course, more difficult to define and implement (see [King92]).
Decibel
One tenth of a unit called the bel, after A.G. Bell, the decibel (dB) denotes the tenfold logarithm to base 10 of the ratio of two amounts of power, 10 log10(P1/P0). The dB is a convenient way to define attenuation and gain in a system; according to the above definition, 20 dB describes a ratio of 100:1, 3 dB is close to a factor of 2, and -20 dB stands for a factor of 0.01. One decibel in dynamic range corresponds to 0.3322 bits. The same measure is often used by engineers with a factor of 2 applied to denote ratios of voltages (or currents), in the form 20 log10(V1/V0), as power is proportional to the square of the voltage. Note that for a voltage or current ratio the system impedance must be constant. In the frequent use of the unit in the domain of audible noise one often (mistakenly) thinks of dB as an absolute unit; in reality, the decibel is a unit to express ratios of sound pressure p1/p0, with the above definition, where p0 is the ``smallest audible noise''. Audio engineers also use dB in the above sense of voltage ratios, and write dBV if they scale by setting 0 dBV = 1 V, or dBu if the scale is given by 0 dBu = 0.775 V.
Decision Boundary
See Neyman-Pearson Diagram.
Decision Quality Diagram
See Neyman-Pearson Diagram.
Deconvolution
See Convolution. For more detail, see [Blobel85], [Press95].
Degrees of Freedom
Most frequently used in connection with the χ²-distribution and in least squares fitting, the number of degrees of freedom describes how many redundant measurements exist in an overdetermined system, and allows one to predict the probability density function of the minimum of the sum of squares in least squares fitting. For more detail, see Constraints.
Delta Function
The delta ``function'' δ(x) (also Dirac delta function) is not a true function, since it cannot be defined completely by giving the function value for all values of the argument x. Similar to the Kronecker delta, the notation δ(x - a) stands for a spike that is zero everywhere except at x = a and has unit integral. For any function F:

∫ F(x) δ(x - a) dx = F(a)

or in n dimensions:

∫ F(x1,...,xn) δ(x1 - a1) ··· δ(xn - an) dx1 ··· dxn = F(a1,...,an)

δ(x) can also be defined as a normalized Gaussian function in the limit of zero width.
Derivative Matrix
See Jacobi Matrix.
Differentiation
See Jacobi Matrix, Numerical Differentiation.
Dirac Delta Function
See Delta Function.
Discrete Cosine Transform
Also abbreviated DCT, the transform is closely related to the fast Fourier transform; it plays a role in coding signals and images [Jain89], e.g. in the widely used standard JPEG compression. The one-dimensional transform is defined (in the usual orthonormal convention) by

t(k) = c(k) Σn s(n) cos[(2n+1)kπ / (2N)],   k, n = 0, ..., N-1

where s is the array of N original values, t is the array of N transformed values, and the coefficients c are given by c(0) = √(1/N) and c(k) = √(2/N) for k > 0.

The discrete cosine transform in two dimensions, for a square matrix, can be written as

t(i,j) = c(i,j) Σm Σn s(m,n) cos[(2m+1)iπ / (2N)] cos[(2n+1)jπ / (2N)]

with an analogous notation for N, s, t, and the c(i,j) given by c(0,j) = 1/N, c(i,0) = 1/N, and c(i,j) = 2/N for both i and j non-zero. The DCT has an inverse, defined by

s(n) = Σk c(k) t(k) cos[(2n+1)kπ / (2N)]

for the one-dimensional case, and analogously for two dimensions. The DCT is included in commercial image processing packages, e.g. in Matlab (see [MATLAB97]).
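An added sketch of the one-dimensional DCT and its inverse in the orthonormal convention quoted above (numpy, illustrative data):

import numpy as np

def dct(s):
    N = len(s)
    n = np.arange(N)
    t = np.empty(N)
    for k in range(N):
        c = np.sqrt(1.0 / N) if k == 0 else np.sqrt(2.0 / N)
        t[k] = c * np.sum(s * np.cos((2 * n + 1) * k * np.pi / (2 * N)))
    return t

def idct(t):
    N = len(t)
    k = np.arange(N)
    c = np.where(k == 0, np.sqrt(1.0 / N), np.sqrt(2.0 / N))
    s = np.empty(N)
    for n in range(N):
        s[n] = np.sum(c * t * np.cos((2 * n + 1) * k * np.pi / (2 * N)))
    return s

s = np.array([8.0, 16.0, 24.0, 32.0, 40.0, 48.0, 56.0, 64.0])
t = dct(s)
print(np.allclose(idct(t), s))   # True: the transform is invertible
print(t)                         # the energy is concentrated in few coefficients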
Discriminant Function
See Neyman-Pearson Diagram, Discriminant Analysis.
Dispersion Matrix
See Principal Component Analysis.
Distance Function
See Metric.
Distribution
A distribution of measurements or observations is the frequency of these measurements shown as a function of one or more variables, usually in the form of a histogram. Experimental distributions can thus be compared to theoretical probability density functions. The term distribution function is short for cumulative distribution function and describes the integral of the probability density function: a random variable X has the (cumulative) distribution function F(x) if the probability for an experiment to yield an X < x is

F(x) = P(X < x)

For several random variables X1, ..., Xn the joint distribution function is

F(x1, ..., xn) = P(X1 < x1, ..., Xn < xn)
Dynamic Range
Dynamic Range The range of signals that can be correctly handled by a device. It can be expressed as a ratio, either linear or logarithmic [decibel] or, when digitized, as the word length generated by the quantization process, usually expressed in bits. The limiting factors, at the low end, are the system noise and, if applicable, the size of the quantization step. To accommodate simultaneously low and very large signals, one frequently applies a non-linear approach (e.g. logarithmic to maintain constant relative error); any non-linearity will produce a response where the absolute resolution changes with amplitude, thus requiring a careful choice of the non-linear transfer function.
Eigenvalue Problems
Eigenvalue problems appear as part of the solution in many scientific or engineering applications. An example is the determination of the main axes of a second order surface xᵀAx = 1 (with a symmetric matrix A). The task is to find the places where the normal, which is proportional to Ax, is parallel to the vector x, i.e.

Ax = λx

A solution x of the above equation with xᵀAx = 1 has the squared distance xᵀx = 1/λ from the origin. Therefore, the main axes are of length 1/√λ.

The general algebraic eigenvalue problem is given by

(A - λI) x = 0

with I the identity matrix, an arbitrary square matrix A, an unknown scalar λ, and the unknown vector x. A non-trivial solution to this system of n linear homogeneous equations exists if and only if the determinant det(A - λI) vanishes.
This nth degree polynomial in λ is called the characteristic equation. Its roots are called the eigenvalues, and the corresponding vectors x eigenvectors. In the example, x is a right eigenvector (Ax = λx); a left eigenvector y is defined by yᵀA = λyᵀ.

Solving this polynomial for λ is not a practical method to solve eigenvalue problems; a QR-based method is a much more adequate tool (see [Golub89]); it works as follows: A is reduced to the (upper) Hessenberg matrix H or, if A is symmetric, to a tridiagonal matrix T. A Hessenberg matrix has zeros below the first subdiagonal; a tridiagonal matrix is non-zero only on the diagonal and the first sub- and superdiagonal. This is done with a ``similarity transform'': if S is a non-singular (n,n) matrix, then Ax = λx becomes By = λy with y = Sx and B = SAS⁻¹, i.e. A and B share the same eigenvalues (not the eigenvectors). We will choose for S Householder transformations (see Householder Transformation). The eigenvalues are then found by applying iteratively the QR decomposition, i.e. the Hessenberg (or tridiagonal) matrix H will be decomposed into upper triangular matrices R and orthogonal matrices Q. The algorithm is surprisingly simple: H = H1 is decomposed H1 = Q1R1, then an H2 is computed, H2 = R1Q1. H2 is similar to H1 because H2 = R1Q1 = Q1⁻¹H1Q1, and is decomposed to H2 = Q2R2. Then H3 is formed, H3 = R2Q2, etc. In this way a sequence of Hi's (with the same eigenvalues) is generated, that finally converges to an upper triangular or a diagonal matrix (for conditions, see [Golub89]),
respectively; the diagonal elements are then the eigenvalues. For access to software, see Linear Algebra Packages; the modern literature also gives code, e.g. [Press95].
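An added sketch of the unshifted QR iteration with numpy (practical library routines add shifts and a preliminary Hessenberg or tridiagonal reduction; this only illustrates the idea, on an invented symmetric matrix):

import numpy as np

A = np.array([[4.0, 1.0, 0.5],
              [1.0, 3.0, 0.2],
              [0.5, 0.2, 1.0]])

H = A.copy()
for _ in range(100):                    # H_{k+1} = R_k Q_k, same eigenvalues
    Q, R = np.linalg.qr(H)
    H = R @ Q

print(np.sort(np.diag(H)))              # approximate eigenvalues
print(np.sort(np.linalg.eigvalsh(A)))   # reference values from a library routine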
Entropy
Originally derived as a quantity which permits one to express formally the second law of thermodynamics (Clausius): the entropy S (of a closed system) changes by dS = dQ/T, where dQ is the (heat) energy transferred to the system at temperature T; S can only increase with time or stay the same. The second law is characteristic for irreversible processes, which tend to evolve towards equilibrium; as such, entropy is also at the centre of debates on causality (which in many ways contradicts time reversibility) and consciousness. In general terms, entropy is a measure of ``disorder'' and can be seen as depending directly on probability, S = k log P + k0, where k and k0 are constants and P is the probability of a state.

Entropy is also a concept used in information theory; if N states are possible, each characterized by a probability pi, with Σ pi = 1, then

H = - Σ pi log2 pi

is the entropy, the lowest bound on the number of bits needed to describe all parts of the system; it corresponds to the information content of the system (see [Jain89]). This is used in data compression: entropy encoding makes use of the non-uniform occurrence of bit patterns in some quantized scheme. An efficient entropy encoding technique is Huffman coding.
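A small added sketch of the information-theoretic entropy:

import math

def entropy_bits(probabilities):
    """Lowest bound, in bits per symbol, for encoding these states."""
    return -sum(p * math.log2(p) for p in probabilities if p > 0.0)

print(entropy_bits([0.5, 0.25, 0.25]))   # 1.5 bits
print(entropy_bits([0.25] * 4))          # 2.0 bits: uniform case, maximal
print(entropy_bits([1.0]))               # 0.0 bits: a certain outcome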
Error Ellipse
See Bivariate Normal Distribution.
Error Function
See Normal Distribution.
Error Propagation
If X = (X1, ..., Xn) is a set of random variables with the covariance matrix Cx, and if Y = (Y1, ..., Ym) is a set of transformed variables with transformation functions Y = Y(X) which are linear or well approximated by the linear terms of the Taylor series in the neighbourhood of the mean E(X), then the covariance matrix Cy of Y is

Cy = T Cx Tᵀ

where T is the matrix of derivatives Tij = ∂Yi/∂Xj (see Jacobi Matrix). If the Xi are independent, i.e. if Cx is diagonal, the variances of the Yi are given by the so-called law of error propagation:

σ²(Yi) = Σj (∂Yi/∂Xj)² σ²(Xj)
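An added numpy sketch of the matrix form Cy = T Cx Tᵀ, for two invented transformation functions y1 = x1 + x2 and y2 = x1·x2:

import numpy as np

x = np.array([2.0, 5.0])                  # mean values E(X)
Cx = np.array([[0.04, 0.00],
               [0.00, 0.25]])             # covariance matrix of X

# Jacobi matrix T_ij = dY_i/dX_j evaluated at E(X)
T = np.array([[1.0,  1.0],
              [x[1], x[0]]])

Cy = T @ Cx @ T.T
print(Cy)                                  # covariance matrix of Y
print(np.sqrt(np.diag(Cy)))                # propagated standard deviations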
Errors, General Classification
A measurement of a physical quantity yields a random variable X, which is different from the true value because of various sources of measurement errors. It is useful to consider the distribution of X as caused by one single error source at a time, and to find the distribution due to all sources, considered independently, by convolution. The essence of experimentation consists of finding devices and methods which
- allow one to estimate the errors reliably, and
- keep them small enough to allow the experiment to produce meaningful conclusions.
The most important types of errors are superficially discussed in the following.

a) Random errors occur whenever random processes are at work in a measurement, e.g. ionization in chambers, transmission of light in scintillators, conversion of a light signal into an electric signal. Being sums of many small error sources, they are usually well approximated by the normal distribution. The effect of random errors decreases by a factor 1/√R when the available sample size increases by a factor R.

b) A special case of random error occurs when a measurement consists of counting random events. The outcome is then an integer n between 0 and a maximum number N, and the statistical distribution of n is the binomial distribution. For large N and small probability, the binomial distribution approaches the Poisson distribution. The variance of n can be estimated assuming a binomial or Poisson distribution (for the Poisson distribution, var(n) = n). Only if both n and N-n are large is the assumption of a normal distribution for n justified ([Regener51]).

c) Truncation and rounding errors occur whenever signals are converted to and processed in digital form. Comparatively easy to estimate are truncation errors occurring in digitization processes, e.g. time digitizers using a clock, mechanical digitizers of length or angle using a grating, or analogue to digital converters (ADCs) using simple divider chains. The relevant quantity in these processes is the value corresponding to the least count (e.g. the inverse clock frequency). Translating the least count (l.c.) into a statistical measure, one obtains a standard deviation of l.c./√12.
The effect of truncation errors may be reduced by increased sample size in many cases, but they do not follow the law of Gaussian errors (see [Drijard80]). Rounding errors in the processing of data, i.e. caused in algorithms by the limited word length of computers, are usually much more difficult to estimate. They depend, obviously, on parameters like word size and number representation, and even more on the numerical methods used. Rounding errors in computers may amplify harmless limitations in precision to the point of making results meaningless. A more general theoretical treatment is found in textbooks of numerical analysis (e.g. [Ralston78a]). In practice, algorithms suspected of producing intolerable rounding errors are submitted to stability tests with changing word length, to find a stability plateau where results are safe.

d) Systematic errors are those errors which contain no randomness and cannot be decreased by increasing sample size. They are due to incomplete knowledge or inadequate consideration of effects like mechanical misalignment, electronic distortion of signals, time-dependent fluctuations of experimental conditions, etc. The efforts of avoiding and detecting all possible systematic errors take the better part of design and analysis in an experiment, the general aim being that they should be compensated or understood and corrected to a level which depresses them below the level of random errors. This usually necessitates a careful scheme of calibration procedures using either special tests and data or, preferably, the interesting data themselves. A systematic error causes the expectation value of X to be different from the true value, i.e. the measurement has the bias B = E(X) - (true value).

One will usually try to find some estimate b for the bias B by estimating the precision of the calibration procedures used. For lack of better knowledge one then introduces b as an additional random error (of Gaussian distribution) around the mean of X; this is mathematically equivalent to X carrying an additional variance b². A systematic error is thus treated as if it were a random error, which is perfectly legitimate in the limit of many small systematic errors. However, whereas the magnitude of random errors can be estimated by comparing repeated measurements, this is not possible for systematic errors.

e) Gross errors are those errors originating in wrong assumptions; they result in a deterioration of results or in losses of data which are difficult to estimate in general. Despite serious preparation and careful real-time control, experiments usually produce data that require, at all levels of processing, cuts and decisions based on statistical properties and hence sometimes are taken wrongly (e.g. the limited two-track resolution of a drift chamber makes two adjacent tracks appear as one, random pulses in scintillators produce a fake trigger). The experimenter's aim is, of course, to keep the influence of gross errors below that of all other error sources. The extent of his success becomes visible when test functions are compared with their theoretical distribution. In nearly all experiments, such critical distributions exhibit tails larger than expected, which show the level of gross errors (outliers) of one sort or another.
Errors, Quadratic Addition
Let a measurement of the physical quantity a yield the random variable X, and the deviation of X from a be due to N independent (uncorrelated) errors. Hypothetical measurements with only one of these errors present would yield the deviations Δ1, Δ2, ..., ΔN. If all these differences can be described by distributions with zero means and variances σ1², ..., σN², then the difference X - a follows a distribution of zero mean and variance

σ² = σ1² + σ2² + ... + σN²

(see Convolution). Expressed in errors rather than variances, one has the rule of quadratic addition of errors:

σ = √(σ1² + σ2² + ... + σN²)

For errors Δi of normal distribution, the total error will also have a normal distribution. For large N, the total error will have a normal distribution for any distribution of the Δi (see central limit theorem).
Estimator
A random variable X is described by a probability density function f(x; a1, ..., ak), which is determined by one or several parameters a1, ..., ak. From a sample of size N, e.g. the results X1, X2, ..., XN of a series of N measurements, one can construct functions Si = Si(X1, ..., XN), which are called estimators of the parameters and can be used to determine the ai.

An estimator is unbiased if its expectation value E(Si) is equal to the parameter in question, E(Si) = ai. Otherwise it has the bias B = E(Si) - ai. An estimator is consistent if its bias and variance both vanish for infinite sample size. An estimator is called efficient if its variance attains the minimum variance bound (see Cramer-Rao Inequality), which is the smallest possible variance.

For the estimators of the parameters of the more important distributions, see e.g. Binomial Distribution, Normal Distribution. Uncertainties of estimators with unknown statistical properties can be studied using subsamples (see Bootstrap). Quite independent of the type of distribution, unbiased estimators of the expectation value and the variance are the sample mean and the sample variance:

X̄ = (1/N) Σi Xi,    s² = 1/(N-1) Σi (Xi - X̄)²

The practical implementation of this formula seems to necessitate two passes through the sample, one for finding the sample mean, a second one for finding s². A one-pass formula is

s² = 1/(N-1) [ Σi (Xi - C)² - (1/N) (Σi (Xi - C))² ]

where C has been introduced as a first guess of the mean, to avoid the numerical difficulties clearly present if the mean is large compared with the standard deviation. Usually, C = X1 is a sufficiently accurate guess, if C = 0 is not adequate.
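An added sketch of the one-pass formula, with C taken as the first sample value (numbers invented):

def one_pass_mean_variance(sample):
    C = sample[0]                      # first guess of the mean
    n, s, s2 = 0, 0.0, 0.0
    for x in sample:
        d = x - C
        n += 1
        s += d
        s2 += d * d
    mean = C + s / n
    variance = (s2 - s * s / n) / (n - 1)
    return mean, variance

print(one_pass_mean_variance([10.1, 9.8, 10.3, 10.0, 9.9]))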
Euler Angles
See Rotations.
Expectation Value
The expectation value or mean of a random variable X or a function H(X) is given by

E(H) = Σi H(xi) P(X = xi)   or   E(H) = ∫ H(x) f(x) dx

for a discrete or continuous variable, respectively. The sum for discrete variables is extended over all possible values xi, where P(X = xi) are the corresponding probabilities. For continuous variables, the probability density is f(x). The concept is readily generalized for several random variables by replacing X by (X1, ..., Xn). The expectation value is a linear operator. The expectation value of a function is sometimes written Ex(H) instead of E(H(x)).
Exponential Distribution
The exponential distribution is characterized by a probability density function

f(x) = a exp(-ax)

with positive a and for x ≥ 0, resulting in the mean E(X) = 1/a and the variance 1/a².

Exponential distributions describe the distance between events with uniform distribution in time: if x is the time variable and ax is the expected number of events in the interval [0,x], then exp(-ax) is the probability of no event in [0,x] (see Poisson Distribution). The probability for the first event to occur in the interval [x, x+dx] is given by a exp(-ax) dx. Thus, the distribution of individual lifetimes of unstable particles is exponential. Exponential functions are also commonplace when describing phenomena of attenuation. Depending on the context, the mean 1/a is called the mean life of a particle, the lifetime of a stored beam, the attenuation length of a scintillator, etc.

In a bin of width Δx with starting abscissa x1 one will find a fraction of events given by

F = exp(-a x1) - exp(-a (x1 + Δx))

The average height of the probability density over the bin is F/Δx. The average abscissa for the same bin is at

x̄ = x1 + 1/a - Δx / (exp(a Δx) - 1)

which is always between x1 and the bin centre x1 + Δx/2, as can be seen by expanding the last term in powers of a Δx.
Extrapolation to the Limit
Let F(h) be some quantity, such as a numerical derivative or integral, depending on a finite step size h, where the limit of F(h) as h → 0 is wanted. If it is known that F(h) = F(0) + O(hⁿ), i.e. the order n of the error is known, then for any r (with 0 < r < 1) one can form the combination

G(h) = [F(rh) - rⁿ F(h)] / (1 - rⁿ)

so as to obtain a smaller error as h → 0, G(h) = F(0) + O(hᵐ) with m > n. If m is known, then the procedure can be repeated, with G instead of F and m instead of n.
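An added sketch of one such (Richardson-type) extrapolation step, applied to a central-difference derivative for which n = 2 (function and step size invented):

import math

def central_diff(f, x, h):
    return (f(x + h) - f(x - h)) / (2 * h)       # = f'(x) + O(h^2)

def extrapolated(f, x, h, r=0.5, n=2):
    Fh, Frh = central_diff(f, x, h), central_diff(f, x, r * h)
    return (Frh - r**n * Fh) / (1 - r**n)        # error now O(h^4)

x, h = 1.0, 0.1
print(abs(central_diff(math.sin, x, h) - math.cos(x)))   # ~ 1e-3
print(abs(extrapolated(math.sin, x, h) - math.cos(x)))   # much smaller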
F Distribution
(Also called the Fisher-Snedecor distribution): if X1 and X2 are random variables that follow normal distributions with arbitrary means and variances σ1² and σ2², and if samples of sizes N1 and N2 are drawn from the two distributions, then the sample variances s1² and s2² are unbiased estimators of the variances. The quantities f1 s1²/σ1² and f2 s2²/σ2² follow chi-square distributions with f1 = N1-1 and f2 = N2-1 degrees of freedom, respectively. The quotient

F = (s1²/σ1²) / (s2²/σ2²)

is described by the F distribution with (f1,f2) degrees of freedom. It has the probability density function

f(F) = [Γ((f1+f2)/2) / (Γ(f1/2) Γ(f2/2))] (f1/f2)^(f1/2) F^(f1/2 - 1) (1 + f1 F/f2)^(-(f1+f2)/2)

and the properties

E(F) = f2/(f2-2)  (for f2 > 2),    var(F) = 2 f2² (f1+f2-2) / [f1 (f2-2)² (f2-4)]  (for f2 > 4)

In the limit f2 → ∞ the product f1·F approaches the chi-square distribution with f1 degrees of freedom.
F Test
In comparing two independent samples of sizes N1 and N2, the F test provides a measure for the probability that they have the same variance. The estimators of the variance are s1² and s2². We define as test statistic their ratio T = s1²/s2², which follows an F distribution with f1 = N1-1 and f2 = N2-1 degrees of freedom. One can formulate the F test for three different hypotheses: σ1² > σ2², σ1² < σ2², and σ1² ≠ σ2² (one-sided and two-sided tests). Tables of the quantiles of the F distribution can be found in the literature (e.g. [Brandt83]).
Factor Analysis
A variant of statistical analysis of data close to principal component analysis, the difference being largely one of terminology. Factors are the eigenvectors of the dispersion matrix (see Principal Component Analysis), multiplied by the square root of the corresponding eigenvalue. One calls the elements of this transformed eigenvector the factor loadings; they show the amount by which each original variable contributes to the total variance, which is the eigenvalue (= the sum of squared factor loadings). As in principal component analysis, it is customary in factor analysis to set some elements in the factor matrix (viz. the matrix of factor loadings) to zero, using some problem-specific thresholding; this rank analysis or quantization reduces the dimensionality of the problem.
Fast Transforms
In image or signal processing, linear transforms are commonplace, e.g. in data compaction; their processing time may be a sensitive parameter. To transform an N-vector f according to g = A f (with A a square matrix) into a vector g requires O(N²) operations; for large N clearly a problem (imagine the pixels of an image, redefined every 25 ms!). Fortunately, for most unitary transforms (and for N an integral power of 2), fast algorithms exist. They are essentially based on the fact that one can partition the task into some intermediate steps, and subsequently reuse the intermediate results in further iterations. The Cooley-Tukey algorithm for the discrete Fourier transform is an example. It is based on a factorization algorithm by Good, A = A1 A2 ··· Ap, where the Ai are very sparse matrices. This reduces the number of operations to the order of N log2 N.

We illustrate this by the 8-point Walsh transform (see Orthogonal Functions), which uses the same algorithm with different coefficients: g = W8 f, where f and g are 8-vectors. If this transformation were to be carried out in a straightforward way, 64 additions or subtractions would be necessary. Good's sparse matrix factorization for this case reads W8 = A1 A2 A3 (the explicit matrices are not reproduced here).
In the first step, only sums and differences of neighbouring pixels are formed. They are then used in the second step to produce expressions of four pixels, etc. Only three steps are necessary to obtain the entire transform.
The following signal flowchart shows the three steps; solid and dashed lines indicate additions and subtractions, respectively:
A similar gain can be obtained on the Haar transform (the transformation matrix for an 8-vector is not reproduced here).
Blind execution needs 64 (N²) additions or subtractions (plus a small number of multiplications by 2 or √2). Suppressing all zero values, this is reduced substantially: the corresponding signal flowchart shows that only 14 or, more generally, 2(N-1) additions or subtractions are necessary for its computation (remember that N is a power of 2). This is by far the fastest of all unitary transforms.
No efficient algorithm has been found for the optimal Karhunen-Loeve transform, which is signal-dependent. For all other unitary transforms fast algorithms exist. They reduce the task from O(N²) operations to about O(N log2 N) operations, except for the very sparse Haar transform, for which O(N) operations suffice. For the Fourier transform, these operations are complex, for all the others real. Because the non-sinusoidal Walsh and Haar transform matrices consist only of +1, -1 and zero, or simple multiples thereof, only additions or subtractions have to be executed. Comparing complex multiplication with real addition, and considering the necessary high precision for the Fourier kernels, the difference in processing time between Fourier and Walsh or Haar transforms is evidently substantial. For more details and references, see e.g. [Beauchamp87], [Kunt80], [Pratt78], or [Ahmed75].
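An added sketch of a fast Walsh(-Hadamard) algorithm of the kind described above: sums and differences of pairs are formed in log2(N) passes (input vector invented, no normalization applied):

def fwht(f):
    g = list(f)
    n = len(g)                      # n must be a power of 2
    h = 1
    while h < n:
        for start in range(0, n, 2 * h):
            for i in range(start, start + h):
                a, b = g[i], g[i + h]
                g[i], g[i + h] = a + b, a - b
        h *= 2
    return g

f = [1, 0, 1, 0, 0, 1, 1, 0]
print(fwht(f))      # 8 coefficients from N*log2(N) = 24 additions/subtractions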
Feature Extraction
The transformation of signal or image raw data into higher-level characteristic variables. These may be general features, which are evaluated to ease further processing, or application-oriented, like those needed for image recognition. Extracting edges in an image (see Sharpening) is an example of a general algorithm; identifying the boundaries of individual chromosomes in medical imaging is an application-dependent example. For a list of textbooks, see Image Processing.
FIFO
Short for first-in-first-out. A commonly used buffering technique holding data as in a pipeline, e.g. for synchronizing different parts of a complex system, or in a data-driven environment. More commonly used in hardware, but software implementations of FIFOs exist, e.g. as communication channels in parallel systems. See also Stack.
Filtering
Filtering is a basic technique in signal processing, and usually consists of the modification of the frequency components of a signal; for instance, the shape of the frequency spectrum is modified by suppressing or enhancing certain frequencies. In general, these filters are defined as step functions in the frequency domain (pass- and stop-bands). Many linear shift-invariant systems can be described by a difference equation with constant coefficients of the form:

Σn an y(k-n) = Σm bm x(k-m)    (n = 0, ..., N;  m = 0, ..., M)

If we assume a0 = 1, we can compute the current output y(k) from the input values and the output values that have been previously computed:

y(k) = Σm bm x(k-m) - Σn≥1 an y(k-n)

If N = 0, the filter is called a finite impulse response filter (FIR); the output depends only on the sum of products of the weights bm with the present and past M input samples; this is a non-recursive filter. If M = 0, the filter is called an infinite impulse response filter (IIR); the output depends on the present input sample and the sum of products of the weights an with the past N output samples; this is a recursive filter. Some idealized filters are shown in the following graph:
For the design of digital filters, see e.g. [Kunt84], [PROG79], or the signal processing tools of commercial mathematical packages. The following example of a band stop filter was constructed using Matlab [MATLAB97]:
The graph shows an input signal consisting of the sum of three sine functions with different frequencies (sin1, sin2, sin3), the ideal band stop filter, and the resulting output signal with sin2 suppressed. The two plots of the magnitude of the digital Fourier transforms (DFT) of the input and output signals also show the suppression of the middle frequency.
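As an added sketch (plain Python, coefficients invented for the example), the difference equation above with a0 = 1 can be applied directly:

def difference_filter(b, a, x):
    """y(k) = sum_m b[m] x(k-m) - sum_{n>=1} a[n] y(k-n), with a[0] = 1."""
    y = []
    for k in range(len(x)):
        acc = sum(b[m] * x[k - m] for m in range(len(b)) if k - m >= 0)
        acc -= sum(a[n] * y[k - n] for n in range(1, len(a)) if k - n >= 0)
        y.append(acc)
    return y

x = [1.0, 0.0, 0.0, 0.0, 0.0, 0.0]               # unit impulse input
print(difference_filter([0.5, 0.5], [1.0], x))    # FIR: two-point average
print(difference_filter([1.0], [1.0, -0.5], x))   # IIR: exponentially decaying response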
Finite Difference Method
A simple and efficient method for solving ordinary differential equations (ODEs) in problem regions with simple boundaries. The method requires the construction of a mesh defining local coordinate surfaces. For each node of this mesh, the unknown function values are found, replacing the differential equations by difference equations, i.e. derivatives like dy/dx are replaced by finite quotients Δy/Δx, where Δy and Δx are steps in an iterative procedure. For a more detailed discussion and examples, see e.g. [Press95].
Finite Element Method
A powerful method for solving partial differential equations (PDE) in problem regions with complicated boundaries, if the PDE is equivalent to the minimization problem for a variational integral. The method requires the definition of elementary volumes, for each of which the integral can be approximated as a function of node values of the unknown functions. The sum of these variational integral values will be minimized by the method as a function of the node values. For a more detailed discussion and examples, see e.g. [Press95], [Ames77].
Finite State Machine
Although basically a formal concept like the Turing machine, finite state machines do have practical applications. A finite state machine consists of
- a set of states,
- an input alphabet (tokens),
- a transition function for each state, mapping tokens to other states.
Some of the states are terminal, like ``accept'' or ``reject'', thus have no output to other states. Other than the transition functions, a finite state machine has no memory. Finite state machines may be used to classify items, or to find a string of tokens in an input stream.
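An added sketch of a small finite state machine that finds the token string "ab" in an input stream (states and tokens invented for the example):

STATES = {
    "start":  {"a": "saw_a", "b": "start"},
    "saw_a":  {"a": "saw_a", "b": "accept"},
    "accept": {},                       # terminal state, no further transitions
}

def accepts(tokens):
    state = "start"
    for t in tokens:
        if state == "accept":
            break
        # tokens outside the alphabet fall back to "start"; a complete machine
        # would define a transition for every token of the alphabet
        state = STATES[state].get(t, "start")
    return state == "accept"

print(accepts("bbaab"))   # True: the stream contains "ab"
print(accepts("bbaaa"))   # False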
Fitting
Experimental data analysis frequently leads to the following set of m simultaneous equations for the n (< m) unknowns cj (an overdetermined system):

Σj cj fj(ui) = bi    (i = 1, ..., m;  j = 1, ..., n)

Here the cj are the unknowns and the fj(ui), bi are known. If we introduce
- the (m,n) matrix A = (fj(ui)),
- the (n,1) matrix x = (cj),
- the (m,1) matrix b = (bi),
the problem to solve becomes

A x ≅ b

where the sign ≅ means that we want to find the vector x in the range of A which is closest to b according to some norm ([Branham90], [Flowers95]).

As an example we choose the fitting of a second-order polynomial. With fj(ui) = ui^(j-1), the matrix A in the above equation has the rows (1, ui, ui²), and A x ≅ b can be solved e.g. by QR decomposition: Q R x = b becomes x = R⁻¹ Qᵀ b.
As a second example we look at the fitting of a second-order surface z = c1 + c2 u + c3 v + c4 u² + c5 uv + c6 v² through the neighbours of a point in an image (e.g. a 3x3 neighbourhood). The coordinates u, v and the given values z are taken from this neighbourhood. The coefficients of the second-order polynomial can then be found by solving A x ≅ z with the least squares condition; using the pseudoinverse, one gets x = A⁺ z.
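An added numpy sketch of the first example, fitting a second-order polynomial by least squares (the data are invented):

import numpy as np

u = np.array([0.0, 1.0, 2.0, 3.0, 4.0, 5.0])
b = np.array([1.1, 1.9, 5.2, 10.1, 16.8, 26.2])   # noisy values of 1 + u**2

# Design matrix A with columns f_j(u_i) = u_i**(j-1)
A = np.vstack([np.ones_like(u), u, u**2]).T

x, residuals, rank, sv = np.linalg.lstsq(A, b, rcond=None)
print(x)          # fitted coefficients, close to (1, 0, 1)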
Folding
See Convolution.
Fourier Transform
The principle of Fourier analysis consists of decomposing an arbitrary function s(t), possibly periodic, into simple wave forms, i.e. into a sum of sine and cosine waves in the case of a periodic wave form, and into an integral over sine and cosine waves, if the wave form is not periodic. This way one obtains a representation of the original wave form that allows one to identify easily which frequencies are contained in the wave form. Mathematically speaking, there are two steps involved in performing this decomposition. Step one is the Fourier transform

S(f) = ∫ s(t) exp(-2πi f t) dt

of the wave form s(t). The second step is the inverse Fourier transform

s(t) = ∫ S(f) exp(2πi f t) df

which yields the decomposition of s(t). If the wave form s(t) is periodic with period T, the Fourier transform is given by the series

S(f) = Σn cn δ(f - n/T),   with   cn = (1/T) ∫ s(t) exp(-2πi n t/T) dt

where δ is the Dirac delta function. Substitution yields the Fourier series

s(t) = Σn cn exp(2πi n t/T)
Interpreting f as a frequency, it follows that S(f) determines which frequencies contribute to the sine and cosine decomposition of s(t), and what the corresponding amplitudes are. If S(f) is written as |S(f)| exp(iφ(f)), then |S(f)| is called the amplitude, or Fourier spectrum, of s(t), and φ(f) is the phase angle of the Fourier transform. Knowledge of S(f) is sufficient for reconstructing s(t). In other words, the Fourier transform S(f) is a representation in the frequency domain of the information contained in the wave form s(t) in the time domain. The following basic properties of the Fourier transform are important for applications.

Time domain                      Frequency domain
s1(t) + s2(t)                    S1(f) + S2(f)
s(at)                            (1/a) S(f/a)
s(t - t0)                        exp(-2πi f t0) S(f)
exp(2πi f0 t) s(t)               S(f - f0)
s(t) even                        S(f) real
s(t) odd                         S(f) imaginary
s1(t) * s2(t)  (convolution)     S(f) = S1(f) S2(f)
For Fourier analysis on a computer, the infinite integration interval has to be truncated on both sides, and the integral discretized. This leads to what is called the discrete Fourier transform and its inverse, vital tools in signal and image processing. The discrete Fourier transform X of a vector x of length N is defined by

X(k) = Σn x(n) Wᵏⁿ    (n, k = 0, ..., N-1)

and its inverse is given by

x(n) = (1/N) Σk X(k) W⁻ᵏⁿ

with W = exp(-2πi/N). Many digital signal processing operations, e.g. convolution, can be speeded up substantially if implemented in the frequency domain, in particular when the Fast Fourier Transform (see Fast Transforms) is used. The main use of the discrete Fourier transform is in finding the frequency components in signals.
For more information about the Fourier transform, and particularly spectral analysis, consult the standard textbooks, e.g. [Kunt80] or [Rabiner75]. For implementations, see [Press95], or rely on software packages like Matlab (see [MATLAB97]) or Mathematica (see [Wolfram91]).
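An added numpy sketch of this main use, finding the frequency components of a sampled signal (frequencies and sampling rate invented):

import numpy as np

fs = 100.0                                   # sampling frequency in Hz
t = np.arange(0, 1.0, 1.0 / fs)              # one second of samples
x = np.sin(2 * np.pi * 5 * t) + 0.5 * np.sin(2 * np.pi * 20 * t)

X = np.fft.fft(x)
freqs = np.fft.fftfreq(len(x), d=1.0 / fs)

# The magnitude spectrum peaks at +-5 Hz and +-20 Hz
peaks = freqs[np.abs(X) > 0.3 * np.abs(X).max()]
print(sorted(peaks))                          # [-20, -5, 5, 20] (in Hz)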
Fractile
See Quantile.
Full Width at Half Maximum
The full width at half maximum or FWHM is a simple measure of the width of a distribution, and is easily obtained from empirical distributions, e.g. histograms. As one of the two parameters describing a Breit-Wigner distribution, whose standard deviation is infinite, it is most frequently used in connection with distributions describing resonant states. For a distribution described by the probability density f(x), the FWHM is defined by |x2 - x1|, where x1, x2 are points to the left and right of the mode xm (defined by f(xm) = max), with f(x1) = f(x2) = f(xm)/2. For the normal distribution one has the relation

FWHM = 2 √(2 ln 2) σ ≈ 2.355 σ

between FWHM and the standard deviation σ. The FWHM can only be defined for unimodal distributions.
Gabor Filter
Gabor filters are defined by harmonic functions modulated by a Gaussian distribution, for example in two dimensions and using a polar coordinate system with coordinates r and φ.
Gabor filters bear some similarity to Fourier filters, but (by the Gaussian damping terms) are limited to certain frequency bands (``passband filter''). With a judicious choice of frequencies, e.g. by octaves (viz. by successive factors of 2), a succession of Gabor filters can be assimilated to a wavelet transform, and do an excellent job in image or information compaction. Compare also Haar Transform.
Gamma Function
Euler's gamma function is defined by the integral

Γ(z) = ∫ t^(z-1) exp(-t) dt   (integration from 0 to ∞)

For real integer and half-integer arguments it is given by Γ(n+1) = n! and Γ(n + 1/2) = √π (2n)! / (4ⁿ n!), and the recurrence formula valid for all complex z (except negative integers and zero) is

Γ(z+1) = z Γ(z)

Some further values of the gamma function for small arguments are:

Γ(1/5) = 4.5909    Γ(1/4) = 3.6256
Γ(1/3) = 2.6789    Γ(2/5) = 2.2182
Γ(3/5) = 1.4892    Γ(2/3) = 1.3541
Γ(3/4) = 1.2254    Γ(4/5) = 1.1642

An asymptotic formula for |arg z| < π and |z| large is Stirling's formula

Γ(z) ≈ √(2π/z) (z/e)^z

which also approximates the factorial: n! ≈ √(2πn) (n/e)ⁿ.
Gauss-Jordan Elimination
See Gaussian Elimination.
Gauss-Markov Theorem
This theorem states that when estimating parameters in a linear model (viz. the parameters appear linearly in the model), the linear least squares estimator is the most efficient (viz. has minimum variance, see Estimator) of all unbiased estimators that are linear functions of the data. There are cases where other estimators are more efficient, but they are not linear functions of the data.
Gauss-Seidel Iteration
See Linear Equations, Iterative Solutions.
Gaussian Distribution
See Normal Distribution.
Gaussian Elimination
Gaussian elimination is used to solve a system of linear equations Ax = b, with an (n,n) coefficient matrix A, the vector x of n unknowns, and the right-hand side vector b.
The method consists of combining the coefficient matrix A with the right hand side b to the ``augmented'' (n, n + 1) matrix
A sequence of elementary row operations is then applied to this matrix so as to transform the coefficient part to upper triangular form:
● multiply a row by a non-zero real number c,
● swap two rows,
● add c times one row to another one.
The original equation is thus transformed to Rx = c with an upper triangular matrix R, from which the unknowns x can be found by back substitution.
Assume we have transformed the first column, and we want to continue the elimination with the remaining submatrix. To zero the first element below the diagonal in the second column, we divide the second row by the ``pivot'' (its diagonal element), multiply it by the element to be zeroed, and subtract it from the third row. If the pivot is zero we have to swap two rows. This procedure frequently breaks down, not only for ill-conditioned matrices. Therefore, most programs perform ``partial pivoting'', i.e. they swap with the row that has the maximum absolute value in that column. ``Complete pivoting'', always putting the absolutely largest element of the whole matrix into the pivot position, implying reordering of rows and columns, is normally not necessary. Another variant is Gauss-Jordan elimination, which is closely related to Gaussian elimination: with the same elementary operations it zeroes not only the elements below the diagonal but also those above, so that the resulting augmented matrix has a diagonal coefficient part.
Therefore, back substitution is not necessary and the values of the unknowns can be computed directly. Not surprisingly, Gauss-Jordan elimination is slower than Gaussian elimination.
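The following Python/numpy sketch (an illustration, not from the original text) implements Gaussian elimination with partial pivoting and back substitution; for production work one would normally call a library routine such as numpy.linalg.solve:

import numpy as np

def gauss_solve(A, b):
    A = A.astype(float).copy()
    b = b.astype(float).copy()
    n = len(b)
    for k in range(n - 1):
        # partial pivoting: bring the largest |element| of column k to row k
        p = k + np.argmax(np.abs(A[k:, k]))
        if p != k:
            A[[k, p]] = A[[p, k]]
            b[[k, p]] = b[[p, k]]
        for i in range(k + 1, n):
            factor = A[i, k] / A[k, k]
            A[i, k:] -= factor * A[k, k:]
            b[i] -= factor * b[k]
    # back substitution on the upper triangular system
    x = np.zeros(n)
    for i in range(n - 1, -1, -1):
        x[i] = (b[i] - A[i, i + 1:] @ x[i + 1:]) / A[i, i]
    return x

A = np.array([[2.0, 1.0, -1.0], [-3.0, -1.0, 2.0], [-2.0, 1.0, 2.0]])
b = np.array([8.0, -11.0, -3.0])
print(gauss_solve(A, b))    # expected: [2, 3, -1]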
Gaussian Quadrature
The computation of definite integrals in one or more dimensions is called quadrature (see Numerical Integration). Gaussian quadrature uses the fact that the choice of abscissas at which to evaluate the function to be integrated can substantially improve the accuracy of the result; for details, see [Press95].
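As an illustration (assuming Python/numpy, not part of the original entry), Gauss-Legendre nodes and weights can be used as follows; the 5-point rule is an arbitrary choice:

import numpy as np

nodes, weights = np.polynomial.legendre.leggauss(5)    # nodes and weights on [-1, 1]

def gauss_legendre(f, a, b, nodes=nodes, weights=weights):
    # map the nodes from [-1, 1] to [a, b]
    x = 0.5 * (b - a) * nodes + 0.5 * (a + b)
    return 0.5 * (b - a) * np.sum(weights * f(x))

print(gauss_legendre(np.exp, 0.0, 1.0), np.e - 1.0)    # both ~ 1.7182818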
Genetic Algorithms
A class of heuristic and adaptive search algorithms useful in maximization and minimization problems with a large number of discrete solutions. They are inspired by concepts of natural selection, and contain an aspect of randomness. In a genetic algorithm, many solutions are maintained, each with an associated value of the objective function (the function that is to be maximized). These solutions are allowed to combine according to a reproductive plan, producing new solutions. The population size is kept constant; as both reproduction and replacement in the population favour solutions with a high objective function (the ``fittest''), the population may eventually converge to good solutions only, each generation being noticeably better than the previous one. For implementation, see [Beasley93].
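A minimal, illustrative genetic algorithm sketch in Python (it is not the scheme of [Beasley93]; population size, mutation rate and the toy objective, counting 1-bits, are arbitrary choices):

import random

BITS, POP, GENERATIONS, MUTATION = 40, 60, 80, 0.01
fitness = sum                                            # objective function: number of 1-bits

def tournament(pop):
    # reproduction favours the fitter of two randomly drawn solutions
    a, b = random.sample(pop, 2)
    return a if fitness(a) >= fitness(b) else b

random.seed(1)
population = [[random.randint(0, 1) for _ in range(BITS)] for _ in range(POP)]
for _ in range(GENERATIONS):
    new_population = []
    for _ in range(POP):
        p1, p2 = tournament(population), tournament(population)
        cut = random.randrange(1, BITS)                  # one-point crossover
        child = p1[:cut] + p2[cut:]
        child = [1 - bit if random.random() < MUTATION else bit for bit in child]
        new_population.append(child)
    population = new_population

print(max(fitness(ind) for ind in population))           # close to BITS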
Geometric Mean
See Mean.
Geometrical Transformations
In many image processing applications, geometrical transformations facilitate processing. Examples are image restoration, where one frequently wants to model the degradation process as space-invariant, or the calibration of a measurement device, or a correction in order to remove a relative movement between object and sensor. In all cases the first operation is to eliminate a known geometrical distortion.
In many cases, a two-dimensional polynomial transformation from the distorted (x,y) system to the undistorted (u,v) system is sufficient:

u = sum of a_ij x^i y^j ,   v = sum of b_ij x^i y^j   (summed over i + j <= N).

The a_ij, b_ij are usually found by some fitting method. If one takes the example of N = 3, i.e. n = 10 unknowns, with m corresponding point pairs (x_i, y_i) and (u_i, v_i), and defines the (m,n) matrix A whose rows contain the monomials in x_i and y_i,
then the problem can be written Aa = u and Ab = v, with a, b the vectors of coefficients and u, v the vectors of the u_i, v_i. Choosing the orthogonal triangularization A = QR (see QR Decomposition), the solutions are obtained from the triangular systems

S a = P^T u   and   S b = P^T v ,

where S is the (n,n) upper triangular part of R, and P the (m,n) part of Q formed by its first n columns.
A different example is the calibration of a scanner based on a cathode ray tube; this type of two-dimensional digitizer shows a pincushion distortion, which is corrected by a fifth-order polynomial (i.e. N = 5).
When dealing with digital images characterized by greyvalues (or colours), interpolation between the greyvalues at the locations in the distorted image becomes necessary. This resampling can be done using different methods. The simplest one is the nearest-neighbour interpolation. Bilinear interpolation uses a weighted average of the four nearest pixels: if (x0,y0) is the point for which the greyvalue g0(x0,y0) should be interpolated, and (x_i,y_i) with g_i(x_i,y_i) are the four known neighbours, then g0 can be computed by establishing a local relation of the form g(x,y) = a + bx + cy + dxy. The four unknowns a, b, c, d are determined from the four known nearest neighbours, and then used to obtain g0. For higher order interpolations, see e.g. [Pratt78].
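A short Python/numpy sketch of bilinear resampling (an illustration under the local relation given above; the grid and query point are invented):

import numpy as np

def bilinear(image, x0, y0):
    i, j = int(np.floor(y0)), int(np.floor(x0))     # top-left neighbour (row, column)
    dy, dx = y0 - i, x0 - j                         # fractional position inside the cell
    g00, g01 = image[i, j], image[i, j + 1]
    g10, g11 = image[i + 1, j], image[i + 1, j + 1]
    return ((1 - dx) * (1 - dy) * g00 + dx * (1 - dy) * g01
            + (1 - dx) * dy * g10 + dx * dy * g11)

img = np.arange(16, dtype=float).reshape(4, 4)      # toy image with greyvalue 4*row + column
print(bilinear(img, 1.5, 2.25))                     # 10.5, as expected for this image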
Givens Rotation
Let A be an (m,n) matrix with m >= n and full rank (viz. rank n). An orthogonal triangularization (see QR Decomposition) consists in determining an (m,m) orthogonal matrix Q such that QA contains the (n,n) upper triangular matrix R in its first n rows and zeros below. One then only has to solve the triangular system Rx = Py, where P consists of the first n rows of Q. Householder transformations clear whole columns except for the first element of a vector. If one wants to clear parts of a matrix one element at a time, one can use Givens rotations, which are particularly practical for parallel implementation (see Parallel Processing). A Givens rotation matrix equals the identity matrix except for four elements, in rows and columns i and k, which are cos(phi) on the diagonal and +-sin(phi) off the diagonal; with a properly chosen rotation angle phi it can be used to zero the element a_ki. The elements can be zeroed column by column, from the bottom up.
Q is then the product of g Givens matrices, one per element to be zeroed, g = n(2m - n - 1)/2. To annihilate the bottom element of a (2,1) vector (a, b), the conditions sa + cb = 0 and c^2 + s^2 = 1 give

c = a / sqrt(a^2 + b^2) ,   s = -b / sqrt(a^2 + b^2) .
For `Fast Givens', see [Golub89].
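A small Python/numpy illustration (not part of the original entry) of one Givens rotation annihilating the second component of a (2,1) vector:

import numpy as np

def givens(a, b):
    r = np.hypot(a, b)             # sqrt(a**2 + b**2), computed safely
    return a / r, -b / r           # c = cos(phi), s = sin(phi)

a, b = 3.0, 4.0
c, s = givens(a, b)
G = np.array([[c, -s], [s, c]])
print(G @ np.array([a, b]))        # -> [5, 0]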
Global Correlation Coefficient
See Correlation Coefficient.
Global Image Operations
A global operation on an image is a mapping of all input pixels f(m,n) into an output image g(i,j). Linear transformations (usually invertible) can be written

g(i,j) = sum over m and n of O(i,j; m,n) f(m,n) .

The kernel O(i,j; m,n) is a function of the input and output coordinates, of the row coordinates i,m and the column coordinates j,n. Particularly interesting are the linear transformations with separable kernels,

O(i,j; m,n) = O_C(i,m) O_R(j,n) ,

in which case the two-dimensional transformation can be executed as the succession of two one-dimensional transforms, columns first and rows next, or, in matrix notation, G = O_C F O_R^T.

All image transformations mentioned under orthogonal functions are of this type. If F is an (N,N) image, the general linear transformation represents O(N^4) operations (multiplications and additions). For a Fourier transform the operations are complex. For a reasonably large N this becomes in practice a problem of computing time. If the kernel is separable as above, the number of operations is reduced to 2N^3. For further drastic reductions, see Fast Transforms.
Another global image processing operation is the Hough transform.
Goodness-of-fit Test
A statistical test in which the validity of one hypothesis is tested without specification of an alternative hypothesis is called a goodness-of-fit test. The general procedure consists in defining a test statistic, which is some function of the data measuring the distance between the hypothesis and the data (in fact, the badness of fit), and then calculating the probability of obtaining data which have a still larger value of this test statistic than the value observed, assuming the hypothesis is true. This probability is called the size of the test or confidence level. Small probabilities (say, less than one percent) indicate a poor fit. Especially high probabilities (close to one) correspond to a fit which is too good to happen very often, and may indicate a mistake in the way the test was applied, such as treating data as independent when they are correlated. The most common tests for goodness-of-fit are the chi-square test, the Kolmogorov test, the Cramer-Smirnov-Von-Mises test, and tests based on runs.
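As an illustration (Python/numpy assumed, not part of the original entry), a chi-square goodness-of-fit test of binned data against a uniform hypothesis; the 5% critical value of 16.9 corresponds to 9 degrees of freedom:

import numpy as np

rng = np.random.default_rng(3)
data = rng.uniform(0.0, 1.0, 1000)

counts, _ = np.histogram(data, bins=10, range=(0.0, 1.0))
expected = np.full(10, len(data) / 10.0)

chi2 = np.sum((counts - expected) ** 2 / expected)    # the test statistic
# 9 degrees of freedom (10 bins, total count fixed); the 5% critical value is about 16.9
print(chi2, "reject at 5%" if chi2 > 16.9 else "compatible at 5%")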
Gradient
The gradient of a (differentiable) function f(x_1, ..., x_n) of n variables is the vector of partial derivatives

grad f = ( df/dx_1, ..., df/dx_n )

(see Jacobi Matrix). It can be pictured as a vector pointing in the direction of fastest increase of f, whose length |grad f| is the rate of increase of f in this direction. Thus, at any given point, grad f is normal to the level surface f(x) = constant which goes through that point.

In two dimensions, drawing level curves f(x1,x2) = c for suitably chosen values of c produces a ``map'' of f. Then grad f points everywhere ``uphill'', normal to the level curves, with |grad f| inversely proportional to the distance between the curves.

An extremum of f, i.e. a local minimum, maximum or saddle point, is a point where grad f = 0. One method for finding an extremum is to solve the equation grad f = 0, e.g. by the Newton-Raphson method, which converges fast, but requires the second derivatives of f as well as a more or less precise first guess for the solution. Other minimization methods exist which use f and grad f but no second derivatives, or which use only f itself (see Minimization).
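A small Python/numpy sketch (illustrative only): the gradient approximated by central differences and used for a simple steepest-descent search, which needs no second derivatives:

import numpy as np

def grad(f, x, h=1e-6):
    g = np.zeros_like(x, dtype=float)
    for i in range(len(x)):
        e = np.zeros_like(x, dtype=float)
        e[i] = h
        g[i] = (f(x + e) - f(x - e)) / (2.0 * h)    # central difference in direction i
    return g

f = lambda x: (x[0] - 1.0) ** 2 + 4.0 * (x[1] + 2.0) ** 2
x = np.array([0.0, 0.0])
for _ in range(200):
    x = x - 0.1 * grad(f, x)                        # step "downhill", against the gradient
print(x)                                             # approaches the minimum at (1, -2)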
Gram-Schmidt Decomposition
Any set of linearly independent vectors can be converted into a set of orthogonal vectors by the Gram-Schmidt process. In three dimensions v1 determines a line; the vectors v1 and v2 determine a plane. The vector q1 is the unit vector in the direction v1. The (unit) vector q2 lies in the plane of v1, v2, and is normal to v1 (on the same side as v2). The (unit) vector q3 is normal to the plane of v1, v2, on the same side as v3. In general, first set u1 = v1, and then each u_i is made orthogonal to the preceding u_1, ..., u_(i-1) by subtraction of the projections of v_i in the directions of u_1, ..., u_(i-1):

u_i = v_i - sum over j < i of [ (v_i . u_j) / (u_j . u_j) ] u_j .

The i vectors u_1, ..., u_i span the same subspace as the v_1, ..., v_i. The vectors q_i = u_i / |u_i| are orthonormal. This leads to the following theorem: any (m,n) matrix A with linearly independent columns can be factorized into a product, A = QR. The columns of Q are orthonormal and R is upper triangular and invertible. This ``classical'' Gram-Schmidt method is often numerically unstable; see [Golub89] for a ``modified'' Gram-Schmidt method.
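A short Python/numpy sketch of the ``modified'' Gram-Schmidt variant mentioned above (an illustration, not code from [Golub89]):

import numpy as np

def modified_gram_schmidt(A):
    A = A.astype(float).copy()
    m, n = A.shape
    Q = np.zeros((m, n))
    R = np.zeros((n, n))
    for k in range(n):
        R[k, k] = np.linalg.norm(A[:, k])
        Q[:, k] = A[:, k] / R[k, k]
        # remove the q_k component from all remaining columns immediately
        for j in range(k + 1, n):
            R[k, j] = Q[:, k] @ A[:, j]
            A[:, j] -= R[k, j] * Q[:, k]
    return Q, R

A = np.array([[1.0, 1.0], [1.0, 0.0], [0.0, 1.0]])
Q, R = modified_gram_schmidt(A)
print(np.allclose(Q @ R, A), np.allclose(Q.T @ Q, np.eye(2)))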
Graph Theory
Graph theory is a branch of topology which, although going back to L. Euler, has received particular interest only in recent years, as its applications in electrical engineering and operations research lend themselves readily to algorithmic formulation and solutions on digital computers. Also, computer science itself finds numerous applications of graph theory, like in computer networking and the structuring of data [Deo74]. Graph theory formalizes the relations of entities called graphs, which consist of two sets of objects called nodes (or vertices) and edges, each edge connecting two nodes. A vertex has the degree d if it has d incident edges. Two vertices are adjacent if they have a common edge. A walk is a sequence of connected vertices and edges; a trail is a walk in which no vertex is included more than once. An important graph is the tree, which is a graph containing no loops, and for which therefore the number of edges is one less than the number of nodes. Given a metric (a distance function for nodes), edges can be assigned a numerical value e_ij, e.g. the Euclidean distance between the two end-points i and j (if the nodes are points in n-space). It is then possible to define a minimum spanning tree, which is that tree for a given set of nodes for which the sum of the e_ij over all edges is minimal. It is comparatively easy to write an algorithm for the minimum spanning tree [Zahn71], and the concept has been applied in the recognition of tracks from digitizings ([Zahn73], [Cassel80]) and for cluster recognition in multi-dimensional space.

Another application of a graph-theoretical notion, the compatibility graph, is useful in pattern recognition, for deciding between conflicting candidates using the same raw data. The problem to be solved there arises when low-level information (e.g. digitizings or track segments) can be connected to give a higher-level result (track) in several conflicting ways. The decision as to which of the possible sets of mutually compatible tracks is to be chosen makes use of criteria like fit quality, the amount of unused information, etc. The compatibility graph is simply an aid in picking all possible non-conflicting sets. For more on graph theory, see [Chartrand85] or [Skiena90].
Haar Transform
Being the fastest of all known complete unitary transforms (see Fast Transforms), the (non-sinusoidal) Haar transform is well suited for the data compression of non-stationary (``spiky'') signals, or for edge extraction in images (see [Jain89]). It can also be viewed as a special kind of wavelet transform. In the Haar transform most of the coefficients are functions of only part of the signal; in the (orthonormal) 8-point Haar transform matrix, apart from powers of sqrt(2), all elements are 1, -1, and 0. One can interpret the multiplication of the Haar matrix with a signal vector as sampling the signal from low to high frequencies (``sequencies''). The first sample corresponds to the mean, the second to the difference between the means of the first four and the last four pixels, and the last four transform samples represent differences of two neighbouring pixels. Most of the transform coefficients depend only on their direct neighbours; the Haar transform thus has good high- and low-frequency response. As an example, a 128-point one-dimensional signal from a physics experiment can be reconstructed to good accuracy from as few as 13 Haar transform samples.
Hamming Distance
In comparing two bit patterns, the Hamming distance is the count of bits that differ between the two patterns. More generally, if two ordered lists of items are compared, the Hamming distance is the number of items that do not identically agree. This distance is applicable to encoded information, and is a particularly simple metric of comparison, often more useful than the city-block distance (the sum of absolute values of distances along the coordinate axes) or the Euclidean distance (the square root of the sum of squares of the distances along the coordinate axes). See also Metric.
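Two small Python illustrations (not part of the original entry), for bit patterns and for general ordered lists:

def hamming_bits(a, b):
    return bin(a ^ b).count("1")            # count differing bits

def hamming_seq(u, v):
    return sum(x != y for x, y in zip(u, v))

print(hamming_bits(0b10110, 0b11100))       # 2 differing bits
print(hamming_seq("karolin", "kathrin"))    # 3 differing characters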
Harmonic Mean
See Mean.
Hash Function
Hashing is the technique of using scatter storage in computer programs, for fast access to items contained in large lists. Items can, of course, represent anything a computer deals with, like events, persons, particle tracks, elements in a sparse matrix, etc. The application of hashing is indicated when frequent and random access is made to items with an identifier, the search key, as the only access clue. Well-done hashing may be regarded as approximate entropy encoding; see also Data Compression.

Scatter storage is spread over different addresses; the hash function converts the identifiers into such addresses. If the hash function is a one-to-one function, pointers to items are stored directly at the address given by the hash function. If the hash function associates the same address to different existing identifiers, the final item address is found by a search in a (short) list, and the search key must appear in the list. The ratio at which elements in the hash vector are used is called the loading factor. A good hash function achieves a loading factor not too far from one, and thus ensures roughly equal probability of addressing elements in the hash vector. In practice, search time is usually more relevant than space, and attention is given to using hash functions resulting in a loading factor < 1, so that hash clashes can be handled as exceptions.

For numeric information, e.g. the indices i in a sparse matrix, the hash function used is often i' = i modulo(k), where k is chosen to be k = 4j + 3 with j a prime number. Clashes can be settled by linear probing, which consists of searching upwards for the next unoccupied location in the hash vector, if a clash occurs. Searching time depends critically on the loading factor f, and takes on average about (1/2)(1 + 1/(1-f)) comparisons (with hash vector elements) for a successful search, and about (1/2)(1 + 1/(1-f)^2) comparisons for an unsuccessful search; this is valid for values of f around 0.5 to 0.8. For details see e.g. [Maurer77], [Knuth81], [Vitter87]; for sparse matrix compaction see [Branham90].
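A minimal Python sketch of scatter storage with linear probing (illustrative; the table size k = 4j + 3 with j = 11 is an arbitrary example of the rule quoted above):

K = 4 * 11 + 3                        # size of the hash vector
table = [None] * K                    # each slot holds (key, value) or None

def store(key, value):
    i = key % K
    while table[i] is not None and table[i][0] != key:
        i = (i + 1) % K               # linear probing: next slot upwards
    table[i] = (key, value)

def lookup(key):
    i = key % K
    while table[i] is not None:
        if table[i][0] == key:
            return table[i][1]
        i = (i + 1) % K
    return None                       # unsuccessful search

store(123456, "event A")
store(123456 + K, "event B")          # deliberately clashes with the first key
print(lookup(123456), lookup(123456 + K), lookup(42))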
Heaviside Function
A discontinuous step function, usually defined as h(x) = 0 for x < 0, and as h(x) = 1 for x >= 0.
Hessian
Given a scalar function F(x) of an n-vector x, the symmetric (n,n) matrix of second partial derivatives,

H_ij = d^2 F / dx_i dx_j ,

is called the Hessian matrix of F. The Hessian is positive (negative) definite at a minimum (maximum) of F, indefinite at a saddle point. The Laplace operator acting on F gives the trace of H:

sum over i of d^2 F / dx_i^2 = trace(H) .

For applications, see Lagrange Multipliers, Minimization.
Histogram
Measured or generated data can be grouped into bins, i.e. discretized by classifying them into groups, each characterized by a range of values in the characteristic variables. The resulting graphical representation, usually limited to one or two variables, is called a (one- or two-dimensional) histogram. This process results in a certain loss of information compared to scatter diagrams, but is frequently necessary for the purpose of showing the statistical properties of data and in applying some calculational methods. On choosing bin sizes for histograms, see Binning.
Horner's Rule
Horner's rule is the factorization

a_0 + a_1 x + a_2 x^2 + ... + a_n x^n = a_0 + x(a_1 + x(a_2 + ... + x(a_(n-1) + x a_n) ... ))

of a polynomial. It reduces the computation to n multiplications and n additions. The rule can be generalized, e.g. to a finite series sum of c_k p_k(x) in orthogonal polynomials p_k = p_k(x): using the recurrence relation of the p_k, one obtains a similar nested evaluation scheme.
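A short Python illustration (not from the original entry) of the nested evaluation:

def horner(a, x):
    # a[0] + a[1]*x + ... + a[n]*x**n, evaluated from the innermost bracket outwards
    result = 0.0
    for coeff in reversed(a):
        result = result * x + coeff     # one multiplication and one addition per step
    return result

print(horner([1.0, -3.0, 0.0, 2.0], 2.0))    # 1 - 3*2 + 2*2**3 = 11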
Hot Spot
Apparatus errors often result in some sensor parts giving a permanently high signal (or a signal above threshold). They may, for instance, be due to dark currents in a sensor, or to defective transmission. Depending on context, one talks about hot pixels, hot channels, or hot spots or spikes in a geometrical distribution.
Hough Transform
The Hough transform [Hough59] is a standard tool in image analysis that allows recognition of global patterns in an image space by recognition of local patterns (ideally a point) in a transformed parameter space. It is particularly useful when the patterns one is looking for are sparsely digitized, have ``holes'' and/or the pictures are noisy. The basic idea of this technique is to find curves that can be parameterized, like straight lines, polynomials, circles, etc., in a suitable parameter space. Although the transform can be used in higher dimensions, the main use is in two dimensions to find, e.g. straight lines, centres of circles with a fixed radius, parabolas y = ax^2 + bx + c with constant c, etc.

As an example consider the detection of straight lines in an image. We assume them parameterized in the form rho = x cos(theta) + y sin(theta), where rho is the perpendicular distance of the line from the origin and theta the angle of its normal. Collinear points (x_i, y_i), with i = 1, ..., N, are transformed into N sinusoidal curves rho = x_i cos(theta) + y_i sin(theta) in the (rho, theta) plane, which intersect in the point (rho', theta') given by the common line.

Care has to be taken when one quantizes the parameter space (rho, theta). When the bins of the (rho, theta) space (it is easy to visualize the transform as a two-dimensional histogram) are chosen too fine, each intersection of two sinusoidal curves can be in a different bin. When the quantization is not fine enough, on the other hand, nearly parallel lines which are close together will lie in the same bin. For a certain range of quantized values of the parameters rho and theta, each (x_i, y_i) is mapped into the (rho, theta) space, and the points that map into a location (rho_m, theta_m) are accumulated in the two-dimensional histogram IHIST, i.e. IHIST(rho_m, theta_m) = IHIST(rho_m, theta_m) + 1. If a greylevel image g(x,y) is given, and g_i is the greyvalue at the point (x_i, y_i), the greyvalues are accumulated instead: IHIST(rho_m, theta_m) = IHIST(rho_m, theta_m) + g_i.

In this form, the Hough transform is not basically different from the discrete Radon transform, typically used for reconstruction of three-dimensional images from two-dimensional projections. Local maxima of IHIST(rho, theta) identify straight line segments in the original image space. Ideally, the Hough domain has to be searched for a maximum only once. In situations where a picture contains many patterns of different size, it may, however, be necessary to take out first those patterns in the original image space that correspond to clearly identifiable peaks in the Hough domain, and to repeat the process.
Householder Transformation
The most frequently applied algorithm for QR decomposition uses the Householder transformation u = Hv, where the Householder matrix H is a symmetric and orthogonal matrix of the form

H = I - 2 x x^T ,

with the identity matrix I and any normalized vector x with x^T x = 1.

Householder transformations zero the m-1 elements of a column vector v below the first element:

H v = (c, 0, ..., 0)^T   with   c = +-|v| .

One can verify that the choice

x = (v - c e_1) / |v - c e_1| ,   with e_1 the first unit vector,

fulfils x^T x = 1, and that with it one indeed obtains the vector H v = c e_1 (the sign of c is usually chosen opposite to that of the first element of v, to avoid cancellation).

To perform the decomposition of the (m,n) matrix A = QR (with m >= n), we construct in this way an (m,m) matrix H^(1) to zero the m-1 elements of the first column. An (m-1,m-1) matrix G^(2) will zero the m-2 elements of the second column; with G^(2) we produce the (m,m) matrix H^(2), which has 1 in the leading diagonal position, G^(2) in the remaining block, and zeros elsewhere.
After n (or n-1, for m = n) such orthogonal transforms H^(i) we obtain

H^(n) ... H^(2) H^(1) A = R ;

R is upper triangular, and the orthogonal matrix Q becomes

Q = H^(1) H^(2) ... H^(n) .

In practice the H^(i) are never explicitly computed.
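A compact Python/numpy sketch of QR decomposition by Householder transformations (illustrative; as noted above, the H^(i) are applied as rank-one updates rather than formed explicitly):

import numpy as np

def householder_qr(A):
    A = A.astype(float).copy()
    m, n = A.shape
    Q = np.eye(m)
    for k in range(min(n, m - 1)):
        v = A[k:, k].copy()
        c = -np.sign(v[0]) * np.linalg.norm(v) if v[0] != 0 else -np.linalg.norm(v)
        w = v - c * np.eye(len(v))[0]             # v - c*e1
        if np.linalg.norm(w) == 0:
            continue
        x = w / np.linalg.norm(w)                 # normalized Householder vector
        A[k:, :] -= 2.0 * np.outer(x, x @ A[k:, :])
        Q[:, k:] -= 2.0 * np.outer(Q[:, k:] @ x, x)
    return Q, A                                    # Q orthogonal, A now upper triangular (R)

M = np.array([[4.0, 1.0], [2.0, 3.0], [2.0, 5.0]])
Q, R = householder_qr(M)
print(np.allclose(Q @ R, M), np.allclose(Q.T @ Q, np.eye(3)))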
Huffman Coding
A form of variable-length information encoding which approaches the minimum number of bits necessary (see Entropy). It uses the fact that in an information stream the data for a given variable may be given as N bits, but that not all 2^N bit combinations are used, or at least not with equal probability. In short, the ideal Huffman code functions like this (see [Jain89]): arrange the symbols in order of decreasing probability of occurrence; assign the bit 0 to the symbol of highest probability, the bit 1 to what is left; proceed the same way for the second-highest probability value (which now has the code 10), and iterate. In practice, this may result in a long ``code book'' and correspondingly clumsy, compute-intensive coding and decoding, so that application-dependent truncated Huffman codes or other modified procedures are used more often. Truncated Huffman coding encodes only the most probable values according to the rule above, and uses fixed-length coding for the remainder. See also Data Compression.
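A minimal Python sketch of Huffman coding using a binary heap (an illustration, not the truncated variants discussed above):

import heapq
from collections import Counter

def huffman_code(text):
    # each heap entry: (weight, tie-breaker, {symbol: code_so_far})
    heap = [(w, i, {s: ""}) for i, (s, w) in enumerate(Counter(text).items())]
    heapq.heapify(heap)
    count = len(heap)
    while len(heap) > 1:
        w1, _, c1 = heapq.heappop(heap)           # the two least probable nodes
        w2, _, c2 = heapq.heappop(heap)
        merged = {s: "0" + code for s, code in c1.items()}
        merged.update({s: "1" + code for s, code in c2.items()})
        heapq.heappush(heap, (w1 + w2, count, merged))
        count += 1
    return heap[0][2]

codes = huffman_code("abracadabra")
print(codes)                                      # frequent symbols get short codes
print("".join(codes[s] for s in "abracadabra"))   # the encoded bit string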
Hypothesis Testing
Much of statistical analysis is concerned with inferring from measured data in a sample some properties of a population; this is usually achieved by comparing sample data with other data of known properties, e.g. generated assuming some theory. In most cases a test statistic has to be calculated, on the basis of which the compatibility (expressed as confidence levels) with different hypotheses can be established. Note that a ``simpler'' hypothesis (often referred to as the null hypothesis) is usually to be preferred over a ``more complicated'' one; introducing many free parameters will typically make the apparent compatibility look better, but amounts to a lessening of the ``confidence level'' that is not expressed in numbers, if there is no physical reason to introduce the parameters. In statistical terminology, one refers to a type-I error if a null hypothesis is wrongly rejected, and to a type-II error if a null hypothesis is accepted when in fact it is false. This corresponds, in event classification, to losses and contamination (see Neyman-Pearson Diagram).
Ideogram
The name ideogram is sometimes used for representations of probability density functions (pdf's), obtained by superposing several pdf's of individual measurements. Typically, the individual pdf's are introduced as Gaussian curves centered at the measured value, with a width equal to the standard deviation of the measurement, and with an integral normalized to the weight of the measurement, usually unity. The bin size chosen for displaying an ideogram has to be considerably smaller than the typical standard deviation of a measurement. Ideograms are smoother in appearance than histograms, and will show repeated measurements of the same point as peaks in probability. The apparent width of measurements in the ideogram will, however, be artificially increased by adding the estimated uncertainty to the spread inherent in the measurement. Note that the name ideogram is not part of standard statistics textbook terminology, and is also used with quite different meaning in other contexts; apart from its generic meaning of a drawing or symbol representing an object or concept (e.g. Chinese characters), it is for instance used in the Human Genome Project for a specific graphical representation in chromosome classification.
Image Enhancement
Image enhancement is the improvement of digital image quality (wanted e.g. for visual inspection or for machine analysis), without knowledge about the source of degradation. If the source of degradation is known, one calls the process image restoration. Both are iconical processes, viz. input and output are images. Many different, often elementary and heuristic methods are used to improve images in some sense. The problem is, of course, not well defined, as there is no objective measure for image quality. Here, we discuss a few recipes that have been shown to be useful both for the human observer and/or for machine recognition. These methods are very problem-oriented: a method that works fine in one case may be completely inadequate for another problem. Apart from geometrical transformations, some preliminary greylevel adjustments may be indicated, to take into account imperfections in the acquisition system. This can be done pixel by pixel, calibrating with the output of an image with constant brightness. Frequently, space-invariant greyvalue transformations are also done for contrast stretching, range compression, etc. The critical distribution is the relative frequency of each greyvalue, the greyvalue histogram.

Greyvalues can also be modified such that their histogram has any desired shape, e.g. flat (every greyvalue has the same probability). All such simple greylevel transformations assume point processing, viz. each output pixel is a function of one input pixel; usually, the transformation is implemented with a look-up table.
Physiological experiments have shown that very small changes in luminance are recognized by the human visual system in regions of continuous greyvalue, but not at all near discontinuities. Therefore, a design goal for image enhancement often is to smooth images in more uniform regions, but to preserve edges. On the other hand, it has also been shown that somewhat degraded images with enhancement of certain features, e.g. edges, can simplify image interpretation both for a human observer and for machine recognition. A second design goal, therefore, is image sharpening. All these operations need neighbourhood processing, viz. the output pixel is a function of some neighbourhood of the input pixels.

These operations could be performed using linear operations in either the frequency or the spatial domain. We could, e.g. design, in the frequency domain, one-dimensional low or high pass filters (see Filtering), and transform them according to McClellan's algorithm [McClellan73] to the two-dimensional case. Unfortunately, linear filter operations do not really satisfy the above two design goals; in this book, we limit ourselves to discussing separately only (and superficially) Smoothing and Sharpening.

Here is a trick that can speed up operations substantially, and serves as an example for both point and neighbourhood processing in a binary image: we number the pixels in a 3x3 neighbourhood,
denote the binary values (0,1) by b_i (i = 0, ..., 8), and concatenate the bits into a 9-bit word, b8b7b6b5b4b3b2b1b0. This leaves us with a 9-bit greyvalue for each pixel, hence a new image (an 8-bit image with b8 taken from the original binary image will also do). The new image corresponds to the result of a convolution of the binary image with a matrix containing as coefficients the powers of two. This neighbour image can then be passed through a look-up table to perform erosions, dilations, noise cleaning, skeletonization, etc.

Apart from point and neighbourhood processing, there are also global processing techniques, i.e. methods where every pixel depends on all pixels of the whole image. Histogram methods are usually global, but they can also be used in a neighbourhood. For global methods, see Global Image Operations; see also Hough Transform.
Image Processing
Image Processing Most one- or two-dimensional signals in everyday life (audible or visible) arise in analogue form. Optical imaging systems or electrical networks process information in an analogue way. They are excellent in execution speed; however, complicated analogue signal or image processing algorithms are very difficult and sometimes impossible to implement. Today, most signals are converted into a form tractable by digital hardware, and can then be treated by digital signal processing (for one-dimensional signals) or by image processing (in two dimensions).
To convert analogue signals into a digital form one has to sample them sufficiently frequently (see Sampling Theorem) and quantize (digitize) the samples, a process usually called analogue-to-digital conversion (ADC). After digital signal processing (DSP), the digital signal is sometimes converted back to an analogue form (DAC), e.g. in an image processing system, for viewing. In this case, a video signal from a television camera is the analogue input, the processing is done by some digital hardware in or close to real time (possibly by specialized processors), and the output is again a video signal. Transmission and/or storage of digital signals sometimes needs, as a first step and for reasons of economy, data compression. Normally the signals have to be improved in some sense (see Filtering, Image Enhancement, Sharpening, Smoothing). Enhancement with the aim of getting rid of some degradation known a priori is called image restoration. One of the most important parts of practically any automated image recognition system is image segmentation, the classification of each image pixel into one of the constituent image parts. Signal or image processing methods can be executed either directly in the time or spatial domain, respectively, or one can first transform the signals into another domain (see Orthogonal Functions), perform the processing in the transform domain, and then perform the back transformation. Transformations of some input functions f(x,y,z;t) into some output functions g(x,y,z;t) can often be treated as linear shift-invariant systems, for which convolution is the standard operation. For many enhancement problems, non-linear methods like rank filters or morphological operations are indicated.
Digital signal or image processing has found many applications in today's commodity markets, and can be extremely compute-intensive. Much effort has gone into the methods, but also into the development of fast algorithms and into computer architectures (see Parallel Processing). Here is a choice of standard textbooks on signal processing: [Kunt80], [Rabiner75], [Oppenheim75]; on image processing: [Jain89], [Gonzalez87], [Pratt78], [Rosenfeld76]; on specialized hardware: [Kung88].
Image Recognition
The ultimate goal of most image processing is to extract information about some high-level and application-dependent objects from an image available in low-level (pixel) form. The objects may be of everyday interest as in robotics, cosmic ray showers or particle tracks as in physics, chromosomes as in biology, houses, roads, or differently used agricultural surfaces as in aerial photography or synthetic-aperture radar, etc. This task of pattern recognition is usually preceded by multiple steps of image restoration and enhancement, image segmentation, or feature extraction, steps which can be described in general terms. The final description in problem-dependent terms, and even more so the eventual image reconstruction, escapes such generality, and the literature of the application areas has to be consulted. [Jain89] deals with many problems in the most general possible way.
Image Restoration
Image restoration removes or minimizes some known degradations in an image. It can be seen as a special kind of image enhancement. The most common degradations have their origin in imperfections of the sensors, or in transmission. It is assumed that a mathematical model of the degradation process is known, or that it can be derived by an analysis of other input images. The degradation process is assumed to be linear and shift-invariant (see Linear Shift-invariant Systems):

g = h * f + n ,

where f is the original image, h a degradation function (the point spread function), n is some undesirable signal assumed additive (noise), g is the recorded image, and * stands for convolution. As an example, consider the deblurring of some text in the absence of noise n, by inverse filtering in the frequency domain.
We assume that squares in the recorded picture g(x,y) come from sharp points in the original picture f(x,y), and use this function h(x,y) as an estimate of the point spread function; G(u,v) and H(u,v) are the corresponding Fourier transforms. An estimate of the original function, f'(x,y), is then obtained by the inverse Fourier transform of

F'(u,v) = G(u,v) / H(u,v) ,

i.e. the restoration filter in this case was R(u,v) = 1 / H(u,v). This problem is in general ill-conditioned if H(u,v) is small or zero; often, one sets R(u,v) to zero at points where |H(u,v)| is small. The main problem with inverse filters is the amplification of noise. Different solutions have been suggested for optimal restoration of individual images. Wiener has derived an optimal solution in the statistical sense to the general problem: a restoration filter that minimizes the mean square error between the restored image and the original, with the transfer function

R(u,v) = H*(u,v) / ( |H(u,v)|^2 + S_n(u,v) / S_f(u,v) ) ,

where S_n and S_f are the power spectra of the noise and the signal, respectively. They have to be known a priori, which limits the filter's practical usefulness. Sometimes the noise can be assumed to be white noise, S_n = constant; the only thing one needs then is the signal power of a ``model''. If the noise power is zero, the Wiener filter just becomes a normal inverse filter. Sometimes the ratio of noise to signal power can be estimated by a constant K:

R(u,v) = H*(u,v) / ( |H(u,v)|^2 + K ) .

Of course there could be much better non-linear and/or shift-variant filters (see Image Enhancement, Morphological Operations, Rank Filters). More information can be found, e.g. in [Rosenfeld76] or [Jain89].
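A one-dimensional Python/numpy sketch of Wiener restoration with the constant-K form of the filter (illustrative; the signal, the Gaussian point spread function and K are invented for the example):

import numpy as np

rng = np.random.default_rng(7)
n = 256
x = np.zeros(n); x[60:80] = 1.0; x[150] = 2.0             # "original" one-dimensional signal f
h = np.exp(-0.5 * ((np.arange(n) - n // 2) / 4.0) ** 2)   # Gaussian point spread function
h /= h.sum()
g = np.real(np.fft.ifft(np.fft.fft(x) * np.fft.fft(np.fft.ifftshift(h))))
g += rng.normal(0.0, 0.01, n)                              # blurred and noisy recording

H = np.fft.fft(np.fft.ifftshift(h))
K = 1e-3                                                   # assumed noise-to-signal ratio
R = np.conj(H) / (np.abs(H) ** 2 + K)                      # Wiener restoration filter
f_est = np.real(np.fft.ifft(R * np.fft.fft(g)))
print(np.abs(g - x).mean(), np.abs(f_est - x).mean())      # compare the two errors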
Image Segmentation
Partitioning of an image into several constituent components is called segmentation. Segmentation is an important part of practically any automated image recognition system, because it is at this step that one extracts the interesting objects, for further processing such as description or recognition. Segmentation of an image is in practice the classification of each image pixel into one of the image parts. If the goal is to recognize black characters on a grey background, pixels can be classified as belonging to the background or as belonging to the characters: the image is composed of regions which are in only two distinct greyvalue ranges, dark text on lighter background. The greylevel histogram, viz. the probability distribution of the greyvalues, then has two separated peaks, i.e. is clearly bimodal. In such a case, the segmentation, i.e. the choice of a greylevel threshold to separate the peaks, is trivial. The same technique could be used if there were more than two clearly separated peaks. Unfortunately, signal and background peaks are usually not so ideally separated; a typical histogram may still be bimodal, but with peaks that are not separated, and the choice of the threshold is then problematic.
A variety of techniques for automatic threshold selection exists. A relatively successful method for certain applications is described in [Weszka79], where it is suggested that a modified histogram is employed by using only pixels with a small gradient magnitude, i.e. pixels which are not in the region of the boundaries between object and background. In many cases, segmentation on the basis of the greyvalue alone is not efficient. Other features like colour, texture, gradient magnitude or orientation, measure of a template match etc., can be put to use. This produces a mapping of a pixel into a point in an n-dimensional feature space, defined by the vector of its feature values. The problem is then reduced to partitioning the feature space into separate clusters, a general pattern recognition problem that is discussed in the literature. We want to illustrate this with the following example:
The two halves of the example image contain peaks of random height, but of different shape: in the bottom half, the peaks are steeper than in the top half. The greylevel histogram of this image is clearly not bimodal. We create two different morphological features and enter, for all pixels, the greyvalue of the first feature against that of the second into a two-dimensional histogram (``feature space''); in this representation it is easy to distinguish the two clusters. For other segmentation methods (edge detection, matched filtering, region analysis, etc.) we refer to standard textbooks: [Jain89], [Gonzalez87], [Pratt78], [Rosenfeld76]. Much of the work described in this paragraph was performed with, and programmed in the macro language of, the public-domain interactive image processing system [NIHimage96].
Importance Sampling
Also called biased sampling, this is one of the variance-reducing techniques in Monte Carlo methods. A key issue in order to achieve small errors on the obtained result (for a given number of samplings) is a suitable strategy of sampling the available multidimensional space. If the volume to be sampled is large, but is characterized by small probabilities over most parts, one achieves importance sampling by approximating the probability distribution by some function P(x), generating x randomly according to P, and weighting each result at the same time by f(x)/P(x). If a Monte Carlo calculation is visualized as a numerical integration in one dimension, say, importance sampling translates into a change of integration variable (interpret f(x) as a probability density function and use the transformation rule):

integral of f(x) dx = integral of [ f(x)/P(x) ] P(x) dx ,

where P(x) is a probability density function, and should be chosen to be close to f(x) in order to reduce the variance optimally. In more dimensions, one usually has to proceed one dimension at a time. For further discussion, see e.g. [Press95]. Note that the price to pay for obtaining better average values (for a given number of samplings) is a deterioration of fluctuations and correlations; the computed variables do not reproduce the model the same way a blind sampling (``analogue'' Monte Carlo) would achieve.
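A Python/numpy illustration (not part of the original entry): the integral of exp(-x) over [0,10] estimated by plain sampling and by importance sampling with P(x) proportional to exp(-x/2):

import numpy as np

rng = np.random.default_rng(5)
f = lambda x: np.exp(-x)
n = 10000

# plain ("analogue") Monte Carlo: x uniform on [0, 10]
x = rng.uniform(0.0, 10.0, n)
plain = 10.0 * f(x)

# importance sampling: draw x from P(x) = 0.5*exp(-x/2)/(1 - exp(-5)) on [0, 10]
u = rng.uniform(0.0, 1.0, n)
norm = 1.0 - np.exp(-5.0)
xs = -2.0 * np.log(1.0 - norm * u)                 # inverse of the cumulative of P
P = 0.5 * np.exp(-xs / 2.0) / norm
weighted = f(xs) / P

print(plain.mean(), plain.std() / np.sqrt(n))      # estimate and its error
print(weighted.mean(), weighted.std() / np.sqrt(n))  # same estimate, smaller error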
Interpolation
An unknown function f(x) may be known at a number of discrete points x_i, e.g. from measurements. If the function value is needed at other, arbitrary values x, an analytic representation of f(x) has to be found from the (x_i, f(x_i)) pairs such that f can be evaluated for any x. Usually, these methods work with some confidence only if x is enclosed by the minimum and maximum x_i (interpolation), and become risky if one leaves this range (extrapolation).

Interpolation often makes use of simple (viz. low-order) polynomials, which may extend over all x_i's (global), or are derived only for some group of contiguous x_i's (piecewise polynomials, so-called spline functions); also useful are rational functions (ratios of polynomials), or trigonometric functions. Each method makes different assumptions about the implied function smoothness, and has its pitfalls and advantages. For polynomial interpolation, see also Neville Algorithm; for rational function approximation, see Pade Approximation. A very practice-oriented discussion can be found in [Wong92] or, with program examples, in [Press95].

Interpolation in several dimensions is usually confined to a set of points on a regular mesh (not necessarily equidistant), e.g. a set of points x_1, ..., x_n and y_1, ..., y_m which span a grid, from which one could derive a two-dimensional analytical surface z = z(x,y) to interpolate on. Typically, it is not attempted to define an overall expression for this surface; instead, one solves only an interpolation problem for the desired points, by interpolating in one dimension (one variable) at a time: in our two-dimensional example, one constructs m interpolation polynomials along x to find a set of m values at the desired x, which then allow one to interpolate along y for the desired y. Again, [Press95] provides a detailed discussion.
Jackknife
The jackknife is a method in statistics allowing one to judge the uncertainties of estimators derived from small samples, without assumptions about the underlying probability distributions. The method consists of forming new samples by omitting, in turn, one of the observations of the original sample. For each of the samples thus generated, the estimator under study can be calculated, and the probability distribution thus obtained will allow one to draw conclusions about the estimator's sensitivity to individual observations. A competitive, perhaps more powerful method is the bootstrap; details can be found in [Efron79] or [Efron82].
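A short Python/numpy sketch (illustrative): the jackknife error of the sample mean, which for this particular estimator reproduces s/sqrt(n):

import numpy as np

rng = np.random.default_rng(2)
sample = rng.exponential(1.0, 20)                  # a small sample

def jackknife_error(data, estimator):
    n = len(data)
    # recompute the estimator n times, leaving out one observation in turn
    values = np.array([estimator(np.delete(data, i)) for i in range(n)])
    return np.sqrt((n - 1) / n * np.sum((values - values.mean()) ** 2))

print(jackknife_error(sample, np.mean))
print(sample.std(ddof=1) / np.sqrt(len(sample)))   # agrees for the mean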
Jacobi Determinant
Let f = f(x) = f(x_1, ..., x_n) be a function of n variables, and let u = u(x) be a function of x, where inversely x can be expressed as a function of u, x = x(u). The formula for a change of variable in an n-dimensional integral is then

integral over R of f(x) dx = integral over R' of f(x(u)) |det(dx/du)| du ,

where R is an integration region in x and R' the corresponding region in u (one integrates over all x in R, or equivalently, over all u in R'); dx/du is the Jacobi matrix, and |det(dx/du)| is the absolute value of the Jacobi determinant or Jacobian.

As an example, take n = 2 and define

u_1 = sqrt(-2 ln x_1) cos(2 pi x_2) ,   u_2 = sqrt(-2 ln x_1) sin(2 pi x_2) ,

or equivalently x_1 = exp(-(u_1^2 + u_2^2)/2), x_2 = arctan(u_2/u_1) / (2 pi). Then by the chain rule (see Jacobi Matrix) the Jacobi determinant is

det(dx/du) = -(1/2 pi) exp(-(u_1^2 + u_2^2)/2) ,

and its absolute value is the product of two standard normal densities, [1/sqrt(2 pi)] exp(-u_1^2/2) times [1/sqrt(2 pi)] exp(-u_2^2/2). This shows that if x_1 and x_2 are independent random variables with uniform distributions between 0 and 1, then u_1 and u_2 as defined above are independent random variables with standard normal distributions (see Transformation of Random Variables).
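A numerical check in Python/numpy (illustrative, and assuming the transformation reconstructed above, i.e. the usual Box-Muller prescription):

import numpy as np

rng = np.random.default_rng(4)
x1 = 1.0 - rng.random(100000)                # uniform on (0, 1], avoids log(0)
x2 = rng.random(100000)
u1 = np.sqrt(-2.0 * np.log(x1)) * np.cos(2.0 * np.pi * x2)
u2 = np.sqrt(-2.0 * np.log(x1)) * np.sin(2.0 * np.pi * x2)

print(u1.mean(), u1.std(), u2.mean(), u2.std())   # close to 0 and 1
print(np.corrcoef(u1, u2)[0, 1])                  # close to 0: independent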
Jacobi Iteration
See Linear Equations, Iterative Solutions.
Jacobi Matrix
A function f = f(x) of one variable is differentiable at x with derivative f'(x) if

f(x + h) = f(x) + f'(x) h + R(h) ,   with R(h)/h -> 0 as h -> 0.

This definition can be generalized to the case of m functions of n variables. Then x and h are n-vectors, f and R are m-vectors, and the derivative f'(x) becomes an (m,n) matrix, called the Jacobi matrix, whose elements are the partial derivatives

(f'(x))_ij = df_i / dx_j .

Other possible notations for f'(x) are df/dx or d(f_1, ..., f_m)/d(x_1, ..., x_n).

The chain rule is valid in its usual form: if g = g(f) is a (p-vector) function of f, and f = f(x), then

(g(f(x)))' = g'(f) f'(x) .

Note that this is a matrix product, and therefore non-commutative except in special cases. In terms of matrix elements,

dg_i/dx_j = sum over k of (dg_i/df_k)(df_k/dx_j) .
A coordinate transformation u = u(x) is an important special case, with p = n, and with u = u(x) the inverse transformation of x = x(u). That is, u = u(x) = u(x(u)), and by the chain rule

u'(x) x'(u) = I ,

i.e. the product of du/dx and dx/du is the unit matrix, or dx/du = (du/dx)^(-1).
Jacobian
See Jacobi Determinant.
Jacobian Peak
A peak in a probability distribution which can be understood as due to the variation of a Jacobi determinant. For example, if an invariant cross-section falls off exponentially with transverse momentum p_T, like exp(-a p_T), then the distribution of p_T, which acquires an additional factor p_T from the Jacobian of the phase space, will have a peak at p_T = 1/a.
Jitter
US slang for fluctuation (originally nervousness). Imprecise in its meaning with respect to well-defined statistical distributions, jitter is directly related to parameters like standard deviation or width.
Kalman Filter
Originally, Kalman filtering was designed as an optimal Bayesian technique to estimate state variables at a given time from indirect and noisy measurements available up to that time, assuming as known the statistical dependences between the variables and their evolution in time. Kalman filtering can also be used to estimate variables in a static (i.e. time-independent) system, if the mathematical model is suitably segmented. As such it is much used in sequential refinement of tomography images.

The model is defined recursively, step by step, by system equations

x_i = T_i x_(i-1) + c_i + w_i ,

with x_i the variables at step i, T_i (the state transition matrix) and c_i describing the linear relation with step i-1, and w_i some process noise with covariance matrix Q_i. The x_i are related to measurements m_i by another linear ansatz, the measurement equations

m_i = H_i x_i + c'_i + e_i ,

with the observational noise e_i (covariance matrix E_i), and H_i and c'_i describing the linear relation. T and H are assumed known (although possibly changing from step to step); the same goes for the statistical covariances Q and E. If we ignore the shifts c_i, c'_i, without losing generality, the prediction for step i is obtained by

x_i = T_i x_(i-1) + K_i ( m_i - H_i T_i x_(i-1) ) ,

where K_i, the Kalman gain matrix, is given by

K_i = C_i H_i^T ( E_i + H_i C_i H_i^T )^(-1) ,

with the step covariance matrix C_i defined by

C_i = T_i C_(i-1) T_i^T + Q_i .
http://rkb.home.cern.ch/rkb/AN16pp/node141.html (1 of 2)9/3/2006 14:17:40
Kalman Filter
It is easy to see that m - H x -1 is a residual, and the successive C are obtained by error propagation. C0 i
i i
i
is the zero matrix. For a more complete treatment, see [Haykin91]. The technique has been used ([Frühwirth97]) for recursive fitting of particle tracks, where multiple (Coulomb) interactions introduce small deflections and hence (non-Gaussian and) highly correlated noise. Both these deflections and the measurements are naturally discretized along the path. The track model is linearized from the equations of motion in a magnetic field, the variables x are track positions. [Frühwirth97] has also shown the robustness of the filter with respect to outliers, and discussed multiple parallel Kalman filters for obtaining a valid model for the non-linear case.
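A minimal numerical sketch of the recursion, for a one-dimensional random-walk state observed with Gaussian noise; the symbol names (T, H, Q, E, C, K) follow the text, but the concrete numbers and the scalar model are illustrative assumptions, not taken from the entry:

import numpy as np

rng = np.random.default_rng(1)

T, H = 1.0, 1.0          # state transition and measurement "matrices" (scalars here)
Q, E = 0.01, 0.25        # process and observational noise variances (assumed values)
x_est, C = 0.0, 1.0      # initial state estimate and its covariance

truth = np.cumsum(rng.normal(0.0, np.sqrt(Q), 100))      # simulated true states
meas = truth + rng.normal(0.0, np.sqrt(E), truth.size)   # noisy measurements

for m in meas:
    # prediction from the previous step
    x_pred = T * x_est
    C_pred = T * C * T + Q
    # gain and update using the residual m - H*x_pred
    K = C_pred * H / (H * C_pred * H + E)
    x_est = x_pred + K * (m - H * x_pred)
    C = (1.0 - K * H) * C_pred

print("last true state %.3f, filtered estimate %.3f" % (truth[-1], x_est))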
Karhunen-Loeve Transform: see Principal Component Analysis.
Kolmogorov Test A powerful test (also called Kolmogorov-Smirnov test) that a one-dimensional data sample is compatible with being a random sampling from a given distribution. It is also used to test whether two data samples are compatible with being random samplings of the same, unknown distribution. It is similar to the Cramer-Smirnov-Von-Mises test, but somewhat simpler. To compare a data sample consisting of N events whose cumulative distribution is S_N(x) with a hypothesis function whose cumulative distribution is F(x), the value D_N is calculated:
D_N = max_x | S_N(x) - F(x) |.
The confidence levels for some values of sqrt(N) D_N are (for N > 80):

conf.l.        10%    5%     1%
sqrt(N) D_N    1.22   1.36   1.63

To compare two experimental cumulative distributions S_N(x) containing N events and S_M(x) containing M events, calculate:
D = max_x | S_N(x) - S_M(x) |.
Then sqrt(NM/(N+M)) D is the test statistic for which the confidence levels are as in the above table. For more detail, see [Press95].
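For illustration, D_N for one sample against a hypothesis F(x) can be computed directly from the ordered sample; a sketch follows, in which the unit-exponential hypothesis and the sample size are arbitrary assumptions:

import numpy as np

def kolmogorov_D(sample, F):
    """Maximum distance between the empirical CDF of `sample` and the hypothesis CDF F."""
    x = np.sort(np.asarray(sample, dtype=float))
    n = x.size
    cdf = F(x)
    # the empirical CDF jumps at each ordered point: compare both sides of the step
    d_plus = np.max(np.arange(1, n + 1) / n - cdf)
    d_minus = np.max(cdf - np.arange(0, n) / n)
    return max(d_plus, d_minus)

rng = np.random.default_rng(2)
data = rng.exponential(1.0, 200)
D = kolmogorov_D(data, lambda x: 1.0 - np.exp(-x))   # hypothesis: unit exponential
print("D_N = %.4f, sqrt(N)*D_N = %.3f" % (D, np.sqrt(data.size) * D))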
Korobov Sequences Finite sequences of quasirandom numbers for use in multidimensional integration by quasi-Monte Carlo methods. The kth point (vector) in the p-dimensional space is given by
x_k = ( {k a_1/N}, {k a_2/N}, ..., {k a_p/N} ),  k = 1, ..., N,
where {.} denotes the fractional part. The constants a_i are carefully chosen to minimize the non-uniformity of the distribution for given p and N. A general discussion can be found in [Zakrzewska78].
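A sketch of a lattice-rule generator of this type, assuming the form given above; the constants used below are arbitrary illustrations, not the carefully optimized values the entry alludes to:

import numpy as np

def korobov_points(N, a):
    """N points in [0,1)^p from the lattice rule x_k = frac(k * a / N), k = 1..N."""
    a = np.asarray(a, dtype=float)
    k = np.arange(1, N + 1)[:, None]
    return np.mod(k * a / N, 1.0)

pts = korobov_points(1024, [1, 137, 229])   # p = 3; constants chosen for illustration only
# quasi-Monte Carlo estimate of the integral of x*y*z over the unit cube (exact value 1/8)
print(np.mean(pts[:, 0] * pts[:, 1] * pts[:, 2]))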
Kronecker Delta The Kronecker delta δ_ij is defined as having the value one when i = j, and zero when i ≠ j (i and j are integers).
Kurtosis: see Curtosis.
Lagrange Multipliers Let f = f(x_1, ..., x_n) be a function of n variables x_1, ..., x_n. If the n variables are independent, then a local minimum of f can be found by solving the n equations
∂f/∂x_i = 0,  i = 1, ..., n.
In general, these equations define an extremum of f (a minimum, maximum or saddle point). If the n variables are not independent, but satisfy m constraint equations
g_k(x) = 0,  k = 1, ..., m,
the gradient of f need not vanish at the extremum, it need only be orthogonal to the (n-m)-dimensional surface described by the constraint equations. That is,
∂f/∂x_i + Σ_k λ_k ∂g_k/∂x_i = 0,  i = 1, ..., n,
or in matrix notation ∇f + G^T λ = 0, where G is the Jacobi matrix of the constraints and the coefficients λ_k are called Lagrange multipliers. The above equations, together with the constraint equations g(x) = 0, may be written as one system of n+m equations for the n+m unknowns x and λ. A useful method for solving these equations is the Newton-Raphson method. We stick to matrix notation
and let A denote the Hessian of f and G = ∂g/∂x ( Hessian, Jacobi Matrix). Assuming that x is an approximation to the required solution, a better approximation x + Δx is calculated. This procedure is iterated until some convergence criterion is satisfied, e.g. until the equations are satisfied to a given precision, or until the step Δx is ``sufficiently small''. For the unconstrained minimization problem, the Newton-Raphson formula is
Δx^u = -A^(-1) ∇f.
For the constrained problem, the Newton-Raphson formula acquires an additional term involving the constraints and the Lagrange multipliers; the superscripts `u' or `c' stand for ``unconstrained'' or ``constrained''. Apart from the additional term in the formula for Δx^c, there is no change in the procedure.
Note in particular that the first guess for the solution may well violate the constraint equations, since these equations are solved during the iteration procedure. Note also that, if efficiency is essential, Δx^u and Δx^c can be calculated without explicit inversions of the matrices involved. For example, the matrix product A^(-1) ∇f should be calculated by solution of the linear equation A z = ∇f, not by calculation of A^(-1). The formulae given here for Δx^u and Δx^c are only valid if the matrix A has an inverse. If A is singular, then one must solve the linear equations for Δx and λ together.
The Lagrange multiplier method is in general very easy to use. It may, however, be more sensitive to rounding errors and also more time-consuming than the elimination method ( Constraints), in which the constraint equations are solved explicitly. However, an explicit solution is frequently not possible.
See also Minimization.
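As a small worked illustration (an example constructed here, not taken from the text): minimizing f(x) = x_1² + x_2² subject to the single linear constraint x_1 + x_2 - 1 = 0 gives the system ∇f + λ∇g = 0, g = 0, which for this quadratic case is linear and is solved in one Newton-Raphson step:

import numpy as np

# minimize f(x) = x1^2 + x2^2  subject to  g(x) = x1 + x2 - 1 = 0
# stationarity: 2*x1 + lam = 0, 2*x2 + lam = 0; constraint: x1 + x2 = 1
A = np.array([[2.0, 0.0, 1.0],
              [0.0, 2.0, 1.0],
              [1.0, 1.0, 0.0]])
b = np.array([0.0, 0.0, 1.0])
x1, x2, lam = np.linalg.solve(A, b)
print("constrained minimum at (%.3f, %.3f), Lagrange multiplier %.3f" % (x1, x2, lam))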
Landau Distribution The fluctuations of energy loss by ionization of a charged particle in a thin layer of matter were first described theoretically by Landau [Landau44]. They give rise to a probability density function characterized by a narrow peak with a long tail towards positive values. The mathematical definition of the probability density function is
φ(λ) = (1/π) ∫_0^∞ exp(-t ln t - λt) sin(πt) dt,
where λ is a dimensionless number and is proportional to the energy loss. Other expressions and formulae, e.g. for the distribution function, its derivative, and the first two moments, have been given in the literature. [Moyal55] gives a closed analytic form
φ_M(λ) = (1/sqrt(2π)) exp( -(λ + exp(-λ))/2 ),
with λ a suitably scaled energy-loss variable. This curve reproduces the gross features of the Landau distribution but is otherwise unrelated to φ(λ) as defined above, and is a poor approximation in the long tails.
Laplace Transform The Laplace transform is an integral transform which has the property of translating certain complicated operations (e.g. the differentiation of a function or the convolution of two functions) into simple algebraic operations in the image (Laplace) space. It can therefore be used to transform certain types of functional equations into algebraic equations. A special case of the Laplace transform is the Fourier transform. The one-sided Laplace transform of a function F is defined by
f(s) = ∫_0^∞ exp(-st) F(t) dt,
where s is a complex parameter. If the integral converges for s = a, a real, the Laplace transform f(s) exists for all s with Re(s) ≥ a. The two-sided Laplace transform is defined by the same formula, with the integral extending from -∞ to +∞.
Under appropriate assumptions, the original function F is obtained from the ``image'' function f by the inversion formula
F(t) = (1/(2πi)) ∫_(x-i∞)^(x+i∞) exp(st) f(s) ds,
where x is any real number for which the integration path lies in the region of convergence (x > a for the one-sided transform above).
For practical use, refer to modern packages (e.g. Mathematica, [Wolfram91]).
Least Squares The general problem to be solved by the least squares method is this: given some direct measurements y of random variables, and knowing a set of equations f which have to be satisfied by these measurements, possibly involving unknown parameters x, find the set of x which comes closest to satisfying
f(y, x) = 0,
where ``closest'' is defined by a sum of squares of residuals r that is a minimum. The sum of squares of elements of a vector can be written in different ways:
Σ_i r_i² = r^T r = |r|² = ||r||².
The assumption has been made here that the elements of y are statistically uncorrelated and have equal variance. For this case, the above solution results in the most efficient estimators for x (viz. with minimum variance, Estimator). If the y are correlated, correlations and variances are defined by a covariance matrix C ( Covariance), and the above minimum condition becomes
r^T C^(-1) r = minimum.
Least squares solutions can be more or less simple, depending on the constraint equations f. If there is exactly one equation for each measurement, and the functions f are linear in the elements of y and x, the solution is discussed under linear regression. For other linear models, Least Squares, Linear. Least squares methods applied to few parameters can lend themselves to very efficient algorithms (e.g. in realtime image processing), as they reduce to simple matrix operations. If the constraint equations are non-linear, one typically solves by linearization and in iterations, using the approximate values of x in every step, and linearizing by forming the matrix of derivatives, the Jacobi matrix ∂f/∂x, possibly also ∂f/∂y, at the last point of approximation. Note that as the iterative improvements Δx tend towards zero (if the process converges), the residuals converge towards final values which enter the minimum equation above. Algorithms avoiding the explicit calculation of the derivative matrices have also been investigated, e.g. [Ralston78b]; for a discussion, see [Press95]. Where convergence (or control over convergence) is problematic, use of a general package for minimization may be indicated.
Least Squares, Linear Let A be an (m,n) matrix with m > n and b an (m,1) matrix (vector). We want to consider the problem
Ax ≈ b,
where ≈ stands for the best approximate solution in the least squares sense, i.e. we want to minimize the Euclidean norm of the residual r = Ax - b:
||r|| = ||Ax - b|| = min.
We want to find the vector x which is closest to b in the column space of A. Among the different methods to solve this problem, we mention Normal Equations, sometimes ill-conditioned, QR Decomposition, and, most generally, Singular Value Decomposition. For further reading, see e.g. [Golub89], [Branham90], [Wong92], [Press95].
Example: Let us consider the problem of finding the closest point (vertex) to measurements on straight lines (e.g. trajectories emanating from a particle collision). This problem can be described by Ax = b. This is clearly an inconsistent system of linear equations, with more equations than unknowns, a frequently occurring problem in experimental data analysis. The system is, however, not very inconsistent and there is a point that lies ``nearly'' on all straight lines. The solution can be found with the linear least squares method, e.g. by QR decomposition for solving Ax ≈ b.
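The matrices of the vertex example are not reproduced in the text, so the sketch below solves a generic overdetermined system Ax ≈ b by QR decomposition, one of the methods mentioned above; the numbers are arbitrary assumptions:

import numpy as np

# an illustrative overdetermined system (4 equations, 2 unknowns), not the one from the text
A = np.array([[1.0, 1.0],
              [1.0, 2.0],
              [1.0, 3.0],
              [1.0, 4.0]])
b = np.array([1.1, 1.9, 3.2, 3.9])

Q, R = np.linalg.qr(A)                 # A = QR with Q orthogonal, R upper triangular
x = np.linalg.solve(R, Q.T @ b)        # solve R x = Q^T b
print("least squares solution:", x)
print("residual norm:", np.linalg.norm(A @ x - b))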
Left-handed Coordinate System: see Right-handed Coordinate System.
Likelihood: see Maximum Likelihood Method.
Linear Algebra Packages One of the most complete linear algebra subroutine packages available is LAPACK (see [Anderson92]), which supersedes the packages LINPACK (see [Dongarra79]) and EISPACK (see [Smith76]). Linear algebra packages are also part of popular commercial interactive software packages like Matlab (see [MATLAB97], [Lindfield95]) or Mathematica (see [Wolfram91]). Both are language-based and cater for numerical computation and visualization, for scientific and engineering applications. Other packages exist, commercial or public-domain; some of them most likely have equal merit to those we happen to mention. For more reading on linear algebra, see [Flowers95].
Linear Equations A system of m equations in n unknowns x_1, ..., x_n can be written in matrix notation as
Ax = b,
with the coefficient matrix A = (a_ij), i = 1, ..., m, j = 1, ..., n. We have to distinguish the following three cases: overdetermined (m > n), exactly determined (m = n), underdetermined (m < n).
Such systems can be consistent and have solutions, or inconsistent and then have no solution, as shown in the following figure representing three equations with three unknowns (three planes). The two systems in the top row are consistent: system a has one point as the unique solution and system b has all points of a straight line as solutions. The three systems in the bottom row (c, d, e) are inconsistent and have no solution.
An underdetermined system (m < n) does not have a unique solution; it can be consistent with infinitely many solutions or inconsistent, with no solution, as can be seen in the above picture. If one removes in the above picture one plane, then all systems will have infinitely many solutions except for system c, which has no solution. In the case of more equations than unknowns (m > n) the system is usually inconsistent and does not have any solution. Adding more planes to the plots in the above picture could leave the systems a and b consistent only if they pass exactly through the intersecting point or line. In some inconsistent (overdetermined) cases, approximate solutions can be found, if additional criteria are introduced ( Fitting).
To solve Ax = b, one can choose between many different methods depending on A. If A is
●  upper (lower) triangular: backward (forward) substitution;
●  symmetric and positive definite: Cholesky Decomposition;
●  not triangular: Gaussian Elimination;
●  square, with many right-hand sides: LU Decomposition;
●  non-square: QR Decomposition;
●  any matrix (e.g. ill-conditioned): Singular Value Decomposition.
The computing time increases in the above order. The advantage of orthogonalization methods (QR and SVD) is that they can be applied to all systems, producing stable solutions without accumulation of rounding errors (see [Golub89]).
Linear Equations, Iterative Solutions For certain types of systems of linear equations Ax = b, methods like Gaussian elimination can become inefficient, e.g. if A is sparse and large. In such cases, iterative methods are preferable. They converge if certain conditions are fulfilled, e.g. if A is diagonally dominant (see [Golub89]):
|a_ii| > Σ_(j≠i) |a_ij|  for all i.
In this case, Ax = b can be rewritten in the form
x_i = ( b_i - Σ_(j≠i) a_ij x_j ) / a_ii,
where each line solves separately for the x_i appearing with the diagonal element of A. Any iterative scheme needs an initial guess x^(0), whose quality determines the possibility or the speed of convergence.
We obtain the (k+1)st iteration x^(k+1) if we substitute the kth iteration x^(k) into the right-hand side. If we compute all the new values on the left side with all the old values on the right side, we obtain the Jacobi iteration:
x_i^(k+1) = ( b_i - Σ_(j≠i) a_ij x_j^(k) ) / a_ii.
If we successively use new values of x_i as soon as they are computed, we get the Gauss-Seidel iteration:
x_i^(k+1) = ( b_i - Σ_(j<i) a_ij x_j^(k+1) - Σ_(j>i) a_ij x_j^(k) ) / a_ii.
A variant of this algorithm is the method of Successive Over-Relaxation:
x_i^(k+1) = (1 - ω) x_i^(k) + ω ( b_i - Σ_(j<i) a_ij x_j^(k+1) - Σ_(j>i) a_ij x_j^(k) ) / a_ii,
where the over-relaxation parameter ω satisfies 0 < ω < 2. For how to determine ω, see [Golub89] or [Young71].
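A minimal sketch of the first two iterations for a small diagonally dominant system (the matrix and the fixed iteration count are arbitrary assumptions):

import numpy as np

A = np.array([[4.0, 1.0, 1.0],
              [1.0, 5.0, 2.0],
              [0.0, 2.0, 6.0]])   # diagonally dominant example matrix
b = np.array([6.0, 8.0, 8.0])

def jacobi(A, b, iters=50):
    x = np.zeros_like(b)
    D = np.diag(A)
    for _ in range(iters):
        x = (b - (A @ x - D * x)) / D      # all old values on the right-hand side
    return x

def gauss_seidel(A, b, iters=50):
    x = np.zeros_like(b)
    for _ in range(iters):
        for i in range(b.size):            # new values used as soon as available
            x[i] = (b[i] - A[i, :i] @ x[:i] - A[i, i+1:] @ x[i+1:]) / A[i, i]
    return x

print(jacobi(A, b), gauss_seidel(A, b), np.linalg.solve(A, b))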
Linear Programming Linear programming is the mathematical name for methods used in optimizing systems, e.g. project resources. Problems in this domain commonly reduce to a set of m linear equations for n unknowns (with m < n), with additional constraints, such that all components of the solution must be non-negative and some linear combination of the unknowns (the objective function) must be minimized. Applications are frequent in industry and business, or in project management. General optimization procedures like the simplex method used (in variations) in minimization programs have originally been derived in the context of linear programming applications. For more details, see [Branham90] or [Press95].
Linear Regression Linear regression is a special case of the least squares method. In its simplest case, regression corresponds to a straight line fitted to measurements all characterized by the same variance ( also Fitting). Assume n measurements y_i for a function f depending linearly on error-free variables x_i,
f(x) = a_0 + a_1 x,
and assume the y_i without bias [i.e. E(y_i) = f(x_i)] and of variance σ², without correlation. The least squares estimators for a_0, a_1 are then given by
a_1 = Σ_i (x_i - x̄)(y_i - ȳ) / Σ_i (x_i - x̄)²,   a_0 = ȳ - a_1 x̄,
where x̄ and ȳ are the sample means of the x_i and y_i. The covariance matrix for these estimators is given by
V(a_0) = σ² Σ_i x_i² / ( n Σ_i (x_i - x̄)² ),   V(a_1) = σ² / Σ_i (x_i - x̄)²,   cov(a_0, a_1) = -σ² x̄ / Σ_i (x_i - x̄)².
The measurements y_i differ from the fitted f(x_i) on the regression line by the residuals
r_i = y_i - a_0 - a_1 x_i.
If σ² is not known, it can be set to 1 for obtaining a_0 and a_1 (the result is independent of scale factors), and subsequently estimated from the residuals by
σ̂² = Σ_i r_i² / (n - 2).
The generalization to a linear model with more than two coefficients, e.g. the polynomial ansatz
f(x) = a_0 + a_1 x + a_2 x² + ... + a_p x^p,
is called regression of the pth order. Note that for higher orders this parameterization can lead to instabilities and results can be difficult to interpret; orthogonal polynomials should be introduced instead. For confidence limits in linear regression or for a comparison of different regression lines, see [Brandt83].
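A direct transcription of the straight-line estimators above, as a minimal sketch with synthetic data (the slope, intercept and noise level are arbitrary assumptions):

import numpy as np

rng = np.random.default_rng(3)
x = np.linspace(0.0, 10.0, 20)
y = 1.5 + 0.8 * x + rng.normal(0.0, 0.3, x.size)   # synthetic measurements

xbar, ybar = x.mean(), y.mean()
a1 = np.sum((x - xbar) * (y - ybar)) / np.sum((x - xbar) ** 2)
a0 = ybar - a1 * xbar

r = y - (a0 + a1 * x)                      # residuals
sigma2 = np.sum(r ** 2) / (x.size - 2)     # estimate of the measurement variance
print("a0 = %.3f, a1 = %.3f, sigma^2 = %.3f" % (a0, a1, sigma2))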
Linear Shift-invariant Systems Electrical networks or optical imaging systems transform their input (e.g. voltages or light intensities) as a function of time and/or of space. In general such one- or more-dimensional transformations S map some input functions f(x,y,z;t) into some output functions g(x,y,z;t):
g = S[f].
The transformation is called a linear system L if the following equation holds for all functions f_1, f_2 and any constants a and b:
L[a f_1 + b f_2] = a L[f_1] + b L[f_2],
i.e. an arbitrary function that can be expressed as a sum of several elementary excitations will be transformed by a linear system as the superposition of the outputs of these excitations. In general:
L[ Σ_i a_i f_i ] = Σ_i a_i L[f_i].
L is called shift-invariant if and only if a shift (translation) of the input causes the same shift of the output: if g(x) = L[f(x)], then g(x - x_0) = L[f(x - x_0)].
Electrical networks or optical systems are usually treated as time- and space-invariant, respectively. To simplify the notation and to derive the computational aspects, we choose a one-dimensional discrete system. With the discrete unit impulse d(k), which is one for k = 0 and zero otherwise,
we can write the identity:
f(k) = Σ_i f(i) d(k - i).
Application of the linear operator L produces:
g(k) = L[f(k)] = Σ_i f(i) L[d(k - i)] = Σ_i f(i) h(k; i),
which is the superposition sum of the shift-varying impulse response h(k;i). If L is shift-invariant, i.e. h(k-i) = L[d(k-i)], the equation can be written in the form of a convolution
g(k) = Σ_i f(i) h(k - i),
or abbreviated: g = f * h.
The impulse response h is called the point spread function in the two-dimensional case. If F, G and H are the Fourier transforms of f, g and h, respectively, then
G = F · H,
with H the frequency response or transfer function of the linear shift-invariant system L. For more details and more references, see e.g. [Kunt80] or [Goodman68].
LU Decomposition Any non-singular matrix A can be expressed as a product A = LU; there exist exactly one lower triangular matrix L (with unit diagonal) and exactly one upper triangular matrix U such that A = LU, if row exchanges (partial pivoting) are not necessary. With pivoting, we have to introduce a permutation matrix P, P being an identity matrix with interchanged (swapped) rows. Instead of A one then decomposes PA:
PA = LU.
The LU decomposition can be performed in a way similar to Gaussian elimination. LU decomposition is useful, e.g. for the solution of the exactly determined system of linear equations Ax = b, when there is more than one right-hand side b. With A = LU the system becomes
LUx = b, or Lc = b with Ux = c;
c can be computed by forward substitution and x by back substitution (see [Golub89]).
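A sketch of the forward and back substitution steps, with the factorization itself taken, for illustration only, from a simple Doolittle elimination without pivoting (well-conditioned input assumed):

import numpy as np

def lu_nopivot(A):
    """Doolittle LU factorization without pivoting (illustration only)."""
    A = np.array(A, dtype=float)
    n = A.shape[0]
    L, U = np.eye(n), np.zeros((n, n))
    for k in range(n):
        U[k, k:] = A[k, k:] - L[k, :k] @ U[:k, k:]
        L[k+1:, k] = (A[k+1:, k] - L[k+1:, :k] @ U[:k, k]) / U[k, k]
    return L, U

def solve_lu(L, U, b):
    n = b.size
    c, x = np.zeros(n), np.zeros(n)
    for i in range(n):                       # forward substitution  L c = b
        c[i] = b[i] - L[i, :i] @ c[:i]
    for i in range(n - 1, -1, -1):           # back substitution     U x = c
        x[i] = (c[i] - U[i, i+1:] @ x[i+1:]) / U[i, i]
    return x

A = np.array([[4.0, 3.0], [6.0, 3.0]])
b = np.array([10.0, 12.0])
L, U = lu_nopivot(A)
print(solve_lu(L, U, b), np.linalg.solve(A, b))   # both give the same solution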
Marginal Distribution Given n random variables X_1, ..., X_n with joint probability density function g(x_1, ..., x_n), the marginal distribution of x_r is obtained by integrating the joint probability density over all variables but x_r:
g_r(x_r) = ∫ ... ∫ g(x_1, ..., x_n) dx_1 ... dx_(r-1) dx_(r+1) ... dx_n.
It can be interpreted as a probability density of the single variable X_r.
The joint marginal distribution of several variables X_1, ..., X_l is obtained by integrating over the remaining variables x_(l+1), x_(l+2), ..., x_n. The variables X_1, ..., X_n are independent if and only if g can be factorized:
g(x_1, ..., x_n) = g_1(x_1) g_2(x_2) ... g_n(x_n).
Markov Chain A Markov chain is a succession of elements each of which can be generated from a finite (usually small) number of elements preceding it, possibly with some random element added. One can talk about a Markov process of nth order, in which a memory of n elements fully describes the relevant history and the future behaviour of the process. Markov chain Monte Carlo methods can be used in importance sampling, when in generating each point not only random numbers are used, but the previously generated point(s) enter with some weight, in the simplest case by a random walk, where x_(i+1) = x_i + r, with r a random vector. The random perturbations used in simulated annealing are another example.
Matrix Operations An (m,n) matrix A is a rectangular array of real numbers with m rows and n columns, A = (a_ij) with a_ij in R, where R is the set of real numbers. Most laws of ordinary algebra can be extended to these mathematical objects in a natural way. The sizes of the operands have to agree, of course, depending on the operation. Addition C = A + B is defined elementwise by c_ij = a_ij + b_ij, multiplication with a scalar B = cA by b_ij = c a_ij, and matrix-matrix multiplication C = AB by
c_ij = Σ_k a_ik b_kj.
In general, AB ≠ BA; matrices are said to commute if AB = BA. Multiplication is associative: (AB)C = A(BC), left distributive: C(A+B) = CA + CB, and right distributive: (A+B)C = AC + BC. The transpose matrix A^T is the matrix (a_ji), and (AB)^T = B^T A^T. A matrix is symmetric if A^T = A. A vector (or column vector) is an (n,1) matrix (a matrix with only 1 column). The row vector, an (1,n) matrix, is obtained by transposition: u^T. The inner (dot, scalar) product s of 2 vectors u and v is a scalar, and defined as:
s = u^T v = Σ_i u_i v_i.
The outer product O of 2 vectors u and v is a matrix, and defined as o_ij = u_i v_j. A set of r vectors u_1, ..., u_r is called linearly independent if and only if the only solution to c_1 u_1 + ... + c_r u_r = 0 is c_1 = ... = c_r = 0.
Matrix notation is particularly useful for the description of linear equations. A matrix A is positive definite if and only if it is symmetric and the quadratic form x^T A x is positive for all non-zero vectors x.
A square matrix has an inverse if and only if a matrix A^(-1) exists with AA^(-1) = A^(-1)A = I, with I the identity matrix. (AB)^(-1) = B^(-1)A^(-1). In general, the inverse A^(-1) need not exist for A ≠ 0, unlike in ordinary algebra, where a^(-1) always exists if a ≠ 0. Usually an inverse is not computed explicitly, even if the notation suggests so: if one finds an inverse in a formula like x = A^(-1) b, one should think in terms of computing the solution of linear equations. The pseudoinverse A^+ is a generalization of the inverse and exists for any (m,n) matrix. A matrix Q is orthogonal if Q^T Q = I, i.e. Q^T = Q^(-1). The norm of a vector u is defined as the Euclidean length:
||u|| = (u^T u)^(1/2) = ( Σ_i u_i² )^(1/2).
The span of a set of vectors is the set of all their linear combinations. The range of A or column space is the span of the column vectors of A. The span of the row vectors is called the row space (= range of A^T). The set of vectors x with Ax = 0 is called the null-space. The rank of A [rank(A)] is the dimension of the column (or row) space. The nullity of A [nullity(A)] is the dimension of the null-space. For more details, see [Golub89].
Matrix Operations, Complex Most of the discussion in this book concentrates on matrices whose elements are real numbers, these being relevant for most applications. However, most of what is described works equally well for complex and real elements, if one observes the following formal changes:
●  the transpose A^T becomes the conjugate (Hermitian) transpose A^H;
●  the inner product x^T y becomes x^H y;
●  orthogonality x^T y = 0 is written x^H y = 0;
●  the length ||x|| = (x^T x)^(1/2) becomes (x^H x)^(1/2);
●  a symmetric matrix A^T = A becomes Hermitian A^H = A;
●  an orthogonal matrix Q^T Q = I becomes unitary U^H U = I or U^H = U^(-1); hence (Qx)^T (Qy) = x^T y becomes (Ux)^H (Uy) = x^H y, and ||Qx|| = ||x|| remains ||Ux|| = ||x||.
For further reading, see [Strang88].
Maximum Likelihood Method If measurements y have been performed, and p(y|x) is the normalized (∫ p(y|x) dy = 1) probability density of y as a function of parameters x, then the parameters x can be estimated by maximizing the joint probability density for the m measurements y_j (assumed to be independent),
L = Π_(j=1..m) p(y_j | x).
L is called the likelihood function. L is a measure for the probability of observing the particular sample y at hand, given x. Maximizing L by varying x amounts to interpreting L as a function of x, given the measurements y. If p(y|x) is a normal distribution, and if its variance is independent of the parameters x, then the maximum-likelihood method is identical to the least squares method. The general problem is often solved numerically by minimization of -ln L (see [Blobel84], [Press95], [Bishop95]).
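A minimal numerical sketch of the procedure, estimating the mean lifetime of an exponential distribution by scanning -ln L over a grid (the data and the scan range are illustrative assumptions):

import numpy as np

rng = np.random.default_rng(4)
y = rng.exponential(2.0, 500)               # measurements with true mean lifetime 2.0

def neg_log_likelihood(tau, y):
    # p(y|tau) = exp(-y/tau)/tau  =>  -ln L = N ln(tau) + sum(y)/tau
    return y.size * np.log(tau) + np.sum(y) / tau

taus = np.linspace(0.5, 5.0, 1000)
nll = np.array([neg_log_likelihood(t, y) for t in taus])
print("ML estimate %.3f (analytic answer: the sample mean %.3f)"
      % (taus[np.argmin(nll)], y.mean()))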
Mean Given n quantities a_i one defines the (arithmetic) mean
ā = (1/n) Σ_(i=1..n) a_i.
If all a_i are positive, ā is never smaller than the geometric mean. For further details, see Weighted Mean, Expectation Value, Median, Mode.
Median The median of the distribution of a random variable X is defined as the quantile x_(1/2), i.e. the probability of observing X < x_(1/2) is the same as that of observing X > x_(1/2); in an ordered sample, as many points lie to the left as to the right of the median.
Median Filter Median filtering is a non-linear signal enhancement technique ( Image Enhancement) for the smoothing of signals, the suppression of impulse noise, and the preservation of edges. In the one-dimensional case it consists of sliding a window of an odd number of elements along the signal, replacing the centre sample by the median of the samples in the window. In the following picture we use window sizes of 3 and 5 samples. The first two columns show a step function, degraded by some random noise. The last two columns show a noisy straight line, and in addition one or two samples which differ considerably from the neighbouring samples.
Whereas the median filter in the first column preserves the edge very well, the low-pass filtering method in the second column smoothes the edge completely. Columns 3 and 4 show the importance of the window size: one sample out of range can be easily removed with a window size of 3, whereas two neighbouring samples can only be removed with a larger window. For more details, see [Pratt78].
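A one-dimensional sliding-window median as described above, as a minimal sketch (odd window length assumed; at the edges the window is simply clipped):

import numpy as np

def median_filter_1d(signal, window=3):
    """Replace each sample by the median of an odd-length window centred on it."""
    half = window // 2
    s = np.asarray(signal, dtype=float)
    out = np.empty_like(s)
    for i in range(s.size):
        lo, hi = max(0, i - half), min(s.size, i + half + 1)
        out[i] = np.median(s[lo:hi])
    return out

x = np.array([1.0, 1.1, 0.9, 9.0, 1.0, 1.2, 0.8, 1.1])   # one outlier at index 3
print(median_filter_1d(x, 3))   # the spike is removed, the baseline is preserved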
Metric A metric or distance function is a function d(p,q) of two points p and q which satisfies:
d(p,q) ≥ 0, with d(p,q) = 0 if and only if p = q;
d(p,q) = d(q,p);
d(p,q) ≤ d(p,r) + d(r,q).
Frequently used examples are:
The Euclidean distance: in two dimensions,
d_E(p,q) = ( (p_1 - q_1)² + (p_2 - q_2)² )^(1/2).
In a digital image, the elements of p and q are row and column numbers. Generalized to any number of elements in p and q, one can write
d_E(p,q) = ( Σ_i (p_i - q_i)² )^(1/2).
Points with equal d_E from p form a circle (sphere, hypersphere) of radius d_E around p.
The city block distance: in two dimensions,
d_4(p,q) = |p_1 - q_1| + |p_2 - q_2|,
with obvious generalization to more dimensions. Points (pixels in an image) with equal d_4 from p form a diamond around p. Points with d_4 = 1 from p are called the 4-connected neighbours of p.
The chess board distance: in two dimensions,
d_8(p,q) = max( |p_1 - q_1|, |p_2 - q_2| ).
Points with equal d_8 from p form a square around p. Points (pixels in an image) with d_8 = 1 from p are called the 8-connected neighbours of p. See e.g. [Rosenfeld76]. A metric can also be defined in a binary space, e.g. as the distance between two bit patterns ( Hamming Distance).
Metropolis Algorithm The first documented introduction of stochastic principles in numerical calculations (see [Metropolis53]). Concepts like simulated annealing in optimization problems or importance sampling in Monte Carlo calculations are derived from the principles of this algorithm.
MFLOPS: see Benchmarking.
Minimax Approximation: see Chebyshev Norm.
Minimization Minimization problems arise in many contexts, usually in connection with optimization: a mathematical model describes phenomena as functions of variable parameters x, and a single measure of quality F(x), the objective function, is defined, whose maximum (or the minimum of its negative or inverse) corresponds to the optimal solution. Frequently, the optimum is constrained by additional equations (or inequalities) that have to be satisfied. Many different methods exist for solving minimization problems of various kinds, and program libraries or commercial mathematical packages contain a choice of them (e.g. [Wolfram91]). None of them is universally applicable, although some are robust for many problems, e.g. the (downhill) simplex method; usually these are not efficient in the use of computer resources (often, however, this is not an issue). A good introduction to the various classes of solutions is given in [Press95], many with implemented programs. Here are some common and useful concepts encountered in minimization:
●  Programs have typically no problem in finding local minima, be it by frequent function evaluation or by the use of derivatives. To find a global minimum, instead, particularly if a function is discontinuous (e.g. narrow spikes), needs a suitable way of finding starting points, and is a problem that escapes a general definition. Typically, programs require guidance for a global minimum, e.g. the limits of the explored volume, a step size, or a choice between general search methods for starting points like a grid or random numbers.
●  If one views the function to be minimized as a (hyper-)surface, its behaviour around the minimum determines the success of different methods. In many problems, the coordinates along which programs search for minima are correlated, and the function forms a ``long narrow valley'' at some angle with the axes. The effect is that along all coordinate axes one gets ``off the valley floor'', i.e. to higher function values, and the true minimum is difficult to find. Clever algorithms do find these correlations, and determine with fewer steps a more correct minimum.
●  Many methods consist of reducing the multidimensional minimization problem to a succession of one-dimensional minimization problems, so that a fast minimum finder along a line (univariate minimization) is a desirable building block, e.g. parabolic interpolation or Brent's Method.
●  When differentiation is possible, what is needed is the gradient vector ∇F; in some methods, the Hessian matrix H is computed to decide about the direction of steepest descent. Mathematically, it is conditions on ∇F and H that define a minimum.
●  The maximum likelihood method is a special case of minimization, in which the objective function F(x) is derived from L(x), the joint probability distribution of all measured values, assumed independent. If one makes the assumption of a large number of measurements, the likelihood function has a Gaussian probability density with respect to the parameters x, and the Hessian of F(x) is the inverse of the covariance matrix of the parameters x, a useful way of estimating the quality of the result.
●  If the number of parameters is very large, and the number of possible discrete solutions is given by permutations, i.e. increases factorially, standard methods of minimization are usually impractical due to computer limitations. Often this is referred to as the ``travelling salesman problem''. A different class of heuristic solutions is available for these problems, most of which avoid getting trapped into local minima by allowing random perturbations. Among them we mention the method of simulated annealing or genetic algorithms. In these methods, the objective function is evaluated after random changes in the parameters or from combinations of previous solutions; solutions are retained or not depending on a strategy guided by the effect the changes have on the objective function. The names suggest that the problem is treated in simulated annealing according to principles of thermodynamics, in genetic algorithms according to concepts about evolution; derivatives are not used, and no proof exists that the minimum of the objective function is absolute; in practice, however, there is good convergence to an asymptotic minimum which then resists many further (random) changes.
For more reading, see [Press95], [Flowers95], [Bishop95], also Simplex Method.
MIPS: see Benchmarking.
Mode A random variable X can either assume a number of discrete values x_i (with probabilities P(X = x_i)) or continuous values x (with a probability density function f(x)). The mode x_m of a distribution is defined as that value of x for which the probability of observing the random variable is a maximum, i.e. P(X = x_m) or f(x_m) is maximal. If a distribution has only one mode it is called unimodal, otherwise multimodal.
Moment The moment of order l about the mean E(X) of a random variable X is defined as the expectation value
μ_l = E[ (X - E(X))^l ].
For several variables X, Y, Z, ..., the moment of order (l,m,n,...) about the mean is
μ_(l,m,n,...) = E[ (X - E(X))^l (Y - E(Y))^m (Z - E(Z))^n ... ].
Monte Carlo Methods The systematic use of samples of random numbers in order to estimate parameters of an unknown distribution by statistical simulation. Methods based on this principle of random sampling are indicated in cases where the dimensionality and/or complexity of a problem make straightforward numerical solutions impossible or impractical. The method is ideally adapted to computers, its applications are varied and many, its main drawbacks are potentially slow convergence (large variances of the results), and often the difficulty of estimating the statistical error (variance) of the result. Monte Carlo problems can be formulated as integration of a function
over a (multi-
dimensional) volume V, with the result
where
, the average of f, is obtained by exploring randomly the volume V.
Most easily one conceives a simple (and inefficient) hit-and-miss Monte Carlo: assume, for example, a three-dimensional volume V to be bounded by surfaces difficult to intersect and describe analytically; on the other hand, given a point (x,y,z), it is easy to decide whether it is inside or outside the boundary. In this case, a simply bounded volume which fully includes V can be sampled uniformly (the components x, y,z are generated as random numbers with uniform probability density function), and for each point a weight is computed, which is zero if the point is outside V, one otherwise. After N random numbers, n will have been found inside V, and the ratio n/N is the fraction of the sampled volume which corresponds to V. Another method, crude Monte Carlo, may be used for integration: assume now the volume V is bounded by two functions z(x,y) and z'(x,y), both not integrable, but known for any x,y, over an interval and . Taking random pairs (x,y), evaluating at each point, averaging to and forming
, gives an approximation of the volume (in this example, sampling the
area with quasirandom numbers or, better, using standard numerical integration methods will lead to more precise results). Often, the function to be sampled is, in fact, a probability density function, e.g. a matrix element in http://rkb.home.cern.ch/rkb/AN16pp/node177.html (1 of 2)9/3/2006 14:18:42
Monte Carlo Methods
phase space. In the frequent case that regions of small values of the probability density function dominate, unacceptably many points will have to be generated by crude Monte Carlo, in other words, the convergence of the result to small statistical errors will be slow. Variance-reducing techniques will then be indicated, like importance sampling or stratified sampling. For more reading, see [Press95], [Hammersley64], [Kalos86].
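A hit-and-miss sketch of the kind described above, estimating the volume of the unit sphere by sampling its bounding cube (simple but inefficient, as noted; the sample size is an arbitrary choice):

import numpy as np

rng = np.random.default_rng(5)
N = 100_000
pts = rng.uniform(-1.0, 1.0, size=(N, 3))          # sample the bounding cube of volume 8
inside = np.sum(np.sum(pts ** 2, axis=1) <= 1.0)   # weight 1 inside the sphere, 0 outside
volume = 8.0 * inside / N
print("estimate %.4f, exact 4*pi/3 = %.4f" % (volume, 4.0 * np.pi / 3.0))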
Morphological Operations Mathematical morphology is a set-theoretical approach to multi-dimensional digital signal or image analysis, based on shape. The signals are locally compared with so-called structuring elements S of arbitrary shape with a reference point R, e.g.:
The aim is to transform the signals into simpler ones by removing irrelevant information. Morphological operations can be applied to binary and greylevel signals. The most basic building blocks for many morphological operators are erosion and dilation. We will define these operations without any mathematical rigour, and will therefore restrict ourselves to relatively simple structuring elements like the first four above. For a binary image we will define:
The eroded image of an object O with respect to a structuring element S with a reference point R is the set of all reference points for which S is completely contained in O.
The dilated image of an object O with respect to a structuring element S with a reference point R is the set of all reference points for which O and S have at least one common point.
Opening is defined as an erosion, followed by a dilation; closing is defined as a dilation, followed by an erosion.
In the following figure we show the sequence of opening and closing with a square structuring element, with the reference point in the centre.
The examples show that erosions and dilations shrink and expand objects, respectively; they remove and add parts of objects, and in general cannot be inverted. The choice of the structuring element is of extreme importance, and depends on what should be extracted or deleted from the image. For shapes one wants to keep, the choice must be a suitable structuring element and an invertible sequence of morphological operators. The morphological operators can easily be extended to greylevel images. Erosion and dilation will be replaced by the maximum/minimum operator ( Rank Filter). The following one-dimensional example shows how opening can be used to remove impulse noise:
In the next figure, it is the signal which is ``spiky'' and must be extracted; the signal and the result of the four basic morphological operations are shown. The second row shows how the difference between the original and the opened signal, the ``top hat'' transform, can be used for contrast stretching and peak detection. In the middle, the original signal is plotted with the opened signal and with a signal that was first closed and then opened. The last plot illustrates the better performance if one uses this upper ``noise envelope''.
Morphological operations can also be used for edge detection. It might seem that the simple difference between a dilated and an eroded signal could define an edge, but this method is very noise sensitive. In [Lee86] it is recommended to first smooth the original signal and then use an edge operator formed from the smoothed signal s and its dilated and eroded versions d and e, respectively.
Good introductions to mathematical morphology are e.g.: [Dougherty92], [Haralick87], [Maragos87]. The more mathematically inclined reader may consult [Serra80].
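A one-dimensional greylevel opening of the kind used above for impulse-noise removal, as a minimal sketch: erosion and dilation become sliding minimum and maximum over a flat structuring element (odd length assumed; the test signal is an arbitrary example):

import numpy as np

def erode(signal, size=3):
    half = size // 2
    s = np.asarray(signal, dtype=float)
    return np.array([s[max(0, i - half): i + half + 1].min() for i in range(s.size)])

def dilate(signal, size=3):
    half = size // 2
    s = np.asarray(signal, dtype=float)
    return np.array([s[max(0, i - half): i + half + 1].max() for i in range(s.size)])

def opening(signal, size=3):
    return dilate(erode(signal, size), size)   # erosion followed by dilation

x = np.array([1.0, 1.0, 1.2, 6.0, 1.1, 1.0, 0.9, 1.0])   # narrow positive spike
print(opening(x, 3))   # the spike is suppressed, the background level is kept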
Multinomial Distribution This is an extension of the binomial distribution to the case where there are more than two classes into which an event can fall. The most common example is a histogram containing N independent events distributed into n bins. Then if p_i is the probability of an individual event falling in the ith bin, the probability of exactly r_i events falling in bin i for each i is:
P(r_1, ..., r_n) = N! Π_(i=1..n) p_i^(r_i) / r_i! ,
where Σ_i r_i = N and Σ_i p_i = 1. The expectation value, variance and correlation coefficient of the r_i are:
E(r_i) = N p_i,   V(r_i) = N p_i (1 - p_i),   ρ(r_i, r_j) = - ( p_i p_j / ((1 - p_i)(1 - p_j)) )^(1/2)  for i ≠ j.
Even though the events are independent, there is a correlation between bin contents because the sum is constrained to be N.
Multivariate Normal Distribution The joint probability density of n random variables x_1, ..., x_n is normal with the means m_1, ..., m_n and the covariance matrix C = B^(-1), if it has the form
φ(x) = ( det(B) / (2π)^n )^(1/2) exp( -(x - m)^T B (x - m) / 2 ).
Only if the covariance matrix is diagonal can φ be written as a product of n normal distributions with means m_i and variances σ_i². For a more detailed discussion of the normal distribution of two variables, Bivariate Normal Distribution.
Neural Networks Neural networks (or, with more precision, artificial NNs) are mathematical models that are inspired by the connections and the functioning of neurons in biological systems. NNs have given rise to a branch of research called neural computing, being used or tried out in many disciplines. Basically, NNs are based on two simple concepts, the topology of nodes and connections between them, and transfer functions which relate the input and output of each node. A node receives input data through its input connections, performs a very simple operation on these (weighted sum and some kind of thresholding function), and passes the result on its output connection(s), as final output or for use in other nodes. Recent interest in this class of algorithms (which includes cellular automata as a subset) was stimulated ([Hopfield86]) by good results and excellent robustness on simple tasks. Many classification and pattern recognition problems can be expressed in terms of NNs. For introductory reading, see [Beale91] or [Bishop95]. The inherent simplicity of NNs suggests that massive parallelism and possibly special, very simple hardware can be taken advantage of in the implementation of NNs, e.g. semiconductors or optical elements. More relevant than implementation questions, however, appears to be the understanding of the virtues and pitfalls of NNs as algorithms. One of their important properties is that they can be trained, i.e. they can be given training samples of events of different classes, and by learning algorithms of various complications, can adjust the weights associated to all input connections until some overall function is maximized which characterizes the quality of the decision mechanism. The optimization is often viewed in analogy with the minimizing of a physical potential (Boltzmann machine); the function is then termed an ``energy function''. Impressive results can be achieved on small-size classification problems, where NNs can learn up to a good performance level without more input than training samples; a common example is character recognition. An optimization of the choice of input data and of network topology is usually left to trial and error. A frequently found suggestion is that input data should describe events exhaustively; this rule of thumb can be translated into the use as input of all variables that can be thought of as having problem-oriented relevance (and no more). Unnecessarily large and possibly inadequate neural networks can be avoided by pre-processing of data and/or (partial) feature extraction; in general, it is a useful suggestion to reduce and transform the variables of the training sample into fewer or new variables, with whatever a priori information may exist on them, before submitting them to a NN training algorithm. The variables should display translation- and scale-invariance with respect to the information to be extracted. Studies have shown that such variables are implicitly used (``found'') by the training procedure, if they are linear combinations of the input variables, but not in general. Indeed, if the thresholding function is a simple step function, a feedforward network of more than one layer performs multiple piecewise linear transformations; decision boundaries are then multiple hyperplanes. For more involved thresholding
functions (transfer functions or activation functions), such as sigmoid functions or tanh, the interpretation is more complicated. NNs are often used as a way of optimizing a classification (or pattern recognition) procedure; this optimization aspect puts NNs close to other optimization tools ( Minimization), which also define an objective function that has to be maximized. NNs also usually have more input than output nodes; they may thus also be viewed as performing a dimensionality reduction on input data, in a way more general than principal component analysis. Another possible interpretation of network outputs is that of probabilities; for a discussion, see [Bishop95]. The trial-and-error approach is usually also taken for the initial choice of weights needed to launch the learning process. Robustness is demonstrated by showing that different starting values converge to the same or similar results. Once trained, neural networks in many cases are robust with respect to incomplete data. Training may also be a continuing process, in that the network weights are updated periodically by new training samples; this is indicated if the characteristics of the input data are subject to slow evolution, or if training samples are not initially available, i.e. the network has to learn on the data. Depending on the topology of interconnection and the time sequence of operations, networks can be classified ([Humpert90]), from simple one-directional networks with few layers acting in step (feedforward), of which the nodes or neurons are sometimes also called perceptrons, to the fully connected networks (Hopfield network). For multiple practical applications, see e.g. [Horn97].
Neville Algorithm This algorithm is a schematic recursive way of evaluating the coefficients of a polynomial of order n-1 from n known function values y_i = f(x_i), i = 1, ..., n. Given the n pairs (x_i, y_i), one proceeds schematically:
●  first find the n ``polynomials'' of order zero going through the n function values at the x_i, i.e. simply the y_i;
●  next obtain from these the n-1 polynomials of order one going through the pairs (x_i, y_i) and (x_(i+1), y_(i+1));
●  next the n-2 polynomials of order two going through the triplets (x_i, y_i), (x_(i+1), y_(i+1)) and (x_(i+2), y_(i+2));
●  etc.,
until one reaches the required single polynomial of order n-1 going through all points. The recursive formula allows one to derive every polynomial from exactly two polynomials of a degree lower by one, by
P_(i,...,i+m)(x) = [ (x - x_(i+m)) P_(i,...,i+m-1)(x) + (x_i - x) P_(i+1,...,i+m)(x) ] / (x_i - x_(i+m)).
The formula may be viewed as an interpolation: it expresses, for instance, a second-order polynomial in terms of the equations of two straight lines; see [Press95] for variants.
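A direct transcription of the recursion as a minimal sketch (the sample points below are arbitrary, chosen from a quadratic so the exact value is known):

def neville(xs, ys, x):
    """Value at x of the polynomial of order n-1 through the n points (xs, ys)."""
    p = list(map(float, ys))                  # order-zero 'polynomials': the y_i themselves
    n = len(p)
    for m in range(1, n):                     # build order m from two polynomials of order m-1
        p = [((x - xs[i + m]) * p[i] + (xs[i] - x) * p[i + 1]) / (xs[i] - xs[i + m])
             for i in range(n - m)]
    return p[0]

xs, ys = [0.0, 1.0, 2.0, 3.0], [1.0, 2.0, 5.0, 10.0]      # samples of 1 + x^2
print(neville(xs, ys, 1.5))                               # 3.25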
Newton-Raphson Method An iteration method for solving a system of n non-linear equations
f_j(x) = 0,  j = 1, ..., n,
for the n variables x = (x_1, ..., x_n). An approximate solution x must be known. Then a better approximation x + Δx is found from the approximate equations
J Δx = -f(x),
which are linear equations in the unknown Δx. The matrix J is the Jacobi matrix,
J_jk = ∂f_j / ∂x_k.
The process is iterated until it converges, usually until |Δx| is smaller than the accuracy wanted in the solution, or until all the f_j(x) are ``sufficiently close to 0'' (general criteria are difficult to define). Convergence may, of course, not be obtained if the first approximation was poor (again this is difficult to define in general). In the one-dimensional case the Newton-Raphson formula
x_new = x - f(x) / f'(x)
has a very simple geometrical interpretation: it is the extrapolation to 0 along the tangent to the graph of f(x) (also called Newton's rule). The convergence is quadratic, ε_(m+1) ≈ c ε_m², where ε_m is the error after m iterations. Note that only approximate solutions for Δx are required. A small error in Δx will not destroy the convergence completely, but may make it linear instead of quadratic. Hence also the Jacobian matrix J needs to be calculated only approximately; in particular it need often not be recalculated for each iteration. Double computer precision for x and f(x) but single precision for J and Δx may give double precision for the final solution. In fact, the Newton-Raphson method may be applied even to linear equations in order to give double precision solutions using single precision subroutines. Numerical differentiation might be used; this is then essentially the secant method. Some care may be needed, since numerical differentiation becomes inaccurate both for small and large steps, see [Press95].
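The one-dimensional rule as a minimal sketch; the test function f(x) = x² - 2 and the starting point are arbitrary choices, and the derivative could equally well be replaced by a numerical difference as mentioned above:

def newton(f, fprime, x, tol=1e-12, max_iter=50):
    """One-dimensional Newton-Raphson: extrapolate to zero along the tangent."""
    for _ in range(max_iter):
        dx = -f(x) / fprime(x)
        x += dx
        if abs(dx) < tol:
            break
    return x

# illustrative example: root of x^2 - 2
print(newton(lambda x: x * x - 2.0, lambda x: 2.0 * x, 1.0))   # 1.41421356...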
Newton's Rule
See Newton-Raphson Method.
Neyman-Pearson Diagram
A diagram (also named a decision quality diagram) used in optimizing decision strategies with a single test statistic. The assumption is that samples of events or probability density functions are available both for signal (authentic) and background (imposter) events; a suitable test statistic is then sought which optimally distinguishes between the two. Using a given test statistic (or discriminant function), one can introduce a cut which separates an acceptance region (dominated by signal events) from a rejection region (dominated by background). The Neyman-Pearson diagram plots contamination (misclassified background events, i.e. classified as signal) against losses (misclassified signal events, i.e. classified as background), both as fractions of the total sample.
An ideal test statistic causes the curve to pass close to the point where both losses and contamination are zero, i.e. the acceptance is one for signals, zero for background (see figure). Different decision strategies choose a point of closest approach, where a ``liberal'' strategy favours minimal loss (i.e. high acceptance of signal), a ``conservative'' one favours minimal contamination (i.e. high purity of signal).
For a given test (fixed cut parameter), the relative fraction of losses (i.e. the probability of rejecting good events, which is the complement of acceptance) is also called the significance or the cost of the test; the relative fraction of contamination (i.e. the probability of accepting background events) is denominated the power or purity of the test. Hypothesis testing may, of course, allow for more than just two hypotheses, or use a combination of different test statistics. In both cases, the dimensionality of the problem is increased, and a simple diagram becomes inadequate, as the curve relating losses and contamination becomes a (hyper-)surface, the decision boundary. Often, the problem is simplified by imposing a fixed significance, and optimizing separately the test statistics to distinguish between pairs of hypotheses. Given large training samples, neural networks can contribute to optimizing the general decision or classification problem.
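A small sketch of how such a curve is traced in practice, assuming numpy; the two Gaussian samples standing in for signal and background are invented for illustration:

```python
import numpy as np

# Illustrative sketch only: scan the cut on a scalar test statistic and record
# losses and contamination; the two Gaussian samples are invented.
rng = np.random.default_rng(1)
signal     = rng.normal(1.0, 1.0, 10000)    # test statistic for signal events
background = rng.normal(-1.0, 1.0, 10000)   # test statistic for background events

for cut in np.linspace(-2.0, 3.0, 11):
    losses        = np.mean(signal < cut)        # signal classified as background
    contamination = np.mean(background >= cut)   # background classified as signal
    print(f"cut={cut:5.2f}  losses={losses:.3f}  contamination={contamination:.3f}")
```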
Noise
Random background signals in transmission or communication systems. Noise is strictly dependent on the systems used and their technologies. One usually distinguishes white noise, which occurs with similar amplitudes over a wide frequency spectrum (the analogy is with white light, made up of all visible frequencies) and is also called random, Gaussian or steady state noise, and impulse noise (also impulsive noise), which is a momentary perturbation, limited in the frequency band, and often at saturation (i.e. the maximum signal height permitted). In analogue electronics, one talks about shot noise, which is Poisson-distributed and explained by the small statistics of charge carriers passing through semiconductor junctions; in image processing, the expression blue noise is used for random perturbations favouring high over low frequencies (the opposite of 1/f or pink noise, f being the frequency, which favours low frequencies). In experiments, noise is quite generally used as a synonym for background of different kinds; outliers are noise of the impulse type, multiple scattering of particles produces fluctuations of the white noise type.
Norm
A norm of a vector x (written $\|x\|$) is a scalar function which measures the ``size'' of x. It satisfies $\|x\| > 0$ for $x \ne 0$, $\|\alpha x\| = |\alpha|\,\|x\|$ for any scalar $\alpha$, and the triangle inequality $\|x + y\| \le \|x\| + \|y\|$. Most often one uses the p-norms:
$$ \|x\|_p = \Bigl(\sum_i |x_i|^p\Bigr)^{1/p}. $$
The most important cases:
- p=1 defines the length of a vector as the sum of the absolute values of the components: $\|x\|_1 = \sum_i |x_i|$. L1 is also called the city block metric, [Bishop95]. The L1 estimator of the centre of a distribution is the median.
- p=2 defines the familiar Euclidean length of a vector: $\|x\|_2 = (\sum_i x_i^2)^{1/2}$. The L2 estimator of the centre of a distribution is the least squares estimator, which is the mean.
- $p \to \infty$ yields the Chebyshev norm: $\|x\|_\infty = \max_i |x_i|$. The $L_\infty$ estimator of the centre of a distribution is the midrange, i.e. the average of the two extreme values.
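A short illustrative sketch, assuming numpy; the vector and sample values are invented:

```python
import numpy as np

# Illustrative sketch: p-norms of a vector and the centre estimators that the
# L1, L2 and Chebyshev norms lead to (median, mean, midrange).
x = np.array([1.0, -2.0, 3.5, 0.5])
print(np.sum(np.abs(x)))                       # L1 norm
print(np.sqrt(np.sum(x**2)))                   # L2 (Euclidean) norm
print(np.max(np.abs(x)))                       # Chebyshev (L-infinity) norm

sample = np.array([1.0, 2.0, 2.5, 3.0, 10.0])
print(np.median(sample))                       # L1 estimator of the centre
print(np.mean(sample))                         # L2 (least squares) estimator
print(0.5 * (sample.min() + sample.max()))     # L-infinity estimator: midrange
```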
Normal Distribution
Also called a Gaussian distribution, this is in practice one of the most important distributions, since experimental errors are often normally distributed to a good approximation (see Central Limit Theorem), and, further, the normal assumption simplifies many theorems and methods of data analysis (e.g. the method of least squares). The normal distribution has the probability density
$$ f(x) = \frac{1}{\sigma\sqrt{2\pi}}\,\exp\Bigl(-\frac{(x-a)^2}{2\sigma^2}\Bigr). $$
It has two parameters, the mean a and the width $\sigma$, which can be estimated from a sample $x_1, \dots, x_n$ by the following estimators:
$$ \hat a = \bar x = \frac{1}{n}\sum_i x_i, \qquad \hat\sigma^2 = \frac{1}{n-1}\sum_i (x_i - \bar x)^2. $$
In the statistical literature the probability density function of the normal distribution is often denoted by $N(a, \sigma^2)$. The standard normal distribution has zero mean and unit variance, i.e.
$$ \varphi(x) = \frac{1}{\sqrt{2\pi}}\,e^{-x^2/2}. $$
The corresponding distribution function is denoted by $\Phi(x)$.
Its complement is closely related to what is usually denoted as the error function (the name is also used in other contexts), i.e. $1 - \Phi(x) = \tfrac{1}{2}\,\operatorname{erfc}(x/\sqrt{2})$.
Normal Equations
We consider the problem $Ax \cong b$, where A is an (m,n) matrix with $m \ge n$, rank(A) = n, b is an (m,1) vector, and x is the (n,1) vector to be determined. The sign $\cong$ stands for the least squares approximation, i.e. a minimization of the norm of the residual r = Ax - b,
$$ \|r\|_2 = \|Ax - b\|_2 = \min, $$
or the square
$$ F(x) = \|Ax - b\|_2^2 = (Ax - b)^T(Ax - b), $$
i.e. a differentiable function of x. The necessary condition for a minimum is:
$$ \nabla F(x) = 2A^T(Ax - b) = 0. $$
These equations are called the normal equations, which become in our case:
$$ A^TA\,x = A^Tb. $$
The solution is usually computed with the following algorithm: First (the lower triangular portion of) the symmetric matrix $A^TA$ is computed, then its Cholesky decomposition $A^TA = LL^T$. Thereafter one solves $Ly = A^Tb$ for y and finally x is computed from $L^Tx = y$. Unfortunately $A^TA$ is often ill-conditioned and strongly influenced by roundoff errors (see [Golub89]). Other methods which do not compute $A^TA$ and solve $A^TAx = A^Tb$ directly are QR decomposition and singular value decomposition.
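A minimal sketch of this algorithm, assuming numpy; the matrix and vector are invented for the example:

```python
import numpy as np

# Sketch of the algorithm described above: form A^T A, Cholesky-decompose it,
# and solve the two triangular systems. Data values are invented.
A = np.array([[1.0, 1.0], [1.0, 2.0], [1.0, 3.0], [1.0, 4.0]])
b = np.array([1.1, 1.9, 3.2, 3.9])

AtA = A.T @ A
Atb = A.T @ b
L = np.linalg.cholesky(AtA)                    # A^T A = L L^T
y = np.linalg.solve(L, Atb)                    # forward substitution: L y = A^T b
x = np.linalg.solve(L.T, y)                    # back substitution:    L^T x = y
print(x)                                       # least squares solution
print(np.linalg.lstsq(A, b, rcond=None)[0])    # cross-check with a library solver
```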
Numerical Differentiation
Let f=f(x) be a function of one variable. Then
$$ f'(x) \approx \frac{f(x+h) - f(x-h)}{2h}, \qquad f''(x) \approx \frac{f(x+h) - 2f(x) + f(x-h)}{h^2}. $$
The error is of order $h^2$ in each case if $f^{(3)}(x)$, respectively $f^{(4)}(x)$, exists. Note that if f(x+h) and f(x-h) have n significant digits, but are equal to within m digits, then their difference has only n-m significant digits. Hence, unless f(x)=0, the formula for $f'(x)$ is imprecise for very small h. Extrapolation to the limit, in this case to h=0, may give numerical derivatives to high precision even with relatively large steps h.
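A small sketch of the central difference formulas above, applied to sin(x); the step sizes are illustrative only:

```python
import math

# Central differences for first and second derivatives, both with error O(h^2).
def d1(f, x, h):
    return (f(x + h) - f(x - h)) / (2.0 * h)

def d2(f, x, h):
    return (f(x + h) - 2.0 * f(x) + f(x - h)) / h**2

x = 1.0
for h in (1e-1, 1e-3, 1e-5):
    # errors against the exact derivatives cos(x) and -sin(x)
    print(h, d1(math.sin, x, h) - math.cos(x), d2(math.sin, x, h) + math.sin(x))
```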
Numerical Integration
One can distinguish three classes of integration problems: 1) quadrature, i.e. computation of definite integrals in one or more dimensions; 2) solution of ordinary differential equations (ODEs); and 3) solution of partial differential equations (PDEs). For example, the calculation of an instrument's acceptance can be looked upon as a quadrature problem. An example of class 2), ODEs, is the Lorentz equation of motion for a charged particle in an electromagnetic field. An example of class 3), PDEs, are Maxwell's equations. Only in special cases can analytic solutions be found; otherwise see Numerical Integration of ODE and Numerical Integration, Quadrature, as well as [Wong92], [Press95].
Numerical Integration of ODE
Let $y(x) = (y_1(x), \dots, y_n(x))$ be n functions of one variable x, with $y'$ and $y''$ the first and second derivatives. A first order, x-independent ODE has the form
$$ y' = f(y). $$
A second order, x-dependent ODE has the form
$$ y'' = f(y, y', x). $$
In principle, these two forms are completely equivalent, the one being a special case of the other: a second-order system can be written as a first-order system of twice the dimension by treating $y'$ as additional variables, and an explicit x-dependence can be absorbed by adding x as an extra component with derivative 1. However, from the numerical point of view the two forms are not equivalent, and second-order equations are most efficiently treated by special methods. The general solution of a second-order ODE contains 2n arbitrary constants, which have to be fixed, e.g. by fixing initial values $y(x_0) = y_0$ and $y'(x_0)$ at one given $x = x_0$, or by fixing boundary values $y(x_0) = y_0$, $y(x_1) = y_1$ at two points $x_0 \ne x_1$. For numerical methods for initial and boundary value problems see [Hall76], [Press95].

An example is the Lorentz equation of motion describing the movement of charged particles in a magnetic field, with s the path length along the track and p the momentum along the direction of the track; B is the magnetic field vector. In the Monte Carlo simulation of tracks, one has to solve an initial value problem. In track reconstruction, one has to determine the initial values $y(x_0)$ and $y'(x_0)$ from a number of measured values of y and x along the track, and this is more like a boundary value problem (we have assumed here that the field B is along z). Required here is an integration method for second-order equations. The bending of tracks often being small, one can get good precision using a high (e.g. fourth) order method with quite large steps. A typical spectrometer magnet has a very sharp-edged field; for the equation of motion this means that the right-hand side f resembles a step function. Certain methods (like n-step methods with n>2 and large steps) do not handle such functions very well. On a smaller scale, f may have artificial discontinuities due to a discontinuous representation of the magnetic field, or its derivatives may be discontinuous. Such discontinuities typically invalidate error estimates, and may cause trouble for methods based on extrapolation to the limit of zero step length. Runge-Kutta methods are simple and efficient, and are much used for this problem. An interesting alternative is offered by the predictor-corrector methods.
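A minimal sketch of the classical fourth-order Runge-Kutta step mentioned above, assuming numpy; the example (a harmonic oscillator written as a first-order system) and the step size are invented for the illustration:

```python
import numpy as np

# Classical 4th-order Runge-Kutta step for y' = f(x, y).
def rk4_step(f, x, y, h):
    k1 = f(x, y)
    k2 = f(x + 0.5 * h, y + 0.5 * h * k1)
    k3 = f(x + 0.5 * h, y + 0.5 * h * k2)
    k4 = f(x + h, y + h * k3)
    return y + h * (k1 + 2 * k2 + 2 * k3 + k4) / 6.0

f = lambda x, y: np.array([y[1], -y[0]])       # y'' = -y as a first-order system
x, y, h = 0.0, np.array([1.0, 0.0]), 0.1       # initial values y(0)=1, y'(0)=0
for _ in range(int(round(2 * np.pi / h))):
    y = rk4_step(f, x, y, h)
    x += h
print(y)                                       # close to the initial values after one period
```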
Numerical Integration, Quadrature
A simple procedure for calculating a one-dimensional integral $\int_a^b f(x)\,dx$ is to divide the interval [a,b] into smaller intervals [u, u+h], and then use approximations like Simpson's rule,
$$ \int_u^{u+h} f(x)\,dx \approx \frac{h}{6}\Bigl[f(u) + 4 f(u + h/2) + f(u+h)\Bigr], $$
or three-point Gauss integration,
$$ \int_u^{u+h} f(x)\,dx \approx \frac{h}{18}\Bigl[5 f\bigl(u + \tfrac{h}{2}(1 - \sqrt{3/5})\bigr) + 8 f\bigl(u + \tfrac{h}{2}\bigr) + 5 f\bigl(u + \tfrac{h}{2}(1 + \sqrt{3/5})\bigr)\Bigr]. $$
Errors for the complete integral are $O(h^4)$ and $O(h^6)$, respectively, and the two methods are said to be of order 4 and 6. Note that these error estimates are invalid if the integrand has singularities or discontinuities. For more detail, see [Wong92].
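A minimal sketch of the composite Simpson rule above in Python; the integrand, interval and number of subintervals are illustrative only:

```python
import math

# Composite Simpson's rule over n subintervals [u, u+h] of [a, b].
def simpson(f, a, b, n):
    h = (b - a) / n
    total = 0.0
    for i in range(n):
        u = a + i * h
        total += h / 6.0 * (f(u) + 4.0 * f(u + 0.5 * h) + f(u + h))
    return total

print(simpson(math.sin, 0.0, math.pi, 10))     # exact value is 2
```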
Numerov's Method
This is a two-step, fifth order predictor-corrector method for a second-order ordinary differential equation $y'' = f(y, x)$, where f = f(y,x) is independent of the first derivative $y'$ (example: the Schrödinger equation). The corrector formula is
$$ y_{n+1} = 2 y_n - y_{n-1} + \frac{h^2}{12}\,\bigl(f_{n+1} + 10 f_n + f_{n-1}\bigr). $$
A four-step predictor formula, as well as starting formulae, are given in [Press95]. A modified version of Numerov's method has been found to be more precise for this problem than the Runge-Kutta methods commonly used. This method also applies to the general second-order equation $y'' = f(y, y', x)$, but is then only of fourth order, since the Numerov corrector formula has to be supplemented by Milne's corrector formula (= Simpson's rule) for $y'$. Two-step predictor formulae are used for y and $y'$; for the first step, corresponding one-step predictor and corrector formulae are used. These one-step formulae are of lower order than the two-step formulae, but to compensate for this one may take the first two steps to be half the length of the following steps (i.e., let the first step of length h consist of two steps of length h/2).
Object-oriented Programming
The term describes an approach to (large-scale) programming which puts on an equal footing processes (viz. the structure of program actions) and data (viz. the objects manipulated by the program). Software is built as much or more from the starting point of data structures, and the objects of an object-oriented (OO) approach contain both data and the transformations they may be subjected to. The goal of this approach (and its rapid spread seems to indicate that the goal is largely attained) is to produce modular code that somehow possesses many of the desirable buzzword (i.e. ill-defined) qualities like correctness, robustness, reusability, extendibility, etc. OO programming is based on languages like C++ ([Deitel94]) or Eiffel ([Meyer88]); the field is in rapid evolution, and an impressive (and confusing) bibliography exists in print (and on the Internet); e.g. [Ross96], [Budd91].
Optimization
See Minimization.
Orthogonal Functions
A set of functions $u_i(x)$, defined on an interval $[a, b]$, is called orthogonal (or unitary, if complex) if it satisfies the following condition:
$$ \int_a^b u_i(x)\,u_j^*(x)\,dx = K\,\delta_{ij}, $$
where $\delta_{ij} = 1$ for i = j, and = 0 for $i \ne j$, and * is the complex conjugate. Without loss of generality we assume orthonormality (K = 1) and the range [0, 1] for x. We want to approximate a function f(x) by a linear combination of these functions,
$$ f(x) \approx \sum_{i=0}^{N-1} c_i\,u_i(x). $$
The $u_i$ are complete if any piecewise continuous function f(x) can be represented in this form to a given accuracy, in the sense that the mean square error
$$ \int_0^1 \Bigl| f(x) - \sum_{i=0}^{N-1} c_i\,u_i(x) \Bigr|^2 dx $$
converges to zero for sufficiently large N. In the discrete case, f(x) is known only at some points $x_j$, and the integrals above become sums over these points. The relationship between the two domains is governed by Parseval's relation,
$$ \sum_j |f(x_j)|^2 = \sum_i |c_i|^2, $$
i.e. the ``energy'' in the spatial domain equals the ``energy'' in the transform domain. The importance of this equation lies in the potential for bandwidth reduction; if most of the energy is contained in a few large transform samples, the small transform samples can be ignored (using, e.g., threshold coding) without loss of relevant information. Examples are:
The integral of the product of any two functions is zero, i.e. both are orthogonal; they are not complete, however. Among the complete orthonormal transforms are the sinusoidal transforms (Fourier transform, the sine and cosine transform), but there exist also many different non-sinusoidal transforms, of which we show three in the next figure:
Typical example functions that are to be approximated by these transforms are:
It can be shown, as one would expect, that smooth continuous waveforms like the first one above are well approximated by sinusoidal functions, but that discontinuous waveforms (rectangular or saw tooth) or non-stationary (``spiky'') experimental data, as they occur frequently in practice, are much better approximated by non-sinusoidal functions. There are pitfalls with discontinuous curves: what can happen if one tries to fit a Fourier series to a square wave was demonstrated by Gibbs in 1898. He proved that the sum of the Fourier series oscillates at jump discontinuities, and that the amplitude of the oscillations does not depend on the number of terms used. There is a fixed overshoot of about 9% of the step size.
A second example is taken from signal coding (see Data Compression). It shows that the Haar transform adapts much better to the ``spiky'' signal from a physics experiment than the cosine transform, which is, according to the literature ([Jain89]), the best of the fast transforms from the point of view of energy compaction for more stationary signals. Threshold coding with identical criteria was used in both cases.
For the two-dimensional discrete case, see Global Image Operations. More reading and references can be found e.g. in [Beauchamp87], [Kunt84], [Kunt80], [Ahmed75], [Courant62].
Orthogonal Matrices
A real square (n,n) matrix Q is orthogonal if $Q^TQ = I$, i.e. if $Q^T = Q^{-1}$. Orthogonal matrices play a very important role in linear algebra. Inner products are preserved under an orthogonal transform, $(Qx)^T(Qy) = x^Ty$, and of course the Euclidean norm, $\|Qx\|_2 = \|x\|_2$, so that we can, e.g., solve the least squares problem $\|Ax - b\|_2 = \min$ by solving the equivalent problem $\|Q^T(Ax - b)\|_2 = \min$. Important examples are Givens rotations and Householder transformations. They help to maintain numerical stability because they do not amplify rounding errors. Orthogonal (2,2) matrices are rotations or reflections if they have the form
$$ \begin{pmatrix} \cos\theta & \sin\theta \\ -\sin\theta & \cos\theta \end{pmatrix} \quad\text{or}\quad \begin{pmatrix} \cos\theta & \sin\theta \\ \sin\theta & -\cos\theta \end{pmatrix}, $$
respectively.
Orthogonal Polynomials
Polynomials of order n are analytic functions that can be written in the form
$$ p(x) = a_0 + a_1 x + a_2 x^2 + \dots + a_n x^n. $$
They can be differentiated and integrated for any value of x, and are fully determined by the n+1 coefficients $a_i$. For this simplicity they are frequently used to approximate more complicated or unknown functions. In approximations, the necessary order n of the polynomial is not normally defined by criteria other than the quality of the approximation. Using polynomials as defined above tends to lead into numerical difficulties when determining the $a_i$, even for small values of n. It is therefore customary to stabilize results numerically by using orthogonal polynomials over an interval [a,b], defined with respect to a weight function W(x). Orthogonal polynomials are obtained in the following way: define the scalar product
$$ \langle f, g\rangle = \int_a^b W(x)\,f(x)\,g(x)\,dx $$
between the functions f and g, where W(x) is a weight factor. Starting with the polynomials $p_0(x)=1$, $p_1(x)=x$, $p_2(x)=x^2$, etc., one obtains by Gram-Schmidt decomposition a sequence of orthogonal polynomials $p_i(x)$ such that $\langle p_i, p_j\rangle = N_i\,\delta_{ij}$. The normalization factors $N_i$ are arbitrary; when all $N_i$ are equal to one, the polynomials are called orthonormal. Well-known examples are the Legendre, Chebyshev, Laguerre and Hermite polynomials, which differ in the interval [a,b] and in the weight function W(x).

Orthogonal polynomials of successive orders can be expressed by a recurrence relation,
$$ p_{i+1}(x) = (A_i\,x + B_i)\,p_i(x) + C_i\,p_{i-1}(x). $$
This relation can be used to compute a finite series $\sum_i a_i\,p_i(x)$ with arbitrary coefficients $a_i$, without computing explicitly every polynomial $p_j$ (see Horner's Rule). Chebyshev polynomials $T_n(x)$ are also orthogonal with respect to discrete values $x_i$:
$$ \sum_{i=1}^{M} T_n(x_i)\,T_m(x_i) = 0 \qquad (n \ne m), $$
where the $x_i$ depend on M. See also [Abramowitz74], [Press95].
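As an illustration of the recurrence idea, a minimal sketch for Chebyshev polynomials ($T_{i+1}(x) = 2x\,T_i(x) - T_{i-1}(x)$); the coefficients used in the example are invented:

```python
# Evaluate a finite Chebyshev series sum_i a_i T_i(x) using the three-term
# recurrence, without constructing the polynomials explicitly.
def chebyshev_series(a, x):
    t_prev, t = 1.0, x                 # T_0(x), T_1(x)
    total = a[0] * t_prev
    if len(a) > 1:
        total += a[1] * t
    for i in range(2, len(a)):
        t_prev, t = t, 2.0 * x * t - t_prev   # advance to T_i(x)
        total += a[i] * t
    return total

print(chebyshev_series([0.5, 0.2, -0.1, 0.05], 0.3))
```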
Orthonormal
Used as an abbreviation for orthogonal and normalized; see e.g. Orthogonal Polynomials.
Outlier
The statistical term for something physicists often include in the general term ``noise''. An outlier is an observation which does not correspond to the phenomenon being studied, but instead has its origin in background or in a gross measurement (or assignment) error. In practice, nearly all experimental data samples are subject to contamination from outliers, a fact which reduces the real efficiency of theoretically optimal statistical methods. Methods which perform well even in the presence of outliers are called robust methods (see Robustness).
Overdetermined Systems
See Fitting, Least Squares, Linear Equations.
Pade Approximation
A Padé approximation is a rational function, viz. a ratio of two polynomials, which agrees to the highest possible order with a known polynomial of order M:
$$ \sum_{k=0}^{M} c_k\,x^k \;\approx\; \frac{a_0 + a_1 x + \dots + a_m x^m}{1 + b_1 x + \dots + b_n x^n}, \qquad M = m + n. $$
One may think of the coefficients $c_k$ as representing a power series expansion of any general function. In the rational function, one has to set a scale, usually by defining $b_0 = 1$. This leaves m + n + 1 unknowns, the coefficients $a_i$ and $b_i$, for which it is unproblematic to solve: the expression is multiplied with the denominator of the rational function, giving on both sides of the equation polynomials containing the unknown coefficients; one equates all terms with the same power of x to obtain the solution. Padé approximations are useful for representing unknown functions with possible poles, i.e. with denominators tending towards zero. For a discussion and algorithm, see [Press95], also [Wong92].
Parallel Processing
Modern real-time digital signal and image processing operations have a tendency of being highly compute-intensive. Speedups of many orders of magnitude over previous systems were found through improvements in new technologies, e.g. integrated circuits; improved algorithms and programming techniques have also contributed. A major gain also comes from parallel computer architectures, interconnected commodity processors with programs using parallelism and pipelining at different levels.
For some applications, such architectures can improve overall speed substantially. Minsky expected only a logarithmic increase in speedup from bus-oriented multiprocessor architectures; supercomputer architects claimed an increase according to Amdahl's formula (but see also Amdahl's Law concerning general gains in parallelism). H.T. Kung claims ([Kung79]) a perfectly linear speedup for his systolic array architecture. Clearly, we are in a domain of conjectures (and hype), and except for specific applications, nothing general can be stated. Most recently, it seems that the market favours clusters of general-purpose processors, with connections programmable as a shared-memory or message passing paradigm; they seem to dominate other architectures economically, even if applications lend themselves readily to finer-grain parallelism and better adapted architectures.

Systolic arrays are one- to three-dimensional arrays of simple, mostly identical processing elements, with nearest-neighbour connection. They both compute and pass data rhythmically through the system (the word ``systole'' is used in physiology, describing the rhythmical pulses of blood through the body). An example of the use of systolic arrays is the implementation of the solution of the general linear least squares problem $\|Ax - b\|_2 = \min$, with the known matrix A(m,n) and vector b(m), and the unknown vector x(n). Usually m>n. If we used the orthogonal triangularization A = QR by the Givens rotation, we could use the following systolic architecture (derived in [Gentleman81]) to perform the QR decomposition, and a linear one for the back-substitution.
In the figure, circles correspond to computation of the coefficients of the Givens rotation, and the squares perform the rotation. In [McWhirter83] a systolic architecture is described that produces immediately the residuals of such a fit. Because of problems connected with synchronization of a large array of processors, the asynchronous data-driven wave-array processor is usually preferred. It has the same structure as a systolic array, but without a global clock; not correct timing, but only correct sequencing is important. For more reading and more references, see [Kung88], [Bromley86], [Whitehouse85].
Penalty Function
A technique for introducing constraints into an otherwise unconstrained minimization problem; the name comes from the idea of adding a penalty for the violation of constraints. While minimizing the function, one therefore minimizes also the constraint violation. In the limit that the penalty is large compared with the rest of the function, the constraints will eventually be satisfied if possible. The technique is very general and can be applied to both equality and inequality constraints, but is of course not as efficient as more specialized methods designed for particular types of constraints. For equality constraints of the form g(a) = 0, where the vector a represents the free parameters of the problem, and g may be a vector if there are more constraints than one, the penalty function should be $P = k\,g^2(a)$, so that the total function to be minimized would be
$$ F(a) = f(a) + k\,g^2(a), $$
or, more generally,
$$ F(a) = f(a) + \sum_j k_j\,g_j^2(a), $$
where f(a) is the usual $\chi^2$ or negative log-likelihood function, and k is a positive constant chosen large enough that the penalty function is more important than f(a). For inequality constraints of the form g(a) > 0, the same formalism applies, except that the penalty function is added only when the constraints are violated (see Minimization).
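A small sketch of the idea, using scipy's general-purpose minimizer as a stand-in for any minimization routine; the objective, the constraint and the value of k are all invented for the example:

```python
import numpy as np
from scipy.optimize import minimize

# Illustrative sketch: minimize f(a) = (a1-1)^2 + (a2-2)^2 subject to the
# equality constraint g(a) = a1 + a2 - 1 = 0 by adding the penalty k*g(a)^2.
f = lambda a: (a[0] - 1.0)**2 + (a[1] - 2.0)**2
g = lambda a: a[0] + a[1] - 1.0
k = 1e4                                           # large compared with f

result = minimize(lambda a: f(a) + k * g(a)**2, x0=np.array([0.0, 0.0]))
print(result.x)                                   # close to the constrained minimum (0, 1)
```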
Petri Nets
Petri nets form a graphical language used in describing discrete parallel systems. They allow one to express the concepts of concurrency, and are used in modelling complex systems. They have been found useful in describing protocols used in networks. For introductory reading, see [Reisig85], [Murata89], [Ajmone94].
Point Spread Function
Often called this in imaging systems, the PSF (also termed impulse response) of a system is the expression of combined effects of sensors and transmission affecting an observed image. Mathematically, the PSF is expressed as a function, typically in two dimensions, which acts on the original distribution via convolution. Read also Linear Shift-invariant Systems. For more reading, see [Jain89].
Poisson Distribution
The Poisson distribution can be defined as the limiting case of the binomial distribution for $n \to \infty$, $p \to 0$ with $np = \lambda$ = const. It thus describes the behaviour of a large number n of independent experiments, of which only a very small fraction pn is expected to yield events of a given type A. As an example, n may be the number of radioactive nuclei in a source and p the probability for a nucleus to decay in a fixed interval of time. The probability for X=k events of type A to occur is
$$ P(X = k) = \frac{\lambda^k}{k!}\,e^{-\lambda}, \qquad k = 0, 1, 2, \dots $$
The distribution has mean $E(X) = \lambda$ and variance $\sigma^2(X) = \lambda$. If k events are observed, k is an unbiased estimator of the single parameter $\lambda$; the variance of k is also equal to $\lambda$, hence approximately equal to k. A simple generator for random numbers taken from a Poisson distribution is obtained using this simple recipe: if $u_1, u_2, \dots$ is a sequence of random numbers with uniform distribution between zero and one, k is the first integer for which the product $u_1 u_2 \cdots u_{k+1}$ falls below $e^{-\lambda}$.
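A minimal sketch of this recipe in Python; the parameter value and sample size are illustrative only:

```python
import math, random

# Multiply uniform random numbers until the product falls below exp(-lambda);
# the number of factors needed, minus one, is the Poisson variate.
def poisson(lam):
    limit = math.exp(-lam)
    k, product = 0, random.random()
    while product >= limit:
        k += 1
        product *= random.random()
    return k

counts = [poisson(3.0) for _ in range(100000)]
print(sum(counts) / len(counts))     # sample mean, close to lambda = 3
```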
Polar Coordinates
The two-dimensional polar coordinates $(r, \varphi)$ are related to Cartesian coordinates (x,y) by:
$$ x = r\cos\varphi, \qquad y = r\sin\varphi. $$
The matrix A giving polar coordinate unit vectors in terms of Cartesian unit vectors is then:
$$ A = \begin{pmatrix} \cos\varphi & \sin\varphi \\ -\sin\varphi & \cos\varphi \end{pmatrix}. $$
The volume element is $dV = r\,dr\,d\varphi$, and the distance element is $ds^2 = dr^2 + r^2\,d\varphi^2$. For three-dimensional polar coordinates, see Spherical Coordinates.
Polynomials
A polynomial of degree n in z is a function
$$ P_n(z) = a_0 + a_1 z + a_2 z^2 + \dots + a_n z^n, $$
where z and the coefficients $a_i$ can be real or complex. Two important application domains are the following:
- 1) Polynomial approximation, including data-fitting, interpolation, and computer representations of functions. One may use either a single polynomial for the whole range of the argument, or a family of polynomials each defined only over a subinterval, with continuity of a specified order of derivative at the junction points (see Spline Functions).
- 2) Many problems, e.g. eigenvalue computation, can be reduced to finding the roots of a polynomial equation $P_n(z) = 0$. Methods of solving these are of two kinds: global, which find all the roots at once; or simple, which find a single root a and then ``deflate'' the polynomial by dividing it by z-a before repeating the process.
See also Interpolation, Neville Algorithm, Pade Approximation. Some polynomials are ill-conditioned, i.e. the roots are very sensitive to small changes like truncation errors in the coefficients $a_i$, or the determination of the $a_i$ suffers from their correlations. The use of orthogonal polynomials can overcome this. For practical fast computation of polynomial expressions, see Horner's Rule.
Population
In statistics, one calls population the group of ``events'' for which data are available and can be studied. Events are characterized by one or more random variables. The name comes from frequent applications to groups of people or animals. Studies are usually done on a (random) sample, taken from a population.
Positivity
The positivity (of a matrix) can be defined only for square, symmetric matrices; a matrix A is positive-definite if $x^TAx > 0$ for all non-zero vectors x. A necessary and sufficient condition for this is that all the eigenvalues of A be strictly positive. An analogous definition exists for negative-definite. If all the eigenvalues of a symmetric matrix are non-negative, the matrix is said to be positive semidefinite. If a matrix has both positive and negative eigenvalues, it is indefinite. When the elements of the matrix are subject to experimental errors or to rounding errors, which is nearly always the case in real calculations, one must be careful in recognizing a zero eigenvalue. The important quantity is then not the value of the smallest eigenvalue, but the ratio of the smallest to the largest eigenvalue. When this ratio is smaller than the relative accuracy inherent in the calculation, the smallest eigenvalue must be considered to be compatible with zero.
Power of Test
The power of a test is the probability of rejecting background events in hypothesis testing (see Neyman-Pearson Diagram). It can also be defined as the probability of not committing a type II error in hypothesis testing.
Predictor-Corrector Methods
The predictor-corrector methods form a large class of general methods for numerical integration of ordinary differential equations. As an illustration, consider Milne's method [Milne49] for the first-order initial value problem $y' = f(x, y)$, $y(x_0) = y_0$. Define $f_n = f(x_n, y_n)$. Then by Simpson's rule (see Numerical Integration, Quadrature),
$$ y_{n+1} = y_{n-1} + \frac{h}{3}\,(f_{n-1} + 4 f_n + f_{n+1}). $$
Because $f_{n+1} = f(x_{n+1}, y_{n+1})$, this corrector equation is an implicit equation for $y_{n+1}$; if h is sufficiently small, and if a first approximation for $y_{n+1}$ can be found, the equation is solved simply by iteration, i.e. by repeated evaluations of the right-hand side. To provide the first approximation for $y_{n+1}$, an explicit predictor formula is needed, e.g. Milne's formula
$$ y_{n+1} = y_{n-3} + \frac{4h}{3}\,(2 f_n - f_{n-1} + 2 f_{n-2}). $$
The need for a corrector formula arises because the predictor alone is numerically unstable; it gives spurious solutions growing exponentially. Milne's predictor uses four previous values of y, hence extra starting formulae are needed to find $y_1$, $y_2$ and $y_3$ when $y_0$ is given. The starting problem is a weakness of predictor-corrector methods in general; nevertheless they are serious competitors to Runge-Kutta methods. For details see Numerov's Method and [Wong92] or [Press95].
Principal Component Analysis
The principal component analysis or Karhunen-Loeve transform is a mathematical way of determining that linear transformation of a sample of points in L-dimensional space which exhibits the properties of the sample most clearly along the coordinate axes. Along the new axes, the sample variances are extremes (maxima and minima), and uncorrelated. The name comes from the principal axes of an ellipsoid (e.g. the ellipsoid of inertia), which are just the coordinate axes in question. By their definition, the principal axes will include those along which the point sample has little or no spread (minima of variance). Hence, an analysis in terms of principal components can show (linear) interdependence in data. A point sample of L dimensions for whose L coordinates M linear relations hold will show only (L-M) axes along which the spread is non-zero. Using a cutoff on the spread along each axis, a sample may thus be reduced in its dimensionality (see [Bishop95]). The principal axes of a point sample are found by choosing the origin at the centre of gravity and forming the dispersion matrix
$$ C_{ij} = \frac{1}{N}\sum x_i\,x_j, $$
where the sum is over the N points of the sample and the $x_i$ are the ith components of the point coordinates (the averages $\langle x_i \rangle$ vanish by the choice of origin). The principal axes and the variance along each of them are then given by the eigenvectors and associated eigenvalues of the dispersion matrix.

Principal component analysis has in practice been used to reduce the dimensionality of problems, and to transform interdependent coordinates into significant and independent ones. An example used in several particle physics experiments is that of reducing redundant observations of a particle track in a detector to a low-dimensional subspace whose axes correspond to parameters describing the track. In practice, nonlinearities of detectors, frequent changes in detector layout and calibration, and the problem of transforming the coordinates along the principal axes into physically meaningful parameters, set limits to the applicability of the method. A simple program for principal component analysis is described in [O'Connel74].
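A minimal sketch of this procedure, assuming numpy; the two-dimensional correlated sample is invented for the illustration:

```python
import numpy as np

# Principal axes from the dispersion matrix of a centred point sample.
rng = np.random.default_rng(0)
pts = rng.normal(size=(500, 2)) @ np.array([[3.0, 0.0], [1.0, 0.5]])  # correlated sample

centred = pts - pts.mean(axis=0)             # origin at the centre of gravity
C = centred.T @ centred / len(centred)       # dispersion matrix <x_i x_j>
eigenvalues, eigenvectors = np.linalg.eigh(C)
print(eigenvalues)                           # variances along the principal axes
print(eigenvectors)                          # principal axes (columns)
```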
Probability
If in a total of N observations (or experiments) the event A occurs n times, the probability of observing A is
$$ P(A) = \lim_{N\to\infty} \frac{n}{N} $$
(the frequency definition of probability). Obviously, $0 \le P(A) \le 1$.
Probability Calculus
Let $A, B, \dots$ denote arbitrary events with probabilities $P(A), P(B), \dots$. Define $\bar A$ as the event complementary to A, hence with probability 1-P(A). Let AB be an event for which both A and B are true, and let P(B|A) denote the probability of an event B occurring under the condition that A is given. The Kolmogorov axioms can then be written in the following form:
- a) $0 \le P(A) \le 1$,
- b) P(E) = 1 (E is the unit, or any event),
- c) $P(A + B) = P(A) + P(B)$ (for A, B mutually exclusive),
- d) P(AB) = P(A)P(B|A).
Rules:
- i) $P(\bar A) = 1 - P(A)$,
- ii) $P(A + B) = P(A) + P(B) - P(AB)$,
- iii) $P(A_1 + A_2 + \dots + A_n) = P(A_1) + P(A_2) + \dots + P(A_n)$ for $A_1, \dots, A_n$ mutually exclusive (sum rule),
- iv) $P(B) = \sum_{i=1}^{n} P(A_i)\,P(B|A_i)$ for n mutually exclusive events $A_i$ (rule of total probability),
- v) P(AB) = P(A) P(B) for independent events A and B.
See also Boolean Algebra.
Probability Density Function
If a random variable X has a cumulative distribution function F(x) which is differentiable, the probability density function is defined as $f(x) = dF(x)/dx$. The probability of observing X in the interval $[x, x + dx]$ is then $f(x)\,dx$. For several variables $X_1, \dots, X_n$ the joint probability density function is
$$ f(x_1, \dots, x_n) = \frac{\partial^n F(x_1, \dots, x_n)}{\partial x_1 \cdots \partial x_n}. $$
The transformation of a given probability density function f(x) to the probability density function g(y) of a different variable y = y(x) is achieved by
$$ g(y) = f\bigl(x(y)\bigr)\,\Bigl|\frac{dx}{dy}\Bigr|. $$
The assumption has to be made that y(x) is a monotonically increasing or decreasing function, in order to have a one-to-one relation. In the case of a multidimensional probability density function, the derivative is replaced by the Jacobi determinant. See [Grimmett86], [Grimmett92].
Protocol
The set of rules agreed for the transfer of information between computer systems. Protocols are vital elements in computer networks with different host systems. Protocols are defined at different layers. High-level protocols may concern software for job submission or filing systems, low-level protocols concern transfers of small packets of information or even characters, independent of the information content; lowest-level protocols determine the hardware interfaces. A good protocol includes the addition of error-detecting and even error-correcting information, e.g. cyclic redundancy checks (CRCs). For more reading, see [McNamara82].
Pseudoinverse
The inverse $A^{-1}$ of a matrix A exists only if A is square and has full rank. In this case, Ax = b has the solution $x = A^{-1}b$. The pseudoinverse $A^+$ is a generalization of the inverse, and exists for any (m,n) matrix. We assume m > n. If A has full rank (n) we define
$$ A^+ = (A^TA)^{-1}A^T, $$
and the solution of Ax = b is $x = A^+b$. The best way to compute $A^+$ is to use singular value decomposition. With $A = USV^T$, where U and V are orthogonal and S (m,n) is diagonal with real, non-negative singular values $s_i$, we find
$$ A^+ = V\,S^+\,U^T, \qquad S^+ = \mathrm{diag}(1/s_1, \dots, 1/s_n). $$
If the rank r of A is smaller than n, the inverse of $A^TA$ does not exist, and one uses only the first r singular values; S then becomes an (r,r) matrix and U, V shrink accordingly. See also Linear Equations.
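A minimal sketch of the SVD route, assuming numpy; the matrix is invented for the example:

```python
import numpy as np

# Pseudoinverse of a full-rank (m,n) matrix via singular value decomposition.
A = np.array([[1.0, 2.0], [3.0, 4.0], [5.0, 6.0]])

U, s, Vt = np.linalg.svd(A, full_matrices=False)   # A = U diag(s) V^T
A_plus = Vt.T @ np.diag(1.0 / s) @ U.T             # A+ = V S+ U^T
print(A_plus)
print(np.linalg.pinv(A))                           # cross-check with the library routine
```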
Pseudorandom Numbers
Generated in a digital computer by a numerical algorithm, pseudorandom numbers are not random, but should appear to be random when used in Monte Carlo calculations (see Random Numbers). The most widely used and best understood pseudorandom generator is the Lehmer multiplicative congruential generator, in which each number $r_i$ is calculated as a function of the preceding number in the sequence,
$$ r_{i+1} = a\,r_i \ (\mathrm{mod}\ m) \qquad \text{or} \qquad r_{i+1} = (a\,r_i + c)\ (\mathrm{mod}\ m), $$
where a and c are carefully chosen constants, and m is usually a power of two, $2^k$. All quantities appearing in the formula (except m) are integers of k bits. The expression in brackets is an integer of length 2k bits, and the effect of the modulo is to mask off the most significant part of the result of the multiplication. $r_0$ is the seed of a generation sequence; many generators allow one to start with a different seed for each run of a program, to avoid re-generating the same sequence, or to preserve the seed at the end of one run for the beginning of a subsequent one. Before being used in calculations, the $r_i$ are usually transformed to floating point numbers normalized into the range [0,1]. Generators of this type can be found which attain the maximum possible period of $2^{k-2}$, and whose sequences pass all reasonable tests of ``randomness'', provided one does not exhaust more than a few percent of the full period ([Knuth81]). A detailed discussion can be found in [Marsaglia85]. For portable generators, and many caveats concerning pseudorandom number generators, see [Press95].
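A minimal sketch of a generator of this type in Python; the constants are chosen only for illustration and are not a recommendation:

```python
# Linear congruential generator of the type described above (illustrative constants).
class LCG:
    def __init__(self, seed, a=69069, c=1, m=2**32):
        self.r, self.a, self.c, self.m = seed, a, c, m

    def next(self):
        self.r = (self.a * self.r + self.c) % self.m   # modulo masks off the high part
        return self.r / self.m                          # normalize into [0, 1)

gen = LCG(seed=12345)
print([round(gen.next(), 4) for _ in range(5)])
```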
Pull Value
Defined in the context of least squares fitting, the pull value (also stretch value, or simply pull) of a variable is the difference between the direct measurement of the variable and its value as obtained from the least squares fit, normalized by dividing by the estimated error of this difference. Under the usual assumption of Gaussian errors, pulls should exhibit a standard normal distribution (with mean 0 and standard deviation 1), and any deviation from this distribution allows one in principle to identify wrong error assignments or other incorrect assumptions. In practice, the least squares fit correlates the different pull values strongly, so that the source of whatever deviations are observed is often difficult to localize. Outliers, for instance, often result in a general distortion of pull values, without being identifiable directly from these distortions. For the computation of errors of fitted quantities, see Least Squares.
Purity of Test
The purity of a test is the probability of rejecting background events in hypothesis testing (see Neyman-Pearson Diagram).
QR Decomposition
Orthogonal matrix triangularization (QR decomposition) reduces a real (m,n) matrix A with $m \ge n$ and full rank to a much simpler form. It guarantees numerical stability by minimizing errors caused by machine roundoffs. A suitably chosen orthogonal matrix Q will triangularize the given matrix:
$$ QA = \begin{pmatrix} R \\ 0 \end{pmatrix}, $$
with the (n,n) upper triangular matrix R. One only has then to solve the triangular system Rx = Pb, where P consists of the first n rows of Q. The least squares problem $\|Ax - b\|_2 = \min$ is easy to solve with A = QR and $Q^TQ = I$: the solution
$$ x = (A^TA)^{-1}A^Tb = (R^TR)^{-1}R^TQ^Tb = R^{-1}Q^Tb $$
reduces to the triangular system $Rx = Q^Tb$. This is a matrix-vector multiplication $y = Q^Tb$, followed by the solution of the triangular system $Rx = y$ by back-substitution. The QR factorization saves us the formation of $A^TA$ and the solution of the normal equations. Many different methods exist for the QR decomposition, e.g. the Householder transformation, the Givens rotation, or the Gram-Schmidt decomposition.
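A minimal sketch of the least squares solution via QR, assuming numpy; the matrix and vector are invented:

```python
import numpy as np

# Least squares via QR: A = QR, then solve R x = Q^T b; no normal equations
# A^T A are formed.
A = np.array([[1.0, 1.0], [1.0, 2.0], [1.0, 3.0], [1.0, 4.0]])
b = np.array([1.1, 1.9, 3.2, 3.9])

Q, R = np.linalg.qr(A)            # Q (m,n) with orthonormal columns, R (n,n) upper triangular
x = np.linalg.solve(R, Q.T @ b)   # back-substitution step done by the library solver
print(x)
```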
Quadrature
The computation of definite integrals in one or more dimensions; see Numerical Integration, Quadrature.
Quantile
A random variable X is described by a distribution function F(x) and also, if F(x) is differentiable, by a probability density function $f(x) = dF(x)/dx$. The quantile (or fractile) $x_q$ of the distribution, with 0 < q < 1, is defined by
$$ F(x_q) = \int_{-\infty}^{x_q} f(x)\,dx = q, $$
i.e. q is the probability of observing $X < x_q$. The quantile $x_{1/2}$ is called the median of the distribution; $x_{1/4}$ and $x_{3/4}$ are the lower and upper quartiles. In analogy, also quintiles and percentiles, etc., are in use.
Quantization
In its original meaning, quantization is the step of passing from a continuous to a discrete variable, as in analogue-to-digital signal conversion. More generally, the term can be used for any method decreasing the precision of representation by eliminating part of the information. Applied to image compaction ([Jain89]), quantization describes a step of eliminating or reducing the relevance of coefficients that carry little information (a different, analogous quantization is found in the thresholding of coefficients in principal component analysis). A typical compression step used for images transforms a (sub-)image, e.g. a small pixel matrix, by a discrete cosine transform, then uses a quantization step which consists of a suitable linear combination of the transformed pixels (i.e. in the ``frequency domain''), and then uses Huffman coding for the resulting information. Except for quantization, these steps have a clearly defined inversion, so that in the definition of the quantization matrix the key criterion is the quality difference between the original and the encoded/decoded image.
Quasirandom Numbers
These are sequences of numbers to be used in Monte Carlo calculations (see Random Numbers), optimized not to appear highly random, but rather to give the fastest convergence in the computation. They are applicable mainly to multidimensional integration, where the theory is based on that of uniformity of distribution ([Kuipers74]). Because the way of generating and using them is quite different, one must distinguish between finite and infinite quasirandom sequences:
- A finite quasirandom sequence is optimized for a particular number of points in a particular dimensionality of space. However, the complexity of this optimization is so horrendous that exact solutions are known only for very small point sets ([Kuipers74], [Zaremba72]). The most widely used sequences in practice are the Korobov sequences.
- An infinite quasirandom sequence is an algorithm which allows the generation of sequences of an arbitrary number of vectors of arbitrary length (p-dimensional points). The properties of these sequences are generally known only asymptotically, where they perform considerably better than truly random or pseudorandom sequences, since they give 1/N convergence for Monte Carlo integration instead of $1/\sqrt{N}$. The short-term distribution may, however, be rather poor, and generators should be examined carefully before being used in sensitive calculations. Major improvements are possible by shuffling, or changing the order in which the numbers are used. An effective shuffling technique is given in [Braaten79].
Radius of Curvature
Given a space curve described by the equation $\mathbf{x} = \mathbf{x}(u)$, where u is a variable parameter (``time''), the derivative (``velocity'') $d\mathbf{x}/du$ is a tangent vector to the curve at the point $\mathbf{x}(u)$. The arc length s along the curve is defined by the equation
$$ \Bigl(\frac{ds}{du}\Bigr)^2 = \frac{d\mathbf{x}}{du}\cdot\frac{d\mathbf{x}}{du}, $$
and the unit tangent vector is
$$ \mathbf{T} = \frac{d\mathbf{x}}{ds}. $$
By definition, $\mathbf{T}\cdot\mathbf{T} = 1$. Differentiating this equation, we get $\mathbf{T}\cdot d\mathbf{T}/ds = 0$, hence the vector $d\mathbf{T}/ds$ is normal to the curve. By definition, the curvature at the point $\mathbf{x}(u)$ is the length of this normal vector,
$$ \frac{1}{R} = \Bigl|\frac{d\mathbf{T}}{ds}\Bigr|, $$
where R is the radius of curvature. Note that since $\mathbf{T}$ is a unit tangent vector, $|d\mathbf{T}|$ is simply the angle by which the direction of the curve changes over the infinitesimal distance ds.

Example 1. Let x,y,z be Cartesian coordinates, let u=x, and introduce the notation $y' = dy/dx$, $y'' = d^2y/dx^2$, etc. Then
$$ \frac{1}{R} = \Bigl[\frac{y''^2 + z''^2 + (y'z'' - z'y'')^2}{(1 + y'^2 + z'^2)^3}\Bigr]^{1/2}. $$
In the special case of a plane curve with z = const we get
$$ \frac{1}{R} = \frac{|y''|}{(1 + y'^2)^{3/2}}. $$
Example 2. For a charged particle in a magnetic field the radius of curvature of the track is proportional to the momentum component perpendicular to the field.
Radon Transform
The Radon transform of a function f(x,y) is defined as the integral along a straight line defined by its distance $\rho$ from the origin and its angle of inclination $\theta$, a definition very close to that of the Hough transform:
$$ \hat f(\rho, \theta) = \iint f(x,y)\,\delta(\rho - x\cos\theta - y\sin\theta)\,dx\,dy, $$
where the delta function defines integration only over the line. The range of $\theta$ is limited to $[0, \pi)$. Like in the Hough transform, the Radon operator maps the spatial domain (x,y) to the projection domain $(\rho, \theta)$, in which each point corresponds to a straight line in the spatial domain. Conversely, each point in the spatial domain becomes a sine curve in the projection domain, hence the use of the name sinogram. In tomography, a back-projection operator and the inverse of the Radon transform are used to reconstruct images in three dimensions from intensities recorded in one or two dimensions (see [Barrett81], [Phelps86], [Jain89]).
Random Numbers
Random numbers are particular occurrences of random variables. They are used in Monte Carlo calculations, where three different types may be distinguished according to the method used to generate them:
- a) Truly random numbers are unpredictable in advance and can only be generated by a physical process such as radioactive decay: in the presence of radiation, a Geiger counter will record particles at time intervals that follow a truly random (exponential) distribution.
- b) Pseudorandom numbers are those most often used in Monte Carlo calculations. They are generated by a numerical algorithm, and are therefore predictable in principle, but appear to be truly random to someone who does not know the algorithm.
- c) Quasirandom numbers are also generated by a numerical algorithm, but are not intended to appear to have the properties of a truly random sequence; rather they are optimized to give the fastest convergence of the Monte Carlo calculation.
Random Numbers, Correlated
The notion of correlation is linked with that of variance and elements in an error matrix. Correlated random numbers arise from uncorrelated random numbers by error propagation. If correlated random numbers have to be generated according to a known error matrix, the inverse operation (of error propagation) is required: what is wanted is the matrix A which transforms the unit matrix I into the (given) error matrix E when propagating errors, viz. $A\,I\,A^T = AA^T = E$. This is exactly the problem of Cholesky decomposition: A will be a triangular matrix, and it can be found from E by that comparatively simple algorithm.
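A minimal sketch of this construction, assuming numpy; the error matrix is invented for the example:

```python
import numpy as np

# Cholesky-decompose the desired error matrix E = A A^T and apply A to
# uncorrelated, unit-variance random numbers.
E = np.array([[4.0, 1.2], [1.2, 1.0]])

A = np.linalg.cholesky(E)                                # lower triangular, A A^T = E
u = np.random.default_rng(0).normal(size=(2, 100000))    # uncorrelated, unit variance
x = A @ u                                                # correlated random numbers
print(np.cov(x))                                         # close to E
```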
Random Variable
The results of observations or experiments are subject to chance, if the phenomenon studied is of a statistical nature, or if the measurement is of limited accuracy. A measurement can be characterized by one or more numbers $X_i$, which are random variables. One talks of a discrete or a continuous random variable, if it takes discrete (e.g. integer) or continuous values, respectively. Random variables are characterized by their probability density function or, equivalently, distribution function.
Rank Filter
Rank filters in image processing sort (rank) the greyvalues in some neighbourhood of every pixel in ascending order, and replace the centre pixel by some value in the sorted list of greyvalues. If the middle value is chosen, this is the median filter. If the smallest or largest value is chosen, the filter is called a minimum or maximum filter, respectively; the latter are also used in greylevel morphological transforms (see Morphological Operations).
Recursion Recursive definitions of functions or concepts (also called recurrence relations) occur whenever a function or concept is defined using the function or concept itself. An example of a particularly simple recursive definition is given by the factorial: n! = n (n-1)!, with 0! = 1.
Another example of a recursive definition of a mathematical function is found in orthogonal polynomials: the Chebyshev polynomials, for instance, satisfy T(n+1)(x) = 2x T(n)(x) - T(n-1)(x), with T(0)(x) = 1 and T(1)(x) = x.
In computer science, recursive definitions abound in formalizing the syntax of languages. A string, for instance, is defined as a null string OR the concatenation of a single character with a string. Typical for recursive definitions is, of course, that the process of using the definition iteratively eventually comes to a halt: the repeated use of the factorial's definition, starting with a given n, will come to an end when reaching 0!, and therefore results in a non-recursive definition, the product of all integers from 1 to n.

The practical implementation of recursive algorithms can ease the programming of certain problems considerably. A recursive algorithm is one that, directly or through other program parts, calls itself before returning, i.e. it can be activated simultaneously at different levels and with different calling parameters. For such algorithms to be possible, certain compiler facilities are necessary; in particular, the calling parameters of a program must be kept on a stack. Recursive features are also important in writing operating system components.
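A minimal sketch of a recursive algorithm in Python, using the factorial definition given above:

    def factorial(n):
        # the non-recursive case 0! = 1 terminates the recursion
        if n == 0:
            return 1
        return n * factorial(n - 1)

    print(factorial(5))   # 120, the product of all integers from 1 to 5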
Regression Analysis A technique for finding mathematical relationships between dependent and independent variables. Taken from the terminology of statistics textbooks, regression analysis is closely related to fitting and to principal component analysis. See also Linear Regression.
Regularization A numerical method to solve problems of deconvolution by introducing a priori information about the smoothness of the expected result. For more detail, see [Provencher82], [Blobel85], [Press95].
Relaxation A simple iterative method for solving systems of equations. The method was much used even in precomputer times, for its simplicity, in solving large systems of linear equations by hand calculation. On computers, many numerical methods for solving differential equations are of the relaxation type, e.g. the finite difference method. Relaxation consists of finding, for a given set of approximate parameters, the residuals, i.e. those values of the equations which are zero for the correct parameter values. The residuals are then iteratively reduced (``relaxed'') by selected changes in the parameters, in an order such that convergence is as rapid as possible. Relaxation procedures are difficult to generalize, and convergence depends strongly on the quality of the first approximation. For details, see e.g. [Press95]. For application in solving large systems of linear equations, see Linear Equations, Iterative Solutions.
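As an illustration, a minimal Python sketch of one relaxation scheme for linear equations (Gauss-Seidel iteration); the example system is an arbitrary, diagonally dominant one, chosen so that the residuals shrink at every sweep:

    import numpy as np

    def gauss_seidel(A, b, sweeps=50):
        x = np.zeros(len(b))
        for _ in range(sweeps):
            for i in range(len(b)):
                # solve equation i for x[i], using the latest values of the other unknowns
                s = A[i, :] @ x - A[i, i] * x[i]
                x[i] = (b[i] - s) / A[i, i]
        return x

    A = np.array([[4.0, 1.0], [2.0, 5.0]])   # example system (assumption)
    b = np.array([1.0, 2.0])
    x = gauss_seidel(A, b)
    print(x, A @ x - b)                      # the residuals should be near zero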
Resampling The mathematical calculation of values (amplitudes, greyvalues, colour values) in a digital signal or image for coordinate values lying between the points sampled in the original signal or image. Typically, this is obtained by some interpolation method, using neighbouring values; see also Geometrical Transformations.
Residuals The (usually small) quantities expressing the degree to which constraint equations or other systems are not satisfied, due to measurement errors or approximate parameters. Most iterative solution methods use the repeated evaluation of residuals, and often the minimization of some function of the residuals (e.g. the Least Squares Method).
Right-handed Coordinate System If the axes of a three-dimensional Cartesian coordinate system are rotated such that the x axis points toward the observer and the y axis points down, then the coordinate system is right-handed if the z axis points toward the right, and left-handed if it points to the left. Any other three-dimensional orthogonal coordinate system can then be defined to be right- or left-handed depending on the determinant of the matrix A (relating it to the right-handed Cartesian system) being +1 or -1 (see Coordinate Systems).
Rms Error Short for root mean square error; an estimator of the standard deviation.
Robustness A procedure is called robust when it can be shown that it is not very sensitive to the assumptions on which it depends and to the quality of the data it operates on. Examples: ●
● ●
- matrix inversion methods that produce reasonably reliable results for slightly degenerate matrices ( Linear Equations), - statistical tests that are insensitive to outliers, - methods of analysis that compensate for expected uncontrollable systematic errors.
For a discussion, see [Press95].
Rotations A rotation is a linear transformation (usually in three-dimensional space with a positive definite scalar product) that preserves scalar products. Usually a determinant of +1 is also postulated; otherwise the transformation is called a reflection. If a and b are (three-dimensional) vectors, R is a rotation and a' = Ra, b' = Rb are the rotated vectors, then (a',b') = (a,b).

Let e1, e2, e3 be orthonormal basis vectors, i.e. (e_i, e_j) = delta_ij, and define matrix elements R_ij = (e_i, R e_j). The 3x3 matrix which represents the rotation R is an orthogonal matrix, since R R^T = I. If x1, x2, x3 are the components of a vector x with respect to the basis e1, e2, e3, then the components of the rotated vector are x'_i = sum_j R_ij x_j, or, in matrix notation, x' = R x, where R is the 3x3 matrix defined above. If the rotation R is followed by a second rotation S, the result is a third rotation Q = SR; in terms of 3x3 matrices, the composition SR is simply the matrix product.

The above formalism treats rotations as active transformations, i.e. the vectors are rotated and the basis vectors are kept fixed. The passive point of view is often adopted, where a vector is not transformed, but its coordinates x1, x2, x3 change because the basis vectors are rotated. If one passive rotation (coordinate transformation) U is followed by another, V, the total result is a third passive rotation P = UV. Note that the composition of passive rotations, first U and then V, leads to a matrix product in which the order is reversed; the reason for the reversal is that the matrix elements of U and of V are taken with respect to two different bases.

A rotation can also be defined by a rotation axis, given by a unit vector n, and an angle of rotation phi. With c = cos(phi) and s = sin(phi), a vector a is rotated into

    a' = c a + (1 - c)(n . a) n + s (n x a)

and the corresponding rotation matrix follows by writing out the components. A general rotation R can further be parameterized by the three Euler angles (see Euler Angles), each describing a rotation about one of the coordinate axes; different conventions for the choice and order of the axes are in use.

Example (see Coordinate Systems): a Euclidean coordinate system is determined by an origin and three orthonormal basis vectors. Let (x1, x2, x3) and (x1', x2', x3') be the coordinates of the same point with respect to two Euclidean coordinate systems; the transformation between the two consists of a translation of the origin and a rotation R. Suppose one has measured three reference points P1, P2 and P3 in both systems in order to determine the coordinate transformation. The three distances |P1 - P2|, |P2 - P3| and |P3 - P1| should be independent of the coordinate system; this gives three constraints, and one should make a least squares fit in order to get the constraints exactly satisfied (the chi-square of the fit gives a consistency check of the measurements). Defining difference vectors between the fitted reference points in each of the two systems, the 3x3 matrix R can be found from the linear equations relating them; the solution for R is unique whenever the difference vectors are linearly independent.
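A minimal Python sketch of the axis-angle construction (Rodrigues' formula); axis and angle below are illustrative assumptions:

    import numpy as np

    def rotation_matrix(n, phi):
        n = np.asarray(n, dtype=float)
        n = n / np.linalg.norm(n)
        c, s = np.cos(phi), np.sin(phi)
        K = np.array([[0, -n[2], n[1]],
                      [n[2], 0, -n[0]],
                      [-n[1], n[0], 0]])   # K @ a gives the cross product n x a
        return c * np.eye(3) + (1 - c) * np.outer(n, n) + s * K

    R = rotation_matrix([0, 0, 1], np.pi / 2)      # 90 degrees about the z axis
    print(np.round(R @ np.array([1, 0, 0]), 6))    # the x axis is rotated into the y axis
    print(np.round(R @ R.T, 6), np.linalg.det(R))  # orthogonal, determinant +1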
Runge-Kutta Methods Runge-Kutta (RK) methods for numerical integration of ordinary differential equations are popular because of their simplicity and efficiency. They are considered a class different from the predictor-corrector (PC) methods, although the RK and PC methods are very similar in many respects. The fourth-order RK method of Nystrom for second-order equations y'' = f(x, y, y') has proved very useful in application to the tracking of charged particles in magnetic fields. Its formulation (a set of formulae given in [Abramowitz74], algorithm 25.5.20) minimizes the number of evaluations of the magnetic field: in practice only two evaluations per step are needed, since y4 is close to y(x+h), and the next step can be started using the latest field values from the present step. If applied to an equation whose right-hand side depends only on x, the Nystrom method reduces to Simpson's rule. For a more detailed discussion, see e.g. [Wong92], [Press95], or [Flowers95].
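For illustration, a minimal Python sketch of the classical fourth-order Runge-Kutta step for a first-order equation y' = f(x,y) (the standard RK4 scheme, not the Nystrom variant discussed above); the test equation is an assumption:

    import numpy as np

    def rk4_step(f, x, y, h):
        k1 = f(x, y)
        k2 = f(x + h / 2, y + h / 2 * k1)
        k3 = f(x + h / 2, y + h / 2 * k2)
        k4 = f(x + h, y + h * k3)
        return y + h / 6 * (k1 + 2 * k2 + 2 * k3 + k4)

    # integrate y' = -y from x = 0, y = 1 to x = 1; the exact result is exp(-1)
    x, y, h = 0.0, 1.0, 0.1
    for _ in range(10):
        y = rk4_step(lambda x, y: -y, x, y, h)
        x += h
    print(y, np.exp(-1.0))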
Runs In any sequence of real numbers not containing exact zeros, a run is a subsequence of consecutive numbers of the same sign, immediately preceded and followed by numbers of the opposite sign, or by the beginning or end of the sequence. The number of runs in a sequence is therefore one more than the number of sign changes in the sequence. If M positive numbers and N negative numbers appear in a random sequence with all orderings equally probable, then the expected number of runs R and its variance are:

    E(R) = 1 + 2MN/(M+N)
    Var(R) = 2MN(2MN - M - N) / ((M+N)^2 (M+N-1))
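A minimal Python sketch for counting runs and comparing with the expectation for a random ordering (the test sequence is an arbitrary assumption):

    import numpy as np

    def count_runs(x):
        signs = np.sign(x)
        return 1 + int(np.sum(signs[1:] != signs[:-1]))   # one more than the sign changes

    rng = np.random.default_rng(1)
    x = rng.standard_normal(100)
    M, N = np.sum(x > 0), np.sum(x < 0)
    expected = 1 + 2 * M * N / (M + N)
    variance = 2 * M * N * (2 * M * N - M - N) / ((M + N) ** 2 * (M + N - 1))
    print(count_runs(x), expected, np.sqrt(variance))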
Runs Test A test of whether a one-dimensional data sample is compatible with being a random sampling from a given distribution. It is also used to test whether two data samples are compatible with being random samplings of the same, unknown distribution. One first forms the histogram of the difference between the two histograms to be compared, or of the difference between the histogram and the function to be compared, and then counts the number of runs in the difference. This number is then compared with that expected under the null hypothesis, which is such that all orderings of sign are equally probable (see Runs). The runs test is usually not as powerful as the Kolmogorov test or the chi-square test (see Chi-Square Test), but it can be combined with the chi-square test since it is (asymptotically) independent of it.
Saddle Point If
is a scalar function of n variables, a saddle point is any point (
) where the gradient of f is zero, but which is not a local maximum or minimum because the second derivatives of f are of different sign in different directions. The condition on the second derivatives is in general that the matrix of second derivatives (the Hessian ) is indefinite at a saddle point ( Positivity).
Sagitta If the end points of a circle segment (arc length s, circle radius rho) are joined by a straight line, the chord, the deviation of the chord from the circle at the segment's midpoint is called the sagitta f:

    f = rho (1 - cos(s/(2 rho)))

For small angles one obtains f ≈ s^2 / (8 rho). If a curvature 1/rho is measured by measuring f (e.g. for a particle track in a magnetic field), then 1/rho = 8 f / s^2, where s = projected length. By P = 0.2998 H rho, with P = projected momentum (in GeV/c), H = magnetic flux density (in tesla), rho in metres, and e = elementary charge [0.2998 GeV/c T^-1 m^-1], one obtains

    f ≈ 0.2998 H s^2 / (8 P)
Sample A number of observations X1, X2, ..., XN of a random variable. A sample is a random sample if the probability density describing the probability for the observation of (X1, X2, ..., XN) is given by a product, f(X1, X2, ..., XN) = f(X1) f(X2) ... f(XN). This implies in particular that the X_i are independent, i.e. that the result of any observation does not influence any other observations. In statistical nomenclature, sample is usually short for random sample, taken from a population.
Sample Mean, Sample Variance See Estimator.
Sampling from a Probability Density Function Frequently, a random sample is required that exhibits a known probability density function. Random number generators, on the other hand, usually supply samples with a uniform distribution, typically between 0 and 1. What is needed is a recipe for converting the flat probability density function (pdf) to the desired (``target'') pdf. Call the target pdf f(x); its distribution function is defined by

    F(x) = integral of f(t) dt from -infinity to x

By definition, F(x) is uniformly distributed over the interval [0,1], for all f(x); hence a uniformly distributed random number r can be interpreted as a random value of F(x). If a generator supplies r, the variable x is obtained by solving F(x) = r for x. If f(x) is not analytically integrable, or is only known in discretized form (e.g. as a histogram), simple methods of numerical integration and, possibly, interpolation will usually suffice.
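A minimal Python sketch of this recipe, using an exponential target pdf f(x) = exp(-x) because its distribution function is easy to invert (the choice of pdf is an illustrative assumption):

    import numpy as np

    rng = np.random.default_rng(0)
    r = rng.uniform(size=100000)      # uniformly distributed random numbers

    # F(x) = 1 - exp(-x), so solving F(x) = r gives x = -ln(1 - r)
    x = -np.log(1.0 - r)

    print(x.mean(), x.var())          # both close to 1 for the exponential pdf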
Sampling Theorem The greyvalues of digitized one- or two-dimensional signals are typically generated by an analogue-to-digital converter (ADC), by sampling a continuous signal at fixed intervals (e.g. in time), and quantizing (digitizing) the samples. The sampling (or point sampling) theorem states that a band-limited analogue signal x_a(t), i.e. a signal in a finite frequency band (e.g. between 0 and B Hz), can be completely reconstructed from its samples x(n) = x_a(nT) if the sampling frequency is greater than 2B (the Nyquist rate); expressed in the time domain, this means that the sampling interval T is at most 1/(2B) seconds. Undersampling can produce serious errors (aliasing) by introducing artefacts of low frequencies, both in one-dimensional signals and in digital images. For more details and further reading, see e.g. [Kunt80] or [Rabiner75].
Scalar Product Let a and b be vectors in a real vector space (that is, the linear combination alpha a + beta b is again a vector when alpha and beta are real numbers). By definition, the scalar product (a,b), also sometimes denoted a·b, is a real number, and the following relations are valid: (a,b) = (b,a), and (alpha a + beta b, c) = alpha (a,c) + beta (b,c). In addition, one often requires the scalar product to be positive definite, i.e. (a,a) > 0 except when a = 0.

Let e_1, ..., e_n be basis vectors, and define the metric tensor g_ij = (e_i, e_j). Write also a = x^j e_j. It is convenient to write the index j of the vector component x^j as a superscript, and to introduce the summation convention that a repeated index appearing once as a superscript and once as a subscript is to be summed over from 1 to n. Hence, with b = y^j e_j,

    (a,b) = g_ij x^i y^j

If the scalar product is positive definite, the length |a| of the vector a is defined by |a| = sqrt((a,a)). The angle theta between two vectors a and b is defined by cos(theta) = (a,b) / (|a| |b|); the Schwarz inequality ensures that |cos(theta)| <= 1.

In the positive definite case it is always possible to introduce orthonormal basis vectors (see Gram-Schmidt decomposition), so as to obtain the Euclidean form of the scalar product,

    (a,b) = x^1 y^1 + x^2 y^2 + ... + x^n y^n

The Minkowski scalar product, for n = 4, with one sign differing from the other three (as for particle four-momentum vectors), is not positive definite; various special notations are used for such 4-vectors.

As a more general example, the integral

    (f,g) = integral from a to b of w(x) f(x) g(x) dx

where w is any weight function, may be regarded as a scalar product between the functions f and g defined on the interval [a,b]. This defines a so-called L2-space (see Orthogonal Polynomials).

In the case of a complex vector space, the symmetry and bilinearity conditions are slightly more complicated: (a,b) = (b,a)*, where * denotes complex conjugation.
Scatter Diagram The elements X1, X2, ..., XN of a sample can be visualized without loss of accuracy by marking for each element a stroke at the corresponding place on an X axis. The resulting figure is a one-dimensional scatter diagram. (Usually one prefers to construct a one-dimensional histogram.) If the elements of the sample are pairs (X_i, Y_i) of two variables, they are marked as points in an X,Y-plane. In the resulting two-dimensional scatter diagram a correlation between X and Y (viz. a sizeable correlation coefficient) can often be detected by mere visual inspection of the data.
Schwarz Inequality For any positive definite scalar product, real or complex, the inequality

    |(a,b)|^2 <= (a,a) (b,b)

holds. In particular, for complex numbers x_i, y_i one has

    |sum_i x_i y_i*|^2 <= (sum_i |x_i|^2) (sum_i |y_i|^2)
Shaping Sensor signals are often processed, in the analogue domain, by shaping circuits. The purpose is to improve the signal-to-noise ratio and, in the time domain, the shape of the signal. In the frequency domain, shaping can be seen as a bandpass filter limiting the bandwidth to the part of the spectrum with the most favourable signal-to-noise ratio. A typical shaper is a combination of integrating and differentiating functions, often implemented as resistor-capacitor combinations.
Sharpening A process of edge enhancement in digital images. Physiological experiments have shown that the human visual system is very sensitive to changes or discontinuities in the greyvalue of an image; luminance changes, for instance, contain much more information than homogeneous grey areas. Only greylevel edges (not texture edges, etc.) are considered here. Many edge enhancement methods are known, mainly based on spatial differentiation, whereas smoothing is mainly based on integration.

In the discrete case, differentiation corresponds to difference operations. The simplest procedures to produce two-dimensional gradients are non-linear operators such as the three-point gradient and the Roberts operator; instead of summing the two difference terms, one can also use their maximum. Without some prefiltering these operators are sensitive to noise. Usually linear filters are used, i.e. convolution filters which multiply the greyvalues of the input pixels in some neighbourhood with a ``kernel'', a matrix of filtering coefficients, and then replace the centre pixel by the result.

Frequently, the Sobel operator is applied. It includes some averaging and takes into account the distances from the centre, with different weights. It is usually written in the form of two convolution masks S_x and S_y (for the x and y directions), which define the edge magnitude, given by |S_x| + |S_y|, and the edge direction.

Sometimes a template matching method is used for edge extraction: the image is convolved with several masks, and subsequent processing takes the convolved images as input. With eight masks, generated by rotation from a single starting mask, the maximum over the convolved images is taken as the edge magnitude and its direction defines the edge direction. Commonly used masks, each given in one of the eight principal directions (the other seven being obtained by rotating the coefficients around the centre pixel), are rotated versions of the Prewitt and Sobel operators, and the Kirsch and Compass operators.

Frei and Chen (see [Frei77]) use nine masks M_i which form an orthogonal basis, such that any neighbourhood N can be written as a linear combination of the M_i. The weights w_i define the intensity with which a certain pattern contributes to N; the first four masks describe edge patterns, M5 to M8 line patterns, and M9 the average.

Laplace operators are rotation-invariant, and produce image sharpening without any directional information. Laplacians are proportional to subtracting an average of neighbouring pixels from the centre pixel; this has an analogy in photography, where subtraction of a blurred version of an image from the image is called unsharp masking. For the more optimal edge detectors by Canny and Shen-Castan we refer to the literature, [Shen92], [Canny83]. For morphological edge detection, see Morphological Operations. For further reading, refer to the standard textbooks [Jain89], [Gonzalez87], [Pratt78], [Rosenfeld76].
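A minimal Python sketch of the Sobel operator on a small synthetic image (the image and the direct loop-based convolution are illustrative assumptions, not an optimized implementation):

    import numpy as np

    Sx = np.array([[-1, 0, 1],
                   [-2, 0, 2],
                   [-1, 0, 1]])
    Sy = Sx.T

    def apply_mask(image, mask):
        h, w = image.shape
        out = np.zeros((h, w))
        for i in range(1, h - 1):
            for j in range(1, w - 1):
                out[i, j] = np.sum(image[i-1:i+2, j-1:j+2] * mask)
        return out

    image = np.zeros((8, 8))
    image[:, 4:] = 1.0                      # a vertical greylevel edge
    magnitude = np.abs(apply_mask(image, Sx)) + np.abs(apply_mask(image, Sy))
    print(magnitude)                        # large values along the edge columns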
Sigmoid Function A smooth and continuous thresholding function of the type

    s(x) = 1 / (1 + e^(-ax))

For large a, the function approaches a Heaviside step function at x = 0. The sigmoid is frequently used as a transfer function in artificial neural networks.
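A minimal Python sketch:

    import numpy as np

    def sigmoid(x, a=1.0):
        # s(x) = 1 / (1 + exp(-a x)); larger a approaches a step at x = 0
        return 1.0 / (1.0 + np.exp(-a * x))

    x = np.array([-2.0, -0.5, 0.0, 0.5, 2.0])
    print(sigmoid(x, a=1.0))
    print(sigmoid(x, a=20.0))   # nearly a Heaviside step function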
Signal Processing See Image Processing.
Significance of Test The significance of a test is the probability of rejecting good events in hypothesis testing (see Neyman-Pearson Diagram). It can also be defined as the probability of a type I error in hypothesis testing.
Simplex Method Under the name simplex method one understands a minimizing algorithm for general non-linear functions, due basically to Nelder and Mead [Nelder65]. More precisely, this is called the downhill simplex method. The simplex method is an efficient iterative algorithm to solve unconstrained minimization problems numerically for several but not too many variables. Quick convergence and intelligent choice of linearization of the function to be minimized are non-trivial key elements in general minimization algorithms. The simplex method does not use derivatives, analytic or numeric. It attempts to enclose the minimum inside an irregular volume defined by a simplex (= an n-dimensional convex volume bounded by (n-1)-dimensional hyperplanes and defined by n + 1 linearly independent corners, e.g. a tetrahedron for n = 3). The simplex size is continuously changed and mostly diminished, so that finally it is small enough to contain the minimum with the desired accuracy. The operations of changing the simplex optimally with respect to the minimal/maximal function values found at the corners of the simplex are contraction, expansion and reflection, each determining new simplex corner points by linear combinations of selected existing corner points. The details of the method are explained in [Nelder65]; the method is usually embedded in standard minimizing packages. A mathematical discussion of the downhill simplex method and of a (quite different) simplex method used in linear programming problems, is found in [Press95].
Simpson's Rule A simple rule providing limited-accuracy results for numerical integration; the rule uses function values at equidistant abscissae (or mesh points). For three points x0, x1 = x0 + h and x2 = x0 + 2h it reads

    integral from x0 to x2 of f(x) dx ≈ (h/3) (f(x0) + 4 f(x1) + f(x2))

See Numerical Integration, Quadrature, and [Press95].
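A minimal Python sketch of the composite rule on equidistant mesh points (the test integral is an assumption):

    import numpy as np

    def simpson(f, a, b, n=100):
        if n % 2:
            n += 1                        # Simpson's rule needs an even number of steps
        x = np.linspace(a, b, n + 1)
        y = f(x)
        h = (b - a) / n
        return h / 3 * (y[0] + y[-1] + 4 * y[1:-1:2].sum() + 2 * y[2:-1:2].sum())

    print(simpson(np.sin, 0.0, np.pi))    # exact value is 2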
Simulated Annealing A method of solving minimization problems with a very large number of free parameters, typically with an objective function that can be evaluated quickly.
Singular Value Decomposition Often abbreviated SVD: any (m,n) matrix A can be decomposed into

    A = U S V^T

where U is an (m,m) orthogonal matrix, V is an (n,n) orthogonal matrix, and S is an (m,n) diagonal matrix with real, non-negative elements in descending order, s_1 >= s_2 >= ... >= 0. The s_i are the singular values of A, and the first min(m,n) columns of U and V are the left and right singular vectors of A. If s_r > 0 and all following singular values are zero, then r is the rank of A; in this case S can be reduced to an (r,r) matrix, and U and V shrink accordingly. SVD can thus be used for rank determination.

The SVD provides a numerically robust solution to the least squares problem: the solution of min ||A x - b|| becomes

    x = V S^+ U^T b

where S^+ is obtained by inverting the non-zero singular values. For more details, see [Golub89] or [Press95].
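A minimal Python sketch of the SVD solution of a least squares problem (matrix, right-hand side and rank threshold are illustrative assumptions):

    import numpy as np

    rng = np.random.default_rng(0)
    A = rng.standard_normal((20, 3))
    b = A @ np.array([1.0, -2.0, 0.5]) + 0.01 * rng.standard_normal(20)

    U, s, Vt = np.linalg.svd(A, full_matrices=False)
    s_inv = np.where(s > 1e-10 * s[0], 1.0 / s, 0.0)   # invert only non-zero singular values
    x = Vt.T @ (s_inv * (U.T @ b))
    print(x)                                           # close to (1, -2, 0.5)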
Skewness The skewness, gamma, of a distribution is defined as the quotient of the third moment about the mean E(X) and the third power of the standard deviation sigma:

    gamma = E[(X - E(X))^3] / sigma^3

It vanishes for symmetric distributions and is positive (negative) if the distribution develops a longer tail to the right (left) of the mean E(X).
Small Samples Many statistical methods and theorems are considerably simplified in the asymptotic limit of large amounts of data, very precise measurements, and/or linear models. Unfortunately, research workers accustomed to using the simplified results may be unaware of the complications which arise in the general case, i.e. for small data samples. These complications are of various different origins:

Differences between the Bayesian and non-Bayesian approaches to statistics (see Bayesian Statistics), which are negligible for large data samples, become important for small samples, where the treatment of prior knowledge has a large effect on the quoted results.

Many methods assume that certain underlying distributions are Gaussian because of the central limit theorem, whereas for small samples this may not be true. For example, chi-square tests on histograms are valid only when there are enough events per bin (see Binning), so that the distribution in each bin is approximately Gaussian. For small samples, one must fall back on the multinomial distribution, which is much harder to handle.

Related to the above is the problem of giving correct confidence limits on variables when only a few events are available. In the non-Bayesian theory, the exact confidence limits are given for the Poisson distribution (for cross-sections) in [Regener51], and for the binomial distribution (for branching ratios) in [James80].

The observed value of a variable is often used in place of its expected value in statistical calculations, and this approximation may be poor for small samples. For example, the standard deviation of a Poisson distribution is exactly the square root of the expected value, but only approximately the square root of the observed value. Since the expected value depends on the hypothesis (or on the parameters of the fit), it is more convenient to use the observed value, which is constant during a fit and does not depend on the hypothesis. This introduces a bias for small samples, since a negative fluctuation will then be assigned a smaller error and larger weight than a positive fluctuation.

In calculating the usual chi-square, the observed number of events in a bin should be compared with the integral of the expectation over the bin, but one usually takes the value of the expectation in the middle of the bin and multiplies by the bin width. For small samples, bins may be wide and this approximation may be poor if the expectation is varying considerably.

When errors are propagated from measured quantities to derived quantities, the usual formula takes account only of the linear terms in the transformation of variables. This is a good approximation only when the transformation really is linear or when the errors are small. The calculation for the more general case can be quite complicated and generally gives rise to non-Gaussian errors on the derived quantity, even when the original measured quantities have Gaussian errors. An example of such a calculation by Monte Carlo methods is given in [James81].
Smoothing In digital image processing, noise removal or, more generally, image smoothing is one of the most important design goals of image enhancement. Most techniques, unfortunately, have the side effect of also blurring the image where this is undesirable, e.g. at sharp edges. Typical convolutional low-pass filter masks are an averaging mask (all coefficients equal) and a Gaussian mask (coefficients decreasing with distance from the centre). These methods reduce the noise, but usually they also blur the image at undesirable places. For larger masks (and if special convolver hardware is not available) it is preferable to do the convolution by passing to the frequency domain (see Convolution). Random noise can be removed while preserving edges by using the median filter.

A different smoothing mask, used for removing out-of-range pixels, computes the average of the eight neighbours of the centre pixel; it must be applied conditionally: to suppress impulse noise, the centre pixel is replaced by this average only if it differs from it by more than a given threshold.

Another method to smooth images while preserving sharp edges is given in [Kuwahara76]; the method compares four different areas around the centre pixel and replaces the centre pixel by the average of the most homogeneous area, the one with the smallest variance (both average and variance being computed over the pixels of each area). Other authors, e.g. [Nagao78], use more areas of different shapes. An extension of the median operation can be found in [Astola89], where several edge-preserving smoothing methods, based on the median operation and FIR filters, are described. Introductory textbooks are [Jain89], [Gonzalez87], [Pratt78], [Rosenfeld76].
Software Engineering The systematic application of terminology, methods and tools to achieve defined technical goals for a software-intensive system. The acceptance of software engineering principles is highly relevant to the creation of large programs in a team, in particular if these programs
- are to be run under different computer systems,
- have a long lifetime, and
- need to be adapted to changing problems.

In software engineering, the programming process is divided into phases which together are called the life cycle of the program. They are:
- requirements analysis and specification,
- architectural design,
- detailed design,
- implementation (coding),
- unit (module) testing,
- system testing (integration),
- operation and maintenance (evolution).

For each phase, methodologies have been used and described; where a customer-producer relation exists (viz. the users are not part of the same organization as the software suppliers), software standards have been defined ([Mazza94], [Fairclough96]). Despite a tradition that started in the 1970s, the field is still in rapid development, and few methods have been widely accepted. For further reading, browse the Internet, or see [Freeman76], [Wasserman80], [Lehmann80], [Booch87], [McDermid91], [Checkland92], [Marciniak94], [Humphrey95].
Sorting Sorting is the process of rearranging information in a way to make part of the information obey some specified ordering rules (more properly, sorting should be called ordering). Well-known examples are to arrange numerical information in ascending or descending order, or to order alphanumeric information alphabetically. The information used for sorting is often expected to be contained in a single computer word, particularly if numerical sorting is intended. Alphabetic sorting usually allows one to define the length of character strings to be considered in the sort. This information is called the sorting key. If an algorithm is to be used which does not allow for a sufficiently long sorting key, it may be used in several passes, starting with the least significant key; this trick assumes, however, that the algorithm preserves the original order if two keys are identical. Alternatively, a simple secondary sort may be applied to those (usually small) groups of data which the first sort has found with identical keys.

One talks about internal sorting if the information to be ordered can be contained in a computer's memory, and of external sorting if information on tape(s) or disk files is to be ordered.

Simple algorithms for internal sorting are a favourite playground for students in programming. Efficient algorithms for a large number of keys, instead, are non-trivial to write and should preferably be taken from a program library. The trivial bubble sort algorithm corresponds to what we do when we sort items by hand: pick up every unsorted item in turn and place it at the correct location in a slowly growing sorted pack. On the computer, of course, this is done by simple list processing techniques, to avoid frequent copying. Algorithms of this type use computer time increasing with n^2. What is usually offered in good programming libraries are algorithms that use time proportional to n log n, and hence are still efficient for large n (say n > 50). A well-known algorithm efficient for random order of keys at input is the Shell algorithm [Naur74]. It uses a number of bubble sort passes on subsets of the keys. At first, only pairs of elements at large distance are compared and, if necessary, interchanged. Later, distances decrease whereas the number of elements being bubble-sorted increases. Other methods use continuous splitting of the array to be sorted into smaller and smaller subarrays, which finally contain very few elements and are then ordered. The recombination of this tree structure results finally in the ordered array ([Scowen65]). Such algorithms often achieve time-efficiency at the expense of auxiliary intermediate storage. Efficient sorting algorithms and discussions can be found in the literature ([Flores69], [Knuth81], [Press95]). It should be noted that some algorithms make use of assumed properties of the input file, like the existence of already ordered sequences, and are inefficient (although correct) for data lacking these properties.
As the ordering keys, in the general case, carry a `` load'' of information to accompany them, internal sorting algorithms frequently do not reorder information in store. Their output, instead, is the sequence of pointers which corresponds to ordered access to the keys, hence permitting subsequent secondary sorts (e.g. further key words), or rearranging of any desired kind. In the case of external sorting, the usual approach is to sort as much of the external information as possible at a time by internal sorting passes, each time recording the result again externally on auxiliary files. The final single ordered file is then obtained by a number of merging passes. External sorting algorithms are not usually part of a program library, in fact the name algorithm is no longer adequate for a program in whose strategy system characteristics play a major role.
Spherical Coordinates The spherical coordinates (r, theta, phi) are normally defined in terms of the Cartesian coordinates (x,y,z) by:

    x = r sin(theta) cos(phi)
    y = r sin(theta) sin(phi)
    z = r cos(theta)

The matrix A giving the spherical coordinate unit vectors in terms of the Cartesian unit vectors (see Coordinate Systems) is then

    (  sin(theta)cos(phi)   sin(theta)sin(phi)    cos(theta) )
    (  cos(theta)cos(phi)   cos(theta)sin(phi)   -sin(theta) )
    (      -sin(phi)             cos(phi)              0     )

The volume element is dV = r^2 sin(theta) dr dtheta dphi, and the distance element is

    ds^2 = dr^2 + r^2 dtheta^2 + r^2 sin^2(theta) dphi^2

The above relationships hold when the angle theta is defined with respect to the z axis. It is sometimes convenient to define the angle with respect to the x-y plane, in which case theta is replaced by 90° - theta. This is the case when using astronomical coordinates, where the angle is the declination (elevation angle).
Spline Functions When approximating functions for interpolation or for fitting measured data, it is necessary to have classes of functions which have enough flexibility to adapt to the given data, and which, at the same time, can be easily evaluated on a computer. Traditionally polynomials have been used for this purpose. These have some flexibility and can be computed easily. However, for rapidly changing values of the function to be approximated, the degree of the polynomial has to be increased, and the result is often a function exhibiting wild oscillations.

The situation changes dramatically when the basic interval is divided into subintervals, and the approximating or fitting function is taken to be a piecewise polynomial. That is, the function is represented by a different polynomial over each subinterval. The polynomials are joined together at the interval endpoints (knots) in such a way that a certain degree of smoothness (differentiability) of the resulting function is guaranteed. If the degree of the polynomials is k, and the number of subintervals is n+1, the resulting function is called a (polynomial) spline function of degree k (order k+1) with n knots.

Splines are highly recommended for function approximation or data fitting whenever there is no particular reason for using a single polynomial or other elementary functions such as sine, cosine or exponential functions. For practical problems, spline functions have the following useful properties. They are:
- smooth and flexible,
- easy to store and manipulate on a computer,
- easy to evaluate, along with their derivatives and integrals,
- easy to generalize to higher dimensions.

The name spline function was introduced by Schönberg in 1946. The real explosion in the theory, and in practical applications, began in the early 1960s. Spline functions are used in many applications such as interpolation, data fitting, numerical solution of ordinary and partial differential equations (finite element method), and in curve and surface fitting. An early book about splines with programs is [Boor78]; a more recent publication is [Press95], see also [Flowers95]. An application of spline functions to track fitting is given in [Wind74].
Stack A programming concept much needed in nested or recursive operations. A stack can be defined as a list of items such that manipulations are done only at the beginning of the list (LIFO = last-in-first-out). Example: in following a generalized tree structure of data, operations at a given level are suspended when a new, lower level is encountered. The parameters describing the current level are then stacked for later continuation (after the lower-level operations have been terminated). The operations of entering/removing items to/from a stack are commonly called pushing and popping. See also [Maurer77]; see also FIFO.
Standard Deviation The standard deviation of a random variable is the positive square root of its variance.
Statistic If
are the elements of a sample, then any function
is called a statistic. It can be used to estimate parameters of the population from which the sample was taken ( Estimator), typically to perform statistical tests (chi-square test Student's Test), to test some Hypothesis Testing), or to classify an event into one of several categories ( Neymanhypothesis ( Pearson Diagram).
Stirling's Formula An approximation to the factorial function n! which is valid for large n:

    n! ≈ sqrt(2 pi n) (n/e)^n

The formula is good to 1% for n = 8. See also Gamma Function.
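A quick numerical check in Python:

    import math

    def stirling(n):
        return math.sqrt(2 * math.pi * n) * (n / math.e) ** n

    for n in (2, 8, 20):
        exact = math.factorial(n)
        print(n, exact, stirling(n), f"{100 * (exact - stirling(n)) / exact:.2f}% low")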
Stratified Sampling A variance-reducing technique used in Monte Carlo methods. An analogy is the systematic sampling of a (human) population in opinion polls: one usually chooses a representative group that parallels the entire population in some key characteristics, achieving a more meaningful result than by random sampling (which is, in any case, hard to achieve for a human population). In stratified sampling, the volume to be sampled over is split into intervals or subvolumes, and each interval is sampled with its own sample size and, possibly, according to the techniques most adapted to the function in this interval. The contributions are not added at the level of individually sampled points; instead, partial sums are added with appropriate weights. For further reading, see [Press95].
Structured Programming A (historical?) stepping stone in development methods for designing and implementing large programs (see Software Engineering), structured programming was introduced to improve program readability. It is defined, not very rigidly, by two key rules:
- a top-down approach, characterized by development of control structures for major subsystems before defining their components, which in turn are developed at their highest level before their details are considered;
- a deliberate restriction in the use of nesting and branching constructs inside the programming language used. Ideally, only statement concatenation (bracketing), selection of statements based on the testing of a condition, and iteration are permitted [Dijkstra68].

In practice, these rules have led professionals to abandon languages too close to the machine instructions, and to produce programs containing only tree-like calling structures of program parts and with few backwards GO TO statements, preferably none at all. The intended advantages are the ease of writing and maintaining programs, i.e. a general speed-up of defining, implementing, debugging and documenting programs. When using the still preferred high-level language for writing large scientific programs, FORTRAN, some constructs typical for structured programming exist, but much additional discipline is recommended to apply the rules of structured programming. More recently, object-oriented programming has emerged, which embodies many of the principles of structured programming ([Katzan79], [Metcalf82], [Metcalf96], [Ross96]).
Student's Distribution If X1, X2, ..., XN is a sample of size N drawn from a normal distribution with mean E(X) and variance sigma^2, then

    Xbar = (1/N) sum_i X_i   and   s^2_Xbar = (1/(N(N-1))) sum_i (X_i - Xbar)^2

are estimators of the mean E(X) and of the variance of the estimator Xbar. The quotient

    t = (Xbar - E(X)) / s_Xbar

is described by Student's distribution (also called the t-distribution) with f = N-1 degrees of freedom. Its probability density is

    f(t) = Gamma((f+1)/2) / (Gamma(f/2) sqrt(pi f)) * (1 + t^2/f)^(-(f+1)/2)

where Gamma denotes Euler's gamma function. The t-distribution has mean 0 (for f > 1) and variance f/(f-2) (for f > 2). For f -> infinity it approaches the standard normal distribution.
Student's Test A quantity is determined from a sample of N measurements, and the resulting mean is to be compared to an a priori value lambda0. The mean and variance are estimated from the sample,

    Xbar = (1/N) sum_i X_i,    s^2 = (1/(N-1)) sum_i (X_i - Xbar)^2

A test statistic T is defined by T = (Xbar - lambda0) sqrt(N) / s, which follows Student's distribution with N-1 degrees of freedom, and can be compared to its quantiles for three different hypotheses (Xbar greater than lambda0, Xbar smaller than lambda0, or a two-sided deviation from lambda0). Tables of the quantiles t_alpha, etc., of the t-distribution can be found in the literature (e.g. [Brandt83]).
Successive Over-Relaxation See Linear Equations, Iterative Solutions.
T-Distribution, T-Test See Student's Distribution, Student's Test.
Template Matching As a measure of how well an arbitrary pattern of greyvalues, a template g(x,y), matches a given image f(x,y), one uses a distance function (see Metric), e.g. the maximum, the absolute (city-block) or the Euclidean distance between template and image over the template area G. The minima of these measures give the best match. In the case of the Euclidean distance, the squared distance expands into terms that are constant plus a cross term, so that the maximum of the ``cross-correlation'' of f and g over G is the best match, the other terms being constant. This cross-correlation yields a result only if the sum (integral) is computed over the whole area G. In the discrete case it can be used directly if the variation in the energy of the image f can be ignored; otherwise the normalized cross-correlation has to be used. The normalized cross-correlation takes the same amount of computing time for any degree of mismatch, whereas the computation of the other two measures can be halted as soon as the misregistration exceeds a given threshold. For more reading, see e.g. [Pratt78] or [Rosenfeld76].
Thresholding Thresholding describes the operation of setting values below a given threshold to zero. This may concern all pixels in an image, or amplitudes in a digital signal. Sometimes, the term implies that values above the threshold are set to one, creating a binary image or signal. Thresholding is often applied to suppress noise, in situations where the signal-to-noise ratio is large. If a high fraction of channels contains only low-amplitude noise, thresholding produces sparse information and may be a powerful step towards data compression. Thresholding with some very simple encoding scheme, like transmitting the sparse channels along with their channel number, is often referred to as zero suppression .
Training Sample A sample of events, usually obtained by Monte Carlo methods, which is representative of a class of events, i.e. exhibits sufficiently the properties ascribed to this class; a training sample is typically used to optimize some algorithm or coefficients of a representation, e.g. ([Bishop95]) the weights in an artificial neural network. The performance of the resulting algorithm must be checked by using it on an independent validation sample .
Transformation of Random Variables If X is a random variable described by the probability density f(x), and if Y = Y(X), then

    g(y) = f(x) |dx/dy|

is the probability density of Y. For a transformation of several random variables (X1, ..., Xn) into (Y1, ..., Yn) one has

    g(y1, ..., yn) = f(x1, ..., xn) |J|

where J = ∂(x1, ..., xn) / ∂(y1, ..., yn) is the Jacobian or Jacobi Determinant of the transformation.
Trimming Trimming a data sample consists in removing the n members having the n/2 largest values and the n/2 smallest values of a given parameter. The trimmed mean is the mean value of a parameter for a data sample ignoring the n extreme values. The even positive integer n determines the amount of trimming. When n is one less than the size of the data sample, the trimming is maximum, and the trimmed mean is just the median. Trimming makes a calculation more robust, i.e. less sensitive to outliers, at the expense of reduced statistical efficiency. See also Winsorization.
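A minimal Python sketch of a trimmed mean (the data, with one outlier, are an illustrative assumption):

    import numpy as np

    def trimmed_mean(sample, n):
        # drop the n/2 smallest and n/2 largest values (n even) before averaging
        s = np.sort(np.asarray(sample, dtype=float))
        k = n // 2
        return s[k:len(s) - k].mean()

    data = [1.0, 1.2, 0.9, 1.1, 1.0, 25.0]       # one outlier
    print(np.mean(data), trimmed_mean(data, 2))  # the trimmed mean ignores the outlier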
Truly Random Numbers Truly random numbers can only be generated by a physical process and cannot be generated in any standard digital computer. This makes it rather clumsy to use them in Monte Carlo calculations, since they must first be generated in a separate device and either sent to the computer or recorded (for example on magnetic tape) for later use in calculations. One such tape, containing 2.5 million random 32-bit floating point numbers generated using radioactive decay, may be obtained from the Argonne National Laboratory Code Center, Argonne, Illinois 60439, USA. Magnetic tapes containing more extensive sets of truly random digits generated by a similar device are available from Okayama, Japan [Inoue83]. For general information, see Random Numbers.
Tuple A finite sequence of elements, occurring in a prescribed order. An n-tuple is a sequence of n elements; Cartesian coordinates (x, y, z) in Euclidean space are a 3-tuple, a personnel file may contain a 20-tuple for each employee, like name, birthday, phone number, etc. An n-tuple may contain real numbers, and then is equivalent to an n-vector, and any n-tuple may be stored as a record; these notions are, in fact, largely overlapping (at least in everyday computer jargon).
Type-I Error
For type-I error and type-II error, see Hypothesis Testing.
Unfolding
Synonymous with deconvolution; see Convolution. For more detail, see [Blobel85].
Uniform Distribution
This is the simplest distribution of a continuous random variable; its probability density function is

  f(x) = \frac{1}{b-a} \quad \text{for } a \le x \le b, \qquad f(x) = 0 \text{ otherwise.}

It has two parameters: one can take the mean (a+b)/2 and the width (b-a), or the boundaries a and b. The mean of the distribution is (a+b)/2 and its variance is (b-a)^2/12; the parameters are therefore estimated by taking a sample X_1, \ldots, X_N and forming the estimators

  \widehat{(a+b)/2} = \bar{X}, \qquad \widehat{(b-a)} = \sqrt{12\, S^2},

with \bar{X} the sample mean and S^2 the sample variance. Simple (but biased) estimators of a and b are, of course, \min_i X_i and \max_i X_i.
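A short numerical sketch of these estimators (illustration added here; the true boundaries are invented):

import numpy as np

rng = np.random.default_rng(seed=2)
a_true, b_true = 2.0, 5.0
x = rng.uniform(a_true, b_true, size=1_000)

# Moment estimators: mean = (a+b)/2 and variance = (b-a)^2 / 12.
mean, var = x.mean(), x.var(ddof=1)
width = np.sqrt(12.0 * var)
print("from moments :", mean - width / 2.0, mean + width / 2.0)

# Simple (but biased) estimators: the sample extremes.
print("from extremes:", x.min(), x.max())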
Validation Sample
See Training Sample.
Variance
The variance of a random variable X is the second moment about the expectation value E(X):

  \sigma^2 = \mathrm{Var}(X) = E\bigl[(X - E(X))^2\bigr] = E(X^2) - \bigl(E(X)\bigr)^2 .

An estimator for the variance of a sample is the (see Estimator) sample variance.
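As a small numerical illustration (added here; the data values are invented), the sample variance with the usual 1/(N-1) normalization:

import numpy as np

x = np.array([4.1, 3.9, 4.3, 4.0, 4.2])

# Second moment about the sample mean, with the 1/(N-1) factor that makes
# the sample variance an unbiased estimator of the variance.
n = len(x)
s2 = np.sum((x - x.mean()) ** 2) / (n - 1)

print(s2, np.var(x, ddof=1))   # the two values agree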
Wavelet Transform
Wavelets in general are functions that can be used to represent other functions efficiently. Wavelet representation is a fairly recent technique (see [Daubechies96]) and is closely connected to image representation; older textbooks will not be helpful. The discrete wavelet transform is defined by a square matrix of filter coefficients, transforming an array into a new array, usually of the same length. The transform is chosen such that in the transform space the information is sparse, inviting compression. If correctly constructed, the matrix is orthogonal, and in this case not only the transform but also its inverse can be easily implemented (see [Press95]). The wavelet transform resembles the Fourier transform in many respects, but it is non-sinusoidal, and the scale of submatrices can be adapted to the problem at hand, viz. small and local features can be represented as well as overall and global characteristics. Usually a signal is looked at in the time domain x(t) or in some transform domain X(f); in the case of the Fourier transform the two alternatives and their relation are

  X(f) = \int_{-\infty}^{+\infty} x(t)\, e^{-2\pi i f t}\, \mathrm{d}t , \qquad x(t) = \int_{-\infty}^{+\infty} X(f)\, e^{+2\pi i f t}\, \mathrm{d}f .

Normally, there is no information about the time localization of frequency components (as there is, e.g., in musical notation). Wavelet transforms can be interpreted as a mapping of x(t) into a two-dimensional function of time and frequency. Essentially they decompose x(t) into a family of functions which are well localized in time and not of infinite duration like the sine and cosine functions of a Fourier transform. This is done by choosing a ``mother'' wavelet w(x) and translating and dilating it (the figure accompanying the original entry is not reproduced here).
If one chooses, e.g., a box function as the mother wavelet, one obtains the simplest of wavelet transforms, the Haar transform.
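A minimal sketch of one level of that simplest case (added here for illustration): pairwise sums and differences of neighbouring samples, normalized so that the transform is orthogonal and therefore trivially invertible.

import numpy as np

def haar_step(signal):
    # One level of the Haar wavelet transform: pairwise averages (smooth part)
    # and pairwise differences (detail part), each scaled by 1/sqrt(2).
    s = np.asarray(signal, dtype=float)
    smooth = (s[0::2] + s[1::2]) / np.sqrt(2.0)
    detail = (s[0::2] - s[1::2]) / np.sqrt(2.0)
    return smooth, detail

def haar_inverse_step(smooth, detail):
    out = np.empty(2 * len(smooth))
    out[0::2] = (smooth + detail) / np.sqrt(2.0)
    out[1::2] = (smooth - detail) / np.sqrt(2.0)
    return out

x = np.array([4.0, 4.0, 8.0, 8.0, 1.0, 1.0, 3.0, 5.0])
s, d = haar_step(x)
print(d)                                         # sparse where x is locally constant
print(np.allclose(haar_inverse_step(s, d), x))   # orthogonal, hence exactly invertible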
Weighted Mean
If X_1, \ldots, X_n are several independent unbiased measurements of a physical quantity, and if the measurements have the standard deviations \sigma_1, \ldots, \sigma_n, then the weighted mean or weighted average

  \bar{X} = \frac{\sum_i w_i X_i}{\sum_i w_i}

is an unbiased estimator of the quantity if the weights w_i are independent of the X_i. The variance of \bar{X} is

  \sigma^2(\bar{X}) = \frac{\sum_i w_i^2 \sigma_i^2}{\left(\sum_i w_i\right)^2} .

The minimal variance

  \sigma^2(\bar{X}) = \frac{1}{\sum_i 1/\sigma_i^2}

is obtained with the weights

  w_i = \frac{1}{\sigma_i^2} .

If the individual standard deviations are all equal to \sigma, the weighted average with these weights reduces to the (see Mean) arithmetic mean, with the variance \sigma^2 / n. If all X_i are Gaussian random variables, so is the weighted mean (see Convolution).
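A short numerical sketch with the minimum-variance weights w_i = 1/\sigma_i^2 (added here; the measured values are invented):

import numpy as np

# Hypothetical independent measurements of the same quantity, with
# different standard deviations.
x     = np.array([10.2,  9.8, 10.5, 10.0])
sigma = np.array([ 0.3,  0.2,  0.6,  0.4])

# Minimum-variance weights w_i = 1 / sigma_i^2.
w = 1.0 / sigma**2
xbar = np.sum(w * x) / np.sum(w)
var_xbar = 1.0 / np.sum(w)

print(f"weighted mean = {xbar:.3f} +- {np.sqrt(var_xbar):.3f}")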
Width
The width of a statistical distribution is not unambiguously defined in the literature, although the term is frequently used by physicists. Sometimes width is used as a synonym for standard deviation, i.e. the positive square root of the variance. More correctly, for empirical distributions like histograms obtained from experiment, the width is frequently used as an abbreviation for full width at half maximum. The latter is a relevant parameter of the Breit-Wigner (or Lorentz or Cauchy) distribution.
Winsorization
A procedure similar to trimming, but instead of discarding the n extreme values, they are replaced by the two remaining extreme values: the n/2 smallest values by the smallest remaining value, and the n/2 largest values by the largest remaining value. That is, the extreme values are moved towards the centre of the distribution. This technique is sensitive to the number of outliers, but not to their actual values.
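A minimal sketch (added here; the data values are invented) that contrasts Winsorization with trimming:

import numpy as np

def winsorize(sample, n):
    # Replace the n/2 smallest and the n/2 largest values by the nearest
    # remaining values (n even); returns the Winsorized values in sorted order.
    if n % 2 or n <= 0:
        raise ValueError("n must be an even positive integer")
    ordered = np.sort(np.asarray(sample, dtype=float))
    k = n // 2
    ordered[:k] = ordered[k]
    ordered[-k:] = ordered[-k - 1]
    return ordered

data = [9.8, 10.1, 10.0, 9.9, 10.2, 97.0]   # one gross outlier
print(winsorize(data, 2))                   # the outlier is pulled in to 10.2
print(np.mean(winsorize(data, 2)))          # close to the uncontaminated mean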
Zero Suppression
See Thresholding.
References
Abramowitz74: M. Abramowitz and I.A. Stegun (Eds.), Handbook of Mathematical Functions, National Bureau of Standards, Dover, New York, 1974.
Ahmed75: N. Ahmed and K.R. Rao, Orthogonal Transforms for Digital Signal Processing, Springer, Berlin, Heidelberg, 1975.
Ajmone94: M. Ajmone-Marsan et al., Modelling with Generalized Stochastic Petri Nets, Wiley, New York, 1994.
Ames77: W.F. Ames, Numerical Methods for Partial Differential Equations, Academic Press, New York, 1977.
Anderson92: E. Anderson et al., LAPACK Users' Guide, Society for Industrial and Applied Mathematics, Philadelphia, 1992.
Astola89: J. Astola, P. Haavisto, and Neuvo, Detail Preserving Monochrome and Color Image Enhancement Algorithms, in: From Pixels to Features, J.C. Simon (Ed.), Elsevier, Amsterdam, 1989.
Barrett81: H.H. Barrett and W.S. Swindell, Radiological Imaging, Academic Press, New York, 1981.
Beale91: R. Beale and T. Jackson, Neural Computing: An Introduction, Institute of Physics Publishing, Bristol, 1991.
Beauchamp87: K.G. Beauchamp, Transforms for Engineers, Clarendon, Oxford, 1987.
Beasley93: D. Beasley, D.R. Bull, and R.R. Martin, An Overview of Genetic Algorithms, University Computing 15(2) (1993) 58.
Bishop95: C.M. Bishop, Neural Networks for Pattern Recognition, Oxford University Press, Oxford, 1995.
Blobel84: V. Blobel, Least Squares Methods and Function Minimization, in: Formulae and Methods in Experimental Data Evaluation, Vol. 3, European Physical Society, CERN, 1984.
Blobel85: V. Blobel, Unfolding Methods in High-energy Physics Experiments, CERN School of Computing, CERN 85-09 (1985).
Boor78: C. de Boor, A Practical Guide to Splines, Springer, Berlin, Heidelberg, 1978.
Booch87: G. Booch, Software Engineering with ADA, Benjamin/Cummings, Menlo Park, 1987.
Bowers91: D.S. Bowers, From Data to Database, Van Nostrand Reinhold, 1991.
Braaten79: E. Braaten and G. Weller, An Improved Low-discrepancy Sequence for Multidimensional Quasi-Monte Carlo Integration, J. Comp. Phys. 33 (1979) 249.
Brandt83: S. Brandt, Statistical and Computational Methods in Data Analysis, North Holland, 1983.
Branham90: R.L. Branham, Scientific Data Analysis: An Introduction to Overdetermined Systems, Springer, Berlin, Heidelberg, 1990.
Breit36: G. Breit and E. Wigner, Capture of Slow Neutrons, Phys. Rev. 49 (1936) 519.
Breit59: G. Breit, Theory of Resonance Reactions, in: Handbuch der Physik XLI/1, Springer, Berlin, Heidelberg, 1959.
Bromley86: K. Bromley (Ed.), Highly Parallel Signal Processing Architectures, SPIE Critical Review of Technology Series 19 (1986) 614.
Buchberger83: B. Buchberger et al., Computer Algebra: Symbolic and Algebraic Computation, Springer, Berlin, Heidelberg, 1983.
Budd91: T. Budd, An Introduction to Object-oriented Programming, Addison Wesley, 1991.
Canny83: J.F. Canny, Finding Edges and Lines in Images, Master Thesis, MIT, 1983.
Cassel80: D.G. Cassel and H. Kowalski, Pattern Recognition in Layered Track Chambers Using a Tree Algorithm, DESY Report 80/107.
Chartrand85: G. Chartrand, Introductory Graph Theory, Dover, 1985.
Char91: B.W. Char et al., The Maple V Language Reference Manual, Springer, Berlin, Heidelberg, 1991.
Checkland92: P. Checkland and J. Scholes, Soft System Methodology in Action, Wiley, New York, 1992.
Courant62: R. Courant and D. Hilbert, Methods of Mathematical Physics, Wiley, New York, 1962.
Datapro83: The EDP Buyer's Guide, Datapro Research Corp., Delran, N.J., 1983. There are multiple up-to-date publications and services offered by Datapro (excellent access via the Internet).
Daubechies96: I. Daubechies, Where Do Wavelets Come From?, Proceedings of the IEEE, Special Issue on Wavelets, 84/4, 1996.
Davenport88: J.H. Davenport, Y. Siret, and E. Tournier, Computer Algebra: Systems and Algorithms for Algebraic Computation, Academic Press, New York, 1988.
Deitel94: H.M. Deitel and P.J. Deitel, C++ How to Program, Prentice Hall, 1994.
Deo74: N. Deo, Graph Theory with Applications to Engineering and Computer Science, Prentice Hall, 1974.
Dijkstra68: E.W. Dijkstra, Go To Considered Harmful, Comm. of the ACM, March 1968.
Dongarra79: J.J. Dongarra, J. Bunch, C.B. Moler, and G. Stewart, LINPACK User's Guide, SIAM, Philadelphia, 1979.
Dougherty92: E.R. Dougherty, An Introduction to Morphological Image Processing, Tutorial Texts in Optical Engineering TT9, SPIE Optical Engineering Press, 1992.
Drijard80: D. Drijard et al., On the Reduction in Space Resolution of Track Detectors Caused by Correlations in the Coordinate Quantization, Nucl. Instrum. Methods 176 (1980) 389.
Eadie71: W.T. Eadie et al., Statistical Methods in Experimental Physics, North Holland, 1971.
Edwards93: L.K. Edwards (Ed.), Applied Analysis of Variance in Behavioral Science, Statistics Textbooks and Monographs, Vol. 137, Marcel Dekker Inc., New York, 1993.
Efron79: B. Efron, Computers and the Theory of Statistics, SIAM Rev. 21 (1979) 460.
Efron82: B. Efron, The Jackknife, the Bootstrap and Other Resampling Plans, SIAM, Bristol, 1982.
Fairclough96: J. Fairclough (Ed.), Software Engineering Guides, Prentice Hall, 1996.
Flores69: I. Flores, Computer Sorting, Prentice Hall, 1969.
Flowers95: B.H. Flowers, An Introduction to Numerical Methods in C++, Oxford University Press, Oxford, 1995.
Freeman76: P. Freeman, Software Engineering, Springer, Berlin, Heidelberg, 1976.
Frei77: W. Frei and C.C. Chen, Fast Boundary Detection: A Generalization and a New Algorithm, IEEE Trans. on Computers, Oct. 1977.
Frühwirth97: R. Frühwirth, Track Fitting with Non-Gaussian Noise, Comp. Phys. Comm. 100 (1997) 1.
Gentleman81: W.M. Gentleman and H.T. Kung, Matrix Triangularization by Systolic Arrays, SPIE Real-Time Signal Processing IV 298 (1981).
GML83: GML Corporation, Information Services: Computer Review, Lexington, Mass., 1983.
Golub89: G.H. Golub and C.F. van Loan, Matrix Computations, 2nd edn., The Johns Hopkins University Press, 1989.
Gonzalez87: R.C. Gonzalez and P. Wintz, Digital Image Processing, Addison-Wesley, 1987.
Goodman68: J.W. Goodman, Introduction to Fourier Optics, McGraw-Hill, 1968.
Grimmett86: G. Grimmett and D. Welsh, Probability, an Introduction, Oxford University Press, Oxford, 1986.
Grimmett92: G.R. Grimmett and D.R. Stirzaker, Probability and Random Processes, Oxford University Press, Oxford, 1992.
Hall76: G. Hall and J.M. Watt, Modern Numerical Methods for Ordinary Differential Equations, Clarendon, Oxford, 1976.
Hammersley64: J.M. Hammersley and D.C. Handscomb, Monte Carlo Methods, Methuen, London, 1964.
Haralick87: R.M. Haralick, S.R. Sternberg, and X. Zhuang, Image Analysis Using Mathematical Morphology, IEEE Trans. on Pattern Analysis and Machine Intelligence 9-4 (1987).
Haykin91: S. Haykin, Adaptive Filter Theory, Prentice Hall, 1991.
Hearn95: A.C. Hearn, REDUCE User's Manual, RAND publication CP78, July 1995.
Hennessy90: J.L. Hennessy and D.A. Patterson, Computer Architectures: A Quantitative Approach, Morgan Kaufmann Publishers, 1990.
Hopfield86: J.J. Hopfield and D.W. Tank, Science 239 (1986) 625.
Horn97: D. Horn, Neural Computing Methods and Applications, summary talk AIHEP96, Nucl. Instrum. Methods Phys. Res. A389 (1997) 2.
Hough59: P.V.C. Hough, Machine Analysis of Bubble Chamber Pictures, International Conference on High Energy Accelerators and Instrumentation, CERN, 1959.
Humpert90: B. Humpert, A Comparative Study of Neural Network Architectures, Comp. Phys. Comm. 57 (1990) 223.
Humphrey95: W.A. Humphrey, A Discipline for Software Engineering, Addison Wesley, 1995.
Inoue83: H. Inoue et al., Random Numbers Generated by a Physical Device, Appl. Stat. 32 (1983) 115.
Jain89: A.K. Jain, Fundamentals of Digital Image Processing, Prentice Hall, 1989.
James80: F. James and M. Roos, Errors on Ratios of Small Numbers of Events, Nucl. Phys. B172 (1980) 475.
James81: F. James, Determining the Statistical Significance of Experimental Results, in CERN Report 8103 (1981).
James83: F. James, Fitting Tracks in Wire Chambers Using the Chebyshev Norm instead of Least Squares, Nucl. Instrum. Methods Phys. Res. 211 (1983) 145.
Kalos86: M.H. Kalos and P.A. Whitlock, Monte Carlo Methods, Wiley, New York, 1986.
Katzan79: H. Katzan, FORTRAN 77, Van Nostrand Reinhold Co., 1979.
King92: T. King, Dynamic Data Structures, Academic Press, San Diego, 1992.
Knuth81: D.E. Knuth, The Art of Computer Programming, Addison-Wesley, 1981.
Kuipers74: L. Kuipers and H. Niederreiter, Uniform Distribution of Sequences, Wiley, New York, 1974.
Kung79: H.T. Kung and C.E. Leiserson, Systolic Arrays for VLSI, in: Sparse Matrix Proceedings, 1978, SIAM, Philadelphia, 1979.
Kung88: S.Y. Kung, VLSI Array Processors, Prentice Hall, 1988.
Kunt80: M. Kunt, Traitement Numérique des Signaux, Editions Georgi, St.-Saphorin, 1980.
Kunt84: M. Kunt, Atelier de Traitement Numérique des Signaux, Presses Polytechniques Romandes, 1984.
Kuwahara76: M. Kuwahara et al., Processing of RI-angio-cardiographic Images, Digital Processing of Biomedical Images, Plenum Press, 1976.
Landau44: L. Landau, J. Physics (USSR) 8 (1944) 201. Also: Collected Papers, D. ter Haar (Ed.), Pergamon Press, Oxford, 1965.
Lee86: J. Lee, R.M. Haralick, and L.G. Shapiro, Morphological Edge Detectors, Proc. 8th ICPR, Paris, 1986.
Lehmann80: M.M. Lehmann, Programs, Life Cycles, and Laws of Software Evolution, Proceedings IEEE 68 (1980) 9.
Lindfield95: G. Lindfield and J. Penny, Numerical Methods Using MATLAB, Ellis Horwood Limited, 1995.
Loney94: K. Loney, Oracle DBA Handbook, McGraw-Hill, 1994.
MACSYMA87: Macsyma User's Guide, Symbolics Inc., Cambridge, Mass., 1987.
Maragos87: P. Maragos, Tutorial on Advances in Morphological Image Processing and Analysis, in: Optical Engineering, SPIE 26-7 (1987).
Marciniak94: J.J. Marciniak (Ed.), Encyclopedia of Software Engineering, John Wiley, New York, 1994.
Marsaglia85: G. Marsaglia, A Current View of Random Number Generators in Computer Science and Statistics, Elsevier, Amsterdam, 1985.
MATLAB97: MATLAB 5 Reference Guide, The MathWorks, Inc., 24 Prime Park Way, Natick (MA), 1997.
Maurer77: H.A. Maurer, Data Structures and Programming Techniques, Prentice Hall, 1977.
Mayne81: A. Mayne, Data Base Management Systems, NCC Publications, National Computing Centre, Manchester, 1981.
Mazza94: C. Mazza et al., Software Engineering Standards, Prentice Hall, 1994.
McClellan73: J.H. McClellan, The Design of 2-D Digital Filters by Transformation, Proc. 7th Annual Princeton Conference on Information Science and Systems, 1973.
McDermid91: J.A. McDermid (Ed.), Software Engineer's Reference Book, Butterworth-Heinemann, 1991.
McNamara82: J.E. McNamara, Technical Aspects of Data Communication, Digital Press, 1982.
McWhirter83: G.J. McWhirter, Recursive Least Squares Minimisation Using a Systolic Array, Proc. SPIE Real-Time Signal Processing VI 431 (1983).
Metcalf82: M. Metcalf, FORTRAN Optimization, Academic Press, New York, 1982.
Metcalf96: M. Metcalf and J. Reid, FORTRAN 90/95 Explained, Oxford University Press, Oxford, 1996.
Metropolis53: N. Metropolis et al., Journal Chem. Phys. 21 (1953) 1087.
Meyer88: B. Meyer, Object-oriented Software Construction, Prentice Hall, New York, 1988.
Milne49: W.E. Milne, Numerical Calculus, Princeton University Press, Princeton, New Jersey, 1949.
Moyal55: J.E. Moyal, Theory of Ionization Fluctuations, Phil. Mag. 46 (1955) 263.
Murata89: T. Murata, Petri Nets: Properties, Analysis and Applications, Proceedings of the IEEE, 77/4, p. 541, 1989.
Nagao78: M. Nagao and T. Matsuyama, Edge Preserving Smoothing, Proc. 4th Int. Conf. on Pattern Recognition, Kyoto, 1978.
Naur74: P. Naur, Concise Survey of Computer Methods, Studentlitteratur, Lund, 1974.
NBS52: National Bureau of Standards, Applied Mathematics Series 9, Tables of Chebyshev Polynomials S_n(x) and C_n(x), United States Government Printing Office, Washington, 1952.
Nelder65: J.A. Nelder and R. Mead, A Simplex Method for Function Minimization, Computer Journal 7 (1965) 308.
NIHimage96: Public Domain NIH Image program, developed at the U.S. National Institutes of Health, available via Internet by anonymous FTP from zippy.nimh.nih.gov.
O'Connel74: M.J. O'Connel, Search Program for Significant Variables, Comp. Phys. Comm. 8 (1974) 49.
Oppenheim75: A.V. Oppenheim and R.W. Schafer, Digital Signal Processing, Prentice Hall, 1975.
Phelps86: M.E. Phelps et al., Positron Emission Tomography and Autoradiography, Raven Press, New York, 1986.
Pratt78: W.K. Pratt, Digital Image Processing, Wiley, New York, 1978.
Press95: W.H. Press, S.A. Teukolsky, W.T. Vetterling, and B.P. Flannery, Numerical Recipes in C, 2nd edn., Cambridge University Press, 1995. (The same book exists for the Fortran language; there is also an Internet version.)
PROG79: Programs for Digital Signal Processing, edited by the Digital Signal Processing Committee, IEEE Acoustics, Speech and Signal Processing Society, IEEE Press, 1979.
Provencher82: S.W. Provencher, A Constrained Regularization Method for Inverting Data Represented by a Linear Algebraic or Integral Equation, Comp. Phys. Comm. 27 (1982) 213.
Rabbani91: M. Rabbani and P.W. Jones, Digital Image Compression Techniques, SPIE Optical Engineering Press, Tutorial Text TT7, The International Society for Optical Engineering, Bellingham, Washington, USA, 1991.
Rabiner75: L.R. Rabiner and B. Gold, Theory and Application of Digital Signal Processing, Prentice Hall, 1975.
Raghavan93: R. Raghavan, Cellular Automata in Pattern Recognition, Information Sciences 70 (1993) 145.
Ralston78a: A. Ralston and Ph. Rabinowitz, A First Course in Numerical Analysis, McGraw-Hill, 1978.
Ralston78b: M.L. Ralston and R.I. Jennrich, Dud, a Derivative-free Algorithm for Non-linear Least Squares, Technometrics 20-1 (1978) 7.
Rayna87: G. Rayna, REDUCE Software for Algebraic Computation, Springer, Berlin, Heidelberg, 1987.
Regener51: V.H. Regener, Statistical Significance of Small Samples of Cosmic Ray Counts, Phys. Rev. 84 (1951) 161.
Reisig85: W. Reisig, Petri Nets: An Introduction, Monographs on Theoretical Computer Science, Springer, Berlin, Heidelberg, 1985.
Rey78: W.J.J. Rey, Robust Statistical Methods, Springer, Berlin, Heidelberg, 1978.
Rosenfeld76: A. Rosenfeld and A.C. Kak, Digital Picture Processing, Computer Science and Applied Mathematics, Academic Press, New York, 1976.
Ross96: P.W. Ross (Ed.), The Handbook of Software for Engineers, CRC Press, 1996.
Scowen65: R.S. Scowen, QUICKERSORT, Algorithm 271, Comm. of the ACM 8 (1965) 669.
Serra80: J. Serra, Image Analysis and Mathematical Morphology, Academic Press, New York, 1980.
Shen92: J. Shen and S. Castan, An Optimal Linear Operator for Step Edge Detection, Computer Vision, Graphics and Image Processing 54 (1992).
Sivia96: D.S. Sivia, Data Analysis: A Bayesian Tutorial, Oxford University Press, Oxford, 1996.
Skiena90: S. Skiena, Implementing Discrete Mathematics: Combinatorics and Graph Theory with Mathematica, Addison-Wesley, 1990.
Smith76: B.T. Smith, J.M. Boyle, J.M. Dongarra, J.J. Garbow, Y. Ikebe, Klema, and C.B. Moler, Matrix Eigensystems Routines: EISPACK Guide, 2nd edn., Springer, Berlin, Heidelberg, New York, 1976.
Strang88: G. Strang, Linear Algebra and its Applications, 3rd edn., Harcourt Brace Jovanovich College Publishers, 1988.
Vitter87: J.S. Vitter and W.C. Chen, Design and Analysis of Coalesced Hashing, Oxford University Press, Oxford, 1987.
Wasserman80: A.I. Wasserman, Information Systems Design Methodology, Journ. Am. Soc. for Inf. Science 31, No. 1 (1980). Reprinted in FREE80.
Weszka79: J.S. Weszka and A. Rosenfeld, Histogram Modification for Threshold Selection, IEEE Trans. SMC-9 (1979).
Whitehouse85: H.J. Whitehouse, J.M. Speiser, and K. Bromley, Signal Processing Applications of Concurrent Array Processor Technology, in: VLSI and Modern Signal Processing, Prentice Hall, 1985.
Wind74: H. Wind, Momentum Analysis by Using a Quintic Spline Model for the Track, Nucl. Instrum. Methods 115 (1974) 431.
Wolfram86: S. Wolfram (Ed.), Theory and Applications of Cellular Automata, World Scientific Press, 1986.
Wolfram91: S. Wolfram, Mathematica, Addison-Wesley, 1991.
Wong92: S.S.M. Wong, Computational Methods in Physics and Engineering, Prentice Hall, 1992.
Young71: D.M. Young, Iterative Solution of Large Linear Systems, Academic Press, New York, 1971.
Zahn71: C.T. Zahn, Graph-Theoretical Methods for Detecting and Describing Gestalt Clusters, IEEE Trans. Computers C20 (1971) 68.
Zahn73: C.T. Zahn, Using the Minimum Spanning Tree for Recognising Dotted and Dashed Curves, Proceedings International Computing Symposium, Davos, 1973, p. 381.
Zakrzewska78: K. Zakrzewska et al., A Numerical Calculation of Multidimensional Integrals, Comp. Phys. Comm. 14 (1978) 299.
Zaremba72: S.K. Zaremba (Ed.), Applications of Number Theory to Numerical Analysis, Academic Press, New York, 1972.
Rudolf K. Bock, 7 April 1998