Random Graphs

The book is devoted to the study of classical combinatorial structures, such as random graphs, permutations, and systems of random linear equations in finite fields.
The author shows how the application of the generalized scheme of allocation in the study of random graphs and permutations reduces the combinatorial problems to classical problems of probability theory on the summation of independent random variables. He concentrates on recent research by Russian mathematicians, including a discussion of equations containing an unknown permutation. This is the first English-language presentation of techniques for analyzing systems of random linear equations in finite fields. These results will interest specialists in combinatorics and probability theory
and will also be useful in applied areas of probabilistic combinatorics, such as communication theory, cryptology, and mathematical genetics. V. F. Kolchin is a leading researcher at the Steklov Institute and a professor at the Moscow Institute of Electronics and Mathematics (MIEM). He has written four books and many papers in the area of probabilistic combinatorics. His papers have been published mainly in the Russian journals Theory of Probability and Its Applications, Mathematical Notes, and Discrete Mathematics, and in the international journal Random Structures and Algorithms.
ENCYCLOPEDIA OF MATHEMATICS AND ITS APPLICATIONS
EDITED BY G.-C. ROTA
Editorial Board R. Doran, M. Ismail, T.-Y. Lam, E. Lutwak Volume 53
Random Graphs

6 H. Minc Permanents
18 H. O. Fattorini The Cauchy Problem
19 G. G. Lorentz, K. Jetter, and S. D. Riemenschneider Birkhoff Interpolation
22 J. R. Bastida Field Extensions and Galois Theory
23 J. R. Cannon The One-Dimensional Heat Equation
24 S. Wagon The Banach-Tarski Paradox
25 A. Salomaa Computation and Automata
26 N. White (ed.) Theory of Matroids
27 N. H. Bingham, C. M. Goldie, and J. L. Teugels Regular Variation
28 P. P. Petrushev and V. A. Popov Rational Approximation of Real Functions
29 N. White (ed.) Combinatorial Geometries
30 M. Pohst and H. Zassenhaus Algorithmic Algebraic Number Theory
31 J. Aczel and J. Dhombres Functional Equations in Several Variables
32 M. Kuczma, B. Choczewski, and R. Ger Iterative Functional Equations
33 R. V. Ambartzumian Factorization Calculus and Geometric Probability
34 G. Gripenberg, S.-O. Londen, and O. Staffans Volterra Integral and Functional Equations
35 G. Gasper and M. Rahman Basic Hypergeometric Series
36 E. Torgersen Comparison of Statistical Experiments
37 A. Neumaier Interval Methods for Systems of Equations
38 N. Korneichuk Exact Constants in Approximation Theory
39 R. A. Brualdi and H. J. Ryser Combinatorial Matrix Theory
40 N. White (ed.) Matroid Applications
41 S. Sakai Operator Algebras in Dynamical Systems
42 W. Hodges Basic Model Theory
43 H. Stahl and V. Totik General Orthogonal Polynomials
44 R. Schneider Convex Bodies
45 G. Da Prato and J. Zabczyk Stochastic Equations in Infinite Dimensions
46 A. Björner, M. Las Vergnas, B. Sturmfels, N. White, and G. Ziegler Oriented Matroids
47 G. A. Edgar and L. Sucheston Stopping Times and Directed Processes
48 C. Sims Computation with Finitely Presented Groups
49 T. Palmer Banach Algebras and the General Theory of *-Algebras
50 F. Borceux Handbook of Categorical Algebra I
51 F. Borceux Handbook of Categorical Algebra II
52 F. Borceux Handbook of Categorical Algebra III
ENCYCLOPEDIA OF MATHEMATICS AND ITS APPLICATIONS
Random Graphs V. F. KOLCHIN Steklov Mathematical Institute, Moscow
CAMBRIDGE UNIVERSITY PRESS Cambridge, New York, Melbourne, Madrid, Cape Town, Singapore, Sao Paulo
Cambridge University Press The Edinburgh Building, Cambridge CB2 8RU, UK Published in the United States of America by Cambridge University Press, New York www.cambridge.org Information on this title: www.cambridge.org/9780521440813
© Cambridge University Press 1999
This publication is in copyright. Subject to statutory exception and to the provisions of relevant collective licensing agreements, no reproduction of any part may take place without the written permission of Cambridge University Press. First published 1999
A catalogue record for this publication is available from the British Library
Library of Congress Cataloguing in Publication data
Kolchin, V. F. (Valentin Fedorovich)
Random graphs / V. F. Kolchin.
p. cm. - (Encyclopedia of mathematics and its applications; v. 53)
Includes bibliographical references and index.
ISBN 0 521 44081 5 hardback
1. Random graphs. I. Title. II. Series.
QA166.17.K65 1999
511'.5 - dc20 98-24390 CIP
ISBN 978-0-521-44081-3 hardback
Transferred to digital printing 2007
CONTENTS

Preface ix

1 The generalized scheme of allocation and the components of random graphs 1
  1.1 The probabilistic approach to enumerative combinatorial problems 1
  1.2 The generalized scheme of allocation 14
  1.3 Connectivity of graphs and the generalized scheme 22
  1.4 Forests of nonrooted trees 30
  1.5 Trees of given sizes in a random forest 42
  1.6 Maximum size of trees in a random forest 48
  1.7 Graphs with unicyclic components 58
  1.8 Graphs with components of two types 70
  1.9 Notes and references 86
2 Evolution of random graphs 91
  2.1 Subcritical graphs 91
  2.2 Critical graphs 97
  2.3 Random graphs with independent edges 100
  2.4 Nonequiprobable graphs 109
  2.5 Notes and references 120
3 Systems of random linear equations in GF(2) 122
  3.1 Rank of a matrix and critical sets 122
  3.2 Matrices with independent elements 126
  3.3 Rank of sparse matrices 135
  3.4 Cycles and consistency of systems of random equations 143
  3.5 Hypercycles and consistency of systems of random equations 156
  3.6 Reconstructing the true solution 164
  3.7 Notes and references 177
4 Random permutations 181
  4.1 Random permutations and the generalized scheme of allocation 181
  4.2 The number of cycles 183
  4.3 Permutations with restrictions on cycle lengths 192
  4.4 Notes and references 212
5 Equations containing an unknown permutation 219
  5.1 A quadratic equation 219
  5.2 Equations of prime degree 225
  5.3 Equations of compound degree 235
  5.4 Notes and references 239
Bibliography 241
Index 251
PREFACE
Combinatorics played an important role in the development of probability theory
and the two have continued to be closely related. Now probability theory, by offering new approaches to problems of discrete mathematics, is beginning to repay its debt to combinatorics. Among these new approaches, the methods of asymptotic analysis, which have been well developed in probability theory, can be used to solve certain complicated combinatorial problems. If the uniform distribution is defined on the set of combinatorial structures in question, then the numerical characteristics of the structures can be regarded as random variables and analyzed by probabilistic methods. By using the probabilistic approach, we restrict our attention to "typical" structures that constitute the bulk of the set, excluding the small fraction with exceptional properties. The probabilistic approach that is now widely used in combinatorics was first
formulated by V. L. Goncharov, who applied it to S_n, the set of all permutations of degree n, and to the runs in random (0,1)-sequences. S. N. Bernstein, N. V. Smirnov, and V. E. Stepanov were among those who developed probabilistic combinatorics in Russia, building on the famous Russian school of probability founded by A. A. Markov, A. M. Lyapunov, A. Ya. Khinchin, and A. N. Kolmogorov. This book is based on results obtained primarily by Russian mathematicians and presents results on random graphs, systems of random linear equations in GF(2), random permutations, and some simple equations involving permutations. Selecting material for the book was a difficult job. Of course, this book is not a complete treatment of the topics mentioned. Some results (and their proofs) did not seem ready for inclusion in a book, and there may be relevant results that have escaped the author's attention. There is a large body of literature on random graphs, and it is not possible to review it here. Among the probabilistic tools that have been used to analyze random structures are the method of moments, Poisson and Gaussian approximations, generating functions using the saddle-point method, Tauberian-type theorems, analysis
of singularities, and martingale theory. In the past two decades, a method called the generalized scheme of allocation has been widely used in probabilistic combinatorics. It is so named because of its connection with the problem of assigning n objects randomly to N cells. Let η_1, ..., η_N be random variables that are, for example, the sizes of components of a graph. If there are independent random variables ξ_1, ..., ξ_N so that the joint distribution of η_1, ..., η_N for any integers k_1, ..., k_N can be written as

P{η_1 = k_1, ..., η_N = k_N} = P{ξ_1 = k_1, ..., ξ_N = k_N | ξ_1 + ··· + ξ_N = n},

where n is a positive integer, then we say that η_1, ..., η_N satisfy the generalized scheme of allocation with parameters n and N and independent random variables ξ_1, ..., ξ_N.
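The scheme is easy to check in its simplest classical instance. The following Python sketch (added for illustration and not part of the text; the Poisson choice of the ξ_i is an assumption of the example) verifies that independent Poisson variables conditioned on their sum reproduce the equiprobable allocation of n objects to N cells, i.e. the multinomial distribution:

```python
from math import exp, factorial
from itertools import product

def poisson_pmf(k, lam):
    return lam**k * exp(-lam) / factorial(k)

def conditional_joint(ks, lam, n):
    # P{xi_1 = k_1, ..., xi_N = k_N | xi_1 + ... + xi_N = n}
    # for independent Poisson(lam) variables xi_i
    if sum(ks) != n:
        return 0.0
    num = 1.0
    for k in ks:
        num *= poisson_pmf(k, lam)
    # the sum xi_1 + ... + xi_N is Poisson(N * lam)
    return num / poisson_pmf(n, len(ks) * lam)

def multinomial_pmf(ks, n):
    # equiprobable allocation of n objects to N cells
    p = factorial(n) / len(ks)**n
    for k in ks:
        p /= factorial(k)
    return p

n, N, lam = 4, 3, 0.7
for ks in product(range(n + 1), repeat=N):
    if sum(ks) == n:
        assert abs(conditional_joint(ks, lam, n) - multinomial_pmf(ks, n)) < 1e-12
```

Note that the parameter λ of the ξ_i cancels out of the conditional distribution, which is exactly why the scheme is useful.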
Graph evolution is the random process of sequentially adding new edges to a graph. For many classes of random graphs with n labeled vertices and T edges, the parameter θ = 2T/n plays the role of time in the process; various graph properties often change abruptly at the critical point θ = 1. Graph evolution is the most fascinating object in the theory of random graphs, and it appears that it is well suited to the generalized scheme. We will show that applying generalized schemes makes it possible to analyze random graphs at different stages of their evolution and to obtain limit distributions in those cases in which only properties similar to the law of large numbers have been proved.

The theory of random equations in finite fields is shared by probability, combinatorics, and algebra. In this book, we will consider systems of linear equations in GF(2) with random coefficients. The matrix of such a system corresponds to a random graph or hypergraph; therefore, results on random graphs help to study these systems. We are sure that this application alone justifies developing the theory of random graphs.

The theory of random permutations is a well-developed branch of probabilistic combinatorics. Although Goncharov investigated the cycle structure of a random permutation in great detail, there is still great interest in this area. We will fully describe the asymptotic behavior of P{ν_n = k} for the total number ν_n of cycles in a random permutation for all possible behaviors of the parameters n and k = k(n) as n → ∞. We will also give some of the asymptotic results for the number of solutions of the equation X^d = e, where the unknown X ∈ S_n, d is a fixed positive integer, and e is the identity of the group S_n. Although the generalized scheme of allocation cannot be applied to nonequiprobable graphs, we present some results in this situation by using the method of moments. The statistical applications of nonequiprobable graphs call for the development of regular methods of analyzing these structures.

The book consists of five chapters. Chapter 1 describes the generalized scheme of allocation and its applications to a random forest of nonrooted trees, a random
graph consisting of unicyclic components, and a random graph with a mixture of trees and unicyclic components. In Chapter 2, these results are applied to the study of the evolution of random graphs. Chapter 3 is devoted to systems of random linear equations in GF(2). Much of this branch of probabilistic combinatorics is the work of Russian mathematicians; this is the first English-language presentation of many of the results. Random permutations are considered in Chapter 4, and Chapter 5 contains some results on permutation equations of the form X^d = e. Most results presented in this book derive from work done over the past fifteen years; notes and references can be found in the last section of each chapter. (It is, of course, impossible to give a complete list in each particular area.) In addition to articles used in the text, the summary sections of all chapters include references to papers on related topics, especially those in which the same results were obtained by other methods.
We assume that the reader is familiar with basic combinatorics. This book should be accessible to those who have completed standard courses of mathematical analysis and probability theory. Section 1.1 includes a list of pertinent results from probability. This book continues in the tradition of Random Mappings [78] and differs from other treatments of random graphs in the systematic use of the generalized scheme of allocation. We hope that the chapter on systems of random linear equations in GF(2) will be of interest to a broad audience. I wish to express my sincere appreciation to G.-C. Rota, who encouraged me to write this book for the Encyclopedia of Mathematics series, even though there are already several excellent books on random graphs. My greatest concern was writing the book in English. I am indebted to the editors who have brought the text to an acceptable form. It is apparent that no amount of editing can erase the heavy Russian accent of my written English, so my special thanks go to those readers who will not be deterred by the language of the book. I greatly appreciate the support I received from my colleagues at the Steklov Mathematical Institute while I wrote this book.
1
The generalized scheme of allocation and the components of random graphs
1.1. The probabilistic approach to enumerative combinatorial problems

The solution to enumerative combinatorial problems consists in finding an exact or approximate expression for the number of combinatorial objects possessing the property under investigation. In this book, the probabilistic approach to enumerative combinatorial problems is adopted.

The fundamental notion of probability theory is the probability space (Ω, A, P), where Ω is a set of arbitrary elements, A is a set of subsets of Ω forming a σ-algebra of events with the operations of union and intersection of sets, and P is a nonnegative countably additive function defined for each event A ∈ A so that P(Ω) = 1. The set Ω is called the space of elementary events and P is a probability.
A random variable is a real-valued measurable function ξ = ξ(ω) defined for all ω ∈ Ω.

Suppose Ω consists of finitely many elements. Then the probability P is defined on all subsets of Ω if it is defined for each elementary event ω ∈ Ω. In this case, any real-valued function ξ = ξ(ω) on such a space of elementary events is a random variable.

Instead of a real-valued function, one may consider a function f(ω) taking values from some set Y of arbitrary elements. Such a function f(ω) may be considered a generalization of a random variable and is called a random element of the set Y.

In studying combinatorial objects, we consider probability spaces that have a natural combinatorial interpretation: For the space of elementary events Ω, we take the set of combinatorial objects under investigation and assign the same probability to all the elements of the set. In this case, numerical characteristics of combinatorial objects of Ω become random variables. The term "random element of the set Ω" is usually used for the identity function f(ω) = ω, ω ∈ Ω, mapping each element of the set of combinatorial objects into itself. Since the uniform distribution is
assumed on Ω, the probability that the identity function f takes any fixed value ω is the same for all ω ∈ Ω. Hence the notion of a random combinatorial object of Ω, such as the identity function f(ω) = ω, agrees with the usual notion of a random element of a set as an element sampled from all elements of the set with equal probabilities. Note that a random combinatorial object with the same distribution could also be defined on larger probability spaces. For our purposes, however, the natural construction presented here is sufficient for the most part. The exceptions are those few cases that involve several independent random combinatorial objects and in which it would be necessary to resort to a richer probability space, such as the direct product of the natural probability spaces.

Since we use probability spaces with uniform distributions, in spite of the probabilistic terminology, the problems considered are in essence enumeration problems of combinatorial analysis. The probabilistic approach furnishes a convenient form of representation and helps us effectively use the methods of asymptotic analysis that have been well developed in the theory of probability.

Thus, in the probabilistic approach, numerical characteristics of a random combinatorial object are random variables. The main characteristic of a random variable ξ is its distribution function F(x), defined for any real x as the probability of the event {ξ < x}, that is,

F(x) = P{ξ < x}.

The distribution function F(x) defines a probability distribution on the real line called the distribution of the random variable ξ. With respect to this distribution, given a function g(x), the Lebesgue-Stieltjes integral

∫_{-∞}^{∞} g(x) dF(x)

can be defined. The probabilistic approach has advantages in the asymptotic investigations of combinatorial problems. As a rule, we have a sequence of random variables ξ_n, n = 1, 2, ..., each of which describes a characteristic of the random combinatorial object under consideration, and we are interested in the asymptotic behavior of the distribution functions F_n(x) = P{ξ_n < x} as n → ∞. A sequence of distributions with distribution functions F_n(x) converges weakly to a distribution with the distribution function F(x) if, for any bounded continuous function g(x),

∫_{-∞}^{∞} g(x) dF_n(x) → ∫_{-∞}^{∞} g(x) dF(x)

as n → ∞. The weak convergence of distributions is directly connected with the pointwise convergence of the distribution functions as follows.
Theorem 1.1.1. A sequence of distribution functions F_n(x) converges to a distribution function F(x) at all continuity points if and only if the corresponding sequence of distributions converges weakly to the distribution with distribution function F(x).

In a sense, the distribution, or the distribution function F(x), characterizes the random variable ξ. The moments of ξ are simple characteristics. If

∫_{-∞}^{∞} |x| dF(x)

exists, then

Eξ = ∫_{-∞}^{∞} x dF(x)

is called the mathematical expectation, or mean, of the random variable ξ. Further,

m_r = Eξ^r = ∫_{-∞}^{∞} x^r dF(x)

is called the rth moment, or the moment of rth order (if the integral of |x|^r exists).

In probabilistic combinatorics, one usually considers nonnegative integer-valued random variables. For such a random variable, the factorial moments are natural characteristics. We denote the rth factorial moment by

m_(r) = Eξ(ξ − 1) ⋯ (ξ − r + 1).

If a distribution function F(x) can be represented in the form

F(x) = ∫_{-∞}^{x} p(u) du,

where p(u) ≥ 0, then we say that the distribution has a density p(u). In addition to the distribution function, it is convenient to represent the distribution of an integer-valued random variable by the probabilities of its individual values. For ξ, we will use the notation

p_k = P{ξ = k},    k = 0, 1, ...,

and for integer-valued nonnegative random variables ξ_n,

p_k^(n) = P{ξ_n = k},    k = 0, 1, ... .

It is clear that

Eξ = Σ_{k=0}^{∞} k p_k,

if this series converges. It is not difficult to see that the following assertion is true.
Theorem 1.1.2. A sequence of distributions {p_k^(n)}, n = 1, 2, ..., converges weakly to a distribution {p_k} if and only if for every fixed k = 1, 2, ..., p_k^(n) → p_k as n → ∞.

If an estimate of the probability P{ξ > 0} is needed for a nonnegative integer-valued random variable ξ, then the simple inequality

P{ξ > 0} = Σ_{k=1}^{∞} p_k ≤ Σ_{k=1}^{∞} k p_k = Eξ    (1.1.1)

can be useful. In particular, for a sequence ξ_n, n = 1, 2, ..., of such random variables with Eξ_n → 0 as n → ∞, it follows that P{ξ_n > 0} → 0 as n → ∞.
Since it is generally easier to calculate the moments of a random variable than the whole distribution, one wants a criterion for the convergence of a sequence of distributions based on the corresponding moments. But, first, it should be noted that even if a random variable has moments of all orders, its distribution cannot, in general, be reconstructed on the basis of these moments, since there exist distinct distributions that have the same sequences of moments. For example, it is not difficult to confirm that for any n = 1, 2, ...,

∫_0^∞ x^n e^{-x^{1/4}} sin x^{1/4} dx = 0.

Hence, for −1 ≤ a ≤ 1, the function

p_a(x) = (1/24) e^{-x^{1/4}} (1 + a sin x^{1/4})

is the density of a distribution on [0, ∞) whose moments do not depend on a. Thus the distribution functions with moments of all orders are divided into two classes: The first class contains the functions that may be uniquely reconstructed from their moments, and the second class contains the functions that cannot be reconstructed from their moments. There are several sufficient conditions for the moment problem to have a unique solution. Let

M_n = ∫_{-∞}^{∞} |x|^n dF(x).

A distribution function F(x) is uniquely reconstructed by the sequence m_r, r = 1, 2, ..., of its moments if there exists A such that

M_n^{1/n} / n ≤ A.    (1.1.2)
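The vanishing of the moment integral in the example above is easy to confirm numerically (an added sketch, not from the book): the substitution x = t^4 turns it into 4 ∫_0^∞ t^{4n+3} e^{-t} sin t dt, a smooth, rapidly decaying integrand that a trapezoidal rule handles accurately.

```python
import math

def transformed_integrand(t, n):
    # after the substitution x = t^4, the moment integral becomes
    #   4 * Integral_0^inf t^(4n+3) e^(-t) sin(t) dt
    return t**(4 * n + 3) * math.exp(-t) * math.sin(t)

def moment_integral(n, upper=80.0, steps=200000):
    # composite trapezoidal rule; the integrand decays like e^(-t),
    # so truncating at `upper` costs essentially nothing
    h = upper / steps
    s = 0.5 * (transformed_integrand(0.0, n) + transformed_integrand(upper, n))
    for i in range(1, steps):
        s += transformed_integrand(i * h, n)
    return 4.0 * h * s

# the integral vanishes for every n, so all moments of p_a are independent of a;
# compare against Gamma(4n+4), the size of the positive part of the integrand
for n in range(1, 4):
    assert abs(moment_integral(n)) < 1e-8 * math.gamma(4 * n + 4)
```

The tolerance is relative to Γ(4n + 4) because the positive and negative parts of the integrand are individually of that order; the cancellation is what the identity asserts.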
The following theorem describing the so-called method of moments is applicable only to the first class of distribution functions.

Theorem 1.1.3. If distribution functions F_n(x), n = 1, 2, ..., have the moments of all orders and for any fixed r = 1, 2, ...,

m_r^(n) = ∫_{-∞}^{∞} x^r dF_n(x) → m_r,    |m_r| < ∞,

as n → ∞, then there exists a distribution function F(x) such that for any fixed r = 1, 2, ...,

m_r = ∫_{-∞}^{∞} x^r dF(x),

and from the sequence F_n(x), n = 1, 2, ..., it is possible to select a subsequence F_{n_k}(x), k = 1, 2, ..., that converges to F(x) as k → ∞ at every continuity point of F(x). If the sequence m_r, r = 1, 2, ..., uniquely determines the distribution function F(x), then F_n(x) → F(x) as n → ∞ at every continuity point of F(x).

Note that the normal (Gaussian) and Poisson distributions are uniquely reconstructible by their moments.

To use the method of moments, it is necessary to calculate moments of random variables. One useful method of calculating moments of integer-valued random variables is to represent them as sums of random variables that take only the values 0 and 1.

Theorem 1.1.4. If

S_n = ξ_1 + ··· + ξ_n,

and the random variables ξ_m, m = 1, 2, ..., n, take only the values 0 and 1, then for any m = 1, 2, ..., n,

S_n(S_n − 1) ⋯ (S_n − m + 1) = Σ ξ_{i_1} ⋯ ξ_{i_m},

where the summation is taken over all different ordered sets of different indices {i_1, ..., i_m}, the number of which is equal to C(n, m) m! = n(n − 1) ⋯ (n − m + 1).
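The identity in Theorem 1.1.4 holds pathwise (before taking expectations), so it can be verified by brute force over all 0-1 outcomes; a small Python sketch, added here for illustration:

```python
from itertools import permutations, product

def falling_factorial(s, m):
    # s(s-1)...(s-m+1)
    out = 1
    for j in range(m):
        out *= s - j
    return out

n, m = 5, 3
# check the identity for every possible outcome of the 0-1 variables xi_1..xi_n
for xs in product((0, 1), repeat=n):
    s = sum(xs)
    # sum over all ordered sets of distinct indices (i_1, i_2, i_3)
    rhs = sum(xs[i] * xs[j] * xs[k]
              for (i, j, k) in permutations(range(n), 3))
    assert falling_factorial(s, m) == rhs
```

Taking expectations of both sides turns the identity into a formula for the mth factorial moment of S_n as a sum of C(n, m) m! expectations of products of indicators.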
Generating functions also provide a useful tool for solving many problems related to distributions of nonnegative integer-valued random variables. The complex-valued function

φ(z) = φ_ξ(z) = Σ_{k=0}^{∞} p_k z^k = E z^ξ    (1.1.3)
is called the generating function of the distribution of the random variable ξ. It is defined at least for |z| ≤ 1. For example, for the Poisson distribution with parameter λ, which is defined by the probabilities

p_k = λ^k e^{-λ} / k!,    k = 0, 1, ...,

the generating function is e^{λ(z−1)}. Relation (1.1.3) determines a one-to-one correspondence between the generating functions and the distributions of nonnegative integer-valued random variables, since the distribution can be reconstructed by using the formula

p_k = φ^(k)(0) / k!,    k = 0, 1, ... .    (1.1.4)

Generating functions are especially convenient for the investigation of sums of independent random variables. If ξ_1, ..., ξ_n are independent nonnegative integer-valued random variables and S_n = ξ_1 + ··· + ξ_n, then

φ_{S_n}(z) = φ_{ξ_1}(z) ⋯ φ_{ξ_n}(z).
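Both facts are easy to check numerically. The following sketch (added for illustration; the Poisson example follows (1.1.3)) verifies the closed form of the Poisson generating function and the product rule for sums, using the fact that the convolution of two Poisson distributions is again Poisson:

```python
import math

def poisson_pmf(k, lam):
    return lam**k * math.exp(-lam) / math.factorial(k)

def gf(pmf, z, terms=80):
    # phi(z) = sum_k p_k z^k, truncated; the tail is negligible here
    return sum(pmf(k) * z**k for k in range(terms))

lam1, lam2, z = 1.5, 2.0, 0.7
phi1 = gf(lambda k: poisson_pmf(k, lam1), z)
phi2 = gf(lambda k: poisson_pmf(k, lam2), z)

# generating function of Poisson(lam) is exp(lam * (z - 1))
assert abs(phi1 - math.exp(lam1 * (z - 1))) < 1e-12

# for a sum of independent variables the generating functions multiply;
# convolving the two Poisson distributions must reproduce phi1 * phi2
def conv(k):
    return sum(poisson_pmf(j, lam1) * poisson_pmf(k - j, lam2)
               for j in range(k + 1))

assert abs(gf(conv, z) - phi1 * phi2) < 1e-10
```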
The correspondence between the generating functions and the distributions is continuous in the following sense.

Theorem 1.1.5. Let {p_k^(n)}, n = 1, 2, ..., be a sequence of distributions. If for any k = 0, 1, ...,

p_k^(n) → p_k

as n → ∞, then the sequence of corresponding generating functions φ_n(z), n = 1, 2, ..., converges to the generating function of the sequence {p_k} uniformly in any circle |z| ≤ r < 1. In particular, if {p_k} is a distribution, then the sequence of corresponding generating functions converges to the generating function φ(z) of the distribution {p_k} uniformly in any circle |z| ≤ r < 1.

Theorem 1.1.6. If the sequence of generating functions φ_n(z), n = 1, 2, ..., of the distributions {p_k^(n)} converges to a generating function φ(z) of a distribution {p_k} on a set M that has a limit point inside the circle |z| < 1, then the distributions {p_k^(n)} converge weakly to the distribution {p_k}.
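As an illustration of Theorem 1.1.6 (an added sketch, not from the book): the generating function of the binomial distribution with parameters n and p = λ/n converges to the Poisson generating function e^{λ(z−1)}, which is the generating-function route to the classical Poisson limit theorem.

```python
import math

def binom_gf(n, lam, z):
    # generating function of Binomial(n, p) with p = lam/n:
    #   phi_n(z) = (1 - p + p*z)^n = (1 + lam*(z - 1)/n)^n
    p = lam / n
    return (1.0 - p + p * z)**n

lam, z = 2.0, 0.6
limit = math.exp(lam * (z - 1.0))  # Poisson(lam) generating function

gaps = [abs(binom_gf(n, lam, z) - limit) for n in (10, 100, 1000)]
assert gaps[0] > gaps[1] > gaps[2]   # convergence as n grows
assert gaps[2] < 1e-3
```

By Theorem 1.1.6, pointwise convergence of these generating functions inside the unit circle already yields weak convergence of Binomial(n, λ/n) to Poisson(λ).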
Since a generating function φ(z) = Σ_{k=0}^{∞} p_k z^k is analytic, its coefficients can be represented by the Cauchy formula

p_n = (1/(2πi)) ∫_C φ(z) z^{-n-1} dz,    n = 0, 1, ...,

where the integral is over a contour C that lies inside the domain of analyticity of φ(z) and contains the point z = 0.
Thus, if we are interested in the behavior of p_n as n → ∞, then we have to be able to estimate contour integrals of the form

(1/(2πi)) ∫_C g(z) e^{λf(z)} dz,

where g(z) and f(z) are analytic in the neighborhood of the curve of integration C and λ is a real parameter tending to infinity. The saddle-point method is used to estimate such integrals. The contour of integration C may be chosen in different ways. The saddle-point method requires choosing the contour C in such a way that it passes through the point z_0, which is a root of the equation f′(z) = 0. Such a point is called the saddle point, since the function ℜf(z) has a graph similar to a saddle or mountain pass. The saddle-point method requires choosing the contour of integration such that it crosses the saddle point z_0 in the direction of the steepest descent. However, finding such a contour and applying it are complicated problems, so for the sake of simplicity one usually does not choose the best contour, hence losing some accuracy in the remainder term when estimating the integral. A parametric representation of the contour transforms the contour integral to an integral with a real variable of integration. Therefore the following theorem on estimating integrals with increasing parameters, based on Laplace's method, sometimes provides an answer to the initial question on estimating integrals.

Theorem 1.1.7.
If the integral

G(λ) = ∫_{-∞}^{∞} g(t) e^{λf(t)} dt

converges absolutely for some λ = λ_0, that is,

∫_{-∞}^{∞} |g(t)| e^{λ_0 f(t)} dt ≤ M;

if the function f(t) attains its maximum at a point t_0 and in a neighborhood of this point

f(t) = f(t_0) + a_2(t − t_0)^2 + a_3(t − t_0)^3 + ···

with a_2 < 0; if for an arbitrary small δ > 0, there exists h = h(δ) > 0 such that

f(t_0) − f(t) ≥ h    for |t − t_0| ≥ δ;

and if, as t → t_0,

g(t) = c(t − t_0)^{2m}(1 + O(|t − t_0|)),

where c is a nonzero constant and m is a nonnegative integer, then, as λ → ∞,

G(λ) = e^{λf(t_0)} λ^{-m-1/2} c α^{2m+1} Γ(m + 1/2)(1 + O(1/√λ)),    (1.1.5)

where Γ(x) is the Euler gamma function and

α = 1/√|a_2|.

In particular, if m = 0, then c = g(t_0), and as λ → ∞,

G(λ) = e^{λf(t_0)} g(t_0) √(π/(λ|a_2|)) (1 + O(1/√λ)).
To demonstrate that this rather complicated theorem can really be used, let us estimate the integral

Γ(λ + 1) = ∫_0^∞ x^λ e^{-x} dx

as λ → ∞, and obtain the Stirling formula. The change of variables x = λt leads to the equation

Γ(λ + 1) = λ^{λ+1} e^{-λ} ∫_0^∞ e^{-λ(t − 1 − log t)} dt.

Here g(t) = 1, and f(t) = −(t − 1 − log t), f(1) = 0, f′(1) = 0, f″(1) = −1. The conditions of the theorem are fulfilled; therefore, by (1.1.5),

G(λ) = ∫_0^∞ e^{λf(t)} dt = √(2π/λ) (1 + O(1/√λ)),

and for the Euler gamma function, we obtain the representation
Γ(λ + 1) = λ^{λ+1/2} e^{-λ} √(2π) (1 + O(1/√λ))

as λ → ∞, coinciding with the Stirling formula, except for the remainder term, which can be improved to O(1/λ).

Generating functions are only suited for nonnegative integer-valued random variables. A more universal method of proving theorems on the convergence of sequences of random variables is provided by characteristic functions. The characteristic function of a random variable ξ, or the characteristic function of its distribution, is defined as

φ(t) = φ_ξ(t) = E e^{itξ} = ∫_{-∞}^{∞} e^{itx} dF(x),    (1.1.6)

where −∞ < t < ∞ and F(x) is the distribution function of ξ.

If the rth moment m_r exists, then the characteristic function φ(t) is r times differentiable, and φ^(r)(0) = i^r m_r.
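Before moving on, the Stirling approximation derived above is easy to check numerically (an added sketch, not from the book); the relative error behaves like 1/(12λ), consistent with the improvable O(1/λ) remainder:

```python
import math

def stirling(lam):
    # lam^(lam + 1/2) * e^(-lam) * sqrt(2*pi), from the saddle-point estimate
    return lam**(lam + 0.5) * math.exp(-lam) * math.sqrt(2.0 * math.pi)

prev = None
for lam in (5.0, 20.0, 80.0):
    exact = math.gamma(lam + 1.0)
    rel = abs(exact - stirling(lam)) / exact
    # relative error is approximately 1/(12*lam), so it is below 0.09/lam
    assert rel < 0.09 / lam
    if prev is not None:
        assert rel < prev  # and it shrinks as lam grows
    prev = rel
```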
Characteristic functions are convenient for investigating sums of independent random variables, since if S_n = ξ_1 + ··· + ξ_n, where ξ_1, ..., ξ_n are independent random variables, then

φ_{S_n}(t) = φ_{ξ_1}(t) ⋯ φ_{ξ_n}(t).

The characteristic function of the normal distribution with parameters (m, σ^2) and density

p(x) = (1/(√(2π) σ)) e^{-(x-m)^2/(2σ^2)}

is e^{imt − σ^2 t^2/2}.

Relation (1.1.6) defines a one-to-one correspondence between characteristic functions and distributions. There are different inversion formulas that provide a formal possibility of reconstructing a distribution from its characteristic function, but they have limited practical applications. We state the simplest version of the inversion formulas.

Theorem 1.1.8. If a characteristic function φ(t) is absolutely integrable, then the corresponding distribution has the bounded density

p(x) = (1/(2π)) ∫_{-∞}^{∞} e^{-itx} φ(t) dt.
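A quick numerical sketch of Theorem 1.1.8 (added for illustration, not from the book): recovering the standard normal density from its characteristic function φ(t) = e^{-t^2/2} by evaluating the inversion integral with a trapezoidal rule.

```python
import math

def density_from_cf(x, upper=12.0, steps=6000):
    # p(x) = (1/2pi) * Integral e^(-itx) phi(t) dt with phi(t) = e^(-t^2/2).
    # The imaginary parts cancel, so integrate cos(t x) e^(-t^2/2) over [0, upper]
    # and double it (the integrand is even in t).
    h = upper / steps
    s = 0.5 * (1.0 + math.cos(upper * x) * math.exp(-upper**2 / 2.0))
    for i in range(1, steps):
        t = i * h
        s += math.cos(t * x) * math.exp(-t * t / 2.0)
    return h * s / math.pi  # (1/2pi) * 2 * integral over [0, upper]

for x in (-1.0, 0.0, 0.5, 2.0):
    exact = math.exp(-x * x / 2.0) / math.sqrt(2.0 * math.pi)
    assert abs(density_from_cf(x) - exact) < 1e-9
```

The Gaussian decay of φ makes both the truncation at t = 12 and the trapezoidal discretization essentially exact, which is why the tolerance can be so tight.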
The correspondence defined by (1.1.6) is continuous in the following sense.

Theorem 1.1.9. A sequence of distributions converges weakly to a limit distribution if and only if the corresponding sequence of characteristic functions φ_n(t) converges to a continuous function φ(t) as n → ∞ at every fixed t, −∞ < t < ∞. In this case, φ(t) is the characteristic function of the limit distribution, and the convergence φ_n(t) → φ(t) is uniform in any finite interval.

For a sequence ξ_n of characteristics of random combinatorial objects, applying Theorem 1.1.9 gives the limit distribution function. But for integer-valued characteristics, one would rather have an indication of the local behavior, that is, the behavior of the probabilities of individual values. To this end the so-called local limit theorems of probability theory are used.

Let ξ be an integer-valued random variable and p_n = P{ξ = n}. It is clear that P{ξ ∈ Γ_1} = 1, where Γ_1 is the lattice of all integers. If there exists a lattice Γ_d with a span d such that P{ξ ∈ Γ_d} = 1 and there is no lattice Γ with span greater than d such that P{ξ ∈ Γ} = 1, then d is called the maximal span of the distribution of ξ. The characteristic function φ(t) of the random variable ξ is periodic with period 2π/d and |φ(t)| < 1 for 0 < t < 2π/d.
For integer-valued random variables, the inversion formula has the following form:

p_n = (1/(2π)) ∫_{-π}^{π} e^{-itn} φ(t) dt.

Consider the sum S_N = ξ_1 + ··· + ξ_N of independent identically distributed integer-valued random variables ξ_1, ..., ξ_N. When the distributions of the summands are identical and do not depend on N, the problem of estimating the probabilities P{S_N = n}, as N → ∞, has been completely solved. If there exist sequences of centering and normalizing numbers A_N and B_N such that the distributions of the random variables (S_N − A_N)/B_N converge weakly to some distribution, then the limit distribution has a density. Moreover, a local limit theorem holds on the lattice with a span equal to the maximal span of the distribution of the random variable ξ_1. If the maximal span of the distribution of ξ_1 is 1, then the local theorem holds on the lattice of integers.

Theorem 1.1.10. Let ξ_1, ξ_2, ... be a sequence of independent identically distributed integer-valued random variables and let there exist A_N and B_N such that, as N → ∞ for any fixed x,

P{(S_N − A_N)/B_N ≤ x} → ∫_{-∞}^{x} p(u) du.

Then, if the maximal span of the distribution of ξ_1 is 1,

B_N P{S_N = n} − p((n − A_N)/B_N) → 0

uniformly in n.
Local limit theorems are of primary importance in what follows. Therefore, let us prove a local theorem on convergence to the normal distribution as a model for proofs of local limit theorems in more complex cases, which will be discussed later in the book.
Theorem 1.1.11. Let the independent identically distributed integer-valued random variables $\xi_1, \xi_2, \dots$ have a mathematical expectation $a$ and a positive variance $\sigma^2$. Then, if the maximal span of the distribution of $\xi_1$ is 1,

\[ \sigma\sqrt{N}\, P\{\xi_1 + \dots + \xi_N = n\} - \frac{1}{\sqrt{2\pi}} \exp\left\{ -\frac{(n - aN)^2}{2\sigma^2 N} \right\} \to 0 \]

uniformly in $n$ as $N \to \infty$.

Proof. Let

\[ z = \frac{n - aN}{\sigma\sqrt{N}} \qquad\text{and}\qquad P_N(n) = P\{\xi_1 + \dots + \xi_N = n\}. \]
1.1 Probabilistic approach to enumerative combinatorial problems
If $\varphi(t)$ is the characteristic function of the random variable $\xi_1$, then the characteristic function of the sum $S_N = \xi_1 + \dots + \xi_N$ is equal to $\varphi^N(t)$, and

\[ \varphi^N(t) = \sum_{n=-\infty}^{\infty} P_N(n) e^{itn}. \]

By the inversion formula,

\[ P_N(n) = \frac{1}{2\pi} \int_{-\pi}^{\pi} e^{-itn} \varphi^N(t)\,dt. \qquad (1.1.7) \]

Let $\varphi^*(t)$ denote the characteristic function of the centered random variable $\xi_1 - a$, which equals $\varphi(t)e^{-ita}$. Since $n = aN + z\sigma\sqrt{N}$, it follows from (1.1.7) that

\[ P_N(n) = \frac{1}{2\pi} \int_{-\pi}^{\pi} e^{-it\sigma\sqrt{N}z} \bigl(\varphi^*(t)\bigr)^N dt. \]

After the substitution $x = t\sigma\sqrt{N}$, this equality takes the form

\[ \sigma\sqrt{N}\, P_N(n) = \frac{1}{2\pi} \int_{-\pi\sigma\sqrt{N}}^{\pi\sigma\sqrt{N}} e^{-ixz} \bigl(\varphi^*\bigl(x/(\sigma\sqrt{N})\bigr)\bigr)^N dx. \qquad (1.1.8) \]

By the inversion formula,

\[ \frac{1}{\sqrt{2\pi}}\, e^{-z^2/2} = \frac{1}{2\pi} \int_{-\infty}^{\infty} e^{-ixz - x^2/2}\,dx. \qquad (1.1.9) \]

It follows from (1.1.8) and (1.1.9) that the difference

\[ R_N = 2\pi \left( \sigma\sqrt{N}\, P_N(n) - \frac{1}{\sqrt{2\pi}}\, e^{-z^2/2} \right) \qquad (1.1.10) \]

can be written as the sum of the following four integrals:

\[ I_1 = \int_{|x| \le A} e^{-ixz} \Bigl( \bigl(\varphi^*\bigl(x/(\sigma\sqrt{N})\bigr)\bigr)^N - e^{-x^2/2} \Bigr)\,dx, \]
\[ I_2 = -\int_{|x| > A} e^{-ixz - x^2/2}\,dx, \]
\[ I_3 = \int_{A < |x| \le \varepsilon\sigma\sqrt{N}} e^{-ixz} \bigl(\varphi^*\bigl(x/(\sigma\sqrt{N})\bigr)\bigr)^N dx, \]
\[ I_4 = \int_{\varepsilon\sigma\sqrt{N} < |x| \le \pi\sigma\sqrt{N}} e^{-ixz} \bigl(\varphi^*\bigl(x/(\sigma\sqrt{N})\bigr)\bigr)^N dx, \]

where the constants $A$ and $\varepsilon$ will be chosen later. To see that $R_N \to 0$ as $N \to \infty$, we take an arbitrary $\delta > 0$ and show that $|R_N|$ can be made less than $\delta$ for sufficiently large $N$.
For $I_2$, we have

\[ |I_2| \le \int_{|x| > A} e^{-x^2/2}\,dx, \]

and $|I_2|$ can be made arbitrarily small by the choice of sufficiently large $A$.

Since $E(\xi_1 - a) = 0$ and $D(\xi_1 - a) = \sigma^2$, for the characteristic function $\varphi^*(t)$ as $t \to 0$ we have

\[ \varphi^*(t) = 1 - \frac{\sigma^2 t^2}{2} + o(t^2). \qquad (1.1.11) \]

Let $\varphi_N(x)$ denote the characteristic function of $(S_N - aN)/(\sigma\sqrt{N})$, which equals $\bigl(\varphi^*\bigl(x/(\sigma\sqrt{N})\bigr)\bigr)^N$. For any fixed $x$ and $N \to \infty$, we obtain from (1.1.11) the relation

\[ \log \varphi_N(x) = N \log \varphi^*\bigl(x/(\sigma\sqrt{N})\bigr) = N \log\left( 1 - \frac{x^2}{2N} + o(1/N) \right) = -\frac{x^2}{2} + o(1), \]

implying that for any fixed $x$, as $N \to \infty$,

\[ \varphi_N(x) \to e^{-x^2/2}. \qquad (1.1.12) \]

Moreover, as seen from (1.1.11), there exists $\varepsilon > 0$ such that, for $|t| < \varepsilon$,

\[ |\varphi^*(t)| \le 1 - \frac{\sigma^2 t^2}{4} \le e^{-\sigma^2 t^2/4}. \qquad (1.1.13) \]

Using this inequality to estimate $I_3$, we find that

\[ |I_3| \le \int_{A < |x| \le \varepsilon\sigma\sqrt{N}} \bigl|\varphi^*\bigl(x/(\sigma\sqrt{N})\bigr)\bigr|^N dx \le \int_{|x| > A} e^{-x^2/4}\,dx, \]

and by the choice of sufficiently large $A$, $|I_3|$ can be made arbitrarily small. Let $\varepsilon$ be such that (1.1.13) is satisfied and let $A$ be large enough so that $|I_2| < \delta/4$ and $|I_3| < \delta/4$. Let us now estimate the integrals $I_1$ and $I_4$ for fixed $\varepsilon$ and $A$. Relation (1.1.12) implies that the distribution of $(S_N - aN)/(\sigma\sqrt{N})$ converges weakly, as $N \to \infty$, to the normal distribution with parameters $(0, 1)$. The convergence of the characteristic functions $\varphi_N(x)$ to the characteristic function of the normal law is uniform in any finite interval, and the integral $I_1$ tends to zero as $N \to \infty$. For $I_4$, we have

\[ |I_4| \le \int_{\varepsilon\sigma\sqrt{N} < |x| \le \pi\sigma\sqrt{N}} \bigl|\varphi^*\bigl(x/(\sigma\sqrt{N})\bigr)\bigr|^N dx = \sigma\sqrt{N} \int_{\varepsilon < |t| \le \pi} |\varphi^*(t)|^N dt. \]

Since the maximal span of the distribution of $\xi_1$ is 1,

\[ \max_{\varepsilon \le |t| \le \pi} |\varphi(t)| = q < 1. \]
Hence,

\[ |I_4| \le 2\pi\sigma\sqrt{N}\, q^N, \]

and $I_4 \to 0$ as $N \to \infty$. The estimates of $I_1$ and $I_4$ show that there exists $N_0$ such that $|I_1| < \delta/4$ and $|I_4| < \delta/4$ for $N > N_0$. Thus the difference $R_N$ tends to zero as $N \to \infty$ uniformly for all integers $n$.
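The convergence asserted by Theorem 1.1.11 can be observed numerically: the exact distribution of $S_N$ for a specific lattice summand is computable by repeated convolution and can be compared with the normal approximation. The uniform distribution on $\{0, 1, 2\}$ used below is an arbitrary test case, not one taken from the text.

```python
import math

def exact_sum_pmf(pmf, N):
    # Distribution of S_N = xi_1 + ... + xi_N obtained by N-fold convolution.
    dist = {0: 1.0}
    for _ in range(N):
        new = {}
        for s, ps in dist.items():
            for k, pk in pmf.items():
                new[s + k] = new.get(s + k, 0.0) + ps * pk
        dist = new
    return dist

pmf = {0: 1 / 3, 1: 1 / 3, 2: 1 / 3}   # a = 1, sigma^2 = 2/3, maximal span 1
a, var, N = 1.0, 2 / 3, 400
dist = exact_sum_pmf(pmf, N)
s = math.sqrt(var * N)                  # sigma * sqrt(N)

# sup over n of | sigma*sqrt(N)*P{S_N = n} - (2*pi)^(-1/2)*exp(-z^2/2) |
err = max(abs(s * p - math.exp(-((n - a * N) / s) ** 2 / 2) / math.sqrt(2 * math.pi))
          for n, p in dist.items())
print(err)
```

Increasing $N$ makes the printed supremum shrink, in line with the uniformity in $n$ claimed by the theorem.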
In most applications of local theorems in this text, the distribution of the summands of the sum $S_N = \xi_1 + \dots + \xi_N$ depends on the number of summands $N$. In such cases, there is no complete answer to the question of when the local theorem holds for $S_N$. Even in the case of convergence to the normal law, the known sufficient conditions for the validity of a local theorem cannot be deemed fully satisfactory. Hence, for each specific distribution whose parameters depend on the number of summands in the sum, it is necessary to invoke the classical scheme given above as a model. In the hope of finding simple sufficient conditions for the validity of local theorems for integer-valued identically distributed summands, as in Theorems 1.1.10 and 1.1.11, we will often omit the particularly cumbersome calculations arising in estimating characteristic functions.

If $\xi_1, \dots, \xi_N$ are independent identically distributed random variables such that

\[ P\{\xi_1 = 1\} = p, \qquad P\{\xi_1 = 0\} = q = 1 - p \]

for $0 < p < 1$, then $S_N = \xi_1 + \dots + \xi_N$ has the binomial distribution with parameters $(N, p)$; that is, for any $k = 0, 1, \dots, N$,

\[ P\{S_N = k\} = \binom{N}{k} p^k q^{N-k}. \]

If $Npq \to \infty$, then the binomial distribution is approximated by the normal law. The following theorem, known as the local de Moivre-Laplace theorem, can be obtained by a direct analysis of the explicit formula.

Theorem 1.1.12. If $N \to \infty$ and $(1 + u^6)/(Npq) \to 0$, where

\[ u = \frac{k - Np}{\sqrt{Npq}}, \]

then

\[ \binom{N}{k} p^k q^{N-k} = \frac{1}{\sqrt{2\pi Npq}}\, e^{-u^2/2} \left( 1 + \frac{q - p}{6\sqrt{Npq}}\,(u^3 - 3u) + O\left( \frac{1 + u^6}{Npq} \right) \right). \]
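The local de Moivre-Laplace expansion can be checked against exact binomial probabilities. The parameter values below are arbitrary illustration choices; the check compares the plain normal approximation with the skewness-corrected expansion of Theorem 1.1.12.

```python
import math

# Exact binomial probability versus the local normal approximation and the
# corrected expansion of Theorem 1.1.12 (N, p, and the k values are arbitrary).
N, p = 500, 0.3
q = 1 - p
s = math.sqrt(N * p * q)                  # sqrt(Npq); here Np = 150

ratios = {}
for k in (140, 150, 160, 170):
    u = (k - N * p) / s
    main = math.exp(-u * u / 2) / (math.sqrt(2 * math.pi) * s)
    corrected = main * (1 + (q - p) * (u ** 3 - 3 * u) / (6 * s))
    exact = math.comb(N, k) * p ** k * q ** (N - k)
    ratios[k] = (exact / main, exact / corrected)
    print(k, ratios[k])
```

At $k = 150$ (that is, $u = 0$) the correction vanishes; away from the center the corrected expansion tracks the exact probability noticeably better than the plain normal term.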
Theorem 1.1.12 implies the well-known integral de Moivre-Laplace theorem.
Theorem 1.1.13. If $N \to \infty$ and $(1 + u^6)/(Npq) \to 0$, where

\[ u = \frac{k - Np}{\sqrt{Npq}}, \]

then

\[ P\{S_N < k\} = \frac{1}{\sqrt{2\pi}} \int_{-\infty}^{u} e^{-x^2/2}\,dx\,(1 + o(1)). \]
If $p \to 0$, then the binomial distribution is approximated by the Poisson law. It is well known that if $N \to \infty$ and $Np \to \lambda$, $0 < \lambda < \infty$, then

\[ \binom{N}{k} p^k q^{N-k} \to \frac{\lambda^k}{k!}\, e^{-\lambda} \]

for any fixed $k = 0, 1, \dots$. The Poisson approximation is also valid if $Np$ tends to infinity not too quickly.

Theorem 1.1.14. If $N \to \infty$, $Np \to \infty$, and $(1 + u^2)p \to 0$, where

\[ u = \frac{k - Np}{\sqrt{Np}}, \]

then

\[ \binom{N}{k} p^k q^{N-k} = \frac{(Np)^k}{k!}\, e^{-Np}\,(1 + o(1)). \]

The Poisson distribution converges to the normal law as its parameter tends to infinity.
Theorem 1.1.15. If $(1 + u^6)/\lambda \to 0$, where $u = (k - \lambda)/\sqrt{\lambda}$, then

\[ \frac{\lambda^k}{k!}\, e^{-\lambda} = \frac{1}{\sqrt{2\pi\lambda}}\, e^{-u^2/2} \left( 1 + \frac{u^3 - 3u}{6\sqrt{\lambda}} + O\left( \frac{1 + u^6}{\lambda} \right) \right). \]

Sometimes it is necessary to estimate the tails of the binomial distribution in the form of an inequality with an explicit constant.

Theorem 1.1.16. For any $x > 0$,

\[ P\{S_N - ES_N \ge Nx\} \le e^{-2Nx^2}. \]
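The explicit tail bound of Theorem 1.1.16 can be compared with the exact binomial tail; the parameters below are arbitrary illustration values.

```python
import math

# Exact binomial tail versus the bound of Theorem 1.1.16.
N, p, x = 200, 0.4, 0.1
q = 1 - p
k0 = math.ceil(N * p + N * x)     # smallest k with k - E S_N >= N x
tail = sum(math.comb(N, k) * p ** k * q ** (N - k) for k in range(k0, N + 1))
bound = math.exp(-2 * N * x ** 2)
print(tail, bound)                # the explicit bound dominates the exact tail
```

The bound is crude for moderate deviations, but its virtue is the explicit constant, which is what the later applications require.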
1.2. The generalized scheme of allocation

In the past three decades, the so-called generalized scheme of allocation of particles
has been applied to many probabilistic problems of combinatorics, and many of the results in this text were obtained by reducing combinatorial problems to such a generalized scheme.
Consider $n$ independent trials, each having $N$ equiprobable outcomes $1, 2, \dots, N$. Let $\eta_i$ denote the number of occurrences of the $i$th outcome in this sequence of trials, $i = 1, 2, \dots, N$. The random variables $\eta_1, \dots, \eta_N$ have the multinomial distribution: If the nonnegative integers $k_1, \dots, k_N$ are such that $k_1 + \dots + k_N = n$, then

\[ P\{\eta_1 = k_1, \dots, \eta_N = k_N\} = \frac{n!}{k_1! \cdots k_N!\, N^n}. \qquad (1.2.1) \]

The situation in which the multinomial distribution arises can be described in terms of an equiprobable scheme of allocating particles to cells. If $n$ particles are independently distributed with equal probabilities into $N$ cells labeled $1, 2, \dots, N$, then the contents of cells $\eta_1, \dots, \eta_N$ have the multinomial distribution (1.2.1).

In the scheme of allocating particles to cells yielding the multinomial distribution, the contents of the cells can be obtained by independent sequential allocation of particles. If one does not require that the contents of the cells be obtainable by some sequential allocation of particles, with a simple probability law governing the sequential trials, then any set of nonnegative integer-valued random variables $\eta_1, \dots, \eta_N$ such that $\eta_1 + \dots + \eta_N = n$ can be viewed as a scheme of allocating $n$ particles to $N$ cells, and one can interpret $\eta_i$ as the number of particles in the cell with index $i$, $i = 1, 2, \dots, N$.

Some probabilistic problems of combinatorics can be treated by using generalized schemes of allocation in which the joint distribution of the contents of cells $\eta_1, \dots, \eta_N$ can be represented in the form

\[ P\{\eta_1 = k_1, \dots, \eta_N = k_N\} = P\{\xi_1 = k_1, \dots, \xi_N = k_N \mid \xi_1 + \dots + \xi_N = n\}, \qquad (1.2.2) \]

where $\xi_1, \dots, \xi_N$ are independent identically distributed integer-valued random variables. The generalized scheme of allocating particles to cells is given by the parameters $n$ and $N$ and the distribution of the random variables $\xi_1, \dots, \xi_N$, which by relation (1.2.2) determines the joint distribution of the contents of the cells $\eta_1, \dots, \eta_N$. Set

\[ p_k = P\{\xi_1 = k\}, \qquad k = 0, 1, \dots. \qquad (1.2.3) \]

For the random variables $\eta_1, \dots, \eta_N$ with the multinomial distribution (1.2.1), relation (1.2.2) is satisfied if $\xi_1$ has the Poisson distribution with arbitrary parameter $\lambda$:

\[ p_k = P\{\xi_1 = k\} = \frac{\lambda^k e^{-\lambda}}{k!}, \qquad k = 0, 1, \dots. \qquad (1.2.4) \]

Therefore the distribution of $\eta_1, \dots, \eta_N$ satisfying relation (1.2.2) for some distribution (1.2.3) can be viewed as a generalization of the multinomial distribution.
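Relation (1.2.2) for the multinomial scheme can be verified exactly on a small example: for independent Poisson variables, the conditional law given the sum must be the multinomial distribution (1.2.1), independently of the Poisson parameter. The values of $n$, $N$, and the parameter below are arbitrary.

```python
import math
from itertools import product

# Exact verification of (1.2.2) for the multinomial scheme: the conditional
# law of independent Poisson(lam) variables given xi_1 + ... + xi_N = n is
# the multinomial law (1.2.1).  The choice of lam must not matter.
n, N, lam = 5, 3, 0.7

def poisson(k):
    return lam ** k * math.exp(-lam) / math.factorial(k)

# The sum of N independent Poisson(lam) variables is Poisson(N * lam).
p_sum = (N * lam) ** n * math.exp(-N * lam) / math.factorial(n)

err = 0.0
for ks in product(range(n + 1), repeat=N):
    if sum(ks) != n:
        continue
    conditional = math.prod(poisson(k) for k in ks) / p_sum
    multinomial = math.factorial(n) / (math.prod(math.factorial(k) for k in ks) * N ** n)
    err = max(err, abs(conditional - multinomial))
print(err)
```

Rerunning with a different `lam` leaves the conditional probabilities unchanged, which is exactly why the parameter in (1.2.4) is arbitrary.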
The term "classical scheme of allocation" has become common for the equiprobable scheme of allocating particles to cells leading to the multinomial distribution (1.2.1). The terminology of the classical scheme of allocating particles to cells proved to be convenient for describing a number of combinatorial problems where the multinomial distribution appears. Many results pertaining to the classical scheme of allocation can be obtained by applying relation (1.2.2) between the multinomial distribution and the Poisson distribution (1.2.4). Introducing generalized schemes of allocating particles not only broadens the scope of convenient language for describing combinatorial objects, but also offers the possibility of applying methods based on relation (1.2.2) that have been developed to analyze the classical scheme.

Let $\mu_r(n, N)$ denote the number of cells containing exactly $r$ particles in the generalized scheme of allocation with distributions (1.2.2) and (1.2.3). We show that the representation (1.2.2) can be used to study this random variable. Let $\xi_1^{(r)}, \dots, \xi_N^{(r)}$ be independent identically distributed random variables whose distribution is linked with the distribution of $\xi_1, \dots, \xi_N$ as follows:

\[ P\{\xi_1^{(r)} = k\} = P\{\xi_1 = k \mid \xi_1 \ne r\}, \qquad k = 0, 1, \dots. \]

Also let

\[ S_N = \xi_1 + \dots + \xi_N, \qquad S_N^{(r)} = \xi_1^{(r)} + \dots + \xi_N^{(r)}. \]

The following lemma expresses the distribution of $\mu_r(n, N)$ in terms of the probabilities of sums of independent identically distributed random variables.

Lemma 1.2.1.

\[ P\{\mu_r(n, N) = k\} = \binom{N}{k} p_r^k (1 - p_r)^{N-k}\, \frac{P\{S_{N-k}^{(r)} = n - kr\}}{P\{S_N = n\}}. \qquad (1.2.5) \]

Proof. Let $A_k^{(r)}$ be the event that exactly $k$ of the random variables $\xi_1, \dots, \xi_N$ take the value $r$. By equality (1.2.2),

\[ P\{\mu_r(n, N) = k\} = P\{A_k^{(r)} \mid S_N = n\} = \frac{P\{A_k^{(r)},\, S_N = n\}}{P\{S_N = n\}}. \]

The lemma is derived by obvious manipulations of the numerator: The event $A_k^{(r)}$ can occur for $\binom{N}{k}$ distinct choices of the random variables taking the value $r$; therefore

\[ P\{A_k^{(r)},\, S_N = n\} = \binom{N}{k} p_r^k (1 - p_r)^{N-k}\, P\{S_{N-k}^{(r)} = n - kr\}. \]
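Lemma 1.2.1 can be checked exactly on a small generalized scheme by brute-force enumeration. The distribution `p` below is an arbitrary test law, not one from the text.

```python
import math
from itertools import product

# Exact check of Lemma 1.2.1 on a small generalized scheme.
p = {0: 0.4, 1: 0.3, 2: 0.2, 3: 0.1}
N, n, r = 4, 5, 1

def sum_pmf(pmf, m, target):
    # P{ xi_1 + ... + xi_m = target } for iid summands with law pmf.
    dist = {0: 1.0}
    for _ in range(m):
        new = {}
        for s, ps in dist.items():
            for k, pk in pmf.items():
                new[s + k] = new.get(s + k, 0.0) + ps * pk
        dist = new
    return dist.get(target, 0.0)

p_sn = sum_pmf(p, N, n)

# Left side: law of mu_r(n, N) by brute-force conditioning on S_N = n.
lhs = [0.0] * (N + 1)
for ks in product(p, repeat=N):
    if sum(ks) == n:
        lhs[ks.count(r)] += math.prod(p[k] for k in ks) / p_sn

# Right side: the formula of Lemma 1.2.1, with xi^{(r)} distributed as xi
# conditioned on xi != r.
pr = p[r]
cond = {k: v / (1 - pr) for k, v in p.items() if k != r}
rhs = [math.comb(N, k) * pr ** k * (1 - pr) ** (N - k)
       * sum_pmf(cond, N - k, n - k * r) / p_sn for k in range(N + 1)]

err = max(abs(a - b) for a, b in zip(lhs, rhs))
print(err)
```

The same enumeration pattern works for any finite-support test law, which makes it a convenient regression check when adapting the lemma to a specific scheme.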
In the generalized scheme of allocating particles, there is a rather simple approach to studying the order statistics $\eta_{(1)} \le \eta_{(2)} \le \dots \le \eta_{(N)}$ constructed from the random variables $\eta_1, \dots, \eta_N$ arranged in nondecreasing order. Let $\xi_1^{(A)}, \dots, \xi_N^{(A)}$ be independent identically distributed random variables such that

\[ P\{\xi_1^{(A)} = k\} = P\{\xi_1 = k \mid \xi_1 \notin A\}, \qquad k = 0, 1, \dots, \]

where $A$ is a subset of the set of nonnegative integers with $P\{\xi_1 \notin A\} > 0$. In particular, if $A$ consists of the one value $r$, then $\xi_1^{(A)} = \xi_1^{(r)}$, where $\xi_1^{(r)}$ is the random variable defined preceding Lemma 1.2.1. Set

\[ S_N^{(A)} = \xi_1^{(A)} + \dots + \xi_N^{(A)}. \]

The following lemma reduces the study of distributions of order statistics to that of probabilities related to sums of independent random variables.

Lemma 1.2.2. For any positive integer $m$,

\[ P\{\eta_{(m)} \le r\} = 1 - \sum_{l=0}^{m-1} \binom{N}{l} (1 - P_r)^l P_r^{N-l}\, \frac{P\{S_l^{(\bar{A}_r)} + S_{N-l}^{(A_r)} = n\}}{P\{S_N = n\}}, \qquad (1.2.6) \]

\[ P\{\eta_{(N-m+1)} \le r\} = \sum_{l=0}^{m-1} \binom{N}{l} P_r^l (1 - P_r)^{N-l}\, \frac{P\{S_l^{(A_r)} + S_{N-l}^{(\bar{A}_r)} = n\}}{P\{S_N = n\}}, \qquad (1.2.7) \]

where $A_r$ is the set of all nonnegative integers not exceeding $r$, $\bar{A}_r$ is its complement in the set of all nonnegative integers, and $P_r = P\{\xi_1 > r\}$.

Proof. Let us prove (1.2.7) for $m = 1$. For the maximal order statistic $\eta_{(N)} = \max(\eta_1, \dots, \eta_N)$, by (1.2.2) and the independence of $\xi_1, \dots, \xi_N$, we have

\[ P\{\eta_{(N)} \le r\} = P\{\eta_1 \le r, \dots, \eta_N \le r\} = P\{\xi_1 \le r, \dots, \xi_N \le r \mid S_N = n\}; \]

passing to the random variables $\xi_1^{(\bar{A}_r)}, \dots, \xi_N^{(\bar{A}_r)}$, we finally obtain

\[ P\{\eta_{(N)} \le r\} = \frac{(1 - P_r)^N P\{S_N^{(\bar{A}_r)} = n\}}{P\{S_N = n\}}. \qquad (1.2.8) \]

Relations (1.2.6) and (1.2.7) for other values of $m$ are similarly proved.

For the joint distribution of the random variables $\mu_{r_1}(n, N), \dots, \mu_{r_s}(n, N)$, we can prove the following lemma as we did in Lemma 1.2.1.
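The special case (1.2.8) of Lemma 1.2.2 is easy to verify exactly on a small scheme by enumeration; the distribution `p` below is an arbitrary test law.

```python
import math
from itertools import product

# Exact check of (1.2.8) (Lemma 1.2.2 with m = 1): the maximal cell content.
p = {0: 0.4, 1: 0.3, 2: 0.2, 3: 0.1}
N, n, r = 4, 5, 2

def sum_pmf(pmf, m, target):
    # P{ xi_1 + ... + xi_m = target } for iid summands with law pmf.
    dist = {0: 1.0}
    for _ in range(m):
        new = {}
        for s, ps in dist.items():
            for k, pk in pmf.items():
                new[s + k] = new.get(s + k, 0.0) + ps * pk
        dist = new
    return dist.get(target, 0.0)

p_sn = sum_pmf(p, N, n)

# Left side: P{eta_(N) <= r} by brute-force conditioning on S_N = n.
lhs = sum(math.prod(p[k] for k in ks)
          for ks in product(p, repeat=N)
          if sum(ks) == n and max(ks) <= r) / p_sn

# Right side: (1 - P_r)^N P{S_N^{(bar A_r)} = n} / P{S_N = n}, where the
# bar A_r superscript conditions xi on taking values not exceeding r.
Pr = sum(v for k, v in p.items() if k > r)
cond = {k: v / (1 - Pr) for k, v in p.items() if k <= r}
rhs = (1 - Pr) ** N * sum_pmf(cond, N, n) / p_sn
print(lhs, rhs)
```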
Lemma 1.2.3.

\[ P\{\mu_{r_1}(n, N) = k_1, \dots, \mu_{r_s}(n, N) = k_s\} = \frac{N!}{k_1! \cdots k_s!\,(N - k_1 - \dots - k_s)!}\, p_{r_1}^{k_1} \cdots p_{r_s}^{k_s} (1 - p_{r_1} - \dots - p_{r_s})^{N - k_1 - \dots - k_s} \times \frac{P\{S_{N - k_1 - \dots - k_s}^{(r_1, \dots, r_s)} = n - k_1 r_1 - \dots - k_s r_s\}}{P\{S_N = n\}}, \]

where $s \ge 1$; $k_1, \dots, k_s, r_1, \dots, r_s$ are nonnegative integers; $r_1, \dots, r_s$ are distinct; and $S_m^{(r_1, \dots, r_s)}$ denotes the sum of $m$ independent random variables distributed as $\xi_1$ under the condition $\xi_1 \notin \{r_1, \dots, r_s\}$.

Lemmas 1.2.1, 1.2.2, and 1.2.3 express the distributions of the random variables $\mu_r(n, N)$ and the order statistics $\eta_{(1)}, \eta_{(2)}, \dots, \eta_{(N)}$ in the generalized scheme of allocating particles in terms of probabilities related to sums of independent random variables. Obtaining limit distributions for the random variables $\mu_r(n, N)$ and $\eta_{(1)}, \eta_{(2)}, \dots, \eta_{(N)}$ is thus reduced to applying local limit theorems for sums of independent identically distributed integer-valued random variables. We now give some examples of how combinatorial problems can be reduced to the generalized scheme of allocating particles to cells.
Example 1.2.1. Consider single-valued mappings of the set $X_n = \{1, 2, \dots, n\}$ into itself. A single-valued mapping $s$ of the set $X_n$ into itself can be represented as

\[ s = \begin{pmatrix} 1, & 2, & \dots, & n \\ s_1, & s_2, & \dots, & s_n \end{pmatrix}, \]

where $s_k$ denotes the image of $k$, $k = 1, 2, \dots, n$, under the mapping $s$. The mapping $s$ may be thought of as an oriented graph $\Gamma_n^{(s)} = \Gamma(X_n, W_n)$ with vertex set $X_n$ and arc set $W_n = \{(k, s_k),\ k = 1, 2, \dots, n\}$, where the arc $(k, s_k)$ is directed from $k$ to $s_k$, $k = 1, 2, \dots, n$. The number of arcs entering the vertex $k$ in the graph $\Gamma_n^{(s)}$, which is the number of preimages of the element $k$ under the mapping $s$, is called the multiplicity of the vertex $k$.

Let $\Sigma_n$ denote the set of all single-valued mappings of $X_n$ into itself, and $\Gamma_n$ the set of all graphs of these mappings. The number of elements of $\Sigma_n$ is obviously equal to $n^n$. If the uniform distribution is defined on the set $\Sigma_n$, then we obtain a probability space whose set of elementary events $\Omega$ is the set $\Sigma_n$, and the probability of any subset of $\Sigma_n$ is the number of elements in the subset divided by $n^n$. The random mapping $\sigma$ is any of the $n^n$ possible mappings with probability $P\{\sigma = s\} = n^{-n}$, $s \in \Sigma_n$. If

\[ \sigma = \begin{pmatrix} 1, & 2, & \dots, & n \\ \sigma_1, & \sigma_2, & \dots, & \sigma_n \end{pmatrix}, \]
where the random variable $\sigma_i$ is the random image of the element $i$, $i = 1, 2, \dots, n$, then, for any $s$,

\[ P\{\sigma = s\} = P\{\sigma_1 = s_1, \dots, \sigma_n = s_n\} = n^{-n}. \]

Thus the random variables $\sigma_1, \dots, \sigma_n$ are independent and take the values $1, 2, \dots, n$ with equal probabilities.

Let $\eta_r$ denote the multiplicity of the vertex $r$ in the random mapping $\sigma$, $r = 1, 2, \dots, n$. The quantity $\eta_r$ is equal to the number of random variables $\sigma_1, \dots, \sigma_n$ taking the value $r$; thus, for nonnegative integers $k_1, \dots, k_n$ with $k_1 + \dots + k_n = n$, the probability $P\{\eta_1 = k_1, \dots, \eta_n = k_n\}$ is equal to the sum of the probabilities $P\{\sigma_1 = s_1, \dots, \sigma_n = s_n\} = n^{-n}$, where among $s_1, \dots, s_n$ there are exactly $k_r$ values equal to $r$, $r = 1, 2, \dots, n$. The number of summands in this sum is obviously $n!/(k_1! \cdots k_n!)$; therefore

\[ P\{\eta_1 = k_1, \dots, \eta_n = k_n\} = \frac{n!}{k_1! \cdots k_n!\, n^n}. \]

Thus the joint distribution of the multiplicities of the vertices $\eta_1, \dots, \eta_n$ of a random mapping is the multinomial distribution. Taking the vertices as cells and the arcs going into these vertices as particles, we obtain the classical scheme of allocating $n$ particles to $n$ cells with multinomial distribution of the contents of the cells $\eta_1, \dots, \eta_n$. For the random variables $\eta_1, \dots, \eta_n$, relation (1.2.2) holds:

\[ P\{\eta_1 = k_1, \dots, \eta_n = k_n\} = P\{\xi_1 = k_1, \dots, \xi_n = k_n \mid \xi_1 + \dots + \xi_n = n\}, \]

in which $\xi_1, \dots, \xi_n$ are independent and identically Poisson-distributed. The number of vertices $\mu_r(n)$ in a random mapping with multiplicity $r$ corresponds to the number of cells containing exactly $r$ particles in the classical scheme of allocating $n$ particles to $n$ cells; to study these variables, as well as the order statistics made up of the multiplicities of the vertices, one can invoke Lemmas 1.2.1, 1.2.2, and 1.2.3.
Example 1.2.2. Consider all distinct partitions of $n$ into $N$ summands not less than $r > 0$. The number of such partitions is $\binom{n - (r-1)N - 1}{N - 1}$. Let us define the uniform distribution on the set of these partitions by assigning the probability $\binom{n - (r-1)N - 1}{N - 1}^{-1}$ to each partition $n = n_1 + \dots + n_N$, $n_1, \dots, n_N \ge r$. Then $n$ can be written in the form

\[ n = \eta_1 + \dots + \eta_N, \]

where the summands $\eta_1, \dots, \eta_N$ are random variables. If $n_1, \dots, n_N \ge r$ and $n_1 + \dots + n_N = n$, then

\[ P\{\eta_1 = n_1, \dots, \eta_N = n_N\} = \binom{n - (r-1)N - 1}{N - 1}^{-1}. \]
The generalized scheme of allocation corresponding to this combinatorial problem is obtained if we use the geometric distribution for the random variables $\xi_1, \dots, \xi_N$:

\[ P\{\xi_1 = k\} = p^{k-r}(1 - p), \qquad k = r, r+1, \dots, \quad 0 < p < 1. \]

Indeed, as is easily verified,

\[ P\{\xi_1 = n_1, \dots, \xi_N = n_N \mid \xi_1 + \dots + \xi_N = n\} = \binom{n - (r-1)N - 1}{N - 1}^{-1}, \]

since, for geometrically distributed summands,

\[ P\{\xi_1 + \dots + \xi_N = n\} = \binom{n - (r-1)N - 1}{N - 1} p^{n - Nr}(1 - p)^N. \]
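Both claims of Example 1.2.2 are easy to verify by enumeration: the number of representations with parts at least $r$ matches the binomial coefficient, and under geometric summands every representation has the same probability, so the conditional law given the sum is uniform. The values of $n$, $N$, $r$, $p$ below are arbitrary.

```python
import math
from itertools import product

# Check of Example 1.2.2 on a small instance.
n, N, r, p = 12, 4, 2, 0.6

reps = [ks for ks in product(range(r, n + 1), repeat=N) if sum(ks) == n]
count = math.comb(n - (r - 1) * N - 1, N - 1)

# Probability of a single representation under P{xi = k} = p^(k-r)(1-p):
ws = [math.prod(p ** (k - r) * (1 - p) for k in ks) for ks in reps]
spread = max(ws) - min(ws)      # zero up to rounding: all weights coincide
print(len(reps), count, spread)
```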
Example 1.2.3. Note that it is not necessary for the random variables $\xi_1, \dots, \xi_N$ in a generalized scheme to be identically distributed. Consider the following example. Draw $n$ balls at random without replacement from an urn containing $m_i$ balls of the $i$th color, $i = 1, \dots, N$. Let $\eta_i$ denote the number of balls drawn of the $i$th color, $i = 1, \dots, N$. It is easily seen that for nonnegative integers $n_1, \dots, n_N$ such that $n_1 + \dots + n_N = n$,

\[ P\{\eta_1 = n_1, \dots, \eta_N = n_N\} = \frac{\binom{m_1}{n_1} \cdots \binom{m_N}{n_N}}{\binom{m}{n}}, \]

where $m = m_1 + \dots + m_N$. If in the generalized scheme of allocation the random variables $\xi_1, \dots, \xi_N$ have the binomial distributions

\[ P\{\xi_i = k\} = \binom{m_i}{k} p^k (1 - p)^{m_i - k}, \]

where $0 < p < 1$ and $k = 0, 1, \dots, m_i$, $i = 1, \dots, N$, then

\[ P\{\xi_1 = n_1, \dots, \xi_N = n_N \mid \xi_1 + \dots + \xi_N = n\} = \frac{\binom{m_1}{n_1} \cdots \binom{m_N}{n_N}}{\binom{m}{n}}, \]

and the distribution of the random variables $\eta_1, \dots, \eta_N$ coincides with the conditional distribution of the independent random variables $\xi_1, \dots, \xi_N$ under the condition $\xi_1 + \dots + \xi_N = n$. Thus $\eta_1, \dots, \eta_N$ may be viewed as the contents of cells in the generalized scheme of allocation, in which the random variables $\xi_1, \dots, \xi_N$ have different binomial distributions.
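The identity of Example 1.2.3 (the conditional law of independent binomials given the sum is the multivariate hypergeometric law) can be checked exactly on a small urn; the color counts and the parameter below are arbitrary.

```python
import math
from itertools import product

# Check of Example 1.2.3: conditional binomials versus hypergeometric law.
m = [3, 4, 5]          # color counts (arbitrary)
n, p = 6, 0.37         # sample size and binomial parameter (arbitrary)
M = sum(m)

def binom(mi, k):
    return math.comb(mi, k) * p ** k * (1 - p) ** (mi - k)

valid = [ks for ks in product(range(n + 1), repeat=len(m))
         if sum(ks) == n and all(k <= mi for k, mi in zip(ks, m))]
p_sum = sum(math.prod(binom(mi, k) for mi, k in zip(m, ks)) for ks in valid)

err = 0.0
for ks in valid:
    conditional = math.prod(binom(mi, k) for mi, k in zip(m, ks)) / p_sum
    hypergeom = math.prod(math.comb(mi, k) for mi, k in zip(m, ks)) / math.comb(M, n)
    err = max(err, abs(conditional - hypergeom))
print(err)
```

As with the Poisson case, the result does not depend on `p`, which is why the parameter of the binomial laws may be chosen freely.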
Example 1.2.4. In a sense, the graph $\Gamma_n$ of a random mapping consists of trees. Indeed, the graph can be naturally decomposed into connected components. Clearly, each connected component of the graph $\Gamma_n$ contains exactly one cycle. Vertices in the cycle are called cyclic. If we remove the arcs joining the cyclic vertices, then the graph turns into a forest, that is, a graph consisting of rooted trees. Recall that a rooted tree with $n + 1$ vertices is a connected undirected graph without cycles, with one special vertex called the root, and with $n$ nonroot labeled vertices. A rooted tree with $n + 1$ vertices has $n$ edges. In what follows, we view all edges of trees as directed away from the root, and the multiplicity of a vertex of a tree is defined as the number of edges emanating from it.
Let $T$ denote the set of all rooted trees with $n + 1$ vertices whose roots are labeled zero, and the $n$ nonroot vertices are labeled $1, 2, \dots, n$. The number of elements of the set $T$ is equal to $(n + 1)^{n-1}$.

A forest with $N$ roots and $n$ nonroot vertices is a graph all of whose components are trees. The roots of these trees are labeled with $1, \dots, N$ and the nonroot vertices with $1, \dots, n$. We denote the set of all such forests by $T_{n,N}$. The number of elements in the set $T_{n,N}$ is $N(n + N)^{n-1}$. The number of forests in which the $k$th tree contains $n_k$ nonroot vertices, $k = 1, 2, \dots, N$, is

\[ \frac{n!}{n_1! \cdots n_N!}\,(n_1 + 1)^{n_1 - 1} \cdots (n_N + 1)^{n_N - 1}, \]

where the factor $n!/(n_1! \cdots n_N!)$ is the number of partitions of the $n$ vertices into $N$ ordered groups, and $(n_k + 1)^{n_k - 1}$ is the number of trees that can be constructed from the $k$th group of vertices of each partition. Then

\[ \sum_{n_1 + \dots + n_N = n} \frac{n!}{n_1! \cdots n_N!}\,(n_1 + 1)^{n_1 - 1} \cdots (n_N + 1)^{n_N - 1} = N(n + N)^{n-1}, \qquad (1.2.9) \]

where the summation is taken over nonnegative integers $n_1, \dots, n_N$ such that $n_1 + \dots + n_N = n$.
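Identity (1.2.9) can be confirmed by brute force for small parameter values, using exact integer arithmetic throughout.

```python
import math
from itertools import product

# Brute-force check of identity (1.2.9): summing the number of forests with
# prescribed tree sizes over all size vectors gives N*(n + N)^(n-1).
ok = True
for n, N in [(1, 1), (3, 2), (5, 3), (6, 4)]:
    total = sum(math.factorial(n) // math.prod(math.factorial(k) for k in ks)
                * math.prod((k + 1) ** k // (k + 1) for k in ks)  # (k+1)^(k-1)
                for ks in product(range(n + 1), repeat=N) if sum(ks) == n)
    ok = ok and total == N * (n + N) ** (n - 1)
print(ok)
```

The expression `(k + 1) ** k // (k + 1)` computes $(k+1)^{k-1}$ while staying in integer arithmetic, including the boundary case $k = 0$, where the factor equals 1.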
Next, we define the uniform distribution on $T_{n,N}$. Let $\eta_k$ denote the number of nonroot vertices in the $k$th tree of a random forest from $T_{n,N}$, $k = 1, \dots, N$. For the random variables $\eta_1, \dots, \eta_N$, we have

\[ P\{\eta_1 = n_1, \dots, \eta_N = n_N\} = \frac{n!\,(n_1 + 1)^{n_1} \cdots (n_N + 1)^{n_N}}{N(n + N)^{n-1}(n_1 + 1)! \cdots (n_N + 1)!}, \qquad (1.2.10) \]

where $n_1, \dots, n_N$ are nonnegative integers and $n_1 + \dots + n_N = n$. Let us consider independent identically distributed random variables $\xi_1, \dots, \xi_N$ for which

\[ P\{\xi_1 = k\} = \frac{(k + 1)^k x^k e^{-\theta(x)}}{(k + 1)!}, \qquad k = 0, 1, \dots, \qquad (1.2.11) \]

where the parameter $x$ lies in the interval $0 < x < e^{-1}$ and the function $\theta(x)$ is
defined as

\[ \theta(x) = \sum_{k=1}^{\infty} \frac{k^{k-1}}{k!}\, x^k. \]

By using (1.2.9), we easily obtain

\[ P\{\xi_1 + \dots + \xi_N = n\} = \sum_{n_1 + \dots + n_N = n} \frac{(n_1 + 1)^{n_1} \cdots (n_N + 1)^{n_N}}{(n_1 + 1)! \cdots (n_N + 1)!}\, x^n e^{-N\theta(x)} = \frac{N(n + N)^{n-1}}{n!}\, x^n e^{-N\theta(x)}; \]
hence, for any $x$, $0 < x < e^{-1}$, and for nonnegative integers $n_1, \dots, n_N$ such that $n_1 + \dots + n_N = n$,

\[ P\{\xi_1 = n_1, \dots, \xi_N = n_N \mid \xi_1 + \dots + \xi_N = n\} = \frac{n!\,(n_1 + 1)^{n_1} \cdots (n_N + 1)^{n_N}}{N(n + N)^{n-1}(n_1 + 1)! \cdots (n_N + 1)!}. \qquad (1.2.12) \]

The right-hand sides of (1.2.10) and (1.2.12) are identical, and the joint distribution of $\eta_1, \dots, \eta_N$ coincides with the distribution of $\xi_1, \dots, \xi_N$ under the condition that $\xi_1 + \dots + \xi_N = n$. Thus, for the random variables $\eta_1, \dots, \eta_N$ and $\xi_1, \dots, \xi_N$, relation (1.2.2) holds, enabling us to study tree sizes in a random forest by using the generalized scheme of allocating particles into cells, with the random variables $\xi_1, \dots, \xi_N$ having the distribution given by (1.2.11).
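That (1.2.11) really is a probability distribution rests on the functional equation $\theta(x) = x e^{\theta(x)}$ satisfied by the tree series $\theta$, since then $\sum_{k \ge 0} (k+1)^k x^k/(k+1)! = \theta(x)/x = e^{\theta(x)}$. Both facts are easy to check numerically; the point $x = 0.2$ is arbitrary in $(0, e^{-1})$.

```python
import math

# Numerical check that (1.2.11) defines a probability distribution.
x = 0.2
theta = sum(k ** (k - 1) * x ** k / math.factorial(k) for k in range(1, 80))
total = sum((k + 1) ** k * x ** k / math.factorial(k + 1)
            for k in range(80)) * math.exp(-theta)
print(theta - x * math.exp(theta), total)   # both should be near 0 and 1
```

The truncation at 80 terms is far more than enough here, because the terms decay geometrically like $(ex)^k$ for $x < e^{-1}$.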
1.3. Connectivity of graphs and the generalized scheme

Not pretending to give an exhaustive solution, let us describe a rather general model of a random graph by using the generalized scheme of allocation. Consider the set of all graphs $\Gamma_n(R)$ with $n$ labeled vertices possessing a property $R$. We assume that connectivity is defined for the graphs from this set and that each graph is represented as a union of its connected components. In the formal treatment that follows, it may be helpful to keep in mind the graphs of random mappings or of random permutations. The former graphs consist of components that are connected directed graphs with exactly one cycle, whereas the latter graphs consist only of cycles.

Let $a_n$ denote the number of graphs in the set $\Gamma_n(R)$ and let $b_n$ be the number of connected graphs in $\Gamma_n(R)$. We denote by $\Gamma_{n,N}(R)$ the subset of graphs in $\Gamma_n(R)$ with exactly $N$ connected components. Note that the components of a graph in $\Gamma_{n,N}(R)$ are unordered, and hence we can consider only the symmetric characteristics that do not depend on the order of the components. To avoid this restriction, we instead consider the set $\bar{\Gamma}_{n,N}(R)$ of combinatorial objects constructed by means of all possible orderings of the components of each graph from
$\Gamma_{n,N}(R)$. The elements of this set are ordered collections of $N$ components, each of which is a connected graph possessing the property $R$, and the total number of vertices in the components is equal to $n$. Since the vertices of a graph in $\Gamma_{n,N}(R)$ are labeled, all the connected components of the graph are distinct; therefore the number of elements in $\bar{\Gamma}_{n,N}(R)$ is equal to $N!\,a_{n,N}$, where $a_{n,N}$ is the number of elements of the set $\Gamma_{n,N}(R)$ consisting of the unordered collections of components.

Now let us impose a restriction on the property $R$ of graphs. Let a graph possess the property $R$ if and only if the property holds for each connected component: The property $R$ is then called decomposable. Set $a_0 = 1$, $b_0 = 0$ and introduce the generating functions

\[ A(x) = \sum_{n=0}^{\infty} \frac{a_n x^n}{n!}, \qquad B(x) = \sum_{n=0}^{\infty} \frac{b_n x^n}{n!}. \]

Lemma 1.3.1. If the property $R$ is decomposable, then

\[ a_{n,N} = \frac{n!}{N!} \sum_{n_1 + \dots + n_N = n} \frac{b_{n_1} \cdots b_{n_N}}{n_1! \cdots n_N!}, \qquad (1.3.1) \]

where the summation is taken over nonnegative integers $n_1, \dots, n_N$ such that $n_1 + \dots + n_N = n$.

Proof. Let $\bar{a}_n(n_1, \dots, n_N)$ denote the number of graphs in $\bar{\Gamma}_{n,N}(R)$ with ordered components of sizes $n_1, \dots, n_N$. We construct all such graphs as follows: We decompose the $n$ labeled vertices into $N$ groups so that there are $n_i$ vertices in the $i$th group, $i = 1, \dots, N$; this can be done in $n!/(n_1! \cdots n_N!)$ ways. From the $n_i$ vertices of the $i$th group, we construct a connected graph possessing the property $R$; this can be done in $b_{n_i}$ ways. Thus the number of ordered sets of connected components of sizes $n_1, \dots, n_N$ is

\[ \bar{a}_n(n_1, \dots, n_N) = \frac{n!\, b_{n_1} \cdots b_{n_N}}{n_1! \cdots n_N!}. \qquad (1.3.2) \]

Since $N$ components can be ordered in $N!$ ways, the number $a_n(n_1, \dots, n_N)$ of unordered sets, that is, the number of graphs in $\Gamma_{n,N}(R)$ having exactly $N$ components of sizes $n_1, \dots, n_N$, is

\[ a_n(n_1, \dots, n_N) = \frac{1}{N!}\, \frac{n!\, b_{n_1} \cdots b_{n_N}}{n_1! \cdots n_N!}, \]

and summation over all such size vectors yields (1.3.1).

Lemma 1.3.2. If the property $R$ is decomposable, then

\[ A(x) = e^{B(x)}. \]
Proof. As follows from (1.3.1), the number $a_n$ of all graphs in $\Gamma_n(R)$ is

\[ a_n = \sum_{N=1}^{n} \frac{n!}{N!} \sum_{n_1 + \dots + n_N = n} \frac{b_{n_1} \cdots b_{n_N}}{n_1! \cdots n_N!}. \qquad (1.3.3) \]

By dividing both sides of this equality by $n!$, multiplying by $x^n$, and summing over $n$, we get the chain of equalities

\[ A(x) - 1 = \sum_{n=1}^{\infty} \frac{a_n x^n}{n!} = \sum_{n=1}^{\infty} \sum_{N=1}^{n} \frac{1}{N!} \sum_{n_1 + \dots + n_N = n} \frac{b_{n_1} x^{n_1} \cdots b_{n_N} x^{n_N}}{n_1! \cdots n_N!} = \sum_{N=1}^{\infty} \frac{1}{N!} \left( \sum_{n=1}^{\infty} \frac{b_n x^n}{n!} \right)^N = e^{B(x)} - 1, \]

which proves the lemma.

Let us define the uniform distribution on the set $\Gamma_n(R)$ and consider the random variables $\alpha_m$ equal to the number of components of size $m$ in a random graph from $\Gamma_n(R)$. The total number of components $\nu_n$ of a random graph from $\Gamma_n(R)$ is related to these variables by $\nu_n = \alpha_1 + \dots + \alpha_n$. Arrange the components in order of nondecreasing sizes and denote by $\beta_m$ the size of the $m$th component in the ordered series; if $m > \nu_n$, set $\beta_m = 0$.

We will also consider the random variables defined on the set $\bar{\Gamma}_{n,N}(R)$ of ordered sets of $N$ components. The ordered components labeled with the numbers from 1 to $N$ play the role of cells in the generalized scheme of allocating particles. Define the uniform distribution on $\bar{\Gamma}_{n,N}(R)$ and denote by $\eta_1, \dots, \eta_N$ the sizes of the ordered connected components of a random element of $\bar{\Gamma}_{n,N}(R)$. It is then clear that

\[ P\{\eta_1 = n_1, \dots, \eta_N = n_N\} = \frac{N!\, a_n(n_1, \dots, n_N)}{N!\, a_{n,N}} = \frac{a_n(n_1, \dots, n_N)}{a_{n,N}}. \qquad (1.3.4) \]

Theorem 1.3.1.
If the series

\[ B(x) = \sum_{n=0}^{\infty} \frac{b_n x^n}{n!} \qquad (1.3.5) \]

has a nonzero radius of convergence, then the random variables $\eta_1, \dots, \eta_N$ are the contents of cells in the generalized scheme of allocation in which the independent identically distributed random variables $\xi_1, \dots, \xi_N$ have the distribution

\[ P\{\xi_1 = k\} = \frac{b_k x^k}{k!\, B(x)}, \qquad (1.3.6) \]

where the positive value $x$ from the domain of convergence of (1.3.5) may be taken arbitrarily.
Proof. Let us find the conditional joint distribution of the random variables $\xi_1, \dots, \xi_N$ with distribution (1.3.6) under the condition $\xi_1 + \dots + \xi_N = n$. For such random variables,

\[ P\{\xi_1 + \dots + \xi_N = n\} = \frac{x^n}{(B(x))^N} \sum_{n_1 + \dots + n_N = n} \frac{b_{n_1} \cdots b_{n_N}}{n_1! \cdots n_N!}, \qquad (1.3.7) \]

and by virtue of (1.3.1),

\[ P\{\xi_1 + \dots + \xi_N = n\} = \frac{N!\, a_{n,N}\, x^n}{n!\,(B(x))^N}. \qquad (1.3.8) \]

Hence, if $n_1, \dots, n_N \ge 1$ and $n_1 + \dots + n_N = n$, then

\[ P\{\xi_1 = n_1, \dots, \xi_N = n_N \mid \xi_1 + \dots + \xi_N = n\} = \frac{b_{n_1} \cdots b_{n_N} x^n}{n_1! \cdots n_N!\,(B(x))^N} \cdot \frac{n!\,(B(x))^N}{N!\, a_{n,N}\, x^n} = \frac{b_{n_1} \cdots b_{n_N}\, n!}{n_1! \cdots n_N!\, N!\, a_{n,N}}, \]

and according to (1.3.2),

\[ P\{\xi_1 = n_1, \dots, \xi_N = n_N \mid \xi_1 + \dots + \xi_N = n\} = \frac{a_n(n_1, \dots, n_N)}{a_{n,N}}. \qquad (1.3.9) \]

From (1.3.4) and (1.3.9), we obtain the relation (1.2.2) between the random variables $\eta_1, \dots, \eta_N$ and $\xi_1, \dots, \xi_N$ in the generalized scheme of allocating particles to cells.

In the generalized scheme of allocating particles, we usually study the random variables $\mu_r(n, N)$ equal to the number of cells containing exactly $r$ particles and the order statistics $\eta_{(1)}, \eta_{(2)}, \dots, \eta_{(N)}$ obtained by arranging the contents of the cells in nondecreasing order. In this case, $\mu_r(n, N)$ is the number of components of size $r$, and $\eta_{(1)}, \eta_{(2)}, \dots, \eta_{(N)}$ are the sizes of the components of a random element from $\bar{\Gamma}_{n,N}(R)$ arranged in nondecreasing order. These random variables help in studying the distributions of the random variables $\alpha_1, \dots, \alpha_n$ and the associated variables defined on the set $\Gamma_n(R)$ of all graphs possessing the property $R$.
Lemma 1.3.3. For any positive $x$ from the domain of convergence of (1.3.5),

\[ P\{\nu_n = N\} = \frac{n!\,(B(x))^N}{N!\, a_n x^n}\, P\{\xi_1 + \dots + \xi_N = n\}. \qquad (1.3.10) \]

Proof. Relation (1.3.10) follows from (1.3.8) because $P\{\nu_n = N\} = a_{n,N}/a_n$ by definition.

It is clear by virtue of (1.3.3) that the number $a_n$ can also be expressed in terms of probabilities related to $\xi_1, \dots, \xi_N$:

\[ a_n = \sum_{N=1}^{\infty} \frac{n!\,(B(x))^N}{N!\, x^n}\, P\{\xi_1 + \dots + \xi_N = n\}. \qquad (1.3.11) \]

Lemma 1.3.4. For any nonnegative integers $N, m_1, \dots, m_n$,

\[ P\{\alpha_1 = m_1, \dots, \alpha_n = m_n \mid \nu_n = N\} = P\{\mu_1(n, N) = m_1, \dots, \mu_n(n, N) = m_n\}. \]
Proof. The conditional distribution on $\Gamma_n(R)$ under the condition $\nu_n = N$ is concentrated on the set $\Gamma_{n,N}(R)$ of graphs having exactly $N$ connected components and is uniform on this set. Hence,

\[ P\{\alpha_1 = m_1, \dots, \alpha_n = m_n \mid \nu_n = N\} = \frac{c_N(m_1, \dots, m_n)}{a_{n,N}}, \qquad (1.3.12) \]

where $a_{n,N}$ is the number of elements in $\Gamma_{n,N}(R)$ and $c_N(m_1, \dots, m_n)$ is the number of graphs in $\Gamma_{n,N}(R)$ such that the number of components of size $r$ is $m_r$, $r = 1, 2, \dots, n$.

Consider the above set $\bar{\Gamma}_{n,N}(R)$ composed of ordered sets of $N$ components. Let $\bar{c}_N(m_1, \dots, m_n)$ denote the number of elements in $\bar{\Gamma}_{n,N}(R)$ such that the number of components of size $r$ is $m_r$, $r = 1, 2, \dots, n$. It is clear that

\[ P\{\mu_1(n, N) = m_1, \dots, \mu_n(n, N) = m_n\} = \frac{\bar{c}_N(m_1, \dots, m_n)}{\bar{a}_{n,N}}, \qquad (1.3.13) \]

where $\bar{a}_{n,N}$ is the number of elements in $\bar{\Gamma}_{n,N}(R)$. The assertion of the lemma follows from (1.3.12) and (1.3.13) because $\bar{a}_{n,N} = N!\, a_{n,N}$ and $\bar{c}_N(m_1, \dots, m_n) = N!\, c_N(m_1, \dots, m_n)$.

Thus, if the series (1.3.5) has a nonzero radius of convergence, then all of the random variables expressed in terms of $\alpha_1, \dots, \alpha_n$ can be studied by using the generalized scheme of allocating particles in which the random variables $\xi_1, \dots, \xi_N$ have the distribution (1.3.6). Roughly speaking, under the condition that the number $\nu_n$ of connected components of the graph $\Gamma_n(R)$ is $N$, the sizes of these components (under a random ordering) have the same joint distribution as the random variables
$\eta_1, \dots, \eta_N$ in the generalized scheme of allocating particles that are defined by the independent random variables $\xi_1, \dots, \xi_N$ with distribution (1.3.6). Thus, for $\nu_n = N$ the random variables $\beta_1, \dots, \beta_N$ are expressed in terms of $\alpha_1, \dots, \alpha_n$ in exactly the same way as the order statistics $\eta_{(1)}, \dots, \eta_{(N)}$ in the generalized scheme of allocating particles are expressed in terms of $\mu_1(n, N), \dots, \mu_n(n, N)$. Hence, Lemma 1.3.4 implies the following assertion.

Lemma 1.3.5. For any nonnegative integers $N, k_1, \dots, k_N$,

\[ P\{\beta_1 = k_1, \dots, \beta_N = k_N \mid \nu_n = N\} = P\{\eta_{(1)} = k_1, \dots, \eta_{(N)} = k_N\}. \qquad (1.3.14) \]
We now consider the joint distribution of $\mu_1(n, N), \dots, \mu_n(n, N)$.

Lemma 1.3.6. For nonnegative integers $m_1, \dots, m_n$ such that $m_1 + \dots + m_n = N$ and $m_1 + 2m_2 + \dots + nm_n = n$,

\[ P\{\mu_1(n, N) = m_1, \dots, \mu_n(n, N) = m_n\} = \frac{n!\, b_1^{m_1} \cdots b_n^{m_n}}{m_1! \cdots m_n!\,(1!)^{m_1} \cdots (n!)^{m_n}\, a_n\, P\{\nu_n = N\}}. \qquad (1.3.15) \]

Proof. To obtain (1.3.15), it suffices to calculate $\bar{c}_N(m_1, \dots, m_n)$ in (1.3.13). It is clear that

\[ \bar{c}_N(m_1, \dots, m_n) = \sum \bar{a}_n(n_1, \dots, n_N), \]

where the summation is taken over all sets $(n_1, \dots, n_N)$ containing the element $r$ exactly $m_r$ times, $r = 1, \dots, n$. The number of such sets is $N!/(m_1! \cdots m_n!)$, and for each of them, by (1.3.2),

\[ \bar{a}_n(n_1, \dots, n_N) = \frac{n!\, b_1^{m_1} \cdots b_n^{m_n}}{(1!)^{m_1} \cdots (n!)^{m_n}}. \]

Hence,

\[ \bar{c}_N(m_1, \dots, m_n) = \frac{N!\, n!\, b_1^{m_1} \cdots b_n^{m_n}}{m_1! \cdots m_n!\,(1!)^{m_1} \cdots (n!)^{m_n}}. \]

To obtain formula (1.3.15), it remains to note that

\[ \bar{a}_{n,N} = N!\, a_{n,N} = N!\, a_n\, P\{\nu_n = N\}. \]
Lemmas 1.3.4 and 1.3.6 enable us to express the joint distribution of the random variables $\alpha_1, \dots, \alpha_n$ in a random graph from $\Gamma_n(R)$.
Lemma 1.3.7. If $m_1, \dots, m_n$ are nonnegative integers, then

\[ P\{\alpha_1 = m_1, \dots, \alpha_n = m_n\} = \begin{cases} \dfrac{n!}{a_n} \displaystyle\prod_{r=1}^{n} \dfrac{b_r^{m_r}}{m_r!\,(r!)^{m_r}} & \text{if } \sum_{r=1}^{n} r m_r = n, \\ 0 & \text{otherwise.} \end{cases} \]

Proof. By the total probability formula,

\[ P\{\alpha_1 = m_1, \dots, \alpha_n = m_n\} = \sum_{k=1}^{n} P\{\nu_n = k\}\, P\{\alpha_1 = m_1, \dots, \alpha_n = m_n \mid \nu_n = k\} = P\{\nu_n = N\}\, P\{\alpha_1 = m_1, \dots, \alpha_n = m_n \mid \nu_n = N\}, \]

where $N = m_1 + \dots + m_n$. By using Lemma 1.3.4, we find that

\[ P\{\alpha_1 = m_1, \dots, \alpha_n = m_n\} = P\{\nu_n = N\}\, P\{\mu_1(n, N) = m_1, \dots, \mu_n(n, N) = m_n\}. \qquad (1.3.16) \]

It remains to note that $P\{\mu_1(n, N) = m_1, \dots, \mu_n(n, N) = m_n\} = 0$ if $m_1 + 2m_2 + \dots + nm_n \ne n$ and that equality (1.3.15) from Lemma 1.3.6 holds for the probability $P\{\mu_1(n, N) = m_1, \dots, \mu_n(n, N) = m_n\}$ if $m_1 + \dots + m_n = N$ and $m_1 + 2m_2 + \dots + nm_n = n$. The substitution of (1.3.15) into (1.3.16) proves Lemma 1.3.7.

We now turn to some examples.
Example 1.3.1. The set S_n of one-to-one mappings corresponds to the set \Gamma_n(R) of graphs with n vertices for which we have the property R: graphs are directed with exactly one arc entering each vertex and exactly one arc emanating from each vertex. This property is decomposable. The connected components of such a graph are (directed) cycles. In this case, a_n = n!, b_n = (n-1)!, and the generating functions

A(x) = \frac{1}{1-x}, \qquad B(x) = \log\frac{1}{1-x}

satisfy the relations of Lemma 1.3.2:

A(x) = e^{B(x)}.    (1.3.17)

To study the lengths of cycles of a random permutation and the associated variables, one can use the generalized scheme of allocating particles in which the random variables \xi_1, \dots, \xi_N have the distribution

P\{\xi_1 = k\} = \frac{x^k}{k \log(1/(1-x))}, \qquad k = 1, 2, \dots, \quad 0 < x < 1.
1.3 Connectivity of graphs and the generalized scheme
Example 1.3.2. The set E_n of all single-valued mappings corresponds to the set \Gamma_n(R) of graphs with n vertices with property R: the graphs are directed with exactly one arc emanating from each vertex. This property is decomposable. Since the number of elements of E_n is n^n, from relation (1.3.17) for the generating functions we find that

B(x) = \log A(x) = \log \sum_{n=0}^{\infty} \frac{n^n x^n}{n!},

yielding

b_n = (n-1)! \sum_{k=0}^{n-1} \frac{n^k}{k!}.

The radius of convergence of A(x) and B(x) is e^{-1}, and at the point x = e^{-1} they diverge. To study the characteristics of a random mapping, we can use the generalized scheme of allocating particles in which the random variables \xi_1, \dots, \xi_N have the distribution

P\{\xi_1 = k\} = \frac{b_k x^k}{k!\, B(x)}, \qquad k = 1, 2, \dots, \quad 0 < x < e^{-1}.
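The formula for b_n, the number of connected mappings, can be checked by brute force for small n; the sketch below (ours) counts functional graphs on n labeled points whose graph is connected:

```python
# Verify b_n = (n-1)! * sum_{k=0}^{n-1} n^k / k! against a brute-force count
# of connected functional graphs (single-valued mappings) on n points.
from itertools import product
from math import factorial

def b_formula(n):
    # (n-1)!/k! is an integer for k <= n-1, so the sum stays exact
    return sum((factorial(n - 1) // factorial(k)) * n ** k for k in range(n))

def connected_count(n):
    cnt = 0
    for f in product(range(n), repeat=n):
        parent = list(range(n))
        def find(a):
            while parent[a] != a:
                parent[a] = parent[parent[a]]
                a = parent[a]
            return a
        # union the edges i -> f(i)
        for i in range(n):
            ra, rb = find(i), find(f[i])
            if ra != rb:
                parent[ra] = rb
        if all(find(i) == find(0) for i in range(n)):
            cnt += 1
    return cnt

for n in range(1, 7):
    assert connected_count(n) == b_formula(n)
print([b_formula(n) for n in range(1, 7)])  # [1, 3, 17, 142, 1569, 21576]
```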
Example 1.3.3. Consider the set of all unordered partitions of the set X_n = \{1, 2, \dots, n\} into disjoint subsets, the union of which is X_n. The partition of X_n into unordered subsets Y_1, \dots, Y_N corresponds to the hypergraph of \Gamma_{n,N}(R) with n vertices and N hyperedges Y_1, \dots, Y_N. Since all of the N! orderings of the hyperedges Y_1, \dots, Y_N are distinct, each hypergraph of \Gamma_{n,N}(R) gives us N! distinct objects of \bar\Gamma_{n,N}(R), that is, hypergraphs with n vertices and N ordered hyperedges A_1, \dots, A_N, the sets of hyperedges being permutations of Y_1, \dots, Y_N. The property R determining this class of graphs requires that a graph be a hypergraph whose distinct hyperedges have no common vertices. Each connected component of such a graph is a hyperedge. Clearly, the number of connected graphs possessing the property R with n vertices is 1, that is, b_n = 1, so

B(x) = \sum_{n=1}^{\infty} \frac{x^n}{n!} = e^x - 1.

Since R is decomposable,

A(x) = e^{e^x - 1}.

This equality, or (1.3.3), yields

\frac{a_n}{n!} = \sum_{N=1}^{n} \frac{1}{N!} \sum_{n_1 + \dots + n_N = n} \frac{1}{n_1! \cdots n_N!},

where the second summation is over positive integers n_1, \dots, n_N.
Thus, to study random partitions, we can use the generalized scheme of allocation in which the random variables \xi_1, \dots, \xi_N have the truncated Poisson distribution

P\{\xi_1 = k\} = \frac{x^k}{k!\,(e^x - 1)}, \qquad k = 1, 2, \dots, \quad 0 < x < \infty.
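Here a_n is the nth Bell number, the number of partitions of an n-set. A quick exact check (ours, not from the book) of the double-sum formula against the standard Bell recurrence:

```python
# Check a_n/n! = sum_{N=1}^n (1/N!) sum_{n_1+...+n_N=n} 1/(n_1!...n_N!)
# against the Bell-number recurrence B_{m+1} = sum_k C(m,k) B_k.
from fractions import Fraction
from math import factorial, comb

def a_from_formula(n):
    e = [Fraction(0)] + [Fraction(1, factorial(j)) for j in range(1, n + 1)]
    total = Fraction(0)
    power = [Fraction(1)] + [Fraction(0)] * n   # identity for convolution
    for N in range(1, n + 1):
        # power becomes the N-fold convolution of the vector e_j = 1/j!
        power = [sum(power[i] * e[j - i] for i in range(j + 1))
                 for j in range(n + 1)]
        total += power[n] / factorial(N)
    return total * factorial(n)

def bell(n):
    B = [1]
    for m in range(n):
        B.append(sum(comb(m, k) * B[k] for k in range(m + 1)))
    return B[n]

for n in range(1, 9):
    assert a_from_formula(n) == bell(n)
print([bell(n) for n in range(1, 9)])  # [1, 2, 5, 15, 52, 203, 877, 4140]
```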
Example 1.3.4. A tree is a connected graph without cycles. As the set \Gamma_{n,N}(R), let us consider the set \mathcal{F}_{n,N} of all forests consisting of N trees with the total number n of labeled vertices. The trees in a forest are not ordered. The property R determining this class of graphs requires that a graph be undirected without cycles. The property R is decomposable. The number b_n of connected graphs possessing the property R is the number of nonrooted trees with n vertices, b_n = n^{n-2}, so the generating function is

B(x) = \sum_{n=1}^{\infty} \frac{n^{n-2} x^n}{n!}.

Thus, to study a random forest from \mathcal{F}_{n,N}, we can use the generalized scheme of allocation in which the random variables \xi_1, \dots, \xi_N have the distribution

P\{\xi_1 = k\} = \frac{k^{k-2} x^k}{k!\, B(x)}, \qquad k = 1, 2, \dots, \quad 0 < x < e^{-1}.
1.4. Forests of nonrooted trees

The graphs consisting of nonrooted trees and unicyclic components play the same role in investigating graphs as the forests of rooted trees do for graphs of mappings. Hence, the following sections concentrate on these objects, using the generalized scheme of allocation.

As in Example 1.3.4, let \mathcal{F}_{n,N} be the set of all forests of N nonrooted trees with n vertices. It is known that the number of forests of N ordered rooted trees with total number n of nonroot vertices is N(N+n)^{n-1}. In contrast to the forests of rooted trees, there is no simple formula for the number F_{n,N} = |\mathcal{F}_{n,N}| of forests of nonrooted trees. Therefore the first step is to study the asymptotic behavior of F_{n,N}.

Denote by T the number of edges in a forest belonging to \mathcal{F}_{n,N}. It is easy to see that T = n - N. Following the general algorithm for applying the generalized scheme of allocation, let us consider the set \bar{\mathcal{F}}_{n,N}, which consists of forests of N ordered nonrooted trees, and define the uniform distribution on this set. Denote by \eta_1, \dots, \eta_N the sizes of the ordered trees in a random graph from \bar{\mathcal{F}}_{n,N}. By Cayley's formula for counting trees, the number b_n of nonrooted trees with n vertices is n^{n-2}. Denote by \bar a_n(n_1, \dots, n_N) the number of elements in \bar{\mathcal{F}}_{n,N} for which \{\eta_1 = n_1, \dots, \eta_N = n_N\}. It is easy to see that for positive integers n_1, \dots, n_N with n_1 + \dots + n_N = n,

\bar a_n(n_1, \dots, n_N) = \frac{n!}{n_1! \cdots n_N!}\, b_{n_1} \cdots b_{n_N},    (1.4.1)

and the number of elements in \bar{\mathcal{F}}_{n,N} is

\sum_{n_1 + \dots + n_N = n} \bar a_n(n_1, \dots, n_N) = \sum_{n_1 + \dots + n_N = n} \frac{n!\, b_{n_1} \cdots b_{n_N}}{n_1! \cdots n_N!}.

Thus, for the number of forests F_{n,N}, we obtain the formula

F_{n,N} = \frac{n!}{N!} \sum_{n_1 + \dots + n_N = n} \frac{n_1^{n_1-2} \cdots n_N^{n_N-2}}{n_1! \cdots n_N!},    (1.4.2)

where the summation is over positive integers n_1, \dots, n_N such that n_1 + \dots + n_N = n.

Introduce independent identically distributed random variables \xi_1, \dots, \xi_N for which

P\{\xi_1 = k\} = \frac{b_k x^k}{k!\, B(x)} = \frac{k^{k-2} x^k}{k!\, B(x)}, \qquad k = 1, 2, \dots,    (1.4.3)

where

B(x) = \sum_{k=1}^{\infty} \frac{b_k x^k}{k!} = \sum_{k=1}^{\infty} \frac{k^{k-2} x^k}{k!}, \qquad 0 < x < e^{-1}.    (1.4.4)

In accordance with the results of the previous section and Example 1.3.4, the generalized scheme of allocation can be applied to investigating random forests of nonrooted trees, that is, relation (1.2.2) is valid: for any integers n_1, \dots, n_N,

P\{\eta_1 = n_1, \dots, \eta_N = n_N\} = P\{\xi_1 = n_1, \dots, \xi_N = n_N \mid \xi_1 + \dots + \xi_N = n\}.

For the number of forests F_{n,N}, formula (1.3.8) is valid, which, of course, can be obtained directly from (1.4.2) and (1.4.3):

F_{n,N} = \frac{n!\,(B(x))^N}{N!\, x^n}\, P\{\zeta_N = n\},    (1.4.5)

where \zeta_N = \xi_1 + \dots + \xi_N, B(x) is defined by (1.4.4), and the value of the parameter x in the distribution (1.4.3) of the random variables \xi_1, \dots, \xi_N can be chosen arbitrarily from the domain of convergence of the series B(x). Thus, to obtain the asymptotics of F_{n,N}, it is sufficient to choose an appropriate value of x, 0 < x < e^{-1}, and analyze the asymptotic behavior of the probability P\{\zeta_N = n\} for the sum of the random variables \xi_1, \dots, \xi_N that have the distribution (1.4.3) with the chosen value of the parameter x.
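Formula (1.4.2) is easy to evaluate exactly for small parameters. The sketch below (ours, not from the book) computes the composition sum by convolution with exact rationals and cross-checks it against a brute-force count of forests:

```python
# Check (1.4.2): F_{n,N} = (n!/N!) * sum over compositions n_1+...+n_N = n
# of prod n_i^{n_i - 2} / n_i!, against brute-force enumeration of forests.
from fractions import Fraction
from itertools import combinations
from math import factorial

def w(k):  # b_k / k! = k^{k-2}/k!
    return Fraction(k ** (k - 2), factorial(k)) if k >= 2 else Fraction(1)

def F_formula(n, N):
    power = [Fraction(1)] + [Fraction(0)] * n
    for _ in range(N):   # N-fold convolution of the weights w(1), ..., w(n)
        power = [sum(power[j - k] * w(k) for k in range(1, j + 1))
                 for j in range(n + 1)]
    return power[n] * factorial(n) / factorial(N)

def F_brute(n, N):
    T = n - N
    edges = [(i, j) for i in range(n) for j in range(i + 1, n)]
    count = 0
    for subset in combinations(edges, T):
        parent = list(range(n))
        def find(a):
            while parent[a] != a:
                parent[a] = parent[parent[a]]
                a = parent[a]
            return a
        acyclic = True
        for a, b in subset:
            ra, rb = find(a), find(b)
            if ra == rb:
                acyclic = False   # the new edge closes a cycle
                break
            parent[ra] = rb
        if acyclic:   # T = n - N acyclic edges => exactly N tree components
            count += 1
    return count

for n, N in [(4, 2), (5, 2), (5, 3), (6, 3)]:
    assert F_formula(n, N) == F_brute(n, N)
print(F_formula(6, 3))  # 435
```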
The first two moments of the random variable \xi_1 have the following expressions:

E\xi_1 = \frac{1}{B(x)} \sum_{k=1}^{\infty} \frac{k^{k-1} x^k}{k!}, \qquad
E\xi_1^2 = \frac{1}{B(x)} \sum_{k=1}^{\infty} \frac{k^{k} x^k}{k!}.

Therefore, along with B(x), we consider the two functions

a(x) = \sum_{k=1}^{\infty} \frac{k^k x^k}{k!}, \qquad
\theta(x) = \sum_{k=1}^{\infty} \frac{k^{k-1} x^k}{k!}.

The function \theta(x) is the solution of the equation

\theta e^{-\theta} = x    (1.4.6)

if we choose the solution that is less than 1. The functions a(x) and B(x) can be represented in terms of this function. Differentiating (1.4.6) gives

\theta'(x) e^{-\theta(x)} - \theta(x)\theta'(x) e^{-\theta(x)} = 1;

hence,

\theta'(x) = \frac{\theta(x)}{x(1 - \theta(x))}.    (1.4.7)

On the other hand,

x\theta'(x) = \sum_{k=1}^{\infty} \frac{k^k x^k}{k!} = a(x).

Thus

a(x) = \frac{\theta(x)}{1 - \theta(x)}.    (1.4.8)

Slightly more complicated calculations are needed to obtain the relation

B(x) = \frac{1}{2}\left(1 - (1 - \theta(x))^2\right).    (1.4.9)
Consider the function

h(x) = (1 - \theta(x))^2.

By using (1.4.7), we obtain

h'(x) = -2(1 - \theta(x))\theta'(x) = -\frac{2\theta(x)}{x} = -2\sum_{k=1}^{\infty} \frac{k^{k-1} x^{k-1}}{k!}.

When we integrate both sides of this equality, we obtain

\int_0^x h'(t)\,dt = h(x) - 1 = -2\sum_{k=1}^{\infty} \frac{k^{k-1}}{k!} \int_0^x t^{k-1}\,dt = -2\sum_{k=1}^{\infty} \frac{k^{k-2} x^k}{k!} = -2B(x),

which implies equality (1.4.9). Relations (1.4.8) and (1.4.9) allow us to calculate the mean E\xi_1 and the variance D\xi_1.
For 0 < \theta < 1, we set x = \theta e^{-\theta}. For such a choice of the parameter x, \theta(x) = \theta,

a(x) = \frac{\theta}{1 - \theta}, \qquad B(x) = \frac{\theta(2 - \theta)}{2};

therefore

m = E\xi_1 = \frac{\theta(x)}{B(x)} = \frac{2}{2 - \theta},
\qquad
\sigma^2 = D\xi_1 = \frac{a(x)}{B(x)} - \left(\frac{\theta(x)}{B(x)}\right)^2 = \frac{2\theta}{(1 - \theta)(2 - \theta)^2}.
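These closed forms are easy to verify numerically. A small sketch (ours, not from the book): \theta(x) is obtained by iterating t \leftarrow x e^t, and the series are summed directly and compared with (1.4.6)-(1.4.9) and with the formulas for m and \sigma^2:

```python
# Numerical check of (1.4.6)-(1.4.9) and of m = 2/(2-theta),
# sigma^2 = 2*theta/((1-theta)(2-theta)^2), for one value of theta.
import math

def series(x, shift, terms=400):
    # sum_{k>=1} k^(k+shift) x^k / k!  (shift = -2: B, -1: theta, 0: a)
    return sum(math.exp((k + shift) * math.log(k) + k * math.log(x)
                        - math.lgamma(k + 1)) for k in range(1, terms))

def theta_of(x, iters=200):
    t = x
    for _ in range(iters):
        t = x * math.exp(t)    # fixed point of t = x e^t, branch < 1
    return t

theta = 0.6
x = theta * math.exp(-theta)           # (1.4.6): theta e^{-theta} = x

th, a, B = series(x, -1), series(x, 0), series(x, -2)
assert abs(th - theta) < 1e-9                         # theta(x) = theta
assert abs(a - theta / (1 - theta)) < 1e-9            # (1.4.8)
assert abs(B - theta * (2 - theta) / 2) < 1e-9        # (1.4.9)

m = th / B                                            # E xi_1
second = a / B                                        # E xi_1^2
assert abs(m - 2 / (2 - theta)) < 1e-9
assert abs(second - m * m
           - 2 * theta / ((1 - theta) * (2 - theta) ** 2)) < 1e-9
```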
If the parameter \theta is fixed, then Theorem 1.1.11 may be applied to the sum \zeta_N = \xi_1 + \dots + \xi_N. In fact, the theorem on local convergence to the normal law is valid in a wider region.

Theorem 1.4.1. If N \to \infty and \theta = \theta(N) varies such that \theta N \to \infty and (1 - \theta)^3 N \to \infty, then

P\{\zeta_N = k\} = \frac{1}{\sigma\sqrt{2\pi N}}\, e^{-u^2/2}\,(1 + o(1))

uniformly in the integers k such that u = (k - Nm)/(\sigma\sqrt{N}) lies in any fixed finite interval.

Proof. First we prove that, under the conditions of the theorem, the distribution of (\zeta_N - mN)/(\sigma\sqrt{N}) converges weakly to the normal distribution with parameters (0, 1). According to Theorem 1.1.9, it is sufficient to demonstrate convergence of the corresponding characteristic function \varphi_N(t) to the characteristic function e^{-t^2/2} of the standard normal distribution.
The characteristic function of \xi_1 equals

\varphi(t) = \frac{1}{B(x)} \sum_{k=1}^{\infty} \frac{k^{k-2} x^k e^{itk}}{k!} = \frac{B(xe^{it})}{B(x)}.

By virtue of (1.4.7), (1.4.8), and (1.4.9),

B(x) = \frac{1}{2}\left(1 - (1 - \theta(x))^2\right), \qquad xB'(x) = \theta(x),
x^2 B''(x) = \theta^2(x)(1 - \theta(x))^{-1}, \qquad x^3 B'''(x) = \theta^3(x)(3 - 2\theta(x))(1 - \theta(x))^{-3}.    (1.4.10)

For x = \theta e^{-\theta}, \theta(x) = \theta and

B(x) = \frac{\theta(2 - \theta)}{2}.

Denote by \psi(t) the characteristic function of the centered random variable \xi_1 - \theta(x)/B(x). Then

\psi'(0) = 0, \qquad \psi''(0) = -\sigma^2 = -\frac{2\theta}{(1 - \theta)(2 - \theta)^2}.

Let

g(t) = \log\psi(t).

It is not difficult to check that

g'''(t) = \frac{2i\,\theta(xe^{it})\left(2\theta^2(xe^{it}) - \theta(xe^{it}) - 2\right)}{\left(1 - \theta(xe^{it})\right)^3\left(2 - \theta(xe^{it})\right)^2}.

Therefore, if x = \theta e^{-\theta}, then there exists a constant c such that

|g'''(t)| \le \frac{c}{(1 - \theta)^3},    (1.4.11)

and

\psi(t) = e^{g(t)} = \exp\left\{-\frac{\sigma^2 t^2}{2} + O\left(\frac{|t|^3}{(1 - \theta)^3}\right)\right\}.    (1.4.12)

The characteristic function \varphi_N(t) of the random variable (\zeta_N - mN)/(\sigma\sqrt{N}) satisfies the equality \varphi_N(t) = \psi^N(t/(\sigma\sqrt{N})); hence, for any fixed t, as N \to \infty,

\varphi_N(t) = \exp\left\{-\frac{t^2}{2} + O\left(\frac{1}{\sqrt{\theta(1 - \theta)^3 N}}\right)\right\}.    (1.4.13)

The conditions of the theorem specify that N\theta \to \infty and N(1 - \theta)^3 \to \infty; hence, for any fixed t, as N \to \infty,

\varphi_N(t) \to e^{-t^2/2},

and the distribution of (\zeta_N - mN)/(\sigma\sqrt{N}) converges weakly to the standard normal law.

To prove the local convergence of these distributions, we need additional estimates of the characteristic function \varphi(t). It is reasonable to assume that the local theorem is valid in the same regions as the integral theorem proved above, but the necessary estimates are complicated to find, and therefore we restrict ourselves to a proof of the local theorem only in the case where \theta \le \theta_0 < 1 and \theta N \to \infty.
From (1.4.12), it follows that there exists \varepsilon > 0 such that for |t| \le \varepsilon and x = \theta e^{-\theta},

|\psi(t)| \le e^{-c\sigma^2 t^2}.    (1.4.14)

We now show that for any \varepsilon, 0 < \varepsilon < \pi, there exists a positive constant c such that for \varepsilon \le |t| \le \pi,

|\varphi(t)| \le e^{-c\theta}.    (1.4.15)

If \theta \to 0, then

x = \theta e^{-\theta} = \theta - \theta^2 + O(\theta^3),

\varphi(t) = \frac{B(xe^{it})}{B(x)} = \frac{xe^{it} + x^2 e^{2it}/2 + O(\theta^3)}{\theta(1 - \theta/2) + O(\theta^3)} = e^{it} + (e^{2it} - e^{it})\theta/2 + O(\theta^2).

Now

\left|e^{it} + (e^{2it} - e^{it})\theta/2\right|^2 = 1 - 2\theta\sin^2(t/2) + O(\theta^2)

as \theta \to 0; therefore

|\varphi(t)| = 1 - \theta\sin^2(t/2) + O(\theta^2)

uniformly in t, and for \varepsilon \le |t| \le \pi there exist \delta > 0 and c_1 > 0 such that

|\varphi(t)| \le e^{-c_1\theta}    (1.4.16)

for \theta \le \delta. For any \theta, 0 < \theta < 1, the distribution of \xi_1 has maximal span 1, and \varphi(t) is continuous in t and \theta in the region

B = \{(t, \theta): \varepsilon \le |t| \le \pi,\ \delta \le \theta \le \theta_0\};

hence,

q = \sup_B |\varphi(t)| < 1,

and there exists c_2 > 0 such that

|\varphi(t)| \le e^{-c_2\theta}    (1.4.17)

for (t, \theta) \in B. This estimate and (1.4.16) imply (1.4.15).

Proving the local theorem, we follow the proof of Theorem 1.1.11 as a model for similar proofs. We set
u = \frac{k - mN}{\sigma\sqrt{N}}, \qquad P_N(k) = P\{\zeta_N = k\},

and represent the difference

R_N = 2\pi\left(\sigma\sqrt{N}\,P_N(k) - \frac{1}{\sqrt{2\pi}}\,e^{-u^2/2}\right)

as a sum of the following four integrals:

I_1 = \int_{-A}^{A} e^{-itu}\left(\psi^N(t/(\sigma\sqrt{N})) - e^{-t^2/2}\right)dt,
I_2 = -\int_{|t| > A} e^{-itu - t^2/2}\,dt,
I_3 = \int_{A \le |t| \le \varepsilon\sigma\sqrt{N}} e^{-itu}\,\psi^N(t/(\sigma\sqrt{N}))\,dt,
I_4 = \int_{\varepsilon\sigma\sqrt{N} \le |t| \le \pi\sigma\sqrt{N}} e^{-itu}\,\psi^N(t/(\sigma\sqrt{N}))\,dt,

where the constants A and \varepsilon will be chosen later. To see that R_N \to 0 as N \to \infty, we show that R_N can be made arbitrarily small by the choice of \varepsilon, A, and N. It is clear that

|I_2| \le \int_{|t| > A} e^{-t^2/2}\,dt,

and |I_2| can be made arbitrarily small by choosing a sufficiently large A.

Choose \varepsilon > 0 such that estimate (1.4.14) is fulfilled. Then, for \theta \le \theta_0 < 1 and |t| \le \varepsilon\sigma\sqrt{N},

|\psi(t/(\sigma\sqrt{N}))| \le e^{-ct^2/N},

so that

|I_3| \le \int_{|t| > A} e^{-ct^2}\,dt,

and |I_3| can be made arbitrarily small by the choice of a sufficiently large A. For fixed A, the integral I_1 tends to zero because \psi^N(t/(\sigma\sqrt{N})) \to e^{-t^2/2} uniformly with respect to t in any finite interval. Finally, with the help of estimate (1.4.15) and the equality |\psi(t)| = |\varphi(t)|, we obtain that as N \to \infty,

|I_4| \le \sigma\sqrt{N} \int_{\varepsilon \le |t| \le \pi} |\varphi(t)|^N\,dt \le 2\pi\sigma\sqrt{N}\,e^{-c\theta N} \to 0,

since \theta N \to \infty. Thus R_N can be made arbitrarily small, and the theorem is proved.
Denote by p(u; \alpha, \beta) the density of the stable law with parameters \alpha and \beta in Zolotarev's parameterization (see [60]). If \alpha \ne 1, the characteristic function f(t) of this distribution can be represented in the form

f(t) = \exp\left\{-|t|^{\alpha} \exp\left\{-\frac{i\pi}{2} K(\alpha)\beta\,\frac{t}{|t|}\right\}\right\},

where K(\alpha) = 1 - |1 - \alpha|. By the inversion formula,

p(u; \alpha, \beta) = \frac{1}{2\pi}\int_{-\infty}^{\infty} e^{-itu} \exp\left\{-|t|^{\alpha} \exp\left\{-\frac{i\pi}{2} K(\alpha)\beta\,\frac{t}{|t|}\right\}\right\}dt.    (1.4.18)

If N \to \infty and \theta = 1, then the distribution of (\zeta_N - 2N)/(bN^{2/3}), where b = 2(2/3)^{2/3}, is approximated by the stable distribution with parameters \alpha = 3/2, \beta = -1.

Theorem 1.4.2. If N \to \infty, \theta = 1, and b = 2(2/3)^{2/3}, then

bN^{2/3}\,P\{\zeta_N = n\} = p(u; 3/2, -1)(1 + o(1))

uniformly in the integers n such that u = (n - 2N)/(bN^{2/3}) lies in any fixed finite interval.
Proof. The terms of the sum \zeta_N = \xi_1 + \dots + \xi_N are independent identically distributed random variables, and for \theta = 1,

P\{\xi_1 = k\} = \frac{2k^{k-2} e^{-k}}{k!}, \qquad k = 1, 2, \dots,    (1.4.19)

and E\xi_1 = 2, since \theta(e^{-1}) = 1 and B(e^{-1}) = 1/2. The maximal span of the distribution is 1; therefore, by Theorem 1.1.10, it suffices to prove that the distribution of (\zeta_N - 2N)/(bN^{2/3}) converges weakly to the stable law given in the theorem.

In addition to \theta(x), a(x), and B(x) defined above, we consider the function

C(z) = \sum_{k=1}^{\infty} \frac{k^{k-3} z^k}{k!}, \qquad |z| \le e^{-1}.

This can be expressed in terms of \theta(z): let

g(z) = (1 - \theta(z))^3.

By using the equalities

z\theta'(z) = \frac{\theta(z)}{1 - \theta(z)}, \qquad B(z) = \frac{1}{2}\left(1 - (1 - \theta(z))^2\right),

we easily obtain

zg'(z) = -3\theta(z) + 3\theta^2(z) = 3\theta(z) - 6B(z).

Integration then gives

\int_0^z g'(u)\,du = g(z) - 1 = 3\int_0^z \frac{\theta(u)}{u}\,du - 6\int_0^z \frac{B(u)}{u}\,du = 3B(z) - 6C(z).

Expressing B(z) in terms of \theta(z) demonstrates that, for |z| \le e^{-1},

C(z) = \frac{5}{12} - \frac{1}{4}(1 - \theta(z))^2 - \frac{1}{6}(1 - \theta(z))^3.

Since \theta(e^{-1}) = 1, we find that C(e^{-1}) = 5/12. Set

u(z) = 1 - \theta(z), \qquad v(z) = C(z) - C(e^{-1}).

We have shown that

v(z) = -\frac{1}{4}u^2(z) - \frac{1}{6}u^3(z).

If we invert this expression, we obtain two formal solutions

u(z) = \pm 2i\sqrt{v(z)} + \frac{4}{3}v(z) + O(|v(z)|^{3/2});

since u(x) > 0 and v(x) < 0 for 0 < x < e^{-1}, we choose the solution

u(z) = -2i\sqrt{v(z)} + \frac{4}{3}v(z) + O(|v(z)|^{3/2}).    (1.4.20)

Hence,

(1 - \theta(z))^2 = u^2(z) = -4v(z) - \frac{16i}{3}(v(z))^{3/2} + O(|v(z)|^2).    (1.4.21)

The first two derivatives of C(z) are

C'(z) = \sum_{k=1}^{\infty} \frac{k^{k-2} z^{k-1}}{k!} = \frac{B(z)}{z}, \qquad C''(z) = \frac{\theta(z) - B(z)}{z^2}.

Therefore, for real t,

C(e^{-1+it}) - C(e^{-1}) = \frac{it}{2} + O(t^2).    (1.4.22)

Now we find an expression for the characteristic function \varphi(t) of the random variable \xi_1 with distribution (1.4.19). It is clear that

\varphi(t) = \frac{B(e^{-1+it})}{B(e^{-1})}.

From (1.4.20), (1.4.21), and (1.4.22), we find that for z = e^{it-1},

\varphi(t) = 1 - (1 - \theta(z))^2 = 1 + 4v(z) + \frac{16i}{3}(v(z))^{3/2} + O(|v(z)|^2).

By virtue of the equality

(it)^{3/2} = |t|^{3/2}\exp\left\{\frac{3\pi i t}{4|t|}\right\},

we can rewrite the last relation as

\varphi(t) = 1 + 2it - |bt|^{3/2}\exp\left\{\frac{i\pi t}{4|t|}\right\} + O(t^2),

where b = 2(2/3)^{2/3}. Since

e^{-2it} = 1 - 2it + O(t^2)

as t \to 0, we find that

\psi(t) = e^{-2it}\varphi(t) = 1 - |bt|^{3/2}\exp\left\{\frac{i\pi t}{4|t|}\right\} + O(t^2).

The characteristic function of the random variable (\zeta_N - 2N)/(bN^{2/3}) is

\psi^N(t/(bN^{2/3})) = \left(1 - \frac{|t|^{3/2}}{N}\exp\left\{\frac{i\pi t}{4|t|}\right\} + O(t^2 N^{-4/3})\right)^N,

and it converges to

f(t) = \exp\left\{-|t|^{3/2}\exp\left\{\frac{i\pi t}{4|t|}\right\}\right\}

at any fixed t. The function f(t) is the characteristic function of the stable law p(u; \alpha, \beta) with parameters \alpha = 3/2, \beta = -1. Therefore, according to Theorem 1.1.10, as N \to \infty,

bN^{2/3}\,P\{\zeta_N = k\} - p(u; 3/2, -1) \to 0

uniformly in k, where u = (k - 2N)/(bN^{2/3}). The function p(u; 3/2, -1) is positive for any u; hence,

bN^{2/3}\,P\{\zeta_N = k\} = p(u; 3/2, -1)(1 + o(1))

uniformly in k such that u = (k - 2N)/(bN^{2/3}) lies in any fixed finite interval.
We now turn to the estimate of the number of forests F_{n,N} with n vertices, N trees, and T = n - N edges. Theorems 1.4.1 and 1.4.2 allow us to estimate this number.

Theorem 1.4.3. If n \to \infty and \theta = 2T/n varies such that \theta N \to \infty and N(1 - \theta)^3 \to \infty, then

F_{n,N} = \frac{n^{2T}\sqrt{1 - \theta}}{2^T\,T!}\,(1 + o(1)).    (1.4.23)
Proof. Put

\theta = 2T/n, \qquad x = \theta e^{-\theta}.    (1.4.24)

By virtue of (1.4.5),

F_{n,N} = \frac{n!\,(B(x))^N}{N!\,x^n}\,P\{\zeta_N = n\},    (1.4.25)

where the parameters are chosen so that

B(x) = \frac{1}{2}\left(1 - (1 - \theta)^2\right) = \frac{\theta(2 - \theta)}{2} = \frac{2TN}{n^2}.    (1.4.26)

Since m = E\xi_1 = 2/(2 - \theta) = n/N, by Theorem 1.4.1,

P\{\zeta_N = n\} = \frac{1}{\sigma\sqrt{2\pi N}}\,(1 + o(1)),    (1.4.27)

where

\sigma^2 = \frac{2\theta}{(1 - \theta)(2 - \theta)^2} = \frac{nT}{(1 - \theta)N^2}.

If we substitute (1.4.24), (1.4.26), and (1.4.27) into (1.4.25), we can conclude that under the conditions of the theorem,

F_{n,N} = \frac{n!\,(B(x))^N}{N!\,x^n}\sqrt{\frac{N(1 - \theta)}{2\pi nT}}\,(1 + o(1)) = \frac{n^{2T}\sqrt{1 - \theta}}{2^T\,T^T e^{-T}\sqrt{2\pi T}}\,(1 + o(1)),

which coincides with (1.4.23), since T! = T^T e^{-T}\sqrt{2\pi T}\,(1 + o(1)) by Stirling's formula.
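As a rough numerical illustration (ours, not from the book), the exact value of F_{n,N} from (1.4.2), evaluated in log scale, can be compared with the right-hand side of (1.4.23); the tolerance below is deliberately loose because the o(1) term decays slowly:

```python
# Compare the exact F_{n,N} (formula (1.4.2), via an N-fold convolution in
# log-friendly floating point) with the asymptotics (1.4.23).
import math

def log_F_exact(n, N):
    T = n - N
    th = 2 * T / n
    x = th * math.exp(-th)
    # v[k] = k^{k-2} x^k / k!
    v = [0.0] + [math.exp((k - 2) * math.log(k) + k * math.log(x)
                          - math.lgamma(k + 1)) for k in range(1, n + 1)]
    power = [1.0] + [0.0] * n          # N-fold convolution of v
    for _ in range(N):
        power = [sum(power[i] * v[j - i] for i in range(j + 1))
                 for j in range(n + 1)]
    # F = n!/(N! x^n) * power[n], i.e. (1.4.5) with the sum written out
    return (math.lgamma(n + 1) - math.lgamma(N + 1)
            - n * math.log(x) + math.log(power[n]))

n, N = 200, 150                        # theta = 2T/n = 0.5
T = n - N
th = 2 * T / n
log_approx = (2 * T * math.log(n) + 0.5 * math.log(1 - th)
              - T * math.log(2) - math.lgamma(T + 1))
assert abs(log_F_exact(n, N) - log_approx) < 0.3
```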
Theorem 1.4.4. If n \to \infty and 2T/n \to 1 so that

(1 - 2T/n)N^{1/3} \to bv/2, \qquad -\infty < v < \infty,

then

F_{n,N} = \frac{\sqrt{\pi}\,n^n}{N!\,2^N N^{1/6}(2/3)^{2/3}}\,p(-v; 3/2, -1)(1 + o(1)).    (1.4.28)

Proof. Under the conditions of the theorem,

u = \frac{n - 2N}{bN^{2/3}} = -\frac{n}{bN}\,(1 - 2T/n)N^{1/3} \to -v,

since n/N \to 2; thus, by Theorem 1.4.2 and the continuity and positivity of the density p(u; 3/2, -1),

bN^{2/3}\,P\{\zeta_N = n\} = p(-v; 3/2, -1)(1 + o(1)).    (1.4.29)

We chose \theta = 1 in Theorem 1.4.2; hence, x = e^{-1} and B(x) = B(e^{-1}) = 1/2. Having substituted these values and (1.4.29) into (1.4.25), we conclude that, under the conditions of Theorem 1.4.4,

F_{n,N} = \frac{n!\,e^n}{N!\,2^N\,bN^{2/3}}\,p(-v; 3/2, -1)(1 + o(1)) = \frac{\sqrt{\pi}\,n^n}{N!\,2^N N^{1/6}(2/3)^{2/3}}\,p(-v; 3/2, -1)(1 + o(1)).
Although the density p(x; 3/2, -1) cannot be represented in terms of simple functions, we can use the relation p(x; \alpha, \beta) = p(-x; \alpha, -\beta) and the following series expansion, valid for x > 0 and 1 < \alpha < 2:

p(x; \alpha, \beta) = \frac{1}{\pi}\sum_{n=0}^{\infty} \frac{(-1)^n\,\Gamma((n+1)/\alpha)}{\alpha\,n!}\,\sin\left(\frac{\pi(n+1)}{2\alpha}\left(\alpha + \beta K(\alpha)\right)\right)x^n.
1.5. Trees of given sizes in a random forest

Let \mu_r = \mu_r(n, N) be the number of trees with r vertices in a random forest with n labeled vertices and N nonrooted trees, r = 1, 2, \dots. Recall that such a forest has T = n - N edges. In this section, we consider the asymptotic behavior of the random variables \mu_r(n, N). Following the approach established in the previous section, we use the generalized scheme of allocation of n particles to N cells determined by independent identically distributed random variables \xi_1, \dots, \xi_N such that

P\{\xi_1 = k\} = p_k = p_k(\theta) = \frac{2k^{k-2}\theta^{k-1}e^{-k\theta}}{k!\,(2 - \theta)}, \qquad k = 1, 2, \dots, \quad 0 < \theta < 2.

As we have calculated,

m = E\xi_1 = \frac{2}{2 - \theta},

and for 0 < \theta < 1,

\sigma^2 = \sigma^2(\theta) = D\xi_1 = \frac{2\theta}{(1 - \theta)(2 - \theta)^2}.

We will also use the notation

\sigma_{rr}^2 = \sigma_{rr}^2(\theta) = p_r\left(1 - p_r - \frac{(m - r)^2}{\sigma^2}\,p_r\right), \qquad r = 1, 2, \dots

The random variables \mu_r behave much like the corresponding variables for a random forest of rooted trees. We highlight some of these results; see [30] for a complete description. As before, let \theta = 2T/n. Again the value \theta = 1 is of particular interest, so we introduce the following notation: for r = 1, 2, \dots,

\pi_r = \pi_r(\theta) = \begin{cases} p_r(\theta), & 0 < \theta \le 1, \\ p_r(1), & 1 < \theta < 2, \end{cases}
\qquad
\bar\sigma_r^2 = \bar\sigma_r^2(\theta) = \begin{cases} \sigma_{rr}^2(\theta), & 0 < \theta \le 1, \\ p_r(1)(1 - p_r(1)), & 1 < \theta < 2. \end{cases}
The truncated values \pi_r(\theta) and \bar\sigma_r^2(\theta) allow us to summarize the rather complicated behavior of \mu_r, r \ge 3, in the following two theorems.

Theorem 1.5.1. If n, N \to \infty and r = r(n) \ge 3 varies such that N\pi_r(\theta) \to \infty, then

P\{\mu_r = k\} = \frac{1}{\bar\sigma_r(\theta)\sqrt{2\pi N}}\,e^{-u_r^2/2}\,(1 + o(1))

uniformly in the integers k such that

u_r = \frac{k - N\pi_r(\theta)}{\bar\sigma_r(\theta)N^{1/2}}

lies in any fixed finite interval.
Theorem 1.5.2. If n, N \to \infty and r = r(n) \ge 3 varies such that N\pi_r(\theta) \to \lambda for some \lambda, 0 < \lambda < \infty, then for any fixed k = 0, 1, \dots,

P\{\mu_r = k\} = \frac{\lambda^k e^{-\lambda}}{k!}\,(1 + o(1)).
The random variables \mu_1 and \mu_2, like their analogs for forests of rooted trees, have some special properties, but we will not discuss them. When edges are added sequentially to a forest, then by Theorems 1.5.1 and 1.5.2, the asymptotic behavior of \mu_r does not depend on \theta if \theta > 1. If Np_r(1) \to \infty, then the limit distribution of \mu_r, with similar centering and normalizing, is the standard normal distribution for all \theta, 1 < \theta < 2. There are similar results for the case \theta > 1 and Np_r(1) \to \lambda for some \lambda, 0 < \lambda < \infty, with the limit distribution of \mu_r for all \theta, 1 < \theta < 2, being the Poisson distribution with parameter \lambda. Thus the point \theta = 1 can be interpreted as a critical point in the evolution of a random forest. We now prove Theorems 1.5.1 and 1.5.2.
Proof of Theorems 1.5.1 and 1.5.2. According to Example 1.3.4 and Lemma 1.2.1,

P\{\mu_r = k\} = \binom{N}{k} p_r^k (1 - p_r)^{N-k}\,\frac{P\{\zeta_{N-k}^{(r)} = n - kr\}}{P\{\zeta_N = n\}},    (1.5.1)

where \zeta_N = \xi_1 + \dots + \xi_N and \zeta_{N-k}^{(r)} = \xi_1^{(r)} + \dots + \xi_{N-k}^{(r)}; the random variables \xi_1^{(r)}, \dots, \xi_{N-k}^{(r)} are independent and identically distributed,

p_r = P\{\xi_1 = r\} = \frac{r^{r-2} x^r}{r!\,B(x)}, \qquad B(x) = \sum_{k=1}^{\infty} \frac{k^{k-2} x^k}{k!},

P\{\xi_1^{(r)} = k\} = P\{\xi_1 = k \mid \xi_1 \ne r\},    (1.5.2)

and the parameter x of the distribution of \xi_1, \dots, \xi_N may be taken arbitrarily from the domain of convergence of the series B(x).
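For small n and N, representation (1.5.1) can be verified exactly: in the ratio the factors x and B(x) cancel, leaving only the weights k^{k-2}/k!. The sketch below (ours, not from the book) compares this x-free form with direct enumeration of forests:

```python
# Exact check of (1.5.1) in x-free form:
# P{mu_r = k} = C(N,k) w(r)^k S(n-kr, N-k; parts != r) / S(n, N),
# where S sums prod w(n_i) over compositions and w(k) = k^{k-2}/k!.
from fractions import Fraction
from itertools import combinations
from math import comb, factorial

def w(k):
    return Fraction(k ** (k - 2), factorial(k)) if k >= 2 else Fraction(1)

def S(n, N, skip=None):
    power = [Fraction(1)] + [Fraction(0)] * n
    for _ in range(N):
        power = [sum(power[j - k] * w(k)
                     for k in range(1, j + 1) if k != skip)
                 for j in range(n + 1)]
    return power[n]

def mu_formula(n, N, r, k):
    return comb(N, k) * w(r) ** k * S(n - k * r, N - k, skip=r) / S(n, N)

def mu_brute(n, N, r):
    T = n - N
    edges = [(i, j) for i in range(n) for j in range(i + 1, n)]
    dist, total = {}, 0
    for subset in combinations(edges, T):
        parent = list(range(n))
        def find(a):
            while parent[a] != a:
                parent[a] = parent[parent[a]]
                a = parent[a]
            return a
        acyclic = True
        for a, b in subset:
            ra, rb = find(a), find(b)
            if ra == rb:
                acyclic = False
                break
            parent[ra] = rb
        if not acyclic:
            continue
        total += 1
        sizes = {}
        for i in range(n):
            root = find(i)
            sizes[root] = sizes.get(root, 0) + 1
        cnt = sum(1 for s in sizes.values() if s == r)
        dist[cnt] = dist.get(cnt, 0) + 1
    return {c: Fraction(v, total) for c, v in dist.items()}

n, N, r = 6, 3, 2
brute = mu_brute(n, N, r)
for k in range(N + 1):
    if k * r <= n:
        assert mu_formula(n, N, r, k) == brute.get(k, Fraction(0))
```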
We set 0 = 2T/n. It is convenient to choose x = x = e-1 for 1 < 0 < 2. With these choices, (1.5.1) gives
P{µr=k}=(k)(01
(nr))k(-7rr(O))N-kP {
0e-9
for 0 < 0 < 1 and
N(r-)k=n-kr }
(1.5.3)
n}
where k} = 7rk(0),
k = 1,2,...,
girl and the distribution of is defined by (1.5.2). Reasoning by contradiction, we see that it is sufficient to prove Theorems 1.5.1 and 1.5.2 under the assumption that 0 lies in any of the following three domains: first, where NO oo and (1- 0)3N --+ oo; second, where (1- 0)3N is bounded
by an arbitrary constant; and, third, where (1 - 0)3N -+ -oo. Negating either theorem implies the existence of a subsequence of the parameters n, N such that 0 lies in one of these three domains for which the other conditions are satisfied but for which the conclusion is false. Therefore we assume that n, N -+ oo in such a way that 0 lies in one of the domains and prove the assertions of Theorems 1.5.1 and 1.5.2 in the corresponding three cases. Consider first Theorem 1.5.1 in the first domain of 0. By the de Moivre-Laplace theorem, the binomial distribution is approximated by a normal or Poisson distribution. More precisely, if N7rr, (0) -+ oo, then
(N
-Jrr(0))N-k
= 27rnn1(0)(11)
nr(0))e-ZZ/2
(1.5.4)
uniformly in k such that
z=
(k - N7rr(0))2 2N7rr(0)(1 -7rr(0))
lies in any fixed finite interval. The probability PRN = n } from the denominator of (1.5.3) has been estimated in the previous section. Applying Theorem 1.4.1, we have for 0 in the first domain,
n} =
1
a
0+00)),
(1.5.5)
where U2
20
= or 2(0) =
(1 -0)(2-0)
To find the asymptotics of the numerator of (1.5.3), we begin by calculating the
1.5 Trees of given sizes in a random forest
45
first and second moments:
It -
1-7Cr
r
a2(8) = D (r) = r
r
1
a2
(Jr(/i_r)2\
1 - TCr
(1 - 1Lr)a2
2/(2 - B).
where µ =
A proof similar to that of Theorem 1.4.1 shows that a normal approximation is valid for the sum N) _ fir) + + N). More precisely, if n, N oo such that
6N- ooand(1-9)3N--* oo,then s} =
1
e-(s-Nmr)2/(2v,2N)(1
ar 2nN
+o(1))
(1.5.6)
uniformly in r > 3 and s such that (s - Nmr)/(a,/N) lies in any fixed finite interval. We now use (1.5.6) with s = n -kr and N-k summands to obtain an asymptotic expression for n - kr}. Since
k=Nnr+ur rrfA, where 2
2
arr=arr(B)=Pr(l-pr-(µ-r)
2
2
we have
N-k=N(1-Pr)-ur(rrr'1N =N(1- Pr) 1-
urarr (I - pr)1,I
).
(1.5.7)
It is easy to see that arr/(1- Pr) is bounded, and for Ur lying in any finite interval,
N - k = N(1 - pr)(1 + O(N-1/2)). 1
(1.5.8)
The exponent in (1.5.6) may now be written as
u2- (n-kr-Nmr)2 2a,2(N-k)
Taking into account (1.5.7), (1.5.8), and the equalities
n=Nµ,
pr (µ - r)
mr - µ= 1-pr
mr-r=
r
1-Pr
which hold for 0 in the first domain, we obtain
k(mr - r) - N(mr - µ)
ar(N-k)'I'
r2(µ - r) (k - Npr)
aarr(N-k)'i'
- r) (k - Npr) (µ - r) p Pr +o(1)). (1 +o(1)) = a(l - pr)1/2arrJ7 a(I - pr)1/ZUr(1
46
The generalized scheme of allocation and the components of random graphs
Applying (1.5.7) gives
P{,N'-k
= n - kr} =
) e-pr(µ-r)2U21(2a2(1-pr))(l
I
+ o(1)).
yr 2n N(1 - Pr)
(1.5.9)
When we substitute (1.5.4), (1.5.5), and (1.5.9) into (1.5.3), we see that under the conditions of Theorem 1.5.1 with 9 in the first domain, this expression transforms into the product of an exponent and a coefficient. The coefficient of the exponent is
v 27rN 2irNpr(1 - pr)ar 2irN(1 - Pr) I V27rNpr(1
1
- pr)(1 - pr(µ - r)2/((1 - pr)Q2))
Qrr
2N
Combining the exponents from (1.5.4) and (1.5.9) yields the resulting exponent
(k - Npr)2 _ pr(tt - r)2(k - Npr)2 + 0(1) 2Npr(I - pr) 2Q2(l - pr)QrrN
(k - Npr)2 +0(l). 2o'rrN
Thus Theorem 1.5.1 is proved for 9 varying in the first domain. Under the conditions of Theorem 1.5.2, k is fixed, and when we apply (1.5.5) and (1.5.6) with the corresponding parameters, we obtain the ratio
P{,N)-k = n - kr}
1.
P{ = n}
Therefore the assertion of Theorem 1.5.2 follows from the Poisson approximation of the first factor in (1.5.3). In the second domain, we choose the parameter of the distribution of 1, ... , 4N
to be 1. If Npr (1) - oo, then (Nk)(Pr(l))k(l
1
-
pr(1))N-k =
e
_Z2/2
(1 +0(1))
2 NPr(1)(1 - Pr (1)) (1.5.10)
uniformly in k such that
z=
k - Npr(1) Npr(l)(1 - pr(1))
lies in any fixed finite interval. Applying Theorem 1.4.2 gives
bN2/3PRN = n} = p(u; 3/2, -1)(1 + o(1))
(1.5.11)
uniformly in n such that u = (n - 2N)/(bN2/3) lies in any fixed finite interval.
1.5 Trees of given sizes in a random forest
47
Restricting the random variables ir), ... , N) does not affect their maximum span and convergence to the stable law with density p(u; 3/2, -1). The only r)
difference is that now the mean of a summand is
= mr(1) = 2/(1 - pr(1)).
Therefore, as j -+ oo, bj213
bj213
P{ (1 - Pr('))213 XP
= 11 _
(1 - pr(1))213
{(r) - Jmr(1))(1 - pr(1))213
- I - jmr(1))(1 - r(1))2/3
bj213
bj213
= p(v; 3/2, -1)(1 + o(1)) uniformly in 1 such that v
_ (1 - jmr(1))(1 - Pr(1))2/3 b j213
lies in any fixed finite interval.
By substituting N - k for j and n - kr for 1 and recalling that
k = Npr(1) +z/Npr(1)(1 - pr (1)), where z is bounded, we have
bN2!3P{,N) k = n - kr} = p(u; 3/2, -1)(1 + o(1))
(1.5.12)
uniformly in r > 3, where, as in (1.5.11), u = (n - 2N)/(bN2"3), since V
(1 - jmr(1))(1 - pr(1))213
_ n - 2N +
bj213
bN213
(1). 0(1)'
Thus the asymptotics of n} and P1 (r )k = n - kr) is the same and their ratio in (1.5.3) tends to 1. Therefore the asymptotics of P{µr = k} is determined by the first factor and coincides with the asymptotics of the corresponding binomial probability. Theorems 1.5.1 and 1.5.2 have now been proved in the second domain. It remains to prove the theorems for the third domain, where (1 - 2T/n)3N -oo. We choose 6 = 1 in the distribution of the random variables i, ... , lv and prove that in (1.5.3) the ratio
P{,N'-k
=n-
n} - 1
(1.5.13)
-(1- pr(1)), where z lies in any uniformly in r, and k = Npr(1) + z Npr(1 fixed finite interval.
In this case, (1 - 2T/n)3N
-oo, so the values n for the sum N and the
values n - kr for the sum N-k lie in what is called the region of large deviations. Therefore we need to apply the theorem on large deviations. We will not give the
48
The generalized scheme of allocation and the components of random graphs
proof, but the main idea is simple: If the distribution of a sum of independent identically distributed integer-valued random variables with zero mean converges to a stable law with parameter a, 1 < a < 2, then the major contribution to a large deviation of the sum is made by only one of the summands (see [137]). Applying this theorem to the sum N gives the following result for B in the third domain. If n, N - oo such that N(1 - 2T/n)3 -) -oo, then
n} = P{t;N - 2N = n - 2N)
=
2 = n - 2N}(1 + o(1)) 2
n)
1/2
N (1 + 0(1)) (n - 2N)5/2
The theorem given in [137] cannot be applied to the sum since its summands become noninteger after centering by the expectation mr. Britikov, using the method given in [137], along with ideas from [58] and [113], proved in [30] that the probability n - kr} has the same asymptotics as PRN = n}.
More precisely, if n, N - oo such that N(1 - 2T/n)3 -k -oo, then
P{,i?k=n-kr)
=
(N - k)P{ lri - Mr = n - kr - (N - k)mr)
_ (N - k)P{4(r) = n - 2N + O(ff)} =
)1/2
\/
(27r
N (n - 2N)5/2
(1 + 0(1))
uniformly in r > 1 and k such that (k - Npr(1))/(Npr(1)(1 - pr(1)))1/2
lies in any fixed finite interval. Thus the ratio in (1.5.3) tends to 1, and the asymptotics of P{µr = k} is determined by the first factor and coincides with the asymptotics of the corresponding binomial probability. This proves Theorems 1.5.1 and 1.5.2 in the third domain. The proof of Theorems 1.5.1 and 1.5.2 is now complete.
1.6. Maximum size of trees in a random forest The results of the previous section give some information on the behavior of the maximum size 17(N) of trees in a random forest from Jr,,,N with T = n - N edges.
Indeed, if 0 = 2T/n -k 0 and there exists r = r(n, N) such that Npr(0) -' 00 and Npr+1 (B) -+ X, 0 < A < oo, then the distribution of the number µr of trees of size r approaches a normal distribution, and the distribution of µr+1 approaches
1.6 Maximum size of trees in a random forest
49
the Poisson distribution with parameter A. This implies that the limit distribution of the random variable i1(N) is concentrated on the points r and r + 1.
If 0 = 2T/n -k y, y > 0, then there are infinitely many r = r(n, N) such that the distribution of µr approaches a Poisson distribution; hence, the distribution of 11(N) is scattered more and more as y increases. If 0 < y < 1, then the limit distribution is concentrated on a countable set of integers, whereas if y > 1, then 71(N) must be normalized to have a limit distribution, and the normalizing values tend to infinity at different rates, depending on the region of 0. Thus, it should be possible to prove the limit theorems for 17(N) when T/n --* 0
by using results on µr from the previous section. But if 2T/n -* y for y > 0, this approach may not work, and even if it did, the proofs would not be simple.
Therefore we choose instead to use the approach based on the generalized scheme of allocation. Let l, ... , 4N be random variables with distribution 2rr-2er-le-rB pr(e) = P{41 = k}
=
0<0 <2,
'
r!(2 - 0)
(1.6.1)
where k = 1, 2. .... We choose 0 = 2T/n. Then, according to Lemma 1.2.2, P{ n(N)
< r } = (1 - Pr )N
P
(R)
=n n}j n
(1 . 6 . 2)
'
where
N=
N) .
with lri, ... , Ni being independent identically distributed random variables such that P{&(r)
k = 1, ..., r,
k 151 < r),
= k} =
(1.6.3)
and
r
Pr = Pr(e) =
r) = E pk(0).
(1.6.4)
k=1
We now state the theorems that completely describe the behavior of 77(N), deferring
their proofs. Our procedure follows Britikov [28]. Theorem 1.6.1. If n, N -+ oo, 0 = 2T/n -k 0, and the integers
r=r(n,N)> 1 vary such that Npr (0) - oo and Npr+1(0) -+ X for 0 < A < oo, then P{11(N) = r} = e-X +0(1),
P{17(N) = r + 1) = 1 - e-A + 0(1).
The generalized scheme of allocation and the components of random graphs
50
Note that if A 0 0 in the conditions of the theorem, then Npr (0) + 00 without any additional requirements. In particular, the conditions of the theorem are fulfilled if T/n(r-1)/r ,
0 < p < 00.
p,
Under this condition, Theorem 1.6.1 was proved by Erdds and Renyi [37], whose well-known paper provided the only results on the behavior of 17(N) until Britikov's work seventeen years later [28].
Theorem 1.6.2. If n, N -+ oo, 0 = 2T/n -* y, 0 < y < 1, then for any fixed integer k,
(Y - 1 - log Y)5/2 k+(a))(y-l-logy) e(
[a] < k} = exp
(ey-1 - y)
27r
0+00A
where
log n -
a
log log n
z 0-1-log9
and [a] and (a) denote, respectively, the integer and fractional parts of a.
00, 0 = 2T/n
Theorem 1.6.3. If n, N
1, and N(1 - 0)3 -k 00, then for
any fixed z,
P017(N) - U < z} where $
a-e z,
log(Be1-e) and u is the root of the equation
(2
/2
)
NN312 __ u5/2eu
Theorem 1.6.4. If n, N -+ 0o such that N1/3(1 - 2T/n) -+ v, -00 < v < 00, then for any fixed positive z,
)'Is (z, v),
00
P{ 11(N)
1
p(v; 3/2, -1)
=1
1s! \
\-
3
where b = 2(2/3)2/3, Is (w, Y) =
p(Y - xi - ... -xs; 3/2, -1)
I.
A = {(xl,
(xl ... xs)5/2
, xs): xj ? w, j =
dx ...dxs, 1
1, .. , s},
and p(y; 3/2, -1) is the density of the stable law with parameters a = 3/2,
,8=-1.
1.6 Maximum size of trees in a random forest
51
Theorem 1.6.5. If n, N -+oo, N(1 - 2T/n)3 -+-oo, then for any fixed z,
P
n-2N-t1(N)
z
f p(y; 3/2, -1) dy.
bN2/3
We will prove Theorems 1.6.1-1.6.5 with the help of relation (1.6.2). Under the conditions of Theorems 1.6.1-1.6.3,
$$\frac{P\{\zeta_N^{(r)} = n\}}{P\{\zeta_N = n\}} \to 1, \qquad (1.6.5)$$
where $\zeta_N = \xi_1 + \cdots + \xi_N$ and $\zeta_N^{(r)}$ is the corresponding sum of the variables truncated at $r$, and the limit distribution of $\eta(N)$ is the same as the limit distribution of the maximum of the random variables $\xi_1, \ldots, \xi_N$. Therefore we first obtain some auxiliary results on the asymptotic behavior of
$$P_r = P_r(\theta) = \sum_{k=r+1}^{\infty}p_k(\theta).$$
Lemma 1.6.1. If $n, N \to \infty$, $\theta = 2T/n \to 0$, and the integers $r = r(n, N) \ge 1$ vary such that $Np_r(\theta) \to \infty$ and $Np_{r+1}(\theta) \to \lambda$, $0 \le \lambda < \infty$, then
$$NP_{r-1} \to \infty, \qquad NP_r \to \lambda, \qquad NP_{r+1} \to 0.$$
Proof. Under the conditions of the lemma, $x = \theta e^{-\theta} \to 0$. It follows from (1.6.3) that
$$P_r = \sum_{s=1}^{\infty}p_{r+s}(\theta) = p_{r+1}(\theta)\Big(1 + \sum_{s=2}^{\infty}\frac{p_{r+s}(\theta)}{p_{r+1}(\theta)}\Big). \qquad (1.6.7)$$
Taking into account the bounds for factorials
$$\sqrt{2\pi}\,r^{r+1/2}e^{-r} \le r! \le \sqrt{2\pi}\,r^{r+1/2}e^{-r}e^{1/(12r)},$$
we find from (1.6.1) that
$$\frac{p_{r+s}(\theta)}{p_{r+1}(\theta)} \le c_1(xe)^{s-1}, \qquad (1.6.6)$$
where $c_1$ is a constant. Hence,
$$\sum_{s=2}^{\infty}\frac{p_{r+s}(\theta)}{p_{r+1}(\theta)} \le \frac{c_1 xe}{1 - xe} = o(1)$$
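The factorial bounds invoked here are the standard two-sided form of Stirling's formula; a quick numerical check (nothing in it is specific to this chapter):

```python
from math import exp, factorial, pi, sqrt

# Stirling's inequality: sqrt(2*pi) r^{r+1/2} e^{-r} <= r! <= same * e^{1/(12r)}.
for r in range(1, 21):
    lower = sqrt(2 * pi) * r**(r + 0.5) * exp(-r)
    upper = lower * exp(1.0 / (12 * r))
    assert lower <= factorial(r) <= upper
```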
as $\theta \to 0$. Now by virtue of (1.6.6) and (1.6.7),
$$NP_r = Np_{r+1}(\theta)(1 + o(1)) \to \lambda, \qquad NP_{r-1} \ge Np_r(\theta) \to \infty, \qquad NP_{r+1} \to 0.$$
Note that if $\lambda \ne 0$, then $Np_r(\theta) \to \infty$ without any additional conditions, so this requirement may be excluded from the conditions of the lemma if $\lambda \ne 0$. Indeed,
$$Np_r(\theta) = Np_{r+1}(\theta)\,\frac{p_r(\theta)}{p_{r+1}(\theta)}.$$
Since $x \to 0$ and
$$\frac{p_r(\theta)}{p_{r+1}(\theta)} = \frac{r^{r-2}}{(r+1)^{r-2}}\cdot\frac{1}{x} = \frac{1}{x}\Big(1 - \frac{1}{r+1}\Big)^{r-2},$$
there exists a constant $c_2$ such that $Np_r(\theta) \ge c_2\,Np_{r+1}(\theta)/x$, and $Np_r(\theta) \to \infty$.

Lemma 1.6.2. If $n, N \to \infty$, $\theta = 2T/n \to \gamma$, $0 < \gamma < 1$, and $r = r(n, N) \to \infty$, then
$$NP_r = Np_r(\theta)\,c(1 - c)^{-1}(1 + o(1)),$$
where $c = \gamma e^{1-\gamma}$.

Proof. It is clear that
$$NP_r = Np_r(\theta)\sum_{s=1}^{\infty}\frac{p_{r+s}(\theta)}{p_r(\theta)}$$
and
$$\frac{p_{r+s}(\theta)}{p_r(\theta)} = \Big(\frac{r}{r+s}\Big)^{5/2}(xe)^{s}(1 + O(1/r)).$$
Moreover, there exist constants $c_3 > 0$ and $q < 1$ such that
$$p_{r+s}(\theta)/p_r(\theta) \le c_3(xe)^{s} \le c_3 q^{s}.$$
Therefore the series $\sum_{s=1}^{\infty}p_{r+s}(\theta)/p_r(\theta)$ converges uniformly, and since $xe = \theta e^{1-\theta} \to \gamma e^{1-\gamma} = c$, we can pass to the limit under the sum so that
$$\sum_{s=1}^{\infty}p_{r+s}(\theta)/p_r(\theta) \to \sum_{s=1}^{\infty}c^{s} = \frac{c}{1 - c}.$$
Lemma 1.6.3. If $n, N \to \infty$, $\theta = 2T/n \to 1$, and $N(1 - \theta)^3 \to \infty$, then for any fixed $z$,
$$NP_r \to e^{-z},$$
where $r$ is an integer such that $r\beta = u + z + o(1)$, $\beta = -\log(\theta e^{1-\theta})$, and $u$ is the root of the equation
$$\Big(\frac{2}{\pi}\Big)^{1/2}N\beta^{3/2} = u^{5/2}e^{u}.$$

Proof. It is clear under the conditions of the lemma that $\beta = -\log(\theta e^{1-\theta}) \to 0$ and $u \to \infty$, since $N\beta^{3/2} \to \infty$ by virtue of the condition $N(1 - \theta)^3 \to \infty$. We apply Stirling's formula and obtain
$$NP_r = N\sum_{k>r}p_k(\theta) = \Big(\frac{2}{\pi}\Big)^{1/2}N\beta^{3/2}\sum_{k>r}(\beta k)^{-5/2}e^{-k\beta}\,\beta\,(1 + o(1)).$$
The sum
$$\sum_{k>r}(\beta k)^{-5/2}e^{-k\beta}\,\beta$$
is an integral sum of the function $f(y) = y^{-5/2}e^{-y}$ with step $\beta \to 0$ and is approximated by the corresponding integral:
$$\sum_{k>r}(\beta k)^{-5/2}e^{-k\beta}\,\beta = \int_{r\beta}^{\infty}y^{-5/2}e^{-y}\,dy\,(1 + o(1)) = (r\beta)^{-5/2}e^{-r\beta}(1 + o(1)),$$
where the last equality holds because $r\beta \to \infty$. Therefore
$$NP_r = \Big(\frac{2}{\pi}\Big)^{1/2}N\beta^{3/2}(r\beta)^{-5/2}e^{-r\beta}(1 + o(1)). \qquad (1.6.8)$$
By definition, $r\beta = u + z + o(1)$ and
$$\Big(\frac{2}{\pi}\Big)^{1/2}N\beta^{3/2} = u^{5/2}e^{u}.$$
Substituting these expressions into (1.6.8) yields
$$NP_r = e^{-z}(1 + o(1)).$$
Now we are ready to prove the theorems of this section.
Proof of Theorems 1.6.1-1.6.3. By applying Lemma 1.6.1, we find that under the conditions of Theorem 1.6.1,
$$(1 - P_r)^N \to e^{-\lambda}, \qquad (1 - P_{r-1})^N \to 0, \qquad (1 - P_{r+1})^N \to 1$$
as $N \to \infty$. These relations, together with (1.6.5), whose proof is pending, imply the assertion of Theorem 1.6.1. Let
$$\alpha = \frac{\log n - \frac{5}{2}\log\log n}{\theta - 1 - \log\theta},$$
and choose $r = [\alpha] + k$, where $k$ is a fixed integer. Under the conditions of Theorem 1.6.2, $r = [\alpha] + k \to \infty$, and according to Lemma 1.6.2,
$$NP_r = Np_r(\theta)\,c(1 - c)^{-1}(1 + o(1)),$$
where $c = \gamma e^{1-\gamma}$. It is easy to see that
$$Np_r(\theta) = \frac{n\,r^{r-2}\theta^{r-1}e^{-r\theta}}{r!}(1 + o(1)) = \frac{n\,e^{r(1 - \theta + \log\theta)}}{\sqrt{2\pi}\,\theta\,r^{5/2}}(1 + o(1)) = \frac{(\gamma - 1 - \log\gamma)^{5/2}}{\gamma\sqrt{2\pi}}\,e^{-(k - \{\alpha\})(\gamma - 1 - \log\gamma)}(1 + o(1)).$$
Thus
$$NP_r = \frac{(\gamma - 1 - \log\gamma)^{5/2}\,c}{\gamma(1 - c)\sqrt{2\pi}}\,e^{-(k - \{\alpha\})(\gamma - 1 - \log\gamma)}(1 + o(1)),$$
and consequently, since $c/(\gamma(1 - c)) = 1/(e^{\gamma-1} - \gamma)$,
$$(1 - P_r)^N = \exp\Big\{-\frac{(\gamma - 1 - \log\gamma)^{5/2}}{(e^{\gamma-1} - \gamma)\sqrt{2\pi}}\,e^{-(k - \{\alpha\})(\gamma - 1 - \log\gamma)}\Big\}(1 + o(1)).$$
Under the conditions of Theorem 1.6.3, Lemma 1.6.3 shows that $NP_r \to e^{-z}$ and
$$(1 - P_r)^N \to e^{-e^{-z}}.$$
Thus, to complete the proof of Theorems 1.6.1-1.6.3, it remains to verify (1.6.5) under each set of conditions. Since $\theta N \to \infty$ and $N(1 - \theta)^3 \to \infty$, by Theorem 1.4.1 the random sum $\zeta_N$ is asymptotically normal, and
$$P\{\zeta_N = n\} = \frac{1}{\sigma(\theta)\sqrt{2\pi N}}(1 + o(1)),$$
where
$$\sigma^2(\theta) = \mathbf{D}\xi_1 = \frac{2\theta}{(1 - \theta)(2 - \theta)^2}.$$
While estimating the asymptotic behavior of $(1 - P_r)^N$ in Lemmas 1.6.1-1.6.3, we determined the choice of $r$. We now prove the central limit theorem for the sum $\zeta_N^{(r)}$ for these choices of $r$. Set $B_N = \sigma(\theta)\sqrt{N}$. The characteristic function of the random variable $\xi_1^{(r)} - m(\theta)$, where $m(\theta) = \mathbf{E}\xi_1$, is
$$\frac{e^{-itm(\theta)}}{1 - P_r}\sum_{k=1}^{r}p_k(\theta)e^{itk} = \frac{e^{-itm(\theta)}}{1 - P_r}\Big(\varphi(t) - \sum_{k>r}p_k(\theta)e^{itk}\Big),$$
where $\varphi(t)$ is the characteristic function of the random variable $\xi_1$. Hence, the characteristic function $\varphi_r(t, \theta)$ of the random variable $(\zeta_N^{(r)} - Nm(\theta))/B_N$ can be written
$$\varphi_r(t, \theta) = \frac{e^{-itNm(\theta)/B_N}}{(1 - P_r)^N}\Big(\varphi\Big(\frac{t}{B_N}\Big) - \sum_{k>r}p_k(\theta)e^{itk/B_N}\Big)^N.$$
According to Theorem 1.4.1, the distribution of $(\zeta_N - Nm(\theta))/B_N$ converges to the standard normal law, and consequently,
$$e^{-itNm(\theta)/B_N}\,\varphi^N(t/B_N) \to e^{-t^2/2}. \qquad (1.6.9)$$
It is clear that
$$\sum_{k>r}p_k(\theta)e^{itk/B_N} = P_r + \sum_{k>r}p_k(\theta)\big(e^{itk/B_N} - 1\big) = P_r + O\Big(\frac{1}{B_N}\sum_{k>r}k\,p_k(\theta)\Big),$$
and it is not difficult to prove in each of the three cases that
$$\frac{1}{B_N}\sum_{k>r}k\,p_k(\theta) = o(1/N). \qquad (1.6.10)$$
Estimates (1.6.9) and (1.6.10) imply that for any fixed $t$,
$$\varphi_r(t, \theta) \to e^{-t^2/2},$$
and the distribution of $(\zeta_N^{(r)} - Nm(\theta))/B_N$ converges to the standard normal distribution. The local convergence
$$P\{\zeta_N^{(r)} = n\} = \frac{1}{\sigma(\theta)\sqrt{2\pi N}}(1 + o(1))$$
needed for the proof of (1.6.5) can be proved in the standard way. Thus the ratio in (1.6.5) tends to 1, and this, together with the estimates of $(1 - P_r)^N$, completes the proof of Theorems 1.6.1-1.6.3.
To prove Theorem 1.6.4, the following lemma is needed.

Lemma 1.6.4. If $N \to \infty$, the parameter $\theta$ in the distribution (1.6.1) equals 1, $N^{1/3}(1 - 2T/n) \to \nu$, and $r = zN^{2/3}$, where $z$ is a positive constant, then
$$bN^{2/3}\,P\{\zeta_N^{(r)} = n\} = f(z, \nu) + o(1),$$
where
$$f(z, y) = \exp\Big\{\frac{z^{-3/2}}{2}\Big\}\Big(p(y; 3/2, -1) + \sum_{s=1}^{\infty}\frac{1}{s!}\Big(-\frac{3}{4}\Big)^{s}I_s(z, y)\Big)$$
and $I_s(z, y)$ is defined in Theorem 1.6.4.
Proof. As $N \to \infty$,
$$p_k = p_k(1) = \frac{2k^{k-2}e^{-k}}{k!} = \Big(\frac{2}{\pi}\Big)^{1/2}k^{-5/2}(1 + o(1)) \qquad (1.6.11)$$
uniformly in $k > r$. It is clear that
$$\sum_{k>r}k^{-5/2}\exp\Big\{\frac{itk}{bN^{2/3}}\Big\} = \frac{1}{b^{3/2}N}\sum_{k>r}\Big(\frac{k}{bN^{2/3}}\Big)^{-5/2}\exp\Big\{\frac{itk}{bN^{2/3}}\Big\}\frac{1}{bN^{2/3}}.$$
The last sum is an integral sum of the function $y^{-5/2}e^{ity}$ with step $1/(bN^{2/3})$; hence,
$$\sum_{k>r}k^{-5/2}\exp\Big\{\frac{itk}{bN^{2/3}}\Big\} = \frac{1}{b^{3/2}N}\Big(\int y^{-5/2}e^{ity}\,dy + o(1)\Big). \qquad (1.6.12)$$
Set
$$H(t, z) = \frac{3}{4}\int_{z}^{\infty}y^{-5/2}e^{ity}\,dy.$$
Then
$$H(0, z) = \frac{3}{4}\int_{z}^{\infty}y^{-5/2}\,dy = \frac{z^{-3/2}}{2}. \qquad (1.6.13)$$
Taking into account $b = 2(2/3)^{2/3}$, we obtain from (1.6.11)-(1.6.13) that
$$\sum_{k>r}p_k\exp\Big\{\frac{itk}{bN^{2/3}}\Big\} = \Big(\frac{2}{\pi}\Big)^{1/2}\sum_{k>r}k^{-5/2}\exp\Big\{\frac{itk}{bN^{2/3}}\Big\}(1 + o(1)) = \frac{H(t, z) + o(1)}{N}. \qquad (1.6.14)$$
In particular,
$$NP_r = H(0, z)(1 + o(1)). \qquad (1.6.15)$$
The characteristic function $\varphi_r(t, 1)$ of the random variable $(\zeta_N^{(r)} - 2N)/(bN^{2/3})$ can be written
$$\varphi_r(t, 1) = \frac{e^{-2itN^{1/3}/b}}{(1 - P_r)^N}\Big(\varphi\Big(\frac{t}{bN^{2/3}}, 1\Big) - \sum_{k>r}p_k\exp\Big\{\frac{itk}{bN^{2/3}}\Big\}\Big)^N,$$
where $\varphi(t, 1)$ is the characteristic function of $\xi_1$. Note that $\mathbf{E}\xi_1 = 2$ in this case. It follows from (1.6.13), (1.6.14), and Theorem 1.4.2 that
$$\varphi_r(t, 1) = e^{-2itN^{1/3}/b}\,\varphi^N\Big(\frac{t}{bN^{2/3}}, 1\Big)\Big(1 - \frac{H(t, z)}{N} + o\Big(\frac{1}{N}\Big)\Big)^N\Big(1 - \frac{H(0, z)}{N} + o\Big(\frac{1}{N}\Big)\Big)^{-N},$$
where, by Theorem 1.4.2, $e^{-2itN^{1/3}/b}\varphi^N(t/(bN^{2/3}), 1) \to \psi(t)$, the characteristic function of the stable distribution with density $p(y; 3/2, -1)$. Thus, for any fixed $t$, as $N \to \infty$,
$$\varphi_r(t, 1) \to g(t, z) = \psi(t)\exp\{-H(t, z) + H(0, z)\}.$$
The function $g(t, z)$ is continuous; therefore, by Theorem 1.1.9, it is a characteristic function. Since $|g(t, z)|$ is integrable, it corresponds to the density
$$f(z, y) = \frac{1}{2\pi}\int_{-\infty}^{\infty}e^{-ity}g(t, z)\,dt.$$
The span of the distribution of $\zeta_N^{(r)}$ is 1; therefore, by Theorem 1.1.10, the local convergence is valid. Thus it remains to show that $f(z, y)$ has the form given in Theorem 1.6.4. Representing $e^{-H(t,z)}$ by its Taylor series gives
$$f(z, y) = e^{H(0,z)}\sum_{s=0}^{\infty}\frac{(-1)^{s}}{s!}f_s(z, y), \qquad (1.6.16)$$
where
$$f_s(z, y) = \frac{1}{2\pi}\int_{-\infty}^{\infty}e^{-ity}\psi(t)H^{s}(t, z)\,dt.$$
It is easy to see that the function $2z^{3/2}H(t, z)$ is the characteristic function of the distribution with density
$$p_z(y) = \frac{3}{2}z^{3/2}y^{-5/2}, \qquad y > z. \qquad (1.6.17)$$
Therefore the function $(2z^{3/2})^{s}\,\psi(t)H^{s}(t, z)$ is the characteristic function of the sum $\beta + \beta_1 + \cdots + \beta_s$ of independent random variables, where $\beta$ has the stable law with density $p(y; 3/2, -1)$ and $\beta_1, \ldots, \beta_s$ are identically distributed with density $p_z(y)$. The density of the sum $\beta + \beta_1 + \cdots + \beta_s$ is
$$h_s(y) = \Big(\frac{3}{2}z^{3/2}\Big)^{s}I_s(z, y),$$
where $I_s(z, y)$ is defined in Theorem 1.6.4. Thus
$$\frac{1}{2\pi}\int_{-\infty}^{\infty}e^{-ity}\psi(t)H^{s}(t, z)\,dt = \Big(\frac{3}{4}\Big)^{s}I_s(z, y).$$
When we substitute this expression into (1.6.16), we find that
$$f(z, y) = e^{H(0,z)}\sum_{s=0}^{\infty}\frac{1}{s!}\Big(-\frac{3}{4}\Big)^{s}I_s(z, y). \qquad (1.6.18)$$
Taking into account (1.6.15), Theorem 1.4.2, and (1.6.18), we see that Theorem 1.6.4 follows from (1.6.2). To prove Theorem 1.6.5 with the help of (1.6.2), we need to know the asymptotic behavior of large deviations of $P\{\zeta_N = n\}$. We give that information without proof (see [28]).
Lemma 1.6.5. If $n, N \to \infty$, the parameter $\theta$ in the distribution (1.6.1) equals 1, $N(1 - 2T/n)^3 \to -\infty$, and $r = n - 2N - bzN^{2/3}$, where $z$ is a constant, then
$$P\{\zeta_N^{(r)} = n\} = \Big(\frac{2}{\pi}\Big)^{1/2}\frac{N}{(n - 2N)^{5/2}}\int_{-\infty}^{z}p(y; 3/2, -1)\,dy\,(1 + o(1)). \qquad (1.6.19)$$
The assertion of Theorem 1.6.5 follows from (1.6.19), Theorem 1.4.2, and the fact that $NP_r \to 0$ under the conditions of Theorem 1.6.5.
1.7. Graphs with unicyclic components

A graph is called unicyclic if it is connected and contains exactly one cycle. The number of edges of a unicyclic graph coincides with the number of its vertices. Let $U_n$ denote the set of all graphs with $n$ labeled vertices in which every connected component is unicyclic. Any graph from $U_n$ has $n$ edges. In this section, we study the structure of a random graph from $U_n$. We follow the general approach described in Section 1.2. As usual, we denote by $u_n$ the number of graphs in $U_n$ and study $u_n$ as $n \to \infty$. Let $b_n$ be the number of connected unicyclic graphs with $n$ vertices, and $b_n^{(r)}$ the number of connected unicyclic graphs with $n$ vertices in which the cycle has size $r$. The cycle of a unicyclic graph is undirected; in other respects, a unicyclic graph is similar to the connected graph of a mapping of a finite set into itself. Let $d_n$ be the number of connected graphs of mappings of a set with $n$ labeled elements into itself, and $d_n^{(r)}$ the number of such graphs with the cycle of size $r$. It is easy to see that
$$d_n^{(1)} = n^{n-1}, \qquad d_n^{(2)} = \binom{n}{2}\cdot 2\,n^{n-3} = n^{n-1} - n^{n-2},$$
$$b_n^{(1)} = d_n^{(1)}, \qquad b_n^{(2)} = d_n^{(2)}, \qquad b_n^{(r)} = d_n^{(r)}/2, \qquad r \ge 3.$$
Introduce the generating functions
$$d(x) = \sum_{n=1}^{\infty}\frac{d_n x^n}{n!}, \qquad B(x) = \sum_{n=1}^{\infty}\frac{b_n x^n}{n!}, \qquad c(x) = \sum_{n=1}^{\infty}\frac{n^{n-2}x^n}{n!}.$$
These functions can be represented in terms of the function
$$\theta(x) = \sum_{n=1}^{\infty}\frac{n^{n-1}x^n}{n!},$$
which is the root of the equation $\theta e^{-\theta} = x$ in the interval $[0, 1]$. This function was used in Section 1.4. Taking into account the notation introduced here and using the results of Section 1.4, we see that
$$d(x) = -\log(1 - \theta(x)), \qquad c(x) = \theta(x) - \frac{\theta^2(x)}{2} = \frac{1}{2}\big(1 - (1 - \theta(x))^2\big).$$
Since $b_n = b_n^{(1)} + \cdots + b_n^{(n)}$, we have
$$B(x) = \sum_{n=1}^{\infty}\frac{b_n x^n}{n!} = \frac{1}{2}\sum_{n=1}^{\infty}\frac{d_n x^n}{n!} + \frac{1}{2}\sum_{n=1}^{\infty}\frac{d_n^{(1)}x^n}{n!} + \frac{1}{2}\sum_{n=1}^{\infty}\frac{d_n^{(2)}x^n}{n!}$$
$$= \frac{1}{2}d(x) + \theta(x) - \frac{1}{2}c(x) = -\frac{1}{2}\log(1 - \theta(x)) + \theta(x) - \frac{1}{4}\big(1 - (1 - \theta(x))^2\big). \qquad (1.7.1)$$
In accordance with the general model of Section 1.4, let us introduce independent identically distributed random variables $\xi_1, \ldots, \xi_N$ for which
$$P\{\xi_1 = k\} = \frac{b_k x^k}{k!\,B(x)}, \qquad k = 1, 2, \ldots. \qquad (1.7.2)$$
The number $u_{n,N}$ of graphs in $U_n$ with $N$ components can be represented in the form
$$u_{n,N} = \frac{n!}{N!}\sum_{n_1+\cdots+n_N=n}\frac{b_{n_1}\cdots b_{n_N}}{n_1!\cdots n_N!} = \frac{n!\,(B(x))^N}{N!\,x^n}\,P\{\xi_1 + \cdots + \xi_N = n\}. \qquad (1.7.3)$$
In what follows, we choose
$$x = (1 - 1/\sqrt{n})\,e^{-1+1/\sqrt{n}}.$$

Theorem 1.7.1. As $n \to \infty$,
$$u_n = \frac{\sqrt{2\pi}\,e^{3/4}}{2^{1/4}\,\Gamma(1/4)}\,n^{n-1/4}(1 + o(1)),$$
where
$$\Gamma(p) = \int_0^{\infty}x^{p-1}e^{-x}\,dx$$
is the Euler gamma function. Before proving Theorem 1.7.1, we will prove some auxiliary results.
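The constant in Theorem 1.7.1 can be evaluated numerically; the sketch below also checks $\Gamma(1/4)$ against the integral definition just given (after the substitution $x = t^4$, which removes the integrand's singularity at $0$). Tolerances are deliberately loose.

```python
from math import exp, gamma, pi, sqrt

# Gamma(1/4) is the integral of x^{-3/4} e^{-x}; substituting x = t^4 gives
# Gamma(1/4) = 4 * integral_0^infinity exp(-t^4) dt, a smooth integrand.
h, total = 1e-4, 0.0
t = h / 2
while t < 10.0:
    total += exp(-t**4) * h
    t += h
assert abs(4 * total - gamma(0.25)) < 1e-6

# The constant multiplying n^{n-1/4} in Theorem 1.7.1:
c = sqrt(2 * pi) * exp(0.75) / (2**0.25 * gamma(0.25))
assert 1.23 < c < 1.24
```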
Lemma 1.7.1. For $x = (1 - 1/\sqrt{n})\,e^{-1+1/\sqrt{n}}$,
$$\big(1 - \theta(xe^{it})\big)^2 = \frac{1}{n} - 2it + \varepsilon_1(t) + \varepsilon_2(t, n),$$
where $\varepsilon_1(t)/t \to 0$ as $t \to 0$ uniformly in $n$, and $|\varepsilon_2(t, n)| \le 2|t|/\sqrt{n}$.

Proof. We found in Section 1.4 that
$$u(w) = (1 - \theta(w))^2 = 1 - 2\sum_{k=1}^{\infty}\frac{k^{k-2}w^k}{k!} = 1 - 2c(w), \qquad |w| \le e^{-1}.$$
When we write $u(xe^{it})$ as
$$u(xe^{it}) = u(e^{-1+it}) + \frac{1}{n} + \varepsilon_2(t, n), \qquad (1.7.4)$$
it is clear that $\theta(x) = 1 - 1/\sqrt{n}$ and $\theta(e^{-1}) = 1$; therefore $u(e^{-1}) - u(x) = -1/n$. With this equality and the observation that $x < e^{-1}$, we obtain the estimates
$$|\varepsilon_2(t, n)| = |u(xe^{it}) - u(e^{-1+it}) - 1/n| = 2\Big|\sum_{k=1}^{\infty}\frac{k^{k-2}(e^{-k} - x^k)(e^{itk} - 1)}{k!}\Big|$$
$$\le 2|t|\sum_{k=1}^{\infty}\frac{k^{k-1}(e^{-k} - x^k)}{k!} = 2|t|\big(\theta(e^{-1}) - \theta(x)\big) = 2|t|/\sqrt{n}. \qquad (1.7.5)$$
The function $u(e^{-1+it})$ has the first derivative $-2i$ at the point $t = 0$; thus, as $t \to 0$,
$$u(e^{-1+it}) = -2it + o(t). \qquad (1.7.6)$$
The assertion of the lemma follows from (1.7.4), (1.7.5), and (1.7.6).
Lemma 1.7.2. If $n \to \infty$ and $N = a\log n + o(\log n)$, where $a$ is a positive constant, then
$$nP\{\xi_1 + \cdots + \xi_N = k\} = \frac{1}{2^{a}\Gamma(a)}\,z^{a-1}e^{-z/2}(1 + o(1))$$
uniformly in $k$ such that $z = k/n$ lies in any fixed interval of the form $0 < z_0 \le z \le z_1 < \infty$.

Proof. Set
$$\varphi(t) = B(xe^{it})/B(x).$$
Lemma 1.7.1 and equation (1.7.1) give
$$4B(xe^{it/n}) = -\log\frac{1 - 2it}{n} + 3 + o(1).$$
Therefore
$$\varphi\Big(\frac{t}{n}\Big) = \frac{B(xe^{it/n})}{B(x)} = \frac{\log n - \log(1 - 2it) + 3 + o(1)}{\log n + 3 + o(1)} = 1 - \frac{\log(1 - 2it) + o(1)}{\log n},$$
and if $N = a\log n + o(\log n)$, then for any fixed $t$,
$$\varphi^N(t/n) = \Big(1 - \frac{\log(1 - 2it) + o(1)}{\log n}\Big)^N \to \frac{1}{(1 - 2it)^{a}},$$
and the distribution of $(\xi_1 + \cdots + \xi_N)/n$ converges weakly to the distribution with density
$$\frac{1}{2^{a}\Gamma(a)}\,z^{a-1}e^{-z/2},$$
that is, to the chi-square distribution with $2a$ degrees of freedom, which corresponds to the characteristic function $(1 - 2it)^{-a}$. The local convergence can be proved in the usual way by using Lemmas 1.12.3-1.12.7 from [78].
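The limit law of Lemma 1.7.2, with density $z^{a-1}e^{-z/2}/(2^a\Gamma(a))$, is the Gamma law with shape $a$ and scale $2$, i.e. the chi-square law with $2a$ degrees of freedom. Below is a small Monte Carlo check of its first two moments (mean $2a$, variance $4a$), a sketch using Python's standard `random.gammavariate` sampler.

```python
import random

random.seed(1)
a, n = 0.25, 100_000
# Sample the Gamma(shape=a, scale=2) law, i.e. chi-square with 2a = 1/2 d.f.
sample = [random.gammavariate(a, 2.0) for _ in range(n)]
mean = sum(sample) / n
var = sum((s - mean) ** 2 for s in sample) / n
assert abs(mean - 2 * a) < 0.05  # theoretical mean 2a = 0.5
assert abs(var - 4 * a) < 0.1    # theoretical variance 4a = 1.0
```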
Let $u_{n,N}$ be the number of graphs in $U_n$ with $N$ components, and set
$$\lambda_n = \frac{1}{4}\log n, \qquad u = (N - \lambda_n)/\sqrt{\lambda_n}.$$

Lemma 1.7.3. If $n \to \infty$, then
$$u_{n,N} = \frac{\sqrt{2\pi}\,e^{3/4}}{2^{1/4}\,\Gamma(1/4)}\,n^{n-1/4}\,\frac{\lambda_n^N e^{-\lambda_n}}{N!}(1 + o(1))$$
uniformly in $N$ such that $|u| \le (\log n)^{1/4}$.

Proof. It is clear that
$$u_{n,N} = \frac{n!}{N!}\sum_{n_1+\cdots+n_N=n}\frac{b_{n_1}\cdots b_{n_N}}{n_1!\cdots n_N!} = \frac{n!\,(B(x))^N}{N!\,x^n}\,P\{\xi_1 + \cdots + \xi_N = n\}. \qquad (1.7.7)$$
By putting $a = 1/4$ in Lemma 1.7.2, we obtain
$$nP\{\xi_1 + \cdots + \xi_N = n\} = \frac{1}{2^{1/4}\Gamma(1/4)}\,e^{-1/2}(1 + o(1)) \qquad (1.7.8)$$
uniformly in $N$ when $|u| \le (\log n)^{1/4}$. The assertion of the lemma follows from (1.7.7) and (1.7.8), since
$$B(x) = \frac{1}{4}\log n + \frac{3}{4} + o(1), \qquad x^n = e^{-n-1/2}(1 + o(1)), \qquad (B(x))^N = \lambda_n^N e^{3/4}(1 + o(1)). \qquad (1.7.9)$$
The assertion of Theorem 1.7.1 can be obtained by summing $u_{n,N}$ over $N$. Lemma 1.7.3 estimates $u_{n,N}$ for $N$ close to $\lambda_n$. The following lemmas give estimates of $u_{n,N}$ for the other values of $N$ needed in the proof.

Lemma 1.7.4. For any fixed $a_0, a_1$, $0 < a_0 < a_1 < \infty$, there exists a constant $c_1$ such that for $a_0\log n \le N \le a_1\log n$,
$$u_{n,N} \le c_1 n^{n-1/4}\,\frac{\lambda_n^N}{N!}.$$

Proof. It follows from Lemma 1.7.2 that there exists a constant $A$ such that
$$nP\{\xi_1 + \cdots + \xi_N = n\} \le A \qquad (1.7.10)$$
for $a_0\log n \le N \le a_1\log n$. Indeed, if (1.7.10) did not hold, then a sequence of the parameters $n \to \infty$, $N = a\log n + o(\log n)$ would exist for which the assertion of Lemma 1.7.2 would not be true. Lemma 1.7.4 then follows from (1.7.7), (1.7.9), and (1.7.10).

Lemma 1.7.5.
If $N \le \log n$, then there exists a constant $c_2$ such that
$$P\{\xi_1 + \cdots + \xi_N = n\} \le \frac{c_2 N^2}{n\log n}.$$

Proof. The number $d_m$ of connected graphs of mappings of an $m$-set into itself is
$$d_m = (m - 1)!\sum_{k=0}^{m-1}\frac{m^k}{k!}.$$
Indeed, since the number of forests with $n$ nonroot vertices and $N$ rooted trees labeled $1, \ldots, N$ is $N(n + N)^{n-1}$, the number $d_m^{(r)}$ of connected graphs of mappings of an $m$-set into itself with the cycle of size $r$ can be represented as
$$d_m^{(r)} = \binom{m}{r}(r - 1)!\,r\,m^{m-r-1} = \frac{m!\,m^{m-r-1}}{(m - r)!}.$$
Here $\binom{m}{r}$ is the number of possible choices of the $r$ vertices that constitute the cycle; $(r - 1)!$ is the number of cycles that can be constructed from $r$ vertices; and $r\,m^{m-r-1}$ is the number of forests with the $r$ cyclic vertices as the roots. Hence,
$$d_m = \sum_{r=1}^{m}d_m^{(r)} = \sum_{r=1}^{m}\frac{m!\,m^{m-r-1}}{(m - r)!} = (m - 1)!\sum_{k=0}^{m-1}\frac{m^k}{k!}.$$
As $m \to \infty$, $d_m = \frac{1}{2}(m - 1)!\,e^m(1 + o(1))$, and there exists a constant $c_3$ such that
$$b_m \le d_m \le c_3(m - 1)!\,e^m.$$
Moreover, $B(x) = \frac{1}{4}\log n\,(1 + o(1))$ and $x^m \le e^{-m}$ for all $m \ge 0$. Therefore there exists a constant $c_2$ such that
$$P\{\xi_1 = m\} = \frac{b_m x^m}{m!\,B(x)} \le \frac{c_2}{m\log n}. \qquad (1.7.11)$$
It is clear that if $\xi_1 + \cdots + \xi_N = n$, then at least one of the summands is at least $[n/N]$.
Since $P\{\xi_1 = k\}$ decreases as $k$ increases, we have
$$P\{\xi_1 + \cdots + \xi_N = n\} \le N\sum_{k\ge[n/N]}P\{\xi_1 = k\}\,P\{\xi_2 + \cdots + \xi_N = n - k\} \le N\,P\{\xi_1 = [n/N]\}.$$
The lemma now follows from (1.7.11).
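The count $d_m$ and the closed form derived above can be cross-checked by exhaustion over all $m^m$ mappings for small $m$ (a sketch; the connectivity test treats the functional graph as undirected):

```python
import itertools
from math import factorial

def d_exhaustive(m):
    # Count mappings f: {0..m-1} -> {0..m-1} whose functional graph is connected.
    count = 0
    for f in itertools.product(range(m), repeat=m):
        adj = {i: {f[i]} for i in range(m)}
        for i in range(m):
            adj[f[i]].add(i)
        seen, stack = {0}, [0]
        while stack:
            for w in adj[stack.pop()]:
                if w not in seen:
                    seen.add(w)
                    stack.append(w)
        count += len(seen) == m
    return count

def d_formula(m):
    # d_m = (m-1)! * sum_{k=0}^{m-1} m^k / k!, computed exactly in integers.
    return sum(factorial(m - 1) // factorial(k) * m**k for k in range(m))

assert [d_exhaustive(m) for m in range(1, 5)] == [1, 3, 17, 142]
assert all(d_exhaustive(m) == d_formula(m) for m in range(1, 5))
```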
Lemma 1.7.6. For $N \le \log n$,
$$u_{n,N} \le c_4\,n^{n-1/4}\log n\;\frac{\lambda_n^N e^{-\lambda_n}}{N!},$$
where $c_4$ is a constant.

This lemma follows from (1.7.7), (1.7.9), and Lemma 1.7.5.
Proof of Theorem 1.7.1. Roughly speaking, $u_{n,N} = c\,\lambda_n^N e^{-\lambda_n}/N!$, where $c$ does not depend on $N$, and to obtain $u_n$, we sum the Poisson probabilities whose sum is 1. To do this rigorously, we divide the sum
$$u_n = \sum_{N=1}^{\infty}u_{n,N}$$
into four parts. Recall that $u = (N - \lambda_n)/\sqrt{\lambda_n}$. Let
$$S_1 = \sum_{A_1}u_{n,N}, \qquad S_2 = \sum_{A_2}u_{n,N}, \qquad S_3 = \sum_{A_3}u_{n,N}, \qquad S_4 = \sum_{A_4}u_{n,N},$$
where
$$A_1 = \{N\colon |u| \le (\log n)^{1/4}\}, \qquad A_2 = \{N\colon |u| > (\log n)^{1/4},\ a_0\log n \le N \le a_1\log n\},$$
$$A_3 = \{N\colon N < a_0\log n\}, \qquad A_4 = \{N\colon N > a_1\log n\}.$$
As $n \to \infty$,
$$\sum_{A_1}\frac{\lambda_n^N e^{-\lambda_n}}{N!} = 1 + o(1);$$
therefore it follows from Lemma 1.7.3 that
$$S_1 = \frac{\sqrt{2\pi}\,e^{3/4}}{2^{1/4}\,\Gamma(1/4)}\,n^{n-1/4}(1 + o(1)). \qquad (1.7.12)$$
It remains to show that $S_2$, $S_3$, and $S_4$ are $o(n^{n-1/4})$. Lemma 1.7.4 implies that
$$S_2 \le c_1 n^{n-1/4}\sum_{A_2}\frac{\lambda_n^N}{N!},$$
and it follows from (1.7.12) that $S_2 = o(n^{n-1/4})$. To obtain an estimate for $S_3$, we use the inequality
$$\sum_{N\le m}\frac{\lambda^N e^{-\lambda}}{N!} \le m\,\frac{\lambda^m e^{-\lambda}}{m!},$$
which is true for $m < \lambda$. Choose $a_0 < 1/4$ such that
$$a_0 - a_0\log a_0 - a_0\log 4 < 1/8.$$
Then, for $m = a_0\log n$,
$$\frac{\lambda_n^m e^{-\lambda_n}}{m!} \le \frac{c_5}{n^{1/8}},$$
where $c_5$ is a constant. By using the estimate from Lemma 1.7.6, we find that
$$S_3 \le c_4 c_5\,n^{n-1/4-1/8}(\log n)^2 = o(n^{n-1/4}).$$
To obtain an estimate for $S_4$, we use the inequality
$$u_{n,N} \le \frac{n!\,(B(x))^N}{N!\,x^n} \le c_6\,n^{n+1/2}\,\frac{\lambda_n^N}{N!}, \qquad (1.7.13)$$
where $c_6$ is a constant, which follows from (1.7.7) if $P\{\xi_1 + \cdots + \xi_N = n\}$ is replaced by 1. For $m \ge 2\lambda$,
$$\sum_{N>m}\frac{\lambda^N}{N!} \le 2\,\frac{\lambda^m}{m!}.$$
Choose $a_1 > 1/4$ such that $a_1 - a_1\log a_1 - a_1\log 4 < -2$. Then for $m = a_1\log n$ and $\lambda_n = (\log n)/4$, we have the estimate $\lambda_n^m/m! \le n^{-2}$; thus (1.7.13) implies that $S_4 \le 2c_6 n^{n-3/2} = o(n^{n-1/4})$.

The assertion of the theorem follows from the estimates obtained for $S_1$, $S_2$, $S_3$, and $S_4$.
We denote the number of components in a random graph from $U_n$ by $\kappa_n$. The following theorem is a direct corollary of Lemma 1.7.3 and Theorem 1.7.1.

Theorem 1.7.2. As $n \to \infty$,
$$P\{\kappa_n = N\} = \frac{2}{\sqrt{2\pi\log n}}\,e^{-u^2/2}(1 + o(1))$$
uniformly in $N$ for which $u = (N - \frac{1}{4}\log n)/\sqrt{\frac{1}{4}\log n}$ lies in any fixed finite interval.

Indeed, Lemma 1.7.3 and Theorem 1.7.1 imply that
$$P\{\kappa_n = N\} = \frac{u_{n,N}}{u_n} = \frac{\lambda_n^N e^{-\lambda_n}}{N!}(1 + o(1))$$
uniformly in $|u| \le (\log n)^{1/4}$, where $\lambda_n = \frac{1}{4}\log n$. We now consider the maximum size $\beta_n$ of the components of a random graph from $U_n$.
Theorem 1.7.3. If $n \to \infty$, then for any fixed $y$, $0 < y < 1$,
$$P\{\beta_n < yn\} = \sum_{0\le s<1/y}\frac{(-1)^{s}}{4^{s}s!}\,W_s(1, y) + o(1),$$
where $W_0(z, y) = 1$, and for $s = 1, 2, \ldots,$
$$W_s(z, y) = \int_{D}\frac{dx_1\cdots dx_s}{x_1\cdots x_s\,(z - x_1 - \cdots - x_s)^{3/4}}, \qquad D = \{x_j > y,\ j = 1, \ldots, s;\ x_1 + \cdots + x_s < z\}.$$
Proof. To study $\beta_n$, we use the general approach of Section 1.2. Let $\eta_1, \ldots, \eta_N$ be random variables with distribution
$$P\{\eta_1 = k_1, \ldots, \eta_N = k_N\} = P\{\xi_1 = k_1, \ldots, \xi_N = k_N \mid \xi_1 + \cdots + \xi_N = n\}. \qquad (1.7.14)$$
It follows from (1.7.7) that these variables can be interpreted as the sizes of the ordered components of a random graph from $U_n$ (see Section 1.2) in which $\kappa_n$ is $N$. Therefore
$$P\{\beta_n < yn\} = \sum_{N=1}^{\infty}P\{\kappa_n = N\}\,P\{\eta(N) < yn\}, \qquad (1.7.15)$$
where $0 < y < 1$ and $\eta(N) = \max_{1\le i\le N}\eta_i$. By the generalized scheme of allocation,
$$P\{\eta(N) < yn\} = \big(P\{\xi_1 < yn\}\big)^N\,\frac{P\{\xi_1^{(yn)} + \cdots + \xi_N^{(yn)} = n\}}{P\{\xi_1 + \cdots + \xi_N = n\}}, \qquad (1.7.16)$$
where $\xi_1^{(yn)}, \ldots, \xi_N^{(yn)}$ are independent identically distributed random variables for which
$$P\{\xi_1^{(yn)} = k\} = P\{\xi_1 = k \mid \xi_1 < yn\},$$
and the random variables $\xi_1, \ldots, \xi_N$ have distribution (1.7.2). We now estimate
$$H_{yn}(t) = \sum_{k>yn}\frac{b_k x^k}{k!}\,e^{itk/n}$$
for $x = (1 - 1/\sqrt{n})\,e^{-1+1/\sqrt{n}}$. Let us prove that for any fixed $y$, $0 < y < 1$, as $n \to \infty$,
$$H_{yn}(t) = H(y, t) + o(1),$$
where
$$H(y, t) = \frac{1}{4}\int_y^{\infty}u^{-1}e^{-(1-2it)u/2}\,du.$$
It is easily seen that
$$\Big(1 - \frac{1}{\sqrt{n}}\Big)^{k}e^{k/\sqrt{n}} = e^{-k/(2n)}\big(1 + o(1)\big), \qquad e^{-k}\sum_{m=0}^{k-1}\frac{k^m}{m!} = \frac{1}{2} + O\Big(\frac{1}{\sqrt{k}}\Big)$$
uniformly in $k > yn$. Therefore, since $b_k = \frac{1}{2}d_k(1 + o(1))$ and $d_k = (k - 1)!\sum_{m=0}^{k-1}k^m/m!$, as $n \to \infty$,
$$H_{yn}(t) = \frac{1}{2}\sum_{k>yn}\frac{1}{k}\Big(1 - \frac{1}{\sqrt{n}}\Big)^{k}e^{k/\sqrt{n}}\exp\Big\{\frac{itk}{n}\Big\}\,e^{-k}\sum_{m=0}^{k-1}\frac{k^m}{m!}\,(1 + o(1)) = \frac{1}{4}\sum_{k>yn}\frac{1}{k}\,e^{-(1-2it)k/(2n)}(1 + o(1)).$$
This sum is an integral sum of the function $u^{-1}e^{-(1-2it)u/2}$ with step $1/n$. Hence,
$$H_{yn}(t) = \frac{1}{4}\int_y^{\infty}u^{-1}e^{-(1-2it)u/2}\,du + o(1) = H(y, t) + o(1).$$
In particular, we obtain the following estimate for the tail of the distribution (1.7.2):
$$P\{\xi_1 \ge yn\} = \frac{1}{B(x)}\sum_{k\ge yn}\frac{b_k x^k}{k!} = \frac{H_{yn}(0)}{B(x)} = \frac{4H(y, 0) + o(1)}{\log n} \qquad (1.7.17)$$
as $n \to \infty$. We now find the limit distribution of the sum $\xi_1^{(yn)} + \cdots + \xi_N^{(yn)}$. The characteristic function of $\xi_1^{(yn)}/n$ is
$$\tilde\psi(t) = \frac{\varphi(t/n) - H_{yn}(t)/B(x)}{1 - H_{yn}(0)/B(x)}.$$
Using the estimates
$$\varphi(t/n) = 1 - \log(1 - 2it)/\log n + o(1/\log n), \qquad 4B(x) = \log n + O(1),$$
from (1.7.16) and (1.7.17), as $n \to \infty$, yields
$$\tilde\psi(t) = \Big(1 - \frac{\log(1 - 2it) + 4H(y, t) + o(1)}{\log n}\Big)\Big(1 - \frac{4H(y, 0) + o(1)}{\log n}\Big)^{-1},$$
and for any fixed $t$ and $N = \frac{1}{4}\log n + o(\log n)$,
$$\tilde\psi^N(t) \to \varphi_y(t) = (1 - 2it)^{-1/4}\,e^{-H(y,t) + H(y,0)}.$$
When we expand $e^{-H(y,t)}$ into its Taylor series, as we did in the proof of Lemma 1.6.4, we find that the characteristic function $\varphi_y(t)$ corresponds to the density
$$f_y(z) = \frac{e^{H(y,0)-z/2}}{2^{1/4}\,\Gamma(1/4)}\sum_{0\le s<1/y}\frac{(-1)^{s}}{4^{s}s!}\,W_s(z, y).$$
Thus, for any $y$, $0 < y < 1$, the distribution of $(\xi_1^{(yn)} + \cdots + \xi_N^{(yn)})/n$ converges weakly to the distribution whose density is $f_y(z)$ as $n \to \infty$ and $N = \frac{1}{4}\log n + o(\log n)$. We can show that local convergence of these distributions also holds: if $n \to \infty$, $N = \frac{1}{4}\log n + o(\log n)$, and $0 < y < 1$, then
$$nP\{\xi_1^{(yn)} + \cdots + \xi_N^{(yn)} = k\} = f_y(z) + o(1) \qquad (1.7.18)$$
uniformly in $k$ for which $z = k/n$ lies in any given interval of the form $0 < z_0 \le z \le z_1 < \infty$. Further, as $n \to \infty$ and $N = \frac{1}{4}\log n + o(\log n)$,
$$\big(P\{\xi_1 < yn\}\big)^N = \Big(1 - \frac{4H(y, 0) + o(1)}{\log n}\Big)^N = e^{-H(y,0)} + o(1). \qquad (1.7.19)$$
Substituting estimates (1.7.19), (1.7.18), and (1.7.8) into (1.7.16) gives
$$P\{\eta(N) \le yn\} = \sum_{0\le s<1/y}\frac{(-1)^{s}}{4^{s}s!}\,W_s(1, y) + o(1). \qquad (1.7.20)$$
To obtain the distribution of $\beta_n$, we need to average the distribution of $\eta(N)$ with respect to the distribution of $\kappa_n$. By Theorem 1.7.2, the number of components $\kappa_n$ is asymptotically normal with parameters $(\frac{1}{4}\log n, \frac{1}{4}\log n)$, and for $N = \frac{1}{4}\log n + o(\log n)$, the probability $P\{\eta(N) \le yn\}$ is asymptotically constant; therefore the assertion of the theorem follows from (1.7.15).

Denote by $U_{n,2}$ and $U_{n,3}$ the sets of all graphs with $n$ labeled vertices consisting of unicyclic components where each cycle has more than one or more than two vertices, respectively. It is not difficult to see that we can treat $U_{n,i}$, $i = 2, 3$, in the same way as $U_n$ (which, following the above notation, we have to denote by
$U_{n,1}$). The role of $B(x)$ for $U_{n,i}$, $i = 2, 3$, is played by the generating functions
$$B_i(x) = \sum_{n=1}^{\infty}\frac{b_{n,i}x^n}{n!}, \qquad i = 2, 3,$$
where $b_{n,i}$ is the number of connected unicyclic graphs with $n$ vertices and cycle length not less than $i$. It is clear that
$$b_{n,2} = \sum_{r=2}^{n}b_n^{(r)} = d_n^{(2)} + \frac{1}{2}\sum_{r=3}^{n}d_n^{(r)},$$
$$B_2(x) = \frac{1}{2}d(x) - \frac{1}{2}c(x) = -\frac{1}{2}\log(1 - \theta(x)) - \frac{1}{4}\big(1 - (1 - \theta(x))^2\big),$$
and for $x = (1 - 1/\sqrt{n})\,e^{-1+1/\sqrt{n}}$,
$$B_2(x) = \frac{1}{4}\log n - \frac{1}{4} + o(1), \qquad (B_2(x))^N = \lambda_n^N e^{-1/4}(1 + o(1)).$$
Similarly,
$$b_{n,3} = \frac{1}{2}\sum_{r=3}^{n}d_n^{(r)},$$
$$B_3(x) = \frac{1}{2}d(x) - \theta(x) + \frac{1}{2}c(x) = -\frac{1}{2}\log(1 - \theta(x)) - \theta(x) + \frac{1}{4}\big(1 - (1 - \theta(x))^2\big), \qquad (1.7.21)$$
and for $x = (1 - 1/\sqrt{n})\,e^{-1+1/\sqrt{n}}$,
$$B_3(x) = \frac{1}{4}\log n - \frac{3}{4} + o(1), \qquad (B_3(x))^N = \lambda_n^N e^{-3/4}(1 + o(1)).$$
Therefore, if $n \to \infty$, then for the numbers $u_n^{(i)}$ of the graphs in $U_{n,i}$ and for the numbers $u_{n,N}^{(i)}$ of such graphs with $N$ components, we have
$$u_n^{(i)} = A_i\,n^{n-1/4}(1 + o(1)), \qquad u_{n,N}^{(i)} = A_i\,n^{n-1/4}\,\frac{\lambda_n^N e^{-\lambda_n}}{N!}(1 + o(1))$$
uniformly in the integers $N$ such that $(N - \lambda_n)/\sqrt{\lambda_n}$ lies in any fixed finite interval,
where
$$A_1 = \frac{\sqrt{2\pi}\,e^{3/4}}{2^{1/4}\,\Gamma(1/4)}, \qquad A_2 = \frac{\sqrt{2\pi}\,e^{-1/4}}{2^{1/4}\,\Gamma(1/4)}, \qquad A_3 = \frac{\sqrt{2\pi}\,e^{-3/4}}{2^{1/4}\,\Gamma(1/4)}. \qquad (1.7.22)$$
Theorems 1.7.2 and 1.7.3 are valid for the random variables $\kappa_n$ and $\beta_n$ in $U_{n,2}$ and $U_{n,3}$.
1.8. Graphs with components of two types

The generalized scheme of allocation can be used in investigations of random graphs with a nonhomogeneous structure. Consider the set $A_{n,T}$ of all graphs with $n$ vertices and $T$ edges in which each connected component contains no more than one cycle. As usual, we assign equal probabilities to the elements of $A_{n,T}$ and consider a random graph taking values in $A_{n,T}$. Since any graph from the set $A_{n,T}$ consists of trees and unicyclic components, we can use the results of the previous sections to study various characteristics of a random graph from $A_{n,T}$.

Consider first the number of elements in $A_{n,T}$. As in the previous sections, we will denote by $a_n$ the number of graphs under consideration with $n$ vertices and by $b_n$ the number of connected graphs under consideration with $n$ vertices. Instead of $A_{n,T}$, we will use, where necessary, the notation $A_{n,T}^{(1)}$ if cycles of lengths 1 and 2 are allowed; $A_{n,T}^{(2)}$ if cycles of length 1 are forbidden; and $A_{n,T}^{(3)}$ if cycles of lengths 1 and 2 are forbidden. Denote the number of graphs in $A_{n,T}^{(i)}$ by $a_{n,T}^{(i)}$ and preserve the notation $a_{n,T}$ if the specialization is not needed.

In accordance with the previous sections, the number of forests with $n$ vertices, $T$ edges, and $N = n - T$ trees is denoted by $F_{n,N}$. We use $u_n^{(i)}$ to denote the number of graphs with $n$ vertices and unicyclic components if they are included in $A_{n,T}^{(i)}$, $i = 1, 2, 3$, and preserve the notation $u_n$ for the number of such graphs in $A_{n,T}$ if the specialization is not important. It is clear that
$$a_{n,T} = \sum_{m=0}^{n}\binom{n}{m}u_m F_{n-m,N}. \qquad (1.8.1)$$

Theorem 1.8.1. If $n, T \to \infty$ such that $T/n \to 0$, then
$$a_{n,T} = F_{n,N}(1 + o(1)) = \frac{n^{2T}}{2^T\,T!}(1 + o(1)).$$
Proof. It follows from Theorem 1.7.1 that there exists a constant cl such that
um < Clmm-. 1/4
(1.8.2)
1.8 Graphs with components of two types
71
Theorem 1.4.3 shows that under the conditions of Theorem 1.8.1, 2T
Fn,N
=
2T T!
(1 + o (1))
(1
.
.
8 3) .
The condition T/n -+ 0 implies that (T - m)/(n - m) --> 0 uniformly in m, 0 < m < T. Therefore, under the conditions of Theorem 1.8.1, there exists a constant C2 such that Fn-m'N
m)2(T-m)
C2 (n -
(1.8.4)
< 2T-m(T -m)!
forallm,0<m
n
an,T = Fn,N + _
um Fn-m,N
M=1 T
2eTn
= F, ,,N
T)2
)'))
/m)).
(1.8.5)
This completes the proof because 2Tn/(n - T)2 -> 0. Let Wn,T be the number of vertices contained in the unicyclic components of the random graph in An, T. It is easily seen from Theorem 1.8.1 that if n, T - oc
and Tin -+ 0, then P160n,T = 0) - 1, and the limit distributions of the number of trees of fixed sizes in a random graph from A,, T coincide with the corresponding limit distributions in a random forest and are described in Theorems 1.5.1 and 1.5.2; the limit distribution of the maximum size of trees in a random graph from A,,T is given in Theorem 1.6.1.
Now let n, T -* oo such that 0 = 2T/n - A, 0 < A < 1. According to Theorem. 1.4.3, under these conditions, n Fn,N =
If n, T
oo, 2T/n
2 T 1l 2T T!
(1+0(1))-
(1.8.6)
A, 0 < A < 1, and m = o(n), then by Theorem 1.4.3,
Fn_m,N =
(n -
m)2(T-m)
1 -,l
2T -m (T -m)!
(1+00)).
(1.8.7)
Since 0 = 2T/n -* A, 0 < A < 1, implies 2(T - m)/(n - m) < 0, there exists a constant c such that
c(n 2T-m(T - m)!
m)2(T-m)
Fn_m,N <
(1.8.8)
72
The generalized scheme of allocation and the components of random graphs
In subsequent proofs, we will use a cumbersome technical estimate given in the following lemma.
Let n, T -+ oo and let there be constants Ao and k1 such that 0 < A0 < 9 = 2T/n < X1 < 1. Then Lemma 1.8.1.
(1- T)...(1-m T I (1-
Cn,T(m) = e2Tm/n
n2(T-m)
1
1)...(1-mn
x(1- n
1)
(1.8.9)
< 1,
where mo < m < T and mo is sufficiently large.
Proof. Write the logarithm of cn,T(m) as
log cn,T(m) =
2Tm
+
n
it
m
1 - T + 2(T - m) log (1 - n
log i=1
M-1
+
log
1-
i=1
00
n
m-1
1
00
k
k=1
00
+Y k=1
kTkl -
k
k=2
i=1
00
2m m lk
(n/
k
2T m lk 1
k=1
(n/ m-1
1:
tk
i=l
Using (m - 1)k+1 i=1
ik >
k+1
we obtain the estimate 00
logcn,T(m) < k=1
(
2m m k k (n )
00
k=1
2T
k+1 (n) (m - 1)k+1
(m - 1)k+l
k=1 00
m k+1
k(k + 1) Tk
k=1
k(k + 1)nk
Mk+l
1k-1(k(k + 1)nk 1
(1-M
(2(k + 1) -
2Tk n
/ 1 )k+l)) Tk-(1-M
k+1 nk
1.8 Graphs with components of two types
73
To prove the assertion of the lemma, we note that for sufficiently large m,
/
k+l / ek-I1-m11 <0
1 \k+1 2k
ck=2(k+1)-Bk-11-m I
for all k. Indeed, since 0 < Xo < a < X1 < 1, for sufficiently large m, 1
)k+ l 2k
1-m
ek >
2k,
k > 1,
and therefore
ck <2(k+1)-2k, which implies that ck < 0 for all k > 3 and sufficiently large m. In addition,
(
Cl
4 2 =4-8- r1-m122 e- 1-m/f112< 3-9-e2 e++m,
C2
4 ( =6-29- 1-m1\3 J 021-m1)3 <5-20
4
4
4
B2+B2m+rn
C
and Cl < 0, c2 < 0 for sufficiently large m, since for 0 < Xo < 0 < XI < 1,
3-6- 2
4
5-26-B2 <0.
<0,
Let bn,i be the number of connected unicyclic graphs with n vertices that belong AT, i = 1, 2, 3. If this specification is of no significance, we write bn for the to A;,'
number of connected unicyclic graphs. Let an,T (k) be the number of graphs in An,T with exactly k cycles. It is clear that 00
an,T (k) =
mI . lienl
n 1 Y (m)Fnm,N m=k
...bmk (1.8.10)
m
1
+ +m k =m
As in Section 1.7, let bnxn
00
B(x) _ n=1
n! (1.8.11)
bn
Bi(x) =
ixn
n=1
and set
x = Be-. B
1, 2, 3,
The generalized scheme of allocation and the components of random graphs
74
For such x, according to (1.7.1) and (1.7.2),
B1(x) _ -21og(1 -0)+O- 4(1 - (1 - 9)2) _ -Z log(1 -0)+ 210+102,
4B2(x)
_ -Z log(1 - 9) - 4(1 - (1 - 9)2)
_ -?log(1-9)-29+492, B3(x) _ -21og(1-9)-9+4(1-(1-9)2)
_ -2log(1-9)-29-492. Theorem 1.8.2. If n, T -+ oo such that 9 = 2T/n -+ A, 0 < A < 1, then for any i = 1, 2,3 and anyfixed k = 0, 1, ... , a,W (k)
=
n2T r I_
2TTi k!
Ak
` 0+00)),
where an T (k) is the number of graphs in An` T with exactly k cycles, and
Al = 2-og -.k +-+2 4 a,2
1
2
A2=-Zlog(1-k)-2+ 4 X
A3=-2log(1-A)-21
,l2 2
4
Proof. We partition the first sum of (1.8.10) into two parts, S1 and S2. We set M = T 1/4 and include in S1 the summands with m < M. For any x from the convergence domain of the series (1.8.11), the estimate
E m>M
bmI mix ... bm mt+...+mk=m
mk
< (B(x)
x
MI! ... Mk!
)k-1
"' m>>M/k m!
m
(1.8.12)
holds. As in Section 1.7, let do be the number of connected graphs of single-valued mappings of a set with n elements into itself and let
dmxm
d(x) _
m=1
m!
Since
m-1
bm
k=o
mk
<(m-1)!em
1.8 Graphs with components of two types
75
(see the proof of Lemma 1.7.5), the estimate bmxm
. (ex)m
m>M/k m'
m>M/k
holds. Recall that we chose x = Oe-e. According to the hypothesis of the theorem,
0 = 2T/n - A., 0 < A < 1, and there exists q < I such that ex = Oet beginning with some n. Therefore bmxm
m>M/k
rr
1
<
m!
1-q
q
M/k
_B
(1.8.13)
Taking into account estimates (1.8.8), (1.8.9), and Lemma 1.8.1, we find that I S2
k! L1
(n)Fn-m,NM !bml...b
Mrnll...ynk!
L1
m>M ml+...-fmk=m
C
k
n! (n - m ) 2(T- m) bml ... bmk
v" m>M
ki .2 T Ti.
m (1-n) Cn 2T
k12TT!
e
L_.
m>M ml+"'+mk=m 2T -2m
m(
n
n1
1 (l_m_1bm'mk
1-T 1
E
1-T
ml! . . . mkt
bml...bmk
(Oe_B)m
>
mM ml+' +mk=m m
2T
< k!2TT!(B(x))k-1
MI!...Mkl
E bM!
m>M/k C2n2T
q Tl/4
k!2TT!(1-q)
/k ,
where cl, C2 are some constants. Thus, under the conditions of the theorem, S2 = O(n 2T/(2TT!)).
We now estimate the sum Si. According to (1.8.8),
T! Fn-m,N -
T ! (n
2T m(T - m)!
- n2Txm
1-
2Tnm
uniformly in m < M = T t/a
0+00))
(1 +O(1))
76
The generalized scheme of allocation and the components of random graphs
rE Er
Therefore, for any fixed k = 1, 2, ... , 1
S1
= kl
M!
m<M nll+...+mk=m
n2T
In
M
X
1
(n)bml
k! 2TT!
F
...bmk
mlt ... ynkl
bmlx"'1 ... bkx"'k (1+0(1)).
m1!...Ink!
m=k
Taking into account the estimate of S2, we obtain
Sl -
n2T
1-
k! 2 T T! M=k
X
ml+...+mk=m
bmlxml ... bmkx"'k (1 + 0(1)) + 0(1) yn11...nrkl
n2T 1- (B(x))k (1+0(1)). 2TT!k!
Combining the estimates of Sl and S2 yields 2T n an,T (k) = k!
2T_
(B(x))k(1 + o(1))
T!
9e-0
under the hypothesis of the theorem. Since x =
B1(x) -* Al,
Theorem 1.8.3.
B3(x) - A3.
A2,
oo such that 0 = 2T/n
If n, T an`T
B2(x)
-k Ae-X, we also have
n2T2T/T!1-J
=
eA',
A, 0 < A < 1, then
i = 1, 2> 3,
where Ai, i = 1, 2, 3, as in Theorem 1.8.2. Proof. To obtain the asymptotics of an,T, we have to estimate the sum 00
an,T =
(1.8.14)
Ean,T(k).
k=0
After normalization, we have n2T )-I
( 2TT !
00
an,T = k=0
n2T )-I
an,T(k),
(2T T !
where f o r any fixed k = 0, 1, ... , 2T
-1 an,T (k)
B(Xe
1
k!
(1.8.15)
1.8 Graphs with components of two types
77
as n, T -+ oo, 2T/n -+ ,l, 0 < ), < 1. We can pass to the limit under the sum in (1.8.15) if the series converges uniformly with respect to the parameters n, T. To see this, it suffices to obtain an estimate
(!)' n2T
an,T
(1.8.16)
Ak
such that the series F_'O Ak converges. Using (1.8.8) and (1.8.9) and reasoning as we did in the proof of the estimate of S2 give
E
an,T(k) =k!1 n
ml+...+mk=m
m=1
Cn2T
...bmk m! 1 nynJFn-m,Nbm1 MI!...Mk!
°O
xmbml
k!2TT! m=k ml+...+mk=m
=
bmk
MI!...mk!
cn2T (B(k))k
2TT!k!
Thus we have an estimate of the form (1.8.16) and can pass to the limit under the sum in (1.8.15) to obtain n2T
1-,l
an,T = 2T T, Depending on the set of graphs under consideration, replace B(x) with BI (x), B2 W, or B3(x), and Theorem 1.8.3 is proved.
A random graph from $\mathcal{A}_{n,T}$ has exactly $N = n - T$ trees and a random number $\kappa_{n,T}$ of unicyclic components. We denote by $\kappa^{(i)}_{n,T}$ the number of unicyclic components in a random graph from $\mathcal{A}^{(i)}_{n,T}$, $i = 1, 2, 3$.

Theorem 1.8.4. If $n, T \to \infty$ such that $\theta = 2T/n \to \lambda$, $0 < \lambda < 1$, then for any $i = 1, 2, 3$ and for any fixed $k = 0, 1, \ldots$,

$$P\{\kappa^{(i)}_{n,T} = k\} = \frac{A_i^k e^{-A_i}}{k!}\,(1+o(1)),$$

where the $A_i$ are as in Theorem 1.8.2.

Proof. The assertions of the theorem follow from Theorems 1.8.2 and 1.8.3, since

$$P\{\kappa^{(i)}_{n,T} = k\} = a^{(i)}_{n,T}(k)\big/a^{(i)}_{n,T}.$$
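As a quick numerical illustration of Theorem 1.8.4, the limiting Poisson parameters $A_i$ can be tabulated and the limit law evaluated directly. The sketch below, in Python, is our illustration rather than part of the original text; it uses the explicit formulas for $A_1, A_2, A_3$ restated later in Theorem 2.1.5, and the function names are of our choosing.

```python
import math

def A(i, lam):
    # Limiting Poisson parameter A_i for the number of unicyclic
    # components when theta = 2T/n -> lam, 0 < lam < 1 (Theorem 1.8.2;
    # explicit formulas as restated in Theorem 2.1.5).
    base = -0.5 * math.log(1.0 - lam)
    if i == 1:
        return base + lam / 2 + lam**2 / 4
    if i == 2:
        return base - lam / 2 + lam**2 / 4
    return base - lam / 2 - lam**2 / 4

def poisson_pmf(k, a):
    # Limiting probability P{kappa = k} = a^k e^(-a) / k!
    return a**k * math.exp(-a) / math.factorial(k)
```

For example, at $\lambda = 1/2$ one gets $A_1 > A_2 > A_3 > 0$, reflecting the fact that allowing loops and cycles of length 2 enlarges the supply of unicyclic components.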
Now we consider the case $\theta = 2T/n \to 1$. Let $\omega^{(i)}_{n,T}$ be the number of vertices that lie in the unicyclic components of a random graph from $\mathcal{A}^{(i)}_{n,T}$, $i = 1, 2, 3$. It is clear that if we know the distribution of a characteristic of the random graph under the condition $\omega^{(i)}_{n,T} = m$, then the unconditional distribution can be obtained by averaging over the distribution of $\omega^{(i)}_{n,T}$.

Theorem 1.8.5. If $n, T \to \infty$ such that $\varepsilon = 1 - 2T/n \to 0$ and $\varepsilon^3 n \to \infty$, then for any $i = 1, 2, 3$,

$$P\{\omega^{(i)}_{n,T} = m\} = \frac{\varepsilon^2}{2\,\Gamma(1/4)}\,y^{-3/4}e^{-y}\,(1+o(1))$$

uniformly with respect to $m$ such that $y = \varepsilon^2 m/2$ lies in any fixed interval of the form $0 < y_0 \le y \le y_1 < \infty$, and there exists a constant $A$ such that, for all $m$,

$$P\{\omega^{(i)}_{n,T} = m\} \le \frac{A\,\varepsilon^2}{2}\,y^{-3/4}e^{-y}.$$
Proof. We denote the number of graphs in $\mathcal{A}^{(i)}_{n,T}$ by $a^{(i)}_{n,T}$ and the number of graphs for which $\omega^{(i)}_{n,T} = m$ by $a^{(i)}_{n,T,m}$. Clearly,

$$a^{(i)}_{n,T} = \sum_{m=0}^{T} a^{(i)}_{n,T,m}, \tag{1.8.17}$$

$$a^{(i)}_{n,T,m} = \binom{n}{m}\,u^{(i)}_m\,F_{n-m,N}. \tag{1.8.18}$$

We decompose the sum in (1.8.17) into two parts. Let $0 < y_0 \le y_1 < \infty$, $y = \varepsilon^2 m/2$, and

$$S_1 = \sum_{m:\,y\in[y_0,y_1]} a^{(i)}_{n,T,m},\qquad S_2 = \sum_{m:\,y\notin[y_0,y_1]} a^{(i)}_{n,T,m}.$$

By Theorem 1.7.1 and the equalities (1.7.21),

$$u^{(i)}_m = A_i\,m^{m-1/4}\,(1+o(1)) \tag{1.8.19}$$

uniformly in $m$ in the region $y_0 \le y \le y_1$, where the $A_i$, $i = 1, 2, 3$, are defined in (1.7.22). There exists a constant $c_1$ such that, for all $m$,

$$u^{(i)}_m \le c_1\,m^{m-1/4}. \tag{1.8.20}$$

To estimate $F_{n-m,N}$, it is convenient to use the intermediate formula (1.4.25). From (1.4.26) and the equality $\theta(2-\theta) = 1 - \varepsilon^2$, we have

$$F_{n-m,N} = \frac{(n-m)!\,(1-\varepsilon^2)^N}{2^N N!\,x^{n-m}}\,P\{\zeta_N = n-m\}, \tag{1.8.21}$$
where $x = \theta e^{-\theta}$ and, according to Theorem 1.4.1,

$$P\{\zeta_N = k\} = \frac{1}{\sigma\sqrt{2\pi N}}\,e^{-u^2/2}\,(1+o(1))$$

uniformly in $k$ such that $u = (k - N\mu)/(\sigma\sqrt{N})$ lies in any finite interval, with

$$\mu = \frac{2}{2-\theta} = \frac{n}{N},\qquad \sigma^2 = \frac{2\theta}{(1-\theta)(2-\theta)^2} = \frac{2(1-\varepsilon)}{\varepsilon(1+\varepsilon)^2}.$$

If $\varepsilon \to 0$, $\varepsilon^3 n \to \infty$, and $m^2\varepsilon/n \to 0$, then for $k = n - m$,

$$u = \frac{(n-m) - N\mu}{\sigma\sqrt{N}} = -\frac{m}{\sigma\sqrt{N}} \to 0.$$

Consequently,

$$P\{\zeta_N = n-m\} = \frac{1}{\sigma\sqrt{2\pi N}}\,(1+o(1)). \tag{1.8.22}$$
It follows from (1.8.21) and (1.8.22) that

$$F_{n-m,N} = \frac{(n-m)!\,(1-\varepsilon^2)^N e^{n(1-\varepsilon)}}{2^N N!\,(1-\varepsilon)^n}\,\sqrt{\frac{\varepsilon}{2\pi n}}\,(1-\varepsilon)^m e^{-m(1-\varepsilon)}\,(1+o(1)). \tag{1.8.23}$$

There exists a constant $c$ such that $\sigma\sqrt{2\pi N}\,P\{\zeta_N = k\} \le c$ for all $k$; therefore

$$F_{n-m,N} \le \frac{c\,(n-m)!\,(1-\varepsilon^2)^N e^{n(1-\varepsilon)}}{2^N N!\,(1-\varepsilon)^n}\,\sqrt{\frac{\varepsilon}{2\pi n}}\,(1-\varepsilon)^m e^{-m(1-\varepsilon)} \tag{1.8.24}$$

for all $m$, $0 \le m \le T$. We note that as $\varepsilon \to 0$,

$$(1-\varepsilon)^m e^{\varepsilon m} = e^{-y}\,(1+o(1))$$

uniformly in $m$ such that $y_0 \le y \le y_1$, and for all $m$, $(1-\varepsilon)^m e^{\varepsilon m} \le e^{-y}$.
Clearly, (1.8.23) holds uniformly in $m$ such that $y = \varepsilon^2 m/2$ lies in the interval $[y_0, y_1]$. Therefore, if $n \to \infty$, $\varepsilon = 1 - 2T/n \to 0$, and $\varepsilon^3 n \to \infty$, then

$$F_{n-m,N} = f_n\,(n-m)!\,e^{-m-y}\,(1+o(1))$$

uniformly in $m$ such that $y_0 \le y \le y_1$, where

$$f_n = \frac{(1-\varepsilon^2)^N e^{n(1-\varepsilon)}}{2^N N!\,(1-\varepsilon)^n}\,\sqrt{\frac{\varepsilon}{2\pi n}}, \tag{1.8.25}$$
and there exists a constant $A_0$ such that, for all $m$,

$$F_{n-m,N} \le A_0\,f_n\,(n-m)!\,e^{-m-y}. \tag{1.8.26}$$
Therefore, by (1.8.18), (1.8.19), and (1.8.25), we have the equality

$$a^{(i)}_{n,T,m} = n!\,A_i\,f_n\,\frac{m^{m-1/4}\,e^{-m-y}}{m!}\,(1+o(1)) = \frac{n!\,A_i\,f_n\,2^{1/4}\,\Gamma(1/4)}{\sqrt{2\pi\varepsilon}}\cdot\frac{1}{\Gamma(1/4)}\,y^{-3/4}e^{-y}\,\frac{\varepsilon^2}{2}\,(1+o(1)), \tag{1.8.27}$$

which holds uniformly in $m$ such that $y_0 \le y \le y_1$; outside of this domain, by (1.8.18), (1.8.20), and (1.8.26), we have

$$a^{(i)}_{n,T,m} \le \frac{A\,n!\,f_n\,2^{1/4}\,\Gamma(1/4)}{\sqrt{2\pi\varepsilon}}\cdot\frac{1}{\Gamma(1/4)}\,y^{-3/4}e^{-y}\,\frac{\varepsilon^2}{2}, \tag{1.8.28}$$
where $A$ is a constant. The sum

$$\sum_{m:\,y\in[y_0,y_1]}\frac{1}{\Gamma(1/4)}\,y^{-3/4}e^{-y}\,\frac{\varepsilon^2}{2}$$

is the integral sum of the function $(\Gamma(1/4))^{-1}z^{-3/4}e^{-z}$ with step $\varepsilon^2/2$. Therefore, by choosing $y_0$ small enough and $y_1$ and $n$ large enough, this sum can be made arbitrarily close to 1, and the sum over the remaining values of $m$ can be made arbitrarily small. Thus

$$a^{(i)}_{n,T} = \frac{n!\,A_i\,f_n\,2^{1/4}\,\Gamma(1/4)}{\sqrt{2\pi\varepsilon}}\,(1+o(1)).$$

Now it follows from (1.8.27) and (1.8.28) that

$$P\{\omega^{(i)}_{n,T} = m\} = \frac{a^{(i)}_{n,T,m}}{a^{(i)}_{n,T}} = \frac{\varepsilon^2}{2\,\Gamma(1/4)}\,y^{-3/4}e^{-y}\,(1+o(1))$$

uniformly in $m$ such that $y_0 \le y \le y_1$, and outside this domain,

$$P\{\omega^{(i)}_{n,T} = m\} \le \frac{A\,\varepsilon^2}{2}\,y^{-3/4}e^{-y}.$$

This completes the proof of the theorem.
When we substitute the exact expressions for $A_i$ and $f_n$, we obtain, for $i = 1, 2, 3$,

$$a^{(i)}_{n,T} = C_i\,\frac{n!\,(1-\varepsilon^2)^N e^{n(1-\varepsilon)}}{2^N N!\,(1-\varepsilon)^n\,\sqrt{2\pi n}}\,(1+o(1)), \tag{1.8.29}$$

where $C_1 = e^{3/4}$, $C_2 = e^{-1/4}$, and $C_3 = e^{-3/4}$. It is easy to confirm that if $\varepsilon = 1 - 2T/n \to 0$, then

$$\frac{n!\,(1-\varepsilon^2)^N e^{n(1-\varepsilon)}}{2^N N!\,(1-\varepsilon)^n\,\sqrt{2\pi n}} = \frac{n^{2T}}{2^T T!}\,(1+o(1)).$$
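The confirmation amounts to Stirling's formula together with an exact cancellation of the logarithmic terms, and it is easy to check numerically with logarithms of factorials. The following Python sketch is our illustration (the function names are of our choosing), comparing the logarithms of the two sides for subcritical parameter choices.

```python
from math import lgamma, log, pi

def log_lhs(n, T):
    # log of n!(1-eps^2)^N e^(n(1-eps)) / (2^N N! (1-eps)^n sqrt(2 pi n)),
    # with N = n - T and eps = 1 - 2T/n
    N = n - T
    eps = 1.0 - 2.0 * T / n
    return (lgamma(n + 1) + N * log(1.0 - eps**2) + n * (1.0 - eps)
            - N * log(2.0) - lgamma(N + 1) - n * log(1.0 - eps)
            - 0.5 * log(2 * pi * n))

def log_rhs(n, T):
    # log of n^(2T) / (2^T T!)
    return 2 * T * log(n) - T * log(2.0) - lgamma(T + 1)
```

The difference of the two logarithms shrinks as $\varepsilon \to 0$, consistent with the $(1+o(1))$ factor in the identity.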
Thus, under the conditions of Theorem 1.8.5, the asymptotic formulas

$$a^{(i)}_{n,T} = C_i\,\frac{n^{2T}}{2^T T!}\,(1+o(1)),\qquad i = 1, 2, 3,$$

are valid.

Let $\kappa_{n,T}$ denote the number of unicyclic components in a random graph from $\mathcal{A}_{n,T}$, and let $\beta_{n,T}$ denote the number of vertices in the maximal unicyclic component.

Theorem 1.8.6. If $n, T \to \infty$ such that $\varepsilon = 1 - 2T/n \to 0$ and $\varepsilon^3 n \to \infty$, then for any fixed $x$,

$$P\left\{\frac{\kappa_{n,T} + \frac{1}{2}\log\varepsilon}{\sqrt{-\frac{1}{2}\log\varepsilon}} < x\right\} \to \Phi(x) = \frac{1}{\sqrt{2\pi}}\int_{-\infty}^{x} e^{-u^2/2}\,du.$$
Proof. For any fixed $x$,

$$P\left\{\frac{\kappa_{n,T} + \frac{1}{2}\log\varepsilon}{\sqrt{-\frac{1}{2}\log\varepsilon}} < x\right\} = \sum_{m=0}^{\infty} P\{\omega_{n,T} = m\}\,P\left\{\frac{\kappa_m + \frac{1}{2}\log\varepsilon}{\sqrt{-\frac{1}{2}\log\varepsilon}} < x\right\},$$

where $\kappa_m$ is the number of components in a random graph from $\mathcal{U}_m$ discussed in Section 1.7. By Theorem 1.7.2, the random variable

$$\frac{\kappa_m - \frac{1}{4}\log m}{\sqrt{\frac{1}{4}\log m}}$$

is asymptotically normal with parameters $(0, 1)$. Let $y = \varepsilon^2 m/2$ and $0 < y_0 \le y \le y_1 < \infty$. Then $\log m = \log(2y) - 2\log\varepsilon$. Further, since $\varepsilon \to 0$,

$$P\left\{\frac{\kappa_m + \frac{1}{2}\log\varepsilon}{\sqrt{-\frac{1}{2}\log\varepsilon}} < x\right\} \to \Phi(x)$$

uniformly in $m$ such that $y \in [y_0, y_1]$, and this limit does not depend on $m$. In view of Theorem 1.8.5, by choosing $y_0$ small enough and $y_1$ and $n$ large enough, the sum

$$\sum_{m:\,y\in[y_0,y_1]} P\{\omega_{n,T} = m\}$$
can be made arbitrarily close to 1. Therefore

$$P\left\{\frac{\kappa_{n,T} + \frac{1}{2}\log\varepsilon}{\sqrt{-\frac{1}{2}\log\varepsilon}} < x\right\} \to \Phi(x)$$

for any fixed $x$.
Consider now the maximum size of the unicyclic components. Recall that in Section 1.7 we introduced the functions $W_s(z, y)$, setting $W_0(z, y) = 1$ and

$$W_s(z, y) = \int\cdots\int_{X_s(z,y)} \frac{dx_1\cdots dx_s}{x_1\cdots x_s\,(z - x_1 - \cdots - x_s)^{3/4}},\qquad s = 1, 2, \ldots,$$

where

$$X_s(z, y) = \{x_i > y,\ i = 1, \ldots, s,\ x_1 + \cdots + x_s < z\}.$$

Theorem 1.8.7. If $n, T \to \infty$ such that $\varepsilon = 1 - 2T/n \to 0$ and $\varepsilon^3 n \to \infty$, then for any fixed $y > 0$,

$$P\{\varepsilon^2\beta_{n,T} \le y\} \to \sum_{s=0}^{\infty}\frac{(-1)^s}{4^s\,s!}\,Z_s(y),$$

where

$$Z_s(y) = \frac{1}{\Gamma(1/4)}\int_0^{\infty} u^{-3/4}e^{-u}\,W_s\!\left(1, \frac{y}{2u}\right)du,\qquad s = 0, 1, \ldots.$$
Proof. For any fixed $y > 0$,

$$P\{\varepsilon^2\beta_{n,T} \le y\} = \sum_{m=0}^{\infty} P\{\omega_{n,T} = m\}\,P\{\varepsilon^2\beta_m \le y\},$$

where $\beta_m$ is the maximum size of the components in a random graph from $\mathcal{U}_m$ studied in Section 1.7. If $y' = \varepsilon^2 m/2$ and $y' \in [y_0, y_1]$, then

$$P\{\varepsilon^2\beta_m \le y\} = P\left\{\frac{\beta_m}{m} \le \frac{y}{2y'}\right\} = \sum_{s=0}^{\infty}\frac{(-1)^s}{4^s\,s!}\,W_s\!\left(1, \frac{y}{2y'}\right) + o(1).$$

It is clear that this holds uniformly in $m$ such that $y' \in [y_0, y_1]$. Choosing a small enough $y_0$ and a large enough $y_1$ and averaging over the distribution of $\omega_{n,T}$ prove Theorem 1.8.7.
The number of trees in any graph of $\mathcal{A}_{n,T}$ is $N = n - T$. Let $\eta_{n,T}$ be the maximum size of the trees in a random graph from $\mathcal{A}_{n,T}$.

Theorem 1.8.8. If $n, T \to \infty$ such that $\varepsilon = 1 - 2T/n \to 0$ and $\varepsilon^3 n \to \infty$, then

$$P\{\beta_0\,\eta_{n,T} - u < z\} \to e^{-e^{-z}},$$

where $\beta_0 = -\log(\theta e^{1-\theta})$, $\theta = 2T/n$, and $u$ is the root of the equation

$$\left(\frac{2}{\pi}\right)^{1/2} N\beta_0^{3/2} = u^{5/2}e^{u}. \tag{1.8.30}$$
Proof. It is clear that

$$P\{\beta_0\,\eta_{n,T} - u < z\} = \sum_{m=0}^{\infty} P\{\omega_{n,T} = m\}\,P\{\beta_0\,\eta_{n-m,T-m} - u < z\}. \tag{1.8.31}$$

Let $\nu = \varepsilon^3 n$. It is easily seen that, under the conditions of Theorem 1.8.8, the root of equation (1.8.30) can be written as

$$u = \log\nu - \frac{5}{2}\log\log\nu - \log 4\sqrt{\pi} + o(1). \tag{1.8.32}$$

Let $y = \varepsilon^2 m/2$ lie in a finite interval $0 < y_0 \le y \le y_1 < \infty$. Set

$$\theta_m = \frac{2(T-m)}{n-m},\qquad \varepsilon_m = 1 - \frac{2(T-m)}{n-m},\qquad \nu_m = \varepsilon_m^3\,(n-m),\qquad \beta_0(m) = -\log\bigl(\theta_m e^{1-\theta_m}\bigr).$$

Since $\varepsilon_m = \varepsilon\,(1+o(1))$, it follows from (1.8.32) that the root $u_m$ of the equation

$$\left(\frac{2}{\pi}\right)^{1/2} N\bigl(\beta_0(m)\bigr)^{3/2} = u^{5/2}e^{u}$$

can be written as

$$u_m = \log\nu_m - \frac{5}{2}\log\log\nu_m - \log 4\sqrt{\pi} + o(1) = u + o(1)$$

uniformly in $y$ in any fixed interval $[y_0, y_1]$. Therefore, by applying Theorem 1.6.3, we obtain

$$P\{\beta_0\,\eta_{n-m,T-m} - u < z\} \to e^{-e^{-z}} \tag{1.8.33}$$

uniformly in $y \in [y_0, y_1]$. In the main part of the sum in (1.8.31), this probability does not depend on $m$ asymptotically. Therefore, averaging (1.8.33) over the distribution of $\omega_{n,T}$ proves Theorem 1.8.8.
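Equation (1.8.30) is easy to solve numerically, which gives a concrete check of the expansion (1.8.32). The Python sketch below is our illustration (the names are of our choosing): it finds the root by bisection and compares it with the asymptotic formula for a very large value of $\nu$, where the $\log\log$ correction has settled down.

```python
import math

def u_root(N, beta0):
    # Solve (2/pi)^(1/2) * N * beta0^(3/2) = u^(5/2) * e^u for u > 0
    # by bisection; the right-hand side is increasing in u.
    target = math.sqrt(2.0 / math.pi) * N * beta0**1.5
    lo, hi = 1e-9, 200.0
    for _ in range(200):
        mid = 0.5 * (lo + hi)
        if mid**2.5 * math.exp(mid) < target:
            lo = mid
        else:
            hi = mid
    return 0.5 * (lo + hi)

def u_asymptotic(nu):
    # Expansion (1.8.32) with nu = eps^3 * n:
    # u = log(nu) - (5/2) log log(nu) - log(4 sqrt(pi)) + o(1)
    return (math.log(nu) - 2.5 * math.log(math.log(nu))
            - math.log(4.0 * math.sqrt(math.pi)))
```

The agreement is only logarithmically good, so the test below uses $\nu = e^{50}$; for moderate $\nu$ the $o(1)$ term in (1.8.32) is still large.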
When we compare Theorems 1.8.7 and 1.8.8, we see that the maximum size of the trees in a random graph from $\mathcal{A}_{n,T}$ is greater than the maximum size of the unicyclic components, since $\beta_0 = (\varepsilon^2/2)(1+o(1))$ and $u \to \infty$. Let $\alpha_{n,T}$ be the maximum size of the components of a random graph from $\mathcal{A}_{n,T}$, that is,

$$\alpha_{n,T} = \max(\beta_{n,T},\,\eta_{n,T}).$$

Averaging over the distribution of $\omega_{n,T}$ gives the following theorem.

Theorem 1.8.9. If $n, T \to \infty$ such that $\varepsilon = 1 - 2T/n \to 0$ and $\varepsilon^3 n \to \infty$, then for any fixed $z$,

$$P\{\beta_0\,\alpha_{n,T} - u < z\} \to e^{-e^{-z}},$$

where $\beta_0 = -\log(\theta e^{1-\theta})$, $\theta = 2T/n$, and $u$ is the root of the equation

$$\left(\frac{2}{\pi}\right)^{1/2} N\beta_0^{3/2} = u^{5/2}e^{u}.$$
To conclude this section, we consider the case where $n, T \to \infty$ in such a way that $\varepsilon^3 n$ tends to a constant.

Theorem 1.8.10. If $n, T \to \infty$ such that $\varepsilon n^{1/3} \to 2\cdot 3^{-2/3}\nu$, where $\varepsilon = 1 - 2T/n$ and $\nu$ is a constant, then for any $i = 1, 2, 3$,

$$a^{(i)}_{n,T} = \frac{c_i\,n!\,e^{n}}{2^N N!\,\sqrt{N}}\,p(\nu)\,(1+o(1)),$$

where

$$c_1 = \frac{\sqrt{3}\,e^{3/4}}{2\sqrt{2}\,\Gamma(1/4)},\qquad c_2 = \frac{\sqrt{3}\,e^{-1/4}}{2\sqrt{2}\,\Gamma(1/4)},\qquad c_3 = \frac{\sqrt{3}\,e^{-3/4}}{2\sqrt{2}\,\Gamma(1/4)},$$

$$p(\nu) = \int_0^{\infty} y^{-3/4}\,p(-\nu - y;\,3/2,\,-1)\,dy,$$

and $p(u;\,3/2,\,-1)$ is the density of the stable law defined by (1.4.18).

Proof. We again use

$$a_{n,T} = \sum_{m=0}^{T}\binom{n}{m}\,u_m\,F_{n-m,N}. \tag{1.8.34}$$

According to Theorem 1.7.1, as $m \to \infty$,

$$u_m = A\,m^{m-1/4}\,(1+o(1)), \tag{1.8.35}$$

where the value of the coefficient $A$ depends on the type of the unicyclic components in $\mathcal{A}_{n,T}$, and

$$A_1 = \frac{\sqrt{2\pi}\,e^{3/4}}{2^{1/4}\,\Gamma(1/4)},\qquad A_2 = \frac{\sqrt{2\pi}\,e^{-1/4}}{2^{1/4}\,\Gamma(1/4)},\qquad A_3 = \frac{\sqrt{2\pi}\,e^{-3/4}}{2^{1/4}\,\Gamma(1/4)}.$$
To estimate $F_{n-m,N}$, we use formula (1.4.25) with $\theta = 1$. Then

$$F_{n-m,N} = \frac{(n-m)!}{2^N N!\,e^{-n+m}}\,P\{\zeta_N = n-m\}, \tag{1.8.36}$$

where $\zeta_N = \xi_1 + \cdots + \xi_N$ is a sum of independent random variables with distribution (1.4.19):

$$P\{\xi_1 = k\} = \frac{2\,k^{k-2}\,e^{-k}}{k!},\qquad k = 1, 2, \ldots.$$

By Theorem 1.4.2,

$$bN^{2/3}\,P\{\zeta_N = k\} = p(u;\,3/2,\,-1)\,(1+o(1))$$

uniformly in $k$ such that $u = (k - 2N)/(bN^{2/3})$ lies in any fixed finite interval. Under the conditions of Theorem 1.8.10,

$$\frac{n - 2N}{bN^{2/3}} \to -\nu.$$

Let $y = m/(bN^{2/3})$ and $0 < y_0 \le y \le y_1 < \infty$. Then, under the conditions of the theorem,

$$\frac{(n - m) - 2N}{bN^{2/3}} \to -\nu - y.$$

Thus, by (1.8.36),

$$F_{n-m,N} = \frac{(n-m)!\,p(-\nu - y;\,3/2,\,-1)}{2^N N!\,e^{-n+m}\,bN^{2/3}}\,(1+o(1)) \tag{1.8.37}$$
uniformly in $m$ such that $y \in [y_0, y_1]$. Since $b = 2(2/3)^{2/3}$, from (1.8.35) and (1.8.37) we obtain

$$a_{n,T,m} = \binom{n}{m}\,u_m\,F_{n-m,N} = \frac{A\,n!\,m^{m-1/4}\,p(-\nu - y;\,3/2,\,-1)}{m!\,2^N N!\,e^{-n+m}\,bN^{2/3}}\,(1+o(1)) = \frac{A\,n!\,e^{n}\,p(-\nu - y;\,3/2,\,-1)}{2^N N!\,\sqrt{2\pi}\,m^{3/4}\,bN^{2/3}}\,(1+o(1))$$

uniformly in $m$ such that $y \in [y_0, y_1]$. To obtain $a_{n,T}$, we need to carry out the summation in (1.8.34). If we choose a small enough $y_0$ and a large enough $y_1$, substitute the expression for $a_{n,T,m}$ into (1.8.34), note that the obtained sum is the integral sum of the function $y^{-3/4}\,p(-\nu - y;\,3/2,\,-1)$ with step $1/(bN^{2/3})$, and carry out the needed estimation of the tails, we have

$$a_{n,T} = \frac{c\,n!\,e^{n}}{2^N N!\,\sqrt{N}}\int_0^{\infty} y^{-3/4}\,p(-\nu - y;\,3/2,\,-1)\,dy\,(1+o(1)),$$
where

$$c = \frac{\sqrt{3}\,A}{2^{5/4}\,\sqrt{2\pi}}.$$

Recall our convention that if we consider the set $\mathcal{A}^{(i)}_{n,T}$, then $A$ is replaced by $A_i$, $i = 1, 2, 3$.

It follows from Theorem 1.8.10 that the number $\omega_{n,T}$ of the vertices that form the unicyclic components in a random graph of $\mathcal{A}_{n,T}$ has the following limit distribution: if $n, T \to \infty$ such that $\varepsilon = 1 - 2T/n \to 0$ and $\varepsilon n^{1/3} \to 2\cdot 3^{-2/3}\nu$, then

$$bN^{2/3}\,P\{\omega_{n,T} = m\} = \frac{1}{p(\nu)}\,y^{-3/4}\,p(-\nu - y;\,3/2,\,-1)\,(1+o(1))$$

uniformly in $m$ such that $y = m/(bN^{2/3})$ lies in any fixed interval of the form $0 < y_0 \le y \le y_1 < \infty$, where $p(\nu)$ is defined in Theorem 1.8.10.
1.9. Notes and references

In this book, we use a probabilistic approach to combinatorial problems. Section 1.1 provides the results from probability theory that suffice for the probabilistic analysis presented in the book. All of the results in Section 1.1 can be found in standard treatments of probability theory; however, we follow [76], where these results are given along with full proofs.
A detailed discussion of the saddle-point method can be found in [42]. Theorem 1.1.7 is a simplified version of the corresponding theorem that gives a full asymptotic expansion of G(k).
The proof of the local limit theorem (Theorem 1.1.11) was suggested by B. V. Gnedenko and is contained in the book [49], which remains one of the best textbooks on the limit theorems of probability theory (see also [43, 122, 60]). The approximation of the binomial distribution by the normal and Poisson laws was investigated by Yu. V. Prokhorov [125] (see also [90]). The inequality from Theorem 1.1.16 was proposed by Hoeffding [59] for sums of bounded random variables (see also [122]). Section 1.2 is devoted to a description of the generalized scheme of allocation of particles, which is a generalization of the multinomial trials. It was introduced in [69] and now has a significant place in probabilistic combinatorics (see also [78]). Successful applications of the generalized scheme are mostly limited to the equiprobable cases; there are only a few examples where a nonequiprobable scheme has a natural combinatorial interpretation. Along with the nonequiprobable multinomial distribution, Example 1.2.3 is an example of a nonequiprobable scheme.
Example 1.2.4 concerns random forests with rooted trees and is related to branching processes. Indeed, the distribution (1.2.11) is that of the total progeny
in the Galton–Watson process $\mu(t, G)$, which begins with one particle and in which the number of offspring of a particle has a Poisson distribution. Therefore a random forest with $N$ trees and $n$ nonroot vertices can be represented by the same process that begins with $N$ particles, under the condition that the total progeny is $n + N$. We describe more precisely the correspondence between random trees and the branching process $\mu(t, G)$ whose distribution of the number of offspring of one particle is the Poisson distribution with parameter $\lambda$.

Let $\mu_r(t, G)$ be the number of particles at time $t$ having exactly $r$ direct descendants, and let $\nu(G)$ be the total progeny over the whole period of evolution of the process. Consider the set $T_n$ of all rooted trees whose nonroot vertices are labeled $1, 2, \ldots, n$ and whose root is labeled 0. Assigning the probability $(n+1)^{-n+1}$ to each tree of $T_n$ gives the uniform distribution on $T_n$. Any vertex of a tree is joined to the root by a unique path, whose number of edges is called the height of the corresponding vertex. We assume that all the edges of a tree are directed away from the root and call the number of edges emanating from a vertex the degree of the vertex. Let $\mu_r(t, T_n)$, $r, t = 0, 1, \ldots, n$, be the number of vertices of height $t$ having degree $r$. Consider the matrices $\|\mu_r(t, T_n)\|$ and $\|\mu_r(t, G)\|$, $t, r = 0, 1, \ldots, n$, and a matrix $M = \|m_r(t)\|$ of the same dimension with nonnegative elements. Kolchin [73] showed that

$$P\{\|\mu_r(t, T_n)\| = M\} = P\{\|\mu_r(t, G)\| = M \mid \nu(G) = n + 1\}.$$

This relation means that the distribution of any random variable that can be expressed in terms of the random variables $\mu_r(t, T_n)$, $r, t = 0, 1, \ldots, n$, coincides with the conditional distribution of the corresponding random characteristic of the branching process under the condition that $\nu(G) = n + 1$. This scheme has been used widely to obtain a complete description of the properties of random trees and forests [73, 74, 75, 111, 112, 113, 114, 116]. Recently Yu. L. Pavlov [118, 119] discovered that the branching process that has a geometric distribution of the number of offspring corresponds, in the same sense as discussed above, to a random plane planted tree with unlabeled vertices. This representation of random plane planted trees is also mentioned in [4, 136, 138]. Note that we are aware of only these two branching processes, with the Poisson and the geometric distributions of the number of offspring, that lead to sets of trees with uniform distribution. Results on more general classes of forests with nonuniform distributions can be found in [120, 121]. The correspondence between random plane planted trees and a branching process that has a geometric distribution appears to be deep and can be considered as a correspondence of realizations; that is, there exists a one-to-one correspondence between the set of such trees and the realizations of the corresponding
branching process. It seems that this fact was first pointed out in an explicit form by V. A. Vatutin [138].

The general approach to investigating connectivity and the sizes of components of random graphs of various types is presented in Section 1.3. This general approach was first outlined by Kolchin [78], but its particular forms had already been used to investigate other random graphs, such as random permutations, random mappings, and random forests of rooted trees [71, 72, 73, 74, 75].
Forests of nonrooted trees are investigated in Sections 1.4–1.6. Section 1.4 concerns the number of such forests. The number of forests of $N$ labeled rooted trees with $n$ nonroot vertices is $N(N+n)^{n-1}$. In contrast to the forests of rooted trees, the number $F_{n,N}$ of nonrooted forests cannot be expressed by a simple formula. A complete analysis of the random forests of nonrooted trees was conducted by V. E. Britikov, who used the generalized scheme of allocation. The possibility of using such an approach was pointed out in [78, 77]. When Britikov began investigating $F_{n,N}$, it was known only that for any fixed $N$, as $n \to \infty$,

$$F_{n,N} = \frac{n^{n-2}}{2^{N-1}(N-1)!}\,(1+o(1)). \tag{1.9.1}$$

A complete description of the asymptotic behavior of $F_{n,N}$ can be found in [29]. In particular, formula (1.9.1) is generalized there to the case $N \to \infty$: if $n \to \infty$ and $(1 - 2T/n)^3 n \to \infty$, where $T = n - N$ is the number of edges of the forest, then $F_{n,N} = f_n\,n!\,(1+o(1))$ with $f_n$ as in (1.8.25). The cases in which $(1 - 2T/n)^3 n$ tends to a constant and $(1 - 2T/n)^3 n \to -\infty$ are covered by Theorems 1.4.4 and 1.4.3, respectively.

Section 1.5 deals with the numbers $\mu_r$ of trees with $r$ vertices, $r = 3, 4, \ldots$, in a random forest. A complete description of the limit distributions of these random variables was obtained by Britikov [30]. Theorems 1.5.1 and 1.5.2 summarize the results proved in [30], where, in addition, the behavior of $\mu_1$ and $\mu_2$ is analyzed. The general approach used to investigate the order statistics in the generalized scheme was suggested in [70] and is also described in Lemma 1.2.2 in [78]. In Section 1.6, we apply this approach to the maximum size of trees in random unrooted forests. The results of this section were obtained by Britikov [28]. Theorems 1.6.1–1.6.5 cover all possible regular variations of the parameters $n$ and $N$, but not the case where $N$ is bounded. Clearly, for any fixed $k$, the size of the $k$th largest tree of the forest can be analyzed in the same way. Łuczak and Pittel [101] realized this possibility and interpreted the results of their analysis as an evolution of a random forest (see also [31]).
It is pertinent to note here the results that concern the investigations of the ordered series of components of wide classes of random graphs [4, 7, 14, 15, 35, 36, 41, 56]. There are two natural ways of labeling the components. One way is to arrange them in decreasing order; the other is to use a particular random labeling called the size-biased permutation. For the first type of labeling, let $M_1 \ge M_2 \ge \cdots$ be the sequence of sizes of the components of a graph with $n$ vertices, numbered in decreasing order. Let $C_1$ be the size of the component that contains the vertex with label 1, let $C_2$ be the size of the component that contains the vertex with the smallest label among the vertices not included in the first component, and so on.

It is clear that the joint distribution of the random variables $C_1, C_2, \ldots$ normalized by $n$ places unit mass on the set

$$\Delta = \{(x_1, x_2, \ldots):\ x_i \ge 0,\ x_1 + x_2 + \cdots = 1\}$$

of infinite sequences of nonnegative numbers, and the joint distribution of $M_1, M_2, \ldots$ normalized by $n$ is concentrated on the subset

$$\nabla = \{(x_1, x_2, \ldots) \in \Delta:\ x_1 \ge x_2 \ge \cdots\}.$$

For some classes of graphs, the limit distributions of the sequences $C_1, C_2, \ldots$ and $M_1, M_2, \ldots$ are known. Let us describe a class of the limit distributions. Let $Z_1, Z_2, \ldots$ be independent identically distributed random variables with density $\theta(1-z)^{\theta-1}$, $0 < z < 1$, $\theta > 0$. Let

$$Y_1 = Z_1,\qquad Y_2 = Z_2(1 - Z_1),\qquad Y_3 = Z_3(1 - Z_1)(1 - Z_2),\ \ldots,$$

and let $Y_{(1)} \ge Y_{(2)} \ge \cdots$ be the order statistics constructed from $Y_1, Y_2, \ldots$. The distribution of $Y_1, Y_2, \ldots$ on $\Delta$ is called the GEM distribution with parameter $\theta$, and the distribution of $Y_{(1)}, Y_{(2)}, \ldots$ on $\nabla$ is called the Poisson–Dirichlet distribution with parameter $\theta$. It is known that the distribution of the random variables $M_1, M_2, \ldots$ normalized by $n$ for the cycle sizes of a random permutation of degree $n$ converges, as $n \to \infty$, to the Poisson–Dirichlet distribution with parameter $\theta = 1$, and that the random variable $C_1$ is uniformly distributed on the set $\{1, \ldots, n\}$ (see, for example, [78]). For random mappings, the distributions of the random variables $C_1, C_2, \ldots$ and $M_1, M_2, \ldots$ normalized by $n$ converge, respectively, to the GEM distribution and the Poisson–Dirichlet distribution with parameter $\theta = 1/2$ [3]. As usual, let $\alpha_r$ denote the number of components of size $r$ of a random graph with $n$ vertices. The joint distribution of the random variables $\alpha_1, \ldots, \alpha_n$ of the form
$$P\{\alpha_1 = a_1, \ldots, \alpha_n = a_n\} = \binom{\theta + n - 1}{n}^{-1}\frac{\theta^{a_1 + \cdots + a_n}}{1^{a_1}\,2^{a_2}\cdots n^{a_n}\,a_1!\cdots a_n!},$$

where $a_1, \ldots, a_n$ are nonnegative integers such that $a_1 + 2a_2 + \cdots + na_n = n$, is similar to the joint distribution of the random variables $\alpha_1, \ldots, \alpha_n$ for a random permutation (see Lemma 1.3.7). This distribution arises frequently in population genetics and is known as the Ewens distribution [40, 67].
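The Ewens distribution above can be verified to be a probability distribution by direct enumeration for small $n$. The following Python sketch is our illustration (names of our choosing): it enumerates partitions of $n$ as count vectors $(a_1, \ldots, a_n)$ and sums the probabilities, using the identity $\binom{\theta+n-1}{n} = \theta(\theta+1)\cdots(\theta+n-1)/n!$.

```python
from math import factorial, prod

def partitions_as_counts(n, max_part=None):
    # Yield every partition of n as a dict {part size j: multiplicity a_j},
    # i.e., all (a_1, ..., a_n) with a_1 + 2 a_2 + ... + n a_n = n.
    if max_part is None:
        max_part = n
    if n == 0:
        yield {}
        return
    for j in range(min(n, max_part), 0, -1):
        for rest in partitions_as_counts(n - j, j):
            d = dict(rest)
            d[j] = d.get(j, 0) + 1
            yield d

def ewens_prob(counts, n, theta):
    # P{alpha_1 = a_1, ..., alpha_n = a_n} for the Ewens distribution.
    rising = prod(theta + i for i in range(n))  # theta(theta+1)...(theta+n-1)
    weight = prod((theta / j)**a / factorial(a) for j, a in counts.items())
    return factorial(n) * weight / rising
```

At $\theta = 1$ the Ewens distribution reduces to the cycle-type distribution of a uniform random permutation; for instance, a single cycle of length $n$ then has probability $1/n$.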
If the random variables $C_1, C_2, \ldots$ and $M_1, M_2, \ldots$ correspond to a graph with the Ewens distribution of $\alpha_1, \ldots, \alpha_n$ with parameter $\theta$, then as $n \to \infty$, the distributions of the normalized random variables converge, respectively, to the GEM distribution and the Poisson–Dirichlet distribution with the same parameter $\theta$ [67] (see also [139, 140, 141]).

Section 1.7 contains the results on unicyclic random graphs obtained in [77]. The analysis of random graphs with components of two types presented in Section 1.8 is also contained in [77]. The idea of considering a graph as a combination of connected components of certain types can be attributed to Agadzhanyan [1, 2].
2

Evolution of random graphs

2.1. Subcritical graphs

This chapter deals with several models of random graphs with $n$ labeled vertices and $T$ edges as $n, T \to \infty$. The parameter $\theta = 2T/n$ plays a decisive role in the behavior of random graphs, and it may be interpreted as time in the evolution of the graphs. It turns out that many of the characteristics change their behavior abruptly near the point $\theta = 1$. It is convenient to distinguish three domains of the variation of the parameter $\theta$. We say that a random graph is subcritical if $n, T \to \infty$ in such a way that $(1-\theta)^3 n \to \infty$. Thus, for a subcritical graph, $\theta$ may tend to unity, but not too fast. A critical graph is characterized by the conditions that $n, T \to \infty$ and $(1-\theta)^3 n$ tends to a constant. And, finally, a graph is supercritical if $n, T \to \infty$ and $(1-\theta)^3 n \to -\infty$.
In this section we consider three sets of graphs. Let $\mathcal{G}^{(1)}_{n,T}$ be the set of all graphs with $n$ labeled vertices and $T$ edges in which loops and multiple edges are allowed, provided each vertex may have no more than one loop and each pair of vertices may be connected by no more than two edges. Let $\mathcal{G}^{(2)}_{n,T}$ be the set of all graphs with $n$ labeled vertices and $T$ edges that have no loops; however, each edge may occur twice, so that each pair of vertices may be connected by no more than two edges. And, finally, let $\mathcal{G}^{(3)}_{n,T}$ be the set of all graphs with $n$ labeled vertices and $T$ edges that have neither loops nor multiple edges. Denote the number of graphs in $\mathcal{G}^{(i)}_{n,T}$ by $g^{(i)}_{n,T}$, $i = 1, 2, 3$. We introduce the uniform distribution on $\mathcal{G}^{(i)}_{n,T}$, $i = 1, 2, 3$, assigning equal probabilities to all elements of the corresponding set, and denote by $G^{(i)}_{n,T}$ a random graph such that

$$P\{G^{(i)}_{n,T} = G\} = \frac{1}{g^{(i)}_{n,T}}$$

for any $G \in \mathcal{G}^{(i)}_{n,T}$, $i = 1, 2, 3$.
Recall that in Section 1.8 we considered the sets $\mathcal{A}^{(i)}_{n,T}$, $i = 1, 2, 3$, of all graphs with $n$ labeled vertices and $T$ edges with components of two types: trees and unicyclic components. In $\mathcal{A}^{(3)}_{n,T}$, the unicyclic components have neither loops nor multiple edges; in $\mathcal{A}^{(2)}_{n,T}$, the unicyclic components have no loops, but may contain cycles of length 2; and in $\mathcal{A}^{(1)}_{n,T}$, the unicyclic components may contain loops and cycles of length 2. Thus,

$$\mathcal{A}^{(i)}_{n,T} \subseteq \mathcal{G}^{(i)}_{n,T},\qquad i = 1, 2, 3.$$

The results of Section 1.8 allow us to describe the limit distributions of various characteristics of the subcritical random graphs $G^{(i)}_{n,T}$, $i = 1, 2, 3$.

Theorem 2.1.1. If $n, T \to \infty$ such that $(1 - 2T/n)^3 n \to \infty$, then for any $i = 1, 2, 3$,

$$P\{G^{(i)}_{n,T} \in \mathcal{A}^{(i)}_{n,T}\} \to 1.$$
Proof. It is clear that

$$P\{G^{(i)}_{n,T} \in \mathcal{A}^{(i)}_{n,T}\} = a^{(i)}_{n,T}\big/g^{(i)}_{n,T}.$$

We need to determine the asymptotics of $g^{(i)}_{n,T}$, $i = 1, 2, 3$, under the conditions of Theorem 2.1.1 to match the results on $a^{(i)}_{n,T}$ from Section 1.8. Recall that if $\theta = 2T/n \to \lambda$, $0 < \lambda \le 1$, then by Theorems 1.8.1 and 1.8.2 and assertion (1.8.29),

$$a^{(i)}_{n,T} = c_i(\lambda)\,\frac{n^{2T}}{2^T T!}\,(1+o(1)) \tag{2.1.1}$$

for any $i = 1, 2, 3$, where

$$c_1(\lambda) = e^{\lambda/2 + \lambda^2/4},\qquad c_2(\lambda) = e^{-\lambda/2 + \lambda^2/4},\qquad c_3(\lambda) = e^{-\lambda/2 - \lambda^2/4}.$$

If $n, T \to \infty$ and $T^3/n^4 \to 0$, then

$$g^{(3)}_{n,T} = \binom{n(n-1)/2}{T} = \frac{(n(n-1))^T}{2^T T!}\left(1 - \frac{2}{n(n-1)}\right)\left(1 - \frac{4}{n(n-1)}\right)\cdots\left(1 - \frac{2(T-1)}{n(n-1)}\right) = \frac{n^{2T}}{2^T T!}\,e^{-T/n - T^2/n^2}\,(1+o(1)), \tag{2.1.2}$$
and Theorem 2.1.1 is proved for $i = 3$.

It is clear that each graph from $\mathcal{G}^{(2)}_{n,T}$ can be obtained by a choice of $T$ edges, which is equivalent to an allocation of $T$ particles into $S = \binom{n}{2}$ cells, provided each cell contains no more than two particles. Therefore

$$g^{(2)}_{n,T} = \sum_{t_1 + 2t_2 = T}\binom{S}{t_1}\binom{S - t_1}{t_2},$$

where $t_1$ cells contain exactly one particle and $t_2$ cells contain two particles. Hence,

$$g^{(2)}_{n,T} = \sum_{t_1 + 2t_2 = T}\frac{S!}{t_1!\,t_2!\,(S - t_1 - t_2)!} = \sum_{t=0}^{\lfloor T/2\rfloor}\frac{S!}{t!\,(T - 2t)!\,(S - T + t)!}.$$

For any fixed $t$,

$$\frac{S!}{t!\,(T - 2t)!\,(S - T + t)!} = \frac{S^T e^{-T^2/(2S)}}{T!}\cdot\frac{1}{t!}\left(\frac{2T^2}{n^2}\right)^{t}\,(1+o(1)).$$

Therefore, under the conditions of Theorem 2.1.1,

$$g^{(2)}_{n,T} = \frac{S^T e^{-T^2/(2S)}}{T!}\,e^{2T^2/n^2}\,(1+o(1)) = \frac{n^{2T}}{2^T T!}\,e^{-T/n + T^2/n^2}\,(1+o(1)), \tag{2.1.3}$$

since $S^T = (n(n-1)/2)^T = (n^{2T}/2^T)\,e^{-T/n}\,(1+o(1))$ and $e^{-T^2/(2S)} = e^{-T^2/n^2}\,(1+o(1))$.

Similarly, each graph from $\mathcal{G}^{(1)}_{n,T}$ can be obtained by a choice of $T$ edges, which is equivalent to an allocation of $T$ particles into $n + \binom{n}{2}$ cells, provided that no more than two particles are allocated to each of the $\binom{n}{2}$ cells corresponding to pairs of vertices and no more than one particle is put into each of the $n$ cells corresponding to loops. Therefore, putting $S = \binom{n}{2}$ yields

$$g^{(1)}_{n,T} = \sum_{t_1 + t_2 + 2t_3 = T}\binom{n}{t_1}\binom{S}{t_2}\binom{S - t_2}{t_3},$$

and a computation similar to the previous one gives

$$g^{(1)}_{n,T} = \frac{n^{2T}}{2^T T!}\,e^{T/n + T^2/n^2}\,(1+o(1)). \tag{2.1.4}$$

Then, by comparing (2.1.1) with (2.1.2), (2.1.3), and (2.1.4), we obtain the assertion of the theorem.
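The asymptotics (2.1.2) can be compared with the exact count $\binom{n(n-1)/2}{T}$ by working with logarithms. The Python sketch below is our illustration (function names of our choosing), applied for values of $T$ of order $n/2$, where the condition $T^3/n^4 \to 0$ holds.

```python
from math import lgamma, log

def log_exact_g3(n, T):
    # log of binomial(n(n-1)/2, T): graphs with n labeled vertices and
    # T edges, with neither loops nor multiple edges
    S = n * (n - 1) // 2
    return lgamma(S + 1) - lgamma(T + 1) - lgamma(S - T + 1)

def log_asym_g3(n, T):
    # log of n^(2T)/(2^T T!) * exp(-T/n - T^2/n^2), formula (2.1.2)
    return 2 * T * log(n) - T * log(2.0) - lgamma(T + 1) - T / n - (T / n)**2
```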
According to Theorem 2.1.1, each of the subcritical graphs $G^{(i)}_{n,T}$, $i = 1, 2, 3$, consists of trees and unicyclic components and, with probability tending to 1, does not contain more complicated components. Given a random graph $G$, denote by $\mu_r(G)$ the number of trees of size $r$, by $\eta(G)$ the maximum size of the trees, by $\omega(G)$ the total number of vertices in the unicyclic components, by $\kappa(G)$ the number of unicyclic components, by $\beta(G)$ the maximum size of the unicyclic components, and by $\alpha(G)$ the maximum size of the components. Let $\gamma(G^{(i)}_{n,T})$ be a characteristic of the random graph $G^{(i)}_{n,T}$ and let $\gamma^{(i)}_{n,T}$ be the corresponding characteristic of the random graph from $\mathcal{A}^{(i)}_{n,T}$. Then, by the formula of total probability,

$$P\{\gamma(G^{(i)}_{n,T}) < x\} = P\{G^{(i)}_{n,T} \in \mathcal{A}^{(i)}_{n,T}\}\,P\{\gamma^{(i)}_{n,T} < x\} + P\{G^{(i)}_{n,T} \notin \mathcal{A}^{(i)}_{n,T}\}\,P\{\gamma(G^{(i)}_{n,T}) < x \mid G^{(i)}_{n,T} \notin \mathcal{A}^{(i)}_{n,T}\}$$

for any $x$. By Theorem 2.1.1,

$$P\{G^{(i)}_{n,T} \in \mathcal{A}^{(i)}_{n,T}\} \to 1$$

if the graph $G^{(i)}_{n,T}$ is subcritical. Therefore, for any characteristic $\gamma(G^{(i)}_{n,T})$ of the subcritical graph,

$$P\{\gamma(G^{(i)}_{n,T}) < x\} = P\{\gamma^{(i)}_{n,T} < x\}\,(1+o(1)) + o(1), \tag{2.1.5}$$

and if $P\{\gamma^{(i)}_{n,T} < x\}$ tends to a limit, then the probability $P\{\gamma(G^{(i)}_{n,T}) < x\}$ has the same limit. Thus, many of the results of Section 1.8 can be reformulated for the corresponding characteristics of the random graphs $G^{(i)}_{n,T}$, $i = 1, 2, 3$. If $\gamma(G^{(i)}_{n,T})$ is an integer-valued characteristic, then for any fixed integer $k$,

$$P\{\gamma(G^{(i)}_{n,T}) = k\} = P\{\gamma^{(i)}_{n,T} = k\}\,(1+o(1)) + o(1), \tag{2.1.6}$$
and if $P\{\gamma^{(i)}_{n,T} = k\}$ has a nonzero limit, then relation (2.1.6) allows us to obtain the limit of the probability $P\{\gamma(G^{(i)}_{n,T}) = k\}$.

Theorem 2.1.2. If $n, T \to \infty$ such that $T/n \to 0$, then for any $i = 1, 2, 3$,

$$P\{\omega(G^{(i)}_{n,T}) = 0\} \to 1.$$

If $n, T \to \infty$ such that $\varepsilon = 1 - 2T/n \to 0$ and $\varepsilon^3 n \to \infty$, then for any fixed $x > 0$ and any $i = 1, 2, 3$,

$$P\{\omega(G^{(i)}_{n,T})\,\varepsilon^2/2 < x\} \to \frac{1}{\Gamma(1/4)}\int_0^{x} y^{-3/4}e^{-y}\,dy.$$

Proof. The assertions of the theorem follow from (2.1.5), (2.1.6), and Theorems 1.8.1 and 1.8.5.
Theorem 2.1.3. If the graph $G^{(i)}_{n,T}$ is subcritical, $i = 1, 2, 3$, and $r = r(n, T) \ge 3$ varies such that $Np_r(\theta) \to \infty$, then for any fixed $x$,

$$P\left\{\frac{\mu_r(G^{(i)}_{n,T}) - Np_r(\theta)}{\sigma_r(\theta)\sqrt{N}} < x\right\} \to \frac{1}{\sqrt{2\pi}}\int_{-\infty}^{x} e^{-u^2/2}\,du,$$

where $N = n - T$, $\theta = 2T/n$,

$$p_k(\theta) = \frac{2\,k^{k-2}\,\theta^{k-1}e^{-k\theta}}{k!\,(2-\theta)},\qquad k = 1, 2, \ldots,$$

$$\sigma_r^2(\theta) = p_r(\theta)\left(1 - p_r(\theta) - \frac{(r - \mu)^2\,p_r(\theta)}{\sigma^2}\right),\qquad \mu = \frac{2}{2-\theta},\qquad \sigma^2 = \frac{2\theta}{(1-\theta)(2-\theta)^2}.$$

If $r = r(n, T) \ge 3$ varies such that $Np_r(\theta) \to \lambda$, $0 < \lambda < \infty$, then for any fixed $k = 0, 1, \ldots$,

$$P\{\mu_r(G^{(i)}_{n,T}) = k\} = \frac{\lambda^k e^{-\lambda}}{k!}\,(1+o(1)).$$
Proof. In view of (2.1.5) and (2.1.6), the assertions of the theorem follow from Theorems 1.5.1 and 1.5.2 because, by Theorem 2.1.2, the number $\omega(G^{(i)}_{n,T})$ of vertices in the unicyclic components of a subcritical graph is small compared with the total number of vertices; more precisely, $P\{\omega(G^{(i)}_{n,T}) < n^{2/3}\} \to 1$.

Theorem 2.1.4. If $n, T \to \infty$ such that $T/n \to 0$, $r = r(n, T) \ge 1$, and

$$Np_r(\theta) \to \infty,\qquad Np_{r+1}(\theta) \to \lambda,\qquad 0 \le \lambda < \infty,$$

then for any $i = 1, 2, 3$,

$$P\{\alpha(G^{(i)}_{n,T}) = r\} = P\{\eta(G^{(i)}_{n,T}) = r\} = e^{-\lambda} + o(1),$$
$$P\{\alpha(G^{(i)}_{n,T}) = r + 1\} = P\{\eta(G^{(i)}_{n,T}) = r + 1\} = 1 - e^{-\lambda} + o(1).$$

Proof. In view of (2.1.5) and (2.1.6), the assertions of the theorem follow from Theorem 1.6.1.

Theorem 2.1.5.
If $i = 1, 2, 3$ and $n, T \to \infty$ such that $\theta = 2T/n \to \lambda$, $0 < \lambda < 1$, then for any fixed $k = 0, 1, \ldots$,

$$P\{\kappa(G^{(i)}_{n,T}) = k\} = \frac{A_i^k e^{-A_i}}{k!}\,(1+o(1)),$$

where

$$A_1 = -\frac{1}{2}\log(1-\lambda) + \frac{\lambda}{2} + \frac{\lambda^2}{4},\qquad A_2 = -\frac{1}{2}\log(1-\lambda) - \frac{\lambda}{2} + \frac{\lambda^2}{4},\qquad A_3 = -\frac{1}{2}\log(1-\lambda) - \frac{\lambda}{2} - \frac{\lambda^2}{4}.$$

For any fixed $k = 0, \pm 1, \ldots$,

$$P\{\eta(G^{(i)}_{n,T}) - [a] < k\} = P\{\eta^{(i)}_{n,T} - [a] < k\}\,(1+o(1)) = \exp\bigl\{-(\lambda - 1 - \log\lambda)^{5/2}\,e^{-(k + \{a\})(\lambda - 1 - \log\lambda)}\bigr\}\,(1+o(1)),$$

where

$$a = \frac{\log n - (5/2)\log\log n}{\lambda - 1 - \log\lambda},$$

and $[a]$ and $\{a\}$ are, respectively, the integer and fractional parts of $a$.
Proof. The assertions of the theorem follow from (2.1.5), (2.1.6), and Theorems 1.8.4 and 1.6.2.

Theorem 2.1.6. If $i = 1, 2, 3$ and $n, T \to \infty$ such that $\varepsilon = 1 - 2T/n \to 0$ and $\varepsilon^3 n \to \infty$, then for any fixed $x$,

$$P\left\{\frac{\kappa(G^{(i)}_{n,T}) + \frac{1}{2}\log\varepsilon}{\sqrt{-\frac{1}{2}\log\varepsilon}} < x\right\} \to \frac{1}{\sqrt{2\pi}}\int_{-\infty}^{x} e^{-u^2/2}\,du,$$

and for any fixed $x > 0$,

$$P\{\varepsilon^2\beta(G^{(i)}_{n,T}) \le x\} = \sum_{s=0}^{\infty}\frac{(-1)^s}{4^s\,s!}\,Z_s(x)\,(1+o(1)),$$

where $Z_s(x)$ is defined in Theorem 1.8.7. Finally, for any fixed $z$,

$$P\{\beta_0\,\alpha(G^{(i)}_{n,T}) - u < z\} = P\{\beta_0\,\eta(G^{(i)}_{n,T}) - u < z\}\,(1+o(1)) = e^{-e^{-z}}\,(1+o(1)),$$

where $\beta_0 = -\log(\theta e^{1-\theta})$, $\theta = 2T/n$, and $u$ is the root of the equation

$$\left(\frac{2}{\pi}\right)^{1/2} N\beta_0^{3/2} = u^{5/2}e^{u}. \tag{2.1.7}$$

Proof. The results of the theorem are consequences of (2.1.5), (2.1.6), and Theorems 1.8.6, 1.8.7, 1.8.8, and 1.8.9.
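The tree-size distribution $p_k(\theta)$ of Theorem 2.1.3 is a genuine probability distribution with mean $\mu = 2/(2-\theta)$, which can be confirmed numerically. The Python sketch below is our illustration; it evaluates $p_k(\theta)$ in logarithms to avoid overflow in $k^{k-2}$ and $k!$.

```python
from math import exp, lgamma, log

def p_tree(k, theta):
    # p_k(theta) = 2 k^(k-2) theta^(k-1) e^(-k*theta) / (k! (2 - theta)),
    # the tree-size distribution of Theorem 2.1.3, computed via logarithms.
    logterm = ((k - 2) * log(k) + (k - 1) * log(theta)
               - k * theta - lgamma(k + 1))
    return 2.0 * exp(logterm) / (2.0 - theta)
```

The terms decay like $k^{-5/2}(\theta e^{1-\theta})^k$, so truncating the sums at a few hundred terms is more than enough for $\theta$ bounded away from 1.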
2.2. Critical graphs Recall that a graph with n vertices and T edges is called critical if n, T -* oo such
that s = 1 - 2T/n -+ 0 and sin tends to a constant. We have seen that many of the characteristics of the random graphs i = 1, 2, 3, change their behavior if 0 = 2T/n approaches the value 1. For example, the number of cycles, or the number of unicyclic components x(G(') ), tends to zero in probability if 0 0, A, has the Poisson distribution with parameter A; , i = 1, 2, 3, respectively, if 0 0 < A < 1, where 2
Al
=-Zlog(1-.l)+2+ 4,
A A2 A2=-2log(1-A)-2+ 4, 1
,l ,l2 A3=-2log(1-A)-24, 1
and is asymptotically normal with parameters (-1 log s, 4-1 logs) if s - 0, 4 sin oo. Thus, 0 = 1 is a singular point and one can correctly suppose that the behavior of the graphs near this point is interesting but difficult to investigate. Indeed, not much is known about the properties of critical graphs. We present here only one assertion about this behavior. Recall that A(' A. is the set of graphs with n labeled vertices and T edges that consists of trees and unicyclic components with neither loops nor multiple edges
for i = 3, without loops and with cycles of length 2 allowed for i = 2, and with cycles of lengths 1 and 2 allowed for i = 1. Theorem 2.2.1. If n, T -+ oo such that sn1/3 -+ 2 3-2/3v, where v is a con-
stant, then for any random graph G, ,, i = 1, 2, 3,
j
PIG(`) n E A(`) n, T 1ll
-
f3 3n
/^I,(1/4)
e4v3/27p(v)(1 +o(1)),
where 00
p(v) = J
Y-3/4 p(- v - y; 3/2, -1) dy
0
and p(y; 3/2, -1) is the density of the stable law, introduced in Theorem 1.4.2, with the characteristic function
f(t) = exp { -
ItI3/2eint/(41tl)I.
Proof. It is clear that

$$P\bigl\{G^{(i)}_{n,T}\in A^{(i)}_{n,T}\bigr\} = \frac{a^{(i)}_{n,T}}{g^{(i)}_{n,T}},$$

where $a^{(i)}_{n,T}$ is the number of graphs in $A^{(i)}_{n,T}$, and $g^{(i)}_{n,T}$ is the number of graphs in $G^{(i)}_{n,T}$, $i = 1, 2, 3$. In accordance with Theorem 1.8.10,

$$a^{(i)}_{n,T} = \frac{c_i\,n!\,e^{n}}{2^{N}\,N!\,\sqrt{N}}\,p(v)\,(1+o(1)),$$

where $N = n - T$ and

$$c_1 = \frac{\sqrt{3}\,e^{3/4}}{2\sqrt{2}\,\Gamma(1/4)},\qquad
c_2 = \frac{\sqrt{3}\,e^{-1/4}}{2\sqrt{2}\,\Gamma(1/4)},\qquad
c_3 = \frac{\sqrt{3}\,e^{-3/4}}{2\sqrt{2}\,\Gamma(1/4)}.$$
In the previous section, we proved that

$$g^{(i)}_{n,T} = \frac{n^{2T}}{2^{T}\,T!}\,c_i(1)\,(1+o(1)),$$

where $c_1(1) = e^{3/4}$, $c_2(1) = e^{-1/4}$, $c_3(1) = e^{-3/4}$. Since $T = n(1-\varepsilon)/2$ and $\varepsilon^3 n \to 8v^3/9$, we easily find

$$\frac{n!\,e^{n}\,2^{T}\,T!}{2^{N}\,N!\,\sqrt{N}\,n^{2T}} = 2\sqrt{\pi}\,e^{4v^3/27}\,(1+o(1)),$$

and, consequently,

$$\frac{a^{(i)}_{n,T}}{g^{(i)}_{n,T}} = \frac{\sqrt{3\pi}}{\sqrt{2}\,\Gamma(1/4)}\,e^{4v^3/27}\,p(v)\,(1+o(1)).$$
The function $p(v)$ can be represented by a convergent power series. The function

$$g(v) = p(-v) = \int_0^{\infty}y^{-3/4}\,p(v-y;\,3/2,\,-1)\,dy$$

can be thought of as the convolution of the function

$$g_1(y) = \begin{cases} y^{-3/4}, & y > 0,\\ 0, & y \le 0,\end{cases}$$

and the function $g_2(y) = p(y;\,3/2,\,-1)$, so that

$$g(v) = \int_{-\infty}^{\infty}g_1(y)\,g_2(v-y)\,dy.$$

Therefore the Fourier transform $\hat g(t)$ of the function $g(v)$ is the product of the Fourier transforms of the functions $g_1(y)$ and $g_2(y)$. The Fourier transform $\hat g_1(t)$ of the function $g_1(y)$ is

$$\hat g_1(t) = \frac{\sqrt{2}\,\pi\,e^{i\pi t/(8|t|)}}{\Gamma(3/4)\,|t|^{1/4}},$$
and the Fourier transform $\hat g_2(t)$ of the function $g_2(y) = p(y;\,3/2,\,-1)$ is the characteristic function of this density:

$$\hat g_2(t) = \exp\bigl\{-|t|^{3/2}e^{i\pi t/(4|t|)}\bigr\}.$$

Thus,

$$\hat g(t) = \frac{\sqrt{2}\,\pi\,e^{i\pi t/(8|t|)}}{\Gamma(3/4)\,|t|^{1/4}}\,\exp\bigl\{-|t|^{3/2}e^{i\pi t/(4|t|)}\bigr\}.$$

By the inversion formula,

$$g(v) = \frac{1}{2\pi}\int_{-\infty}^{\infty}e^{-itv}\,\hat g(t)\,dt
= \frac{1}{\sqrt{2}\,\Gamma(3/4)}\int_{-\infty}^{\infty}e^{-itv}\,|t|^{-1/4}e^{i\pi t/(8|t|)}\exp\bigl\{-|t|^{3/2}e^{i\pi t/(4|t|)}\bigr\}\,dt,$$

and therefore, under the hypotheses of Theorem 2.2.1,

$$P\bigl\{G^{(i)}_{n,T}\in A^{(i)}_{n,T}\bigr\} = \frac{\sqrt{3\pi}\,e^{4v^3/27}}{2\,\Gamma(1/4)\,\Gamma(3/4)}\,h(v)\,(1+o(1)),$$

where

$$h(v) = \int_{-\infty}^{\infty}e^{itv}\,|t|^{-1/4}e^{i\pi t/(8|t|)}\exp\bigl\{-|t|^{3/2}e^{i\pi t/(4|t|)}\bigr\}\,dt.$$

Since $\Gamma(1/4)\,\Gamma(3/4) = \sqrt{2}\,\pi$, we obtain

$$P\bigl\{G^{(i)}_{n,T}\in A^{(i)}_{n,T}\bigr\} = \frac{\sqrt{3}\,e^{4v^3/27}}{2\sqrt{2\pi}}\,h(v)\,(1+o(1)).\tag{2.2.1}$$
The function h (v) can be represented by a convergent power series.
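The integral $h(v)$ can also be evaluated numerically and compared with its power series. The sketch below is ours, not the book's: it assumes the integrand of (2.2.1) as reconstructed above, uses the fact that the $t<0$ half of the integral is the complex conjugate of the $t>0$ half, and substitutes $t = u^{4/3}$ to remove the $t^{-1/4}$ singularity before applying a midpoint rule.

```python
import math

def h(v, du=0.001, umax=8.0):
    # h(v) = Re ∫_{-∞}^{∞} e^{itv} |t|^{-1/4} e^{iπt/(8|t|)}
    #        exp{-|t|^{3/2} e^{iπt/(4|t|)}} dt.
    # Combining t>0 with its conjugate t<0 half and substituting t = u^{4/3}
    # (so t^{3/2} = u² and t^{-1/4} dt = (4/3) du) gives
    #   h(v) = (8/3) ∫_0^∞ e^{-u²/√2} cos(v u^{4/3} + π/8 - u²/√2) du.
    s, u, r2 = 0.0, du / 2, math.sqrt(2)
    while u < umax:
        s += math.exp(-u * u / r2) * math.cos(v * u ** (4 / 3) + math.pi / 8 - u * u / r2)
        u += du
    return (8 / 3) * s * du

def h_series(v, terms=60):
    # the power series h(v) = (4/3) Σ_k v^k/k! Γ(2k/3 + 1/2) cos(πk/3)
    return (4 / 3) * sum(v ** k / math.factorial(k)
                         * math.gamma(2 * k / 3 + 0.5) * math.cos(math.pi * k / 3)
                         for k in range(terms))
```

At $v = 0$ both evaluations give $h(0) = \tfrac43\Gamma(1/2) \approx 2.3633$, and they agree closely at, for example, $v = \pm 1$.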
Theorem 2.2.2. If $n, T \to \infty$ such that $\varepsilon n^{1/3} \to 2\cdot3^{-2/3}v$, where $v$ is a constant, then for any random graph $G^{(i)}_{n,T}$, $i = 1, 2, 3$,

$$P\bigl\{G^{(i)}_{n,T}\in A^{(i)}_{n,T}\bigr\} = P(v)\,(1+o(1)),$$

where

$$P(v) = \sqrt{\frac{2}{3\pi}}\;e^{4v^3/27}\sum_{k=0}^{\infty}\frac{v^{k}}{k!}\,\Gamma\Bigl(\frac{2k}{3}+\frac12\Bigr)\cos\frac{\pi k}{3}.$$

Proof. Let us represent $h(v)$ by a power series in $v$. Since the left-hand side of (2.2.1) is real,

$$h(v) = \Re\int_{-\infty}^{\infty}e^{itv}\,|t|^{-1/4}e^{i\pi t/(8|t|)}\exp\bigl\{-|t|^{3/2}e^{i\pi t/(4|t|)}\bigr\}\,dt.$$
Consider first the integral

$$h_1(v) = \int_0^{\infty}e^{itv}\,t^{-1/4}\,e^{i\pi/8}\exp\bigl\{-t^{3/2}e^{i\pi/4}\bigr\}\,dt.$$

By expanding $e^{itv}$ we obtain

$$h_1(v) = e^{i\pi/8}\sum_{k=0}^{\infty}\frac{(iv)^{k}}{k!}\int_0^{\infty}t^{k-1/4}\exp\bigl\{-t^{3/2}e^{i\pi/4}\bigr\}\,dt.$$

After the change of variables $t^{3/2}e^{i\pi/4} = z$, we obtain

$$h_1(v) = \frac23\sum_{k=0}^{\infty}\frac{v^{k}}{k!}\,\Gamma\Bigl(\frac{2k}{3}+\frac12\Bigr)\exp\Bigl\{\frac{i\pi k}{3}\Bigr\}.$$

Therefore

$$\Re h_1(v) = \frac23\sum_{k=0}^{\infty}\frac{v^{k}}{k!}\,\Gamma\Bigl(\frac{2k}{3}+\frac12\Bigr)\cos\frac{\pi k}{3}.\tag{2.2.2}$$
Similarly, for

$$h_2(v) = \int_{-\infty}^{0}e^{itv}\,|t|^{-1/4}e^{-i\pi/8}\exp\bigl\{-|t|^{3/2}e^{-i\pi/4}\bigr\}\,dt
= \int_0^{\infty}e^{-itv}\,t^{-1/4}e^{-i\pi/8}\exp\bigl\{-t^{3/2}e^{-i\pi/4}\bigr\}\,dt,$$

we obtain

$$\Re h_2(v) = \frac23\sum_{k=0}^{\infty}\frac{v^{k}}{k!}\,\Gamma\Bigl(\frac{2k}{3}+\frac12\Bigr)\cos\frac{\pi k}{3}.\tag{2.2.3}$$
The assertion of the theorem follows from (2.2.1), (2.2.2), and (2.2.3).
Theorem 2.2.2 allows us to calculate the limit values of $P\{G^{(i)}_{n,T}\in A^{(i)}_{n,T}\}$. For example,

$$P(0) = \sqrt{2/3} \approx 0.8165.$$

Some values of $P(v)$ are given in Table 2.1.
2.3. Random graphs with independent edges

When we were determining the number of graphs in the classes $G^{(i)}_{n,T}$, $i = 1, 2, 3$, in Section 2.1, we associated each of the classes with the corresponding equiprobable scheme of allocating particles into cells. It is easily seen from these correspondences that the realizations of each of the random graphs $G^{(i)}_{n,T}$, $i = 1, 2, 3$, could be obtained by a sequential allocation of particles, but these random allocations are dependent. For example, if a pair of vertices has been connected in the random
Table 2.1. Values of P(v)

      v     P(v)   |    v     P(v)   |    v     P(v)
    -3.0   0.0053  |  -1.0   0.4919  |   1.2   0.9563
    -2.8   0.0118  |  -0.8   0.5727  |   1.4   0.9653
    -2.6   0.0239  |  -0.6   0.6470  |   1.6   0.9722
    -2.4   0.0443  |  -0.4   0.7128  |   1.8   0.9776
    -2.2   0.0755  |  -0.2   0.7693  |   2.0   0.9819
    -2.0   0.1196  |   0.2   0.8551  |   2.2   0.9852
    -1.8   0.1768  |   0.4   0.8860  |   2.4   0.9878
    -1.6   0.2461  |   0.6   0.9105  |   2.6   0.9899
    -1.4   0.3244  |   0.8   0.9297  |   2.8   0.9915
    -1.2   0.4078  |   1.0   0.9447  |   3.0   0.9929
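The entries of Table 2.1 can be reproduced directly from the series of Theorem 2.2.2. The following check is ours, not the book's, and needs only the standard library:

```python
import math

def P(v, terms=60):
    # P(v) = sqrt(2/(3π)) e^{4v³/27} Σ_{k≥0} v^k/k! Γ(2k/3 + 1/2) cos(πk/3)
    s = sum(v ** k / math.factorial(k)
            * math.gamma(2 * k / 3 + 0.5) * math.cos(math.pi * k / 3)
            for k in range(terms))
    return math.sqrt(2 / (3 * math.pi)) * math.exp(4 * v ** 3 / 27) * s

print(round(P(0.0), 4), round(P(-1.0), 4), round(P(1.0), 4))
# → 0.8165 0.4919 0.9447, matching P(0) = sqrt(2/3) and Table 2.1
```

The series converges for every fixed $v$ because $\Gamma(2k/3 + 1/2)$ grows more slowly than $k!$.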
graph after allocating some of the edges, then the outcomes of all subsequent allocations cannot be the edges connecting these two vertices.

The classes of random graphs whose edges are independent seem to be easier to investigate by the methods of probability theory. The best-known random graph with this property is $G_{n,p}$, with $n$ vertices, such that each of the $\binom{n}{2}$ possible edges belongs to the edge set of $G_{n,p}$ with probability $p$ independently of the behavior of the other edges. This graph has a random number of edges with the binomial distribution with $\binom{n}{2}$ trials and probability of success $p$.

In this section, we consider the random graph $G_{n,T}$ with $n$ vertices labeled $1, \ldots, n$ and $T$ edges that can be obtained by $T$ independent trials. In each trial, the loop at any vertex $i$ occurs with probability $n^{-2}$, and the edge connecting the vertices $i$ and $j$, $i \ne j$, occurs with probability $2n^{-2}$. In other words, if the edge set of $G_{n,T}$ consists of the $T$ edges $(i(1), j(1)), \ldots, (i(T), j(T))$, then $i(1), j(1), \ldots, i(T), j(T)$ are independent identically distributed random variables taking the values $1, 2, \ldots, n$ with equal probabilities. It is clear that the realizations of the random graph $G_{n,T}$ are not equiprobable. For example, for $n = 2$ and $T = 1$, the graphs with a loop and an isolated vertex have probability 1/4 each, and the connected graph has probability 1/2. Nevertheless, this model has some advantages and is conducive to treatment by probabilistic methods.
Since i(1), j(1), ... , i(T), j(T) are independent identically distributed random variables, we can associate to the random graph G,,,T the classical scheme of allocating particles where 2T particles are allocated into n cells such that each particle falls into any of n cells with probability 1/n independently of the allocations of the other particles. By using this relationship, we can, for example, easily find the distribution of the number of loops in Gn,T. Indeed, we have T trials, corresponding to T edges, and in each of these trials a loop appears with
probability $1/n$. Thus, the total number of loops $a_1$ in $G_{n,T}$ has the binomial distribution with parameters $(T, 1/n)$. The mean number of loops is $Ea_1 = T/n$. If $2T/n \to \lambda$, $0 < \lambda < \infty$, then the Poisson distribution with parameter $\lambda/2$ is the limit distribution for $a_1$.

Under the condition $a_1 = m$, the other edges may be considered as the result of $T - m$ independent allocations into $\binom{n}{2}$ cells corresponding to the $\binom{n}{2}$ possible edges of the complete graph with $n$ vertices. Therefore, given $a_1 = m$, the number $a_2$ of cycles of length 2 in $G_{n,T}$ can be thought of as the number of cells with exactly two particles in the classical (equiprobable) scheme of allocation of $T - m$ particles into $\binom{n}{2}$ cells. The classical scheme of allocation has been well studied. In particular, if $n, T \to \infty$ such that $2T/n \to \lambda$, $0 < \lambda < \infty$, then the distribution of the number of cells occupied by exactly two particles each converges to the Poisson distribution with parameter $\lambda^2/4$. Since the limit distribution does not depend on $m$ for $m = o(n)$, averaging over the distribution of $a_1$ shows that $a_1$ and $a_2$ are asymptotically independent and their distributions approach the Poisson distributions.

Theorem 2.3.1. If $n, T \to \infty$ such that $2T/n \to \lambda$, $0 < \lambda < \infty$, then for any fixed nonnegative integers $k_1$ and $k_2$,

$$P\{a_1 = k_1,\ a_2 = k_2\} = \frac{(\lambda/2)^{k_1}}{k_1!}\,\frac{(\lambda^2/4)^{k_2}}{k_2!}\,e^{-\lambda/2-\lambda^2/4}\,(1+o(1)).$$
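Theorem 2.3.1 is easy to probe by simulation. The sketch below is ours: it samples the $T$-independent-trials graph $G_{n,T}$ just described, with edge endpoints drawn uniformly, and counts loops and pairs of parallel edges.

```python
import random
from collections import Counter

def sample_edges(n, T, rng):
    # one realization of G_{n,T}: endpoints i.i.d. uniform on {0, ..., n-1}
    return [(rng.randrange(n), rng.randrange(n)) for _ in range(T)]

def loops_and_parallel_pairs(edges):
    a1 = sum(1 for u, v in edges if u == v)                # loops
    mult = Counter(frozenset(e) for e in edges if e[0] != e[1])
    a2 = sum(m * (m - 1) // 2 for m in mult.values())      # pairs of parallel edges
    return a1, a2

rng = random.Random(1)
n, T, reps = 200, 100, 4000                                # λ = 2T/n = 1
s1 = s2 = 0
for _ in range(reps):
    a1, a2 = loops_and_parallel_pairs(sample_edges(n, T, rng))
    s1 += a1
    s2 += a2
# empirical means should be near λ/2 = 0.5 and λ²/4 = 0.25
```

With these (illustrative) parameters, the empirical means land within a few standard errors of the Poisson parameters $\lambda/2$ and $\lambda^2/4$.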
Because the edges of $G_{n,T}$ are independent, we can apply direct probabilistic approaches to investigations of the structure of $G_{n,T}$.

Theorem 2.3.2. If $n, T \to \infty$ such that $T/n \to 0$, then in $G_{n,T}$, with probability tending to 1, there are no cycles and all the components are trees.

Proof. Denote the number of cycles of length $r$ with $r$ distinct vertices by $a_r$, and let $v(G_{n,T}) = a_1 + \cdots + a_n$ be the total number of cycles considered as induced subgraphs of $G_{n,T}$. We can represent $a_r$ as a sum of indicators. The edges of $G_{n,T}$ appear sequentially in $T$ trials. We assign the numbers $1, 2, \ldots, T$ to the trials and arrange (in some order) all $\binom{T}{r}$ possible subsets of cardinality $r$ of the trial numbers. We define the random variable $\xi_i$ to be equal to 1 if the subset of trial numbers labeled with $i$ forms a cycle in $G_{n,T}$, and $\xi_i = 0$ otherwise. It is clear that

$$a_r = \xi_1 + \cdots + \xi_{\binom{T}{r}}.$$

In turn, each of the random variables $\xi_1, \ldots, \xi_{\binom{T}{r}}$ can be represented as a sum of indicators. The cycle corresponding to the subset with label $i$ can be constructed from $r$ different vertices and $r$ different edges. There exist $\binom{n}{r}$ possibilities to choose these $r$ vertices and $(r-1)!/2$ possibilities to construct a cycle from these $r$ vertices for $r \ge 3$. Each construction fixes $r$ edges that must occur. These $r$ edges
can occur at $r$ fixed places of the subset labeled $i$, and there exist $r!$ possibilities to assign these $r$ edges to $r$ places. Thus the event $\{\xi_i = 1\}$ can be realized by one of the $\binom{n}{r}(r-1)!\,r!/2$ variants. For $r \ge 3$, each of these variants has probability $(2/n^2)^r$. Thus,

$$E a_r = \binom{T}{r}\binom{n}{r}\,\frac{(r-1)!\,r!}{2}\left(\frac{2}{n^2}\right)^{r}.\tag{2.3.1}$$

It is not difficult to check that this formula is also valid for $r = 1$ and $r = 2$. It follows from (2.3.1) that

$$E a_r \le \frac{T^r}{r!}\,\frac{n^r}{r!}\,\frac{(r-1)!\,r!}{2}\left(\frac{2}{n^2}\right)^{r} = \frac{1}{2r}\left(\frac{2T}{n}\right)^{r}.$$

Therefore,

$$E v(G_{n,T}) = \sum_{r=1}^{n}E a_r$$

has the upper bound

$$E v(G_{n,T}) \le \sum_{r=1}^{n}\frac{1}{2r}\left(\frac{2T}{n}\right)^{r}.$$
Under the conditions of the theorem, Ev(Gn,T) tends to zero and the number of cycles in Gn,T is zero with probability approaching 1.
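Formula (2.3.1) and the bound that follows it are easy to evaluate numerically. The following check is ours:

```python
from math import comb, factorial

def expected_cycles(n, T, r):
    # E a_r = C(T,r) C(n,r) (r-1)! r! / 2 · (2/n²)^r   — formula (2.3.1)
    return comb(T, r) * comb(n, r) * factorial(r - 1) * factorial(r) / 2 * (2 / n ** 2) ** r

def cycle_bound(n, T, r):
    # the bound E a_r ≤ (2T/n)^r / (2r)
    return (2 * T / n) ** r / (2 * r)

n, T = 1000, 400
for r in (1, 2, 3, 6):
    assert expected_cycles(n, T, r) <= cycle_bound(n, T, r) * (1 + 1e-9)

# when T/n → 0, the bound Σ_r (2T/n)^r/(2r) forces E v(G_{n,T}) → 0:
small = sum(cycle_bound(10 ** 6, 10 ** 3, r) for r in range(1, 30))
```

For $r = 1$ both sides equal $T/n$, the mean number of loops, which matches the binomial computation above.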
We denote by $A_{n,T}$ the set of all graphs with $n$ labeled vertices and $T$ edges whose components are trees and unicyclic components. Note that loops and cycles of length 2 are permitted. As before, $\theta = 2T/n$, $\varepsilon = 1 - 2T/n$.

Theorem 2.3.3. If $n, T \to \infty$ such that $\varepsilon^3 n \to \infty$, then

$$P\{G_{n,T}\notin A_{n,T}\} \le \frac{4}{\varepsilon^{3}n}.$$

Proof. We have to prove that, under the conditions of the theorem, the graph $G_{n,T}$ has a component with more than one cycle with probability less than $4/(\varepsilon^3 n)$. If in $G_{n,T}$ there exists such a component, then in $G_{n,T}$ there either exists a subgraph that consists of two cycles connected by a chain (a pince-nez) or there exist two cycles that have a common sequence of edges (a cycle with a bridge). We use $\xi^{(t)}_{r,s}$ to denote the number of subgraphs of $G_{n,T}$ that consist of cycles of lengths $r$ and $s$ connected by a chain of $t$ edges, and denote by $\gamma^{(t)}_{r}$ the number of subgraphs of $G_{n,T}$ that consist of a cycle of length $r$ with two of its vertices connected by a sequence of $t$ edges. To prove the assertion of the theorem, it is sufficient to show that the
mean number of such subgraphs tends to zero. It is clear that

$$P\{G_{n,T}\notin A_{n,T}\} = P\Bigl\{\sum_{r,s,t}\xi^{(t)}_{r,s}+\sum_{r,t}\gamma^{(t)}_{r} > 0\Bigr\} \le \sum_{r,s,t}E\xi^{(t)}_{r,s}+\sum_{r,t}E\gamma^{(t)}_{r}.$$

By reasoning in the same way as in the proof of formula (2.3.1), we obtain the estimates

$$E\gamma^{(t)}_{r} \le \binom{T}{r+t}\binom{n}{r+t-1}\,\frac{(r+t-1)!\,(r+t)!}{2}\left(\frac{2}{n^{2}}\right)^{r+t} \le \frac{2}{n}\left(\frac{2T}{n}\right)^{r+t},$$

$$E\xi^{(t)}_{r,s} \le \binom{T}{r+s+t}\binom{n}{r+s+t-1}\,\frac{(r+s+t-1)!\,(r+s+t)!}{2}\left(\frac{2}{n^{2}}\right)^{r+s+t} \le \frac{2}{n}\left(\frac{2T}{n}\right)^{r+s+t}.$$

Thus, the mathematical expectation of the total number of pince-nez and cycles with a bridge can be estimated as follows:

$$\sum_{r,s,t}E\xi^{(t)}_{r,s}+\sum_{r,t}E\gamma^{(t)}_{r} \le \frac{2}{n}\sum_{r,s,t}\left(\frac{2T}{n}\right)^{r+s+t}+\frac{2}{n}\sum_{r,t}\left(\frac{2T}{n}\right)^{r+t} \le \frac{4}{n(1-2T/n)^{3}}.$$
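The event bounded in Theorem 2.3.3 — the appearance of a component with more than one independent cycle — can also be estimated by simulation. The union–find sketch below is ours: it tracks, for every component, its vertex count and its edge count (with multiplicity); a component is complex precisely when it has more edges than vertices.

```python
import random

def has_complex_component(n, T, rng):
    # sample G_{n,T} (T i.i.d. edges, endpoints uniform) and report whether
    # some component has more edges than vertices (cyclomatic number ≥ 2)
    parent = list(range(n))
    size = [1] * n
    edges = [0] * n                       # edge count per component root

    def find(x):
        while parent[x] != x:
            parent[x] = parent[parent[x]]
            x = parent[x]
        return x

    for _ in range(T):
        u, v = rng.randrange(n), rng.randrange(n)
        ru, rv = find(u), find(v)
        if ru == rv:
            edges[ru] += 1                # loop, parallel edge, or cycle-closer
        else:
            if size[ru] < size[rv]:
                ru, rv = rv, ru
            parent[rv] = ru
            size[ru] += size[rv]
            edges[ru] += edges[rv] + 1
    return any(parent[i] == i and edges[i] > size[i] for i in range(n))

rng = random.Random(5)
hits = sum(has_complex_component(1000, 300, rng) for _ in range(200))
# here ε = 0.4, so Theorem 2.3.3 gives the bound 4/(ε³n) = 0.0625
```

The empirical frequency with these illustrative parameters is well below the theorem's bound.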
Theorem 2.3.4. If $n, T \to \infty$ such that $\theta = 2T/n \to \lambda$, $0 < \lambda < 1$, then the distribution of the number of cycles $v(G_{n,T})$ in $G_{n,T}$ converges to the Poisson distribution with parameter

$$\Lambda = -\frac12\log(1-\lambda).$$

Proof. In view of Theorems 2.3.1 and 2.3.3, we can reduce the proof to the application of Theorem 2.1.5 concerning the random graph $G^{(3)}_{n,T}$ without loops and multiple edges. Indeed, by the formula of total probability,

$$P\{v(G_{n,T})=k\} = \sum_{k_1+k_2\le k}P\{a_1=k_1,\ a_2=k_2,\ G_{n,T}\in A_{n,T}\}\;P\{v(G_{n,T})=k \mid a_1=k_1,\ a_2=k_2,\ G_{n,T}\in A_{n,T}\}$$
$$\quad+\sum_{k_1+k_2\le k}P\{a_1=k_1,\ a_2=k_2,\ G_{n,T}\notin A_{n,T}\}\;P\{v(G_{n,T})=k \mid a_1=k_1,\ a_2=k_2,\ G_{n,T}\notin A_{n,T}\}.$$
According to Theorem 2.3.3, $P\{G_{n,T}\notin A_{n,T}\}\to 0$, and it is not difficult to see that

$$P\{G_{n,T}\in A_{n,T} \mid a_1=k_1,\ a_2=k_2\} = P\bigl\{G^{(3)}_{n,T-k_1-k_2}\in A^{(3)}_{n,T-k_1-k_2}\bigr\},$$
$$P\{v(G_{n,T})=k \mid a_1=k_1,\ a_2=k_2,\ G_{n,T}\in A_{n,T}\} = P\bigl\{x\bigl(G^{(3)}_{n,T-k_1-k_2}\bigr)=k-k_1-k_2\bigr\}.$$

Thus

$$P\{v(G_{n,T})=k\} = \sum_{k_1+k_2\le k}P\{a_1=k_1,\ a_2=k_2\}\;P\bigl\{x\bigl(G^{(3)}_{n,T-k_1-k_2}\bigr)=k-k_1-k_2\bigr\}\,(1+o(1))+o(1).\tag{2.3.2}$$

According to Theorem 2.1.5, under the conditions of Theorem 2.3.4, for any fixed $k_1, k_2 = 0, 1, \ldots$, and $k \ge k_1+k_2$,

$$P\bigl\{x\bigl(G^{(3)}_{n,T}\bigr)=k-k_1-k_2\bigr\} \to \frac{\Lambda_3^{\,k-k_1-k_2}}{(k-k_1-k_2)!}\,e^{-\Lambda_3},$$

where

$$\Lambda_3 = -\frac12\log(1-\lambda)-\frac{\lambda}{2}-\frac{\lambda^2}{4}.$$

Now it follows from (2.3.2) and Theorem 2.3.1 that

$$P\{v(G_{n,T})=k\} = \sum_{k_1+k_2\le k}\frac{(\lambda/2)^{k_1}}{k_1!}e^{-\lambda/2}\,\frac{(\lambda^2/4)^{k_2}}{k_2!}e^{-\lambda^2/4}\,\frac{\Lambda_3^{\,k-k_1-k_2}}{(k-k_1-k_2)!}e^{-\Lambda_3}\,(1+o(1))+o(1)$$
$$= \frac{e^{-\Lambda}}{k!}\sum_{k_1+k_2\le k}\frac{k!}{k_1!\,k_2!\,(k-k_1-k_2)!}\Bigl(\frac{\lambda}{2}\Bigr)^{k_1}\Bigl(\frac{\lambda^2}{4}\Bigr)^{k_2}\Lambda_3^{\,k-k_1-k_2}\,(1+o(1))+o(1)
= \frac{\Lambda^{k}}{k!}\,e^{-\Lambda}\,(1+o(1)),$$

where

$$\Lambda = \Lambda_3+\frac{\lambda}{2}+\frac{\lambda^2}{4} = -\frac12\log(1-\lambda).$$
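The last computation is the convolution of three independent Poisson laws: Poisson$(\lambda/2)$ for loops, Poisson$(\lambda^2/4)$ for pairs of parallel edges, and Poisson$(\Lambda_3)$ for the remaining cycles, which yields Poisson$(\Lambda)$ with $\Lambda = \lambda/2 + \lambda^2/4 + \Lambda_3$. A direct numerical confirmation of this identity (ours, for one illustrative λ):

```python
import math

def poisson_pmf(mu, k):
    return math.exp(-mu) * mu ** k / math.factorial(k)

lam = 0.5
p1, p2 = lam / 2, lam ** 2 / 4
p3 = -0.5 * math.log(1 - lam) - p1 - p2      # Λ₃
tot = -0.5 * math.log(1 - lam)               # Λ = p1 + p2 + p3

for k in range(8):
    # convolution of the three Poisson pmfs at the point k
    conv = sum(poisson_pmf(p1, k1) * poisson_pmf(p2, k2) * poisson_pmf(p3, k - k1 - k2)
               for k1 in range(k + 1) for k2 in range(k + 1 - k1))
    assert abs(conv - poisson_pmf(tot, k)) < 1e-12
```

The identity holds exactly, since a sum of independent Poisson variables is Poisson with the summed parameter.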
By reasoning in the same way, we can reformulate the theorems proved for $G^{(3)}_{n,T}$ so that they can also be applied to subcritical and critical graphs $G_{n,T}$. As an example, we give an analogue of Theorem 2.1.6 on the number $x(G_{n,T})$ and on the maximum sizes $\eta(G_{n,T})$, $\rho(G_{n,T})$, and $\alpha(G_{n,T})$ of trees, unicyclic components, and all components in $G_{n,T}$, respectively.
Theorem 2.3.5. If $n, T \to \infty$ such that $\varepsilon = 1 - 2T/n \to 0$ and $\varepsilon^3 n \to \infty$, then for any fixed $x$,

$$P\Bigl\{\frac{x(G_{n,T})+\tfrac12\log\varepsilon}{\sqrt{-\tfrac12\log\varepsilon}} < x\Bigr\} \to \frac{1}{\sqrt{2\pi}}\int_{-\infty}^{x}e^{-u^2/2}\,du;$$

for any fixed $x > 0$,

$$P\{\varepsilon^{2}\rho(G_{n,T}) < x\} = \sum_{s=0}^{\infty}\frac{1}{4^{s}\,s!}\,Z_s(x)\,(1+o(1)),$$

where $Z_s(x)$ is defined in Theorem 1.8.7; and, for any fixed $z$,

$$P\{\beta\eta(G_{n,T})-u < z\} = P\{\beta\alpha(G_{n,T})-u < z\}(1+o(1)) = e^{-e^{-z}}(1+o(1)),$$

where $\beta = -\log(\theta e^{1-\theta})$, $\theta = 2T/n$, and $u$ is the root of the equation

$$\Bigl(\frac{2}{\pi}\Bigr)^{1/2}(n-T)\,\beta^{3/2} = u^{5/2}e^{u}.$$
For the same reasons, Theorem 2.2.2 can be extended to the critical graph $G_{n,T}$.

Theorem 2.3.6. If $n, T \to \infty$ such that $\varepsilon n^{1/3} \to 2\cdot3^{-2/3}v$, where $v$ is a constant, then

$$P\{G_{n,T}\in A_{n,T}\} = \sqrt{\frac{2}{3\pi}}\;e^{4v^3/27}\sum_{k=0}^{\infty}\frac{v^{k}}{k!}\,\Gamma\Bigl(\frac{2k}{3}+\frac12\Bigr)\cos\frac{\pi k}{3}\;(1+o(1)).$$
For the supercritical case, where $n, T \to \infty$ such that $\varepsilon^3 n \to -\infty$, we present here only the simplest results. In the final section of this chapter, we will give a short review of what is known about the supercritical graphs. It is known that if $\theta = 2T/n \to \lambda$, $\lambda > 1$, a giant component appears in the graph $G^{(3)}_{n,T}$: with probability tending to 1, the graph consists of trees, unicyclic components, and this giant component, which is formed by all the vertices that are not contained in trees and unicyclic components. As $2T/n$ increases, the size of the giant component increases and the number of unicyclic components decreases. If $\theta = 2T/n \to \lambda$, $1 < \lambda < \infty$, then the number of unicyclic components has in the limit a Poisson distribution. For $\theta \to \infty$, we have the following result.

Theorem 2.3.7. If $n, T \to \infty$ such that $\theta = 2T/n \to \infty$, then with probability tending to 1, there are no unicyclic components in $G_{n,T}$.
Proof. The number of connected unicyclic graphs with $r$ labeled vertices is not greater than $c\,r^{r-1/2}$, where $c$ is a constant (see, e.g., [16]). Denote by $x_r(G_{n,T})$ the number of unicyclic components of size $r$ in $G_{n,T}$. By reasoning as in the proof of (2.3.1), we find that

$$E x_r(G_{n,T}) \le c\binom{n}{r}r^{r-1/2}\,T^{r}\left(\frac{2}{n^{2}}\right)^{r}\left(1-\frac{2r(n-r)}{n^{2}}-\frac{r(r-1)}{n^{2}}\right)^{T-r},\tag{2.3.3}$$

where the last factor is the probability that the $T - r$ edges, which were not used for the construction of the unicyclic component, neither connect the vertices in the component with the vertices outside the component nor connect any pair of vertices in the component. It is sufficient to prove that

$$\sum_{r=1}^{n}E x_r(G_{n,T}) \to 0.$$

With the help of estimate (2.3.3), we find that

$$\sum_{r=1}^{n}E x_r(G_{n,T}) \le c\sum_{r=1}^{n}(\theta e)^{r}\,e^{-2r(n-(r+1)/2)(T-r)/n^{2}}.$$

For sufficiently large $n$ and $1 \le r \le n$,

$$e^{-2r(n-(r+1)/2)(T-r)/n^{2}} \le e^{-r\theta/4},$$

and $q = \theta e^{1-\theta/4} < 1$. Therefore

$$\sum_{r=1}^{n}E x_r(G_{n,T}) \le c\sum_{r=1}^{\infty}q^{r} = \frac{cq}{1-q}.$$

Since $q = \theta e^{1-\theta/4} \to 0$ as $\theta \to \infty$, we conclude that a unicyclic component exists in $G_{n,T}$ with a probability that tends to zero.
Finally, we consider the behavior of the random graph $G_{n,T}$ near the point where the graph becomes connected. Denote the number of components in $G_{n,T}$ by $x_{n,T}$.

Theorem 2.3.8. If $n \to \infty$ and $2T = n\log n + xn + o(n)$, where $x$ is a constant, then with probability tending to 1, the graph consists of a giant connected component and isolated vertices. Also, for any fixed integer $k = 0, 1, \ldots$,

$$P\{x_{n,T}-1 = k\} \to \frac{e^{-kx}}{k!}\,\exp\{-e^{-x}\}.$$
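The proof below reduces the count of isolated vertices to the number $\mu_0(2T, n)$ of empty cells when $2T$ particles are thrown into $n$ cells. That reduction is easy to simulate; the sketch and parameters below are ours and purely illustrative:

```python
import math, random

def empty_cells(n, m, rng):
    # throw m particles into n cells uniformly; return the number of empty cells
    hit = bytearray(n)
    for _ in range(m):
        hit[rng.randrange(n)] = 1
    return n - sum(hit)

n, x = 1000, 0.0
T = round((n * math.log(n) + x * n) / 2)     # 2T = n log n + xn
rng = random.Random(7)
reps = 400
mean = sum(empty_cells(n, 2 * T, rng) for _ in range(reps)) / reps
# the number of isolated vertices is ≈ Poisson(e^{-x}); here e^{-x} = 1
```

The empirical mean of $\mu_0(2T, n)$ is close to $n(1-1/n)^{2T} \approx e^{-x}$, consistent with the Poisson limit in Theorem 2.3.8.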
Proof. We have to prove that, with probability tending to 1, $G_{n,T}$ consists of one giant component and isolated vertices, and that the distribution of the number of these isolated vertices converges to the Poisson distribution with parameter $e^{-x}$.

The edges of $G_{n,T}$ appear as a result of $T$ independent trials, and these $T$ trials can be considered as the allocation of $2T$ particles into $n$ cells such that any particle is allocated independently of the others and, with equal probabilities, falls into any of the $n$ cells. Therefore the number of isolated vertices in $G_{n,T}$ has the same distribution as the number $\mu_0(2T, n)$ of empty cells in the well-studied classical scheme of allocating particles. Under the conditions of the theorem, the distribution of $\mu_0(2T, n)$ converges to the Poisson distribution with parameter $e^{-x}$.

To complete the proof, it suffices to show that, with probability tending to 1, the remaining vertices form one giant component. If, in addition to the isolated vertices, there were two other components, then the graph would contain a tree of size $r$, $2 \le r \le n/2$, such that no vertex of the tree is connected to any vertex outside the tree. A skeleton of one of the two components could play the role of such a tree.

By $\xi_r$ we denote the number of trees of size $r$ which are the skeletons of connected components of $G_{n,T}$. We will show that under the conditions of the theorem,

$$\sum_{r=2}^{n/2}E\xi_r \to 0,$$

and consequently, with probability tending to 1, such a tree does not occur in $G_{n,T}$. We can represent $\xi_r$ as a sum of indicators and find that

$$E\xi_r = \binom{n}{r}r^{r-2}\binom{T}{r-1}(r-1)!\left(\frac{2}{n^{2}}\right)^{r-1}\left(1-\frac{2r(n-r)}{n^{2}}\right)^{T-r+1}.\tag{2.3.4}$$
This formula is similar to (2.3.1): We choose $r$ vertices and the $r - 1$ edges that form the tree, and the last factor is the probability that none of the $T - r + 1$ edges that remain connects a vertex from the set of $r$ selected vertices with a vertex from the set of $n - r$ remaining vertices. By using formula (2.3.4), we can check, for example, that with probability tending to 1, there are no isolated edges in $G_{n,T}$. Indeed, for $r = 2$,

$$E\xi_2 = \binom{n}{2}\,T\,\frac{2}{n^{2}}\left(1-\frac{4(n-2)}{n^{2}}\right)^{T-1} \le 2Te^{-4(n-2)(T-1)/n^{2}},\tag{2.3.5}$$

and the right-hand side of (2.3.5) tends to zero if $n \to \infty$ and $2T = n\log n + xn + o(n)$. It follows from (2.3.4) that

$$E\xi_r \le \frac{n^{r}}{r!}\,r^{r-2}\,\frac{T^{r-1}\,2^{r-1}}{n^{2(r-1)}}\,e^{-2r(n-r)(T-r+1)/n^{2}},$$
and, since $r^{r-2}/r! \le e^{r}/r^{2}$ and $e^{-2r(n-r)(T-r+1)/n^{2}} \le e^{-4r\theta/9}$ for $3 \le r \le n/2$ and all sufficiently large $n$,

$$E\xi_r \le \frac{n\,e^{r}}{r^{2}}\left(\frac{2T}{n}\right)^{r-1}e^{-2r(n-r)(T-r+1)/n^{2}} \le \frac{n^{2}}{2T}\bigl(\theta e^{1-4\theta/9}\bigr)^{r},$$

where $\theta = 2T/n$. Therefore

$$\sum_{r=3}^{n/2}E\xi_r \le \frac{n^{2}}{2T}\sum_{r=3}^{\infty}\bigl(\theta e^{1-4\theta/9}\bigr)^{r} = \frac{n^{2}\bigl(\theta e^{1-4\theta/9}\bigr)^{3}}{2T\bigl(1-\theta e^{1-4\theta/9}\bigr)}.$$

If $n \to \infty$ and $2T = n\log n + xn + o(n)$, then

$$\theta e^{1-4\theta/9} = \bigl(\log n + x + o(1)\bigr)\,e^{1-4(\log n + x + o(1))/9},$$

and for all sufficiently large $n$,

$$\theta e^{1-4\theta/9} \le \frac{c\log n}{n^{4/9}},$$

where $c$ is a constant. Therefore, under the conditions of the theorem,

$$\sum_{r=3}^{n/2}E\xi_r \to 0.$$

Taking into account that $E\xi_2 \to 0$ also, we see that, with probability tending to 1, the graph $G_{n,T}$ has only one component besides the isolated vertices.
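Formula (2.3.4) can be summed numerically to see how small the expected number of nontrivial tree components is at $2T = n\log n + xn$. The sketch is ours; logarithms (via `math.lgamma`) avoid the huge binomial coefficients:

```python
import math

def log_E_skeleton_trees(n, T, r):
    # log of (2.3.4): E ξ_r = C(n,r) r^{r-2} C(T,r-1) (r-1)! (2/n²)^{r-1}
    #                         × (1 - 2r(n-r)/n²)^{T-r+1}
    lg = math.lgamma
    return (lg(n + 1) - lg(r + 1) - lg(n - r + 1)        # log C(n,r)
            + (r - 2) * math.log(r)                       # log r^{r-2}
            + lg(T + 1) - lg(r) - lg(T - r + 2)           # log C(T,r-1)
            + lg(r)                                       # log (r-1)!
            + (r - 1) * math.log(2 / n ** 2)
            + (T - r + 1) * math.log(1 - 2 * r * (n - r) / n ** 2))

n = 3000
T = round(n * math.log(n) / 2)                            # the case x = 0
total = sum(math.exp(log_E_skeleton_trees(n, T, r)) for r in range(2, n // 2))
# total is of order 10^{-3}, dominated by the r = 2 (isolated edge) term
```

This matches the proof: the $r = 2$ term is about $T/n^2 = \log n/(2n)$, and the terms with $r \ge 3$ decay geometrically.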
2.4. Nonequiprobable graphs The model of the random graph Gn,T considered in the previous section can be easily extended to nonequiprobable graphs. However, the approach based on the generalized scheme of allocation, which reduces the investigations of equiprobable graphs to some problems concerning sums of independent random variables, does not apply to nonequiprobable graphs. In this case, few results have been obtained because of the lack of effective methods to investigate these objects. In this section, we consider a generalization of the random graph Gn,T of the previous section. We preserve the notation Gn,T for this nonequiprobable graph with n vertices labeled with the numbers 1, 2, ... , n and T edges, which can be obtained by the following procedure. We consider T independent trials, in each of which one edge is drawn. The edge connects two different vertices or forms a loop;
the vertices with labels $i$ and $j$ are connected with probability $2p_ip_j$, and a loop at vertex $i$ is formed with probability $p_i^2$; $i, j = 1, \ldots, n$, $p_1, \ldots, p_n > 0$, $p_1 + \cdots + p_n = 1$. Thus, after $T$ trials we have a realization of the random graph $G_{n,T}$, which may have loops and multiple edges. The main result of this section is the following assertion.

Theorem 2.4.1. Assume that $p_i = a_i/n$, where $a_i = a_i(n)$, $0 < \varepsilon \le a_i \le E$, $i = 1, \ldots, n$, $\varepsilon$ and $E$ are constants, and the limit

$$a^{2} = \lim_{n\to\infty}\frac{1}{n}\sum_{i=1}^{n}a_i^{2}$$

exists. Then, if $n, T \to \infty$ such that $2T/n \to \lambda$, $0 < \lambda a^{2} < 1$, the distribution of the number of cycles $v(G_{n,T})$ in the graph $G_{n,T}$ converges to the Poisson distribution with parameter

$$\Lambda = -\frac12\ln\bigl(1-\lambda a^{2}\bigr).$$

In proving the theorem, the limit distribution of the random variable $a_r$, the number of cycles of length $r$, and the joint limit distribution of $a_{r_1}, \ldots, a_{r_s}$ are obtained.

Theorem 2.4.2. Under the conditions of Theorem 2.4.1, without the requirement $\lambda a^{2} < 1$, the distribution of the random variable $a_r$ for any fixed $r$ tends to the Poisson distribution with parameter $\Lambda_r = \lambda^{r}a^{2r}/(2r)$.

Theorem 2.4.3. Under the conditions of Theorem 2.4.1, without the requirement $\lambda a^{2} < 1$, the joint distribution of $a_{r_1}, \ldots, a_{r_s}$ for any fixed $1 \le r_1 < \cdots < r_s$ converges to the distribution of $s$ independent random variables that have the Poisson distributions with parameters $\Lambda_{r_1}, \ldots, \Lambda_{r_s}$, respectively.
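Simplest to check by simulation is the loop count in this nonequiprobable model: a loop occurs in each trial with probability $\sum_i p_i^2$, so its mean is $T\sum_i p_i^2 = (\lambda/2)\,n^{-1}\sum_i a_i^2 + o(1)$, which tends to $\Lambda_1 = \lambda a^2/2$. The sketch below is ours and the weights are purely illustrative:

```python
import random

n, T = 300, 150                                       # λ = 2T/n = 1
a = [0.5 if i % 2 == 0 else 1.5 for i in range(n)]    # a_i with mean 1, so Σ p_i = 1
p_loop = sum((ai / n) ** 2 for ai in a)               # Σ p_i² = a²/n with a² = 1.25
rng = random.Random(3)
reps = 2000
verts = list(range(n))
loops = 0
for _ in range(reps):
    us = rng.choices(verts, weights=a, k=T)           # endpoints i.i.d. with P{i} = a_i/n
    vs = rng.choices(verts, weights=a, k=T)
    loops += sum(1 for u, v in zip(us, vs) if u == v)
mean = loops / reps
# E a₁ = T Σ p_i² = λ a²/2 = 0.625 for these parameters
```

The empirical mean lands near $0.625$, the parameter $\Lambda_1$ of the limiting Poisson law for loops in this model.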
The proof will be accomplished by the method of moments.

A cycle of length $r$ has no self-intersections if it is composed of $r$ vertices and exactly $r$ edges of $G_{n,T}$. Denote by $a_r$ the number of cycles of length $r$, $r \ge 3$, without self-intersections in the random graph $G_{n,T}$. For $r$ distinct vertices $i_1, \ldots, i_r$, let $\xi_{i_1\ldots i_r} = 1$ if in $G_{n,T}$ there exists a cycle composed of these $r$ vertices containing exactly $r$ edges of $G_{n,T}$; in other cases, we set $\xi_{i_1\ldots i_r} = 0$. Then

$$a_r = \sum_{\{i_1,\ldots,i_r\}}\xi_{i_1\ldots i_r},\tag{2.4.1}$$

where the summation is taken over all $\binom{n}{r}$ distinct unordered sets of $r$ distinct indices. In the complete graph with vertices $i_1, \ldots, i_r$, there exist $(r-1)!/2$ distinct cycles containing exactly $r$ edges. We label these cycles in an arbitrary order with the numbers $j = 1, \ldots, (r-1)!/2$ and represent the random variable $\xi_{i_1\ldots i_r}$ as the sum of indicators:

$$\xi_{i_1\ldots i_r} = \sum_{j=1}^{(r-1)!/2}\xi^{(j)}_{i_1\ldots i_r},\tag{2.4.2}$$

where $\xi^{(j)}_{i_1\ldots i_r} = 1$ if the $j$th cycle exists in $G_{n,T}$, and $\xi^{(j)}_{i_1\ldots i_r} = 0$ otherwise. We now investigate the behavior of the random variable

$$v(G_{n,T}) = a_1 + \cdots + a_n,$$

where the variables $a_r$ are defined by (2.4.1) for $r \ge 3$, $a_1$ is the number of loops, and $a_2$ is the number of pairs of parallel edges in $G_{n,T}$.

Each cycle in the graph $G_{n,T}$ may be thought of as the set of edges that form this cycle; therefore, the following assertion is needed for evaluating such probabilities as $P\{\xi^{(j)}_{i_1\ldots i_r} = 1\}$. Let $V_r = \{(i_1, j_1), \ldots, (i_r, j_r)\}$ be a set of $r$ distinct pairs of vertices in the graph $G_{n,T}$, where $i_k \ne j_k$, $k = 1, \ldots, r$. Denote by $P(V_r)$ the probability of the event that all the edges from $V_r$ occur in $G_{n,T}$.
Lemma 2.4.1. If $n, T \to \infty$, $2T/n \to \lambda$, $0 < \lambda < \infty$, and $0 < \varepsilon \le a_i \le E < \infty$, $i = 1, \ldots, n$, then for arbitrary fixed $\varepsilon$, $E$, and $r$,

$$P(V_r) = \frac{\lambda^{r}}{n^{r}}\,a_{i_1}a_{j_1}\cdots a_{i_r}a_{j_r}\,\Bigl(1+O\Bigl(\frac1n\Bigr)\Bigr)\tag{2.4.3}$$

uniformly with respect to $a_1, \ldots, a_n$ and all sets $V_r$. Moreover, for any $\delta > 0$, there exists a constant $c$ such that, for all $r$ and $n$,

$$P(V_r) \le c\,\frac{(\lambda+\delta)^{r}}{n^{r}}\,a_{i_1}a_{j_1}\cdots a_{i_r}a_{j_r}.\tag{2.4.4}$$
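For small $r$, the estimate (2.4.3) can be confirmed exactly by inclusion–exclusion over the prescribed edges. The sketch and the particular weights are ours:

```python
def edge_set_probability(qs, T):
    # P{each prescribed edge occurs among T trials}; edge k occurs per trial
    # with probability q_k.  Inclusion–exclusion over the subset S of edges
    # that never occur:  P = Σ_S (-1)^{|S|} (1 - Σ_{k∈S} q_k)^T.
    r = len(qs)
    total = 0.0
    for mask in range(1 << r):
        miss = sum(qs[k] for k in range(r) if mask >> k & 1)
        sign = -1 if bin(mask).count("1") % 2 else 1
        total += sign * (1 - miss) ** T
    return total

n, T = 2000, 1000                                   # λ = 2T/n = 1
a1, a2, a3, a4 = 1.3, 0.8, 1.1, 0.9                 # weights of four vertices
qs = [2 * a1 * a2 / n ** 2, 2 * a3 * a4 / n ** 2]   # edges (1,2) and (3,4)
exact = edge_set_probability(qs, T)
approx = (2 * T / n ** 2) ** 2 * a1 * a2 * a3 * a4  # (λ/n)^r · Π a_i a_j
# the two agree to a relative error of order 1/n
```

The leading corrections, $T^{[r]}/T^r$ and $(1-q_1-\cdots-q_r)^{T-r}$, are both $1 + O(1/n)$, exactly as the lemma asserts.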
Proof. Set $q_k = 2p_{i_k}p_{j_k}$, $k = 1, \ldots, r$. Then

$$P(V_r) = \sum_{m_1,\ldots,m_r\ge1}\frac{T^{[m_1+\cdots+m_r]}}{m_1!\cdots m_r!}\,q_1^{m_1}\cdots q_r^{m_r}\,(1-q_1-\cdots-q_r)^{T-m_1-\cdots-m_r}$$
$$= T^{[r]}q_1\cdots q_r\Bigl((1-q_1-\cdots-q_r)^{T-r}+{\sum}'\frac{(T-r)^{[m_1+\cdots+m_r-r]}}{m_1!\cdots m_r!}\,q_1^{m_1-1}\cdots q_r^{m_r-1}(1-q_1-\cdots-q_r)^{T-m_1-\cdots-m_r}\Bigr).\tag{2.4.5}$$

Here $x^{[m]} = x(x-1)\cdots(x-m+1)$; the summation in $\sum'$ is taken over all sets $\{m_1, \ldots, m_r\}$ in which $m_1, \ldots, m_r \ge 1$ and there exists an $i$, $1 \le i \le r$, such that $m_i > 1$. It is clear that

$$(1-q_1-\cdots-q_r)^{T-r} \le 1,$$

and, for an arbitrary fixed $r$,

$$(1-q_1-\cdots-q_r)^{T-r} = 1+O(1/n).\tag{2.4.6}$$

In addition,

$${\sum}'\frac{(T-r)^{[m_1+\cdots+m_r-r]}}{m_1!\cdots m_r!}\,q_1^{m_1-1}\cdots q_r^{m_r-1}(1-q_1-\cdots-q_r)^{T-m_1-\cdots-m_r} \le \sum_{i=1}^{r}q_i\,(T-r)\,S_i,\tag{2.4.7}$$

where

$$S_i = \sum_{\substack{m_1,\ldots,m_r\ge1\\ m_i>1}}\frac{(T-r-1)^{[m_1+\cdots+m_r-r-1]}}{m_1!\cdots m_r!}\,q_1^{m_1-1}\cdots q_i^{m_i-2}\cdots q_r^{m_r-1}\,(1-q_1-\cdots-q_r)^{T-m_1-\cdots-m_r}.$$

Let $l_i = m_i-2$ and $l_j = m_j-1$, $j \ne i$ (recall that $m_i > 1$). Then

$$S_i = \sum_{l_1,\ldots,l_r\ge0}\frac{(T-r-1)^{[l_1+\cdots+l_r]}}{(l_1+1)!\cdots(l_i+2)!\cdots(l_r+1)!}\,q_1^{l_1}\cdots q_r^{l_r}\,(1-q_1-\cdots-q_r)^{T-r-1-l_1-\cdots-l_r}$$
$$\le \sum_{l_1,\ldots,l_r\ge0}\frac{(T-r-1)^{[l_1+\cdots+l_r]}}{l_1!\cdots l_r!}\,q_1^{l_1}\cdots q_r^{l_r}\,(1-q_1-\cdots-q_r)^{T-r-1-l_1-\cdots-l_r} = 1.\tag{2.4.8}$$

Now assertion (2.4.3) follows from (2.4.5)–(2.4.8), and assertion (2.4.4) follows from (2.4.5), (2.4.7), and (2.4.8), since

$$\sum_{i=1}^{r}q_i(T-r) \le \frac{2TrE^{2}}{n^{2}},\qquad T^{[r]}q_1\cdots q_r \le \frac{(2T)^{r}}{n^{2r}}\,a_{i_1}a_{j_1}\cdots a_{i_r}a_{j_r}.$$
Corollary 2.4.1. If $n, T \to \infty$, $2T/n \to \lambda$, $0 < \lambda < \infty$, and $0 < \varepsilon \le a_i \le E < \infty$, $i = 1, \ldots, n$, then for arbitrary fixed $\varepsilon$, $E$, $\lambda$, and $r$,

$$P\bigl\{\xi^{(j)}_{i_1\ldots i_r}=1\bigr\} = \frac{\lambda^{r}}{n^{r}}\,a_{i_1}^{2}\cdots a_{i_r}^{2}\,\Bigl(1+O\Bigl(\frac1n\Bigr)\Bigr)$$

uniformly with respect to $j$, $1 \le j \le (r-1)!/2$, all sets $\{i_1, \ldots, i_r\}$, and $a_1, \ldots, a_n$. Moreover, for any $\delta > 0$, there exists a constant $c$ such that, for all $r$ and $n$,

$$P\bigl\{\xi^{(j)}_{i_1\ldots i_r}=1\bigr\} \le c\,\frac{(\lambda+\delta)^{r}}{n^{r}}\,a_{i_1}^{2}\cdots a_{i_r}^{2}.$$

Proof. The equality $\xi^{(j)}_{i_1\ldots i_r} = 1$ holds if and only if there exist in $G_{n,T}$ $r$ fixed edges $\{(k_1, l_1), \ldots, (k_r, l_r)\}$, $k_v \ne l_v$, $v = 1, \ldots, r$, which form the $j$th cycle on the vertices $i_1, \ldots, i_r$. For these edges, each of the sets $\{k_1, \ldots, k_r\}$ and $\{l_1, \ldots, l_r\}$ coincides with the set $\{i_1, \ldots, i_r\}$. Therefore, the corollary follows from Lemma 2.4.1.

The notation $\{i_1, \ldots, i_r\}$ denotes an unordered set of distinct indices $i_1, \ldots, i_r$; the number of such sets is $\binom{n}{r}$. For ordered sets of distinct indices we will use the notation $(i_1, \ldots, i_r)$; the number of such sets is $n^{[r]}$. By the symbols

$$\sum_{\{i_1,\ldots,i_r\}},\qquad\sum_{(i_1,\ldots,i_r)}$$

we will denote the summations over all distinct unordered and ordered sets of $r$ distinct indices, respectively. It is clear that the summation over all unordered sets $\{i_1, \ldots, i_r\}$ is well suited to summands $f_{i_1\ldots i_r}$ whose values are invariant with respect to permutations of the indices. For such summands,

$$\sum_{(i_1,\ldots,i_r)}f_{i_1\ldots i_r} = r!\sum_{\{i_1,\ldots,i_r\}}f_{i_1\ldots i_r},\tag{2.4.9}$$

and, moreover, for nonnegative $f$,

$$\sum f_{i_1^{(1)}\ldots i_r^{(1)}}\cdots f_{i_1^{(k)}\ldots i_r^{(k)}} \le \sum_{(i_1^{(1)},\ldots,i_r^{(1)})}f_{i_1^{(1)}\ldots i_r^{(1)}}\cdots\sum_{(i_1^{(k)},\ldots,i_r^{(k)})}f_{i_1^{(k)}\ldots i_r^{(k)}}\tag{2.4.10}$$

if the left-hand side summation is taken over all distinct ordered sets of distinct $r$-dimensional indices $i^{(1)}, \ldots, i^{(k)}$.
if the left-hand side summation is taken over all distinct ordered sets of distinct i(1), ... , i(k) r-dimensional indices Lemma 2.4.2.
If 0 < e < ai < E < oo, i = 1, ... , n,
then for any fixed r,
asn - oo,
(a)'a,2.
a?
(1+0(i)). \
(2.4.11)
Proof. The following representation is valid:

$$\Bigl(\sum_{i=1}^{n}a_i^{2}\Bigr)^{r} = \sum_{i_1,\ldots,i_r=1}^{n}a_{i_1}^{2}\cdots a_{i_r}^{2} = \sum_{(i_1,\ldots,i_r)}a_{i_1}^{2}\cdots a_{i_r}^{2}+{\sum}^{*}a_{i_1}^{2}\cdots a_{i_r}^{2},$$

where the summation in the first sum is taken over all distinct ordered sets of distinct indices, and, in the asterisked sum, over all ordered sets having at least two identical indices. The number of summands in the first sum is $n^{[r]}$; the number of summands in the second sum is equal to $n^{r}-n^{[r]}$ and does not exceed $c_rn^{r-1}$, where the constant $c_r$ depends only on $r$. Therefore

$$\sum_{(i_1,\ldots,i_r)}a_{i_1}^{2}\cdots a_{i_r}^{2} \ge n^{[r]}\varepsilon^{2r},\qquad {\sum}^{*}a_{i_1}^{2}\cdots a_{i_r}^{2} \le c_rn^{r-1}E^{2r},$$

and the proof is complete.
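Lemma 2.4.2 is a finite identity that can be checked directly for small $n$ and $r$. The sketch and weights are ours:

```python
from itertools import permutations

n, r = 30, 3
a = [0.8 + 0.4 * (i % 2) for i in range(n)]     # a_i alternating between 0.8 and 1.2
a2 = [x * x for x in a]

full = sum(a2) ** r                              # (Σ a_i²)^r: all n^r ordered tuples
distinct = sum(a2[i] * a2[j] * a2[k]             # only tuples of distinct indices
               for i, j, k in permutations(range(n), r))
ratio = distinct / full
# the n^r − n^{[r]} tuples with a repeated index contribute a relative O(1/n)
```

Even at $n = 30$ the ratio is already close to 1, and the deficit shrinks like $1/n$ as $n$ grows.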
Corollary 2.4.2. Under the conditions of Theorem 2.4.2, for any fixed $r \ge 3$,

$$E a_r \to \frac{\lambda^{r}a^{2r}}{2r}.$$

Moreover, for any $\delta > 0$, there exists a constant $c$ such that

$$E a_r \le c\,\frac{(\lambda+\delta)^{r}(a^{2}+\delta)^{r}}{2r}.$$
Proof. Using representations (2.4.1) and (2.4.2), with the aid of (2.4.9), Corollary 2.4.1, and Lemma 2.4.2, we obtain

$$E a_r = \frac{(r-1)!}{2}\,\frac{\lambda^{r}}{n^{r}}\sum_{\{i_1,\ldots,i_r\}}a_{i_1}^{2}\cdots a_{i_r}^{2}\,\Bigl(1+O\Bigl(\frac1n\Bigr)\Bigr)
= \frac{(r-1)!}{2\,r!}\,\frac{\lambda^{r}}{n^{r}}\Bigl(\sum_{i=1}^{n}a_i^{2}\Bigr)^{r}\Bigl(1+O\Bigl(\frac1n\Bigr)\Bigr)
= \frac{\lambda^{r}a^{2r}}{2r}\,(1+o(1)).$$

The second assertion follows immediately from the inequality of Corollary 2.4.1.
We now evaluate the factorial moments of $a_r$. If $S_n = \xi_1 + \cdots + \xi_n$, where $\xi_1, \ldots, \xi_n$ take the values 0 and 1 only, then according to Theorem 1.1.4,

$$S_n(S_n-1)\cdots(S_n-m+1) = \sum_{(k_1,\ldots,k_m)}\xi_{k_1}\cdots\xi_{k_m},\tag{2.4.12}$$

where the summation is taken over all distinct ordered sets of $m$ distinct indices.
In our case, the indices have a composite structure because

$$a_r = \sum_{\{i_1,\ldots,i_r\}}\sum_{j=1}^{(r-1)!/2}\xi^{(j)}_{i_1\ldots i_r}.$$

The following representation is analogous to (2.4.12):

$$a_r(a_r-1)\cdots(a_r-m+1) = \sum\xi^{(j_1)}_{i_1^{(1)}\ldots i_r^{(1)}}\cdots\xi^{(j_m)}_{i_1^{(m)}\ldots i_r^{(m)}},\tag{2.4.13}$$

where the summation is taken over all distinct ordered sets

$$\bigl(\{i_1^{(1)},\ldots,i_r^{(1)}\},\,j_1\bigr),\ \ldots,\ \bigl(\{i_1^{(m)},\ldots,i_r^{(m)}\},\,j_m\bigr)$$

of distinct indices of the form $(\{i_1,\ldots,i_r\}, j)$; the set $\{i_1, \ldots, i_r\}$ in the index is considered an unordered set of distinct indices, and $j$ indicates the number of the cycle formed by the vertices $i_1, \ldots, i_r$. We show that under the conditions of Theorem 2.4.2, for any fixed $r$ and any fixed $m \ge 1$,

$$E a_r^{[m]} \to \Bigl(\frac{\lambda^{r}a^{2r}}{2r}\Bigr)^{m}.\tag{2.4.14}$$

This assertion for $m = 1$ follows from Corollary 2.4.2.
In order to become accustomed to the more complicated notation, we first consider the case $m = 2$. By (2.4.13),

$$E a_r^{[2]} = \sum P\bigl\{\xi^{(j_1)}_{i_1^{(1)}\ldots i_r^{(1)}}=1,\ \xi^{(j_2)}_{i_1^{(2)}\ldots i_r^{(2)}}=1\bigr\}.$$

Decompose the right-hand side sum into two sums. Let the first sum $\Sigma_1$ include the summands with nonintersecting sets $\{i_1^{(1)},\ldots,i_r^{(1)}\}$ and $\{i_1^{(2)},\ldots,i_r^{(2)}\}$. When we take into account that in this case $2r$ edges must exist to guarantee

$$\xi^{(j_1)}_{i_1^{(1)}\ldots i_r^{(1)}} = \xi^{(j_2)}_{i_1^{(2)}\ldots i_r^{(2)}} = 1,$$

and by using Lemma 2.4.1, we obtain

$$P\bigl\{\xi^{(j_1)}_{i_1^{(1)}\ldots i_r^{(1)}} = \xi^{(j_2)}_{i_1^{(2)}\ldots i_r^{(2)}} = 1\bigr\} = \Bigl(\frac{\lambda}{n}\Bigr)^{2r}a_{i_1^{(1)}}^{2}\cdots a_{i_r^{(1)}}^{2}\,a_{i_1^{(2)}}^{2}\cdots a_{i_r^{(2)}}^{2}\,\bigl(1+O(1/n)\bigr).$$

Therefore

$$\Sigma_1 = \Bigl(\frac{\lambda}{n}\Bigr)^{2r}\Bigl(\frac{(r-1)!}{2}\Bigr)^{2}\sum_{\{i_1^{(1)},\ldots,i_r^{(1)}\},\,\{i_1^{(2)},\ldots,i_r^{(2)}\}}a_{i_1^{(1)}}^{2}\cdots a_{i_r^{(1)}}^{2}\,a_{i_1^{(2)}}^{2}\cdots a_{i_r^{(2)}}^{2}\,\times\bigl(1+O(1/n)\bigr).$$
It is clear by virtue of (2.4.9) and (2.4.10) that

$$\sum_{\{i_1^{(1)},\ldots,i_r^{(1)}\},\,\{i_1^{(2)},\ldots,i_r^{(2)}\}}a_{i_1^{(1)}}^{2}\cdots a_{i_r^{(1)}}^{2}\,a_{i_1^{(2)}}^{2}\cdots a_{i_r^{(2)}}^{2} \le \frac{1}{(r!)^{2}}\Bigl(\sum_{(i_1,\ldots,i_r)}a_{i_1}^{2}\cdots a_{i_r}^{2}\Bigr)^{2}.$$

Therefore, by virtue of Lemma 2.4.2,

$$\Sigma_1 = \Bigl(\frac{\lambda}{n}\Bigr)^{2r}\Bigl(\frac{(r-1)!}{2\,r!}\Bigr)^{2}\Bigl(\sum_{i=1}^{n}a_i^{2}\Bigr)^{2r}(1+o(1)) = \Bigl(\frac{\lambda^{r}a^{2r}}{2r}\Bigr)^{2}(1+o(1)).\tag{2.4.15}$$
We now show that the remaining sum $\Sigma_2$ tends to zero. The summation in $\Sigma_2$ is taken over the pairs of composite indices in which the sets $\{i_1^{(1)},\ldots,i_r^{(1)}\}$ and $\{i_1^{(2)},\ldots,i_r^{(2)}\}$ have at least one common element. Each composite index $(\{i_1,\ldots,i_r\}, j)$ corresponds to a cycle in the complete graph with $n$ vertices; the cycle consists of $r$ edges and the vertices $i_1, \ldots, i_r$. Two cycles corresponding to the indices $(\{i_1^{(1)},\ldots,i_r^{(1)}\}, j_1)$ and $(\{i_1^{(2)},\ldots,i_r^{(2)}\}, j_2)$ can have $M < 2r$ distinct vertices and $L$ distinct edges. We decompose the sum $\Sigma_2$ into the sums $\Sigma_{M,L}$ containing summands with fixed values of the parameters $M$ and $L$. The number of such sums does not exceed $(2r)^2$; therefore it is sufficient to prove that any sum $\Sigma_{M,L}$ tends to zero. It is easy to see that in the case $M < 2r$, the inequality $L \ge M + 1$ is valid. The number of summands in the sum $\Sigma_{M,L}$ does not exceed $n^{M}$, and the probability that $L$ fixed edges appear in $G_{n,T}$ does not exceed, by virtue of (2.4.4), the value $cn^{-L}$. This implies

$$\Sigma_{M,L} \le \frac{c}{n^{L-M}} \le \frac{c}{n},\tag{2.4.16}$$

and, as $n \to \infty$,

$$\Sigma_2 \to 0.\tag{2.4.17}$$

The assertion (2.4.14) for $m = 2$ follows from (2.4.15) and (2.4.17).
The assertion (2.4.14) for m = 2 follows from (2.4.15) and (2.4.17). Now let us consider the factorial moment of an arbitrary order m. By (2.4.13), EaYml
= E1 + E2,
where the sum El includes only summands that do not have a pair of sets from (1)
, ... , lr(1) }, ... ,
.(m)
(m)
}with common elements. In this case, rm edges must occur in the graph Gn, T to guarantee that the corresponding random variables equal 1. From this and Lemma 2.4.1, it follows that {i 1
Ir
{i 1
(im) (h) =1 P V (1)... , ... , 4 (( )... i(.) = 1 irl)
=
(X)mr n
2 2 al2ll) ... al2rl) ... al(m) ... arm) (1 + o(1)), 1
1
117
2.4 Nonequiprobable graphs
and, by (2.4.9) and Lemma 2.4.2,
1. ra2r m
El = I 2r
(2.4.18)
(1+0(1)).
It remains to prove that the sum $\Sigma_2$ taken over the remaining sets of indices tends to zero. The summation in $\Sigma_2$ is taken over $m$ sets of composite indices that have at least one common element in at least one pair of the sets $\{i_1^{(p)},\ldots,i_r^{(p)}\}$, $\{i_1^{(q)},\ldots,i_r^{(q)}\}$, $p \ne q$. Recall that each composite index corresponds to a cycle in the complete graph with $n$ vertices. The cycles corresponding to the $m$ indices can contain $M$ distinct vertices and $L$ distinct edges. We decompose the sum $\Sigma_2$ into the sums $\Sigma_{M,L}$ containing summands with fixed values of the parameters $M$ and $L$. The number of such sums does not exceed $(rm)^2$; therefore it is sufficient to prove that any sum $\Sigma_{M,L}$ tends to zero. It is clear that if $M < rm$, then $L \ge M + 1$. Thus, since the number of summands in the sum $\Sigma_{M,L}$ does not exceed $n^{M}$, and by (2.4.4) the probability of $L$ fixed edges occurring in $G_{n,T}$ does not exceed $cn^{-L}$,

$$\Sigma_{M,L} \le \frac{c}{n^{L-M}} \le \frac{c}{n}.$$

Therefore, as $n \to \infty$,

$$\Sigma_2 \to 0.\tag{2.4.19}$$

The assertion (2.4.14) follows from (2.4.18) and (2.4.19).
By (2.4.14), the limit distribution for α_r, r ≥ 3, is the Poisson distribution with parameter λ_r = λ^r a_2^r/(2r). It is easy to see that in the current situation the number of loops α_1 and the number of pairs of parallel edges α_2 approach the Poisson distributions with parameters λ_1 = λa_2/2 and λ_2 = λ^2 a_2^2/4, respectively. This proves Theorem 2.4.2.

The more general Theorem 2.4.3 can be proved analogously. It is sufficient to verify that under the conditions of the theorem,

$$\mathbf{E}\,\alpha_{r_1}^{[m_1]}\cdots\alpha_{r_s}^{[m_s]} \to \lambda_{r_1}^{m_1}\cdots\lambda_{r_s}^{m_s}$$

for arbitrary fixed integers m_1, ..., m_s, where

$$\lambda_{r_k} = \frac{\lambda^{r_k} a_2^{r_k}}{2r_k}, \qquad k = 1, \dots, s.$$
Evolution of random graphs

By (2.4.13),

$$\alpha_{r_k}^{[m_k]} = \sum \xi(I_1^{(k)}, j_1^{(k)}) \cdots \xi(I_{m_k}^{(k)}, j_{m_k}^{(k)}), \qquad k = 1, \dots, s,$$

where the sets I_l^{(k)} = {i_1^{(k,l)}, ..., i_{r_k}^{(k,l)}}, l = 1, ..., m_k, k = 1, ..., s, are unordered sets of r_k vertices, and j_l^{(k)}, l = 1, ..., m_k, k = 1, ..., s, are the numbers of cycles of length r_k under the labeling chosen. Therefore

$$\mathbf{E}\,\alpha_{r_1}^{[m_1]}\cdots\alpha_{r_s}^{[m_s]} = \sum_{I} \mathbf{P}\bigl\{\xi(I_1^{(1)}, j_1^{(1)}) = 1, \dots, \xi(I_{m_s}^{(s)}, j_{m_s}^{(s)}) = 1\bigr\},$$

where the summation is taken over all collections

$$I = \bigl((I_1^{(1)}, j_1^{(1)}), \dots, (I_{m_1}^{(1)}, j_{m_1}^{(1)}), \dots, (I_1^{(s)}, j_1^{(s)}), \dots, (I_{m_s}^{(s)}, j_{m_s}^{(s)})\bigr).$$
We decompose the sum on the right-hand side of this representation into two parts; let the sum Σ_1 include only summands with distinct elements in all I_l^{(k)}, l = 1, ..., m_k, k = 1, ..., s, and let the sum Σ_2 include all the remaining summands. For the summands of the first sum, the corresponding random variables equal 1 only if there exist m_1 r_1 + ··· + m_s r_s fixed edges in G_{n,T}. Therefore, by Lemma 2.4.1,

$$\mathbf{P}\bigl\{\xi(I_1^{(1)}, j_1^{(1)}) = 1, \dots, \xi(I_{m_s}^{(s)}, j_{m_s}^{(s)}) = 1\bigr\} = \Bigl(\frac{\lambda}{n}\Bigr)^{m_1 r_1 + \cdots + m_s r_s} a_{i_1^{(1,1)}}^2 \cdots a_{i_{r_1}^{(1,m_1)}}^2 \cdots a_{i_1^{(s,1)}}^2 \cdots a_{i_{r_s}^{(s,m_s)}}^2\,(1 + o(1)),$$

and, by (2.4.9), (2.4.10), and Lemma 2.4.2,

$$\Sigma_1 = \Bigl(\frac{\lambda^{r_1} a_2^{r_1}}{2r_1}\Bigr)^{m_1} \cdots \Bigl(\frac{\lambda^{r_s} a_2^{r_s}}{2r_s}\Bigr)^{m_s} (1 + o(1)).$$
It remains to prove that Σ_2 tends to zero. The summation in Σ_2 is taken over sets of composite indices in which at least one of the elements 1, 2, ..., n is encountered at least twice. A cycle corresponds to each of the composite indices. The existence of a common element in the cycles implies that the number M of distinct vertices contained in the cycles and the number L of distinct edges involved in the cycles satisfy L ≥ M + 1. We decompose the sum Σ_2 into a finite number of sums Σ_{M,L} containing summands with fixed values of the parameters M and L. By virtue of (2.4.3), for each of these sums, the estimate

$$\Sigma_{M,L} \le \frac{c}{n^{L-M}} \le \frac{c}{n}$$

holds because the number of summands does not exceed n^M, and the probability of L fixed edges occurring in G_{n,T} does not exceed cn^{-L}. This proves Theorem 2.4.3.

To prove Theorem 2.4.1, we need the following auxiliary assertion.
Lemma 2.4.3. Let ξ_1^{(n)}, ..., ξ_n^{(n)} be nonnegative integer-valued random variables such that for an arbitrary fixed s and arbitrary nonnegative integers k_1, ..., k_s,

$$\mathbf{P}\{\xi_1^{(n)} = k_1, \dots, \xi_s^{(n)} = k_s\} \to \frac{a_1^{k_1} \cdots a_s^{k_s}}{k_1! \cdots k_s!}\, e^{-a_1 - \cdots - a_s}$$

as n → ∞, where a_1, a_2, ... is a fixed sequence of nonnegative numbers. Moreover, suppose

$$\mathbf{P}\{\xi_{s+1}^{(n)} + \cdots + \xi_n^{(n)} > 0\} \to 0$$  (2.4.20)

as s → ∞, uniformly in n, and let

$$\sum_{k=1}^{\infty} a_k = \Lambda < \infty.$$

Then the distribution of the random variable ζ^{(n)} = ξ_1^{(n)} + ··· + ξ_n^{(n)} converges to the Poisson distribution with parameter Λ.
Proof. We show that for an arbitrary fixed ε > 0 and an arbitrary fixed m,

$$\Bigl|\mathbf{P}\{\zeta^{(n)} = m\} - \frac{\Lambda^m e^{-\Lambda}}{m!}\Bigr| < \varepsilon$$

for sufficiently large n. For fixed ε and m, there exists s such that

$$\Bigl|\frac{\Lambda_s^m e^{-\Lambda_s}}{m!} - \frac{\Lambda^m e^{-\Lambda}}{m!}\Bigr| < \frac{\varepsilon}{3},$$

where Λ_s = a_1 + ··· + a_s. It is not hard to see that

$$\bigl|\mathbf{P}\{\zeta^{(n)} = m\} - \mathbf{P}\{\zeta_s^{(n)} = m\}\bigr| \le \mathbf{P}\{\xi_{s+1}^{(n)} + \cdots + \xi_n^{(n)} > 0\},$$

where ζ_s^{(n)} = ξ_1^{(n)} + ··· + ξ_s^{(n)}. Therefore, by (2.4.20), |P{ζ^{(n)} = m} − P{ζ_s^{(n)} = m}| < ε/3 for sufficiently large s, uniformly in n. Finally, the conditions of the lemma yield the convergence of the distribution of ζ_s^{(n)} (for any fixed s) to the Poisson distribution with parameter Λ_s = a_1 + ··· + a_s. Therefore

$$\Bigl|\mathbf{P}\{\zeta_s^{(n)} = m\} - \frac{\Lambda_s^m e^{-\Lambda_s}}{m!}\Bigr| < \frac{\varepsilon}{3}$$

for sufficiently large n. Combining these three estimates proves the lemma.

Theorem 2.4.1 follows from Theorem 2.4.3 and Lemma 2.4.3, whose conditions are satisfied when λa_2 < 1.
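Lemma 2.4.3 is easy to illustrate numerically: independent counts with Poisson parameters a_k accumulate into a count that is Poisson with parameter Λ = Σ a_k. The following simulation sketch uses exactly Poisson summands; the parameters a_k, the seed, and the trial count are arbitrary illustrative choices, not taken from the text.

```python
import math
import random

def poisson_sample(a, rng):
    """Sample Poisson(a) by multiplying uniforms until exp(-a) is crossed."""
    k, acc, threshold = 0, rng.random(), math.exp(-a)
    while acc > threshold:
        acc *= rng.random()
        k += 1
    return k

rng = random.Random(1)
a = [0.5, 0.25, 0.125, 0.0625]   # the a_k; Lambda = sum(a) = 0.9375
lam = sum(a)
n_trials = 200_000
counts = {}
for _ in range(n_trials):
    z = sum(poisson_sample(ak, rng) for ak in a)  # zeta = xi_1 + ... + xi_4
    counts[z] = counts.get(z, 0) + 1

for m in range(5):               # empirical vs Poisson(Lambda) probabilities
    print(m, counts.get(m, 0) / n_trials,
          lam**m * math.exp(-lam) / math.factorial(m))
```

The empirical frequencies agree with the Poisson(Λ) probabilities up to sampling error.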
2.5. Notes and references

The investigation of the evolution of random graphs began when P. Erdős and A. Rényi published the results of their study [37] in 1960. Along with the basic properties of the random graph G_{n,T}, they discovered the effect known as a phase transition. At about the same time, V. E. Stepanov studied the graph G_{n,p}, as documented later [133, 134, 135]. Until recently, Stepanov's results had not seemed to receive wide recognition. In particular, Stepanov proved that if p = c/n, where c is a constant, c > 1, then the size of the giant component is asymptotically normal with mean nα(c) and variance nσ²(c), where

$$\alpha(c) = 1 - \frac{y}{c}, \qquad \sigma^2(c) = \frac{y(1 - y/c)}{c(1 - y)^2},$$

and y < 1 is the root of the equation y e^{-y} = c e^{-c}. A similar assertion for the graph G_{n,T} was proved by B. Pittel [123] about twenty years later. He found that the size of the giant component of G_{n,T} is asymptotically normal with parameters nα(c) and nσ²(c)(1 − 2y + 2y²/c) as n, T → ∞ and 2T/n → c > 1.

Many open questions concerning the evolution of random graphs remain. The main goal of this chapter is to demonstrate the approach based on the generalized scheme of allocation in investigations of the evolution of random graphs. Section 2.1 shows that fine properties of subcritical graphs can be obtained in a rather simple and natural way, especially as concerns the behavior of subcritical graphs near the critical point. The transition phenomena for the graph G_{n,T} were first considered by B. Bollobás [20]. The results presented in Section 2.1 can be found in [77]. The approach based on the generalized scheme of allocation allowed us to prove asymptotic normality of the number of unicyclic components and to find the limit distribution of the maximum sizes of trees and unicyclic components.

Section 2.2 is devoted to critical graphs. The behavior of random graphs near the critical point, and especially in the critical domain where the giant component appears, is very complicated and difficult to investigate. The investigations of this behavior are far from complete, but even now the results obtained could fill another book. Much information about random graphs can be found in the fundamental work by Bollobás [21] and in the book [105], which is devoted to the evolution of random graphs. A detailed investigation of the birth of the giant component
is given in [63]. Supercritical graphs are considered by Łuczak [99], who, in particular, proved that the right-hand bound of the critical domain is determined by the conditions n, T → ∞, (1 − 2T/n)³n → −∞. Formally, to analyze supercritical random graphs, we can use the representation of almost all such graphs as a combination of components of three types: one giant
component, trees, and unicyclic components. However, this approach is hampered by the absence of a simple formula for the number of connected graphs with n vertices and T edges with k = T − n > 0. Note that k = T − n is equal to the number of independent cycles in the graph and is called the cyclomatic number of the graph. Denote by c(n, k) the number of connected graphs with n labeled vertices and cyclomatic number k. It is clear that c(n, −1) is the number of trees, and by the Cayley formula, c(n, −1) = n^{n−2}, whereas c(n, 0) is the number of unicyclic graphs considered in Section 1.7. The numbers c(n, k) were investigated by Stepanov (see [10, 142, 143]) and E. M. Wright [151, 152] and are known as the Stepanov-Wright numbers (see [143]). As n → ∞ and k³/n → 0,
$$c(n, k) = d\,(3\pi)^{1/2} \Bigl(\frac{e}{12k}\Bigr)^{k/2} n^{n + (3k-1)/2}\,(1 + o(1)),$$

where, as was proved by Meertens, d = 1/(2π) (see Bender, Canfield, and McKay [16]). We hope that the results of the study by Bender et al. [17], who give the asymptotics of c(n, k) for all regular variations of the parameters n and k, can be used in the application of the generalized scheme to random graphs and help to bring the investigations of supercritical graphs to the level attained for the subcritical case in Section 2.1. Note that obtaining the limit distributions of numerical characteristics of supercritical graphs would be merely a problem of averaging if the joint distribution of the size of the giant component and the number of its edges were known.
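The constants α(c) and σ²(c) quoted at the beginning of these notes are easy to evaluate numerically: solve ye^{-y} = ce^{-c} for the root y < 1 and substitute. A sketch (bisection is our own choice of method, and c = 2 is an arbitrary example):

```python
import math

def giant_component_params(c):
    """Solve y*exp(-y) = c*exp(-c) for the root y < 1 by bisection,
    then return (y, alpha(c), sigma2(c)) as in Stepanov's theorem."""
    target = c * math.exp(-c)
    lo, hi = 0.0, 1.0                 # y*exp(-y) is increasing on [0, 1]
    for _ in range(200):
        mid = 0.5 * (lo + hi)
        if mid * math.exp(-mid) < target:
            lo = mid
        else:
            hi = mid
    y = 0.5 * (lo + hi)
    alpha = 1.0 - y / c               # mean fraction occupied by the giant component
    sigma2 = y * (1.0 - y / c) / (c * (1.0 - y) ** 2)
    return y, alpha, sigma2

print(giant_component_params(2.0))
```

For c = 2 this gives y ≈ 0.406 and a giant component holding about 0.797 of the vertices.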
The parameter θ = 2T/n plays the role of time in the evolution of random graphs. Therefore, each numerical characteristic of a random graph can be considered not only as a random variable, but also as a random process with the time parameter θ. Of significant interest is the approach using the convergence of such processes. This approach is used in the recent papers [34, 62, 127]. Note that the investigations of convergence of such random processes in combinatorial problems were started by B. A. Sevastyanov [132] and Yu. V. Bolotnikov [22, 23, 24].

The random graph G_{n,T} discussed in Section 2.3 was investigated by Kolchin [79, 83]. This graph provides an appropriate model of the graph corresponding to the left-hand side of a system of random congruences modulo 2 considered in the next chapter. An analog of Theorem 2.3.8 for bipartite graphs was proved by Saltykov [131].

The nonequiprobable version of the graph G_{n,T} is considered in Section 2.4, where the results of the papers [88, 66, 65] are presented. Here we use the method of moments. The lack of regular methods for an asymptotic analysis of nonequiprobable graphs makes it impossible to carry out anything approaching a complete investigation of such graphs. It seems to us that developing methods appropriate for the analysis of nonequiprobable combinatorial structures is a problem of great importance.
3
Systems of random linear equations in GF(2)
3.1. Rank of a matrix and critical sets

In this section, we consider systems of linear equations in GF(2), the field with elements 0 and 1. Let us begin with two examples where such systems appear.

Consider first a simple classification problem. Suppose we have a set of n objects of two sorts, for example, of two different weights. We may sequentially sample pairs of the objects from the set at random, compare the weights of the objects from the chosen pair, and determine whether the weights are identical or different. The problem is to identify the objects that have the same weight, or rather to estimate the probability of finding that solution. For a formal description of the situation, let {1, 2, ..., n} be the set of objects under consideration and let x_j be the unknown type of the object j, j = 1, ..., n. We may assume that x_1, ..., x_n take the values 0 and 1, depending on the class to which the object belongs. We choose a pair of objects i(t) and j(t) in the trial with number t, t = 1, ..., T, and let b_t be the result of their comparison: b_t = 0 if their weights are identical, and b_t = 1 otherwise. Thus, the results of the comparisons can be written as the following system of linear equations in GF(2):

$$x_{i(t)} + x_{j(t)} = b_t, \qquad t = 1, \dots, T.$$  (3.1.1)

It is clear that the system can be rewritten in the matrix form

$$AX = B,$$

where X = (x_1, ..., x_n) and B = (b_1, ..., b_T) are column vectors, and the elements a_{tj} of the matrix A = ||a_{tj}||, t = 1, ..., T, j = 1, ..., n, are random variables whose distribution is determined by the sampling procedure. It is convenient to associate the system, or more precisely the matrix A, with the random graph G_{n,T} with n vertices that correspond to the variables x_1, ..., x_n. The graph has T edges (i(t), j(t)), t = 1, ..., T. Therefore the graph can have loops and multiple edges, depending on the sampling procedure.
In this chapter, we consider the characteristics of the graph G_{n,T} that are related to some of the properties of the system (3.1.1). It is clear that the connectedness of the graph is an important characteristic for the classification problem. Indeed, in the case where the graph is connected, we can determine all values of the variables x_1, ..., x_n if we set one of them equal to 0 or 1. In both cases, the partitions of the set are the same, but the system has two different solutions. In the case where the graph G_{n,T} is disconnected, the system has more than two solutions; therefore a complete classification is impossible.
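The classification procedure just described can be implemented directly: process the comparisons x_{i(t)} + x_{j(t)} = b_t with a union-find structure that stores each object's parity relative to its component root, and report failure when the graph G_{n,T} is disconnected. A minimal sketch; the function name and example data are invented for illustration:

```python
def classify(n, comparisons):
    """comparisons: list of (i, j, b) encoding x_i + x_j = b over GF(2), 0-based.
    Returns one of the two solutions, or None if the graph is disconnected."""
    parent = list(range(n))
    parity = [0] * n                  # parity of a node relative to its parent

    def find(v):
        if parent[v] == v:
            return v, 0
        root, p = find(parent[v])
        parent[v] = root              # path compression, keeping parities consistent
        parity[v] ^= p
        return parent[v], parity[v]

    for i, j, b in comparisons:
        ri, pi = find(i)
        rj, pj = find(j)
        if ri != rj:
            parent[ri] = rj
            parity[ri] = pi ^ pj ^ b  # enforce x_i + x_j = b
        # (a contradictory repeated comparison would mean a wrong weighing;
        #  it is simply ignored in this sketch)
    if len({find(v)[0] for v in range(n)}) > 1:
        return None                   # disconnected: complete classification impossible
    return [find(v)[1] for v in range(n)]   # the solution with x_root = 0

# objects 0 and 1 have equal weight, object 2 differs
print(classify(3, [(0, 1, 0), (1, 2, 1)]))
```

With fewer comparisons than needed to connect the graph, the function returns None, matching the discussion above.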
Now let the vector B consist of independent random variables that take the values 0 and 1. If the balance is out of order, the weighings can sometimes be wrong, and the variables b_1, ..., b_T can differ from the true values. In this case, we obtain a system with distorted entries on the right-hand side that sometimes has no solution. If the balance is completely wrong, we may assume that the variables b_1, ..., b_T do not depend on the left-hand side of the system and take the values 0 and 1 with equal probabilities. In this situation, several natural problems arise. Does the right-hand side b_1, ..., b_T depend on the left-hand side of the system, or are the sides independent? Can we reconstruct the real values of x_1, ..., x_n in the case where the right-hand parts b_1, ..., b_T are distorted?

Let us turn to the second example. Let a vector (c_1, ..., c_n) in GF(2) be given. If we take an initial vector (x_1, ..., x_n), then we can develop the recurring sequence x_{n+t}, t = 1, 2, ..., by the following recurrence relation:

$$x_{n+t} = c_1 x_t + c_2 x_{t+1} + \cdots + c_n x_{n+t-1}, \qquad t = 1, 2, \dots.$$  (3.1.2)
This recurrence relation can be realized with the help of a device called a shift register, presented in Figure 3.1.1. A shift register consists of n cells or stages with labels 1, 2, ..., n. The n-dimensional (0,1)-vector of the contents of these stages is called the state of the shift register. At the initial moment, the state of the shift register under consideration is the vector (x_1, ..., x_n). The choice of the vector (c_1, ..., c_n) means that we choose the stages with numbers corresponding to the ones in the sequence c_1, ..., c_n and form the mod 2 sum x_{n+1} = c_1 x_1 + ··· + c_n x_n. At the next moment, the contents of all stages are shifted to the left so that x_n transfers to the stage numbered n − 1, x_{n−1} transfers to the stage n − 2, and so on; x_1 leaves the register, and the sum x_{n+1} = c_1 x_1 + ··· + c_n x_n is placed into the stage with label n. Thus the state (x_1, ..., x_n) transfers to the state (x_2, ..., x_{n+1}).
Figure 3.1.1. Shift register (stages holding x_t, ..., x_{t+n-1})
The process is repeated. Thus, if c_1, ..., c_n are given, then for any initial state (x_1, ..., x_n), the recurring sequence (3.1.2) satisfies

$$x_{n+1} = c_1 x_1 + \cdots + c_n x_n,$$
$$x_{n+2} = c_1 x_2 + \cdots + c_n x_{n+1},$$
$$\cdots$$
$$x_{n+T} = c_1 x_T + \cdots + c_n x_{n+T-1}.$$

Let us change the notation and put b_t = x_{n+t}, t = 1, 2, ..., T, and a_{11} = c_1, ..., a_{1n} = c_n. Then the first relation becomes

$$a_{11} x_1 + \cdots + a_{1n} x_n = b_1.$$

It is clear that we can substitute c_1 x_1 + ··· + c_n x_n for x_{n+1} in the second relation and obtain

$$a_{21} x_1 + \cdots + a_{2n} x_n = b_2.$$

In the same way, we obtain

$$a_{11} x_1 + \cdots + a_{1n} x_n = b_1,$$
$$\cdots$$
$$a_{T1} x_1 + \cdots + a_{Tn} x_n = b_T.$$  (3.1.3)
Suppose that the initial state (x_1, ..., x_n) is unknown and we observe the sequence b_1, ..., b_T. Then we can regard relations (3.1.3) as a system of linear equations with respect to the unknowns x_1, ..., x_n. A natural question is how many observations are needed to reconstruct the initial state and to obtain all elements of the sequence b_t, t = T + 1, ....

The other situation concerns the feedback points c_1, ..., c_n. Suppose we observe the sequence b_1, ..., b_T, but the vector (c_1, ..., c_n) determining the shift register is unknown. If the number of 1's in (c_1, ..., c_n) is k, then there are $\binom{n}{k}$ possibilities for this vector. If we use an exhaustive search to find the true vector that corresponds to the observed sequence, we have the following situation. If the chosen vector is true, then system (3.1.3) is consistent for any T, but if the vector (c_1, ..., c_n) is wrong, then the system becomes inconsistent for some T. Therefore the consistency of the system (3.1.3) serves as a test for selecting the true vector.
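The recurrence (3.1.2) takes only a few lines to simulate; its output is the sequence b_t = x_{n+t} observed from the shift register. A sketch (the feedback vector and initial state below are arbitrary illustrative choices):

```python
def lfsr_output(c, x, T):
    """b_1, ..., b_T from (3.1.2): x_{n+t} = c_1 x_t + ... + c_n x_{n+t-1} (mod 2)."""
    state = list(x)                   # current window (x_t, ..., x_{t+n-1})
    out = []
    for _ in range(T):
        new = sum(ci * xi for ci, xi in zip(c, state)) % 2
        out.append(new)               # b_t = x_{n+t}
        state = state[1:] + [new]     # shift left, feed the sum back in
    return out

# n = 4, feedback vector (1, 0, 0, 1), initial state (0, 0, 0, 1)
print(lfsr_output([1, 0, 0, 1], [0, 0, 0, 1], 16))
```

Since a nonzero 4-stage register has at most 2^4 − 1 = 15 states, the output here is periodic with period dividing 15.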
Let us introduce the auxiliary notions of a critical set and a hypercycle for our investigations of systems of linear equations in GF(2). Note that the ordinary notions of linear algebra, such as the notion of linear independence of vectors, rank of a matrix, Cramer's rule for finding the solutions of linear systems of equations,
and so on, are extended in the obvious way to the n-dimensional vector space over GF(2). For example, if the rank of a T × n matrix A = ||a_{tj}|| in GF(2) is r, then the homogeneous system of equations

$$AX = 0,$$

where X = (x_1, ..., x_n) is the column vector of unknowns, has exactly n − r linearly independent solutions.

Denote by

$$a_t = (a_{t1}, \dots, a_{tn}), \qquad t = 1, \dots, T,$$

the rows of the matrix A. If the coordinatewise sum

$$a_{t_1} + \cdots + a_{t_m} = 0,$$

then the set C = {t_1, ..., t_m} of row indices is called a critical set. If C_1 and C_2 are critical sets and C_1 ≠ C_2, then

$$C_1 \,\Delta\, C_2 = (C_1 \cup C_2) \setminus (C_1 \cap C_2)$$

is also a critical set. Let ε_1, ..., ε_s take the values 0 and 1. Critical sets C_1, ..., C_s are called independent if

$$\varepsilon_1 C_1 \,\Delta\, \varepsilon_2 C_2 \,\Delta\, \cdots \,\Delta\, \varepsilon_s C_s = \varnothing$$

if and only if ε_1 = ··· = ε_s = 0. Denote by s(A) the maximum number of independent critical sets and by r(A) the rank of the matrix A.
Theorem 3.1.1. For any T × n matrix A in GF(2),

$$s(A) + r(A) = T.$$

Proof. We consider the homogeneous system of equations

$$A'Y = 0$$  (3.1.4)

in GF(2), where A' is the transpose of A. There is a one-to-one correspondence between the solutions of the system (3.1.4) and the critical sets: the solution Y_{t_1,...,t_m} = (y_1, ..., y_T), whose components y_{t_1}, ..., y_{t_m} are 1 and the other components are zero, corresponds to the critical set C = {t_1, ..., t_m}. The linear independence of solutions corresponds to the independence of critical sets. Therefore the maximum number of independent critical sets s(A) equals the maximum number of linearly independent solutions of system (3.1.4), which we know is T − r(A).
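Theorem 3.1.1 can be checked by machine on small examples: compute r(A) by Gaussian elimination over GF(2), and s(A) as T minus the rank of the transpose, i.e., the number of independent solutions of A'Y = 0. A sketch with rows stored as integer bitmasks (the test matrix is arbitrary):

```python
def gf2_rank(rows):
    """Gaussian elimination over GF(2); each row is an int bitmask."""
    rows = list(rows)
    rank = 0
    n_bits = max((r.bit_length() for r in rows), default=0)
    for bit in reversed(range(n_bits)):
        pivot = next((i for i in range(rank, len(rows)) if (rows[i] >> bit) & 1), None)
        if pivot is None:
            continue
        rows[rank], rows[pivot] = rows[pivot], rows[rank]
        for i in range(len(rows)):
            if i != rank and (rows[i] >> bit) & 1:
                rows[i] ^= rows[rank]     # clear this pivot bit in every other row
        rank += 1
    return rank

def transpose_bits(rows, n):
    """Columns of a T x n bitmask matrix, returned as T-bit masks."""
    return [sum(((rows[t] >> j) & 1) << t for t in range(len(rows)))
            for j in range(n)]

A = [0b011, 0b110, 0b101, 0b011]   # arbitrary 4 x 3 matrix; last row repeats the first
T, n = len(A), 3
r = gf2_rank(A)
s = T - gf2_rank(transpose_bits(A, n))  # independent solutions of A'Y = 0
print(r, s, s + r == T)                 # Theorem 3.1.1
```

Here rows 1 and 4 coincide and row 3 is the sum of rows 1 and 2, so r = 2 and s = 2.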
In addition to the critical sets of a T × n matrix A = ||a_{tj}||, we consider a hypergraph G_A that is also defined by the matrix A. The set of vertices of the hypergraph G_A is the set {1, ..., n} of column indices, and the set of enumerated hyperedges is the set {e_1, ..., e_T}, where

$$e_t = \{j : a_{tj} = 1\}, \qquad t = 1, \dots, T.$$

Thus there exists a correspondence between a row a_t = (a_{t1}, ..., a_{tn}) and the hyperedge e_t, t = 1, ..., T. Note that the empty set corresponds to a row consisting of zeros. The multiplicity of a vertex j in a set of hyperedges C = {e_{t_1}, ..., e_{t_m}} is the number of hyperedges in C that contain this vertex. A set of hyperedges C = {e_{t_1}, ..., e_{t_m}} is called a hypercycle if each vertex of the hypergraph G_A has an even multiplicity in C; in other words, if the coordinatewise sum of rows a_{t_1} + ··· + a_{t_m} in GF(2) equals the zero vector. If each row of the matrix A contains exactly two 1's, then the hypergraph G_A is an ordinary graph, perhaps with multiple edges, and a hypercycle is an ordinary cycle or a union of cycles. The set of the indices of hyperedges that form a hypercycle is a critical set for the matrix A. Let ε_1, ..., ε_s take the values 0 and 1. Hypercycles C_1, ..., C_s are independent if

$$\varepsilon_1 C_1 \,\Delta\, \varepsilon_2 C_2 \,\Delta\, \cdots \,\Delta\, \varepsilon_s C_s = \varnothing$$

if and only if ε_1 = ··· = ε_s = 0. Therefore the maximum number s(A) of independent critical sets of the matrix A equals the maximum number of independent hypercycles in G_A.
3.2. Matrices with independent elements

This section deals with random matrices with independent elements. Let A = ||a_{tj}|| be a T × n matrix whose elements are independent random variables taking the values 0 and 1 with equal probabilities, and let ρ_n(T) be the rank of the matrix A in GF(2). The following theorem is the main result of this section.

Theorem 3.2.1. Let s ≥ 0 and m be fixed integers, m + s ≥ 0. If n → ∞ and T = n + m, then

$$\mathbf{P}\{\rho_n(T) = n - s\} \to 2^{-s(m+s)} \prod_{i=s+1}^{\infty} \Bigl(1 - \frac{1}{2^i}\Bigr) \prod_{i=1}^{m+s} \Bigl(1 - \frac{1}{2^i}\Bigr)^{-1},$$

where the last product equals 1 for m + s = 0.
Proof. The limit theorem will be proved by using an explicit formula for P{ρ_n(T) = n − s}. Denote by ρ_n(t) the rank of the submatrix of A which consists
of the first t rows of the matrix A. We interpret the parameter t as time and consider the process of sequential growth of the number of rows. Let ξ_t = 1 if the rank ρ_n(t − 1) increases after joining the tth row, and ξ_t = 0 if the rank preserves its previous value. It is clear that

$$\rho_n(t) = \xi_1 + \cdots + \xi_t.$$

It is not difficult to describe the probabilistic properties of the random variables ξ_1, ..., ξ_T. The event {ξ_t = 1} means that the tth row is linearly independent of the rows with numbers 1, ..., t − 1, and the event {ξ_t = 0} means that the row with number t is a linear combination of the preceding rows. If among the preceding t − 1 rows there are exactly k linearly independent n-dimensional vectors, then the linear span of these k vectors contains 2^k vectors (all linear combinations of these k vectors). The matrix A is constructed in such a way that each row can be obtained by sampling with replacement from a box containing all 2^n distinct n-dimensional vectors. In other words, any row of the matrix A is independent of all other rows and is equal to any fixed n-dimensional vector with probability 2^{-n}. Therefore

$$\mathbf{P}\{\xi_t = 0 \mid \rho_n(t-1) = k\} = \frac{2^k}{2^n}, \qquad \mathbf{P}\{\xi_t = 1 \mid \rho_n(t-1) = k\} = 1 - \frac{2^k}{2^n}.$$  (3.2.1)
Thus the process ρ_n(t) is a Markov chain with stationary transition probabilities given by (3.2.1). To find P{ρ_n(T) = n − s}, we can sum the probabilities of all trajectories of the Markov chain that lead from the origin to the point with coordinates (n + m, n − s), that is, the trajectories such that ρ_n(0) = 0, ρ_n(n + m) = n − s. If we represent a trajectory as a "broken line" with intervals of growth and horizontal intervals, we see that any such broken line has exactly n + m − (n − s) = m + s horizontal intervals corresponding to the m + s zeros among the values of ξ_1, ..., ξ_{n+m}. The graph of the trajectory with ξ_{t_1} = ··· = ξ_{t_{m+s}} = 0 is illustrated in Figure 3.2.1. By using (3.2.1) and Figure 3.2.1, we can easily write an explicit formula for the probability of a particular trajectory and for the total probability. The derivation of this probability is quite simple if m + s = 0. Indeed, the only trajectory with ρ_n(0) = 0 and ρ_n(n + m) = n + m has no horizontal intervals, and at each step the broken line increases; therefore
$$\mathbf{P}\{\rho_n(n+m) = n - s\} = \Bigl(1 - \frac{1}{2^n}\Bigr)\Bigl(1 - \frac{2}{2^n}\Bigr)\cdots\Bigl(1 - \frac{2^{n+m-1}}{2^n}\Bigr) = \prod_{i=-m+1}^{n}\Bigl(1 - \frac{1}{2^i}\Bigr),$$
Figure 3.2.1. Graph of the trajectory with ξ_{t_1} = ··· = ξ_{t_{m+s}} = 0 (the rank grows to n − s over the time interval up to t = n + m, with horizontal steps at t_1, ..., t_{m+s})
and in the case m + s = 0, as n → ∞,

$$\mathbf{P}\{\rho_n(n+m) = n - s\} \to \prod_{i=s+1}^{\infty}\Bigl(1 - \frac{1}{2^i}\Bigr).$$
This coincides with the assertion of the theorem for m + s = 0 because the last product equals 1. In the general case, for m + s > 0,

$$\mathbf{P}\{\rho_n(n+m) = n-s\} = \sum \mathbf{P}\{\xi_1 = 1, \dots, \xi_{t_1-1} = 1, \xi_{t_1} = 0, \dots, \xi_{t_{m+s}} = 0, \dots, \xi_{n+m} = 1\}$$
$$= \prod_{k=0}^{n-s-1}\Bigl(1 - \frac{2^k}{2^n}\Bigr) \sum \frac{2^{t_1 - 1 + t_2 - 2 + \cdots + t_{m+s} - (m+s)}}{2^{n(m+s)}},$$

where both sums are taken over all 1 ≤ t_1 < ··· < t_{m+s} ≤ n + m. Indeed, at the moment t_l the rank equals t_l − l, so the horizontal interval at t_l contributes the factor P{ξ_{t_l} = 0 | ρ_n(t_l − 1) = t_l − l} = 2^{t_l − l}/2^n, while the n − s intervals of growth contribute the factors 1 − 2^k/2^n, k = 0, ..., n − s − 1, which do not depend on t_1, ..., t_{m+s}.
As will be seen from the following evaluations, the moments t_1, ..., t_{m+s} are concentrated at the end of the trajectory; therefore, in the sum of this formula, it is convenient to switch to the variables

$$i_l = -(t_l - l + s - n), \qquad l = 1, \dots, m+s.$$

It follows from 1 ≤ t_1 < ··· < t_{m+s} ≤ n + m that the domain of summation in terms of the new variables is

$$0 \le i_{m+s} \le i_{m+s-1} \le \cdots \le i_1 \le n - s.$$

Since 2^{t_l − l − n} = 2^{−s − i_l} and ∏_{k=0}^{n−s−1}(1 − 2^k/2^n) = ∏_{i=s+1}^{n}(1 − 2^{−i}), taking the factor 2^{−s(m+s)} out of the sum yields

$$\mathbf{P}\{\rho_n(n+m) = n-s\} = 2^{-s(m+s)} \prod_{i=s+1}^{n}\Bigl(1 - \frac{1}{2^i}\Bigr) \sum_{0 \le i_{m+s} \le \cdots \le i_1 \le n-s} 2^{-i_1 - \cdots - i_{m+s}}.$$  (3.2.2)

It is easily seen that, as n → ∞,

$$\prod_{i=s+1}^{n}\Bigl(1 - \frac{1}{2^i}\Bigr) \to \prod_{i=s+1}^{\infty}\Bigl(1 - \frac{1}{2^i}\Bigr)$$  (3.2.3)

and

$$\sum_{0 \le i_{m+s} \le \cdots \le i_1 \le n-s} 2^{-i_{m+s} - \cdots - i_1} \to \sum_{0 \le i_{m+s} \le \cdots \le i_1 < \infty} 2^{-i_{m+s} - \cdots - i_1}.$$  (3.2.4)
To complete the proof it remains to transform the right-hand side of (3.2.4). It
is not difficult to see that, writing r = m + s,

$$\sum_{0 \le i_r \le \cdots \le i_1 < \infty} 2^{-i_1 - \cdots - i_r} = \Bigl(1 - \frac{1}{2}\Bigr)^{-1} \sum_{0 \le i_r \le \cdots \le i_2 < \infty} 2^{-2i_2 - i_3 - \cdots - i_r}$$
$$= \Bigl(1 - \frac{1}{2}\Bigr)^{-1}\Bigl(1 - \frac{1}{2^2}\Bigr)^{-1} \sum_{0 \le i_r \le \cdots \le i_3 < \infty} 2^{-3i_3 - i_4 - \cdots - i_r} = \cdots = \prod_{l=1}^{r}\Bigl(1 - \frac{1}{2^l}\Bigr)^{-1},$$  (3.2.5)

since summation over i_1 from i_2 to ∞ gives Σ_{i_1 ≥ i_2} 2^{−i_1} = (1 − 1/2)^{−1} 2^{−i_2}, and similarly at each subsequent step.
Passing to the limit in (3.2.2) and taking into account (3.2.3), (3.2.4), and (3.2.5) provide the assertion of the theorem.
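The limit law of Theorem 3.2.1 is already visible at moderate n in simulation: generate square random matrices over GF(2) (the case m = 0) and tabulate the rank deficiency s = n − ρ_n(n). A sketch; the matrix size, trial count, seed, and truncation of the infinite product are arbitrary choices:

```python
import random

def gf2_rank(rows):
    """Gaussian elimination over GF(2); each row is an int bitmask."""
    rows = list(rows)
    rank = 0
    n_bits = max((r.bit_length() for r in rows), default=0)
    for bit in reversed(range(n_bits)):
        pivot = next((i for i in range(rank, len(rows)) if (rows[i] >> bit) & 1), None)
        if pivot is None:
            continue
        rows[rank], rows[pivot] = rows[pivot], rows[rank]
        for i in range(len(rows)):
            if i != rank and (rows[i] >> bit) & 1:
                rows[i] ^= rows[rank]
        rank += 1
    return rank

def limit_prob(s, m=0, terms=60):
    """Limiting P{rho_n(n+m) = n - s} from Theorem 3.2.1 (products truncated)."""
    p = 2.0 ** (-s * (m + s))
    for i in range(s + 1, terms):
        p *= 1.0 - 2.0 ** (-i)
    for i in range(1, m + s + 1):
        p /= 1.0 - 2.0 ** (-i)
    return p

rng = random.Random(7)
n, trials = 24, 5000
freq = {}
for _ in range(trials):
    A = [rng.getrandbits(n) for _ in range(n)]   # square matrix: T = n, m = 0
    s = n - gf2_rank(A)
    freq[s] = freq.get(s, 0) + 1

for s in sorted(freq):
    print(s, freq[s] / trials, round(limit_prob(s), 4))
```

The empirical frequencies of s = 0, 1, 2 come out near 0.289, 0.578, and 0.128, matching the limit.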
Let the elements of a T × n matrix A = ||a_{tj}|| be independent and take the values 0 and 1 with equal probabilities. We consider the system of equations

$$AX = 0$$  (3.2.6)

with respect to the unknowns X = (x_1, ..., x_n) in GF(2). Denote by ν_{n,T} the number of linearly independent solutions of this system of equations. If the rank ρ_n(T) of the matrix A equals r, then ν_{n,T} = n − r. Therefore Theorem 3.2.1 yields the following assertion.

Theorem 3.2.2. Let s ≥ 0 and m be fixed integers, m + s ≥ 0. If n → ∞, then

$$\mathbf{P}\{\nu_{n,n+m} = s\} \to 2^{-s(m+s)} \prod_{i=s+1}^{\infty}\Bigl(1 - \frac{1}{2^i}\Bigr) \prod_{i=1}^{m+s}\Bigl(1 - \frac{1}{2^i}\Bigr)^{-1},$$

where the last product equals 1 for m + s = 0. In particular, for m = s = 0,

$$\mathbf{P}\{\nu_{n,n} = 0\} \to \prod_{i=1}^{\infty}\Bigl(1 - \frac{1}{2^i}\Bigr) = 0.28878809\dots.$$
The results of Theorems 3.2.1 and 3.2.2 are of special interest because they are stable in the sense that the limit distribution of the rank of a matrix is invariant with respect to deviations of the distributions of its elements from the equiprobable distribution.

Theorem 3.2.3. Let the elements of a T × n matrix A = ||a_{tj}|| be independent, and suppose there is a positive constant δ such that, for the probabilities p_{tj}^{(n)} = P{a_{tj} = 1}, the inequalities

$$\delta \le p_{tj}^{(n)} \le 1 - \delta$$

hold for all t and j. Let s ≥ 0 and m be fixed integers, m + s ≥ 0. Then, as n → ∞,

$$\mathbf{P}\{\rho_n(n+m) = n - s\} \to 2^{-s(m+s)} \prod_{i=s+1}^{\infty}\Bigl(1 - \frac{1}{2^i}\Bigr) \prod_{i=1}^{m+s}\Bigl(1 - \frac{1}{2^i}\Bigr)^{-1},$$

where the last product equals 1 for m + s = 0.

Because these results are outside the main combinatorial direction of this book, we omit the complicated proof of this theorem (see, e.g., [93]). We illustrate the situation by proving that, under the conditions of Theorem 3.2.3, the mean value of the number of nontrivial solutions of system (3.2.6) is invariant to deviations of the distributions of elements of A from the equiprobable distribution. Let μ_{n,T} be the number of nontrivial (i.e., nonzero) solutions of system (3.2.6). If we associate to each vector X an indicator that is 1 if X satisfies the system, then

$$\mathbf{E}\,\mu_{n,T} = \sum_{X \ne 0} \mathbf{P}\{AX = 0\}.$$
We will evaluate E μ_{n,T} by using the following lemma on summation of independent random variables in GF(2).

Lemma 3.2.1. Let ξ_1, ..., ξ_n be independent random variables that take the values 0 and 1 with probabilities

$$\mathbf{P}\{\xi_i = 1\} = \frac{1 - \Delta_i}{2}, \qquad \mathbf{P}\{\xi_i = 0\} = \frac{1 + \Delta_i}{2}.$$

Then, in GF(2),

$$\mathbf{P}\{\xi_1 + \cdots + \xi_n = 1\} = \frac{1 - \Delta_1 \cdots \Delta_n}{2}.$$

Proof. It is clear that it suffices to prove the assertion of the lemma for n = 2. In
that case,

$$\mathbf{P}\{\xi_1 + \xi_2 = 1\} = \mathbf{P}\{\xi_1 = 1, \xi_2 = 0\} + \mathbf{P}\{\xi_1 = 0, \xi_2 = 1\} = \frac{(1 - \Delta_1)(1 + \Delta_2)}{4} + \frac{(1 + \Delta_1)(1 - \Delta_2)}{4} = \frac{1 - \Delta_1 \Delta_2}{2}.$$
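Lemma 3.2.1 says that the bias Δ of a mod 2 sum is the product of the individual biases; this is easy to confirm by exact enumeration over all 0-1 vectors. A sketch (the Δ values are arbitrary):

```python
from itertools import product

def prob_sum_is_one(deltas):
    """Exact P{xi_1 + ... + xi_n = 1 (mod 2)} when P{xi_i = 1} = (1 - d_i)/2."""
    total = 0.0
    for bits in product((0, 1), repeat=len(deltas)):
        p = 1.0
        for b, d in zip(bits, deltas):
            p *= (1 - d) / 2 if b else (1 + d) / 2
        if sum(bits) % 2 == 1:
            total += p
    return total

deltas = [0.3, -0.5, 0.8]
lhs = prob_sum_is_one(deltas)
rhs = (1 - 0.3 * (-0.5) * 0.8) / 2   # (1 - Delta_1 Delta_2 Delta_3) / 2
print(lhs, rhs)
```

Both values coincide, as the lemma asserts.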
If the elements of A are independent and take the values 0 and 1 with equal probabilities, then by Lemma 3.2.1, for any X ≠ 0,

$$\mathbf{P}\{AX = 0\} = \bigl(\mathbf{P}\{a_{11}x_1 + \cdots + a_{1n}x_n = 0\}\bigr)^T = 2^{-T}.$$

Therefore E μ_{n,T} = (2^n − 1)2^{−T}, and for T = n + m, where m is a fixed integer,

$$\mathbf{E}\,\mu_{n,n+m} = \frac{1}{2^m} - \frac{1}{2^{n+m}},$$

and as n → ∞,

$$\mathbf{E}\,\mu_{n,n+m} \to \frac{1}{2^m}.$$
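The exact value E μ_{n,n+m} = 2^{−m} − 2^{−(n+m)} can be confirmed by the same indicator computation, and with moderately unequal probabilities the mean stays near 2^{−m}, in the spirit of Theorem 3.2.4 below. A brute-force sketch built on Lemma 3.2.1; all numerical choices (n, m, the bias range, the seed) are illustrative:

```python
from itertools import combinations
import random

def mean_solutions(delta):
    """Exact E mu_{n,T} when P{a_tj = 1} = (1 - delta[t][j]) / 2,
    computed from the indicator sum and Lemma 3.2.1."""
    T, n = len(delta), len(delta[0])
    total = 0.0
    for k in range(1, n + 1):
        for js in combinations(range(n), k):   # support of a nonzero vector X
            p = 1.0
            for t in range(T):
                prod = 1.0
                for j in js:
                    prod *= delta[t][j]
                p *= (1 + prod) / 2            # P{chosen entries of row t sum to 0}
            total += p
    return total

rng = random.Random(3)
n, m = 10, 1
T = n + m
print(mean_solutions([[0.0] * n for _ in range(T)]))  # equals 2^-m - 2^-(n+m)
biased = [[rng.uniform(-0.2, 0.2) for _ in range(n)] for _ in range(T)]
print(mean_solutions(biased))                         # stays near 2^-m = 0.5
```

The equiprobable case reproduces the exact formula above, and the biased case deviates from 2^{−m} only slightly.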
Under some conditions on the nonequiprobable distribution of the matrix A, the last result still holds. Let

$$p_{tj}^{(n)} = \mathbf{P}\{a_{tj} = 1\},$$

and, as before, denote by μ_{n,T} the number of nontrivial solutions of system (3.2.6).

Theorem 3.2.4. Under the conditions of Theorem 3.2.3, with T = n + m,

$$\mathbf{E}\,\mu_{n,T} \to 2^{-m}.$$

Proof. By using the indicators as in the calculation of the mean number of solutions in the equiprobable case, we find that

$$\mathbf{E}\,\mu_{n,T} = \sum_{X \ne 0} \mathbf{P}\{AX = 0\} = \sum_{k=1}^{n} \sum_{1 \le j_1 < \cdots < j_k \le n} P_{j_1, \dots, j_k},$$  (3.2.7)

where, for any fixed set {j_1, ..., j_k} from the domain of summation, the term P_{j_1,...,j_k} = P{AX = 0} corresponds to the vector X = (x_1, ..., x_n) whose elements with indices j_1, ..., j_k are 1 and the remaining elements are zero. We represent the probabilities p_{tj}^{(n)} as

$$p_{tj}^{(n)} = \frac{1 - \Delta_{tj}}{2}.$$
According to the conditions of the theorem, there exists Δ < 1 such that |Δ_{tj}| ≤ Δ for all t and j. Since the rows of A are independent,

$$P_{j_1, \dots, j_k} = \prod_{t=1}^{T} p_{j_1, \dots, j_k}^{(t)},$$

where

$$p_{j_1, \dots, j_k}^{(t)} = \mathbf{P}\{a_{tj_1} + \cdots + a_{tj_k} = 0\}.$$

By Lemma 3.2.1,

$$\mathbf{P}\{a_{tj_1} + \cdots + a_{tj_k} = 0\} = \frac{1 + \Delta_{tj_1} \cdots \Delta_{tj_k}}{2},$$

and for all t and 1 ≤ j_1 < ··· < j_k ≤ n,

$$\frac{1 - \Delta^k}{2} \le p_{j_1, \dots, j_k}^{(t)} \le \frac{1 + \Delta^k}{2}.$$

Hence, for P_{j_1,...,j_k}, we obtain the bounds

$$\Bigl(\frac{1 - \Delta^k}{2}\Bigr)^T \le P_{j_1, \dots, j_k} \le \Bigl(\frac{1 + \Delta^k}{2}\Bigr)^T.$$

By using these inequalities, we find from (3.2.7) that

$$\sum_{k=1}^{n} \binom{n}{k} \Bigl(\frac{1 - \Delta^k}{2}\Bigr)^T \le \mathbf{E}\,\mu_{n,T} \le \sum_{k=1}^{n} \binom{n}{k} \Bigl(\frac{1 + \Delta^k}{2}\Bigr)^T.$$  (3.2.8)
Now let T = n + m, where m is a fixed integer. The left and the right sides of (3.2.8) can be estimated in the same way; therefore we obtain only an estimate of the right-hand side. Let

$$S(\Delta) = \sum_{k=1}^{n} \binom{n}{k} \Bigl(\frac{1 + \Delta^k}{2}\Bigr)^{n+m}$$

and compare S(Δ) to

$$S(0) = \sum_{k=1}^{n} \binom{n}{k} \frac{1}{2^{n+m}}.$$

We have seen that S(0) → 2^{−m} as n → ∞. We show that for any fixed Δ, 0 < Δ < 1, the difference S(Δ) − S(0) tends to zero. We divide S(Δ) into
two parts:

$$S_1(\Delta) = \sum_{k=1}^{\varepsilon n} \binom{n}{k} \Bigl(\frac{1 + \Delta^k}{2}\Bigr)^{n+m}, \qquad S_2(\Delta) = \sum_{k=\varepsilon n + 1}^{n} \binom{n}{k} \Bigl(\frac{1 + \Delta^k}{2}\Bigr)^{n+m},$$

where ε, 0 < ε < 1/2, will be chosen later. For the sake of simplicity, suppose that ε is such that εn is an integer; then for any ε and Δ, 0 < Δ < 1, 0 < ε < 1/2,

$$S_1(\Delta) \le \varepsilon n \binom{n}{\varepsilon n} \Bigl(\frac{1 + \Delta}{2}\Bigr)^{n+m} \le \varepsilon n\, \frac{n^{\varepsilon n}}{(\varepsilon n)!} \Bigl(\frac{1 + \Delta}{2}\Bigr)^{n+m} \le \varepsilon n \Bigl(\frac{e}{\varepsilon}\Bigr)^{\varepsilon n} \Bigl(\frac{1 + \Delta}{2}\Bigr)^{n+m}$$

by using the inequality n! > n^n e^{−n}. This bound for S_1(Δ) can be written as

$$S_1(\Delta) \le \varepsilon n \Bigl(\frac{1 + \Delta}{2}\Bigr)^{m} \Bigl(\frac{1 + \Delta}{2\,\varepsilon^{\varepsilon} e^{-\varepsilon}}\Bigr)^{n}.$$

If we choose a sufficiently small ε, we can make the value (1 + Δ)/(2ε^ε e^{−ε}) less than 1. For such ε, the bound tends to zero as n → ∞. Thus, there exists a fixed ε, 0 < ε < 1/2, such that the values S_1(Δ) and, consequently, S_1(0) tend to zero, and S_1(Δ) − S_1(0) → 0.

We now estimate the difference S_2(Δ) − S_2(0). It is clear that

$$0 \le S_2(\Delta) - S_2(0) = \sum_{k=\varepsilon n+1}^{n} \binom{n}{k} \frac{1}{2^{n+m}}\bigl((1 + \Delta^k)^{n+m} - 1\bigr) \le \bigl((1 + \Delta^{\varepsilon n})^{n+m} - 1\bigr) \sum_{k=\varepsilon n+1}^{n} \binom{n}{k} \frac{1}{2^{n+m}} \le (1 + \Delta^{\varepsilon n})^{n+m} - 1.$$

Since (1 + Δ^{εn})^{n+m} → 1 as n → ∞, it follows from the estimate obtained above that S_2(Δ) − S_2(0) → 0. Thus we have shown that S(Δ) − S(0) → 0 and S(0) → 2^{−m}; hence, S(Δ) → 2^{−m}. Theorem 3.2.4 is thus proved.
We can actually relax the hypotheses of Theorem 3.2.4. The result remains true if, for t = 1, ..., T, j = 1, ..., n,

$$\frac{\log n + x_n}{n} \le p_{tj}^{(n)} \le 1 - \frac{\log n + x_n}{n},$$

where x_n tends to infinity arbitrarily slowly (see [93]). These bounds are exact in a sense because, as we will show in the next section, the limit distribution of the rank of the matrix A differs from the distribution given in Theorem 3.2.1 if the probability of 1's does not satisfy these inequalities.
3.3. Rank of sparse matrices

In Section 3.1, we introduced the notion of critical sets of a matrix. Recall that a set {t_1, ..., t_m} of row indices of a matrix in GF(2) is called critical if the coordinatewise sum of the rows with indices t_1, ..., t_m is the zero vector. The notion of independence of critical sets was also introduced, and s(A) denoted the maximum number of independent critical sets of a matrix A. According to Theorem 3.1.1, the rank r(A) of a matrix A is related to s(A) by the equality s(A) + r(A) = T. Therefore, instead of the rank of a matrix, we can investigate the maximum number s(A) of independent critical sets of the matrix. In this section, critical sets are applied in the analysis of the rank of random sparse matrices. Let the elements of a T × n matrix A = ||a_{tj}|| be independent random variables such that

$$\mathbf{P}\{a_{tj} = 1\} = \frac{\log n + x}{n}, \qquad \mathbf{P}\{a_{tj} = 0\} = 1 - \frac{\log n + x}{n},$$  (3.3.1)

where x is a constant, t = 1, ..., T, j = 1, ..., n. We find the limit distribution of s(A) for such a matrix.

Theorem 3.3.1. If n, T → ∞ such that T/n → α, 0 < α < 1, and condition (3.3.1) is valid, then the distribution of the maximum number of independent critical sets s(A) converges to the Poisson distribution with parameter λ = αe^{−x}.

We show first that the distribution of the number of critical sets that correspond to zero rows of the matrix converges to a Poisson distribution. Denote the number of zero rows of the matrix A by l_{n,T}.

Lemma 3.3.1. If n, T → ∞ such that T/n → α, 0 < α < ∞, and condition (3.3.1) is valid, then for any fixed k = 0, 1, ...,

$$\mathbf{P}\{l_{n,T} = k\} \to \frac{\lambda^k}{k!}\,e^{-\lambda},$$

where λ = αe^{−x}.
Proof. The probability p_n that a fixed row consists entirely of zeros is

p_n = (1 − (log n + x)/n)^n,

and under the conditions of the lemma,

p_n = (1/n) e^{−x} (1 + o(1)).

The random variable l_{n,T} has the binomial distribution with parameters (T, p_n), where T is the number of trials and p_n is the probability of success. Under the conditions of the lemma, the mean number of successes T p_n tends to αe^{−x}; hence, the binomial distribution converges to the Poisson distribution with parameter αe^{−x}.

We now prove that if α < 1, then with probability tending to 1, all critical sets consist of only zero rows.

Lemma 3.3.2. If n, T → ∞ such that T/n → α, α < 1, and condition (3.3.1) is valid, then with probability tending to 1, the critical sets of A consist of only zero rows.
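The binomial-to-Poisson computation in the proof of Lemma 3.3.1 can be illustrated numerically. The sketch below (not from the book; the values α = 0.5 and x = 0.3 are arbitrary illustrative choices) computes the mean number of zero rows T·p_n and compares it with the limit αe^{−x}.

```python
# Sketch: convergence of the mean number of zero rows T * p_n to alpha * e^{-x}.
# alpha = 0.5 and x = 0.3 are arbitrary illustrative parameter choices.
import math

def mean_zero_rows(n, alpha, x):
    """Mean number of all-zero rows of a T x n matrix with T = alpha * n."""
    T = int(alpha * n)
    p_n = (1.0 - (math.log(n) + x) / n) ** n   # probability a row is entirely zero
    return T * p_n

alpha, x = 0.5, 0.3
for n in (10**2, 10**3, 10**4, 10**5, 10**6):
    print(n, round(mean_zero_rows(n, alpha, x), 5))
print("limit alpha*e^-x:", round(alpha * math.exp(-x), 5))
```

The printed values approach αe^{−x} ≈ 0.37041 as n grows, in line with the lemma.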
Proof. We consider the total number of critical sets that each contain at least one nonzero row. It is sufficient to prove that the mathematical expectation of this number tends to zero. Although the proof of this fact is straightforward, it involves many cumbersome estimations of sums containing binomial coefficients.

An even number of successes among k independent trials with probability of success p occurs with probability (1 + (q − p)^k)/2, where q = 1 − p. Let us find the probability that k fixed rows form a critical set. The indices of these rows form a critical set if each column of the submatrix formed by these rows contains an even number of 1's. According to the remark on the probability that the number of successes is even, for a single column this probability equals

(1 + (1 − 2(log n + x)/n)^k)/2.

Therefore the probability that these k rows constitute a critical set equals

((1 + (1 − 2(log n + x)/n)^k)/2)^n.

Note that the probability that there is no 1 in all these k rows is equal to

(1 − (log n + x)/n)^{kn}.
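The critical-set probability just derived can be checked by simulation. The following sketch (not from the book; n, k, x, and the trial count are small arbitrary choices so the event is not too rare) estimates the probability that every column of a random k × n submatrix has an even number of 1's.

```python
# Monte Carlo sketch: the probability that k fixed rows with i.i.d. entries
# equal to 1 with probability p = (log n + x)/n form a critical set should be
# ((1 + (1 - 2p)^k)/2)^n.  Small n, k are illustrative choices.
import math, random

random.seed(1)
n, k, x = 5, 3, 0.0
p = (math.log(n) + x) / n
trials = 200_000
hits = 0
for _ in range(trials):
    ok = True
    for _ in range(n):                       # parity of each of the n columns
        s = sum(random.random() < p for _ in range(k))
        if s % 2 == 1:
            ok = False
            break
    hits += ok
theory = ((1 + (1 - 2 * p) ** k) / 2) ** n
print(hits / trials, theory)
```

The empirical frequency agrees with the closed-form value to within Monte Carlo error.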
By using the corresponding indicators to represent the total number of nontrivial critical sets and the number of critical sets that consist of zero rows, we obtain the following expression for the mean number of critical sets that do not consist of zero rows:

Σ_{k=0}^{T} C(T,k) ((1 + (1 − 2(log n + x)/n)^k)/2)^n − Σ_{k=0}^{T} C(T,k) (1 − (log n + x)/n)^{kn}.

We include the terms with k = 0 in these sums because they cancel each other. Note first that

Σ_{k=0}^{T} C(T,k) (1 − (log n + x)/n)^{kn} = (1 + (1 − (log n + x)/n)^n)^T,

and under the conditions of the lemma,

(1 + (1 − (log n + x)/n)^n)^T = (1 + (1/n)e^{−x} + o(1/n))^T = e^{αe^{−x}} (1 + o(1)).
Now consider the sum

S(n,T) = Σ_{k=0}^{T} C(T,k) 2^{−n} (1 + (1 − 2(log n + x)/n)^k)^n.

Expanding the inner power by the binomial theorem and interchanging the order of summation, we obtain

S(n,T) = 2^{−n} Σ_{k=0}^{T} C(T,k) Σ_{l=0}^{n} C(n,l) (1 − 2(log n + x)/n)^{kl} = 2^{−n} Σ_{l=0}^{n} C(n,l) (1 + (1 − 2(log n + x)/n)^l)^T.

Let

a_k = C(n,k) 2^{−n} r_k,   r_k = (1 + (1 − 2(log n + x)/n)^k)^T,

and divide the sum S(n,T) = Σ_{k=0}^{n} a_k
[Figure 3.3.1. Graphs of the functions C(n,k) 2^{−n} and r_k as functions of k, with the points k_1, k_2, n/2, k_3, and k_4 marked on the k-axis.]
into five parts so that
S(n,T)=S1+S2+S3+S4+S5, where
S_1 = Σ_{0 ≤ k ≤ k_1} a_k,   S_2 = Σ_{k_1 < k ≤ k_2} a_k,   S_3 = Σ_{k_2 < k ≤ k_3} a_k,
S_4 = Σ_{k_3 < k ≤ k_4} a_k,   S_5 = Σ_{k_4 < k ≤ n} a_k,

with

k_1 = εn,   k_2 = n(1 − ε)/2,   k_3 = n/2 − n^{1/2+1/10}/2,   k_4 = n/2 + n^{1/2+1/10}/2,
and the value of ε will be chosen later. For convenience, the graphs of the functions C(n,k) 2^{−n} and r_k = (1 + (1 − 2(log n + x)/n)^k)^T as functions of k are presented in Figure 3.3.1.

The major contribution to S(n,T) is made by the sum S_4. It is clear that

(1 − 2(log n + x)/n)^k = e^{−2k(log n + x)/n}(1 + o(1)) = (1/n) e^{−x} (1 + o(1))

uniformly in the integers k = n/2 + u√n/2 such that |u| < n^{1/10}. These k form the domain of summation of S_4, which equals {k : |u| < n^{1/10}}. Therefore

r_k = (1 + (1 − 2(log n + x)/n)^k)^T = (1 + (1/n) e^{−x} + o(1/n))^T = e^{αe^{−x}} (1 + o(1))
uniformly in k in the domain of summation of S_4. Thus

S_4 = e^{αe^{−x}} (1 + o(1)) Σ_{k: |u| < n^{1/10}} C(n,k) 2^{−n} = e^{αe^{−x}} (1 + o(1)),

since by the de Moivre–Laplace theorem,

Σ_{k: |u| < n^{1/10}} C(n,k) 2^{−n} → 1.
We now have to show that the remaining four sums tend to zero. We begin with

S_5 = Σ_{k > n/2 + n^{1/10}√n/2} a_k.

Since r_k is monotone decreasing in k, we find that

S_5 ≤ r_{k_4} Σ_{k: u > n^{1/10}} C(n,k) 2^{−n}.

Under the conditions of the lemma, S_5 → 0 since, as was proved, r_{k_4} → e^{αe^{−x}}, and according to the de Moivre–Laplace theorem,

Σ_{k: u > n^{1/10}} C(n,k) 2^{−n} → 0.
Let us estimate

S_1 = Σ_{0 ≤ k ≤ εn} a_k.

By using the monotonicity of r_k, we find that, for sufficiently small ε such that εn is an integer,

S_1 ≤ r_0 Σ_{k ≤ εn} C(n,k) 2^{−n} = 2^{T−n} Σ_{k ≤ εn} C(n,k) ≤ (1 + εn) C(n, εn) 2^{T−n} ≤ (1 + εn) (2^{T/n−1} (e/ε)^ε)^n.

It is clear that 2^{T/n−1} (e/ε)^ε ≤ q < 1 for sufficiently small ε; therefore S_1 → 0 as n → ∞.

It remains to consider S_2 and S_3. Let us begin with

S_2 = Σ_{εn < k ≤ n(1−ε)/2} a_k.
We first show that a_k is a monotone increasing function of k in the range εn ≤ k ≤ n(1 − ε)/2. Indeed,

a_{k+1}/a_k = (C(n,k+1) r_{k+1})/(C(n,k) r_k)
= ((n − k)/(k + 1)) ((1 + (1 − 2(log n + x)/n)^{k+1})/(1 + (1 − 2(log n + x)/n)^k))^T
≥ ((1 + ε)/(1 − ε + 2/n)) (1 − ((1 − 2(log n + x)/n)^k − (1 − 2(log n + x)/n)^{k+1})/(1 + (1 − 2(log n + x)/n)^k))^T,

since (n − k)/(k + 1) ≥ (n − n(1 − ε)/2)/(n(1 − ε)/2 + 1) = (1 + ε)/(1 − ε + 2/n) for k ≤ n(1 − ε)/2. Since 1 + (1 − 2(log n + x)/n)^k ≥ 1, we obtain

a_{k+1}/a_k ≥ ((1 + ε)/(1 − ε + 2/n)) (1 − (2(log n + x)/n)(1 − 2(log n + x)/n)^k)^T.

For sufficiently large n,

(1 + ε)/(1 − ε + 2/n) > 1 + ε.

Moreover, for k satisfying εn ≤ k ≤ n(1 − ε)/2,

(1 − 2(log n + x)/n)^k ≤ e^{−2k(log n + x)/n} ≤ c n^{−2ε},

where c is the constant e^{−2εx}. Thus, for sufficiently large n,

a_{k+1}/a_k ≥ (1 + ε) (1 − 2c(log n + x) n^{−2ε}/n)^T ≥ (1 + ε)(1 − ε/2) > 1.
To estimate S_2, we can use the monotonicity of a_k to obtain the inequality S_2 ≤ n a_{k_2}. Let us estimate

a_{k_2} = C(n,k_2) 2^{−n} (1 + (1 − 2(log n + x)/n)^{k_2})^T.

Since a rough estimate is acceptable, we content ourselves with the bound

C(n,k_2) 2^{−n} ≤ Σ_{k: u ≤ −n^{1/10}} C(n,k) 2^{−n} = (1/√(2π)) ∫_{−∞}^{−n^{1/10}} e^{−u²/2} du (1 + o(1)) = (1/(√(2π) n^{1/10})) e^{−n^{1/5}/2} (1 + o(1)).

Here we used the well-known asymptotics

∫_{−∞}^{−z} e^{−u²/2} du = (1/z) e^{−z²/2} (1 + o(1))

as z → ∞. Thus, there exists a constant a such that

C(n,k_2) 2^{−n} ≤ a e^{−n^{1/5}/2}.

Let us estimate the second factor of a_{k_2}. It is clear that

(1 + (1 − 2(log n + x)/n)^{k_2})^T ≤ (1 + e^{−2k_2(log n + x)/n})^T = (1 + e^{−(1−ε)(log n + x)})^T ≤ e^{T n^{−1+ε} e^{−x(1−ε)}} ≤ e^{b n^ε},

where b is a positive constant. By combining the estimates of the two factors of a_{k_2}, we obtain the bound

S_2 ≤ n a_{k_2} ≤ n a e^{−n^{1/5}/2} e^{b n^ε},

and S_2 → 0 if we choose ε < 1/5. It remains to estimate

S_3 = Σ_{k_2 < k ≤ k_3} C(n,k) 2^{−n} (1 + (1 − 2(log n + x)/n)^k)^T.
It is clear that

S_3 ≤ (1 + (1 − 2(log n + x)/n)^{k_2})^T Σ_{k ≤ k_3} C(n,k) 2^{−n} ≤ e^{b n^ε} a e^{−n^{1/5}/2},

so S_3 → 0, which completes the proof of Lemma 3.3.2.
Proof of Theorem 3.3.1. The assertion of Theorem 3.3.1 follows from Lemmas 3.3.1 and 3.3.2 because, by Lemma 3.3.2,

P{s(A) = l_{n,T}} → 1

under the conditions of the theorem.

The following theorem is a corollary to Theorem 3.3.1. Suppose that

P{a_{tj} = 1} = (log T + x)/T,   (3.3.2)

where x is a constant and t = 1, ..., T, j = 1, ..., n.
Theorem 3.3.2. If n, T → ∞ such that T/n → α, 1 < α < ∞, and condition (3.3.2) is valid, then the distribution of s(A) converges to the Poisson distribution with parameter λ = e^{−x}/α.

Proof. Since the rank of a matrix is the maximum number of linearly independent rows or columns, we apply Theorem 3.3.1 to the transposed matrix and obtain the assertion of Theorem 3.3.2.

Because we know the limit distribution of the rank of a matrix A, we can obtain some results on the behavior of the solutions of the system of linear equations with the matrix A. Let us consider the system

AX = B,   (3.3.3)

where the elements of the T × n matrix A = ||a_{tj}|| are independent, and for t = 1, ..., T, j = 1, ..., n,

P{a_{tj} = 1} = (log n + x)/n,

where x is a constant, the column vector B = (b_1, ..., b_T) is independent of A, and the random variables b_1, ..., b_T are independent, taking the values 0 and 1 with equal probabilities. Denote by μ_{n,T} the number of solutions of the system (3.3.3). The examples cited in Section 3.1 show that the consistency of linear systems plays a particular
role in some of the problems related to such systems. The probability of consistency P_{n,T} of system (3.3.3) is the probability that the system has at least one solution:

P_{n,T} = P{μ_{n,T} > 0}.

By using Theorem 3.3.1 we can easily prove the following assertion.

Theorem 3.3.3. If n, T → ∞ such that T/n → α, 0 < α < 1, and condition (3.3.1) is valid, then

P_{n,T} → e^{−αe^{−x}/2}.

Proof. If the rank r(A) of A equals r, then

P{μ_{n,T} > 0 | r(A) = r} = 2^{−T+r}.   (3.3.4)

Indeed, let the linearly independent rows have the indices 1, 2, ..., r. Then each of the rows with indices r + 1, ..., T is a linear combination of the first r rows, and for the system to be consistent, each of the right-hand sides b_{r+1}, ..., b_T must satisfy a linear relation of the form

ε_{1t} b_1 + ⋯ + ε_{rt} b_r = b_t,   t = r + 1, ..., T,   (3.3.5)

where ε_{1t}, ..., ε_{rt} are constants taking the values 0 and 1. The probability of the validity of any of the relations (3.3.5) is equal to 1/2 and, hence, assertion (3.3.4) is true.

Since {r(A) = r} = {s(A) = T − r}, by the total probability formula,

P_{n,T} = Σ_{r=0}^{T} P{r(A) = r} 2^{−T+r} = Σ_{s=0}^{T} P{s(A) = s} 2^{−s}.   (3.3.6)

The last series in (3.3.6) is majorized by the series Σ_{s=0}^{∞} 2^{−s} and converges uniformly. Therefore it is possible to pass to the limit under the sum in (3.3.6). Passing to the limit with the help of Theorem 3.3.1 yields

P_{n,T} → Σ_{s=0}^{∞} λ^s e^{−λ}/(2^s s!) = e^{−λ/2},

where λ = αe^{−x}.
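Theorem 3.3.3 can be illustrated by direct simulation. The sketch below (not from the book; n, α, x, and the trial count are arbitrary illustrative choices) draws random sparse matrices, tests consistency of AX = B by comparing GF(2) ranks of the matrix and the augmented matrix, and compares the empirical frequency with e^{−λ/2} computed from the finite-n value λ = T(1 − p)^n.

```python
# Monte Carlo sketch of Theorem 3.3.3: the consistency probability of AX = B
# approaches exp(-alpha * e^{-x} / 2).  Rows are stored as bit masks and the
# rank over GF(2) is computed by elimination (an XOR-basis).
import math, random

def gf2_rank(rows):
    """Rank over GF(2) of a list of rows given as Python integers."""
    pivots = []
    for row in rows:
        for piv in pivots:
            row = min(row, row ^ piv)   # clears the top bit of piv if present
        if row:
            pivots.append(row)
    return len(pivots)

random.seed(2)
n, alpha, x = 60, 0.5, 0.0
T = int(alpha * n)
p = (math.log(n) + x) / n
trials = 2000
consistent = 0
for _ in range(trials):
    A = [sum(1 << j for j in range(n) if random.random() < p) for _ in range(T)]
    b = [random.getrandbits(1) for _ in range(T)]
    Ab = [(A[t] << 1) | b[t] for t in range(T)]      # augmented matrix [A | b]
    if gf2_rank(A) == gf2_rank(Ab):
        consistent += 1
lam = T * (1 - p) ** n                               # finite-n value of alpha * e^{-x}
print(consistent / trials, math.exp(-lam / 2))
```

Using the finite-n value of λ rather than its limit reduces the bias for moderate n.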
3.4. Cycles and consistency of systems of random equations

In this section, we consider a system of T equations in GF(2):

x_{i(t)} + x_{j(t)} = β_t,   t = 1, ..., T,   (3.4.1)
where i(t), j(t), t = 1, ..., T, are independent random variables that take the values 1, ..., n with equal probabilities, and the variables β_1, ..., β_T take the values 0 and 1. We denote by A_{n,T} the matrix of this system. As in Section 3.1, we associate with the matrix A_{n,T} a graph G_{n,T} with n labeled vertices that correspond to the variables x_1, ..., x_n. The graph has T edges (i(t), j(t)), t = 1, ..., T. Thus the edges of the graph G_{n,T} may be considered an outcome of T independent trials: In each trial, an edge joins two different vertices i and j with probability 2n^{−2} and forms a loop at a vertex i with probability n^{−2}, i, j = 1, ..., n. Thus the graph G_{n,T} is the same as the graph considered in Section 2.3.

Denote by μ_{n,T} the number of solutions of the system (3.4.1) and consider the probability of consistency

P_{n,T} = P{μ_{n,T} > 0}.

We want to express P_{n,T} in terms of the characteristics of G_{n,T}. Denote by κ_{n,T} the number of components of the graph G_{n,T}.

Theorem 3.4.1. If β_1, ..., β_T are independent random variables that take the values 0 and 1 with equal probabilities and do not depend on A_{n,T}, then

P_{n,T} = 2^{n−T} Σ_{k=1}^{n} P{κ_{n,T} = k} 2^{−k}.
Proof. We first assume that G_{n,T} is a connected graph. We can then choose a spanning tree of the graph. This tree contains n − 1 edges that correspond to a subsystem of n − 1 equations of the system. If we assign a fixed value to one of the unknowns, then with the help of the corresponding subsystem we obtain the values of all the other unknowns. Consequently, the right-hand sides of the remaining T − n + 1 equations must each take a fixed value for the system to be consistent. Since β_1, ..., β_T are independent and take the values 0 and 1 with probabilities 1/2, the probability of consistency is (1/2)^{T−n+1} for G_{n,T} connected.

Now assume the graph G_{n,T} consists of k components with n_1, ..., n_k vertices and T_1, ..., T_k edges, respectively. The whole system is consistent if and only if each of its subsystems is consistent. Under the condition that the number of components κ_{n,T} = k and, consequently, that the system decomposes into k disjoint subsystems, the probability of consistency is

(1/2)^{T_1−n_1+1} ⋯ (1/2)^{T_k−n_k+1} = 2^{−T+n−k}.

When we apply the formula of total probability, we obtain the assertion of the theorem.
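The identity of Theorem 3.4.1 is exact for every n and T, so it can be checked empirically. The sketch below (not from the book; all parameter values are arbitrary) uses a union-find structure augmented with parities to test consistency of the equations x_i + x_j = β_t and simultaneously counts the components κ of G_{n,T}; the frequency of consistency is then compared with the empirical mean of 2^{n−T−κ}.

```python
# Sketch: empirical check of Theorem 3.4.1 via a parity union-find.
import random

def find(parent, rel, v):
    """Return (root, parity of x_v relative to the root), compressing the path."""
    path, r = [], v
    while parent[r] != r:
        path.append(r)
        r = parent[r]
    p = 0
    for u in reversed(path):            # from the vertex nearest the root down to v
        p ^= rel[u]
        parent[u] = r
        rel[u] = p
    return r, (rel[v] if path else 0)

def add_equation(parent, rel, i, j, beta):
    """Impose x_i + x_j = beta; return False if it contradicts earlier equations."""
    ri, pi = find(parent, rel, i)
    rj, pj = find(parent, rel, j)
    if ri == rj:
        return (pi ^ pj) == beta
    parent[ri] = rj
    rel[ri] = pi ^ pj ^ beta
    return True

random.seed(3)
n, T, trials = 300, 90, 4000            # lambda = 2T/n = 0.6
cons = weight = 0.0
for _ in range(trials):
    parent, rel = list(range(n)), [0] * n
    ok, kappa = True, n
    for _ in range(T):
        i, j = random.randrange(n), random.randrange(n)
        if find(parent, rel, i)[0] != find(parent, rel, j)[0]:
            kappa -= 1                  # the new edge merges two components
        if not add_equation(parent, rel, i, j, random.getrandbits(1)):
            ok = False
    cons += ok
    weight += 2.0 ** (n - T - kappa)
print(cons / trials, weight / trials, (1 - 0.6) ** 0.25)
```

Both empirical quantities estimate the same P_{n,T}, and for 2T/n = 0.6 they are already close to the limit (1 − λ)^{1/4} of Theorem 3.4.2 below.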
According to Theorem 3.4.1, the number of components of the graph G_{n,T} can be used to investigate the system (3.4.1). Likewise, we can consider the maximum number of independent critical sets s(A_{n,T}) introduced in Section 3.1. According to Theorem 3.1.1, the maximum number of independent critical sets s(A_{n,T}) and the rank r(A_{n,T}) of the matrix A_{n,T} are related by the equality

s(A_{n,T}) + r(A_{n,T}) = T.

It is not difficult to prove that

κ_{n,T} = n − T + s(A_{n,T}),

and the rank r(A_{n,T}) = n − κ_{n,T}. Thus, the assertion of Theorem 3.4.1 is equivalent to relation (3.3.6). We remarked in Section 3.1 that a critical set of A_{n,T} corresponds to a cycle or a union of cycles in the graph G_{n,T}, and the maximum number of independent critical sets s(A_{n,T}) equals the maximum number of independent cycles.
The graph G_{n,T} was studied in Section 2.3. We have seen that if n, T → ∞ such that 2T/n → λ, 0 < λ < 1, then with probability tending to 1, the graph has no components with more than one cycle. Therefore, under these conditions, all cycles of G_{n,T} are isolated and, consequently, independent. As in Section 3.1, we denote by ν(G_{n,T}) the number of cycles in G_{n,T}. It was proved (see Theorems 2.3.3 and 2.3.4) that if 2T/n → λ, 0 < λ < 1, then

P{ν(G_{n,T}) = s(A_{n,T})} → 1,   (3.4.2)

and for any fixed k = 0, 1, ...,

P{ν(G_{n,T}) = k} → Λ^k e^{−Λ}/k!,   (3.4.3)

where

Λ = −(1/2) log(1 − λ).

These results allow us to analyze the probability P_{n,T} of consistency of the system (3.4.1).
Theorem 3.4.2. If n, T → ∞ such that 2T/n → λ, 0 < λ < 1, and the right-hand sides β_1, ..., β_T of the system (3.4.1) are independent random variables that take the values 0 and 1 with probabilities 1/2 and do not depend on A_{n,T}, then

P_{n,T} → (1 − λ)^{1/4}.

Proof. When we use Theorem 3.4.1 or the equivalent formula (3.3.6), we find that

P_{n,T} = Σ_{k=1}^{n} P{κ_{n,T} = k} 2^{n−T−k} = Σ_{r=0}^{T} P{r(A_{n,T}) = r} 2^{−T+r} = Σ_{s=0}^{T} P{s(A_{n,T}) = s} 2^{−s}.

Taking into account (3.4.2) and (3.4.3) and passing to the limit under the sum yield

P_{n,T} → Σ_{s=0}^{∞} Λ^s e^{−Λ}/(s! 2^s) = e^{−Λ/2} = (1 − λ)^{1/4}.
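The final summation can be verified numerically (an illustrative sketch, not from the book; the value of λ is arbitrary):

```python
# Sketch: sum_s Lambda^s e^{-Lambda} / (s! 2^s) = e^{-Lambda/2} = (1 - lam)^{1/4}
# when Lambda = -(1/2) log(1 - lam).
import math

lam = 0.6
Lam = -0.5 * math.log(1 - lam)
series = sum(Lam ** s * math.exp(-Lam) / (math.factorial(s) * 2 ** s) for s in range(50))
print(series, math.exp(-Lam / 2), (1 - lam) ** 0.25)
```

All three printed values coincide to machine precision.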
In the same way, we can treat the nonequiprobable case, where the indices (i(t), j(t)), t = 1, ..., T, of the variables of system (3.4.1) are independent identically distributed random variables that take the value i with probability p_i, i = 1, ..., n, p_1 + ⋯ + p_n = 1. As before, let the right-hand sides β_1, ..., β_T be independent, take the values 0 and 1 with equal probabilities, and not depend on A_{n,T}. We retain the notation P_{n,T} for the probability of consistency of such a system.

Theorem 3.4.3. Let p_i = a_i/n, where a_i = a_i(n), 0 < ε_0 ≤ a_i ≤ ε_1 < ∞, i = 1, ..., n, ε_0 and ε_1 are constants, and let

σ² = lim_{n→∞} (1/n) Σ_{i=1}^{n} a_i².

If n, T → ∞ such that 2T/n → λ and σ²λ < 1, then

P_{n,T} → (1 − σ²λ)^{1/4}.

Proof. In Section 2.4, the nonequiprobable graph G_{n,T} corresponding to the matrix A_{n,T} was considered. The graph contains n labeled vertices and T edges that are obtained by the following T independent trials: In each trial, one edge is drawn. The edge connects two different vertices i and j with probability 2p_i p_j, and a loop at a vertex i is formed with probability p_i², i, j = 1, ..., n, p_1 + ⋯ + p_n = 1. According to Theorem 2.4.1, under the conditions of Theorem 3.4.3, for any fixed k = 0, 1, ...,

P{ν(G_{n,T}) = k} → Λ^k e^{−Λ}/k!,
where ν(G_{n,T}) is the number of cycles in G_{n,T}, and

Λ = −(1/2) log(1 − σ²λ).

If we reason as we did in the proof of Theorem 3.4.2, we obtain the assertion of Theorem 3.4.3.

The proofs of Theorems 3.4.2 and 3.4.3 are mainly based on assertion (3.3.4) that

P{μ_{n,T} > 0 | r(A_{n,T}) = r} = 2^{−T+r}.   (3.4.4)

The proof of this assertion in Section 3.3 used the fact that if r rows are linearly independent and r(A_{n,T}) = r, then each of the remaining rows is a linear combination of these r rows, and the system is consistent only if the corresponding right-hand sides satisfy a certain linear relation. If the right-hand sides β_1, ..., β_T are independent, then such a relation is satisfied with probability 1/2, and the events corresponding to different relations are independent. In other words, each cycle in G_{n,T} imposes a restriction on the right-hand sides β_1, ..., β_T, these restrictions are independent, and each of them is satisfied with probability 1/2.

If the right-hand sides β_1, ..., β_T take the values 0 and 1 with unequal probabilities, then property (3.4.4) is not valid, and the corresponding formula for the probability P_{n,T} of the consistency of the system becomes more complicated. In this section, we prove the following assertions. Let

P_{n,T}(k) = P{μ_{n,T} > 0, ν(G_{n,T}) = k},   P_{n,T} = P{μ_{n,T} > 0}.

Theorem 3.4.4. Let the right-hand sides β_1, ..., β_T of the system (3.4.1) be independent identically distributed random variables that take the values 0 and 1 with probabilities 1 − p and p, respectively, 0 < p < 1, Δ = 1 − 2p. If n, T → ∞ such that 2T/n → λ, 0 < λ < 1, then for any fixed k = 0, 1, ...,

P_{n,T}(k) → (1/(4^k k!)) (−log((1 − λ)(1 − Δλ)))^k (1 − λ)^{1/2},

P_{n,T} → ((1 − λ)/(1 − Δλ))^{1/4}.
Theorem 3.4.5. Let the right-hand sides β_1, ..., β_T of the system (3.4.1) take the values 0 and 1, and let m = m(T) be the number of 1's in β_1, ..., β_T. If n, T → ∞ such that 2T/n → λ, 0 < λ < 1, and m/T → p, 0 < p < 1, then for any fixed k = 0, 1, ...,

P_{n,T}(k) → (1/(4^k k!)) (−log((1 − λ)(1 − Δλ)))^k (1 − λ)^{1/2},

P_{n,T} → ((1 − λ)/(1 − Δλ))^{1/4},
where Δ = 1 − 2p.

Before proceeding to the proof of these theorems, we establish some auxiliary results. Let β_1, ..., β_T be independent identically distributed random variables that take the values 0 and 1 with probabilities 1 − p and p, respectively; let Δ = 1 − 2p; and let E be the set of even numbers. Let r_0 = 0 and let r_1, ..., r_k be positive integers. We consider the random variables

η_i = β_{r_0+⋯+r_{i−1}+1} + ⋯ + β_{r_0+⋯+r_i},   i = 1, ..., k.

Lemma 3.4.1.

P{η_i ∈ E, i = 1, ..., k} = 2^{−k} (1 + Δ^{r_1}) ⋯ (1 + Δ^{r_k}).

Proof. It suffices to note that the random variables η_1, ..., η_k are independent and that the probability of the event that the sum β_1 + ⋯ + β_r is even equals (1 + Δ^r)/2.

When the variables β_1, ..., β_T are nonrandom, we need a similar assertion for the following scheme of allocating m particles into T cells. The cells are divided into k + 1 groups containing r_1, ..., r_k, and T − r_1 − ⋯ − r_k cells, respectively. We assume that each cell can contain at most one particle, that m ≤ T, and that each of the C(T,m) possible allocations is equiprobable. We introduce the random variables ξ_1, ..., ξ_T, setting ξ_i = 0 if the cell number i is empty and ξ_i = 1 otherwise, for i = 1, ..., T. By analogy with the random variables η_1, ..., η_k, we define the random variables

ζ_i = ξ_{r_0+⋯+r_{i−1}+1} + ⋯ + ξ_{r_0+⋯+r_i},   i = 1, ..., k.
It is not difficult to verify the following assertions.

Lemma 3.4.2. If r_1, ..., r_k are fixed, T → ∞, and m/T → 0, then

P{ζ_i ∈ E, i = 1, ..., k} → 1.

Lemma 3.4.3. If r_1, ..., r_k are fixed, T → ∞, and m/T → 1, then

P{ζ_i ∈ E, i = 1, ..., k} → 1

if all r_1, ..., r_k are even, and

P{ζ_i ∈ E, i = 1, ..., k} → 0

if at least one of r_1, ..., r_k is odd.

Lemma 3.4.4. If r_1, ..., r_k are fixed, T → ∞, and m/T → p, 0 < p < 1, then

P{ζ_i ∈ E, i = 1, ..., k} → 2^{−k} (1 + Δ^{r_1}) ⋯ (1 + Δ^{r_k}),

where Δ = 1 − 2p.

We now consider the graph G_{n,T} and mark the cycles in the graph by the following rule. Recall that 𝒜_{n,T} is the set of all graphs with n labeled vertices and T edges whose components are trees and unicyclic components, where cycles of length 1 and 2 are allowed. If a realization of the graph G_{n,T} belongs to the set 𝒜_{n,T}, then every cycle of length r is marked with probability p_r independently of the other cycles. If the graph contains a component with more than one cycle, then no cycle of the graph is marked. We denote by p_{n,T}(k) the probability of the event that the number of cycles ν(G_{n,T}) in the graph G_{n,T} is equal to k and all cycles are marked. It is clear that the probability P_{n,T} of the event that all cycles are marked equals

P_{n,T} = Σ_{k=0}^{∞} p_{n,T}(k).
As in Section 1.7, we denote by d_m the number of mappings of the set {1, ..., m} into itself whose graphs are connected, and by d_m^{(r)} the number of mappings of the set {1, ..., m} into itself whose graphs are connected and contain a cycle of length r. Let F_{n,N} denote the number of forests with n labeled vertices and N trees, T = n − N.

Explicit expressions for d_m and d_m^{(r)} are well known. By using the formula for the number of rooted trees, we obtain

d_m^{(r)} = (m!/(m − r)!) m^{m−r−1};

hence,

d_m = Σ_{r=1}^{m} d_m^{(r)} = (m − 1)! Σ_{k=0}^{m−1} m^k/k!.
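These counting formulas can be confirmed by brute force for small m. The sketch below (illustrative, not from the book) enumerates all m^m mappings of {0, ..., m−1} into itself, keeps the connected ones, and classifies them by the length of their unique cycle.

```python
# Brute-force sketch: verify d_m^{(r)} = m! m^{m-r-1}/(m-r)! for small m.
import math
from itertools import product

def cycle_length_if_connected(f):
    m = len(f)
    adj = [set() for _ in range(m)]
    for v in range(m):
        adj[v].add(f[v]); adj[f[v]].add(v)
    seen, stack = {0}, [0]
    while stack:                         # check weak connectivity of the graph of f
        v = stack.pop()
        for w in adj[v]:
            if w not in seen:
                seen.add(w); stack.append(w)
    if len(seen) != m:
        return None
    pos, v = {}, 0                       # follow f from any vertex into the cycle
    while v not in pos:
        pos[v] = len(pos)
        v = f[v]
    return len(pos) - pos[v]

for m in (2, 3, 4):
    counts = {}
    for f in product(range(m), repeat=m):
        r = cycle_length_if_connected(f)
        if r is not None:
            counts[r] = counts.get(r, 0) + 1
    formula = {r: math.factorial(m) * m ** (m - r) // (m * math.factorial(m - r))
               for r in range(1, m + 1)}
    print(m, counts == formula, counts)
```

For m = 4 the enumeration gives 64, 48, 24, and 6 connected mappings with cycle lengths 1 through 4, summing to d_4 = 142 as the second formula predicts.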
Lemma 3.4.5. For any integer k, 1 ≤ k ≤ min(n, T),

p_{n,T}(k) = T! (2/n²)^T Σ_{m=k}^{min(n,T)} C(n,m) F_{n−m,N} Σ_{m_1+⋯+m_k=m} (m! D_{m_1} ⋯ D_{m_k})/(m_1! ⋯ m_k! 2^k k!),
where

D_m = Σ_{r=1}^{m} d_m^{(r)} p_r,

and for k = 0,

p_{n,T}(0) = F_{n,N} T! (2/n²)^T.

Proof. For k = 0, the assertion is obvious. As in Section 1.7, let us denote by b_n^{(r)} the number of connected graphs with n labeled vertices and one cycle of length r. It is clear that

b_n^{(1)} = d_n^{(1)},   b_n^{(2)} = d_n^{(2)},   b_n^{(r)} = d_n^{(r)}/2,   r ≥ 3.   (3.4.5)
Denote by C_{n,T} the event that the graph G_{n,T} contains no unmarked cycles. We represent the event {ν(G_{n,T}) = k, G_{n,T} ∈ 𝒜_{n,T}, C_{n,T}} as a union of the following disjoint events: In a specific order, the T trials give T fixed edges that form a graph consisting of trees and k unicyclic components, each containing a marked cycle. It follows from this description that

p_{n,T}(k) = P{ν(G_{n,T}) = k, G_{n,T} ∈ 𝒜_{n,T}, C_{n,T}}
= T! (2/n²)^T Σ_{m=k}^{min(n,T)} Σ_{m_1+⋯+m_k=m} C(n,m) (m!/(m_1! ⋯ m_k! k!)) F_{n−m,N} Σ_{r_1=1}^{m_1} ⋯ Σ_{r_k=1}^{m_k} b_{m_1}^{(r_1)} p_{r_1} ⋯ b_{m_k}^{(r_k)} p_{r_k} 2^{−s_1−s_2},
where s_1 = s_1(r_1, ..., r_k) is the number of 1's among r_1, ..., r_k, and s_2 = s_2(r_1, ..., r_k) is the number of 2's among them. The factor 2^{−s_1} appears because the probability 2n^{−2} is replaced by n^{−2} in s_1 cases. The factor 2^{−s_2} reflects the fact that permuting the two trials in which two identical edges occur results in the same graph. The lemma now follows from the relations (3.4.5).

Theorem 3.4.6. If n, T → ∞ such that 2T/n → λ, 0 < λ < 1, then for any fixed k = 0, 1, ...,

p_{n,T}(k) = ((D(a))^k/(2^k k!)) (1 − λ)^{1/2} (1 + o(1)),

where

D(x) = Σ_{m=1}^{∞} D_m x^m/m!,   a = λe^{−λ}.
Proof. The proof is similar to the proof of Theorem 1.8.2. We partition the sum from Lemma 3.4.5 into two parts, setting M = T^{1/4}. It is clear that for any x in the domain of convergence of the series

D(x) = Σ_{m=1}^{∞} D_m x^m/m!,

we have

Σ_{m>M} Σ_{m_1+⋯+m_k=m} (D_{m_1}x^{m_1} ⋯ D_{m_k}x^{m_k})/(m_1! ⋯ m_k!) ≤ k (D(x))^{k−1} Σ_{m>M/k} D_m x^m/m!.   (3.4.6)

Along with the function D(x), let us introduce the generating function of the numbers of connected mappings

d(x) = Σ_{m=1}^{∞} d_m x^m/m!.

The inequality

D(x) ≤ d(x)   (3.4.7)

holds because

D_m = Σ_{r=1}^{m} d_m^{(r)} p_r ≤ Σ_{r=1}^{m} d_m^{(r)} = d_m.

Also,

d_m = (m − 1)! Σ_{k=0}^{m−1} m^k/k! < (m − 1)! e^m,

which implies

Σ_{m>M/k} d_m x^m/m! ≤ Σ_{m>M/k} (ex)^m/m ≤ Σ_{m>M/k} (ex)^m.   (3.4.8)
Let

θ(x) = Σ_{n=1}^{∞} n^{n−1} x^n/n!,   a(x) = Σ_{n=0}^{∞} n^n x^n/n!.

By Example 1.3.2 and (1.4.8),

d(x) = log a(x),   a(x) = (1 − θ(x))^{−1}.

We put α = 2T/n and x = αe^{−α} for α < 1. Then

θ(x) = α,   d(x) = −log(1 − α).

Under the hypothesis of the theorem, α = 2T/n → λ, 0 < λ < 1, and for x = αe^{−α}, there exists q < 1 such that ex = αe^{1−α} ≤ q < 1 for sufficiently large n. Therefore

Σ_{m>M/k} d_m x^m/m! ≤ Σ_{m>M/k} q^m ≤ q^{M/k}/(1 − q).   (3.4.9)
Using estimates (1.8.8), (1.8.9), and (3.4.6)–(3.4.9) yields

S_2 = T! (2/n²)^T Σ_{m>M} Σ_{m_1+⋯+m_k=m} C(n,m) F_{n−m,N} (m! D_{m_1} ⋯ D_{m_k})/(m_1! ⋯ m_k!)
≤ c T! (2/n²)^T Σ_{m>M} Σ_{m_1+⋯+m_k=m} (n! (n − m)^{2(T−m)})/((n − m)! 2^{T−m} (T − m)!) (d_{m_1} ⋯ d_{m_k})/(m_1! ⋯ m_k!)
≤ c_1 n Σ_{m>M} (αe^{−α})^m Σ_{m_1+⋯+m_k=m} (d_{m_1} ⋯ d_{m_k})/(m_1! ⋯ m_k!)
≤ c_1 n k (d(x))^{k−1} Σ_{m>M/k} d_m x^m/m!
≤ c_2 T q^{T^{1/4}/k},
where c, c_1, c_2 are constants. Thus, under the hypothesis of the theorem, S_2 → 0.

If n, T → ∞, 2T/n → λ, 0 < λ < 1, then by virtue of (1.8.7),

T! (2/n²)^T C(n,m) m! F_{n−m,N} = (1 − α)^{1/2} x^m (1 + o(1))

uniformly in m ≤ M = T^{1/4}. Therefore, for any fixed k = 1, 2, ...,

S_1 = T! (2/n²)^T Σ_{m≤M} Σ_{m_1+⋯+m_k=m} C(n,m) F_{n−m,N} (m! D_{m_1} ⋯ D_{m_k})/(m_1! ⋯ m_k! 2^k k!)
= ((1 − α)^{1/2}/(2^k k!)) Σ_{m≤M} Σ_{m_1+⋯+m_k=m} (D_{m_1}x^{m_1} ⋯ D_{m_k}x^{m_k})/(m_1! ⋯ m_k!) (1 + o(1)).

By using the estimate of S_2, we obtain

S_1 = ((1 − α)^{1/2}/(2^k k!)) Σ_{m=k}^{∞} Σ_{m_1+⋯+m_k=m} (D_{m_1}x^{m_1} ⋯ D_{m_k}x^{m_k})/(m_1! ⋯ m_k!) (1 + o(1)) + o(1)
= ((1 − α)^{1/2} (D(x))^k/(2^k k!)) (1 + o(1)).
Combining the estimates of S_1 and S_2, we obtain, under the hypothesis of the theorem,

p_{n,T}(k) = P{ν(G_{n,T}) = k, G_{n,T} ∈ 𝒜_{n,T}, C_{n,T}} = ((D(x))^k/(2^k k!)) (1 − α)^{1/2} (1 + o(1)).   (3.4.10)

Hence the assertion of Theorem 3.4.6 for k ≥ 1 follows, since x = αe^{−α} = (2T/n)e^{−2T/n} → λe^{−λ} = a and D(x) → D(a). For k = 0, we use (1.8.6) and the representation from Lemma 3.4.5 and conclude that

p_{n,T}(0) = (1 − α)^{1/2} (1 + o(1)) = (1 − λ)^{1/2} (1 + o(1)).
Corollary 3.4.1. If n, T → ∞ such that 2T/n → λ, 0 < λ < 1, then the probability P_{n,T} of the event that the graph G_{n,T} contains no unmarked cycles satisfies the relation

P_{n,T} = e^{D(a)/2} (1 − λ)^{1/2} (1 + o(1)).

Proof. We denote by p_{n,T}^{(1)}(k) the probability that all cycles are marked and the graph has k unicyclic components in the case where all the probabilities p_r are equal to 1, r = 1, 2, .... In this case, D(a) = d(a) = 2Λ = −log(1 − λ), and Theorem 3.4.6 gives

p_{n,T}^{(1)}(k) → Λ^k e^{−Λ}/k!,   k = 0, 1, ....
To prove the corollary, it suffices to show that in the sum

P_{n,T} = Σ_{k=0}^{n} p_{n,T}(k),   (3.4.11)

one can pass to the limit under the sum. Let us show that for any ε > 0, there exists K such that

Σ_{k=K+1}^{∞} p_{n,T}(k) < ε.   (3.4.12)

We choose K such that

Σ_{k=K+1}^{∞} Λ^k e^{−Λ}/k! < ε/2,

and for fixed K, we choose n_0 so that for n > n_0,

|Σ_{k=0}^{K} p_{n,T}^{(1)}(k) − Σ_{k=0}^{K} Λ^k e^{−Λ}/k!| < ε/2.
Then, for n > n_0,

Σ_{k=K+1}^{∞} p_{n,T}^{(1)}(k) − Σ_{k=K+1}^{∞} Λ^k e^{−Λ}/k! ≤ |Σ_{k=0}^{K} p_{n,T}^{(1)}(k) − Σ_{k=0}^{K} Λ^k e^{−Λ}/k!| < ε/2,

and therefore

Σ_{k=K+1}^{∞} p_{n,T}^{(1)}(k) < ε.

Since p_{n,T}(k) ≤ p_{n,T}^{(1)}(k), estimate (3.4.12) and the validity of passing to the limit under the sum are established.

Proof of Theorems 3.4.4 and 3.4.5. A cycle leads to the inconsistency of system (3.4.1) if the sum of the right-hand sides of the subsystem corresponding to the cycle is odd. Let p_r be the probability that this sum is even for a cycle of length r. Then P_{n,T}(k) = p_{n,T}(k) for any k = 0, 1, .... Therefore Theorems 3.4.4 and 3.4.5 are direct corollaries of Theorem 3.4.6 and the fact proved above that one can pass to the limit under the sum in (3.4.11).

To prove Theorem 3.4.4, we notice that in this case, according to Lemma 3.4.1,
p_r = (1 + Δ^r)/2, where Δ = 1 − 2p; therefore

D(x) = Σ_{m=1}^{∞} D_m x^m/m! = (1/2) Σ_{m=1}^{∞} Σ_{r=1}^{m} d_m^{(r)} (1 + Δ^r) x^m/m! = (1/2)(d(x) + d(x, Δ)),   (3.4.13)

where

d(x, Δ) = Σ_{m=1}^{∞} Σ_{r=1}^{m} d_m^{(r)} Δ^r x^m/m!.

For x = αe^{−α}, 0 < α < 1,

d(x) = −log(1 − α) and d(x, Δ) = −log(1 − αΔ).
Indeed,

Σ_{m=1}^{∞} Σ_{r=1}^{m} d_m^{(r)} Δ^r x^m/m! = Σ_{r=1}^{∞} Σ_{m=r}^{∞} (m^{m−r−1}/(m − r)!) Δ^r x^m = Σ_{r=1}^{∞} Δ^r x^r Σ_{t=0}^{∞} (t + r)^{t−1} x^t/t!.

By using the well-known equality

Σ_{t=0}^{∞} (t + r)^{t−1} x^t/t! = e^{αr}/r

from [124], Chapter 2, Problem 210 (see also [126]), we obtain

d(x, Δ) = Σ_{r=1}^{∞} Δ^r x^r e^{αr}/r = Σ_{r=1}^{∞} (αΔ)^r/r = −log(1 − αΔ).

We conclude by noting that for α = 2T/n → λ, 0 < λ < 1,

d(x) → −log(1 − λ),   d(x, Δ) = −log(1 − αΔ) → −log(1 − λΔ).
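The closed form for d(x, Δ) can be checked numerically by summing the truncated double series directly (an illustrative sketch, not from the book; α, Δ, and the truncation point are arbitrary choices).

```python
# Sketch: numerical check of d(x, Delta) = -log(1 - alpha * Delta) for
# x = alpha * e^{-alpha}, with d_m^{(r)} = m! m^{m-r-1}/(m-r)!.
import math

alpha, delta = 0.5, 0.4
x = alpha * math.exp(-alpha)
total = 0.0
for m in range(1, 61):
    for r in range(1, m + 1):
        term = m ** (m - r - 1) / math.factorial(m - r)   # = d_m^{(r)} / m!
        total += term * delta ** r * x ** m
print(total, -math.log(1 - alpha * delta))
```

The truncated sum agrees with −log(1 − αΔ) to high accuracy, since the series converges geometrically for αe^{1−α} < 1.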
Let us turn to the proof of Theorem 3.4.5. If m/T → 0, then by Lemma 3.4.2, for any fixed k, all the cycles are marked with probability tending to 1. Therefore

P_{n,T}(k) = p_{n,T}^{(1)}(k)(1 + o(1)) → Λ^k e^{−Λ}/k!.

In the case where m/T → 1, we have p_r → 0 for odd r and p_r → 1 for even r by Lemma 3.4.3. Therefore, in this case,

D_m → D_m^{(2)} = Σ_{2r ≤ m} d_m^{(2r)},   D(x) → B^{(2)}(a) = Σ_{m=1}^{∞} D_m^{(2)} a^m/m!.

It is not difficult to see that

B^{(2)}(a) = Σ_{r=1}^{∞} λ^{2r}/(2r) = −(1/2) log(1 − λ²).

In the case where m/T → p, 0 < p < 1, by Lemma 3.4.4, p_r → (1 + Δ^r)/2,
and, as in (3.4.13),

D(x) → D(a) = (d(a) + d(a, Δ))/2 = −(1/2) log((1 − λ)(1 − Δλ)).
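With this value of D(a), the constant in Corollary 3.4.1 reduces to the limit stated in Theorems 3.4.4 and 3.4.5, as the following one-line numerical check illustrates (values of λ and Δ are arbitrary):

```python
# Sketch: e^{D(a)/2} (1-lam)^{1/2} with D(a) = -(1/2) log((1-lam)(1-Delta*lam))
# equals ((1-lam)/(1-Delta*lam))^{1/4}.
import math

lam, Delta = 0.6, 0.2
D = -0.5 * math.log((1 - lam) * (1 - Delta * lam))
lhs_val = math.exp(D / 2) * math.sqrt(1 - lam)
rhs_val = ((1 - lam) / (1 - Delta * lam)) ** 0.25
print(lhs_val, rhs_val)
```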
3.5. Hypercycles and consistency of systems of random equations

In Section 3.2, we studied the rank of random matrices and found, in particular, that if the elements of a T × n matrix A = ||a_{tj}|| are independent identically distributed random variables taking the values 0 and 1 with equal probabilities, then the rank r(A) of the matrix A has a threshold property: If T/n → α and α < 1, then P{r(A) = T} → 1, and if T/n → α and α > 1, then P{r(A) = n} → 1. In other words, the maximum number of independent critical sets s(A) tends in probability to zero in the former case and to infinity in the latter case. A similar property apparently holds for the sparse matrices considered in Section 3.3: We proved only that if α < 1, then s(A) has in the limit a Poisson distribution, and that Es(A) → ∞ for α > 1.

In Section 3.4, we considered systems with at most two unknowns in each equation. It was shown that if T/n → α, 0 < α < 1/2, then the maximum number of independent critical sets, or of independent cycles in the corresponding graph, has in the limit the Poisson distribution with parameter Λ = −(1/2) log(1 − 2α). As follows from Theorem 2.1.6, if α > 1/2, then s(A) tends in probability to infinity.

The case of a matrix with independent identically distributed random elements taking the values 0 and 1 with probabilities 1/2 and the case of a matrix with at most two nonzero elements in each row studied in Section 3.4 can be considered as the extreme cases in terms of the behavior of the rank and the maximum number of independent critical sets. In these cases, the threshold effect appears at the points T/n = 1 and T/n = 1/2, respectively. In this section, we consider an intermediate case and obtain a weaker form of the threshold effect.

We consider the system of random linear equations in GF(2):

x_{i_1(t)} + ⋯ + x_{i_r(t)} = b_t,   t = 1, ..., T,   (3.5.1)

where i_1(t), ..., i_r(t), t = 1, ..., T, are independent identically distributed random variables taking the values 1, ..., n with equal probabilities, and the independent random variables b_1, ..., b_T do not depend on the left-hand side of the system and take the values 0 and 1 with equal probabilities. If r = 2, we obtain the system considered in Section 3.4.

In Section 3.1, we introduced the notions of critical sets for a matrix and of hypercycles for the hypergraph corresponding to a matrix. Denote by A_{r,n,T} the matrix
of system (3.5.1), and by G_{r,n,T} the hypergraph with n vertices and T hyperedges e_1, ..., e_T that corresponds to this matrix. Thus we consider a random hypergraph G_{r,n,T} whose matrix A = A_{r,n,T} = ||a_{tj}|| has the following structure. The elements a_{tj}, t = 1, ..., T, j = 1, ..., n, are random variables, and the rows of the matrix are independent. There are r ones allocated to each row: Each 1, independently of the others, is placed in one of the n positions with probability 1/n, and a_{tj} equals 1 if there is an odd number of 1's in position j of row t. Therefore, there are no more than r ones in each row.

For such regular hypergraphs, the following threshold property holds: If n, T → ∞ such that T/n → α, then an abrupt change in the behavior of the rank of the matrix A_{r,n,T} occurs as the parameter α passes the critical value α_r. This property can be expressed in terms of the total number of hypercycles in G_{r,n,T}. Let s(A_{r,n,T}) be the maximum number of independent critical sets of the matrix A_{r,n,T}, or of independent hypercycles of the hypergraph G_{r,n,T}. Then

S(A_{r,n,T}) = 2^{s(A_{r,n,T})} − 1

is the total number of critical sets or hypercycles.

In this section, we prove that the following threshold property is true for S(A_{r,n,T}).

Theorem 3.5.1. Let r ≥ 3 be fixed and let T, n → ∞ such that T/n → α. Then there exists a constant α_r such that ES(A_{r,n,T}) → 0 for α < α_r and ES(A_{r,n,T}) → ∞ for α > α_r. The constant α_r is the first component of the vector (α, x, λ) that is the unique solution of the system of equations

(αr/(αr − x))^α e^{−x} cosh λ = 1,
(x/λ)((αr − x)/x)^{1/r} = 1,   (3.5.2)
λ tanh λ = x,

with respect to the variables α, x, and λ.

The numerical solution of the system of equations gives us the following values of the critical constants:

α_3 = 0.8894...,   α_4 = 0.9671...,   α_5 = 0.9891...,
α_6 = 0.9969...,   α_7 = 0.9986...,   α_8 = 0.9995....
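The system (3.5.2) is easy to solve numerically. In the sketch below (not from the book), the second and third equations are used to eliminate x and α, which leaves a single equation in λ, solved by bisection; the bracketing interval [0.5, 6] is an assumption that happens to contain the root for 3 ≤ r ≤ 8.

```python
# Sketch: numerical solution of system (3.5.2).  From the second equation,
# alpha = (x/r) (1 + (lambda/x)^r) with x = lambda * tanh(lambda); the first
# equation then becomes a single residual in lambda.
import math

def residual(lam, r):
    x = lam * math.tanh(lam)
    alpha = (x / r) * (1 + (lam / x) ** r)
    # logarithm of the first equation of (3.5.2)
    return alpha * math.log(alpha * r / (alpha * r - x)) - x + math.log(math.cosh(lam))

def alpha_r(r):
    lo, hi = 0.5, 6.0                   # assumed bracket for the root in lambda
    for _ in range(80):
        mid = 0.5 * (lo + hi)
        if residual(mid, r) < 0:
            lo = mid
        else:
            hi = mid
    lam = 0.5 * (lo + hi)
    x = lam * math.tanh(lam)
    return (x / r) * (1 + (lam / x) ** r)

for r in range(3, 9):
    print(r, round(alpha_r(r), 4))
```

The computed values reproduce the critical constants listed above.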
Expanding the solution of the system in powers of e^{−r} yields

α_r ≈ 1 − e^{−r}/log 2 − (e^{−2r}/(2 log 2))(2r² − r + log 2 − 2),

which gives values close to the exact ones for r ≥ 4.

Let us give some auxiliary results that will be needed in the proof of Theorem 3.5.1. The total number of hypercycles S(A_{r,n,T}) in the hypergraph G_{r,n,T} with the matrix A_{r,n,T} can be represented as a sum of indicators. Let ξ_{t_1,...,t_m} = 1 if the hypercycle C = {e_{t_1}, ..., e_{t_m}} occurs in G_{r,n,T}, and ξ_{t_1,...,t_m} = 0 otherwise. It is clear that the distribution of ξ_{t_1,...,t_m} does not depend on the indices t_1, ..., t_m. Indeed, from the definition of the random hypergraph G_{r,n,T}, the indicator ξ_{t_1,...,t_m} = 1 if and only if there is an even number of 1's in each column of the submatrix consisting of the rows with indices t_1, ..., t_m. The numbers of 1's in the n columns of any m rows, before these numbers are reduced modulo 2, have the multinomial distribution with rm trials and n equiprobable outcomes. Denote by η_1(s, n), ..., η_n(s, n) the contents of the cells in the equiprobable scheme of allocating s particles into n cells. In this notation, the numbers of 1's in the columns of any m rows, before those numbers are reduced modulo 2, have a distribution that coincides with the distribution of the variables η_1(rm, n), ..., η_n(rm, n). Therefore
$$P\{\xi_{t_1,\ldots,t_m} = 1\} = P\{\eta_1(rm, n) \in E, \ldots, \eta_n(rm, n) \in E\},$$

where $E$ is the set of even numbers, and the average number of hypercycles in $G_{r,n,T}$ can be written in the following form:

$$\mathbf{E}S(A_{r,n,T}) = \sum_{m=1}^{T}\binom{T}{m} P_E(rm, n), \tag{3.5.3}$$

where

$$P_E(rm, n) = P\{\eta_1(rm, n) \in E, \ldots, \eta_n(rm, n) \in E\}.$$

Thus, to estimate $\mathbf{E}S(A_{r,n,T})$, we need to know the asymptotic behavior of $P_E(rm, n)$. We consider a more general case and obtain the asymptotic behavior of the probabilities

$$P_R(s, n) = P\{\eta_1(s, n) \in R, \ldots, \eta_n(s, n) \in R\},$$

where $R$ is a subset of the set of all nonnegative integers.
The joint distribution of the random variables $\eta_1(s, n), \ldots, \eta_n(s, n)$ can be expressed as a conditional distribution of independent random variables $\xi_1, \ldots, \xi_n$, identically distributed by the Poisson law with an arbitrary parameter $\lambda$, in the
3.5 Hypercycles and consistency of systems of random equations
159
following way (see, e.g., [90]). For any nonnegative integers $s_1, \ldots, s_n$ such that $s_1 + \cdots + s_n = s$,

$$P\{\eta_1(s, n) = s_1, \ldots, \eta_n(s, n) = s_n\} = P\{\xi_1 = s_1, \ldots, \xi_n = s_n \mid \xi_1 + \cdots + \xi_n = s\}.$$

Therefore

$$P_R(s, n) = P\{\eta_1(s, n) \in R, \ldots, \eta_n(s, n) \in R\} = \frac{P\{\xi_1 \in R, \ldots, \xi_n \in R,\ \xi_1 + \cdots + \xi_n = s\}}{P\{\xi_1 + \cdots + \xi_n = s\}} = (P\{\xi_1 \in R\})^n\,\frac{P\{\xi_1 + \cdots + \xi_n = s \mid \xi_1 \in R, \ldots, \xi_n \in R\}}{P\{\xi_1 + \cdots + \xi_n = s\}}.$$
We now introduce independent identically distributed random variables $\xi_1^{(R)}, \ldots, \xi_n^{(R)}$ with the distribution

$$P\{\xi_1^{(R)} = k\} = P\{\xi_1 = k \mid \xi_1 \in R\}, \qquad k = 0, 1, \ldots.$$

It is not difficult to see that

$$P\{\xi_1 + \cdots + \xi_n = s \mid \xi_1 \in R, \ldots, \xi_n \in R\} = P\{\xi_1^{(R)} + \cdots + \xi_n^{(R)} = s\},$$

and therefore

$$P_R(s, n) = (P\{\xi_1 \in R\})^n\,\frac{P\{\xi_1^{(R)} + \cdots + \xi_n^{(R)} = s\}}{P\{\xi_1 + \cdots + \xi_n = s\}}. \tag{3.5.4}$$
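Representation (3.5.4) is an exact identity for every $\lambda > 0$, and it is easy to check numerically. The sketch below (plain Python, with a small brute-force enumeration; the parameter values are arbitrary) compares the right-hand side of (3.5.4) for $R = E$, the even numbers, with the probability that every cell content is even, computed directly over all $n^s$ equiprobable allocations:

```python
import math
from itertools import product

def pE_direct(s, n):
    # P{all cell contents even}: enumerate all n^s equiprobable allocations
    good = 0
    for alloc in product(range(n), repeat=s):
        if all(alloc.count(c) % 2 == 0 for c in range(n)):
            good += 1
    return good / n**s

def pE_poisson(s, n, lam=1.0):
    # right-hand side of (3.5.4) with R = E (even numbers), any lam > 0
    p_in_E = math.exp(-lam) * math.cosh(lam)          # P{xi_1 in E}
    # pmf of xi^(E) = xi_1 conditioned on being even, truncated at s
    pmf = [math.exp(-lam) * lam**k / math.factorial(k) / p_in_E if k % 2 == 0 else 0.0
           for k in range(s + 1)]
    # n-fold convolution of pmf, kept up to degree s
    conv = [1.0] + [0.0] * s
    for _ in range(n):
        conv = [sum(conv[j] * pmf[i - j] for j in range(i + 1)) for i in range(s + 1)]
    poisson_s = math.exp(-lam * n) * (lam * n)**s / math.factorial(s)
    return p_in_E**n * conv[s] / poisson_s

print(pE_direct(4, 3), pE_poisson(4, 3))  # both 0.259259...
```

Since the identity holds for every $\lambda$, changing `lam` must not change the result; this is a useful check that the conditioning has been implemented correctly.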
Let $x = s/n$ and choose the parameter $\lambda$ of the Poisson distribution in such a way that

$$x = \mathbf{E}\xi_1^{(R)} = \sum_{k \in R}\frac{k\,\lambda^k e^{-\lambda}}{k!\,P\{\xi_1 \in R\}}.$$

Let $d$ be the maximal span of the lattice on which the set $R$ is situated, and denote this lattice by $\Gamma_R$.

Theorem 3.5.2. If $s, n \to \infty$ in such a way that $s \in \Gamma_R$, then

$$P_R(s, n) = \frac{d\sqrt{x}}{\sigma}\,(P\{\xi_1 \in R\})^n\left(\frac{x^x e^{\lambda - x}}{\lambda^x}\right)^n(1 + o(1))$$

uniformly in $x = s/n$ in any interval of the form $0 < x_0 \le x \le x_1 < \infty$, where the parameter $\lambda$ of the Poisson distribution of the random variable $\xi_1$ is the root of the equation $x = \mathbf{E}\xi_1^{(R)}$, and $\sigma^2 = \mathbf{D}\xi_1^{(R)}$ (the variance of $\xi_1^{(R)}$).
Proof. The local limit theorem holds for the sum $\xi_1^{(R)} + \cdots + \xi_n^{(R)}$. Following the classical proof of the local limit theorem of Gnedenko [49], we find that if $s, n \to \infty$ in such a way that $s \in \Gamma_R$, then

$$P\{\xi_1^{(R)} + \cdots + \xi_n^{(R)} = s\} = \frac{d}{\sigma\sqrt{2\pi n}}\,(1 + o(1))$$

uniformly in $x = s/n$ in any interval of the form $0 < x_0 \le x \le x_1 < \infty$, where $\sigma^2 = \mathbf{D}\xi_1^{(R)}$ and $d$ is the span of the lattice $\Gamma_R$. When we substitute this expression into (3.5.4), take into account that the sum $\xi_1 + \cdots + \xi_n$ is distributed by the Poisson law with parameter $\lambda n$, and apply Stirling's formula to $s!$, we obtain the assertion of the theorem.

Note that (3.5.4) implies the estimate

$$P_R(s, n) \le (P\{\xi_1 \in R\})^n\,\frac{s!\,e^{\lambda n}}{(\lambda n)^s},$$

where the left-hand side $P_R(s, n)$ does not depend on $\lambda$, so on the right-hand side any positive value can be assigned to this parameter. Let $E = \{0, 2, 4, \ldots\}$. In this case $P\{\xi_1 \in E\} = e^{-\lambda}\cosh\lambda$, and the estimate takes the form

$$P_E(s, n) \le (\cosh\lambda)^n\,\frac{s!}{\lambda^s n^s}, \tag{3.5.5}$$

where $\lambda > 0$ can be chosen arbitrarily.

We now estimate
$$\mathbf{E}S(A_{r,n,T}) = \sum_{m=1}^{T}\binom{T}{m} P_E(rm, n).$$

Lemma 3.5.1. If $r \ge 3$ is fixed and $T, n \to \infty$ in such a way that $T/n \to \alpha$, then for any $\varepsilon > 0$ there exists $\delta > 0$ such that

$$\sum_{1 \le m \le \delta T}\binom{T}{m} P_E(rm, n) < \varepsilon.$$
Proof. First we point out that

$$P\{\xi_1^{(E)} = 2k\} = P\{\xi_1 = 2k \mid \xi_1 \in E\} = \frac{\lambda^{2k}}{(2k)!\cosh\lambda}, \qquad k = 0, 1, \ldots,$$

and hence

$$\mathbf{E}\xi_1^{(E)} = \lambda\tanh\lambda.$$

Put $x = rm/n$ and choose the parameter $\lambda$ of the Poisson distribution in such a way that $x = \lambda\tanh\lambda$. From (3.5.5) it follows that

$$P_E(rm, n) \le (\cosh\lambda)^n\,\frac{(rm)!}{\lambda^{rm} n^{rm}}.$$

Since $x$ is small everywhere in the domain of summation when $\delta$ is sufficiently small, we may assume that $\lambda \le 1$ there. For such $\lambda$,

$$\lambda^2/4 \le x = \lambda\tanh\lambda \le \lambda^2,$$

and therefore

$$\cosh\lambda \le e^{\lambda^2} \le e^{4x}.$$

We now estimate the sum.
Using $\binom{T}{m} \le T^m/m! \le (eT/m)^m$, the Stirling bound $(rm)! \le c\sqrt{rm}\,(rm)^{rm} e^{-rm}$ with an absolute constant $c$, and the inequality $\lambda \ge \sqrt{x} = \sqrt{rm/n}$, we obtain

$$\binom{T}{m} P_E(rm, n) \le \left(\frac{eT}{m}\right)^m e^{4rm}\,\frac{c\sqrt{rm}\,(rm)^{rm} e^{-rm}}{(rm/n)^{rm/2}\,n^{rm}} = c\sqrt{rm}\left(\frac{e^{3r+1}\,r^{r/2}\,T\,m^{r/2-1}}{n^{r/2}}\right)^m.$$

Since $T/n \to \alpha$ and $m \le \delta T$, the expression in parentheses does not exceed

$$e^{3r+1} r^{r/2}\left(\frac{T}{n}\right)^{r/2}\delta^{r/2-1}$$

for all sufficiently large $n$. Because $r \ge 3$, the exponent $r/2 - 1 \ge 1/2$ is positive, so this bound tends to zero as $\delta \to 0$; hence

$$\sum_{1 \le m \le \delta T}\binom{T}{m} P_E(rm, n) \le c\sum_{m \ge 1}\sqrt{rm}\,\bigl(C(r, \alpha)\,\delta^{r/2-1}\bigr)^m,$$

and, since $T/n$ tends to a constant, the last sum can be made arbitrarily small by choosing a sufficiently small $\delta$.
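The estimate (3.5.5), together with the choice $\lambda\tanh\lambda = s/n$ made above, can be checked numerically: the exact value of $P_E(s, n)$ equals $s!\,[z^s](\cosh z)^n/n^s$, since $(\cosh z)^n$ is the exponential generating function of allocations with all cell contents even. A small sanity check in Python (the values of $s$ and $n$ are arbitrary; any $\lambda > 0$ gives a valid bound):

```python
import math

def pE_exact(s, n):
    # P_E(s, n) = s! * [z^s] (cosh z)^n / n^s  (all cell contents even)
    cosh_coeffs = [1.0 / math.factorial(k) if k % 2 == 0 else 0.0 for k in range(s + 1)]
    poly = [1.0] + [0.0] * s             # start with the constant polynomial 1
    for _ in range(n):                    # multiply by cosh z, truncating at degree s
        poly = [sum(poly[j] * cosh_coeffs[i - j] for j in range(i + 1)) for i in range(s + 1)]
    return math.factorial(s) * poly[s] / n**s

def bound_355(s, n, lam):
    # right-hand side of (3.5.5)
    return math.cosh(lam)**n * math.factorial(s) / (lam**s * n**s)

s, n = 6, 4
lam = 1.0
while lam * math.tanh(lam) < s / n:       # crude solve of lam * tanh(lam) = s/n
    lam += 1e-4
assert pE_exact(s, n) <= bound_355(s, n, lam)
print(pE_exact(s, n), bound_355(s, n, lam))
```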
Lemma 3.5.2. If $r$ is fixed and $T, n \to \infty$ in such a way that $T/n \to \alpha < 1$, then for any $\varepsilon > 0$ there exists $\delta > 0$ such that

$$\sum_{(1-\delta)T \le m \le T}\binom{T}{m} P_E(rm, n) < \varepsilon.$$
Proof. Put $\lambda = rm/n$ and let an integer $m_0$ be chosen so that $m_0/T < \delta$. With this choice of $\lambda$, by (3.5.5),

$$\sum_{m=T-m_0}^{T}\binom{T}{m} P_E(rm, n) \le \sum_{m=T-m_0}^{T}\binom{T}{m}(\cosh\lambda)^n\,\frac{(rm)!}{(rm)^{rm}}.$$

Since in the domain of summation $\lambda = rm/n$ is greater than some positive constant, there exists $q < 1$ such that

$$e^{-\lambda}\cosh\lambda = (1 + e^{-2\lambda})/2 \le q.$$

By using the inequality $(rm)! \le c\,(rm)^{rm} e^{-rm}(rT)^{1/2}$, where $c$ is a constant, and the equality $\lambda n = rm$, we obtain

$$\sum_{m=T-m_0}^{T}\binom{T}{m} P_E(rm, n) \le c\,(rT)^{1/2}\sum_{m=T-m_0}^{T}\binom{T}{m}(e^{-\lambda}\cosh\lambda)^n \le c\,(rT)^{1/2}\,q^n\sum_{m=T-m_0}^{T}\binom{T}{m}.$$

Since $q \ge 1/2$, for $m \ge T - m_0$ we have $q^m(1-q)^{T-m} \ge q^T(1-q)^{m_0}$, and hence

$$\sum_{m=T-m_0}^{T}\binom{T}{m} \le \frac{1}{q^T(1-q)^{m_0}}\sum_{m=0}^{T}\binom{T}{m} q^m(1-q)^{T-m} = \frac{1}{q^T(1-q)^{m_0}},$$

so that

$$\sum_{m=T-m_0}^{T}\binom{T}{m} P_E(rm, n) \le c\,(rT)^{1/2}\,\frac{q^{n-T}}{(1-q)^{m_0}} = c\,(rT)^{1/2}\left(\frac{q}{(1-q)^{m_0/(n-T)}}\right)^{n-T}.$$

Since $q < 1$ and $\alpha < 1$, the value $m_0/(n-T)$ can be made arbitrarily small by choosing a sufficiently small $\delta$, and therefore the value $q/(1-q)^{m_0/(n-T)}$ can be made smaller than some $Q < 1$. Thus, for a sufficiently small $\delta$, the right-hand side tends to zero under the conditions of the lemma.
Proof of Theorem 3.5.1. We now estimate the middle part of the sum. As $T/n \to \alpha$ and $\delta \le m/T \le 1 - \delta$, the values $x = rm/n$ lie in an interval of the form $0 < x_0 \le x \le x_1 < \infty$. When we apply Theorem 3.5.2, we obtain, for even $rm$,

$$P_E(rm, n) = \frac{2\sqrt{x}}{\sigma}\,(P\{\xi_1 \in E\})^n\left(\frac{x^x e^{\lambda - x}}{\lambda^x}\right)^n(1 + o(1))$$

uniformly in $x$, $x_0 \le x \le x_1$, where $x = \mathbf{E}\xi_1^{(E)} = \lambda\tanh\lambda$ and $\sigma^2 = \mathbf{D}\xi_1^{(E)} = \lambda^2 + x - x^2$. From $P\{\xi_1 \in E\} = e^{-\lambda}\cosh\lambda$, we obtain the final estimate: as $T, n \to \infty$, $T/n \to \alpha$,

$$P_E(rm, n) = \frac{2\sqrt{x}}{\sigma}\,(\cosh\lambda)^n\left(\frac{x}{\lambda e}\right)^{xn}(1 + o(1)) \tag{3.5.6}$$

uniformly in $m$, $\delta \le m/T \le 1 - \delta$. Setting $p = m/T$, $q = 1 - p$, and using the normal approximation to the binomial distribution, we find that, as $T \to \infty$,

$$\binom{T}{m} = \frac{1}{p^m q^{T-m}\sqrt{2\pi pqT}}\,(1 + o(1))$$

uniformly in $m$, $\delta \le m/T \le 1 - \delta$.
Let $a = T/n$ and write $p = m/T$ in terms of $x = rm/n$ and $a$. Then

$$p = \frac{m}{T} = \frac{x}{ar}, \qquad q = 1 - \frac{m}{T} = \frac{ar - x}{ar},$$

and the estimate of $\binom{T}{m}$ takes the following form: as $T \to \infty$, $\delta \le m/T \le 1 - \delta$,

$$\binom{T}{m} = \left(\frac{(ar)^{ar}}{x^x(ar - x)^{ar - x}}\right)^{n/r}\frac{ar}{\sqrt{2\pi x(ar - x)an}}\,(1 + o(1)) \tag{3.5.7}$$
uniformly in $m$. We combine the estimates (3.5.6) and (3.5.7) and obtain

$$\binom{T}{m} P_E(rm, n) = (f(a, x))^n\,\frac{2ar}{\sigma\sqrt{2\pi(ar - x)an}}\,(1 + o(1)),$$

where

$$f(a, x) = \cosh\lambda\left(\frac{x}{\lambda e}\right)^x\left(\frac{(ar)^{ar}}{x^x(ar - x)^{ar - x}}\right)^{1/r}.$$
The function $f(a, x)$ increases as $a$ increases,

$$f_x(a, x) = f(a, x)\,\log\left(\frac{x}{\lambda}\left(\frac{ar - x}{x}\right)^{1/r}\right) \to -\infty, \qquad x \to 0,$$

and the derivative $f_x(a, x)$ has no more than two zeros. Therefore the system of equations

$$f(a, x) = 1, \qquad f_x(a, x) = 0, \qquad \lambda\tanh\lambda = x \tag{3.5.8}$$

has the unique solution $(\alpha_r, x_r, \lambda_r)$; at this point, the function $f(\alpha_r, x)$, as a function of $x$, attains its maximum, which is equal to 1. Therefore, for all $x$, $0 < x < \alpha_r r$,

$$f(\alpha_r, x) \le f(\alpha_r, x_r) = 1.$$

In addition,

$$f(a, x) < f(\alpha_r, x) \le 1 \quad \text{for } a < \alpha_r, \qquad f(a, x_r) > f(\alpha_r, x_r) = 1 \quad \text{for } a > \alpha_r.$$

This implies that the middle part of the sum tends to zero for $a < \alpha_r$ and tends to infinity for $a > \alpha_r$.
If we combine these bounds with the estimates for the tails of the sum in Lemmas 3.5.1 and 3.5.2, we obtain the assertion of Theorem 3.5.1, because system (3.5.8) is easily transformed to the form given in the statement of the theorem. It would be interesting to find the limit distribution of the number of hypercycles. Up to now, no one has succeeded even in proving that $S(A_{r,n,T})$ tends in probability to infinity as $T, n \to \infty$, $T/n \to \alpha > \alpha_r$.
3.6. Reconstructing the true solution

We consider the system of equations in GF(2)

$$x_{i(t)} + x_{j(t)} = b_t, \qquad t = 1, \ldots, T, \tag{3.6.1}$$

where the pairs $(i(t), j(t))$, $t = 1, \ldots, T$, are independent identically distributed two-dimensional random vectors that take the values $(i, j)$, $i < j$, $i, j = 1, \ldots, n$, with equal probabilities $\binom{n}{2}^{-1}$. In Section 3.1, we interpreted a system similar to (3.6.1) as the result of $T$ trials performed with the aim of classifying $n$ objects by random pairwise comparisons,
and we set $b_t = 0$ if the comparison of $x_{i(t)}$ and $x_{j(t)}$ showed that these objects were from the same class, and $b_t = 1$ otherwise, for $t = 1, \ldots, T$. If the comparisons are not absolutely reliable, then the result of a comparison may deviate from the true value. Suppose that $X^* = (x_1^*, \ldots, x_n^*)$ is the vector of true values of the unknowns, and the column vector $B^* = (b_1^*, \ldots, b_T^*)$ is obtained by substituting $X^*$ into the left-hand side of system (3.6.1):

$$AX^* = B^*, \tag{3.6.2}$$

where $A$ is the matrix of system (3.6.1). If the measurements are not precise, then it is natural to suppose that

$$b_t = b_t^* + \varepsilon_t, \qquad t = 1, \ldots, T,$$

where $\varepsilon_1, \ldots, \varepsilon_T$ are independent identically distributed random variables that do not depend on $A$ and take the values 0 and 1. These random variables can be interpreted as errors. Let

$$p = P\{\varepsilon_1 = 1\} = \frac{1 - \Delta}{2}, \qquad q = P\{\varepsilon_1 = 0\} = \frac{1 + \Delta}{2}, \tag{3.6.3}$$
where $\Delta$ is called the excess. The problem is to estimate, or reconstruct, the vector $X^* = (x_1^*, \ldots, x_n^*)$ on the basis of the matrix $A$ and the right-hand side $B = (b_1, \ldots, b_T)$ of system (3.6.1). In a similar situation over the field of real numbers, an estimate of the true solution of a system of linear equations with perturbed right-hand sides can be found by the least-squares method. Under some conditions on the matrix and the
errors in the right-hand sides, the least-squares method provides an estimate that converges to the true solution as the number of equations tends to infinity. In contrast to the field of real numbers, in GF(2) a good estimate $\hat{X} = (\hat{x}_1, \ldots, \hat{x}_n)$ coincides, with probability tending to 1, with the true solution $X^* = (x_1^*, \ldots, x_n^*)$ as $T \to \infty$.

As usual, we associate the graph $\Gamma_{n,T}$ with the left-hand side of system (3.6.1). The graph $\Gamma_{n,T}$ has $n$ labeled vertices corresponding to the unknowns $x_1, \ldots, x_n$ and $T$ edges $e_t = (i(t), j(t))$, $t = 1, \ldots, T$. The edges $e_1, \ldots, e_T$ are independent and take each of the $n(n-1)/2$ possible values with equal probability; therefore, the graph $\Gamma_{n,T}$ may have multiple edges. It is clear that, along with the vector $X^*$, the vector $\bar{X}^* = (\bar{x}_1^*, \ldots, \bar{x}_n^*)$ with elements $\bar{x}_i^* = x_i^* + 1$, $i = 1, \ldots, n$, satisfies the system (3.6.2). The pair $X^*, \bar{X}^*$ is uniquely determined by the system (3.6.2) if and only if the graph $\Gamma_{n,T}$ is connected; in other words, if the system (3.6.2) contains all the unknowns and does not decompose into subsystems with disjoint sets of unknowns. Denote by $P_{n,T}$ the probability that the graph $\Gamma_{n,T}$ is connected. It follows from Theorem 2.3.8 that if $n, T \to \infty$ in such a way that $2T = n\log n + an + o(n)$, where $a$ is a constant, then

$$P_{n,T} \to e^{-e^{-a}}.$$

Thus, if $n, T \to \infty$ and the pair $X^*, \bar{X}^*$ is to be determined by the system (3.6.2) with probability tending to 1, then $T$ must be at least of order $n\log n$:

$$2T = n(\log n + \omega_n),$$
where $\omega_n \to \infty$.

In this section, we present three algorithms for reconstructing the true solution of system (3.6.1) with perturbed right-hand sides. We first describe the reconstruction method that can be called the voting algorithm. This algorithm consists of correcting the right-hand sides $b_1, \ldots, b_T$ of the system (3.6.1) by the majority rule. Let the system (3.6.1) contain the subsystem of $m_{ij}$, $i < j$, equations

$$x_i + x_j = a_{ij}^{(1)}, \quad \ldots, \quad x_i + x_j = a_{ij}^{(m_{ij})}. \tag{3.6.4}$$

The true value of $a_{ij}^{(1)}, \ldots, a_{ij}^{(m_{ij})}$ equals $a_{ij}^* = x_i^* + x_j^*$. We set $\hat{a}_{ij} = 1$ if

$$a_{ij}^{(1)} + \cdots + a_{ij}^{(m_{ij})} > m_{ij}/2,$$

and $\hat{a}_{ij} = 0$ otherwise.
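A direct simulation of the voting algorithm is straightforward. The sketch below (plain Python; the parameters $n$, $T$, $\Delta$ and the fixed seed are ad hoc choices, not from the text) generates system (3.6.1) with noisy right-hand sides, takes the majority vote within each subsystem (3.6.4), and checks that the votes recover $a_{ij}^* = x_i^* + x_j^*$ for every pair:

```python
import random

def voting_recovery(n, T, delta, seed=1):
    rng = random.Random(seed)
    x_true = [rng.randrange(2) for _ in range(n)]
    p_err = (1 - delta) / 2                     # P{eps_t = 1}, eq. (3.6.3)
    votes = {}                                  # (i, j) -> (m_ij, sum of a_ij^(k))
    for _ in range(T):
        i, j = sorted(rng.sample(range(n), 2))  # uniform pair, i < j
        eps = 1 if rng.random() < p_err else 0
        b = x_true[i] ^ x_true[j] ^ eps
        m, s = votes.get((i, j), (0, 0))
        votes[(i, j)] = (m + 1, s + b)
    # majority decision for every pair, compared with the true value
    return all((1 if 2 * s > m else 0) == x_true[i] ^ x_true[j]
               for (i, j), (m, s) in votes.items()) and len(votes) == n * (n - 1) // 2

# with Delta^2 T well above n^2 log n, recovery is essentially certain
print(voting_recovery(n=6, T=20000, delta=0.3))  # True
```

With these parameters each pair receives on the order of $T\binom{n}{2}^{-1} \approx 1300$ votes, so the majority is wrong with probability that is exponentially small in that count.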
Under some conditions, the system (3.6.1) is indecomposable and $\hat{a}_{ij} = a_{ij}^*$ for all $i, j = 1, \ldots, n$; thus the true solution is reconstructed. Denote by $P(n, T)$ the probability of reconstructing the true solution of system (3.6.1) by the voting algorithm, that is,

$$P(n, T) = P\{\hat{a}_{ij} = a_{ij}^*,\ i, j = 1, \ldots, n\}.$$

Theorem 3.6.1. If $n, T \to \infty$ and $\Delta \to 0$ in such a way that

$$\frac{\Delta^2 T}{n^2\log n} \to \infty,$$

then $P(n, T) \to 1$.
Proof. Let

$$\mu(n, T) = \min_{i<j} m_{ij},$$

where the minimum is over all subsystems of the form (3.6.4). It is clear that

$$P(n, T) = P\{\mu(n, T) \ge m\}\,P_m(n, T) + P\{\mu(n, T) < m\}\,\bar{P}_m(n, T), \tag{3.6.5}$$

where $P_m(n, T)$ and $\bar{P}_m(n, T)$ are the conditional probabilities of reconstructing the true solution under the conditions $\{\mu(n, T) \ge m\}$ and $\{\mu(n, T) < m\}$, respectively.

We obtain a rough estimate for the probability $P\{\mu(n, T) < m\}$. It is clear that $\{\mu(n, T) > m\}$ is the event that each cell contains more than $m$ particles in the classical scheme of allocating $T$ particles into $\binom{n}{2}$ cells. Denote by $\eta_i$ the number of particles in the $i$th cell and put $\zeta_i = 1$ if $\eta_i < m$ and $\zeta_i = 0$ if $\eta_i \ge m$, $i = 1, \ldots, \binom{n}{2}$. By (1.1.1),

$$P\{\mu(n, T) < m\} = P\left\{\zeta_1 + \cdots + \zeta_{\binom{n}{2}} > 0\right\} \le \binom{n}{2}\,P\{\eta_1 < m\}.$$

The random variable $\eta_1$ has the binomial distribution with $T$ trials and probability of success $\binom{n}{2}^{-1}$. Since $a = \mathbf{E}\eta_1 = T\binom{n}{2}^{-1} \to \infty$, the normal approximation is valid for this distribution. We choose $m = a(1 - \Delta)$, assume that $\Delta^3 a \to 0$, and estimate the probability $P\{\eta_1 < m\}$. Taking into account the choice of $m$ and the equality $\mathbf{D}\eta_1 = a(1 + o(1))$, we obtain

$$P\{\eta_1 < m\} = P\left\{\frac{\eta_1 - a}{\sqrt{\mathbf{D}\eta_1}} < \frac{m - a}{\sqrt{\mathbf{D}\eta_1}}\right\} = P\left\{\frac{\eta_1 - a}{\sqrt{\mathbf{D}\eta_1}} < -\Delta\sqrt{a}\,(1 + o(1))\right\} = \frac{1}{\sqrt{2\pi}}\int_{-\infty}^{-\Delta\sqrt{a}} e^{-u^2/2}\,du\,(1 + o(1)).$$
Hence, there exists a constant $c$ such that

$$P\{\eta_1 < m\} \le c\,e^{-\Delta^2 a/2}.$$

Thus, for $m = a(1 - \Delta)$, $P\{\mu(n, T) < m\} \to 0$, because $\Delta^2 a/\log n \to \infty$ and therefore

$$n^2 e^{-\Delta^2 a/2} \to 0. \tag{3.6.6}$$
Now we have to show that, under the conditions of the theorem, $P_m(n, T) \to 1$; in other words, we have to prove that $\hat{a}_{ij} = a_{ij}^*$ for all $i, j = 1, \ldots, n$ with probability tending to 1. The additional requirement of the indecomposability of the system (3.6.1), that is, of the connectedness of the graph $\Gamma_{n,T}$, is obviously fulfilled. Recall that $b_t = b_t^* + \varepsilon_t$. We may assume that in the subsystem (3.6.4),

$$a_{ij}^{(k)} = a_{ij}^* + \varepsilon_{ij}^{(k)}, \qquad k = 1, \ldots, m_{ij},$$

where the random variables $\varepsilon_{ij}^{(k)}$ are independent and have the same distribution as $\varepsilon_1, \ldots, \varepsilon_T$ from the right-hand side of (3.6.1). Denote by $\xi(n, T)$ the number of wrong decisions, that is, the number of realized events $\{\hat{a}_{ij} \ne a_{ij}^*\}$, $i, j = 1, \ldots, n$. Now let $\zeta_{ij} = 1$ if $\varepsilon_{ij}^{(1)} + \cdots + \varepsilon_{ij}^{(m_{ij})} > m_{ij}/2$, and $\zeta_{ij} = 0$ otherwise. It is clear that the number of wrong decisions can be represented in the form

$$\xi(n, T) = \sum_{i<j}\zeta_{ij},$$

and

$$1 - P_m(n, T) \le \binom{n}{2}\,\mathbf{E}(\zeta_{12} \mid \mu(n, T) \ge m) = \binom{n}{2}\,P\{\zeta_{12} = 1 \mid \mu(n, T) \ge m\}. \tag{3.6.7}$$
Now we derive estimates for

$$P\{\zeta_{12} = 1 \mid \mu(n, T) \ge m\} = P\{\varepsilon_{12}^{(1)} + \cdots + \varepsilon_{12}^{(m_{12})} > m_{12}/2 \mid \mu(n, T) \ge m\}.$$

The random variables $\varepsilon_{12}^{(1)}, \ldots, \varepsilon_{12}^{(m_{12})}$ are independent and have the same distribution as the random variables $\varepsilon_1, \ldots, \varepsilon_T$ from the right-hand side of system (3.6.1). We set $S_k = \varepsilon_1 + \cdots + \varepsilon_k$ and estimate

$$P\{S_k > k/2\} = P\{S_k - \mathbf{E}S_k > k\Delta/2\}.$$

Here, and later in this section, we use the following inequality of exponential type for the sum $S_k$, which was proposed by Hoeffding [59] and can be found in [122] (see Theorem 1.1.16): for any positive $\Delta$,

$$P\{S_k - \mathbf{E}S_k > k\Delta/2\} \le e^{-k\Delta^2/2}. \tag{3.6.8}$$
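Inequality (3.6.8) is easy to verify against the exact binomial tail. The check below (plain Python; the values of $k$ and $\Delta$ are arbitrary) computes $P\{S_k > k/2\}$ exactly for $S_k$ with success probability $p = (1-\Delta)/2$ and compares it with the bound $e^{-k\Delta^2/2}$:

```python
import math

def tail_exact(k, delta):
    # P{S_k > k/2} for S_k ~ Binomial(k, (1 - delta)/2), computed exactly
    p = (1 - delta) / 2
    return sum(math.comb(k, j) * p**j * (1 - p)**(k - j)
               for j in range(k // 2 + 1, k + 1))

def hoeffding_bound(k, delta):
    # right-hand side of (3.6.8)
    return math.exp(-k * delta**2 / 2)

for k in (50, 200, 1000):
    for delta in (0.1, 0.3):
        assert tail_exact(k, delta) <= hoeffding_bound(k, delta)
print(tail_exact(200, 0.3), hoeffding_bound(200, 0.3))  # exact tail is well below the bound
```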
Therefore

$$P\{\varepsilon_{12}^{(1)} + \cdots + \varepsilon_{12}^{(m_{12})} > m_{12}/2 \mid \mu(n, T) \ge m\} \le e^{-m\Delta^2/2},$$

and from (3.6.7) we obtain

$$1 - P_m(n, T) \le \binom{n}{2}\,e^{-m\Delta^2/2}. \tag{3.6.9}$$

For $m = a(1 - \Delta)$, $a = T\binom{n}{2}^{-1}$, the right-hand side of (3.6.9) tends to zero under the conditions of Theorem 3.6.1. Thus, the assertion of the theorem follows from (3.6.5), (3.6.6), and (3.6.9).
We now describe the second algorithm for reconstructing the true solution of system (3.6.1), which can be called the method of coordinate testing. We choose a vector $X^{(0)} = (x_1^{(0)}, \ldots, x_n^{(0)})$ by random sampling from the set of all $n$-dimensional vectors over GF(2). Denote by $B^{(0)} = (b_1^{(0)}, \ldots, b_T^{(0)})$ the column vector obtained by substituting $X^{(0)}$ for $X$ in the left-hand side of (3.6.1), and let $\beta(X^{(0)})$ be the number of coordinates of $B^{(0)}$ that coincide with the corresponding coordinates of the vector $B = (b_1, \ldots, b_T)$ of the right-hand sides of system (3.6.1). We construct a vector $X^{(1)} = (x_1^{(1)}, \ldots, x_n^{(1)})$ from $X^{(0)}$ and system (3.6.1) and show that, with probability tending to 1, the vector $X^{(1)}$ coincides with the true solution $X^*$. To this end, we consider the vectors

$$X_{i,0} = (x_1^{(0)}, \ldots, x_{i-1}^{(0)}, 0, x_{i+1}^{(0)}, \ldots, x_n^{(0)}), \qquad X_{i,1} = (x_1^{(0)}, \ldots, x_{i-1}^{(0)}, 1, x_{i+1}^{(0)}, \ldots, x_n^{(0)}),$$

and calculate the values $\beta(X_{i,0})$ and $\beta(X_{i,1})$, defined for the vectors $X_{i,0}$ and $X_{i,1}$ in the same way $\beta(X^{(0)})$ was defined for $X^{(0)}$. For $i = 1, \ldots, n$, let

$$x_i^{(1)} = \begin{cases} 0 & \text{if } \beta(X_{i,0}) \ge \beta(X_{i,1}), \\ 1 & \text{if } \beta(X_{i,0}) < \beta(X_{i,1}). \end{cases}$$

Denote by $\zeta(X)$ the number of coordinates of the vectors $X$ and $X^*$ that coincide. The value

$$\eta(X) = \max(\zeta(X), \zeta(\bar{X})),$$

where $\bar{X} = (\bar{x}_1, \ldots, \bar{x}_n) = (x_1 + 1, \ldots, x_n + 1)$, is called the number of coincidences.
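One pass of coordinate testing is easy to implement. In the sketch below (plain Python; the parameters and the seeded starting point are illustrative choices, not from the text), the initial vector is taken to agree with $X^*$ in all but one coordinate, so that the signal $|2\zeta(X^{(0)}) - n|$ is large, and a single pass recovers $X^*$ exactly:

```python
import random

rng = random.Random(7)
n, T, delta = 8, 20000, 0.3
x_true = [rng.randrange(2) for _ in range(n)]
eqs = []                                        # (i, j, b_t) with noisy right-hand side
for _ in range(T):
    i, j = sorted(rng.sample(range(n), 2))
    eps = 1 if rng.random() < (1 - delta) / 2 else 0
    eqs.append((i, j, x_true[i] ^ x_true[j] ^ eps))

def beta(x):
    # number of right-hand sides reproduced by the candidate vector x
    return sum(1 for i, j, b in eqs if x[i] ^ x[j] == b)

x0 = x_true[:]
x0[0] ^= 1                                      # start with one wrong coordinate
x1 = []
for i in range(n):                              # coordinate testing, one pass
    trial0, trial1 = x0[:], x0[:]
    trial0[i], trial1[i] = 0, 1
    x1.append(0 if beta(trial0) >= beta(trial1) else 1)

print(x1 == x_true)  # True
```

Starting instead from a uniformly random $X^{(0)}$ recovers the pair $X^*, \bar{X}^*$ up to complementation, which is all that system (3.6.1) determines.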
Lemma 3.6.1. If $n \to \infty$, then the distribution of the random variable $(2\eta(X^{(0)}) - n)/\sqrt{n}$ converges weakly to the distribution of the modulus of a random variable that has the normal distribution with parameters $(0, 1)$.

Proof. Since the vector $X^{(0)}$ is chosen from the set of all $n$-dimensional vectors by random sampling with equal probabilities, the random variable $S_n = \zeta(X^{(0)})$ has the binomial distribution with parameters $(n, 1/2)$. By the obvious equality $\zeta(X) + \zeta(\bar{X}) = n$, the random variable $\eta(X^{(0)})$ can be represented in the form

$$\eta(X^{(0)}) = \max(S_n, n - S_n).$$

It is clear that

$$\frac{2\eta(X^{(0)}) - n}{\sqrt{n}} = \max\left(\frac{2S_n - n}{\sqrt{n}}, \frac{n - 2S_n}{\sqrt{n}}\right) = \frac{|2S_n - n|}{\sqrt{n}},$$

and the assertion of Lemma 3.6.1 follows from the convergence of the distribution of $(2S_n - n)/\sqrt{n}$ to the normal distribution with parameters $(0, 1)$. We can now prove the following assertion concerning the algorithm of coordinate testing.
Theorem 3.6.2. If $n, T \to \infty$ and $\Delta \to 0$ in such a way that

$$\frac{\Delta^2 T}{n^2\log n} \to \infty,$$

then

$$P\{X^{(1)} = X^*\} \to 1.$$

Proof. For definiteness, assume that $\zeta(X^{(0)}) \ge \zeta(\bar{X}^{(0)})$. The coordinates of $X^{(0)}$ that coincide with the corresponding coordinates of $X^*$ are called true, whereas those that do not coincide are called wrong. For the algorithm of coordinate testing to lead to the true solution, the following obvious conditions must be fulfilled: for each coordinate of the vector $X^{(0)}$, the value of $\beta(X^{(0)})$ must increase if we replace the wrong value of the coordinate by the true value, and the value of $\beta(X^{(0)})$ must strictly decrease if we replace the true value by the wrong one. We separate all the equations of the system (3.6.1) that contain $x_i$ and denote the number of such equations by $n_i$. Replacing $x_i^{(0)}$ by $\bar{x}_i^{(0)}$ changes the contribution to $\beta(X^{(0)})$ of these equations only, and each equation containing $x_i$ contributes
1 or $-1$. If $x_i^{(0)}$ is wrong, then the increment of $\beta(X^{(0)})$ due to replacing $x_i^{(0)}$ by $\bar{x}_i^{(0)}$ is equal to the random variable $\Delta_i(X^{(0)})$ such that $(\Delta_i(X^{(0)}) + n_i)/2$ has the binomial distribution with parameters $(n_i, p_i)$, where $p_i$ is the probability that the coincidence in a fixed equation containing $x_i$ appears after substituting $\bar{x}_i^{(0)}$ for $x_i^{(0)}$, provided $x_i^{(0)}$ is wrong. It is not difficult to see that

$$p_i = \nu q + (1 - \nu)p, \tag{3.6.10}$$

where $q = P\{b_1^* = b_1\}$, $p = 1 - q$, and $\nu$ is the probability that the second variable in the equation has the true value. Since $x_i^{(0)}$ is wrong, among the remaining $n - 1$ coordinates there are $\zeta(X^{(0)})$ true ones, and the second variable is uniformly distributed over the remaining coordinates; therefore $\nu = \zeta(X^{(0)})/(n - 1)$, where $\zeta(X^{(0)}) = \eta(X^{(0)})$ under the assumption that $\zeta(X^{(0)}) \ge \zeta(\bar{X}^{(0)})$. It follows from (3.6.10) and $q = (1 + \Delta)/2$ that

$$p_i = \frac{1 - \Delta}{2} + \frac{\Delta\,\zeta(X^{(0)})}{n - 1} = \frac12 + \frac{(2\zeta(X^{(0)}) - n + 1)\Delta}{2(n - 1)},$$

which we write as

$$p_i = \frac12 + \frac{|\xi_n|\Delta}{2\sqrt{n}}\,(1 + o(1)), \tag{3.6.11}$$

where

$$\xi_n = \frac{2\zeta(X^{(0)}) - n + 1}{\sqrt{n}}.$$
By Lemma 3.6.1, $\xi_n$ is asymptotically distributed as the modulus of a normal random variable with parameters $(0, 1)$. Therefore

$$P\left\{|\xi_n| > \left(\frac{\Delta^2 T}{n^2\log n}\right)^{-1/4}\right\} \to 1, \tag{3.6.12}$$

because $\Delta^2 T/(n^2\log n) \to \infty$. Next, we find a lower bound for $n_i$, $i = 1, \ldots, n$. To this end, we take into account only the first variable in each equation. Then we obtain the classical scheme of equiprobable allocation of $T$ particles into $n$ cells, and by applying the corresponding results on the distribution of the minimum of the contents of the cells [90], we find that

$$P\left\{\min_{1\le i\le n} n_i > T/(2n)\right\} \to 1. \tag{3.6.13}$$
For the increments $\Delta_i(X^{(0)})$, we have

$$P\{\Delta_i(X^{(0)}) < 0 \mid x_i^{(0)}\ \text{is wrong}\} = P\{(\Delta_i(X^{(0)}) + n_i)/2 < n_i/2 \mid x_i^{(0)}\ \text{is wrong}\} = P\{S_{n_i} < n_i/2\},$$

where $S_{n_i}$ has the binomial distribution with parameters $(n_i, p_i)$. From (3.6.11), we find that

$$P\{S_{n_i} < n_i/2\} = P\left\{S_{n_i} - \mathbf{E}S_{n_i} < -\frac{|\xi_n|\Delta n_i}{2\sqrt{n}}\,(1 + o(1))\right\}.$$

When we use the estimate (3.6.8) of exponential type for the binomial distribution and take into account (3.6.12) and (3.6.13), we obtain

$$P\{S_{n_i} < n_i/2\} \le \exp\left\{-\frac{\Delta^2 T}{4n^2}\left(\frac{\Delta^2 T}{n^2\log n}\right)^{-1/2}\right\}.$$

In a similar way, we obtain the bound

$$P\{\Delta_i(X^{(0)}) > 0 \mid x_i^{(0)}\ \text{is true}\} \le \exp\left\{-\frac{\Delta^2 T}{4n^2}\left(\frac{\Delta^2 T}{n^2\log n}\right)^{-1/2}\right\}.$$
Therefore, an upper estimate for the probability of at least one wrong decision while testing all the coordinates of the vector $X^{(0)}$ is

$$\sum_{i=1}^{n}\bigl(P\{\Delta_i(X^{(0)}) < 0 \mid x_i^{(0)}\ \text{is wrong}\} + P\{\Delta_i(X^{(0)}) > 0 \mid x_i^{(0)}\ \text{is true}\}\bigr) \le 2n\exp\left\{-\frac{\Delta^2 T}{4n^2}\left(\frac{\Delta^2 T}{n^2\log n}\right)^{-1/2}\right\},$$

and this bound tends to zero under the conditions of the theorem because $\Delta^2 T/(n^2\log n) \to \infty$.

With the help of a preliminary search among the $n$-dimensional vectors, it is possible to select an initial vector $X^{(0)}$ with a great number $\eta(X^{(0)})$ of coordinates coinciding with the corresponding coordinates of the true solution $X^*$. If the algorithm of coordinate testing begins with this initial vector, then a much smaller number of equations is needed to reconstruct the true solution; this number is comparable to the number of edges needed for the graph $\Gamma_{n,T}$ to be connected.

Theorem 3.6.3. If $n, T \to \infty$ and $\Delta \to 0$ in such a way that

$$\frac{\Delta^2 T}{n\log n} \to \infty,$$
then there exists an algorithm that reconstructs the true solution of system (3.6.1) with probability tending to 1.
Proof. The algorithm that gives the true solution under the conditions of the theorem begins with a preliminary search for an initial vector $X^{(0)}$ with a large number of coincidences with the true vector $X^*$. The choice of $X^{(0)}$ is determined by a search over all $n$-dimensional vectors. To this end, we choose the level

$$l = Tq - u_T\sqrt{T}, \qquad q = P\{b_t^* = b_t\} = \frac{1 + \Delta}{2}, \qquad u_T = \frac{\Delta\sqrt{T}}{18},$$

and select the vectors $X$ for which $\beta(X) > l$. Recall that $\beta(X)$ is the number of coordinates of the vector $B = (b_1, \ldots, b_T)$ that coincide with the corresponding coordinates of the vector of right-hand sides obtained when $X$ is substituted into the left-hand side of the system. The vector $X^*$ will be selected with probability tending to 1. Indeed,

$$P\{\beta(X^*) \le Tq - u_T\sqrt{T}\} = P\{S_T - \mathbf{E}S_T \le -u_T\sqrt{T}\},$$

where $S_T$ is the number of successes in $T$ independent trials with probability of success equal to $q = (1 + \Delta)/2$. By using estimate (3.6.8), we find that

$$P\{\beta(X^*) \le l\} \le e^{-u_T^2/2},$$

and the complementary probability $P\{\beta(X^*) > l\} \to 1$ because $u_T \to \infty$. If $\zeta(X) = s$, then the probability of the coincidence of a fixed component of the right-hand sides is
$$p(s) = \frac{q\,s(s-1)}{n(n-1)} + \frac{q\,(n-s)(n-s-1)}{n(n-1)} + \frac{2s(1-q)(n-s)}{n(n-1)},$$

and, since $q = (1 + \Delta)/2$, we find

$$p(s) = \frac12 + \frac{\Delta\bigl((2s - n)^2 - n\bigr)}{2n(n-1)}.$$
For example, let $n/3 \le s \le 2n/3$. Then $p(s) \le 1/2 + \Delta/9$, beginning with some $n$, and for any fixed $X$ with $n/3 \le \zeta(X) = s \le 2n/3$,

$$P\{\beta(X) > l\} = P\{S_T > Tq - u_T\sqrt{T}\} \le P\left\{S_T - \mathbf{E}S_T > \frac{7\Delta T}{18} - u_T\sqrt{T}\right\} = P\{S_T - \mathbf{E}S_T > \Delta T/3\},$$

where $S_T$ is the number of successes in $T$ independent trials with probability $p(s)$ of success. By using the inequality (3.6.8) of exponential type, we find

$$P\{\beta(X) > l\} \le e^{-\Delta^2 T/18}.$$
The probability that at least one of the vectors $X$ with $n/3 \le \zeta(X) \le 2n/3$ will be selected does not exceed $2^n e^{-\Delta^2 T/18}$, and under the conditions of the theorem this probability tends to zero. Thus, with the help of the exhaustive search, it is possible to select, with probability tending to 1, a vector $X^{(0)}$ such that $\eta(X^{(0)}) > 2n/3$. Beginning the algorithm of coordinate testing with this vector $X^{(0)}$ and using the notation introduced in the proof of Theorem 3.6.2, we find that

$$P\{\Delta_i(X^{(0)}) < 0 \mid x_i^{(0)}\ \text{is wrong}\} = P\{S_{n_i} - \mathbf{E}S_{n_i} < -\Delta|\xi_n| n_i/(2\sqrt{n})\}.$$

Using estimate (3.6.8) and taking into account that, with probability tending to 1, $|\xi_n| \ge \sqrt{n}/3$ for the selected vector and $n_i > T/(2n)$, we find the estimate

$$P\{\Delta_i(X^{(0)}) < 0 \mid x_i^{(0)}\ \text{is wrong}\} \le P\{S_{n_i} - \mathbf{E}S_{n_i} < -\Delta n_i/6\} \le e^{-\Delta^2 T/(36n)}.$$

Similarly, we obtain

$$P\{\Delta_i(X^{(0)}) > 0 \mid x_i^{(0)}\ \text{is true}\} \le e^{-\Delta^2 T/(36n)}.$$

As in the proof of Theorem 3.6.2, an upper bound for the probability of at least one wrong decision, while all $n$ coordinates of $X^{(0)}$ are tested, is $2n e^{-\Delta^2 T/(36n)}$, and it tends to zero under the conditions of the theorem. Thus, if we use the exhaustive search, then the true solution can be reconstructed
under the condition $\Delta^2 T/(n\log n) \to \infty$. If the number of equations $T$ is such that $\Delta^2 T/(n^2\log n) \to \infty$, then the reconstruction can be realized by the voting algorithm, which is more economical with respect to the number of operations. Clearly, there is considerable interest in algorithms that lead to the true solution with probability tending to 1 under intermediate conditions on the number of equations $T$ and do not require the exhaustive search over all $2^n$ vectors.

Let us describe an algorithm that will be referred to as $A_2$. Consider all $\binom{T}{2}$ equations obtained as the pairwise sums of the equations of the system (3.6.1). Among the equations obtained by this operation, there are equations that contain either four, or two, or zero unknowns each. Denote by $S_2$ the subsystem that consists of all the equations with exactly two unknowns. The algorithm $A_2$ ends with the application of the voting algorithm to the subsystem $S_2$. The following theorem gives the conditions under which the algorithm $A_2$ reconstructs the true solution.
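The construction of $S_2$ and the final vote are easy to simulate. In the sketch below (plain Python; the parameters and seed are illustrative choices, not from the text), two equations whose variable pairs share exactly one unknown are summed to produce a two-unknown equation, and the majority vote over $S_2$ is checked against the true values $x_i^* + x_j^*$:

```python
import random
from itertools import combinations

rng = random.Random(3)
n, T, delta = 6, 800, 0.6
x_true = [rng.randrange(2) for _ in range(n)]
eqs = []
for _ in range(T):
    i, j = sorted(rng.sample(range(n), 2))
    eps = 1 if rng.random() < (1 - delta) / 2 else 0
    eqs.append((i, j, x_true[i] ^ x_true[j] ^ eps))

# algorithm A2: pairwise sums of equations sharing exactly one unknown
votes = {}                                  # (i, j) -> (count, sum of right-hand sides)
for (i1, j1, b1), (i2, j2, b2) in combinations(eqs, 2):
    if len({i1, j1} & {i2, j2}) == 1:
        i, j = sorted({i1, j1} ^ {i2, j2})  # the two remaining unknowns
        m, s = votes.get((i, j), (0, 0))
        votes[(i, j)] = (m + 1, s + (b1 ^ b2))

ok = all((1 if 2 * s > m else 0) == x_true[i] ^ x_true[j]
         for (i, j), (m, s) in votes.items())
print(ok, len(votes) == n * (n - 1) // 2)  # True True
```

Note that the summed right-hand sides carry the smaller excess $\Delta^2$, which is why $A_2$ needs the stronger condition on $T$ stated in the theorem below.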
Theorem 3.6.4. If $n, T \to \infty$ and $\Delta \to 0$ in such a way that

$$\frac{\Delta^4 T^2}{n^3\log n} \to \infty,$$

then the algorithm $A_2$ reconstructs the true solution with probability tending to 1.
Proof. Let $i$ and $j$ be arbitrary, $i < j$, and consider all equations of the system $S_2$ of the form

$$x_i + x_j = b_{ij}^{(1)}, \quad \ldots, \quad x_i + x_j = b_{ij}^{(m_{ij})}. \tag{3.6.14}$$

The equality $m_{ij} = m$ means that the graph $\Gamma_{n,T}$ corresponding to system (3.6.1) contains exactly $m$ vertices, say $v_1, \ldots, v_m$, such that $\Gamma_{n,T}$ contains the edges $(v_1, i), (v_1, j), \ldots, (v_m, i), (v_m, j)$. The right-hand sides $b_{ij}^{(1)}, \ldots, b_{ij}^{(m_{ij})}$ are pairwise sums of $2m_{ij}$ independent random variables; therefore, they are independent and, according to Lemma 3.2.1, take the true value $b_{ij}^* = x_i^* + x_j^*$ with probability $(1 + \Delta^2)/2$ and the wrong value with probability $(1 - \Delta^2)/2$. Let $\hat{b}_{ij} = 1$ if $b_{ij}^{(1)} + \cdots + b_{ij}^{(m_{ij})} > m_{ij}/2$, and $\hat{b}_{ij} = 0$ otherwise. As in the proof of Theorem 3.6.1, we denote by $\mu(n, T)$ the minimum value of $m_{ij}$ over all subsystems of the form (3.6.14). As in (3.6.5), the probability $P(n, T)$ of reconstructing the true solution can be represented in the form

$$P(n, T) = P\{\mu(n, T) \ge m\}\,P_m(n, T) + P\{\mu(n, T) < m\}\,\bar{P}_m(n, T), \tag{3.6.15}$$

where $P_m(n, T)$ and $\bar{P}_m(n, T)$ are the conditional probabilities of reconstructing the true solution by the majority rule under the conditions $\{\mu(n, T) \ge m\}$ and $\{\mu(n, T) < m\}$, respectively. As in the proof of Theorem 3.6.1, we need to estimate $P\{\mu(n, T) < m\}$, but here this estimation is more laborious.
Let $\zeta_{ij} = 1$ if $m_{ij} \le m$ and $\zeta_{ij} = 0$ if $m_{ij} > m$, $i < j$, $i, j = 1, \ldots, n$. It is clear that

$$P\{\mu(n, T) \le m\} = P\left\{\sum_{i<j}\zeta_{ij} > 0\right\} \le \binom{n}{2}\,P\{m_{12} \le m\}. \tag{3.6.16}$$

Let $\mu_i = 1$ if both of the edges $(i + 2, 1)$ and $(i + 2, 2)$ occur in $\Gamma_{n,T}$, and $\mu_i = 0$ otherwise; let $\nu_i = 1$ if exactly one of the edges $(i + 2, 1)$, $(i + 2, 2)$ occurs in $\Gamma_{n,T}$, and $\nu_i = 0$ if neither of them occurs. The random variable $m_{12}$ can be represented as the following sum of indicators:

$$m_{12} = \mu_1 + \cdots + \mu_{n-2},$$
and

$$P\{m_{12} \le m\} = \sum_{k=0}^{m}\ \sum_{\{i_1,\ldots,i_k\}} p_{i_1\ldots i_k}, \tag{3.6.17}$$

where $p_{i_1\ldots i_k}$ is the probability that $\mu_{i_1}, \ldots, \mu_{i_k}$ take the value 1 and all the other indicators $\mu_i$ take the value 0. It is not difficult to see that $(M_t, N_t)$, where

$$M_t = \mu_1 + \cdots + \mu_t, \qquad N_t = \nu_1 + \cdots + \nu_t,$$

is a Markov chain, because $(\mu_{t+1}, \nu_{t+1})$ depends only on the number of edges used to construct the random variables $\mu_1, \ldots, \mu_t, \nu_1, \ldots, \nu_t$, $t = 1, \ldots, n - 2$. More precisely, let

$$p(t \mid y_{t-1}, z_{t-1}) = P\{\mu_t = 1 \mid M_{t-1} = y_{t-1},\ N_{t-1} = z_{t-1}\},$$
$$q(t \mid y_{t-1}, z_{t-1}) = P\{\mu_t = 0 \mid M_{t-1} = y_{t-1},\ N_{t-1} = z_{t-1}\}.$$

In this notation, the probability that $\mu_{i_1}, \ldots, \mu_{i_k}$ take the value 1 and all the other indicators take the value 0, conditioned on $\nu_1 = z_1, \ldots, \nu_{n-2} = z_{n-2}$, can be written in the form

$$p_{i_1\ldots i_k} = q(1 \mid Y_0, Z_0)\cdots q(i_1 - 1 \mid Y_{i_1-2}, Z_{i_1-2})\,p(i_1 \mid Y_{i_1-1}, Z_{i_1-1})\,q(i_1 + 1 \mid Y_{i_1}, Z_{i_1})\cdots q(n - 2 \mid Y_{n-3}, Z_{n-3}),$$

where $Z_0 = Y_0 = 0$, $Z_t = z_1 + \cdots + z_t$, and $Y_t$ is the number of the indices $i_1, \ldots, i_k$ that do not exceed $t$. We now estimate the probabilities $p(t \mid Y, Z)$ and $q(t \mid Y, Z)$. It is clear that $p(t \mid Y, Z) + q(t \mid Y, Z) = 1$, and the probability $p(t \mid Y, Z)$ does not depend on $t$ and equals the probability $p_2(s, N)$ that the two fixed places corresponding to the
edges $(1, t)$ and $(2, t)$ will be occupied after allocating $s = T - 2Y - Z$ edges into $N = \binom{n}{2} - Z$ places in the classical scheme of allocation of particles. Therefore

$$p_2(s, N) = \sum_{k \ge 1,\ l \ge 1}\frac{s!}{k!\,l!\,(s - k - l)!}\,\frac{1}{N^{k+l}}\left(1 - \frac{2}{N}\right)^{s-k-l},$$

and we have the two-sided estimate

$$\frac{s(s-1)}{N^2}\left(1 - \frac{2}{N}\right)^{s-2} \le p_2(s, N) \le \frac{s(s-1)}{N^2}.$$

Since

$$T - 3n \le s = T - 2Y - Z \le T, \qquad \binom{n}{2} - n \le N \le \frac{n(n-1)}{2},$$

we obtain, for all $k = 0, 1, \ldots, n - 2$,

$$p_{i_1\ldots i_k} \le P^k Q^{n-k-2},$$

where

$$P = \max_t p(t \mid Y_{t-1}, Z_{t-1}), \qquad Q = 1 - \min_t p(t \mid Y_{t-1}, Z_{t-1}).$$

Therefore, it follows from (3.6.16) and (3.6.17) that
$$P\{m_{12} \le m\} \le (P + Q)^{n-2}\,P\{\hat\xi_1 + \cdots + \hat\xi_{n-2} \le m\},$$

where $\hat\xi_1, \ldots, \hat\xi_{n-2}$ are independent identically distributed random variables with

$$P\{\hat\xi_1 = 1\} = \frac{P}{P + Q}, \qquad P\{\hat\xi_1 = 0\} = \frac{Q}{P + Q}.$$

As $n, T \to \infty$ and $T\binom{n}{2}^{-1} \to 0$,

$$\frac{P}{P + Q} = \frac{4T^2}{n^4}\,\bigl(1 + O(T/n^2)\bigr),$$

and under the conditions of the theorem,

$$(P + Q)^{n-2} = 1 + o(1). \tag{3.6.18}$$

The random variable $\hat\xi_1 + \cdots + \hat\xi_{n-2}$ has the binomial distribution with parameters $(n - 2,\ P/(P + Q))$.
Let $a = (n - 2)P/(P + Q)$ and $m = a(1 - \Delta^2)$. We assume that $T$ is not too large, so that $\Delta^6 a \to 0$. Then, for sufficiently large $n$, the normal approximation to the binomial distribution gives

$$P\{\hat\xi_1 + \cdots + \hat\xi_{n-2} \le m\} = \frac{1}{\sqrt{2\pi}}\int_{-\infty}^{-\Delta^2\sqrt{a}\,(1+o(1))} e^{-u^2/2}\,du\,(1 + o(1)),$$

and there exists a constant $c$ such that

$$P\{\hat\xi_1 + \cdots + \hat\xi_{n-2} \le m\} \le c\,e^{-\Delta^4 a/8}. \tag{3.6.19}$$

Thus, by virtue of (3.6.16), (3.6.18), and (3.6.19),

$$P\{\mu(n, T) < m\} \le c\,n^2 e^{-\Delta^4 a/8} \to 0, \tag{3.6.20}$$

because, under the conditions of the theorem, $\Delta^4 T^2/(n^3\log n) \to \infty$ and $a = (4T^2/n^3)(1 + o(1))$, and consequently $n^2 e^{-\Delta^4 a/8} \to 0$.
As in the proof of Theorem 3.6.1, it remains to show that, under the conditions of Theorem 3.6.4, the system $S_2$ is indecomposable and $P_m(n, T) \to 1$; in other words, we have to show that $\hat{b}_{ij} = b_{ij}^*$ for all $i, j = 1, \ldots, n$ with probability tending to 1. By the same reasoning as in the proof of Theorem 3.6.1, for $m = a(1 - \Delta^2)$, we obtain

$$1 - P_m(n, T) \le \binom{n}{2}\,e^{-m\Delta^4/2}, \tag{3.6.21}$$

and under the conditions of Theorem 3.6.4, the right-hand side of (3.6.21) tends to zero. The assertion of the theorem follows from (3.6.15), (3.6.20), and (3.6.21).
3.7. Notes and references

The theory of systems of random equations in finite fields was developed by the Russian mathematicians V. E. Stepanov, G. V. Balakin, I. N. Kovalenko, A. A. Levitskaya, and others. The connection between systems of equations in GF(2) and graphs was first pointed out and used by Stepanov. The notion of a critical set was introduced in [79] (see also [13] and [85]). The theory of recurring sequences and shift registers mentioned in Section 3.1 can be found in [50] and [156]. Theorems 3.2.1 and 3.2.2 were proved by Kovalenko in [92]. This brilliant result initiated a series of investigations of similar problems that were carried out by Kovalenko and his school. These investigations developed in two directions. The first direction concerns extensions of Theorems 3.2.1 and 3.2.2 to matrices
over more general algebraic structures. It is not difficult to see that, by virtue of the Markovian character of the process $\rho_n(t)$, a recurrence relation for $p_{n,T}(k) = P\{\rho_n(T) = k\}$ can be derived and used for the proof of Theorem 3.2.1. In this way, the extension of the result to a finite field with $q$ elements can be easily obtained [93]. Let the elements of the $T \times n$ matrix $A = \|a_{t,j}\|$ over GF($q$) take the values $0, 1, \ldots, q - 1$ with equal probabilities; then the probabilities $p_{n,T}(k)$, $k = 0, 1, \ldots$, satisfy the equation

$$p_{n,T}(k) = z^n p_{n,T-1}(k) + (1 - z^n)\,p_{n-1,T-1}(k - 1), \tag{3.7.1}$$

where $z = 1/q$. Indeed, if the first row of $A$ is the zero vector, then $\rho_n(T) = \rho_n(T - 1)$, and if the row contains at least one nonzero element, then $\rho_n(T) = \rho_{n-1}(T - 1) + 1$. It follows from (3.7.1) that if $s \ge 0$ and $m$ are fixed integers, $m + s \ge 0$, $n \to \infty$, and $T = n + m$, then

$$P\{\rho_n(T) = n - s\} \to q^{-s(m+s)}\prod_{i=s+1}^{\infty}\bigl(1 - q^{-i}\bigr)\prod_{i=1}^{m+s}\bigl(1 - q^{-i}\bigr)^{-1}. \tag{3.7.2}$$
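Recurrence (3.7.1) and the limit (3.7.2) can be checked against each other numerically. The sketch below (plain Python, $q = 2$; the truncation of the infinite product at 200 factors and the choice $n = 40$ are ad hoc) computes the rank distribution by dynamic programming and compares $P\{\rho_n(n+m) = n - s\}$ with the limiting value:

```python
def rank_dist(n_max, T_max, q=2):
    # p[n][T][k] = P{rank of a random T x n matrix over GF(q) equals k}, via (3.7.1)
    z = 1.0 / q
    p = [[[0.0] * (n_max + 1) for _ in range(T_max + 1)] for _ in range(n_max + 1)]
    for n in range(n_max + 1):
        p[n][0][0] = 1.0                       # no rows: rank 0
    for n in range(n_max + 1):
        for T in range(1, T_max + 1):
            for k in range(n_max + 1):
                zero_row = z**n * p[n][T - 1][k]
                nonzero = (1 - z**n) * (p[n - 1][T - 1][k - 1] if n > 0 and k > 0 else 0.0)
                p[n][T][k] = zero_row + nonzero
    return p

def limit_372(s, m, q=2, terms=200):
    # right-hand side of (3.7.2)
    val = float(q)**(-s * (m + s))
    for i in range(s + 1, terms):
        val *= 1 - float(q)**(-i)
    for i in range(1, m + s + 1):
        val /= 1 - float(q)**(-i)
    return val

n, m = 40, 1
p = rank_dist(n, n + m)
for s in (0, 1, 2):
    print(s, p[n][n + m][n - s], limit_372(s, m))  # the two columns agree closely
```

The convergence in $n$ is geometric, so already at $n = 40$ the dynamic-programming values and the limit (3.7.2) are indistinguishable at double precision.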
The investigations in the second direction concern the bounds of invariance of the results of Theorems 3.2.1 and 3.2.2 with respect to deviations of the distribution of the elements of the matrix $A$ from the equiprobable distribution. The problem of invariance and a proof of Theorem 3.2.3 are given in [91, 92]. A modified proof of Theorem 3.2.3 is contained in [93]. Theorem 3.2.4 can easily be extended to any moment of fixed order of the number of solutions, but that is not sufficient for the proof of the invariance property, since the limit distribution (3.7.2) does not satisfy the sufficient conditions for the unique reconstruction of a distribution from its moments; hence, Theorem 1.1.3 cannot be applied. Levitskaya [96, 97] presents results on the number of solutions of random linear systems over arbitrary rings and the corresponding results on the invariance of the moments and the limit distributions. These results are summarized in [93], where, in particular, the exact bounds of invariance are given for random linear systems over arbitrary finite rings. For the system considered in Theorem 3.2.3, the exact bounds for $p$ have the form

$$\delta_n \le p \le 1 - \delta_n,$$

where $\delta_n = (\log n + x_n)/n$ and $x_n \to \infty$ arbitrarily slowly as $n \to \infty$.

Matrices that satisfy condition (3.3.1) were considered by Balakin [12], who also proved Theorems 3.3.1 and 3.3.2. A closer investigation of the estimates used in our proof of Theorem 3.3.1 allows us to obtain the following assertions.

Theorem 3.7.1. If $n \to \infty$,
T =n+0 logn,
3.7 Notes and references
179
,B - -oo, f3n = o(n/ log n), and condition (3.3.1) holds, then the distribution of s(A) converges to the Poisson distribution with parameter e-x. Theorem 3.7.2.
If n ---> oo,
T = n + 0 log n + o(log n), fl is a constant, and condition (3.3.1) holds, then the distribution of s(A) converges
to the Poisson distribution with parameter e-x if (3 < 0, and with parameter
e-x-0
if (3 > 0.
Theorems 3.3.1, 3.3.2, 3.7.1, and 3.7.2 give a complete description of the behavior of the rank of such matrices, except for the case (3 = 0, where the behavior is unknown. Note that in [12], the analogues of Theorems 3.3.1, 3.7.1, and 3.7.2 are proved for the systems over GF(q), q > 2 (see also [86]), and the connection between the rank of a matrix in GF(q) and other characteristics such as the permanent rank and rank of lines is considered. The initial results on the ranks of random matrices are presented in [38] and [11]. Stepanov began investigating systems of linear equations of the form (3.4.1) with the help of their relations to random graphs. In particular, he proved The-
orems 3.4.1 and 3.4.2. Now the theory of random graphs provides a basis for obtaining the results on the systems of random equations with coefficients taking their values with equal probabilities. If the coefficients of a system are essentially nonequiprobable, then there are no standard approaches to investigating its properties. Only a few results are known for such systems. We remark that at this time, graph theory is not sufficiently developed to answer questions about nonequiprobable cases. Only the method of moments (see Theorem 1.1.3) and the so-called direct methods are used to solve these problems. Theorem 3.4.3 is a corollary to Theorem 2.4.1 proved in [88] by the method of moments. Theorems 3.4.4 and 3.4.5 are proved in [83]. The asymptotics of the probability of consistency of a system of linear equations in GF(2) (and in more general algebraic structures) with independent random coefficients that take the values 0 and 1 with equal probabilities have been obtained by Levitskaya [98] (see also
[93]). This probability takes only two values and is the same for all possible right-hand sides of the system that are not the zero vector. It follows from Theorems 3.4.4 and 3.4.5 that the probability of consistency of the system (3.4.1) depends on the number of l's in the vector of the right-hand sides of the system (see also [83]). The results of Section 3.5 on the behavior of the probability of consistency of the system (3.5.1) can be found in [13] (see also [85]). Theorem 3.5.1 is proved by the author, but the critical values ca, were first obtained by Balakin under slightly different assumptions on the matrix Ar,,,,T. These results are extended to GF(q) in [89]. The proof of Theorem 3.5.2 is given in [87].
180
Systems of random linear equations in GF(2)
We can consider the probability of the consistency of a system from the point of view of mathematical statistics. Consider, for example, the system (3.4.1) and assume the following two hypotheses on the distribution of the right-hand sides of the system. Let the hypothesis Ho be the existence of a vector X* = (xl , ... , xn
which is interpreted as the true solution of the system, and bt = x*t) + x*(t), t = 1, ... , T. Under hypothesis Ho, system (3.4.1) is always consistent. Under the alternative hypothesis Ht, the right-hand sides bl, ..., bT are independent random variables that are independent of the left-hand side of the system and take the values 0 and 1 with equal probabilities. To distinguish between the hypotheses Ho and H1, we can use the consistency of the system as a test: If the system is consistent, we accept the hypothesis Ho, and we accept HI otherwise. Therefore the hypothesis Ho is never rejected if it is true, and the error of the first kind, the probability of rejecting Ho if it is true, is zero. The error of the second kind, the probability of accepting Ho if it is wrong, is equal to the probability of consistency of the system (3.4.1). Thus, the probability of consistency is the main characteristic in the statistical problem of testing the hypotheses Ho and H1. Section 3.6 is devoted to the other statistical problems that consist of recon-
structing the true solution on the basis of a system of random equations with distorted right-hand sides. These results can be found in the paper [84].
4
Random permutations
4.1. Random permutations and the generalized scheme of allocation 1, 2, ... , n) into Denote by S the set of all one-to-one mappings of the set X itself. This set contains n ! elements. We consider a random permutation or that equals any element of S with probability (n!)-1. A permutation s E S can be written as (1
2
n
S
S1, S2,
,
Sn
where sk is the image of k under the mapping s, k = 1, ... , n. The mapping s can be represented also by the graph Fn(3) = r(Xn, Wn) whose vertex set is Xn, and the edge set Wn consists of the arcs (k, sk) directed from k to sk, k = 1, . . . , n. Since exactly one arc enters each vertex and exactly one arc emanates from each vertex, the graph rns' consists of the connected components that are cycles, which are called the cycles of the permutation s. Denote by Fn the random graph corresponding to the random permutation a, s I_ which takes the values s with equal probabilities. It is obvious that P{ r, = rn
(n!)-1
In Section 1.3, we showed that the generalized scheme of allocation introduced in Section 1.2 can be applied to a wide class of problems related to the behavior of the connected components of random graphs. In Example 1.3.1, we showed that the generalized scheme can be used in the study of random permutations. Recall that in the generalized scheme, we separate the subset of graphs with exactly N components, assign one of the N! possible orders to the set of these components, and denote by 711, ... , ?IN the sizes of the components. If there exist nonnegative identically distributed random variables 1, ,N 181
Random permutations
182
such that for any integers k1, ... , kN,
P(iji
I
1+...4N=n}, (4.1.1)
we say that the generalized scheme determined by the random variables 41, , SAN is applied to the random graph. As was shown in Example 1.3.1, the generalized scheme that corresponds to the
random graph r, of a random permutation from Sn is determined by the random variables 1, ... , N with the distribution k
P141 = k) =
k = 1, 2... ,
k log(1 - x)'
0 < x < 1,
(4.1.2)
since the number of elements in Sn is an = n ! and the number of connected realizations of the random graph rn is bn = (n - 1)!. For the random permutations, the corresponding generating functions have the form
-
00
A(x) _ E anxn = n! -X' n=0 00
bn
B(x) = Y n = -log(1 - x). n=0
Thus the study of various characteristics of random permutations can be accomplished with the help of the generalized scheme. This is demonstrated for the most part in [78]. Recall some combinatorial identities that follow from the general results of Section 1.3. Let vn be the number of cycles in a random permutation from Sn. Lemma 1.3.3 gives the equality
P{vn=N}=
(
(4.1.3)
N
Denote by a, the number of cycles of length r in a random permutation from Sn, r = 1, ... , n. According to Lemma 1.3.7, for any nonnegative integers
m1,...,mn, n
P{at = m1, ..., an = mn} _ H r=1
if m 1 + 2m2 +
1
rmrmr!
+ mm, = n, and the probability is zero otherwise.
(4.1.4)
4.2 The number of cycles
183
Let us introduce the generating function 00
P{ai =ml,...,an =mn}tl' ...tnn
0n(tl,...,tn) _ ml,...,mn
ynl( ..mn!
(2)m2...(tn)mn
(1)m'
where the summation is over the set of integers
Mn ={m1 > 0, i = 1,...,n,
=n}.
Put wo = 0. It is not difficult to see that cpn (t1, ... , tn) is the coefficient of un in the expansion of exp{ut1 + u2t2/2 + }: 00
00
(Pn(t1,
(p(u, tl, t2, ...) =
... , tn)un = exp
n=0
i[n to
(4.1.5)
n=1
The generating function (4.1.5) was obtained by Goncharov and was the basis of his pioneering investigations of random permutations [53]. In [78], the approach based on the generalized scheme of allocations was used in such investigations. In the next sections, we will present some examples of how the generalized scheme of allocation can be applied to random permutations. This will supplement the investigations presented in [78].
4.2. The number of cycles It is well known that the number of cycles vn in a random permutation from Sn is asymptotically normal with parameters (log n, log n) as n -> oo. More precisely,
as n - oo, P{vn = NJ =
1
27r log n
e "2/2(1 + o(1))
(4.2.1)
uniformly in the integers N such that u = (N - log n)/ log n lies in any fixed finite interval. The approach based on the generalized scheme of allocation makes it possible to obtain the asymptotics of the probability P{vn = NJ for all possible values of
N = N(n) as n - oo. According to (4.1.3), for any integer N,
P{vn = N} =
(-loN'xn x))T
P{i;l
n},
(4.2.2)
where the parameter x can be taken arbitrarily from the interval (0, 1), and 1, ... , N are independent identically distributed random variables with distribution (4.1.2).
Random permutations
184
Thus, to study the asymptotic behavior of the distribution of v,,, it is sufficient to obtain the corresponding local limit theorems for the sum
N =6 + ... + N' where the parameter x in the distribution of the summands can be chosen so that obtaining the local theorems becomes simple. We begin with x = 1 - 1/n and prove a series of limit theorems that make it possible to describe the behavior of the probability P{vn = N} for the values of N not too far from log n.
If n - oo, N = y log n + o(log n), where y is a constant,
Theorem 4.2.1.
0 < y <00, then 1
=k} =
ni'(Y)zy-le_Z(1
+o(1))
uniformly in the integers k such that z = k/n lies in any interval of the form 0 < zp < z < zl and zo and zl are constants. Before proving the theorem, we obtain some auxiliary results. We have chosen
x = 1 - 1/n. For such x, 1/n)k k} _ (1 klogn
k = 1, 2, ... ,
(4.2.3)
and the characteristic function of the random variable l;1 equals ten(t)=-llog(1-eit+ 1eit
lo g n
.
n
\
)
Represent tpn (t) in the form
=
- logn ( l og
( - it\ +log (1 + ) *l (t) + *2 (t))) -1-
>
(4 2 4) .
.
where r 1 (t) =
1
it -
it _ it /n - it '
2 (t) = n(1/n -l
it).
For *1(t) and *2(t), the following estimates are valid:
i'1(t)1
1
2 (t) I
leit - 1 - its < Iti 2'
I ti
leit- 11 nltl
-
(4.2.5)
1 n.
(4 2 6) .
.
4.2 The number of cycles
185
By using the explicit form of tpn (t), the representation (4.2.4) and the bounds (4.2.5) and (4.2.6), we obtain the following estimates of cpn (t).
If n oo, N = y log n + o(log n), where y is a constant, 0 < y < oc, then for any fixed t, Lemma 4.2.1.
N co
_t
1
n
(1 - it)y
Lemma 4.2.2. If n oo, N = ylog n + o(logn), where y is a constant, 0 < y < oo, then there exist positive constants s and c such that for I t/n I < s,
On (t/n)I < cltI-Y. Lemma 4.2.3. If n -+ oo, then for 0 < s < ItI < r, where a is an arbitrary constant, there exists a constant c such that for sufficiently large n,
Icpn(t)i < c/logn. Lemma 4.2.4.
If n
oo, then there exists a positive constant s such that for
It/ni <E, 2
(1+ItI)logn As follows from Lemma 4.2.1, as n - oo and N = y log n +o(log n), where y is a constant, 0 < y < oo, the distributions of the normalized sums N/n converge
to the gamma distribution with characteristic function (1 - it)-y and density zy-1e-Z/ I'(y), z > 0. Actually, as stated in Theorem 4.2.1, these distributions become close locally.
Proof of Theorem 4.2.1. By the inversion formula, the probability
k} = PRN/n = z} can be represented in the form 1
n = z} =
27rn
an
e-itzcn
f7rn
(tl n) dt,
and 1
1'(Y) z
y-1 e z =
foo
a-itz
(1 - it)y
dt.
Hence,
2,rnP{r;N/n=z}-2ne-z=I1+I2+13+I4,
Random permutations
186
where A
I1 =
J
e-ttz(cn (t/n) - (1 - it)-y) dt, A
e-ttzco (t/n) dt,
12 = f
13 = J
e-itzcn (t/n) dt, n<jtI
f
14
e-itz(l - it)-y dt,
with the constants e and A to be chosen later.
By Lemma 4.2.1, cpn (t/n) -+ (1 - it)-y for any fixed t. By Theorem 1.1.9, this means that the convergence is uniform with respect to t in any finite interval. Therefore I1 -+ 0 for any fixed A as n -> oo. By Lemma 4.2.3, for sufficiently large n,
1131 < 2lrn(c/logn)N < 27rne-2N1y, and, for N = y log n + o(log n), the right-hand side tends to zero as n To estimate 12 and 14, we integrate by parts. For 14, this leads to as
fA
e-ttz(1 - it)-y dt =
e- ttz
00
- iz(1 - it)y
+ y f °O e-itz(1 Z
A
-
oo.
it)-y-1 dt.
A
Therefore
2y
2
Ilal
z(1 + A2)y/2 + z
f lA
2y /0O dt
2
c zAy +
Z
fA
dt
00
(1 + t2)(y+1)/2 C4
tY+l+l
-- Ay ,
where c4 is a constant, and 14 can be made arbitrarily small by the choice of sufficiently large A. Similarly, °O
IA
e-itz
O
N
_ dt n n
(OL
e
1 (l 1) dt.
4.2 The number of cycles
187
By using the estimates of Lemmas 4.2.2, 4.2.3, and 4.2.4, we obtain
z
z C
1
C
2
sn
2 N 2N + - I cpn (s) I +
2
zn
)N)
3 logn+c
Av
dt
fA
f°°dt tv+t
where c, C2, and C3 are constants. If we choose sufficiently large A and n, we can make 1 121 arbitrarily small.
Now we can prove the following theorem on the behavior of the probability P{vn = N}. Theorem 4.2.2.
If n
0o and N = y log n + o(log n), where y is a constant,
0
(log n)N
N!nr(y)
(1 +o(1)).
Proof. For x = 1 - 1/n, the representation (4.2.2) takes the form N
P{vn = N) =
N!
(11 g nl/n)n P{fin, = n},
(4.2.7)
+4'N is the sum of independent identically distributed random variables with distribution (4.2.3). By Theorem 4.2.1,
where N = 4i +
n} =
nr(Y)e-1(1
+o(1)).
By substituting this expression into (4.2.7), we obtain the assertion of Theorem 4.2.2.
The case where y = N/ log n -* 0 is described by the following theorem. Theorem 4.2.3. If n
0o and y = N/ log n
0, then e-1
n} = NP{i;1 = n}(1 + o(1)) =
(1 + o(1)). n
Proof. Taking into account that y < 1/2 beginning with some n, we choose the level n (1 - y) and represent the probability P{ N = n) as follows:
n) =
n, i < n(1 - y), i = 1, ... , n} + NPRN = n, N > n(1 - y)).
(4.2.8)
Random permutations
188
Since
-1
PR I = m} =
e
n log n
(1 + 0(1))
uniformly in m, n > m > n(1 - y), we see that
PRN = n, N > n(1 - y)} = E P{N = m, N-1 = n - m} m>n(1-y)
= e-1 PRN_1 < yn}(1 + o(1)).
(4.2.9)
n log n
We now prove
PRN-1 < yn) --* 1.
(4.2.10)
Show that the random variable N/(yn) converges in probability to zero. By the representation (4.2.4) and the estimates (4.2.5) and (4.2.6), Wn
Y f = -logn (log \n - Yn/ + O \Yn// +
log
1
0(
log n
y n log n
and if y = N/ log n -* 0, then N
t
(yn)
11og(Y-it)-logy+O logn
-(
(ynlogn))N1 1
Thus, the characteristic function of W(yn) converges to the characteristic function of the random variable that assumes the value 0 with probability 1, and we obtain (4.2.10). With some technical difficulties, it can be proved that under the conditions of the theorem,
P{fin = n, t < n(1 - y), i = 1, ... , n} = o(1/(n logn)). The assertion of the theorem follows from this relation and the relations (4.2.8), (4.2.9), and (4.2.10). Theorem 4.2.4. If n -+ oo and y = N/ log n -* 0, then
P{vn = NJ =
y (log n)N
Ni n
0+00)). (1
Proof. The assertion of the theorem follows immediately from Theorem 4.2.3 and representation (4.2.7) if we take into account that the gamma function 1' (y) _
1/y(1 +o(1)) as y -+ 0.
4.2 The number of cycles
189
Now consider the case where N/ log n -+ oo. We distinguish four subcases:
a=n/N-aoo, a-+ c> 1, a
lwithm=n - N -+ oo,anda -lwith m
fixed.
Let a oo. We must select the value of the parameter x so that ton. Since for l;l with distribution (4.1.2),
is close
(1 - x)log(l - x)'
we choose x, 0 < x < 1, such that x (4.2.11)
(1 - x)log(1 - x)
a'
where a = n/N. This equation is approximately satisfied if we take
x=1- a log a 1
If N/ log n oo, then x = 1 - 1/(a log a) is farther from the singular point x = 1 than x = 1 - 1/n, and therefore the normal approximation is valid for the
sum N. Theorem 4.2.5. If n, N
oo, a = n/N - oo and
oo such that N/ log n
the parameter x = 1 - 11 (a log a) and 6a = a log a, then
k} =
2nNe-Zl2(1 + o(1)) Qa
Z
uniformly in the integers k such that z = (k - n)/(aaN/_N) lies in any fixed finite interval. Proof. The characteristic function of the random variable l; i is
Pn(t)
log (l - xe`t) log(1 - x)
It is easy to see that for any fixed t, as N/ log n -* oo and a = n/N -* oo, e
(or.,) t
1 - 2N + o N
Denote by /n(t) the characteristic function of N =
n)/(a ay w ), then
under the conditions of the theorem for any fixed t,
*n(t) = (e_itcon (
t tv
))N
-
,
e-t2/2,
and the distribution of N converges weakly to the normal distribution with parameters (0, 1).
Random permutations
190
The local convergence can be proved by the standard reasoning and we omit this technical part of the proof of Theorem 4.2.5. From Theorem 4.2.5 and representation (4.2.2), we obtain the following assertion.
Theorem 4.2.6. If n, N -+ oo such that N/ log n -+ oo, a = n/N -* oo, then
P{v = NJ _ (- log(1 - x))N (1 + o(1)), N! xnUa 27rN
where x = 1 - 1/(a log a) and as = a -10-9a. The following theorem for the case where a tends to a constant greater than 1 can be proved in the same way as Theorem 4.2.5.
If n, N -+ oo and there exist constants ao and al such that 1 < ao < a < al, the parameter x = x, where xa is the unique solution of Theorem 4.2.7.
equation (4.2.11) in the interval (0, 1), and 2 _
xalog(1 - X,,,) + xa
Qx (1 - xa)2log2(1 - xa)
k}
2,-e-ZZ/2(1
=
+ o(1))
Qx
uniformly in the integers k such that z = (k - n)/(vx 27rN) lies in any fixed finite interval. Proof. The proof is similar to the proof of Theorem 4.2.5 and we omit the details.
Note only that a =
and or; = D41 for x = xa.
Using Theorem 4.2.7 and representation (4.2.2), we obtain the following assertion on the distribution of v,,.
oo and there exist constants ao and al such that
Theorem 4.2.8. If n, N
1
(- log(l - xa))N (1 - o(1)), N! xot 0rX 2nN
where xa is the unique solution of equation (4.2.11) in the interval (0, 1), and
2_
xalog(1 - X,,) + xa
Ox
(1 -x0)2log2(1 - xa)
The asymptotic normality of N is preserved if a = n/N specified below.
1 slowly, as
4.2 The number of cycles
191
Theorem 4.2.9. If n, N -+ oo such that a = n/N -+ 1 and m = n - N -* oo, and the parameter x = xa, where xa is the unique solution of equation (4.2.11) in the interval (0, 1), then
k} =
e-Z2/2(1 + o(1))
1
27 m
uniformly in the integers k such that z = (k - n)/,,,I-m- lies in any fixed finite interval.
The proof is similar to the proof of Theorem 4.2.5 and we omit the details. From Theorem 4.2.9 and representation (4.2.2), we obtain the following asser-
tion on the behavior of P{v = N}. Theorem 4.2.10. If n, N --* oo such that a = n/N --+ 1 and m = n - N then
P{v = N} _
(- log(1 - xa))N N! xa
27rm
oo,
(1 + o(1)),
where xa is the unique solution of equation (4.2.11).
It is not difficult to see that if m2/N
xa =
2m
N
0, then
(1 + O(m/N)),
and consequently
(- log(1 - xa))N = xa (1 + x,,12 + O(xa))N = xaem (l + O(m2/N)), xa
2"' m"' (1
= Nm
+ O(m2I N)).
Therefore it follows from Theorem 4.2.9 that if n, N - oo, a = n/N -> 1, m --+ oo and m 2 /N --* 0, then
P{v = N} =
Nm
N!2"'m!
0+00)).
Finally we consider the case where m is bounded. Theorem 4.2.11.
If N -+ oo and the parameter x = 1/N, then for any fixed
k=0, 1,..., 1
Z
kk!e-1/2
Proof. By expanding the characteristic function cp(t) of the random variable 41 with parameter x = 1/N, we obtain for any fixed t,
00 =
lo
_
tt
log(1 xx)
_ = e`t (1 + 2 (e"
1)
+ O(x2) )
.
Random permutations
192
If x = 1/N and N oo, then the characteristic function of N - N is equal to (1 + (e" - 1)/(2N) + O(N-2))N and tends to e(e"-1)/2. This means that the distribution of N - N converges to the Poisson distribution with parameter 1/2.
From this theorem and representation (4.2.2), we obtain the following assertion,
which completes the description of the asymptotic behavior of the distribution of vn.
Theorem 4.2.12. If n -+ oo, n/N -+ 1, and m = n - N is fixed, then
P{vn = N) =
Nm
N! 2mm!
0+00)).
It is not difficult to see that Theorems 4.2.2, 4.2.4, 4.2.6, 4.2.8, 4.2.10, and 4.2.12 give a complete description of the asymptotic behavior of the distribution of the number of cycles in a random permutation of degree n as n oo.
4.3. Permutations with restrictions on cycle lengths In this section, we present some results on permutations with restrictions on their cycle lengths. Let R be a subset of the set of natural numbers. We consider the set Sn,R of all permutations of degree n with cycle lengths from the set R. One of the first questions that arises in this situation concerns the asymptotic behavior of the number an,R of elements in S,,R. This problem is far from being completely solved. Here we describe some of the solutions provided by an approach based on the generalized scheme of allocation. Let the uniform distribution be defined on Sn,R and let vn,R be the total number of cycles in a random permutation from this set. Put bn, R = (n - 1) ! if n E R, and bn,R = 0 otherwise. It is easy to see that
P{vn,R=N}_
bnl,R ... bnN R
n! N. an R
nl .... nN.
1+
(4.3.1)
We introduce independent identically distributed random variables N ) with distribution
P{ (R) = k) = bk,Rxk
xk
k! BR (x)
kBR(x)'
bk Rxk
xk
k E R,
where 00
BR(x)=E k=1
k! kER
k'
x>0.
(4.3.2)
4.3 Permutations with restrictions on cycle lengths
193
By using these random variables, we can rewrite (4.3.1) in the form
P{vn,R = N} =
n! (BR(x))N P{ xn N! an,R
(R)
+.--+ N(R)--n}.
(4.3.3)
Hence, summing over N, we obtain 00
nI
anR=xe BRW
R(x))N -BR(x) P{ 1 N! e
(R)
+...+4 (R) =n}.
(4.3.4)
N=1
It is clear that above we have repeated the general approach of Section 1.3 for the case of the set Sn, R, and relations (4.3.1), (4.3.3), and (4.3.4) are the realizations
of the general relations (1.3.1), (1.3.10), and (1.3.11), respectively. To find the asymptotics of the numbers an, R, it is sufficient to choose an appropriate value of the parameter x, substitute it into the expression of the distribution (4.3.2), and then prove a local limit theorem for the sum of independent random variables with this distribution. We succeed in obtaining results on an,R only if the structure of R has some regularity. In the general case, the asymptotics of an,R is unknown. To demonstrate the approach, we consider first a simple case where R is the set E of even numbers.
Theorem 4.3.1. If n --* oo, then
an,E = 2 (n )n (I + o(1))
(4.3.5)
e
for even n, and an, E = 0 for odd n.
Proof. To prove the theorem, we use the representation (4.3.4). We consider the random variables 4(E), ... with distribution (4.3.2), where R = E _ {2, 4, ...}, and N)
x2k
BR(x) = BE (x) _
k
kER
= -2log(1 -x2).
The random variables ; _1(E)/2, i = 1, ... , N, are independent identically distributed, and
P{fit = k} = -
x2k
2klog(1 - x2)
>
k = 1, 2, ....
(4.3.6)
If we choose x = 1 - 1/n, then this distribution coincides with distribution (4.2.3) from the previous section, and according to Theorem 4.2.1, if n - oe, N = y log n + o(log n), where y is a constant, 0 < y < oo, then
P{sit + ... + 4N = k} =
1
nr(y)
zY-le-z(1
+ o(1))
Random permutations
194
uniformly in the integers k such that z = k/n lies in any fixed interval of the form 0 < zo < z < z1, where zo and z1 are constants. Since
(E)
(E)
=
+ ... + N =
we obtain that if n -k oo, N = (log n)/2 + o(log n), and n is even, then i
= n} =
n/2} =
e-1/2(1 + o(1)).
(4.3.7)
n 1-7-r
For odd n, this probability equals zero. To obtain an,R with the help of relation (4.3.4), we have to sum the probabilities P{.iv ) = n} with the Poisson coefficients. To this end, we need to estimate these probabilities for all N. We show that for all N,
P{.N' = n} < 2N
(4.3.8)
n log n
This bound is a consequence of the following chain of estimates. It follows from (4.3.2) that
p
(E) {
-n)
=
xn 1
(BE(x))N K(n,N) k1 ... kN
where
K(n, N) = {k1, ... , kN: k1 + ... + kN = n, k1, ... , kN E R}. Hence,
=n} =
xn
1
(ki ... kN-1
n(BE(x))N K(n,N)
N n(BE(x))N
xk'
... xkN-'
k1
kN 1
KO
+...+
k2 ... kN
N-1 xk
N Lr n(BE(x))N (1
<
We obtain relation (4.3.8) because B = BE(x) = (logn)/2. We split the sum
n/2 BNe-B
S=E
N=1
N. !
N nBE(x)
4.3 Permutations with restrictions on cycle lengths
195
into four summands, dividing the domain of summation into four parts:
Al = {N: 1 < N < B - B3/4}, A2 = {N: B - B3/4 < N < B + B3/4
A3 = {N: B+B3"4
BNe-B -AT!
S2 = NEA2
N -B
1ie i/2n E B NI (1 +0(1)) =
+o(1)),
NEA2
since B = (logn)/2, and as N -+ co, BNe-B 1.
N!
NEA2
The remaining part of the sum is o(1/n). Indeed, by applying estimate (4.3.8), we obtain
2B n logn
Sl
N!
NEAP
BNe-B
1
BNe-B
n NEAP
N!
and Si = o(11n) because
BNe-B NEA1
N!
asn It follows from (4.3.8) that S3 <
BNe B
log n 2n
N!
NEA3
If we use the normal approximation for the Poisson distribution, we find that
BNe-B Nl
NEA3
BNel-B <
< N>B+B3/4
N.
eo
Cl 41
/4
where cl and c2 are constants. Hence, S3 = o(1/n).
e-u2/2 du <
Random permutations
196
Similarly, by using (4.3.8), we obtain 1
< log n
- logn
BNe-B N! < log n
(.;)Ne_B
1 BS4
Na
_ (el N>B2
B/ e
Nz
-B
Hence, S4 = o(1/n) because e/B < e-1 for n sufficiently large. If we combine the estimates of S1, S2, S3, and S4, we obtain
S=
1 (1 + o(1)).
V
7re n
Substituting this expression into (4.3.4) and expanding n ! by the Stirling formula give the assertion of the theorem.
The analogous result is valid for the number of permutations for which R is the set of odd numbers. We turn now to the case where the set R is not as regular as E. Let R(k) be the number of elements of R that are not greater than k. Set R(O) = 0. In the sequel, we assume that
lim R(k)/k=p, 0
k-*oo
In this case, p is called the density of R in the set of natural numbers. We will find the asymptotics of a,,,R under the following additional conditions on the set R.
(1) There exists a positive integer r such that, for any nonnegative integer s, the set R n Is + 1, . . . , s + r} cannot be embedded in any integer lattice with a step not equal to 1.
(2) The generating function F(z) of the set R has a finite number m of poles at the points zl = e2nit/m l = 0, 1, ..., m - 1, on the unit circle Izj = 1; in other words, it is of the form
F(z) _ T, zk = P(z)/(1 -
zm),
(4.3.9)
kER
where P(z) is a polynomial. Note that, since the coefficients of the series F(z) take a finite number of values,
by Szego's theorem (see, for example, [19]), there are only two possibilities for F(z): Either F(z) has the form (4.3.9), or the set of singular points of F(z) is dense everywhere on the unit circle, and therefore F(z) cannot be extended outside the unit circle. We consider here only the first case. In this case, the coefficients of F(z), with exception of some initial numbers, form a periodic sequence with
4.3 Permutations with restrictions on cycle lengths
197
period m, and, therefore, the set R has density p = l/m, where 1 is the number of units in the period. Consider independent identically distributed random variables 1, ... , N with distribution xk k} = k E R, (4.3.10)
kB(x)'
where
x=1-n,
k
B(x)=E
k
kER
Theorem 4.3.2. Suppose that R has the density p > 0 and satisfies conditions oo, N = p log n + o (log n). Then (1) and (2), n
1e-y/ r(p)(1 + o(1))
k} = y
uniformly in the integers k such that y = k/n lies in any fixed interval of the form
0
Theorem 4.3.3. Suppose that R has the density p > 0 and satisfies conditions (1) and (2). Then, as n oo,
an,R = (n - 1)! eBn,R/ r(p)(1 + 0(1)),
(4.3.11)
where
Bn,Rk(1-n kER
\
\k I
.
/I
Since E0 1(1 - 1/n)k/k = logn, the assertion (4.3.11) can be written in the form
an R = n! e-Ln,R/ r(p)(1 + o(1)), where lk Ln,Rk(1-nI. k¢R
/
Theorem 4.3.4. Suppose that R has the density p > 0 and satisfies conditions (1) and (2). Then, as n -* oo,
P{vn,R = N} =
1
2nBn R exp I -
(N - Bn,R)2 2Bn,R
1 0+00))
Random permutations
198
uniformly in the integers N such that (N - B,,,R)/ B,,,R lies in any fixed finite interval.
To prove Theorem 4.3.2, we establish some auxiliary results. The characteristic function of distribution (4.3.10) is
00 k ER
xkeitk
B(xeit)
kB(x)
B(x)
Lemma 4.3.1. If R has the density p > 0, then, as n - oo,
tp(t)
log(1-it)
n
log n
1
+0
/
1
log n
for any fixed t.
Proof. We first derive some auxiliary estimates. It is easy to see that 00
1: xk = Exk(R(k) - R(k - 1)) k=1
kER
00
00
k=1
k=1
_ Y'xkR(k) -
xkR(k- 1)
00
_ (1 - x) T, xkR(k). k=1
Set s = log n. For such s,
xkR(k) < E k < log2n, 1
1
and, since R has positive density,
xkR(k) = k>e
kxkR(k) k k>s
=
kxkp(1 +0(1)). k>s
Thus, as n -+ oo,
xk = p(1 - x) kER
00
kxk(1 + o(1)) + O
k=1
= p(1 -X)
/
= pn + o(n).
(1
\
(1o2n)
xx)2 (1 + 0(1)) I + (4.3.12)
4.3 Permutations with restrictions on cycle lengths
199
Similarly we obtain the estimate
B(x)=1
k
=plogn+o(n).
(4.3.13)
kER
We now write the characteristic function in the form it/n)
B(x) (B(xe`tl n\ - B(x)).
tP (n) = It is easy to see that
B(xe`tln) - B(x) xk(eitk/n
-0
kER 00
ixk(eitk/n _ 1)(R(k) - R(k - 1))
_ k=1
_ E kxkR(k) (eit11n - 1 00
_
xkR(k) (etkmn(i
(eit(k+1)ln
k+I
- eit/n) + n (eit(k+l)/n - 1)
k=1
+
_ 1))
l
(ett(k+1)/n
I
k+ 1
- 1) +
(eit(k+l)/n
1
n(k+ 1)
First of all, we estimate the part that does not contribute essentially to the sum.
If t is fixed and n - oc, then 00
k=1
xkR(k
(eit(k+1)ln
- 1) - O
k(k + 1)n
k=1
xk n2
=O
1
C
n
We transform the other parts of the sum as follows: 00
0 k=1
xkR(k)eitk/n(1
- eit/n)
k 00
-it E k=1
00
-it E k=1
00
- it k=1
xkR(k) eitk/n kn
+ 00E xkR(k)eitk/n (1 + It
R(k)eitk/n+O
1
n2
kn
xk(k)
k
k=1
eitk/n + O
(1)
- eit/n)
n
)700
k=1
(4.3.14)
Random permutations
200
and 00
T1
xkR(k) (eit(k+1)/n kn 00
_ k
- 1))
xkR(k) (eitk/n _ 1) + 00 xkR(k) eitk/n (e it/n kn
=l
_
k=1
xkk (k) (eitk/n
- 1) + O (n f
\/
k=1
- 1)
kn
(4.3.15)
.
Similarly, 00
x kR () k (e it(k+1)/n
- 1) =
k(k + 1)
E xk L. k=1
(k) (eitk/n
- 1)+0 (logn J) n
k2
Set s = log n and E = n log n. Then
n Yxk _ 0 (logn),
xkR(k)eitk/n <
r
k<s
n E kxk _ O
kn
lon2e n
k<e
xk - O (logn). ak<e
xkR(k) (eitk/n - 1) < k2
k<e
n
k<e
xkR(k) (eitk/n _ 1)
k<e
Z
kn
n
In exactly the same way,
xkR(k)eitk/n < k>E
kn
1: k> E
kn
xkR(k) (eitk/n - 1) < k2
kx k < log n
1
n k>E 2
1
n
1
+ nn
Ixk < 1
n
.
n k>E
It is clear that
R(k)/k = p + o(1),
1
n
k>E
xkR(k) (eitk/n - 1) k> E
xk < xE < e-logn =
n1
xk = e"(1 + o(1))
,
4.3 Permutations with restrictions on cycle lengths
201
uniformly in k, s < k < E. Hence,
E e
xkR(k)eitk1n = _ p kn
T 1 e-k(1-it)ln + 0 1 Y` e-kln
L
s
=P
L
n
n E
1 e-k(1-it)ln + o(1). E
Similarly,
xkk (k) teitk1n
pe
E
xkk (k) eitk/n 2
- 1) = p Y
e
s
e-k/n(eitk/n
kn
- 1 + o(1).
The sums in the right-hand sides of these relations are integral sums of integrable functions. Therefore, as n oo, their limits exist and equal 00
e-(1-it)zdz =
1
fo 00
e_z(eitz
fo
- l) dz =
1 - it
- 1,
f 0 l e-z(eitz - 1) dz = - log(1 - it), z
0
respectively. Thus, as n -+ oo, for any fixed t,
B(xe`tin) - B(x) = -p log(1 - it) + o(1), and hence,
=1-log(1-it)+0
t
(n)
logn
1
logn
Lemma 4.3.1 implies that for any fixed t, as n -+ oo and N = p log n +o(logn),
9
N
(n)
(1
lit)P
+o(1),
(4.3.16)
and for the normalized sum (41 + + W/n the limit distribution is the distribution with the characteristic function (1- i t)-P that has the density y P-1 e-y/ r (p). To prove the local convergence of the distributions, we have to estimate cp(t/n) outside a neighborhood of zero.
Random permutations
202
Lemma 4.3.2. Suppose that R has the density p > 0 and satisfies conditions (1) and (2). Then, for any e > 0, there exists q < 1 such that for e < I t I < 7r, Iw(t)I < q.
Proof. Let $k_1$, $k_2$, and $k_3$ be integers and $a_{k_1}, a_{k_2}, a_{k_3} > 0$. It is easy to verify that

$$\bigl|a_{k_1}e^{itk_1} + a_{k_2}e^{itk_2} + a_{k_3}e^{itk_3}\bigr|^2 = (a_{k_1}+a_{k_2}+a_{k_3})^2 - 2a_{k_1}a_{k_2}\bigl(1-\cos t(k_2-k_1)\bigr) - 2a_{k_1}a_{k_3}\bigl(1-\cos t(k_3-k_1)\bigr) - 2a_{k_2}a_{k_3}\bigl(1-\cos t(k_3-k_2)\bigr).$$

For $a > 0$ and $0 < \delta \le a^2$,

$$a - \sqrt{a^2-\delta} \ge \frac{\delta}{2a}.$$

Therefore

$$a_{k_1}+a_{k_2}+a_{k_3} - \bigl|a_{k_1}e^{itk_1} + a_{k_2}e^{itk_2} + a_{k_3}e^{itk_3}\bigr| \ge \frac{a_{k_1}a_{k_2}\bigl(1-\cos t(k_2-k_1)\bigr) + a_{k_1}a_{k_3}\bigl(1-\cos t(k_3-k_1)\bigr) + a_{k_2}a_{k_3}\bigl(1-\cos t(k_3-k_2)\bigr)}{a_{k_1}+a_{k_2}+a_{k_3}}. \tag{4.3.17}$$
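Both the squared-modulus identity and estimate (4.3.17) are easy to probe numerically; in the sketch below the integers $k_i$, the weights, and the frequency $t$ are sampled at random (the test harness is our own, not from the text):

```python
import cmath
import math
import random

random.seed(0)
for _ in range(1000):
    k = random.sample(range(1, 50), 3)
    a = [random.uniform(0.1, 5.0) for _ in range(3)]
    t = random.uniform(-math.pi, math.pi)
    s = sum(a[i] * cmath.exp(1j * t * k[i]) for i in range(3))
    cross = sum(a[i] * a[j] * (1 - math.cos(t * (k[j] - k[i])))
                for i in range(3) for j in range(i + 1, 3))
    # squared-modulus identity
    assert abs(abs(s) ** 2 - (sum(a) ** 2 - 2 * cross)) < 1e-8
    # estimate (4.3.17)
    assert sum(a) - abs(s) >= cross / sum(a) - 1e-8
print("ok")
```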
Suppose now that, as in condition (1), the integers $k_1$, $k_2$, and $k_3$ do not lie on any lattice with a step greater than 1 and are contained in an interval of length $r$. Then, for $\varepsilon \le |t| \le \pi$, the three cosines on the right-hand side of (4.3.17) do not simultaneously take the value 1. Moreover, since $k_1$, $k_2$, and $k_3$ are contained in an interval of length $r$, their differences can take only a finite number of values. Therefore, there exists $\alpha > 0$ such that for $\varepsilon \le |t| \le \pi$,

$$\bigl(1-\cos t(k_2-k_1)\bigr) + \bigl(1-\cos t(k_3-k_1)\bigr) + \bigl(1-\cos t(k_3-k_2)\bigr) \ge 3\alpha \tag{4.3.18}$$

uniformly in all such $k_1$, $k_2$, and $k_3$.
We now let $a_k = x^k/k$, $k = 1, 2, \ldots,$ and suppose condition (1) holds for $k_1 > k_2 > k_3$. It follows from (4.3.17) and (4.3.18) that

$$a_{k_1}+a_{k_2}+a_{k_3} - \bigl|a_{k_1}e^{itk_1} + a_{k_2}e^{itk_2} + a_{k_3}e^{itk_3}\bigr| \ge \alpha_r a_{k_1}, \tag{4.3.19}$$

where $\alpha_r > 0$ depends only on $\alpha$ and $r$.
Write the characteristic function $\varphi(t)$ in the form

$$\varphi(t) = \frac{1}{B(x)}\sum_{l=0}^{\infty}\sum_{k=rl+1}^{rl+r}a_k\bigl(R(k)-R(k-1)\bigr)e^{itk}.$$

From every set $\{rl+1, \ldots, rl+r\}$, select, according to condition (1), three integers $k_{1l}$, $k_{2l}$, and $k_{3l}$ from $R$ that do not lie on any integer lattice with a step greater than 1. Using estimate (4.3.19) gives

$$a_{k_{1l}}+a_{k_{2l}}+a_{k_{3l}} - \bigl|a_{k_{1l}}e^{itk_{1l}} + a_{k_{2l}}e^{itk_{2l}} + a_{k_{3l}}e^{itk_{3l}}\bigr| \ge \alpha_r a_{k_{1l}} \ge \alpha_r a_{r(l+1)}.$$

Therefore, taking into account that $R(k_{il}) - R(k_{il}-1) = 1$ for $i = 1, 2, 3$ and $l = 0, 1, \ldots$ yields

$$B(x)|\varphi(t)| \le \sum_{l=0}^{\infty}\sum_{k=rl+1}^{rl+r}a_k\bigl(R(k)-R(k-1)\bigr) - \sum_{l=0}^{\infty}\bigl(a_{k_{1l}}+a_{k_{2l}}+a_{k_{3l}}\bigr) + \sum_{l=0}^{\infty}\bigl|a_{k_{1l}}e^{itk_{1l}} + a_{k_{2l}}e^{itk_{2l}} + a_{k_{3l}}e^{itk_{3l}}\bigr| \le B(x) - \frac{\alpha_r}{r}\sum_{l=0}^{\infty}\frac{x^{r(l+1)}}{l+1}. \tag{4.3.20}$$

Inequalities (4.3.20) imply the assertion of Lemma 4.3.2 because $r$ is fixed, $x = 1 - 1/n$, and, as $n \to \infty$,

$$B(x) = p\log n + o(\log n), \qquad \sum_{l=0}^{\infty}\frac{x^{r(l+1)}}{l+1} = -\log(1-x^r) = \log n + o(\log n).$$
Lemma 4.3.3. Suppose that $R$ has the density $p > 0$ and satisfies conditions (1) and (2). Then there exist $c_1$ and $\varepsilon > 0$ such that for every $l = 0, 1, \ldots, m-1$ and $|t/n - 2\pi l/m| < \varepsilon$, for sufficiently large $n$,

$$\frac1n\Bigl|\varphi'\Bigl(\frac tn\Bigr)\Bigr| \le \frac{c_1}{\log n\,\sqrt{1 + (t - 2\pi ln/m)^2}}.$$

Proof. We start by estimating

$$\frac1n\varphi'\Bigl(\frac tn\Bigr) = \frac{i}{nB(x)}\sum_{k\in R}x^ke^{itk/n}.$$

By condition (2), there exist $c, \delta > 0$ such that for $|z| < 1$, $|z_l - z| < \delta$, $l = 0, 1, \ldots, m-1$,

$$\Bigl|\sum_{k\in R}z^k\Bigr| \le \frac{c}{|z_l - z|}. \tag{4.3.21}$$

Set $z = xe^{it/n}$, where $x = 1 - 1/n$. It is clear that $|z| < 1$, and there exists $\varepsilon > 0$ such that if $|t/n - 2\pi l/m| < \varepsilon$, then $|z_l - z| < \delta$ for sufficiently large $n$. Therefore, (4.3.21) implies that for $|t/n - 2\pi l/m| < \varepsilon$, $l = 0, 1, \ldots, m-1$,

$$\Bigl|\sum_{k\in R}z^k\Bigr| \le \frac{c_2 n}{\sqrt{1 + (t - 2\pi ln/m)^2}}.$$

Since $B(x) = p\log n + o(\log n)$, there exists $c_1$ such that for every $l = 0, 1, \ldots, m-1$, if $|t/n - 2\pi l/m| < \varepsilon$, then

$$\frac1n\Bigl|\varphi'\Bigl(\frac tn\Bigr)\Bigr| \le \frac{c_1}{\log n\,\sqrt{1 + (t - 2\pi ln/m)^2}}$$

for sufficiently large $n$.

We now proceed to estimate the characteristic function $\varphi(t)$ in the intermediate range of $t$. Obtaining the estimate involves some technical difficulties, so, for the sake of greater clarity, we first treat the case $R = \mathbf N$. In this case,

$$p_k = P\{\xi_1 = k\} = \frac{x^k}{kB(x)}, \qquad k = 1, 2, \ldots, \qquad B(x) = -\log(1-x) = \log n.$$

Consider the random variable $\tilde\zeta = \xi_1 - \xi_2$. Its distribution is symmetric, and for $m \ge 0$,

$$\tilde p_m = P\{\tilde\zeta = m\} = \sum_{k=1}^{\infty}p_kp_{k+m}.$$

Let

$$\tilde\varphi(t) = \sum_{m=-\infty}^{\infty}\tilde p_m e^{itm} = \tilde p_0 + 2\sum_{m=1}^{\infty}\tilde p_m\cos tm.$$

It is clear that the characteristic function $\tilde\varphi(t)$ of the random variable $\tilde\zeta$ is related to $\varphi(t)$ by the equality $\tilde\varphi(t) = |\varphi(t)|^2$. To estimate $\tilde\varphi(t)$, we use a standard inequality (see, e.g., [49]): For $t > 0$,

$$1 - \tilde\varphi(t) = 2\sum_{m=1}^{\infty}\tilde p_m(1 - \cos tm) \ge 2\sum_{s=0}^{\infty}\sum_{m\in M_s}\tilde p_m, \tag{4.3.22}$$

where

$$M_s = \Bigl\{m: \frac{\pi}{2t} + \frac{2\pi s}{t} \le m \le \frac{3\pi}{2t} + \frac{2\pi s}{t}\Bigr\}.$$

Lemma 4.3.4.
For $m_0 > 0$,

$$2\sum_{m>m_0}\tilde p_m \ge \sum_{l>2m_0}p_l.$$

Proof. By using $l = m + k$ as the variable of summation, we obtain

$$2\sum_{m>m_0}\tilde p_m = 2\sum_{m>m_0}\sum_{k=1}^{\infty}p_kp_{k+m} = \sum_{l>m_0+1}p_l\sum_{k<l-m_0}p_k + \sum_{l=1}^{\infty}p_l\sum_{k>l+m_0}p_k. \tag{4.3.23}$$

The right-hand side of (4.3.23) is estimated from below by the quantity

$$\sum_{l>2m_0}p_l\sum_{k=1}^{\infty}p_k = \sum_{l>2m_0}p_l.$$

To see this, it is sufficient to delete the first terms in the first sum from (4.3.23), retaining

$$\sum_{l>2m_0}p_l\sum_{k<l-m_0}p_k,$$

and, in the second sum from (4.3.23), to shift the domain of summation to $l > 2m_0$, giving

$$\sum_{l>2m_0}p_l\sum_{k\ge l-m_0}p_k,$$

which does not exceed the second sum from (4.3.23) by the monotonicity of the probabilities.

Lemma 4.3.5. For $0 < t \le \pi$,

$$1 - \tilde\varphi(t) \ge \frac13\sum_{k>\pi/t}p_k.$$
Proof. Note that the summation on the right-hand side of (4.3.22) occurs over integers $m$ from intervals of length $\pi/t$. If we enumerate intervals of this length on the positive semi-axis starting at the point $\pi/(2t)$, the domain of summation will consist of the intervals labeled by odd numbers. Notice that the sequence of probabilities $\tilde p_m$, $m = 1, 2, \ldots,$ is monotone, and the numbers of integer points in any two intervals of length $\pi/t$ differ by at most 1. Therefore each interval of length $\pi/t$, $0 < t \le \pi$, contained in the domain of summation in (4.3.22) contributes not less than one-third of the total sum over the two following intervals: the interval itself and the interval adjoined to it on the right side, which does not belong to the initial domain of summation. (Note that, as $t \to 0$, the number of integer points in one interval increases and its contribution to the sum tends to $1/2$.) Therefore, (4.3.22) implies

$$1 - \tilde\varphi(t) \ge 2\sum_{s=0}^{\infty}\sum_{m\in M_s}\tilde p_m \ge \frac23\sum_{m>\pi/(2t)}\tilde p_m.$$

By applying Lemma 4.3.4, we obtain the assertion of Lemma 4.3.5.
It remains to estimate a sum of the form $\sum_{k>a}p_k$ from below. If we use the inequality $1 - 1/n \ge e^{-1/(n-1)}$, we obtain

$$\sum_{k>a}\frac{x^k}{k} \ge \sum_{k>a}\frac{e^{-k/(n-1)}}{k} \ge \int_{a/(n-1)}^{\infty}\frac{e^{-y}}{y}\,dy \ge c_3 - \log\frac{a}{n-1}, \tag{4.3.24}$$

where $c_3$ is a constant. We use Lemma 4.3.5, set $a = \pi n/|t|$ in (4.3.24), and obtain for $|t|/n \le \pi$,

$$1 - \tilde\varphi\Bigl(\frac tn\Bigr) \ge \frac13\sum_{l>\pi n/|t|}p_l \ge \frac{1}{3B(x)}\Bigl(c_3 - \log\frac{\pi n}{|t|(n-1)}\Bigr) \ge \frac{1}{3B(x)}\bigl(\log|t| + c_4\bigr),$$

where $c_4$ is a constant. Hence, we go on to estimate $\varphi(t/n)$ and find that

$$\Bigl|\varphi\Bigl(\frac tn\Bigr)\Bigr| \le \Bigl(1 - \frac{\log|t| + c_4}{3\log n}\Bigr)^{1/2} \le \exp\Bigl\{-\frac{\log|t| + c_4}{6\log n}\Bigr\}.$$

If $N \ge \frac12\log n$, then

$$\Bigl|\varphi^N\Bigl(\frac tn\Bigr)\Bigr| \le \exp\Bigl\{-\frac{1}{12}\log|t| - \frac{c_4}{12}\Bigr\} \le c_5|t|^{-1/12}. \tag{4.3.25}$$
We now return to the case $R \subset \mathbf N$. We retain the notation $\varphi(t)$ and $\tilde\varphi(t)$ for the characteristic functions and set

$$p_k = \frac{a_kx^k\delta_R(k)}{B(x)}, \qquad a_k = \frac1k, \qquad k = 1, 2, \ldots,$$

where $\delta_R(k) = 0$ for $k \notin R$ and $\delta_R(k) = 1$ for $k \in R$.

Lemma 4.3.6. Suppose that $R$ has the density $p > 0$ and satisfies conditions (1) and (2). Then, for $|t|/n \le \pi$ and $N \ge \frac12 p\log n$,

$$\Bigl|\varphi^N\Bigl(\frac tn\Bigr)\Bigr| \le c_6|t|^{-1/(12r^2p)},$$

where $r$ is defined in condition (1) and $c_6$ is a constant.
Proof. We revise the arguments leading to estimate (4.3.25). Inequality (4.3.22) now takes the following form: For $t > 0$,

$$1 - |\varphi(t)|^2 \ge \frac{2}{B^2(x)}\sum_{s=0}^{\infty}\sum_{m\in M_s}\sum_{k=1}^{\infty}a_kx^k\delta_R(k)\,a_{k+m}x^{k+m}\delta_R(k+m),$$

where

$$M_s = \Bigl\{m: \frac{\pi}{2t} + \frac{2\pi s}{t} \le m \le \frac{3\pi}{2t} + \frac{2\pi s}{t}\Bigr\}.$$

We retain only one summand in each interval of length $r$, replace this summand by the minimum value over the interval, and use the transition from the sum over one interval of length $r$ to one-third of the sum over the interval of twice the length. Then we obtain for $t > 0$,

$$1 - |\varphi(t)|^2 \ge \frac{2}{3B^2(x)}\sum_{rl>\pi/(2t)}\sum_{k=1}^{\infty}a_kx^k\delta_R(k)\,a_{k+rl}x^{k+rl}.$$

Once again, we preserve only one summand in each interval of length $r$ and get

$$1 - |\varphi(t)|^2 \ge \frac{2}{3B^2(x)}\sum_{m=1}^{\infty}a_{rm}x^{rm}\sum_{l>\pi/(2tr)}a_{rm+rl}x^{rm+rl} \ge \frac{2}{3B^2(x)r^2}\sum_{l>\pi/(2tr)}\sum_{m=1}^{\infty}\frac{x^{rm}}{m}\cdot\frac{x^{r(m+l)}}{m+l}.$$

The assertion of Lemma 4.3.4 is based on the monotonicity of the probabilities $p_k$, $k = 1, 2, \ldots$. The summands of the last double sum are similar to the summands of the sum in Lemma 4.3.4, and the values $x^{rk}/k$, $k = 1, 2, \ldots,$ are also monotone. Therefore we may use Lemma 4.3.4 and obtain

$$1 - |\varphi(t)|^2 \ge \frac{-\log(1-x^r)}{3B^2(x)r^2}\sum_{l>\pi/(tr)}\frac{x^{rl}}{l}.$$

For a fixed $r$, the estimate (4.3.24) remains true. Therefore, by taking into account the asymptotics $B(x) = p\log n + o(\log n)$ and $-\log(1-x^r) = \log n + o(\log n)$, we find

$$1 - \Bigl|\varphi\Bigl(\frac tn\Bigr)\Bigr|^2 \ge \frac{\log|t| + c}{3r^2p^2\log n}.$$

Hence,

$$\Bigl|\varphi\Bigl(\frac tn\Bigr)\Bigr| \le \Bigl(1 - \frac{\log|t| + c}{3r^2p^2\log n}\Bigr)^{1/2} \le \exp\Bigl\{-\frac{\log|t| + c}{6r^2p^2\log n}\Bigr\},$$

and for $N \ge \frac12 p\log n$,

$$\Bigl|\varphi^N\Bigl(\frac tn\Bigr)\Bigr| \le c_6|t|^{-1/(12r^2p)},$$

where $c_6$ is a constant.
Proof of Theorem 4.3.2. Consider the sum $\zeta_N = \xi_1 + \cdots + \xi_N$ of independent identically distributed random variables with distribution (4.3.10). As we have seen, Lemma 4.3.1 implies that, as $n \to \infty$ and $N = p\log n + o(\log n)$, the distribution of $\zeta_N/n$ converges weakly to the distribution with density $y^{p-1}e^{-y}/\Gamma(p)$.

We now prove the local convergence of these distributions. For an integer $k$, let $y = k/n$. By the inversion formula,

$$nP\{\zeta_N = k\} = \frac{1}{2\pi}\int_{-\pi n}^{\pi n}e^{-ity}\varphi^N\Bigl(\frac tn\Bigr)\,dt,$$

where $\varphi(t)$ is the characteristic function of the distribution (4.3.10). The density of the limit distribution at a point $y > 0$ can be represented by the integral

$$\frac{y^{p-1}e^{-y}}{\Gamma(p)} = \frac{1}{2\pi}\int_{-\infty}^{\infty}e^{-ity}(1-it)^{-p}\,dt.$$

Hence,

$$nP\{\zeta_N = k\} - \frac{y^{p-1}e^{-y}}{\Gamma(p)} = I_1 + I_2 + I_3,$$
where

$$I_1 = \frac{1}{2\pi}\int_{-A}^{A}e^{-ity}\Bigl(\varphi^N\Bigl(\frac tn\Bigr) - (1-it)^{-p}\Bigr)dt, \qquad I_2 = -\frac{1}{2\pi}\int_{|t|>A}e^{-ity}(1-it)^{-p}\,dt, \qquad I_3 = \frac{1}{2\pi}\int_{A<|t|\le\pi n}e^{-ity}\varphi^N\Bigl(\frac tn\Bigr)\,dt,$$

and the constant $A$ in the integrals is to be chosen later.
By (4.3.16), for any fixed $A$, the integral $I_1$ tends to zero as $n \to \infty$ and $N = p\log n + o(\log n)$. To estimate the integrals $I_2$ and $I_3$, we integrate by parts. For $I_2$, this yields

$$\int_A^{\infty}\frac{e^{-ity}}{(1-it)^p}\,dt = \frac{e^{-iAy}}{iy(1-iA)^p} + \frac py\int_A^{\infty}\frac{e^{-ity}}{(1-it)^{p+1}}\,dt.$$

Hence,

$$|I_2| \le \frac{2}{y(1+A^2)^{p/2}} + \frac{2p}{y}\int_A^{\infty}\frac{dt}{(1+t^2)^{(p+1)/2}},$$

and $|I_2|$ can be made arbitrarily small by the choice of $A$.

Similarly,
$$\int_A^{\pi n}e^{-ity}\varphi^N\Bigl(\frac tn\Bigr)dt = \frac{e^{-ity}}{iy}\,\varphi^N\Bigl(\frac tn\Bigr)\bigg|_{t=A}^{t=\pi n} + I,$$

where

$$I = \frac{N}{iy}\int_A^{\pi n}e^{-ity}\varphi^{N-1}\Bigl(\frac tn\Bigr)\frac1n\varphi'\Bigl(\frac tn\Bigr)\,dt.$$

Therefore

$$|I_3| \le \frac2y\bigl|\varphi(\pi)\bigr|^N + \frac2y\Bigl|\varphi\Bigl(\frac An\Bigr)\Bigr|^N + |I|.$$

When we use the estimates of Lemmas 4.3.2 and 4.3.6, we obtain

$$|\varphi(\pi)|^N \le q^N, \quad q < 1; \qquad \Bigl|\varphi\Bigl(\frac An\Bigr)\Bigr|^N \le c_6A^{-1/(12r^2p)}.$$

Hence these summands can be made arbitrarily small. It remains to estimate the integral $I$. Choose $\varepsilon$ such that Lemma 4.3.3 is valid, and represent $I$ as the sum of three integrals: $I = I_1(\varepsilon) + I_2(\varepsilon) + I_3(\varepsilon)$,
where

$$I_1(\varepsilon) = \frac{N}{iy}\int_A^{\varepsilon n}e^{-ity}\varphi^{N-1}\Bigl(\frac tn\Bigr)\frac1n\varphi'\Bigl(\frac tn\Bigr)\,dt,$$

$I_2(\varepsilon)$ is the integral over the union of the $\varepsilon n$-neighborhoods of the poles of $F(z)$, that is, over the set

$$\bigcup_{l=1}^{m-1}\Bigl[\frac{2\pi ln}{m} - \varepsilon n,\ \frac{2\pi ln}{m} + \varepsilon n\Bigr],$$

and $I_3(\varepsilon)$ is the integral over the remaining set

$$A_\varepsilon = \bigl([-\pi n, -\varepsilon n]\cup[\varepsilon n, \pi n]\bigr)\Big\backslash\bigcup_{l=1}^{m-1}\Bigl[\frac{2\pi ln}{m} - \varepsilon n,\ \frac{2\pi ln}{m} + \varepsilon n\Bigr].$$

By using Lemmas 4.3.3 and 4.3.6, we find
$$|I_1(\varepsilon)| \le \frac{2Nc_1c_6}{y\log n}\int_A^{\infty}\bigl(1+t^2\bigr)^{-1/2}\,t^{-1/(12r^2p)}\,dt,$$
and for $y \ge y_0 > 0$, the value $|I_1(\varepsilon)|$ can be made arbitrarily small by the choice of a sufficiently large $A$. By using Lemma 4.3.3, we find

$$\frac1n\int_{2\pi ln/m-\varepsilon n}^{2\pi ln/m+\varepsilon n}\Bigl|\varphi'\Bigl(\frac tn\Bigr)\Bigr|\,dt \le \frac{c_1}{\log n}\int_{2\pi ln/m-\varepsilon n}^{2\pi ln/m+\varepsilon n}\frac{dt}{\sqrt{1+(t-2\pi ln/m)^2}} = \frac{c_1}{\log n}\int_{-\varepsilon n}^{\varepsilon n}\frac{dt}{\sqrt{1+t^2}},$$

and there exists a constant $c_7$ such that for a fixed $\varepsilon$,

$$\int_{-\varepsilon n}^{\varepsilon n}\frac{dt}{\sqrt{1+t^2}} \le c_7\log n.$$

Therefore, we use the estimate of Lemma 4.3.2 and find that for $y \ge y_0 > 0$,

$$|I_2(\varepsilon)| \le \frac{mc_1Nq^{N-1}}{y\log n}\cdot c_7\log n \le mc_1c_7y_0^{-1}Nq^{N-1},$$
and under the conditions of Theorem 4.3.2, the right-hand side tends to zero. For $t \in A_\varepsilon$,

$$\frac1n\Bigl|\varphi'\Bigl(\frac tn\Bigr)\Bigr| = \frac{1}{nB(x)}\Bigl|\sum_{k\in R}x^ke^{itk/n}\Bigr| \le \frac{c_8}{nB(x)},$$

where $c_8$ is a constant that is the upper bound of $|F(z)|$ for $|z| = x$ outside the neighborhoods of the poles. By using this estimate and the estimate of Lemma 4.3.2, we find

$$|I_3(\varepsilon)| \le \frac Ny\int_{A_\varepsilon}\Bigl|\varphi^{N-1}\Bigl(\frac tn\Bigr)\Bigr|\frac{c_8}{nB(x)}\,dt \le \frac{2\pi c_8N}{yB(x)}\,q^{N-1}.$$

Under the conditions of the theorem, the last term of this chain of inequalities tends to zero for $y \ge y_0 > 0$.

It is easy to see that by first choosing a sufficiently large $A$ and then a sufficiently large $n$, we can make the difference being estimated arbitrarily small. Note that the difference is bounded uniformly with respect to $N$, and hence, there exists a constant $c_9$ such that for $y \ge y_0 > 0$ and for all $N$,

$$P\{\zeta_N = k\} \le \frac{c_9}{n}. \tag{4.3.26}$$
Proof of Theorem 4.3.3. In (4.3.8), divide the domain of summation into two parts: $\mathbf N_1 = \{N: |N - B(x)| \le N^{2/3}\}$ and $\mathbf N_2 = \{N: |N - B(x)| > N^{2/3}\}$. It is not difficult to see that the assertion of Theorem 4.3.2 is fulfilled uniformly in $N \in \mathbf N_1$. Therefore

$$P\{\zeta_N = n\} = \frac{e^{-1}}{n\Gamma(p)}\bigl(1 + o(1)\bigr)$$

uniformly in $N \in \mathbf N_1$, so

$$\sum_{N\in\mathbf N_1}\frac{(B(x))^N}{N!}e^{-B(x)}P\{\zeta_N = n\} = \frac{e^{-1}}{n\Gamma(p)}\bigl(1 + o(1)\bigr).$$

We use the estimate (4.3.26) and obtain

$$\sum_{N\in\mathbf N_2}\frac{(B(x))^N}{N!}e^{-B(x)}P\{\zeta_N = n\} \le \frac{c_9}{n}\sum_{N\in\mathbf N_2}\frac{(B(x))^N}{N!}e^{-B(x)}.$$

Since the sum on the right-hand side of this inequality tends to zero, the total sum in (4.3.4) equals $(en\Gamma(p))^{-1}(1 + o(1))$. It remains to note that

$$x^n = e^{-1}\bigl(1 + o(1)\bigr), \qquad B(x) = B_{n,R} = \sum_{k\in R}\frac1k\Bigl(1 - \frac1n\Bigr)^k.$$
Proof of Theorem 4.3.4. According to (4.3.3),

$$P\{\nu_{n,R} = N\} = \frac{n!\,(B(x))^N}{N!\,x^na_{n,R}}\,P\{\zeta_N = n\}.$$

If we substitute the corresponding expressions for $a_{n,R}$ and $P\{\zeta_N = n\}$, we obtain

$$P\{\nu_{n,R} = N\} = \frac{(B(x))^N}{N!}e^{-B(x)}\bigl(1 + o(1)\bigr)$$

for $N = B(x) + o(B(x))$. We note that $B(x) = B_{n,R}$ and that the expression obtained above holds uniformly in $N$ such that $(N - B(x))/\sqrt{B(x)}$ lies in any fixed finite interval; thus, we obtain the assertion of Theorem 4.3.4.
4.4. Notes and references The probabilistic approach that is now commonly used in combinatorics was first formulated in an explicit form and applied in the investigations of the symmetric
group $S_n$ by V. L. Goncharov [51, 52, 53]. For the random variables $\alpha_1, \ldots, \alpha_n$, he found the joint distribution (4.1.4) and the generating function (4.1.5). For the total number of cycles $\nu_n = \alpha_1 + \cdots + \alpha_n$, he proved that, as $n \to \infty$,

$$\mathbf E\nu_n = \log n + \gamma + o(1), \qquad \mathbf D\nu_n = \log n + \gamma - \frac{\pi^2}{6} + o(1).$$

Goncharov also proved that the distribution of $(\nu_n - \log n)/\sqrt{\log n}$ converges to the standard normal distribution, and the distribution of $\alpha_r$ converges to the Poisson distribution with parameter $1/r$. Let $\beta_n$ be the length of the maximum cycle in a random permutation from $S_n$. Goncharov [51, 53] showed that

$$P\{\beta_n < m\} = \sum_{h=0}^{\infty}\frac{(-1)^h}{h!}\,S_h(m, n),$$

where $S_0(m, n) = 1$,

$$S_h(m, n) = \sum_{\substack{k_1,\ldots,k_h>m\\ k_1+\cdots+k_h\le n}}\frac{1}{k_1\cdots k_h}.$$

Let

$$I_0(x, 1-x) = 1, \qquad I_h(x, 1-x) = \int\limits_{\substack{x_1,\ldots,x_h>x\\ x_1+\cdots+x_h<1-x}}\frac{dx_1\cdots dx_h}{x_1\cdots x_h}, \qquad 0 < x < 1.$$
Goncharov proved that, as $n \to \infty$, the random variable $\beta_n/n$ has the distribution with the density

$$f(x) = \frac1x\sum_{h=0}^{\infty}\frac{(-1)^h}{h!}\,I_h(x, 1-x),$$

which, as is clear from the preceding formula, is defined by different analytic expressions on the sequential intervals of the form $[1/(1+\lambda), 1/\lambda]$, where $\lambda$ is an integer. For example,

$$f(x) = \frac1x, \quad \frac12 \le x \le 1; \qquad f(x) = \frac1x\Bigl(1 - \log\Bigl(\frac1x - 1\Bigr)\Bigr), \quad \frac13 \le x \le \frac12.$$
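On $[1/2, 1]$ the density $1/x$ gives $\lim_{n\to\infty} P\{\beta_n/n \le x\} = 1 + \log x$, which is easy to probe by simulation. In the sketch below, the permutation size, the number of trials, and the point $x = 0.7$ are arbitrary choices of ours:

```python
import math
import random

random.seed(7)

def max_cycle_len(n, rng):
    # Sample a uniform random permutation of {0,...,n-1}; return its longest cycle length.
    perm = list(range(n))
    rng.shuffle(perm)
    seen = [False] * n
    best = 0
    for i in range(n):
        length, j = 0, i
        while not seen[j]:
            seen[j] = True
            j = perm[j]
            length += 1
        best = max(best, length)
    return best

n, trials, x = 1000, 2000, 0.7
estimate = sum(max_cycle_len(n, random) <= x * n for _ in range(trials)) / trials
print(round(estimate, 2), round(1 + math.log(x), 2))  # the two values should be close
```

The Monte Carlo estimate agrees with $1 + \log 0.7 \approx 0.64$ to within sampling error.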
Although Goncharov investigated the cycle structure of random permutations in great detail, these problems continue to be of significant interest to mathematicians. V. F. Kolchin [71] proposed an approach based on the generalized scheme of allocation. The results on the asymptotic properties of random permutations obtained with the help of this approach are presented in [78]. Note that, among others, the asymptotic logarithmic normality of the middle terms of the series of order statistics composed of the lengths of cycles, and the local limit theorem on the convergence of the distribution of the total number of cycles $\nu_n$ to the normal distribution, were first proved by this method. It is clear that this approach makes it possible to investigate the asymptotic behavior of the local probabilities $P\{\nu_n = N\}$ for all possible values of $N = N(n)$ as $n \to \infty$. These investigations were carried out in [109, 115, 117, 146, 147, 148]. In Section 4.2, the results of these investigations are presented. Theorems 4.2.1 and 4.2.2 were proved by Yu. Pavlov in [115, 117], and Theorems 4.2.5, 4.2.6, 4.2.9, 4.2.10, and 4.2.12 were proved by L. M. Volynets in [146, 147, 148]. Methods of estimating the rate of convergence in limit theorems for sums of independent random variables are well developed in the theory of probability. Therefore the approach that reduces the study of characteristics of random permutations to problems concerning the sums of independent summands provides an obvious way to obtain the limit theorems containing estimates of the rate of convergence. The estimates under the conditions of Theorem 4.2.1 were obtained by Yu. Pavlov [117] and for $\gamma = 1$ by A. Pavlov [109]. The following result of Volynets [146] provides a better bound than the one given in [109].
Theorem 4.4.1. If $n \to \infty$, $N = \log n + x\sqrt{\log n}$, and $x/\sqrt{\log n} \to 0$, then

$$P\{\nu_n = N\} = \frac{\log^N n}{N!\,n}\Bigl(1 + O\Bigl(\frac{|x|^3 + 1}{\sqrt{\log n}}\Bigr)\Bigr).$$
Volynets [146] proved this theorem by using the approach based on the generalized scheme of allocation. Let En be the set of all single-valued mappings of the set { 1, ... , n ] into itself.
In particular, S, C En. The random mappings from En were first studied by J. B. Kruskal [94] and B. Harris [57], and many studies have considered subsets of E,,, which are distinguished from En by various constraints on the mappings. We mention only the articles by V. N. Sachkov [128, 129, 130], in which the mappings have the height of less than a fixed number, and cycle lengths are from a fixed set; the articles by A. A. Grusho [54, 55], which treat the subset En,r that consists of the mappings from En whose vertex degrees are not greater than r; the articles by Yu. Pavlov [114, 115] considering the characteristics of the mappings with exactly m components (the case m = 1 is considered by G. N. Bagaev in [8, 9]); and the article by J. Arney and E. A. Bender [5], which treats mappings with constraints on degrees of the vertices. The research in these directions began in the early seventies and is still ongoing. In our opinion, the most surprising results concerning mappings with constraints were obtained by I. B. Kalugin [64], which we summarize. Let E,, R be the subset of mappings from En such that the degrees of the vertices take values only from a set R that contains zero and does not coincide with the set {0, 1). Let l; (A) be a random variable with the distribution Ake-A
P{l; (),) = k] =
k! P(R.l) ,
k E R,
where A is a positive constant and Xke-x
P(R, A)
kl
kER
There exists aR such that El; (aR) = 1. Denote by BR the variance of the random variable l;'(aR). For the number of cyclic vertices XR' and the height Tn,R of the random mapping from En, R, the following assertions are well known [64, 78].
oo, then
Theorem 4.4.2. If n
n/BR P{aR) = k) = ze
Z2/2(1
+o(1))
uniformly in the integers k such that z = k BR/n lies in any interval of the form
0
If n --* oo, then for any fixed x > 0, 00
P I v BR f n tn,R < x 1 -
E (-1)ke k x2 2 2 k=-oo
4.4 Notes and references
215
An unexpected result appears if we consider the set En R of mappings from En defined as follows. If in the graph of a mapping from En we delete the edges that connect the cyclic vertices, we obtain a graph consisting of trees. The set En R contains the mappings from En such that the degree of any vertex of the trees takes a value in R. Thus the difference in the restrictions on the degrees in En R and En,R seems to be insignificant because only the restrictions on the degrees of cyclic vertices differ by 1. But the sets En, R and En R have a substantial difference in the structure of their corresponding random graphs. Let AR) and T, R, respectively, be the number of cyclic vertices and the height of a random mapping from the set En R with uniform distribution. For the random variable (A), set bR
aR =
=
If R does not coincide with the set of all nonnegative integers, then aR < 1. Theorem 4.4.4. If n -+ oo, then
P{AR) = k) =
1
bR 2nn
e-22/2(1 + o(1))
uniformly in the integers k such that z = (k - (1- aR)n)l (bR /) lies in any fixed finite interval. Theorem 4.4.5. If n -+ oo and t = t(n) is such that naR constant, then for any fixed integer m,
where ,B is a
expkppaR}+o(1),
where the constant k# depends only on P and the set R.
Since t = t (n) is of order log n, the random mappings from E* R have many cyclic vertices and, as a consequence, have the height of order log n rather than as in the case for the mappings from En,R. A satisfactory explanation for this situation is not known. In Section 4.3, we considered the set Sn,R of all permutations of degree n with cycle lengths from a fixed set R. The interest in such sets may be partly explained by their connection with the equations involving permutations, which we will look at in the next section. Another reason for investigating the set Sn,R and similar sets of mappings with various restrictions is the possibility (see [5]) of approximating more complicated sets of combinatorial objects by such sets with relatively simple constraints. Partly for these reasons, the asymptotic behavior of the number an,R of elements in Sn,R has been considered in some recent studies [25, 80, 102, 149, 153, 154].
Random permutations
216
The generating function f (z) for the numbers an, R of elements in Sn, R is 00
f(z) _ E n-0
an,RZ n
n.
ar
=exp rER
Therefore it is convenient to apply the saddle-point method to obtain the asymp-
totics of an,R. By this method, the cases in which the elements of R form an arbitrary arithmetic progression are considered in [25, 107]; see also [130]. The application of the Tauberian-type theorems is another approach that has been used in the investigations of this problem [153, 154, 155]. Let R(n) be the number of elements of R that are not greater than n and let I A I be the number of elements in A. Theorem 4.4.6. Let n -+ oo,
R(n)/n - p, 0 < p < 1,
(4.4.1)
and form > n, m = 0(n),
-Ik:k
.
(4.4.2)
Then
an,R = (n - 1)! exp{ln,R - yp}/ h(p)(1 + o(1)),
(4.4.3)
where 1
ln,R = Y r rER,r
,
y is the Euler constant, and IF is the Euler gamma function.
Conditions (4.4.1) and (4.4.2) indicate that the set R is similar to a typical realization of a random set containing each positive integer with probability p independent of the other integers. As examples of the sets R that satisfy conditions (4.4.1) and (4.4.2), we may take sets of the form R = {k: {g(k)} E 0},
(4.4.4)
where g(t) is a real-valued function of t > 0, {x} is the fractional part of x, and A is an interval or a finite union of intervals from [0, 1] with the Lebesgue measure p.
A. L. Yakymiv [154, 155] proved that a set R of the form (4.4.4) satisfies conditions (4.4.1) and (4.4.2) if
g(t) = tal(t),
4.4 Notes and references
217
where a is a noninteger positive number, I (t) is a slowly varying function, and as
t -4 00, dtn
l(t) = o(t-nl(t)),
n = 1, ..., [a] + 2.
Let ar,R be the number of cycles of length r in a random permutation of Sn,R + an,R be its total number of cycles. Yakymiv [154, 155] proved the following assertions.
and let vn,R = a1,R +
Suppose that conditions (4.4.1) and (4.4.2) are satisfied and n - oo. Then the distribution of the random variable (vn,R - ln,R)/ plogn Theorem 4.4.7.
converges weakly to the standard normal distribution, and for any fixed r E R, the distribution of ar,R converges to the Poisson distribution with parameter 1/r. A case of irregular behavior of an,R is considered in [149].
Theorem 4.4.8. If n -+ oo and R = E U M, where E is the set of all even positive numbers and M is a set of odd numbers such that the series
b=Y mEM
1
m
converges, then
an,R =
(n e
)n (eb + e b)(1 + o(1))
for even n, and e
l
b}(1+o(1))
for odd n. Volynets [149] proved this theorem with the aid of relation (4.3.4), in which she uses the representation
P{v=m,rl=s}P{ IE)+
+ N)m=n-s}.
s,m
Here the variables (R), ... , l
N) have the parameter x equal to 1 _-1 In, v is the number of these variables taking values in M, rl is the sum of these variables, N) are independent identically distributed random variables with and l (E), .. , the distribution
k} = Note that if b
1(1 - 1/n )k12
klogn
k E E.
0, the result of Theorem 4.4.8 transfers continuously to (4.3.5).
218
Random permutations
Theorems 4.3.2, 4.3.3, and 4.3.4 are given in [80]. It can be easily shown that the asymptotics (4.3.11) and (4.4.3) are identical. Thus, quite different sets of conditions yield coinciding results. This coincidence shows that there exist weaker conditions sufficient for the validity of the asymptotics (4.3.11). We give the detailed and cumbersome proof of Theorem 4.3.3 because we conjecture that condition (1) from this theorem and the existence of a positive density of R are sufficient for the validity of (4.3.11) and that it may be possible to simplify the proof. The research on the sets S,,,R of permutations with restrictions on the cycle lengths provides an example of a fruitful competition of various analytical methods of asymptotic analysis such as the saddle-point method, the application of Tauberian-type theorems, and the approach based on the generalized scheme of allocation. Note that it would also be interesting to consider the cases where the density
p=0.
5
Equations containing an unknown permutation
5.1. A quadratic equation If g and f are permutations of degree n, then the result of their sequential action h = fg is a permutation of degree n called the product of g and f. The set Sn of all permutations of degree n with this operation is the well-known symmetric group of degree n. Therefore we can consider equations of the form
Xd = a,
(5.1.1)
where d is a positive integer, a E Sn, and X is an unknown permutation from Sn. In the previous chapter, we considered the set Sn,R of all permutations of degree n with cycle lengths from a fixed set R and found the asymptotics for the number of elements in Sn,R for some regular sets R. The interest in the sets of permutations Sn,R may be partly explained by their connection with some equations involving permutations. For example, the set of all solutions of the equation XP = e
(5.1.2)
in the symmetric group Sn, where e is the identity permutation and p is a prime number, is exactly the set Sn,R with R = 11, p}. Indeed, a permutation X satisfies equation (5.1.2) if and only if its cycles are of the length 1 or p. Denote by T(P) the number of solutions of equation (5.1.2). Theorem 5.1.1. If p is a prime number, then Tn(P )
_L
1
(n-pk)!k!pk 0
Proof. Let or be a random permutation from Sn. It is clear that
P16P = el = T(P)/n!, 219
220
Equations containing an unknown permutation
and the study of T(P) is equivalent to the study of the probability P{aP = e}. Since T(P) = an,R, where R = {1, p},
{ap=e}={ar=0, r 1, r#p}=[oil +pup =n}, where ar is the number of cycles of length r in a random permutation from Sn. By (4.1.4),
P{al=n-pk, ap=k, ar=0, r#1, r0p}= (n - pk)! k! pk 1
Summing these probabilities over admissible values of k yields the assertion of the theorem. Set ao,R = 1 and consider the generating function of the sequence an,R, °O
.fR (z) _
an, R z
n
n!
k=0
Theorem 5.1.2. Zr
.fR (z) = exp
-r rER
1
Proof. According to (4.1.5), 00
OU,tl,t2.... ) _ T, On(tl,...,tn)tln n=0 00
= eXp Y
Ll n to
(5.1.3)
n=1
where
On(tl,...,tn) = Y P{a1 =ml,...,an =mn}tl I ...tn n, m1,...,mn
and ar is the number of cycles of length r in a random permutation from Sn. If we put tr = 1 for r E R and tr = 0 for r R, we find that the corresponding generating function c y n (t1, ... , tn) is
Y. P{ar = mr, r E R, ar = 0, r
R),
MR
where l
MR =
=n }
Iml..... rER
JJJ
.
5.1 A quadratic equation
221
It is easy to see that
Y
rar=nJ.
P{ar=mr, reR, Car=0, rER
MR
Thus, substituting tr = 1 if r E R and tr = 0 if r 0 R into (5.1.3) shows that the generating function for P I
an,R n!
Erar=n rER
equals 00
an
n=0
ur
Run
1
= exp
n.
(5.1.4)
rER
In view of Theorem 5.1.2, it is convenient to apply the saddle-point method to obtain asymptotics of T ,(P). In the next section, we will use a different approach based on the generalized scheme of allocation; however, for comparison, we now present the derivation of the asymptotics of T(2) by applying the saddle-point method.
Theorem 5.1.3. As n -+ oo, T(2) = e111 F2
/
(e)n/2 e n 1 1 + O
Proof. Since T.(2)
+ 57 00 n=1
zn
n!
= ez+z2/2,
by Cauchy's formula
F(n)
T(2) n!
1
ez+z2/2
27ri
zn+1
dz,
integrating over an arbitrary contour that goes around the point z = 0. We can write
F(n) = 21
f ez+z2/2-n1ogzdz
and choose the contour of integration to be the circle passing through the saddle point o, where the derivative of the function
f(z) = z +
z2 2
- nlogz
222
Equations containing an unknown permutation
is zero. From the equation n .f'(z)=1+z-
z
=0,
we find that
n + 4 - 2.
Q=
Thus, setting z = Qei °, jr < ip < 7r shows that
1 f ef(Z) z= 2ir i f
F(n)
Z
=
I 27rQn
f n eQerw+Q2e2ic0/2-n loge-incpd(p
1
27r
n
n
d
eQ+Q2/2 I
n
(p
For the sake of brevity, we let a = Q sin ip + (Q2 sin 2ip)/2 - nip and write the integral in the form n
eQ+Q2/2
F(n) =
27rQn
cosaeQ(1-cosrp)-e2(1-cos29p)/2dip
f7r
0+122/2
+ 1e
27rQn
n sinaeQ(1-cosrp)-QZ(1-cos2p)/2dip
J
Since F(n) is real, we see that
=
eQ+Q2/2 27rQn
-n
cosae
Q(1-cosy)-Q2(1-cos2p) /2F(n)
We choose s = Q-3/4 and estimate the integral outside the s-neighborhood of zero, as n oo, taking into account that p = In + 1/4 - 1/2 -+ oo. The integrand is even, so we only estimate the integral over ip, 0 < ip < 7r. It is convenient to consider the graphs of the functions cos ip and cos 2ip included in the exponent. With the help of the graphs presented in Figure 5.1.1, we can easily see that n/2
J f < 1 n/2 e-QZ(1-cos2W)12th < f /2 e-s2Q2/2dc < - e-s2Q2/2 = 2
s
s
2
since 1 - cos 2s > s2 for sufficiently small s. Similarly,
/2'
cosae-Q(1-cos40)-Q2(1-cos2w)/2dip
< f e1odip <
f
/2 n
n/2
e-edip =
Jr
e-Q
5.1 A quadratic equation
f
.1..
1.
n/2....
r/4.
£
223
I
n
3ir/4
Figure 5.1.1. Graphs of cos cp and cos 2cp
Thus p+,22/2
F(n)
=
(f
E
I , E
where s = Q-314. Since p + Q2 - n = 0, we find that, in a neighborhood of zero,
a = Q sin cp + 2 sin 2(p - ncp
=
02W
- nco + O(Q2IwI3) = O(Q2Iw13),
and therefore
cosy = 1 + 0(a2) = 1 + O(Q4cp6) The exponent of the integrand can be represented in the domain of integration as follows: Q(1 - Cos ) +
fi
- cos 2cp) =
(Q + 2Q2)(p2 + 0('02(P4). 2
Thus, for I cp I < E, cosae,2(1-cosW)-Q2(1-cos2W)12
= = =
e-W2(Q+2Q2)/2 (1+0(,o e-W2(Q+2,22)/2(1
e-W2(0+202)/2(1
2W4 +0 4 6))
+ 0(Q264 +Q4E6))
+ O(Q-112)).
Therefore
f
E
E cosae-Q(1-cosW)-02(1-cos2rp)/2dq
E
=f
0(Q-1/2 E
Equations containing an unknown permutation
224
The change of variables 6 =
e-2(2ez)/2d=
1
2
Q + 2Q2cp gives
Le_02/2d0
1
J re
2;r(Q + 2Q2)
=
1
+2Q2
E
e+2ez
(1 + O(e- )),
since as x -> oc, °O
e-u2/2
L
1
z
du = -e-x/2(1 + o(1)). x
Combining the estimates gives
F(n) _
ee+ezl2
1
Q + 2Q2
2nQn
(1+0('0-1 /2)) +
e e+Q2/2
27rQ"
Q + 2Q2
(1 + o(Q
O(e--Jo-l2)
-1/21).
It remains to substitute o = n -+1 /4 - 1/2 into this formula. Since F(n) = T(2)/n!, we find that
2n + 0(,0-1/2).
log n(2) = log n ! + Q + 2 - n log p - 2 log (Q + 2Q2) - log
(5.1.5)
Replace log n ! by Stirling's formula
log n ! = n log n - n + 2 log n + log 2n + 0(n-1).
(5.1.6)
It is easily sil y seen seen that t 1 n+1/4-2
Q
n(1+4n) 1/2 1
=
2
=T(1+8n+0(n )) =
-,In-
- 2+8 /
n-(1-2I +8n1 Q2
2
nI ) (n2
))'
(5.1.7)
fn-(5.1.8)
5.2 Equations of prime degree
225
When we use (5.1.7), we find
II
nlog Q =
=
(5.1.9)
(710
nlogn-4n+O4
Z
-
Finally,
log(Q + 2Q2) = log 2 + 2 log Q + log (1
( =n
= log n + log 2 + O
+2 (5.1.10)
I .
\\
By substituting estimates (5.1.6)-(5.1.10) into (5.1.5), we obtain the final formula for log T(2): log T(2)
= 2 log n - 2 + in - 4 - log
+ O I n-1/41
\
which implies the assertion of the theorem.
J
5.2. Equations of prime degree According to (4.3.4), the number an,R of permutations in Sn,R can be represented in the form n! BR(x)
00
an,R = xne
(BR(x))N -BR(x) P{ Ni e
(R) 1
+...+N(R)=n},
(5.2.1)
N=1
where xk
BR(x)=>-, k
(5.2.2)
kER
and (R), ... ,
N)
are independent identically distributed random variables,
k} =
xk , k BR (x)
k E R,
(5.2.3)
and the positive parameter x can be chosen arbitrarily from the domain of convergence of the series in (5.2.2). If p is a prime number, then the number T(P) of solutions of equation (5.1.2) is an,R, where R = {1, p}. Therefore XP BR(x)=x+-,
p
226
Equations containing an unknown permutation
and by (5.2.1), T(P)
=
n! e x+xplP 'n
00
N=1
where N = 1 +
(x +xPIP)N e _x_xP/P N.
n},
(5.2.4)
+ SAN, 1, ... , SN are independent identically distributed
random variables and P
P{ i-P}-Px+xP
P{sit=1}=Px+xP,
(5.2.5)
Thus, to find the asymptotics of T(, P), it suffices to choose an appropriate value
of x and to prove a local limit theorem for the sum N = 1 +
+ N. The
summation of independent random variables taking two values is a simple problem that is solved by the de Moivre-Laplace theorem. Therefore the approach based on the representation (5.2.4) seems more suitable here than the saddle-point method. We begin by applying this approach to the proof of Theorem 5.1.3.
Proof of Theorem 5.1.3. If R = {1, 2}, then obviously

P{ξ_1 = 1} = x/B(x) = 2/(2 + x),   P{ξ_1 = 2} = x^2/(2B(x)) = x/(2 + x),

where B(x) = B_R(x) = x + x^2/2, and Eζ_N = N(x + x^2)/B(x). In the main part of the sum in (5.2.4), the parameter N takes values close to B(x); therefore we choose x such that x + x^2 = n. Hence,

x = √(n + 1/4) - 1/2,
B(x) = x + x^2/2 = n/2 + (1/2)√(n + 1/4) - 1/4,

Eξ_1 = (x + x^2)/B(x) = n/B(x), and

Dξ_1 = x^3/(2B^2(x)) = 2n^{-1/2}(1 + o(1))

as n → ∞ (where D denotes the variance). Let

u = 2(N - B(x))/√(B(x)Dξ_1),   A = √(2 log n),

and divide the sum from (5.2.4) into two parts so that

T_n^{(2)} = (n! e^{B(x)} / x^n)(S_1 + S_2),   (5.2.6)

where

S_1 = Σ_{N: |u| ≤ A} (B^N(x)/N!) e^{-B(x)} P{ζ_N = n},
S_2 = Σ_{N: |u| > A} (B^N(x)/N!) e^{-B(x)} P{ζ_N = n}.

In the first sum,

|N - B(x)| ≤ (A/2)√(B(x)Dξ_1) = n^{1/4}√(log n / 2)(1 + o(1)),

so that (N - B(x))^2/B(x) → 0, and by using the normal approximation to the Poisson distribution, we obtain, as n → ∞,

(B^N(x)/N!) e^{-B(x)} = (1/√(2πB(x)))(1 + o(1))

uniformly in the integers N such that |u| ≤ A.

The sum ζ_N - N has the binomial distribution with N trials and the probability of success p(x) = x/(2 + x). If |u| ≤ A, then N = B(x)(1 + o(1)), and

N p(x)(1 - p(x)) = 2xN/(2 + x)^2 = √n (1 + o(1)) → ∞

as n → ∞. Therefore the normal approximation to the binomial distribution is valid. For |u| ≤ A = √(2 log n),

(n - N Eξ_1)/√(N Dξ_1) = n(B(x) - N)/(B(x)√(N Dξ_1)) = -u(1 + o(1)).

Therefore, by the de Moivre-Laplace theorem,

P{ζ_N = n} = (1/√(2πN Dξ_1)) e^{-u^2/2}(1 + o(1))

uniformly in the integers N such that |u| ≤ A.

The behavior of the functions φ_1(N) = B^N(x)e^{-B(x)}/N! and φ_2(N) = P{ζ_N = n} is represented approximately in Figure 5.2.1. The sum S_1 can be estimated as follows:

S_1 = Σ_{N: |u| ≤ A} (B^N(x)/N!) e^{-B(x)} P{ζ_N = n}
    = Σ_{N: |u| ≤ A} (1/√(2πB(x))) (1/√(2πN Dξ_1)) e^{-u^2/2}(1 + o(1))
    = (1/(2√(2πB(x)))) (1/√(2π)) Σ_{N: |u| ≤ A} e^{-u^2/2} (2/√(B(x)Dξ_1)) (1 + o(1)).
Figure 5.2.1. The graphs of φ_1(N) and φ_2(N)
The last sum is an integral sum of the function e^{-u^2/2} with step 2/√(B(x)Dξ_1), so as n → ∞,

S_1 = (1/(2√(2πB(x)))) (1/√(2π)) ∫_{-∞}^{∞} e^{-u^2/2} du (1 + o(1)) = (1/(2√(2πB(x))))(1 + o(1)).

By virtue of monotonicity, for |u| > A,

P{ζ_N = n} ≤ (1/√(2πN Dξ_1)) e^{-A^2/2}(1 + o(1)),

and there exists a constant c such that P{ζ_N = n} ≤ cn^{-5/4}. Therefore

S_2 = Σ_{N: |u| > A} (B^N(x)/N!) e^{-B(x)} P{ζ_N = n} ≤ cn^{-5/4}.

Thus

S = S_1 + S_2 = S_1(1 + o(1)) = (1/(2√(2πB(x))))(1 + o(1)),

and by substituting this estimate into (5.2.6), we obtain

T_n^{(2)} = (n! e^{B(x)} / (2 x^n √(2πB(x))))(1 + o(1)).

It remains to substitute

x = √(n + 1/4) - 1/2,   B(x) = n/2 + (1/2)√(n + 1/4) - 1/4

into the formula. It is easily seen that

e^{B(x)} = e^{n/2 + √n/2 - 1/4}(1 + o(1)),
x^n = n^{n/2} e^{-√n/2}(1 + o(1)).

Therefore

T_n^{(2)} = (√(2πn) n^n e^{-n} e^{n/2 + √n/2 - 1/4}) / (2 n^{n/2} e^{-√n/2} √(πn)) (1 + o(1)) = e^{-1/4} 2^{-1/2} (n/e)^{n/2} e^{√n}(1 + o(1)),

and Theorem 5.1.3 with the remainder term of the form 1 + o(1) is proved.

We now turn to the case where p is a fixed prime number, p ≥ 3, and consider the number T_n^{(p)} of solutions of equation (5.1.2).
Theorem 5.2.1. If n → ∞ and p is prime, p ≥ 3, then

T_n^{(p)} = (n/e)^{n(1-1/p)} p^{-1/2} e^{n^{1/p}} (1 + o(1)).
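The theorem can be tested against an exact computation: a permutation counted by T_n^{(p)} either fixes the element n or places it in one of the (n - 1)(n - 2)···(n - p + 1) possible p-cycles, giving the recurrence a_n = a_{n-1} + (n - 1)···(n - p + 1) a_{n-p}. A small sketch (our own code, with p = 3; the function names are ours):

```python
import math

def count_pth_roots(n, p):
    # exact number of permutations of n with all cycle lengths in {1, p}
    a = [0] * (n + 1)
    a[0] = 1
    for i in range(1, n + 1):
        s = a[i - 1]  # element i is a fixed point
        if i >= p:
            w = 1
            for t in range(i - p + 1, i):
                w *= t  # (i-1)(i-2)...(i-p+1) ways to build the p-cycle
            s += w * a[i - p]
        a[i] = s
    return a[n]

def log_asymptotic(n, p):
    # log of (n/e)^(n(1-1/p)) * p^(-1/2) * e^(n^(1/p))
    return n * (1 - 1 / p) * (math.log(n) - 1) - 0.5 * math.log(p) + n ** (1 / p)

n, p = 300, 3
diff = math.log(count_pth_roots(n, p)) - log_asymptotic(n, p)
print(diff)  # small compared to log T, which is about 950 here
```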
Proof. The proof is almost the same as the proof of Theorem 5.1.3 given above and is also based on relation (5.2.4). For R = {1, p},

B(x) = B_R(x) = x + x^p/p,

and the independent random variables ξ_1, ..., ξ_N in (5.2.4) have the distribution

P{ξ_1 = 1} = x/B(x) = px/(px + x^p),   P{ξ_1 = p} = x^p/(pB(x)) = x^p/(px + x^p).

We choose the parameter x such that

x + x^p = n.   (5.2.7)

Then

x = n^{1/p} - (1/p) n^{-1+2/p} + O(n^{-2+3/p}),
B(x) = x + x^p/p = n/p + ((p - 1)/p) n^{1/p} + O(n^{-1+2/p}),
p(x) = x^p/(px + x^p) = 1 - pn^{-1+1/p} + O(n^{-2+2/p}),
Eξ_1 = n/B(x),
Dξ_1 = (p - 1)^2 p n^{-1+1/p}(1 + o(1)).

Let

u = p(N - B(x))/√(B(x)Dξ_1),   A = √(2 log n),

and divide the sum in (5.2.4) into two parts so that

T_n^{(p)} = (n! e^{B(x)} / x^n)(S_1 + S_2),

where

S_1 = Σ_{N: |u| ≤ A} (B^N(x)/N!) e^{-B(x)} P{ζ_N = n},
S_2 = Σ_{N: |u| > A} (B^N(x)/N!) e^{-B(x)} P{ζ_N = n}.

In the first sum, N = B(x)(1 + o(1)) and

(B^N(x)/N!) e^{-B(x)} = (1/√(2πB(x)))(1 + o(1))

uniformly in the integers N such that |u| ≤ A.
Let ξ_i* = (ξ_i - 1)/(p - 1), i = 1, ..., N. The sum

ζ_N* = ξ_1* + ··· + ξ_N*

has the binomial distribution with N trials and the probability of success

p(x) = x^p/(px + x^p) = 1 - pn^{-1+1/p} + O(n^{-2+2/p})

as n → ∞. It is clear that

P{ζ_N = n} = P{ζ_N* = (n - N)/(p - 1)},

and if (n - N)/(p - 1) is not an integer, then P{ζ_N = n} = 0. Since Eξ_1 = n/B(x), B(x) = (n/p)(1 + o(1)), and

(n - N Eξ_1)/√(N Dξ_1) = n(B(x) - N)/(B(x)√(N Dξ_1)) = -u(1 + o(1))

as n → ∞ and |u| ≤ A, by using the de Moivre-Laplace theorem, we obtain

P{ζ_N = n} = P{ζ_N* = (n - N)/(p - 1)} = ((p - 1)/√(2πN Dξ_1)) e^{-u^2/2}(1 + o(1))

uniformly in the integers N such that (n - N)/(p - 1) is an integer and |u| ≤ A.
Therefore

S_1 = Σ_{N: |u| ≤ A} (B^N(x)/N!) e^{-B(x)} P{ζ_N = n}
    = (1/√(2πB(x))) Σ_{N: |u| ≤ A} ((p - 1)/√(2πN Dξ_1)) e^{-u^2/2}(1 + o(1))
    = (1/(p√(2πB(x)))) (1/√(2π)) Σ_{N: |u| ≤ A} e^{-u^2/2} (p(p - 1)/√(B(x)Dξ_1)) (1 + o(1)),

where the summation is over the integers N such that (n - N)/(p - 1) is an integer. The last sum is an integral sum of the function e^{-u^2/2}; since the summation is over N such that (n - N)/(p - 1) is an integer, that is, only each (p - 1)th term is included in the sum, the step is p(p - 1)/√(B(x)Dξ_1), and we obtain

(1/√(2π)) Σ_{N: |u| ≤ A} e^{-u^2/2} (p(p - 1)/√(B(x)Dξ_1)) → (1/√(2π)) ∫_{-∞}^{∞} e^{-u^2/2} du = 1.

Therefore, as n → ∞,

S_1 = (1/(p√(2πB(x))))(1 + o(1)).

For |u| > A,

P{ζ_N = n} ≤ ((p - 1)/√(2πB(x)Dξ_1)) e^{-A^2/2}(1 + o(1)),

and there exists a constant c such that

P{ζ_N = n} ≤ cn^{-1-1/(2p)},   and   S_2 ≤ cn^{-1-1/(2p)}.

Thus

S = S_1 + S_2 = S_1(1 + o(1)) = (1/(p√(2πB(x))))(1 + o(1)),

and by substituting this estimate into (5.2.6), we obtain

T_n^{(p)} = (n! e^{B(x)} / (p x^n √(2πB(x))))(1 + o(1)).   (5.2.8)

It is easily seen that

e^{B(x)} = e^{n/p + (p-1)n^{1/p}/p}(1 + o(1)),
x^n = n^{n/p} e^{-n^{1/p}/p}(1 + o(1)).

When we substitute these expressions into (5.2.8), we obtain the assertion of Theorem 5.2.1.
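Both proofs rest on the de Moivre-Laplace local theorem, which is easy to illustrate directly. A self-contained sketch (the values of N, p, and k are ours, chosen for illustration):

```python
import math

def binom_pmf(N, p, k):
    # exact binomial probability, computed through logarithms to avoid underflow
    logp = (math.log(math.comb(N, k))
            + k * math.log(p) + (N - k) * math.log(1 - p))
    return math.exp(logp)

def local_normal(N, p, k):
    # normal density value given by the de Moivre-Laplace local theorem
    s2 = N * p * (1 - p)
    u = (k - N * p) / math.sqrt(s2)
    return math.exp(-u * u / 2) / math.sqrt(2 * math.pi * s2)

N, p = 10_000, 0.3
for k in (2900, 3000, 3100):
    print(k, binom_pmf(N, p, k), local_normal(N, p, k))
```

The two columns agree to within a fraction of a percent, which is what the uniformity statements above assert for |u| ≤ A.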
A slight refinement of the estimates used in the proof of Theorem 5.2.1 allows us to show that the assertion of the theorem remains valid if p tends to infinity slowly; below we prove a more general result.

Theorem 5.2.2. If p is prime and n, p → ∞ in such a way that p/n → 0, then

T_n^{(p)} = (n/e)^{n(1-1/p)} p^{1/2} Σ_{k=0}^∞ (n^{1/p})^{m+kp}/(m + kp)! (1 + o(1));   (5.2.9)

in particular, if p^{-2} n^{1/p} → ∞, then

T_n^{(p)} = (n/e)^{n(1-1/p)} p^{-1/2} e^{n^{1/p}}(1 + o(1)),   (5.2.10)

and if p^{-1} n^{1/p} → 0, then

T_n^{(p)} = (n/e)^{n(1-1/p)} p^{1/2} (n^{m/p}/m!)(1 + o(1)),   (5.2.11)

where m = n - p[n/p], and [c] is the integer part of c.

Proof. The proof is similar to the proof of Theorem 5.2.1, but now we need to trace the effect of the parameter p in the remainder terms of the asymptotic formulas and to use a representation in terms of the Poisson probabilities instead of the representation (5.2.4). It follows from the equation x + x^p = n that under the conditions of the theorem,
x = n^{1/p} - (1/p) n^{-1+2/p} + O(n^{-2+3/p}/p^2),   (5.2.12)

B = B(x) = n/p + ((p - 1)/p) n^{1/p} + O(n^{-1+2/p}/p).   (5.2.13)

Therefore it is easy to confirm that

p(x) = x^p/(px + x^p) = 1 - pn^{-1+1/p} + O(n^{-2+2/p}).

The random variable ζ_N* = (ζ_N - N)/(p - 1) can be represented in the form

ζ_N* = N - η_N,

where η_N has the binomial distribution with N trials and probability of success

q = q(x) = 1 - p(x) = pn^{-1+1/p}(1 + O(n^{-1+1/p})).   (5.2.14)

Therefore it is not difficult to see that for n = m + p[n/p], the probability P{ζ_N = n} is nonzero only if

N = [n/p] + m + k(p - 1),   0 ≤ k ≤ [n/p],
where l = m + kp. Thus, the representation (5.2.4) takes the form

T_n^{(p)} = (n! e^B / x^n) Σ_{N=1}^∞ (B^N/N!) e^{-B} P{ζ_N = n} = (n! e^B / x^n) Σ_{k=0}^{[n/p]} (B^N/N!) e^{-B} P{η_N = l}.

This results in the representation

T_n^{(p)} = (n! e^B / x^n) Σ_{k=0}^{[n/p]} ((Bq)^l/l!) e^{-Bq} ((B(1 - q))^{N-l}/(N - l)!) e^{-B(1-q)},   (5.2.15)

where l = m + pk, N = [n/p] + m + k(p - 1), m = n - p[n/p]; and to obtain the basic assertion of the theorem, we must sum the products of two Poisson probabilities. Let

S = Σ_{k=0}^{[n/p]} ((Bq)^{m+pk}/(m + pk)!) e^{-Bq},   α = (p n^{-1+1/p})^{1/3},

and divide S into two parts,

S_1 = Σ_{k: |N-B| ≤ α√B} ((Bq)^{m+pk}/(m + pk)!) e^{-Bq},
S_2 = Σ_{k: |N-B| > α√B} ((Bq)^{m+pk}/(m + pk)!) e^{-Bq}.
Note that α → 0 under the conditions of the theorem, and the normal approximation to the second multiplier,

((B(1 - q))^{N-l}/(N - l)!) e^{-B(1-q)} = (1/√(2πB))(1 + o(1)),   (5.2.16)

is valid for all l, N such that |N - B| ≤ α√B, and outside this region,

((B(1 - q))^{N-l}/(N - l)!) e^{-B(1-q)} ≤ c/√(2πB),   (5.2.17)

where c is a constant.

It remains to show that S_2 = o(S_1) and

S_1 = Σ_{k=0}^∞ ((n^{1/p})^{m+pk}/(m + pk)!) e^{-n^{1/p}} (1 + o(1)).   (5.2.18)

For the sake of brevity, we let b = Bq. It follows from (5.2.13) and (5.2.14) that under the conditions of the theorem,

b = n^{1/p}(1 + O(pn^{-1+1/p})).   (5.2.19)

It is clear that

S_1 ≥ (b^{[b]+p}/([b] + p)!) e^{-b},

since at least one of the summands with l from the interval ([b], [b] + p] is included in the sum S_1. On the other hand, the summation over N > B + α√B is the summation over l, with l = m + pk such that l > b + α√B + o(√B). Let l_0 = b + α√B + o(√B). Then

S_2 ≤ (b^{l_0}/l_0!) e^{-b} (1 + b/l_0 + b^2/l_0^2 + ···) = (b^{l_0} e^{-b}/l_0!) (l_0/(l_0 - b)) ≤ c b^{l_0} e^{-b}/l_0!,

since b/l_0 → 0. Therefore,

S_2/S_1 ≤ c b^{l_0-[b]-p}/(l_0(l_0 - 1)···([b] + p + 1))
        = c/((1 + (l_0 - b)/b)···(1 + ([b] - b + p + 1)/b))
        ≤ c_1 b^3/(l_0 - b)^3 ≤ c_2 b^3/(α√B)^3 ≤ c_3 n^{3/p} p^{3/2}/(α^3 n^{3/2}),

where c_1, c_2, and c_3 are constants. By the choice of α, the last bound tends to zero. This estimate, (5.2.16), (5.2.17), and (5.2.19) imply (5.2.18). Assertion (5.2.9) follows from (5.2.15), (5.2.16), (5.2.17), and (5.2.18).

If p^{-2} n^{1/p} → ∞, then by using the normal approximation, we obtain

Σ_{k=0}^∞ ((n^{1/p})^{m+pk}/(m + pk)!) e^{-n^{1/p}} = (1/p)(1 + o(1)).

This yields assertion (5.2.10) of the theorem. Assertion (5.2.11) follows from the fact that if p^{-1} n^{1/p} → 0, then

Σ_{k=0}^∞ (n^{1/p})^{m+pk}/(m + pk)! = (n^{m/p}/m!)(1 + o(1)).
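The lacunary exponential sum in (5.2.9) can also be evaluated in closed form by a roots-of-unity filter, Σ_{k≥0} b^{m+kp}/(m + kp)! = (1/p) Σ_{j=0}^{p-1} ω^{-jm} e^{bω^j} with ω = e^{2πi/p}; the j = 0 term gives e^b/p, which is the regime of (5.2.10), while for small b the single term b^m/m! dominates, which is the regime of (5.2.11). A sketch (our own code; parameters chosen for illustration):

```python
import cmath
import math

def sectioned_exp(b, p, m, kmax=300):
    # direct summation of b^(m+kp)/(m+kp)! over k = 0, 1, 2, ...
    total = 0.0
    term = b ** m / math.factorial(m)
    for k in range(kmax):
        total += term
        for i in range(1, p + 1):
            term *= b / (m + k * p + i)  # advance to the next retained term
        if term < 1e-300:
            break
    return total

def filtered_exp(b, p, m):
    # roots-of-unity filter: (1/p) * sum_j w^(-j*m) * exp(b * w^j)
    w = cmath.exp(2j * cmath.pi / p)
    return sum(w ** (-j * m) * cmath.exp(b * w ** j) for j in range(p)).real / p

print(sectioned_exp(9.0, 5, 2), filtered_exp(9.0, 5, 2))  # equal
```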
5.3. Equations of compound degree

In this section, we consider the number T_n^{(d)} of solutions of the equation

X^d = e,   (5.3.1)

where d is a natural number, e is the identity permutation, and X is an unknown element of the symmetric group S_n. The cases where d is a prime number were considered in the previous sections. Let d be a compound number and let 1 = d_0 < d_1 < ··· < d_r = d be all the distinct divisors of d. A permutation X is a solution of equation (5.3.1) if and only if the lengths of the cycles of X belong to the set {d_0, ..., d_r}. Therefore T_n^{(d)} is equal to the number a_{n,R} of permutations in S_{n,R}, where R = {d_0, ..., d_r}. The following theorem is a generalization of Theorems 5.1.3 and 5.2.1.

Theorem 5.3.1. If n → ∞ and d is a fixed number, d ≥ 2, then

T_n^{(d)} = (n/e)^n n^{-n/d} (1/√d) exp{ Σ_{j|d} n^{j/d}/j } (1 + o(1))

if d is odd, and

T_n^{(d)} = (n/e)^n n^{-n/d} (1/√d) exp{ Σ_{j|d} n^{j/d}/j - 1/(2d) } (1 + o(1))

if d is even.

Note that the summation in the above formulas is over the divisors j of the number d, and if we put d = 2 and d = p, we obtain Theorems 5.1.3 and 5.2.1, respectively.
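As in the prime case, the theorem can be checked numerically: the count a_{n,R} of permutations with all cycle lengths in R satisfies a_n = Σ_{k∈R} (n - 1)(n - 2)···(n - k + 1) a_{n-k}, since the element n lies in a cycle of some length k ∈ R. The following sketch (our own code, with d = 4) compares the logarithm of the exact count with the even-d formula:

```python
import math

def count_roots(n, d):
    # exact number of solutions of X^d = e in S_n
    R = [j for j in range(1, d + 1) if d % j == 0]
    a = [0] * (n + 1)
    a[0] = 1
    for i in range(1, n + 1):
        s = 0
        for k in R:
            if k <= i:
                w = 1
                for t in range(i - k + 1, i):
                    w *= t  # (i-1)...(i-k+1) ways to close a k-cycle through i
                s += w * a[i - k]
        a[i] = s
    return a[n]

def log_asymptotic(n, d):
    # log of (n/e)^n * n^(-n/d) * d^(-1/2) * exp(sum_{j|d} n^(j/d)/j),
    # with the extra -1/(2d) in the exponent when d is even
    val = n * (math.log(n) - 1) - (n / d) * math.log(n) - 0.5 * math.log(d)
    val += sum(n ** (j / d) / j for j in range(1, d + 1) if d % j == 0)
    if d % 2 == 0:
        val -= 1 / (2 * d)
    return val

n, d = 300, 4
diff = math.log(count_roots(n, d)) - log_asymptotic(n, d)
print(diff)  # small
```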
Proof. Let 1 = d_0 < d_1 < ··· < d_r = d be all the divisors of d, R = {d_0, ..., d_r},

B(x) = B_R(x) = Σ_{k∈R} x^k/k,

and let ξ_1, ..., ξ_N be independent identically distributed random variables,

P{ξ_1 = k} = x^k/(kB(x)),   k ∈ R,   (5.3.2)

where the positive parameter x can be chosen arbitrarily. Since d is compound, r ≥ 2. Put ζ_N = ξ_1 + ··· + ξ_N. It is clear that

Eξ_1 = (x + x^{d_1} + ··· + x^{d_{r-1}} + x^d)/B(x).

We choose the parameter x such that

x + x^{d_1} + ··· + x^{d_{r-1}} + x^d = n,   (5.3.3)

and in what follows, we consider the random variables ξ_1, ..., ξ_N with distribution (5.3.2), where x is the solution of this equation. By iteration, it is not difficult to determine that

x^d = n - n^{d_{r-1}/d} - ··· - n^{1/d} + o(1)   (5.3.4)

if d is odd, and

x^d = n - n^{d_{r-1}/d} - ··· - n^{1/d} + 1/2 + o(1)   (5.3.5)

if d is even. Since T_n^{(d)} = a_{n,R}, where R = {1, d_1, ..., d_{r-1}, d}, we can use the representation (5.2.1) and obtain

T_n^{(d)} = (n! e^{B(x)}/x^n) Σ_{N=1}^∞ (B^N(x)/N!) e^{-B(x)} P{ζ_N = n}.   (5.3.6)

Therefore, to obtain the assertions of Theorem 5.3.1, it is sufficient to find the asymptotics of the sum in (5.3.6). It is not difficult to see that

Eξ_1 = n/B(x),
Dξ_1 = (x + d_1 x^{d_1} + ··· + d x^d)/B(x) - n^2/B^2(x).

In view of (5.3.4) and (5.3.5),

B(x) = Σ_{j|d} n^{j/d}/j (1 + o(1))   (5.3.7)

as n → ∞, where the summation is over the integers j that are the divisors of d. By estimating the second and third central moments of ξ_1 and using the characteristic function of ζ_N, we can prove that the distribution of the random variable (ζ_N - N Eξ_1)/√(N Dξ_1) converges to the normal law with parameters (0, 1) as N → ∞. If h is the maximal step of the lattice containing the set R, then the local limit theorem is valid on this lattice. We omit the proof of this local theorem.
The remaining part of the proof of Theorem 5.3.1 repeats the corresponding part of the proof of Theorem 5.1.3 from Section 5.2. We put

v = (n - N Eξ_1)/√(N Dξ_1),   u = d(N - B(x))/√(B(x)Dξ_1),   A = 2√(2 log n),

and divide the sum from (5.3.6) into two parts so that

T_n^{(d)} = (n! e^{B(x)}/x^n)(S_1 + S_2),

where

S_1 = Σ_{N: |u| ≤ A} (B^N(x)/N!) e^{-B(x)} P{ζ_N = n},
S_2 = Σ_{N: |u| > A} (B^N(x)/N!) e^{-B(x)} P{ζ_N = n}.

It is easy to see that N = B(x)(1 + o(1)) for |u| ≤ A = 2√(2 log n) and

v = n(B(x) - N)/(B(x)√(N Dξ_1)) = -u(1 + o(1)),   (5.3.8)

and by the local limit theorem,

P{ζ_N = n} = (h/√(2πN Dξ_1)) e^{-u^2/2}(1 + o(1))

uniformly in the integers N such that |u| ≤ A and (n - N)/h is an integer. Recall that h is the maximal span of the distribution of ξ_1. As in the proof of Theorem 5.1.3 in Section 5.2, we obtain

S_1 = (1/(d√(2πB(x)))) (1/√(2π)) Σ_{N: |u| ≤ A} e^{-u^2/2} (dh/√(B(x)Dξ_1)) (1 + o(1)).

The last sum is an integral sum of the function e^{-u^2/2} with step dh/√(B(x)Dξ_1), and the summation is over N such that (n - N)/h is an integer, that is, only each hth term is included in the sum. Since h and d are relatively prime, we see that

(1/√(2π)) Σ_{N: |u| ≤ A} e^{-u^2/2} (dh/√(B(x)Dξ_1)) → (1/√(2π)) ∫_{-∞}^{∞} e^{-u^2/2} du = 1,

and

S_1 = (1/(d√(2πB(x))))(1 + o(1)).
In estimating S_2, it is not possible now to use the monotonicity of the tails of the function φ_2(N) = P{ζ_N = n} as we did in the proof of Theorem 5.1.3 in Section 5.2 (see Figure 5.2.1). By (5.3.8), in the second sum, |v| > √(2 log n) for sufficiently large n. Therefore, in the second sum,

P{ζ_N = n} ≤ Σ_{n': |v| > √(2 log n)} P{ζ_N = n'}.

By the integral limit theorem,

Σ_{n': |v| > √(2 log n)} P{ζ_N = n'} = (2/√(2π)) ∫_{√(2 log n)}^∞ e^{-z^2/2} dz (1 + o(1)),

and there exists a constant c such that, in the second sum, P{ζ_N = n} ≤ cn^{-1}. Thus, S_1 + S_2 = S_1(1 + o(1)), and we obtain

T_n^{(d)} = (n! e^{B(x)}/(x^n d√(2πB(x))))(1 + o(1)).   (5.3.9)

This implies the assertions of the theorem because

e^{B(x)} = exp{ Σ_{j|d} x^j/j },

and x^n can be represented in the cases of odd and even d as follows. Let d be odd; then according to (5.3.4),

x^n = n^{n/d} e^{-(n^{d_{r-1}/d} + ··· + n^{1/d})/d}(1 + o(1)).

Moreover, for odd d,

x^j = n^{j/d} + o(1)   for j | d, 1 ≤ j < d,
x^d = n - n^{d_{r-1}/d} - ··· - n^{1/d} + o(1),

so that

e^{B(x)} = exp{ Σ_{j|d} n^{j/d}/j - (n^{d_{r-1}/d} + ··· + n^{1/d})/d + o(1) },

and

x^{-n} e^{B(x)} = n^{-n/d} exp{ Σ_{j|d} n^{j/d}/j } (1 + o(1)).

When we substitute the last expression into (5.3.9), we obtain the first assertion of the theorem.

If d is even, we note that 2d_{r-1} = d and use (5.3.5) to obtain

x^n = n^{n/d} e^{-(n^{d_{r-1}/d} + ··· + n^{1/d} - 1/2)/d - 1/(2d)}(1 + o(1)).

For j | d, 1 ≤ j < d, j ≠ d_{r-1},

x^j = n^{j/d} + o(1);

for j = d,

x^d = n - n^{d_{r-1}/d} - ··· - n^{1/d} + 1/2 + o(1);

and for j = d_{r-1},

x^{d_{r-1}} = n^{d_{r-1}/d} - d_{r-1}/d + o(1).

Thus

e^{B(x)} = exp{ Σ_{j|d} n^{j/d}/j - (n^{d_{r-1}/d} + ··· + n^{1/d} - 1/2)/d - 1/d + o(1) },

and

x^{-n} e^{B(x)} = n^{-n/d} exp{ Σ_{j|d} n^{j/d}/j - 1/(2d) } (1 + o(1)).

The substitution of the last expression into (5.3.9) gives us the second assertion of the theorem.
5.4. Notes and references

The study of equations of the form X^d = e in the symmetric group S_n is directly related to one of the significant characteristics of the elements of S_n: the order of a permutation. By the order O_n(s) of a permutation s ∈ S_n, we mean the least positive integer k such that s^k is the identity permutation. The orders of elements of S_n vary from 1 to the maximal value G(n) over all s ∈ S_n. E. Landau [95] showed that

lim_{n→∞} log G(n)/√(n log n) = 1.

In spite of such a wide range of log O_n(s), the typical values of log O_n(s) are considerably less than log G(n) and are concentrated near (1/2) log^2 n. Let O_n be the order of a random permutation from S_n with the uniform distribution. The following assertion is well known.
Theorem 5.4.1. For any fixed x,

lim_{n→∞} P{ (log O_n - (1/2) log^2 n)/√((1/3) log^3 n) ≤ x } = (1/√(2π)) ∫_{-∞}^x e^{-u^2/2} du.
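Since disjoint cycles act independently, O_n(s) is the least common multiple of the cycle lengths of s. The following sketch (our own code) confirms this against direct iteration for a random permutation:

```python
import math
import random

def cycle_lengths(perm):
    # lengths of the cycles of a permutation given as a list perm[i] = s(i)
    seen, lengths = set(), []
    for start in range(len(perm)):
        if start not in seen:
            length, j = 0, start
            while j not in seen:
                seen.add(j)
                j = perm[j]
                length += 1
            lengths.append(length)
    return lengths

def order_by_iteration(perm):
    # least k with s^k = identity, found by repeated composition
    n, cur, k = len(perm), perm[:], 1
    while cur != list(range(n)):
        cur = [perm[c] for c in cur]
        k += 1
    return k

random.seed(1)
perm = list(range(12))
random.shuffle(perm)
print(math.lcm(*cycle_lengths(perm)), order_by_iteration(perm))  # the two coincide
```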
The asymptotic normality of log O_n was first proved by P. Erdős and P. Turán [39]. Other proofs of Theorem 5.4.1 can be found in [106, 18, 27]. All the proofs are rather cumbersome and involve many analytical difficulties. From our point of view, the simplest proof, but still not a sufficiently simple one, is suggested in [78], where the approach based on the generalized scheme is used.

It seems to us that investigating the numbers of solutions of equations of the form X^d = e could provide the basis for the study of the local behavior of O_n. Indeed, if p is prime, then T_n^{(p)} is just the number of permutations s ∈ S_n whose order O_n(s) equals p. Since the leading term of the asymptotics of the number T_n^{(d)} for a compound d is (n/e)^{n(1-1/d)}, almost all permutations counted by T_n^{(d)} probably have the order d. It would be of considerable interest to find the asymptotics of the local probabilities P{O_n = d} for d that lie in a neighborhood of exp{(1/2) log^2 n} and to see whether the integral limit theorem follows from these results, in spite of the fact that the behavior of the probabilities P{O_n = d} is likely to be rather complicated. By virtue of the irregularity of the behavior of P{O_n = d}, this problem is not as trivial as obtaining the integral limit theorem from the local theorem usually is, because now we have to obtain the local theorem for d of a specified form and, in addition, we have to know how many d of such a form exist.

Theorems 5.1.1 and 5.1.2 for R = {1, 2} and Theorem 5.1.3 were proved in [32]. Theorem 5.1.2 for R = {1, p}, p > 2, was proved in [61], and for an arbitrary R in [33].
Theorem 5.1.3 was proved in [103], where the result of Theorem 5.2.1 was also presented. Assertion (5.2.9) of Theorem 5.2.2 was proved by the saddle-point method in [144]. Theorem 5.3.1 was proved in [108, 145, 150] independently and almost simultaneously. The approach based on the generalized scheme of allocation, presented in Chapter 5 of this book, was first published in [82], where the proof of Theorem 5.1.3 was realized with the help of this approach. The proof of Theorem 5.3.1 in Section 5.3 follows A. V. Kolchin [68], who, in addition, extended this theorem to the case d → ∞ such that d ln ln n/ln n → 0. The general conditions for the existence of a solution of the equation X^d = a, where a is a fixed permutation and X is an unknown permutation from S_n, are given in [102].
The system of equations

X_1^{m_1} = X_2^{m_2} = ··· = X_k^{m_k} = e,

where k ≥ 2, m_1, ..., m_k are fixed natural numbers, X_1, ..., X_k ∈ S_n, and e is the identity permutation in S_n, is considered in [110]. The asymptotic representation of the number of solutions X = (X_1, ..., X_k) such that X_iX_j = X_jX_i for all i ≠ j is found.
BIBLIOGRAPHY
[1] Sh. M. Agadzhanyan. On a general method of estimating the number of graphs from given classes. Avtomatika, (1):10-21, 1981. In Russian.
[2] Sh. M. Agadzhanyan. The asymptotic formulae for the number of m-component graphs. Avtomatika, (4):27-33, 1986. In Russian.
[3] D. J. Aldous. Exchangeability and related topics. Lecture Notes in Math., 1117:1-198, 1985.
[4] D. J. Aldous. Brownian bridge asymptotics for random mappings. Adv. Appl. Probab., 24:763-764, 1992.
[5] J. Arney and E. A. Bender. Random mappings with constraints on coalescence. Pacific J. Math., 103:269-294, 1982.
[6] R. A. Arratia. Independent process approximation for random combinatorial structures. Adv. Appl. Probab., 24:764-765, 1992.
[7] R. Arratia and S. Tavaré. Limit theorems for combinatorial structures via discrete process approximations. Random Structures and Algorithms, 3:321-345, 1992.
[8] G. N. Bagaev. Distribution of the number of vertices in a component of an indecomposable mapping. Belorussian Acad. Sci. Dokl., 21(12):1061-1063, 1977. In Russian.
[9] G. N. Bagaev. Limit distributions of metric characteristics of an indecomposable random mapping. In Combinatorial and Asymptotic Analysis, pp. 55-61. Krasnoyarsk Univ., Krasnoyarsk, 1977. In Russian.
[10] G. N. Bagaev and E. F. Dmitriev. Enumeration of connected labelled bipartite graphs. Belorussian Acad. Sci. Dokl., 28:1061-1063, 1984. In Russian.
[11] G. V. Balakin. On random matrices. Theory Probab. Appl., 12:346-353, 1967. In Russian.
[12] G. V. Balakin. The distribution of random matrices over a finite field. Theory Probab. Appl., 13:631-641, 1968. In Russian.
[13] G. V. Balakin, V. I. Khokhlov, and V. F. Kolchin. Hypercycles in a random hypergraph. Discrete Math. Appl., 2:563-570, 1992.
[14] A. D. Barbour. Refined approximations for the Ewens sampling formula. Adv. Appl. Probab., 24:765, 1992.
[15] A. D. Barbour. Refined approximations for the Ewens sampling formula. Random Structures and Algorithms, 3:267-276, 1992.
[16] E. A. Bender, E. R. Canfield, and B. D. McKay. The asymptotic number of labeled connected graphs with a given number of vertices and edges. Random Structures and Algorithms, 1:127-170, 1990.
[17] E. A. Bender, E. R. Canfield, and B. D. McKay. Asymptotic properties of labeled connected graphs. Random Structures and Algorithms, 3:183-202, 1992.
[18] M. R. Best. The distribution of some variables on a symmetric group. Nederl. Akad. Wetensch. Indag. Math. Proc., 73:385-402, 1970.
[19] L. Bieberbach. Analytische Fortsetzung. Springer-Verlag, Berlin, 1955.
[20] B. Bollobás. The evolution of random graphs. Trans. Amer. Math. Soc., 286:257-274, 1984.
[21] B. Bollobás. Random Graphs. Academic Press, London, 1985.
[22] Yu. V. Bolotnikov. Convergence to the Gaussian and Poisson processes of the variable Ar(n, n) in the classical occupancy problem. Theory Probab. Appl., 13:39-50, 1968. In Russian.
[23] Yu. V. Bolotnikov. Convergence to the Gaussian process of the number of empty cells in the classical occupancy problem. Math. Notes, 4:97-103, 1968. In Russian.
[24] Yu. V. Bolotnikov. Limit processes in a non-equiprobable scheme of allocating particles into cells. Theory Probab. Appl., 13:534-542, 1968. In Russian.
[25] Yu. V. Bolotnikov. On some classes of random variables on cycles of permutations. Math. USSR Sb., 36:87-99, 1980.
[26] Yu. V. Bolotnikov, V. N. Sachkov, and V. E. Tarakanov. Asymptotic normality of some variables connected with the cyclic structure of random permutations. Math. USSR Sb., 28:107-117, 1976.
[27] J. D. Bovey. An approximate probability distribution for the order of elements of the symmetric group. Bull. London Math. Soc., 12:41-46, 1980.
[28] V. E. Britikov. Limit theorems on the maximum size of trees in a random forest of non-rooted trees. In Probability Problems of Discrete Mathematics, pp. 84-91. MIEM, Moscow, 1987. In Russian.
[29] V. E. Britikov. The asymptotic number of forests from unrooted trees. Math. Notes, 43:387-394, 1988.
[30] V. E. Britikov. The limit behaviour of the number of trees of a given size in a random forest of nonrooted trees. In Stochastic Processes and Applications, pp. 36-41. MIEM, Moscow, 1988. In Russian.
[31] I. A. Cheplyukova. Emergence of the giant tree in a random forest. Discrete Math. Appl., 8(1):17-34, 1998.
[32] S. Chowla, I. N. Herstein, and K. Moore. On recursions connected with symmetric groups. Canad. J. Math., 3:328-334, 1951.
[33] S. Chowla, I. N. Herstein, and W. R. Scott. The solution of x^d = 1 in symmetric groups. Norske Vid. Selsk., 25:29-31, 1952.
[34] J. M. DeLaurentis and B. G. Pittel. Random permutations and Brownian motion. Pacific J. Math., 119:287-301, 1985.
[35] P. J. Donnelly. Labellings, size-biased permutations and the GEM distribution. Adv. Appl. Probab., 24:766, 1992.
[36] P. J. Donnelly, W. J. Ewens, and S. Padmadisastra. Functionals of random mappings: Exact and asymptotic results. Adv. Appl. Probab., 23:437-455, 1991.
[37] P. Erdős and A. Rényi. On the evolution of random graphs. Publ. Math. Inst. Hungarian Acad. Sci., Ser. A, 5(1-2):17-61, 1960.
[38] P. Erdős and A. Rényi. On random matrices. Magyar Tud. Akad. Mat. Kutató Int. Közl., 8:455-461, 1963.
[39] P. Erdős and P. Turán. On some problems of statistical group theory. III. Acta Math. Acad. Hungar., 18(3-4):309-320, 1967.
[40] W. J. Ewens. The sampling theory of selectively neutral alleles. Theoret. Pop. Biol., 3:87-112, 1972.
[41] W. J. Ewens. Sampling properties of random mappings. Adv. Appl. Probab., 24:773, 1992.
[42] M. V. Fedoryuk. Saddle Point Method. Nauka, Moscow, 1977. In Russian.
[43] W. Feller. An Introduction to Probability Theory and Its Applications, vol. 2. Wiley, New York, 1966.
[44] P. Flajolet. The average height of binary trees and other simple trees. Journal of Computer and System Sciences, 25:171-213, 1982.
[45] P. Flajolet. Random tree models in the analysis of algorithms. In P. J. Courtois and G. Latouche, editors, Performance '87, pp. 171-187. North-Holland, Amsterdam, 1988.
[46] P. Flajolet, D. E. Knuth, and B. Pittel. The first cycles in an evolving graph. Discrete Math., 75:167-215, 1989.
[47] P. Flajolet and A. M. Odlyzko. Random mapping statistics. In J.-J. Quisquater and J. Vandewalle, editors, Advances in Cryptology, Lecture Notes in Computer Science, Vol. 434, pp. 329-354. Springer-Verlag, Berlin, 1990.
[48] P. Flajolet and M. Soria. Gaussian limiting distributions for the number of components in combinatorial structures. J. Combinatorial Theory, Series A, 53:165-182, 1990.
[49] B. V. Gnedenko and A. N. Kolmogorov. Limit Distributions for Sums of Independent Random Variables. Addison-Wesley, Reading, MA, 1949.
[50] S. W. Golomb. Shift Register Sequences. Aegean Park Press, Laguna Hills, CA, 1982.
[51] V. L. Goncharov. On the distribution of cycles in permutations. Soviet Math. Dokl., 35(9):299-301, 1942. In Russian.
[52] V. L. Goncharov. On the alternation of events in a sequence of Bernoulli trials. Soviet Math. Dokl., 36(9):295-297, 1943. In Russian.
[53] V. L. Goncharov. On the field of combinatorics. Soviet Math. Izv., Ser. Math., 8:3-48, 1944. In Russian.
[54] A. A. Grusho. Random mappings with bounded multiplicity. Theory Probab. Appl., 17:416-425, 1972.
[55] A. A. Grusho. Distribution of the height of mappings of bounded multiplicity. In Asymptotic and Enumerative Problems of Combinatorial Analysis, pp. 7-18. Krasnoyarsk Univ., Krasnoyarsk, 1976. In Russian.
[56] J. C. Hansen. Order statistics for random combinatorial structures. Adv. Appl. Probab., 24:774, 1992.
[57] B. Harris. Probability distributions related to random mappings. Ann. Math. Statist., 31:1045-1062, 1960.
[58] C. C. Heyde. A contribution to the theory of large deviations for sums of independent random variables. Z. Wahrscheinlichkeitstheorie und verw. Gebiete, 7:303-308, 1967.
[59] W. Hoeffding. Probability inequalities for sums of bounded random variables. J. Amer. Statist. Assoc., 58(301):13-30, 1963.
[60] I. A. Ibragimov and Yu. V. Linnik. Independent and Stationary Related Variables. Nauka, Moscow, 1965. In Russian.
[61] E. Jacobsthal. Sur le nombre d'éléments du groupe symétrique S_n dont l'ordre est un nombre premier. Norske Vid. Selsk., 21:49-51, 1949.
[62] S. Janson. Multicyclic components in a random graph process. Random Structures and Algorithms, 4:71-84, 1993.
[63] S. Janson, D. E. Knuth, T. Luczak, and B. Pittel. The birth of the giant component. Random Structures and Algorithms, 4:233-358, 1993.
[68] A. V. Kolchin. Equations in unknown permutations. Discrete Math. Appl., 4:59-71, 1994.
[69] V. F. Kolchin. A class of limit theorems for conditional distributions. Litovsk. Mat. Sb., 8:53-63, 1968. In Russian. [70] V. F. Kolchin. On the limiting behavior of extreme order statistics in a polynomial scheme. Theory Probab. Appl., 14:458-469, 1969. [71] V. F. Kolchin. A problem of allocating particles into cells and cycles of random permutations. Theory Probab. Appl., 16:74-90, 1971. [72] V. F. Kolchin. A problem of the allocation of particles in cells and random mappings. Theory Probab. Appl., 21:48-63, 1976. [73] V. F. Kolchin. Branching processes, random trees, and a generalized scheme of arrangements of particles. Math. Notes, 21:386-394, 1977.
[74] V. F. Kolchin. Moment of degeneration of a branching process. Math. Notes, 24:954-961, 1978. [75] V. F. Kolchin. Branching processes and random trees. In Cybernetics, Combinatorial Analysis and Graph Theory, pp. 85-97. Nauka, Moscow, 1980. In Russian. [76] V. F. Kolchin. Asymptotic Methods of Probability Theory. MIEM, Moscow, 1984. In Russian.
[77] V. F. Kolchin. On the behavior of a random graph near a critical point. Theory Probab. Appl., 31:439-451, 1986. [78] V. F. Kolchin. Random Mappings. Optimization Software, New York, 1986.
[79] V. F. Kolchin. Systems of Random Equations. MIEM, Moscow, 1988. In Russian. [80] V. F. Kolchin. On the number of permutations with constraints on their cycle lengths. Discrete Math. Appl., 1:179-194, 1991. [81] V. F. Kolchin. Cycles in random graphs and hypergraphs. Adv. Appl. Probab., 24:768, 1992. [82] V. F. Kolchin. The number of permutations with cycle lengths from a fixed set. In Random Graphs'89, pp. 139-149. Wiley, New York, 1992. [83] V. F. Kolchin. Consistency of a system of random congruences. Discrete Math. Appl., 3:103-113, 1993. [84] V. F. Kolchin. A classification problem in the presence of measurement errors. Discrete Math. Appl., 4:19-30, 1994. [85] V. F. Kolchin. Random graphs and systems of linear equations in finite fields. Random Structures and Algorithms, 5:135-146, 1994. [86] V. F. Kolchin. Systems of random linear equations with small number of non-zero coefficients in finite fields. In Probabilistic Methods in Discrete Mathematics, pp. 295-304. VSP, Utrecht, 1997.
[87] V. F. Kolchin and V. I. Khokhlov. An allocation problem and moments of the binomial distribution. In Probab. Problems of Discrete Math., pp. 16-21. MIEM, Moscow, 1987. In Russian.
[88] V. F. Kolchin and V. I. Khokhlov. On the number of cycles in a random non-equiprobable graph. Discrete Math. Appl., 2:109-118, 1992.
[89] V. F. Kolchin and V. I. Khokhlov. A threshold effect for systems of random equations of a special form. Discrete Math. Appl., 5:425-436, 1995.
[90] V. F. Kolchin, B. A. Sevastyanov, and V. P. Chistyakov. Random Allocations. Wiley, New York, 1978.
[91] I. N. Kovalenko. A limit theorem for determinants in the class of Boolean functions. Soviet Math. Dokl., 161:517-519, 1965. In Russian.
[92] I. N. Kovalenko. On the limit distribution of the number of solutions of a random system of linear equations in the class of Boolean functions. Theory Probab. Appl., 12:51-61, 1967. In Russian.
[93] I. N. Kovalenko, A. A. Levitskaya, and M. N. Savchuk. Selected Problems of Probabilistic Combinatorics. Naukova Dumka, Kiev, 1986. In Russian.
[94] J. B. Kruskal. The expected number of components under a random mapping function. Amer. Math. Monthly, 61:392-397, 1954.
[95] E. Landau. Handbuch der Lehre von der Verteilung der Primzahlen, vol. 1. Teubner, Berlin, 1909.
[96] A. A. Levitskaya. Theorems on invariance of the limit behaviour of the number of solutions of a system of random linear equations over a finite ring. Cybernetics, (2):140-141, 1978. In Russian.
[97] A. A. Levitskaya. Theorems on invariance for the systems of random linear equations over an arbitrary finite ring. Soviet Math. Dokl., 263:289-291, 1982. In Russian.
[98] A. A. Levitskaya. The probability of consistency of a system of random linear equations over a finite ring. Theory Probab. Appl., 30:339-350, 1985. In Russian.
[99] T. Luczak. Component behaviour near the critical point of the random graph process. Random Structures and Algorithms, 1:287-310, 1990.
[100] T. Luczak. Cycles in a random graph near the critical point. Random Structures and Algorithms, 2:421-439, 1991.
[101] T. Luczak and B. Pittel. Components of random forests. Comb. Probab. and Comput., 1:35-52, 1992.
[102] M. P. Mineev and A. I. Pavlov. On the number of permutations of a special form. Math. USSR Sb., 99:468-476, 1976. In Russian.
[103] L. Moser and M. Wyman. On the solution of x^d = 1 in symmetric groups. Canad. J. Math., 7:159-168, 1955.
[104] L. R. Mutafchiev. Local limit theorems for sums of power series distributed random variables and for the number of components in labelled relational structures. Random Structures and Algorithms, 3:403-426, 1992.
[105] E. Palmer. Graphical Evolution. Wiley, New York, 1985.
[106] A. I. Pavlov. On the limit distribution of the number of cycles and the logarithm of the order of a class of permutations. Math. USSR Sb., 42:539-567, 1982.
[107] A. I. Pavlov. On the number of cycles and the cycle structure of permutations from some classes. Math. USSR Sb., 46:536-556, 1984.
[108] A. I. Pavlov. On the permutations with cycle lengths from a fixed set. Theory Probab. Appl., 31:618-619, 1986. In Russian.
[109] A. I. Pavlov. Local limit theorems for the number of components of random substitutions and mappings. Theory Probab. Appl., 33:196-200, 1988. In Russian.
[110] A. I. Pavlov. The number and cycle structure of solutions of a system of equations in substitutions. Discrete Math. Appl., 1:195-218, 1991.
[111] Yu. L. Pavlov. The asymptotic distribution of maximum tree size in a random forest. Theory Probab. Appl., 22:509-520, 1977.
[112] Yu. L. Pavlov. Limit theorems for the number of trees of a given size in a random forest. Math. USSR Sb., 32:335-345, 1977.
[113] Yu. L. Pavlov. A case of limit distribution of the maximum size of a tree in a random forest. Math. Notes, 25:387-392, 1979.
[114] Yu. L. Pavlov. Limit distributions of some characteristics of random mappings with a single cycle. In Math. Problems of Modelling Complex Systems, pp. 48-55. Karelian Branch of the USSR Acad. Sci., Petrozavodsk, 1979. In Russian.
[115] Yu. L. Pavlov. Limit theorems for a characteristic of a random mapping. Theory Probab. Appl., 27:829-834, 1981.
[116] Yu. L. Pavlov. Limit distributions of the height of a random forest. Theory Probab. Appl., 28:471-480, 1983.
[117] Yu. L. Pavlov. On the random mappings with constraints on the number of cycles. In Proc. Steklov Inst. Math., pp. 131-142. Nauka, Moscow, 1986.
[118] Yu. L. Pavlov. Some properties of plane planted trees. In Abstr. All-Union Conference on Discrete Math. and its Appl. to Modelling of Complex Systems, p. 14. Irkutsk State Univ., Irkutsk, 1991. In Russian.
[119] Yu. L. Pavlov. Some properties of planar planted trees. Discrete Math. Appl., 3:97-102, 1993.
[120] Yu. L. Pavlov. The limit distributions of the maximum size of a tree in a random forest. Discrete Math. Appl., 5:301-316, 1995.
[121] Yu. L. Pavlov. Limit distributions of the number of trees of a given size in a random forest. Discrete Math. Appl., 6:117-133, 1996.
[122] V. V. Petrov. Sums of Independent Random Variables. Springer-Verlag, New York, 1975.
[123] B. Pittel. On tree census and the giant component in sparse random graphs. Random Structures and Algorithms, 1:311-342, 1990.
[124] G. Pólya and G. Szegő. Aufgaben und Lehrsätze aus der Analysis. Springer-Verlag, Berlin, 1925.
[125] Yu. V. Prokhorov. Asymptotic behaviour of the binomial distribution. Uspekhi Matem. Nauk, 8(3):135-142, 1953. In Russian.
[126] J. Riordan. Combinatorial Identities. Wiley, New York, 1968.
[127] A. Ruciński and N. C. Wormald. Random graph processes with degree restrictions. Combinatorics, Probability and Computing, 1:169-180, 1992.
[128] V. N. Sachkov. Mappings of a finite set with restraints on contours and height. Theory Probab. Appl., 17:640-656, 1972.
[129] V. N. Sachkov. Random mappings with bounded height. Theory Probab. Appl., 18:120-130, 1973.
[130] V. N. Sachkov. Probability Methods in Combinatorial Analysis. Nauka, Moscow, 1978. In Russian.
[131] A. I. Saltykov. The number of components in a random bipartite graph. Discrete Math. Appl., 5:515-523, 1995.
[132] B. A. Sevastyanov. Convergence of the number of empty cells in the classical allocation problems to Gaussian and Poisson processes. Theory Probab. Appl., 12:144-154, 1967. In Russian.
[133] V. E. Stepanov. On the probability of connectedness of a random graph g_n(t). Theory Probab. Appl., 15:55-67, 1970.
[134] V. E. Stepanov. Phase transition in random graphs. Theory Probab. Appl., 15:187-203, 1970.
[135] V. E. Stepanov. Structure of random graphs g_n(x|h). Theory Probab. Appl., 17:227-242, 1972.
[136] L. Takács. On the height and widths of random rooted trees. Adv. Appl. Probab., 24:771, 1992.
[137] S. G. Tkachuk. Local limit theorems on large deviations in the case of stable limit laws. Izvestiya of Uzbek Academy of Sciences, (2):30-33, 1973. In Russian.
[138] V. A. Vatutin. Branching processes with final types of particles and random trees. Adv. Appl. Probab., 24:771, 1992.
[139] A. M. Vershik and A. A. Shmidt. Symmetric groups of high degree. Soviet Math. Dokl., 13:1190-1194, 1972.
[140] A. M. Vershik and A. A. Shmidt. Limit measures arising in the asymptotic theory of symmetric groups. I. Theory Probab. Appl., 22:78-85, 1977.
[141] A. M. Vershik and A. A. Shmidt. Limit measures arising in the asymptotic theory of symmetric groups. II. Theory Probab. Appl., 23:36-49, 1978.
[142] V. A. Voblyi. Asymptotic enumeration of labelled connected sparse graphs with a given number of planted vertices. Discrete Analysis, 42:3-16, 1985. In Russian.
[143] V. A. Voblyi. Wright and Stepanov-Wright coefficients. Math. Notes, 42:969-974, 1987.
[144] L. M. Volynets. The number of solutions of an equation in the symmetric group. In Probab. Processes and Appl., pp. 104-109. MIEM, Moscow, 1985. In Russian.
[145] L. M. Volynets. On the number of solutions of the equation x^s = e in the symmetric group. Math. Notes, 40:155-160, 1986. In Russian.
[146] L. M. Volynets. An estimate of the rate of convergence to the limit distribution for the number of cycles in a random substitution. In Probab. Problems of Discrete Math., pp. 40-46. MIEM, Moscow, 1987. In Russian.
[147] L. M. Volynets. The generalized scheme of allocation and the distribution of the number of cycles in a random substitution. In Abstracts of the Second All-Union Conf. Probab. Methods of Discrete Math., pp. 27-28. Petrozavodsk, 1988. In Russian.
[148] L. M. Volynets. The generalized scheme of allocation and the number of cycles in a random substitution. In Probab. Problems of Discrete Math., pp. 131-136. MIEM, Moscow, 1988. In Russian.
[149] L. M. Volynets. An example of a nonstandard asymptotics of the number of substitutions with restrictions on the cycle lengths. In Probab. Processes and Appl., pp. 85-90. MIEM, Moscow, 1989. In Russian.
[150] H. Wilf. The asymptotics of e^{P(z)} and the number of elements of each order in S_n. Bull. Amer. Math. Soc., 15:228-232, 1986.
[151] E. M. Wright. The number of connected sparsely edged graphs. III. J. Graph Theory, 4:393-407, 1980.
[152] E. M. Wright. The number of connected sparsely edged graphs. IV. J. Graph Theory, 7:219-229, 1983.
[153] A. L. Yakymiv. On the distribution of the number of cycles in random A-substitutions. In Abstracts of the Second All-Union Conference Probab. Methods in Discrete Math., p. 111. Karelian Branch of the USSR Acad. Sci., Petrozavodsk, 1988. In Russian.
[154] A. L. Yakymiv. Substitutions with cycle lengths from a fixed set. Discrete Math. Appl., 1:105-116, 1991.
[155] A. L. Yakymiv. Some classes of substitutions with cycle lengths from a given set. Discrete Math. Appl., 3:213-220, 1993.
[156] N. Zierler. Linear recurring sequences. J. Soc. Ind. Appl. Math., 7:31-48, 1959.
INDEX
algorithm A2, 173
characteristic function, 8
classical scheme of allocation, 16
complete description of distribution of the number of cycles, 192
connected component, 23
connectivity, 22
critical graph, 91
critical set, 125
decomposable property, 23
distribution function, 2
equations involving permutations, 219
equations of compound degree, 235
equations of prime degree, 225
factorial moment, 3
feedback point, 124
forest, 21
forest of nonrooted trees, 30
generalized scheme of allocation, 14
generating function, 6
graphs with components of two types, 70
homogeneous system of equations, 125
hypercycle, 126
independent critical sets, 125
inversion formula, 10
length of the maximum cycle, 212
limit distribution of the number of hypercycles, 164
linearly independent solutions, 125
local limit theorem, 10
mathematical expectation, 3
maximal span, 9
maximum number of independent critical sets, 125, 135
maximum size of components, 66
maximum size of components of a random graph, 84
maximum size of trees in a random forest, 48
maximum size of trees in a random graph, 83
maximum size of trees in a random graph from A_{n,T}, 71
mean, 3
mean number of solutions in the equiprobable case, 132
method of coordinate testing, 168
multinomial distribution, 15
multiplicity of a vertex in a set of hyperedges, 126
nonequiprobable graph, 109
normal distribution, 9
number of components, 107, 144
number of components in U_n, 65
number of cycles in a random permutation, 182, 183
number of cycles of length r in a random permutation, 182
number of forests, 31
number of linearly independent solutions, 130
number of nontrivial solutions, 131
number of trees of fixed sizes, 71
number of trees with r vertices, 42
number of unicyclic components, 77, 81
number of unicyclic graphs, 58
number of vertices in the maximal unicyclic component, 81
number of vertices in unicyclic components, 71, 77
order of random permutation, 239
order statistics, 17
partition, 19
permutations with restrictions on cycle lengths, 192
Poisson distribution, 6
probability, 1
probability distribution, 2
probability of consistency, 144
probability of reconstruction of the true solution, 166
probability space, 1
problem of moments, 4
process of sequential growth of the number of rows, 127
random element, 1
random forest, 21, 30
random graph corresponding to random permutation, 181
random graph of a random permutation, 182
random graphs with independent edges, 100
random matrices with independent elements, 126
random pairwise comparisons, 164
random partitions, 30
random permutation, 28
random variable, 1
rank of matrix, 124
rank of random sparse matrices, 135
reconstructing the true solution, 165
saddle-point method, 7, 221
set of rooted trees, 21
shift register, 123
simple classification problem, 122
single-valued mapping, 18
statistical problem of testing the hypotheses H_0 and H_1, 180
subcritical graph, 91
summation of independent random variables in GF(2), 131
supercritical graph, 91
system of linear equations in GF(2), 122
system of random equations with distorted right-hand sides, 180
system with at most two unknowns in each equation, 156
threshold property, 156
total number of components, 24
total number of critical sets, 157
total number of cycles, 102, 212
total number of hypercycles, 158
unicyclic graph, 58
voting algorithm, 165
weak convergence, 2