Reliability Criteria in Information Theory and in Statistical Hypothesis Testing

Evgueni A. Haroutunian
Institute for Informatics and Automation Problems, National Academy of Sciences of Armenia, Yerevan, Republic of Armenia
[email protected]

Mariam E. Haroutunian
Institute for Informatics and Automation Problems, National Academy of Sciences of Armenia, Yerevan, Republic of Armenia
[email protected]

Ashot N. Harutyunyan
AvH Fellow, Institut für Experimentelle Mathematik, Universität Duisburg-Essen, Essen, Germany
[email protected]

Foundations and Trends in Communications and Information Theory, Vol. 4, Nos. 2–3 (2007) 97–263. DOI: 10.1561/0100000008.
To the memory of Roland Dobrushin, the outstanding scientist and wonderful teacher.
Abstract

This survey is devoted to one of the central problems of Information Theory — the determination of the interdependence between coding rate and error probability exponent for various information transmission systems. The overview treats memoryless systems with finite alphabets. It presents material complementary to the contents of the most remarkable books in Information Theory by Feinstein, Fano, Wolfowitz, Gallager, Csiszár and Körner, Kolesnik and Poltirev, Blahut, Cover and Thomas, and of the papers by Dobrushin, Gelfand, and Prelov.

We briefly formulate fundamental notions and results of Shannon theory on reliable transmission via coding and survey results obtained during the last two to three decades by the authors, their colleagues, and other researchers. The paper is written with the goal of making the theory of rate-reliability accessible to a broader circle of readers. We regard this concept as useful for promoting the solution of the noted problem, in parallel with elaboration of the notion of reliability–reliability dependence in statistical hypothesis testing and identification.
Preface
This monograph is devoted to one of the central problems of Information Theory — the determination of the interdependence between coding rate and error probability exponent for various information transmission systems. The overview treats memoryless systems with finite alphabets. It presents material complementary to the contents of the most remarkable books in Information Theory. We briefly formulate fundamental notions and results of Shannon theory on reliable transmission via coding and survey results obtained during the last two to three decades by the authors, their colleagues, and other researchers. The review was written with the goal of making the concept of rate-reliability accessible to a broader circle of readers. We regard this concept as useful for promoting the solution of the noted problem, in parallel with elaboration of the notion of reliability–reliability dependence in statistical hypothesis testing and identification.

The authors are grateful to R. Ahlswede, V. Balakirsky, P. Harremoës, and N. Cai for their useful inputs to earlier versions of the manuscript. The comments and suggestions of S. Shamai (Shitz), G. Kramer, and the anonymous reviewers are highly appreciated. The participation of our colleagues P. Hakobyan and S. Tonoyan in the manuscript revision was helpful. A. Harutyunyan acknowledges the support of the Alexander von Humboldt Foundation for his research at the Institute for Experimental Mathematics, Essen University.
Contents

1 Introduction
1.1 Information Theory and Problems of Shannon Theory
1.2 Concepts of Reliability Function and of Rate-Reliability Function
1.3 Notations for Measures of Information and Some Identities
1.4 Basics of the Method of Types

2 E-capacity of the Discrete Memoryless Channel
2.1 Channel Coding and Error Probability: Shannon's Theorem
2.2 E-capacity (Rate-Reliability Function) of DMC
2.3 Sphere Packing Bound for E-capacity
2.4 Random Coding Bound for E-capacity
2.5 Expurgated Bound for E-capacity
2.6 Random Coding and Expurgated Bounds Derivation
2.7 Comparison of Bounds for E-capacity

3 Multiuser Channels
3.1 Two-Way Channels
3.2 Interference Channels
3.3 Broadcast Channels
3.4 Multiple-Access Channels

4 E-capacity of Varying Channels
4.1 Bounds on E-capacity for the Compound Channels
4.2 Channels with Random Parameter
4.3 Information Hiding Systems
4.4 Multiple-Access Channels with Random Parameter
4.5 Arbitrarily Varying Channels

5 Source Coding Rates Subject to Fidelity and Reliability Criteria
5.1 Introductory Notes
5.2 The Rate-Reliability-Distortion Function
5.3 Proofs, Covering Lemma
5.4 Binary Hamming Rate-Reliability-Distortion Function
5.5 Reliability Criterion in AVS Coding

6 Reliability Criterion in Multiterminal Source Coding
6.1 Robust Descriptions System
6.2 Cascade System Coding Rates
6.3 Reliability Criterion in Successive Refinement

7 Logarithmically Asymptotically Optimal Testing of Statistical Hypotheses
7.1 Prelude
7.2 Reliability Function for Two Alternative Hypotheses
7.3 Multiple Hypotheses: Interdependence of Reliabilities
7.4 Optimal Testing and Identification for Statistical Hypothesis

Basic Notations and Abbreviations

References
1 Introduction
1.1 Information Theory and Problems of Shannon Theory

The fundamental problem of communication is that of reproducing at one point either exactly or approximately a message selected at another point.
(Claude Shannon, 1948)
Information Theory as a scientific discipline originated from the landmark work "A mathematical theory of communication" [191] of the American engineer and mathematician Claude E. Shannon in 1948, and has since existed as a formalized science for more than half a century. In the Guest Editorial [215] to the "Commemorative issue 1948–1998" of the IEEE Transactions on Information Theory, Sergio Verdú certified: "With communication engineering in the epicenter of the bombshell, the sensational aftermath of Shannon's paper soon reached Mathematics, Physics, Statistics, Computing, and Cryptology. Even Economics, Biology, Linguistics, and other fields in the natural and social sciences felt the ripples of Shannon's new theory." In his wise retrospective [82] on the founder's life and scientific heritage, Robert Gallager wrote: "Claude E. Shannon invented information theory and
provided the concepts, insights, and mathematical formulations that now form the basis for modern communication technology. In a surprisingly large number of ways, he enabled the information age." The exceptional role of Claude Shannon in the development of modern science was noted earlier by Andrey Kolmogorov in the Preface to the Russian edition of Shannon's "Works on Information Theory and Cybernetics" [197] and by Roland Dobrushin in the Editor's Preface to the Russian translation of the book by Csiszár and Körner [51].

In [191] and another epochal work [194], Shannon mathematically addressed the basic problems in communications and gave their solutions, stating the three fundamental discoveries underlying information theory: the problem of transmission via a noisy channel and its inherent concept — capacity; data compression, with the central role of entropy in it; and source coding under a fidelity criterion, with specification of the possible performance limit in terms of the mutual information introduced by him. The term "Shannon Theory" is now generally accepted to mean the subfield of information theory which deals with the establishment of performance bounds for various parameters of transmission systems. The relevant sections of this review treat the noted fundamental results and go further in generalizations and solutions of those problems toward some classical and more complicated communication situations, focusing on the results and methodology developed mainly in the works of the authors related to the role of the error probability exponent as a characteristic in the mathematical model of an information transmission system. Taking into account the interconnection of statistical, probabilistic, and information-theoretical problems, we also add results on the error exponent (reliability function) in statistical hypothesis testing models.
1.2 Concepts of Reliability Function and of Rate-Reliability Function

Important properties of each communication channel are characterized by the reliability function $E(R)$, introduced by Shannon [195] as the optimal exponent of the exponential decrease $\exp\{-NE(R)\}$ of the decoding error probability as the code length $N$ increases, for a given transmission rate $R$ less than the capacity $C$ of the channel [191]. Various communication systems can be characterized in an analogous sense. The reliability function $E(R)$ is also called the error probability exponent. Moreover, by analogy with the concept of the rate-distortion function [26, 194], the function $E(R)$ may be called the reliability-rate function.

A large number of works are devoted to the study of this function for various communication systems. Along with achievements in this part of Shannon theory, many problems have remained unsolved. Because of the principal difficulty of finding the reliability function for the whole range of rates $0 < R < C$, this problem has been completely solved only in rather particular cases. A typical situation is that the obtained upper and lower bounds for the function $E(R)$ coincide only for rates $R$ in some interval, say $R_{cr} < R < C$, where $R_{cr}$ is the rate at which the derivative of $E(R)$ with respect to $R$ equals $-1$.

It is desirable to create a more harmonious general theory and more effective methods of constructing usable bounds for new classes of more complicated information transmission systems. The approach developed by the authors seems fruitful for this purpose. It consists in studying the function $R(E) = C(E)$, inverse to $E(R)$ [98, 100, 102]. This is not a simple mechanical permutation of the roles of the independent and dependent variables, since the investigation of the optimal rates of codes ensuring, as $N$ increases, exponential decrease of the error probability with a given exponent (reliability) $E$ can be more expedient than the study of the function $E(R)$. At the same time, there is an analogy with the problem from coding theory of bounding the optimal volume of codes as a function of their error-correction capability. This allows one to hope for profitable application of the results and methods of one theory in the other. The definition of the function $C(E)$ is in natural conformity with Shannon's notions of the channel capacity $C$ and of the zero-error capacity $C_0$ [152]. When $E$ increases from zero to infinity, the function $C(E)$ decreases from $C$ to $C_0$
(this holds if $C_0 > 0$; otherwise $C(E) = 0$ when $E$ is large enough). So, by analogy with the definition of the capacity, this characteristic of the channel may be called E-capacity. On the other hand, the name rate-reliability function is also logical.

One of the advantages of our approach is the convenience in studying the optimal rates of source codes ensuring a given exponential decrease of the probability of exceeding a given distortion level in message restoration. This is the rate-reliability-distortion function $R(E, \Delta, P)$, inverse to the exponent function $E(R, \Delta, P)$ of Marton [171]. Thus the name shows which dependence of characteristics is under study. Later on, it is possible to consider also other arguments, for example, coding rates at the other inputs of the channel or source, if their number is greater than one. This makes the theory better proportioned and more comprehensible.

Concerning methods for the construction of bounds, it turns out that Shannon's random coding method [191] of proving the existence of codes with definite properties can be applied with the same success to the study of the rate-reliability function. For the deduction of converse coding theorem type upper bounds (so-called sphere packing bounds), E. Haroutunian proposed a simple combinatorial method [98, 102], which can be applied to various systems. This method is based on the proof of the strong converse coding theorem, as in the method put forth in [99] and used by other authors [35, 51, 152] for the deduction of the sphere packing bound for the reliability function. Moreover, derivation of the upper bound of $C(E)$ by passage to the limit $E \to \infty$ yields an upper bound for the zero-error capacity $C_0$.

We note the following practically useful circumstance: the comparison of the analytical form of the sphere packing bound for $C(E)$ with the expression for the capacity $C$ in some cases gives us the possibility to write down formally the bound for each system for which the achievable rates region (capacity) is known. In rate-reliability-distortion theory, an advantage of the approach is the technical ease of treating the coding rate as a function of distortion and error exponent, which allows one to readily convert results from the rate-reliability-distortion area to rate-distortion ones by looking at the extremal values of the reliability, e.g., $E \to 0$, $E \to \infty$. This fact is especially important when one deals with the multidimensional situation. Having solved the problem
of finding the rate-reliability-distortion region of a multiterminal system, the corresponding rate-distortion region can be deduced without effort.

In the literature we know of an early attempt to consider the concept of E-capacity $R(E)$. In [51] (Section 2.5) Csiszár and Körner mention the concept of "generalized capacity" for the DMC as "the capacity" corresponding to a tolerated probability of error $\exp\{-NE\}$ (i.e., the largest $R$ with $E(R) \ge E$). But they limited themselves to considering (Problem 15, Section 2.5) only the case $E \le E_{cr}(W)$, where $E_{cr}(W) = E(R_{cr}(W))$. The rate-reliability function was also considered in some earlier works (see, e.g., Fu and Shen [77], Tuncel and Rose [206], and Chen [42]). E. A. Haroutunian and M. E. Haroutunian [116] have been teaching the concept of E-capacity at Yerevan State University for many years.
1.3 Notations for Measures of Information and Some Identities
Here we introduce our notations for the necessary characteristics of Shannon's entropy and mutual information and of Kullback–Leibler divergence. In this review finite sets are considered; they are denoted by $\mathcal{U}, \mathcal{X}, \mathcal{Y}, \mathcal{S}, \dots$. The size of the set $\mathcal{X}$ is denoted by $|\mathcal{X}|$. Random variables (RVs) with values in $\mathcal{U}, \mathcal{X}, \mathcal{Y}, \mathcal{S}, \dots$ are denoted by $U, X, Y, S, \dots$. Probability distributions (PDs) are denoted by $Q, P, V, W, PV, P \circ V, \dots$. Let the PD of RV $X$ be $P = \{P(x),\ x \in \mathcal{X}\}$, let $V$ be the conditional PD of RV $Y$ for a given value $x$:
$$V = \{V(y|x),\ x \in \mathcal{X},\ y \in \mathcal{Y}\},$$
let the joint PD of RVs $X$ and $Y$ be
$$P \circ V = \{P \circ V(x,y) = P(x)V(y|x),\ x \in \mathcal{X},\ y \in \mathcal{Y}\},$$
and let the PD of RV $Y$ be
$$PV = \Big\{PV(y) = \sum_{x \in \mathcal{X}} P(x)V(y|x),\ y \in \mathcal{Y}\Big\}.$$
The set of messages to be transmitted is denoted by $\mathcal{M}$ and its cardinality by $M$. We use the following notations (here and in the sequel all $\log$-s and $\exp$-s are of base 2):

for the entropy of RV $X$ with PD $P$:
$$H_P(X) = -\sum_{x \in \mathcal{X}} P(x)\log P(x),$$

for the entropy of RV $Y$ with PD $PV$:
$$H_{P,V}(Y) = -\sum_{y \in \mathcal{Y}} PV(y)\log PV(y),$$

for the joint entropy of RVs $X$ and $Y$:
$$H_{P,V}(X,Y) = -\sum_{x \in \mathcal{X},\, y \in \mathcal{Y}} P \circ V(x,y)\log P \circ V(x,y),$$

for the conditional entropy of RV $Y$ relative to RV $X$:
$$H_{P,V}(Y|X) = -\sum_{x \in \mathcal{X},\, y \in \mathcal{Y}} P(x)V(y|x)\log V(y|x),$$

for the mutual information of RVs $X$ and $Y$:
$$I_{P,V}(X \wedge Y) = I_{P,V}(Y \wedge X) = \sum_{x \in \mathcal{X},\, y \in \mathcal{Y}} P(x)V(y|x)\log\frac{V(y|x)}{PV(y)},$$

for the conditional mutual information of RVs $X$ and $Y$ relative to RV $U$ with PDs $Q = \{Q(u),\ u \in \mathcal{U}\}$, $P = \{P(x|u),\ u \in \mathcal{U}, x \in \mathcal{X}\}$, $V = \{V(y|x,u),\ u \in \mathcal{U}, x \in \mathcal{X}, y \in \mathcal{Y}\}$:
$$I_{Q,P,V}(X \wedge Y|U) = \sum_{u \in \mathcal{U},\, x \in \mathcal{X},\, y \in \mathcal{Y}} Q(u)P(x|u)V(y|x,u)\log\frac{V(y|x,u)}{PV(y|u)},$$

for the informational divergence of PDs $P$ and $Q$ on $\mathcal{X}$:
$$D(P\|Q) = \sum_{x \in \mathcal{X}} P(x)\log\frac{P(x)}{Q(x)},$$

and for the informational conditional divergence of PD $P \circ V$ and PD $P \circ W$ on $\mathcal{X} \times \mathcal{Y}$, where $W = \{W(y|x),\ x \in \mathcal{X}, y \in \mathcal{Y}\}$:
$$D(P \circ V\|P \circ W) = D(V\|W|P) = \sum_{x \in \mathcal{X},\, y \in \mathcal{Y}} P(x)V(y|x)\log\frac{V(y|x)}{W(y|x)}.$$
The following identities are often useful:
$$D(P \circ V\|Q \circ W) = D(P\|Q) + D(V\|W|P),$$
$$H_{P,V}(X,Y) = H_P(X) + H_{P,V}(Y|X) = H_{P,V}(Y) + H_{P,V}(X|Y),$$
$$I_{P,V}(Y \wedge X) = H_{P,V}(Y) - H_{P,V}(Y|X) = H_P(X) + H_{P,V}(Y) - H_{P,V}(X,Y),$$
$$I_{Q,P,V}(Y \wedge X|U) = H_{Q,P,V}(Y|U) - H_{Q,P,V}(Y|X,U),$$
$$I_{Q,P,V}(X \wedge Y,U) = I_{Q,P,V}(X \wedge Y) + I_{Q,P,V}(X \wedge U|Y) = I_{Q,P,V}(X \wedge U) + I_{Q,P,V}(X \wedge Y|U).$$
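To make the preceding definitions concrete, here is a minimal Python sketch (the function names are ours, not from the text) computing the entropy, conditional entropy, mutual information, and conditional divergence of finite-alphabet distributions, using base-2 logarithms as in the text.

```python
from math import log2

def entropy(P):
    """H_P(X) for a PD given as a list of probabilities."""
    return -sum(p * log2(p) for p in P if p > 0)

def cond_entropy(P, V):
    """H_{P,V}(Y|X): P is a PD on X, V[x][y] a stochastic matrix."""
    return -sum(P[x] * V[x][y] * log2(V[x][y])
                for x in range(len(P)) for y in range(len(V[x]))
                if P[x] > 0 and V[x][y] > 0)

def output_pd(P, V):
    """PV(y) = sum_x P(x) V(y|x)."""
    return [sum(P[x] * V[x][y] for x in range(len(P)))
            for y in range(len(V[0]))]

def mutual_information(P, V):
    """I_{P,V}(X ^ Y) = H_{P,V}(Y) - H_{P,V}(Y|X)."""
    return entropy(output_pd(P, V)) - cond_entropy(P, V)

def cond_divergence(V, W, P):
    """D(V || W | P) = sum_{x,y} P(x) V(y|x) log(V(y|x)/W(y|x))."""
    return sum(P[x] * V[x][y] * log2(V[x][y] / W[x][y])
               for x in range(len(P)) for y in range(len(V[x]))
               if P[x] > 0 and V[x][y] > 0)

# Example: a binary symmetric channel with crossover 0.1, uniform input.
P = [0.5, 0.5]
W = [[0.9, 0.1], [0.1, 0.9]]
print(mutual_information(P, W))   # about 0.531 bits
```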
1.4 Basics of the Method of Types
Our proofs will be based on the method of types [49, 51], one of the important technical tools in Shannon Theory. It develops one of Shannon's key notions, that of the "typical sequence," which has served, been developed, and been applied in many works, particularly in the books of Wolfowitz [222], Csiszár and Körner [51], Cover and Thomas [48], and Yeung [224]. The idea of the method of types is to partition the set of all $N$-length sequences into classes according to their empirical distributions (types).

The type $P$ of a sequence (or vector) $\mathbf{x} = (x_1, \dots, x_N) \in \mathcal{X}^N$ is the PD $P = \{P(x) = N(x|\mathbf{x})/N,\ x \in \mathcal{X}\}$, where $N(x|\mathbf{x})$ is the number of repetitions of the symbol $x$ among $x_1, \dots, x_N$. The joint type of $\mathbf{x}$ and $\mathbf{y} \in \mathcal{Y}^N$ is the PD $\{N(x,y|\mathbf{x},\mathbf{y})/N,\ x \in \mathcal{X}, y \in \mathcal{Y}\}$, where $N(x,y|\mathbf{x},\mathbf{y})$ is the number of occurrences of the pair of symbols $(x,y)$ in the pair of vectors $(\mathbf{x},\mathbf{y})$. In other words, the joint type is the type of the sequence $(x_1,y_1), (x_2,y_2), \dots, (x_N,y_N)$ from $(\mathcal{X} \times \mathcal{Y})^N$. We say that the conditional type of $\mathbf{y}$ for given $\mathbf{x}$ is the PD $V = \{V(y|x),\ x \in \mathcal{X}, y \in \mathcal{Y}\}$ if $N(x,y|\mathbf{x},\mathbf{y}) = N(x|\mathbf{x})V(y|x)$ for all $x \in \mathcal{X}, y \in \mathcal{Y}$. The set of all PDs on $\mathcal{X}$ is denoted by $\mathcal{P}(\mathcal{X})$, and the subset of $\mathcal{P}(\mathcal{X})$ consisting of the possible types of sequences $\mathbf{x} \in \mathcal{X}^N$ is denoted by $\mathcal{P}_N(\mathcal{X})$. The set of vectors $\mathbf{x}$ of type $P$ is denoted by $T_P^N(X)$ (with $T_P^N(X) = \emptyset$ for a PD $P \notin \mathcal{P}_N(\mathcal{X})$). The set of all sequences $\mathbf{y} \in \mathcal{Y}^N$ of conditional type $V$ for given $\mathbf{x} \in T_P^N(X)$ is denoted by $T_{P,V}^N(Y|\mathbf{x})$ and
is called the $V$-shell of $\mathbf{x}$. The set of all possible $V$-shells for $\mathbf{x}$ of type $P$ is denoted by $\mathcal{V}_N(\mathcal{Y}, P)$.

In the following lemmas very useful properties of types are formulated; for proofs see [49, 51, 63].

Lemma 1.1. (Type counting)
$$|\mathcal{P}_N(\mathcal{X})| < (N+1)^{|\mathcal{X}|}, \tag{1.1}$$
$$|\mathcal{V}_N(\mathcal{Y}, P)| < (N+1)^{|\mathcal{X}||\mathcal{Y}|}. \tag{1.2}$$

Lemma 1.2. For any type $P \in \mathcal{P}_N(\mathcal{X})$,
$$(N+1)^{-|\mathcal{X}|}\exp\{N H_P(X)\} < |T_P^N(X)| \le \exp\{N H_P(X)\}, \tag{1.3}$$
and for any conditional type $V$ and $\mathbf{x} \in T_P^N(X)$,
$$(N+1)^{-|\mathcal{X}||\mathcal{Y}|}\exp\{N H_{P,V}(Y|X)\} < |T_{P,V}^N(Y|\mathbf{x})| \le \exp\{N H_{P,V}(Y|X)\}. \tag{1.4}$$

Lemma 1.3. If $\mathbf{x} \in T_P^N(X)$ and $\mathbf{y} \in T_{P,V}^N(Y|\mathbf{x})$, then
$$Q^N(\mathbf{x}) = \exp\{-N(H_P(X) + D(P\|Q))\}, \tag{1.5}$$
$$W^N(\mathbf{y}|\mathbf{x}) = \exp\{-N(H_{P,V}(Y|X) + D(V\|W|P))\}. \tag{1.6}$$
Some authors frequently apply known facts from the theory of large deviations [48] in proofs of information-theoretical results. In the tutorial [54], Csiszár and Shields deduce results on large deviations using the method of types. This method helps in a better perception of the subject, because the process of inference in all cases is based on the examination of the types of vectors. That is why we prefer to use the method of types.
2 E-capacity of the Discrete Memoryless Channel
2.1 Channel Coding and Error Probability: Shannon's Theorem

Let $\mathcal{X}$, $\mathcal{Y}$ be finite sets and let $W = \{W(y|x),\ x \in \mathcal{X}, y \in \mathcal{Y}\}$ be a stochastic matrix.
Definition 2.1. A discrete memoryless channel (DMC) $W$ with input alphabet $\mathcal{X}$ and output alphabet $\mathcal{Y}$ is defined by a stochastic matrix of transition probabilities $W: \mathcal{X} \to \mathcal{Y}$. An element $W(y|x)$ of the matrix is the conditional probability of receiving the symbol $y \in \mathcal{Y}$ at the channel's output when the symbol $x \in \mathcal{X}$ is transmitted at the input. The model for $N$ uses of the channel $W$ is described by the stochastic matrix
$$W^N: \mathcal{X}^N \to \mathcal{Y}^N,$$
an element of which, $W^N(\mathbf{y}|\mathbf{x})$, is the conditional probability of receiving the vector $\mathbf{y} \in \mathcal{Y}^N$ when the vector $\mathbf{x} \in \mathcal{X}^N$ is transmitted. We consider only memoryless channels, which operate at each moment of time independently of the previous and subsequent transmitted or received symbols, so that for all $\mathbf{x} \in \mathcal{X}^N$ and $\mathbf{y} \in \mathcal{Y}^N$
$$W^N(\mathbf{y}|\mathbf{x}) = \prod_{n=1}^{N} W(y_n|x_n). \tag{2.1}$$

Fig. 2.1 Communication system with noisy channel.
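As an illustration of (2.1) (our own sketch, not part of the original text), the following Python code simulates $N$ independent uses of a DMC and evaluates the product formula on a short block.

```python
import random
from math import prod

# Transition matrix W[x][y] of a small DMC (rows sum to 1).
W = [[0.9, 0.1],
     [0.2, 0.8]]

def transmit(x_block):
    """Pass a block through the DMC: each symbol independently."""
    return [random.choices([0, 1], weights=W[x])[0] for x in x_block]

def block_probability(y_block, x_block):
    """W^N(y|x) as the product of per-symbol probabilities, eq. (2.1)."""
    return prod(W[x][y] for x, y in zip(x_block, y_block))

x = [0, 1, 1, 0, 1]
y = transmit(x)
print(y, block_probability(y, x))
```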
The Shannon’s model of two-terminal channel is presented in Figure 2.1. Let M denotes the set of messages and M be the number of messages. Definition 2.2. An N -block code (f, g) for the channel W is a pair of mappings, where f : M → X N is encoding and g : Y N → M is decoding. N is called the code length, and M is called the code volume.
Definition 2.3. The probability of erroneous transmission of the message $m \in \mathcal{M}$ over the channel using the code $(f,g)$ is defined as
$$e(m) = W^N(\mathcal{Y}^N - g^{-1}(m)\,|\,f(m)) = 1 - W^N(g^{-1}(m)\,|\,f(m)). \tag{2.2}$$

We consider two versions of the error probability of the code $(f,g)$: the maximal probability of error
$$e(f,g,N,W) = \max_{m \in \mathcal{M}} e(m), \tag{2.3}$$
with
$$e(M,N,W) = \min_{(f,g)} e(f,g,N,W),$$
where the minimum is taken over all codes $(f,g)$ of volume $M$; and the average probability of error for equiprobable messages
$$\overline{e}(f,g,N,W) = \frac{1}{M}\sum_{m \in \mathcal{M}} e(m), \tag{2.4}$$
with
$$\overline{e}(M,N,W) = \min_{(f,g)} \overline{e}(f,g,N,W)$$
as the minimal average probability over all possible codes of length $N$ and volume $M$. It is clear that always $\overline{e}(f,g,N,W) \le e(f,g,N,W)$.

Definition 2.4. The transmission rate of a code $(f,g)$ of length $N$ and volume $M$ is
$$R(f,g,N) = \frac{1}{N}\log M. \tag{2.5}$$
The channel coding problem is the following: one must make the message set $\mathcal{M}$ of the code as large as possible while keeping the maximal (or average) probability of error as low as possible. The problem is considered in the asymptotic sense as $N \to \infty$.

One of Shannon's main discoveries [191] is that he formulated, and for some channels justified, the statement consisting of the following direct and converse parts, now known as Shannon's Theorem or the Channel Coding Theorem. It is possible to characterize each channel $W$ by a number $C(W)$, called the capacity, such that as $N \to \infty$: for $0 < R < C(W)$ there exist codes with $M = \exp\{NR\}$ codewords and average probability of error $\overline{e}(N,M,W)$ going to 0; for $R > C(W)$, for any code with $M = \exp\{NR\}$ codewords the average probability of error $\overline{e}(N,M,W)$ goes to 1. Shannon introduced the notion of mutual information and discovered that in the case of a DMC $W$
$$C(W) = \max_P I_{P,W}(X \wedge Y), \tag{2.6}$$
where $P = \{P(x),\ x \in \mathcal{X}\}$ is the PD of the input symbols $x$.
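The maximization in (2.6) is a concave program over input PDs and can be evaluated numerically. The classical Blahut–Arimoto algorithm is the standard tool for this; the sketch below is our own illustration of that standard method, not a construction from the text.

```python
from math import log2

def blahut_arimoto(W, iters=200):
    """Approximate C(W) = max_P I_{P,W}(X ^ Y) for a DMC given by the
    stochastic matrix W[x][y]; returns the capacity in bits."""
    nx, ny = len(W), len(W[0])
    P = [1.0 / nx] * nx                 # start from the uniform input PD
    for _ in range(iters):
        # Output distribution PV(y) = sum_x P(x) W(y|x).
        PV = [sum(P[x] * W[x][y] for x in range(nx)) for y in range(ny)]
        # Blahut-Arimoto update: P(x) is reweighted proportionally to
        # exp of the per-letter divergence D(W(.|x) || PV).
        weights = [2 ** sum(W[x][y] * log2(W[x][y] / PV[y])
                            for y in range(ny) if W[x][y] > 0)
                   for x in range(nx)]
        total = sum(P[x] * weights[x] for x in range(nx))
        P = [P[x] * weights[x] / total for x in range(nx)]
    PV = [sum(P[x] * W[x][y] for x in range(nx)) for y in range(ny)]
    return sum(P[x] * W[x][y] * log2(W[x][y] / PV[y])
               for x in range(nx) for y in range(ny) if W[x][y] > 0)

# BSC with crossover 0.1: capacity = 1 - h(0.1), about 0.531 bits.
print(blahut_arimoto([[0.9, 0.1], [0.1, 0.9]]))
```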
The same result is valid for the case of the maximal error probability. One of the first proofs of Shannon's theorem was given in the works of Feinstein [74, 75], where it was also established that for $R < C(W)$ the probability of error tends to zero exponentially with growing $N$. For a given channel $W$ and a given rate $R$, the optimal exponent $E(R)$ of the exponential decrease of the error probability was first considered by Shannon [195], who called it the reliability function.
Definition 2.5. The reliability function $E(R,W)$ of the channel $W$ is defined as
$$E(R,W) = \lim_{N \to \infty} -\frac{1}{N}\log e(M,N,W), \quad M = 2^{NR},\ 0 < R < C(W). \tag{2.7}$$
Shannon also introduced the notion of the zero-error capacity $C_0(W)$ of the channel [192], which is the least upper bound of the transmission rates of codes for which, beginning from some $N$, there exists a code $(f,g)$ such that $e(f,g,N,W) = 0$. Thus $E(R,W) = 0$ for $R > C(W)$, and $E(R,W) = \infty$ for $0 \le R < C_0(W)$. In the case of the DMC, the capacity and the reliability function of the channel do not depend asymptotically on whether the maximal or the average error probability is considered [222]. For the reliability function $E(R,W)$ of a given DMC $W$, upper and lower bounds and their improvements were obtained by Elias [69], Fano [72], Dobrushin [57], Gallager [78], Forney [76], Shannon et al. [198], Haroutunian [99], Blahut [34, 35], Csiszár et al. [52], Jelinek [150], and others. We recall here the expressions for the sphere packing bound $E_{sp}(R,W)$ for the reliability function of the DMC, the random coding bound $E_r(R,W)$, and the expurgated bound $E_x(R,W)$. These names refer to the techniques by which such bounds were first obtained. The modern form of writing the sphere packing bound [51] $E_{sp}(R,W)$ was introduced by Haroutunian [99] and independently by
Blahut [34]:
$$E_{sp}(R,P,W) = \min_{V:\ I_{P,V}(X \wedge Y) \le R} D(V\|W|P), \tag{2.8}$$
$$E_{sp}(R,W) = \max_P E_{sp}(R,P,W). \tag{2.9}$$
Theorem 2.1. For any DMC $W$ and all $R \in (0, C(W))$ the following inequality holds:
$$E(R,W) \le E_{sp}(R,W).$$

The proofs are in [35, 51, 99, 152]; see also [178].

A form of the random coding bound of the reliability function $E_r(R,W)$ similar to (2.8) and (2.9) was introduced by Csiszár and Körner [51] and defined as
$$E_r(R,P,W) = \min_V \big(D(V\|W|P) + |I_{P,V}(X \wedge Y) - R|^+\big),$$
where $V$ runs over the set of all channels $V: \mathcal{X} \to \mathcal{Y}$, and
$$E_r(R,W) = \max_P E_r(R,P,W).$$
The improved lower bound $E_x(R,W)$, first obtained by Gallager [78] and called the expurgated bound, in the formulation of Csiszár and Körner [51] is the following:
$$E_x(R,P,W) = \min_{V:\ P_X = P_{\widetilde{X}} = P,\ I_{P,V}(X \wedge \widetilde{X}) \le R} \big[\mathbb{E}\, d_B(X,\widetilde{X}) + I_{P,V}(X \wedge \widetilde{X}) - R\big],$$
$$E_x(R,W) = \max_P E_x(R,P,W),$$
where $I_{P,V}(X \wedge \widetilde{X})$ is the mutual information of RVs $X$ and $\widetilde{X}$ such that $P \circ V(x,\widetilde{x}) = P(x)V(\widetilde{x}|x)$, and
$$d_B(x,\widetilde{x}) = -\log \sum_{y \in \mathcal{Y}} \sqrt{W(y|x)W(y|\widetilde{x})} \tag{2.10}$$
is the Bhattacharyya distance [35, 50] between $x$ and $\widetilde{x}$, both from $\mathcal{X}$.
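For concreteness (our own numeric illustration, not from the text), for the two inputs of a BSC with crossover probability $w$ the distance (2.10) evaluates to $d_B = -\log\big(2\sqrt{w(1-w)}\big)$; a quick check in Python:

```python
from math import sqrt, log2

def bhattacharyya(W, x1, x2):
    """d_B(x1, x2) = -log2 sum_y sqrt(W(y|x1) W(y|x2)), eq. (2.10)."""
    return -log2(sum(sqrt(a * b) for a, b in zip(W[x1], W[x2])))

W = [[0.9, 0.1], [0.1, 0.9]]   # BSC with crossover 0.1
print(bhattacharyya(W, 0, 1))  # -log2(2*sqrt(0.09)) = 0.737 bits
```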
Theorem 2.2. For any DMC $W$ and $R \in (0, C(W))$ the following inequality holds:
$$E(R,W) \ge \max(E_r(R,W), E_x(R,W)).$$

Theorem 2.3. If the capacity $C(W)$ of a channel $W$ is positive, then for sufficiently small values of $R$ we have $E_x(R,W) > E_r(R,W)$.

The smallest value of $R$ at which the convex curve $E_{sp}(R,W)$ meets its supporting line of slope $-1$ is called the critical rate and is denoted by $R_{cr}$. The comparison of $E_{sp}(R,W)$ and $E_r(R,W)$ results in

Corollary 2.1. For $R_{cr} \le R < C(W)$ the reliability function of the DMC $W$ is found exactly:
$$E(R,W) = E_{sp}(R,W) = E_r(R,W).$$

An essential refinement of the upper bound for the reliability function of the DMC, called the linear refinement, was found by Shannon et al. [198]. A list code for a channel $W$ is a code $(f,g)$ such that the range of $g$ consists of subsets of a fixed number $L$ of messages from $\mathcal{M}$. For each received vector $\mathbf{y}$ the decoder produces a list of $L$ messages, and an error occurs if the sent message is not included in the obtained list. List decoding has theoretical and practical importance. The following remark, made by an anonymous reviewer, pertinently emphasizes the practical usefulness of list decoding.
Theorem 2.3. If the capacity C(W ) of a channel W is positive then for sufficiently small values of R we have Ex (R, W ) > Er (R, W ). The smallest value of R, at which the convex curve Esp (R, W ) meets its supporting line of slope −1, is called the critical rate and denoted by Rcr . The comparison of Esp (R, W ) and Er (R, W ) results in Corollary 2.1. For Rcr ≤ R < C(W ) the reliability function of DMC W is found exactly: E(R, W ) = Esp (R, W ) = Er (R, W ). An essential refinement of upper bound for reliability function of DMC, called linear refinement, was found by Shannon et al. [198]. A list code for a channel W is a code (f, g) such that the range of g consists of subsets of a fixed number L of messages from M. For each received vector y the decoder produces a list of L messages and an error occurs if the sent message is not included in the obtained list. List decoding has a theoretical and practical importance. The following remark made by an Unknown Reviewer pertinently emphasizes the practical usefulness of the list decoding. Remark 2.1. List-decoding exponents are important because they reveal that the random-coding and the sphere packing bounds are essentially tight at all rates for point-to-point coding, as long as we are willing to live with a small rate-dependent list size while decoding. This makes the high-reliability low-rate story much more intriguing and puts the gap with expurgation into new perspective.
Denote by $e(N,M,L)$ the optimal average probability of error for list codes of length $N$ with $M$ messages and list size $L$. As a theoretical tool, list decoding was used by Shannon et al. in [198] via the inequality
$$e(N_1 + N_2, M, L_2) \ge e(N_1, M, L_1)\, e(N_2, L_1 + 1, L_2), \tag{2.11}$$
with whose help the above-mentioned refinement of the upper bound for the reliability function was obtained.

The graph of the typical behavior of the reliability function bounds for the DMC is given in Figure 2.2. In [100] Haroutunian proved that the reliability function $E(R,W)$ of the DMC is a continuous and strictly monotone function of $R$ for all $R > C_0(W)$. This fact follows from the inequality
$$E\Big(\frac{R_1 + R_2}{2}, W\Big) \le \frac{E_{sp}(R_1, W) + E(R_2, W)}{2}, \tag{2.12}$$
which is valid for all $R_1 \le R_2 < C(W)$. The inequality (2.12) in turn follows from the inequality (2.11).
Fig. 2.2 Typical behavior of the bounds for E(R, W ) of DMC W .
The mentioned properties of the reliability function $E(R,W)$ ensure the existence of the function inverse to $E(R,W)$, which we denote by $R(E,W)$, call the E-capacity, and investigate in several works.
2.2 E-capacity (Rate-Reliability Function) of DMC

The same dependence between $E$ and $R$ as in (2.7) can be investigated taking the reliability $E$ as the independent variable. Let us consider codes whose error probabilities decrease exponentially with a given exponent $E$:
$$e(f,g,N,W) \le \exp\{-NE\}. \tag{2.13}$$
Denote by $M(E,N,W)$ the best volume of a code of length $N$ for the channel $W$ satisfying condition (2.13) for a given reliability $E > 0$.

Definition 2.6. The rate-reliability function, which by analogy with the capacity we call the E-capacity, is
$$R(E,W) = C(E,W) = \lim_{N \to \infty} \frac{1}{N}\log M(E,N,W). \tag{2.14}$$

For the DMC it was proved [100] that $E(R,W)$ is a continuous function of $R$ for $R \in (C_0(W), C(W))$ and is the inverse of the function $R(E,W)$. As in the case of the capacity, the E-capacity is called maximal or average and is denoted, correspondingly, $C(E,W)$ or $\overline{C}(E,W)$, depending on which error probability is considered in (2.14). It is clear that for $\infty > E > 0$
$$C_0(W) \le C(E,W) \le \overline{C}(E,W) \le C(W).$$
As we shall see later, the obtained bounds in the case of the DMC do not depend on the kind (maximal or average) of error probability.
2.3 Sphere Packing Bound for E-capacity

Now we shall present a very simple method of derivation (first expounded in [102]) of the upper bound, called the sphere packing bound
and denoted by $R_{sp}(E,W)$, of the E-capacity $\overline{C}(E,W)$ for the average error probability. This bound, as the name shows, is the analogue of the sphere packing bound (2.8), (2.9) for $E(R,W)$.

Let $V: \mathcal{X} \to \mathcal{Y}$ be a stochastic matrix. Consider the following functions:
$$R_{sp}(P,E,W) = \min_{V:\ D(V\|W|P) \le E} I_{P,V}(X \wedge Y), \tag{2.15}$$
$$R_{sp}(E,W) = \max_P R_{sp}(P,E,W).$$
Theorem 2.4. For a DMC $W$ and $E > 0$ the following inequalities hold:
$$C(E,W) \le \overline{C}(E,W) \le R_{sp}(E,W).$$

Proof. Let $E$ and $\delta$ be given such that $E > \delta > 0$. Let a code $(f,g)$ of length $N$ be given, let $R$ be its rate, and let the average error probability satisfy the condition
$$\overline{e}(f,g,N,W) \le \exp\{-N(E - \delta)\},$$
which according to definitions (2.2) and (2.4) is
$$\frac{1}{M}\sum_m W^N\{\mathcal{Y}^N - g^{-1}(m)\,|\,f(m)\} \le \exp\{-N(E - \delta)\}. \tag{2.16}$$

Since the number of messages $M$ can be presented as a sum over types of the numbers of codewords of each type, $M = \sum_P |f(\mathcal{M}) \cap T_P^N(X)|$, and the number of all types $P \in \mathcal{P}_N(\mathcal{X})$ is less than $(N+1)^{|\mathcal{X}|}$ (see (1.1)), there exists a "major" type $P^*$ such that
$$|f(\mathcal{M}) \cap T_{P^*}^N(X)| \ge M(N+1)^{-|\mathcal{X}|}. \tag{2.17}$$

Now on the left-hand side of (2.16) we can consider only codewords of type $P^*$ and the part of the output vectors $\mathbf{y}$ of conditional type $V$:
$$\sum_{m:\ f(m) \in T_{P^*}^N(X)} W^N\{T_{P^*,V}^N(Y|f(m)) - g^{-1}(m)\,|\,f(m)\} \le M\exp\{-N(E - \delta)\}.$$

For $\mathbf{y} \in T_{P^*,V}^N(Y|f(m))$ we obtain from (1.6) that
$$\sum_{m:\ f(m) \in T_{P^*}^N(X)} \Big(|T_{P^*,V}^N(Y|f(m))| - |T_{P^*,V}^N(Y|f(m)) \cap g^{-1}(m)|\Big)\, W^N(\mathbf{y}|f(m)) \le M\exp\{-N(E - \delta)\},$$
or
$$\sum_{m:\ f(m) \in T_{P^*}^N(X)} |T_{P^*,V}^N(Y|f(m))| - M\exp\{-N(E - \delta)\}\exp\{N(D(V\|W|P^*) + H_{P^*,V}(Y|X))\} \le \sum_{m:\ f(m) \in T_{P^*}^N(X)} |T_{P^*,V}^N(Y|f(m)) \cap g^{-1}(m)|.$$

It follows from the definition of the decoding function $g$ that the sets $g^{-1}(m)$ are disjoint, therefore
$$\sum_{m:\ f(m) \in T_{P^*}^N(X)} |T_{P^*,V}^N(Y|f(m)) \cap g^{-1}(m)| \le |T_{P^*V}^N(Y)|.$$

Then from (1.4) we have
$$|f(\mathcal{M}) \cap T_{P^*}^N(X)|(N+1)^{-|\mathcal{X}||\mathcal{Y}|}\exp\{N H_{P^*,V}(Y|X)\} - M\exp\{N(D(V\|W|P^*) + H_{P^*,V}(Y|X) - E + \delta)\} \le \exp\{N H_{P^*V}(Y)\}.$$

Taking into account (2.17) we arrive at
$$M \le \frac{\exp\{N I_{P^*,V}(X \wedge Y)\}}{(N+1)^{-|\mathcal{X}|(|\mathcal{Y}|+1)} - \exp\{N(D(V\|W|P^*) - E + \delta)\}}.$$

The right-hand side of this inequality can be minimized by the choice of the conditional type $V$, keeping the denominator positive, which for large $N$ takes place when the following inequality holds:
$$D(V\|W|P^*) \le E - \delta.$$

The statement of Theorem 2.4 follows from the definitions of $R(E,W)$ and $R_{sp}(E,W)$ and from the continuity in $E$ of the function $R_{sp}(P,E,W)$.
The same bound in the case of the maximal error probability can be proved similarly, but it also follows from the given proof.

Example 2.1. We shall calculate $R_{sp}(E,W)$ for the binary symmetric channel (BSC). Consider a BSC $W$ with $\mathcal{X} = \{0,1\}$, $\mathcal{Y} = \{0',1'\}$,
$$W(0'|1) = W(1'|0) = w_1 > 0, \quad W(0'|0) = W(1'|1) = w_2 > 0.$$
Correspondingly, for another BSC $V$ on the same $\mathcal{X}$ and $\mathcal{Y}$ we denote
$$V(0'|1) = V(1'|0) = v_1, \quad V(0'|0) = V(1'|1) = v_2.$$
It is clear that $w_1 + w_2 = 1$ and $v_1 + v_2 = 1$. The maximal value of the mutual information $I_{P,V}(X \wedge Y)$ in the definition of $R_{sp}(E,W)$ is attained at $P^*(0) = P^*(1) = 1/2$ because of the symmetry of the channel, therefore
$$I_{P^*,V}(X \wedge Y) = 1 + v_1\log v_1 + v_2\log v_2.$$
The condition $D(V\|W|P^*) \le E$ takes the following form:
$$v_1\log\frac{v_1}{w_1} + v_2\log\frac{v_2}{w_2} \le E.$$
So the following constrained extremum problem must be solved (see (2.15)):
$$\begin{cases} \max\ -(1 + v_1\log v_1 + v_2\log v_2), \\ v_1\log\frac{v_1}{w_1} + v_2\log\frac{v_2}{w_2} - E = 0, \\ v_1 + v_2 = 1. \end{cases}$$
Using the Kuhn–Tucker theorem [152, 230], we find that $\overline{v}_1$ and $\overline{v}_2$ give the solution of the problem if and only if there exist $\lambda_1 > 0$, $\lambda_2 > 0$ satisfying the following conditions:
$$\begin{cases} \frac{\partial}{\partial v_i}(-1 - v_1\log v_1 - v_2\log v_2) + \lambda_1\frac{\partial}{\partial v_i}\Big(-v_1\log\frac{v_1}{w_1} - v_2\log\frac{v_2}{w_2} + E\Big) + \lambda_2\frac{\partial}{\partial v_i}(v_1 + v_2 - 1) = 0, \quad i = 1,2, \\ \lambda_1\Big(\overline{v}_1\log\frac{\overline{v}_1}{w_1} + \overline{v}_2\log\frac{\overline{v}_2}{w_2} - E\Big) = 0, \end{cases}$$
which for the maximizing $\overline{v}_1$ and $\overline{v}_2$ are equivalent to
$$\begin{cases} \log\overline{v}_i + \log e = -\lambda_1\Big(\log\frac{\overline{v}_i}{w_i} + \log e\Big) + \lambda_2, \quad i = 1,2, \\ \overline{v}_1\log\frac{\overline{v}_1}{w_1} + \overline{v}_2\log\frac{\overline{v}_2}{w_2} = E. \end{cases} \tag{2.18}$$
Solving the first two equations of (2.18) we obtain
$$\overline{v}_i = w_i^{\frac{\lambda_1}{1+\lambda_1}}\, 2^{-\frac{1}{1+\lambda_1}(\lambda_1 - \lambda_2 + 1)\log e}, \quad i = 1,2.$$
Let us denote $s = \frac{\lambda_1}{1+\lambda_1}$ and recall that $\overline{v}_1 + \overline{v}_2 = 1$; then as functions of the parameter $s \in (0,1)$ we get
$$\overline{v}_1 = \frac{w_1^s}{w_1^s + w_2^s}, \quad \overline{v}_2 = \frac{w_2^s}{w_1^s + w_2^s}.$$
From the third condition in (2.18) we obtain the parametric expressions for $E$ and $R_{sp}(E,W)$, namely
$$E(s) = \frac{w_1^s}{w_1^s + w_2^s}\log\frac{w_1^{s-1}}{w_1^s + w_2^s} + \frac{w_2^s}{w_1^s + w_2^s}\log\frac{w_2^{s-1}}{w_1^s + w_2^s},$$
$$R_{sp}(s) = 1 + \frac{w_1^s}{w_1^s + w_2^s}\log\frac{w_1^s}{w_1^s + w_2^s} + \frac{w_2^s}{w_1^s + w_2^s}\log\frac{w_2^s}{w_1^s + w_2^s}.$$
It is not hard to see that we have arrived at the same relation between $R_{sp}$ and $E$ as that given in Theorem 5.8.3 of Gallager's book [79].
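As a numeric illustration (ours, not from the text), the parametric formulas above are easy to evaluate; sweeping $s$ traces the curve $R_{sp}(E)$ for a BSC. Here $E$ is computed as $D(\overline{V}\|W|P^*)$, which equals the parametric $E(s)$.

```python
from math import log2

def rsp_bsc(w1, s):
    """Parametric point (E(s), Rsp(s)) of the sphere packing bound
    for a BSC with crossover probability w1 (w2 = 1 - w1)."""
    w2 = 1.0 - w1
    z = w1**s + w2**s
    v1, v2 = w1**s / z, w2**s / z
    E = v1 * log2(v1 / w1) + v2 * log2(v2 / w2)   # D(V || W | P*)
    R = 1 + v1 * log2(v1) + v2 * log2(v2)         # I_{P*,V}(X ^ Y)
    return E, R

for s in (0.999, 0.7, 0.5, 0.3):
    E, R = rsp_bsc(0.1, s)
    print(f"s={s}: E={E:.4f}, R={R:.4f}")
# As s -> 1 (V -> W): E -> 0 and R -> C(W) = 1 - h(0.1) = 0.531.
```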
2.4 Random Coding Bound for E-capacity

The result we are going to present is a modification of Theorem 2.5 for $E(R,W)$ from the book by Csiszár and Körner [51]. Consider the function called the random coding bound for $C(E,W)$:
$$R_r(P,E,W) = \min_{V:\ D(V\|W|P) \le E} |I_{P,V}(X \wedge Y) + D(V\|W|P) - E|^+, \tag{2.19}$$
$$R_r(E,W) = \max_P R_r(P,E,W).$$
Theorem 2.5. For a DMC $W$ and all $E > 0$ the following bound on the E-capacity holds:
$$R_r(E,W) \le C(E,W) \le \overline{C}(E,W).$$
The proof of this theorem is based on the following modification of the packing Lemma 5.1 from [51].

Lemma 2.6. For given $E > 0$, $\delta \ge 0$, type $P \in \mathcal{P}_N(\mathcal{X})$, and
$$M = \exp\Big\{N \min_{V:\ D(V\|W|P) \le E} |I_{P,V}(Y \wedge X) + D(V\|W|P) - E - \delta|^+\Big\}, \tag{2.20}$$
there exist $M$ distinct vectors $x(m)$ from $T_P^N(X)$ such that for any $m \in \mathcal{M}$, any conditional types $V, V'$, and $N$ large enough, the following inequality is valid:
$$\Big|T_{P,V}^N(Y|x(m)) \cap \bigcup_{m' \ne m} T_{P,V'}^N(Y|x(m'))\Big| \le |T_{P,V}^N(Y|x(m))|\, \exp\{-N|E - D(V'\|W|P)|^+\}. \tag{2.21}$$

Proof of Lemma 2.6. For fixed $N$, $M$, and type $P$, let us consider a collection of $M$ not necessarily distinct vectors $x(m)$ randomly taken from $T_P^N(X)$. Consider the family $\mathcal{C}(M)$ of all such ordered collections $C = (x(1), x(2), \dots, x(M))$. Notice that if some collection $C$ satisfies (2.21) for every $m, V, V'$, then $x(m) \ne x(m')$ for $m \ne m'$, $m = \overline{1,M}$, $m' = \overline{1,M}$. To prove that, it is enough to choose $V = V'$ with $D(V'\|W|P) < E$. If $V'$ is such that $D(V'\|W|P) \ge E$, then $\exp\{-N|E - D(V'\|W|P)|^+\} = 1$, and (2.21) is valid for any $M$. It remains to prove Lemma 2.6 for $V'$ such that $D(V'\|W|P) < E$. Denote
$$A_m(C, V, V') = \Big|T_{P,V}^N(Y|x(m)) \cap \bigcup_{m' \ne m} T_{P,V'}^N(Y|x(m'))\Big|$$
and
$$A_m(C) = (N+1)^{|\mathcal{X}||\mathcal{Y}|} \sum_V \sum_{V':\ D(V'\|W|P) < E} A_m(C, V, V')\, \exp\{N(E - D(V'\|W|P) - H_{P,V}(Y|X))\}.$$
On account of (1.4), $C$ satisfies (2.21) for every $m, V, V'$ if
$$A_m(C) \le 1, \quad m = \overline{1,M}. \tag{2.22}$$
If for some $C \in \mathcal{C}(M)$
$$\frac{1}{M}\sum_{m=1}^M A_m(C) \le \frac{1}{2}, \tag{2.23}$$
then $A_m(C) \le 1$ for at least $M/2$ vectors $x(m) \in C$. If $C'$ is a subcollection of $C$ consisting of such vectors, then for these $m$ we have $A_m(C') \le A_m(C) \le 1$. Hence, Lemma 2.6 will be proved if there exists a collection $C \in \mathcal{C}(M)$ satisfying (2.23) with
$$\exp\Big\{N \min_{V:\ D(V\|W|P)\le E} |I_{P,V}(X \wedge Y) + D(V\|W|P) - E - \delta|^+\Big\} \le \frac{M}{2} \le \frac{1}{2}\exp\Big\{N \min_{V:\ D(V\|W|P)\le E} |I_{P,V}(X \wedge Y) + D(V\|W|P) - E - \delta/2|^+\Big\}. \tag{2.24}$$
To prove that (2.23) holds for some $C \in \mathcal{C}(M)$, it suffices to show that for a sequence of $M$ independent RVs $X(M) = (X(1), X(2), \dots, X(M))$ uniformly distributed over $T_P^N(X)$ the following inequality holds:
$$\mathbb{E}\, A_m(X(M)) \le 1/2, \quad m = \overline{1,M}.$$
To this end we observe that
$$\mathbb{E}\, A_m(X(M), V, V') = \mathbb{E}\Big|T_{P,V}^N(Y|X(m)) \cap \bigcup_{m' \ne m} T_{P,V'}^N(Y|X(m'))\Big| = \sum_{\mathbf{y} \in \mathcal{Y}^N} \Pr\Big\{\mathbf{y} \in T_{P,V}^N(Y|X(m)) \cap \bigcup_{m' \ne m} T_{P,V'}^N(Y|X(m'))\Big\} \le \sum_{\mathbf{y} \in \mathcal{Y}^N} \sum_{m' \ne m} \Pr\{\mathbf{y} \in T_{P,V}^N(Y|X(m))\}\, \Pr\{\mathbf{y} \in T_{P,V'}^N(Y|X(m'))\},$$
because the $X(m)$ are independent and identically distributed. Let us note that the first probability is different from zero only if $\mathbf{y} \in T_{P,V}^N(Y)$. In this case we have, for $N$ large enough and every fixed $\mathbf{y} \in T_{P,V}^N(Y)$,
$$\Pr\{\mathbf{y} \in T_{P,V}^N(Y|X(m))\} = \frac{|T_{P,V}^N(X|\mathbf{y})|}{|T_P^N(X)|} \le (N+1)^{|\mathcal{X}|}\exp\{-N I_{P,V}(X \wedge Y)\}.$$
The second probability can be estimated in the same way. At last we obtain
$$\mathbb{E}\Big|T_{P,V}^N(Y|X(m)) \cap \bigcup_{m' \ne m} T_{P,V'}^N(Y|X(m'))\Big| \le (N+1)^{2|\mathcal{X}|}(M-1)\,|T_{P,V}^N(Y)|\, \exp\{-N(I_{P,V}(Y \wedge X) + I_{P,V'}(Y \wedge X))\}.$$
From (2.24), for any $V'$ such that $D(V'\|W|P) < E$ it follows that
$$M - 1 \le \exp\{N(I_{P,V'}(X \wedge Y) + D(V'\|W|P) - E - \delta/2)\},$$
and we get
$$\mathbb{E}\, A_m(X(M), V, V')\, \exp\{N(E - D(V'\|W|P) - H_{P,V}(Y|X))\} \le (N+1)^{2|\mathcal{X}|}\exp\{-N\delta/2\},$$
or
$$\mathbb{E}\, A_m(X(M)) \le (N+1)^{2|\mathcal{X}|+3|\mathcal{X}||\mathcal{Y}|}\exp\{-N\delta/2\};$$
because of Lemma 1.1, (2.23) is valid for $N$ large enough.

We shall use Lemma 2.6 to prove Theorem 2.5.

Proof. By Lemma 2.6 the existence of $M$ vectors $x(m)$ satisfying (2.20) and (2.21) is guaranteed for any $E$, $P$, and $\delta$. Let us apply for the decoder $g$ the decoding rule using the criterion of minimum divergence (see Section 2.6): each $\mathbf{y}$ is decoded to the $m'$ such that for some $V'$
$$\mathbf{y} \in T_{P,V'}^N(Y|x(m')) \quad \text{and} \quad D(V'\|W|P) \text{ is minimal}.$$
The decoder $g$ can make an error if the message $m$ was transmitted but there exists $m' \ne m$ such that for some $V'$
$$\mathbf{y} \in T_{P,V}^N(Y|x(m)) \cap T_{P,V'}^N(Y|x(m')) \quad \text{and} \quad D(V'\|W|P) \le D(V\|W|P). \tag{2.25}$$
Denote
$$\mathcal{D}(P) = \{V, V' : \text{(2.25) is valid}\}.$$
We estimate the error probability $e(m)$ in the following way:
$$e(m) \le W^N\Big(\bigcup_{\mathcal{D}(P)} T_{P,V}^N(Y|x(m)) \cap \bigcup_{m' \ne m} T_{P,V'}^N(Y|x(m'))\ \Big|\ x(m)\Big).$$
Taking into account that the PD $W^N(\mathbf{y}|x(m))$ is constant for fixed $P, V$ (see (1.6)), the last expression can be upper bounded by
$$\sum_{\mathcal{D}(P)} \Big|T_{P,V}^N(Y|x(m)) \cap \bigcup_{m' \ne m} T_{P,V'}^N(Y|x(m'))\Big|\, W^N(\mathbf{y}|x(m)).$$
Now from (2.21), using (1.6), we obtain for the maximal error probability, for sufficiently large $N$ and any $\varepsilon > 0$,
$$e(f,g,N,W) \le \sum_{\mathcal{D}(P)} \exp\{N H_{P,V}(Y|X)\}\, \exp\{-N(E - D(V'\|W|P))\}\, \exp\{-N(H_{P,V}(Y|X) + D(V\|W|P))\} \le \exp\{-N(E - \varepsilon)\}.$$
The last inequality is valid because the number of all possible $V, V'$ from $\mathcal{D}(P)$ does not exceed $(N+1)^{2|\mathcal{X}||\mathcal{Y}|}$.
2.5 Expurgated Bound for E-capacity

Now we shall formulate another lower bound for the E-capacity, called the expurgated bound. The proof, sketched in Section 2.6, repeats all steps of the analogous demonstration for the reliability function $E(R,W)$ made by Csiszár and Körner by the method of graph decomposition [50]. Consider the following functions:
$$R_x(P,E,W) = \min_V |I_{P,V}(X \wedge \widetilde{X}) + \mathbb{E}\, d_B(X,\widetilde{X}) - E|^+,$$
where $d_B(X,\widetilde{X})$ is the Bhattacharyya distance (2.10), and
$$R_x(E,W) = \max_P R_x(P,E,W).$$
Theorem 2.7. For a DMC $W$ and any $E > 0$ the following bound holds:
$$R(E,W) \ge \max(R_r(E,W), R_x(E,W)).$$

The next theorem points out the region where the upper and lower bounds coincide. Let
$$E_{cr}(P,W) = \min\Big\{E : \frac{\partial R_{sp}(P,E,W)}{\partial E} \ge -1\Big\}.$$

Theorem 2.8. For a DMC $W$ and PD $P$, for $E \in [0, E_{cr}(P,W)]$ we have
$$R(P,E,W) = R_{sp}(P,E,W) = R_r(P,E,W),$$
and, in particular, for $E = 0$,
$$R_{sp}(P,0,W) = R_r(P,0,W) = I_{P,W}(X \wedge Y).$$
2.6 Random Coding and Expurgated Bounds Derivation by the Method of Graph Decomposition

Alternative methods for the existence part of coding theorem demonstrations are Shannon's random coding method (see Section 2.4) and Wolfowitz's method of maximal codes [222]. In [50] Csiszár and Körner introduced a new original method, based on the lemma of Lovász on graph decomposition. We now present an account of the scheme of the analogous derivation of the random coding and expurgated bounds for the E-capacity $R(E,W)$ of the DMC, which was first expounded in [116] and appeared in [107]. Theorems 2.7 and 2.8 formulated in Section 2.5 are consequences of the following

Theorem 2.9. For a DMC $W: \mathcal{X} \to \mathcal{Y}$, any $E > \delta > 0$, and type $P \in \mathcal{P}_N(\mathcal{X})$, for sufficiently large $N$ there exist codes $(f,g)$ such that
$$\exp\{-N(E + \delta)\} \le e(f,g,N,W) \le \exp\{-N(E - \delta)\} \tag{2.26}$$
and
$$R(f,g,N) \ge \max(R_r(P, E + \delta, W), R_x(P, E + \delta, W)).$$

The proof of Theorem 2.9 consists of a succession of lemmas. The first one is

Lemma 2.10. For any type $P$ and any $r \in (0, |T_P^N(X)|)$ there exists a set $\mathcal{C} \subset T_P^N(X)$ with $|\mathcal{C}| \ge r$ such that for any $\widetilde{x} \in \mathcal{C}$ and any matrix $V: \mathcal{X} \to \mathcal{X}$ different from the identity matrix, the following inequality holds:
$$\big|T_{P,V}^N(X|\widetilde{x}) \cap \mathcal{C}\big| \le r\, |T_{P,V}^N(X|\widetilde{x})|\, \exp\{-N(H_P(X) - \delta_N)\}, \tag{2.27}$$
where $\delta_N = N^{-1}[(|\mathcal{X}|^2 + |\mathcal{X}|)\log(N+1) + 1]$.

For the demonstration of code existence theorems, various "good" decoding rules may be considered. The definitions of those rules may apply different real-valued functions $\alpha$ with domain $\mathcal{X}^N \times \mathcal{Y}^N$. One says that $g_\alpha$ decoding is used if each $\mathbf{y}$ from $\mathcal{Y}^N$ at the output of the channel is decoded to the message $m$ whose codeword $x(m)$ minimizes $\alpha(x(m), \mathbf{y})$. One uses functions $\alpha$ which depend only on the type $P$ of $\mathbf{x}$ and the conditional type $V$ of $\mathbf{y}$ given $\mathbf{x}$. Such functions $\alpha$ can be written in the form $\alpha(P,V)$, and under the respective decoding $g_\alpha: \mathcal{Y}^N \to \mathcal{M}_N$, the message $m$ corresponds to the vector $\mathbf{y}$ if
$$\alpha(P,V) = \min_{\widetilde{V}} \alpha(P,\widetilde{V}), \quad \mathbf{y} \in T_{P,V}^N(Y|x(m)) \cap T_{P,\widetilde{V}}^N(Y|\widetilde{x}(m)).$$
Here $\widetilde{V} = \{\widetilde{V}(y|\widetilde{x}),\ \widetilde{x} \in \mathcal{X}, y \in \mathcal{Y}\}$ is a matrix different from $V$ but guaranteeing that
$$\sum_{x \in \mathcal{X}} P(x)V(y|x) = \sum_{\widetilde{x} \in \mathcal{X}} P(\widetilde{x})\widetilde{V}(y|\widetilde{x}), \quad y \in \mathcal{Y}, \tag{2.28}$$
or equivalently, $PV = P\widetilde{V}$.
Previously the following two rules were used [50]: maximum-likelihood decoding, where the accepted codeword $x(m)$ maximizes the transition probability $W^N(\mathbf{y}|x(m))$; in this case, according to (1.6),
$$\alpha(P,V) = D(V\|W|P) + H_{P,V}(Y|X), \tag{2.29}$$
and the second decoding rule, called minimum-entropy decoding, according to which the codeword $x(m)$ minimizing $H_{P,V}(Y|X)$ is accepted, that is,
$$\alpha(P,V) = H_{P,V}(Y|X). \tag{2.30}$$
In [108] and [116] another decoding rule was proposed, by minimization of
$$\alpha(P,V) = D(V\|W|P), \tag{2.31}$$
which can be called minimum-divergence decoding.

Let $\widehat{V} = \{\widehat{V}(y|x,\widetilde{x}),\ x \in \mathcal{X}, \widetilde{x} \in \mathcal{X}, y \in \mathcal{Y}\}$ be the conditional distribution of RV $Y$ given the values of RVs $X$ and $\widetilde{X}$, such that
$$\sum_{\widetilde{x}} P(\widetilde{x})V(x|\widetilde{x})\widehat{V}(y|x,\widetilde{x}) = P(x)V(y|x), \tag{2.32}$$
$$\sum_x P(x)V(\widetilde{x}|x)\widehat{V}(y|x,\widetilde{x}) = P(\widetilde{x})\widetilde{V}(y|\widetilde{x}). \tag{2.33}$$
Following the notation from [50] we write $\widetilde{V} \prec_\alpha V$ if $\alpha(P,\widetilde{V}) \le \alpha(P,V)$ and $P\widetilde{V} = PV$. Let us denote
$$R_\alpha(P,E,W) = \min_{V, \widetilde{V}, \widehat{V}:\ \widetilde{V} \prec_\alpha V,\ D(V\|W|P) \le E} \Big\{I_{P,V}(X \wedge \widetilde{X}) + \big|I_{P,V,\widehat{V}}(Y \wedge \widetilde{X}|X) + D(V\|W|P) - E\big|^+\Big\}, \tag{2.34}$$
where the RVs $X, \widetilde{X}, Y$ take values in $\mathcal{X}, \mathcal{X}, \mathcal{Y}$, respectively, and are such that both $X$ and $\widetilde{X}$ have distribution $P$ with $PV = P$, and $\widehat{V}$ is the conditional distribution of RV $Y$ given $X$ and $\widetilde{X}$ satisfying (2.32), (2.33), and (2.28).
The main point of the theorem's demonstration is

Proposition 2.11. For any DMC $W$, any type $P \in \mathcal{P}_N(\mathcal{X})$, any $E > 0$, $\delta_N > 0$, $\delta_N' > 0$, for all sufficiently large $N$ there exist codes $(f, g_\alpha)$ such that
$$\exp\{-N(E + \delta_N'/2)\} \le e(f, g_\alpha, N, W) \le \exp\{-N(E - \delta_N'/2)\}, \tag{2.35}$$
and
$$R(P, f, g_\alpha, N) \ge R_\alpha(P, E + \delta_N', W).$$
Remark 2.2. The functions $R_\alpha(P,E,W)$ and $R_x(P,E,W)$ depend continuously on $E$.
Lemma 2.12. Let us introduce the following functions:
$$R_{\alpha,r}(P,E,W) = \min_{V:\ D(V\|W|P) \le E}\ \min_{\widetilde{V}:\ \widetilde{V} \prec_\alpha V} \big|I_{P,\widetilde{V}}(Y \wedge \widetilde{X}) + D(V\|W|P) - E\big|^+,$$
$$R_{\alpha,x}(P,E,W) = \min_{V, \widetilde{V}, \widehat{V}:\ \widetilde{V} \prec_\alpha V} \Big\{I_{P,V}(X \wedge \widetilde{X}) + \big|I_{P,V,\widehat{V}}(Y \wedge \widetilde{X}|X) + D(V\|W|P) - E\big|^+\Big\}.$$
Then
$$R_\alpha(P,E,W) \ge \max[R_{\alpha,x}(P,E,W),\ R_{\alpha,r}(P,E,W)].$$

Lemma 2.13. There exists a point $E_\alpha^*(P,W)$ such that
$$\max[R_{\alpha,x}(P,E,W),\ R_{\alpha,r}(P,E,W)] = \begin{cases} R_{\alpha,r}(P,E,W), & \text{when } E \le E_\alpha^*(P,W), \\ R_{\alpha,x}(P,E,W), & \text{when } E \ge E_\alpha^*(P,W). \end{cases}$$
Lemma 2.14. For each $\alpha$-decoding (2.29), (2.30), and (2.31),
$$R_{\alpha,x}(P,E,W) \le R_x(P,E,W);$$
moreover, for maximum-likelihood decoding (2.29) the equality holds.

Lemma 2.15. For each $\alpha$-decoding,
$$R_{\alpha,r}(P,E,W) \le R_r(P,E,W);$$
moreover, for
— maximum-likelihood decoding,
— minimum-entropy decoding,
— minimum-divergence decoding,
the equality holds.

The proof of Theorems 2.9 and 2.7 constitutes the unification of Lemmas 2.10–2.15 and Proposition 2.11.
2.7 Comparison of Bounds for E-capacity

Lemma 2.16. For a given DMC $W$, type $P$, and numbers $0 \le E' \le E$,
$$R_r(P,E,W) = \min_{E':\ E' \le E} |R_{sp}(P,E',W) + E' - E|^+.$$

Proof. Applying the definitions (2.15) and (2.19) we see that
$$R_r(P,E,W) = \min_{V:\ D(V\|W|P) \le E} |I_{P,V}(X \wedge Y) + D(V\|W|P) - E|^+ = \min_{E':\ E' \le E}\ \min_{V:\ D(V\|W|P) = E'} |I_{P,V}(X \wedge Y) + E' - E|^+ = \min_{E':\ E' \le E} |R_{sp}(P,E',W) + E' - E|^+.$$
Lemma 2.17. Involving
$$E_{cr} = E_{cr}(P,W) = \min\Big\{E : \frac{\partial R_{sp}(P,E,W)}{\partial E} \ge -1\Big\},$$
we can write for all $E > 0$:
$$R_r(P,E,W) = \begin{cases} R_{sp}(P,E,W), & \text{if } E \le E_{cr}, \\ |R_{sp}(P,E_{cr},W) + E_{cr} - E|^+, & \text{if } E \ge E_{cr}. \end{cases}$$

Proof. Since the function $R_{sp}(P,E,W)$ is convex in $E$, for values of $E$ less than $E_{cr}(P,W)$ the slope of the tangent is less than $-1$, and for $E$ greater than $E_{cr}(P,W)$ it is equal to or greater than $-1$. In other words,
$$\frac{R_{sp}(P,E,W) - R_{sp}(P,E',W)}{E - E'} < -1, \quad \text{when } E' < E \le E_{cr},$$
whence $R_{sp}(P,E,W) + E < R_{sp}(P,E',W) + E'$, and consequently
$$\min_{E':\ E' \le E < E_{cr}(P,W)} \big(R_{sp}(P,E',W) + E'\big) = R_{sp}(P,E,W) + E.$$
We obtain from this equality and Lemma 2.16 the statement of the lemma for the case $E \le E_{cr}(P,W)$. Now if $E_{cr}(P,W) \le E' < E$, then
$$\frac{R_{sp}(P,E,W) - R_{sp}(P,E',W)}{E - E'} \ge -1,$$
or $R_{sp}(P,E,W) + E \ge R_{sp}(P,E',W) + E'$, and consequently
$$\min_{E':\ E_{cr} \le E' < E} \big(R_{sp}(P,E',W) + E'\big) = R_{sp}(P,E_{cr},W) + E_{cr}.$$
Again using Lemma 2.16 and the last equality, we obtain for the case $E \ge E_{cr}$:
$$R_r(P,E,W) = \min\Big\{\min_{E':\ E_{cr} \le E' < E} |R_{sp}(P,E',W) + E' - E|^+,\ \min_{E':\ E' \le E_{cr}} |R_{sp}(P,E',W) + E' - E|^+\Big\} = |R_{sp}(P,E_{cr},W) + E_{cr} - E|^+.$$

Remark 2.3. At the point $E = 0$ we have
$$R_{sp}(P,0,W) = R_r(P,0,W) = I_{P,W}(X \wedge Y), \quad R_{sp}(0,W) = R_r(0,W) = C(W).$$
Remark 2.4. In the interval $(0, E_{cr}(P,W)]$ the functions $R(P,E,W)$ and $R(E,W)$ are determined exactly:
$$R(P,E,W) = R_{sp}(P,E,W) = R_r(P,E,W), \quad R(E,W) = R_{sp}(E,W) = R_r(E,W).$$
Thus Theorem 2.8 is proved.
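Lemma 2.17 makes the random coding bound easy to compute from the sphere packing curve. Below is a small numeric illustration for a BSC (ours, not from the text; a coarse grid search stands in for the exact minimizations).

```python
from math import log2

def h(p):
    """Binary entropy function."""
    return 0.0 if p in (0.0, 1.0) else -p*log2(p) - (1-p)*log2(1-p)

def rsp_bsc(E, w1, grid=2000):
    """Rsp(P*, E, W) for a BSC: minimize I = 1 - h(v1) over v1 subject to
    D(V||W|P*) = v1 log(v1/w1) + (1-v1) log((1-v1)/(1-w1)) <= E."""
    best = 1.0 - h(w1)                 # v1 = w1 gives D = 0, I = C(W)
    for k in range(1, grid):
        v1 = k / grid
        D = v1*log2(v1/w1) + (1-v1)*log2((1-v1)/(1-w1))
        if D <= E:
            best = min(best, 1.0 - h(v1))
    return best

def rr_bsc(E, w1):
    """Rr via Lemma 2.16/2.17: min over E' <= E of |Rsp(E') + E' - E|^+."""
    return max(0.0, min(rsp_bsc(Ep, w1) + Ep - E
                        for Ep in [j * E / 200 for j in range(201)]))

# For E below the critical value the two bounds coincide (Lemma 2.17).
print(rsp_bsc(0.05, 0.1), rr_bsc(0.05, 0.1))
```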
3 Multiuser Channels
3.1 Two-Way Channels
The two-way channel (TWC) was first investigated by Shannon [196]. The channel has two terminals, and the transmission in one direction interferes with the transmission in the opposite direction. The sources at the two terminals are independent. In the general model of the TWC, the encoding at each terminal depends both on the message to be transmitted and on the sequence of symbols received at that terminal. Similarly, the decoding at each terminal depends on the sequences of symbols received and sent at that terminal. Here we shall consider the restricted version of the TWC, where the sequence transmitted from each terminal depends only on the message and not on the sequence received at that terminal (Figure 3.1).

A discrete restricted two-way channel (RTWC) with two terminals is defined by a matrix of transition probabilities
$$W_T = \{W(y_1, y_2|x_1, x_2),\ x_1 \in \mathcal{X}_1, x_2 \in \mathcal{X}_2, y_1 \in \mathcal{Y}_1, y_2 \in \mathcal{Y}_2\},$$
where $\mathcal{X}_1, \mathcal{X}_2$ are the finite input and $\mathcal{Y}_1, \mathcal{Y}_2$ the finite output alphabets of the channel. The channel is supposed to be memoryless;
Fig. 3.1 Restricted two-way channel.
this means that for $N$-length sequences
$$\mathbf{x}_1 = (x_{11}, x_{12}, \dots, x_{1N}) \in \mathcal{X}_1^N, \quad \mathbf{x}_2 = (x_{21}, x_{22}, \dots, x_{2N}) \in \mathcal{X}_2^N,$$
$$\mathbf{y}_1 = (y_{11}, y_{12}, \dots, y_{1N}) \in \mathcal{Y}_1^N, \quad \mathbf{y}_2 = (y_{21}, y_{22}, \dots, y_{2N}) \in \mathcal{Y}_2^N,$$
the transition probabilities are given in the following way:
$$W^N(\mathbf{y}_1, \mathbf{y}_2|\mathbf{x}_1, \mathbf{x}_2) = \prod_{n=1}^N W(y_{1n}, y_{2n}|x_{1n}, x_{2n}).$$
Denote
$$W_i(y_i|x_1, x_2) = \sum_{y_{3-i}} W(y_1, y_2|x_1, x_2), \quad i = 1, 2.$$
When the symbol $x_1 \in \mathcal{X}_1$ is sent from terminal 1, the corresponding output symbol $y_1 \in \mathcal{Y}_1$ arrives at terminal 2. At the same time the input symbol $x_2$ is transmitted from terminal 2, and the symbol $y_2$ arrives at terminal 1. Let $\mathcal{M}_1 = \{1, 2, \dots, M_1\}$, $\mathcal{M}_2 = \{1, 2, \dots, M_2\}$ be the message sets of the corresponding sources. A code for the RTWC is a collection of mappings $(f_1, f_2, g_1, g_2)$, where $f_1: \mathcal{M}_1 \to \mathcal{X}_1^N$, $f_2: \mathcal{M}_2 \to \mathcal{X}_2^N$ are the encodings and $g_1: \mathcal{M}_2 \times \mathcal{Y}_1^N \to \mathcal{M}_1$, $g_2: \mathcal{M}_1 \times \mathcal{Y}_2^N \to \mathcal{M}_2$ are the decodings. The numbers
$$\frac{1}{N}\log M_i, \quad i = 1, 2,$$
are called the code rates. Denote
$$f(m_1, m_2) = (f_1(m_1), f_2(m_2)), \quad g_i^{-1}(m_i|m_{3-i}) = \{\mathbf{y}_i : g_i(m_{3-i}, \mathbf{y}_i) = m_i\}, \quad i = 1, 2;$$
then
$$e_i(m_1, m_2) = W_i^N\{\mathcal{Y}_i^N - g_i^{-1}(m_i|m_{3-i})\,|\,f(m_1, m_2)\}, \quad i = 1, 2, \tag{3.1}$$
are the error probabilities of the messages $m_1$ and $m_2$. We shall consider the average error probabilities of the code
$$\overline{e}_i(f,g,N,W_T) = \frac{1}{M_1 M_2}\sum_{m_1, m_2} e_i(m_1, m_2), \quad i = 1, 2. \tag{3.2}$$

Let $E = (E_1, E_2)$, $E_i > 0$, $i = 1, 2$. Nonnegative numbers $R_1, R_2$ are said to be an E-achievable rate pair for the RTWC if for any $\delta_i > 0$, $i = 1, 2$, there exists a code such that for sufficiently large $N$
$$\frac{1}{N}\log M_i \ge R_i - \delta_i, \quad i = 1, 2,$$
and
$$\overline{e}_i(f,g,N,W_T) \le \exp\{-N E_i\}, \quad i = 1, 2. \tag{3.3}$$

The region of all E-achievable rate pairs is called the E-capacity region for the average error probability and is denoted by $\overline{C}(E, W_T)$. The RTWC as well as the general TWC were first investigated by Shannon [196], who obtained the capacity region $C(W_T)$ of the RTWC. The capacity region of the general TWC has not been found up to now. Important results on various models of two-way channels were obtained by the authors of the works [1, 2, 3, 61, 62, 144, 188, 228]. In particular, Dueck [61] demonstrated that the capacity regions of the TWC for average and maximal error probabilities do not coincide. Here outer and inner bounds for $\overline{C}(E, W_T)$ are constructed. Unfortunately, these bounds do not coincide, and the elimination of this difference is a problem waiting for its solution.

Consider the following PDs:
$$P = \{P(x_1, x_2),\ x_1 \in \mathcal{X}_1, x_2 \in \mathcal{X}_2\},$$
$$P_i = \Big\{P_i(x_i) = \sum_{x_{3-i}} P(x_1, x_2),\ x_i \in \mathcal{X}_i\Big\}, \quad i = 1, 2,$$
$$P^* = \{P^*(x_1, x_2) = P_1(x_1)P_2(x_2),\ x_1 \in \mathcal{X}_1, x_2 \in \mathcal{X}_2\},$$
$$P \circ V_i = \{P(x_1, x_2)V_i(y_i|x_1, x_2),\ x_1 \in \mathcal{X}_1, x_2 \in \mathcal{X}_2, y_i \in \mathcal{Y}_i\}, \quad i = 1, 2,$$
where $V_1, V_2$ are probability matrices. Let us introduce notations for the outer and inner bounds of the E-capacity region. The following sphere packing region $\mathcal{R}_{sp}(E, W_T)$ in the coordinate space $(R_1, R_2)$ will serve as an outer bound of $\overline{C}(E, W_T)$. Denote
$$\mathcal{R}_{sp}(P, E, W_T) = \Big\{(R_1, R_2) :\ 0 \le R_1 \le \min_{V_1:\ D(V_1\|W_1|P) \le E_1} I_{P,V_1}(X_1 \wedge Y_1|X_2), \tag{3.4}$$
$$\qquad\qquad 0 \le R_2 \le \min_{V_2:\ D(V_2\|W_2|P) \le E_2} I_{P,V_2}(X_2 \wedge Y_2|X_1)\Big\}, \tag{3.5}$$
and
$$\mathcal{R}_{sp}(E, W_T) = \mathrm{co}\Big(\bigcup_P \mathcal{R}_{sp}(P, E, W_T)\Big).$$
Hereafter we shall denote the convex hull of the set A as co(A). The following random coding region Rr (E, WT ) will serve as an inner bound for C(E, WT ). Let n Rr (P ∗ , E, WT ) = (R1 , R2 ) : R1 ≥ 0, R2 ≥ 0, R1 ≤
min
P,V1 :D(P ◦V1 kP ∗ ◦W1 )≤E1
|IP,V1 (X1 ∧ X2 , Y1 )
+ D(P ◦ V1 kP ∗ ◦ W1 ) − E1 |+ , R2 ≤
min
P,V2 :D(P ◦V2 kP ∗ ◦W2 )≤E2
|IP,V2 (X2 ∧ X1 , Y2 )
o + D(P ◦ V2 kP ∗ ◦ W2 ) − E2 |+ , and ! Rr (E, WT ) = co
[
Rr (P ∗ , E, WT ) .
P∗
The following theorem is proved in [118].
(3.6)
(3.7)
3.1 Two-Way Channels
37
Theorem 3.1. For all E = (E1 , E2 ), E1 > 0, E2 > 0, Rr (E, WT ) ⊆ C(E, WT ) ⊆ Rsp (E, WT ).
Corollary 3.1. The limit of Rr (E, WT ), when E1 → 0, E2 → 0, coincides with the capacity region of RTWC found by Shannon [196]: ! [ Rr (WT ) = co Rr (P ∗ , WT ) , P∗
where Rr (P ∗ , WT ) = {(R1 , R2 ) : 0 ≤ R1 ≤ IP ∗ ,W1 (X1 ∧ Y1 |X2 ), 0 ≤ R2 ≤ IP ∗ ,W2 (X2 ∧ Y2 |X1 )}. The proof of the lower bound is based on a modification of the packing Lemma 2.6. Example 3.1. Assume that the channel is binary: X1 = X2 = Y1 = Y2 = {0, 1}, and E1 = E2 . Consider the channel, with the following matrices of transition probabilities: 0.9 0.1 0.8 0.2 W1 : x2 = 0 : , x2 = 1 : , 0.2 0.8 0.3 0.7 0.9 0.1 0.8 0.2 W2 : x1 = 0 : , x1 = 1 : . 0.3 0.7 0.1 0.9 Let us compare the inner and the outer bounds of the capacity region, which are represented correspondingly in Figures 3.1(a) and 3.1(b). The bounds of capacity region do not differ visually, but using the special software, it was found, that the region of outer bound contains points which do not belong to the region of inner bound. This means, that really Rr (E, WT ) ⊂ Rsp (E, WT ). An average difference of rates R2
38
Multiuser Channels
Fig. 3.2 (a) The outer bound of the capacity region. (b) The inner bound on the capacity region.
for those points in the same interval of rates R1 approximately equals 0.000675025. Now consider the inner and outer bounds of E-capacity region, that are represented in Figures 3.3(a) and 3.3(b), respectively. The critical value of reliability is Ecr ≈ 0.051, starting from which the difference between inner and outer bounds grows faster, and for E ≈ 0.191 the inner bound becomes 0.
3.2
Interference Channels
Shannon [196] considered also another version of the TWC, where the transmission of information from one sender to its corresponding receiver may interfere with the transmission of information from other sender to its receiver, which was later called interference channel (IFC). The general interference channel (GIFC) with two input and two output terminals is depicted in Figure 3.4. The GIFC differs from the TWC in two respects: the sender at each terminal do not observe the outputs at that terminal and there is no side information at the receivers. Ahlswede [2, 3] obtained bounds for capacity region of GIFC. The papers of Carleial [39, 40, 41] are also devoted to the investigation of IFC, definite results are derived in a series of other works
3.2 Interference Channels
Fig. 3.3 (a) The inner bound of E-capacity. (b) The outer bound of E-capacity.
39
40
Multiuser Channels
Fig. 3.4 General interference channel.
[25, 44, 66, 90, 96, 169, 189, 203] but the capacity region is found only in particular cases. The main definitions are the same as for TWC except the error probabilities of messages m1 and m2 , which are ei (m1 , m2 ) = WiN {YiN − gi−1 (mi )|f (m1 , m2 )},
i = 1, 2,
because the decoding functions are gi : YiN → Mi , i = 1, 2. In [129] the following theorem is proved. Theorem 3.2. For all E = (E1 , E2 ), E1 > 0, E2 > 0 the following region n Rr (P ∗ , E, WI ) = (R1 , R2 ) : R1 ≥ 0, R2 ≥ 0, R1 ≤
min
|IP,V1 (Y1 ∧ X1 )
P,V1 :D(P ◦V1 kP ∗ ◦W1 )≤E1
+ D(P ◦ V1 kP ∗ ◦ W1 ) − E1 |+ , R2 ≤
|IP,V2 (Y2 ∧ X2 ) o + D(P ◦ V2 kP ∗ ◦ W2 ) − E2 |+ . [ Rr (E, WI ) = Rr (P ∗ , E, WI ) min
P,V2 :D(P ◦V2 kP ∗ ◦W2 )≤E2
P∗
is the random coding bound of E-capacity region in the case of average error probability for GIFC: Rr (E, WI ) ⊆ C(E, WI ).
3.2 Interference Channels
41
The proof is based on another version of the packing Lemma 2.6. Lemma 3.3. For any E1 > 0, E2 > 0, δ ∈ (0, min(E1 , E2 )), type P ∗ on X1 × X2 , if 0≤
1 |IP,V1 (Y1 ∧ X1 ) log M1 ≤ min N P,V1 :D(P ◦V1 kP ∗ ◦W1 )≤E1 + D(P ◦ V1 kP ∗ ◦ W1 ) − E1 |+ − δ,
0≤
1 log M2 ≤ N
min
P,V2 :D(P ◦V2 kP ∗ ◦W2 )≤E2
|IP,V2 (Y2 ∧ X2 )
+ D(P ◦ V2 kP ∗ ◦ W2 ) − E2 |+ − δ, then there exist M1 not necessarily distinct vectors x1 (m1 ) ∈ TP1 (X1 ) and M2 vectors x2 (m2 ) ∈ TP2 (X2 ) such that for all P : X1 → X2 , P 0 : X1 → X2 , Vi : (X1 × X2 ) → Yi , Vi0 : (X1 × X2 ) → Yi , i = 1, 2, for sufficiently large N the following inequalities take place X \ [ 0 0 TP,V (Y1 |f (m1 , m2 )) T 0 ,V 0 (Y1 |f (m1 , m2 )) P 1 1 f (m1 ,m2 )∈TP (X1 ,X2 ) m02 ; m01 6=m1 + ≤ exp{N HP,V1 (Y1 |X1 X2 )} exp{−N E1 − D(P 0 ◦ V10 kP ∗ ◦ W1 ) } × M1 M2 exp{−N (D(P kP ∗ ) − δ)}, \ [ 0 0 TP,V (Y2 |f (m1 , m2 )) T 0 ,V 0 (Y2 |f (m1 , m2 )) P 2 2 f (m1 ,m2 )∈TP (X1 ,X2 ) m01 ; m02 6=m2 + ≤ exp{N HP,V2 (Y2 |X1 X2 )} exp{−N E2 − D(P 0 ◦ V20 kP ∗ ◦ W2 ) } X
× M1 M2 exp{−N (D(P kP ∗ ) − δ)}. Now consider the situation when the second encoder learns from the first encoder the codeword, that will be sent in the present block. This model called IFC with cribbing encoders is given in Figure 3.5. In this case the second codeword depends on the choice of the first one: f2 : M2 × X1N → X2N and the random coding bound of E-capacity
42
Multiuser Channels
Fig. 3.5 Interference channel with cribbing encoders.
region for average error probability in Theorem 3.2 will take the following form: n Rr (P, E, W ) = (R1 , R2 ) : R1 ≥ 0, R2 ≥ 0, R1 ≤
min
V1 :D(V1 kW1 |P )≤E1
|IP,V1 (Y1 ∧ X1 ) + D(V1 kW1 |P ) − E1 |+ ,
R2 ≤
min
|IP,V2 (Y2 ∧ X2 ) − IP (X1 ∧ X2 )
V2 :D(V2 kW2 |P )≤E2
o + D(V2 kW2 |P ) −E2 |+ . Rr (E, W ) =
[
Rr (P, E, W ).
P
In this case the following lemma is applied: Lemma 3.4. For all E1 > 0, E2 > 0, δ ∈ (0, min(E1 , E2 )), type P , if 1 log M1 ≤ N 1 log M2 ≤ N
+
min
|IP,V1 (Y1 ∧ X1 ) + D(V1 kW1 |P ) − E1 | − δ,
min
|IP,V2 (Y2 ∧ X2 ) − IP (X1 ∧ X2 )
V1 :D(V1 kW1 |P )≤E1
V2 :D(V2 kW2 |P )≤E2
+ D(V2 kW2 |P ) − E2 |+ − δ,
then there exist M1 not necessarily distinct vectors x1 (m1 ) ∈ TP1 (X1 ) and for each x1 (m1 ) ∈ TP1 (X1 ) there exist M2 not necessarily distinct vectors x2 (m2 , x1 (m1 )) ∈ TP2 (X2 |x1 (m1 )) such that for all Vi : (X1 × X2 ) → Yi , Vi0 : (X1 × X2 ) → Yi , i = 1, 2, for sufficiently large N
3.3 Broadcast Channels
43
the following inequalities take place \ [ [ X 1 N N 0 0 TP,V (Y |f (m , m )) T (Y |f (m , m )) 0 1 1 2 1 1 2 P,V1 1 M1 M2 m ,m 1 2 m01 6=m1 m02 + ≤ exp{N HP,V1 (Y1 |X1 X2 )} exp{−N E1 − D(V10 kW1 |P ) }, \ [ [ X 1 0 0 TP,V (Y2 |f (m1 , m2 )) TP,V20 (Y2 |f (m1 , m2 )) 2 M1 M2 m ,m 1 2 m02 6=m2 m01 + ≤ exp{N HP,V2 (Y2 |X1 X2 )} exp{−N E2 − D(V20 kW2 |P ) }. Here we omit the proofs of Theorem 3.2, Lemmas 3.3 and 3.4 leaving them as tasks to the reader.
3.3
Broadcast Channels
BC is a communication system in which there is one encoder and two or more receivers. We shall consider the BC WB with two receivers (see Figure 3.6). There are three sources, two private and one common, a finite input alphabet X and two finite output alphabets Y1 , Y2 and the probability transition function W (y1 , y2 |x). Denote by W1 , W2 the marginal transition probabilities: X X W1 (y1 |x) = W (y1 , y2 |x), W2 (y2 |x) = W (y1 , y2 |x). y2 ∈Y2
Fig. 3.6 Broadcast channel.
y1 ∈Y1
44
Multiuser Channels
The three sources create messages m0 , m1 , m2 from the corresponding finite message sets M0 , M1 , M2 . The messages must be coded by common encoder into one codeword and transmitted through channel W to two receivers. A code is a triple of mappings (f, g1 , g2 ), where f : M0 × M1 × M2 → X N is the encoding and g1 : Y1N → M0 × M1 , g2 : Y2N → M0 × M2 are the decodings. The code rates are N −1 log Mi , i = 0, 1, 2. i We denote by Dm = gi−1 (m0 , mi ) the set of all yi which are 0 ,mi decoded, correspondingly, into m0 , mi , i = 1, 2. These sets are disjoint and [ i Dm = YiN , i = 1, 2. 0 ,mi m0 ,mi
The decoding error probabilities at two outputs, when messages m0 , m1 , m2 are transmitted, depend on the code and N -dimensional matrices of transition probabilities WiN , i = 1, 2, and are defined by the condition of absence of memory: [ i ei (m0 , m1 , m2 ) = WiN Dm , 0 ,m0 |f (m0 , m1 , m2 ) 0 i 0 0 (m0 ,mi )6=(m0 ,mi )
i = 1, 2. For the code (f, g1 , g2 ) we consider maximal ei (f, gi , N, WB ) =
max ei (m0 , m1 , m2 ),
m0 ,m1 ,m2
i = 1, 2,
(3.8)
and average ei (f, gi , N, WB ) =
X 1 ei (m0 , m1 , m2 ), M0 M1 M2 m ,m ,m 0
1
i = 1, 2, (3.9)
2
error probabilities. Let E = (E1 , E2 ), Ei > 0, i = 1, 2. Nonnegative real numbers R0 , R1 , R2 are called E-achievable rates triple for GBC if for any δ > 0 and sufficiently large N there exists a code such that 1 log M0 Mi ≥ R0 + Ri − δ, N
i = 1, 2,
(3.10)
3.3 Broadcast Channels
45
and ei (f, gi , N, WB ) ≤ exp{−N Ei },
i = 1, 2.
(3.11)
E-capacity region C(E, WB ) for maximal error probability is the set of all E-achievable rates triples. We consider also the E-capacity region C(E, WB ) for the case of average error probabilities in (3.11). BC were first studied by Cover [45]. The capacity region of deterministic BC was found by Pinsker [180] and Marton [172]. The capacity region of the BC with one deterministic component was defined independently by Marton [172] and Gelfand and Pinsker [85]. A series of works are devoted to the solution of the problem for the asymmetric BC [46, 153, 210], the capacity region was found in [153]. Despite the fact that in many works several models of BC were considered ([20, 31, 68, 80, 83, 84, 88, 91, 154, 159, 183, 190] and others), the capacity region of BC in the situation, when two private and one common messages must be transmitted, has not yet been found. Bounds for the rate-reliability functions were not specified either. In [172] an inner bound for the capacity region of BC was found. Outer bounds for capacity region of BC without common message (when R0 = 0) were found in [45] and in [190]. New bounds for capacity of BC are obtained by Liang and Kramer [162]. In [218] Willems proved that the capacity regions of BC for maximal and average error probabilities are the same. The works [35, 47, 48, 51, 58, 152, 211] contain detailed surveys. Here an inner bound is formulated for the E-capacity region of the BC obtained in [130]. When E → 0 we obtain the random coding bound of the capacity region C(WB ), which coincides with the bound obtained in [172]. Let U0 , U1 , U2 be some finite sets. Consider RVs U0 , U1 , U2 , X, Y1 , Y2 with values, correspondingly, in U0 , U1 , U2 , X , Y1 , Y2 and with joint PD Q ◦ P ◦ Vi = {Q ◦ P ◦ Vi (u0 , u1 , u2 , x, yi ) = Q(u0 , u1 , u2 ) ×P (x|u0 , u1 , u2 )Vi (yi |x), i = 1, 2},
46
Multiuser Channels
where Q = {Q(u0 , u1 , u2 ), ui ∈ Ui , i = 0, 1, 2}, P = {P (x|u0 , u1 , u2 ), x ∈ X , ui ∈ Ui , i = 0, 1, 2}, Vi = {Vi (yi |x), x ∈ X , yi ∈ Yi }, i = 1, 2. So we have Markov chains (U0 , U1 , U2 ) → X → Yi , i = 1, 2. Let us denote Vi (Q, P, Ei ) = {Vi : D(Vi kWi |Q, P ) ≤ Ei },
i = 1, 2.
(3.12)
To formulate the inner bound of E-capacity region we consider the following inequalities for i = 1, 2: + 0 ≤ R0 ≤ min min |IQ,P,Vi (Yi ∧ U0 ) + D(Vi kWi |Q, P ) − Ei | , i
Vi ∈Vi (Q,P,Ei )
(3.13) 0 ≤ Ri ≤
min
Vi ∈Vi (Q,P,Ei )
|IQ,P,Vi (Yi ∧ Ui |U0 ) + D(Vi kWi |Q, P ) − Ei |+ , (3.14)
0 ≤ R3−i ≤
min
V3−i ∈V3−i (Q,P,E3−i )
|IQ,P,V3−i (Y3−i ∧ U3−i |U0 )
+ D(V3−i kW3−i |Q, P ) − E3−i |+ − IQ (U1 ∧ U2 |U0 ), (3.15) and the regions Rir (Q, P, E, WB ) = {(R0 , R1 , R2 ) : inequalities (3.13), (3.14), (3.15) take place for some (U0 , U1 , U2 ) → X → Yi , i = 1, 2} , [ Rr (Q, P, E, WB ) = Rir (Q, P, E, WB ), i=1,2
Rr (E, WB ) =
[
Rr (Q, P, E, WB ).
QP ∈QP(U0 ×U1 ×U2 ×X )
The following result is obtained in [130].
(3.16)
3.4 Multiple-Access Channels
47
Theorem 3.5. For all E1 > 0, E2 > 0 the region Rr (E, WB ) is an inner estimate for E-capacity region of BC: Rr (E, WB ) ⊆ C(E, WB ) ⊆ C(E, WB ), or in other words any rate triple (R0 , R1 , R2 ) ∈ Rr (E, WB ) is E-achievable for the BC WB . In [88] Hajek and Pursley conjectured that the achievable rates region of BC will not change if we consider |U0 | ≤ min(|X |, max(|Y1 |, |Y2 |)) and |Ui | ≤ 1 + |U0 | · (min(|X |, |Yi |) − 1),
i = 1, 2.
Corollary 3.2. When E1 → 0, E2 → 0, using time sharing arguments we obtain from (3.16) the inner bound for the capacity region C(WB ) of BC, obtained by Marton [172]: [ Rr (WB ) = Rr (Q, P, WB ), QP ∈QP(U0 ×U1 ×U2 ×X )
where Rr (Q, P, WB ) = {(R0 , R1 , R2 ) : for some (U0 , U1 , U2 ) → X → Yi }, i = 1, 2, 0 ≤ R0 ≤ min {IQ,P,W1 (Y1 ∧ U0 ); IQ,P,W2 (Y2 ∧ U0 )} , 0 ≤ R0 + Ri ≤ IQ,P,Wi (Yi ∧ U0 Ui ),
i = 1, 2,
R0 + R1 + R2 ≤ min {IQ,P,W1 (Y1 ∧ U0 ); IQ,P,W2 (Y2 ∧ U0 )} + IQ,P,W1 (Y1 ∧ U1 |U0 ) + IQ,P,W2 (Y2 ∧ U2 |U0 ) − IQ (U1 ∧ U2 |U0 ).
3.4
Multiple-Access Channels
The discrete memoryless multiple-access channel (MAC) with two encoders and one decoder WM = {W : X1 × X2 → Y} is defined by a
48
Multiuser Channels
matrix of transition probabilities WM = {W (y|x1 , x2 ), x1 ∈ X1 , x2 ∈ X2 , y ∈ Y}, where X1 and X2 are the finite alphabets of the first and the second inputs of the channel and Y is the finite output alphabet. There exist various configurations of the MAC [2, 185, 199, 217]. The most general model of the MAC, the MAC with correlated encoder inputs, was first studied by Slepian and Wolf [199] and then by Han [89]. Three independent sources create messages to be transmitted by two encoders (Figure 3.7). One of the sources is connected with both encoders and each of the two others is connected with only one of the encoders. Let M0 = {1, 2, . . . , M0 }, M1 = {1, 2, . . . , M1 } and M2 = {1, 2, . . . , M2 } be the message sets of corresponding sources. The code of length N for this model is a collection of mappings (f1 , f2 , g), where f1 : M0 × M1 → X1N , f2 : M0 × M2 → X2N are the encodings and g : Y N → M0 × M1 × M2 is the decoding. The numbers N −1 log Mi , i = 0, 1, 2, are called code rates. Denote f1 (m0 , m1 ) = x1 (m0 , m1 ), f2 (m0 , m2 ) = x2 (m0 , m2 ), g −1 (m0 , m1 , m2 ) = {y : g(y) = (m0 , m1 , m2 )}, then e(m0 , m1 , m2 ) = W N {Y N − g −1 (m0 , m1 , m2 )|f1 (m0 , m1 ), f2 (m0 , m2 )}, (3.17)
Fig. 3.7 MAC with correlated encoder inputs.
3.4 Multiple-Access Channels
49
are the error probabilities of messages m0 , m1 , and m2 . We consider the average error probabilities of the code X 1 e(f1 , f2 , g, N, WM ) = e(m0 , m1 , m2 ). (3.18) M0 M1 M2 m ,m ,m 0
1
2
Dueck [61] has shown that in general the maximal error capacity region of MAC is smaller than the corresponding average error capacity region. Determination of the maximal error capacity region of the MAC in various communication situations is still an open problem. In [199] the achievable rates region of MAC with correlated sources was found, the random coding bound for reliability function was constructed and in [101] a sphere packing bound was obtained. Various bounds for error probability exponents and related results have been derived also in [64, 65, 81, 166, 182], E-capacity region was investigated in [119, 132]. To formulate the results let us introduce an auxiliary RV U with values in a finite set U. Let RVs U, X1 , X2 , Y with values in alphabets U, X1 , X2 , Y, respectively, form the following Markov chain: U → (X1 , X2 ) → Y and be given by the following PDs: P0 = {P0 (u), u ∈ U}, Pi∗ = {Pi∗ (xi |u), xi ∈ Xi }, ∗
P =
i = 1, 2,
{P0 (u)P1∗ (x1 |u)P2∗ (x2 |u), x1
∈ X1 , x2 ∈ X2 },
P = {P (u, x1 , x2 ) = P0 (u)P (x1 , x2 |u), x1 ∈ X1 , x2 ∈ X2 }, P ∗ with x3−i P (x1 , x2 |u) = Pi (xi |u), i = 1, 2, and joint PD P ◦ V = {P0 (u)P (x1 , x2 |u)V (y|x1 , x2 ), x1 ∈ X1 , x2 ∈ X2 , y ∈ Y}, where V = {V (y|x1 , x2 ), x1 ∈ X1 , x2 ∈ X2 , y ∈ Y} is some conditional PD. To define the random coding region Rr (E, WM ) obtained in [132] we shall use the notion of conditional mutual information among three RVs introduced by Liu and Hughes in [166]: IP,V (X1 ∧ X2 ∧ Y |U ) = HP1∗ (X1 |U ) + HP2∗ (X2 |U ) + HP,V (Y |U ) − HP,V (Y, X1 , X2 |U ) = IP,V (X1 , X2 ∧ Y |U ) + IP (X1 ∧ X2 |U ).
50
Multiuser Channels
Then the random coding region is Rr (P ∗ , E, WM ) = {(R0 , R1 , R2 ) : R0 ≥ 0, R1 ≥ 0, R2 ≥ 0, Ri ≤
min
P,V :D(P ◦V kP ∗ ◦W )≤E ∗
|IP,V (Xi ∧ X3−i , Y |U )
+ D(P ◦ V kP ◦ W ) − E|+ ,
R1 + R2 ≤
min
P,V :D(P ◦V kP ∗ ◦W )≤E ∗
i = 1, 2,
|IP,V (X1 ∧ X2 ∧ Y |U )
+ D(P ◦ V kP ◦ W ) − E|+ , R0 + R1 + R2 ≤
min
P,V :D(P ◦V kP ∗ ◦W )≤E
(3.19)
(3.20)
|IP,V (X1 , X2 ∧ Y )
+ IP (X1 ∧ X2 |U ) + D(P ◦ V kP ∗ ◦ W ) − E|+ }, (3.21) and Rr (E, WM ) = co
( [
) Rr (P ∗ , E, WM ) .
P∗
The sphere packing bound of [119] is the following Rsp (P, E, WM ) = {(R0 , R1 , R2 ) : 0 ≤ Ri ≤
min
V :D(V kW |P )≤E
R1 + R2 ≤
IP,V (Xi ∧ Y |U, X3−i ),
min
V :D(V kW |P )≤E
R0 + R1 + R2 ≤
min
i = 1, 2, (3.22)
IP,V (X1 , X2 ∧ Y |U ),
V :D(V kW |P )≤E
IP,V (X1 , X2 ∧ Y )},
(3.23) (3.24)
and Rsp (E, WM ) = co
( [
) Rsp (P, E, WM ) .
P
Theorem 3.6. For all E > 0, for MAC with correlated sources Rr (E, WM ) ⊆ C(E, WM ) ⊆ Rsp (E, WM ).
Corollary 3.3. When E → 0, we obtain the inner and outer estimates for the channel capacity region, the expressions of which are similar
3.4 Multiple-Access Channels
51
Fig. 3.8 Regular MAC.
but differ by the PDs P and P ∗ . The inner bound coincides with the capacity region: Rr (P ∗ , WM ) = (R0 , R1 , R2 ) : 0 ≤ Ri ≤ IP ∗ ,W (Xi ∧ Y |X3−i , U ), i = 1, 2, R1 + R2 ≤ IP ∗ ,W (X1 , X2 ∧ Y |U ), R0 + R1 + R2 ≤ IP ∗ ,W (X1 , X2 ∧ Y ) , obtained in [199], where it was also proved that in this case it is enough to consider |U| ≤ |Y| + 3. The results for the following special cases can be obtained from the general case. Regular MAC. In the case with M0 = 1 (Figure 3.8) we have the classical MAC studied by Ahlswede [2, 3] and van der Meulen [209]. Ahlswede obtained a simple characterization of the capacity region. Theorem 3.7. For the regular MAC, when R0 = 0, n Rr (P ∗ , E, WM ) = (R1 , R2 ) : R1 ≥ 0, R2 ≥ 0, for i = 1, 2, Ri ≤
min
P,V :D(P ◦V kP ∗ ◦W )≤E
|IP,V (Xi ∧ X3−i , Y |U ) + D(P ◦ V kP ∗ ◦ W ) − E|+ ,
R1 + R2 ≤
min
P,V :D(P ◦V kP ∗ ◦W )≤E
|IP,V (X1 ∧ X2 ∧ Y |U ) o + D(P ◦ V kP ∗ ◦ W ) − E|+ ,
52
Multiuser Channels
is the inner bound of the E-capacity region and Rsp (P, E, WM ) = (R1 , R2 ) : 0 ≤ Ri ≤ R1 + R2 ≤
IP,V (Xi ∧ Y |X3−i ), i = 1, 2, IP,V (X1 , X2 ∧ Y ) ,
min
V :D(V kW |P )≤E
min
V :D(V kW |P )≤E
is the outer bound. In [166] it was proved that in this case it is enough to consider |U| = 4. Asymmetric MAC. The model, when M1 = 1 (Figure 3.9), is called asymmetric MAC. It was considered by Haroutunian [101], and by van der Meulen [212]. Here one of two messages has access only to one encoder, whereas the other message has access to both encoders. Theorem 3.8. For the asymmetric MAC, when R1 = 0, n Rr (P, E, WM ) = (R0 , R2 ) : 0 ≤ R2 ≤ min V :D(V kW |P )≤E
× |IP,V (X2 ∧ Y |X1 ) + D(V kW |P ) − E|+ , R0 + R2 ≤
min
V :D(V kW |P )≤E
o × |IP,V (X1 , X2 ∧ Y ) + D(V kW |P ) − E|+ ,
Fig. 3.9 Asymmetric MAC.
3.4 Multiple-Access Channels
53
and Rsp (P, E, WM ) n = (R0 , R2 ) : 0 ≤ R2 ≤ R0 + R2 ≤
min
V :D(V kW |P )≤E
min
V :D(V kW |P )≤E
IP,V (X2 ∧ Y |X1 ),
o IP,V (X1 , X2 ∧ Y ) .
In this case, when E → 0, outer and inner bounds are equal and coincide with the capacity region of the asymmetric MAC [101, 212]. MAC with cribbing encoders. Willems [217] and Willems and van der Meulen [219] investigated MAC with cribbing encoders in various communication situations and established the corresponding capacity regions. We shall consider only one of these configurations (Figure 3.10), investigated by van der Meulen [211], when the first encoder has an information about the codeword produced by the second encoder. Theorem 3.9. For the MAC with cribbing encoders the outer and inner bounds are: n Rr (P, E, WM ) = (R1 , R2 ) : R1 ≥ 0, R2 ≥ 0, Ri ≤
min
V :D(V kW |P )≤E
|IP,V (Xi ∧ Y |X3−i )
+ D(V kW |P ) − E|+ , i = 1, 2, R1 + R2 ≤
min
V :D(V kW |P )≤E
o × |IP,V (X1 , X2 ∧ Y ) + D(V kW |P ) − E|+ ,
Fig. 3.10 MAC with cribbing encoders.
54
Multiuser Channels
and Rsp (P, E, WM ) =
n (R1 , R2 ) : R1 ≥ 0, R2 ≥ 0, Ri ≤
min
V :D(V kW |P )≤E
R1 + R2 ≤
IP,V (Xi ∧ Y |X3−i ), i = 1, 2,
min
V :D(V kW |P )≤E
o IP,V (X1 , X2 ∧ Y ) .
And for E → 0 they coincide with the capacity region of this channel. Channel with two inputs and two outputs. This channel is defined by a matrix of transition probabilities WM = {W (y1 , y2 |x1 , x2 ), x1 ∈ X1 , x2 ∈ X2 , y1 ∈ Y1 , y2 ∈ Y2 }. The two input–two output MAC, first considered by Ahlswede [3], may also be interpreted as a compound MAC with informed decoder (see Figure 3.11). In [3] the capacity region of this channel was determined. In [169] the capacity regions are found for compound MAC with common information and compound MAC with conferencing. Denote X WiN (yi |x1 , x2 ) = W (y1 , y2 |x1 , x2 ), i = 1, 2; j = 3 − i. yj
We use the following PDs P0 = {P0 (u), u ∈ U}, P = {P (u, x1, x2 ) = P0 (u)P (x1 , x2 |u), x1 ∈ X1 , x2 ∈ X2 },
Fig. 3.11 Channel with two inputs and two outputs.
3.4 Multiple-Access Channels
Pi =
Pi (xi |u) =
X
55
P (x1 , x2 |u), xi ∈ Xi ,
i = 1, 2,
x3−i
P ∗ = {P0 (u)P ∗ (x1 , x2 |u) = P0 (u)P1 (x1 |u)P2 (x2 |u), x1 ∈ X1 , x2 ∈ X2 }, P ◦ V = {P0 (u)P (x1 , x2 |u)V (y|x1 , x2 ), x1 ∈ X1 , x2 ∈ X2 , y ∈ Y}, where V1 , V2 are probability matrices. Let us consider the random coding region Rr (E, WM ) Rr (P ∗ , E, WM ) =
n (R1 , R2 ) : R1 ≥ 0, R2 ≥ 0, R1 ≤ min
min
i=1,2 P,Vi :D(P ◦Vi kP ∗ ◦Wi )≤Ei
× |IP,Vi (X1 ∧ Yi |X2 , U ) + D(P ◦ Vi kP ∗ ◦ Wi ) − Ei |+ , R2 ≤ min
min
i=1,2 P,Vi :D(P ◦Vi kP ∗ ◦Wi )≤Ei
× |IP,Vi (X2 ∧ Yi |X1 , U ) + D(P ◦ Vi kP ∗ ◦ Wi ) − Ei |+ , R1 + R2 ≤ min min ∗ i=1,2 P,Vi :D(P ◦Vi kP ◦Wi )≤Ei
o × |IP,Vi (Yi ∧ X1 X2 |U ) + D(P ◦ Vi kP ∗ ◦ Wi ) − Ei |+ ,
and Rr (E, WM ) =
[
Rr (P ∗ , E, WM ).
P∗
The following theorem is proved in [135]. Theorem 3.10. For all E = (E1 , E2 ), E1 > 0, E2 > 0, Rr (E, WM ) ⊆ C(E, WM ). When E1 → 0, E2 → 0, this bound coincides with the capacity region constructed in [3], where it was determined that |U| ≤ 6.
4 E -capacity of Varying Channels
4.1
Bounds on E-capacity for the Compound Channels
Let X , Y, S be finite sets and the transition probabilities of a DMC, with input alphabet X and output alphabet Y, depend on a parameter s with values in S. In other words, we have a set of DMC Ws = {W (y|x, s), x ∈ X , y ∈ Y},
s ∈ S.
Values of the parameter s can be changed by different rules, depending on which different models of channels are formed. Varying channels can be considered in different situations, when the state of the channel is known or unknown on the encoder and decoder. The DMC is called compound channel WC (DCC), if the state s ∈ S of the channel is invariable during transmission of one codeword of length N , but can be changed arbitrarily for transmission of the next codeword. This channel can be considered in four cases, when the current state s of the channel is known or unknown at the encoder and at the decoder. The notions of rate, error probability, E-capacity are defined similarly to those of DMC (see Section 2).
57
58
E-capacity of Varying Channels
DCC were first studied by Blackwell et al. [33] and Dobrushin [56]. The capacity was found by Wolfowitz [221], who has shown that the knowledge of the state s at the decoder does not improve the asymptotic characteristics of the channel. So it is enough to study the channel in two cases. As for DMC, the capacity of the compound channel for average and maximal error probabilities are the same. In the book by Csisz´ar and K¨orner [51] the random coding bound and the sphere packing bound for reliability function of DCC are given. We shall formulate the sphere packing, random coding, and expurgated bounds of E-capacity of DCC. Let us denote by C(E, WC ) the E-capacity of DCC in the case, when the state s is unknown at the encoder and decoder. In the case, when s is known at the encoder and decoder, the E-capacity will be ˆ denoted by C(E, WC ). Let us introduce the following functions: M
Rsp (E, WC ) = max min P
min
IP,V (X ∧ Y ),
min
IP,V (X ∧ Y ).
s∈S V :D(V kWs |P )≤E
M ˆ sp (E, WC ) = R min max s∈S
P
V :D(V kWs |P )≤E
Consider also the functions M
Rr (E, WC ) = max min P
M ˆ r (E, WC ) = R min max s∈S
min
|IP,V (X ∧ Y ) + D(V kWs |P ) − E|+ ,
min
|IP,V (X ∧ Y ) + D(V kWs |P ) − E|+ ,
s∈S V :D(V kWs |P )≤E P
V :D(V kWs |P )≤E
and n o IP,V (X ∧ X) + |EdW,s (X, X) − E|+ , P s∈S PX =PX =P n o M ˆ x (E, WC ) = min max min R IP,V (X ∧ X) + |EdW,s (X, X) − E|+ . M
Rx (E, WC ) = max min s∈S
P
min
PX =PX =P
Theorem 4.1. For any DCC, any E > 0 the following inequalities hold max(Rr (E, WC ), Rx (E, WC )) ≤ C(E, WC ) ≤ Rsp (E, WC ), ˆ r (E, WC ), R ˆ x (E, WC )) ≤ C(E, ˆ ˆ sp (E, WC ). max(R WC ) ≤ R
4.2 Channels with Random Parameter
59
The proof of the lower bounds is a modification of the proof for DMC by the method of graph decomposition expounded in Section 2.6 and in [107].
4.2
Channels with Random Parameter
The channel WQ with random parameter (CRP) is a family of discrete memoryless channels Ws : X → Y, where s is the channel state, varying independently in each moment of the channel action with the same known PD Q(s) on S. The considered channel is memoryless and stationary, that is for an N -length input word x = (x1 , x2 , . . . , xN ) ∈ X N , the output word y = (y1 , y2 , . . . , yN ) ∈ Y N and states sequence s = (s1 , s2 , . . . , sN ) ∈ S N , the transition probabilities are W N (y|x, s) =
N Y
W (yn |xn , sn ), QN (s) =
n=1
N Y
Q(sn ).
n=1
Let again M be the message set and M be its cardinality. First we consider the situation with state sequence known to the sender and unknown to the receiver. The code for such channel is defined by encoding f : M × S N → N X and decoding g : Y N → M. The number R(f, g, N ) = (1/N ) log M is called code rate. Denote by M
e(m, s) = e(f, g, N, m, s) = W N (Y N − g −1 (m)|f (m, s), s)
(4.1)
the probability of erroneous transmission of the message m for given states sequence s. The maximal eQ and the average eQ error probabilities of the code (f, g) for the channel WQ are, correspondingly, X M eQ = e(f, g, N, WQ ) = max QN (s)e(m, s), (4.2) m∈M
eQ
s∈S N
X X M 1 = e(f, g, N, WQ ) = QN (s)e(m, s). M N
(4.3)
m∈M s∈S
E-capacity of the channel WQ for average error probability is denoted by C(E, WQ ). The CRP with additional information at the encoder was first considered by Shannon [193]. He studied the situation, when choosing the
60
E-capacity of Varying Channels
input symbol xn , n = 1, N one knows the states sm of the channel for m ≤ n, however the states sm for m > n are unknown. This model in literature is referred as causal side information. Gelfand and Pinsker [86] determined the capacity of CRP for average error probability in the case, when for the choice of the codeword x one needs to know the whole sequence s, in other words in the case of non-causal side information. As for the DMC for the CRP also the capacities for average and maximal error probabilities are equal. Let U, S, X, Y be RVs with values in finite sets U, S, X , Y, respectively, with PDs Q(s), P (u, x|s), and V (y|x, s), s ∈ S, u ∈ U, x ∈ X , y ∈ Y. Let us denote M
Rsp (Q, P, V ) = IQ,P,V (Y ∧ X|S),
(4.4)
M
Rr (Q, P, V ) = IQ,P,V (Y ∧ U ) − IQ,P (S ∧ U ) = IQ,P,V (Y ∧ X|S) − IQ,P,V (S ∧ U |Y ) − IQ,P,V (Y ∧ X|U, S).
(4.5)
Consider the following functions: M
Rsp (E, WQ ) = M
Rr (E, WQ ) =
min max
Q0 ∈Q(S)
P
min max
Q0 ∈Q(S)
P
min
Rsp (Q0 , P, V ),
min
|Rr (Q0 , P, V )
V :D(Q0 ◦P ◦V kQ◦P ◦W )≤E
V :D(Q0 ◦P ◦V kQ◦P ◦W )≤E
+ D(Q0 ◦ P ◦ V kQ ◦ P ◦ W ) − E|+ . The following theorem was proved by M. Haroutunian in [128, 131]. Theorem 4.2. For all E > 0, for CRP with states sequence known to the sender the following inequalities are valid Rr (E, WQ ) ≤ C(E, WQ ) ≤ C(E, WQ ) ≤ Rsp (E, WQ ). Note that when E → 0 we obtain the upper and the lower bounds for capacity of the channel WQ : M
Rsp (WQ ) = max Rsp (Q, P, V ), P
M
Rr (WQ ) = max Rr (Q, P, V ), P
4.2 Channels with Random Parameter
61
where Rr (WQ ) coincides with the capacity C(WQ ) of the CRP, obtained by Gelfand and Pinsker [85]. They also proved that it is enough to consider RV U with |U| ≤ |X | + |S|. For the model with states sequence known at the encoder and decoder Rsp (E, WQ ) is the same, but upper and lower bounds coincide for small E, because M
Rr (E, WQ ) = max P
min
Q0 ,V :D(Q0 ◦P ◦V kQ◦P ◦W )≤E
|IQ0 ,P,V (Y ∧ X|S)
+ D(Q0 ◦ P ◦ V kQ ◦ P ◦ W ) − E|+ , with P = {P (x|s), x ∈ X , s ∈ S}. When E → 0 the limits of bounds coincide and we obtain the capacity, which is the same for maximal and average error probabilities C(WQ ) = C(WQ ) = max IQ,P,W (Y ∧ X|S). P
The results for this and next two cases are published in [117]. If the state sequence is unknown at the encoder and decoder let us P take W ∗ (y|x) = s∈S Q(s)W (y|x, s). Then the bounds will take the following form: M
Rsp (E, WQ ) = max P
M
Rr (E, WQ ) = max P
min
IP,V (Y ∧ X),
min
|IP,V (Y ∧ X) + D(V kW ∗ |P ) − E|+ ,
V :D(V kW ∗ |P )≤E V :D(V kW ∗ |P )≤E
with P = {P (x), x ∈ X } and for the capacity we obtain C(WQ ) = C(WQ ) = max IP,W ∗ (Y ∧ X). P
In the case when the state sequence is known at the decoder and unknown at the encoder the following bounds are valid: M
Rsp (E, WQ ) = max P
M
Rr (E, WQ ) = max P
min
IQ0 ,P,V (Y, S ∧ X),
min
|IQ0 ,P,V (Y, S ∧ X)
Q0 ,V :D(Q0 ◦V kQ◦W |P )≤E
Q0 ,V :D(Q0 ◦V kQ◦W |P )≤E
+ D(Q0 ◦ V kQ ◦ W |P ) − E|+ ,
62
E-capacity of Varying Channels
with P = {P (x), x ∈ X } and C(WQ ) = C(WQ ) = max IQ,P,W (Y ∧ X|S). P
Haroutunian in [128, 131] also studied the generalized CRP. Let ⊂ P(S) be a subset of the set of all PDs on S. Consider the CRP, where the PD Q of the states is invariable during the transmission of length N , but can be changed arbitrarily within P 0 during the next transmission. This channel was first considered by Ahlswede [8], who found the capacity, which is the same for average and maximal error probabilities. The following theorem is stated. P0
Theorem 4.3. For GCRP for any E > 0 0
0
P RrP (E, W ) ≤ CP 0 (E, W ) ≤ CP 0 (E, W ) ≤ Rsp (E, W ),
where 0
M
P Rsp (E, W ) = inf 0 Rsp (E, WQ ), Q∈P
0 M RrP (E, W ) =
4.3
inf Rr (E, WQ ).
Q∈P 0
Information Hiding Systems
We explore the model of information hiding system with the single message in Figure 4.1. The message (watermark, fingerprint, etc.) needs to be embedded in the host data set (which can be the blocks from the audio, image, and video data) and to be reliably transmitted to a receiver via unknown
Fig. 4.1 The model of information hiding system.
4.3 Information Hiding Systems
63
channel, called the attack channel as it can be subject to random attacks of attacker by a DMC. Side information, which can be cryptographic keys, properties of the host data, features of audio, image or video data or locations of watermarks, is available both to encoder and decoder. The decoder does not know the DMC chosen by the attacker. The encoding and decoding functions are known to the attacker, but not the side information. The information hider introduces certain distortion in the host data set for the data embedding. The attacker trying to change or remove this hidden information, introduces some other distortion. Several authors introduced and studied various models of datahiding systems. Results on capacity and error exponents problems have been obtained by Moulin and O’Sullivan [175], Merhav [174], SomekhBaruch and Merhav [200, 201], Moulin and Wang [176]. Let the mappings d1 : S × X → [0, ∞), d2 : X × Y → [0, ∞), be single-letter symmetric distortion functions. The information hiding N -length code is a pair of mappings (f, g) subject to distortion ∆1 , where f : M × S N × KN → X N is the encoder, mapping host data block s, a message m and side information k to a sequence x = f (s, m, k), which satisfies the following distortion constraint: dN 1 (s, f (s, m, k)) ≤ ∆1 , and g : Y N × KN → M is the decoding. An attack channel, subject to distortion ∆2 , satisfies the following condition: X X x∈X N y∈Y N
N dN 2 (x, y)A(y|x)p (x) ≤ ∆2 .
64
E-capacity of Varying Channels
The N -length memoryless expression for the attack channel A is A(y|x) =
N Y
A(yn |xn ).
n=1
The nonnegative number R(f, g, N ) = N −1 log |M| is called the information hiding code rate. We use an auxiliary RV U , taking values in the finite set U and forming the Markov chain (K, S, U ) → X → Y . A memoryless covert channel P , subject to a distortion level ∆1 , is a PD P = {P (u, x|s, k), u ∈ U, x ∈ X , s ∈ S, k ∈ K} such that for any Q X Q(s, k)P (u, x|s, k)d1 (s, x) ≤ ∆1 . (4.6) u,x,s,k
Denote by P(Q, ∆1 ) the set of all covert channels, subject to distortion level ∆1 . The N -length memoryless expression for the covert channel P is P (u, x|s, k) =
N Y
P (un , xn |sn , kn ).
n=1
A memoryless attack channel A, subject to distortion level ∆2 , under the condition of covert channel P ∈ P(Q, ∆1 ), is defined by a PD A = {A(y|x), y ∈ Y, x ∈ X } such that X d2 (x, y)A(y|x)P (u, x|s, k)Q(s, k) ≤ ∆2 . u,x,y,s,k
The set of all attack channels are denoted by A(Q, P, ∆2 ) under the condition of covert channel P ∈ P(Q, ∆1 ) and subject to distortion level ∆2 . The sets P(Q, ∆1 ) and A(Q, P, ∆2 ) are defined by linear inequality constraints and hence are convex. The error probability of the message m averaged over all (s, k) ∈ N S × KN equals to e(f, g, N, m, Q, A) X M = (s,k)∈ S N ×KN
Q(s, k)A{Y N − g −1 (m|k)|f (m, s, k)}.
4.3 Information Hiding Systems
65
We consider the maximal e(f, g, N, Q, P, ∆2 ) and the average e(f, g, N, Q, P, ∆2 ) error probabilities of the code, maximal over all attack channels from A(Q, P, ∆2 ). The following lower bound of information hiding E-capacity for maximal and average error probabilities is constructed by M. Haroutunian and Tonoyan in [137]. Rr (Q∗ , E, ∆1 , ∆2 ) = min
max
min
min
Q P ∈P(Q,∆1 ) A∈A(Q,P,∆2 ) V :D(Q◦P ◦V kQ∗ ◦P ◦A)≤E
× |IQ,P,V (Y ∧ U |K) − IQ,P (S ∧ U |K) + D(Q ◦ P ◦ V kQ∗ ◦ P ◦ A) − E|+ .
(4.7)
As E → 0 we obtain the lower bound of information hiding capacity: Rr (Q∗ , ∆1 , ∆2 ) =
max
min
P ∈P(Q∗ ,∆1 ) A∈A(Q∗ ,P,∆2 )
− IQ∗ ,P (S ∧ U |K)}
{IQ∗ ,P,A (Y ∧ U |K) (4.8)
which coincides with the information hiding capacity, obtained by Moulin and O’Sullivan [175]. In Figure 4.2 the random coding bound of E-capacity (4.7) is illustrated for three particular cases. In the first case ∆1 = 0.4, and ∆2 = 0.7 (curve “1”), in the second case ∆1 = 0.7, ∆2 = 0.7 (“2”) and in the third case ∆1 = 0.4 and ∆2 = 0.8 (“3”). In Figures 4.3(a), 4.3(b), and 4.3(c) the capacity function (4.8) is depicted.
Fig. 4.2 The random coding bound of E-capacity.
66
E-capacity of Varying Channels
Fig. 4.3 The capacity function.
The surface in Figure 4.3(a) illustrates the dependence of information hiding rate from the distortion levels. Curves in Figures 4.3(b) and 4.3(c) are intersections of the rate surface with the planes parallel to the ∆1 and ∆2 axes correspondingly. Recent results for the E-capacity of information system with multiple messages and for the reversible information hiding system are presented in [136, 138, 139], and [204].
4.4
Multiple-Access Channels with Random Parameter
Let X1 , X2 , Y, S be finite sets. The transition probabilities of a discrete memoryless multiple-access channel with two encoders and one decoder depend on a parameter s with values in S. In other words, we have a
4.4 Multiple-Access Channels with Random Parameter
67
set of conditional probabilities Ws = {W (y|x1 , x2 , s), x1 ∈ X1 , x2 ∈ X2 , y ∈ Y},
s ∈ S.
The multiple-access channel WR with random parameter (MACRP) is a family of discrete memoryless multiple-access channels Ws : X1 × X2 → Y, where s is the channel state, varying independently in each moment with the same PD Q(s) on S. Time varying MAC was explored in [13, 14, 55, 149, 181, 202]. In [133, 134] M. Haroutunian studied the MAC with random parameter in various situations, when the whole state sequence s is known or unknown at the encoders and at the decoder. Denote by e(m1 , m2 , s) the probability of erroneous transmission of the messages m1 ∈ M1 , m2 ∈ M2 for given s. We study the average error probability of the code: X X 1 e(N, WR ) = QN (s)e(m1 , m2 , s). M1 M2 m ,m N 1
2 s∈S
When s is known at the encoders and decoder the code of length N is a collection of mappings (f1 , f2 , g), where f1 : M1 × S N → X1N and f2 : M2 × S N → X2N are encodings and g : Y N × S N → M1 × M2 is decoding. Denote f1 (m1 , s) = x1 (m1 , s), f2 (m2 , s) = x2 (m2 , s), g −1 (m1 , m2 , s) = {y : g(y) = (m1 , m2 , s)}, then e(m1 , m2 , s) = W N {Y N − g −1 (m1 , m2 , s)|f1 (m1 , s), f2 (m2 , s), s} is the probability of the erroneous transmission of messages m1 and m2 . Let RVs X1 , X2 , Y, S take values in alphabets X1 , X2 , Y, S, respectively, with the following PDs: Q = {Q(s), s ∈ S}, Pi∗ = {Pi∗ (xi |s), xi ∈ Xi }, ∗
P =
{P1∗ (x1 |s)P2∗ (x2 |s), x1
i = 1, 2, ∈ X1 , x2 ∈ X2 },
P = {P (x1 , x2 |s), x1 ∈ X1 , x2 ∈ X2 },
68
E-capacity of Varying Channels
with X
P (x1 , x2 |s) = Pi∗ (xi |s),
i = 1, 2,
x3−i
and joint PD Q ◦ P ◦ V = {Q(s)P (x1 , x2 |s)V (y|x1 , x2 , s), s ∈ S, x1 ∈ X1 , x2 ∈ X2 , y ∈ Y}, where V = {V (y|x1 , x2 , s), s ∈ S, x1 ∈ X1 , x2 ∈ X2 , y ∈ Y} is some conditional PD. The following region is called the random coding bound [ M Rr (E, WR ) = Rr (P ∗ , E, WR ), P∗
with n M Rr (P ∗ , E, WR ) = (R1 , R2 ) : 0 ≤ R1 ≤
min
Q0 ,P,V :D(Q0 ◦P ◦V kQ◦P ∗ ◦W )≤E
IQ0 ,P,V (X1 ∧ X2 , Y |S)
+ + D(Q0 ◦ P ◦ V kQ ◦ P ∗ ◦ W ) − E , IQ0 ,P,V (X2 ∧ X1 , Y |S) 0 ≤ R2 ≤ min Q0 ,P,V :D(Q0 ◦P ◦V kQ◦P ∗ ◦W )≤E
+ + D(Q0 ◦ P ◦ V kQ ◦ P ∗ ◦ W ) − E , R1 + R2 ≤
min
Q0 ,P,V :D(Q0 ◦P ◦V kQ◦P ∗ ◦W )≤E
|IQ0 ,P,V (X1 , X2 ∧ Y |S)
o + IQ0 ,P (X1 ∧ X2 |S) + D(Q0 ◦ P ◦ V kQ ◦ P ∗ ◦ W ) − E|+ . The next region is called the sphere packing bound [ M Rsp (E, WR ) = Rsp (P, E, WR ), P
4.4 Multiple-Access Channels with Random Parameter
69
where M
Rsp (P, E, WR ) = {(R1 , R2 ) : 0 ≤ R1 ≤ 0 ≤ R2 ≤ R1 + R2 ≤
min
IQ0 ,P,V (X1 ∧ Y |X2 , S),
min
IQ0 ,P,V (X2 ∧ Y |X1 , S),
Q0 ,V :D(Q0 ◦P ◦V kQ◦P ◦W )≤E Q0 ,V :D(Q0 ◦P ◦V kQ◦P ◦W )≤E
min
Q0 ,V : D(Q0 ◦P ◦V kQ◦P ◦W )≤E
IQ0 ,P,V (X1 , X2 ∧ Y |S)}.
The following theorem is proved by M. Haroutunian in [133]. Theorem 4.4. For all E > 0, for MAC with random parameter the following inclusions are valid Rr (E, WR ) ⊆ C(E, WR ) ⊆ Rsp (E, WR ).
Corollary 4.1. When E → 0, we obtain the inner and outer estimates for the channel capacity region, the expressions of which are similar but differ in the PDs P and P ∗ . The inner bound is n M Rr (P ∗ , WR ) = (R1 , R2 ) : 0 ≤ Ri ≤ IQ,P ∗ ,W (Xi ∧ Y |X3−i , S), i = 1, 2, o R1 + R2 ≤ IQ,P ∗ ,W (X1 , X2 ∧ Y |S) . For this model when the states are unknown at the encoders and the decoder the mappings (f1 , f2 , g) are f1 : M1 → X1N , f2 : M2 → X2N and g : Y N → M1 × M2 . Then f1 (m1 ) = x1 (m1 ), g
−1
f2 (m2 ) = x2 (m2 ),
(m1 , m2 ) = {y : g(y) = (m1 , m2 )},
and the probability of the erroneous transmission of messages m1 and m2 is e(m1 , m2 , s) = W N {Y N − g −1 (m1 , m2 )|f1 (m1 ), f2 (m2 ), s}.
70
E-capacity of Varying Channels
Consider the distributions Q = {Q(s), s ∈ S}, Pi∗ = {Pi∗ (xi ), xi ∈ Xi },
i = 1, 2,
P ∗ = {P1∗ (x1 )P2∗ (x2 ), x1 ∈ X1 , x2 ∈ X2 }, P = {P (x1 , x2 ), x1 ∈ X1 , x2 ∈ X2 }, V = {V (y|x1 , x2 ), x1 ∈ X1 , x2 ∈ X2 , y ∈ Y}, and W ∗ (y|x1 , x2 ) =
X
Q(s)W (y|x1 , x2 , s).
s∈S
In this case the bounds in Theorem 4.4 take the following form: n M Rr (P ∗ , E, WR ) = (R1 , R2 ) : R1 ≥ 0, R2 ≥ 0, R1 ≤ R2 ≤
+
min
|IP,V (X1 ∧ X2 , Y ) + D(P ◦ V kP ∗ ◦ W ∗ ) − E| ,
min
|IP,V (X2 ∧ X1 , Y ) + D(P ◦ V kP ∗ ◦ W ∗ ) − E| ,
P,V :D(P ◦V kP ∗ ◦W ∗ )≤E
+
P,V :D(P ◦V kP ∗ ◦W ∗ )≤E
R 1 + R2 ≤
min
P,V :D(P ◦V kP ∗ ◦W ∗ )≤E
|IP,V (X1 , X2 ∧ Y ) + IP (X1 ∧ X2 ) o + D(P ◦ V kP ∗ ◦ W ∗ ) − E|+ ,
and n M Rsp (P, E, WR ) = (R1 , R2 ) : 0 ≤ R1 ≤ 0 ≤ R2 ≤
min
IP,V (X1 ∧ Y |X2 ),
min
IP,V (X2 ∧ Y |X1 ),
V :D(V kW ∗ |P )≤E V :D(V kW ∗ |P )≤E
R1 + R2 ≤
min
V :D(V kW ∗ |P )≤E
o IP,V (X1 , X2 ∧ Y ) .
When E → 0, we obtain the inner and the outer estimates for the channel capacity region, the expressions of which as in the previous case are similar but differ by the PDs P and P ∗ . The inner bound is: M
Rr (P ∗ , WR ) = {(R1 , R2 ) : 0 ≤ Ri ≤ IP ∗ ,W ∗ (Xi ∧ Y |X3−i ), i = 1, 2, R1 + R2 ≤ IP ∗ ,W ∗ (X1 , X2 ∧ Y )}.
4.4 Multiple-Access Channels with Random Parameter
71
When the state is known on the decoder and unknown on the encoders the code (f1 , f2 , g) is a collection of mappings f1 : M1 → X1N , f2 : M2 → X2N and g : Y N × S → M1 × M2 . Then the probability of the erroneous transmission of messages m1 and m2 will be e(m1 , m2 , s) = W N {Y N − g −1 (m1 , m2 , s)|f1 (m1 ), f2 (m2 ), s}. For this model the following distributions participate in the formulation of the bounds Pi∗ = {Pi∗ (xi ), xi ∈ Xi },
i = 1, 2,
P ∗ = {P1∗ (x1 )P2∗ (x2 ), x1 ∈ X1 , x2 ∈ X2 }, P = {P (x1 , x2 ), x1 ∈ X1 , x2 ∈ X2 }, Q0 = {Q0 (s|x1 , x2 ), s ∈ S, x1 ∈ X1 , x2 ∈ X2 }, V = {V (y|x1 , x2 , s), s ∈ S, x1 ∈ X1 , x2 ∈ X2 , y ∈ Y}. Then the bounds in Theorem 4.4 take the following form: n M Rr (P ∗ , E, WR ) = (R1 , R2 ) : 0 ≤ R1 ≤
min
Q0 ,P,V :D(Q0 ◦P ◦V kQ◦P ∗ ◦W )≤E
IQ0 ,P,V (X1 ∧ X2 , S, Y )
+ + D(Q0 ◦ P ◦ V kQ ◦ P ∗ ◦ W ) − E , IQ0 ,P,V (X2 ∧ X1 , S, Y ) 0 ≤ R2 ≤ min Q0 ,P,V :D(Q0 ◦P ◦V kQ◦P ∗ ◦W )≤E
+ +D(Q0 ◦ P ◦ V kQ ◦ P ∗ ◦ W ) − E , R1 + R2 ≤
min
Q0 ,P,V :D(Q0 ◦P ◦V kQ◦P ∗ ◦W )≤E
|IQ0 ,P,V (X1 , X2 ∧ S, Y )
o + IP (X1 ∧ X2 ) + D(Q0 ◦ P ◦ V kQ ◦ P ∗ ◦ W ) − E|+ , and n M Rsp (P, E, WR ) = (R1 , R2 ) : 0 ≤ R1 ≤ 0 ≤ R2 ≤
min
IQ0 ,P,V (X1 ∧ Y, S|X2 ),
min
IQ0 ,P,V (X2 ∧ Y, S|X1 ), o IP,V (X1 , X2 ∧ Y, S) .
Q0 ,V :D(Q0 ◦V kQ◦W |P )≤E Q0 ,V :D(Q0 ◦V kQ◦W |P )≤E
R1 + R2 ≤
min
Q0 ,V :D(Q0 ◦V kQ◦W |P )≤E
72
E-capacity of Varying Channels
When E → 0, we obtain the inner and outer estimates for the channel capacity region, the expressions of which as in the previous case are similar but differ by the PDs P and P ∗ . The inner bound is M
Rr (P ∗ , WR ) = {(R1 , R2 ) : 0 ≤ Ri ≤ IQ,P ∗ ,W (Xi ∧ Y |X3−i , S), i = 1, 2, R1 + R2 ≤ IQ,P ∗ ,W (X1 , X2 ∧ Y |S)}. The case when the states of the channel are known on the encoders and unknown on the decoder is characterized by encodings f1 : M1 × S N → X1N and f2 : M2 × S N → X2N and decoding g : Y N → M1 × M2 . Then e(m1 , m2 , s) = W N {Y N − g −1 (m1 , m2 )|f1 (m1 , s), f2 (m2 , s), s} is the probability of the erroneous transmission of messages m1 and m2 . Let the auxiliary RVs U1 , U2 take values correspondingly in some finite sets U1 , U2 . Then with the following PDs Q = {Q(s), s ∈ S}, Pi∗ = {Pi∗ (ui , xi |s), xi ∈ Xi , ui ∈ Ui }, ∗
P =
{P1∗ (u1 , x1 |s)P2∗ (u2 , x2 |s), x1
i = 1, 2,
∈ X1 , x2 ∈ X2 },
P = {P (u1 , u2 , x1 , x2 |s), x1 ∈ X1 , x2 ∈ X2 }, and V = {V (y|x1 , x2 , s), s ∈ S, x1 ∈ X1 , x2 ∈ X2 , y ∈ Y} the random coding bound will be written in the following way: n Rr (P ∗ , E, WR ) = (R1 , R2 ) : R1 ≤
min
Q0 ,P,V :D(Q0 ◦P ◦V kQ◦P ∗ ◦W )≤E
IQ0 ,P,V (U1 ∧ U2 , Y ) − IQ0 ,P (U1 ∧ S)
+ + D(Q0 ◦ P ◦ V kQ ◦ P ∗ ◦ W ) − E , IQ0 ,P,V (U2 ∧ U1 , Y ) − IQ0 ,P (U2 ∧ S) R2 ≤ min Q0 ,P,V :D(Q0 ◦P ◦V kQ◦P ∗ ◦W )≤E
+ + D(Q0 ◦ P ◦ V kQ ◦ P ∗ ◦ W ) − E ,
4.4 Multiple-Access Channels with Random Parameter
0 ≤ R1 + R2 ≤
min
Q0 ,P,V :D(Q0 ◦P ◦V kQ◦P ∗ ◦W )≤E
73
|IQ0 ,P,V (U1 , U2 ∧ Y )
− IQ0 ,P (U1 , U2 ∧ S) + IQ0 ,P (U1 ∧ U2 ) o + D(Q0 ◦ P ◦ V kQ ◦ P ∗ ◦ W ) − E|+ . When E → 0, we obtain the inner estimate for the channel capacity region: n Rr (P ∗ , WR ) = (R1 , R2 ) : 0 ≤ Ri ≤ IQ,P ∗ ,W (Ui ∧ Y |U3−i ) − IQ,P ∗ (U1 ∧ S), i = 1, 2, o ∗ ∗ R1 + R2 ≤ IQ,P ,W (U1 , U2 ∧ Y ) − IQ,P (U1 , U2 ∧ S) . Another case is when the states are known on one of the encoders and unknown on the other encoder and on the decoder. For distinctness we shall assume, that the first encoder has information about the state of the channel. Then the code will consist of the following mappings: f1 : M1 × S N → X1N and f2 : M2 → X2N are encodings and g : Y N → M1 × M2 is decoding. The probability of the erroneous transmission of messages m1 and m2 is e(m1 , m2 , s) = W N {Y N − g −1 (m1 , m2 )|f1 (m1 , s), f2 (m2 ), s}. Let the auxiliary RV U takes values in some finite set U and Q = {Q(s), s ∈ S}, P1∗ = {P1∗ (u, x1 |s), x1 ∈ X1 , u ∈ U}, P2∗ = {P2∗ (x2 ), x2 ∈ X2 }, P ∗ = {P1∗ (u, x1 |s)P2∗ (x2 ), x1 ∈ X1 , x2 ∈ X2 }, P = {P (u, x1 , x2 |s), x1 ∈ X1 , x2 ∈ X2 }, V = {V (y|x1 , x2 , s), s ∈ S, x1 ∈ X1 , x2 ∈ X2 , y ∈ Y}. The random coding bound of the E-capacity region will be n Rr (P ∗ , E, WR ) = (R1 , R2 ) : IQ0 ,P,V (U ∧ Y, X2 ) − IQ0 ,P ∗ (U ∧ S) R1 ≤ min 1 Q0 ,P,V :D(Q0 ◦P ◦V kQ◦P ∗ ◦W )≤E
+ + D(Q0 ◦ P ◦ V kQ ◦ P ∗ ◦ W ) − E ,
74
E-capacity of Varying Channels
R2 ≤
Q0 ,P,V
min
:D(Q0 ◦P ◦V
kQ◦P ∗ ◦W )≤E
IQ0 ,P,V (X2 ∧ Y, U ) − IQ0 ,P ∗ (U ∧ S) 1
+ + D(Q0 ◦ P ◦ V kQ ◦ P ∗ ◦ W ) − E , 0 ≤ R1 + R2 ≤
min
Q0 ,P,V :D(Q0 ◦P ◦V kQ◦P ∗ ◦W )≤E
|IQ0 ,P,V (U, X2 ∧ Y )
+ IQ0 ,P (U ∧ X2 ) − IQ0 ,P1∗ (U ∧ S) o + D(Q0 ◦ P ◦ V kQ ◦ P ∗ ◦ W ) − E|+ .
Corollary 4.2. When E → 0, we obtain the inner estimate for the channel capacity region: Rr (P ∗ , WR ) = {(R1 , R2 ) : 0 ≤ R1 ≤ IQ,P ∗ ,W (U ∧ Y |X2 ) − IQ,P1∗ (U ∧ S), 0 ≤ R2 ≤ IQ,P ∗ ,W (X2 ∧ Y |U ) − IQ,P1∗ (U ∧ S), R1 + R2 ≤ IQ,P ∗ ,W (U, X2 ∧ Y ) − IQ,P1∗ (U ∧ S)}.
4.5
Arbitrarily Varying Channels with State Sequence Known to the Sender
The arbitrarily varying channel (AVC) is a discrete memoryless transmission system, which depends on state s, that may change in an arbitrary manner within a finite set S. A series of works are devoted to the investigation of AVC in various situations [4, 15, 71, 146]. The capacity of the channel, when the state sequence is known to the sender, was found by Ahlswede [8]. In [146] the error exponent of AVS was studied. For the channel W the maximal and average error probabilities are, correspondingly, M
e = e(f, g, N, W ) = max max e(m, s), m∈M s∈S N
M
e = e(f, g, N, W ) =
1 X max e(m, s). M s∈S N m∈M
4.5 Arbitrarily Varying Channels
75
In [127] and [131] M. Haroutunian obtained the following bounds of E-capacity (Q(S) is the set of all distributions on S): M
Rsp (E, W ) = min max Q∈Q(S)
P
M
Rr (E, W ) = min max Q∈Q(S)
P
min
Rsp (Q, P, V ),
min
|Rr (Q, P, V )
V :D(V kW |Q,P )≤E V :D(V kW |Q,P )≤E +
+ D(V kW |Q, P ) − E| ,
where Rsp (Q, P, V ) and Rr (Q, P, V ) are defined in (4.4) and (4.5), respectively. Theorem 4.5. For all E > 0, for arbitrarily varying channel with state sequence known to the sender the following inequalities are valid Rr (E, W ) ≤ C(E, W ) ≤ C(E, W ) ≤ Rsp (E, W ). Note that when E → 0 we obtain the upper and the lower bounds for capacity: M
Rsp (W ) = min Rsp (WQ ), Q∈Q(S)
M
Rr (W ) = min Rr (WQ ). Q∈Q(S)
Rr (W ) coincides with the capacity C(W ) of the arbitrarily varying channel with state sequence known to the sender, found by Ahlswede in [8].
5 Source Coding Rates Subject to Fidelity and Reliability Criteria
5.1
Introductory Notes
In this section we expound the concept of the rate-reliability-distortion function [143] for discrete memoryless sources (DMS). Shannon rate-distortion function [194] shows the dependence of the asymptotically minimal coding rate on a required average fidelity (distortion) threshold for source noiseless transmission. Another characteristic in source coding subject to a distortion criterion can be considered, namely, an exponential decrease in error probability with a desirable exponent or reliability. The maximum error exponent as a function of coding rate and distortion in the rate-distortion problem was specified by Marton [171]. An alternative order dependence of the three parameters was examined by Haroutunian and Mekoush in [124]. They define the rate-reliability-distortion function as the minimal rate at which the messages of a source can be encoded and then reconstructed by the receiver with an error probability that decreases exponentially with the codeword length. Therefore, the achievability of the coding rate R is considered as a function of a fixed distortion level ∆ ≥ 0 and an error exponent E > 0. In a series of works [114, 115, 122, 123, 140, 170] and 77
78
Source Coding Rates Subject to Fidelity and Reliability Criteria
[213] the coauthors and their collaborators successively extended this idea into the multiuser source coding problems. Recently this approach was adopted by Tuncel and Rose [206]. Actually, the approach of [124] brings a technical ease. Solving a general problem on rate-reliability-distortion one can readily convert the results from that area to the rate-distortion one looking at the extremal values of the reliability, e.g., E → 0, E → ∞. It is more useful when we deal with a multiuser source coding problem. Particularly, with the limit condition E → 0 the rate-reliability-distortion function turns into the corresponding rate-distortion one. In Sections 5.2–5.5, we elaborate on the concept of the ratereliability-distortion function and its properties. In case of exact (“zerodistortion”) transmission the rate-reliability function is specified. The convexity properties of those functions, important from informationtheoretical point of view, are in the focus of the discussions.
5.2
The Rate-Reliability-Distortion Function
The blockwise encoded messages of a DMS must be transmitted to the receiver. The decoder based on the codeword has to recover the original message in the framework of required distortion and reliability. The model of such information transmission system is depicted in Figure 5.1 (cf. Figure 2.1). The DMS X is defined as a sequence {Xi }∞ i=1 of discrete independent identically distributed (i.i.d.) RVs taking values in the finite set X , which is the alphabet of messages of the source. Let P ∗ = {P ∗ (x), x ∈ X }
(5.1)
be the generating PD of the source messages. Since we are interested in the memoryless sources, the probability P ∗N (x) of the N -length vector of N successive messages x = (x1 , x2 , . . . , xN ) ∈ X N is defined as
Fig. 5.1 Noiseless communication system.
5.2 The Rate-Reliability-Distortion Function
79
product of the components probabilities P
∗N
M
(x) =
N Y
P ∗ (xn ).
(5.2)
n=1
The finite set Xb, different in general from X , is the reproduction alphabet at the receiver. Let d : X × Xb → [0; ∞)
(5.3)
be the given fidelity (distortion) criterion between source original and reconstruction messages. The distortion measure for sequences x ∈ X N b ∈ XbN is assumed to be the average of the components’ distortions and x M
b) = d(x, x
N 1 X d(xn , x bn ). N
(5.4)
n=1
We name the code, and denote (f, g) the family of two mappings: a coding f : X N → {1, 2, . . . , L(N )}, and a decoding g : {1, 2, . . . , L(N )} → XbN , where L(N ) is the volume of the code. The task of the system is to ensure restoration of the source messages at the receiver within a given distortion level ∆ and with a “small” error probability. Our problem is to estimate the minimum of the code volume sufficient for realizing this task. For a given distortion level ∆ ≥ 0, we consider the following set: M
b, d(x, x b) ≤ ∆}, A = {x ∈ X N : g(f (x)) = x that is, the set of satisfactorily transmitted vectors of messages. The error probability e(f, g, P ∗ , ∆, N ) of the code (f, g) for the source PD P ∗ given ∆ and N will be M
e(f, g, P ∗ , ∆, N ) = 1 − P ∗N (A).
80
Source Coding Rates Subject to Fidelity and Reliability Criteria
Definition 5.1. A number R ≥ 0 is called ∆-achievable rate for given PD P ∗ and ∆ ≥ 0 if for every ε > 0 and sufficiently large N there exists a code (f, g) such that 1 log L(N ) ≤ R + ε, N e(f, g, P ∗ , ∆, N ) ≤ ε.
(5.5) (5.6)
The minimum of these ∆-achievable rates defines the Shannon ratedistortion function [194]. We denote it by R(∆, P ∗ ). The properties of the rate-distortion function are well studied, and they can be found in the books by Berger [26], Csisz´ar and K¨orner [51], Cover and Thomas [48], and in the paper [9] by Ahlswede. Quite logically, we could replace (5.6) by the average distortion constraint X P ∗ (x)d(x, g(f (x))) ≤ ∆. x∈X N
However, for the discrete memoryless source models this approach brings us to the same performance limit, as it is noticed in Section 2.2 of the book [51]. We study the achievability of a coding rate involving, in addition to the distortion criterion, the requirement that the error probability exponentially decreases with given reliability E as N → ∞. Definition 5.2. A number R ≥ 0 is called (E, ∆)-achievable rate for a given PD P ∗ , E > 0 and ∆ ≥ 0 if for every ε > 0, δ > 0 and sufficiently large N there exists a code (f, g) such that 1 log L(N ) ≤ R + ε N
(5.7)
and the error probability is exponentially small: e(f, g, P ∗ , ∆, N ) ≤ exp{−N (E − δ)}.
(5.8)
5.2 The Rate-Reliability-Distortion Function
81
It is clear that if R is (E, ∆)-achievable then any real number larger than R is also (E, ∆)-achievable. Denote by R(E, ∆, P ∗ ) the minimum (E, ∆)-achievable rate for given PD P ∗ and call it the rate-reliability-distortion function. Before the detailed consideration of the rate-reliability approach in source coding with fidelity criterion, we recall the result from investigation of the inverse function. Reliability-rate-distortion dependence was studied by Marton [171], who showed that if R > R(∆, P ∗ ) then there exist N -length block codes (f, g) such that with N growing the rate of the code approaches R: N −1 log L(N ) → R and the error probability e(f, g, P ∗ , ∆, N ) converges to zero exponentially with the exponent F (R, ∆, P ∗ ) =
inf
D(P k P ∗ ).
(5.9)
P :R(∆,P )>R
In addition, this is the optimal dependence between R and E, due to the following theorem. Theorem 5.1. (Marton [171]) Let d be a distortion measure on X × Xb and R < log |X |. Then there exist block codes (f, g) such that with growing N (N ≥ N0 (|X |, δ)) (i) N −1 log L(N ) → R, (ii) for every PD P ∗ on X , ∆ ≥ 0 and each δ > 0 N −1 log e(f, g, P ∗ , ∆, N ) ≤ −F (R, ∆, P ∗ ) + δ and, furthermore, for every coding procedure satisfying (i) lim N −1 log e(f, g, P ∗ , ∆, N ) ≥ −F (R, ∆, P ∗ ). N →∞
The properties of the error exponent function F (R, ∆, P ∗ ) are discussed in [171] and [9]. Particularly, Marton [171] questioned the continuity of that function in R, meanwhile Ahlswede [9] gave the full solution to this problem proving the discontinuity of F (R, ∆, P ∗ ) for general distortion measures other than the Hamming distance.
82
Source Coding Rates Subject to Fidelity and Reliability Criteria
We proceed with a discussion of the notion of (E, ∆)-achievability, interpreting some properties of the function R(E, ∆, P ∗ ). Lemma 5.2. (i) Every (E, ∆)-achievable rate is also ∆-achievable. (ii) R(E, ∆, P ∗ ) is a non-decreasing function in E (for each fixed ∆ ≥ 0). The proof is an evident application of Definitions 5.1 and 5.2. Theorem 5.3. For any PD P ∗ lim R(E, ∆, P ∗ ) = R(∆, P ∗ ).
E→0
(5.10)
Proof. It follows from Lemma 5.2 that for any fixed ∆ ≥ 0 and P ∗ for every 0 < E2 ≤ E1 the following inequalities hold: R(∆, P ∗ ) ≤ R(E2 , ∆, P ∗ ) ≤ R(E1 , ∆, P ∗ ). For E → 0 the sequence R(E, ∆, P ∗ ) is a monotonically non-increasing sequence lower bounded by R(∆, P ∗ ) and must therefore have a limit. We have to prove that it is R(∆, P ∗ ). According to Theorem 5.1 every R > R(∆, P ∗ ) is an (E, ∆)-achievable rate for E=
inf
D(P k P ∗ ).
P :R(∆,P )>R
If R − R(∆, P ∗ ) < δ then R(E, ∆, P ∗ ) − R(∆, P ∗ ) < δ, R(E, ∆, P ∗ ) ≤ R. Then the truthfulness of (5.10) follows.
since
In the next part of this section we introduce auxiliary notations and demonstrate Theorem 5.4 in which the analytical form of the ratereliability-distortion function is formulated [124]. Let M
Q = {Q(b x | x), x ∈ X , x b ∈ Xb}, be a conditional PD on Xb for given x.
5.2 The Rate-Reliability-Distortion Function
83
Consider the following set of distributions: α(E, P ∗ ) = {P : D(P k P ∗ ) ≤ E}.
(5.11) M
Let Q(P, ∆) be the set of all conditional PDs QP (b x | x) = QP , corresponding to the PD P , for which the following condition on expectation of distortion holds: X M b = P (x)QP (b x | x)d(x, x b) ≤ ∆. (5.12) EP,QP d(X, X) x, x b
The solution of the main problem of this section is presented in the following theorem. Theorem 5.4. For every E > 0, ∆ ≥ 0, R(E, ∆, P ∗ ) =
max
min
P ∈α(E,P ∗ ) QP ∈Q(P,∆)
b IP,QP (X ∧ X).
(5.13)
The conformity with the rate-distortion theory is stated by Corollary 5.1. When E → 0 we obtain the rate-distortion function [26, 51, 194]: R(∆, P ∗ ) =
min QP ∗ :EP ∗ ,Q
P
b ∗ d(X,X)≤∆
b I P ∗ ,QP ∗ (X ∧ X).
(5.14)
Then, another basic result in Shannon theory is obtained by taking the receiver requirement on distortion equal to zero. Corollary 5.2. When ∆ = 0 (provided that X ≡ Xb with d(x, x b) = 0 for x = x b and d(x, x b) = 1, otherwise, i.e., d is the Hamming measure), then R(0, P ∗ ) = HP ∗ (X),
(5.15)
since now only the unitary matrix satisfies the condition on the expectation under the minimum in (5.14). This is the solution to the lossless
84
Source Coding Rates Subject to Fidelity and Reliability Criteria
source coding problem invented by Shannon in [191], which sets up the crucial role of the entropy in information theory. Note that an equivalent to the representation (5.13) in terms of the rate-distortion function is the following one R(E, ∆, P ∗ ) =
max
P ∈α(E,P ∗ )
R(∆, P ).
(5.16)
Taking into account these two facts, namely (5.15) and (5.16), it is trivial to conclude about the minimum asymptotic rate sufficient for the source lossless (zero-distortion) transmission under the reliability requirement. Let us denote by R(E, P ∗ ) the special case of the rate-reliability-distortion function R(E, ∆, P ∗ ) for ∆ = 0 and the generic PD P ∗ , and call it the rate-reliability function in source coding. Corollary 5.3. For every E > 0 and a fixed PD P ∗ M
R(E, P ∗ ) = R(E, 0, P ∗ ) =
max
P ∈α(E,P ∗ )
HP (X).
(5.17)
Corollary 5.4. Inverse to R(E, P ∗ ) is the best error exponent function E(R, P ∗ ) =
min
D(P k P ∗ )
(5.18)
P :HP (X)≥R
specialized from (5.9) for the zero-distortion argument, bearing in mind the continuity of the function in R for Hamming distortion measures.
Corollary 5.5. As E → ∞, Theorem 5.4 implies that the minimum asymptotic rate R(∆) of coding of all vectors from X N , each being reconstructed within the required distortion threshold ∆ is R(∆) = max R(∆, P ). P ∈P(X )
(5.19)
This is the so-called “zero-error” rate-distortion function [49, 51]. It is worth to mention, that it is the same for any P ∗ .
5.3 Proofs, Covering Lemma
5.3
85
Proofs, Covering Lemma
To prove Theorem 5.4 we apply the method of types (see Section 1.4). The direct part of the proof we perform using the following random coding lemma about covering of types of vectors. Lemma 5.5. Let for ε > 0 b + ε)}. J(P, Q) = exp{N (IP,Q (X ∧ X)
(5.20)
Then for every type P and conditional type Q there exists a collection of vectors N b j = 1, J(P, Q)}, {b xj ∈ TP,Q (X),
such that the set N bj ), j = 1, J(P, Q)}, {TP,Q (X | x
covers TPN (X) for N large enough, that is J(P,Q)
TPN (X) ⊂
[
N TP,Q (X | x bj ).
j=1
Proof. Using the method of random selection, similarly as in the proof of Lemma 2.4.1 from [51], we show the existence of a covering for TPN (X). Let {Zj , j = 1, J(P, Q)} be a collection of i.i.d. RVs with values b Denote by ψ(x) the characteristic function of the compleon TPN (X). SJ(P,Q) N ment of random set j=1 TP,Q (X | Zj ): J(P,Q) S N (X | Z ), 1, if x ∈ / TP,Q M j ψ(x) = j=1 0, otherwise. It is sufficient to show that for N large enough a selection exists such that X Pr ψ(x) < 1 > 0, N x∈TP (X)
86
Source Coding Rates Subject to Fidelity and Reliability Criteria
since it is equivalent to the existence of the required covering. We have J(P,Q) X [ N / ψ(x) ≥ 1 ≤ TPN (X) Pr x ∈ TP,Q (X | Zj ) . Pr N j=1
x∈TP (X)
Taking into account the independence of RVs Zj , j = 1, Jk (P, Q) and the polynomial estimate for the number of conditional types (see Lemma 1.1) we have for N large enough J(P,Q) [ N Pr x ∈ / TP,Q (X | Zj ) j=1
N −1 J(P,Q) bj ) TPN (X) ≤ 1 − TP,Q (X | x b − |X ||Xb| log(N + 1) − N HP (X)})J(P,Q) ≤ (1 − exp{N HP,Q (X | X) b + ε/2)})J(P,Q) , ≤ (1 − exp{−N (IP,Q (X ∧ X) where ε ≥ N −1 |X ||Xb| log(N + 1). Applying the inequality (1 − t)s ≤ exp{−st} (which holds for every s and 0 < t < 1) for b + ε/2)}, and s = J(P, Q) t = exp{−N (IP,Q (X ∧ X) we continue the estimation X Pr ψ(x) ≥ 1 N x∈TP (X)
b + ε/2)}} ≤ exp{N HP (X)} exp{−J(P, Q) exp{−N (IP,Q (X ∧ X) = exp{N HP (X) − exp{N ε/2}}. Whence, when N is large enough, then X Pr ψ(x) ≥ 1 < 1. N x∈TP (X)
5.3 Proofs, Covering Lemma
87
Now we are ready to expose the proof of Theorem 5.4. Proof. The proof of the inequality R0 (E, ∆, P ∗ ) ≥ R(E, ∆, P ∗ )
(5.21)
is based upon the idea of “importance” of the source messages of such types P which are not farther than E (in the sense of divergence) from the generating PD P ∗ . Let us represent the set of all source messages of length N as a union of all disjoint types of vectors: [ XN = TPN (X). (5.22) P ∈PN (X )
Let δ be a positive number. For N large enough we can estimate the probability of appearance of the source sequences of types beyond α(E + δ, P ∗ ): X [ P ∗N TPN (X) = P ∗N (TPN (X)) ∗) P ∈α(E+δ,P /
≤ (N + 1)
|X |
∗) P ∈α(E+δ,P /
exp −N
min
∗) P ∈α(E+δ,P /
D(P k P ) ∗
≤ exp{−N E − N δ + |X | log(N + 1)} ≤ exp{−N (E + δ/2)}. (5.23) Here the first and the second inequalities follow from the well known properties of types and the definition of the set α(E, P ∗ ). Consequently, in order to obtain the desired level for the error probability e(f, g, P ∗ , ∆, N ), it is sufficient to construct the “good” encoding function for the vectors with types P from α(E + δ, P ∗ ). Let us pick some type P ∈ α(E + δ, P ∗ ) and some QP ∈ Q(P, ∆). Let [ N N b bj 0 ), j = 1, J(P, QP ). C(P, QP , j) = TP,Q (X | x ) − TP,Q (X | x j P P j 0 <j
We define a code (f, g) for the message vectors of the type P with the encoding: ( j, when x ∈ C(P, QP , j), P ∈ α(E + δ, P ∗ ), f (x) = j0 , when x ∈ TPN (X), P ∈ / α(E + δ, P ∗ ),
88
Source Coding Rates Subject to Fidelity and Reliability Criteria
and the decoding: bj , g(j) = x
b0 , g(j0 ) = x
b0 is a fixed reconstruction vector. So, where j0 is a fixed number and x it is not difficult to see that in our coding scheme an error occurs only when j0 was sent by the coder. According to the definition of the code (f, g), to Lemma 5.5 and the inequality (5.12) we have for P ∈ α(E + δ, P ∗ ) X bj ) = N −1 bj )d(x, x d(x, x n(x, x b | x, x b) x,x b
=
X x,x b
P (x)QP (b x | x)d(x, x b)
b ≤ ∆, = EP,QP d(X, X)
j = 1, J(P, QP ).
For a fixed type P and a corresponding conditional type QP the number b used in the encoding, denoted by LP,QP (N ), is of vectors x b + ε)}. LP,QP (N ) = exp{N (IP,QP (X ∧ X) The “worst” types among α(E + δ, P ∗ ), the number of which has a polynomial estimate, and their optimal (among Q(P, ∆)) conditional distributions QP determine corresponding bound for the transmission rate: N −1 log LP,QP (N ) − ε − N −1 |X | log(N + 1) ≤
max
min
P ∈α(E+δ,P ∗ ) QP ∈Q(P,∆)
b IP,QP (X ∧ X).
(5.24)
Taking into account the arbitrariness of ε and δ, the continuity of the information expression (5.24) in E, we get (5.21). Now we pass to demonstration of the inverse inequality R0 (E, ∆, P ∗ ) ≤ R(E, ∆, P ∗ ).
(5.25)
Let ε > 0 be fixed. Consider a code (f, g) for each blocklength N with (E, ∆)-achievable rate R. It is necessary to show that for some QP (b x | x) ∈ Q(P, ∆) the following inequality holds N −1 log L(N ) + ε ≥ for N large enough.
max
P ∈α(E,P ∗ )
b IP,QP (X ∧ X).
(5.26)
5.3 Proofs, Covering Lemma
89
Let the complement of the set A be A0 , i.e., X N − A = A0 . We can write down: \ \ A TPN (X) = TPN (X) − A0 TPN (X) . For P ∈ α(E − ε, P ∗ ), provided N is large enough, the following estimates hold: T \ P ∗N (A0 TPN (X)) 0 N TP (X) = A P ∗N (x) ≤ exp{N (HP (X) + D(P k P ∗ ))} exp{−N (E − )} ≤ exp{N (HP (X) − ε)}. Whence \ A TPN (X) ≥ (N + 1)−|X | exp{N HP (X)} − exp{N (HP (X) − ε)} exp{N ε} = exp{N (HP (X) − ε)} −1 (N + 1)|X | ≥ exp{N (HP (X) − ε)}
(5.27)
for N sufficiently large. T b corresponds, such that To each x ∈ A TPN (X) a unique vector x b = g(f (b x x)). This vector determines a conditional type Q, for which N b | x). b ∈ TP,Q x (X
b = d(x, x b) ≤ ∆. So Q ∈ Q(P, ∆). Since x ∈ A, then EP,Q d(X, X) T N The set of all vectors x ∈ A TP (X) is divided into classes corresponding to these conditional types Q. Let us select the class having maximum cardinality for given P and denote it by \ A TPN (X) . QP
Using the polynomial upper estimate (Lemma 1.1) for conditional types Q, we have for N sufficiently large \ \ N |X | N A TP (X) ≤ (N + 1) A TP (X)
the number of
QP \ N . ≤ exp{N ε/2} A TP (X) QP
(5.28)
90
Source Coding Rates Subject to Fidelity and Reliability Criteria
b, such that g(f (x)) = x b for some Let D be the set of all vectors x \ \ N b). x ∈ A TPN (X) TP,Q (X | x P In accordance with the definition of the code |D| ≤ L(N ). Then \ A TPN (X) QP
≤
X N b TP,Q b) ≤ L(N ) exp{N HP,QP (X | X)}. (X | x P
b∈D x
From the last inequality, (5.27) and (5.28) it is not difficult to arrive to the inequality b − ε)} L(N ) ≥ exp{N (IP,QP (X ∧ X) for any P ∈ α(E − ε, P ∗ ). This affirms the inequality (5.26) and hence (5.25).
5.4
Some Properties and Binary Hamming Rate-Reliability-Distortion Function
Now we formulate some basic properties of R(E, ∆, P ∗ ) and specify it for an important class of sources. In [9] Ahlswede showed that despite the continuity of the Marton’s exponent function F (P ∗ , R, ∆) in R (where R(∆, P ∗ ) < R < R(∆)) for Hamming distortion measures, in general, it is discontinuous in R. In contrast to this fact we have the following result. Lemma 5.6. R(E, ∆, P ∗ ) is continuous in E. 0
Proof. First note that α(E, P ∗ ) is a convex set, that is if P ∈ α(E, P ∗ ) 00 and P ∈ α(E, P ∗ ), then 0
00
D(λP + (1 − λ)P k P ∗ ) ∈ α(E, P ∗ ), because 0
00
0
00
D(λP + (1 − λ)P k P ∗ ) ≤ λD(P k P ∗ ) + (1 − λ)D(P k P ∗ ) ≤ λE + (1 − λ)E = E,
5.4 Binary Hamming Rate-Reliability-Distortion Function
91
due to the concavity of D(P k P ∗ ) in the pair (P, P ∗ ), which means that it is a concave function in P for a fixed P ∗ . Hence, taking into account the continuity of R(∆, P ∗ ) in P ∗ , from (5.16) the continuity of R(E, ∆, P ∗ ) in E follows. It is well known [51] that the rate-distortion function is a nonincreasing and convex function in ∆. Let us see that the property of convexity in ∆ is not disturbed for the rate-reliability-distortion function. Lemma 5.7. R(E, ∆, P ∗ ) is a convex function in ∆. Proof. For fixed E let the points (∆1 , R1 ) and (∆2 , R2 ) belong to the curve of R(E, ∆, P ∗ ) and ∆1 ≤ ∆2 . We shall prove that for every λ from (0, 1), R(E, λ∆1 + (1 − λ)∆2 , P ∗ ) ≤ λR(E, ∆1 , P ∗ ) + (1 − λ)R(E, ∆2 , P ∗ ). Consider for any fixed PD P ∗ the rate-distortion function (5.14). Using the fact that the rate-distortion function R(∆, P ∗ ) is a convex function in ∆, one can readily deduce R(E, λ∆1 + (1 − λ)∆2 , P ∗ ) = ≤
max
R(λ∆1 + (1 − λ)∆2 , P )
max
(λR(∆1 , P ) + (1 − λ)R(∆2 , P ))
P ∈α(E,P ∗ ) P ∈α(E,P ∗ )
≤λ
max
P ∈α(E,P ∗ )
R(∆1 , P ) + (1 − λ)
max
P ∈α(E,P ∗ )
R(∆2 , P )
= λR(E, ∆1 , P ∗ ) + (1 − λ)R(E, ∆2 , P ∗ ).
In the sequel elaborating the binary Hamming rate-reliabilitydistortion function we conclude that R(E, ∆, P ∗ ) is not convex in E. However, before dealing with that example, it is relevant also to draw attention to the properties of the rate-reliability function R(E, P ∗ ), especially in the argument E. In this regard we have the following lemma.
92
Source Coding Rates Subject to Fidelity and Reliability Criteria
Lemma 5.8. R(E, P ∗ ) is concave in E for a fixed PD P ∗ . Proof. In (5.17) the maximization is taken over the convex set α(E, P ∗ ). From the concavity of HP (X) in P , it follows that the maximum is attained at the boundaries of α(E, P ∗ ), that is at those P for which D(P k P ∗ ) = E, unless the equiprobable PD (1/|X |, 1/|X |, . . . , 1/|X |) ∈ / ∗ α(E, P ), where the entropy attains its maximum value. Let E1 and E2 be arbitrary values from the definitional domain of R(E, P ∗ ), i.e., Ei ∈ (0, ∞), with 0 < E1 < E2 , i = 1, 2. And let R(E1 , P ∗ ) = HPE1 (X),
(5.29)
R(E2 , P ∗ ) = HPE2 (X)
(5.30)
and
be the values of the rate-reliability function for E1 and E2 , respectively, where PEi , i = 1, 2, are those PDs that maximize the entropy in (5.17) over α(Ei , P ∗ ), correspondingly, and therefore satisfy the condition D(PEi k P ∗ ) = Ei . For a real λ, 0 < λ < 1, taking into account (5.29) and (5.30), the following chain of reasonings yields the desirable result. R(λE1 + (1 − λ)E2 , P ∗ ) =
max
P ∈α(λE1 +(1−λ)E2 ,P ∗ )
HP (X)
= HPλE1 +(1−λ)E2 (X) (a)
≥ HλPE1 +(1−λ)PE2 (X) (b)
≥ λHPE1 (X) + (1 − λ)HPE2 (X) = λR(E1 , P ∗ ) + (1 − λ)R(E2 , P ∗ ), where D(PλE1 +(1−λ)E2 k P ∗ ) = λE1 + (1 − λ)E2 and the inequality (a) follows from the inequality D(λPE1 + (1 − λ)PE2 k P ∗ ) ≤ λD(PE1 k P ∗ ) + (1 − λ)D(PE2 k P ∗ ) = λE1 + (1 − λ)E2 . and the inequality (b) follows from the concavity of the entropy. The lemma is proved.
5.4 Binary Hamming Rate-Reliability-Distortion Function
93
Now let us turn to the consideration of an example promised above. For the binary source X with X = {0, 1}, PD P ∗ = {p∗ , 1 − p∗ } and Hamming distance 0, if x = x b, M d(x, x b) = 1, if x 6= x b, we denote the rate-distortion function and the rate-reliability-distortion function by RBH (∆, P ∗ ) and RBH (E, ∆, P ∗ ), respectively. Let H∆ (X), 0 ≤ ∆ ≤ 1, be the following binary entropy function M
H∆ (X) = −∆ log ∆ − (1 − ∆) log(1 − ∆). It is known (see [26, 48]) that ( HP ∗ (X) − H∆ (X), RBH (∆, P ∗ ) = 0,
0 ≤ ∆ ≤ min{p∗ , 1 − p∗ }, ∆ > min{p∗ , 1 − p∗ }. (5.31)
Using the following shortening, M
pmax =
max
P ∈α(E,P ∗ )
min{p, 1 − p},
the result below describes the binary Hamming rate-reliabilitydistortion function analytically [114]. Theorem 5.9. For every E > 0 and ∆ ≥ 0, ∗ / [α , α ], ∆ ≤ p 1 2 max , HPE (X) − H∆ (X), if p ∈ ∗ ∗ RBH (E, ∆, P ) = 1 − H∆ (X), if p ∈ [α1 , α2 ], ∆ ≤ pmax , 0, if ∆ > pmax , (5.32) where " # √ √ 2E − 22E − 1 2E + 22E − 1 M [α1 , α2 ] = , 2E+1 2E+1 1 p p 1 −2E −2E = 1− 1−2 , 1+ 1−2 2 2 and PE = {pE , 1 − pE } is a PD on X that results in D(PE k P ∗ ) = E.
94
Source Coding Rates Subject to Fidelity and Reliability Criteria
Proof. From (5.16) and (5.31) it is apparent that ( maxP ∈α(E,P ∗ ) (HP (X) − H∆ (X)), ∆ ≤ pmax , RBH (E, ∆, P ∗ ) = 0, ∆ > pmax . Let 0 ≤ ∆ ≤ pmax . We need to simplify the expression max
P ∈α(E,P ∗ )
(HP (X) − H∆ (X)) =
max
P ∈α(E,P ∗ )
HP (X) − H∆ (X).
Note that if PD {1/2, 1/2} ∈ α(E, P ∗ ) then max
P ∈α(E,P ∗ )
which holds when
1 2
HP (X) = 1,
1 ≤ E or log 2p1∗ + log 2(1−p ∗) p∗ (1 − p∗ ) ≥ 2−2(E+1) .
(5.33)
The condition (5.33) may be rewritten as a quadratic inequality p∗2 − p∗ + 2−2(E+1) ≤ 0, which is equivalent to p∗ ∈ [α1 , α2 ]. Consequently, the value of RBH (E, ∆, P ∗ ) is constant and equals to 1 − H∆ (X) for all PDs from [α1 , α2 ], which with E → 0 tends to the segment [0, 1]. Now consider the case when p∗ ∈ / [α1 , α2 ]. We show that max
P ∈α(E,P ∗ )
HP (X) = HPE (X),
(5.34)
where PE = {pE , 1 − pE } is the solution of D(PE k P ∗ ) = E, assuming pE the value nearest to 1/2. The equation (5.34) will be true by the following argument. Lemma 5.10. The function D(P k P ∗ ) = p log
p 1−p + (1 − p) log p∗ 1 − p∗
is a monotone function of p for P ∈ α(E, P ∗ ) with p belonging to min{p∗ , 1 − p∗ }, 12 .
5.4 Binary Hamming Rate-Reliability-Distortion Function
95
Proof. Let P1 = {p1 , 1 − p1 } and P2 = {p2 , 1 − p2 } be some binary PDs from α(E, P ∗ ) and p∗ ≤ p1 ≤ p2 ≤ 12 . It is required to prove the inequality D(P2 k P ∗ ) ≥ D(P1 k P ∗ ). Since the set α(E, P ∗ ) is convex, then we can represent P1 = λP ∗ + (1 − λ)P2 (for some 0 < λ < 1) and as in the proof of Lemma 5.6 we obtain D(P1 k P ∗ ) ≤ D(λP ∗ + (1 − λ)P2 k P ∗ ) ≤ λD(P ∗ k P ∗ ) + (1 − λ)D(P2 k P ∗ ) = (1 − λ)D(P2 k P ∗ ) ≤ D(P2 k P ∗ ).
Therefore, the lemma is proved (at the same time proving that pmax = pE ) and (5.34) is obtained, which gives us (5.32). An interesting and important fact to observe is that Theorem 5.9 and (5.32) show that for every binary PD P ∗ and distortion level ∆ ≥ 0 there exists a reliability value Emax > 0 at which RBH (E, ∆, P ∗ ) reaches its possible maximum value 1 − H∆ (X) and remains constant afterwards. It means that there is a reasonable demand of the receiver on the reliability. Higher requirements on it can be satisfied asymptotically by the same code rate. This constant is the value of the rate-distortion function for binary equiprobable source and Hamming distortion measure with condition ∆ ≤ min{p∗ , 1 − p∗ }. Equation (5.19) arguments our observation. Some calculations and pictorial illustrations of the rate-reliabilitydistortion function for the case considered in this section and more general ones can be found in the next section, where we discuss multiterminal configurations. In particular, Figures 6.2 and 6.6 represent typical plots for RBH (E, ∆, P ∗ ). Theorem 5.11. R(E, ∆, P ∗ ) is not convex in E. Proof. The statement immediately follows from the characterization of the rate-reliability-distortion function for the binary Hamming case (5.32): for any P ∗ one can easily choose a distortion level ∆ > pE such that RBH (E, ∆, P ∗ ) be constantly zero on some interval of values of the
96
Source Coding Rates Subject to Fidelity and Reliability Criteria
reliability E. The corresponding graphical representations in Section 6.1 will add to this logic a visual clearness. The following notice introduces a highly pertinent issue to the convexity analysis in rate-reliability-distortion area. Remark 5.1. The time-sharing argument, that very efficient theoretical tool in rate-distortion theory, fails in rate-reliability-distortion one. In other words, it is not true that “the convex combinations of two points on (E, ∆)-achievable rates’ hyperplane is (E, ∆)-achievable.” Figure 6.6 stands for an one-dimensional illustration of that claim. Nevertheless, the concavity of the binary Hamming rate-reliabilitydistortion function RBH (E, ∆, P ∗ ) in the reliability argument on the interval of its positiveness can be proved. That interval, under certain conditions, can coincide with the whole domain (0, ∞) of definition of this function. Let Einf (∆) be the infimum of reliability values E for which ∆ ≤ pE for given P ∗ and distortion level ∆ ≥ 0. Lemma 5.12. RBH (E, ∆, P ∗ ) is concave in E on interval of its positiveness (Einf (∆), ∞). Proof. First note that the condition ∆ ≤ pE provides the positiveness of the function RBH (E, ∆, P ∗ ) on (Einf (∆), ∞). For a fixed ∆ ≥ 0 and P ∗ there exists a value Emax of the reliability such that if E ≥ Emax , then RBH (E, ∆, P ∗ ) is constant and equals to 1 − H∆ (X). Since 1 − H∆ (X) is the maximal value of the binary Hamming rate-reliability-distortion function, it remains to prove the concavity of RBH (E, ∆, P ∗ ) on the interval (Einf (∆), Emax ]. Let 0 < E1 < E2 and Ei ∈ (Einf (∆), Emax ], i = 1, 2. It follows from (5.32) that RBH (E1 , ∆, P ) = HPE1 (X) − H∆ (X), RBH (E2 , ∆, P ) = HPE2 (X) − H∆ (X),
5.5 Reliability Criterion in AVS Coding
97
where D(PE1 k P ∗ ) = E1 , D(PE2 k P ∗ ) = E2 . M Putting Eλ = λE1 + (1 − λ)E2 , the proof looks almost identical to that of Lemma 5.8, videlicet, for 0 < λ < 1 we have RBH (Eλ , ∆, P ∗ ) =
max
P ∈α(Eλ ,P ∗ )
HP (X) − H∆ (X)
= HPEλ (X) − H∆ (X) (a)
≥ HλPE1 +(1−λ)PE2 (X) − H∆ (X) (b)
≥ λHPE1 (X) + (1 − λ)HPE2 (X) − H∆ (X) = λHPE1 (X) + (1 − λ)HPE2 (X) − λH∆ (X) − (1 − λ)H∆ (X) = λ(HPE1 (X) − H∆ (X)) + (1 − λ) × (HPE2 (X) − H∆ (X)) = λRBH (E1 , ∆, P ∗ ) + (1 − λ)RBH (E2 , ∆, P ∗ ), where again D(PEλ k P ∗ ) = Eλ and the inequalities (a) and (b) are valid due to the same arguments as those in the proof of Lemma 5.8. Note that, in particular, Einf (∆) can take on the value 0, providing the concavity of the binary Hamming rate-reliability-distortion function RBH (E, ∆, P ∗ ) on the whole domain (0, ∞). The explicit form (5.32) of the function allows us to conclude that it always holds when RBH (∆, P ∗ ) > 0. And when RBH (∆, P ∗ ) = 0 it holds under the condition RBH (E, ∆, P ∗ ) > RBH (∆, P ∗ ) for all values of E from (0, ∞). We will turn to some illustrations on this point in an example concerning the robust descriptions system elaborated in Section 6.1 of the next chapter.
5.5
Reliability Criterion in AVS Coding
The subject of the section is the error exponent criterion in lossy coding for a more general class of sources — arbitrarily varying sources (AVS). The paper [142] by Harutyunyan and Han Vinck solves the problem of specifying the AVS rate-reliability-distortion function and its dual AVS
98
Source Coding Rates Subject to Fidelity and Reliability Criteria
maximum error exponent one in the setting of Section 5.2. The formulas derived there constitute more general results implying the main formulas discussed in the previous sections of this review. Particularly, in zero-distortion case, these formulas specialize the ones derived by Fu and Shen [77] from their hypothesis testing analysis for AVS. The model of AVS is more general than that of DMS. In the first one, the source outputs distribution depends on the source state. The latter varies within a finite set from one time instant to the next in an arbitrary manner. Let X and S be finite sets, X assigned for the source alphabet and S for the source states, and let M
P = {Ps , s ∈ S} be a family of discrete PDs M
Ps = {P (x | s) : x ∈ X } on X . The probability of an x ∈ X N subject to a consecutive source states s is determined by M
P N (x | s) =
N Y
P (xi | si ).
n=1
Therefore, the AVS defined by P is a sequence of RVs {Xi }∞ i=1 for N which the PD of N -length RV X is an unknown element from P N , N th Cartesian product of P. Quite apparently, at each source state s ∈ S the AVS is a DMS with PD Ps . The problem statement for the AVS coding subject to fidelity and reliability criteria is totally identical to the DMS coding problem under the same constraints, with only difference in the source model. That’s why bear in mind those details explained in Section 5.2, such as the definitions of a distortion measure, a code (f, g), we just characterize the latter’s probability of error as M
e(f, g, ∆) = 1 − min P N (A | s), s∈S N
when assuming the source restoration within a ∆ threshold. The further concepts of (E, ∆)-achievable rates and the AVS rate-reliabilitydistortion function based upon the definition of e(f, g, ∆) are the analogies of DMS ones.
5.5 Reliability Criterion in AVS Coding
99
A wide range of problems on the subject of the AVS coding in zerodistortion case are treated and solved in Ahlswede’s papers [5] and [6]. The AVS coding problem under distortion criterion is also known as the Berger source coding game [27]. We reclaim the classical result [51] (Theorem 2.4.3) on the AVS rate-distortion function R(∆) in the following theorem. Theorem 5.13. Let the AVS be given by the family P. For any ∆ ≥ 0, R(∆) = max R(∆, P ), ¯ P ∈P
(5.35)
where P¯ is the convex hull of P, ( ) X X M λs Ps , 0 ≤ λs ≤ 1, λs = 1 . P¯ = s∈S
s∈S
In what follows, we sum up the results from [142] on the error exponent analysis for AVS coding without proofs. [142] sets up the AVS rate-reliability-distortion function R(E, ∆). Theorem 5.14. Let the AVS be given by the family P. For any E > 0 and ∆ ≥ 0, R(E, ∆) = max R(E, ∆, P ), ¯ P ∈P
(5.36)
or, equivalently, R(E, ∆) = max
max
¯ Q:D(QkP )≤E P ∈P
R(∆, Q).
(5.37)
(5.36) is the first general formula which implies several ones afterwards. Apparently, R(∆, P ) is easily derivable. Corollary 5.6. As E → 0, R(E, ∆, P ) → R(∆, P ), and the AVS ratedistortion function (5.35) obtains from (5.36). The formula for the AVS maximum error exponent function E(R, ∆) inverse to R(E, ∆) is the second general result.
100
Source Coding Rates Subject to Fidelity and Reliability Criteria
Theorem 5.15. For every ∆ ≥ 0, the best asymptotic error exponent E(R, ∆) for AVS coding is given by the formula (cf. the DMS version (5.9)) E(R, ∆) = inf
inf
¯ Q:R(∆,Q)>R P ∈P
D(Q k P )
(5.38)
with the assumption max R(∆, P ) < R ≤ log |X | . ¯ P ∈P
In case of ∆ = 0, another impact of Theorem 5.14 on the theory is for the AVS rate-reliability function RAVS (E) (cf. the DMS version (5.17)) for given exponent E. Theorem 5.16. For every E > 0, RAVS (E) = max
max
¯ Q:D(QkP )≤E P ∈P
HQ (X).
(5.39)
The formula (5.39) represents the analytics of the E-optimal rate function for the AVS coding obtained by Fu and Shen [77] while treating the hypothesis testing problem for AVS with exponential-type constraint. The counterpart of (5.39), also derived in [77], is the maximum error exponent EAVS (R) in AVS coding which directly follows from Theorem 5.15. Theorem 5.17. EAVS (R) = min
min
¯ Q:HQ (X)≥R P ∈P
D(Q k P )
for max HP (X ) < R ≤ log |X | . ¯ P ∈P
The formula (5.40) is easily verifiable from (5.38) and (5.18).
(5.40)
5.5 Reliability Criterion in AVS Coding
101
Corollary 5.7. As E → 0 and ∆ = 0, the asymptotically optimal lossless coding rate RAVS for AVS (cf. also [5]) from (5.36) is given by RAVS = max HP (X). ¯ P ∈P
Corollary 5.8. R(E, ∆) and R(E, ∆, P ) have the same limit, namely ¯ the zero-error rate-distortion function R(∆) (see [51]) as E → ∞. In particular, (5.36) or (5.37) implies ¯ R(∆) = max R(∆, P ). P ∈P(X )
These were the main issues concerning the AVS coding that we aimed to survey here. Along the aforecited claims one might convinced of the generality of the results on the subject of source coding under distortion and error exponent criteria. They require an effortless specialization for the DMS models that has been studying in the field more intensively. Example 5.1. Let X be a binary Hamming AVS changing its distribution law at each time stamp in an arbitrary “-floating” manner. M This model assumes that X follows either the PD P1 = {p, 1 − p} or M P2 = {p + , 1 − p − } as a matter of state at which it stays. The ratereliability-distortion function RBH (E, ∆) of this kind of sources can be readily derived from Theorem 5.14 in view of Theorem 5.9. For every E > 0 and ∆ ≥ 0 max HQE (X) − H∆ (X), if (a) Q: p≤q≤p+, D(QE kQ)=E RBH (E, ∆, ) = 1 − H∆ (X), if (b) 0, if (c) where (a) [p, p + ] ∩ [α1 , α2 ] = ∅, ∆ ≤ pmax (), (b) [p, p + ] ∩ [α1 , α2 ] 6= ∅, ∆ ≤ pmax (), (c) ∆ > pmax (),
102
Source Coding Rates Subject to Fidelity and Reliability Criteria M
M
with Q = {q, 1 − q} and QE = {qE , 1 − qE } be binary PD’s on source alphabet X , M
pmax () = M
max
max
Q: p≤q≤p+ Q0 : D(Q0 kQ)≤E
min{q 0 , 1 − q 0 }
and Q0 = {q 0 , 1 − q 0 } be another binary PD on X .
6 Reliability Criterion in Multiterminal Source Coding
6.1
Robust Descriptions System
We illustrate the rate-reliability approach in source coding by analysis of a multiterminal system. The rate-distortion solution for this system has a simple characterization, which follows from the result for the ordinary rate-distortion function. But the problem with the reliability criterion has a non-trivial solution and interesting specific nuances. We consider a source coding problem with fidelity and reliability criteria for the robust descriptions system with one encoder and many decoders. The rate-reliability-distortion function will be specified. An example will be given showing a distinction in calculation of the ratereliability-distortion function and the rate-distortion one. The original elaboration was published in [121]. A model of multiple descriptions, earlier studied by El Gamal and Cover [67] and named robust descriptions system, is presented in Figure 6.1. Some models of multiple description systems have been considered also in [7, 26, 29, 30, 67, 123, 140, 170, 220, 226]. Messages of a DMS encoded by one encoder must be transmitted to K different receivers. Each of them, based upon the same codeword, 103
104
Reliability Criterion in Multiterminal Source Coding
Fig. 6.1 Robust descriptions system.
has to restore the original message within a distortion with a reliability both acceptable to him/her. As in Section 5 the source X is defined by a sequence {Xi }∞ i=1 of discrete i.i.d. RVs taking values in the finite alphabet X . The source generic PD P ∗ and N -length vector probabilities are defined according to (5.1) and (5.2). The finite sets X k , k = 1, K, in general different from X and from each others, are the reproduction alphabets of the corresponding receivers. Let dk : X × X k → [0; ∞) be a single-letter distortion criterion between the source and kth reconstruction alphabet like in (5.3), k = 1, K. The distortion measures for sequences x ∈ X N and xk ∈ X kN of length N are averaged by the components’ distortions M
dk (x, xk ) =
N 1 X k d (xn , xkn ), N
k = 1, K.
n=1
M
A code (f, g) = (f, g 1 , g 2 , . . . , g K ) is a family of (K + 1) mappings: one encoder f : X N → {1, 2, . . . , L(N )} that feeds K separate decoders g k : {1, 2, . . . , L(N )} → X kN , k = 1, K. The system designer looks for an appropriate coding strategy to guarantee the recovery of the source messages for each addressee k, making satisfied the latter’s requirement on distortion not more than ∆k with reliability E k , k = 1, K.
6.1 Robust Descriptions System
105
For given distortion levels ∆k ≥ 0, k = 1, K we consider the sets: M
Ak = {x : g k (f (x)) = xk , dk (x, xk ) ≤ ∆k },
k = 1, K,
and the error probabilities of the code (f, g): M
ek (f, g k , P ∗ , ∆k , N ) = 1 − P ∗N (Ak ),
k = 1, K.
M
M
For brevity we denote E = (E 1 , . . . , E K ) and ∆ = (∆1 , . . . , ∆K ). Now similarly to the conditions (5.5) and (5.6) we define the achievability of a rate by reliability criterion and rate-reliabilitydistortion dependence for this multiterminal system. A number R ≥ 0 is called (E, ∆)-achievable rate for E k > 0, ∆k ≥ 0, k = 1, K, if for every ε > 0, δ > 0, and sufficiently large N there exists a code (f, g) such that 1 log L(N ) ≤ R + ε, N ek (f, g k , P ∗ , ∆k , N ) ≤ exp{−N (E k − δ)},
k = 1, K.
Denote the rate-reliability-distortion function for this system by R(E, ∆, P ∗ ), it is the minimum of the (E, ∆)-achievable rates for all components of E and ∆. Let R(∆, P ∗ ) be the corresponding ratedistortion function, which was specified by El Gamal and Cover in [67]. R(∆, P ∗ ) is the limit function of R(E, ∆, P ∗ ) as E k → 0, k = 1, K. We specify the rate-reliability-distortion function [115, 121]. We give also an example of the calculation of the rate-reliability-distortion function for the case of binary source with two receivers and Hamming distortion measures. This example demonstrates some peculiarity of rate-reliability-distortion function as a function of E and ∆ for different particular cases. It is shown that in contradistinction to the case of the rate-distortion function, where the smallest distortion component is decisive, for the rate-reliability-distortion one the decisive element may be the greatest reliability component. Some possible curves for R(E, ∆, P ∗ ) are illustrated in Figures 6.2–6.6. Now we formulate the result. Let P = {P (x), x ∈ X } be a PD on X and Q = {Q(x1 , . . . , xK | x), x ∈ X , xk ∈ X k , k = 1, K},
106
Reliability Criterion in Multiterminal Source Coding
Fig. 6.2 (a) RBH (E 1 , ∆1 , P ∗ ) for p∗ = 0.15, ∆1 = 0.1, (b) for p∗ = 0.15, ∆1 = 0.3.
be a conditional PD on X 1 × · · · × X K for given x. We shall also use the marginal conditional PDs X M Q(xk | x) = Q(x1 , . . . , xK | x), k = 1, K. xj ∈X j , j=1,K, j6=k
Assume without loss of generality that 0 < E1 ≤ E2 ≤ · · · ≤ EK .
6.1 Robust Descriptions System
107
Fig. 6.3 RBH (E 1 , ∆1 , P ∗ ) as a function of E 1 and ∆1 .
Fig. 6.4 RBH (E, ∆, P ∗ ) as a function of ∆1 and ∆2 for P ∗ = (0.15, 0.85), E 1 = 0.09, E 2 = 0.49.
Remark that then α(E k , P ∗ ) ⊆ α(E k+1 , P ∗ ), k = 1, K − 1. (For definition of α(E k , P ∗ ) see (5.11).) Let for each j ∈ {1, 2, . . . , K} the set Qj (P, ∆) be the collection of those conditional PDs QP (x1 , . . . , xK | x) for which the given P satisfies the following conditions: EP,QP dk (X, X k ) =
X x, xk
P (x)QP (xk | x)dk (x, xk ) ≤ ∆k ,
k = j, K.
108
Reliability Criterion in Multiterminal Source Coding
Fig. 6.5 RBH (E, ∆, P ∗ ) as a function of E 1 and E 2 for P ∗ = (0.15, 0.85), ∆1 = 0.1, ∆2 = 0.13.
Fig. 6.6 RBH (E, ∆, P ∗ ) for P ∗ = (0.15, 0.85), ∆1 = 0.1, ∆2 = 0.13, E 1 = 0.09.
The function R∗ (E, ∆, P ∗ ) determined as follows: M
R∗ (E, ∆, P ∗ ) = max max
h
max
min
P ∈α(E 1 ,P ∗ ) QP ∈Q1 (P,∆)
IP,QP (X ∧ X 2 , . . . , X K ), · · · i min IP,QP (X ∧ X K ) .
min
P ∈α(E 2 ,P ∗ )−α(E 1 ,P ∗ ) QP ∈Q2 (P,∆)
max
IP,QP (X ∧ X 1 , . . . , X K ),
P ∈α(E K ,P ∗ )−α(E K−1 ,P ∗ ) QP ∈QK (P,∆)
6.1 Robust Descriptions System
109
helps to state the rate-reliability-distortion function for the robust descriptions problem. Theorem 6.1. For every E k > 0, ∆k ≥ 0, k = 1, K, R(E, ∆, P ∗ ) = R∗ (E, ∆, P ∗ ).
Corollary 6.1. When E k = E, k = 1, K, R(E, ∆, P ∗ ) =
max
min
P ∈α(E,P ∗ ) QP ∈Q1 (P,∆)
IP,QP (X ∧ X 1 , . . . , X K ).
Corollary 6.2. (El Gamal and Cover [67].) As E → 0, we obtain the rate-distortion function R(∆, P ∗ ) =
min
QP ∗ ∈Q1 (P ∗ ,∆)
IP ∗ ,QP ∗ (X ∧ X 1 , . . . , X K ).
The proof of Theorem 6.1 may be found in [121], it is a natural extension of the proof for Theorem 5.4. The direct part of that theorem is proved by constructing hierarchical type coverings applying Lemma 5.5. Exploration of an Example. We specify the rate-reliability-distortion function R(E, ∆) for the robust descriptions system of Figure 6.1 for the case of binary source with K = 2 and Hamming distortion measures. The binary source is characterized by the alphabets X = X 1 = X 2 = {0, 1}, with generic PD P ∗ = (p∗ , 1 − p∗ ) and Hamming distortion measures ( ( 0, for x = x1 , 0, for x = x2 , 1 1 2 2 d (x, x ) = d (x, x ) = 1, for x 6= x1 , 1, for x 6= x2 . The binary Hamming rate-reliability-distortion function is denoted by RBH (E, ∆, P ∗ ) and by RBH (∆, P ∗ ) — the binary Hamming ratedistortion one for the robust descriptions [121]. They are the extensions of the corresponding functions treated in Section 5.4.
110
Reliability Criterion in Multiterminal Source Coding
For E 1 such that (1/2, 1/2) ∈ α(E 1 , P ∗ ) Theorem 5.9 results in the following reduction ( 1 − H∆1 (X), when ∆1 ≤ 1/2, 1 1 ∗ RBH (E , ∆ , P ) = (6.1) 0, when ∆1 > 1/2. Let PE 1 = (pE 1 , 1 − pE 1 ) where pE 1 is the nearest to 1/2 solution of the equation D(PE 1 k P ∗ ) = E 1 . PE 2 is defined analogously. For E 1 such that (1/2, 1/2) ∈ / α(E 1 , P ∗ ) HPE1 (X) − H∆1 (X), when ∆1 ≤ pE 1 , 1 1 ∗ RBH (E , ∆ , P ) = 0, when ∆1 > pE 1 .
(6.2)
As we have seen in Section 5, RBH (E 1 , ∆1 , P ∗ ) as a function of ∆1 is concave, as a function of E 1 for ∆1 such that RBH (∆1 , P ∗ ) > 0 it is convex (see Figure 6.2(a)). For ∆1 such that RBH (∆1 , P ∗ ) = 0 it is convex when E 1 > E 1 (∆1 ), where E 1 (∆1 ) is the minimal E 1 , for which ∆1 ≤ pE 1 (see Figure 6.2(b)). RBH (E 1 , ∆1 , P ∗ ) as a function of E 1 and ∆1 is illustrated in Figure 6.3. Consider the case of ∆1 < ∆2 . It is not difficult to verify that in this case the binary Hamming rate-distortion function RBH (∆, P ∗ ) is determined by the demand of the receiver with the smallest distortion level. Therefore, we have RBH (∆, P ∗ ) = RBH (∆1 , P ∗ ) ( HP ∗ (X) − H∆1 (X), if ∆1 ≤ min(p∗ , 1 − p∗ ), = 0, if ∆1 > min(p∗ , 1 − p∗ ). For the case of E 1 < E 2 Theorem 5.9 gives RBH (E, ∆, P ∗ ) = max max
min
P ∈α(E 1 ,P ∗ ) QP ∈Q1 (P,∆)
max
IP,QP (X ∧ X 1 , X 2 ), min
P ∈α(E 2 ,P ∗ )−α(E 1 ,P ∗ ) QP ∈Q2 (P,∆)
IP,QP (X ∧ X ) . 2
6.1 Robust Descriptions System
111
By analogy with the calculation of the binary Hamming rate-reliabilitydistortion function RBH (E 1 , ∆1 , P ∗ ) we have the following equalities. For E 1 such that (1/2, 1/2) ∈ α(E 1 , P ∗ ) max
min
P ∈α(E 1 ,P ∗ ) QP ∈Q1 (P,∆)
=
IP,QP (X ∧ X 1 , X 2 )
( 1 − H∆1 (X), when ∆1 ≤ 1/2, when ∆1 > 1/2.
0,
For E 1 such that (1/2, 1/2) ∈ / α(E 1 , P ∗ ) max
min
P ∈α(E 1 ,P ∗ ) QP ∈Q1 (P,∆)
=
IP,QP (X ∧ X 1 , X 2 )
( HPE1 (X) − H∆1 (X), when ∆1 ≤ pE 1 , when ∆1 > pE 1 .
0,
For E 2 such that (1/2, 1/2) ∈ α(E 2 , P ∗ ) − α(E 1 , P ∗ ) max
min
P ∈α(E 2 ,P ∗ )−α(E 1 ,P ∗ ) QP ∈Q2 (P,∆)
=
IP,QP (X ∧ X 2 )
( 1 − H∆2 (X), when ∆2 ≤ 1/2, 0,
when ∆2 > 1/2.
For E 2 such that (1/2, 1/2) ∈ / α(E 2 , P ∗ ) − α(E 1 , P ∗ ) max
min
P ∈α(E 2 ,P ∗ )−α(E 1 ,P ∗ ) QP ∈Q2 (P,∆)
=
IP,QP (X ∧ X 2 )
( HPE2 (X) − H∆2 (X), when ∆2 ≤ pE 2 , 0,
when ∆2 > pE 2 .
RBH (E, ∆, P ∗ ) as a function of ∆1 and ∆2 (for fixed E 1 , E 2 ) is illustrated in Figure 6.4 An example of RBH (E, ∆, P ∗ ) as a function of E 1 , E 2 (for fixed ∆1 , 2 ∆ ) is shown in Figure 6.5. It is apparent to see that RBH (E, ∆, P ∗ ) as a function of E 2 when (1/2, 1/2) ∈ α(E 1 , P ∗ ) is constant and equals to its possible maximum
112
Reliability Criterion in Multiterminal Source Coding
value 1 − H∆1 (X). Let us assume that ∆1 < min(p∗ , 1 − p∗ ).
(6.3)
Then RBH (∆, P ∗ ) = HP ∗ (X) − H∆1 (X). Consider the following two cases for fixed ∆1 , satisfying (6.3) and E 1 such that (1/2, 1/2) ∈ / α(E 1 , P ∗ ), ∆1 < pE 1 . 2 For E such that (1/2, 1/2) ∈ α(E 2 , P ∗ ), ∆2 < 21 , we obtain RBH (E, ∆, P ∗ ) = max[HPE1 (X) − H∆1 (X), 1 − H∆2 (X)], and when HPE1 (X) − H∆1 (X) < 1 − H∆2 (X),
(6.4)
then RBH (E, ∆, P ∗ ) = RBH (E 2 , ∆2 , P ∗ ) = 1 − H∆2 (X). For E 2 such that (1/2, 1/2) ∈ / α(E 2 , P ∗ ), ∆2 < pE 2 , we get RBH (E, ∆, P ∗ ) = max[HPE1 (X) − H∆1 (X), HPE2 (X) − H∆2 (X)], and when HPE1 (X) − H∆1 (X) < HPE2 (X) − H∆2 (X), then RBH (E, ∆, P ∗ ) = HPE2 (X) − H∆2 (X). Note that RBH (E, ∆, P ∗ ) as a function of E 2 when (1/2, 1/2) ∈ / 1 ∗ α(E , P ) and (6.4) does not hold is constant and equals to HPE1 (X) − H∆1 (X). RBH (E, ∆, P ∗ ) as a function of E 2 when (1/2, 1/2) ∈ / α(E 1 , P ∗ ) and (6.4) is valid, is presented in Figure 6.6.
6.2
Cascade System Coding Rates with Respect to Distortion and Reliability Criteria
In this section, we discuss a generalization [122] of the multiterminal communication system first studied by Yamamoto [223]. Messages of
6.2 Cascade System Coding Rates
113
Fig. 6.7 Cascade communication system.
two correlated sources X and Y coded by a common encoder and a separate encoder must be transmitted to two receivers by two common decoders within prescribed distortion levels ∆1x , ∆1y and ∆2x , ∆2y , respectively (see Figure 6.7). The region R(E 1 , E 2 , ∆1x , ∆1y , ∆2x , ∆2y , P ∗ ) of all achievable rates of the best codes ensuring reconstruction of messages of the sources X and Y within given distortion levels with error probabilities exponents E 1 and E 2 at the first and second decoders, respectively, is illustrated. Important consequent cases are noted. The problem is a direct generalization of the problem considered by Yamamoto. Let {Xi , Yi }∞ i=1 be a sequence of discrete i.i.d. pairs of RVs, taking values in the finite sets X and Y, which are the sets of all messages of the sources X and Y , respectively. Let the generic joint PD of messages of the two sources be P ∗ = {P ∗ (x, y), x ∈ X , y ∈ Y}. For memoryless sources the probability P ∗N (x, y) of a pair of N -sequences of messages x = (x1 , x2 , . . . , xN ) ∈ X N and y = (y1 , y2 , . . . , yN ) ∈ Y N is defined by P ∗N (x, y) =
N Y
P ∗N (xn , yn ).
n=1
The sets X 1 , X 2 and Y 1 , Y 2 are reconstruction alphabets and in general they are different from X and Y, respectively. Let d1x : X × X 1 → [0, ∞) , d2x : X × X 2 → [0, ∞) , d1y : Y × Y 1 → [0, ∞) , d2y : Y × Y 2 → [0, ∞) ,
114
Reliability Criterion in Multiterminal Source Coding
be corresponding distortion measures. The distortion measures for N -sequences are defined by the respective average of per letter distortions d1x (x, x1 ) = N −1 d1y (y, y1 ) = N −1
N X n=1 N X
d1x (xn , x1n ), d2x (x, x2 ) = N −1 d1y (yn , yn1 ), d2y (y, y2 ) = N −1
n=1
N X
d2x (xn , x2n ),
n=1 N X
d2y (yn , yn2 ),
n=1
where x ∈X N , y ∈Y N , x1 ∈ X 1N , x2 ∈ X 2N , y1 ∈ Y 1N , y2 ∈ Y 2N . For the considered system the code (f, fb, g, gb) is a family of four mappings: f : X N × Y N → {1, 2, . . . , K (N )}, fb : {1, 2, . . . , K (N )} → {1, 2, . . . , L (N )}, g : {1, 2, . . . , K (N )} → X 1N × Y 1N , gb : {1, 2, . . . , L (N )} → X 2N × Y 2N . Define the sets M Ai = {(x, y) : g(f (x, y)) = (x1 , y1 ), gb(fb(f (x, y)))
= (x2 , y2 ), dix (x, xi ) ≤ ∆ix , diy (y, yi ) ≤ ∆iy },
i = 1, 2.
For given levels of admissible distortions ∆1x ≥ 0, ∆2x ≥ 0, ∆1y ≥ 0, ∆2y ≥ 0 the error probabilities of the code are ei (f, fb, g, gb, P ∗ , ∆ix , ∆iy ) = 1 − P ∗n (Ai ) , M
i = 1, 2. M
Again, for brevity we denote E = (E 1 , E 2 ) and ∆ = (∆1x , ∆1y , ∆2x , ∆2y ). b is said to be a (E, ∆)A pair of two nonnegative numbers (R, R) achievable rates pair for E 1 > 0, E 2 > 0, ∆1x ≥ 0, ∆2x ≥ 0, ∆1y ≥ 0, ∆2y ≥ 0, if for arbitrary ε > 0, δ > 0, and N sufficiently large there exists a code (f, fb, g, gb) such that N −1 log K (N ) ≤ R + ε,
b + ε, N −1 log L (N ) ≤ R
and ei (f, fb, g, gb, P ∗ , ∆ix , ∆iy ) ≤ exp{−N (Ei − δ)},
i = 1, 2.
(6.5)
6.2 Cascade System Coding Rates
115
Let R(E, ∆, P ∗ ) be the set of all pairs of (E, ∆)-achievable rates and R(∆, P ∗ ) be the corresponding rate-distortion region. If at the first decoder only the messages of the source X and at the second decoder only the messages of the source Y are reconstructed, then R(∆, P ∗ ) becomes the set of all (∆1x , ∆2y )-achievable rates pairs R1 (∆1x , ∆2y , P ∗ ) studied by Yamamoto in [223]. We present the rate-reliability-distortion region R(E, ∆, P ∗ ) without proof. Let P = {P (x, y), x ∈ X , y ∈ Y} be some PD on X × Y and Q = {Q(x1 , y 1 , x2 , y 2 | x, y), x ∈ X , y ∈ Y, x1 ∈ X 1 , y 1 ∈ Y 1 , x2 ∈ X 2 , y 2 ∈ Y 2 } be conditional PD on X 1 × Y 1 × X 2 × Y 2 for given x and y. We proceed with the notations and definitions taking into account two possible cases regarding the reliability constraints. (1) E 1 ≤ E 2 . Let Q(P, E, ∆) be the set of those conditional PDs QP (x1 , y 1 , x2 , y 2 | x, y) for which if P ∈ α(E 1 , P ∗ ) the following conditions are satisfied EP,QP d1x (X, X 1 ) X = x,y,x1 ,y 1 ,x2 ,y 2 ≤ ∆1x , EP,QP d1y (Y, Y 1 )
=
X
x,y,x1 ,y 1 ,x2 ,y 2 ≤ ∆1y , EP,QP d2x (X, X 2 )
= ≤
X x,y,x1 ,y 1 ,x2 ,y 2 ∆2x ,
P (x, y)QP (x1 , y 1 , x2 , y 2 | x, y)d1x (x, x1 ) (6.6) P (x, y)QP (x1 , y 1 , x2 , y 2 | x, y)d1y (y, y 1 ) (6.7) P (x, y)QP (x1 , y 1 , x2 , y 2 | x, y)d2x (x, x2 ) (6.8)
116
Reliability Criterion in Multiterminal Source Coding
EP,QP d2y (Y, Y 2 ) X = ≤
P (x, y)QP (x1 , y 1 , x2 , y 2 | x, y)d2y (y, y 2 )
x,y,x1 ,y 1 ,x2 ,y 2 ∆2y ,
(6.9)
α(E 2 , P ∗ )
α(E 1 , P ∗ ),
otherwise, if P ∈ − then only (6.8) and (6.9) hold. (2) E 2 ≤ E 1 . If P ∈ α(E 2 , P ∗ ), let the set Q(P, E, ∆) consists of those conditional PDs QP which make the inequalities (6.6)–(6.9) valid, otherwise, if P ∈ α(E 1 , P ∗ ) − α(E 2 , P ∗ ), then only (6.6) and (6.7) are satisfied. When E 1 = E 2 = 0, the notation Q(P ∗ , ∆) specializes Q(P, E, ∆). Introduce the following sets: (1) for E 1 ≤ E 2 , \
R1 (E, ∆, P ∗ ) =
n b : (R, R)
[
P ∈α(E 1 ,P ∗ ) QP ∈Q(P,E,∆)
R ≥ IP,QP (X, Y ∧ X 1 , Y 1 , X 2 , Y 2 ), o b ≥ IP,Q (X, Y ∧ X 2 , Y 2 ) , R P R2 (E, ∆, P ∗ ) \
=
[
n b : R ≥ R, b (R, R)
P ∈α(E 2 ,P ∗ )−α(E 1 ,P ∗ ) QP ∈Q(P,E,∆)
o b ≥ IP,Q (X, Y ∧ X 2 , Y 2 ) , R P (2) for E 2 ≤ E 1 , R3 (E, ∆, P ∗ ) \ =
[
n b : (R, R)
P ∈α(E 2 ,P ∗ ) QP ∈Q(P,E,∆)
R ≥ max[IP,QP (X, Y ∧ X 1 , Y 1 , X 2 , Y 2 ), max
min
P ∈α(E 1 ,P ∗ )−α(E 2 ,P ∗ ) QP ∈Q(P,E,∆)
o b ≥ IP,Q (X, Y ∧ X 2 , Y 2 ) . R P
IP,QP (X, Y ∧ X 1 , Y 1 )],
6.3 Reliability Criterion in Successive Refinement
117
Theorem 6.2. For every 0 < E 1 ≤ E 2 , ∆ix ≥ 0, ∆iy ≥ 0, i = 1, 2, \ R(E, ∆, P ∗ ) = R1 (E, ∆, P ∗ ) R2 (E, ∆, P ∗ ). For every 0 < E 2 ≤ E 1 , ∆ix ≥ 0, ∆iy ≥ 0, i = 1, 2, R(E, ∆, P ∗ ) = R3 (E, ∆, P ∗ ).
Corollary 6.3. When E 1 = E 2 = E, we obtain R0 (E, ∆, P ∗ ): \ [ M b : R0 (E, ∆, P ∗ ) = (R, R) P ∈α(E,P ∗ ) QP ∈Q(P,E,∆)
R ≥ IP,QP (X, Y ∧ X 1 , Y 1 , X 2 , Y 2 ), b ≥ IP,Q (X, Y ∧ X 2 , Y 2 ) . R P As E → 0 we derive the rate-distortion dependence R(∆, P ∗ ): [ b : R(∆, P ∗ ) = (R, R) QP ∗ ∈Q(P ∗ ,∆)
R ≥ IP ∗ ,QP ∗ (X, Y ∧ X 1 , Y 1 , X 2 , Y 2 ), b ≥ IP ∗ ,Q ∗ (X, Y ∧ X 2 , Y 2 ) . R P
Corollary 6.4. If the first decoder recovers only the messages of the source X and the second decoder recovers only the messages of Y , we obtain the result by Yamamoto [223]: [ b : R ≥ IP ∗ ,Q ∗ (X, Y ∧ X 1 , Y 2 ), R(∆, P ∗ ) = (R, R) P QP ∗ ∈Q(P ∗ ,∆)
b ≥ IP ∗ ,Q ∗ (X, Y ∧ Y 2 ) . R P
6.3
Reliability Criterion in Hierarchical Source Coding and in Successive Refinement
In this section, we discuss the hierarchical source coding problem and the relevant concept of successive refinement of information subject to reliability criterion.
118
Reliability Criterion in Multiterminal Source Coding
The problem treatment was originally published in [120] and later developed in an addendum [141]. During the proofs below we shall demonstrate the conditions necessary and sufficient for the successive refinement under the reliability constraint. We reproduce the characterization of the rate-reliabilitydistortion region for the hierarchical source coding and the derivation of the conditions for successive refinability subject to reliability criterion according to [141]. The concept of source divisibility was introduced by Koshelev [155, 156, 157, 158] as an optimality criterion for source coding in hierarchical systems. The same notion as successive refinement of information was independently redefined by Equitz and Cover in [70]. They questioned the conditions under which it is possible to achieve the optimum performance bounds at each level of successively more precise transmission. In terms of rate-distortion limits the problem statement has the following description. Assume that we transmit information to two users, the requirement of the first on distortion is no larger than ∆1 and demand of the second user is more accurate: ∆2 ≤ ∆1 . The value R(∆1 , P ) of the rate-distortion function for a source distributed according to the probability law P is the minimal satisfactory transmission rate at the first destination. Adding an information of a rate R0 addressed to the second user the fidelity can be made more precise providing distortion no larger than ∆2 . It is interesting to know when it is possible to guarantee the equality R(∆1 , P ) + R0 = R(∆2 , P ). The answer to this question is given in Koshelev’s papers [155, 156, 157, 158], and in [70] by Equitz and Cover. Koshelev argued that the Markovity condition for the RVs characterizing the system is sufficient to achieve the rate-distortion limits. Later on the same necessary condition was also established in [70], where the authors exploited Ahlswede’s result [7] on multiple descriptions without excess rate. We quote their result in Theorem 6.7. Another proof of that result by means of characterization of the rate-distortion region for the described hierarchical source coding situation is given in Rimoldi’s paper [187]. In [120], treating the above notion of successive refinement of information an additional criterion to the quality of information
6.3 Reliability Criterion in Successive Refinement
119
reconstruction — the reliability is introduced. The extension of the rate-distortion case to the rate-reliability-distortion one is the assumption that the required performance limit is the rate-reliabilitydistortion function. So, messages of the source must be coded for transmission to the receiver with a distortion not exceeding ∆1 within the reliability E1 , and then, using an auxiliary information (let again at a coding rate R0 ), restored with a more precise distortion ∆2 ≤ ∆1 within the reliability E2 . Naturally but not necessarily E2 ≥ E1 . Under the successive refinement of information (or divisibility of the source with a PD P ) with reliability requirement, from (E1 , ∆1 ) to (E2 , ∆2 ), the condition R(E2 , ∆2 , P ) = R(E1 , ∆1 , P ) + R0 is assumed. The characterization of rate-distortion region for this hierarchical transmission (also called the scalable source coding) has been independently obtained by Koshelev [156] and Rimoldi [187]. It is derived also by Maroutian [170] as a corollary from the characterization of the ratereliability-distortion region for the same system. Later on, the error exponents in the scalable source coding were studied in the paper [151] by Kanlis and Narayan. In [206] Tuncel and Rose pointed out the difference between their result and the necessary and sufficient conditions for successive refinability under error exponent criterion previously derived and reported in [120]. In the paper [141] the author revises [120] and the region [170] of attainable rates subject to reliability criterion which was the basis for investigation [120]. Amending Maroutian’s constructions and proofs he restates the conditions by Tuncel and Rose [206]. First, we discuss the hierarchical source coding problem and the results on the achievable rates region. Let again P ∗ = {P ∗ (x), x ∈ X } be the PD of messages x of DMS X of finite alphabet X . And let the reproduction alphabets of two receivers be the finite sets X 1 and X 2 accordingly, with the corresponding single-letter distortion measures dk : X × X k → [0; ∞), k = 1, 2. The distortions dk (x, xk ), between the source N -length message x and its reproduced versions xk are defined according to (5.4). A code (f, g) = (f1 , f2 , g1 , g2 ) for the system (Figure 6.8) consists of two encoders (as mappings of the source N -length messages space X N
120
Reliability Criterion in Multiterminal Source Coding
Fig. 6.8 The hierarchical communication system.
into certain numerated finite sets {1, 2, . . . , Lk (N )}): fk : X N → {1, 2, . . . , Lk (N )},
k = 1, 2,
and two decoders acting as converse mappings into the reproduction N -dimensional spaces X 1N and X 2N in the following ways: g1 : {1, 2, . . . , L1 (N )} → X 1N , g2 : {1, 2, . . . , L1 (N )} × {1, 2, . . . , L2 (N )} → X 2N , where in g2 we deal with the Cartesian product of the two sets. Let the requirement of the first user on the averaged distortion be ∆1 ≥ 0 and of the second one be ∆2 ≥ 0. Thus the sets defined by M
A1 = {x ∈ X N : g1 (f1 (x)) = x1 , d1 (x, x1 ) ≤ ∆1 }, M
A2 = {x ∈ X N : g2 (f1 (x), f2 (x)) = x2 , d2 (x, x2 ) ≤ ∆2 }. will abbreviate the expressions which determine the probability of error (caused by an applied code (f, g)) at the output of each decoder: M
ek (f, g, P ∗ , ∆k , N ) = 1 − P ∗ (Ak ),
k = 1, 2.
We say that the nonnegative numbers (R1 , R2 ) make an (E1 , E2 , ∆1 , ∆2 )-achievable pair of coding rates if for every > 0, δ > 0, and sufficiently large N there exists a code (f, g), such that 1 log Lk (N ) ≤ Rk + , N ek (f, g, ∆k , N ) ≤ exp{−N (Ek − δ)},
k = 1, 2.
6.3 Reliability Criterion in Successive Refinement
121
Denote the set of (E1 , E2 , ∆1 , ∆2 )-achievable rates for the system by RP ∗ (E1 , E2 , ∆1 , ∆2 ). For a given quartet (E1 , E2 , ∆1 , ∆2 ) and a PD P ∈ α(E1 , P ∗ ) let Q(P, E1 , E2 , ∆1 , ∆2 ) be the set of conditional PDs QP (x1 , x2 |x) for which the expectations on the distortions hold X M P (x)QP (xk | x)d(x, xk ) ≤ ∆k , k = 1, 2, EP,QP dk (X, X k ) = x, xk
and only EP,QP d2 (X, X 2 ) ≤ ∆2 if P belongs to the difference of the sets α(E2 , P ∗ ) and α(E1 , P ∗ ), i.e., P ∈ α(E2 , P ∗ ) − α(E1 , P ∗ ), iff E2 ≥ E1 . And, respectively, in case of E1 ≥ E2 , under the notation Q(P, E1 , E2 , ∆1 , ∆2 ) we mean the set of conditional PDs QP for which EP,QP dk (X, X k ) ≤ ∆k ,
k = 1, 2,
when P ∈ α(E2 , P ∗ ), and only EP,QP d1 (X, X 1 ) ≤ ∆1 if P ∈ α(E1 , P ∗ ) − α(E2 , P ∗ ). For E1 = E2 = E we use the notation Q(P, E, ∆1 , ∆2 ) instead of Q(P, E1 , E2 , ∆1 , ∆2 ), but when E → 0, α(E, P ∗ ) contains only the PD P ∗ , hence the notation Q(P ∗ , ∆1 , ∆2 ) will replace Q(P, E, ∆1 , ∆2 ) one. Each of the pairs (E1 , ∆1 ) and (E2 , ∆2 ) in the considered hierarchical coding configuration of Figure 6.8 determines the corresponding rate-reliability-distortion and the rate-distortion functions R(Ek , ∆k , P ∗ ) and R(∆k , P ∗ ), k = 1, 2, respectively. An attempt to find the entire (E1 , E2 , ∆1 , ∆2 )-achievable rates region RP ∗ (E1 , E2 , ∆1 , ∆2 ) is made in Maroutian’s paper [170]. However, a revision [141] of his result stimulated by [206] brought the author to a different region, although, in the particular case of E1 , E2 → 0, as a consequence, Maroutian has obtained the answer for the hierarchical source coding problem under fidelity criterion [157], which was also
122
Reliability Criterion in Multiterminal Source Coding
studied by Rimoldi [187]. A careful treatment of the code construction strategy based on the type covering technique (by the way, used also by Tuncel and Rose in [206]) and the combinatorial method for the converse employed in [170], help us in obtaining the results in Theorems 6.3 and 6.4, which together unify the refined result for the multiple descriptions [170] (or, in other terminology, for the hierarchical or scalable source coding). Theorem 6.3. For E1 , E2 with E1 ≥ E2 > 0, and ∆1 ≥ 0, ∆2 ≥ 0, RP ∗ (E1 , E2 , ∆1 , ∆2 ) \ =
[
(R1 , R2 ) :
P ∈α(E2 ,P ∗ ) QP ∈Q(P,E1 ,E2 ,∆1 ,∆2 )
R1 ≥ max max R(∆1 , P ), IP,QP (X ∧ X ) , P ∈α(E1 ,P ∗ )−α(E2 ,P ∗ ) R1 + R2 ≥ IP,QP (X ∧ X 1 , X 2 ) . (6.10)
1
Or, equivalently, RP ∗ (E1 , E2 , ∆1 , ∆2 ) \ =
[
(R1 , R2 ) :
P ∈α(E2 ,P ∗ ) QP ∈Q(P,E1 ,E2 ,∆1 ,∆2 )
R1 ≥ max R(E1 , ∆1 , P ∗ ), IP,QP (X ∧ X 1 ) , 1 2 R1 + R2 ≥ IP,QP (X ∧ X , X ) .
(6.11)
Note that the equivalency of (6.10) and (6.11) is due to the fact that R(∆1 , P ) ≤ IP,QP (X ∧ X 1 ) for each P ∈ α(E2 , P ∗ ) and the representation (5.16). Theorem 6.4. In case of E2 ≥ E1 , RP ∗ (E1 , E2 , ∆1 , ∆2 ) \ =
[
P ∈α(E1 ,P ∗ ) QP ∈Q(P,E1 ,E2 ,∆1 ,∆2 )
(R1 , R2 ) :
6.3 Reliability Criterion in Successive Refinement
R1 ≥ IP,QP (X ∧ X 1 ), R1 + R2 ≥ max max
R(∆2 , P ), IP,QP (X ∧ X , X ) . 1
P ∈α(E2 ,P ∗ )−α(E1 ,P ∗ )
123
2
(6.12) Or, equivalently, RP ∗ (E1 , E2 , ∆1 , ∆2 ) \ =
n (R1 , R2 ) :
[
P ∈α(E1 ,P ∗ ) QP ∈Q(P,E1 ,E2 ,∆1 ,∆2 )
R1 ≥ IP,QP (X ∧ X 1 ), o R1 + R2 ≥ max(R(E2 , ∆2 , P ∗ ), IP,QP (X ∧ X 1 , X 2 )) . (6.13) In this case, the equivalency of (6.12) and (6.13) is due to R(∆2 , P ) ≤ IP,QP (X ∧ X 1 X 2 ) for each P ∈ α(E1 , P ∗ ) and the representation (5.16). Comparing these regions with the ones (specialized for the corresponding cases) derived in [206] one can conclude that the regions by Tuncel and Rose formulated in terms of the scalable rate-reliability-distortion function actually are alternative forms of RP ∗ (E1 , E2 , ∆1 , ∆2 ) characterized in Theorems 6.3 and 6.4. In case of equal requirements of the receivers on the reliability, i.e., E1 = E2 = E, we get from (6.10) and (6.12) a simpler region, denoted here by RP ∗ (E, ∆1 , ∆2 ). Theorem 6.5. For every E > 0 and ∆1 ≥ 0, ∆2 ≥ 0, RP ∗ (E, ∆1 , ∆2 ) \ =
[
n (R1 , R2 ) :
P ∈α(E,P ∗ ) QP ∈Q(P,E,∆1 ,∆2 )
o R1 ≥ IP,QP (X ∧ X 1 ), R1 + R2 ≥ IP,QP (X ∧ X 1 , X 2 ) .
(6.14)
Furthermore, with E → 0 the definition of the set α(E, P ∗ ) and (6.14) yield Koshelev’s [156] result for the hierarchical source coding rate-distortion region, which independently appeared then also in [170] and [187].
124
Reliability Criterion in Multiterminal Source Coding
Theorem 6.6. For every ∆1 ≥ 0, ∆2 ≥ 0, the rate-distortion region for the scalable source coding can be expressed as follows: RP ∗ (∆1 , ∆2 ) =
[
n (R1 , R2 ) : R1 ≥ IP ∗ ,QP ∗ (X ∧ X 1 ),
QP ∗ ∈Q(P ∗ ,∆1 ,∆2 )
o R1 + R2 ≥ IP ∗ ,QP ∗ (X ∧ X 1 X 2 ) . (6.15)
As in [120], we define the notion of the successive refinability in terms of the rate-reliability-distortion function in the following way. Definition 6.1. The DMS X with PD P ∗ is said to be successively refinable from (E1 , ∆1 ) to (E2 , ∆2 ) if the optimal rates pair (R(E1 , ∆1 , P ∗ ), R(E2 , ∆2 , P ∗ ) − R(E1 , ∆1 , P ∗ )), is (E1 , E2 , ∆1 , ∆2 )-achievable, R(E1 , ∆1 , P ∗ ).
provided
that
(6.16)
R(E2 , ∆2 , P ∗ ) ≥
It is obvious that with Ek → 0, k = 1, 2, we have the definition of the successive refinement in distortion sense [70, 157]. Another interesting special case is ∆1 = ∆2 = 0, then we deal with the successive refinement in “purely” reliability sense, namely the achievability of the optimal rates (R(E1 , P ∗ ), R(E2 , P ∗ ) − R(E1 , P ∗ )) related to the corresponding rate-reliability functions (5.17) for E2 ≥ E1 (since only this condition ensures the inequality R(E2 , P ∗ ) ≥ R(E1 , P ∗ )). Below we prove the conditions [141] for the successive refinement of information with respect to the above definition. The different conditions for two cases and their proofs employ Theorems 6.3 and 6.4 on hierarchical source coding.
6.3 Reliability Criterion in Successive Refinement
125
E1 ≥ E2 case: For this situation, from (6.11) it follows that the rates pair (6.16) is achievable iff for each P ∈ α(E2 , P ∗ ) there exists a QP ∈ Q(P, E1 , E2 , ∆1 , ∆2 ) such that the inequalities R(E1 , ∆1 , P ∗ ) ≥ max(R(E1 , ∆1 , P ∗ ), IP,QP (X ∧ X 1 )), ∗
1
2
R(E2 , ∆2 , P ) ≥ IP,QP (X ∧ X , X )
(6.17) (6.18)
hold simultaneously. These inequalities are satisfied for each P ∈ α(E2 , P ∗ ) iff R(E1 , ∆1 , P ∗ ) ≥ IP,QP (X ∧ X 1 ),
(6.19)
which is due to (6.17), and, meanwhile R(E2 , ∆2 , P ∗ ) ≥ IP,QP (X ∧ X 1 , X 2 ) ≥ IP,QP (X ∧ X 2 ) ≥ R(∆2 , P ∗ )
(6.20)
for (6.18). By the corollary (5.16) for the rate-reliability-distortion function it follows that (6.19) and (6.20) hold for each P ∈ α(E2 , P ∗ ) iff there exist a PD P¯ ∈ α(E2 , P ∗ ) and a conditional PD QP¯ ∈ Q(P¯ , E1 , E2 , ∆1 , ∆2 ), such that X → X 2 → X 1 forms a Markov chain in that order and at the same time R(E1 , ∆1 , P ∗ ) ≥ IP¯ ,QP¯ (X ∧ X 1 ), ∗
2
R(E2 , ∆2 , P ) = IP¯ ,QP¯ (X ∧ X ).
(6.21) (6.22)
So, in the case of $E_1 \geq E_2$ we get the conditions for successive refinability under the reliability constraint defined by (6.21) and (6.22).

$E_2 \geq E_1$ case: Taking into account the corresponding rate-reliability-distortion region (6.13), it follows that $(R(E_1, \Delta_1, P^*),\ R(E_2, \Delta_2, P^*) - R(E_1, \Delta_1, P^*))$ is achievable iff for each $P \in \alpha(E_1, P^*)$ there exists a $Q_P \in \mathcal{Q}(P, E_1, E_2, \Delta_1, \Delta_2)$ such that
$$R(E_1, \Delta_1, P^*) \geq I_{P,Q_P}(X \wedge X^1) \tag{6.23}$$
and
$$R(E_2, \Delta_2, P^*) \geq \max\big(R(E_2, \Delta_2, P^*),\ I_{P,Q_P}(X \wedge X^1, X^2)\big). \tag{6.24}$$
For each $P \in \alpha(E_1, P^*)$, selecting $\bar{Q}_P$ as the conditional PD that minimizes the mutual information $I_{P,Q_P}(X \wedge X^1, X^2)$ among those which satisfy (6.23) and (6.24), the optimal pair of rates $(R(E_1, \Delta_1, P^*),\ R(E_2, \Delta_2, P^*) - R(E_1, \Delta_1, P^*))$ will be achievable iff
$$R(E_1, \Delta_1, P^*) \geq I_{P,\bar{Q}_P}(X \wedge X^1) \tag{6.25}$$
and
$$R(E_2, \Delta_2, P^*) \geq \max\big(R(E_2, \Delta_2, P^*),\ I_{P,\bar{Q}_P}(X \wedge X^1, X^2)\big). \tag{6.26}$$
Since the inequalities have to be satisfied for each $P$ from $\alpha(E_1, P^*)$, (6.25) and (6.26) are equivalent to
$$R(E_1, \Delta_1, P^*) \geq \max_{P \in \alpha(E_1, P^*)} I_{P,\bar{Q}_P}(X \wedge X^1) \tag{6.27}$$
and
$$R(E_2, \Delta_2, P^*) \geq \max_{P \in \alpha(E_1, P^*)} I_{P,\bar{Q}_P}(X \wedge X^1, X^2). \tag{6.28}$$
Then, recalling (5.16) again, the inequalities (6.27) and (6.28) in turn hold for each $P \in \alpha(E_1, P^*)$ iff
$$R(E_1, \Delta_1, P^*) = \max_{P \in \alpha(E_1, P^*)} I_{P,\bar{Q}_P}(X \wedge X^1) \tag{6.29}$$
and meantime
$$R(E_2, \Delta_2, P^*) \geq \max_{P \in \alpha(E_1, P^*)} I_{P,\bar{Q}_P}(X \wedge X^1, X^2). \tag{6.30}$$
Now, noting that the right-hand side of the last inequality does not depend on $E_2$ and the function $R(E_2, \Delta_2, P^*)$ is monotonically nondecreasing in $E_2$, we arrive at the conclusion that (6.30) will be satisfied for $\bar{Q}_P$ meeting (6.29) iff $E_2 \geq \hat{E}_2$, where
$$R(\hat{E}_2, \Delta_2, P^*) = \max_{P \in \alpha(E_1, P^*)} I_{P,\bar{Q}_P}(X \wedge X^1, X^2). \tag{6.31}$$
It must also be noted that successive refinement in the reliability sense in the case of $E_2 \geq E_1$ is not possible if
$$\max_{P \in \alpha(E_1, P^*)} I_{P,\bar{Q}_P}(X \wedge X^1, X^2) > \max_{P} R(\Delta_2, P),$$
where the right-hand side expression is the value of the zero-error rate-distortion function (5.19) for the second hierarchy.

Finally, note that letting $E_1 = E_2 = E \to 0$ in (6.21) and (6.22) we obtain the conditions for successive refinability in the distortion sense [70, 157], and [187]. We quote those particularized conditions according to the theorem by Equitz and Cover [70].

Theorem 6.7. For the DMS with distribution $P^*$ and distortion measure $d^1 = d^2 = d$, the pair $(R(\Delta_1, P^*),\ R(\Delta_2, P^*) - R(\Delta_1, P^*))$ is achievable iff there exists a conditional PD $Q$ such that
$$R(\Delta_1, P^*) = I_{P^*,Q}(X \wedge X^1), \quad E_{P^*,Q}\, d(X, X^1) \leq \Delta_1, \tag{6.32}$$
$$R(\Delta_2, P^*) = I_{P^*,Q}(X \wedge X^2), \quad E_{P^*,Q}\, d(X, X^2) \leq \Delta_2, \tag{6.33}$$
and the RVs $X, X^2, X^1$ form a Markov chain in that order.

Summing up the discussion in this section, we may conclude that successive refinement of information in the distortion sense is possible if and only if the Markovity condition in Theorem 6.7 together with (6.32) and (6.33) are fulfilled for the source. In the more natural case of $E_2 \geq E_1$, successive refinement of information under the reliability criterion is possible if and only if $E_2$ is larger than the threshold defined by (6.31). Meanwhile, it is worth noting that successive refinement in the "purely" reliability sense, i.e., when $\Delta_1 = \Delta_2 = 0$, is always possible for $E_2 \geq E_1$, since in that case Theorem 6.4 yields the achievability of the rates
$$R_1 = \max_{P \in \alpha(E_1, P^*)} H_P(X), \qquad R_1 + R_2 = \max_{P \in \alpha(E_2, P^*)} H_P(X),$$
which are the corresponding values of the rate-reliability function (5.17).
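The last rates are easy to evaluate numerically. The following is a minimal grid-search sketch (our illustration, not code from the text) of the rate-reliability value $\max_{P \in \alpha(E, P^*)} H_P(X)$ of (5.17) for a binary DMS; the function names and the example parameters are illustrative assumptions.

```python
# Sketch of the rate-reliability function R(E, P*) = max H_P(X) over the
# divergence ball alpha(E, P*) = {P : D(P || P*) <= E}, binary alphabet.
# Logarithms are to base 2, matching the monograph's conventions.
import math

def kl2(p, q):
    """D(P || Q) in bits for binary PDs given by their first components."""
    return sum(a * math.log2(a / b)
               for a, b in zip((p, 1 - p), (q, 1 - q)) if a > 0)

def entropy2(p):
    return -sum(x * math.log2(x) for x in (p, 1 - p) if x > 0)

def rate_reliability(E, p_star, grid=20000):
    """Grid-search approximation of max H_P(X) subject to D(P||P*) <= E."""
    return max(entropy2(k / grid) for k in range(grid + 1)
               if kl2(k / grid, p_star) <= E)

# For small E the value stays near H_{P*}(X); it reaches 1 bit once the
# ball around the (hypothetical) P* = (0.1, 0.9) contains the uniform PD.
print(rate_reliability(0.05, 0.1), rate_reliability(1.0, 0.1))
```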
7 Logarithmically Asymptotically Optimal Testing of Statistical Hypotheses
7.1 Prelude
This section serves as an illustration of the usefulness of combinatorial methods developed in information theory for the investigation of logarithmically asymptotically optimal (LAO) testing of statistical hypotheses (see also [49, 51, 54, 167]). Applications of information-theoretic methods in mathematical statistics are reflected in the monographs by Kullback [160], Csiszár and Körner [51], Cover and Thomas [48], Blahut [35], Han [94], Ihara [147], Chen and Alajaji [43], Csiszár and Shields [54]. Verdú's book [216] on multi-user detection is an important information theory reference where hypothesis testing is in the service of information theory in its practical elaborations. The paper by Dobrushin, Pinsker and Shiryaev [59] presents a series of prospective problems on this subject. Blahut [34] used the results of statistical hypotheses testing for the solution of information-theoretic problems.

For series of independent experiments the well-known Stein's lemma [51] shows that for a given fixed first kind error probability $\alpha_1^{(N)} = \alpha_1$ the exponential rate of convergence to zero of the probability of the second kind error $\alpha_2^{(N)}$, when the number of experiments tends to infinity, is as follows:
$$\lim_{N\to\infty} N^{-1}\log\alpha_2^{(N)}(\alpha_1) = -D(P_1\|P_2),$$
where $D(P_1\|P_2)$ is the informational divergence (see Section 1.2) or, as it is also named in statistics, the distance, or Kullback-Leibler information, of the hypothetical distributions $P_1$ and $P_2$ defined on the finite set $\mathcal{X}$. In Section 7.2, we study the functional dependence of the first and the second kind error probabilities of the optimal tests for a sequence of experiments concerning a Markov chain, and in Section 7.3 the case of many hypotheses for independent experiments will be considered. This problem was investigated in many works, e.g., Hoeffding [145], Csiszár and Longo [53], Birgé [32], Blahut [34], Han [92, 94], Haroutunian [134], Natarajan [177], Perez [179], Tusnády [207, 208], Fu and Shen [77], Tuncel [205], and analyzed in some of the noted books. Various associated problems, notions, and results are presented by Berger [28], Ahlswede [10], Ahlswede and Wegener [21], Ahlswede and Csiszár [16], Anantharam [23], Bechhofer et al. [24], Burnashev et al. [37], Gutman [87], Han and Amari [95], Han and Kobayashi [97], Lyapunov [168], Lin'kov [163, 164, 165], Chen [42], Poor and Verdú [184], Puhalskii and Spokoiny [186], Verdú [214], Feder and Merhav [73], Levitan and Merhav [161], Zeitouni and Gutman [225], Ziv [229], Zhang and Berger [227], and others. The results presented in Sections 7.2 and 7.3 were published in [103, 104, 105, 106]. New developments in this direction by Ahlswede and Haroutunian [18, 19], Ahlswede et al. [11], Haroutunian and Hakobyan [110, 111, 112, 113] and others are briefly discussed in Section 7.4.
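Stein's exponent is easy to observe numerically. Below is a hedged sketch (our illustration, not from the text) for i.i.d. binary experiments: a likelihood-ratio test with threshold $N(D(P_1\|P_2) - \delta)$ keeps $\alpha_1$ vanishing, and the exactly computed $\alpha_2$ exhibits the exponent $D(P_1\|P_2)$; the PDs, the slack $\delta$, and the function names are illustrative assumptions.

```python
# Stein's lemma illustration: -N^{-1} log2 alpha_2 approaches D(P1||P2)
# as N grows and delta shrinks. alpha_2 is summed exactly over binomial
# counts in log domain to avoid underflow at large N.
import math

P1, P2 = (0.10, 0.90), (0.65, 0.35)         # hypothetical PDs on {0, 1}
D12 = sum(p * math.log2(p / q) for p, q in zip(P1, P2))

def log_alpha2(N, delta=0.05):
    """Natural log of the second kind error of the LLR test at level t."""
    t = N * (D12 - delta)                    # under P1 the LLR mean is N*D12,
    a0 = math.log2(P1[0] / P2[0])            # so alpha_1 -> 0 for this t
    a1 = math.log2(P1[1] / P2[1])
    logs = [math.lgamma(N + 1) - math.lgamma(k + 1) - math.lgamma(N - k + 1)
            + k * math.log(P2[0]) + (N - k) * math.log(P2[1])
            for k in range(N + 1)            # k = number of zeros in the sample
            if k * a0 + (N - k) * a1 >= t]   # acceptance region of H_1
    m = max(logs)
    return m + math.log(sum(math.exp(x - m) for x in logs))

for N in (100, 400, 1600):
    print(N, -log_alpha2(N) / (N * math.log(2)), "->", round(D12, 3))
```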
7.2 Reliability Function for Two Alternative Hypotheses Concerning Markov Chains
Let $\mathbf{x} = (x_0, x_1, x_2, \ldots, x_N)$, $x_n \in \mathcal{X} = \{1, 2, \ldots, I\}$, $\mathbf{x} \in \mathcal{X}^{N+1}$, $N = 0, 1, 2, \ldots$, be vectors of observed states of a simple homogeneous stationary Markov chain with a finite number $I$ of states. There are two competing hypotheses concerning the matrix of transition probabilities
of the chain: $P_1 = \{P_1(j|i)\}$ or $P_2 = \{P_2(j|i)\}$, $i, j = \overline{1,I}$. In both cases [60] there exist corresponding stationary distributions $Q_1 = \{Q_1(i)\}$ and $Q_2 = \{Q_2(i)\}$, not necessarily unique, such that
$$\sum_i Q_l(i)P_l(j|i) = Q_l(j), \quad \sum_i Q_l(i) = 1, \quad l = 1, 2,\ j = \overline{1,I}.$$
We shall use the following definition of the probability of the vector $\mathbf{x} \in \mathcal{X}^{N+1}$ of the Markov chain with transition probabilities $P_l$ and stationary distribution $Q_l$:
$$Q_l \circ P_l^N(\mathbf{x}) \overset{M}{=} Q_l(x_0)\prod_{n=1}^{N} P_l(x_n|x_{n-1}), \quad l = 1, 2,$$
$$Q_l \circ P_l^N(\mathcal{A}) \overset{M}{=} \sum_{\mathbf{x} \in \mathcal{A}} Q_l \circ P_l^N(\mathbf{x}), \quad \mathcal{A} \subset \mathcal{X}^{N+1}.$$
Based on the observed trajectory $\mathbf{x}$, the resolver device must decide which of the two hypotheses is correct. Denote by $\alpha_1(\varphi_N)$ the error probability of the first kind of the test criterion $\varphi_N(\mathbf{x})$; it is the probability of rejecting the first hypothesis when in reality it is correct. Let us denote by $\mathcal{G}^N$ the set of vectors $\mathbf{x}$ for which the hypothesis $P_1$ is adopted:
$$\mathcal{G}^N \overset{M}{=} \{\mathbf{x}:\ \varphi_N(\mathbf{x}) = 1\}.$$
Then the error probability of the first kind $\alpha_1(\varphi_N)$ is
$$\alpha_1(\varphi_N) \overset{M}{=} 1 - Q_1\circ P_1^N(\mathcal{G}^N),$$
and the error probability of the second kind $\alpha_2(\varphi_N)$ is
$$\alpha_2(\varphi_N) \overset{M}{=} Q_2\circ P_2^N(\mathcal{G}^N).$$
A sequence of tests $\varphi_N(\mathbf{x})$, $N = 1, 2, \ldots$, is called [32] logarithmically asymptotically optimal if for a given value $E_1 > 0$
$$\alpha_1(\varphi_N) \leq e^{-NE_1},$$
and the upper limit
$$\lim_{N\to\infty} -N^{-1}\log\alpha_2(\varphi_N)$$
takes its maximal value, denoted by $E_2(E_1)$. By analogy with the notion introduced by Shannon into information theory (see (2.7), (2.12)) it is natural to refer to the function $E_2(E_1)$ also as a reliability function.

Let $P = \{P(j|i),\ i, j = \overline{1,I}\}$ be a matrix of transition probabilities of a stationary Markov chain with the same set $\mathcal{X}$ of states, and let $Q = \{Q(i),\ i = \overline{1,I}\}$ be a corresponding stationary distribution. Let us denote by $D(Q\circ P\|Q_l\circ P_l)$ the Kullback-Leibler divergence of the distribution $Q\circ P = \{Q(i)P(j|i),\ i, j = \overline{1,I}\}$ from the distribution $Q_l\circ P_l = \{Q_l(i)P_l(j|i),\ i, j = \overline{1,I}\}$, $l = 1, 2$, where
$$D(Q\circ P\|Q_l\circ P_l) = \sum_{i,j} Q(i)P(j|i)\big[\log Q(i)P(j|i) - \log Q_l(i)P_l(j|i)\big] = D(Q\|Q_l) + D(Q\circ P\|Q\circ P_l),$$
with
$$D(Q\|Q_l) = \sum_i Q(i)\big[\log Q(i) - \log Q_l(i)\big], \quad l = 1, 2.$$
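The decomposition above is an exact identity, which the following small numerical sketch (our illustration; the transition matrices are hypothetical) verifies for a 2-state chain.

```python
# Check of D(Q∘P || Q_l∘P_l) = D(Q || Q_l) + D(Q∘P || Q∘P_l), base-2 logs.
import math

def kl(p, q):
    return sum(a * math.log2(a / b) for a, b in zip(p, q) if a > 0)

def stationary(P):
    """Stationary PD of a 2-state chain with P[i][j] = P(j|i)."""
    a, b = P[0][1], P[1][0]
    return (b / (a + b), a / (a + b))

P  = [[0.7, 0.3], [0.4, 0.6]]        # chain under consideration
Pl = [[0.5, 0.5], [0.2, 0.8]]        # hypothetical alternative P_l
Q, Ql = stationary(P), stationary(Pl)

QP   = [Q[i] * P[i][j]  for i in range(2) for j in range(2)]
QlPl = [Ql[i] * Pl[i][j] for i in range(2) for j in range(2)]
QPl  = [Q[i] * Pl[i][j] for i in range(2) for j in range(2)]

print(kl(QP, QlPl), kl(Q, Ql) + kl(QP, QPl))   # the two values coincide
```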
The main theorem of the section is the following.

Theorem 7.1. For any $E_1 > 0$ the LAO tests reliability function $E_2(E_1)$ for testing of two hypotheses $P_1$ and $P_2$ concerning Markov chains is given by the formula:
$$E_2(E_1) = \inf_{Q:\ \exists Q_1,\ D(Q\|Q_1)<\infty}\ \ \inf_{P:\ D(Q\circ P\|Q\circ P_1)\leq E_1} D(Q\circ P\|Q\circ P_2). \tag{7.1}$$
The reliability function has the following properties, which are consequences of the theorem.

(C1) If $\inf_{Q_1} D(Q_1\circ P_1\|Q_1\circ P_2) < \infty$, then
$$E_2(E_1) \leq \lim_{E_1\to 0} E_2(E_1) = \inf_{Q_1} D(Q_1\circ P_1\|Q_1\circ P_2).$$

(C2) $E_2(E_1)$ monotonically decreases in $E_1$ for
$$E_1 > \inf_{Q}\ \inf_{P:\ D(Q\circ P\|Q\circ P_2)<\infty} D(Q\circ P\|Q\circ P_1),$$
and for smaller $E_1$ we have $E_2(E_1) = \infty$.

(C3) If $E_1 > \inf_{Q_2} D(Q_2\circ P_2\|Q_2\circ P_1)$, then $E_2(E_1) = 0$. But if $\inf_{Q_2} D(Q_2\circ P_2\|Q_2\circ P_1) = \infty$, then $E_2(E_1) > 0$ for $E_1$ arbitrarily large.

(C4) In the particular case when $P_1$ and $P_2$ have only positive components, the stationary distributions $Q_1$ and $Q_2$ are unique and have strictly positive components, and the function $E_2(E_1)$ takes the form
$$E_2(E_1) = \min_{P:\ D(Q\circ P\|Q\circ P_1)\leq E_1} D(Q\circ P\|Q\circ P_2),$$
where, moreover, the matrix $P$ also has positive components and $Q$ is its corresponding stationary distribution. This is a result of Natarajan [177] obtained by application of the large deviations technique.

(C5) In the case of independent experiments with values in $\mathcal{X}$, two possible distributions $P_1$ and $P_2$, and sample PD $P$, we obtain the following formula:
$$E_2(E_1) = \min_{P:\ D(P\|P_1)\leq E_1} D(P\|P_2).$$
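For binary alphabets the optimization in (C5) is one-dimensional and can be approximated by a grid search, as in the following sketch (our illustration; the PDs and grid size are hypothetical choices).

```python
# E_2(E_1) = min { D(P||P2) : D(P||P1) <= E_1 } for binary PDs, base-2 logs.
import math

def kl2(p, q):
    return sum(a * math.log2(a / b)
               for a, b in zip((p, 1 - p), (q, 1 - q)) if a > 0)

def E2(E1, p1, p2, grid=20000):
    return min(kl2(k / grid, p2) for k in range(grid + 1)
               if kl2(k / grid, p1) <= E1)

p1, p2 = 0.10, 0.65                  # first components of P1, P2
for E1 in (0.01, 0.1, 0.5, 1.0):
    # shrinks toward 0 as E1 approaches D(P2||P1) (about 1.278 bits here)
    print(E1, E2(E1, p1, p2))
```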
Proof. The proof of Theorem 7.1 consists of two parts. In the first part it is proved, by means of the construction of a test sequence, that $E_2(E_1)$ is not less than the expression in (7.1). In the second part it is proved that $E_2(E_1)$ cannot be greater than this expression.

Let us name the second-order type of the vector $\mathbf{x}$ (cf. [87, 148]) the square matrix of $I^2$ relative frequencies $\{N(i,j)N^{-1},\ i, j = \overline{1,I}\}$ of the simultaneous appearance of the states $i$ and $j$ on pairs of neighboring places. It is clear that $\sum_{i,j} N(i,j) = N$. Denote by $\mathcal{T}^N_{Q\circ P}$ the set of vectors from $\mathcal{X}^{N+1}$ which have the type such that for some joint PD $Q\circ P$
$$N(i,j) = NQ(i)P(j|i), \quad i, j = \overline{1,I}.$$
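A short sketch of the second-order type and of the empirical acceptance criterion that the set $\mathcal{B}(E_1)$ of (7.2) below uses (our illustration; the chain, trajectory length, and function names are hypothetical):

```python
# Second-order type of a trajectory and the divergence D(Q∘P || Q∘P_1)
# between the empirical joint type and the hypothesis P_1, base-2 logs.
import math, random

def second_order_type(x, I):
    """Matrix of relative pair frequencies N(i,j)/N, the joint type Q∘P."""
    N = len(x) - 1
    cnt = [[0] * I for _ in range(I)]
    for a, b in zip(x, x[1:]):
        cnt[a][b] += 1
    return [[c / N for c in row] for row in cnt]

def div_to_hypothesis(QP, P1):
    """D(Q∘P || Q∘P_1), with Q(i) recovered as the row sums of the type."""
    d = 0.0
    for i, row in enumerate(QP):
        Qi = sum(row)
        for j, qp in enumerate(row):
            if qp > 0:
                d += qp * math.log2(qp / (Qi * P1[i][j]))
    return d

P1 = [[0.7, 0.3], [0.4, 0.6]]            # hypothetical transition matrix
x, state = [0], 0
for _ in range(2000):                    # trajectory simulated under P_1
    state = random.choices((0, 1), weights=P1[state])[0]
    x.append(state)

QP = second_order_type(x, 2)
print(div_to_hypothesis(QP, P1))         # small, so x lands in B(E_1)
```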
Note that if the vector $\mathbf{x} \in \mathcal{T}^N_{Q\circ P}$, then
$$\sum_j N(i,j) = NQ(i), \quad i = \overline{1,I}, \qquad \sum_i N(i,j) = NQ'(j), \quad j = \overline{1,I},$$
for a somewhat different PD $Q'$, but in accordance with the definition of $N(i,j)$ we have
$$|NQ(i) - NQ'(i)| \leq 1, \quad i = \overline{1,I},$$
and then in the limit, when $N \to \infty$, the distribution $Q$ coincides with $Q'$ and may be taken as stationary for the conditional PD $P$:
$$\sum_i Q(i)P(j|i) = Q(j), \quad j \in \mathcal{X}.$$
The test $\varphi_N(\mathbf{x})$ can be given by the set $\mathcal{B}(E_1)$, the part of the space $\mathcal{X}^{N+1}$ in which the hypothesis $P_1$ is adopted. We shall verify that the test for which
$$\mathcal{B}(E_1) = \bigcup_{Q,P:\ D(Q\circ P\|Q\circ P_1)\leq E_1,\ \exists Q_1:\ D(Q\|Q_1)<\infty} \mathcal{T}^N_{Q\circ P} \tag{7.2}$$
will be asymptotically optimal for the given $E_1$.

Note that for $l = 1, 2$ the probability of $\mathbf{x}$ from $\mathcal{T}^N_{Q\circ P}$ can be written as
$$Q_l\circ P_l^N(\mathbf{x}) = Q_l(x_0)\prod_{i,j} P_l(j|i)^{NQ(i)P(j|i)}.$$
From here, if $Q_l\circ P_l^N(\mathbf{x}) > 0$, then the PD $Q\circ P$ is absolutely continuous relative to the PD $Q_l\circ P_l$. On the contrary, if $Q_l\circ P_l^N(\mathbf{x}) = 0$, then the distribution $Q\circ P$ is not absolutely continuous relative to the measure $Q_l\circ P_l$ and in this case $D(Q\circ P\|Q_l\circ P_l) = \infty$; but since
$$D(Q\circ P\|Q_l\circ P_l) = D(Q\circ P\|Q\circ P_l) + D(Q\|Q_l),$$
then at least one of the summands is infinite. Note also that if $Q\circ P$ is absolutely continuous relative to $Q_l\circ P_l$, then
$$Q_l\circ P_l^N(\mathcal{T}^N_{Q\circ P}) = \exp\{-N(D(Q\circ P\|Q\circ P_l) + o(1))\},$$
where
$$o(1) = \max_i\big\{N^{-1}|\log Q_l(i)|:\ Q_l(i) > 0\big\} \to 0, \quad \text{when } N \to \infty.$$
Indeed, this is not difficult to verify taking into account that the number $|\mathcal{T}^N_{Q\circ P}|$ of vectors in $\mathcal{T}^N_{Q\circ P}$ is equal to
$$\exp\Big\{N\Big(-\sum_{i,j} Q(i)P(j|i)\log P(j|i) + o(1)\Big)\Big\}.$$
Now our goal is to determine
$$\alpha_1(\varphi_N) = \max_{Q_1} Q_1\circ P_1^N\big(\overline{\mathcal{B}(E_1)}\big).$$
By analogy with (1.2) the number of different types $\mathcal{T}^N_{Q\circ P}$ does not exceed $(N+1)^{|\mathcal{X}|^2}$, hence
$$\alpha_1(\varphi_N) \leq \exp\{-N(E_1 + o(1))\}.$$
At the same time it is not difficult to verify that
$$\alpha_2(\varphi_N) = \max_{Q_2} Q_2\circ P_2^N(\mathcal{B}(E_1)) \leq \exp\Big\{-N\Big(\min_{Q,P:\ D(Q\circ P\|Q\circ P_1)\leq E_1,\ \exists Q_1:\ D(Q\|Q_1)<\infty} D(Q\circ P\|Q\circ P_2) + o(1)\Big)\Big\}.$$
Therefore the proposed test possesses the necessary exponents of the error probabilities. It is easy to see that the theorem is valid also for $E_1 = \infty$, that is, for $\alpha_1(\varphi_N) = 0$.
Coming to the second part of the proof, note that for any $\mathbf{x} \in \mathcal{T}^N_{Q\circ P}$ the probability $Q_l\circ P_l^N(\mathbf{x})$ is constant. Hence, for the optimal test the corresponding set $\mathcal{B}'(E_1)$ contains only whole types $\mathcal{T}^N_{Q\circ P}$. Now we can conclude that only the proposed sequence of tests, defined by the sets $\mathcal{B}(E_1)$, is logarithmically asymptotically optimal.
Remark 7.1. The test given by the set $\mathcal{B}(E_1)$ is robust, because it is the same for different alternative hypotheses $P_2$.
7.3 Multiple Hypotheses: Interdependence of Reliabilities
In this section we shall generalize the results of the previous section to the case of $L > 2$ hypotheses. Let $\mathcal{X} = \{1, 2, \ldots, I\}$ again be the finite set of states of the stationary Markov chain. The hypotheses concern the matrices of transition probabilities $P_l = \{P_l(j|i),\ i, j = \overline{1,I}\}$, $l = \overline{1,L}$. The stationarity of the chain provides the existence for each $l = \overline{1,L}$ of a stationary distribution $Q_l = \{Q_l(i),\ i = \overline{1,I}\}$, not necessarily unique. On the basis of the trajectory $\mathbf{x} = (x_0, x_1, \ldots, x_N)$ of $N+1$ observations the test accepts one of the hypotheses $H_l$, $l = \overline{1,L}$. Let us denote by $\alpha^{(N)}_{l|r}(\varphi_N)$ the probability to accept the hypothesis $H_l$ under the condition that $H_r$, $r \neq l$, is true. For $l = r$ we denote by $\alpha^{(N)}_{r|r}(\varphi_N)$ the probability to reject the hypothesis $H_r$. It is clear that
$$\alpha^{(N)}_{r|r}(\varphi_N) = \sum_{l\neq r}\alpha^{(N)}_{l|r}(\varphi_N), \quad r = \overline{1,L}. \tag{7.3}$$
This probability is called [36] the error probability of the $r$th kind of the test $\varphi_N$. The square matrix of $L^2$ error probabilities $A(\varphi_N) = \{\alpha^{(N)}_{l|r}(\varphi),\ r, l = \overline{1,L}\}$ is sometimes called the power of the tests. For every trajectory $\mathbf{x}$ the test $\varphi_N$ results in the choice of one hypothesis among the $L$ ones. So the space $\mathcal{X}^{N+1}$ will be divided into $L$ parts
$$\mathcal{G}^N_l = \{\mathbf{x}:\ \varphi_N(\mathbf{x}) = l\}, \quad l = \overline{1,L},$$
and
$$\alpha_{l|r}(\varphi_N) = Q_r\circ P_r^N(\mathcal{G}^N_l), \quad r, l = \overline{1,L}.$$
Denote
$$E_{l|r}(\varphi) = \lim_{N\to\infty} -\frac{1}{N}\log\alpha_{l|r}(\varphi_N), \quad r, l = \overline{1,L}. \tag{7.4}$$
The matrix $E = \{E_{l|r},\ r, l = \overline{1,L}\}$ we call the reliability matrix of the sequence of tests $\varphi$. The problem, formulated by Prof. R. Dobrushin during a seminar at the Institute of Information Transmission Problems of the USSR Academy of Sciences in March of 1987, consists in the determination of the reliability matrix $E$ that is best in some sense and may be obtained for $L$ known distributions. Note that from the definitions (7.3) and (7.4) it follows that
$$E_{r|r} = \min_{l\neq r} E_{l|r}. \tag{7.5}$$
In the case $L = 2$ there are only two parameters in the matrix $E$, because $E_{1|1} = E_{2|1}$ and $E_{1|2} = E_{2|2}$, so the problem lies in the determination of the maximum value of one of them (say $E_{1|2}$) as a function of the given value of the other ($E_{1|1}$).

Let us name a sequence of tests LAO if for a given family $E_{1|1}, E_{2|2}, \ldots, E_{L-1|L-1}$ these numbers make the diagonal of the reliability matrix and the remaining $L^2 - L + 1$ components of it take the maximum possible values.

Let $P = \{P(j|i)\}$ be a matrix of transition probabilities of some stationary Markov chain with the same set $\mathcal{X}$ of states, and $Q = \{Q(i),\ i = \overline{1,I}\}$ be the corresponding stationary PD. Let us define the sets
$$\mathcal{R}_l \overset{M}{=} \{Q\circ P:\ D(Q\circ P\|Q\circ P_l) \leq E_{l|l},\ \exists Q_l:\ D(Q\|Q_l) < \infty\}, \quad l = \overline{1,L-1},$$
$$\mathcal{R}_L \overset{M}{=} \{Q\circ P:\ D(Q\circ P\|Q\circ P_l) \geq E_{l|l},\ l = \overline{1,L-1}\},$$
and introduce the functions:
$$E^*_{l|l}(E_{l|l}) \overset{M}{=} E_{l|l}, \quad l = \overline{1,L-1},$$
$$E^*_{l|r}(E_{l|l}) \overset{M}{=} \inf_{Q\circ P \in \mathcal{R}_l} D(Q\circ P\|Q\circ P_r), \quad r = \overline{1,L},\ l \neq r,\ l = \overline{1,L-1},$$
$$E^*_{L|r}(E_{1|1}, \ldots, E_{L-1|L-1}) \overset{M}{=} \inf_{Q\circ P \in \mathcal{R}_L} D(Q\circ P\|Q\circ P_r), \quad r = \overline{1,L-1},$$
$$E^*_{L|L}(E_{1|1}, \ldots, E_{L-1|L-1}) \overset{M}{=} \min_{l=\overline{1,L-1}} E^*_{l|L}.$$
The minimum over the void set will always be taken equal to infinity. Name the following conditions compatibility conditions:
$$0 < E_{1|1} < \min\big[\inf_{Q_r} D(Q_r\circ P_r\|Q_r\circ P_1),\ r = \overline{2,L}\big],$$
$$\vdots$$
$$0 < E_{l|l} < \min\Big[E^*_{r|l}(E_{r|r}),\ r = \overline{1,l-1};\ \inf_{Q_r} D(Q_r\circ P_r\|Q_r\circ P_l),\ r = \overline{l+1,L}\Big], \quad l = \overline{2,L-1}.$$
Theorem 7.2. (1) If the family of finite positive numbers $E_{1|1}, \ldots, E_{L-1|L-1}$ satisfies the compatibility conditions, then there exists a LAO sequence of tests whose reliability matrix is defined by the functions $E^*_{l|r}$, and all elements of this matrix are strictly positive.
(2) If the compatibility conditions are violated, then for any tests at least one element of the reliability matrix will be equal to zero, that is, the corresponding error probabilities will not decrease exponentially.

The proof of this theorem is a simple combination of the proofs of Theorem 7.1 and of Theorem 7.3 below concerning $L$ hypotheses for i.i.d. experiments. The differences in notation are the following: $L$ PDs $P_1, \ldots, P_L$ are given, the trajectory of observed results of $N$ experiments is $\mathbf{x} = (x_1, \ldots, x_N)$ and its probability is
$$P_r^N(\mathbf{x}) = \prod_{n=1}^{N} P_r(x_n), \quad r = \overline{1,L}.$$
Now let us redefine several notions and notations, used throughout, somewhat differently from the ones of Section 1.4. We call the type $P_{\mathbf{x}}$ of a sample $\mathbf{x}$ (or sample PD) the vector $(N(1|\mathbf{x})/N, \ldots, N(I|\mathbf{x})/N)$, where $N(i|\mathbf{x})$ is the number of repetitions of the state $i$ in $\mathbf{x}$. The set of vectors $\mathbf{x}$ of a given type $P$ will be denoted by $\mathcal{T}^N_P$, and $\mathcal{P}^{(N)}$ denotes the set of all possible types of samples of length $N$. Let us introduce for given positive and finite numbers $E_{1|1}, \ldots, E_{L-1|L-1}$ the following notations:
$$\mathcal{R}_l = \{P:\ D(P\|P_l) \leq E_{l|l}\}, \quad l = \overline{1,L-1},$$
$$\mathcal{R}_L = \{P:\ D(P\|P_l) > E_{l|l},\ l = \overline{1,L-1}\}, \qquad \mathcal{R}^{(N)}_l = \mathcal{R}_l \cap \mathcal{P}^{(N)}, \quad l = \overline{1,L}, \tag{7.6}$$
$$E^*_{l|l} = E^*_{l|l}(E_{l|l}) = E_{l|l}, \quad l = \overline{1,L-1},$$
$$E^*_{l|r} = E^*_{l|r}(E_{l|l}) = \inf_{P \in \mathcal{R}_l} D(P\|P_r), \quad r = \overline{1,L},\ r \neq l,\ l = \overline{1,L-1},$$
$$E^*_{L|r} = E^*_{L|r}(E_{1|1}, \ldots, E_{L-1|L-1}) = \inf_{P \in \mathcal{R}_L} D(P\|P_r), \quad r = \overline{1,L-1},$$
$$E^*_{L|L} = E^*_{L|L}(E_{1|1}, \ldots, E_{L-1|L-1}) = \min_{l=\overline{1,L-1}} E^*_{l|L}. \tag{7.7}$$
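For binary alphabets these elements can be approximated directly, as in the following grid-search sketch (our illustration, not code from the text; the function name, grid size, and the example PDs and diagonal values are hypothetical choices).

```python
# Grid approximation of the reliability matrix elements of (7.7) for
# binary PDs; E[l][r] approximates E*_{l+1|r+1}. Base-2 logarithms.
import math

def kl2(p, q):
    return sum(a * math.log2(a / b)
               for a, b in zip((p, 1 - p), (q, 1 - q)) if a > 0)

def E_matrix(ps, diag, grid=20000):
    """ps: first components of P_1..P_L; diag: E_{1|1},...,E_{L-1|L-1}."""
    L = len(ps)
    pts = [k / grid for k in range(grid + 1)]
    E = [[None] * L for _ in range(L)]
    for l in range(L - 1):
        ball = [p for p in pts if kl2(p, ps[l]) <= diag[l]]   # region R_l
        for r in range(L):
            E[l][r] = diag[l] if r == l else min(kl2(p, ps[r]) for p in ball)
    # region R_L: outside every divergence ball of the first L-1 hypotheses
    RL = [p for p in pts if all(kl2(p, ps[l]) > diag[l] for l in range(L - 1))]
    for r in range(L - 1):
        E[L - 1][r] = min(kl2(p, ps[r]) for p in RL)
    E[L - 1][L - 1] = min(E[l][L - 1] for l in range(L - 1))
    return E

# Hypothetical example: L = 3 binary PDs with a compatible diagonal.
print(E_matrix([0.10, 0.85, 0.23], [0.05, 0.05]))
```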
Note that the parameter $E^*_{r|l}$ may be equal to infinity; this may occur when some measures $P_l$ are not absolutely continuous relative to some others. Theorem 7.2 admits the following form.
Theorem 7.3. Let a family of distinct PDs $P_1, \ldots, P_L$ be given on the finite set $\mathcal{X}$. For $L-1$ positive finite numbers $E_{1|1}, \ldots, E_{L-1|L-1}$, the fulfilment of the strict inequalities
$$0 < E_{1|1} < \min_{l=\overline{2,L}} D(P_l\|P_1),$$
$$\vdots$$
$$0 < E_{r|r} < \min\Big[\min_{l=\overline{1,r-1}} E^*_{l|r}(E_{l|l}),\ \min_{l=\overline{r+1,L}} D(P_l\|P_r)\Big], \quad r = \overline{2,L-1}, \tag{7.8}$$
is necessary and sufficient for the existence of a LAO sequence of tests with the reliability matrix $E^* = \{E^*_{l|r},\ r, l = \overline{1,L}\}$, all components of which are strictly positive and are determined in (7.7).

We call a family $E_{1|1}, \ldots, E_{L-1|L-1}$ which satisfies conditions (7.8) compatible.
Remark 7.3. The maximum likelihood test adopts the hypothesis, which maximizes the probability of sample x, that is r∗ = (N ) arg max Pr (x). Meanwhile simultaneously r
r∗ = arg min D(Px kPr ), r
that is the maximum likelihood principle is equivalent to the minimum divergence principle. The rest of the section is devoted to an overview of works on hypothesis testing for sources with side information. In the paper [111] Haroutunian and Hakobyan studied the matrix of asymptotic interdependencies (reliability–reliability functions) of all possible pairs of the error probability exponents (reliabilities) in testing of multiple statistical hypotheses for arbitrarily varying object with the current states sequence known to the statistician. The case of two hypotheses when state sequences are not known to the decision maker was studied by Fu and Shen [77], and when decision is found on the base of known states sequence was considered by Ahlswede et al. [11]. In the same way as Fu and Shen from the main result the rate-reliability and the reliability-rate functions for arbitrarily varying source coding with side information were obtained in [111] (c.f. Section 5.2).
7.4 Statistical Hypothesis Optimal Testing and Identification for Many Objects
In this section, we briefly present some new results and problems in statistics which are generalizations of those exposed in Sections 7.1-7.3. The identification or hypothesis testing problems are considered for models with $K (\geq 1)$ random objects, each having one of $L (\geq 2)$ PDs. Let $X_k = (X_{k,n},\ n = \overline{1,N})$, $k = \overline{1,K}$, be $K$ sequences of $N$ discrete i.i.d. RVs representing the possible results of $N$ observations, respectively, for each of $K$ randomly functioning objects. For $k = \overline{1,K}$, $n = \overline{1,N}$, $X_{k,n}$ assumes values $x_{k,n}$ in the finite set $\mathcal{X}$ of cardinality $|\mathcal{X}|$. Let $\mathcal{P}(\mathcal{X})$ be the space of all possible PDs on $\mathcal{X}$. There are $L (\geq 2)$ PDs $P_1, \ldots, P_L$ from $\mathcal{P}(\mathcal{X})$ under inspection, some of which are assigned to the vectors $X_1, \ldots, X_K$. This assignment is unknown and must be determined on the basis of $N$-samples (results of $N$ independent observations) $\mathbf{x}_k = (x_{k,1}, \ldots, x_{k,N})$, where $x_{k,n}$ is the result of the $n$th observation of the $k$th object.

When $L = K$ and all objects are different (any two objects cannot have the same PD), there are $K!$ alternative versions of decisions. When the objects are independent, there are $L^K$ possible combinations.

Bechhofer et al. presented investigations on sequential multiple-decision procedures in the book [24], which is concerned principally with a particular class of problems referred to as ranking problems. Chapter 10 of the book by Ahlswede and Wegener [21] is devoted to the statistical identification and ranking problems. We consider models from [21] and [24] and variations of these models inspired by the pioneering papers by Ahlswede and Dueck [17] and by Ahlswede [10], with application of the optimality concept developed in Sections 7.1-7.3 for the models with $K = 1$.

Consider the following family of error probabilities of a test:
$$\alpha^{(N)}_{l_1,l_2,\ldots,l_K|m_1,m_2,\ldots,m_K}, \quad (m_1, m_2, \ldots, m_K) \neq (l_1, l_2, \ldots, l_K), \quad m_k, l_k = \overline{1,L},\ k = \overline{1,K},$$
which are the probabilities of the decision $l_1, l_2, \ldots, l_K$ when the actual indices of the distributions of the objects are, respectively, $m_1, m_2, \ldots, m_K$. The probabilities to reject all $K$ hypotheses when they are true are
$$\alpha^{(N)}_{l_1,l_2,\ldots,l_K|l_1,l_2,\ldots,l_K} = \sum_{(m_1,m_2,\ldots,m_K)\neq(l_1,l_2,\ldots,l_K)} \alpha^{(N)}_{l_1,l_2,\ldots,l_K|m_1,m_2,\ldots,m_K}.$$
To study the exponential decrease of the error probabilities when the number of observations $N$ increases we define the reliabilities
$$\lim_{N\to\infty} -\frac{1}{N}\log\alpha^{(N)}_{l_1,l_2,\ldots,l_K|m_1,m_2,\ldots,m_K} = E_{l_1,l_2,\ldots,l_K|m_1,m_2,\ldots,m_K} \geq 0. \tag{7.9}$$
In the papers [18] and [19] it was shown that questions arise in different models of statistical identification. The following models and problems were formulated and a part of them were solved.

(1) The $K$ objects are different: they have distinct PDs among $L \geq K$ possibilities. As a starting point, the problem of hypotheses LAO testing for the case $K = 2$, $L = 2$ was treated in [18] and [19].

(2) The $K$ objects are independent, that is, they may also follow identical PDs. The problem is the same. An example for $K, L = 2$ was considered in [18] and [19]. It is surprising, but this model was not studied before the paper [18].

(3) For the model with one object, $K = 1$, and $L$ possible PDs the question is whether the $l$th distribution occurred or not. This is the problem of identification of PDs in the spirit of the paper [17].

(4) The ranking (or ordering) problem [10]. Having one vector of observations $X = (X_1, X_2, \ldots, X_N)$ and $L$ hypothetical PDs, the receiver wants to know whether the index of the true PD of the object is in $\{1, 2, \ldots, r\}$ or in $\{r+1, \ldots, L\}$.

(5) $r$-identification of PDs [10]. Again $K = 1$. One wants to identify the observed object as a member either of the subset $\mathcal{S}$ of $\{1, 2, \ldots, L\}$ or of its complement, with $r$ being the number of elements in $\mathcal{S}$.
In what follows, we survey the results related to the problems formulated above. In the papers [18] and [19] the problem concerning $r$-identification (which proved to be equivalent to the ranking problem) is completely solved. The full solution of the problem of LAO identification of the PD of an object (the model 3 noted above) was given in the same papers; we are going to expose it in the sequel.

First, it is necessary to formulate our meaning of the LAO identification problem for one object. There are $L \geq 2$ known possible PDs. The identification is the answer to the question whether the $r$th PD occurred or not. As in the testing problem, this answer must be given on the basis of a sample $\mathbf{x}$ employing a test $\varphi_N(\mathbf{x})$.

There are two error probabilities for each $r = \overline{1,L}$: the probability $\alpha_{l\neq r|m=r}(\varphi_N)$ to accept an $l$ different from $r$ when $r$ is in reality, and the probability $\alpha_{l=r|m\neq r}(\varphi_N)$ that $r$ is accepted when it is not correct. The probability $\alpha_{l\neq r|m=r}(\varphi_N)$ is already known: it coincides with the probability $\alpha_{r|r}(\varphi_N)$, which is equal to $\sum_{l:l\neq r}\alpha_{l|r}(\varphi_N)$. The corresponding reliability $E_{l\neq r|m=r}(\varphi)$ is equal to $E_{r|r}(\varphi)$, which satisfies the equality (7.5).

And what is the reliability approach to identification? It is necessary to determine the optimal dependence of $E^*_{l=r|m\neq r}$ upon the given $E^*_{l\neq r|m=r} = E^*_{r|r}$, which can be assigned a value satisfying conditions (7.8). We need to involve some a priori probabilities of the different hypotheses. Let us suppose that the hypotheses $P_1, \ldots, P_L$ have, say, probabilities $\Pr(r)$, $r = \overline{1,L}$. The only constraint we shall use is that $\Pr(r) > 0$, $r = \overline{1,L}$. We will see that the result formulated in the coming theorem does not depend on the values of $\Pr(r)$, $r = \overline{1,L}$, if they all are strictly positive. Now we can make the following reasoning for each $r = \overline{1,L}$:
$$\alpha^{(N)}_{l=r|m\neq r} = \frac{\Pr^{(N)}(m\neq r,\ l=r)}{\Pr(m\neq r)} = \frac{1}{\sum_{m:m\neq r}\Pr(m)}\ \sum_{m:m\neq r}\alpha^{(N)}_{r|m}\Pr(m).$$
From here one can observe that for $r = \overline{1,L}$
$$E_{l=r|m\neq r} = \lim_{N\to\infty} -\frac{1}{N}\log\alpha^{(N)}_{l=r|m\neq r} = \lim_{N\to\infty} \frac{1}{N}\Big[\log\sum_{m:m\neq r}\Pr(m) - \log\sum_{m:m\neq r}\alpha^{(N)}_{r|m}\Pr(m)\Big] = \min_{m:m\neq r} E^*_{r|m}.$$
By analogy with the consequence (C5) in Section 7.2 we can conclude (with $\mathcal{R}_r$ defined as in (7.6) for each $r$, including $r = L$) that for the values of $E_{r|r}$ from $(0,\ \min_{l:l\neq r} D(P_l\|P_r))$
$$E_{l=r|m\neq r}(E_{r|r}) = \min_{m:m\neq r}\ \inf_{P\in\mathcal{R}_r} D(P\|P_m) = \min_{m:m\neq r}\ \inf_{P:\ D(P\|P_r)\leq E_{r|r}} D(P\|P_m), \quad r = \overline{1,L}. \tag{7.10}$$
This outcome is summarized in

Theorem 7.4. For the model with different distributions, under the condition that the probabilities of all $L$ hypotheses are positive, the reliability $E_{l=r|m\neq r}$ for given $E_{l\neq r|m=r} = E_{r|r}$ is defined by (7.10).

For a good perception of the theory it is pertinent to discuss an example with the set $\mathcal{X} = \{0, 1\}$ having only two elements. Let the following five PDs be given on $\mathcal{X}$: $P_1 = \{0.10, 0.90\}$, $P_2 = \{0.65, 0.35\}$, $P_3 = \{0.45, 0.55\}$, $P_4 = \{0.85, 0.15\}$, $P_5 = \{0.23, 0.77\}$. In Figure 7.1 the results of calculations of $E_{l=r|m\neq r}$ as a function of $E_{l\neq r|m=r}$ are presented.
Fig. 7.1 The function $E_{l=r|m\neq r}$ for different values of $r$.
The elements of the matrix of divergences of all pairs of distributions are used for calculation of conditions (7.8) for this example.
$$\{D(P_m\|P_l)\}_{m,l=\overline{1,5}} = \begin{pmatrix} 0 & 0.956 & 0.422 & 2.018 & 0.082 \\ 1.278 & 0 & 0.117 & 0.176 & 0.576 \\ 0.586 & 0.120 & 0 & 0.618 & 0.169 \\ 2.237 & 0.146 & 0.499 & 0 & 1.249 \\ 0.103 & 0.531 & 0.151 & 1.383 & 0 \end{pmatrix}.$$
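This matrix can be reproduced directly from the five PDs above, as in the following sketch (our code, not from the text); logarithms are to base 2, per the monograph's conventions.

```python
# Row m of the printed matrix is (D(P_m || P_l), l = 1..5), in bits.
import math

Ps = [(0.10, 0.90), (0.65, 0.35), (0.45, 0.55), (0.85, 0.15), (0.23, 0.77)]

def kl2(P, Q):
    return sum(p * math.log2(p / q) for p, q in zip(P, Q) if p > 0)

for Pm in Ps:
    print([round(kl2(Pm, Pl), 3) for Pl in Ps])
```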
In Figures 7.2 and 7.3 the results of calculations of the same dependence are presented for four distributions taken from the previous five.

In [113] Haroutunian and Hakobyan solved the problem of identification of distributions for two independent objects, which will be expounded in the following lines. We begin with a lemma from [110] on LAO testing for two independent objects and $L$ hypotheses concerning each of them. Let a sequence
Fig. 7.2 The function $E_{l=r|m\neq r}$ for four distributions taken from the five.
of compound tests $\Phi = (\varphi^1, \varphi^2)$ consist of the pair of sequences of tests $\varphi^1$ and $\varphi^2$ for the respective separate objects.

Lemma 7.5. If the elements $E_{l|m}(\varphi^i)$, $m, l = \overline{1,L}$, $i = 1, 2$, are strictly positive, then the following equalities hold:
$$E_{l_1,l_2|m_1,m_2}(\Phi) = \sum_{i=1}^{2} E_{l_i|m_i}(\varphi^i), \quad \text{if } m_1\neq l_1,\ m_2\neq l_2, \tag{7.11}$$
$$E_{l_1,l_2|m_1,m_2}(\Phi) = E_{l_i|m_i}(\varphi^i), \quad \text{if } m_{3-i} = l_{3-i},\ m_i\neq l_i,\ i = 1, 2. \tag{7.12}$$
The LAO test $\Phi^*$ is a compound test, and for it the equalities (7.11) and (7.12) are valid.

For identification the statistician has to answer the question whether the pair of distributions $(r_1, r_2)$ occurred or not. Let us consider two types of error probabilities for each pair $(r_1, r_2)$, $r_1, r_2 = \overline{1,L}$.
Fig. 7.3 The function $E_{l=r|m\neq r}$ for another four distributions.
We denote by $\alpha^{(N)}_{(l_1,l_2)\neq(r_1,r_2)|(m_1,m_2)=(r_1,r_2)}$ the probability that the pair $(r_1, r_2)$ is true but is rejected. Note that this probability is equal to $\alpha^{(N)}_{r_1,r_2|r_1,r_2}$. Let $\alpha^{(N)}_{(l_1,l_2)=(r_1,r_2)|(m_1,m_2)\neq(r_1,r_2)}$ be the probability that $(r_1, r_2)$ is accepted when it is not correct. The corresponding reliabilities are $E_{(l_1,l_2)\neq(r_1,r_2)|(m_1,m_2)=(r_1,r_2)} = E_{r_1,r_2|r_1,r_2}$ and $E_{(l_1,l_2)=(r_1,r_2)|(m_1,m_2)\neq(r_1,r_2)}$. Our aim is to determine the dependence of $E_{(l_1,l_2)=(r_1,r_2)|(m_1,m_2)\neq(r_1,r_2)}$ on the given $E_{r_1,r_2|r_1,r_2}(\Phi_N)$.

Now let us suppose that the hypotheses $P_1, P_2, \ldots, P_L$ for the two objects have a priori positive probabilities $\Pr(r_1, r_2)$, $r_1, r_2 = \overline{1,L}$, and consider the probability which we are interested in:
$$\alpha^{(N)}_{(l_1,l_2)=(r_1,r_2)|(m_1,m_2)\neq(r_1,r_2)} = \frac{\Pr^{N}\big((m_1,m_2)\neq(r_1,r_2),\ (l_1,l_2)=(r_1,r_2)\big)}{\Pr\big((m_1,m_2)\neq(r_1,r_2)\big)} = \frac{\sum_{(m_1,m_2)\neq(r_1,r_2)}\alpha^{(N)}_{(r_1,r_2)|(m_1,m_2)}\Pr(m_1,m_2)}{\sum_{(m_1,m_2)\neq(r_1,r_2)}\Pr(m_1,m_2)}.$$
Consequently, we obtain that
$$E_{(l_1,l_2)=(r_1,r_2)|(m_1,m_2)\neq(r_1,r_2)} = \min_{(m_1,m_2)\neq(r_1,r_2)} E_{r_1,r_2|m_1,m_2}. \tag{7.13}$$
For every LAO test $\Phi^*$, from (7.9), (7.11), (7.12), and (7.13) we obtain that
$$E_{(l_1,l_2)=(r_1,r_2)|(m_1,m_2)\neq(r_1,r_2)} = \min_{m_1\neq r_1,\ m_2\neq r_2}\big(E^{I}_{r_1|m_1}(E_{r_1|r_1}),\ E^{II}_{r_2|m_2}(E_{r_2|r_2})\big), \tag{7.14}$$
where $E^{I}_{r_1|m_1}(E_{r_1|r_1})$ and $E^{II}_{r_2|m_2}(E_{r_2|r_2})$ are determined by (7.7) for, correspondingly, the first and the second objects. For every LAO test $\Phi^*$, from (7.9), (7.11), and (7.12) we deduce that
$$E_{r_1,r_2|r_1,r_2} = \min_{m_1\neq r_1,\ m_2\neq r_2}\big(E^{I}_{r_1|m_1},\ E^{II}_{r_2|m_2}\big) = \min\big(E^{I}_{r_1|r_1},\ E^{II}_{r_2|r_2}\big), \tag{7.15}$$
and each of $E^{I}_{r_1|r_1}$, $E^{II}_{r_2|r_2}$ satisfies the following conditions:
$$0 < E^{I}_{r_1|r_1} < \min\Big[\min_{l=\overline{1,r_1-1}} E^{*}_{l|m}(E^{I}_{l|l}),\ \min_{l=\overline{r_1+1,L}} D(P_l\|P_{r_1})\Big], \tag{7.16}$$
$$0 < E^{II}_{r_2|r_2} < \min\Big[\min_{l=\overline{1,r_2-1}} E^{*}_{l|m}(E^{II}_{l|l}),\ \min_{l=\overline{r_2+1,L}} D(P_l\|P_{r_2})\Big]. \tag{7.17}$$
From (7.7) we see that the elements $E^{*}_{l|m}(E^{I}_{l|l})$, $l = \overline{1,r_1-1}$, and $E^{*}_{l|m}(E^{II}_{l|l})$, $l = \overline{1,r_2-1}$, are determined only by $E^{I}_{l|l}$ and $E^{II}_{l|l}$. But we are considering only the elements $E^{I}_{r_1|r_1}$ and $E^{II}_{r_2|r_2}$. We can use Stein's Lemma, formulated for $L$ hypotheses, and upper estimate in (7.16) and (7.17) as follows:
$$0 < E^{I}_{r_1|r_1} < \min\Big[\min_{l=\overline{1,r_1-1}} D(P_{r_1}\|P_l),\ \min_{l=\overline{r_1+1,L}} D(P_l\|P_{r_1})\Big], \tag{7.18}$$
$$0 < E^{II}_{r_2|r_2} < \min\Big[\min_{l=\overline{1,r_2-1}} D(P_{r_2}\|P_l),\ \min_{l=\overline{r_2+1,L}} D(P_l\|P_{r_2})\Big]. \tag{7.19}$$
Let us denote $r = \max(r_1, r_2)$ and $k = \min(r_1, r_2)$. From (7.15) we have that when $E_{r_1,r_2|r_1,r_2} = E^{I}_{r_1|r_1}$, then $E^{I}_{r_1|r_1} \leq E^{II}_{r_2|r_2}$, and when $E_{r_1,r_2|r_1,r_2} = E^{II}_{r_2|r_2}$, then $E^{II}_{r_2|r_2} \leq E^{I}_{r_1|r_1}$. Hence, the given strictly positive element $E_{r_1,r_2|r_1,r_2}$ must meet both inequalities (7.18) and (7.19), and the combination of these restrictions gives
$$E_{r_1,r_2|r_1,r_2} < \min\Big[\min_{l=\overline{1,r-1}} D(P_r\|P_l),\ \min_{l=\overline{k+1,L}} D(P_l\|P_k)\Big]. \tag{7.20}$$
Using (7.15) and (7.20) we can determine the reliability $E_{(l_1,l_2)=(r_1,r_2)|(m_1,m_2)\neq(r_1,r_2)}$ as a function of $E_{r_1,r_2|r_1,r_2}$ as follows:
$$E_{(l_1,l_2)=(r_1,r_2)|(m_1,m_2)\neq(r_1,r_2)}(E_{r_1,r_2|r_1,r_2}) = \min_{m_1\neq r_1,\ m_2\neq r_2}\big(E_{r_1|m_1}(E_{r_1,r_2|r_1,r_2}),\ E_{r_2|m_2}(E_{r_1,r_2|r_1,r_2})\big), \tag{7.21}$$
where $E_{r_1|m_1}(E_{r_1,r_2|r_1,r_2})$ and $E_{r_2|m_2}(E_{r_1,r_2|r_1,r_2})$ are determined by (7.7). Finally we obtain

Theorem 7.6. If the distributions $P_l$, $l = \overline{1,L}$, are different and the given strictly positive number $E_{r_1,r_2|r_1,r_2}$ satisfies condition (7.20), then the reliability $E_{(l_1,l_2)=(r_1,r_2)|(m_1,m_2)\neq(r_1,r_2)}$ is defined in (7.21).

The case with $K = 2$, $L = 3$ for the model of independent objects is studied in [112], and the problem for a model consisting of three or more independent objects is solved in the paper [110]. The following example is treated in the papers [106] and [110], concerning one object and two independent objects, respectively. Let the set $\mathcal{X} = \{0, 1\}$ contain two elements and the following PDs be given on $\mathcal{X}$: $P_1 = \{0.10, 0.90\}$, $P_2 = \{0.85, 0.15\}$, $P_3 = \{0.23, 0.77\}$.
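For this example, a minimal grid-search sketch (ours, not the computation behind the figures) evaluates the single-object identification reliability of (7.10) and combines two objects via the minimum in (7.21); the chosen pair $(r_1, r_2)$ and the value $E$ are hypothetical inputs assumed to satisfy (7.20).

```python
# Single-object identification reliability (7.10) by grid search, and the
# two-independent-objects value (7.21) as the minimum over the objects.
import math

Ps = [(0.10, 0.90), (0.85, 0.15), (0.23, 0.77)]

def kl2(P, Q):
    return sum(p * math.log2(p / q) for p, q in zip(P, Q) if p > 0)

def E_identify(r, E_rr, grid=20000):
    """min over m != r of inf {D(P||P_m) : D(P||P_r) <= E_rr}, binary P."""
    ball = [(k / grid, 1 - k / grid) for k in range(grid + 1)
            if kl2((k / grid, 1 - k / grid), Ps[r]) <= E_rr]
    return min(min(kl2(P, Ps[m]) for P in ball)
               for m in range(len(Ps)) if m != r)

r1, r2, E = 0, 1, 0.05                 # hypothetical true pair and exponent
print(min(E_identify(r1, E), E_identify(r2, E)))
```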
Fig. 7.4 $E_{2,1|1,2}(E_{1,1|3,1}, E_{2,2|2,3})$ for two independent objects.
Fig. 7.5 $E_{2|1}(E_{1|1})$ for one object.
In Figures 7.4 and 7.5 the results of calculations of the functions $E_{2,1|1,2}(E_{1,1|3,1}, E_{2,2|2,3})$ (for two independent objects) and $E_{2|1}(E_{1|1})$ (for one object) are presented, respectively.

The problem of LAO testing of hypotheses is resolved by Haroutunian and Yessayan for a model consisting of two different objects (the distributions of which cannot be the same) in the paper [125] for the case of three hypotheses (the aforementioned Problem 1), and for two statistically dependent objects in [126]. In the paper [109] Haroutunian and Grigoryan examined the problem of LAO testing of three or more hypotheses for a pair of simple homogeneous stationary Markov chains.

It was noted in [18] and [19] that the discussed problems and results may be extended in several directions. It is interesting to examine models from the point of view of remote statistical inference formulated by Berger [28]. It is necessary to study models which are described by more general classes of RVs and processes [42, 87, 93, 94, 177]. One of the directions is connected with the inference for compressed data [12, 22, 95, 227]. One may see perspectives in the application of the identification approach and methods to authentication theory [173] and steganography [38].
Basic Notations and Abbreviations

$|a|^+$ : $\max(a, 0)$
$\lfloor a \rfloor$ : integer part of the number $a$
AVC : arbitrarily varying channel
AVS : arbitrarily varying source
BC : broadcast channel
BSC : binary symmetric channel
$C(W)$ : capacity of DMC $W$
$\overline{C}(W)$ : capacity of DMC $W$ for average error probability
$C(E, W) = R(E, W)$ : $E$-capacity or rate-reliability function
$C_0(W)$ : zero error capacity of DMC $W$
CRP : channel with random parameter
$\mathrm{co}(\mathcal{A})$ : the convex hull of the set $\mathcal{A}$
$d$ : distortion measure
$d(\mathbf{x}, \hat{\mathbf{x}})$ : average distortion measure of vectors $\mathbf{x}$ and $\hat{\mathbf{x}}$
DCC : discrete compound channel
DMC : discrete memoryless channel
DMS : discrete memoryless source
$D(P\|Q)$ : divergence of PD $P$ from $Q$
$E_P X$ : expectation of the RV $X$
$e(f, g, N, W)$ : the maximal error probability of an $N$-block code $(f, g)$ for channel $W$
$e(f, g, P^*, \Delta, N)$ : error probability of a DMS $P^*$ $N$-block code $(f, g)$ subject to distortion $\Delta$
$E$ : error probability exponent (reliability)
$E(R, W)$ : reliability function of DMC $W$
$E_{sp}(R, W)$ : sphere packing bound for the reliability function of DMC $W$
$E_r(R, W)$ : random coding bound for the reliability function of DMC $W$
$E_x(R, W)$ : expurgated bound for the reliability function of DMC $W$
$\exp, \log$ : are to the base two
GCRP : generalized channel with random parameter
GIFC : general interference channel
$g^{-1}(m)$ : $\{\mathbf{y} : g(\mathbf{y}) = m\}$
$H_P(X)$ : entropy of RV $X$ with PD $P$
$H_{P,V}(Y|X)$ : conditional entropy of RV $Y$ for given RV $X$ with PD $P$ and conditional PD $V$ of $Y$ given $X$
$I_{P,V}(Y \wedge X)$ : mutual information of RVs $X$ and $Y$ with PD $P \circ V$
$I_{Q,P,V}(Y \wedge X \mid U)$ : conditional mutual information
IFC : interference channel
$L(N)$ : volume of source code
MAC : multiple-access channel
MACRP : multiple-access channel with random parameter
$\mathcal{M}$ : message set
$M$ : number of messages of the set $\mathcal{M}$
$n = \overline{1,N}$ : $n = 1, 2, \ldots, N$
PD : probability distribution
$\mathcal{P}(\mathcal{X})$ : set of all PD on $\mathcal{X}$
$\mathcal{P}_N(\mathcal{X})$ : subset of $\mathcal{P}(\mathcal{X})$ consisting of the possible types of sequences $\mathbf{x} \in \mathcal{X}^N$
$P, Q, V, W, \ldots$ : PD
$P \circ W$ : $\{P \circ W(x, y) = P(x)W(y|x),\ x \in \mathcal{X},\ y \in \mathcal{Y}\}$
$PW$ : $\{PW(y) = \sum_x P(x)W(y|x),\ y \in \mathcal{Y}\}$
$R(f, g, N)$ : transmission rate of a code $(f, g)$ of blocklength $N$
$R(\Delta, P^*)$ : rate-distortion function
$R(E, \Delta, P^*)$ : rate-reliability-distortion function of the source with generic PD $P^*$
$R_{BH}(E, \Delta, P^*)$ : binary Hamming rate-reliability-distortion function
TWC : two-way channel
RTWC : restricted two-way channel
RV : random variable
$\mathcal{T}^N_P(X)$ : set of vectors $\mathbf{x}$ of type $P$ (also called the type class)
$\mathcal{T}^N_{P,V}(Y|\mathbf{x})$ : set of vectors $\mathbf{y}$ of conditional type $V$ for given $\mathbf{x}$ of type $P$, also called the $V$-shell of $\mathbf{x}$
$U, X, Y, S, \ldots$ : RV with values in $\mathcal{U}, \mathcal{X}, \mathcal{Y}, \mathcal{S}, \ldots$
$\mathcal{V}_N(\mathcal{Y}, P)$ : set of all possible $V$-shells for $\mathbf{x}$ of type $P$
$W : \mathcal{X} \to \mathcal{Y}$ : discrete channel with input alphabet $\mathcal{X}$, output alphabet $\mathcal{Y}$ and matrix of transition probabilities $W$
$\mathbf{x}$ : vector $(x_1, \ldots, x_N)$
$\mathcal{A}, \mathcal{B}, \mathcal{P}, \mathcal{X}, \mathcal{Y}, \mathcal{U}, \mathcal{R}, \mathcal{S}, \mathcal{Z}, \ldots$ : finite sets
$\mathcal{A} \subset \mathcal{X}$ : $\mathcal{A}$ is a subset of $\mathcal{X}$
$|\mathcal{X}|$ : size of the set $\mathcal{X}$
$x \in \mathcal{X}$ : $x$ is an element of the set $\mathcal{X}$
$\mathcal{X} \times \mathcal{Y}$ : Cartesian product of the sets $\mathcal{X}$ and $\mathcal{Y}$
$\mathcal{X}^N$ : the set of $N$-length sequences of elements of $\mathcal{X}$
$X \to Y \to Z$ : RVs $X, Y, Z$ form a Markov chain in this order
$\overset{M}{=}$ : by definition is equal
References
[1] R. F. Ahlswede, "On two-way communication channels and a problem by Zarankiewicz," in Transactions of the 6th Prague Conference on Information Theory, Statistical Decision Functions, Random Processes, pp. 23–37, Prague, 1971. [2] R. F. Ahlswede, "Multi-way communication channels," in 2nd International Symposium on Information Theory, Tsahkadzor, Armenia, 1971, pp. 23–52, Budapest: Akad. Kiado, 1973. [3] R. F. Ahlswede, "The capacity region of a channel with two senders and two receivers," Annals of Probability, vol. 2, no. 2, pp. 805–814, 1974. [4] R. F. Ahlswede, "Elimination of correlation in random codes for arbitrarily varying channels," Z. Wahrscheinlichkeitstheorie Verw. Gebiete, vol. 44, pp. 186–194, 1978. [5] R. F. Ahlswede, "Coloring hypergraphs: A new approach to multi-user source coding," Part I, Journal of Combinatorics, Information and System Sciences, vol. 4, no. 1, pp. 75–115, 1979. [6] R. F. Ahlswede, "Coloring hypergraphs: A new approach to multi-user source coding," Part II, Journal of Combinatorics, Information and System Sciences, vol. 5, no. 2, pp. 220–268, 1980. [7] R. F. Ahlswede, "The rate-distortion region for multiple descriptions without excess rate," IEEE Transactions on Information Theory, vol. 31, no. 6, pp. 721–726, 1985. [8] R. F. Ahlswede, "Arbitrarily varying channels with states sequence known to the sender," IEEE Transactions on Information Theory, vol. 32, no. 5, pp. 621–629, 1986.
[9] R. F. Ahlswede, "Extremal properties of rate-distortion functions," IEEE Transactions on Information Theory, vol. 36, no. 1, pp. 166–171, 1990. [10] R. F. Ahlswede, "General theory of information transfer," Preprint 97-118, Discrete Strukturen in der Mathematik, Universität Bielefeld, 1997. [11] R. F. Ahlswede, E. Aloyan, and E. A. Haroutunian, "On logarithmically asymptotically optimal hypothesis testing for arbitrarily varying source with side information," in Lecture Notes in Computer Science, Vol. 4123, General Theory of Information Transfer and Combinatorics, pp. 457–461, Springer Verlag, 2006. [12] R. F. Ahlswede and M. Burnashev, "On minimax estimation in the presence of side information about remote data," Annals of Statistics, vol. 18, no. 1, pp. 141–171, 1990. [13] R. F. Ahlswede and N. Cai, "Arbitrarily varying multiple access channels," Part 1, Preprint 96-068, Discrete Strukturen in der Mathematik, Universität Bielefeld, 1996. [14] R. F. Ahlswede and N. Cai, "Arbitrarily varying multiple access channels," Part 2, Preprint 97-006, Discrete Strukturen in der Mathematik, Universität Bielefeld, 1997. [15] R. F. Ahlswede and N. Cai, "Correlated sources help transmission over an arbitrarily varying channel," IEEE Transactions on Information Theory, vol. 43, pp. 1254–1255, 1997. [16] R. F. Ahlswede and I. Csiszár, "Hypothesis testing with communication constraints," IEEE Transactions on Information Theory, vol. 32, pp. 533–542, 1986. [17] R. F. Ahlswede and G. Dueck, "Identification via channels," IEEE Transactions on Information Theory, vol. 35, no. 1, pp. 15–29, 1989. [18] R. F. Ahlswede and E. A. Haroutunian, "On statistical hypothesis optimal testing and identification," Transactions of the Institute for Informatics and Automation Problems NAS of RA, Mathematical Problems of Computer Science, vol. 24, pp. 16–33, 2005. [19] R. F. Ahlswede and E. A. Haroutunian, "On logarithmically asymptotically optimal testing of hypotheses and identification," in Lecture Notes in Computer Science, Vol. 4123, General Theory of Information Transfer and Combinatorics, pp. 462–478, Springer Verlag, 2006. [20] R. F. Ahlswede and J. Körner, "Source coding with side information and a converse for degraded broadcast channels," IEEE Transactions on Information Theory, vol. 21, pp. 629–637, November 1975. [21] R. F. Ahlswede and I. Wegener, Search Problems. New York: J. Wiley-Interscience, 1987. (German original, Teubner, Stuttgart, 1979; Russian translation, Mir, Moscow, 1982). [22] R. F. Ahlswede, E. Yang, and Z. Zhang, "Identification via compressed data," IEEE Transactions on Information Theory, vol. 43, no. 1, pp. 48–70, 1997. [23] V. Anantharam, "A large deviations approach to error exponent in source coding and hypothesis testing," IEEE Transactions on Information Theory, vol. 36, no. 4, pp. 938–943, 1990.
[24] R. E. Bechhofer, J. Kiefer, and M. Sobel, Sequential Identification and Ranking Procedures. Chicago: The University of Chicago Press, 1968. [25] R. Benzel, "The capacity region of a class of discrete additive degraded interference channels," IEEE Transactions on Information Theory, vol. 25, no. 2, pp. 228–231, 1979. [26] T. Berger, Rate Distortion Theory: A Mathematical Basis for Data Compression. Englewood Cliffs, NJ: Prentice-Hall, 1971. [27] T. Berger, "The source coding game," IEEE Transactions on Information Theory, vol. 17, no. 1, pp. 71–76, 1971. [28] T. Berger, "Decentralized estimation and decision theory," Presented at the IEEE Seven Springs Workshop on Information Theory, Mt. Kisco, NY, September 1979. [29] T. Berger and Z. Zhang, "New results in binary multiple descriptions," IEEE Transactions on Information Theory, vol. 33, no. 4, pp. 502–521, 1987. [30] T. Berger and Z. Zhang, "Multiple description source coding with no excess marginal rate," IEEE Transactions on Information Theory, vol. 41, no. 2, pp. 349–357, 1995. [31] P. P. Bergmans, "Random coding theorem for broadcast channels with degraded components," IEEE Transactions on Information Theory, vol. 9, pp. 197–207, 1973. [32] L. Birgé, "Vitesses maximales de décroissance des erreurs et tests optimaux associés," Z. Wahrscheinlichkeitstheorie Verw. Gebiete, vol. 55, pp. 261–273, 1981. [33] D. Blackwell, L. Breiman, and A. J. Thomasian, "The capacity of a class of channels," Annals of Mathematical Statistics, vol. 30, no. 4, pp. 1229–1241, 1959. [34] R. E. Blahut, "Hypothesis testing and information theory," IEEE Transactions on Information Theory, vol. 20, pp. 405–417, 1974. [35] R. E. Blahut, Principles and Practice of Information Theory. Reading, MA: Addison-Wesley, 1987. [36] A. A. Borovkov, Mathematical Statistics. (in Russian), Nauka, Novosibirsk, 1997. [37] M. V. Burnashev, S. Amari, and T. S. Han, "BSC: Testing of hypothesis with information constraints," in Numbers, Information and Complexity, (Althöfer, ed.), Boston: Kluwer Academic Publishers, 2000. [38] C. Cachin, "An information-theoretic model for steganography," in Proceedings of 2nd Workshop on Information Hiding, (D. Ausmith, ed.), Lecture Notes in Computer Science, Springer Verlag, 1998. [39] A. B. Carleial, "A case where interference does not reduce capacity," IEEE Transactions on Information Theory, vol. 21, pp. 569–570, 1975. [40] A. B. Carleial, "Interference channels," IEEE Transactions on Information Theory, vol. 24, no. 1, pp. 60–70, 1978. [41] A. B. Carleial, "Outer bounds on the capacity of interference channels," IEEE Transactions on Information Theory, vol. 29, no. 4, pp. 602–604, 1983.
[42] P.-N. Chen, "General formula for the Neyman-Pearson type-II error exponent subject to fixed and exponential type-I error bounds," IEEE Transactions on Information Theory, vol. 42, no. 1, pp. 316–323, 1996. [43] P.-N. Chen and F. Alajaji, Lecture Notes in Information Theory. Vols. I and II, http://shannon.cm.nctu.edu.tw. [44] M. H. M. Costa and A. El Gamal, "The capacity region of the discrete memoryless interference channel with strong interference," IEEE Transactions on Information Theory, vol. 33, pp. 710–711, 1987. [45] T. M. Cover, "Broadcast channels," IEEE Transactions on Information Theory, vol. 18, no. 1, pp. 2–14, 1972. [46] T. M. Cover, "An achievable rate region for the broadcast channel," IEEE Transactions on Information Theory, vol. 21, pp. 399–404, 1975. [47] T. M. Cover, "Comments on broadcast channels," IEEE Transactions on Information Theory, vol. 44, pp. 2524–2530, 1998. [48] T. M. Cover and J. A. Thomas, Elements of Information Theory. New York: Wiley, 1991. [49] I. Csiszár, "The method of types," IEEE Transactions on Information Theory, vol. 44, no. 6, pp. 2505–2523, 1998. [50] I. Csiszár and J. Körner, "Graph decomposition: A new key to coding theorems," IEEE Transactions on Information Theory, vol. 27, no. 1, pp. 5–12, 1981. [51] I. Csiszár and J. Körner, Information Theory: Coding Theorems for Discrete Memoryless Systems. New York: Academic Press, 1981. (Russian translation, Mir, Moscow, 1985). [52] I. Csiszár, J. Körner, and K. Marton, "A new look at the error exponent of a discrete memoryless channel," Presented at the IEEE International Symposium on Information Theory, Ithaca, NY: Cornell Univ., 1977. (Preprint). [53] I. Csiszár and G. Longo, "On the error exponent for source coding and for testing simple statistical hypotheses," Studia Scientiarum Mathematicarum Hungarica, vol. 6, pp. 181–191, 1971. [54] I. Csiszár and P. Shields, "Information theory and statistics: A tutorial," Foundations and Trends in Communications and Information Theory, Hanover, MA, USA: now Publishers, 2004. [55] A. Das and P. Narayan, "Capacities of time-varying multiple-access channels with side information," IEEE Transactions on Information Theory, vol. 48, no. 1, pp. 4–25, 2002. [56] R. L. Dobrushin, "Optimal information transmission in a channel with unknown parameters," (in Russian), Radiotekhnika Electronika, vol. 4, pp. 1951–1956, 1959. [57] R. L. Dobrushin, "Asymptotic bounds of the probability of error for the transmission of messages over a memoryless channel with a symmetric transition probability matrix," (in Russian), Teorija Veroyatnost. i Primenen, vol. 7, no. 3, pp. 283–311, 1962. [58] R. L. Dobrushin, "Survey of Soviet research in information theory," IEEE Transactions on Information Theory, vol. 18, pp. 703–724, 1972.
[59] R. L. Dobrushin, M. S. Pinsker, and A. N. Shiryaev, "Application of the notion of entropy in the problems of detecting a signal in noise," (in Russian), Lithuanian Mathematical Transactions, vol. 3, no. 1, pp. 107–122, 1963. [60] J. L. Doob, Stochastic Processes. New York: Wiley, London: Chapman and Hall, 1953. [61] G. Dueck, "Maximal error capacity regions are smaller than average error capacity regions for multi-user channels," Problems of Control and Information Theory, vol. 7, no. 1, pp. 11–19, 1978. [62] G. Dueck, "The capacity region of the two-way channel can exceed the inner bound," Information and Control, vol. 40, no. 3, pp. 258–266, 1979. [63] G. Dueck and J. Körner, "Reliability function of a discrete memoryless channel at rates above capacity," IEEE Transactions on Information Theory, vol. 25, no. 1, pp. 82–85, 1979. [64] A. G. Dyachkov, "Random constant composition codes for multiple access channels," Problems of Control and Information Theory, vol. 13, no. 6, pp. 357–369, 1984. [65] A. G. Dyachkov, "Lower bound to average by ensemble error probability for multiple access channel," (in Russian), Problems of Information Transmission, vol. 22, no. 1, pp. 98–103, 1986. [66] A. El Gamal and M. N. Costa, "The capacity region of a class of deterministic interference channel," IEEE Transactions on Information Theory, vol. 28, no. 2, pp. 343–346, 1982. [67] A. El Gamal and T. M. Cover, "Achievable rates for multiple descriptions," IEEE Transactions on Information Theory, vol. 28, no. 6, pp. 851–857, 1982. [68] A. El Gamal and E. Van der Meulen, "A proof of Marton's coding theorem for the discrete memoryless broadcast channel," IEEE Transactions on Information Theory, vol. 27, pp. 120–122, 1981. [69] P. Elias, "Coding for noisy channels," IRE Convention Record, Part 4, pp. 37–46, 1955. [70] W. H. R. Equitz and T. M. Cover, "Successive refinement of information," IEEE Transactions on Information Theory, vol. 37, no. 2, pp. 269–275, 1991. [71] T. Ericson, "Exponential error bounds for random codes in the arbitrarily varying channels," IEEE Transactions on Information Theory, vol. 31, no. 1, pp. 42–48, 1985. [72] R. M. Fano, Transmission of Information, A Statistical Theory of Communication. New York, London: Wiley, 1961. [73] M. Feder and N. Merhav, "Universal composite hypothesis testing: A competitive minimax approach," IEEE Transactions on Information Theory, vol. 48, no. 6, pp. 1504–1517, 2002. [74] A. Feinstein, "A new basic theorem of information theory," IRE Transactions on Information Theory, vol. 4, pp. 2–22, 1954. [75] A. Feinstein, Foundations of Information Theory. New York: McGraw-Hill, 1958. [76] G. D. Forney, "Exponential error bounds for erasure, list and decision feedback schemes," IEEE Transactions on Information Theory, vol. 14, no. 2, pp. 206–220, 1968.
[77] F. W. Fu and S. Y. Shen, "Hypothesis testing for arbitrarily varying source with exponential-type constraint," IEEE Transactions on Information Theory, vol. 44, no. 2, pp. 892–895, 1998. [78] R. G. Gallager, "A simple derivation of the coding theorems and some applications," IEEE Transactions on Information Theory, vol. 11, no. 1, pp. 3–18, 1965. [79] R. G. Gallager, Information Theory and Reliable Communication. New York: Wiley, 1968. [80] R. G. Gallager, "Capacity and coding for degraded broadcast channels," (in Russian), Problems of Information Transmission, vol. 10, no. 3, pp. 3–14, 1974. [81] R. G. Gallager, "A perspective on multiaccess channels," IEEE Transactions on Information Theory, vol. 31, no. 1, pp. 124–142, 1985. [82] R. G. Gallager, "Claude E. Shannon: A retrospective on his life, work, and impact," IEEE Transactions on Information Theory, vol. 47, no. 7, pp. 2681–2695, 2001. [83] A. E. Gamal, "The capacity of a class of broadcast channels," IEEE Transactions on Information Theory, vol. 25, no. 2, pp. 166–169, 1979. [84] S. I. Gelfand, "Capacity of one broadcast channel," (in Russian), Problems of Information Transmission, vol. 13, no. 3, pp. 106–108, 1977. [85] S. I. Gelfand and M. S. Pinsker, "Capacity of broadcast channel with one deterministic component," (in Russian), Problems of Information Transmission, vol. 16, no. 1, pp. 24–34, 1980. [86] S. I. Gelfand and M. S. Pinsker, "Coding for channel with random parameters," Problems of Control and Information Theory, vol. 8, no. 1, pp. 19–31, 1980. [87] M. Gutman, "Asymptotically optimal classification for multiple tests with empirically observed statistics," IEEE Transactions on Information Theory, vol. 35, no. 2, pp. 401–408, 1989. [88] B. E. Hajek and M. B. Pursley, "Evaluation of an achievable rate region for the broadcast channel," IEEE Transactions on Information Theory, vol. 25, pp. 36–46, 1979. [89] T. S. Han, "The capacity region of general multiple-access channel with certain correlated sources," Information and Control, vol. 40, no. 1, pp. 37–60, 1979. [90] T. S. Han, "Slepian-Wolf-Cover theorem for networks of channels," Information and Control, vol. 47, no. 1, pp. 67–83, 1980. [91] T. S. Han, "The capacity region for the deterministic broadcast channel with a common message," IEEE Transactions on Information Theory, vol. 27, pp. 122–125, 1981. [92] T. S. Han, "Hypothesis testing with multiterminal data compression," IEEE Transactions on Information Theory, vol. 33, no. 6, pp. 759–772, 1987. [93] T. S. Han, "Hypothesis testing with the general source," IEEE Transactions on Information Theory, vol. 46, no. 7, pp. 2415–2427, 2000. [94] T. S. Han, Information-Spectrum Methods in Information Theory. Berlin: Springer Verlag, 2003. [95] T. S. Han and S. Amari, "Statistical inference under multiterminal data compression," IEEE Transactions on Information Theory, vol. 44, no. 6, pp. 2300–2324, 1998. [96] T. S. Han and K. Kobayashi, "A new achievable region for the interference channel," IEEE Transactions on Information Theory, vol. 27, no. 1, pp. 49–60, 1981. [97] T. S. Han and K. Kobayashi, "Exponential-type error probabilities for multiterminal hypothesis testing," IEEE Transactions on Information Theory, vol. 35, no. 1, pp. 2–13, 1989.
[95] T. S. Han and S. Amari, “Statistical inference under multiterminal data compression,” IEEE Transactions on Information Theory, vol. 44, no. 6, pp. 2300– 2324, 1998. [96] T. S. Han and K. Kobayashi, “A new achievable region for the interference channel,” IEEE Transactions on Information Theory, vol. 27, no. 1, pp. 49–60, 1981. [97] T. S. Han and K. Kobayashi, “Exponential-type error probabilities for multiterminal hypothesis testing,” IEEE Transactions on Information Theory, vol. 35, no. 1, pp. 2–13, 1989. [98] E. A. Haroutunian, “Upper estimate of transmission rate for memoryless channel with countable number of output signals under given error probability exponent,” in 3rd All Union Conference on Theory of Information Transmission and Coding, Uzhgorod, Publishing House of the Uzbek Academy of Science, pp. 83–86, Tashkent, 1967. (in Russian). [99] E. A. Haroutunian, “Estimates of the error probability exponent for a semicontinuous memoryless channel,” (in Russian), Problems on Information Transmission, vol. 4, no. 4, pp. 37–48, 1968. [100] E. A. Haroutunian, “On the optimality of information transmission by a channel with finite number of states known at the input,” (in Russian), Izvestiya Akademii Nauk Armenii, Matematika, vol. 4, no. 2, pp. 81–90, 1969. [101] E. A. Haroutunian, “Error probability lower bound for the multiple-access communication channels,” (in Russian), Problems of Information Transmission, vol. 11, no. 2, pp. 22–36, 1975. [102] E. A. Haroutunian, “Combinatorial method of construction of the upper bound for E-capacity,” (in Russian), Mezhvuz. Sbornic Nouchnikh Trudov, Matematika, Yerevan, vol. 1, pp. 213–220, 1982. [103] E. A. Haroutunian, “On asymptotically optimal testing of hypotheses concerning Markov chain,” (in Russian), Izvestiya Akademii Nauk Armenii, Matematika, vol. 23, no. 1, pp. 76–80, 1988. [104] E. A. Haroutunian, “Asymptotically optimal testing of many statistical hypotheses concerning Markov chain,” in 5th International Vilnius Conference on Probability Theory and Mathematical Statistics, vol. 1(A–L), pp. 202–203, 1989. [105] E. A. Haroutunian, “On asymptotically optimal criteria for Markov chains,” (in Russian), First World Congress of Bernoulli Society, section 2, vol. 2, no. 3, pp. 153–156, 1989. [106] E. A. Haroutunian, “Logarithmically asymptotically optimal testing of multiple statistical hypotheses,” Problems of Control and Information Theory, vol. 19, no. 5–6, pp. 413–421, 1990. [107] E. A. Haroutunian, “On Bounds for E-capacity of DMC,” IEEE Transactions on Information Theory, vol. 53, no. 11, pp. 4210–4220, 2007. [108] E. A. Haroutunian and B. Belbashir, “Lower estimate of optimal transmission rates with given error probability for discrete memoryless channel and for asymmetric broadcast channel,” in 6-th International Symposium on Information Theory, pp. 19–21, Tashkent, 1984. (in Russian).
[109] E. A. Haroutunian and N. M. Grigoryan, “Reliability approach for testing of many distributions for pair of Markov chains,” Transactions of the Institute for Informatics and Automation Problems NAS of RA, Mathematical Problems of Computer Science, vol. 29, pp. 89–96, 2007.
[110] E. A. Haroutunian and P. M. Hakobyan, “On multiple hypotheses LAO testing for many independent objects,” in preparation.
[111] E. A. Haroutunian and P. M. Hakobyan, “On multiple hypothesis testing by informed statistician for arbitrarily varying object and application to source coding,” Transactions of the Institute for Informatics and Automation Problems NAS of RA, Mathematical Problems of Computer Science, vol. 23, pp. 36–46, 2004.
[112] E. A. Haroutunian and P. M. Hakobyan, “On logarithmically asymptotically optimal testing of three distributions for pair of independent objects,” Transactions of the Institute for Informatics and Automation Problems NAS of RA, Mathematical Problems of Computer Science, vol. 24, pp. 76–81, 2005.
[113] E. A. Haroutunian and P. M. Hakobyan, “On identification of distributions of two independent objects,” Transactions of the Institute for Informatics and Automation Problems NAS of RA, Mathematical Problems of Computer Science, vol. 28, pp. 114–119, 2007.
[114] E. A. Haroutunian and A. N. Haroutunian, “The binary Hamming rate-reliability-distortion function,” Transactions of the Institute for Informatics and Automation Problems NAS of RA and YSU, Mathematical Problems of Computer Science, vol. 18, pp. 40–45, 1997.
[115] E. A. Haroutunian, A. N. Haroutunian, and A. R. Kazarian (Ghazaryan), “On rate-reliabilities-distortions function of source with many receivers,” in Proceedings of the Joint Session of the 6th Prague Symposium on Asymptotic Statistics and the 13th Prague Conference on Information Theory, Statistical Decision Functions, Random Processes, vol. 1, pp. 217–220, Prague, 1998.
[116] E. A. Haroutunian and M. E. Haroutunian, Information Theory. (in Armenian), Yerevan State University, p. 104, 1987.
[117] E. A. Haroutunian and M. E. Haroutunian, “Channel with random parameter,” in Proceedings of the 12th Prague Conference on Information Theory, Statistical Decision Functions, Random Processes, p. 20, 1994.
[118] E. A. Haroutunian and M. E. Haroutunian, “Bounds of E-capacity region for restricted two-way channel,” (in Russian), Problems of Information Transmission, vol. 34, no. 3, pp. 7–16, 1998.
[119] E. A. Haroutunian, M. E. Haroutunian, and A. E. Avetissian, “Multiple-access channel achievable rates region and reliability,” Izvestiya Akademii Nauk Armenii, Matematika, vol. 27, no. 5, pp. 51–67, 1992.
[120] E. A. Haroutunian and A. N. Harutyunyan, “Successive refinement of information with reliability criterion,” in Proceedings of IEEE International Symposium on Information Theory, p. 205, Sorrento, Italy, 2000.
[121] E. A. Haroutunian, A. N. Harutyunyan, and A. R. Ghazaryan, “On rate-reliability-distortion function for robust descriptions system,” IEEE Transactions on Information Theory, vol. 46, no. 7, pp. 2690–2697, 2000.
[122] E. A. Haroutunian and A. R. Kazarian, “On cascade system coding rates with respect to distortion criteria and reliability,” Transactions of the Institute for Informatics and Automation Problems NAS of RA and YSU, Mathematical Problems of Computer Science, vol. 18, pp. 19–32, 1997.
[123] E. A. Haroutunian and R. S. Maroutian, “(E, ∆)-achievable rates for multiple descriptions of random varying source,” Problems of Control and Information Theory, vol. 20, no. 2, pp. 165–178, 1991.
[124] E. A. Haroutunian and B. Mekoush, “Estimates of optimal rates of codes with given error probability exponent for certain sources,” (in Russian), in 6th International Symposium on Information Theory, vol. 1, pp. 22–23, Tashkent, 1984.
[125] E. A. Haroutunian and A. O. Yessayan, “On hypothesis optimal testing for two differently distributed objects,” Transactions of the Institute for Informatics and Automation Problems NAS of RA, Mathematical Problems of Computer Science, vol. 25, pp. 89–94, 2006.
[126] E. A. Haroutunian and A. O. Yessayan, “On logarithmically asymptotically optimal hypothesis testing for pair of statistically dependent objects,” Transactions of the Institute for Informatics and Automation Problems NAS of RA, Mathematical Problems of Computer Science, vol. 29, pp. 117–123, 2007.
[127] M. E. Haroutunian, “E-capacity of arbitrarily varying channel with informed encoder,” (in Russian), Problems of Information Transmission, vol. 26, no. 4, pp. 16–23, 1990.
[128] M. E. Haroutunian, “Bounds of E-capacity for the channel with random parameter,” (in Russian), Problems of Information Transmission, vol. 27, no. 1, pp. 14–23, 1991.
[129] M. E. Haroutunian, “About achievable rates of transmission for interference channel,” Transactions of the Institute for Informatics and Automation Problems NAS of RA and YSU, Mathematical Problems of Computer Science, vol. 20, pp. 79–89, 1998.
[130] M. E. Haroutunian, “Random coding bound for E-capacity region of the broadcast channel,” Transactions of the Institute for Informatics and Automation Problems NAS of RA and YSU, Mathematical Problems of Computer Science, vol. 21, pp. 50–60, 2000.
[131] M. E. Haroutunian, “New bounds for E-capacities of arbitrarily varying channel and channel with random parameter,” Transactions of the Institute for Informatics and Automation Problems NAS of RA and YSU, Mathematical Problems of Computer Science, vol. 22, pp. 44–59, 2001.
[132] M. E. Haroutunian, “On E-capacity region of multiple-access channel,” (in Russian), Izvestiya Akademii Nauk Armenii, Matematika, vol. 38, no. 1, pp. 3–22, 2003.
[133] M. E. Haroutunian, “On multiple-access channel with random parameter,” in Proceedings of International Conference on Computer Science and Information Technology, pp. 174–178, Yerevan, Armenia, 2003.
[134] M. E. Haroutunian, “Bounds of E-capacity for multiple-access channel with random parameter,” in Lecture Notes in Computer Science, General Theory of Information Transfer and Combinatorics, pp. 166–183, Springer Verlag, 2005.
[135] M. E. Haroutunian and A. H. Amirbekyan, “Random coding bound for E-capacity region of the channel with two inputs and two outputs,” Transactions of the Institute for Informatics and Automation Problems NAS of RA and YSU, Mathematical Problems of Computer Science, vol. 20, pp. 90–97, 1998.
[136] M. E. Haroutunian and S. A. Tonoyan, “On estimates of rate-reliability-distortion function for information hiding system,” Transactions of the Institute for Informatics and Automation Problems NAS of RA and YSU, Mathematical Problems of Computer Science, vol. 23, pp. 20–31, 2004.
[137] M. E. Haroutunian and S. A. Tonoyan, “Random coding bound of information hiding E-capacity,” in Proceedings of IEEE International Symposium on Information Theory, p. 536, Chicago, USA, 2004.
[138] M. E. Haroutunian and S. A. Tonoyan, “On information hiding system with multiple messages,” Transactions of the Institute for Informatics and Automation Problems NAS of RA and YSU, Mathematical Problems of Computer Science, vol. 24, pp. 89–103, 2005.
[139] M. E. Haroutunian, S. A. Tonoyan, O. Koval, and S. Voloshynovskiy, “Random coding bound of reversible information hiding E-capacity,” Transactions of the Institute for Informatics and Automation Problems NAS of RA and YSU, Mathematical Problems of Computer Science, vol. 28, pp. 18–33, 2007.
[140] A. N. Haroutunian (Harutyunyan) and E. A. Haroutunian, “An achievable rates-reliabilities-distortions dependence for source coding with three descriptions,” Transactions of the Institute for Informatics and Automation Problems NAS of RA and YSU, Mathematical Problems of Computer Science, vol. 17, pp. 70–75, 1997.
[141] A. N. Harutyunyan, “Notes on conditions for successive refinement of information,” in Lecture Notes in Computer Science, General Theory of Information Transfer and Combinatorics, pp. 130–138, Springer Verlag, 2006.
[142] A. N. Harutyunyan and A. J. Han Vinck, “Error exponent in AVS coding,” in Proceedings of IEEE International Symposium on Information Theory, pp. 2166–2170, Seattle, WA, July 9–14, 2006.
[143] A. N. Harutyunyan and E. A. Haroutunian, “On properties of rate-reliability-distortion function,” IEEE Transactions on Information Theory, vol. 50, no. 11, pp. 2768–2769, 2004.
[144] A. S. Hekstra and F. M. J. Willems, “Dependence balance bounds for single-output two-way channels,” IEEE Transactions on Information Theory, vol. 35, no. 1, pp. 44–53, 1989.
[145] W. Hoeffding, “Asymptotically optimal tests for multinomial distributions,” Annals of Mathematical Statistics, vol. 36, pp. 369–401, 1965.
[146] B. L. Hughes and T. G. Thomas, “On error exponents for arbitrarily varying channels,” IEEE Transactions on Information Theory, vol. 42, no. 1, pp. 87–98, 1996.
[147] S. Ihara, Information Theory for Continuous Systems. Singapore: World Scientific, 1993.
[148] P. Jacquet and W. Szpankowski, “Markov types and minimax redundancy for Markov sources,” IEEE Transactions on Information Theory, vol. 50, no. 7, pp. 1393–1402, 2004.
[149] J. Jahn, “Coding of arbitrarily varying multiuser channels,” IEEE Transactions on Information Theory, vol. 27, no. 2, pp. 212–226, 1981.
[150] F. Jelinek, “Evaluation of expurgated bound exponents,” IEEE Transactions on Information Theory, vol. 14, pp. 501–505, 1968.
[151] A. Kanlis and P. Narayan, “Error exponents for successive refinement by partitioning,” IEEE Transactions on Information Theory, vol. 42, no. 1, pp. 275–282, 1996.
[152] V. D. Kolesnik and G. S. Poltyrev, Textbook of Information Theory. (in Russian), Moscow: Nauka, 1982.
[153] J. Körner and K. Marton, “General broadcast channels with degraded message sets,” IEEE Transactions on Information Theory, vol. 23, pp. 60–64, 1977.
[154] J. Körner and A. Sgarro, “Universally attainable error exponents for broadcast channels with degraded message sets,” IEEE Transactions on Information Theory, vol. 26, pp. 670–679, 1980.
[155] V. N. Koshelev, “Multilevel source coding and data-transmission theorem,” in Proceedings of the VII All-Union Conference on Theory of Coding and Data Transmission, pt. 1, pp. 85–92, Vilnius, U.S.S.R., 1978.
[156] V. N. Koshelev, “Hierarchical coding of discrete sources,” (in Russian), Problems of Information Transmission, vol. 16, no. 3, pp. 31–49, 1980.
[157] V. N. Koshelev, “An evaluation of the average distortion for discrete scheme of sequential approximation,” (in Russian), Problems of Information Transmission, vol. 17, no. 3, pp. 20–30, 1981.
[158] V. N. Koshelev, “On divisibility of discrete sources with the single-letter-additive measure of distortion,” (in Russian), Problems of Information Transmission, vol. 30, no. 1, pp. 31–50, 1994.
[159] B. D. Kudryashov and G. S. Poltyrev, “Upper bounds for decoding error probability in some broadcast channels,” (in Russian), Problems of Information Transmission, vol. 15, no. 3, pp. 3–17, 1979.
[160] S. Kullback, Information Theory and Statistics. New York: Wiley, 1959.
[161] E. Levitan and N. Merhav, “A competitive Neyman-Pearson approach to universal hypothesis testing with applications,” IEEE Transactions on Information Theory, vol. 48, no. 8, pp. 2215–2229, 2002.
[162] Y. Liang and G. Kramer, “Rate regions for relay broadcast channels,” IEEE Transactions on Information Theory, vol. 53, no. 10, pp. 3517–3535, 2007.
[163] Y. N. Lin’kov, “On asymptotical discrimination of two simple statistical hypotheses,” (in Russian), Preprint 86.45, Kiev, 1986.
[164] Y. N. Lin’kov, “Methods of solving asymptotical problems of two simple statistical hypotheses testing,” (in Russian), Preprint 89.05, Doneck, 1989.
[165] Y. N. Lin’kov, Asymptotical Methods of Random Processes Statistics. (in Russian), Kiev: Naukova Dumka, 1993.
[166] Y. S. Liu and B. L. Hughes, “A new universal coding bound for the multiple-access channel,” IEEE Transactions on Information Theory, vol. 42, pp. 376–386, 1996.
[167] G. Longo and A. Sgarro, “The error exponent for the testing of simple statistical hypotheses: A combinatorial approach,” Journal of Combinatorics, Information and System Sciences, vol. 5, no. 1, pp. 58–67, 1980.
[168] A. A. Lyapunov, “On selection between finite number of distributions,” (in Russian), Uspekhi Matematicheskikh Nauk, vol. 6, no. 1, pp. 178–186, 1951.
[169] I. Marić, R. D. Yates, and G. Kramer, “Capacity of interference channels with partial transmitter cooperation,” IEEE Transactions on Information Theory, vol. 53, no. 10, pp. 3536–3548, 2007.
[170] R. S. Maroutian, “Achievable rates for multiple descriptions with given exponent and distortion levels,” (in Russian), Problems of Information Transmission, vol. 26, no. 1, pp. 83–89, 1990.
[171] K. Marton, “Error exponent for source coding with a fidelity criterion,” IEEE Transactions on Information Theory, vol. 20, no. 2, pp. 197–199, 1974.
[172] K. Marton, “A coding theorem for the discrete memoryless broadcast channel,” IEEE Transactions on Information Theory, vol. 25, pp. 306–311, 1979.
[173] U. M. Maurer, “Authentication theory and hypothesis testing,” IEEE Transactions on Information Theory, vol. 46, no. 4, pp. 1350–1356, 2000.
[174] N. Merhav, “On random coding error exponents of watermarking systems,” IEEE Transactions on Information Theory, vol. 46, no. 2, pp. 420–430, 2000.
[175] P. Moulin and J. A. O’Sullivan, “Information-theoretic analysis of information hiding,” IEEE Transactions on Information Theory, vol. 49, no. 3, pp. 563–593, 2003.
[176] P. Moulin and Y. Wang, “Capacity and random-coding exponents for channel coding with side information,” IEEE Transactions on Information Theory, vol. 53, no. 4, pp. 1326–1347, 2007.
[177] S. Natarajan, “Large deviations, hypotheses testing, and source coding for finite Markov chains,” IEEE Transactions on Information Theory, vol. 31, no. 3, pp. 360–365, 1985.
[178] J. K. Omura, “A lower bounding method for channel and source coding probabilities,” Information and Control, vol. 27, pp. 148–177, 1975.
[179] A. Peres, “Second-type-error exponent given the first-type-error exponent in the testing statistical hypotheses by unfitted procedures,” in 6th International Symposium on Information Theory, pt. 1, pp. 277–279, Tashkent, 1984.
[180] M. S. Pinsker, “Capacity of noiseless broadcast channels,” (in Russian), Problems of Information Transmission, vol. 14, no. 2, pp. 28–34, 1978.
[181] M. S. Pinsker, “Multi-user channels,” in II Joint Swedish-Soviet International Workshop on Information Theory, pp. 160–165, Gränna, Sweden, 1985.
[182] J. Pokorny and H. M. Wallmeier, “Random coding bound and codes produced by permutations for the multiple-access channel,” IEEE Transactions on Information Theory, vol. 31, pp. 741–750, 1985.
[183] G. S. Poltyrev, “Random coding bounds for some broadcast channels,” (in Russian), Problems of Information Transmission, vol. 19, no. 1, pp. 9–20, 1983.
[184] H. V. Poor and S. Verdú, “A lower bound on the probability of error in multihypothesis testing,” IEEE Transactions on Information Theory, vol. 41, no. 6, pp. 1992–1995, 1995.
[185] V. V. Prelov, “Information transmission by the multiple access channel with certain hierarchy of sources,” (in Russian), Problems of Information Transmission, vol. 20, no. 4, pp. 3–10, 1984.
[186] A. Puhalskii and V. Spokoiny, “On large-deviation efficiency in statistical inference,” Bernoulli, vol. 4, no. 2, pp. 203–272, 1998.
[187] B. Rimoldi, “Successive refinement of information: Characterization of the achievable rates,” IEEE Transactions on Information Theory, vol. 40, no. 1, pp. 253–259, 1994.
[188] H. Sato, “Two-user communication channels,” IEEE Transactions on Information Theory, vol. 23, no. 3, pp. 295–304, 1977.
[189] H. Sato, “On the capacity region of a discrete two-user channel for strong interference,” IEEE Transactions on Information Theory, vol. 24, no. 3, pp. 377–379, 1978.
[190] H. Sato, “An outer bound to the capacity region of broadcast channels,” IEEE Transactions on Information Theory, vol. 24, pp. 374–377, 1978.
[191] C. E. Shannon, “A mathematical theory of communication,” Bell System Technical Journal, vol. 27, no. 3, pp. 379–423, 1948.
[192] C. E. Shannon, “The zero-error capacity of a noisy channel,” IRE Transactions on Information Theory, vol. 2, no. 1, pp. 8–19, 1956.
[193] C. E. Shannon, “Channels with side information at the transmitter,” IBM Journal of Research and Development, vol. 2, no. 4, pp. 289–293, 1958.
[194] C. E. Shannon, “Coding theorems for a discrete source with a fidelity criterion,” IRE National Convention Record, vol. 7, pp. 142–163, 1959.
[195] C. E. Shannon, “Probability of error for optimal codes in a Gaussian channel,” Bell System Technical Journal, vol. 38, no. 5, pp. 611–656, 1959.
[196] C. E. Shannon, “Two-way communication channels,” in Proceedings of the 4th Berkeley Symposium on Mathematical Statistics and Probability, pp. 611–644, Berkeley: University of California Press, 1961.
[197] C. E. Shannon, “Works on information theory and cybernetics,” (in Russian), in Collection of Papers, (R. L. Dobrushin and O. B. Lupanov, eds.), Moscow: Publishing House of Foreign Literature, 1963.
[198] C. E. Shannon, R. G. Gallager, and E. R. Berlekamp, “Lower bounds to error probability for coding on discrete memoryless channels,” Information and Control, vol. 10, no. 1, pp. 65–103, no. 2, pp. 523–552, 1967.
[199] D. Slepian and J. K. Wolf, “A coding theorem for multiple access channels with correlated sources,” Bell System Technical Journal, vol. 52, pp. 1037–1076, 1973.
[200] A. Somekh-Baruch and N. Merhav, “On error exponent and capacity games of private watermarking systems,” IEEE Transactions on Information Theory, vol. 49, no. 3, pp. 537–562, 2003.
[201] A. Somekh-Baruch and N. Merhav, “On the capacity game of public watermarking systems,” IEEE Transactions on Information Theory, vol. 50, no. 3, pp. 511–524, 2004.
[202] A. Somekh-Baruch and N. Merhav, “On the random coding error exponents of the single-user and the multiple-access Gelfand-Pinsker channels,” in Proceedings of IEEE International Symposium on Information Theory, p. 448, Chicago, USA, 2004.
[203] H. H. Tan, “Two-user interference channels with correlated information sources,” Information and Control, vol. 44, no. 1, pp. 77–104, 1980.
[204] S. A. Tonoyan, “Computation of information hiding capacity and E-capacity lower bounds,” Transactions of the Institute for Informatics and Automation Problems NAS of RA and YSU, Mathematical Problems of Computer Science, vol. 26, pp. 33–37, 2006.
[205] E. Tuncel, “On error exponents in hypothesis testing,” IEEE Transactions on Information Theory, vol. 51, no. 8, pp. 2945–2950, 2005.
[206] E. Tuncel and K. Rose, “Error exponents in scalable source coding,” IEEE Transactions on Information Theory, vol. 49, pp. 289–296, January 2003.
[207] G. Tusnády, “On asymptotically optimal tests,” Annals of Statistics, vol. 5, no. 2, pp. 385–393, 1977.
[208] G. Tusnády, “Testing statistical hypotheses (an information theoretic approach),” Preprint, Mathematical Institute of the Hungarian Academy of Sciences, Budapest, 1979, 1982.
[209] E. C. van der Meulen, “The discrete memoryless channel with two senders and one receiver,” in Proceedings of the 2nd International Symposium on Information Theory, Tsahkadzor, Armenia, 1971, pp. 103–135, Budapest: Akad. Kiado, 1973.
[210] E. C. van der Meulen, “Random coding theorems for the general discrete memoryless broadcast channel,” IEEE Transactions on Information Theory, vol. 21, pp. 180–190, 1975.
[211] E. C. van der Meulen, “A survey of multi-way channels in information theory: 1961–1976,” IEEE Transactions on Information Theory, vol. 23, no. 1, pp. 1–37, 1977.
[212] E. C. van der Meulen, “Some recent results on the asymmetric multiple-access channel,” in Proceedings of the 2nd Joint Swedish-Soviet International Workshop on Information Theory, pp. 172–176, Gränna, Sweden, 1985.
[213] E. C. van der Meulen, E. A. Haroutunian, A. N. Harutyunyan, and A. R. Ghazaryan, “On the rate-reliability-distortion and partial secrecy region of a one-stage branching communication system,” in Proceedings of IEEE International Symposium on Information Theory, p. 211, Sorrento, Italy, 2000.
[214] S. Verdú, “Asymptotic error probability of binary hypothesis testing for Poisson point-process observations,” IEEE Transactions on Information Theory, vol. 32, no. 1, pp. 113–115, 1986.
[215] S. Verdú, “Guest editorial,” IEEE Transactions on Information Theory, vol. 44, no. 6, pp. 2042–2043, 1998.
[216] S. Verdú, Multiuser Detection. Cambridge University Press, 1998.
[217] F. M. J. Willems, Information Theoretical Results for the Discrete Memoryless Multiple Access Channel. PhD thesis, Katholieke Universiteit Leuven, 1982.
[218] F. M. J. Willems, “The maximal-error and average-error capacity region of the broadcast channel are identical,” Problems of Control and Information Theory, vol. 19, no. 4, pp. 339–347, 1990.
[219] F. M. J. Willems and E. C. van der Meulen, “The discrete memoryless multiple-access channel with cribbing encoders,” IEEE Transactions on Information Theory, vol. 31, no. 3, pp. 313–327, 1985.
[220] J. K. Wolf, A. D. Wyner, and J. Ziv, “Source coding for multiple descriptions,” Bell System Technical Journal, vol. 59, no. 8, pp. 1417–1426, 1980.
[221] J. Wolfowitz, “Simultaneous channels,” Archive for Rational Mechanics and Analysis, vol. 4, no. 4, pp. 371–386, 1960.
[222] J. Wolfowitz, Coding Theorems of Information Theory. Berlin-Heidelberg: Springer Verlag, 3rd ed., 1978.
[223] H. Yamamoto, “Source coding theory for cascade and branching communication systems,” IEEE Transactions on Information Theory, vol. 27, no. 3, pp. 299–308, 1981.
[224] R. Yeung, A First Course in Information Theory. New York: Kluwer Academic, 2002.
[225] O. Zeitouni and M. Gutman, “On universal hypotheses testing via large deviations,” IEEE Transactions on Information Theory, vol. 37, no. 2, pp. 285–290, 1991.
[226] Z. Zhang and T. Berger, “New results in binary multiple descriptions,” IEEE Transactions on Information Theory, vol. 33, no. 4, pp. 502–521, 1987.
[227] Z. Zhang and T. Berger, “Estimation via compressed information,” IEEE Transactions on Information Theory, vol. 34, no. 2, pp. 198–211, 1988.
[228] Z. Zhang, T. Berger, and J. P. M. Schalkwijk, “New outer bounds to capacity regions of two-way channels,” IEEE Transactions on Information Theory, vol. 32, no. 3, pp. 383–386, 1986.
[229] J. Ziv, “On classification with empirically observed statistics and universal data compression,” IEEE Transactions on Information Theory, vol. 34, no. 2, pp. 278–286, 1988.
[230] G. Zoutendijk, Methods of Feasible Directions. A Study in Linear and Non-Linear Programming. Amsterdam: Elsevier, 1960.